A simple nomogram score for screening patients with type 2 diabetes to detect those with hypertension: A cross-sectional study based on a large community survey in China

Objectives Compared with unaffected individuals, patients with type 2 diabetes (T2DM) have a higher risk of hypertension, and diabetes combined with hypertension can lead to severe cardiovascular disease. Therefore, the purpose of this study was to establish a simple nomogram model to identify the determinants of hypertension in patients with T2DM and to quickly calculate the probability of hypertension in individuals with T2DM. Materials and methods A total of 643,439 subjects participating in the national physical examination were recruited in this cross-sectional study. After excluding unqualified subjects, 30,507 adults with T2DM were included in the final analysis. Subjects were randomly assigned to the model development group (n = 21,355) and the validation group (n = 9,152) at a ratio of 7:3. The potential risk factors used to assess hypertension in patients with T2DM comprised questionnaire and physical measurement variables. We used the least absolute shrinkage and selection operator (LASSO) model to optimize feature selection, and multivariable logistic regression analysis to build the prediction model. Discrimination and calibration were assessed using the receiver operating characteristic (ROC) curve and the calibration curve. Results The major determinants of hypertension in patients with T2DM were age, gender, drinking, exercise, smoking, obesity and atherosclerotic vascular disease. The area under the ROC curve was 0.814 in both the development group and the validation group, indicating that the prediction model has high discriminative ability. The p values of the two calibration curves were 0.625 and 0.445, suggesting that the nomogram is well calibrated. Conclusion The individualized nomogram model can facilitate improved screening and early identification of hypertension in patients with T2DM.
This procedure will be useful in developing regions with high epidemiological risk and poor socioeconomic status, such as Urumqi in northwestern China.

Introduction Diabetes, cancer, and cerebrovascular and cardiovascular diseases (CCVD) are the major chronic diseases worldwide, and they increasingly threaten human life and people's physical and mental health. Blood glucose and blood pressure control are national public health priorities in China [1]. The seventh statistical report of the International Diabetes Federation (IDF) shows that 425 million people worldwide have diabetes [2]. The prevalence of hypertension deserves more attention as well: a nationwide survey shows that 29.6% (about 311.9 million) of Chinese adults over 18 years old have high blood pressure and 41.3% (about 244.5 million) have pre-hypertension (pre-HTN) [3], while the treatment and control rates among all hypertensive patients are less than 30% and 10%, respectively [4].
Diabetes and hypertension often coexist, and the two diseases aggravate each other [5,6]. A national survey of outpatients and community residents in China shows that nearly a quarter of diabetic patients also have hypertension, but the blood pressure control rate is low [4,7,8]. Diabetes and hypertension are the main risk factors for cardiovascular disease, and their combination can lead to more severe atherosclerosis. Therefore, early detection of and economical screening for hypertension in diabetic patients is an urgent challenge for doctors all over the world.
At present, many studies address the increased prevalence of other diseases when hypertension and diabetes coexist, such as cardiovascular disease [9], kidney disease [10] and heart disease [11]. Other studies explore whether treatment pathways should differ between patients with hypertension alone and those with both diabetes mellitus and hypertension [12]. Dutch researchers investigated whether excessive protein intake in patients with type 1 diabetes (T1DM) is a risk factor for hypertension, but found no correlation between the two [13]. To our knowledge, few studies have addressed classification models and determinant analysis of hypertension in diabetic patients. Previous literature has confirmed that most diabetes and hypertension can be prevented by reasonable intervention and control [14,15]. Therefore, it is of great significance to establish a simple and rapid hypertension screening model for patients with T2DM and to improve the detection of hypertension in auxiliary medical institutions.
The objective of this study is to use easily obtained data to construct a nomogram model that identifies T2DM patients with a high likelihood of hypertension. Our approach will be useful in locations with high epidemiological risk and poor socioeconomic status where regular blood pressure monitoring receives little attention, such as Xinjiang, China.

Patient selection
The Xinjiang physical examination is a free physical examination provided by the Chinese government for all Xinjiang residents. A total of 643,439 citizens who participated in the physical examination in Urumqi in 2018 were recruited, and we had access to information that could identify individual participants during or after data collection. Subjects who met the following inclusion criteria were eligible to participate in the study: (1) age over 20; (2) diagnosis of type 2 diabetes mellitus (T2DM); (3) signed written informed consent. After strict data filtering, 30,507 subjects were finally enrolled. The data were randomly divided into a development group (n = 21,355) and a validation group (n = 9,152). We used the development set to build the nomogram and the validation set to verify the model. This study was performed in accordance with the principles outlined in the Declaration of Helsinki and approved by the Xinjiang Uygur Autonomous Region CDC ethical committee and the institutional review board.
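The 7:3 division described above can be illustrated with a minimal sketch. It uses a plain unstratified shuffle from the Python standard library as a stand-in for the partitioning routine actually used; the function name and the seed are assumptions made for illustration only.

```python
import random

def split_development_validation(ids, dev_fraction=0.7, seed=2018):
    """Shuffle subject ids and split them into development and validation groups."""
    rng = random.Random(seed)          # seed is an arbitrary, assumed value
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_dev = round(dev_fraction * len(shuffled))
    return shuffled[:n_dev], shuffled[n_dev:]

# 30,507 enrolled subjects, identified here by hypothetical integer ids
development, validation = split_development_validation(range(30507))
print(len(development), len(validation))
```

With 30,507 subjects, rounding 70% of the sample gives the group sizes reported in this study.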

Patient characteristics
The Xinjiang physical examination variables comprise 3 parts: questionnaire, physical examination and laboratory testing. The questionnaire includes information on medical history and lifestyle, such as smoking, drinking, diet and exercise habits. Physical measurement indexes include height, body weight, heart rate, waist circumference and abdominal ultrasound. Abdominal ultrasound shows the shape and size of the abdominal organs, including the liver, kidneys and gallbladder, and determines whether these organs have tumors, cysts or stones. Laboratory test indicators include blood glucose and blood biochemistry. In this study, we wanted to establish a simple model that calculates the probability of hypertension in patients with T2DM from questionnaire and physical measurement indicators only, so the laboratory test indicators were not included.

Diagnosis of T2DM
T2DM was defined in this study as a 2-hour postprandial blood glucose ≥ 11.1 mmol/L, a fasting blood glucose ≥ 7.0 mmol/L, or self-reported T2DM with use of hypoglycemic drugs; gestational diabetes and T1DM were excluded. The resulting prevalence of adult T2DM was 10.5%, which was comparable to that reported in previous studies [16].

Diagnosis of hypertension
Blood pressure was measured on both arms of the participants. Hypertension was defined as systolic blood pressure ≥ 140 mmHg, diastolic blood pressure ≥ 90 mmHg, or diagnosed hypertension treated with antihypertensive drugs. The prevalence of hypertension among the patients with T2DM included in the study was 43.7%, which was 1.49 times the prevalence among people without T2DM in this study and comparable to that reported in a previous study [8].
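The two case definitions can be written as simple rules. The sketch below is illustrative only: the function and argument names are hypothetical, and the thresholds are read as "greater than or equal to", which is our interpretation of the definitions above.

```python
def has_t2dm(postprandial_2h_mmol_l=None, fasting_mmol_l=None,
             self_reported_on_hypoglycemic_drugs=False):
    """T2DM if either glucose criterion is met or treated T2DM is self-reported."""
    if postprandial_2h_mmol_l is not None and postprandial_2h_mmol_l >= 11.1:
        return True
    if fasting_mmol_l is not None and fasting_mmol_l >= 7.0:
        return True
    return self_reported_on_hypoglycemic_drugs

def has_hypertension(systolic_mmhg, diastolic_mmhg, on_antihypertensive_drugs=False):
    """Hypertension if either pressure criterion is met or the subject is treated."""
    return systolic_mmhg >= 140 or diastolic_mmhg >= 90 or on_antihypertensive_drugs

print(has_t2dm(fasting_mmol_l=7.4), has_hypertension(132, 92))
```

A subject with normal readings who takes antihypertensive drugs is still classified as hypertensive under this rule.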

Risk factors
The potential risk factors used in this study to assess hypertension in patients with T2DM included: age, gender, ethnicity, career, smoking, drinking, exercise, diet habits, obesity, atherosclerotic vascular disease (ASCVD), kidney disease, eye diseases, psychosis, gallbladder disease, bronchitis and tuberculosis.
Basic information of participants: gender included "male" and "female"; career included "trader or service people", "agriculture workers", "factory workers", "soldier" and "others"; ethnic groups were divided into six categories: "Han", "Uygur", "Kazak", "Hui", "Mongolian" and "other nationalities". Living habits included smoking, drinking, exercise and eating habits. Drinking included the amount per occasion (g) and drinking frequency: "never", "occasionally", "regularly" and "daily"; exercise frequency was divided into four levels: "never", "occasionally", ">1 per week" and "daily"; smoking included the number of cigarettes smoked per day and smoking status: "non-smoker", "ever-smoker" and "current-smoker"; diet habits were divided into three categories: "meat based", "meat balanced" and "vegetarian based". In this study, a body mass index (BMI) > 28 kg/m² was identified as obesity. The diagnoses of gallbladder disease and kidney disease were based on the patient's description in conjunction with B-mode ultrasound examination. Gallbladder disease included cholecystitis, cholecystectomy and gallstones; kidney disease included diabetic nephropathy, renal failure, acute or chronic nephritis, hydronephrosis and renal calculus. The other baseline comorbidities considered in this study were determined from the health questionnaire, which asked whether the participant had previously been diagnosed by a doctor. The presence of eye disease was defined as a diagnosis of one or more of the following: retinal hemorrhage, papilledema and cataract. Coronary, cerebrovascular and peripheral vascular disease were collectively called ASCVD; participants were asked "have you ever had a heart attack, or have you had a stent placed or undergone bypass surgery?". ASCVD in this study included myocardial infarction, cerebral infarction, angina pectoris, stroke, cerebral hemorrhage and coronary heart disease. If a patient had one or more of the diseases in a category, the corresponding variable was coded as yes; otherwise it was coded as no.
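The coding of obesity and of the composite disease variables can be sketched as follows. The helper names are hypothetical; only the BMI cut-off and the list of ASCVD conditions are taken from the text.

```python
ASCVD_CONDITIONS = {
    "myocardial infarction", "cerebral infarction", "angina pectoris",
    "stroke", "cerebral hemorrhage", "coronary heart disease",
}

def is_obese(weight_kg, height_m):
    """Obesity is coded when BMI (kg/m^2) exceeds 28."""
    bmi = weight_kg / height_m ** 2
    return bmi > 28.0

def code_composite(reported_conditions, category_conditions):
    """Return 'yes' if at least one reported condition belongs to the category."""
    return "yes" if set(reported_conditions) & category_conditions else "no"

print(is_obese(85, 1.70), code_composite(["stroke", "cataract"], ASCVD_CONDITIONS))
```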

Statistical analysis
The national physical examination data set is large, contains many heterogeneous variables, and has many missing and abnormal values. Data pre-processing is therefore a very important step, because its quality directly affects the performance of the subsequent prediction models [17]. First, we deleted nearly 200 variables that were not relevant to this study. Second, we replaced outliers and null values: categorical variables were filled with the most frequent value, and continuous variables were filled with the mean value.
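The filling rule can be illustrated with a short sketch. It uses only the Python standard library rather than the Pandas and NumPy routines used in this study, and it assumes that outliers have already been flagged and set to `None`; the function name is hypothetical.

```python
from statistics import mean, mode

def impute_column(values, kind):
    """Fill None entries with the mode (categorical) or the mean (continuous)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if kind == "categorical" else mean(observed)
    return [fill if v is None else v for v in values]

drinking = impute_column(["never", None, "daily", "never"], "categorical")
weight = impute_column([60.0, None, 80.0], "continuous")
print(drinking, weight)
```

Mean and mode imputation are simple but shrink the variance of the filled variables, which is one reason the quality of this step matters for the later models.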
When comparing the baseline characteristics between the development group and the validation group, differences with a two-sided p value < 0.05 were deemed statistically significant. Categorical variables were presented as number (percentage). Continuous variables consistent with a normal distribution were presented as mean ± standard deviation; otherwise, the median and quartiles were used. The chi-square test was used to compare differences in categorical variables. Independent-sample t-tests were used to compare differences in normally distributed continuous variables, while the Wilcoxon test was used for non-normal continuous variables.
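As an illustration of the categorical comparison, the sketch below computes Pearson's chi-square statistic for a 2 × 2 table, without continuity correction, together with its p value for one degree of freedom. The counts are invented for the example and are not study data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p value (df = 1) for a 2 x 2 count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For df = 1 the upper tail of the chi-square distribution is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows are groups, columns are a binary characteristic
stat, p_value = chi_square_2x2([[10, 20], [20, 10]])
print(round(stat, 3), round(p_value, 4))
```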
Lasso regression was used to screen the risk factors. Lasso regression is a shrinkage method for linear regression models. It shrinks the estimated coefficients of uninformative variables to zero and retains the variables with non-zero coefficients. Lasso combines the advantages of variable selection (easy to interpret) and shrinkage (robust), which is particularly useful in large data sets requiring efficient and fast algorithms [18]. Analysis steps: Step 1: the 18 effective variables in the data set were entered into the lasso procedure, and the optimal penalty parameter λ was determined by 10-fold cross-validation.
Step 2: multivariable logistic regression analysis was used to build a prediction model incorporating the features selected by the LASSO regression model. The features were reported as odds ratios (OR) with 95% confidence intervals (CI) and p values. All statistical significance levels were two-sided. Variables with a p value < 0.05 were included in the model [19,20]. Finally, the prediction model was evaluated in terms of discrimination and calibration.
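The shrinkage in Step 1 can be illustrated with a minimal coordinate-descent sketch for the linear-regression lasso, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁. This is not the glmnet implementation used in this study: it omits the intercept, assumes a centered outcome, takes λ as given instead of tuning it by cross-validation, and uses invented toy data.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the lasso (no intercept, y assumed centered)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except j (partial residual)
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy data: y depends strongly on feature 0 and only weakly on feature 1
X = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [-2.05, 1.95, -1.95, 2.05]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)
```

In this toy example the weak feature receives a coefficient of exactly zero and the strong feature is shrunk by λ, which is the selection behaviour the screening step relies on.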
The discrimination of the prediction model refers to its ability to distinguish people with hypertension from people without hypertension. The area under the ROC curve (AUC) was used to evaluate whether the predictions of the model met the requirements [21]. The AUC usually lies between 0.5 and 1.0; the closer it is to 1, the stronger the discriminative ability of the prediction model [22]. The Hosmer-Lemeshow goodness-of-fit test was used to evaluate the calibration of the prediction model. The smaller the chi-square statistic, the larger the corresponding p value and the better the calibration of the prediction model. A statistically significant test result (p < 0.05) indicates a difference between the values predicted by the model and the observed values, that is, poor calibration [23].
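The AUC has a direct interpretation: it is the probability that a randomly chosen person with hypertension receives a higher predicted score than a randomly chosen person without it, with ties counted as one half. The sketch below computes it from this definition on invented toy data; it is a didactic version and not the ROCR or pROC routine.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

The pairwise loop is quadratic in the sample size, so rank-based formulas are preferable for data sets of the size used in this study.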
The open-source software Python, version 3.7.2 (https://www.python.org), was used for data pre-processing: the Pandas and NumPy libraries were used for interpolation and processing of outliers, and the Matplotlib library was used for data description and outlier detection. The open-source software R, version 3.6.1 (http://www.r-project.org), was used for data modeling: the caret package was used to divide the data randomly into development and validation groups, LASSO analysis was performed with the glmnet package, the rms package was used to establish the nomogram model, the nomogramEx package was used to extract the score of each variable in the nomogram, the ROC curve was plotted using the ROCR and pROC packages, and the calibration curves were drawn with the rms package.

Patient demographics
A total of 30,507 cases were included in the study, with 21,355 in the development set and 9,152 in the validation set. The demographic and disease data of the two groups are given in Table 1. The comparison of baseline data showed no significant difference between the development set and the validation set.

Characteristics selection
Lasso regression yielded 9 features with non-zero coefficients, reducing the 18 candidate variables to 9 (Fig 1A and 1B). These features were age, gender, drinking frequency, exercise frequency, smoking situation, obesity, ASCVD, kidney diseases and gallbladder diseases (Table 2).

Independent prognostic factors in the developing set
The 9 variables obtained by lasso regression were included in the multivariable logistic regression model, and the regression results are shown in Table 2. The analysis showed that age (OR 1.06), gender (female: OR 0.62), drinking frequency (occasionally: OR 1.29, regularly: OR 6.58, daily: OR 6.94), exercise frequency (occasionally: OR 0.60, >1 per week: OR 0.60, daily: OR 0.59), smoking situation (ever: OR 0.92, current: OR 2.01), obesity (yes: OR 2.23), ASCVD (yes: OR 3.83), kidney diseases (yes: OR 1.32) and tuberculosis (yes: OR 1.27) were independent determinants of hypertension (Table 2). In addition, there was no evidence of multicollinearity among the covariates included in the model: the maximum variance inflation factor (VIF) was 1.245, and the lowest eigenvalue was 1.003.
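The odds ratios above are exponentiated logistic regression coefficients, which makes them easy to rescale and combine. The sketch below shows the standard Wald formulas; the standard error of 0.04 is a hypothetical value chosen for illustration, and the combined odds ratio assumes a model without interaction terms.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_p_value(beta, se):
    """Two-sided p value of the Wald z test for beta = 0."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

def odds_ratio_for_increment(or_per_unit, units):
    """Rescale a per-unit OR to a larger increment (log-linear effect assumed)."""
    return or_per_unit ** units

# Obesity: OR 2.23 from the text; SE = 0.04 is hypothetical
obesity_or, lower, upper = odds_ratio_with_ci(math.log(2.23), 0.04)
per_decade = odds_ratio_for_increment(1.06, 10)   # age OR per 10 years
obese_with_ascvd = 2.23 * 3.83                    # versus neither condition
print(round(per_decade, 2))  # → 1.79
```

An OR of 1.06 per year of age therefore corresponds to roughly 1.79 per decade, which explains why age dominates the nomogram despite its small per-year effect.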

Nomogram of hypertension
In the multivariable logistic regression, the ORs of the categorical variables kidney diseases and tuberculosis were close to 1, indicating that these variables had little influence, so we did not include them in the model.
Finally, we obtained the Urumqi Hypertension Nomogram Model consisting of 7 factors (Fig 2). The longer the line of a risk factor, the greater its impact on the probability of hypertension. Each level of each variable is assigned a score. The sum of these "points" is the "total points", and the "diagnostic possibility" corresponding to the "total points" is the probability of hypertension predicted by the nomogram. As an example of nomogram usage, a subject was randomly selected from the sample: a 40-year-old man with diabetes who drinks regularly, exercises occasionally, does not smoke, and has ASCVD but no obesity. His total score was 109.66, and his probability of hypertension was close to 80%.
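The way a nomogram turns regression coefficients into points can be sketched as follows. The coefficients are the logarithms of the ORs reported for the 9-variable model, whereas the intercept and the age range are not reported in the text and are assumed here, so the sketch will not reproduce the published score of 109.66 or the probability of about 80%. It shows the mechanics only: the variable with the widest effect range spans 0 to 100 points, and total points map back to a probability through the logistic function.

```python
import math

LOG_OR = {  # log odds ratios taken from the reported ORs; reference level = 0
    "gender": {"male": 0.0, "female": math.log(0.62)},
    "drinking": {"never": 0.0, "occasionally": math.log(1.29),
                 "regularly": math.log(6.58), "daily": math.log(6.94)},
    "exercise": {"never": 0.0, "occasionally": math.log(0.60),
                 ">1 per week": math.log(0.60), "daily": math.log(0.59)},
    "smoking": {"non-smoker": 0.0, "ever-smoker": math.log(0.92),
                "current-smoker": math.log(2.01)},
    "obesity": {"no": 0.0, "yes": math.log(2.23)},
    "ascvd": {"no": 0.0, "yes": math.log(3.83)},
}
AGE_LOG_OR = math.log(1.06)   # per year of age
AGE_MIN, AGE_MAX = 20, 90     # assumed axis range, not reported in the text
INTERCEPT = -4.0              # hypothetical, not reported in the text

def contributions(person):
    """Contribution of each variable to the linear predictor."""
    result = {"age": AGE_LOG_OR * person["age"]}
    for name, levels in LOG_OR.items():
        result[name] = levels[person[name]]
    return result

def min_contributions():
    """Smallest possible contribution of each variable (the 0-point position)."""
    result = {"age": AGE_LOG_OR * AGE_MIN}
    for name, levels in LOG_OR.items():
        result[name] = min(levels.values())
    return result

def max_range():
    """Widest effect range across variables; this range is worth 100 points."""
    ranges = [AGE_LOG_OR * (AGE_MAX - AGE_MIN)]
    ranges += [max(l.values()) - min(l.values()) for l in LOG_OR.values()]
    return max(ranges)

def total_points(person):
    mins, scale = min_contributions(), 100.0 / max_range()
    return sum((v - mins[k]) * scale for k, v in contributions(person).items())

def probability_from_points(points):
    linear = INTERCEPT + sum(min_contributions().values()) + points * max_range() / 100.0
    return 1.0 / (1.0 + math.exp(-linear))

person = {"age": 40, "gender": "male", "drinking": "regularly",
          "exercise": "occasionally", "smoking": "non-smoker",
          "obesity": "no", "ascvd": "yes"}
points = total_points(person)
print(round(points, 2), round(probability_from_points(points), 3))
```

Because points are a linear rescaling of the linear predictor, reading the probability from the total-points axis gives the same result as evaluating the logistic model directly.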

Validation of the nomogram
The validation of the model was based on discrimination and calibration. We plotted the ROC curves and calculated the AUC for the development and validation groups separately. The AUC was 0.814 in both the development group and the validation group (Fig 3A and 3B), indicating that the nomogram prediction model had good discriminative ability. The calibration of the prediction model was evaluated by the Hosmer-Lemeshow goodness-of-fit test, and the calibration curves (Fig 4A and 4B) were obtained. A p value > 0.05 indicates good calibration. The p value was 0.625 for the development group and 0.445 for the validation group, both greater than 0.05, indicating that the model was well calibrated.
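The Hosmer-Lemeshow statistic compares observed and expected numbers of cases within groups of subjects ordered by predicted probability. The sketch below uses equal-sized groups and the closed-form upper tail of the chi-square distribution, which is valid for even degrees of freedom such as the 8 obtained with 10 groups. The data are synthetic and perfectly calibrated by construction; the function names are hypothetical and the grouping may differ in detail from the routine used in this study.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of the chi-square distribution for even df."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(labels, probabilities, groups=10):
    """Return the Hosmer-Lemeshow statistic and its p value (df = groups - 2)."""
    pairs = sorted(zip(probabilities, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        observed = sum(label for _, label in chunk)
        expected = sum(prob for prob, _ in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / len(chunk)))
    return stat, chi2_sf_even_df(stat, groups - 2)

# Synthetic data: 10 groups of 20 subjects whose observed event counts
# equal the expected counts exactly (perfect calibration)
labels, probabilities = [], []
for g in range(10):
    events = 2 * g + 1
    labels += [1] * events + [0] * (20 - events)
    probabilities += [events / 20] * 20

hl_stat, hl_p = hosmer_lemeshow(labels, probabilities)
print(hl_stat, hl_p)
```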

Discussion
Diabetes and high blood pressure have become the main chronic diseases endangering the health of Chinese adults [23-25]. Hypertension and diabetes are the main manifestations of metabolic syndrome. The results of this study show that the prevalence of hypertension in patients with T2DM in Urumqi, Xinjiang, China is 43.7%, which is 1.49 times that in people without diabetes. We established a nomogram to predict the probability of hypertension. Nomograms offer user-friendly digital interfaces, increased accuracy and more easily understood prognoses, and are widely used as prognostic devices in oncology and

PLOS ONE
medicine. To our knowledge, this is the first study to fit a nomogram model to the determinants of hypertension in patients with T2DM. Studies have shown that diabetes can accelerate hardening of the aorta and reduce its compliance and elastic expansion, thereby increasing systolic pressure; diabetic peripheral nerve damage can cause microvascular contraction dysfunction and change diastolic pressure [26,27]. This may be because hypertension and diabetes share a common root in insulin resistance [28,29]. In the early stage of insulin resistance, hyperinsulinemia can promote sodium reabsorption in the renal tubules, increase sympathetic nerve activity, accelerate heart rate, increase vascular resistance, promote proliferation of arteriolar smooth muscle leading to lumen stenosis, and raise intracellular calcium concentration and sensitivity to pressor substances, all of which increase blood pressure. T2DM and hypertension are therefore viewed as homologous diseases that share the same etiology and aggravate each other. In recent years, epidemiological studies have clarified the role and pathogenesis of microcirculation disorders in diabetic patients [30], which affect multiple organs and body systems, including the heart, brain, blood vessels, kidneys, eyes and nerves [31,32].
In addition, Rahul Aggarwal reported that the combination of T2DM and hypertension is more likely in people who are older, male, have a higher BMI and are less physically active [33]; a reasonable diet, exercise, weight control and control of central obesity are useful in preventing hypertension [34-37]. Previous studies have also provided evidence that alcohol aggravates organ damage in diabetic patients [38]. Unlike healthy people, diabetic patients should abstain from alcohol. We also found that smoking is a risk factor for hypertension, and the harm of nicotine to the human endocrine and vascular systems has already been demonstrated [39]. Therefore, the occurrence and development of hypertension can be controlled by quitting drinking and smoking, increasing exercise and losing weight. Our study found that the probability of hypertension is related to the degree of smoking, drinking and exercising. This suggests that even for current smokers, quitting smoking is beneficial to disease control. The higher the frequency of drinking, the higher the probability of hypertension, suggesting that even for those who cannot abstain completely, reducing the frequency of drinking may help control the disease.
However, some known risk factors for hypertension, such as an unhealthy diet [40], were not associated with hypertension in this study. A possible reason is that, compared with the general population, diabetic patients pay more attention to healthy living habits, perhaps because they receive more medical attention and more information about health improvement measures; for example, diabetic patients tend to control blood sugar by reducing their intake of high-fat and high-oil foods [41].

Diabetes and hypertension are considered to be the main risk factors for cardiovascular disease and stroke [42,43], and the cardiovascular mortality rate in patients with both T2DM and hypertension is higher than that in people with either disease alone [44]. In view of the increasingly severe trend of diabetes and hypertension in recent years, we developed a nomogram to identify hypertension in patients with T2DM. This nomogram is very intuitive, so diabetic patients can easily calculate their probability of hypertension without the help of nursing staff. This research has several advantages. First, as far as we know, this is the first model developed and evaluated to estimate the probability of hypertension in patients with T2DM. Second, we used a national database to identify a large and representative sample from the national health examination. Third, the variables we used were all from the questionnaire, and there was no need to measure any indicators (such as blood pressure, blood routine, height, weight, etc.). The variables were very easy to obtain, and this method is especially effective in areas with poor medical care and little attention to blood pressure measurement. Although blood pressure is very easy to measure, there are still some regions where regular blood pressure measurement is not universal. Therefore, our model can be used as an effective auxiliary diagnostic tool for hypertension. Fourth, it includes four modifiable variables: smoking, drinking, exercise habits and obesity, which can encourage people to prevent disease through a healthy lifestyle, as studies have shown that a healthy lifestyle is an economical and effective means of preventing chronic diseases [45,46].
We did not include family history information in our model. Many people suffer from undiagnosed diabetes or hypertension [47], given the inadequacy of China's health system in the last century. As a result, many participants were uncertain about the medical history of the previous generation, and family history parameters did not improve risk prediction, so we did not collect data on family history.
This study also has some limitations. First, our study is cross-sectional, so it cannot determine cause and effect or characterize the interaction between diabetes and hypertension. Second, we were unable to collect data on other potential risk factors, such as dietary intake of sodium and potassium. Third, the data used in this study are the physical examination data of Urumqi, China, which may limit the extrapolation of the results. It is generally believed that there are some differences in the pathophysiology of diseases between Asians and Caucasians, and there are similar differences between Asian countries. Fourth, previous studies have confirmed that education is also an important determinant of hypertension. However, our physical examination data did not include the education level of participants. Finally, this cross-sectional study cannot predict future disease. However, the model can estimate the probability that an individual has the disease, and the risk factors can remind people to improve their living habits and control the occurrence and development of the disease.

Conclusion
In view of the increasingly serious harm caused by diabetes mellitus combined with hypertension, this study used questionnaire variables from a large-scale physical examination population to establish a screening model, identified the risk factors for the disease, and obtained a model with high accuracy. Based on these results, we developed a nomogram that can easily and quickly calculate the probability that an individual has hypertension. This method is especially useful in areas with poor medical conditions, and it can also encourage people to live a healthy life to prevent the occurrence of disease. However, further studies are needed to develop models that more precisely predict various comorbidities (including hypertension) and to support preventive guidelines and interventions for patients with T2DM.