Development and validation of a prediction model estimating the 10-year risk for type 2 diabetes in China

Purpose To derive and validate a concise prediction model estimating the 10-year risk for type 2 diabetes (T2DM) in China. Methods A total of 11494 subjects from the China Health and Nutrition Survey, recorded from 2004 to 2015, were screened and 6023 participants were enrolled in this study. Four logistic models were developed using the derivation cohort. Calibration and discrimination were assessed in the validation cohort. Results In the derivation cohort, 257 patients with newly diagnosed diabetes were identified from a total of 4498 cases. In the validation cohort, 92 patients were identified from a total of 1525 cases. All four models performed well in both calibration and discrimination. The AUCs in the derivation cohort for models A, B, C and D were 0.788 (0.761–0.816), 0.807 (0.780–0.834), 0.905 (0.879–0.932) and 0.882 (0.853–0.912), respectively. The Youden indices for models A, B, C and D were 1.46, 1.48, 1.67 and 1.65, respectively. Model C showed the highest sensitivity and model D showed the highest specificity. Conclusion Models A and B are non-invasive and can be used to identify high-risk patients in broad screening. Models C and D may be used to provide more accurate assessments of diabetes risk. Furthermore, model C showed the best performance for predicting T2DM risk and identifying individuals who need interventions, improvement of the current approach and additional follow-up.



Introduction
The global prevalence of adult diabetes is increasing and has become a major public health problem. In 2017, the International Diabetes Federation (IDF) estimated the global diabetes prevalence as 425 million adults aged 20-79 years (8.8%), and over 224 million adults were found to be living with undiagnosed diabetes. A higher proportion of undiagnosed diabetes cases was found in low- and middle-income countries. Moreover, over one third (36.5%) of deaths attributed to diabetes occurred in people under the age of 60 years [1]. In China, the true prevalence of undiagnosed diabetes may be underestimated. A total of 9.7%-10.9% of the population was diagnosed with diabetes and 35.7%-60.7% of cases were undiagnosed [2][3][4]. Type 2 diabetes (T2DM) and its associated complications impose a significant economic burden on patients and are a major public health challenge facing China [5,6]. Thus, the prevention and early management of diabetes and its complications are necessary to reduce this burden for the general Chinese population. Risk prediction models have considerable potential to help diagnose a patient. During the past 20 years, dozens of prediction models for diabetes have been developed. However, none of these models has been routinely used in China thus far. The clinical utility of imperfect prediction models has been a concern. Risk scores derived from Caucasian populations may not be suitable for Chinese populations, as there is significant geographical and biological variation in China. Many types of T2DM risk prediction scores and models have been generated in China [7][8][9][10][11][12][13][14][15][16], but they all face several limitations. Most do not account for lifestyle variations, such as physical activity, dietary behavior or sleep duration. Others are based on invasive and costly data such as blood tests and radiology, or on a small and inappropriately selected cohort. Others are based on short-term follow-up or lack transparent reporting of the steps used to derive the model.
The aims of this article were to derive large population-based, innovative and simple models for screening high-risk non-diabetic individuals in China using available data. We also assessed the clinical utility of the four algorithms using decision curve analysis. In addition, this study compared the performance of the developed models to evaluate their effectiveness.

Methods
This cohort study complies with the Prognosis Research Strategy (PROGRESS) framework. The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement was followed in this study.

Data and participants
Data were downloaded from the China Health and Nutrition Survey on 2019/1/9 (https://www.cpc.unc.edu/projects/china/). The survey was performed within a period of 7 days using a multi-stage, random clustering method, and selected samples from 15 provinces and cities in China. A total of 11494 subjects aged 20-80 years were observed from 2004 to 2015.
A flow diagram of the study is shown in Fig 1. The exclusion criteria covered individuals (1) with missing sociodemographic and clinical data, (2) with prevalent T2DM or use of anti-diabetic drug treatment at baseline, (3) who were pregnant, (4) with a history of cancer or cardiovascular disease and (5) who were <20 or >80 years of age. A total of 5471 participants were excluded and 6023 subjects were included in the final analysis.
A total of 8 provinces were selected and about three-fourths of participants (N = 4498) were assigned to a training data set. The remaining 2 provinces were included in the external validation set (N = 1525) using a random sampling method.
[...] density lipoprotein (LDL-c), insulin) were measured. Previous history and demographic characteristics were obtained through a standard questionnaire. Detailed household dietary intake information for three consecutive days was collected. History of CVD was defined as previous ischemic heart disease and/or cerebrovascular incidents. BMI was calculated as weight (kg)/height² (m²). Height was measured to the nearest 0.5 cm and weight was measured to the nearest 0.1 kg. Blood pressure was measured in the right arm three times using a mercury sphygmomanometer and the readings were averaged. After an overnight fast of at least 10 h, blood samples were collected in the morning and processed within 2 h.

Sample size
The rule of thumb for fitting multivariable models suggests at least 10 events per variable (EPV) to avoid overfitting. For models A, B, C and D, we finally chose 8, 17, 20 and 5 predictor variables, respectively. The minimum numbers of events required for models A, B, C and D were therefore 80, 170, 200 and 50, respectively. We found that our sample size completely met this rule.
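The arithmetic behind the EPV rule can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that all 257 derivation-cohort events are available to every model, which the text does not confirm for the models fitted on subsets with complete data.

```python
def min_events_required(n_predictors: int, epv_threshold: int = 10) -> int:
    """Minimum number of outcome events implied by the events-per-variable rule."""
    return n_predictors * epv_threshold


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Observed number of outcome events per candidate predictor."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


# Predictor counts reported in the text for models A-D.
PREDICTORS = {"A": 8, "B": 17, "C": 20, "D": 5}
# Assumption: every model can use all 257 events of the derivation cohort.
N_EVENTS = 257

epv_table = {
    name: {
        "required_events": min_events_required(k),
        "epv": round(events_per_variable(N_EVENTS, k), 2),
        "meets_rule": events_per_variable(N_EVENTS, k) >= 10,
    }
    for name, k in PREDICTORS.items()
}

if __name__ == "__main__":
    for name, row in epv_table.items():
        print(name, row)
```

Under that assumption, the most demanding model (C, with 20 predictors) needs 200 events, which is below the 257 events observed.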

Data preparation
Data preparation included processing missing data, selecting variables, and defining and balancing the training and validation data sets. Individuals with missing outcomes, sociodemographic or clinical values were excluded from our analysis. Univariable and multivariable binary logistic regression as well as least absolute shrinkage and selection operator (LASSO) regression were used for variable selection. Significance for univariable and multivariable binary logistic regression was set at P < 0.05. From this, the relationship between each variable and T2DM could be identified. LASSO regression selected potential predictors using the minimum λ criterion, with the area under the receiver operating characteristic curve (AUC) plotted versus log(λ). In addition, the penalty function in the LASSO regression helped avoid overfitting and aided in the development of a robust model. A simple random sampling method using a 3:1 ratio was used to balance the training and validation data sets.
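To illustrate how the LASSO penalty removes predictors, the sketch below fits an L1-penalized logistic regression by proximal gradient descent in plain Python. It is an assumption-laden toy, not the authors' R workflow: the function names, the step size, the iteration count and the eight-row data set are all hypothetical, and the selection of λ by AUC versus log(λ) is not reproduced.

```python
import math


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero; anything within +/- threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    X: list of feature rows (ideally standardized), y: list of 0/1 outcomes,
    lam: penalty strength (lambda). The intercept is left unpenalized.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus outcome) drive the gradient
        # of the mean negative log-likelihood.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - label
            for row, label in zip(X, y)
        ]
        grad0 = sum(residuals) / n
        grads = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(p)]
        intercept -= step * grad0
        # Gradient step, then soft-thresholding: the proximal operator of the L1 penalty.
        coefs = [soft_threshold(c - step * g, step * lam) for c, g in zip(coefs, grads)]
    return intercept, coefs


# Hypothetical toy data: the first feature tracks the outcome, the second is noise.
X_toy = [[-1.5, 0.3], [-1.0, -0.4], [-0.5, 0.5], [-0.2, -0.2],
         [0.2, 0.1], [0.5, -0.5], [1.0, 0.4], [1.5, -0.3]]
y_toy = [0, 0, 0, 1, 0, 1, 1, 1]

_, coefs_weak_penalty = fit_lasso_logistic(X_toy, y_toy, lam=0.01)
_, coefs_strong_penalty = fit_lasso_logistic(X_toy, y_toy, lam=2.0)

if __name__ == "__main__":
    print("lambda = 0.01:", coefs_weak_penalty)
    print("lambda = 2.0: ", coefs_strong_penalty)
```

With a large enough λ every coefficient is held at exactly zero, while a small λ leaves the informative feature in the model; intermediate values of λ trade off the two, which is why the penalty both selects variables and limits overfitting.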

Model development and validation
The 6023 participants were divided into a training set (N = 4498) and a validation set (N = 1525) using a simple random sampling method with a 3:1 ratio. Data in the training set were used to develop models for undiagnosed diabetes. Univariable and multivariable binary logistic regression and LASSO regression were applied to filter risk factors. Nomograms were constructed using the rms package in R software. To achieve an unbiased estimate for the models, internal validation was performed in the training set using a bootstrap sampling method. Then, external validation was performed using the AUC in the validation set. Differences between AUCs were compared using the DeLong method. Youden's index was used to identify the best cut-off value for undiagnosed diabetes. The accuracy of these models was calculated. Clinical usefulness was evaluated using net benefit. Decision curves of the four models were plotted using the rmda package in R software (version 3.6.0, http://www.r-project.org).
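The two discrimination steps named above, the AUC and the Youden-based cut-off, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the eight predicted risks are hypothetical, the AUC is computed by the rank (Mann-Whitney) formulation rather than with pROC, and the DeLong comparison is not shown.

```python
def auc_rank(scores, labels):
    """AUC as the probability that a random case outranks a random non-case.

    Ties between a case and a non-case count as 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing J = sens + spec - 1.

    A subject is classified as high risk when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical predicted risks and observed outcomes (1 = incident T2DM).
toy_scores = [0.02, 0.05, 0.08, 0.10, 0.15, 0.30, 0.40, 0.60]
toy_labels = [0, 0, 0, 1, 0, 1, 0, 1]

toy_auc = auc_rank(toy_scores, toy_labels)
toy_cutoff, toy_j, toy_sens, toy_spec = youden_cutoff(toy_scores, toy_labels)

if __name__ == "__main__":
    print("AUC:", toy_auc)
    print("cut-off:", toy_cutoff, "J:", toy_j, "sens:", toy_sens, "spec:", toy_spec)
```

Under this definition J lies between -1 and 1, and the sum of sensitivity and specificity equals J + 1.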

Statistical analysis
Continuous variables were described as median (25th-75th percentile) or mean ± SD. Categorical data were presented as number (percentage). Differences between the model derivation cohort and the model validation cohort were compared using Student's t tests for continuous data and Chi-squared tests for categorical variables. The Mann-Whitney U and Kruskal-Wallis tests were applied for variables with skewed distributions. The predictive performance of the constructed models was evaluated using receiver operating characteristic (ROC) curves and the AUC value, together with the Youden index, accuracy, precision, sensitivity, specificity, optimal cutoff value, positive predictive value (PPV), negative predictive value (NPV), true positive rate (TPR), false positive rate (FPR), false negative rate (FNR), true negative rate (TNR) and false discovery rate (FDR). For each variable, one category was chosen as the control and odds ratios (ORs) and 95% confidence intervals (CIs) were calculated for the other categories. All statistical analyses were performed using R software version 3.6.0 (http://www.r-project.org). R packages aiding in these analyses included rms, rmda, pROC, shiny and plot. P < 0.05 was considered statistically significant for all tests.
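Most of the listed performance measures derive from a single 2x2 table at the chosen cut-off. The sketch below shows those standard definitions; the function names and the counts are hypothetical and are not taken from the study.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, fn, tn):
    """Standard measures from true/false positive and negative counts."""
    sensitivity = _ratio(tp, tp + fn)   # also the true positive rate (TPR)
    specificity = _ratio(tn, tn + fp)   # also the true negative rate (TNR)
    return {
        "sensitivity_TPR": sensitivity,
        "specificity_TNR": specificity,
        "PPV_precision": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "FPR": _ratio(fp, fp + tn),     # 1 - specificity
        "FNR": _ratio(fn, fn + tp),     # 1 - sensitivity
        "FDR": _ratio(fp, fp + tp),     # 1 - PPV
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "youden_J": sensitivity + specificity - 1,
    }


# Hypothetical counts for illustration only.
toy_metrics = classification_metrics(tp=40, fp=10, fn=10, tn=140)

if __name__ == "__main__":
    for name, value in toy_metrics.items():
        print(f"{name}: {value:.3f}")
```

Several of the measures are complements of one another (TPR and FNR, TNR and FPR, PPV and FDR), so they carry the same information viewed from opposite sides.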

Baseline characteristics of participants
A total of 5471 cases without a complete record were excluded from this analysis. As a result, 6023 subjects met the inclusion criteria and had data from 10 or more years of follow-up visits. Three quarters of this group (n = 4498) were randomly allocated to the derivation cohort and the remaining cases (n = 1525) were allocated to the validation cohort (Fig 1). Table 1 summarizes the baseline and medical characteristics of the study subjects included in both the derivation and validation sets. The mean age of individuals in the derivation cohort was 42.0 ± 18.8 years and a total of 2092 (46.5%) were male. In total, 4498 (67%) cases contained complete information for model A variables, 3950 (59%) for model B, 2617 (39%) for model C and 3241 (48%) for model D. A total of 257 newly diagnosed diabetes cases from 2004 to 2015 accounted for 5.7% of the derivation cohort. The development and validation cohorts showed similar sociodemographic, physical examination and laboratory characteristics. For the variables of interest in the derivation cohort, the average triceps skin fold was 15.0 ± 8.1 cm, average sleep duration was 8.15 ± 1.32 hours, median calorie consumption was 2213.2 kcal, median carbohydrate consumption was 316.4 g, median fat consumption was 68.6 g and median protein consumption was 66.7 g. More details of the study participants are shown in Table 1.
All 8 features were retained as potential predictors in the cohort for model A (Fig 2A and 2B). A total of 21 features were reduced to 17 potential predictors in the cohort for model B (Fig 2C and 2D). A total of 27 features were reduced to 20 potential predictors in the cohort for model C (Fig 2E and 2F). A total of 6 features were reduced to 5 potential predictors in the cohort for model D (Fig 2H and 2K). We finally chose 8, 17, 20 and 5 predictors from 8, 21, 27 and 6 primary variables for developing models A, B, C and D, respectively. In addition, no features with zero coefficients were observed in the LASSO logistic regression model. The coefficients of all features are listed in S2 Table.

Development of the logistic individualized prediction model
As shown in Fig 3A-3D, the nomogram of logistic model A was a quantitative and convenient tool that predicts the risk of T2DM using age, gender, ethnicity, hypertension, smoking, alcohol intake, waist circumference and BMI in the training cohort (S2 Table).
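A nomogram is a graphical form of the underlying logistic equation: each predictor contributes to a linear predictor, which is then mapped to a probability. The sketch below shows that calculation for the eight model A predictors. The intercept, every coefficient, the variable coding and the example subject are hypothetical placeholders; the fitted values are those in S2 Table and are not reproduced here.

```python
import math

# Hypothetical values on the log-odds scale, for illustration only.
# They are NOT the fitted coefficients reported in S2 Table.
INTERCEPT = -7.0
COEFFICIENTS = {
    "age_years": 0.04,
    "male": 0.15,             # 1 = male, 0 = female
    "minority_ethnicity": 0.10,
    "hypertension": 0.55,     # 1 = yes, 0 = no
    "current_smoker": 0.10,
    "alcohol_intake": 0.05,
    "waist_cm": 0.03,
    "bmi": 0.05,
}


def linear_predictor(subject: dict) -> float:
    """Intercept plus the sum of coefficient x predictor value (log-odds)."""
    return INTERCEPT + sum(beta * subject[name] for name, beta in COEFFICIENTS.items())


def predicted_risk(subject: dict) -> float:
    """Convert the log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(subject)))


# Hypothetical subject.
example_subject = {
    "age_years": 55, "male": 1, "minority_ethnicity": 0, "hypertension": 1,
    "current_smoker": 0, "alcohol_intake": 0, "waist_cm": 90, "bmi": 26,
}

if __name__ == "__main__":
    print("log-odds:", round(linear_predictor(example_subject), 3))
    print("predicted risk:", round(predicted_risk(example_subject), 3))
```

Reading points off a nomogram and summing them is equivalent to evaluating this linear predictor; the total-points axis plays the role of the logistic transformation.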

Model calibration
The mean absolute error of models A and B was 0.006 and the mean absolute error of models C and D was 0.004. Internal bootstrap validation showed that the curve derived from the model A nomogram was close to the bias-corrected curve and the ideal curve at probabilities between 0 and 0.20. When the probability was higher than 0.20, model A may underestimate the probability of undiagnosed diabetes (Fig 5A). Model B was similar, with underestimation also starting at 0.20 (Fig 5B). The curve derived from the model C nomogram performed well on all scales. Model D resembled model C (Fig 5C and 5D). Models C and D fitted well and showed good calibration.
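The idea of a calibration error can be sketched with a simple binned comparison of predicted and observed risk. This is a simplification under stated assumptions: the function names and data are hypothetical, and grouping subjects into equal-size bins is not the bootstrap, bias-corrected procedure used to produce the curves described above.

```python
def binned_calibration(predicted, observed, n_bins=5):
    """Group subjects by predicted risk and compare with the observed event rate.

    Returns a list of (mean predicted risk, observed event rate, bin size).
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        event_rate = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_pred, event_rate, len(chunk)))
    return rows


def mean_absolute_calibration_error(rows):
    """Size-weighted mean of |mean predicted risk - observed event rate|."""
    total = sum(size for _, _, size in rows)
    return sum(abs(pred - rate) * size for pred, rate, size in rows) / total


# Hypothetical data: predictions that match the observed event rates exactly.
well_calibrated_pred = [0.0] * 4 + [0.5] * 4 + [1.0] * 4
well_calibrated_obs = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
mae_good = mean_absolute_calibration_error(
    binned_calibration(well_calibrated_pred, well_calibrated_obs, n_bins=3)
)

# Hypothetical data: a model predicting 0.2 for everyone when half have the event.
under_pred = [0.2] * 10
under_obs = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
mae_poor = mean_absolute_calibration_error(
    binned_calibration(under_pred, under_obs, n_bins=1)
)

if __name__ == "__main__":
    print("well calibrated:", mae_good)
    print("underestimating:", mae_poor)
```

A curve lying below the ideal line in a bin corresponds to an observed rate lower than predicted, and one lying above it corresponds to underestimation of risk, as in the second example.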

Decision curve analysis
To compare the clinical usefulness of the models, decision curve analysis was performed as shown in Fig 6. The y axis showed the standardized net benefit and the x axis showed the threshold probability for diabetes. Each line represented the clinical usefulness of one model. In our analysis, models A, B, C and D all demonstrated greater net benefit than no treatment. Models C and D exhibited the best performance, and models A and B showed slightly lower net benefit than models C and D. Compared with the strategies in which either no patients or all patients received intervention, models C and D showed higher net benefit. When the absolute risk threshold was approximately 60%, these interventions were shown to be useful.
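The quantity plotted in a decision curve can be sketched as follows, using the usual definition of net benefit at a threshold probability. The function names and the ten-subject data set are hypothetical, and dividing by prevalence to obtain a standardized net benefit is an assumption about how the y axis of Fig 6 was scaled.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of intervening on subjects with predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(observed)
    tp = sum(1 for p, o in zip(predicted, observed) if p >= threshold and o == 1)
    fp = sum(1 for p, o in zip(predicted, observed) if p >= threshold and o == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(observed, threshold):
    """Net benefit of intervening on everyone; intervening on no one is always 0."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def standardized_net_benefit(predicted, observed, threshold):
    """Net benefit divided by prevalence, so that 1 is the maximum achievable."""
    prevalence = sum(observed) / len(observed)
    return net_benefit(predicted, observed, threshold) / prevalence


# Hypothetical predicted risks and outcomes.
toy_pred = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.90, 0.15, 0.25]
toy_obs = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]

nb_model = net_benefit(toy_pred, toy_obs, threshold=0.5)
nb_all = net_benefit_treat_all(toy_obs, threshold=0.5)
snb_model = standardized_net_benefit(toy_pred, toy_obs, threshold=0.5)

if __name__ == "__main__":
    print("model:", nb_model, "treat all:", nb_all, "treat none:", 0.0)
    print("standardized:", snb_model)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies, intervening on everyone and intervening on no one.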

Discussion
In this study, we developed and validated four models to predict the 10-year risk of T2DM in Chinese residents. Both internal and external validation were performed and the results showed good performance in discrimination and calibration for the four models. Net benefit curves also demonstrated the clinical benefit of these nomograms. Results showed that although model B did not outperform model A significantly, the other two models showed considerable improvement (S3 and S4 Tables), with the best overall performance shown by model C. Model C showed the best discrimination and highest sensitivity. Even though model D, which only included blood test results, showed better discrimination than models A and B, it had the lowest sensitivity and highest specificity (S3 Table). Model C, based on comprehensive details of lifestyle and clinical results, could be used to provide multi-aspect management for pre-diabetic or diabetic patients and provide better information regarding the potential effects of risk factors. For those who have been diagnosed with diabetes, using nomograms for self-monitoring could delay the occurrence and progression of complications. These models improve on current guidance, which uses fixed thresholds for fasting blood glucose or HbA1c levels as diagnostic criteria for diabetes, because they capture subjects at high risk for T2DM and not only individuals already diagnosed with T2DM. Model C showed the highest clinical net benefit in the DCA curve in Fig 6, meaning that model C had the greatest clinical utility. In addition, models A and B, with basic and non-invasive prediction factors, could be used to identify high-risk cases that require a test for fasting insulin, blood glucose or HbA1c levels. Patients could complete the model A and B assessments independently in their homes. Model A was simpler than model B and included 8 basic common risk factors, such as age and gender.
Although model B was not superior to model A in terms of ROC curve performance, predictors such as sleep duration and physical activity were more important and controllable. Model B included more comprehensive lifestyle details and could be used for self-management and medical treatment advice. After patients at high risk are identified using model B, they need to complete relevant tests including blood lipid, insulin, FBG and HbA1c level analyses. Models C and D may be used to provide more accurate assessments of diabetes risk and individualized blood glucose management. Overall, the use of these models is more accurate in predicting diabetes risk and is also suitable for extensive diabetes screening and self-monitoring.
In China, many T2DM risk prediction scores or models have been generated [7][8][9][10][11][12][13][14][15][16]. Common risk factors included age, sex, ethnicity, waist circumference, BMI and hypertension. Although these unmodifiable risk factors played roles in T2DM, changes in modifiable factors such as dietary behavior can reduce the risk and influence disease progression [18]. Recent meta-analyses also revealed an association of dietary behaviors and physical activity with T2DM [19][20][21][22]. In addition, a meta-analysis demonstrated a U-shaped relationship between sleep duration and T2DM risk, with the lowest risk at 7-8 h of sleep per day [23]. This result is consistent with our findings. In our study, we evaluated beverages, physical activity, calorie intake, carbohydrates, fat, protein and sleep duration to establish a risk model emphasizing the impact of lifestyle on T2DM development and progression. Model B showed only slight improvement when these variables were included, which may be attributed to the limited number of study subjects. To our knowledge, published models in China have not been routinely used in clinical practice. QDiabetes-2018 is a successful example of a risk model put into clinical use [24]. Of the published Chinese models, seven were from a single region [8-10, 12, 13, 15, 16] and two were multicenter studies in China [11,14]. In our study, the China Health and Nutrition Survey was conducted in 10 provinces, including the municipalities of Beijing and Shanghai, from 1989 to 2015. Five used nomograms [13][14][15][16][25] and five used risk scores [8][9][10][11][12]. Compared with a risk score, a nomogram is more user-friendly and accurate because it is based on continuous variables and simple algorithm diagrams. One nomogram used data from an abdominal CT [15]; however, the cost of a CT makes it unsuitable for screening and extensive use. K. Wang et al. established their nomogram using semi-lab indicators [16]; however, these medical examinations are not routinely performed in China. In addition, the AUCs of the nomogram in women were unsatisfactory. With models A and B, completing the prediction is not time-consuming and the AUCs remain satisfactory. Furthermore, these nomograms [13,15,16] did not have sufficient calibration to provide evidence that the predicted probability agreed with the actual observed probability. In our study, all risk models were internally validated using the bootstrap sampling method and externally validated in the validation set.
Limitations do exist in this study. First, only 6023 patients had complete data. There was a significant difference in most characteristics between included and excluded participants (S5 Table). Age was set to be over 20, contributing to a significant difference. According to the exclusion criteria, individuals with missing sociodemographic and clinical data needed to be removed from the study, and these individuals accounted for a large proportion of subjects. Thus, additional external validation of these models and more complete data are needed before clinical use. Second, there may be under-ascertainment of T2DM diagnosis, since record terms were used as a criterion and we did not have complete data on oral glucose tolerance tests (OGTT). This may lead to misclassification bias for outcomes. Only 257 newly diagnosed diabetes cases from 2004 to 2015 were included in the derivation cohort and overfitting may be difficult to avoid. For model C, the derivation cohort contained 2617 subjects with complete data, an average of about 131 subjects per predictor, whereas the recommended minimum was 10 events per variable. Split sample validation is still valuable in this study. Validation was completed by randomly selecting individuals from 2 provinces in China. Furthermore, emerging algorithms, such as machine learning, neural networks and decision trees, can be used to build a risk model.

Conclusions
Model A can be completed at home and patients can decide whether to pursue further blood testing. Model B includes more comprehensive details regarding lifestyle and can be used for self-management and to provide advice when seeking medical treatment. Model C showed the best performance and can identify patients who need more interventions and intensive follow-ups. Furthermore, it can be used to develop an individualized intervention plan. If these models were used in the clinic, such as in electronic medical record systems and self-management systems, there would be a decrease in the economic burden associated with diabetes and better management of the complications associated with this disease.