Dietary Information Improves Model Performance and Predictive Ability of a Noninvasive Type 2 Diabetes Risk Model

No existing diabetes risk model for Asian populations includes dietary predictors. We sought to develop a diet-containing noninvasive diabetes risk model in Northern China and to evaluate whether dietary predictors improve model performance and predictive ability. Cross-sectional data for 9,734 adults aged 20–74 years were used as the derivation data, and data from a cohort of 4,515 adults with 4.2 years of follow-up were used as the validation data. We used logistic regression to develop a diet-containing noninvasive risk model. Akaike's information criterion (AIC), the area under the curve (AUC), the integrated discrimination improvement (IDI), the net reclassification improvement (NRI) and calibration statistics were calculated to explicitly assess the effect of dietary predictors on a diabetes risk model. A diet-containing type 2 diabetes risk model was developed. The significant dietary predictors, namely consumption of staple foods, livestock, eggs, potato, dairy products, fresh fruit and vegetables, were included in the risk model. Dietary predictors improved the noninvasive diabetes risk model, with a significant increase in the AUC (delta AUC = 0.03, P<0.001), an increase in relative IDI (24.6%, P-value for IDI <0.001), an increase in NRI (category-free NRI = 0.155, P<0.001), a 7.3-percentage-point increase in sensitivity and a decrease in AIC (delta AIC = 199.5). The results in the validation data were similar to those in the derivation data. In the validation data, the calibration of the diet-containing diabetes risk model was better than that of the risk model without dietary predictors. Dietary information improves the model performance and predictive ability of a noninvasive type 2 diabetes risk model based on classic risk factors, and may be useful for developing noninvasive diabetes risk models.


Introduction
A number of noninvasive diabetes risk models have been developed to predict the risk of developing type 2 diabetes and to identify high-risk individuals [1][2][3][4][5][6][7][8][9][10][11]. Compared with blood glucose testing, a noninvasive diabetes risk model costs less and is more convenient, and it can be widely used in population-based screening programs for health promotion and disease prevention. Thus far, most noninvasive diabetes risk models have been developed for Caucasian populations [1][2][3][4][5]. However, Glümer et al. evaluated the performance of a diabetes risk model developed for Caucasians in nine populations of diverse ethnic origin and found that it did not perform well in other ethnic groups [12]. This finding suggests that diabetes risk models developed for Caucasian populations may be of limited use in Asian populations, yet very few diabetes risk models have been developed for Asians [6][7][8][9][10][11]. Furthermore, most risk models developed for Caucasians do not include dietary predictors, and no diabetes risk model for Asians includes them.
Many cohort studies have confirmed that diet plays an important role in the development of type 2 diabetes, and many studies have suggested that type 2 diabetes can be prevented in high-risk individuals by intensive behavioral interventions targeting diet [13][14][15]. However, compared with classic risk factors such as age, gender, physical activity, family history of disease and anthropometric indices, the contribution of dietary predictors to established diabetes risk models has been underestimated in previous studies. In 2007, Simmons et al. compared C-statistics between noninvasive diabetes risk models with and without dietary predictors and reported that dietary predictors did not improve the predictive ability of models based on classic risk factors [16]. However, the C-statistic is not sufficiently sensitive to detect the effect of important new predictors in risk prediction models [17], so it may not reflect the real contribution of dietary predictors to noninvasive diabetes risk models. Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) have been proposed as better measures of discrimination than the C-statistic for evaluating new predictors [18]. Using these measures, a previous study demonstrated that dietary information is useful in cardiovascular risk models [19]. To date, no study has explicitly evaluated the effect of dietary predictors in diabetes risk models using these measures.
Therefore, the primary purpose of this study was to develop a noninvasive diabetes risk model that included dietary predictors based on baseline data from a cohort of 9,734 subjects in China and to evaluate the risk model in another independent cohort for external validation. Additionally, we evaluated whether dietary predictors improved model performance and the predictive ability of the noninvasive diabetes risk model.

Materials and Methods
The derivation data
In our study, baseline data from the Harbin Cohort Study on Diet, Nutrition and Chronic Noncommunicable Disease (HDNNCDS) were used as the derivation data. The HDNNCDS was launched in 2010 by the Department of Nutrition and Food Hygiene at Harbin Medical University, a national key discipline [20]. The HDNNCDS covered 7 urban administrative regions of Harbin. Each region was divided into 3 strata according to financial situation, and a total of 42 communities were randomly selected from the strata of the administrative regions using a stratified multistage random cluster sampling design. Subjects were eligible to participate if they 1) were between 20 and 74 years old, 2) had lived in Harbin for at least two years, and 3) had no cancer or type 1 diabetes mellitus. Diabetes diagnosis records were used to determine each participant's diabetes type. A total of 9,734 subjects participated in the HDNNCDS. The first follow-up survey of the HDNNCDS cohort is ongoing and has not yet been completed. In our analysis, we excluded subjects who had undergone dietary intervention for diabetes or other diseases at baseline (n = 655) or who had a total calorie intake ≥4500 kcal/d or ≤500 kcal/d (n = 315). A total of 8,764 subjects, including 1,096 individuals with diabetes, were included in the analysis.

The validation data
The cohort data from the Harbin People's Health Study (HPHS) were used as the validation data. The HPHS was launched in 2008 by the Centers for Disease Control and Prevention and the Public Health School in Harbin [21]. The HPHS covered 5 urban administrative regions of Harbin. Each region was divided into 3 strata based on financial situation, and one or two neighbourhood committees were chosen from 15 communities randomly selected from each stratum in each administrative region, using a stratified multistage random cluster sampling design. A total of 8,940 subjects aged 20 to 74 years were recruited. Subjects were eligible if they had no history of malignancy, thyroid dysfunction or renal calculi, and no history of postmenopausal hormone therapy, corticosteroid or calcitriol use. Because of financial constraints, 4,515 subjects (approximately 50.5% of the total) were randomly selected to participate in the follow-up survey. In 2012, 4,158 subjects completed the first in-person follow-up survey, a response rate of 92.1%. In the validation data, subjects were also excluded at baseline if they had diabetes (n = 550), were undergoing dietary interventions for diabetes or other diseases (n = 126), or had a total calorie intake ≥4500 kcal/d or ≤500 kcal/d (n = 52), leaving 3,430 subjects for analysis. In total, 394 incident cases of type 2 diabetes were observed during the 4.2 years of follow-up.
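The participant counts for the two cohorts can be cross-checked with simple arithmetic. The sketch below uses only numbers reported above; reading the HPHS exclusions as applied to the 4,158 follow-up completers is an assumption, although it reproduces the 3,430 subjects analysed.

```python
# Cross-check of participant flow using only counts reported in the text.

# Derivation data (HDNNCDS baseline)
hdnncds_enrolled = 9734
hdnncds_exclusions = {"dietary intervention": 655, "implausible energy intake": 315}
hdnncds_analysed = hdnncds_enrolled - sum(hdnncds_exclusions.values())

# Validation data (HPHS); exclusions assumed to apply to the follow-up completers
hphs_recruited = 8940
hphs_selected_for_follow_up = 4515
hphs_completed_follow_up = 4158
hphs_exclusions = {"prevalent diabetes": 550, "dietary intervention": 126,
                   "implausible energy intake": 52}
hphs_analysed = hphs_completed_follow_up - sum(hphs_exclusions.values())

selected_share = 100 * hphs_selected_for_follow_up / hphs_recruited
response_rate = 100 * hphs_completed_follow_up / hphs_selected_for_follow_up

print(hdnncds_analysed, hphs_analysed)                # 8764 3430
print(f"{selected_share:.1f}% {response_rate:.1f}%")  # 50.5% 92.1%
```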

Ethics statement
Both study protocols were approved by the Ethics Committee of Harbin Medical University, and written informed consent was obtained from all subjects. The methods in this study were carried out in accordance with the approved guidelines.

Questionnaire survey
Detailed in-person interviews were conducted by trained personnel using a structured questionnaire to collect information on demographic characteristics, dietary habits, lifestyle, physical condition and anthropometric characteristics. The questions on dietary habits, lifestyle and physical condition were identical in the HDNNCDS and the HPHS.
Dietary habits were assessed with a validated food frequency questionnaire (FFQ) covering usual dietary intake over the past 12 months. The FFQ included 103 food items from 14 food groups: white rice, wheaten food, potato and potato products, beans and bean products, fresh vegetables, fresh fruits, livestock and livestock products, poultry and poultry products, milk and dairy products, eggs and egg products, fish and fish products, snacks, beverages and ice cream. For each food item, participants chose their usual frequency category ("per day", "per week", "per month" or "never") and then reported the number of times the item was consumed in that period. The amount consumed each time was reported in liang (a unit of weight equal to 50 g) or in ml (for liquid food items), aided by food models and photographs of standard portion sizes. Each food item was quantified in g/d by multiplying the frequency by the amount consumed. Daily energy intake was estimated with the Food Nutrition Calculator (V1.60, Chinese Center for Disease Control [CDC], Beijing, China).
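The conversion from an FFQ answer to grams per day (frequency multiplied by amount) can be sketched as follows. The function name and example answers are hypothetical, and treating a week as 7 days and a month as 30 days is an assumption, since the text does not state the conversion factors. Liquid items reported in ml would be handled the same way in ml/d.

```python
# Minimal sketch: one FFQ answer -> average daily intake of that food item.
# Assumptions (not stated in the text): 7 days per week, 30 days per month.

GRAMS_PER_LIANG = 50  # 1 liang = 50 g
DAYS_IN_PERIOD = {"per day": 1, "per week": 7, "per month": 30}


def intake_g_per_day(period: str, times: float, liang_per_time: float) -> float:
    """Daily intake (g/d) = times per day x amount per occasion (g)."""
    if period == "never":
        return 0.0
    times_per_day = times / DAYS_IN_PERIOD[period]
    return times_per_day * liang_per_time * GRAMS_PER_LIANG


# Hypothetical answers
rice = intake_g_per_day("per day", 2, 2)   # twice a day, 2 liang each time
eggs = intake_g_per_day("per week", 4, 1)  # four times a week, 1 liang each time
print(rice, round(eggs, 1))                # 200.0 28.6
```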
The section on lifestyle and physical condition mainly covered prior disease history, family history of disease, regular exercise, labor intensity, cigarette smoking, alcohol consumption, and use of medicines and health products in the prior 12 months. Current smokers were defined as those who had smoked at least 100 cigarettes in their lifetime or who smoked every day or currently smoked on some days. Current drinkers were defined as those who consumed ≥1 alcoholic drink each month in the 12 months before the survey. Regular exercise was defined as any recreational or sport physical activity, other than walking for work or daily life, performed for at least 30 minutes on three or more days per week.
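The drinking and exercise definitions above translate directly into simple rules. In this sketch the function and parameter names are hypothetical, and "each month" is read literally as at least one drink in every one of the 12 months before the survey.

```python
# Sketch of two questionnaire definitions; names are hypothetical.
from typing import Sequence


def is_current_drinker(monthly_drinks_past_12_months: Sequence[int]) -> bool:
    """At least 1 alcoholic drink in each of the 12 months before the survey."""
    if len(monthly_drinks_past_12_months) != 12:
        raise ValueError("expected one count per month for 12 months")
    return all(count >= 1 for count in monthly_drinks_past_12_months)


def is_regular_exerciser(daily_minutes: Sequence[float]) -> bool:
    """daily_minutes: recreational or sport activity (excluding walking for work
    or daily life) on each day of a typical week.
    Regular = at least 30 minutes on 3 or more days per week."""
    return sum(1 for minutes in daily_minutes if minutes >= 30) >= 3


print(is_current_drinker([2] * 12), is_current_drinker([2] * 11 + [0]))  # True False
print(is_regular_exerciser([40, 0, 30, 0, 60, 0, 0]))                     # True
```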

Anthropometric measurements and biochemical analyses
Anthropometric measurements, including height, weight and waist circumference, were also taken at baseline by well-trained examiners, with subjects wearing light, thin clothing and no shoes. Body weight and height were measured to the nearest 0.1 kg and 0.1 cm, respectively. Blood pressure was measured 3 times with a standard mercury sphygmomanometer on the right arm of each subject after a 10-minute rest in a sitting position, and the mean values were used for analysis. Body mass index (BMI) was calculated as weight (kg) divided by the square of height in meters (m²). An oral glucose tolerance test was administered to each subject according to World Health Organization guidelines. Fasting and 2-h postprandial serum glucose levels were measured using an automatic biochemistry analyzer (Hitachi, Japan).

Outcome ascertainment
Diabetes was identified by self-reports of a history of a diagnosis of diabetes, fasting blood glucose ≥7.0 mmol/L, and/or 2-h glucose ≥11.1 mmol/L, and/or taking medication for diabetes.
Hypertension was identified by self-reports of a history of a diagnosis of hypertension, a systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg, and/or taking medication for hypertension.
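The outcome definitions, together with the BMI formula and the averaging of three blood pressure readings described earlier, can be written as simple rules in which any single criterion suffices. Function and parameter names are hypothetical.

```python
# Sketch of the outcome definitions; any one criterion is sufficient.
from statistics import mean
from typing import Sequence


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


def has_diabetes(self_reported: bool, fasting_glucose_mmol_l: float,
                 glucose_2h_mmol_l: float, on_diabetes_medication: bool) -> bool:
    return (self_reported or fasting_glucose_mmol_l >= 7.0
            or glucose_2h_mmol_l >= 11.1 or on_diabetes_medication)


def has_hypertension(self_reported: bool, sbp_readings: Sequence[float],
                     dbp_readings: Sequence[float], on_bp_medication: bool) -> bool:
    sbp, dbp = mean(sbp_readings), mean(dbp_readings)  # mean of the 3 readings
    return self_reported or sbp >= 140 or dbp >= 90 or on_bp_medication


print(round(bmi(70, 1.75), 1))                                         # 22.9
print(has_diabetes(False, 6.1, 11.2, False))                           # True (2-h glucose)
print(has_hypertension(False, [138, 142, 141], [85, 88, 86], False))   # True (mean SBP about 140.3)
```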

Statistical analysis
All statistical analyses were performed using SPSS v21.0 (Beijing Stats Data CO. Ltd, Beijing, China) and R 2.15.1 (http://www.r-project.org/). A two-sided P<0.05 was considered statistically significant.
Student's t-test and the χ² test were used to compare baseline clinical characteristics and dietary information between the derivation data and the validation data. In this study, noninvasive risk factors were defined as factors that can be measured without taking a blood sample. The candidate variables were divided into classic risk factors and dietary predictors. Classic risk factors included age, gender, BMI, waist circumference, regular exercise, labor intensity, alcohol consumption, cigarette smoking, education, hypertension and a family history of diabetes. Dietary predictors included the 14 food groups in the FFQ and total calorie intake. Adjusted odds ratios (ORs) and 95% confidence intervals (95% CIs) were estimated using multivariable backward logistic regression. A diabetes risk score was developed for each of the two multivariable logistic regression models, a classic noninvasive risk score and a diet-containing risk score, with points assigned to each variable according to the magnitude of its regression coefficient [22]. In the risk score system, dietary intake was converted from grams per day to liang per day to facilitate its use in China. All dietary predictors in the risk model except dairy were kept continuous to use their full information, and decimal point values were allowed when a weight was not a whole number. Because 24.4% and 29.4% of the subjects in the derivation and validation data, respectively, never consumed milk or dairy products, the dairy variable was converted into a binary variable in the risk model. The total diabetes risk score for each individual was calculated as the sum of the points for all variables.
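One common way to turn logistic-regression coefficients into a points-based score is to make each variable's points proportional to its coefficient times its value. The sketch below assumes such a linear scaling; every coefficient, the intercept and the scaling constant are hypothetical placeholders rather than the published values in S1 Table, and the exact point-assignment rule used in the study is that of reference [22].

```python
# Sketch of a points-based risk score; all numbers are hypothetical placeholders.
import math

GRAMS_PER_LIANG = 50.0

# Hypothetical log-odds coefficients per unit of each predictor
COEFFICIENTS = {
    "age_years": 0.05,
    "staple_foods_liang_per_day": 0.04,   # continuous
    "fresh_fruit_liang_per_day": -0.06,   # continuous
    "any_dairy": -0.30,                   # binary: 1 = consumes milk/dairy, 0 = never
}
POINTS_PER_LOG_ODDS = 10.0  # hypothetical scaling: 1 unit of log-odds = 10 points
INTERCEPT = -6.0            # hypothetical model intercept


def to_liang(grams_per_day: float) -> float:
    return grams_per_day / GRAMS_PER_LIANG


def risk_score(subject: dict) -> float:
    """Sum of per-variable points; decimals are kept rather than rounded."""
    return sum(POINTS_PER_LOG_ODDS * beta * subject[name]
               for name, beta in COEFFICIENTS.items())


def predicted_probability(score: float) -> float:
    """Map a score back to a probability via the logistic function
    (valid here because the scaling is linear)."""
    log_odds = INTERCEPT + score / POINTS_PER_LOG_ODDS
    return 1.0 / (1.0 + math.exp(-log_odds))


subject = {
    "age_years": 50,
    "staple_foods_liang_per_day": to_liang(300),  # 300 g/d = 6 liang/d
    "fresh_fruit_liang_per_day": to_liang(100),   # 100 g/d = 2 liang/d
    "any_dairy": 1,
}
score = risk_score(subject)
print(round(score, 1))  # 23.2
print(predicted_probability(score))
```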
The classic noninvasive risk model and the diet-containing risk model were compared separately in the derivation data and the validation data using Akaike's information criterion (AIC) for model performance; the area under the curve (AUC), IDI and NRI for discrimination and predictive ability; and the Hosmer-Lemeshow goodness-of-fit test (HL test) for general calibration. An AIC difference of 10 or greater between two models was considered significant, with a lower AIC value indicating better model performance [23]. AUC, IDI and NRI were calculated to evaluate model discrimination and predictive ability [24]. To apply NRI to the logistic regression model, we adopted the approach proposed by Pencina et al. [25]. The HL test was used to examine how well the predicted prevalence of type 2 diabetes matched the observed prevalence, and P-values of 0.01 or greater were considered to indicate good calibration.

Results
Table 1 shows the baseline clinical characteristics and average daily dietary intakes in the derivation and validation data. Participants in the derivation data were older than those in the validation data, and the proportion of males was higher in the derivation data. The other baseline characteristics and average daily dietary intakes also differed considerably between the two datasets. Table 2 presents the classic noninvasive risk model and the diet-containing risk model in the derivation data. In the classic noninvasive risk model, age, gender, hypertension, a family history of diabetes, alcohol consumption, regular exercise, BMI and waist circumference were significantly associated with the risk of type 2 diabetes.
In the diet-containing risk model, the classic risk factors and the consumption of staple foods, livestock, eggs, potato, dairy products, fresh fruit and vegetables were retained. The detailed risk score systems based on the two models are presented in S1 Table. The detailed parameters of the risk model evaluation in the derivation data are presented in Table 3. Dietary predictors improved model performance, with a significantly lower AIC value (delta AIC = 199.5), and improved model discrimination, with a significant increase in the AUC (delta AUC = 0.03, P<0.001). Compared with the classic noninvasive risk model, adding dietary predictors significantly increased the relative IDI by 24.6% and the NRI by 0.155. According to the HL test, the predicted prevalence of type 2 diabetes matched the observed prevalence well in both models (χ² = 7.29, P = 0.51 for the classic noninvasive risk model; χ² = 6.34, P = 0.61 for the diet-containing risk model). At the 20% probability cut-off point, the sensitivity of the diet-containing risk model in the derivation data was significantly higher than that of the classic noninvasive risk model (57.6% vs 50.3%, P<0.001), whereas the specificity of the two models did not differ significantly (84.5% vs 84.7%, P = 0.389).
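The derivation-data comparison rests on AUC, IDI (absolute and relative) and category-free NRI, with AIC for overall fit. A minimal sketch of these measures, assuming two sets of predicted probabilities for the same subjects, is given below using only the Python standard library. Function names and toy data are hypothetical; relative IDI is taken as the IDI divided by the old model's discrimination slope; P-values, confidence intervals and the authors' SPSS/R procedures are not reproduced.

```python
"""Sketch: comparing an old (classic) and a new (diet-containing) model on the
same subjects. Standard library only; no P-values or confidence intervals."""
import math
from statistics import mean


def auc(y, p):
    """AUC as the probability that a case outranks a non-case (ties count 1/2)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def discrimination_slope(y, p):
    """Mean predicted risk in cases minus mean predicted risk in non-cases."""
    return (mean(pi for yi, pi in zip(y, p) if yi == 1)
            - mean(pi for yi, pi in zip(y, p) if yi == 0))


def idi(y, p_old, p_new):
    """Absolute IDI and relative IDI (IDI divided by the old model's slope)."""
    gain = discrimination_slope(y, p_new) - discrimination_slope(y, p_old)
    return gain, gain / discrimination_slope(y, p_old)


def category_free_nri(y, p_old, p_new):
    """Category-free NRI: any increase in predicted risk is 'up', any decrease 'down'."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for yi, old, new in zip(y, p_old, p_new):
        if yi == 1:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


def aic(y, p, n_params):
    """AIC = 2k - 2 ln L, with p the fitted probabilities of a binary model."""
    log_lik = sum(math.log(pi) if yi == 1 else math.log(1 - pi)
                  for yi, pi in zip(y, p))
    return 2 * n_params - 2 * log_lik


# Toy data (hypothetical): 3 cases and 4 non-cases
y = [1, 1, 1, 0, 0, 0, 0]
p_old = [0.60, 0.40, 0.30, 0.50, 0.20, 0.10, 0.30]
p_new = [0.70, 0.50, 0.30, 0.40, 0.20, 0.10, 0.35]

print(round(auc(y, p_old), 4), round(auc(y, p_new), 4))  # 0.7917 0.8333
abs_idi, rel_idi = idi(y, p_old, p_new)
print(round(abs_idi, 4), round(rel_idi, 2))               # 0.0792 0.5
print(round(category_free_nri(y, p_old, p_new), 4))       # 0.6667
```

In the toy data, two cases move up and none move down, while one non-case moves down and one moves up, so the NRI comes entirely from the cases.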
To further validate the diet-containing risk model developed in the derivation data and to evaluate the effect of dietary predictors on the diabetes risk model, the scoring systems were applied to the validation data. According to the Kolmogorov-Smirnov (K-S) test, the scores of both models did not deviate significantly from a normal distribution in either dataset (classic noninvasive risk score in the derivation data: Z = 1.25, P = 0.090; in the validation data: Z = 0.953, P = 0.323; diet-containing risk score in the derivation data: Z = 1.318, P = 0.062; in the validation data: Z = 0.812, P = 0.525). The score distribution of the classic noninvasive risk model in the derivation data (median 68.5, standard deviation 11.8, range 21.7 to 117.9) was not significantly different from that in the validation data (median 68.2, standard deviation 11.4, range 34.1 to 107.8) (P = 0.060), whereas the score distribution of the diet-containing risk model was shifted significantly toward higher values in the validation data (median 72.8, standard deviation 13.0, range 20.6 to 127.7) compared with the derivation data (median 71.5, standard deviation 12.4, range 30.4 to 112.1) (P<0.001). As shown in Table 4, in the validation data the diet-containing risk score predicted the incidence of type 2 diabetes more effectively than the classic risk score, with a significant increase in the AUC (delta AUC = 0.03, P<0.001). Compared with the classic noninvasive risk model, adding dietary predictors significantly increased the relative IDI by 22.5% and the NRI by 0.219. According to the HL test, the calibration of the diet-containing risk model was better than that of the classic noninvasive risk model (χ² = 13.20, P = 0.11 for the diet-containing risk model; χ² = 17.57, P = 0.03 for the classic noninvasive risk model).
Detailed HL test results for the two models are presented in Fig 1. The 4.2-year cumulative incidence of type 2 diabetes increased significantly across increasing quintiles of the risk scores of both models, as shown in Fig 2. At the 20% probability cut-off point, the sensitivity of the diet-containing risk model in the validation data was significantly higher than that of the classic noninvasive risk model (49.0% vs 40.4%, P<0.001), whereas the specificity of the two models did not differ significantly (85.8% vs 85.7%, P = 0.543).
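The calibration and classification results can be sketched in the same way. This version assumes that the HL test groups subjects into 10 risk groups (the Discussion refers to deciles), giving 10 - 2 = 8 degrees of freedom, for which the chi-square upper-tail probability has a closed form; it also assumes that a predicted probability equal to the 20% cut-off counts as high risk. Names are hypothetical.

```python
"""Sketch: Hosmer-Lemeshow statistic over risk groups, its P-value for an even
number of degrees of freedom, and sensitivity/specificity at a cut-off."""
import math


def hosmer_lemeshow(y, p, groups=10):
    """HL chi-square: sum over risk groups of (O - E)^2 / (E * (1 - mean p)).
    Assumes every predicted probability lies strictly between 0 and 1."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]  # near-equal-sized groups
        if not idx:
            continue
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        mean_p = expected / len(idx)
        stat += (observed - expected) ** 2 / (expected * (1 - mean_p))
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def sensitivity_specificity(y, p, cutoff=0.20):
    """Classify as high risk when p >= cutoff (tie handling is an assumption)."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < cutoff)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


print(round(chi2_sf_even_df(7.29, 8), 2))                               # 0.51
print(sensitivity_specificity([1, 1, 0, 0], [0.30, 0.10, 0.25, 0.05]))  # (0.5, 0.5)
```

With 8 degrees of freedom, the closed form gives P ≈ 0.51 for χ² = 7.29, in line with the value reported for the classic model in the derivation data.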

Discussion
In the present study, we developed a diet-containing noninvasive type 2 diabetes risk model for a Northern Chinese population using dietary information and classic noninvasive risk factors. We observed that adding dietary predictors to the noninvasive diabetes risk model improved its performance and predictive ability. Many noninvasive risk models have been developed to predict the risk of type 2 diabetes [1][2][3][4][5][6][7][8][9][10][11]. Some of them were developed for Asians [6][7][8][9][10][11] and could also be adapted to the setting and purpose of our study. Most of these risk models included the following classic risk factors: age, gender, BMI, waist circumference (WC), family history of diabetes, hypertension, sport time, smoking and alcohol consumption. We included all of these risk factors in the classic noninvasive risk model in our study, and their effects agreed with those reported in previous studies, so our classic model can be regarded as an update of these risk models for our population. Therefore, the performance of our classic noninvasive risk model can represent the performance of these previous risk models, and dietary information would likely also improve their performance and predictive ability. The effect of diet on type 2 diabetes has been widely studied and acknowledged. For example, a prospective study showed that the consumption of milk and dairy products is associated with a markedly reduced prevalence of the metabolic syndrome [26]. Meta-analyses have also suggested that eating 1.35 servings a day of vegetables compared with 0.2 servings is associated with a 14% reduction in the risk of type 2 diabetes [27], that each additional serving a day of fruit is associated with a 6% lower risk [28], and that each serving a day of white rice is associated with an 11% higher risk [29]. However, dietary predictors have not been adequately emphasized in type 2 diabetes risk prediction models.
Simmons et al. reported that adding dietary predictors did not improve the performance or predictive ability of type 2 diabetes risk models. The authors speculated that risk factors such as BMI obscure the effects of long-term diet on future diabetes risk, which may explain the lack of improvement when dietary predictors were added. However, these classic risk factors cannot fully account for the association between diet and type 2 diabetes. For example, a previous study showed that increased red meat consumption over time was associated with an elevated subsequent risk of type 2 diabetes, and this association was only partially mediated by body weight [30]. In addition, the C-statistic is insensitive when evaluating new predictors. In our study, we used newer measures, NRI and IDI, to evaluate the effects of dietary predictors in the noninvasive diabetes risk model. We included 14 food groups that are common in China. We found that potato, dairy products, fresh fruit and vegetables were associated with a decreased risk of type 2 diabetes, whereas staple foods (white rice and wheaten food), eggs and livestock were associated with an increased risk, consistent with previous studies. The diet-containing risk model had better AIC and AUC values than the classic noninvasive risk model in both the derivation and validation data, and adding dietary predictors yielded significant increases in relative IDI and NRI compared with the classic noninvasive risk model. Furthermore, the HL test for the diet-containing risk model indicated better agreement between the observed and predicted probabilities of type 2 diabetes across deciles than that for the classic noninvasive risk model in the validation data.
The increased sensitivity also indicates that, per 100 people with diabetes, the diet-containing risk model identified 7.3 more cases in the derivation data (57.6% vs 50.3%) and 8.6 more cases in the validation data (49.0% vs 40.4%) than the classic noninvasive risk model. These results indicate that dietary predictors can improve the performance and predictive ability of a noninvasive diabetes risk model, and that dietary information is useful for constructing noninvasive type 2 diabetes risk models.
In China, more than half of the people with diabetes are currently undiagnosed [31], and the burden of expenditure on medication and care is rising as the incidence of type 2 diabetes increases rapidly. Although existing classic noninvasive diabetes risk models may help identify high-risk subjects in primary care settings, they are probably not efficient tools for preventing diabetes in clinical practice because they include few modifiable risk factors. The diet-containing noninvasive risk model can improve disease awareness and identify both high-risk individuals and potential targets for reducing type 2 diabetes risk. Most of the risk factors in the diet-containing risk model are modifiable. Therefore, this model can help establish specific, effective and feasible strategies for the prevention of type 2 diabetes and motivate patients to adopt a healthy lifestyle and adhere to treatment plans. The dietary information in our study came from an FFQ with 103 food items, which takes 20–30 minutes to complete and can be self-administered. This method obtains dietary data in a relatively simple, cost-effective and time-efficient way, so integrating an FFQ into noninvasive diabetes risk models is suitable and feasible. Although dietary records provide more accurate individual dietary data than an FFQ in a clinical setting, they are time- and labor-intensive, and these two constraints have been suggested as barriers to physicians applying this method in clinical practice.
The strength of our study is that we included dietary predictors in a type 2 diabetes risk prediction model, providing more modifiable factors than previous noninvasive risk models for the Asian population. Furthermore, we validated the usefulness of dietary predictors in the noninvasive type 2 diabetes risk model both in the derivation data and in an independent cohort. Our study also has limitations. First, we cannot exclude measurement error from information bias in the dietary data, although trained personnel conducted the interviews using a validated FFQ with food models and photographs of portion sizes. The reproducibility and validity of the FFQ have been assessed in previous studies, which indicated that it is a reliable method for assessing dietary intake [20,21]. Therefore, measurement error from information bias in the food data is likely to affect predictive ability only slightly. Additionally, our analysis did not include nutrient supplementation, which has been associated with type 2 diabetes and may introduce bias. Second, the two risk models were developed from cross-sectional data because of their large sample size and relatively good representation of the area. Although the two risk models could therefore only be associated with prevalent, rather than incident, cases of type 2 diabetes, the diet-containing risk model showed relatively good discrimination and predictive ability, with higher sensitivity and an acceptable AUC, when evaluated in an independent cohort.

Conclusion
In conclusion, we developed a diet-containing noninvasive risk model for a Northern Chinese population using classic noninvasive risk factors and dietary information. Dietary information improves the performance and predictive ability of a noninvasive type 2 diabetes risk model based on classic risk factors and may be useful for developing noninvasive type 2 diabetes risk models.

Supporting Information
S1 Table. The detailed risk score systems based on the two models. (DOCX)