Nutritional markers of undiagnosed type 2 diabetes in adults: Findings of a machine learning analysis with external validation and benchmarking | PLOS One


Fig 1.

Flowchart depicting the analytic workflow adopted in the study.
a: adjusted by resampling methods, including oversampling, under-sampling, random over-sampling examples (ROSE), and the synthetic minority oversampling technique (SMOTE).
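
As a rough illustration of two of the resampling methods named in this caption, the sketch below balances a binary training set by plain random oversampling and by SMOTE-style interpolation. It is a minimal NumPy sketch, not the study's implementation: the function names `random_oversample` and `smote_like`, the choice of k = 5 neighbours, and the toy data are all assumptions made for the example, and ROSE and under-sampling are not shown.

```python
import numpy as np

def _minority_info(y):
    """Return the minority label and the number of rows needed to balance the classes."""
    classes, counts = np.unique(y, return_counts=True)
    return classes[np.argmin(counts)], counts.max() - counts.min()

def random_oversample(X, y, rng):
    """Duplicate minority rows (sampled with replacement) until both classes are equal in size."""
    minority, n_extra = _minority_info(y)
    idx = rng.choice(np.flatnonzero(y == minority), size=n_extra, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

def smote_like(X, y, rng, k=5):
    """Add synthetic minority rows placed on the segment between a minority row
    and one of its k nearest minority neighbours (assumes >= 2 minority rows)."""
    minority, n_extra = _minority_info(y)
    Xm = X[y == minority]
    # Pairwise Euclidean distances among minority rows; a row is not its own neighbour.
    dist = np.linalg.norm(Xm[:, None, :] - Xm[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    k = min(k, len(Xm) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(Xm), size=n_extra)
    chosen = neighbours[base, rng.integers(0, k, size=n_extra)]
    gap = rng.random((n_extra, 1))  # interpolation weight in [0, 1)
    synthetic = Xm[base] + gap * (Xm[chosen] - Xm[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_extra, minority)])

# Toy data (hypothetical): 90 negatives, 10 positives, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 10 + [0] * 90)

X_os, y_os = random_oversample(X, y, rng)
X_sm, y_sm = smote_like(X, y, rng)
print(np.bincount(y_os), np.bincount(y_sm))
```

Resampling of this kind is normally applied to the training split only, so that validation data keep the original class balance.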

Fig 2.

Overlapped ROC curves demonstrating predictive performance of logistic regression models on internal validation data.
Models were fitted to the original, unbalanced training data and to training data restructured with oversampling, ROSE, and SMOTE resampling.
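
For readers unfamiliar with how an ROC curve and its AUC are obtained from predicted scores, the following is a minimal NumPy sketch. It assumes binary labels coded 0/1 and a higher score meaning a more likely positive; the function names `roc_curve_points` and `auc_trapezoid` and the four-row toy example are hypothetical, not taken from the study.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by lowering the decision threshold step by step."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    y_sorted, s_sorted = y_true[order], scores[order]
    tps = np.cumsum(y_sorted == 1)  # true positives at each cut
    fps = np.cumsum(y_sorted == 0)  # false positives at each cut
    # Keep only the last index of each distinct score so tied scores form one threshold.
    last = np.r_[np.flatnonzero(np.diff(s_sorted)), len(s_sorted) - 1]
    tpr = np.r_[0.0, tps[last] / tps[-1]]
    fpr = np.r_[0.0, fps[last] / fps[-1]]
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example (hypothetical labels and scores).
y_demo = [0, 0, 1, 1]
s_demo = [0.1, 0.4, 0.35, 0.8]
fpr_demo, tpr_demo = roc_curve_points(y_demo, s_demo)
auc_demo = auc_trapezoid(fpr_demo, tpr_demo)
print(auc_demo)  # 0.75
```

Overlaying several curves, as in this figure, amounts to calling such a function once per model on the same validation labels and plotting each (fpr, tpr) pair on shared axes.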

Fig 3.

Overlapped ROC curves demonstrating predictive performance of logistic regression models on external validation data.
Models were fitted to the original, unbalanced training data and to training data restructured with oversampling, ROSE, and SMOTE resampling.

Fig 4.

Overlapped ROC curves demonstrating predictive performance of random forests models on internal validation data.
Models were fitted to the original, unbalanced training data and to training data restructured with oversampling, ROSE, and SMOTE resampling.

Fig 5.

Overlapped ROC curves demonstrating predictive performance of random forests models on external validation data.
Models were fitted to the original, unbalanced training data and to training data restructured with oversampling, ROSE, and SMOTE resampling.

Fig 6.

Overlapped ROC curves demonstrating predictive performance of artificial neural network models on internal validation data.
Models were fitted to the original, unbalanced training data and to training data restructured with oversampling, ROSE, and SMOTE resampling.

Fig 7.

Overlapped ROC curves demonstrating predictive performance of artificial neural network models on external validation data.
Models were fitted to the original, unbalanced training data and to training data restructured with oversampling, ROSE, and SMOTE resampling.

Fig 8.

Variable importance plot of the best-performing random forest model produced by ROSE resampling.
BMXWAIST = waist circumference; RIDAGEYR = age; BMXBMI = body mass index; WHQ150 = age when heaviest weight; BMXLEG = upper leg length; BMXARMC = arm circumference; BMXWT = weight; WHD050 = self-reported weight 1 year ago; WHD020 = current self-reported weight; WHD140 = self-reported greatest weight; BMXHT = standing height; carb = carbohydrate; caffeine = caffeine; INDFMPIR = income-poverty ratio; bcar = beta carotene; acar = alpha carotene; kcal = energy; dodecanoic = SFA 12:0 (dodecanoic acid); copper = copper; atoc = vitamin E (alpha-tocopherol).

Fig 9.

Variable importance plot of the best-performing artificial neural network model produced by ROSE resampling.
BMXWAIST = waist circumference; BMXBMI = body mass index; RIDAGEYR = age; WHQ150 = age when heaviest weight; BMXARMC = arm circumference; BMXWT = weight; WHD050 = self-reported weight 1 year ago; WHD020 = current self-reported weight; BMXLEG = upper leg length; DMDEDUC2 = education level; WHD140 = self-reported greatest weight; HSD010 = self-rated general health; WHQ030 = How do you consider your weight?; PAQ650 = vigorous recreational activities; WHQ040 = Like to weigh more, less, or same?; DBD895 = number of meals not home prepared; BMXHT = standing height; PAQ665 = moderate recreational activities; DBD910 = number of frozen meals/pizzas in past 30 days; carb = carbohydrate.

Fig 10.

Variable importance plot of the best-performing artificial neural network models produced by SMOTE resampling.
BMXWAIST = waist circumference; BMXBMI = body mass index; RIDAGEYR = age; WHQ150 = age when heaviest weight; BMXARMC = arm circumference; BMXWT = weight; WHD050 = self-reported weight 1 year ago; WHD020 = current self-reported weight; BMXLEG = upper leg length; DMDEDUC2 = education level; WHD140 = self-reported greatest weight; HSD010 = self-rated general health; WHQ030 = How do you consider your weight?; PAQ650 = vigorous recreational activities; WHQ040 = Like to weigh more, less, or same?; DBD895 = number of meals not home prepared; BMXHT = standing height; PAQ665 = moderate recreational activities; DBD910 = number of frozen meals/pizzas in past 30 days; carb = carbohydrate.

Table 1.

Nutritional and other markers of undiagnosed type 2 diabetes identified by the best-performing logistic model (AUC = 75.7%).

Table 2.

Nutritional and other markers of undiagnosed type 2 diabetes identified by the best-performing ANN and RF models.

Fig 11.

Benchmarking with the ADA diabetes risk test.
On internal validation data, the ADA diabetes risk test (AUC = 0.737028) and the best-performing predictive model (AUC = 0.7566544) did not differ significantly according to the DeLong test for comparing two ROC curves (p = 0.3201), indicating comparable performance. On external validation data, the difference between the ADA diabetes risk test (AUC = 0.7401352) and the best-performing predictive model (AUC = 0.7464) was also non-significant (p = 0.0643).
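
The DeLong comparison mentioned in this caption can be sketched as follows for two sets of scores evaluated on the same individuals. This is a minimal NumPy illustration of the standard two-sided DeLong test for correlated ROC curves, not the code used in the study; the function name `delong_test`, the 0/1 label coding, and the simulated scores are assumptions made for the example.

```python
import math
import numpy as np

def _placements(pos, neg):
    """Per-positive and per-negative placement values (ties count as 0.5)."""
    psi = (pos[:, None] > neg[None, :]).astype(float) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two score vectors on the same cases."""
    y_true = np.asarray(y_true)
    is_pos = y_true == 1
    v10, v01, aucs = [], [], []
    for s in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        over_pos, over_neg = _placements(s[is_pos], s[~is_pos])
        v10.append(over_pos)
        v01.append(over_neg)
        aucs.append(float(over_pos.mean()))  # AUC equals the mean placement value
    # 2x2 covariance matrices of the placement values across the two models.
    s10 = np.cov(np.vstack(v10))
    s01 = np.cov(np.vstack(v01))
    s = s10 / is_pos.sum() + s01 / (~is_pos).sum()
    var_diff = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var_diff <= 0:  # e.g. identical score vectors: no evidence of a difference
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return aucs[0], aucs[1], z, p

# Simulated example (hypothetical data): two noisy scores for the same 200 cases.
rng = np.random.default_rng(0)
y_sim = rng.integers(0, 2, size=200)
score_a = y_sim + rng.normal(size=200)
score_b = y_sim + 1.5 * rng.normal(size=200)
auc_a, auc_b, z_ab, p_ab = delong_test(y_sim, score_a, score_b)
print(auc_a, auc_b, z_ab, p_ab)
```

Because both scores come from the same validation cases, the covariance term `s[0, 1]` matters: ignoring it would treat the two AUCs as independent and typically overstate the variance of their difference.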

Table 3.

Creation of variables analogous to those in the American Diabetes Association (ADA) diabetes risk test using National Health and Nutrition Examination Survey (NHANES) data.

Table 4.

Performance comparison of the ADA diabetes risk test versus the best-performing model on NHANES data.
