Figures
Abstract
Objective
Prediabetes is a silent condition that often goes undetected. However, timely interventions could prevent its progression to type 2 diabetes. Traditional glycemic markers, such as hemoglobin A1c (HbA1c), have limitations, creating a need for new diagnostic biomarkers. In this study, our objective was to develop an interpretable machine learning model using biomarkers related to oxidative stress, inflammation, and lipid metabolism to classify prediabetes independently of traditional glycemic markers, such as HbA1c. We also compared multiple biomarker panels to determine which biomarkers offer the highest predictive accuracy.
Methods
We developed and validated interpretable machine learning models using clinical and biomarker data from 545 participants (405 healthy controls and 140 with prediabetes). To ensure robust and generalizable findings, we employed a nested cross-validation technique, managed feature collinearity using the variance inflation factor (VIF), and interpreted the final model with Shapley Additive exPlanations (SHAP) [Kapoor S, Narayanan A. Patterns. 4(9):100804 (2023); Vabalas A, et al. PLoS One. 14(11):e0224365 (2019); Lundberg SM, Lee SI. Adv Neural Inf Process Syst. 30:4768–77 (2017)].
Results
Our approach identified a distinct panel of inflammatory biomarkers (IL-10, IGF-1, and CRP) capable of classifying prediabetes independently of traditional glycemic markers. This non-glycemic model achieved a promising Area Under the Curve (AUC) of 0.711 on holdout validation, establishing inflammation as a key and measurable indicator of early metabolic dysfunction.
Conclusion
Our findings introduce a novel panel of inflammatory biomarkers that show promise in the identification of prediabetes independently of traditional glucose-based measures. By highlighting inflammation as an early indicator of metabolic dysfunction, this approach may enhance precision in the detection of prediabetes. Longitudinal studies with larger and more diverse populations are essential to clinically validate these biomarkers and confirm their value in improving the early diagnosis and management of metabolic health.
Citation: Maalouf M, Tammam M, Kurungadan S, Alsereidi A, Afzal M, Jelinek HF (2026) An inflammatory biomarker panel for prediabetes classification using interpretable machine learning. PLoS One 21(3): e0341195. https://doi.org/10.1371/journal.pone.0341195
Editor: Aleksandra Klisic, University of Montenegro-Faculty of Medicine, MONTENEGRO
Received: October 20, 2025; Accepted: January 3, 2026; Published: March 16, 2026
Copyright: © 2026 Maalouf et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in this study are publicly available in the Mendeley Data repository at https://data.mendeley.com/datasets/x8z62gkhhw/1.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Prediabetes is a common metabolic disorder characterized by elevated blood glucose levels below the threshold for the diagnosis of type 2 diabetes mellitus (T2DM). Almost 590 million adults worldwide live with diabetes and more than 630 million people are estimated to have prediabetes. This underscores a major public health crisis and a critical window for early intervention and prevention [1–3]. Despite its health consequences, including the increased risks of cardiovascular disease, kidney disease, and neuropathy, prediabetes is often undiagnosed because it usually occurs without overt symptoms [4–8].
Traditionally, the diagnosis of prediabetes is based mainly on fasting plasma glucose (FPG) and glycated hemoglobin (HbA1c) tests. However, these methods may not reliably detect early metabolic changes and can sometimes misclassify individuals [9–11]. Therefore, researchers are increasingly exploring alternative diagnostic approaches, such as the evaluation of cardiovascular risk factors and patient subphenotypes, to improve predictive accuracy and early risk identification [12,13].
Recent research has highlighted the importance of biomarkers associated with early metabolic disturbances, independent of traditional glucose markers. Systematic reviews highlight ongoing efforts to identify biomarkers that could improve our understanding and early detection of type 2 diabetes [14]. Researchers now recognize biological processes, such as chronic inflammation, mitochondrial dysfunction, and oxidative stress, as crucial early disruptions that precede overt glycemic disorders [15–19]. Biomarkers related to these processes, mainly markers of oxidative DNA damage, have the potential for early detection and intervention in metabolic disorders [20].
Machine learning (ML), especially interpretable algorithms, provides a powerful approach to identifying complex interactions among biomarkers related to prediabetes [21]. However, current studies based on ML often rely heavily on traditional glucose-based biomarkers, restricting the ability to uncover novel metabolic markers [22–24].
Recent studies in predictive modeling have emphasized the predictive potential of biomarkers. For example, these biomarkers have been applied to directly predict the risk of diabetes using machine learning [25]. In addition, these techniques have been shown to be more comprehensive in their ability to evaluate disease progression using inflammatory and related markers in other chronic conditions that frequently coexist with metabolic disorders, such as depression [26] and cardiac autonomic neuropathy [27].
To address this gap, our study uses a targeted ML approach specifically designed to evaluate the predictive value of non-glycemic biomarkers. We systematically examine biomarkers associated with mitochondrial function, inflammation, and oxidative stress, which have previously been shown to be relevant for the prediction of chronic disease [28–30]. By applying robust statistical methods and interpretable machine learning approaches (SHAP), our objective is to identify biomarkers that predict prediabetes independently of HbA1c. Clarifying the independent predictive value of these biomarkers could significantly improve early risk identification and enable more targeted preventive interventions in clinical practice [31].
2 Methods
2.1 Study design and data source
This study is a secondary analysis of data from the DiabHealth rural diabetes screening clinic, which prospectively collected participant data between 2002 and 2015. The original data collection received full ethical approval from the Charles Sturt University Human Research Ethics Committee (CSU HREC; protocol 2006/042), and all participants provided their written informed consent [32]. For the present research, the pre-existing, de-identified dataset was accessed in January 2025 solely for statistical analysis; no participant contact, recruitment, or data collection occurred after 2015. Data were cleaned for the parent study using established protocols described by Jelinek et al. [33], which gives a complete analytic dataset for the variables examined here.
2.2 Study population and outcome definition
The analytical cohort for this study was derived from the parent DiabHealth dataset, which initially included 847 participants. From this cohort, individuals with a prior diagnosis of diabetes were first excluded, resulting in a sample of 604 individuals. To further refine the cohort to specifically isolate the prediabetic state and minimize the confounding effects associated with advanced age, we applied additional exclusion criteria. Participants with fasting glucose levels ≥ 7.0 mmol/L (indicative of previously undiagnosed diabetes) or those over 85 years of age were removed. This final selection process yielded a study cohort of 545 participants. Prediabetes was defined according to the criteria of the American Diabetes Association (ADA) [5]: participants with fasting screening glucose between 5.6 and 6.9 mmol/L (inclusive) were classified as ’prediabetes’ (n = 140), and those with levels less than 5.6 mmol/L were classified as ’control’ (n = 405). Baseline characteristics were compared between groups using Welch’s t-test for continuous variables and the -test for categorical variables, implemented using Python’s SciPy library.
2.3 Biomarker panels for analysis
The biomarkers in this study were selected based on their established involvement in metabolic disorders and their potential predictive value for T2DM (Table 1). The biomarker panel included biomarkers of lipid metabolism (triglycerides, total cholesterol [TC], high-density lipoprotein [HDL], low-density lipoprotein [LDL]), oxidative stress (GSH, GSSG, GSH/GSSG, 8-OHdG), mitochondrial function (Humanin, MOTS-c, p66Shc), inflammation (CRP, IL-6, IL-1β, IL-10, MCP-1, IGF-1), the glycemic marker HbA1c, and the demographic variable age.
Inflammatory biomarkers such as interleukin-6 (IL-6), C-reactive protein (CRP), and interleukin-1β (IL-1β) were included because they link chronic low-grade inflammation to insulin resistance through activation of the c-Jun N-terminal kinase (JNK) and IκB kinase β (IKKβ)–nuclear factor κB (NF-κB) signaling pathways [15,17]. Oxidative stress biomarkers, particularly 8-hydroxy-2’-deoxyguanosine (8-OHdG) and the reduced-to-oxidized glutathione ratio (GSH/GSSG), were selected based on evidence that systemic oxidative damage precedes the clinical onset of hyperglycemia [18,20]. Mitochondrial-derived peptides, including Humanin and the mitochondrial open reading frame of 12S rRNA type-c (MOTS-c), were included as emerging metabolic regulators that modulate systemic insulin sensitivity [29].
To systematically assess predictive capacity and isolate non-glycemic signals, variables were grouped into seven biomarker panels (Table 2). Three panels captured single biological domains: (i) cholesterol_only (lipid biomarkers: triglycerides, TC, HDL, LDL), (ii) oxidative_only (oxidative stress biomarkers: GSH, GSSG, GSH/GSSG, 8-OHdG), and (iii) inflammatory_only (inflammatory biomarkers: CRP, IL-6, IL-1β, IL-10, MCP-1, IGF-1). Two additional panels evaluated conventional risk markers in isolation: (iv) HbA1c_only (glycated hemoglobin) and (v) age_only.
The remaining panels were defined to contrast non-glycemic biomarker information with more traditional predictors. The all_biomarkers panel combines all non-glycemic biomarkers (lipid, oxidative stress, mitochondrial function, and inflammatory markers), explicitly excluding HbA1c and age, reflecting the predictive capacity of biological biomarkers alone. The all_features panel includes all biomarkers (non-glycemic biomarkers and HbA1c) together with age. Clinical variables such as hypertension status, history of cardiovascular disease, and gender were deliberately excluded from all panels to focus on the predictive contribution of biomarkers and age. Future studies could integrate these clinical and demographic factors into extended risk prediction models.
2.4 Machine learning and statistical analysis
We developed a robust predictive model using Python with libraries including scikit-learn, pandas, statsmodels, XGBoost, LightGBM, and SHAP. The analysis strictly adhered to best practices to avoid data leakage and ensure external validity and generalizability of the findings [34,35]. The dataset was divided into a main set (80% of the data) for initial model development and a holdout set (20%) for final evaluation.
2.4.1 Model evaluation and hyperparameter tuning.
To ensure robust model development and evaluation, we applied nested cross-validation consisting of two levels: an outer five-fold cross-validation for model evaluation and an inner three-fold cross-validation within each outer fold for hyperparameter tuning. Hyperparameters were optimized using grid search (GridSearchCV) by maximizing the Area Under the Receiver Operating Characteristic Curve (AUC). The final performance metrics are the aggregated predictions from each test fold, providing unbiased estimates of generalization performance.
2.4.2 Data processing pipeline.
All preprocessing steps were encapsulated within a scikit-learn pipeline object and applied independently inside each training fold to prevent data leakage [35]. The pipeline included:
- Multicollinearity reduction: A custom transformer iteratively removed the feature with the highest variance inflation factor (VIF) until all remaining features had a VIF below 5.0, resulting in a simpler, more interpretable model free of highly correlated variables [36].
- Standardization: Numeric features were standardized (mean = 0, standard deviation = 1) using StandardScaler, and categorical features encoded with OneHotEncoder.
2.4.3 Model training and selection.
We evaluated four machine learning algorithms: Logistic Regression for interpretability [37], Random Forest for robustness against noise and outliers [38], and LightGBM [39] and XGBoost [40] for computational efficiency and high predictive performance [41]. The pipeline adhered strictly to best practices to prevent data leakage and ensure the generalizability of the model [34,35]. To manage class imbalance, we configured Logistic Regression, Random Forest, and LightGBM to use class_weight = ‘balanced’, while for XGBoost we used an adaptive scale_pos_weight based on the class distribution in each training fold. The hyperparameter grids explored are detailed in Table 3. The final model and the biomarker panel were selected on the basis of the highest mean AUC obtained during cross-validation. The class imbalance between controls and prediabetic subjects was addressed by class weighting, without the application of matching or additional imbalance adjustment techniques.
2.4.4 Final model calibration and interpretation.
The selected final model was re-trained in the entire training set and calibrated using isotonic regression (calibratedClassifierCV). Performance was evaluated on the independent holdout set. Feature importance was assessed using Shapley Additive exPlanations (SHAP), a method that quantifies the contribution of each feature to model predictions [42]. SHAP calculations were based on a background sample of 100 observations, using the general shap.Explainer to automatically select the appropriate explainer type.
3 Results
3.1 Baseline characteristics
The final analytical cohort consisted of 405 controls (74.3%) and 140 individuals with prediabetes (25.7%). The prediabetic group was significantly older (p = 0.002) and exhibited significant differences in markers of oxidative stress (GSSG, p = 0.002; GSH/GSSG ratio, p < 0.001) and mitochondrial function (MOTS-c, p = 0.002; p66Shc, p = 0.001) compared to controls. As expected, the glucose and HbA1c levels in the screen were significantly higher in the prediabetes group (p < 0.001 for both). Hypertension status, cardiovascular disease history, and gender did not differ significantly between groups (all p > 0.05) and therefore were not included as predictors in subsequent modeling, as Table 4 shows.
3.2 Model performance and selection
After applying a multicollinearity threshold (VIF < 5), our analysis identified a LightGBM model using the inflammatory_only biomarker panel as the optimal classifier. It achieved the highest mean area under the curve (AUC = 0.743) across cross-validation folds, outperforming other biomarker combinations (Table 5).
The inflammatory_only (LightGBM) panel produced the highest mean AUC (0.743). Although its performance was not statistically distinguishable from the all_features, all_biomarkers and oxidative_only panels due to overlap of the 95% confidence intervals, all these panels were significantly better than the baseline models such as HbA1c_only and age_only, whose confidence intervals did not overlap with those four top panels. Therefore, the inflammatory_only panel was selected as the final model due to its high predictive accuracy and fewer variables.
We then evaluated the final model on an independent holdout test set, where it demonstrated moderate predictive capacity with an AUC of 0.711 (95% CI: 0.591–0.824; Fig 1A). Detailed performance metrics, including precision, recall, specificity, accuracy, and F1 score, together with their corresponding 95% confidence intervals, are summarized in Table 6. The confusion matrix visualizing the model predictions on the holdout set is provided in Fig 2. The calibration curve indicated an acceptable agreement between the predicted probabilities and the actual observations, although minor deviations suggest potential areas for further improvement (Fig 1B). The relatively wide confidence intervals emphasize the importance of future validation in larger datasets to achieve more precise estimates.
(A) ROC curve (AUC = 0.711). (B) Calibration curve demonstrating acceptable agreement between predicted and observed probabilities.
Although the expanded biomarker panel (VIF < 10) included additional relevant features, it yielded slightly lower overall discriminatory power in the holdout set (AUC = 0.699, 95% CI: 0.585–0.809) than the primary inflammatory panel (AUC = 0.711, 95% CI: 0.591–0.824). While the expanded panel demonstrated higher specificity (0.951) and accuracy (0.807), its significantly lower recall (0.393 vs. 0.500) supports the selection of the more parsimonious inflammatory set for early-stage screening, where identifying a greater proportion of at-risk individuals is a priority (S3 Table and S2 Fig).
3.3 SHAP interpretation of key biomarkers
SHAP feature importance analysis identified IGF-1, IL-10, and CRP as the most influential biomarkers in the final LightGBM model (Fig 3). These biomarkers consistently demonstrated strong predictive contributions, emphasizing the central role of inflammation in the early metabolic disorder associated with prediabetes. In particular, CRP emerged among the top three biomarkers in both SHAP analyses with multicollinearity thresholds of VIF < 5 and VIF < 10 (see Supplementary S1 Fig), highlighting its consistency as a predictive marker. A comparative sensitivity analysis using a multicollinearity threshold (VIF < 10) (see Supplementary S2 Table) underscored the advantage of our focused inflammatory biomarker approach, as the inclusion of additional correlated biomarkers reduced the interpretability and clarity of the feature importance estimates.
(A) Global feature importance, ranking predictors by their mean absolute impact. (B) Beeswarm plot showing the impact of each predictor’s value on the model output for every individual. The analysis highlights IGF-1, IL-10, and CRP as top influential biomarkers.
4 Discussion
This study successfully identified a panel of inflammatory biomarkers (IGF-1, IL-10, and CRP) that shows potential to predict prediabetes without relying on traditional glycemic markers. Although the holdout validation AUC of 0.711 indicates moderate predictive power, its true significance lies in achieving this performance without any glycemic input. This finding establishes that a distinct inflammatory signal is present and can be independently detected in the prediabetic state, offering a fundamentally new axis for early risk assessment and underscoring the biological importance of our findings.
The identification of inflammatory biomarkers underscores the biological significance of our findings. IGF-1 is recognized for its role in metabolic homeostasis and modulation of inflammation [43]. Previous research highlights the potential to target inflammation in metabolic diseases such as diabetes [44]. IL-10 has been associated with the regulation of chronic inflammation in metabolic disorders [45]. Finally, CRP, a general marker of inflammation, has demonstrated predictive capacity for insulin resistance and the subsequent development of diabetes [46]. The performance of this panel aligns well with the hypothesis that chronic inflammation is a primary driver of insulin resistance that defines prediabetes [47].
A key strength of our methodology was the consideration of multicollinearity among variables to improve the interpretability of our results and minimize the effect of correlation. Using a threshold for VIF < 5, our analysis identified a concise and efficient inflammatory panel. In a sensitivity analysis using a less restrictive threshold of VIF < 10, the panel that included all biomarkers produced a mean cross-validation AUC of 0.727 (95% CI: 0.708–0.747; S2 Table), with GSSG, triglycerides, and CRP emerging as the most critical features (S1 Fig). However, this broader panel did not offer a performance advantage over our more focused inflammatory panel (Mean AUC: 0.727 vs. 0.743; S2 Table and S1 Table, respectively). This highlights a classic trade-off: for this dataset, the simpler inflammatory model provided the best balance of predictive accuracy and interpretability.
Regarding the holdout set, the performance metrics of our model highlight its ideal clinical application. Although the AUC was 0.711 and the F1 score was 0.538 in an imbalanced cohort, the interplay between high specificity (0.877) and moderate sensitivity (0.500) is particularly informative. This profile suggests that the optimal role of the model is not as a standalone diagnostic test but as a highly effective screening tool to identify a subset of at-risk individuals who would most benefit from confirmatory glycemic testing. The successful identification of half of actual prediabetic cases using an independent biological signal represents a significant advance for targeted preventive medicine.
4.1 Practical feasibility and implementation considerations
Although identified inflammatory biomarkers (IGF-1, IL-10, and CRP) show promising predictive potential, practical considerations, such as cost, availability, and laboratory requirements, significantly influence their use in routine screening settings. The C-reactive protein (CRP) is the most feasible and practical of the three [48]. However, insulin-like growth factor (IGF-1) testing, although clinically accessible, generally involves higher costs, requires specialized equipment, and presents variability between different assay platforms, thus limiting its widespread implementation [49,50]. The interleukin-10 (IL-10) assay is primarily a research-based test, available only to a limited extent due to its high costs, complex logistics for sample handling, and insufficient standardization, restricting its immediate application in clinical screening [51]. Although these biomarker tests indicate promising predictive accuracy, it should be noted that standard HbA1c tests are generally more cost-effective and widely accessible. Thus, the biomarker panel identified here is expected to complement rather than replace existing diagnostic procedures.
4.2 Limitations and future directions
This study has several limitations, each guiding clear directions for future research:
- Generalizability and cohort diversity: A sample of 545 from a single rural location limits the generalizability of our findings. Biomarker expression can vary substantially between diverse ethnic and geographic populations, highlighting the need for external validation in more extensive and diverse study cohorts. Using a biomarker panel with a Variance Inflation Factor of less than 10 could lead to more accurate predictive abilities in larger datasets.
- Methodological robustness: Due to our relatively small sample size, our findings may be affected by the random number seed used for data partitioning. Future research should replicate the analyses using various randomized seed values to confirm the consistency of biomarkers.
- Biomarkers interactions and interpretability Although SHAP highlights the relative importance of individual biomarkers, it does not fully capture the interactions between them. Future research could incorporate advanced interaction methods, such as SHAP interaction values or alternative explainable techniques, to deepen the understanding of how these biomarkers interact biologically and improve predictive models.
- Confounding variables and model complexity: Our study mainly focused on biomarkers, but variables such as age and other demographic characteristics were notable confounders. Future research should create hybrid predictive models that integrate biomarkers with additional demographic and clinical factors, such as BMI and waist circumference. Methods such as stratification by age groups would further clarify the effects of these biomarkers, regardless of confounders.
- Biomarker dynamics and advanced modeling: Our study was cross-sectional, capturing biomarkers at a specific point in time. Future studies should include longitudinal data to better understand their dynamics. Furthermore, exploring advanced techniques such as the synthetic minority sampling technique (SMOTE) [52], Bayesian optimization [53], and adaptive probability thresholds could improve predictive accuracy.
- Optimal classification thresholds: Our analysis used a standard fixed probability threshold of 0.5. Future work should investigate adaptive thresholds, potentially improving metrics such as the F1 score and recall, particularly in datasets with significant class imbalance.
5 Conclusion
This study introduces a novel biomarker-based approach for the detection of prediabetes, highlighting inflammatory biomarkers (IGF-1, IL-10, and CRP) as promising early indicators, independent of traditional glucose-based methods. Using interpretable machine learning techniques, our model demonstrated promising predictive performance (AUC = 0.711), establishing inflammation as a biological signal of early metabolic dysfunction. Although further validation in larger, diverse populations and practical considerations such as test standardization and cost must be addressed, our findings offer a meaningful step toward precision diagnostics. Ultimately, incorporating inflammation-focused biomarkers into routine screening protocols could facilitate earlier preventive interventions, significantly reducing the risk of progression to type 2 diabetes.
Supporting information
S1 File. Combined supporting information.
This file contains supplementry tables.
https://doi.org/10.1371/journal.pone.0341195.s001
(PDF)
S1 Fig. SHAP analysis of the best model from the
analysis: Shows global feature importance (bar plot) and a SHAP summary beeswarm plot identifying GSSG, Triglyceride, and CRP as the top predictors.
https://doi.org/10.1371/journal.pone.0341195.s002
(TIF)
S2 Fig. Holdout set validation for the best
model: Includes the ROC curve, calibration plot, and confusion matrix for the expanded all-biomarker panel.
https://doi.org/10.1371/journal.pone.0341195.s003
(TIF)
References
- 1.
International Diabetes Federation. IDF Diabetes Atlas 11th edition: Global estimates 589 million adults with diabetes and 635 million with impaired glucose tolerance (prediabetes). 2025. https://diabetesatlas.org/resources/idf-diabetes-atlas-2025/
- 2. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119. pmid:34879977
- 3. Chung WK, Erion K, Florez JC, Hattersley AT, Hivert M-F, Lee CG, et al. Precision medicine in diabetes: a Consensus Report from the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetologia. 2020;63(9):1671–93. pmid:32556613
- 4. Tabák AG, Herder C, Rathmann W, Brunner EJ, Kivimäki M. Prediabetes: a high-risk state for diabetes development. Lancet. 2012;379(9833):2279–90. pmid:22683128
- 5. American Diabetes Association Professional Practice Committee. 2. Diagnosis and classification of diabetes: standards of care in diabetes-2024. Diabetes Care. 2024;47(Suppl 1):S20–42. pmid:38078589
- 6. Liu J, Grundy SM, Wang W, Smith SC Jr, Vega GL, Wu Z, et al. Ten-year risk of cardiovascular incidence related to diabetes, prediabetes, and the metabolic syndrome. Am Heart J. 2007;153(4):552–8. pmid:17383293
- 7. Di Pino A, Urbano F, Piro S, Purrello F, Rabuazzo AM. Update on pre-diabetes: focus on diagnostic criteria and cardiovascular risk. World J Diabetes. 2016;7(18):423–32. pmid:27795816
- 8. Hostalek U. Global epidemiology of prediabetes - present and future perspectives. Clin Diabetes Endocrinol. 2019;5:5. pmid:31086677
- 9. Bergman M, Abdul-Ghani M, Neves JS, Monteiro MP, Medina JL, Dorcely B, et al. Pitfalls of HbA1c in the diagnosis of diabetes. The Journal of Clinical Endocrinology & Metabolism. 2020;105(8):2803–11.
- 10. Mostafa SA, Davies MJ, Srinivasan BT, Carey ME, Webb D, Khunti K. Should glycated haemoglobin (HbA1c) be used to detect people with type 2 diabetes mellitus and impaired glucose regulation?. Postgraduate Medical Journal. 2010;86(1021):656–62.
- 11. Staimez LR, Kipling LM, Nina Ham J, Legvold BT, Jackson SL, Wilson PWF, et al. Potential misclassification of diabetes and prediabetes in the U.S.: mismatched HbA1c and glucose in NHANES 2005-2016. Diabetes Res Clin Pract. 2022;189:109935. pmid:35662612
- 12. Grundy SM, Stone NJ, Bailey AL, et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol. Circulation. 2019;139(25):e1082–143.
- 13. Wagner R, Heni M, Tabák AG, Machann J, Schick F, Randrianarisoa E, et al. Pathophysiology-based subphenotyping of individuals at elevated risk for type 2 diabetes. Nat Med. 2021;27(1):49–57. pmid:33398163
- 14. Abbasi A, Sahlqvist A-S, Lotta L, Brosnan JM, Vollenweider P, Giabbanelli P, et al. A systematic review of biomarkers and risk of incident type 2 diabetes: an overview of epidemiological, prediction and aetiological research literature. PLoS One. 2016;11(10):e0163721. pmid:27788146
- 15. Wellen KE, Hotamisligil GS. Inflammation, stress, and diabetes. J Clin Invest. 2005;115(5):1111–9. pmid:15864338
- 16. Hotamisligil GS. Inflammation, metaflammation and immunometabolic disorders. Nature. 2017;542(7640):177–85. pmid:28179656
- 17. Donath MY, Shoelson SE. Type 2 diabetes as an inflammatory disease. Nat Rev Immunol. 2011;11(2):98–107. pmid:21233852
- 18. Marrocco I, Altieri F, Peluso I. Measurement and clinical significance of biomarkers of oxidative stress in humans. Oxid Med Cell Longev. 2017;2017:6501046. pmid:28698768
- 19. Odegaard AO, Jacobs DR Jr, Sanchez OA, Goff DC Jr, Reiner AP, Gross MD. Oxidative stress, inflammation, endothelial dysfunction and incidence of type 2 diabetes. Cardiovasc Diabetol. 2016;15:51. pmid:27013319
- 20. An Y, Xu B-T, Wan S-R, Ma X-M, Long Y, Xu Y, et al. The role of oxidative stress in diabetes mellitus-induced vascular endothelial dysfunction. Cardiovasc Diabetol. 2023;22(1):237. pmid:37660030
- 21. Kavakiotis I, Tsave O, Salifoglou A. Machine Learning and Data Mining Methods in Diabetes Research. Computational and Structural Biotechnology Journal. 2017;15:104–16.
- 22. Jelinek HF, Stranieri A, Yatsko A, Venkatraman S. Data analytics identify glycated haemoglobin co-markers for type 2 diabetes mellitus diagnosis. Comput Biol Med. 2016;75:90–7. pmid:27268735
- 23. Zueger T, Schallmoser S, Kraus M, Saar-Tsechansky M, Feuerriegel S, Stettler C. Machine Learning for Predicting the Risk of Transition from Prediabetes to Diabetes. Diabetes Technol Ther. 2022;24(11):842–7. pmid:35848962
- 24. Zhang X, Yao W, Wang D. Development and validation of machine learning models for identifying prediabetes and diabetes in normoglycemia. Diabetes/Metabolism Research and Reviews. 2024;40(8):e70003.
- 25. Yousef H. Exploratory machine learning prediction of diabetes risk using novel biomarkers. Scientific Reports. 2024;14:14409.
- 26. Bader M, Abdelwanis M, Maalouf M, Jelinek HF. Detecting depression severity using weighted random forest and oxidative stress biomarkers. Sci Rep. 2024;14(1):16328. pmid:39009760
- 27. Abdelwanis M, Moawad K, Mohammed S, Hummieda A, Syed S, Maalouf M, et al. Sequential classification approach for enhancing the assessment of cardiac autonomic neuropathy. Comput Biol Med. 2025;190:109999. pmid:40112561
- 28. Salahudeen T, Maalouf M, Elfadel IAM, Jelinek HF. Predicting depression severity using machine learning models: Insights from mitochondrial peptides and clinical factors. PLoS One. 2025;20(5):e0320955. pmid:40367215
- 29. Lee C, Yen K, Cohen P. Humanin: a harbinger of mitochondrial-derived peptides?. Trends in Endocrinology & Metabolism. 2013;24(5):222–8.
- 30. Cai H, Liu Y, Men H, Zheng Y. Protective mechanism of humanin against oxidative stress in aging-related cardiovascular diseases. Front Endocrinol (Lausanne). 2021;12:683151. pmid:34177809
- 31. Tian X, Wang L, Zhong L, Zhang K, Ge X, Luo Z, et al. The research progress and future directions in the pathophysiological mechanisms of type 2 diabetes mellitus from the perspective of precision medicine. Front Med (Lausanne). 2025;12:1555077. pmid:40109716
- 32. F Jelinek H, Wilding C, Tinely P. An innovative multi-disciplinary diabetes complications screening program in a rural community: a description and preliminary results of the screening. Australian Journal of Primary Health. 2006;12(1):14–20.
- 33. Jelinek H, Yatsko A, Stranieri A, Venkatraman S. Novel data mining techniques for incomplete clinical data in diabetes management. BJAST. 2014;4(33):4591–606.
- 34. Kapoor S, Narayanan A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns (N Y). 2023;4(9):100804. pmid:37720327
- 35. Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One. 2019;14(11):e0224365. pmid:31697686
- 36. Midi H, Sarkar SK, Rana S. Collinearity diagnostics of binary logistic regression model. Journal of Interdisciplinary Mathematics. 2010;13(3):253–67.
- 37. Maalouf M, Siddiqi M. Weighted logistic regression for large-scale imbalanced and rare events data. Knowledge-Based Systems. 2014;59:142–8.
- 38. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32.
- 39. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Adv Neural Inf Process Syst. 2017; 30:3146–54.
- 40.
Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. p. 785–94.
- 41. Al Nuaimi H, Abdelmagid M, Bouabid A. Classification of WatSan technologies using machine learning techniques. Water. 2023;15(15):2829.
- 42. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4768–77.
- 43. Friedrich N, Thuesen B, Jørgensen T. Insulin-like growth factor-I and risk of type 2 diabetes mellitus: a systematic review and meta-analysis. Diabetes Care. 2012;35(1):199–209.
- 44. Donath MY. Targeting inflammation in the treatment of type 2 diabetes: time to start. Nat Rev Drug Discov. 2014;13(6):465–76. pmid:24854413
- 45. Esposito K, Nappo F, Giugliano F, Di Palo C, Ciotola M, Barbieri M, et al. Cytokine milieu tends toward inflammation in type 2 diabetes. Diabetes Care. 2003;26(5):1647. pmid:12716849
- 46. Festa A, D’Agostino R Jr, Tracy RP, Haffner SM. Elevated levels of acute-phase proteins and plasminogen activator inhibitor-1 predict the development of type 2 diabetes. Diabetes. 2002;51(4):1131–7.
- 47. Matulewicz N, Karczewska-Kupczewska M. Insulin resistance and chronic inflammation. Postepy Higieny i Medycyny Doswiadczalnej. 2016;70:1245–58.
- 48. Oppong R, Jit M, Smith RD, Butler CC, Melbye H, Mölstad S, et al. Cost-effectiveness of point-of-care C-reactive protein testing to inform antibiotic prescribing decisions. Br J Gen Pract. 2013;63(612):e465-71. pmid:23834883
- 49. Lee JKY, Cradic K, Singh RJ, Jones J, Li J. Discordance of insulin-like growth factor-1 results and interpretation on four different platforms. Clin Chim Acta. 2023;539:130–3. pmid:36528048
- 50. Huang R, Shi J, Wei R, Li J. Challenges of insulin-like growth factor-1 testing. Crit Rev Clin Lab Sci. 2024;61(5):388–403. pmid:38323343
- 51. Minshawi F, Lanvermann S, McKenzie E. The generation of an engineered interleukin-10 protein with improved stability and biological function. Frontiers in Immunology. 2020;11:1794.
- 52. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. jair. 2002;16:321–57.
- 53. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Adv Neural Inf Process Syst. 2012;25:2951–9.