An inflammatory biomarker panel for prediabetes classification using interpretable machine learning

Maher Maalouf; Maram Tammam; Sana Kurungadan; Asmaa Alsereidi; Muhammad Afzal; Herbert F. Jelinek

doi:10.1371/journal.pone.0341195

Abstract

Objective

Prediabetes is a silent condition that often goes undetected, although timely interventions could prevent its progression to type 2 diabetes. Traditional glycemic markers, such as hemoglobin A1c (HbA1c), have limitations, creating a need for new diagnostic biomarkers. In this study, our objective was to develop an interpretable machine learning model using biomarkers related to oxidative stress, inflammation, and lipid metabolism to classify prediabetes independently of these glycemic markers. We also compared multiple biomarker panels to determine which biomarkers offer the highest predictive accuracy.

Methods

We developed and validated interpretable machine learning models using clinical and biomarker data from 545 participants (405 healthy controls and 140 with prediabetes). To ensure robust and generalizable findings, we employed a nested cross-validation technique, managed feature collinearity using the variance inflation factor (VIF), and interpreted the final model with Shapley Additive exPlanations (SHAP) [Kapoor S, Narayanan A. Patterns. 4(9):100804 (2023); Vabalas A, et al. PLoS One. 14(11):e0224365 (2019); Lundberg SM, Lee SI. Adv Neural Inf Process Syst. 30:4768–77 (2017)].

Results

Our approach identified a distinct panel of inflammatory biomarkers (IL-10, IGF-1, and CRP) capable of classifying prediabetes independently of traditional glycemic markers. This non-glycemic model achieved a promising Area Under the Curve (AUC) of 0.711 on holdout validation, establishing inflammation as a key and measurable indicator of early metabolic dysfunction.

Conclusion

Our findings introduce a novel panel of inflammatory biomarkers that show promise in the identification of prediabetes independently of traditional glucose-based measures. By highlighting inflammation as an early indicator of metabolic dysfunction, this approach may enhance precision in the detection of prediabetes. Longitudinal studies with larger and more diverse populations are essential to clinically validate these biomarkers and confirm their value in improving the early diagnosis and management of metabolic health.

Citation: Maalouf M, Tammam M, Kurungadan S, Alsereidi A, Afzal M, Jelinek HF (2026) An inflammatory biomarker panel for prediabetes classification using interpretable machine learning. PLoS One 21(3): e0341195. https://doi.org/10.1371/journal.pone.0341195

Editor: Aleksandra Klisic, University of Montenegro-Faculty of Medicine, MONTENEGRO

Received: October 20, 2025; Accepted: January 3, 2026; Published: March 16, 2026

Copyright: © 2026 Maalouf et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data used in this study are publicly available in the Mendeley Data repository at https://data.mendeley.com/datasets/x8z62gkhhw/1.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Prediabetes is a common metabolic disorder characterized by elevated blood glucose levels below the threshold for the diagnosis of type 2 diabetes mellitus (T2DM). Almost 590 million adults worldwide live with diabetes and more than 630 million people are estimated to have prediabetes. This underscores a major public health crisis and a critical window for early intervention and prevention [1–3]. Despite its health consequences, including the increased risks of cardiovascular disease, kidney disease, and neuropathy, prediabetes is often undiagnosed because it usually occurs without overt symptoms [4–8].

Traditionally, the diagnosis of prediabetes is based mainly on fasting plasma glucose (FPG) and glycated hemoglobin (HbA1c) tests. However, these methods may not reliably detect early metabolic changes and can sometimes misclassify individuals [9–11]. Therefore, researchers are increasingly exploring alternative diagnostic approaches, such as the evaluation of cardiovascular risk factors and patient subphenotypes, to improve predictive accuracy and early risk identification [12,13].

Recent research has highlighted the importance of biomarkers associated with early metabolic disturbances, independent of traditional glucose markers. Systematic reviews highlight ongoing efforts to identify biomarkers that could improve our understanding and early detection of type 2 diabetes [14]. Researchers now recognize biological processes, such as chronic inflammation, mitochondrial dysfunction, and oxidative stress, as crucial early disruptions that precede overt glycemic disorders [15–19]. Biomarkers related to these processes, mainly markers of oxidative DNA damage, have the potential for early detection and intervention in metabolic disorders [20].

Machine learning (ML), especially interpretable algorithms, provides a powerful approach to identifying complex interactions among biomarkers related to prediabetes [21]. However, current studies based on ML often rely heavily on traditional glucose-based biomarkers, restricting the ability to uncover novel metabolic markers [22–24].

Recent studies in predictive modeling have emphasized the predictive potential of biomarkers. For example, these biomarkers have been applied to directly predict the risk of diabetes using machine learning [25]. In addition, these techniques have been shown to be more comprehensive in their ability to evaluate disease progression using inflammatory and related markers in other chronic conditions that frequently coexist with metabolic disorders, such as depression [26] and cardiac autonomic neuropathy [27].

To address this gap, our study uses a targeted ML approach specifically designed to evaluate the predictive value of non-glycemic biomarkers. We systematically examine biomarkers associated with mitochondrial function, inflammation, and oxidative stress, which have previously been shown to be relevant for the prediction of chronic disease [28–30]. By applying robust statistical methods and interpretable machine learning approaches (SHAP), our objective is to identify biomarkers that predict prediabetes independently of HbA1c. Clarifying the independent predictive value of these biomarkers could significantly improve early risk identification and enable more targeted preventive interventions in clinical practice [31].

2 Methods

2.1 Study design and data source

This study is a secondary analysis of data from the DiabHealth rural diabetes screening clinic, which prospectively collected participant data between 2002 and 2015. The original data collection received full ethical approval from the Charles Sturt University Human Research Ethics Committee (CSU HREC; protocol 2006/042), and all participants provided written informed consent [32]. For the present research, the pre-existing, de-identified dataset was accessed in January 2025 solely for statistical analysis; no participant contact, recruitment, or data collection occurred after 2015. Data were cleaned for the parent study using established protocols described by Jelinek et al. [33], yielding a complete analytic dataset for the variables examined here.

2.2 Study population and outcome definition

The analytical cohort for this study was derived from the parent DiabHealth dataset, which initially included 847 participants. From this cohort, individuals with a prior diagnosis of diabetes were first excluded, resulting in a sample of 604 individuals. To isolate the prediabetic state and minimize the confounding effects associated with advanced age, we applied additional exclusion criteria: participants with fasting glucose levels ≥ 7.0 mmol/L (indicative of previously undiagnosed diabetes) or those over 85 years of age were removed. This final selection process yielded a study cohort of 545 participants. Prediabetes was defined according to the criteria of the American Diabetes Association (ADA) [5]: participants with fasting screening glucose between 5.6 and 6.9 mmol/L (inclusive) were classified as ‘prediabetes’ (n = 140), and those with levels less than 5.6 mmol/L were classified as ‘control’ (n = 405). Baseline characteristics were compared between groups using Welch’s t-test for continuous variables and the χ² test for categorical variables, implemented using Python’s SciPy library.
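The outcome rule and the two group comparisons described above can be sketched as follows. This is a minimal illustration on synthetic numbers, not the study data: the function name label_from_glucose, the simulated ages, and the 2×2 hypertension table are hypothetical and only show which SciPy calls match the stated tests.

```python
import numpy as np
from scipy import stats


def label_from_glucose(fpg_mmol_l):
    """Map fasting glucose (mmol/L) to the study's outcome label.

    Returns None for values >= 7.0, which the study excluded as
    possible undiagnosed diabetes.
    """
    if fpg_mmol_l >= 7.0:
        return None
    return "prediabetes" if fpg_mmol_l >= 5.6 else "control"


rng = np.random.default_rng(0)

# Synthetic stand-in ages for 405 controls and 140 prediabetes cases.
age_control = rng.normal(60, 12, size=405)
age_prediabetes = rng.normal(64, 11, size=140)

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, p_welch = stats.ttest_ind(age_prediabetes, age_control, equal_var=False)

# Chi-square test of independence on a hypothetical 2x2 table.
# Rows: control, prediabetes. Columns: no hypertension, hypertension.
table = np.array([[230, 175],
                  [72, 68]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"Welch p = {p_welch:.4f}, chi-square p = {p_chi2:.4f}, dof = {dof}")
```

Note that chi2_contingency applies Yates' continuity correction to 2×2 tables by default; whether the authors kept that default is not stated.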

2.3 Biomarker panels for analysis

The biomarkers in this study were selected based on their established involvement in metabolic disorders and their potential predictive value for T2DM (Table 1). The biomarker panel included biomarkers of lipid metabolism (triglycerides, total cholesterol [TC], high-density lipoprotein [HDL], low-density lipoprotein [LDL]), oxidative stress (GSH, GSSG, GSH/GSSG, 8-OHdG), mitochondrial function (Humanin, MOTS-c, p66Shc), inflammation (CRP, IL-6, IL-1β, IL-10, MCP-1, IGF-1), the glycemic marker HbA1c, and the demographic variable age.

Table 1. Description of biomarkers and clinical features used in the study.

https://doi.org/10.1371/journal.pone.0341195.t001

Inflammatory biomarkers such as interleukin-6 (IL-6), C-reactive protein (CRP), and interleukin-1β (IL-1β) were included because they link chronic low-grade inflammation to insulin resistance through activation of the c-Jun N-terminal kinase (JNK) and IκB kinase β (IKKβ)–nuclear factor κB (NF-κB) signaling pathways [15,17]. Oxidative stress biomarkers, particularly 8-hydroxy-2’-deoxyguanosine (8-OHdG) and the reduced-to-oxidized glutathione ratio (GSH/GSSG), were selected based on evidence that systemic oxidative damage precedes the clinical onset of hyperglycemia [18,20]. Mitochondrial-derived peptides, including Humanin and the mitochondrial open reading frame of 12S rRNA type-c (MOTS-c), were included as emerging metabolic regulators that modulate systemic insulin sensitivity [29].

To systematically assess predictive capacity and isolate non-glycemic signals, variables were grouped into seven biomarker panels (Table 2). Three panels captured single biological domains: (i) cholesterol_only (lipid biomarkers: triglycerides, TC, HDL, LDL), (ii) oxidative_only (oxidative stress biomarkers: GSH, GSSG, GSH/GSSG, 8-OHdG), and (iii) inflammatory_only (inflammatory biomarkers: CRP, IL-6, IL-1β, IL-10, MCP-1, IGF-1). Two additional panels evaluated conventional risk markers in isolation: (iv) HbA1c_only (glycated hemoglobin) and (v) age_only.

Table 2. Biomarker panels used in analysis.

https://doi.org/10.1371/journal.pone.0341195.t002

The remaining panels were defined to contrast non-glycemic biomarker information with more traditional predictors. The all_biomarkers panel combines all non-glycemic biomarkers (lipid, oxidative stress, mitochondrial function, and inflammatory markers), explicitly excluding HbA1c and age, reflecting the predictive capacity of biological biomarkers alone. The all_features panel includes all biomarkers (non-glycemic biomarkers and HbA1c) together with age. Clinical variables such as hypertension status, history of cardiovascular disease, and gender were deliberately excluded from all panels to focus on the predictive contribution of biomarkers and age. Future studies could integrate these clinical and demographic factors into extended risk prediction models.
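One way to express the seven panels as a configuration object is shown below. The panel names come from the text; the column names (for example "GSH_GSSG_ratio" and "IL-1b") are hypothetical placeholders, since the dataset's actual column labels are not given here.

```python
# Feature groups by biological domain (column names are assumed placeholders).
LIPID = ["triglycerides", "TC", "HDL", "LDL"]
OXIDATIVE = ["GSH", "GSSG", "GSH_GSSG_ratio", "8-OHdG"]
MITOCHONDRIAL = ["Humanin", "MOTS-c", "p66Shc"]
INFLAMMATORY = ["CRP", "IL-6", "IL-1b", "IL-10", "MCP-1", "IGF-1"]

# All non-glycemic biomarkers: excludes HbA1c and age by construction.
ALL_BIOMARKERS = LIPID + OXIDATIVE + MITOCHONDRIAL + INFLAMMATORY

PANELS = {
    "cholesterol_only": LIPID,
    "oxidative_only": OXIDATIVE,
    "inflammatory_only": INFLAMMATORY,
    "HbA1c_only": ["HbA1c"],
    "age_only": ["age"],
    "all_biomarkers": ALL_BIOMARKERS,
    "all_features": ALL_BIOMARKERS + ["HbA1c", "age"],
}

for name, columns in PANELS.items():
    print(f"{name}: {len(columns)} features")
```

Keeping the panels in a single mapping makes it straightforward to loop the same modeling pipeline over every panel and compare cross-validated AUCs.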

2.4 Machine learning and statistical analysis

We developed a robust predictive model using Python with libraries including scikit-learn, pandas, statsmodels, XGBoost, LightGBM, and SHAP. The analysis strictly adhered to best practices to avoid data leakage and ensure external validity and generalizability of the findings [34,35]. The dataset was divided into a main set (80% of the data) for initial model development and a holdout set (20%) for final evaluation.

2.4.1 Model evaluation and hyperparameter tuning.

To ensure robust model development and evaluation, we applied nested cross-validation with two levels: an outer five-fold cross-validation for model evaluation and an inner three-fold cross-validation within each outer fold for hyperparameter tuning. Hyperparameters were optimized by grid search (GridSearchCV), maximizing the Area Under the Receiver Operating Characteristic Curve (AUC). The final performance metrics were computed from the predictions aggregated across the outer test folds, providing unbiased estimates of generalization performance.
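The split and nested cross-validation scheme can be sketched with scikit-learn as below. This is an illustrative sketch on synthetic data, not the authors' code: the stratified split, the random seeds, the Logistic Regression stand-in, and the small grid over C are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_predict,
    train_test_split,
)

# Synthetic stand-in for the cohort: 545 rows, roughly 26% positives.
X, y = make_classification(
    n_samples=545, n_features=6, n_informative=3,
    weights=[0.743], random_state=0,
)

# 80% main set for development, 20% holdout (stratification is assumed).
X_main, X_hold, y_main, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: grid search that maximizes ROC AUC.
search = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
    scoring="roc_auc",
    cv=inner_cv,
)

# Outer loop: each outer fold reruns the whole grid search on its own
# training part, so tuning never sees that fold's test rows.
oof_proba = cross_val_predict(
    search, X_main, y_main, cv=outer_cv, method="predict_proba"
)[:, 1]

# AUC computed from predictions aggregated across the outer test folds.
nested_auc = roc_auc_score(y_main, oof_proba)
print(f"Nested CV AUC (aggregated predictions): {nested_auc:.3f}")
```

The holdout arrays X_hold and y_hold are deliberately left untouched here; they are reserved for the final evaluation.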

2.4.2 Data processing pipeline.

All preprocessing steps were encapsulated within a scikit-learn pipeline object and applied independently inside each training fold to prevent data leakage [35]. The pipeline included:

Multicollinearity reduction: A custom transformer iteratively removed the feature with the highest variance inflation factor (VIF) until all remaining features had a VIF below 5.0, resulting in a simpler, more interpretable model free of highly correlated variables [36].
Standardization: Numeric features were standardized (mean = 0, standard deviation = 1) using StandardScaler, and categorical features were encoded with OneHotEncoder.
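The two preprocessing steps above could be wired into a pipeline roughly as follows. The class name VIFSelector and the helper vif_values are hypothetical, and the VIF is computed here from the diagonal of the inverse correlation matrix, which equals 1/(1 − R²) for a regression with an intercept; the authors' transformer may differ in detail (for example by using statsmodels).

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def vif_values(X):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


class VIFSelector(BaseEstimator, TransformerMixin):
    """Drop the highest-VIF column repeatedly until every VIF is below threshold."""

    def __init__(self, threshold=5.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        keep = list(range(X.shape[1]))
        while len(keep) > 1:
            vifs = vif_values(X[:, keep])
            worst = int(np.argmax(vifs))
            if vifs[worst] < self.threshold:
                break
            del keep[worst]
        self.keep_ = keep  # indices of the retained columns
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float)[:, self.keep_]


# Synthetic demo: column c is almost a linear combination of a and b.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + b + rng.normal(scale=0.05, size=300)
d = rng.normal(size=300)
X = np.column_stack([a, b, c, d])
y = (a + d + rng.normal(size=300) > 0).astype(int)

pipe = Pipeline([
    ("vif", VIFSelector(threshold=5.0)),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced")),
])
pipe.fit(X, y)

kept = pipe.named_steps["vif"].keep_
print("Retained column indices:", kept)
```

Because the selector is a pipeline step, it is refit on each training fold during cross-validation, so the set of retained features is never informed by test rows.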

2.4.3 Model training and selection.

We evaluated four machine learning algorithms: Logistic Regression for interpretability [37], Random Forest for robustness against noise and outliers [38], and LightGBM [39] and XGBoost [40] for computational efficiency and high predictive performance [41]. Class imbalance between controls and prediabetic participants was addressed by class weighting alone, without matching or additional imbalance adjustment techniques: Logistic Regression, Random Forest, and LightGBM used class_weight = ‘balanced’, while XGBoost used an adaptive scale_pos_weight based on the class distribution in each training fold. The hyperparameter grids explored are detailed in Table 3. The final model and biomarker panel were selected on the basis of the highest mean AUC obtained during cross-validation.
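To make the two weighting schemes concrete, the short sketch below computes them for the full-cohort class counts (405 controls, 140 prediabetes). In the study the XGBoost weight was recomputed per training fold, so the exact values would vary slightly; the negative-to-positive ratio used for scale_pos_weight is the conventional choice and is assumed here.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Labels with the cohort's class counts: 0 = control, 1 = prediabetes.
y = np.array([0] * 405 + [1] * 140)

# 'balanced' weights: n_samples / (n_classes * class_count).
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)

# Conventional XGBoost setting: ratio of negatives to positives.
scale_pos_weight = (y == 0).sum() / (y == 1).sum()

print(f"balanced weights: control={weights[0]:.3f}, prediabetes={weights[1]:.3f}")
print(f"scale_pos_weight: {scale_pos_weight:.3f}")
print(f"ratio of balanced weights: {weights[1] / weights[0]:.3f}")
```

The ratio of the two balanced weights equals scale_pos_weight, so both schemes up-weight the minority class by the same relative factor.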

Table 3. Hyperparameter grids used for optimization.

https://doi.org/10.1371/journal.pone.0341195.t003

2.4.4 Final model calibration and interpretation.

The selected final model was retrained on the entire training set and calibrated using isotonic regression (CalibratedClassifierCV). Performance was evaluated on the independent holdout set. Feature importance was assessed using Shapley Additive exPlanations (SHAP), a method that quantifies the contribution of each feature to model predictions [42]. SHAP calculations were based on a background sample of 100 observations, using the general shap.Explainer to automatically select the appropriate explainer type.
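A minimal sketch of the calibration step follows. To keep it dependent on scikit-learn only, GradientBoostingClassifier stands in for the tuned LightGBM model; the synthetic data, the five-fold internal calibration split (cv=5), and the seeds are assumptions rather than details reported in the text.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort (not the study data).
X, y = make_classification(
    n_samples=545, n_features=6, n_informative=3,
    weights=[0.743], random_state=1,
)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Stand-in classifier; the study's final model was LightGBM.
base = GradientBoostingClassifier(random_state=1)

# Isotonic calibration fitted with internal cross-validation on the
# training set only, so the holdout stays untouched until evaluation.
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_hold)[:, 1]
holdout_auc = roc_auc_score(y_hold, proba)
brier = brier_score_loss(y_hold, proba)
print(f"Holdout AUC = {holdout_auc:.3f}, Brier score = {brier:.3f}")
```

The Brier score is included only as a convenient single-number summary of calibration; the paper itself reports a calibration curve.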

3 Results

3.1 Baseline characteristics

The final analytical cohort consisted of 405 controls (74.3%) and 140 individuals with prediabetes (25.7%) (Table 4). The prediabetic group was significantly older (p = 0.002) and exhibited significant differences in markers of oxidative stress (GSSG, p = 0.002; GSH/GSSG ratio, p < 0.001) and mitochondrial function (MOTS-c, p = 0.002; p66Shc, p = 0.001) compared to controls. As expected, screening glucose and HbA1c levels were significantly higher in the prediabetes group (p < 0.001 for both). Hypertension status, cardiovascular disease history, and gender did not differ significantly between groups (all p > 0.05) and therefore were not included as predictors in subsequent modeling.

Table 4. Baseline characteristics of the study population (n = 545).

https://doi.org/10.1371/journal.pone.0341195.t004

3.2 Model performance and selection

After applying a multicollinearity threshold (VIF < 5), our analysis identified a LightGBM model using the inflammatory_only biomarker panel as the optimal classifier. It achieved the highest mean area under the curve (AUC = 0.743) across cross-validation folds, outperforming other biomarker combinations (Table 5).

Table 5. Performance summary of the best model for each biomarker panel (VIF < 5).

https://doi.org/10.1371/journal.pone.0341195.t005

The inflammatory_only (LightGBM) panel produced the highest mean AUC (0.743). Although its performance was not statistically distinguishable from the all_features, all_biomarkers and oxidative_only panels due to overlap of the 95% confidence intervals, all these panels were significantly better than the baseline models such as HbA1c_only and age_only, whose confidence intervals did not overlap with those four top panels. Therefore, the inflammatory_only panel was selected as the final model due to its high predictive accuracy and fewer variables.

We then evaluated the final model on an independent holdout test set, where it demonstrated moderate predictive capacity with an AUC of 0.711 (95% CI: 0.591–0.824; Fig 1A). Detailed performance metrics, including precision, recall, specificity, accuracy, and F1 score, together with their corresponding 95% confidence intervals, are summarized in Table 6. The confusion matrix visualizing the model predictions on the holdout set is provided in Fig 2. The calibration curve indicated an acceptable agreement between the predicted probabilities and the actual observations, although minor deviations suggest potential areas for further improvement (Fig 1B). The relatively wide confidence intervals emphasize the importance of future validation in larger datasets to achieve more precise estimates.
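The text does not state how the 95% confidence intervals were obtained. A percentile bootstrap over the holdout rows is one common choice, and the sketch below shows how such an interval could be computed under that assumption. The function name bootstrap_auc_ci and the synthetic scores are hypothetical; the 81/28 class composition is what a stratified 20% split of this cohort would give.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # AUC is undefined if a resample has a single class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, y_score), lower, upper


# Synthetic holdout-sized example: 81 controls and 28 cases.
rng = np.random.default_rng(1)
y_true = np.array([0] * 81 + [1] * 28)
y_score = np.where(
    y_true == 1, rng.normal(0.6, 1.0, 109), rng.normal(0.0, 1.0, 109)
)

auc, lower, upper = bootstrap_auc_ci(y_true, y_score, n_boot=500)
print(f"AUC = {auc:.3f} (95% CI: {lower:.3f} to {upper:.3f})")
```

With only about a hundred holdout rows the interval is necessarily wide, which is consistent with the caution expressed above about the precision of the estimates.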

Fig 1. Final validation of the inflammatory panel model in the holdout test set.

(A) ROC curve (AUC = 0.711). (B) Calibration curve demonstrating acceptable agreement between predicted and observed probabilities.

https://doi.org/10.1371/journal.pone.0341195.g001

Table 6. Updated classification metrics of the final model on the holdout test set.

https://doi.org/10.1371/journal.pone.0341195.t006

Fig 2. Confusion matrix of the final LightGBM model on the holdout test set.

https://doi.org/10.1371/journal.pone.0341195.g002

Although the expanded biomarker panel (VIF < 10) included additional relevant features, it yielded slightly lower overall discriminatory power in the holdout set (AUC = 0.699, 95% CI: 0.585–0.809) than the primary inflammatory panel (AUC = 0.711, 95% CI: 0.591–0.824). While the expanded panel demonstrated higher specificity (0.951) and accuracy (0.807), its substantially lower recall (0.393 vs. 0.500) supports the selection of the more parsimonious inflammatory set for early-stage screening, where identifying a greater proportion of at-risk individuals is a priority (S3 Table and S2 Fig).

3.3 SHAP interpretation of key biomarkers

SHAP feature importance analysis identified IGF-1, IL-10, and CRP as the most influential biomarkers in the final LightGBM model (Fig 3). These biomarkers consistently demonstrated strong predictive contributions, emphasizing the central role of inflammation in the early metabolic disorder associated with prediabetes. In particular, CRP emerged among the top three biomarkers in both SHAP analyses with multicollinearity thresholds of VIF < 5 and VIF < 10 (see Supplementary S1 Fig), highlighting its consistency as a predictive marker. A comparative sensitivity analysis using a multicollinearity threshold (VIF < 10) (see Supplementary S2 Table) underscored the advantage of our focused inflammatory biomarker approach, as the inclusion of additional correlated biomarkers reduced the interpretability and clarity of the feature importance estimates.
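For reference, the quantities behind this ranking can be written out. The notation is ours rather than the paper's: F is the set of features in the panel, f_S(x) is the model's expected output when only the features in S are known, and x^{(i)} is the i-th of n explained observations.

```latex
% Shapley value of feature j for observation x
\phi_j(x) = \sum_{S \subseteq F \setminus \{j\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl[ f_{S \cup \{j\}}(x) - f_S(x) \Bigr]

% Additivity: the contributions sum to the prediction
f(x) = \phi_0 + \sum_{j \in F} \phi_j(x)

% Global importance: mean absolute SHAP value
I_j = \frac{1}{n} \sum_{i=1}^{n} \bigl| \phi_j\bigl(x^{(i)}\bigr) \bigr|
```

Here \phi_0 is the baseline (expected) model output, and I_j is the mean absolute impact by which global importance plots rank the predictors.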

Fig 3. SHAP analysis of the final LightGBM model.

(A) Global feature importance, ranking predictors by their mean absolute impact. (B) Beeswarm plot showing the impact of each predictor’s value on the model output for every individual. The analysis highlights IGF-1, IL-10, and CRP as top influential biomarkers.

https://doi.org/10.1371/journal.pone.0341195.g003

4 Discussion

This study successfully identified a panel of inflammatory biomarkers (IGF-1, IL-10, and CRP) that shows potential to predict prediabetes without relying on traditional glycemic markers. Although the holdout validation AUC of 0.711 indicates moderate predictive power, its true significance lies in achieving this performance without any glycemic input. This finding establishes that a distinct inflammatory signal is present and can be independently detected in the prediabetic state, offering a fundamentally new axis for early risk assessment and underscoring the biological importance of our findings.

The identification of inflammatory biomarkers underscores the biological significance of our findings. IGF-1 is recognized for its role in metabolic homeostasis and modulation of inflammation [43]. Previous research highlights the potential to target inflammation in metabolic diseases such as diabetes [44]. IL-10 has been associated with the regulation of chronic inflammation in metabolic disorders [45]. Finally, CRP, a general marker of inflammation, has demonstrated predictive capacity for insulin resistance and the subsequent development of diabetes [46]. The performance of this panel aligns well with the hypothesis that chronic inflammation is a primary driver of insulin resistance that defines prediabetes [47].

A key strength of our methodology was the consideration of multicollinearity among variables to improve the interpretability of our results and minimize the effect of correlation. Using a threshold for VIF < 5, our analysis identified a concise and efficient inflammatory panel. In a sensitivity analysis using a less restrictive threshold of VIF < 10, the panel that included all biomarkers produced a mean cross-validation AUC of 0.727 (95% CI: 0.708–0.747; S2 Table), with GSSG, triglycerides, and CRP emerging as the most critical features (S1 Fig). However, this broader panel did not offer a performance advantage over our more focused inflammatory panel (Mean AUC: 0.727 vs. 0.743; S2 Table and S1 Table, respectively). This highlights a classic trade-off: for this dataset, the simpler inflammatory model provided the best balance of predictive accuracy and interpretability.

Regarding the holdout set, the performance metrics of our model highlight its ideal clinical application. Although the AUC was 0.711 and the F1 score was 0.538 in an imbalanced cohort, the interplay between high specificity (0.877) and moderate sensitivity (0.500) is particularly informative. This profile suggests that the optimal role of the model is not as a standalone diagnostic test but as a highly effective screening tool to identify a subset of at-risk individuals who would most benefit from confirmatory glycemic testing. The successful identification of half of actual prediabetic cases using an independent biological signal represents a significant advance for targeted preventive medicine.
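As a consistency check on the reported figures, the arithmetic below derives the metrics from confusion-matrix counts. The counts themselves are an assumption, not values taken from the paper: they presume a stratified 20% holdout of 81 controls and 28 prediabetes cases, and were chosen so that recall and specificity match the reported values for the primary panel (0.500, 0.877) and the expanded panel (0.393, 0.951).

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Assumed counts for a holdout of 28 cases and 81 controls.
primary = classification_metrics(tp=14, fn=14, tn=71, fp=10)
expanded = classification_metrics(tp=11, fn=17, tn=77, fp=4)

for name, m in [("primary", primary), ("expanded", expanded)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Under these assumed counts the primary panel's F1 comes out at 0.538 and the expanded panel's accuracy at 0.807, both matching the reported values, which suggests the reported metrics are mutually consistent with a holdout of this composition.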

4.1 Practical feasibility and implementation considerations

Although the identified inflammatory biomarkers (IGF-1, IL-10, and CRP) show promising predictive potential, practical considerations, such as cost, availability, and laboratory requirements, significantly influence their use in routine screening settings. C-reactive protein (CRP) is the most feasible and practical of the three [48]. Insulin-like growth factor 1 (IGF-1) testing, although clinically accessible, generally involves higher costs, requires specialized equipment, and shows variability between assay platforms, limiting its widespread implementation [49,50]. The interleukin-10 (IL-10) assay is primarily a research-based test, available only to a limited extent because of its high cost, complex sample-handling logistics, and insufficient standardization, which restricts its immediate application in clinical screening [51]. Although these biomarker tests show promising predictive accuracy, standard HbA1c tests are generally more cost-effective and widely accessible. Thus, the biomarker panel identified here is expected to complement rather than replace existing diagnostic procedures.

4.2 Limitations and future directions

This study has several limitations, each guiding clear directions for future research:

Generalizability and cohort diversity: A sample of 545 from a single rural location limits the generalizability of our findings. Biomarker expression can vary substantially between diverse ethnic and geographic populations, highlighting the need for external validation in more extensive and diverse study cohorts. Using a biomarker panel with a Variance Inflation Factor of less than 10 could lead to more accurate predictive abilities in larger datasets.
Methodological robustness: Due to our relatively small sample size, our findings may be affected by the random number seed used for data partitioning. Future research should replicate the analyses using various randomized seed values to confirm the consistency of biomarkers.
Biomarker interactions and interpretability: Although SHAP highlights the relative importance of individual biomarkers, it does not fully capture the interactions between them. Future research could incorporate advanced interaction methods, such as SHAP interaction values or alternative explainable techniques, to deepen the understanding of how these biomarkers interact biologically and improve predictive models.
Confounding variables and model complexity: Our study mainly focused on biomarkers, but variables such as age and other demographic characteristics were notable confounders. Future research should create hybrid predictive models that integrate biomarkers with additional demographic and clinical factors, such as BMI and waist circumference. Methods such as stratification by age groups would further clarify the effects of these biomarkers, regardless of confounders.
Biomarker dynamics and advanced modeling: Our study was cross-sectional, capturing biomarkers at a specific point in time. Future studies should include longitudinal data to better understand their dynamics. Furthermore, exploring advanced techniques such as the synthetic minority oversampling technique (SMOTE) [52], Bayesian optimization [53], and adaptive probability thresholds could improve predictive accuracy.
Optimal classification thresholds: Our analysis used a standard fixed probability threshold of 0.5. Future work should investigate adaptive thresholds, potentially improving metrics such as the F1 score and recall, particularly in datasets with significant class imbalance.
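The adaptive-threshold idea raised in the last item can be sketched as follows. This is one possible approach, not something the study performed: it picks the probability cut-off that maximizes F1 on validation predictions. The function name best_f1_threshold and the synthetic probabilities are hypothetical, and in practice the threshold should be chosen on validation folds rather than on the holdout set.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve


def best_f1_threshold(y_true, y_proba):
    """Return the threshold that maximizes F1, and that F1 value."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_proba)
    # The final precision/recall pair has no matching threshold; drop it.
    p, r = precision[:-1], recall[:-1]
    f1 = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Synthetic validation predictions with a 3:1 class imbalance.
rng = np.random.default_rng(2)
y_val = np.array([0] * 300 + [1] * 100)
proba_val = np.clip(
    np.where(y_val == 1,
             rng.normal(0.45, 0.15, 400),
             rng.normal(0.25, 0.15, 400)),
    0.0, 1.0,
)

threshold, best_f1 = best_f1_threshold(y_val, proba_val)
f1_at_default = f1_score(y_val, (proba_val >= 0.5).astype(int))
f1_at_best = f1_score(y_val, (proba_val >= threshold).astype(int))
print(f"F1 at 0.5 = {f1_at_default:.3f}; F1 at {threshold:.3f} = {f1_at_best:.3f}")
```

When the positive class is rare, the F1-optimal cut-off typically falls below 0.5, trading some specificity for higher recall.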

5 Conclusion

This study introduces a novel biomarker-based approach for the detection of prediabetes, highlighting inflammatory biomarkers (IGF-1, IL-10, and CRP) as promising early indicators, independent of traditional glucose-based methods. Using interpretable machine learning techniques, our model demonstrated promising predictive performance (AUC = 0.711), establishing inflammation as a biological signal of early metabolic dysfunction. Although further validation in larger, diverse populations and practical considerations such as test standardization and cost must be addressed, our findings offer a meaningful step toward precision diagnostics. Ultimately, incorporating inflammation-focused biomarkers into routine screening protocols could facilitate earlier preventive interventions, significantly reducing the risk of progression to type 2 diabetes.

Supporting information

S1 File. Combined supporting information.

This file contains supplementary tables.

https://doi.org/10.1371/journal.pone.0341195.s001

(PDF)

S1 Fig. SHAP analysis of the best model from the analysis: Shows global feature importance (bar plot) and a SHAP summary beeswarm plot identifying GSSG, Triglyceride, and CRP as the top predictors.

https://doi.org/10.1371/journal.pone.0341195.s002

(TIF)

S2 Fig. Holdout set validation for the best model: Includes the ROC curve, calibration plot, and confusion matrix for the expanded all-biomarker panel.

https://doi.org/10.1371/journal.pone.0341195.s003

(TIF)

References

1. International Diabetes Federation. IDF Diabetes Atlas 11th edition: Global estimates 589 million adults with diabetes and 635 million with impaired glucose tolerance (prediabetes). 2025. https://diabetesatlas.org/resources/idf-diabetes-atlas-2025/
2. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119. pmid:34879977
3. Chung WK, Erion K, Florez JC, Hattersley AT, Hivert M-F, Lee CG, et al. Precision medicine in diabetes: a Consensus Report from the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetologia. 2020;63(9):1671–93. pmid:32556613
4. Tabák AG, Herder C, Rathmann W, Brunner EJ, Kivimäki M. Prediabetes: a high-risk state for diabetes development. Lancet. 2012;379(9833):2279–90. pmid:22683128
5. American Diabetes Association Professional Practice Committee. 2. Diagnosis and classification of diabetes: standards of care in diabetes-2024. Diabetes Care. 2024;47(Suppl 1):S20–42. pmid:38078589
6. Liu J, Grundy SM, Wang W, Smith SC Jr, Vega GL, Wu Z, et al. Ten-year risk of cardiovascular incidence related to diabetes, prediabetes, and the metabolic syndrome. Am Heart J. 2007;153(4):552–8. pmid:17383293
7. Di Pino A, Urbano F, Piro S, Purrello F, Rabuazzo AM. Update on pre-diabetes: focus on diagnostic criteria and cardiovascular risk. World J Diabetes. 2016;7(18):423–32. pmid:27795816
8. Hostalek U. Global epidemiology of prediabetes - present and future perspectives. Clin Diabetes Endocrinol. 2019;5:5. pmid:31086677
9. Bergman M, Abdul-Ghani M, Neves JS, Monteiro MP, Medina JL, Dorcely B, et al. Pitfalls of HbA1c in the diagnosis of diabetes. J Clin Endocrinol Metab. 2020;105(8):2803–11.
10. Mostafa SA, Davies MJ, Srinivasan BT, Carey ME, Webb D, Khunti K. Should glycated haemoglobin (HbA1c) be used to detect people with type 2 diabetes mellitus and impaired glucose regulation? Postgrad Med J. 2010;86(1021):656–62.
11. Staimez LR, Kipling LM, Nina Ham J, Legvold BT, Jackson SL, Wilson PWF, et al. Potential misclassification of diabetes and prediabetes in the U.S.: mismatched HbA1c and glucose in NHANES 2005-2016. Diabetes Res Clin Pract. 2022;189:109935. pmid:35662612
12. Grundy SM, Stone NJ, Bailey AL, et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol. Circulation. 2019;139(25):e1082–143.
13. Wagner R, Heni M, Tabák AG, Machann J, Schick F, Randrianarisoa E, et al. Pathophysiology-based subphenotyping of individuals at elevated risk for type 2 diabetes. Nat Med. 2021;27(1):49–57. pmid:33398163
14. Abbasi A, Sahlqvist A-S, Lotta L, Brosnan JM, Vollenweider P, Giabbanelli P, et al. A systematic review of biomarkers and risk of incident type 2 diabetes: an overview of epidemiological, prediction and aetiological research literature. PLoS One. 2016;11(10):e0163721. pmid:27788146
15. Wellen KE, Hotamisligil GS. Inflammation, stress, and diabetes. J Clin Invest. 2005;115(5):1111–9. pmid:15864338
16. Hotamisligil GS. Inflammation, metaflammation and immunometabolic disorders. Nature. 2017;542(7640):177–85. pmid:28179656
17. Donath MY, Shoelson SE. Type 2 diabetes as an inflammatory disease. Nat Rev Immunol. 2011;11(2):98–107. pmid:21233852
18. Marrocco I, Altieri F, Peluso I. Measurement and clinical significance of biomarkers of oxidative stress in humans. Oxid Med Cell Longev. 2017;2017:6501046. pmid:28698768
19. Odegaard AO, Jacobs DR Jr, Sanchez OA, Goff DC Jr, Reiner AP, Gross MD. Oxidative stress, inflammation, endothelial dysfunction and incidence of type 2 diabetes. Cardiovasc Diabetol. 2016;15:51. pmid:27013319
20. An Y, Xu B-T, Wan S-R, Ma X-M, Long Y, Xu Y, et al. The role of oxidative stress in diabetes mellitus-induced vascular endothelial dysfunction. Cardiovasc Diabetol. 2023;22(1):237. pmid:37660030
21. Kavakiotis I, Tsave O, Salifoglou A. Machine Learning and Data Mining Methods in Diabetes Research. Computational and Structural Biotechnology Journal. 2017;15:104–16.
22. Jelinek HF, Stranieri A, Yatsko A, Venkatraman S. Data analytics identify glycated haemoglobin co-markers for type 2 diabetes mellitus diagnosis. Comput Biol Med. 2016;75:90–7. pmid:27268735
23. Zueger T, Schallmoser S, Kraus M, Saar-Tsechansky M, Feuerriegel S, Stettler C. Machine Learning for Predicting the Risk of Transition from Prediabetes to Diabetes. Diabetes Technol Ther. 2022;24(11):842–7. pmid:35848962
24. Zhang X, Yao W, Wang D. Development and validation of machine learning models for identifying prediabetes and diabetes in normoglycemia. Diabetes/Metabolism Research and Reviews. 2024;40(8):e70003.
25. Yousef H. Exploratory machine learning prediction of diabetes risk using novel biomarkers. Scientific Reports. 2024;14:14409.
26. Bader M, Abdelwanis M, Maalouf M, Jelinek HF. Detecting depression severity using weighted random forest and oxidative stress biomarkers. Sci Rep. 2024;14(1):16328. pmid:39009760
27. Abdelwanis M, Moawad K, Mohammed S, Hummieda A, Syed S, Maalouf M, et al. Sequential classification approach for enhancing the assessment of cardiac autonomic neuropathy. Comput Biol Med. 2025;190:109999. pmid:40112561
28. Salahudeen T, Maalouf M, Elfadel IAM, Jelinek HF. Predicting depression severity using machine learning models: Insights from mitochondrial peptides and clinical factors. PLoS One. 2025;20(5):e0320955. pmid:40367215
29. Lee C, Yen K, Cohen P. Humanin: a harbinger of mitochondrial-derived peptides? Trends in Endocrinology & Metabolism. 2013;24(5):222–8.
30. Cai H, Liu Y, Men H, Zheng Y. Protective mechanism of humanin against oxidative stress in aging-related cardiovascular diseases. Front Endocrinol (Lausanne). 2021;12:683151. pmid:34177809
31. Tian X, Wang L, Zhong L, Zhang K, Ge X, Luo Z, et al. The research progress and future directions in the pathophysiological mechanisms of type 2 diabetes mellitus from the perspective of precision medicine. Front Med (Lausanne). 2025;12:1555077. pmid:40109716
32. Jelinek HF, Wilding C, Tinley P. An innovative multi-disciplinary diabetes complications screening program in a rural community: a description and preliminary results of the screening. Australian Journal of Primary Health. 2006;12(1):14–20.
33. Jelinek H, Yatsko A, Stranieri A, Venkatraman S. Novel data mining techniques for incomplete clinical data in diabetes management. BJAST. 2014;4(33):4591–606.
34. Kapoor S, Narayanan A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns (N Y). 2023;4(9):100804. pmid:37720327
35. Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One. 2019;14(11):e0224365. pmid:31697686
36. Midi H, Sarkar SK, Rana S. Collinearity diagnostics of binary logistic regression model. Journal of Interdisciplinary Mathematics. 2010;13(3):253–67.
37. Maalouf M, Siddiqi M. Weighted logistic regression for large-scale imbalanced and rare events data. Knowledge-Based Systems. 2014;59:142–8.
38. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32.
39. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Adv Neural Inf Process Syst. 2017;30:3146–54.
40. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. p. 785–94.
41. Al Nuaimi H, Abdelmagid M, Bouabid A. Classification of WatSan technologies using machine learning techniques. Water. 2023;15(15):2829.
42. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4768–77.
43. Friedrich N, Thuesen B, Jørgensen T. Insulin-like growth factor-I and risk of type 2 diabetes mellitus: a systematic review and meta-analysis. Diabetes Care. 2012;35(1):199–209.
44. Donath MY. Targeting inflammation in the treatment of type 2 diabetes: time to start. Nat Rev Drug Discov. 2014;13(6):465–76. pmid:24854413
45. Esposito K, Nappo F, Giugliano F, Di Palo C, Ciotola M, Barbieri M, et al. Cytokine milieu tends toward inflammation in type 2 diabetes. Diabetes Care. 2003;26(5):1647. pmid:12716849
46. Festa A, D’Agostino R Jr, Tracy RP, Haffner SM. Elevated levels of acute-phase proteins and plasminogen activator inhibitor-1 predict the development of type 2 diabetes. Diabetes. 2002;51(4):1131–7.
47. Matulewicz N, Karczewska-Kupczewska M. Insulin resistance and chronic inflammation. Postepy Higieny i Medycyny Doswiadczalnej. 2016;70:1245–58.
48. Oppong R, Jit M, Smith RD, Butler CC, Melbye H, Mölstad S, et al. Cost-effectiveness of point-of-care C-reactive protein testing to inform antibiotic prescribing decisions. Br J Gen Pract. 2013;63(612):e465-71. pmid:23834883
49. Lee JKY, Cradic K, Singh RJ, Jones J, Li J. Discordance of insulin-like growth factor-1 results and interpretation on four different platforms. Clin Chim Acta. 2023;539:130–3. pmid:36528048
50. Huang R, Shi J, Wei R, Li J. Challenges of insulin-like growth factor-1 testing. Crit Rev Clin Lab Sci. 2024;61(5):388–403. pmid:38323343
51. Minshawi F, Lanvermann S, McKenzie E. The generation of an engineered interleukin-10 protein with improved stability and biological function. Frontiers in Immunology. 2020;11:1794.
52. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16:321–57.
53. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Adv Neural Inf Process Syst. 2012;25:2951–9.

Supporting information

S1 File. Combined supporting information.

S1 Fig. SHAP analysis of the best model from the analysis: Shows global feature importance (bar plot) and a SHAP summary beeswarm plot identifying GSSG, Triglyceride, and CRP as the top predictors.

S2 Fig. Holdout set validation for the best model: Includes the ROC curve, calibration plot, and confusion matrix for the expanded all-biomarker panel.
