Table 1.
Baseline characteristics of the study population.
Table 2.
Multivariate logistic regression analysis of characteristic factors for diabetic retinopathy.
Table 3.
Performance of predictive models on test set.
Fig 1.
Receiver operating characteristic (ROC) curves for the four predictive models.
The curves illustrate the sensitivity (true positive rate) versus 1-specificity (false positive rate) for each model. The diagonal dashed line represents the performance of a random classifier. XGBoost and Random Forest models show superior performance with the largest areas under the curve (AUC), indicating better discriminatory power compared to Logistic Regression and Neural Networks.
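As an illustration of how a ROC curve and its AUC are obtained, the following minimal sketch sweeps a decision threshold over predicted scores and integrates the resulting curve with the trapezoidal rule. It is not the authors' code; the function names `roc_points` and `auc` and the toy labels and scores are assumptions introduced only for this example.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold step by step.

    y_true: list of 0/1 labels (1 = positive class); scores: predicted scores.
    Hypothetical helper for illustration only.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point only when the score changes, so ties share one threshold.
        if prev is not None and scores[i] != prev:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        prev = scores[i]
    fpr.append(fp / neg)
    tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy data (not from the study).
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, predicted)
print(auc(fpr, tpr))
```

A model whose scores carry no information (for example, identical scores for every sample) yields the diagonal line with an AUC of 0.5, which is the random-classifier reference shown in the figure.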
Fig 2.
Stacked bar chart comparing accuracy, precision, recall, and F1-score across the four machine learning models.
Each stacked bar represents the accumulated values of accuracy, precision, recall, and F1-score. The Random Forest and XGBoost models demonstrated the highest combined metrics, indicating their superior predictive capabilities. Distinct colors identify each metric: yellow for accuracy, red for precision, dark blue for recall, and light blue for F1-score.
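For reference, the four metrics shown in the chart can be computed from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the function name `classification_metrics` and the toy labels are assumptions for illustration and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1-score for 0/1 labels.

    Hypothetical helper for illustration; 1 is treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (not from the study).
truth = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(truth, preds))
```

Because each metric lies between 0 and 1, the height of a stacked bar that accumulates all four is at most 4.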
Table 4.
Feature importance for the Random Forest and XGBoost models.
Fig 3.
Feature weight values in the Random Forest model.
This bar chart illustrates the relative importance of features as determined by the Random Forest model. The x-axis represents the weight values, ranging from 0 to 0.4, and the y-axis lists the features. ‘24h Urinary Microalbumin (mg/L)’ and ‘Urine Protein Creatinine Ratio (mg/mmol)’ have the highest weight values, indicating that they contribute most to the model’s predictions. The remaining features have progressively lower weights, reflecting their smaller contributions.
Fig 4.
Feature weight values in the XGBoost model.
This bar chart illustrates the relative importance of features as determined by the XGBoost model. The x-axis represents the weight values, ranging from 0 to 0.4, and the y-axis lists the features. As in the Random Forest model, ‘24h Urinary Microalbumin (mg/L)’ and ‘Urine Protein Creatinine Ratio (mg/mmol)’ have the highest weight values, indicating that they contribute most to the model’s predictions. The remaining features have progressively lower weights, reflecting their smaller contributions.