Development and validation of predictive models for diabetic retinopathy using machine learning

doi:10.1371/journal.pone.0318226

Table 1.

Baseline characteristics of the study population.

Table 2.

Multivariate logistic regression analysis of characteristic factors for diabetic retinopathy.

Table 3.

Performance of predictive models on test set.

Fig 1.

Receiver operating characteristic (ROC) curves for the four predictive models.

Each curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate) for one model. The diagonal dashed line marks the performance of a random classifier. The XGBoost and Random Forest models have the largest areas under the curve (AUC), indicating better discriminatory power than the Logistic Regression and Neural Network models.
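
As a rough illustration of how a ROC curve and its AUC are obtained from predicted scores, the sketch below sweeps the decision threshold and integrates with the trapezoidal rule. It is not the study's code: the function names `roc_points` and `auc` and the toy labels and scores are assumptions made for this example.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points.

    y_true holds 0/1 labels, scores holds predicted probabilities.
    Assumes both classes are present in y_true.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Samples with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == current:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical, not from the study).
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(labels, probs)))  # 0.75
```

A model that assigns the same score to every sample produces only the points (0, 0) and (1, 1), which is the diagonal line with an AUC of 0.5.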

Fig 2.

Bar chart comparing accuracy, precision, recall, and F1-score across four machine learning models.

Each stacked bar shows the accumulated values of accuracy, precision, recall, and F1-score for one model. The Random Forest and XGBoost models have the highest combined totals, indicating stronger overall predictive performance. Colors identify each metric: yellow for accuracy, red for precision, dark blue for recall, and light blue for F1-score.
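
The four metrics in this chart all derive from the confusion-matrix counts of a binary classifier. The following is a minimal sketch using the standard definitions, not the study's implementation; the function name `classification_metrics` and the toy labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted
    # or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy predictions (hypothetical, not from the study).
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(truth, predicted))
```

Because a stacked bar sums four values that each lie between 0 and 1, its total height can reach 4, so the bar heights are useful for ranking models but are not themselves a standard metric.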

Table 4.

Feature importance for Random Forest and XGBoost models.

Fig 3.

Feature weight values in the Random Forest model.

This bar chart illustrates the relative importance of features as determined by the Random Forest model. The x-axis represents the weight values, ranging from 0 to 0.4, and the y-axis lists the features. ‘24h Urinary Microalbumin (mg/L)’ and ‘Urine Protein Creatinine Ratio (mg/mmol)’ have the highest weight values, indicating that they contribute most to the model’s predictions. The remaining features show progressively lower weights and correspondingly lower importance.

Fig 4.

Feature weight values in the XGBoost model.

This bar chart illustrates the relative importance of features as determined by the XGBoost model. The x-axis represents the weight values, ranging from 0 to 0.4, and the y-axis lists the features. ‘24h Urinary Microalbumin (mg/L)’ and ‘Urine Protein Creatinine Ratio (mg/mmol)’ also have the highest weight values, indicating that they contribute most to the model’s predictions. The remaining features show progressively lower weights and correspondingly lower importance.
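
Weight values of this kind are typically reported as shares of a total, so that they can be compared across features and ordered for a bar chart. The sketch below shows one way to normalize and rank raw importance scores. It is only an illustration: the function name `rank_importances`, the feature names, and the raw scores are all hypothetical and are not values from the study.

```python
def rank_importances(raw_weights):
    """Normalize non-negative importance scores to sum to 1 and
    return (feature, share) pairs sorted from most to least important."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    shares = {name: w / total for name, w in raw_weights.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores, for illustration only.
raw = {"feature_a": 2.0, "feature_b": 6.0, "feature_c": 2.0}
for name, share in rank_importances(raw):
    print(f"{name}: {share:.2f}")
```

Sorting in descending order puts the dominant features first, which matches the layout described in the caption, where the leading bars carry the largest weights.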
