Fig 1.
Workflow of the study.
Fig 2.
Top 20 most important features selected by the random forest algorithm.
Features are ordered by importance score (mean decrease in Gini impurity). Key categories include liver function markers (LBXSTB, LBXSAPSI), electrolytes (LBXSKSI, LBXSCA, LBDSCASI), hematological/inflammatory indices (LBXHGB, LBXMC, LBXMCHSI, LBXRBCSI, LBXRDW, lymphocyte count, platelet count), and socioeconomic factors (RIDRETH1, INDFMPIR). Abbreviations follow NHANES variable naming conventions.
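As a rough illustration of the importance score named in this legend, the sketch below computes the decrease in Gini impurity for a single threshold split. It is not the authors' code: the function names `gini` and `gini_decrease` are hypothetical, and a random forest additionally weights each split by the share of samples reaching the node and averages the result over all trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease when splitting one feature at `threshold`.

    Samples with value <= threshold go left, the rest go right.
    The decrease is the parent impurity minus the size-weighted
    impurity of the two children.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy data (illustrative only, not from the study).
feature = [1, 2, 3, 4]
outcome = [0, 0, 1, 1]
perfect_split = gini_decrease(feature, outcome, threshold=2)
poor_split = gini_decrease(feature, outcome, threshold=1)
```

A split that separates the classes cleanly yields a larger decrease than one that leaves a mixed child node, which is why features used in many clean splits rank higher.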
Table 1.
Performance comparison of each model in the validation set.
Fig 3.
ROC curves of each model and corresponding AUC values.
The MLPClassifier achieved the highest AUC (0.935), followed by Gradient Boosting (0.919), AdaBoost (0.881), Linear Discriminant Analysis (0.751), and Logistic Regression (0.739). The dashed diagonal line represents random classification.
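For readers unfamiliar with the AUC, a minimal sketch follows. It uses the pairwise (Mann-Whitney) interpretation: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy data are assumptions for illustration, not the study's implementation.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not from the study).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)

# A model that gives every case the same score cannot rank cases at all.
auc_constant = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])
```

The constant-score case evaluates to 0.5, which corresponds to the diagonal reference line in the figure.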
Fig 4.
Calibration curves of each model, where (A–E) are the results of the LR, LDA, GB, MLP and AdaBoost models, respectively.
Plots show the relationship between predicted probability of depression risk (x-axis) and observed frequency (y-axis) for (A) Logistic Regression, (B) Linear Discriminant Analysis, (C) Gradient Boosting, (D) MLPClassifier, and (E) AdaBoost. The dashed diagonal line represents perfect calibration.
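The points on a calibration curve can be obtained roughly as sketched below: predictions are grouped into equal-width probability bins, and the mean predicted probability of each bin is paired with the observed event frequency. The function name `calibration_points`, the bin count, and the toy data are hypothetical; the binning strategy actually used in the study is not stated here.

```python
def calibration_points(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed frequency) per non-empty bin.

    Bins are equal-width intervals over [0, 1]; a probability of exactly 1.0
    is placed in the last bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            points.append((mean_pred, observed))
    return points


# Toy data (illustrative only, not from the study).
labels = [0, 0, 1, 1]
probs = [0.1, 0.1, 0.9, 0.9]
points = calibration_points(labels, probs, n_bins=2)
```

Points lying on the diagonal indicate that predicted and observed risks agree; points below it indicate overestimated risk and points above it underestimated risk.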
Fig 5.
DCA curves of each model, where (A–E) are the results of the LR, LDA, GB, MLP and AdaBoost models, respectively.
Curves depict the net benefit of using each model for clinical decision-making compared to treating all patients (gray solid line) or treating none (gray dashed line). Models are shown for (A) Logistic Regression, (B) Linear Discriminant Analysis, (C) Gradient Boosting, (D) MLPClassifier, and (E) AdaBoost.
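The net benefit plotted in a decision curve is conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below assumes that standard definition; the function names and toy data are hypothetical and are not taken from the study's code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating cases whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalised by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# The treat-none reference strategy has a net benefit of 0 at every threshold.
NET_BENEFIT_TREAT_NONE = 0.0

# Toy data (illustrative only, not from the study).
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7]
nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
```

A model is useful at a given threshold only where its curve lies above both reference strategies.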