Construction of a depression risk prediction model for hepatitis B patients based on a machine learning strategy

doi:10.1371/journal.pone.0341236

Fig 1.

Workflow of the study.

Fig 2.

Top 20 most important features selected by the random forest algorithm.

Features are ordered by importance score (mean decrease in Gini impurity). Key categories include liver function markers (LBXSTB, LBXSAPSI), electrolytes (LBXSKSI, LBXSCA, LBDSCASI), hematological/inflammatory indices (LBXHGB, LBXMC, LBXMCHSI, LBXRBCSI, LBXRDW, lymphocyte count, platelet count), and socioeconomic factors (RIDRETH1, INDFMPIR). Abbreviations follow NHANES variable naming conventions.
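
Mean decrease in Gini impurity credits each feature with the weighted impurity reduction of every split made on it. A minimal pure-Python sketch of the per-split quantity follows, assuming binary labels (1 = depression, 0 = no depression). The function names are hypothetical and this is not the authors' code.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent: Sequence[int], left: Sequence[int],
                      right: Sequence[int], n_total: int) -> float:
    """Weighted Gini decrease of one split, credited to the split feature.

    The parent node is weighted by its share of the tree's n_total training
    samples; children are weighted by their share of the parent.
    """
    n = len(parent)
    child = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - child)


# A perfectly separating root split on four samples removes all impurity.
print(impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1], n_total=4))
```

In scikit-learn, `feature_importances_` of a random forest is built from this quantity: the terms are summed per feature within each tree, normalized, and then averaged across trees.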

Table 1.

Performance comparison of the models on the validation set.

Fig 3.

Receiver operating characteristic (ROC) curves of each model with the corresponding area under the curve (AUC) values.

The MLPClassifier achieved the highest AUC (0.935), followed by Gradient Boosting (0.919), AdaBoost (0.881), Linear Discriminant Analysis (0.751), and Logistic Regression (0.739). The dashed diagonal line represents random classification.
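
The AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count as one half), so the dashed diagonal corresponds to an AUC of 0.5. A minimal pure-Python sketch of that rank interpretation and of the ROC points is shown below; the function names and toy scores are hypothetical and are not the study's data.

```python
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(y_true: Sequence[int],
               y_score: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold down over each distinct score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    for t in sorted(set(y_score), reverse=True):
        # Every case scoring at or above t is predicted positive.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in sample size, which is fine for illustration; library implementations sort the scores once instead.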

Fig 4.

Calibration curves of each model; panels (A–E) show the results of the LR, LDA, GB, MLP, and AdaBoost models, respectively.

Plots show the relationship between predicted probability of depression risk (x-axis) and observed frequency (y-axis) for (A) Logistic Regression, (B) Linear Discriminant Analysis, (C) Gradient Boosting, (D) MLPClassifier, and (E) AdaBoost. The dashed diagonal line represents perfect calibration.
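
A calibration curve groups predictions into probability bins and plots the mean predicted probability in each bin against the observed event rate. The sketch below assumes equal-width bins (10 by default, a hypothetical choice; the number of bins used in the study is not stated) and binary outcomes.

```python
from typing import List, Sequence, Tuple


def calibration_curve(y_true: Sequence[int], y_prob: Sequence[float],
                      n_bins: int = 10) -> List[Tuple[float, float]]:
    """Return (mean predicted probability, observed frequency) per non-empty bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        sums[b] += p
        hits[b] += y
        counts[b] += 1
    return [(sums[b] / counts[b], hits[b] / counts[b])
            for b in range(n_bins) if counts[b] > 0]


print(calibration_curve([0, 0, 1, 1], [0.1, 0.15, 0.8, 0.85], n_bins=2))
```

Points on the diagonal indicate agreement between predicted and observed risk; points below it indicate overestimated risk. scikit-learn's `sklearn.calibration.calibration_curve` with `strategy="uniform"` bins in a similar way.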

Fig 5.

Decision curve analysis (DCA) curves of each model; panels (A–E) show the results of the LR, LDA, GB, MLP, and AdaBoost models, respectively.

Curves depict the net benefit of using each model for clinical decision-making compared to treating all patients (gray solid line) or treating none (gray dashed line). Models are shown for (A) Logistic Regression, (B) Linear Discriminant Analysis, (C) Gradient Boosting, (D) MLPClassifier, and (E) AdaBoost.
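
In decision curve analysis, the net benefit at a threshold probability $p_t$ is usually defined as $\mathrm{NB}(p_t) = \frac{TP}{n} - \frac{FP}{n}\cdot\frac{p_t}{1 - p_t}$, where a patient counts as flagged when the predicted risk is at least $p_t$. Treating all patients replaces the model with a rule that flags everyone, and treating none has a net benefit of zero. A minimal sketch with hypothetical function names and toy data:

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm weight of one false positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference curve: flag every patient regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.1]
print(net_benefit(y, p, 0.5), net_benefit_treat_all(y, 0.5))
```

Plotting `net_benefit` over a grid of thresholds, alongside the treat-all curve and the zero line for treat-none, reproduces the layout of each panel.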
