Fig 1.
Overview of the proposed methodology for CKD prediction.
Fig 2.
Class distribution of the CKD datasets: (A) Dataset 1 before applying SMOTE, (B) Dataset 2 before applying SMOTE, (C) Dataset 1 after applying SMOTE within training folds, (D) Dataset 2 after applying SMOTE within training folds.
SMOTE was applied exclusively during the training phase of each cross-validation fold to prevent data leakage.
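The fold-wise resampling described in this caption can be illustrated with a minimal sketch. It is not the study's implementation: `smote_like_oversample` and `kfold_indices` are hypothetical helper names, the oversampler is a simplified stand-in that interpolates toward only the single nearest minority neighbour (standard SMOTE samples from k nearest neighbours), and the data are made up. The point it shows is the ordering: split first, then oversample the training split only.

```python
import random


def smote_like_oversample(X, y, minority=1, seed=0):
    """Simplified SMOTE-style oversampling (assumption: binary labels).

    New minority points are interpolated between a minority sample and its
    nearest minority neighbour until both classes have the same count.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_new, y_new = list(X), list(y)
    if len(minority_rows) < 2 or n_needed <= 0:
        return X_new, y_new
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # Nearest other minority sample by squared Euclidean distance.
        neighbour = min(
            (row for row in minority_rows if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )
        gap = rng.random()
        X_new.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_new.append(minority)
    return X_new, y_new


def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Toy, made-up data: 12 majority (0) and 4 minority (1) samples.
X = [[float(i), float(i % 3)] for i in range(16)]
y = [1 if i % 4 == 0 else 0 for i in range(16)]

fold_summaries = []
for train_idx, test_idx in kfold_indices(len(X), k=2):
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    # Oversample the training split only; the test split stays as observed.
    X_bal, y_bal = smote_like_oversample(X_train, y_train)
    fold_summaries.append(
        (y_bal.count(0), y_bal.count(1), len(X_test), y_test.count(1))
    )
```

Because synthetic points are generated only from training rows, no information from a held-out fold can enter the resampled training set, and each held-out fold keeps its original class imbalance.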
Table 1.
Conservative hyperparameter optimization for Dataset 1 (UAE Tawam Hospital).
Table 2.
Conservative hyperparameter optimization for Dataset 2 (UCI CKD).
Table 3.
Performance metrics for Dataset 1 (UAE Tawam Hospital) without SMOTE.
Table 4.
Performance metrics for Dataset 1 (UAE Tawam Hospital) with SMOTE.
Table 5.
Performance metrics for Dataset 2 (UCI CKD) without SMOTE.
Table 6.
Performance metrics for Dataset 2 (UCI CKD) with SMOTE.
Fig 3.
ROC curve comparison for Dataset 1 (UAE Tawam Hospital).
Panel A: without SMOTE; Panel B: with SMOTE. XGBoost demonstrates improved discrimination with SMOTE (AUC: 0.886 → 0.904).
Fig 4.
ROC curve comparison for Dataset 2 (UCI CKD).
Panel A: without SMOTE; Panel B: with SMOTE. XGBoost achieves optimal performance (AUC = 0.948 ± 0.013) with SMOTE.
Fig 5.
SHAP feature importance comparison across datasets.
Panel A: Dataset 1 (UAE Tawam Hospital), where cardiovascular–renal markers are predominantly influential. Panel B: Dataset 2 (UCI CKD), emphasizing direct renal function indicators.
Fig 6.
SHAP summary plot comparison across datasets.
Panel A: Dataset 1 (UAE Tawam Hospital). Panel B: Dataset 2 (UCI CKD). Feature value distributions highlight clinically coherent patterns and consistent model behavior across both cohorts.
Fig 7.
Individual prediction explanations using SHAP waterfall plots.
Panel A: Dataset 1, representative non-CKD prediction. Panel B: Dataset 2, representative non-CKD prediction. The plots provide transparent, case-level clinical reasoning by showing how each feature's contribution shifts the model output from the baseline toward the final decision.
Fig 8.
LIME explanation for a Dataset 2 (UCI CKD) case.
Sample 50: the CKD prediction is driven by low specific gravity and severe anemia, consistent with the hematological marker hierarchy identified by SHAP.
Fig 9.
LIME explanation for a Dataset 1 (UAE Tawam Hospital) case.
Sample 49: the non-CKD prediction is dominated by the protective contribution of an excellent eGFR, consistent with the cardiovascular–renal emphasis identified by SHAP.
Fig 10.
Calibration summary for Dataset 1 (UAE Tawam Hospital): Brier score (left) and expected calibration error (ECE; right), with and without SMOTE.
Table 7.
Calibration metrics (Brier score and expected calibration error, ECE) computed from outer-fold out-of-fold (OOF) predictions; lower is better for both.
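As a reading aid for the two metrics named in this caption, a minimal sketch follows. It assumes binary labels, pooled out-of-fold probabilities, and ten equal-width bins for ECE; the binning scheme, the function names, and the toy values are assumptions for illustration, not details taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Sample-weighted mean of |observed positive rate - mean predicted
    probability| over equal-width probability bins (empty bins are skipped)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece


# Hypothetical pooled out-of-fold predictions (made-up values).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.3, 0.7, 0.9]

brier = brier_score(y_true, y_prob)
ece = expected_calibration_error(y_true, y_prob)
```

Both quantities are zero for a perfectly calibrated, perfectly confident model, which is why lower values indicate better calibration.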
Fig 11.
Calibration summary for Dataset 2 (UCI CKD): Brier score (left) and expected calibration error (ECE; right), with and without SMOTE.