Table 1.
Features selected by BorutaSHAP for each prediction target.
Fig 1.
BorutaSHAP feature importance for CL prediction.
Boxplots show the distribution of SHAP-based importance scores (Z-score, log scale) across 5,000 iterations for each candidate feature. Green boxes indicate accepted features; red boxes indicate rejected features; blue boxes represent Max, Mean, Median, and Min Shadow reference distributions. Features are ordered by median importance from left to right.
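As a reading aid for the shadow reference boxes, the sketch below shows the general Boruta idea: each real feature gets a shuffled "shadow" copy, and a feature's importance is compared against summaries of the shadow importances. It is a minimal illustration, not the BorutaSHAP implementation. The function names (`shadow_columns`, `shadow_references`, `is_hit`) are hypothetical, and the importance values are assumed to come from a separately fitted model.

```python
import random
import statistics

def shadow_columns(columns, rng):
    """Return a shuffled copy of every column (hypothetical helper).

    Shuffling breaks any link between the feature and the target,
    so shadow importances estimate what chance alone can produce.
    """
    shadows = {}
    for name, values in columns.items():
        shuffled = list(values)
        rng.shuffle(shuffled)
        shadows["shadow_" + name] = shuffled
    return shadows

def shadow_references(shadow_importances):
    """Summarise the shadow importances of one iteration."""
    vals = list(shadow_importances)
    return {
        "Max": max(vals),
        "Mean": statistics.mean(vals),
        "Median": statistics.median(vals),
        "Min": min(vals),
    }

def is_hit(feature_importance, shadow_importances):
    """A real feature scores a hit when it beats the best shadow feature."""
    return feature_importance > max(shadow_importances)
```

In Boruta-style selection, hits are accumulated over many iterations and a statistical test on the hit counts decides acceptance or rejection; that step is omitted here.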
Fig 2.
BorutaSHAP feature importance for VD prediction.
Boxplots show the distribution of SHAP-based importance scores (Z-score, log scale) across 5,000 iterations. Color coding follows the same convention as Fig 1.
Table 2.
Test set performance comparison between single-task LightGBM and multi-task MTGBM models.
Fig 3.
LightGBM prediction scatter plots across train, validation, and test sets.
(A) CL train, (B) CL validation, (C) CL test, (D) VD train, (E) VD validation, and (F) VD test predictions. Red dashed lines indicate perfect prediction; green dashed lines indicate 2-fold error boundaries. R² and GMFE are reported for each split.
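The two metrics tied to these plots can be sketched as follows. The sketch assumes the commonly used definition of geometric mean fold error, GMFE = 10^mean(|log10(predicted/observed)|), and assumes that the 2-fold boundaries correspond to predicted/observed ratios between 0.5 and 2. The captions do not state either definition, and the function names are illustrative.

```python
import math

def gmfe(observed, predicted):
    """Geometric mean fold error: 10 ** mean(|log10(pred / obs)|).

    Assumes all values are strictly positive (untransformed PK values).
    """
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))

def fraction_within_fold(observed, predicted, fold=2.0):
    """Share of predictions whose pred/obs ratio lies in [1/fold, fold]."""
    hits = sum(
        1 for o, p in zip(observed, predicted)
        if 1.0 / fold <= p / o <= fold
    )
    return hits / len(observed)
```

Under this definition a GMFE of 1 means perfect agreement, and a GMFE of 2 means predictions are off by a factor of two on average, in either direction.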
Fig 4.
MTGBM prediction scatter plots across train, validation, and test sets.
(A) CL train, (B) CL validation, (C) CL test, (D) VD train, (E) VD validation, and (F) VD test predictions. Red dashed lines indicate perfect prediction; green dashed lines indicate 2-fold error boundaries. R² and GMFE are reported for each split.
Table 3.
Range-stratified GMFE comparison between LightGBM and MTGBM on the VD test set.
Fig 5.
Range-stratified GMFE comparison for VD prediction.
(A) GMFE values for LightGBM and MTGBM across three VD ranges: low (<0.5 L/kg), mid (0.5–2.0 L/kg), and high (>2.0 L/kg). (B) Δ GMFE (MTGBM − LightGBM) per range; positive values indicate MTGBM performed worse than LightGBM.
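A minimal sketch of the range stratification described in this caption is given below. It assumes that compounds are binned by observed VD, that the mid range includes both boundary values (0.5 and 2.0 L/kg), and that GMFE follows the usual 10^mean(|log10(predicted/observed)|) definition. None of these details are stated in the caption, and the function names are illustrative.

```python
import math

def gmfe(observed, predicted):
    """Geometric mean fold error (assumed standard definition)."""
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))

def vd_range(vd):
    """Assign a VD value (L/kg) to the low, mid, or high range."""
    if vd < 0.5:
        return "low"
    if vd <= 2.0:
        return "mid"
    return "high"

def stratified_gmfe(observed, predicted):
    """GMFE per VD range, binning on the observed value (an assumption)."""
    groups = {}
    for o, p in zip(observed, predicted):
        groups.setdefault(vd_range(o), []).append((o, p))
    return {name: gmfe(*zip(*pairs)) for name, pairs in groups.items()}

def delta_gmfe(mtgbm, lightgbm):
    """Delta GMFE (MTGBM minus LightGBM); positive means MTGBM is worse."""
    return {name: mtgbm[name] - lightgbm[name] for name in mtgbm}
```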
Fig 6.
Comparative performance summary between LightGBM and MTGBM.
(A) MSE comparison for CL and VD targets. (B) Test set prediction scatter plot for CL. (C) Test set prediction scatter plot for VD. (D) Residual plot for CL predictions. (E) Residual plot for VD predictions. (F) Model performance summary. Blue: LightGBM; red: MTGBM. Dashed lines indicate perfect prediction.
Fig 7.
Per-run CL MSE differences across 10 repeated random splits.
Horizontal bar plot of Δ MSE (MTGBM − LightGBM) for each of 10 independent data partitions. Blue bars indicate runs where MTGBM achieved lower MSE (negative Δ MSE); orange bars indicate runs where LightGBM achieved lower MSE (positive Δ MSE).
Fig 8.
Per-run VD MSE differences across 10 repeated random splits.
Horizontal bar plot of Δ MSE (MTGBM − LightGBM) for each of 10 independent data partitions. Blue bars indicate runs where MTGBM achieved lower MSE (negative Δ MSE); orange bars indicate runs where LightGBM achieved lower MSE (positive Δ MSE).
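The per-run quantity plotted in Figs 7 and 8 can be sketched as below. The input layout (one tuple of observed values, LightGBM predictions, and MTGBM predictions per split) and the function names are assumptions made for illustration. How ties are drawn is not stated in the captions, so they are labelled separately here.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def delta_mse_per_run(runs):
    """Return (delta, better_model) for each run.

    `runs` is a list of (observed, lightgbm_pred, mtgbm_pred) tuples,
    one per random split (hypothetical layout).
    delta = MSE(MTGBM) - MSE(LightGBM), so a negative delta favours MTGBM.
    """
    results = []
    for observed, lightgbm_pred, mtgbm_pred in runs:
        delta = mse(observed, mtgbm_pred) - mse(observed, lightgbm_pred)
        if delta < 0:
            better = "MTGBM"      # drawn as a blue bar
        elif delta > 0:
            better = "LightGBM"   # drawn as an orange bar
        else:
            better = "tie"
        results.append((delta, better))
    return results
```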
Fig 9.
SHAP summary plots for CL and VD prediction.
(A) Beeswarm plot of SHAP values for the top 10 features for CL prediction. (B) Bar plot of mean absolute SHAP values for CL prediction. (C) Beeswarm plot of SHAP values for the top 10 features for VD prediction. (D) Bar plot of mean absolute SHAP values for VD prediction. Each point in the beeswarm plots represents one test compound, colored by feature value (red: high; blue: low).
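The bar-plot quantity in panels (B) and (D) is the mean absolute SHAP value per feature. The sketch below computes it from a matrix of SHAP values with one row per test compound and one column per feature. The matrix is assumed to come from a separate SHAP explainer, and the function names are illustrative.

```python
def mean_abs_shap(shap_values, feature_names):
    """Mean of |SHAP value| per feature across all compounds.

    `shap_values` is a list of rows (one per compound), each row holding
    one SHAP value per feature, in the order given by `feature_names`.
    """
    n_rows = len(shap_values)
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    return {name: total / n_rows for name, total in zip(feature_names, totals)}

def top_features(importance, k=10):
    """Feature names ranked by descending importance, truncated to k."""
    return sorted(importance, key=importance.get, reverse=True)[:k]
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.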