Multi-task gradient boosting with multi-modal molecular representations for simultaneous prediction of drug clearance and volume of distribution

doi:10.1371/journal.pone.0348173

Table 1.

Features selected by BorutaSHAP for each prediction target.

Fig 1.

BorutaSHAP feature importance for CL prediction.

Boxplots show the distribution of SHAP-based importance scores (Z-score, log scale) across 5,000 iterations for each candidate feature. Green boxes indicate accepted features; red boxes indicate rejected features; blue boxes represent Max, Mean, Median, and Min Shadow reference distributions. Features are ordered by median importance from left to right.
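
The shadow-feature comparison behind the accepted/rejected coloring can be illustrated with a minimal sketch. It is not the authors' BorutaSHAP pipeline: absolute Pearson correlation stands in for SHAP-based importance, no statistical acceptance test is applied, and the function name `shadow_hits` and its parameters are hypothetical. The sketch only counts how often each real feature beats the strongest shadow (column-permuted) feature.

```python
import numpy as np

def shadow_hits(X, y, n_iter=50, seed=0):
    """Count, per feature, how often it beats the best shadow feature.

    Assumption: importance is |Pearson r| with the target, a stand-in
    for the SHAP-based importance described in the caption.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)

    # Importance of the real features does not change between iterations.
    real_imp = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    for _ in range(n_iter):
        # Shadow features: each column shuffled independently, which
        # destroys any relationship with the target.
        shadow_imp = np.array(
            [abs(np.corrcoef(rng.permutation(X[:, j]), y)[0, 1])
             for j in range(n_features)]
        )
        # A "hit" means the real feature outranks the max shadow feature.
        hits += real_imp > shadow_imp.max()
    return hits
```

Features with hit counts close to `n_iter` would correspond to the accepted (green) group, and features that rarely beat the maximum shadow to the rejected (red) group, under the simplifications stated above.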

Fig 2.

BorutaSHAP feature importance for VD prediction.

Boxplots show the distribution of SHAP-based importance scores (Z-score, log scale) across 5,000 iterations. Color coding follows the same convention as Fig 1.

Table 2.

Test set performance comparison between single-task LightGBM and multi-task MTGBM models.

Fig 3.

LightGBM prediction scatter plots across train, validation, and test sets.

(A) CL train, (B) CL validation, (C) CL test, (D) VD train, (E) VD validation, and (F) VD test predictions. Red dashed lines indicate perfect prediction; green dashed lines indicate 2-fold error boundaries. R² and GMFE are reported for each split.
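
The captions report GMFE and draw 2-fold error boundaries without stating the formulas. The sketch below assumes the commonly used definition of geometric mean fold error, GMFE = 10^(mean |log10(predicted / observed)|), and assumes that observed and predicted values are positive and on the original (untransformed) scale. The function names are illustrative and not taken from the paper.

```python
import numpy as np

def gmfe(observed, predicted):
    """Geometric mean fold error (assumed standard definition)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    log_ratio = np.log10(predicted / observed)
    return float(10 ** np.mean(np.abs(log_ratio)))

def fraction_within_fold(observed, predicted, fold=2.0):
    """Share of predictions inside the fold-error band (default 2-fold)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ratio = predicted / observed
    inside = (ratio >= 1.0 / fold) & (ratio <= fold)
    return float(np.mean(inside))
```

A GMFE of 1.0 means perfect agreement; a GMFE of 2.0 means predictions are off by a factor of two on average, in either direction, which corresponds to points lying on the 2-fold boundaries.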

Fig 4.

MTGBM prediction scatter plots across train, validation, and test sets.

(A) CL train, (B) CL validation, (C) CL test, (D) VD train, (E) VD validation, and (F) VD test predictions. Red dashed lines indicate perfect prediction; green dashed lines indicate 2-fold error boundaries. R² and GMFE are reported for each split.

Table 3.

Range-stratified GMFE comparison between LightGBM and MTGBM on the VD test set.

Fig 5.

Range-stratified GMFE comparison for VD prediction.

(A) GMFE values for LightGBM and MTGBM across three VD ranges: low (<0.5 L/kg), mid (0.5–2.0 L/kg), and high (>2.0 L/kg). (B) Δ GMFE (MTGBM − LightGBM) per range; positive values indicate MTGBM performed worse than LightGBM.
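
The range stratification in panels (A) and (B) can be sketched as follows. The thresholds (0.5 and 2.0 L/kg) come from the caption; everything else is an assumption: the standard GMFE definition, stratification by observed rather than predicted VD, inclusion of both boundary values in the mid range, and the hypothetical function names.

```python
import numpy as np

def gmfe(observed, predicted):
    """Geometric mean fold error (assumed standard definition)."""
    return float(10 ** np.mean(np.abs(np.log10(predicted / observed))))

def stratified_gmfe(observed, predicted, low=0.5, high=2.0):
    """GMFE per observed-VD range: low (<0.5), mid (0.5-2.0), high (>2.0)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    masks = {
        "low": observed < low,
        "mid": (observed >= low) & (observed <= high),  # boundaries assumed inclusive
        "high": observed > high,
    }
    # Empty ranges yield NaN instead of raising.
    return {
        name: gmfe(observed[m], predicted[m]) if m.any() else float("nan")
        for name, m in masks.items()
    }

def delta_gmfe(observed, pred_mtgbm, pred_lightgbm):
    """Delta GMFE (MTGBM - LightGBM) per range; positive means MTGBM is worse."""
    mt = stratified_gmfe(observed, pred_mtgbm)
    lgb = stratified_gmfe(observed, pred_lightgbm)
    return {name: mt[name] - lgb[name] for name in mt}
```

Because GMFE is a ratio-based metric, stratifying by range shows whether one model's advantage is concentrated in low-, mid-, or high-VD compounds rather than spread evenly across the test set.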

Fig 6.

Comparative performance summary between LightGBM and MTGBM.

(A) MSE comparison for CL and VD targets. (B) Test set prediction scatter plot for CL. (C) Test set prediction scatter plot for VD. (D) Residual plot for CL predictions. (E) Residual plot for VD predictions. (F) Model performance summary. Blue: LightGBM; red: MTGBM. Dashed lines indicate perfect prediction.

Fig 7.

Per-run CL MSE differences across 10 repeated random splits.

Horizontal bar plot of Δ MSE (MTGBM − LightGBM) for each of 10 independent data partitions. Blue bars indicate runs where MTGBM achieved lower MSE; orange bars indicate runs where LightGBM achieved lower MSE.
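
The per-run comparison reduces to a signed difference per data partition. A minimal sketch, assuming each model's test MSE has already been collected into one list per model in matching run order; the function names are hypothetical and the usage values in the comment are placeholders, not results from the paper.

```python
import numpy as np

def per_run_delta_mse(mse_mtgbm, mse_lightgbm):
    """Delta MSE (MTGBM - LightGBM) for each repeated split."""
    return np.asarray(mse_mtgbm, dtype=float) - np.asarray(mse_lightgbm, dtype=float)

def summarize_delta(delta):
    """Count which model achieved the lower MSE in each run."""
    delta = np.asarray(delta, dtype=float)
    return {
        "mtgbm_lower": int(np.sum(delta < 0)),     # blue bars in the figure
        "lightgbm_lower": int(np.sum(delta > 0)),  # orange bars in the figure
        "ties": int(np.sum(delta == 0)),
        "mean_delta": float(delta.mean()),
    }

# Usage with placeholder numbers (illustrative only):
# delta = per_run_delta_mse([0.30, 0.25, 0.40], [0.32, 0.25, 0.35])
# summarize_delta(delta)
```

Reporting the sign of each run's difference alongside the mean makes it visible when an average advantage is driven by only a few partitions.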

Fig 8.

Per-run VD MSE differences across 10 repeated random splits.

Horizontal bar plot of Δ MSE (MTGBM − LightGBM) for each of 10 independent data partitions. Blue bars indicate runs where MTGBM achieved lower MSE; orange bars indicate runs where LightGBM achieved lower MSE.

Fig 9.

SHAP summary plots for MTGBM.

(A) Beeswarm plot of SHAP values for top-10 features for CL prediction. (B) Bar plot of mean absolute SHAP values for CL prediction. (C) Beeswarm plot of SHAP values for top-10 features for VD prediction. (D) Bar plot of mean absolute SHAP values for VD prediction. Each point in beeswarm plots represents one test compound, colored by feature value (red: high; blue: low).
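
The bar plots in panels (B) and (D) rank features by mean absolute SHAP value. A minimal sketch of that ranking, assuming the SHAP values are already available as a matrix with one row per test compound and one column per feature; how that matrix is obtained from MTGBM is not shown, and the function name is hypothetical.

```python
import numpy as np

def top_k_mean_abs_shap(shap_values, feature_names, k=10):
    """Rank features by mean |SHAP| across compounds and keep the top k."""
    shap_values = np.asarray(shap_values, dtype=float)
    # Average magnitude per feature, ignoring the direction of the effect.
    mean_abs = np.abs(shap_values).mean(axis=0)
    # Indices sorted from largest to smallest mean |SHAP|.
    order = np.argsort(mean_abs)[::-1][:k]
    return [(feature_names[i], float(mean_abs[i])) for i in order]
```

The beeswarm panels carry the information this summary discards: the sign of each compound's SHAP value and how it relates to the feature value.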
