Table 1.
A sample dataset generated by the table data extraction and curation module for Table 5 of DOI 10.1016/j.thromres.2015.07.019 [31]. The columns are, respectively, DOI, table number, article title, drug name, dosage, route of administration, animals, number of animals, and clearance value.
Fig 1.
Circular dendrogram representing various animal grouping strategies that can be considered in the PK parameter prediction models.
Some of these groups were treated as independent candidates for clearance prediction in this study.
Table 2.
Examples of routes of administration curated from the research literature and the short forms (abbreviations) assigned to them for building ML models. Note: not all listed routes of administration were included in ML model development.
Fig 2.
3D scatter plot of the raw (imbalanced) dataset used to develop the clearance (CL and CL/F, where F is bioavailability) prediction model.
The plot shows a representative dataset used in model development: (A) clearance distribution by drug and route of administration; (B) clearance distribution by drug and animal.
Fig 3.
Distribution of the datasets selected for the prediction models: (A) imbalanced, (B) undersampling, (C) oversampling, and (D) simultaneous resampling methods.
Compared with Fig 3A, panels B, C, and D achieve a well-balanced data distribution by changing the frequency of data samples, either decreasing or increasing the number of samples (or both), using the imbalanced-learn Python module.
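The random under- and oversampling idea can be sketched in plain Python. This is a minimal illustration, not the study's code: it assumes each record carries a categorical label (the hypothetical key `group`, e.g. an animal group) on which balance is measured. In practice, imbalanced-learn provides `RandomUnderSampler` and `RandomOverSampler` (and combined methods such as `SMOTEENN`) through `fit_resample`; the `resample_to_target` helper below is a hypothetical simplification of combined resampling and does not reproduce how those library classes work.

```python
import random
from collections import defaultdict


def group_indices(rows, key):
    """Map each group label to the indices of the rows carrying it."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[key]].append(i)
    return groups


def random_undersample(rows, key, seed=0):
    """Randomly drop rows from larger groups until every group matches the smallest one."""
    rng = random.Random(seed)
    groups = group_indices(rows, key)
    n_min = min(len(idx) for idx in groups.values())
    keep = [i for idx in groups.values() for i in rng.sample(idx, n_min)]
    return [rows[i] for i in sorted(keep)]


def random_oversample(rows, key, seed=0):
    """Duplicate randomly chosen rows of smaller groups until every group matches the largest one."""
    rng = random.Random(seed)
    groups = group_indices(rows, key)
    n_max = max(len(idx) for idx in groups.values())
    out = []
    for idx in groups.values():
        extra = [rng.choice(idx) for _ in range(n_max - len(idx))]
        out.extend(rows[i] for i in idx + extra)
    return out


def resample_to_target(rows, key, target, seed=0):
    """Hypothetical combined scheme: undersample groups above `target`, oversample those below it."""
    rng = random.Random(seed)
    out = []
    for idx in group_indices(rows, key).values():
        if len(idx) >= target:
            chosen = rng.sample(idx, target)
        else:
            chosen = idx + [rng.choice(idx) for _ in range(target - len(idx))]
        out.extend(rows[i] for i in chosen)
    return out


# Toy, made-up records: 6 'cattle' rows and 2 'dog' rows
rows = [{"group": "cattle", "cl": float(i)} for i in range(6)] + [
    {"group": "dog", "cl": 10.0 + i} for i in range(2)
]
under = random_undersample(rows, "group")
over = random_oversample(rows, "group")
both = resample_to_target(rows, "group", target=4)
```

Because imbalanced-learn samplers expect class labels, a regression target such as clearance would have to be balanced with respect to some categorical label; which label the study used is not stated in these captions.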
Fig 4.
(A) SHAP summary plot, (B) SHAP bar plot, and (C)–(D) representative SHAP waterfall plots depicting feature contributions to individual predictions.
The plots show how each attribute contributes positively or negatively to the predicted target values. E[f(X)] denotes the base value, the average model output computed by SHAP, which serves as the reference point. Actual values for representative data points are highlighted in green boxes, and the corresponding predictions are denoted f(X). Note that clearance is expressed in mL/min/kg in this study, so the predicted value may differ from the value displayed in the green boxes.
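In a waterfall plot, the contributions stack from E[f(X)] up to f(X); this is the additivity (efficiency) property of Shapley values, under which the base value plus the sum of the per-feature contributions equals the prediction. The brute-force sketch below illustrates this for a hypothetical three-feature toy model, using an interventional value function over a small made-up background set. It is illustrative only: SHAP's explainers compute these values far more efficiently and, depending on the explainer, may define the value function differently.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, subset):
    """v(S): mean model output when features in `subset` take their values from x
    and all other features take their values from each background row."""
    total = 0.0
    for b in background:
        z = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def exact_shapley(model, x, background):
    """Brute-force Shapley values over all feature subsets (feasible only for a few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for s in combinations(others, size):
                s = set(s)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


def toy_model(z):
    """Hypothetical model with an interaction term; not the study's model."""
    return 2.0 * z[0] + 0.5 * z[1] - z[2] + 0.1 * z[0] * z[1]


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [2.0, 1.0, 0.0]]
x = [3.0, 1.0, 2.0]

base = coalition_value(toy_model, x, background, set())  # plays the role of E[f(X)]
phi = exact_shapley(toy_model, x, background)
# base + sum(phi) equals toy_model(x) up to floating-point rounding
print(base, phi, base + sum(phi), toy_model(x))
```

For a purely linear model, each contribution reduces to the weight times the feature's deviation from its background mean, which is why positive and negative bars in the waterfall can be read as pushes above or below the base value.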
Fig 5.
(A) SHAP summary plot, (B) SHAP bar plot, and (C)–(D) representative SHAP waterfall plots depicting feature (study design variables and molecular descriptors) contributions to individual predictions.
The plots show how each attribute contributes positively or negatively to the predicted target values. E[f(X)] denotes the base value, the average model output computed by SHAP, which serves as the reference point. Actual values for representative data points are highlighted in green boxes, and the corresponding predictions are denoted f(X). Note that clearance is expressed in mL/min/kg in this study, so the predicted value may differ from the value displayed in the green boxes.
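Because predictions are in mL/min/kg, comparing them with clearances reported in other units requires converting those values first. The helper below is a minimal sketch, assuming the reported values come with one of a few hypothetical unit strings and, for absolute (not weight-normalized) units, a known body weight in kg; the unit labels and function name are illustrative and not taken from the study's pipeline.

```python
# Factors converting weight-normalized clearance units to mL/min/kg
WEIGHT_NORMALIZED = {
    "mL/min/kg": 1.0,
    "mL/h/kg": 1.0 / 60.0,    # 60 min per hour
    "L/h/kg": 1000.0 / 60.0,  # 1000 mL per L, 60 min per hour
    "L/min/kg": 1000.0,
}

# Factors converting absolute clearance units to mL/min (still to be divided by body weight)
ABSOLUTE = {
    "mL/min": 1.0,
    "mL/h": 1.0 / 60.0,
    "L/h": 1000.0 / 60.0,
    "L/min": 1000.0,
}


def to_ml_min_kg(value, unit, body_weight_kg=None):
    """Convert a clearance value given in `unit` to mL/min/kg (hypothetical helper)."""
    if unit in WEIGHT_NORMALIZED:
        return value * WEIGHT_NORMALIZED[unit]
    if unit in ABSOLUTE:
        if body_weight_kg is None or body_weight_kg <= 0:
            raise ValueError(f"{unit} is not weight-normalized; a positive body weight in kg is required")
        return value * ABSOLUTE[unit] / body_weight_kg
    raise ValueError(f"unsupported clearance unit: {unit!r}")


print(to_ml_min_kg(0.6, "L/h/kg"))      # 0.6 L/h/kg expressed in mL/min/kg
print(to_ml_min_kg(30.0, "L/h", 50.0))  # 30 L/h in a hypothetical 50 kg animal
```

Raising an error for unknown units, rather than passing values through, keeps silently mis-scaled clearances out of the training data.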
Table 3.
Cross-validation scores for the hybrid ML CLT dataset curated from the literature for all species.
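The captions do not state the cross-validation scheme, so the following is only a sketch of generic shuffled k-fold cross-validation, assuming k = 5 and a hypothetical mean-value baseline standing in for the actual ML models; `fit`, `predict`, and `score` are placeholder callables, and the data are made up.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once, then deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_scores(X, y, fit, predict, score, k=5, seed=0):
    """Return one score per fold: train on k-1 folds, evaluate on the held-out fold."""
    scores = []
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in fold])
        scores.append(score([y[i] for i in fold], preds))
    return scores


def fit_mean(X_train, y_train):
    """Hypothetical baseline: the 'model' is just the mean training clearance."""
    return sum(y_train) / len(y_train)


def predict_mean(model, X_test):
    return [model] * len(X_test)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


X = [[float(i)] for i in range(10)]
y = [float(i) for i in range(10)]
scores = cross_val_scores(X, y, fit_mean, predict_mean, mae, k=5)
print(scores, sum(scores) / len(scores))
```

Reporting the per-fold scores alongside their mean, as a cross-validation table typically does, shows how stable a model is across different training subsets.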
Fig 6.
Primary imbalanced dataset showing (A) the major clusters and the clusters for the groups (B) ‘Ungulates’, (C) ‘Small Ruminants’, and (D) ‘Companion Animals’, with clusters defined by the drug administered per species.
Clusters are distinguished by color.
Table 4.
Cross-validation scores for a subset of the hybrid ML CLT dataset focusing on ungulates.
Table 5.
Cross-validation scores for a subset of the hybrid ML CLT dataset focusing on small ruminants.
Table 6.
Cross-validation scores for a subset of the hybrid ML CLT dataset focusing on companion animals.
Table 7.
Performance metrics (R², mean absolute error (MAE), root mean square error (RMSE), and explained variance score (EVS)) for various data resampling methods and selected ML models.
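For reference, the four metrics can be computed from paired true and predicted clearance values as below. This is a minimal sketch assuming non-constant `y_true` (otherwise R² and EVS are undefined); the example values are made up. R² and EVS coincide when the mean residual is zero and diverge when predictions are systematically biased, because EVS ignores a constant offset in the residuals.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, RMSE and explained variance score (EVS) for paired sequences."""
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_resid = sum(resid) / n
    ss_res = sum(e * e for e in resid)                       # sum of squared residuals
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)       # total sum of squares
    var_resid = sum((e - mean_resid) ** 2 for e in resid) / n
    var_true = ss_tot / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in resid) / n,
        "RMSE": math.sqrt(ss_res / n),
        "EVS": 1.0 - var_resid / var_true,
    }


metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(metrics)
```

MAE and RMSE are in the units of the target (here mL/min/kg), whereas R² and EVS are unitless, so the two pairs answer different questions about model quality.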
Table 8.
Performance metrics (R², MAE, RMSE, and EVS) for various data resampling methods for Case 6 (route = IV).
Fig 7.
Goodness of fit of the RF model, true vs. predicted values, for (A) the full dataset, (B) the ungulates dataset, (C) the small ruminants dataset, and (D) the companion animals dataset.
Best-fit lines are drawn through the true (actual) values (test data, cyan; training data, pink), and light-blue and dark-blue points show the predicted values for the test and training data, respectively. Dashed vertical lines separate the outcomes for three groups: (i) datasets including all routes of administration (hybrid ML CL); (ii) datasets restricted to route IV; and (iii) datasets including all routes except IV (non-IV). Data were split into training and test sets at a 70:30 ratio.
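The three route groupings and the 70:30 split can be sketched as follows. This assumes each record stores its route under a hypothetical `route` key, using the abbreviation `IV` for intravenous administration; the other route labels in the toy data are placeholders, and the seeded random shuffle is an assumption, since the caption gives only the ratio.

```python
import random


def route_subsets(rows, route_key="route"):
    """Split records into the three route groupings: hybrid, IV-only, and non-IV."""
    return {
        "hybrid": list(rows),                                 # all routes of administration
        "IV": [r for r in rows if r[route_key] == "IV"],      # intravenous only
        "non-IV": [r for r in rows if r[route_key] != "IV"],  # every route except IV
    }


def split_70_30(rows, seed=0):
    """Shuffle a copy of the records and return (train, test) at a 70:30 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Toy records; route labels other than "IV" are hypothetical placeholders
routes = ["IV", "IM", "PO", "IV", "SC", "IV", "PO", "IM", "IV", "PO"]
rows = [{"route": r, "cl": float(i)} for i, r in enumerate(routes)]
for name, subset in route_subsets(rows).items():
    train, test = split_70_30(subset)
    print(name, len(train), len(test))
```

Splitting each subset separately, rather than splitting once and then filtering, guarantees that every grouping keeps the 70:30 ratio.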
Table 9.
Overview of existing clearance prediction models. A selection of studies is included, organized by the prediction method adopted.