Machine learning-based unified models for predicting drug clearance from pharmacokinetic animal and study design variables

Remya Ampadi Ramachandran; Lisa A. Tell; Melissa A. Mercer; Xuan Xu; Nuwan Indika Millagaha Gedara; Maaike Ottoline Clapham; Zhoumeng Lin; Jim E. Riviere; Majid Jaberi-Douraki

doi:10.1371/journal.pone.0346432

Abstract

Clearance (CL) is a primary pharmacokinetic (PK) parameter crucial to determine how quickly a drug is eliminated from the body, which guides the appropriate dosing interval to maintain a consistent concentration in blood. Given the importance of CL, this study aimed to use machine learning (ML) techniques to predict CL values by identifying patterns and relationships within an extracted dataset of PK variables from published articles. Variables evaluated in the extracted dataset included drug, dose, animal species, and route of administration. Nine distinct ML models were then applied to analyze the CL data, incorporating both imbalanced and balanced data generated through resampling methods. Since the CL data used in this study is a collection of all CL values (true CL and CL/F) extracted from scientific articles, the collected CL variable for both IV and non-IV administration routes are referred to as hybrid ML CL. To analyze the effect of ML models in predicting the CL values, we used the hybrid ML CL dataset for six different subsets of data including one solely from the intravenous route of administration. Linear regression, multi-layer perceptron, and random forest models consistently had the highest efficiency in predicting CL values, with an R² score > 0.87. However, R² increased to > 0.95 when analyzing only ungulates or small ruminants, and > 0.92 for the companion animal group. This study has the potential to help researchers employ computational, mathematical, and ML models to predict and estimate CL values and changes in CL values based on variables. This study focuses on evaluating the feasibility of predicting drug CL in situations where direct CL data are not available. Rather than addressing drug development processes, the research examines whether study design variables can serve as input parameters for a proposed cross-species extrapolation tool aimed specifically at predicting existing drug CL values.

Citation: Ampadi Ramachandran R, Tell LA, Mercer MA, Xu X, Millagaha Gedara NI, Clapham MO, et al. (2026) Machine learning-based unified models for predicting drug clearance from pharmacokinetic animal and study design variables. PLoS One 21(5): e0346432. https://doi.org/10.1371/journal.pone.0346432

Editor: James Guevara Pulido, Universidad El Bosque, COLOMBIA

Received: June 10, 2025; Accepted: March 19, 2026; Published: May 6, 2026

Copyright: © 2026 Ampadi Ramachandran et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Researchers may obtain the code and datasets produced or analyzed during this study from the corresponding author upon reasonable request (jaberi@k-state.edu). The full dataset cannot be publicly shared due to licensing and reuse restrictions associated with the original journal sources. However, a full list of source paper DOIs, a representative dataset, and Python code snippets are accessible from the 1DATA platform’s (https://1data.life/) Drug Clearance data repository at the following link: https://1data.life/pages/publication/ML_Predicting_Drug_Clearance_PK/.

Funding: This work was supported by the USDA via the FARAD program (Award No.: 2022-41480-38135, 2023-41480-41034, 2024-41480-43679, and 2025-41480-45282) and its support for the 1DATA Consortium at Kansas State University. MJ-D also accepted funding from BioNexus KC (20-7) for this project. Neither USDA nor BioNexus KC had a direct role in this article.

Competing interests: No authors have competing interests.

Introduction

Veterinary drugs play a vital role in maintaining the health of both companion and agricultural animals. Since veterinary drugs are evaluated for safety and efficacy in the target species at the earliest stages of drug development, early and successful dose characterization is critical to the veterinary drug development pipeline [1]. While there are many similarities between human and veterinary pharmaceutical development, product developers face unique challenges when bringing veterinary drugs to market. Factors such as disease, species, age, animal breed/genetics, and physiologic state (including pregnancy, lactation, or disease) can significantly impact a drug’s pharmacokinetics (PK) [2–7]. Therefore, the development of in silico PK modeling methods may be particularly advantageous to improve the efficiency of assessing target dosing strategies during preclinical drug development.

The two most critical PK parameters for dose characterization are the Volume of Distribution (Vd) and Clearance (CL). Vd is a proportionality constant equal to the amount of drug in the body divided by the plasma concentration at a given time and is primarily used for an appropriate loading dose. Since Vd is primarily determined by the drug’s physical properties, including lipophilicity, water solubility, charge, and extent of protein binding, Vd is relatively straightforward to successfully model via a variety of in silico approaches [8]. Meanwhile, CL is the PK parameter that measures the ability of the body to eliminate a drug per unit of time. As such, CL is the relevant parameter to calculating the dose and dosing interval required to maintain the steady-state plasma concentration of a drug. CL is also used to model the terminal half-life of a drug and to predict potential drug-drug interactions.

In contrast to Vd, CL has proven to be a far more challenging PK parameter to model. Total body clearance (CL_T) is a summation of all the individual organ CLs, including liver and kidney. The reason modeling CL is challenging is that CL_T is influenced by both drug properties and animal physiology. Consequently, accurately modeling CL_T requires accounting for the complex interactions between drug metabolism and organ-specific physiology [9]. CL_T can be affected by factors such as species, genetics, age, physiologic status, and disease—all of which must be considered when developing a dosing regimen [3,4,10]. In vivo CL_T is best determined following intravenous (IV) administration, as this method bypasses the variability introduced by drug absorption (as reflected by drug bioavailability [F]). However, when in vivo studies are conducted using extravascular routes of administration, bioavailability (F) is not directly measured, and clearance is often reported as CL/F, which further increases the challenge of modeling CL.
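The distinction between true CL and apparent CL (CL/F) can be written out using the standard noncompartmental relationships. These are textbook definitions rather than equations given in this article, and the symbols $AUC_{IV}$ and $AUC_{EV}$ (area under the plasma concentration-time curve from zero to infinity after intravenous and extravascular dosing, respectively) are introduced here for illustration.

```latex
CL_T = \frac{\mathrm{Dose}_{IV}}{AUC_{IV}},
\qquad
\frac{CL_T}{F} = \frac{\mathrm{Dose}_{EV}}{AUC_{EV}},
\qquad 0 < F \le 1
```

Because $F$ is at most 1, CL/F equals CL_T only when the drug is completely bioavailable; otherwise the apparent value is larger than the true clearance.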

Approaches that predict individual organ CL, such as hepatic CL, based on in vitro techniques have been extensively explored in small molecule development for humans [11] and in a limited fashion to predict hepatic CL in veterinary species [12]. However, these in vitro techniques can only predict hepatic CL, not the CL_T used for dose characterization. Therefore, CL_T has most commonly been predicted using in silico methods, be that allometric scaling [13,14], in vivo–in vitro extrapolation (IVIVE) [15], or physiologically based pharmacokinetic (PBPK) modeling [16]. These methods have variable accuracy when compared to in vivo data and are based on individual studies or small pools of data to predict PK parameters, which may not fully reflect the variability of the target population.

Artificial intelligence (AI) and machine learning (ML) have been explored as methods for predicting CL_T in various species, including preclinical species [17–19]. ML methods incorporate large pools of data spanning a variety of ages, weights, and physiologic states to predict parameters based on broad population-based data. These methods have been applied to predict plasma half-lives and CL_T of drugs in veterinary species using molecular descriptors [20,21], but have not yet been used to explore the ability to predict CL_T in veterinary species using the pharmacokinetic study design parameters for individual drugs. The proposed AI/ML model focuses on data harvesting from scientific literature and on predicting both true CL, using the IV-only CL dataset, and hybrid ML CL, using a dataset that includes both true CL and CL/F values, based on pharmacokinetic animal study design variables. This model stands apart because it primarily uses the PK study design variables as the predictor (independent) variables, unlike other mathematical and ML PK models that rely on molecular descriptors and physicochemical properties [21,22].

Although robust pharmacokinetic data is fundamental for understanding drug disposition, CL values are often missing or difficult to obtain due to resource limitations, ethical constraints, or the absence of comprehensive in vivo studies. To address these practical barriers, the present study investigates whether study design variables alone can be leveraged to reliably predict drug CL values when empirical data are missing. By focusing the analysis on existing drugs, the proposed cross-species extrapolation tool is used to explore the potential of study design variables as inputs in generating accurate CL predictions. This study aimed to evaluate the efficacy of nine distinct ML models to estimate total body CL in a variety of veterinary species.

Methodology

Data resources

Similar to other ML methods, collecting data is a vital part of developing an automated drug CL_T prediction model. Due to the limited accessibility of the preexisting CL_T database suitable for ML research, we relied on scientific literature to create our dataset [22,23]. Our automated Web Crawler for PK was utilized to download XML versions of scientific articles [24]. The database we utilized for this study comprised scientific publications in XML format, obtained mainly from the Scopus, Springer, and Crossref TDM API providers. These publications primarily cover the following Anatomical Therapeutic Chemical (ATC) drug classes: QD01–QD11 (Dermatologicals), QH01-QH05 (Systemic Hormonal Preparations Excluding Sex Hormones and Insulin), QJ01–QJ05 (Antiinfectives for Systemic Use), and QP51–QP54 (Antiparasitic Products, Insecticides, and Repellents). Based on the ATC classification, some of these drugs may belong to other drug classes including QMs (NSAIDs – Anti-inflammatory and antirheumatic products). The ATCvet code used here is formed by adding the letter Q to the standard ATC code [25,26]. In both the ATC and the ATCvet systems, preparations are categorized based on their intended therapeutic use (Level 1: broad anatomical groups QA – QV), preparations are further subdivided into therapeutic main groups (Level 2: QA01, QA02, etc.), categorized into chemical, therapeutic, or pharmacological subgroups (Levels 3: QA02A, QA02B, etc., and Level 4: QA02AA, QA02AB, etc.), and finally the chemical substance classification except for QI Immunological (Level 5: QA02AA01).

Data collection

As part of the automated data extraction procedure, the XML versions of scientific articles were given as input to the table data extraction module, which identifies the table format and extracts the corresponding PK data [24,27–29]. In this study, the automated modules, implemented in the Python programming language (version 3.8) [30], generate an output dataset focused on developing the drug clearance prediction model. Table 1 shows a sample output, generated in .csv file format by the table data extraction and curation module, for DOI 10.1016/j.thromres.2015.07.019 [31].


Table 1. A sample dataset generated from the table data extraction and curation module for the Table 5 of DOI 10.1016/j.thromres.2015.07.019 [31], with DOI, table number, title of the article, drug name, dosage, route of administration, animals, number of animals, and clearance values in the columns respectively.

https://doi.org/10.1371/journal.pone.0346432.t001

Manual extractions were also conducted to verify and validate the precision of the automatic extraction process and to fill in any missing data. In this way, data present in tables of extracted scientific articles were curated both manually [32] and by using an automated extraction approach implemented in Python modules. Finally, for this study, we have limited our data collection and curation to the CL data for every drug, route of administration, and animal, that were extracted from the published records [24] and could be accessed through the rule-based tabular data extraction module for the XML version created in Python [30]. The rule-based tabular data extraction method is similar to the named entity recognition (NER) model that addresses the challenges in accurately extracting the PK parameters from scientific literature [33]. Additionally, a database of pharmacokinetic data manually extracted from scientific articles by the Food Animal Residue Avoidance and Depletion (FARAD) program was included [32]. Therefore, due to the variety of different routes of administration, the extracted CL data used in this study included both CL and apparent CL (CL/F). As such, the CL data parameter in this study was termed a hybrid route-independent machine learning CL (hybrid ML CL_T), which is a combination of CL and CL/F values from both extravascular and intravascular drug administrations.

Data compliance

The dataset used in this study was curated both manually and automatically from scientific publications indexed mainly in the PubMed and Scopus databases. Relevant articles were identified using keyword-based searches, and the data, including clearance, drug, dosage, route of administration, and animal/species, were systematically extracted. Because all the data were obtained from open access or subscription-based publications and used in compliance with the terms and conditions of the respective bibliographic sources, no prior approval was required. The data collected through the two methodologies employed in this study, namely the automated customizable web crawler [24] and the manual collection [32], together with the data analyzed and shared, adhered to the terms and conditions of the data sources, along with relevant ethical and legal standards.

Data pre-processing

To ensure data quality, data cleansing was incorporated in the initial phase of model development. This step merged the manually and automatically collected datasets, removed unwanted observations, managed outliers, handled or rebuilt missing data, removed duplicates, and standardized units, resulting in a cleaned dataset. To generate a valid dataset for this study, we used expert opinion in this data preparation phase to determine the range of clearance values retained from the curated dataset, as mentioned in the Data sampling section below. Data preprocessing is the term used for such manipulation or discarding of data to generate high-quality datasets for implementing high-performance ML models [34].

Drug CL can be expressed as (volume/time) and is commonly normalized to body weight for veterinary species. As such, the CL values published in the articles were expressed in a variety of units, including L/h/kg, mL/min/kg, μL/min/mg, mL/h/kg, and similar variations. To address the variety of different units for CL, we also converted the collected CL data into a unit standardized format (mL/min/kg). After cleaning the dataset, it was divided into two datasets for training and testing the models. The performance, reliability, and generalizability of the machine learning model were confirmed by using the 5-fold cross-validation scheme. Here, instead of performing a single train-test split, the training dataset was randomly divided into five equal-sized subsets or folds. The model was then trained on four of the folds and evaluated on the remaining fold. This process was repeated five times, with each fold serving as the test set exactly once [34].
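The five-fold scheme described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the authors' code: the function name `k_fold_indices`, the seed, and the sample count are assumptions, and the study itself may have relied on a library routine for the same purpose.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds whose sizes differ
    by at most one. Each fold is used as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical example: 20 records split into 5 folds of 4 records each.
    for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(20), start=1):
        print(fold_no, len(train_idx), len(test_idx))
```

A model would be fitted on `train_idx` and scored on `test_idx` in each round, and the five scores averaged.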

Data sampling

The data obtained from the pre-processing module were often imbalanced, with class distributions (drugs or animal groups shown in Fig 1) of unequal size, yet most ML algorithms deliver optimal outcomes for balanced class distributions [34,35]. For example, the number of published papers is not equal across species (e.g., far more PK studies are performed in cattle than in camelids), and certain routes of administration are more commonly studied than others, depending on how feasible they are to administer in a given species. Duplicating or creating synthetic data for minority classes (oversampling), dropping data from majority classes (undersampling), or combining both approaches (simultaneous sampling) have been used to handle imbalanced datasets before fitting ML models. The Python imbalanced-learn library (imported as imblearn) was employed, with RandomUnderSampler and RandomOverSampler used for undersampling and oversampling, respectively. The imblearn Pipeline was used to implement the resampling technique from the selected methods [36,37]. Combined with binning strategies to handle continuous variables or targets, flexible response-stratified sampling strategies can be applied within a regression framework. This approach addressed imbalance issues in the datasets while preserving the continuous nature of the target clearance values, supporting the ML model’s ability to make accurate predictions.
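The idea of binning a continuous target and then resampling by bin can be illustrated with a small hand-rolled sketch. It deliberately does not use imbalanced-learn: it reimplements plain random oversampling with the standard library so that the mechanics are visible. The bin edges, the record layout, and the function names are hypothetical and are not taken from the study.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict


def bin_label(value, edges):
    """Return the index of the clearance bin that `value` falls into."""
    return bisect_right(edges, value)


def oversample_by_bin(records, edges, seed=0):
    """Randomly duplicate records so every clearance bin has equal size.

    Records are grouped by bin, and minority bins are topped up by sampling
    with replacement until they match the largest bin. Clearance values are
    copied unchanged, so the target stays continuous.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[bin_label(rec["cl"], edges)].append(rec)
    target = max(len(group) for group in groups.values())
    balanced = []
    for label in sorted(groups):
        group = groups[label]
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical clearance values in mL/min/kg; most fall in the lowest bin.
records = [{"cl": v} for v in (0.5, 0.8, 1.2, 1.5, 2.0, 15.0, 120.0)]
edges = [10.0, 100.0]  # assumed bin boundaries, for illustration only
balanced = oversample_by_bin(records, edges)
counts = Counter(bin_label(rec["cl"], edges) for rec in balanced)
```

Undersampling follows the same pattern in reverse: each bin is cut down to the size of the smallest bin by sampling without replacement.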


Fig 1. Circular dendrogram representing various animal grouping strategies that can be considered in the PK parameter prediction models.

Some of these groups were considered as independent candidates for clearance prediction in this study.

https://doi.org/10.1371/journal.pone.0346432.g001

In this way, the literature-derived clearance data were curated and harmonized by converting reported values to a consistent unit (mL/min/kg), incorporating key study variables (drug, dose, route of administration, and species), and applying predefined aggregation and outlier screening procedures, with domain-informed review used to resolve ambiguous or sparsely reported entries. Because the resulting dataset exhibited an uneven distribution across clearance ranges, resampling strategies were applied within the training pipeline to mitigate data imbalance during model development.
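The unit harmonization step can be sketched as a lookup of conversion factors to mL/min/kg. This is a minimal illustration, not the authors' implementation: the function name and the set of supported unit strings are assumptions, and units normalized to something other than body weight (such as μL/min/mg) are left out because the text does not say how they were handled.

```python
# Multiplicative factors that convert a clearance value to mL/min/kg.
TO_ML_MIN_KG = {
    "mL/min/kg": 1.0,
    "mL/h/kg": 1.0 / 60.0,     # per hour -> per minute
    "L/h/kg": 1000.0 / 60.0,   # litres -> millilitres, per hour -> per minute
    "L/min/kg": 1000.0,        # litres -> millilitres
}


def to_ml_min_kg(value, unit):
    """Convert a body-weight-normalized clearance to mL/min/kg."""
    try:
        factor = TO_ML_MIN_KG[unit]
    except KeyError:
        # Fail loudly instead of guessing a factor for an unknown unit.
        raise ValueError(f"unsupported clearance unit: {unit!r}") from None
    return value * factor


if __name__ == "__main__":
    print(to_ml_min_kg(0.6, "L/h/kg"))
```

Raising an error for unrecognized units keeps ambiguous entries out of the dataset until they have been reviewed by hand.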

Dataset

Predictions of true CL or CL/F in this study were modeled using datasets that included drug, dose, route of administration, and species, which were extracted from peer-reviewed published pharmacokinetic studies using live animals or in vitro studies. A total of 483 drugs, 18 routes of administration, and 52 species having different combinations of CL_T values were curated. Figs 1 and 2 (https://1data.life/pages/publication/ML_Predicting_Drug_Clearance_PK/Cl_Paper_Fig1.svg) present a general overview of the datasets extracted. S1 File (https://1data.life/pages/publication/ML_Predicting_Drug_Clearance_PK/Cl_Paper_Supplment_S1_.html) includes different data clusters with species and drugs gathered from research literature in their raw state. The pre-processing module refined these curated datasets by consolidating drug names, routes of administration, and species to create a clean, consistent dataset. Fig 2 and Table 2 describe the data after eliminating the multiple occurrences of various data combinations with their drug and CL_T values. In the data preprocessing phase, we placed high importance on removing any records that had missing information in any of the columns, including drug, dose, route, species, and clearance. The module also assigned a single name to different representations of the same route, such as ‘IV’, ‘intravenously’, and ‘intravenous’, as shown in Table 2. Additionally, the in vitro routes reported in Table 2 were automatically extracted by the table data curation module [24,27], but they were not all used for the ML model development, as they reflect an experimental scenario rather than a real route of administration.
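The cleaning rules described in this section (dropping incomplete records, mapping route variants to one name, and removing repeated combinations) can be sketched as follows. This is a simplified illustration: only the three intravenous spellings come from the text, while the column names, the sample records, and the `clean_records` helper are hypothetical.

```python
REQUIRED_COLUMNS = ("drug", "dose", "route", "species", "clearance")

# Route spellings mapped to one short form; the IV variants are from the text.
ROUTE_ALIASES = {
    "iv": "IV",
    "intravenous": "IV",
    "intravenously": "IV",
}


def clean_records(records):
    """Drop incomplete rows, normalize route names, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Skip any record with a missing value in a required column.
        if any(rec.get(col) in (None, "") for col in REQUIRED_COLUMNS):
            continue
        raw_route = str(rec["route"]).strip()
        route = ROUTE_ALIASES.get(raw_route.lower(), raw_route)
        row = {**rec, "route": route}
        # Treat rows that agree on every required column as duplicates.
        key = tuple(row[col] for col in REQUIRED_COLUMNS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical records, for illustration only.
sample = [
    {"drug": "drug_a", "dose": 5.0, "route": "intravenously", "species": "cattle", "clearance": 2.1},
    {"drug": "drug_a", "dose": 5.0, "route": "IV", "species": "cattle", "clearance": 2.1},
    {"drug": "drug_b", "dose": None, "route": "intravenous", "species": "sheep", "clearance": 4.0},
    {"drug": "drug_c", "dose": 10.0, "route": "PO", "species": "goats", "clearance": 7.4},
]
cleaned = clean_records(sample)
```

Here the second record collapses into the first once both routes read `IV`, and the third is dropped for its missing dose.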


Table 2. An example of the routes of administration curated from research literature and the associated short forms (abbreviations) assigned for building ML models. Note: not all listed routes of administration were included in ML model development.

https://doi.org/10.1371/journal.pone.0346432.t002


Fig 2. 3D scatter plot representing the dataset (raw/imbalanced) used to develop the clearance (CL) or CL/F prediction model.

This plot contains a representative dataset used in the model development: (A) clearance data distribution corresponding to drug and route of administration, and (B) clearance data distribution corresponding to drug and animal.

https://doi.org/10.1371/journal.pone.0346432.g002

Feature importance score

In ML model development, a crucial step involves calculating the scores of all predictor variables in a model in order to assess their importance in the decision-making or prediction process. In the proposed CL_T prediction model, the feature importance of a forest of trees has been considered [38]. Here, the fitted attribute feature importances (feature_importances_) were determined by calculating the average and standard deviation of the decrease in impurity for each tree in the model. By using this approach, we have the flexibility to perform feature importance calculations either by mean decrease in impurity or feature permutation. Relying on feature permutation to measure feature importance can eliminate bias towards high-cardinality features, which is a drawback of using impurity-based feature importance calculation [34]. This research additionally assessed the significance of features using SHAP (Shapley Additive exPlanations) analysis [39]. The RF ranks feature contributions globally, supporting features with continuous values due to its regular and effective data splitting across all trees, and it calculates the features’ overall contributions to accuracy or impurity reduction. On the other hand, SHAP values apply cooperative game theory to assign a fair, instance-specific credit of each feature’s contribution to predictions, which may be aggregated for overarching insights. Due to these reasons, RF feature importance scores and SHAP values may show complementary insights into model interpretability resulting in potentially different feature contributions.
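Permutation-based importance, mentioned above as an alternative to impurity-based scores, can be illustrated without any ML library. The sketch below is a generic version of the technique under stated assumptions: the toy data and the stand-in `predict` function are hypothetical, and the study itself worked with a fitted random forest rather than this hand-written predictor.

```python
import random


def r2(y_true, y_pred):
    """Coefficient of determination for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the target depends on feature 0 only; feature 1 is noise.
X = [[float(i), float((i * 7) % 8)] for i in range(8)]
y = [3.0 * row[0] for row in X]


def predict(row):
    """Stand-in for a fitted model; it ignores feature 1 entirely."""
    return 3.0 * row[0]


importances = permutation_importance(predict, X, y)
```

Shuffling the ignored feature leaves the score unchanged, so its importance is zero, while shuffling the informative feature lowers the score.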

Performance measures

The predictive quality of an ML model is assessed using model evaluation tools, including cross-validation scores and metric functions such as scoring parameters. The cross-validation score relies on an internal scoring strategy that splits the dataset into k consecutive folds, with each fold in turn being allocated to the validation phase and the others to the training set. This helps to guard against overfitting. For predicting continuous target values, regression models are preferred over classification models. Since this study focused on predicting drug CL_T values, which are continuous, we opted to use ML regression models. The performance of these models was evaluated using R² scores as the primary performance metric. The R² score indicates how well the model fits the given dataset. An ideal model has an R² score of 1 or close to 1, which in turn confirms the efficiency of the model implemented for predicting the CL_T values. It can be calculated using Equation (1) [34,40,41].

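$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1}$$

where $\hat{y}_i$ and $y_i$ are the predicted and actual values of $y$, respectively, and $\bar{y}$ is the mean of the actual values.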

The explained variance score (EVS) represents the amount of error dispersion present in a particular dataset as calculated using Equation (2). Similar to the R² score, values closer to 1 are preferable.

$$\mathrm{EVS} = 1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)} \tag{2}$$

where $\mathrm{Var}(y - \hat{y})$ is the variance of prediction errors, and $\mathrm{Var}(y)$ is the variance of actual values.

Root mean square error (RMSE), which is an extension of mean square error (MSE), is recognized as the predominant error measure of precision in regression, quantifying performance in the same units as the predicted values. RMSE values can be computed using equation (3), with a score of 0 being deemed ideal:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{3}$$

The mean absolute error (MAE) metric helps us understand how different our predictions are from the actual values in the dataset. It assists in assessing the accuracy of the model by measuring the absolute differences between the predicted and real values (Equation (4)):

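$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{4}$$

where $y_i$ are the actual values and $\hat{y}_i$ are the predicted values for $i$ in $1, \dots, n$, with $n$ being the length of actual values.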
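The four metrics can be computed directly from their standard definitions, as in the sketch below. It uses only the standard library for transparency; the function names and the sample values are illustrative assumptions, and in practice a library implementation would normally be used instead.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = _mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def explained_variance(y_true, y_pred):
    """1 - Var(errors) / Var(actual), using population variances."""
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    e_bar, y_bar = _mean(errors), _mean(y_true)
    var_err = _mean([(e - e_bar) ** 2 for e in errors])
    var_y = _mean([(yt - y_bar) ** 2 for yt in y_true])
    return 1.0 - var_err / var_y


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target."""
    return math.sqrt(_mean([(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return _mean([abs(yt - yp) for yt, yp in zip(y_true, y_pred)])


# Hypothetical actual and predicted values, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
scores = {
    "r2": r2_score(y_true, y_pred),
    "evs": explained_variance(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
    "mae": mae(y_true, y_pred),
}
```

R² and EVS differ only when the prediction errors have a non-zero mean, which is why the two scores are close but not identical for these sample values.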

Results

Data sampler and machine learning models

A dataset comprising 4,788 records was extracted from the tables present in the XML files. There were 129 drugs with extractable data across 10 ungulate species (alpacas, buffalo, camels, cattle, donkeys, goats, horses, llamas, pigs, and sheep). There were 50 drugs with extractable data across 2 small ruminant species and 33 drugs with extractable data across 12 avian species. To evaluate the efficiency of ML models at predicting CL_T values, we separated certain groups from the overall data analysis for individual examination (cases), with a sample of the findings presented in this article. The PK data curated from the literature includes imbalanced observations of its data clusters such as species, route of administration, drug, and dosage, all of which affect the target variable clearance. Fig 3A confirms the imbalanced nature of the selected dataset.


Fig 3. Distribution of datasets selected for the prediction models, (A) imbalanced, (B) undersampling, (C) oversampling, and (D) simultaneous resampling methods.

Compared with Fig 3A, Figs 3B, 3C, and 3D attain a well-balanced data distribution by modifying the frequency of data samples. This is accomplished by either decreasing or increasing the number of samples, using the imbalanced-learn Python module.

https://doi.org/10.1371/journal.pone.0346432.g003

As mentioned in the ‘Data Sampling’ section above, random resampling strategies such as random undersampling (Fig 3B), random oversampling (Fig 3C), and a combination of these strategies for generating simultaneously sampled (Fig 3D) datasets were applied [42]. Fig 3 is an illustration of the influence of various resampling strategies over the imbalanced dataset distribution. Similarly, a balanced distribution of classes was achieved by adjusting the frequency of data samples, either by reducing or increasing them. Undersampling (Fig 3B) reduces the number of samples in the original majority class, making it comparable to the original count of the minority class. However, because it works by removing samples rather than adding new ones, the resulting dataset may not appear as dramatically balanced as datasets created through oversampling (Fig 3C) or simultaneous (Fig 3D) sampling.

Feature importance

The features in the importance analysis are the predictor study design parameters for individual drugs, such as route of administration, species, drug, or dose. The feature importance analysis conducted in this study shows the level of influence, or percentage contribution, of each study design variable to the target variable (CL). According to the RF feature importance assessment, the CL_T value is primarily influenced by (in order of decreasing contribution) the drug, the dosage, the route of administration, and the treated species. However, according to the RF feature importance score for the balanced dataset, the route of administration had a relatively lower impact on the CL_T value. This is not unexpected because the parameter being estimated by the ML model was the hybrid ML CL_T, which includes CL/F and therefore, through bioavailability, already accounts for the influence of the route of administration. Additionally, although CL_T is typically dose-independent, all drug types were included, so dose-dependent CL could be observed for drugs with saturable metabolism.

Fig 4A–B depicts the contribution of each feature to the overall prediction of clearance values for the imbalanced dataset, whereas Fig 4C–D depicts the feature contribution scores obtained from the SHAP analysis for selected combinations of drug, dosage, animal, and administration route within the curated imbalanced datasets. The figures also display the base value from the SHAP implementation, which is the average model output, and the feature attributions that show how each feature shifts the prediction relative to the base value. In addition to the impact of specific feature contributions, these plots illustrate the closeness of the actual and predicted values for the specified dataset. SHAP contribution measures support local interpretability, clarifying the reasoning behind the model’s prediction for a specific sample, in contrast to the global interpretability provided by RF feature importance measures.
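The relationship between the base value and the per-feature attributions in a waterfall plot can be stated compactly. The expression below is the standard additive form of a SHAP explanation rather than an equation given in this article; the symbols $\phi_0$, $\phi_j$, and $M$ are introduced here for illustration, with $M$ the number of input features (four when only drug, dose, route, and species are used).

```latex
f(x) = \phi_0 + \sum_{j=1}^{M} \phi_j(x),
\qquad
\phi_0 = E[f(X)]
```

Each $\phi_j(x)$ is the signed contribution of feature $j$ for the specific sample $x$, so the contributions sum from the base value to that sample's prediction.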


Fig 4. (A) SHAP summary plot, (B) SHAP bar plot, and (C)–(D) representative SHAP waterfall plots depicting feature contributions to individual predictions.

The plots show how each attribute contributes positively or negatively to predicting the target values. E[f(X)] represents the base value, which is the average model output from the SHAP implementation and functions as the reference point. Representative datasets are shown with actual values, highlighted in green boxes, and their predictions, represented as f(X). Note that in this study the clearance value is expressed in mL/min/kg, so the predicted value may differ from the one displayed in green boxes.

https://doi.org/10.1371/journal.pone.0346432.g004

In the proposed model, we primarily examined how study design variables influenced the prediction of drug clearance values. To further evaluate the impact of molecular descriptors, we conducted a feasibility analysis comparing models based on study-design-only predictors, combined study-design + molecular descriptors, and molecular-descriptors-only. Fig 5A–B shows the results obtained from the prototype model with each feature’s contribution to the overall prediction of clearance for the imbalanced dataset, whereas Fig 5C–D provides the corresponding feature contribution score and prediction results obtained for the same combination of drug, dosage, animal, and administration route as shown in Fig 4C–D, now supplemented with Mw, LogP, TPSA values.


Fig 5. (A) SHAP summary plot, (B) SHAP bar plot, and (C)–(D) representative SHAP waterfall plots depicting feature (study design variables and molecular descriptors) contributions to individual predictions.

The plots show how each attribute contributes positively or negatively to predicting the target values. E[f(X)] represents the base value, which is the average model output from the SHAP implementation and functions as the reference point. Representative datasets are shown with actual values, highlighted in green boxes, and their predictions, represented as f(X). Note that in this study the clearance value is expressed in mL/min/kg, so the predicted value may differ from the one displayed in green boxes.

https://doi.org/10.1371/journal.pone.0346432.g005

Compared to the model with study design variables alone, incorporating molecular descriptors modestly improved the performance scores to the higher R² score of approximately 0.82 for the combined imbalanced dataset in both cross-validation and 70:30 training:test data split scenarios. In contrast, the molecular-descriptors-only model performed substantially worse, with an R² score of approximately 0.54. These findings highlight both the feasibility of integrating molecular descriptors and the central importance of study design variables in predicting drug clearance for the combinations of predictors examined in this study.

Prediction models

This paper presents ML prediction models for four subsets of the CL_T dataset (Cases 1–4), using the PK study design variables as the predictor variables; Case 5 re-evaluates these subsets under different train:test splits. Cases 1–5 use the hybrid ML CL_T dataset, focus on animal categories, and include all routes of administration (intravascular and extravascular): Case 1, the total hybrid ML CL dataset curated from the literature for all animal categories; Case 2, a subset of the hybrid ML CL dataset focusing on ungulates; Case 3, a subset of the ungulate hybrid ML CL dataset focusing solely on small ruminants; Case 4, a subset of the hybrid ML CL dataset focusing on companion animals; and Case 5, hybrid ML CL prediction models for different train:test data splitting options. Case 6 used a subset of the dataset (true CL) restricted to the intravenous (IV, IV bolus, or IV constant rate infusion) route of administration across all dosages and animal species. Model performance was evaluated under two conditions for predicting the drug CL or CL/F values: (A) without resampling techniques (imbalanced datasets) and (B) with resampling techniques (balanced datasets).
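The case definitions above amount to filters on animal category and route of administration. A minimal sketch of how such subsets could be derived is shown below; the record fields, drug names, and species groupings are hypothetical and are not taken from the curated dataset.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"drug": "drug_a", "species": "cattle", "route": "IV", "dose": 5.0, "cl": 3.2},
    {"drug": "drug_a", "species": "goat", "route": "IM", "dose": 5.0, "cl": 6.1},
    {"drug": "drug_b", "species": "sheep", "route": "IV bolus", "dose": 2.0, "cl": 4.4},
    {"drug": "drug_b", "species": "dog", "route": "PO", "dose": 10.0, "cl": 12.9},
    {"drug": "drug_c", "species": "cat", "route": "IV", "dose": 1.0, "cl": 8.0},
]

# Assumed groupings; the paper's exact category membership is not given in this section.
UNGULATES = {"cattle", "sheep", "goat", "pig", "horse"}
SMALL_RUMINANTS = {"sheep", "goat"}
COMPANION = {"dog", "cat"}
IV_ROUTES = {"IV", "IV bolus", "IV constant rate infusion"}

def select(rows, species=None, iv_only=False):
    """Keep rows matching an optional species group and an optional IV-only filter."""
    selected = []
    for row in rows:
        if species is not None and row["species"] not in species:
            continue
        if iv_only and row["route"] not in IV_ROUTES:
            continue
        selected.append(row)
    return selected

cases = {
    "case1_all": select(records),
    "case2_ungulates": select(records, UNGULATES),
    "case3_small_ruminants": select(records, SMALL_RUMINANTS),
    "case4_companion": select(records, COMPANION),
    "case6_iv_only": select(records, iv_only=True),
}
for name, rows in cases.items():
    print(name, len(rows))
```

Case 5 is not a separate subset: it reuses the Case 1–4 subsets with different train:test ratios.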

Unlike other prediction models [43–47] that rely on molecular descriptors of drugs, structural characteristics, physiological data, or physicochemical properties, the ML prediction models discussed in this study use the study design variables of an individual drug, including drug dosage, route of administration, and animal species, as input variables to predict the hybrid ML CL_T (Cases 1–5) and true CL_T (IV) (Case 6) rates of drugs. The ML models presented in this manuscript focus on identifying patterns and relationships within the PK study design variables, including feature contributions and interactions, to make parameter estimations. We neither assess the clinical implications of the input variables nor attempt to make predictions for drugs not included in the analyzed datasets, which reduces the need for molecular descriptors of drug structure. The dataset is also used to assess which ML approach is optimal for estimating CL_T data. In the proposed ML prediction models (Cases 1–5), all routes of administration, including intravenous (IV), are treated equally in determining the hybrid ML CL_T rates, whereas IV administration is considered the primary determinant of CL in pharmacological, clinical, and structure-activity relationship models [43–47].
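Because drug, species, and route are categorical, they must be turned into numbers before a regression model can use them. This section does not state which encoding was used, so the sketch below assumes one-hot encoding, with hypothetical records and field names.

```python
# Hypothetical study-design records (illustrative values only).
rows = [
    {"drug": "drug_a", "species": "cattle", "route": "IV", "dose": 5.0},
    {"drug": "drug_b", "species": "goat", "route": "IM", "dose": 2.0},
    {"drug": "drug_a", "species": "dog", "route": "PO", "dose": 10.0},
]
CATEGORICAL = ["drug", "species", "route"]

def fit_vocab(rows):
    """Collect the sorted levels observed for each categorical variable."""
    return {col: sorted({r[col] for r in rows}) for col in CATEGORICAL}

def encode(row, vocab):
    """One-hot encode the categorical variables, then append the numeric dose."""
    vec = []
    for col in CATEGORICAL:
        vec.extend(1.0 if row[col] == level else 0.0 for level in vocab[col])
    vec.append(row["dose"])
    return vec

vocab = fit_vocab(rows)
X = [encode(r, vocab) for r in rows]
feature_names = [f"{col}={level}" for col in CATEGORICAL for level in vocab[col]] + ["dose"]
print(feature_names)
print(X[0])
```

With this encoding, a drug that was never seen during fitting maps to an all-zero drug block, which is one way to see why such models are not suited to drugs outside the analyzed dataset.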

Case 1 (all animal categories and route forms – hybrid ML CL_T)

Imbalanced Dataset: This prediction model was formulated using the raw (primary) dataset of all CL_T values for all animal categories. Fig 6A shows a subgroup of clusters of the primary dataset based on species and drugs administered. The primary dataset was used as input to the prediction models. ML models, including tree-based models such as random forest (RF) and neural network models such as multi-layer perceptron (MLP), were implemented here. In the quantitative evaluation (Table 3), a cross-validation score (R² score ± standard deviation (STD)) of 0.791 ± 0.017 confirmed the efficiency of the ML models in determining the CL_T data from the given PK study design parameters.
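A cross-validation score reported as "R² ± STD" is the mean and standard deviation of the per-fold R² values. The sketch below illustrates that computation under stated assumptions: the data are synthetic, the model is a one-predictor least-squares line in place of the study's RF or MLP, and five folds are assumed because the number of folds is not given in this section.

```python
import random
import statistics

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def k_fold_r2(xs, ys, k=5, seed=0):
    """Shuffle indices, hold out each fold once, and return the per-fold R2 scores."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [slope * xs[i] + intercept for i in test_idx]
        scores.append(r2_score([ys[i] for i in test_idx], preds))
    return scores

# Synthetic data (assumption): target rises roughly linearly with one numeric predictor.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

scores = k_fold_r2(xs, ys, k=5)
mean_r2 = statistics.mean(scores)
std_r2 = statistics.pstdev(scores)  # population STD across folds
print(f"R2 = {mean_r2:.3f} +/- {std_r2:.3f}")
```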

Table 3. Cross-validation scores for the hybrid ML CL_T dataset curated from the literature for all species.

https://doi.org/10.1371/journal.pone.0346432.t003

Fig 6. Primary imbalanced dataset showcasing (A) major clusters, (B) for the group ‘Ungulates’, (C) for the group ‘Small Ruminants’, (D) for the group ‘Companion Animals’, with clusters based on the drug administered per species.

Clusters can be identified by their color.

https://doi.org/10.1371/journal.pone.0346432.g006

Resampled Dataset: Balanced datasets generated using resampling techniques were also considered for the model implementation. The results in Table 3 show the importance of using a balanced dataset for the CL_T prediction tasks. The same models used for the imbalanced dataset, including RF and MLP along with the other regression models, were applied here. Compared with the imbalanced and undersampled models, both the oversampled and simultaneously sampled sets produced improved cross-validation scores (Table 3). A cross-validation score greater than 0.871 ± 0.006 for the oversampled and simultaneously sampled sets reflects the efficiency of the models, especially CART, RF, and MLP, in handling the continuous target data of the hybrid ML CL_T parameter.
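Resampling a continuous target requires some notion of "class". One common approach, assumed here because the exact procedure is not described in this section, is to bin the clearance values into ranges and then resample each bin to a common size. In this sketch the bin edges, the sample values, and the choice of the median bin size for "simultaneous" sampling are all hypothetical.

```python
import random
import statistics
from collections import defaultdict

def bin_index(value, edges):
    """Return the index of the first bin whose upper edge exceeds value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def group_by_bin(samples, edges):
    """Group (features, cl) pairs by the clearance range they fall into."""
    bins = defaultdict(list)
    for features, cl in samples:
        bins[bin_index(cl, edges)].append((features, cl))
    return bins

def resample(samples, edges, target_size, seed=0):
    """Bring every non-empty bin to target_size by random under- or oversampling."""
    rng = random.Random(seed)
    out = []
    for members in group_by_bin(samples, edges).values():
        if len(members) >= target_size:
            out.extend(rng.sample(members, target_size))  # undersample without replacement
        else:
            extra = rng.choices(members, k=target_size - len(members))  # duplicate at random
            out.extend(members + extra)
    return out

# Hypothetical imbalanced data: many low, few high clearance values (mL/min/kg).
samples = ([("low", 1.0 + 0.1 * i) for i in range(12)]
           + [("mid", 6.0 + i) for i in range(5)]
           + [("high", 25.0), ("high", 40.0)])
edges = [5.0, 20.0]
sizes = [len(m) for m in group_by_bin(samples, edges).values()]

oversampled = resample(samples, edges, max(sizes))
undersampled = resample(samples, edges, min(sizes))
simultaneous = resample(samples, edges, int(statistics.median(sizes)))
print(len(oversampled), len(undersampled), len(simultaneous))
```

Undersampling discards most of the data in the larger bins, which is consistent with the weaker scores reported for the undersampled sets.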

Case 2 (ungulates and all route forms – hybrid ML CL_T)

Here the prediction models considered a subset of the curated datasets focused on CL_T values for ungulates. As with the raw datasets (Case 1), model efficiency was validated with both imbalanced and balanced datasets. The cross-validation scores (Table 4) show improved efficiency for the RF model (>85%) with the imbalanced dataset, for the LR model (~90%) with the undersampled dataset, and for the RF, MLP, LR, RIDGE, and CART models (~92%) with the oversampled and simultaneously sampled datasets. Fig 6B shows the clusters for the group ‘Ungulates’ based on species and drugs administered.

Table 4. Cross-validation scores for a subset of the hybrid ML CL_T dataset focusing on the group ungulates.

https://doi.org/10.1371/journal.pone.0346432.t004

Case 3 (small ruminants and all route forms – hybrid ML CL_T)

This case considered the curated datasets focused on the group ‘Small ruminants’ (Fig 6C). As in the previous cases, all nine ML regression models were implemented, and their efficiencies were validated. In this dataset, LR outperformed the other models on the imbalanced and undersampled datasets (>87%), while the MLP, RF, and CART models showed the highest efficiency (~96%) with the oversampled and simultaneously sampled datasets. The cross-validation scores obtained in this case are given in Table 5.

Table 5. Cross-validation scores for a subset of the hybrid ML CL_T dataset focusing on small ruminants.

https://doi.org/10.1371/journal.pone.0346432.t005

Case 4 (companion animals and all route forms – hybrid ML CL_T)

This case considered the curated datasets focused on ‘Companion Animals’ (Fig 6D). As in the previous cases, all nine ML regression models were implemented and their efficiencies validated. In this case, the CART model outperformed the other models across all the resampling strategies (~91% and above), while MLP gave the highest score with the imbalanced dataset (~79%). However, the tree-based models (CART, RF), MLP, and LR had almost comparable outcomes. The cross-validation scores obtained in this case are given in Table 6.

Table 6. Cross-validation scores for a subset of the hybrid ML CL_T dataset focusing on companion animals.

https://doi.org/10.1371/journal.pone.0346432.t006

Case 5 (all route forms for different training:test data splitting – hybrid ML CL_T)

Model validation was also performed with different train:test dataset splitting criteria, which gives a general sense of how the implemented model performs on unseen data. We applied this approach to all four scenarios (Cases 1–4) previously discussed. Table 7 summarizes the performance metrics of selected ML models for the various training:test splits selected for the study. RF was selected for the model validation phase based on the R² scores achieved for the balanced datasets (Cases 1–4). For the imbalanced and undersampled approaches, the R² score and EVS decreased as the number of test samples increased, as expected, potentially because less training data was available relative to the test dataset. Other performance metrics such as MAE and RMSE also appear to be good despite the wide range of CL_T values in the dataset. Similarly, for the oversampled and simultaneously sampled datasets, increasing the number of test samples lowered the goodness of fit (R²) compared with the traditional 80:20 training:test data splitting, but with minimal variation.
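For reference, the four metrics reported in Table 7 can be written out directly. The sketch below uses the standard definitions (EVS as one minus the variance of the residuals over the variance of the observed values) and hypothetical clearance values; it is not the study's evaluation code.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def evs(y_true, y_pred):
    """Explained variance score: 1 - Var(residuals) / Var(y_true)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_res = sum(residuals) / n
    mean_y = sum(y_true) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n
    return 1.0 - var_res / var_y

# Hypothetical observed and predicted clearance values (mL/min/kg).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 8.5]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred), evs(y_true, y_pred))
# → 0.5 0.5 0.95 0.9625
```

EVS exceeds R² here because the predictions are biased on average (mean residual of -0.25); the two scores coincide only when the mean residual is zero.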

Table 7. Performance Metrics R², MAE, RMSE, and EVS scores for various data resampling methods for selected ML models.

https://doi.org/10.1371/journal.pone.0346432.t007

Case 6 (IV only – true CL_T)

The PK study design variables-based CL_T prediction model discussed in Cases 1–5 did not take into account the influence of the bioavailability of a drug through various routes of administration [48,49]. As discussed earlier, the CL_T data we have curated from the literature includes both actual CL and apparent CL (CL/F). Therefore, the hybrid route-independent ML CL_T that we have predicted so far is a combination of clearance values from both extravascular and intravascular drug administrations, thereby more closely representing CL/F. In Case 6, cross-validation scores for the various ML regression models were evaluated for datasets restricted to intravenous (IV), IV bolus, or constant rate infusion as the route of administration. Therefore, the CL_T dataset used in this case was restricted to actual/true CL values. The prediction results from the cross-validation process for the imbalanced and balanced datasets appeared promising. For the raw dataset, the highest accuracy score with a balanced dataset was around 0.865 ± 0.015, achieved by the tree-based (RF, CART) and MLP models. With an imbalanced dataset, the highest scores were 0.787 ± 0.041 (MLP) and 0.781 ± 0.040 (RF). In the ungulate subset, tree-based and LR models showed improved performance, achieving scores of 0.936 ± 0.029 and 0.863 ± 0.023, respectively, slightly outperforming MLP (0.935 ± 0.016, 0.780 ± 0.073) for balanced and imbalanced datasets. For the small-ruminant subset, MLP achieved scores of 0.886 ± 0.082 (balanced) and 0.796 ± 0.140 (imbalanced), while tree-based and LR models scored 0.899 ± 0.084 (balanced) and 0.746 ± 0.187 (imbalanced). In the companion animal subset, tree-based and LR models outperformed MLP with a score of 0.961 ± 0.004 compared to 0.954 ± 0.010. However, with an imbalanced dataset, MLP (0.904 ± 0.027) performed better than RF (0.883 ± 0.030).

After evaluating the imbalanced dataset using the cross-validation process, the ML models were independently adjusted by hyperparameter tuning (S2 File) to enhance prediction accuracy and explore different resampling techniques. Table 8 illustrates the prediction accuracy of the raw dataset (different animal categories), ungulates, small ruminants, and companion animals, for the RF models with 70:30 training:test data splitting for various resampling techniques. The low value of MAE and the prediction scores for the test dataset confirm that the model is effective even with an imbalanced dataset. Furthermore, Fig 7 (A–D) provides a comparative analysis of simultaneously sampled datasets (70:30 training:test data splitting ratio), highlighting the impact of considering the bioavailability of a drug through various routes of administration. The analysis shows results for three different groups separated with a dashed line: one where the dataset with all route forms is included (hybrid ML CL_T), the second one where the dataset with only the intravenous (IV) route form is considered (true CL), and the third one where the dataset with IV route form is excluded (non-IV CL/F).

Table 8. Performance Metrics R², MAE, RMSE, and EVS scores for various data resampling methods for Case 6 where Route = IV.

https://doi.org/10.1371/journal.pone.0346432.t008

Fig 7. Goodness-of-fit metrics of the RF model for true vs predicted values for (A) the full dataset, (B) the ungulates dataset, (C) the small ruminants dataset, and (D) the companion animals dataset.

True (actual) values are fitted with the best-fit line (test data in cyan, training data in pink), and the light and dark blue scatter points correspond to the predicted values for the test and training data, respectively. A vertical dashed line separates the outcomes for three groups: (i) datasets where all routes of administration are considered (hybrid ML CL); (ii) datasets where only the IV route is selected; and (iii) datasets with all routes except IV (non-IV). A 70:30 train:test data splitting ratio was used.

https://doi.org/10.1371/journal.pone.0346432.g007

Discussion and concluding remarks

Clearance is a critical PK parameter that is used to determine the rate of drug administration required to maintain steady-state concentrations. The present study explored the feasibility of predicting individual drug CL_T using the drug and its corresponding study design variables as predictors. The drug CL_T prediction model implemented in this work relied solely on literature data [24,27] and tested the possibility of predicting drug clearance (hybrid ML CL, true CL, and CL/F) using individual drug, dose, route of administration, and species as independent (predictor) variables in a machine learning approach. Compared with our previous MRL models [50], the CL_T dataset has some limitations, such as the lack of multiple records for similar CL data and the fact that it is not a physiologically true CL dataset but rather a combination of true CL and CL/F values. CL/F is influenced by both the drug’s clearance and its bioavailability, so it reflects the actual clearance rate poorly, especially in cases of low bioavailability. Nevertheless, the model’s R² scores showed that CL_T could be predicted efficiently from both the imbalanced and balanced datasets, which is encouraging for CL_T prediction. Similarly, the other performance metrics obtained in the study also demonstrate the efficiency of the ML models in accurately predicting the drug CL_T (hybrid ML CL, true CL) parameter from the selected features.

Compared to other classic PK and QSAR models [15,47,51,52], the CL_T data in this study is the hybrid ML CL_T because it includes both actual CL and apparent CL (CL/F) values curated from the scientific literature. This is a common attribute of clinical pharmacokinetic data, and the values can be adjusted by bioavailability if it is known in a specific application. The ML models developed in this study predict the PK hybrid ML CL_T (Cases 1–5) and actual CL vs. CL/F (Case 6) parameters based on the drug, the animal subjects, the route of administration, and the dosage data available in the literature. In addition, this method does not use molecular, structural, physiological, or physicochemical descriptors of the drug/compound; it concentrates on making predictions for individual drugs included in the analyzed dataset and assesses the impact of their study design variables [50]. The models do not allow drug-to-drug extrapolations.
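The bioavailability adjustment mentioned above follows from the standard noncompartmental relations CL/F = Dose/AUC for an extravascular dose and CL = F × (CL/F). A minimal sketch, with hypothetical numbers and function names, and with units chosen so that the result is in mL/min/kg:

```python
def apparent_clearance(dose_mg_per_kg, auc_mg_min_per_ml):
    """CL/F = Dose / AUC for an extravascular dose; result in mL/min/kg."""
    if auc_mg_min_per_ml <= 0.0:
        raise ValueError("AUC must be positive")
    return dose_mg_per_kg / auc_mg_min_per_ml

def true_clearance(apparent_cl, bioavailability):
    """Convert apparent clearance (CL/F) to true clearance: CL = F * (CL/F)."""
    if not 0.0 < bioavailability <= 1.0:
        raise ValueError("bioavailability F must be in (0, 1]")
    return apparent_cl * bioavailability

# Hypothetical oral study: 10 mg/kg dose, AUC of 0.5 mg*min/mL, F = 0.4.
cl_over_f = apparent_clearance(10.0, 0.5)  # 20 mL/min/kg
cl = true_clearance(cl_over_f, 0.4)        # about 8 mL/min/kg
cl_iv = true_clearance(cl_over_f, 1.0)     # F = 1: value is unchanged
print(cl_over_f, cl, cl_iv)
```

With F = 0.4, the apparent clearance is 2.5 times the true clearance, which illustrates why CL/F can reflect true CL poorly when bioavailability is low.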

In the current study, different ML models, including traditional SVR, k-NN, tree-based, and neural network-based models, were implemented to validate the hybrid ML CL_T and true CL prediction accuracy. For the given dataset, tree-based models outperformed all other algorithms, especially in the resampling scenarios. Both qualitative and quantitative performance measures (Table 4) confirm this interpretation. The model’s performance could potentially be improved by training it with extensive, high-quality, and comprehensive datasets covering the majority of cases. This would enable the ML model to accurately estimate the clearance values of unknown data, even if that data is incomplete or has not been previously calculated. As with the model developed for MRL prediction [50], these studies can be extended to the prediction of any PK parameter, provided enough data is available to train the model.

The performance measures reported in the Tables and Figures are for the cleaned dataset generated after the outlier removal strategy. Here, a priori threshold criteria were applied to the training dataset in accordance with recommendations from domain experts. Restricting this step to the training data addresses the data leakage concern, because it prevents the test data from influencing model fitting. Additionally, the influence of outlier removal on model performance was determined by comparing the performance metrics with and without outlier removal, which allowed the model’s sensitivity to extreme values to be assessed. The efficacy and robustness of the proposed ML model for hybrid ML CL and true CL prediction were confirmed by the comparable R² and RMSE scores in the two scenarios.
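The ordering of steps is what keeps the test data out of model fitting: split first, then apply the threshold to the training portion only. The sketch below shows that ordering; the threshold value, the records, and the 70:30 ratio applied here are illustrative assumptions, not the expert criteria used in the study.

```python
import random

# Hypothetical a priori upper limit on clearance (mL/min/kg); not the study's criterion.
CL_UPPER_LIMIT = 100.0

def split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and return (train_rows, test_rows)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def remove_outliers(train_rows, upper=CL_UPPER_LIMIT):
    """Drop training rows above the fixed threshold; test rows are never passed in."""
    return [r for r in train_rows if r["cl"] <= upper]

# Hypothetical records: 18 typical values plus two extreme ones.
rows = [{"id": i, "cl": 5.0 + 3.0 * i} for i in range(18)]
rows += [{"id": 18, "cl": 450.0}, {"id": 19, "cl": 900.0}]

train_rows, test_rows = split(rows)          # 1) split before any cleaning
train_clean = remove_outliers(train_rows)    # 2) clean the training portion only
print(len(train_rows), len(train_clean), len(test_rows))
```

Fitting once on `train_rows` and once on `train_clean`, then scoring both on the same untouched `test_rows`, gives the with/without comparison described above.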

The Food Animal Residue Avoidance and Depletion (FARAD) program, which started in 1982, assists producers, veterinarians, and allied professionals working with food-producing animals in avoiding drug residues in the food produced from these animals; avoiding residues is a PK problem. As part of the continued development of this computerized platform, we aim to generate diverse datasets and databases of PK parameters by performing automatic curation of scientific articles [27]. These databases serve as a knowledge resource summarizing the scientific literature related to PK studies and can support animal safety by limiting the need for in vivo PK procedures, allowing rapid determination of strategies to mitigate drug and chemical residues in food animals under real-time clinical use scenarios. The present model is a useful tool to assess whether newly published CL data in a specific species are within expected values and, if not, to probe what conditions might explain the divergence when applied to a specific species for which drug residue predictions are required.

Data mining methodologies are applied to scientific articles or literature to extract data for research purposes. These methods treat the articles as a data resource, enabling researchers to gather and systematically analyze research papers, identify key concepts, relationships, and trends, and ultimately leverage the collective knowledge embedded within these resources [53–56]. We employed similar approaches in data collection, utilizing data extracted from HTML and PDF formats of scientific articles [57], whereas some recent studies relied on pre-existing databases such as openFDA [58], SRS [59], and BCI, Codex Alimentarius [50] as their primary data source. The data extraction and curation approaches used in this study are comparable to those presented in the studies of Gonzalez Hernandez, F. et al. [33] and Li, et al. [60], which also rely on scientific literature for pharmacokinetic datasets due to the absence of a robust database. In contrast, the studies presented in [21] focus on the manual curation of PK data and [61] focus on the OpenAI GPT-4 models for mining PK data from the scientific literature.

Some of the challenges that we faced in the current study are primarily associated with its data collection phase. Two pre-existing FARAD databases were utilized: the first approach involved manual curation of the existing data for this analysis [32], while the second involved the extraction of manuscript data from tables through an automated procedure [24,27]. The manual data extraction step was labor-intensive and time-consuming. In the automated table data extraction procedure, we had to deal with some specific scenarios including, but not limited to, (i) compounds or groups of compounds where the program was looking only for drug names based on the ATC drug classes [25,26] and drug bank [62]; (ii) drug names, such as sultamicillin (a prodrug of sulbactam and ampicillin), that were missing from our drug database [63]; and (iii) drug names mentioned as abbreviations specific to the curated manuscript [64]. However, the data collected was sufficient to create a foundation model that can be expanded upon to enhance a strategy of reinforcement learning in predicting drug CL_T [65]. The curated dataset had clearance values in different ranges and occurrences, resulting in an imbalanced dataset. The imbalanced distribution of classes is a key concern during the training phase of ML models, as it can result in unsatisfactory models (classifiers/predictors) and inaccurate or biased predictions, as is evident from the performance metrics of the raw dataset. The imbalanced class distributions were addressed by applying resampling methods such as oversampling, undersampling, and simultaneous sampling, which demonstrated the ML model’s potential to enhance the CL_T, hybrid ML CL_T, true CL, and CL/F prediction for a reliable dataset [42].

Table 9 presents a comprehensive analysis of various studies conducted on the estimation of drug clearance. Certain studies utilized both allometric and rule-based methods to estimate and predict drug clearance, relying on statistical analyses to evaluate model accuracy and performance [13,68,70,77,78]. Our proposed CL_T prediction model outperforms recent studies whose ML models, including artificial neural networks and regression models, sacrificed prediction accuracy [19,72,75]. An advantage of the proposed model is that it is not restricted to a limited set of drugs in the development phase; it can accommodate combinations of the PK study design variables, such as drug-dosage-route_of_administration-animal, curated from the article repository [24,27]. This allows our model to be flexible enough to handle any new data records associated with CL prediction for a drug-dosage-route_of_administration-animal input combination, and it could play a vital role in ensuring that veterinary animals receive safe, effective, and individualized drug therapy, especially in minor species (e.g., goats) where PK data is limited.

Table 9. An overview of existing clearance-based prediction models. Selected studies are included according to the prediction method adopted.

https://doi.org/10.1371/journal.pone.0346432.t009

In the early phases of the study (Cases 1–5), the ML models analyzed the target CL data independently of the absorption characteristics [48,49]. For drugs given by extravascular routes of administration, bioavailability is typically lower than one, indicating incomplete absorption, whereas IV administration has, by definition, a bioavailability of 1. To account for the influence of the route of administration on the bioavailability and the CL of a drug, the later phase of the research restricted the datasets to intravenous (IV), IV bolus, or constant rate infusion routes of administration. The regression scores proved satisfactory for the CL prediction model despite the small dataset (Fig 7). This demonstrates the effectiveness of the model in predicting CL_T based on study design variables such as drug, dose, and animal.

Discussion of limitations and future directions

One of the limitations of our study is the small amount of data we used. However, the ML models were still good at predicting CL values. Because we curated the majority of the data automatically from scientific publications, the reported dose and CL values for a specific drug, route of administration, and animal might vary across studies. This could affect model predictions. Since we used automated data extraction as one of the data extraction methods, the CL values came from both IV and non-IV routes of administration. The majority of the datasets compiled in this way come from IV studies, where CL is directly measured and remains unaffected by bioavailability. For the remaining data from extravascular routes, we used CL/F as it is the standard reported in the literature. This led us to choose hybrid ML CL models instead of total CL prediction models. Unlike other PK models that use molecular descriptors and physicochemical properties, our study is based on the PK study design variables for drugs included in the dataset. While high accuracy can be achieved using drug, route of administration, dose, and animal as predictors, these factors alone lack mechanistic insight and cannot generalize across chemically diverse compounds (e.g., different drugs). Incorporating molecular descriptors could capture intrinsic structural and physicochemical properties affecting disposition, thereby potentially improving model trustworthiness and predictive reliability. To validate this aspect and progress our research, we are presently implementing advanced models that integrate ‘Chemical and Physical Properties: Computed Properties’, including Mw, LogP, and TPSA from the PubChem database [79], along with the study design variables as predictor variables. These models yielded an R² score of ~0.82 for the combined imbalanced dataset in both cross-validation and 70:30 training:test data split scenarios, showing the potential enhancement of the clearance prediction model.

The proposed CL prediction models, although preliminary due to the limited dataset, demonstrate the value of ML models in analyzing sparse PK parameter datasets. The models developed here will support future research efforts using automated techniques, including (i) analyzing PK parameters and (ii) predicting PK parameters based on variables such as the drug itself, its molecular descriptors, physicochemical properties, dose, route of administration, and species. This work is part of the ongoing FARAD program, which requires real-time estimates of drug PK parameters to calculate evidence-based withdrawal intervals (WDIs) for specific drugs in specific species. CL quantifies the rate at which a drug is eliminated from an animal’s system, directly influencing the duration of WDIs. A higher CL rate typically leads to a shorter withdrawal period, while a lower clearance rate extends the time required for drug residues to fall below regulatory limits [80]. These intervals are essential to ensure that food animal products are free from potentially harmful drug residues following extra-label drug use. Our data-mining project aims to build a comprehensive pharmaceutical database that will enhance FARAD’s capacity to provide timely and accurate residue avoidance recommendations. This objective is particularly critical given the increasing global trade of minor species across regulatory boundaries, the lack of extensive PK data in minor species, and the growing restrictions on conducting live animal studies to generate toxicological endpoints. Leveraging ML models to predict PK parameters may assist in addressing these challenges and contribute to improved food safety and public health outcomes.
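The link between CL and the time needed for concentrations to fall below a limit can be illustrated with a deliberately simplified one-compartment IV bolus model, in which the elimination rate constant is k = CL/Vd and the half-life is ln(2)/k. This is an assumption made for illustration only: actual withdrawal intervals are derived from tissue residue depletion data and regulatory methods, not from this plasma model, and all values and function names below are hypothetical.

```python
import math

def elimination_rate(cl_l_per_h_kg, vd_l_per_kg):
    """First-order elimination rate constant k = CL / Vd, in 1/h."""
    return cl_l_per_h_kg / vd_l_per_kg

def half_life(cl_l_per_h_kg, vd_l_per_kg):
    """Elimination half-life t1/2 = ln(2) / k, in hours."""
    return math.log(2.0) / elimination_rate(cl_l_per_h_kg, vd_l_per_kg)

def time_to_threshold(c0, threshold, cl_l_per_h_kg, vd_l_per_kg):
    """Hours until C(t) = C0 * exp(-k * t) falls to the threshold concentration."""
    if threshold >= c0:
        return 0.0
    return math.log(c0 / threshold) / elimination_rate(cl_l_per_h_kg, vd_l_per_kg)

# Hypothetical values: initial concentration 10 ug/mL, limit 0.1 ug/mL, Vd 1.0 L/kg.
t_slow = time_to_threshold(10.0, 0.1, cl_l_per_h_kg=0.2, vd_l_per_kg=1.0)
t_fast = time_to_threshold(10.0, 0.1, cl_l_per_h_kg=0.4, vd_l_per_kg=1.0)
print(round(t_slow, 2), round(t_fast, 2), round(half_life(0.2, 1.0), 2))
```

Under these assumptions, doubling CL at a fixed volume of distribution halves the time to reach the limit, which matches the qualitative statement that higher CL shortens the withdrawal period.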

In summary, this paper introduces an automated method for data mining that can effectively estimate CL_T for drugs in a new dataset (PK study design variables test data), and this method can also be extended to estimate drug CL that is not reported or is currently missing. Databases, such as those curated in this study, play a vital role in the development of ML algorithms that effectively estimate PK parameters, such as CL. Overall, the models in this study were efficient at accurately estimating and predicting CL_T values, achieving very low MAE and RMSE scores as well as EVS and R² scores close to 1, which is indicative of high model efficiency. The models we employed to estimate CL_T for subsets such as ungulates and small ruminants had a higher degree of predictive accuracy, which might be due to a limited range of CL_T values and narrow species selection. Although the results demonstrate that study design variables can facilitate the prediction of drug CL, the approach is constrained by its focus on existing drugs and the exclusion of comprehensive drug development factors. Moreover, while the cross-species extrapolation tool shows potential, its effectiveness is constrained by the practical difficulties of obtaining accurate CL values across different species and study contexts. Future research should therefore enhance the model to incorporate additional biologically relevant predictors such as molecular descriptors and assess the performance of the proposed cross-species extrapolation tool in early-stage compound evaluation, where empirical CL data is often scarce [81,82]. Because there was a wide range of CL_T values, our study has limitations when it comes to using the leave-one-out cross-validation technique for various species or drug groups. Additionally, the current dataset considered in the model lacks consistent metadata, such as study or publication year and laboratory origin. However, incorporating these validation schemes in future versions of the model may provide additional insight into its robustness and broader applicability to PK parameter prediction, as well as into model generalizability to entirely unseen datasets and model sensitivity to evolving experimental practices and source-specific reporting patterns.

In general, the machine-learning models implemented in this study demonstrated robust and reliable clearance prediction using curated PK literature data, despite challenges including inherent sparsity, noise, and class imbalance. Feature importance and SHAP analyses showed that predictions were primarily influenced by drug-related properties, dosage, species, and route of administration. This aligns with established PK principles and supports interpretability for medicinal chemists and DMPK scientists. The model performance was stable across several data-splitting approaches and aligned well with observed outcomes, with no systematic contradictions to known PK behavior. Although direct benchmarking against traditional allometry or IVIVE approaches was not conducted, performance was comparable to ranges reported in the literature. Future work will focus on benchmarking against mechanistic methods and on model development incorporating expanded molecular descriptors and high-quality human data to improve translational relevance and support a comprehensive One Health predictive framework.

Supporting information

S1 File. Machine Learning-Based Unified Models for Predicting Drug Clearance from Pharmacokinetic Animal and Study Design Variables.

https://doi.org/10.1371/journal.pone.0346432.s001

(HTML)

S2 File. Best parameters identified for different data sampling methods to fit the 9 ML regression models.

Random Forest (RF), Multi-Layer Perceptron (MLP), Linear Regression (LR), Ridge Regression (RIDGE), Lasso Regression (LASSO), Elastic Net (EN), K-Neighbors (k-NN), Classification and Regression Trees (CART), and Support Vector Regressor (SVR) for six different cases. Case 1: all animal categories and route forms; Case 2: ungulates and all route forms; Case 3: small ruminants and all route forms; Case 4: companion animals and all route forms for hybrid ML CL_T prediction; and Case 6: IV-only dataset, CL_T prediction [6A: all animal categories, 6B: ungulates, 6C: small ruminants, 6D: companion animals].

https://doi.org/10.1371/journal.pone.0346432.s002

(PDF)

Acknowledgments

The authors would like to acknowledge Kansas State University Olathe and FARAD for supporting this study.

References

1. Lathers CM. Challenges and opportunities in animal drug development: A regulatory perspective. Nat Rev Drug Discov. 2003;2(11):915–8. pmid:14560318
2. Toutain P-L, Ferran A, Bousquet-Mélou A. Species differences in pharmacokinetics and pharmacodynamics. Handb Exp Pharmacol. 2010;(199):19–48. pmid:20204582
3. Martinez M, Modric S. Patient variation in veterinary medicine: Part I. Influence of altered physiological states. J Vet Pharmacol Ther. 2010;33(3):213–26. pmid:20557438
4. Modric S, Martinez M. Patient variation in veterinary medicine - Part II - influence of physiological variables. J Vet Pharmacol Ther. 2011;34(3):209–23.
5. Danhof M. Kinetics of drug action in disease states: towards physiology-based pharmacodynamic (PBPD) models. J Pharmacokinet Pharmacodyn. 2015;42(5):447–62. pmid:26319673
6. Feghali M, Venkataramanan R, Caritis S. Pharmacokinetics of drugs in pregnancy. Seminars in Perinatology. 2015;39(7):512.
7. Steinberg I. Pharmacokinetics of Drugs in Pregnancy and Lactation. Cardiac Problems in Pregnancy. John Wiley & Sons, Ltd. 2019. 433–55.
8. Toutain PL, Bousquet-Mélou A. Volumes of distribution. J Vet Pharmacol Ther. 2004;27(6):441–53. pmid:15601439
9. Toutain PL, Bousquet-Mélou A. Plasma clearance. J Vet Pharmacol Ther. 2004;27(6):415–25. pmid:15601437
10. Riviere JE. Comparative pharmacokinetics: principles, techniques and applications. John Wiley & Sons; 2011.
11. Riley RJ, McGinnity DF, Austin RP. A unified model for predicting human hepatic, metabolic clearance from in vitro intrinsic clearance data in hepatocytes and microsomes. Drug Metab Dispos. 2005;33(9):1304–11. pmid:15932954
12. Visser M, Zaya MJ, Locuson CW, Boothe DM, Merritt DA. Comparison of predicted intrinsic hepatic clearance of 30 pharmaceuticals in canine and feline liver microsomes. Xenobiotica. 2018;49(2):177–86.
13. Huang Q, Gehring R, Tell LA, Li M, Riviere JE. Interspecies allometric meta-analysis of the comparative pharmacokinetics of 85 drugs across veterinary and laboratory animal species. J Vet Pharmacol Ther. 2015;38(3):214–26. pmid:25333341
14. Mahmood I, Martinez M, Hunter RP. Interspecies allometric scaling. Part I: Prediction of clearance in large animals. J Vet Pharmacol Ther. 2006;29(5):415–23. pmid:16958787
15. Tess DA, Ryu S, Di L. In Vitro - in vivo extrapolation of hepatic clearance in preclinical species. Pharm Res. 2022;39(7):1615–32. pmid:35257289
16. Lin Z, Gehring R, Mochel JP, Lavé T, Riviere JE. Mathematical modeling and simulation in animal health - Part II: Principles, methods, applications, and value of physiologically based pharmacokinetic modeling in veterinary medicine and food safety assessment. J Vet Pharmacol Ther. 2016;39(5):421–38. pmid:27086878
17. Ahmadi M, Alizadeh B, Ayyoubzadeh SM, Abiyarghamsari M. Predicting pharmacokinetics of drugs using artificial intelligence tools: A systematic review. Eur J Drug Metab Pharmacokinet. 2024;49(3):249–62. pmid:38457092
18. Iwata H, Matsuo T, Mamada H, Motomura T, Matsushita M, Fujiwara T, et al. Prediction of total drug clearance in humans using animal data: Proposal of a multimodal learning method based on deep learning. J Pharm Sci. 2021;110(4):1834–41. pmid:33497658
19. Iwata H, Matsuo T, Mamada H, Motomura T, Matsushita M, Fujiwara T, et al. Predicting total drug clearance and volumes of distribution using the machine learning-mediated multimodal method through the imputation of various nonclinical data. J Chem Inf Model. 2022;62(17):4057–65. pmid:35993595
20. Wu P-Y, Chou W-C, Wu X, Kamineni VN, Kuchimanchi Y, Tell LA, et al. Development of machine learning-based quantitative structure–activity relationship models for predicting plasma half-lives of drugs in six common food animal species. Toxicological Sciences. 2024;203(1):52–66.
21. Inauen D, Lautz LS, Hendriks AJ, Gehring R. Augmented allometric scaling: Predicting drug clearance in farm animals with machine learning using body weight. Computational Toxicology. 2025;33:100341.
22. Chou WC, Lin Z. Machine learning and artificial intelligence in physiologically based pharmacokinetic modeling. Toxicol Sci. 2023;191(1):1–14.
23. Grzegorzewski J, Brandhorst J, Green K, Eleftheriadou D, Duport Y, Barthorscht F, et al. PK-DB: pharmacokinetics database for individualized and stratified computational modeling. Nucleic Acids Res. 2021;49(D1):D1358–64. pmid:33151297
24. Ampadi Ramachandran R, Tell LA, Rai S, Millagaha Gedara NI, Xu X, Riviere JE. An automated customizable live web crawler for curation of comparative pharmacokinetic data: An intelligent compilation of research-based comprehensive article repository. Pharmaceutics. 2023;15(5):1384.
25. WHOCC - ATCvet Index. https://www.whocc.no/atcvet/atcvet_index/. Accessed 2022 October 31.
26. WHOCC - ATC/DDD Index. https://www.whocc.no/atc_ddd_index/?code=J04B&showdescription=no. Accessed 2022 August 25.
27. Ampadi Ramachandran R, Sholehrasa H, Tell L, Caragea D, Jaberi-Douraki M. Enhancing pharmacokinetic data extraction with LLMs and rule-based methods: a hybrid approach using machine learning and regex. Raleigh, NC. https://cdn.ymaws.com/www.aavpt.org/resource/resmgr/biennial_2025/proceedings_for_the_23rd_aav.pdf
28. Ampadi Ramachandran R, Tell LA, Rai S, Millagaha Gedara N, Sholehrasa H, Riviere JE, et al. Automated extraction of pharmacokinetic parameters from structured XML scientific articles: Enhancing data accessibility at scale. arXiv; 2026. https://doi.org/10.48550/arXiv.2604.21063 Accessed 2026 April 22.
29. Sholehrasa H, Ghanaatian A, Caragea D, Tell LA, Riviere JE, Jaberi-Douraki M. AutoPK: Leveraging LLMs and a hybrid similarity metric for advanced retrieval of pharmacokinetic data from complex tables and documents. In: 2025 IEEE 37th International Conference on Tools with Artificial Intelligence (ICTAI). 2025. p. 338–46. https://doi.org/10.1109/ICTAI66417.2025.00051 Accessed 2025 January 09.
30. Python 3.0 Release. https://www.python.org/download/releases/3.0/. Accessed 2023 February 6.
31. Dumont JA, Loveday KS, Light DR, Pierce GF, Jiang H. Evaluation of the toxicology and pharmacokinetics of recombinant factor VIII Fc fusion protein in animals. Thromb Res. 2015;136(6):1266–72. pmid:26514955
32. Home | FARAD. http://www.farad.org/. Accessed 2023 February 2.
33. Gonzalez Hernandez F, Nguyen Q, Smith VC, Cordero JA, Ballester MR, Duran M, et al. Named entity recognition of pharmacokinetic parameters in the scientific literature. Sci Rep. 2024;14(1):23485.
34. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research. 2011;12:2825–30.
35. Scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html. Accessed 2024 March 26.
36. Imbalanced-learn documentation — Version 0.12.0. https://imbalanced-learn.org/stable/. Accessed 2024 February 13.
37. Pipeline — Version 0.12.0. https://imbalanced-learn.org/stable/references/generated/imblearn.pipeline.Pipeline.html. Accessed 2024 February 13.
38. Feature importances with a forest of trees. https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html. Accessed 2024 March 28.
39. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 2017. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
40. Cross-validation: evaluating estimator performance. scikit-learn. https://scikit-learn.org/stable/modules/cross_validation.html. Accessed 2024 March 27.
41. Scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html. Accessed 2022 September 26.
42. User guide: contents. https://imbalanced-learn.org/stable/user_guide.html. Accessed 2023 April 21.
43. Pradeep P, Patlewicz G, Pearce R, Wambaugh J, Wetmore B, Judson R. Using chemical structure information to develop predictive models for in vitro toxicokinetic parameters to inform high-throughput risk-assessment. Comput Toxicol. 2020;16: pmid:34124416
44. Kosugi Y, Hosea N. Direct comparison of total clearance prediction: computational machine learning model versus bottom-up approach using in vitro assay. Mol Pharm. 2020;17(7):2299–309. pmid:32478525
45. Chen J, Yang H, Zhu L, Wu Z, Li W, Tang Y, et al. In silico prediction of human renal clearance of compounds using quantitative structure-pharmacokinetic relationship models. Chem Res Toxicol. 2020;33(2):640–50. pmid:31957435
46. Lombardo F, Bentzien J, Berellini G, Muegge I. Prediction of human clearance using in silico models with reduced bias. Mol Pharm. 2024;21(3):1192–203. pmid:38285644
47. Keefer CE, Chang G, Di L, Woody NA, Tess DA, Osgood SM, et al. The comparison of machine learning and mechanistic in vitro-in vivo extrapolation models for the prediction of human intrinsic clearance. Mol Pharm. 2023;20(11):5616–30. pmid:37812508
48. Benet LZ, Zia-Amirhosseini P. Basic principles of pharmacokinetics. Toxicol Pathol. 1995;23(2):115–23. pmid:7569664
49. Holford N, Yim DS. Clearance. Translational and Clinical Pharmacology. 2015;23(2):42–5.
50. Zad N, Tell LA, Ampadi Ramachandran R, Xu X, Riviere JE, Baynes R, et al. Development of machine learning algorithms to estimate maximum residue limits for veterinary medicines. Food and Chemical Toxicology. 2023;179:113920.
51. Howgate EM, Rowland Yeo K, Proctor NJ, Tucker GT, Rostami-Hodjegan A. Prediction of in vivo drug clearance from in vitro data. I: Impact of inter-individual variability. Xenobiotica. 2006;36(6):473–97.
52. Miljković F, Martinsson A, Obrezanova O, Williamson B, Johnson M, Sykes A, et al. Machine learning models for human in vivo pharmacokinetic parameters with in-house validation. Mol Pharm. 2021;18(12):4520–30. pmid:34758626
53. Smith VC, Gonzalez Hernandez F, Wattanakul T, Chotsiri P, Cordero JA, Ballester MR. An automated classification pipeline for tables in pharmacokinetic literature. Scientific Reports. 2025;15(1):10071.
54. Meta-Research: A Collection of Articles. eLife. https://elifesciences.org/collections/8d233d47/meta-research-a-collection-of-articles. 2018. Accessed 2025 March 28.
55. Himmelstein DS, Romero AR, Levernier JG, Munro TA, McLaughlin SR, Greshake Tzovaras B, et al. Sci-Hub provides access to nearly all scholarly literature. Rodgers PA, editor. eLife. 2018 Feb 9;7:e32822. https://doi.org/10.7554/eLife.32822
56. Bone A, Houck K. The benefits of data mining. Elife. 2017;6:e30280. pmid:28813246
57. Jaberi-Douraki M, Taghian Dinani S, Millagaha Gedara NI, Xu X, Richards E, Maunsell F. Large-scale data mining of rapid residue detection assay data from HTML and PDF documents: improving data access and visualization for veterinarians. Frontiers in Veterinary Science. 2021;8:674730.
58. Jaberi-Douraki M, Xu X, Dima D, Ailawadhi S, Anwer F, Mazzoni S. Global disparities in drug-related adverse events of patients with multiple myeloma: A pharmacovigilance study. Blood Cancer J. 2024;14(1):1–10.
59. Xu X, Riviere JE, Raza S, Millagaha Gedara NI, Ampadi Ramachandran R, Tell LA, et al. In-silico approaches to assessing multiple high-level drug-drug and drug-disease adverse drug effects. Expert Opin Drug Metab Toxicol. 2024;20(7):579–92. pmid:38299552
60. Li Y, Shen Y, Cai Y, Zhang Y, Gao J, Huang L, et al. Integrating transcriptomic data with a novel drug efficacy prediction model for TCM active compound discovery. Sci Rep. 2025;15(1):7688. pmid:40044718
61. Niu Z, Xiao X, Wu W, Cai Q, Jiang Y, Jin W, et al. PharmaBench: Enhancing ADMET benchmarks with large language models. Sci Data. 2024;11(1):985. pmid:39256394
62. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: A comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668-72. pmid:16381955
63. Lode H. Role of sultamicillin and ampicillin/sulbactam in the treatment of upper and lower bacterial respiratory tract infections. Int J Antimicrob Agents. 2001;18(3):199–209. pmid:11673031
64. Hoffman A, Stepensky D, Ezra A, Van Gelder JM, Golomb G. Mode of administration-dependent pharmacokinetics of bisphosphonates and bioavailability determination. Int J Pharm. 2001;220(1–2):1–11. pmid:11376962
65. Pyqlearning: Pyqlearning is Python library to implement reinforcement learning and deep reinforcement learning, especially for Q-learning, deep Q-network, and multi-agent deep Q-network which can be optimized by annealing models such as simulated annealing, adaptive simulated annealing, and quantum Monte Carlo method. https://github.com/accel-brain/accel-brain-code/tree/master/Reinforcement-Learning. Accessed 2023 September 19.
66. Ito K, Houston JB. Comparison of the use of liver models for predicting drug clearance using in vitro kinetic data from hepatic microsomes and isolated hepatocytes. Pharm Res. 2004;21(5):785–92. pmid:15180335
67. Ito K, Houston JB. Prediction of human drug clearance from in vitro and preclinical data using physiologically based and empirical approaches. Pharm Res. 2005;22(1):103–12.
68. Tang H, Mayersohn M. A novel model for prediction of human drug clearance by allometric scaling. Drug Metabolism and Disposition. 2005;33(9):1297–303.
69. Tang H, Hussain A, Leal M, Mayersohn M, Fluhler E. Interspecies prediction of human drug clearance based on scaling data from one or two animal species. Drug Metab Dispos. 2007;35(10):1886–93.
70. Mahmood I. Prediction of human drug clearance from animal data: application of the rule of exponents and “fu Corrected Intercept Method” (FCIM). J Pharm Sci. 2006;95(8):1810–21. pmid:16795002
71. Goteti K, Brassil PJ, Good SS, Garner CE. Estimation of human drug clearance using multiexponential techniques. J Clin Pharmacol. 2008;48(10):1226–36. pmid:18559487
72. Paixão P, Gouveia LF, Morais JAG. Prediction of the in vitro intrinsic clearance determined in suspensions of human hepatocytes by using artificial neural networks. European Journal of Pharmaceutical Sciences. 2010;39(5):310–21.
73. Varma MV, Steyn SJ, Allerton C, El-Kattan AF. Predicting clearance mechanism in drug discovery: extended clearance classification system (ECCS). Pharm Res. 2015;32(12):3785–802.
74. Zhuang X, Lu C. PBPK modeling and simulation in drug research and development. Acta Pharm Sin B. 2016;6(5):430–40. pmid:27909650
75. Wang Y, Liu H, Fan Y, Chen X, Yang Y, Zhu L. In silico prediction of human intravenous pharmacokinetic parameters with improved accuracy. J Chem Inf Model. 2019;59(9):3968–80.
76. Yoshida K, Doi Y, Iwazaki N, Yasuhara H, Ikenaga Y, Shimizu H, et al. Prediction of human pharmacokinetics for low-clearance compounds using pharmacokinetic data from chimeric mice with humanized livers. Clin Transl Sci. 2022;15(1):79–91. pmid:34080287
77. Proctor NJ, Tucker GT, Rostami-Hodjegan A. Predicting drug clearance from recombinantly expressed CYPs: intersystem extrapolation factors. Xenobiotica. 2004;34(2):151–78.
78. Huang Q, Riviere JE. The application of allometric scaling principles to predict pharmacokinetic parameters across species. Expert Opin Drug Metab Toxicol. 2014;10(9):1241–53. pmid:24984569
79. PubChem. PubChem. https://pubchem.ncbi.nlm.nih.gov/. Accessed 2025 October 2.
80. Riviere JE, Papich MG. Veterinary pharmacology and therapeutics. John Wiley & Sons; 2018. https://books.google.com/books?hl=en&lr=&id=rqVFDwAAQBAJ&oi=fnd&pg=PT4&dq=Veterinary+Pharmacokinetics+Jim+E.+Riviere+and+Mark+G.+Papich&ots=0SiB3BxNXJ&sig=WyezYyCc4TbSfh80ncgZBkBBmyk
81. Beaumont K, Gosset JR, Keefer CE. Integrated assessment of drug clearance and cross-species scalability. Predictive ADMET. John Wiley & Sons, Ltd. 2014. 291–318.
82. Thiel C, Hofmann U, Ghallab A, Gebhardt R, Hengstler JG, Kuepfer L. Towards knowledge-driven cross-species extrapolation. Drug Discovery Today: Disease Models. 2016;22:21–6.

[ref1] 1. Lathers CM. Challenges and opportunities in animal drug development: A regulatory perspective. Nat Rev Drug Discov. 2003;2(11):915–8. pmid:14560318
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Toutain P-L, Ferran A, Bousquet-Mélou A. Species differences in pharmacokinetics and pharmacodynamics. Handb Exp Pharmacol. 2010;(199):19–48. pmid:20204582
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Martinez M, Modric S. Patient variation in veterinary medicine: Part I. Influence of altered physiological states. J Vet Pharmacol Ther. 2010;33(3):213–26. pmid:20557438
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Modric S, Martinez M. Patient variation in veterinary medicine - Part II - influence of physiological variables: Variation in veterinary medicine part II. Journal of Veterinary Pharmacology and Therapeutics. 2011;34(3):209–23.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref5] 5. Danhof M. Kinetics of drug action in disease states: towards physiology-based pharmacodynamic (PBPD) models. J Pharmacokinet Pharmacodyn. 2015;42(5):447–62. pmid:26319673
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref6] 6. Feghali M, Venkataramanan R, Caritis S. Pharmacokinetics of drugs in pregnancy. Seminars in Perinatology. 2015;39(7):512.
View Article
Google Scholar

[21] View Article

[22] Google Scholar

[ref7] 7. Steinberg I. Pharmacokinetics of Drugs in Pregnancy and Lactation. Cardiac Problems in Pregnancy. John Wiley & Sons, Ltd. 2019. 433–55.

[ref8] 8. Toutain PL, Bousquet-Mélou A. Volumes of distribution. J Vet Pharmacol Ther. 2004;27(6):441–53. pmid:15601439
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref9] 9. Toutain PL, Bousquet-Mélou A. Plasma clearance. J Vet Pharmacol Ther. 2004;27(6):415–25. pmid:15601437
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref10] 10. Riviere JE. Comparative pharmacokinetics: principles, techniques and applications. John Wiley & Sons; 2011.

[ref11] 11. Riley RJ, McGinnity DF, Austin RP. A unified model for predicting human hepatic, metabolic clearance from in vitro intrinsic clearance data in hepatocytes and microsomes. Drug Metab Dispos. 2005;33(9):1304–11. pmid:15932954
View Article
PubMed/NCBI
Google Scholar

[34] View Article

[35] PubMed/NCBI

[36] Google Scholar

[ref12] 12. Visser M, Zaya MJ, Locuson CW, Boothe DM, Merritt DA. Comparison of predicted intrinsic hepatic clearance of 30 pharmaceuticals in canine and feline liver microsomes. Xenobiotica. 2018;49(2):177–86.
View Article
Google Scholar

[38] View Article

[39] Google Scholar

[ref13] 13. Huang Q, Gehring R, Tell LA, Li M, Riviere JE. Interspecies allometric meta-analysis of the comparative pharmacokinetics of 85 drugs across veterinary and laboratory animal species. J Vet Pharmacol Ther. 2015;38(3):214–26. pmid:25333341
View Article
PubMed/NCBI
Google Scholar

[41] View Article

[42] PubMed/NCBI

[43] Google Scholar

[ref14] 14. Mahmood I, Martinez M, Hunter RP. Interspecies allometric scaling. Part I: Prediction of clearance in large animals. J Vet Pharmacol Ther. 2006;29(5):415–23. pmid:16958787
View Article
PubMed/NCBI
Google Scholar

[45] View Article

[46] PubMed/NCBI

[47] Google Scholar

[ref15] 15. Tess DA, Ryu S, Di L. In Vitro - in vivo extrapolation of hepatic clearance in preclinical species. Pharm Res. 2022;39(7):1615–32. pmid:35257289
View Article
PubMed/NCBI
Google Scholar

[49] View Article

[50] PubMed/NCBI

[51] Google Scholar

[ref16] 16. Lin Z, Gehring R, Mochel JP, Lavé T, Riviere JE. Mathematical modeling and simulation in animal health - Part II: Principles, methods, applications, and value of physiologically based pharmacokinetic modeling in veterinary medicine and food safety assessment. J Vet Pharmacol Ther. 2016;39(5):421–38. pmid:27086878
View Article
PubMed/NCBI
Google Scholar

[53] View Article

[54] PubMed/NCBI

[55] Google Scholar

[ref17] 17. Ahmadi M, Alizadeh B, Ayyoubzadeh SM, Abiyarghamsari M. Predicting pharmacokinetics of drugs using artificial intelligence tools: A systematic review. Eur J Drug Metab Pharmacokinet. 2024;49(3):249–62. pmid:38457092
View Article
PubMed/NCBI
Google Scholar

[57] View Article

[58] PubMed/NCBI

[59] Google Scholar

[ref18] 18. Iwata H, Matsuo T, Mamada H, Motomura T, Matsushita M, Fujiwara T, et al. Prediction of total drug clearance in humans using animal data: Proposal of a multimodal learning method based on deep learning. J Pharm Sci. 2021;110(4):1834–41. pmid:33497658
View Article
PubMed/NCBI
Google Scholar

[61] View Article

[62] PubMed/NCBI

[63] Google Scholar

[ref19] 19. Iwata H, Matsuo T, Mamada H, Motomura T, Matsushita M, Fujiwara T, et al. Predicting total drug clearance and volumes of distribution using the machine learning-mediated multimodal method through the imputation of various nonclinical data. J Chem Inf Model. 2022;62(17):4057–65. pmid:35993595
View Article
PubMed/NCBI
Google Scholar

[65] View Article

[66] PubMed/NCBI

[67] Google Scholar

[ref20] 20. Wu P-Y, Chou W-C, Wu X, Kamineni VN, Kuchimanchi Y, Tell LA, et al. Development of machine learning-based quantitative structure–activity relationship models for predicting plasma half-lives of drugs in six common food animal species. Toxicological Sciences. 2024;203(1):52–66.
View Article
Google Scholar

[69] View Article

[70] Google Scholar

[ref21] 21. Inauen D, Lautz LS, Hendriks AJ, Gehring R. Augmented allometric scaling: Predicting drug clearance in farm animals with machine learning using body weight. Computational Toxicology. 2025;33:100341.
View Article
Google Scholar

[72] View Article

[73] Google Scholar

[ref22] 22. Chou WC, Lin Z. Machine learning and artificial intelligence in physiologically based pharmacokinetic modeling. Toxicol Sci. 2023;191(1):1–14.
View Article
Google Scholar

[75] View Article

[76] Google Scholar

[ref23] 23. Grzegorzewski J, Brandhorst J, Green K, Eleftheriadou D, Duport Y, Barthorscht F, et al. PK-DB: pharmacokinetics database for individualized and stratified computational modeling. Nucleic Acids Res. 2021;49(D1):D1358–64. pmid:33151297
View Article
PubMed/NCBI
Google Scholar

[78] View Article

[79] PubMed/NCBI

[80] Google Scholar

[ref24] 24. Ampadi Ramachandran R, Tell LA, Rai S, Millagaha Gedara NI, Xu X, Riviere JE. An automated customizable live web crawler for curation of comparative pharmacokinetic data: An intelligent compilation of research-based comprehensive article repository. Pharmaceutics. 2023;15(5):1384.
View Article
Google Scholar

[82] View Article

[83] Google Scholar

[ref25] 25. WHOCC - ATCvet Index. https://www.whocc.no/atcvet/atcvet_index/. Accessed 2022 October 31.

[ref26] 26. WHOCC - ATC/DDD Index. https://www.whocc.no/atc_ddd_index/?code=J04B&showdescription=no. Accessed 2022 August 25.

[ref27] 27. Ampadi Ramachandran R, Sholehrasa H, Tell L, Caragea D, Jaberi-Douraki M. Enhancing pharmacokinetic data extraction with LLMs and rule-based methods: a hybrid approach using machine learning and regex. Raleigh, NC. https://cdn.ymaws.com/www.aavpt.org/resource/resmgr/biennial_2025/proceedings_for_the_23rd_aav.pdf
View Article
Google Scholar

[87] View Article

[88] Google Scholar

[ref28] 28. Ampadi Ramachandran R, Tell LA, Rai S, Millagaha Gedara N, Sholehrasa H, Riviere JE, etal. Automated extraction of pharmacokinetic parameters from structured XML scientific articles: Enhancing data accessibility at scale. arXiv; 2026. https://doi.org/10.48550/arXiv.2604.21063 Accessed 2026 April 22.

[ref29] 29. Sholehrasa H, Ghanaatian A, Caragea D, Tell LA, Riviere JE, Jaberi-Douraki M. AutoPK: Leveraging LLMs and a hybrid similarity metric for advanced retrieval of pharmacokinetic data from complex tables and documents. In: 2025 IEEE 37th International Conference on Tools with Artificial Intelligence (ICTAI) . 2025. p. 338–46. https://doi.org/10.1109/ICTAI66417.2025.00051 Accessed 2025 January 09.

[ref30] 30. Python 3.0 Release. https://www.python.org/download/releases/3.0/. Accessed 2023 February 6.

[ref31] 31. Dumont JA, Loveday KS, Light DR, Pierce GF, Jiang H. Evaluation of the toxicology and pharmacokinetics of recombinant factor VIII Fc fusion protein in animals. Thromb Res. 2015;136(6):1266–72. pmid:26514955
View Article
PubMed/NCBI
Google Scholar

[93] View Article

[94] PubMed/NCBI

[95] Google Scholar

[ref32] 32. Home | FARAD. http://www.farad.org/. Accessed 2023 February 2.

[ref33] 33. Gonzalez Hernandez F, Nguyen Q, Smith VC, Cordero JA, Ballester MR, Duran M, et al. Named entity recognition of pharmacokinetic parameters in the scientific literature. Sci Rep. 2024;14(1):23485.
View Article
Google Scholar

[98] View Article

[99] Google Scholar

[ref34] 34. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research. 2011;12:2825–30.
View Article
Google Scholar

[101] View Article

[102] Google Scholar

[ref35] 35. Scikit-learn. https://scikit-learn/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html. Accessed 2024 March 26.

[ref36] 36. Imbalanced-learn documentation — Version 0.12.0. https://imbalanced-learn.org/stable/. Accessed 2024 February 13.

[ref37] 37. Pipeline — Version 0.12.0. https://imbalanced-learn.org/stable/references/generated/imblearn.pipeline.Pipeline.html. Accessed 2024 February 13.

[ref38] 38. Feature importances with a forest of trees. https://scikit-learn/stable/auto_examples/ensemble/plot_forest_importances.html. Accessed 2024 March 28.

[ref39] 39. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 2017. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
View Article
Google Scholar

[108] View Article

[109] Google Scholar

[ref40] 40. Cross-validation: evaluating estimator performance. scikit-learn. https://scikit-learn/stable/modules/cross_validation.html. Accessed 2024 March 27.

[ref41] 41. Scikit-learn. https://scikit-learn/stable/modules/generated/sklearn.metrics.r2_score.html. Accessed 2022 September 26.

[ref42] 42. User guide: contents. https://imbalanced-learn.org/stable/user_guide.html. Accessed 2023 April 21.

[ref43] 43. Pradeep P, Patlewicz G, Pearce R, Wambaugh J, Wetmore B, Judson R. Using chemical structure information to develop predictive models for in vitro toxicokinetic parameters to inform high-throughput risk-assessment. Comput Toxicol. 2020;16: pmid:34124416
View Article
PubMed/NCBI
Google Scholar

[114] View Article

[115] PubMed/NCBI

[116] Google Scholar

[ref44] 44. Kosugi Y, Hosea N. Direct comparison of total clearance prediction: computational machine learning model versus bottom-up approach using in vitro assay. Mol Pharm. 2020;17(7):2299–309. pmid:32478525
View Article
PubMed/NCBI
Google Scholar

[118] View Article

[119] PubMed/NCBI

[120] Google Scholar

[ref45] 45. Chen J, Yang H, Zhu L, Wu Z, Li W, Tang Y, et al. In silico prediction of human renal clearance of compounds using quantitative structure-pharmacokinetic relationship models. Chem Res Toxicol. 2020;33(2):640–50. pmid:31957435
View Article
PubMed/NCBI
Google Scholar

[122] View Article

[123] PubMed/NCBI

[124] Google Scholar

[ref46] 46. Lombardo F, Bentzien J, Berellini G, Muegge I. Prediction of human clearance using in silico models with reduced bias. Mol Pharm. 2024;21(3):1192–203. pmid:38285644
View Article
PubMed/NCBI
Google Scholar

[126] View Article

[127] PubMed/NCBI

[128] Google Scholar

[ref47] 47. Keefer CE, Chang G, Di L, Woody NA, Tess DA, Osgood SM, et al. The comparison of machine learning and mechanistic in vitro-in vivo extrapolation models for the prediction of human intrinsic clearance. Mol Pharm. 2023;20(11):5616–30. pmid:37812508
View Article
PubMed/NCBI
Google Scholar

[130] View Article

[131] PubMed/NCBI

[132] Google Scholar

[ref48] 48. Benet LZ, Zia-Amirhosseini P. Basic principles of pharmacokinetics. Toxicol Pathol. 1995;23(2):115–23. pmid:7569664
View Article
PubMed/NCBI
Google Scholar

[134] View Article

[135] PubMed/NCBI

[136] Google Scholar

[ref49] 49. Holford N, Yim DS. Clearance. Translational and Clinical Pharmacology. 2015;23(2):42–5.
View Article
Google Scholar

[138] View Article

[139] Google Scholar

[ref50] 50. Zad N, Tell LA, Ampadi Ramachandran R, Xu X, Riviere JE, Baynes R, et al. Development of machine learning algorithms to estimate maximum residue limits for veterinary medicines. Food and Chemical Toxicology. 2023;179:113920.
View Article
Google Scholar

[141] View Article

[142] Google Scholar

[ref51] 51. Howgate EM, Rowland Yeo K, Proctor NJ, Tucker GT, Rostami-Hodjegan A. Prediction ofin vivodrug clearance fromin vitrodata. I: Impact of inter-individual variability. Xenobiotica. 2006;36(6):473–97.
View Article
Google Scholar

[144] View Article

[145] Google Scholar

[ref52] 52. Miljković F, Martinsson A, Obrezanova O, Williamson B, Johnson M, Sykes A, et al. Machine learning models for human in vivo pharmacokinetic parameters with in-house validation. Mol Pharm. 2021;18(12):4520–30. pmid:34758626
View Article
PubMed/NCBI
Google Scholar

[147] View Article

[148] PubMed/NCBI

[149] Google Scholar

[ref53] 53. Smith VC, Gonzalez Hernandez F, Wattanakul T, Chotsiri P, Cordero JA, Ballester MR. An automated classification pipeline for tables in pharmacokinetic literature. Scientific Reports. 2025;15(1):10071.
View Article
Google Scholar

[151] View Article

[152] Google Scholar

[ref54] 54. Meta-Research: A Collection of Articles. eLife. https://elifesciences.org/collections/8d233d47/meta-research-a-collection-of-articles. 2018. Accessed 2025 March 28.

[ref55] 55. Himmelstein DS, Romero AR, Levernier JG, Munro TA, McLaughlin SR, Greshake Tzovaras B, et al. Sci-Hub provides access to nearly all scholarly literature. Rodgers PA, editor. eLife. 2018 Feb 9;7:e32822. https://doi.org/10.7554/eLife.32822

[ref56] 56. Bone A, Houck K. The benefits of data mining. Elife. 2017;6:e30280. pmid:28813246
View Article
PubMed/NCBI
Google Scholar

[156] View Article

[157] PubMed/NCBI

[158] Google Scholar

[ref57] 57. Jaberi-Douraki M, Taghian Dinani S, Millagaha Gedara NI, Xu X, Richards E, Maunsell F. Large-scale data mining of rapid residue detection assay data from HTML and PDF documents: improving data access and visualization for veterinarians. Frontiers in Veterinary Science. 2021;8:674730.

[ref58] 58. Jaberi-Douraki M, Xu X, Dima D, Ailawadhi S, Anwer F, Mazzoni S. Global disparities in drug-related adverse events of patients with multiple myeloma: A pharmacovigilance study. Blood Cancer J. 2024;14(1):1–10.

[ref59] 59. Xu X, Riviere JE, Raza S, Millagaha Gedara NI, Ampadi Ramachandran R, Tell LA, et al. In-silico approaches to assessing multiple high-level drug-drug and drug-disease adverse drug effects. Expert Opin Drug Metab Toxicol. 2024;20(7):579–92. pmid:38299552

[ref60] 60. Li Y, Shen Y, Cai Y, Zhang Y, Gao J, Huang L, et al. Integrating transcriptomic data with a novel drug efficacy prediction model for TCM active compound discovery. Sci Rep. 2025;15(1):7688. pmid:40044718

[ref61] 61. Niu Z, Xiao X, Wu W, Cai Q, Jiang Y, Jin W, et al. PharmaBench: Enhancing ADMET benchmarks with large language models. Sci Data. 2024;11(1):985. pmid:39256394

[ref62] 62. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: A comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668-72. pmid:16381955

[ref63] 63. Lode H. Role of sultamicillin and ampicillin/sulbactam in the treatment of upper and lower bacterial respiratory tract infections. Int J Antimicrob Agents. 2001;18(3):199–209. pmid:11673031

[ref64] 64. Hoffman A, Stepensky D, Ezra A, Van Gelder JM, Golomb G. Mode of administration-dependent pharmacokinetics of bisphosphonates and bioavailability determination. Int J Pharm. 2001;220(1–2):1–11. pmid:11376962

[ref65] 65. Pyqlearning: Pyqlearning is Python library to implement reinforcement learning and deep reinforcement learning, especially for Q-learning, deep Q-network, and multi-agent deep Q-network which can be optimized by annealing models such as simulated annealing, adaptive simulated annealing, and quantum Monte Carlo method. https://github.com/accel-brain/accel-brain-code/tree/master/Reinforcement-Learning. Accessed 2023 September 19.

[ref66] 66. Ito K, Houston JB. Comparison of the use of liver models for predicting drug clearance using in vitro kinetic data from hepatic microsomes and isolated hepatocytes. Pharm Res. 2004;21(5):785–92. pmid:15180335

[ref67] 67. Ito K, Houston JB. Prediction of human drug clearance from in vitro and preclinical data using physiologically based and empirical approaches. Pharm Res. 2005;22(1):103–12.

[ref68] 68. Tang H, Mayersohn M. A novel model for prediction of human drug clearance by allometric scaling. Drug Metabolism and Disposition. 2005;33(9):1297–303.

[ref69] 69. Tang H, Hussain A, Leal M, Mayersohn M, Fluhler E. Interspecies prediction of human drug clearance based on scaling data from one or two animal species. Drug Metab Dispos. 2007;35(10):1886–93.

[ref70] 70. Mahmood I. Prediction of human drug clearance from animal data: application of the rule of exponents and “fu Corrected Intercept Method” (FCIM). J Pharm Sci. 2006;95(8):1810–21. pmid:16795002

[ref71] 71. Goteti K, Brassil PJ, Good SS, Garner CE. Estimation of human drug clearance using multiexponential techniques. J Clin Pharmacol. 2008;48(10):1226–36. pmid:18559487

[ref72] 72. Paixão P, Gouveia LF, Morais JAG. Prediction of the in vitro intrinsic clearance determined in suspensions of human hepatocytes by using artificial neural networks. European Journal of Pharmaceutical Sciences. 2010;39(5):310–21.

[ref73] 73. Varma MV, Steyn SJ, Allerton C, El-Kattan AF. Predicting clearance mechanism in drug discovery: extended clearance classification system (ECCS). Pharm Res. 2015;32(12):3785–802.

[ref74] 74. Zhuang X, Lu C. PBPK modeling and simulation in drug research and development. Acta Pharm Sin B. 2016;6(5):430–40. pmid:27909650

[ref75] 75. Wang Y, Liu H, Fan Y, Chen X, Yang Y, Zhu L. In silico prediction of human intravenous pharmacokinetic parameters with improved accuracy. J Chem Inf Model. 2019;59(9):3968–80.

[ref76] 76. Yoshida K, Doi Y, Iwazaki N, Yasuhara H, Ikenaga Y, Shimizu H, et al. Prediction of human pharmacokinetics for low-clearance compounds using pharmacokinetic data from chimeric mice with humanized livers. Clin Transl Sci. 2022;15(1):79–91. pmid:34080287

[ref77] 77. Proctor NJ, Tucker GT, Rostami-Hodjegan A. Predicting drug clearance from recombinantly expressed CYPs: intersystem extrapolation factors. Xenobiotica. 2004;34(2):151–78.

[ref78] 78. Huang Q, Riviere JE. The application of allometric scaling principles to predict pharmacokinetic parameters across species. Expert Opin Drug Metab Toxicol. 2014;10(9):1241–53. pmid:24984569

[ref79] 79. PubChem. PubChem. https://pubchem.ncbi.nlm.nih.gov/. Accessed 2025 October 2.

[ref80] 80. Riviere JE, Papich MG. Veterinary pharmacology and therapeutics. John Wiley & Sons; 2018. https://books.google.com/books?hl=en&lr=&id=rqVFDwAAQBAJ&oi=fnd&pg=PT4&dq=Veterinary+Pharmacokinetics+Jim+E.+Riviere+and+Mark+G.+Papich&ots=0SiB3BxNXJ&sig=WyezYyCc4TbSfh80ncgZBkBBmyk

[ref81] 81. Beaumont K, Gosset JR, Keefer CE. Integrated assessment of drug clearance and cross-species scalability. Predictive ADMET. John Wiley & Sons, Ltd. 2014. 291–318.

[ref82] 82. Thiel C, Hofmann U, Ghallab A, Gebhardt R, Hengstler JG, Kuepfer L. Towards knowledge-driven cross-species extrapolation. Drug Discovery Today: Disease Models. 2016;22:21–6.

