Fig 1.
An overview of our computational-experimental framework for prediction and pre-clinical testing of compound-protein bioactivity profiles.
Two separate prediction problems are considered: (1) filling the gaps in existing compound-target interaction maps and (2) predicting target interactions for a new or investigational compound. Molecular descriptors of drug compounds and protein targets are encoded as kernels and used for binding affinity prediction with KronRLS, a Kronecker regularized least-squares regression model. Finally, a subset of the predicted compound-protein bioactivities is tested experimentally (see Materials and Methods for details). Because the experimental validation data do not exist at the time the predictions are made, this approach effectively assesses any potential overfitting of the model to the training data. We chose kernel-based models because they are well suited for representing structured objects, such as molecules, that cannot be accurately described by a standard feature vector. Different types of drug and protein kernels can be calculated from readily available chemical structures and amino acid sequences. The resulting kernel matrices contain a value for every pair of input objects, so a kernel function can be interpreted as a similarity measure.
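The KronRLS step can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's implementation: it assumes a fully observed training affinity matrix and uses a Gaussian kernel on random descriptor vectors as a hypothetical stand-in for the drug and protein kernels; all function names are ours. It relies on the standard eigendecomposition trick for Kronecker-product kernels, so the pairwise kernel matrix K_p ⊗ K_d is never built explicitly.

```python
import numpy as np


def kronrls_fit(K_d, K_p, Y, lam=1.0):
    """Dual coefficients A (m x n) of a Kronecker RLS model.

    Solves K_d @ A @ K_p + lam * A = Y, i.e. the linear system
    (K_p kron K_d + lam * I) vec(A) = vec(Y) with column-major vec,
    without forming the (m*n x m*n) Kronecker product.
    Assumes Y is complete (no missing drug-target entries).
    """
    ev_d, U_d = np.linalg.eigh(K_d)  # K_d = U_d diag(ev_d) U_d^T
    ev_p, U_p = np.linalg.eigh(K_p)  # K_p = U_p diag(ev_p) U_p^T
    Y_rot = U_d.T @ Y @ U_p          # labels expressed in the joint eigenbasis
    A_rot = Y_rot / (np.outer(ev_d, ev_p) + lam)
    return U_d @ A_rot @ U_p.T


def kronrls_predict(A, k_d, k_p):
    """Score one (drug, protein) pair from its kernel rows against the training set."""
    return k_d @ A @ k_p


def gaussian_kernel(X, gamma=0.5):
    """Hypothetical placeholder similarity kernel on descriptor vectors."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


# Toy data: 5 hypothetical drugs, 4 hypothetical kinases, random "pKi" labels.
rng = np.random.default_rng(0)
K_d = gaussian_kernel(rng.normal(size=(5, 3)))
K_p = gaussian_kernel(rng.normal(size=(4, 2)))
Y = rng.normal(loc=6.0, size=(5, 4))
A = kronrls_fit(K_d, K_p, Y, lam=0.1)
print(kronrls_predict(A, K_d[0], K_p[0]))  # fitted value for training pair (0, 0)
```

Because a prediction needs only the kernel rows k_d and k_p, a compound absent from the training labels can still be scored as long as its similarity to the training compounds can be computed from its chemical structure.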
Fig 2.
Drug-protein interaction prediction scenarios.
(dx, px) denotes a query drug-protein pair whose binding affinity is to be predicted. (a) The Bioactivity Imputation scenario: both the drug dx and protein px are present in the training set, i.e., there exist known bioactivity values for the drug dx and protein px, but not for their interaction (dx, px). (b) The New Drug scenario: the protein px is present in the training set, whereas the drug dx is not, i.e., there exist known bioactivity values for the protein px but not for the drug dx. (c) The New Target scenario: the drug dx is present in the training set, whereas the protein px is not, i.e., there exist known bioactivity values for the drug dx, but not for the protein px. (d) The New Drug-Target Pair scenario: neither the drug dx nor the protein px is present in the training set, i.e., there exist no bioactivity values for either the drug dx or the protein px. In this work, we focused primarily on the two most common and practical prediction scenarios, (a) and (b), which correspond to filling the gaps in existing experimentally measured drug-target interaction maps and predicting target interactions for an investigational drug compound, respectively.
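The four scenarios differ only in which side of the query pair has training data. A hypothetical helper that assigns a query pair to a scenario, given the set of (drug, protein) pairs with known bioactivities, could look like this (function and scenario names are illustrative, and the query pair itself is assumed not to be among the training pairs):

```python
def prediction_scenario(query, train_pairs):
    """Classify a query (drug, protein) pair into scenario (a)-(d)."""
    d_x, p_x = query
    train_drugs = {d for d, _ in train_pairs}
    train_proteins = {p for _, p in train_pairs}
    drug_known = d_x in train_drugs
    protein_known = p_x in train_proteins
    if drug_known and protein_known:
        return "bioactivity_imputation"  # (a)
    if protein_known:
        return "new_drug"                # (b)
    if drug_known:
        return "new_target"              # (c)
    return "new_drug_target_pair"        # (d)


train = {("drug_A", "KDR"), ("drug_B", "FLT1")}  # hypothetical training pairs
print(prediction_scenario(("drug_C", "KDR"), train))
```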
Table 1.
Applied nested cross-validation strategies.
Fig 3.
Computational evaluation of the model predictions.
(a) Leave-one-out and (b) leave-drug-out cross-validation results. Prediction accuracy was evaluated with the Pearson correlation (r) between binding affinities (pKi) from the study by Metz et al. [3] and those predicted by the KronRLS algorithm with different pairs of compound (rows) and protein (columns) molecular descriptors encoded as kernel matrices (c). The corresponding root mean squared error (RMSE) values are shown in S1 Fig. Of note, the Gaussian interaction profile drug kernel (KD-GIP), which yielded the highest predictive performance under the Bioactivity Imputation scenario (a), was not evaluated under the New Drug scenario (b), because it is constructed from the bioactivity profile of the very drug being predicted, that is, from information that in practice is unavailable when predicting target interactions for a new investigational drug compound.
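Why KD-GIP cannot be used for a new drug follows from its construction. The sketch below assumes the commonly used definition K(d_i, d_j) = exp(−γ‖y_i − y_j‖²) with γ = γ′ / mean_i ‖y_i‖², where y_i is drug i's row of a complete (or pre-filled) bioactivity matrix; the function name, γ′ = 1, and the toy data are our assumptions. Every kernel entry for drug i reads drug i's own bioactivity row, which is exactly what is held out in leave-drug-out cross-validation.

```python
import numpy as np


def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of Y (one row per drug)."""
    sq_norms = np.sum(Y**2, axis=1)
    gamma = gamma_prime / np.mean(sq_norms)  # bandwidth normalized by mean profile norm
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


rng = np.random.default_rng(1)
Y = rng.uniform(4.0, 9.0, size=(6, 10))  # hypothetical pKi profiles: 6 drugs x 10 kinases
K_gip = gip_kernel(Y)
# Under leave-drug-out CV, drug 0's row of Y is hidden, so K_gip[0] cannot be computed.
```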
Fig 4.
Comparison between computationally-predicted and experimentally-measured bioactivities.
(a) Scatter plot of predicted versus measured bioactivity values for 100 compound-kinase pairs (detailed in S2 Table). r indicates Pearson correlation. The orange crosses correspond to compound-kinase pairs tested in the study by Metz et al. but randomly withheld by us from model training, forming an additional validation set. When no clear interaction between compound and kinase was observed in our experimental assay, the pIC50 value was set to 4.9 M, corresponding to the highest drug concentration used in our screen (12,500 nM). The higher the pKi/pIC50 value, the stronger the affinity between the two molecules. Red lines mark a relatively stringent interaction threshold (7 M), so that the top left corner contains false positive interaction predictions and the bottom right corner contains false negative predictions. (b) Receiver operating characteristic (ROC) curves showing model performance as a function of the activity threshold. We applied 11 different interaction threshold values from the pIC50 interval [6 M, 8 M] to binarize the experimentally measured bioactivities into true class labels, and then determined how accurately the model discriminates between interacting and non-interacting compound-kinase pairs. The average area under the ROC curves (AUC) is 0.970.
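The pIC50 capping value and the threshold sweep in panel (b) can be outlined as follows. This sketch assumes the 11 thresholds are evenly spaced (6.0, 6.2, ..., 8.0), that a pair counts as interacting when its measured pIC50 is at or above the threshold, and that AUC is computed with the rank-based (Mann-Whitney) formulation; all function names are hypothetical.

```python
import numpy as np


def pic50_from_nM(ic50_nM):
    """pIC50 = -log10(IC50 expressed in mol/L)."""
    return -np.log10(np.asarray(ic50_nM, dtype=float) * 1e-9)


def roc_auc(scores, labels):
    """AUC = P(random positive outscores random negative); ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


def mean_auc_over_thresholds(predicted, measured, thresholds=np.linspace(6.0, 8.0, 11)):
    """Binarize measured pIC50 at each threshold and average the resulting AUCs."""
    aucs = []
    for t in thresholds:
        labels = np.asarray(measured, dtype=float) >= t
        if labels.all() or not labels.any():
            continue  # AUC is undefined when only one class is present
        aucs.append(roc_auc(predicted, labels))
    return float(np.mean(aucs))
```

For example, pic50_from_nM(12500) gives about 4.90, the value assigned to pairs without a clear interaction.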
Fig 5.
Prediction of target interactions for an investigational kinase inhibitor tivozanib.
(a) Predicted and measured bioactivity profiles of tivozanib against its 3 established on-targets (FLT1, FLT4, KDR; average bioactivity from ChEMBL; S3 Table) and 7 predicted off-target kinases tested in our experimental assay. Pearson correlation r = 0.668 (p = 0.035). When no clear compound-kinase interaction was observed in our assay, the pIC50 value was set to 4.9 M, corresponding to the highest drug concentration used (12,500 nM). The predicted values fall within a narrow range because the experimental validation focused on model-predicted off-target interactions. Three of these turned out to be false positives, so the experimental values span a wider range than the predicted ones. (b) Evaluation of negative interaction predictions from the model. Among the 82 kinases with low predicted binding affinities (pKi < 6 M), 64 were screened by Gao et al., and 59 of these are unlikely targets of tivozanib (they retain at least 50% of their activity at the high compound concentration of 1 μM).