Computational-experimental approach to drug-target interaction mapping: A case study on kinase inhibitors

doi:10.1371/journal.pcbi.1005678

Fig 1.

An overview of our computational-experimental framework for prediction and pre-clinical testing of compound-protein bioactivity profiles.

Two separate prediction problems are considered: (1) filling the gaps in existing compound-target interaction maps and (2) prediction of target interactions for a new or investigational compound. Molecular descriptors of drug compounds and protein targets are encoded as kernels and used for binding affinity prediction with a regularized least-squares regression model, KronRLS. Finally, a subset of the predicted compound-protein bioactivities is experimentally tested (see Materials and Methods for details). Because the experimental validation data did not exist when the predictions were made, this approach effectively tests whether the model has overfitted to the training data. We chose kernel-based models because they are well suited for representing structured objects, such as molecules, that cannot be accurately described by a standard feature vector. Different types of drug and protein kernels can be calculated from readily available chemical structures and amino acid sequences. The resulting kernel matrices relate all pairs of input objects, so a kernel function can be regarded as a similarity measure.
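
The kernel-based regression step described above can be illustrated with a minimal sketch of Kronecker regularized least squares. It is not the authors' implementation: it assumes a complete drug-by-target affinity matrix `Y`, symmetric positive semidefinite kernels `K_d` and `K_t`, and a single regularization parameter `lam`; the function name and all variable names are hypothetical.

```python
import numpy as np

def kron_rls_fit_predict(K_d, K_t, Y, lam=1.0):
    """Fit Kronecker RLS on a complete affinity matrix Y (drugs x targets).

    Solves (K_d kron K_t + lam * I) vec(A) = vec(Y) without ever forming
    the Kronecker product, then returns the fitted values K_d @ A @ K_t.
    """
    # Eigendecompose each symmetric kernel matrix once.
    vals_d, Q_d = np.linalg.eigh(K_d)
    vals_t, Q_t = np.linalg.eigh(K_t)
    # Eigenvalues of the Kronecker product kernel are pairwise products.
    eig_prod = np.outer(vals_d, vals_t)
    # Project the labels into the joint eigenbasis, apply the ridge filter,
    # and project back to obtain the dual coefficients A.
    Y_proj = Q_d.T @ Y @ Q_t
    A = Q_d @ (Y_proj / (eig_prod + lam)) @ Q_t.T
    return K_d @ A @ K_t

# Toy usage with random linear kernels (illustrative data only).
rng = np.random.default_rng(0)
X_d = rng.normal(size=(4, 3))   # 4 hypothetical drugs, 3 descriptors each
X_t = rng.normal(size=(5, 2))   # 5 hypothetical targets, 2 descriptors each
K_d, K_t = X_d @ X_d.T, X_t @ X_t.T
Y = rng.normal(size=(4, 5))     # stand-in affinity values
F = kron_rls_fit_predict(K_d, K_t, Y, lam=0.5)
print(F.shape)
```

The eigendecomposition route costs roughly cubic time in the number of drugs plus cubic time in the number of targets, whereas solving the full Kronecker system directly would be cubic in their product.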

Fig 2.

Drug-protein interaction prediction scenarios.

(d_x, p_x) denotes a query drug-protein pair whose binding affinity is to be predicted. (a) The Bioactivity Imputation scenario: both the drug d_x and the protein p_x are present in the training set, i.e., there exist known bioactivity values for the drug d_x and for the protein p_x, but not for their interaction (d_x, p_x). (b) The New Drug scenario: the protein p_x is present in the training set, whereas the drug d_x is not, i.e., there exist known bioactivity values for the protein p_x but not for the drug d_x. (c) The New Target scenario: the drug d_x is present in the training set, whereas the protein p_x is not, i.e., there exist known bioactivity values for the drug d_x but not for the protein p_x. (d) The New Drug-Target Pair scenario: neither the drug d_x nor the protein p_x is present in the training set, i.e., there exist no bioactivity values for either the drug d_x or the protein p_x. In this work, we focused primarily on the two most common and practical prediction scenarios, (a) and (b), which correspond to filling the gaps in existing experimentally measured drug-target interaction maps and to predicting target interactions for an investigational drug compound, respectively.
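
The four scenarios differ only in whether the query drug and the query protein appear anywhere in the training data. A minimal sketch of that decision rule follows; the function name, the scenario labels as strings, and the toy identifiers are all illustrative assumptions rather than part of the original work.

```python
def prediction_scenario(drug, protein, train_pairs):
    """Classify a query (drug, protein) pair into one of the four scenarios.

    train_pairs is a collection of (drug, protein) tuples with known
    bioactivity values.
    """
    train_pairs = set(train_pairs)
    if (drug, protein) in train_pairs:
        raise ValueError("the query pair already has a measured bioactivity")
    drug_known = any(d == drug for d, _ in train_pairs)
    protein_known = any(p == protein for _, p in train_pairs)
    if drug_known and protein_known:
        return "bioactivity imputation"   # scenario (a)
    if protein_known:
        return "new drug"                 # scenario (b)
    if drug_known:
        return "new target"               # scenario (c)
    return "new drug-target pair"         # scenario (d)

# Toy usage with made-up identifiers.
known = [("d1", "p1"), ("d1", "p2"), ("d2", "p1")]
print(prediction_scenario("d2", "p2", known))  # bioactivity imputation
print(prediction_scenario("d9", "p1", known))  # new drug
```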

Table 1.

Applied nested cross-validation strategies.

Fig 3.

Computational evaluation of the model predictions.

(a) Leave-one-out and (b) leave-drug-out cross-validation results. The prediction accuracy was evaluated with the Pearson correlation (r) between binding affinities (pK_i) from the study by Metz et al. [3] and those predicted using the KronRLS algorithm with different pairs of compound (rows) and protein (columns) molecular descriptors encoded as kernel matrices (c). The corresponding root mean squared error (RMSE) values are shown in S1 Fig. Of note, the Gaussian interaction profile drug kernel (KD-GIP), which gave the highest predictive performance under the Bioactivity Imputation scenario (a), was not evaluated under the New Drug scenario (b), because it is constructed from the bioactivity profile of the drug to be predicted, that is, from information that is unavailable in practice when predicting target interactions for a new investigational drug compound.
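
To make the difference between the two evaluation settings concrete, the sketch below generates leave-drug-out folds, in which every measurement of one drug is held out together, and computes the two reported metrics. It is an illustration under stated assumptions, not the nested procedure used in the study: the helper names are hypothetical, and the inner loop for hyperparameter selection is omitted.

```python
import numpy as np

def leave_drug_out_folds(drug_ids):
    """Yield (train_idx, test_idx), holding out all pairs of one drug per fold."""
    drug_ids = np.asarray(drug_ids)
    for drug in np.unique(drug_ids):
        test_mask = drug_ids == drug
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted affinities."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted affinities."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy usage: one entry per measured drug-protein pair (made-up drug labels).
pair_drugs = ["d1", "d1", "d2", "d3", "d3"]
for train_idx, test_idx in leave_drug_out_folds(pair_drugs):
    print(train_idx, test_idx)
```

In leave-one-out cross-validation, by contrast, a single drug-protein pair is held out at a time, so other measurements of the same drug remain in the training set.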

Fig 4.

Comparison between computationally-predicted and experimentally-measured bioactivities.

(a) Scatter plot of predicted versus measured bioactivity values for 100 compound-kinase pairs (detailed in S2 Table). r indicates the Pearson correlation. The orange crosses correspond to compound-kinase pairs that were tested in the study of Metz et al. but randomly blinded by us during model training, forming an additional validation set. When no clear interaction between compound and kinase was observed in our experimental assay, the pIC₅₀ value was set to 4.9 M, corresponding to the highest drug concentration used in our screen (12,500 nM). The higher the pK_i/pIC₅₀ value, the stronger the affinity between the two molecules. Red lines mark a relatively stringent interaction threshold (7 M), distinguishing the top left corner as the region containing false positive interaction predictions and the bottom right corner as the region containing false negative predictions. (b) A set of receiver operating characteristic (ROC) curves showing the model performance as a function of the activity threshold. We applied 11 different interaction threshold values from the pIC₅₀ interval [6 M, 8 M] to binarize the experimentally measured bioactivities into true class labels, and then determined how accurately the model discriminates between interacting and non-interacting compound-kinase pairs. The average area under the ROC curves (AUC) equals 0.970.
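
Two calculations in this caption can be sketched in a few lines: the conversion of the highest screening concentration into a pIC₅₀ floor, and the averaging of ROC AUC over a range of binarization thresholds. The code is a sketch under assumptions that the caption does not confirm: the 11 thresholds are taken to be equally spaced, a pair is labelled as interacting when its measured value is greater than or equal to the threshold, and all function names are hypothetical.

```python
import numpy as np

def pic50_from_nM(ic50_nM):
    """Negative base-10 logarithm of a concentration given in nanomolar."""
    return -np.log10(ic50_nM * 1e-9)

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # AUC is undefined when one class is empty
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def mean_auc_over_thresholds(measured, predicted, thresholds=None):
    """Average AUC after binarizing the measured values at each threshold."""
    if thresholds is None:
        thresholds = np.linspace(6.0, 8.0, 11)  # assumed equal spacing
    measured = np.asarray(measured, dtype=float)
    aucs = [roc_auc(measured >= t, predicted) for t in thresholds]
    return float(np.nanmean(aucs))

# The highest screening concentration, 12,500 nM, gives a floor of about 4.9.
print(round(float(pic50_from_nM(12500)), 1))
```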

Fig 5.

Prediction of target interactions for the investigational kinase inhibitor tivozanib.

(a) Predicted and measured bioactivity profiles of tivozanib against its 3 established on-targets (FLT1, FLT4, KDR; average bioactivity from ChEMBL; S3 Table) and 7 predicted off-target kinases tested in our experimental assay. Pearson correlation r = 0.668 (p = 0.035). When no clear compound-kinase interaction was observed in our assay, the pIC₅₀ value was set to 4.9 M, corresponding to the highest drug concentration used (12,500 nM). The predicted values fall within an approximately constant range because we focused on experimental validation of the model-predicted off-target interactions. Three of these predictions turned out to be false positives, and therefore the experimental results span a wider range than the predicted values. (b) Evaluation of negative interaction predictions from the model. Among 82 kinases with low predicted binding affinities (pK_i < 6 M), 64 were screened by Gao et al., and 59 of these are unlikely to be targets of tivozanib (as they retain at least 50% of their activity at the high compound concentration of 1 μM).
