VBB is an Academic Editor of PLOS ONE journal. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.
Conceived and designed the experiments: OS WB VB. Performed the experiments: OS. Analyzed the data: OS WB MA ME PK VB. Contributed reagents/materials/analysis tools: OS WB VR VB. Wrote the paper: OS WB MA ME VR PK VB.
High-throughput screening (HTS) experiments provide a valuable resource that reports the biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and the frequently small proportion of active compounds relative to inactive ones. We developed a method, DRAMOTE, to predict the activity status of chemical compounds in HTS activity assays. For a class of HTS assays, our method achieves considerably better results than the current state-of-the-art solutions. We achieved this by modifying a minority oversampling technique. To demonstrate that DRAMOTE performs better than the other methods, we carried out a comprehensive comparison with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA-approved drugs that have a high probability of interacting with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results suggest that DRAMOTE performed best among the compared methods and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at
Experimental screening of chemical compounds for their biological activity has partial coverage and leaves millions of chemical compounds untested [
In this study we examine robust solutions that can be used for
Although good progress has been achieved for building predictive models for HTS data, there are still many issues in current methods that need to be investigated further.
First, many studies have developed prediction models for HTS data without considering precision, or precision-related scores such as the F1Score, when optimizing the performance of these models. Recently, some studies [
Second, generating and selecting a good subset of features is an important step in developing a well-performing prediction model, and may help in the cases of data with large class imbalance [
To tackle the above-mentioned problems, in this study we examine robust solutions to be used for
For this study we selected nine datasets from the PubChem BioAssay database in which the targets are proteins, except for one dataset in which the target is cell-based. Although we have a special interest in protein targets, we chose one cell-based case to illustrate the generality of our method. It is worth noting that all the datasets we chose are based on confirmatory assays; we avoided primary assays based on the recommendation of [
Dataset | Target Name (Target Type) | Type of Interacting Compounds | Minority Class Size | Majority Class Size | Imbalance Ratio (IR) |
---|---|---|---|---|---|
BenchSet (AID: 773, AID: 1006 and AID: 1379) | Luciferase [Photuris pennsylvanica] (Protein) | Inhibitors | 487 | 184,154 | 1:377 |
AID 596 | Microtubule-associated protein tau [Homo sapiens] (Protein) | Binders | 1,391 | 66,726 | 1:48 |
AID 618 | Matrix metalloproteinase 1, partial [Homo sapiens] (Protein) | Inhibitors | 537 | 86,197 | 1:160 |
AID 644 | Rho-associated protein kinase 2 [Homo sapiens] (Protein) | Inhibitors | 67 | 139 | 1:2 |
AID 886 | Chain B, The Structure Of Wild-Type Human Hadh2 (Protein) | Inhibitors | 2,463 | 64,616 | 1:26 |
AID 899 | Cytochrome P450 2C19 precursor [Homo sapiens] (Protein) | Inhibitors and Substrates | 1,901 | 6,443 | 1:3 |
AID 938 | Thyroid stimulating hormone receptor [Homo sapiens] (Protein) | Agonist Activators | 1,794 | 60,806 | 1:34 |
AID 743042 | Androgen receptor [Homo sapiens] (Protein) | Antagonist Activators | 674 | 6,939 | 1:10 |
AID 743288 | Hek293 cell line (Cell) | Binders | 95 | 2,128 | 1:22 |
Total Interactions |
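As a rough check on the figure of approximately 500,000 interactions, the class sizes listed in the table can be summed directly. The sketch below is illustrative only: the dictionary keys are shortened labels, and the ratio is computed simply as majority size divided by minority size, which is an assumption and may not match the rounding used for the ratio column of the table.

```python
# Class sizes copied from the dataset table: (minority, majority).
CLASS_SIZES = {
    "BenchSet": (487, 184_154),
    "AID 596": (1_391, 66_726),
    "AID 618": (537, 86_197),
    "AID 644": (67, 139),
    "AID 886": (2_463, 64_616),
    "AID 899": (1_901, 6_443),
    "AID 938": (1_794, 60_806),
    "AID 743042": (674, 6_939),
    "AID 743288": (95, 2_128),
}


def total_interactions(class_sizes):
    """Sum minority and majority counts over all datasets."""
    return sum(minority + majority for minority, majority in class_sizes.values())


def imbalance_ratio(minority, majority):
    """Number of majority (inactive) samples per minority (active) sample."""
    return majority / minority


if __name__ == "__main__":
    for name, (minority, majority) in CLASS_SIZES.items():
        print(f"{name}: 1:{imbalance_ratio(minority, majority):.1f}")
    print("Total interactions:", total_interactions(CLASS_SIZES))
```

With the sizes listed in the table, the sum comes to 487,557 compound-target pairs, which is consistent with the approximate figure quoted in the text.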
The DrugBank database data (accessed in August 2014) were downloaded from the website:
Generating and selecting a good subset of features is an important step in developing a well-performing classification model, and may also help in the cases of large class imbalance [
A large set of compiled features, as described in the previous section, carries information with varying levels of redundancy and introduces features that may not be relevant to the types of biological activity of chemicals observed in particular assays. A good FS method should be able to remove much of this redundant or irrelevant information [
Six widely used classifiers are applied as a basis for comparing different solutions of the class imbalance problem for activity testing in PubChem assays. These include support vector machines [
Performance of all methods referred to in the results section is obtained from 5-fold cross-validation. The testing fold was never used in the training phase. Since we performed 5-fold cross-validation with six classifiers and five class imbalance solutions, we performed 150 experiments (5 folds × 6 classifiers × 5 solutions = 150) for each dataset and 1,350 in total for all nine datasets. We report the average performance over the five folds of every dataset, as well as the standard deviation. In addition, we perform significance analysis between the methods using one-way analysis of variance (ANOVA). In cases where there is a significant difference between the methods, we further apply the well-known pair-wise Tukey mean-mean multiple comparison (MCC) to determine which pairs are significantly different, while simultaneously examining all methods [see
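The experiment count and the separation of training and testing folds can be illustrated with a short sketch. This is not the authors' MATLAB code: the function names are hypothetical, the classifier and method labels are taken from the result tables, and the fold assignment (every fifth sample, without shuffling or stratification) is a simplifying assumption.

```python
from itertools import product

N_FOLDS = 5
N_DATASETS = 9
CLASSIFIERS = ["SVM-L", "SVM-R", "KNN", "LDA", "NBC", "RF"]
SOLUTIONS = ["RU", "GSVM-RU", "SMOTE", "MWMOTE", "DRAMOTE"]


def experiment_grid(n_folds, classifiers, solutions):
    """Enumerate every (fold, classifier, solution) combination for one dataset."""
    return list(product(range(n_folds), classifiers, solutions))


def kfold_splits(n_samples, n_folds):
    """Yield (train, test) index lists; each test fold is held out of training."""
    indices = list(range(n_samples))
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


if __name__ == "__main__":
    per_dataset = len(experiment_grid(N_FOLDS, CLASSIFIERS, SOLUTIONS))
    print("Experiments per dataset:", per_dataset)
    print("Experiments in total:", per_dataset * N_DATASETS)
```

Any resampling step would be applied to the `train` indices only, so that the held-out fold keeps its original class distribution.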
The predictions of a classifier for an HTS dataset should result in high precision in order for the set of predicted active compounds to contain as few FP predictions as possible. The number of FPs is a crucial factor in measuring the reliability of predictions, as minimizing it leads to increased chances of successful follow-up experiments.
F1Score [
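For reference, the four measures reported in the result tables (sensitivity, precision, F1 score and F0.5 score) can be computed from confusion-matrix counts using the standard F-beta definition. The sketch below is a generic illustration with hypothetical function names and example counts; it is not taken from the authors' toolbox.

```python
def f_beta(precision, sensitivity, beta=1.0):
    """Standard F-beta score; beta < 1 weights precision more than sensitivity."""
    if precision == 0.0 and sensitivity == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * sensitivity / (b2 * precision + sensitivity)


def scores(tp, fp, fn):
    """Sensitivity, precision, F1 and F0.5 from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f_beta(precision, sensitivity, beta=1.0),
        "f0.5": f_beta(precision, sensitivity, beta=0.5),
    }


if __name__ == "__main__":
    # Hypothetical counts: 60 actives found, 20 false alarms, 40 actives missed.
    for name, value in scores(tp=60, fp=20, fn=40).items():
        print(f"{name}: {value:.4f}")
```

Because the F0.5 score gives precision more weight than sensitivity, it rewards methods that keep the number of false positives low.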
HTS experiments are usually characterized by only a small number of active chemical compounds obtained after screening a large compound set. This imbalance between the active and inactive compound classes may degrade classification performance and should be addressed. The class imbalance problem is a challenging task that has received a lot of attention [
There are certain limitations with the existing solutions for data preprocessing in the case of class imbalance. Methods like RU and SMOTE apply sampling procedures to the data without considering the effect of sampling on classification performance. These methods are independent of any feedback from the classifier and can therefore improve performance only to a limited extent. In other words, they provide no mechanism for controlling precision or other performance metrics. Other algorithms, like GSVM-RU, take the performance of the classifier into account but are limited to a specific classifier; for example, GSVM-RU is limited to SVM and cannot be applied to other classifiers. MWMOTE needs more parameters for selecting an informative set of minority samples and is limited to optimizing the performance of nearest-neighbor-type classifiers. We propose here a novel method motivated by ideas from active learning (AL) (for more details about AL see
A) SMOTE generates the light blue samples by interpolation between a randomly chosen minority sample and its k-nearest neighbors. B) DRAMOTE generates the light blue samples by choosing a minority sample based on its importance (i.e. its contribution to precision) and the direction towards a safe region. A minority sample (red) that is very close to the majority (negative) circles will probably be misclassified as negative and hence should get more support than the green minority samples. Once a minority sample is chosen, another point needs to be chosen for interpolation. The direction of interpolation can be controlled by choosing a nearest neighbor that does not overlap with the negative class. This, in turn, helps in providing support for the red point while not harming the classifier performance in its surrounding region.
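The interpolation step of panel A can be sketched in a few lines. This is a minimal, SMOTE-style illustration only: the function names, the Euclidean distance and the default parameters are assumptions, and the sketch does not implement DRAMOTE, which additionally selects the base sample by its importance and steers the interpolation towards a safe region.

```python
import math
import random


def k_nearest(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda q: math.dist(point, q))[:k]


def smote_like(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2D minority samples.
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_like(minority, n_new=3, k=2):
        print(point)
```

Each synthetic point lies on the line segment between two existing minority samples, so the new points never leave the region spanned by the minority class.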
We performed a number of experiments to evaluate the performance of the methods we used. The results are provided in
Larger standard deviation values are the result of averaging over different types of classifiers in this summary table.
Dataset | Method | Sensitivity % | Precision % | F1 Score % | F0.5 Score % |
---|---|---|---|---|---|
RU | 85.67 (±2.5) | 1.07 (±0.29) | 2.93 (±0.56) | 1.33 (±0.35) | |
GSVM-RU | 68.53 (±6) | 2.73 (±2.05) | 5.13 (±3.7) | 3.36 (±2.49) | |
SMOTE | 62.79 (±15.32) | 10.44 (±16.11) | 10.87 (±15.17) | ||
MWMOTE | 69.49 (±13.18) | 4.9 (±6.7) | 7.87 (±9.08) | 5.75 (±7.52) | |
DRAMOTE | 58.14 (±19.2) | 11.62 (±11.42) | |||
[ |
5 | NA |
NA |
||
RU | 75.9 (±3.04) | 5.3 (±1.17) | 9.89 (±2.07) | 6.51 (±1.41) | |
GSVM-RU | 4.56 (±2.78) | 8.46 (±4.78) | 5.59 (±3.34) | ||
SMOTE | 64.02 (±13.8) | 10.9 (±8.95) | 16.38 (±9.28) | 12.47 (±9.16) | |
MWMOTE | 62.1 (±14.3) | 10.8 (±9.2) | 16.11 (±9.34) | 12.32 (±9.37) | |
DRAMOTE | 42.9 (±13.52) | ||||
RU | 72.54 (±3.41) | 1.38 (±0.31) | 2.7 (±0.59) | 1.71 (±0.38) | |
GSVM-RU | 2.64 (±1.48) | 4.89 (±2.59) | 3.24 (±1.79) | ||
SMOTE | 43.01 (±17.87) | 10.07 (±12.36) | 10.93 (±8.36) | 10.01 (±10.42) | |
MWMOTE | 42.34 (±18.53) | 10.31 (±12.72) | 10.24 (±10.49) | ||
DRAMOTE | 29.69 (±15.26) | 9.73 (±6.09) | |||
RU | 50.29 (±4.46) | 35.08 (±2.56) | 40.32 (±3.1) | 37.3 (±2.49) | |
GSVM-RU | 36.02 (±2.51) | 39.84 (±2.48) | |||
SMOTE | 47.3 (±14.1) | 41.78 (±7.23) | 40.95 (±3.21) | 41.65 (±3.72) | |
MWMOTE | 47.37 (±12.37) | 42.22 (±6.68) | 41.99 (±3.24) | ||
DRAMOTE | 40.09 (±8.51) | 38.84 (±1.64) | 41.49 (±5.79) | ||
RU | 67.65 (±2.55) | 80.52 (±1.75) | 72.27 (±2.31) | ||
GSVM-RU | 99.25 (±0.97) | 54.51 (±26.52) | 65.87 (±29.63) | 58.53 (±27.76) | |
SMOTE | 96.94 (±4.11) | 75.2 (±4.92) | |||
MWMOTE | 97.03 (±3.27) | 74.32 (±4.81) | 83.98 (±2.75) | 77.9 (±4.06) | |
DRAMOTE | 94.38 (±8.1) | 83.55 (±3.72) | 78.56 (±4.17) | ||
RU | 77.65 (±3.43) | 45.96 (±7.07) | 57.33 (±5.46) | 49.89 (±6.7) | |
GSVM-RU | 25.82 (±2.6) | 40.69 (±2.84) | 30.25 (±2.76) | ||
SMOTE | 70.44 (±8.14) | 53.52 (±14.02) | |||
MWMOTE | 70.5 (±8.48) | 52.61 (±13.66) | 58.55 (±6.55) | 54.55 (±10.83) | |
DRAMOTE | 64.51 (±8.01) | 56.73 (±5.38) | 54.47 (±10.69) | ||
RU | 66.17 (±2) | 79.4 (±1.45) | 37.3 (±2.49) | ||
GSVM-RU | 99.16 (±0.5) | 45.85 (±17.01) | 56.79 (±17.22) | 49.64 (±24.09) | |
SMOTE | 91.86 (±0.9) | 80.05 (±1.8) | 84 (±1.34) | 81.94 (±11.11) | |
MWMOTE | 94.49 (±8.2) | 70.7 (±8) | 80.74 (±1.9) | 74.41 (±6.24) | |
DRAMOTE | 91.39 (±4) | ||||
RU | 71.34 (±7.44) | 17.22 (±2.83) | 27.66 (±4) | 20.28 (±3.21) | |
GSVM-RU | 11.11 (±0.65) | 19.81 (±0.9) | 13.47 (±0.74) | ||
SMOTE | 33.38 (±16.32) | 36.99 (±21.61) | 27.71 (±8.52) | 29.84 (±10.97) | |
MWMOTE | 35.52 (±14.9) | 36.54 (±18.4) | 30.56 (±7.01) | 32.18 (±9.78) | |
DRAMOTE | 35.38 (±14.13) | ||||
RU | 68.09 (±5.53) | 8.38 (±1.07) | 14.89 (±1.77) | 10.16 (±1.27) | |
GSVM-RU | 5.76 (±0.4) | 10.78 (±0.68) | 7.08 (±0.48) | ||
SMOTE | 25.74 (±18.34) | 26.99 (±23.95) | 24.56 (±6.5) | 24.05 (±10.15) | |
MWMOTE | 23.8 (±17.4) | 33.02 (±21.18) | 23.75 (±9.67) | 25.78 (±10.32) | |
DRAMOTE | 27.88 (±14.66) |
a NA indicates that a particular measure was not reported in the referenced work
In
To see where a particular solution stands among all the remaining ones, we also ranked the performance of each of the methods for every classifier based on the F1Score. We then averaged the rank positions for each of the methods. The method with the lowest average rank is the best performing. We provide in
Classifier | RU | GSVM-RU | SMOTE | MWMOTE | DRAMOTE |
---|---|---|---|---|---|
SVM-L | 3 | 5 | 1 | 4 | 2 |
SVM-R | 4 | 5 | 2 | 3 | 1 |
KNN | 3 | 5 | 2 | 4 | 1 |
LDA | 4 | 5 | 2 | 3 | 1 |
NBC | 1 | 4 | 3 | 5 | 2 |
RF | 4 | 5 | 1 | 3 | 2 |
Average | 3.17 | 4.83 | 1.83 | 3.67 | 1.50 |
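The rank averaging can be reproduced from the per-classifier ranks in the table above. The sketch below simply averages each column; the variable and function names are illustrative.

```python
METHODS = ["RU", "GSVM-RU", "SMOTE", "MWMOTE", "DRAMOTE"]

# F1Score-based ranks copied from the table, one row per classifier.
RANKS = {
    "SVM-L": [3, 5, 1, 4, 2],
    "SVM-R": [4, 5, 2, 3, 1],
    "KNN": [3, 5, 2, 4, 1],
    "LDA": [4, 5, 2, 3, 1],
    "NBC": [1, 4, 3, 5, 2],
    "RF": [4, 5, 1, 3, 2],
}


def average_ranks(ranks, methods):
    """Average the rank of each method over all classifiers."""
    n_classifiers = len(ranks)
    return {
        method: sum(row[i] for row in ranks.values()) / n_classifiers
        for i, method in enumerate(methods)
    }


if __name__ == "__main__":
    averages = average_ranks(RANKS, METHODS)
    for method, value in sorted(averages.items(), key=lambda item: item[1]):
        print(f"{method}: {value:.2f}")
```

A lower average indicates a better overall position; on these ranks DRAMOTE has the lowest average, followed by SMOTE.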
SMOTE, MWMOTE and DRAMOTE are all methods that generate synthetic data with exactly the same number of new over-sampling points. However, DRAMOTE gives preference to generating points contributing more to the precision of a particular classifier. Results of
Compared to GSVM-RU that was reported as an effective method for PubChem BioAssays [
This section describes a case study for prediction of activity status of FDA drugs with TSHR protein. TSHR is a key protein in the control of thyroid function and belongs to the superfamily of G-protein-coupled receptors (GPCRs) [
The biochemical relevance of the 10 top ranked predictions by DRAMOTE was further indirectly supported by
The application of DRAMOTE to the TSHR dataset (AID 938) resulted in a precision of 81.02% and a sensitivity of 91.39%. After building an ensemble of all six trained classifiers, the performance improved: precision remained at a similar level (~81%) while sensitivity rose to 98.84% (i.e. an increase of more than 7 percentage points).
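The text does not state how the six classifiers were combined, so the following is only one plausible sketch of an ensemble score: a plain average of per-classifier scores (soft voting), used to rank compounds. The combination rule, the function names, the compound labels and the score values are all assumptions made for illustration.

```python
def ensemble_score(classifier_scores):
    """Average the scores that the individual classifiers assign to one compound."""
    return sum(classifier_scores) / len(classifier_scores)


def rank_compounds(scores_by_compound, top_n=10):
    """Rank compounds by their averaged ensemble score, highest first."""
    averaged = {
        name: ensemble_score(values)
        for name, values in scores_by_compound.items()
    }
    ranked = sorted(averaged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical scores from six classifiers for three placeholder compounds.
    example = {
        "compound_A": [0.9, 0.8, 1.0, 0.9, 0.7, 0.8],
        "compound_B": [0.2, 0.4, 0.3, 0.1, 0.5, 0.3],
        "compound_C": [0.6, 0.7, 0.5, 0.6, 0.8, 0.4],
    }
    for name, score in rank_compounds(example, top_n=2):
        print(f"{name}: {score:.3f}")
```

Averaging tends to recover actives that only some of the classifiers detect, which is one way an ensemble can raise sensitivity while precision stays at a similar level.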
We investigated the potential interaction of approved drugs from the DrugBank database [
Rank | DrugBank ID | Drug Name | Description | Ensemble System Score |
---|---|---|---|---|
1 | DB00904 | Ondansetron | Treatment of nausea and vomiting caused by cytotoxic chemotherapy drugs | 0.98 |
2 | DB00962 | Zaleplon | Sedative/hypnotic, mainly used for insomnia | 0.97 |
3 | DB01349 | Tasosartan | Treat patients with essential hypertension | 0.966 |
4 | DB00405 | Dexbrompheniramine | Treat allergic conditions such as hay fever or urticaria | 0.96 |
5 | DB01261 | Sitagliptin | Control of type 2 diabetes mellitus | 0.958 |
6 | DB06439 | Tyloxapol | Non-ionic detergent often used as a surfactant | 0.957 |
7 | DB00889 | Granisetron | Antiemetic and antinauseant for cancer chemotherapy patients | 0.954 |
8 | DB01342 | Forasartan | Used alone or with other antihypertensive agents to treat hypertension | 0.953 |
9 | DB00748 | Carbinoxamine | First generation antihistamine that competes with free histamine for binding at HA-receptor sites | 0.95 |
10 | DB06267 | Udenafil | Treat erectile dysfunction | 0.945 |
Docking simulations can indirectly support the previous top findings in our data-driven approach. While docking simulations are prone to false positives, the presence of consistent levels in binding values between our findings and the top experimentally ranked interactions reported in AID 938 gives more confidence about our suggested candidates having active interaction status with TSHR.
The random set consists of 10 drugs chosen at random from the approved drugs list in the DrugBank database. The experimental set includes the top 10 drugs as listed in the original BioAssay AID 938 of the PubChem database.
A literature review of our top predictions points out that
In this study, we extensively compared several state-of-the-art methods that handle the class imbalance problem based on advanced sampling techniques. The results based on approximately 500,000 interactions suggest that DRAMOTE can be used for developing robust virtual screening models to recognize candidate chemical compounds for potential activity with specific molecular targets in specific assays. Moreover, as a case study, we applied DRAMOTE to screen for drugs likely to interact with the TSHR, and we presented the top 10 drugs that potentially interact with TSHR along with indirect supporting evidence of their validity from the literature and simulated 3D docking.
The orange color highlights the top docking results of a drug binding to the chosen activation site.
(TIFF)
Mean and variance of 5-fold cross-validation performance scores are displayed for each method and for each classifier used.
(DOCX)
The file also includes most of the features we selected after applying variable selection over the original set of generated features.
(DOCX)
The file also includes all information about DRAMOTE and its procedure.
(DOCX)
The authors thank Dr. Hammad Naveed, Ahmed Elshewy, Loqmane Seridi, Dr. Salim Bougouffa, Haitham Ashoor and Dr. Mahmut Uludag for multiple insightful and valuable discussions about experimental design and results presentation. The computational analysis for this study was performed on the Dragon and SnapDragon compute clusters of the Computational Bioscience Research Center at King Abdullah University of Science and Technology (KAUST). This study is supported by the KAUST base research funds of VBB and PK.