
Enhancing Hit Identification in Mycobacterium tuberculosis Drug Discovery Using Validated Dual-Event Bayesian Models

  • Sean Ekins,

    Affiliations Collaborative Drug Discovery, Burlingame, California, United States of America, Collaborations in Chemistry, Fuquay-Varina, North Carolina, United States of America

  • Robert C. Reynolds,

    Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, Birmingham, Alabama, United States of America

    Affiliation Southern Research Institute, Birmingham, Alabama, United States of America

  • Scott G. Franzblau,

    Affiliation Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, Illinois, United States of America

  • Baojie Wan,

    Affiliation Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, Illinois, United States of America

  • Joel S. Freundlich,

    Affiliations Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, Newark, New Jersey, United States of America, Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, Newark, New Jersey, United States of America

  • Barry A. Bunin

    Affiliation Collaborative Drug Discovery, Burlingame, California, United States of America


High-throughput screening (HTS) in whole cells is widely pursued to find compounds active against Mycobacterium tuberculosis (Mtb) for further development towards new tuberculosis (TB) drugs. Hit rates from these screens, usually conducted at 10 to 25 µM concentrations, typically range from less than 1% to the low single digits. New approaches that learn from past screening data are urgently needed to increase the efficiency of hit identification. The pharmaceutical industry has for many years taken advantage of computational approaches to optimize compound libraries for in vitro testing, a practice not fully embraced by academic laboratories in the search for new TB drugs. Adapting these proven approaches, we have recently built and validated Bayesian machine learning models for predicting compounds with activity against Mtb based on publicly available large-scale HTS data from the Tuberculosis Antimicrobial Acquisition and Coordinating Facility. We now report the largest prospective validation to date, in which we computationally screened 82,403 molecules with these Bayesian models, assayed a total of 550 molecules in vitro, and identified 124 actives against Mtb. Hit rates for the individual datasets ranged from 15 to 28%. We have identified several FDA-approved and late-stage clinical candidate kinase inhibitors with activity against Mtb that may represent starting points for further optimization. The computational models developed herein and the commercially available molecules derived from them are now available to any group pursuing Mtb drug discovery.


Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis (TB), continues to exact a devastating toll on healthcare infrastructure and human life worldwide. Mtb infects approximately one-third of the world’s population and kills 1.7–1.8 million people annually [1], on a par with another neglected disease, malaria [2]. While there are very effective treatments for TB, they are slow acting and have side effects, resulting in poor patient compliance. In addition, resistance to the available first- and second-line TB drug cocktails is increasing [3], a problem further exacerbated by co-infections with other diseases [4], [5]. Apart from bedaquiline, recently approved for multidrug-resistant TB [6], [7], no new antibiotic for TB has been introduced in the last 40 years. There are, however, other promising agents in ongoing clinical trials, although there is an urgent need for back-up and new alternative drugs [8], [9]. Hence, significant investment has been made in whole-cell phenotypic screening of drug-like small molecule libraries in a search for new compounds that might stem the course of a potential epidemic of totally drug-resistant Mtb [5], [10]–[14]. Unfortunately, the hit rate for these costly TB screens is, in the best cases, in the low single digits (∼1.7–5%) when compound concentrations are 10 µg/ml [13], [14] or 10 µM [12]. Hit rates below 1% at concentrations of 14.3 µM [10] and 25 µM [11] are not uncommon, as seen elsewhere in high-throughput screening (HTS) and infectious disease drug discovery [15]–[17]. Furthermore, the information from these inefficient and expensive HTS campaigns does not appear to have been used to direct “informed” selection of new libraries in subsequent screens or compound optimization in TB drug discovery.

In this regard, virtual screening and computational approaches have been widely adopted in the pharmaceutical industry [18] alongside, or even prior to, HTS to improve efficiency [19], [20]. There are many computational methods that can assist in identifying compounds with activity against Mtb [21]–[24]; these include ligand-based [25] and protein-based [26] methods, as well as methods to identify molecules with ideal physicochemical properties [27].

We have recently conducted an extensive review of the use of computational approaches in TB drug discovery, concluding that computational models are employed with little or no integration into the standard TB drug discovery workflow [28]. Furthermore, machine learning and compound classification methods have been used infrequently, although these approaches are especially effective for virtual screening of libraries [29], [30]. General classification models such as Bayesian classification models have more recently been tested on datasets of several thousand compounds with activity against Mtb, demonstrating classification accuracy greater than 70% [25]. We have also recently reported retrospective Bayesian machine learning model analyses for Mtb [22]–[24] using large HTS datasets that were published and made publicly accessible [12], [13]. We observed 8- to 10-fold enrichment in identifying TB actives among the top-scoring molecules. A recently published study [22] applied our TB Bayesian models to datasets published by others and showed four- to ten-fold enrichment factors for the top-ranked compounds. For comparison, others have recently described general Bayesian models for antibacterial compounds with 1.5- to 2-fold enrichments [31]. Such results suggested to us that whole-cell screening data from one laboratory can be used to build machine learning models that appropriately rank compounds screened and identified as Mtb hits by others [22]. These previously published models, however, did not account for the cytotoxicity of molecules to mammalian cell lines, e.g. African green monkey (Vero) cells.

Our most recent work has incorporated cytotoxicity data alongside Mtb bioactivity data by selecting for relatively non-cytotoxic actives with IC90<10 µg/ml (CB2-TAACF [13]) or 10 µM (MLSMR [15]) and a selectivity index (SI) greater than ten. SI was calculated as SI = CC50/IC90, where CC50 is the compound concentration that resulted in 50% inhibition of Vero cells. In this manner, we have generated Bayesian models [32] with enhanced predictive capability. We prospectively validated these models alongside previous Bayesian models in collaboration with established screening laboratories [32]. We now describe an additional series of three prospective validation experiments using commercially available molecules. Critically, the scale of our prospective validation has increased five-fold, from 106 molecules in our recent publication [32] to 550 molecules in the current study that were scored as likely active and relatively non-cytotoxic to cultured Vero cells and then tested experimentally. In the process of this evaluation we have further demonstrated the utility of our Bayesian approach to hit discovery and identified valuable starting points for the development of novel antitubercular agents: 124 actives against Mtb, including two families built around drug-like heterocyclic cores and several FDA-approved human kinase-targeting drugs. This represents the largest validation of such models against Mtb to date.
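The dual-event activity definition above (potency threshold plus SI>10) can be made concrete in a few lines. The following is a minimal Python sketch, not the authors' code: the `Compound` record, the function names and the example values are hypothetical, and it assumes IC90 and CC50 are expressed in the same units (µg/mL or µM) so that their ratio is dimensionless.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    ic90: float  # Mtb IC90 (same units as cc50)
    cc50: float  # Vero cell CC50 (same units as ic90)


def selectivity_index(ic90: float, cc50: float) -> float:
    """Selectivity index SI = CC50 / IC90."""
    if ic90 <= 0:
        raise ValueError("IC90 must be positive")
    return cc50 / ic90


def is_dual_event_active(c: Compound, ic90_cutoff: float = 10.0,
                         min_si: float = 10.0) -> bool:
    """Active only if potent (IC90 < cutoff) AND selective (SI > min_si)."""
    return c.ic90 < ic90_cutoff and selectivity_index(c.ic90, c.cc50) > min_si


# Hypothetical compounds (illustrative values, not data from this study)
examples = [
    Compound("potent_selective", ic90=2.0, cc50=50.0),   # SI = 25 -> active
    Compound("potent_toxic", ic90=2.0, cc50=10.0),       # SI = 5 -> inactive
    Compound("weak_selective", ic90=20.0, cc50=1000.0),  # IC90 too high -> inactive
]
for c in examples:
    print(c.name, selectivity_index(c.ic90, c.cc50), is_dual_event_active(c))
```

Because both conditions use strict inequalities, a compound with SI exactly 10 is classed as inactive in this sketch.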


We have created and then applied computational TB models that exploit heterogeneous collections of data. The models were used prospectively to virtually score large libraries of potential antitubercular agents and to prioritize them for testing. A top-ranked fraction of each library was then assessed experimentally for both antitubercular activity and Vero cell toxicity, followed by analysis of model performance.

MLSMR Dose Response and Cytotoxicity Model

A dual-event Bayesian model strategy was recently described, which resulted in the MLSMR dose response and cytotoxicity model [32]. The model information is repeated here because we have made extensive use of it in this study. We selected non-cytotoxic actives as those with IC90<10 µM and SI>10. This model had a leave-one-out cross-validated receiver operator characteristic (LOO ROC) area under the curve of 0.86 (Table 1). All statistics for this model were equivalent or superior to those of the previously published MLSMR single point and dose response models (Table S1), which have been extensively validated elsewhere [22]–[24].

Table 1. Mean (SD) leave one out and leave out 50%×100 cross validation of Mtb Bayesian models (ROC = receiver operator characteristic).

Using the FCFP-6 descriptors we previously identified [32] those substructure descriptors that contribute to the Mtb activity (Figure S1) including the oxazole 2-thioether, aryl/heteroaryloxyacetic acid, and quinolone 3-carboxylic acid cores, and those substructure descriptors that are not present in active compounds such as thiazole 2-amides, 2-substituted pyrazoles, 2-substituted benzimidazoles, N-functionalized pyrrolidines, N-arylamides, and 2-substituted pyridines (Figure S2) [32].

TAACF Kinase Dataset Bayesian Models

The compounds from a library based on kinase inhibitor scaffolds screened through the TAACF were also used to construct multiple Bayesian models (Table 1), using the same methodology and validation approach as described previously [22]–[24]. We describe these models here for the first time. Using 23,797 compounds with single point Mtb screening data, we built a Bayesian model with a LOO ROC of 0.89. This statistic was stable after leave out 50%×100 validation, and the model statistics of concordance, specificity and sensitivity were >75% (Table 1). From our Bayesian modeling experience, values of >70% for these statistics are acceptable [22]–[24]. Using the FCFP-6 descriptors we identified the substructure descriptors that contribute to Mtb activity, including 2-substituted 5-membered heterocycles, N-alkylated pyrroles, and imidazoles (Figure S3), and those that are not present in active compounds, including imidazolidine diones and aminothienopyridazinones (Figure S4).
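For readers less familiar with the concordance, specificity and sensitivity statistics reported for such classifiers, the sketch below shows their standard confusion-matrix definitions. It assumes the Bayesian scores have already been converted into binary active/inactive predictions at some score cutoff (the cutoff is not specified here); the function name and toy labels are hypothetical.

```python
def classification_stats(y_true, y_pred):
    """Concordance, sensitivity and specificity from binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "concordance": (tp + tn) / len(pairs),  # fraction classified correctly
        "sensitivity": tp / (tp + fn),          # fraction of actives recovered
        "specificity": tn / (tn + fp),          # fraction of inactives rejected
    }


# Hypothetical example: 4 actives, 6 inactives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
stats = classification_stats(y_true, y_pred)
print(stats)  # concordance 0.8, sensitivity 0.75, specificity ~0.833
```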

When we focused on the subset of 1,248 compounds with dose response data, model statistics decreased (Table 1). ROC values were ∼0.75 and the other statistics dropped to ∼61%. These values are low and may reflect the sensitivity of this much smaller dataset to the large percentage of data left out for testing, compared with the single point model. Although these model statistics are low, the model may still have predictive utility [32]. Using the FCFP-6 descriptors we identified the substructure descriptors that contribute to Mtb activity, including 2-substituted 5-membered heterocycles, phenol, fluoroarene, and pyrazolopyrimidine (Figure S5), and those that are not present in active compounds, including sulfonamides and nitrobenzene (Figure S6).

When the cytotoxicity data were considered in the active classification (actives were selected as those with IC90<10 µg/mL and SI>10), the ROC increased to >0.75 and all other statistics improved, but did not reach the levels of the single point model (Table 1). This model was used to evaluate additional compounds. Using the FCFP-6 descriptors we identified the substructure descriptors that contribute to activity, including pyranone and thiophene 2-amides where the amide nitrogen is substituted with an oxazole or oxadiazole (Figure S7), and those that are not present in actives, including 5-alkoxy substituted indole, benzenesulfonamide, pyrazolopyridine, and acylhydrazide (Figure S8).


Both the MLSMR dose response with cytotoxicity and the TAACF kinase dose response with cytotoxicity models were used to screen the Asinex library (N = 25,008), the Maybridge library (N = 57,200), and the Selleck Chemicals kinase library (N = 194). The libraries were ranked by each model, and the top-scoring compounds were purchased without any further selection criteria. For the Asinex library, 94 molecules were ultimately selected by scoring with the MLSMR dose response and cytotoxicity model and 88 with the TAACF kinase dose response and cytotoxicity model. The selected Asinex compounds were purchased and tested at a single concentration. The MLSMR model retrieved 8 hits while the TAACF model correctly predicted 19 hits, where a hit was defined as a compound showing >90% inhibition at 100 µg/mL (Figure 1). These represent hit rates of 8.5% and 21.6%, respectively, or an overall hit rate of 14.8% when both sets are combined. For the Maybridge library, 174 compounds in total were selected with the same two Bayesian models, and 50 molecules had >90% inhibition at 100 µg/mL (Table S2), representing a total hit rate of 28.7%. All of the molecules from the Selleck Chemicals kinase library were virtually screened with the MLSMR dose response and cytotoxicity model, the TAACF-CB2 (CB2) dose response and cytotoxicity model (described previously [32]), and the TAACF kinase dose response and cytotoxicity model. Forty-seven molecules had ≥90% inhibition of Mtb growth at 50 µg/mL (Table S3), representing a total hit rate of 24.2%. The data for all three models can be plotted (Figure S9). Interestingly, this shows that the previously published CB2 model [32] achieves a 2.5-fold enrichment (relative to the random hit rate) in the top 5% of ranked compounds when actives are defined by Mtb activity alone. However, it should also be noted that the TAACF kinase and MLSMR dual-event models performed below random in this analysis.
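The hit rates quoted above are simple ratios of hits to compounds tested, and the enrichment shown in Figure S9 compares the hit rate in a top-ranked fraction with a baseline rate. The minimal Python sketch below reproduces the reported percentages from the counts in the text. The `enrichment_factor` function is a generic, hypothetical illustration that assumes the baseline is the hit rate over the whole ranked list, whereas Figure S9 uses the empirical HTS hit rate as its random baseline.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested compounds that met the hit criterion."""
    return hits / tested


def enrichment_factor(ranked_labels, fraction):
    """Hit rate in the top `fraction` of a ranked list / hit rate of the whole list.

    ranked_labels: 1 (hit) or 0 (non-hit), sorted by descending model score.
    """
    overall_rate = sum(ranked_labels) / len(ranked_labels)
    if overall_rate == 0:
        raise ValueError("enrichment is undefined when the list contains no hits")
    n_top = max(1, round(len(ranked_labels) * fraction))
    top_rate = sum(ranked_labels[:n_top]) / n_top
    return top_rate / overall_rate


# (hits, compounds tested), as reported in the text
screens = {
    "Asinex, MLSMR model": (8, 94),
    "Asinex, TAACF kinase model": (19, 88),
    "Maybridge": (50, 174),
    "Selleck kinase library": (47, 194),
}
for name, (hits, tested) in screens.items():
    print(f"{name}: {100 * hit_rate(hits, tested):.1f}%")

total_hits = sum(h for h, _ in screens.values())
total_tested = sum(t for _, t in screens.values())
print(f"Overall: {total_hits}/{total_tested} = "
      f"{100 * hit_rate(total_hits, total_tested):.1f}%")

# Hypothetical ranked list of 20 compounds with 4 hits, 2 of them ranked first
toy = [1, 1] + [0] * 8 + [1, 0, 0, 0, 1] + [0] * 5
print(enrichment_factor(toy, 0.10))  # top 2 are both hits: 1.0 / 0.2 = 5.0
```

Combining the two Asinex selections, (8 + 19)/(94 + 88), gives the 14.8% quoted above, and the four screens together give 124/550, or 22.5%.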

Figure 1. Asinex hits picked with MLSMR dose response and cytotoxicity model and TAACF kinase dose response and cytotoxicity model. More positive numbers from the Bayesian models suggest likely Mtb activity.


There is increasing evidence that computational methods can assist in TB research [33]. Our recent efforts have validated the concept that we as a community can more rationally interrogate large public datasets from HTS campaigns that have attempted to find the next antitubercular drug, or its parent hit. Instead of merely taking the “needle in a haystack” approach, we have demonstrated that the data as a whole (actives and inactives) can inform us how to find actives in future screens more quickly and inexpensively. Previous work highlighted a 14% hit rate when applying a single-event Bayesian model, and a dual-event model afforded 5/7 hits with an MIC ≤2 µg/ml [32]. The current study greatly expands on these efforts. We are not aware of any other published TB computational studies that have made such extensive prospective predictions using machine learning models followed by in vitro screening for proof of principle. Approaches using docking [28] and inhibitor shape-based methods [34] have recently undergone limited in vitro validation, which is important for providing confidence in their outputs. We are observing a shift towards earlier use of these more efficient computational methods. The results validate the hypothesis that Bayesian models identify subsets of libraries enriched with actives, through testing a very small percentage (<1%) of the number of compounds usually probed with HTS. For example, the whole-cell screening hit rate in the SRI studies (used to create the Bayesian models) was ∼1.7–5% [12]–[14]. The hit rate for screening at the Institute for Tuberculosis Research (Table S5) has depended on compound concentration, assay readout and library type (diversity: 0.67–4.55%; approved drug: 16–21%). In the current study, with the computational screening of 82,403 molecules, we assayed 550 molecules and identified 124 actives across commercially available datasets (average hit rate of 22.5%).

Such models can clearly select for molecules with whole-cell activity [35], enabling fewer compounds to be tested to find a diverse array of actives. Such an approach may also assist the traditional medicinal chemistry workflow, not only by providing hits more efficiently, but also by decreasing the time and cost involved in their evolution to leads and eventually to a clinical candidate.

Interestingly, the MLSMR Bayesian model for antitubercular activity previously identified a series of Mtb inhibitors with the pyrazolo[1,5-a]pyrimidine core [32]. In this study, five of the actives found with the MLSMR dose response and cytotoxicity model (SYN 20119330, SYN 20119329, LEG 19176288, SYN 20120645 and LEG 13368564; Figure 1) retain the core pyrazole but have different substitution patterns involving either a fused pyridine or tetrahydropyridine. These may help to expand the structure-activity relationship for this inhibitor class. In addition, we have tested two compounds (SYN22269211 and SYN 22269234) that confirm we can expand on this series (Table S4), and future efforts will specifically examine Bayesian models for hit-to-lead and lead optimization. These are very labor-intensive processes, as typified by the successful evolution of CGI-17341 to the clinical candidate PA-824 over the course of hundreds of compounds and many years [36], [37], and by the pursuit of next-generation nitroimidazoles by Denny and colleagues [38]–[41].

The TAACF kinase dose response and cytotoxicity model identified novel hits (SYN 15409784, SYN 15411227, SYN 15410071, SYN15409672, SYN15410833, SYN15408578, SYN 15408469, SYN15410837, SYN15408522, SYN 15411076, SYN 15408444 and SYN 15410945 in Figure 1) featuring a tertiary amine nitrogen tethered by a three-carbon chain to an imidazole and by a methylene to each of two different aromatic moieties. Intriguingly, these molecules share the imidazolylpropylamino functionality of a set of antitubercular hits found via HTS of a commercial kinase-focused library (Table 7 in [14]). Other molecules retrieved by these models appear to be unique (Figure 1). The Bayesian models provide an encouragingly high hit rate for this dataset (14.8%) based on the single concentration MABA data. However, none of these hits displayed a satisfactory SI>10, defined in this case as CC50 Vero/MIC MABA (Figure 1). In contrast, the Maybridge dataset (Table S2) showed a higher single concentration MABA hit rate (28.7%) and several apparently selective molecules meeting the SI>10 criterion (BTB05726, BTB14927, HTS 12819, JFD00897, JFD01059, JFD02381, KM02770, and KM03304). It is interesting to note that the MLSMR Bayesian model also correctly rank-ordered JFD02381 and JFD02382, which differ by only a methyl group but have MABA MICs of 5.84 µg/mL and >100 µg/mL, respectively (Table S2).

Surprisingly, many of the known human kinase inhibitors that inhibited Mtb growth did not exhibit acceptable SI values of >10. This may reflect the modest antitubercular activity of these small molecules, owing to comparable binding affinities for the kinase ATP binding sites common to targets in both Mtb and mammalian cells [42], [43]. The issues surrounding the concurrent optimization of Mtb kinase inhibition and antitubercular whole-cell efficacy are known [44] and are certainly magnified by the concern over mammalian cell cytotoxicity. Additionally, studies demonstrating antagonistic effects of human kinase inhibitors on TB infection, for example through reactivation, must be noted [45], [46]. The largest selectivity indices in our study were 6 for XL880 and 3 for NVP-TAE684 and AP24534. Only a few of these kinase inhibitors are approved drugs: lapatinib (breast cancer, targeting HER2), sorafenib (renal cell carcinoma, multikinase inhibitor), vandetanib (medullary thyroid cancer, multikinase inhibitor) and regorafenib (metastatic colorectal cancer, VEGFR2-TIE2 inhibitor). While the in vitro antitubercular activities of these molecules are likely much poorer than their known human kinase activities, they represent the potential for repurposing [47], [48] and specifically underscore the value of in silico repurposing as we have described previously [49], [50]. They may also indicate new targets in Mtb to be pursued, and significant interest exists in Mtb kinases [51], [52]. Based on MABA and LORA MIC values of 5.9 and 5.3 µg/ml, respectively, XL880 may be worthy of follow-up chemistry efforts. This multikinase inhibitor has sub-nanomolar potency versus human c-Met and anti-angiogenesis potential [53]–[55] but has not previously been shown to have activity versus Mtb. Others have suggested that the approved kinase inhibitor imatinib (Gleevec) has antitubercular effects by targeting host kinases [56].
Direct effects of other compounds on Mtb might, therefore, be supplemented by modulating host kinases [57], [58]. Several other studies have screened libraries of kinase inhibitor compounds against whole-cell Mtb, with hit rates of 0.14% [17] and 5% [14] versus cultured Mtb. To our knowledge, this is the first study in which several of the hits from screening a kinase library are approved drugs or advanced clinical candidates. Clearly, while these compounds target kinases in humans, they also act on one or more targets in Mtb.

Only one of the three Bayesian models performed better than random in identifying active molecules from the kinase inhibitor dataset. The performance of the TB kinase dose response and cytotoxicity model is perhaps not surprising given its lower validation statistics. The MLSMR dose response and cytotoxicity model, however, performed similarly despite displaying much better validation statistics. Other factors are clearly at play, such as differences in how each model learns cytotoxicity and antitubercular efficacy from its training dataset.

These extensive evaluations, combining prospective prediction and in vitro testing, suggest that Bayesian machine learning models for Mtb can identify novel structural classes of antituberculars. While identifying actives with both efficacy and low cytotoxicity may be ideal, our efforts show that this remains difficult to achieve consistently in practice, although the desired SI>10 can be achieved in some cases (Table S2). Further evaluation is warranted to understand how such dual-event Bayesian models can provide increased confidence in predictions. We have identified an opportunity for TB researchers to collaboratively use computational models to identify molecules with whole-cell activity and, in some cases, acceptable mammalian cell cytotoxicity. The weight of evidence we now submit alongside our previous studies [22]–[24], [32], [35] overwhelmingly argues for the inclusion of such computational approaches prior to additional large-scale HTS for Mtb, based on their ability to identify compounds with whole-cell activity alone. We can thus focus resources on testing compounds more likely to have favorable activity and promising selectivity, saving resources for more expensive in vivo studies and later drug development. We now provide this set of 124 hits, derived by Bayesian models and validated in vitro, as a public resource for further investigation of the potential targets and mechanisms by which they are active against Mtb. Understanding how we can further optimize these hits and avoid cytotoxicity may lead to new treatments for tuberculosis.

Materials and Methods

Ethics Statement


Small Molecules

Small molecules for biological assay were purchased from Asinex Corp. (Winston-Salem, NC), Maybridge/Thermo Fisher Scientific Inc. (Waltham, MA) and Selleck Chemicals (Houston, TX). Compounds were used as supplied by the vendors. No overt solubility issues were identified.

CDD Database and SRI Datasets

The development of the CDD TB database (Collaborative Drug Discovery Inc., Burlingame, CA) has been previously described [24]. The Tuberculosis Antimicrobial Acquisition and Coordinating Facility (TAACF) and Molecular Libraries Small Molecule Repository (MLSMR) screening datasets [12]–[14] were collected, uploaded to CDD TB from sdf files, and mapped to custom protocols [59]. All of the Mtb datasets used in model building are available for free public read-only access and mining upon registration [60], [61], making them a valuable molecule resource for researchers, along with available contextual data on these samples from other non-Mtb assays. These datasets are also publicly available in PubChem [62].

Machine Learning using Bioactivity and Cytotoxicity Data

We have previously described the generation and validation of Laplacian-corrected Bayesian classifier models [22]–[24] developed with single point screening and dose response data. In this study we have generated Laplacian-corrected Bayesian classifier models using Discovery Studio 2.5.5 [25], [63]–[66]. Functional class fingerprints of maximum diameter 6 (FCFP_6) [67], AlogP, molecular weight, number of rotatable bonds, number of rings, number of aromatic rings, number of hydrogen bond acceptors, number of hydrogen bond donors, and molecular fractional polar surface area were calculated from input sdf files using the “calculate molecular properties” protocol and were used to distinguish between compounds that are active against Mtb and those that are inactive. Bayesian classifier models with the molecular descriptors described above were built using the “create Bayesian model” protocol and: 1. the MLSMR [12] dose response (IC90) and cytotoxicity data for 2,273 compounds (165 active with IC90<10 µM and selectivity SI>10 in Vero cells) [32]; 2. the single point screening data for 23,797 compounds from a library based on kinase inhibitor scaffolds (1,308 active; >90% inhibition at 10 µg/ml); 3. half-maximal inhibitory concentration (IC50) dose response data for 1,248 compounds from a library based on kinase inhibitor scaffolds (663 active; IC50<5 µg/mL); 4. the IC90 and cytotoxicity data for 1,248 compounds from a library based on kinase inhibitor scaffolds (182 active with IC90<10 µg/mL and selectivity SI>10 for Vero cells). Each model was validated using leave-one-out (LOO) cross-validation: each sample was left out in turn, a model was built using the remaining samples, and that model was used to predict the left-out sample. Once all the samples had predictions, a receiver operator characteristic (ROC) plot was generated and the cross-validated (XV) ROC area under the curve (AUC) was calculated (Table 1).
All models generated were additionally evaluated by leaving out 50% of the data and rebuilding the model 100 times using a custom protocol for validation, to generate the XV ROC and AUC (Table 1).
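The Laplacian-corrected Bayesian classifier and the LOO ROC validation described above can be illustrated with a small from-scratch sketch. This is not the Discovery Studio implementation: it assumes the commonly described Laplacian-corrected feature weight log((A_f + 1)/(T_f·p + 1)), where T_f is the number of training compounds containing feature f, A_f is the number of those that are active, and p is the overall fraction of actives; a compound's score is the sum of the weights of its features. Integer feature IDs stand in for FCFP_6 bits, the numeric descriptors (AlogP, molecular weight, etc.) are omitted, and all function names and data are hypothetical.

```python
import math
from collections import Counter


def train_laplacian_bayes(fingerprints, labels):
    """Per-feature weights log((A_f + 1) / (T_f * p + 1)) from binary labels."""
    p = sum(labels) / len(labels)  # baseline fraction of actives
    total, active = Counter(), Counter()
    for feats, y in zip(fingerprints, labels):
        for f in set(feats):
            total[f] += 1
            if y:
                active[f] += 1
    return {f: math.log((active[f] + 1) / (total[f] * p + 1)) for f in total}


def score(weights, feats):
    """Sum of feature weights; features unseen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in set(feats))


def roc_auc(scores, labels):
    """AUC = probability a random active outscores a random inactive (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def loo_auc(fingerprints, labels):
    """Leave-one-out: retrain without sample i, score sample i, then compute AUC."""
    scores = []
    for i in range(len(labels)):
        weights = train_laplacian_bayes(fingerprints[:i] + fingerprints[i + 1:],
                                        labels[:i] + labels[i + 1:])
        scores.append(score(weights, fingerprints[i]))
    return roc_auc(scores, labels)


# Toy data: feature 1 marks actives, feature 2 marks inactives, 3-5 are shared
fps = [[1, 3], [1, 4], [1, 5], [2, 3], [2, 4], [2, 5]]
ys = [1, 1, 1, 0, 0, 0]
print(loo_auc(fps, ys))
```

Features seen equally often in actives and inactives receive a weight near zero, while features enriched in actives (the "good" features of Figures S1, S3, S5 and S7) receive positive weights. A leave-out-50%×100 validation could be sketched the same way by repeatedly training on a random half of the data and computing the AUC on the other half.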

M. tuberculosis Assay for Biological Activity

Primary screening and minimum inhibitory concentrations (MIC) against replicating and non-replicating cultures of Mtb were determined using the microplate Alamar Blue assay (MABA [68], [69], with 20 µL of 0.6 mM resazurin used instead of the commercial Alamar Blue reagent) and the low oxygen recovery assay (LORA [70]), respectively. The former was performed against Mtb H37Rv ATCC 27294 (American Type Culture Collection) following 7 days of incubation with test compounds. The latter was performed against low-oxygen-adapted Mtb H37Rv luxAB, which carries a luciferase reporter gene, following 10 days of incubation under low oxygen and 28 hours of normoxic recovery. Both assays were conducted in microplate format in 7H12 medium [69]. For the purposes of this study, a hit in the MABA primary screen was defined as achieving at least a 90% reduction in fluorescence relative to untreated controls. The LORA assay was used to determine whether any of the hits were also active under low oxygen conditions. LORA MIC assays were run only once, with 8 concentrations, in line with the standard protocols used by this laboratory for many other libraries and studies. The MIC was defined as the lowest concentration producing a reduction of ≥90% in fluorescence (MABA) or luminescence (LORA) relative to untreated controls. Cytotoxicity for Vero cells was determined following 72 hours of exposure [69]. Viability was assessed on the basis of cellular conversion of MTS into a soluble formazan product using the Promega CellTiter 96 Aqueous Non-Radioactive Cell Proliferation Assay.
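The hit and MIC definitions above reduce to a threshold on the percent signal reduction relative to untreated controls. A minimal Python sketch follows; the function names, the two-fold dilution series and the signal values are hypothetical, and any background correction or replicate averaging used in the actual assays is not modeled.

```python
def percent_inhibition(signal: float, untreated_control: float) -> float:
    """Percent reduction in signal relative to untreated (drug-free) controls."""
    return 100.0 * (1.0 - signal / untreated_control)


def mic(concentrations, signals, untreated_control, cutoff=90.0):
    """Lowest tested concentration giving >= cutoff % reduction, or None if none does."""
    active = [c for c, s in zip(concentrations, signals)
              if percent_inhibition(s, untreated_control) >= cutoff]
    return min(active) if active else None


# Hypothetical 8-point two-fold dilution series (µg/mL) and plate readouts
concs = [100.0, 50.0, 25.0, 12.5, 6.25, 3.13, 1.56, 0.78]
readout = [200.0, 300.0, 800.0, 2500.0, 6000.0, 9000.0, 9800.0, 10000.0]
control = 10000.0  # mean signal of untreated wells

print(mic(concs, readout, control))  # 25.0
```

If responses are non-monotonic, taking the minimum can report a concentration lying below an inactive one; a stricter variant would also require every higher concentration to reach the cutoff.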

Supporting Information

Supplemental material is available online. The Bayesian models created in Discovery Studio are available from the authors upon written request.

Figure S1.

MLSMR dose response and cytotoxicity model: good features from FCFP_6.


Figure S2.

MLSMR dose response and cytotoxicity model: bad features from FCFP_6.


Figure S3.

TB kinase single point model: good features from FCFP_6.


Figure S4.

TB kinase single point model: bad features from FCFP_6.


Figure S5.

TB kinase dose response model: good features from FCFP_6.


Figure S6.

TB kinase dose response model: bad features from FCFP_6.


Figure S7.

TB kinase dose response and cytotoxicity model: good features from FCFP_6.


Figure S8.

TB kinase dose response and cytotoxicity model: bad features from FCFP_6.


Figure S9.

Results for the 194 compounds tested in the Selleckchem kinase library screened for whole-cell TB activity with Bayesian models. Random rate is based on the empirical HTS hit rate; MLSMR is based on the MLSMR dose response and cytotoxicity model; CB2 is based on the CB2 dose response and cytotoxicity model [32]; Kinase is based on the TAACF kinase dose response and cytotoxicity model. Best curve is based on a 100% hit rate.


Table S1.

Mean (SD) leave one out and leave out 50%×100 cross validation of previously published Bayesian models (ROC = receiver operator characteristic) – data from [24].


Table S2.

SRI hits from Maybridge picked using Bayesian dose response and cytotoxicity models. More positive numbers from the Bayesian models suggest likely activity. SI is the selectivity index calculated by dividing the CC50 by either the MIC MABA or MIC LORA. Bold values have an SI greater than 10. Numbers in parentheses are the maximal similarity of the compound to members of the training set for the respective model. Bold also indicates the preferred model.


Table S3.

SRI hits from Selleckchem picked using Bayesian dose response and cytotoxicity models (MLSMR, CB2 [32] and Kinase). More positive numbers from the Bayesian models suggest likely activity. SI is the selectivity index calculated by dividing the CC50 by either the MIC MABA or MIC LORA. Bold values have an SI greater than 10. Numbers in parentheses are the maximal similarity of the compound to members of the training set for the respective model.


Table S4.

Additional follow up compounds for the pyrazolo[1,5-a]pyrimidine core. SI is the selectivity index calculated by dividing the CC50 by either the MIC MABA or MIC LORA. Bold values have an SI greater than 10.


Table S5.

Whole cell screening hit rates at Institute for Tuberculosis Research.



S.E. acknowledges colleagues at CDD for developing the software. Accelrys are kindly acknowledged for providing Discovery Studio. S.E. gratefully acknowledges Allen C. Casey for initial discussions on this project and continued encouragement. R.C.R. acknowledges Judith V. Hobrath for ordering compounds and providing data for analysis.

Author Contributions

Conceived and designed the experiments: SE RCR. Performed the experiments: SE SGF BW. Analyzed the data: SE RCR SGF BW JSF. Contributed reagents/materials/analysis tools: SE RCR SGF BW BAB. Wrote the paper: SE RCR SGF JSF.

References

1. Balganesh TS, Alzari PM, Cole ST (2008) Rising standards for tuberculosis drug development. Trends Pharmacol Sci 29: 576–581.
2. Lopez AD, Mathers CD (2006) Measuring the global burden of disease and epidemiological transitions: 2002–2030. Ann Trop Med Parasitol 100: 481–499.
3. Dalton T, Cegielski P, Akksilp S, Asencios L, Caoili JC, et al. (2012) Prevalence of and risk factors for resistance to second-line drugs in people with multidrug-resistant tuberculosis in eight countries: a prospective cohort study. Lancet 380: 1406–1417.
4. Zhang Y (2005) The magic bullets and tuberculosis drug targets. Annu Rev Pharmacol Toxicol 45: 529–564.
5. Ballell L, Field RA, Duncan K, Young RJ (2005) New small-molecule synthetic antimycobacterials. Antimicrob Agents Chemother 49: 2153–2163.
6. Andries K, Verhasselt P, Guillemont J, Gohlmann HW, Neefs JM, et al. (2005) A diarylquinoline drug active on the ATP synthase of Mycobacterium tuberculosis. Science 307: 223–227.
7. Voelker R (2013) MDR-TB has new drug foe after fast-track approval. JAMA 309: 430.
8. Koul A, Arnoult E, Lounis N, Guillemont J, Andries K (2011) The challenge of new drug discovery for tuberculosis. Nature 469: 483–490.
9. Kaneko T, Cooper C, Mdluli K (2011) Challenges and opportunities in developing novel drugs for TB. Future Med Chem 3: 1373–1400.
10. Mak PA, Rao SP, Ping Tan M, Lin X, Chyba J, et al. (2012) A High-Throughput Screen To Identify Inhibitors of ATP Homeostasis in Non-replicating Mycobacterium tuberculosis. ACS Chem Biol 7: 1190–1197.
11. Stanley SA, Grant SS, Kawate T, Iwase N, Shimizu M, et al. (2012) Identification of Novel Inhibitors of M. tuberculosis Growth Using Whole Cell Based High-Throughput Screening. ACS Chem Biol 7: 1377–1384.
12. Maddry JA, Ananthan S, Goldman RC, Hobrath JV, Kwong CD, et al. (2009) Antituberculosis activity of the molecular libraries screening center network library. Tuberculosis (Edinb) 89: 354–363.
13. Ananthan S, Faaleolea ER, Goldman RC, Hobrath JV, Kwong CD, et al. (2009) High-throughput screening for inhibitors of Mycobacterium tuberculosis H37Rv. Tuberculosis (Edinb) 89: 334–353.
14. Reynolds RC, Ananthan S, Faaleolea E, Hobrath JV, Kwong CD, et al. (2012) High throughput screening of a library based on kinase inhibitor scaffolds against Mycobacterium tuberculosis H37Rv. Tuberculosis (Edinb) 92: 72–83.
15. Macarrón R, Banks MN, Bojanic D, Burns DJ, Cirovic DA, et al. (2011) Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov 10: 188–195.
16. Gold B, Pingle M, Brickner SJ, Shah N, Roberts J, et al. (2012) Nonsteroidal anti-inflammatory drug sensitizes Mycobacterium tuberculosis to endogenous and exogenous antimicrobials. Proc Natl Acad Sci U S A 109: 16004–16011.
17. Magnet S, Hartkoorn RC, Szekely R, Pato J, Triccas JA, et al. (2010) Leads for antitubercular compounds from kinase inhibitor library screens. Tuberculosis (Edinb) 90: 354–360.
18. Ekins S, Mestres J, Testa B (2007) In silico pharmacology for drug discovery: methods for virtual ligand screening and profiling. Br J Pharmacol 152: 9–20.
19. Schneider G (2010) Virtual screening: an endless staircase? Nat Rev Drug Discov 9: 273–276.
20. Scior T, Bender A, Tresadern G, Medina-Franco JL, Martinez-Mayorga K, et al. (2012) Recognizing Pitfalls in Virtual Screening: A Critical Review. J Chem Inf Model 52: 867–881.
21. Barry CE, 3rd, Slayden RA, Sampson AE, Lee RE (2000) Use of genomics and combinatorial chemistry in the development of new antimycobacterial drugs. Biochem Pharmacol 59: 221–231.
22. Ekins S, Freundlich JS (2011) Validating new tuberculosis computational models with public whole cell screening aerobic activity datasets. Pharm Res 28: 1859–1869.
23. Ekins S, Kaneko T, Lipinski CA, Bradford J, Dole K, et al. (2010) Analysis and hit filtering of a very large library of compounds screened against Mycobacterium tuberculosis. Mol BioSyst 6: 2316–2324.
24. Ekins S, Bradford J, Dole K, Spektor A, Gregory K, et al. (2010) A Collaborative Database and Computational Models for Tuberculosis Drug Discovery. Mol BioSyst 6: 840–851.
25. Prathipati P, Ma NL, Keller TH (2008) Global Bayesian models for the prioritization of antitubercular agents. J Chem Inf Model 48: 2362–2370.
26. Cho Y, Ioerger TR, Sacchettini JC (2008) Discovery of novel nitrobenzothiazole inhibitors for Mycobacterium tuberculosis ATP phosphoribosyl transferase (HisG) through virtual screening. J Med Chem 51: 5984–5992.
27. Oprea TI, Davis AM, Teague SJ, Leeson PD (2001) Is there a difference between leads and drugs? A historical perspective. J Chem Inf Comput Sci 41: 1308–1315.
28. Ekins S, Freundlich JS, Choi I, Sarker M, Talcott C (2011) Computational Databases, Pathway and Cheminformatics Tools for Tuberculosis Drug Discovery. Trends Microbiol 19: 65–74.
29. Garcia-Garcia A, Galvez J, de Julian-Ortiz JV, Garcia-Domenech R, Munoz C, et al. (2005) Search of chemical scaffolds for novel antituberculosis agents. J Biomol Screen 10: 206–214.
30. Planche AS, Scotti MT, Lopez AG, de Paulo Emerenciano V, Perez EM, et al. (2009) Design of novel antituberculosis compounds using graph-theoretical and substructural approaches. Mol Divers 13: 445–458.
31. Singh N, Chaudhury S, Liu R, Abdulhameed MD, Tawa G, et al. (2012) QSAR Classification Model for Antibacterial Compounds and Its Use in Virtual Screening. J Chem Inf Model 52: 2559–2569.
32. Ekins S, Reynolds R, Kim H, Koo M-S, Ekonomidis M, et al. (2013) Bayesian Models Leveraging Bioactivity and Cytotoxicity Information for Drug Discovery. Chem Biol 20: 370–378.
33. Miller K (2012) Where Tuberculosis Meets Computation: 10 Points of Intersection. Biomed Comput Rev Summer: 20–28.
34. Ballester PJ, Mangold M, Howard NI, Robinson RL, Abell C, et al. (2012) Hierarchical virtual screening for the discovery of new molecular scaffolds in antibacterial hit identification. J R Soc Interface.
35. Sarker M, Talcott C, Madrid P, Chopra S, Bunin BA, et al. (2012) Combining cheminformatics methods and pathway analysis to identify molecules with whole-cell activity against Mycobacterium tuberculosis. Pharm Res 29: 2115–2127.
36. Ashtekar DR, Costa-Pereira R, Nagrajan K, Vishvanathan N, Bhatt AD, et al. (1993) In vitro and in vivo activities of the nitroimidazole CGI 17341 against Mycobacterium tuberculosis. Antimicrob Agents Chemother 37: 183–186.
37. Stover CK, Warrener P, VanDevanter DR, Sherman DR, Arain TM, et al. (2000) A small-molecule nitroimidazopyran drug candidate for the treatment of tuberculosis. Nature 405: 962–966.
38. Blaser A, Palmer BD, Sutherland HS, Kmentova I, Franzblau SG, et al. (2012) Structure-activity relationships for amide-, carbamate-, and urea-linked analogues of the tuberculosis drug (6S)-2-nitro-6-{[4-(trifluoromethoxy)benzyl]oxy}-6,7-dihydro-5H-imidazo[2,1-b][1,3]oxazine (PA-824). J Med Chem 55: 312–326.
39. Thompson AM, Sutherland HS, Palmer BD, Kmentova I, Blaser A, et al. (2012) Synthesis and structure-activity relationships of varied ether linker analogues of the antitubercular drug (6S)-2-nitro-6-{[4-(trifluoromethoxy)benzyl]oxy}-6,7-dihydro-5H-imidazo[2,1-b][1,3]oxazine (PA-824). J Med Chem 54: 6563–6585.
40. Palmer BD, Thompson AM, Sutherland HS, Blaser A, Kmentova I, et al. (2010) Synthesis and structure-activity studies of biphenyl analogues of the tuberculosis drug (6S)-2-nitro-6-{[4-(trifluoromethoxy)benzyl]oxy}-6,7-dihydro-5H-imidazo[2,1-b][1,3]oxazine (PA-824). J Med Chem 53: 282–294.
41. Thompson AM, Blaser A, Anderson RF, Shinde SS, Franzblau SG, et al. (2009) Synthesis, reduction potentials, and antitubercular activity of ring A/B analogues of the bioreductive drug (6S)-2-nitro-6-{[4-(trifluoromethoxy)benzyl]oxy}-6,7-dihydro-5H-imidazo[2,1-b][1,3]oxazine (PA-824). J Med Chem 52: 637–645.
42. Feng BY, Simeonov A, Jadhav A, Babaoglu K, Inglese J, et al. (2007) A high-throughput screen for aggregation-based inhibition in a large compound library. J Med Chem 50: 2385–2390.
43. McGovern SL, Shoichet BK (2003) Kinase inhibitors: not just for kinases anymore. J Med Chem 46: 1478–1483.
44. Chapman TM, Bouloc N, Buxton RS, Chugh J, Lougheed KE, et al. (2012) Substituted aminopyrimidine protein kinase B (PknB) inhibitors show activity against Mycobacterium tuberculosis. Bioorg Med Chem Lett 22: 3349–3353.
45. Maiga M, Lun S, Guo H, Winglee K, Ammerman NC, et al. (2012) Risk of tuberculosis reactivation with tofacitinib (CP-690550). J Infect Dis 205: 1705–1708.
46. Teo M, O'Connor TM, O'Reilly SP, Power DG (2012) Sorafenib-induced tuberculosis reactivation. Onkologie 35: 514–516.
47. Kinnings SL, Liu N, Tonge PJ, Jackson RM, Xie L, et al. (2011) A machine learning-based method to improve docking scoring functions and its application to drug repurposing. J Chem Inf Model 51: 408–419.
48. Walsh CT, Fischbach MA (2009) Repurposing libraries of eukaryotic protein kinase inhibitors for antibiotic discovery. Proc Natl Acad Sci U S A 106: 1689–1690.
49. Ekins S, Williams AJ, Krasowski MD, Freundlich JS (2011) In silico repositioning of approved drugs for rare and neglected diseases. Drug Discov Today 16: 298–310.
50. Ekins S, Williams AJ (2011) Finding promiscuous old drugs for new uses. Pharm Res 28: 1786–1791.
51. Cavazos A, Prigozhin DM, Alber T (2012) Structure of the Sensor Domain of Mycobacterium tuberculosis PknH Receptor Kinase Reveals a Conserved Binding Cleft. J Mol Biol 422: 488–494.
52. Alber T (2009) Signaling mechanisms of the Mycobacterium tuberculosis receptor Ser/Thr protein kinases. Curr Opin Struct Biol 19: 650–657.
53. Bean J, Brennan C, Shih JY, Riely G, Viale A, et al. (2007) MET amplification occurs with or without T790M mutations in EGFR mutant lung tumors with acquired resistance to gefitinib or erlotinib. Proc Natl Acad Sci U S A 104: 20932–20937.
54. Qian F, Engst S, Yamaguchi K, Yu P, Won KA, et al. (2009) Inhibition of tumor cell growth, invasion, and metastasis by EXEL-2880 (XL880, GSK1363089), a novel inhibitor of HGF and VEGF receptor tyrosine kinases. Cancer Res 69: 8009–8016.
55. You WK, Sennino B, Williamson CW, Falcon B, Hashizume H, et al. (2011) VEGF and c-Met blockade amplify angiogenesis inhibition in pancreatic islet cancer. Cancer Res 71: 4758–4768.
56. Napier RJ, Rafi W, Cheruvu M, Powell KR, Zaunbrecher MA, et al. (2011) Imatinib-sensitive tyrosine kinases regulate mycobacterial pathogenesis and represent therapeutic targets against tuberculosis. Cell Host Microbe 10: 475–485.
57. Bruns H, Stegelmann F, Fabri M, Dohner K, van Zandbergen G, et al. (2012) Abelson Tyrosine Kinase Controls Phagosomal Acidification Required for Killing of Mycobacterium tuberculosis in Human Macrophages. J Immunol 189: 4069–4078.
58. Koul A, Herget T, Klebl B, Ullrich A (2004) Interplay between mycobacteria and host signalling pathways. Nat Rev Microbiol 2: 189–202.
59. Collaborative Drug Discovery website. Accessed 2013 April 4.
60. Hohman M, Gregory K, Chibale K, Smith PJ, Ekins S, et al. (2009) Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery. Drug Discov Today 14: 261–270.
61. Collaborative Drug Discovery, Inc.
62. The PubChem Database website. Accessed 2013 April 4.
63. Bender A, Scheiber J, Glick M, Davies JW, Azzaoui K, et al. (2007) Analysis of Pharmacology Data and the Prediction of Adverse Drug Reactions and Off-Target Effects from Chemical Structure. ChemMedChem 2: 861–873.
64. Klon AE, Lowrie JF, Diller DJ (2006) Improved naive Bayesian modeling of numerical data for absorption, distribution, metabolism and excretion (ADME) property prediction. J Chem Inf Model 46: 1945–1956.
65. Hassan M, Brown RD, Varma-O'Brien S, Rogers D (2006) Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 10: 283–299.
66. Rogers D, Brown RD, Hahn M (2005) Using extended-connectivity fingerprints with Laplacian-modified Bayesian analysis in high-throughput screening follow-up. J Biomol Screen 10: 682–686.
67. Jones DR, Ekins S, Li L, Hall SD (2007) Computational approaches that predict metabolic intermediate complex formation with CYP3A4 (+b5). Drug Metab Dispos 35: 1466–1475.
68. Collins L, Franzblau SG (1997) Microplate alamar blue assay versus BACTEC 460 system for high-throughput screening of compounds against Mycobacterium tuberculosis and Mycobacterium avium. Antimicrob Agents Chemother 41: 1004–1009.
69. Falzari K, Zhu Z, Pan D, Liu H, Hongmanee P, et al. (2005) In vitro and in vivo activities of macrolide derivatives against Mycobacterium tuberculosis. Antimicrob Agents Chemother 49: 1447–1454.
70. Cho SH, Warit S, Wan B, Hwang CH, Pauli GF, et al. (2007) Low-oxygen-recovery assay for high-throughput screening of compounds against nonreplicating Mycobacterium tuberculosis. Antimicrob Agents Chemother 51: 1380–1385.