A machine learning model trained on a high-throughput antibacterial screen increases the hit rate of drug discovery
Fig 2
In vitro testing of top-ranked predicted compounds from an FDA-approved compound library.
(A) Schematic of the screening protocol. Of the top 100 ranked compounds, the 81 that were commercially available were screened. (B) The screen identified 21 bioactive compounds, corresponding to a positive predictive value (PPV) of 25.9% (21 of 81). Dark blue and red represent inactive and active compounds, respectively. (C) The top 100 ranked compounds selected for empirical testing belong to different drug families. Most of the compounds exhibiting bioactivity were known antibiotics or antimicrobial compounds. (D) The OD600nm ratio and the prediction score of each compound were plotted against its predicted rank. The results show a linear correlation (Pearson correlation, R = 0.54) between prediction score and bioactivity. The predicted score is the probability, as calculated by the ML model, that a compound is active. The predicted rank is the order of the compounds by predicted score, with higher-scoring compounds ranked higher. The red and blue triangles show the gradients of predicted rank and growth (measured as OD600nm), respectively. Dark blue and red indicate a compound’s probability of being inactive and active, respectively. Results are the average of at least three independent biological replicates. Fig 2A was created with https://biorender.com/.
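The PPV in panel B and the rank definition in panel D can be reproduced with a short calculation. The sketch below is illustrative only and is not the authors' code: the function names and the example compound scores are hypothetical, and only the counts 21 and 81 come from the caption.

```python
def positive_predictive_value(n_active: int, n_tested: int) -> float:
    """Fraction of tested (predicted-active) compounds that were confirmed active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


def rank_by_score(scores: dict[str, float]) -> dict[str, int]:
    """Assign rank 1 to the highest predicted score, rank 2 to the next, and so on."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Counts taken from the caption: 21 bioactive compounds out of 81 screened.
ppv = positive_predictive_value(21, 81)
print(f"PPV = {ppv:.1%}")  # PPV = 25.9%

# Hypothetical predicted scores, for illustration only (not data from the study).
example_scores = {"compound_a": 0.91, "compound_b": 0.42, "compound_c": 0.77}
ranks = rank_by_score(example_scores)
print(ranks)
```

With the hypothetical scores above, compound_a receives rank 1, compound_c rank 2, and compound_b rank 3, matching the convention that higher predicted scores are ranked higher.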