Biology must develop herd immunity against bad-actor molecules

In 2016, the 10 most prescribed therapeutics in the United States were again small-molecule drugs [1]. Small molecules have been game changing for infectious disease prevention and case management. However, novel antimicrobials are urgently needed to address emerging drug resistance, improve tolerability, and pioneer therapeutic options against currently untreatable infections. Drug discovery was originally mostly the domain of the pharmaceutical industry, but academic researchers have increasingly entered the field.


The issue
While these efforts have yielded viable lead candidates, they have also been compromised by the (re)discovery of undesirable "bad actor" compounds in vast numbers. In the medicinal chemistry field, bad actors are small-molecule screening hits that supposedly show a specific bioactivity but are undevelopable and/or lack true selectivity for the proposed target. Four different classes of these undesirable hits emerge frequently: i) compounds directly blocking assay signals [2]; ii) unreproducible hits due to impurities or decomposition of the matter [3]; iii) self-aggregating compounds that nonspecifically absorb targets [4]; and iv) frequently covalently reactive pan-assay interference compounds (PAINS) [5,6] and their natural product companions, invalid metabolic panaceas (IMPs) [7].
Whereas class i to iii problem hits can often be identified through orthogonal counterscreens and variation of experimental conditions [8], PAINS and IMPs can be difficult to spot. They were first noted based on their frequent appearance in independent high-throughput screening (HTS) campaigns. Their initial bioactivity profiles are typically attractive and often extend to counterscreens, making triaging challenging. In contrast to well-behaved hits, however, PAINS and IMPs generate a variety of misleading assay results originating, amongst others, from covalent, unspecific protein reactivity [9], redox activity [10], and membrane interference that disturbs cellular pathways [11]. Synthetic hit-to-lead development efforts of these chemotypes proved largely unsuccessful, consuming resources without a meaningful return.
The most relevant outcome of these efforts was the recognition of the problem by the medicinal chemistry community and the short listing of chemotypes that should best be discarded or, if selected for development, be advanced with great scrutiny [8]. A groundbreaking first PAINS compendium comprised over 450 structural classes [6], and several cheminformatics filters were subsequently developed to flag supposedly compromised substructures (Table 1). While this initiative has enjoyed much support from medicinal chemists [12,13], recent studies have cautioned against an uncritical application of electronic filters, since a broad assessment of HTS data sets suggested that many chemical scaffolds may be unduly flagged, while some high-frequency-hit compounds pass undetected [14-16]. This debate in the medicinal chemistry community reflects that these early algorithms were derived from screens using a single approximately 100,000-entry library and assay type [6], which creates some condition-specific bias of the filters [10]. However, it was rightly pointed out that the worst offenders are represented by only 16 distinct chemical substructures, originally termed Family_Filter_A chemotypes [10]. Second-generation, assay platform-independent filters more recently developed by the pharmaceutical industry [17] and academic researchers [18] have validated the problem imposed by this subset of 16 structures alone.

Why does it matter?
Realizing that a coveted hit candidate is a promiscuous nuisance is upsetting but without negative impact on the field at large. However, PAINS and IMPs have contaminated the scientific literature in the form of thousands of published articles. The concern is usually not that individual experiments cannot be reproduced (quite the opposite, in fact) but that mechanistic models, claims of target selectivity, and/or excited predictions concerning translational potential are almost always unsubstantiated [5]. This avalanche of publications has created a self-perpetuating system, since commercial vendors often add compounds containing PAIN functionalities to their catalogues as supposedly selective bioactives, and later studies build on the earlier reports.
In addition to intense debate, the problem has triggered severe countermeasures in the medicinal chemistry field. In 2017, eight American Chemical Society (ACS) journals implemented a joint policy that newly discovered bioactives must be examined for known undesirable chemotypes, activity claims be supported by appropriate experimental strategies, and the specific activity of flagged compounds be validated by at least two different assays [8].
Although biologists and biology journals are active contributors to drug discovery, an equivalent tackling of the problem is lacking. For instance, among the worst individual PAINS chemotypes are rhodanine analogs [6] and polyhydroxylated natural phytochemical IMPs such as curcumin and resveratrol [5,19]. Rhodanines have shown a stable presence in the literature, and both of these IMPs enjoyed a steady increase in paper number per year in the past decade (Fig 1A). Limiting this survey to PLOS journals, we found a comparable rhodanines profile and, encouragingly, a decline in the appearance of both IMPs since 2014-2015 (Fig 1B). However, both have since plateaued at midlevel, indicating that the fight against PAINS may have started in the PLOS landscape, but the endgame of eradication must still be won.

Reasons for the persistence of PAINS and IMPs in the biology literature
Better herd immunity among biologists against publishing undesirable chemical scaffolds is therefore needed, which will require addressing the root cause of the problem. We believe that the reasons for persistence of PAINS in the biological literature are threefold: poor curation of many preassembled chemical libraries, inexperience of many biologists in medicinal chemistry, and an unwillingness to drop a compromised scaffold despite strong experimental evidence for major liabilities. Older diversity sets in particular are insufficiently curated against known problem chemotypes, returning a high hit rate in HTS campaigns that fails to deliver viable candidates. This problem extends to the Library of Pharmacologically Active Compounds (LOPAC) and compendiums of approved drugs that are revisited in the popular framework of drug-repurposing screens. Many bioactives originated from discontinued drug development campaigns, sometimes effectively creating PAIN-enriched collections, and approximately 5% of licensed drugs contain PAIN moieties [6,10,20]. These PAIN drugs were never subjected to the rigorous triaging system of modern drug development. For instance, doxorubicin, a quinone analog, appeared as a hit in every antiviral screen that we have conducted against a bioactives library and was active in approximately 85% of over 4,000 published assays [14,20].
Biologists new to the chemical biology field are at greatest risk of selecting PAINS as screening "hits" based on their tantalizing initial bioactivities, often launching elaborate characterization and development studies. As time and major resources are invested, the desire to publish often prevails, even if it becomes obvious that no viable path to the clinic exists. In this case, the study justification usually switches to application of the identified compound as a useful chemical probe in future work. Quite the opposite is true, however. A probe compound must be more target selective and mechanistically precise than an actual drug to be of value [10], eliminating meaningful applications of PAINS and IMPs as research tools.

Basic measures to halt perpetuation of the problem
In order to overcome undesirable chemical scaffolds, a joint effort of biology researchers, reviewers, and editors will be required, matching that underway in the medicinal chemistry field. In the case of publishing bioactive chemicals in biology journals, however, some very basic reporting standards are lacking that must be established. We suggest the mandatory inclusion of four checkpoints into the submission package of studies investigating bioactivities of small-molecule chemicals:
1. Basic compound information: To expedite the cross-referencing of compounds against chemical databases and filter algorithms, structures must be submitted in electronically readable formats such as the simplified molecular-input line-entry system (SMILES) or molecular formula strings. In many biology journals, compounds are still described by two-dimensional structure drawings only, making it unnecessarily cumbersome to integrate structures with chemical software. To eliminate the publication of nonreproducible bioactivities that are due, for instance, to compound decomposition or chemical impurities, hits must furthermore be synthetically validated, and substance purity stated.
2. Electronic filtering of undesirable chemotypes: A number of filter algorithms are publicly available that provide a first-pass view of the overall drug likeness of screening hits (Table 1). We have summarized the ongoing discussion of limitations of these filters, but major structural liabilities are recognized with sufficient accuracy to provide valuable information for reviewers and editors. We propose that submissions of new scaffolds for publication should be accompanied by analysis reports from at least two algorithms. Strategies to support development and maintenance of these filter servers should in parallel be discussed on an interdisciplinary level to ensure long-term public access.
3. Activity searches against chemical databases: Summaries of substructure searches of chemical databases such as SciFinder or PubChem against a submitted scaffold need to be provided to facilitate detection of potential frequent hitters. Although drugs can display polypharmacological behavior, and a diverse set of biological effects is therefore not a knock-out criterion [14], scaffolds associated with more than one supposed target must be treated with caution.

4. Experimental validation: Publication of an identified Filter_A PAIN [10] needs to be justified by a detailed experimental characterization supporting specificity of the reported activity. For instance, positive target identification through characterization of direct compound binding using label-free technologies such as surface-plasmon resonance, biolayer interferometry, or isothermal titration calorimetry; the generation of informative resistance profiles for target characterization of pathogen-directed antimicrobials; and the development of a meaningful synthetic structure-activity relationship should be requested for a hit with severe potential liabilities.
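As an illustration only, the four checkpoints could be encoded as a simple completeness check that an editorial office or reviewer runs over a submission package. The sketch below is not an existing tool: the record type `CompoundSubmission`, its field names, and the function `missing_checkpoints` are hypothetical, and the rule that a flagged chemotype needs at least one recorded validation experiment is an assumption made for the example, not a threshold stated above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompoundSubmission:
    """Hypothetical record of what accompanies one reported compound."""
    name: str
    smiles: Optional[str] = None              # checkpoint 1: machine-readable structure
    purity_percent: Optional[float] = None    # checkpoint 1: stated substance purity
    resynthesized: bool = False               # checkpoint 1: hit synthetically validated
    filter_reports: List[str] = field(default_factory=list)     # checkpoint 2
    database_search_summary: Optional[str] = None               # checkpoint 3
    flagged_as_pains: bool = False            # outcome of the filter reports
    validation_assays: List[str] = field(default_factory=list)  # checkpoint 4


def missing_checkpoints(sub: CompoundSubmission) -> List[str]:
    """Return a description of each checkpoint item the package lacks."""
    problems = []
    if not sub.smiles:
        problems.append("1: no machine-readable structure (e.g. SMILES)")
    if sub.purity_percent is None:
        problems.append("1: substance purity not stated")
    if not sub.resynthesized:
        problems.append("1: hit not synthetically validated")
    # Checkpoint 2 asks for reports from at least two distinct algorithms.
    if len(set(sub.filter_reports)) < 2:
        problems.append("2: fewer than two filter algorithm reports")
    if not sub.database_search_summary:
        problems.append("3: no database substructure search summary")
    # Assumption for this sketch: one recorded validation experiment suffices.
    if sub.flagged_as_pains and not sub.validation_assays:
        problems.append("4: flagged chemotype lacks experimental validation")
    return problems


if __name__ == "__main__":
    # Placeholder values; "CCO" (ethanol) stands in for a real hit structure.
    hit = CompoundSubmission(
        name="hypothetical hit",
        smiles="CCO",
        purity_percent=98.0,
        resynthesized=True,
        filter_reports=["filter A"],
        flagged_as_pains=True,
    )
    for problem in missing_checkpoints(hit):
        print(problem)
```

A check of this kind only confirms that the required material is present. Judging whether the filter reports, database searches, and validation experiments actually support specificity remains a task for reviewers and editors.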
In our opinion, implementation of these checkpoints is urgently needed to ensure the credibility of drug-discovery studies published in biological journals. Encouraged by the changes established by the ACS, we are optimistic that we can regain focus on exciting and developable drug candidates.