SWEETLEAD: an In Silico Database of Approved Drugs, Regulated Chemicals, and Herbal Isolates for Computer-Aided Drug Discovery

doi:10.1371/journal.pone.0079568

Figure 1.

Effect of inaccurate ligand structural information on virtual screening performance.

A) and B) Chemical structures for indinavir as depicted by the OpenEye Scientific visualization program VIDA. The 2D structures returned by ChemSpider (A – ChemSpider ID 4515036) and PubChem (B – PubChem ID 5362440) differ in the stereochemistry of a single chiral center, highlighted by the red circles. C) Effect of the differing structures of indinavir on docking results obtained with the OpenEye Scientific docking program FRED. Ten low-energy conformers of each ligand were generated with the OpenEye Scientific program Omega, and the ligands were docked into the protein structure of HIV protease extracted from the indinavir-bound crystal structure PDB 2R5P. The best-scoring poses for the PubChem (green carbons) and ChemSpider (orange carbons) ligands are shown in comparison to the crystallographic ligand (yellow carbons). While the correct structure, obtained from PubChem, scores in the top 6% of all approved drugs, the incorrect structure scores in the bottom 12%.
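The "top 6%" and "bottom 12%" statements are percentile ranks of one ligand's docking score within the scores of all approved drugs. A minimal sketch of that arithmetic follows, assuming lower (more negative) scores indicate better predicted binding; the function name `percent_scoring_better` is hypothetical and no actual FRED scores are reproduced here.

```python
def percent_scoring_better(score, library_scores):
    """Percentage of library scores strictly better (lower) than `score`.

    Assumption: lower docking scores mean better predicted binding.
    A result near 6 corresponds to the top ~6% of the library;
    a result near 88 corresponds to the bottom ~12%.
    """
    if not library_scores:
        raise ValueError("library_scores must not be empty")
    # Count library members that outscore the query ligand.
    better = sum(1 for s in library_scores if s < score)
    return 100.0 * better / len(library_scores)
```

Whether the query ligand itself is included in `library_scores` changes the result slightly for small libraries; for a library the size of the approved-drug set the difference is negligible.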

Figure 2.

Workflow of the consensus building algorithm.

The process of identifying the correct structure for a given drug begins with a drug or chemical name. In the first stage of the algorithm, the Data Collection stage, several databases are queried by name, and the database IDs linked to that name are retrieved and ranked by the frequency with which each ID was returned (i.e., which ID is 'most popular' among the databases queried). For each ID returned, the chemical structure associated with that ID is retrieved and standardized (salts removed, standard protonation states and aromaticity models applied, etc.). In the second stage, the Data Curation stage, the most popular structures from each database are compared. If all structures match, the structure is assumed to be correct and is assigned to the drug name in the final SWEETLEAD database. If the structures do not match, the algorithm cycles iteratively through the most popular structures for each database in an attempt to identify a consensus structure for the drug name. If neither a consensus nor a majority structure can be identified, a manual review is undertaken. Finally, duplicate structures in SWEETLEAD are combined so that a single structure can carry the numerous brand names and other identifiers of an approved drug.
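The Data Curation stage can be sketched as follows. This is an illustrative interpretation, not the published implementation: it assumes each database's results have already been standardized into comparable strings (for example canonical SMILES) and ranked by popularity, and it reads "iterative cycling" as looking one rank deeper at a time for a structure shared by every database. The function names and the outcome labels `"consensus"`, `"majority"`, and `"manual_review"` are hypothetical.

```python
from collections import Counter, defaultdict


def resolve_structure(ranked):
    """Pick one structure for a drug name from per-database rankings.

    `ranked` maps a database name to a list of standardized structure
    strings, most frequently returned first. Returns (outcome, structure);
    the outcome labels are illustrative only.
    """
    lists = [ranking for ranking in ranked.values() if ranking]
    if not lists:
        return ("manual_review", None)

    # Consensus: every database's most popular structure is identical.
    tops = [ranking[0] for ranking in lists]
    if len(set(tops)) == 1:
        return ("consensus", tops[0])

    # Iterative cycling (assumed interpretation): widen the window one rank
    # at a time and accept a structure present in every database's top-k.
    max_depth = max(len(ranking) for ranking in lists)
    for depth in range(2, max_depth + 1):
        shared = set(lists[0][:depth])
        for ranking in lists[1:]:
            shared &= set(ranking[:depth])
        if shared:
            # Prefer the best summed rank; break ties alphabetically.
            best = min(shared, key=lambda s: (sum(r.index(s) for r in lists), s))
            return ("consensus", best)

    # Majority vote over each database's most popular structure.
    structure, votes = Counter(tops).most_common(1)[0]
    if votes > len(lists) / 2:
        return ("majority", structure)

    # Neither consensus nor majority: flag the name for manual review.
    return ("manual_review", None)


def merge_duplicates(assignments):
    """Group drug names (brand, generic, ...) that resolved to one structure."""
    merged = defaultdict(list)
    for name, structure in assignments.items():
        if structure is not None:
            merged[structure].append(name)
    return {structure: sorted(names) for structure, names in merged.items()}


if __name__ == "__main__":
    # Placeholder structures "A", "B", "C" stand in for canonical SMILES.
    rankings = {"db1": ["A", "B"], "db2": ["B", "A"], "db3": ["C", "A"]}
    print(resolve_structure(rankings))  # ('consensus', 'A')
```

Checking for a shared structure before falling back to a majority vote mirrors the order described in the caption; a real pipeline would also need a standardization step so that equivalent structures compare equal as strings.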

Figure 3.

Example outcomes of chemical names input into the SWEETLEAD workflow.

For the list of 1996 active pharmaceutical ingredient (API) names from the FDA Orange Book, the chart shows the percentage of compounds for which the SWEETLEAD algorithm identified a consensus structure, a majority-vote structure, or no clear majority structure. A consensus or majority structure was determined for 91% of these drug names.

Figure 4.

‘Drug-like’ properties of approved drugs vs. non-approved compounds in SWEETLEAD.

Comparison between approved drugs and other compounds in the SWEETLEAD database of molecular descriptors frequently cited as important to drug-likeness. The property distributions for both the approved drugs and the non-approved compounds in SWEETLEAD are shown for A) molecular weight, B) the number of rotatable bonds, C) the number of hydrogen bond donors, and D) the number of hydrogen bond acceptors.
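Because the approved-drug set is much smaller than the rest of the database, distributions like these are easiest to compare as per-bin fractions rather than raw counts. The sketch below is one way to do that, assuming descriptor values (for example molecular weights) have already been computed with a cheminformatics toolkit; the function name `binned_fractions` is hypothetical, and the caption does not state how the published histograms were normalized.

```python
from bisect import bisect_right


def binned_fractions(values, edges):
    """Fraction of `values` falling in each half-open bin [edges[i], edges[i+1]).

    Fractions are taken over all values, so two groups of different sizes
    (e.g. approved vs. non-approved compounds) share a common scale.
    Values outside [edges[0], edges[-1]) count toward the total but are
    not assigned to any bin.
    """
    if len(edges) < 2:
        raise ValueError("need at least two bin edges")
    if not values:
        return [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_right locates the bin whose left edge is <= v.
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return [c / len(values) for c in counts]
```

Calling `binned_fractions` with the same `edges` for each group (hypothetical lists such as `approved_mw` and `other_mw`) yields two directly comparable distributions per descriptor.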
