Figure 1.
Average molecule size ±SD (one standard deviation) plotted as a function of the average anchor size for the largest clusters of similar compounds bound to the top-ranked predicted binding pockets.
Dotted lines separate clusters for which different anchor sizes were found (100%, 75%, 50% and 25% of the average ligand molecule, respectively). Inset: cumulative distribution of the average pairwise RMSD of the anchor groups upon global superposition of the template proteins.
Figure 2.
The degree of sequence and structure conservation for the protein's ligand-binding region.
(A) Average sequence entropy, average normalized B-factor for (B) the Cα atoms and (C) the side chain heavy atoms as well as (D) a random property assigned to anchor and non-anchor CBRs. The populations of anchor and non-anchor CBRs were determined using different probability thresholds for anchor residues. Top plots show the p-value of the t-test applied to both populations of CBRs with respect to the property under consideration.
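The caption does not state how sequence entropy was computed. As a minimal sketch, assuming the common Shannon entropy of residue frequencies in one alignment column (the function name `column_entropy` and the gap characters are illustrative choices, not taken from the paper):

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (in bits) of one alignment column.

    `column` is a string of one-letter residue codes, one per aligned
    sequence. Gap characters are ignored (an assumption of this sketch).
    """
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    # Sum of p * log2(1/p) over the observed residue types.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A fully conserved column has zero entropy; a 50/50 column has one bit.
conserved = column_entropy("AAAA")
mixed = column_entropy("AALL")
```

Averaging such per-column values over the anchor and non-anchor residue positions would give one number per population, which could then be compared with a t-test.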
Figure 3.
Ligand binding pose prediction for human fibroblast collagenase (PDB-ID: 1hfc).
Predicted poses (thick, solid) from FINDSITELHM: (left) superimposed ligand with the anchor portion colored in white, (middle) conformation minimized with Amber and (right) pose generated by AutoDock, compared to the experimental binding pose of the hydroxamate inhibitor (thin, transparent). RMSD values were calculated for heavy atoms. Selected binding residues are shown.
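A heavy-atom RMSD between a predicted and an experimental pose can be sketched as follows. This is an illustration only: it assumes both poses are already in the same coordinate frame, list the same atoms in the same order, and need no symmetry correction. The function name and the input layout are hypothetical.

```python
import math


def heavy_atom_rmsd(predicted, reference):
    """RMSD over non-hydrogen atoms of two poses in the same frame.

    Each pose is a list of (element, (x, y, z)) tuples in matching order.
    No superposition and no symmetry handling is done (sketch assumption).
    """
    if len(predicted) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    sq_sum = 0.0
    n_heavy = 0
    for (elem_p, xyz_p), (elem_r, xyz_r) in zip(predicted, reference):
        if elem_p != elem_r:
            raise ValueError("atom order differs between the two poses")
        if elem_p.upper() in ("H", "D"):
            continue  # skip hydrogens (and deuterium)
        sq_sum += sum((a - b) ** 2 for a, b in zip(xyz_p, xyz_r))
        n_heavy += 1
    if n_heavy == 0:
        raise ValueError("no heavy atoms to compare")
    return math.sqrt(sq_sum / n_heavy)


# Two heavy atoms, each displaced by 1 along z; the hydrogen is ignored.
predicted = [("C", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)), ("H", (5.0, 5.0, 5.0))]
reference = [("C", (0.0, 0.0, 1.0)), ("O", (1.0, 0.0, 1.0)), ("H", (0.0, 0.0, 0.0))]
rmsd = heavy_atom_rmsd(predicted, reference)
```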
Table 1.
Docking results for the FINDSITELHM dataset in terms of ligand heavy atom RMSD from the crystal structure.
Figure 4.
Confidence index for ligand docking by FINDSITELHM.
Box-and-whisker plots of the relationship between the accuracy of FINDSITELHM, in terms of the heavy-atom RMSD from the crystal ligand pose, and (A) the coverage of the anchor substructure by a target ligand, (B) the structural conservation of the anchor binding mode, expressed as the average pairwise RMSD (pRMSD) of the anchor functional groups, and (C) the pocket prediction accuracy of FINDSITE, assessed by the distance between the predicted pocket center and the predicted center of mass of the native ligand. Boxes end at the quartiles Q1 and Q3; the horizontal line in a box is the median. Whiskers extend to the farthest points within 1.5 times the interquartile range, and circles represent outliers.
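The box-and-whisker convention described in this caption can be sketched in a few lines. The quartile interpolation rule is an assumption (Python's `statistics.quantiles` with `method="inclusive"`); plotting packages may use a slightly different rule, and the function name is illustrative.

```python
import statistics


def box_whisker_stats(values):
    """Quartiles, whisker ends and outliers for one box of the plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach the farthest data points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Everything beyond the fences is drawn as an outlier circle.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical RMSD-like values with one clear outlier.
stats = box_whisker_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```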
Table 2.
Docking results for the Dolores dataset in terms of the fraction of recovered binding residues and specific native contacts.
Figure 5.
Ligand anchor identification for glutathione S-transferase from E. coli (PDB-ID: 1a0f).
Common anchor substructure (A) identified from weakly homologous threading templates as well as different variable groups (R) found in ligands complexed with the template proteins are presented.
Figure 6.
Caption as in Figure 5.
Figure 7.
Ligand anchor identification for the human MTA phosphorylase (PDB-ID: 1sd2; SCOP superfamily/family: Purine and uridine phosphorylases/Purine and uridine phosphorylases; EC: 2.4.2.28).
Common anchor substructure (A) identified from weakly homologous threading templates as well as different variable groups (at the positions R1–R7) found in ligands complexed with the template proteins are presented.
Figure 8.
Ligand anchor identification for lysine aminotransferase from M. tuberculosis (PDB-ID: 2cjd).
Common anchor substructure (A) identified from weakly homologous threading templates as well as different variable groups (R) found in ligands complexed with the template proteins are presented.
Figure 9.
Caption as in Figure 8.
Figure 10.
Caption as in Figure 8.
Figure 11.
Sequence and structure conservation for the selected ligand-binding sites.
(A) Glutathione sulfonic acid complexed with glutathione S-transferase, PDB-ID: 1a0f; (B) 5′-methylthiotubercidin complexed with MTA phosphorylase, PDB-ID: 1sd2; and (C) lysine and pyridoxal-5′-phosphate complexed with lysine aminotransferase, PDB-ID: 2cjd. Sequence entropy (red – low, green – high), normalized crystallographic B-factors (red – low, green – high) and a random value (red – 0.0, green – 1.0) are presented in the left, middle and right columns, respectively. The “anchor” part of the molecule is presented in white, whereas the variable part is shown in black.
Figure 12.
Ligand binding pose prediction for glutathione S-transferase (PDB-ID: 1a0f).
Predicted poses (thick sticks) from FINDSITELHM (superimposed ligand with the anchor portion colored in white and minimized conformation), AutoDock, LIGIN, Q-Dock and a randomly placed ligand are compared to the experimental binding pose (thin sticks). RMSD values were calculated for heavy atoms.
Figure 13.
Ligand binding pose prediction for MTA phosphorylase (PDB-ID: 1sd2).
Description as in Figure 12.
Figure 14.
Ligand binding pose prediction for lysine aminotransferase (PDB-ID: 2cjd).
Description as in Figure 12.
Figure 15.
Coverage by anchor and non-anchor functional groups of conserved enzyme substrate substructures from 35 ligand clusters found for 24 enzymes identified by Babbitt and coworkers [30].
Table 3.
Library ranks assigned to FDA-approved drugs in virtual screening for HIV-1 protease inhibitors.
Figure 16.
Enrichment behavior for FINDSITE (molecular fingerprints) and FINDSITELHM (anchor coverage) approaches compared to a random ligand selection in virtual screening for HIV-1 protease inhibitors.
FINDSITE/FINDSITELHM corresponds to the results obtained by applying data fusion.
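Enrichment relative to random selection is often summarized by an enrichment factor. The sketch below uses one common definition (hit rate in the top-ranked fraction divided by the hit rate in the whole library); it is not necessarily the quantity plotted in the figure, and it does not reproduce the data fusion step, which the caption does not specify. The function name and the example ranking are hypothetical.

```python
import math


def enrichment_factor(ranked_labels, top_fraction):
    """Enrichment factor for the top `top_fraction` of a ranked library.

    `ranked_labels` holds 1 for a known active and 0 otherwise, ordered
    from best to worst rank. Random selection gives a value near 1.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, math.ceil(top_fraction * n_total))
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (n_actives / n_total)


# Ten compounds, three actives, two of them ranked in the top 20%.
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(ranking, 0.2)
ef_all = enrichment_factor(ranking, 1.0)
```

Screening the whole library always yields a factor of 1, which is the baseline a random ranking is expected to match on average.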