Exploring the Composition of Protein-Ligand Binding Sites on a Large Scale

doi:10.1371/journal.pcbi.1003321

Table 1.

Summary data of structures from Binding MOAD used in the propensity calculations.

Figure 1.

Bigger, hotter atoms have more “raw” contacts with ligands, on average.

Each amino acid is shown with its total number of raw contacts represented by vdW radii and color. The average number of contacts per atom ranges from 0.16 to 2.42; these values have been offset and scaled to 1.0–3.0 vdW radii. Hotter colors indicate more contacts per atom: deep blue ≤0.30, cyan = 0.70, green = 1.00, yellow = 1.55, orange = 2.00, and red ≥2.30.
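
The offset-and-scale step described in this caption can be illustrated with a short sketch. It assumes a simple linear (min-max) mapping from the stated contact range onto the stated radius range; the caption does not give the exact formula, and the function name `scale_contacts` is hypothetical.

```python
def scale_contacts(value, lo=0.16, hi=2.42, out_lo=1.0, out_hi=3.0):
    """Map an average contacts-per-atom value onto a vdW radius multiplier.

    Assumption: a linear mapping from [lo, hi] to [out_lo, out_hi].
    The endpoints come from the caption; the linear form is a guess.
    """
    fraction = (value - lo) / (hi - lo)          # position within the input range
    return out_lo + fraction * (out_hi - out_lo)  # same position in the output range


if __name__ == "__main__":
    for contacts in (0.16, 1.29, 2.42):
        print(contacts, round(scale_contacts(contacts), 2))
```

Under this assumption, the least-contacted atoms are drawn at their normal vdW radius and the most-contacted atoms at three times that radius.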

Table 2.

Comparison of “raw” ligand contacts to “surface” ligand contacts.

Figure 2.

Frequencies of solvent-accessible SC with a cutoff of SASA ≥5 Å² and SASA ≥0.5 Å².

Residues are sorted by decreasing hydrophobicity. With the smaller cutoff, the pattern shifts to more hydrophobic residues because poorly exposed, interior residues are able to meet the criteria with only a small patch of exposed surface.

Table 3.

Comparison of the average number of hydrogen-bonding contacts to surface residues.

Figure 3.

Relative frequency of SC-only, BB-only or both (SC+BB) interactions per residue.

The residues with “SC” interactions in our analysis combine the SC-only and “SC+BB” contacts (blue+yellow). Residues are ordered by increasing BB-only frequency. Here, all Gly interactions are shown as BB-only to show its overall contribution to BB-only contacts. Due to rounding, columns may occasionally sum to a value other than 100%.

Figure 4.

Frequencies of BB-only contacts in binding sites, sorted by increasing frequency on the protein surface.

Surface residues with backbone SASA of 5 Å² or greater are shown. Gly interactions are shown as BB-only to stress that they constitute the vast majority of such contacts. Due to rounding, rows may occasionally sum to a value other than 100%.

Figure 5.

Frequencies and propensities of surface residues.

A) Frequencies of solvent-accessible side chains on the protein surface and in binding sites with SASA cutoff ≥5 Å². Due to rounding, rows in A may occasionally sum to a value other than 100%. B) Median propensity of residues in ligand binding sites of valid and invalid ligands, analyzed across all proteins. Residues in A and B are ordered by increasing frequency on the surface. C) Ratio of residue propensity for valid versus invalid binding sites. Residues are ordered by decreasing ratio. Error bars in B and C indicate 95th percentiles of 10,000 leave-10%-out samples.
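
The leave-10%-out error bars can be illustrated with a minimal sketch. Two things here are assumptions, not taken from the captions: propensity is computed as the residue's frequency in binding sites divided by its frequency on the protein surface, and the "95th percentile" bounds are read as the central 95% of the resampled values. All function names and the toy data layout (one pair of count dictionaries per protein) are hypothetical.

```python
import random
import statistics


def pooled(proteins, index):
    """Sum per-residue counts over proteins (index 0 = binding site, 1 = surface)."""
    totals = {}
    for protein in proteins:
        for residue, count in protein[index].items():
            totals[residue] = totals.get(residue, 0) + count
    return totals


def propensity(site_counts, surface_counts, residue):
    """Assumed definition: binding-site frequency divided by surface frequency."""
    site_freq = site_counts[residue] / sum(site_counts.values())
    surface_freq = surface_counts[residue] / sum(surface_counts.values())
    return site_freq / surface_freq


def leave_out_bounds(proteins, residue, n_samples=1000, leave_out=0.10, seed=0):
    """Return (median, lower, upper) propensity over random 90% subsets."""
    rng = random.Random(seed)
    keep = len(proteins) - int(round(leave_out * len(proteins)))
    values = []
    for _ in range(n_samples):
        subset = rng.sample(proteins, keep)  # drop a random 10% of proteins
        values.append(propensity(pooled(subset, 0), pooled(subset, 1), residue))
    values.sort()
    lower = values[int(0.025 * (n_samples - 1))]  # assumed central-95% bounds
    upper = values[int(0.975 * (n_samples - 1))]
    return statistics.median(values), lower, upper


if __name__ == "__main__":
    # Toy data only: (binding-site counts, surface counts) for each protein.
    toy = [({"TRP": 2, "LYS": 1}, {"TRP": 2, "LYS": 8}) for _ in range(20)]
    print(leave_out_bounds(toy, "TRP", n_samples=100))
```

The caption reports 10,000 samples; the smaller default here only keeps the sketch fast. The code assumes every residue of interest has a nonzero surface count.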

Table 4.

Composition of binding sites for the top-20 valid ligands.

Table 5.

Composition of binding sites for the top-20 invalid ligands.

Figure 6.

Propensities of SC interactions in valid sites, with and without the top-20 ligands by frequency.

A) Propensities in valid sites. B) Propensities in invalid sites. The error bars represent 95th percentile bounds based on leave-10%-out clustering within each set. Residues are ordered alphabetically.

Figure 7.

Examining the variation in the data, based on sample size.

A) Protein surface, B) valid binding site, and C) invalid binding site frequencies, and D) valid binding site propensities of six residues. Values are shown for subsets of the protein structure set, ranging from 1% to 99% of the full set, with 100 samples at each percentage point.

Table 6.

Median, standard deviation, and 95% confidence interval for the propensity of 6 representative residues.

Figure 8.

Propensities in valid binding sites.

Propensities are broken down into A) enzyme and B) non-enzyme proteins. The black error bars represent 95th percentile bounds based on leave-10%-out clustering. For context, red lines represent 95th percentile bounds of propensities from 10,000 random samples of A) 2500 random, diverse proteins and B) 1000 random, diverse proteins (as seen in Table 4). Stars indicate residues whose median propensity value (leave-10%-out 95th percentile error) falls outside of the 95th percentiles of the randomly-sampled propensities.
