Table 1.
Summary data of structures from Binding MOAD used in the propensity calculations.
Figure 1.
Bigger, hotter atoms have more “raw” contacts with ligands, on average.
Each amino acid is shown with its total number of raw contacts represented by vdW radii and color. The average number of contacts per atom ranges from 0.16 to 2.42; these values have been offset and scaled to 1.0–3.0 vdW radii. Hotter colors indicate more contacts per atom: deep blue ≤0.30, cyan = 0.70, green = 1.00, yellow = 1.55, orange = 2.00, and red ≥2.30.
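The offset-and-scale step in the Figure 1 caption can be read as a linear map from the observed range onto the display range. The sketch below assumes the mapping is linear, which the caption implies but does not state. The function name `rescale_contacts` and its parameter names are hypothetical and are used only for illustration.

```python
def rescale_contacts(value, src_min=0.16, src_max=2.42, dst_min=1.0, dst_max=3.0):
    """Linearly map an average contacts-per-atom value onto a vdW radius scale.

    Assumption: the caption's "offset and scaled" means a simple linear map
    from [src_min, src_max] to [dst_min, dst_max].
    """
    # Offset: shift so the smallest observed value becomes zero,
    # then express the value as a fraction of the observed range.
    fraction = (value - src_min) / (src_max - src_min)
    # Scale: stretch that fraction onto the display range.
    return dst_min + fraction * (dst_max - dst_min)


if __name__ == "__main__":
    # The two endpoints and the midpoint of the observed range.
    for contacts in (0.16, 1.29, 2.42):
        print(contacts, rescale_contacts(contacts))
```

Under this assumption, the lowest average (0.16) is drawn at 1.0 vdW radius and the highest (2.42) at 3.0 vdW radii.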
Table 2.
Comparison of “raw” ligand contacts to “surface” ligand contacts.
Figure 2.
Frequencies of solvent-accessible SC with a cutoff of SASA ≥5 Å² and SASA ≥0.5 Å².
Residues are sorted by decreasing hydrophobicity. With the smaller cutoff, the pattern shifts to more hydrophobic residues because poorly exposed, interior residues are able to meet the criteria with only a small patch of exposed surface.
Table 3.
Comparison of the average number of hydrogen-bonding contacts to surface residues.
Figure 3.
Relative frequency of SC-only, BB-only or both (SC+BB) interactions per residue.
The residues with “SC” interactions in our analysis combine the SC-only and “SC+BB” contacts (blue+yellow). Residues are ordered by increasing BB-only frequency. Here, all Gly interactions are shown as BB-only to show its overall contribution to BB-only contacts. Due to rounding, columns may occasionally sum to a value other than 100%.
Figure 4.
Frequencies of BB-only contacts in binding sites, sorted by increasing frequency on the protein surface.
Surface residues with 5 Å² or greater backbone SASA are shown. Gly interactions are shown as BB-only to stress that Gly constitutes the vast majority of such contacts. Due to rounding, rows may occasionally sum to a value other than 100%.
Figure 5.
Frequencies and propensities of surface residues.
A) Frequencies of solvent-accessible side chains on the protein surface and in binding sites with SASA cutoff ≥5 Å². Due to rounding, rows in A) may occasionally sum to a value other than 100%. B) Median propensity of residues in ligand binding sites of valid and invalid ligands, analyzed across all proteins. Residues in A and B are ordered by increasing frequency on surface. C) Ratio of residue propensity for valid versus invalid binding sites. Residues ordered by decreasing ratio. Error bars in B and C indicate 95th percentiles of 10,000 leave-10%-out samples.
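The leave-10%-out error bars described for Figure 5 can be illustrated with a short resampling sketch. Several things here are assumptions, not details taken from the captions: propensity is taken to be a residue's frequency in binding sites divided by its frequency on the protein surface; the "95th percentiles" are interpreted as the central 95% of the resampled values; and the per-protein dictionary layout (`"site"` and `"surface"` residue counts) and all function names are hypothetical.

```python
import random
import statistics


def propensity(site_count, site_total, surface_count, surface_total):
    """Assumed definition: binding-site frequency / surface frequency."""
    return (site_count / site_total) / (surface_count / surface_total)


def leave_out_propensities(proteins, residue, n_samples=10_000, leave_out=0.10, seed=0):
    """Recompute the propensity after dropping a random 10% of proteins, many times.

    `proteins` is a list of dicts with hypothetical keys "site" and "surface",
    each mapping residue names to counts for one protein.
    """
    rng = random.Random(seed)
    keep = len(proteins) - round(leave_out * len(proteins))
    values = []
    for _ in range(n_samples):
        subset = rng.sample(proteins, keep)  # sampling without replacement
        site_count = sum(p["site"].get(residue, 0) for p in subset)
        site_total = sum(sum(p["site"].values()) for p in subset)
        surface_count = sum(p["surface"].get(residue, 0) for p in subset)
        surface_total = sum(sum(p["surface"].values()) for p in subset)
        values.append(propensity(site_count, site_total, surface_count, surface_total))
    return values


def summarize(values):
    """Return (median, lower bound, upper bound) for the central 95% of values."""
    ordered = sorted(values)
    lower = ordered[int(0.025 * (len(ordered) - 1))]
    upper = ordered[int(0.975 * (len(ordered) - 1))]
    return statistics.median(ordered), lower, upper


if __name__ == "__main__":
    # Toy data only: 20 identical proteins, not values from the study.
    toy = [{"site": {"TRP": 2, "ALA": 2}, "surface": {"TRP": 1, "ALA": 9}}] * 20
    print(summarize(leave_out_propensities(toy, "TRP", n_samples=200)))
```

A propensity above 1 under this definition would mean the residue is more common in binding sites than on the surface as a whole; the resampled bounds indicate how sensitive that value is to which proteins are included.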
Table 4.
Composition of binding sites for the top-20 valid ligands.
Table 5.
Composition of binding sites for the top-20 invalid ligands.
Figure 6.
Propensities of SC interactions in valid sites, with and without the top-20 ligands by frequency.
A) Propensities in valid sites. B) Propensities in invalid sites. The error bars represent 95th percentile bounds based on leave-10%-out clustering within each set. Residues are ordered alphabetically.
Figure 7.
Examining the variation in the data, based on sample size.
A) Protein surface, B) valid binding site, and C) invalid binding site frequencies, and D) valid binding site propensities of six residues. Values are shown for random subsets of the protein structure set, ranging from 1% to 99% of the full set, with 100 samples at each percentage point.
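The sample-size analysis in Figure 7 can be sketched as repeated random subsetting at each percentage of the full set. This is a minimal illustration under stated assumptions: subsets are drawn without replacement, frequency is a residue's count divided by the total residue count in the subset, and the input layout (one dictionary of residue counts per protein) and the function name `subsample_frequencies` are hypothetical.

```python
import random


def subsample_frequencies(per_protein_counts, residue, percents=range(1, 100),
                          n_samples=100, seed=0):
    """For each subset size (percent of proteins), collect residue frequencies.

    Returns a dict mapping each percent to a list of `n_samples` frequencies,
    one per random subset of that size.
    """
    rng = random.Random(seed)
    n_proteins = len(per_protein_counts)
    results = {}
    for pct in percents:
        # At least one protein per subset, even for very small sets.
        size = max(1, round(n_proteins * pct / 100))
        freqs = []
        for _ in range(n_samples):
            subset = rng.sample(per_protein_counts, size)
            hits = sum(counts.get(residue, 0) for counts in subset)
            total = sum(sum(counts.values()) for counts in subset)
            freqs.append(hits / total)
        results[pct] = freqs
    return results


if __name__ == "__main__":
    # Toy counts only, not data from the study.
    toy = [{"LEU": i % 5 + 1, "TRP": (i * 7) % 3 + 1} for i in range(200)]
    spread = subsample_frequencies(toy, "TRP")
    for pct in (1, 50, 99):
        print(pct, min(spread[pct]), max(spread[pct]))
```

Plotting the values at each percentage against the subset size gives the kind of funnel the caption describes, with the scatter narrowing as the subset approaches the full set.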
Table 6.
Median, standard deviation, and 95% confidence interval for the propensity of six representative residues.
Figure 8.
Propensities in valid binding sites.
Propensities are broken down into A) enzyme and B) non-enzyme proteins. The black error bars represent 95th percentile bounds based on leave-10%-out clustering. For context, red lines represent 95th percentile bounds of propensities from 10,000 random samples of A) 2500 random, diverse proteins and B) 1000 random, diverse proteins (as seen in Table 4). Stars indicate residues whose median propensity value (leave-10%-out 95th percentile error) falls outside of the 95th percentiles of the randomly-sampled propensities.