Accurate Prediction of Peptide Binding Sites on Protein Surfaces

doi:10.1371/journal.pcbi.1000335

Figure 1.

Overview of the method.

(A) A training dataset of protein–peptide complexes is extracted from the Protein Data Bank [20]. (B) The peptide residues are superimposed along with their associated binding environments. (C) Spatial Position Specific Scoring Matrices (S-PSSMs) are created based on the spatial distribution of 14 defined atom types (Table S3) in the binding site of each residue, compared to background protein surface sites. (D) S-PSSMs corresponding to residues in a query peptide (FxPRD) are then scanned over the surface of the protein. (E) Potential binding sites for each residue of the query peptide are identified and then combined using the distance constraints dictated by the peptide sequence. (F) The binding site for the complete peptide is predicted and scored.

Figure 2.

ROC curve showing performance in the large benchmark.

False positive rate (X axis) plotted against true positive rate (Y axis) for different p-value cut-offs. False positive predictions are defined as those that either predict the wrong binding site or predict a binding site for a peptide that is not known to bind. The figure shows the results for our approach (pepsite) at two distance thresholds defining accuracy (6 Å and 10 Å), and at 10 Å for the subset of proteins smaller than 100 amino acids. Equivalent values for rate4site on the same datasets are also shown, as is the ROC curve for pepsite using a stricter cross-validation (i.e., excluding similarities/homologies between proteins as given in the SCOP database).
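
The false-positive definition in this caption can be illustrated with a small counting sketch. It is not the authors' code: the `Prediction` record, its field names, and the rule that a prediction is "called" when its p-value is at or below the cut-off are all assumptions made for illustration. The sketch returns raw counts only, because the caption does not state the denominators used to turn counts into rates.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Prediction:
    """Hypothetical record for one benchmark case (field names are assumptions)."""
    p_value: float
    known_binder: bool          # is the peptide known to bind this protein?
    distance: Optional[float]   # Å from predicted to known site; None if no known site


def count_outcomes(predictions: Iterable[Prediction],
                   p_cutoff: float,
                   max_distance: float = 10.0) -> Tuple[int, int]:
    """Return (true_positives, false_positives) at one p-value cut-off.

    A called prediction is a false positive if the peptide is not known to
    bind, or if the predicted site is farther than max_distance from the
    known site (i.e. the wrong site was predicted).
    """
    tp = fp = 0
    for pred in predictions:
        if pred.p_value > p_cutoff:
            continue  # assumption: predictions above the cut-off are not called
        correct = (pred.known_binder
                   and pred.distance is not None
                   and pred.distance <= max_distance)
        if correct:
            tp += 1
        else:
            fp += 1
    return tp, fp


# Toy data, invented purely to exercise the function.
toy = [
    Prediction(0.01, True, 4.0),
    Prediction(0.02, True, 8.0),
    Prediction(0.03, False, None),
    Prediction(0.50, True, 3.0),
]
strict = count_outcomes(toy, p_cutoff=0.05, max_distance=6.0)
loose = count_outcomes(toy, p_cutoff=0.05, max_distance=10.0)
```

Sweeping `p_cutoff` over a range of values and repeating the count at each distance threshold would give the points from which curves like those in the figure are drawn, once a denominator convention is chosen.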

Figure 3.

Examples of applying the method.

Predicted peptides are depicted as spheres on the protein surface colored by amino acid type (prolines - pink, alanines and glycines - white, serines - orange, asparagines and glutamines - teal, and aspartic/glutamic acid - red). (A) Binding of a collagen peptide (GPAGPPGA) on human matrix metalloproteinase 2 (PDB ID: 1eak). The peptide bound in the solved X-ray structure is colored in red. Note that the predicted binding site differs from the one in the structure but is likely correct (see text). (B) Binding of the Ago hook peptide (PDNGTSAWGEPNESSPGWGEMD) on the PIWI domain of the Argonaute protein (PDB IDs: 1ytu [38]; 1w9h [39]): i) the best-scoring, though incorrect, binding site; ii) the location of the other top-scoring predictions (correct). (C) Prediction for the binding of an RGRGRGRG peptide to the human SMN tudor domain (PDB ID: 1mhn [40]), which agrees with NMR data. (D) Prediction of the binding site of the DBC1 leucine zipper (helical region 243–264) on the catalytic domain of SIRT1 (PDB ID: 1m2g [42]). (E) Prediction for the binding of the peptide DDPHGPVQLS, from the LMP1 protein of the Epstein-Barr virus, on the TRADD protein (PDB ID: 1f2h [45]).

Figure 4.

Using the method to scan for regions in Sec31 likely to bind Sec23.

(A) Predictions for the most conserved region (GPQNGWNDPPAL) of the disordered 40-residue Sec31 peptide segment on the Sec23/Sar1 complex. The region of the peptide from the solved structure is shown in red (PDB IDs: 2qtv [19], 1m2o [48]). (B) P-values (Y axis) for each 12-residue peptide from residues 770 to 1100 of the Sec31 protein (X axis), used to identify the binding region. The lowest p-values, in the region 965–1010, are very close to the known binding site (981–1021). The black line under the graph shows the actual 40-residue binding peptide, and the region colored in red-brown corresponds to the peptide predicted to bind shown in (A).
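
The scan in panel (B) can be sketched as a sliding window over the sequence. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, a step of one residue is assumed, and `score` stands in for whatever procedure returns a p-value for a peptide on the target surface.

```python
from typing import Callable, Iterator, Tuple


def sliding_windows(sequence: str, start: int, end: int,
                    width: int = 12) -> Iterator[Tuple[int, str]]:
    """Yield (first_residue_number, peptide) for every window of `width`
    residues lying fully inside [start, end] (1-based, inclusive)."""
    if width <= 0:
        raise ValueError("width must be positive")
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("start/end must lie within the sequence")
    # Assumption: windows advance one residue at a time.
    for first in range(start, end - width + 2):
        yield first, sequence[first - 1:first - 1 + width]


def best_window(sequence: str, start: int, end: int,
                score: Callable[[str], float],
                width: int = 12) -> Tuple[int, str, float]:
    """Return (first_residue_number, peptide, p_value) for the window with
    the lowest p-value. `score` is a hypothetical placeholder callable."""
    scored = [(first, peptide, score(peptide))
              for first, peptide in sliding_windows(sequence, start, end, width)]
    return min(scored, key=lambda item: item[2])


# Toy usage with an invented sequence and an invented scoring rule.
toy_sequence = "ABCDEFGHIJKLMNOP"
toy_score = lambda peptide: 0.001 if peptide == "EFGHIJKLMNOP" else 0.5
hit = best_window(toy_sequence, 1, 16, toy_score)
```

Under these assumptions, scanning residues 770 to 1100 with 12-residue windows yields 320 windows, the first starting at residue 770 and the last at residue 1089.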
