Prediction of Protein Binding Regions in Disordered Proteins

doi:10.1371/journal.pcbi.1000376

Table 1.

Reference amino acid composition of globular proteins.

Figure 1.

The construction of the ANCHOR prediction method demonstrated on the N-terminal domain of human p53.

Left: IUPred prediction score for the full-length human p53 (top) and S, E_int and E_gain calculated for the disordered N-terminal domain of human p53 (middle). Grey boxes show the three binding sites, with the overlap of the RPA70N and RNAPII binding sites shown in dark grey. The outputs of the three individually optimized predictors are shown in black, and their average, the final prediction score, is shown in purple (bottom). Right: PDB structures of the binding sites in the N-terminal region of p53 (yellow) complexed with the respective partners (blue): MDM2 (top, PDB ID: 1ycq [57]), RPA70N (middle, PDB ID: 2b3g [58]) and RNAPII (bottom, PDB ID: 2gs0 [59]).
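
The caption states that the final prediction score is the average of the outputs of three individually optimized predictors. A minimal Python sketch of that averaging step follows. The function name `average_scores` and the three example profiles are hypothetical and do not come from ANCHOR itself; the sketch only assumes that each predictor returns one score per residue.

```python
from statistics import fmean

def average_scores(profiles):
    """Average several per-residue score profiles position by position."""
    lengths = {len(profile) for profile in profiles}
    if len(lengths) != 1:
        raise ValueError("all profiles must cover the same residues")
    # zip(*profiles) yields, for each residue, the scores from every predictor.
    return [fmean(column) for column in zip(*profiles)]

# Hypothetical outputs of three predictors for a five-residue stretch.
p1 = [0.2, 0.4, 0.9, 0.7, 0.1]
p2 = [0.3, 0.5, 0.8, 0.6, 0.2]
p3 = [0.1, 0.6, 0.7, 0.8, 0.3]
final_score = average_scores([p1, p2, p3])
```

Averaging position by position keeps the final profile the same length as the input sequence, so it can be plotted on the same axis as the individual predictor outputs.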

Table 2.

Parameter and prediction accuracy values obtained during the optimization of ANCHOR.

Table 3.

Prediction efficiency of ANCHOR evaluated on the testing datasets.

Figure 2.

ROC curves obtained during the testing of ANCHOR.

ROC curves of the predictor with parameter sets optimized on each of the three training subsets and evaluated on the respective testing subsets are shown with red, green and blue lines. The line with unity slope corresponding to random prediction is also shown. The vertical line corresponds to FPR = 0.05, where the final predictor (the average of these three) is used.
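
As an illustration of how a point on such a curve can be read off at a fixed false positive rate, the following Python sketch builds ROC points from per-residue scores and binary labels and reports the best true positive rate with FPR at or below 0.05. The function names and the toy data are hypothetical; this is not the evaluation code used for ANCHOR.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs, one per distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative residues")
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step moves along the curve.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points

def tpr_at_fpr(points, max_fpr=0.05):
    """Highest TPR among thresholds whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)

# Hypothetical data: 4 binding residues (label 1) and 20 non-binding ones (label 0).
scores = [0.9, 0.8, 0.6, 0.3] + [0.7, 0.5] + [0.1] * 18
labels = [1, 1, 1, 1] + [0] * 20
tpr = tpr_at_fpr(roc_points(scores, labels))
```

With these toy values, one false positive out of 20 negatives corresponds to FPR = 0.05, and three of the four binding residues are recovered at that operating point.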

Table 4.

Prediction efficiency of ANCHOR evaluated on an independent dataset (α-MoRFs dataset).

Figure 3.

The distinct amino acid composition of short disordered binding sites.

The average amino acid composition of the interacting parts of the short disordered binding sites compared to the average amino acid composition of (A) the globular proteins dataset, (B) the disordered proteins dataset and (C) the interacting parts of the shorter chains of the ordered complexes. Amino acids are arranged according to increasing hydrophobicity.
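
A comparison of this kind rests on per-dataset amino acid frequencies. The Python sketch below computes the composition of a set of sequences and the per-residue difference from a reference set. The sequences are made up for illustration, and residues are listed alphabetically because no hydrophobicity scale is given here; both are assumptions rather than details of the original analysis.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences):
    """Fraction of each standard amino acid across all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(res for res in seq.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def composition_difference(sample, reference):
    """Per-residue difference: positive means enriched in the sample."""
    return {aa: sample[aa] - reference[aa] for aa in AMINO_ACIDS}

# Hypothetical toy sequences, not taken from any of the datasets.
binding_comp = composition(["LLWK", "PPLE"])
reference_comp = composition(["AAGG", "LKDE"])
diff = composition_difference(binding_comp, reference_comp)
```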

Table 5.

The independence of the efficiency of ANCHOR from the amino acid composition of the binding sites.

Figure 4.

Secondary structure distributions in the short disordered binding site dataset.

Fraction of amino acids in different secondary structures in the disordered chains of the complexes. The three groups denote the fractions calculated on all the residues in the PDB structures, only the interacting ones and the ones correctly identified by the predictor.

Table 6.

Secondary structure distributions in the short disordered binding site dataset.

Figure 5.

Prediction accuracies and segmentation for the short and long disordered binding sites.

(A) The distribution of the number of binding segments predicted in short (white bars) and long (black bars) binding sites, showing the segmented nature of longer binding sites. (B) The distribution of the fraction of correctly recovered interacting residues in short (white bars) and long (black bars) disordered binding sites.
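
The two quantities plotted in the panels can be sketched in Python as follows, assuming each binding site is described by two per-residue 0/1 masks of equal length: one for the prediction and one for the residues observed to interact. The function names and the example masks are hypothetical.

```python
from itertools import groupby

def count_segments(predicted):
    """Number of contiguous runs of predicted-binding residues."""
    return sum(1 for is_binding, _ in groupby(predicted) if is_binding)

def fraction_recovered(predicted, interacting):
    """Share of interacting residues that are also predicted to bind."""
    if len(predicted) != len(interacting):
        raise ValueError("masks must have the same length")
    n_interacting = sum(interacting)
    if n_interacting == 0:
        raise ValueError("no interacting residues in this site")
    hits = sum(1 for p, i in zip(predicted, interacting) if p and i)
    return hits / n_interacting

# Hypothetical masks for a ten-residue binding site.
predicted = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
interacting = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
n_segments = count_segments(predicted)
recovered = fraction_recovered(predicted, interacting)
```

Collecting `n_segments` and `recovered` over all sites in a dataset gives the values that would be binned into histograms like those in panels (A) and (B).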

Figure 6.

ANCHOR prediction for human p27.

Top: Number of atomic contacts (green) and prediction output (blue) for the N-terminal binding region of human p27. “D1” and “D2” denote the two strongly interacting domains (red boxes) and “LH” denotes the weakly interacting linker domain between them (yellow box). Bottom: Crystal structure of human p27 (red and yellow) complexed with CDK2 (magenta) and Cyclin A (blue) (PDB ID: 1jsu [62]). Red parts denote regions that are predicted to bind by the predictor. These regions correspond to the experimentally verified strongly binding regions of p27. The figure was generated with PyMOL.

Figure 7.

ANCHOR prediction for human WASp.

Red bars mark known interaction sites, the green box marks the globular WH1 domain, and blue boxes mark the GBD and VCA domains. Light red boxes indicate the regions with putative SH3 domain interaction sites.

Figure 8.

Fraction of disordered and disordered binding site residues in complete proteomes.

The number of amino acids in disordered binding sites divided by the number of amino acids in disordered regions, plotted against the number of amino acids in disordered regions divided by the total number of residues in the proteome, for the 736 complete proteomes deposited in the SwissProt database. Points are colored according to the three kingdoms of life, and outlying points are labeled with the name of the corresponding organism.
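
The two plotted ratios can be written out as a short Python sketch. It assumes that per-proteome residue counts are already available and that every binding site residue lies within a disordered region; the function name and the example counts are hypothetical.

```python
def proteome_fractions(total_residues, disordered_residues, binding_residues):
    """Return (x, y) for one proteome.

    x: disordered residues divided by all residues in the proteome.
    y: disordered binding site residues divided by disordered residues.
    """
    if not 0 < disordered_residues <= total_residues:
        raise ValueError("disordered count must be positive and at most the total")
    if not 0 <= binding_residues <= disordered_residues:
        raise ValueError("binding count must lie within the disordered count")
    x = disordered_residues / total_residues
    y = binding_residues / disordered_residues
    return x, y

# Hypothetical counts for a single proteome.
x, y = proteome_fractions(1_000_000, 200_000, 50_000)
```

Each proteome contributes one such (x, y) pair, which is what a scatter plot of this kind displays.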

Figure 9.

Length distribution of disordered and disordered binding sites in complete proteomes.

The length distribution of (A) the disordered protein segments determined by IUPred and (B) predicted disordered binding sites determined by ANCHOR for the 736 complete proteomes available, grouped according to the three kingdoms of life.
