Table 1.
Reference amino acid composition of globular proteins.
Figure 1.
The construction of the ANCHOR prediction method demonstrated on the N-terminal domain of human p53.
Left: IUPred prediction score for the full-length human p53 (top) and S, Eint and Egain calculated for the disordered N-terminal domain of human p53 (middle). Grey boxes show the three binding sites, with the overlap of the RPA70N and RNAPII binding sites shown in dark grey. The outputs of the three individually optimized predictors are shown in black, and their average, the final prediction score, is shown in purple (bottom). Right: PDB structures of the binding sites in the N-terminal region of p53 (yellow) complexed with the respective partners (blue): MDM2 (top, PDB ID: 1ycq [57]), RPA70N (middle, PDB ID: 2b3g [58]) and RNAPII (bottom, PDB ID: 2gs0 [59]).
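The averaging step mentioned in this caption can be illustrated with a minimal sketch. It assumes each predictor yields one score per residue; the function name `average_scores` and the example values are hypothetical and are not taken from the ANCHOR implementation.

```python
from typing import Sequence


def average_scores(profiles: Sequence[Sequence[float]]) -> list[float]:
    """Return the per-residue mean of several equally long score profiles."""
    if not profiles:
        raise ValueError("at least one profile is required")
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all profiles must cover the same residues")
    # zip(*profiles) groups the scores that belong to the same residue.
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Hypothetical outputs of three predictors for four residues.
example = average_scores([
    [0.2, 0.4, 0.9, 0.6],
    [0.4, 0.5, 0.7, 0.6],
    [0.3, 0.6, 0.8, 0.6],
])
```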
Table 2.
Parameter and prediction accuracy values obtained during the optimization of ANCHOR.
Table 3.
Prediction efficiency of ANCHOR evaluated on the testing datasets.
Figure 2.
ROC curves obtained during the testing of ANCHOR.
ROC curves of the predictor with parameter sets optimized on each of the three training subsets and evaluated on the respective testing subsets are shown with red, green and blue lines. The line with unity slope corresponding to random prediction is also shown. The vertical line corresponds to FPR = 0.05, where the final predictor (the average of these three) is used.
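As a rough illustration of reading a ROC curve at a fixed false positive rate, the sketch below picks a score cutoff from the scores of negative residues and then measures the true positive rate at that cutoff. The function names, the strict "score above cutoff" rule and the example scores are assumptions made for illustration, not the procedure used in the paper.

```python
import math


def cutoff_for_fpr(negative_scores, max_fpr=0.05):
    """Return a cutoff so that at most max_fpr of negatives score above it."""
    if not negative_scores:
        raise ValueError("negative_scores must not be empty")
    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives that may be (wrongly) predicted as positive.
    allowed = math.floor(max_fpr * len(ranked))
    return ranked[allowed]


def rate_above(scores, cutoff):
    """Fraction of scores strictly above the cutoff."""
    return sum(s > cutoff for s in scores) / len(scores)


# Hypothetical scores: 20 non-binding residues and 4 binding residues.
negatives = [i / 20 for i in range(20)]
positives = [0.5, 0.92, 0.97, 0.99]

cutoff = cutoff_for_fpr(negatives, max_fpr=0.05)
fpr = rate_above(negatives, cutoff)
tpr = rate_above(positives, cutoff)
```

Applied to the negatives, `rate_above` gives the false positive rate; applied to the positives, it gives the true positive rate at the same cutoff.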
Table 4.
Prediction efficiency of ANCHOR evaluated on an independent dataset (α-MoRFs dataset).
Figure 3.
The distinct amino acid composition of short disordered binding sites.
The average amino acid composition of the interacting parts of the short disordered binding sites compared to the average amino acid composition of (A) the globular proteins dataset, (B) the disordered proteins dataset and (C) the interacting parts of the shorter chains of the ordered complexes. Amino acids are arranged according to increasing hydrophobicity.
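A composition comparison of this kind can be sketched as follows, assuming each dataset is given as a list of one-letter amino acid sequences. The helper names and the toy sequences are hypothetical, and the alphabetical residue order used here is not the hydrophobicity order of the figure.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Fraction of each standard amino acid over all given sequences."""
    counts = Counter(
        res for seq in sequences for res in seq if res in AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_difference(first, second):
    """Per-residue difference between two compositions (first - second)."""
    return {aa: first[aa] - second[aa] for aa in AMINO_ACIDS}


# Toy data standing in for binding-site and reference sequences.
binding = composition(["AAL", "L"])
reference = composition(["ALGG"])
diff = composition_difference(binding, reference)
```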
Table 5.
The independence of the efficiency of ANCHOR from the amino acid composition of the binding sites.
Figure 4.
Secondary structure distributions in the short disordered binding site dataset.
Fraction of amino acids in different secondary structures in the disordered chains of the complexes. The three groups denote the fractions calculated on all the residues in the PDB structures, only the interacting ones and the ones correctly identified by the predictor.
Table 6.
Secondary structure distributions in the short disordered binding site dataset.
Figure 5.
Prediction accuracies and segmentation for the short and long disordered binding sites.
(A) The distribution of the number of binding segments predicted in short (white bars) and long (black bars) binding sites, illustrating the segmented nature of longer binding sites. (B) The distribution of the fraction of correctly recovered interacting residues in both the short (white bars) and long (black bars) disordered binding sites.
Figure 6.
ANCHOR prediction for human p27.
Top: Number of atomic contacts (green) and prediction output (blue) for the N-terminal binding region of human p27. “D1” and “D2” denote the two strongly interacting domains (red boxes) and “LH” denotes the weakly interacting linker domain between them (yellow box). Bottom: Crystal structure of human p27 (red and yellow) complexed with CDK2 (magenta) and Cyclin A (blue) (PDB ID: 1jsu [62]). Red parts denote regions that are predicted to bind by the predictor. These regions correspond to the experimentally verified strongly binding regions of p27. The figure was generated with PyMOL.
Figure 7.
ANCHOR prediction for human WASp.
Red bars mark known interaction sites, green box marks the globular WH1 domain, blue boxes mark the GBD and VCA domains. Light red boxes indicate the regions with putative SH3 domain interaction sites.
Figure 8.
Fraction of disordered and disordered binding site residues in complete proteomes.
For each of the 736 complete proteomes deposited in the SwissProt database, the number of amino acids in disordered binding sites divided by the number of amino acids in disordered regions is plotted as a function of the number of amino acids in disordered regions divided by the total number of residues in the proteome. Points are colored according to the three kingdoms of life. Outlying points are labeled with the name of the corresponding organism.
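The two ratios plotted here can be written out explicitly. The sketch below assumes that, for every protein in a proteome, the total length, the number of residues predicted as disordered and the number of residues in predicted disordered binding sites are already known; the tuple layout and the function name are hypothetical.

```python
def proteome_fractions(proteins):
    """Return (disordered / total, binding / disordered) for one proteome.

    Each protein is a tuple:
    (length, disordered_residues, binding_site_residues).
    """
    total = sum(length for length, _, _ in proteins)
    disordered = sum(dis for _, dis, _ in proteins)
    binding = sum(bind for _, _, bind in proteins)
    if total == 0 or disordered == 0:
        raise ValueError("proteome has no residues or no disordered residues")
    return disordered / total, binding / disordered


# Hypothetical proteome with two proteins.
x, y = proteome_fractions([(100, 20, 5), (300, 60, 15)])
```

In this sketch `x` corresponds to the horizontal axis of the plot and `y` to the vertical axis.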
Figure 9.
Length distribution of disordered segments and disordered binding sites in complete proteomes.
The length distribution of (A) the disordered protein segments determined by IUPred and (B) predicted disordered binding sites determined by ANCHOR for the 736 complete proteomes available, grouped according to the three kingdoms of life.