Figure 1.
Defining FFRs in HIV Protease Using the Derived FF Score
(A) Comparison of temperature factor (dashed line) and weighted average of the two slowest modes (solid line) obtained with GNM. The HIV protease is modeled as a dimer; however, the plot shows results for a single chain.
(B) Gradient plot ranging from correlated (red) to anticorrelated (blue) movement for each residue in the dimer.
(C) Comparison of normalized scores for unweighted (dashed line) and correlation-weighted (solid line) modes for a single chain. Correlation-weighted modes define the FF score. Regions are identified as FFRs when values fall beyond the thresholds (red lines), that is, greater than 1.5 or less than −1.5. The flap region (residues 46 to 56) exceeds the threshold after including correlated movement information (solid line).
(D) Structural mapping of FF score with gradient from negative (blue) to positive (red) (PDB ID: 1HIV).
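The thresholding described in panel (C) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the caption says only that the scores are "normalized", so z-score normalization is an assumption here, and the function names `normalize` and `ffr_segments` are hypothetical.

```python
from statistics import mean, pstdev


def normalize(scores):
    """Z-score normalization. ASSUMPTION: the caption only says 'normalized'."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def ffr_segments(norm_scores, threshold=1.5, first_residue=1):
    """Return (start, end) residue ranges whose score is > threshold or < -threshold.

    Adjacent flagged residues are merged into one segment regardless of sign.
    Residue numbering starts at `first_residue` (hypothetical parameter).
    """
    segments = []
    start = None
    for i, s in enumerate(norm_scores):
        flagged = s > threshold or s < -threshold
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            segments.append((start + first_residue, i - 1 + first_residue))
            start = None
    if start is not None:
        # Close a segment that runs to the end of the chain.
        segments.append((start + first_residue, len(norm_scores) - 1 + first_residue))
    return segments
```

For example, `ffr_segments([0.0, 1.6, 2.0, 0.3, -1.7, -0.2, 1.9])` flags residues 2 to 3, residue 5, and residue 7.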
Figure 2.
FF Score Identifies FFR in Bovine Pancreatic Trypsin Inhibitor and Calmodulin
Comparison of unweighted (dashed line) and weighted (solid line) FF scores for BPTI (top, PDB ID: 5PTI) and calmodulin (bottom, PDB ID: 1CLL). FF scores are mapped with the same gradient coloring from negative (blue) to positive (red) as the scale shown in Figure 1D. Both recognition loops (loop 1: residues 11 to 19; loop 2: residues 35 to 42) are identified in BPTI by the FF score, whereas loop 2 is not identified with the unweighted mode. For calmodulin, the FF score identifies the central hinge (residues 68 to 91, shown in blue because they fall below the negative threshold of −1.5). This central helix, containing eight turns, is known to collapse when bound to calcium and substrate.
Figure 3.
Preliminary Analysis of FFR as Identified by FF Score
(A) The average of all maximal FFR lengths plotted against overall protein length.
(B) The number of different sequence patterns observed for a given window size. Shown are the pattern counts for regions classified as FFR (dashed line), non-FFR (thin line), and irrespective of classification (thick line). FFRs sample a smaller sequence space than non-FFR regions. Patterns overlapping boundaries of FFR and non-FFR are excluded from these counts.
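One way to obtain such pattern counts is sketched below. It is an illustration under stated assumptions rather than the procedure used for the figure: the function name `count_patterns` and its inputs are hypothetical, and the "irrespective of classification" count is taken here to be the union of the two classes, which the caption does not specify.

```python
def count_patterns(sequence, is_ffr, window):
    """Count distinct subsequences of length `window`, split by FFR class.

    sequence: amino acid string; is_ffr: per-residue booleans of equal length.
    Windows that mix FFR and non-FFR residues straddle a boundary and are
    skipped, as the caption describes.
    """
    if len(sequence) != len(is_ffr):
        raise ValueError("sequence and is_ffr must have the same length")
    ffr, non_ffr = set(), set()
    for i in range(len(sequence) - window + 1):
        labels = is_ffr[i:i + window]
        pattern = sequence[i:i + window]
        if all(labels):
            ffr.add(pattern)
        elif not any(labels):
            non_ffr.add(pattern)
    # ASSUMPTION: the combined count is the union of the two classes.
    return {"ffr": len(ffr), "non_ffr": len(non_ffr), "all": len(ffr | non_ffr)}
```

For the toy sequence `"ACDACE"` with the first three residues labeled FFR and a window of 2, the window `"DA"` spans the boundary and is skipped, leaving two FFR patterns, two non-FFR patterns, and three distinct patterns overall.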
Table 1.
FFR Classification Preference for Secondary Structures
Table 2.
FFR Classification Preference for Amino Acids
Figure 4.
Predictor Performance Is a Function of Protein Length
(A) Effect of sequence length on false-positive (thick line) and false-negative (thin line) error rates. Shorter sequences tend to have a higher rate of false-positive FFR identification when the predictor is trained on a nonpartitioned dataset.
(B) Comparison of SVM prediction results trained on a nonpartitioned dataset (dashed lines) and a partitioned dataset containing proteins up to 200 residues (solid lines). Improvements were seen in both the false-positive (black) and -negative (red) rates.
(C) Comparison of SVM prediction results trained on a nonpartitioned dataset (dashed lines) and a partitioned dataset containing proteins larger than 200 residues (solid lines). Minor improvements were observed in false-positive (black) and -negative (red) rates.
Figure 5.
Predictor Performance in Identifying Domain Boundaries
Wiggle predictors were evaluated for domain boundary predictions on (A) a benchmark dataset containing domain boundary consensus between experts (BENCH), (B) a partitioned BENCH with proteins up to and including 200 residues (BENCHA), and (C) a partitioned BENCH with proteins longer than 200 residues. Definitions of domain boundaries were expanded up to a window size of 15 (win15) with the boundary in the center.
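The expansion of a boundary definition to a centered window can be sketched as below. This is a hedged illustration only: the helper names `boundary_window` and `boundary_hit` are hypothetical, clipping at the chain termini is an assumption, and counting a prediction as correct when any predicted residue falls inside the window is an assumed scoring rule that the caption does not state.

```python
def boundary_window(boundary, window, length):
    """Residues (1-based) in an odd-sized window centered on `boundary`.

    ASSUMPTION: the window is clipped to the chain, i.e. to [1, length].
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return range(max(1, boundary - half), min(length, boundary + half) + 1)


def boundary_hit(predicted_positions, boundary, window, length):
    """True if any predicted residue lies inside the expanded boundary.

    ASSUMPTION: this 'any overlap' rule is for illustration only.
    """
    allowed = boundary_window(boundary, window, length)
    return any(p in allowed for p in predicted_positions)
```

With a window of 15 (win15), a boundary at residue 100 of a 250-residue protein covers residues 93 to 107.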
Figure 6.
Performance of Wiggle Predictors on Arc Repressor
(A) The dimer conformation of the Arc repressor was used to model global fluctuation. Using the FFR definition, the plot for a single chain is shown on the left with structural mapping of values onto a dimer on the right. FF scores are mapped with the gradient code from negative (blue) to positive (red). Only the C-terminal tail exceeds threshold lines (red) and is defined as an FFR while the rest of the protein is not (PDB ID: 1BAZ).
(B) The hinge between the two helices is identified by predictors as well as N-terminal residues important for DNA recognition. Predictions from Wiggle (solid line) are mapped in green on the structure and Wiggle200 (dashed line) are mapped in orange.
Figure 7.
Wiggle Predictors Identify Important FFR in PVUII Endonuclease
(A) Plot of FF scores and mapping of values in a gradient code from negative (blue) to positive (red) onto the structure of PVUII endonuclease in complex with DNA (yellow). The following structural features are labeled: (1) minor groove binding loop, (2) catalytic loop, (3) potential hinge for DNA binding, (4) tyrosine 94 for Mg²⁺ ion coordination, and (5) major groove binding loop (PDB ID: 3PVI).
(B) Wiggle predictions (solid line) are mapped in green and Wiggle200 predictions (dashed line) are mapped in orange onto the structure.
Figure 8.
Wiggle Predictors Identify Regions Corresponding to Glycosylation Sites on Erythropoietin
(A) FF score plotted against residue number with thresholds shown in red. Erythropoietin is modeled by the GNM in the complexed form with the corresponding receptor (not shown). All residues have below-mean fluctuation (colored blue), but none of the residues are defined as FFRs since they do not exceed the definition threshold. The four glycosylation sites (S126 and lysine-substituted K24, K38, and K83) along with G151 are labeled (PDB ID: 1EER).
(B) FFRs correspond to positive values as predicted by Wiggle (solid line) and Wiggle200 (dashed line), which are structurally mapped onto erythropoietin (green and orange, respectively). Not all loops are identified by the predictors as functionally flexible, showing that discrimination is not based on structural features.
Figure 9.
Comparison of Wiggle Predictors to Structural Disorder Predictors
Comparison of prediction results from Wiggle (red) to various disorder predictors (blue).
Table 3.
Comparison of Predictors Using TEST200 and TESTALL