Structural Constraints on the Covariance Matrix Derived from Multiple Aligned Protein Sequences

doi:10.1371/journal.pone.0028265

Figure 1.

Number of contacts with protein length.

The number of contacts between pseudo-centroids for the domains in the SCOP40 database is plotted against the number of residues in each domain (Protein length). The green line is the best fit to the data over the plotted range.
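
A straight-line least-squares fit of the kind shown in Figure 1 can be sketched as below. This is a minimal illustration that assumes the best fit is linear, because the caption does not state its functional form. The function name fit_line is hypothetical, and the sample values are toy numbers, not SCOP40 data.

```python
from typing import Sequence, Tuple


def fit_line(lengths: Sequence[float], contacts: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of contacts = slope * length + intercept.

    Assumption: a straight line; the functional form of the fit is not stated.
    """
    n = len(lengths)
    if n < 2 or n != len(contacts):
        raise ValueError("need at least two paired observations")
    mean_x = sum(lengths) / n
    mean_y = sum(contacts) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    if sxx == 0:
        raise ValueError("lengths must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, contacts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Toy values for illustration only (not SCOP40 data).
    print(fit_line([100, 150, 200], [210, 310, 410]))
```

If the true relationship were non-linear, the same routine could be applied to transformed values (for example, logarithms of both axes for a power law), but which form the authors used is not given here.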

Figure 2.

Number of contacts per residue.

(a) The number of pseudo-centroid contacts per residue is plotted against the fractional rank of the residue in the protein, where 0 is the most densely packed residue and 1 the least densely packed. Each red line represents SCOP40 domains from one of ten length bins spanning 100 to 200 residues. The green line is a fitted curve (described in the text). (b) Predicted contacts for 1f4pA before correction (cyan) and after correction (red) are plotted along with the observed contacts (green) and the theoretical curve from part a (blue).
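
The fractional rank on the x-axis of Figure 2a can be computed from per-residue contact counts roughly as follows. This is a sketch only: it assumes the counts are already available and that rank r of N residues (counting from 0 for the densest) maps linearly to r/(N - 1). The helper name is hypothetical.

```python
from typing import List, Sequence, Tuple


def contacts_by_fractional_rank(counts: Sequence[int]) -> List[Tuple[float, int]]:
    """Order residues from most to fewest contacts and attach a fractional rank.

    Fractional rank 0 is the most densely packed residue and 1 the least,
    matching the axis of Figure 2a. Assumption: linear spacing r / (N - 1).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two residues")
    ordered = sorted(counts, reverse=True)  # densest residue first
    return [(rank / (n - 1), c) for rank, c in enumerate(ordered)]
```

Because the x-axis is normalised to [0, 1], curves from domains of different lengths can be overlaid directly, which is what allows the ten length bins to be compared in one panel.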

Figure 3.

Number of contacts with sequence separation.

(a) The number of pseudo-centroid contacts is plotted against the fractional sequence separation of the residue pair, from 0 (adjacent residues) to 1 (the two terminal residues). Each red line represents SCOP40 domains from one of ten length bins spanning 100 to 200 residues. The green line is a fitted curve (described in the text). Note: the similarity of this curve to that plotted in Figure 2 is coincidental. (b) Individual data for 1f4pA, plotted using the same colours as in Figure 2b.
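
The fractional sequence separation in Figure 3a places adjacent residues at 0 and the two termini at 1. One way to reproduce that scale is sketched below. It assumes a linear rescaling of the separation |i - j| between those end points, so the text's exact definition may differ. Both function names and the ten-bin histogram are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def fractional_separation(i: int, j: int, n: int) -> float:
    """Map the separation of residues i and j (0-based) in an n-residue chain
    onto [0, 1]: adjacent residues -> 0, the two termini -> 1.

    Assumption: linear rescaling of |i - j| between separation 1 and n - 1.
    """
    if n < 3:
        raise ValueError("need at least three residues")
    sep = abs(i - j)
    if not 1 <= sep <= n - 1:
        raise ValueError("i and j must be distinct residues in the chain")
    return (sep - 1) / (n - 2)


def separation_histogram(contacts: Iterable[Tuple[int, int]], n: int, bins: int = 10) -> List[int]:
    """Count contacts in each of `bins` equal-width fractional-separation bins."""
    hist = [0] * bins
    for i, j in contacts:
        f = fractional_separation(i, j, n)
        hist[min(int(f * bins), bins - 1)] += 1  # f == 1.0 goes in the last bin
    return hist
```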

Figure 4.

Contact map for 1f4pA showing predicted contacts before re-balancing (lower-right) and after correction (upper-left).

Observed pseudo-centroid contacts (separation under 8 Å) are plotted in green.
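
A contact map like the one in Figure 4 can be derived from pseudo-centroid coordinates using the 8 Å cutoff. The sketch below assumes one pseudo-centroid per residue has already been computed, since their construction is not described here. The function contact_map and its min_sep parameter (a minimum sequence separation) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_map(centroids: Sequence[Point], cutoff: float = 8.0,
                min_sep: int = 1) -> List[Tuple[int, int]]:
    """Return residue pairs (i, j), i < j, whose pseudo-centroids are closer
    than `cutoff` Å.

    Assumptions: one precomputed pseudo-centroid per residue; `min_sep` is a
    hypothetical filter that skips pairs closer than this in sequence.
    """
    if min_sep < 1:
        raise ValueError("min_sep must be at least 1")
    pairs = []
    n = len(centroids)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(centroids[i], centroids[j]) < cutoff:
                pairs.append((i, j))
    return pairs
```

The pair list fills one triangle of the map. Plotting each (i, j) as a point, and optionally its mirror (j, i), gives the two-triangle layout used to set observed contacts beside predictions.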

Table 1.

Fold recognition over the test decoy sets.

Figure 5.

True folds against (log) rank: (a) 2trxA, (b) 1f4pA, (c) 3chyA, (d) 5p21A.

The cumulative number of true folds is plotted against the log(rank) of the model in the ranked list of decoys, up to a maximum of 10,000 models (4). Because fewer models than this were sometimes constructed, some plots end in ‘mid-air’. The result for the basic PLATO method is plotted in bold cyan. The models constructed by the contact-augmented method were ranked by three scores: red, using only the basic PLATO score; green, using only the predicted contacts; and blue, using their combined score. The dashed lines show the results after re-balancing the contact matrix with the structural constraints described in the text, using the same colour assignments.
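
The curves in Figure 5 accumulate true folds down a ranked decoy list. Below is a minimal sketch. It assumes the decoys are already sorted best-first and that each one has been flagged as a true fold or not; how a true fold is judged is not specified in this caption. The name cumulative_true_folds is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cumulative_true_folds(is_true_fold: Sequence[bool],
                          max_models: int = 10_000) -> List[Tuple[float, int]]:
    """Given decoys sorted best-first, return (log10 rank, cumulative true folds).

    Ranks start at 1, so the top model sits at log10(1) = 0. The list stops at
    `max_models` or at the last model actually built, whichever comes first,
    which is why a curve can end before the right-hand edge of the plot.
    """
    points = []
    total = 0
    for rank, hit in enumerate(is_true_fold[:max_models], start=1):
        total += bool(hit)  # count this model if it is a true fold
        points.append((math.log10(rank), total))
    return points
```

Comparing scoring schemes then amounts to re-sorting the same decoys by each score and overlaying the resulting curves.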

Figure 6.

Overview of decoy model construction.

The Target sequence to be predicted is matched against the sequence database to generate a multiple sequence alignment, which is used both to predict secondary structure (Predicted Sec. Str.s) and residue contacts (Predicted contacts). These two derived data sets are combined to estimate pairwise packing interactions at the secondary structure element (SSE) level (Sec. Str. packings). The PLATO method first uses these packings to select the structural class of the protein via 2D SSE layouts. The corresponding stick models (3D ‘stick’ Forms) provide the framework over which different protein folds are generated combinatorially, with each pairing of secondary structures evaluated by its predicted packing score. Finally, the ‘stick’ folds are built at the residue (α-carbon) level, giving the final set of Cα ranked Folds.
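
The step that combines predicted secondary structure with predicted residue contacts into SSE-level packing scores might look like the sketch below. It assumes a symmetric matrix of per-residue-pair contact scores and scores each SSE pair by a plain sum over the residue pairs they span. The combination actually used by PLATO may differ, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive residue range (start, end) of one SSE


def sse_packing_scores(sses: Sequence[Segment],
                       contact_score: Sequence[Sequence[float]]) -> Dict[Tuple[int, int], float]:
    """Sum predicted residue-contact scores over all residue pairs spanning two SSEs.

    Assumptions: `contact_score[i][j]` is a symmetric per-pair score from a
    contact predictor, and an SSE pair's packing score is a plain sum.
    """
    packing: Dict[Tuple[int, int], float] = {}
    for a, b in combinations(range(len(sses)), 2):
        (s1, e1), (s2, e2) = sses[a], sses[b]
        packing[(a, b)] = sum(contact_score[i][j]
                              for i in range(s1, e1 + 1)
                              for j in range(s2, e2 + 1))
    return packing
```

A plain sum favours long SSEs. Normalising by the number of residue pairs would be an equally plausible alternative, and which behaviour is wanted depends on how the packing scores are later compared.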
