How a Spatial Arrangement of Secondary Structure Elements Is Dispersed in the Universe of Protein Folds

doi:10.1371/journal.pone.0107959

Figure 1.

Schematic examples of structure alignment by the RW and RR alignment schemes.

The protein structures are shown as schematic models. Structures A and B were aligned by the RW scheme, and structures A and C were aligned by the RR scheme. The corresponding alignment plots are shown on the right-hand side of the figure.

Figure 2.

Typical examples of the SQ, RW, and RR templates.

The cartoon structures on the left show the superposition of a query (red) and the structure closest to it (cyan). The figures on the right are alignment plots, on which aligned residue pairs are represented as circles. The horizontal axis of the plot represents the residue number of the query protein, and the vertical axis represents that of the template.

Figure 3.

Comparison of the TM-scores among the best SQ, RW, or RR template and the corresponding query.

(A) The histograms of the TM-scores of the SQ, RW, and RR templates are represented as black, blue, and red lines, respectively. (B) The cumulative histogram of the target proteins whose SQ, RW, and RR template TM-scores are equal to or greater than the value on the abscissa. (C) The scatter plot of the TM-scores of the SQ template versus those of the RR template. The horizontal axis represents the TM-scores of the SQ template, and the vertical axis represents those of the RR template. The black arrow indicates the target displaying the largest difference in TM-score between the SQ and RR templates.
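
The cumulative histogram described for panel (B) can be sketched as follows. This is only a minimal illustration: the function name, the normalization to a fraction of targets (rather than a raw count), and the example scores are assumptions made here, not values or code from the article.

```python
def cumulative_fraction(scores, thresholds):
    """Return, for each threshold, the fraction of targets whose
    best-template TM-score is equal to or greater than that threshold."""
    n = len(scores)
    return [sum(s >= t for s in scores) / n for t in thresholds]


# Hypothetical TM-scores for illustration only (not data from the article).
example_scores = [0.35, 0.48, 0.52, 0.61, 0.77]
print(cumulative_fraction(example_scores, [0.4, 0.5, 0.6]))
```

Evaluating the same function separately on the SQ, RW, and RR template scores would give one curve per alignment scheme.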

Figure 4.

The SQ and RR templates of aromatic prenyltransferase.

The cartoon figures on the left represent the structure of aromatic prenyltransferase, which exhibited the largest difference in TM-score between the RR and SQ templates, and those in the middle present the SQ and RR template structures. In these figures, only structurally aligned regions are highlighted and colored. The graphs on the right are the alignment plots between the target and the template.

Figure 5.

Analysis of the target size dependence.

The percentage of target proteins with a TM-score of the SQ, RW, and RR templates equal to or greater than 0.5 as a function of the target protein size is presented. The horizontal axis represents the sequence length of the target proteins. The vertical axis represents the percentage of the target proteins with the TM-scores of SQ, RW, and RR templates equal to or greater than 0.5 for a given target size. The lines are colored in black, blue, and red for the SQ, RW, and RR templates, respectively. The dotted lines correspond to the data in which the contributions of protein folds that exhibit topological similarity to the target proteins are excluded.

Figure 6.

Distributions of the number of structural neighbors.

The distributions identified by the SQ, RW, and RR schemes are shown in the double logarithmic plot. The contribution of the number of topologically similar structures is excluded for and . The red, blue, and black points represent the data from the SQ, RW, and RR schemes, respectively. These distributions are well fitted to the power-law model, as illustrated by the solid lines.

Figure 7.

Scatter plots showing versus and versus for all fold representatives.

The colors of the plots represent the SCOP classes as follows: all-α (red), all-β (blue), α/β (green), α+β (yellow), and others (black). The break line represents the equation . (A) and (B) respectively represent the scatter plots showing versus and versus .

Figure 8.

The top 10 SCOP folds having the largest numbers of structural neighbors identified by the RW (A) and RR (B) schemes.

Each bar represents a fold representative, and it is colored by SCOP class as follows: all-α (red), all-β (blue), α/β (green), α+β (yellow), and others (black). The height of the bar represents or . The target proteins are ordered by their values of or . For each target, the SCOP ID and SCOP fold ID are given under the bar. Short descriptions of each fold are also given in the bars.

Figure 9.

Structures of the target d2ffga1 and its structural neighbors.

The cartoon representation of the protein structure possessing the most frequently observed spatial arrangement of SSEs (d2ffga1) and five examples of its structural neighbors (d1go4a_, d1aopa3, dlrlha_, d2jfga2, and d2p12a1) are presented. This spatial arrangement of SSEs consists of four strands and two helices, which are highlighted by colors in each structure. In the structure of d2ffga1, the strands and helices are highlighted in blue and red, respectively. In the other structures, the colors of the strands and helices with the same chain direction as those in d2ffga1 are identical to those in d2ffga1. The helices and reverse strands with opposing directions are colored in salmon and cyan, respectively. The connectivity diagrams are also shown near the cartoon representations. The color scheme is the same as that for the cartoon representations. The TM-score(d2ffga1 example) calculated by the SQ, RW, and RR schemes is also shown.

Figure 10.

Graphical representation of the protein fold universe.

(A)–(C) The detailed graphical representations of the protein fold universe connected by the SQ (A), RW (B), and RR (C) schemes. In these networks, protein folds are represented by nodes and connected by directed edges. The directed edge between and is created from to if the TM-score() is larger than 0.5. The node size is proportional to the out-degree of the node. Nodes are colored according to their SCOP class as follows: all-α (red), all-β (blue), α/β (green), α+β (yellow), and others (black). (D)–(F) The simplified networks of the protein fold universe based on the SQ (D), RW (E), and RR (F) networks. Each node represents one of the all-α, all-β, α/β, and α+β classes defined in the SCOP database. The directed edge is drawn if is larger than 1.0, where represents the alignment scheme. The numerical value of is shown near the edge. The numerical values shown in parentheses in Figure 9(E) and (F) are and , respectively. The width of the arrows indicates the numerical value of .
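
The network construction in panels (A)–(C) can be sketched as below. This is a rough illustration under stated assumptions: the function names and the example scores are hypothetical, and because the caption as extracted does not show which fold is the source of each edge, the code simply assumes the edge runs from the first fold of a score pair to the second.

```python
from collections import defaultdict


def build_fold_network(tm_scores, cutoff=0.5):
    """Build a directed graph from pairwise TM-scores.

    tm_scores maps a pair (i, j) of fold identifiers to a TM-score.
    Assumption: an edge i -> j is created when that score is larger
    than the cutoff; the actual edge direction is not recoverable
    from the caption text.
    """
    edges = defaultdict(set)
    for (i, j), score in tm_scores.items():
        if i != j and score > cutoff:
            edges[i].add(j)
    return edges


def out_degree(edges, nodes):
    """Out-degree per node; the caption scales node size by this value."""
    return {n: len(edges.get(n, ())) for n in nodes}


# Hypothetical scores for illustration only (not data from the article).
example_scores = {
    ("a", "b"): 0.62,
    ("b", "a"): 0.41,
    ("a", "c"): 0.55,
    ("c", "b"): 0.50,  # equal to the cutoff, so no edge is created
}
network = build_fold_network(example_scores)
print(out_degree(network, ["a", "b", "c"]))
```

Because TM-scores are generally asymmetric between the two structures of a pair, keeping both orderings in the input is what makes the resulting graph directed.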

Figure 11.

SQ, RW, and RR templates for FM targets in CASP10.

(A) TM-scores of the SQ, RW, and RR templates identified from the PDB snapshot for all FM targets in CASP10. The target domains are ordered according to the TM-scores of their SQ templates. Open squares, asterisks, and filled circles represent TM-scores of SQ, RW, and RR templates, respectively. (B) The native (3td7), the SQ template (PDB ID: 2wm5), and the RW template (3p8c) structures of the target T0737-1. For the template structures, only the aligned residues are shown as cartoon models. (C) The alignment plots between the native and the SQ template (left) and between the native and the RR template (right).

Figure 12.

Comparison of the TM-scores among the best SQ, RW, or RR template and the corresponding query.

The histograms of the TM-scores of the SQ, RW, and RR templates are represented as black, blue, and red lines, respectively.