Figure 1.
Effect of varying RMSD on structural variation within a class.
The plot shows the fragment content of equivalent BriX classes of length 7 created with fixed RMSD thresholds from 0.6 to 1.0 Angstrom. The increase in structural variation with higher RMSD thresholds is not uniformly distributed over all positions; it is concentrated at the terminal positions (both carboxy- and amino-terminal), resulting in a fan-like arrangement.
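The positional spread described in this caption can be quantified per residue. The following is a minimal sketch, assuming the class members are already superposed in a common frame and reduced to one (x, y, z) point per residue; the function name `per_position_spread` and the toy coordinates are illustrative and not taken from BriX.

```python
import math


def per_position_spread(fragments):
    """RMS deviation from the mean position at each residue index.

    fragments: list of fragments, each a list of (x, y, z) tuples of equal
    length, assumed to be already superposed in a common frame.
    """
    n_frag = len(fragments)
    length = len(fragments[0])
    spread = []
    for i in range(length):
        # Mean coordinate of residue i across all class members.
        mean = [sum(f[i][k] for f in fragments) / n_frag for k in range(3)]
        # Mean squared distance of the members from that mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in fragments
        ) / n_frag
        spread.append(math.sqrt(msd))
    return spread


# Toy class (invented data): three 5-residue fragments that coincide at the
# central residue and diverge towards both termini.
toy_class = [
    [(3.8 * i, offset * abs(i - 2), 0.0) for i in range(5)]
    for offset in (-0.3, 0.0, 0.3)
]
spread = per_position_spread(toy_class)
print([round(s, 3) for s in spread])
```

In the toy class the spread is smallest at the central residue and largest at the two terminal residues, which is the fan-like pattern the caption refers to.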
Figure 2.
(A) Effect of increasing the RMSD threshold. Shown are the number of BriX classes (circles) and the percentage of classified fragments (squares) as a function of the RMSD threshold (0.5–1.0 Angstrom) used during the clustering of fragments containing 7 residues. As expected, higher thresholds result in fewer fragment classes and more identified recurrent fragment structures, because the variation allowed within a class is larger and a class thus contains more elements. A threshold of 0.6 Angstrom is sufficient to classify more than half of all fragments of length 7. (B) Number of classes for varying fragment lengths. Shown is the number of classes as a function of the fragment length, clustered with a fixed RMSD threshold of 0.9 Angstrom (circles) and with an RMSD threshold proportional to the fragment length (squares), obtained by increasing the RMSD by 0.1 Angstrom per residue. In both curves, the number of classes increases with the length until a turning point is reached, after which the number of classes drops steeply. When a fixed RMSD is applied, this turning point clearly occurs at fragment length 11, at a level of 2,740 classes. (C) Percentage of classified fragments for varying fragment lengths. Shown is the percentage of classified fragments as a function of the fragment length, clustered with a fixed RMSD threshold of 0.9 Angstrom (circles) and with an RMSD threshold proportional to the fragment length (squares), obtained by increasing the RMSD by 0.1 Angstrom per residue. In both curves, the percentage of classified fragments decreases smoothly as larger fragment lengths are considered. When a proportional RMSD is applied, this decrease is less steep, resulting in a classification percentage of more than 40%, compared with 26% for the fixed RMSD, at fragment length 14.
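The two threshold schemes compared in panels (B) and (C) can be written down explicitly. This is a sketch under one stated assumption: "proportional" is read here as 0.1 Angstrom multiplied by the fragment length, so that both schemes coincide at length 9. The caption does not give the anchoring length, and the function names are illustrative.

```python
def fixed_threshold(length, value=0.9):
    """Fixed RMSD threshold in Angstrom, independent of fragment length."""
    return value


def proportional_threshold(length, per_residue=0.1):
    """Length-dependent RMSD threshold in Angstrom.

    Assumption: threshold = per_residue * length (0.1 Angstrom per residue).
    Rounded to one decimal to avoid floating-point noise such as 0.7000000001.
    """
    return round(per_residue * length, 1)


# Compare both schemes over an illustrative range of fragment lengths.
for length in range(5, 15):
    print(length, fixed_threshold(length), proportional_threshold(length))
```

Under this reading, fragments longer than 9 residues are clustered with a looser threshold in the proportional scheme than in the fixed one, which is consistent with the less steep decline in classified fragments reported for panel (C).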
Figure 3.
Structural hierarchy of classes based on RMSD distance.
Each node is represented by a DSSP logo (generated using WebLogo [47]) together with the percentage of BriX classes (in black) and fragments (in red) it contains. At the second level, the hierarchical clustering is able to distinguish the two major secondary structure elements: strands and helices. These branches are further partitioned into loops and small turns. Notable is the difference in content between the pure secondary structure nodes (k and p) at the bottom level of the tree. Although node k contains 12.2% of all BriX classes, it represents only 19.8% of the fragments of the WHAT IF set. Node p, in contrast, embodies 27.8% of the fragment space while holding only 3.4% of the BriX classes. This discrepancy shows that the stronger structural constraints imposed on helices result in fewer and larger helical classes than the strand classes created with the same threshold.
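The size discrepancy between nodes k and p can be made concrete with a short calculation. The sketch below takes the percentages exactly as quoted in the caption and divides the fragment share by the class share, which gives the average class size in a node relative to the average class size over the whole tree; the dictionary layout and function name are illustrative.

```python
# Percentages as quoted in the caption for the two pure nodes.
nodes = {
    "k": {"classes_pct": 12.2, "fragments_pct": 19.8},
    "p": {"classes_pct": 3.4, "fragments_pct": 27.8},
}


def relative_class_size(node):
    """Share of fragments divided by share of classes.

    A value of 1.0 means the classes in the node are of average size;
    larger values mean larger-than-average classes.
    """
    return node["fragments_pct"] / node["classes_pct"]


for name, node in nodes.items():
    print(name, round(relative_class_size(node), 2))

# How many times larger the classes of node p are than those of node k.
ratio = relative_class_size(nodes["p"]) / relative_class_size(nodes["k"])
print("p / k:", round(ratio, 2))
```

With these figures, classes in node p are on average roughly five times larger than classes in node k.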
Figure 4.
BriX statistics with regard to secondary structure content.
(A, B) Effect of secondary structure on the classification. The plots show data for classes consisting of one secondary structure element, i.e., pure helical (red), strand (blue), turn (green), and loop (orange) classes. The data selection was based on the fragments or fragment classes having an overall DSSP content of more than 80% in one of these 4 structural elements. Shown is the percentage of classified fragments as a function of an increasing distance threshold. Although the vast majority of helical fragments were found to be recurrent (A), the number of helical classes is low compared to the number of strand classes (B). Because of the stabilizing hydrogen bonds, helices do not allow much variation, resulting in few large BriX classes. The variable character and infrequent occurrence of loops and turns are the main reasons for the small number of recurrent structures and the poor classification results. (C) Classification results for the Astral40 validation test. The BriX fragment classification obtained from the WHAT IF globular structure set was used to classify fragments generated from the Astral40 structures. Experiments evaluating the effect of an increasing threshold on the percentage of classified fragments were repeated for the full Astral40 set (open circles) and for the Astral40 structures of the major SCOP classes (all α [diamonds], all β [triangles], α/β [closed circles], and α+β [squares]). The initial classification results for the WHAT IF generated fragments (open squares) are shown for reference. The full Astral set follows a classification pattern similar to that of the WHAT IF set, showing that the latter gives a good representation of protein structures in general. The higher classification rate of helical proteins points to a lower structural variation within these structures.
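The 80% selection rule for "pure" fragments can be sketched as follows. The mapping from DSSP one-letter codes to the four elements (H to helix, E to strand, T to turn, "-" or blank to loop) is an assumption made for illustration; the caption does not state which DSSP codes were grouped under each element, and `dominant_element` is a hypothetical helper.

```python
# Assumed mapping from DSSP codes to the four elements named in the caption.
ELEMENT_OF = {"H": "helix", "E": "strand", "T": "turn", "-": "loop", " ": "loop"}


def dominant_element(dssp, cutoff=0.8):
    """Return the element covering more than `cutoff` of the DSSP string.

    Returns None when no single element exceeds the cutoff, i.e. the
    fragment is not considered pure under this rule.
    """
    if not dssp:
        return None
    counts = {}
    for code in dssp:
        element = ELEMENT_OF.get(code)  # codes outside the mapping are ignored
        if element is not None:
            counts[element] = counts.get(element, 0) + 1
    for element, n in counts.items():
        if n / len(dssp) > cutoff:
            return element
    return None


print(dominant_element("HHHHHHH"))  # all helix
print(dominant_element("EEEEEE-"))  # 6 of 7 residues in strand
print(dominant_element("HHHEEE-"))  # mixed content
```

Note that the cutoff is strict: a 5-residue fragment with exactly 4 helical residues (80%) is not selected.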
Figure 5.
Reconstruction of human protein backbones using BriX classes.
(A, B) Local fit approximation for the reconstruction of the set of human protein structures: some examples. Shown are the backbones (in red) of α G25K GTP-binding protein (A) and β human C-reactive protein (B), fully covered with BriX classes (in green). The covering algorithm selected 35 and 40 redundancy-filtered fragment classes to describe the respective structures. (C, D) Global fit approximation for the reconstruction of the set of human protein structures: some examples. Shown are backbone traces of α G25K GTP-binding protein (C) and β human C-reactive protein (D). The target proteins are shown in red and the approximations in green. The overall RMSD is 0.4542 Angstrom and 0.5614 Angstrom, respectively.
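For reference, the overall RMSD between a target backbone and its approximation is the square root of the mean squared distance between paired atoms. The sketch below assumes both coordinate sets are already superposed and list the same atoms in the same order; it does not perform the superposition itself, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) tuples, assumed to be superposed and paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(
        (a[k] - b[k]) ** 2 for a, b in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(total / len(coords_a))


# Toy example: the approximation is displaced by 0.5 Angstrom along y.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
approximation = [(x, y + 0.5, z) for x, y, z in target]
print(rmsd(target, approximation))
```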
Table 1.
Coverage of human proteins with BriX classes.
Figure 6.
Presence of structural switches within groups of fragments containing identical residue sequences.
(A) The effect of the fragment length on the structural variation. Shown is the percentage of identical sequence pairs as a function of the structural distance between them for fragments of length 5 (red), 9 (blue), and 13 (green) in the Astral40 dataset. The main histogram clearly shows the tendency of smaller fragments to manifest large structural variation. The smaller plot is the result of carrying out the hierarchical agglomeration process on nearly 40,000 sequences in which this variation was recognized. The clustering considered two different distance thresholds: 1.5 Angstrom (red) and 2.0 Angstrom (blue) RMSD. The plot shows that for the vast majority of these sequences, 2 structural groups can be identified. (B) Example of structural differences for one amino acid sequence. The sequence AAVGL can adopt both a strand (left) and a helix (right) conformation. The strand conformation is present in the Antigen 85-C protein (structure 1DQZ) and starts at residue number 119. The helical conformation is taken from the 2,2-dialkylglycine decarboxylase protein (1D7U), starting at residue number 311. (C) Amino acid usage in plastic sequences. Shown is the frequency of amino acids occurring in sequences that only allow small structural jumps, resulting in tiny variations of a certain conformation (in red), and in sequences where these jumps are larger, resulting in drastic structure switches (in blue). The green bars indicate the presence of the respective amino acids in fragments that were left unclassified in BriX classes because of their irregular character. Three groups can be distinguished: amino acids promoting (1) a single, well-defined regular structure (such as Tryptophan, Tyrosine, Phenylalanine, Cysteine, Asparagine, and Methionine), (2) several regular structures or structural jumps (such as Alanine, Leucine, and Valine), and (3) irregular structures (such as Glycine and Proline).
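Counting the structural groups of one sequence at a given distance threshold, as in the inset of panel (A), can be sketched with a small agglomerative clustering routine. The linkage criterion is not stated in the caption; complete linkage is assumed here purely for illustration, and the distance matrix is invented toy data rather than measured RMSD values.

```python
def agglomerate(dist, threshold):
    """Agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters until the closest pair is
    further apart than `threshold`. Complete linkage (an assumption): the
    distance between clusters is that of their two most distant members.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # no remaining pair is close enough to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy RMSD matrix (Angstrom) for five occurrences of one sequence:
# 0 and 1 resemble each other, 2 and 3 resemble each other, 4 is near 2 and 3.
toy_dist = [
    [0.0, 0.4, 3.0, 3.2, 3.1],
    [0.4, 0.0, 3.1, 3.4, 3.3],
    [3.0, 3.1, 0.0, 0.5, 1.8],
    [3.2, 3.4, 0.5, 0.0, 1.7],
    [3.1, 3.3, 1.8, 1.7, 0.0],
]
groups_tight = agglomerate(toy_dist, 1.5)
groups_loose = agglomerate(toy_dist, 2.0)
print(len(groups_tight), groups_tight)
print(len(groups_loose), groups_loose)
```

In this toy case the stricter threshold leaves the fifth occurrence in a group of its own, while the looser threshold merges it with occurrences 2 and 3, leaving two structural groups.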
Table 2.
Representation of SCOP classes in the WHAT IF structure set.