Figure 1.
Capsid shells and the folded topology of a typical capsid protein.
A) Representative icosahedral viral capsid structures of varying sizes. Satellite Tobacco Mosaic Virus, a T = 1 virus, has a radius of 8.8 nm, and Paramecium bursaria Chlorella virus 1 (PBCV-1), a pT = 169 virus, has a radius of 92.9 nm. Here pT stands for ‘pseudo T number’, which simply means that the subunits are not chemically identical (their primary sequences differ). These protein shells are large, in that they are assembled from tens to hundreds of protein monomers, and they are highly symmetrical. B) The signature jelly-roll fold of viral capsid proteins, with 8 β-strands forming two antiparallel sheets. The wedge or trapezoidal shape of this fold immediately reveals six flat surfaces for monomer-monomer interaction: the two sides, the two loop ends, the top and the bottom. The prevalence of the jelly-roll fold among capsid proteins might be related to its relative ease of tiling.
Figure 2.
Comparison in structural fold space of capsid proteins and non-capsid ones.
Capsid proteins form large, highly symmetric protein shells (left), while generic proteins form other types of complexes (right), exemplified here by an RNA polymerase elongation complex. The overlap between the structural space of viral capsid proteins and that of generic proteins represents the set of non-capsid ‘relatives’ of capsid proteins. The figure is for illustration purposes only and is not drawn to scale.
Figure 3.
The density distribution of the lengths of non-capsid proteins is shown in pink, and that of capsid proteins in blue. Viral capsid proteins appear to have larger domains overall than their cellular counterparts, with a few exceptionally complex domains exceeding 600 residues. A cutoff of 600 residues was later applied so that the two sets being examined are of comparable sizes.
Figure 4.
Clustering to find representative capsid folds.
Shown here are all pairwise distances between members of the same cluster (grey) and between members of different clusters (blue). The partitioning was chosen so that each cluster is maximally homogeneous, with no two members of the same cluster more than 0.6 apart.
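The legend does not name the clustering algorithm. As a minimal sketch, assuming complete-linkage agglomerative clustering (which by construction keeps every within-cluster distance at or below the cutoff), the partition and the two distance populations plotted here could be obtained as follows. The distance matrix is a toy example for five hypothetical domains, and the names `linkage_distance` and `complete_linkage` are illustrative only.

```python
from itertools import combinations

# Toy symmetric distance matrix for five hypothetical domains (not real data).
dist = [
    [0.00, 0.20, 0.50, 0.90, 0.80],
    [0.20, 0.00, 0.30, 0.85, 0.90],
    [0.50, 0.30, 0.00, 0.70, 0.95],
    [0.90, 0.85, 0.70, 0.00, 0.40],
    [0.80, 0.90, 0.95, 0.40, 0.00],
]
CUTOFF = 0.6  # maximum allowed distance within a cluster, as stated in the legend


def linkage_distance(a, b):
    """Complete linkage: largest distance between a member of a and a member of b."""
    return max(dist[i][j] for i in a for j in b)


def complete_linkage(n, cutoff):
    """Assumed method: merge the closest pair of clusters until a merge would exceed cutoff."""
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage_distance(clusters[pq[0]], clusters[pq[1]]),
        )
        if linkage_distance(clusters[p], clusters[q]) > cutoff:
            break  # every remaining merge would violate the cutoff
        clusters[p] += clusters[q]
        del clusters[q]  # p < q, so index p is unaffected by the deletion
    return clusters


clusters = complete_linkage(len(dist), CUTOFF)
label = {i: c for c, members in enumerate(clusters) for i in members}

# Split all pairwise distances into the two populations shown in the figure.
pairs = list(combinations(range(len(dist)), 2))
within = [dist[i][j] for i, j in pairs if label[i] == label[j]]
between = [dist[i][j] for i, j in pairs if label[i] != label[j]]

print(clusters)
print(max(within), min(between))
```

With complete linkage the cutoff is a hard guarantee on cluster diameter, which is why it is used in this sketch; single or average linkage would not ensure that no two members are more than 0.6 apart.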
Figure 5.
The 56 representative capsid folds.
Domains within each cluster are superimposed on one another to show the quality of the structural alignment, with the number of members in each cluster indicated. The prevalence of singleton clusters reflects the scarcity of structural data for many viral families.
Figure 6.
Capsid proteins are structurally distant from generic proteins.
Each curve plots the empirical cumulative distribution of the distances from each protein in a set of 56 to its nearest neighbor in the complementary set. The curve for the capsid set versus the non-capsid proteins is shown in blue, and the curves from the 10,000 permutation tests are shown in grey. The average of the 10,000 permutation curves is shown in red. The capsid set is clearly farther from its complement than would be expected by chance.
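The construction of these curves can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 12 items on a line, the set size of 3, and the 200 permutations are toy stand-ins for the real structural distances, the set of 56 and the 10,000 permutations, and the function names are hypothetical.

```python
import random

# Toy 1-D coordinates: items 0-2 play the role of the set of interest,
# sitting far from the remaining nine items (not real data).
coords = [0.0, 0.1, 0.2] + [1.0 + 0.1 * k for k in range(9)]
dist = [[abs(a - b) for b in coords] for a in coords]


def nearest_neighbor_distances(members):
    """Distance from each member to its closest item in the complementary set."""
    inside = set(members)
    outside = [j for j in range(len(dist)) if j not in inside]
    return [min(dist[i][j] for j in outside) for i in members]


def ecdf(values, x):
    """Empirical cumulative fraction of values that are <= x."""
    return sum(v <= x for v in values) / len(values)


observed_set = [0, 1, 2]
observed = nearest_neighbor_distances(observed_set)

# Permutation curves: redraw a random set of the same size and repeat.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_perm = 200
permuted = [
    nearest_neighbor_distances(rng.sample(range(len(dist)), len(observed_set)))
    for _ in range(n_perm)
]

# Evaluate every curve at one threshold; a plot would sweep x over a grid.
x = 0.5
observed_at_x = ecdf(observed, x)
mean_permuted_at_x = sum(ecdf(p, x) for p in permuted) / n_perm

print(observed_at_x, mean_permuted_at_x)
```

A set that lies far from its complement has a curve shifted to the right, so at any given threshold its cumulative fraction is lower than that of the permuted sets.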
Table 1.
The 21 folds covered by structural relatives of capsid proteins.
Figure 7.
Statistical significance of test statistic.
None of the 10,000 permutations resulted in 210 or fewer shared folds between the set of 56 protein domains and its complement set. The p-value of our test statistic is therefore less than 0.0001 (1 in 10,000), an upper bound set by the number of permutations.
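The arithmetic behind this bound can be sketched as below. The function name and the toy numbers are hypothetical; only the logic (count the permutations at or below the observed statistic and, if there are none, report 1/N as an upper bound) follows the legend.

```python
def lower_tail_p_value(observed, permuted):
    """One-sided permutation p-value for 'as few or fewer shared folds'.

    Returns (p, is_upper_bound). If no permuted statistic is <= observed,
    the p-value can only be bounded above by 1 / number of permutations.
    """
    n = len(permuted)
    hits = sum(s <= observed for s in permuted)
    if hits == 0:
        return 1 / n, True
    return hits / n, False


# Toy example: 10,000 permuted statistics, all larger than the observed one.
toy_permuted = [10 + (k % 7) for k in range(10_000)]
toy_observed = 5

p, is_upper_bound = lower_tail_p_value(toy_observed, toy_permuted)
print(p, is_upper_bound)
```

Reporting an upper bound rather than zero reflects the finite number of permutations: more permutations would be needed to resolve a smaller p-value.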
Table 2.
Seven functional classes of proteins that we studied were found not to be significantly distinguished by their folded topology.