Periodic Table of Virus Capsids: Implications for Natural Selection and Design

Background: For survival, most natural viruses depend upon the existence of spherical capsids: protective shells of various sizes composed of protein subunits. So far, general evolutionary pressures shaping capsid design have remained elusive, even though an understanding of such pressures may help in rationally impeding the virus life cycle and designing efficient nano-assemblies.

Principal Findings: This report uncovers an unprecedented and species-independent evolutionary pressure on virus capsids, based on the notion that the simplest capsid designs (or those capsids with the lowest “hexamer complexity”, C h) are the fittest, which was shown to be true for all available virus capsids. The theories result in a physically meaningful periodic table of virus capsids that uncovers strong and overarching evolutionary pressures, while also offering geometric explanations for other capsid properties (rigidity, pleomorphy, auxiliary requirements, etc.) that were previously considered to be unrelatable properties of the individual virus.

Significance: Apart from describing a universal rule for virus capsid evolution, our work (especially the periodic table) provides a language with which highly diverse virus capsids, unified only by geometry, may be described and related to each other. Finally, the available virus structure databases and other published data reiterate the predicted geometry-derived rules, reinforcing the role of geometry in the natural selection and design of virus capsids.

A Endo angle propagation and termination rules.
A result of subunit quasi-equivalence, introduced in Ref. [1] (and discussed in Fig. 1B), and of the trapezoidal subunit shape (a ubiquitous capsid subunit shape [2] present in viruses infecting all domains of life [4]) is that the inter-subunit angles (subunit-subunit dihedral angles) originating from the pentamer (endo angles, introduced earlier [3]) must propagate through the adjacent hexameric lattice (depicted as arrows in Fig. 1B), in what we call endo angle propagation. Although endo angle propagation has been shown to affect neighboring hexamer shape within the natural canonical capsid [2], the interaction/interference of multiple propagations within the confines of a capsid has not been completely investigated and is discussed in Fig. S1.

Figure S1: We define endo angle rules for the three smallest capsids possessing hexamers (T = 3, 4, 7) within a "face" (a triangular facet containing hexamers and three adjacent pentamers). An endo angle (black ray) propagating from the shaded subunit-subunit interface belonging to a pentamer (A) is challenged and terminated by another endo angle (B, red dotted ray) propagated from a neighboring pentamer (not completely visible for T = 4), resulting in hexamer shapes and capsid endo angle features (C) that are h and k specific. In particular, the differences in h-k relationships ensure hexamers of distinct shapes per capsid size (distinctly colored).

B Canonical vs. noncanonical capsids
All our specific predictions are directed towards canonical capsids, where subunits (within any given capsid) are tilable and nearly invariant in shape [2]. This is because the consequence of introducing curvature into the shell is conveniently expressed as endo angle propagations [3], which then allows hexamer shapes to be precisely characterized (Section C). However, the fact that our predictions apply to all structurally characterized spherical capsids indicates that parallel constraints act on noncanonical capsid hexamers. It will be interesting to see the differences and similarities between the constraints acting on canonical and noncanonical capsids.

C Defining hexamer complexity C h
Hexamer complexity C h is the minimal number of distinct hexamer shapes that a canonical capsid [2] of specific size (defined by h, k or T ) contains. The possible hexamer shapes that a canonical capsid may possess are shown in Fig. S2B (derived by inspecting Fig. 2 and assuming the working of endo angle propagation and termination rules in Fig. S1).
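As a minimal sketch of how a capsid size is specified, the Python snippet below computes the Caspar-Klug triangulation number T = h² + hk + k² from h and k and assigns a morphological class. The function names are hypothetical, and the class labels are an assumption: only "h = k" is explicitly tied to class 3 in this text, while class 1 as k = 0 and class 2 as h > k > 0 are inferred from the smallest members of each class (T = 4 and T = 7).

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + hk + k^2."""
    return h * h + h * k + k * k


def capsid_class(h: int, k: int) -> int:
    """Morphological class of an (h, k) capsid, with h >= k by convention.

    ASSUMPTION: class 1 is k == 0 and class 2 is h > k > 0; only
    class 3 (h == k) is stated explicitly in the text.
    """
    if h < 1 or k < 0 or k > h:
        raise ValueError("expected h >= 1 and 0 <= k <= h")
    if k == 0:
        return 1
    if h == k:
        return 3
    return 2


# The three smallest capsids possessing hexamers, one per class.
for h, k in [(1, 1), (2, 0), (2, 1)]:
    print(f"h={h}, k={k}: T={triangulation_number(h, k)}, "
          f"class {capsid_class(h, k)}")
```

Keeping h ≥ k avoids counting mirror-image (h, k) pairs as separate sizes; that convention is also an assumption of the sketch.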

D Counting hexamer shapes
Previously, we showed that different arrangements of endo dihedral angles (designated "e") among non-endo, or exo, angles (designated "x") in a hexamer define distinct hexamer shapes [3]. This assumption has been shown to hold for those natural canonical capsids that have afforded investigation [3]; specifically, we showed that the smallest capsids from each class (T = 3, 4, 7) possess distinct hexamer shapes, named in accordance with the hexamer coloring in Figs. 2 and 3: red (exexex; "ruffled"), blue (exxexx; "wing shaped"), and yellow (exxxxx; "single-pucker") hexamer shapes, respectively [3]. These capsids possess the lowest C h of one. Larger capsids increase in C h due to the requirement of additional hexamer shapes, colored in Fig. 2 as green (xxxxxx; "flat") and cyan (e′xxe′xx; shaped as an "inverse wing" possessing inverse endo angles e′ whose acute angles face outward).

Figure S2: Hexamer shapes available to capsids. (A) Although planar endo angle constraints are able to freely propagate within hexamers (left), only one complete non-planar (or "endo") angle constraint/propagation may be present within a hexamer (collinear propagations not included). If two non-linear/non-parallel propagations meet, one must terminate at that meeting point, which means that multiple non-linear endo angles may exist within a single hexamer only if terminated at its center.

E Capsids with low C h are preferred

From Fig. S3, we can surmise that, for the range of T numbers observed (T = 1...219 and, for a more conservative/truncated range, T = 1...31), capsids with lower C h appear to be preferred, as evidenced by a shift to lower C h in the observed versus expected capsid distributions. Table S1 lists the first twelve capsid sizes (T) by class; those sizes displaying C h > 2 are indicated by boldface.
Figure S3: Capsids tend to prefer lower C h than expected. Plotted in each graph is C h versus observed (solid, black lines) and expected abundances (dotted, red lines) obtained from 119 capsids (A) and 52 family entries (B), each shown for the complete available capsid size range (T = 1...219; left) and a truncated range (right). The expected dataset assumed a uniform size (T) distribution for capsids in the displayed T-range. Family entries represent individual families, except for families displaying more than one capsid size, which were split to maintain one C h per family entry.

A major difference between the red and black graphs in Fig. S3 lies in the abundances of expected C h = 3 capsids, which mostly belong to the h > k > 1 regime. Specifically, as we move from the (n−1)th period to the nth period in the periodic table, the class 1 entries (where C h mostly equals 2) and class 3 entries (where C h mostly equals 4) each increase by 1, while the class 2 entries (where C h mostly equals 3) increase in a more-or-less arithmetic progression by (n − 1) (evident in Fig. 3C in the triangular shape of the class 2 group versus the linear shapes of the class 1 and class 3 groups, respectively).
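The period-by-period counts can be checked with a short enumeration. The sketch below is illustrative only and rests on two assumptions that this text does not state outright: that the nth period of the periodic table collects the capsids with h = n and 0 ≤ k ≤ n, and that class 1 means k = 0 while class 2 means h > k > 0 (class 3 as h = k is given). The function names are hypothetical.

```python
def capsid_class(h: int, k: int) -> int:
    """ASSUMED labels: 1 if k == 0, 3 if h == k, otherwise 2 (h > k > 0)."""
    if k == 0:
        return 1
    if h == k:
        return 3
    return 2


def period_class_counts(n: int) -> dict:
    """Number of new entries per class in the n-th period (assumed h == n)."""
    counts = {1: 0, 2: 0, 3: 0}
    for k in range(n + 1):
        counts[capsid_class(n, k)] += 1
    return counts


# Classes 1 and 3 gain one entry per period; class 2 gains n - 1.
for n in range(1, 6):
    print(n, period_class_counts(n))
```

Under these assumptions the class 2 group grows triangularly while the class 1 and class 3 groups grow linearly, which is the pattern the paragraph describes.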

F Observed capsid abundance ∝ 1/C h

Finally, excepting C h = 0 capsids (i.e., capsids that contain no hexamers, or T = 1 capsids), there is an inverse relationship between C h and observed capsid abundance (black lines in Fig. S3). The low observed abundance of C h = 0 capsids is expected, given that most virus families with true C h = 0 appear to be too small to accommodate enough genomic material to infect as a primary source (therefore, most true T = 1 capsids belong to "satellite viruses" that are only able to infect hosts preinfected by a primary infector, presumably since those virus capsids provide insufficient volume to contain an independent infectious genome). Here, the additional/stronger evolutionary impediment appears to be a preference for a lower-bounded genome size (i.e., a non-geometric preference imposing a constraint of C h > 0 may be overlaid with the inverse C h rule to obtain the observed, or black, graphs in Fig. S3).

Table S1: The distribution of capsid sizes into the three morphological classes described by the relationship between the capsid's h and k. The percentage abundance (A(%)) of capsids in the three classes was obtained from a collection of 118 non-redundant capsids belonging to 39 diverse capsid families.

G Is there a data-collection bias?
Here, we address the question: are our findings a result of a basic inability to sample structures of large C h, or do the data truly reflect our predictions? Fig. S3 (reflecting the rest of our data) was produced from a compilation of capsids obtained from (1) X-ray crystallography, whose prowess lies in obtaining high resolution capsid 3D structures of "small" sizes (e.g., T = 1...25), and (2) electron microscopy, which does not preclude determining the size or T number of large capsids (these can be obtained from simple electron micrographs, if not from 3D capsid reconstructions). Consequently, we argue that, if observable to a structural virologist, any new capsid of any size would not be far from finding a public domain home (thereby finding its way into our graphs). Thus we argue that our observed data do not reflect discrepancies in data collection as much as they lend credence to our geometric predictions.
Furthermore, even if capsid collection were size constrained, it would still not matter much, since our existence rules depend less on size than on h and k (e.g., although smaller than T = 25, the T = 19 capsid is expected to be higher in hexamer complexity and therefore lower in abundance, which is the case).

H Basic definitions
The Kronecker delta function (δ x) is integral to the formalisms that follow, and is therefore introduced here as a special topic. Specifically, δ x (or δ x,0) is an algorithm that outputs 1 if x = 0 and 0 otherwise, i.e.,

$$\delta_{x} = \delta_{x,0} = \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{otherwise.} \end{cases}$$

We can represent this algorithm by the limits or which may be used later on. We also utilize a convenient equation that produces a binary output after comparing two non-negative integers a and b: Some basic definitions: i.e., for all cases, Also, it follows that aδ a = 0 and
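For readers who prefer code, the single-argument Kronecker delta described here can be written as a one-line function. This is a minimal sketch with a hypothetical function name; it covers only the stated rule (1 if x = 0, 0 otherwise) and the stated identity aδ a = 0.

```python
def kronecker_delta(x: int) -> int:
    """Return 1 if x == 0 and 0 otherwise (delta_x, i.e. delta_{x,0})."""
    return 1 if x == 0 else 0


# The identity a * delta_a = 0 holds for every integer a:
# either a is 0, or delta_a is 0.
assert all(a * kronecker_delta(a) == 0 for a in range(-5, 6))
```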

I Obtaining endo propagation length φ h,k
Definition of endo angle propagation length φ h,k. It is the distance (in capsomers, including the originating pentameric angle) that the endo angle is allowed to propagate from a pentamer into the hexamers before being intercepted (or terminated). Please refer to Fig. S1 for a review of the endo angle propagation and termination rules. From Fig. S1 (and corroborated in Fig. 2), we can obtain the endo angle propagation length for a capsid of size h, k: which can be described as

Here, we obtain a mathematical/algorithmic expression for C h. We can treat the hexamer complexity C h as a sum of its components C h X, where X may be one of the five distinct hexamer shapes (i.e., X ∈ {W, R, S, F, I}), and C h X = 1 only if the hexamer shape "X" exists within the capsid (C h X = 0 otherwise). We now attempt to obtain the C h components for each hexamer shape.
Wing shaped (W). The presence of two linearly adjacent endo angles within a hexamer automatically indicates that wing shaped hexamers must exist within the capsid, since the only hexamer that can accommodate two such angles is the winged shape of profile exxexx [3] (we define a linearly adjacent angle set as a set of two angles within the hexamer at positions i and i + 3, where i = i + 6, indicating the cyclic nature of the angles). Therefore, we expect wing shaped hexamers when φ h,k > 1. So, the hexamer complexity contribution by the presence of a wing shaped hexamer will be

$$C^{h}_{W} = \begin{cases} 1 & \text{if } \phi_{h,k} > 1, \\ 0 & \text{otherwise.} \end{cases}$$

Single pucker shaped (S). We can define the closest distance (in capsomer units) between two adjacent pentamers (P h,k) as

$$P_{h,k} = h + k,$$

which is an interesting value, since it is also the maximum number of capsomers that the endo angle can propagate through, i.e.,

$$\phi_{h,k} \le P_{h,k}.$$

We can also show that if φ h,k ≥ P h,k /2, then the endo angles will form a complete/unbroken cage around the capsid (which is seen in classes 1 and 3). However, if we do not have "complete propagation", then we are guaranteed the existence of a single pucker hexamer, i.e.,

$$C^{h}_{S} = \begin{cases} 1 & \text{if } \phi_{h,k} < P_{h,k}/2, \\ 0 & \text{otherwise.} \end{cases}$$

Ruffle shaped (R). We also know that if h = k (class 3) then P h,k = 2φ h,k (because if h = k then 2φ h,k = 2h = h + k = P h,k) and three adjacent endo angles will terminate at the central hexamer, causing the presence of a hexamer of exexex profile and of ruffled shape, so

$$C^{h}_{R} = \begin{cases} 1 & \text{if } h = k, \\ 0 & \text{otherwise.} \end{cases}$$

Inverse-wing shaped (I). We know that the ruffled exexex profile is rigid [3], so even the exo (x) angle must remain constrained. Since this dihedral's acute angle faces the outside portion of the capsid, we call this special angle the inverse endo (e′) angle. Since inverse endo angles are constrained, they must propagate between any two ruffled hexamers, resulting in the formation of a special inverse-wing shape in large enough capsids (h, k > 1) containing ruffled hexamers (h = k), i.e., we have

$$C^{h}_{I} = \begin{cases} 1 & \text{if } h = k \text{ and } h > 1, \\ 0 & \text{otherwise.} \end{cases}$$

Flat shaped (F). Finally, we know that a capsid of large enough size (h > 2), irrespective of class, must possess hexamers that are generally unaffected by endo angle constraints and are therefore generally flat, so

$$C^{h}_{F} = \begin{cases} 1 & \text{if } h > 2, \\ 0 & \text{otherwise.} \end{cases}$$

Combining the above C h components, our resulting relationship for hexamer complexity will be

$$C^{h} = C^{h}_{W} + C^{h}_{S} + C^{h}_{R} + C^{h}_{I} + C^{h}_{F}.$$

K The number of hexamers N X

We list the number of hexamers N X per hexamer type X:

The list (Section L) of all virus capsids used in the abundance analysis is available in the file MannigeBrooks SI b.xls.
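The shape-by-shape conditions stated in the prose (wing shaped when φ h,k > 1; single pucker when propagation is incomplete, φ h,k < P h,k /2; ruffled when h = k; inverse wing when h = k with h, k > 1; flat when h > 2) can be collected into a short Python sketch. Two ingredients are assumptions rather than quotations from this text: the closed form used for φ h,k (taken here as h when k = 0 and as k otherwise, which is consistent with 2φ h,k = 2h for h = k and with C h = 1 for T = 3, 4, 7) and P h,k = h + k. All function names are hypothetical.

```python
def endo_propagation_length(h: int, k: int) -> int:
    """ASSUMED form of phi_{h,k}: h if k == 0, otherwise k (with h >= k)."""
    return h if k == 0 else k


def hexamer_complexity_components(h: int, k: int) -> dict:
    """Return the 0/1 contribution of each hexamer shape to C h."""
    if h < 1 or k < 0 or k > h:
        raise ValueError("expected h >= 1 and 0 <= k <= h")
    phi = endo_propagation_length(h, k)
    p = h + k  # ASSUMED closest pentamer-pentamer distance, in capsomers
    return {
        "W": int(phi > 1),             # wing shaped: phi > 1
        "S": int(2 * phi < p),         # single pucker: phi < P/2
        "R": int(h == k),              # ruffled: class 3 (h == k)
        "I": int(h == k and h > 1),    # inverse wing: h == k and h, k > 1
        "F": int(h > 2),               # flat: h > 2, irrespective of class
    }


def hexamer_complexity(h: int, k: int) -> int:
    """C h as the sum of its five shape components."""
    return sum(hexamer_complexity_components(h, k).values())


# Compare the T = 19 (h=3, k=2) and T = 25 (h=5, k=0) capsids.
print(hexamer_complexity(3, 2), hexamer_complexity(5, 0))
```

Under these assumptions the sketch reproduces the checks available in the text: T = 1 has no hexamers (C h = 0), the smallest capsid of each class has C h = 1, and T = 19 is more complex than the larger T = 25.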