Optimization of an Elastic Network Augmented Coarse Grained Model to Study CCMV Capsid Deformation

The major protective coat of most viruses is a highly symmetric protein capsid that forms spontaneously from many copies of identical proteins. Structural and mechanical properties of such capsids, as well as their self-assembly process, have been studied experimentally and theoretically, including modeling efforts by computer simulations on various scales. Atomistic models include specific details of local protein binding but are limited in system size and accessible time, while coarse grained (CG) models do get access to longer time and length scales but often lack the specific local interactions. Multi-scale models aim at bridging this gap by systematically connecting different levels of resolution. Here, a CG model for CCMV (Cowpea Chlorotic Mottle Virus), a virus with an icosahedral shell of 180 identical protein monomers, is developed, where parameters are derived from atomistic simulations of capsid protein dimers in aqueous solution. In particular, a new method is introduced to combine the MARTINI CG model with a supportive elastic network based on structural fluctuations of individual monomers. In the parametrization process, both network connectivity and strength are optimized. This elastic-network optimized CG model, which solely relies on atomistic data of small units (dimers), is able to correctly predict inter-protein conformational flexibility and properties of larger capsid fragments of 20 and more subunits. Furthermore, it is shown that this CG model reproduces experimental (Atomic Force Microscopy) indentation measurements of the entire viral capsid. Thus it is shown that one obvious goal for hierarchical modeling, namely predicting mechanical properties of larger protein complexes from models that are carefully parametrized on elastic properties of smaller units, is achievable.

: RMSD (nm) between A, B and C-type monomers.
C α atoms (nm) -initial atomistic structures (∆1-36) Dimers C α C α core (residues 50-178) AB vs. CC 0.213 0.185 The CCMV capsid is constructed from 180 copies of the same protein. Since in the T = 3 capsid geometry three symmetrically inequivalent locations for these proteins exist, their folds in these three sites -labeled A, B, and C -are slightly different. Their RMSD deviations (for the entire protein or only its central core) are compiled in Tab. S1; the deviations are visualized in an overlay of the three x-ray structures in Fig. S1 (taken from Speir et al., J. Virol. 80 (7), 3582-3591 (2006)). The biggest differences for the wild type monomer occur at the N-terminal tails, but in our study we use the ∆1-36 mutant where these parts are mostly clipped (the remaining parts at the N-terminal tail are found in the lower left corner in Fig. S1). For these mutants the most relevant differences occur in the C-terminal tails (lower right corner in Fig. S1), which determine also the interface between and the relative orientation of the monomers in a dimer complex. Tab. S2 and Fig. S2 repeat this analysis for dimers, i.e. provide a comparison between the two inequivalent types of dimers (AB and CC type) that occur in the CCMV capsid. Let us add one more comment on the N-terminal structural differences: These manifest at the fivefold and three-fold symmetry sites where proteins come together: While three B-and three C-monomers join their N-terminal tails into a β-barrel at the hexameric (= three-fold symmetry) center, there is no predominant structural motif at the pentameric symmetry site where five A-type monomers meet (for a recent computational study see Bereau et al., J. Chem. Theory. Comput. 8(10), 3750-3758 (2012)). The differences in the folds would start to appear at the cleavage site of the mutant, so constraining the tails there by an elastic network to a structure of the B-or C-type monomer would prevent a transition to the A-type monomer. Figure S1: Aligned monomers of CCMV (∆1-36). Chain A (blue), chain B (red), chain C (green). Figure S2: Aligned dimers of CCMV (∆1-36). AB type dimer (red/blue), CC type dimer (green). The alignment is enforced on the left side, by optimizing the superposition of an A-type and a C-type monomer. Fig. 2C and D of the main text considers a "twist angle" between the two monomers of a dimer. Fig. S3 illustrates its definition. A plane is defined by selecting the center of mass of 3 clusters within the rigid core of each monomer. The clusters are defined as the center of mass of the C α atoms of following residues: Cluster 1 = [41][42][43][46][47][48][49][111][112][113][124][125][126][127][128][129][130] Cluster 3: 80-87 (Note: the residue numbers are given with respect to the simulated deletion mutant starting with residue 37, therefore residue 41 in our simulated structure corresponds to residue 87 in the wild type structure). The angle between the two orthogonal plane vectors defines the twist angle. Figure S3: Twist angle. The two monomers are colored blue (type A) and red (type B).

Comparison of RMSF values for AB dimer simulations
The construction of a network influences not only the stiffness of individual monomers, but also those of the dimers-especially since the large scale motions at the hinges are strongly affected. Fig. S4 illustrates the extent of these fluctuations in three independent atomistic simulations of an AB dimer: one 400 ns long (black) and two 100 ns long (red and magenta); notice that the results are compatible with each other, meaning we have indeed reliable access to the RMSF values. We subsequently used the 400 ns trajectory as our reference. Figure S5 compares RMSF values for an AB dimer, obtained in an atomistic reference trajectory, with values obtained from coarse grained simulations, using the ELNEDYN parameters K500 and K200. The K500 parameters clearly make the dimer too stiff, but even the K200 parameters miss several important flexible regions. As the Fig. S9 (right) in the following section will show, the iterated IDEN network does a much better job in rescuing the magnitude of these RMSF values. Figure S4: RMSF values in three independent atomistic simulations of an AB dimer. The figure shows the 400 ns long reference simulation that was used for the iteration procedure (black), as well as two independent 100 ns long simulations (red and magenta). Variance and correlation cutoff for the IDEN elastic network Figure S6 shows the variance of C α distances in the atomistic reference simulation between all C α pairs for which the standard ELNEDYN criterion would result in the creation of a network bond (pairs are within 0.9 nm and separated by at least 2 residues along the sequence). Observe that some of these distances fluctuate quite substantially in the atomistic simulation, so it seems unwise to permanently link their corresponding C α atoms. Our variance cutoff choice of 0.025 nm 2 , which corresponds to a standard deviation of about 1.6Å (or 0.176 R C ), ensures that no bonds are placed between too strongly fluctuating atom pairs. Figure S6: Variance of C α bond lengths extracted from atomistic reference. Choosing a cutoff at 0.025 nm 2 , such that no bonds are set between C α pairs with a bigger variance, ensures that pairs with strongly fluctuating distances are not linked by a bond. Figure S7: Influence of the correlation cutoff c ij > c min on the density of the elastic network. All C α pairs within 0.9 nm and separated by at least 2 residues in the sequence are considered; c min takes the values 0.8, 0.7, and 0.6 in panels (A), (B), and (C), respectively.
In addition to the variance of distances, we also use the correlation coefficient c ij between the positions of two C α atoms, after mutual alignment. Figure S7 shows how different choices for the correlation cutoff influence the bond definition. Generally, a high correlation cutoff (e.g. c min = 0.8, see Fig. S7A) is restrictive and leads to only few C α pairs being bonded because of correlated motions, while a low correlation cutoff (e.g. c min = 0.6, see Fig. S7C) is generous and results in many more bonds. Note that the criterion of low distance variances already introduces beads in rigid regions with low distance fluctuations. Thus the covariance criterion is mostly important for regions with higher mobility. Here one wants to have pairs bonded that move in a comparatively correlated fashion. We selected c min = 0.7 as a compromise. Notice, though, that the physics of the situation permits a range of values without too much affecting the result: As we permit more and more bonds (by reducing c min ), their correlation drops and presumably their distance fluctuations rises, implying that the bonds we end up placing would after the iteration procedure be not particularly strong.

Convergence criterion for the iterative optimization of IDEN
During iterative optimization of the network we monitor the difference between the bond fluctuations σ 2 d (i, j) in the atomistic reference and the bond fluctuations σ 2 d,CG,n (i, j) in the coarse grained simulation (at iteration n). The difference D n (i, j) = σ 2 d (i, j) − σ 2 d,CG,n (i, j) is thus a variable of the bond pairs (i, j) and during the iteration process we average over all bond pairs and monitor D n . In Fig. S8 we plot D n and its standard deviation σ Dn as a function of iteration number n, normalized by the value at n = 0 (i.e., the original value before we have iterated). The absolute numbers can be found in Tab. S3. We see that both D n and σ Dn drop from their initial values and approach equilibria beyond which further iterations do not substantially improve the result anymore. Based on such plots one can decide when one stops the iterative refinement. We decided to use the values obtained at step n = 11 for our subsequent analysis; both mean and deviation appear to have converged, and the mean is actually very low. Figure S8: Iteration steps with direct scaling. Defining the random variable D n = σ 2 d (i, j)−σ 2 d,CG,n (i, j) as a measure for the difference between all-atom and coarse grained network fluctuations, we can monitor its (scaled) mean D n / D 0 and (scaled) standard deviation σ Dn /σ D0 as a function of iteration number n in order to monitor convergence.
As the iteration proceeds, both the variance between bond lengths and the RMSF values of residues, as measured in the coarse grained simulation, approach the values from the atomistic reference trajectory that we intend to reproduce. Fig. S9 illustrates that this works very well by comparing the atomistic reference with the CG data before iteration and the CG data after 11 iterations. Table S3: Iteration steps with direct scaling. Mean D n and standard deviation σ Dn of the difference variable D n = σ 2 d (i, j) − σ 2 d,CG,n (i, j), as obtained over the ensemble of all bonds (i, j). Notice that n = 0 corresponds to the starting situation before any iterations have been performed.

Additional analysis of the ROMs
The relative orientation maps (ROMs) in the main text are a way to visualize the relative motion of a dimer, by monitoring the projections of some chosen dimer axis onto the orthogonal plane of another chosen axis of the second dimer. Unfortunately, the three-dimensional geometry is very hard to conceptualize in a concrete way, but it turns out that this is not particularly important, as long as one can ensure that the orientational spread of all independent projections is comparable between atomistic target and coarse grained simulation. Should one wish to further quantify the ROMs, then there are several possibilities. One is to consider the ROM distribution functions in polar coordinates and marginalize them into radial and angular distributions, and for the latter one can easily determine mean and standard deviation. This is done in Tab. S4 for the X-and the Z-projection. However, the same procedure does not work well for the Y -projection, because it is centered, and as such an angular spread does not make sense. An alternative approach (which abandons any approximate angular symmetry between the two sets of angles but works under all conditions) is to marginalize over the cartesian (horizontal and vertical) projections of the ROM, as illustrated in Fig. S10. Tab. S5 summarizes these latter marginalizations for dimers in the larger POD+CC simulation, corresponding to Figs. 7 and 8 of the main manuscript. In that case, the five AB dimers are central (lying in the "inner circle"), while the five CC dimers are peripheral (lying in the "outer circle").    Figure S10: Ilustration for the cartesian marginalization of the ROMs presented in Table S5. Mean and standard deviation of the ROM distribution function are decomposed in the cartesian coordinates of the projected plane.