Effect of the Crystal Environment on Side-Chain Conformational Dynamics in Cyanovirin-N Investigated through Crystal and Solution Molecular Dynamics Simulations

Side chains in protein crystal structures are essential for understanding biochemical processes such as catalysis and molecular recognition. However, crystal packing could influence side-chain conformation and dynamics, thus complicating functional interpretations of available experimental structures. Here we investigate the effect of crystal packing on side-chain conformational dynamics with crystal and solution molecular dynamics simulations using Cyanovirin-N as a model system. Side-chain ensembles for solvent-exposed residues obtained from simulation largely reflect the conformations observed in the X-ray structure. This agreement is most striking for crystal-contacting residues during crystal simulation. Given the high level of correspondence between our simulations and the X-ray data, we compare side-chain ensembles in solution and crystal simulations. We observe large decreases in conformational entropy in the crystal for several long, polar and contacting residues on the protein surface. Such cases agree well with the average loss in conformational entropy per residue upon protein folding and are accompanied by a change in side-chain conformation. This finding supports the application of surface engineering to facilitate crystallization. Our simulation-based approach demonstrated here with Cyanovirin-N establishes a framework for quantitatively comparing side-chain ensembles in solution and in the crystal across a larger set of proteins to elucidate the effect of the crystal environment on protein conformations.


Introduction
Protein side-chain conformations observed by X-ray crystallography play a key role in understanding biological function, such as catalysis and molecular recognition, and in identifying a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 of an arginine residue (Arg76) near the di-mannose biding site of P51G-m4-CVN that was proposed to play an important role in ligand binding [17]. Through crystal and solution MD simulation, we found that Arg76 was trapped in a single conformation in the crystal, which did not correspond to the dominant solution state. This observation brought into question the putative role of Arg76 in ligand interaction. It also prompted us to perform additional crystal and solution simulations of CVN in order to assess the effect of the crystal environment on conformation and dynamics across all of the side chains in the protein.
In this work, we assessed the effect of the crystal environment by quantitatively comparing side-chain conformational dynamics of the P51G-m4-CVN:di-mannose complex in solution and crystal MD simulations. Side-chain ensembles obtained from simulation show reasonable agreement with X-ray data, especially for residues participating in crystal contacts during crystal simulation. The simulations support the use of surface engineering to facilitate protein crystallization and provide insight into the influence of crystal packing on side chains in CVN. Combined with crystal simulation protocols, our quantitative analyses performed in this study demonstrate a practical framework for obtaining a broader view of crystal-packing effects on side-chain conformational dynamics by considering a larger set of proteins.

MD Simulation
A detailed description of the crystal and solution simulation setup for the P51G-m4-CVN: di-mannose complex (PDB ID: 2RDK) can be found in reference [18]. The crystal structure comprises two independent chains (A and B), both of which have two copies in the unit cell (i.e., four total chains: A1, A2, B1, and B2; Fig 1b). In short, all simulations were performed  [17]. The secondary structure of molecules A and B are shown in green and yellow cartoon representation, and the di-mannose ligands are depicted as grey spheres. Residues participating in crystal contacts with neighboring molecules are shown as sticks and dots, colored in blue and magenta for molecules A and B, respectively. (b) The simulated triclinic CVN unit cell comprises four chains: two chain A's (blue and green) and two chain B's (orange and yellow). One A/B pair (e.g., the green and yellow chains) forms the asymmetric unit in the original monoclinic crystal structure. doi:10.1371/journal.pone.0170337.g001 using the Amber10 package [19,20]. The FF99SB [21,22] and GLYCAM06 [23] force-field parameters were employed for the protein and ligand, respectively. Both solution and crystal simulations were performed with explicit water molecules. Crystal simulation does not have a bulk solvent region, which is required for solution simulation. Periodic boundary conditions for crystal simulation were set to coincide with the unit cell geometry. The Particle Mesh Ewald method [24] was used for calculating non-bonded interactions with a cutoff of 10 Å for direct calculations. An extensive equilibration phase was first performed for the crystal simulations to ensure an appropriate density of the system [18]. We performed production simulations in solution and in the crystal environment for a total of 32 ns each in the NPT and NVT ensembles, respectively. For the crystal simulations, the presence of multiple copies of proteins with identical packing environments in the unit cell (two chain A's and two chain B's) results in increased sampling (128 ns total-64 ns for both chains A and B). The temperature was maintained at 300 K (which closely corresponds to the crystal growth temperature of 298 K) [17]. All trajectories were processed with ptraj [20] and visualized with VMD [25].

Side-Chain Conformations
Side-chain conformations were defined by rotameric states, as listed in the Penultimate Rotamer Library [26]. We measured side-chain torsion angles sampled during simulation using an in-house edited version of the rotamer program that is part of CCP4 [27]. For each residue, the torsion angles were mapped to the rotamer library to identify the nearest rotameric state. The rotameric state IDs used in the following data correspond to the rotamer tables given in http://www.ccp4.ac.uk/html/rotamer_table.html. Rotamers for each residue can be viewed and downloaded at http://kinemage.biochem.duke.edu/databases/rotkins.php [26]. Conformations were defined for all rotameric residues (i.e., not Ala and Gly) for snapshots extracted every 10 ps over the last half of the solution and the crystal simulations. Disulfide-bonded cysteins were also excluded from analysis. We constructed normalized rotamer probability distributions for the analyses described below. In the case of the crystal simulations, rotameric data was combined for the identical chains in the unit cell (A1/A2 and B1/B2). In the P51G-m4-CVN crystal structure, 81/101 residues in chain A and 80/100 residues in chain B have rotamers.

Definition of Crystal Contacts
We defined a residue as participating in crystal packing if it has any heavy atom within 4 Å of a heavy atom belonging to a crystal neighbor. Such intermolecular pairs were identified with the ncont program that is part of CCP4 [27]. Under this criterion, chains A and B in CVN have 41 and 40 residues participating in a crystal contact, respectively. Thirty-six of the contacting residues in each chain have rotamers.

Solvent Accessibility
To determine if a given residue in CVN is solvent-exposed, we computed its relative solvent accessibility as the per-residue solvent accessible surface area (SASA) divided by the theoretical maximum accessibility of the residue in an Ala-X-Ala peptide [28]. The SASA per residue was calculated separately for chains A and B using the AREAIMOL program [28,29] in CCP4 [27] with a probe sphere radius of 1.4 Å and a density of 15 points per Å 2 . A residue was defined as exposed if its relative accessibility was > 0.2 [2].

Comparison of Side-Chain Conformational Ensembles
Rotamer probability distributions were computed for each rotameric residue (S1-S14 Figs). These distributions were used for several quantitative comparisons of side-chain conformational dynamics. Several representative histograms and their corresponding measures of conformational dynamics, as described below, are presented in Fig 2. For crystal simulation, the original 2RDK crystal structure [17] in space group P2(1) with two crystallographically independent molecules (A and B) in the asymmetric unit was converted to a triclinic one (space group P1) with the same unit cell parameters but four symmetrically independent molecules. Despite the formal loss of the 2(1) symmetry, these four molecules can be split into two pairs of protein chains (A1/A2 and B1/B2). For a given pair (A1/A2 or B1/B2), the molecules reside in nearly identical crystal environments. Therefore, the rotamer probability histograms in molecule A1 were combined with A2, and the same was done for molecules B1 and B2. The use of crystal simulations with multiple molecules in the crystal lattice was proposed as a method to accelerate conformational sampling and is described in detail elsewhere [30]. Since molecules A and B are not crystallographically symmetric, they are in different crystal environments. Thus, the conformational sampling of side chains in these two molecules is expected to be different even for solvent-exposed residues, and we consider side-chain conformational space in molecules A and B separately. Finally, for the purpose of this study, we consider the dynamics of solvent-exposed residues in CVN. We divide these residues into two groups: those that interact with neighboring molecules in the crystal lattice ("contacting") and those that do not ("non-contacting") (see Definition of Crystal Contact above for details). The analyses presented below were performed for all solvent-exposed residues, as well as for the subsets of contacting and non-contacting solvent-exposed side chains.
Likelihood Score. We calculated a likelihood score to evaluate how well the rotamer probability distributions from the solution and crystal simulations reflect the X-ray sidechain conformations. From simulation, we obtain the probability P i y (x) that a given residue i is in rotameric state x under condition y (solution or crystal). For each residue, the rotamer observed in the X-ray data is denoted as x i xray . As a reference, we consider the null model for this residue were equally accessible. We then calculate the average of a relative likelihood score as where n res is the number of rotameric residues considered. Normalizing by P i rand (x) ensures that residues with a small number of accessible rotamers, which have a higher probability to match the experimental rotamer at random, do not dominate the score. We also report the quotient L crystal /L solution : values greater than 1 indicate that the crystal simulations show a higher correspondence than do the solution simulations with the X-ray data and vice versa. Rotamer Agreement. Side-chain conformations in simulation and in the X-ray model were considered in agreement if one of the two most dominant rotamers (the two histogram bins with the greatest number of counts) visited during simulation was the same as that observed in the experimental structure. Rotameric states were determined to be the same if each side-chain torsion angle was within a ± 30˚tolerance of the corresponding χ angle listed in the rotamer library. We note that this criterion is stringent for longer residues (i.e., those with more than two torsion angles along the side chain), as each angle must match to count as conformational agreement.
Overlap Coefficient. To assess the agreement between rotamer probability distributions for solvent-exposed residues in solution and crystal simulation, we compute an overlap coefficient (OC) [31,32]: p S and p C are the probabilities for rotamer state i in solution and crystal simulation, respectively. The sum is over all rotameric states N available to a particular residue. OC values range from 0 to 1, representing zero and full histogram overlap, respectively. Analyzing conformational agreement with the Kullback-Leibler divergence (a score for which a value of zero indicates perfect histogram overlap) [33] yields the same trend as the OC values (correlation of -0.9). Per residue OC values computed for two equal-length and non-overlapping trajectory segments show an R-value of 0.76, indicating sufficient sampling. We used perresidue OC values from these two trajectory segments as an estimate of uncertainty (S15a and S15b Fig).
Conformational Entropy. To compare side-chain dynamics in the crystal and solution simulations, we analyze the conformational entropy of side chains on the protein surface.
This conformational entropy can be expressed as a multidimensional well for each side chain [34,35]: where the index i runs over all N rotameric states for a particular residue and p i is the probability of the i th rotameric state. S conf is split into two terms: the first term corresponds to the thermal motion within a given conformational well (S therm i ) and the second term represents the configurational entropy arising from the sampling of distinct rotameric states. In this approximation, side-chain movement of each residue between these states is considered to be independent from its neighbors, which may result in the overestimation of the total entropy, but allows for a more computationally efficient calculation. We further simplify the analysis with the assumption that S therm i is the same in each rotameric state for a particular side chain. In the current study, we are interested in ΔS conf for the same residue in different environments (solution versus crystal) at the same temperature, and thus the thermal motion terms cancel each other [35]. Thus, the conformational entropy per side chain reduces to only the configurational (second) term in Eq 3. S conf is also calculated with just the configurational component in several studies focusing on protein folding [36][37][38][39][40][41]. We define ΔS conf as S conf,crystal − S conf,solution , and these entropy values are reported as TΔS conf (T = 300 K). Per residue TΔS conf values determined for two equal-length and non-overlapping trajectory segments yield an R-value of 0.54, suggesting adequate sampling. In the same manner as the OC above, we considered the results from these two trajectory segments to estimate the per-residue uncertainty in TΔS conf (S15c and S15d Fig).

Comparison to X-ray Structure Data
We first compared the experimental side-chain conformations in the X-ray data to the rotamers sampled during solution MD simulation. We calculated a likelihood ratio (see Materials and Methods) that assesses the degree to which our simulations capture side-chain conformations observed in the X-ray structure relative to a null model. In the null model, each rotamer for a given residue is considered to be equally accessible. With this approach, the significance of agreement is appropriately considered for residues with different sizes and thus number of rotameric states.
The likelihood scores averaged across solvent-exposed residues in CVN are all well above unity (left-hand side and middle of Table 1), which indicates better agreement with experiment than simply matching rotamers at random. For the set of all solvent-exposed residues, In chains A/B, at total of 54/55 are located on the protein surface, of which 32/31 are contacting and 22/24 are non-contacting. c Values are averaged over the set of all solvent-exposed rotameric residues ("all") and for the subsets of contacting ("cont") and non-contacting ("non-cont") solvent-exposed residues. For residues with alternate conformations in the crystal structure, conformation A was used.
the rotamers in solution and crystal simulation are~2.5-2.9 and~3.4-4.0 times, respectively, more likely to match the X-ray data than the null model. These scores are roughly equivalent to agreement percentages of~50-60% and~70-80% for the solution and crystal simulations, respectively, assuming that the surface residues have five rotameric states on average. The ratio reaches as high as~5 for contacting residues in the crystal simulations. Differences in the likelihood ratios between the sets of contacting and non-contacting residues could be due to differences in residue composition (S16 Fig). For example, while roughly a third of the residues in both of these sets in chain A correspond to longer and more flexible residues (ARG-GLN in S16 Fig), nearly half (0.44) and just a quarter (0.27) of the residues in the contacting and noncontacting sets, respectively, represent shorter and less flexible side-chains (TYR-SER). A similar trend is observed in chain B. We also calculated that~60% and~75% of the side-chain conformations in the experimental structure correspond to one of the two most populated rotameric states from solution and crystal MD simulation, respectively (S1 Table). In the crystal simulations, this level of agreement with experiment increases to 84% for the set of contacting residues. A caveat of such an approach is that the comparison is biased toward smaller residues that have fewer accessible rotamers (e.g., Thr with three states) and are thus more likely to match with experiment at random. A certain level of disagreement should be expected from these analyses due to the fact that the data for the X-ray structure were collected 100 K while our simulations were performed at room temperature (see Discussion for more details).
Overall, the likelihood scores and the comparison of the most dominant rotameric states indicate that the side-chain conformational space sampled during our MD simulations agrees well with experiment. Thus, our further comparative analysis of side-chain ensembles in solution and in the crystal focuses on results obtained from the MD simulations.

Comparison of Side-Chain Conformational Dynamics in Crystal and Solution MD
We first assessed the relative performance of solution and crystal simulation to reproduce X-ray side-chain conformations by taking the quotient of likelihood ratios (see Methods and the righthand side of Table 1). For this quotient, values greater than 1 indicate that the crystal simulations exhibit a higher level of agreement with X-ray side chains than the solution simulation. Values less than 1 signify the opposite trend. For the set of all solvent-exposed rotameric residues, the ratio is greater than 1 (~1.4) for both chains A and B ( Table 1). The ratio increases to 1.45 and 1.60 for the set of residues that participate in crystal contacts, and diminishes closer to 1 for the group of non-contacting residues. Thus, this analysis indicates that the crystal simulations show a higher degree of correspondence than do the solution simulations with the X-ray data.
Rotamer probability histograms obtained from solution and crystal MD simulations are shown in Fig 2 and in S1-S14 Figs. To compare side-chain conformations in these ensembles, we computed the overlap coefficient (OC; Eq 2) between the probability distributions from the solution and crystal MD simulations. OC values closer to one and zero correspond to better and poorer agreement between histograms, respectively. As shown on the left-hand side of Table 2, the average OC (<OC>) across all solvent-exposed residues is~0.6-0.7, indicating that the majority of the side chains sample a comparable region of conformational space in solution and in the crystal. <OC> for the non-contacting set is higher than that of the contacting set by 0.05 (chain A) and 0.15 (chain B).
The rotamer histograms were also used to calculate the average change in residue conformational entropy (<TΔS conf >) as a measure of side-chain dynamics (right-hand side of Table 2). We compare side-chain entropies in the crystal and in solution by only considering the configurational component of the conformational entropy, which is a reasonable approximation [35] (see Materials and Methods for more detail). ΔS conf is defined as S conf,crystal −S conf,solution , and thus values less than zero represent a decrease in side-chain dynamics in the crystal compared to in solution. A negative <TΔS conf > is observed for all solvent-exposed residues as well as for the contacting and non-contacting subsets in both chains A and B. This effect is four (chain A) and three (chain B) times greater in magnitude for the contacting set compared to the non-contacting set. While the values of <TΔS conf > are modest, a significant loss in conformational entropy is observed for several long, polar and contacting surface residues (TΔS conf approaches -1 kcal/mol; Fig 3). No residue exhibits a TΔS conf greater than~0.3 kcal/mol. The per-residue conformational dynamics in chains A and B is expected to be different for solvent-exposed residues in CVN, as the two chains reside in different crystal packing environments.
We glean further insight into the effect of the crystal on side-chain conformational dynamics by examining the relationship between OC and TΔS conf values for each residue (Fig 4). The plots are divided into quadrants to emphasize the relationship between side-chain conformation and dynamics. Filled and open circles denote crystal-contacting and non-contacting residues, respectively. Region I comprises residues for which the crystal has little or no effect on <OC> and <TΔS conf > (kcal/mol) are the average values over the set of all solvent-exposed rotameric residues ("all") and for the subsets of contacting ("cont") and non-contacting ("non-cont") residues. The standard deviation is reported in parentheses next to the average.   Table 2.) The majority of residues fall within this region. The residues for which conformation and/or dynamics are notably different in solution and in the crystal are indicated by data points located outside of region I. Most of the affected side chains undergo a change in conformation (OC < 0.5) with little or no change in dynamics (Fig 4, region II). Both contacting and non-contacting residues fall into this scenario. For the long, polar contacting residues highlighted in Fig 3, the strong decrease in dynamics is accompanied by a change in conformation (Fig 4, region III). Interestingly, we do not observe the scenario in which dynamics is notably decreased while maintaining a dominant solution conformation (Fig 4 region IV). In other words, if the change in dynamics is noticeable for some residues (for example, as indicated by the thermal factor), then its conformation is also most likely affected. Rotamer histograms and crystal packing interactions of several of the most affected residues are highlighted in Fig 5. Discussion X-ray models play an essential role in functional interpretations of proteins. We must carefully evaluate the structural features of these models in the context of the crystal environment. Toward this aim, we have compared side-chain conformational dynamics in solution and crystal MD simulations of a protein for which a high-resolution X-ray structure is available.
Side-chain conformational ensembles obtained from the simulations for solvent-exposed residues in CVN exhibit a high level of correspondence with the X-ray model, as revealed through the likelihood analysis. Although differences in side-chain conformations in simulation and in the crystal structure may, at least to some degree, be due to a rigorous definition of conformational similarity (i.e., all torsion angles along the side chain must match), other factors may contribute. Firstly, CVN was simulated near the crystal growth temperature of 298 K instead of at the cryogenic temperature for data collection (100 K), as low temperatures may be unsuitable for standard MD force fields. Moreover, cryocooling increases the extent of lattice contacts, especially for longer residues (Gln, Glu, Arg, and Lys) [42], and may remodel over a third of all side chains relative to structures solved at room temperature or even eliminate conformations essential for function [8]. Taking into account these considerations, it is possible that the simulated side-chain ensembles more closely represent local conformational dynamics in the crystal before flash freezing, which could account for a certain level of disagreement between MD and X-ray. Accordingly, the deposition of an ensemble of X-ray models [43], for which precedent exists from NMR, may lead to a more comprehensive comparison of side-chain conformational dynamics in the solution and crystal environments.
While in simulation the average loss in side-chain conformational entropy in the crystal is minimal (even for contacting residues), significant decreases in entropy occur for a handful of long and polar contacting residues on the protein surface. TΔS conf for these cases approaches -1 kcal/mol. This value is comparable to the average loss in conformational entropy per residue upon protein folding (-0.95 kcal/mol) [44], indicating that crystal packing can significantly diminish conformational dynamics for long, solvent-exposed side chains. To fully assess the free energy contribution to packing interface energetics, both entropy and enthalpy terms need to be estimated, as they typically compensate each other in a complex manner [45]. While our finding supports application of the SER method [5,6] in which longer residues on the protein surface are replaced with shorter ones (e.g. Ala) to facilitate crystallization by minimizing losses in conformational entropy, one must also consider that this method may affect favorable enthalphic interactions. Nevertheless, our study provides detailed pictures of changes in side-chain conformational dynamics and entropy upon crystal packing formation.
Finally, our results suggest a model for the effect of crystallization on side-chain conformational dynamics. All residues that exhibit different behavior in crystal versus solution MD simulation show a notable change in conformation and/or dynamics. The large majority of residues that exhibit a change in conformation still retains a similar degree of dynamics in the crystal (region II in Fig 4). These cases result from the shift in either broadly distributed or in rather restricted rotamer populations (S17 Fig), which selects a minor solution state or a new rotamer. Moreover, several of these cases correspond to non-contacting residues, indicating that longer-range effects (e.g., electrostatics and solvation), in addition to direct lattice contacts, may play a role in influencing side-chain conformation [2]. Such conformational changes can also be accompanied by a strong decrease in dynamics for long, polar and contacting residues on the protein surface (region III in Fig 4), thus counteracting crystal formation.
Inspection of rotamer state histograms shows that, for some residues, crystal packing may perturb side-chain conformations to rotamer states that are already observed in the solution environment and correspond to either major or minor populations. Such a scenario would correspond to a conformational selection model [46]. For some residues, new rotamer states are realized as the result of packing, which can be characterized by an induced fit model [47]. Both induced fit and conformational selection have been proposed in studies of the influence of the crystal environment on global protein structure [48][49][50][51]. The possibility of multiple ways through which crystal packing affects protein conformational dynamics underscores the complexity of the crystal environment. The methods for crystal MD simulation and the quantitative analyses demonstrated in this study can be used to investigate a larger set of X-ray structures in order to more fully understand this effect.
Supporting Information S1 Fig. Rotamer probability distributions for rotameric residues in Cyanovirin-N: chain A, residues 3-18. The title of each individual histogram specifies the chain and the residue threeletter abbreviation followed by the residue number, and is denoted in red font if the residue participates in a crystal contact. Rotameric states are indicated by numbers on the horizontal axis, and correspond to the order in which they appear in the Penultimate Rotamer Library for each residue (see ref. [26] in the main text). Black and white bars correspond to distributions obtained from solution and crystal MD, respectively. The red circles on the horizontal axis denote the rotamer observed in the X-ray structure; red crosshairs indicate alternate X-ray conformations. These details are the same for S1-S14  (a and b) and TΔS conf (c and d) calculated over the last half of simulation (16-32 ns; solid line) for chains A (top panels) and B (bottom panels). Error bars represent the uncertainty in OC and TΔS conf for each residue. Measurements of these quantities were calculated during two equal-length, non-overlapping trajectory segments (16-24 ns and 24-32 ns) and then taken as lower and upper bound estimates. The error bars represent the interval encompassed by these bounds. (TIF) S16 Fig. Residue composition of contacting and non-contacting solvent-exposed residues in CVN. Frequency of residues in the contacting (top panels) and non-contacting (bottom panels) sets of residues in CVN chains A (at left) and B (at right). Residues on the horizontal axis are ordered based upon the number of rotameric states: left (more rotamers) to right (less rotamers) according to the Penultimate Rotamer Library [26]. Ala and Gly do not have any rotamers, and Cys is excluded from the analysis since the cysteins in CVN participate in disulfide bonds. (TIFF) S17 Fig. Rotamer probability distributions for several residues in CVN that show a change in conformation but not in dynamics. In the same manner as Fig 2 of the main text, rotameric states are denoted by numbers on the horizontal axis, and correspond to the order in which they appear in the Penultimate Rotamer Library [26] for each residue. Distributions obtained from solution and crystal simulation are shown with black and white bars, respectively, and the rotatmer observed in the X-ray structure is denoted by the red dot on the horizontal axis (legend in panel B).  Table. Agreement between dominant rotameric states from MD and X-ray for solventexposed residues in CVN. Side chains in simulation and experiment are determined to be in agreement if one of the two most dominant rotamers from simulation matches the X-ray conformation. Percent agreement is averaged over all solvent-exposed rotameric residues ("all") and for the subsets of contacting ("cont") and non-contacting ("non-cont") solvent-exposed residues. For residues with alternate conformations in the crystal structure, conformation A was used for comparison. (DOCX)