Long-Range Intra-Protein Communication Can Be Transmitted by Correlated Side-Chain Fluctuations Alone

Allosteric regulation is a key component of cellular communication, but the way in which information is passed from one site to another within a folded protein is not often clear. While backbone motions have long been considered essential for long-range information conveyance, side-chain motions have rarely been considered. In this work, we demonstrate their potential utility using Monte Carlo sampling of side-chain torsional angles on a fixed backbone to quantify correlations amongst side-chain inter-rotameric motions. Results indicate that long-range correlations of side-chain fluctuations can arise independently from several different types of interactions: steric repulsions, implicit solvent interactions, or hydrogen bonding and salt-bridge interactions. These robust correlations persist across the entire protein (up to 60 Å in the case of calmodulin) and can propagate long-range changes in side-chain variability in response to single residue perturbations.


Introduction
Allostery is an essential feature of protein regulation and function. Allosteric regulation acts by linking distant sites of a protein together in such a way that information about one site is transmitted to and influences the behavior of another. Chemical modifications as subtle as the phosphorylation of a serine residue can cause dramatic changes in protein function [1], and shifts in structure as small as 1 Å have even been shown to modify behavior in a domain up to 100 Å away [2]. Traditionally, allostery has been understood as a feature of symmetric, multisubunit proteins where the binding of a ligand to one subunit facilitates the binding of similar ligands to the other subunits, resulting in cooperative binding transitions [3]. However, allosteric behavior has now been observed within a single protein domain [4] and its definition extended to include any shift in protein structure and function at one site resulting from modification at another.
Moreover, it was proposed some time ago that information regarding the binding of a ligand or other modification at one protein site could be transmitted through altered protein fluctuations, even if the protein's average structure remains unaffected [5]. Two particularly clear examples of this kind of dynamic allostery have been recently observed in the binding of cAMP to the CAP dimer and in the subsequent binding of the cAMP-activated CAP dimer to DNA [6,7]. In the first step, the binding of cAMP to one monomer of CAP lowers the binding affinity of cAMP to the second even though no structural changes are observed, and calorimetric analysis suggests that the negative cooperativity results entirely from entropic effects [6].
The observed allosteric effect of protein fluctuations has led to the idea that allostery may be present in all proteins [8][9][10], and that functional allostery simply exploits and refines pre-existing long-range correlations and interaction networks. In fact, such networks are to be expected given the physical constraints of the densely-folded, yet fluctuating, protein. Just as in any condensed phase, significant fluctuations in this packed environment are permitted through correlated motions.
Qualitative experimental evidence for long-range correlation abounds in studies demonstrating allosteric regulation, as exemplified in [1] and [2]. However, attempts to quantify these longrange correlations using NMR techniques have proven difficult [11][12][13], and much of our current understanding of correlated motions has come from analyses of molecular dynamics (MD) simulations. Traditional MD trajectories evaluated with covariance matrices and principle component analyses [14] have shed light on important features of intra-protein correlations, such as how backbone motions tend to be significantly correlated within secondary structural units [14] and how a few flexible hinge residues can cause large motions within otherwise stable folds [15]. Energy-perturbative MD simulations, such as anisotropic thermal diffusion [16] and pump-probe MD [17], have been used to observe the rapid anisotropic diffusion of an energy perturbation within the protein. However, these MD studies are limited in their ability to characterize sluggish rearrangements and have largely neglected the contributions of correlated side-chain fluctuations.
Within the folded protein, side-chains are significantly less ordered than the backbone [18], and alternative side-chain configurations in protein crystals are more prevalent than previously thought [19]. The thermodynamic importance of this side-chain variability in calmodulin-ligand binding has been highlighted in Refs. [20][21][22]. In addition, the participation of side-chain fluctuations in long-range networks has been demonstrated through NMR mutational studies [9,23]. In one MD simulation designed to incorporate data from NMR experiments, correlations were even observed between side-chains whose motions appeared decoupled from those of their backbone atoms [24].
Double mutant cycles [25] have also been applied to examine the dependence of folding and binding on interactions between specific residue pairs. While such mutational studies can demonstrate the interactions of certain residue pairs, they are experimentally demanding, making it difficult to obtain a comprehensive picture of any long-range side-chain interactions present, in particular those involving residues essential for folding stability. As an alternative, an evolutionary statistical network analysis method has been developed to determine networks of correlated residues that are common to evolutionarily related proteins [26]. Although this method has had some success in identifying allosterically-related regions within proteins [27], its robustness has been challenged in a study on artificially-generated sequences [28]. In principle, it is also limited to detecting correlated changes in residues during evolution, presumably highlighting only correlated networks with a selected function and can therefore say little about the presence or absence of other correlations.
In this study, we employed an atomistically detailed model to examine the kinds of correlations that emerge among side-chain fluctuations within the natively-folded protein. The computationally inexpensive nature of our model energy function [21], together with a variety of advanced Monte Carlo sampling techniques, allowed an unprecedentedly thorough investigation of the correlations among these fluctuations that result from different types of interactions. Keeping the backbone fixed, we find that long-range correlation of side-chain fluctuations can emerge from different types of atomic interactions, that significant correlations persist across the entire folded protein, and that these correlations alone can propagate changes in structure and mobility over scales as large as 50 Å .

Results
A simple model was used to explore side-chain rotations In order to investigate the correlated rearrangements that arise from side-chain fluctuations alone, it was necessary to isolate these motions from other sources of configurational change. For this reason, we held the backbone fixed in its folded conformation throughout the calculations described in this paper. While fluctuations resulting from bond stretching and angle bending are important and likely to give rise to a great deal of correlated motion, we focused here instead on side-chains' torsional degrees of freedom, as these rotations give rise to the changes in atomic configurations that are largest in magnitude.
This study made use of a model we designed to roughly capture the essential physical determinants of side-chain behavior within the folded protein, namely, steric repulsions, van der Waals attractions, hydrogen bonding, salt-bridge interactions, and solvation [21]. While not fully realistic in every particular (e.g., resolving the positions only of nuclei heavier than hydrogen), the model properly represents the variety, strength, and anisotropy of the side-chain interactions and the physical constraints of the folded backbone on which they reside.
We explored this model with Monte Carlo (MC) sampling (see Methods). Each MC step consisted of the proposed rotation of a single randomly-chosen side-chain dihedral angle. To promote broad sampling of thermally accessible configurations, we permitted moves through sterically disallowed regions of state space. Using exact correction methods, we constructed equilibrium averages with contributions only from sterically allowed structures (those in which each heavy atom excludes a spherical volume with radius 0.75 times its van der Waals radius). See [21] for details.
Boltzmann-weighted ensembles of the side-chain configurations determined using this sampling procedure include a diverse set of rotamer states and correlate well with experimental observations of side-chain fluctuations and changes in entropy upon ligand binding [21]. We therefore applied this method to investigate correlations among the diverse set of rotamer states.
Single residue perturbations effected changes in sidechain fluctuations throughout the protein Several experimental approaches that probe correlations within proteins mutate single residues and observe the resulting changes in structure, function, or dynamics [9,23,29,30]. We began our examination of side-chain correlations in a similar way by modeling the changes in torsional variability that occured throughout a small globular protein, barstar, as a result of perturbations to a single side-chain. Previous MD simulations of barstar suggested a relatively rigid backbone, as well as significant variability in side-chain packing [31]. Additional results from 19 F NMR experiments showed that the P27A mutation results in detectable dynamic changes even in residues more than 12 Å away from the mutation site, and suggested that the motion of barstar's side-chains gives rise to a network of correlated residues [32].

Author Summary
Allosteric regulation occurs when the function of one part of a protein changes in response to a signal recognized by another part of the protein. Such intra-protein communication is essential for many biochemical processes, allowing the cell to adapt its behavior to a dynamic environment. Most studies of the information conveyance underlying allostery have to date focused on the role of backbone motions in mediating large structural changes. Here we focus instead on more subtle contributions, arising from fluctuations of side-chain torsions. Using a model for side-chain bond rotations in the tightly packed environment imposed by native backbone conformations, we observed significant sensitivity of side-chain organization to small, localized perturbations. This susceptibility arises from correlations among side-chain motions that can propagate information within a protein in complex, heterogeneous ways. Specifically, we found appreciable correlations even between side-chains distant from one another, so that the effect of a minor perturbation at one site on the protein could be observed in the altered fluctuations of side-chains throughout the protein.
In conclusion, we have demonstrated that the statistical mechanics of correlated side-chain fluctuations within a model of the folded protein provides the basis for an unconventional but potentially important means of allostery.
Quantifying such correlated fluctuations requires a metric that can report on the extent of local variability at the single residue scale. For this purpose, we calculated the Gibbs entropy for each residue, S S (res) , associated with occupying distinct rotameric states.
where H i~f h (i) 1 , . . . ,h (i) N i g denotes the set of torsional variables for each of the N i rotatable sp 3 -sp 3 hybridized bonds belonging to residue i, and h (i) n denotes the set of ideal torsional angle values for the n th torsional angle in residue i. While sp 3 -sp 2 hybridized bonds were allowed to rotate, they were also excluded from the statistical analysis due the difficulty in determining ideal dihedral angles [33]. The probabilities p(H i ) of these 3 N i states were calculated in simulations by constructing histograms over the course of importance sampling from the Boltzmann distribution of side-chain configurations. In doing so, we focused on the interrotameric rearrangements (those between the three most likely energy basins for the torsional angle of an sp 3 -sp 3 hybridized bond), which allowed the calculation of absolute local entropies that would have been impractical at a higher level of resolution. However, intra-rotameric fluctuations (those within a single torsional energy basin) are sensitive to the structural perturbations we applied, and it is necessary to allow deviations from these ideal angles, w~x{h, in order to fully account for the variety of possible side-chain configurations [34,35]. A quadratic energy is associated with these deviations w (see Methods). Fig. 1 shows the change in S S (res) i that resulted from a singleresidue perturbation. Residues shown in red demonstrated a statistically-significant increase in side-chain variability, while the variability of those shown in blue decreased (see Methods). The perturbations shown, a mutation of isoleucine to glycine at position 86 ( Fig. 1(a)) and a constraint of the glutamate in position 46 to its crystalline configuration ( Fig. 1(b)), were chosen to demonstrate the types of changes possible. (A comparison to the previously studied P27A [32] was not possible, since neither proline nor alanine residues have rotatable side-chains in our model.) Surprisingly, removing the isoleucine side-chain at position 86 (circled in Fig. 1(a)) not only affected the local entropy of a few neighboring residues, but also altered the side-chain variability of residues much farther from the site of mutation. Motions of even distant residues must therefore be linked to those of residue 86. Because the interaction potentials in our model are short in range, the changes in fluxionality that resulted from this mutation must propagate through neighboring residues to those farther away. Fig. 1(b) shows analogous changes that resulted when a residue, E46 (circled), is merely frozen into its crystalline conformation. Such a reduction in motion of one sidechain might be expected to result in the increased variability of its nearest neighbors. However, we found that even so subtle a constraint resulted in unexpected and wide-spread changes in the side-chain fluctuations. Some residues near the frozen amino acid even became slightly more constrained while the variability of a few residues farther away increased. A similar effect was observed in NMR experiments upon ligand-binding in stromelysin 1, where the few residues participating in strong interactions with the ligand lost mobility, but the order parameters of those farther away actually decreased upon binding, indicating an increase in their entropy [36]. It was suggested that the increased fluctuations far from the binding site may counter the loss of entropy at the binding site itself and therefore assist in modulating the thermodynamics of binding.
The changes in side-chain statistics we observed as a result of these single residue perturbations are not readily intuited. Increasing or decreasing disorder at one site may result in the same or opposite effect in other regions of the folded protein, and the effects cannot be easily predicted from the spatial arrangement of the residues. Correlated fluctuations result from several types of interactions and persist throughout protein Correlated fluctuations within the folded protein are commonly quantified using Pearson correlation coefficients [14]. Despite their limitations in detecting nonlinear correlations and correlations between the motions of particles moving orthogonally to one another [37], Pearson coefficients have yielded important information regarding correlated motions. These coefficients are most appropriate for backbone motions as these motions are expected to be correlated in similar directions and to be linear in nature due to the stiffness and collective motions of various secondary structural elements [14]. However, in a study analyzing the results of molecular dynamics simulations of protein G and lysozyme, a generalized correlation measurement based on mutual information was able to detect significantly more correlation than the Pearson coefficient [37]. Side-chain motions are even more likely to fall outside the purview of the Pearson coefficient, dominated as they are by dihedral angle rotations. A parameter based on mutual information is able to provide a more robust measurement of correlated side-chain fluctuations [37], and can be readily derived from simulation data in a similar way to the entropies calculated in the preceding section. We therefore chose to consider the mutual information associated with each pair of residues within a folded protein.
Pairwise mutual information is a measure of the correlation between two random variables. In our case it reports on the degree of correlation between the rotameric state populations of two residues. The mutual information I ij between residues i and j can be calculated as where p(H i ,H j ) denotes the probability of each of the 3 N i : 3 N j joint states of residues i and j, and N i is the number of rotatable sp 3 -sp 3 hybridized bonds in residue i. After rearranging Eq. 2 and substituting in Eq. 1, this becomes where S S (res) ij is the Gibbs entropy associated with the discrete rotameric states for residues i and j considered jointly. Thus when the fluctuations of the two residues are completely independent of one another, and I ij~0 . However, when the residues are correlated, their entropies are inseparable, and I ij w0.
One difficult feature of mutual information is that a numerically-calculated estimate of two completely uncorrelated variables only approaches zero at the limit of infinite sampling. For any finite sampling, a small amount of spurious mutual information will be observed, regardless of the actual coupling between the two variables [38]. When calculating I numerically, this inherent bias in the noise must be accounted for in order to determine the mutual information's statistical significance. We used two approaches to address this bias. In the first, we subtracted out the expected spurious mutual information to estimate the true amount of correlation between the two variables. The resulting excess mutual information, I', between residues i and j is defined as I ij (n) is the numerically-calculated mutual information measured over a finite sampling period consisting of n MC steps. I (ref) ij (n) is the same measurement, but this time computed within a noninteracting reference state, where no correlations are possible (see Methods for details). I' is then an estimate of the mutual information of the infinitely-sampled ensemble. In the second approach, we focused on the robustness of the mutual information measurement, calculating its signal-to-noise ratio, The extended structure of calmodulin (3cln [39]), as shown in Fig. 2(a), provides an exemplary test case for examining how side- Figure 2. Structural representations of extended crystalline calmodulin. The crystal structure (a) and contact map (b) of calciumbound calmodulin (3cln [39]). The calcium ions are shown in yellow, and several residues are labeled in both panels for reference. The distance between each pair of C a atoms is indicated by color (see scale bar) in (b), where xand y-axes run over residue labels. The residue labeling corresponds to the full sequence, however residues that do not possess torsional degrees of freedom in our model (A, G, P, and all residues bound to the calcium ions) are excluded from the contact map. doi:10.1371/journal.pcbi.1002168.g002 chain fluctuations are correlated within the folded protein.
Although in solution this chain collapses, the structure of the crystal is extended, featuring two globular regions connected by an extended a-helix. Any information shared between the two lobes must pass through this extended a-helix, since the pairwise interactions in our model largely decay by 7 Å . We calculated the pairwise excess mutual information, I' ij , for all residue pairs ij in Ca 2z -bound calmodulin, as well as the ratio of I ij =I (ref) ij in order to gauge the significance of the measured correlations. Both quantities are shown in Fig. 3 as functions of the residues' position along the backbone. For reference, we present in Fig. 2(b) the spatial distance between residues in the native structure as a function of the same indices. Panels (b)-(e) of Fig. 3 indicate mutual information resulting from various interaction types considered in isolation. Panel (f) gives results for the full model. Different types of inter-atomic interactions in our model gave rise to different patterns of correlated fluctuations. In Fig. 3(b), correlations that result solely from steric repulsions are shown. While the signature of calmodulin's a-helical structure can be clearly seen along the diagonal, where residues i and iz3 or i and iz4 are often highly correlated, many other residue pairs appear significantly correlated as well, even those that are spatially distant. In Fig. 3(c), the correlations that result from the implicit solvent alone are shown. These correlations are more limited, restricted almost completely to residues that are nearby in space, as can be seen when comparing Fig. 3(c) to Fig. 2(b). Again the a-helical residues display appreciable correlation, even more than that resulting from the repulsive sterics, as might be expected from their high degree of solvent exposure. The correlations that result from considering van der Waals attractions along with the repulsive sterics is shown in Fig. 3(d). While the correlations along the a-helix remain strong, many other correlations emerge as a result of these attractions. Hydrogen bonding and salt bridge interactions, taken alone, generate highly significant correlations throughout the entire structure (see Fig. 3(e)), which appear remarkably insensitive to spatial distance. Since only a subset of the residues participate in such interactions, the fluctuations of the remaining residues are completely uncorrelated in this restricted version of our model. The full potential, used to generate the data in Fig. 3(f), results in both the most significant signal-to-noise ratios and the largest excess mutual information values, indicating a large degree of correlation that spans the full range of inter-residue distances while retaining features of the dominant a-helical structure.
To further explore how different interactions give rise to longrange correlations in both a small globular protein as well as the extended calmodulin structure, we calculated the average excess mutual information per residue pair for all residue pairs in calmodulin and barstar, resolved by the spatial inter-residue distance between C a atoms. (See Fig. 4.) In both proteins, steric repulsions alone give rise to small, but significant, correlations that persist across the entire protein structure. The same is true for the implicit solvent interactions and their combination, S+IS. However, much larger correlations emerge when van der Waals attractions are considered in addition to the steric repulsions. Hydrogen bonding and salt bridge interactions are clearly the most correlating types of interactions considered. However, the full potential, which combines all these interaction types, results in the largest overall correlation.
An additional feature within these plots deserves mention; in both proteins, correlation is at a maximum around 6 Å for all subsets of interactions excepting hydrogen bonding and salt bridges. This short-distance peak indicates that residue pairs adjacent in the amino acid sequence (whose a-carbons are separated by &3:8 Å ) do not interact as strongly on average as do residue pairs that are positioned slightly farther apart. In a-helices, neighboring residues point in different directions and, while still likely to interact with their sequential nearest neighbor, are more likely to interact strongly with their iz3 and iz4 neighbors. In b-sheets, however, residues alternately point towards different faces of the sheet, so that the sidechains on residues i and iz2 are much more likely to interact with one another than do those on i and iz1. Residues influenced only by hydrogen bonding and salt bridge interactions, when artificially freed of the steric constraints that would keep them from collapsing back on themselves, still correlate most strongly with their nearest neighbors.
Substantial long-range correlation is seen throughout both barstar and calmodulin. Moreover, the fact that so many subsets of the full potential independently give rise to long-range correlations suggests that correlated side-chain fluctuations should be a robust characteristic of most protein sequences and nearly any globular fold.

Correlated side-chain motions can propagate changes in fluctuations over 50 Å
Through correlated side-chain fluctuations, local perturbations to a protein (e.g., due to small ligand binding) could in principle be transmitted over substantial distances. We scrutinized this possibility by examining the consequences of mutating a single residue in calmodulin. Such a mechanism of communication was described earlier for barstar (see Fig. 1), whose size limited our conclusions to distances of less than 30 Å . Calmodulin, in its extended structure, provides a better test of the ability of sidechain correlations to transmit information over long distances.
We focused this analysis on correlated fluctuations involving one particular residue in calmodulin, 30K, which we observed to be significantly correlated with several other residues (see Fig. 3(f)). In Fig. 5(a), 30K is colored black, while the pairwise mutual information between this side-chain and all others is indicated in bluescale.
Appreciable correlations are apparent throughout the lower globular region near residue 30. The correlations become stronger within the spatially-constricted alpha-helical bridge and spread out again and weaken in the far lobe. To determine whether these correlations could transmit structural and dynamical information over significant distances, we mutated residue 30K to glycine. The resulting change in S S (res) i for each residue i is shown in Fig. 5(b). A significant decrease in entropy was detected in some neighboring residues, while both increases and decreases in entropy were found for residues farther from the mutation site. Although unexpected, the reduction in entropy at residue i resulting from the removal of a neighboring bulky residue can be readily explained if that nearby mutation results in the loss of a potential hydrogen bonding partner for residue i. Such a loss can result in the probability associated with the hydrogen-bonding subset of residue i's configurations being greatly reduced.
We found statistically significant changes in entropy even in the globular region opposite that of residue 30. Thus we conclude that side-chain fluctuations alone can reliably propagate the effect of a single point mutation over at least 50 Å .
When comparing Fig. 5(a) to Fig. 5(b), it is clear that some, but not all, of the strongly correlated residues in the wild-type calmodulin experience detectable changes in their side-chain variability as a result of this particular mutation. Even some residues that are minimally correlated with residue 30K show significant shifts in their side-chain statistics. Although the mutual information can tell us a great deal about the degree of correlation between two side-chains in our model, it is not a discriminating predictor of changes in side-chain variability upon mutation. The discrepancies are likely due to the fact that our calculation of mutual information lacks contributions from correlated intra-rotameric fluctuations, which are still able to convey information in our model and will therefore influence the detected changes upon side-chain mutation. Furthermore, observing the statistically significant changes in Fig. 5(b) requires a great deal of sampling -were more sampling feasible, additional changes would likely be detected.

Magnitude of side-chain correlations is substantial
If the side-chain motions of a protein's N different residues were negligibly correlated, then the total entropy S associated with transitions among distinct rotameric states could be calculated as a simple sum of single-residue contributions, S& P N j~1 s j . The excess mutual information, summed over all residue pairs, provides a rough measure of the error in such a mean-field estimate. Correspondingly, the quantity P ij I' ij characterizes the global thermodynamic significance of inter-residue correlations. For crystalline barstar modeled with the full potential, P ij I' ij is calculated to be 72 kJ/(mol : 300 K). The higher-order correlations expected in such a dense environment [40] (see Fig. 3 where a single residue is often significantly correlated to several others) make this value an overestimate of the total correlation. Even so, its magnitude is noteworthy. In addition, while allowing intrarotameric fluctuations, this calculation neglects their contribution to the total correlation, which were found to be essential in reproducing the calorimetric TDS binding of calmodulin with its ligands in [21] and are likely to be substantial.

Long-range correlations are present within several different backbone models of folded barstar
The rigidity of the peptide backbone in these calculations justifies to some extent our schematic model of side chain interactions: For our purposes the potential energy function need not resolve subtle thermodynamic differences among diverse chain conformations, but instead serves to establish basic length and energy scales for rearrangements within the native state's basin of attraction.
In considering the biological relevance of our results, backbone rigidity is in part justified empirically by the observation that only weak correlations exist between backbone NMR order parameters, S 2 , and their associated side-chain order parameters, S 2 axis [41]. This weak correlation is likely due to the fact that side-chain and backbone fluctuations largely occur on different time-scales [42], with typical side-chain fluctuations ranging from picoseconds to nanoseconds, while typical collective backbone fluctuations range from nanoseconds to seconds and longer.
However, it is important to assess how variations in backbone configuration of the folded protein might influence the side-chain correlations we have calculated. Toward this end we examined four different structural models from an NMR structure of barstar (1btb [43]). These four conformations were chosen to represent the range of models included in the NMR structure (see Methods). In each case plots of SI'T per pair vs. inter-residue distance for the full potential closely resemble results for the crystal structure (see Fig. 6). Since the statistics of side-chain rotations in a fluctuating backbone environment can be rigorously decomposed into contributions from sub-ensembles in which the backbone is held fixed, the consistent nature of the observed long-range correlation from one backbone structure to another establishes their robustness to typical backbone motions.
Larger backbone fluctuations, however, such as partial unfolding events or the motions of hinged regions, are certain to disrupt many of these correlations and may limit their role in conveying allosteric information. In particular those correlations arising from contact between residues that are spatially proximal, but distant within the protein's amino acid sequence, will attenuate as backbone motions carry them away from one another. However, correlations between residues linked through a path of sequential neighbors, such as those observed along the central a-helix of crystalline calmodulin in Fig. 5, may persist. As a result, some information may continue to be transmitted through side-chain fluctuations even after significant backbone rearrangements, as long as the secondary structure, which is responsible for many of the observed correlations between sequential neighbors (see Fig. 3(f)), remains intact.

Amino acids displayed different propensities towards correlated side-chain motion
In addition to scrutinizing the effect of different types of atomic interactions, we also examined how a protein's amino acid composition might contribute to stronger or weaker correlations among its side-chain fluctuations. To do so, we took a set of twelve small globular proteins with different sequences and folds (see Methods) and calculated the average excess mutual information per pairwise interaction for each amino acid across the entire set of proteins. The result is plotted in Fig. 7.
In general, amino acids with the most sp 3 -sp 3 hybridized rotatable bonds resulted in the largest SI'T values. The amino acid arginine is clearly the most strongly correlated residue, followed closely by lysine. While both of these amino acids have four rotatable bonds, arginine is considerably bulkier than lysine, with more potential hydrogen bonding partners. In addition, arginine has been found to take on fewer alternate rotameric states in simulations of folded proteins than lysine [44].
Similarly, the bulky aromatics (Phe, Tyr, Trp) were more correlating than their single sp 3 -sp 3 hybridized rotamer would indicate, while isoleucine and leucine are both much less correlating than the other residues with two rotatable bonds: glutamine and glutamate. Glycine, alanine, and proline all have SI'T~0, since they possess no rotatable bonds in the model.
Changes in NMR-derived order parameters do not compare well to calculated changes in side-chain entropies Recent NMR measurements on eglin C provided a good opportunity to compare our results with experimental evidence of wide-spread changes in side-chain fluctuations resulting from small perturbations [9,30]. In this work, a series of valine residues were mutated to alanine at various positions in eglin C, a small globular protein with a relatively static backbone, and the resulting changes in the order parameters of side-chain methyl groups were measured. The changes in the NMR-measured order parameters were in many cases quite low; the majority of the statistically significant changes were less than 0.05 (order parameters range from 0.0 for a completely disordered vector to 1.0 for a completely ordered one), with only a few residues showing changes greater than 0.1 [9,30]. The magnitude of these NMR-measured changes combined with the significant statistical errors in our calculations (the average standard deviation was 0.02) rendered such a comparison difficult. In the cases where we could resolve the changes in our MC-calculated order parameters enough to make a meaningful comparison to the experiments we found little correspondence between our data and the NMR measurements.
NMR order parameters are expected to underestimate the full range of side-chain motion, as they neglect motion slower than the tumbling time of the molecule, and recent work demonstrates that such motion can be substantial [45,46]. Similarly, our calculation is also expected to underestimate the full range of motion accessible to the side-chains due to the fixed backbone, which we observed previously to be particularly problematic in calculating methyl group order parameters for alanines [21]. As a result, a poor correspondence between our calculated methylgroup order parameters and those derived from NMR relaxation experiments, in particular those involving mutations to alanine, is perhaps to be expected. In past work we nevertheless demonstrated a clear correspondence to the measured NMR order parameters for wild-type eglin C using the same computational approach [21]. Even if the model is not sufficiently accurate or detailed to make quantitative predictions for altered side-chain fluctuations in the specific case of eglin C, the conclusions we draw here for long-range correlations among side-chain fluctuations should be pertinent to the biophysics of folded proteins in general.

Discussion
The propagation of information across long distances within the folded protein is of great importance in allosteric regulation. While backbone structural changes and fluctuations have long been studied as the bearer of this information, correlated side-chain fluctuations provide potential information conductance pathways Figure 6. Distance dependence of mutual information in different NMR models of barstar. The average excess mutual information I' per residue pair is plotted here for various atomic interactions, binned according to the C a -C a inter-residue distance, for the crystal structure (1a19 [47]) and four NMR model structures (1btb [43]) of barstar, using the full LJ+IS+HBSB potential. See Methods for details. doi:10.1371/journal.pcbi.1002168.g006 that have often been overlooked. In this work we have examined these side-chain correlations and found that changes in fluctuations in response to even single residue perturbations, such as a point mutation or residue immobilization, are statistically significant, widely distributed, and not easily intuited from the protein's sequence or structure. The correlations emerge independently from several different sources: steric repulsions, solvation, and hydrogen bonding and salt bridges. Together, these interactions give rise to robust, statistically-significant correlations that persist across the entire spatial extent of both barstar and calmodulin.
In the calculation of the mutual information values, we model the protein's backbone as a rigid structure, enabling us to investigate the degree of correlation among side-chain fluctuations alone. An understanding of the role of backbone fluctuations is necessary, however, to judge the biological relevance of these observed correlations. Interestingly, we found that significant correlations are present among the side-chains of barstar in several different backbone structures, which collectively represent the range of its typical conformational fluctuations in solution. Moreover, since the time scales characteristic of backbone and side-chain motions are likely well separated in many cases, communication via side-chain rearrangements as we have described may often occur on an effectively static backbone. However, upon larger backbone rearrangement, such as hinge motions or partial unfolding events, a significant fraction of the observed side-chain fluctuations are likely to decouple, and thus the role of side-chain correlations in allosteric regulation where large backbone rearrangements are known to occur may be limited.
We also utilized a simple implicit solvent model to account for the solvent's mean field effect on the protein. While this approach allows us to focus directly on the protein's degrees of freedom, it neglects solvent fluctuations. Physically realistic solvent fluctuations are likely to influence side-chain fluctuations in the same way that side-chain fluctuations influence one another, and the reverse is also true. This potential for correlation between solvent and side-chain fluctuations suggests that fluctuations of the solvent shell may also convey information from one site on the protein to another. Indeed, we observed that even the mean field effect of solvation mediates the correlation of side-chain fluctuations, as seen in Fig. 3(c). However, more and stronger correlations were observed to arise from hydrogen-bonding and salt bridge interactions (see Figs. 3(e) & 4), and it is largely through these very effects that the solvent molecules would influence the sidechains. While we are not able to explore the resulting implications in our current work, the demonstration of allosteric effects mediated through solvent fluctuations would be quite intriguing.
Finally, it is important to note that the magnitude of the correlations measured here is only a fraction of the total magnitude possible among the side-chains, since correlated fluctuations within each individual rotameric well, which are not included in our correlation metric, are sure to contribute significantly, perhaps even to a greater degree, to the overall amount of correlation. Even so, correlations amongst interrotameric fluctuations alone reveal much about the way sidechain fluctuations give rise to long-range correlations within the folded protein. The role of these robustly correlated side-chain fluctuations in allosteric regulation should be considered further.

Model
Our model is defined by an energy function that depends on the atomic coordinates of the protein's residues. The only degrees of freedom in our model are the dihedral angles x~hzw, where h represents the ideal angle of each rotameric state taken from an empirical rotamer library [33], and w describes torsional variations within the potential energy well of each rotamer. Thus the energy function depends on the full set of h and w values (denoted H and W) for all residues, The potential E dihedrals is piecewise quadratic in w, biasing dihedral angles towards their ideal h values; E non{bonded includes a Lennard-Jones function that governs both repulsive sterics (with a hard-sphere cutoff at 0.75 times the van der Waals radius) and attractive van der Waals interactions, as well as hydrogen-bonding and salt-bridge terms; and E implicit solvent accounts for solvation using an approach based on the solvent-accessible surface area. A detailed description of the model is given in [21].

Structures
The analysis outlined in this paper requires a good structural model of the protein considered. For barstar, we use a crystal structure of the mutant C82A at 2.8 Å resolution (1a19 [47]), except for the analysis in Fig. 6 using different NMR structural models, where the structure 1btb [43] was used. In this case four models (numbers 3, 4, 16, and 29) were chosen to represent the structural variety within the full set of NMR models, as assessed by the RMSD values calculated between all possible pairwise combinations of the four models and compared to those of the full ensemble.

Sampling
Side chain configurations were sampled from the Boltzmann distribution using Metropolis Monte Carlo techniques. In order to calculate the highly converged measurements of S S (res) ij needed to produce the results shown in Figs. 1 & 5(b), additional adaptive umbrella sampling techniques and biased sampling procedures were utilized [21,57]. Averages at 300 K were finally constructed from such calculations by summing results for different energies with appropriate statistical weights. were color-coded only if they passed Student's t-test for significance at the 90% level, using the two-sided Student's t statistic. Mean S S (res) values were calculated using a Wang-Landau [57] bias for five sets of five trials each, first for the unperturbed wild-type protein and then for the same protein perturbed either by freezing or by mutating one of its side-chains. Averages and standard deviations were calculated across the five sets and compared to determine the significance. For Fig. 1, trials ran for 200,000 MC sweeps, whereas for Fig. 5(b), trials ran for 50,000 MC sweeps.

Statistical tests
Significance of mutual information. The calculation of both the excess mutual information I' ij and the signal-to-noise ratio of the mutual information I ij (n)=I (ref) For each interacting trial run, the probabilities p(h i ) associated with each rotameric state were recorded and used to bias its corresponding non-interacting reference run. Excess mutual information values and signal-to-noise ratios were then calculated independently for each pair of interacting and non-interacting, but biased, trial runs. Presented results are averages of these I' ij and I ij (n)=I ij was computed from five independent runs of a biased, noninteracting reference system, also initiated from randomly-chosen side-chain configurations. As described above, the lengths and biases of these reference runs were chosen to produce samples equivalent in size and in single-rotamer distribution to the number and distribution of sterically valid configurations generated in the corresponding simulation of the interacting system. The signal:noise ratio and the excess mutual information were calculated independently for each trial; averages over those trials are presented in the figures. The results shown in Fig. 5(a) and Fig. 6 were calculated using the full LJ+IS+HBSB potential.
For Figs. 4 & 6, the above results were collected into interresidue distance bins of 4 Å wide for barstar and 8 Å wide for calmodulin, except for the first two bins which were kept 4 Å wide in order to highlight the peak at short distances of &6 Å .