Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches

Deciphering the effects of nonsynonymous mutations on protein structure is central to many areas of biomedical research and is of fundamental importance to the study of molecular evolution. Much of the investigation of protein evolution has focused on mutations that leave a protein’s folded structure essentially unchanged. However, to evolve novel folds of proteins, mutations that lead to large conformational modifications have to be involved. Unraveling the basic biophysics of such mutations is a challenge to theory, especially when only one or two amino acid substitutions cause a large-scale conformational switch. Among the few such mutational switches identified experimentally, the one between the GA all-α and GB α+β folds is extensively characterized; but all-atom simulations using fully transferrable potentials have not been able to account for this striking switching behavior. Here we introduce an explicit-chain model that combines structure-based native biases for multiple alternative structures with a general physical atomic force field, and apply this construct to twelve mutants spanning the sequence variation between GA and GB. In agreement with experiment, we observe conformational switching from GA to GB upon a single L45Y substitution in the GA98 mutant. In line with the latent evolutionary potential concept, our model shows a gradual sequence-dependent change in fold preference in the mutants before this switch. Our analysis also indicates that a sharp GA/GB switch may arise from the orientation dependence of aromatic π-interactions. These findings provide physical insights toward rationalizing, predicting and designing evolutionary conformational switches.


S1 Text: Modeling details and related computational studies Calibration of SBM potential
To construct an appropriate dual-basin SBM, we initially attempted two seemingly most straightforward schemes for linearly combining the SBM potentials for G A and G B (S1 Fig  and S2 Fig). They entail either equalizing the total SBM energies for the two native structures, i.e., E A =E B =−n B , where without loss of generality n B may be identified with the number of native contacts in G B , or setting the per-contact energy ε A =ε B =−1 (a constant one may choose to be −1 without loss of generality). In the first case, the per-contact energy of G A is greater, which in combination with the higher conformational entropic contribution of the disordered termini in G A leads to the population being largely concentrated in either the G A basin or the unfolded region (S3 Fig; a,b). In the second case, the G A basin is significantly shallower due to the smaller number of contacts in the G A contact map (n A =95 whereas n B =137), resulting in an overpopulation of the G B basin with virtually no population in the G A basin (S3 Fig; c,d).
In view of these exploratory results, it is quite clear that a small imbalance in the two SBM strengths of ~4% in favor of G B would populate both native basins effectively. Structurally, some degree of imbalance is expected from the higher degree of disorder at the termini of G A compared to G B . G B also packs more tightly and has a higher contact order due to its -rich fold. With these considerations, the native SBM energies (lowestpossible energies) we adopted for most of our simulations were E A =0.96ε B n A , E B =ε B n B .
As discussed in the main text, we reduced the SBM potential strengths as much as possible to enhance the contributions from the sequence-specific transferable component of the hybrid model potential while maintaining sufficient native bias to enable folding to both G A and G B (S4 Fig). For instance, at ε B =−0.25, some degree of folding to G B was seen but folding to G A was not observed, with the unfolded state dominant at T m (S5 Fig). After exploring a range of values of ε B between −2.0 and −0.25, the simulation results at ε B =−0.37 were seen as most consistent with experiment. Unless stated otherwise, simulation results in the main text were obtained using ε B =−0.37.

SBM potential strength: robustness of predicted switching behavior
While ε B =−0.37 was chosen for extensive analysis, the general behavioral trend of our model G A /G B system is robust for a reasonably wide range of ε B values. As apparent in S4 Fig, the preferences of GA98 and GB98 for the G A and G B folds, respectively, are maintained over a range of different ε B values. A more detailed analysis of this feature of our hybrid model is provided in S7 Fig, where free energy surfaces of GA98 and GB98 computed at their respective T m are shown for different -ε B values, in steps of 0.05, between -ε B = 0.2 and -ε B = 0.5. The simulation protocol used here is the same as that in the main text. Both the G A and G B native basins are stabilized for 0.3 ≤ -ε B ≤ 0.5, whereas -ε B < 0.3 is not sufficient to drive G A folding. Notably, the G B intermediate is populated significantly for -ε B ≤ 0.4, but is depleted for higher -ε B , indicating that the G B intermediate state predicted by our hybrid model is a consequence of the physical transferrable potential, not the Gō-like SBM. S8 Fig shows further that the difference in F(G A -G B ) = -ln(P A /P B ) between GA98 and GB98 (see main text) is approximately constant across different SBM strengths for 0.2 ≤ -ε B ≤ 0.5. Within this range, G A is always more favored by GA98 (lower F) than by GB98 (higher F), irrespective of the approximate linear increase of T m with increasing SBM strength -ε B (S8 Fig, inset).
Further illustration of this robustness is provided by Hamiltonian replica exchange simulations (Fukunishi et al., 2002). In addition to the temperature replica exchange discussed in the main text ---where temperatures of different replicas are swapped, we have also conducted constant-temperature Hamiltonian replica exchange simulations, wherein different ε B values within the range 0. While the present hybrid model is rather robust for 0.3 ≤ -ε B ≤ 0.5, it is instructive to note that the model does not reproduce G A /G B behaviors at the two extremes of no SBM (ε B = 0) or no nonlocal transferrable interactions. The ε B = 0 failure reiterates the fact that to date no transferrable potential by itself can account for G A /G B folding and switching. For ε B = 0 (transferrable potential only), our model samples secondary-structure-rich conformations with a helical bias [max(Q A ) > max(Q B )], but the G A and G B native states are not visited (S11 Fig; a). For the case in which the transferrable component of the potential is artificially limited to nonlocal atomic excluded volume repulsion and local terms that maintain a protein-like backbone geometry (i.e., nonlocal attractive interactions of the transferrable potential such as hydrogen bonding and hydrophobic interactions are not considered), no folding was observed for -ε B = 0.37. Moreover, the low Q A and Q B values of the sampled conformations suggest a lack of secondary structure in this model setup (S11 Fig; b, top). At -ε B = 1, the G A and G B folds are substantially populated but so are conformations along the line connecting the two native basins (S11 Fig b, bottom). Hence these free energy landscapes are not bi-stable. It follows from the control simulations in S11 Fig that neither the transferrable Lund potential nor the SBM are individually sufficient to produce the G A /G B switching results of our hybrid approach.

Transition flux analysis
To characterize the shift in native stabilities in the G A /G B system more comprehensively, we partitioned the Q A /Q B free energy landscape into either three (S13 Fig; a) or eight (S13 Fig; b) macroscopic states (lines separating the states are overlaid onto the GB98 landscape as an example in S13 Fig). Replica exchange simulation trajectories, where each trajectory is a temporally ordered list of conformational changes undergone by one protein chain simulated under varying temperature and recorded every 1,000 Monte Carlo cycles, were analyzed for the number of transitions between the macroscopic states. Transition frequencies were normalized for comparison between GA/GB sequence variants. In the three-state model, transition frequencies to and from the native states (G A vs G B ) showed a gradual shift as the sequence varies from GAwt to GBwt (S13 Fig; c). For the eight-state model (S13 Fig; d), transition frequencies are plotted on a log scale (as a visual aid) and only for the eight sequences GA77 to GB77, where all states were populated to some degree. S13 Fig (e) highlights two of these transitions, namely the ones between the native states and their respective neighboring transition state. Here again, the transition frequencies shift along the sequence axis and crossover in between GA98 and GB98. As an internal consistency check of our simulation procedure, these results confirm our free energy calculations indicating a gradual shift in native state preference from G A to G B as the sequence varies from GAwt to non-wildtype GA/GB variants and then to GBwt.

Possible molecular basis for L45Y-induced structure switch
Because each cluster in Fig.4 is a mixture of GA98 and GB98 and the initial pool of conformations before clustering contained an equal number of GA98 and GB98 conformations, any deviation from a 50/50 ratio in a cluster indicates a population shift due to the mutation. S15 Fig summarizes these shifts. The horizontal "fold bias" variable separates centroids of clusters of conformations similar to G A (left) from those similar to G B (right). The vertical "sequence bias" variable separates the relative per-cluster bias of GA98 vs GB98. Clusters with positive or negative values are enriched, respectively, in GB98 or GA98 conformers. A correlation between fold and sequence bias is seen in S15 Fig but there are outliers. All clusters within the G A native basin (bottom-left, blue) are depleted by L45Y. In contrast, most clusters in the G B intermediate basin are strongly favored by the same mutation (clusters nos. 6, 18, 13, 22; except cluster 17, see below). The same is true for clusters around TS2-G B (cluster nos. 29 and 37). One important exception, however, is the largest, most G B -like native cluster that lies on the horizontal axis (cluster no. 4). This result indicates that according to the transferrable potential here, L45Y favors the G B -like native intermediates even more than the G B native fold itself.
The centroid structures of those clusters in the unfolded state or close to TS1-G B that are most enriched in GB98 conformations (cluster no. 28) or GA98 conformations (cluster no. 12) are shown to the left of the vertical axis in S15 Fig. They all feature the second β-hairpin that hosts the mutated Y45, which favorably interacts with F52 on the opposite strand. This observation on unfolded clusters suggests that L45Y likely has a G B -biasing effect even in the early stages of folding. Interestingly, the kinetically trapped version of the second β-hairpin in the centroid of cluster no. 12 is more favored by GA98 than by GB98 sequences.
We further characterize the L45Y-induced drive toward G B by quantifying the bias in contact statistics in the entire unfolded state (all conformations with Q A < 0.6 and Q B < 0.3). The aromatic-aromatic Y45-F52 interaction stands out as the one affected most by the L45Y substitution. This contact is more frequent in GB98 compared to GA98 conformations by more than 4%. In contrast, the i, i+4 helical contact K46-L50 is favored in GA98 compared to GB98 conformations by about 4%. Further details of this analysis are provided in S16  (Fig.4 of main text). In contrast, cluster no. 17 is most biased toward GA98. This cluster is a clear outlier in that its conformational population is biased towards GA98 conformers (negative vertical value) although its contact pattern is biased toward the G B fold (positive horizontal value). Notably, only the first but not the second β-hairpin is formed in the centroid of this cluster, although stabilizing interactions in the second β-hairpin are important in the formation of the G B fold. The ambivalence in sequence-structure relationship in this cluster underscores once again the bi-stable nature of the GA98 and GB98 sequences.

Related computational studies
It is instructive to compare and contrast our model predictions with computational results in the literature on G A and G B folding. A brief summary of select studies is provided below.
Extensive simulations of coarse-grained chain models have long been performed on GBwt (commonly referred to as protein GB1 or Protein G). The insights they provided into the folding process and their structural characterization of possible intermediate and transition states are of interest. An early Cα Gō-model predicted that GBwt folding is two-state (Karanicolas and Brooks, 2002). In an all-atom Gō-model simulation of GBwt performed around the same time, the major folding pathway was seen to involve an interaction between the helix and the first β-hairpin whereas alternative pathways, including the ones more similar to that suggested by our present GA98/GB98 simulations, have low probabilities (Shimada and Shakhnovich, 2002). However, a subsequent study using a simplified Cα chain model with a physics-based transferable potential found two alternative pathways of GBwt folding: a two-state transition via the interaction of the helix and the second β-hairpin, and a three-state transition where a third β-strand is added and forms a stable intermediate (Brown and Head-Gordon, 2004). Both pathways are similar to what we observed for GA98/GB98 folding (Fig.4 of  main text, and S14 Fig). A subsequent study using a coarse-grained Cα+Cβ (two beads per residue) found that the first major event along the GBwt folding pathway is the formation of the second βhairpin with strong aromatic interactions and its temperature-dependent association with the helix. The third step is the association of the first and fourth -strands followed by incorporation of the Y3 residue into the hydrophobic core consisting of aromatic sidechains, resulting in a "fluctuating native-like globule" (Kmiecik and Kolinski, 2008).
Unfolding simulations on the G A /G B system have also been conducted. An all-atom explicit-water molecular dynamics study of high-temperature unfolding of GA59 and GB59a sequence pair that share 59% amino acid identitysuggested that the second β-hairpin forms early during GB59 folding and associates with the central helix, a process that appears to prevent folding into G A -like structures (Scott and Daggett, 2007). Nonlocal hydrophobic interactions between aromatics were found to serve an important role in this process. The early unfolding transition state they identified (see Scott and Daggett, 2007, Fig. 8 therein) has lost the hydrogen bonds between the first and second β-strands, similar to some of the conformations in our G B intermediate state (e.g. cluster no. 22 in Fig.4 of main text) and TS2-G B (Fig.4 of main text and S14 Fig). They also found that the second β-hairpin was kinetically more stable against unfolding than the first β-hairpin. A subsequent joint experimental-computational study that also involved all-atom explicit-water molecular dynamics simulations of high-temperature GBwt (GB1) unfolding identified intermediate-and transition-state conformations as well (Morrone et al., 2011a , Fig 6 therein). These authors concluded that GBwt folding is not two-state, but they also stated that the partially folded state identified in their work did not accumulate and therefore was distinct from the intermediate states suggested by previous studies (Morrone et al., 2011a). An explicit-water molecular dynamics simulation of mechanical unfolding of GBwt suggested that the second β-hairpin tends to form early during the folding process as well (Wu et al., 2012).
In conjunction with experiment, all-atom explicit-water molecular dynamics simulations of the disordered unfolded states of the GA88/GB88 pair of sequence variants suggested that GB88 has higher helical content in its unfolded state than GA88 despite the helical folded structure of GA88. Moreover, sidechain hydrogen bonds in the GB88 unfolded ensemble were found to stabilize the second β-hairpin (Morrone et al., 2011b). A recent metadynamics study based on NMR data on protein GB3 (which differs from GBwt by only seven amino acids; see PDB:2OED) also found that "the second β-hairpin represents the initial native element in the folding process" (Granata et al., 2013).
A more recent sequence design study of the G A /G B system predicted that the N-and Cterminal regions of the protein are most sensitive to mutations and thus likely to harbor "switch" substitutions like L45Y that leads from GA98 to GB98 (Chen et al., 2016). Their emphasis on the importance of the termini region echoes our transition state analysis, which suggests that the first putative bottleneck for G B folding, TS1-G B (Fig.4, main text), often entails antiparallel alignment of the termini (see the two most frequent representatives of TS1-G B in S14 Fig).
All the results outlined above are in broad agreement with our findings, based on an analysis of the present GA98/GB98 simulations, that the second β-hairpin forms early in G B folding. In contrast, all-atom pure SBM (Whitford et al., 2009) simulations found instead that GB folding is seeded by the first β-hairpin. Langevin dynamics simulations based on these models for GA95/GB95 and GA98/GB98 also did not find any stable intermediate along the G B folding pathway (Kouza and Hansmann, 2012;Sutto and Camilloni, 2012). These differences in results between models with a physical transferable potential (e.g. common molecular dynamics force fields) or a hybrid potential with a transferable component on one hand and pure SBM on the other are likely caused in part by the lack of favorable nonnative interactions in the pure SBM models.

Possibility of a kinetic trap in G B folding
Our simulations suggested a kinetic trap in the G B folding pathway that features an inverted second hairpin (cluster no. 12 in S15 Fig and the 32.4%-populated TS1-G B and 19%-populated TS2-G B cluster centroid conformations in S14 Fig). While the sensitivity of this prediction to variation of model parameters remains to be ascertained, it is worth considering whether such "mirror image" conformations could potentially exist in general. An early high-coordination lattice model study succeeded in intentionally creating the mirror image of protein A by mutations (Olszewski et al., 1996). More mirror-image conformations of helical proteins were reported as well in subsequent investigations (Liwo et al., 1999;Yang et al., 2007). More recently, Onuchic and coworkers (Noel et al., 2012) as well as Scheraga and coworkers (Kachlishvili et al., 2014) have emphasized that mirror-image conformations with secondary structure and tertiary contacts similar to a native structure may sometimes compete with the true native state. In particular, in one of these simulation studies it was demonstrated that metastable mirror-image conformations of the native structure could exist for protein A (i.e. GAwt) and that the two alternative folds are energetically similar (Kachlishvili et al., 2014). Further examples of mirror images of native structures in molecular dynamics simulation were reported for the three-helix bundle α 3 W (Shao, 2015). In fact, mirror-image conformations have long been noted in computational structure prediction and NMR refinement, but they have largely been treated as artifacts of a nonoptimal energy function (Pastore et al., 1991;Skolnick et al., 1997;Zhang, 2009). Since the native structure and its mirror image may share many characteristics, experimental probes with low spatial resolution such as circular dichroism may not always be able to resolve them. Ultimately, whether or not true mirror-image conformations of native structures exist, and their prevalence if they do, are questions that need to be settled by further experiments.
Are current experimental techniques sufficiently sensitive to distinguish between the alternative G B β-strand orientations seen in the present simulations? A recent experimental and simulation study on the five-helix-bundle protein λ 6-85 describes an off-pathway trapped state with high native contact fraction (Q=0.7) by using multiple fluorescence markers (Prigozhin et al., 2015). In the future, similar techniques may also be applied to β-rich proteins such as GBwt and its variants. In the folding simulations of GB98 presented here, we only observed mirror images of the intermediate state but not the native state. As it stands, it is an open question whether three-state folding reported for several GB variants (Giri et al., 2012) can be explained in part or in its entirety by a kinetically trapped mirror-image intermediate because it might be difficult for common experimental probes to distinguish such trapped states from a native-like intermediate.
The mirror-image G B intermediate encountered in our simulations arises from an alternative form of the second β-hairpin. Of relevance here are several computational investigations on the possible occurrence of misfolded states in this hairpin of GBwt (GB1). An early implicit-solvent Monte Carlo simulation study of the 16-residue second β-hairpin region of GBwt found that the Y45-F52 interaction forms early, and that the peptide can form a significant number of nonnative hairpins (Dinner et al., 1999). Further theoretical information is available from subsequent molecular dynamics simulations of the second β-hairpin region in isolation, which has been found in some studies to populate a misfolded hairpin (one strand flipped relative to the other) that has to first unfold before it can proceed toward the correctly folded β-hairpin (Best and Mittal, 2011;Hatch et al., 2014). In another study, a completely misfolded hairpin similar to that in our mirror-image intermediate was found among other misfolded conformations (Pietrucci and Laio, 2009). The prediction of misfolding is not universal, however, as another computational study did not encounter misfolded structures (Juraszek and Bolhuis, 2009). Taken together, these molecular dynamics results on isolated peptides suggest strongly that misfolded second β-hairpins of GBwt do occur, albeit not always in the mirrored configuration found in our trapped intermediates. However, tertiary contacts between the β-hairpin with other parts of GBwt outside the hairpin could facilitate clustering of hydrophobic residues of the hairpin on the "wrong" side in the full G B structural context.
In any event, potential difficulties in identifying the misfolded hairpin in simulation can be caused by the choice of reaction coordinates. On one hand, backbone RMSD, or the fraction of native hydrogen bonds may group a misfolded hairpin with other unfoldedstate structures. On the other hand, some studies may count the misfolded hairpins as native or native-like when using the secondary structure content or the fraction of native contacts formed (Q) as reaction coordinate.