Native Contact Density and Nonnative Hydrophobic Effects in the Folding of Bacterial Immunity Proteins

The bacterial colicin-immunity proteins Im7 and Im9 fold by different mechanisms. Experimentally, at pH 7.0 and 10°C, Im7 folds in a three-state manner via an intermediate but Im9 folding is two-state-like. Accordingly, Im7 exhibits a chevron rollover, whereas the chevron arm for Im9 folding is linear. Here we address the biophysical basis of their different behaviors by using native-centric models with and without additional transferrable, sequence-dependent energies. The Im7 chevron rollover is not captured by either a pure native-centric model or a model augmented by nonnative hydrophobic interactions with a uniform strength irrespective of residue type. By contrast, a more realistic nonnative interaction scheme that accounts for the difference in hydrophobicity among residues leads simultaneously to a chevron rollover for Im7 and an essentially linear folding chevron arm for Im9. Hydrophobic residues identified by published experiments to be involved in nonnative interactions during Im7 folding are found to participate in the strongest nonnative contacts in this model. Thus our observations support the experimental perspective that the Im7 folding intermediate is largely underpinned by nonnative interactions involving large hydrophobics. Our simulation suggests further that nonnative effects in Im7 are facilitated by a lower local native contact density relative to that of Im9. In a one-dimensional diffusion picture of Im7 folding with a coordinate- and stability-dependent diffusion coefficient, a significant chevron rollover is consistent with a diffusion coefficient that depends strongly on native stability at the conformational position of the folding intermediate.


Introduction
The study of proteins that fold in an apparent two-state-like manner [1] has led to tremendous advances in protein folding biophysics [2,3]. In line with the consistency [4] and minimal frustration [5] principles, the energy landscapes of these proteins may be pictured as smooth funnels with little ruggedness [6][7][8]. However, the consistency between local and nonlocal interactions is never perfect [4]. Frustration exists [5] in biomolecules and can sometimes serve important biological functions [9]. It is physically intuitive that energetically favorable nonnative interactions can occur [10]. Through improved experimental techniques, nonnative interactions are now known to be more prevalent than previously appreciated [11,12]. From a fundamental biophysical standpoint, a better understanding of the presence and absence of nonnative interactions is key to deciphering biomolecular recognition and to assessing our grasp of basic protein energetics [13].
As one of the earliest definitive examples of nonnative effects in single-domain proteins, the folding kinetics of bacterial immunity protein Im7 and its homolog Im9 are well characterized [14,15]. Despite their very similar native structures (Fig 1A and 1B), a large body of experimental work demonstrates that Im7 folds via an intermediate stabilized by nonnative contacts, whereas Im9 folding is essentially two-state [16][17][18][19][20][21][22]. The relative simplicity of the Im7/Im9 system makes it well suited for an informative case study. Unlike some of the larger proteins (number of residues n ≳ 100) such as cytochrome c [23] and ribonuclease A [24] that fold in a more complex manner [25], Im7 and Im9 folding is not complicated by a heme or disulfide bonds. Indeed, in view of many single-domain proteins that can fold with no apparent nonnative effects, the nonnative interactions in Im7 are likely a consequence of functional constraints [26,27]. It is noteworthy in this connection that the biological functions of Im7 and Im9 are evolutionarily related by promiscuous interactions [28] that are probably underpinned by nonnative excited-state conformations [29].
Theory and computation have provided valuable insights into the Im7/Im9 system. Experimental F-values were used as constraints in conformational sampling to derive putative folding transition states of these proteins [27,30]. The results suggest a functional origin for the nonnative interactions in Im7 [27]. In a separate effort, an equilibrium intermediate state was predicted for Im7 using a Gō-like model that assumes no favorable nonnative interaction [31]. However, although topological frustration and heterogeneity in contact density can, in some cases, lead to kinetic and equilibrium folding intermediates in the absence of favorable nonnative interactions [32][33][34], a subsequent computational study indicates that Im7 folding cannot be explained by native-centric interactions alone [26]. Instead, nonnative effects arising from "localized frustration" [35] was seen as necessary for rationalizing the peculiar behaviors of Im7 [26]. Consistent with this finding as well as with experiment, a sequential stabilization algorithm for predicting folding pathway was not able to reach the Im7 native structure because of kinetic trapping; but the same algorithm was successful in accessing the Im9 native structure [36].
A clear kinetic difference between Im7 and Im9 is manifested by their chevron plots of logarithmic folding and unfolding rates versus denaturant concentration [11]. The folding arm of the Im7 chevron at pH 7.0 and 10°C exhibits a significant rollover, whereas that of the Im9 does not [16,[18][19][20]. The present study addresses this basic distinction between Im7 and Im9 by direct simulations of folding/unfolding rates. Because each chevron plot is a summary of kinetic and thermodynamic data from a large set of folding/unfolding trajectories [13], it is not yet practical to employ all-atom molecular dynamics [37,38] for the extensive computation necessary to produce model chevron plots. Moreover, current molecular dynamics forcefields are probably insufficient to rationalize highly cooperative folding behaviors such as that of Im9 because the forcefields tend to over-predict nonnative effects [38,39]. Therefore, as an interim method that has been applied elsewhere [40][41][42], we develop tractable explicit-chain coarsegrained models [43] to tackle the chevron behaviors of Im7 and Im9, as these behaviors have not been addressed by direct simulations to date. We model nonnative effects using "hybrid" formulations that augment structure-based native-centric interactions with physics-based, Other residue positions are shown in black (for Im7) or blue (for Im9). Each structure contains four helices (I, II, III, and IV). The bottom panels show free energy profiles −ln P(Q) for Im7 (C) and Im9 (D) computed using three different models around each model's transition midpoint.
The present study addresses also the relationship between conformational diffusion and folding intermediates. Diffusion is a useful concept [54][55][56][57][58][59] in describing physical effects of solvent and internal friction in folding [60][61][62][63]. Whereas mild internal friction likely arises from the particulate nature of the solvent [62] and correlated dihedral rotations along the polypeptide [63], elevated internal friction in compact chains [60] can emerge more generally from topological frustration [32,33] and favorable nonnative interactions [10,54]. As discussed below, the Im7 chevron rollover in our model is associated with a coordinate-and stability-dependent coefficient of one-dimensional diffusion, with a strong anticorrelation between native stability and diffusion rate at the position of the transiently trapped intermediate. Notably, the smallest diffusion coefficients at these trapped positions can be orders of magnitude smaller than those encountered in two-state-like folding.

Results/Discussion
We study three classes of coarse-grained chain models for Im7 and Im9. The rationale for the models-termed db (desolvation-barrier), db+hϕ, and db+MJhϕ-are detailed in Methods. The db models are purely native-centric, whereas the other two are hybrid models [13] that allow for sequence-dependent nonnative hydrophobic interactions based on either homogeneous or heterogeneous nonnative energies. The nonnative interaction strength between any pair of hydrophobic residues is taken to be the same in the homogeneous db+hϕ models. We compare this simple approach [49]-which does not account for effects of mutations among hydrophobic residues-to the heterogeneous db+MJhϕ models that utilize a Miyazawa-Jernigan (MJ) statistical potential [52] for the nonnative interactions among different hydrophobic pairs. To compare models on an equal footing, the average hydrophobic interaction strength in the heterogeneous db+MJhϕ models is chosen to be identical to that of the homogeneous db +hϕ models.
The difference between Im7 and Im9 folding is not apparent in the model proteins' Q-dependent free energy profiles The equilibrium free energy profiles computed near the models' transition midpoints (Fig 1C  and 1D) show no dramatic difference between Im7 and Im9. The free energy barrier is lower for Im7 than for Im9 in the db models (dotted curves); but this trend is reversed when the nonnative interactions in the db+hϕ and db+MJhϕ models are included (dashed and solid curves). Nonnative interactions in these models slow down folding for Im7 but speed up folding for Im9. Unlike previous Im7 models that exhibit a significantly populated equilibrium intermediate [26,31] (which is apparently not quite in line with the success of two-state fitting of experimental equilibrium data for wildtype Im7 [22]), folding in our models is thermodynamically two-state as their folding/unfolding barriers under midpoint conditions are quite high (≳ 5k B T, where k B is Boltzmann constant and T is absolute temperature). The only hint of an Im7 folding intermediate is a small dip in the Im7 profiles ( Fig 1C) at Q % 0.85 that is absent in the Im9 profiles ( Fig 1D). This feature by itself is no definitive evidence for complex folding kinetics, however. Under much stronger folding conditions, folding in our models becomes downhill [64]. Now even less difference is seen in Fig 2A between the equilibrium free energy profiles of Im7 and Im9 under zero-denaturant conditions (ΔG/k B T % −10.5 and −12.0, corresponding to the experimental folding free energy of approximately −24.9 kJ mol −1 for Im7 [19] and −28.2 kJ mol −1 for Im9 [15] at pH 7.0 and 10°C; see Fig 2B).
The main difference between Im7 and Im9 chevron plots is rationalized by heterogeneous nonnative hydrophobic interactions The approximate linearity of native stability versus interaction strength /T ( Fig 2B) allows ΔG/k B T to be used as a proxy for denaturant concentration [42] in model chevron plots. Fig 3 shows that the folding-arm rollover and lack thereof, respectively, in the experimental chevrons for Im7 and Im9 at pH 7.0 and 10°C [16,[18][19][20] is captured by the db+MJhϕ but not the db and db+hϕ models, suggesting that the Im7 rollover arises from the strong nonnative interactions among the large hydrophobic residues as modeled by db+MJhϕ (S1 Fig). The difference between Im7 and Im9 folding cannot be explained by native interactions alone (as in db) or the more generic nonnative hydrophobic effects in db+hϕ. The chevron rollover in the db+MJhϕ Im7 model is a consequence of transient yet long-lived trapped conformations at Q % 0.85 ( Fig  4A), which do not appear in Im9 folding under similarly strong folding conditions ( Fig 4B).
An overview of Im7 and Im9 folding kinetics is afforded by their kinetic profiles, which show a deep minimum at Q % 0.85 for Im7 ( Fig 4C) but not for Im9 ( Fig 4D). Determined from folding trajectories alone [59], kinetic profiles are more useful than free energy profiles for identifying folding intermediates. The Im7/Im9 difference is not apparent from the free energy profiles because, on one hand, kinetic trapping is minimal when folding is only weakly favored ( Fig 1C). On the other hand, when folding is strongly favored (Fig 2A), the contribution from folding trajectories to an equilibrium profile is overwhelmed by that from unfolding trajectories, viz., the resident time in the folded state is much longer than that in the unfolded and intermediate states. Consequently, the deep well at Q % 0.85 in Fig 4C translates to merely a small kink around the same Q value in Fig 2A. A physical account of the main difference between Im7 and Im9 folding kinetics is thus provided. Many mutational experiments are rationalized below as well. Because of their simplicity, however, db+MJhϕ models are limited in some respects. For instance, the midpoint folding rate of Im7 is % 1/5 that of Im9 in this model ( Fig 3C); but the experimental midpoint rate of Im7 (% 1.2-3.0 s −1 [19,65]) is ≳ 40 times that of Im9 (% 0.03 s −1 [15,20]). Moreover, whereas the folding and unfolding arms of the simulated chevron plots are quite symmetric around the transition midpoint, experimental unfolding rate exhibits a much weaker denaturant dependence than folding rate [16,[18][19][20]. These drawbacks are typical of topology-based models [42], which are more apt for folding than for unfolding kinetics [43,66]. But this limitation has little bearing on our analysis of folding kinetics. Improved modeling likely requires special stability-enhancing energies that have minimal effects on folding kinetics [67,68]; but such efforts are outside the scope of the present work.

Contact pattern of the computed Im7 folding intermediate is consistent with experimental inference
Structural properties of our simulated Im7 intermediate (Fig 5) are largely in agreement with mutagenesis experiments, which indicate that the intermediate is stabilized by nonnative interactions between Helix IV and the open end of the Helix I-Helix II hairpin involving residues L3, I7, F15, V16, L18, L19, L34, L37, L38, F41, V42, I68, and I72 [19]. Notably, 12 of these 13 residues are involved in the most populated 20 nonnative hydrophobic contacts (with > 80% probability of occurrence) in the Im7 intermediate simulated using db+MJhϕ (Fig 5A, upper triangle). The only exception is V42, for which the most probable nonnative contact V36-V42 Our computed probabilities of contacts are in line with experiments indicating that Helices I and IV are fully formed but Helix II is partly formed in the Im7 intermediate [14]. In Fig 5A, intrahelical contacts between residues i, i + 4 are present but less probable for Helix II (residues  that are normalized for the 0.8 < Q < 0.9 conformations along folding trajectories. The grey dotted lines mark the M, F, I, and L residues along the Im7 sequence. (B) One such Im7 conformation at Q = 0.844 (green C α trace) is compared with the PDB structure (black trace). In the intermediate conformation (green trace), the N-and Ctermini are marked, respectively, by the blue and red spheres. Hydrophobic residues that participate in significant nonnative interactions are marked as orange or yellow spheres (same color code as that in Fig  1A). A significant nonnative interaction is marked by a gray line between a pair of residues if the pair is not a native contact yet their spatial separation in the conformation shown is less than 8.0 Å and their interaction energy is stronger (more negative) than 32 to 45) than for Helices I and IV (residues 12 to 26 and 65 to 78). Experiment indicates also that Helix III is absent [14] but it is present in our simulated Im7 intermediate. This limitation of the model is likely related to its simple treatment of native interactions. Nonetheless, in agreement with experiment, amino acid substitutions in Helix III result only in small changes in folding rate in the db+MJhϕ model (see below).
A snapshot of the simulated Im7 intermediate state is shown in Fig 5B wherein each of the highlighted nonnative contacts has ! 80% occurrence probability except M1-L18 (56%) in the Im7 intermediate ensemble ( Fig 5C). All except one (V42) of the 13 residues identified by mutagenesis experiments (see above) to be stabilizing the Im7 intermediate are represented in the highlighted nonnative contacts. We have verified that structures very similar to the C α intermediate conformation in Fig 5B are physically realizable by constructing a corresponding atomic structure [69] with added sidechains [70] Our simulated Im7 kinetic intermediate is stabilized by nonnative interactions (S1 Fig). As such, it is diametrically different from the equilibrium intermediates simulated using purely native-centric models [31] with heterogeneous Gō energies [71]. Instead of being a product of nonnative effects, equilibrium intermediates in such Gō-like models arise from their reduced folding cooperativity [72], which can lead to three-state-like free energy profiles for Im7 and Im9 (S3 Fig); but such features are at odds with experiment.

Kinetic effects of Im7 mutations
Effects of select mutations in the db+MJhϕ model for Im7 are examined through their folding kinetic profiles [59] (Fig 6). Some mutations reduce the depth of the kinetic trap at Q % 0.85 relative to that of the wildtype (WT) while others lead only to negligible changes. We compute also the rates of reaching the intermediate position at Q % 0.85 and the folded state at Q = 0.98 from initially unfolded conformations. The former rate (% 3.9 × 10 −7 for WT, in units of reciprocal number of time steps) varies little, whereas the latter overall folding rate (= 5.0 × 10 −8 for WT) is sensitive to mutation. The overall folding rate correlates, albeit imperfectly, with the depth of the Q % 0.85 minimum.
The general trend of variation of the simulated folding rates is largely in line with that of the experimental intermediate-to-native folding rates k in [19] or k IN [65] for the single mutants (= 238 s −1 for WT) in Fig 6A. For both simulation and experiment, folding rate remains essentially unchanged for three mutants (simulated rate in units of 10 −8 , experimental rate in s −1 [19,65] However, the present model cannot account for the dramatic experimental increase in folding rate and the disappearance of folding-arm rollover for F41L (k in = 5000 s −1 [19], % 21 times of that of WT) because F and L have similar MJ energies [52]. For this mutant, the simulated rate 3.6 × 10 −8 is smaller than that of WT. Even mutating F to a non-hydrophobic in the model cannot produce the experimental effect of F41L, viz., the simulated rate for the F41G mutant is 2.68 times that of WT but is far from sufficient. To account for the dramatic impact of F41L, future theoretical studies will need to pursue subtle effects beyond our simple treatment of hydrophobicity, perhaps by considering energetics specific to aromatic residues [13,73]. Consistent with experiment [14], L53A/I54A has a negligible kinetic effect on Im7 in our model (Fig 6B), which is in line with the small experimental F-values of % 0.03-0.16 and k in = 200 s −1 for L53 and I54 in Helix III [19]. In contrast, many double mutants with hydrophobicity-reducing substitutions in Helices I and II can dramatically destabilize the folding intermediate and thus speed up Im7 folding ( Fig 6B). These predictions should be testable by future experiments. However, because mutations in our models change only the nonnative but not the native interactions, as it stands our approach cannot address mutations such as L18A/ L19A/L37A that prevent Im7 folding [22].

Im7/Im9 difference in native contact density and hydrophobicity of Helix II
The three-state kinetics of Im7 is related to its hydrophobic composition. Im7 has 32 hydrophobic residues (17 with stronger and 15 with weaker hydrophobicities; Fig 1) whereas Im9 has 28 (15 and 13 in the two categories). In Helix II, Im7 has two more hydrophobics (V33, V42) and the stronger L38 instead of the weaker V37 in Im9. In Helix IV, Im7 has I72 instead of Im9's V71. Notably, V33, L38, and V72 are involved in 10 of the 20 most probable nonnative contacts in the simulated Im7 intermediate listed above.
Im7 and Im9 have nearly equal numbers of native contacts involving Helices I and IV (54 and 50, respectively, for Im7 and 53 and 49 for Im9). But the number of native contacts involving Helix II is 52 for Im7 (residues 32 to 45) and 62 for Im9 (residues 30 to 44). The native contact density of Helix II is thus appreciably lower for Im7 (52/14 = 3.71) than for Im9 (62/15 = 4.13). With lower local native-centricity and higher local hydrophobicity (Fig 7),    Fig 1 and depicted in a different orientation to highlight the native contacts of Helix II. The same color coding for hydrophobic residues is applied to the sequence alignment in (C) below. The eight green lines in (A) mark the native contacts involving Helix II that are found in Im7 but not in Im9, whereas the eighteen red lines in (B) mark the corresponding contacts that are in Im9 but not in Im7. Native contacts common to both proteins are not marked in (A) and (B). (C) Combined native contact maps for Im7 and Im9 using aligned residue numbering (bottom). The sequence alignment here follows that of Friel et al. [21], wherein Im9 residue number i is shifted to i + 1 for i > 27. The first of the shifted Im9 residues, at position 29, is marked by the dashed lines in the contact maps. Blue-shaded regions in the sequence alignment encompass residues belonging to the four helices as defined by the PDB. In the contact maps, native contacts common to Im7 and Im9 are plotted in black, whereas those belonging only to Im7 or only to Im9 are plotted, respectively, in green or red. Native contacts involving Helix II are those within the two Lshaped blue-shaded regions in the maps. The lower-right map follows the definition for native contacts in Methods; this map is the one used in the simulations as well as in the drawings (A) and (B). Included for comparison is the upper-left native contact map determined by the CSU software [74,75].
Im7's Helix II-which contains two more hydrophobic residues than Im9's as shown by the sequences at the bottom of Fig 7 (see also discussion above)-is more prone to nonnative contacts than Im9's Helix II. Indeed, Helix II is involved in all of the 20 most probable nonnative contacts in the simulated Im7 intermediate.
We emphasize that the critical factor here is the local native contact density of Helix II but not necessarily the overall native contact density of the protein. Im7 has fewer native contacts than Im9 (154 versus 164) in our models; yet the simulated Im7 intermediate remains essentially unchanged even if the number of Im7 native contacts is increased to 161 by using Swiss-PdbViewer [69] to construct additional contacts in its less ordered N-terminal region. Moreover, the trend seen here is not limited to our specific definition of native contacts. To assess the robustness of our inference, we have also applied the CSU software, which employs detailed analysis of interatomic contacts and interface complementarity to determine native contacts [74,75]. Under the CSU criterion, the total number of native contacts is very similar for Im7 and Im9 (177 and 180 respectively; see upper-left map in Fig 7). Nonetheless, similar to the observation above, the local density of CSU-defined native contacts of Helix II is also appreciably lower for Im7 (59/14 = 4.21) than for Im9 (67/15 = 4.47).
Experiments on Im9 have shown that V37L/V71I and V37L/E41V/V71I can lead to threestate folding [15,21] and folding-arm rollover at pH 7.0 and 10°C [21]. Computationally (S4 Fig), these mutations deepen somewhat the shallow minimum at Q % 0.85 in the Im9 kinetic profile (A and C of S4 Fig). But the effect is insufficient to account for experimental data, indicating that further effort is needed to better model the balance between native and nonnative interactions in Im9. For instance, if the native interaction strength of L33 (which acts as a "gatekeeper" [76]) in Helix II was reduced, much deeper Im9 kinetic traps would develop for V37L/V71I and V37L/E41V/V71I (B and D of S4 Fig). Although our present model does not address mutational effects on native interactions, this result indicates nonetheless that L33 mutations that reduce the native interaction strengths (e.g., by substituting it with a less hydrophobic residue) may lead to less cooperative folding of Im9. This suggested behavior should be testable by future experiments.
The above analysis of the interplay between local native contact density and hydrophobicity suggests that the different folding kinetics of wildtype Im7 and Im9 may also be seen in variants of the homogeneous db+hϕ model (K HP = 1 as defined in Methods) with stronger nonnative hydrophobic interaction strengths (K HP > 1). Consistent with this idea, S5 Fig shows that a signficant folding intermediate population starts to develop at K HP = 1.3 for Im7 but no corresponding folding intermediate is observed for Im9 at the same K HP . Two comments are in order here. On one hand, the result in S5 Fig from an alternate formulation of hydrophobicity reinforces our general notion that local native contact density and hydrophobicity are the main physical underpinnings for the Im7-Im9 kinetic difference. On the other hand, a strength of ≳ 1.3 for the homogeneous nonnative hydrophobic interaction is needed to achieve the desired Im7-Im9 difference, whereas the heterogeneous nonnative hydrophobic interaction strengths in the db+MJhϕ model that produce a similar effect average only to 1.0 (see Methods; note that even at K HP = 1.3, the minimum at Q % 0.85 in (A of S5 Fig) is shallower than that in Fig 4C). Physically, K HP ≳ 1.3 is problematic because it implies that nonnative interaction strength is ≳ 30% higher than native interaction strength. For this reason and considering the obvious limitation of the homogeneous approach that it cannot address effects of mutations among hydrophobic residues, the more refined db+MJhϕ approach is adopted in the present study.
Conformational diffusion in Q is extremely coordinate-and stabilitydependent in the presence of a significant kinetic trap The Im7/Im9 system is instructive in elucidating nonnative effects and kinetic trapping in the diffusion picture of protein folding [54][55][56][57][58][59]. Conformational diffusion models with a coordinate and stability-dependent diffusion coefficient on a one-dimensional free energy profile were constructed for two-state-like [57] and downhill [58] folding; but corresponding modeling for folding with a significant chevron rollover has not been much explored. In this regard, it is noteworthy that the rollover in our Im7 db+MJhϕ model appears across only % 8% variation in interaction strength (/k B T = 1.37 and 1.48, respectively, for midpoint and zero denaturant). In contrast, rollover-like features for two-state-like and downhill folders emerge over much wider ranges of interaction strength [58].
The restraining-potential method [56,58] in Methods is used to compute Q-and ΔG-dependent autocorrelation function C Q (t) (Fig 8) and diffusion coefficient D(Q) (Fig 9). The restraining-potential method directly addresses the escape probability from a given Q. Rather than seeking a good fit by Bayesian analysis [55], we adopt this method to explore possible limits of the diffusion picture by testing the consistency between diffusive accounts of restrained and unrestrained chain kinetics.
The most notable Im7/Im9 difference presents itself around the Im7 kinetic trap at Q % 0.8-0.9. Here a dramatic deepening of a dip in D(Q) with increasing native stability is seen for Im7 but not for Im9, whereas D(Q) for other Q-values is not very sensitive to ΔG (Fig 9). Achieving numerical convergence of the computed D(Q) in the Q % 0.85 region of Im7 is difficult because of kinetic trapping. To delimit theoretical possibilities, we obtain lower and upper bounds of D(Q) for Im7 in this region, respectively, by initializing restrained runs from kinetically trapped and random conformations (Fig 9).
Im7 chevrons may now be computed in the diffusion model; but considerable variation ensues (shaded area in Fig 10A) because of numerical uncertainties. The rollover trend of the simulated Im7 chevron is among the predicted possibilities. However, when matched against explicit-chain kinetics, D(Q) is found to be underestimated by an overall factor of e 2.7 % 15 ( Fig  10), indicating that the method for computing D(Q) [56,58] needs to be improved or that a one-dimensional diffusion perspective is of limited applicability here.
Despite these uncertainties, it is clear that a D(Q, ΔG) that decreases exponentially with ΔG at the trap position Q % 0.85 ( Fig 10B) is necessary to reproduce the folding-arm rollover for Im7 (Fig 10A, circles). The required variation of D(Q, ΔG) at this position, which spans two orders of magnitude, is reassuringly consistent with the lower bound estimated by initiating restrained runs from the kinetic trap. In the absence of such a strong dependence of D(Q, ΔG) on ΔG, the predicted folding arm would become essentially linear (top dashed line in Fig 10A). In the same vein and consistent with our explicit-chain results (Fig 3), no folding-arm rollover is produced by the diffusion model for Im9.

Concluding remarks
To recapitulate, our explicit-chain models account physically for the strikingly different folding kinetics of Im7 and Im9 in terms of prevalent nonnative interactions among large hydrophobic residues in Im7 but not in Im9. The proteins' different experimental chevron behaviors are rationalized by our simulation. The same phenomenon may also be described by a onedimensional diffusion process with a very small and strongly stability-dependent diffusion coefficient at the position of the Im7 kinetic trapped intermediate.
Our model interaction schemes are tentative [13,46]. For instance, possible contributions to nonnative interactions from electrostatic [48,51,77] and aromatic [73] effects are not taken   into account. Nonetheless, by comparing different modeling schemes as controls and contrasting Im7 and Im9 behaviors under the same scheme, we arrive at a physical picture that is largely in agreement with experiment. As observed experimentally [65,78], Helices I and IV are essentially formed while Helix II is partially formed in our simulated Im7 intermediate. Kinetic effects of many mutations in our model are consistent with experiment, including those involving Helix III (Figs 3-7), demonstrating the versatility of the hydrid modeling approach to nonnative effects [13].
Several limitations of our model are noted. In particular, the short Helix III is present in our simulated Im7 intermediate but experimentally that is apparently not the case [65,78]. To address this issue, more sophisticated treatments of local conformational propensity [36,79] and sidechain effects [13,42] are probably needed. Indeed, the rich repertoire of experiments on the Im7/Im9 system, such as those on pH [18,21] and temperature [15] effects, offers ample data for testing extensions of our models.
Perhaps the most useful insight from the present effort is that the peculiar folding kinetics of Im7 vis-à-vis that of Im9 is closely related to their difference in the balance between local native contact density and hydrophobicity. This principle embodies a competition between native topology and nonnative interactions [49] and is likely applicable to protein dynamics and biomolecular processes in general. As such, it should be examined in detail and extended to other forms of nonnative interactions in future investigations.

Explicit-chain models
Three related C α chain models for Im7 and Im9 are considered, namely the db, db+hϕ, and db +MJhϕ models. Common to these models is a set of native-centric interactions with desolvation barriers for each protein. Folding and unfolding kinetics is simulated by Langevin dynamics [80]. Desolvation barrier (db) is a robust feature in hydrophobic interactions [81] that tends to enhance folding cooperativity [40,82]. Indeed, for some proteins such as ribosomal protein S6, C α models with db lead to highly cooperative folding behaviors that are consistent with experiments [49] whereas models without db exhibit only weak folding cooperativity [76]. Here, following Ref. [80], the pairwise db energy is defined by a contact minimum well depth of = 1.0, a db height of db = 0.1, and a solvent-separated minimum well depth of ssm = 0.2.
The db model is purely native-centric with the total interaction potential, denoted here as E N , equal to V total in Ref. [80]. The same interaction strength is applied to all native-centric interactions. The native contact sets for Im7 and Im9 are constructed using the same criterion [80]. A pair of residues i, j belongs to the native contact set if at least one pair of their non-hydrogen atoms, one from each residue, are less than 4.5 Å apart in the Protein Data Bank (PDB) structure. The PDB C α separation between i, j is denoted by r n ij . The total number of native contacts in the set,Q n , is equal to 154 and 164, respectively, for Im7 and Im9 (Fig 7). We have explored using alternate "flavored" native-centric interaction strengths [72,83] in accordance with the residue-dependent contact energies in Ref. [71] but, interestingly, the resultant models for Im7 and Im9 fail to fold cooperatively.

Homogeneous and heterogeneous nonnative interactions
Favorable nonnative interactions are included in db+hϕ and db+MJhϕ. Using a hybrid formulation [13,[43][44][45][46][47][48][49][50][51][84][85][86][87][88][89][90][91][92], the total interaction potentials E T of these models are given by E T = E N + E HP , where E HP ¼ P n i P n j¼iþ4 K HP k ij exp½Àðr ij À s h Þ 2 is the sum of sequence-dependent nonnative contact energies over i, j that are both hydrophobic (hϕ), defined to be the eight amino acids A, V, L, I, M, W, F, and Y [47]. r ij is the C α distance between i, j during simulation (1 i, j n, where the total number of residues n = 87 for Im7 and n = 86 for Im9); and σ hϕ = 5.0 Å. The nonnative hϕ interactions in the db+hϕ model are homogeneous with κ ij = −1.0 irrespective of hydrophobic residue type and K HP = 1.0 as in Refs. [47,49], whereas the nonnative hϕ interactions in the db+MJhϕ model are heterogeneous, with κ ij = Δ ij where Δ ij is the contact energy in Table V of Miyazawa and Jernigan [52] and K HP = 1.8 such that the average hϕ energy K HP hκ ij i over all possible 8 × 7/2 + 8 = 36 hϕ pairs is equal to −1.0 (the K HP κ ij values range from −0.216 for A-A to −1.584 for F-F). This average hϕ interaction energy of −1.0 is essentially maintained by the average hϕ energies over all possible nonnative hϕ contact pairs (defined below) for the Im7 and Im9 sequences in the db+MJhϕ models. Those average energies are equal to −0.994 for wildtype Im7 (412 possible nonnative hϕ pairs) and −0.998 for wildtype Im9 (306 possible nonnative hϕ pairs). MJ-type potentials [52,71] are derived from the statistics of native contacts in the protein structure database. Because protein native structures do not contain many significantly unfavorable contacts, MJ potentials are not expected to describe repulsive interactions between amino acid residues with accuracy [93]. Nonetheless, they do provide a crude account of the relative strengths of favorable physical interactions between residues. In fact, it has long been known that MJ potentials for nonpolar pairs reflect the combined hydrophobicities of the two contacting residues [94,95], as is illustrated by the good correlation (Fig 3b of [96]) between a set of MJ energies [71] and the experimental octanol-water transfer free energies of amino acids [53]. In this regard, although there are considerable variations among experimental hydrophobicity scales for all twenty types of amino acids [96,97], a higher degree of consistency among different experimental scales is seen for the hydrophobic (nonpolar and non-charged) amino acids themselves [98]. Taking these considerations together, we view MJ energies between nonpolar residues as a reasonable coarse-grained model of the underlying physics of hydrophobicity. Thus, they should be applicable to favorable nonnative hydrophobic interactions and represent a more refined model than those with homogeneous hydrophobic interaction strengths.
In our models, two hydrophobic residues i, j that are not in contact in the native PDB structure are considered to be in a nonnative contact if |i − j| > 3 and r ij < 8.0 Å (Fig 5). The total number of nonnative contacts in a conformation is denoted by n HP (S1 Fig). All non-bonded energies in our models are temperature independent and pairwise additive. For simplicity, temperature dependence and nonadditity of interactions [99][100][101][102] are not considered here.

Free energy profiles, kinetic profiles, and chevron plots
We consider a residue pair i, j in the native contact set to be in contact during the folding/unfolding simulation when r ij r n ij þ 1:5 Å; i.e., when r ij is not larger than that of the db and therefore within the attractive well of the contact minimum. We use Q, the number of native contacts divided byQ n , as progress variable of folding [103,104]. A free energy profile in units of k B T corresponds to −lnP(Q) where P(Q) is the normalized conformational population distribution as a function of Q (Figs 1 and 2). As was introduced before [59], the kinetic folding path (FP) profiles, −ln P FP | s (Q), is the negative logarithm of average fractional resident time P FP as a function of Q along folding trajectories wherein the notation "| s " indicates that equal weight is assigned to every folding trajectory [59] (Figs 4 and 6). Chevron plots are simulated using change in native stability by varying the simulation temperature as a proxy for variation of denaturant concentration [105] (Fig 3). With a low Langevin viscosity, this approach is computationally efficient and is appropriate for our present purpose because the trend (shape) of model chevron rollover is apparently unaffected by variation over a wide range of Langevin viscosities [101]. Recent tests also indicate that the model chevron plots thus obtained are very similar to those simulated using more sophisticated coarse-grained sidechain models that account for denaturant dependence by experimental transfer free energies [13,41,42].

Nonexplicit-chain models of one-dimensional conformational diffusion
We use the restraining (bias) potential method [55,56,58,106] to estimate Q-dependent diffusion coefficients at different simulation temperatures (hence different free energies of folding ΔG). Following Ref. [55], a Q-dependent diffusion coefficient is given by for a given ΔG. Here the variance varðQÞ hQðt 0 Þ 2 i t 0 À hQðt 0 Þi This method for determining D(Q) is exact if the diffusion process is truly governed by the Smoluchowski equation and K Q is sufficiently large so that variation of free energy G(Q) within a constrained conformational ensemble is essentially linear in Q. The applicability of this approach to protein folding, however, hinges on whether the dynamics along Q is Markovian to a good approximation [55]. For protein folding, D(Q) estimated by the restraining-potential method does exhibit a weak dependence on K Q [58]. We have checked our restraining-potential methodology against that of Xu et al. [58] by using a K Q value that produces conformational distributions similar to theirs. Our D(Q) for chymotrypsin inhibitor 2 at transition midpoint matches well with theirs (S8 Fig). D(Q) can also be estimated using Bayesian analysis [55]. For one dipeptide system, the Bayesian-estimated D(Q) was verified to be very similar to that from restraining potentials [55]. Here we use only the restraining-potential method.
Once D(Q) is in place for a given native stability (free energy of folding) ΔG, the folding MFPT in our nonexplicit-chain models of one-dimensional conformational diffusion (Fig 10) is computed using the discretized form [59] of the general formula [54,109] ðMFPTÞ D ¼ where P eq (Q) is the normalized equilibrium conformational population at Q. The boundary values Q N and Q D for the native (folded) and denatured (unfolded) states are the same as that in our explicit-chain simulations (Fig 2). Alternatively, MFPT can be computed using Kawasaki Monte Carlo (MC) dynamics by generalizing the formulation in Ref. [59] to coordinate-dependent D(Q), viz., the transition probability from Q to Q + δQ is now given by A À1 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi DðQÞDðQ þ dQÞ p exp½ÀdGðQÞ=k B T where δG G(Q + δQ) − G(Q) and A is a constant.
The above geometric mean ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi DðQÞDðQ þ dQÞ p may also be replaced by the arithmetic mean [D(Q) + D(Q + δQ)]/2; the two means are equal in the limit of D(Q + δQ) − D(Q) ! 0. In addition to MFPT, Kawasaki MC is useful also for providing distribution of folding times and other properties of individual trajectories. The data shown in (E) and (F) show that for a given Gō model setup (with either homogeneous or heterogeneous interactions), the Q-dependence of energy and entropy exhibits similar trends for Im7 and Im9, indicating that the nature of the Im7 and Im9 equilibrium intermediates (observed in the models with MJ Gō interactions) are rather similar. In both cases, the second barrier at Q % 0.9 in the MJ Gō model arises from a decrease in conformational entropy with respect to increasing Q (from % 0.8 to % 0.9) that is not fully compensated by a corresponding decrease in energy. (G, H) Snapshots of conformations with Q values corresponding to the thermodynamic intermediate states in the models with MJ Gō interactions for Im7 (G) and Im9 (H). The blue and red spheres correspond, respectively, to the N-and C-termini of the conformations. Snapshots for the models with homogeneous Gō and MJ Gō interactions are depicted by green and red traces respectively. The black traces represent the PDB structures of Im7 (G) and Im9 (H). The Q value for the Im7 snapshots [green and red traces in (G)] is Q = 0.838, that for the Im9 snapshots [green and red traces in (H)] is Q = 0.762. These drawings show quite clearly that the intermediate Im7 and Im9 conformations in the MJ Gō models are largely native. The only regions that deviate significantly from the native conformation are a short disordered C-terminal segment for Im7 (G) and short disordered N-and C-terminal segments for Im9 (H). Comparing the red and green traces in (G) and (H) indicates that the equilibrium intermediates in the Im7 and Im9 models with MJ Gō interactions are a consequence of these models' significantly higher degree of disorder of the C-terminal region relative to that in models with homogeneous Gō interactions. The C-terminal regions are more disordered in the MJ Gō models because each of the amino acid sequences for the regions (GKPGFKQG for Im7 and GKSGFKQG for Im9) contains only one hydrophobic residue. As a result, the favorable interaction between the C-terminal sequence and the rest of the protein is weak when MJ energies are used for the Gō interactions. As discussed in the main text, the present results in this figure may be compared with those reported in