Energetic Frustrations in Protein Folding at Residue Resolution: A Homologous Simulation Study of Im9 Proteins

Energetic frustration is becoming an important topic for understanding the mechanisms of protein folding, which is a long-standing big biological problem usually investigated by the free energy landscape theory. Despite the significant advances in probing the effects of folding frustrations on the overall features of protein folding pathways and folding intermediates, detailed characterizations of folding frustrations at an atomic or residue level are still lacking. In addition, how and to what extent folding frustrations interact with protein topology in determining folding mechanisms remains unclear. In this paper, we tried to understand energetic frustrations in the context of protein topology structures or native-contact networks by comparing the energetic frustrations of five homologous Im9 alpha-helix proteins that share very similar topology structures but have a single hydrophilic-to-hydrophobic mutual mutation. The folding simulations were performed using a coarse-grained Gō-like model, while non-native hydrophobic interactions were introduced as energetic frustrations using a Lennard-Jones potential function. Energetic frustrations were then examined at residue level based on φ-value analyses of the transition state ensemble structures and mapped back to native-contact networks. Our calculations show that energetic frustrations have highly heterogeneous influences on the folding of the four helices of the examined structures depending on the local environment of the frustration centers. Also, the closer the introduced frustration is to the center of the native-contact network, the larger the changes in the protein folding. Our findings add a new dimension to the understanding of protein folding the topology determination in that energetic frustrations works closely with native-contact networks to affect the protein folding.


Introduction
Residue interactions define both the protein structure and the mechanism of protein folding, and a subtle equilibrium between residue contacts exists as a compromise between the protein function and protein folding thermodynamics and kinetics [1][2][3]. Native residue interactions, which are contacts between neighboring residues in the native structures, are believed to play essential roles in determining protein structures and protein folding dynamics [4,5]. Thus, in efforts to solve the protein folding problem, over the decades many theoretical models were developed to reproduce native residue interactions from aminoacid sequences, including a variety kind of coarse-grained models and a few all-atom models [6][7][8][9][10][11][12][13][14][15][16]. On the other hand, recent experimental findings of protein folding intermediates and rarepopulated structures highlight protein conformations that do not exist in native structures [17][18][19][20][21], leading to increasing interests in studies of non-native residue interactions in protein folding [22][23][24][25][26][27].
The energy landscape theory has provided an invaluable framework for studies of protein folding. According to the theory, the formation of a few key native-contacts at start might intrigue a cascade of down-hill like conformation changes leading to the native states, and the protein folding processes form funnel-like free energy-reducing trajectory [12]. The basic idea behind the theory is the so-called principle of ''minimal frustration'', which emphasizes both the natural reduction of undesired reside interactions in the protein folding and the emergence of frustrations as local minima in the funneled energy landscape [28,29]. For a small fast folding protein the energy landscape surface ruggedness is predicted to be small and the folding processes is described by the so-called two-state model as a diffusion of configurations along a reaction coordinate from unfolded states to folded states [30]. One representative model is the Gō-like model in which protein folding is solely driven by the native-contact residue interactions and the non-native residue interactions are either ignored or set to be repulsive [4,31]. However, recent experimental observations and theoretical calculations suggested that non-native residue interactions, to some extent, might affect the overall folding process by introducing local frustrations [23,[32][33][34][35][36][37].
Frustrations in protein folding are usually analyzed using protein conformations of the transition states which formed the so-called the transition state ensemble(TSE). TSE usually locates in the free-energy maxima on reaction paths that connect the native-like and fully unfolded states [38]. Over the decades, intensive studies had been dedicated to structural characterization of TSE conformations, from which significant progresses had been made in understanding of protein frustration principles. For example, Clementi etc. determined key factors in the TSE confirmation distribution for small globular proteins using a coarse-grained Gō-model [39], Shea etc. distinguished two types of frustrations in protein folding: the energetic and topological frustration [22,40]. Sutto etc. examined the effects of frustrations on the formation of on-pathway intermediate in the protein folding of IM7 using an all-atom AMW model [23]. Zarrine-Afsar etc. studied energetic frustrations in protein folding kinetics of the Fyn SH3 domain, interestly they introduced a Gaussian type potential function for non-native hydrophobic residue interactions as energetic frustrations [25]. Hills etc. investigated topological frustrations in the protein folding of a/b/a Sandwich CheY-like proteins using a sequence-sensitive Gō-like model [41]. Very recently, Contessoto etc. studied the interplay between energetic and topological frustrations for a set of 19 proteins of different folding motifs and sizes, also using a Gaussian potential function for energetic frustrations [42]. Taken together, these researches highlighted the overall effects of frustrations on protein folding, including the formation of on-pathway intermediates, acceleration of initial folding, etc. However, the details of how frustrations interact with the native-contact network at residue level and of how they affect protein folding still remain unclear. Thus a description of frustrations at residue level is still desired for fully understanding of the mechanisms of protein folding.
On the other hand, comparative studies of protein folding for homologous proteins have recently become popular both in experiments al and theories [43]. The studies involve a majority of protein folds, including all-a, all-b, a/b and a+b structures, and many popular theoretical tools are used including the framework of energy landscape theory, TSE analysis and Q-value analysis, among others. Homologous studies can highlight common folding mechanisms for relevant proteins. For examples, four immunoglobulin-like (Ig-like) protein domains was found to fold with same mechanism through similar pathways was found, while homologous proteins G and L, the spectrin repeat domains R16 and R17, fold with the same mechanism through different pathways. Besides, comparative studies can also give valuable insights into protein folding of different protein folds. For example, Cho et al.'s simulations suggested that all-a proteins vary their folding pathways from one family member to another whilst all-beta proteins are likely to have similar folding pathways, which is consistent with experimental observations [44].
Inspired with these results and in order to probe the effects of energetic frustrations at residue level here we studied the transition state ensembles of five homologous Im9 proteins, using a frustrated coarse-grained Gō-like model. Im9 proteins were selected from the same domain entry in SCOP [45], so that the selected proteins have both similar structures and sequence. Indeed, the selected structures share the same three dimensional topology and the mutual root-mean-squared derivations (RMSD) of their C a traces are smaller than 0.5 Å . More importantly, they bear a single hydrophilic-to-hydrophobic residue mutation when compared with the while type protein. Thus in our comparative studies differences in topological frustrations are minimized and the effects of energetic frustrations on protein folding are emphasized. In this work, energetic frustrations due to non-native hydrophobic interactions are introduced to the conventional topology-based Gō-like model, using a Lennard-Jones potential function. A variable temperature approach is also applied to the protein folding simulations. Our results reveal that energetic frustrations have highly heterogeneous influences on the folding of the four helices of the examined structures depending on the local environment of the frustration centers. Also, the closer the introduced frustration is to the center of the native-contact network, the larger the changes in the protein folding. Our findings are consistent with experimental observations and add new insight on energetic frustrations in the framework of protein folding topology determination.

Results and Discussion
Validation of the variable temperature protein folding simulation The variable temperature protein folding simulation method is designed to simplify the protein folding simulation protocol and to enhance the sampling of TSE conformations. To validate its efficiency and accuracy, we compared TSE structures of protein G B1 domain (PDB code 2GB1, 56 amino acids) derived using the variable temperature simulation method and those derived from the conventional constant temperature simulation. The frustrated Gō-like model is used in both cases. Figure 1A) shows the system temperature fluctuats around the averaged collapse T h = 0.226 in a variable temperature simulation. Residue Q-values are calculated using all the trajectory snapshots, compared with the conventional constant temperature simulation where only productive trajectory is used for Q-value analysis. Figure 1B and 1C) compare the two sets of Q-values derived from the two temperature methods using the conventional Gō-like model ( Figure 1B) and frustrated Gō-like model ( Figure 1C), respectively. In either case, we find the two sets of Q-values are highly correlated with one another, with a Pearson correlation of 0.99 and a small standard derivation of 0.01. A further comparison based on calculations of all-a Im7 (SCOP domain entry d1ayia_, 86 amino acids) protein folding simulations ( Figure 1D) also shows strong correlation in Q-valus, with a Pearson correlation of 0.99 and a standard derivation of 0.04. These results indicated that the variable temperature simulations give the same protein folding dynamics as do the conventional constant temperature simulations. In variable temperature simulations, the time-consuming determinations of transition temperature T h is avoid and which otherwise requires a series of simulations at a spectrum of temperatures in the conventional folding simulations. These results suggested that variable temperature simulation approach is a safety and efficient replacement of the conventional constant temperature simulation, especially for TSE sampling and relevant protein folding studies. This method shares some feature with that of the conventional replica-exchange molecular dynamics method [46] in that it includes multiple temperature transition during the trajectory producing. The following calculations on Im9 proteins are based on the variable temperature protein folding simulations.
Energetic frustrations facilitate the formation of nativecontacts and increase the transition state barrier in the protein folding of Im9 domains The apparent free energy is proportional to the negative log of the distribution probability (2lnP(Q)) of TSE structure conformations where the Q value is defined by the number of nativecontacts formed in a conformation normalized by the total number of native-contacts found in the native state configuration. Instead of a fixed temperature as in the conventional constant temperature simulations, the averaged transition temperature can be used to determine the temperature factor k B T for the free energy. Figure 2(A-E) shows changes of the apparent ''free energy'' landscape due to the introducing of energetic frustrations.
Two trends in free energy landscape change are obtained from the comparative studies of the examined Im9 domain structures: the overall free energy landscape shift to the high Q-value end (i.e. the native states) and an increase in the height of the free energy barrier that separate the folded and the unfolded states. In some sense, these two trends seem to have contradictory effects on protein folding, but they are actually in consistent with one another: remote non-native hydrophobic interactions as the introduced energetic frustrations help stabilize the peptide in some compact cluster state that in turn facilitate formation of more native-contacts, leading to a shift of free energy landscape to the high-Q end; however, these frustrations also bring extra barrier of hydrophobic residue contact that must be broken before new native-contacts recovered so as to reach the fully folded state, resulting in a lift of transition state barrier. In this sense, what is more interesting is to check changes of detailed residue or secondary structure contacts in TSE as the introducing of energetic frustrations.

Energetic frustrations have heterogeneous effects on protein folding of Im9 domains
In this part we examine the impact of energy frustrations on the TSE conformation distributions at residue level by Q-value comparison. We did this by comparing the residue Q-values derived from the conventional Gō-like model simulations and those from the frustrated Gō-like model simulations. We noticed that the 5 Im9 domain structures share almost the same 3D configurations (see Table 1), thus they share the same, if any, topological frustrations and the changes in Q-value comparison can be ascribed to the difference in the introduced energetic frustrations. At the same time, the single hydrophilic-to-hydrophobic mutation (called, for simplicity, the introduced energetic frustration center) between the selected Im9 domains provided a unique opportunity to examine the topological location dependence of energetic frustrations. When mapping frustration centers to the native-contact network and measuring energetic frustration effects on protein folding, we can get insights on how energetic frustrations closely interact with the native-contact network to reshape the protein folding process. Figure 3(A-E) and Table 2 compared Q-values derived from the conventional Gō-like model with those from the frustrated Gō-like model for the 5 Im9 domains. As mentioned above the difference between the two sets of Q-values can be ascribed to the introduction of energetic frustrations to the native-contact networks of the proteins. One of the striking features of energetic frustrations is their highly irregular distribution as revealed by the changes of residual Q-values. Q-value perturbations are ignorable for the Im9 (also see Table 2), suggesting that energetic frustrations have ignorable effect on Im9 folding dynamics. This highlights the importance of the single hydrophilic-to-hydrophobic mutations as the resource of energetic frustrations that bring differences in the protein folding of the examined Im9 domains. Significant residual Q-values changes are observed in the other 4 mutants, among them D51A has the largest residue Q-value increases. Furthermore, the distribution of high-valued perturbations is far from random (see Figure 4): largest increases are observed for all the helix IV and coil 3 of all the four mutants and relative small changes are detected in helix III; sizable increases in helix I are also observed in H5A mutant. On the whole, large Qvalue increases are detected for residues in helix I, II and IV in all the examined Im9 mutants, indicating that energetic frustrations tend to help longer other than shorter a-helices to state in their folded states. In this sense, the energetic frustration heterogeneity is in part a reflection of the ''random'' one-dimension distribution of secondary structures.

The impact of energetic frustrations on the native contacts between secondary structures
To understand how the non-native hydrophobic interactions imposed frustrations on the protein folding at the secondary structure level, we compared the representative native contact numbers h 0.05 and h 0.10 for all the secondary structure pairs (Table 3, Table 4, Table 5, Table 6, Table 7). In the case of Im9, the introduced energetic frustrations imposed neglectable impact to the native contacts between secondary structure elements, native contact increase is only detectable between helix IV and II, but this perturbation decreases to almost zero at the p = 10% level (Table 3). With the introduction of mutation H5A the significant native-contact increase between Helix IV and Helix I, II, leading to larger Q-values of Helix IV residues. ALA5 locates at the head of coil 1 and it is free to form non-native hydrophobic contacts with coil 5, thus dragging Helix I, II close to Helix IV (Table 4). Tables 3 and 4 together show that non-native hydrophobic interactions have less effect on the coil folding than helix folding. In the case of E41A, increased contacts are found between Helix IV and I, II, which is similar as in the case of H5A but have much stronger intensity (Table 5). We noticed that native-contact increase number is reduced much faster for both Helix I and II than that for Helix IV at higher p = 10% level. The hydrophobic mutation E41A locates at the end of Helix II and exhibits less mobility compared with residues at the head of Coil 1 as in the   (Table 6). This might be due to the critical location of the introduced hydrophobic residue ALA51 -the end of Coil III and beginning of Helix III: at this position it can easily form non-native hydrophobic contact interactions with hydrophobic residues in the 3 surrounding long helices, and the larger mobility of its associated unstructured Coil III and the short helix III facilitate these non-native hydrophobic contact formation, leading to large representative number at higher probability value. The artificial mutation R75A introduces a non-native hydrophobic center at the end of Helix IV, facing Helix III, the C-terminal of Helix II and Coil I. It has similar effects on the secondary structure contacts as in the case of E41A ( Table 7). The difference is that R75A mutation brought stronger contacts between Helix I and IV compared with that in the E41A mutation.

The homologous structures of Im9 domains
The bacterial DNase E colicin immunity proteins had been subjected to intensive experimental and theoretical protein folding studies as representative models of all-a structures [20,23,[47][48][49][50][51][52][53]. They have the same secondary structures elements composed of four a-helices: 3 long helices (namely the helix I, II, IV) and one short helix of 4 to 5 residues (namely the helix III) [54]. To investigate the impact of frustrations on protein folding, homologous structures of Im9 domains were selected based on SCOP classification [45,55] (see Figure 5,6 and Table 1). Four Im9 domains were chosen from SCOP domain in the entry of a.28.2.1/Im9 whose sequences are different only by one or two hydrophilic-to-hydrophobic mutations: d1emva_, d2gyka1, d1fr2a, d1bxia_, and most importantly these domains have very close tertiary structures with their mutual RMSD being less than 0.5 Å . Such a choice is to minimize the difference of topological frustrations thus highlight energetic frustrations in protein folding in the comparative studies. An artificial R75A mutation structure (named d1emvax) based on the Im9 domain d1emva_ was also built manually by removing all the side-chain atoms except C b in residue 75, this structure was designed to study the effects of energetic frustrations near the C-terminal. For simplicity and clarity, we renamed the examined domains by their corresponding mutation names (see Figure 6 and Table 1).

The frustrated Gō -like model: non-native hydrophobic interactions as energetic frustrations
The conventional Gō-like model is solely determined by the native topology of the studied proteins and usually satisfies the principle of minimal frustration. In a coarse-grained Gō-like model the protein conformation is represented by the trace of C a atoms r i f g and a potential energy of the system is defined based on coordinates of C a atoms in their native states. As in the simulations of protein G B1 domain in Refs. [56][57][58][59], the potential energy reads, where the first three terms are covalent interactions between neighboring C a 's, namely the bond, the angle and the dihedral angle interactions, the fourth term is Lennard-Jones potentials for non-covalent interactions between neighboring C a pairs (i, j) that form close contact in the native state (these contacts are called ''native-contact'' and the total number of native-contacts is denoted by ''NC''), and the fifth term is the repulse interactions between remote C a pairs (i, j) that form neither covalent connection nor native-contact (these terms are called the non-native contacts and the total number of non-native contacts is denoted by ''NNC''). The r, h, Q are respectively instantaneous bond lengths, bond angles and dihedral angles, and subscript ''0'' marks the quantities measured in the native-state configurations. A C a pair (i, j) is determined to form a native-contact H ij in the native state if the minimum atom distance between the two residues is less than a cutoff value of 5.5 Å . In a TSE snapshot, a native-contact H ij is said to be kept in its native state if the distance between the ith and jth C a 's satisfies r ij r ij,0 ƒ1:2, otherwise H ij is said to be broken. Parameters for the model read K b~2 00e 0 {2 , K h~4 0e 0 rad {2 , K w~0 :3e 0 , C~4, where e 0 is an arbitrary energy  ij is originally set to be proportional to the knowledge-based effective inter-residue contact energies [60,61] and then normalized so that the averaged e e ij~0 :18e 0 . This type of energy function has been used before, for example, in Ref. [62] for the investigation of the symmetry breaking in protein L and protein G and Energetic frustrations are introduced to above coarse-grained Gō-like model by including non-native contact terms in Eq.1 as following: where s ij~0 :5 if both residue i and j are hydrophobic 0 else ð3Þ and C f = 5.5 Å which is also the maximum distance to determine a native contact in the native structure. Landgevin dynamics simulations were performed using a time step Dt~0:007t and a high friction coefficient b~0:2=t, where t is the time unit t~m=e 0 ð Þ 1=2 . t is determined to be 1.47 ps if using an averaged residue mass m of 119 amu and an averaged distance of 3.8 Å between adjacent C a atoms. As a comparison, Chan etc. used a Guassian type function to study energetic frustrations in protein folding of the Fyn SH3 domain [25].

A variable temperature protein folding simulation
In the conventional protein folding/unfolding simulations, the collapse temperature T h of the system needs to be determined prior to productive simulations being performed and data collected. In practice, this is done by detecting the maxima of the specific heat as a function of system temperature, which usually requires multiple long equilibrium simulations of the system at different trial temperatures. After T h is found, product simulations will be carried out at T h so that the system can efficiently change between folding and unfolding states, giving sufficient sampling of TSE structures [59,63]. However, this procedure sometimes can be time-consuming and inefficient since it needs a repetition of long equilibrium simulations in searching T h . Besides, accurate determination of T h can be very difficult by itself, for example, a slight change in the initial parameters, such as using a different random seed, slightly change the temperature, might lead to large Table 4. Representative native contact number h 0:05 =h 0:10 between secondary structures of H5A.   Frustrations through Native-Contact Network PLOS ONE | www.plosone.org fluctuation in the calculated specific heat. On the other hand, one key feature of Gō-like models is that they usually exhibit two-state first-order-like transition between folding and unfolding states, and at the transition temperature T h a sharp separation in distributions of non-native states from those of native states is likely to be observed in the simulations [22,40]. Based on these observations, we designed a variable temperature simulation method. In this method, specific heat is not determined anymore for searching T h , instead simulations are carried out with its temperature continuously adjusted so that the system could keep an equal opportunity to stay in either native and nonnative states. The detailed procedure is listed as following. First, the simulation starts at some guessed transition temperature T and the system is left to run Langevin dynamics simulation with a certain number of steps N T . Then, the snapshots in the upto-now trajectory are collected and analyzed based on a statistics of the snapshot native-contact number and a distribution probability density is determined using a histogram method with its bin width equal to 1. Usually, from the histogram two peaks will be determined from the probability density function, with one peak corresponding to non-native states (fewer native-contacts) and the other one to the native states (more native-contacts). The simulation temperature T is then updated by a small value DT according to the difference between the two peaks: T~T{DT if the nonnative peak is higher than that of the native states, otherwise T~TzDT. (If the two peaks have the same height, then T is updated randomly). In this study a typical simulation usually included 3610 9 steps which equal to a simulation time of 3 ms, DT~0:002 and N T~2 |10 7 indicating 150 times of temperature change in a single simulation.
Q-value analysis of the transition state ensemble Q-value analysis had been widely used for characterizing the local structure conservation in transition states either by using point-mutation experimental measurements [64] or by numerical determination in protein folding simulations [65]. The transition state ensemble or TSE is usually defined as the structures sampled around the free energy barrier between the folded and unfolded states, which in turn is interpreted by those structures located in the center valley between the two probability density peaks centered respectively at folded and unfolded states [6,38,64,66].
Here Q-value is defined by Refs. [30,50] as following, where N i is the number of native-contacts involving ith residue, the denominator is the number of native-contacts concerning ith residue in the native state and the numerator is the averaged number of native-contacts involving ith residue sampled with TSE conformations. w i~1 indicates that ith residue forms the same  To explain the Q-value changes and their relationship with energetic frustrations, we introduced a quantity, h p , to characterize changes of TSE-snapshot native-contacts caused by non-native hydrophobic interactions. To do this, we first sort out TSE snapshot conformations and collect those native-contacts H ij that show increasing probability to stay in their native state after turning on non-native hydrophobic interactions as energetic frustrations. The native-state probability for a native-contact H ij is defined by the ratio of the number of snapshots where H ij is in its native state to that of the total snapshots in the examined TSE. Specifically, we defined a subset of native-contacts as H p by requiring the increased probability no less than p. We then defined a number h p , called the representative native contact number, as the following which is the size of H p . Unlike w i that focuses on changes of single residue, h p has more to do with residue pair defined in nativecontacts. If we restrict the examined residues to two given secondary structural elements, then h p characterizes changes of the interactions between the two elements. Generally speaking, h p decreases very fast as the threshold probability p increases, however the detailed patterns of h p decay differ from one secondary structure partner to another, depending on the detailed local environment concerning involved secondary structural partners.

Conclusions
In this paper, variable temperature simulation studies were performed to compare the protein folding mechanisms of five homologous four a-helix Im9 domains, using a frustrate coarsegrained Gō-like model. The examined Im9 domains share the same structure topology and have single hydrophilic-to-hydrophobic mutations among them. Energetic frustrations were introduced to the systems through the non-native hydrophobic interactions using a Lennard-Jones potential energy function. The effects of energetic frustrations on protein folding were examined at residual level, based on Q-value analyses of the TSE structure conformations. We found that energetic frustrations have highly heterogeneous effects on protein folding of the examined Im9 domains depending on the local environments of the mutation amino acids. We also noticed that a strong correlation between the introduced frustration centers and the topology of the native-contact networks exists: the more a frustration center overlaps the center of the native-contact network the larger it may cause changes in the protein folding.
Taken together, our results suggest that energetic frustrations do their works with the help of the protein native-contact network itself, exhibiting a close relation between energetic frustrations and the protein topology. Our results support the protein folding topology determination in context of energetic frustrations, however it is an alternative way to emphasize importance of the native-contact in protein folding compared with other studies [35,67]. This is not so surprising at least for tightly packed single domain proteins whose TSE structures are usually determined by protein native topology as shown in this work on a-helix domains. However, considering the extended shape of all-beta proteins it might be in different situations for energetic frustrations to affect the folding of beta proteins. Thus an interesting question arises that deserves future study: with what kind of topology dependence energetic frustrations might involve in the folding of all-beta, say beta-barrel, proteins?