OneG: A Computational Tool for Predicting Cryptic Intermediates in the Unfolding Kinetics of Proteins under Native Conditions

Understanding the relationships between conformations of proteins and their stabilities is one key to address the protein folding paradigm. The free energy change (ΔG) of unfolding reactions of proteins is measured by traditional denaturation methods and native hydrogen-deuterium (H/D) exchange methods. However, the free energy of unfolding (ΔGU) and the free energy of exchange (ΔGHX) of proteins are not in good agreement, though the experimental conditions of both methods are well matching to each other. The anomaly is due to any one or combinations of the following reasons: (i) effects of cis-trans proline isomerisation under equilibrium unfolding reactions of proteins (ii) inappropriateness in accounting the baselines of melting curves (iii) presence of cryptic intermediates, which may elude the melting curve analysis and (iv) existence of higher energy metastable states in the H/D exchange reactions of proteins. Herein, we have developed a novel computational tool, OneG, which accounts the discrepancy between ΔGU and ΔGHX of proteins by systematically accounting all the four factors mentioned above. The program is fully automated and requires four inputs: three-dimensional structures of proteins, ΔGU, ΔGU * and residue-specific ΔGHX determined under EX2-exchange conditions in the absence of denaturants. The robustness of the program has been validated using experimental data available for proteins such as cytochrome c and apocytochrome b562 and the data analyses revealed that cryptic intermediates of the proteins detected by the experimental methods and the cryptic intermediates predicted by the OneG for those proteins were in good agreement. Furthermore, using OneG, we have shown possible existence of cryptic intermediates and metastable states in the unfolding pathways of cardiotoxin III and cobrotoxin, respectively, which are homologous proteins. The unique application of the program to map the unfolding pathways of proteins under native conditions have been brought into fore and the program is publicly available at http://sblab.sastra.edu/oneg.html


Introduction
Each protein adopts a specific well-defined three-dimensional (3D) structure, which is important for its biological activities. The relationships between the conformations of such proteins and their stabilities have intrigued researchers for many decades [1]. The conformational stabilities indicate the free energy differences between the folded (N) and the unfolded conformations (U) of proteins. In general, the stabilization free energies of protein molecules have been determined from the studies of protein unfolding caused by denaturants and temperature [2]. Under a reversible and two-state unfolding process, the population of 'N' and 'U' of a protein could be precisely estimated by using optical techniques such as fluorescence spectrometry and circular dichroism. The free energy of unfolding (DG U ) of proteins is calculated by fitting their unfolded population (U) plotted with respect to denaturant concentration or temperature, to an appropriate two-state model equation [3]. The classical melting analyses provide clues on understanding the mechanism of unfolding (two-state/multi-state processes) and the 3D structural architectures (domains organization) of proteins [4]. Hence, estimation of an accurate DG U (free energy of unfolding) for proteins at ambient conditions is indispensable to unambiguously address the thermodynamic and kinetic events of the proteins.
Residue-specific free energy changes of proteins have been determined under native conditions by using hydrogen-deuterium (H/D) exchange method in conjunction with NMR technique [5]. In a typical H/D exchange experiment, when a protein is dissolved in deuterium oxide (D 2 O), backbone amide protons (NHs) of the protein begin to exchange with deuterium. The H/D exchanges of the NHs can be interpreted with the two-state model proposed by Hvidt [6]. In the model, Closed (NH) and Open (NH) represent folded and unfolded conformations of proteins, respectively.
The rate constants k op and k cl are for the unfolding and the refolding reactions, respectively. Exchange takes place only from the unfolded state with the rate constant of k rc , which can be predicted using the method reported by Bai et al. [7]. Under EX2 exchange conditions, where k cl is many-folds greater than k rc , k ex (exchange rate constants of amide protons) is defined as: where K HX is the residue-specific equilibrium constant for NHs in proteins. The residue-specific free energy is then determined using the following relationship.
where R is the gas constant and T is the absolute temperature. The free energy of exchange of protein is averaged out to four largest residue-specific DG HX of the protein [8,9]. When the DG HX (estimated from H/D exchange method) and the DG U (determined from melting curves) are true representations of global unfolding events of the proteins, both the parameters of the proteins must be in good agreement under similar experimental conditions. Contrary to the expectation, proteins show strong discrepancies between their DG HX and DG U values. If the discrepancy is smaller, it can be well accounted to the effect of cis-trans proline isomerisation and/or to the effect of baselines of melting curves [8,9,10], whereas larger differences between DG HX and DG U are attributed to the possible existence of cryptic intermediates accumulating in the equilibrium unfolding reactions of proteins under native conditions [11,12]. The cryptic intermediates elude the melting curve analyses of proteins as they are short-lived and unstable and hence, the intermediates cause underestimation of the DG U . However, the cryptic intermediates can be qualitatively detected by denaturant-dependent H/D exchange method in conjunction with NMR technique [11,12]. In this method, exchange rates (k ex ) of NHs of proteins are measured at low concentrations of denaturant, which only affect the equilibrium between folded and unfolded conformations of proteins. Notwithstanding the advantages of detecting cryptic intermediates and residues constituting each intermediate from such studies, the methods are laborious, expensive, time consuming and also requires sound experimental knowledge. Proteins that are not accumulating cryptic intermediates in their unfolding pathways may also depict discrepancies between the DG HX and the DG U and the discrepancy may probably due to the existence of metastable states causing overestimation of the DG HX . Detecting exact metastable states of proteins is a challenging experimental task of protein folding [13]. In this context, computational methods will be an excellent alternative to address the possible existence of cryptic intermediates/metastable states in the unfolding events of proteins under native conditions. To date, there were no unique programs to address the above mentioned discrepancies in the stabilities of proteins, to our best knowledge. However, it should be mentioned that there are several programs to predict exchange rates of amide protons of proteins and also to predict the folding/unfolding rates of proteins from their amino acid sequences. For instance, programs such as SPHERE [14] and CamP [15] are predicting the k rc and the protection factors of NHs in proteins from their amino acid sequences, respectively. Dovidchenko et al. [16] have recently described a method on prediction of amino acid residues protected from H/D exchange in a protein chain. Wolynes et al. used a statistical approach to figure-out the energetic of protein conformations and relative foldability for contagious segments present in proteins [17,18]. RAFT (rapid autonomous fragment test) program predicts autonomous folding unit based on the analysis of inter-residue contacts of structural segments present in the native structure of proteins [19]. COREX/BEST, which is an interesting program developed by VJ Hilser [20], defines native state ensembles and also maps rigidities and flexibilities of various regions of proteins.
In the present study, we have herein developed a computational program, OneG, which predicts possible existence of cryptic intermediates/metastable states of proteins from their 3D structures, DG U , DG U * and residue-specific DG HX determined under native conditions. The OneG employs 'contact order matrix' strategies for all amide protons (NHs) that are hydrogen bonded in regular secondary structural elements of proteins, to achieve the task. The robustness of the program has been validated by predicting cryptic intermediates of proteins such as cytochrome c and apocytochrome b 562 for which experimentally characterized cryptic intermediates have been well documented in the literature. It is important to point-out that the program does not imply/support for the absence of cryptic intermediates/ metastable states in the native unfolding of proteins for which the DG U and DG HX are in good agreement (this aspect is beyond the scope of the article). Similarly, there is no straightforward correlation between the accumulation of cryptic intermediates of proteins under native conditions and the free energy discrepancies (DG U vs. DG HX ) of the proteins. In this background, the applications of the program have also been dealt in detail on understanding the unfolding events of two structurally similar proteins (cardiotoxin III (2CRT) and cobrotoxin (1COD)), under native equilibrium conditions.

Results and Discussion
Estimations of k rc and DG HX for amide protons (NHs) in proteins The intrinsic exchange rate constants, k rc , for NHs of a protein can be estimated under defined experimental conditions (pH, temperature and ionic strength) on the basis of model compound studies [7,21] and using the equation-4: where k a , k b and k w are rate constants of acid, base and water catalyzed exchange reactions, respectively; R a , R b and R w are the effect of residues that are on the left and the right sides of amide protons under considerations at acidic, basic and neutral conditions, respectively; pK D is the molar ionization constant of D 2 O; pD is the pH-meter reading corrected to deuterium effect. The effect of temperature on the intrinsic exchange rates of the NHs is calculated using the following equation.
where k(T) and k(293) are rate constants at desired temperature and 293 K, respectively; E is the activation energy and its value for acidic, basic and neutral exchange reactions are 14 kcal/mol/K, 17 kcal/mol/K and 19 kcal/mol/K, respectively; T is the absolute temperature in Kelvin and R is the gas constant. Solving the equations-4 & -5 yields the equation-6 (mathematical derivations not shown). where, The OneG calculates the values of k rc for NHs in protein using its PDB file and the equation-6, at defined pH, temperature and ionic strength. Figure 1 outlines the four stages for successfully completing a test run of OneG and the essential steps of stage-I for calculating k rc of NHs are depicted in the Figure S1. Using atomic coordinates of proteins, the program determines disulfide bridges, cysteine residues and geometrical confirmations of Xaa-Pro peptide bonds in proteins and accounts them on calculating the k rc for NHs in the proteins. Any two cysteins in a protein are considered as cystine, when the distance between the two sulphur atoms of the cysteine residues is within 2.3 Å [22] and this particular function of the program has been validated by predicting the cysteine and cystine residues in cardiotoxin III (PDB ID: 2CRT) and cytochrome C (PDB ID: 1HRC, Table 1). The reliability of the program on predicting cis-trans proline conformations in proteins is discussed in the next heading. The k rc values of NHs in ubiquitin (PDB ID: 1UBQ) and cardiotoxin III have been calculated using the OneG and the data have been compared with the k rc values of NHs in the proteins as determined by manual calculation (Figure 2). The data in Figure 2A and 2B were fitted to a simple linear equation and the fitted parameters such as slope and positive correlation coefficient were found to be 0.999 and 1.0, respectively, for both proteins. These observations unambiguously demonstrate the reliability of the OneG on calculating the k rc of NHs in proteins. The program, then, calculates residue-specific free energies, protection factors and DG HX of proteins with the only requirement of having NMRderived k ex of NHs in the proteins (estimated under EX2 conditions in the absence of denaturants). Instead of providing residue-specific k ex of NHs, residue-specific DG HX (second input) can also be directly given to the program. Upon given the third input of DG U (determined from the melting curve analyses of proteins), the program carries forward all the parameters derived in this stage to its second stage for accounting the consistency between the DG HX , exchange free energy of the proteins and DG U , unfolding free energy of the proteins.
cis-trans isomerisation effect of proline residues on the DG HX of proteins All standard amino acids, except proline, are connected to one another through amide linkages in proteins, whereas proline is linked to the preceding amino acid through an imide bond. The amide bonds are exclusively in trans conformations in folded proteins [24]. Contrary to this, the imide bond favours cis or trans conformations much more equally as the free energy differences between these two conformers are insignificant in proteins [25]. Similarly, amide bonds prefer negligible percentage of (about 0.03%) cis-conformations in the unfolded states, whereas imide bond (Xaa-Pro) prefers remarkable percentage of cis-conformations in the unfolded states and the percentage varies (6-38%) depending on the chemical properties of the residue (Xaa) preceding proline [24]. When a protein having cis-prolines is unfolded by denaturants, the unfolding process of the protein is probably a three-state reaction under equilibrium conditions as shown below: where N cis is the folded protein with proline residues in cisconformation; U cis and U trans are the unfolded protein with proline residues in cis and trans conformations, respectively. The free energy change (DG U ) of the protein estimated by denaturation method accounts the equilibrium constants of both steps in the reaction. But, H/D exchange method determines equilibrium constant for the first step only, since cis-trans proline isomerisation is a slow process under native conditions. Due to which, the DG U determined by optical methods is usually less than the DG HX determined by H/D exchange method. The DG HX corrected to the effect of the cis-trans proline isomerisation is denoted as DG HX * , which can be readily calculated using methods reported by Huyghues-Despointes et al. [9]. The DG U and the DG HX * of a protein will be in good agreement when the discrepancy between the DG U and DG HX is merely due to cis-trans isomerisation of proline residues present in the protein.
In order to estimate the effect of cis-trans isomerisation of proline residues in proteins, the OneG program uses a bee-line for the calculations as shown in the Figure 1 and detailed steps of the stage-II of the program are shown in the Figure S2. The program first determines number of proline residues and their conformations in a protein using PDB file of the protein itself. The reliability of the program on predicting cis-trans proline conformations in proteins such as ubiquitin, RNase A (PDB ID: 5RSA) and cardiotoxin III is depicted in Table 2. A quick inspection to the table suggests that the OneG program predicts the exact conformations of proline residues in the proteins. The OneG  calculates DG HX * of proteins based on the K Pro values derived from the model compound studies to all twenty types of Xaa-Pro and the values are stored as default parameters in the program. Table 3 lists DG HX and DG U reported in the literature for 16 different proteins along with their DG HX * calculated by the OneG program. The discrepancies between the DG HX and DG U of the proteins that are listed under 'Group I' could be well accounted by proline isomerisation effect alone, since the DG HX * and DG U of those proteins are in good agreement within the tolerance level of 0.4 kcal/mol [8,9]. Contrary to these observations, the discrepancies between DG HX and DG U of the proteins listed under 'Group II' could not be addressed by proline isomerisation effect alone. This finding implies that the discrepancies may probably stem from different origins that need to be identified and accounted not only for determining the exact DG (change in free energy) of the proteins and also to understand the correlations between the kinetic and thermodynamic unfolding events of the proteins. The discrepancy may also be originated due to the default consideration of K pro values from the model compounds in the program. It is possible that the K Pro estimated based on the model compounds in a set of particular experimental conditions may not be a true representation to Xaa-Pro of a protein in a totally different solution conditions. For instance, Tyr(92)-Pro(93) of RNase A was found to have 33% cis in heat unfolded states of the protein [37], which largely differs from predicted percentage (24%) of cis conformations for the imide bond. However, values of K pro estimated from the model compounds account reasonably the cis-trans isomerisation of Xaa-Pro peptide bonds in the unfolded states of most proteins [9,24,25]. Thus, the OneG program calculates DG HX * of proteins using the K Pro obtained from the model compounds, by default. However, the program provides an option to use K Pro determined from the studies on proteins for calculating the DG HX * . The program then compares the DG HX * of proteins with their DG U and carries forward the values of DG HX * , DG HX and DG U to the next stage of the program for further calculations.
Baselines effect of melting curves on the estimation of the DG U Denaturant-induced unfolding of proteins under equilibrium conditions is generally monitored by optical probes for estimating the DG U of the proteins [38,39]. In a typical all-or-none unfolding experiment, the observed signals representing the ratio of folded and unfolded states of proteins are plotted with respect to denaturant concentrations. The data are then fitted to the nonlinear least squares equation-11 proposed by Santoro and Bolen for estimating the DG U of proteins [40].
where, S x is the observed signals at various concentrations of denaturant, C is the concentration of denaturant in molarities, DG U is the free energy of unfolding in the absence of denaturant, m is slope of a plot depicting DG U versus concentrations of denaturant, S n and S u are the signals of the folded and the unfolded states of proteins in the absence of denaturant, respectively. The S n and S u are further defined to be linear with respect to denaturant concentrations as shown in equations-12 and -13, respectively.
wherein, Y n and Y u are intercepts; m n and m u are slopes; the subscripts 'n' and 'u' denote the folded and the unfolded conformations, respectively. The equation-11 treats the pre-and post-baselines of the melting curve to be linear with respect to denaturant concentrations. The equation will underestimate or overestimate the DG U of proteins, when the baselines of the  melting curves of proteins deviate from their linear predictions. It has been shown that when the pre-and post-transitions baselines of heat-induced denaturation of lysozyme were treated by nonlinear equations, the change in enthalpy (DH) of the protein estimated by optical and calorimetric methods were in good agreement [10].
We emphasize the effect of baselines of melting curves on the estimations of the DG U of ubiquitin from its GdnHCl-induced denaturation profile, herein. Figure 3 shows the equilibrium denaturation curve obtained for ubiquitin dissolved in D 2 O at pH 7.0, 288 K, using GdnHCl as the denaturant. The data in Figure 3A have been fitted to the equation-11 using Kaleidagraph software (Synergy Software) and the fitted parameters were as follows: values of DG U , m and C m were 7.5 kcal/mol, 3.87 kcal/ mol/M and 1.94 M, respectively; C m is the concentration of denaturant at which the protein is half-unfolded. Inspection to the Figure 3A indicated that the changes in the ellipticity at 222 nm of the folded protein were in opposite direction to the unfolding transitions, at low concentrations of denaturant (from 0 to 2 M). It suggested that the action of the denaturant on the protein molecules was not in a linear fashion. In these circumstances, fit of the data to the equation-11 would definitely underestimate the DG U of the protein molecules. Hence, the pre-transition baseline of the data may presumably be treated by an exponential function shown in the following equation.
where, 's' is the concentration of denaturant at which changes in optical signals of the folded molecules begin to assume a linear fashion with respect to denaturant concentrations; 'c' is the concentration of denaturant; 'I c ' is an asymptote; 'I o ' is amplitude, which is further defined by the following equation: By substituting the equations-14 &-15 in the equation-11, the following equation is derived.
The fit of the data in the Figure 3B to the equation-16 yielded the following fitted parameters: values of DG U , m and C m were 8.6 kcal/mol, 3.91 kcal/mol/M and 2.20 M, respectively. The pre-transition baselines of the data in Figure 3A and Figure 3B have been extrapolated up to 7 M of denaturant concentrations using the fitted parameters obtained by treating the data to the equations-11&-16, respectively. When the data were treated by the equation-11, the ellipticity at 222 nm representing folded proteins was linearly changing with respect to denaturant concentrations and consequently the population of unfolded species in the melting region of the curve was overestimated ( Figure 3A). In contrary, when the data were treated by the equation-16, the ellipticity representing folded proteins was constant especially in the melting region ( Figure 3B Figure S3, which outlines the detailed steps of the stage-III of the program). The equations can be used to fit the melting data for determining the free energy of unfolding of proteins as explained above. The resultant DG U of the proteins is denoted as DG U * in the present study. The OneG, then, compares the DG U * with the DG HX * of the proteins. The two values must be in good agreement for proteins for which the equilibrium unfolding pathways are all-or-none (typical two-state) process.

Predicting cryptic intermediates/metastable states in the unfolding kinetics of proteins under native conditions
If there is a remarkable difference between the DG HX * and DG U * of a protein, the origins of the discrepancies may probably stem from many facets. The following factors must be seriously considered to avoid any spurious data from the experiments: (i) experimental conditions such as solvents (H 2 O/D 2 O) and buffer solution must be identical in both melting analysis and H/D exchange methods [9,38,39] (ii) the melting curve must be adequately defined with sufficient data points [38,42] (iii) denaturant concentrations must be accurately determined using refractive index method [39,43] (iv) H/D exchange method must be performed under pure EX2-exchage mechanism [9]. While the experimental conditions of both optical and H/D exchange methods are matching well to each other, the discrepancies arising between the DG U and the DG HX of proteins must be either due to accumulation of cryptic intermediates or metastable states in the unfolding kinetics of proteins under native conditions. Since cryptic intermediates are weakly stabilized, they can easily elude from analysis of melting curves of proteins, which in turn causes underestimation of DG U . On the other hand, DG HX would be overestimated, when the H/D exchange reactions of most slowly exchanging NHs are happening through the metastable states of proteins. The metastable states are, in general, heterogeneous denatured-like ensembles of proteins, which are higher in energy than that of denatured states of the proteins. In order to detect cryptic intermediates, denaturant-dependent exchange rates of NHs of proteins need to be estimated under native conditions, using NMR (nuclear magnetic resonance) spectroscopy. Residuespecific folding (k f /k cl ) and unfolding rates (k u /k op ) of NHs of proteins should be determined using H/D exchange methods in conjunction with NMR and mass spectrometry (MS) techniques in order to explore the energetic ensembles of metastable states of proteins. Using the experimental strategies, cryptic intermediates that are accumulating in the unfolding kinetics of proteins such as cytochrome c [44], apocytochrome b 562 [35], RNase H [45] have been reported in the literature. Similarly, existence of heterogeneous mixture of denatured-like conformations of OMTKY3 has been shown at residue level resolutions from the comprehensive analysis of H/D exchange data of the protein, derived from NMR and MS techniques [46]. Though these experiments can be used to detect and structurally characterize the cryptic intermediates and metastable states of proteins, the methods are expensive, laborious and prerequisite sound knowledge in protein chemistry.
In the fourth stage, the OneG scans proteins to predict either for cryptic intermediates or metastable states that may exist in unfolding kinetics of the proteins (Figure 1 and Figure S4). On the basis of the 3D structures of proteins, DG U , DG U * and residuespecific DG HX , the program executes its predictions using 'contact order matrix' strategies, which have been elaborately discussed in 'Design and implementation' section. The program accounts the DG HX of all NHs that are participating in the regular secondary structural segments of proteins. This is based on the fact that the NHs that are involving in the formation of H-bonds either at surface areas or loop regions of proteins undergo H/D exchanges through local structural fluctuations [45,47,48]. In outline, the program divides the NHs into a few numbers of groups based on distance constraints of 7 Å and contact order matrix (see 'Design and implementation'). Briefly, NH of any residue in a group will be in contact with NH of, at least, any one residue in the same group within 7 Å and residues in a group will be away from any residues of another group, at least with distance of 7 Å . In other words, each group is distinct from other groups in terms of distance constraints and structural contexts. Hence, the program   Figure 4A and Figure 4B, respectively. The backbone structures of the protein and residues representing each intermediate are shown in ribbon and stick models, respectively. Figure 4A shows cryptic intermediates, proposed on the basis of experimental methods, in blue, green, yellow and red colours. The residues (for which exchange kinetics were observed by experiments) representing each intermediate are shown in sticks. Figure 4B shows residues constituting three distinct intermediates as predicted by OneG program, in blue, magenta and yellow colours. doi:10.1371/journal.pone.0032465.g004 predicts that each distinct group is the representation of possible cryptic intermediate of proteins. Furthermore, the free energy coverage of each and every group is compared with other groups and groups that are having same free energy coverage are denoted as cooperative units. When the program fails to identify distinct cryptic intermediates (or more than one group) on the basis of contact order matrix, the program, by default, begins searching for possible existence of metastable states of proteins, which may lead to the overestimation of DG HX . The concept of higher energy denatured states has already been introduced in the discussion of slow cis-trans proline isomerisation in the denatured states [9,24,25]. Extending this argument, any relaxation of the denatured protein that occurs more slowly than refolding should give rise to higher energy metastable denatured state in the exchange experiment. In other words, the lifetime of the denatured state sampled in exchange experiments may be less than the time required for relaxation of denatured protein to its ground state. Perhaps this relaxation process involves diffusion of a relatively compact set of conformers, crossing from the transition state, to the broader distribution of conformers that are characteristic of the denatured ground state. In order to predict the metastable states of proteins, the program cluster all residues for which DG HX .DG X (refer 'Design and implementation) and the residues may constitute either a continuum or distinct groups of higher energy metastable denatured states of the protein [46,49]. As discussed above for the cryptic intermediates, the residues constituting metastable states are grouped and cooperative unfolding among the groups are analysed on the basis of 'contact order matrix' and 'free energy coverage', respectively. The robustness of the program on predicting the cryptic intermediates/metastable states has been validated using proteins such as cytochrome C, apocytochrome b 562 , cardiotoxin III and cobrotoxin. The agreement between the predicted data and the experimental data of the proteins is depicted in Table 4 and the comparative analyses of each protein are discussed below in detail.

Cytochrome C
Cytochrome c is a simple helical protein consisting of 104 residues. The DG U and DG HX of the protein were reported as 10.0 kcal/mol and 13.0 kcal/mol, respectively [10]. The discrepancy between the two values was 2.7 kcal/mol, after accounting the effect of cis-trans proline isomerisation ( Table 3). The discrepancy has been attributed to the existence of four distinct cryptic intermediates that are populating in the unfolding kinetics of the protein as determined by denaturant-dependent H/D exchange in conjunction with NMR techniques [44,50]. In the original article of the work, the four cryptic intermediates are denoted by four colour codes for sake of clarity: blue consisted of residues from N-and C-termini helices; green consisted of residues from 60's helix and region spanning from 20-35; yellow consisted of residues from the region spanning 36-61; red consisted of residues from the region spanning 70-85 (Table 4). Strikingly, OneG predicted three distinct cryptic intermediates of cytochrome c. First group consisted of residues such as K7, I9, F10, V11, Q12, K13, C14, A15, H18, R91, E92, D93, L94, A96, K99, A101 and T102 from the N-and C-termini helices of the protein and free energy coverage of the group was 4.6-11.2 kcal/mol. This group showed perfect resemblance to the blue cryptic intermediate of the protein detected by the experimental methods. Second group predicted by the OneG consisted of residues (L64, M65, Y67, E69, N70, K73, Y74, I75 and I85) from 60's helix and 70's loop regions of the protein and free energy coverage of the group was 4.5-8.7 kcal/mol. The second group represented the green and red cryptic intermediates together. The program was unable to discriminate the green from red, as the residues from the two regions were within 7 Å , the cut-off distance constraint used in the program. Third group predicted by the program consisted of three residues (N52, K53 and N54), which exactly resembled the yellow cryptic intermediate of the protein. The free energy coverage of the group was 4.5-4.9 kcal/mol. Figure 5 depicts structural contexts of the three cryptic intermediates (Blue, Magenta and Yellow) predicted by the OneG and the four cryptic intermediates (Blue, Green, Red and Yellow) characterized by experimental methods, on the 3D structures of the protein.

Apocytochrome b 562
Apocytochrome b 562 is a monomeric, four helix bundle protein consisting of 106 residues. The DG U and the DG HX of the protein were reported as 3.3 kcal/mol and 5.5 kcal/mol, respectively [35]. The discrepancy of 2.2 kcal/mol observed between the DG U and the DG HX of the protein was merely due to the existence of cryptic intermediates of the protein under native conditions [35,50]. Fuentes and Wand have demonstrated the existence of three distinct cryptic intermediates of the protein and also characterized at structural levels: the first cryptic intermediate consisted of residues from the two central helices of the protein; the second and third cryptic intermediates were comprised of residues from the Cterminal helix and N-terminal helix of the protein, respectively (Table 4). Interestingly, OneG predicts three distinct cryptic intermediates of the protein. The first and third intermediates predicted by the program were composed of residues such as V26, K27, D28, A29, L30, K32, R34, L38, D39, A40, Q41, K42 and A43 from central helices and residues such as E8, L10, N11, N13, L14, K15, V16, I17 and E18 from N-terminal helix, respectively. The predicted first and third intermediates were well resembled with the first and the third intermediates characterized by experimental methods, respectively (Table 4 & Figure 5). However, the second cryptic intermediate predicted by OneG was constituted by residues (I67, L68, V69, G70, Q71, I72, A75, L76, A79, N80, E81, V84) from third helix, which is part of central helical segments and also residues (A87, Q88, A89, A90, A91 and Q93) from C-terminal helix of the protein. This observation is contrary to the second cryptic intermediate characterized by experimental method for the protein, because the intermediate detected by the experiments was constituted by residues from the c-terminal helix only. However, scrutinizing the structural architectures of the protein uncovered that though the third helix is sandwiched by second helix and c-terminal helix, the residues of third helix are much closer to residues from c-terminal helix, vis-àvis their contacts with residues of second helix. Moreover, the cterminal helix is a kink-helix as the region connecting the cterminal helix and third helix of the protein is tightly pulled suggesting the two helices are likely to unfold in a cooperative manner. Since the OneG predicts cryptic intermediates purely on the basis of 3D structures of proteins, the second intermediate of apocytochrome b 562 predicted by the program is very convincing, though the predicted structures of the intermediate were not exactly same to the experimentally characterized structure of the second intermediate. The free energy coverage of the first, second and third intermediates were 2.83-5.23 kcal/mol, 1.48-4.95 kcal/ mol and 2.08-3.42 kcal/mol, respectively.

Cardiotoxin III (CTXIII)
Cardiotoxin III is monomeric, single polypeptide chain consisting of 60 amino acids and an all b-sheet protein with four disulfide bridges [51]. The DG U and DG HX of CTX III have been reported to be 4.2 and 6.6 kcal/mol, respectively [33]. After accounting the effects of cis-trans proline isomerisation of the protein using the OneG, the recalculated free energy of exchange (DG HX *) of CTX III was 6.3 kcal/mol ( Table 3). The DG U and DG U * of the protein were same, as the chemical denaturation data of CTX III was well fitted to equation-11 [33,52]. The discrepancy existing between the DG U and DG HX of the protein has been left unaddressed to date. In order to account the discrepancy, we have herein used the OneG for searching possible existence of any cryptic intermediates of proteins under native conditions, using the 3D structure (2CRT), the DG U and the residue-specific DG HX of the protein. The program predicts two distinct cryptic intermediates of the protein: the first cryptic intermediate was predicted to be situated in the triple-stranded domain of the protein and was constituted by residues such as C21, K23, M24, F25, M26, V27, V32, V34 and K35; the second  (Table 4 and Figure 6). The free energy coverage of the first and second intermediates was 2.43-5.62 kcal/mol and 1.68-4.70 kcal/mol, respectively. Interestingly, the kinetic folding pathways of CTX III have been characterized to proceed through an intermediate accumulating in the burst phase (,5 ms) of the protein [53]. Based on the refolding rate constants of NHs of CTX III obtained from quenched-flow H/D exchange experiments, it has been shown that the triple-stranded b-sheet was formed before the doublestranded b-sheet segment in the refolding kinetics of the protein.
Moreover, it has also been demonstrated that the triple-stranded b-sheet segment of the protein was persistently found in the intermediate states identified along the acid-induced and alcoholinduced unfolding pathways of CTX III [54,55]. To this extent, the predictions of OneG on the possible existence of two cryptic intermediates of CTX III under native conditions are consistent with the data reported from equilibrium and kinetic studies of the protein.

Cobrotoxin (CBTX)
Cobrotoxin (CBTX) and CTX III are homologous proteins and they are belonging to the three-finger toxin family of elapidae snake venoms [56]. The two proteins share high degree of similarities in primary, secondary and tertiary structures to each other [57]. The DG U and DG HX of CBTX have been reported to be 2.3 and 3.9 kcal/mol, respectively [33,58]. As the effect of cis-trans proline isomersation of the protein accounted only 0.1 kcal/mol and the DG U * of the protein was same as DG U [33], the resultant discrepancy of 1.5 kcal/mol was observed between the DG U and DG HX of the protein and the discrepancy have not yet been addressed, to date. In order to reconcile the discrepancy, OneG was employed as explained in the above sections and the program predicted single cluster consisting of residues from various secondary structural elements of the protein. It implied that there were no possible cryptic intermediates populating in the unfolding kinetics of the protein under native conditions. As the result, the program, by default, attempted to trace for possible existence of metastable states of the protein, with the tolerance limit of 2.4 kcal/mol (DG X of CBTX, refer design and implementation). Strikingly, the program predicted a metastable state of the protein, consisting of residues such as K26, K27, W29, R28, E38, N53 and C55, which were dispersed in the strands 3, 4 & 5 of the protein (Table 4 and Figure 7). It has been shown that the chemical unfolding and refolding of the protein proceeded by all-or-none process without the accumulation of intermediates [59]. The kinetic refolding pathways of the protein characterized by chevron plot and using hydrogen-deuterium exchange method in conjunction with multidimensional NMR techniques suggested that a broad continuum of kinetic intermediates, but not distinct intermediates, were populated in the refolding pathways of the protein [59,60]. To this extent, the OneG prediction for a metastable state of CBTX is in good agreement to the results observed from the equilibrium and the kinetic studies carried-out on the protein. However, it is worthy to mention that the extent of cooperative disruptions of H-bonds in the metastable states of proteins can be unequivocally confirmed by combined NMR and mass spectrometry analyses of H/D exchange of proteins under EX1 conditions [13,61].

Concluding remarks
We have herein demonstrated a computational tool, OneG, to address the discrepancy that may arise between the DG U and the DG HX of proteins, by systematically accounting the following factors: (i) effect for cis-trans proline isomerisation (ii) effect of baselines of melting curves on the estimation of DG U and (iii)  Figure 5A and Figure 5B, respectively. The intermediates are denoted by blue, green and red colour codes in both cases. The backbone structures of the protein and residues representing each intermediate are shown in ribbon and stick models, respectively. Figure 5A shows residues for which exchange kinetic data were reported in the literature. doi:10.1371/journal.pone.0032465.g005 possible existence of cryptic intermediates/higher energy metastable states in the unfolding kinetics of proteins under native conditions. The program prerequisites four inputs, PDB file of proteins, DG U , residue-specific DG HX and DG U * of the proteins, to successfully complete a test run in a fully automated manner. The robustness of the program has been validated through accounting the discrepancies between DG U and the DG HX of proteins such as cytochrome c and apocytochrome b 562 for which experimental rationalizations to reconcile the discrepancies have already been reported in the literature. To our best knowledge, OneG is a unique tool of this kind for systematically analyzing conformational stabilities of proteins. The program is publicly available at http://sblab.sastra.edu/oneg.html. The applications of the OneG program extend beyond rationalizing the conformational stabilities of proteins. The program reveals the degree of cooperative actions among the predicted cryptic intermediates/ metastable states. This information may be useful to explore the energy landscapes of the proteins. It is worthy to point-out that several methods have been proposed in the literature to predict rates of folding and rates of unfolding of proteins under defined conditions [62,63] and consequently, the DG U of the proteins can be reasonably calculated. In these connections, developing a tool to predict the residue-specific exchange rate constants at defined conditions (such as pH, temperature, denaturants) on the basis of 3D structures of proteins itself, would also be quite interesting in the near future. The success on the task, in turn, will lead to computationally explore the energetic levels of residues that unfold/refold by various mechanisms (global, sub-global and local structural fluctuations) under native conditions of proteins. Foreseeing the potential applications of the OneG in structural biology, we do anticipate a great scope to improve the software tool at many different aspects.

OneG algorithm
OneG algorithm has been implemented using PERL scripting language [64]. The program accepts both amino acid sequences (represented by single letter codes) and PDB (Protein Data Bank) co-ordinates of proteins for predicting k rc values of NHs. In order to calculate the k rc values, OneG considers temperature in Kelvin, pH in pD (pD = pH+0.4), ionic strength in molarity and activation energies in cal/mol [7,21]. The values of k rc and DG HX for the NHs of protein molecules are expressed in minute 21 and kcal/ mol, respectively. When a PDB file is the input, the OneG determines the cis/trans conformation of Xaa-Pro (Xaa is any one of the twenty standard alpha amino acids and Pro denotes proline) peptide bond in the protein using the following relationship: where D is the distance between the C a of Xaa and C a or C d of proline residue in angstrom; U and P stand for Xaa and Pro residues of Xaa-Pro peptide bond, respectively; X, Y and Z are the atomic co-ordinates of an atom considered. The Xaa-Pro peptide bond is considered as trans-conformation, when the distance between C a of Xaa and C a of Pro (C a -C a ) is greater than the  distance between C a of Xaa and C d of Pro (C a -C d ). Similarly, when C a -C d .C a -C a , the Xaa-Pro peptide bond is considered as cis-conformation [65]. In outline, the program has four stages for each complete cycle. In the first stage, the k rc , DG HX and protection factors (provided exchange rate constants, k ex , are given) of NHs in a protein molecule under consideration are calculated. If the solvent used is other than D 2 O, users need to specify the reference rates the solvent. In the second stage, the DG HX * , the recalculated DG HX after accounting for the effect of cis/trans proline isomerisation, is compared with DG U of the protein molecule. The tolerance level of 0.4 kcal/mol is set as default in the OneG program based on the fact that the more prevalent trans form of prolines in proteins contributes about 0.3 kcal/mol to the effect, in general [8,9]. In the third stage, OneG provides options for fitting the pre-or/and post-transition baselines of melting curves of proteins. The DG U * , the recalculated DG U after treating the baselines of melting curve of the protein to an appropriate two-state model equation, is compared with DG HX * of the protein. In the fourth stage, the program predicts the possible existence of cryptic intermediates/metastable states in the unfolding kinetics of protein on the basis of the 3D structure, DG U , DG U * and residue-specific DG HX of the protein. The fourth stage of the program has a few numbers of steps as follows: First, the program detects all NHs that are hydrogen bonded in the given 3D protein structure, using method described by Stickle et al [66]. According to the method, the hydrogen bond (H-bond) distance should be , = 3.28 Å and bond angles at the acceptor atom (N-O = C) and at the donor atom (O-N-C a ) should lie between 90u-180u [66]. Second, of the hydrogen bonded NHs, NHs that are located in the regular secondary structural segments of proteins are segregated on the basis of Hbond patterns (a stretch of i to i+3 or i to i+4 H-bonds for helical conformations and a stretch of i+n to j+n H-bonds for sheets, wherein n is 0,2,4,6 and so on.) and torsion angles (F, y angles for a-helices, 3 10 helices, parallel b-sheets and anti-parallel b-sheets are (257630, 247630), (260630, 230630), (2119630, 2113630) and (2139630, 135630), respectively). Third, the program generates all possible residue pairs for the NHs (NHs for which DG HX are available) and calculates distance in angstrom between the backbone nitrogen atoms of the two residues in each pair. The program then generates a 'contact order matrix' in which each pair is assigned either with the value of 1 or 0: the value of 1 is given to a pair when the distance between the two residues is within 7 Å otherwise 0 is given. Fourth, the program groups the residue-pairs such that any pair in a group must have at least another pair having a residue common to each other. The program avoids redundancy in grouping the residue-pairs and generates atomic coordinate files in PDB format for residues in each group/cluster. If OneG finds more than a cluster for a protein, each cluster is distinct from other clusters in terms of structural contexts. Consequently, each cluster is attributed to possible existence of a cryptic intermediate in the unfolding kinetics of the protein. However, two cryptic intermediates, which are distinct in terms of structural context but indistinguishable in terms of free energy coverage, are represented as cooperative unfolding units.
The program reports no possible existence of cryptic intermediates for a protein, if it predicts single cluster. Only under the circumstance, the program is directed, by default, to predict possible existence of metastable states of proteins, which may lead to the overestimation of the DG HX , whereas cryptic intermediates accumulating in the unfolding kinetics of proteins lead to underestimation of DG U . The program generates a cluster consisting of all NHs for which DG HX .DG X , which is defined as shown, herein.
Thus, the DG X is the DG U corrected for the effects of cis-trans proline isomerisation and baselines of melting curves of the proteins. As we discussed in the above paragraph, the residues that are herein grouped into single cluster will be further divided into subgroups on the basis of contact order matrix and any possible cooperative units among the subgroups will then be defined on the basis of 'free energy coverage'. The reliability of the OneG on the prediction of possible existence of cryptic intermediates/metastable states in the unfolding kinetics of proteins has been tested on the following four proteins: Cytochrome C (1HRC), apocytochrome b 562 (1APC), Cardiotoxin III (2CRT) and Cobrotoxin (1COD). Of the four proteins, existence of cryptic intermediates in the unfolding kinetics of cytochrome c and apocytochrome b 562 has been already documented using experimental methods, in the literature [35,44].

Chemical denaturation
GdnHCl (guanidine hydrochloride) induced unfolding of ubiquitin dissolved in D 2 O (deuterium oxide) was monitored in the wavelength range from 220 to 230 nm using the AVIV circular dichroism spectrometer. Each spectrum was the average of five scans. The path length was 1 mm and the bandwidth was set to 1 nm. All measurements were made with suitable background corrections. The changes in ellipticity at 222 nm were plotted against the concentration of GdnHCl and the data were fitted to equations 11 & 16 (refer text) to determine the DG U of ubiquitin. The data analyses were performed using Kaleidagraph software (Synergy Software, USA).