Protein Thermostability Prediction within Homologous Families Using Temperature-Dependent Statistical Potentials

The ability to rationally modify targeted physical and biological features of a protein of interest holds promise in numerous academic and industrial applications and paves the way towards de novo protein design. In particular, bioprocesses that utilize the remarkable properties of enzymes would often benefit from mutants that remain active at temperatures either higher or lower than the physiological temperature. Many in silico methods have been developed in recent years for predicting the thermodynamic stability of mutant proteins, but very few have focused on thermostability. To bridge this gap, we developed an algorithm for predicting the best descriptor of thermostability, namely the melting temperature $T_m$, from the protein's sequence and structure. Our method is applicable when the $T_m$'s of proteins homologous to the target protein are known. It is based on the design of several temperature-dependent statistical potentials, derived from datasets consisting of either mesostable or thermostable proteins. Linear combinations of these potentials have been shown to yield an estimation of the protein folding free energies at low and high temperatures, and the difference between these energies yields a prediction of the melting temperature. This particular construction, which distinguishes between the interactions that contribute more than others to stability at high temperatures and those that are more stabilizing at low $T$, performs better than the standard approach based on $T$-independent potentials, which predicts thermal resistance from thermodynamic stability. Our method has been tested on 45 proteins of known $T_m$ that belong to 11 homologous families. The standard deviation between experimental and predicted $T_m$'s is equal to 13.6°C in cross validation, and decreases to 8.3°C if the 6 worst predicted proteins are excluded. Possible extensions of our approach are discussed.


Introduction
In the last decade, there has been growing attention to the thermal stability of proteins, and much effort from both the theoretical and experimental sides has been devoted to understanding its molecular basis. The potential applications are very broad and include the possibility to rationally modify the thermal stability of targeted proteins and hence optimize the bioprocesses in which they are involved [1][2][3]. This opens interesting perspectives in all academic and industrial sectors that exploit the unique properties of proteins, such as the food industry, biofuel production, the detergent industry, remediation of environmental pollutants, therapeutic approaches and drug design [4][5][6].
As a first step, it is important to gain a theoretical understanding of the biophysical principles behind thermal stability. In a series of works [7][8][9][10][11][12][13][14][15][16][17], the mechanisms and interactions that promote or prevent thermal stabilization have been investigated. This is a highly non-trivial issue, due to the large number of factors that influence thermostability and to the marginal stability that results from a delicate balance between opposing energetic contributions. Several factors have been identified as responsible for enhanced thermal resistance, based on the analysis of amino acid conservation among meso- and thermostable proteins belonging to the same homologous family. However, these factors are often not universal but family-dependent.
More general investigations of the factors that influence thermal resistance have been performed using free energy calculations with a continuum solvation model [18]. They have led to the idea that salt bridges promote hyperthermostability in proteins, whereas they make little contribution to protein stability at room temperature. This idea is supported by a lattice model study, which suggested that salt bridges contribute not only to the stabilization of the native state but also to the destabilization of misfolded conformations [19]. Moreover, on the basis of temperature-dependent statistical potentials, it has been shown that not only salt bridges, but also cation-π interactions, aromatic interactions, and hydrogen bonds between negatively charged and some aromatic residues tend to thermostabilize proteins, whereas hydrophobic packing appears to be neutral in this respect [20,21].
Several approaches have been devised for designing mutants that are more thermally stable than wild-type proteins. Experimental methods include directed evolution, sometimes coupled with rational or semi-rational engineering strategies [22,23]; for a review see [24] and references therein. In silico engineering approaches have also been developed, based on residue conservation within homologous families, on structural and dynamical features, or on free energy calculations [25][26][27][28][29]. A sequence-based in silico method for predicting melting temperatures has been developed and applied to distinguish hyperthermophilic from mesophilic microorganisms [30]. Although these methods are partially successful, new, faster, more powerful and more precise techniques would be welcome.
It is noteworthy that many more computational methods have been developed to predict the thermodynamic stability of a protein, in particular the thermodynamic stability changes upon point mutations (for reviews of their performance, see [31][32][33][34]). These are often also used to predict thermal stability, although thermal and thermodynamic stability are only very imperfectly correlated. Indeed, the thermodynamic stability at a given temperature is defined by the folding free energy $\Delta G$ at that temperature, whereas the thermal stability is defined by the melting temperature $T_m$. Figure 1 shows an example of the stability curves of two hypothetical proteins, one mesostable and the other thermostable, with approximately the same thermodynamic stability at room temperature (given by the value $\Delta G^{*}$) but with a significant difference in thermal stability (given by $\Delta T_m = T_m^{\mathrm{thermo}} - T_m^{\mathrm{meso}}$) of about 50°C. There is thus a need to develop efficient and fast thermal stability predictors that do not take a detour through thermodynamic stability.
The aim of this paper is to build an in silico method that directly predicts $T_m$, which is the best descriptor of thermal stability. For that purpose, we have generalized and optimized the set-up introduced in [20,21] for defining temperature-dependent statistical potentials. This set-up was originally devised for distance potentials that describe tertiary interactions, based on the propensities of residue pairs to be separated by a certain spatial distance. Here we also apply it to define temperature-dependent torsion potentials, which describe local interactions along the polypeptide chain and are based on the propensities of residues to be associated with backbone torsion angle domains [35]. The main idea behind the construction is that, since thermodynamic and thermal stability are not always correlated, new potentials that are defined at different temperatures, and thus take into account the thermal properties of intra-protein interactions, must be introduced alongside the standard statistical potentials defined at an average temperature. This construction is illustrated in Figure 2. In practice, it consists of building different datasets of proteins with known melting temperature and deriving statistical potentials from each of them; because of the limited amount of data, only two sets were considered, a mesostable and a thermostable one. Since there are not enough experimentally resolved structures with known $T_m$, we enlarged the datasets by including proteins with unknown $T_m$ for which a crude estimate of $T_m$ could be obtained from the environmental temperature of the host organism. This allowed us to derive smoother potentials and to obtain better performance.
Once derived, the potentials were used to obtain a fairly accurate prediction of the melting temperature of a target protein, using additional information about the $T_m$ of homologous proteins. The overall flowchart of the method is summarized in Figure 3. Its performance was compared to that of the common procedure, which uses temperature-independent potentials and hence predicts thermal resistance from thermodynamic stability.

Basic protein dataset S and homologous families
To define temperature-dependent potentials, we used the protein dataset defined in [20] and denoted as $S$, which contains 166 protein X-ray structures with resolution ≤ 2.5 Å and known melting temperature $T_m$, measured for the transition from the monomeric state to the denatured state. They were collected from the literature and the ProTherm database [36], and manually checked on the basis of the original articles. If several $T_m$ values were available for a given protein, we chose the $T_m$ measured at the pH closest to 7; if different $T_m$'s were available under the same conditions, their average was taken. All the proteins belonging to this set and their characteristics are reported in Table S0 in File S1.
In this dataset, 11 families consisting of at least three homologous proteins were identified, whose melting temperatures will be predicted later in this paper and compared to the

Enlarged, family-dependent protein datasets $S_f$
To construct smoother potentials and design a $T_m$ predictor that is specific to the proteins belonging to a given family $f$, we enlarged the basic dataset $S$. For each of the 11 families $f$ in turn, additional proteins belonging to $f$ were added to $S$ so as to create the family-dependent dataset denoted as $S_f$. This procedure thus defines 11 different datasets $S_f$, one per family.
In contrast to the proteins from $S$, the $T_m$'s of the additional proteins in $S_f$ have not been characterized experimentally; only the environmental temperature of their host organism, $T_{\mathrm{env}}$, is known. This temperature refers to the optimal growth temperature for microorganisms and cold-blooded organisms, and to the body temperature for warm-blooded ones. The $T_{\mathrm{env}}$ values we use (listed in Tables S1-S11 in File S1) were manually checked against the literature. When no optimal growth temperature was reported for a given microorganism, we took the mean of the range of temperatures over which it is able to grow.
To estimate the melting temperature of these additional proteins, three different methods were used. We stress that these estimates are not meant to be reliable predictions of $T_m$; they only provide a rough approximation that allows us to decide whether a protein belongs to the set of thermostable or mesostable proteins, as explained below.
The first two methods for estimating the $T_m$'s are based on the environmental temperature $T_{\mathrm{env}}$. It is well known that $T_m$ and $T_{\mathrm{env}}$ are correlated, since thermophilic organisms necessarily host thermostable proteins (even if the converse is not true). Based on experimental data on families of homologous proteins, a correlation between $T_m$ and $T_{\mathrm{env}}$ was indeed observed and the corresponding regression line computed [38,39]. The regression line obtained in [39] expresses the estimated melting temperature $T_m^{(1)\mathrm{est}}$ as a linear function of $T_{\mathrm{env}}$. The associated correlation coefficient, denoted $r^{(1)}$ and computed without cross validation, is equal to 0.82. The $T_m^{(1)\mathrm{est}}$ values derived with this formula are listed in Tables S1-S11 in File S1.
However, this correlation was derived regardless of the protein type. One can expect the correlation between $T_m$ and $T_{\mathrm{env}}$ to be stronger within a given family of homologous proteins, since thermostability is related to specific protein characteristics. We thus calculated the linear regression between $T_m$ and $T_{\mathrm{env}}$ within each family, even though the number of proteins per family is small and the statistical significance of the correlation questionable. The resulting estimates $T_m^{(2)\mathrm{est}}$ are listed in Tables S1-S11 in File S1, and the regression lines for each family are given in Table S14 in File S1. The mean of the correlation coefficients $r^{(2)}$ computed within each family is equal to 0.84 (without cross validation) and is thus almost equal to the correlation coefficient $r^{(1)}$ calculated on all families together. Note the peculiar case of the α-lactalbumin family (see Table S5 in File S1), for which the coefficients of the regression line are very different from the others. This family contains three proteins from three warm-blooded organisms with very close $T_{\mathrm{env}}$'s (Homo sapiens 37°C, Bos taurus 38°C and Capra hircus 39°C) but $T_m$'s that differ by more than 30°C. The $T_m$-$T_{\mathrm{env}}$ regression line obtained from these proteins is thus probably not reliable. The regression line of the lysozyme family is also atypical, but to a lesser extent.
The last method to estimate $T_m$'s is based on sequence similarity. We assign to a given protein the melting temperature of the protein of the same family, with known $T_m$, that exhibits the highest sequence identity to it. This rather strong assumption is justified by the fact that, often, the higher the sequence identity, the higher the similarity in all structural, functional and thermodynamic characteristics, including thermostability. For that purpose, we performed pairwise alignments of all the sequences within each family using the FASTA program [40]. The resulting $T_m^{(3)\mathrm{est}}$ values are reported in Tables S1-S11 in File S1.
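The third estimate is a nearest-neighbour lookup. A minimal Python sketch, assuming the pairwise identities (for instance parsed from FASTA alignments) are already stored in a dictionary keyed by unordered protein pairs; this input format and the function name are hypothetical.

```python
def nearest_homolog_tm(target, known_tm, identity):
    """Assign to `target` the T_m of its most similar family member with known T_m.

    known_tm: dict protein id -> experimental T_m (same family as target).
    identity: dict frozenset({id1, id2}) -> percent sequence identity
              (hypothetical input format).
    Returns (T_m, id of the homolog used); ties go to the first candidate.
    """
    candidates = [p for p in known_tm if p != target]
    if not candidates:
        raise ValueError("no homolog with known T_m in this family")
    best = max(candidates, key=lambda p: identity[frozenset((target, p))])
    return known_tm[best], best


known = {"A": 60.0, "B": 85.0}
identity = {frozenset(("X", "A")): 40.0, frozenset(("X", "B")): 72.0}
print(nearest_homolog_tm("X", known, identity))
```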
Thermostable, mesostable and average protein datasets
Each of the 11 family-dependent sets $S_f$ was divided into two subsets of equal size: the mesostable set $S_f^{\downarrow}$, containing the proteins with (either known or estimated) $T_m$ smaller than a threshold value $T_m^{\mathrm{th}}$, and the thermostable set $S_f^{\uparrow}$, in which all proteins have $T_m > T_m^{\mathrm{th}}$. The threshold $T_m^{\mathrm{th}}$ was chosen so that the two subsets contain an equal number of proteins; it thus depends slightly on $f$. Each subset was refined separately using the protein-culling server PISCES [37]. For each pair of proteins in a given subset with a sequence identity > 25%, only one protein was kept, according to the following criteria: (1) when one protein has a known $T_m$ and the other an estimated $T_m$, we kept the protein with known $T_m$; (2) when both proteins have either experimentally determined or estimated $T_m$'s, we kept the one with the highest $T_m$ in the thermostable set and the one with the lowest $T_m$ in the mesostable set. This procedure prevents significant sequence similarity within each subset, which could bias the predictions. It also increases the difference between the average melting temperatures $\overline{T}_m$ of the meso- and thermostable subsets, so as to obtain more differentiated temperature-dependent potentials.
We also constructed 11 family-dependent average datasets $\overline{S}_f$ from the $S_f$. These sets were not split in two, but were refined using PISCES with the criterion that, when two proteins (both with either known or estimated $T_m$) show a high degree of sequence identity (> 25%), the protein with the melting temperature closest to the mean $\overline{T}_m$ is kept and the other is discarded. This rule is not applied when one protein has an estimated $T_m$ and the other a known $T_m$; in that case, the protein with known $T_m$ is kept and the one with estimated $T_m$ is discarded.
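The splitting and culling rules for the three sets can be sketched in Python as a threshold split followed by a greedy, priority-ordered culling pass. This is only an illustration under stated assumptions: PISCES is a web server whose internal algorithm is not reproduced here, the greedy best-first pass is one common way to apply pairwise keep/discard rules (not necessarily the authors' exact procedure), and all names and toy values are hypothetical.

```python
from statistics import mean


def split_meso_thermo(proteins):
    """Sort by T_m and cut in the middle, so both halves have (nearly) equal size."""
    ranked = sorted(proteins, key=lambda p: p["tm"])
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def cull(proteins, identity, priority, max_identity=25.0):
    """Greedy culling (an approximation, not PISCES itself): visit proteins in
    priority order and keep one only if its identity to every protein kept
    so far is at most max_identity percent."""
    kept = []
    for p in sorted(proteins, key=priority):
        if all(identity(p["id"], q["id"]) <= max_identity for q in kept):
            kept.append(p)
    return kept


# Priority keys (smaller is preferred); proteins with known T_m always come first.
def thermo_key(p):
    return (not p["known"], -p["tm"])  # then the highest T_m


def meso_key(p):
    return (not p["known"], p["tm"])  # then the lowest T_m


def average_key(mean_tm):
    return lambda p: (not p["known"], abs(p["tm"] - mean_tm))  # then closest to mean


# Hypothetical toy family.
proteins = [
    {"id": "A", "tm": 40.0, "known": True},
    {"id": "B", "tm": 45.0, "known": False},
    {"id": "C", "tm": 80.0, "known": True},
    {"id": "D", "tm": 90.0, "known": False},
]
pair_identity = {frozenset(("A", "B")): 30.0, frozenset(("C", "D")): 50.0}


def identity(a, b):
    return pair_identity.get(frozenset((a, b)), 10.0)


meso, thermo = split_meso_thermo(proteins)
meso_set = cull(meso, identity, meso_key)
thermo_set = cull(thermo, identity, thermo_key)
avg_set = cull(proteins, identity, average_key(mean(p["tm"] for p in proteins)))
print([p["id"] for p in meso_set], [p["id"] for p in thermo_set], [p["id"] for p in avg_set])
```

Encoding each rule as a sort key keeps the three sets on one culling routine; only the preference order differs.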
This procedure yields, for each of the 11 families $f$, three protein datasets: a mesostable set $S_f^{\downarrow}$, a thermostable set $S_f^{\uparrow}$, and an average set $\overline{S}_f$. Each of these sets is characterized by $\overline{T}_m$, defined as the average of the melting temperatures of the proteins belonging to the set. This average temperature depends on the considered family. The dependence is, however, very small, and for simplicity of notation we do not add a subscript $f$ to $\overline{T}_m$. The values of $\overline{T}_m$ associated with the different datasets are given in Table S13 in File S1.

Statistical potentials
Temperature- and family-dependent statistical potentials were derived from the datasets $S_f^{\downarrow}$, $\overline{S}_f$ and $S_f^{\uparrow}$, each of which is characterized by a different average melting temperature $\overline{T}_m$. This is done using the Boltzmann law, following [20,21]:

$$\Delta W(s,c,\overline{T}_m) = -kT \ln \frac{F(s,c,\overline{T}_m)}{F(s,\overline{T}_m)\,F(c,\overline{T}_m)} \qquad (2)$$

where $s$ represents single amino acids or amino acid pairs, and $c$ spatial distances between residue pairs or backbone torsion angle domains; the $F$'s are relative frequencies computed in the dataset of average melting temperature $\overline{T}_m$, i.e. $F(s,c,\overline{T}_m) = n(s,c,\overline{T}_m)/n(\overline{T}_m)$. In particular, we built two distance potentials and two torsion potentials.

In the torsion potentials, $s$ corresponds either to the amino acid type $a_i$ of residue $i$ or to the amino acid types $(a_i,a_j)$ of residues $i$ and $j$, and $c$ corresponds to the backbone torsion angle domain $t_k$ of residue $k$. Seven $(\phi,\psi,\omega)$ torsion angle domains were used, as defined in [41]. These potentials describe local interactions along the chain, with $i<j$ and $i,j \in \{k-8,\dots,k+8\}$. They are denoted as $\Delta W(a,t,\overline{T}_m)$ and $\Delta W(a,a',t,\overline{T}_m)$.

In the two distance potentials, the structure motif $c$ is the spatial distance $d_{ij}$ between residues $i$ and $j$, with $j>i+1$. In $\Delta W(a,a',d,\overline{T}_m)$, residues $i$ and $j$ are of type $a$ and $a'$. In $\Delta W(a,d,\overline{T}_m)$, residue $i$ or $j$ is of type $a$ and the other is of arbitrary type. The distance between two residues is defined as the distance between the geometric centers of their heavy side-chain atoms [20]. Distances between 3.0 and 8.0 Å were grouped into 25 bins of 0.2 Å width; two additional bins describe distances larger than 8.0 Å and smaller than 3.0 Å, respectively. Moreover, to artificially increase the number of occurrences in each bin and thereby smooth the potential, we summed the occurrences of neighboring bins, giving them a decreasing weight. In this smoothing, $n_i$ represents the number of occurrences $n(s,c,\overline{T}_m)$ or $n(c,\overline{T}_m)$ in bin $i$, and the smoothing parameter $\ell$ is set equal to 3; $n(s,\overline{T}_m)$ and $n(\overline{T}_m)$ are normalized accordingly.
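The distance binning and neighbour-bin smoothing can be illustrated with the following Python sketch. The bin layout (25 bins of 0.2 Å between 3.0 and 8.0 Å, plus one bin below and one above) follows the text; the weight $1/(1+|k|)$ given to a bin $k$ positions away, and the reading of $\ell$ as the number of neighbouring bins on each side, are assumptions made for illustration, since the smoothing formula itself is not reproduced here. Function names are hypothetical.

```python
def distance_bin(d, d_min=3.0, d_max=8.0, width=0.2):
    """Map an inter-residue distance (Angstrom) to one of 27 bins.

    Bin 0: d < 3.0; bins 1..25: 0.2-wide bins covering [3.0, 8.0);
    bin 26: d >= 8.0. Distances lying exactly on an inner bin edge may land
    in the lower bin because of floating-point rounding.
    """
    n_inner = round((d_max - d_min) / width)  # 25 inner bins
    if d < d_min:
        return 0
    if d >= d_max:
        return n_inner + 1
    return 1 + min(int((d - d_min) / width), n_inner - 1)


def smooth_counts(counts, ell=3):
    """Add the counts of up to `ell` neighbouring bins on each side, using the
    ASSUMED decreasing weight 1 / (1 + |k|) for a bin k positions away."""
    n = len(counts)
    smoothed = []
    for i in range(n):
        total = 0.0
        for k in range(-ell, ell + 1):
            if 0 <= i + k < n:
                total += counts[i + k] / (1 + abs(k))
        smoothed.append(total)
    return smoothed


# Histogram a few toy distances, then smooth the 27-bin histogram.
counts = [0] * 27
for d in (2.5, 3.1, 5.05, 5.1, 7.95, 9.0):
    counts[distance_bin(d)] += 1
print(smooth_counts(counts)[:5])
```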
To deal with the limited size of the datasets, a correction for sparse data [35] is applied; it involves the expected number of occurrences $n_e = n(s,\overline{T}_m)\,n(c,\overline{T}_m)/n(\overline{T}_m)$ and an adjustable parameter $\zeta$. This correction ensures that the potentials are close to 0 when the number of observations in the dataset is too small. The value of $\zeta$ was set to either 10 or 20.
We computed all the statistical torsion and distance potentials $\Delta W_f(s,c,\overline{T}_m)$ using the two values of $\zeta$ and the three different procedures for estimating $T_m$ described in the previous subsections (two based on $T_{\mathrm{env}}$, one on sequence identity). This yields six different series of $\Delta W_f(s,c,\overline{T}_m)$'s. The final torsion and distance potentials considered in the following are the averages of these six series.
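The Boltzmann-law potential, a sparse-data correction and the final averaging over six series can be sketched in Python as follows. The correction used here, which adds the same pseudocount $\zeta$ to the observed and expected numbers of occurrences, is a stand-in chosen because it has the property stated above (potentials tend to 0 when observations are scarce); it is not necessarily the formula of [35]. The value $kT = 0.593$ kcal/mol (about 298 K), the input format and the function names are also assumptions.

```python
import math
from collections import Counter


def statistical_potential(pair_counts, zeta=10.0, kT=0.593):
    """Delta W(s, c) = -kT ln[(n(s,c) + zeta) / (n_e + zeta)], with
    n_e = n(s) n(c) / n the expected number of occurrences.

    pair_counts: Counter mapping (s, c) -> n(s, c) in one dataset, where s is
    a residue type (or pair) and c a distance bin (or torsion domain).
    For zeta = 0 and n(s,c) > 0 this reduces to -kT ln[F(s,c) / (F(s) F(c))].
    The pseudocount form is an ASSUMED stand-in for the correction of [35].
    """
    n_s, n_c = Counter(), Counter()
    for (s, c), n in pair_counts.items():
        n_s[s] += n
        n_c[c] += n
    n_tot = sum(n_s.values())
    potential = {}
    for s in n_s:
        for c in n_c:
            n_obs = pair_counts.get((s, c), 0)
            n_exp = n_s[s] * n_c[c] / n_tot
            potential[(s, c)] = -kT * math.log((n_obs + zeta) / (n_exp + zeta))
    return potential


def averaged_potential(count_tables, zetas=(10.0, 20.0), kT=0.593):
    """Average over count tables (e.g. one per T_m estimation procedure) and
    zeta values: 3 tables x 2 zetas gives six series. A (s, c) key missing
    from one series contributes 0 to the average."""
    series = [statistical_potential(t, z, kT) for t in count_tables for z in zetas]
    keys = set().union(*series)
    return {k: sum(p.get(k, 0.0) for p in series) / len(series) for k in keys}


# Toy counts: A is over-represented in bin "x", B in bin "y".
skewed = Counter({("A", "x"): 30, ("B", "y"): 30})
print(statistical_potential(skewed))
```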

Prediction of the melting temperature $T_m$
The folding free energy $\Delta G$ of a protein $p$ belonging to family $f$, at a temperature identified with $\overline{T}_m$, is evaluated as a linear combination of the four torsion and distance potentials defined in Eq. (2), derived from the set of proteins ($\overline{S}_f$, $S_f^{\downarrow}$ or $S_f^{\uparrow}$) of average melting temperature $\overline{T}_m$. In this combination, the sums run over residue pairs with $j > i+1$ for the distance potentials and over $k-8 \le i < j \le k+8$ for the torsion potentials; $N_f$ is a family-dependent normalization factor, and $N_p$ is the number of residues of $p$. For simplicity, we denote as $\Delta\overline{G}_p$, $\Delta G^{\downarrow}_p$ and $\Delta G^{\uparrow}_p$ the family- and $T$-dependent folding free energies of protein $p$ belonging to $f$, computed using the statistical potentials derived from the sets $\overline{S}_f$, $S_f^{\downarrow}$ and $S_f^{\uparrow}$, respectively.
We predict the melting temperature from these free energies in two different ways. In the first, we assume that the melting temperature is a linear function of the average folding free energy $\Delta\overline{G}$. This is the common procedure that predicts thermal from thermodynamic stability. In the second, original, method, we assume that the melting temperature is a linear function of the difference between the folding free energies at two different temperatures, $[\Delta G^{\uparrow} - \Delta G^{\downarrow}]$. In both procedures, the parameters, generically denoted as $P$, are optimized so as to minimize the standard deviation between the predicted and experimental melting temperatures of the ensemble of considered proteins; for that purpose, we used the minimization function implemented in Mathematica 7. To avoid overestimating the performance of our method, we performed cross validation using the jack-knife technique: the parameters are identified on all proteins but one, which is used as test protein; every protein in turn is considered as test protein, and the average score is reported.

Results
The contributions of amino acid interactions to protein stability are known to be temperature-dependent: some interactions may be more stabilizing than others at high temperatures and less stabilizing at low $T$, or conversely [18,20,21,42,43]. This dependence needs to be taken into account for a proper analysis of thermal stability. For that purpose, we created different datasets of proteins with known melting temperatures: the $S^{\downarrow}$ sets contain only mesostable proteins, the $S^{\uparrow}$ sets only thermostable ones, and the $\overline{S}$ sets contain proteins taken independently of their $T_m$. Each set is associated with a temperature $\overline{T}_m$, computed as the mean of the $T_m$ values of the proteins it contains.
Predicting the melting temperature of a protein from its structure alone is a difficult task; we therefore focus on the somewhat simpler problem of predicting this temperature using information from homologous proteins. We selected 11 families of proteins of known $T_m$, labelled by $f$, and defined 11 triplets of sets $(S_f^{\uparrow}, S_f^{\downarrow}, \overline{S}_f)$ by adding proteins belonging to each family to the complete set $S$, following the procedure explained in the Methods section.
From each of these datasets, characterized by an average melting temperature $\overline{T}_m$, two torsion potentials and two distance potentials were derived using the standard statistical-potential formalism, which converts relative amino acid frequencies into free energies through the Boltzmann law (Eq. (2)). The torsion potentials are based on the propensities of single amino acids and amino acid pairs to adopt certain backbone torsion angles and describe local interactions along the chain. The distance potentials describe tertiary interactions and are computed from the propensities of amino acid pairs to be separated by a certain spatial distance. The total folding free energy $\Delta G$ at a temperature $\overline{T}_m$ is computed as a linear combination of these statistical potentials, derived from the dataset associated with $\overline{T}_m$ (Eq. (5)). We thus obtain, for each protein $p$, three folding free energies $\Delta G^{\uparrow}_p$, $\Delta G^{\downarrow}_p$ and $\Delta\overline{G}_p$; the coefficients of the combination are parameters that are fixed in a later step. In Figure 2, these three folding free energies at the temperatures $T^{\uparrow}$, $T^{\downarrow}$ and $\overline{T}$ are depicted on the stability curve of a hypothetical protein.
Two procedures are used to predict the $T_m$'s from these free energies. The first assumes a linear correlation between $T_m$ and $\Delta\overline{G}$, which is the standard way of predicting melting temperatures. The second, novel, procedure assumes a linear correlation between $T_m$ and $[\Delta G^{\uparrow} - \Delta G^{\downarrow}]$. In the last step, the parameters (i.e. the coefficients of the linear combination of statistical potentials) were identified so as to minimize the deviation between the computed and experimental $T_m$'s (Eq. (6)).
To avoid an overestimation of the performance, we systematically performed cross validations using the jack-knife technique as explained in the Methods section.
The first procedure, which assumes a correlation between $T_m$ and $\Delta\overline{G}$, is justified by the fact that thermodynamic and thermal stability are sometimes related, even if this is obviously not always true. Indeed, in the language of [44] (for a more recent review see also [45]), one way for a protein to enhance its thermostability is to increase its thermodynamic stability at all temperatures, thereby shifting the entire stability curve "downwards", i.e. towards lower $\Delta G$'s. The other two ways to increase thermal resistance, namely a decrease of the heat capacity change $\Delta C_P$, which modifies the shape of the curve, and a global shift of the curve towards higher temperatures, are instead better captured by the second procedure, which assumes a correlation between $T_m$ and the difference between the folding free energies at different temperatures, i.e. $[\Delta G^{\uparrow} - \Delta G^{\downarrow}]$.
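The three mechanisms can be illustrated with the standard Gibbs-Helmholtz stability curve for a temperature-independent $\Delta C_P$; this curve is textbook thermodynamics, not part of the prediction method itself. The Python sketch below parametrizes the unfolding free energy $\Delta G_u(T) = -\Delta G(T)$ by the temperature of maximal stability $T_s$, the stability $\Delta G_s$ at $T_s$ and $\Delta C_P$: $\Delta G_u(T) = \Delta G_s + \Delta C_P[(T - T_s) - T\ln(T/T_s)]$. Raising $\Delta G_s$, lowering $\Delta C_P$ or raising $T_s$ corresponds to the three mechanisms, and each raises $T_m$. Note the sign convention: a positive $\Delta G_u$ means a stable protein, whereas the text uses the folding free energy $\Delta G$. The parameter values are arbitrary.

```python
import math


def unfolding_dg(t, dg_s, t_s, dcp):
    """Unfolding free energy (kcal/mol) at temperature t (K), assuming constant dCp.

    dg_s: stability (> 0) at t_s, the temperature of maximal stability;
    dcp: heat capacity change of unfolding (kcal/(mol K), > 0).
    """
    return dg_s + dcp * ((t - t_s) - t * math.log(t / t_s))


def melting_temperature(dg_s, t_s, dcp, span=300.0, tol=1e-9):
    """Heat-denaturation T_m (K): the root of unfolding_dg above t_s, by bisection.
    unfolding_dg decreases monotonically above t_s, so the root is unique."""
    lo, hi = t_s, t_s + span
    if unfolding_dg(hi, dg_s, t_s, dcp) > 0:
        raise ValueError("no unfolding transition within the search span")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if unfolding_dg(mid, dg_s, t_s, dcp) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


reference = melting_temperature(dg_s=5.0, t_s=290.0, dcp=2.0)
shifted_down = melting_temperature(dg_s=7.0, t_s=290.0, dcp=2.0)  # more stable at all T
flatter = melting_temperature(dg_s=5.0, t_s=290.0, dcp=1.0)  # smaller dCp
shifted_right = melting_temperature(dg_s=5.0, t_s=310.0, dcp=2.0)  # curve moved to higher T
print([round(tm - 273.15, 1) for tm in (reference, shifted_down, flatter, shifted_right)])
```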
The results of the $T_m$ predictions for all proteins of our dataset are plotted in Figure 4. Figure 4a shows the correlation between the experimental melting temperatures and those predicted from the folding free energy difference $[\Delta G^{\uparrow}_p - \Delta G^{\downarrow}_p]$. The associated linear correlation coefficient $r^{\uparrow\downarrow}$ is equal to 0.68 (P-value $10^{-7}$). Figure 4b shows instead the correlation between the experimental $T_m$'s and those predicted from the average potential, $\Delta\overline{G}_p$. The corresponding linear correlation coefficient is very low, $\overline{r} = 0.15$, and is not statistically significant (P-value 0.3). Clearly, the new procedure presented here, which predicts melting temperatures from $[\Delta G^{\uparrow}_p - \Delta G^{\downarrow}_p]$ using $T$-dependent statistical potentials, is far superior to the common procedure, which predicts $T_m$ from $\Delta\overline{G}_p$ using simple $T$-independent potentials.
Focusing on the $[\Delta G^{\uparrow}_p - \Delta G^{\downarrow}_p]$-based method, we analyzed whether some proteins are better predicted than others, and whether badly predicted proteins cause a significant decrease in the overall performance. In Figure 4c, the 6 worst predicted proteins are excluded. To identify them, at each step we excluded the protein whose melting temperature was predicted worst and recomputed the $T_m$'s of the remaining proteins, repeating the procedure until 6 proteins had been excluded. In this case, the linear correlation coefficient rises to 0.83 (P-value $< 10^{-10}$).
The standard deviations $\sigma$ between the predicted and experimental melting temperatures, computed for each family individually, are reported in Table 1; the results per protein are given in Table S12 in File S1. On average, $\sigma^{\uparrow\downarrow}$ is equal to 13.6°C when computed from the free energy difference $[\Delta G^{\uparrow}_p - \Delta G^{\downarrow}_p]$. This is significantly better than the average $\sigma$ obtained with the standard $\Delta\overline{G}_p$-based method, $\overline{\sigma} = 17.6$°C. Moreover, removing the 6 worst predicted proteins reduces $\sigma^{\uparrow\downarrow}$ from 13.6 to 8.3°C. The best predicted families are acylphosphatase, α-amylase and β-lactamase, with $\sigma^{\uparrow\downarrow}$ values between 5.9 and 7.5°C, while the worst are cytochrome P450 and myoglobin, with $\sigma^{\uparrow\downarrow}$ values around 19°C. The proteins from the latter two families contain a heme, whereas the proteins from the other families contain no ligands or very small ones (see Tables S1-S11 in File S1). As our statistical potentials do not take into account interactions with ligands, the effects of substitutions in the region of the heme cannot be estimated properly. The presence of the heme could thus well be the reason for the poor predictions in the cytochrome P450 and myoglobin families.
The average $T_m$ prediction score obtained with the standard, $\Delta\overline{G}$-based, method is significantly worse than the one obtained with $[\Delta G^{\uparrow} - \Delta G^{\downarrow}]$. It is, however, noteworthy that some families are better predicted with the former method. This is clearly the case for the endoglucanase family and, to a lesser extent, for the lysozyme family. This result suggests that these proteins are thermally stabilized through a shift of the entire stability curve towards lower $\Delta G$ values.

Discussion
A complete understanding of the features that determine protein thermal stability is still far from being reached, but we have made some progress towards this goal. The originality of our approach lies in the use of temperature-dependent statistical potentials, derived from distinct sets of protein structures containing either mesostable or thermostable proteins. Linear combinations of these meso- and thermostable potentials, with coefficients identified so as to minimize the standard deviation between experimental and predicted $T_m$'s, were used to predict the melting temperature of a set of 45 proteins belonging to 11 different homologous families.
These potentials allowed us to determine objectively which interactions contribute most to protein stability in different temperature ranges and also, interestingly, which interactions are less destabilizing (in other words, less repulsive) depending on the temperature. For example, the temperature-dependent distance potentials indicate that salt bridges, cation-π and aromatic interactions contribute more to stability at high temperatures than hydrophobic packing does, and conversely at low temperatures, and that interactions between positively charged residues are less repulsive at high than at low temperature relative to other interactions [20,21].
The novel temperature-dependent torsion potentials introduced here also show a significant dependence on temperature, and they provide a non-negligible improvement in $T_m$ prediction performance. However, they are much more difficult to interpret in terms of specific interactions than distance potentials. Indeed, they reflect the propensities of amino acids and amino acid pairs to be associated with backbone torsion angle domains in their vicinity along the polypeptide chain, up to eight sequence positions away. These propensities are obviously related to secondary structure preferences, but in an intricate way.
Another important feature that contributes to the success of our approach is the focus on families of homologous proteins. We indeed defined family- and temperature-dependent statistical potentials, which include more proteins of the family under consideration and are hence biased towards it. Note that we nevertheless kept the pairwise sequence identity within each set at most 25%, to avoid uncontrolled biases. As the number of proteins with known $T_m$ is quite limited, we also used proteins of unknown $T_m$ but known $T_{\mathrm{env}}$ to enlarge the datasets from which the potentials are derived, using three different rules to roughly estimate their $T_m$.
Note that the same approach can be used for general $T_m$ predictions, independently of protein families. However, as expected, this significantly decreases the prediction score. On the other hand, we would like to emphasize that our method predicts the $T_m$ of a given protein from the $T_m$'s of homologous proteins, which sometimes have very different sequences. A much easier goal would be to predict the change in melting temperature upon point mutations ($\Delta T_m$).
The results presented here are very encouraging, but suffer severely from a lack of data. Indeed, the number of proteins with experimentally determined structure and melting temperature is too limited, both for deriving sufficiently reliable temperature-dependent statistical potentials and for biasing them properly towards a given protein family. The comparison of the score obtained in cross validation ($\sigma^{\uparrow\downarrow} = 13.6$°C between predicted and measured $T_m$'s) with the score in direct validation ($\sigma^{\uparrow\downarrow} = 5.5$°C) indicates that improvement can be expected from a larger dataset. Another source of error is that some families contain ligands, such as the hemes in the myoglobin and cytochrome families. These ligands sometimes strongly affect the stability of the proteins but cannot be taken into account by our potentials, which are limited to the residues of the polypeptide chain. This inevitably increases the value of $\sigma$. Finally, experimental errors should be included in the evaluation. These involve the intrinsic experimental error but, more importantly, the fact that the available measurements were sometimes not performed under exactly the same experimental conditions in terms of pH, ionic strength, etc.
This discussion allows us to conclude on a positive note: the performance of our method is already quite good, and is expected to improve significantly when larger datasets of proteins with known $T_m$, obtained under identical experimental conditions, become available.
Table 1. Values of the standard deviations $\sigma^{\uparrow\downarrow}$ and $\overline{\sigma}$ between the measured and the predicted melting temperatures (in °C); $\sigma^{\uparrow\downarrow *}$ denotes the standard deviation excluding the 6 proteins whose $T_m$ is predicted worst; N indicates the number of proteins in the family.

Figure1.Thermal versus thermodynamic stability.An example of the stability curves of an hypothetical couple of mesostable and thermostable proteins, characterized by an equal thermodynamic stability at room temperature, but different thermal stabilities.doi:10.1371/journal.pone.0091659.g001

Figure 2. Folding free energies at different temperatures. Plot of the stability curve as a function of temperature, with the values of the three folding free energies $\Delta G^{\downarrow}$, $\Delta\overline{G}$ and $\Delta G^{\uparrow}$ at the respective temperatures $T^{\downarrow}$, $\overline{T}$ and $T^{\uparrow}$, for a hypothetical protein. doi:10.1371/journal.pone.0091659.g002
… and $\overline{P} = (b_0, b_1, b_2, b_3, N_f, \overline{c}, \overline{d})$; the sum over $p$ in these expressions runs over all the proteins with known melting temperature $T_{m,p}$ that belong to the 11 homologous families. The coefficients $(c^{\uparrow\downarrow}, \overline{c})$ and $(d^{\uparrow\downarrow}, \overline{d})$ give, respectively, the slopes and the intercepts of the regression lines between computed folding free energies and experimental melting temperatures that best fit the data.
For comparison, Table 1 also reports the results obtained in direct validation, which yield $\sigma^{\uparrow\downarrow} = 5.5$°C.

Figure 4. Melting temperature prediction. Relation between the experimental melting temperature $T_m^{\mathrm{exp}}$ and the predicted temperatures: (a) $T_m^{\uparrow\downarrow}$, computed from the folding free energy difference $[\Delta G^{\uparrow}_p - \Delta G^{\downarrow}_p]$ (correlation coefficient $r^{\uparrow\downarrow} = 0.68$); (b) the $T_m$ predicted from the folding free energy $\Delta\overline{G}_p$ ($\overline{r} = 0.15$); and (c) $T_m^{\uparrow\downarrow *}$, computed from $[\Delta G^{\uparrow}_p - \Delta G^{\downarrow}_p]$ excluding the 6 worst predicted proteins ($r^{\uparrow\downarrow *} = 0.83$). doi:10.1371/journal.pone.0091659.g004