Fig 1.
Procedural flow is indicated with arrows, starting from the top left. Given a structure of the protein of interest, a TERM is defined around the mutated position (green sphere) to include any potentially contacting positions (yellow spheres) and flanking backbone segments (white sticks and ribbon). The TERM is next decomposed into sub-TERMs, i.e., substructures containing a subset of the contacting positions and flanking segments. Structural ensembles for each sub-TERM are generated by searching the PDB for close structural matches using MASTER [47]. Finally, sequences from the matching ensembles of all sub-TERMs (and of the original TERM, data permitting) are used to extract positional and pair amino-acid preferences to predict ΔΔGm.
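The decomposition step can be illustrated with a minimal sketch. It assumes each sub-TERM is identified only by the mutated position plus a subset of its contacting positions, with an arbitrary cap (`max_contacts`) on subset size; the function name, the cap, and the inclusion of a contact-free sub-TERM are hypothetical choices, not the authors' implementation, and flanking backbone segments and the MASTER search are omitted entirely.

```python
from itertools import combinations


def enumerate_sub_terms(mutated, contacts, max_contacts=2):
    """Hypothetical sketch: list sub-TERMs as tuples (mutated, *contact_subset).

    Positions are plain residue indices; flanking segments are not modeled.
    """
    # Assumption: the mutated position alone (empty contact subset) counts as a sub-TERM.
    sub_terms = [(mutated,)]
    # Add every subset of contacting positions, from size 1 up to the cap.
    for size in range(1, min(max_contacts, len(contacts)) + 1):
        for subset in combinations(sorted(contacts), size):
            sub_terms.append((mutated, *subset))
    return sub_terms


print(enumerate_sub_terms(8, [3, 12, 40], max_contacts=2))
```

Raising `max_contacts` trades larger, more specific sub-TERMs for fewer close matches in the PDB, which is the tension the multi-contact comparison in Fig 3 concerns.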
Table 1.
Prediction performance with different strengths of regularization.
Table 2.
Prediction performance under different models.
Fig 2.
The performance of TERM-ΔΔG2 on S2648.
Predicted and measured ΔΔGm values are plotted on the X- and Y-axes, respectively. Color represents point cloud density. The least-squares regression line is shown with dashes.
Fig 3.
The role of multi-contact ensembles in ΔΔGm prediction, illustrated with the mutation 1RIS_AI8A.
(A) and (B) correspond to models TERM-ΔΔG1 and TERM-ΔΔG2, respectively. The mutated position is shown in yellow and all of its contacting positions (9 in total) are shown in cyan. The estimated sEP and pEP values are shown in red and blue, respectively. The experimental ΔΔGm for the mutation is 3.56 kcal/mol (destabilizing).
Table 3.
The performance of the TERM-based model relative to other published methods.
Fig 4.
The performance of different methods on the S699 set.
Data in each panel are shown in the same manner as in Fig 2, with the panel title indicating the prediction method used.
Fig 5.
Abundance of structural information is critical to prediction performance.
(A) The distribution of ubiquity for mutations in the S2648 set. Quartile boundaries are indicated by dashed lines. (B) Prediction performance on the four subgroups, from low ubiquity (group 1) to high ubiquity (group 4). The same representation is used here as in Fig 2.
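The quartile split in panel (B) can be sketched as follows, assuming each mutation has a single numeric ubiquity score. The quantile method (Python's default "exclusive" method) and the tie rule (a value equal to a boundary goes to the higher group) are assumptions; the caption does not specify either. The function name is hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def ubiquity_groups(ubiquity):
    """Hypothetical helper: assign each mutation to group 1 (lowest) .. 4 (highest)."""
    # Three cut points separating the four quartiles (needs at least two values).
    cuts = quantiles(ubiquity, n=4)
    # bisect_right counts how many cut points are <= u, giving a 0-based group index.
    groups = [bisect_right(cuts, u) + 1 for u in ubiquity]
    return groups, cuts


groups, cuts = ubiquity_groups([1, 2, 3, 4, 5, 6, 7, 8])
print(groups, cuts)
```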
Fig 6.
Prediction performance increases with the size of the structural database.
The model represented by each curve is indicated in the legend. For each level of subsampling, three samples were generated; error bars show the standard deviation across the three trials for each experiment. The functional form used for fitting is shown in the upper-left corner. The numbers on the right side of each curve indicate the corresponding best-fit plateau values (i.e., parameter a).
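A sketch of the subsampling protocol, assuming the database can be treated as a list of entries and that `evaluate` is a hypothetical callable returning a performance score for a given subsample. Uniform sampling without replacement and the sample (n − 1) standard deviation are assumptions, as is the fixed seed. The curve fit itself is not reproduced because its functional form appears only in the figure.

```python
import random
from statistics import mean, stdev


def subsample_performance(database, fractions, evaluate, n_trials=3, seed=0):
    """Hypothetical sketch: mean and SD of a performance score over random subsamples.

    For each fraction, draw n_trials subsets without replacement and score each.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    results = {}
    for frac in fractions:
        k = max(1, round(frac * len(database)))  # subsample size at this level
        scores = [evaluate(rng.sample(database, k)) for _ in range(n_trials)]
        results[frac] = (mean(scores), stdev(scores))  # sample SD across trials
    return results


# Toy usage: with len() as the "score", every trial at a level gives the same value.
print(subsample_performance(list(range(100)), [0.5, 1.0], evaluate=len))
```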