Sequence statistics of tertiary structural motifs reflect protein stability

doi:10.1371/journal.pone.0178272

Fig 1.

TERM-based ΔΔG_m prediction.

Procedural flow is indicated with arrows, starting from the top left. Given a structure of the protein of interest, a TERM is defined around the mutated position (green sphere) to include any potentially contacting positions (yellow spheres) and flanking backbone segments (white sticks and ribbon). The TERM is next decomposed into sub-TERMs, i.e., substructures containing a subset of the contacting positions and flanking segments. A structural ensemble for each sub-TERM is generated by searching the PDB for close structural matches using MASTER [47]. Finally, sequences from the matching ensembles of all sub-TERMs (and the original TERM, data permitting) are used to extract positional and pair amino-acid preferences to predict ΔΔG_m.
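The decomposition step can be illustrated with a minimal sketch that enumerates sub-TERMs as subsets of the contacting positions around a mutated position. The function name `enumerate_sub_terms`, the `max_contacts` cutoff, and the example residue numbers are hypothetical; the caption does not state which subsets the method actually uses, and flanking segments and the MASTER search are not modeled here.

```python
from itertools import combinations


def enumerate_sub_terms(mutated_pos, contacts, max_contacts=2):
    """Return candidate sub-TERMs as (mutated_pos, contact_subset) tuples.

    Each candidate keeps the mutated position plus a subset of its
    contacting positions. The size cutoff is an illustrative assumption.
    """
    sub_terms = []
    for k in range(max_contacts + 1):
        # combinations() yields subsets of size k in sorted order
        for subset in combinations(sorted(contacts), k):
            sub_terms.append((mutated_pos, subset))
    return sub_terms


# Hypothetical example: position 8 with three contacting positions
sub_terms = enumerate_sub_terms(8, [5, 12, 40], max_contacts=2)
for center, subset in sub_terms:
    print(center, subset)
```

In this sketch the empty subset corresponds to the mutated position alone, and larger subsets correspond to multi-contact sub-TERMs; each would then be searched against the structural database separately.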

Table 1.

Prediction performance with different strengths of regularization.

Table 2.

Prediction performance under different models.

Fig 2.

The performance of TERM-ΔΔG₂ on S2648.

Predicted and measured ΔΔG_m values are plotted on the X- and Y-axes, respectively. Color represents point cloud density. The least-squares regression line is shown with dashes.
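As a rough illustration of how the dashed line in such a plot can be obtained, the sketch below fits an ordinary least-squares line of measured against predicted values. It also returns the Pearson correlation coefficient, which is an assumption about how agreement might be summarized and is not stated in the caption. The function name and the data values are made up for illustration.

```python
from math import sqrt


def fit_line_and_correlation(predicted, measured):
    """Least-squares fit of measured = slope * predicted + intercept.

    Also returns the Pearson correlation coefficient r.
    """
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predicted, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up values in kcal/mol, not taken from S2648
predicted = [0.2, 0.9, 1.4, 2.1, 3.0]
measured = [0.5, 0.7, 1.9, 2.0, 3.4]
slope, intercept, r = fit_line_and_correlation(predicted, measured)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```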

Fig 3.

The role of multi-contact ensembles in ΔΔG_m prediction, illustrated with the example of 1RIS_AI8A.

(A) and (B) correspond to models TERM-ΔΔG₁ and TERM-ΔΔG₂, respectively. The mutated position is shown in yellow and all its contacting positions (9 in total) are shown in cyan. Values of estimated sEP and pEPs are shown in red and blue, respectively. The experimental ΔΔG_m for the mutation is 3.56 kcal/mol (destabilizing).

Table 3.

The performance of the TERM-based model relative to other published methods.

Fig 4.

The performance of different methods on the S699 set.

Data in each panel are shown in the same manner as in Fig 2, with each panel title indicating the prediction method used.

Fig 5.

Abundance of structural information is critical to prediction performance.

(A) The distribution of ubiquity for mutations in the S2648 set. Quartile boundaries are marked with dashed lines. (B) Prediction performance on the four quartile subgroups, from low ubiquity (group 1) to high ubiquity (group 4). The same representation is used here as in Fig 2.
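The quartile grouping can be sketched as follows, treating ubiquity as a precomputed score per mutation (its definition is not given in the caption). The function name `quartile_groups` and the example values are hypothetical, and the quantile convention used by `statistics.quantiles` may differ from the one used for the figure.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_groups(ubiquity):
    """Assign each value to group 1 (lowest quartile) through 4 (highest)."""
    # quantiles(..., n=4) returns the three interior quartile boundaries
    cuts = quantiles(ubiquity, n=4)
    # bisect_right counts how many boundaries lie at or below the value
    return [bisect_right(cuts, u) + 1 for u in ubiquity]


# Hypothetical ubiquity scores for eight mutations
scores = [1, 2, 3, 4, 5, 6, 7, 8]
groups = quartile_groups(scores)
print(groups)
```

Performance would then be evaluated separately within each group, as in panel (B).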

Fig 6.

Prediction performance increases with the size of the structural database.

The model represented by each curve is indicated in the legend. For each level of subsampling, three samples were generated, with error bars showing the standard deviations among the three trials for each experiment. The functional form used in fitting is shown in the upper-left corner. The numbers on the right side of each curve indicate the corresponding best-fit plateau values (i.e., parameter a).
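The subsampling protocol can be sketched as below: for each subsampling level, three random subsets of the database are drawn and the mean and standard deviation of a score are reported. Everything named here is illustrative: `subsample_trials`, the toy database of class labels, and `toy_score` are hypothetical stand-ins for the real structural database and prediction performance, and the sample standard deviation is an assumed convention. The curve fit is not reproduced because its functional form appears only in the figure.

```python
import random
from statistics import mean, stdev


def subsample_trials(database, fraction, score_fn, n_trials=3, seed=0):
    """Score n_trials random subsets of the given fraction of the database.

    Returns (mean score, sample standard deviation across trials).
    """
    rng = random.Random(seed)
    size = max(1, round(fraction * len(database)))
    # rng.sample draws without replacement, independently for each trial
    scores = [score_fn(rng.sample(database, size)) for _ in range(n_trials)]
    return mean(scores), stdev(scores)


# Toy database: 100 entries, each labeled with one of 10 hypothetical classes
toy_database = [i % 10 for i in range(100)]


def toy_score(subset):
    """Toy stand-in for performance: fraction of classes represented."""
    return len(set(subset)) / 10


for fraction in (0.05, 0.25, 1.0):
    avg, sd = subsample_trials(toy_database, fraction, toy_score)
    print(f"fraction={fraction:.2f} mean={avg:.2f} sd={sd:.2f}")
```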
