Table 1.

Biophysical property channels for protein structure voxels.

Fig 1.

An overview of the ThermoNet computational framework.

(A) Protein structures are treated as if they were 3D images. A 16 Å × 16 Å × 16 Å cubic neighborhood centered at the Cβ atom (red sphere) of the mutated residue (or the Cα atom in the case of a glycine) of an example protein (PDB ID: 1L63) is discretized into a 3D voxel grid at a resolution of 1 Å. Each voxel is represented by a gray dot. (B) Just as an RGB image has three color channels, the 3D voxel grid is parameterized with seven biophysical property channels: hydrophobic, aromatic, hydrogen bond donor, hydrogen bond acceptor, positive ionizable, negative ionizable, and occupancy. The saturation level of each voxel ranges from 0.0 to 1.0, and each voxel is colored accordingly (Methods). (C) To predict the change in thermodynamic stability caused by a given single-point mutation, ThermoNet calls Rosetta to refine the wild-type structure and to create a structural model of the mutant protein. (D) ThermoNet voxelizes the space around the mutation site of both the Rosetta-refined wild-type structure and the corresponding mutant structural model. The 3D voxel grids of the wild-type structure and of the mutant model are each parameterized with the seven channels to create two [16, 16, 16, 7] feature maps. (E) The feature maps are then stacked to create a [16, 16, 16, 14] tensor that serves as input to the trained deep 3D convolutional neural network. The final output of the network is the predicted ΔΔG that the given mutation causes in the wild-type protein structure.
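The grid shapes in panels (A), (D), and (E) can be illustrated with a minimal sketch. It is not the ThermoNet implementation: the function names, the atom input format, and the binary nearest-voxel assignment are assumptions made for illustration, whereas the caption describes a saturation level between 0.0 and 1.0 (Methods). The sketch only shows how a 16 Å box at 1 Å resolution gives a [16, 16, 16, 7] grid and how two such grids stack into [16, 16, 16, 14].

```python
import math

# Channel order follows the caption; the identifier spellings are assumptions.
CHANNELS = ["hydrophobic", "aromatic", "hbond_donor", "hbond_acceptor",
            "positive_ionizable", "negative_ionizable", "occupancy"]
BOX = 16.0           # box edge length in angstroms
RES = 1.0            # voxel edge length in angstroms
N = int(BOX / RES)   # 16 voxels per axis


def voxelize(atoms, center):
    """Return an N x N x N x 7 nested list for one structure.

    atoms: iterable of ((x, y, z), property_names) pairs (hypothetical format).
    center: (x, y, z) of the Cβ (or Cα for glycine) of the mutated residue.
    Simplification: each atom sets its channels to 1.0 in the voxel it falls in.
    """
    grid = [[[[0.0] * len(CHANNELS) for _ in range(N)]
             for _ in range(N)] for _ in range(N)]
    origin = [c - BOX / 2 for c in center]
    for xyz, props in atoms:
        idx = [math.floor((p - o) / RES) for p, o in zip(xyz, origin)]
        if all(0 <= i < N for i in idx):      # skip atoms outside the box
            i, j, k = idx
            for name in set(props) | {"occupancy"}:
                grid[i][j][k][CHANNELS.index(name)] = 1.0
    return grid


def stack(wild_type, mutant):
    """Concatenate two grids along the channel axis (7 + 7 = 14 channels)."""
    return [[[wt_voxel + mt_voxel
              for wt_voxel, mt_voxel in zip(wt_row, mt_row)]
             for wt_row, mt_row in zip(wt_plane, mt_plane)]
            for wt_plane, mt_plane in zip(wild_type, mutant)]


def shape(tensor):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims
```

In this layout, channels 0 to 6 of each stacked voxel come from the wild-type grid and channels 7 to 13 from the mutant grid.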

Fig 2.

Data set curation and identification of shared homology.

(A) Venn diagrams showing the protein-level overlap between the Ssym test set and three widely used training sets for ΔΔG predictors: S2648, VariBench, and Q3421. Numbers in these diagrams indicate protein counts. The upper and lower panels show that S2648 and Q3421 each share 14 identical proteins with Ssym; the middle panel shows that VariBench and Ssym share 11 identical proteins. All three data sets share additional homology with Ssym, which is presented in S3, S4, and S5 Tables, respectively. (B) Creating data sets for robust training and testing of ThermoNet. We started with the Q3421 set of 3421 mutations from 150 proteins. (Numbers in data set names indicate the number of unique mutations the data set contains.) After homology reduction and anti-symmetry data augmentation (Methods), this data curation workflow yields a training set of 3488 mutations with an equal representation of stabilizing and destabilizing changes and reduced homology to the Ssym test set. A separate data set, Q6428, was also created by augmenting the Q3214 data set before homology reduction; it was used to train ThermoNet*.
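The anti-symmetry augmentation in panel (B) can be sketched as follows, assuming it pairs every direct mutation with its reverse mutation and negates the ΔΔG. The `Mutation` record and the function names are hypothetical, and the structural side of the procedure (the reverse mutation starts from a model of the mutant protein) is not represented here. Doubling is consistent with the data set names in the caption: 6428 = 2 × 3214.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one single-point mutation."""
    protein: str
    wild_type: str   # residue before the mutation
    position: int
    mutant: str      # residue after the mutation
    ddg: float       # change in stability, kcal/mol


def reverse(m):
    """Swap the two residues and negate the ΔΔG."""
    return Mutation(m.protein, m.mutant, m.position, m.wild_type, -m.ddg)


def augment(direct):
    """Return the direct mutations followed by their reverse mutations."""
    direct = list(direct)
    return direct + [reverse(m) for m in direct]
```

Because every nonzero ΔΔG appears once with each sign, the augmented set contains equal numbers of stabilizing and destabilizing changes under either sign convention.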

Table 2.

Comparative analysis using the balanced test set Ssym.

Fig 3.

Performance of ThermoNet on the blind test set.

(A) Performance of ThermoNet on predicting ΔΔG for direct mutations. The Pearson correlation coefficient (r) between predicted values and experimentally determined values is 0.47, and the root-mean-square error (σ) of predicted values from experimentally determined values is 1.56 kcal/mol. The dots are colored on a gradient from blue to red such that blue represents the most accurate prediction and red the least accurate prediction. (B) Cumulative distribution of ThermoNet prediction error on direct mutations. (C) Performance of ThermoNet on predicting ΔΔG for the reverse mutations (r = 0.47, σ = 1.55 kcal/mol). (D) Cumulative distribution of ThermoNet prediction error on reverse mutations. (E) Direct versus reverse ΔΔG values of all the mutations in the blind test set predicted by ThermoNet. A perfectly unbiased predictor would give r = −1 and 〈δ〉 = 0 kcal/mol. ThermoNet successfully reduces prediction bias with r = −0.96 and 〈δ〉 = −0.01 kcal/mol. (F) Distribution of ThermoNet prediction bias.
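The quantities reported in this caption can be computed with a short sketch. It assumes that the per-mutation bias δ is the sum of the predicted direct and reverse ΔΔG values, which matches the statement that an unbiased predictor gives r = −1 and 〈δ〉 = 0 but is not defined in the caption itself. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(predicted, measured):
    """Root-mean-square error (σ) of predictions against measured values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def bias(pred_direct, pred_reverse):
    """Return (r between direct and reverse predictions, mean δ).

    Assumption: δ = predicted ΔΔG(direct) + predicted ΔΔG(reverse).
    """
    deltas = [d + r for d, r in zip(pred_direct, pred_reverse)]
    return pearson(pred_direct, pred_reverse), sum(deltas) / len(deltas)
```

For perfectly anti-symmetric predictions, where each reverse value is the negative of the direct value, `bias` returns a correlation of −1 (up to floating-point error) and a mean δ of 0.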

Fig 4.

ThermoNet predictions of ΔΔG for mutations in the p53 tumor suppressor protein and myoglobin.

(A) Performance of ThermoNet on predicting ΔΔG for the direct mutations in p53 (r = 0.45, σ = 2.01 kcal/mol). (B) Performance of ThermoNet on predicting ΔΔG for the reverse mutations in p53 (r = 0.56, σ = 1.92 kcal/mol). (C) Direct versus reverse ΔΔG values of all p53 mutations predicted by ThermoNet (r_dir-rev = −0.93 and 〈δ〉 = −0.04 kcal/mol). (D) Performance of ThermoNet on predicting ΔΔG for the direct mutations in myoglobin (r = 0.38, σ = 1.16 kcal/mol). (E) Performance of ThermoNet on predicting ΔΔG for the reverse mutations in myoglobin (r = 0.37, σ = 1.18 kcal/mol). (F) Direct versus reverse ΔΔG values of all myoglobin mutations predicted by ThermoNet, with a Pearson correlation of r_dir-rev = −0.97 and 〈δ〉 = −0.02 kcal/mol. The dots are colored on a gradient from blue to red such that blue represents the most accurate prediction and red the least accurate prediction.

Fig 5.

Predicted ΔΔG distributions of ClinVar missense variants.

(A) The overall ΔΔG distributions of ClinVar variants predicted by ThermoNet and FoldX. ThermoNet’s predictions are consistent with the expected range based on experimentally determined ΔΔG values (−5 kcal/mol to +5 kcal/mol). In contrast, more than 15% of ΔΔGs predicted by FoldX are outside the expected range. (B) The ΔΔG distributions of ClinVar benign variants predicted by ThermoNet and FoldX. (C) The ΔΔG distributions of ClinVar pathogenic variants predicted by ThermoNet and FoldX. The ΔΔGs of 80.2% of benign variants predicted by ThermoNet fall within the neutral zone (−0.5 to +0.5 kcal/mol, region between dashed lines), in which variants are not expected to influence fitness. FoldX predicted only 39.7% of benign variants to be in the neutral zone. Further, the ΔΔGs predicted by ThermoNet suggest that pathogenic variants are nearly as likely to be stabilizing (47.3%) as destabilizing (52.7%). In contrast, FoldX predicted that 83.2% of pathogenic variants are destabilizing. Variants for which the FoldX ΔΔG is > 20 kcal/mol are omitted for clarity. Percentages represent the fractions of variants whose ΔΔGs are predicted to be in the neutral zone.
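The zone fractions quoted above reduce to counting predictions inside an interval. The sketch below is illustrative only: the function names are hypothetical, and treating the interval bounds as inclusive is an assumption, since the caption does not say how boundary values are handled.

```python
def fraction_in_zone(ddgs, low=-0.5, high=0.5):
    """Fraction of predicted ΔΔG values (kcal/mol) with low <= value <= high.

    The defaults correspond to the neutral zone named in the caption.
    """
    ddgs = list(ddgs)
    inside = sum(1 for value in ddgs if low <= value <= high)
    return inside / len(ddgs)


def fraction_outside_expected(ddgs, limit=5.0):
    """Fraction of predictions outside the expected range of -limit to +limit."""
    return 1.0 - fraction_in_zone(ddgs, -limit, limit)
```

Applied to the predictions for benign variants, `fraction_in_zone` would give the kind of percentage shown in panel (B); applied to all variants, `fraction_outside_expected` would give the kind of out-of-range share described for panel (A).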
