Figure 1.

Off-rate estimation using hotspot energies and organization.

In this work we generate a set of hotspot descriptors for characterizing off-rate changes upon mutation. The hotspot descriptors use single-point alanine ΔΔGs, taken from computational alanine scans produced by hotspot prediction algorithms, to predict changes in off-rate upon single-point and multi-point mutations to all residue types. First, for a given wild-type complex structure, the interface is scanned for hotspots using a hotspot prediction algorithm, and the single-point alanine ΔΔGs from the scan are extracted and stored. Next, the structural mutation in question is applied and the mutated interface is re-scanned for hotspots, which generates a new set of single-point alanine ΔΔGs for the mutated interface. Note that the mutation may also affect the hotspot energies of neighboring residues that are not themselves mutated. The two sets of ΔΔGs are then used to generate a set of hotspot descriptors, where the final value of each descriptor is the change in the descriptor's value from mutant to wild-type. For example, in the case of Int_HS_Energy, the final value is the change in the sum of the ΔΔGs of all hotspot residues, pre- and post-mutation. Hotspots are also categorized into core, rim, support and hotregions. This enables us to investigate and account for cooperative effects within hotregions and to identify differences in the regions critical for stability across complexes of different size and interface area.
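
The Int_HS_Energy example above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the hotspot cutoff of 2.0 kcal/mol, the sign convention (mutant minus wild type), the residue labels and all ΔΔG values are assumptions made for the example.

```python
from typing import Dict

# ASSUMPTION: a residue counts as a hotspot when its predicted alanine
# ddG is at least this cutoff (kcal/mol). The caption gives no cutoff.
HOTSPOT_CUTOFF = 2.0


def hotspot_energy_sum(alanine_ddgs: Dict[str, float],
                       cutoff: float = HOTSPOT_CUTOFF) -> float:
    """Sum the single-point alanine ddGs of all residues classed as hotspots."""
    return sum(ddg for ddg in alanine_ddgs.values() if ddg >= cutoff)


def int_hs_energy(wild_type: Dict[str, float],
                  mutant: Dict[str, float],
                  cutoff: float = HOTSPOT_CUTOFF) -> float:
    """Change in summed hotspot energy.

    ASSUMPTION: the change is taken as mutant minus wild type.
    """
    return hotspot_energy_sum(mutant, cutoff) - hotspot_energy_sum(wild_type, cutoff)


# Invented alanine-scan output for one interface, before and after a mutation.
wild_type_scan = {"A:TYR33": 2.6, "A:ARG59": 3.1, "B:ASP39": 2.2, "B:GLU76": 0.7}
mutant_scan = {"A:TYR33": 2.4, "A:ARG59": 1.2, "B:ASP39": 2.3, "B:GLU76": 0.6}

delta = int_hs_energy(wild_type_scan, mutant_scan)
print(f"Int_HS_Energy = {delta:.2f} kcal/mol")
```

In this invented case the mutation removes A:ARG59 from the hotspot set and slightly shifts its neighbors, so the descriptor reflects changes at residues that were not themselves mutated.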

Table 1.

Summary of hotspot descriptors.

Table 2.

Pearson's Correlation Coefficient (PCC) of hotspot descriptors with experimental Δlog10(koff) for the 713 off-rate mutations in SKEMPI.
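
For reference, the PCC between a descriptor and experimental Δlog10(koff) can be computed as below. This is a generic standard-library sketch of Pearson's formula; the function name and the sample values are illustrative assumptions, not data from Table 2.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented values: one descriptor and the matching experimental changes.
descriptor_values = [-3.2, -0.5, 0.1, 1.4]
delta_log_koff = [1.8, 0.4, -0.2, -1.1]

r = pearson_cc(descriptor_values, delta_log_koff)
print(f"PCC = {r:.3f}")
```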

Figure 2.

Relationship of off-rate changes upon mutation with change in binding free energy and change in interface hotspot energy.

(A) The relationship between experimental values for Δlog10(koff) and ΔΔG for all 713 mutations in the SKEMPI off-rate dataset. (B) The relationship between changes in interface hotspot energies, as predicted by the RFSpot_KFC2 hotspot predictor, and Δlog10(koff) for all 713 mutations in the SKEMPI off-rate dataset. Note that 50% of the off-rate mutants in this dataset involve mutations to non-alanine residues and include multi-point mutants. Int_HS_Energy nevertheless characterizes these changes using single-point alanine ΔΔGs, as highlighted in Figure 1.

Table 3.

Relationship between experimental ΔΔG, Δlog10(koff), Δlog10(kon) and change in interface hotspot energy (Int_HS_Energy) for 713 mutations in SKEMPI.

Figure 3.

Hotspot and molecular descriptors for estimating change in off-rate.

The hotspot descriptors designed in this work are benchmarked against a set of 110 molecular descriptors, both in their ability to estimate Δlog10(koff) and in their ability to detect stabilizing mutations with Δlog10(koff) <−1. The performance measures shown here enable us to assess the raw predictive power of the descriptors independent of any learning model. Green and black bars highlight descriptors from the hotspot and molecular descriptor sets respectively. (A) Comparison of the distribution of the absolute PCC values for the hotspot descriptors designed in this work against that for the molecular descriptors. The related list of descriptor names and their respective PCCs is found in Text S5. (B) Top 10 hotspot descriptors and top 10 molecular descriptors according to absolute PCC with experimental Δlog10(koff). (C) Mann-Whitney U test rankings for all descriptors, where values are ranked according to −log10(pval) and represent the ability of each descriptor to discriminate stabilizing mutants (Δlog10(koff) <−1) from neutral to destabilizing mutants (Δlog10(koff) >0). This dataset, referred to as CDS1, contains 31 stabilizing mutants and 503 neutral to destabilizing mutants. (D) Matthews Correlation Coefficient (MCC) rankings for all descriptors on the same dataset. (E) and (F) are identical to (C) and (D) except that results are for off-rates that satisfy |Δlog10(koff)| >1. This dataset, referred to as CDS2, contains 31 stabilizing mutants and 213 destabilizing mutants.
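
The two classification datasets and the MCC can be illustrated as follows. This is a sketch under stated assumptions: the function names, the rule that a descriptor value below a fixed cutoff predicts "stabilizing", the cutoff itself and all sample values are hypothetical. Only the Δlog10(koff) thresholds come from the caption.

```python
import math
from typing import Callable, List, Optional, Sequence


def cds1_label(dlog_koff: float) -> Optional[bool]:
    """CDS1: True = stabilizing (< -1), False = neutral to destabilizing (> 0)."""
    if dlog_koff < -1:
        return True
    if dlog_koff > 0:
        return False
    return None  # falls in neither class, so it is left out


def cds2_label(dlog_koff: float) -> Optional[bool]:
    """CDS2: only mutations with |dlog10(koff)| > 1 are kept."""
    if dlog_koff < -1:
        return True
    if dlog_koff > 1:
        return False
    return None


def matthews_cc(truth: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Matthews correlation coefficient from two boolean label lists."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def descriptor_mcc(dlog_koff: Sequence[float], descriptor: Sequence[float],
                   label_fn: Callable[[float], Optional[bool]],
                   cutoff: float) -> float:
    """MCC of a descriptor, predicting 'stabilizing' when value < cutoff.

    ASSUMPTION: the fixed-cutoff rule is hypothetical, for illustration only.
    """
    truth: List[bool] = []
    predicted: List[bool] = []
    for observed, value in zip(dlog_koff, descriptor):
        label = label_fn(observed)
        if label is None:
            continue
        truth.append(label)
        predicted.append(value < cutoff)
    return matthews_cc(truth, predicted)


# Invented experimental changes and descriptor values for five mutations.
observed = [-1.6, -0.4, 0.3, 1.2, 2.0]
values = [-2.1, 0.2, 0.4, -0.1, 1.5]
print(descriptor_mcc(observed, values, cds1_label, cutoff=-1.0))
print(descriptor_mcc(observed, values, cds2_label, cutoff=-1.0))
```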

Figure 4.

Hotspot and molecular descriptor scatter plots.

The relationship between experimental values for Δlog10(koff) and (A) the hotspot descriptor showing the highest correlation with Δlog10(koff) (SuppHSEnergyKFC2a – changes in hotspot energies in the support region as predicted by KFC2a [30]), (B) the molecular descriptor showing the highest correlation with Δlog10(koff) (AP_MPS – the DARS atomic potential [54]), (C) the top performing hotspot descriptor for the detection of stabilizing mutants (HSEner_PosCoopRFSpot – changes in hotspot energies on accounting for positive cooperativity in hotregions) and (D) the top performing molecular descriptor for the detection of stabilizing mutants (CP_TB – coarse-grained protein-protein docking potential).

Figure 5.

Off-rate prediction models using hotspot and molecular descriptors.

A number of RF regression and classification models are built using different sets of hotspot and molecular descriptors. The prediction accuracy is also assessed on subsets of mutations defined as data regions. The data regions enable us to identify classes of mutations that are consistently harder to characterize, as well as data set biases and prediction patterns. (A) PCC values for off-rate model predictions with Δlog10(koff). Models use hotspot descriptors, or a combination of hotspot and molecular descriptors. The different methods indicate the hotspot prediction method from which the hotspot descriptors were generated. (B) Data region analysis of predictions from each model. The predictions from each model are divided into the respective categories shown on the x-axis, and the values in the matrix show the PCC achieved by the given model for the given data region. (C) MCC values for off-rate classifier model predictions for classification data sets CDS1 in blue and CDS2 in red. CDS1 includes neutral mutations whereas CDS2 excludes neutral mutations; hence the detection of stabilizing mutants is enhanced in the latter, though results for CDS1 are more relevant for interface design scenarios. (D–F) are similar to (A–C) except that off-rate prediction models using subsets of molecular descriptors are investigated. CP – Coarse-Grain Potentials; AP – Atomic-Based Potentials; CP-AP – All Statistical Potentials; PB – Physics-Based Energy Terms. As a benchmark comparison, results for RFSpot_KFC2Off-Rate (best performing off-rate predictor using hotspot descriptors) and RFSpot_KFC2Off-Rate+MOL (best performing off-rate predictor using hotspot and molecular descriptors) are also included in (D–F).

Figure 6.

Off-rate prediction model scatter plots.

The relationship between experimental values for Δlog10(koff) and predicted values for Δlog10(koff) with (A) RFSpot_KFC2Off-Rate+MOL, the best performing off-rate prediction model combining hotspot and molecular descriptors. Hotspot descriptors for this model are generated using the RFSpot_KFC2 hotspot prediction algorithm. (B) RFSpot_KFC2Off-Rate, the best performing off-rate prediction model using only hotspot descriptors. Hotspot descriptors for this model are again generated using the RFSpot_KFC2 hotspot prediction algorithm. (C) MolecularOff-Rate, the off-rate prediction model using molecular descriptors. The addition of hotspot descriptors, as observed in (A), to the molecular descriptor model shown in (C) notably improves the prediction of stabilizing mutants, which are all found in the lower left quadrant for RFSpot_KFC2Off-Rate+MOL.

Figure 7.

Detection of rare complex stabilizing mutations using off-rate classification models.

(A) Ranked list of the 31 stabilizing mutations (Δlog10(koff) <−1) in the SKEMPI off-rate dataset. The list is ranked according to the number of off-rate classification models that detect the mutation in question as stabilizing. Detections per model (B) are highlighted in white, and non-detections are highlighted in black. The lower portion of (A) is dominated by single-point mutations to alanine residues, which suggests that the stabilizing effects of these mutations, as opposed to their more common neutral or destabilizing effects, are much harder to characterize.

Table 4.

Performance of off-rate classification models for the detection of stabilizing mutations.

Figure 8.

Specialized feature selection models and descriptor-data region networks.

Feature selection models using a genetic algorithm (GA-FS) are run for different data regions of the off-rate dataset, for which both linear (using Linear Regression) and non-linear (using SVM regression) models are investigated. For each data region, the GA-FS is run 50 times, each run designed to find an optimal feature set of size 5. The initial features available in the population are the 110 molecular descriptors and the 16 hotspot descriptors generated by RFSpot_KFC2. An inner cross-validation loop is used as a scoring function for driving the feature selection, whereas an outer cross-validation loop is used to assess the model prediction accuracy. (A) and (B) show the importance of the most selected features for each data region. The features shown are those that are part of the final model for any data region in more than 50% of the GA-FS runs, and the color bar displays this percentage. The features on the y-axis are ordered as: coarse-grain potentials, atomic-based potentials, physics-based energy terms and hotspot descriptors. (C) and (D) are descriptor-data region networks for (A) and (B) respectively. Circled nodes represent data regions and square nodes represent features; therefore, only edges between circle and square nodes are present. An edge is present if the feature is in the final model for the given data region in more than 50% of the GA-FS runs (dotted edge), between 70–90% of the GA-FS runs (normal edge), or more than 90% of the GA-FS runs (bold edge). Node colors denote coarse-grain potentials (blue), atomic-based potentials (yellow), physics-based energy terms (green), hotspot descriptors (pink) and data regions (gray). From the descriptor-data region networks, descriptors highly specific to certain classes of off-rate mutations can be observed. Conversely, as in the case of the GA-FS (SVM) data region network, a cluster of broadly predictive hotspot descriptors is also shown. (E) Mean PCC of the optimal models found by the GA-FS runs for each data region. For comparison, PCC results on the data regions are also shown for RFSpot_KFC2Off-Rate+MOL. Note that the latter model is trained on all 713 off-rate mutations, and its predictions are separated into data regions after prediction and analyzed for their PCC. This effectively compares the predictions of specialized models against a one-fits-all model. Though we find no evidence that specialized models perform better than a one-fits-all model, certain subsets of mutations, such as those at the rim regions, show notable improvements when a specialized model is employed.
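
The edge rule for the descriptor-data region networks can be written as a small lookup. This is a sketch only: the function name, the treatment of values exactly on a boundary, and the feature names and selection counts in the example are assumptions. Only the 50%, 70–90% and 90% bands and the 50 GA-FS runs come from the caption.

```python
from typing import Optional

N_RUNS = 50  # number of GA-FS runs per data region, as stated in the caption


def edge_style(selection_fraction: float) -> Optional[str]:
    """Map the fraction of GA-FS runs that selected a feature to an edge style.

    ASSUMPTION: exactly 70% and exactly 90% are treated as 'normal', and
    exactly 50% yields no edge. The caption does not specify boundary cases.
    """
    if not 0.0 <= selection_fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if selection_fraction > 0.9:
        return "bold"
    if selection_fraction >= 0.7:
        return "normal"
    if selection_fraction > 0.5:
        return "dotted"
    return None  # selected too rarely to draw an edge


# Hypothetical selection counts for one data region.
selected_in_runs = {"feature_a": 48, "feature_b": 38, "feature_c": 30, "feature_d": 20}
for name, count in selected_in_runs.items():
    print(name, edge_style(count / N_RUNS))
```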

Figure 9.

Stability regions, interface-area and complex-size.

The changes in hotspot energies upon mutation are assessed at three interface regions, which enables us to explore changes in the distribution of stability for complexes of different size and interface area. CORE, RIM and SUPP represent the PCCs of CoreHSEnergy, RimHSEnergy and SuppHSEnergy with Δlog10(koff), averaged over the 6 hotspot prediction algorithms. (A) PCCs for mutants on complexes with interface area >1600 Å² (LIA). (B) PCCs for mutants on complexes with interface area <1600 Å² (SIA). (C) PCCs for mutants on complexes with size <500 residues (SCS). (D) PCCs for mutants on complexes with size >500 residues (LCS). (E) LIA-SCS, (F) LIA-LCS, (G) SIA-SCS, (H) SIA-LCS. (I) Scatter plot of complex size vs. interface area for all complexes in the off-rate mutant dataset. Here it is observed that complex stability is distributed across all three regions for small complexes (C, E and G), whereas the core becomes a localized region of stability for large complexes (D, F and H). On analysis of the interface-area vs. complex-size subsets (E–H), the distribution of stability regions is affected primarily by complex size, irrespective of interface area.
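
The subset labels used in panels (A–H) follow directly from the two thresholds. The sketch below assigns a complex to its combined subset; the function name, the sample complexes, and the handling of values exactly at 1600 Å² or 500 residues (placed in the small class here) are assumptions.

```python
AREA_THRESHOLD_A2 = 1600.0   # interface area threshold from the caption
SIZE_THRESHOLD_RESIDUES = 500  # complex size threshold from the caption


def complex_subset(interface_area_a2: float, n_residues: int) -> str:
    """Return the combined subset label, e.g. 'LIA-SCS'.

    ASSUMPTION: values exactly on a threshold go to the small class.
    """
    area_label = "LIA" if interface_area_a2 > AREA_THRESHOLD_A2 else "SIA"
    size_label = "LCS" if n_residues > SIZE_THRESHOLD_RESIDUES else "SCS"
    return f"{area_label}-{size_label}"


# Hypothetical complexes: (interface area in square angstroms, residue count).
examples = [(2100.0, 320), (1850.0, 740), (1200.0, 410), (1450.0, 980)]
for area, size in examples:
    print(area, size, complex_subset(area, size))
```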

Figure 10.

Effects of cooperativity on effective energetic contribution of hotregions.

The summation of the single-point alanine ΔΔGs of a hotregion may underestimate or overestimate its contribution if negative or positive cooperative effects, respectively, are at play. In this work, in order to account for potential cooperative effects, the hotspot descriptors HSEner_PosCoop and HSEner_NegCoop apply linearly decreasing and linearly increasing weights, respectively, to the single-point alanine ΔΔGs within a hotregion. Int_HS_Energy, in turn, assumes that the hotspot residues within a hotregion are additive and does not apply any weights. Here, the effects of accounting for cooperative or additive behavior on the predicted hotspot and hotregion energies are shown for all mutated complexes used in this work. (A) The mean hotspot energies for hotregion sizes of 1 to 8 hotspot residues. Each column shows the predictions of a different hotspot predictor. The first row (blue) shows the raw mean hotspot energies, which essentially assumes that all hotspots are additive within a hotregion. The second row (red) assumes negative cooperativity within hotregions; to account for this, a linearly increasing weight is applied to the hotspot energies according to the size of the hotregion they are in (see Materials and Methods). The third row (green) assumes positive cooperativity within hotregions, and a linearly decreasing weight is applied to the hotspot energies according to the size of the hotregion. (B) is similar to (A), but the values are now the mean of the total hotregion energy for the given size. Effectively, the additive assumption results in hotregions contributing in a linearly increasing manner according to their size, the negative cooperativity assumption results in hotregions contributing in an increasing, exponential-like manner as they grow in size, and the positive cooperativity assumption results in hotregions reaching a maximum contribution at a hotregion size of around 5, with their contribution decreasing beyond that size.
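
The three assumptions can be compared with a toy weighting scheme. The exact weights are defined in the Materials and Methods, which are not reproduced here, so the linear form `1 ± slope × (size − 1)`, the slope of 0.1, the function names and the ΔΔG values below are all hypothetical and chosen only to show the qualitative behavior.

```python
from typing import Sequence

# HYPOTHETICAL slope of the linear weight; not taken from the source.
SLOPE = 0.1


def region_weight(region_size: int, mode: str, slope: float = SLOPE) -> float:
    """Weight applied to every hotspot ddG in a hotregion of the given size."""
    if region_size < 1:
        raise ValueError("a hotregion needs at least one hotspot")
    step = slope * (region_size - 1)
    if mode == "additive":
        return 1.0                   # no weighting (Int_HS_Energy-style)
    if mode == "negative":
        return 1.0 + step            # linearly increasing weight
    if mode == "positive":
        return max(0.0, 1.0 - step)  # linearly decreasing weight
    raise ValueError(f"unknown mode: {mode}")


def hotregion_energy(ddgs: Sequence[float], mode: str, slope: float = SLOPE) -> float:
    """Total weighted energy of one hotregion under the chosen assumption."""
    return region_weight(len(ddgs), mode, slope) * sum(ddgs)


# Invented hotregion of three hotspots (ddG in kcal/mol).
region = [2.0, 2.5, 3.0]
for mode in ("additive", "negative", "positive"):
    print(mode, round(hotregion_energy(region, mode), 2))
```

With any positive slope, a single isolated hotspot is unaffected, while larger hotregions contribute more than the additive sum under the negative cooperativity assumption and less under the positive one.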

Figure 11.

Effects of conformational changes and off-rate prediction.

Predictions of the original 13 regression models developed for off-rate prediction. The predictions are assessed separately (PCC with Δlog10(koff)) for mutations on complexes that undergo significant backbone conformational changes of I_RMSD >1.5 Å (dark green), notable conformational changes of I_RMSD >1 Å (light green) and little to no conformational change of I_RMSD <1 Å (dark blue). Prediction accuracy is directly related to the magnitude of conformational change and becomes highly dependent on the model at higher levels of conformational change. I_RMSD values were extracted from our previous work on the construction of a protein-protein affinity database [66].
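
Grouping mutations by the conformational change of their complex can be sketched as below. The function names and sample values are illustrative, and two points are assumptions: the three classes are treated as disjoint (so "notable" covers 1 Å < I_RMSD ≤ 1.5 Å), and a value of exactly 1 Å is placed in the lowest class.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def conformational_class(i_rmsd: float) -> str:
    """Assign a complex to one of the three I_RMSD classes from the caption.

    ASSUMPTION: classes are disjoint and exactly 1.0 A counts as 'little_or_none'.
    """
    if i_rmsd > 1.5:
        return "significant"
    if i_rmsd > 1.0:
        return "notable"
    return "little_or_none"


def group_by_class(
    records: Sequence[Tuple[float, float, float]]
) -> Dict[str, List[Tuple[float, float]]]:
    """Group (i_rmsd, experimental, predicted) records into per-class lists
    of (experimental, predicted) pairs, ready for a per-class PCC."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for i_rmsd, experimental, predicted in records:
        groups[conformational_class(i_rmsd)].append((experimental, predicted))
    return dict(groups)


# Invented records: (I_RMSD in angstroms, experimental and predicted dlog10(koff)).
records = [(0.4, 1.2, 1.0), (0.8, -0.3, 0.1), (1.2, 0.9, 0.2), (2.3, 1.5, -0.4)]
for label, pairs in group_by_class(records).items():
    print(label, pairs)
```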
