Kinetic Characterization of 100 Glycoside Hydrolase Mutants Enables the Discovery of Structural Features Correlated with Kinetic Constants

doi:10.1371/journal.pone.0147596

Fig 1.

Structure and catalyzed reaction of BglB.

(A) Structure of BglB in complex with the modeled p-nitrophenyl-β-D-glucoside (pNPG) used for design. Alpha carbons of the mutated residues are shown as blue spheres. The image was drawn with PyMOL [16]. (B) The BglB-catalyzed reaction on pNPG used to evaluate the kinetic constants of the designed mutants.

Fig 2.

Log scale relative kinetic constants of 100 BglB mutants.

The heatmap depicts the effect of each mutation on each kinetic constant relative to native BglB, which is normalized to 0. As indicated in the color legend, gold denotes a higher value and blue a lower value. The metric 1/K_M is used so that a higher value consistently corresponds to a “better” kinetic constant (assuming a lower K_M is better) when evaluating k_cat, k_cat/K_M, and K_M. If a kinetic constant was not measurable, an X is shown in the box. Proteins that were expressed as soluble protein with a final purification concentration of >0.1 mg/mL and validated by SDS-PAGE are labeled with a black box in the first column. Those below our limit of detection of 0.1 mg/mL are labeled with an empty box. Values are on a log scale and the ranges are as follows: 10–11,000 min^-1 (k_cat), 0.6–85 mM (K_M), and 10–560,000 M^-1 min^-1 (k_cat/K_M), with wild type constants of 880 ± 10 min^-1, 5.0 ± 0.2 mM, and 171,000 ± 8000 M^-1 min^-1 for k_cat, K_M, and k_cat/K_M, respectively. A full table of kinetic constants and substrate versus velocity curves for each mutant are provided in S1 Table and S3 Fig.
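
The normalization described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the wild type constants are the ones quoted in the caption, while the function names, the dictionary keys, and the use of None for a constant that was not measurable are assumptions made for this example.

```python
import math

# Wild type BglB constants quoted in the caption.
WT_KCAT = 880.0              # min^-1
WT_KM = 5.0                  # mM
WT_KCAT_OVER_KM = 171000.0   # M^-1 min^-1


def log_relative(mutant_value, wild_type_value):
    """log10 of mutant / wild type: 0 means unchanged, positive means higher."""
    return math.log10(mutant_value / wild_type_value)


def heatmap_row(kcat, km, kcat_over_km):
    """Return the three heatmap values for one mutant.

    None (a hypothetical convention) marks a constant that was not measurable,
    which the figure draws as an X.
    """
    def rel(value, wild_type, invert=False):
        if value is None:
            return None
        ratio = log_relative(value, wild_type)
        # log10((1/K_M) / (1/K_M,wt)) equals -log10(K_M / K_M,wt),
        # so a lower K_M gives a positive ("better") value.
        return -ratio if invert else ratio

    return {
        "kcat": rel(kcat, WT_KCAT),
        "1/KM": rel(km, WT_KM, invert=True),
        "kcat/KM": rel(kcat_over_km, WT_KCAT_OVER_KM),
    }
```

With this convention the wild type enzyme maps to 0 in every column, and a hypothetical mutant with a tenfold lower K_M maps to +1 in the 1/K_M column.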

Fig 3.

Active site model and conservation analysis of BglB.

(A) Docked model of pNPG in the active site of BglB showing established catalytic residues (navy) and a selection of the mutated residues (gold). A multiple sequence alignment of the 1,554 family 1 glycoside hydrolases in the Pfam database was made, and sequence logos are shown for (B) selected regions around specific residues discussed in the text and (C) the entire BglB coding sequence. The height of each amino acid letter indicates the sequence conservation at that position.
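
As an illustration of how letter heights in a sequence logo can encode conservation, the sketch below uses the common information-content convention (column information = log2(20) minus the Shannon entropy, with each letter scaled by its frequency). The caption does not state which logo convention or tool was used, so this formula, the function name, and the assumption of a gap-free alignment column are all illustrative choices.

```python
import math
from collections import Counter


def logo_heights(column, alphabet_size=20):
    """Letter heights (in bits) for one gap-free alignment column.

    Assumed convention: information = log2(alphabet_size) - entropy,
    and each residue's height is its frequency times that information.
    No small-sample correction is applied.
    """
    counts = Counter(column)
    total = sum(counts.values())
    freqs = {aa: c / total for aa, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    information = math.log2(alphabet_size) - entropy
    return {aa: f * information for aa, f in freqs.items()}
```

Under this convention a fully conserved position reaches the maximum height of log2(20) bits, and a position split evenly between two residues gives two shorter letters of equal height.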

Fig 4.

Correlation between machine learning predictions and experimentally-determined kinetic constants.

Top panels: predicted versus experimentally measured values for the kinetic constants k_cat/K_M (A), k_cat (B), and 1/K_M (C). All values are relative to the wild type enzyme and on a log scale. The standard deviation (error bars) of each predicted value is calculated from the predictions obtained by 1000-fold cross validation for that point. The red line is a linear regression and has been added for visualization purposes. Bottom panels: histograms of the experimentally determined values in the data set (90, 80, and 80 samples for k_cat/K_M, k_cat, and K_M, respectively), along with the residual errors (scatter plot) between predicted and measured kinetic values.
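
The quantities plotted in these panels can be sketched as below. This is a minimal example under stated assumptions, not the authors' pipeline: it assumes each mutant has a list of repeated predictions (for instance, one per cross-validation round), defines the residual as predicted minus measured (the caption does not give the sign), and uses hypothetical function and key names.

```python
import statistics


def summarize_predictions(predictions_by_mutant, measured_by_mutant):
    """Per mutant: mean prediction, its standard deviation, and the residual."""
    summary = {}
    for name, preds in predictions_by_mutant.items():
        mean_pred = statistics.mean(preds)
        summary[name] = {
            "predicted": mean_pred,
            # Sample standard deviation across repeated predictions (error bar).
            "error_bar": statistics.stdev(preds),
            # Assumed sign convention: predicted minus measured.
            "residual": mean_pred - measured_by_mutant[name],
        }
    return summary


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line through (xs, ys)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

Passing the measured values as xs and the mean predictions as ys to least_squares_line gives a line comparable to the red line in the top panels.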

Table 1.

Most informative structural features predicting each kinetic constant.

For each mutant, 10 out of 100 models were selected based on the lowest total system energy. Fifty-nine structural features were calculated for the selected models, and the most informative features were selected using a constrained regularization technique (elastic net with bagging; see Methods). The table contains the features that were assigned non-zero weights during training (9 for k_cat/K_M, 8 for k_cat, 10 for K_M). The weights are multiplied by a normalized form of the feature value (not shown), so their sign can indicate either a positive or a negative relationship. For example, a negative weight for hydrogen bonding is consistent with a positive correlation with hydrogen bonding, because a smaller value indicates that more hydrogen bonding is occurring. Conversely, a positive weight for packing indicates a positive correlation, since a larger value indicates a system with fewer voids. The relative contribution of each feature in determining the kinetic constant is given as a normalized weight (columns 1–3). Column 4 provides a description of each feature, and columns 5 and 6 show the range of observed values in the training dataset. The full feature table is available in S2 Table. ns = feature not selected by the algorithm.
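
To make the feature selection step concrete, the following is a small pure-Python sketch of an elastic net fitted by coordinate descent and averaged over bootstrap resamples (bagging). It is not the authors' implementation, which is described in Methods. The penalty values (alpha, l1_ratio), the number of bags, the averaging of weights across bags, and all function names are assumptions, and the objective follows the convention used by scikit-learn's ElasticNet: (1/(2n))·||y − Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||².

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(rows, targets, alpha=0.1, l1_ratio=0.5, sweeps=200):
    """Coordinate descent on mean-centered data; returns one weight per feature."""
    n, p = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    y_mean = sum(targets) / n
    residual = [t - y_mean for t in targets]  # residual while all weights are 0
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            z = sum(x[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x[i][j] * (residual[i] + x[i][j] * weights[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            delta = new_w - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= x[i][j] * delta
                weights[j] = new_w
    return weights


def bagged_elastic_net(rows, targets, n_bags=25, seed=0, **kwargs):
    """Average elastic net weights over bootstrap resamples of the data."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    totals = [0.0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = fit_elastic_net([rows[i] for i in idx], [targets[i] for i in idx], **kwargs)
        totals = [t + wj for t, wj in zip(totals, w)]
    return [t / n_bags for t in totals]
```

Here each row would hold the normalized feature values for one mutant and each target the log-scale relative kinetic constant. Features whose weights are driven to zero by the L1 part of the penalty correspond to the "ns" entries in the table.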
