Fig 1.
Structure and catalyzed reaction of BglB.
(A) Structure of BglB in complex with the modeled p-nitrophenyl-β-D-glucoside (pNPG) used for design. Alpha carbons of the mutated residues are shown as blue spheres. The image was drawn with PyMOL [16]. (B) The BglB-catalyzed reaction on pNPG used to evaluate the kinetic constants of designed mutants.
Fig 2.
Log scale relative kinetic constants of 100 BglB mutants.
The heatmap depicts the effect of each mutation on each kinetic constant relative to native BglB, with the native value normalized to 0. As indicated in the color legend, gold denotes a higher value and blue a lower value. The metric 1/KM is used so that a higher value consistently corresponds to a “better” kinetic constant (assuming a lower KM is better) when evaluating kcat, kcat/KM, and KM. If a kinetic constant was not measurable, an X is shown in the box. Proteins that were expressed as soluble protein with a final purified concentration of >0.1 mg/mL and validated by SDS-PAGE are labeled with a black box in the first column. Those below our limit of detection of 0.1 mg/mL are labeled with an empty box. Values are on a log scale and the ranges are as follows: 10–11,000 min⁻¹ (kcat), 0.6–85 mM (KM), and 10–560,000 M⁻¹ min⁻¹ (kcat/KM), with wild type constants of 880 ± 10 min⁻¹, 5.0 ± 0.2 mM, and 171,000 ± 8000 M⁻¹ min⁻¹ for kcat, KM, and kcat/KM, respectively. A full table of kinetic constants and substrate versus velocity curves for each mutant are provided in S1 Table and S3 Fig.
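As a minimal sketch of the normalization described in this caption, the snippet below computes the log-scale relative value of each kinetic constant for one mutant, using the wild type constants quoted above. The base-10 logarithm, the helper names, and the mutant values are assumptions made for illustration; they are not taken from the data set.

```python
import math

# Wild type constants quoted in the caption.
WT_KCAT = 880.0          # min^-1
WT_KM = 5.0              # mM
WT_KCAT_KM = 171_000.0   # M^-1 min^-1


def log_relative(mutant_value, wild_type_value):
    """Return log10(mutant / wild type); 0 means identical to wild type.

    Base 10 is an assumption; the caption only says "log scale".
    """
    return math.log10(mutant_value / wild_type_value)


def heatmap_row(kcat, km, kcat_km):
    """Hypothetical helper: the three heatmap values for one mutant."""
    return {
        "kcat": log_relative(kcat, WT_KCAT),
        # Using 1/KM flips the sign, so a lower KM gives a higher value.
        "1/KM": log_relative(1.0 / km, 1.0 / WT_KM),
        "kcat/KM": log_relative(kcat_km, WT_KCAT_KM),
    }


# Illustrative mutant (made-up numbers): unchanged kcat, 10-fold higher KM.
example = heatmap_row(kcat=880.0, km=50.0, kcat_km=17_100.0)
for name, value in example.items():
    print(f"{name}: {value:+.2f}")
```

With this convention, a mutant whose KM rises tenfold moves one unit toward the blue end of the 1/KM column, the same direction as a tenfold loss in kcat or kcat/KM.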
Fig 3.
Active site model and conservation analysis of BglB.
(A) Docked model of pNPG in the active site of BglB showing established catalytic residues (navy) and a selection of the mutated residues (gold). A multiple sequence alignment of the Pfam database’s collection of 1,554 family 1 glycoside hydrolases was made, and sequence logos are shown for (B) selected regions around specific residues discussed in the text and (C) the entire BglB coding sequence. The height of each amino acid letter indicates the sequence conservation at that position.
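The caption does not state how the logo heights were computed. A common convention, assumed here, is to set the total height of a column to its information content in bits (log2 of the alphabet size minus the Shannon entropy) and to scale each letter by its frequency. The sketch below applies that convention to a made-up toy alignment and omits the small-sample correction that logo tools often apply.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Return (information content in bits, per-letter heights) for a column.

    Gap characters ("-") are ignored. No small-sample correction is applied.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0, {}
    total = len(residues)
    freqs = {aa: n / total for aa, n in Counter(residues).items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(alphabet_size) - entropy
    # Each letter's height is its frequency times the column information.
    heights = {aa: p * info for aa, p in freqs.items()}
    return info, heights


# Made-up toy alignment (not from Pfam): four sequences, three positions.
toy_alignment = ["AEG", "AEG", "ADG", "ANG"]
columns = ["".join(seq[i] for seq in toy_alignment) for i in range(3)]
logo = [column_information(col) for col in columns]

for position, (info, heights) in enumerate(logo, start=1):
    print(position, round(info, 2), {aa: round(h, 2) for aa, h in heights.items()})
```

A fully conserved column reaches the maximum height of log2(20) bits, while a variable column is shorter, which is why conserved catalytic positions stand out in a logo.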
Fig 4.
Correlation between machine learning predictions and experimentally-determined kinetic constants.
Top panels: predicted versus experimentally-measured values for the kinetic constants kcat/KM (A), kcat (B), and 1/KM (C). All values are relative to the wild type enzyme and on a log scale. The standard deviation (error bars) of each predicted value is calculated from the predictions obtained by 1000-fold cross validation for that point. The red line is a linear regression added for visualization purposes. Bottom panels: histograms of experimentally-determined values in the data set (90, 80, and 80 samples for kcat/KM, kcat, and KM, respectively), along with the residual errors (scatter plot) between predicted and measured kinetic values.
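The quantities plotted in this figure can be sketched as follows: a mean and standard deviation over repeated predictions for each mutant, the residual between predicted and measured values, and a least-squares line through the points. All numbers are made up, three repeats stand in for the many resampled predictions, and placing the measured values on the x axis is an assumption.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up log-scale relative values for four hypothetical mutants.
measured = [-2.0, -1.0, 0.0, 0.5]

# Made-up repeated predictions per mutant (one inner list per mutant).
predictions = [
    [-1.8, -1.6, -1.7],
    [-0.9, -1.1, -1.0],
    [0.1, -0.1, 0.0],
    [0.2, 0.4, 0.3],
]

predicted_mean = [statistics.fmean(p) for p in predictions]
predicted_sd = [statistics.stdev(p) for p in predictions]  # error bars
residuals = [p - m for p, m in zip(predicted_mean, measured)]
slope, intercept = fit_line(measured, predicted_mean)

print("means:", [round(v, 2) for v in predicted_mean])
print("error bars:", [round(v, 2) for v in predicted_sd])
print("residuals:", [round(v, 2) for v in residuals])
print("line:", round(slope, 2), round(intercept, 2))
```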
Table 1.
Most informative structural features predicting each kinetic constant.
For each mutant, 10 out of 100 models were selected based on the lowest total system energy. Fifty-nine structural features were calculated for the selected models, and the most informative features were selected using a constrained regularization technique (elastic net with bagging; see Methods). The table contains the features that were assigned non-zero weights during training (9 for kcat/KM, 8 for kcat, 10 for KM). Each weight is multiplied by a normalized form of the feature value (not shown), so a weight can indicate either a positive or a negative relationship. For example, a negative weight for hydrogen bonding is consistent with a positive correlation with hydrogen bonding, because a smaller value indicates that more hydrogen bonding is occurring. Conversely, a positive weight for packing would indicate a positive correlation, since a larger value indicates a system with fewer voids. The relative contribution of each feature in determining the kinetic constant is given as a normalized weight (columns 1–3). Column 4 provides a description of each feature, and columns 5 and 6 show the range of observed values in the training dataset. The full feature table is available in S2 Table. ns = feature not selected by the algorithm.
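Two steps from this caption can be illustrated with a short sketch: keeping the 10 lowest-energy models out of 100, and reading the sign of a weight that multiplies a normalized feature value. The feature names, the randomly generated model values, the averaging over the selected models, and the z-score normalization are all assumptions made for illustration. The elastic net fit itself is described in Methods and is not reproduced here.

```python
import random
import statistics


def select_lowest_energy(models, n_keep=10):
    """Keep the n_keep models with the lowest total system energy."""
    return sorted(models, key=lambda m: m["total_energy"])[:n_keep]


def mean_features(models, feature_names):
    """Average each feature over the selected models (an assumption)."""
    return {
        name: statistics.fmean(m[name] for m in models)
        for name in feature_names
    }


def contribution(weight, value, mean, sd):
    """Weight times the z-scored feature value (normalization is assumed)."""
    return weight * (value - mean) / sd


# 100 made-up models for one hypothetical mutant.
rng = random.Random(0)
models = [
    {
        "total_energy": rng.uniform(-500.0, -400.0),
        "hbond_energy": rng.uniform(-5.0, 0.0),  # more negative = more H-bonding
        "packing": rng.uniform(0.4, 0.8),        # larger = fewer voids
    }
    for _ in range(100)
]

selected = select_lowest_energy(models)
features = mean_features(selected, ["hbond_energy", "packing"])

# A negative weight on a feature where smaller means "more hydrogen bonding"
# gives a positive contribution for a mutant with a below-average value.
example = contribution(weight=-0.5, value=-3.0, mean=-1.0, sd=1.0)
print(features, example)
```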