OptZyme: Computational Enzyme Redesign Using Transition State Analogues

OptZyme is a new computational procedure for designing improved enzymatic activity (i.e., kcat or kcat/KM) with a novel substrate. The key concept is to use transition state analogue compounds, which are known for many reactions, as proxies for the typically unknown transition state structures. Mutations that minimize the interaction energy of the enzyme with its transition state analogue, rather than with its substrate, are identified that lower the transition state formation energy barrier. Using Escherichia coli β-glucuronidase as a benchmark system, we confirm that KM correlates (R2 = 0.960) with the computed interaction energy between the enzyme and the para-nitrophenyl- β, D-glucuronide substrate, kcat/KM correlates (R2 = 0.864) with the interaction energy of the transition state analogue, 1,5-glucarolactone, and kcat correlates (R2 = 0.854) with a weighted combination of interaction energies with the substrate and transition state analogue. OptZyme is subsequently used to identify mutants with improved KM, kcat, and kcat/KM for a new substrate, para-nitrophenyl- β, D-galactoside. Differences between the three libraries reveal structural differences that underpin improving KM, kcat, or kcat/KM. Mutants predicted to enhance the activity for para-nitrophenyl- β, D-galactoside directly or indirectly create hydrogen bonds with the altered sugar ring conformation or its substituents, namely H162S, L361G, W549R, and N550S.


Introduction
Enzymes are highly-specific, biomolecular catalysts that cause extraordinary reaction rate enhancements under mild conditions [1]. Enzyme activity is of paramount importance in the economics of cellulosic ethanol (and other biofuels) production [2,3]. Improving enzymatic activity is generally carried out using primarily experimental techniques (i.e., directed evolution strategies) relying on screening large combinatorial libraries [4]. Experiments can be synergistically coupled with efficient computational screening protocols (i.e., fine-tuning of in silico mutants with random mutagenesis) to identify mutants within promising regions of the sequence space for constructing enriched libraries. Reliable computational techniques for identifying mutations that lead to enzymatic activity improvements would have a crosscutting impact on many fronts ranging from biofuel production and biomass pretreatment to pro-drug activation and the design of new therapeutics [5][6][7][8].
Here, we introduce a new enzyme design method, OptZyme, to address some of these challenges. OptZyme uses transition state analogues (TSAs) as proxies for the typically unknown ratelimiting transition state (TS) structures. TSAs are potent inhibitors with a stable enzyme-bound complex that closely resemble the TS of an enzymatic reaction [38,39]. TSAs manage to interfere with the enzyme catalytic activity by mimicking the geometry of the TS and preferentially binding with the enzyme over the substrate, thus preventing the reaction from proceeding. TSAs are known for many enzymatic reactions [40][41][42][43]. Improving catalysis by lowering the TS energy barrier can theoretically be achieved by identifying mutations that minimize the binding energy (BE) of the enzyme with its TSA, rather than with its substrate. We approximate BE with interaction energy (IE) to limit the forcefield's role in reconfiguring the free enzyme/substrate. The developed theoretical framework assumes that solute entropic changes and conformational changes upon binding are relatively small and that product release after the rate-limiting step is energetically favored. The concept of using TSAs for enzyme redesign has been previously explored [23,44]. However, Opt-Zyme is unique as it provides a theoretical framework for making use of TSA calculations to inform enzyme design while also integrating preliminary quantum mechanics (QM) information (e.g., rate-limiting step identification and ligand partial charge information).
Enzyme optimization using OptZyme can be achieved by designing libraries of mutations that raise k cat or lower K M within the Michaelis-Menten kinetic representation. K M is related to the IE with the substrate, while k cat /K M is expressed as a function of the IE with the TSA. We used OptZyme to redesign Escherichia coli b-glucuronidase (GUS) to favor the new substrate, para-nitrophenyl-b, D-galactoside (pNP-GAL) in place of para-nitrophenyl-b, D-glucuronide (pNP-GLU). pNP-GLU was used as a proxy for the native substrate (i.e., glycosaminoglycans containing glucuronic acid [45,46]). Separate computational library designs were identified that optimize K M , k cat , or k cat /K M , and the observed differences were analyzed. Mutations H162S, D163K, L361R, L361E, W549R, and N550S were identified that optimized at the same time K M , k cat , and k cat /K M for pNP-GAL instead of pNP-GLU. Mutations that (either directly or indirectly) created hydrogen bonds with the altered geometry of the TSA of the new substrate accounted for the majority of redesigns.

Redesign of GUS
The design concept explored by OptZyme is to attempt to lower the TS barrier by optimally redesigning the enzyme so as to improve the binding affinity (approximated using IE) of a TSA. The native reaction for GUS is hydrolysis of glucuronic acid from the non-reducing end of the glycosaminoglycan [45,46] (Figure 1). The native substrate more closely resembles pNP-GLU than pNP-GAL as seen in the structures of their sugar moieties (see Figure 2). pNP-GLU was used here in lieu of the native substrate as in previous experimental work [47][48][49] because its para-nitrophenolate leaving group is facile to spectrophotometric monitoring.
The structure for GUS was computationally assembled largely from its unbound crystal structure (PDB: 3K46 [50]). A six-residue loop was not spatially resolved in PDB 3K46. The loop had to be modeled due to its proximity to the active site (minimum loopsubstrate interatomic distance = 7.5 Å ) and interactions with the substrate. An inhibitor-bound structure (PDB: 3LPF [50]) was used to obtain a reasonable conformation of the six-residue loop and pinpoint the binding site for pNP-GLU. The CHARMM [20] force field was used during energy minimizations while Nuclear Overhauser Effect (NOE) restraints were imposed between important catalytic residues (Table 1, Figure 3). The restraints were used to ensure conservation of the optimal catalytic geometry [51].
Upon modeling the GUS structure, the next step involved identifying a TSA. To our knowledge, the TS structure for the glycosidic hydrolysis of pNP-GLU is unknown, but there is information available on TSAs for GUS (i.e., 1,5-glucarolactone) [52,53]. QM calculations were used to explore the reaction mechanism (see Figure 4) to aid in identifying the rate-limiting TS. We hypothesized a TS that has sp 2 hybridization at the anomeric carbon because QM calculations confirmed the carbenium nature Figure 1. Native Reaction for GUS. GUS catalyzes the hydrolysis of a glucuronic acid-containing glycosaminoglycan to form two products, glucuronic acid and an amino sugar (acetylglucosamine in this reaction). pNP-GLU is used as the substrate instead of a glycosaminoglycan because para-nitrophenolate absorbance allows for spectroscopic monitoring of activity in experimental studies. Experimental activity measurements for GUS variants are used for verifying correlations between activity and IE. doi:10.1371/journal.pone.0075358.g001 Figure 2. Comparison between ground state, hypothetical TS, and TSA for pNP-GLU and pNP-GAL. Differences between pNP-GLU (A) and pNP-GAL (B) include reversal of the stereospecificity of the C4 carbon and replacement of a carboxylic acid (pNP-GLU) at the C5 carbon with an alcohol (pNP-GAL). The previously-suggested [52,53] TSA for pNP-GLU, 1,5-glucarolactone (D), resembles the proposed TS (C) in terms of charge distribution and stereospecificity of the carbohydrate. In contrast to the proposed TS structure, the TSA lacks the paranitrophenyl (pNP) moiety and a hydrogen atom from the C1 carbon. In addition, the TSA (D) differs from pNP-GLU (A) by assuming a more flattened sugar ring geometry (see Figure S1 for dihedral angles) and partial positive charge at the anomeric carbon. The TSA for pNP-GAL, 1,5-galactonolactone (E), is similar to 1,5-glucaronolactone (D). The differences between 1,5-galactonolactone and 1,5-glucaronolactone are identical to the differences between pNP-GAL and pNP-GLU. doi:10.1371/journal.pone.0075358.g002 of the intermediate. Vibrational confirmation of the equilibrium states was not performed as structural constraints placed on the GUS residues prevents vibrational confirmation of the minima (see Text S1). The hypothetical TS structure was similar to the independently-postulated TSA, providing further support for 1,5glucarolactone as an appropriate TSA. Density functional theory calculations were performed using a cluster model that included pNP-GLU and residues D163, E413, N466, R467, and E504. All calculations were run using Schrödinger Jaguar [54] with the hybrid B3LYP functional [55,56] and 6-31G**+ basis set.
The TSA resembles the proposed TS ( Figure 4B) through similar partial charges and stereochemistry within the carbohydrate moiety (see Figure 2). The TSA differs from the proposed TS by the replacement of the glycosidic bond with an ester functional group, resulting in an altered ring conformation due to the sp 2hybridized carbonyl. The differences between the TSA for pNP-GAL (i.e., 1,5-galactonolactone) and 1,5-glucarolactone are equivalent to the differences between pNP-GAL and pNP-GLU. These differences include changes in stereospecificity at the C4 carbon and the substituent at the C5 carbon (see Figure 2).
Testing of TSA-based Redesign Paradigm Using k cat and K M Literature Data Before proceeding with the redesign of GUS to accept the new substrate, we used existing k cat and K M data from literature to assess the validity of the proposed computationally-accessible metrics [48,49,57]. Earlier engineering efforts focused on altering GUS specificity from a substrate with a native carbohydrate topology (i.e., pNP-GLU) to a non-native one (i.e., pNP-GAL [49] or para-nitrophenyl-b, D-xylopyranoside [48]) or alternatively improving GUS resistance to glutaraldehyde and formaldehyde [57]. Therefore, the derived GUS mutants were less active towards pNP-GLU than wild-type (WT). We used the data to first assess Restraints were placed on key catalytic contacts, determined from previous experimental [51] and preliminary QM information. Distances between atoms were selected based on typical nonbonded interaction lengths, and spring constants were determined iteratively so that the distances were properly restrained while not over-constraining the system. k min was the harmonic constant implemented if the interatomic distance was too small, and k max was the harmonic constant used if the interatomic distance was too large. k min ,k max because catalytic contacts would remain intact at smaller distances. Entries are shown in Figure 3.   Table 1). Key catalytic residues are labeled by their one-letter amino acid abbreviation followed by their position number. para-nitrophenyl-b, D-glucuronide (pNP-GLU) is labeled by the abbreviation ''PNP'' (see Figure 1). Atoms involved in restraints are labeled, along with interatomic distances.  The entropic component of the free energy of solvation is accounted for by using an accessible area solvent model [58]. The change in solute entropy upon binding is assumed to be negligible relative to the other terms [59]. IE is a good surrogate for BE in cases where binding is not conditional on significant conformation rearrangements (no induced fit [60]). In addition, IE is less dependent on the force field as the energetics of any conformational rearrangements do not need to be tracked. IEs were calculated using the iterative protein redesign and optimization procedure (IPRO) [61]. IPRO iteratively randomly perturbs the protein backbone, subsequently assigns optimal rotamers for all design positions (mutable amino acid positions), and then executes an energy relaxation step. Different IPRO trajectories may converge in alternate low energy conformations. To remedy the run-dependent nature of the results, 25 separate IPRO trajectories were generated. The final IE was calculated as the average over the best IE for each one of the 25 trajectories (see Figures S2, S3, and S4 for distribution of IEs). In general, the energy distribution of the top 25 generated IEs followed trends that were consistent with a normal distribution. However, deviations away from a normal distribution are observed in some instances as a result of the small sample size. The calculated IE values were then related to K M values obtained from literature as follows.
Michaelis-Menten kinetics for GUS enzymatic catalysis (based on the mechanism shown in Figure 4) is depicted through Equation 3, where E is GUS, S is pNP-GLU, E?S is GUS bound to pNP-GLU, E?I 1 is the bound carbocation intermediate, E?I 2 is the E504-covalent adduct, E?P is bound glucuronic acid, P is the product of the reaction (glucuronic acid), and k represents a reaction rate constant.
QM calculations in vacuo identified E?S, E?I 1 , and E?I 2 and found only a slight barrier between E?I 1 and E?I 2 . A TS for the E?S to E?I 1 step was not successfully located (see Text S1). Based on the QM calculations, it is unclear whether the rate-limiting step for GUS is E?S to E?I 1 or E?I 2 to E?P. However, both of these TSs are expected to closely resemble the carbocation intermediate (i.e., E?I 1 ), which is consistent with the adopted TSA. By assuming a fast rate of hydrolysis of the covalent adduct (i.e., E?I 2 ) and that the equilibrium constant of product release (i.e., E+P) after the ratelimiting step lies far to the right E ½ P ½ = E : I 1 ½ ww1, Equations 4 and 5 describe the enzyme catalytic parameters of the overall reaction (see Text S2 for detailed discussion of how these equations are arrived at from Equation 3).
k cat~k2 ð5Þ is an alternate way of expressing that the equilibrium of product release lies far to the right. G min P must Figure 5. Ground state computational IE S for pNP-GLU versus the natural logarithm of experimental K M . IEs were calculated using IPRO, and experimental data was obtained from literature [48,49,57]. Each numbered label corresponds to a single variant enzyme with multiple amino acid substitutions from wild-type (WT). Calculated IEs at the ground state are consistent with the observed changes in K M for GUS mutants (R 2 = 0.960). Figure S2 shows the distribution of the trajectory-best IEs whose average forms each data point.   release. The hypothetical rate-limiting step was used to identify the individual rate constants in Equations 4 and 5. However, the derivations are independent of the true rate-limiting step. The TSA choice does depend on the rate-limiting step, but it has been verified independently [52,53].
Using the relationship between Gibb's free energy and equilibrium concentrations (see Text S2, Equation S12), Equation 6 links the Michaelis constant, K M , to the BE between the enzymatic substrate complex (E?S) and the unbound reactants, BE S (see Equation 1).
We find that the all-atom root mean square deviation (RMSD) between unbound (E) and bound (E?S) GUS is only 0.22 Å , implying that there is minimal conformational rearrangement in GUS upon binding [62] with pNP-GLU, which justifies the approximation of BE S with IE S (IE with the substrate, pNP-GLU) (see Equations 1 and 2). Using Equation 6 and the assumption that BE S = IE S , we find that K M and IE s for the mutant/WT enzymes are related through Equation 7.
Equation 7 implies a linear correlation between ln(K M ) and IE S . Figure 5 depicts the measured K M values [48,49,57] and corresponding calculated IE S s for the WT GUS and five mutants. The correlation coefficient of 0.960 implies that the derived expression (Equation 7) correctly captures the observed K M trends for the enzyme variants. While the actual magnitude of the energy values on the y-axis is not quantitatively accurate, the relative ordering of the mutants in terms of their K M values is consistent with the data.
Unlike K M , which depends on binding at the ground state, k cat is directly related to the reaction rate. The rate constant of a reaction is related to the change in the Gibb's free energy between the ground and TSs, based on the Eyring-Polanyi equation derived from transition state theory (Equation 8) (see also Figure 6).
In Equation 8, k is the rate constant, h is Planck's constant, k is the transmission coefficient (assumed invariant among all mutants), k B is the Boltzmann constant, and DG { is the change in Gibb's free energy between the ground and TSs (Equation 9).
We cannot directly computationally assess DG { because the TS structure is unknown. Since the structure of the TS is unavailable, we postulate that mutations that lead to beneficial interactions of the enzyme Figure 8. Scaled difference between IE TSA and IE S for pNP-GLU versus the natural logarithm of k cat . Data was obtained as detailed in the caption of Figure 5. Scaling is required because of the non-quantitative nature of the energy calculations. With scaling, it is apparent that the turnover number increases as the difference becomes more negative. These results suggest that as the enzyme interacts with the TS more strongly, the turnover number increases (R 2 = 0.854). doi:10.1371/journal.pone.0075358.g008 Figure 9. pNP-GLU IE S correlation with catalytic efficiency. Data was obtained as described for Figure 5. No significant correlation is observed (R 2 = 0.545) between IE with pNP-GLU and ln(k cat /K M ). If GUS catalysis was primarily achieved through reactant destabilization, a positive slope would have been expected. doi:10.1371/journal.pone.0075358.g009 Figure 10. Correlation between pNP-GAL IE TSA and ln(k cat /K M ). The correlation found here is significantly lower than the one found for pNP-GLU (see Figure 7) primarily due to mutant T509S. See also Figure S4. doi:10.1371/journal.pone.0075358.g010 with its TSA should produce similar benefits with the unresolved TS. Equation 10 expresses this postulate mathematically by implying that the difference between the minimized free energy of the TS and the TSA is invariant with respect to mutations introduced on the enzyme.
The unknown (for IPRO trajectories where the TSA is the ligand) free energy of the Michaelis complex can be eliminated by combining Equations 1 and 9, yielding Equation 11.
We have already shown that computationally approximated IE S provides a good approximation for BE s . In analogy, we assume that the IE with the TSA (IE TSA ) is a good approximation for BE TSA . The TSA and substrate structures, and therefore energies, remain largely unchanged during the redesign process. Since G min TSA and G min S are both invariant with respect to mutations to the enzyme and IE TSA > BE TSA , Equation 10 can be used to eliminate the unknown free energy of the bound TS (G min E : TS ) yielding Equation 12.
Constant C1 is a grouping of constants, including those from Equations 8 and 10. Equation 12 is further simplified by substituting the definition of IE TSA (see Equation 2, where the bound molecule in this case is the TSA).
C1 can be eliminated from Equation 13 by expressing it as the difference in the IEs between mutant and WT enzymes, where DIE TSA = IE TSA 2IE TSA,WT , DBE S = BE S -BE S,WT , and In   Equations 17 and 7, respectively. Note that experimental and correlating temperatures do not match. Similarly high temperatures were seen in the quantification of RNA-ribosome binding calculations in the RBSCalculator [63]. A strong correlation (R 2 = 0.864) is observed between IE TSA and the natural logarithm of experimental k cat /K M values (see Figure 7), suggesting that IE TSA is a good descriptor of k cat /K M . This observed correlation implies that the derived equations are applicable and that the chosen TSA is suitable. However, this trend does not necessarily prove the QM-based reaction mechanism. The same strong correlation (i.e., R 2 = 0.854) is observed between IE TSA /(RT) TSA -IE S /(RT) S and the natural logarithm of k cat (see Figure 8). The experimental K M values vary by less than an order of magnitude ( Figure 5), while the experimental k cat /K M values vary over several orders of magnitude (Figure 7). The scaling differences in the experimental data and the larger weight of 1/(RT) TSA ( = 0.06 mol/kJ), relative to 1/(RT) S ( = 0.002 mol/ kJ), in the correlating expression (Equation 18) contribute to the similarity between Figure 7 and Figure 8. As a control, we verified that the energy difference between the Michaelis complex and unbound reactants shows no correlation with the catalytic efficiency (see Figure 9).
The justification of the chosen TSA and validation of the correlation between computationally-accessible metrics and experimental catalytic data justifies the use of IE calculations to optimize a targeted enzyme parameter.

Further Validation of Correlating Expressions Using pNP-GAL
Before implementing the OptZyme redesign approach, we first showed that the correlating expressions derived for pNP-GLU were transferrable to alternative substrates and their corresponding TSAs. Since our overarching goal was to switch GUS specificity from pNP-GLU to pNP-GAL, we sought to verify the correlating expressions for K M (Equation 7) and k cat /K M (Equation 17) using pNP-GAL and 1,5-galactonolactone, respectively. pNP-GAL k cat /K M data was again obtained from literature sources focused on altering GUS specificity from pNP-GLU to pNP-GAL [47,49]. Accurate K M estimates were absent in the literature. Instead, we estimated them by monitoring paranitrophenolate absorbance as a function of substrate concentration and fitting to the Michaelis-Menten equation using the mutant cell lysates (see Text S3). The K M value determined for the native substrate analogue (i.e., pNP-GLU) using the same crude lysate of WT GUS (0.24260.022 mM) was similar to the literature reported value (0.183 mM [48,49,57]).
The observed k cat /K M correlation for pNP-GAL ( Figure 10, Equation 17) was similar (albeit weaker) to that for pNP-GLU (see Figure 7), with the exception of one outlier (i.e., T509S). The observed K M correlation for pNP-GAL (Equation 7) has a positive slope, similar to the correlation for pNP-GLU (see Figure 5). However, one of the three variants (i.e., T509A, D531E, S557P, N566S) was an outlier. Considering both pNP-GLU and pNP-GAL mutant data, D531E was the only surface point mutation located near the center of an a-helix. Implicit solvation models have been shown to cause inaccuracies within a-helices [64]. By considering pNP-GAL, we demonstrated the applicability of Equations 7 and 17 of OptZyme for non-native substrates.

Redesign of GUS for Improving Activity with pNP-GLU
OptZyme was first used to identify beneficial mutations that improve K M , k cat /K M , and k cat with pNP-GLU by minimizing the appropriate IE (Equations 7, 17 and 18, respectively). Constraints that ensure that both the substrate and TSA favorably bind GUS (i.e., IE S ,0, IE TSA ,0) were included in the OptZyme runs. Design positions were selected in locations that are likely to impact active site geometry and directly mediate interactions with the substrate. The same set of design positions was chosen for all sets of calculations (H162, D163, F164, V355, G356, L361, G362, W549, N550).
A high frequency of mutations to glycine by OptZyme was initially observed, presumably to avoid steric clashes within the highly-packed active site of GUS. To remedy this bias, we first performed multiple sequence alignments to extract natural amino acid usage patterns. The first family alignment was performed using PFAM [65] between GUS and the glycosyl hydrolases family 2, and the second alignment was performed between GUS and all other b-glucuronidases (as identified in BRENDA [66]) using Clustal-Omega [67]. Amino acids observed at least once in the alignment of all b-glucuronidases (181 sequences, see Figure 11) or in at least 5% of the glycosyl hydrolases family 2 (excluding gaps, 3975 sequences) were permitted for each design position (see Table 2 for permissible mutations). In addition, the total number of glycine residues throughout all design positions was restricted to be at most two (matching the glycine utilization frequency in WT).
Fifty independent trajectories of OptZyme were run to optimize K M , k cat /K M , and k cat for GUS using pNP-GLU and 1,5glucarolactone. NOE restraints were used to maintain the optimal catalytic geometry of GUS (Table 1, Figure 3). Each trajectory of OptZyme consisted of 5000 iterations, and simulated annealing was used after 100 cycles (using constant T = 7268K, which corresponds to an acceptance rate of about 50% of redesigns within 10 kcal/mol, 41.9 kJ/mol, of the best mutant) to avoid premature convergence to local minima of the GUS free energy landscape. The CHARMM energy terms used were identical to those used in the testing of the TSA-based redesign paradigm, and the backbone-dependent Dunbrack rotamer library was used for side chain optimization [68].
OptZyme was used to identify three libraries of mutants that were computationally predicted to enhance enzyme catalytic parameters relative to WT (see Table 3, Figure 12). The observed mutants seemed to lower the relevant IE predominantly through improving flexibility in the active site, increasing solvation stabilization, or improving the electrostatic IE (including hydrogen bonding). Many mutations were common between the K M -and k cat /K M -optimized libraries because of the electrostatic and structural similarity between the substrate and TSA. In the interest of identifying mutations that primarily improve a specific enzyme parameter, a systematic cutoff was defined for identifying mutations that were representative of the K M -or k cat /K Moptimized libraries. A mutation was considered representative of a library if it occurred at least 15% of the time for a given design position and at the same time 10% more frequently than in the other libraries. These metrics were selected because they closely matched the representative mutations determined by visual inspection of Figure 12. For example, H162A and H162G (extra flexibility of the protein backbone), D163S (enhanced solvation), and G362R (hydrogen bonding/solvation effects) were mutations representative of the pNP-GLU k cat /K M -optimized library (see Figure 12).
Experimental validation of the mutants can be carried out using a high-throughput assay, where the fluorescence of the paranitrophenolate leaving group is readily measured based on its high absorbance at 405 nm [47]. The design of mutants for pNP-GLU is handicapped as WT GUS is already very active and the scope for identifying significantly improved mutants is limited. However, GUS activity with pNP-GAL is ,10 7 lower than for pNP-GLU [47]. Therefore, the entire gamut of beneficial interactions leading to switch of specificity from pNP-GLU to pNP-GAL would be detectable using a high-throughput assay.

Redesign of GUS for Introducing Catalytic Activity with the New Substrate pNP-GAL
Three libraries were constructed that were designed to enhance K M , k cat /K M , or k cat of GUS for pNP-GAL (see Table 4, Figure 13). The constructed mutants were stabilized in a similar manner as described for pNP-GLU. The only representative mutant in the pNP-GAL K M -optimized library was L361N (electrostatic interactions with pNP-GAL C5 substituent/solvation enhancement). L361G (extra flexibility of GUS backbone), W549R (hydrogen bonding with pNP-GAL C2 hydroxyl group), and N550S (solvation enhancement) were representative mutants for the pNP-GAL k cat /K M -optimized library.
Mutations enriched in the pNP-GAL libraries but largely absent from all pNP-GLU libraries were also identified. The analysis revealed only one such additional mutation, H162N (electrostatic interaction with the C4 substituent). Structural analysis also revealed that the backbone carbonyl of F161 (not a design position) formed a hydrogen bond in 97.5% of the examined structures (each mutant in Table 4) with the C5 substituent of pNP-GAL. This interaction was absent for all of the pNP-GLU designs in Table 3. Thus, the identity of the adjacent residue at design position 162 may directly promote (or prevent) the backbone interaction with pNP-GAL. In addition, mutation H162S is observed in 13.3% of the examined mutants (see Table 3) for pNP-GLU but 36.7% (see Table 4) for the pNP-GAL libraries. Therefore, H162S may be important for the interaction of F161 with pNP-GAL.
Several mutations were found that make direct contact with the novel ligand. Since the differences between pNP-GLU and pNP-GAL are in the C4 and C5 substituents of the carbohydrate moiety (Figure 2), mutations that create contacts with these substituents are expected. Indeed, this is the case for the D163K, L361R, and L361E mutations, as well as the contact by F161. However, W549R forms a contact with the ligand but at the unchanged portion of the carbohydrate. W549R was more common in the pNP-GAL libraries because of a slightly deformed sugar ring of pNP-GAL, relative to pNP-GLU. The results show OptZyme is sensitive enough to detect even minor structural variances between substrates.
Amongst the pNP-GAL libraries, the K M -optimized library is enriched with smaller amino acids (see Text S4 for discussion on prevalence of small amino acids). Although this observation could be an artifact due to the larger size of pNP-GAL relative to its TSA, the design positions were chosen at the edge of the active site further away from the pNP substituent. Thus, the smaller side chains in the K M -optimized library are more likely a reflection of the chair-like conformation of the sugar ring, which has a larger excluded volume than the planar geometry of the TSA. The mutation of the WT side chains to the large, polarizable side chains that are representative of the k cat /K M -optimized library (H162Q, L361K, G362D, W549R), imply that the planar form of the molecule is stabilized through efficient packing of the enzyme and beneficial electrostatic interactions.

Summary
A new set of computationally accessible metrics was derived for correlating K M , k cat /K M , and k cat between WT and mutated enzymes. With the aid of a QM-derived reaction mechanism, we validated that the IE S correlates with K M (Equation 7), and the IE TSA correlates with k cat /K M (Equation 17). k cat can be measured through a weighted combination of IE S and IE TSA (Equation 18). It is important to note that the observed correlations are not proof for the QM-based mechanism. OptZyme, a computational tool used to design mutations that improve K M , k cat , or k cat /K M , generated mutations that were predicted to enhance enzymatic activity for pNP-GLU. OptZyme is best suited for systems where the solute entropy change upon binding is assumed to be negligible relative to other terms, substrate binding is not a consequence of ''induced fit'', and equilibrium following the rate-limiting step strongly favors product release. The identified mutants stabilized the substrate mostly through hydrogen bonding networks, improved solvation, and efficient packing of the active site. OptZyme was utilized to construct a library of mutants with improved enzyme catalytic parameters for a similar substrate, pNP-GAL. Though these substrates are similar, OptZyme was able to identify novel contacts with the ligand in the pNP-GAL libraries that were absent from the pNP-GLU libraries. Several mutations were enriched in all of the pNP-GAL libraries, namely those that interact with the distorted sugar ring conformation or its altered substituents. In comparison of the K M -and k cat /K Moptimized libraries for pNP-GAL, we found that large, polar side chains were observed more often in the k cat /K M -optimized library. This was attributed to the more planar geometry of the TSA. These results suggest that mutants with large, polar side chains can stabilize the TS through interactions with the hydroxyl substituents and efficient packing, thereby improving enzymatic activity.
OptZyme is available for download at maranas.che.psu.edu/ submission/OptZyme.htm. Figure S1 Dihedral angles of ground state and TSA for pNP-GLU and pNP-GAL. The layout of this figure corresponds to the layout of Figure 2. TS dihedral angles could not be determined because the TS structure was never solved so its coordinates are unknown. Dihedral angles were calculated using only the six atoms constituting the sugar ring (five carbon atoms, one oxygen atom). The absolute value of the dihedral angles describing the rotation about the C6-O, C1-O, and C1-C2 bonds are much lower for the TSAs than for the ground state molecules. This illustrates the more planar ring geometry of the TSAs.  (1992). Results show that glycine residues (crosses) are frequently observed in the ''generous'' or ''outside'' regions of the map. Alternatively, the other 19 standard amino acids (squares) are much less frequently observed in the ''generous'' or ''outside'' regions. Glycine residues can avoid some of the steric repulsion that is more difficult to avoid for residues with a C b . While other amino acids can undergo contortions in their side chain to avoid a strong steric clash, mutation to a glycine residue is more favorable. (TIF) Table S1 Gas phase energies from QM cluster model of GUS active site. The gas phase energies are reported for the cluster model of the active site with the backbone of all residues constrained, as well as the ASN 466 sidechain. The calculated energies are relative to the calculated ''Intermediate 2 (E)'' energy. Each of the three structures corresponds to structures identified in Figures 4 and 6. This correspondence is indicated by each structure's one-letter label. (DOC)