Mapping Enzymatic Catalysis Using the Effective Fragment Molecular Orbital Method: Towards all ab initio Biochemistry

We extend the Effective Fragment Molecular Orbital (EFMO) method to the frozen domain approach where only the geometry of an active part is optimized, while the many-body polarization effects are considered for the whole system. The new approach efficiently mapped out the entire reaction path of chorismate mutase in less than four days using 80 cores on 20 nodes, where the whole system containing 2398 atoms is treated in the ab initio fashion without using any force fields. The reaction path is constructed automatically with the only assumption of defining the reaction coordinate a priori. We determine the reaction barrier of chorismate mutase to be kcal mol−1 for MP2/cc-pVDZ and for MP2/cc-pVTZ in an ONIOM approach using EFMO-RHF/6-31G(d) for the high and low layers, respectively.


Introduction
Fragment-based quantum mechanical methods [1][2][3][4][5][6][7][8][9][10][11][12] are becoming increasingly popular [13], and have been used to describe a very diverse set of molecular properties for large systems. Although these methods have been applied to refine the energetics of some enzymatic reactions [14,15] they are usually not efficient enough to allow for many hundreds of single point calculations needed to map out a reaction path for a system containing thousands of atoms, although geometry optimizations of large systems can be performed for systems consisting of several hundreds of atoms [8,9,11,[16][17][18]. In fact, typically applications of fragment-based methods to biochemical systems, for example, to protein-ligand binding [19], are based on performing a few single point calculations for structures obtained at a lower level of theory (such as with force fields). Although many force fields are well tuned to treat typical proteins, for ligands they can be problematic.
EFMO is based on dividing a large molecular system into fragments and performing ab initio calculations of fragments and their pairs, and combining their energies in the energy of the whole system (see more below). In the FD approach we employ here, one defines an active region associated with the active site, and the cost of a geometry optimization is then essentially given by the cost associated with the active region.
However, unlike the quantum-mechanical / molecular mechanical (QM/MM) method [27] with nonpolarizable force fields, the polarization of the whole system is accounted for in FMO and EFMO methods: in the former via the explicit polarizing potential and in the latter via fragment polarizabilities. Another important difference between EFMO and QM/MM is that the former does not involve force fields, and the need to elaborately determine parameters for ligands does not exist in EFMO. Also, in EFMO all fragments are treated with quantum mechanics, and the problem of the active site size [28] does not arise.
The paper is organized as follows: First, we derive the EFMO energy and gradient expressions for the frozen domain approach, when some part of the system is frozen during the geometry optimization.
Secondly, we predict the reaction barrier of barrier of the conversion of chorismate to prephenate ( Figure 1) in chorismate mutase. The reaction has been studied previously using conventional QM/MM techniques [29][30][31][32][33][34][35][36][37][38][39][40]. The EFMO method is similar in spirit to QM/MM in using a cheap model for the less important part of the system and the mapping is accomplished with a reasonable amount of computational resources (four days per reaction path using 80 CPU cores). Finally we summarize our results and discuss future directions.

Background and Theory
The EFMO energy of a system of N fragments (monomers) is where E 0 I is the gas phase energy of monomer I. The second sum in equation 1 is the pairwise correction to the monomer energy and only applies for pairs of fragments (dimers) separated by an interfragment distance R I,J (defined previously [20]) less than a threshold R resdim . The correction for dimer IJ is E POL IJ and E POL tot are the classical pair polarization energy of dimer IJ and the classical total polarization energy, respectively. Both energies are evaluated using the induced dipole model [41,42] based on distributed polarizabilities [43]. The final sum over E ES IJ is the classical electrostatic interaction energy and applies only to dimers separated by a distance greater than R resdim . These energies are evaluated using atom-centered multipole moments through quadrupoles [44]. The multipole moments and distributed polarizabilities are computed on the fly for each fragment [20,21].
In cases where only part of a molecular system is to be optimized by minimizing the energy, equation 1 can be rewritten, resulting in a method conceptually overlapping with QM/MM in using a cheap model for the less important part of the system. Consider a system S ( Figure 2) where we wish to optimize the positions of atoms in region A, while keeping the atoms in region b and F frozen (the difference between b and F will be discussed below). With this definition, we rewrite the EFMO energy as where E 0 A is the internal energy of region A. Region A is made of fragments contaning atoms whose position is optimized, and A can also have some frozen atoms Region A is surrounded by a buffer b, because fragment pairs computed with QM containing one fragment outside of A (i.e., in b) can still contribute to the total energy gradient (see below). On the other hand, fragment pairs with one fragment in F can also contribute to the total gradient, but they are computed using a simple classical expression rather than with QM. Note that the relation between the notation used in FMO/FD and that we use here is as follows: A, F and S are the same. The buffer region B includes A, but b does not, i.e., A and b share no atoms. Formally, A and b are always treated at the same level of theory by assigning fragments to the same layer.
In the EFMO method, covalent bonds between fragments are not cut. Instead, electrons from a bond connecting two fragments are placed entirely to one of the fragments. The electrons of the fragments are kept in place by using frozen orbitals across the bond. [21,45,46] Fragments connected by a covalent bond share atoms ( Figure 3) through the bonding region so it is possible that one side changes the wave function of the bonding region [21]. It is therefore necessary to re-evaluate the internal ab initio energy of region b for each new geometry step.
The internal geometries of fragments in region F are completely frozen so the internal energy is constant and is therefore neglected However, it is still necessary to compute the multipole moments and polarizability tensors (and therefore the wave function) of the fragments in F once at the beginning of a geometry-optimization to evaluate E POL tot in equation 3 as well as some inter-region interaction energies defined as Equation 8 assumes that b is chosen so that fragments in A and F are sufficiently separated (i.e., R I,J > R resdim ) so the interaction is evaluated classically. If all atoms in region b are frozen, then E 0 F/b is constant and can be neglected. However, this assumes that the positoins of all atoms at both sides of the bonds connecting fragments are frozen.

The final expression for the EFMO frozen domain (EFMO/FD) energy is
Finally, we note that due to the frozen geometry of b we can further gain a speedup by not evaluating dimers in b (cross terms between A and b are handled explicitly according to equation 7) since they do not contribute to the energy or gradient of A. This corresponds to the frozen domain with dimers (EFMO/FDD), and equation 5 becomes The gradient of each region is and the details of their evaluation has been discussed previously [20,21]. Equation 13 does not apply to non-frozen atoms shared with region A.
The frozen domain formulation of EFMO was implemented in GAMESS [47] and parallelized using the generalized distributed data interface [48,49].

Computational Details Preparation of the Enzyme Model
We followed the strategy by Claeyssens et al. [40] The structure of chorismate mutase (PDB: 2CHT) solved by Chook et al. [50] was used as a starting point. Chains A, B and C were extracted using PyMOL [51] and subsequently protonated with PDB2PQR [52,53] and PROPKA [54] at pH = 7. The protonation state of all residues can be found in Table S1. The inhibitor between chain A and C was replaced with chorismate in the reactant state (1, Figure 1) modeled in Avogadro [55, 56].
The entire complex (chorismate mutase and chorismate) was solvated in water (TIP3P [57]) using GROMACS. [58,59] To neutralize the system 11 Na + counter ions were added. The protein and counter ions were treated with the CHARMM27 [60,61] force field in GROMACS. Force-field parameters for chorismate were generated using the SwissParam [62] tool. To equilibrate the complex a 100 ps NVT run at T = 300 K was followed by a 100 ps NPT run at P = 1 bar and T = 300 K. The production run was an isothermal-isobaric trajectory run for 10 ns. A single conformation was randomly selected from the last half of the simulation and energy minimized in GROMACS to a force threshold of F max = 300 KJ mol −1 nm −1 . During equilibration and final molecular dynamics (MD) simulation, the C3 and C4 atoms of chorismate (see Figure 1) were harmonically constrained to a distance of 3.3Å to keep it in the reactant state. Finally, a sphere of 16Å around the C1 atom of chorismate was extracted in PyMOL and hydrogens were added to correct the valency where the backbone was cut. The final model contains a total of 2398 atoms.

Mapping the Reaction Path
To map out the reaction path, we define the reaction coordinate similarly to Claeyssens et al. [40] as the difference in bond length between the breaking O2-C1 bond and the forming C4-C3 bond in chorismate (see also Figure 1), i.e.
The We used two different sizes for the active region small: (EFMO:S, Figure 4) and large (EFMO:L, Figure 5). The active region (colored red in Figure 2) is defined as all fragments with a minimum distance R active from any atom in chorismate (EFMO:S : R active = 2.0Å, EFMO:L : R active = 3.0Å).
In EFMO:S the active region consists of chorismate, 4 residues and 5 water molecules, while the active region in EFMO:L consists of chorismate, 11 residues and 4 water molecules. The buffer region (blue in Figure 2) is defined as all fragments within 2.5Å of the active region for both EFMO:S and EFMO:L.
The rest of the system is frozen. To prepare the input files we used FragIt [63], which automatically divides the system into fragments; in this work we used the fragment size of one amino acid residue or water molecule per fragment.
In order to refine the energetics, for each minimized step on the reaction path we performed two-layer ONIOM [64,65] calculations where E low real = E EFMO according to equation 3. This can be considered a special case of the more general multicenter ONIOM based on FMO [66], using EFMO instead of FMO. The high level model system is chorismate in the gas-phase calculated using B3LYP [67][68][69] (DFTTYP=B3LYP in $CONTRL) or MP2 (MPLEVL=2 in $CONTRL) with either 6-31G(d) or the cc-pVDZ, cc-pVTZ and cc-pVQZ basis sets by Dunning [70].
We also carried out multilayer EFMO and FMO [71] single-point calculations where region F is described by RHF/6-31G(d) and b and A (for EFMO) or B (B = A ∪ b for FMO [18]) is calculated using MP2/6-31G(d).
The FDD approximation in equation 11 is enabled by specifying MODFD=3 in $FMO, similarly to the frozen domain approach in FMO [18]. All calculations had spherical contaminants removed from the basis set (ISPHER=1 in $CONTRL).

Obtaining the Activation Enthalpy
The activation enthalpy is obtained in two different ways by calculating averages of M adiabatic reaction pathways. The starting points of the M pathways were randomly extracted from the MD simulation, followed by the reaction path mapping procedure described above for each pathway individually. One way to obtain the activation enthalpy averages the barriers from each individual adiabatic reaction path [72] Here M is the number of reaction paths (M = 7, Figure 6) E TS,i is the highest energy on the adiabatic reaction path while E R,i is the lowest energy with a negative reaction coordinate. 1.6 kcal mol −1 corrects for the change in zero point energy and thermal contributions [72].
The other way of estimating the activation enthalpy is [37]: Here E TS and E R are, respectively, the highest energy and lowest energy with a negative reaction coordinate on the averaged adiabatic path (bold line in Figure 6). The brackets here mean averaging over 7 reaction paths; and the difference of Eqs 17 and 18 arises because of the noncommutativity of the sum and the min/max operation over coordinates: in Eq 17 we found a minimum and a maximum for each curve separately, and averaged the results, but in Eq 18 we first averaged and then found the extrema. As discussed below, the two reaction enthalpies are within 0.2 kcal/mol, which indicates that the TS occurs at roughly the same value of the reaction coordinate for most paths.

Effects of Methodology, Region Sizes and Approximations
Reaction barriers obtained in the enzyme using harmonic constraints are plotted on Figure 8 and listed in Table 1

Refined Reaction Energetics
For the smallest EFMO:S system ONIOM results are presented on Figure 9 and in Table 2  B3LYP level of theory using a cc-pVDZ basis set are listed in tables S2 to S5 and show differences from the above by less than 1 kcal mol −1 , again the reaction coordinates changes slightly depending on the system. The effect of including correlation effects by means of MP2 and systematically larger basis sets is that the potential energy barrier for the reaction rises as more correlation effects are included, the same is true for the overall reaction energy.
The results presented here for MP2 are in line with what has been observed previously by Ranaghan et al. [37] and Claeyssens et al. [40]. Overall, the reaction barrier is reduced to roughly half of the RHF barrier and the observed coordinates for the reaction shift slightly. We do note that this study and the study by Ranaghan et al. use ONIOM style energy corrections for the correlation and not geometry optimizations done at a correlated level. Overall, we observe that the predicted reaction coordinate for the approximate transition state in the conversion of chorismate to prephenate happens around 0.2Å later than in those studies.
The results for the multilayer single points along the energy surface are presented in Table 3

Ensemble Averaging
In Figure 6 and Figure 7 we show 7 adiabatic reaction paths mapped with EFMO-RHF/6-31G(d) starting from 7 MD snapshots; the energetics were refined with ONIOM at the MP2/cc-pVDZ and MP2/cc-pVTZ level. In EFMO, we used a small active region (EFMO:S) and R resdim = 1.5 and no dimer calculations in region b (S15FD3 in Figure 8). Out of the 7 trajectories one is described in detail in the previous sub-section.
For MP2/cc-pVDZ:EFMO-RHF/6-31G(d) the reaction enthalpies are ∆H ‡ 1 = 18.3 ± 3.5 kcal mol −1 and ∆H ‡ 2 = 18.2 kcal mol −1 [cf. Equations (17) and (18) There are several differences between our study and that of Claeyssens et al. that could lead to an overestimation of the barrier height: biasing the MD towards the TS rather than the reactant, a larger enzyme model (7218 vs 2398 atoms), and more conformational freedom when computing the potential energy profile.
With regard to the latter point, while Figure 8 shows that increasing the active region has a relatively small effect on the barrier this may not be the case for all snapshots. We did identify one trajectory that failed to produce a meaningful reaction path and is presented in Figure S1. Here, the energy of the barrier becomes unrealistically high because of very little flexibility in the active site and unfortunate placement of Phe57 (located in the buffer region, Figure S2), which hinders the conformational change needed for the successful conversion to prephenate yielding an overall reaction energy of around +11 kcal mol −1 . As noted above, the EFMO:L settings is a possible solution to this as more of the protein in available to move, but as seen from Table 1 the computational cost doubles.

Timings
Using the computationally most efficient method tested here (EFMO:S), R resdim = 1.5, and skipping dimers in the buffer region b, an adiabatic reaction path, which requires a total of 467 gradient evaluations, can be computed in four days using 80 CPU cores (20 nodes with 4 cores each) at the RHF/6-31G(d) level of theory. As shown in Table 1, the same calculation using FMO2 requires takes roughly T full rel = 7.5 times longer.
Increasing R resdim from 1.5 to 2 has a relatively minor effect of the CPU time (a factor of 1.2), while performing the dimer calculations in the buffer region nearly doubles (1.7) the CPU time. Similarly, increasing the size of active region from 2.0Å to 3.0Å around chorismate also nearly doubles (1.8) the CPU time. This is mostly due to the fact that more dimer calculations must be computed, but the optimizations also require more steps (513 gradient evaluations) to converge due to the larger number of degrees of freedom that must be optimized.
Looking at a single minimization for a specific reaction coordinate R = −1.79Å, the most efficient method takes 4.5 hours. Here, the relative timings T rel are all larger than for the full run (T full rel ) due to a slight increase in the number of geometry steps (around 25) taken for all but FMO2 which is identical to the reference (22 steps). Thus, the overall cost of performing the FMO2 minimization is 6.7 times as expensive as EFMO.

Summary and Outlook
In this paper we have shown that the effective fragment molecular orbital (EFMO) method [20,21] can be used to efficiently map out enzymatic reaction paths provided the geometry of a large part of the enzyme and solvent is frozen. In EFMO one defines an active region associated with the active site, and the cost of a geometry optimization is then essentially the cost of running quantum-mechanical calculations of the active domain. This is similar to the cost of QM/MM, if the QM region is the same; the difference is that in EFMO we freeze the coordinates of the rest of the system, whereas in QM/MM they are usually fully relaxed. On the other hand, EFMO does not require parameters and can be better considered an approximation to a full QM calculation rather than a QM/MM approach.
In this work we used the mapping technique based on running a classical MD simulation, selecting some trajectories, freezing the coordinates of the outside region, and doing constrained geometry optimizations along a chosen reaction coordinate. An alternative to this approach is to run full MD simulation of a chemical reaction using EFMO. This has already been done for many chemical reactions using FMO-MD [73][74][75] and can be done in future with EFMO.
A potential energy profile for the chorismate to prephenate reaction in chorismate has been computed in 4 days using 80 CPU cores for an RHF/6-31G(d) description of a truncated model of the enzyme containing 2398 atoms. For comparison, a corresponding FMO2 calculation takes about 7.5 times more. The cost of EFMO calculations is mainly determined by the size of the buffer-and active region. Comparing to a QM/MM with QM region of the same size, EFMO as a nearly linear scaling method, becomes faster than QM if the system size is sufficiently large; especially for correlated methods like MP2 where this cross-over should happen with relatively small sizes.
Our computed conformationally-averaged activation enthalpy is in reasonable agreement to the experimental value, although overestimated by 5.5 kcal/mol. The energetics of this reaction depends on the level of calculation. We have shown that by using a level better than RHF, for instance, MP2 or DFT, considerably improves the energetics and by using such an appropriate level to also determine the reaction path following the formalism in this work can be used to provide a general and reliable way in future.
EFMO, as one of the fragment-based methods [13], can be expected to be useful in various biochemical studies, such as in enzymatic catalysis and protein-ligand binding. It should be noted that in addition to its paramater-free ab initio based nature, EFMO and FMO also offer chemical insight on the processes by providing subsystem information, such as the properties of individual fragments (e.g., the polarization energy) as well as the pair interaction energies between fragments [76,77]. This can be of considerable use to fragment-based drug discovery [78,79].    Figure 3. Cross region fragmentation. The fragmentation procedure shares an atom (here C 1 and C 5 is the shared atom) between two neighboring and covalently bonded fragments. Even though these fragments are in separate regions, they still share an atom across that region as illustrated.    Calculated reaction barriers for Chorismate Mutase S15FD3 S15FD1 S20FD3 L15FD3 S15FD3_FMO Figure 8. EFMO-RHF/6-31G(d) barrier for chorismate mutase. S15FD3 and S15FD3 FMO are EFMO:S and FMO:S, respectively, both with R resdim = 1.5, and the dimer approximation in region b (Equation 11). S15FD1 is similar to S15FD3 but without the dimer approximation in region b. S20FD3 is also similar to S15FD3 but with R resdim = 2.0, instead. Finally, L15FD3 is EFMO:L with R resdim = 1.5, and the dimer approximation (FDD) in region b.    Reaction barriers of chorismate mutase calculated with different levels of theory. R resdim is unitless. The reaction coordinates for the reactant, transition state and product are R R , R TS and R P , respectively and given inÅ, barrier height of the transition state E TS−R and overall reaction energy E P−R in kcal/mol. T rel are relative timings to EFMO-RHF/6-31G(d) using the EFMO:S model with the fully minimized reaction coordinate on the trajectory subject to harmonic constraints. T full rel are for the entire path. The reaction coordinates inÅ for the reactant, transition state and product are R R , R T S and R P , respectively. The barrier height of the transition state E TS−R and the overall reaction energy E P−R are in kcal/mol. The reaction coordinates inÅ for the reactant, transition state and product are R R , R T S and R P , respectively. The barrier height of the transition state E TS−R and the overall reaction energy E P−R are in kcal/mol. Figure S1. Reaction barrier calculated at the MP2/cc-pVDZ:EFMO-RHF/6-31G(d) level of theory for EFMO:S using R resdim = 1.5 and FDD (modfd=3). This snapshot shows the effect of not having enough flexibility in the active region around the substrate. Figure S2. Two different starting geometries with chorismate and Phe57 shown as sticks from the MD simulation. A) shows a configuration which results in a successful reaction path and B) a configuration which results in an unsuccessfull reaction path (see Figure S1). The position of Phe57 coupled with a placement in the buffer region (b) makes it unable to move to accomodate the conversion of chorismate to prephenate. Table S1. Complete listing of all residues in the protein model (PDB: 2CHT) along with their protonation state after being protonated using the PDB2PQR tool. Table S2. Reaction barriers of chorismate mutase calculated using ONIOM for EFMO:S using R resdim = 1.5 and FD (modfd=1). Table S3. Reaction barriers of chorismate mutase calculated using ONIOM for EFMO:S using R resdim = 2.0 and FDD (modfd=3). Table S4. Reaction barriers of chorismate mutase calculated using ONIOM for EFMO:L using R resdim = 1.5 and FDD (modfd=3). Table S5. Reaction barriers of chorismate mutase calculated using ONIOM for FMO2:S using R resdim = 1.5 and FDD (modfd=3).