Hybrid RHF/MP2 Geometry Optimizations with the Effective Fragment Molecular Orbital Method

The frozen domain effective fragment molecular orbital method is extended to allow for the treatment of a single fragment at the MP2 level of theory. The approach is applied to the conversion of chorismate to prephenate by Chorismate Mutase, where the substrate is treated at the MP2 level of theory while the rest of the system is treated at the RHF level. MP2 geometry optimization is found to lower the barrier by up to 3.5 kcal/mol compared to RHF optimzations and ONIOM energy refinement and leads to a smoother convergence with respect to the basis set for the reaction profile. For double zeta basis sets the increase in CPU time relative to RHF is roughly a factor of two.

For fast geometry optimizations, FMO with the frozen domain and dimers (FDD) [20] has been proposed and EFMO/FDD has been used to map the reaction path of the conversion of chorismate to prephanate in Chorismate Mutase at the RHF level for geometry optimization combined with ONIOM for energy refinement [21]. Chorismate Mutase has also been studied extensively by many groups. Particularly, the group of Mulholland has invested considerable amount of resources to evaluate the catalytic effect of Chorismate Mutase [22][23][24][25][26][27][28]. Other related QM/MM work on Chorismate Mutase includes FMO energetics refinement by Ishida et al. [29] and the work of Claeyssens et al. [30] who used linear scaling coupled cluster methods to obtain the reaction barrier on structures optimized using a QM/MM approach with density functional theory used to describe the QM region. This study specifically underlines the importance of energy corrections at a correlated level of theory, which in turn requires reliable optimization of the molecular structure. Our recent study [21] emphasizes that in addition to a high quality energy evaluation on the reaction complex, a conformational sampling of the reaction complex geometry is needed in order to obtain a reliable energy barrier, since the reaction barrier can fluctuate by up to 15 kcal/mol between geometry optimizations on different starting conformations.
Our previous methodology was to estimate the reaction barrier in Chorismate Mutase using an EFMO-RHF geometry optimization with an ONIOM MP2 energy correction [21]. It was clear, however, that the RHF based optimization did not always lead to a reliable MP2 correction. For example, the MP2 energy did not converge in a smooth manner with respect to the basis set size. One likely explanation is, that it is in general not optimal to deal with reaction complexes for which the structure is calculated using an uncorrelated wave function method such as RHF.
In this work, we have created a method to obtain a correlated (MP2 level) reaction complex geometry using the EFMO method on a large system, and show a calculation of the reaction barrier in Chorismate Mutase as an example.
We extend EFMO/FDD to enable treatment of only one fragment at the MP2 level and show that it is a good compromise between efficiency and accuracy. Note that the effects of conformational sampling are not investigated in this paper. This paper is organized as follows: First we present the EFMO method and our extension to the EFMO energy and gradient. Second we compare our method to similar ONIOM calclations on the reaction barrier of the conversion of chorismate to prephanate in Chorismate Mutase.

Theory
The basics of EFMO can be summarized as follows. The system is divided into fragments and we use the adaptive frozen orbital technique (AFO) [31] to treat fragment boundaries by freezing the molecular orbitals corresponding to detached covalent bonds. Ab initio calculations of fragments are carried out without embedding, and the total polarization is evaluated using fragment polarizabil-ities. In the next step, ab initio calculations of dimers are carried out to account for two-body quantum effects such as the charge transfer between fragment pairs within a cut-off distance, R resdim . The total energy in the two-body EFMO expansion is then: Here E 0 I is the quantum mechanical gas-phase energy of each monomer fragment, DE 0 IJ is the quantum mechanical two-body polarization energy between two fragments, E POL IJ is the classical two-body polarization energy between two fragments, and E POL tot is the classical polarization energy of the system.
In the frozen domain method (FD) [20], the geometry of the molecular system is optimized only for a smaller subsystem called the active domain, while the atoms in the rest of the system are fixed.
For a given molecular system, we define two domains F (''frozen'') and A (''active''). Domain F is defined as all atoms having a frozen geometry and domain A is defined as all atoms whose positions are optimized. Each domain is further divided into a number of molecular fragments. In the frozen domain and dimers methods (FDD) [20], the domain with frozen geometry is further divided as a polarizable domain with frozen geometry, b and a domain with frozen geometry and fragment electron densities that are not updated after they have been calculated the first time. The EFMO energy [21] is then given by: where E 0 b and E 0 A are the internal energies of domains b and A, respectively, E 0 F=A is the interaction between domains F and A, E 0 A=b is the interaction between domains A and b and E POL tot is the classical total polarization energy of the whole system. In our EFMO-RHF:MP2 extension, we evaluate the internal energies of domain b and A at the RHF level. Furthermore, we specify a single fragment H (''high level'') from the active domain to be treated at the MP2 level of theory (see Fig. 1 for a schematic overview). The total EFMO-RHF:MP2 energy is then given as where E 0,MP2 H[A is the MP2 correlation energy of fragment H. The corresponding EFMO energy gradients of each domain in the FDD approximation: This gives the following EFMO-RHF:MP2 energy gradients:

Methods
All calculations were carried out in a development version of GAMESS [32] where FMO and EFMO are implemented [33].
Starting structures for Chorismate Mutase were obtained from Steinmann et al. [21] who prepared the structures following Claeyssens et al. [28]. The preparation can be summarized as follows: The experimental structure of Chorismate Mutase was obtained from the Protein Data Bank (PDB code: 2CHT) and protonated using PDB2PQR at pH 7. The inhibitors were manually replaced with Chorismate in the reactant state. The complexes were simulated in GROMACS with the CHARMM27 force field at 300K. The structure was then prepared for fragment based calculations in FragIt. [34] All residues with an atom within a distance of 2.0 Å from any atom in chorismate were assigned to the A (active) domain. All atoms in the prephanate/chorismate reaction complex were assigned to the H fragment. See The adiabatic mapping was carried out using the presented EFMO-RHF:MP2 gradient with 6-31G(d) basis set on all atoms. Two additional runs were also carried out, in these cases with the cc-pVDZ or cc-pVTZ on chorismate and 6-31G(d) on remaining atoms. The EFMO-RHF/6-31G(d):MP2//cc-pVTZ reaction path was obtained starting from the converged structures in the EFMO-RHF/6-31G(d):MP2//cc-pVDZ reaction path.
The RESDIM keyword was set to 1.5 and the optimization convergence criterion was set to 5:0 : 10 {4 Hartree/Bohr. Each step of the reaction path was obtained by imposing harmonic constraints on R 12 and R 13 with a force constant of 500 kcal/Å . The FDD approximation was enabled by setting MODFD = 3 in all calculations. GAMESS input files to calculate the reaction path at the EFMO-RHF/6-31G(d):MP2/cc-pVDZ level of theory in File S1.
Timings for the optimization procedure were carried out on 80 Intel Xeon X5550 CPU cores distributed across 10 nodes and the Generalized Distributed Data Interface (GDDI) was used to run the code in parallel [35].

Transition State Structure
We define the reaction coordinate similarly to Claeyssens et al. [28] as the difference in bond length between the breaking O2-C1 bond and the forming C4-C3 bond in chorismate, i.e. R~R 21 {R 43 (see Fig. 3). The reaction coordinate of the transition state was found to be 20.17 Å using the 6-31G(d) basis set on the MP2 fragment and 20.43 Å for both the cc-pVTZ and cc-pVDZ basis set reaction paths. The convergence with respect to basis set is in good, quantitative agreement with the coordinates obtained by Claeyssens et al. [30] using a QM/MM approach, treating the reaction complex at the LCCSD(T0) level of theory (20.4 Å ). In comparison, the corresponding MP2:RHF ONIOM calculations by Steinmann et al. [21] resulted in transition state reaction coordinates of 0.13, 20.36, and 0.13 Å with the cc-pVDZ, cc-pVTZ and cc-pVQZ basis sets used in the MP2 calculation, respectively.
The reaction coordinate found using MP2 to optimize the reaction complex substantially improves obtained the transition state structure compared to our MP2:RHF ONIOM approach and is in good agreement with a high-level calculation [30].

Reaction Barrier
Electronic energy barriers and reaction coordinates for the transition state are given in Table 1 and Fig. 4. We find the electronic energy barrier at the EFMO-RHF/6-31G(d):MP2/6-31G(d) level of theory to be 20.95 kcal/mol. Increasing the size of the basis set on the MP2 fragment decreases the barrier to 19.21 kcal/mol with the cc-pVDZ basis set and 18.34 kcal/mol with the cc-pVTZ basis set.  .79 kcal/mol with the cc-pVDZ, cc-pVTZ and cc-pVQZ basis sets, respectively. In contrast to the ONIOM approach, we find that for increasing basis set sizes, the electronic energy barrier is systematically reduced.
The experimental enthalpy barrier has been measured to be 12.7 kcal/mol. [28,36]. The large difference between the calculated reaction barrier and the experimentally measured barrier is likely caused by lack of conformational sampling and the relatively small size of the active geometry region, A. We have previously shown [21] that the reaction barrier can fluctuate by up to 15 kcal/mol between different conformational samples. A more accurate estimation of the reaction barrier (compared to the experimental value) using this approach would thus likely require averaging the barrier over a large number of conformational samples with a larger active region.

Reaction Energy
The energy difference between the product and reactant state is found to be 23.2 kcal/mol using the 6-31G(d) basis set on chorismate. Increasing the basis set to cc-pVDZ and cc-pVTZ on chorismate decreased the reaction energy to 26.83 kcal/mol and 26.17 kcal/mol, respectively. The ONIOM approach by Steinmann et al. found the reaction energy to be between 25.48 kcal/ mol to 20.82 kcal/mol. However, in the ONIOM approach increasing the basis set from cc-pVTZ on chorismate increased the reaction energy from 25.48 kcal/mol to 21.17 kcal/mol. We find that all three basis sets are in close agreement, and only a 0.7 kcal/mol difference between the cc-pVDZ and cc-pVTZ reaction paths.
The reaction energy calculated using the presented method is around 26 kcal/mol when the cc-pVDZ or cc-pVTZ are used in the MP2 calculation, which contrasts our earlier calculations where the reaction energy systematically increases as the basis set size is increased (see Table 1).
As we discuss in the previous subsection, a more accurate estimation would likely require averaging this values over a large number of conformational samples and possibly a larger active region.

Timings
Running on 80 cores distributed on 10 compute nodes and using the default compute node load balancing scheme, the average time for a geometry optimization step was 760s at the  EFMO-RHF/6-31G(d) level of theory [21]. For the EFMO-RHF/ 6-31G(d):MP2/6-31G(d) calculation, this time increased to 1526 s per step. Increasing the basis set on the MP2 part of the system to cc-pVDZ and cc-pVTZ increased the time to 1967 s and 18845 s, respectively (see Table 2). The large increase in calculation time from cc-pVDZ to cc-pVTZ was found to be due to sub-optimal load balancing in GDDI during the MP2 part of the calculation. Subsequently, one optimization using the cc-pVTZ was carried out, in which the calculation of the MP2 fragment energy and gradient was distributed across all 10 nodes. This reduced the average gradient step time from 18845 s to 10911 s. In other words, the slower calculation used 10 GDDI groups in the second (MP2) layer, whereas the faster one had 1 group, during the monomer step. The latter run is more efficient because the MP2 fragment was calculated by all 10 nodes, whereas in the former only by 1 node. A calculation of the reaction at the EFMO-RHF/6-31G(d):MP2/cc-pVDZ level is thus about 2.5 times more expensive than the same calculation at the EFMO-RHF/6-31G(d) level of theory. But as we have shown, the calculated reaction coordinate is essentially the same as that found using a coupled cluster approach [30] when applying the EFMO-RHF/6-31G(d):MP2/cc-pVDZ level of theory.

Conclusion
We have implemented an scheme for optimizing a reaction complex using a correlated method in the EFMO/FDD approximation. [21] Our method is computationally efficient when a moderately sized basis sets is used on the correlated fragment. At the EFMO-RHF/6-31G(d):MP2/cc-pVDZ level of theory, the method is about 2.5 times slower than the same calculation at the EFMO-RHF/6-31G(d) level. However, the use of a correlated method (MP2) in the optimization step substantially improves the calculated transition state compared to similar uncorrelated optimization with a subsequent MP2 ONIOM energy correction.
The modest increase in computational cost compared to an similar uncorrelated calculation makes the presented method very attractive for cases where electron correlation is essential for a correct and reliable geometry optimization. The method is a special case within the FMO or EFMO approximations and thus requires no further approximations, such as those in the ONIOM method, and is carried out in a single calculation in the GAMESS program.
The method is thus a general method to obtain geometry optimized correlated structures inside large molecular systems when using FMO or EFMO. For example the EFMO method has been used to estimate hydrolysis barriers for the enzyme Bacillus circulans xylanase [37].
Our EFMO-RHF:MP2 approach does not achieve chemical accuracy in predicting enthalpy barrier of the conversion of chorismate to prephanate in Chorismate Mutase, but as we show in our previous work this is likely due to the lack of structural sampling [21].
In conclusion, we have demonstrated that our method serves as a rigorous and viable alternative to the widely used ONIOM approach. Source code to add the method to GAMESS can be found at: https://github.com/andersx/efmo-rhf-mp2.

Supporting Information
File S1 GAMESS input files to calculate the reaction path at the EFMO-RHF/6-31G(d):MP2/cc-pVDZ level of theory.