Carbinolamine Formation and Dehydration in a DNA Repair Enzyme Active Site

In order to suggest detailed mechanistic hypotheses for the formation and dehydration of a key carbinolamine intermediate in the T4 pyrimidine dimer glycosylase (T4PDG) reaction, we have investigated these reactions using steered molecular dynamics with a coupled quantum mechanics–molecular mechanics potential (QM/MM). We carried out simulations of DNA abasic site carbinolamine formation with and without a water molecule restrained to remain within the active site quantum region. We recovered potentials of mean force (PMF) from thirty replicate reaction trajectories using Jarzynski averaging. We demonstrated feasible pathways involving water, as well as those independent of water participation. The water–independent enzyme–catalyzed reaction had a bias–corrected Jarzynski–average barrier height of approximately for the carbinolamine formation reaction and ) for the reverse reaction at this level of representation. When the proton transfer was facilitated with an intrinsic quantum water, the barrier height was approximately in the forward (formation) reaction and for the reverse. In addition, two modes of unsteered (free dynamics) carbinolamine dehydration were observed: in one, the quantum water participated as an intermediate proton transfer species, and in the other, the active site protonated glutamate hydrogen was directly transferred to the carbinolamine oxygen. Water–independent unforced proton transfer from the protonated active site glutamate carboxyl to the unprotonated N–terminal amine was also observed. In summary, complex proton transfer events, some involving water intermediates, were studied in QM/MM simulations of T4PDG bound to a DNA abasic site. Imine carbinolamine formation was characterized using steered QM/MM molecular dynamics. Dehydration of the carbinolamine intermediate to form the final imine product was observed in free, unsteered, QM/MM dynamics simulations, as was unforced acid-base transfer between the active site carboxylate and the N–terminal amine.


Introduction
A central tenet of modern biology is the almost completely immutable nature of the genetic information content of DNA. This results from substantial stability over time of both the DNA sequence and structure, despite an enormous variety of physical, chemical, and biological agents capable of compromising its integrity by structurally damaging the DNA. The stability is due, in no small part, to enzymatic DNA repair pathways evolved in response to the selective pressure to maintain this genomic information in an unaltered state, and these repair systems can be categorized by their substrates, structures, and chemical catalytic scenarios. For example, the enzymes of some of these pathways work on the mostly seriously distorting types of DNA damage utilizing enzymes whose structures have been evolu-tionarily maintained to some degree (e.g., the nucleotide excision repair pathway). Other pathways repair diverse types of base damage with similar catalytic scenarios despite the enzyme proteins themselves being structurally and evolutionarily unrelated. DNA base excision repair (BER) enzymes are of this latter type, and T4 pyrimidine dimer glycosylase is a member of this class [1]. BER enzymes initiate repair by cleaving the nucleotide glycosyl bond (glycosylase step), thereby removing the damaged base (or in some cases inappropriate base, e.g, uracil DNA glycosylase). The damaged DNA substrate specificity for BER enzymes primarily resides in this initial glycosylase step with lesser specificity in subsequent steps. One product of this reaction type is a DNA abasic site that must itself be repaired as it is toxic to the replication and transcriptional activities of the cell [2].
The various BER enzymes fall into two broad categories based on mechanism: those in which reactions subsequent to glycosyl bond scission may involve lyase-type DNA sugar-phosphate backbone cleavages (b-elimination and, possibly, d-elimination), and those which leave the DNA backbone intact, e.g., uracil DNA glycosylase. T4PDG initiates repair at pyrimidine ultraviolet photodimers [2,3] and furnishes an example of the steps, subsequent to glycosyl bond cleavage, characteristic of the first kind of BER enzyme (Fig. 1). The lyase steps, band d-elimination, proceed via DNA deoxyribose C1 0 -imine (Schiff base) intermediates in which the amine is the N-terminal threonine a-amine. The protonated imine is thought to act as an electrophilic catalyst [4] for the sequential band d-elimination reactions [2,3,5].
In addition to paradigmatic imine intermediates formed at sites of glycosylase reactions, the lyase-capable BER glycosylases can also form identical imines as products of Schiff base formation between the abasic site ring-opened deoxyribose aldehyde and the catalytic active site amine. In Fig. 1, E-glycosylase substrate site and E-abasic site designate, respectively, Michaelis complexes for the glycosylase and abasic site reactions leading to the key and characteristic Schiff base intermediate. This relatively long-lived imine can be reduced by borohydride to form a stable complex suitable for analysis and crystallography, for example see Golan, et al. [6] and references therein. In addition to the BER glycosylase/lyase enzymes, reduced imine complexes in other systems featuring covalent enzyme-substrate imine intermediates have been studied by crystallography, e.g., deoxyribose-5phosphate aldolase [7].
The T4PDG-DNA reduced imine structure 2FCC [6] models some aspects of the product of the glycosylase step and can be compared to 1VAS [8], a structure generally assumed to approximately model the glycosylase Michaelis complex. The protein components of these two structures were shown to be quite similar, as were those parts of the DNA not close to the active site. Differences in the DNA structures in the active sites largely reflected the ring-opened abasic site sugar derivative in 2FCC and the pyrimidine dimer nucleoside with a standard ring-closed deoxyribose moiety in 1VAS. While illustrating some of the structural aspects of enzyme catalysis, analyses of these two structures did not lead to a detailed mechanistic interpretation of the reaction itself, even though they modeled stages before and after the step leading to the key covalent intermediate. Since these enzymes catalyze reactions occurring on time scales of milliseconds to 10s of seconds [9], experimental exploration of mechanisms has been difficult. Most reviews of these enzyme mechanisms invoke speculations extrapolated from studies of model compound reactions in bulk solutions and environments quite different from the active site. Dynamic computational simulations of the reaction based on one or both of the crystal structures may suggest useful mechanistic hypotheses not evident from study of the static structures themselves or easily accessible experimentally. Of course interpreting mechanisms of the pyrimidine dimer glycosylase reaction based on these simulations of its abasic site analog should be confined to homologous reactions in the two systems, e.g., carbinolamine dehydration. Beyond that, these simulations should only be used to suggest hypotheses for the glycosylase reaction.
The initial reaction of an unprotonated amine with an aldehyde forms a Schiff base (imine) carbinolamine. Subsequent dehydration of the carbinolamine yields the final Schiff base product. The E-Abasic site L JSchiff base intermediate step of Fig. 1 is composed of these two sequential reactions. The simulated reactions reported here are enclosed within the rectangular box in Fig. 1. Earlier studies simulating the glycosylase step of T4PDG, based on the 1VAS structure and using a single distance (the glycosylase bond length) as the reaction coordinate, were unsuccessful (unpublished). These simulations showed such chemically unreasonable reactions as sugar C-C bond breakage. Trying to understand these unreasonable results suggested a hypothesis that water might be involved in the glycosylase reaction. Schiff base formation in aqueous solution involves transition states containing water molecules functioning as both nucleophilic and proton transfer reagents [10]. The involvement of water in the deoxyribose-5-phosphate aldolase reaction may be somewhat analogous to the aqueous solution reaction, since a conserved water was observed appropriately positioned to participate in proton transfer steps [7]. However, in the active site of T4PDG, the catalytic Glu22 residue is a possible participant in proton transfer steps [2,3]. Both the enzyme N-terminal a-amine and Glu22 may or may not be protonated, and these two chemical species are the only groups in close proximity to the site of imine formation whose pK a values identify them as likely general acids or bases capable of participating in proton transfer events or in proton exchange with bulk water [6,11]. Further, Fuxreiter et al. [11] have shown that formation of the enzyme-DNA complex shifts the pK a values of active site residues, leading to water penetration of the active site being less favored compared to the unbound enzyme. In turn, this should shift the equilibrium to make the unprotonated amine/protonated carboxylate state of the active site more probable compared to a solvent exposed active site. For purposes of this investigation, the overall charge in the active site was taken to be zero, i.e., the protonation states were either unprotonated amine/protonated carboxylate or protonated amine/unprotonated carboxylate. A natural question then arises from the experimental demonstration [6] of an imine intermediate in the T4PDG reaction: does water participate in the enzymecatalyzed reactions? A definitive experimental answer cannot be easily obtained, but high quality coupled potential QM/MM simulations of the reaction may lead to insights into this question. At the least such studies might suggest details of similar methods to use when simulating the T4PDG glycosylase reaction. To our knowledge, the involvement of water has not been contemplated in earlier reviews or discussion of experimental or theoretical results for these reactions.
In any QM/MM study, the level of theory for the quantum treatment is of primary importance. DFTB, the Density Functional Theory-based, tight binding Hamiltonian, including the Self Consistent Charge version, SCC-DFTB [12,13] was used in these investigations [14][15][16]. This quantum treatment, in a QM/MM setting, has been successfully employed to investigate difficult questions of enzyme mechanism [17,18], and as pointed out by Senn and Thiel [18] ''promises, within the validity domain of the parametrization, an accuracy comparable to density functional theory at the cost of semiempirical methods''. It should be mentioned that the system size amenable to ab initio molecular dynamics (MD) is very much smaller than that for the QM/MM methods used here. The truncation necessary to reduce the system to a size appropriate for ab initio MD would introduce errors and biases requiring extensive preliminary characterizations. These studies would be in computational resource competition with generating increased numbers of replicates for the averaging procedures used in the extraction of potentials of mean force.
Modern molecular dynamics simulation software and the associated force fields and integrators are parameterized such that the effective rate of exploration of phase space is directly comparable to real time. This poses a problem if the time scales of interest are longer than tens of nanoseconds since the simulation computational resource requirements are prohibitive unless the systems are very simple. Several methods have been developed to work around these limitations. Steered MD (SMD) accelerates the time scales of simulations by forcing the trajectory along small subsets of the many degrees of freedom of the complex systems. Since the simulation is forced along these few coordinates and is energetically distorted by this forcing, some means of correcting properties calculated from such trajectories must be used. The Jarzynski relationship [19] allows equilibrium physical properties to be recovered from averages over replicates of nonequilibrium (i.e., directed) analogs of the equilibrium processes [20][21][22][23][24].

Results
Steered MD allows reactions to be simulated that are so improbable on the time scale of molecular dynamics that they would effectively never happen during a simulation. This approach takes advantage of the Jarzynski equality which allows equilibrium properties to be obtained from averages over nonequilibrium processes [25]. Following the approach of Bustamante et al. [19], consider a system at temperature T whose equilibrium state is determined by a control parameter x. Let the system initially be in state A with control parameter x A . If the system is evolved via a nonequilibrium process by changing x along a given path x(t) to some final value x B , the Jarzynski relationship states that where DG is the free energy difference between equilibrium states A and B, W is the work done along the trajectory x(t), k B is Boltzmann's constant and S : T denotes an average over an infinite number of such nonequilibrium experiments repeated under the protocol x(t). While the derivation of the Jarzynski relationship invokes fairly deep statistical physics, the effect of using Jarzynski exponential averaging for PMF determination along a path is to minimize the impact of higher energy (and, therefore, lower probability) steered trajectories in favor of the lower energy (i.e., more probable) trajectories. The definition of the Jarzynski relationship is as an average over an infinite number of trajectories. Averaging over a finite number of trajectories introduces a truncation bias in the computation. Gore, et al. [24] have introduced methods that correct for that bias to a large degree, and have also furnished equations for the mean square errors associated with such a bias-corrected Jarzynski average. These methods have been used in place of a more classical definition of confidence limits. Further details are given in the file S4.
In this study, state A was a snapshot from an equilibrium MD trajectory carried out with a collective progress variable (CV) restrained to its initial value. Multiple such starting structures were obtained by periodically sampling the restrained trajectory. To avoid spurious correlations between SMD replicates started with these snapshots, they were not only obtained from different points in the restrained ensemble trajectory, but also simulated with different values of the pseudo random number generator (PRNG) seed (the system clock in microseconds was used to generate the seed value). The collective variable (a linear combination of interatomic distances) constituted x(t), and it was varied from initial to final values under a harmonic restraint. Since in this study covalent bonds were being broken, very stiff restraints were required (1000 kcal mol Preliminary studies with restraints in the range of 400{ 600 kcal mol straight lines with slight scatter (governed by the finite width of the underlying harmonic restraint parabola) but no large discontinuities. In all cases, the final value of the CV was attained at the end of the SMD simulations to within an interval consistent with the magnitude of the steering restraint. Fig. 2 illustrates the initial active site quantum region structure for the SMD simulation analyzed in Fig. 3. The atom names are those referred to in the collective variable definition in Fig. 3, is the distance between atoms A and B. The starting value of the SMD CV was its value at the end of the 1 ns QM/MM relaxation run (8.92 Å ). The ending value (3.52 Å ) was taken to be the sum of the bond lengths: d 1 (~1:50)zd 2 (~1:01)zd 3 (~1:01)A 0 . In this and similar simulation setups, the quantum water molecule(s) were restrained to be within a flat well potential spherical volume around In preparation for SMD studies of imine carbinolamine dehydration, simulations were carried out with a quantum water restrained in the active site by a constant ''collective variable''. The starting structure for these simulations was obtained from one of the product structures in the carbinolamine formation study. In these preliminary simulations to generate ''snapshots'' for SMD, carbinolamine dehydration was sometimes observed without SMD forcing. To follow up on these observations, three simulations were carried out with identical initial structures. In one of the simulations, no dehydration was observed, in a second, dehydration occurred with the participation of the restrained water as an intermediate proton transfer reagent, and in a third, dehydration occurred by direct transfer of the protonated Glu22 hydrogen to the carbinolamine oxygen without participation of the restrained water. Fig. 6 is a stereogram of the quantum region for these unforced carbinolamine dehydration simulations, and this diagram corresponds to the active site analyzed in Figs. 7 and 8. As this trajectory was not ''steered'' between differing endpoints, the CV did not change over its course. The starting (and ending) CV~d 3 zd 44 :235 A 0 . Note that in this kind of restraint, the positions of the individual atoms may change as only the sum of interatomic distances is restrained. The atoms are shaded as in Fig. 2.
In AMBER, residue and atom names do not change over the course of a simulation. At the end of the run analyzed in Fig. 7, the protonated Glu22 carboxylate hydrogen (HE2) was found associated with the water oxygen (O wat ) initially present in the active site, as was one of the original water hydrogens (H2 wat ), the   [39]. Such barrier heights apply only to the pathway followed in that trajectory, and the individual work functions are not good estimates of the pathway DG function for the reaction. The purpose of the exponential averaging in the Jarzynski algorithm is to properly weight the individual work functions so that functions with large peaks contribute only to a degree appropriate to their probability of occurrence. The estimate of DG along the reaction pathway should use the Jarzynski average or some other comparable function such as its bias corrected version (the better choice) [24]. doi:10.1371/journal.pone.0031377.g003   donor to the acceptor, with 0 indicating the hydrogen at the midpoint of the donor, acceptor interatomic distance. Progress along the trajectory is coded in color (as defined in the palette). The two proton transfers associated with water-mediated carbinolamine dehydration occur within 2.5 ps of each other, which is the trajectory sampling interval.
The abscissa of Fig. 7, Panel B is the same as the ordinate of Panel A, whereas the ordinate reflects the ''transfer of the O1 0 oxygen'' (plus its bonded H1 0 hydrogen) to the unconstrained water hydrogen (H1 wat ), forming the water molecule product of the dehydration reaction. As in Panel A, both reactions occur within the sampling dead time of each other. In this panel, as in Fig. 8, Panel B, a proton transfer coordinate is plotted against the ''transfer'' of the acceptor oxygen ''to a proton'' to form water. In the initial part of the trajectory, i.e., in the negative regions of both coordinates, these two movements are seen to be almost completely reciprocal to each other, as reflected in the relatively low scatter in the points.
Panel C shows a trajectory course view of the ''transfer of the O1 0 oxygen'' (and its bonded H1 0 hydrogen) to the water hydrogen to form the water product. The ordinate in this panel is the same as that of Panel B, and measures the oxygen transfer (the dehydration reaction itself). The red curve shows the distance of the water hydrogen to the dehydrating oxygen. Note that the black curve crosses 0 (the oxygen is at the numerical midpoint of the coordinate) at the same time the H1 wat {O1 0 distance becomes equal to and maintains a typical water O{H bond length.
In Fig. 8, Panel A, the black line is the distance d 1 {d 2 in Fig. 6 and measures proton transfer from the protonated Glu22 carboxylate (HE2) to the carbinolamine oxygen (direct general acid catalyzed carbinolamine dehydration). The red line is d 6 , the donor oxygen, acceptor oxygen distance for comparison. The proton transfer, at *65% of the trajectory duration, is accompanied by relative stability in the donor-acceptor distance. Eventually, separation of the oxygens due to diffusion diminishes the influence on the proton of the departed carboxylate oxygen at *88% of the trajectory duration. As mentioned above, interrogation of the product structure atom names unambiguously identified the simulated reaction course. Panel B compares the timing of the HE2 proton transfer from OE2 to O1 0 to that of the ''O1 0 transfer'' from C1 0 to HE2. Both reactions occur within one sampling frame.  In addition to spontaneous dehydration of the carbinolamine, unforced, direct proton transfer of the protonated Glu22 hydrogen to the unprotonated N-terminal amine nitrogen was observed. To investigate this event, Fig. 9 shows the quantum region active site used in three simulations. The structure is that of the carbinolamine formation reactants. All were started from this same structure (which had been observed to undergo this unforced proton transfer event), and with the same restrained quantum water: CV~d(N,C1 0 )zd(H1 wat ,O1 0 )zd(O wat ,HE2~8:953 A 0 at both the beginning and end of these unforced simulations. Spontaneous proton transfer occurred in one of the three simulations, but without participation of the quantum water. The atom names referred to in the variable definitions in Fig. 10 are given in Fig. 9. In Fig. 10, the black curve shows the reaction coordinate for the transfer of the HE2 protonated Glu22 hydrogen to the N-terminal amine nitrogen over the course of the trajectory. The proton transfer event occurred slightly after the midpoint of the trajectory duration. The red curve measures the distance between the donor and acceptor heavy atoms, and shows the interatomic distance between them decreasing directly before proton transfer. The potential for spontaneous proton transfer occurred when diffusion of the heavy atoms brought them into close proximity. Subsequent to the proton transfer event, the two curves track each other closely. Two other replicates in this series did not display spontaneous proton transfers of this type, showing only the two proton-heavy atom distances oscillating about their initial values (data not shown).

Discussion
The question was posed in the introduction whether water might participate in proton transfer reactions in the active site of T4PDG. The PMF determinations in Fig. 3 show a biascorrected Jarzynski-average barrier height of 15 kcal mol {1 (62:8 kJ mol {1 ) for the forward (carbinolamine formation) and 19 kcal mol {1 (79:5 kJ mol {1 ) for the reverse reactions when water participates in the reaction as an intermediate proton transfer reagent. Fig. 5 shows a bias-corrected Jarzynski-averaged barrier height of 6:5 kcal mol {1 (27:2 kJ mol {1 ) for the forward and 44:5 kcal mol {1 (186:3 kJ mol {1 ) for the reverse reaction when the proton transfer for the forward reaction is directly from the protonated Glu22 general acid to the accepting aldehyde oxygen. The reverse reaction barrier height in Fig. 5 has a large associated uncertainty. This is because of the large work function scatter after (to the left of) the Jarzynski average maximum. A component of the bias correction and MSE J2 calculation [24] subtracts the Jarzynski average from the the mean of the work functions for all trajectories (this difference is the dissipated work). The large variation in the work functions is manifest in the large square root of the MSE J2 [24] (RMSE J2 .) These two sets of barrier heights are, technically, not directly comparable because of differing definitions of the quantum region in the two types of simulations. Despite this caveat, a speculative interpretation of these differences might imply the greater barrier height of the water-mediated forward reaction involves a significantly smaller probability of correctly aligning the larger number of reactants. The barrier heights for the reverse reactions make them both rather improbable, even taking into consideration the large uncertainty in the reaction without the participation of water. These large barrier heights are consistent with turnover for these reactions occurring on time scales of milliseconds to 10s of seconds [9]. In addition to the scenarios simulated here, other possible general acids acting as proton donors should be considered. Simulations (not shown) of direct proton transfer from the protonated N-terminal amine to the aldehyde oxygen accompanied by carbinolamine formation resulted in very high barriers. However, several other scenarios for this step might be envisioned, e.g., indirect proton transfer from the amine via an intermediate water. This study by no means exhausts these possibilites. Nevertheless, given the approximations in the approach used, and given the observation of unforced direct and water-mediated carbinolamine dehydration reactions (discussed below), carbinolamine formation in the T4PDG active site, both with and without integral proton transfer via an intermediate water, seems feasible. However, the reaction without water would be expected to be more probable. Parenthetically, inclusion of nuclear quantum effects (NQE) would be expected to lower the calculated barrier heights since additional reaction pathways would be made available (e.g., proton tunneling). In general, the exact magnitude of such contributions cannot be predicted a priori. Nonetheless, the approach used here should be useful for measuring relative effects, as long as comparisons are only made between similar systems and with the understanding that barrier heights measured without NQE will be overestimated. Of course, if the objectives were calculations leading to comparisons with experimental rates, NQE must be included. However, no such experimental rates were available, and inclusion of NQE was beyond the scope of this project.
The frequent observation of unforced carbinolamine dehydration, with and without water-mediated proton transfer, was unexpected. This implies that the barrier height (at this level of representation) for such reactions would be close to the 300K thermal energy, or there might be no barrier. Since this study did not incorporate NQE [26], no attempt was made to quantitate such small effects. The proton transfer and C{O bond cleavage steps in carbinolamine dehydration appear to occur concomitantly within the 2.5 ps trajectory sampling uncertainty. There seems to be no long-lived intermediates in these partial reactions: see Fig. 7 and Fig. 8. Further simulation details are given in the file S3.
The effective pK a values for the acidic and basic groups within the active site govern the distribution of protonated and unprotonated species. A formal requirement for Schiff base formation is an unprotonated amine, but the possibility of scenarios such as proton transfer from a protonated amine to the aldehyde oxygen via an intermediate water (as mentioned above) makes even this ''requirement'' more complicated than might otherwise be expected. The possibility of low barrier proton transfer among the acidic and basic groups within the active site has been assumed, and the observation of an unforced version of this proton transfer (at least in the case illustrated by Fig. 10) supports that assumption.
QM/MM simulations of T4PDG-catalyzed reactions at a DNA abasic site revealed possibly significant features of the component carbinolamine formation and dehydration reactions. The PMF for carbinolamine formation was determined by SMD methods with and without participation of water as an intermediate proton transfer reagent. While PMF barrier heights did not completely rule out the feasibility of either scenario, at the level of representation used the barrier height without water involvement was approximately 6:5 kcal mol {1 (27:2 kJ mol {1 ) for the forward (formation) reaction and 44:5 kcal mol {1 (186 kJ mol {1 ) for the reverse reaction. The latter number has a large associated uncertainty. When an active site water served as a proton transfer reagent, the bias-corrected barrier heights were approximately 15 kcal mol {1 (62:8 kJ mol {1 ) in the forward reaction. The reverse height was 19 kcal mol {1 (79:5 kJ mol {1 ). These two cases should not be directly compared in detail since the quantum region definitions differed between them. Imine carbinolamine dehydration was observed to proceed during free QM/MM dynamics via two analogous scenarios: with and without involvement of an intermediate water proton transfer reagent (and without a requirement for SMD forcing in either case). Similar acid-base transfer of the protonated active site Glu22 proton to the N-terminal amine nitrogen was observed in one of three replicate free dynamics simulations. A restrained water was contained in the quantum region during these three simulations, but was not observed participating in covalent bonding changes during the reaction.
In summary, complex proton transfer events, some involving water intermediates, were studied in QM/MM simulations of T4PDG bound to a DNA abasic site. Carbinolamine formation using steering restraints was characterized, and the barrier height for formation was lower in the absence of water. Both the reverse reactions had large barrier heights, and these had large associated uncertainties as measured by the MSE J2 statistic of Gore, et al. [24]. In the case of the reaction without an intermediate water proton transfer reagent, the uncertainty was very large. Dehydration of the carbinolamine intermediate to form the final imine product was observed in free, unsteered, QM/MM dynamics simulations, as was unforced acid-base transfer between the active site carboxylate and the N-terminal amine.
Finally, these studies should serve as a reconnaissance for similar investigations of the T4PDG glycosylase reaction, identifying important details necessary for success simulating this more difficult reaction.

Materials and Methods
The initial structures for these simulations were modeled on the T4PDG, abasic site DNA, reduced imine Protein Data Bank (PDB) entry 2FCC (the A structure was used) [6]. Amino acid residue numbering in this document reflects the processed form of the enzyme protein, i.e., the N-terminal threonine is the second residue in the T4PDG gene sequence, but is referred to here as Thr1 since the various analytical programs in the AMBER suite [27] produce this type of PDB file by default. Details of the computational methods are provided in files S1, S2, and S3 with a synopsis of the most important methods and simulation parameters given here.

System preparation
New residue descriptors for the unprotonated and cis and trans protonated forms of the product imine formed between the T4PDG Thr1 amine and the ring-opened abasic site deoxyribose aldehyde, (R)-and (S)-configurations at the C1 0 chiral center of the protonated and unprotonated forms of the C1 0 carbinolamines, the ring-opened deoxyribose aldehyde and the unproto- nated form of Thr1 were prepared using the AMBER utility program, antechamber [28]. Since in QM/MM simulations these residues would be mostly within the quantum region, the AM1-BCC [29] partial atomic charge calculation scheme was used instead of the more demanding RESP approach [30]. Methods used to prepare the new residue descriptors are available in file S1, and the resulting descriptors and force field additions are available in file S6. Standard AMBER methods, as described in the file S2, were used to equilibrate the density and relax the explicitly solvated DNA-protein complex to the classical MD force field FF99SB [31] and stabilize the temperature at 300K.

QM/MM simulations
QM/MM simulations were carried out with SHAKE [32] turned off in the quantum region so that proton transfer reactions could be simulated. A 0.5 fs time step was used for the entire calculation to compensate for SHAKE being turned off. The quantum atoms were represented by the DFTB Hamiltonian [12,15,16]. The nonbonded cutoff was 9 Å for both regions. The pseudo random number generator seed was derived from the system clock to avoid serial correlations among the trajectories [33]. The electrostatics of the whole system within a periodic boundary representation was handled with a PME treatment of both the classical and quantum regions [13]. The QM/MM relaxation regime always included an additional 1 ns (2 000 000 steps) of dynamics after the system had undergone classical dynamics to allow the system to relax to the coupled potential force field. Temperature and pressure coupling were handled as described in the files S2 and S3, and other simulation details are available in these files.

Steered molecular dynamics
The AMBER molecular mechanics software suite has had the capability to do SMD using single distance or angle steered coordinates in several recent versions. With AMBER10, a new set of adaptively biased MD (ABMD) and related methods became available. Among other applications, the ABMD subsystem can be used to do SMD. The development of the AMBER version of this methodology has been reported [34], and capabilities to easily specify new ABMD collective variables (CV, also referred to as reaction coordinates) was designed into the software. In collaboration with Dr. Volodymyr Babin (North Carolina State University), the linear combination of distances (LCOD) ABMD variable was implemented to accommodate a requirement for a more flexible distance-based collective variable. LCOD is available as patches to released versions of AMBER9 and AMBER10 and is included in AMBER11 as released.
As an example of an LCOD variable, consider the linear combination of two distances: C 1 : jr i {r j jzC 2 : jr k {r l j, wherejr i {r j j is the distance between atoms i and j. An LCOD variable is specified by a list of atom numbers with an even number of elements and a list of signed real numbers corresponding to the coefficients, C i , in the linear combination of distances between the atoms taken as pairs. In most cases C i~+ 1. Atom numbers may be repeated as in, for example, C 1 : jr donorO {r H jzC 2 : jr H {r acceptorO j, C 1~1 , C 2~{ 1, with evolution along the collective variable specified by progress between negative (hydrogen closer to the donor oxygen) and positive (hydrogen closer to acceptor oxygen) limits. This variable would be analogous to, and a generalization of, traditional constrained reaction coordinates for proton transfer between donor and acceptor atoms [35]. The more generalized LCOD variables can also specify more sophisticated reaction progress variables [36] and as used in this communication.
Progress along the CV was forced by a harmonic restraint following a defined path. In this study the path was a line segment joining the end points. Since covalent bonds were being formed and broken, large restraint forces (1000 kcal mol {1 A 0 {2 or 4184 kJ mol {1 A 0 {2 ) were used.

Computation of potentials of mean force
Final potentials of mean force (PMF) were calculated by Jarzynski averaging [19][20][21][22][23][24] thirty SMD replicates of 2 000 000 steps (0.5 fs/step, 1 ns) duration. The initial structures for the replicates were prepared by periodically writing ''snapshots'' during a 1 ns QM/MM simulation carried out as above, except the collective variable was restrained to its initial value with the same harmonic restraint to be used for the SMD runs. These procedures increased the probability that the starting systems for the replicates all sampled the same ensemble. The harmonic restraint energies required to force the steered CV closely along its defined path were Jarzynski averaged, and the results were corrected for truncation bias due to using only a finite number of trajectories in the averages. The MSE J2 statistic was computed by the methods of Gore, et al. [24] as described in more detail in the file S4. The analytical and other software and methods for their use are described in file S5.

Supporting Information
File S1 Details of new residue descriptors for nonstandard residues in the simulated system. See file S6.