Structure of Type IIL Restriction-Modification Enzyme MmeI in Complex with DNA Has Implications for Engineering New Specificities

The creation of restriction enzymes with programmable DNA-binding and -cleavage specificities has long been a goal of modern biology. The recently discovered Type IIL MmeI family of restriction-and-modification (RM) enzymes that possess a shared target recognition domain provides a framework for engineering such new specificities. However, a lack of structural information on Type IIL enzymes has limited the repertoire that can be rationally engineered. We report here a crystal structure of MmeI in complex with its DNA substrate and an S-adenosylmethionine analog (Sinefungin). The structure uncovers for the first time the interactions that underlie MmeI-DNA recognition and methylation (5’-TCCRAC-3’; R = purine) and provides a molecular basis for changing specificity at four of the six base pairs of the recognition sequence (5’-TCCRAC-3’). Surprisingly, the enzyme is resilient to specificity changes at the first position of the recognition sequence (5’-TCCRAC-3’). Collectively, the structure provides a basis for engineering further derivatives of MmeI and delineates which base pairs of the recognition sequence are more amenable to alterations than others.


Introduction
Due to their exquisite selectivity, Type II restriction endonucleases (REases) are paradigms in the study of protein-DNA sequence recognition [1,2]. Approximately 4,000 have now been discovered [3], specific for a remarkable 365 different DNA sequences. Impressive as this number is, it represents only a small fraction of the total number of DNA sequences that could in principle be recognized. Attempts to increase the number of REase specificities by protein engineering have met with very limited success due both to our incomplete understanding of the molecular mechanism of recognition and to the proteins themselves, which inherently resist such changes [4][5][6][7][8][9], a property termed "immutability" [10]. Immutability stems from the circumstances under which these enzymes have evolved. REases occur mainly in prokaryotes (bacteria and archaea) in partnership with DNA-methyltransferases (MTases) of identical specificity that serve to protect the cell's own DNA from REase cleavage [2,11,12]. Together, the two enzymes form a restriction-modification (R-M) system that confers innate immunity against viruses and other infectious genetic elements. Unless compensated for by a corresponding change in the partner enzyme, a change in the specificity of either one is liable to be detrimental due to cleavage of the host's DNA at unprotected sites [10]. Simultaneous, matching changes are exceedingly unlikely among systems in which the REase and MTase(s) are separate proteins that act independently.
Not all R-M systems behave in this way, however. The Type IIG and Type IIL families comprise bifunctional R-and-M (RM) enzymes in which the two catalytic activities share the same target recognition domain (TRD) for sequence recognition [13,14]. These enzymes can change specificity more readily because any change affects both restriction and modification activities in the same way at the same time [15]. There is a selective advantage for cells to switch restriction specificity occasionally to counter resistance among infecting viruses. Accordingly, the TRDs of the bifunctional Type IIL MmeI-family RM enzymes have evolved structures that lend themselves to such changes; as a result, the DNA sequences that these enzymes recognize have diversified very widely [15].
The bifunctional RM enzymes provide a natural platform for engineering new DNA-binding specificities, and some success in this direction has been achieved already [15,16]. The cloning of MmeI, from the bacterium Methylophilus methylotrophus, and comparison of its sequence to genome database sequences led to the identification of a family of homologs that, despite significant amino acid similarity, recognize different DNA sequences. Analysis of covariation between the DNA sequences recognized by these enzymes and the amino acid sequences of their TRDs enabled pairs of amino acids specifying several of the base pair positions to be identified [15]. By interchanging these amino acids, derivatives of MmeI and NmeAIII were constructed that recognize new DNA sequences with high fidelity [15]. No structural framework exists for understanding the atomic basis for these specificity changes, however, and this has limited the repertoire that has been rationally engineered in this way.
To better understand the structural basis of DNA recognition and cleavage by Type IIL enzymes, we have determined the crystal structure of MmeI in complex with its DNA substrate. MmeI is a large enzyme (919 amino acids, 105.1 kDa) that integrates DNA recognition and methyltransferase and endonuclease activities within the same polypeptide [13,17,18]. MmeI recognizes the asymmetric DNA sequence 5'-TCCRAC-3' (R = purine; A or G) and methylates the invariant adenine in the "top" strand (underlined). When multiple unmodified sites are encountered, MmeI cleaves the DNA approximately two helical turns downstream, on average 20 nucleotides (nts) away from the methylated adenine on the top DNA strand and 18 nts away on the bottom DNA strand (thus, TCCRAC 20/18). The structure reveals the amino acids responsible for DNA recognition in MmeI and suggests a basis for the long "reach" of the enzyme between its DNA recognition and cleavage sites. The structure establishes a framework for rationally engineering further derivatives from MmeI and its homologs, which possess new, intentionally chosen specificities.
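The site notation above (TCCRAC 20/18) can be made concrete with a short script. The following is a minimal sketch, not the authors' software: it assumes that both offsets are counted from the last base of the recognition sequence, that cut positions are reported as 0-based indices on the input strand, and that sites whose cuts would fall beyond the end of a linear molecule are skipped. The function names `reverse_complement` and `find_sites` are hypothetical.

```python
import re

# IUPAC codes: R (purine) comes from the text; Y and N are standard codes
# included only so that the complement table is closed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(dna, site="TCCRAC", top_offset=20, bottom_offset=18):
    """List (strand, site_start, cut_site_strand, cut_other_strand).

    Assumption: offsets are counted from the last base of the site.
    All coordinates are 0-based indices on the input sequence; a cut
    position p means the break lies between bases p-1 and p.
    """
    dna = dna.upper()
    n = len(dna)
    # Lookahead so that overlapping matches are not missed.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in site) + "))")
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            end = match.start() + len(site)
            cut_site = end + top_offset      # strand carrying 5'-TCCRAC-3'
            cut_other = end + bottom_offset  # complementary strand
            if cut_site > n:
                continue  # cut would fall off the end of a linear molecule
            if strand == "+":
                hits.append((strand, match.start(), cut_site, cut_other))
            else:
                # Mirror reverse-strand coordinates back onto the input.
                hits.append((strand, n - end, n - cut_site, n - cut_other))
    return hits


if __name__ == "__main__":
    print(find_sites("TCCGAC" + "A" * 30))
```

Scanning the reverse complement as well as the input matters because the recognition sequence is asymmetric, so a site can lie in either orientation.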

Overall Architecture
MmeI is composed of five domains. An N-terminal PD-(D/E)XK-type endonuclease domain (residues 1-155) connects to a γ-class N6-adenine DNA-methyltransferase domain (6mA-MTase; residues 301-620) via a multi-helical spacer (residues 156-300) (Fig 1A) [19]. These are followed by the TCCRAC-specific TRD (residues 621-825), and a final C-terminal helical bundle (residues 826-919) (Fig 1A). The endonuclease domain is disordered in the present structure, but its putative position, preceding the spacer, is in keeping with the ability of the enzyme to cleave DNA outside of the recognition sequence (Fig 1A). The DNA is embedded between the TRD and the MTase domain with the adenine to be methylated (position 5 of TCCGAC) flipped out of the DNA helix into the catalytic pocket of the MTase domain (Fig 1). The TRD makes contacts to the DNA bases primarily in the major groove, while the MTase domain makes several contacts to the DNA in the minor groove. The primary role of the MTase is to catalyze transfer of the methyl group from S-adenosyl methionine (AdoMet) to the 6-amino group of the target adenine, which resides in the active site cleft of the MTase domain. The overall conformation of the DNA is B-DNA, but it is severely distorted at the juncture where the target adenine is flipped from the helix (Fig 1B). The sugar-phosphate backbone of the target adenine is displaced toward the MTase domain by several angstroms, and the minor groove over this region widens by ~7.6 Å when compared to regular B-DNA.
The overall configuration of MmeI can be compared to that of the related Type IIG RM enzyme, BpuSI (878 aa; recognition sequence: GGGAC 10/14). BpuSI cleaves roughly one turn of the DNA helix closer to its recognition sequence than MmeI, and creates a 4-base 5'-overhang rather than a 2-base 3'-overhang. The structure of BpuSI has been determined in the absence of DNA and reveals an ordered endonuclease domain that is sequestered by the helical spacer (Fig 2) [14]. Superposition of the MmeI and BpuSI structures suggests that the main conformational change on DNA binding is a ~38° rotation of the TRD to clamp onto the DNA (Fig 2). The MTase domain of MmeI, and to some extent the TRD, also superimpose on M.TaqI (421 aa; recognition sequence: TCGA), a monofunctional 6mA-MTase of the same γ class as MmeI and BpuSI. M.TaqI has been crystallized with and without DNA [20][21][22]; the position of the bound DNA in the former is nearly identical to that in MmeI. Concomitant with its inability to cleave DNA, M.TaqI lacks the N-terminal cleavage domain of MmeI (and of BpuSI) and the helical connector. It also lacks the C-terminal helical bundle that follows the TRD of MmeI.

DNA Sequence Recognition
The TRD is composed of two α/β subdomains comprising residues 621-745 (TRD-N) and 746-825 (TRD-C). These domains contact the bases of the recognition sequence exclusively in the major DNA groove. TRD-N mainly follows the backbone of the complementary strand of the recognition sequence and interacts with the first two base pairs of the recognition sequence (TCCGAC). TRD-C tracks the DNA major groove and interacts with the remaining bases (TCCGAC) (Fig 1A). These interactions are supplemented by contacts in the minor groove from the MTase domain. Altogether, ~2100 Å² of solvent-accessible surface area is buried between the DNA and the TRD and the MTase domain (S1 Fig), in the range observed with conventional Type II restriction enzymes such as BamHI and BglII [10,23].
Position 1. The first base pair of the TCCRAC recognition sequence, T:A, is specified by three amino acids, and recognition appears to be achieved largely without hydrogen bonds (H-bonds). Tyr738 makes a hydrophobic contact with the thymine 5-methyl group and is positioned by a non-specific H-bond to the phosphate backbone (Fig 3). Phe737 is in van der Waals contact with the thymine O4 atom and the N6 group of the paired adenine (Fig 3). The Ala723 side chain packs against the N7 and C8 positions of the adenine, in a position where it could presumably exclude a thymine base due to steric clash with the 5-methyl group.
The amino acids at these three positions in MmeI-family enzymes co-vary with the base pair recognized, although not in a simple way. At the Ala723 position, enzymes that recognize C:G have Arg or Lys instead, consistent with canonical contacts between their positively charged side chains and the O6/N7 H-bond acceptor atoms of guanine. Enzymes that recognize G:C typically have either a negatively charged carboxyl (Asp or Glu) or a hydroxyl (Ser or Thr) that could H-bond with the cytosine N4 donor group. In addition, enzymes that recognize A:T have Ala at position 723, like MmeI, but Gln at position 738 (where MmeI has Tyr), consistent with forming H-bonds at the N6/N7 positions of adenine. In contrast, those recognizing C:G typically have Glu at position 738, which could H-bond with cytosine N4, and those recognizing G:C typically have a positively charged amino acid at position 737 (MmeI has Phe), which could H-bond with guanine O6/N7.
We attempted to rationally alter MmeI specificity at base pair 1 by substituting co-varying amino acids at positions 723, 737, and 738; however, no active mutant enzymes recognizing C:G, G:C, or A:T in place of the wild-type T:A were found (S1 Table). In addition, we replaced the entire loop between Ala723 and Tyr738 with the sequence found in the highly similar enzyme NmeAIII, which recognizes G:C at position 1. This mutant was also found to be inactive. These results indicate that MmeI recognition at position 1 is much less plastic than recognition at the other base pair positions, and that the enzyme is less able to accommodate alternative amino acids within the segment of MmeI TRD apposed to position 1. Overall, it reinforces a notion that residues other than those contacting the bases can also influence specificity [10].
Position 2. The second base pair (C:G) appears to be specified predominantly by Tyr642 and Lys645. Tyr642 accepts an H-bond (2.7 Å) from the cytosine N4 group, and Lys645 donates bidentate H-bonds to the guanine O6 (3.0 Å) and N7 atoms (2.8 Å; Fig 3). In addition, in the minor groove, Lys487 from the MTase domain H-bonds nonspecifically with cytosine O2 (2.9 Å), and Ser488 forms two H-bonds with guanine N2 (2.9 Å) and N3 (3.3 Å). Thus, all of the hydrogen-bonding atoms and groups of the second base pair are involved in direct H-bonds with MmeI (Fig 3). Among MmeI-family enzymes, lysine at position 645 correlates most frequently with recognition of C:G at position 2, likely due to the bidentate H-bonds to guanine O6/N7.
We investigated specificity at position 2 by substituting Tyr642 and Lys645 with residues that correlate with the recognition of alternative base pairs in other family members. A single amino acid change of Lys645 to Met generated an active enzyme with a strong preference for A:T at position 2, and some residual activity towards the wild-type C:G. Purified MmeI K645M enzyme generated a fragment banding pattern consistent with cleavage at TACRAC; however, at the enzyme concentration required for nearly complete cleavage of TACRAC, partial cleavage at TCCRAC was observed as well (Fig 4). Interestingly, on pBR322 DNA, a substrate with 4 TCCRAC sites but no TACRAC sites, little or no cleavage at TCCRAC was observed, suggesting that binding to TCCRAC is substantially less efficient than binding to TACRAC. The double mutant Y642K + K645M changed specificity from C:G to R:Y at position 2 (Fig 4). Thus, altering Tyr642 to Lys in conjunction with the K645M mutation allowed productive binding at G:C in addition to A:T. DNA-methylation data obtained by PacBio SMRT sequencing suggested that this double mutant now has a preference for G:C over A:T (Fig 4). These findings demonstrate that both positions 642 (Tyr) and 645 (Lys) are important for specificity determination at position 2.
The amino acids specifying the third, fourth, and sixth base pairs (TCCRAC) confirm our predictions from earlier multiple sequence alignments (MSAs) and covariation analyses [15]. Glu751 and Asn773 specify position 3 (C:G), Arg810 and Ala774 specify the ambiguous position 4 (R:Y), and Glu806 and Arg808 specify position 6 (C:G; Fig 3).
Position 3. At base pair 3 (TCCGAC), Glu751 forms an H-bond with cytosine N4 (3.0 Å), and the amido nitrogen (ND2) of Asn773 forms bidentate H-bonds with guanine O6 (3.3 Å) and N7 (2.9 Å). We have shown previously that substitution of Glu751 by lysine or arginine, and of Asn773 by aspartate (D), changes the specificity of the enzyme from C:G to G:C at this position [15], mimicking the amino acid combinations that occur naturally in several MmeI-family enzymes with this specificity.
Position 4. At base pair 4 (TCCGAC), the H-bond between Arg810 and guanine N7 could form equally well with adenine, and so the conformation of Arg810 is consistent with the ability of MmeI to recognize either purine base, G or A (i.e., R), at this position (Fig 3). Nonetheless, many MmeI-family enzymes are specific for just G:C at this position. Specificity for G:C rather than R:Y appears to correlate with the presence of a bulky amino acid at position 774 instead of the alanine in MmeI. We propose that the bulky amino acid obstructs thymine by sterically clashing with the thymine 5-methyl group, thereby preventing an A:T base pair from occupying position 4. Indeed, in our previous study, when we replaced Ala774 in MmeI with leucine, the A774L mutant recognized only a G:C at base pair 4 [15]. In addition, we found that whereas wild-type MmeI recognized and cleaved modified sequences containing 5-methylcytosine (5mC) at this position, the A774L mutant could no longer cleave the 5mC-modified sequence, consistent with the importance of the residue at position 774 in specifying R:Y or G:C at this position [15].
Position 5. At base pair 5 (TCCGAC), the thymine opposite the adenine to be methylated is specified by hydrophobic contacts with Thr752 and Leu805 (Fig 3), both well conserved among MmeI family members. The adenine itself is flipped out of the DNA helix and enters the catalytic cleft of the MTase domain. Interestingly, the guanine at position 4 (TCCGAC) is highly buckled, and its sugar moiety partially occupies the space vacated by the target adenine (Fig 1B). Thus, the configuration of base pair 4 might contribute to the flipping of the target adenine from the DNA helix.
Position 6. At base pair 6 (TCCGAC), the side chains of Glu806 and Arg808 are fixed in position by a salt link. Glu806 forms one H-bond with cytosine N4, and Arg808 forms one H-bond with guanine O6. Most MmeI family enzymes recognize either C:G or G:C at base pair 6, and the identities of amino acids at positions 806 and 808 correlate closely with specificity. The Glu806 and Arg808 pair (E-R) exclusively specifies C:G, and the Lys806-Asp808 pair (K-D) exclusively specifies G:C. Accordingly, in our previous study, when we changed the E-R pair in MmeI to K-D, the mutant switched specificity from TCCRAC to TCCRAG [15]. We suspect that the K-D pair is also stabilized by a salt bridge and makes analogous H-bonds with a G:C base pair at this position.
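The specificity changes described position by position above can be summarized as a small lookup table. The sketch below is illustrative only. The recognition sequences are those stated in the text, while the mutant label `E806K+R808D` is a shorthand derived here from the E-R to K-D swap, and the helper names `expand` and `changed_positions` are hypothetical.

```python
from itertools import product

# IUPAC codes needed for the sites below (R = purine, as in the text).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

# Recognition sequences as reported in the text for each enzyme variant.
VARIANTS = {
    "wild type": "TCCRAC",
    "K645M": "TACRAC",        # preferred site; residual activity on TCCRAC
    "Y642K+K645M": "TRCRAC",  # R:Y at position 2
    "A774L": "TCCGAC",        # G:C only at position 4
    "E806K+R808D": "TCCRAG",  # label derived from the E-R to K-D swap
}


def expand(site):
    """Expand a degenerate site into every concrete sequence it covers."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in site))]


def changed_positions(site, reference="TCCRAC"):
    """Return the 1-based positions at which a site differs from the reference."""
    return [i + 1 for i, (a, b) in enumerate(zip(site, reference)) if a != b]


if __name__ == "__main__":
    for name, site in VARIANTS.items():
        print(name, site, expand(site), changed_positions(site))
```

Listing the changed positions makes it easy to see that the engineered variants cover positions 2, 3 (not tabulated here), 4, and 6, while position 1 remains unchanged in every active enzyme described.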

DNA Methylation
The MTase domain (aa ~301-620) consists of a twisted β-sheet flanked by α-helices on both sides (Fig 1A). The two principal motifs characteristic of amino-methyltransferases, generically termed "FGG" (motif I = AdoMet-binding site, aa 360-370) and "DPPY" (motif IV = nucleotide binding and catalytic site, aa 481-484), extend from adjacent loops that connect secondary structure elements. Based on the order and sequences of these motifs, MmeI belongs to the γ class of amino-methyltransferases [19], in which motif I is typically ...FDPACGCGXFL... and motif IV, ...NPPF.... The extrahelical adenine (TCCRAC) occupies the catalytic cleft between motifs I and IV and forms three H-bonds with residues of the catalytic site. Consistent with other γ-class (but not with β-class) amino-methyltransferases [24], motif IV residues face the Hoogsteen edge of the flipped adenine base. The adenine N7 atom accepts one H-bond from the Phe484 main chain N (2.6 Å), and the N6 group donates one H-bond to Asn481 OD1 (2.8 Å) and one to Pro482 main chain O (2.7 Å) (Fig 5). A fourth, weak, H-bond might also be present between adenine N1 and Asn481 ND2 (3.5 Å). The extrahelical adenine is further stabilized by π-π interactions with the aromatic rings of His314, Phe484, and Trp570, which form a box around the base. His314 stacks on one side of the adenine, Phe484 stacks on the other, and Trp570 stacks edge-on (Fig 5). All of these amino acids (belonging to the NPPF motif IV, as well as His314 and Trp570) are absolutely conserved in the 341 MmeI-family enzymes whose sequences we have aligned to date.
The acceptor atoms of Asn481 and Pro482 to which adenine N6 donates H-bonds lie above the plane of the flipped base, suggesting that the nitrogen atom adopts a tetrahedral, sp3, orbital geometry, rather than the planar sp2 geometry it has when intrahelical. In this induced sp3 configuration, the lone pair orbital of the nitrogen points directly toward the electropositive methyl group of AdoMet modeled into our structure, appropriately positioned for methyl transfer by in-line nucleophilic attack (Fig 5). To avoid catalysis and methyl transfer in our complexes, we crystallized MmeI in the presence of the AdoMet analog Sinefungin, which has a nontransferable amino group in place of the methyl group. This amino group is positioned 3.4 Å from the adenine N6 atom in our structure and is slightly displaced. When we aligned the structure of MmeI with that of M.TaqI (pdb:2ADM), which was crystallized with AdoMet [21,22], the cofactor and analog superimposed closely, and the methyl group of AdoMet was found to be closer to the adenine N6 atom (3.0 Å) and in slightly better alignment.

DNA Cleavage
MmeI-family enzymes have the longest "reach" among Type II REases, cleaving DNA with some variability 21-22-nt away from the methylated adenine in the "top" DNA strand, and 19-20-nt away in the complementary, "bottom," strand. In the majority of these enzymes, the methylated A is the penultimate base in the recognition sequence, and so most cleave approximately 20/18 downstream from the sequence. By comparison, FokI, a Type IIS REase in which the DNA recognition and cleavage functions are also located on separate domains, cleaves DNA 9-nt/13-nt downstream of the recognition sequence [25]. The ability of MmeI to generate 20-bp "tags" has made it an attractive enzyme for certain applications, including serial analysis of gene expression (SAGE) and paired-end tags (PET) in next-generation DNA sequencing. Although the cleavage domain of MmeI (residues 1-155) cannot be seen in our structure (S2 Fig), its putative position, far from the TRD, is consistent with the ability of MmeI to cleave some distance away from the recognition sequence (Fig 1A). The helical spacer likely plays a key role in positioning the cleavage domain correctly in this regard, 20-nt/18-nt, from the sequence recognized.
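To give a sense of scale for the "reach" discussed above, the cleavage offsets can be converted into approximate distances along the helix. This is a back-of-the-envelope sketch only. It assumes idealized straight B-DNA with a rise of 3.4 Å per base pair and 10.5 base pairs per turn, which are textbook values rather than measurements from this structure, and it ignores the local distortion described earlier. The function name `reach` is hypothetical.

```python
# Textbook parameters for idealized B-DNA (assumptions, not taken from
# the MmeI structure, in which the DNA is locally distorted).
RISE_PER_BP = 3.4    # angstroms per base pair
BP_PER_TURN = 10.5   # base pairs per helical turn


def reach(n_bp):
    """Return (distance in angstroms, number of helical turns) for n_bp."""
    return n_bp * RISE_PER_BP, n_bp / BP_PER_TURN


if __name__ == "__main__":
    # Offsets from the text: MmeI cuts 20/18, FokI cuts 9/13.
    for enzyme, (top, bottom) in {"MmeI": (20, 18), "FokI": (9, 13)}.items():
        for label, offset in (("top", top), ("bottom", bottom)):
            distance, turns = reach(offset)
            print(f"{enzyme} {label}: {offset} nt, "
                  f"{distance:.1f} A, {turns:.2f} turns")
```

Under these assumptions the 20-nt top-strand offset of MmeI corresponds to a little under two helical turns, in line with the description of the cut site as lying approximately two turns downstream.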
Amino acid sequence analysis of MmeI family enzymes indicates that each contains only one catalytic site, belonging to the PD...(D/E)XK nuclease superfamily [11,12]. The two parts of this motif, PD and (D/E)XK, usually form the termini of adjacent β-strands and fold such that the acidic residues (D and E) coordinate one or more divalent metal ions, and the lysine (K) contributes to activation of a hydrolytic water molecule [11,12]. In the case of MmeI, the catalytic residues are V69-D70...E80-M81-K82, and mutation of D70, E80, or K82 to alanine eliminates endonuclease activity [26]. REases generally cleave both strands of duplex DNA in one binding event, and so their active forms are often multimeric, comprising two, four, and sometimes more identical subunits [12,27]. At a minimum, MmeI must cleave DNA as a dimer in which the catalytic domains of two molecules interact and each cleaves one DNA strand. There is "vacant" space in the crystals adjacent to the helical spacer that can accommodate a domain of the size of the cleavage domain. The lack of electron density in this region (S2 Fig) suggests that the cleavage domain is mobile and flexibly tethered to the helical spacer, and that it may only become ordered when dimerized with that of a second enzyme molecule to form a competent cleavage complex. A similar pattern (disordered endonuclease domain in the crystal) was also observed in structures of a Type III RM enzyme, EcoP15I [24], and a Type IIS enzyme, AspBHI [28]. Unlike Type IIG BpuSI, MmeI requires two DNA recognition sites for efficient DNA cleavage, suggesting that both molecules must be bound to recognition sites in order to dimerize productively.

Discussion
We present here the first crystal structure of a Type IIL RM enzyme bound to its DNA substrate. MmeI differs from conventional Type II R-M systems (such as BamHI or EcoRI) in that the DNA recognition, methyltransferase, and endonuclease activities reside within the same polypeptide. The fact that the same DNA recognition module is responsible for host modification and endonuclease functions makes MmeI (and related enzymes) much more amenable to changes in DNA-binding and -cleavage specificities than conventional Type II enzymes. Based on bioinformatics analysis alone, we have rationally engineered dozens of MmeI-like enzymes with new specificities [15]. These specificity changes are at positions 3, 4, and 6 of the MmeI recognition sequence (TCCRAC), and the engineered enzymes have specific activities that are comparable to the wild-type enzyme. The DNA-bound MmeI structure provides a molecular basis for these specificity changes and reveals new interactions to guide the engineering of additional enzymes.
Overall, MmeI recognizes base pairs 3, 4, and 6 (TCCRAC) in a similar manner to that anticipated from previous bioinformatics analyses. As anticipated, base pair 3 is recognized by Glu751 and Asn773; base pair 4 by Arg810 and Ala774; and base pair 6 by Glu806 and Arg808. This convergence between structure and bioinformatics analysis shows the utility of covariation analyses using MSAs in predicting amino acids that recognize DNA in Type IIL REases. The structure provides atomic-level details on how Glu751, Asn773, Arg810, Ala774, Glu806, and Arg808 actually interact with DNA and a basis for specificity changes reported previously, including C:G to G:C at position 3, R:Y to G:C at position 4, and C:G to G:C at position 6 (Fig 6).
Notably, previous bioinformatics and MSA covariation analyses did not yield insights into how MmeI (and related enzymes) recognizes DNA at positions 1 and 2. Our structure suggests that the T:A base pair at position 1 (TCCRAC) is specified mainly by hydrophobic interactions between Tyr738 and the 5-methyl group of T. Interactions with the C:G base pair at position 2 are more extensive than those with other base pairs, with specific hydrogen bond contacts from the major (Tyr642 and Lys645) and minor (Lys487/Ser488) groove sides. Previous sequence covariation analyses failed to pinpoint the positions corresponding to MmeI Tyr642 and Lys645 as specifying recognition at position 2, because similar amino acid residues at these positions give rise to different sequence specificities in various MmeI family enzymes. For example, isoleucine and lysine at these positions, respectively, result in recognition of an A:T base pair in EsaSSI, MchCM4I, and AquIII, but of a C:G base pair in RmuAI. Several other enzymes that recognize an A:T base pair at this position contain a methionine at the position corresponding to Lys645, paired with either tyrosine (NlaCI) or phenylalanine (SdeAI, CstMI) at the position corresponding to Tyr642. Accordingly, when we change Lys645 to methionine in MmeI, the altered enzyme now preferentially recognizes an A:T base pair at position 2 (Fig 4), though it retains some partial activity toward the wild-type C:G base pair. It is likely that hydrophobic interactions between the methionine and the 5-methyl group of T contribute to this preference (Fig 6). Tyr642 seems readily able to contact an adenine in place of a cytosine, likely making similar interactions with the adenine N6 as with the cytosine N4 (Fig 6). Changing Tyr642 to Lys in combination with Lys645Met resulted in recognition of R (both A:T and G:C) at position 2.
MmeI homologs that recognize a G:C base pair at position 2 also have Lys or Arg at the 642 position (RflFIII), often paired with Gln at position 645. In our modeling, Lys642 appears well positioned to contact the N7 of the purine (A or G) and may be localized for this contact by interaction with the backbone carbonyl of Asn773 and the hydroxyl of Tyr776. These results demonstrate the importance of both Tyr642 and Lys645 positions in specifying recognition at position 2 in the MmeI family enzymes.
Overall, the creation of enzymes with programmable DNA-binding and -cleavage specificities has been a goal ever since the discovery of REases more than 40 years ago. However, attempts to rationally alter the DNA recognition specificities of conventional Type II REases have met with very limited success. Most of the current effort has thus shifted to artificial nucleases such as Zinc Finger Nucleases and transcription activator-like effector nucleases (TALENs), or to homing endonucleases [29][30][31][32][33][34][35][36]. The many recently discovered MmeI-like enzymes offer an alternative approach to achieve "true" REase specificity engineering. The fact that a single DNA recognition module is responsible for host modification and restriction in these enzymes allows for rapid evolution of new specificities. The MmeI structure provides a basis for beginning to understand how Type IIL enzymes like MmeI recognize their DNA substrates and a framework for changing their specificities.
Fig 6. At position 3, mutation of Glu751 and Asn773 to Arg751 and Asp773 converts DNA specificity from C:G to G:C; at position 4, mutation of Arg810 and Ala774 to Ser810 and Lys774 converts DNA specificity from G:C to C:G; at position 6, mutation of Glu806 and Arg808 to Lys806 and Asp808 converts DNA specificity from C:G to G:C.

Protein Expression and Purification
Both the native and selenomethionine (Se-Met) MmeI proteins were expressed and purified as described previously [37].

Crystallization and Structure Determination
The native MmeI protein was crystallized in complex with a 29-mer DNA using 2 μl hanging drops over 1 ml reservoirs at 293 K. The optimized crystals were grown using a mother liquor of 20% PEG 4K, 0.1 M HEPES (pH 7.5), and 0.1 M (NH4)2SO4. Resolution was improved to 2.6 Å by replacing several thymines outside of the recognition site with 5-bromouracil (5'-TATCCGACAUAACGCUAGUCACUAGCUUC-3'/3'-ATAGGCTGUATUGCGAUCAGUGAUCGAAG-5'; where U is 5-bromouracil). The brominated DNA oligonucleotides were synthesized at New England Biolabs and PAGE purified prior to crystallization. For cryoprotection, the crystals were soaked for 5 min in solutions containing mother liquor plus increasing concentrations of glycerol (final concentration of 30% glycerol) and plunged into liquid nitrogen. Because no suitable molecular replacement model was available, co-crystals with Se-Met MmeI (14 methionines per molecule) were grown under conditions similar to those used for the native enzyme. The Se-Met crystals diffracted to 3.0 Å resolution.
The X-ray diffraction data on the MmeI/DNA/Sinefungin co-crystals were measured at the Advanced Photon Source at the Argonne National Laboratory. The data on native crystals were measured at beamline 23ID-D at a wavelength of 0.91938 Å, while single-wavelength anomalous data on a Se-Met crystal were measured at a wavelength of 0.97944 Å (Se K absorption edge) at beamline 24ID-C. The HKL2000 package [38] was used to merge and scale X-ray data. Both the native and Se-Met crystals belong to space group P1. The unit-cell dimensions of native crystals are a = 61.87 Å, b = 95.29 Å, c = 161.96 Å, α = 72.84°, β = 89.15°, and γ = 71.61°; and unit-cell dimensions of the Se-Met crystals are a = 62.08 Å, b = 94.68 Å, c = 159.91 Å, α = 73.34°, β = 80.35°, and γ = 71.89°. The structure was solved by the SAD phasing method using SHARP [39]. The electron density map derived from experimental phasing was readily interpretable and showed clear electron density for both protein and DNA molecules. The model was built manually using the program Coot [40] and iteratively refined with the program package Phenix [41] to the 2.6 Å resolution limit of the native crystals (Table 1). The final model contains two molecules of MmeI bound to two separate DNA duplexes and two Sinefungin moieties. The quality of the structure is excellent, with >97% of the residues in the most favored regions of the Ramachandran plot (Table 1).
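As a plausibility check on the reported cell contents, the native unit-cell volume and an approximate Matthews coefficient can be computed from the dimensions given above. This is a rough sketch rather than part of the original analysis. It uses the standard triclinic volume formula and the MmeI mass of 105.1 kDa from the text, but the DNA mass is an assumption (about 650 Da per base pair, ignoring the bromine substitutions), and the function names are hypothetical.

```python
import math


def triclinic_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms; angles are given in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews_coefficient(volume, mass_da, copies_per_cell=1):
    """Crystal volume per dalton of macromolecule (cubic angstroms per Da)."""
    return volume / (copies_per_cell * mass_da)


# Native cell dimensions as reported in the text (space group P1).
NATIVE_CELL = (61.87, 95.29, 161.96, 72.84, 89.15, 71.61)

PROTEIN_DA = 105_100      # MmeI, 105.1 kDa (from the text)
DNA_DA = 29 * 650         # 29-bp duplex; ~650 Da per bp is an assumption
COMPLEXES_PER_CELL = 2    # two MmeI-DNA complexes in the final model

if __name__ == "__main__":
    volume = triclinic_volume(*NATIVE_CELL)
    cell_mass = COMPLEXES_PER_CELL * (PROTEIN_DA + DNA_DA)
    print(f"V = {volume:.0f} cubic angstroms")
    print(f"VM = {matthews_coefficient(volume, cell_mass):.2f}")
```

In space group P1 the asymmetric unit is the whole cell, so the two complexes are treated here as the full cell contents and `copies_per_cell` is left at 1.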

DNA Cleavage Assay
Endonuclease activity was assayed by incubating various amounts of MmeI (wt or mutant) enzyme for 30 min at 37°C in NEBuffer 4 (20 mM Tris-acetate, pH 7.9, 10 mM magnesium acetate, 50 mM potassium acetate, 1 mM DTT) supplemented with AdoMet at 80 μM, containing 1 μg substrate DNA per 50 μl. Reactions were terminated by the addition of loading dye (NEB B7024) and reaction products were analyzed by gel electrophoresis in 1% LE agarose gels.
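The assay conditions above can be translated into an approximate molar concentration of recognition sites. The sketch below is illustrative and rests on several assumptions: it uses pBR322 (4,361 bp, with the four TCCRAC sites mentioned earlier in the text) as the example substrate, and an average mass of about 650 g/mol per base pair. The function name `site_concentration_nm` is hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of duplex DNA (approximation)


def site_concentration_nm(dna_ug, volume_ul, plasmid_bp, sites_per_plasmid):
    """Concentration of recognition sites in nM for a plasmid substrate."""
    moles_plasmid = dna_ug * 1e-6 / (plasmid_bp * AVG_BP_MASS)
    litres = volume_ul * 1e-6
    return moles_plasmid / litres * 1e9 * sites_per_plasmid


if __name__ == "__main__":
    # 1 microgram of DNA per 50 microlitres, as in the assay description;
    # pBR322 is used here only as an example substrate.
    sites_nm = site_concentration_nm(1.0, 50.0, 4361, 4)
    adomet_nm = 80e3  # 80 micromolar AdoMet expressed in nM
    print(f"TCCRAC sites: {sites_nm:.1f} nM")
    print(f"AdoMet excess over sites: {adomet_nm / sites_nm:.0f}-fold")
```

Under these assumptions the cofactor is present in large molar excess over recognition sites, so AdoMet is unlikely to be limiting in such a reaction.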

Supporting Information
S1 Fig. A schematic of amino acid-nucleic acid contacts in the crystal structure. The amino acids dictating specificity of the recognition sequence (labeled 1-6) are depicted directly above the contacting bases. Contacts are only depicted if the distance between bonding atoms is less than 3.5 Å in the crystal structure. (TIF)
S2 Fig. Electron density map. A view of a section of a 2Fo-Fc map (contoured at 1.3σ) shows the absence of electron density for the endonuclease domain, ahead of the helical spacer, suggestive of its disordered or highly mobile nature. (TIF)
S1