Structural Biology of Human H3K9 Methyltransferases

SET domain methyltransferases deposit methyl marks on specific histone tail lysine residues and play a major role in epigenetic regulation of gene transcription. We solved the structures of the catalytic domains of GLP, G9a, SUV39H2 and PRDM2, four of the eight known human H3K9 methyltransferases, in their apo conformation or in complex with the methyl-donating cofactor and peptide substrates. We analyzed the structural determinants for methylation state specificity, and designed a G9a mutant able to tri-methylate H3K9. We show that the I-SET domain acts as a rigid docking platform, while induced fit of the Post-SET domain is necessary to achieve a catalytically competent conformation. We also propose a model where long-range electrostatics bring enzyme and histone substrate together, while the presence of an arginine upstream of the target lysine is critical for binding and specificity.

Enhanced version
This article can also be viewed as an enhanced version in which the text of the article is integrated with interactive 3D representations and animated transitions. Please note that a web plugin is required to access this enhanced functionality. Instructions for the installation and use of the web plugin are available in Text S1.


Introduction
Post-translational modifications of histone proteins regulate chromatin compaction, mediate epigenetic regulation of transcription, and control cellular differentiation in health and disease [1,2]. Methylation of histone tails is one of the fundamental events of epigenetic signaling [3]. Tri-methylation of lysine 9 of histone 3 (H3K9) mediates chromatin recruitment of HP1, heterochromatin condensation and gene silencing [4,5]. Similarly, methylation of H3K27 and H4K20 is associated with a repressed state of chromatin, whereas expressed genes are methylated at H3K4, H3K36 and H3K79 ([3,6] for review). Histone methyltransferases are divided into protein arginine methyltransferases (PRMTs) and histone lysine methyltransferases (HKMTs). HKMTs catalyze the transfer of a methyl group from the co-factor S-adenosyl-L-methionine (SAM) to a substrate lysine and, with the exception of DOT1L, are all organized around a canonical SET domain [7,8]. The structures of a number of HKMTs have been reported, including ternary complexes of human orthologs with co-factor and substrate peptides (SETD7-H3K4, SETD8-H4K20 and MLL1-H3K4 [9,10,11,12]), as well as N. crassa Dim-5 in complex with an H3K9 peptide [13] and a viral protein complexed to H3K27 [14] (Figure 1). These structures collectively highlighted a remarkable plasticity of the peptide-binding site and a lack of clear structural motifs that correlate with sequence selectivity [8,15].
Methylation of H3K9 in humans relies mostly on members of the Suv39 family, namely EHMT1/GLP, EHMT2/G9a, SUV39H1, SUV39H2, SETDB1 and SETDB2, as well as the non-Suv39 enzymes PRDM2 and ASH1L [6] (Figure 1). Here we report the high-resolution crystal structures of the methyltransferase domains of GLP, G9a, SUV39H2 and PRDM2, and propose a structural mechanism for substrate recognition. Our data also provide important insight to guide the development of potent and selective inhibitors of HKMTs, which are likely to have applications in a variety of therapeutic areas, including regenerative medicine, oncology and inflammation [16,17,18,19,20] ([21,22] for review).

Overall Structure
We have solved the crystal structures of the catalytic domains of four H3K9 methyltransferases: (1) GLP/EHMT1 co-crystallized with the co-factor product S-adenosyl-L-homocysteine (SAH) alone (accession code 2IGQ) and with an H3K9me (accession code 3HNA) or H3K9me2 peptide (accession code 2RFI), (2) G9a/EHMT2 (accession code 2O8J) and SUV39H2 (accession code 2R3A) in complex with SAH and SAM respectively, and (3) the non-Suv39 protein PRDM2 (accession code 2QPW). The Suv39 structures G9a, GLP and SUV39H2 adopt a typical fold composed of a conserved SET domain and variable I-SET insert, flanked by Pre- and Post-SET regions, and characterized by canonical features such as a pseudoknot next to the catalytic site, distinct co-factor and substrate binding areas meeting at the site of methyl transfer, and a narrow substrate lysine docking channel [8,23,24]. The four structures are also characterized by the presence of an N-SET region, located N-terminal to the Pre-SET, that wraps around the core SET domain (Figure 2A-D). The H3K9 peptide lies in a groove formed by the I-SET and Post-SET domains (Figure 2A), as previously observed in other HKMTs ([8] for review), and makes extensive contact with the enzyme through both backbone and side-chain interactions (Figure 2E).

Methylation State Specificity
Mono-, di- or tri-methylation of H3K9 constitute distinct biochemical signals and are established by distinct histone methyltransferases; G9a and GLP are mono- and di-methylases, and SUV39H2 di- and tri-methylates a mono-methylated substrate. The specificity of PRDM2 is unknown [25]. Several aromatic residues line the GLP channel that is occupied by the substrate lysine and leads to the catalytic site; they contribute to methylation state specificity, as previously noted for several other HKMTs [9,12,26,27]. Two residues are of particular importance for catalysis. First, a conserved tyrosine residue of the Post-SET domain (Y1211 in GLP and Y1154 in G9a) is a major component of the lysine binding channel, while its hydroxyl group participates in catalysis. This residue cannot be mutated without losing catalytic activity (Figure 3) [8,28]. Second, the hydroxyl group of GLP's Y1124 (Y1067 in G9a) hydrogen-bonds to the methyl-accepting nitrogen, thereby inhibiting an orientation of the dimethyl-amine that would favor transfer of a methyl from SAM, as was previously shown for SETD7 ([12,28] for review). To confirm this model, we showed that, unlike wild-type G9a, the Y1067F mutant is able to tri-methylate H3K9 (Figure 3). Similarly, it was shown that the F1152Y G9a mutant can only mono-methylate H3K9 [29]. Our structures show that G9a F1152 superimposes closely with GLP's F1209 (0.1 Å RMSD), which, if mutated to Tyr, would hydrogen-bond with the ε-amine nitrogen of H3K9 and impair the alignment of the accepting amine's lone pair with the methyl-sulfur bond of SAM (Figure 3). Thus, the number of methyl groups an enzyme can transfer appears to be inversely related to the number of tyrosine residues surrounding the methyl-accepting nitrogen.

I-SET Is a Rigid Peptide-Docking Platform
The structure of the I-SET domain appears relatively conserved, whether in the apo, co-factor-, or peptide-bound form, and is composed of a helix followed by a two-stranded anti-parallel β-sheet, linked by loops of variable lengths (Figure 4). Superimposition of our six new H3K9 methyltransferase structures (see Materials and Methods) shows that in all cases the first β-strand is in a conformation which would preserve the pair of backbone hydrogen bonds observed between the Lys-9 substrate and strand 1 of I-SET in the GLP-peptide complex (Figure 4). Furthermore, a systematic comparison of the ternary structures presented here for GLP-H3K9 and previously published for Dim-5-H3K9, SETD8-H4K20, SETD7-H3K4, SETD7-TAF10, SETD7-P53 and vSET-H3K27 reveals that this pair of hydrogen bonds between the backbone of a single substrate peptide residue and the first strand of I-SET is observed (1) in all HKMT ternary structures to date and (2) always and only at the substrate lysine (Figure 5A-E; SETD7-TAF10 and SETD7-P53 complexes not shown). This suggests that an evolutionary pressure enforces conservation of this "double hydrogen bond", which likely plays an important role in the binding mechanism, probably by imposing the proper orientation of the peptide when lysine inserts into the active site; flipping the substrate by 180° in its groove would not allow formation of the double hydrogen bond.
Our structures single out residue R-1 of H3 (the methylated lysine is used as reference position 0 for peptide residue numbering throughout the text) as the major contributor to the interaction after the substrate lysine itself, with four direct hydrogen bonds between the arginine guanidinium group and GLP (Figure 2E). This is in agreement with recent mutational analysis showing that no substitution at position -1 is tolerated by G9a [30]. This critical interaction takes place exclusively with I-SET residues (Figure 5F). GLP residues that contribute to substrate binding are conserved in G9a (Figure S1), and it is reasonable to assume that the peptide binding mode observed in GLP holds for G9a. On the other hand, mutation of R-1 to alanine only mildly affects peptide binding to Dim-5, an H3K9 methyltransferase in N. crassa [31], indicating that the selectivity mechanism observed for human GLP and G9a is not universal.
Intriguingly, our structures of GLP in complex with mono- or di-methylated substrate peptides reveal that residue H3K4 makes two hydrogen bonds with the D1131 and D1145 side-chains of the I-SET domain (Figure 6), which suggests that H3K4 methylation may lower binding affinity and reduce H3K9 methylation efficiency. We tested this hypothesis, and observed a mild decrease of 43% in the affinity of GLP for an H3K9 peptide tri-methylated at lysine 4, and a similar reduction in enzymatic efficiency, while kcat was unaffected. Mono- and di-methylation of H3K4 had no or very limited effect (data not shown). It is not clear whether this variation is biologically significant.

Figure 3 (caption, partially recovered). [...] (Top panel) was generated from the superposition of the active sites of the GLP ternary complex and GLP. Y1067 of G9a stabilizes the di-methylamine end of the substrate lysine in an orientation where the lone pair is not facing the co-factor, thereby disfavoring transfer of a third methyl group. The Y1067F mutant loses this restriction and can tri-methylate its substrate, as indicated in the table. Previous work had shown that the F1152Y G9a mutant can only mono-methylate H3K9 [29]. doi:10.1371/journal.pone.0008570.g003
These results suggest a model in which a mostly pre-formed I-SET domain acts as a receiving platform for the histone 3 tail. Binding includes a conserved pair of hydrogen-bonds with the backbone of the substrate lysine, and critical contacts with a basic side-chain upstream of the methyl acceptor.

Mobile Post-SET Domain Closes onto the Peptide Substrate
The Post-SET domains of G9a, GLP and SUV39H2, but not PRDM2, include a ZnCys motif previously observed in the structures of Dim-5 [13] and the H3K4 methyltransferase MLL1 [10]. The Post-SET domains of G9a and GLP present an α-helix that contributes to peptide binding where other HKMTs have a loop. Unlike I-SET, Post-SET is absent from the PRDM2 structure, which lacks co-crystallized SAM or SAH (Figure 2D). It is partially folded in the structures of G9a and SUV39H2 in complex with SAH and SAM respectively (Figure 2B-C), and fully ordered in the ternary complexes of GLP with SAH and H3K9 peptide (Figure 2A). As previously observed with other HKMTs ([8,9,11] for review), the co-factor contributes to the formation of a hydrophobic, mostly aromatic cluster (composed of Post-SET Y1211/Y1154/Y261, F1215/F1158, W1216/W1159/L298, F1223/F1166/T285 and SET H1170/H1113/H220 in GLP/G9a/SUV39H2) necessary for partial folding of the Post-SET domain. Surprisingly, in our structure of SUV39H2, Post-SET Lys-264 is inserted into the partially formed substrate lysine binding channel, which may represent some form of auto-inhibitory mechanism (Figure 7). Post-SET is fully structured only when bound to the substrate peptide (Figure 2A), or to a small molecule inhibitor [32], but the density is incomplete otherwise (Figure 2B-D). This implies that Post-SET is naturally flexible, which may be important for peptide turn-over, as recently proposed [10].
Altogether, these results suggest the presence of three conformational states for the Post-SET domain: (1) a flexible or even disordered state when no co-factor or peptide is bound; (2) a loose conformation when SAM or SAH, but no peptide, is bound, which may also accommodate non-specific sequences; (3) a more rigid conformation in which the Post-SET domain closes onto the substrate peptide. These states are probably in a dynamic equilibrium, with the co-factor and substrate shifting the equilibrium toward conformation 3.

Long Range Electrostatics Attract Histone Peptides to the HKMT Binding Groove
Histone tails are rich in lysines and arginines and consequently are electropositive (Figure 8A). Mapping the electrostatic potential along the molecular surface of GLP, G9a, SUV39H2 and PRDM2 shows that the peptide-binding groove is consistently electronegative (Figure 8B-E). This feature is also conserved in the structure of the N. crassa H3K9 methyltransferase Dim-5 (Figure 8F). This suggests that non-specific long-range electrostatic attractions play an evolutionarily conserved role in guiding the substrate-binding groove towards histone tails.
Based on the four HKMT structures presented here, we propose a mechanism for selective H3K9 lysine methylation, in which (1) long-range electrostatics attract the enzyme onto basic histone tails, (2) a pre-formed I-SET domain carries structural determinants necessary for specific interactions with the substrate peptide, and (3) a catalytically competent conformation is achieved by subsequent closing of the Post-SET domain on the substrate. Considering the electronegative potential of the binding groove, our analysis suggests that HKMT inhibitors should be rather basic. To achieve selectivity, inhibitors should bind sites with clear interaction field potential occupied by residues distal to the substrate lysine. A recent co-crystal structure of the first specific HKMT inhibitor supports these general concepts [32].

Data Collection and Structure Determination
X-ray diffraction data were collected at 100 K at beamline 17ID of the Advanced Photon Source (APS) at Argonne National Laboratory, beamline X25 of the National Synchrotron Light Source, beamline A1 of the Cornell High Energy Synchrotron Source (CHESS), Cornell University, and a Rigaku FR-E home source. Data were processed using the HKL-2000 software suite [33]. The structures of the methyltransferase domains of GLP, G9a, and SUV39H2 were solved by molecular replacement using the program MOLREP [34]. ARP/wARP [35] was used for automatic model building. The graphics program COOT [36] was used for model building and visualization. The PRDM2 structure was solved by single-wavelength anomalous diffraction (SAD) at low resolution, using a selenomethionine derivative crystal with the program SHELXD [37], and the phasing was performed using SHELXE [38]. The low-resolution structure was used as a model to solve the native structure at higher resolution. Crystal diffraction data and refinement statistics for the structures are displayed in Table 1. One residue is in a disallowed area of the Ramachandran plot in our GLP (M1049) and G9a (I992) structures. This strained residue maps at a conserved location, remote from the peptide and cofactor binding sites.

Histone Methyltransferase Assay
The SAHH-coupled assay described by Collazo et al. [39] was optimized and employed to assay the activity of G9a. This assay utilizes S-adenosylhomocysteine hydrolase (SAHH) to hydrolyze the methyltransfer product S-adenosylhomocysteine to homocysteine and adenosine in the presence of adenosine deaminase, which converts adenosine to inosine. The homocysteine concentration is then determined through conjugation of its free sulfhydryl moiety to a thiol-sensitive fluorophore, ThioGlo (Calbiochem). Assays were performed at room temperature in 25 mM potassium phosphate buffer pH 8, 1 mM EDTA, 2 mM MgCl2 and 0.01% Tween 20. A series of control experiments was conducted to establish the optimum assay conditions for each methyltransferase, and the optimum conditions were used to determine the kinetic parameters for GLP. Assay cocktails were prepared with 5 mM SAHH to prevent accumulation of the SAH produced by the methyltransferase reaction, 3 U/ml of adenosine deaminase from Sigma, 70 mM SAM, and GLP. The peptide concentrations were varied over the range of 2 mM to 4 mM. Assays were initiated by the addition of peptide and, immediately after starting the reaction, 2x volume of 20 mM ThioGlo solution was added to each well. The methylation reaction was followed by monitoring the increase in fluorescence using a Biotek Synergy2 plate reader with a 360/40 nm excitation filter and a 528/20 nm emission filter for 20 min in 384-well plate format. Homocysteine generated in the assay was quantified using standard curves. Activity values were corrected by subtracting background caused by the peptide or the protein. Km and kcat values were calculated using the Michaelis-Menten equation and Sigmaplot 9.0. Standard deviations were calculated from two independent experiments.
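As an illustration of how Km and kcat can be extracted from initial-rate data with the Michaelis-Menten equation, the following is a minimal sketch in Python. It is not the SigmaPlot procedure used here: it swaps in a golden-section search over log(Km), with Vmax solved in closed form at each step. The function names, the enzyme concentration and the data points are hypothetical, and the rates are synthetic values generated from the model itself.

```python
import math

def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

def best_vmax(substrate, rates, km):
    """Least-squares Vmax for a fixed Km (the model is linear in Vmax)."""
    x = [s / (km + s) for s in substrate]
    return sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)

def sse(substrate, rates, km):
    """Sum of squared residuals at a given Km, using the best Vmax for that Km."""
    vmax = best_vmax(substrate, rates, km)
    return sum((v - michaelis_menten(s, vmax, km)) ** 2
               for s, v in zip(substrate, rates))

def fit_michaelis_menten(substrate, rates, iterations=200):
    """Golden-section search over log(Km); assumes all [S] > 0 and a
    single minimum of the residual within the searched range."""
    lo = math.log(min(substrate) / 100.0)
    hi = math.log(max(substrate) * 100.0)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if sse(substrate, rates, math.exp(a)) < sse(substrate, rates, math.exp(b)):
            hi = b
        else:
            lo = a
    km = math.exp((lo + hi) / 2.0)
    return best_vmax(substrate, rates, km), km

def kcat_from_vmax(vmax, enzyme_conc):
    """kcat = Vmax / [E]; both must use consistent concentration units."""
    return vmax / enzyme_conc

# Synthetic example (hypothetical values, arbitrary but consistent units).
peptide = [2.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0]
observed = [michaelis_menten(s, 1.5, 25.0) for s in peptide]
vmax_fit, km_fit = fit_michaelis_menten(peptide, observed)
kcat_fit = kcat_from_vmax(vmax_fit, enzyme_conc=0.05)  # hypothetical [E]
```

With real data, replicate experiments would be fitted separately (or jointly) to obtain the spread of Km and kcat; the sketch only covers a single fit.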

Structure Superimposition and Electrostatic Potential Coloring
Optimal structure superimpositions were identified with ICM (Molsoft LLC). Briefly, the algorithm uses an iterative procedure to find the best "alignable" main chain core in both structures based on seed alignments of 15 residues as follows: (1) start with the most reliable seed alignment of 15 residues; (2) set all weights to 1; (3) perform weighted superposition and evaluate RMSD; (4) calculate the deviation Di for each backbone atom pair; (5) sort the deviations and find the deviation D50 corresponding to the 50th percentile of the deviation array; (6) calculate weights W according to the formula Wi = exp(-D50²/Di²); (7) go back to step 3 unless a limit of 10 iterations is reached. The electrostatic potential was calculated with ICM using a boundary element solution of the Poisson equation. Color saturation was set to calculated values of ±5 kcal/electron units (+5 = blue, -5 = red) when the electrostatic potential was projected on molecular surfaces.

Supporting Information

Figure S1 Sequence alignment of the methyltransferase domain of H3K9 HKMTs. Residues within 4 Å of bound H3K9 peptide in our GLP-H3K9me complex are highlighted in red. The three aspartates making polar interactions with arginine H3R8 are colored blue. The sequence of PRDM2 is too divergent from other H3K9 HKMTs and was not included in the alignment. Large inserts present in SETDB1 and SETDB2 sequences are not shown for clarity. Found at: doi:10.1371/journal.pone.0008570.s001 (1.11 MB TIF)

Datapack S1 Standalone iSee datapack: contains the enhanced version of this article for use offline. This file can be opened using free software available for download at http://www.molsoft.com/icm_browser.html. Found at: doi:10.1371/journal.pone.0008570.s002 (ICB)

Text S1 Instructions for installation and use of the required web plugin (to access the online enhanced version of this article). Found at: doi:10.1371/journal.pone.0008570.s003 (0.75 MB PDF)
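The reweighting part of the superimposition procedure described under Structure Superimposition and Electrostatic Potential Coloring (steps 4 to 6) can be sketched in Python as follows. This is a literal transcription of the formula as printed, not ICM's implementation, which is not shown in this article. The function name is hypothetical, the 50th percentile is assumed to be the median as computed by `statistics.median`, and the handling of a zero deviation is a convention chosen for the sketch. Read literally, the printed formula gives weights close to 1 to pairs deviating much more than D50 and close to 0 to pairs deviating much less.

```python
import math
import statistics

def core_weights(deviations):
    """Return (D50, weights) with W_i = exp(-D50^2 / D_i^2), as printed.

    deviations: per-atom-pair distances D_i after a weighted superposition.
    """
    d50 = statistics.median(deviations)  # assumption: 50th percentile = median
    weights = []
    for d in deviations:
        if d == 0.0:
            # exp(-D50^2 / D_i^2) tends to 0 as D_i -> 0 when D50 > 0;
            # the same value is used for D50 == 0 by convention (assumption).
            weights.append(0.0)
        else:
            weights.append(math.exp(-(d50 ** 2) / (d ** 2)))
    return d50, weights

# Hypothetical deviations (in Å) for five backbone atom pairs.
d50, w = core_weights([0.4, 0.9, 1.2, 2.5, 6.0])
```

In the full procedure these weights would be fed back into the weighted superposition of step 3, for at most 10 iterations.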