Solution Structure of MSL2 CXC Domain Reveals an Unusual Zn3Cys9 Cluster and Similarity to Pre-SET Domains of Histone Lysine Methyltransferases

The dosage compensation complex (DCC) binds to single X chromosomes in Drosophila males and increases the transcription level of X-linked genes by approximately twofold. Male-specific lethal 2 (MSL2) together with MSL1 mediates the initial recruitment of the DCC to high-affinity sites in the X chromosome. MSL2 contains a DNA-binding cysteine-rich CXC domain that is important for X targeting. In this study, we determined the solution structure of MSL2 CXC domain by NMR spectroscopy. We identified three zinc ions in the CXC domain and determined the metal-to-cysteine connectivities from 1H-113Cd correlation experiments. The structure reveals an unusual zinc-cysteine cluster composed of three zinc ions coordinated by six terminal and three bridging cysteines. The CXC domain exhibits unexpected structural homology to pre-SET motifs of histone lysine methyltransferases, expanding the distribution and structural diversity of the CXC domain superfamily. Our findings provide novel structural insight into the evolution and function of CXC domains.


Introduction
Organisms with different numbers of sex chromosomes between males and females face the problem of an unequal dosage of genes from sex chromosomes. In Drosophila melanogaster, the transcriptional level of most genes in the single male X chromosome is increased by approximately twofold to match that from two female X chromosomes (see recent reviews [1][2][3]). This dosage compensation process is mediated by the dosage compensation complex (DCC) or male-specific lethal (MSL) complex, which contains at least five proteins MSL1, MSL2, MSL3, males absent on the first (MOF) and maleless (MLE) and two non-coding RNAs roX1 and roX2. MSL1 is a scaffold protein associated with MSL2, MSL3 and MOF [4][5][6].
In male flies, the DCC is located at hundreds of sites along the length of the X chromosome. Each of the five proteins and at least one of the functionally redundant roX RNAs are required for full association of the DCC with the X chromosome and for male viability. The DCC is not assembled in females because MSL2 translation is tightly repressed [7,8]. The DCC has been shown to bind primarily at the bodies of active genes on the X chromosome [9,10]. The transcriptional activation is caused, at least in part, by the MOF-mediated acetylation of histone H4 lysine 16 and enhanced transcriptional elongation [11].
The mechanism by which the DCC is specifically localized to the X chromosome remains poorly understood. According to a prevalent model, the DCC first binds to a limited number of high-affinity sites (HAS) or chromatin entry sites (CES) in the X chromosome and then spreads in cis to flanking active genes [12]. The spreading process probably involves the interaction of the MSL3 chromodomain with trimethylated H3K36, a marker for actively transcribed genes [13]. HAS are able to attract even a partially assembled DCC that lacks MSL3, MOF or MLE, or a low concentration of DCC [7,14,15]. A body of evidence suggests that specific DNA sequences are involved in HAS recognition. When translocated to an autosome, HAS as short as 100-200 base pairs (bp) can still recruit the DCC [16][17][18][19][20]. Chromatin immunoprecipitation studies followed by microarray analysis or deep sequencing have mapped ~140 HAS on the X chromosome [20,21]. The binding sites of the DCC on HAS are enriched with a GA-rich MSL recognition element (MRE) [20]. However, the MRE motif occurs frequently outside of HAS and is only slightly enriched in the X chromosome over autosomes, indicating that the MRE motif is not the sole determinant for HAS recognition. The conformation of chromatin also appears to be important for HAS recognition, since HAS are characterized by low nucleosome occupancy [20] and special compartments in the nuclei [22].
MSL2 is a core component of the DCC [23][24][25] and together with MSL1 is required for the DCC to bind HAS on the X [14,15,26,27]. MSL2 was recently shown to be a DNA-binding protein and to specifically recognize a HAS in a reporter gene assay in Drosophila cells [28]. However, MSL2 failed to discriminate the HAS sequence in vitro. An unknown selectivity cofactor was proposed to cooperate with MSL2 in vivo for specific HAS recognition [28].
MSL2 is composed of an N-terminal RING domain, a cysteine-rich CXC domain and a C-terminal region rich in proline and basic residues (Pro/Bas patch). The RING domain binds MSL1 [6,29] and exhibits ubiquitin E3 ligase activity toward H2B K34 [30]. The CXC domain contributes critically to the DNA-binding activity of MSL2 [28]. CXC domains are also present, mostly in two copies, in the tesmin/TSO1 protein family [31][32][33][34][35]. The tandem CXC domains of human LIN54 and soybean CPP1 have been shown to bind specific DNA sequences [32,33]. No structure has been reported for any CXC domain.
The CXC domain is remarkable in having nine invariant Cys residues within a region of about 50 residues. In this study, we have determined the structure of the MSL2 CXC domain by NMR spectroscopy, the first for any CXC domain. The structure reveals a compact fold that encages an unusual Zn3Cys9 cluster. Interestingly, the CXC structure with its Zn3Cys9 cluster shows strong similarity to pre-SET motifs of histone lysine methyltransferases, suggesting that the CXC and pre-SET domains share a common evolutionary origin.

The MSL2 CXC Domain is an Autonomously Folded Structure Containing Three Zinc Ions
Our structural analysis of the D. melanogaster MSL2 CXC domain was conducted mainly on two constructs. One construct containing residues 517-572 and the C560G mutation (CXC-2) was used in early experiments, and another slightly shorter construct containing residues 520-570 and the C560G mutation (CXC-3) was used in the majority of experiments. The two fragments display identical NMR spectra in folded regions and thus should have the same structure. In both constructs, the nonconserved residue Cys560 was replaced by glycine, an amino acid that is frequently found at the corresponding position in MSL2 homologs, to avoid formation of intermolecular disulfide bonds. The 1H-15N HSQC spectrum of the MSL2 CXC domain displays a single set of well-dispersed peaks, indicating that the fragment is autonomously folded and amenable to further structural characterization (Fig. 1A). Nearly complete assignments for backbone and side-chain resonances were obtained by analysis of a set of triple-resonance spectra collected on 13C/15N-labeled CXC protein.
The CXC domain is characterized by nine invariant cysteine residues and conserved spacing between them, suggesting that these cysteine residues may coordinate metal ions. Indeed, inductively coupled plasma mass spectrometry of recombinant CXC protein revealed a significant enrichment of Zn compared with the other metals tested (Fe, Mg, Ca, Mn, Ni and Cu; data not shown). To assess the Zn binding stoichiometry, the CXC-3 protein was analyzed by electrospray mass spectrometry under conditions that preserve the native protein structure (Fig. 1B). The monoisotopic mass of the major species (5824.0875 Da) closely matched that of a monomeric CXC-3 molecule in complex with three Zn2+ ions (5824.267 Da). An analytical ultracentrifugation sedimentation velocity assay further showed that the CXC domain is a monomer in solution (data not shown). These biophysical results indicate that the CXC domain folds into a monomeric structure with three bound zinc ions.

Assignment of Zn-coordinating Ligands by 113Cd NMR
Three Zn2+ ions are most likely coordinated by nine invariant Cys residues in the MSL2 CXC domain. The CXC domain has two histidines at positions 557 and 565, but they are not conserved and are unlikely to bind structurally important Zn2+ ions. As each Zn2+ ion needs to be coordinated to four ligands, the unusual Zn to Cys ratio of 3:9 suggests that these Zn2+ ions are bound in an unconventional way. To assign the zinc ligands, we replaced Zn with 113Cd, which has a similar coordination pattern to Zn but has more favorable NMR properties [36]. The 113Cd-loaded protein was prepared during protein expression by substituting Zn2+ with 113Cd2+ in minimal M9 medium.
The 1H-15N HSQC spectrum of 113Cd/15N-labeled CXC protein displays a single set of peaks, indicating complete 113Cd substitution (Fig. 2A). As many resonances were shifted after 113Cd replacement, the resonances of H, N, Hα and Hβ were reassigned using 3D 1H-15N TOCSY-HSQC and 3D 1H-15N NOESY-HSQC spectra collected on 113Cd/15N-labeled protein.
The amide resonances of Cys residues and nearby residues generally display large changes upon 113Cd replacement (Fig. 2B). The chemical shifts of Hα and Hβ protons are less affected, with an average deviation of 0.048 ppm and a maximal deviation of ~0.38 ppm. The NOE patterns are unaltered, indicating that the structure is minimally disturbed by 113Cd substitution.
The 1H-decoupled 1D 113Cd spectrum shows three peaks at 734.2, 740.2 and 746.6 ppm (Fig. 3A), in agreement with the three Zn2+ ions identified by native electrospray mass spectrometry. All three peaks appear as triplets, although the splitting is most obvious for Cd-A. Each 113Cd ion is probably coupled to the other two 113Cd ions by sharing a bridging cysteine ligand, and the 113Cd-113Cd two-bond couplings (2J(Cd-Cd) ~30 Hz) split these resonances into triplets.
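The triplet pattern can be illustrated with a minimal first-order sketch. It assumes that each 113Cd (spin 1/2) is coupled to exactly two fully enriched 113Cd neighbours with equal couplings of about 30 Hz; the function name `multiplet` and the equal-coupling simplification are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter
from itertools import product


def multiplet(couplings_hz):
    """First-order multiplet for one nucleus coupled to spin-1/2 partners.

    Returns a sorted list of (offset_hz, relative_intensity) pairs, where
    the offset is measured from the unsplit chemical shift.
    """
    lines = Counter()
    # Each spin-1/2 partner is either "down" (-1/2) or "up" (+1/2).
    for spins in product((-0.5, 0.5), repeat=len(couplings_hz)):
        offset = sum(m * j for m, j in zip(spins, couplings_hz))
        lines[round(offset, 6)] += 1
    return sorted(lines.items())


# Assumed example: two equal two-bond 113Cd-113Cd couplings of 30 Hz.
pattern = multiplet([30.0, 30.0])
for offset, intensity in pattern:
    print(f"{offset:+6.1f} Hz  relative intensity {intensity}")
```

With two equal couplings the four spin-state combinations collapse into three lines with 1:2:1 intensities, which is the triplet expected when every cadmium shares a bridging cysteine with each of the other two.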
We collected a series of 2D 1H-113Cd HSQC spectra and an HMQC-TOCSY spectrum to correlate 113Cd ions to Hβ and Hα protons of coordinating Cys residues (Fig. 3C-F). Some Cys Hβ protons have close chemical shift values, hindering assignment of 113Cd coordination. To measure 1H chemical shifts with higher precision, we collected a 2D 1H-15N HSQC-TOCSY spectrum, in which 1H resonances were recorded in the more resolved direct dimension and could be directly aligned to 1H-113Cd HSQC spectra (Fig. 3B). Analysis of these spectra revealed that Cd-A (734.2 ppm) is coordinated to Cys525, Cys527, Cys539 and Cys544, Cd-B (740.2 ppm) to Cys525, Cys546, Cys553 and Cys556, and Cd-C (746.6 ppm) to Cys558 and Cys561. The other two ligands for Cd-C were missing in all spectra, probably because of small 3J(Hβ-Cd) values, and could not be assigned.

Structural Determination of MSL2 CXC Domain
The structure of the MSL2 CXC domain was initially calculated in CYANA based solely on autoassigned NOE cross-peaks without incorporation of zinc ions. The resulting structures converged and showed a compact fold with Cys clustering at the center. The structure was further refined in CNS with additional TALOS-derived dihedral restraints, protein-zinc restraints deduced from 113Cd NMR and explicit water (see Materials and Methods). Inspection of structures calculated without the Zn-C restraints suggested that Cys539 and Cys553 are the remaining two ligands for Zn-C that could not be identified with 113Cd NMR. Other assignments of Zn-C ligands were not consistent with the existing structural restraints. The numbers and types of restraints used in the final structure calculation and the statistics for the 20 lowest energy structures are given in Table 1.
The NMR structures are well defined with an average of 27 restraints per residue. The root mean square deviation (RMSD) values of the 20 best structures to the mean structure are 0.38 Å and 0.87 Å for backbone and heavy atoms in the structure core (residues 523-529 and 538-565), respectively (Fig. 4A). A few terminal residues and an internal loop (residues 530-537) are poorly defined in the structure because of a lack of long-range distance restraints. These residues are intrinsically dynamic on the ps-ns timescale, as evidenced by their reduced steady-state 1H-15N heteronuclear NOE values (Fig. 4B).

The Solution Structure of MSL2 CXC Domain
The CXC domain adopts a small globular fold that encapsulates three triangularly arranged zinc ions (Fig. 5B). Among nine Cys ligands, six are singly coordinated, and three (Cys525, Cys539 and Cys553) simultaneously bind two zinc ions, such that each zinc ion is tetrahedrally coordinated to two terminal and two bridging Cys. The structure is composed of two loops that wrap around the Zn3Cys9 cluster in a right-handed manner. The N-terminal loop (residues 521-550) including a short α-helix (residues 546-550) harbors five Cys residues, whereas the C-terminal loop (residues 551-567) contains four Cys residues.
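The coordination scheme described above can be checked with a short bookkeeping sketch. The ligand lists are taken from the text (the Zn-C ligands Cys539 and Cys553 were inferred from the structure rather than observed by 113Cd NMR); the dictionary layout and the helper names `classify_ligands` and `bridged_pairs` are illustrative assumptions.

```python
from collections import Counter

# Zinc-to-cysteine connectivities as stated in the text (residue numbers).
# Cys539 and Cys553 on Zn-C were deduced from the structure calculation.
COORDINATION = {
    "Zn-A": [525, 527, 539, 544],
    "Zn-B": [525, 546, 553, 556],
    "Zn-C": [539, 553, 558, 561],
}


def classify_ligands(coordination):
    """Split cysteines into terminal (one zinc) and bridging (two zincs)."""
    counts = Counter(res for ligands in coordination.values() for res in ligands)
    terminal = sorted(res for res, n in counts.items() if n == 1)
    bridging = sorted(res for res, n in counts.items() if n == 2)
    return terminal, bridging


def bridged_pairs(coordination):
    """Map each bridging cysteine to the pair of zinc ions it links."""
    _, bridging = classify_ligands(coordination)
    return {
        res: sorted(zn for zn, ligands in coordination.items() if res in ligands)
        for res in bridging
    }


terminal, bridging = classify_ligands(COORDINATION)
print("terminal Cys:", terminal)
print("bridging Cys:", bridging)
for res, zincs in bridged_pairs(COORDINATION).items():
    print(f"Cys{res} bridges {zincs[0]} and {zincs[1]}")
```

Every zinc ends up with four ligands, two terminal and two bridging, and each pair of zinc ions shares exactly one cysteine, which is what allows nine cysteines to satisfy three tetrahedral sites.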
An intact Zn3Cys9 cluster appears to be important for the structure and function of CXC domains. The Ala double substitution of Cys544 and Cys546 of MSL2 was previously shown to delay development and reduce male fly viability [14]. The same double mutation impaired the DNA-binding ability of MSL2 and abolished its targeting to an HAS in D. melanogaster cells [28]. According to our structure, these two Cys residues are ligands for Zn-A and Zn-B, respectively. The mutation likely disrupts the CXC domain structure, hence impairing its DNA-binding activity and in vivo function. In addition, Cys mutations in the CXC domain were shown to disrupt the function of TSO1 in flower development and cell division [34] and to impair the DNA-binding activity of human LIN54 [32].
Besides invariant Cys ligands, residues Arg526, Gly528, Thr538, Arg543, Tyr547 and Asn563 are also highly conserved among MSL2 homologs (Fig. 5A). These residues could be conserved for structural and/or functional reasons. In the structure, the side chains of Thr538 and Asn563 form hydrogen bonds with the carbonyl oxygen atoms of Gly560 and Asn551, respectively (Fig. 5C). These long-range interactions apparently stabilize the small fold. The aromatic ring of Tyr547 is partially buried and Gly528 is located in a β-turn; these two residues probably also play a structural role. The Ala mutation of Tyr547 has been shown to disrupt the DNA binding and HAS targeting of MSL2 [28]. By contrast, the conserved Arg526 and Arg543 are solvent exposed, and together with other less conserved basic residues they constitute a positively charged surface patch (Fig. 5D). The two conserved arginine residues probably contribute to the function of the CXC domain, such as DNA interaction.
To validate the structure, we conducted a proton-deuterium exchange experiment. The lyophilized CXC protein was dissolved in 2H2O and monitored with 1H-15N HSQC spectra for disappearance of amide protons. Seventeen amide protons were observed in the first recorded spectrum, ten persisted after 2 h and four remained after 24 h (Fig. 6A-D). According to our structure, these slowly exchanging amide protons are generally involved in hydrogen bonding or are buried, and hence are protected from solvent exchange (Fig. 6E). In particular, the two side-chain amide protons of Asn563, but not those of other Asn or Gln residues, were protected, supporting that they are hydrogen bonded (Fig. 5C).

Structural Similarity between CXC and Pre-SET Domains
Similar Zn3Cys9 clusters have been previously described in metallothioneins (MTs) and the SUV39 family of SET domain histone lysine methyltransferases (HKMTs). MTs are ubiquitous Cys-rich small proteins (6-7 kDa) with a high content of divalent metal ions such as Zn2+, Cd2+ and Cu1+ and are involved in metal homeostasis, detoxification and protection against reactive oxygen [37]. The mammalian MT II is composed of a β-domain that binds an M3Cys9 metal-thiolate cluster and an α-domain that binds an M4Cys11 cluster [38]. M3Cys9 clusters are also present in other families of MTs [39][40][41]. However, the structural fold and linear order of Cys ligands in the MSL2 CXC domain are distinct from those in various types of structurally characterized MTs [38][39][40][41]. MTs and CXC domains are also unrelated in function.
Surprisingly, the CXC domain structure shows remarkable resemblance to structures of pre-SET motifs in the SUV39 family of HKMTs [42][43][44]. SET domain HKMTs have been classified into at least seven families, and the SET domains of SUV39, SET2 and EZ family proteins are preceded by a family-specific Cys-rich pre-SET motif [45].
Structures of SUV39 family HKMTs show that the pre-SET domain coordinates three zinc ions with nine invariant Cys residues [42][43][44]. Like the MSL2 CXC domain, the zinc-binding structure of SUV39 pre-SET domains is composed of two loops with five and four Cys residues (Fig. 7A,B). Importantly, the linear order of Cys ligands for each of the three zinc ions is strictly conserved between MSL2 CXC and SUV39 pre-SET domains (Fig. 7D). The spacing of Cys normally varies within two residues except in two regions. First, the SUV39 pre-SET domains contain an insertion of 25-60 residues between Cys-5 and Cys-6 that interacts with the SET domain. Second, the segment between Cys-2 and Cys-3 is longer in the MSL2 CXC domain (11 residues) than in SUV39 pre-SET domains (2-5 residues).
The unexpected structural similarity of SUV39 pre-SET domains and the CXC domain led us to examine whether other pre-SET motifs are related to the CXC domain. Several crystal structures have been recently determined for SET2 family HKMTs ([46,47] and PDB 3H6L). These structures show that the pre-SET motif of SET2 proteins, which is also known as associated with SET domain (AWS), contains a Zn2Cys7 cluster (Fig. 7C). Despite having a different Zn-Cys cluster, the zinc-binding structure of the SET2 pre-SET domain bears significant similarity to the SUV39 pre-SET and CXC structures. The SET2 pre-SET domain can be considered a CXC domain variant that has a similar binding mode for Zn-A and Zn-C but that loses binding of Zn-B because of the absence of Cys-5 and Cys-7 equivalents (Fig. 7D). Cys-3 still bridges Zn-A and Zn-C as in CXC and SUV39 pre-SET structures.
No structure is currently available for the EZ family HKMTs. The EZ pre-SET motif contains 17 invariant Cys residues within a region of about 80 residues. Some sequence similarity has been noted between the CXC and EZ pre-SET domains [31,33,34,35]. In fact, the name "CXC domain" was originally coined to describe the pre-SET motif of EZ proteins [48] and later adopted to designate the Cys-rich regions in TSO1 and MSL2 proteins [31,35]. A single CXC domain was previously identified in the EZ pre-SET, but it lacks an equivalent of Cys-7 [31]. Considering the structural constraint that nine ligands are required for binding three zinc ions, we revised the alignment and identified two CXC domains in EZ pre-SET (Fig. 7D). In this new alignment, the second ligand of the N-terminal CXC domain is His, which is invariant in EZ pre-SET, rather than Cys. Histidine has been shown to coordinate zinc clusters as a terminal ligand [49]. The tandem CXC domains of EZ pre-SET are immediately adjacent to each other, in contrast with those found in tesmin/TSO1 proteins, which are separated by 40-60 residues.
We found that the second position C-terminal of Cys-9 is always occupied by an Asn residue in CXC and three types of pre-SET domains (Fig. 7D). In the crystal structures of SUV39 and SET2 family proteins, the equivalent Asn plays an important role in stabilizing the conformation of the C-terminal loop with its side chain amide nitrogen making a hydrogen bond to the carbonyl oxygen of the second residue N-terminal of Cys-6 (Fig. 7B, C). In the NMR structure of the MSL2 CXC domain, the Asn side chain makes a similar interaction with the polypeptide backbone (Fig. 7A). The C-terminal Asn is also invariant in the CXC domains of tesmin/TSO1 proteins and the two reassigned CXC domains of EZ pre-SET. These findings indicate that the C-terminal Asn is a signature residue of the CXC superfamily.
In summary, we show that three types of pre-SET motifs are all related to the CXC domain. SUV39 pre-SET is a CXC domain with a large insertion, SET2 pre-SET is a variant CXC domain lacking one zinc ion and EZ pre-SET appears to contain tandem CXC domains. We can define the consensus sequence of the CXC superfamily as C-X-[C/H]-X(2-13)-C-X(4-7)-C-X-C-X(5-60)-C-X(2-4)-C-X(1-2)-C-X(2-15)-C-X-N, where X is any residue and the numbers in parentheses give the allowed range of spacer lengths.
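As an illustration of how the consensus could be used to scan sequences, the following sketch translates it into a regular expression. The test sequence is synthetic: it places nine Cys at the spacing implied by the MSL2 ligand residue numbers (525, 527, 539, 544, 546, 553, 556, 558, 561) followed by X-N, with Ala as filler, and is not the real MSL2 sequence. The names `CXC_CONSENSUS` and `synthetic_cxc` are hypothetical.

```python
import re

# Regex form of the consensus stated in the text; "." stands for X.
CXC_CONSENSUS = re.compile(
    r"C.[CH].{2,13}C.{4,7}C.C.{5,60}C.{2,4}C.{1,2}C.{2,15}C.N"
)

# Number of residues between consecutive Cys ligands in MSL2,
# derived from the residue numbers given in the text.
MSL2_SPACERS = (1, 11, 4, 1, 6, 2, 1, 2)


def synthetic_cxc(spacers=MSL2_SPACERS, filler="A"):
    """Build an artificial CXC-like sequence: Cys ligands, filler, then X-N."""
    parts = ["C"]
    for n in spacers:
        parts.append(filler * n + "C")
    return "".join(parts) + filler + "N"


seq = synthetic_cxc()
his_variant = seq[:2] + "H" + seq[3:]  # ligand 2 as His, as proposed for EZ pre-SET
cys_loss = seq[:2] + "A" + seq[3:]     # ligand 2 replaced by a non-ligand

for name, s in [("synthetic", seq), ("His variant", his_variant), ("Cys loss", cys_loss)]:
    print(f"{name:12s} {'match' if CXC_CONSENSUS.search(s) else 'no match'}")
```

The pattern is deliberately permissive in its spacers, so a hit only flags a candidate; the SET2-type variants that lack Cys-5 and Cys-7 would not match and need a separate pattern.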

Discussion
The MSL1 and MSL2 complex recognizes HAS in the X chromosome and mediates the first step of X-targeting by the DCC. The CXC domain is the only DNA-binding domain identified so far in the MSL1/MSL2 complex and contributes critically to the recognition of HAS by MSL2 in vivo [28]. We have determined the first structure of the MSL2 CXC domain by NMR spectroscopy, revealing a surprising Zn3Cys9-containing fold. The structure reveals the role of nine invariant Cys residues in coordinating three zinc ions. The strong sequence conservation suggests that the CXC domains of tesmin/TSO1 family proteins should adopt a structure similar to that of the MSL2 CXC domain, including a Zn3Cys9 cluster.
We have identified unexpected structural homology between CXC and pre-SET domains, suggesting that they share a common ancestor domain in evolution. This finding also expands the structural diversity and distribution of the CXC domain superfamily. These deviant CXC domains have large variations in Cys spacing, degenerate zinc binding ligands or His ligands and are thus difficult to recognize if no structure is available. The structural knowledge and derived consensus sequence of the CXC domain superfamily may allow for more deviant CXC domains to be identified.
No specific function other than a structural role has been assigned to pre-SET domains. DNA binding appears to be the primary function of CXC domains [28,32,33]. The homology with the CXC domain implies that pre-SET domains may have a role in binding DNA, which could facilitate HKMT recognition of nucleosomal substrates.
Many DNA-binding domains, such as helix-turn-helix motifs, zinc fingers and leucine zippers, use an α-helix to contact DNA at the major groove. The CXC structure is distinct from previously characterized DNA-binding domains and apparently lacks such a DNA-binding α-helix. Determining the structure of the CXC domain in complex with DNA will help to elucidate its likely distinct DNA-binding mode.

Figure 4 caption (partial): The experiment was conducted for CXC-2, which contains residues 517-572 plus three extra N-terminal residues from the vector. No data were obtained for proline residues, which lack an amide proton. Error bars represent the experimental uncertainties estimated from the spectrum background noise. doi:10.1371/journal.pone.0045437.g004

Protein Expression and Purification
The MSL2 cDNA was obtained from the Drosophila Genomics Resource Center. The coding sequence of the CXC domain corresponding to residues 517-572 was amplified by PCR and cloned into a pGEX-6p-1 vector (GE Healthcare). The mutation C560G was introduced by QuikChange site-directed mutagenesis (Stratagene), yielding the GST-fused CXC-2 construct. The coding sequence consisting of residues 520-570 and the C560G mutation was amplified from the CXC-2 plasmid and subcloned into an engineered pET28a vector, yielding the CXC-3 construct. The CXC-3 protein was expressed as a fusion to an N-terminal His6-SMT3 tag. Single point mutations of R526A and R543A were introduced into CXC-3 by QuikChange. All constructs were confirmed by DNA sequencing.
Escherichia coli Rosetta(DE3) cells containing CXC expression vectors were grown at 37 °C in LB broth. When the OD600 reached 0.8, the growth temperature was lowered to 16 °C, and the culture medium was supplemented with 40 mM ZnSO4 and 0.2 mM isopropyl-β-D-thiogalactopyranoside to induce protein expression. After additional growth for 16 h, the cells were harvested, suspended in buffer A consisting of 50 mM Tris-HCl (pH 8.0) and 250 mM NaCl, lysed by sonication and centrifuged.
The clarified lysate of GST-fused CXC-2 was loaded onto a GSTrap column. The GST-tag was cleaved on column by PreScission protease overnight. The released CXC-2 protein was washed out with buffer A, diluted 3-fold with 25 mM HEPES-K (pH 7.6), loaded onto a heparin column (GE Healthcare) equilibrated in 25 mM HEPES-K (pH 7.6) and 80 mM KCl and eluted using a linear gradient of KCl. Proteins eluting at less than 300 mM KCl were pooled, concentrated with 3-kDa cutoff ultrafiltration devices (Amicon) and further purified with a Superdex 75 column in 50 mM phosphate buffer (pH 6.0).
The His6-SMT3-tagged CXC-3 protein was purified with a HisTrap column (GE Healthcare) and eluted with 500 mM imidazole in buffer A. The pooled fractions were incubated with ULP1 for 1 h on ice to cleave the His6-SMT3 tag. After a threefold dilution in 25 mM HEPES-K (pH 7.6), the protein was loaded onto a Q column. The flow-through containing CXC-3 was further purified by heparin and gel filtration chromatography following the same procedure as for CXC-2. Protein concentrations were determined spectrophotometrically with a calculated molar extinction coefficient of 4470 M-1 cm-1 at 280 nm for both CXC-2 and CXC-3. 15N- or 15N/13C-labeled CXC proteins were prepared in M9 minimal medium with 1 g/L of (15NH4)2SO4 and, if needed, 2 g/L of 13C-glucose as the sole nitrogen and carbon sources, respectively (Cambridge Isotope Laboratories). For 113Cd labeling, ZnCl2 in M9 media was substituted with 10 nM 113Cd acetate. E. coli growth and protein yields were normal in 113Cd-containing media.
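For reference, the spectrophotometric concentration estimate follows the Beer-Lambert law, c = A280 / (ε · l). The sketch below uses the extinction coefficient quoted above; the absorbance reading, the 1 cm path length and the function name `molar_concentration` are assumed for illustration only.

```python
EXTINCTION_280 = 4470.0  # M^-1 cm^-1, value quoted in the text for CXC-2 and CXC-3


def molar_concentration(a280, path_cm=1.0, epsilon=EXTINCTION_280):
    """Return the molar concentration from an absorbance at 280 nm."""
    if a280 < 0 or path_cm <= 0 or epsilon <= 0:
        raise ValueError("absorbance must be >= 0; path length and epsilon must be > 0")
    return a280 / (epsilon * path_cm)


# Hypothetical reading: A280 = 0.447 in a 1 cm cuvette.
conc = molar_concentration(0.447)
print(f"{conc * 1e3:.3f} mM")
```

Because the extinction coefficient is small, samples in the 0.5-1.5 mM range used for NMR would need dilution or a short path length to stay within the linear range of a typical spectrophotometer.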

Electrospray Ionization Mass Spectrometry
The CXC-3 protein was exchanged into 200 mM ammonium acetate and analyzed by electrospray ionization mass spectrometry with a Q-Star mass spectrometer (Applied Biosystems). To calculate the monoisotopic mass of a species, the mass-to-charge (m/z) ratio of its monoisotopic peak was multiplied by the charge, and the charge was then subtracted. The monoisotopic mass of CXC-3, which contains residues 520-570, mutation C560G and an extra N-terminal Ser from the vector, is 5638.48 Da. The exact mass of the Zn2+ ion was calculated as the mass of 64Zn (63.929) minus 2, to compensate for the two positive charges of the Zn2+ ion.
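The arithmetic behind the stoichiometry assignment can be reproduced with a few lines. This sketch follows the procedure described above, using the masses quoted in the text; the charge state in the usage example and the function names are hypothetical.

```python
# Values quoted in the text (Da).
APO_MASS = 5638.48        # monoisotopic mass of metal-free CXC-3
ZN64_MASS = 63.929        # mass of 64Zn
ZN_CHARGE = 2             # subtracted per bound Zn2+, as described in the text
OBSERVED_MASS = 5824.0875  # major species in the native mass spectrum


def neutral_mass(mz, charge):
    """Monoisotopic m/z multiplied by the charge, minus the charge."""
    return mz * charge - charge


def expected_mass(n_zn, apo_mass=APO_MASS):
    """Expected monoisotopic mass of the protein with n_zn bound Zn2+ ions."""
    return apo_mass + n_zn * (ZN64_MASS - ZN_CHARGE)


candidates = {n: expected_mass(n) for n in range(5)}
best_n = min(candidates, key=lambda n: abs(candidates[n] - OBSERVED_MASS))

for n, mass in candidates.items():
    print(f"{n} Zn: {mass:9.3f} Da  (difference {OBSERVED_MASS - mass:+8.3f} Da)")
print("best match:", best_n, "Zn2+ ions")
```

Only the three-zinc species falls within a fraction of a dalton of the observed mass; each neighbouring stoichiometry is off by about 62 Da.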

NMR Experiments
The NMR samples contained 0.5-1.5 mM CXC-2 or CXC-3 proteins, 50 mM potassium phosphate (pH 6.0), 0.01% (w/v) sodium 2,2-dimethylsilapentane-5-sulfonate (DSS) and 10% (v/v) 2H2O. NMR data were recorded at 298 K on a Bruker DMX600 spectrometer equipped with a triple resonance cryoprobe, unless indicated otherwise. Three-dimensional (3D) CBCA(CO)NH, HNCACB and HNCO spectra were collected to obtain sequence-specific backbone resonance assignments [50]. 3D HAHB(CO)NH, CC(CO)NH, CCH-TOCSY (mixing time τm 12 ms), HCCH-TOCSY (τm 12 ms), 1H-15N TOCSY-HSQC (τm 60 ms) and 2D 1H-13C HSQC spectra were collected for side-chain assignments. All spectra were processed with Felix (Accelrys Inc.) and analyzed with NMRViewJ [51]. The 1H chemical shifts were referenced to internal DSS, and the 15N and 13C chemical shifts were referenced indirectly. The 1H-15N steady-state heteronuclear NOE values were calculated from the ratios of peak intensities in a 1H-15N HSQC spectrum collected with a 3 s period of initial proton saturation to those in an unsaturated spectrum. The error of peak intensity was estimated from spectrum background noise and propagated into the error of the 1H-15N heteronuclear NOE.
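The heteronuclear NOE ratio and its uncertainty can be sketched as follows. The text states only that noise-derived intensity errors were propagated; the standard first-order propagation for a ratio of two independent measurements used here is an assumption, as are the function name `het_noe` and the example intensities.

```python
import math


def het_noe(i_sat, i_ref, noise_sat, noise_ref):
    """Steady-state heteronuclear NOE and its propagated uncertainty.

    i_sat, i_ref: peak intensities with and without proton saturation.
    noise_sat, noise_ref: background noise levels of the two spectra,
    taken as the uncertainty of each peak intensity.
    """
    if i_ref == 0:
        raise ValueError("reference intensity must be non-zero")
    noe = i_sat / i_ref
    # First-order error propagation for a ratio of independent quantities.
    err = math.sqrt(noise_sat ** 2 + (noe * noise_ref) ** 2) / abs(i_ref)
    return noe, err


# Hypothetical intensities (arbitrary units) for one residue.
noe, err = het_noe(i_sat=80.0, i_ref=100.0, noise_sat=2.0, noise_ref=2.0)
print(f"NOE = {noe:.3f} +/- {err:.3f}")
```

Residues in flexible regions give smaller ratios, and weak peaks give larger error bars because the noise is a larger fraction of the intensity.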
In hydrogen-deuterium exchange experiments, the CXC-3 protein originally prepared in 500 µl of 50 mM potassium phosphate (pH 6.0) was lyophilized and redissolved in 500 µl 2H2O. A series of 1H-15

Structure Calculation
NOE-based distance restraints were derived from 3D 1H-15N NOESY-HSQC (τm 200 ms), 3D aliphatic 1H-13C NOESY-HSQC (τm 200 ms) and 3D aromatic 1H-13C NOESY-HSQC (τm 200 ms) spectra. Inter-proton distances were obtained with NMRViewJ using an exponential calibration from peak volumes, and the upper limits of restraints were set to 2.2-6.0 Å. The CXC structure was initially calculated in CYANA solely from autoassigned NOE peaks [52]. More than 80% of NOE peaks were assigned this way, and the structure calculation converged with an RMSD of 1.12 Å for the backbone atoms of the 20 best structures.

Figure 7 caption (partial): ... Drosophila melanogaster E(z). The observed or predicted zinc ligands are numbered sequentially from 1 to 9. The invariant C-terminal Asn is marked with an asterisk. The coordination patterns of zinc ions observed in these structures are shown. The starting and ending residues of each sequence are labeled with residue numbers. A 48-residue region is omitted in DIM5 pre-SET. The zinc ligands in NSD1 pre-SET are numbered according to those in the CXC domain. doi:10.1371/journal.pone.0045437.g007
The CYANA-generated model was further refined in CNS [53], incorporating additional dihedral angle and Zn restraints. The CYANA assignments of the NOE peaks were checked manually. Backbone dihedral angle restraints were derived from HN, Hα, Cα, Cβ, C' and N chemical shifts using TALOS+ [54]. The Zn tetrahedral coordination geometry was maintained by setting the Zn-Sγ(Cys) bond length to 2.3 Å and the Sγ(Cys)-Zn-Sγ(Cys) bond angle to 109.5°. The Zn restraints that could be assigned by 113Cd NMR experiments were incorporated first. Zn-C was assigned by 113Cd NMR to be ligated to Cys558 and Cys561. Inspection of the resulting structures suggested that Cys539 and Cys553 are the remaining two ligands for Zn-C. Incorporation of these two Zn-C restraints caused no NOE violations during the structure calculation, whereas coordination of Zn-C with different ligands caused violations around the Zn-cysteine cluster, supporting the assignment. A total of 100 structures were calculated, and the 50 lowest energy structures were further refined with electrostatic potentials and explicit water using CNS and RECOORDScript [55]. The 20 lowest energy structures were selected for final analysis using PROCHECK-NMR [56], MolMol [57] and WHAT_CHECK [58]. Structural figures were created with PyMOL [59].

Accession Numbers
The NMR resonance assignments for CXC-3 have been deposited in the BioMagResBank with accession number 18514. The atomic coordinates and experimental NMR restraints for the MSL2 CXC domain have been deposited in the Protein Data Bank with accession code 2LUA.