The Identification and Structure of an N-Terminal PR Domain Show that FOG1 Is a Member of the PRDM Family of Proteins

FOG1 is a transcriptional regulator that acts in concert with the hematopoietic master regulator GATA1 to coordinate the differentiation of platelets and erythrocytes. Despite considerable effort, however, the mechanisms through which FOG1 regulates gene expression are only partially understood. Here we report the discovery of a previously unrecognized domain in FOG1: a PR (PRDI-BF1 and RIZ) domain that is distantly related in sequence to the SET domains that are found in many histone methyltransferases. We have used NMR spectroscopy to determine the solution structure of this domain, revealing that the domain shares close structural similarity with SET domains. Titration with S-adenosyl-L-homocysteine, the cofactor product synonymous with SET domain methyltransferase activity, indicated that the FOG PR domain is not, however, likely to function as a methyltransferase in the same fashion. We also sought to define the function of this domain using both pulldown experiments and gel shift assays. However, neither pulldowns from mammalian nuclear extracts nor yeast two-hybrid assays reproducibly revealed binding partners, and we were unable to detect nucleic-acid-binding activity in this domain using our high-diversity Pentaprobe oligonucleotides. Overall, our data demonstrate that FOG1 is a member of the PRDM (PR domain containing proteins, with zinc fingers) family of transcriptional regulators. The function of many PR domains, however, remains somewhat enigmatic for the time being.


Introduction
The activity of the transcription factor GATA1 in erythroid development is modulated by a range of coregulators, including Friend of GATA 1 (FOG1). FOG1 is a nine-zinc-finger protein (Figure 1A) that is essential for proper differentiation and maturation of both megakaryocytes and erythroid precursors [1]. FOG1 knockout mice die at E10.5-11.5 due to severe anaemia with arrest in erythroid development, a phenotype that is related to that observed in GATA1 knockout mice [2]. FOG1 and GATA1 interact both functionally [3] and physically [4], and disruption of the normal interaction of FOG1 and GATA1 has been linked to a range of inherited blood disorders (reviewed in [5]).
Despite FOG1 containing nine classical zinc-finger domains, there is no evidence to date that the protein binds directly to nucleic acids, suggesting that FOG1 most likely regulates GATA1 activity by recruiting co-regulator complexes. FOG1 is required for both the activation and the repression of most GATA1 target genes [6][7][8][9]. FOG-mediated repression of GATA1 in transient transfection assays and ectopic expression both depend on its ability to recruit the co-repressor C-terminal binding protein (CtBP) [10][11][12], via a PXDLS motif between zinc fingers 6 and 7. However, FOG1 does not appear to require its major PXDLS CtBP-binding motif during erythropoiesis, since mice carrying a FOG1 mutant with reduced CtBP binding develop normally [13], suggesting that FOG1 recruits another repressor complex during erythropoiesis.
The N-terminus of FOG1 appears to be particularly important for its function. During megakaryopoiesis, deletion mutants lacking residues 1-144 could at least partially rescue erythroid but not megakaryocyte maturation, suggesting a lineage-specific role for the N-terminal region [13]. Subsequently, residues 1-12 of FOG1 were shown to be able to mediate transcriptional repression by GATA1 [14], via recruitment of the nucleosome remodeling and deacetylation (NuRD) complex [15][16][17]. Similarly, the N-terminal region of FOG2 represses GATA-4 activity [17], although the possibility remains open that other regions might contribute to repression.
As part of an effort to understand the molecular mechanisms through which FOG1 regulates gene expression during hematopoietic development, we analyzed the amino acid sequence of murine FOG1 (Uniprot: O35615). PONDR (http://www.pondr.com/), a program that predicts the distribution of structured and natively disordered regions in a protein sequence, predicted that part of the region P100-V254 of FOG1 is likely to be well-ordered. Sequence comparisons reveal similarity of up to ~30% to the PR (PRDI-BF1 and RIZ homology) domains found in the human proteins PRDM1-17 (Figure 1B). The relatively low degree of similarity means that programs such as Interpro (http://www.ebi.ac.uk/interpro/) do not reveal any domains in FOG1 other than the nine well-characterized classical zinc fingers indicated in Figure 1A.
PRDM-family proteins are gene regulatory proteins that are found in metazoans, but not plants or fungi. Seventeen such proteins have been defined in primates, whereas only two are found in the sea squirt Ciona intestinalis, indicating a substantial expansion during vertebrate evolution. Their biological roles are still not well understood in many cases, but a number of family members appear to act in stem cells and in cellular differentiation (reviewed in [18]). PRDM14 is important in stem cell biology and epigenetic reprogramming (reviewed in [19]), PRDM3 is required for the integrity of heterochromatin [20] and PRDM16 is essential for maintaining adipocyte identity [21]. Not surprisingly therefore, dysregulation of PRDM activity has been associated with several different types of cancer [22][23][24].
All 17 human proteins contain an N-terminal PR domain and all but PRDM11 contain an array of between four and fifteen classical zinc fingers clustered in a range of different patterns [18,25]. The PR domain bears structural similarity to the catalytic SET domains (named for the Drosophila proteins Suppressor of variegation 3-9, Enhancer of zeste and Trithorax) found in histone lysine methyltransferases [26], although in general many of the residues associated with catalytic activity in the SET domains are not conserved in PR domains. Despite the absence of these residues, however, methyltransferase activity has been observed in a number of PRDM proteins (PRDM2, -3, -6, -8, -9 and -16) [20][27][28][29][30][31]. Members of the family have also been demonstrated to act as sequence-specific DNA-binding proteins (most likely through their zinc-finger domains) or as protein-recruitment agents at gene regulatory elements, and a clear consensus view of the function of these proteins as a class has not yet emerged.

Figure 1 (caption fragment). The essential catalytic consensus motif found in SET domains is shown below the alignment and indicated with a dashed box. Secondary structure elements in FOG-PR are indicated below the alignment. Alignment was carried out using CLUSTAL OMEGA [49] and the diagram made using ALINE [50]. doi:10.1371/journal.pone.0106011.g001

Determination of the solution structure of FOG1-PR
To characterize this predicted domain in FOG1, we overexpressed (as a GST-fusion protein in Escherichia coli) a polypeptide corresponding to residues P100-V254 of murine FOG1 (hereafter referred to as FOG-PR). We purified it by GSH-affinity chromatography and, following removal of the GST by cleavage with thrombin, by size exclusion chromatography. Far-UV circular dichroism and one-dimensional 1H NMR spectra (not shown) revealed that this polypeptide contained substantial β-sheet secondary structure and took up a well-defined conformation in solution.
Size exclusion chromatography with in-line multi-angle laser light scattering (MALLS) gave a mass estimate of ~17.2 kDa, indicating that FOG-PR (theoretical MW = 17.0 kDa) is monomeric in solution and suggesting that FOG-PR should be a suitable candidate for structural analysis by NMR spectroscopy. Accordingly, the 15N-HSQC spectrum of FOG-PR contains approximately the expected number of signals for a 150-residue protein and displays linewidths and chemical shift dispersion consistent with a folded monomer (Figure 2).
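The monomer assignment rests on a simple ratio of the measured to the theoretical mass. A minimal sketch of that arithmetic follows; the function name and the rounding rule are illustrative assumptions, not part of the authors' analysis.

```python
def oligomeric_state(measured_kda: float, monomer_kda: float) -> int:
    """Return the nearest whole number of subunits implied by a measured mass.

    Hypothetical helper: rounds the mass ratio to the nearest integer and
    never reports fewer than one subunit.
    """
    if measured_kda <= 0 or monomer_kda <= 0:
        raise ValueError("masses must be positive")
    return max(1, round(measured_kda / monomer_kda))


measured = 17.2      # kDa, MALLS estimate quoted in the text
theoretical = 17.0   # kDa, theoretical monomer mass quoted in the text

ratio = measured / theoretical
state = oligomeric_state(measured, theoretical)
print(f"mass ratio = {ratio:.2f}, nearest oligomeric state = {state}")
```

A ratio close to 1 corresponds to a monomer; a dimer of the same polypeptide would give a ratio close to 2.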
We went on to determine the solution structure of FOG-PR using multidimensional heteronuclear NMR methods. Assignments were made for ~95% of expected backbone 1H, 13C and 15N nuclei, and ~85% of side chain 1H and 13C nuclei in the region P100-E207. Little or no data were observed for the residues E147 and E148, and assignments could only be made with confidence for a small number of residues in the C-terminal region (P208-V254). Approximately 25 signals in the 15N-HSQC therefore remained unassigned, but nearly all of these had rather narrow linewidths, HN chemical shifts in the range ~7.8-8.5 ppm and few NOEs in a 15N-edited NOESY spectrum. Taken together, these observations indicate that the C-terminal part of the polypeptide is disordered. The results of limited proteolysis carried out on FOG1(100-254) were also consistent with this conclusion, identifying a major proteolysis product corresponding to FOG1(100-214). 15N-edited, 13C-edited and 2D NOESY spectra were peak-picked and CYANA 3.0 [32] was used to assign NOEs and calculate structures. The 50 lowest energy structures, calculated with 1244 distance restraints and TALOS+-derived dihedral angle restraints for 89 residues, were refined in explicit water using CNS, according to the RECOORD protocol [33]. The 20 lowest energy structures were used to represent the structure of FOG-PR (Figure 3 and Table 1). This family of structures has a backbone RMSD (over residues with φ and ψ angle order parameters of >0.95) of 0.67 Å, and exhibits no NOE violations >0.5 Å. The well-ordered region of the protein was defined as ranging from G103-I137 and D158-V205; the sequence Q138-V157 and the sequences N- and C-terminal to the ordered region exhibited random-coil chemical shifts, gave rise to no non-sequential NOEs, and had very low backbone angle order parameters in the final structures.
PROCHECK_NMR analysis showed that, on a Ramachandran plot, 99.8% of residues in the well-ordered region of the protein fall within the most favoured or additionally allowed regions (calculated for non-Pro, non-Gly residues). 15N T1, T2 and heteronuclear NOE data were consistent with this arrangement (Figure 4).
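The well-ordered region above is selected using backbone dihedral angle order parameters. The text does not say which software computed them, so the following is only a sketch of the commonly used circular definition, in which each angle is treated as a unit vector and the order parameter is the length of the mean vector across the ensemble. The function names and the example angles are hypothetical.

```python
import math


def angular_order_parameter(angles_deg):
    """Circular order parameter S for one dihedral angle across an ensemble.

    S = |sum_j (cos a_j, sin a_j)| / N. S equals 1 when every conformer has
    the same angle and approaches 0 when the angles are spread evenly around
    the circle.
    """
    n = len(angles_deg)
    if n == 0:
        raise ValueError("need at least one angle")
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(cos_sum, sin_sum) / n


def well_ordered(phi_deg, psi_deg, cutoff=0.95):
    """True if both the phi and psi order parameters exceed the cutoff."""
    return (angular_order_parameter(phi_deg) > cutoff
            and angular_order_parameter(psi_deg) > cutoff)


# Hypothetical angles (degrees) for one residue in four conformers.
tight_phi = [-62.0, -60.5, -63.1, -61.2]
tight_psi = [-41.0, -43.5, -40.2, -42.8]
loose_phi = [-60.0, 65.0, 170.0, -120.0]
loose_psi = [140.0, -45.0, 30.0, -170.0]

print(well_ordered(tight_phi, tight_psi))
print(well_ordered(loose_phi, loose_psi))
```

Working with unit vectors rather than raw angle values avoids the wrap-around problem at ±180°, where two nearly identical conformations would otherwise appear to differ by almost 360°.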

Comparison of the structure with SET and other PR domains
The region encompassing the PR domain (P100-V205) of murine FOG1 has ~90% and ~67% sequence similarity with the corresponding regions of human FOG1 and Xenopus laevis FOG1, respectively, indicating that this structure is conserved across all FOG1 homologues. It is also clearly conserved in FOG2, although it does not appear to be present in the related Drosophila protein U-shaped. Overall, the fold closely resembles that of the enzymatic SET domains found in many lysine methyltransferases [34]. Examination of the >10 structures of SET domains determined to date shows that the domain consists of a number of semi-modular subunits. The core SET domain appears to comprise N- and C-terminal regions (SET-N and SET-C) that are relatively invariant between different domains, together with a central SET-I region that has widely varying length and structure [35]. In addition, flanking domains (pre-SET and post-SET domains) are generally observed, which are also somewhat variable in nature. Figure 3C shows a comparison of the structure of FOG-PR and the core of the SET domain of DIM-5 [36]. The arrangement of secondary structural elements is clearly the same, down to the presence of the C-terminal pseudo-knot that is common to all SET domain structures solved to date (purple). The SET-N and SET-C regions clearly match those in DIM-5. In contrast, SET-I is an α+β structure in DIM-5 (yellow), whereas in FOG this region simply comprises a disordered 12-residue loop. The residues in this loop gave rise to broad (or no) signals in the NMR spectra, indicating that they participate in μs-ms timescale motion; it is possible that this region forms a marginally stable structure. DIM-5 also displays an elaborate pre-SET domain (grey) that binds three Zn(II) ions. A structure-guided alignment of the DIM-5 and FOG-PR sequences reveals that the most highly clustered set of conserved residues lies in the hydrophobic core that is common to the two structures.
Structures have also been determined for the PR domains of PRDM1, 2, 4, 9, 10, 11 and 12 (although only structures of PRDM2 have been published [37,38]). These structures more closely resemble that of FOG-PR (Figure 3D shows a comparison of FOG-PR with the PRDM4 PR domain), with fewer elaborations than the SET domains. Notably, all seven structures of PR domains display a three-stranded β-sheet in the SET-I region that is disordered in FOG-PR, although some poorly

FOG-PR is unlikely to have methyltransferase activity
SET domains possess histone methyltransferase activity towards specific lysine residues in histone tails, leading to positive or negative regulation of gene expression, depending on context. The transferred methyl group is derived from the cofactor S-adenosyl-L-methionine (SAM), and the transfer reaction gives rise to the cofactor product S-adenosyl-L-homocysteine (AdoHcy). Several structures have been determined of SET domains in the presence of AdoHcy, allowing the identification of the conserved substrate-binding pocket of the protein (see, for example, [39,40]). To test whether or not FOG-PR might also act as a methyltransferase domain, we titrated AdoHcy into a solution of 15N-labeled FOG-PR and recorded 15N-HSQC spectra. No changes were observed following the addition of up to 100 molar equivalents of AdoHcy (not shown), suggesting that FOG-PR is unlikely to act as a SAM-dependent methyltransferase.
This lack of binding is consistent with the absence of asparagine and cysteine residues that are highly conserved in the AdoMet/AdoHcy cofactor-binding region of SET domains [35,40]. These residues form part of an H/RxxNHxC motif that is thought to be important for enzymatic activity. As noted above, methyltransferase activity has been observed in a number of PRDM proteins, suggesting that other residues might well be able to fulfil their roles. In the case of PRDM9, a structure has been determined of the PR domain bound to both a histone H3-derived substrate peptide and AdoHcy [41]. It is notable that a substantial portion of the binding site for both molecules is derived from residues in either the SET-I region or in the sequence immediately C-terminal to the pseudo-knot. Both of these regions are disordered in the FOG-PR structure and in general the residues that make contacts with AdoHcy and histone H3 do not appear to be conserved in FOG-PR.

Further efforts to pinpoint the function of FOG-PR
If only some PR domains act as methyltransferases, the question arises as to what the function is of the remaining domains. Some PRDM proteins have been shown to associate with DNA and to recruit other proteins to chromatin [18], and it is therefore possible that PR domains can act as either DNA- or protein-binding modules. We used gel shift assays to assess the DNA-binding properties of FOG-PR. Previously we described Pentaprobes, a set of six high-diversity oligonucleotides that contain all possible five-base-pair sequences [42]. Data from our lab indicate that bona fide DNA-binding proteins will typically bind to Pentaprobes in a gel shift assay. However, we observed no binding of FOG-PR to any of the six double-stranded Pentaprobes. Similarly, no binding was observed to single-stranded forms of the Pentaprobes (data not shown).
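The defining property of such a probe set, that every possible five-base sequence occurs somewhere in it, can be checked computationally. The sketch below is illustrative only: the actual Pentaprobe sequences are not given here, so it builds a stand-in set of six overlapping probes from a generic de Bruijn sequence and then verifies coverage. All function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All distinct length-k windows in a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def missing_kmers(probes, k=5, double_stranded=True) -> set:
    """k-mers over ACGT that appear in none of the probes."""
    seen = set()
    for probe in probes:
        seen |= kmers(probe, k)
        if double_stranded:
            seen |= kmers(reverse_complement(probe), k)
    every = {"".join(t) for t in product("ACGT", repeat=k)}
    return every - seen


def de_bruijn(alphabet: str, n: int) -> str:
    """Cyclic de Bruijn sequence containing every length-n word once."""
    k = len(alphabet)
    a = [0] * (k * n)
    out = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


K = 5
cyclic = de_bruijn("ACGT", K)
linear = cyclic + cyclic[:K - 1]   # unroll the cycle so no window is lost
step = -(-len(cyclic) // 6)        # ceiling division: six stand-in probes
# Adjacent probes overlap by K - 1 bases so windows spanning a cut survive.
probes = [linear[i:i + step + K - 1] for i in range(0, len(cyclic), step)]

print(len(probes), len(missing_kmers(probes)))
```

Because a double-stranded probe presents both strands, a five-base sequence counts as covered if it occurs on either strand, which is why the check optionally adds the reverse complement of each probe.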
The PR domain of PRDM2 (RIZ) has been shown in GST pulldowns to act as a protein interaction domain, mediating homodimerization [43]. We tested the protein-binding capability of FOG-PR by binding GST-FOG-PR to glutathione agarose beads and treating the beads with a nuclear extract from murine erythroleukemia (MEL) cells. SDS-PAGE analysis did not reveal any bands of significant intensity that were not observed in a GST-only control pulldown (data not shown). Furthermore, yeast two-hybrid screens carried out using FOG-PR as a bait and cDNA libraries from MEL or K262 cells as prey did not yield any high-confidence hits (data not shown).

Implications for PR domain function
Although it is well accepted that the PR family of proteins acts to regulate gene expression, there is not yet a clear consensus on the biochemical mechanisms through which they achieve this outcome. Only a subset of the proteins have been demonstrated to display methyltransferase activity and, given the lack of clearly identified catalytic residues, it is possible that the observed activity arises (at least in some cases) from co-purified proteins. PR domains lacking catalytic activity might still function as interaction modules that recognize methylated histone tails or methylated sequences from other proteins. Such a binding activity could serve to modulate their function by either influencing their targeting to specific genomic loci or by regulating their binding to other protein partners that can be 'tagged' by lysine methylation. It is notable that a catalytically inactive SET domain has also been observed in the Arabidopsis thaliana protein SUVH9 [44]; the function of this domain is also currently unresolved.
DNA-binding activity has been reported for several family members, consistent with the presence of multiple classical zinc finger domains (reviewed in [18]). In contrast, the zinc fingers of FOG1 do not appear to bind DNA (unpublished data). It is, however, known that zinc fingers in FOG1 act as protein recruitment domains, binding to GATA1 [3,4] and to TACC3 [45], and it is possible that a subset of the zinc fingers in PRDM-family proteins (especially zinc fingers that are not part of a tandem repeat) likewise act to recognize protein partners.
Other protein-recruitment motifs exist in PRDM-family proteins, including KRAB and AWS repressor domains. FOG1 also harbors several domains that are associated with recruiting corepressors, including an N-terminal sequence that recruits the Nucleosome Remodeling and Deacetylase (NuRD) complex to chromatin and a C-terminal Binding Protein (CtBP) binding motif [12] (Figure 1). Notably, PRDM2, 3 and 16 also contain CtBP binding motifs.
In summary, our data show that FOG1 contains a PR domain. The presence of this domain, together with other structural and functional similarities, defines FOG1 as a new member of the PR-domain-containing protein family. This family of transcriptional regulators is likely to share a common mechanism of action and a broader elucidation of the biochemical function of the PR domain will illuminate the activity of the whole family.

NMR spectroscopy and structure calculations
Resonance assignments were made using a standard set of triple resonance experiments and NOE data were obtained from 13C-NOESY-HSQC (in >99% 2H2O) and 15N-NOESY-HSQC spectra. All NMR spectra were recorded at 298 K on a Bruker Avance 600 MHz spectrometer, processed using TOPSPIN and analyzed using Sparky (T. D. Goddard and D. G. Kneller, SPARKY 3, University of California, San Francisco). 15N backbone relaxation experiments (T1, T2 and heteronuclear NOE) were performed using standard Bruker pulse programs and were analyzed to extract relaxation rates using Sparky. Backbone φ and ψ dihedral angle restraints were derived from the assigned backbone chemical shifts using TALOS+ [47]. Automated NOE assignment and structure calculations were carried out using CYANA [32] and the lowest energy structures were refined using the RECOORD protocol [33]. The 20 conformers with the lowest energy were used to represent the solution structure of FOG-PR and deposited in the Protein Data Bank (PDB accession number 2mpl). Geometrical properties were assessed using PROCHECK_NMR [48].

Limited proteolysis
Limited proteolysis was carried out by treating 100 μg FOG1(100-254) with 1 μg of chymotrypsin in a 20-μL reaction volume for 4-10 min, separating the reaction mixture by SDS-PAGE and analyzing the major bands by peptide mass fingerprinting on a MALDI mass spectrometer.

Titration with AdoHcy
AdoHcy in a matched buffer was added to 15N-labeled FOG-PR [in 20 mM Na2HPO4/NaH2PO4, pH 7.0 (5% 2H2O) and 2 mM DSS], giving final concentrations of up to 20 mM AdoHcy. 15N-HSQC spectra were recorded as above.
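The titration endpoint is quoted in two ways in this article: as a final AdoHcy concentration (up to 20 mM) and as a molar excess over protein (up to 100 equivalents). The sketch below relates the two and, purely as an illustration, estimates how much protein would be ligand-bound at the endpoint for a given dissociation constant. It assumes that both maxima refer to the same titration point and that binding is a simple 1:1 equilibrium; the protein concentration it derives and the example Kd are not stated in the text, and the function names are hypothetical.

```python
import math


def implied_protein_conc_mm(ligand_mm: float, equivalents: float) -> float:
    """Protein concentration implied by a ligand concentration and molar excess."""
    if equivalents <= 0:
        raise ValueError("equivalents must be positive")
    return ligand_mm / equivalents


def fraction_bound(protein_mm: float, ligand_mm: float, kd_mm: float) -> float:
    """Fraction of protein bound for 1:1 binding (exact quadratic solution).

    All three arguments are total concentrations or Kd in the same units (mM).
    """
    s = protein_mm + ligand_mm + kd_mm
    complex_mm = (s - math.sqrt(s * s - 4.0 * protein_mm * ligand_mm)) / 2.0
    return complex_mm / protein_mm


max_adohcy_mm = 20.0      # final AdoHcy concentration quoted in the text
max_equivalents = 100.0   # molar excess quoted in the text

# Assumption: both maxima describe the same final titration point.
protein_mm = implied_protein_conc_mm(max_adohcy_mm, max_equivalents)
print(f"implied protein concentration: {protein_mm:.2f} mM")

# Hypothetical Kd, chosen only to illustrate the sensitivity of the experiment.
example_kd_mm = 10.0
bound = fraction_bound(protein_mm, max_adohcy_mm, example_kd_mm)
print(f"fraction bound at endpoint for Kd = {example_kd_mm} mM: {bound:.2f}")
```

Under these assumptions, even a very weak interaction with a millimolar Kd would leave a large fraction of the protein bound at the endpoint, which is why an unchanged 15N-HSQC spectrum argues against meaningful AdoHcy binding.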

Gel shift assays
The double-stranded probes were end-labeled with 32P according to standard procedures using polynucleotide kinase and purified on native polyacrylamide gels by standard methods [42]. Gel shift reactions were set up in a total volume of 30 μl, comprising approximately 1 pmol of 32P-labeled probe, ~100 ng of recombinant protein, 10 mM Hepes, pH 7.8, 50 mM KCl, 5 mM MgCl2, 1 mM EDTA and 5% glycerol. After incubation on ice for 10 min, the samples were loaded onto a 6% native polyacrylamide gel made up in 0.5× TBE. The gel was then subjected to electrophoresis at 15 V/cm and 4 °C for 3 h, dried, analysed and quantified when necessary using a PhosphorImager (Molecular Dynamics).

Pulldown assay
Mouse erythroleukemia (MEL) cells were cultured in DMEM medium (+glucose, +glutamine, +pyruvate) supplemented with 5% FBS and 1% penicillin/streptomycin. 10-20 mL seed cultures were maintained in T75 flasks. For 1 L grow-ups, 1 mL of seed culture was added to 250 mL fresh medium in a T175 flask and grown at 37 °C, 5% CO2 for 72 h to a density of ~1 × 10^6 cells/mL (viability >85%). Cells were harvested by centrifugation at 2000 rpm for 5 min to yield 1-1.5 g (wet weight) cells/L culture. Cells were washed twice with PBS, then swollen in hypotonic solution (10 mM HEPES, 1.5 mM MgCl2, 10 mM KCl, pH 7.9) for 20 min, frozen in liquid nitrogen and stored at -80 °C until use. Frozen swollen cells were thawed at 37 °C for 10 min and treated with IGEPAL (0.6% v/v) for 10 min. The mixture was centrifuged for 5 min at 2000 rpm to pellet nuclei and the cytoplasmic supernatant was discarded. The pellet was gently washed once with hypotonic solution (10 mM HEPES, 1.5 mM MgCl2, 10 mM KCl, pH 7.9, 0.6% IGEPAL) and centrifuged again at 2000 rpm. Next, buffer A (50 mM Tris, 150 mM NaCl, 1% Triton X-100, 1 mM DTT, Complete protease inhibitors, pH 7.4) was added to the pellet (3 mL/g cells) and the mixture was sonicated on ice (step-tip, 10 × 1 s bursts with 10 s recovery, three times total) to give a milky white solution. The mixture was centrifuged at 13000 rpm for 10 min at 4 °C. The clear nuclear extract was then used immediately.
For the pulldown, GST-FOG-PR GSH beads (200 μL beads, 0.5 mL beads per liter of E. coli lysate) were incubated with MEL nuclear extract (from 0.5 L MEL cell culture) overnight at 4 °C. The beads were separated from the nuclear extract and washed three times with buffer A. Gel loading dye (LDS, 10 μL) was added to the wet beads and the sample heated for 10 min at 90 °C. The mixture was then analysed by SDS-PAGE.

Protein Data Bank accession codes
1H, 13C and 15N backbone and side chain chemical shift assignments have been deposited in the BioMagResBank with accession number 19988, and the structure coordinates have been deposited in the RCSB Protein Data Bank (accession code 2mpl).