A Regulatory Potential of the Xist Gene Promoter in Vole M. rossiaemeridionalis

X chromosome inactivation takes place in the early development of female mammals and depends on Xist gene expression. The mechanisms regulating Xist expression are still not well understood. In this work, we compared the Xist promoter region of the vole Microtus rossiaemeridionalis with that of other mammalian species. We observed three conserved regions, which were characterized by computational analysis, DNase I in vitro footprinting, and reporter construct assay. Regulatory factors potentially involved in Xist activation and repression in voles were determined. The role of CpG methylation in the regulation of vole Xist expression was established. A CTCF binding site was found in the 5′ flanking region of the Xist promoter on the active X chromosome in both males and females. We suggest that CTCF acts as an insulator which defines an inactive Xist domain on the active X chromosome in voles.


Introduction
Dosage compensation in female mammals is achieved by inactivation of one of the two X chromosomes. X-inactivation occurs in early embryogenesis and comprises several stages such as counting of the X-chromosome number per diploid autosome set, choice of the X-chromosome to be inactivated, initiation of inactivation, spreading of the inactive state, and its maintenance in cell lineage. X-inactivation is controlled by a locus referred to as the X chromosome inactivation center, XIC. It contains several non-coding RNA genes, most importantly Xist and Tsix. Xist is expressed from the inactive X chromosome. Further, Xist RNA spreads along the inactivating X chromosome, leading to its heterochromatinization and gene silencing [1,2,3].
Before the onset of X-inactivation, Xist is expressed at a low level from both X chromosomes. Then the XICs of the two X chromosomes transiently associate and the mutually exclusive choice of the future active and inactive X chromosomes occurs [4,5,6]. As a result, the Xist allele on one X chromosome is upregulated, triggering the X-inactivation process, whereas Xist expression on the other X chromosome is repressed. The mechanisms underlying this regulation are not well understood. In rodents, Xist expression on the active X chromosome is repressed through antisense transcription of Tsix across the Xist promoter [7,8,9]. However, there is no TSIX in the human XIC [10,11,12]. Therefore, regulation of Xist expression seems to be more complicated, and the Xist promoter may possess elements which can both activate and repress its transcription. To date, the Xist promoter region has been studied in only two mammalian species, human and mouse. A number of binding sites for widespread transcription factors (TBP, YY1, SP1, CTCF) and uncharacterized regulatory proteins were found within the Xist minimal promoter [13,14,15,16,17].
Common voles of the genus Microtus are characterized by interspecific differences in X chromosome morphology (size, heterochromatin blocks, and positions of centromeres). Moreover, X-inactivation in interspecific hybrid females is skewed. The M. arvalis X chromosome remains active in 80% of the cells in the M. arvalis × M. rossiaemeridionalis, M. arvalis × M. transcaspicus, and M. arvalis × M. kirgisorum hybrids [18]. The mechanism of skewed X-inactivation is still unclear. However, interspecific differences in DNA sequences that affect the transcription levels of the Xist alleles may be involved [19,20]. These properties make common voles an interesting model for studying the X-inactivation process.
The sequences of XIC elements and their expression patterns were previously analyzed in the four closely related common vole species [21,22]. It has been shown that not all functional elements of the mouse XIC are well conserved even within the single order Rodentia. This suggests taxon-specific regulation of X-inactivation and of the genes involved in this process.
This work focused on studying the Xist promoter region of M. rossiaemeridionalis and searching for common and species-specific regulatory proteins which could influence Xist expression. We identified factors that are potential activators and repressors of Xist expression at different stages of X-inactivation. CpG methylation of the promoter region was shown to play an important role in Xist regulation. In addition, we were not able to detect in the vole Xist minimal promoter the CTCF binding site that is well known in human and mouse. CTCF binding was found in the 5′ flanking region of the Xist promoter on the active X chromosome in both males and females, allowing us to suggest that CTCF is an insulator which defines an inactive Xist domain on the active X chromosome in voles.

Comparative Analysis of Xist 5′ Regulatory Region in Mammals
The nucleotide sequences of the 5′ regulatory regions and parts of the first exon of Xist were compared in different mammalian lineages. A comparison of the Xist 5′ regions between M. rossiaemeridionalis and other mammals revealed homology of the sequences in the −1 kb to +1 bp region. Further upstream, the homology was interrupted by species-specific SINE and LINE mobile elements (Fig. 1). A more detailed analysis of the −1 kb to +1 bp region using software that searches for local conserved DNA sequences (mVista) identified the two most conserved regions, CNS1 [−73/−44 bp] and CNS2 [−540/−498 bp] (conserved non-coding sequence; Fig. 1).
CNS1 was found in the Xist minimal promoter [−101/+1 bp]. Its homology varied from 73% for vole/horse to 97% for cow/dog, with an average level of 84%. Such high homology in a non-coding gene suggests an important role of this region in the regulation of Xist expression. Two short sequences conserved in all the species were also identified in the minimal promoter. One of them (T-T-A-A-A-G/A) is located 25 bp upstream of the transcription start site and is likely to interact with a TBP-like protein [16]. The second sequence (G-C-C-A-T-G/A-T-T-T) spans the Xist transcription start site and seems to bind the initiator protein YY1 [17].
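The two conserved motifs quoted above can be written with the IUPAC code R (A or G) as TTAAAR and GCCATRTTT. The following is a minimal sketch of how such degenerate motifs could be located in a sequence; it is not the analysis pipeline used in this work, and the sequence `toy` is a hypothetical string invented for illustration, not the vole Xist promoter.

```python
import re

# Only the IUPAC symbols needed for the two motifs (R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]"}

def motif_to_regex(motif):
    """Translate an IUPAC motif such as TTAAAR into a regex string."""
    return "".join(IUPAC[base] for base in motif)

def find_motif(sequence, motif):
    """Return 0-based start offsets of all matches, overlapping ones included."""
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

TBP_LIKE = "TTAAAR"     # T-T-A-A-A-G/A, as given in the text
YY1_LIKE = "GCCATRTTT"  # G-C-C-A-T-G/A-T-T-T, as given in the text

# Hypothetical toy sequence, NOT taken from any Xist promoter.
toy = "ccgTTAAAGcacgtgacctgcatgcGCCATATTTctt"
hits = {"TBP": find_motif(toy, TBP_LIKE), "YY1": find_motif(toy, YY1_LIKE)}
print(hits)
```

The lookahead group lets overlapping occurrences be reported, which matters for short degenerate motifs; converting offsets to promoter coordinates would additionally require the position of the transcription start site in the scanned sequence.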
CNS2 was located at positions from −540 to −498 bp relative to the vole Xist transcription start site (Fig. 1, Fig. S1). The homology of this region varied from 74% for human/rabbit to 93% for human/dog, with an average level of 79%. Several other conserved regions were found near CNS2 in almost all the species analyzed, except for cow and rabbit.
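Pairwise homology values such as those quoted for CNS1 and CNS2 are commonly expressed as percent identity over an alignment. The sketch below shows one way to compute it from two aligned rows; the convention of skipping gapped columns is an assumption (the text does not state how gaps were treated), and the two rows are hypothetical.

```python
def percent_identity(row_a, row_b):
    """Percent identical positions over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue  # assumed convention: gapped columns are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical aligned rows, for illustration only.
row_1 = "GCCATGTTT-ACGT"
row_2 = "GCCATATTTTACGA"
identity = percent_identity(row_1, row_2)
print(round(identity, 1))
```

Counting gaps as mismatches instead would lower the value, so the chosen convention should be reported together with the percentages.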
In addition, a multiple alignment of the Xist promoter region of 13 mammals from different taxa revealed a short but well conserved region, CNS3 (Fig. 2A).
Nine protected regions on the coding DNA strand (Fig. 3A, B) and six protected regions on the non-coding strand (Fig. 3C) were found in the first region [−267/+54 bp]. Potential binding sites for the transcription factors YY1, TBP, SP1, AP2, NFY, Oct1, and many others were revealed by the MatInspector and Match™ software (Fig. S2, Table S1). The presence of these potential binding sites in the promoter region agrees with the results of competitive inhibition in EMSA (electrophoretic mobility shift assay) with oligonucleotides containing known binding sites for TBP (Promega), YY1 (SRE-element) [23], Sp1 (Promega), and CBF (CAAT-binding factor) [24] (Fig. S4). The binding sites for TBP and YY1 were also found in the homologous regions of mouse and human [15,17].
We carried out footprinting binding reactions with recombinant SP1. Subsequent treatment with DNase I decreased intensity of the bands and an additional band at the edge of the binding site [...]
DNase I in vitro footprinting with the second region [−551/−372 bp] of the vole Xist promoter revealed nine protected motifs in the coding and five in the non-coding DNA strands (Fig. 3E). These motifs overlapped with potential binding sites for the transcription factors NMP4, RAR_RXR, SATB1, HMGIY, Znf217, ERα, etc. (Fig. S3, Table S1).

Functional Analysis of the Vole Xist 5′ Regulatory Region
To determine the influence of the potential regulatory elements on Xist expression, we made 22 reporter constructs containing a luciferase gene under the control of different fragments of the vole Xist promoter. The fragments overlapped the region [−1453/+67 bp] and differed from each other by 50 bp (Fig. 4). Female vole fibroblasts (Sd10 cell line) were used for transient transfection.
All the reporter constructs could be divided into two classes: those containing the region [−4/+67 bp] (Fig. 4, pCx1-pCx14) and those with a deletion of this region (pCx15-pCx17). A deletion of the region [−4/+67 bp] caused a significant decrease in the activity of the reporter constructs, whereas a deletion of the region [−190/+67 bp] (pCx18, pCx19) completely repressed the activity of the luciferase gene. The constructs pCx1 [+1/+67 bp] and pCx2 [−50/+67 bp] did not have any significant luciferase activity, while the region [−100/−50 bp] in the construct pCx3 [−100/+67 bp] provided the highest activity of the reporter gene. This construct comprised the protected region V, which binds SP1, and the region IV, which bears a potential AP2 binding site (Fig. 3A, D). Thus, one can conclude that the minimal promoter is localized at positions [−100/+67 bp] and that SP1 is one of the Xist transcription activators.
The construct pCx20, comprising the region [−1453/−4 bp] in an inverse orientation, failed to demonstrate any significant luciferase activity.
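The bracketed coordinates of the reporter constructs are read here as positions relative to the transcription start site (+1). As a small worked example, the sketch below converts such coordinates into insert lengths, assuming the usual convention that there is no position 0 between −1 and +1; this convention is an assumption, and only coordinates stated in the text are used.

```python
def fragment_length(start, end):
    """Length in bp of the fragment [start/end], with +1 at the TSS and no position 0."""
    if start == 0 or end == 0 or start > end:
        raise ValueError("invalid promoter coordinates")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # coordinates jump from -1 to +1, so position 0 is not counted
    return length

# Coordinates as stated in the text for selected constructs.
constructs = {
    "pCx1": (1, 67),
    "pCx2": (-50, 67),
    "pCx3": (-100, 67),
    "pCx14": (-1453, 67),
}
lengths = {name: fragment_length(*coords) for name, coords in constructs.items()}
print(lengths)
```

Under this convention the minimal promoter fragment in pCx3 is 167 bp long, and the longest insert (pCx14) is 1520 bp.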
Methylation of CpG dinucleotides in gene promoters is known to be one of the epigenetic mechanisms involved in transcription regulation [25]. To understand the role of CpG methylation in Xist expression, we treated the constructs pCx5 and pCx14 with M.SssI, which methylates cytosines in 5′-CG-3′ sequences. The luciferase activity of the methylated constructs (pCx5-Me and pCx14-Me) was assessed 48 h after their transient transfection into female vole fibroblasts (Sd10). Both constructs failed to display any significant luciferase activity (Fig. 4), suggesting that CpG methylation completely blocks Xist transcription. This may be due to suppressed binding of methyl-sensitive transcription factors to the regulatory elements.
Figure 2 (caption, partial). [...] rossiaemeridionalis (vole). Putative CTCF binding sites are shown with yellow frames. The consensus CTCF binding site is shown [60]. (B) Alignment of the Xist minimal promoter. The human CTCF site is shown with red frames [14]. The vole AP2 binding site is shown with a blue frame. doi:10.1371/journal.pone.0033994.g002

Functional Analysis of the Vole Xist Interspecific −43G/A Substitution in the Minimal Promoter Region
A single-nucleotide substitution of guanine (G) with adenine (A) at position −43 bp was found in the M. arvalis minimal promoter, whereas the other vole species (M. rossiaemeridionalis, M. transcaspicus, and M. kirgisorum) contain G at this position. This substitution leads to the loss of one CpG dinucleotide in the Xist minimal promoter of M. arvalis. Therefore, we assumed that the −43G/A substitution could reduce transcription of the M. arvalis Xist by increasing or decreasing the binding efficiency of a transcription factor, thus providing skewed X-inactivation in M. arvalis × M. rossiaemeridionalis female hybrids. The binding site for the transcription factor CTCF was previously localized in the mouse and human Xist minimal promoter. The substitution of C with A at position −43 bp in this site repressed human XIST expression, while the substitution of C with G caused increased expression and skewed X-inactivation. The same correlation was found in the study of the mouse Xist [14].
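The statement that a G to A substitution removes one CpG dinucleotide can be illustrated with a short sketch that lists CpG positions before and after a single-base change. The sequence below is a hypothetical toy string, not the vole promoter, and the substituted index is chosen only so that the changed G is the second base of a CpG.

```python
def cpg_positions(seq):
    """0-based offsets of the C in every CG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def substitute(seq, index, new_base):
    """Return seq with the base at a 0-based index replaced."""
    return seq[:index] + new_base + seq[index + 1:]

# Hypothetical toy allele carrying G inside a CpG (index 4).
allele_g = "ATCCGTTACGA"
allele_a = substitute(allele_g, 4, "A")

print(cpg_positions(allele_g), cpg_positions(allele_a))
```

Because CpG is the substrate for DNA methylation, losing one dinucleotide also removes one potential methylation site, which is why the substitution was considered a candidate for functional effects.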
To determine the functional significance of the substitution, the constructs pCx14G/A [−1453/+67 bp] and pCx5G/A [−190/+67 bp] bearing G or A at position −43 bp were used. Their activity was analyzed in primary fibroblasts of M. rossiaemeridionalis and M. arvalis males and females as well as in primary female mouse fibroblasts. The data on the M. rossiaemeridionalis female fibroblasts (Sd10) are given in Fig. 5. We observed that the −43G/A substitution had no effect on the activity of the reporter constructs. Analogous results were obtained with the other fibroblast cultures (data not shown).
In the region [−52/−40 bp], computational analysis of the vole Xist promoter revealed a DNA motif with weak similarity to the known CTCF consensus (Fig. 2B). This region was located in CNS1 and corresponded to the protected motif IV (Fig. 3A). Using chromatin immunoprecipitation (ChIP), we have also shown that CTCF interacts in vivo with the Xist promoter region in the vole M. rossiaemeridionalis.
CTCF interaction with Xist on the active or inactive X chromosome was studied by ChIP in two hybrid fibroblast lines, Sa006 and Sad4, obtained by subcloning primary lung fibroblasts of M. arvalis × M. rossiaemeridionalis female hybrids. It has been previously demonstrated that the M. rossiaemeridionalis X chromosome is inactive in the Sa006 cell culture and the M. arvalis X chromosome is inactive in the Sad4 cell culture [26]. Sequencing of PCR products from the CTCF-bound fractions of both Sa006 and Sad4 cell lines detected only the Xist allele corresponding to the active X chromosome (the X chromosome of M. arvalis in the Sa006 line and that of M. rossiaemeridionalis in the Sad4 line) (Fig. 6). Thus, CTCF binds to the promoter of the transcriptionally inactive Xist allele in the vole fibroblasts.
CTCF is well conserved in mammals, with 99% protein homology. We compiled vole CTCF mRNA and protein sequences based on the transcriptome of Microtus ochrogaster (a closely related vole species) published in databases (SRP002127). A comparison of the vole and human CTCF by FASTA has [...]
CTCF binding is known to be almost completely inhibited by CpG methylation [28]. Therefore, we investigated CpG methylation of the Xist minimal promoter region [−100/+5 bp] in the Sa006 and Sad4 cell lines (Fig. 7). The M. rossiaemeridionalis and M. arvalis alleles could be distinguished by the −43G/A substitution. In both cell lines, we observed that the inactive Xist allele (on the active X chromosome) was hypermethylated (90±3% and 87±5% of the CpG dinucleotides analyzed for Sa006 and Sad4, respectively). At the same time, the active Xist allele (on the inactive X chromosome) was hypomethylated (3±3% and 1±1% for Sa006 and Sad4, respectively). Thus, the results of the EMSA and CpG methylation assays imply that CTCF cannot bind directly to the Xist minimal promoter on the active vole X chromosome.
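Allele-specific methylation levels of this kind are typically summarized as a mean percentage with a measure of spread across sequenced clones. The sketch below shows one way such a summary could be computed; the clone data are hypothetical, and treating the reported spread as a sample standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev

def percent_methylated(clone):
    """clone: string over CpG sites, '1' = methylated, '0' = unmethylated."""
    if not clone:
        raise ValueError("clone has no CpG sites")
    return 100.0 * clone.count("1") / len(clone)

def summarize(clones):
    """Mean and sample standard deviation of per-clone methylation (percent)."""
    values = [percent_methylated(c) for c in clones]
    return mean(values), stdev(values)

# Hypothetical clones for one allele, four CpG sites each (not real data).
hypermethylated_allele = ["1111", "1110", "1111", "1101"]
avg, sd = summarize(hypermethylated_allele)
print(f"{avg:.1f} +/- {sd:.1f} %")
```

Clones would first have to be assigned to the M. rossiaemeridionalis or M. arvalis allele, for example by the base at position −43, before being summarized separately.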
The third conserved region (CNS3) [−964/−944 bp] also comprises a potential CTCF binding site (Fig. 2A). When analyzing CTCF ChIP-seq data (http://genome.ucsc.edu) from tissues and cell cultures of different mouse lines, we found CTCF binding to the mouse Xist promoter around [−1026/−1006 bp], corresponding to CNS3 (Fig. S8, Fig. 2A). We carried out ChIP with CTCF antibodies using chromatin isolated from male and female primary embryonic fibroblasts of M. rossiaemeridionalis. Real-time PCR with primers specific for CNS1 and CNS3 has shown that CTCF binds to the CNS3 region of the vole Xist promoter (6-fold enrichment in comparison with a negative control) (Fig. 8).
Taking into account size variability of sonicated chromatin in ChIP reactions [29,30,31], we believe that CTCF binding to CNS3 could be detected in ChIP experiments using the primers for CNS1.
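Fold enrichment in ChIP followed by real-time PCR is often derived from threshold cycle (Ct) values. The sketch below uses the percent-of-input approach and divides the target signal by that of a negative control region; this is one common calculation and is not stated to be the one used here. All Ct values, the 1% input fraction, and the assumption of 100% PCR efficiency are hypothetical.

```python
import math

def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered in the IP, assuming 100% PCR efficiency."""
    # Adjust the input Ct to what it would be for 100% of the chromatin.
    adjusted_input = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input - ct_ip)

def fold_enrichment(target, control, input_fraction=0.01):
    """Ratio of percent input at the target region to that at a control region.

    target and control are (ct_ip, ct_input) tuples.
    """
    return (percent_input(target[0], target[1], input_fraction)
            / percent_input(control[0], control[1], input_fraction))

# Hypothetical Ct values, for illustration only.
cns3 = (26.0, 24.0)       # (Ct of IP, Ct of 1% input) at the target region
negative = (29.0, 24.0)   # same for a negative control region
fold = fold_enrichment(cns3, negative)
print(round(fold, 2))
```

Since sonicated fragments vary in size, amplicons located within a few hundred base pairs of a bound site can also show enrichment, which is the caveat raised above for the CNS1 primers.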

Discussion
X-inactivation occurs in early embryogenesis of female mammals. Before the initiation of X-inactivation, Xist is expressed on both X chromosomes at a low level. During the initiation stage, the Xist allele on the future inactive X chromosome is up-regulated, whereas on the active X chromosome it is repressed. Thus, the Xist promoter has to comprise elements activating Xist transcription and elements responsible for its silencing in each female somatic cell. In this work, we attempted to shed more light on the general principles of Xist expression regulation. We identified the most conserved regions of the Xist promoter between vole and other mammalian species and characterized them by DNase I footprinting, luciferase reporter analysis, EMSA, and methylation assay.
The Xist minimal promoter spans the region [−100/+67 bp] and contains an INR and binding sites for the transcription factors YY1, TBP, SP1, and AP2. In the EMSA experiments, oligonucleotides with the TBP (Promega) and YY1 (SRE-element) [23] binding sites successfully competed with the promoter region [−31/+19 bp] for binding proteins of vole liver nuclear extracts (Fig. S4). The conservation of CNS1 [−73/−44 bp] seems to be due to the presence of the Sp1 binding site, which was found in all the species analyzed (Fig. S1, S2) and studied in the mouse and human Xist promoters [15,17]. This region is likely to be involved in Xist transcription activation on the inactive X chromosome in female somatic cells.
In addition to the SP1 binding site, the protected regions V and 2 contain a potential binding site for the transcription factor BTEB3/KLF13. Unlike SP1, BTEB3/KLF13 is a repressor and can compete with SP1 for the binding site [32]. Therefore, BTEB3/KLF13 may interact with the Xist promoter on the active X chromosome and inhibit its expression.
The protected region IV [−53/−40 bp] overlaps the potential AP2 binding site (Fig. S2). According to EMSA, AP2 can bind to this region in vitro (Fig. S5). Both AP2 and SP1 activate transcription and appear to be strong activators of vole Xist expression. Moreover, the close location of the AP2 and SP1 binding sites suggests their cooperation [33]. The AP2 binding site is species-specific and is revealed by MatInspector only in the vole Xist promoter, but the −43G/A substitution disrupts this site in M. arvalis. This part of the Xist promoter region is well conserved in different mammalian species and contains A at position −43 (Fig. S1, Fig. 2B). One can speculate that binding of the additional AP2 activator to the Xist allele bearing G at position −43 may skew the choice of the inactive X chromosome in female vole hybrids between M. arvalis and other closely related species. However, analysis of reporter constructs containing A or G at position −43 bp in male and female somatic cells has not revealed any significant difference in their luciferase activity. Thus, skewed choice of the inactive X chromosome in vole interspecific female hybrids seems to be a more complicated phenomenon.
A CTCF binding site was experimentally found in the mouse and human region homologous to the Xist promoter region containing the AP2 binding site in M. rossiaemeridionalis. CTCF has been shown to be a transcription activator and to interact with the Xist allele on the inactive X chromosome. Interestingly, a single-nucleotide C/A substitution at position −40 bp in mouse and −43 bp in human led to a dramatic decrease in CTCF binding both in vitro and in vivo. A C/G substitution at the same positions caused an increase in binding and the involvement of additional CTCF zinc fingers. In both mouse and human, the substitutions resulted in skewed choice of the inactive X chromosome, favoring the more active Xist allele [14]. A multiple alignment of the minimal promoter of several mammals has demonstrated that the CTCF binding region revealed by Pugacheva et al. [14] in the human Xist is not well conserved (Fig. S1, Fig. 2B). However, analysis of the Xist minimal promoter by the CTCFBS Prediction Tool in the CTCFBSDB database (a CTCF binding site database, http://insulatordb.uthsc.edu) found these CTCF binding sites in all the species studied except for rat and vole. In addition, we were not able to confirm CTCF binding to the regions V-I [−61/−17 bp] and V-II [−49/−5 bp] by EMSA in vole. The presence of CTCF binding site in the minimal promoter is likely to be a species-specific peculiarity of Xist transcription regulation in rodents.
Using ChIP, this study has shown CTCF binding to the inactive Xist allele in vole female hybrids and males. A multiple alignment of the Xist 5′ region of different species revealed a 25 bp region (CNS3) which is conserved and comprises a potential CTCF binding site. ChIP-seq data published on the UCSC Genome Browser have demonstrated that a strong peak of CTCF binding is observed at positions −1020 bp and −3000 bp relative to the Xist transcription start site in mouse and human, respectively (Fig. S8, S9). This binding corresponds to the CTCF binding site from CNS3 (Fig. 2B). As the minimal promoter of the inactive Xist allele is hypermethylated in vole, its interaction with CTCF must be inhibited [28]. This allows us to suggest that the potential CTCF binding site from CNS3 was analyzed in our ChIP experiments.
The second strong peak of CTCF binding to the mouse and human Xist is localized in the first exon. In mouse ChIP-seq experiments, CTCF binding was observed in all cases for the first exon and in 40% (7 out of 17) of cases for the promoter region (Fig. S8). In human, both CTCF binding peaks were detected in each experiment irrespective of sex, tissue or cell culture (Fig. S9). Two active binding sites flanking Xist promoter and first exon may imply that CTCF is involved in formation of an inactive chromatin domain on the active X chromosome in males and females, leading to a transcription repression of this Xist allele.
According to the reporter construct analysis, the vole region [−150/−100 bp] may contain negative regulatory elements for Xist transcription (Fig. 4). This region corresponds to the protected motifs 4, VII/5, and VIII/6, which potentially interact with NFY, Oct1, and some other homeodomain-containing transcription factors (Fig. S2, Table S1). NFY can repress transcription directly or via chromatin modification in the Xist promoter on the active X chromosome [34]. Among all the homeodomain-containing transcription factors, Msx1 (Msh homeobox 1) is of the greatest interest because it has three potential binding sites in this region. Msx1 can interact with other transcription factors and components of the transcription complex, and acts as a repressor during embryogenesis. However, it is currently unclear which of its sites is functionally important or whether they act jointly.
The protected region IX is localized in the region [−200/−150 bp]. It comprises a potential binding site for the transcription factor TCF11/MAFG (Fig. S2, Table S1). The presence of this region in the reporter construct pCx5 slightly decreased its activity in comparison with pCx4, suggesting that TCF11/MAFG may be a repressor of Xist transcription (Fig. 4).
The cause of the increase in the activity of pCx6, containing the region [−250/+67 bp], is unclear, since this region has not been analyzed by footprinting. However, multiple alignment revealed a highly conserved region which overlaps completely with a potential binding site for Jarid2, a factor that plays a key role in development and embryonic stem cell differentiation (Fig. S1). This factor has been shown to be a component of PRC2 (Polycomb repressive complex 2), which mediates histone H3 Lys-27 trimethylation (H3K27me3) [35,36]. It is well known that X-inactivation occurs in early embryogenesis and that the PRC2 complex is involved in the process [1,2,3]. Moreover, H3K27me3 participates in X-linked gene silencing during X-inactivation and in Xist repression on the active X chromosome. This agrees with the functions of Jarid2 and could explain the conservation of the Jarid2 binding site in the Xist promoter of the species analyzed.
The activity of pCx7 [−300/+67 bp] is significantly decreased (Fig. 4). The data on the mouse Xist regulatory region [15,16] suggest localization of a repressor in the region [−300/−250 bp] of the vole Xist promoter. However, this region is not conserved and has not been analyzed by footprinting, so the potential factor involved is unknown. pCx8 demonstrates almost the same luciferase activity. A moderately conserved 30 bp region was revealed in the region [−350/−300 bp] by multiple alignment of the Xist promoters of 13 mammalian species using Vista (Fig. S1). It contains a potential Oct1 binding site which can recruit additional factors, including repressors [37].
Additional regulatory elements were found upstream of the minimal promoter in the region [−550/−350 bp] containing CNS2. As shown by the reporter construct assay, they can both activate and repress Xist transcription in voles. Fourteen motifs protected from DNase I have been detected in CNS2 and the adjacent sequences (Fig. 3E). All the potential binding sites identified in this region are summarized in Table S1. The most important sites are discussed below.
The increase in pCx9 [−400/+67 bp] luciferase activity agrees with the presence of the protected motifs 4, 5 and I. They correspond to potential binding sites for the transcription factors NMP4, NFAT, and RAR_RXR (Fig. S3). RAR_RXR is a heterodimer of retinoid receptors which can recruit histone acetyltransferases and the protein HMGA1 [38], leading to the formation of an "open" chromatin structure and activation of gene expression. In the presence of retinoic acid, RAR_RXR is also able to recruit transcription factors and RNA polymerase II [39]. Retinoids are regulators responsible for embryonic morphogenesis and differentiation of many tissues during post-embryonic development (epithelial and hematopoietic cells). Thus, retinoids may activate Xist expression on the future inactive X chromosome in early embryogenesis.
The level of pCx10 [−450/+67 bp] activity is comparable to that of pCx9. The region [−450/−400 bp] comprises the protected motifs 3, 4, II, III, and IV. The first three motifs overlap with the potential SRY binding site. SRY may regulate Xist expression only in males. However, the nuclear protein extract from female fibroblasts, where SRY is not expressed, was used in the footprinting reactions. Therefore, some other regulatory factor can bind to the motifs 3, 4, and II in females. The fact that the activity of the reporter constructs containing the 5′ region of mouse Xist did not differ between XX and XY cell lines [16] also supports this conclusion. The protected motif IV partially overlaps with potential binding sites for ERα, NFY, and SATB1 (Fig. S3). SATB1 belongs to the architectural proteins and interacts with S/MARs (Scaffold/Matrix Attachment Regions). It can also recruit co-repressors (HDAC), co-activators (HAT), and components of chromatin remodeling complexes [40,41]. SATB1 binding sites were observed in the Xist 5′ region of all the mammalian species analyzed, although they are located at different positions relative to the Xist transcription start site. SATB1 expression is detected in embryogenesis and in embryonic stem cells. During X-inactivation, SATB1 is involved in the formation of loop domains and compartmentalization of the inactive X chromosome in the nucleus. Xist is located in such a domain [42]. In an adult organism, SATB1 is expressed only in the T- and B-lymphocyte precursors [43]. SATB1 was recently demonstrated to be necessary for Xist RNA-dependent silencing in tumor T cells [44].
Repression of pCx11 luciferase activity is connected with the addition of the region [−500/−450 bp] containing the protected motifs V, VI, and 2. The vole region [−480/−450 bp] is quite conserved in the species analyzed (Fig. S1). The transcription factors Znf217 and RAR_RXR and the receptors ERα, GRE, AR, RORA2, and ESRRB can bind to these motifs (Fig. S3). Znf217 interacts with the co-repressor CtBP (C-terminal Binding Protein) and inhibits expression of target genes [45]. ERα and ERRB are estrogen receptors expressed beginning at the blastocyst stage. Since random X-inactivation takes place approximately at the implantation stage [46], estrogen binding to the receptors may directly or indirectly regulate expression of Xist. In addition, the E3 ubiquitin ligase Rnf12, an activator of Xist expression, is an ERα cofactor and apparently elevates the transcription level of estrogen-sensitive genes [47,48,49]. Therefore, Xist transcription during X-inactivation at the implantation stage may be activated by Rnf12 and ERα.
The region [−550/−500 bp] added in pCx12 comprises CNS2 [−540/−498 bp], and the luciferase activity of pCx12 is 2.5-fold higher than that of pCx11. This appears to be related to the interaction of this region with the transcription factors STAT3 and HMGIY (HMGA1). Potential binding sites for STAT3 and HMGA1 overlap with motifs 1 and VII (Fig. S3). The potential HMGA1 binding site was found in all species analyzed. HMGA1 regulates expression of various genes by modifying chromatin structure and recruiting other transcription factors via protein-protein interactions. HMGA1 is expressed in embryonic fibroblasts and in several cell types (bone marrow cells, macrophages, tumor cells) of adults [50]. In embryogenesis, HMGA1 may interact with the Xist 5′ region on the future inactive X chromosome, leading to an "open" chromatin structure and activation of Xist expression. No potential transcription factor binding sites were observed in the protected motifs VIII and IX. Some other, as yet uncharacterized, regulatory proteins may be discovered in this region.
The constructs pCx13 and pCx14, containing the regions [−843/+67 bp] and [−1453/+67 bp], respectively, demonstrated the same luciferase activity as pCx12 [−550/+67 bp] (Fig. 4), implying that there are no elements influencing transcription of vole Xist upstream of position −550 bp. pCx14 has a lower level of luciferase activity than pCx13 and contains a CTCF binding site. However, the 600 bp difference in size between the constructs does not allow us to draw any conclusions about a repressive role of CTCF based on the reporter construct assay.
Thus, Xist expression in vole is regulated by numerous regulatory elements. The core promoter bears a weak TATA box and an initiator element and is involved in the recruitment, correct orientation, and assembly of the preinitiation complex. These elements are necessary but not sufficient to initiate transcription at a high level. Therefore, additional promoter elements (binding sites for the transcription factors SP1, AP2, and others) may be involved in activation and fine-tuning of Xist transcription. Since the status of the Xist allele and of the whole X chromosome is established at the implantation stage, we paid close attention to potential binding sites for factors acting in early embryonic development. For example, a potential binding site for Jarid2, which recruits PRC2 in embryonic stem cells and participates in gene silencing, was observed in the Xist promoter. The regulatory factors RAR_RXR, AR, GR, RORA, and ERα/Rnf12 can take part in Xist activation during initiation of X-inactivation in undifferentiated cells. The architectural proteins HMGA1 and SATB1 appear to form specific chromatin structures on the future active and inactive X chromosomes, providing a platform for binding of the corresponding transcription factors.
Xist expression can be regulated not only by transcription factors but also by CpG methylation of the promoter region. The function of such CpG methylation is to maintain the active state of one X chromosome by repressing Xist expression. Indeed, the vole Xist minimal promoter is almost completely methylated on the active X chromosome and almost unmethylated on the inactive one (Fig. 7). Moreover, the study of luciferase activity of methylated reporter constructs (pCx5-Me and pCx14-Me) has shown that methylation of the 5′ region effectively represses Xist expression (Fig. 4).
Surprisingly, in the vole Xist minimal promoter we did not find the CTCF binding site described in human and mouse. Instead, CTCF binding was revealed in the 5′ flanking region of the Xist promoter on the active X chromosome in both males and females. This site is conserved in different mammalian species. We believe that CTCF is an insulator which defines an inactive Xist domain on the active X chromosome in voles.

Ethics statement
The study was carried out according to "The Guidelines for Manipulations with Experimental Animals". The study was approved by the Ethical Committee of the Institute of Cytology and Genetics, Novosibirsk, permit number: (order of the Presidium of the Russian Academy of Sciences of April 02, 1980 no. 12000-496).

Cell Cultures
Primary lung and embryonic fibroblasts of M. rossiaemeridionalis male (MsEf3♂) and female (Sd10♀), M. arvalis male (Maf♂) and female (Maf2♀), and primary female mouse fibroblasts (EFM2♀) were derived as previously described [55,56]. Lines Sa006 and Sad4 were obtained by subcloning the primary lung fibroblast cultures of hybrid females M. rossiaemeridionalis ♀ × M. arvalis ♂. In all Sa006 cells, the M. rossiaemeridionalis X chromosome is inactive, while that of M. arvalis is inactive in the Sad4 line [26].

Preparation of Nuclear Extracts and DNase I in vitro Footprinting
Nuclear extracts from liver cells and fibroblasts (line sd10) of M. rossiaemeridionalis were isolated as described [57,58] and using a CelLytic™ NuCLEAR™ Extraction kit (Sigma). The nuclear extracts were stored at −70°C.
Two DNA fragments corresponding to the regions [−267/+54 bp] (Pmin) and [−551/−372 bp] were used as probes. These probes were amplified by PCR with the primer pairs Msx7 5′-tatgtggcctttcctataagc-3′/Supr7 5′-tagatggacagagaccacagagg-3′ and ERR 5′-atagcccctcgtcttggtgac-3′/ERF 5′-cgctgagctgtttctctaccg-3′, respectively. Before PCR, one primer was labeled at the 5′ end with [γ-32P]ATP and T4 polynucleotide kinase (Promega). The binding reactions were carried out according to the protocol recommended for the Core Footprinting System (Promega). Reaction mixtures were supplemented with 5-50 μg of the nuclear protein extract or 2 μg of the recombinant transcription factor SP1. Nonspecific protein binding was inhibited by adding 0.1-1.0 μg of poly(dI-dC)·poly(dI-dC) (Sigma). To determine the nucleotide sequences of the bound regions, Maxam-Gilbert sequencing reactions were performed concurrently with the footprinting. The samples were analyzed in a 4-6% denaturing polyacrylamide gel.

Electrophoretic Mobility Shift Assay
Double-stranded oligonucleotides (probes F-II, V-I, V-II, AP2K, AP2-43G) and a PCR product (probe Pmin) were radioactively labeled and used in EMSA. The double-stranded oligonucleotides were obtained by annealing complementary single-stranded oligonucleotides and were labeled with [γ-32P]ATP and T4 polynucleotide kinase (Promega). The single-stranded oligonucleotides are given in Table S2.
To analyze CTCF binding, 3 μl of 5× buffer (125 mM HEPES pH 7.9, 250 mM KCl, 31.25 mM MgCl2, 5% Nonidet P-40, and 25% glycerol), 2 pmol of DNA probe, and 300 ng of recombinant CTCF protein (Novus Biologicals) were mixed and incubated for 30 min at room temperature. To study AP2 binding, 1 μl of buffer (20% glycerol, 5 mM MgCl2, 2.5 mM EDTA, 2.5 mM DTT, 250 mM NaCl, and 50 mM Tris-HCl pH 7.5), 2 pmol of DNA probe, and 1 μl of AP2 extract (Core Footprinting System kit, Promega) were mixed and incubated for 10 min at room temperature. After the incubation, the samples were analyzed in a 6% polyacrylamide gel (AA/bisAA, 29:1) in 1× TBE buffer. The electrophoresis was carried out at a voltage of 10 V/cm. The dried gel was exposed to X-ray film for 12-24 h.

Real-time PCR
Quantitation of DNA in the ChIP bound fraction was performed by real-time PCR using the Bio-Rad iQ5 real-time PCR detection system. PCR products were detected with SYBR Green. Real-time PCR was carried out in duplicate on control, ChIP, and input DNA samples with the following thermal cycling parameters: 95°C for 5 min and 40 cycles of 95°C for 10 s, 58°C for 15 s, and 72°C for 25 s. Data were collected at 78°C for the primer pairs 5′-tgacaagacatgggtttcttgaggcg-3′ and 5′-tgaacatcgcagtggttcacataggg-3′ (CNS3, the region around −940 bp) and 5′-gagctgagatgggacaattctc-3′ and 5′-tacttctggggaaagatctgga-3′ (the fourth Xist exon, negative control), and at 84°C for the primer pair 5′-tatgtgcctttcctataagc-3′ and 5′-tcccagagaccccgatagatg-3′ (Xist minimal promoter). Data were analyzed by the ΔCT method as described in the Imprint chromatin immunoprecipitation kit protocol (Sigma; Cat. no. CHP1) and presented as percentage of input chromatin. We used the following equation: $\%\mathrm{Input} = 2^{-(C_T(\mathrm{IP}) - C_T(\mathrm{Input}) + \log_2(\text{input dilution factor}))}$, where the input dilution factor is 20 (5% of total chromatin was used as input).
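The percent-of-input calculation can be illustrated with a minimal Python sketch. The function name `percent_input` and the example CT values are hypothetical and not taken from the study. Multiplying the fraction by 100 to express it as a percentage is an assumption, since the equation as written yields a fraction of input.

```python
import math


def percent_input(ct_ip: float, ct_input: float, dilution_factor: float = 20.0) -> float:
    """Return the ChIP signal as a percentage of input chromatin.

    Implements 2^-(CT(IP) - CT(Input) + log2(dilution factor)).
    The factor of 100 converts the fraction to a percentage (assumption).
    """
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    delta_ct = ct_ip - ct_input + math.log2(dilution_factor)
    return 100.0 * 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical CT values, for illustration only.
    print(round(percent_input(ct_ip=27.0, ct_input=25.0), 3))
```

With a dilution factor of 20, an IP sample that reaches threshold at the same cycle as the 5% input corresponds to 5% of input, and each additional cycle halves that value.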

Transient Transfection and Measurement of Luciferase Activity
Fragments of the vole Xist promoter were amplified by PCR and cloned into the vector pGL4.10[luc2] (Promega). The nucleotide sequence of the insert was verified by sequencing. The cells were transfected using Lipofectamine™ 2000 (Invitrogen). The fibroblast cell line of female M. rossiaemeridionalis was concurrently transfected with the reporter and control (pGL4.74[hRluc/TK], Promega) plasmids at a ratio of 10:1. Luciferase activity was determined 48 h after transfection with a Dual-Luciferase® Reporter Assay System (Promega) according to the manufacturer's protocol. The luciferase activity was recorded in a Wallac 1420 multilabel counter using the following parameters: measurement time, 1 s; delay duration, 0.1 s. The ratio of Firefly to Renilla luciferase activities was taken as the activity of the reporter construct. The activity of each reporter construct was measured in three independent experiments. The significance of differences in the reporter gene expression was estimated using Fisher's test.
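The normalization step (Firefly signal divided by Renilla signal, summarized over three independent experiments) can be sketched as follows. This is only an illustration: the function names and the luminescence readings are hypothetical, and the statistical comparison between constructs is not reproduced here.

```python
from statistics import mean, stdev


def relative_activity(firefly: float, renilla: float) -> float:
    """Firefly/Renilla ratio, taken as the activity of a reporter construct."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def summarize(measurements):
    """Return (mean, sample standard deviation) of the ratios.

    `measurements` is a list of (firefly, renilla) pairs,
    one pair per independent experiment.
    """
    ratios = [relative_activity(f, r) for f, r in measurements]
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Hypothetical luminescence counts from three independent experiments.
    readings = [(1000.0, 100.0), (1200.0, 100.0), (800.0, 100.0)]
    avg, sd = summarize(readings)
    print(f"activity = {avg:.2f} +/- {sd:.2f}")
```

Dividing by the co-transfected Renilla control corrects each measurement for differences in transfection efficiency and cell number before constructs are compared.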

Methylation of Cytosine Residues in Plasmid DNA
The cytosine residues in the reporter constructs were methylated with M.SssI CpG methyltransferase (NEB). The components of the reaction mixture (12 μl of S-adenosylmethionine (0.16 mM, NEB), 12 μl of 10× NEB buffer 2, 24 U of M.SssI methyltransferase (NEB), and 6 μg of plasmid DNA) were mixed, and the volume was adjusted to 120 μl with bidistilled water. The mixture was incubated for 1 h at 37°C. The reaction was stopped by heating for 20 min at 65°C. DNA was extracted with phenol-chloroform and precipitated with ethanol.
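As a quick check of the reaction setup, the working concentrations implied by these amounts can be computed with a small Python sketch. It assumes that volumes are in microliters and the DNA mass is in micrograms; the function name is hypothetical. S-adenosylmethionine is left out because the text does not make clear whether 0.16 mM refers to the stock or to the final concentration.

```python
def methylation_mix_summary(total_ul: float = 120.0,
                            buffer_ul: float = 12.0,
                            buffer_stock_x: float = 10.0,
                            dna_ug: float = 6.0,
                            enzyme_units: float = 24.0) -> dict:
    """Derive working concentrations for the M.SssI reaction described above.

    All defaults are the amounts stated in the text; units are assumed
    to be microliters and micrograms.
    """
    return {
        # Final buffer strength after dilution into the total volume.
        "buffer_x": buffer_stock_x * buffer_ul / total_ul,
        # Plasmid DNA concentration in ng per microliter.
        "dna_ng_per_ul": dna_ug * 1000.0 / total_ul,
        # Enzyme-to-DNA ratio in units per microgram.
        "units_per_ug_dna": enzyme_units / dna_ug,
    }


if __name__ == "__main__":
    for name, value in methylation_mix_summary().items():
        print(f"{name}: {value:g}")
```

Under these assumptions the buffer is at 1× strength, the plasmid is at 50 ng/μl, and the enzyme is present at 4 U per μg of DNA.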

Bisulfite Sequencing
Genomic DNA (2 μg) was treated with bisulfite according to the manufacturer's protocol (EpiTect Bisulfite kit, Qiagen) and eluted in 20 μl of elution buffer. The PCR product was amplified by nested PCR using the bisulfite-converted DNA as a template. The primers specific for the modified DNA (forward primer 5′-tttgttatgagttttggtataatta-3′ and reverse primer 5′-aaaaactaaaaatattcccaaaaac-3′) were used in the first PCR round. The following PCR conditions were used: 95°C for 5 min; 30 cycles of 95°C for 15 s, 54°C for 15 s, and 72°C for 20 s; and final extension at 72°C for 5 min. In the second round, the PCR product from the first round was used as a template, and the primers delimited a shorter fragment located inside the first-round product (forward primer 5′-tttaattattttttttagaaaatagtttgt-3′ and reverse primer 5′-aaaaaccacaaaaaaaactcacatc-3′). The reaction was carried out under the same conditions. The bisulfite PCR products were gel purified and cloned into pGEM-T Easy (Promega), and independent clones were sequenced. The sequencing results were visualized using MethTools [59].

Figure S2 Computational analysis of the promoter region of the M. rossiaemeridionalis Xist gene. Consensuses of several identified potential transcription factor binding sites are shown above and below the nucleotide sequence. The sequence corresponding to the first conserved region, CNS1, is framed with a rectangle. Footprints are shown with red lines and numerals. Roman numerals denote the protected DNA motifs identified in the (+)-strand and Arabic numerals, in the (−)-strand; the arrow shows the transcription start site. The nucleotides conserved for vole, human, cow, dog, horse, and rabbit are shown in yellow. (TIF)

Figure S3 Computational analysis of the second conserved region (CNS2) and the adjacent sequences in the 5′ region of the M. rossiaemeridionalis Xist gene. Consensuses of several identified potential transcription factor binding sites are shown above and below the nucleotide sequence. The sequence corresponding to CNS2 is framed with a rectangle. Footprints are shown with red lines and numerals. Roman numerals denote the protected DNA motifs identified in the (+)-strand and Arabic numerals, in the (−)-strand. In CNS2, the nucleotides conserved for vole, human, cow, dog, horse, and rabbit are shown in yellow. (TIF)

Figure S4