Escape from X Inactivation Varies in Mouse Tissues

X chromosome inactivation (XCI) silences most genes on one X chromosome in female mammals, but some genes escape XCI. To identify escape genes in vivo and to explore molecular mechanisms that regulate this process we analyzed the allele-specific expression and chromatin structure of X-linked genes in mouse tissues and cells with skewed XCI and distinguishable alleles based on single nucleotide polymorphisms. Using a binomial model to assess allelic expression, we demonstrate a continuum between complete silencing and expression from the inactive X (Xi). The validity of the RNA-seq approach was verified using RT-PCR with species-specific primers or Sanger sequencing. Both common escape genes and genes with significant differences in XCI status between tissues were identified. Such genes may be candidates for tissue-specific sex differences. Overall, few genes (3–7%) escape XCI in any of the mouse tissues examined, suggesting stringent silencing and escape controls. In contrast, an in vitro system represented by the embryonic-kidney-derived Patski cell line showed a higher density of escape genes (21%), representing both kidney-specific escape genes and cell-line specific escape genes. Allele-specific RNA polymerase II occupancy and DNase I hypersensitivity at the promoter of genes on the Xi correlated well with levels of escape, consistent with an open chromatin structure at escape genes. Allele-specific CTCF binding on the Xi clustered at escape genes and was denser in brain compared to the Patski cell line, possibly contributing to a more compartmentalized structure of the Xi and fewer escape genes in brain compared to the cell line where larger domains of escape were observed.


Introduction
Dosage compensation in mammals is achieved by upregulation of the X chromosome in both sexes and random inactivation of one of the two X chromosomes in females [1]. Initially, the future inactive X (Xi) is coated by the long non-coding RNA Xist (X inactive specific transcript), a process essential for the onset of silencing [2]. Inactive chromatin marks such as tri-methylation of lysine 27 on histone H3 (H3K27me3) are put in place along with DNA methylation at CpG islands, macroH2A modification, and late replication, which represent late and possibly secondary events that lock in silencing of most genes in somatic cells. Despite efficient silencing, some genes escape X chromosome inactivation (XCI) and thus remain bi-allelically expressed in females [3,4]. Surveys in cultured human/mouse hybrid cells and in cell lines from individuals with skewed XCI have shown that about 8-15% of human genes consistently escape XCI, 10-13% display variable levels of escape, and 10-20% vary between cell lines and individuals [5-7]. Escape from XCI results in significant sexual dimorphisms in levels of gene expression, and bi-allelic expression of at least some escape genes is important for a normal phenotype in human females. Indeed, the presence of a single X chromosome (45,X) results in Turner syndrome, characterized by poor viability in utero, infertility, short stature, and an array of other abnormalities [8,9].
XCI is usually random in somatic cells, thus allelic characteristics of X-linked genes can only be studied in cell populations with skewed XCI obtained by cell-cloning or flow-sorting, or by cell selection based on a specific mutation. To derive a cell line (Patski) with completely skewed XCI we previously used an Hprt mutation to select cells that contain an active X (Xa) from Mus spretus (abbreviated spretus) and an Xi from a Mus musculus laboratory strain (C57BL/6J, abbreviated BL6) [23]. This in vitro system allowed us to determine allele-specific gene expression based on frequent SNPs (single nucleotide polymorphisms) between the mouse species (1 per 50-100 bp), and to identify genes that escape XCI [18]. Mouse trophoblastic cells in which the paternal X is always inactivated offer an alternative way to identify genes that escape imprinted XCI in vitro [14]. However, Patski cells and trophoblastic cells may not represent the in vivo situation and do not address potential differences between tissues. A recent study examined mid-gestation placenta to address escape from imprinted XCI in vivo, but this system only addresses imprinted XCI in an extra-embryonic tissue, the mechanism of which differs from random XCI in the embryo proper [24]. Another recent study examined escape from XCI in mouse brain based on flow sorting cell populations with skewed XCI to >98% purity [25].
To determine the XCI status of genes in multiple tissues in vivo we developed a mouse model in which F1 animals have completely skewed XCI of the spretus X due to an Xist mutation on the BL6 X [26]. In the present study this model was exploited to compare the XCI status of genes between mouse tissues. We developed a new binomial model to estimate the probability of bi-allelic expression based on RNA-seq and SNPs, resulting in the identification of common as well as tissue-specific escape genes, which were verified by RT-PCR. Allele-specific profiles for two features of active genes, RNA polymerase II occupancy and DNase I hypersensitivity, demonstrate active chromatin signatures at escape genes. In addition, allele-specific profiles of CTCF occupancy were obtained to examine its distribution relative to escape genes.

Pipeline to determine allele-specific gene expression
To assess allele-specific expression of X-linked genes in vivo, we mated BL6 females heterozygous for a deletion of the proximal A-repeat of Xist (Xist Δ/+) [27] to spretus males. In the resulting F1 Xist Δ/+ (hereafter called F1) female progeny the maternal BL6 X chromosome fails to inactivate, leading to completely skewed XCI in all tissues [26]. This was further verified based on allelic expression of a gene known to be subject to XCI (Ubqln2), as determined by Sanger sequencing of RT-PCR products (S1A Fig). Whole brains and spleens from two adult F1 females (biological replicates) and both ovaries pooled from one of these females were used for RNA-seq, followed by gene expression analyses using a new pipeline (see below) to better capture allele-specific reads based on high quality SNPs. For comparison and validation we also re-analyzed two independent RNA-seq datasets generated for the Patski cell line in which the XCI pattern is reversed, i.e., the BL6 X is inactive and the spretus X active [18].
To identify reads that map to each parental genome in F1 mice, a "pseudo-spretus" genome was assembled by substituting known SNPs between BL6 and spretus into the BL6 mm9 reference genome [26,28]. SNPs were obtained from the Sanger Institute (SNP database Nov/2011 version) and from in-house analysis [18]. RNA-seq reads were aligned separately to the BL6 and to the pseudo-spretus genomes (see details in Materials and Methods). We segregated all high-quality uniquely mapped reads (MAPQ ≥30) into three categories: (1) BL6-SNP reads containing only BL6-specific SNP(s); (2) spretus-SNP reads containing only spretus-specific SNP(s); (3) reads that do not contain valid SNPs. We refer to both BL6-SNP reads and spretus-SNP reads as "allele-specific reads". Examination of the distribution of allele-specific reads on genes known to be subject to XCI, for example Ubqln2, confirmed the absence of spretus reads due to XCI skewing in female F1 tissues (S1B Fig).
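The three read categories can be illustrated with a minimal sketch. It assumes each aligned read has already been reduced to its mapping quality and a list of allele calls at the SNP positions it overlaps; the function name `classify_read`, the `"BL6"`/`"spretus"` labels, and the `"ambiguous"` category for reads carrying SNPs from both strains are illustrative assumptions, not part of the published pipeline.

```python
from typing import List, Optional

MIN_MAPQ = 30  # quality threshold stated in the text (MAPQ >= 30)


def classify_read(mapq: int, snp_calls: List[str]) -> Optional[str]:
    """Assign a uniquely mapped read to one of the SNP-based categories.

    snp_calls holds one label per informative SNP covered by the read,
    either "BL6" or "spretus" (hypothetical encoding).
    Returns None when the read fails the mapping-quality filter.
    """
    if mapq < MIN_MAPQ:
        return None
    alleles = set(snp_calls)
    if not alleles:
        return "no-SNP"            # category 3: no valid SNP in the read
    if alleles == {"BL6"}:
        return "BL6-SNP"           # category 1: only BL6-specific SNP(s)
    if alleles == {"spretus"}:
        return "spretus-SNP"       # category 2: only spretus-specific SNP(s)
    return "ambiguous"             # assumption: mixed reads are set aside
```

Only reads returned as `"BL6-SNP"` or `"spretus-SNP"` would count as allele-specific reads in the sense used above.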
We calculated diploid gene expression based on all mapped reads using cufflinks/v2.0.2 [29] (http://cufflinks.cbcb.umd.edu/) to determine RPKM (reads per kb of exon length per million mapped reads). Next, we defined SNP-based haploid gene expression from alleles on the Xi or the Xa (Xi-SRPM or Xa-SRPM) to be allele-specific SNP-containing exonic reads per 10 million uniquely mapped reads.
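As a worked illustration of the SRPM definition, the following sketch scales an allele-specific exonic SNP-read count to 10 million uniquely mapped reads. The function and argument names are hypothetical; only the scaling factor comes from the definition above.

```python
def srpm(allele_snp_exonic_reads: int, uniquely_mapped_reads: int) -> float:
    """Allele-specific SNP-containing exonic reads per 10 million
    uniquely mapped reads (Xi-SRPM or Xa-SRPM, depending on the allele)."""
    if uniquely_mapped_reads <= 0:
        raise ValueError("uniquely_mapped_reads must be positive")
    return allele_snp_exonic_reads * 1e7 / uniquely_mapped_reads
```

For instance, 50 Xi-SNP exonic reads in a library of 25 million uniquely mapped reads would correspond to an Xi-SRPM of 20.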

Modeling the XCI status of genes
Here, we propose and validate a new binomial model to identify escape genes and estimate the statistical confidence of escape probability.
For each gene $i$ on chromosome X, let the number of allele-specific RNA-seq reads mapped to the inactive and active chromosomes be $n_{i0}$ and $n_{i1}$, respectively, and let $n_i = n_{i0} + n_{i1}$. We model $n_{i0}$ by a binomial distribution,

$$n_{i0} \sim \mathrm{Binomial}(n_i, p_i),$$

where $p_i$ indicates the expected proportion of reads from the Xi. The estimate of the binomial proportion is

$$\hat{p}_i = \frac{n_{i0}}{n_i}.$$

Let $z_{\alpha/2}$ be the $100(1-\alpha/2)$th percentile of $N(0,1)$. The confidence interval of each $p_i$ is

$$\hat{p}_i \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i)}{n_i}}.$$

To incorporate the mapping biases toward the BL6 genome over the pseudo-spretus genome into the above model, we define the mapping bias ratio $r_m$ for each RNA-seq experiment to be

$$r_m = \frac{N_{A0}}{N_{A1}},$$

where $N_{A0}$ and $N_{A1}$ are the number of allele-specific autosomal reads in the "inactive X containing" genome and the "active X containing" genome, respectively. Considering the mapping biases, the corrected estimate of $p_i$ is

$$\hat{p}_i = \frac{n_{i0}}{n_{i0} + r_m n_{i1}}.$$

The upper and lower confidence limits are corrected accordingly.
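The model can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the published implementation: the interval is taken to be the standard normal-approximation (Wald) interval, the mapping bias ratio is taken as N_A0 / N_A1, the corrected proportion is taken as n_i0 / (n_i0 + r_m * n_i1), and the corrected limits are assumed to be the same interval centered on the corrected estimate. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def mapping_bias_ratio(n_a0: int, n_a1: int) -> float:
    """Assumed form of r_m: autosomal allele-specific reads assigned to the
    Xi-containing genome divided by those assigned to the Xa-containing one."""
    return n_a0 / n_a1


def xi_proportion_ci(n_i0: int, n_i1: int, r_m: float = 1.0,
                     alpha: float = 0.01) -> Tuple[float, float, float]:
    """Return (estimate, lower limit, upper limit) for the Xi read proportion.

    n_i0, n_i1: allele-specific reads from the Xi and Xa for one gene.
    r_m = 1.0 means no mapping-bias correction.
    """
    n_i = n_i0 + n_i1
    if n_i == 0:
        raise ValueError("gene has no allele-specific reads")
    p_hat = n_i0 / (n_i0 + r_m * n_i1)          # bias-corrected proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)     # 100(1 - alpha/2)th percentile
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_i)
    return p_hat, p_hat - half_width, p_hat + half_width
```

With `alpha=0.01` the returned lower limit corresponds to the 99% lower confidence limit used when calling escape genes. The normal approximation is known to behave poorly for very small read counts, which is one reason a minimum Xi read threshold is useful alongside it.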
For each RNA-seq experiment, we called a gene "escape" if (1) the 99% lower confidence limit (α = 0.01) of the escape probability was greater than zero, indicating significant contribution from the Xi, (2) the diploid gene expression measured by RPKM was ≥1, indicating that the gene was expressed, and (3) the Xi-SRPM was ≥2, representing sufficient reads from the Xi. Note that we only considered exonic reads, except for the lncRNA Firre (see below). Biological replicates of RNA-seq experiments were analyzed separately. Examples of mRNA SNP-read coverage on the Xa and Xi visualized in the UCSC browser are shown for a number of genes we determined to either escape or be subject to XCI (Figs. 1-3, S1, S2 and S4). There was good concordance between reads at different SNPs within a given gene. Levels of escape from XCI determined by RNA-seq analysis correlated with measurements done by RT-PCR using species-specific primers, although the percent of expression from the Xi measured by RT-PCR was usually higher than that determined by RNA-seq. For example, Cfp and Plp1 had 5% and 0.3% SNP reads on the Xi by RNA-seq and 15% and 1.5% Xi (versus Xa) expression measured by RT-PCR with species-specific primers, respectively (Figs. 1C and 1D, S1C and S1D).
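The three calling criteria combine into a simple predicate. The sketch below uses the thresholds given in the text (lower 99% confidence limit above zero, RPKM of at least 1, Xi-SRPM of at least 2); the function name and argument names are hypothetical.

```python
def is_escape(lower_conf_limit: float, rpkm: float, xi_srpm: float) -> bool:
    """Apply the three escape criteria for one gene in one RNA-seq experiment."""
    significant_xi_signal = lower_conf_limit > 0   # criterion 1
    expressed = rpkm >= 1                          # criterion 2
    enough_xi_reads = xi_srpm >= 2                 # criterion 3
    return significant_xi_signal and expressed and enough_xi_reads
```

Because biological replicates are analyzed separately, such a predicate would be evaluated once per gene per replicate, and the per-replicate calls compared afterwards.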

Classification of mouse escape genes
We determined that 3-5% of X-linked Refseq genes satisfy our criteria (see above) as escape genes in both biological replicates of F1 brain and spleen, respectively (S1-S3 Table). In addition, 3% of X-linked genes variably escaped XCI, i.e., escaped in only one biological replicate. In F1 ovary 7% of X-linked genes escaped XCI (S1 and S4 Table), a higher proportion that possibly reflects the analysis of a single replicate for this tissue (representing two pooled ovaries). Escape genes were distributed all along the mouse X chromosome and rarely clustered, and distance from the XIC (X inactivation center) did not appear to influence levels of expression from the Xi. We classified escape genes into two groups based on their XCI status, excluding genes with a different status in biological replicates: group 1 includes 12 genes that escape XCI in at least two of the three tissues analyzed, and group 2 includes 26 genes that escape XCI in a tissue-specific manner (Table 1). When considering potential sex bias in gene expression we found that, based on published data [30], a majority (27 of 38, or 71%) of group 1 and 2 escape genes showed a female bias and thus may play roles in sex differences (S5 Table).
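The grouping rule can be expressed compactly. The sketch below assumes the input already excludes genes whose status differs between biological replicates, and it uses placeholder gene names rather than genes from Table 1; `classify_escape_genes` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def classify_escape_genes(
    escape_by_tissue: Dict[str, Iterable[str]],
) -> Tuple[Set[str], Dict[str, str]]:
    """Split escape genes into group 1 (escape in at least two tissues)
    and group 2 (escape in exactly one tissue, mapped to that tissue)."""
    per_tissue = {t: set(genes) for t, genes in escape_by_tissue.items()}
    counts = Counter(g for genes in per_tissue.values() for g in genes)
    group1 = {g for g, c in counts.items() if c >= 2}
    group2 = {
        g: tissue
        for tissue, genes in per_tissue.items()
        for g in genes
        if counts[g] == 1
    }
    return group1, group2


# Placeholder example (gene names are not from the study)
calls = {
    "brain": ["GeneA", "GeneB"],
    "spleen": ["GeneA", "GeneC"],
    "ovary": ["GeneA", "GeneB", "GeneD"],
}
common, tissue_specific = classify_escape_genes(calls)
```

In this toy input `GeneA` and `GeneB` fall into group 1, while `GeneC` and `GeneD` are assigned to spleen and ovary, respectively.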
Of the 12 common escape genes classified as belonging to group 1, only the lncRNA 5530601H04Rik was not included in our original survey in Patski cells [18], although our current re-analysis now includes it (see below). Mid1, a gene that straddles the boundary of the pseudoautosomal region (PAR) in M. musculus and escapes XCI in Patski cells [18], was not classified as an escape gene in our current analysis of mouse tissues with a spretus Xi, indicating that Mid1 is subject to XCI in M. spretus where it is located outside the PAR [31]. This was confirmed in F1 brain by Sanger sequencing of RT-PCR products (S1E Fig). Within group 2 a total of 6, 5, and 15 genes escaped XCI selectively in F1 brain, spleen, and ovary, respectively. The findings of escape in a single tissue often reflected the unique expression pattern of these genes. Functional analysis showed that group 1 common escape genes often have functions relevant to many tissues, whereas group 2 genes have tissue-specific functions (S5 Table). In fact, all 6 genes that escape XCI in brain have brain-related functions; for example, Gpm6b and Plp1 play a role in myelination [32]. RT-PCR validation using species-specific primers confirmed Plp1 expression from the Xi in brain (S1C and S1D Fig). In addition, Gdi1 and Syp have been implicated in X-linked intellectual disability in humans [33,34], and Gprasp1 mutations are associated with striatum-dependent behavior inhibition [35]. In spleen a total of 17 escape genes were identified, including 5 spleen-specific escape genes, three of which (Cfp, Vsig4, and Bgn) are implicated in immune functions [36-39] (Tables 1, S3 and S5). Cfp escaped from XCI in spleen but not in brain or ovary, as validated by RT-PCR using species-specific primers (Table 1 and Fig. 1B-1D). Vsig4 expression from the Xi in spleen was verified by Sanger sequencing of RT-PCR products, but in liver, where Vsig4 is also highly expressed, this gene was subject to XCI, suggesting that escape in spleen was independent of tissue-specific expression (S1F and S1G Fig). In ovary a total of 33 escape genes were identified, including 15 ovary-specific escape genes (Tables 1 and S4). Note that six additional genes were not included in Table 1 since they escaped in only one biological replicate of ovary (Idh3g, Mmgt1, Usp9x, Uba1, Huwe1, 1810030O07Rik). Among the ovary-specific escape genes, AU022751 and Bmp15 had nearly equal expression from the Xa and Xi, as confirmed by Sanger sequencing of RT-PCR products (Table 1, S1H and S1I Fig). Since both AU022751 and Bmp15 are mostly expressed in oocytes where the Xi is reactivated [40], the finding of bi-allelic expression in ovary was not surprising. X reactivation could account for the larger number of escape genes in this tissue, even for genes that are not exclusively expressed in oocytes (S4 Table).
Using our new pipeline to re-analyze two independent RNA-seq datasets for Patski cells, one previously generated in our lab [18] and the other generated on an AB SOLiD platform deposited for ENCODE [41], we identified 66 escape genes (S1 and S6 Table). Fifty-two of these genes did not escape in any of the three F1 tissues (S2-S4 and S6 Table). For example, Rlim, which escaped XCI in Patski cells, was subject to XCI in brain (Fig. 2A). Next we examined gene expression in F1 kidney since Patski cells were derived from 18.5 dpc embryonic kidney [23]. We found that two genes that escaped XCI in Patski cells, Shroom4 and Car5b (as seen here and in [18]), failed to express from the Xi in kidney, indicating that these genes escaped XCI only in the cell line (Figs. 2B, 2C and S4). In contrast, RT-PCR analyses using species-specific primers for Hdac6 showed that this gene escaped XCI both in Patski cells and kidney, suggesting that this gene is a tissue-specific escape gene since it does not escape XCI in other tissues (Fig. 2D, 2E and Table 1). The lncRNA 5530601H04Rik was confirmed to escape XCI in both Patski cells and kidney using Sanger sequencing of RT-PCR products, indicating that this is a common escape gene in all tissues examined (Fig. 2F and Table 1).
The larger number of genes classified as "escape" in the new analysis of Patski cells compared to our previously published data [18] was mainly attributed to our new binomial model and not to differences in SNP number (see details in Methods). While our previous study used a ratio of Xi/Xa SNP reads greater than 0.1 (10% Xi expression) in the entire gene body to call a gene escape, our current binomial model, which compares Xi-SNP reads to total SNP reads in exons, more accurately assesses Xi expression, as shown by our verifications using RT-PCR with species-specific primers or Sanger sequencing (see above). Two of three genes previously classified as "escape" [18], Bgn and BC022960, were not included in our current list because their expression was below the 1 RPKM threshold. The lncRNA Firre (6720401G13Rik) was initially excluded in our current analysis based on exonic SNPs because it failed to pass the ≥2 Xi-SRPM cutoff in tissues and in one biological replicate of Patski cells. However, when intronic SNPs were considered, similar to our previous survey [18], Firre was re-classified as an escape gene in F1 tissues and in both replicates of Patski cells (Fig. 2G, Tables 1 and S2-S4 and S6). This is also supported by another study [42], and by our findings of enrichment in RNA polymerase II phosphorylated at serine 5 (PolII-S5p) within the gene body of Firre on the Xi, suggesting alternative transcript start sites on the Xi (Yang et al., manuscript in review).

Enrichment in RNA polymerase II and DNase I hypersensitivity at escape genes
The distribution of PolII-S5p was determined by ChIP-seq in one female F1 brain and in Patski cells in conjunction with re-analyses of two DNase I hypersensitivity datasets for Patski cells [41] to extract allele-specific profiles for comparison with the XCI status of genes. There was good agreement between bi-allelic promoter enrichment in PolII-S5p and escape from XCI, as shown for common escape genes Ddx3x, Kdm6a and 5530601H04Rik (Figs. 3, S2). In contrast, genes subject to XCI in both brain and Patski cells such as Igbp1 lacked these features on the Xi (Fig. 3). For genes that differed in terms of their XCI status in a tissue-specific manner, corresponding differences in PolII-S5p levels were noted between tissues. For example, PolII-S5p was bound only to the Xa allele of Rlim in brain where this gene is subject to XCI, while bi-allelic PolII-S5p enrichment was observed in Patski cells where the gene escapes XCI (Fig. 3). Similar results were seen at other differential escape genes, such as Shroom4 (S2 Fig). Consistently, metagene analyses showed high PolII-S5p occupancy at the promoter regions of both alleles of escape genes but not of inactivated genes on the Xi in brain and Patski cells (Fig. 4A and 4F). For escape genes, there was a strong correlation between the ratio of PolII-S5p enrichment on the Xi versus the Xa with allele-specific expression levels (Fig. 4B and 4G). When examining all assessable X-linked genes, the level of PolII-S5p occupancy at the promoter was positively correlated with expression as expected (Fig. 4C and 4H). Allele-specific scatter plots showed a positive correlation between PolII-S5p enrichment at the promoter for Xi-alleles of escape genes, while no such correlation was observed for Xa-alleles (Fig. 4D, 4E, 4I and 4J). Interestingly, escape genes tend to have higher expression than inactivated genes (Fig. 4E and 4J). We did find four genes that were not classified as escape genes in our RNA-seq analyses, yet showed PolII-S5p occupancy on the Xi with an average read count >5 at their promoter (0.5 kb ± TSS) in brain (Snx12) and Patski cells (Phf6, Trappc2 and Usp9x) (S2 and S3 Dataset). PolII-S5p Xi occupancy at Trappc2 could be due to overlap of its promoter region with that of Ofd1, an escape gene in Patski cells (S6 Table). Note that TRAPPC2 and USP9X have been reported to escape XCI in human [6].
Analyses of promoter DNase I hypersensitivity in Patski cells showed similar correlations with X-linked gene expression as well as with Xi expression (Figs. 3, 5 and S2). In addition, there was a good correlation between DNase I hypersensitivity and enrichment in PolII-S5p at the promoter region of genes on the Xi (Fig. 5E). Thus, DNase I hypersensitivity did not appear to be an "all or none" feature but rather was correlated with Xi expression level.

Allelic CTCF binding analysis
While genes that escape XCI in F1 brain were few and almost all isolated, those that escape XCI in Patski cells were more numerous and tended to cluster. When considering all escape genes in either replicate of Patski cells we identified 13 clusters with a high density of escape genes compared to the number of genes subject to XCI, including a 440 kb cluster containing 13 escape genes (S6 Table and S1 Dataset). Furthermore, 22/66 escape genes in Patski cells were found within 12 (dark gray-shaded areas) of the 16 long-range cis-regions (i.e., 4C domains) previously shown to contain escape genes and to physically interact with Cdk16 and/or Kdm5c in neural progenitor cells (S6 Table) [43]. To determine whether CTCF binding might contribute to differences in the number and distribution of escape genes between F1 brain and Patski cells, allele-specific profiles of CTCF binding were generated by ChIP-seq. Discrimination between alleles using SNPs was verified by examining allele-specific CTCF binding at two imprinted autosomal regions known to differentially bind CTCF at the DMR (differentially methylated region). As previously demonstrated [44], H19 showed maternal CTCF binding, while Peg13 was bound by CTCF only on the paternal allele in both brain and Patski cells (S3A Fig).
For allele-specific peak calling all mapped reads were first used to identify enriched peak regions in the diploid genome. Two independent peak calling programs were applied, CisGenome (FDR cutoff $10^{-5}$) [45,46] and MACS/v1.4 (p-value cutoff $10^{-5}$) [47]. We defined significantly enriched peak regions as those identified by both peak callers. Next, we selected allele-specific ChIP-seq peaks using a binomial test. For each diploid ChIP-seq peak region, we assumed that the numbers of BL6-SNP reads ($n_{i,bl}$) and spretus-SNP reads ($n_{i,sp}$) within the peak follow a binomial distribution, i.e.,

$$n_{i,bl} \sim \mathrm{Binomial}(n_i, p_i),$$

where $n_i = n_{i,bl} + n_{i,sp}$ is the sum of BL6-SNP reads and spretus-SNP reads in peak region $i$, and $p_i$ is the binomial parameter. Since the X chromosome behaves differently from autosomes due to skewed XCI in our systems, we estimated the X chromosome allelic background using all SNP reads in the identified diploid peak regions on the X only. That is, for peaks on the X chromosome,

$$p_i = \frac{N_{x,bl}}{N_{x,bl} + N_{x,sp}},$$

in which $N_{x,bl}$ and $N_{x,sp}$ are the total number of BL6-SNP and spretus-SNP reads in X peaks, respectively. Finally, BL6-preferred ChIP-seq peaks were defined as those that contain significantly more BL6-SNP reads (upper-tail binomial test, p-value <0.05), while spretus-preferred ChIP-seq peaks were identified using the lower-tail binomial test (p-value <0.05), and both-preferred ChIP-seq peaks were those peaks that were not significant in the two above tests (p-value ≥0.25). In addition, we required that allele-assessable peaks have a minimal SNP read coverage of one allele-specific read (BL6-SNP and spretus-SNP reads) per 10 million mapped reads. Of the allele-assessable CTCF-binding peaks on the X chromosome in brain and Patski cells (1639/2263 and 374/532, respectively) we identified 212 (13%) Xi- and 366 (22%) both-preferred, and 86 (23%) Xi- and 161 (43%) both-preferred, respectively (S7 Table). The much larger number of CTCF peaks on the Xi in brain suggests a different structure of the Xi. In fact, only 62 of the Xi-binding CTCF peaks were common between brain and Patski cells (S7 Table).
To investigate the spatial distribution of Xi-binding CTCF peaks, the local Xi- and both-preferred CTCF peak density was calculated using a sliding window approach (window size: 500 kb, step size: 1 kb). Assuming that CTCF Xi-binding followed a Poisson distribution, we fitted the data to estimate the Poisson parameters on the Xi, and calculated a p-value for each window. Enriched Xi-binding CTCF peaks were identified at a p-value cutoff of 0.01 and adjacent peaks were merged. Escape genes and Xi-binding CTCF clusters co-localized on the Xi but not the Xa, suggesting a role for CTCF binding at regions of escape (Fig. 6A and 6B). CTCF has been implicated in both transcription control and compartmentalization of the genome [48]; thus, it was not surprising that CTCF peaks were found either at gene promoters or in intergenic regions. As expected, the 5' end of escape genes often displayed Xi-promoter occupancy by CTCF in both brain and Patski cells (Fig. 6C). To enrich in CTCF binding regions that might play a role in nuclear compartmentalization rather than transcription control, we then re-analyzed our data after excluding peaks located at promoters (±1 kb from the TSS), which showed that CTCF clusters were still significantly associated with escape genes in brain (8/14 genes; p-value = 0.01, compared to a random sample of 500 X-linked genes; Fisher's exact test) (S3B Fig). There was a similar trend in Patski cells; however, the association (16/66 genes) was not significant, probably due to the lower number of CTCF peaks in the cell line. Interestingly, when allelic CTCF binding was analyzed in the context of higher order structure [43], CTCF Xi-preferred peaks were more significantly associated with 4C interacting domains than CTCF Xa-preferred peaks in brain and Patski cells (S8 Table). Furthermore, the density of CTCF peaks on the Xi was inversely related to the number of escape genes in brain and Patski cells, as shown by inspecting three regions for the density of Xi-preferred and both-preferred CTCF binding peaks in relation to the number of escape genes (Fig. 7A). This is consistent with larger regions of escape in Patski cells (S6 Table).
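The sliding-window enrichment analysis can be sketched as below. Several details are assumptions made for illustration: the Poisson rate is estimated as the mean peak count over all windows, the p-value is the upper tail P(X ≥ observed count), and significant windows that overlap or touch are merged into one region. The defaults follow the window and step sizes given in the text; function names are hypothetical.

```python
from bisect import bisect_left
from math import exp
from typing import List, Sequence, Tuple


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = exp(-lam)          # P(X = 0)
    cdf = term
    for j in range(1, k):     # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / j
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_regions(peak_positions: Sequence[int], chrom_length: int,
                     window: int = 500_000, step: int = 1_000,
                     cutoff: float = 0.01) -> List[Tuple[int, int]]:
    """Return merged [start, end) regions whose peak density is enriched."""
    pos = sorted(peak_positions)
    starts = range(0, max(chrom_length - window, 0) + 1, step)
    # number of peaks falling in each half-open window [s, s + window)
    counts = [bisect_left(pos, s + window) - bisect_left(pos, s) for s in starts]
    lam = sum(counts) / len(counts)   # assumed estimator of the Poisson rate
    merged: List[List[int]] = []
    for s, c in zip(starts, counts):
        if poisson_upper_tail(c, lam) < cutoff:
            if merged and s <= merged[-1][1]:
                merged[-1][1] = s + window     # extend the current region
            else:
                merged.append([s, s + window])
    return [(a, b) for a, b in merged]
```

Applied to the positions of Xi-preferred and both-preferred peaks on the X chromosome, the returned regions would play the role of the Xi-binding CTCF clusters compared with escape gene positions.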
We next examined allele-specific CTCF peak profiles in the UCSC genome browser at 200 and 139 regions of transitions between adjacent genes with a known XCI status in brain and Patski cells, respectively. Transitions were classified as between either two adjacent genes subject to XCI, an escape gene directly adjacent to a gene subject to XCI, or two adjacent escape genes (S9 Table). There were no transitions between two escape genes in brain, reflecting their low abundance compared to the Patski cell line in which 18 such transitions were observed. A larger proportion of transitions between genes with a different XCI status than those between two inactivated genes had CTCF peaks located in intergenic regions, 21% versus 15% in brain and 8% versus 3% in Patski cells, respectively (S9 Table). We then focused on specific transition regions: at the Kdm5c-Iqsec2 region Xi-preferred CTCF binding peaks were found between Iqsec2 and Kdm5c as well as within the gene body of Kantr located downstream of Kdm5c in brain where only Kdm5c escapes XCI (Figs. 7B and S4). In contrast, in Patski cells where Kdm5c and a short Iqsec2 transcript both escape XCI, Xi-preferred CTCF binding peaks were found both upstream of the short Iqsec2 transcript and within the gene body of Kantr but not between Kdm5c and Iqsec2, suggesting that lack of Xi-preferred CTCF binding in this region may contribute to a larger domain of escape in Patski cells (Fig. 7B). A similar situation was observed in the region between Rlim and Slc16a2, again suggesting that lack of CTCF binding in Patski cells led to a larger escape region (Fig. 7C). A different situation was seen at the Car5b-Siah1b region: Xi-preferred CTCF peaks were absent in brain where both Car5b and Siah1b are subject to XCI, while CTCF peaks flanked the Car5b promoter on both alleles in Patski cells where Car5b escapes XCI and Siah1b is subject to XCI (Figs. 7D and S4), suggesting that CTCF may play a role in the transition between escape and inactivated genes. Taken together, our results imply that CTCF binding may help configure escape domains via local chromatin looping, and/or facilitate the organization of escape genes at the periphery of the Xi territory.
Fig 6 (partial caption). The vertical axis is the negative log of the calculated binomial p-value (-log (p-value)). The thin red dashed line represents a 0.01 p-value cutoff. (B) Similar analysis for CTCF Xa- and both-preferred peaks. There was no significant CTCF co-localization with escape genes on the Xa in either brain or Patski cells. (C) Average CTCF Xi-SNP read counts in ten 100 bp windows at promoters (0.5 kb upstream and downstream of the TSS) is plotted against mRNA-seq Xi-SNP read counts for escape genes (purple) and for genes subject to XCI (gray) in brain and Patski cells. In brain, a higher proportion of escape genes (6/14; Fisher's exact test, p = 5e-9) had an average of ≥10 reads (black line) at their promoter compared to genes subject to XCI (0/403). Similarly, in Patski cells a higher proportion of escape genes (9/65; Fisher's exact test, p = 0.0004) had an average of ≥1 read (black line) at their promoter compared to genes subject to XCI (3/204). doi:10.1371/journal.pgen.1005079.g006

Discussion
Based on allele-specific analyses we identified genes expressed from both X chromosomes in female mouse tissues. Only a minority of these genes escape XCI in a tissue- and cell type-specific manner, indicating that XCI and escape from XCI are tightly controlled in vivo. The probability of bi-allelic versus mono-allelic expression was calculated using a new algorithm that can be applied to any gene in the genome. Our study represents the first comprehensive analysis of escape from XCI in vivo in multiple tissues.
Our data and those of others indicate that for a subset of X-linked genes, escape from XCI is ubiquitous and thus represents an intrinsic property of these genes [22,43]. Among these common escape genes, Ddx3x, Kdm6a, Eif2s3x and Kdm5c each have a conserved Y-linked paralog with a similar function [49,50]. These X/Y genes play important roles in the regulation of transcription and translation and are highly dosage-sensitive, which could explain why they consistently escape XCI in all tissues examined (Fig. 8). Interestingly, we also identified tissue-specific escape genes, which will help in understanding the functional mechanisms leading to sex differences in these tissues. For example, we identified six genes that escape XCI in brain, all of which have been implicated in brain functions: Gpm6b, Gprasp1, Syp, Gdi1, Plp1, and Tmem47 [51]. Our results generally agree with a recent study of XCI based on flow-sorted brain cells with differentially labeled BL6 and Mus castaneus X chromosomes [25] (Fig. 8). Of seven escape genes reported in that study, five are included in the present study (5530601H04Rik, Ddx3x, Eif2s3x, Kdm5c, and Kdm6a). The two other genes are Itm2a, which had too few Xi reads to be classified in our study, and Mid1, which we have shown to be subject to XCI in M. spretus where it is outside the PAR (this study), while it escapes XCI in BL6 mouse brain [31,52].
Importantly, many of the escape genes we identified have significant female sex bias in expression, suggesting roles in sex differences (S5 Table) [1,30,53-57]. For example, three of the brain-specific escape genes we identified in mouse, GPM6B, SYP, and PLP1, also escape XCI in human, resulting in higher expression in female than male brain [6,58]. Whether deficiency in these genes due to the presence of a single X chromosome in women with Turner syndrome contributes to mild cognitive impairment remains to be determined [59]. Interestingly, of the five novel spleen-specific escape genes we identified, as many as three, Vsig4, Cfp, and Bgn, have been implicated in autoimmune disorders both in mouse and human [36,38,39]. It is well established that autoimmune disorders are much more common in women and their incidence is increased in Turner syndrome, but specific genetic mechanisms are not well defined [60]. Our in vivo study demonstrates that analyses of relevant human tissues, for example spleen in the case of CFP and BGN, two genes previously classified as being subject to XCI in cell cultures [6], will be critical to understand sex differences in specific disorders.
Do genes escape XCI only in tissues where they are most highly expressed? While genes that escape XCI often have high expression in a particular tissue, expression does not appear to be the sole driving force for escape. For example, Car5b is more highly expressed in ovary (42 RPKM) than in Patski cells (8 RPKM) and yet escapes XCI only in Patski cells. A previous study identified a set of 17 escape genes in mouse placenta [24] (Fig. 8). Many of these differ from escape genes found in our study, probably because extra-embryonic tissues undergo imprinted paternal XCI, which differs from random XCI [61]. A comprehensive comparison of escape from XCI in available mouse tissues shows that only Eif2s3x escapes XCI in all tissues examined (Fig. 8). We found that escape from XCI represents a continuum of expression from the Xi compared to the Xa. While our data only include genes expressed above a strict cutoff of 1 RPKM, we cannot exclude that some genes with lower expression may also escape XCI. For genes with significant expression from the Xi (excluding Xist), expression ranged from 3-105% (median 18%) of the Xa expression level. Thus, expression from the Xi was usually lower than that of the Xa in mouse, similar to what has been reported in human, even though there are more escape genes in human based on expression analyses of cell cultures [6] and on DNA methylation profiles in human tissues where 9% of human genes were found to have a methylation pattern consistent with escape [62]. The mouse with 3-7% escape genes in tissues may be exceptional compared to other mammals in which XCI patterns are often more similar to the human pattern [3,63].
Fig 8. X chromosome escape maps differ between mouse tissues. The position of genes that escape XCI using a ≥2 Xi-SRPM cutoff (black lines) is shown at left for three tissues (brain, ovary and spleen) from F1 mice analyzed in our study. Gene names are color-coded to reflect their classification into group 1 (green, common in at least two tissues) or group 2 (blue, brain-specific escape; red, spleen-specific; brown, ovary-specific) based on our criteria (see Table 1). Coordinates at left are based on UCSC genome build NCBI37/mm9. For comparison, the genes reported to escape XCI in sorted brain cells from a M. musculus x M. castaneus cross [25], and genes reported to escape imprinted XCI in mid-gestation placenta from a M. musculus x M. castaneus cross [24] are shown at right. Genes labeled green are common between studies. doi:10.1371/journal.pgen.1005079.g008
Our findings of a larger number of escape genes in Patski cells compared to mouse tissues may reflect either the acquisition of epigenetic changes leading to reactivation of X-linked genes in cell culture or a genuine property of these cells. Our analyses suggest that kidney-specific escape genes do exist and could explain in part the pattern seen in Patski cells, but we also found significant differences between the cell line and the tissue of origin. Clustering of escape genes in Patski cells but not in tissues suggests unstable silencing of large chromosomal regions. Using an in vitro system of cultured trophoblastic cells, Calabrese et al. also identified a relatively large number of escape genes (35 out of 262 accessible) [14], which represents twice the number of escape genes found in mouse placenta [24]. Furthermore, 22/66 escape genes in Patski cells are in regions of escape reported in cultured neural progenitor cells [43]. Thus, the number of escape genes may be overestimated when based on studies of cultured cells, which are notoriously susceptible to epigenetic changes such as DNA methylation changes associated with gene expression and CTCF binding aberrations [64,65]. We cannot rule out the possibility that XCI in embryonic cells, including the embryonic kidney cells from which Patski cells were derived as well as trophoblastic cells and neural progenitor cells, may simply be less complete than in adult tissues. Future studies will help sort out developmental aspects of escape from XCI.
We found a good correlation between escape from XCI and regulatory features associated with transcription, such as PolII-S5p occupancy and DNase I hypersensitivity at the promoters of genes on the Xi, indicating that escape regions have a more open chromatin configuration. This is consistent with escape genes being associated with histone marks characteristic of active chromatin [13][14][15][16][17][18]66]. Interestingly, we found distinct CTCF binding patterns on the Xi and Xa. A study in human cells also reported that while a majority of CTCF peaks on the X chromosome are bi-allelic, some peaks are Xa- or Xi-specific [67]. In addition, a recent study in differentiated mouse ES cells also describes significant differences between Xa- and Xi-specific CTCF peaks at escape gene loci (determined by ChIP-seq), as well as differences in interactions between transcripts and CTCF (determined by CLIP-seq) [68]. These findings are in contrast to a study of imprinted XCI, in which Xi and Xa CTCF binding patterns were nearly identical [14]. Thus, the role of CTCF in escape from XCI may differ between random and imprinted XCI. CTCF binding peaks were often located at the promoters of genes expressed from the Xi, in agreement with a role for CTCF in transcription regulation [69]. In addition, since CTCF binding peaks located in intergenic regions also clustered with escape genes, CTCF may also be a factor in compartmentalization of the Xi. Chromatin interactions such as looping as determined by Hi-C are correlated with the distribution of CTCF binding [70]. This is supported by our findings that Xi-preferred CTCF binding is more significantly associated with 4C interacting domains. Indeed, CTCF plays an important role in nuclear structure and is often found at the boundary between topological domains [71][72][73]. Furthermore, regions containing escape genes are preferentially engaged in long range cis-interactions [43]. Previous studies have shown that specific boundary elements possibly involving CTCF may have a role in the segregation of silenced domains from escape domains [21,74]. The low density of CTCF peaks observed in Patski cells may result in a more relaxed structure of the Xi in the cell line, leading to an expansion of escape domains. Interestingly, disruption of CTCF binding at the borders of domains enriched in H3K27me3 in Drosophila results in a reduction in H3K27me3 levels in repressed domains [75], and loss of CTCF binding at super-enhancers results in increased expression of adjacent genes [76]. It is important to note that a previous 5C study of the XIC has reported CTCF binding both at the boundaries of topologically associating domains (TADs) and within TADs, suggesting that CTCF is not the sole factor in determining Xi organization [77]. Further studies will help define other elements that may help structure the Xi.
In summary, we demonstrate the utility of a mouse model to study XCI in vivo. Using this resource, we identified novel tissue-specific escape genes. Escape genes are associated with an open chromatin structure, and CTCF binding may influence the definition of differential chromatin architecture of the X.

Tissue collection, hybrid mouse model and cell culture
Ovaries, spleen, liver, and whole brain were collected from female F1 mice obtained by mating C57BL/6J females that carry a deletion of the Xist proximal A-repeat (XistΔ) (B6.Cg-Xist<tm5Sado>, RIKEN) [27] with M. spretus males (Jackson Labs). Female progeny were genotyped to verify inheritance of the XistΔ allele using specific primers [27]. F1 mice that inherited a maternal X chromosome with the XistΔ allele fail to silence the BL6 X and thus have complete skewing of XCI of the paternal spretus X. All procedures involving animals were reviewed and approved by the University Institutional Animal Care and Use Committee (IACUC), and were performed in accordance with the Guiding Principles for the Care and Use of Laboratory Animals. Patski cells were cultured as previously described [18].

Validation of allelic expression
To verify skewing of XCI, cDNA and control genomic DNA (gDNA) extracted from each tissue were subjected to PCR amplification of Ubqln2 followed by Sanger sequencing (S10 Table). A similar approach was used to confirm the XCI status of Mid1, Bmp15, Vsig4, Rlim, Shroom4, Car5b, and 5530601H04Rik (S10 Table). Allele-specific RT-PCR was done to confirm Xi expression of Plp1, Cfp, and Hdac6. Briefly, cDNA was made by Superscript II reverse transcriptase (Life Technologies) using oligo-dT primers according to the manufacturer's protocol. PCR reactions with non-species-specific and BL6-specific or spretus-specific primers (S10 Table) were performed using tissues from BL6, spretus and XistΔ hybrid F1 mice. Actinβ was used as a positive control. For quantification, gel band intensities were measured using ImageJ software (http://imagej.nih.gov/ij/) and, together with RNA-seq Xi levels, plotted to compare expression from the Xi and Xa.
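The normalization behind this comparison can be sketched as follows. This is only an illustration under stated assumptions, not the scripts used in the study: for RNA-seq, the Xi contribution is taken as Xi SNP reads over Xi + Xa SNP reads, and for RT-PCR, as the allele-specific band intensity over the band intensity obtained with non-species-specific primers. The function names and all input values are hypothetical.

```python
def xi_fraction_rnaseq(xi_reads: float, xa_reads: float) -> float:
    """Fraction of allelic SNP reads that come from the Xi: Xi / (Xi + Xa)."""
    total = xi_reads + xa_reads
    if total <= 0:
        raise ValueError("no allele-specific reads for this gene")
    return xi_reads / total


def xi_fraction_rtpcr(allele_band: float, total_band: float) -> float:
    """Xi-specific band intensity normalized to the non-species-specific band."""
    if total_band <= 0:
        raise ValueError("total band intensity must be positive")
    return allele_band / total_band


# Hypothetical values chosen only to illustrate the calculation.
rnaseq_fraction = xi_fraction_rnaseq(xi_reads=18, xa_reads=182)
rtpcr_fraction = xi_fraction_rtpcr(allele_band=450.0, total_band=5000.0)
print(f"RNA-seq Xi fraction: {rnaseq_fraction:.2f}")
print(f"RT-PCR Xi fraction:  {rtpcr_fraction:.2f}")
```

Plotting the two fractions side by side for the same gene and tissue gives the kind of comparison described above.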

ChIP-seq with allele-specific analyses
ChIP-seq using PolII-S5p (Abcam) and CTCF (Millipore) ChIP-grade antibodies was performed as described [26]. The specificity of the PolII-S5p antibody (Abcam) was verified by blocking immunostaining with synthetic peptides (Abcam ab18488). A pseudo-spretus genome was assembled by substituting available SNPs (from Sanger) into the BL6 UCSC Genome Browser NCBIv37/mm9 reference genome. Reads from genomic DNA sequencing, ChIP-seq, and DNase I-seq experiments were mapped separately to the BL6 reference sequence (mm9) and to the pseudo-spretus genome using BWA v0.5.9 [78] with default parameters. Only those reads that mapped uniquely and with a high-quality mapping score (MAPQ ≥30) to either the BL6 genome or the pseudo-spretus genome were kept for allele-specific analyses (see details in main text).
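The pseudo-spretus construction and the read filter described above can be sketched as follows. This is an illustration only, assuming 0-based SNP positions, single-base substitutions, and a simplified alignment record. The names `build_pseudo_genome`, `Alignment`, and `keep_for_allelic_analysis` are hypothetical and are not part of BWA or of the pipeline used in the study.

```python
from typing import Dict, NamedTuple, Optional


def build_pseudo_genome(reference: str, snps: Dict[int, str]) -> str:
    """Substitute SNP alleles (0-based position -> alternate base) into a reference."""
    seq = list(reference)
    for pos, alt in snps.items():
        if not 0 <= pos < len(seq):
            raise IndexError(f"SNP position {pos} is outside the reference")
        seq[pos] = alt
    return "".join(seq)


class Alignment(NamedTuple):
    """Simplified stand-in for one read's alignment to one genome (assumption)."""
    mapq: int
    unique: bool


def keep_for_allelic_analysis(
    bl6: Optional[Alignment],
    spretus: Optional[Alignment],
    min_mapq: int = 30,
) -> bool:
    """Keep a read if it maps uniquely with MAPQ >= min_mapq to either genome."""
    return any(
        aln is not None and aln.unique and aln.mapq >= min_mapq
        for aln in (bl6, spretus)
    )


# Toy reference and SNPs, chosen only to show the substitution step.
pseudo = build_pseudo_genome("ACGTACGT", {1: "T", 6: "A"})
print(pseudo)
print(keep_for_allelic_analysis(Alignment(37, True), None))
```

Mapping each read to both genomes separately, as in the text, avoids penalizing reads that carry the spretus allele at a SNP; the filter then retains a read if either alignment passes.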

Fig 1. Evaluation of escape from XCI in mouse tissues. (A) Example of mRNA SNP read distribution profiles on the Xi and Xa for Kdm5c, an escape gene common to all mouse tissues tested (brain, spleen and ovary). SNP reads specific to the Xa (blue) and Xi (green) are visualized in the UCSC genome browser. RNA-seq read quantification was done by normalizing reads from the Xi to total reads (Xi + Xa) in two biological replicates. (B) Example of mRNA SNP read distribution profiles on the Xi and Xa for Cfp, a gene that escapes XCI only in spleen. SNP reads specific to the Xa (blue) and Xi (green) are visualized in the UCSC genome browser. (C, D) Validation of escape from XCI for Cfp. (C) Gel electrophoresis of RT-PCR products using non-species-specific primers and spretus-specific primers (sp) (S10 Table) in BL6, spretus, and F1 brain in which the Xi is from spretus. ActinB was used as a control. Control reactions include "No RT" (no reverse transcriptase) and H2O (instead of primers). (D) Graph comparing RT-PCR Cfp gel band quantification measured by ImageJ with SNP read quantification measured by RNA-seq. Xi product abundance measured by RT-PCR using spretus-specific primers in F1 spleen was normalized to total RT-PCR product abundance measured by non-species-specific primers. doi:10.1371/journal.pgen.1005079.g001

Fig 2. Validation of Rlim, Shroom4, Car5b, Hdac6, 5530601H04Rik expression profiles and Firre mRNA profiles. (A) Sanger sequencing tracings of Rlim cDNA confirm bi-allelic expression in Patski cells but not brain, while gDNA sequence tracings show SNP heterozygosity (C in BL6 and T in spretus). Arrows indicate SNP positions. (B, C) Sanger sequencing tracings of Shroom4 (B) and Car5b (C) cDNA confirm that these genes are subject to XCI in F1 kidney while they were shown to escape XCI in Patski cells [18]. gDNA sequence tracings show SNP heterozygosity (Shroom4: G in BL6 and A in spretus; Car5b: G and C in BL6, A and T in spretus). Arrows indicate SNP positions. (D, E) Validation of escape from XCI for Hdac6. (D) Gel electrophoresis of RT-PCR products using non-species-specific primers, spretus-specific primers, and BL6-specific primers (S10 Table) in BL6, spretus, Patski cells and F1 kidney. ActinB was used as a control. Control reactions include "No RT" (no reverse transcriptase) and H2O (instead of primers). Sanger sequencing tracing confirms heterozygosity (A in BL6 and G in spretus) in the left primer (S10 Table). (E) Xi expression of Hdac6 was determined to be 9% of total expression in Patski cells by gel band quantification measured by ImageJ. (F) Sanger sequencing tracings of 5530601H04Rik cDNA confirm that the lncRNA escapes XCI in kidney and Patski cells, while gDNA sequence tracings show heterozygosity (T and A in BL6; A and G in spretus). Arrows indicate SNP positions. (G) mRNA SNP read distribution profiles on the Xi and Xa for Firre, a lncRNA that escapes XCI in mouse tissues and Patski cells. Note that Firre is classified as a variable escape gene in brain (S1 Dataset). Xa SNP reads are in blue and Xi SNP reads in green. doi:10.1371/journal.pgen.1005079.g002

Fig 3. Enrichment in PolII-S5p and DNase I hypersensitivity on the Xi allele at escape genes. (A, B) Examples of allele-specific PolII-S5p occupancy profiles and expression (mRNA) profiles at Ddx3x, Rlim and Igbp1 in two systems: brain (A) and Patski cells (B). Ddx3x escapes XCI in both systems, Rlim escapes XCI in Patski cells only, and Igbp1 is subject to XCI in both systems. PolII-S5p is enriched at the promoter regions (highlighted by a red box) of escape genes on both the Xa and the Xi, whereas enrichment is limited to the Xa for genes subject to XCI. DNase I hypersensitivity, tested in Patski cells only, is also increased at the promoter regions (highlighted by a red box) of escape genes on both the Xa and Xi, but is limited to the Xa for genes subject to XCI. Genes that escape XCI are labeled orange and genes subject to XCI blue. Color-coded profiles are shown for the Xa (blue) and Xi (green) SNP reads, and for the total reads (Xt, black). See additional examples in S2 Fig. doi:10.1371/journal.pgen.1005079.g003

Fig 4. PolII-S5p enrichment at promoters of X-linked genes on the Xi correlates with expression and escape from XCI. (A) Metagene analyses in brain show average Xa- (left) and Xi- (right) SNP read counts in 100bp windows 3kb upstream and downstream of the TSS for escape genes (purple; 14 genes with ≥2 Xi-SRPM in both replicates for brain) and for genes subject to XCI (gray; 403 genes with <2 Xi-SRPM in both replicates for brain). (B) The ratios of PolII-S5p enrichment at the promoter (SNP reads within ±500bp of the TSS) of escape genes (purple) on the Xi versus the Xa are strongly correlated to the ratios of expression from the Xi versus the Xa measured by RNA-seq in brain. Xist is excluded in this analysis. (C) Scatter plot of PolII-S5p promoter enrichment (log2 of reads within ±500bp of the TSS) against expression levels of all X-linked genes (log2 RPKM) shows a positive correlation in brain. (D) Scatter plot of Xi-specific PolII-S5p promoter enrichment (SNP reads within ±500bp of the TSS) against expression (Xi SNP reads) for escape genes (purple) and genes subject to XCI (gray) in brain. Promoter PolII-S5p enrichment correlates with expression from the Xi. (E) Same analysis as for D but for Xa-specific PolII-S5p promoter enrichment in brain. Escape genes generally overlap with genes subject to XCI, but are often highly expressed and enriched in PolII-S5p. (F-J) Same analyses as A-E for Patski cells. doi:10.1371/journal.pgen.1005079.g004

Fig 5. DNase I hypersensitivity at the promoters of X-linked genes correlates with expression and escape from XCI. (A) Metagene analyses of DNase I hypersensitivity (DHS) in Patski cells show average Xa- (left) and Xi- (right) SNP read counts in 100bp windows 3kb upstream and downstream of the TSS for escape genes (purple; 43 genes in both replicates for Patski cells) and for genes subject to XCI (gray; 203 genes with <2 Xi-SRPM in both replicates for Patski cells). (B) Scatter plot shows a positive correlation between DHS at the promoter (log2 of all reads within a region ±500bp from the TSS) and expression (log2 RPKM) for all X-linked genes in Patski cells. (C) Scatter plot of Xi-specific DHS at the promoter (reads within a region ±500bp from the TSS) against expression (Xi SNP reads) for escape genes (purple) and genes subject to XCI (gray) shows a correlation between DHS and level of escape from XCI in Patski cells. (D) Same analysis as in C but for Xa-specific DHS at the promoter region. Escape genes generally overlap with genes subject to XCI, although escape genes tend to have high expression and high DHS. (E) Scatter plot shows a good correlation between DHS and enrichment in PolII-S5p at the promoter of genes on the Xi. DHS and PolII-S5p are shown as reads within a region ±500bp from the TSS for escape genes (purple) and genes subject to XCI (gray) in Patski cells. doi:10.1371/journal.pgen.1005079.g005

Fig 6. Xi-associated but not Xa-associated CTCF peak clusters co-localize with escape regions. (A) Significant CTCF Xi-binding clusters were mapped along the Xi in brain and Patski cells. Xi- and both-preferred peaks were determined by a binomial model and used for density analysis. Red bars represent merger of clusters of CTCF Xi-binding peaks, while purple dots represent escape genes. Significant Xi-binding CTCF clusters tend to co-localize with chromatin containing escape genes. Little change was seen after removal of promoter-associated CTCF binding (S3B Fig). The horizontal axis represents the Xi in Mb. The vertical axis is the negative log of the calculated binomial p-value (-log (p-value)). The thin red dashed line represents a 0.01 p-value cutoff. (B) Similar analysis for CTCF Xa- and both-preferred peaks. There was no significant CTCF co-localization with escape genes on the Xa in either brain or Patski cells. (C) Average CTCF Xi-SNP read counts in ten 100bp windows at promoters (0.5kb upstream and downstream of the TSS) are plotted against mRNA-seq Xi-SNP read counts for escape genes (purple) and for genes subject to XCI (gray) in brain and Patski cells. In brain, a higher proportion of escape genes (6/14; Fisher's exact test, p = 5e-9) had an average ≥10 reads (black line) at their promoter compared to genes subject to XCI (0/403). Similarly, in

Fig 7. CTCF peak density and distribution differ in brain and Patski cells. (A) Examples of Xi-preferred and both-preferred CTCF peak distribution in three X chromosome regions (coordinates in million bp on top). The density of escape genes (purple dots, with total number under each region) is inversely related to the density of Xi-preferred (green) and both-preferred (brown) CTCF-binding peaks when comparing brain to Patski cells. (B) Allele-specific CTCF binding profiles around Kdm5c, a common escape gene flanked by Iqsec2 and Kantr. In brain, where only Kdm5c escapes XCI, CTCF binding is present at the 5' end of the gene at the transition (double star) between Kdm5c and Iqsec2, whose short and long transcripts are subject to XCI (see also S4A Fig). In Patski cells there is no such CTCF binding between Kdm5c and Iqsec2, which escapes XCI. CTCF also binds proximal of the Iqsec2 short transcript in both brain and Patski cells, which could represent a proximal boundary of an escape domain. (C) Similar analysis at a region around Rlim, a gene that escapes XCI in Patski cells but not in brain, while the adjacent gene Slc16a2 escapes XCI in both systems. A CTCF peak is present in the transition region only in brain. (D) Similar analysis in a region around Car5b, a gene that escapes XCI in Patski cells but not in brain (see also S4B Fig). CTCF binding peaks are located within the body of Car5b and in the transition between Car5b and Siah1b on the Xi in Patski cells. Genes that escape XCI are labeled orange and genes subject to XCI blue. Xa SNP reads are in blue and Xi SNP reads in green. Red stars indicate Xi- or both-preferred CTCF peaks on the Xi; one of the CTCF peaks is marked by a black star because it is present but was not called preferred at our cutoff. doi:10.1371/journal.pgen.1005079.g007

Fig 8. X chromosome escape maps differ between mouse tissues. The position of genes that escape XCI using a ≥2 Xi-SRPM cutoff (black lines) is shown at left for three tissues (brain, ovary and spleen) from F1 mice analyzed in our study. Gene names are color-coded to reflect their classification into group 1 (green, common in at least two tissues) or group 2 (blue, brain-specific escape; red, spleen-specific; brown, ovary-specific) based on our criteria (see Table 1). Coordinates at left are based on UCSC genome build NCBI37/mm9. For comparison, the genes reported to escape XCI in sorted brain cells from a M. musculus x M. castaneus cross [25], and genes reported to escape imprinted XCI in mid-gestation placenta from a M. musculus x M. castaneus cross [24] are shown at right. Genes labeled green are common between studies. doi:10.1371/journal.pgen.1005079.g008

Table 1. Escape genes grouped as common or variable between mouse tissues. Gene names are listed to reflect their classification into group 1 (common in at least two tissues) or group 2 (escape in only one tissue). Tissue-specific escape genes escape XCI in 1 of 3 tissues but not in any replicate of other tissues. For brain and spleen, average SRPM-Xi/Xa values between replicates are shown. Ovary SRPM-Xi/Xa values represent one sample. Gene expression is shown as RPKM. Whether human homologs to the mouse genes escape XCI is shown in the last column, which indicates the proportion of mouse x human hybrid cell lines that had expression of the gene from the human Xi [6]. A dash indicates the gene was not assayed for escape in human in that study. Firre RPKM (*) was based on reads from exons and introns (see also S2-4 Table and Methods). doi:10.1371/journal.pgen.1005079.t001