This is an uncorrected proof.
Extensive cell-to-cell variation exists even among putatively identical cells, and there is great interest in understanding how the properties of transcription relate to this heterogeneity. Differential expression from the two gene copies in diploid cells could potentially contribute, yet our ability to measure from which gene copy individual RNAs originated remains limited, particularly in the context of tissues. Here, we demonstrate quantitative, single molecule allele-specific RNA FISH adapted for use on tissue sections, allowing us to determine the chromosome of origin of individual RNA molecules in formaldehyde-fixed tissues. We used this method to visualize the allele-specific expression of Xist and multiple autosomal genes in mouse kidney. By combining these data with mathematical modeling, we evaluated models for allele-specific heterogeneity, in particular demonstrating that apparent expression from only one of the alleles in single cells can arise as a consequence of low-level mRNA abundance and transcriptional bursting.
In mammals, most cells of the body contain two genetic datasets: one from the mother and one from the father, and—in theory—these two sets of information could contribute equally to produce the molecules in a given cell. In practice, however, this is not always the case, which can have dramatic implications for many traits, including visible features (such as fur color) and even disease outcome. However, it remains difficult to measure the parental origin of individual molecules in a given cell and thus to assess what processes lead to an imbalance of the maternal and paternal contribution. We adapted a microscopy technique—called allele-specific single molecule RNA FISH—that uses a combination of fluorescent tags to specifically label one type of molecule, RNA, depending on its origin, for use in mouse kidney sections. Focusing on RNAs that were previously reported to show imbalance, we performed measurements and combined these with mathematical modeling to quantify imbalance in tissues and explain how these can arise. We found that we could recapitulate the observed imbalances using models that only take into account the random processes that produce RNA, without the need to invoke special regulatory mechanisms to create unequal contributions.
Citation: Symmons O, Chang M, Mellis IA, Kalish JM, Park J, Suszták K, et al. (2019) Allele-specific RNA imaging shows that allelic imbalances can arise in tissues through transcriptional bursting. PLoS Genet 15(1): e1007874. https://doi.org/10.1371/journal.pgen.1007874
Editor: Gregory S. Barsh, Stanford University School of Medicine, UNITED STATES
Received: August 29, 2018; Accepted: December 4, 2018; Published: January 9, 2019
Copyright: © 2019 Symmons et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Raw and processed data, as well as scripts for all analyses presented in this paper, including all data extraction, processing, and graphing steps are freely accessible from the Open Science Framework (URL https://osf.io/sbjcw/?view_only=09393298e00e4b8c9d0dd1b24b0318d9).
Funding: AR and MSB acknowledge support from from NIH/NIBIB R33 EB019767 (https://www.nibib.nih.gov/research-funding), AR from NSF CAREER Award 1350601 (https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503214), NIH 4DN U01 HL129998 (https://grants.nih.gov/grants/guide/rfa-files/RFA-RM-14-010.html) and NIH Center for Photogenomics (RM1 HG007743). M.S.B. received funding from NIH/NIGMS R37 GM051279. OS received funding from EMBO long-term fellowship ALTF 691-2014 (http://www.embo.org/funding-awards/fellowships/long-term-fellowships) and Human Frontier Science Program fellowship LT000919/2016-L (http://www.hfsp.org/funding/postdoctoral-fellowships). IAM acknowledges funding from NIH/NIGMS T32GM007170 (University of Pennsylvania MSTP) (https://www.nigms.nih.gov/grants-and-funding), NIH/NHGRI T32HG000046 (https://www.genome.gov/10000991/nhgri-funding-opportunities-research/), and NIH/NINDS F30NS100595 (https://www.ninds.nih.gov/Funding/Find-Funding-Opportunities). JMK acknowledges funding from NIH K08 CA193915 (https://grants.nih.gov/funding/index.htm), St. Baldrick’s Foundation (https://www.stbaldricks.org/), Alex's Lemonade Stand Foundation and Margaret Q. Landenberger Foundation (https://ofd.research.chop.edu/node/712). JP and KS were supported by NIH/NIDDK R01 DK076077, R01 DK087635 and DP3 DK108220. Sectioning was performed at the Penn Center for Musculoskeletal Disorders Histology Core (NIH P30-AR06919). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: AR receives royalty income related to Stellaris RNA FISH probes.
Gene expression in genetically identical individual cells often deviates from that of the cell population average , which in mammals can impact cell fate and development [2–5], response to environmental stimuli [6–9] and disease [10–13]. Over the past few years, it has emerged that at least some of this variability arises due to random fluctuations in the biochemical processes that underlie transcription and translation. In the case of transcription, a primary source of fluctuations is so-called transcriptional bursting, where a gene alternates between an active state, during which RNA is produced, and an inactive state, where no RNA is transcribed. Because both the time of onset of these bursts and the amount of RNA produced in a single burst are random, this process can lead to cell-to-cell variability [14–16].
An additional nuance to the effects of bursting on cellular variability is that diploid mammalian cells carry two sets of chromosomes (one from each parent), which means that they also have two copies of each individual gene. It is typically assumed that for most genes both copies, called alleles, are capable of being expressed, thus providing protection through redundancy if one of them is mutated [17,18]. Recent studies, however, which made use of crosses between distantly related mouse strains and high-throughput sequencing, uncovered that there can be extensive differences in the relative expression levels of the two alleles [19–23]. Additional work then showed that even if transcripts from both alleles are detected at the population level, there may be substantial variation in the degree of allelic imbalance in single cells. For example, while for most genes individual cells express RNA from both alleles, for other genes the population can be a mixture of cells expressing RNA from only one or the other allele. This latter expression pattern has been termed random monoallelic expression, and certainly, genes with such an expression profile exist: most X-linked genes are expressed only from one X chromosome due to random X-chromosome inactivation [24–26], and a similar pattern has also been shown for some autosomal genes, such as olfactory receptors or antigen receptors [27–29]. Understanding random monoallelic expression is of particular interest given that quantitative cell-to-cell differences or spatial heterogeneity in allele-specific gene expression have the potential to modify phenotypic outcome if the two alleles harbor different functional variants, as has been described for both X-linked (eg. [30–34]) and autosomal traits [35,36].
Beyond these prototypic cases, it has been proposed that many more autosomal genes may be subject to random monoallelic expression [37–43], but some key properties of this extended class sets them apart from the more established examples. Similarly to the initial group, these genes were classified as displaying random monoallelic expression because cells with only transcripts from one or the other gene copy were observed, with both types of cells present in the same experiment. In addition, some of these genes maintain their monoallelic expression status over multiple passages in clonal cell lines [40,43], so a specific, heritable mechanism could limit transcription to only one allele, as is the case for X chromosome inactivation. However, thus far such a mechanism remains elusive [40,41]. Moreover, unlike previously established cases, many genes in this extended set are not expressed exclusively monoallelically, and typically a subset of cells or clones with transcripts from both alleles can also be detected [37–41]. This suggests that if a specific mechanism does exist to regulate monoallelic expression, it is limited to only a subset of cells.
To resolve this question of mechanism, Reinius et al proposed an elegant scenario (so-called dynamic random monoallelic expression), whereby the many genes with random monoallelic expression may arise not by differential cell- and allele-specific regulation, but instead monoallelic expression may arise by chance [43,44]. In this scenario infrequent transcriptional bursting would lead to cells that contain only RNA from one of the two gene copies. This observed monoallelic expression of mRNA would be temporary and the allelic state of a cell could change over time. The authors confirmed this model in clonal cell lines , but whether the same is true in tissues is still an open question. Different groups have deployed single-cell transcriptomics to determine the degree of cell-to-cell allelic imbalance [42,43,45,46], but technical limitations inherent to low abundance RNA quantification, as well as parameter choice can impact the interpretation of allele-specific sequencing data [47,48]. Thus, it has been hypothesised that the level of monoallelic expression, especially at single-cell level, could be an overestimate [45,49]. This absence of precise quantitative data has made it difficult to definitively answer if random monoallelic expression observed in vivo requires a dedicated mechanism or if it could arise as a consequence of transcriptional bursting.
In this study, we adapted a previously described single-molecule RNA fluorescent in situ hybridization technique that is sensitive to single-nucleotide differences between RNAs for the analysis of transcripts in snap-frozen, cryosectioned tissues from different mouse strains and their hybrids [50–52]. This allowed us to determine the allelic origin of individual mRNAs in single cells, while preserving both their spatial context and their in vivo expression levels. We used this method to measure allele-specific expression of multiple autosomal genes and of Xist, a gene for which it is well-documented that individual cells randomly and exclusively transcribe either the maternal or the paternal copy. Quantitative analysis of the data enabled us to answer whether the autosomal genes we investigated were expressed from one or both gene copies in single cells in tissue. While we observed monoallelic expression in some cells, mathematical modeling showed that this pattern was compatible with random transcriptional events (including transcriptional bursting) from the two alleles producing low levels of RNA, rather than an explicit mechanism governing random monoallelic expression.
SNV-specific detection of RNAs in mouse tissue using single molecule RNA FISH
Our goal in tissues was to quantitatively measure the amount of cell-to-cell variability in transcript abundance from either the maternal or paternal allele of a gene to determine the degree of imbalance between transcripts arising from the two alleles. To make these measurements, we modified a protocol previously developed by our group  that enables the detection of single nucleotide variants (SNVs) on individual RNA molecules in situ in cultured cells (Methods, Fig 1A). We applied this method to mouse kidney tissues from C57BL/6J (BL6) and JF1/Ms (JF1) mice and their F1 progeny, since these two strains harbor a large number of genomic differences, allowing us to target most genes with multiple SNV-specific probes.
A. Collecting tissue from F1 heterozygous mice from crosses between BL6 and JF1 mice enabled allele-specific RNA detection in single-cells by utilizing a “toehold probe” strategy that targets polymorphisms between the two strains. While only a single polymorphism-probe is shown here for illustration, all genes were targeted with multiple SNV probes to increase the fluorescent signal. B. The pipeline for allele-specific detection of Xist foci in kidney tissue sections: First, whole-section tissue scans were used to collect fluorescence data in the guide and SNV channels (the scan shown is for the Xist guide probes labelled with Cal fluor 610). The guide probe signal was then used to automatically detect Xist foci, and the fluorescence intensity for the two allele-specific probes was extracted at the positions of those foci (BL6 was labelled with Cy3, JF1 labelled with Cy5). Clustering of the relative fluorescence intensities was applied to all foci, allowing automated allelic assignment for all spots across the entire tissue. C. Quantification of Xist allelic assignments in homozygous and heterozygous tissues. For each sample we depict the fraction of assigned BL6 and JF1 foci. For heterozygous samples biological replicates (ie tissue samples from different animals) are shown separately to reveal inter-individual variability in random X inactivation, while data from a technical replicate (two separate tissue sections from the same animal) is also plotted to show the effect of intra-individual variability and technical noise on allelic assignments. D. The pipeline for allele-specific detection of punctate mRNA spots in kidney tissue sections: whole tissue scans at 60x resolution in DAPI (staining nuclei) were used to obtain an overall impression of tissue morphology. Then, we collected z-stacks for randomly selected locations at 100x resolution in all channels of interest, which enabled detection of single cells and thresholding of Cal Fluor 610 detecting Aebp1 mRNA (top), BL6-detection probes labeled with Cy3 (middle), and JF1-detection probes labeled with Cy5 (bottom). Testing colocalization of these probes revealed allelic assignments for guide probes (far right).
To demonstrate our ability to detect expression from the BL6 vs. JF1 allele using our technique in tissue, we first examined the expression of Xist, a prototypical example of random monoallelic expression [25,53]. In female cells Xist is transcribed exclusively from the (randomly chosen) inactivated X chromosome, is expressed at high levels, and contains a large number of SNVs between the two substrains, making it an ideal test case for our method. We collected kidney tissues from mice at day 4 of postnatal development, snap-froze them in liquid nitrogen or on aluminium blocks in dry ice and cryosectioned them at 5 μm thickness. We then applied our in situ hybridization protocol using allele-specific probes. As expected, we observed the appropriate gender-specific expression pattern: no fluorescent signal in male tissues (S1 Fig), whereas in female tissues, our guide probes clearly labelled nuclear Xist foci, which colocalized with signal from the strain-specific probes. For SNV-targeting probes, controls in homozygous tissues confirmed that only the appropriate probes bound, with little binding from the probes targeting the other strain (S1 Fig). Whereas in heterozygous samples the two strain-specific probes each labelled a subset of the nuclei in the anticipated mutually exclusive pattern (Fig 1B). To further analyse the data, we developed a pipeline to computationally identify Xist RNA foci, and automatically classify them as BL6 or JF1 within an entire scanned kidney section (Methods, Fig 1B). This algorithm identified tens of thousands of Xist foci per section (mean 22k, min 8.2k, max 30k), and—in agreement with our manual inspection of the data—predominantly identified Xist foci of the correct identity in the homozygous samples (S1 Table), while the overall population ratio of BL6:JF1 foci ranged from 45:55 to 65:35 in heterozygous samples (Fig 1C, S2A Fig).
Having verified that we could correctly measure the allelic origin of clusters of Xist RNA accumulated on the X chromosome in mouse tissues, we next ascertained whether the method would also work for monodisperse spots corresponding to single RNA molecules, which is how most mRNAs appear in the cell. We considered this challenging, because single, punctate RNA spots would be both smaller and considerably dimmer compared to Xist RNA, which accumulates multiple copies on the inactivated X chromosome. Thus, to test our protocol for use on single RNA molecules, we designed probes for 8 autosomal genes that contained at least one polymorphism in BL6 versus either JF1 or the C7 strain (which carries both copies of chromosome 7 from the M. musculus castaneus strain in a BL6 background). These genes were selected to represent genes with (Aebp1, Churc1, Lyplal1) and without (Aqp11, Mpp5, Podxl, Prcp, Stard5) putative random monoallelic expression [38–41] and with different expression levels, patterns and chromosomal locations. Some of the chosen genes were also linked to specific kidney disease phenotypes (Aqp11 , Mpp5 [55,56], Prcp ) (Methods, S4 Table). As expected, these genes were typically expressed at much lower levels than Xist and punctate individual mRNA spots could not be as readily observed in low magnification scans. We therefore combined whole tissue scans in a single plane at low magnification with random sampling of the tissue at higher magnification, where we imaged the entirety of the section. This approach allowed us to identify individual mRNA spots within the context of the whole tissue section in the 60x scan, while the additional data collected from the 100x z-stacks facilitated our ability to precisely determine colocalization between the guide probes and the strain-specific probes (Fig 1D). Colocalization between the strain-specific signal and the mRNA probes allowed us to determine from which allele a given mRNA originated. Accordingly, this could be used as a key readout for SNV-specific single molecule RNA FISH, and we characterized the quality of the experiments using colocalization rate (i.e. what percentage of the guide spots colocalized with allele-specific spots). When we assayed the overall colocalization rates for these autosomal genes in kidney sections, we found that 4 out of 8 genes had mean colocalization rates >50% (S4 Fig, S4 Table), which is comparable to colocalization rates previously observed in cultured cells , showing that we were able to perform quantitative SNV-specific single molecule RNA FISH directly in tissue. We also observed an apparent trend between colocalization rate and the number of SNV probes, where genes with fewer SNV probes had lower colocalization rates than those with more SNV probes. We tested a series of parameters that could affect probe binding (base composition, GC content, probe secondary structure and folding energy), but found no parameter that differentiated between the probes with high and low colocalization rates (S4 Fig). However, it should be noted that SNV probes were only tested as full sets (i.e. all SNV probes for a given gene were tested together), and we therefore do not know the binding behaviour of individual probes.
Collectively, these results showed that we could directly visualise and assign strain-specific identities to both focally localised RNA (Xist) and single molecule RNA spots in the context of whole tissue sections. This motivated us to ask if we could directly quantify cell-to-cell heterogeneity in the allelic origin of RNAs in tissue.
Quantifying cell-to-cell heterogeneity in the chromosomal origin of RNA in tissue
To determine how the chromosomal origin of RNAs contributes to cell-to-cell heterogeneity in tissue, we focused on two different questions. For Xist, we investigated the spatial clustering of cells based on their X inactivation choice, i.e. to what extent cells expressing Xist from either the JF1 vs BL6 X chromosome intermix. For the autosomal genes, we quantified their allelic imbalance in single cells, i.e. whether individual cells expressed RNAs from the two chromosomes at different ratios.
In the case of Xist, we measured spatial clustering of allele-specific expression because each cell is randomly and fully committed to expressing RNA from only the BL6 or the JF1 chromosome. Thus, allelic imbalance in tissue is not due to quantitative expression differences between the two alleles in single cells. Instead, it can arise either as a consequence of overall skewing of X chromosome inactivation rates or due to uneven spatial distribution of cells with a given inactivated X chromosome. Such spatial partitioning can arise through the local expansion of cells in which X chromosome inactivation “choice” has already been fixed, resulting in extended patches of cells carrying the same inactive X chromosome [58–61]. Because we had observed fairly balanced expression from the BL6 and JF1 chromosome in our initial analysis of heterozygous samples (see previous section, Fig 1C), we next determined whether cells expressing Xist from either the BL6 or JF1 allele segregated spatially. Although visual inspection of the sections revealed no extended regions where cells expressed Xist foci with the same strain-specific identity, computational modelling could potentially reveal a more precise view of spatial patterning. We therefore developed a metric that characterized the distribution of cells expressing BL6 Xist RNA (Methods and S2 Fig) and then compared this to either completely randomized BL6 and JF1 assignments or randomizations where we introduced different sized clusters of cells. We found that in all tissues BL6 cells were less evenly distributed than the random assignments (S2C Fig), suggesting that cells cluster together more than expected in a completely random scenario. Our subsequent comparison with the clustered assignments further supported and refined this interpretation: it showed that our data was most similar to simulations with smaller cluster sizes. The closest matching seed size was different, depending on what scale we assessed spatial partitioning, but centered around a cluster size of 2–4 (mean cluster size: 3.2, standard deviation 0.7) (S2 Table, Fig 2A and S2D Fig). Thus, cells expressing Xist from the same chromosome clustered together in small patches in tissue sections, resulting in a spatially fairly mixed population of cells and showed no evidence of extended patches with the same allele-specific expression.
A. Prediction of minimal size of Xist clusters with the same allelic identity. For each sample (x axis) we show the seed size with the closest matching variance for different subdivision sizes (grey dots, for original data see S2 Fig). The overlaid box plots show the overall distribution of these data points, with the red dot indicating the mean seed size from all subdivision sizes. B,D and F. Measurements of allele-specific expression in single cells (scatter plots) and in bulk data (pie charts) for Aebp1 (B), Lyplal1 (D) and Mpp5 (F). For each gene, data are shown for BL6 homozygous (left), JF1 homozygous (middle) and heterozygous (right) samples. For the single-cell scatter plots cells only contained integer numbers of RNA, but we included jitter to better display the density of cells with a given allelic distribution. For all plots we combined data from different replicates with more than 40% colocalization rate (summaries for each experiment are shown in S5 Table), and the two colors in the scatter plots indicate data from the two reciprocal crosses (ie BL6 x JF1 and JF1 x BL6). C and E. Fluorescence micrographs and allelic assignments of Lyplal1 (C) and Mpp5 (E) guide spots in BL6 x JF1 heterozygous kidney cells. For both genes guide probes were labelled with Cal fluor 610. G. BL6 allelic ratios in single cells with >2 RNA with allelic assignments for homozygous and heterozygous samples for Aebp1 (left), Lyplal1 (middle) and Mpp5 (right).
For the autosomal genes we wanted to know how much the allelic imbalance differed between cells. Visual inspection of our data indicated a range of allelic ratios, including BL6 monoallelic, JF1 monoallelic, as well as biallelic mRNA expression. To quantify this, we focused on three autosomal genes (Aebp1, Lyplal1, Mpp5) where >50% of guide spots colocalized with signal from SNV probes (we excluded Podxl, because the high expression levels of Podxl mRNA in podocytes precluded separating individual RNA spots (S7 Fig)). We selected this cutoff based on testing in cultured mouse fibroblasts, where we had found that in cells with >50% colocalization rate, our method consistently identified the correct allele (Methods, S5 Fig). First, we considered that the observed chromosomal origin (BL6 vs. JF1) could either be due to true biological variability or to technical error (as seen when we detected RNA from the “incorrect” strain in homozygous tissues). To distinguish these, we determined the BL6 and JF1 signal for these genes in kidney tissue from both BL6 and JF1 homozygous mice, as well as in tissue from reciprocal heterozygous crosses. For all three genes, we counted only a few mRNAs in the majority of cells (mean number of RNA spot counts per cell: 3.5 for Aebp1, 3.2 for Lyplal1 and 2.6 for Mpp5). These RNA counts measured by RNA FISH were higher than those observed via single-cell RNA sequencing of the kidney . The mean RNA count by single-cell sequencing ranged between 1.0–1.78 UMI/cell for Aebp1, 1.0–1.2 UMI/cell for Lyplal and 1.0–1.63 UMI/cell for Mpp5, depending on the cell type, considering only those cells where transcripts for these genes were detected (S3B Fig). Given the higher detection rate of our method, it may be particularly useful for assessing allele-specific expression for these lowly expressed genes.
We predominantly detected the correct allele in the homozygous kidney samples, both in bulk and at the single cell level (Fig 2B, 2D and 2F). In heterozygous samples we observed a more balanced presence of both BL6 and JF1 mRNAs, with the reciprocal crosses showing similar results (heterozygous data in Fig 2B, 2D, 2F and 2G, and S5 Table). These results indicated that our technical error (false positive rate) was less than the biological variability and that we could use our method to measure quantitative single-cell differences. Moreover, when we compared the BL6 allelic ratios in homozygous and heterozygous cells with >2 mRNA, we found some heterozygous cells with allelic ratios similar to those of the homozygous samples, but also a subset of cells with an allelic ratio that was intermediate to that of homozygous cells (Fig 2G). We obtained similar results when we switched the fluorescent dyes conjugated to the SNV detection probes used to label the BL6 and JF1 mRNA (Methods, S6 Fig). The observation that single cells had transcripts from both the BL6 and JF1 allele were particularly intriguing for Aebp1 and Lyplal1, because these two genes had been previously identified as genes with putative random monoallelic expression in other tissues [38–41]. Still, given that we did in fact observe cells with either BL6 or JF1 monoallelic mRNA expression in heterozygous tissue, akin to the random monoallelic expression pattern, we wanted to know which transcriptional models our quantitative single-cell allelic imbalance data was compatible with, in order to explain how this expression patterns could arise.
Observed heterogeneity of strain-specific RNA is compatible with bursty biallelic expression
Our results showing that individual cells could have mRNA from either one or both alleles motivated us to assess whether existing models of transcription were sufficient to explain the observed cell-to-cell variability in allelic imbalance. To evaluate these models, we first used the correlation coefficient between the BL6 and JF1 mRNA expression in our population as a simple metric that captures the joint behaviour of the two alleles: a correlation coefficient of zero represents no coordinated expression between the two alleles, positive values indicate coordinated expression, while negative correlation could be indicative of anti-correlation or repulsion between the two alleles. The correlation between BL6 and JF1 mRNA counts were 0.41 for Aebp1, 0.29 for Lyplal1 and 0.35 for Mpp5 suggesting somewhat coordinated expression. To better understand how these values compare to the expectations from different models of transcription, we initially considered two extreme cases: an “all-or-none” scenario in which every cell has transcripts exclusively from one or the other gene copy, and a “coin flip” scenario in which the allelic origin of every individual transcript is essentially indistinguishable from random coin flipping (Fig 3A). The former scenario could suggest the existence of regulatory mechanisms that limit transcription to only one gene copy per cell (an extreme form of random monoallelic expression, as is the case with Xist), whereas the latter corresponds to a null model with no distinct allele-specific transcriptional regulation. We used computational modeling to simulate these two scenarios and to discriminate between them.
A. Outline of two extreme scenarios that could lead to variable single-cells allelic imbalance. B, C. Simulated single-cell RNA counts (B) and correlation coefficients of simulated and observed data (C) in the case of the “all-or-none” model. D, E. Simulated single-cell RNA counts (D) and correlation coefficients of simulated and observed data (E) in the case of the coin-flipping model. F. Single-molecule RNA FISH of Aebp1 introns (left) and guide spots (right) in mouse kidney, including a cell with two transcription sites (top), a transcription site on the BL6 allele (middle) and a transcription site on the JF1 allele (bottom). The position of the transcription sites are highlighted by red arrows and the allelic assignment of the guide spots are indicated by colored circles. Introns were labelled with Atto488 (top) or Alexa700 (middle, bottom). G. Outcome of simulating the “transcriptional bursting” scenario for each allele separately (i) and for the alleles together (ii). Top row represents simulated single-cell RNA counts, bottom row shows negative log likelihood distribution of simulated and observed data for the BL6 and JF1 allele individually (i) and the correlation between randomly paired simulated RNA counts (ii). All log likelihood distributions (in C, E and G) and the correlation distribution (in G) were obtained from 10,000 simulations, and one of these simulations was picked randomly for each gene to display RNA distributions in B, D and G.
We first checked if our data were similar to those expected in the “all-or-none” scenario, in which the transcripts in each individual cell were either solely from one or the other gene copy. Looking in heterozygous cells, it seemed qualitatively apparent that (as noted) many cells have mRNAs from both gene copies, which was seemingly incompatible with this scenario. However, it was still formally possible that cells in reality only had transcripts exclusively from one of the gene copies and that the apparent transcripts from the other copy were technical artifacts due to false detection events. We used homozygous tissue to measure the rate at which these false detection events occur, and thereby estimated the expected false detection rate in heterozygous cells. We then computationally simulated hypothetical “all-or-nothing” heterozygous cell populations taking into account these false detection rates, and found that the RNA counts and distributions observed in such a simulated all-or-none population were still inconsistent with our data (compare Fig 3B with heterozygous data in Fig 2B, 2D and 2F). Next, we calculated the correlation of the BL6/JF1 counts for the “all-or-nothing” simulations, which showed much lower correlation values than we had observed in the real data (Fig 3C). Thus, the two alleles in our simulated data were uncorrelated or anti-correlated, as would be expected in the case of random monoallelic expression, and were unlike the positively correlated BL6 and JF1 mRNA we had observed our measurements. In addition, the probability of observing the strain-specific mRNA counts we measured versus simulated cell populations was also different (S8 Fig). Collectively, these results indicate that in the kidney, none of the three genes we interrogated displayed “all-or-none” expression.
In our alternative scenario, it is possible that the two copies of the gene transcribe RNA independently and each random transcription event produces just a single RNA, thus leading to the “coin-flipping” model in which most cells would have RNAs from both alleles in them, but with some statistical noise about this population average (Fig 3A). We modeled the outcome of such a scenario and found that the real versus simulated single-cell RNA distributions looked very similar for all three genes (compare Fig 3D with heterozygous data in Fig 2B, 2D and 2F). We also saw that the correlation between the BL6 and JF1 allele in our modeled data was similar to our measured correlation in the case of Lyplal1 and Mpp5 (Fig 3E). A statistical analysis showed that when we treated the strain-specific RNA counts per cell as a series of independent coin-flips, the probability of the observed distributions for both Lyplal1 and Mpp5 fell within the distribution of likelihoods from our simulated model (S8 Fig). Together, these results demonstrated that the allelic imbalance observed for Lyplal1 and Mpp5 was compatible with a simple coin-flipping null model of transcription from the two alleles.
In the case of Mpp5, where we had found false detection rates to differ most depending on which fluorescent dyes were coupled to the allele-specific probes, we collected data from heterozygous tissues swapping the dyes used on the BL6 and the JF1 probes and repeated our analysis and simulations. We obtained similar results, showing that a model for all-or-none expression did not recapitulate the allelic counts observed in our data, whereas the coin-flip model did (S6B and S6C Fig). We also tested whether the detection efficiency of ~50% influenced our conclusions about the “all-or-none” vs. “coin flip” scenarios by repeating our simulations assuming 100% detection efficiency and then randomly downsampling to the measured detection rate. The results were essentially indistinguishable from those obtained without downsampling, further verifying that our results were not due to the technical noise introduced by the low detection efficiency (S5C and S5D Fig).
All analyses of our simulations therefore supported the conclusion that allelic imbalances for Lyplal1 and Mpp5 were compatible with a simple coin-flipping null model of transcription. Aebp1, however, showed higher levels of imbalance per cell than could be explained by this model. This was not clearly visible on a scatterplot of the simulated RNA counts per cell (Fig 3D), but the measured BL6 and JF1 counts were less correlated than those in the simulated dataset (Fig 3E). Yet this gene did not exhibit the all-or-none behavior either. We therefore considered a third, intermediate scenario motivated by the phenomenon of transcriptional bursting [14–16]. Transcriptional bursts refer to the fact that most mammalian genes are transcribed in short pulses during which multiple transcripts are synthesized, interspersed between periods during which the gene remains inactive. When there are two copies of a gene, each bursting independently , the expected result would be that some cells may have more transcripts from one of the copies than expected by the coin flipping model above; bursting would be akin to getting several heads or tails in a row every time one flipped a coin.
To test whether transcriptional bursting could explain the observed data, we first wanted to confirm that Aebp1 was indeed transcribed in a burst-like fashion. To verify this, we measured Aebp1 transcriptional activity directly in kidney cells by using intronic probes (Fig 3F), which, owing to the extremely short half-life of introns, detect almost exclusively nascent transcripts at the site of transcription . This showed that 19% of cells with Aebp1 mRNA were also actively transcribing Aebp1, and that these transcription sites contained more than 1 RNA based on their fluorescence intensity relative to cytoplasmic RNA spots (average 1.6x higher fluorescence intensity in 3 independent experiments, S9 Fig). This data also showed that the majority of actively transcribing cells had only one Aebp1 transcription site, although a small subset (6 out of 55 (11%) cells with Aebp1 transcription sites) showed simultaneous expression from both alleles. This corroborated that cells indeed produced Aebp1 in transcriptional bursts and that it was possible for individual cells to transcribe RNA from both alleles simultaneously.
Next, we turned to simulations to assess the RNA distributions that we would expect in a scenario where the two alleles transcribed RNA independently from each other and in bursts. Initially, we estimated the expected burst size (average number of RNAs that were transcribed together in a single burst) and burst frequency for the two alleles based on the observed RNA counts for each allele independently and used these parameters as inputs for our model (see methods for details). BL6 and JF1 RNA counts simulated this way closely matched our measurements, which was also reflected by the likelihood of the real data falling within that of the simulated data for the two alleles separately (Fig 3G). We then wanted to see whether the degree to which there was allelic imbalance in single cells could be explained by the two alleles bursting independently; i.e., whether for per-cell RNA counts from the two alleles was more or less correlated than one would expect by chance. To simulate the null hypothesis of no interaction between alleles we randomly paired up the modeled BL6 and JF1 counts, mimicking cells that contain RNA from both alleles. When we compared this simulation to the real data we consistently observed that the modeled per cell BL6 and JF1 counts were were less correlated that the real pairwise measurements (Fig 3G and S10 Fig). Thus, while in our measurements cells with high BL6 expression typically also expressed JF1 at higher levels (compare heterozygous data in Figs 2B and 3G top panel), in the modeled data BL6 and JF1 counts showed little correlation. This was also true when we incorporated false detection events in our model to account for possible incorrect allelic assignment, as we had done in the “all-or-none” model (S11 Fig). Together, our transcription site measurements and simulations showed that the observed allele-specific single-cell RNA counts for Aebp1 were compatible with transcriptional bursting of the two gene copies individually and that expression from the two alleles was correlated.
Thus, for Aebp1, Lyplal1 and Mpp5 the observed monoallelic expression in some single cells can likely be explained by low levels of transcription occurring randomly from the two gene copies without having to invoke a special mechanism that limits expression to one of the alleles.
There has been great interest in recent years to precisely measure expression from the two alleles of a gene in diploid cells, ideally directly in tissue and at the single-cell level. The RNA fluorescence in situ hybridization method described here is a step in this direction: by visualizing endogenous SNVs it enables the assignment of single RNA molecules to their allele of origin in single cells and in the context of whole tissue sections. We now provide quantitative information about cell-to-cell heterogeneity with single transcript resolution, which is an extension of our previous work, where we used this method for a more qualitative assessments (i.e. presence-absence) of parental origin of mRNA in tissue . When we applied our method to autosomal genes we observed that individual cells in heterozygous tissue spanned the entire range from all RNAs originating from the BL6 chromosome through various more mixed populations to all RNAs originating from the JF1 chromosome. These observed allelic imbalances were not due to the parental origin of the gene copies, because reciprocal crosses (BL6xJF1 vs JF1xBL6) showed similar results. We therefore asked what model could explain the observed single-cell allelic imbalance pattern and combined our data with computational simulations to address this question. We found that the observed allelic distributions could be recapitulated by a model where transcription occurred randomly from the two alleles, perhaps with moderate transcriptional bursting (e.g. in the case of Aebp1). Thus, we did not have to invoke a special mechanism that restricted expression to only one allele to explain the presence of cells with either BL6 or JF1 monoallelic expression status.
Our results suggest that cells with RNA expression from only one of the alleles occur due to the low levels of expression and thus the limited number of random sampling events from the two gene copies. Because transcription is a dynamic process this state is likely transient so that while a cell may have mRNA from only one allele at a particular time, it may gain mRNA from the other allele a short time later if another transcriptional event occurs. For Mpp5, this conclusion is in line with the fact that the reported haploinsufficient phenotype is thought to be caused by overall reduced dosage [55,56] rather than by a special subpopulation of cells with monoallelic expression. More generally, this scenario links the observation of transcriptional bursting with that of random monoallelic expression, as put forward by Sandberg et al [43,44], and explains why we observed the co-occurrence of cells with one and two transcription sites in the same population. In the case of Aebp1 and Lyplal1, which had previously been identified to display random monoallelic expression, our data suggest that no additional mechanism is needed to explain the presence of cells with monoallelic expression in kidney tissue. However, we cannot extrapolate these results to other tissues and mouse strains, given that random monoallelic expression is tissue specific. Thus, we cannot conclude whether mechanisms for allele-specific regulation or for maintenance of monoallelic status are present in the mouse strains and cell types used in the studies that originally identified genes with random monoallelic expression [37–41]. Interestingly, though, those studies also observed that many genes with monoallelic expression in one clonal cell line are expressed biallelically in others and that the expression of a given gene is often lower in cell lines with monoallelic expression than in those with biallelic expression [38–40], which is consistent with our results.
In addition to the data on Aebp1, Mpp5 and Lyplal1, we also demonstrated our ability to distinguish Xist expression from the BL6 vs JF1 chromosome and to assess the spatial relationship of cells expressing different parental alleles. We detected a spatially fairly mixed population in the kidney, where cells expressing Xist from the same chromosome clustered together in small patches in transverse sections. This is contrary to other tissues, such as intestinal crypts or the skin [58,59,61,64], and suggests either a larger number of kidney precursor cells or extensive cell migration during development. Because our method only provides a snapshot in time we cannot easily distinguish between these scenarios (lineage vs. migration), especially given that the kidney is a complex organ, composed of cells originating from different embryonic lineages that undergo extensive migration during development even after birth [65,66]. Regardless of the developmental mechanism, however, our data indicate that there is likely no major spatial segregation due to X chromosome inactivation in this tissue, and in the case of mutations, any phenotypic effects would be fairly evenly distributed.
Through the examples detailed above we have shown how our method can be used to directly quantify cell-to-cell differences that arise due to differential expression from the two alleles in diploid cells. Our approach overcomes multiple limitations imposed by previous methods: First, because this method enables sensitive SNV-specific detection of even single mRNA molecules it provides more information than RNA FISH measurements that rely solely on quantifying the number of transcription sites in individual cells [36,37,40,41,67]. Second, although our approach tests only single genes in a given experiment and thus has much lower throughput than single-cell sequencing-based methods, it relies on direct detection of transcripts and is therefore not subject to subsampling and dropout, which complicate the interpretation of sequencing-based cell-to-cell variability results [44,46,68–70]. Finally, by making use of pre-existing endogenous SNVs it eliminates the need for genetic manipulation, for example to label the gene of interest with fluorescent tags, as has been done to measure X chromosome inactivation choice [61,71] or to monitor the transcription of autosomal genes from the two chromosomes [9,72]. Moreover, because the breeding history for classical inbred mice has lead to extended regions of shared ancestry (and shared SNVs) between different strains [73,74], a probeset developed for one strain can often easily be adapted for another strain. For example, while we measured Xist expression from the BL6 and JF1 allele, the probes were designed so that they should distinguish equally well between the 129-strains and CAST/EiJ, which are also commonly used to study strain-specific expression.
In addition, while our quantitative analysis focused solely on genes with a relatively high number of SNVs and high mean colocalization rates (>50%), it should be noted that we did not systematically explore the relationship between SNVs and colocalization rates, and also that lower colocalization can be sufficient to address specific questions, as was demonstrated in a recent single cell in situ analysis of A-to-I RNA editing . It is therefore likely that depending on the scientific question, less stringent cutoffs can be applied to colocalization rates and/or the number of SNVs required. In addition to the number of SNVs/allele-specific probes, the applicability of our method also depends on the expression level of the gene of interest. Based on the genes tested here, our method is applicable for genes with low-to-moderate expression levels, corresponding to approximately 2–25 FPKM in bulk sequencing data (see Methods). The lower boundary for this estimate is set by Churc1, a gene that is consistently expressed at low levels in all cell types of the kidney (0.1–1 UMI/cell, ~2 FPKM). It is technically possible to determine colocalization events at this mRNA density, though a large number of cells needs to be screened for robust quantification. The upper boundary is based on the observation that a high density of transcripts makes it impossible to precisely assign colocalization. In our experiments only Podxl in podocytes showed such high expression levels. Based on the relatively low proportion of podocytes in the kidney we determined that this matched an expression level of >800FPKM in these cells (based on bulk sequencing data) and an average count of 6.9 UMI/cell in single-cell sequencing data. Importantly, given the higher detection efficiency of RNA per cell, allele-specific single-molecule RNA FISH provides a valuable complementary tool to single-cell transcriptomics for low-to-moderate expression level genes, and will likely be useful in confirming findings made by sequencing.
In conclusion, we demonstrated how quantitative measurement of allele-specific expression in tissue could be used to directly determine the level of allelic imbalance in single cells. By combining these measurements with modeling, we showed that random monoallelic expression could arise in vivo by chance alone. Beyond this application, our methods could have a number of additional uses. Similar analyses could be performed in other tissues and, for example, could enable the evaluation of genetic variants directly in the tissue believed to be affected if there are genic SNVs in linkage with those variants or to study mutations thought to lead to haploinsufficiency. Furthermore, with single cell resolution, our method allows for the interrogation of particular cellular subtypes within a tissue. In concert with recent genome-wide association studies in single cells [76,77], this technique provides a useful tool for quantitative assessment of allele-specific genetic effects.
Mice, tissue harvest, sectioning and fixation
C57BL/6J and JF1/Ms founder mice were obtained from Jackson Laboratories. All mouse work was conducted in accordance with the University of Pennsylvania Institutional Animal Care and Use Committee. For tissue collection we used either homozygous C57BL/6J or JF1/Ms pups, or F1 heterozygotes from both C57BL/6J x JF1/Ms or JF1/Ms x C57BL/6J crosses. We dissected pups at postnatal day 4 using standard techniques, and mounted tissues in Tissue-Plus O.C.T. compound (Fisher Healthcare), flash-froze them in liquid nitrogen or on an aluminum block in dry ice, and then stored tissues at −80°C. We determined sex of the animals by visual inspection and verified this by SRY-specific PCR on DNA extracted from a tail sample, collected during dissection. Tissues were cryosectioned at 5 μm using a Leica CM1950 cryostat. We adhered tissue samples to positively charged Colorfrost plus slides (Fisher Scientific), washed slides in PBS, fixed them in 4% formaldehyde for 10 min at room temperature, then washed again in PBS two times. Fixed slides were stored in 70% ethanol at 4°C.
We shortlisted genes that had previously been identified in studies of random monoallelic expression, genes with known function and/or phenotypes in kidney and genes with documented expression in different cell types in the kidney. We then filtered shortlisted genes for expression levels of >20 TPM in publicly available bulk kidney sequencing data , accessed via Expression Atlas (https://www.ebi.ac.uk/gxa/). We later established that this corresponded to expression levels of >10FPKM in bulk and 0.1–1 UMI/cell (in the tissue(s) with highest expression; S3 Fig and S9 Table) using additional bulk  and single-cell  transcriptome data, respectively. The only gene with lower expression was Churc1 (TPM <10, FPKM ~2.2), for which we were nevertheless able to robustly detect transcripts using the guide probes (in agreement with a count of 0.1-1UMI/cell in all cell types of the kidney), and identify colocalization events above random (pixel-shifted) controls. The only gene with higher expression was Podxl with a UMI of 6.9 per cell in podocytes.
Probe design, synthesis and labelling
To identify exonic SNVs between the C57BL/6J and JF1/Ms strains we used the NIG Mouse Genome Database (http://molossinus.lab.nig.ac.jp/msmdb/index.jsp) . For Aebp1 and Lyplal1 we confirmed these SNVs through PCR amplification and sequencing of exonic sequences of genomic JF1/Ms DNA. All guide probes and the Aebp1 intron probe set were designed using the Stellaris probe designer (Biosearch Technologies), SNV-specific probes were designed as specified in Levesque et al.  and mask oligonucleotides were selected to leave a 7-11bp overhang (toehold) sequence (all probe sequences available in S6 Table). Guide probes were purchased labeled with Cal fluor 610 (Biosearch Technologies), while SNV-specific probes and intron probes were ordered with an amine group on the 3′ end. For these latter probes we pooled the oligonucleotides for each probe set and coupled them to either NHS-Cy3 or NHS-Cy5 (GE Healthcare) for the allele-specific probesets, or NHS-Atto488 or NHS-Atto700 (Atto-Tec) for the intronic probes. We purified dye-coupled probes by high-performance liquid chromatography. Mask oligonucleotides were used unlabelled.
DNA extraction, gene sequencing and sex-specific genotyping
DNA was extracted from tail biopsies using a quick-lyse protocol: 100μl of Solution A (25mM NaOH and 0.2mM EDTA) were added to the tissue and kept at 95°C for 30 min, before adding an equal volume of Solution B (40mM Tris, pH = 8). Samples were then spun at 6000 rpm for 10 min and 100μl of the top layer was transferred to a fresh tube. 1μl of this solution was used as template for PCR. To verify the presence of reported SNVs in Aebp1 and Lyplal1, we designed primers for the exonic segments of these genes (primer sequences available in S7 Table), and PCR-amplified genomic DNA using AmpliTaq Gold (ThermoFisher) with buffer II and 0.25mM MgCl2 according to the manufacturer's instructions. PCR amplicons were purified with ExoSAP-IT (ThermoFisher) and submitted for sequencing to the University of Pennsylvania DNA sequencing facility. For sex-specific genotyping of pups we used Sry-specific primers (S7 Table), since this gene is located on the Y chromosome and thus amplicons can only be detected in male tissues. PCR was performed as for sequencing, and the presence-absence of a product was revealed on a gel.
Allele-specific RNA fluorescence in situ hybridization of tissues
Allele-specific RNA fluorescence in situ hybridization works by first detecting the mRNA of interest (regardless of the allele of origin) using conventional single molecule RNA FISH probes labelled in one color (guide probes), and then colocalizing this signal with probes that discriminate specific single-nucleotide differences based on a “toehold probe” strategy and which are labelled in colors unique to the two different alleles. In this way, mRNAs are essentially “tagged” as being either from one or the other parental chromosome. In cultured cells, this approach can successfully distinguish RNA variants that contain just one single nucleotide variant (SNV) and thus can only be targeted with a single variant-specific probe [50,51], but the decreased signal-to-noise ratio makes reliable detection of single probes more difficult in tissue . We therefore opted to work with C57BL/6J (BL6) and JF1/Ms (JF1) mice, which belong to two different Mus musculus subspecies (domesticus and molossinus, respectively) [80,82]. Due to their distant relationship, JF1 mice harbor a large number (>50,000) of SNVs in genic regions compared to BL6 , allowing us to target most genes with multiple SNV-specific probes.
For each gene of interest we first prepared a probe mix, containing a guide probe set (labelled with Cal fluor 610), the two allele-specific probe sets (labelled in Cy3 and Cy5, respectively) and a set of mask oligos (unlabelled, in 1.5x excess of the allele-specific probes) in hybridization buffer (10% dextran sulfate, 2× SSC, 10% formamide). For detection of nascent Aebp1 RNA we also included intronic probes labelled either with Atto488 or Atto700, and to verify the integrity of RNA in male tissues stained for Xist we also included Gapdh probes (labelled with Atto488). To stain the samples, we first washed the slides with tissues sections in 2x SSC, then incubated them in 8% SDS for 2 minat room temperature, washed again in 2x SSC and finally added the hybridization buffer with probes. Slides were covered with coverslips and left to hybridize overnight in a humidified chamber (ibidi) at 37°C. The next morning we performed two 30 min washes in wash buffer (2× SSC, 10% formamide), the second one including DAPI to stain nuclei. To label cell membranes (to clearly identify single cells) the first wash was sometimes substituted with a 15 min incubation in wash buffer containing wheat germ agglutinin coupled with Alexa488 (LifeTech) and a 15 min regular wash. After the final wash, slides were rinsed twice with 2x SSC and once with antifade buffer (10 mM Tris (pH 8.0), 2× SSC, 1% w/v glucose). Finally, slides were mounted for imaging in antifade buffer with catalase and glucose oxidase  to prevent photobleaching.
We imaged all samples on a Nikon Ti-E inverted fluorescence microscope using either a 60x or a 100× Plan-Apo objective and a cooled CCD camera (Andor iKon 934). For whole-tissue scans we imaged at 60x and used Metamorph imaging software (Scan Slide application) to acquire a tiled grid of images. We used the Nikon Perfect Focus System to ensure that the images remained in focus over the imaging area. For 100× imaging, we acquired z-stacks (0.3 μm spacing between stacks) of stained cells in six different fluorescence channels using filter sets for DAPI, Atto 488, Cy3, Calfluor 610, Cy5, and Atto 700. The filter sets we used were 31000v2 (Chroma), 41028 (Chroma), SP102v1 (Chroma), 17 SP104v2 (Chroma), and SP105 (Chroma) for DAPI, Atto 488, Cy3, Cy5, and Atto 700, respectively. A custom filter set was used for CalFluor610 (Omega).
Xist image analysis and modelling
For Xist image analysis we worked with whole tissue scans, where we had collected data for Cal fluor 610 (Xist guide probes), Cy3 and Cy5 (BL6 and JF1 probes, respectively) and DAPI (nuclei). To visualize scans, we used the “Grid/Collection stitching” feature available in Fiji  to assemble tiles. To identify Xist RNA and assign them an allelic identity we developed a custom pipeline in matlab. First, we reconstructed the scan taking into account the tile order provided in a supplementary file. Then, we used the data from the guide channel to detect Xist foci, regardless of allelic identity: we performed background subtraction, removed small objects and smoothened boundaries by border clearing and morphological opening, and then used LoG filtering to sharpen objects, binarized the observed signals and created connected components. Visual inspection of these connected components showed that they largely corresponded to Xist foci, but some areas with high background signal were also being detected as connected components. We therefore applied a number of filters (minimum fluorescence intensities for all RNA FISH channels, minimum cutoff for solidity, maximum area for connected components) to yield the final segmentation. Each obtained spot was then parametrized as the ratio of the signal intensity (background subtracted and normalized to the mean intensity of the scan) of the two SNV probe channels and we applied k-means clustering (2 means) to yield a critical angle above which we assigned spots JF1 identity, and below which we assigned BL6 identity. To verify the quality of these assignments, we designed a graphic user interface to manually annotate Xist foci and their allelic identity. We typically annotated 10 or more randomly selected tiles and the results of this quality control step are shown in S3 Table. On average ~90% (mean 90.9%, standard deviation 5%) of Xist foci were correctly detected, while the remaining 10% of identified spots were areas of high background intensity that had been miscategorized as Xist foci. When Xist spots were correctly identified, typically more than 90% were assigned the correct allelic identity (mean 94.4%, standard deviation 5%).
To assess spatial patterns of Xist allelic choice we then used the positional and identity information from our automatic assignments, and developed a metric for spatial heterogeneity. First, we tiled images into regular rectangles of equal size (i.e. 16 tiles all 1/16 of the full scan size). For each rectangle, we calculated the fraction of cells expressing Xist from the BL6 allele. Next, we obtained the variance of these BL6 cell fraction values across all rectangles of a given size. This protocol was repeated for different sizes of rectangles ranging from 16 to 256 rectangles spanning the entire tissue section. We also calculated a baseline for spatial heterogeneity of random allelic choice by repeating this analysis on 1000 random permutations of the data for each sample generated by Matlab’s randperm. We performed a similar analysis to determine the minimal cluster size of Xist foci with identical allelic identity, but instead of random permutations we generated simulations, where kidney sections were randomly seeded with clusters of a fixed size (ranging from 1 to 10) while keeping the allelic ratio the same as for the measured data. For each seed size we generated 500 simulations. To obtain a likely minimal cluster size for cells with identical X chromosome inactivation we selected the seed size whose variance deviated least from the variance observed for the real data. We repeated this process for each subdivision size and determined the mean across all subdivision sizes.
Allele-specific colocalization analysis of single mRNA spots
For analysis of single molecule RNA spots we used a combination of 60x whole tissue scans in DAPI and Cal fluor 610 to determine the overall structure of the tissue and collecting z-stacks at 100x resolution of 5–10 individual positions within that tissue to identify individual mRNA molecules and characterize their allelic identity. To determine allelic identity we first segmented and thresholded images using a custom Matlab software suite (downloadable at https://bitbucket.org/arjunrajlaboratory/rajlabimagetools/wiki/Home, changeset: d278b7d0012282ecb318fde3bebbe3beaba62032). To quantify colocalization rates we first determined the ideal colocalization radius for each gene. To do so, we segmented extended areas of the tissue (typically containing 10–50 cells). To ascertain subpixel-resolution spot locations the software then fitted each spot to a two-dimensional Gaussian profile specifically on the z plane on which the spot occured. Next, colocalization between guide spots and allele-specific spots was determined in two stages. In the first stage, we searched for the nearest-neighbor allele-specific probe for each guide spots within a 2.5-pixel (360-nm) window and ascertain the median displacement vector field, which was subsequently used to correct for chromatic aberrations. After this correction, we tested a range of different radii (r = 0.1 to 2.5 pixel) for each gene to calculate colocalization rates for the real data, as well for pixel-shifted data, where we took our images and shifted the guide channel by adding 2*r pixels to the X and Y coordinates. This pixelshifted data was used to test random colocalization due to spurious allele-specific spots. For each gene we then visually inspected colocalization rates for real and pixel-shifted data at the different radii and determined a radius where both the colocalization rate for the real data and the difference between the real and the pixel-shifted data was maximal. The selected colocalization radii for each gene are included in S4 Table. To obtain allele-specific data for single cells we then repeated the colocalization analysis, but segmentation of cells was done by drawing a boundary around nonoverlapping individual cells using brightfield or wheat germ agglutinin signal, and colocalization was determined using only the previously determined ideal colocalization rate.
Analysis of transcription sites
Transcription site analysis was performed using a custom Matlab software suite (downloadable at https://bitbucket.org/arjunrajlaboratory/rajlabimagetools/wiki/Home). For this, we segmented cells, thresholded RNA FISH signal and identified transcription sites for Aebp1 by co-localization of spots in the intron and exon channel. Relative fluorescence intensities of transcription sites vs cytoplasmic RNAs were determined based on the fluorescence intensity of the guide probes using custom scripts written with R packages dplyr and ggplot2.
Analysis of SNV probe properties
To determine whether any biophysical properties could differentiate between allele-specific probes that had high vs low colocalization rates, we compiled a table containing the the following parameters (S8 Table): probe name, probe sequence, colocalization rate (the colocalization rate determined for an entire probeset was applied to each individual probe), number of predicted secondary structures and folding energies. The latter two parameters were extracted by running sequences on the mfold web server  for DNA probes, with Na concentration set to 0.3M. Frequency of individual nucleotides, dAT, dGC, purines and pyrimidines was determined through analysis of the probe sequences.
Analysis of dye effects
Since different fluorescent dyes have different detection sensitivity, we tested how this impacted our findings by performing “dye-swap” experiments in which we use the same probes but swap the fluorophore moieties used to label the probes for the BL6 and JF1 alleles. This enables us to measure the extent to which the labels on the probes affected probe binding efficiency and what the consequences were for our results and interpretations. In these experiments we probed Aebp1, Lyplal1 and Mpp5 expression in BL6 and JF1 homozygous tissues with allele-specific probes labelled either with Cy3 for BL6 and Cy5 for JF1, or the reverse (S6A Fig). The unique detection rate (i.e. the rate at which we could assign spots either a BL6 or a JF1 identity) differed by 7% or less between the dye combinations for all three genes, with the exception of Mpp5 in BL6 homozygous tissue. Similarly, the false detection rate, which includes both the error from incorrect probe binding and the differential detection sensitivity, deviated by less than 4% between the dye combinations for all three genes, with the exception of Mpp5 in BL6 homozygous tissue. For Mpp5 in BL6 homozygous tissue we observed that the BL6 allele was detectable at higher efficiency when labelled with Cy5 (66%) than when labelled with Cy3 (50%). We therefore also collected dye-swapped data for Mpp5 in heterozygous tissues (S6B Fig) and performed the “all-or-none” and “coin flip” simulations on both dye combinations (S6C Fig and Fig 3B–3D). In addition, we used the empirically determined false detection rate for each dye combination and gene for all of our analysis and modelling of allelic imbalance in heterozygous tissue to avoid artefacts due to variable sensitivity of detection for different dyes.
Analysis of how colocalization rate affects allelic measurements
Given that for all three genes that we studied in more detail we could determine BL6 or JF1 identity for only approximately 50% of RNAs, we measured how colocalization rate impacted our ability to measure single-cell allelic imbalance. This was difficult to do in tissue, because the low number of RNAs per cell provided a limited dynamic range to probe this effect (i.e. if 1 out of 2 RNAs can not be assigned an identity, this results in 50% detection efficiency). We therefore collected extensive data for Aebp1 in primary fibroblasts, where Aebp1 is expressed at higher levels (10s to 100s of RNAs per cell) to determine the general effect of the colocalization rate on allelic ratios. In homozygous fibroblasts we found that there was an inverse relationship between colocalization rate and false detection rate, so that higher colocalization rate corresponded with a lower false detection rate and a higher rate of calling the correct allele (S5A Fig). Most likely this was caused by low colocalization rates being indicative of other technical issues that complicate determining colocalization rate (such as high background fluorescence). We observed this effect regardless of the dye combination used, but also noted that above a colocalization rate of ~0.5 (the overall colocalization rate in tissue) our ability to detect the correct allele was consistently high (above 75%). Next, we used 10 heterozygous fibroblast cells with the highest colocalization rates. The cells all had colocalization rates >70% and randomly downsampling RNAs to 65, 60, 55, 50 or 45% colocalization rate (1000 random sampling per condition) showed the the mean allelic ratio for each downsampling was similar to the original measured ratio, but the variability around that mean increased with lower colocalization rates. The biggest change in the allelic ratio that we observed was ~0.2 (S5B Fig). Next, we also tested how detection efficiency influenced our conclusions about the “all-or-none” or “coin flip” scenario in tissue. For this, we repeated both simulations using the total RNA count for each cell (i.e. including unclassified RNAs and RNAs that colocalized with both allele-specific probes). We then randomly downsampled the RNA in each cell to the number of RNAs that had been assigned a unique BL6 or JF1 identity in the original measurement. Each simulation was performed 5000 times. We found that the log likelihoods and correlations from these downsampled simulations were essentially indistinguishable from the original simulations performed without downsampling (S5C Fig).
Analysis and modelling of single-cell allelic outcomes
To quantify cell-to-cell variability of allelic state in single tissue cells, we extracted colocalization data from our image analysis pipeline, and used this for further analysis. Using this data, we first compiled a quality control table for each experiment (S5 Table) and excluded those where colocalization rates were <40% (4 out of 35 experiments). For all remaining data we combined replicates from the same genotype, and in the case of heterozygous data, combined results from C57BL/6J x JF1/Ms and JF1/Ms x C57BL/6J tissues. We then processed and visualised single-cell results using custom scripts written with R packages dplyr and ggplot2.
To determine how the observed data compared to random monoallelic expression (all-or-nothing scenario) or binomially distributed (coin-flip scenario) allelic calls we simulated those scenarios through modelling. For the binomial distribution we considered a null model wherein all heterozygous cells share the same allelic ratio, which was determined to be the overall allelic ratio observed at the population level. Then, for an experiment with overall estimated C57BL/6J allelic ratio equal to pBL6 (above), we let nBL6j be the number of transcripts with C57BL/6J identity detected in cell j and nJF1j be the number the number transcripts with JF1/Ms identity detected in cell j. Under the null model, nBL6j was drawn from a binomial with (nBL6j + nJF1j) draws and probability pBL6. We simulated single-cell label counts for cells by drawing from these conditional null distributions for each cell 10,000 times. We then compared the negative log-likelihood of the observed data with the distribution of negative log-likelihoods of each simulation iteration.
To simulate random monoallelic expression each cell was assigned either a BL6 or a JF1 identity, based either on the majority of RNAs in a cell, or based on random assignment if both alleles had the same count. We then designated all RNAs in a given cell to the same allelic identity (eg a cell that originally contained 4 BL6 RNAs and 2 JF1 RNAs would be assigned a BL6 identity with 6 BL6 RNAs). Next, we randomly added “technical noise” 10,000 times, by changing some RNAs in the population to the opposite identity, based on the false positive rates measured in the original BL6 and JF1 homozygous populations. These steps were performed while keeping the final overall RNA assignments in the population the same as the original heterozygous population. For the 10,000 simulations we then calculated negative log likelihoods similarly as we did for the binomial distributions, assuming two separate null models for the BL6 and JF1 populations, whose parameters were determined by the original homozygous population.
To assess if the measured mRNA distributions were compatible with a transcriptional bursting scenario we used the negative binomial distribution to simulate expected mRNA counts . First, we determined the burst size and frequency of the BL6 and JF1 alleles separately, by using the moments method to determine r and p parameters of the negative binomial distribution based on the mean and variance of our measurements (where p = mean/variance and r = mean^2/(variance-mean)), from which we obtained the burst size and frequency using: burst_size = (1-p)/p and burst_frequency = r. We then generated 10,000 RNA counts for the two alleles separately by drawing from a distribution with the r and p parameters we had calculated. We visualized the obtained mRNA counts for both alleles individually using a randomly selected simulation, and also calculated the negative log-likelihood distribution of the 10,000 simulated datasets. Next, we randomly paired the data for the two alleles to generate “cells” with RNA counts from both alleles and calculated the correlation between the BL6 and JF1 counts in each of these modeled cells. In addition to using the negative binomial parameters that we had calculated from our data, we also tested a series of additional burst sizes (from 0.5 to 5 RNAs per burst) and repeated the entire analysis, which showed that our findings were consistent across a range of burst values (see S10 Fig). Finally, to generate a model which included BL6-JF1 correlations that arise due to false assignments, we used the randomly paired data and changed some RNAs in the population to the opposite identity based on the false positive rates measured in the original BL6 and JF1 homozygous populations (one round of reassignments for each simulation). Following reassignment we again calculated the correlation between the BL6 and JF1 counts.
Raw and processed data, as well as scripts for all analyses presented in this paper, including all data extraction, processing, and graphing steps are freely accessible from the Open Science Framework (URL https://osf.io/sbjcw/?view_only=09393298e00e4b8c9d0dd1b24b0318d9).
Our image analysis software (changeset: d278b7d0012282ecb318fde3bebbe3beaba62032) is available here:
S1 Fig. Xist RNA FISH signal in male and homozygous controls.
A. Male heterozygous (BL6 x JF1) tissue was stained with probes for Xist (guide probe labelled with Cal fluor 610 (far left), BL6-specific probes labelled with Cy5 (middle left), JF1-specific probes with Cy3 (middle right)), none of which showed signal above background. Atto 488-labelled Gapdh probes were also included to verify absence of RNA degradation (far right). B, C. Female BL6 homozygous (B) and JF1 homozygous (C) tissues were stained with probes for Xist (guide probe labelled with Cal fluor 610 (far left), BL6-specific probes labelled with Cy3 (middle left), JF1-specific probes with Cy5 (middle right)). Right-most image shows colocalization between guide foci and allele-specific foci as indicated.
S2 Fig. Allelic calls across whole tissue sections and modelling of spatial heterogeneity of Xist.
A. Allelic assignments across different BL6 and JF1 homozygous (far left and far right) as well as heterozygous (middle) kidney sections. Two of the heterozygous kidney sections are technical replicates (different kidney sections from the same animal), which is indicated by an asterisk (“*”). BL6 allelic assignment is depicted in turquoise, JF1 allelic assignment is depicted in orange. B. For all heterozygous samples we calculated spatial heterogeneity using a variance metric, the method of which is schematized: sections were subdivided into a grid, using increasingly smaller squares (from 8x8 to 16x16) and for each subdivision we calculated the ratio of BL6 Xist foci. For each grid we then also calculated the variance of the BL6 ratio across all squares of that grid. C. The measured variance (red line) was compared to the variances obtained for samples where we randomly permuted allelic assignments 1000 times (black line, error bars representing standard deviation of the modeled results). The graphs show the variance for subdivisions of different sizes, with both the area of the subdivisions and the size of the grid indicated. D. Measured variance (red line) was also compared to the variances of samples where we randomly placed different sized clusters (seeds) of allelically identical Xist foci in the tissue (lines in different shades of grey, error bars representing standard deviation of the modeled results). For each seed size we generated 500 randomizations, keeping the allelic ratio constant. For all heterozygous data shown in A, C and D the order of the samples is kept identical.
S3 Fig. Expression levels of selected genes in kidney by bulk and single-cell sequencing.
A. FPKM values of six control samples from Beckerman et al  are shown for the genes used in this study. Red crosses shown the mean of these values. B. UMI counts per cell for Aebp1, Lyplal1 and Mpp5 for cells with non-zero UMIs, based on data from Park et al .
S4 Fig. Colocalization rates and probe properties for autosomal allele-specific probes.
A, B. Overall (A) and allele-specific (B) colocalization rates for different autosomal genes. Overall colocalization rates consider all guide spots that colocalize with either BL6 and/or JF1/C7 allele-specific signal, while allele-specific colocalization counts only those guide spots that colocalize uniquely with either BL6 or JF1/C7 probes. Each spot represent the colocalization rate in one area tested (typically 10–50 cells). All genes were detected with guide probes labelled with Cal fluor 610, and the following allele-specific probes: Aebp1, Mpp5 and Podxl BL6-specific probes labelled with Cy3, JF1-specific probes labelled with Cy5; Churc1 and Lyplal1 BL6-specific probes labelled with Cy5, JF1-specific probes labelled with Cy3; Aqp11 and Stard5 BL6-specific probes labelled with Cy3, probes for the C7 allele labelled with Cy5; Prcp BL6-specific probes labelled with Cy5, probes for the C7 allele labelled with Cy3. Genes are listed in increasing order of number of SNV probes utilized, which is indicated for each gene. C. Probe properties for probe sets with high (>50%) and low (<50%) mean overall colocalization rate. We compared prevalence of individual nucleotides (dA, dC, dG, dT—top row), nucleotides forming three hydrogen bonds (dC+dG) or two hydrogen bonds (dA+dT), purines (dA+dG) and pyrimidines (dC+dT) (middle row), as well as the number of folded structures predicted for each probe, mean and minimum folding energy for each probe (bottom row). For all plots, each spot represents the value obtained for a single probe.
S5 Fig. Impact of colocalization rate on allele-specific RNA imaging.
A. The relationship between colocalization rate and correct assignment rate for RNAs in homozygous fibroblast cells. Each spot represents a measurement from a single cell and different dye combinations are displayed in different colors. B. The effect of colocalization rate on measurement accuracy. Each panel represents a single heterozygous fibroblast cell with >70% colocalization rate. The ratio of BL6 assignments measured in each cell is shown by the dashed line. The density plots show the allelic ratio after randomly downsampling the original RNAs to the indicated colocalization rate. C, D. Testing the effect of colocalization rate on “all-or-nothing” (C) and “coin flipping” (D) simulations. First, simulated cells were generated where the total number of RNAs was assigned an identity according to the model of interest. Next, the RNA in each cell was randomly downsampled to the number of RNAs that had been assigned a unique BL6 or JF1 identity in the original measurement. Each simulation was performed 5000 times. The simulated single-cell RNA counts from a randomly selected simulation are shown (top), as well as the correlations (middle) and log likelihoods (bottom) calculated for the entire population.
S6 Fig. Impact of dye choice for BL6 and JF1 probes on allele-specific RNA imaging.
A. Measurements of allele-specific expression in bulk data for Aebp1, Lyplal1 and Mpp5 homozygous tissues. For each gene, allele-specific probes were either labelled with Cy3 for BL6 and Cy5 for JF1 (top row) or Cy5 for BL6 and Cy3 for JF1 (bottom row). Boxes indicate the dye combination used for subsequent experiments, and the pie charts shown in those boxes are the same as displayed in Fig 2B, 2D and 2F. For all plots we combined data from different replicates with more than 40% colocalization rate. B. Single-cell allele-specific expression for Mpp5 in heterozygous tissue and allelic ratios detected in single-cell and bulk tissue data for heterozygous and homozygous tissues. The cells on the heterozygous scatter plot only contained integer numbers of RNA, but we included jitter to better display the density of cells with a given allelic distribution. The two colors on the heterozygous plots from the two reciprocal crosses (i.e. BL6 x JF1 and JF1 x BL6). C. Results of simulations for heterozygous Mpp5 data: for both “all-or-none” and “coin-flip” model we show simulated single-cell RNA counts (left) and negative log likelihood distribution of simulated and observed data (right).
S7 Fig. Allele-specific Podxl mRNA detection in mouse kidney sections.
A tissue scan of the Podxl guide probe (labelled with Cal fluor 610) shows staining primarily in glomeruli, due to expression of the gene in podocytes (right). At 100x resolution these areas of high expression can clearly be distinguished from background when detecting fluorescence from the guide probe (left, top), as well as for the BL6 (left, middle) and JF1-specific probes (left, bottom). However, the high expression levels of the gene precludes the precise thresholding and detection of individual mRNA spots.
Log likelihood distributions of “all-or-none” (A) and “coin flipping” simulations (B). We show the negative log likelihoods calculated for 10,000 simulations for each gene, as well as the log likelihood of the real data (red line) using the same model as for the simulations.
S9 Fig. Fluorescence intensity of Aebp1 mRNA at transcription sites and non-transcription sites.
Each spot represents the intensity of a single mRNA guide spot (labelled with Cal fluor 610). Transcription sites were identified by overlap with intron probes labelled with Atto488 (replicate 1) or Alexa700 (replicate 2 and 3).
S10 Fig. Probability and correlation of Aebp1 BL6 and JF1 mRNA counts for simulations of transcriptional bursting using different burst sizes.
A, B. Probability of observed BL6 (A) and JF1 (B) mRNA counts given different burst sizes. Burst sizes boxed in red showed a good fit for the two alleles individually and were used for correlation analysis. C. Correlation of per-cell BL6 and JF1 mRNA counts given the burst sizes that showed a good fit (boxed in red in A and B). Burst sizes of the alleles are indicated, and for each correlation analysis the simulations for BL6 and JF1 mRNA counts were paired up randomly. In all figures bar plots represent simulations, red line represents real data.
S11 Fig. Correlation between BL6 and JF1 RNA counts in single cells when including false assignments in the simulations of bursty transcription.
Correlation values for simulated counts in the case independent bursting from the two Aebp1 alleles is shown as grey bar plot, correlation for real data is indicated as red line.
S1 Table. Summary of allelic calls of Xist in kidney tissues.
Xist was imaged using CalFluor610 (guide probe), Cy3 (BL6) and Cy5 (JF1) and allelic identity was determined using an automated pipeline as described in the Methods.
S2 Table. Closest matching modeled cluster size for Xist at different sized subdivisions.
For each tissue we indicate which cluster size has the closest variance at a given subdivision size. A summary of this data is presented in Fig 2A.
S3 Table. Comparison of manual and automatic detection of Xist foci.
Using at least 10 randomly selected tiles we indicate what manual assignments we would have given the computationally identified Xist foci.
S4 Table. Overview of selected autosomal genes.
We show the name and genomic location of genes tested, the strains interrogated and the number of SNP probes used. We also indicate which final colocalization radius we selected, and what the overall colocalization rate as well the unique colocalization rate were at that radius.
S5 Table. Summary of colocalization details and allelic calls for individual experiments performed for Aebp1, Lyplal1 and Mpp5.
S7 Table. List of primers used for genotyping and sequencing.
We thank members of the Raj and Bartolomei laboratories for helpful suggestions and discussions. We particularly thank Chris Krapp for assistance with animal care and husbandry, Joanne Thorvaldsen for suggestions about Xist analysis, Paul Ginart for his advice on methodology and Rohit Gupte for help with transcription site code. Sectioning was performed at the Penn Center for Musculoskeletal Disorders Histology Core (NIH P30-AR06919).
- 1. Symmons O, Raj A. What’s Luck Got to Do with It: Single Cells, Multiple Fates, and Biological Nondeterminism. Mol Cell. Elsevier; 2016;62: 788–802. pmid:27259209
- 2. Abranches E, Guedes AMV, Moravec M, Maamar H, Svoboda P, Raj A, et al. Stochastic NANOG fluctuations allow mouse embryonic stem cells to explore pluripotency. Development. 2014;141: 2770–2779. pmid:25005472
- 3. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32: 381–386. pmid:24658644
- 4. Olsson A, Venkatasubramanian M, Chaudhri VK, Aronow BJ, Salomonis N, Singh H, et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature. 2016;537: 698–702. pmid:27580035
- 5. Mohammed H, Hernando-Herraez I, Savino A, Scialdone A, Macaulay I, Mulas C, et al. Single-Cell Landscape of Transcriptional Heterogeneity and Cell Fate Decisions during Mouse Early Gastrulation. Cell Rep. 2017;20: 1215–1228. pmid:28768204
- 6. Cohen AA, Geva-Zatorsky N, Eden E, Frenkel-Morgenstern M, Issaeva I, Sigal A, et al. Dynamic proteomics of individual cancer cells in response to a drug. Science. 2008;322: 1511–1516. pmid:19023046
- 7. Spencer SL, Gaudet S, Albeck JG, Burke JM, Sorger PK. Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature. 2009;459: 428–432. pmid:19363473
- 8. Halpern KB, Shenhav R, Matcovitch-Natan O, Toth B, Lemze D, Golan M, et al. Single-cell spatial reconstruction reveals global division of labour in the mammalian liver. Nature. 2017;542: 352–356. pmid:28166538
- 9. Fritzsch C, Baumgärtner S, Kuban M, Steinshorn D, Reid G, Legewie S. Estrogen-dependent control and cell-to-cell variability of transcriptional bursting. Mol Syst Biol. 2018;14: e7678. pmid:29476006
- 10. Shaffer SM, Dunagin MC, Torborg SR, Torre EA, Emert B, Krepler C, et al. Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance. Nature. 2017;546: 431–435. pmid:28607484
- 11. Tirosh I, Izar B, Prakadan SM, Wadsworth MH 2nd, Treacy D, Trombetta JJ, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352: 189–196. pmid:27124452
- 12. Avraham R, Haseley N, Brown D, Penaranda C, Jijon HB, Trombetta JJ, et al. Pathogen Cell-to-Cell Variability Drives Heterogeneity in Host Immune Responses. Cell. 2015;162: 1309–1321. pmid:26343579
- 13. Zanini F, Pu S-Y, Bekerman E, Einav S, Quake SR. Single-cell transcriptional dynamics of flavivirus infection. Elife. 2018;7. pmid:29451494
- 14. Raj A, van Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell. 2008;135: 216–226. pmid:18957198
- 15. Vera M, Biswas J, Senecal A, Singer RH, Park HY. Single-Cell and Single-Molecule Analysis of Gene Expression Regulation. Annu Rev Genet. 2016;50: 267–291. pmid:27893965
- 16. Nicolas D, Phillips NE, Naef F. What shapes eukaryotic transcriptional bursting? Mol Biosyst. 2017;13: 1280–1290. pmid:28573295
- 17. Perrot V, Richerd S, Valéro M. Transition from haploidy to diploidy. Nature. 1991;351: 315–317. pmid:2034274
- 18. Sagi I, Benvenisty N. Haploidy in Humans: An Evolutionary and Developmental Perspective. Dev Cell. 2017;41: 581–589. pmid:28633015
- 19. Tang F, Barbacioru C, Nordman E, Bao S, Lee C, Wang X, et al. Deterministic and stochastic allele specific gene expression in single mouse blastomeres. PLoS One. 2011;6: e21208. pmid:21731673
- 20. Goncalves A, Leigh-Brown S, Thybert D, Stefflova K, Turro E, Flicek P, et al. Extensive compensatory cis-trans regulation in the evolution of mouse gene expression. Genome Res. 2012;22: 2376–2384. pmid:22919075
- 21. Borel C, Ferreira PG, Santoni F, Delaneau O, Fort A, Popadin KY, et al. Biased allelic expression in human primary fibroblast single cells. Am J Hum Genet. 2015;96: 70–80. pmid:25557783
- 22. Bonthuis PJ, Huang W-C, Stacher Hörndli CN, Ferris E, Cheng T, Gregg C. Noncanonical Genomic Imprinting Effects in Offspring. Cell Rep. Elsevier; 2015;12: 979–991. pmid:26235621
- 23. Andergassen D, Dotter CP, Wenzel D, Sigl V, Bammer PC, Muckenhuber M, et al. Mapping the mouse Allelome reveals tissue-specific regulation of allelic expression. Elife. 2017;6. pmid:28806168
- 24. Deng X, Berletch JB, Nguyen DK, Disteche CM. X chromosome regulation: diverse patterns in development, tissues and disease. Nat Rev Genet. 2014;15: 367–378. pmid:24733023
- 25. Galupa R, Heard E. X-chromosome inactivation: new insights into cis and trans regulation. Curr Opin Genet Dev. 2015;31: 57–66. pmid:26004255
- 26. Payer B. Developmental regulation of X-chromosome inactivation. Semin Cell Dev Biol. 2016;56: 88–99. pmid:27112543
- 27. Monahan K, Lomvardas S. Monoallelic expression of olfactory receptors. Annu Rev Cell Dev Biol. 2015;31: 721–740. pmid:26359778
- 28. Brady BL, Steinel NC, Bassing CH. Antigen receptor allelic exclusion: an update and reappraisal. J Immunol. 2010;185: 3801–3808. pmid:20858891
- 29. Eckersley-Maslin MA, Spector DL. Random monoallelic expression: regulating gene expression one allele at a time. Trends Genet. 2014;30: 237–244. pmid:24780084
- 30. Yoshioka M, Yorifuji T, Mituyoshi I. Skewed X inactivation in manifesting carriers of Duchenne muscular dystrophy. Clin Genet. 1998;53: 102–107. pmid:9611069
- 31. Plenge RM, Stevenson RA, Lubs HA, Schwartz CE, Willard HF. Skewed X-chromosome inactivation is a common feature of X-linked mental retardation disorders. Am J Hum Genet. 2002;71: 168–173. pmid:12068376
- 32. Renault NK, Dyack S, Dobson MJ, Costa T, Lam WL, Greer WL. Heritable skewed X-chromosome inactivation leads to haemophilia A expression in heterozygous females. Eur J Hum Genet. 2007;15: 628–637. pmid:17342157
- 33. Simmonds MJ, Kavvoura FK, Brand OJ, Newby PR, Jackson LE, Hargreaves CE, et al. Skewed X chromosome inactivation and female preponderance in autoimmune thyroid disease: an association study and meta-analysis. J Clin Endocrinol Metab. 2014;99: E127–31. pmid:24187400
- 34. Echevarria L, Benistan K, Toussaint A, Dubourg O, Hagege AA, Eladari D, et al. X-chromosome inactivation in female patients with Fabry disease. Clin Genet. 2016;89: 44–54. pmid:25974833
- 35. Pereira JP, Girard R, Chaby R, Cumano A, Vieira P. Monoallelic expression of the murine gene encoding Toll-like receptor 4. Nat Immunol. 2003;4: 464–470. pmid:12665857
- 36. Raslova H, Komura E, Le Couédic JP, Larbret F, Debili N, Feunteun J, et al. FLI1 monoallelic expression combined with its hemizygous loss underlies Paris-Trousseau/Jacobsen thrombopenia. J Clin Invest. 2004;114: 77–84. pmid:15232614
- 37. Gimelbrant A, Hutchinson JN, Thompson BR, Chess A. Widespread monoallelic expression on human autosomes. Science. 2007;318: 1136–1140. pmid:18006746
- 38. Zwemer LM, Zak A, Thompson BR, Kirby A, Daly MJ, Chess A, et al. Autosomal monoallelic expression in the mouse. Genome Biol. 2012;13: R10. pmid:22348269
- 39. Li SM, Valo Z, Wang J, Gao H, Bowers CW, Singer-Sam J. Transcriptome-wide survey of mouse CNS-derived cells reveals monoallelic expression within novel gene families. PLoS One. 2012;7: e31751. pmid:22384067
- 40. Gendrel A-V, Attia M, Chen C-J, Diabangouaya P, Servant N, Barillot E, et al. Developmental dynamics and disease potential of random monoallelic gene expression. Dev Cell. 2014;28: 366–380. pmid:24576422
- 41. Eckersley-Maslin MA, Thybert D, Bergmann JH, Marioni JC, Flicek P, Spector DL. Random monoallelic gene expression increases upon embryonic stem cell differentiation. Dev Cell. 2014;28: 351–365. pmid:24576421
- 42. Deng Q, Ramsköld D, Reinius B, Sandberg R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science. 2014;343: 193–196. pmid:24408435
- 43. Reinius B, Mold JE, Ramsköld D, Deng Q, Johnsson P, Michaëlsson J, et al. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq. Nat Genet. 2016;48: 1430–1435. pmid:27668657
- 44. Reinius B, Sandberg R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat Rev Genet. 2015;16: 653–664. pmid:26442639
- 45. Kim JK, Kolodziejczyk AA, Illicic T, Teichmann SA, Marioni JC. Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression. Nat Commun. 2015;6: 8687. pmid:26489834
- 46. Jiang Y, Zhang NR, Li M. SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol. 2017;18: 74. pmid:28446220
- 47. Vigneau S, Vinogradova S, Savova V, Gimelbrant A. High prevalence of clonal monoallelic expression. Nat Genet. 2018;50: 1198–1199. pmid:30082785
- 48. Reinius B, Sandberg R. Reply to “High prevalence of clonal monoallelic expression.” Nat Genet. 2018;50: 1199–1200. pmid:30082784
- 49. DeVeale B, van der Kooy D, Babak T. Critical evaluation of imprinted gene expression by RNA-Seq: a new perspective. PLoS Genet. 2012;8: e1002600. pmid:22479196
- 50. Levesque MJ, Ginart P, Wei Y, Raj A. Visualizing SNVs to quantify allele-specific expression in single cells. Nat Methods. 2013;10: 865–867. pmid:23913259
- 51. Shaffer SM, Joshi RP, Chambers BS, Sterken D, Biaesch AG, Gabrieli DJ, et al. Multiplexed detection of viral infections using rapid in situ RNA analysis on a chip. Lab Chip. 2015;15: 3170–3182. pmid:26113495
- 52. Ginart P, Kalish JM, Jiang CL, Yu AC, Bartolomei MS, Raj A. Visualizing allele-specific expression in single cells reveals epigenetic mosaicism in an H19 loss-of-imprinting mutant. Genes Dev. 2016;30: 567–578. pmid:26944681
- 53. Sado T, Brockdorff N. Advances in understanding chromosome silencing by the long non-coding RNA Xist. Philos Trans R Soc Lond B Biol Sci. 2013;368: 20110325. pmid:23166390
- 54. Ishibashi K, Hara S, Kondo S. Aquaporin water channels in mammals. Clin Exp Nephrol. 2009;13: 107–117. pmid:19085041
- 55. Straight SW, Shin K, Fogg VC, Fan S, Liu C-J, Roh M, et al. Loss of PALS1 expression leads to tight junction and polarity defects. Mol Biol Cell. 2004;15: 1981–1990. pmid:14718565
- 56. Weide T, Vollenbröker B, Schulze U, Djuric I, Edeling M, Bonse J, et al. Pals1 Haploinsufficiency Results in Proteinuria and Cyst Formation. J Am Soc Nephrol. 2017;28: 2093–2107. pmid:28154200
- 57. Maier C, Schadock I, Haber PK, Wysocki J, Ye M, Kanwar Y, et al. Prolylcarboxypeptidase deficiency is associated with increased blood pressure, glomerular lesions, and cardiac dysfunction independent of altered circulating and cardiac angiotensin II. J Mol Med. 2017;95: 473–486. pmid:28160049
- 58. Gardner RL, Lyon MF, Evans EP, Burtenshaw MD. Clonal analysis of X-chromosome inactivation and the origin of the germ line in the mouse embryo. J Embryol Exp Morphol. 1985;88: 349–363. pmid:4078538
- 59. Ponder BA, Schmidt GH, Wilkinson MM, Wood MJ, Monk M, Reid A. Derivation of mouse intestinal crypts from single progenitor cells. Nature. 1985;313: 689–691. pmid:3974703
- 60. Mrozek JD, Holzknecht RA, Butkowski RJ, Mauer SM, Tuchman M. X-chromosome inactivation in the liver of female heterozygous OTC-deficient sparse-furash mice. Biochem Med Metab Biol. 1991;45: 333–343. pmid:2049185
- 61. Wu H, Luo J, Yu H, Rattner A, Mo A, Wang Y, et al. Cellular resolution maps of X chromosome inactivation: implications for neural development, function, and disease. Neuron. 2014;81: 103–119. pmid:24411735
- 62. Park J, Shrestha R, Qiu C, Kondo A, Huang S, Werth M, et al. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science. 2018;360: 758–763. pmid:29622724
- 63. Levesque MJ, Raj A. Single-chromosome transcriptional profiling reveals chromosomal gene expression regulation. Nat Methods. 2013;10: 246–248. pmid:23416756
- 64. Thomas GA, Williams D, Williams ED. The demonstration of tissue clonality by X-linked enzyme histochemistry. J Pathol. 1988;155: 101–108. pmid:3164772
- 65. Takasato M, Little MH. The origin of the mammalian kidney: implications for recreating the kidney in vitro. Development. 2015;142: 1937–1947. pmid:26015537
- 66. Little MH, McMahon AP. Mammalian kidney development: principles, progress, and projections. Cold Spring Harb Perspect Biol. 2012;4. pmid:22550230
- 67. Huang W-C, Ferris E, Cheng T, Hörndli CS, Gleason K, Tamminga C, et al. Diverse Non-genetic, Allele-Specific Expression Effects Shape Genetic Architecture at the Cellular Level in the Mammalian Brain. Neuron. 2017;93: 1094–1109.e7. pmid:28238550
- 68. Brennecke P, Anders S, Kim JK, Kołodziejczyk AA, Zhang X, Proserpio V, et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods. 2013;10: 1093–1095. pmid:24056876
- 69. Dueck HR, Ai R, Camarena A, Ding B, Dominguez R, Evgrafov OV, et al. Assessing characteristics of RNA amplification methods for single cell RNA sequencing. BMC Genomics. 2016;17: 966. pmid:27881084
- 70. Torre E, Dueck H, Shaffer S, Gospocic J, Gupte R, Bonasio R, et al. Rare Cell Detection by Single-Cell RNA Sequencing as Guided by Single-Molecule RNA FISH. Cell Syst. 2018; pmid:29454938
- 71. Kobayashi S, Hosoi Y, Shiura H, Yamagata K, Takahashi S, Fujihara Y, et al. Live imaging of X chromosome reactivation dynamics in early mouse development can discriminate naïve from primed pluripotent stem cells. Development. 2016;143: 2958–2964. pmid:27471261
- 72. Aseem O, Barth JL, Klatt SC, Smith BT, Argraves WS. Cubilin expression is monoallelic and epigenetically augmented via PPARs. BMC Genomics. 2013;14: 405. pmid:23773363
- 73. Frazer KA, Eskin E, Kang HM, Bogue MA, Hinds DA, Beilharz EJ, et al. A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature. 2007;448: 1050–1053. pmid:17660834
- 74. Yang H, Wang JR, Didion JP, Buus RJ, Bell TA, Welsh CE, et al. Subspecific origin and haplotype diversity in the laboratory mouse. Nat Genet. 2011;43: 648–655. pmid:21623374
- 75. Mellis IA, Gupte R, Raj A, Rouhanifard SH. Visualizing adenosine-to-inosine RNA editing in single mammalian cells. Nat Methods. 2017;14: 801–804. pmid:28604724
- 76. Wills QF, Livak KJ, Tipping AJ, Enver T, Goldson AJ, Sexton DW, et al. Single-cell gene expression analysis reveals genetic associations masked in whole-tissue experiments. Nat Biotechnol. 2013;31: 748–752. pmid:23873083
- 77. van der Wijst MGP, Brugge H, de Vries DH, Deelen P, Swertz MA, LifeLines Cohort Study, et al. Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nat Genet. 2018;50: 493–497. pmid:29610479
- 78. Merkin J, Russell C, Chen P, Burge CB. Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science. 2012;338: 1593–1599. pmid:23258891
- 79. Beckerman P, Bi-Karchin J, Park ASD, Qiu C, Dummer PD, Soomro I, et al. Transgenic expression of human APOL1 risk variants in podocytes induces kidney disease in mice. Nat Med. 2017;23: 429–438. pmid:28218918
- 80. Takada T, Ebata T, Noguchi H, Keane TM, Adams DJ, Narita T, et al. The ancestor of extant Japanese fancy mice contributed to the mosaic genomes of classical inbred strains. Genome Res. 2013;23: 1329–1338. pmid:23604024
- 81. Richardson DS, Lichtman JW. Clarifying Tissue Clearing. Cell. Elsevier; 2015;162: 246–257. pmid:26186186
- 82. Koide T, Moriwaki K, Uchida K, Mita A, Sagai T, Yonekawa H, et al. A new inbred strain JF1 established from Japanese fancy mouse carrying the classic piebald allele. Mamm Genome. 1998;9: 15–19. pmid:9434939
- 83. Raj A, van den Bogaard P, Rifkin SA, van Oudenaarden A, Tyagi S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods. 2008;5: 877–879. pmid:18806792
- 84. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012;9: 676–682. pmid:22743772
- 85. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31: 3406–3415. pmid:12824337
- 86. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 2006;4: e309. pmid:17048983