Allele-specific RNA imaging shows that allelic imbalances can arise in tissues through transcriptional bursting

Extensive cell-to-cell variation exists even among putatively identical cells, and there is great interest in understanding how the properties of transcription relate to this heterogeneity. Differential expression from the two gene copies in diploid cells could potentially contribute, yet our ability to measure from which gene copy individual RNAs originated remains limited, particularly in the context of tissues. Here, we demonstrate quantitative, single molecule allele-specific RNA FISH adapted for use on tissue sections, allowing us to determine the chromosome of origin of individual RNA molecules in formaldehyde-fixed tissues. We used this method to visualize the allele-specific expression of Xist and multiple autosomal genes in mouse kidney. By combining these data with mathematical modeling, we evaluated models for allele-specific heterogeneity, in particular demonstrating that apparent expression from only one of the alleles in single cells can arise as a consequence of low-level mRNA abundance and transcriptional bursting.


Author summary
In mammals, most cells of the body contain two genetic datasets: one from the mother and one from the father, and-in theory-these two sets of information could contribute equally to produce the molecules in a given cell. In practice, however, this is not always the case, which can have dramatic implications for many traits, including visible features (such as fur color) and even disease outcomes. However, it remains difficult to measure the parental origin of individual molecules in a given cell and thus to assess what processes lead to an imbalance of the maternal and paternal contribution. We adapted a microscopy technique-called allele-specific single molecule RNA FISH-that uses a combination of a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Introduction
Gene expression in genetically identical individual cells often deviates from that of the cell population average [1], which in mammals can impact cell fate and development [2][3][4][5], response to environmental stimuli [6][7][8][9] and disease [10][11][12][13]. Over the past few years, it has emerged that at least some of this variability arises due to random fluctuations in the biochemical processes that underlie transcription and translation. In the case of transcription, a primary source of fluctuations is so-called transcriptional bursting, where a gene alternates between an active state, during which RNA is produced, and an inactive state, where no RNA is transcribed. Because both the time of onset of these bursts and the amount of RNA produced in a single burst are random, this process can lead to cell-to-cell variability [14][15][16].
An additional nuance to the effects of bursting on cellular variability is that diploid mammalian cells carry two sets of chromosomes (one from each parent), which means that they also have two copies of each individual gene. It is typically assumed that for most genes both copies, called alleles, are capable of being expressed, thus providing protection through redundancy if one of them is mutated [17,18]. Recent studies, however, which made use of crosses between distantly related mouse strains and high-throughput sequencing, uncovered that there can be extensive differences in the relative expression levels of the two alleles [19][20][21][22][23]. Additional work then showed that even if transcripts from both alleles are detected at the population level, there may be substantial variation in the degree of allelic imbalance in single cells. For example, while for most genes individual cells express RNA from both alleles, for other genes the population can be a mixture of cells expressing RNA from only one or the other allele. This latter expression pattern has been termed random monoallelic expression, and certainly, genes with such an expression profile exist: most X-linked genes are expressed only from one X chromosome due to random X-chromosome inactivation [24][25][26], and a similar pattern has also been shown for some autosomal genes, such as olfactory receptors or antigen receptors [27][28][29]. Understanding random monoallelic expression is of particular interest given that quantitative cell-to-cell differences or spatial heterogeneity in allele-specific gene expression have the potential to modify phenotypic outcome if the two alleles harbor different functional variants, as has been described for both X-linked (eg. [30][31][32][33][34]) and autosomal traits [35,36].
Beyond these prototypic cases, it has been proposed that many more autosomal genes may be subject to random monoallelic expression [37][38][39][40][41][42][43], but some key properties of this extended class sets them apart from the more established examples. Similarly to the initial group, these genes were classified as displaying random monoallelic expression because cells with only transcripts from one or the other gene copy were observed, with both types of cells present in the same experiment. In addition, some of these genes maintain their monoallelic expression status over multiple passages in clonal cell lines [40,43], so a specific, heritable mechanism could limit transcription to only one allele, as is the case for X chromosome inactivation. However, thus far such a mechanism remains elusive [40,41]. Moreover, unlike previously established cases, many genes in this extended set are not expressed exclusively monoallelically, and typically a subset of cells or clones with transcripts from both alleles can also be detected [37][38][39][40][41]. This suggests that if a specific mechanism does exist to regulate monoallelic expression, it is limited to only a subset of cells.
To resolve this question of mechanism, Reinius et al proposed an elegant scenario (so-called dynamic random monoallelic expression), whereby the many genes with random monoallelic expression may arise not by differential cell-and allele-specific regulation, but instead monoallelic expression may arise by chance [43,44]. In this scenario infrequent transcriptional bursting would lead to cells that contain only RNA from one of the two gene copies. This observed monoallelic expression of mRNA would be temporary and the allelic state of a cell could change over time. The authors confirmed this model in clonal cell lines [43], but whether the same is true in tissues is still an open question. Different groups have deployed single-cell transcriptomics to determine the degree of cell-to-cell allelic imbalance [42,43,45,46], but technical limitations inherent to low abundance RNA quantification, as well as parameter choice can impact the interpretation of allele-specific sequencing data [47,48]. Thus, it has been hypothesised that the level of monoallelic expression, especially at single-cell level, could be an overestimate [45,49]. This absence of precise quantitative data has made it difficult to definitively answer if random monoallelic expression observed in vivo requires a dedicated mechanism or if it could arise as a consequence of transcriptional bursting.
In this study, we adapted a previously described single-molecule RNA fluorescent in situ hybridization technique that is sensitive to single-nucleotide differences between RNAs for the analysis of transcripts in snap-frozen, cryosectioned tissues from different mouse strains and their hybrids [50][51][52]. This allowed us to determine the allelic origin of individual RNAs in single cells, while preserving both their spatial context and their in vivo expression levels. We used this method to measure allele-specific expression of multiple autosomal genes and of Xist, a gene for which it is well-documented that individual cells randomly and exclusively transcribe either the maternal or the paternal copy. Quantitative analysis of the data enabled us to answer whether the autosomal genes we investigated were expressed from one or both gene copies in single cells in tissue. While we observed monoallelic expression in some cells, mathematical modeling showed that this pattern was compatible with random transcriptional events (including transcriptional bursting) from the two alleles producing low levels of RNA, rather than an explicit mechanism governing random monoallelic expression.

SNV-specific detection of RNAs in mouse tissue using single molecule RNA FISH
Our goal in tissues was to quantitatively measure the amount of cell-to-cell variability in transcript abundance from either the maternal or paternal allele of a gene to determine the degree of imbalance between transcripts arising from the two alleles. To make these measurements, we modified a protocol previously developed by our group [50] that enables the detection of single nucleotide variants (SNVs) on individual RNA molecules in situ in cultured cells (Methods, Fig 1A). We applied this method to mouse kidney tissues from C57BL/6J (BL6) and JF1/ Ms (JF1) mice and their F1 progeny, since these two strains harbor a large number of genomic differences, allowing us to target most genes with multiple SNV-specific probes.
To demonstrate our ability to detect expression from the BL6 vs. JF1 allele using our technique in tissue, we first examined the expression of Xist, a prototypical example of random monoallelic expression [25,53]. In female cells Xist is transcribed exclusively from the (randomly chosen) inactivated X chromosome, is expressed at high levels, and contains a large Collecting tissue from F1 heterozygous mice from crosses between BL6 and JF1 mice enabled allele-specific RNA detection in single-cells by utilizing a "toehold probe" strategy that targets polymorphisms between the two strains. While only a single polymorphism-probe is shown here for illustration, all genes were targeted with multiple SNV probes to increase the fluorescent signal. B. The pipeline for allele-specific detection of Xist foci in kidney tissue sections: First, whole-section tissue scans were used to collect fluorescence data in the guide and SNV channels (the scan shown is for the Xist guide probes labelled with Cal fluor 610). The guide probe signal was then used to automatically detect Xist foci, and the fluorescence intensity for the two allele-specific probes was extracted at the positions of those foci (BL6 was labelled with Cy3, JF1 labelled with Cy5). Clustering of the relative fluorescence intensities was applied to all foci, allowing automated allelic assignment for all spots across the entire tissue. C. Quantification of Xist allelic assignments in homozygous and heterozygous tissues. For each sample we depict the fraction of assigned BL6 and JF1 foci. For heterozygous samples biological replicates (ie tissue samples from different animals) are shown separately to reveal inter-individual variability in random X inactivation, while data from a technical replicate (two separate tissue sections from the same animal) is also plotted to show the effect of intra-individual variability and technical noise on allelic assignments. D. The pipeline for allele-specific detection of punctate mRNA spots in kidney tissue sections: whole tissue scans at 60x resolution in DAPI (staining nuclei) were used to obtain an overall impression of tissue morphology. Then, we collected z-stacks for randomly selected locations at 100x resolution in all channels of interest, which enabled detection of single cells and number of SNVs between the two substrains, making it an ideal test case for our method. We collected kidney tissues from mice at day 4 of postnatal development, snap-froze them in liquid nitrogen or on aluminium blocks in dry ice and cryosectioned them at 5 μm thickness. We then applied our in situ hybridization protocol using allele-specific probes. As expected, we observed the appropriate sex-specific expression pattern: no fluorescent signal in male tissues (S1 Fig), whereas in female tissues, our guide probes clearly labelled nuclear Xist foci, which colocalized with signal from the strain-specific probes. For SNV-targeting probes, controls in homozygous tissues confirmed that only the appropriate probes bound, with little binding from the probes targeting the other strain (S1 Fig). Whereas in heterozygous samples the two strain-specific probes each labelled a subset of the nuclei in the anticipated mutually exclusive pattern ( Fig 1B). To further analyse the data, we developed a pipeline to computationally identify Xist RNA foci, and automatically classify them as BL6 or JF1 within an entire scanned kidney section (Methods, Fig 1B). This algorithm identified tens of thousands of Xist foci per section (mean 22k, min 8.2k, max 30k), and-in agreement with our manual inspection of the data-predominantly identified Xist foci of the correct identity in the homozygous samples (S1 Table), while the overall population ratio of BL6:JF1 foci ranged from 45:55 to 65:35 in heterozygous samples (Fig 1C, S2A Fig).
Having verified that we could correctly measure the allelic origin of clusters of Xist RNA accumulated on the X chromosome in mouse tissues, we next ascertained whether the method would also work for monodisperse spots corresponding to single RNA molecules, which is how most mRNAs appear in the cell. We considered this challenging, because single, punctate RNA spots would be both smaller and considerably dimmer compared to Xist RNA, which accumulates multiple copies on the inactivated X chromosome. Thus, to test our protocol for use on single RNA molecules, we designed probes for 8 autosomal genes that contained at least one polymorphism in BL6 versus either JF1 or the C7 strain (which carries both copies of chromosome 7 from the M. musculus castaneus strain in a BL6 background). These genes were selected to represent genes with (Aebp1, Churc1, Lyplal1) and without (Aqp11, Mpp5, Podxl, Prcp, Stard5) putative random monoallelic expression [38][39][40][41] and with different expression levels, patterns and chromosomal locations. Some of the chosen genes were also linked to specific kidney disease phenotypes (Aqp11 [54], Mpp5 [55,56], Prcp [57]) (Methods, S4 Table). As expected, these genes were typically expressed at much lower levels than Xist and punctate individual mRNA spots could not be as readily observed in low magnification scans. We therefore combined whole tissue scans in a single plane at low magnification with random sampling of the tissue at higher magnification, where we imaged the entirety of the section. This approach allowed us to identify individual mRNA spots within the context of the whole tissue section in the 60x scan, while the additional data collected from the 100x z-stacks facilitated our ability to precisely determine colocalization between the guide probes and the strain-specific probes (Fig 1D). Colocalization between the strain-specific signal and the mRNA probes allowed us to determine from which allele a given mRNA originated. Accordingly, this could be used as a key readout for SNV-specific single molecule RNA FISH, and we characterized the quality of the experiments using colocalization rate (i.e. what percentage of the guide spots colocalized with allele-specific spots). When we assayed the overall colocalization rates for these autosomal genes in kidney sections, we found that 4 out of 8 genes had mean colocalization rates >50% (S4 Fig, S4 Table), which is comparable to colocalization rates previously thresholding of Cal Fluor 610 detecting Aebp1 mRNA (top), BL6-detection probes labeled with Cy3 (middle), and JF1-detection probes labeled with Cy5 (bottom). Testing colocalization of these probes revealed allelic assignments for guide probes (far right). https://doi.org/10.1371/journal.pgen.1007874.g001 Allele-specific RNA imaging in mouse tissues observed in cultured cells [50], showing that we were able to perform quantitative SNV-specific single molecule RNA FISH directly in tissue. We also observed an apparent trend between colocalization rate and the number of SNV probes, where genes with fewer SNV probes had lower colocalization rates than those with more SNV probes. We tested a series of parameters that could affect probe binding (base composition, GC content, probe secondary structure and folding energy), but found no parameter that differentiated between the probes with high and low colocalization rates (S4 Fig). However, it should be noted that SNV probes were only tested as full sets (i.e. all SNV probes for a given gene were tested together), and we therefore do not know the binding behaviour of individual probes.
Collectively, these results showed that we could directly visualise and assign strain-specific identities to both focally localised RNA (Xist) and single molecule RNA spots in the context of whole tissue sections. This motivated us to ask if we could directly quantify cell-to-cell heterogeneity in the allelic origin of RNAs in tissue.

Quantifying cell-to-cell heterogeneity in the chromosomal origin of RNA in tissue
To determine how the chromosomal origin of RNAs contributes to cell-to-cell heterogeneity in tissue, we focused on two different questions. For Xist, we investigated the spatial clustering of cells based on their X inactivation choice, i.e. to what extent cells expressing Xist from either the JF1 vs BL6 X chromosome intermix. For the autosomal genes, we quantified their allelic imbalance in single cells, i.e. whether individual cells expressed RNAs from the two chromosomes at different ratios.
In the case of Xist, we measured spatial clustering of allele-specific expression because each cell is randomly and fully committed to expressing RNA from only the BL6 or the JF1 chromosome. Thus, allelic imbalance in tissue is not due to quantitative expression differences between the two alleles in single cells. Instead, it can arise either as a consequence of overall skewing of X chromosome inactivation rates or due to uneven spatial distribution of cells with a given inactivated X chromosome. Such spatial partitioning can arise through the local expansion of cells in which X chromosome inactivation "choice" has already been fixed, resulting in extended patches of cells carrying the same inactive X chromosome [58][59][60][61]. Because we had observed fairly balanced expression from the BL6 and JF1 chromosome in our initial analysis of heterozygous samples (see previous section, Fig 1C), we next determined whether cells expressing Xist from either the BL6 or JF1 allele segregated spatially. Although visual inspection of the sections revealed no extended regions where cells expressed Xist foci with the same strain-specific identity, computational modelling could potentially reveal a more precise view of spatial patterning. We therefore developed a metric that characterized the distribution of cells expressing BL6 Xist RNA (Methods and S2 Fig) and then compared this to either completely randomized BL6 and JF1 assignments or randomizations where we introduced different sized clusters of cells. We found that in all tissues BL6 cells were less evenly distributed than the random assignments (S2C Fig), suggesting that cells cluster together more than expected in a completely random scenario. Our subsequent comparison with the clustered assignments further supported and refined this interpretation: it showed that our data was most similar to simulations with smaller cluster sizes. The closest matching seed size was different, depending on what scale we assessed spatial partitioning, but centered around a cluster size of 2-4 cells (mean cluster size: 3.2, standard deviation 0.7) (S2 Table, Fig 2A and S2D Fig). Thus, cells expressing Xist from the same chromosome clustered together in small patches in tissue sections, resulting in a spatially fairly mixed population of cells and showed no evidence of extended patches with the same allele-specific expression. For the autosomal genes we wanted to know how much the allelic imbalance differed between cells. Visual inspection of our data indicated a range of allelic ratios, including BL6 monoallelic, JF1 monoallelic, as well as biallelic mRNA expression. To quantify this, we focused on three autosomal genes (Aebp1, Lyplal1, Mpp5) where >50% of guide spots colocalized with signal from SNV probes (we excluded Podxl, because the high expression levels of Podxl mRNA in podocytes precluded separating individual RNA spots (S7 Fig)). We selected this cutoff based on testing in cultured mouse fibroblasts, where we had found that in cells with >50% colocalization rate, our method consistently identified the correct allele (Methods, S5 Fig). First, we considered that the observed chromosomal origin (BL6 vs. JF1) could either be due to true biological variability or to technical error (as seen when we detected RNA from the "incorrect" strain in homozygous tissues). To distinguish these, we determined the BL6 and JF1 signal for these genes in kidney tissue from both BL6 and JF1 homozygous mice, as well as in tissue from reciprocal heterozygous crosses. For all three genes, we counted only a few mRNAs in the majority of cells (mean number of RNA spot counts per cell: 3.5 for Aebp1, 3.2 for Lyplal1 and 2.6 for Mpp5). These RNA counts measured by RNA FISH were higher than those observed via single-cell RNA sequencing of the kidney [62]. The mean RNA count by single-cell sequencing ranged between 1.0-1.78 UMI/cell for Aebp1, 1.0-1.2 UMI/cell for Lyplal and 1.0-1.63 UMI/cell for Mpp5, depending on the cell type, considering only those cells where transcripts for these genes were detected (S3B Fig). Given the higher detection rate of our method, it may be particularly useful for assessing allele-specific expression for these lowly expressed genes.
We predominantly detected the correct allele in the homozygous kidney samples, both in bulk and at the single cell level (Fig 2B, 2D and 2F). In heterozygous samples we observed a more balanced presence of both BL6 and JF1 mRNAs, with the reciprocal crosses showing similar results (heterozygous data in Fig 2B, 2D, 2F and 2G, and S5 Table). These results indicated that our technical error (false positive rate) was less than the biological variability and that we could use our method to measure quantitative single-cell differences. Moreover, when we compared the BL6 allelic ratios in homozygous and heterozygous cells with >2 mRNA, we found some heterozygous cells with allelic ratios similar to those of the homozygous samples, but also a subset of cells with an allelic ratio that was intermediate to that of homozygous cells ( Fig 2G). We obtained similar results when we switched the fluorescent dyes conjugated to the SNV detection probes used to label the BL6 and JF1 mRNA (Methods, S6 Fig). The observation that single cells had transcripts from both the BL6 and JF1 allele were particularly intriguing for Aebp1 and Lyplal1, because these two genes had been previously identified as genes with putative random monoallelic expression in other tissues [38][39][40][41]. Still, given that we did in fact observe cells with either BL6 or JF1 monoallelic mRNA expression in heterozygous tissue, akin to the random monoallelic expression pattern, we wanted to know which transcriptional models our quantitative single-cell allelic imbalance data was compatible with, in order to explain how this expression pattern could arise.
The overlaid box plots show the overall distribution of these data points, with the red dot indicating the mean seed size from all subdivision sizes. B,D and F. Measurements of allele-specific expression in single cells (scatter plots) and in bulk data (pie charts) for Aebp1 (B), Lyplal1 (D) and Mpp5 (F). For each gene, data are shown for BL6 homozygous (left), JF1 homozygous (middle) and heterozygous (right) samples. For the singlecell scatter plots cells only contained integer numbers of RNA, but we included jitter to better display the density of cells with a given allelic distribution. For all plots we combined data from different replicates with more than 40% colocalization rate (summaries for each experiment are shown in S5 Table), and the two colors in the scatter plots indicate data from the two reciprocal crosses (i.e. BL6 x JF1 and JF1 x BL6). C and E. Fluorescence micrographs and allelic assignments of Lyplal1 (C) and Mpp5 (E) guide spots in BL6 x JF1 heterozygous kidney cells. For both genes guide probes were labelled with Cal fluor 610. G. BL6 allelic ratios in single cells with >2 RNA with allelic assignments for homozygous and heterozygous samples for Aebp1 (left), Lyplal1 (middle) and Mpp5 (right). https://doi.org/10.1371/journal.pgen.1007874.g002

Observed heterogeneity of strain-specific RNA is compatible with bursty biallelic expression
Our results showing that individual cells could have mRNA from either one or both alleles motivated us to assess whether existing models of transcription were sufficient to explain the observed cell-to-cell variability in allelic imbalance. To evaluate these models, we first used the correlation coefficient between the BL6 and JF1 mRNA expression in our population as a simple metric that captures the joint behaviour of the two alleles: a correlation coefficient of zero represents no coordinated expression between the two alleles, positive values indicate coordinated expression, while negative correlation could be indicative of anti-correlation or repulsion between the two alleles. The correlation between BL6 and JF1 mRNA counts were 0.41 for Aebp1, 0.29 for Lyplal1 and 0.35 for Mpp5 suggesting somewhat coordinated expression. To better understand how these values compare to the expectations from different models of transcription, we initially considered two extreme cases: an "all-or-none" scenario in which every cell has transcripts exclusively from one or the other gene copy, and a "coin flip" scenario in which the allelic origin of every individual transcript is essentially indistinguishable from random coin flipping ( Fig 3A). The former scenario could suggest the existence of regulatory mechanisms that limit transcription to only one gene copy per cell (an extreme form of random monoallelic expression, as is the case with Xist), whereas the latter corresponds to a null model with no distinct allele-specific transcriptional regulation. We used computational modeling to simulate these two scenarios and to discriminate between them.
We first checked if our data were similar to those expected in the "all-or-none" scenario, in which the transcripts in each individual cell were either solely from one or the other gene copy. Looking in heterozygous cells, it seemed qualitatively apparent that (as noted) many cells have mRNAs from both gene copies, which was seemingly incompatible with this scenario. However, it was still formally possible that cells in reality only had transcripts exclusively from one of the gene copies and that the apparent transcripts from the other copy were technical artifacts due to false detection events. We used homozygous tissue to measure the rate at which these false detection events occur, and thereby estimated the expected false detection rate in heterozygous cells. We then computationally simulated hypothetical "all-or-nothing" heterozygous cell populations taking into account these false detection rates, and found that the RNA counts and distributions observed in such a simulated all-or-none population were still inconsistent with our data (compare Fig 3B with heterozygous data in Fig 2B, 2D and 2F). Next, we calculated the correlation of the BL6/JF1 counts for the "all-or-nothing" simulations, which showed much lower correlation values than we had observed in the real data ( Fig 3C). Thus, the two alleles in our simulated data were uncorrelated or anti-correlated, as would be expected in the case of random monoallelic expression, and were unlike the positively correlated BL6 and JF1 mRNA we had observed our measurements. In addition, the probability of observing the strain-specific mRNA counts we measured versus simulated cell populations was also different (S8 Fig). Collectively, these results indicate that in the kidney, none of the three genes we interrogated displayed "all-or-none" expression.
In our alternative scenario, it is possible that the two copies of the gene transcribe RNA independently and each random transcription event produces just a single RNA, thus leading to the "coin-flipping" model in which most cells would have RNAs from both alleles in them, but with some statistical noise about this population average (Fig 3A). We modeled the outcome of such a scenario and found that the real versus simulated single-cell RNA distributions looked very similar for all three genes (compare Fig 3D with heterozygous data in Fig 2B, 2D and 2F). We also saw that the correlation between the BL6 and JF1 allele in our modeled data was similar to our measured correlation in the case of Lyplal1 and Mpp5 (Fig 3E). Statistical analysis showed that when we treated the strain-specific RNA counts per cell as a series of independent coin-flips, the probability of the observed distributions for both Lyplal1 and Mpp5 fell within the distribution of likelihoods from our simulated model (S8 Fig). Together, these results demonstrated that the allelic imbalance observed for Lyplal1 and Mpp5 was compatible with a simple coin-flipping null model of transcription from the two alleles.
In the case of Mpp5, where we had found false detection rates to differ most depending on which fluorescent dyes were coupled to the allele-specific probes, we collected data from heterozygous tissues swapping the dyes used on the BL6 and the JF1 probes and repeated our analysis and simulations. We obtained similar results, showing that a model for all-or-none expression did not recapitulate the allelic counts observed in our data, whereas the coin-flip model did (S6B and S6C Fig). We also tested whether the detection efficiency of~50% influenced our conclusions about the "all-or-none" vs. "coin flip" scenarios by repeating our simulations assuming 100% detection efficiency and then randomly downsampling to the measured detection rate. The results were essentially indistinguishable from those obtained without downsampling, further verifying that our results were not due to the technical noise introduced by the low detection efficiency (S5C and S5D Fig).
All analyses of our simulations therefore supported the conclusion that allelic imbalances for Lyplal1 and Mpp5 were compatible with a simple coin-flipping null model of transcription. Aebp1, however, showed higher levels of imbalance per cell than could be explained by this model. This was not clearly visible on a scatterplot of the simulated RNA counts per cell ( Fig  3D), but the measured BL6 and JF1 counts were less correlated than those in the simulated dataset ( Fig 3E). Yet this gene did not exhibit the all-or-none behavior either. We therefore considered a third, intermediate scenario motivated by the phenomenon of transcriptional bursting [14][15][16]. Transcriptional bursts refer to the fact that most mammalian genes are transcribed in short pulses during which multiple transcripts are synthesized, interspersed between periods during which the gene remains inactive. When there are two copies of a gene, each bursting independently [63], the expected result would be that some cells may have more transcripts from one of the copies than expected by the coin flipping model above; bursting would be akin to getting several heads or tails in a row every time one flipped a coin.
To test whether transcriptional bursting could explain the observed data, we first wanted to confirm that Aebp1 was indeed transcribed in a burst-like fashion. To verify this, we measured Aebp1 transcriptional activity directly in kidney cells by using intronic probes (Fig 3F), which, owing to the extremely short half-life of introns, detect almost exclusively nascent transcripts at the site of transcription [63]. This showed that 19% of cells with Aebp1 mRNA were also actively transcribing Aebp1, and that these transcription sites contained more than 1 RNA based on their fluorescence intensity relative to cytoplasmic RNA spots (average 1.6x higher fluorescence intensity in 3 independent experiments, S9 Fig). This data also showed that the majority of actively transcribing cells had only one Aebp1 transcription site, although a small subset (6 out of 55 (11%) cells with Aebp1 transcription sites) showed simultaneous expression Determining the mechanisms underlying allele-specific single-cell heterogeneity. A. Outline of two extreme scenarios that could lead to variable single-cells allelic imbalance. B, C. Simulated single-cell RNA counts (B) and correlation coefficients of simulated and observed data (C) in the case of the "allor-none" model. D, E. Simulated single-cell RNA counts (D) and correlation coefficients of simulated and observed data (E) in the case of the coin-flipping model. F. Single-molecule RNA FISH of Aebp1 introns (left) and guide spots (right) in mouse kidney, including a cell with two transcription sites (top), a transcription site on the BL6 allele (middle) and a transcription site on the JF1 allele (bottom). The position of the transcription sites are highlighted by red arrows and the allelic assignment of the guide spots are indicated by colored circles. Introns were labelled with Atto488 (top) or Alexa700 (middle, bottom). G. Outcome of simulating the "transcriptional bursting" scenario for each allele separately (i) and for the alleles together (ii). Top row represents simulated singlecell RNA counts, bottom row shows negative log likelihood distribution of simulated and observed data for the BL6 and JF1 allele individually (i) and the correlation between randomly paired simulated RNA counts (ii). All log likelihood distributions (in C, E and G) and the correlation distribution (in G) were obtained from 10,000 simulations, and one of these simulations was picked randomly for each gene to display RNA distributions in B, D and G.
https://doi.org/10.1371/journal.pgen.1007874.g003 from both alleles. This corroborated that cells indeed produced Aebp1 in transcriptional bursts and that it was possible for individual cells to transcribe RNA from both alleles simultaneously.
Next, we turned to simulations to assess the RNA distributions that we would expect in a scenario where the two alleles transcribed RNA independently from each other and in bursts. Initially, we estimated the expected burst size (average number of RNAs that were transcribed together in a single burst) and burst frequency for the two alleles based on the observed RNA counts for each allele independently and used these parameters as inputs for our model (see methods for details). BL6 and JF1 RNA counts simulated this way closely matched our measurements, which was also reflected by the likelihood of the real data falling within that of the simulated data for the two alleles separately (Fig 3G). We then wanted to see whether the degree to which there was allelic imbalance in single cells could be explained by the two alleles bursting independently; i.e., whether for per-cell RNA counts from the two alleles was more or less correlated than one would expect by chance. To simulate the null hypothesis of no interaction between alleles we randomly paired up the modeled BL6 and JF1 counts, mimicking cells that contain RNA from both alleles. When we compared this simulation to the real data we consistently observed that the modeled per cell BL6 and JF1 counts were less correlated than the real pairwise measurements (Fig 3G and S10 Fig). Thus, while in our measurements cells with high BL6 expression typically also expressed JF1 at higher levels (compare heterozygous data in Figs 2B and 3G top panel), in the modeled data BL6 and JF1 counts showed little correlation. This was also true when we incorporated false detection events in our model to account for possible incorrect allelic assignment, as we had done in the "all-or-none" model (S11 Fig). Together, our transcription site measurements and simulations showed that the observed allele-specific single-cell RNA counts for Aebp1 were compatible with transcriptional bursting of the two gene copies individually and that expression from the two alleles was correlated.
Thus, for Aebp1, Lyplal1 and Mpp5 the observed monoallelic expression in some single cells can likely be explained by low levels of transcription occurring randomly from the two gene copies without having to invoke a special mechanism that limits expression to one of the alleles.

Discussion
There has been great interest in recent years to precisely measure expression from the two alleles of a gene in diploid cells, ideally directly in tissue and at the single-cell level. The RNA fluorescence in situ hybridization method described here is a step in this direction: by visualizing endogenous SNVs it enables the assignment of single RNA molecules to their allele of origin in single cells and in the context of whole tissue sections. We now provide quantitative information about cell-to-cell heterogeneity with single transcript resolution, which is an extension of our previous work, where we used this method for a more qualitative assessments (i.e. presence-absence) of parental origin of mRNA in tissue [52]. When we applied our method to autosomal genes we observed that individual cells in heterozygous tissue spanned the entire range from all RNAs originating from the BL6 chromosome through various more mixed populations to all RNAs originating from the JF1 chromosome. These observed allelic imbalances were not due to the parental origin of the gene copies, because reciprocal crosses (BL6xJF1 vs JF1xBL6) showed similar results. We therefore asked what model could explain the observed single-cell allelic imbalance pattern and combined our data with computational simulations to address this question. We found that the observed allelic distributions could be recapitulated by a model where transcription occurred randomly from the two alleles, perhaps with moderate transcriptional bursting (e.g. in the case of Aebp1). Thus, we did not have to invoke a special mechanism that restricted expression to only one allele to explain the presence of cells with either BL6 or JF1 monoallelic expression status.
Our results suggest that cells with RNA expression from only one of the alleles occur due to the low levels of expression and thus the limited number of random sampling events from the two gene copies. Because transcription is a dynamic process this state is likely transient so that while a cell may have mRNA from only one allele at a particular time, it may gain mRNA from the other allele a short time later if another transcriptional event occurs. For Mpp5, this conclusion is in line with the fact that the reported haploinsufficient phenotype is thought to be caused by overall reduced dosage [55,56] rather than by a special subpopulation of cells with monoallelic expression. More generally, this scenario links the observation of transcriptional bursting with that of random monoallelic expression, as put forward by Sandberg et al [43,44], and explains why we observed the co-occurrence of cells with one and two transcription sites in the same population. In the case of Aebp1 and Lyplal1, which had previously been identified to display random monoallelic expression, our data suggest that no additional mechanism is needed to explain the presence of cells with monoallelic expression in kidney tissue. However, we cannot extrapolate these results to other tissues and mouse strains, given that random monoallelic expression is tissue specific. Thus, we cannot conclude whether mechanisms for allele-specific regulation or for maintenance of monoallelic status are present in the mouse strains and cell types used in the studies that originally identified genes with random monoallelic expression [37][38][39][40][41]. Interestingly, though, those studies also observed that many genes with monoallelic expression in one clonal cell line are expressed biallelically in others and that the expression of a given gene is often lower in cell lines with monoallelic expression than in those with biallelic expression [38][39][40], which is consistent with our results.
In addition to the data on Aebp1, Mpp5 and Lyplal1, we also demonstrated our ability to distinguish Xist expression from the BL6 vs JF1 chromosome and to assess the spatial relationship of cells expressing different parental alleles. We detected a spatially fairly mixed population in the kidney, where cells expressing Xist from the same chromosome clustered together in small patches in transverse sections. This is contrary to other tissues, such as intestinal crypts or the skin [58,59,61,64], and suggests either a larger number of kidney precursor cells or extensive cell migration during development. Because our method only provides a snapshot in time we cannot easily distinguish between these scenarios (lineage vs. migration), especially given that the kidney is a complex organ, composed of cells originating from different embryonic lineages that undergo extensive migration during development even after birth [65,66]. Regardless of the developmental mechanism, however, our data indicate that there is likely no major spatial segregation due to X chromosome inactivation in this tissue, and in the case of mutations, any phenotypic effects would be fairly evenly distributed.
Through the examples detailed above we have shown how our method can be used to directly quantify cell-to-cell differences that arise due to differential expression from the two alleles in diploid cells. Our approach overcomes multiple limitations imposed by previous methods: First, because this method enables sensitive SNV-specific detection of even single mRNA molecules it provides more information than RNA FISH measurements that rely solely on quantifying the number of transcription sites in individual cells [36,37,40,41,67]. Second, although our approach tests only single genes in a given experiment and thus has much lower throughput than single-cell sequencing-based methods, it relies on direct detection of transcripts and is therefore not subject to subsampling and dropout, which complicate the interpretation of sequencing-based cell-to-cell variability results [44,46,[68][69][70]. Finally, by making use of pre-existing endogenous SNVs it eliminates the need for genetic manipulation, for example to label the gene of interest with fluorescent tags, as has been done to measure X chromosome inactivation choice [61,71] or to monitor the transcription of autosomal genes from the two chromosomes [9,72]. Moreover, because the breeding history for classical inbred mice has lead to extended regions of shared ancestry (and shared SNVs) between different strains [73,74], a probeset developed for one strain can often easily be adapted for another strain. For example, while we measured Xist expression from the BL6 and JF1 allele, the probes were designed so that they should distinguish equally well between the 129-strains and CAST/EiJ, which are also commonly used to study strain-specific expression.
In addition, while our quantitative analysis focused solely on genes with a relatively high number of SNVs and high mean colocalization rates (>50%), it should be noted that we did not systematically explore the relationship between SNVs and colocalization rates, and also that lower colocalization can be sufficient to address specific questions, as was demonstrated in a recent single cell in situ analysis of A-to-I RNA editing [75]. It is therefore likely that depending on the scientific question, less stringent cutoffs can be applied to colocalization rates and/or the number of SNVs required. In addition to the number of SNVs/allele-specific probes, the applicability of our method also depends on the expression level of the gene of interest. Based on the genes tested here, our method is applicable for genes with low-to-moderate expression levels, corresponding to approximately 2-25 FPKM in bulk sequencing data (see Methods). The lower boundary for this estimate is set by Churc1, a gene that is consistently expressed at low levels in all cell types of the kidney (0.1-1 UMI/cell,~2 FPKM). It is technically possible to determine colocalization events at this mRNA density, though a large number of cells needs to be screened for robust quantification. The upper boundary is based on the observation that a high density of transcripts makes it impossible to precisely assign colocalization. In our experiments only Podxl in podocytes showed such high expression levels. Based on the relatively low proportion of podocytes in the kidney we determined that this matched an expression level of >800FPKM in these cells (based on bulk sequencing data) and an average count of 6.9 UMI/cell in single-cell sequencing data. Importantly, given the higher detection efficiency of RNA per cell, allele-specific single-molecule RNA FISH provides a valuable complementary tool to single-cell transcriptomics for low-to-moderate expression level genes, and will likely be useful in confirming findings made by sequencing.
In conclusion, we demonstrated how quantitative measurement of allele-specific expression in tissue could be used to directly determine the level of allelic imbalance in single cells. By combining these measurements with modeling, we showed that random monoallelic expression could arise in vivo by chance alone. Beyond this application, our methods could have a number of additional uses. Similar analyses could be performed in other tissues and, for example, could enable the evaluation of genetic variants directly in the tissue believed to be affected if there are genic SNVs in linkage with those variants or to study mutations thought to lead to haploinsufficiency. Furthermore, with single cell resolution, our method allows for the interrogation of particular cellular subtypes within a tissue. In concert with recent genome-wide association studies in single cells [76,77], this technique provides a useful tool for quantitative assessment of allele-specific genetic effects.

Mice, tissue harvest, sectioning and fixation
C57BL/6J and JF1/Ms founder mice were obtained from Jackson Laboratories. All mouse work was conducted in accordance with the University of Pennsylvania Institutional Animal Care and Use Committee. For tissue collection we used either homozygous C57BL/6J or JF1/ Ms pups, or F1 heterozygotes from both C57BL/6J x JF1/Ms or JF1/Ms x C57BL/6J crosses. We dissected pups at postnatal day 4 using standard techniques, and mounted tissues in Tissue-Plus O.C.T. compound (Fisher Healthcare), flash-froze them in liquid nitrogen or on an aluminum block in dry ice, and then stored tissues at −80˚C. We determined sex of the animals by visual inspection and verified this by SRY-specific PCR on DNA extracted from a tail sample, collected during dissection. Tissues were cryosectioned at 5 μm using a Leica CM1950 cryostat. We adhered tissue samples to positively charged Colorfrost plus slides (Fisher Scientific), washed slides in PBS, fixed them in 4% formaldehyde for 10 min at room temperature, then washed again in PBS two times. Fixed slides were stored in 70% ethanol at 4˚C.

Gene selection
We shortlisted genes that had previously been identified in studies of random monoallelic expression, genes with known function and/or phenotypes in kidney and genes with documented expression in different cell types in the kidney. We then filtered shortlisted genes for expression levels of >20 TPM in publicly available bulk kidney sequencing data [78], accessed via Expression Atlas (https://www.ebi.ac.uk/gxa/). We later established that this corresponded to expression levels of >10FPKM in bulk and 0.1-1 UMI/cell (in the tissue(s) with highest expression; S3 Fig and S9 Table) using additional bulk [79] and single-cell [62] transcriptome data, respectively. The only gene with lower expression was Churc1 (TPM <10, FPKM~2.2), for which we were nevertheless able to robustly detect transcripts using the guide probes (in agreement with a count of 0.1-1UMI/cell in all cell types of the kidney), and identify colocalization events above random (pixel-shifted) controls. The only gene with higher expression was Podxl with a UMI of 6.9 per cell in podocytes.

Probe design, synthesis and labelling
To identify exonic SNVs between the C57BL/6J and JF1/Ms strains we used the NIG Mouse Genome Database (http://molossinus.lab.nig.ac.jp/msmdb/index.jsp) [80]. For Aebp1 and Lyplal1 we confirmed these SNVs through PCR amplification and sequencing of exonic sequences of genomic JF1/Ms DNA. All guide probes and the Aebp1 intron probe set were designed using the Stellaris probe designer (Biosearch Technologies), SNV-specific probes were designed as specified in Levesque et al. [50] and mask oligonucleotides were selected to leave a 7-11bp overhang (toehold) sequence (all probe sequences available in S6 Table). Guide probes were purchased labeled with Cal fluor 610 (Biosearch Technologies), while SNV-specific probes and intron probes were ordered with an amine group on the 3 0 end. For these latter probes we pooled the oligonucleotides for each probe set and coupled them to either NHS-Cy3 or NHS-Cy5 (GE Healthcare) for the allele-specific probesets, or NHS-Atto488 or NHS-Atto700 (Atto-Tec) for the intronic probes. We purified dye-coupled probes by highperformance liquid chromatography. Mask oligonucleotides were used unlabelled.

DNA extraction, gene sequencing and sex-specific genotyping
DNA was extracted from tail biopsies using a quick-lyse protocol: 100μl of Solution A (25mM NaOH and 0.2mM EDTA) were added to the tissue and kept at 95˚C for 30 min, before adding an equal volume of Solution B (40mM Tris, pH = 8). Samples were then spun at 6000 rpm for 10 min and 100μl of the top layer was transferred to a fresh tube. 1μl of this solution was used as template for PCR. To verify the presence of reported SNVs in Aebp1 and Lyplal1, we designed primers for the exonic segments of these genes (primer sequences available in S7 Table), and PCR-amplified genomic DNA using AmpliTaq Gold (ThermoFisher) with buffer II and 0.25mM MgCl 2 according to the manufacturer's instructions. PCR amplicons were purified with ExoSAP-IT (ThermoFisher) and submitted for sequencing to the University of Pennsylvania DNA sequencing facility. For sex-specific genotyping of pups we used Sry-specific primers (S7 Table), since this gene is located on the Y chromosome and thus amplicons can only be detected in male tissues. PCR was performed as for sequencing, and the presenceabsence of a product was revealed on a gel.

Allele-specific RNA fluorescence in situ hybridization of tissues
Allele-specific RNA fluorescence in situ hybridization works by first detecting the mRNA of interest (regardless of the allele of origin) using conventional single molecule RNA FISH probes labelled in one color (guide probes), and then colocalizing this signal with probes that discriminate specific single-nucleotide differences based on a "toehold probe" strategy and which are labelled in colors unique to the two different alleles. In this way, mRNAs are essentially "tagged" as being either from one or the other parental chromosome. In cultured cells, this approach can successfully distinguish RNA variants that contain just one single nucleotide variant (SNV) and thus can only be targeted with a single variant-specific probe [50,51], but the decreased signal-to-noise ratio makes reliable detection of single probes more difficult in tissue [81]. We therefore opted to work with C57BL/6J (BL6) and JF1/Ms (JF1) mice, which belong to two different Mus musculus subspecies (domesticus and molossinus, respectively) [80,82]. Due to their distant relationship, JF1 mice harbor a large number (>50,000) of SNVs in genic regions compared to BL6 [80], allowing us to target most genes with multiple SNVspecific probes.
For each gene of interest we first prepared a probe mix, containing a guide probe set (labelled with Cal fluor 610), the two allele-specific probe sets (labelled in Cy3 and Cy5, respectively) and a set of mask oligos (unlabelled, in 1.5x excess of the allele-specific probes) in hybridization buffer (10% dextran sulfate, 2× SSC, 10% formamide). For detection of nascent Aebp1 RNA we also included intronic probes labelled either with Atto488 or Atto700, and to verify the integrity of RNA in male tissues stained for Xist we also included Gapdh probes (labelled with Atto488). To stain the samples, we first washed the slides with tissues sections in 2x SSC, then incubated them in 8% SDS for 2 minat room temperature, washed again in 2x SSC and finally added the hybridization buffer with probes. Slides were covered with coverslips and left to hybridize overnight in a humidified chamber (ibidi) at 37˚C. The next morning we performed two 30 min washes in wash buffer (2× SSC, 10% formamide), the second one including DAPI to stain nuclei. To label cell membranes (to clearly identify single cells) the first wash was sometimes substituted with a 15 min incubation in wash buffer containing wheat germ agglutinin coupled with Alexa488 (LifeTech) and a 15 min regular wash. After the final wash, slides were rinsed twice with 2x SSC and once with antifade buffer (10 mM Tris (pH 8.0), 2× SSC, 1% w/v glucose). Finally, slides were mounted for imaging in antifade buffer with catalase and glucose oxidase [83] to prevent photobleaching.

Imaging
We imaged all samples on a Nikon Ti-E inverted fluorescence microscope using either a 60x or a 100× Plan-Apo objective and a cooled CCD camera (Andor iKon 934). For whole-tissue scans we imaged at 60x and used Metamorph imaging software (Scan Slide application) to acquire a tiled grid of images. We used the Nikon Perfect Focus System to ensure that the images remained in focus over the imaging area. For 100× imaging, we acquired z-stacks (0.3 μm spacing between stacks) of stained cells in six different fluorescence channels using filter sets for DAPI, Atto 488, Cy3, Calfluor 610, Cy5, and Atto 700. The filter sets we used were 31000v2 (Chroma), 41028 (Chroma), SP102v1 (Chroma), 17 SP104v2 (Chroma), and SP105 (Chroma) for DAPI, Atto 488, Cy3, Cy5, and Atto 700, respectively. A custom filter set was used for CalFluor610 (Omega).

Xist image analysis and modelling
For Xist image analysis we worked with whole tissue scans, where we had collected data for Cal fluor 610 (Xist guide probes), Cy3 and Cy5 (BL6 and JF1 probes, respectively) and DAPI (nuclei). To visualize scans, we used the "Grid/Collection stitching" feature available in Fiji [84] to assemble tiles. To identify Xist RNA and assign them an allelic identity we developed a custom pipeline in MATLAB. First, we reconstructed the scan taking into account the tile order provided in a supplementary file. Then, we used the data from the guide channel to detect Xist foci, regardless of allelic identity: we performed background subtraction, removed small objects and smoothened boundaries by border clearing and morphological opening, and then used LoG filtering to sharpen objects, binarized the observed signals and created connected components. Visual inspection of these connected components showed that they largely corresponded to Xist foci, but some areas with high background signal were also being detected as connected components. We therefore applied a number of filters (minimum fluorescence intensities for all RNA FISH channels, minimum cutoff for solidity, maximum area for connected components) to yield the final segmentation. Each obtained spot was then parametrized as the ratio of the signal intensity (background subtracted and normalized to the mean intensity of the scan) of the two SNV probe channels and we applied k-means clustering (2 means) to yield a critical angle above which we assigned spots JF1 identity, and below which we assigned BL6 identity. To verify the quality of these assignments, we designed a graphic user interface to manually annotate Xist foci and their allelic identity. We typically annotated 10 or more randomly selected tiles and the results of this quality control step are shown in S3 Table. On average~90% (mean 90.9%, standard deviation 5%) of Xist foci were correctly detected, while the remaining 10% of identified spots were areas of high background intensity that had been miscategorized as Xist foci. When Xist spots were correctly identified, typically more than 90% were assigned the correct allelic identity (mean 94.4%, standard deviation 5%).
To assess spatial patterns of Xist allelic choice we then used the positional and identity information from our automatic assignments, and developed a metric for spatial heterogeneity. First, we tiled images into regular rectangles of equal size (i.e. 16 tiles all 1/16 of the full scan size). For each rectangle, we calculated the fraction of cells expressing Xist from the BL6 allele. Next, we obtained the variance of these BL6 cell fraction values across all rectangles of a given size. This protocol was repeated for different sizes of rectangles ranging from 16 to 256 rectangles spanning the entire tissue section. We also calculated a baseline for spatial heterogeneity of random allelic choice by repeating this analysis on 1000 random permutations of the data for each sample generated by MATLAB's randperm. We performed a similar analysis to determine the minimal cluster size of Xist foci with identical allelic identity, but instead of random permutations we generated simulations, where kidney sections were randomly seeded with clusters of a fixed size (ranging from 1 to 10) while keeping the allelic ratio the same as for the measured data. For each seed size we generated 500 simulations. To obtain a likely minimal cluster size for cells with identical X chromosome inactivation we selected the seed size whose variance deviated least from the variance observed for the real data. We repeated this process for each subdivision size and determined the mean across all subdivision sizes.

Allele-specific colocalization analysis of single mRNA spots
For analysis of single molecule RNA spots we used a combination of 60x whole tissue scans in DAPI and Cal fluor 610 to determine the overall structure of the tissue and collecting z-stacks at 100x resolution of 5-10 individual positions within that tissue to identify individual mRNA molecules and characterize their allelic identity. To determine allelic identity we first segmented and thresholded images using a custom MATLAB software suite (downloadable at https://bitbucket.org/arjunrajlaboratory/rajlabimagetools/wiki/Home, changeset: d278b7d0012282ecb318fde3bebbe3beaba62032). To quantify colocalization rates we first determined the ideal colocalization radius for each gene. To do so, we segmented extended areas of the tissue (typically containing 10-50 cells). To ascertain subpixel-resolution spot locations the software then fitted each spot to a two-dimensional Gaussian profile specifically on the z plane on which the spot occured. Next, colocalization between guide spots and allele-specific spots was determined in two stages. In the first stage, we searched for the nearest-neighbor allele-specific probe for each guide spots within a 2.5-pixel (360-nm) window and ascertain the median displacement vector field, which was subsequently used to correct for chromatic aberrations. After this correction, we tested a range of different radii (r = 0.1 to 2.5 pixel) for each gene to calculate colocalization rates for the real data, as well for pixel-shifted data, where we took our images and shifted the guide channel by adding 2 � r pixels to the X and Y coordinates. This pixelshifted data was used to test random colocalization due to spurious allele-specific spots. For each gene we then visually inspected colocalization rates for real and pixel-shifted data at the different radii and determined a radius where both the colocalization rate for the real data and the difference between the real and the pixel-shifted data was maximal. The selected colocalization radii for each gene are included in S4 Table. To obtain allele-specific data for single cells we then repeated the colocalization analysis, but segmentation of cells was done by drawing a boundary around nonoverlapping individual cells using brightfield or wheat germ agglutinin signal, and colocalization was determined using only the previously determined ideal colocalization rate.

Analysis of transcription sites
Transcription site analysis was performed using a custom MATLAB software suite (downloadable at https://bitbucket.org/arjunrajlaboratory/rajlabimagetools/wiki/Home). For this, we segmented cells, thresholded RNA FISH signal and identified transcription sites for Aebp1 by co-localization of spots in the intron and exon channel. Relative fluorescence intensities of transcription sites vs cytoplasmic RNAs were determined based on the fluorescence intensity of the guide probes using custom scripts written with R packages dplyr and ggplot2.

Analysis of SNV probe properties
To determine whether any biophysical properties could differentiate between allele-specific probes that had high vs low colocalization rates, we compiled a table containing the the following parameters (S8 Table): probe name, probe sequence, colocalization rate (the colocalization rate determined for an entire probeset was applied to each individual probe), number of predicted secondary structures and folding energies. The latter two parameters were extracted by running sequences on the mfold web server [85] for DNA probes, with Na concentration set to 0.3M. Frequency of individual nucleotides, dAT, dGC, purines and pyrimidines was determined through analysis of the probe sequences.

Analysis of dye effects
Since different fluorescent dyes have different detection sensitivity, we tested how this impacted our findings by performing "dye-swap" experiments in which we use the same probes but swap the fluorophore moieties used to label the probes for the BL6 and JF1 alleles. This enables us to measure the extent to which the labels on the probes affected probe binding efficiency and what the consequences were for our results and interpretations. In these experiments we probed Aebp1, Lyplal1 and Mpp5 expression in BL6 and JF1 homozygous tissues with allele-specific probes labelled either with Cy3 for BL6 and Cy5 for JF1, or the reverse (S6A Fig). The unique detection rate (i.e. the rate at which we could assign spots either a BL6 or a JF1 identity) differed by 7% or less between the dye combinations for all three genes, with the exception of Mpp5 in BL6 homozygous tissue. Similarly, the false detection rate, which includes both the error from incorrect probe binding and the differential detection sensitivity, deviated by less than 4% between the dye combinations for all three genes, with the exception of Mpp5 in BL6 homozygous tissue. For Mpp5 in BL6 homozygous tissue we observed that the BL6 allele was detectable at higher efficiency when labelled with Cy5 (66%) than when labelled with Cy3 (50%). We therefore also collected dye-swapped data for Mpp5 in heterozygous tissues (S6B Fig) and performed the "all-or-none" and "coin flip" simulations on both dye combinations ( S6C Fig and Fig 3B-3D). In addition, we used the empirically determined false detection rate for each dye combination and gene for all of our analysis and modelling of allelic imbalance in heterozygous tissue to avoid artefacts due to variable sensitivity of detection for different dyes.

Analysis of how colocalization rate affects allelic measurements
Given that for all three genes that we studied in more detail we could determine BL6 or JF1 identity for only approximately 50% of RNAs, we measured how colocalization rate impacted our ability to measure single-cell allelic imbalance. This was difficult to do in tissue, because the low number of RNAs per cell provided a limited dynamic range to probe this effect (i.e. if 1 out of 2 RNAs can not be assigned an identity, this results in 50% detection efficiency). We therefore collected extensive data for Aebp1 in primary fibroblasts, where Aebp1 is expressed at higher levels (10s to 100s of RNAs per cell) to determine the general effect of the colocalization rate on allelic ratios. In homozygous fibroblasts we found that there was an inverse relationship between colocalization rate and false detection rate, so that higher colocalization rate corresponded with a lower false detection rate and a higher rate of calling the correct allele (S5A Fig). Most likely this was caused by low colocalization rates being indicative of other technical issues that complicate determining colocalization rate (such as high background fluorescence). We observed this effect regardless of the dye combination used, but also noted that above a colocalization rate of~0.5 (the overall colocalization rate in tissue) our ability to detect the correct allele was consistently high (above 75%). Next, we used 10 heterozygous fibroblast cells with the highest colocalization rates. The cells all had colocalization rates >70% and randomly downsampling RNAs to 65, 60, 55, 50 or 45% colocalization rate (1000 random sampling per condition) showed the the mean allelic ratio for each downsampling was similar to the original measured ratio, but the variability around that mean increased with lower colocalization rates. The biggest change in the allelic ratio that we observed was~0.2 (S5B Fig). Next, we also tested how detection efficiency influenced our conclusions about the "all-ornone" or "coin flip" scenario in tissue. For this, we repeated both simulations using the total RNA count for each cell (i.e. including unclassified RNAs and RNAs that colocalized with both allele-specific probes). We then randomly downsampled the RNA in each cell to the number of RNAs that had been assigned a unique BL6 or JF1 identity in the original measurement. Each simulation was performed 5000 times. We found that the log likelihoods and correlations from these downsampled simulations were essentially indistinguishable from the original simulations performed without downsampling (S5C Fig).

Analysis and modelling of single-cell allelic outcomes
To quantify cell-to-cell variability of allelic state in single tissue cells, we extracted colocalization data from our image analysis pipeline, and used this for further analysis. Using this data, we first compiled a quality control table for each experiment (S5 Table) and excluded those where colocalization rates were <40% (4 out of 35 experiments). For all remaining data we combined replicates from the same genotype, and in the case of heterozygous data, combined results from C57BL/6J x JF1/Ms and JF1/Ms x C57BL/6J tissues. We then processed and visualised single-cell results using custom scripts written with R packages dplyr and ggplot2.
To determine how the observed data compared to random monoallelic expression (all-ornothing scenario) or binomially distributed (coin-flip scenario) allelic calls we simulated those scenarios through modelling. For the binomial distribution we considered a null model wherein all heterozygous cells share the same allelic ratio, which was determined to be the overall allelic ratio observed at the population level. Then, for an experiment with overall estimated C57BL/6J allelic ratio equal to p BL6 (above), we let n BL6 j be the number of transcripts with C57BL/6J identity detected in cell j and n JF1 j be the number the number transcripts with JF1/Ms identity detected in cell j. Under the null model, n BL6 j was drawn from a binomial with (n BL6 j + n JF1 j ) draws and probability p BL6 . We simulated single-cell label counts for cells by drawing from these conditional null distributions for each cell 10,000 times. We then compared the negative log-likelihood of the observed data with the distribution of negative loglikelihoods of each simulation iteration.
To simulate random monoallelic expression each cell was assigned either a BL6 or a JF1 identity, based either on the majority of RNAs in a cell, or based on random assignment if both alleles had the same count. We then designated all RNAs in a given cell to the same allelic identity (eg a cell that originally contained 4 BL6 RNAs and 2 JF1 RNAs would be assigned a BL6 identity with 6 BL6 RNAs). Next, we randomly added "technical noise" 10,000 times, by changing some RNAs in the population to the opposite identity, based on the false positive rates measured in the original BL6 and JF1 homozygous populations. These steps were performed while keeping the final overall RNA assignments in the population the same as the original heterozygous population. For the 10,000 simulations we then calculated negative log likelihoods similarly as we did for the binomial distributions, assuming two separate null models for the BL6 and JF1 populations, whose parameters were determined by the original homozygous population.
To assess if the measured mRNA distributions were compatible with a transcriptional bursting scenario we used the negative binomial distribution to simulate expected mRNA counts [86]. First, we determined the burst size and frequency of the BL6 and JF1 alleles separately, by using the moments method to determine r and p parameters of the negative binomial distribution based on the mean and variance of our measurements (where p = mean/variance and r = mean^2/(variance-mean)), from which we obtained the burst size and frequency using: burst_size = (1-p)/p and burst_frequency = r. We then generated 10,000 RNA counts for the two alleles separately by drawing from a distribution with the r and p parameters we had calculated. We visualized the obtained mRNA counts for both alleles individually using a randomly selected simulation, and also calculated the negative log-likelihood distribution of the 10,000 simulated datasets. Next, we randomly paired the data for the two alleles to generate "cells" with RNA counts from both alleles and calculated the correlation between the BL6 and JF1 counts in each of these modeled cells. In addition to using the negative binomial parameters that we had calculated from our data, we also tested a series of additional burst sizes (from 0.5 to 5 RNAs per burst) and repeated the entire analysis, which showed that our findings were consistent across a range of burst values (see S10 Fig). Finally, to generate a model which included BL6-JF1 correlations that arise due to false assignments, we used the randomly paired data and changed some RNAs in the population to the opposite identity based on the false positive rates measured in the original BL6 and JF1 homozygous populations (one round of reassignments for each simulation). Following reassignment we again calculated the correlation between the BL6 and JF1 counts.

Reproducible analyses
Raw and processed data, as well as scripts for all analyses presented in this paper, including all data extraction, processing, and graphing steps are freely accessible from the Open Science Framework (URL https://osf.io/sbjcw/?view_only=09393298e00e4b8c9d0dd1b24b0318d9).

Ethics statement
All animal studies were approved by the Institutional Animal Care and Use Committee at the University of Pennsylvania (IACUC protocol 805433). The animals used in this study were treated humanely and with regard for alleviation of suffering. Two of the heterozygous kidney sections are technical replicates (different kidney sections from the same animal), which is indicated by an asterisk (" � "). BL6 allelic assignment is depicted in turquoise, JF1 allelic assignment is depicted in orange. B. For all heterozygous samples we calculated spatial heterogeneity using a variance metric, the method of which is schematized: sections were subdivided into a grid, using increasingly smaller squares (from 8x8 to 16x16) and for each subdivision we calculated the ratio of BL6 Xist foci. For each grid we then also calculated the variance of the BL6 ratio across all squares of that grid. C. The measured variance (red line) was compared to the variances obtained for samples where we randomly permuted allelic assignments 1000 times (black line, error bars representing standard deviation of the modeled results). The graphs show the variance for subdivisions of different sizes, with both the area of the subdivisions and the size of the grid indicated. D. Measured variance (red line) was also compared to the variances of samples where we randomly placed different sized clusters (seeds) of allelically identical Xist foci in the tissue (lines in different shades of grey, error bars representing standard deviation of the modeled results). For each seed size we generated 500 randomizations, keeping the allelic ratio constant. For all heterozygous data shown in A, C and D the order of the samples is kept identical. Overall (A) and allele-specific (B) colocalization rates for different autosomal genes. Overall colocalization rates consider all guide spots that colocalize with either BL6 and/or JF1/C7 allele-specific signal, while allele-specific colocalization counts only those guide spots that colocalize uniquely with either BL6 or JF1/C7 probes. Each spot represent the colocalization rate in one area tested (typically 10-50 cells). All genes were detected with guide probes labelled with Cal fluor 610, and the following allele-specific probes: Aebp1, Mpp5 and Podxl BL6-specific probes labelled with Cy3, JF1-specific probes labelled with Cy5; Churc1 and Lyplal1 BL6-specific probes labelled with Cy5, JF1-specific probes labelled with Cy3; Aqp11 and Stard5 BL6-specific probes labelled with Cy3, probes for the C7 allele labelled with Cy5; Prcp BL6-specific probes labelled with Cy5, probes for the C7 allele labelled with Cy3. Genes are listed in increasing order of number of SNV probes utilized, which is indicated for each gene. C. Probe properties for probe sets with high (>50%) and low (<50%) mean overall colocalization rate. We compared prevalence of individual nucleotides (dA, dC, dG, dT-top row), nucleotides forming three hydrogen bonds (dC+dG) or two hydrogen bonds (dA+dT), purines (dA+dG) and pyrimidines (dC+dT) (middle row), as well as the number of folded structures predicted for each probe, mean and minimum folding energy for each probe (bottom row). For all plots, each spot represents the value obtained for a single probe. Measurements of allele-specific expression in bulk data for Aebp1, Lyplal1 and Mpp5 homozygous tissues. For each gene, allele-specific probes were either labelled with Cy3 for BL6 and Cy5 for JF1 (top row) or Cy5 for BL6 and Cy3 for JF1 (bottom row). Boxes indicate the dye combination used for subsequent experiments, and the pie charts shown in those boxes are the same as displayed in Fig 2B, 2D and 2F. For all plots we combined data from different replicates with more than 40% colocalization rate. B. Single-cell allele-specific expression for Mpp5 in heterozygous tissue and allelic ratios detected in single-cell and bulk tissue data for heterozygous and homozygous tissues. The cells on the heterozygous scatter plot only contained integer numbers of RNA, but we included jitter to better display the density of cells with a given allelic distribution. The two colors on the heterozygous plots from the two reciprocal crosses (i.e. BL6 x JF1 and JF1 x BL6). C. Results of simulations for heterozygous Mpp5 data: for both "all-or-none" and "coin-flip" model we show simulated single-cell RNA counts (left) and negative log likelihood distribution of simulated and observed data (right). Correlation values for simulated counts in the case independent bursting from the two Aebp1 alleles is shown as grey bar plot, correlation for real data is indicated as red line. (TIF) S1 Table. Summary of allelic calls of Xist in kidney tissues. Xist was imaged using Cal-Fluor610 (guide probe), Cy3 (BL6) and Cy5 (JF1) and allelic identity was determined using an automated pipeline as described in the Methods. (XLSX) S2 Table. Closest matching modeled cluster size for Xist at different sized subdivisions.

Supporting information
For each tissue we indicate which cluster size has the closest variance at a given subdivision size. A summary of this data is presented in Fig 2A. (XLSX) S3 Table. Comparison of manual and automatic detection of Xist foci. Using at least 10 randomly selected tiles we indicate what manual assignments we would have given the computationally identified Xist foci. (XLSX) S4 Table. Overview of selected autosomal genes. We show the name and genomic location of genes tested, the strains interrogated and the number of SNP probes used. We also indicate which final colocalization radius we selected, and what the overall colocalization rate as well the unique colocalization rate were at that radius. (XLSX) S5