Transcriptional bursts explain autosomal random monoallelic expression and affect allelic imbalance

Transcriptional bursts render substantial biological noise in cellular transcriptomes. Here, we investigated the theoretical extent of allelic expression resulting from transcriptional bursting and how it compared to the amount biallelic, monoallelic and allele-biased expression observed in single-cell RNA-sequencing (scRNA-seq) data. We found that transcriptional bursting can explain the allelic expression patterns observed in single cells, including the frequent observations of autosomal monoallelic gene expression. Importantly, we identified that the burst frequency largely determined the fraction of cells with monoallelic expression, whereas the burst size had little effect on monoallelic observations. The high consistency between the bursting model predictions and scRNA-seq observations made it possible to assess the heterogeneity of a group of cells as their deviation in allelic observations from the expected. Finally, both burst frequency and size contributed to allelic imbalance observations and reinforced that studies of allelic imbalance can be confounded from the inherent noise in transcriptional bursting. Altogether, we demonstrate that allele-level transcriptional bursting renders widespread, although predictable, amounts of monoallelic and biallelic expression in single cells and cell populations.

Introduction each allele with high precision. We furthermore showed, in vitro and in vivo, that the fraction of monoallelic expression is 48 mainly driven by the frequency of transcriptional bursts rather than burst sizes, whereas allelic imbalance is a consequence 49 of both burst frequencies and size.

51
We first investigated the theoretical impact of transcriptional burst kinetics on random monoallelic gene expression, using the 52 two-state model of transcription ( Figure 1A). Importantly, a vast number of combinations of burst frequencies and sizes 53 results in the same level of mean expression, which is readily observable in scRNA-seq data ( Figure 1B). To investigate 54 where in parameter space monoallelic gene expression patterns occur, we first calculated the probabilities of generating 55 monoallelic expression for two alleles with identical transcriptional kinetics throughout kinetics parameter space. The kinetics inferred for both alleles after filtering (Methods). In these cells, we calculated the fraction of monoallelic expression 68 from both alleles and considered whether the monoallelic expression was affected by the burst kinetics of the genes. We found 69 that the observed fraction of monoallelic expression show the same relationship with the burst kinetics as predicted by theory, 70 with a ridge of monoallelic expression appearing across the parameter space of burst kinetics (Figure 2A). Furthermore,

71
we estimated the probabilities of observing a cell which is either silent, biallelic, monoallelic on CAST or monoallelic on 72 C57 for all genes, assuming that transcription occurs independently on each allele. The predicted fractions of cells in each 73 state were highly correlated with the observed fraction of cells in each category (Spearman = 0.98, 0.99, 0.92 and 0.93 74 respectively), ( Figure 2B) demonstrating that modelling transcription using the two-state model at each allele independently 75 is in agreement with experimental allelic expression analyses by scRNA-seq.

76
The theoretical results indicated that changes in aRME can be achieved by modulation of either burst frequency or size. To 77 investigate which parameter of transcriptional bursting is the most decisive factor for aRME in the cell, we examined the profile 78 of burst frequency and size in relation to monoallelic expression to isolate their relative contributions. The burst frequency of 79 the CAST and C57 alleles compared to their fraction of monoallelic expression showed a striking relationship ( Figure 2C). 80 At lower burst frequencies, we observed very low amounts of monoallelic expression because there was predominantly no 81 expression from either allele in any cell. The fraction of monoallelic expression increased as the burst frequency increased, 82 up until the point where biallelic expression became the predominant observation and monoallelic expression then declined.

83
This relationship was also clear in the theoretically predicted case which demonstrated that our model predictions were 84 consistent with the biological data (Supplementary Figure 1). The same analysis on burst size showed that the distribution of 85 monoallelic expression was almost uniform over burst size with a tendency of genes with large burst sizes to have more biallelic 86 expression ( Figure 2D). Therefore, while burst size has the theoretical capability to influence the amount of monoallelic 87 expression in cells, it plays a minor role relative to burst frequency. The predominant role of burst frequencies in determining 88 monoallelic gene expression can also be seen in the slope of the ridge of monoallelic expression ( Figure 1D, Figure 2A) 89 To extend the inference and analyses of transcriptional burst kinetics to cell types in vivo, we sequenced individual cells  (Supplementary Figure 2). 94 Due to the heterogeneous cellular composition of certain clusters, we could quantify how well each cell-type cluster 95 predicted its own biallelic expression based on the model of independent allelic transcriptional bursting (Methods). We 96 anticipated that high heterogeneity within a cell cluster would show an underestimation of predicted biallelic expression due 97 to subsets of heterogeneous cells with higher burst frequency for certain genes. Indeed, the median observed-to-expected  To examine the potential of allelic-expression modelling as an unbiased method to assess the degree of heterogeneity 101 within groups of cells, we first examined housekeeping genes as they would have less cell-type-specific transcriptional burst 102 regulation compared to other genes and thereby have observed biallelic call frequencies closer to the expected value (an O/E 103 ratio closer to 1). Indeed, housekeeping genes had a significantly lower O/E ratio compared to randomly selected subsets 104 of genes, and were close to the ratio observed in the fibroblast cells ( < 10 −5 , permutation test, Figure 3B). Importantly, permutation test, Figure 3C). Therefore, the observed-to-expected biallelic expression due to transcriptional bursting may be 108 a valuable metric to quantify the heterogeneity within an assigned cell cluster.

109
Investigating gene expression at the discrete level of monoallelic and biallelic expression is natural at the single-cell level 110 as it is frequently observed due to transcriptional bursting. However it is also important to assess the consequences of on either allele, which is predictably related to the burst frequencies of the two alleles of the gene (Supplementary Figure 3). 115 Most genes have very similar kinetics between the two alleles and therefore a close to equal probability of unequal expression 116 for each allele, as measured by ( 57 > | 57 ≠ ) (Supplementary Figure 3). 117 The probabilities were in good agreement with the observed fractions of allelic bias ( Figure 4A). By comparing the fold 118 changes in burst size and frequency between alleles to their observed fraction of allelic bias, we found that the relative 119 differences in transcriptional burst kinetics in burst frequency as well as size tended to affect allelic bias for that gene 120 ( Figure 4B). Interestingly, simultaneous relative changes in both burst frequency and size may cancel each other out. For 121 example, a reduction in burst size may be compensated by an increase in burst frequency ( Figure 4B; visualized along the 122 diagonal of the scatter plot). By using linear regression with allelic bias as the dependent variable, we determined that relative  To determine the extent to which transcriptional bursting may give rise to false positives in studies of allelic imbalance mean expression, we find that it is only for low-expressed genes that false positive allelic imbalance becomes frequent, and 134 this declines as the number of simulated cells increases (Figure 5C).

136
In this study, we explored if transcriptional burst kinetics can explain observed patterns of random monoallelic gene expression  There is current great excitement in using single-cell RNA-sequencing to identify and characterize cell types, sub-types cell clusters from mouse skin show an observed-to-expected biallelic expression ratio which is lower than what would be 162 expected by chance. If there is heterogeneity with regards to the kinetic bursting parameters between the cells we would 163 expect to observe biallelic expression at a much higher frequency than what would be predicted and therefore a higher biallelic

164
O/E ratio. In model organisms, this can easily be achieved by profiling cells of F1 offspring of characterized parent strains.

165
However, phased haplotypes may also be obtained for human cells in the transcribed parts of the genome from the single-cell 166 RNA-sequencing data itself (Edsgärd et al., 2016), which would allow for such modelling.

167
Transcriptional bursting results in considerable cellular heterogeneity in cases of two functionally different alleles (e.g. see

177
In the off state, the gene can turn on with the rate . In the on state, the gene can turn off with rate and produce one RNA transcript with the rate . Regardless of the state, one RNA transcript can be degraded with rate . At the steady state of this process, the stationary distribution can be shown to be described by the Poisson-beta distribution, in which we let The resulting marginal distribution ( | , , ) is the probability distribution for the amount of RNA transcripts observed 183 at steady state given the rates , and .

184
Inference of transcriptional burst kinetics. 185 We inferred kinetic bursting parameters for 7,191 and 7,186 genes for the C57 and CAST allele respectively from 224 F1 186 cross-breed (CASTxC57) adult tail fibroblasts. We then applied a goodness-of-fit test to these parameters to assess how 187 well the parameters describe the data and found that 5,586 and 5,601 genes fit the model for the C57 and CAST alleles These probabilities assume that the alleles burst independently. Since our probabilities closely follow the observed 205 fractions this assumption seems to be correct. The contour plots in Figure 1C and Figure 1D are based on 100x100 206 parameter combinations where is varied to change burst size while is held constant at 100.

207
For each gene, we calculated the fraction of no expression, monoallelic on C57, monoallelic on CAST and biallelic expression 208 by averaging the following conditional statements over the cells where refers to the number of actual UMI counts for that 209 allele in that cell: We then calculate the observed number of cells with biallelic expression for that gene, ( ), to combine them to obtain 246 ( )∕ ( ). The list of housekeeping genes was obtained from (Li et al., 2017). 247 Ordinary least squares regression of the effect of burst kinetics on allelic bias. 248 We used the OLS module of the statsmodels package in Python with the formula: 249 10 57 > > 57 = 1 10 ( 57 ) + 2 10 ( 57 ) + 3 10 ( 57 ) ⋅ 10 ( 57 ) + where bf is burst frequency and bs is burst size.

250
For Figure 5 we used the burst kinetics parameters inferred from the C57 allele and simulated observations for each gene twice 252 for a varying number of observations ( = 10, 20, 50, 100, 1000, 10000 observations). We then calculated max( ∑ ( 1 , 2 ), ∑ ( 2 , 1 ))∕ 253 for each gene where 1 and 2 are the observed values for the th simulated pair of observations and