Slow nucleosome dynamics set the transcriptional speed limit and induce RNA polymerase II traffic jams and bursts

Nucleosomes are recognized as key regulators of transcription. However, the relationship between slow nucleosome unwrapping dynamics and bulk transcriptional properties has not been thoroughly explored. Here, an agent-based model that we call the dynamic defect Totally Asymmetric Simple Exclusion Process (ddTASEP) was constructed to investigate the effects of nucleosome-induced pausing on transcriptional dynamics. Pausing due to slow nucleosome dynamics induced RNAPII convoy formation, which would cooperatively prevent nucleosome rebinding leading to bursts of transcription. The mean first passage time (MFPT) and the variance of first passage time (VFPT) were analytically expressed in terms of the nucleosome rate constants, allowing for the direct quantification of the effects of nucleosome-induced pausing on pioneering polymerase dynamics. The mean first passage elongation rate γ(hc, ho) is inversely proportional to the MFPT and can be considered to be a new axis of the ddTASEP phase diagram, orthogonal to the classical αβ-plane (where α and β are the initiation and termination rates). Subsequently, we showed that, for β = 1, there is a novel jamming transition in the αγ-plane that separates the ddTASEP dynamics into initiation-limited and nucleosome pausing-limited regions. We propose analytical estimates for the RNAPII density ρ, average elongation rate v, and transcription flux J and verified them numerically. We demonstrate that the intra-burst RNAPII waiting times tin follow the time-headway distribution of a max flux TASEP and that the average inter-burst interval tIBI¯ correlates with the index of dispersion De. In the limit γ→0, the average burst size reaches a maximum set by the closing rate hc. When α≪1, the burst sizes are geometrically distributed, allowing large bursts even while the average burst size NB¯ is small. Last, preliminary results on the relative effects of static and dynamic defects are presented to show that dynamic defects can induce equal or greater pausing than static bottle necks.


Introduction
Over the last 20 years, there has been a significant effort to explain stochasticity in molecular pathways focusing especially on the regulation of transcription and translation [1][2][3][4][5][6].
Most of the theoretical studies were concentrated on regulation of gene activity by means of binding and dissociation of transcription factors. However, a wide variety of assays have been developed to investigate epigenetic features that affect transcription by means of chromatin accessibility (ATAC-seq [7], DNASE-seq [8], FAIRE-seq [9]), histone modifications (Mint-ChIP [10] and ChIP-mentation [11]), or DNA methylation (bisulfite sequencing [12]). All of these epigenetic features exert their influence on transcription through the dynamics of nucleosomes, the histone octamers on which DNA is wrapped. Further, these epigenetic features exert measurable kinetic effects on the wrapping and unwrapping of DNA from the nucleosomes that may provide unexplored regulatory mechanisms for eukaryotic transcription [3,13].
At the molecular level, the process of transcription past a nucleosome is complex, but it has several key features: stalling of RNAPII on approach to a bound nucleosome, temporary nucleosome unwrapping/dissociation, and rebinding of the nucleosome if the site is not occupied by a polymerase. Histone modifications alter the duration and frequency of transcriptional pausing by influencing the fluctuation of nucleosomes between wrapped and unwrapped states (commonly referred to as "nucleosome breathing") [14,15]. The nucleosome's wrapping equilibrium and the absolute time scale of this process can potentially shape the transcriptional dynamics. Bintu et al. showed that transcription of bare DNA had an average of 4 pausing events per kbp with an average pause length of 4.4 seconds while transcription passing an unmodified nucleosome had an average of 14 pausing events per kbp with an average pause length of 10.2 seconds [16]. However, histone modification via acetylation significantly reduced the stalling to an average of 11 pausing events per kbp with an average pause length of 9.6 seconds [16]. In cases where the DNA is completely depleted of histones (as occurs with loss of stem-loop-binding protein), cells show a global increase in transcriptional elongation rates, loss of almost all transcriptional pausing as if the polymerase were transcribing bare DNA, and altered patterns of co-transcriptional splicing [17]. Taken together, this data suggests that nucleosomes are the primary regulator of transcriptional elongation.
However, nucleosomes are also part of a large array of genetic and epigenetic features that control transcriptional elongation rates and the overall transcriptional flux [18,19]. In contrast to the dynamic nature of nucleosomes, there are also static defects associated with transcriptional pausing. The most critical static features that predict elongation slowdowns are high GC content, high exon density, and highly methylated CpG islands [18,19]. It is unintuitive whether these genetic and epigenetic features can best be described as static site defects or if they modify the dynamics of nucleosomes. Wang, Stein, and Ware demonstrated that nucleosomes have higher occupancy on regions of higher GC content and on exons [20]. Jonkers, Kwak, and Lis demonstrated that increased exon density increased RNAPII pausing due to cotranscriptional splicing at intron-exon junctions [18]. Collings, Waddel, and Anderson showed that methylated CpG sites (associated with transcriptional silencing) are more highly occupied by nucleosomes while unmethylated CpG islands are associated with polymerase recruitment and nucleosome depletion [21,22]. Therefore, these pausing events may be considered the result of either a static or dynamic defects.
While nucleosomes stall polymerases, polymerases also exert an effect on nucleosomes. At low rates, each polymerase must move past the nucleosome which rebinds between polymerase crossings. However, at high transcription rates, the lead polymerase pioneers its way through the nucleosomes, while holding the DNA open for its followers. This process keeps the nucleosome in the destabilized state longer, making it more likely to be totally displaced from the chromatin [23,24]. Thus, the nucleosomes induce convoy formation through pausing, but the resulting convoys obstruct nucleosome rebinding. In spite of the numerous simulations and extensive modeling of initiation-rate limited transcription and transcriptional bursting due to multi-state promoter dynamics, few computational and mathematical modeling studies have been performed to investigate the possibility of a transcriptional regime limited by nucleosome-induced pausing or the unique pattern of transcriptional bursting that this would induce [25][26][27][28]. Therefore, we developed a stochastic, agent-based model of transcription through nucleosomes arranged in the canonical beads on a string geometry. The nucleosomes undergo stochastic transitions from the wrapped to unwrapped state consistent with the thermodynamic equilibrium set by the microscopic rate constants. Encountering a nucleosome in the closed state causes polymerases to stall until the nucleosome reopens. In turn, the presence of polymerases prevents nucleosomes from closing.
From a theoretical perspective, the proposed model is an extension of a classical model in stochastic transport theory known as the Totally Asymmetric Simple Exclusion Process (TASEP) [29,30]. In the open boundary TASEP, particles can be injected through the first boundary (if the first lattice site is unoccupied) with propensity α, removed through the other boundary (if the last lattice site is occupied) with propensity β, and advance with propensity q (typically set to one to define the time scale) if there is no particle obstructing it on the next, non-boundary lattice site as shown in Fig 1A. Even though the TASEP only has nearest neighbor interactions, the TASEP exhibits a non-trivial kinetic phase diagram (Fig 1B) with the bulk density ρ, average hop rate v, and total particle flux J = v×ρ on the lattice completely defined by the initiation and termination propensities as shown in Eqs 1-3 [31][32][33]. where periodically spaced nucleosomes function as extended-body dynamic defects. RNAPII bind the first lattice site (transcription start site, TSS) at rate α, RNAPII advances at rate q when unobstructed by other polymerases or nucleosomes. RNAPII is then released from the gene at the end of transcription at rate ß, which is set equal to q (ß = q 8 > < > : Additionally, the boundary defects give rise to a variety of density profiles as shown in Fig 1C. The theory of the dynamic defect TASEP (ddTASEP), which is relevant to transcription, has not been fully developed. Van den Berg and Depken constructed a ddTASEP of transcription through single-site nucleosomes that would automatically be displaced from the gene after direct contact with polymerases [25]. Their model showed the first proof of concept of polymerase convoy organization and nucleosome depletion. However, their model treated polymerase advancement past nucleosomes as a single concerted step (i.e. RNAPII automatically moves through nucleosomes at a manually specified, slower rate rather than letting the wrapping dynamics dictate the advancement), the representative cases focused on low nucleosome density with slow rebinding, and they interpreted the results by mapping them on to a hypothetical two state model [25]. Waclaw et al. investigated a general dynamic defect TASEP with single site hopping particles and single site defects focusing primarily on closed loop cases with soft exclusion effects that allow nucleosome and polymerase co-occupancy with some coverage of the open boundary case [34].
Previous studies investigating the flux-density-hop rate (J-ρ-v) relationship of TASEPs with dynamic defects have focused primarily on weak or fast fluctuating perturbations and have utilized mean field approaches that cannot account for the long range particle-particle and particledefect correlations [25,34,[35][36]. Most notably, Waclaw et al. studied a variety of single site dynamic defect TASEPs with periodic and open boundaries with constrained and unconstrained defect binding [34]. However, most of their theoretical results were derived for the mathematically simpler case with periodic boundaries in which the bulk particle density remains constant.
In this study, we propose a transcription-dedicated, TASEP model with elongated, dynamic defects, and hard polymerase-nucleosome exclusion constraints ( Fig 1D) that accounts for key features of RNAPII transcription through nucleosomes including nucleosome depletion, polymerase convoy organization, and transcriptional bursting. Additionally, we abandon the mean field approximation to focus on the regime in which transcription proceeds via propagation of polymerase convoys of arbitrary length. We derive the mean first passage time (MFPT) and variance of the first passage time (VFPT) of the pioneering polymerases that lead the convoys using a Markov Chain approach [37][38][39]. We quantified the strength of the nucleosome perturbation via the mean first passage rate γ, which is inversely proportional to the MFPT and depends on the microscopic nucleosome wrapping and unwrapping rate constants. The initiation rate α and the mean first passage rate γ were then used to construct the ddTASEP phase diagram (orthogonal to the canonical TASEP αβ-phase diagram at β = 1) and to calculate the location of the nucleosomeinduced jamming phase transition [29,31,32]. We identified two regimes in the novel αγ-plane: initiation limited and nucleosome pausing (defect) limited. In both regions, we postulated approximate expressions for the elongation rate v, the bulk density ρ, and the bulk transcription flux J. Additionally, the inter-burst intervals and burst sizes generated by the slow nucleosome dynamics were investigated over a wide range of conditions spanning the initiation-limited and nucleosome-limited regimes. Finally, preliminary results on the effects of variability in the RNAPII advance rate and in the nucleosome dynamic rate constants along DNA (e.g. caused by presence of exons or DNA methylation on CpG islands) are presented.

Numerical simulations
The model is simulated using a Gillespie algorithm with a variable number of reaction channels based on the set of all possible legal moves [40]. During each Monte Carlo step, the list of all legal polymerase and nucleosome moves is compiled. The propensities for each possible move are used to construct a categorical distribution, and a legal move is randomly selected from this distribution with probability proportional to its propensity. Next, the waiting time to the selected move is drawn from an exponential distribution with a mean waiting time given by the total propensity (sum of all legal propensities). The move is performed, and the system time is updated.

Determination of ddTASEP properties from the simulation output
The canonical properties of the TASEP are the flux J, average hop rate v, and the average site occupancy/density ρ [31,32]. The considered ddTASEP is ergodic and converges to a steady state probability density. For each Monte Carlo step, the state of the system is described by two vectors s ! and h ! composed of binary variables s i and h m indicating the polymerase occupancy of the i th site and the nucleosome occupancy of the m th nucleosome associated region, respectively. The averages of v, ρ, and J are estimated within the time interval between the first passage time t FP (Monte Carlo step N FP ) and final time point t MCS (at least eight times greater than t FP ). All simulations were run for a fixed number of Monte Carlo steps (N MCS = 1.2×10 6 for qualitative trends or N MCS = 2×10 6 for quantitative results). The time averaged density profile ρ i is estimated in Eq 4.
As Derrida proved for the classic TASEP, far from the boundaries, the three primary properties of the TASEP converge exactly to values set by the microscopic rate constants for the initiation and termination rates (α, ß) [31,32]. For an infinitely long lattice, the boundary effects become negligible. For genes of 20 kbp (400 lattice sites) or longer, the boundary effects are weak, allowing the system to be characterized in terms of these three non-spatial, non-temporal metrics. Taking N mRNA to be the total number of mRNA produced, N Sites to be the number of lattice sites on the gene, and Dt k pass to be the time between the k th RNAPII binding and terminating its transcription, the bulk averages of the three primary properties are defined in Eqs 5-7.
The bulk nucleosome densities ρ N and bulk nucleosome density profiles ρ N,i are calculated analogously to ρ in Eq 1 and in Eq 2. However, the bulk nucleosome density ρ N is scaled by a factor of ¾ or more generally k N k L þk N to account for the linker regions.

Classifying burst events and measuring burst properties
When the pioneering polymerase reaches the end of the gene, the subsequent polymerases in the convoy exit with the flux predicted by the classical TASEP since a sufficiently long convoy functions as a shorter TASEP within the larger ddTASEP lattice. Given that the advance rate into the convoy and the termination rate are both greater than ½ (i.e. q = 1), polymerases exit with the highest flux predicted by the classical TASEP J max = ¼. This implies that the average intra-burst waiting time (t in ) is equal to 4 time units. However, there is no way to systematically predict what set of parameters (α, h o , h c ) will give rise to bursting or what the optimal threshold is to classify an unknown waiting time t w as either an intra-burst waiting time t in or an inter-burst interval t IBI . Therefore, we take the following probabilistic approach. First, kernel density estimation is performed on the set of log-transformed waiting times (log 10 t w ) weighted by (log 10 t w ) so that the rare bursting events are more prominent. Next, an additional smoothing step is performed to reduce the noise from the probability density function estimation of the rare burst events. A local maxima search is performed to attempt to identify the intra-burst interval peak and the inter-burst interval peak. If only one peak is found, the ddTASEP is considered initiation limited, and no burst properties are recorded. If the system is bimodal, a threshold T is calculated by taking the geometric mean of the peaks (which is also the arithmetic mean of the log transformed values). Burst events are defined by t w >T. This threshold might filter out some intermediate cases, but it guarantees that those parameter sets (α, h o , h c ) that generate robust bursts will be accurately classified while systematically eliminating false positives due to intrinsic stochasticity. The burst size N B is defined as the number of mRNA produced between these two breaks in transcription. Burst size and inter-burst intervals are averaged across simulations since they constitute rare events.

Using correlations to quantify defect perturbation strength
The inter-site Pearson r ij correlation is defined below as k¼N sat ½s i ðt k Þ À r i �½s j ðt k Þ À r j � ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi P N M C S k¼N sat ½s i ðt k Þ À r i � 2 q ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Eq 8 can be used to define a matrix R = [r ij ] that contains the inter-site correlations for all site pairs (i,j). Each row or column of the matrix R is a vector of Pearson correlations that we denote as r i ! ¼ ½r i1 ; � � � ; r ii ; � � � ; r iN sites �. Given the periodic nature of the lattice when a site is far from the boundaries, the inter-site correlations r ij should only be a function of the relative distance between the sites (Δ = j−i). Further, each central row vector r i ! should exhibit the same qualitative features around the i th site as some hypothetical bulk correlation vector r (defined only in terms of inter-site distance) would around its central site at Δ = 0. To construct the bulk correlation profile r, we first chose a maximum distance (Δ max = d) to evaluate the correlations over. Then, for each i2{d, N sites −d}, a new vector r 0 i ¼ ½r ðiÀ dÞi � � � r ii � � � r ðiþdÞi � can be constructed that truncates each r i ! and centers it around the i th site. Finally, we defined the bulk correlation profile r as the sliding window average over the vectors r 0 i as shown in Eq 9.
Each entry r Δ of the vector r gives the average inter-site correlation based on relative distance Δ. This metric allows for direct assessment of the strength and length scale of polymerase cooperativity and also allows for more direct, visual comparison of the correlation strengths from different parameter sets than can be obtained from the correlation matrices themselves.

Code implementation and code/data availability
Code was written in MATLAB R2018a (Math Works, Natick MA). High throughput parallel simulations were performed on a remote server (HARDAC at Duke Center for Genomic and Computational Biology) utilizing a SLURM job handler and an Lmod Environmental Module System that provided MATLAB R2017b. Given that MATLAB's internal parallelization could not be used due to conflict between it and the job scheduler, each instance of the parameter sweep was evaluated on a separate node initialized into the same environment. Since the environmental handler initialized with the same default random number seed for each iteration, random number seeds were specified manually.
Code for the core algorithms and the calculation of simulation observables is available at SimTK (https://simtk.org/projects/histone_ddtasep) along with the associated data sets. Data sets are provided both in a.mat format that can be directly loaded into the MATLAB workspace. Alternatively, the files can be downloaded as Excel files in a.xlsx format and reprocessed by the reader.

ddTASEP transcription model formulation
In the proposed model (Fig 1D), the ddTASEP lattice is segmented into 50 base pair (bp) sites. A single nucleosome occupies three sites, and there is one linker site between each nucleosome-associated region. This discretization was chosen since the RNAPII footprint on DNA is generally estimated to be roughly 40 bp [41,42]. Further, the nucleosome linkers range in size from 10-70 bp [43]. Additionally, DNA wrapped around nucleosomes is known to be exactly 145-147 bp [43]. Last, assays such as BruDRB-seq have suggested that the maximum velocity of polymerases is roughly 50 bp/s even for extreme outliers across multiple cells types [19], and other more common assays such as MS2-tagging have suggested transcription speeds of 26-86 bp/s [44]. Thus, 50 bp sites will be occupied by one RNAPII enzyme which will take on average one second to clear the site on bare DNA at 50 bp/s. There are three possible actions for polymerases: initiation, advancement, and termination. If the transcription start site (TSS) is open, polymerases can be initiated with propensity α. Once on the gene, RNAPII advances with propensity q while it is not obstructed by another RNAPII or a nucleosome in the wrapped/closed state. It must pause until the nucleosome or other polymerase moves out of its way. Nucleosomes transition from the open (unwrapped) to the closed (wrapped) position with propensity h c when all three nucleosome associated sites are empty. The nucleosome unwraps to the open position with propensity h o . At the end of the gene, the RNAPII will terminate with propensity ß = q.
Since the lattice is segmented into 50 bp sites and the characteristic RNAPII velocity is of order of 50 bp/s, the RNAPII advance rate (from one 50 bp site to the next) q is on the order of 1 site/s. For the sake of simplicity, we will use time units such that q = 1 in the figures. However, q is left in the equations associated with the figures for generality. In the last section of this manuscript, the role of non-uniformity of DNA (due to the presence of exons, methylated CpG islands, and high GC content) will be incorporated by assigning each lattice site a new nominal advance rate q i or by assigning each nucleosome a unique set of rate constants h c,m and h o,m .
The model can be described formally as follows. Let s i be a binary variable representing the polymerase occupancy of the i th lattice site and let h m be the nucleosome occupancy of the nucleosome associated with sites i, (i+1), (i+2), and (i+3). Site i is a linker site, and the others are the nucleosome associated sites. When any of (i+1), (i+2), and (i+3) are occupied, nucleosome m cannot rebind. These rules are summarized as follows: Nucleosome Entry : Normal Advancement : Termination : Wrapping :

Nucleosome dynamics control polymerase convoy formation and transcriptional bursting
The dynamical defect TASEP exhibits a wider array of dynamical behavior than the classical TASEP [31,32] or static defect TASEP [45][46][47][48]. The pausing induced by nucleosome binding leads to organization of RNAPII into convoys. In Fig 2, we analyze the behavior of the considered ddTASEP.
The mRNA accumulation curves shown in Fig 2A all exhibit characteristic features. First, there is a delay due to the first passage time for the first RNAPII to clear the length of the gene as it interacts with the nucleosomes. Second, there is an approximately linear increase in the number of transcriptional events after this initial period. In the weakly perturbed cases (bottom row), the polymerases advance quasi-deterministically giving rise to uninterrupted, linear in time mRNA synthesis. As the nucleosome dynamics slow down (top row), two patterns of nucleosome-induced pausing are observed. In the slow opening and closing case (h o = h c = 0.001, top left), periods of sustained transcription are disrupted by nucleosome binding resulting in large transcriptional pauses. Therefore, the nearly vertical sections of mRNA production in these cases represent mRNA bursts observed when a convoy of RNAPII reaches end of the gene. In the fast closing and slow opening cases (h o = 0.001, h c = 0.1, top right), fast nucleosome re-wrapping leads to the formation of smaller, tight burst groups without the long, uninterrupted periods of continuous transcription seen in the previous case.
The self-organization of polymerases into convoys that give rise to the bursts in Fig Fig 1C, the values for the bulk density ρ are always equal or larger than those of the classical TASEP. Further, the density profiles ρ i showed a sawtooth wave pattern along the spatial axis since the linker sites always had higher polymerase occupancy due to nucleosome induced pausing. As expected, higher levels of polymerase occupancy lead to noticeable nucleosome depletion as demonstrated by the difference between resting probabilities of nucleosome occupancy (dotted red line) and the steady state nucleosome density profiles in Fig 2D. The nucleosome occupancy profiles presented here look like inverted versions of the polymerase density profiles consistent with the hard exclusion constraint that allows polymerases to hold nucleosomes in the open conformation.
Last, the central sites in the inter-site correlation heatmap Fig 2E confirm our expectation that the inter-site correlations only depend on the relative distance between sites. Using Eq 9, for inter-site correlation r Δ in Fig

Mean and variance of first passage time
As demonstrated in Fig 2A and 2B, our model implies that the time needed to produce first complete mRNA depends on the nucleosome-induced pausing. Using Markov chain theory, the expected passage time through a nucleosome unit (consisting of one linker site and three nucleosome sites) for the first polymerase can be calculated [49,50]. Due to the periodic spacing of nucleosome binding sites, the MFPT to produce an mRNA after gene reactivation is simply the sum of the expected times for the first polymerase to clear each nucleosome unit.
The interaction between a polymerase on a linker site and a nucleosome can be treated as a three state Markov Chain (Fig 3A), where State 1 (polymerase on linker with nucleosome open) and State 2 (polymerase on linker with nucleosome closed) can reversibly communicate until the absorbing State 3 (polymerase enters the first nucleosome site) is reached as shown in Fig 3A. The evolution of the state probability distribution of this Markov Chain p ! ¼ ½p 1 p 2 p 3 � can be described as a system of Forward Kolmogorov Equations with generator matrix Q [37,39,49]: All of the moments of the first passage time of any absorbing Markov chain are uniquely determined by the submatrix of transient states in Q which will be referred to as R. The fundamental Matrix N is then given by [38,49] The expected waiting times for the polymerase to enter the nucleosome site from the open state (E o , State 1) and closed state (E c , State 2) can be expressed in terms of the fundamental matrix N as The overall expected waiting time to enter the nucleosome associated site (E e ) is given by the weighted average of E o and E c with respect to the probability of finding the nucleosome in the open or closed state upon the polymerase's initial approach The expected passage time to transcribe past a nucleosome and its three associated sites (E h ) is the sum of E e and the expected time to pass through 3 empty sites (3/q) Consequently, the overall MFPT to transcribe the gene is given by where α, β and N h are the initiation rate, termination rate, and the number of nucleosome units, respectively. Fig 3B shows  We use E h also to construct a new parameter γ called the mean first passage elongation rate.
By definition γ = 1 when the nucleosomes are constantly open h c = 0. The reader should note the distinction between the bulk elongation rate v (the average velocity of all polymerases to transcribe a gene) and the mean first passage rate γ (the expected/ensemble-averaged velocity of the first polymerase to transcribe a gene).
Another useful metric to understand the effects of the nucleosomes on transcription is the variance of the first passage time (VFPT). The variance of the nucleosome entrance waiting time (V e ) was calculated with the following equation from [38].
As with the mean first passage time, the variance of the time to pass through a nucleosome unit (V h ) and the overall variance of the first passage time (VFPT) are obtained from V e .
Fig 3C demonstrates that the analytical expression for VFPT (Eq 26) accurately predicts the simulated VFPTs (R 2 = 0.991), over three orders of magnitude of h o and h c . When nucleosome binding was turned completely off (h c = 0), the analytical expression and simulation both provided the expected result for bare, nucleosome-free DNA (i.e. a canonical TASEP).
Finally, we introduce the index of dispersion D e ¼ V e E e , that can be used to measure the deviation of the ddTASEP's behavior from a classical TASEP on nucleosome-free DNA for which D e ¼ 1 q . The index of dispersion of the time to enter a single nucleosome D e was found to be In the limits as h c = 0 or h o !1, the index of dispersion converges to (1/q) which is consistent with the canonical TASEP limit since this is simply the index of dispersion for an exponential distribution or a Poisson process [49]. In contrast, when h o �h c , the index of dispersion becomes dominated by the expected waiting time for the nucleosome to reopen. Fig 3D and 3E investigate whether the index of dispersion is more strongly controlled by the resting nucleosome density (Fig 3D) or the absolute time scale of the reopening time 1 h o ( Fig 3E). As shown in Fig 3D, there is a linear increase in index of dispersion with respect to increasing resting nucleosome density ( h c h o þh c ) followed by an abrupt increase as the resting density goes to one. It is also possible that this change in the index of dispersion likely arises from the associated decrease in h o needed to achieve the desired initial nucleosome density for fixed h c . Fig 3E shows a smooth transition from the quasi-deterministic random walk of the classical TASEP (D e �1/q in the limit h c = 0) to the nucleosome limited cases where the unwrapping waiting time associated with nucleosome breathing dominates the index of dispersion D e �(1+h c )/h o in the limit h o !0. Thus, these large deviations from canonical TASEP behavior leading to polymerase self-organization and bursting are driven primarily by infrequent interactions with nucleosomes with slow re-opening (low h o ) but can be marginally enhanced by more frequent interactions with nucleosomes (higher resting nucleosome density) due to higher rates of re-wrapping (high h c ).

Validity of Coarse-Graining the Lattice
A core assumption of this model is that the lattice can be treated in terms of 50 bp sites instead of at 1 bp resolution (consistent with the fact that RNAPII advance one nucleotide at a time). It is a well-known result that, for fixed lattice size, a TASEP with particles of length L>1 will have reduced flux through the lattice relative to a TASEP with particles of length L = 1 [51,52]. However, the effect of simultaneously changing both particle length and lattice size on transcription flux is not intuitive. Therefore, we considered the possibility of using non-exponential waiting times. The true waiting time distribution for this system is likely an Erlang(k = n, λ = n) distribution with n2 [1,50]. Two special cases of this exist: When n = 1, the canonical exponential waiting time distribution is reobtained. When n = 50, the interpretation is that all 50 nucleotide polymerization steps occur with waiting times given by independent and identically distributed exponential variables with a uniform rate of 50 bp/s. (Both cases represent extremes that are not likely to be biologically realistic.) The variance of the advancement waiting time under the exponential distribution is 1s 2 while it is 0.02s 2 under the Erlang(k = 50, λ = 50) distribution. In a non-Markovian TASEP without defects, the reduced variance in the hop time leads to fewer collisions allowing the average (bulk) particle hop rate v to approach the nominal hop rate q. This leads to substantially enhanced flux J relative to a Markovian TASEP with exponentially distributed waiting times [53].
However, in the proposed ddTASEP model, we hypothesize that the contribution from pausing induced by the slow nucleosome dynamics will dominate the first passage time eliminating the entrainment effect that is observed on bare DNA allowing us to coarse-grain the system and use a simpler, exponential waiting time distribution without introducing significant error. To demonstrate this, we first prove that the entry waiting time statistics are not affected by considering a larger number of intermediate entry steps. Second, we then demonstrate that the nucleosome induced pausing dominates the mean and the variance of the first passage time relative to the time to transcribe the bare DNA after entry into the nucleosome associated region.
First, we consider a more general transient state matrix R 0 (analogous to Eq 17) which considers the entry of a polymerase of length L that travels at rate Lq (between subsequent base-pairs) into the nucleosome associated region that can be temporarily arrested by nucleosome rebinding at rate h c at any step of the process. The transient states of the new Markov Chain R 0 will have a block structure with each two rows indicating an open (unarrested) and closed (arrested) state for each step of the extended polymerase's entry into the nucleosome associated region as given by As a representative example, we can consider the two-site case where the advance rate is 2q. Interestingly, the nucleosome entry time for the two-site case E e,2 was identical to that the of the one-site case (E e,2 = E e ). Additionally, the variance of the entry waiting time V e,2 was found to be Further, we expect that V e,2 �V e . Evaluating the ratio of the two gives While analytical results become unwieldy as L!50, this approach can be performed numerically in MATLAB. The numerical values of E e, 50 and V e,50 are plotted against the analytical results for E e and V e in S1 Fig for the same parameter sets shown in Fig 3. As hypothesized, S1A Fig  Therefore, while a more accurate waiting time distribution would be critical in the absence of nucleosomes, the nucleosome induced pausing and subsequent polymerase-polymerase collisions within a burst group likely cancel out any of the entrainment effects that would be observed on bare DNA if a more realistic waiting time distribution were used. Thus, we believe that the use of a single site polymerase with 50 bp lattice sites and exponentially distributed waiting times is still an appropriate simplification to model the system.

Constructing the ddTASEP αγ-plane
The canonical TASEP phase diagram exhibits three jamming transitions in the αβ-plane when the initiation rate becomes limiting (α<β and α<½), the termination rate becomes limiting (ß<α and ß<½), or when the channel capacity is reached in the max flux limit (α, ß�1/2) as shown in Fig 1B. Analogously, the ddTASEP should exhibit a jamming transition due to the limitation on the maximum transcription flux imposed by the nucleosome-induced pausing. Therefore, we expected that the transcription efficiency γ could be used to define a third axis of the phase diagram orthogonal to the canonical αß-plane. By assuming ß = q = 1, we restrict our analysis to the αγ-plane as shown in Fig 4. The reader should note that, by the selection of q = 1, we are implicitly rescaling the values of α and γ with respect to the advance rate q.
Classically, the ideal method to determine the value of the transcription observables J, ρ, and v and to locate the phase transitions in parameter space would be to utilize a mean field approach [31,32]. However, this approach has significant limitations. Based on the model formulation in Eqs 1-6, it is clear that the system's chemical master equation would involve the products of up to four site occupancies. Therefore, when taking the ensemble average of the site occupancies, quadratic (e.g. hs i s i+1 i) and cubic mixed moments (e.g. hs i+1 s i+2 s i+3 i) appear in the master equation. (Note that quartic moments such as hh m s i+1 s i+2 s i+3 i vanish due to the exclusionary binding constraint.) Under the mean-field approximation (also referred to as the random phase approximation for the TASEP), these mixed moments can be factored as which implies that each site of the classical TASEP comes to a steady state value and that no site occupancies are correlated with each other. The correlation-free assumption clearly fails for the proposed ddTASEP with extended dynamic defects and exclusionary defect binding as shown in Fig 2E and 2F. Therefore, we decided to abandon this approach entirely. In order to derive approximate expressions for J, ρ, and v, we propose a phenomenological model inspired by saturation kinetics [54,55] that will interpolate between the canonical TASEP regime (γ = 1, α2[0, 1]) and the strongly perturbed ddTASEP regime with low initiation rates (γ<1, α�1). Specifically, appropriate functional forms for v and ρ were selected to satisfy the assumed asymptotic behavior. The bulk elongation rate v and the bulk density ρ were then used to calculate J, and all three expressions were then compared against the simulated results.
First, we identified an appropriate functional form for the bulk polymerase hop rate v in the regime with a < 1 2 . In the canonical TASEP limit γ!1, the bulk hop rate v is equal to (1−ρ) = (1−α) for α<½ as shown in Eqs 1 and 2. Since the density in the initiation limited regime is ρ = α (Eq 2), the bulk hop rate v must equal (1−α). As the initiation rate α!0, the gene effectively resets to its resting state between each passing polymerase. Therefore, the bulk hop rate v should approach γ. To satisfy both limits, we propose that In order to estimate the bulk density ρ, we noticed that, in the limit γ!1, the bulk density ρ must also converge to the classical TASEP bulk density in Eq 2. Additionally, in the limit γ!0 with a sufficiently large initiation rate α, ρ should be close to one in this limit since a whole gene spanning convoy may form (Fig 2B). Gene spanning convoys are able to form because the most likely binding site for nucleosomes is at the end of the gene (Fig 2D, top left), where the polymerase density drops to nearly zero. Last, in the highly perturbed cases, switching from an almost completely full gene to an almost empty gene is observed as shown in Fig 2B  (top left). Under these conditions, the limit ρ!1 is not reached even though the gene may transiently be fully occupied.
From the canonical TASEP model, the density has a linear relationship with the initiation rate in the low-density region. For sufficiently high first passage elongation rates γ, the density should increase almost linearly with the initiation rate for small values of α. However, the density will eventually saturate to some limit influenced by γ since convoys will back up on the transcription start site preventing new initiation. We postulate that ρ has a linear dependence on the initiation rate α (in the limit γ!1) and tends to one as γ!0. Therefore, we propose that r≔ a a þ ð1 À aÞg for a < 1 2 Finally, we obtain the transcription flux J as In the limit γ!1, the value of J converges to the unperturbed TASEP flux for the initiation limited regime (Eq 3) scaled by the advance rate q.
In the unperturbed TASEP, when a � min 1 2 ; b À � , the flux (as well as the density and the hop rate) become independent of the injection rate, and the flux J reaches its maximum value for a given β equal to max 1 4 ; bð1 À bÞ � � : By analogy, it is reasonable to expect that, for a given γ, the flux of the ddTASEP will also become independent of the initiation rate α after some saturating condition is reached since the nucleosomes will cause the polymerases to back up onto the TSS. We propose that for a given γ the jamming transition will occur when J(α,γ) reaches its maximum value with respect the initiation rate α, as given by Therefore, the critical value of the initiation rate α � is For α<α � (γ), both α and γ limit the rate of transcription. In the nucleosome-limited regime where α�α � (γ), the density, flux, and hop rate no longer depend on the initiation rate but now depend exclusively on nucleosome-induced pausing. Substituting α � into Eqs 32, 33 and 34 gives the proposed estimates for v, ρ, and J.

The initiation rate α and the mean first passage rate γ control transcription dynamics
The results shown in Fig 4B suggest that Eqs 38-40 predict the transcriptional dynamics of the system regardless of the particular values of h o and h c . This is confirmed by Fig 5A-5C which show the dependence of the bulk elongation rate v (Fig 5A), bulk density ρ (Fig 5B), and the transcription flux J (Fig 5C) on the mean first passage rate γ (which was obtained by varying h o while holding h c fixed across multiple orders of magnitude). The plots confirm that the values of h o and h c typically have relatively small effect on the transcription dynamics, which is nearly fully governed by γ.
The highest deviations exist for the lowest and highest values h c = 0.001 (blue) and h c = 1 (red). However, the predictions for the bulk elongation rate v (Fig 5A), bulk density ρ (Fig 5B), and transcription flux J (Fig 5C) are the most accurate for the biologically relevant cases with h c = 0.01 (green) and h c = 0.1 (orange) across all three metrics. As a point of reference, if q ¼ 50 bp s ¼ 1 site s , then the nucleosome dynamics associated with 0.01�h c <0.1 s −1 would be on the order of 10-100 seconds, which is biologically relevant as shown by Bintu et al [13]. Fig 5D-5F show the dependence of the bulk elongation rate v (Fig 5D), bulk density ρ ( Fig  5E), and transcription flux J (Fig 5F) on α. Because Fig 5A-5C show that α and γ control the transcription dynamics with minimal effects from h o and h c under biologically relevant conditions, we decided to omit explicitly varying h c in Fig 5D-5F. However, the reader should note that h c = h o that the nucleosome rate constants vary from 0.0143 to 0.7 to achieve values of γ from 0.1 to  Fig 5D-5F. The sharp jamming transition that arises when initiation rate passes the critical value α � (γ) is readily visible in these figures. Fig 5F shows that, as predicted by the model, the transcription flux increases linearly with α until α reaches the critical value α � (γ) at which J(α,γ) reaches its maximum value J max (γ) (Eq 40). Similarly, the bulk density ρ(α,γ) reaches its maximum value ρ max (γ) (Eq 39) in Fig 5E while the bulk elongation rate v(α,γ) reaches its minimum value v min (γ) (Eq 38) in Fig 5D. It is worth nothing that our theoretical predictions slightly overestimate the bulk elongation rate and slightly underestimate the bulk density. However, since the transcription flux is the product of these two quantities, the errors nearly cancel out leading to quantitative agreement between the simulated results and theoretical estimate for J(α,γ) as shown in Fig 5F. For J(α,γ) the transition between the initiation-limited and nucleosome-limited regimes is smooth (in contrast to the sharp transitions for ρ and v) in agreement with assumption that J(α, γ) has zero derivative with respect to α at α � (γ) (Eq 37).  Fig 6A-6C show the dependence of elongation rate, bulk density, and transcription flux on the length of the gene, ranging from 2,000 to 20,000 base pairs. On short genes, the convoys start to span the length of the gene, giving rise to some interesting effects. Most notably, the observed transcription flux is higher than that predicted by our proposed model, and the observed density is lower than our proposed model on short genes. We hypothesize that this is connected to the existence of gene spanning convoys. When convoys span the length of the gene, nucleosomes can only infrequently re-bind, and the gene transiently approximates classical TASEP behavior. In the initiation limited regime, the transcription flux of a classical TASEP is the upper-bound of transcription flux for all other related ddTASEPs (with γ<1), and the density of a classical TASEP is the lower-bound of density for all other analogous ddTASEPs. Therefore, the previously mentioned elevated transcription flux and decreased bulk density are consistent with more classical TASEP behavior that may occur in a gene-spanning convoy. Accordingly, when the gene length increased to over 10,000 base pairs, the transcription flux, density, and elongation rate converge to the theoretical predictions for the ddTASEP since the formation of a gene-spanning convoy (that would induce quasi-classical TASEP dynamics) becomes too unlikely for the given rate of nucleosome dynamics. Fig 6D-6F show the dependence of the bulk ddTASEP properties on the linker spacing between nucleosomes on a 20,000 base pair gene. The reader should note that the number of nucleosomes decrease in the simulation as the number of linker sites increase. Even when γ must be calculated assuming different linker geometries, the analytical results for the bulk elongation rate (Fig 6D) and transcription flux (Fig 6F) adapt almost perfectly to the new geometry and perform reasonably well for the bulk density (Fig 6E).

Features of nucleosome-induced convoy formation and transcriptional bursting
While classical models of transcription attribute bursting to promoter dynamics where the promoter switches between an ON and OFF state [26,27,[56][57][58][59], the ddTASEP model exhibits bursting even with a constitutively active promoter because of nucleosome-induced pausing. In order to identify bursts, we utilized the difference in time scales between the average intraconvoy waiting time t in and the average inter-burst interval t IBI . As can be seen in Fig 2A and  2B, nucleosome induced pausing introduces non-trivial time delays between the last polymerase of a convoy and the pioneering polymerase of the convoy following it. To briefly summarize our approach (described in full in the methods), we utilized a kernel density estimate of log 10 t w weighted by itself to determine whether the system exhibited bursting and (in the presence of bursts) to identify the approximate location of the average inter-and intraburst intervals. The threshold for a burst waiting time was set as the geometric average of the two peaks of the kernel density estimate. Representative kernel density estimates for the parameter sets used in Fig 2 are shown in Fig 7A. Excluding the cases without strong correlations (Fig 2F, bottom left and bottom middle), all of the remaining cases were strongly bimodal indicating that non-trivial bursting is occurring.
Next, we investigated the internal transcription dynamics of bursting ddTASEPs versus non-bursting ddTASEPs in Fig 7B. In the classic TASEP, the waiting time between particles is given by the time headway distribution f(t;ρ). The probability density function f(t;ρ) of this distribution is given by [60] and its cumulative distribution function F(t;ρ) is given by The mean of this distribution is known to be 1 rð1À rÞ ¼ 1 J ¼ 4. For a burst group at the end of a gene, we hypothesized that it would exhibit max flux limit TASEP dynamics since the entry rate into the burst group is given by q and the termination rate is given by β = q. Therefore, the effective density of a burst group would be ρ = 0.5. In contrast, for a case without bursting, the density of the system would be given by our initiation limited estimate in Eq 39. The empirical cumulative distribution functions in Fig 7B directly confirm this result.
In Fig 7C, the relationship between the average inter-burst interval t IBI and the first passage time to clear a nucleosome E h = 4/γ was considered. While there is a clear correlation between E h and t IBI ðr ¼ 0:68Þ, there is clearly dependence on the magnitude of h c that is not appropriately accounted for here. Therefore, we investigated the hypothesis that the inter-burst interval should scale with index of dispersion of the time to enter a nucleosome since D e quantifies the deviation from classical TASEP behavior. The variance V e will be directly related to the observed inter-burst intervals t IBI since these contribute most strongly to the variance out of all of the observed waiting times t w since the intra-burst intervals are always approximately t in � 4. However, a system with a long MFPT with frequent, short inter-burst intervals t IBI and a system with a short MFPT with infrequent, long inter-burst intervals t IBI could have comparable variances. Therefore, to better account for the relative perturbation strength and to maintain dimensional consistency, the index of dispersion D e is preferable to the variance V e . Fig 7D shows that all three data series collapse onto a single curve and that the average interburst intervals exhibit a power law scaling with D e that could be slightly improved with additional correction for α and h c . However, the correlation is still strong with r = 0.82 even though the relationship is not completely linear. Fig 7E investigates the limiting values of burst size. Under saturation conditions, the maximum average burst size is approximately This result is surprising since it initially seems like burst size should be controlled by the flux and the absolute time scale of the nucleosome unwrapping (1/h o ) because this should be connected to the number of polymerases that can accumulate behind a closed nucleosome. However, when the gene is saturated, a constant number of polymerases accumulate behind each closed nucleosome. The limiting factor under these conditions is how fast the nucleosome can close in such a way that it splits burst groups to finite size as shown in the anti-correlated regions observed in the Fig 2F.

The existence of bursts under initiation limited conditions
While robust bursting is clearly evident in the nucleosome-limited (saturated) region, it is unintuitive whether or not robust bursting occurs in the initiation limited region. In Fig 8A, we investigate the average inter-burst interval t IBI as α!0. The predictions given by the time (42) headway distributions (1/J) are given by the dashed lines. Remarkably, the inter-burst intervals are substantially longer than the waiting times between randomly initiated polymerases. In other words, the t IBI now represents the sum of the waiting time between rare initiation events plus the sum of the time to clear the nucleosomes which have all returned to resting nucleosome occupancy since the polymerase density is approaching zero. Fig 8B directly confirms the existence of burst groups. In the limit α!1, the max burst size converges to the value given by Eq 43. In contrast, in the limit as α!0, the burst size approaches one. However, our proposed burst identification procedure still identifies bursts for these parameter sets. We hypothesize that the burst sizes are geometrically distributed and that most "apparent bursts" under extreme initiation limited conditions are single polymerases while a smaller subset constitute non-trivial polymerase convoys/burst events.
We hypothesize that the burst sizes are geometrically distributed [27] such that where k is the number of polymerases in the burst, and p is defined by The variance of the burst size geometric distribution s 2 N B is given by Fig 8C verifies that this relationship holds for the data set plotted in Fig 8A and 8B, so the assumption that the burst sizes are geometrically distributed is reasonable. Additionally, Fig  8D shows that the variance of the inter-burst intervals is equal to their mean squared, (s 2 IBI ¼ t IBI 2 ) indicating that burst waiting times are exponentially distributed. Interestingly, these findings bare some similarity to that of a two-state random telegraph model. In this classical model, mRNA are produced continuously for brief periods of time while the promoter of a gene is active, and then, mRNA production stops when the promoter is inactivated [61,62]. This model can be interpreted through the framework of the classical TASEP (without nucleosomes) by considering the association and dissociation of a transcription factor with rates f a and f d respectively. In this new framework, the promoter would be active for a time period of 1/f d (i.e. when the transcription factor dissociates) and the interburst interval would be exponentially distributed with mean 1/f a (i.e. the waiting time for transcription factor association). Burst groups would move deterministically along the length of the gene with the same bulk velocity, but the density and flux would be scaled by the fraction of time the promoter was active f a f a þf d � � . The burst size distribution would then be geometrically distributed with a burst termination probability of f d Jþf d and a mean of J f d (i.e. the flux times the expected time the promoter will remain active).
As in the two-state telegraph model, the burst sizes of our proposed ddTASEP also follow a geometric distribution, and the inter-burst intervals are exponentially distributed [27]. However, there are a few key distinctions. First, the burst sizes in our proposed model have an upper bound given by Eq 43 that is proportional to 1= ffi ffi ffi ffi h c p and is completely independent of the flux. Second, if J<f d in a two-state telegraph model, polymerases will never form burst groups since the average number of polymerases initiated per active period is less than one. In contrast, in the ddTASEP, polymerases merge into convoys along the length of the gene due to repeated nucleosome induced pausing and arrive in rapid succession with waiting times distributed according to the time headway distribution of a classical TASEP in rapid succession, even for extremely small initiation rates.
Last, we will verify the ddTASEP's capacity to induce bursting for two cases in the extreme initiation limited regime for the parameter sets (α = 0.01, γ = 0.2) and (α = 0.002, γ = 0.2) which produced average burst sizes N B of 3.62 and 1.49, respectively. The probability mass functions given by Eq 44 are plotted for both cases in Fig 8D and 8E. For the case with the higher initiation rate (α = 0.01) in Fig 8D, bursts of up to 15 polymerases can be observed in some rare events. More interestingly for the case with (α = 0.002) in Fig 8E, the convoys can regularly include up to six polymerases even though the bulk density ρ for this case is 0.01 and the flux J is 0.002 (which would correspond to a 500 second average waiting time in a classical TASEP). In short, non-trivial bursting is still possible in the extreme initiation limited region.
From a biological standpoint, these parameter sets are obtainable. In Fig 8, cases with γ equal to 0.1 and 0.2 are considered. Veloso et al. utilized BruDRB-seq to demonstrate that the range of first passage elongation rates γ in K562 Leukemia cells ranged from 0.0083 to 1 site/s [19]. Bintu et al. showed via optical trap experiments that nucleosome unwrapping took between 10-100 seconds [13], and the cases presented here assume h o = h c = 0.0143 or 0.0333 which correspond to roughly 30-70 seconds. Tantale et al.'s study of HIV1 gene transcription (under high initiation rates) via MS2-tagging suggested that polymerases could move in convoys of 10-25 with an intraburst waiting time of 4.3 seconds which is consistent with the predictions from the proposed geometric distribution for burst size and the TASEP time headway distribution [63]. Therefore, nucleosome-induced pausing serves a viable mechanism for transcriptional bursting on its own.

Comparing the biological effects of static and dynamic defects
In this section, we numerically analyze RNAPII elongation on non-uniform DNA where the advance rate q or the nucleosome rate constants h o and h c may vary along the gene. We assume that the gene contains a 1600 bp (8 nucleosomes/32 site), partially methylated CpG Island at the promoter and seven 400 bp (2 nucleosomes/8 site) regions that correspond to exons and the adjacent intron-exon boundary regions where co-transcriptional splicing occurs [18,64]. (The reader should note most genes contain one long exon of median length 1,000-1,500 bp and many smaller exons of median length 150-200 bp [65]).
For the cases where we treated these features as static defects, we assigned a unique value of q i to each site i. First, the effect of variable GC content was incorporated into each site's advance rate by the following equation where the average advance rate is set by q, and N GC represents a binomial-distributed sample of the number of guanine and cytosine base-pairs present in the 50 bp interval. Then, the values of q i (obtained from Eq 47) were reduced by an additional 50% on the sites corresponding to the CpG Island and the exons. The minimum advancement rate obtained from this procedure (i.e. static defect slowdown of q i = 0.5q plus additional GC content penalty) used to generate the lattice for the gene in Fig 9A and  In Fig 9A and 9B, the effects of static defects on advance rate parameter variability is considered under initiation limited (α = 0.05, 9A) and saturating conditions (α = 0.25, 9B). The reader should note that all of our prior definitions for v, ρ, and J given in Eqs 39-41 assume that α and γ are scaled relative to q. Therefore, while α = 0.25 is not saturating for q = 1, the rescaled a 0 ¼ a q s ¼ 0:25 0:43 ¼ 0:58 is saturating for the slow sites. A few features become  immediately obvious. First, regions of elevated polymerase density have appeared at the sites of the static defects due to the increased dwell time on these sites, and the reduced flux through these sites has non-trivially reduced the transcription flux through the system (which must be constant across all sites). As the nucleosome dynamics slow down, the localized regions of elevated polymerase density smear out until the density profiles begin to qualitatively look like those in Fig 2C in the absence of perturbations.
In Fig 9A (barring the top left two plots), the flux through the gene is roughly equal to or slightly less than the flux predicted by the slowest bottleneck. The most significant discrepancies arise for cases with h c �h o (top and middle right) where the simulated flux is significantly lower than the predicted flux through the slow sites. We hypothesize that this results from a synergistic effect where slow downs on the static defects increase the likelihood of nucleosome rebinding immediately after them. In contrast, for Fig 9B (barring the top left two cases), the fluxes mostly lie between the upper and lower bounds set by the unperturbed and most perturbed predictions. However, in the case of fast nucleosome dynamics in Fig 9B (bottom row), the simulated values converge to the predictions for the slowest site. Last, the discrepancies between the predicted and simulated fluxes that existed for the cases with h c �h o when α = 0.05 (Fig 9A, top and middle left) have disappeared when the initiation rate was increased to α = 0.25. We suspect that saturating the gene has led to the formation of robust convoys that overcome the nucleosome rebinding effect that occurs after slow sites.
While Fig 9A and 9B demonstrate that static defects can throttle the flux, the resulting density profiles in Fig 9B have sharp discontinuities. Next, in Fig 9C and 9D, we investigate the alternative hypothesis that these genetic and epigenetic features could be inducing pausing by causing local variation in the nucleosome rate constants without inducing the same discontinuities.
In Fig 9C and 9D, it is clear that this alternative mechanism of action produces a similar throttling effect but with a smoother density profile. In Fig 9C, the simulated flux is dramatically reduced even relative to the fluxes in the analogous subplots in Fig 9A suggesting that small variations in the nucleosome rate constants can have a substantially stronger effect than even adjusting the nominal advance rate. Second, in many cases, the observed flux is substantially lower than the prediction for slow sites. We suspect that this effect is associated with the sharp decay in density at the left-hand boundaries in Fig 9C that is not observed in the static defect case. We hypothesize that the faster nucleosome rebinding on the promoter disrupts the initial formation of convoys until later on the gene causing a reduction in flux due to increased polymerase-nucleosome collisions.
While Fig 9C shows that local modification of the nucleosome dynamics can affect the flux at low initiation rates, Fig 9D shows the limitations of this hypothesis at higher initiation rates. For higher initiation rates, the exclusionary binding rule leads to nucleosome unwrapping/ destabilization even on the faster binding sites. Thus, when the system starts to saturate, the observed fluxes increase from at or slightly below the lower bound set by J Slow to an intermediate value between the bounds or even the upper bound set by J Norm .

Discussion
In this study, we present a stochastic, agent-based model of transcription with nucleosome induced pausing that maps onto the ddTASEP. The model reproduces key features of transcription including cooperative nucleosome destabilization [23,24,66], polymerase convoy formation [63], and transcriptional bursting [56]. Further, the model provides significant insight into the physics of jamming transitions in the presence of dynamic defects [34].
In lieu of a traditional mean-field approach to solving the ddTASEP dynamics, we calculated the moments of first passage time (MFPT and VFPT) using a Markov chain approach [37,39], yielding exact results for both. Our approach allowed us to directly quantify the effects of the microscopic nucleosome rate constants on the first passage elongation rate γ. Through our analysis of the index of dispersion of the waiting time to enter a nucleosome (D e ), we observed that significant deviation from classical TASEP behavior can occur as the resting nucleosome density h c h o þh c increases, but these deviations from classical behavior are more dramatic as the unwrapping kinetics slow down (h o !0) even at fixed nucleosome density. This is potentially consistent with observations such as those in Bintu et al. that showed that multiple mechanisms to reduce transcriptional pausing exist. In acetylation, the thermodynamic equilibrium of the resting nucleosome density (wrapping state) is altered with minimal effect on the rates of wrapping and unwrapping [13]. In contrast, tail-less nucleosomes (which are a minimally explored epigenetic regulatory mechanism [67]) had substantially faster rates of unwrapping and re-wrapping with no change in the resting nucleosome density (wrapping state) [13]. Both mechanisms contributed to fewer transcriptional pauses of shorter duration relative to transcription through unmodified nucleosomes.
Using the mean first passage elongation rate γ, we constructed a new axis to the fundamental TASEP αβ-phase diagram. Subsequently, we developed analytical approximations for the core transcriptional properties (transcription flux J, bulk density ρ, and bulk elongation rate v) based on assumptions about the asymptotic behavior of the system for low initiation rates and near the max flux limit. We identified a novel jamming transition in the αγ-plane that separated the transcriptional dynamics into initiation limited and nucleosome (dynamic defect) limited regions. Additionally, these estimates were robust to changes in gene length and to changes in the geometry of nucleosome spacing. Further research is merited into the physical properties of this system. For instance, the effects of varying β in the αβγ-volume and the effects of varying the dynamic defect length should be explored.
While the burst size and waiting time distributions shared similarities with two-state models, the model provided insight into a novel mechanism for transcriptional bursting that does not intrinsically rely on standard mechanisms at the promoter (such as two-state models) [25, 26,56]. ddTASEP burst groups were shown to be quantitatively (via the time headway distribution [60]) similar to shorter TASEPs nested within the ddTASEP lattice. Further, the average inter-burst intervals were shown to correlate strongly with the index of dispersion D e , and the max burst size was observed to be proportional to N B / 1= ffi ffi ffi ffi h c p . In eukaryotic systems, many transcriptional initiation rates are slow with most genes showing one RNAPII on a loci on average at a time [68]. However, even in the extreme low initiation rate region, non-trivial transcriptional bursts were observed on average. Further, given that the burst sizes were geometrically distributed, non-trivial bursting can occur even in extreme initiation limited region.
While the nucleosome-mediated bursting hypothesis is interesting, our model more importantly demonstrates both how powerfully and how versatilely nucleosomes dynamics can modulate transcriptional elongation and flux. An emerging body of evidence is forming in the literature that implicates the role of enzymes such as BRD4 and DNMT1/DNMT3B as epigenetic drivers of cancer and epithelial to mesenchymal transition. BRD4 is a histone acetyltransferase associated with histone displacement, chromatin de-compactification, and cell cycle progression. BRD4's upregulation has been associated with EMT in hepatocellular carcinoma and in salivary adenoid cystic carcinoma [69][70][71]. DNMT's are DNA methyltransferases that maintain gene silencing by hypermethylating CpG islands which encourages tighter nucleosome occupancy. Loss of DNMT activity leads to loss of nucleosomes on the promoter of tumor suppressor genes leading to their transcriptional activation [72]. The phase diagrams of the proposed ddTASEP confirm that even minor changes of the nucleosome rate constants will significantly adjust γ giving rise to a large dynamic range of behavior even for fixed initiation rates α consistent with these patterns of transcriptional upregulation. Further, unlike transcription factor dynamics or other shorter time-scale processes traditionally considered in two state models of transcription regulation, these nucleosome mediated methods of transcriptional control can span timescales from days to weeks giving long term control [73].
Last, in the final section of the manuscript, the proposed ddTASEP was extended to consider both static site defects that introduce variability in the RNAPII nominal advance rate and localized variability in the nucleosome rate constants that also induced pausing. Our results demonstrated that the nucleosome dynamics could be tuned individually to induce localized pausing effects that may occur at genetic points of interest like CpG islands or exons without introducing discontinuous shock profiles that would arise from treating these as static defects. Further, the resulting throttling effects on the flux were comparable if not stronger than those obtained in the static defect case. Additionally, the capacity to modify the nucleosome dynamics along the gene body provides an interesting capability in this model that is not attainable in a two-state transcription model. Utilizing our model, it may be possible to tune the nucleosome dynamics in different regions to serve different functions. For instance, a unique set of wrapping dynamic rate constants could be assigned to the promoter-proximal region to simulate promoter associated histone modifications like H3K9me3 silencing or H3K27ac activation while the gene body could have a separate set of modifications such as H3K79me3 or H4K20me1 promoting transcriptional elongation on the gene body leading to locally elevated polymerase density near the transcription start site with a homogeneous gene body (e.g. see Akhtar et al.) comparable to our density profiles in Fig 9C [19,74,75].