A Mechanistic Model for Cooperative Behavior of Co-transcribing RNA Polymerases

In fast-transcribing prokaryotic genes, such as an rrn gene in Escherichia coli, many RNA polymerases (RNAPs) transcribe the DNA simultaneously. Active elongation of RNAPs is often interrupted by pauses, which has been observed to cause RNAP traffic jams; yet some studies indicate that elongation seems to be faster in the presence of multiple RNAPs than elongation by a single RNAP. We propose that an interaction between RNAPs via the torque produced by RNAP motion on helically twisted DNA can explain this apparent paradox. We have incorporated the torque mechanism into a stochastic model and simulated transcription both with and without torque. Simulation results illustrate that the torque causes shorter pause durations and fewer collisions between polymerases. Our results suggest that the torsional interaction of RNAPs is an important mechanism in maintaining fast transcription times, and that transcription should be viewed as a cooperative group effort by multiple polymerases.


Introduction
Transcription is the first, and often the key, step in the control of gene expression. The process of transcription has several important phases. First, an RNA polymerase (RNAP) binds to a promotor sequence of a gene and initiates elongation. Next, the RNAP elongates down the DNA generating a single-stranded copy of RNA, and finally terminates, releasing the nascent copy of RNA. If the resulting RNA is mRNA, it is then translated by a ribosome to a chain of amino acids that fold to produce a protein. If the resulting RNA is rRNA or tRNA, it is not translated but provides a scaffold to facilitate binding of other proteins to form RNA-protein complexes such as ribosomes. In prokaryotes, both transcription and translation happen in the cytoplasm of a cell and can occur simultaneously. Therefore regulation of gene expression in bacteria, such as E. Coli, primarily happens at the transcriptional level [1,2].
Elongation of RNAP along the DNA strand is not uniform, but is interrupted by frequent pauses. There are at least three different types of pauses; backtracking pauses, hairpin pauses, and ubiquitous pauses [3,4]. Backtracking pauses and hairpin pauses have been shown to have a higher probability of occurring during transcription of specific sequences [5][6][7]. On the other hand, ubiquitous pauses are thought to have no dependence on DNA sequence and are equally likely to occur at any position along the DNA strand. It has been theorized that ubiquitous pauses are caused by a restructuring of the polymerase [4], but the exact cause remains an open question. These pauses are short (1-5 seconds) and occur approximately every 100 base pairs (bp) [4].
There has been substantial interest to understand the effect of the presence of pauses on the average transcription time and therefore output of the RNA, for highly transcribed genes. Presence of pauses may lead to traffic jams of RNAPs when one polymerase stops, affecting the trailing polymerases [8][9][10]. According to Klumpp et. al. [10], in their stochastic model RNAPs experienced a 40% reduction in the average elongation rate in dense traffic, amplifying the pause effect. This is similar to the results of a PDE model previously published by the authors [8,9]. In less traffic, the RNAPs in the model experienced only a 12% reduction of the average elongation rate [10].
An ODE model was developed in 2008 that studied the interactions of simultaneously transcribing RNAPs [11]. In this paper, Tripathi et. al. were able to incorporate the mechanochemical cycles of each RNAP into their model using two main states; when the pyrophosphate PPi is bound to an RNAP and when PPi is not bound. Using a mean-field approximation, they calculated the average rate of RNA production. With highly transcribed genes, the interactions between RNAPs can have a large impact on elongation efficiency.
A prototypical example of a highly transcribed gene is an rrn operon in E. coli. Each E. coli genome has seven rrn operons whose transcription produces ribosomal RNA (rRNA) which provides a scaffold for a ribosome [12][13][14]. During conditions of rapid growth there are as many as 70,000 ribosomes in a cell. To keep up with high demand for ribosomes, 90% of transcription in fast growing E.coli produces rRNA and tRNA, and only 10% produces mRNA [15]. As a result, there is a high density of RNAPs on all rrn operons and a high transcription completion rate is imperative. Experimental measurements have shown that approximately 31% of an rrn operon is covered by RNAPs (about 51 RNAPs) [16] during high growth conditions, which strongly suggest that the polymerases interact either directly, or indirectly during transcription. This interaction appears to be cooperative. In vivo and in vitro experiments have demonstrated that presence of multiple RNAPs in close proximity can assist in increasing the average elongation rate. A trailing RNAP can help a paused RNAP to re-enter translocation, thereby decreasing the delay caused by pauses [17]. The magnitude of the cooperativity effect has not been firmly established. However, it is worth noting that the average elongation rate of RNAPs on the rrn operon is 90 nucleotides per second (nt/s) [10,12,[18][19][20], which is about double the in vivo elongation velocity on protein coding genes [10,[21][22][23].
While the elegant paper of Epstein et. al. [17] firmly established the cooperativity effect, it did not propose a mechanistic explanation for this phenomena. In this paper we propose that the torsional force between the elongating RNAP and the DNA, caused by the helical structure of the DNA, may provide the mechanical underpinning for the interaction between elongating polymerases. The basis for our model is a set of experimental measurements by Ma and coauthors [24]. Using in vitro single-molecule experiments, they measured both the magnitude of torque exerted by elongating RNAP on DNA, and the effect of supercoiled DNA on RNAP velocity, pause density and pause duration.
We construct a model for transcription that substantially extends a basic stochastic model referred to as a Totally Asymmetric Simple Exclusion Process (TASEP), [2,8,10,[26][27][28][29][30][31][32]. In the most basic TASEP model each individual RNAP enzyme, hops along the DNA strand with a predetermined mean hopping rate provided that the forward site is unoccupied. The entire enzyme, spanning 35 nucleotides, translocates forward one nucleotide at a time as a unit, with the position of the RNAP being determined by the front of the enzyme. In addition to the basic TASEP, a mechanism for RNAP pausing can also be implemented. When pauses are included in the TASEP implementation, both the mean pause frequency and mean pause duration are constant and chosen a priori. In our model, Elongation with Torque Assisted Motion (ETAM), the rate of hopping depends on the torque between the polymerase and its closest two neighboring polymerases. The amount of torque is, in turn, the result of the relative motion of RNAPs on the DNA strand. Transcriptional pauses are included in the model as well. The mean hopping rate, mean pause frequency, and mean pause duration are dynamically updated within the model, and these parameters vary as the amount of torque varies for each RNAP. For any given RNAP that is translocating, the hopping rate and pause information depends upon the torque that is currently being experienced, and the subsequent motion is determined by sampling from the respective probability distribution functions. We base our model of the torque effects on experimental results by Ma [24] which experimentally measured the effect of torque on translocation (hopping) rate, pause duration and pause frequency. Over-twisting of DNA by shortening the distance between the polymerases increases (decreases) the translocation rate of the leading (trailing) polymerase, decreases (increases) the probability of entering a paused state for leading (trailing) polymerase and shortens (lengthens) the pause duration of the leading (trailing) polymerase.
ETAM simulation results show that the torque-based interaction between RNAPs results in a substantial cooperation effect between RNAPs. As a trailing RNAP approaches a leading RNAP, the resulting torque increases the effective elongation rate and reduces the likelihood and duration of pauses of the leading RNAP. At the same time, the effective elongation rate of the trailing polymerase decreases, while the likelihood and duration of pauses increases. As a result of this interaction, the duration of pauses decreases, and the average number of completed transcription events increases. The effect of this interaction is not unlike that of autonomously driven and communicating vehicles ("google" cars) on the road. By automatically adjusting velocity and helping each other to maintain proper spacing and shorten pauses, the collective motion of polymerases becomes more efficient with an average transcription time that is 37.5% shorter than that produced by the TASEP model. In this sense, the RNAPs are collaborating in order to transcribe the strand more efficiently than they would if they were traveling at a constant rate.

Results
In order to examine the effect of the torque on the transcription simulation we compare several quantities of interest for both the ETAM and TASEP models. By comparing ETAM and TASEP, we isolate the effect of the torque mechanism. For clarity of the results presented in this section, we briefly describe the torque effects simulated in the ETAM model. The details of the torque computation and specific parameter values are found in the Methods section.

Elongation with Torque Assisted Motion
DNA double helix structure makes one full rotation in approximately 10.5 base pairs [34]. RNAPs are large molecules that translocate along this twisted structure, which places constraints on the mutual motion of DNA and RNAP. If the DNA strand were fixed in space, an RNAP would have to rotate around DNA during translocation. The size of the RNAP and the packed environment inside the cell precludes this motion and a localized rotation of DNA has been observed [35,36]. If the DNA strand were free to rotate, it could spin along its long axis as it enters a stationary RNAP. However, if DNA is fixed upstream of the elongating RNAP and the RNAP elongates without rotation, it applies torque to the DNA. This torque is stored within the portion of the DNA strand between the fixed end and the RNAP, and if the amount of torque is large enough, it can either preclude or facilitate the forward motion of itself and its neighboring RNAP.
The effect of torque on DNA-RNAP interaction was experimentally quantified in recent work of Ma et. al. [24] where a single-molecule optical trap experiment was employed in order to measure the effect of twist in a DNA strand on transcribing RNAP. In particular, they first applied a predetermined value of torque to a strand of DNA and then measured the elongation rate, the pause frequency and the pause duration of an RNAP elongating on the strand. Two types of twisting mechanisms are used to describe the applied torque. The first is over-twisting, and it is characterized by applying twist in a manner that shortens the length of the full rotation of the helix (measured in base pairs). Likewise, a twist that increases the length of the full rotation of the helix is termed under-twisting. Ma and collaborators observed that over-twisting decreases the elongation rate of the RNAP and increases both the likelihood and the duration of pauses. On the other hand, under-twisting was found to increase the elongation rate and to decrease both the likelihood and the duration of the pauses.
To illustrate how these results can have an effect on the transcription of DNA by multiple polymerases, consider three consecutive polymerases on a DNA strand labeled as P i − 1 , P i and P i+1 in Fig 1. This notation is defined in detail in the section Incorporating Torque into Stochastic Model. We model the small segment of the DNA strand between two neighboring RNAPs as an elastic rod, and the elongation motion of one of the RNAPs imparts a twist within the elastic rod. The torque that results from this twisting motion is calculated using classical elasticity theory, and this calculation is detailed in the section Torque Between RNAP and DNA. A brief schematic overview of the motion of the RNAPs is presented below.
With respect to the motion of P i along the strand, the RNAP represented by P i − 1 is referred to as the leading RNAP, and the RNAP labeled P i+1 is referred to as the trailing RNAP. If we assume that individual RNAPs never move at exactly the same time, when P i moves forward, both P i − 1 and P i+1 provide anchors for the DNA strand. This movement imparts a torque to the portion of the DNA strand (one elastic rod) between P i − 1 and P i , as well as to the DNA strand between P i and P i+1 (another elastic rod). The portion of the strand between P i − 1 and P i will over-twist, and the portion of the strand between P i and P i+1 will under-twist. Note that the over-twist will increase the elongation rate of P i − 1 (i.e. P i − 1 receives a "push from the back") and under-twist will also increase elongation rate of P i+1 (i.e. P i+1 receives a "pull from the front"). It is noted in a later section that both of these effects tend to synchronize the motion of all three polymerases.
Our simulation results report the following quantities: the average transcription time, the average pause duration, the average collision duration time, the number of pauses and collisions, and the average transcriptional delay time experienced by an RNAP. Each of the above quantities is calculated over a range of initiation rates α Á β where β is the average elongation rate of 90 nt/s and α 2 [0.0001, 0.0115] using 11 discrete values within this interval. An α value of 0.0001 corresponds to an initiation every 111 seconds on average, while α = 0.0115 would have an initiation every 0.96 seconds on average. For each value of α, we performed 50 simulations of both ETAM and TASEP and ran the simulation for 10,000 simulated seconds. This ensures a sufficient amount of data is collected for accurate results when compiled and averaged together. In each of the fifty simulations we record the start and end time of each RNAP transcription process, and we also record both the beginning and the end time of each pause and collision.

Average Transcription Time
Experimental results from physicists and biologists give an average transcription time for the rrn gene of approximately 60 seconds [4,16], and in [16], the authors also assert that the rrn gene is, on average, approximately 31% covered. This corresponds to an average velocity of 90 nt/s. With the physical parameters used for both the TASEP and ETAM model simulations, we attempt to mimic the biological case of transcription of this gene. In order to obtain the average transcription time per RNAP, the transcription time for each RNAP within the simulation is recorded, and these values are averaged over all of the RNAPs for each specific initiation rate. The results can be seen in Fig 2A where data is presented for two situations. In addition to the simulations for both TASEP and ETAM models with pauses, we also performed numerical simulations for both models in the case of no pauses for comparison purposes. We refer to the case without pauses as the "baseline" case for each model. While the RNAPs could still experience collisions for the baseline case, the number of collisions was significantly lower, see the curves with the dotted lines in Fig 2A. This baseline model allows us to calculate the transcription time for an RNAP without any transcriptional delays caused by pauses, and we note that for both models, the average transcription time is close to 60 seconds and agrees well with numbers reported in the literature. For α = 0.0115, we observe a 61.23 second average transcription time for the baseline TASEP model and a slightly faster average transcription time of 54.7 seconds for the baseline ETAM model.
Examining the case where transcriptional pauses are introduced into each of the models, we see very different effects, and these are shown in the solid curves of Fig 2A. For α = 0.0115, the average transcription time for TASEP is approximately 156.25 seconds, which for a DNA Polymerases P i − 1 , P i , and P i+1 in order on the DNA strand. When P i translocates, the DNA between P i − 1 and P i will over-twist and the DNA between P i and P i+1 will under-twist, increasing the elongation velocity of P i − 1 and P i+1 . strand of 5450 nucleotides corresponds to an average velocity of 34.88 nt/s. This rate is significantly lower than the 90 nt/s resulting from an experimental average transcription time of 60 seconds for the rrn gene reported in [4,16]. In contrast, for the ETAM model at α = 0.0115, the average transcription time was approximately 97.73 seconds, which corresponds to an average velocity of 55.77 nt/s. This average velocity agrees much better with the experimental values. The significant difference in average transcription times between the two models is attributable to differences in the average number of pauses and their durations, as well as differences in the average number of collisions between RNAP's in the two models. In the following sections we carefully examine the effect of torque on these quantities.

Average Number and Average Duration of Pauses
In the TASEP model simulations, each RNAP experienced, on average, 70 pauses with an average duration of 0.55 seconds. As expected, these results are constant for the TASEP model over the entire range of initiation rates as seen for the magenta curves in Fig 3A and 3B respectively. For the ETAM model, the average number of pauses experienced by an RNAP varies significantly over the range of initiation rates included in the simulations. The number of pauses increases monotonically from 202.69 for α = 0.0001 to 2169.32 for α = 0.0115. This is a 970.3% increase over the range of initiation rates; moreover, for the entire collection of these simulations, the average number of pauses was consistently larger than that of the TASEP model, see Fig 3A. For the largest value of α, RNAPs in ETAM experienced 2999% more pauses than RNAPs in TASEP.
This result can be intuitively explained based on the construction of the ETAM model discussed in the Methods Section. If there are more RNAPs on a DNA strand, the RNAPs will be more likely to interact and the interaction they experience will likely be stronger than if there were fewer RNAPs. This is because the distance between RNAPs is smaller, on average, for a strand with a higher percentage of RNAP coverage. Therefore one would expect RNAPs to experience more pauses in an environment with more coverage (higher values of α) than they would in an environment with less coverage (lower values of α).
An important and interesting observation that accompanies the preceding results is the data for the average duration time of these transcriptional pauses. Our results indicate that in the ETAM model, the average duration time of the pauses decreases significantly for higher initiation rates, see Fig 3B. The duration time decreased 91.7% over the range of initiation rates, from 0.24 seconds when α = 0.0001 to 0.02 seconds when α = 0.0115. Note that the values of pause duration on the order of 0.02 seconds would not be detectable in experiments, and so very likely the motion of polymerases at these coverages will experience fewer observable pauses. At the highest value for α, the average pause duration time is 96.4% lower in ETAM than in TASEP. We propose that the effects of the torque mechanism on the average duration time of pauses can be summarized as follows. While an RNAP is more likely to pause in regimes with higher initiation rates, the effects of torque, that is, the "push from the back" and the "pull from the front," are stronger when the neighboring elongating RNAPs are closer to the paused RNAP. Therefore, the paused RNAP can be pushed or pulled "out" of its pause state and into active elongation by means of these torsional effects much more quickly than in a regime where the pause duration time is determined by purely stochastic effects.

Number and Duration of Collisions
We hypothesize that we can explain results of Fig 3C by the fact that the torque mechanism of the ETAM model allows transcribing RNAPs to maintain their spacing relative to their neighbors and to decrease the number of collisions (as described in the Methods Section under Incorporation of Collisions) that occur among polymerases. In order to investigate this hypothesis, we monitor and record the average number of collisions that occur and the time durations of those collisions for both the TASEP and the ETAM models. The initial intuitive expectation is that one would observe an increase in the number of collisions as the initiation rates increase (or as the percent of coverage increases). That is, for a larger coverage of RNAPs on the DNA strand, collisions should become more likely. Likewise, if there are very few RNAPs on the DNA simultaneously, collisions are unlikely. As shown in Fig 3C, the data for the average number of collisions behaves as expected. However, the number of collisions increases much more rapidly in the case of the TASEP model than for the ETAM model. Computing a ratio of the slopes of these two lines, we observe that the number of collisions for the ETAM results is growing at approximately 9% of the rate of the number of collisions of the TASEP results.
Given that RNAPs in the TASEP model collide much more often, the average duration time of each collision becomes critical. Fig 3D shows that for the largest initiation rate, α = 0.0115, the average duration time of a collision is significantly longer in TASEP, 0.104 seconds, than in ETAM, 0.011 seconds. Fig 3D indicates that RNAPs in the TASEP model tend to experience collisions that last approximately five times as long as those in the ETAM model. Moreover, this effect is consistent over the entire range of initiation rates included in the study. We believe this indicates that the torque is contributing to the RNAP's ability to maintain a degree of spacing and distance with its neighbors, thereby reducing the number of collisions that occur for the ETAM case. In addition, when collisions occur in the ETAM model, they tend to be very short in duration.
In summary, the inclusion of torque effects into the basic stochastic model seems to allow transcribing RNAPs to dynamically manage elongation velocity and spacing so as to avoid collisions and traffic jams. While the RNAPs experienced more pauses in the ETAM simulations, these pauses were significantly shorter in duration than those of the TASEP simulations, with far fewer collisions and shorter collision durations. To investigate more closely the effect of torque on pauses, collisions and their duration we attempt to summarize these effects by computing average transcriptional delay experienced by a polymerase in each model.

Average Transcriptional Delay
During the transcription process, an RNAP experiences both pauses and collisions, each with a certain time duration. These interruptions cause active elongation of the RNAP to cease until the RNAP is able to move again. The amount of time that an RNAP is unable to translocate contributes to the delay that the RNAP experiences. To quantify this concept, we define the average total delay to be sum of both the average delay due to pauses and the average delay due to collisions. The average total delay per RNAP is computed with the formula Delay total ¼ Delay pause þ Delay collision ¼ ðave # of pauses per RNAPÞ Á ðave pause durationÞ þ ðave # of collisions per RNAPÞ Á ðave collision durationÞ the results of which can be seen in Fig 2B. Specific values of the total delay over the range of initiation rates for each of the ETAM and TASEP models can be found in Tables 1 and 2, respectively. The average delay experienced by an RNAP in the TASEP model increased from 38.52 seconds when α = 0.0001, to 100.61 seconds when α = 0.0115. Conversely for the ETAM model, the average delay decreased from 48.63 seconds to 40.82 seconds over the same range of initiation rates. For the ETAM model, the decrease in delay values for increasing initiation rates (and thus increasing coverage), is evidence to suggest that the torque contributes to an increase in  efficiency with multiple RNAPs transcribing simultaneously. Moreover, the comparison of these two models allows us to observe that, in the TASEP model, the phenomena that drives the total delay is that of the delays due to collisions (as opposed to the delays due to transcriptional pauses experienced). Table 2 clearly demonstrates that, for the TASEP model, the delay which the RNAPs experience due to pauses remains virtually constant over the entire range of initiation parameters and that the fact that the total delay is increasing, is almost exclusively attributable to the delays due to collisions. In contrast, Fig 3C and Table 1 demonstrate that the torque mechanism included in the ETAM model prevents many collisions from happening, and it also decreases the amount of delay that RNAPs incur from those few collisions that actually do occur. This is especially apparent with the higher initiation rates. The torsional interaction between RNAPs leads to far more efficient transcription, especially in the case of high coverage of the DNA strand.
The torque provides a mechanism for the RNAPs to interact and to cooperatively prevent collisions from occurring. If an RNAP pauses, the trailing RNAP will slow down and/or enter a pause with a much higher probability due to the increasing torque applied to it as its elongation continues. The trailing RNAP will likely enter a pause or it may push the leading RNAP into active elongation before a collision occurs. The evidence of this interaction can be seen in the difference in the number of collisions experienced per RNAP in the two models in Fig 3C and  Tables 1 and 2.
The comparison of the ETAM and TASEP models leads us to propose that torque is an important mechanism in the regulation of transcription. Neighboring RNAPs may interact with each other using torque to optimize their elongation efficiency as a group effort as opposed to an individual process as suggested in [17]. In this paper, Epshtein et. al. show that transcription times are faster with multiple RNAPs present on the strand as opposed to the case of a single molecule transcribing. The ETAM simulations for α = 0.0001 have an average transcription time per RNAP of 109 seconds, with an average time between initiations of 111 seconds, see Table 1. With this parameter setting, the simulation is essentially a model of transcription by a single polymerase. The average transcription time for α = 0.0115 decreased nearly 12 seconds from the lower value of α. This is largely due to the paused RNAPs being pushed back into active elongation by their neighboring polymerases, a phenomenon suggested by Epshtein and Nudler [17]. Thus far, our results are being presented in terms of the average values, we now consider the variance of the average transcription time, the average pause duration, and the average collision duration; these are the quantities that have received careful consideration for model comparison.

Variance of Transcription Time, Pause Duration and Collision Duration
The coefficient of variation and the variance to mean ratio are presented in Fig 5(A) and 5(B) respectively. In both of these variance measures, the variability from the mean of the ETAM model decreases significantly and becomes very small in as the initiation rates increase while the TASEP model variability continues to grow. The baseline models, simulated with no pauses, are included in order to show that the variance is very close to zero for both models in the absence of pauses. With the addition of pauses, the possibility of prolonged traffic jams and collisions immediately increases the variance in both models. For the case of ETAM, as more RNAPs are added to the DNA strand, the torque interaction between polymerases drives the variance in transcription time with respect to the mean down towards zero again, as shown by the coefficient of variation and the variance to mean ratio. For both the pause duration and the collision duration, the variance for ETAM decreases as α increases. Therefore when there are fewer RNAPs on a DNA strand, the higher variance in pause duration indicates that the pause duration for the RNAPs is somewhat variable. However, as more RNAPs are transcribing simultaneously, the case for the higher α values, this variance goes to zero for both the pause duration and collision duration. The torque interaction between RNAPs is much stronger when RNAPs are closer together. This allows the RNAPs to communicate, and it stabilizes the elongation process. As a result of the decrease in pause durations and number of collisions, the RNAPs do not experience large traffic jams that can cause a great deal of variability in transcription time. In higher densities, the RNAPs can also push a paused RNAP back into elongation which causes uniformly short pause durations with very little variability.
The variance for the TASEP model is significantly different than for ETAM. Since the mean hopping rate, mean pause frequency as well as the mean pause duration for TASEP are all prescribed in the model, independent of the parameter α and constant throughout the simulation process, it is expected that the variance for the pause duration remains constant over the range of initiation rates in the case of TASEP. However, the variance for the collision duration slightly increases, and the variance measures for the transcription time increase considerably. Although the variance for the collision duration only slightly increases, the average number of collisions increases significantly as α increases, see Fig 3C, and this results in the increased variability of the average transcription time at higher values of α. Without torque, the RNAPs in the TASEP model are unable to work together. Therefore, when there are many RNAPs elongating on the same DNA strand, an individual RNAP is more likely to experience a traffic jam. The increased frequency with which these traffic jams occur and the variability of the length of these traffic jams can cause a large difference in transcription times for individual RNAPs.

Discussion
By incorporating the torque mechanism into a basic TASEP model of transcription we are able to see a cooperative effect among transcribing RNAPs. This effect was noted experimentally in 2003 by Epshtein and Nudler [17]. At the time, the mechanism causing this behavior was unclear. After the recent developments by Ma et al [24], and results from our model simulations, we propose that the torsion on the DNA caused by RNAP transcription is allowing the RNAPs to communicate with each other in order to maintain proper spacing, thereby avoiding collisions, and to increase the rate of transcription. A theoretical examination of the effect of the torque mechanism proposed here can be found in the Methods Section under Mean Field Approximation Model. A classical car following model that includes forces from both the leading and trailing RNAPs yields a steady state density-velocity relationship that qualitatively explains why the torque mechanism generates cooperative behavior among the neighboring RNAPs.
The cooperation between RNAPs is clearly seen from the results of our stochastic model, ETAM, which incorporates torque into a basic TASEP model. We compare the results of this model with those of the TASEP model to isolate the effect of the torque. There are two results that clearly demonstrate this cooperative effect: the average number of collisions each RNAP experiences and the average pause duration.
With a high initiation rate of RNAPs onto the DNA, each polymerase experiences on average 550 fewer collisions with neighboring RNAP during the course of transcription when the elongation is regulated by torque. This is a direct result of the torque allowing the RNAPs to communicate with the polymerases that are closest to them. During a simulation of TASEP, an RNAP will elongate until it either pauses or is stopped because the next nucleotide downstream is occupied by a paused polymerase. With ETAM, as an RNAP elongates close to a paused RNAP, the resisting torque experienced by the elongating RNAP makes it much more likely to pause. At the same time, the paused RNAP experiences an assisting torque from the elongating RNAP which can push it out of a pause and into active elongation. This interaction prevents an average of 550 extra collisions from occurring in high coverage regime.
The average pause duration is the other quantity most affected by the torque. With ETAM the pause duration is not a fixed quantity but can be dynamically recalculated to account for the actions of neighboring RNAPs. Pause durations can be shortened when the polymerases surrounding the paused RNAP elongate. Simulation of ETAM produced, on average, 0.02 second pauses as opposed to 0.55 second pauses simulated in TASEP. When comparing the average transcription time per polymerase we see an even more striking difference. The ETAM model shows polymerases experiencing an average transcription time of 97.73 seconds as opposed to 156.25 seconds in TASEP. The delay a polymerase experiences as a result of collisions and pauses has the largest effect on the overall transcription time. With fewer collisions and shorter pause durations, the RNAPs simulated in ETAM have significantly less delay resulting in far more efficient transcription.
As promising as these results are, they depend on how we fit limited data for velocity, pause frequency, and pause duration into functions of torque as discussed in the Methods Section under Incorporating Experimental Data to Determine the Effect of Torque. With a small amount of data points available and no data for values near the stall torque and melting torque, the models that we use to fit these functions near those end points are somewhat arbitrary. These high and low values for torque are calculated quite often with a high density of RNAPs on a DNA strand since the torsional interactions are so strong. As a result, the performance of the ETAM model depends on the choice for these functional descriptions when the torque values are sampled from regions where no experimental data is currently available. This issue is explored within the Methods Section under Results from Different Pause Frequency Functions.
As nanotechnology continues to improve, our hope is that data will become available for velocity, pause duration, and pause frequency at both very low and very high torque values. This will allow us to better fit our model to the data without needing to make assumptions for the extreme cases. Even with the limited data, the cooperation effect is evident in the ETAM results, with shorter transcription times in the simulations for the range of high initiation rates. With more polymerases transcribing DNA simultaneously, each RNAP experiences less delay than RNAPs transcribing with a smaller amount of polymerases. In the case of the highly transcribed genes, transcription can be viewed as a group effort, with torsional interactions allowing all of the RNAPs to transcribe more efficiently.

Methods
We construct the ETAM (Elongation with Torque Assisted Motion) model by extending the construction of a basic TASEP framework. The ETAM construction is discussed in detail, and then we briefly describe the simplifications to ETAM which result in the original TASEP model. We begin by discussing the fundamental aspects of TASEP that provide the foundation of ETAM.

TASEP Model
TASEP is a stochastic model that has been used to describe the process of both transcription and translation [2,8,10,[26][27][28][29][30][31][32]. In TASEP, each RNAP enzyme hops forward with a constant rate β on a 1-dimensional strand with a discrete and finite number of sites. RNAPs cannot occupy the same site and therefore will cease active elongation if the next site is occupied (we will refer to this as a collision). Only when the next site is vacated will elongation resume. We have implemented open boundary conditions where the RNAP enzymes enter the strand at a given rate and exit the strand once they reach the opposite end. Specifically, each RNAP enters our simulation (initiates) with a rate of (α Á β), and leaves the simulation (terminates) with a rate of (γ Á β). This is consistent with a differential equation model for transcription proposed originally in the late 1960s [33]. Therefore α and γ are scalars that multiply the elongation rate to obtain initiation and termination rates, respectively. For ease of simulation, the enzyme enters the simulation one nucleotide at a time, and exits the simulation by hopping off as a unit.

ETAM
The torque interaction between RNAPs described in the Results section above, allows the polymerases to work together using the over-twisting and under-twisting in the DNA strand. The following section makes this process more precise by deriving a mathematical expression for the amount of torque that an RNAP imparts to DNA during translocation.
Torque between RNAP and DNA. DNA consists of two strands in a double helix structure. In addition, DNA is a flexible polymer structure that can experience bend. A polymer's bend-persistence length is a mechanical property that characterizes its stiffness. For lengths less than the bend-persistence length, a polymer such as DNA, will exhibit behavior similar to that of an elastic rod. The bend-persistence length of DNA is estimated to be 150 bp [37]. Therefore, on length scales shorter than the persistence length, the DNA strand is comparable to an elastic rod. Since the average distance between elongating polymerases on an rrn gene is 100 bp [38], it is reasonable to assume that, on average, the force exerted by one elongating RNAP on its neighbors occurs over a distance of approximately 100 bp. Since this distance is within the persistence length reported in the literature, we assume that the local behavior of the DNA strand connecting two adjacent RNAPs can be modeled as an elastic rod. The movement of the RNAP enzyme in the ETAM model is the same as in the TASEP model, where the entire enzyme moves forward one nucleotide at a time as a unit. In addition, the footprint of an RNAP is approximately 35 bp, of which 17 bp is occupied by the transcription bubble [39]. As depicted in Fig 1, we assume that the 17 bp where DNA is unwound inside of the RNAP are anchored and cannot be twisted but that the other 18 bp are free to twist under an appropriate force. We assume that these 18 base pairs are partitioned into 9 bp on either side of the bubble. That is, the 9 bp of the strand that are outside of the transcription bubble of P i yet still covered by the front end of the RNAP are considered to be part of the elastic rod located between P i and its leading RNAP and are susceptible to twisting forces. Likewise, the 9 bp covered by the rear end of the RNAP are considered to be part of a separate elastic rod located between P i and its trailing RNAP and are susceptible to twist. In this setting, we use classical elasticity theory to describe the interaction between a small segment of the DNA strand (elastic rod) and an elongating polymerase. The mass of the RNAP itself is not accounted for in the current version of the model, and the torque calculation is only based on the elongation motion of the RNAP and the amount of torsion imparted on the rod by that motion.
The torquet stored in an elastic rod under torsion iŝ where r is the radius (% 1 nanometer (nm) for DNA [37]), μ is the shear modulus, L is the length of the rod, and Δϕ is the angle of the total twist [40], see Fig 6. All quantities representing length are measured in base pairs and converted into nanometers using the conversion 1 bp = 0.34 nm [37].
In the context of our model, there is some flexibility in how one associates the length of the rod L with the distance between two neighboring polymerases as shown in Fig 1. First note that Eq (1) indicates that the torque goes to infinity as L ! 0. In terms of our model, this would preclude direct contact between two polymerases. However, it has been observed experimentally [41] that RNAPs periodically exhibit direct contact with their neighbors during transcription. In addition, we recall that the footprint of elongating RNAP is approximately 35 bp, of which 17 bp is occupied by the transcription bubble [39]. The 17 bp inside of the transcription bubble cannot be twisted by an elongation force, but the remaining 18 bps are free to twist under an appropriate force. Therefore even when two RNAPs are directly adjacent to each other (without any empty base pairs separating them), there are still 18 bp between their corresponding transcription bubbles. Hence, we define the length L in Eq (1) to be the distance between the transcription bubbles of adjacent RNAPs, where L has units of nucleotides. This ensures that the underlying mathematical quantity in Eq (1) remains bounded.
The shear modulus, μ, is calculated using the formula where Y is Young's modulus and ν is Poisson's ratio [40]. The Poisson ratio of DNA has been reported to be anywhere in the interval ν 2 [−0.7, 0], for more details see [42][43][44]. To calculate this parameter specifically we will follow the paper by Manning et al [44]. Using the formula where B is the bending modulus and C is the twisting modulus [44], we calculate ν % −0.5.
Using these values for the Poisson ratio and Young's modulus in Eq (2), we have estimated the shear modulus for DNA to be m % 300 pN=nm 2 : We also simulated results using ν = 0 and ν = −0.25 and observed qualitatively similar behavior. Parameters values used in the simulations presented here along with corresponding literature references can be found in Table 3.
Next we derive a mathematical equation which describes the relationship between the length of the flexible (rod) segment of DNA strand between the transcription bubbles of successive RNAPs and the amount of torque in that segment. This segment of the strand is modeled using an elastic rod as sketched in Fig 6, and the notation is made precise below. Because there are 10.5 base pairs in one full helical rotation of DNA, the twist angle Δϕ in Eq (1) is proportional to the change in length Δℓ as The torque between P i − 1 and P i (see Fig 1) due to the accumulation of applied torque is calculated by adding small increments of torque that correspond to motion by one nucleotide as in Eq (4). This measure of torque is always done in relation to the state of neutral twist of that segment of the strand. To be precise, we consider the segment of DNA strand between P i − 1 and P i as an elastic rod (see Fig 1), and we assume that the DNA strand is in a state of neutral twist (not over-twisted or under-twisted) at the time of initiation and so the torque is zero. When the trailing RNAP (P i ) initiates, the distance between P i and the nearest leading RNAP (P i − 1 ) downstream (in the direction of elongation) is defined to be L 0 . Assume that at some later time, the distance between P i − 1 and P i is L. In order to derive the total amount of torque stored in the rod at this instance, one notes that the amount of torque between P i − 1 and P i can be computed by treating the segment of the strand as an elastic rod with an original length of L 0 that has experienced a twist corresponding to angle ϕ, with a resulting length of the rod denoted by L, see Fig 6. The total torque is calculated by adding small increments of torque that correspond to elongation by one nucleotide.
In order to construct a model that is efficient for extensive and repeated simulation, we use a continuous approximation of Δτ i by dτ, and lengths ℓ i by s. The total torque in a segment of DNA strand with initial length L 0 and final length L is approximated by Note that for L < L 0 the DNA is over-twisted and the torque is positive, as in Fig 6. Conversely, if L > L 0 the DNA is under-twisted and the torque is negative. These correspond to the resisting and assisting torque found in [24]. If L = L 0 the torque is zero, since the DNA has the same length as when the trailing polymerase initiated onto the DNA. With this formula for the torque between two neighboring RNAPs, we next describe how this formula is used within the context of the stochastic elongation model. Incorporating torque into the stochastic model. We will number RNAPs by index i which denotes the order of their initiation. The i-th polymerase will be characterized in the model by three numbers where i represents a positive integer. This triple will be updated each time the RNAP translocates along the strand. The term n i is an integer value denoting the furthest downstream nucleotide number occupied by the ith RNAP, while T i is the time of the next elongation of P i . The calculation of this value is addressed in the section entitled Incorporation of Pauses. Upon initiation of P i onto the strand, define L 0,i to be the distance (measured in nucleotides) between the transcription bubbles of P i and that of its leading RNAP, P i − 1 , at the time that P i initiates transcription. This distance is calculated and stored in the triplet for each RNAP. When P i initiates, the value of L 0,i is fixed, however, this distance may be different for each RNAP as its initiation occurs at a randomly generated time and is independent of the distance traveled by any previously initiated RNAPs. We denote L i (t) to be the distance between the transcription bubbles of P i and its leading RNAP P i − 1 at time t. This distance is computed by accessing the variables n i − 1 and n i of P i − 1 and P i and measuring their difference. Using this distance, we can calculate the length of the DNA strand that is free to twist by subtracting the length of the transcription bubble, 17 nts. In other words, In order to quantify the role of the torque calculation, we begin by considering elongation. First, suppose there are three RNAPs positioned on a segment of the DNA strand, and denote these RNAPs as P k , for k = i − 1, i, i+1 as labelled in Fig 1, and further suppose that P i has just experienced elongation. The values L i and L i+1 are calculated immediately following elongation. Using these distances, as well as L 0,i , and L 0,i+1 , the torque that P i is currently experiencing is calculated using Eq (6) in the following manner. Specifically, the torque has two components. The first is the component that corresponds to the torque between P i and P i − 1 , denoted τ(L i ), and the torque between P i and P i+1 , denoted by τ(L i+1 ). Define τ i to be the total torque experienced by polymerase P i . It is calculated as follows Analogously, calculations are repeated for RNAPs P i − 1 and P i+1 , recalculating the torque for all three RNAPs after translocation of P i . It is important to note, the calculations outlined in the previous paragraphs are only performed for the affected RNAPs; the elongating RNAP and its neighboring RNAPs. During transcription, RNAPs first experience initiation, and the measurement of torque for initiation can be interpreted as a special case of the previous discussion. Upon initiation of an RNAP, labeled P i in Fig 1, the distance between P i and its leading RNAP, the quantity L 0,i , is set. Upon initiation and subsequent elongation of P i , the torque behind this polymerase is zero until the next initiation. Once a trailing RNAP, P i+1 here, has initiated onto the strand, the calculation of the torque τ i will have nonzero contributions from both neighbors of the polymerase.
We model RNAP termination as follows. Again referencing Fig 1, suppose P i − 1 is the RNAP that is closest to the termination end of the DNA strand and therefore will be the next polymerase to terminate. Prior to termination, the torque measure τ i − 1 has a nonzero contribution only from τ(L i ). That is τ i − 1 = 0 − τ(L i ). Likewise, during this situation τ i has contributions from both neighboring RNAPs as long as P i − 1 is still transcribing the strand. Upon termination of P i − 1 , the torque downstream of P i is zero, hence τ i = 0 − τ(L i+1 ). Since P i is now the last RNAP on the DNA strand, τ i will be calculated as such until P i itself terminates.
Physical interpretation of torque. The formula for τ i in Eq (9) is based on the total amount of both assisting and resisting torque that is experienced by polymerase P i . Here we give some physical insights into the positioning of the RNAP relative to each of its neighbors and its effect on the torque calculation. Recall that when the distance L i is compared to the original distance L 0,i , the DNA between them can be either under-twisted, over-twisted or neutral. Over-twisting in front of an RNAP provides a resisting torque, while over-twisting behind an RNAP provides an assisting torque ("push from the back"). On the other hand, under-twisting in front of an RNAP provides an assisting torque ("pull from the front"), while under-twisting in back of an RNAP is a resisting torque. Examining Eq (9) for several possible scenarios gives us insight into the influence of torque on movement of the polymerase.
For the neutral case when L 0,i = L i in Eq (9), τ(L i ) is zero and there is no contribution to τ i from that component. The contribution to τ i is similar for the case of L 0,i+1 = L i+1 . Next we focus the discussion on the cases when τ(L i ) and τ(L i+1 ) are nonzero. There are four cases to consider which are depicted in Fig 7, scenarios (B)-(E) and are discussed below.
The values of the two components of torque τ i are τ(L i )>0 and τ(L i+1 )>0. A positive value of τ(L i ) corresponds to a resisting torque being experienced by polymerase P i relative to the position of its leading RNAP. A positive value of τ(L i+1 ) provides an assisting torque for P i . Therefore the subtraction of τ(L i+1 ) implies that τ i < τ(L i ). In other words, RNAP P i+1 is assisting P i to overcome the resisting torque in front of it at its current position.
The values of the two components of torque τ i are τ(L i )>0 and τ(L i+1 )<0. These two factors indicate that the DNA between P i and P i − 1 is over-twisted and the DNA between P i+1 and P i is under-twisted resulting in P i experiencing resisting torques from both sides. The negative value of τ(L i+1 ) is subtracted, increasing the value of τ i , therefore the subtraction of τ(L i+1 ) implies that τ i > τ(L i ).
3. L 0,i < L i and L 0,i+1 > L i+1 The values of the two components of torque τ i are τ(L i )<0 and τ(L i+1 )>0. The DNA between P i and P i − 1 is under-twisted and the DNA between P i+1 and P i is over-twisted resulting in P i experiencing and assisting torque from both sides.
The values of the two components of torque τ i are τ(L i )<0 and τ(L i+1 )<0. The DNA between P i and its neighboring RNAPs is under-twisted. This leads to P i experiencing an assisting torque from the front and a resisting torque from behind.

Mean Field Approximation Model
The role of a mean field approximation model, as for any other model, is to provide insight into a more detailed stochastic model. In deriving such a model, we inevitably simplify the fine model and therefore the deterministic model will preserve only some aspects of the detailed model. Some portions of the analysis given here use techniques borrowed from the treatment of various traffic flow models, and we point out a connection between this discussion and a particular traffic flow model in a remark at the end of this section. Our starting point in deriving the mean field approximation model is the expression for torque given by Eq (9). Our goal is to capture behavior of small perturbations around the stochastic equilibrium when this torque is zero. This implies that there exists a preferred distance L 0 such that L 0,i = L 0,i+1 = L 0 for all i and where L i = L i+1 = L 0 . Assume that the elongation velocity at zero torque is β. It follows that the polymerase P i is elongating with a velocity where the torque τ i at time t is given by Eq (9), and the expression inside the logarithm is simplified by our assumption that L 0,i = L 0,i+1 = L 0 for all i. The velocity function V is computed using one of the polynomial functions given by either Eqs (19) or (22) introduced in the section Incorporating Experimental Data to Determine the Effect of Torque. In particular, simulation results presented in the section Linear Fit to the Data show that a (piecewise) linear relationship between torque and velocity is sufficient to observe the cooperative behavior among RNAPs that is the subject of this work. Using a linear relationship between torque and velocity, say VðtÞ ¼Ãt þB; whereÃ;B represent arbitrary constants, the form of the velocity described above becomes.
where A, B are constants that are determined by the equation in Eq (9) as well as the piecewise linear velocity relationship used here which is found in Eq (22). Note that the constant A < 0 since the relationship between torque and velocity is a strictly decreasing one. In order to work with all positive constants, we let C = −A and using basic rules of logarithms, we arrive at the relationship dP i ðtÞ dt where C, B > 0. Next we examine the quotient L i /L i+1 in view of our assumption that both distances L i and L i+1 are near the preferred spacing L 0 . Note that if both RNAPs are exactly at the preferred spacing, then L i = L i+1 = L 0 , and the velocity simplifies to the constant B. Hence we prescribe B = β. Now suppose that the RNAPs are near the preferred spacing but not precisely achieving that spacing. Noting that these distances represent consecutive distances surrounding the RNAP on the DNA strand, if the RNAP elongates one nucleotide, then one distance L i is perturbed just slightly from the steady state, and the other is perturbed by that same amount but with the opposite sign. That is, the quotient can be expressed as where |s|< < 1 is a small parameter. Using a series expansion for the term in the square brackets, we have Returning to the ODE system above, we make the linear approximation of the quotient in terms of s so that the RNAP velocity is approximately Next we proceed to relate the expression s L 0 to the steady state density of the system so that the left side of Eq (13) is replaced by the steady state RNAP velocity. We use a typical convention from the traffic flow literature and assume that the density is inversely proportional to distance between polymerases r ¼ 1 L . Assuming that the RNAP density, ρ is either at steady state or is a slight perturbation from steady state, we write Using a series expansion for the term in parentheses, one obtains Approximating the density with the linear term, we have which then simplifies to Using this relation in Eq (13), when r ¼ r 0 ¼ 1 L 0 , then we have that the average velocity Observe, that when ρ = ρ 0 the RNAP velocity is β. However, for small perturbations of ρ so that ρ > ρ 0 then the argument in the logarithm is less than 1 and the particle velocity is lower than β, and if ρ < ρ 0 then the argument in the logarithm is greater than 1 and the particle velocity is faster that β. This self-adjustment of the velocity is a negative feedback that rejects local perturbations away from preferred density ρ 0 and preferred velocity β. That is, and v 0 (ρ 0 )<0, which can be observed graphically in Fig 8. Remark. The model given by Eq (12) has an interesting connection to a classical car-following model [45][46][47][48]. When the classical car-following model is derived by accounting for car velocities of both the leading and trailing neighbors, then this choice captures the mathematical form of velocity that we derived using the torque formulation from the elastic rod equations and the linear torque to velocity relationship within the ETAM model. In addition, the car-following model provides us with an attractive analogy between a string of polymerases transcribing a DNA strand which synchronize their motion through torque, and a string of driverless electronically synchronized cars. The power of this analogy is limited by the fact that the carfollowing model that we begin with is second order and thus the inertial term is important, which is not the case for the mean field model that we ultimately derive for ETAM. The following paragraphs give a short overview of the modifications made to the classical car-following model in order to observe the type of cooperative interactions between both leading and trailing polymerases discussed above.
Let P i (t) denote the position of the i-th vehicle on a straight, single lane roadway. For the classical car-following model, the acceleration of the car can be described by the equation where the acceleration depends inversely on the distance between itself and the car directly ahead of it, (the i − 1st car on the roadway), L i : = P i − 1 − P i , as defined in the section Incorporating Torque in the Stochastic Model, and the velocity of P i relative to its leading car. The constant C > 0 is related to the specific ability of the driver to react to the changing conditions in the traffic. In order to complete the connection with the mean field approximation for the ETAM model, we adjust the car following model from Eq (15) in two fundamental ways. First, in ETAM the polymerase velocity is affected by the leading polymerase as well as the trailing polymerase. Second, the car following model does not have apriori a preferred distance L 0 at which the forces from two flanking cars are zero, and as a result there is no acceleration of the car in the middle. To address the first issue, we expand the model by including a force due to the trailing car effect into Eq (15) so that the new equation is written We integrate using substitutions with u ¼ P jÀ1 ðtÞ À P j ðtÞ and du ¼ dP jÀ1 ðtÞ dt À dP j ðtÞ dt ! dt; for j = i, i+1 to obtain the expression for velocity One immediately observes that if the distances between consecutive RNAPs is approximately the average L 0 , then the logarithmic term in the above equation vanishes. Moreover, if the constant is set to the average velocity d i = β, then the expression given above for the car velocity has exactly the same algebraic form as that of the mean field approximation derived for the ETAM model in Eq (12). Now that a theoretical justification of the cooperative behavior seen in the ETAM model has been discussed, we discuss the effects of torque on transcriptional pauses, specifically the ubiquitous pauses described previously. The discussion below gives the last concepts needed for incorporating torque into the stochastic model. Incorporating Experimental Data to Determine the Effect of Torque Recent single molecule experiments by Ma and co-authors [24] attempt to advance our understanding of the relationship between the torque and the movement of the RNAP. The data obtained in those experiments suggests a relationship between torque and the elongation velocity of the RNAP. It has also been known for many years that during elongation, RNAPs experience short, frequent pauses where active elongation is stalled or arrested. Ma [24] also suggests that the nature of these pauses is tied to the amount of torque that an RNAP is experiencing at a given moment of time. The data reported in [24] was limited, as can be seen in Fig 9A-9C, and we explored various ways to develop accurate representations of these relationships that could be used in a mathematical model. The following subsections give an overview of various approaches to incorporating the data into the current model, and simulation results using these different choices are discussed later in the paper.
Incorporating experimental data into the ETAM model. For the ETAM model, we first fit the data from [24] to extract a functional relationship between the torque experienced by an individual polymerase and its elongation velocity, pause frequency and pause duration. Results from various biological experiments describing physical properties of DNA are used. For instance, the full model employs a stall torque, a high torque value experienced by RNAPs where they are unable to elongate, reported in [24] to be on average 11 pNÁnm. At values close to melting torque -10 pNÁnm, the RNAPs would be experiencing a large amount of assistance, therefore we assumed a high velocity with no pauses. While there are many classes of functions to fit data, our goal was to fit the data using low order polynomials. We use the notation V(τ) to denote the polynomial function used to describe the elongation velocity, F(τ) to denote the frequency of occurrence of a pause as a function of torque, and D(τ) to denote the duration time of a given pause.
One of the first papers to experimentally measure the force-velocity relationship was Wang et. al. [25]. They measured velocity as a function of progressively larger positive forces that were applied through a feedback-controlled optical trap, and the authors obtained the theoretically justified expression where A is a dimensionless constant that determines the degree to which either mechanical or biochemical events limit the enzymatic cycle at vanishing load and δ is a characteristic distance. After non-dimensionalizing Eq (17) [25], the following relationship obtained for normalized velocity V : where f > 0 is the dimensionless force, and a = A −1 . Since in our model we consider both positive and negative torque, we assume that for negative f we have VðÀf Þ % ÀVðf Þ; f > 0: Finally, for simplicity of the fitting procedure we approximate V(f) by a relatively low degree fifth order polynomial, which still captures strong nonlinear behavior of velocity described by Eq (18). When constructing the function V(τ) describing the elongation velocity, we assume that the velocity would decrease to 0 nt/s at the stall torque 11 pNÁnm. To impose a value of velocity when torque was -10 pNÁnm, we assume the polymerases would be traveling at rate that is significantly higher than V(0). This assumption is due to the fact that negative torque is assisting torque which would aid efficient elongation. A large amount of assisting torque would lead to extremely fast velocities. For simplicity, we choose V(−10) = 45 nt/s, which is approximately twice the velocity at a torque of 0 pNÁnm. Using these two extra data points we had a good agreement between the velocity data and a fifth order polynomial.
VðtÞ ¼ À0:0002t 5 þ 0:0008t 4 þ 0:0041t 3 À 0:035t 2 À 0:2166t þ 22:3574: In order to develop equations for pause frequency and pause duration, we make assumptions that are consistent with the biology. At the stall torque we assume the RNAPs will pause with probability 1. Incorporating this information into the pause frequency function, an accurate fit is difficult to achieve using a single function since the data increases very quickly to 1 at a torque value of 11 pNÁnm. Using an exponential fit severely underestimates the negative torque values, since the point (11, 1) dominates the fitting procedure. For this reason, a piecewise quadratic function is used to describe the functional relationship of the data. As the torque approaches the stall torque, the pause duration should increase rapidly. RNAPs are able to reenter active elongation after reaching a stall torque if the torque is alleviated but will stay in a pause if the torque remains at the stall torque [24]. To capture both the rapid rise and infinite duration at 11 pNÁnm, we choose a function with a vertical asymptote at the stall torque. In order to produce the asymptotic behavior, we use a tangent function as given below.
For a torque of -10 pNÁnm, we impose a pause duration of 0 seconds. The original data from [24] and the curve fits can be seen in Fig 9. Fig 9A-9C correspond to qualitatively realistic relationships between torque and the parameters influencing the motion of a polymerase. The case of a zero value of torque is considered to be the case of no resistance. When an RNAP experiences a resisting torque, this corresponds to a positive torque value. The resulting elongation velocity (see Fig 9A) decreases from that of the zero torque case so that elongation is hampered. In addition, as the amount of torque increases, the velocity of the motion decreases, and the forward motion of the RNAP is resisted. Examining Fig 9B-9C, both the pause frequency and pause durations are strictly increasing functions of torque. Pauses are more likely to occur when the polymerase experiences a resisting torque, and when a pause occurs, it is likely to be longer in duration if the torque is resisting forward motion. The figures also reflect the case that experiencing an assisting torque is associated with a large elongation velocity along with small pause frequency and pause duration for an individual RNAP.
Linear fit to the data. With limited data points, one may attempt a linear fit without using information such as stall torque. In addition to simulations of the data fits explained above, we also ran simulations using a piecewise linear curve fit. The equations are given in this section, and we discuss the results of these simulations in the section Results from the Piecewise Linear Data Fit. The data presented in [24] looks almost linear except for the highest torque value of 7.5 pNÁnm. Therefore we choose to build a piecewise linear functional relationship with two linear fits, one using the data points at -6 pNÁnm and 5 pNÁnm, the other uses the data points at 5 pNÁnm and 7.5 pNÁnm. These equations are

Simulation
We simulate the transcription process using a Kinetic Monte Carlo algorithm [49,50] which is often used for simulation of coupled Poisson processes. First, we provide an outline of the computations performed at each step of the simulation following an elongation event. The incorporation of the pauses and the recording of the collisions is detailed in this section, and we also describe the basic structure and order of events within the simulation algorithm.
Simulation events and implementation of pauses. The simulations consist of a series of events that occur for each RNAP within the system. These events are elongation and initiation. We assume that the time between two consecutive events for a given polymerase is exponentially distributed. Therefore, given that an event for polymerase P i just occurred, the amount of time that will elapse prior to the next event for P i has a probability distribution P(t) = λe − λt , where λ > 0 is a rate parameter. The expected value of an exponential distribution is 1/λ, which means that, on average, the time between events is 1/λ seconds. The greater the value of λ, the more quickly the next event occurs, while events with smaller rate parameters will happen less frequently. Each of the two main RNAP events has its own unique rate parameter. The first parameter of interest is related to elongation, and it is denoted by β, where β is the average rate at which a single RNAP elongates from one base pair to the next on the DNA strand in the absence of any pauses or collisions with other RNAPs. This parameter has units of nucleotide per second. The basic elongation event has a rate parameter β, which implies that an elongation will happen every 1/β seconds on average.
The second rate parameter that governs the basic transcription process is the average initiation rate, λ = (α Ã β), where α 2 (0, 1] is a constant value and β is the average elongation rate as described above. Here α is used to scale the elongation rate to a lower value causing initiation events to happen less frequently, on average every 1/(α Ã β) seconds. The value of α is fixed for each simulation. The incorporation of pauses into the model is achieved through modifications made dynamically to the elongation rate parameter. Next we describe the details of the simulation events, and for each pause event, we describe the adjustments that are made to the various rate parameters.
Incorporation of pauses. Continuing the notation used in previous sections, suppose that an elongation event has occurred for P i . Once the torques τ i+1 , τ i , and τ i − 1 are recalculated for RNAPs in Fig 1, these values are used to update both the velocity and the pause frequency of all three RNAPs using Eqs (19) and (20) respectively. The velocity is used as the rate parameter, λ in the exponential distribution defined above. Specifically, once τ i , the torque experienced by P i , is calculated, the velocity and pause frequency, V(τ i ) and F(τ i ) are calculated using Eqs (19) and (20). The time of the next movement of P i is now governed by the probability distribution P(t) = V(τ i )e −V(τ i )t with an expected value of 1/V(τ i ). The time of next movement by the RNAP is drawn using this exponential distribution and recorded as T i into the triple P i . That is, from Eq (7), the variable T i is assigned by sampling this exponential distribution using the current value of V(τ i ).
The likelihood of P i entering a pause depends on the pause frequency under the calculated torque, F(τ i ), and once this quantity is calculated, a uniformly distributed random number, u 2 [0, 1] is drawn. If u ! F(τ i ), the RNAP continues the elongation process. However, if u < F(τ i ), the RNAP enters a pause. When P i enters a pause, then the pause duration, D(τ i ), is computed using Eq (21). The velocity of the RNAP is then reset to V(τ i ) = 1/D(τ i ), and the time of the next movement, T i is drawn based on this velocity in the same manner that was described in the preceding paragraph. Therefore the expected value of the time of the next elongation for P i is 1/V(τ i ) = D(τ i ). Note here that this duration is updated when a neighboring RNAP translocates (because the value of τ i changes when that motion occurs), therefore the time of next elongation for the paused RNAP P i is recalculated, and the pause duration changes accordingly. The simulation scheme also monitors other pieces of information that are crucial for our analysis, in particular the furthest downstream nucleotide occupied by a paused RNAP (position) as well as both the start time and the end time of each pause are recorded for a posteriori analysis purposes.
Incorporation of collisions. The other event that can affect elongation is a collision event. As mentioned previously, a collision occurs when the trailing RNAP, P i must cease active elongation because the next nucleotide is already occupied by the leading RNAP, P i − 1 . Since RNAPs cannot overlap nucleotides, or "hop over" each other as seen in specific cases with ribosomes during translation [51][52][53], P i will halt elongation until the next nucleotide is free. When a collision occurs, V(τ i ) is set to zero and the time of next movement is not drawn until P i is free to move. Essentially, elongation of P i is put on hold. This ensures that an elongation event for P i , when it is directly behind P i − 1 , will not be chosen for execution; therefore no polymerases will overlap. When P i − 1 translocates, the rate of P i is subsequently updated, and the next time of elongation for P i is drawn. We consider the end of the collision to be when P i translocates, as opposed to when the next nucleotide becomes available. The position, start time and end time of each collision is also recorded for later use in our analysis.
To summarize, upon elongation of P i , the process given in Algorithm 1 is performed for P i and the neighboring RNAPs, P i − 1 and P i+1 . A flowchart detailing the elongation process can be seen in Fig 11. The following subsections describe how we can simplify the model to recover the standard TASEP model. We then use this TASEP model for comparison with the ETAM model. A discussion of the parameters used for the numerical results is also presented below.
Algorithm 1 This algorithm is performed after the elongation of P i . 1: L i , L i+1 using n i − 1 , n i , n i+1 2: if L i = 0 then 3: V(τ i ) = 0 4: τ i using L i , L i+1 , and L 0,i , L 0,i+1 5: V(τ i ), F(τ i ) using τ i 6: u random(0, 1) 7: if u < F(τ i ) then Pause 8: D(τ i ) using τ i 9: V(τ i ) = 1/D(τ i ) 10: else u ! F(τ i ) No Pause, Do not update V 11: Remark 1 In order to calculate the transcription time of each RNAP, the time of initiation and termination is recorded. If an RNAP initiates but does not complete termination within the simulation time, then the initiation time is not saved, and only the transcription times for the RNAPs that terminate during the time of simulation are used for the subsequent analysis. While all pauses and collisions that occur are recorded, for the analysis presented here we only include the information from pauses and collisions that occur before the final recorded termination time during the simulation.
Comparison to TASEP. In order to isolate the effect of the torque on the RNAP elongation, we compare the ETAM model to the TASEP model, which has the same stochastic structure as ETAM, with the key exception that the torque effects are not included. In particular, we use the same flowchart as in Fig 11 but the velocity, pause frequency, and pause duration are all constant values that correspond to Eqs (19), (20) and (21) with τ = 0. This is equivalent to RNAPs experiencing zero torque for the entire simulation. We use a constant velocity of β = V(0) which also gives a constant initiation rate of α Á β. Since the RNAPs are elongating with a constant rate of β and we have boundary conditions corresponding to initiation with rate α Á β and termination with rate γ Á β, this simulation describes the typical TASEP simulation described previously.
Pauses are simulated similarly to the pauses in ETAM. Since the torque value is always zero, the velocity and pause frequency are V(0) and F(0) respectively. These values are held constant throughout the simulation. As described before, upon elongation a uniformly distributed random number, u, is drawn. If u < F(0) the RNAP will experience a pause. The duration of the pause D(0) is employed to recalculate the velocity as 1/D(0). Upon completion of the pause, the velocity is reset to V(0), and the time of the next elongation calculated and stored.
Simulation parameters. The two most important parameters governing the simulation events are the average elongation rate, denoted by β and the scalar that affects the rate of initiation, denoted by α. In the literature, β can vary from about 20 nt/s [24] up to about 90 nt/s [16] depending on several factors including the gene the RNAP is transcribing, environmental conditions, and the specific structure of the gene [19,20]. The average elongation rate for RNAPs on an rrn operon gene has been shown experimentally to be approximately 90 nt/s [12,[18][19][20]. Note that this is different from the data calculated by Ma et. al. [24] where they measured the velocity for an RNAP under no torque at approximately 22 bp/s. This measurement is consistent with single-molecule experiments on certain E. coli gene sequences [54,55]. In order to simulate transcription on an rrn operon, we multiply the velocity function described by Eq (19) by an appropriate scaling factor, K. The value of K is chosen so that an RNAP under no torque travels at 90 nt/s, which gives K Á V(0) = β = 90 nt/s. Therefore, when drawing the next time for elongation of an RNAP, we use the exponential distribution KV(τ)e −KV(τ)t , with the expected value (KV(τ)) −1 .
While mathematically, the initiation parameter α can be as high as 1, in reality the values of α are much less than 1. Simulations were conducted using a variety of initiation rates ranging from α = 0.0001, which corresponds to an initiation every 109 seconds on average, to α = 0.0115, which corresponds to an initiation every 1.6 seconds on average. According to the literature, an rrn operon has approximately 31% coverage of the DNA strand by RNAPs [16]. Accounting for this percentage of coverage on a strand of DNA that is 5450 nts long occupied by RNAPs, each of length 35 nts, corresponds to approximately 50 RNAPs on the strand at any given time. Hence, the choice of the parameter range α 2 [0.0001, 0.0115] is meant to mimic conditions ranging from light coverage of the DNA strand to a density of RNAPs that is beyond the experimentally observed conditions described above.

ETAM Model Results from the Piecewise Linear Data Fit
The small number of data points given in [24] require us to extrapolate in order to characterize the relationships between the torque and the various physical quantities (elongation velocity, pause frequency and pause duration) for the ETAM model in the extreme cases where the torque values are near the melting or the stall cases, that is, where the absolute value of the torque is large. For those two cases, the functional relationships are constructed based on experimental biological data from [24] as well as other literature. The results presented in the Results Section at the beginning of this paper focused on that combination of information to inform the mathematical model; however, the choices made for the functional relationships were still somewhat arbitrary. The results of this section illustrate that the choices made for the case of torque values near either melting or stall are very crucial to the results of the mathematical model. The most important result shows that the cooperative effect discussed previously is observed for some choices of data fit constructions but not for others. Hence, the results reported in this section demonstrate the need for more experimental data over a larger range of torque values in order to produce a realistic mathematical model.
Although the average transcription time and total delay functions displayed for the ETAM model in the Results Section are largely concave up, the results are very different when using the piecewise linear curve fit discussed in the section Linear Fit to the Data for ETAM simulations. With the piecewise linear fit, the average transcription time and total delay have the same behavior as the nonlinear fit for small values of α, but the results continue to increase and remain concave down for the larger initiation rates. This is in sharp contrast to the behavior of the data generated using the nonlinear fit in the case of larger initiation rates. This can be seen in Fig 12. For α = 0.0001 the average transcription time is approximately 108 seconds with a total delay of 45 seconds. This increases over the range of α, to a final average transcription time of 147 seconds and a delay of 78 seconds. This is a 36% increase in transcription times and a 73% increase in total delay. If we compare these to the values calculated for its nonlinear counterpart, we see that for the largest value of α = 0.0115, there is a 50% increase in transcription time and a 91% increase in delay for the piecewise linear fit. While the result with the piecewise linear fit is much closer to a 60 second average transcription time than TASEP, the cooperative effect of the nonlinear fit from the Results Section diminished for this particular choice of piecewise linear fit. The cooperative effect of the ETAM in the Results Section shows a decrease in transcription time as the coverage of the DNA strand increases to biologically relevant situations, and that effect is different than both the piecewise linear fit for ETAM and the TASEP case where transcription time monotonically increases as the coverage of the DNA strand increases.

The Importance of the Nonlinear Pause Duration
Because of the difference in results between the nonlinear fit of the data and that of a piecewise linear fit, we investigate which function or combination of functions drives this difference. Below we explore several possible combinations of piecewise linear and nonlinear data fit choices, and we find that the behavior is being driven by the pause duration function, specifically the pause duration for very low torque values near the melting threshold, see Fig 13. This figure compares the ETAM transcription time and delay results using the nonlinear fit for the pause duration function (Fig 9), and the piecewise linear fit for the pause duration function (Fig 10) with two other curves. The other curves show results when the pause duration function is constructed in a piecewise manner by combining portions of the piecewise linear and nonlinear fits in order to define other composite piecewise functions for the pause duration function. The curve labeled "Nonlinear Left" shows the results for average transcription time and average delay where the piecewise linear fit is used for all functions except the pause duration. The pause duration, instead, is a new piecewise defined function that is a combination of the nonlinear fit for α 2 [-10, 5] and the piecewise linear fit for α 2 [5,11]. Similarly the curve labeled "Nonlinear Right" represents the results obtained using the nonlinear fit for α 2 [5,11] and the piecewise linear fit for α 2 [-10, 5] for the pause duration function. As one can observe, the results for the piecewise linear fit and the nonlinear right fit are qualitatively similar. The most surprising are the results for the nonlinear left fit. For this curve fit, with large values of α, the RNAPs experience nearly 60 second transcription times with virtually no delay. The reason for this difference in results can be seen as we investigate the graph of pause duration for torque values near -10 pNÁnm. Fig 14 depicts the nonlinear pause duration and the piecewise linear pause duration for torque values in the interval [-10, -5]; it is essentially a "zoom in" of the graph in Fig 10C. It's important to note that pause durations are not fixed but are calculated upon entering a pause and then recalculated when neighboring RNAPs elongate. However, when an RNAP is paused, the recalculated pause duration will always be smaller than the original because it is either being pulled from in front or pushed from behind by the elongation of a neighbor. Therefore even if the original pause duration assigned to the RNAP is large, the recalculated pause duration is likely to be small, as evidenced by the extremely small average pause durations in high coverage environments. Hence, the range of pause duration values for the case of small torque is very important. In the case of the nonlinear data fit with low torque values, the pause duration can be set as low as zero, in which case, the RNAP is released from the pause state and is free to elongate. However for the case of the piecewise linear data fit, the lowest pause duration possible is approximately 0.3 seconds. With the large amount of pauses experienced by an RNAP, the difference in pause duration for low torque values is driving the difference in behavior between the nonlinear and piecewise linear data fit.
To finish our discussion on pause duration we again consider Fig 13. We concentrate on the difference in the results of the "Nonlinear" data fit and the "Nonlinear Left" data fit whose pause duration is linear for α 2 [5,10] and nonlinear for α 2 [- 10,5]. If the behavior of the results is driven by the pause duration for low torque values, how do we account for this difference, as they have the same pause duration for those torque values? This difference can be attributed to the pause frequency function. Recall, the "Nonlinear Left" data fit has the piecewise linear fit for velocity and pause frequency. The piecewise linear fit for large torque values would give a pause frequency of 0.049 when the torque is 11 pNÁnm, as opposed to a pause frequency of 1 for the nonlinear fit. The RNAPs will experience significantly fewer pauses with the linear pause frequency, encountering on average 146 pauses when α = 0.0115, as opposed to 2169 pauses. Regardless of how short the duration, a very large number of these minor interruptions in elongation (corresponding to a large value of the frequency function) can have a large effect on the overall transcription time of an RNAP. With this in mind, we investigate how the value of the pause frequency function when torque is 11 pNÁnm can affect the results of the model.

Results from Different Pause Frequency Functions
As mentioned earlier, the choices for data fit in the cases of very high torque values and very low torque values were somewhat arbitrary with the limited data points. One choice which has proven to be crucial is to use a pause frequency of 1 when torque is 11 pNÁnm, the approximate value for stall torque. Here we illustrate the impact of that choice on our results. Using the nonlinear data fit for pause duration and velocity, as in Eqs (21) and (19), we perform a set of calculations for various choices for the function value when torque is 11 pNÁnm. We continue to fit the pause frequency function using quadratic functions similar to Eq (20); however the value of the pause frequency function when the torque value is 11 pNÁnm is allowed to range  The values for pause frequency used when torque is 11 pNÁnm are {0.2, 0.5, 0.6, 0.7, 0.8, 0.9}. We also report the results for the original pause frequency function that takes on the value of 1 at the stall torque. Also included are the results for the case where the pause frequency function data is fit with one quadratic function which has a value of approximately 0.05 at stall torque. The average transcription time and total delay for all of these different choices for data fit can be seen in Fig 16A and 16B respectively. Pause frequency increasing to 0.05 and 0.2 at stall torque have the fastest transcription times, both being very close to a 60 second transcription time and, in the case of the lowest pause frequency, actually being faster than a 60 second transcription time. These two pause frequency functions give results that agree extremely well with experimental data for large values of α. If we consider the difference in the average number of pauses per RNAP under these different pause frequency functions, we can understand the difference in delay. Table 4 shows the average number of pauses per RNAP, average pause duration, and the corresponding delay due to pauses when using the various pause frequencies for the value of α = 0.0115, as shown in Fig 16. By examining the delay values in Table 4 and comparing these to Fig 16B, one can see that the pause delay contributes the majority of the time toward the total delay each polymerase experiences. The pause delay is influenced mostly by the number of pauses per polymerase which is a direct result of the choice of the pause frequency function.
Another interesting result is the shift in behavior from frequency 0.5 to frequency 0.6. The results for lower frequency values are concave up over the range of α. However with pause frequency up to 0.6 we begin to see a global maximum when α = 0.0004. For pause frequency greater than or equal to 0.8 the global maximum is when α = 0.0007. For values of α larger than 0.0007, the coverage of the DNA strand is large enough that the RNAPs feel a substantial effect from their neighboring polymerases and begin to experience a cooperative effect. It is in this range that while the polymerases experience more pauses, the decrease in the pause duration is enough to shorten the total delay. We believe this is the range of initiation values where the RNAPs are now close enough for the torsional interaction to push a paused RNAP back into elongation, as proposed by Epshtein et. al. [17], substantially quicker than they had been previously. As the values of α increase beyond 0.0007, this cooperation becomes even more pronounced. Regardless of pause frequency, the overall cooperative behavior is clear from the decrease in delay and transcription times for the larger coverages.
Mathematically, choosing a value for pause frequency equal to 1, or even close to 1, when torque is 11 pNÁnm is the natural choice as this is the stall torque. However, if one is attempting to fit the experimental data with one function, one would choose a pause frequency close to 0.2. For that case, we have 30% coverage of the DNA strand with an average transcription time of just under 62 seconds at the highest initiation rate. This agrees very well with the results presented by Neuman et. al. [4]. In order to properly tune the ETAM model proposed here, more data is necessary for the extreme cases near the stall torque and melting torque.
In order to illustrate the importance of accurate measurements near the stall torque and the melting torque for the ETAM model, we investigate how often these torque values are sampled during the simulations of the ETAM model over the range of α. Fig 17 depicts the number of times each torque value is calculated in a simulation as a percentage of the total number of torque values that are computed, displayed as a histogram. We show the results for both the baseline simulation (no pauses), and the pause simulation for α = 0.0001 ( Fig 17A) and α = 0.0115 Table 4. The average number of pauses per RNAP, average pause duration, and average pause delay per RNAP due to pauses experienced per RNAP for the different pause frequency functions. Frequency represents the value of the pause frequency function when torque is 11 pNÁnm. All of the data reported is for α = 0.0115. ( Fig 17B). In these simulations, we count the number of times the torque values of 10 pNÁnm and -10 pNÁnm are computed, and we plot this percentage as histogram bars at -10 and 10 respectively. In between these values we tabulated the number of times a torque value fell in the interval (-10, -9] and plotted that percentage in the histogram bars at -9. The percentage of torque values in (-9, -8] were plotted in the histogram bars at -8, and so on, up to the torque value of 9. The percentage of values in the interval (9, 10) were plotted under the label of <10. As shown in Fig 17, for our lowest initiation rate (α = 0.0001) the algorithm computes a torque of 0 pNÁnm approximately 50 percent of the time in the baseline simulation and just under 40 percent for the pause simulation. This is to be expected as many of the RNAPs for this value of α are transcribing as single molecules and therefore would not generate torque values away from 0 pNÁnm. For α = 0.0115, the results are very different. The interaction between RNAPs causes the 0 pNÁnm to be computed only 10% of the time. The extreme values of -10 pNÁnm and 10 pNÁnm are calculated the most often. In the pause simulation -10 pNÁnm is computed approximately 25% of the time and 10 pNÁnm about 35% of the time. This high percentage is due to the fact that the RNAPs are very close together at this high density and therefore the interactions between them are extremely strong. As a result, the torque values related to the extreme cases of stall and melting along with the choices one uses to construct the functional relationships between torque and all three of velocity, pause duration, and pause frequency at these extreme torque values combine to create a large impact on the ETAM results.

Author Contributions
Conceptualization: TH TG LD JG.
Formal analysis: TH TG LD JG CM.
Funding acquisition: TH LD TG.
Investigation: TH TG LD JG.
Methodology: TH TG LD JG. Software: TH TG LD JG CM.
Writing -original draft: TH LD TG.