## This is an uncorrected proof.

## Figures

## Abstract

Transcription elongation can be modelled as a three step process, involving polymerase translocation, NTP binding, and nucleotide incorporation into the nascent mRNA. This cycle of events can be simulated at the single-molecule level as a continuous-time Markov process using parameters derived from single-molecule experiments. Previously developed models differ in the way they are parameterised, and in their incorporation of partial equilibrium approximations. We have formulated a hierarchical network comprised of 12 sequence-dependent transcription elongation models. The simplest model has two parameters and assumes that both translocation and NTP binding can be modelled as equilibrium processes. The most complex model has six parameters makes no partial equilibrium assumptions. We systematically compared the ability of these models to explain published force-velocity data, using approximate Bayesian computation. This analysis was performed using data for the RNA polymerase complexes of *E. coli*, *S. cerevisiae* and Bacteriophage T7. Our analysis indicates that the polymerases differ significantly in their translocation rates, with the rates in T7 pol being fast compared to *E. coli* RNAP and *S. cerevisiae* pol II. Different models are applicable in different cases. We also show that all three RNA polymerases have an energetic preference for the posttranslocated state over the pretranslocated state. A Bayesian inference and model selection framework, like the one presented in this publication, should be routinely applicable to the interrogation of single-molecule datasets.

## Author summary

Transcription is a critical biological process which occurs in all living organisms. It involves copying the organism’s genetic material into messenger RNA (mRNA) which directs protein synthesis on the ribosome. Transcription is performed by RNA polymerases which have been extensively studied using both ensemble and single-molecule techniques. Single-molecule data provides unique insights into the molecular behaviour of RNA polymerases. Transcription at the single-molecule level can be computationally simulated as a continuous-time Markov process and the model outputs compared with experimental data. In this study we use Bayesian techniques to perform a systematic comparison of 12 stochastic models of transcriptional elongation. We demonstrate how equilibrium approximations can strengthen or weaken the model, and show how Bayesian techniques can identify necessary or unnecessary model parameters. We describe a framework to a) simulate, b) perform inference on, and c) compare models of transcription elongation.

**Citation: **Douglas J, Kingston R, Drummond AJ (2020) Bayesian inference and comparison of stochastic transcription elongation models. PLoS Comput Biol 16(2):
e1006717.
https://doi.org/10.1371/journal.pcbi.1006717

**Editor: **Vassily Hatzimanikatis,
Ecole Polytechnique Fédérale de Lausanne, SWITZERLAND

**Received: **December 12, 2018; **Accepted: **December 13, 2019; **Published: ** February 14, 2020

**Copyright: ** © 2020 Douglas et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All posterior distribution files and the software used to generate these distributions are available in a GitHub repository https://github.com/jordandouglas/jordandouglas.github.io/tree/master/simpol/about/ElongationPosteriorData.

**Funding: **J. Douglas was funded by a University of Auckland Doctoral Scholarship. URL: https://www.auckland.ac.nz/en/study/scholarships-and-awards/scholarship-types/postgraduate-scholarships/doctoral-scholarships.html. The authors wish to acknowledge the contribution of NeSI high-performance computing facilities to the results of this research. NZ’s national facilities are provided by the NZ eScience Infrastructure and funded jointly by NeSI’s collaborator institutions and through the Ministry of Business, Innovation & Employment’s Research Infrastructure programme. URL https://www.nesi.org.nz. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Transcription is carried out by RNA polymerases: RNAP in *Escherichia coli*, pol II in *Saccharomyces cerevisiae*, and T7 pol in Bacteriophage T7. It involves the copying of template double-stranded DNA (dsDNA) into single-stranded messenger RNA (mRNA). RNAP and pol II are comprised of multiple subunits, and their catalytic subunits are homologous [1, 2]. In contrast, T7 pol exists as a monomer with a distinct sequence, and resembles the *E. coli* DNA polymerase I [3].

Optical trapping experiments have been performed on the transcription elongation complex (TEC) from a variety of organisms [4–10]. In a typical experimental setup, two polystyrene beads (around 600 nm in diameter) are tethered to the system; one attached to the RNA polymerase and the other to the DNA [4]. As transcription elongation progresses, the distance between the two beads increases and the velocity of a single TEC can be computed. Optical tweezers can be used to apply a force *F* to the system (Fig 1).

(A) Optical trapping setup showing dsDNA being transcribed by RNA polymerase (grey ellipse) into mRNA. Two polystyrene beads are tethered to the system allowing the application of force using optical tweezers. An assisting load *F* > 0 acts in the same direction as transcription (top) while a hindering load *F* < 0 acts in the opposing direction (bottom). Figure not to scale. (B) Schematic depiction of the effect of applying a force on RNA polymerase. Due to the stochastic nature of transcription at the single-molecule level, each experiment yields a different distance-time trajectory, even under the same applied force.

Single-molecule studies of the TEC have revealed that RNA polymerases progress in a discontinuous fashion [4, 11–14] with step sizes that correspond to the dimensions of a single nucleotide (3.4 Å [15]). Consequently, at the single molecule level, transcription is best modelled as a discrete process rather than a continuous one.

A single cycle in the main transcription elongation pathway (Fig 2) requires (1) Forward translocation of the RNA polymerase, making the active site accessible; (2) Binding of the complementary nucleoside triphosphate (NTP); (3) Addition of the nucleotide onto the 3′ end of the mRNA. This third step involves NTP hydrolysis. Nucleoside monophosphate is added onto the chain and pyrophosphate is released from the enzyme.

(A) The model of the main transcription elongation pathway, which shows the postulated states; the pathways for interconversion; and the rate constants that govern each part of the reaction. The transcription bubble is the set of *β*_{1} + *h* + *β*_{2} bases (see main text for definitions) in the double-stranded DNA which are unpaired. States are denoted by *S*(*l*, *t*) where *l* is the length of the mRNA and *t* is the position of the polymerase active site (small grey rectangle) with respect to the 3′ end of the mRNA. Polymerase translocation displaces the polymerase by a distance of *δ* = 1 bp = 3.4 Å. During polymerisation the chain is extended by one nucleotide. (B) Instantiated posttranslocated state of RNA polymerase transcribing the *rpoB* gene sequence, with *β*_{1} = 2, *h* = 9, *β*_{2} = 1. Forward translocation requires melting two T/A basepairs (right arrows). Backward translocation requires melting two C/G basepairs (left arrows). The mRNA secondary structure would also require reconfiguration [16, 17].

Our study aimed to identify the best model to describe this reaction cycle for RNAP, pol II, and T7 pol, based on analysis of published force-velocity data. As there are three reactions, up to six rate constants may be necessary for a kinetic model of a single nucleotide addition. These describe forward and backwards translocation (*k*_{fwd} and *k*_{bck}), binding and release of NTP (*k*_{bind} and *k*_{rel}), and NTP catalysis and reverse-catalysis (*k*_{cat} and *k*_{rev}), also known as pyrophosphorolysis [18]. However fewer than six parameters may be required in practice.

First, it is reasonable to assume that polymerisation is effectively irreversible [17, 19–21], as pyrophosphorolysis is a highly exergonic reaction, reducing the number of rate constants to five. Second, translocation between the pretranslocated and posttranslocated states, and/or NTP binding, may occur on timescales significantly more rapid than the other steps, in which case they may be modelled as equilibrium processes. These assumptions simplify the model, as the respective forward and reverse reaction rate constants are subsumed by a single equilibrium constant. Third, thermodynamic models of nucleic acid structure can be used to estimate sequence-dependent translocation rates *k*_{fwd}(*l*) and *k*_{bck}(*l*), by invoking transition state theory, and this can sometimes result in parameter reduction [16, 17, 21].

Irrespective of equilibrium assumptions and parameterisation, transcription elongation under applied force can be modelled in two fundamentally distinct ways. First, there are the **deterministic** equations which can be used to calculate the mean pause-free elongation velocity *v*(*F*, [NTP]) as a function of force *F* and NTP concentration [NTP]. This kind of model can be derived from the differential equations describing the time evolution of all species, by application of the steady state approximation. Force effects on the translocation step are incorporated using transition state theory [22, 23].

An example is the following 3-parameter model [4].
(1)
where *δ* is the distance between adjacent basepairs (3.4 Å, [15]), is the equilibrium constant of NTP binding, is the equilibrium constant of translocation, *k*_{B} is the Boltzmann constant, and *T* is the absolute temperature. Increasingly complex equations may be used as more parameters or states are added to the model [4, 6, 17]. Such equations describe the velocity averaged across an ensemble of molecules. Parameter inference applied to velocity-force-[NTP] experimental data is straightforward and computationally fast when using these equations. However these equations do not describe the distribution of velocity nor do they account for site heterogeneity across the nucleic acid sequence and therefore cannot predict local sequence effects.

Second, there are the **stochastic** models, which can be implemented via simulation of single-molecule behaviour using the Gillespie algorithm [24]. The mean velocity can be calculated by averaging velocities over a number of simulations for a given *F* and [NTP]. This offers not just the mean but a full distribution of velocities and could potentially explain emergent properties unavailable from a deterministic model. Unfortunately, simulating can be very slow and therefore parameter inference can be a problem.

In this study we used a Markov-chain-Monte-Carlo approximate-Bayesian-computation (MCMC-ABC) algorithm [25] to estimate transcription elongation parameters for **stochastic** models via simulation. The observed pause-free velocities we are fitting to were measured at varying applied force and NTP concentration. For each RNA polymerase under study—*E. coli* RNAP, *S. cerevisiae* pol II, and T7 pol—we fit to one respective dataset from the single-molecule literature [4, 26, 27].

## Models

### Notation and state space

Suppose the TEC is transcribing a gene of length *L*. Then let *S*(*l*, *t*) denote a TEC state, where the mRNA is currently of length *l* ≤ *L*, and describes the position of the active site with respect to the 3′ end of the mRNA. When *t* = 0 the polymerase is pretranslocated and cannot bind NTP, and when *t* = 1 the polymerase is posttranslocated and *can* bind NTP (Fig 2). This study is focused on the main elongation pathway and the observed velocities being fitted have pauses filtered out. Therefore, although additional backtracked states (*t* < 0) [4, 28, 29] and hypertranslocated states (*t* > 1) [30, 31] exist, these are not incorporated in the model.

Let *β*_{1} and *β*_{2} be the number of unpaired template nucleotides upstream and downstream of RNA polymerase, respectively, and let *h* be the number of basepairs in the DNA/mRNA hybrid (Fig 2A). Although there are uncertainties in these parameters, they are held constant at *h* = 9, *β*_{1} = 2, and *β*_{2} = 1 [17, 32].

Transcription of the gene begins at state *S*(*l*_{0}, 0) and ends upon reaching *S*(*L*, 0), where *l*_{0} = *β*_{1} + *h* + 2.

### Parameterisation of the NTP binding step

NTP binding has been modelled as both a kinetic and equilibrium process in the literature [4, 17, 21].

In a kinetic binding model, NTP binding occurs at pseudo-first order rate *k*_{bind}[NTP], while NTP release occurs at rate *k*_{rel}. In this case, *k*_{bind} and must be estimated.

Under a partial equilibrium approximation NTP binding and release are assumed to be rapid enough that equilibrium is achieved. In this case, the rate constants *k*_{bind} and *k*_{rel} are subsumed by the NTP dissociation constant which becomes the sole binding-related parameter to estimate.

### Parameterisation of the translocation step

While inferences about the rate constants associated with NTP binding and catalysis (*k*_{bind}, , and *k*_{cat}) can be made directly from the data, the translocation step is more complex. Transition state theory is invoked in order to estimate *k*_{fwd} and *k*_{bck}. Recasting the problem in this way (1) provides a way of accommodating the effects of applied force on the elongation process, and (2) allows the sequence-dependence of translocation to be incorporated by considering the energetics of basepairing. When allowing for sequence dependence, the total number of translocation rates required to model translocation of the full gene is 2(*L* − *l*_{0}).

#### Thermodynamic models of base pairing energies.

The standard Gibbs free energies Δ_{r}*G*^{0}(= Δ*G*) involved in duplex formation are calculated using nearest neighbour models. The standard Gibbs energy of state *S*—arising from nucleotide basepairing and dangling ends—is calculated as
(2)
where SantaLucia’s DNA/DNA basepairing parameters [33] are used to calculate and Sugimoto’s DNA/RNA parameters [34] are used for . For the latter, dangling end energies are estimated as described by Bai et al. 2004 [21]. Here, and elsewhere, the ^{(bp)} superscript is used to denote a model parameter that can be evaluated from the sequence alone. Gibbs energies are expressed on a per molecule basis, relative to the thermal energy of the system, in multiples of *k*_{B}*T*, where *k*_{B}*T* = 4.28001 × 10^{−21} J at *T* = 310 K.

In order for RNA polymerase to translocate forward (backward), up to two basepairs must be disrupted: (1) the basepair at the downstream (upstream) edge of the transcription bubble, and (2) the basepair at the upstream (downstream) end of the DNA/mRNA hybrid (Fig 2B). Differences in the basepairing energies in these regions confer sequence-dependence on the rate of translocation.

#### Calculation of translocation rates or translocation equilibrium constant.

The standard Gibbs energies of the pre and posttranslocated states, and , respectively, are used with up to four additional terms—Δ*G*_{τ1}, *δ*_{1}, , and —to calculate the translocation rates. The first three are model parameters which must be estimated while the latter is directly evaluated from the sequence.

Let *T*(*l*, *t*) be the translocation transition state between *S*(*l*, *t*) and *S*(*l*, *t* + 1). Then is the sequence-dependent standard Gibbs energy of activation which must be overcome in order to translocate (Fig 3).

(A) Effects of model parameters on state energies. The figure displays a schematic Gibbs energy landscape of translocation, with backtracked states included for visualisation purposes. The solid red lines represent translocation states (*t* = 0: pretranslocated, *t* = 1: posttranslocated, and *t* < 0: backtracked), while the dashed red lines represent transition states. Applying an assisting force *F* > 0 tilts the landscape in favour of higher values of *t*. The effect of Δ*G*_{τ1} is observed at the posttranslocated state *t* = 1. In a translocation equilibrium model, the barrier height is assumed to be so small, = translocation is so rapid, that the transition states are disregarded. (B) A model for the sequence-dependent transition state between translocation states *S*(*l*, 0) and *S*(*l*, 1). This is required for estimating the Gibbs energy of basepairing in the transition state. The basepairing energy, added to a baseline term , together specify the height of the activation barrier (Eq 10).

Given an applied force *F*, the translocation rates governing transition between the pre and posttranslocated states (*k*_{fwd}(*l*) and *k*_{bck}(*l*)) are calculated from barrier height using an Arrhenius type relation:
(3) (4)

The derived rates *k*_{fwd}(*l*) and *k*_{bck}(*l*) are therefore dependent on the local sequence. The pre-exponential factor *A* is held constant at 10^{6} s^{−1}. This term has been arbitrarily set to a variety of values in previous studies (10^{6}−10^{9} s^{−1} [16, 17, 21]). This has little consequence for model fitting, however the value of is entangled with the value of the pre-exponential factor *A* and can only be meaningfully interpreted in light of its value.

If the system has time to reach equilibrium, the probabilities of observing the pretranslocated state *S*(*l*, 0) and posttranslocated state *S*(*l*, 1) are
(5) (6)

This is described by equilibrium constant *K*_{τ}.
(7) (8) (9)

The physical meanings of the terms Δ*G*_{τ1}, *δ*_{1}, , and , and the way they are used in the model, are detailed below.

#### Energetic bias for the posttranslocated states.

Δ*G*_{τ1} (units *k*_{B}*T*) is a parameter added to the standard Gibbs energy of the posttranslocated state. If Δ*G*_{τ1} = 0, then the sequence alone determines the Gibbs energy difference between pre and posttranslocated states. In this case, pretranslocated states are usually favoured over posttranslocated states due to the loss of a single basepair in the hybrid of the latter.

Δ*G*_{τ1} has frequently been estimated for T7 pol [35–37] and there has been discussion around whether such a term is necessary for RNAP [6].

#### Polymerase displacement and formation of the transition state.

*δ*_{1} (units Å) is the distance that the polymerase must translocate forward to facilitate the formation of the transition state. The distance between adjacent basepairs is held constant at an experimentally measured value *δ* = 3.4 Å [15], and 0 < *δ*_{1} < *δ*. The response of the system to an applied force *F* depends on this term. In general, the application of force *F* tilts the Gibbs energy landscape—the Gibbs energy difference between adjacent translocation states being augmented by a factor (Fig 3A, [38, 39]).

It may be necessary to estimate *δ*_{1} to model the data adequately [17], or it may be sufficient to simply set *δ*_{1} = *δ*/2 [38].

#### Energy barrier of translocation.

and (units *k*_{B}*T*) together determine the activation barrier height in the translocation step. It is assumed that the sequence-dependent standard Gibbs energy of activation can be written as
(10)

is therefore a sequence-independent baseline term used to compute the translocation barrier heights. The parameter must be estimated in order to evaluate translocation rates.

In contrast is a term that is evaluated directly from the sequence derived from a model of the transition state (Fig 3B). The term is evaluated as the standard Gibbs energy of a TEC containing all hybrid and gene basepairs found in both *S*(*l*, *t*) and *S*(*l*, *t* + 1), ie. the intersection between the two sets of basepairs.

### Model space

The full transcription elongation model makes use of the following 6 parameters:

*k*_{cat}(units s^{−1}).-
(units
*μ*M). *k*_{bind}(units*μ*M^{−1}s^{−1}).- Δ
*G*_{τ1}(units*k*_{B}*T*). *δ*_{1}(units Å).-
(units
*k*_{B}*T*).

However fewer than 6 parameters may be needed to adequately describe the data. If it is assumed that the energy differences between pre and posttranslocated states are determined by basepairing energies alone, the parameter Δ*G*_{τ1} does not need to be estimated. This is equivalent to holding Δ*G*_{τ1} constant at 0. If it is assumed that the displacement required for formation of the translocation intermediate state is half the distance between adjacent basepairs, the parameter *δ*_{1} does not need to be estimated. This is equivalent to holding *δ*_{1} constant at *δ*/2.

Partial equilibrium approximations may also simplify the model, as detailed above. If binding is approximated as an equilibrium process, *k*_{bind} does not need to be estimated. If translocation is approximated as an equilibrium process, and *δ*_{1} do not need to be estimated. One, both, or neither of these two steps (binding and translocation) could be assumed to achieve equilibrium, thus yielding four equilibrium model variants (Fig 4A). The introduction of partial equilibrium approximations for both the NTP binding and translocation steps has implications when specifying the prior distributions for the Bayesian analysis (S4 Appendix) The chemical master equations for single nucleotide addition cycles of these models are presented in S2 Appendix.

(A) The four equilibrium model variants. NTP binding, translocation, both, or neither, could be assumed to achieve equilibrium prior to catalysis. (B) The 12 transcription elongation models. An arrow connects model *i* to *j* if augmentation of model *i* with a single parameter generates model *j*. The number of parameters to estimate *k* is shown for each level in the network. Equilibrium approximation colour scheme is the same as in A. Δ*G*_{τ1} and *δ*_{1} can each be estimated or set to a constant.

Incorporating these simplifications to the model in a combinatorial fashion results in a total of 12 related models, which together constitute the model space. Our objective was to determine which of these 12 models provides the best description of the experimental data. The simplest model (Model 1) contains 2 parameters (*k*_{cat} and *K*_{D}). The most complex model (Model 12) contains all 6 parameters. The full model space is displayed in Fig 4B.

### Stochastic modelling

For each model we performed stochastic simulations, appropriate for the modelling of single-molecule force-velocity data. The simulations, performed using the Gillespie algorithm [24, 40], can be used to estimate the mean elongation velocity under a model.

The estimation of mean velocity can be broken down into three steps. First, the system is initialised by placing the RNA polymerase at the 3′ end of the template—state *S*(*l*_{0}, 0)—with the transcription bubble open and a DNA/RNA hybrid formed. The force and NTP concentrations are assigned their experimentally set values. Second, a chemical reaction is randomly sampled. The probability that reaction is selected is proportional to its rate constant *k* (Fig 2). The amount of time taken for the reaction to occur is sampled from the exponential distribution. States which are subject to a partial equilibrium approximation are coalesced into a single state, which augments the outbound rate constants. The second step is repeated until the RNA polymerase has copied the entire template. Third, the previous two steps are repeated *c* times. The mean elongation velocity is evaluated as the mean of each mean elongation velocity across *c* simulations. For further information, see S1 Appendix.

### Relation to previous models and stochastic simulations

There is an extensive literature concerned with the kinetic modelling of transcription elongation. Such models may incorporate backtracking, hypertranslocation, and other reactions. Here we are concerned only with the central elongation pathway.

A stochastic and sequence-dependent model was proposed by Bai et al. 2004 [21] for RNAP, with both NTP binding and translocation treated as equilibrium processes. The translocation equilibrium constant was calculated entirely from basepairing energies. Therefore this model is equivalent to Model 1, and the parameters were estimated as *k*_{cat} = 24.7 s^{−1} and *K*_{D} = 15.6 *μM* from fit to experimental data. Maoiléidigh et al. 2011 also presented stochastic simulations of RNAP. The elongation component of their model is equivalent to Model 6 [17]. We build on this work by providing a systematic Bayesian framework for model comparison and parameter estimation.

While our analysis employed sequence-dependent stochastic models, comparisons can also be made with some deterministic models.

Abbondanzieri et al. 2005 [4], Larson et al. 2012 [41], Schweikhard et al. 2014 [26], and Thomen et al. 2008. [27, 37] described a deterministic model (for RNAP, pol II, pol II, and T7 pol respectively) which estimated *k*_{cat}, *K*_{D} and translocation equilibrium constant . These are most similar to Model 4.

Maoiléidigh et al. 2011 for RNAP, and Dangkulwanich et al. 2013 for pol II, however found that the translocation and catalysis were occurring on similar timescales, and modelled only NTP binding as an equilibrium process [17, 42]. They also estimated the distance of translocation. These deterministic models are most similar to Model 11.

Finally, Mejia et al. 2015 [43] used a model that is quite different to all the above models, as it does not explicitly treat translocation. Instead elongation is modelled with a two step kinetic scheme, the first step involving NTP binding and conformational change, and the second step involving nucleotide incorporation and product release. This model is most similar to a special case of Model 5 where Δ*G*_{τ1} becomes extremely negative, driving the polymerase into the posttranslocated position.

## Results and discussion

### Model selection with MCMC-ABC

Our aim was to 1) use Bayesian inference to select the best of 12 transcription elongation models for each RNA polymerase; and 2) estimate the parameters for those of the models appearing in the 95% credible set of the posterior distribution. Selecting prior distributions behind each parameter is a critical process in Bayesian inference. A prior distribution should reflect what is known about the parameter before observing the new data. We have explicated our prior assumptions, with justifications, in Table 1.

We performed MCMC-ABC experiments which estimated the parameters and model indicator *M*_{i} for . Models which appear more often in this posterior distribution are better choices, given the data. The model indicator is a discrete variable which can take 12 values, and is treated identically to the 6 continuous parameters in the Bayesian framework.

The datasets we fit our models to are all from the single-molecule literature and are presented in: Figures 5a and 5b of Abbondanzieri et al. 2005 [4] for *E. coli* RNAP, Figure 2a of Schweikhard et al. 2014 [26] for *S. cerevisiae* pol II, and Table 2 of Thomen et al. 2008 [27] for T7 pol. To computationally replicate these experiments as faithfully as we could with the available information and computational limitations, simulations in this study were run on the 4 kb *E. coli* *rpoB* gene for RNAP (GenBank: EU274658), the first 4.75 kb of the human *rpb1* gene for pol II (NCBI: NG_027747) the first 10 kb of the Enterobacteria phage λ genome for T7 pol (NCBI: NC_001416). The mean velocities from 32 (for RNAP), 10 (for pol II) and 3 (for T7 pol) simulations of the full respective sequences were used to estimate the mean elongation velocity during MCMC-ABC, given *F* and [NTP].

For further information about the MCMC-ABC algorithm [25, 44], or the model indicator *M*_{i}, see S3 Appendix.

### The posterior distributions

The posterior distributions from our MCMC-ABC experiments are presented in Table 2, Figs 5 and 6.

Posterior distributions for all models which appear in the 95% credible set are displayed (two models for RNAP, two models for pol II, and one model for T7 pol). Plots show the prior probability density *P*(*θ*) of each parameter and posterior probability density of each parameter conditional on the model *P*(*θ*|*D*, *M*_{i}). The geometric median point-estimates and highest posterior density (HPD) intervals (calculated with Tracer 1.6 [53]) are displayed above each plot (3 sf).

Black open circles represent experimentally measured mean velocities reported in the original publication for (A) RNAP, (B) pol II, and (C) T7 pol [4, 26, 27]. Each coloured dot represents a single sample simulated from the posterior distribution of parameters/models for the respective polymerase. 30 samples were generated from each of the three posterior distributions. For RNAP, [NTP]_{eq} is defined as [ATP] = 5 *μ*M, [CTP] = 2.5 *μ*M, [GTP] = 10 *μ*M, and [UTP] = 10 *μ*M.

A large effective sample size (> 100 [53]) and a small (< 1.1, as defined by Gelman et al. 1992 [54–56]) are essential for making reliable parameter estimates. Table 2 suggests that the parameters in the 95% credible set of models are sufficiently estimated by these criteria.

These results indicate that the best models for the datasets examined are Models 11 and 12 for both RNAP and pol II, and Model 5 for T7 pol (Fig 4B).

For pol II, Model 12 has the highest posterior probability *P*(*M*_{12}|*D*) = 0.71. This is the most complex model considered, with 6 estimated parameters. In Model 12 translocation, NTP binding and catalysis are all kinetic processes; the displacement required to facilitate formation of the translocation transition state, *δ*_{1} < *δ*, is estimated (); and the standard Gibbs energy of the posttranslocated state is influenced by parameter Δ*G*_{τ1} ≠ 0.

The posterior distribution for RNAP consists of the same set models as that of pol II. For RNAP, Model 11 has the highest probability *P*(*M*_{11}|*D*) = 0.81. This model is a submodel of Model 12 with one fewer parameter: in Model 11 NTP binding is treated as an equilibrium process while in Model 12 it is not.

The only model in the 95% credible set for T7 pol is Model 5 *P*(*M*_{5}|*D*) = 0.96. In Model 5 (4 parameters) translocation, but not binding, is treated as an equilibrium process, and Δ*G*_{τ1} is estimated. This positions T7 pol in a quite different area of the model space to the other two polymerases.

### Translocation rates differ among RNA polymerases

For RNAP and pol II, we estimate that a partial equilibrium approximation for the translocation step is inadequate. The posterior probability that such models are inadequate is 1.00 (see Table 2). For T7 pol, however, translocation is significantly faster than catalysis and is best modelled with a partial equilibrium approximation. Using estimates for and Δ*G*_{τ1} under the maximum posterior models (Model 11 for RNAP and Model 12 for pol II) we estimate the mean forward and backward translocation rates averaged across the *rpoB* sequence as: 230 s^{−1} and 112 s^{−1} for RNAP, and 350 s^{−1} and 12.7 s^{−1} for pol II, respectively (3 sf). These estimates are within one order of magnitude of the respective estimate for the rate of catalysis (Fig 5) suggesting that translocation and catalysis indeed occur on similar timescales.

For RNAP and pol II, translocation has frequently been modelled as an equilibrium process [4, 21, 26, 41, 43], however in some recent analyses this assumption has been rejected [16, 17, 42, 57, 58]. Our Bayesian analysis supports this. In contrast, there is general agreement that translocation in T7 pol is adequately modelled as an equilibrium process [27, 59, 60].

### The data does not determine the kinetics of the NTP binding step

It remains unclear how to best model the NTP binding step. Models that describe NTP binding as a kinetic process have posterior probabilities of 0.19 for RNAP, 0.71 for pol II and 0.96 for T7 pol (Table 2). However, in an earlier experiment, where we used different a prior distribution for , the latter probability was 0.21 and *P*(*M*_{4}|*D*) was 0.79. The intermediate magnitude of these posterior probabilities, and sensitivity to the choice of prior, imply that the data contains very little information about which binding model is preferred.

Furthermore, and *k*_{bind} (Models 5 and 12) are unable to be estimated simultaneously. For pol II and for T7 pol, *k*_{bind} is estimated at around 0.48 and 1.4 *μM*^{−1} s^{−1} respectively with fairly narrow 95% highest posterior density (HPD) intervals (Fig 5). However, the HPD interval of spans three orders of magnitude and the value of this parameter was therefore poorly informed by the data. For RNAP, in contrast, neither *k*_{bind} nor were well-informed by the data and both have HPD intervals spanning 1-2 orders of magnitude. This non-identifiability—where two or more parameters are unable to be estimated simultaneously (S4 Appendix)—highlights the appeal in an NTP binding equilibrium model where only one parameter needs to be estimated, despite the unrealistic assumptions it may invoke. In the case of each enzyme, the data has taught us nothing about one or two of the binding parameters.

The pause-free mean velocities measured during transcription elongation follow Michaelis-Menten kinetics even though the reaction cycle is more complicated than that of a simple enzyme [61]. As such, the inability to resolve the timescale of the substrate binding step is unsurprising [62–64].

In the transcription literature, NTP binding is almost always assumed to achieve equilibrium for RNAP, pol II, and T7 pol [4, 16, 17, 21, 26, 27, 37, 41, 42, 60]. However Mejia et al. 2015 [43] have shown that NTP binding is indeed rate-limiting, and that mutations in the RNAP trigger loop impair the binding rate thus suggesting that the trigger loop is coupled with NTP binding.

### RNAP has an energetic preference for the posttranslocated state

In previous stochastic sequence-dependent models [16, 21] the standard Gibbs energies of the pre and posttranslocated states have been based solely on the nucleic acid basepairing energies. Our models include an additional term, Δ*G*_{τ1}, to account for potential interactions between the protein and the nucleic acid. The marginal posterior probability of a model in which an additional term Δ*G*_{τ1} is required is 1.00 in all three polymerases. In each case Δ*G*_{τ1} was estimated to be less than 0 *k*_{B}*T* and 0 *k*_{B}*T* is not included in the 95% HPD interval (Fig 5). We find that is the most significant in pol II and T7 pol: −4.6 *k*_{B}*T* and −4.0 *k*_{B}*T* respectively, while *k*_{B}*T* for RNAP (2 sf).

These results suggest that structural elements within RNA polymerases can energetically favour posttranslocated states over pretranslocated states. We note that the sequence-dependent contribution of the dangling end of the DNA/RNA hybrid is included in the thermodynamic model. The energetic bias for the posttranslocated state is separable from this effect.

To facilitate comparison with previous deterministic models, using our estimates of Δ*G*_{τ1} we calculated the equilibrium constant between the pre and posttranslocated states. Geometrically averaged across the *rpoB* gene, these are
(11)

Thus, for all three polymerases, *K*_{τ} < 1, indicating that the small energetic preference that the protein has for the posttranslocated state is sufficient to override the loss of basepairing energy, thereby biasing the system towards population of the posttranslocated positions. This is in agreement with estimates made for pol II and T7 pol [26, 27, 35, 36, 41] and Kireeva et al. 2018 [58] for RNAP: *“forward translocation occurs in milliseconds and is poorly reversible”*. However these estimates are inconsistent with some RNAP and pol II studies which place this ratio above 1 [4, 17, 42, 52].

Kinetic modelling can itself suggest no physical mechanism for the stabilisation. Yu et al. 2012 [36] have identified a conserved tyrosine residue near the active site of T7 pol that pushes against the 3′ end of the mRNA, and thus stabilises the posttranslocated state. They propose a similar mechanism for the multi-subunit RNA polymerases.

*δ*_{1} may be an important parameter but its physical meaning is unclear

Our results suggest that *δ*_{1}, the distance that RNA polymerase must translocate forward by to reach the translocation transition state, is a necessary parameter to estimate for RNAP and pol II. Setting *δ*_{1} = *δ*/2 is not sufficient. The marginal posterior probability of models which estimate this term is 1.00. *δ*_{1} is irrelevant to the modeling of the T7 pol data because the best models invoke a partial equilibrium approximation for the translocation step.

While our prior distribution restricted *δ*_{1} to lie in the range (0, *δ*), the upper end our 95% HPD intervals of *δ*_{1} for RNAP and pol II are very close to *δ* = 3.4 Å. If it was not for this prior distribution, *δ*_{1} estimates would have included values higher than *δ*. Similar results have been observed by Maoiléidigh et al. 2011 [17] for RNAP.

Our interpretation of *δ*_{1} implies it should never be greater than *δ* nor should *δ* be more than the width of one basepair. The physical meaning of *δ*_{1} with values greater than *δ* is thus unclear. It is noted that *δ*_{1} is only used when *F* ≠ 0.

### Comparing the kinetics of RNA polymerases

The *in vivo* rate of transcription elongation varies considerably across RNAP, pol II and T7 pol. The prokaryotic and eukaryotic RNA polymerases have a mean rate ranging from 20-120 bp/s [45, 46, 48, 49, 65–67], which may be slowed down in histone-wrapped regions of eukaryotic genomes [7]. In contrast, Bacteriophage T7 pol operates up to an order of magnitude faster (around 200-240 bp/s [49, 68]) and is known to be quite insensitive to transcriptional pause sites [9, 27].

In additional to these differences, we have shown that translocation is very rapid in T7 pol, relative to the rate of NTP incorporation, while the disparity is much less significant in RNAP and pol II. Furthermore, the model does not fit the data for T7 pol as closely it does for RNAP and pol II (Fig 6). T7 pol therefore seems to operate under quite a different kinetic scheme than that of the cellular polymerases, which is not unexpected given their distant evolutionary relationship [3].

In general, the elongation velocity of RNA polymerase is significantly slower in an optical trap (with estimates ranging from 9.7-22 bp/s for RNAP [11–13, 43, 69]) compared with that of the untethered enzyme (with estimates *in vitro* or *in vivo* ranging from 25-118 bp/s for RNAP [45, 49, 70, 71]). This relationship holds for multiple RNA polymerases including *E. coli* RNAP, *S. cerevisiae* pol II [41, 42, 52, 72], Bacteriophage T7 pol [9, 27, 49, 51], and Bacteriophage Φ6 P2 [10, 73]. This suggests that optical trapping perturbs the system to a significant extent. Additionally, varying degrees of heterogeneity in elongation rate have been observed across different polymerase complexes even under the same conditions [11, 13, 27].

The velocity perturbations resulting from the optical trapping apparatus will be propagated into the model parameters, especially *k*_{cat}, and , and some caution is needed when extrapolating these results to untethered systems.

### Bayesian inference of transcription elongation

To our knowledge we are the first to perform Bayesian inference on single-molecule models of transcription elongation. This was achieved by simulation which necessitated the use of approximate Bayesian computation. An alternative would be to build and use a likelihood function (ie. the probability of taking exactly *t* units of time for RNA polymerase to copy the sequence *n* times). The latter approach can be achieved using chemical master equations, as opposed to (Gillespie) sampling from the distribution. Finding analytical, stable numerical, or approximate solutions to the chemical master equations could provide a similar insight in less computational time, however is susceptible to a multitude of analytical and numerical issues associated with the exponentiation of an arbitrary transition rate matrix that grows with the length of the sequence (S2 Appendix) [74]. This problem would be amplified by the introduction of backtracking, hypertranslocation, or NTP misincorporation reactions into the model, for instance. The Bayesian framework we have presented, although computationally intensive due to its simulation requirement, is general and will work on any model of transcription without the need to resolve these issues. The path has been paved for modelling transcriptional pausing, for instance [16, 21, 75]. Nevertheless, likelihood-based Bayesian inference is an approach that should be explored in the future.

We have demonstrated that single-molecule data can be usefully analysed using a Bayesian inference and model selection framework. This analysis would have even greater statistical power if applied to the progression of individual RNA polymerase complexes instead of mean velocities averaged across multiple experiments.

### Conclusion

In this article we evaluated some simple Brownian ratchet models of transcription elongation (Fig 2). By varying the parameterisation of the translocation step (Fig 3) and incorporating partial equilibrium approximations commonly invoked in the literature (Fig 4A) we enumerated a total of 12 related models (Fig 4B). Using stochastic simulations and approximate Bayesian computation, we then assessed which of these models were capable of describing the force-velocity data previously measured for several RNA polymerases (Table 2 and Fig 5) using single-molecule optical trapping experiments [4, 26, 27].

Our analysis suggests that 1) different partial equilibrium approximations of the translocation step are appropriate for the multisubunit RNA polymerases versus the single subunit T7 RNA polymerase. 2) Treatment of the NTP binding step remains a point of ambiguity. The existing data does not place strong constraints on the modelling of this step. 3) There is an energetic bias for posttranslocated state. 4) The model of the force-dependent translocation, which invokes transition state theory, is not physically realistic.

## Supporting information

### S1 Appendix. Stochastic simulation.

The Gillespie algorithm is described.

https://doi.org/10.1371/journal.pcbi.1006717.s001

(PDF)

### S2 Appendix. Chemical master equations.

Master equations for the four equilibrium model variants are presented.

https://doi.org/10.1371/journal.pcbi.1006717.s002

(PDF)

### S3 Appendix. MCMC-ABC.

A description of the MCMC-ABC algorithm and how it is used to infer parameters and models from experimental data.

https://doi.org/10.1371/journal.pcbi.1006717.s003

(PDF)

### S4 Appendix. Prior distributions.

Simulation-based justifications behind prior distributions for , *k*_{bind}, and .

https://doi.org/10.1371/journal.pcbi.1006717.s004

(PDF)

### S1 Fig. Simulations of the elongation pathway.

Each point is a single simulation of the full *rpoB* gene (4029 nt). For (A-C), Parameters on the x- and z-axis are sampled uniformly at random from the displayed range at the beginning of each trial. The y-axis of each plot (mean elongation velocity) is then measured from the respective simulation. [NTP] and *F* held constant at 1000 *μ*M and 0 pN respectively. (A) and (B): Relationship between and *k*_{cat} for the melting model with binding at equilibrium (Model 8). Δ*G*_{τ1} set to its prior mean (0 for RNAP and pol II, and -3.3 for T7 pol). (C) Relationship between *k*_{bind} and *k*_{cat} for the kinetic binding model with translocation at equilibrium (Model 2). (D) Relationship between *K*_{D} and *k*_{bind} with translocation held at equilibrium (Model 2). *K*_{D} and *k*_{bind} sampled uniformly from specified range and velocity is measured. Samples with simulated velocities outside of the range 1-2 bp/s were discarded. [NTP] = 10 *μ*M and *k*_{cat} = 100 s^{−1}.

https://doi.org/10.1371/journal.pcbi.1006717.s005

(PDF)

## References

- 1. Sweetser D, Nonet M, Young RA. Prokaryotic and eukaryotic RNA polymerases have homologous core subunits. Proceedings of the National Academy of Sciences. 1987;84(5):1192–1196.
- 2. Sosunov V, Sosunova E, Mustaev A, Bass I, Nikiforov V, Goldfarb A. Unified two-metal mechanism of RNA synthesis and degradation by RNA polymerase. The EMBO journal. 2003;22(9):2234–2244. pmid:12727889
- 3. Sousa R, Chung YJ, Rose JP, Wang BC. Crystal structure of bacteriophage T7 RNA polymerase at 3.3 Å resolution. Nature. 1993;364(6438):593. pmid:7688864
- 4. Abbondanzieri EA, Greenleaf WJ, Shaevitz JW, Landick R, Block SM. Direct observation of base-pair stepping by RNA polymerase. Nature. 2005;438(7067):460–465. pmid:16284617
- 5. Adelman K, La Porta A, Santangelo TJ, Lis JT, Roberts JW, Wang MD. Single molecule analysis of RNA polymerase elongation reveals uniform kinetic behavior. Proceedings of the National Academy of Sciences. 2002;99(21):13538–13543.
- 6. Bai L, Fulbright RM, Wang MD. Mechanochemical kinetics of transcription elongation. Physical review letters. 2007;98(6):068103. pmid:17358986
- 7. Hodges C, Bintu L, Lubkowska L, Kashlev M, Bustamante C. Nucleosomal fluctuations govern the transcription dynamics of RNA polymerase II. Science. 2009;325(5940):626–628. pmid:19644123
- 8. Galburt EA, Grill SW, Wiedmann A, Lubkowska L, Choy J, Nogales E, et al. Backtracking determines the force sensitivity of RNAP II in a factor-dependent manner. Nature. 2007;446(7137):820–823. pmid:17361130
- 9. Skinner GM, Baumann CG, Quinn DM, Molloy JE, Hoggett JG. Promoter binding, initiation, and elongation by bacteriophage T7 RNA polymerase a single-molecule view of the transcription cycle. Journal of Biological Chemistry. 2004;279(5):3239–3244. pmid:14597619
- 10. Dulin D, Vilfan ID, Berghuis BA, Hage S, Bamford DH, Poranen MM, et al. Elongation-competent pauses govern the fidelity of a viral RNA-dependent RNA polymerase. Cell reports. 2015;10(6):983–992. pmid:25683720
- 11. Neuman KC, Abbondanzieri EA, Landick R, Gelles J, Block SM. Ubiquitous transcriptional pausing is independent of RNA polymerase backtracking. Cell. 2003;115(4):437–447. pmid:14622598
- 12. Davenport RJ, Wuite GJ, Landick R, Bustamante C. Single-molecule study of transcriptional pausing and arrest by E. coli RNA polymerase. Science. 2000;287(5462):2497. pmid:10741971
- 13. Tolić-Nørrelykke SF, Engh AM, Landick R, Gelles J. Diversity in the rates of transcript elongation by single RNA polymerase molecules. Journal of Biological Chemistry. 2004;279(5):3292–3299. pmid:14604986
- 14. Abbondanzieri EA, Shaevitz JW, Block SM. Picocalorimetry of transcription by RNA polymerase. Biophysical journal. 2005;89(6):L61–L63. pmid:16239336
- 15. Watson JD, Crick FH, et al. Molecular structure of nucleic acids. Nature. 1953;171(4356):737–738. pmid:13054692
- 16. Tadigotla VR, Maoiléidigh DÓ, Sengupta AM, Epshtein V, Ebright RH, Nudler E, et al. Thermodynamic and kinetic modeling of transcriptional pausing. Proceedings of the National Academy of Sciences of the United States of America. 2006;103(12):4439–4444. pmid:16537373
- 17. Maoiléidigh DÓ, Tadigotla VR, Nudler E, Ruckenstein AE. A unified model of transcription elongation: what have we learned from single-molecule experiments? Biophysical journal. 2011;100(5):1157–1166.
- 18. Maitra U, Nakata Y, Hurwitz J. The Role of Deoxyribonucleic Acid in Ribonucleic Acid Synthesis XIV. A Study of the Initiation of Ribonucleic Acid Synthesis. Journal of Biological Chemistry. 1967;242(21):4908–4918. pmid:4862425
- 19. Erie DA, Yager TD, Von Hippel PH. The single-nucleotide addition cycle in transcription: a biophysical and biochemical perspective. Annual review of biophysics and biomolecular structure. 1992;21(1):379–415. pmid:1381976
- 20. Rhodes G, Chamberlin MJ. Ribonucleic acid chain elongation by Escherichia coli ribonucleic acid polymerase I. Isolation of ternary complexes and the kinetics of elongation. Journal of Biological Chemistry. 1974;249(20):6675–6683. pmid:4608711
- 21. Bai L, Shundrovsky A, Wang MD. Sequence-dependent kinetic model for transcription elongation by RNA polymerase. Journal of molecular biology. 2004;344(2):335–349. pmid:15522289
- 22. Bustamante C, Chemla YR, Forde NR, Izhaky D. Mechanical processes in biochemistry. Annual review of biochemistry. 2004;73(1):705–748. pmid:15189157
- 23. Cleland W. Partition analysis and concept of net rate constants as tools in enzyme kinetics. Biochemistry. 1975;14(14):3220–3224. pmid:1148201
- 24. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. The journal of physical chemistry. 1977;81(25):2340–2361.
- 25. Beaumont MA. Approximate Bayesian computation in evolution and ecology. Annual review of ecology, evolution, and systematics. 2010;41:379–406.
- 26. Schweikhard V, Meng C, Murakami K, Kaplan CD, Kornberg RD, Block SM. Transcription factors TFIIF and TFIIS promote transcript elongation by RNA polymerase II by synergistic and independent mechanisms. Proceedings of the National Academy of Sciences. 2014;111(18):6642–6647.
- 27. Thomen P, Lopez P, Bockelmann U, Guillerez J, Dreyfus M, Heslot F. T7 RNA polymerase studied by force measurements varying cofactor concentration. Biophysical journal. 2008;95(5):2423–2433. pmid:18708471
- 28. Wang MD, Schnitzer MJ, Yin H, Landick R, Gelles J, Block SM. Force and velocity measured for single molecules of RNA polymerase. Science. 1998;282(5390):902–907. pmid:9794753
- 29. Shaevitz JW, Abbondanzieri EA, Landick R, Block SM. Backtracking by single RNA polymerase molecules observed at near-base-pair resolution. Nature. 2003;426(6967):684–687. pmid:14634670
- 30. Artsimovitch I, Landick R. Pausing by bacterial RNA polymerase is mediated by mechanistically distinct classes of signals. Proceedings of the National Academy of Sciences. 2000;97(13):7090–7095.
- 31. Zhou Y, Navaroli DM, Enuameh MS, Martin CT. Dissociation of halted T7 RNA polymerase elongation complexes proceeds via a forward-translocation mechanism. Proceedings of the National Academy of Sciences. 2007;104(25):10352–10357.
- 32. Greive SJ, Von Hippel PH. Thinking quantitatively about transcriptional regulation. Nature Reviews Molecular Cell Biology. 2005;6(3):221–232. pmid:15714199
- 33. SantaLucia J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proceedings of the National Academy of Sciences. 1998;95(4):1460–1465.
- 34. Wu P, Nakano Si, Sugimoto N. Temperature dependence of thermodynamic properties for DNA/DNA and RNA/DNA duplex formation. The FEBS Journal. 2002;269(12):2821–2830.
- 35. Yin YW, Steitz TA. The structural mechanism of translocation and helicase activity in T7 RNA polymerase. Cell. 2004;116(3):393–404. pmid:15016374
- 36. Yu J, Oster G. A small post-translocation energy bias aids nucleotide selection in T7 RNA polymerase transcription. Biophysical journal. 2012;102(3):532–541. pmid:22325276
- 37. Thomen P, Lopez PJ, Heslot F. Unravelling the mechanism of RNA-polymerase forward motion by using mechanical force. Physical Review Letters. 2005;94(12):128102. pmid:15903965
- 38. Depken M, Galburt EA, Grill SW. The origin of short transcriptional pauses. Biophysical journal. 2009;96(6):2189–2193. pmid:19289045
- 39. Herbert KM, Greenleaf WJ, Block SM. Single-molecule studies of RNA polymerase: motoring along. Annu Rev Biochem. 2008;77:149–176. pmid:18410247
- 40. Lecca P. Stochastic chemical kinetics. Biophysical reviews. 2013;5(4):323–345. pmid:28510113
- 41. Larson MH, Zhou J, Kaplan CD, Palangat M, Kornberg RD, Landick R, et al. Trigger loop dynamics mediate the balance between the transcriptional fidelity and speed of RNA polymerase II. Proceedings of the National Academy of Sciences. 2012;109(17):6555–6560.
- 42. Dangkulwanich M, Ishibashi T, Liu S, Kireeva ML, Lubkowska L, Kashlev M, et al. Complete dissection of transcription elongation reveals slow translocation of RNA polymerase II in a linear ratchet mechanism. Elife. 2013;2:e00971. pmid:24066225
- 43. Mejia YX, Nudler E, Bustamante C. Trigger loop folding determines transcription rate of Escherichia coli’s RNA polymerase. Proceedings of the National Academy of Sciences. 2015;112(3):743–748.
- 44. Csilléry K, Blum MG, Gaggiotti OE, François O. Approximate Bayesian computation (ABC) in practice. Trends in ecology & evolution. 2010;25(7):410–418.
- 45. Vogel U, Jensen KF. The RNA chain elongation rate in Escherichia coli depends on the growth rate. Journal of bacteriology. 1994;176(10):2807–2813. pmid:7514589
- 46. Ryals J, Little R, Bremer H. Temperature dependence of RNA synthesis parameters in Escherichia coli. Journal of bacteriology. 1982;151(2):879–887. pmid:6178724
- 47. Richardson JP, Greenblatt J. Control of RNA chain elongation and termination. Escherichia coli and Salmonella: cellular and molecular biology. 1996;1:822–848.
- 48. Mason PB, Struhl K. Distinction and relationship between elongation rate and processivity of RNA polymerase II in vivo. Molecular cell. 2005;17(6):831–840. pmid:15780939
- 49. Iost I, Guillerez J, Dreyfus M. Bacteriophage T7 RNA polymerase travels far ahead of ribosomes in vivo. Journal of bacteriology. 1992;174(2):619–622. pmid:1729251
- 50. Bonner G, Lafer EM, Sousa R. Characterization of a set of T7 RNA polymerase active site mutants. Journal of Biological Chemistry. 1994;269(40):25120–25128. pmid:7929200
- 51. Anand VS, Patel SS. Transient state kinetics of transcription elongation by T7 RNA polymerase. Journal of Biological Chemistry. 2006;281(47):35677–35685. pmid:17005565
- 52. Kireeva ML, Nedialkov YA, Cremona GH, Purtov YA, Lubkowska L, Malagon F, et al. Transient reversal of RNA polymerase II active site closing controls fidelity of transcription elongation. Molecular cell. 2008;30(5):557–566. pmid:18538654
- 53.
Rambaut A, Drummond A. Tracer 1.6. University of Edinburgh, Edinburgh. UK. Technical report; 2013.
- 54. Gelman A, Rubin DB, et al. Inference from iterative simulation using multiple sequences. Statistical science. 1992;7(4):457–472.
- 55. Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. Journal of computational and graphical statistics. 1998;7(4):434–455.
- 56.
Brooks S, Gelman A, Jones G, Meng XL. Handbook of markov chain monte carlo. CRC press; 2011.
- 57. Nedialkov YA, Nudler E, Burton ZF. RNA polymerase stalls in a post-translocated register and can hyper-translocate. Transcription. 2012;3(5):260–269. pmid:23132506
- 58. Kireeva M, Trang C, Matevosyan G, Turek-Herman J, Chasov V, Lubkowska L, et al. RNA–DNA and DNA–DNA base-pairing at the upstream edge of the transcription bubble regulate translocation of RNA polymerase and transcription rate. Nucleic acids research. 2018;46(11):5764–5775. pmid:29771376
- 59. Guajardo R, Lopez P, Dreyfus M, Sousa R. NTP concentration effects on initial transcription by T7 RNAP indicate that translocation occurs through passive sliding and reveal that divergent promoters have distinct NTP concentration requirements for productive initiation. Journal of molecular biology. 1998;281(5):777–792. pmid:9719634
- 60. Arnold S, Siemann M, Scharnweber K, Werner M, Baumann S, Reuss M, et al. Kinetic modeling and simulation of in vitro transcription by phage T 7 RNA polymerase. Biotechnology and bioengineering. 2001;72(5):548–561. pmid:11460245
- 61. Wong F, Dutta A, Chowdhury D, Gunawardena J. Structural conditions on complex networks for the Michaelis–Menten input–output response. Proceedings of the National Academy of Sciences. 2018;115(39):9738–9743.
- 62. Briggs GE, Haldane JBS. A note on the kinetics of enzyme action. Biochemical journal. 1925;19(2):338. pmid:16743508
- 63. English BP, Min W, Van Oijen AM, Lee KT, Luo G, Sun H, et al. Ever-fluctuating single enzyme molecules: Michaelis-Menten equation revisited. Nature chemical biology. 2006;2(2):87–94. pmid:16415859
- 64. Schnell S. Validity of the Michaelis–Menten equation–steady-state or reactant stationary assumption: that is the question. The FEBS journal. 2014;281(2):464–472. pmid:24245583
- 65. Tennyson CN, Klamut HJ, Worton RG. The human dystrophin gene requires 16 hours to be transcribed and is cotranscriptionally spliced. Nature genetics. 1995;9(2):184–190. pmid:7719347
- 66. Darzacq X, Shav-Tal Y, De Turris V, Brody Y, Shenoy SM, Phair RD, et al. In vivo dynamics of RNA polymerase II transcription. Nature structural & molecular biology. 2007;14(9):796–806.
- 67.
Kainov DE, Lísal J, Bamford DH, Tuma R. Packaging motor from double-stranded RNA bacteriophage
*ϕ*12 acts as an obligatory passive conduit during transcription. Nucleic acids research. 2004;32(12):3515–3521. pmid:15247341 - 68. Makarova OV, Makarov EM, Sousa R, Dreyfus M. Transcribing of Escherichia coli genes with mutant T7 RNA polymerases: stability of lacZ mRNA inversely correlates with polymerase speed. Proceedings of the National Academy of Sciences. 1995;92(26):12250–12254.
- 69. Mejia YX, Mao H, Forde NR, Bustamante C. Thermal probing of E. coli RNA polymerase off-pathway mechanisms. Journal of molecular biology. 2008;382(3):628–637. pmid:18647607
- 70. Burns CM, Richardson LV, Richardson JP. Combinatorial effects of NusA and NusG on transcription elongation and rho-dependent termination in Escherichia coli1. Journal of molecular biology. 1998;278(2):307–316. pmid:9571053
- 71. Kingston R, Nierman W, Chamberlin M. A direct effect of guanosine tetraphosphate on pausing of Escherichia coli RNA polymerase during RNA chain elongation. Journal of Biological Chemistry. 1981;256(6):2787–2797. pmid:7009598
- 72. Galburt EA, Grill SW, Bustamante C. Single molecule transcription elongation. Methods. 2009;48(4):323–332. pmid:19426807
- 73.
Usala SJ, Brownstein BH, Haselkorn R. Displacement of parental RNA strands during in vitro transcription by bacteriophage
*φ*6 nucleocapsids. Cell. 1980;19(4):855–862. pmid:7379123 - 74. Moler C, Van Loan C. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM review. 2003;45(1):3–49.
- 75. Bai L, Wang MD. Comparison of pause predictions of two sequence-dependent transcription models. Journal of Statistical Mechanics: Theory and Experiment. 2010;2010(12):P12007.