Temperature-Dependent Model of Multi-step Transcription Initiation in Escherichia coli Based on Live Single-Cell Measurements

Transcription kinetics is limited by its initiation steps, which differ between promoters and with intra- and extracellular conditions. Regulation of these steps allows tuning both the rate and stochasticity of RNA production. We used time-lapse, single-RNA microscopy measurements in live Escherichia coli to study how the rate-limiting steps in initiation of the Plac/ara-1 promoter change with temperature and induction scheme. For this, we compared detailed stochastic models fit to the empirical data in maximum likelihood sense using statistical methods. Using this analysis, we found that temperature affects the rate limiting steps unequally, as nonlinear changes in the closed complex formation suffice to explain the differences in transcription dynamics between conditions. Meanwhile, a similar analysis of the PtetA promoter revealed that it has a different rate limiting step configuration, with temperature regulating different steps. Finally, we used the derived models to explore a possible cause for why the identified steps are preferred as the main cause for behavior modifications with temperature: we find that transcription dynamics is either insensitive or responds reciprocally to changes in the other steps. Our results suggests that different promoters employ different rate limiting step patterns that control not only their rate and variability, but also their sensitivity to environmental changes.

for the second strain). Antibiotics were from Sigma-Aldrich (USA). The composition of LB was: 10 g/L of tryptone (Sigma Aldrich, USA), 5 g/L of yeast extract (LabM, UK) and 10 g/L of NaCl (LabM, UK).
Finally, in order to obtain a set of medium conditions where differences between intracellular RNAP concentrations are maximized while differences in growth rates are minimized, we followed the procedure established in [4]. Namely, we carried out measurements in modified LB medium that have lower tryptone and yeast extract concentrations (by 0.25 or 0.5 fold), which reduces intracellular RNAP concentrations accordingly [4].

Induction of Target and Reporter Genes
Cell cultures were diluted in LB from overnight cultures to OD 600 of 0.05, and kept at 37°C at 250 RPM in a shaker until reaching mid-logarithmic phase with an OD 600 of 0.3. After that, cells containing the promoters P lac/ara-1 and P LtetO-1 were induced with 0.1% L-arabinose and 1 mM IPTG for target activation, and 100 ng/mL aTc for reporter activation. Cells containing the target promoter P tetA and the reporter promoter P LlacO-1 were induced with 15 ng/mL aTc for target activation and 1 mM IPTG for reporter activation. In both cases, for the cells to produce sufficient MS2-GFP for the detection of target RNAs, the reporter and target genes were induced 50 minutes prior to the measurements, while keeping cells shaking at 250 RPM in the incubator at the appropriate temperature (24, 27, 30, 33, 35, 37, 39, or 41°C). In the case of P lac/ara-1 , the induction was complemented by adding 1 mM IPTG 10 minutes prior to microscopy. In the end, cells were collected by centrifugation at 8000 × g for 1 minute, and diluted in fresh LB medium. For this, 5 µL of cells were added to an agarose pad (Sigma Life Science, USA), and placed into a temperature chamber (Bioptechs, FCS2) at the appropriate temperature for image acquisition.

Microscopy
The imaging was performed using a Nikon Eclipse (Ti-E, Nikon) inverted microscope, equipped with a 100× Apo TIRF (1.49 NA, oil) objective and a C2+ (Nikon) confocal laser-scanning system. Images were captured with the aid of a motorized stage. To visualize fluorescent-tagged RNA spots, we used a 488 nm laser (Melles-Griot) and an emission filter (HQ514/30, Nikon). Time-lapse fluorescence images were taken once per minute for 120 or 60 minutes. The software used for image acquisition was NIS-Elements (Nikon), and the images were analyzed using a custom software, described below.

Image Analysis
The fluorescence microscopy images were processed as follows. First, consecutive images in the time series were aligned such that the cross-correlation of fluorescence intensities is maximized. Next, a region occupied by each cell during the time series is manually annotated. After this, the locations, dimensions, and orientation of each cell in each frame are found by principal component analysis and the assumption that the fluorescence inside the cells is uniform [5]. Cell lineages were constructed using CellAging, which associates segments in consecutive frames based on their overlapping areas [6].
Next, the intensity of each cell is fit with a surface, which is a quadratic polynomial of the distance from the cell border, in least-deviations sense [7]. This surface is taken to represent the cellular background intensity resulting from the abundant, unbound MS2-GFPs, and is subtracted to obtain the foreground intensity. The foreground intensity is fit with a set of Gaussian surfaces, in least-deviations sense, with decreasing heights until the heights are in the 99% confidence interval of the background noise (estimated assuming a normal distribution and using median absolute deviation) [7]. The Gaussians are taken to represent fluorescent RNA spots, the volume under each representing the total spot intensity.
Since the lifetime of a MS2-GFP-tagged RNA is much longer than the cell division time [8,3], the cellular foreground intensity is expected to be an increasing curve, with a jump corresponding to an appearance of a new tagged RNA. The jump positions are estimated using a specialized curve fitting algorithm [9]. Any observed interval between two consecutive RNA productions is recorded for further analysis as-is, while the interval occurring after the last observed production is rendered right censored [10]. These right censored data improve the accuracy and avoid underestimating the transcription duration [10], as the exactly observed intervals tend to lack more longer intervals than shorter ones.

Modeling Transcription Initiation and Transcription Intervals
We assume the model of transcription initiation specified in Eq (1) of the main manuscript. This model is a submodel of the more general model proposed in [10], which in turn combines the models of [11] and [12]. The reactions are: where P off , P on , I 1 , and I 2 represent the different states of the promoter, and E represents a produced elongation complex (can be taken to approximate produced RNAs). The mechanistic details are further discussed in the main manuscript. We expect the dynamics of the above reactions to follow the stochastic chemical kinetics [13,14]. While the phenotypic distribution of a cell population with respect to a particular gene is characterized by the RNA and protein numbers, the contribution of transcription is best characterized by the distribution of produced transcripts in a given period of time, as the former is affected by the latter along with other processes such as degradation and dilution (due to cell division) of the transcripts.
Meanwhile, the distribution of the number of produced transcripts is intimately related to the distribution of consecutive transcription intervals: where F E (t) is the cumulative density of E(t), f τ and F τ are the probability density function and cumulative density of the intervals between the production of consecutive transcripts τ i (assumed to be independent and identically distributed), f * g is the convolution of f and g, and f * k is the k:th convolution power of f . For exponential τ ∼ E(λ) with rate λ, the produced RNA numbers are Poisson distributed E(t) ∼ P(λ t), in which case it is said that the RNAs are produced according to a Poisson process, or more concisely, that the transcription is Poissonian. Regardless of the interval distribution, the two moments of the two distributions are related in the long-term limit [15]: that is, the mean number of RNA produced per unit time equals the inverse of the transcription interval mean, while the Fano factor of the produced RNA numbers equals the squared coefficient (variance over squared mean) of variation of the transcription intervals. Conveniently, the latter equals always unity for Poissonian process. The long-term limit assumption is necessary such that short-term memory effects (which are present for a non-Poissonian transcription process) of the transcription process vanish. Note that the long-term limit assumption is only necessary to link the moments of the transcription interval distribution to that of the RNA numbers as specified above, and not in determining the appropriate transcription process from the measurement data-the link in Eq (2) holds for any t.
The transcription intervals of the model of Eq (1) are conveniently written using the following functional equation: is the probability density function of the waiting time of a reaction with rate k. The parenthesized expression arises from the random number of visits to P off prior to transitioning to I 1 , and the latter two terms from the remaining reactions. The expression for f τ can be simplified by manipulating it in the Laplace space (X → E e tX ) [10]. The result can be written as: provided that k on , p 1 , · · · , p n are distinct. The singularities p i = p j can be removed, and a more general result can be found in our previous article [10]: essentially, in such case, the density is a mixture of Erlang densities instead of the exponential ones as above.
From the above equation, several choices of parameters can result in an identical distribution of transcription intervals (and, consequently, RNA numbers). For example, k 2 and k 3 can be interchanged. This implies that while the best fitting distribution for a set of data can be found, the order of the steps k 2 and k 3 , or in fact, the exact values of any of the parameters, cannot be identified without additional information. In the manuscript, we exploit the details of how the parameters change to provide such additional information to identify k off , k on , and k 1 .
The other properties of the transcription intervals can be derived from the density, e.g. the mean and variance of the transcription intervals can be integrated from f τ : which can alternatively be derived from the results of Peccoud and Ycart [12] using Eq (3), which links the moments of the long-term RNA distribution to those of the transcription intervals. The duty cycle, average burst size, and average burst interval of the on/off switching loop are k on / (k on + k off ), k 1 / k off , and k on −1 + k off −1 , respectively. Here, the duty cycle is the fraction of time the gene spends in the on versus off state, the burst size is the number of RNAs produced prior to turning off, and the burst interval is the duration between such bursts.
In the temperature-dependent models, each model parameter changes as a function of temperature according to a polynomial function: where a kx −1 ,j is the order-j coefficient for the parameter k x and T is temperature. In practice, we consider polynomials up to the second order p = 2, as an order-p polynomial (or higher) can always pass through to p + 1 data points. We do not expect the polynomial models to provide particular insight, rather, parameters with a model of orders 0, 1, and 2, indicate that the parameter is independent of, varies linearly, or in a nonlinear fashion, respectively, with temperature.

Model Fitting and Selection
The models are fit using censored time intervals between the production of consecutive transcripts extracted from live-cell measurements (intervals are available in S8 table). The censoring is necessary, as production intervals longer than cell division cannot be observed, resulting in underestimation of the true transcription intervals, and as it improves the accuracy of the parameter estimation by properly accounting for the effects of finite sampling rate (60 s sampling interval) [10]. With censoring, rather than observing the production intervals τ i , we observe bounds for each interval: The models are fit in a maximum likelihood sense. The maximum likelihood estimate is:θ where θ represents a vector of the parameters to be estimated. If multiple models are to be fit together with independent data sets, the likelihoods sum as for the different samples above. The parameter vector θ contains the appropriate set of the polynomial coefficients a kx,j that determine the model parameters for each temperature T . Here, the objective˜ is some function that is equal up to some additive constant to the logarithm of the likelihood function . In general, the maximum likelihood objective is not guaranteed to feature attractive properties such as convexity or unimodality, but it is smooth almost everywhere and in practice well behaved, provided that the model is somewhat correct. The optimization was performed using a general-purpose derivativefree nonlinear optimization algorithm [16]. To counter the convergence of the optimization procedure to a local maximum, we used 1,000 random restarts, with each parameter being generated from an unit-interval uniform distribution. The parameters were scaled to have a mean equal to that of the data, assuming that the data were exponential.
The distribution of the estimated parameters or any model feature derived from them can be estimated using the delta method. It can be shown that a mapping applied to the maximum likelihood estimate converges in distribution to that applied to the true parameter such that [17]: for any g(θ) that is continuous almost everywhere. Here, g θ is the Jacobian of g, I(θ) is the Fisher information matrix, and · d − → N (b, c) represents a convergence in distribution to a normal distribution with a mean of b and covariance c. For practical purposes, the Jacobian g θ (θ) can be approximated with that of at the parameter estimate g θ (θ), and the Fisher information at the true parameter θ can be approximated with the observed information − θθ (θ) at the parameter estimate, where (θ) is the logarithm of the likelihood function and θθ its Hessian. In this work, g θ (θ) is computed using automatic differentiation and θθ (θ) =˜ θθ (θ) is computed numerically.
As the models with more parameters fit never worse than those with less, a scheme to penalize the excess degrees of freedom in the models is required. For this, we use the Bayesian information criterion (BIC) [18]. The BIC is computed according to [18]: where (θ) is the log-likelihood at the maximum likelihood estimate, k is the number of parameters, and n is the number of samples. In general, (θ) and n are not known when some of the data are censored. Instead, we know˜ (θ), the log-likelihood up to some additive constant, and n is known to be in some range, as each censored interval can contain information worth of 0 to 1 samples (the specific value depending on both the sample and the true model, and as such, cannot be determined). However, the additive constant vanishes when comparing two BICs, so the difference of two BICs can be estimated as: (11) wheren . = 1 n i + 0.5 n r is an estimate of the effective number of samples. As indicated above, in this work, each of the n i interval censored samples is assumed to be worth of 1 samples, as the sampling intervals are relatively short compared to the transcription intervals, and each of the n r right censored sample is assumed to be worth of 0.5 samples.
Finally, a conservative lower bound for ∆BIC 1,2 can be derived: which guarantees that invalid conclusions are not drawn due to the inaccuracy of the approximation, and allows a degree of inaccuracy in the former.

qPCR of Target Gene Activity
The activity of the target genes were also analyzed using quantitative PCR (qPCR). Cells containing the target plasmids were grown at various LB media [4] at 37°C, and induced with their respective inducers (0.1% L-arabinose and 1 mM IPTG for P lac/ara-1 -mRFP1-96BS, and 15 ng/mL aTc for P tetA -mRFP-96BS), as described above. Cells were collected by centrifugation at 8000 × g for 5 minutes. Twice the cell culture volume of RNA protect reagent (Qiagen) were added to the reaction tube, following the addition of Tris EDTA lysozyme buffer (pH 8.0) for enzymatic lysis. The total RNA from cells was isolated by the RNeasy kit (Qiagen), according to the manufacturer instructions. The concentration of RNA was quantified by Nanovue plus spectrophotometer (GE Healthcare). To remove the residual DNA, the samples containing the total isolated RNA samples were treated with DNase. Following that, iSCRIPT reverse transcription super mix was added for cDNA synthesis. Next, the cDNA samples were mixed with the qPCR master mix, containing iQ SYBR Green supermix (Biorad), with specific primers for the target and reference genes. The reaction was carried out in triplicates with a total reaction volume of 20 µL. For quantifying the target gene, we used mRFP1 primers (forward: 5' TACGACGCCGAG-GTCAAG 3' and reverse: 5' TTGTGGGAGGTGATGTCCA 3') and for the reference gene, we used the 16S RNA primers (forward: 5' CGTCAGCTCGT-GTTGTGAA 3' and reverse: 5' GGACCGCTGGCAACAAAG 3'). The qPCR experiments were performed using a MiniOpticon Real time PCR system (Biorad). The following conditions were used during the reaction: 40 cycles at 95°C for 10 s, 52°C for 30 s, and 72°C for 30 s for each cDNA replicate. We used no-RT controls and no-template controls to crosscheck non-specific signals and contamination. PCR efficiencies of the reactions were greater than 95%. The data from CFX Manager TM software was used to calculate the relative gene expression and its standard error [19].

Western Blot for RNA Polymerase Quantification
To quantify RNA polymerase abundance in DH5α-PRO strain at the different media, we measured the amount of RpoC subunits by western blot. Cells were grown until reaching mid-logarithmic phase, and harvested by centrifugation at 8000 × g for 1 minute. After that, cell lysate was treated with the B-PER bacterial protein extraction reagent (Thermo scientific), containing protease inhibitors, and incubated at room temperature for 10 minutes. The samples were centrifuged at 15000 × g for 10 minutes, after which the supernatant was collected. Next, the total protein samples were diluted to the 4× lamellae sample loading buffer, containing β-mercaptoethanol, and boiled for 5 minutes at 95°C. The samples containing about 30 µg of total protein were resolved by 4 to 20% TGX stain free precast gels (Biorad). Proteins were separated by electrophoresis and then electro-transferred on the PVDF membrane. Membranes were blocked with 5% non-fat milk and incubated with primary RpoC antibodies of 1 : 2000 dilutions (Biolegend) overnight at 4°C, followed by the HRP-secondary antibodies 1 : 5000 dilutions (Sigma Aldrich) for 60 minutes at room temperature. For detection of the RpoC protein, chemilumiscence reagent (Biorad) was used. Images were generated by the Chemidoc XRS system (Biorad). Band quantification was done by using the Image Lab software (version 5.2.1).