Analytical kinetic model of native tandem promoters in E. coli

Closely spaced promoters in tandem formation are abundant in bacteria. We investigated the evolutionary conservation, biological functions, and the RNA and single-cell protein expression of genes regulated by tandem promoters in E. coli. We also studied the sequence (distance between transcription start sites ‘dTSS’, pause sequences, and distances from oriC) and potential influence of the input transcription factors of these promoters. From this, we propose an analytical model of gene expression based on measured expression dynamics, where RNAP-promoter occupancy times and dTSS are the key regulators of transcription interference due to TSS occlusion by RNAP at one of the promoters (when dTSS ≤ 35 bp) and RNAP occupancy of the downstream promoter (when dTSS > 35 bp). Occlusion and downstream promoter occupancy are modeled as linear functions of occupancy time, while the influence of dTSS is implemented by a continuous step function, fit to in vivo data on mean single-cell protein numbers of 30 natural genes controlled by tandem promoters. The best-fitting step is at 35 bp, matching the length of DNA occupied by RNAP in the open complex formation. This model accurately predicts the squared coefficient of variation and skewness of the natural single-cell protein numbers as a function of dTSS. Additional predictions suggest that promoters in tandem formation can cover a wide range of transcription dynamics within realistic intervals of parameter values. By accurately capturing the dynamics of these promoters, this model can be helpful to predict the dynamics of new promoters and contribute to the expansion of the repertoire of expression dynamics available to synthetic genetic constructs.

Introduction in the S1 Appendix). However, to study the dynamics of genes controlled by tandem promoters, we focused on only 102 of them, because their activity is expected to be undisturbed by neighboring genes in the DNA (arrangements I and II in Fig 1), for reasons

Fig 1. Interference between tandem promoters with different arrangements relative to each other. (A)
Interference by an RNAP occupying the downstream promoter on the activity of the elongating RNAP from upstream promoter. The TSSs need to be at least 36 bp apart (the length occupied by an RNAP when in OC, [23,25]) (B) Interference by occlusion of one of the promoter's TSS by an RNAP on the TSS of the other promoter. The distance between the TSSs need to be � 35 bp apart. Blue clouds are RNAPs. Black arrows sit on TSSs and point towards the direction of transcription elongation. Arrangements (I-II) of two promoters studied in the manuscript in tandem formation are represented. The red rectangles are the protein coding regions. We studied only the natural tandem promoters that neither overlap with nor have in between another gene (arrangements I and II, which differ based on whether the promoter regions overlap or not). Other arrangements (not considered in this study) are shown in Fig   (II) Next, we measured the single-cell protein levels of those genes with arrangements I and II that are tagged in the YFP strain library [28]. We also measured the mean RNA fold changes of these genes over time (S1 Appendix, section 'RNA-seq measurements and data analysis'). (III) We used the single-cell data to tune the model. (IV) Finally, we used the model to explore the state space of protein expression. Figure created with BioRender.com.
Further, these promoters do not have specific short nucleotide sequences capable of affecting RNAP elongation (section 'Pause sequences' in the S4 Appendix). Also, the 102 genes expressed by these promoters are not overrepresented in a particular biological process (section 'Over-representation test' in the S4 Appendix). From time-lapse RNA-seq data (S1 Appendix, section 'RNA-seq measurements and data analysis'), we also did not find evidence that their dynamics are affected by their input transcription factors (TFs) in our measurement conditions (section 'Input-output transcription factor relationships' in the S4 Appendix) nor by H-NS in a consistent manner (section 'Regulation by H-NS' in the S4 Appendix). Finally, they do not exhibit any particular TF network features (Table C in the S3 Appendix). As such, neither input TFs nor specific nucleotide sequences are considered in the model below. In addition to all of the above, we found no correlations between the shortest distance from the TSS of upstream promoters from the oriC region in the DNA and expression levels (section 'Relationship with the oriC region' in the S4 Appendix).

Model of gene expression controlled by tandem promoters
RNAPs bind, slide along, and unbind from a promoter several times until, eventually, one of them finds the TSS [29][30], commits to OC at the TSS, and initiates transcription elongation.
Reactions (1A1) are a 4-step (I-IV) model of transcription [20,31]. The forward reaction in step I in (1A1) models RNAP binding to a free promoter (P free ), which becomes no longer free albeit the RNAP might not yet have reached the TSS. This state, pre-finding of the TSS, is here named P bound and its occurrence increases with RNAP concentration, [R]. Next, as it percolates the DNA, the RNAP should find and stop at the nearest TSS and form a closed complex (CC) with the DNA (step II, Reaction 1A1). CCs are unstable, i.e. reversible [22] (reaction 1A2) but, eventually, one of them will commit to OC irreversibly [32], via step III, Reaction 1A1 [21][22]. It follows RNAP escape from the TSS, freeing the promoter (step IV, Reaction 1A1) [33][34][35][36][37]. Then, the RNAP elongates (R elong ) until producing a complete RNA (reaction 1A3) and freeing itself.
These set of reactions usually model well stochastic transcription dynamics [20]. However, if two promoters are closely spaced in tandem formation, they can interfere [38]. Fig 3 shows sequences of events that can lead to interference between tandem promoters, not accounted for by the model above.
From Fig 3, if the TSSs are sufficiently close, the occupancy of one TSS by an RNAP will occlude the other TSS, blocking its kinetics [18]. This is accounted for by reaction 1A5, which competes with CC formation in reaction 1a1. Its rate constant, k occlusion , is defined in the next section. In (1A5), 'u/d' stands for occlusion of the upstream promoter by an RNAP on the TSS of the downstream promoter.
Instead, if the TSSs are not sufficiently close, they will still interfere since the elongating RNAP (R elong ) starting from the upstream promoter can collide with RNAPs on the TSS of the downstream promoter. This can dislodge either RNAP via (reaction 1A4) or (reaction 2A3), depending on the sequence-dependent binding strength of the RNAP to the TSS [9].
Finally, once reaction 1A1 occurs, either reaction 1A3 or 1A4 occur. To tune their competition, we introduced the terms ω d and (1ω d ) in their rate constants, with ω d being the fraction of times that an elongating RNAP from an upstream promoter finds an RNAP occupying the downstream promoter. Meanwhile, 'f' is the fraction of times that the RNAP occupying the downstream promoter falls-off due to the collision with an elongating RNAP, whereas '1-f' is the fraction of times that it is the elongating RNAP that falls-off.
Next, we reduced the model and derived its analytical solution. First, since P cc completion is expected to be faster than P bound completion ( [10] and references within) we merged them into a single state, P occupied , which represents a promoter occupied by an RNAP prior to commitment to OC, whose time length is similar to P bound . A similar set of events occurs in tandem promoters, if only one RNAP interacts with them at any given time. (B / C) Interference due to the occlusion of the downstream / upstream promoter by a bound RNAP, which will impede the incoming RNAP from binding to the TSS. (D) Interference of the activity of the RNAP incoming from the upstream promoter by the RNAP occupying the downstream promoter. One of these RNAPs will be dislodged by the collision. Created with BioRender.com. Similarly, in standard growth conditions, the occurrence of multiple failures in escaping the promoter per OC completion should only occur in promoters with the highest binding affinity to RNAP. Thus, in general promoter escape should be faster than OC [20,32]. We thus merged OC and promoter escape into one step named 'events after commitment to OC', with a rate constant k after . The simplified model is thus: These two steps are not merged since only the first differs with RNAP concentration [20,26,39]. Further, reports [40][41] indicate that E. coli has~100-1000 RNAPs free for binding at any moment but~4000 genes, suggesting that the number of free RNAPs is a limiting factor.
Finally, we merge (1A2), (1A5) and (1B1) in one multistep without affecting the model kinetics: Overall, this reduced model of transcription of upstream promoters has a multistep reaction of transcription initiation (1C1), a reaction of transcription elongation (1A3) and a reaction for failed elongation due to RNAPs occupying the downstream promoter (1A4).
Regarding RNA production from the downstream promoter, it should either be affected by occlusion if d TSS � 35, or by RNAPs elongating from the upstream promoter if d TSS > 35 ( Fig  3). We thus use reactions (2A1), (2A2), and (2A3) to model these promoters' kinetics: Finally, one needs to include a reaction for translation (reaction 3), as a first order process since protein numbers follow RNA numbers linearly ( Fig F in the S2 Appendix), and reactions for RNA and protein decay accounting for degradation and for dilution due to cell division (reactions 4A and 4B, respectively). TF regulation is not included as noted above (Fig C and

Transcription interference by occlusion
In a pair of tandem promoters, the k occlusion of one of them should increase with the fraction of time that the other one is occupied. Further, it should decrease with increasing d TSS between the two promoters' TSS. We thus define k occlusion for the upstream (Eq 5A) and downstream (Eq 5B) promoters, respectively as: Here, k max ocl is the maximum occlusion possible. It occurs when the two TSSs completely overlap each other (d TSS = 0) and the TSS of the 'other' promoter is always occupied. Meanwhile, I(d TSS ) models distance-dependent interference.
We tested four models of interference: 'exponential 1', 'exponential 2', 'step', and 'zero order' ( Table 1). The first two assume that the effects of occlusion decrease exponentially with d TSS (first and second order dependency, respectively).
Meanwhile, the ' Step' model assumes that interference only occurs precisely in the region in the DNA occupied by the RNAP when in OC formation. For this, it uses a logistic equation to build a continuous step function, where L is the length of DNA (in bp) occupied by the RNAP in OC. As such, L tunes at what d TSS the step occurs, while m is the steepness of that step (set to 1 bp -1 ).
Finally, the 'Zero order' model assumes (unrealistically) that interference by occlusion, is independent of d TSS. Fig G in  Finally, ω is the fraction of time that the 'other' promoter is occupied. It ranges from 0 (no occupancy) to 1 (always occupied). It is estimated for upstream and downstream promoters as: Similarly, if k max occupy is the maximum possible interference due to RNAPs occupying the downstream promoter, k occupy is defined as:

Analytical solution of the moments of the single-cell protein numbers
Next, we derived an analytical solution of the expected mean single-cell protein numbers at steady state, M P , which is later tuned to fit the empirical data. For any gene, regardless of the underlying kinetics of transcription, k r is the effective rate of RNA production. Based on the reactions above, the mean protein numbers in steady state will be (see sections "Analytical model of mean RNA levels controlled by a single promoter in the absence of a closely spaced promoter" and "Derivation of mean protein numbers at steady state produced by a pair of tandem promoters" in the S1 Appendix): This equation applies to a pair of tandem promoters as well. In that case, assuming that k bind of the two tandem promoters is similar, we have: To derive the other moments, we considered that empirical single-cell protein numbers in E. coli are well fit by negative binomials [28]. Consequently, M p and the squared coefficient of variation CV 2 P , should be related as (Equations S28 to S38 in the S1 Appendix): This relationship matches empirical data at the genome wide level, except for genes with high transcription rates [42]. Additionally, we further derived a relationship (Section 'CV 2 and Skewness of single-cell protein expression of a model tandem promoters' in the S1 Appendix) between M P and the skewness, S P , of the single-cell distribution of protein numbers:

Single-cell distributions of protein numbers
To validate the model, we measured by flow-cytometry the single-cell distributions of protein fluorescence of 30 out of the 102 genes known to be controlled by tandem promoters (with arrangements I and II). Measurements were made in 1X and 0.5X media (3 replicates per condition) using cells from the YFP strain library (section 'Strains and Growth Conditions' in the S1 Appendix). Data from past studies show that, in these 30 genes, RNA and protein numbers are well correlated (Fig F in the S2 Appendix) in standard growth conditions. Past studies also suggest that most of these genes are active during exponential growth (~95% of our 30 genes selected should be active, according to data in [43] using SEnd-seq technology).
Single-cell distributions of protein expression levels are shown in Fig 4A for one of these genes as an example. The raw data from all 30 genes (only one replicate) are shown in Fig H in the S2 Appendix. Finally, the mean, CV 2 and skewness for each gene, obtained from the triplicates, are shown in Excel sheets 1 and 2 in the S2 Table. In addition, we also show this mean, CV 2 and skewness after subtracting the first, second, and third moments of the single-cell distribution of the fluorescence of control cells, which do not express YFP (Sheets 3, 4 in the S2 Table) (Section 'Subtraction of background fluorescence from the total protein fluorescence' in flow-cytometry in the S1 Appendix).
Based on the analysis of the data of these 30 genes, we removed from subsequent analysis those genes (5 in 1X and 14 in 0.5X) whose mean, variance, or third moment of their protein fluorescence distributions are lower than in control cells (not expressing YFP), i.e., than cellular autofluorescence (Sheets 3, 4 in S2 Table). As such, only one gene studied here (in condition 1X alone) codes for a protein that is associated to membrane-related processes, which might affect its quantification (section 'Proteins with membrane-related positionings' in S4 Appendix). As such, we do not expect this phenomenon to influence our results significantly. The data from these genes removed from further analysis is shown in Fig F in S2 Appendix alone, for illustrative purposes.  Table G in the S3 Appendix) when obtained by FC plotted against when obtained by microscopy, 'Mic'. (D) Mean single-cell protein fluorescence (own measurements) plotted against the corresponding mean single-cell protein numbers reported in [28]. From the equation of the best fitting line without y-intercept (y-intercept = 0), we obtained a scaling factor, sf, equal to 0.09. We started by testing the accuracy of the background-subtracted flow-cytometry data by confronting it with microscopy data (also after background subtraction, see section 'Microscopy and Image Analysis' in the S1 Appendix). We collected microscopy data on 10 out of the 30 genes (Table G in the S3 Appendix). The microscopy measurements of the mean single-cell fluorescence expressed by these genes (example image in Fig 4B), were consistent, statistically, with the corresponding data obtained by flow-cytometry ( Fig 4C).
Next, we converted the fluorescence distributions from flow-cytometry (25 genes in 1X and 16 genes in 0.5X) into protein number distributions. In Fig 4D we plotted our measurements of mean protein fluorescence in 1X against the protein numbers reported in [28] for the same genes, in order to obtain a scaling factor (sf = 0.09). Using sf, we estimated M P , CV 2 P , and S P of the distribution of protein numbers expressed by the tandem promoters in (Sheets 5, 6 in S2 Table) (Section 'Conversion of protein fluorescence to protein numbers' in S1 Appendix).
To test the robustness of the estimation of the scaling factor, we also estimated a scaling factor from 10 other genes present in the YFP strain library [28] (listed in Table B in S3 Appendix). These genes were selected as described in the section 'Selection of natural genes controlled by single promoters' in S1 Appendix. Using the data from this new gene cohort Table, we estimated a scaling factor of 0.08, supporting the previous result. Meanwhile, since when merging the data from tandem and single promoters, the resulting scaling factor equals 0.09 (Panel B of Fig I in S2 Appendix), we opted for using 0.09 from here onwards.
We also tested how sensitive the estimated scaling factor is to the removal of data points. Specifically, for 1000 times, we discarded N randomly selected data points, and estimated the resulting scaling factor. We then compared, for each N, the mean and the median of the distribution of 1000 scaling factors (Fig J in S2 Appendix). Since the median is not sensitive to outliers, if mean and median are similar, one can conclude that the scaling factor is not biased by a few data points. Visibly, the mean and the median only start differing for N larger than 6, which corresponds to nearly 30% of the data.

Log-log relationship between the mean single-cell protein numbers of tandem promoters and the other moments
We plotted M P against CV 2 P and S P in log-log plots, in search for the fitting parameters, 'C 1 ' and 'C 2 ', to estimate the rate of protein production per RNA (Eq 10). To increase the state space covered by our measurements, in addition to M9 media (named '1X'), we also used diluted M9 media (named '0.5X'), known to cause cells to have lower RNAP concentrations ( Fig 5A) (Section 'Strains and growth conditions' in the S1 Appendix), without altering the division rate (Panels A and B of Fig K in the S2 Appendix). We note that 1X and 0.5X only refer to the degree of dilution of the original media and not to how much RNAP concentration and consequently, protein concentrations, were reduced by media dilution. From the same figures, we attempted stronger dilutions, but no further decreases in RNAP concentration were observed and the growth rate decreased.
Next, from Fig 5B, most genes (of those expressing tangibly in both media) suffered similar reductions (well fit by a line) in protein numbers with the media dilution, as expected by the model of gene expression (Eqs 8 and 9). This linear relationship could also be interpreted as evidence that the difference in expression of these genes between the two conditions is not affected by TFs in our measurement conditions. Namely, if TF influences existed, and TF numbers changed, they would likely be diversely affected by their output genes (weakly and strongly activated, repressed, etc.) and, thus, our proteins of interest would not have changed in such similar manners (linearly).
Meanwhile, as in [42,44], CV 2 P decreases linearly with M P (log-log scale), irrespective of media (R 2 > 0.8 in all fitted lines), in agreement with the model (Fig 5C). Fitting Eq 10 to the data, we extracted C 1 in each condition. S P also decreases linearly with M P , irrespective of the media (Fig 5D). Similar to above, Eq 11 was fitted to each data set and C 1 and C 2 were obtained (R 2 > 0.6 for all lines).
Since C 1 from Fig 5C and 5D differed slightly (likely due to noise), we instead obtained C 1 and C 2 values that maximized the mean R 2 of both plots. Using 'fminsearch' function in MATLAB [45], we obtained C 1 = 72.71 and C 2 = 16.94 (R 2 of 0.80 and 0.61, respectively) for Fig 5C and Fig 5D, respectively.

Inference of parameter values and model predictions as a function of d TSS
We next used the model, after fitting, to predict how d TSS and the promoters' occupancy regulate the moments of the single-cell distribution of protein numbers (M P , CV 2 P , and S P ) under  Table 2 and tuned the remaining parameters.
To set the RNAP numbers in Table 2, we considered that the RNAPs affecting transcription rates are the free RNAPs in the cell, and that, for doubling times of 30 min in rich medium, there are~1000 free RNAPs per cell [41]. Meanwhile, for doubling times of 60 min in minimal medium, there are~144 [40]. In both our media, we observed a doubling time of~115 mins ( Fig 5B). Thus, we expect the free RNAP in 1X to also be~144/cell or lower. Meanwhile, in 0.5X, we measured the RNAP concentration to be 17% lower than in 1X ( Fig 5A) and no morphological changes. Thus, we assume the free RNAP in 0.5X to equal~120/cell. Next, we fitted the Eqs (8) and (9) relating d TSS with log 10 (M P ) in all interference models (Table 1), using the data on M P in 1X medium (Fig 6A) and the 'fit' function of MATLAB. For this, we set k max ¼ k max occupy ¼ k max ocl , for simplicity, as well as realistic bounds for each parameter to infer. To avoid local minima, we performed 200 searches, each starting from a random initial point, and selected the one that maximized R 2 . Results are shown in Table 3.
Next, we inserted all parameter values (empirical and inferred) in Eqs (10) and (11) to predict CV 2 P and S P in 1X medium (Fig 6B and 6C). Also, we inserted the same parameter values and the estimated RNAP numbers in 0.5X medium in Eqs (8)(9)(10)(11) to obtain the analytical solutions for M P , CV 2 P and S P for 0.5X medium (Fig 6D,6E and 6F). From Fig 6, the data is 'noisy', which suggests that it is not possible to establish if the models are significantly different. As such, here we only select the one that best explains the data, based on the R 2 values of the fittings. Table 3 shows the mean R 2 for M P , CV 2 P , and S P when confronting the model with the data. Overall, from the R 2 values, the step model is the one that best fits the data. Meanwhile, the 'ZeroO' model is the least accurate, which supports the existence of distinct kinetics when d TSS is smaller or larger than 35 nucleotides, which is the length of the RNAP when committed to OC on the TSS [23][24][25].
In summary, the proposed model of expression of genes under the control of a pair of tandem promoters is based on a standard model of transcription of each promoter, which are subject to interference, either due to occlusion of the TSSs or by RNAP occupying the TSS of the downstream promoter. The influence of each occurrence of these events is well modeled by linear functions of TSS occupancy times, while their dependency on d TSS is modeled by a Table 2. Parameter values imposed identically on all models.

Parameter description Parameter Value References
Inverse of the mean time to complete OC k after 0.005 s -1 Differs between promoters. Since empirical data lacks, we used the data from in vivo single RNA measures for Lac-Ara-1 [20].
Protein decay due to dilution by cell division and degradation k pd = k pdeg + k dil We then confronted the analytical solutions of the step model with stochastic simulations (Section 'Stochastic simulations for the step inference model' in the S1 Appendix). We first assumed various d TSS , but fixed k bind , for simplicity. Visibly, M P , CV 2 P , and S P of the stochastic simulations are well-fitted by the analytical solution, supporting the initial assumption that CV 2 P , and S P follow a negative binomial (Fig M in the S2 Appendix). However, natural promoters are expected to differ in k bind as they differ in sequence [48,49]. Thus, we introduced this variability and studied whether the analytical model holds.
To change the variability, we obtained each k bind from gamma distributions (means shown in Table 3 and CVs in Table I in the S3 Appendix). We chose a gamma distribution since its values are non-negative and non-integer (such as rate constants). Meanwhile, all parameters of the step model, aside from k bind , are obtained from Tables 2 and 3. For d TSS � 35 and d TSS > 35, and each CV considered, we sampled 10000 pairs of values of k bind �[R], and calculated M, CV 2 and S for each of them. Next, we estimated the average and standard deviation of each statistics. From Fig N in the S2 Appendix, if CV(k bind )<1, the analytical solution is robust. In that the standard error of the mean is smaller than M P /3. Notably, for such CV, the strength of the  Table). The dots were also grouped in 3 'boxes' based on their d TSS . In each box, the red line is the median and the top and bottom are the 3 rd and 1 st quartiles, respectively. The vertical black bars are the range between minimum and maximum of the red dots. In A, all lines are best fits. In B, C, D, E, and F, all lines are model predictions, based on the parameters used to best fit A. The insets show the R 2 for each model fit and prediction. Next, we used the fitted model to predict (using Eqs 8 to 11) the influence of promoter occupancy (ω) on the M P , CV 2 P and S P of upstream and downstream promoters. We set d TSS to 20 bp to represent promoters where � 35, and to 100 bp to represent promoters with d TSS > 35. Then, for each cohort, we changed ω from 0.01 to 0.99 (i.e., nearly all possible values). In addition, we estimated these moments when k occlusion , k occupy , and ω are all set to zero (i.e., the two promoters do not interfere), for comparison. From Fig 7, a pair of tandem promoters can produce less proteins than a single promoter with the same parameter values, if d TSS � 35, which makes occlusion possible. Meanwhile, if d TSS > 35, tandem promoters can only produce protein numbers in between the numbers produced by one isolated promoter and the numbers produced by two isolated promoters. In no case can two interfering tandem promoters produce more than two isolated promoters with equivalent parameter values. I.e., according to the model, the interference between tandem promoters cannot enhance production.
Meanwhile, the kinetics of the upstream (Fig 7A and  Finally, consider that the model predicts that transcription interference should occur in tandem promoters, either due to occlusion if d TSS � 35 occupancy or due to occupancy of the downstream promoter if d TSS > 35. Meanwhile, in single promoters, neither of these phenomena occurs. Thus, on average, two single promoters should produce more RNA and proteins than a pair of tandem promoters of similar strength. Using the genome wide data from [28] on 0.21 (Fig 6A-6C) 0.09 (Fig 6D-6F 0.25 (Fig 6A-6C) 0.12 (Fig 6D-6F) Step

PLOS COMPUTATIONAL BIOLOGY
protein expression levels during exponential growth we estimated the double of the mean expression level (it equals 183.8) of genes controlled by single promoters (section 'Selection of natural genes controlled by single promoters' in the S1 Appendix). Meanwhile, also using data from [28], the mean expression level of genes controlled by tandem promoters equals 148 (estimated from the 26 that they have reported on), in agreement with the hypothesis. Nevertheless, this data is subject to external variables (e.g., TF interference). A definitive test would require the use of synthetic constructs, lesser affected by external influences.

Regulatory parameters of promoter occupancy and occlusion
Since the occupancy, ω, of each of the tandem promoters is responsible for transcriptional interference by occlusion and by RNAPs occupying the downstream promoter, we next explored the biophysical limits of ω. Eqs 6A and 6B define the occupancies of the upstream and downstream promoters, ω u and ω d , respectively. For simplicity, here we refer to both of them as ω. Fig 8A shows that ω increases with the rate of RNAP binding (k bind �[R]), but only within a certain range of (high) values of the time from binding to elongating (k À 1 after ). I.e., RNAPs need to spend a significant time in OC, if they are to cause interference, which is expected. Similarly, ω changes with k À 1 after , but only for high values of k bind �[R]. I.e., if it's rare for RNAPs to bind, the occupancy will necessarily be weak.
Next, we estimated k occlusion , the rate at which a promoter occludes the other as a function of d TSS and ω using Eqs 6A and 6B. k max is shown in Table 3. To model I(d TSS ) we used the step function in Table 1. Overall, k occlusion changes linearly with ω, when and only when d TSS � 35 (Fig 8B).

State space of the single cell statistics of protein numbers of tandem promoters
We next studied how much the single-cell statistics of protein numbers (M P , CV 2 P , and S P ) of the upstream, 'u', and downstream, 'd', promoters changes with ω u , ω d , and d TSS . Here, ω u and ω d are increased from 0 to 1 by increasing the respective k bind (Eqs 6A and 6B).
From Fig 9A, if d TSS � 35 bp, reducing ω d while also increasing ω u is the most effective way to increase M u , since this increases the number of RNAPs transcribing from the upstream promoter that are not hindered by RNAPs occupying the downstream promoter. If d TSS > 35 bp, the occupancy the downstream promoter, ω d , becomes ineffective.
Oppositely, from Fig 9B, if d TSS � 35 bp, increasing ω d while also decreasing ω d , is the most effective way to increase M d since this increases the number of RNAPs transcribing from the  From Fig P in the S2 Appendix, CV 2 P and S P behave inversely to M P . Relevantly, in all cases, the range of predicted protein numbers (Fig 9C) are in line with the empirical values (~10 −1 to 10 3 proteins per cell) (Fig 4D).

Discussion
E. coli genes controlled by tandem promoters have a relatively high mean conservation level (0.2, while the average gene has 0.15, with a p-value of 0.009), suggesting that they play particularly relevant biological roles (section 'Gene Conservation' in the S1 Appendix). From empirical data on single-cell protein numbers of 30 E. coli genes controlled by tandem promoters, we found evidence that their dynamics is subject to RNAP interference between the two promoters. This interference reduces the mean single-cell protein numbers, while increasing its CV 2 and skewness, and can be tuned by ω, the promoters' occupancy by RNAP, and by d TSS . Since both of these parameters are sequence dependent [21,31] the interference should be evolvable. Further, since ω of at least some of these genes should be under the influence of their several input TFs, the interference has the potential to be adaptive.
We proposed models of the dynamics of these genes as a function of ω and d TSS , using empirically validated parameter values. In our best fitting model, transcription interference is modelled by a step function of d TSS (instead of gradually changing with d TSS ), since the only detectable differences in dynamics with changing d TSS were between tandem promoters with d TSS � 35 and d TSS > 35 nucleotides (the latter cohort of genes having higher mean expression and lower variability). We expect that causes this difference tangible is the existence of the OC formation. In detail, the OC is a long-lasting DNA-RNAP formation that occupies that strict region of DNA at the promoter region [24,31]. As such, occlusion should share these physical features. Because of that, when d TSS � 35, an RNAP bound to TSS always occludes the other TSS, significantly reducing RNA production. Meanwhile, if d TSS > 35, interference occurs when an RNAP elongating from the upstream promoter is obstructed by an RNAP occupying the downstream promoter.
Meanwhile, contrary to d TSS , if one considers realistic ranges of the other model parameters, it is possible to predict a very broad range of accessible dynamics for tandem promoter arrangements. This could explain the observed diversity of single-cell protein numbers as a function of d TSS (Fig 6). At the evolutionary level, such potentially high range of dynamics may provide high evolutionary adaptability and thus, it may be one reason why genes controlled by these promoters are relatively more conserved.
One potentially confounding effect which was not accounted for in this model is the accumulation of supercoiling. Closely spaced promoters may be more sensitive to supercoiling buildup than single promoters [52][53][54]. If so, it will be useful to extend the model to include these effects [26]. Using such model and measurements of expression by tandem promoters when subject to, e.g. Novobiocin [55], may be of use to infer kinetic parameters of promoter locking due to positive supercoiling build-up.
Other potential improvements could be expanding the model to tandem arrangements other than I and II (Fig 1), to include a third form of interference (transcription elongation of a nearby gene).
One open question is whether placing promoters in tandem formation increases the robustness of downstream gene expression to perturbations (e.g., fluctuations in the concentrations of RNAP or TF regulators). A tandem arrangement likely increases the robustness to perturbations which only influence one of the promoters. Another open question is why several of the 102 tandem promoters with arrangements I and II appeared to behave independently from their input TFs (according to the RNA-seq data), albeit having more input TFs (1.62 on average) than expected by chance (the average E. coli gene only has 0.95). As noted above, we hypothesize that these input TFs may become influential in conditions other than the ones studied here.
Here, we also did not consider any influence from the phenomenon of "RNAP cooperation" [56]. This is based on this being an occurrence in elongation, and we expect interactions between two elongating RNAPs to rarely affect the interference between tandem promoters [9]. However, potentially, it could be of relevance in the strongest tandem promoters.
Finally, a valuable future study on tandem promoters will require the use of synthetic tandem promoters (integrated in a specific chromosome location) that systematically differ in promoter strengths and nucleotide distances. This would allow extracting parameter values associated to promoter interference to create a more precise model than the one based on the natural promoters (which is influenced by TFs, etc). Similarly, measuring the strength of individual natural promoters would contribute to this effort. Overall, our model, based on a significant number of natural tandem promoters whose genes have a wide range of expression levels, should be applicable to the natural tandem promoters not observed here (at least of arrangements I and II), including of other bacteria, and to be accurate in predicting the dynamics of synthetic promoters in these arrangements.
Currently, predicting how gene expression kinetics change with the promoter sequence remains challenging. Even single-or double-point mutants of known promoters behave unpredictably, likely because the individual sequence elements influence the OC and CC in a combinatorial fashion. Consequently, the present design of synthetic circuits is usually limited to the use of a few promoters whose dynamics have been extensively characterized (Lac, Tet, etc.). This severely limits present synthetic engineering.
We suggest that a promising methodology to create new synthetic genes with a wide range of predictable dynamics is to assemble well-characterized promoters in a tandem formation, and to tune their target dynamics using our model. Specifically, for a given dynamics, it is possible to invert the model and find a suitable pair of promoters with known occupancies and corresponding d TSS (smaller or larger than 35), which achieve these dynamics. A similar strategy was recently proposed in order to achieve strong expression levels [57]. Our results agree and further expand on this by showing that the mean expression level can also be reduced and expression variability can further be fine-tuned.
Importantly, this can already be executed, e.g., using a library of individual genes whose expression can be measured [28]. From this library, we can select any two promoters of interest and arrange them as presented here, in order to obtain a kinetics of expression as close as possible to a given target. Note that these dynamics have a wide range, from weaker to stronger than that of either promoter (albeit no stronger than their sum, Fig 9C). Given the number of natural genes whose expression is already known and given the present accuracy in assembling specific nucleotide sequences, we expect this method to allow the rapid engineering of genes with desired dynamics with an enormous range of possible behaviours. As such, these constructs could represent a recipe book for the components of gene circuits with predictable complex kinetics.

Materials and methods
Using information from RegulonDB v10.5 as of 30 th of January 2020 [58], we started by searching natural genes controlled by two promoters (Section 'Selection of natural genes controlled by tandem promoters' in the S1 Appendix). Next, we studied their evolutionary conservation and ontology (Sections 'Gene conservation' and 'Gene Ontology' in the S1 Appendix) and analysed their local topological features within the TFN of E. coli (Section 'Network topological properties' in the S1 Appendix).
RNA-seq measurements were conducted in two points in time (Section 'RNA-seq measurements and data analysis' in the S1 Appendix), to obtain fold changes in RNA numbers of genes controlled by tandem promoters with arrangements I and II, their input TFs, and their output genes (Fig 1). We used this data to search for relationships between input and output genes.
Next, a model of gene expression was proposed, and reduced to obtain an analytical solution of the single-cell protein expression statistics of tandem promoters (Sections 'Derivation of mean protein numbers at steady state produced by a pair of tandem promoters' and 'CV 2 and skewness of the distribution of single-cell protein numbers of model tandem promoters' in the S1 Appendix). This analytical solution was compared to stochastic simulations conducted using the simulator SGNS2. (Section 'Stochastic simulations for the step inference model' in the S1 Appendix).
We collected single-cell flow-cytometry measurements of 30 natural genes controlled by tandem promoters (Section 'Flow-cytometry and data analysis' in the S1 Appendix) to validate the model. For this, first, from the original data, we subtracted the cellular background fluorescence (Section 'Subtraction of background fluorescence from the total protein fluorescence' in the S1 Appendix). Then, we converted the fluorescence intensity into protein numbers (Section 'Conversion of protein fluorescence to protein numbers in the S1 Appendix). From this we obtained empirical data on M, CV 2 , and S of the single-cell distributions of protein numbers in two media (Sections 'Media and chemicals' and 'Strains and growth conditions' in the S1 Appendix). Flow-cytometry measurements were also compared to microscopy data, supported by image analysis (Section 'Microscopy and Image analysis' in the S1 Appendix), for validation.
Comparing the data from RegulonDB (30.01.2020) used here, with the most recent (21.07.2021), we found that the numbers of genes controlled by tandem promoters of arrangements I and II differed by~4% (from 102 to 98). Regarding those whose activity was measured by flow-cytometry, this difference is~3% (30 to 31). Globally, 163 TF-gene interactions differed (~3.4%) while for the 98 genes controlled by tandem promoters of arrangements I and II, only 10 TF-gene interactions differ (~2.7%). Finally, globally the numbers of TUs differed by~1%, promoters by~0.6%, genes by~1%, and terminators by~15% (which did not affect the genes studied, as they changed by~4% only). These small differences should not affect our conclusions.