Customized Regulation of Diverse Stress Response Genes by the Multiple Antibiotic Resistance Activator MarA

Stress response networks frequently have a single upstream regulator that controls many downstream genes. However, the downstream targets are often diverse, so it remains unclear how their expression is specialized when under the command of a common regulator. To address this, we focused on a stress response network where the multiple antibiotic resistance activator MarA from Escherichia coli regulates diverse targets ranging from small RNAs to efflux pumps. Using single-cell experiments and computational modeling, we showed that each downstream gene studied has distinct activation, noise, and information transmission properties. Critically, our results demonstrate that understanding biological context is essential: we found examples where strong activation only occurs outside physiologically relevant ranges of MarA, and others where noise is high at wild type MarA levels and decreases as MarA reaches its physiological limit. These results demonstrate how a single regulatory protein can maintain specificity while orchestrating the response of many downstream genes.

1 Strains and Plasmids

Strains
The MarA+ strain is derived from Escherichia coli MG1655, with modifications to overexpress MarA. The chromosomal copies of the two MarR binding sites in the marRAB promoter are inactivated, removing negative feedback by MarR. This inactivation was accomplished by placing transversion mutations at these sites. The mutated sequences were derived from [1], where they are listed as "TV -14 to -18" and "TV +11 to 15." The new marRAB promoter is: GCATCGCATTGAACAAAACTTGAACCGATTTAGCAAAACGTGGCATCGGTCAATTCA TTCATTTGACTTATACTTGCCTGTTACCTATTATCCCCTGCAACTAATTACGGTAAAGGG C AACTAAT GTGAAAAGTACCAGCGATCTGTTCAATGAAATTATT where bold letters indicate the transversion mutations, while underlining indicates MarR binding sites. We used homologous recombination to insert this modified promoter following [2], then cured the resistance marker.

Plasmids
Downstream Gene Reporter Plasmids
All plasmids were constructed using Gibson assembly [3]. For the transcriptional reporters, we amplified the promoters of the downstream genes and included all known binding sites. The primers typically terminated at the transcriptional start site and were joined to a synthetic ribosome binding site and the gfp gene. The only exception is PmarRAB, which extends beyond the transcriptional start site of the promoter in order to include the second MarR binding site. The ribosome binding site and plasmid backbone were taken from pBbSvk-cfplaa described in [4]. All primers used to amplify promoters are listed in Table S1.

marA-rfp and rfp Control
The rfp control used in Fig. S1 uses the plasmid pBbA5a-rfp from [5]. For the activator/reporter experiments this plasmid was modified to produce pBbA5a-marA-rfp. To construct this, the marA gene and an additional ribosome binding site were inserted directly upstream of rfp.

Chimeric Reporters
We constructed the chimeric reporters using site-directed mutagenesis to modify the MarA binding sites from plasmids for PacrAB and PmarRAB. The primers used for the mutagenesis are listed in Table S2. Underlined text shows the acrAB and marRAB marboxes inserted during site-directed mutagenesis. Bold text corresponds to the sequence where the primers bind to the original template. Pam is the PacrAB promoter with the marRAB marbox; Pma is the PmarRAB promoter with the acrAB marbox.

Combining and Binning Values
To characterize each promoter we induced marA-rfp expression under a range of IPTG concentrations, combined the data, and binned the values to generate output statistics based on input values (Fig. S1). Each data point was calculated by binning the IPTG-induced activator/reporter data logarithmically, then computing statistics (mean and standard deviation, µ and σ) on the data within that bin.
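The binning step can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function name `bin_log`, the default of 10 bins, the requirement of at least two points per bin, and the use of the sample standard deviation are all assumptions made for the example.

```python
import math
import statistics

def bin_log(inputs, outputs, n_bins=10):
    """Group (input, output) pairs into log-spaced input bins.

    Returns (bin_center, mean_output, std_output) for every bin holding at
    least two points. n_bins is a hypothetical choice, not a source value.
    """
    pairs = [(a, b) for a, b in zip(inputs, outputs) if a > 0]  # log needs a > 0
    lo = math.log10(min(a for a, _ in pairs))
    hi = math.log10(max(a for a, _ in pairs))
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all inputs are equal
    bins = [[] for _ in range(n_bins)]
    for a, b in pairs:
        # Points on the top edge are placed in the last bin.
        i = min(int((math.log10(a) - lo) / width), n_bins - 1)
        bins[i].append(b)
    stats = []
    for i, values in enumerate(bins):
        if len(values) >= 2:
            center = 10 ** (lo + (i + 0.5) * width)  # geometric bin center
            stats.append((center, statistics.mean(values), statistics.stdev(values)))
    return stats

# Toy data standing in for RFP (input) and GFP (output) fluorescence values.
stats = bin_log([1, 2, 100, 200], [10, 20, 30, 50], n_bins=2)
```

Here `inputs` would hold the pooled per-cell RFP values across all IPTG concentrations and `outputs` the matching GFP values.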

Activation Function
To model how a downstream gene responds to activation by MarA, we used
$$\frac{dB}{dt} = \alpha \frac{A^n}{K_d^n + A^n} + c - \beta B, \qquad (1)$$
where the steady state solution is
$$B = \frac{1}{\beta}\left(\alpha \frac{A^n}{K_d^n + A^n} + c\right). \qquad (2)$$
Here, A is the MarA concentration and B is the downstream gene product concentration. α is the promoter strength, β is the dilution/degradation rate, K_d is the dissociation constant of MarA with the downstream gene, n is the Hill coefficient, and c is the basal level of expression.
When plotting the activation in Figs. 2 and 4 of the main text, we use a normalized version of this function with the basal level subtracted:
$$\tilde{B} = \frac{A^n}{K_d^n + A^n}. \qquad (3)$$
This function depends on the promoter-specific parameters, not on the behavior of the specific gene product regulated by the promoter.
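As a numerical illustration, the activation function can be sketched in Python. The sketch assumes the steady state takes the standard Hill form B = (α A^n / (K_d^n + A^n) + c) / β, built from the parameter definitions above; the function names and the parameter values in the example are hypothetical.

```python
def steady_state(A, alpha, beta, K_d, n, c):
    """Assumed steady-state output B for activator level A (Hill activation)."""
    return (alpha * A**n / (K_d**n + A**n) + c) / beta

def normalized_activation(A, K_d, n):
    """Steady state with the basal level subtracted, scaled to the range 0..1."""
    return A**n / (K_d**n + A**n)

# Hypothetical parameter values, chosen only to show the shape of the curve.
half_max = normalized_activation(500.0, K_d=500.0, n=2.0)  # A equal to K_d
basal = steady_state(0.0, alpha=1000.0, beta=1.0, K_d=500.0, n=2.0, c=50.0)
```

Under this form the normalized curve depends only on K_d and n, and it reaches half of its maximum when A equals K_d.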

Control for Cross-talk in Activation Measurements
In order to verify that there was no cross-talk between our RFP and GFP channels, we constructed a control strain bearing an IPTG inducible rfp gene without marA. Regardless of the level of RFP, we observed no increase in GFP expression (Fig. S2).

Fitting Parameters
To fit the activation and transmitted noise curves simultaneously, we built a cost function and used the differential evolution algorithm to fit parameter values [6]. For each bin, this function performs an element-wise comparison between the experimentally calculated values (input and output; A and B) and the analytical solutions of the fitted functions. The experimental values were calculated using the procedure shown in Fig. S1. The analytic solutions were those for the activation and transmitted noise curves. Using this cost function, we fit parameter values for each of our reporter strains (Table S3). For simplicity, β was not fit as a free parameter because it acts only as a scaling term; instead, we set β = 1 and fit α and c. Although the experimental data occupy only a portion of the full Hill curve, our fitting function fits the activation and noise simultaneously. This allows for a good fit without explicitly assuming a maximum expression level for any of the downstream genes or constraining the parameters in any other way. Note also that the values of K_d here are in arbitrary fluorescence units, which correspond to the concentration of MarA. The exact values depend on the specific experimental protocol, but the relative values between downstream genes will be maintained regardless of method.
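The fitting approach can be illustrated with SciPy's `differential_evolution`. This is a simplified sketch under several assumptions: it fits the activation curve only (the transmitted noise term is left out), it assumes the Hill form B = α A^n / (K_d^n + A^n) + c with β = 1, it compares model and data in log space, and the synthetic data, parameter values, and search bounds are hypothetical rather than taken from Table S3.

```python
import numpy as np
from scipy.optimize import differential_evolution

def hill(A, alpha, K_d, n, c):
    """Assumed steady-state activation with beta fixed at 1."""
    return alpha * A**n / (K_d**n + A**n) + c

def cost(params, A_bins, B_bins):
    """Sum of squared element-wise log differences, one term per bin."""
    model = hill(A_bins, *params)
    return float(np.sum((np.log(model) - np.log(B_bins)) ** 2))

# Synthetic binned data generated from hypothetical parameter values.
A_bins = np.logspace(1, 4, 20)
true_params = (1000.0, 500.0, 2.0, 50.0)  # alpha, K_d, n, c
B_bins = hill(A_bins, *true_params)

# Hypothetical search ranges for alpha, K_d, n, and c.
bounds = [(1, 5000), (1, 5000), (0.5, 6), (1, 500)]
result = differential_evolution(cost, bounds, args=(A_bins, B_bins), seed=0)
alpha_fit, K_d_fit, n_fit, c_fit = result.x
```

A joint fit in the spirit of the text would add a second set of terms to `cost` comparing the measured and predicted noise in each bin, so that both curves constrain the same parameters.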

Noise in Mapped MarA Distributions
The coefficient of variation of the estimated wild type MarA distribution was 0.48, versus 0.65 in the MarA+ strain. These are on the high end of gene expression coefficients of variation in E. coli, which usually range from 0.2 to 0.5 [7]. Transmitted noise through each of the downstream genes functions as a multiplication of this value.

Computational Modeling
We modeled noisy expression of a transcription factor A with an intrinsic noise term, I_A, which follows an Ornstein-Uhlenbeck process [8].
In this process, κ sets the time scale of the noise and scales with the correlation time, λ depends on the standard deviation of the noise [8], and η is a zero-mean white noise random variable.

Simulations were run for 10,000 minutes of simulation time, and the resulting values were used to create distributions for each protein species. The differential equations were integrated using Euler's method with a time step of 0.5 minutes.
Varying either α or β can affect activation and noise transmission, as shown in Fig. S3.
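A minimal Python sketch of the noise simulation is given below. It assumes the Ornstein-Uhlenbeck process takes the common form dI/dt = -κ I + λ η, with κ entering as a relaxation rate, and uses an Euler-type update in which the white noise term is scaled by the square root of the time step. The function name and the values of κ and λ are hypothetical; the time step and run length follow the text.

```python
import math
import random

def simulate_ou(kappa, lam, dt=0.5, t_end=10_000.0, seed=0):
    """Euler-type integration of an assumed OU process dI/dt = -kappa*I + lam*eta."""
    rng = random.Random(seed)          # seeded for reproducibility
    n_steps = int(t_end / dt)
    noise = 0.0
    trace = []
    for _ in range(n_steps):
        eta = rng.gauss(0.0, 1.0)      # zero-mean, unit-variance draw
        noise += -kappa * noise * dt + lam * math.sqrt(dt) * eta
        trace.append(noise)
    return trace

# Hypothetical noise parameters; 10,000 minutes at a 0.5 minute time step.
trace = simulate_ou(kappa=0.1, lam=1.0)
```

For this form, the stationary standard deviation of the continuous-time process is λ / sqrt(2κ), which gives a quick check on the simulated trace.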

Physiologically Relevant Ranges of MarA
The computation of channel capacity for each downstream gene uses the physiological ranges of MarA from the wild type and MarA+ strains. However, we do not directly measure MarA levels, but rather the downstream response. To estimate endogenous MarA levels, we mapped the results from the reporter-only experiments in wild type and MarA+ strains through our activator/reporter data to obtain estimates of the wild type and MarA+ levels. An example of this is shown in Fig. S4. For each cell's fluorescence level in the reporter-only experiments, we matched the nearest neighbors for wild type and MarA+ conditions in the activator/reporter experiment. The RFP values from these points served as estimates for the MarA level producing that downstream response. By repeating this process for each downstream gene, we generated independent estimates for MarA levels in wild type and MarA+ conditions. As the MarA levels are the same in each of the wild type and MarA+ experiments, we combined data to estimate the underlying MarA distributions.
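The mapping step can be sketched in Python as a nearest-neighbor lookup. This is an illustration under stated assumptions: it uses a single nearest neighbor per cell (the number of neighbors is not specified above), and the function name and toy fluorescence values are hypothetical.

```python
import bisect

def estimate_input_levels(reporter_only_gfp, paired_rfp, paired_gfp):
    """For each reporter-only GFP value, return the RFP value of the
    activator/reporter cell whose GFP is closest (single nearest neighbor)."""
    pairs = sorted(zip(paired_gfp, paired_rfp))  # sort paired cells by GFP
    gfp_sorted = [g for g, _ in pairs]
    estimates = []
    for g in reporter_only_gfp:
        i = bisect.bisect_left(gfp_sorted, g)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pairs)]
        best = min(candidates, key=lambda j: abs(gfp_sorted[j] - g))
        estimates.append(pairs[best][1])
    return estimates

# Toy activator/reporter data (RFP, GFP) and reporter-only GFP readings.
estimates = estimate_input_levels(
    reporter_only_gfp=[110, 390, 1000, 160],
    paired_rfp=[10, 50, 300],
    paired_gfp=[100, 200, 400],
)
```

Pooling the estimates obtained this way from each downstream reporter would give the combined MarA distribution described in the text.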

Channel Capacity
We calculated the channel capacity I*(A; B) for input A (MarA) and output B (downstream gene). In our calculations we used the normalized values of B from Eqn. 3. A_0 is the natural scale of concentration of the activator molecule, which we set to 1000 so that it matches the order of magnitude of our experimental measurements, taken in arbitrary units of fluorescence. The constant X is based on the expansion from [9], in which N_max is the maximum number of independent molecules produced from a downstream promoter. The number of independent molecules produced from a promoter is equal to the natural scale of concentration and agrees with the order of magnitude of the average protein copy number in E. coli [10].

Mutual Information
Mutual information for Fig. S5 and Fig. S6 is calculated using the following equation from [11]:
$$I(A; B) = \int P(A)\, dA \int P(B|A) \log_2 \frac{P(B|A)}{P(B)}\, dB.$$
Here, P(A) and P(B) are the input and output probability distributions, respectively. To calculate this value from our experimental data, we binned the data following the methods from [12] and calculated it as discrete mutual information. The number of bins scaled with the number of samples following the equation below from [12] N bins = N samples 5 .
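The discrete calculation can be sketched in Python as follows. The sketch assumes equal-width bins over the observed range of each variable and takes the number of bins as a caller-supplied argument; it does not implement the bin-count rule or the binning method of [12], and the function name is hypothetical.

```python
import math
from collections import Counter

def discrete_mutual_information(a_values, b_values, n_bins):
    """Mutual information in bits from paired samples, using equal-width bins."""
    def to_bin_indices(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
        return [min(int((v - lo) / width), n_bins - 1) for v in values]

    a_idx = to_bin_indices(a_values)
    b_idx = to_bin_indices(b_values)
    n = len(a_idx)
    joint = Counter(zip(a_idx, b_idx))  # counts for each (A bin, B bin) pair
    count_a = Counter(a_idx)
    count_b = Counter(b_idx)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written in terms of raw counts.
        mi += p_ab * math.log2(count * n / (count_a[i] * count_b[j]))
    return mi

# Perfectly coupled and fully independent toy samples.
mi_coupled = discrete_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], n_bins=2)
mi_independent = discrete_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2)
```

Only occupied joint bins contribute to the sum, which is the discrete counterpart of the integral over P(A) and P(B|A).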