Dynamics of chromosomal target search by a membrane-integrated one-component receptor

Membrane proteins account for about one third of the cellular proteome, but it is still unclear how dynamic they are and how they establish functional contacts with cytoplasmic interaction partners. Here, we consider a membrane-integrated one-component receptor that also acts as a transcriptional activator, and analyze how it kinetically locates its specific binding site on the genome. We focus on the case of CadC, the pH receptor of the acid stress response Cad system in E. coli. CadC is a prime example of a one-component signaling protein that directly binds to its cognate target site on the chromosome to regulate transcription. We combined fluorescence microscopy experiments, mathematical analysis, and kinetic Monte Carlo simulations to probe this target search process. Using fluorescently labeled CadC, we measured the time from activation of the receptor until successful binding to the DNA in single cells, exploiting that stable receptor-DNA complexes are visible as fluorescent spots. Our experimental data indicate that CadC is highly mobile in the membrane and finds its target by a 2D diffusion and capture mechanism. DNA mobility is constrained due to the overall chromosome organization, but a labeled DNA locus in the vicinity of the target site appears sufficiently mobile to randomly come close to the membrane. Relocation of the DNA target site to a distant position on the chromosome had almost no effect on the mean search time, which was between four and five minutes in either case. However, a mutant strain with two binding sites displayed a mean search time that was reduced by about a factor of two. This behavior is consistent with simulations of a coarse-grained lattice model for the coupled dynamics of DNA within a cell volume and proteins on its surface. The model also rationalizes the experimentally determined distribution of search times. 
Overall, our findings reveal that DNA target search does not present a much bigger kinetic challenge for membrane-integrated proteins than for cytoplasmic proteins. More generally, diffusion and capture mechanisms may be sufficient for bacterial membrane proteins to establish functional contacts with cytoplasmic targets.

Author summary

Adaptation to changing environments is vital to bacteria and is enabled by sophisticated signal transduction systems. While signal transduction by two-component systems is well studied, signal transduction by membrane-integrated one-component systems, where one protein performs both sensing and response regulation, is insufficiently understood. How can a membrane-integrated protein bind to specific sites on the genome to regulate transcription? Here, we study the kinetics of this process, which involves both protein diffusion within the membrane and conformational fluctuations of the genomic DNA. A well-suited model system for this question is CadC, the signaling protein of the E. coli Cad system involved in pH stress response. Fluorescently labeled CadC forms visible spots in single cells upon stable DNA-binding, marking the end of the protein-DNA search process. Moreover, the start of the search is triggered by a medium shift exposing cells to pH stress. We probe the underlying mechanism by varying the number and position of DNA target sites. We combine these experiments with mathematical analysis and kinetic Monte Carlo simulations of lattice models for the search process. Our results suggest that CadC diffusion in the membrane is pivotal for this search, while the DNA target site is just mobile enough to reach the membrane.


Introduction
Bacteria are exposed to fluctuating environments with frequent changes in nutrient conditions and communication signals, but also life-threatening conditions such as environmental stresses and antibiotics [1]. To sense and adapt to changing environmental conditions, bacteria have evolved sophisticated signaling schemes, primarily based on one- and two-component systems [2,3].

Two-component signaling systems feature a sensor kinase and a separate response regulator, where the former is typically membrane-integrated while the latter diffuses through the cytoplasm to reach its regulatory target [3]. The majority of response regulators are transcription factors that bind to specific target sites on the genomic DNA to activate or repress transcription. Hence, a key step in the signal transduction pathway of these two-component systems is a DNA target search by a cytoplasmic protein. The target search dynamics of cytoplasmic transcription factors have been thoroughly studied in the past decades, triggered by early in vitro experiments indicating that the Escherichia coli Lac repressor finds its target site faster than the rate limit for three-dimensional (3D) diffusion [4]. Inspired by the idea that a reduction of dimensionality can lead to enhanced reaction rates [5], the experiments were explained by a two-step process, where transcription factors locate their target by alternating periods of 3D diffusion and 1D sliding along the DNA [6]. Compared to pure 3D diffusion, sliding increases the association rate by effectively enlarging the target size ("antenna" effect) [7-9]. These dynamics were later probed with single-molecule methods, both in vitro [10,11] and in vivo [12,13]. For the Lac repressor as a paradigmatic example of a low copy number cytoplasmic transcription factor, the in vivo timescale of the target search was found to be about one minute [12]. Further studies continued to add to the detailed understanding of this target search process, e.g. with respect to effects of DNA conformation [14], DNA dynamics [15], and macromolecular crowding [16].

One-component signaling systems, in contrast, combine sensory function and response regulation within one protein [2]. The subset of one-component systems that are both membrane-integrated sensors and DNA-binding response regulators face an extraordinary DNA target search problem: They must locate and bind to a specific site on the bacterial chromosome from the membrane. This is the case for one-component [...] and TfoS in V. cholerae [19], PsaE in Yersinia tuberculosis [20] and the pH stress-sensing receptor CadC in E. coli [21]. A priori, it is not clear how these one-component systems establish functional protein-DNA contacts after stimulus perception. One scenario is that the DNA-binding domain is proteolytically cleaved, such that it can search for its specific binding site in the same manner as a regular cytoplasmic transcription factor. Another possible scenario is that simultaneous transcription, translation, and membrane-insertion ("transertion" [22,23]) tethers the DNA locus of the one-component system to the membrane and, at the same time, places the protein in the vicinity of its binding site on the DNA (which is typically close to the gene encoding the one-component system). The third scenario is a diffusion and capture mechanism [24], whereby the one-component system diffuses within the membrane and captures conformational fluctuations that bring the DNA close to the membrane.

Fig 1. (A) A cytoplasmic transcription factor (yellow) locates its specific binding site (green) on the DNA (red) through a combination of 3D diffusion in the cytoplasm and 1D sliding along the polymer. While the DNA itself is not static during this target search, DNA motion does not significantly contribute to its completion. (B) In contrast, a membrane-integrated transcription factor can only perform 2D diffusion in the membrane, such that DNA motion becomes essential. At a minimum, the specific DNA-binding site has to move close to the membrane to enable target recognition.

For the case of E. coli CadC, a well characterized model system [25], indirect evidence [26,27] paired with direct observation [28] argues against the proteolytic cleavage and transertion scenarios, and instead supports the diffusion and capture mechanism. For instance, the transertion mechanism should be sensitive to relocating the cadC gene to a locus far from its native position, which is close to the CadC target site at the cadBA promoter. However, relocation to the distant lac operon did not reduce the regulatory output of CadC. One of the observations arguing against proteolytic cleavage was that external signals can rapidly deactivate the CadC response after the original stimulus, whereas cleavage would irreversibly separate the DNA-binding domain from the signaling input [26]. In contrast, the diffusion and capture mechanism appeared consistent with experiments that imaged fluorophore-labeled CadC in vivo [28]. These experiments showed that localized CadC spots form in fluorescence microscopy images after cells are shifted to a medium that simultaneously provides acid stress and a lysine-rich environment, the two input signals required to stimulate the CadC response [26]. Formation of these spots was rapidly reversible upon removal of the input signals. Furthermore, the observed number of CadC spots was positively correlated with the number of DNA-binding sites, indicating that the spots correspond to CadC-DNA complexes with a much lower mobility than freely diffusing CadC in the membrane.

Taken together, the existing data suggest that the one-component system CadC [...] conformation that occasionally bring the DNA region of the target site close enough to the membrane to be captured by the protein (Fig. 1B) [...] obtained from stochastic models [30]. We also measure the mobility of a chromosomal locus in our experimental setup using a fluorescent repressor/operator system to inform kinetic Monte Carlo simulations of the target search dynamics. These simulations are able to reproduce the experimental behavior and to elucidate properties of the search process that we cannot obtain experimentally.

Choice of CadC as experimental model system

CadC is a particularly well-studied membrane-integrated one-component receptor. It is part of a pH stress response system, which is also dependent on signaling input from the lysine-specific permease LysP [26]. [...] function as well as the feedback inhibition by cadaverine was assigned to distinct amino acids within the periplasmic sensory domain of CadC [27,32], whereas the availability of external lysine is transduced to CadC via the co-sensor and inhibitor LysP, a lysine-specific transporter [33,34]. High-affinity DNA-binding of CadC requires CadC homodimerization, which is inhibited by LysP via intramembrane and periplasmic contacts under non-inducing conditions [33,34]. A drop in external pH induces dimerization of the periplasmic sensory domain of CadC followed by structural rearrangement of its cytoplasmic linker, permitting the DNA-binding domain of CadC to homodimerize [21,32,35]. The CadC protein number is extremely low (on average 1-3 molecules per cell [28]), mainly due to a low translation rate caused by polyproline stalling, which is only partially relieved by elongation factor P [36].
[...] solely due to low pH, using a pH-independent CadC variant that showed spots at both neutral and low pH. The connection between spot formation and CadC DNA-binding was derived from the observation that no spots were formed when CadC was rendered unable to bind DNA, and that the number of spots per cell correlated with the number of DNA-binding sites for CadC. Without a DNA-binding site, only 20 % of cells formed spots, possibly due to non-specific DNA-binding. We take this into account in our quantitative analysis below.

Experimental measurement of CadC target search times

We used three E. coli strains with different binding site configurations along the chromosome: N-P_cadBA (wild type) with the native DNA-binding site close to ori (at 93.9 [37]), T-P_cadBA with the binding site relocated to the terminus, and N+T-P_cadBA with both binding sites (Fig 2B). To visualize the temporal and spatial localization of CadC in vivo, we transformed each of the strains with plasmid-encoded mCherry-tagged CadC, thereby increasing the average number of CadC molecules per cell to 3-5 [28].

After the medium shift at t = 0 min, we took fluorescence and phase contrast microscopy images of cells sampled from the same culture every minute. We used image analysis [...] which rises from zero to one. Here, the asymptotic value ν(∞) accounts for the fact that fluorescent spots are never detected in all cells (see the raw data in S1 Fig). This is likely due to the heterogeneous distribution of CadC [28]: Given the low average copy [...]

The target search as a stochastic process

On a coarse-grained level, the search of CadC for a specific binding site on the DNA can be described by a stochastic process with a small number of discrete states. Since both CadC and the DNA must move in order to establish a specific protein-DNA contact, it is reasonable to assume a reversible sequential process with an intermediate state, [...] is not bound to this segment. The transition rates between these states are denoted as $k_1^+$, $k_2^+$, and $k_2^-$. We are interested in the so-called 'first-passage time' τ to reach the final state $S_3$, which corresponds to the target search time within this coarse-grained description. The probability distribution p(τ) for this time is calculated by making the final state absorbing, using standard techniques [30] (see S1 Appendix). Assuming that the system is initially in state $S_1$, the first passage time distribution is

$$p(\tau) = \frac{e^{-\tau/\alpha} - e^{-\tau/\beta}}{\alpha - \beta}, \qquad (3)$$

where the two timescales α and β of the exponential functions are related to the transition rates via

$$\frac{1}{\alpha} = \frac{1}{2}\left[\left(k_1^+ + k_2^+ + k_2^-\right) - \sqrt{\left(k_1^+ + k_2^+ + k_2^-\right)^2 - 4 k_1^+ k_2^+}\,\right], \qquad \frac{1}{\beta} = \frac{1}{2}\left[\left(k_1^+ + k_2^+ + k_2^-\right) + \sqrt{\left(k_1^+ + k_2^+ + k_2^-\right)^2 - 4 k_1^+ k_2^+}\,\right],$$

implying that α > β. At large times, the distribution p(τ) decays exponentially (decay time α), whereas the timescale β corresponds to a delay at short times. Hence, increasing α leads to a slower decay and increasing β to a longer delay. The average value of this distribution, referred to as the mean first passage time, is

$$\langle\tau\rangle = \alpha + \beta = \frac{1}{k_1^+}\,\frac{k_2^+ + k_2^-}{k_2^+} + \frac{1}{k_2^+}.$$

This corresponds to the average time for the first step ($1/k_1^+$) multiplied by the average number of trials needed to reach the second step, plus the average time for the second reaction step ($1/k_2^+$). In cases where one of the reaction steps is rate limiting, the mean first passage time is simply the inverse of the limiting rate, and the process simplifies to a two-state process with an exponentially distributed first passage time.
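The mean first passage time of the three-state scheme can be checked with a short stochastic simulation. The following is a minimal sketch, not the simulation code used in this study: it assumes the scheme $S_1 \to S_2$ (rate $k_1^+$), $S_2 \to S_1$ (rate $k_2^-$) and $S_2 \to S_3$ (rate $k_2^+$) with $S_3$ absorbing, and the function names and the rate values (0.5, 1.0, 2.0 per minute) are hypothetical, chosen only for illustration.

```python
import random


def simulate_first_passage(k1_plus, k2_plus, k2_minus, rng):
    """Draw one first passage time from S1 to the absorbing state S3."""
    t, state = 0.0, 1
    while state != 3:
        if state == 1:
            # Only one exit from S1: wait an exponential time, then enter S2.
            t += rng.expovariate(k1_plus)
            state = 2
        else:
            # Two competing exits from S2: forward to S3 or back to S1.
            total = k2_plus + k2_minus
            t += rng.expovariate(total)
            state = 3 if rng.random() < k2_plus / total else 1
    return t


def analytic_mean(k1_plus, k2_plus, k2_minus):
    """Mean first passage time: (number of trials) * (1/k1+) + 1/k2+."""
    trials = (k2_plus + k2_minus) / k2_plus
    return trials / k1_plus + 1.0 / k2_plus


def simulated_mean(k1_plus, k2_plus, k2_minus, n_samples=20000, seed=0):
    """Average over many simulated first passage times."""
    rng = random.Random(seed)
    total = sum(simulate_first_passage(k1_plus, k2_plus, k2_minus, rng)
                for _ in range(n_samples))
    return total / n_samples


if __name__ == "__main__":
    rates = (0.5, 1.0, 2.0)  # hypothetical rates, in 1/min
    print("analytic :", analytic_mean(*rates))
    print("simulated:", simulated_mean(*rates))
```

With these illustrative rates the two values should agree to within sampling error, which shrinks as `n_samples` grows.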

To relate the first passage time distribution to our experimental response function, Eq 1, we consider the cumulative distribution function CDF(τ) defined by

$$\mathrm{CDF}(\tau) = \int_0^{\tau} p(t)\,\mathrm{d}t,$$

which is the probability that the first passage time is less than or equal to τ.

Experimentally, CDF(τ) corresponds to the fraction of cells in which the target search was successful by time τ. The response function in Eq 1 is our best proxy for this fraction of cells, and hence we identify CDF(τ) with the response function evaluated at time τ. The cumulative distribution function for p(τ) of Eq 3 is

$$\mathrm{CDF}(\tau) = 1 - \frac{\alpha\, e^{-\tau/\alpha} - \beta\, e^{-\tau/\beta}}{\alpha - \beta}, \qquad (9)$$

which we can use as a fit function to describe the experimental data. However, using Eq 9 amounts to the assumption that all cells are initially in state $S_1$. If we allow for the possibility that some cells are in state $S_2$ when CadC is activated, we have a mixed initial condition, where the process starts either from state $S_1$ or from state $S_2$.
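The relation between the rates, the two timescales and the cumulative distribution can be made concrete numerically. The sketch below assumes the fixed initial condition (all cells start in $S_1$) and a density of the form $p(\tau) = (e^{-\tau/\alpha} - e^{-\tau/\beta})/(\alpha - \beta)$; the function names and the example rates are hypothetical. It compares the closed-form CDF with a trapezoidal integration of the density.

```python
import math


def timescales(k1_plus, k2_plus, k2_minus):
    """Return (alpha, beta) with alpha > beta from the transition rates."""
    total = k1_plus + k2_plus + k2_minus
    root = math.sqrt(total ** 2 - 4.0 * k1_plus * k2_plus)
    return 2.0 / (total - root), 2.0 / (total + root)


def pdf(tau, alpha, beta):
    """First passage time density for a start in S1 (requires alpha != beta)."""
    return (math.exp(-tau / alpha) - math.exp(-tau / beta)) / (alpha - beta)


def cdf(tau, alpha, beta):
    """Closed-form cumulative distribution obtained by integrating pdf."""
    return 1.0 - (alpha * math.exp(-tau / alpha)
                  - beta * math.exp(-tau / beta)) / (alpha - beta)


def cdf_numeric(tau, alpha, beta, steps=10000):
    """Trapezoidal integration of pdf from 0 to tau, as a cross-check."""
    h = tau / steps
    acc = 0.5 * (pdf(0.0, alpha, beta) + pdf(tau, alpha, beta))
    for i in range(1, steps):
        acc += pdf(i * h, alpha, beta)
    return acc * h


if __name__ == "__main__":
    alpha, beta = timescales(0.5, 1.0, 2.0)  # hypothetical rates, in 1/min
    print("alpha, beta:", alpha, beta)
    print("closed form:", cdf(10.0, alpha, beta))
    print("numerical  :", cdf_numeric(10.0, alpha, beta))
```

Note that the closed form vanishes at τ = 0 and approaches one at large τ, as required for a response function that rises from zero to one.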

Denoting the fraction of cells that are initially in state $S_2$ by x, such that a fraction 1 − x starts in state $S_1$, we obtain a first passage time distribution and associated cumulative distribution function of the form

$$p(\tau) = \frac{(1 - c\beta)\, e^{-\tau/\alpha} - (1 - c\alpha)\, e^{-\tau/\beta}}{\alpha - \beta}, \qquad \mathrm{CDF}(\tau) = 1 - \frac{\alpha (1 - c\beta)\, e^{-\tau/\alpha} - \beta (1 - c\alpha)\, e^{-\tau/\beta}}{\alpha - \beta},$$

where the new parameter c corresponds to $c = x k_2^+$. The associated mean first passage time is

$$\langle\tau\rangle = \alpha + \beta - c\,\alpha\beta.$$

Table 1. Results from fitting the experimentally computed CDFs to the sequential reversible model with mixed initial condition (N-P_cadBA) and fixed initial condition (T-P_cadBA and N+T-P_cadBA). The fit parameters α, β and c were used to compute the mean first passage time ⟨τ⟩ and the variance σ² with uncertainties obtained from error propagation using the full covariance matrix.

[...] The mean search time is less than 5 min and not affected by [...] shows an adequate fit using the sequential model with fixed initial condition in Eq 9.

The calculated mean first passage time of ⟨τ⟩ ≈ 4.20 ± 0.15 min is comparable to that for the wild type strain. The slower initial increase is compensated by a faster increase at later times to yield a slightly smaller mean search time. To characterize the shape of the first passage time distributions, we calculated the variance σ² of p(τ), see Table 1, which is smaller for strain T-P_cadBA than for the wild type.

Localization of ori and CadC spots. Localization probability of CadC spots in N-P_cadBA cells (orange) and ParB localization marking ori (blue) along the half long axis of cells. The half long axis is normalized such that mid-cell is at x = 0 and the poles are at x = 1. Overlaps of the two distributions are shown in darker orange. Cell age is taken into account by splitting all occurring cell lengths into ten equally spaced steps ∆l and pooling the cells according to their size. From the ten different age classes we observed similar localization probabilities for l = (1 to 2)∆l (A), l = (2 to 6)∆l (B) and l = (6 to 10)∆l (C), which are therefore grouped together in this plot.
Search time is decreased with two chromosomal CadC binding sites

After observing essentially the same mean search time for two very distant locations of CadC target sites on the chromosome, we wondered how a strain harboring both target sites would behave. We therefore repeated the measurements for E. coli strain N+T-P_cadBA, which has the native DNA-binding site and additionally the binding site at the terminus. As shown in Fig 2C (cyan triangles), the response function of this strain saturates much earlier than for the other two strains. Fitting the response data to the sequential model with fixed initial condition (Eq 9), we obtained a mean first passage time of ⟨τ⟩ ≈ 2.02 ± 0.12 min, which is only around half of the time measured for a single chromosomal binding site.
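A factor-of-two reduction is what a simple independence argument predicts: if each binding site is found independently and the search for a single site is approximately exponentially distributed, the first of two sites is found in half the mean time. The sketch below illustrates only this idealized argument; the exponential assumption, the function name and the single-site mean of 4.0 min are hypothetical and are not taken from the fits.

```python
import random


def mean_search_time(n_sites, tau_single, n_samples, rng):
    """Mean time until the first of n_sites independent targets is found.

    Each site is assumed to be found after an exponentially distributed
    time with mean tau_single; the search ends at the earliest of them.
    """
    total = 0.0
    for _ in range(n_samples):
        total += min(rng.expovariate(1.0 / tau_single)
                     for _ in range(n_sites))
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(1)
    tau_single = 4.0  # hypothetical single-site mean search time, in min
    one_site = mean_search_time(1, tau_single, 50000, rng)
    two_sites = mean_search_time(2, tau_single, 50000, rng)
    print("one site :", one_site)
    print("two sites:", two_sites)
    print("ratio    :", one_site / two_sites)
```

Deviations from a ratio of two would be expected if the two sites moved in a correlated fashion or if the single-site search time were far from exponential.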

Colocalization of CadC spots with the DNA-binding site

We also wondered whether the fluorescence spots indicating the position of stable CadC-DNA complexes in single cells would show a similar spatial distribution as the cadBA locus. We therefore analyzed the localization of CadC spots along the long axis of the cell in E. coli wild type. As an estimate for the position of the cadBA locus along the cell, we tracked the position of ori at low pH. To this end, we inserted a parS site close to ori and let ParB-yGFP bind to it, making the ori region visible as a [...] represented by a path on the 3D lattice (the 'cytoplasm') and the CadC dimer moves on the surface of the lattice (the 'membrane'). Starting from a random initial configuration, the CadC dimer diffuses in the membrane with a rate $k_{2D}$, binds non-specifically to DNA segments that are at the membrane and close to CadC with rate $k_{on}$, and unbinds ($k_{off}$) or slides ($k_{1D}$) along DNA segments at the membrane when bound. [...]

This is the expected result for two binding sites moving independently of each other due to the decorrelation of polymer subchains in spatial confinement. How far two binding sites have to be apart along the DNA to behave as independent targets therefore depends on the size of the simulated cell. The initial inverse-$N_b$ scaling of τ flattens progressively as the binding sites come closer to one another and are more correlated in their movement. Placing the binding sites next to each other (orange stars) has, as expected, only a small effect for small $N_b$, since it only increases the size of the binding site. The curve becomes steeper as the binding sites occupy a larger fraction of the polymer. Given that the two DNA-binding sites in N+T-P_cadBA are on opposite sides of the chromosome, we expect them to behave as two independent binding sites. The experimentally measured reduction of the search time by roughly a factor of two is therefore in good agreement with our model. Also, by construction, the position of the binding site along the chromosome has no effect on the simulated search times. [...]

To further validate this observation, we used our numerical simulations to approximate a target search exclusively due to CadC diffusion in the membrane. We used the realistic parameters but placed the DNA-binding site at the membrane and set the DNA diffusion rate to zero, such that only CadC was moving, yielding a mean [...] Using the approximate formula $\tau \approx \Omega / (4 r_t D)$ [44] for the encounter time of a particle with diffusion constant D to locate a target of radius $r_t$ on the surface of a confining spherical domain with volume Ω, we obtained τ ≈ 14 min for a monomer with diffusion constant $D_0$ and τ ≈ 1400 min for a monomer with diffusion constant $D_0 / N_{DNA}$. Finding a value between these two approximations is what we expected, as they do not account for the polymer dynamics. [...] [26].
Given the severe constraint of a membrane-anchored target search, it seems surprising that the search is only about 5-fold slower than the search of the cytosolic Lac repressor for its operator, which takes around one minute at a similarly low protein level [12].

As the position of the DNA-binding site along the chromosome has no influence on the mean search time, the target search process appears to be quite robust. Given that the chromosome is highly organized within the cell, we believe this effect to arise mostly due to the mobility of CadC. While we do not have new evidence against proteolytic processing, this leads us to favor a diffusion and capture mechanism over transertion.

Diffusion and capture mechanisms are well established for the localization of membrane-integrated proteins like SpoIVB in Bacillus subtilis [45,46]. As the search time is independent of the position of the DNA-binding site, transertion of CadC is at least not a requirement for fast response, in agreement with a previous evaluation of the three models [28]. Our finding that the mean search time decreases by a factor of two in a mutant with two DNA-binding sites is also consistent with the diffusion and capture mechanism. Our simulations show that this is the expected result for two independent and equally accessible binding sites, where distant parts of the polymer become uncorrelated in their motion due to the confinement in the cell.

Despite the simplicity of our biophysical model of the target search, we obtained search times that match the experimental measurements surprisingly well, even without consideration of non-specific binding and sliding. Making the target search work therefore does not seem to require a fine-tuned strategy. Although the simulations are [...] with spring constant $k_s$ and an equilibrium bond length $b = l_K$. Following the Bond-Fluctuation model, the polymer is moved by attempting the displacement of a single bead to one of the nearest lattice points, the spring potential being enforced by a Metropolis algorithm [50]. CadC beads are also moved by random displacements but constrained to the cell surface or to the DNA when bound. The kinetic Monte Carlo algorithm simulates the master equation

$$\frac{\mathrm{d}p(x,t)}{\mathrm{d}t} = \sum_{x' \neq x} \left[ w_{xx'}\, p(x',t) - w_{x'x}\, p(x,t) \right]$$

with p(x, t) the probability that the system is in state x at time t and transition rates $w_{x'x} = w(x \to x')$ between the states. The rates governing the dynamics are the polymer diffusion rate $k_{DNA}$, which is set to one unless otherwise stated, the rate for CadC diffusion on the cell surface $k_{2D}$, the non-specific binding rate $k_{on}$, the unbinding rate $k_{off}$ and the one-dimensional sliding rate of CadC along the polymer $k_{1D}$. Unless [...]

To obtain strain E. coli MG1655-parS_ori, the parS site of Yersinia pestis was inserted at the origin of replication (ori) at 84.3 in E. coli MG1655. Briefly, the parS region was inserted between pstS and glmS. To this end, DNA fragments comprising 650 bp of pstS and glmS and the parS region were amplified by PCR using MG1655 genomic DNA and the plasmid pFH3228 as templates, respectively. After purification, these fragments were assembled via Gibson assembly [53] into EcoRV-digested pNPTS138-R6KT plasmid, resulting in the pNPTS138-R6KT-parS_ori plasmid. The resulting plasmid was introduced into E. coli MG1655 by conjugative mating using E. coli WM3064 as a donor on LB medium containing DAP. Single-crossover integration mutants were selected on LB plates containing kanamycin but lacking DAP. Single colonies were then streaked out on LB plates containing 10 % (wt/vol) sucrose but no NaCl to select for plasmid excision. Kanamycin-sensitive colonies were then checked for targeted insertion by colony PCR and sequencing of the respective PCR fragment. To obtain strain E. coli MG1655-ΔcadC parS_ori, the parS site of Y. pestis was inserted at the origin of replication (ori) at 84.3 in E. coli MG1655-ΔcadC as described above.

To obtain strain E. coli MG1655 P_cadBA terminus, the cadBA promoter region was inserted at the terminus (33.7) in E. coli MG1655. Construction of this strain was achieved via double homologous recombination using the pNPTS138-R6KT-P_cadBA terminus plasmid [28] as described above. Candidate colonies were then checked for targeted insertion by colony PCR and sequencing of the respective PCR fragment.

Details of the strains and plasmids used in this study are summarized in Table 3.

[...] and a 605 nm emission filter with a 75 nm bandwidth was used for mCherry fluorescence with an exposure of 500 ms, gain 5, and 100 % intensity. Before shifting the cells to low pH, 2 µl of the cultures in KE medium pH 7.6 were spotted on 1 % (w/v) agarose pads (prepared with KE medium pH 7.6) and imaged as a control.

To analyze the spatiotemporal localization of a chromosomal locus, the parS site was inserted close to the ori. The localization of the parS site was visualized via the binding of ParB-yGFP [58]. E. coli MG1655 parS_ori cells carrying plasmid pFH3228 were cultivated in KE medium pH 7.6 as described above. At an OD$_{600}$ of 0.5, 2 µl of the culture were transferred onto 1 % (w/v) agarose pads (prepared with KE medium pH 7.6 or pH 5.8 + lysine), placed onto microscope slides, and covered with a coverslip. [...]

To analyze the fluorescence microscopy images for CadC or ParB spots within the cells, we used Oufti [59], an open-source software package designed for the analysis of microscopy data, to segment the cells in the phase contrast microscopy images. The resulting cell outlines were used in a custom-written software implemented in Matlab and available on request to detect fluorescent spots. Briefly, a graphical user interface (GUI) was implemented that allows testing the parameters in a test mode before running the actual detection. In detection mode, a function SpotDetection.m is called that iterates through all frames and all cells. For each cell, from pixels in the fluorescence microscopy images the intensity of which is above a threshold defined by the parameters and [...]

In a custom-written Matlab script, the results from the image analysis were used to compute the fraction of cells with spots ν(t) as a function of time t after receptor activation, which upon normalization corresponds to the CDF of the search time distribution. We fit the data to the CDF of a theoretical model using the curve_fit function of the SciPy module in Python, choosing a trust region reflective algorithm, which is an evolution of the Levenberg-Marquardt method that can handle bounds. This algorithm was developed to solve nonlinear least squares problems and combines the gradient descent method and the Gauss-Newton method.
It minimizes the sum of the weighted squares of the errors between the measured data $y_i$ and the curve-fit function $\hat{y}(t; p)$ with an independent variable t and a vector of n parameters p and a set of m data points $(t_i, y_i)$. $\sigma_{y_i}$ is the measurement error for measurement $y(t_i)$ and $W_{ij} = 1$ [...] with Boltzmann's constant $k_B$, absolute temperature T, Kuhn length $l_K$, friction [...]

Table. Fit results. Results from fitting the experimentally computed CDF to the sequential reversible model with mixed initial condition (N-P_cadBA) and fixed initial condition (T-P_cadBA and N+T-P_cadBA). The fit parameters α, β and c were used to compute the mean first passage time and the variance with uncertainties obtained from error propagation using the full covariance matrix.
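Error propagation with the full covariance matrix can be sketched as a first-order (linearized) calculation. The example below assumes that the mean first passage time depends on the fit parameters as ⟨τ⟩ = α + β − cαβ, which is the form that follows from a sequential model with a mixed initial condition and c = x k_2^+; this formula, the function names and all numerical values are assumptions for illustration and are not the fitted values of this study.

```python
import math


def mean_fpt(alpha, beta, c):
    """Mean first passage time for the assumed form alpha + beta - c*alpha*beta."""
    return alpha + beta - c * alpha * beta


def mean_fpt_uncertainty(alpha, beta, c, cov):
    """First-order uncertainty of mean_fpt from a 3x3 covariance matrix.

    cov is ordered as (alpha, beta, c); the variance is g^T cov g, where g
    is the gradient of mean_fpt with respect to the three parameters.
    """
    grad = (1.0 - c * beta, 1.0 - c * alpha, -alpha * beta)
    variance = sum(grad[i] * cov[i][j] * grad[j]
                   for i in range(3) for j in range(3))
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical fit result and covariance matrix (illustration only).
    alpha, beta, c = 5.0, 1.0, 0.1
    cov = [[0.04, 0.0, 0.0],
           [0.0, 0.01, 0.0],
           [0.0, 0.0, 0.0001]]
    print("mean first passage time:", mean_fpt(alpha, beta, c))
    print("uncertainty            :", mean_fpt_uncertainty(alpha, beta, c, cov))
```

In practice the covariance matrix would be the one returned by the least squares fit, including its off-diagonal terms, which can noticeably change the propagated uncertainty when α, β and c are correlated.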