Transcription-Factor-Mediated DNA Looping Probed by High-Resolution, Single-Molecule Imaging in Live E. coli Cells

A high-resolution, single-molecule study directly assesses the prevalence and dynamics of DNA looping in gene regulation in live E. coli cells.


Introduction
Looping between two DNA sites, mediated by transcription factors, is a ubiquitous mechanism in prokaryotic transcription regulation [1]. DNA looping brings two distal DNA sites into close proximity, enhancing interactions between transcription factors bound at separate sites or bringing transcription factors close to RNA polymerase at the promoter. Knowing when and how DNA loops in vivo is important to understand the role of DNA looping in gene regulation and cell decision-making; some studies found molecular details of gene regulation have little influence on gene expression [2][3][4], while others suggested that DNA looping could trigger cell phenotype switching [5] and influence fluctuations in transcription activity [6].
Biochemical, biophysical, and genetic studies have established important roles of DNA looping in transcription regulation. However, transcription-factor-mediated DNA looping on the length scale of a few kilobases in prokaryotic cells has not been directly visualized in vivo, and the in vivo dynamics of DNA looping are difficult to investigate. Chromosome conformation capture (3C) has been used to detect juxtaposition of DNA sites separated by hundreds of kilobases in both eukaryotic and prokaryotic cells [14,15], but high background of interactions at the kilobase scale limits the utility of these methods in studying typical prokaryotic DNA loops [16]. An in vivo imaging method using fluorescent proteins fused to DNA-binding proteins bound to tandem arrays of hundreds of binding sites has been employed to visualize homologous chromosome pairing in yeast induced by double-strand breaks [17]; however, an array of several kilobases of binding sites makes this method unsuitable for studying DNA loops of only a few kilobases. In addition, the long array of tightly bound protein molecules may be detrimental to cells [18].
We developed a two-color, high-resolution imaging method to directly measure the end-to-end separation of two DNA sites 2.3 kb apart in live E. coli cells (Figure 1a). This method is based on the ability to precisely determine the location of a specific DNA site in vivo [19]. By expressing a fluorescent protein in fusion with a DNA-binding protein in a cell with only three tandem binding sites (spanning less than 100 bp), the resulting fluorescent spot is diffraction-limited, and the location of the binding site can be determined with sub-diffraction-limited precision by fitting its fluorescence profile to a two-dimensional Gaussian function [20]. By labeling two ends of a DNA segment with two unique sets of binding sequences and co-expressing corresponding fluorescent DNA-binding fusion proteins of different colors, the distance between the two DNA sites can be determined with a precision of a few tens of nanometers. An in vitro experiment employing the same principle measured intramolecular distances using organic dyes [21], but this approach has not been demonstrated in vivo with comparable resolution using fluorescent proteins.
We used our method to probe the mechanisms and dynamics of DNA looping mediated by the bacteriophage λ repressor CI [22] in live E. coli cells and investigate its regulation of transcription from the CI promoter P_RM. The λ repressor CI is an essential transcription factor in determining the fate of an E. coli cell infected by the bacteriophage λ. When CI is expressed, it represses lytic promoters to commit to an extraordinarily stable lysogenic state that persists for millions of generations [23][24][25]. However, upon induction by UV irradiation or other specific events, CI degradation can trigger an irreversible switch from lysogenic to lytic gene expression within one cell generation time [26].
The robustness of the λ regulatory circuit has been extensively studied. Among many important features of the system such as promoter-operator arrangement [27,28], CI autoregulation [3,29,30], and cooperative binding [31][32][33][34], DNA looping between the homologous rightward and leftward operators O_R and O_L, separated by 2.3 kb, was shown to play significant, fate-determining roles in the λ lifecycle [13,35]. Cooperative binding of CI dimers at the subsites O_R1 and O_R2 of O_R represses the lytic promoter P_R (reviewed in [36]) and simultaneously activates CI's own promoter, P_RM, by accelerating transcription initiation [37][38][39]. At higher CI concentrations, an additional CI dimer binds to O_R3 and represses P_RM [40].
As illustrated in Figure 1a, an octameric CI complex (with or without an additional CI tetramer) can mediate DNA looping by bridging O_R and O_L. These higher-order complexes result from interactions between CI dimers bound to subsites at O_R123 and O_L123, and were first identified in vitro by ultracentrifugation [41] and later visualized by EM [12] and AFM [42]. Looping dynamics were investigated in vitro using tethered particle motion (TPM) [43][44][45][46].
To gain quantitative insight into the relationship between CI-mediated DNA looping and transcription regulation, thermodynamic models and numerical simulations were developed [33,35,44][47][48][49][50][51][52]. Key parameters in these studies were the free energies of octameric and tetrameric CI interactions that mediate DNA looping [35]. These free energies specify the DNA looping probability at a given condition (temperature, CI concentration, etc.) and hence the extent to which distal DNA sites affect each other. To date, DNA-looping probabilities and free energies were either estimated indirectly in in vivo studies by measuring P_RM and P_R activities in various operator mutants with a priori assumptions of DNA looping states [35,49,51] or measured using purified components in vitro, where conditions differ from those in a cellular environment [42][43][44][45][46]. Consequently, these studies yielded varying estimates for the free energies of DNA looping and the degree to which DNA looping influences P_RM activity. Hence, the roles of CI-mediated DNA looping in transcription regulation are still in debate [13,35,49,51,53].
In this study, we tracked the apparent separation between the O_R and O_L sites on a λ DNA segment (termed O_R-O_L DNA below) in real time in live E. coli cells, from which we obtained the first direct estimates of in vivo looping frequencies and kinetics for both wild-type DNA and for DNA carrying mutations in O_R3 and O_L3. We also measured corresponding CI expression levels in these strains by counting the number of CI transcripts in individual cells. Applying these independent, in vivo measurements to a thermodynamic model, we were able to obtain looping free energies and quantify the influence of DNA looping on P_RM expression. Furthermore, we discuss how the compaction of the E. coli chromosome may impact DNA looping kinetics. The methodology established in this work can be extended to a broad range of questions regarding chromosomal DNA conformation and/or gene activities in prokaryotes and higher organisms.

Author Summary
One mechanism cells use to regulate gene expression is DNA looping, whereby two distant DNA sites are brought together by regulatory proteins. The looping then either enhances interactions between other regulatory proteins bound at the separate sites or brings those regulatory proteins close to RNA polymerase at the promoter. Recent work in bacteriophage λ has suggested that DNA looping mediated by a transcription factor called λ repressor CI plays a critical role in regulating the expression of λ genes and consequently in determining the fate of the host E. coli bacterial cells. CI-mediated DNA looping has been directly demonstrated in vitro, but it has only been indirectly inferred in vivo. For the current study we developed a method to visualize CI-mediated DNA looping in individual live E. coli cells. We labeled two DNA sites, one on each side of the proposed loop, with differently colored fluorescent fusion proteins, allowing us to measure their separation with an accuracy of a few tens of nanometers. Using this method, we directly analyzed CI-mediated DNA looping, providing insight into how transcription-factor-mediated DNA looping influences gene regulation in live E. coli cells. Our methodology can be applied to a broad range of questions regarding chromosome conformation in prokaryotes and higher organisms.

High-Resolution Imaging of Two DNA Sites
We inserted the construct shown in Figure 1a into the E. coli chromosome. It contains three tandem tetO sites (tetO_3) [54] and three tandem lacO_sym sites (lacO_3) [55] flanking the wild-type λ lysogen sequence from O_R to O_L (including the P_R, P_RM, and P_L promoters and the cI, rexA (accession number P68924), and rexB (accession number P03759) genes). In this construct, called λWT, CI is expressed from P_RM and regulates its own expression. The lacO-binding and tetO-binding proteins LacI and TetR (accession number P04483) were fused with red and yellow fluorescent proteins to generate LacI-mCherry and TetR-EYFP, and were expressed from an inducible plasmid (Figure 1b).
With the combination of strong induction, weak ribosome binding sites, and carefully controlled growth, we achieved sufficiently low LacI-mCherry and TetR-EYFP expression levels to detect distinct, diffraction-limited mCherry and EYFP spots in single cells. We then fit the fluorescence intensity profile of each individual spot with a two-dimensional Gaussian function to estimate its centroid position. The average localization precisions for individual spots of LacI-mCherry and TetR-EYFP were 17 and 14 nm, respectively (Figure S1a). Subsequently, we transformed EYFP coordinates into mCherry coordinates using fiducial data to calculate the vector between the mCherry and EYFP spots arising from LacI-mCherry and TetR-EYFP protein molecules bound to the same O_R-O_L DNA segment. We called this vector r⃗_lac/tet (Figure 1c). The magnitude of the vector, r_lac/tet, is the two-dimensional projection of the distance between lacO_3 and tetO_3 onto the image plane; on average, it is proportional to the end-to-end distance between lacO_3 and tetO_3 in three dimensions. The total error for an r_lac/tet measurement, including fitting errors in determining the centroid of individual spots (Figure S1a), registration errors in aligning EYFP and mCherry two-color images (~10 nm based upon experiments using fluorescent beads), and contributions from local fluorescent background, was on average ~40 nm (see below). With very low TetR-EYFP and LacI-mCherry expression, it was inevitable that not all lacO_3 and tetO_3 sites were bound by fusion protein molecules. Furthermore, not all fusion protein molecules were fluorescent due to stochastic chromophore maturation. Figure 2a contains typical data showing that a subset of cells was successfully labeled at both sites. We analyzed all cells having distinct fluorescent spots in both emission channels to calculate r⃗_lac/tet. We expected r_lac/tet to decrease when DNA between lacO_3 and tetO_3 looped.
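As a minimal sketch of the registration and distance step described above, the following Python snippet maps an EYFP centroid into mCherry coordinates and computes the projected separation. The affine form of the transform, its coefficients, the centroid values, and all function names are hypothetical placeholders; the text states only that fiducial data were used, not how the mapping was parameterized.

```python
import math

def register_to_mcherry(point, matrix, offset):
    """Map an (x, y) EYFP centroid into mCherry coordinates.

    Assumes an affine map x' = M x + t. This functional form is an
    assumption for illustration, not a detail given in the text.
    """
    x, y = point
    (a, b), (c, d) = matrix
    tx, ty = offset
    return (a * x + b * y + tx, c * x + d * y + ty)

def separation_vector(mcherry_xy, eyfp_xy_registered):
    """Vector from the mCherry centroid to the registered EYFP centroid."""
    return (eyfp_xy_registered[0] - mcherry_xy[0],
            eyfp_xy_registered[1] - mcherry_xy[1])

def magnitude(vec):
    """Length of a 2D vector (the projected separation)."""
    return math.hypot(vec[0], vec[1])

# Hypothetical values in nm; a real transform would be fitted to bead data.
matrix = ((1.0, 0.0), (0.0, 1.0))   # no rotation or scaling in this toy case
offset = (5.0, -3.0)                # constant shift between the two channels
mcherry_centroid = (1000.0, 2000.0)
eyfp_centroid = (1025.0, 2043.0)

r_vec = separation_vector(
    mcherry_centroid, register_to_mcherry(eyfp_centroid, matrix, offset))
r_mag = magnitude(r_vec)
```

In this toy case the registered EYFP centroid lies 30 nm and 40 nm from the mCherry centroid along the two axes, so the projected separation is 50 nm.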

Distinguishing Between Looped and Unlooped States
To determine whether our two-color imaging method was sufficient to distinguish between looped and unlooped DNA in the crowded intracellular environment, we constructed two control strains (Table 1). In the positive control λnull, the centers of the lacO_3 and tetO_3 sites are separated by 66 bp (Figure 1d). The outermost lacO_sym and tetO sites are separated by less than 40 nm (Figure S2a). The close proximity of lacO_3 and tetO_3 mimicked permanently looped DNA. In the negative control λΔO_L, we inserted the λ sequence from O_R up to but not including O_L between lacO_3 and tetO_3 (Figure 1e). The resulting λΔO_L DNA is comparable in length to the wild-type λ DNA, but CI-mediated DNA looping between O_R and O_L is abolished.
We first examined λnull and λΔO_L in two-color fluorescence images to determine whether we could discriminate between looped and unlooped DNA by eye. We obtained at least sixty 20-frame movies (100 ms exposures; 2 s total) for each strain in each of three independent experiments. Typical fluorescence images are shown in Figure 2a and b. Crosstalk between the two emission channels was negligible, as bright mCherry and EYFP spots only appeared in the corresponding channel but not the other. Figure 2c and d show 1 s of typical data for individual λnull and λΔO_L spots. Representative movies for the two strains and others discussed below are included as Movies S1, S2, S3, S4, S5, S6. As expected for a permanently looped configuration, the positive control λnull exhibited overlapping EYFP and mCherry spots (Figure 2c). Generally, λΔO_L molecules did not exhibit spot separation that was easily identifiable by eye (Figure 2d). However, some λΔO_L molecules displayed large displacements between the LacI-mCherry and TetR-EYFP spots that were distinguishable by eye (Figure 2e); such images were not observed for λnull.

Figure 1 caption. Three tandem lacO_sym and tetO sites, termed lacO_3 and tetO_3, were placed immediately next to O_L and O_R, respectively. Red and yellow fluorescent fusion proteins LacI-mCherry and TetR-EYFP bind lacO_3 and tetO_3, respectively. DNA looping mediated by a CI octamer (blue) or an additional CI tetramer (dashed) brings lacO_3 and tetO_3 together. Strains λO_R3⁻ and λO_L3⁻ harbor mutations (described in main text) to O_R3 and O_L3, respectively, that prevent CI dimers from binding these operator sites. (b) LacI-mCherry and TetR-EYFP are expressed co-transcriptionally from separate ribosome binding sites on a plasmid. (c) Illustration of the r⃗_lac/tet measurement. The observed distance between mCherry and EYFP spots indicates the distance between lacO_3 and tetO_3 projected onto the imaging plane. (d) Positive control λnull. The centers of lacO_3 and tetO_3 are separated by only 66 bp (see Figure S2a). (e) Negative control λΔO_L. O_L is deleted to eliminate CI-mediated DNA looping. doi:10.1371/journal.pbio.1001591.g001
Visual inspection of the apparent separation between the LacI-mCherry and TetR-EYFP spots suggested that comparing the end-to-end separation in O_R-O_L DNAs required a more quantitative approach. We calculated r⃗_lac/tet vectors for all movies lasting 0.8 s or longer. We then compiled the corresponding probability density distributions (PDF, P(r_lac/tet), Figure 3a) and cumulative density distributions (CDF, C(r_lac/tet), Figure 3b) of the vector magnitude, r_lac/tet. The long-tailed PDF observed for λnull (Figure 3a) is consistent with the expected end-to-end distance distribution measured from two spots with a fixed separation when the localization of each spot is subject to Gaussian fitting error [56]. A simple numerical simulation of the end-to-end distance PDF for two sites separated by 22 nm and each subject to 22-nm localization error largely recapitulates the long-tailed distribution (Figure S2c).
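The kind of numerical simulation mentioned above can be sketched in a few lines of Python. This is an illustrative Monte Carlo version, not the authors' code: it assumes that the 22-nm localization error applies independently to each image axis of each spot, and the function names, sample size, and histogram binning are arbitrary choices.

```python
import math
import random

def simulate_distances(separation_nm, sigma_nm, n_samples, seed=0):
    """Simulate measured 2D distances between two noisy spot localizations.

    Assumption: sigma_nm is the Gaussian error per axis for each spot.
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(n_samples):
        # Spot 1 sits at the origin and spot 2 at (separation, 0); every
        # measured coordinate receives independent Gaussian noise.
        x1 = rng.gauss(0.0, sigma_nm)
        y1 = rng.gauss(0.0, sigma_nm)
        x2 = rng.gauss(separation_nm, sigma_nm)
        y2 = rng.gauss(0.0, sigma_nm)
        distances.append(math.hypot(x2 - x1, y2 - y1))
    return distances

def histogram_pdf(values, bin_width, n_bins):
    """Normalized histogram (probability density per nm) on [0, bin_width * n_bins)."""
    counts = [0] * n_bins
    for v in values:
        idx = int(v // bin_width)
        if idx < n_bins:
            counts[idx] += 1
    total = len(values)
    return [c / (total * bin_width) for c in counts]

distances = simulate_distances(separation_nm=22.0, sigma_nm=22.0, n_samples=20000)
mean_distance = sum(distances) / len(distances)
pdf = histogram_pdf(distances, bin_width=10.0, n_bins=20)
```

Because a distance can never be negative, noise inflates the apparent separation: under these assumptions the simulated mean is well above the true 22 nm and the histogram has a tail toward large values, which is the qualitative behavior described for λnull.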
We found that the r_lac/tet distribution for λΔO_L was distinctly different from that of λnull (p < 10⁻³); the difference was reproduced in three independent experiments (Figure S1b). The mean separations, ⟨r_lac/tet⟩, were 47 (N = 1,153) and 71 nm (N = 979) for λnull and λΔO_L, respectively (results and measurement errors summarized in Table 2). Peaks in P(r_lac/tet) plots centered at ~40 nm, reflecting our experimental precision in determining r_lac/tet; that is, O_R-O_L molecules with r_lac/tet below 40 nm could not be distinguished from each other. Hence, it was more meaningful to compare distributions of r_lac/tet at large values where r_lac/tet distributions differed most prominently. The cumulative probability of r_lac/tet being 75 nm or more was ~40% for λΔO_L and only ~15% for λnull (Figure 3b). Furthermore, two-dimensional distributions of r⃗_lac/tet vectors (Figure S4) were clearly wider for λΔO_L than for λnull. Thus, by examining r_lac/tet distributions, we could distinguish between the looped and unlooped control strains, suggesting that this approach could be used to probe CI-mediated DNA looping.

Compact Conformation of Unlooped DNA λΔO_L Does Not Depend on Transcription or Nonspecific CI Binding
We measured the mean end-to-end separation ⟨r_lac/tet⟩ for λΔO_L at 71 nm, much shorter than the ~200-nm distance expected for B-form DNA with a typical 50-nm in vitro persistence length [57]. While such a result is expected given the many factors known to compact prokaryotic chromosomes [58], it is possible that nonspecifically bound CI on the λΔO_L DNA and/or P_RM transcription activity could influence the r_lac/tet distribution, as indicated by a series of recent studies in vitro and in higher eukaryotic systems [46,59,60].
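To make the comparison with naked DNA concrete, the expected separation can be estimated from the standard worm-like chain formula for the mean-square end-to-end distance. The sketch below is an illustration under stated assumptions: a rise of 0.34 nm per base pair for B-form DNA, an ideal worm-like chain with no excluded volume, and an isotropic projection onto the image plane (which scales the mean-square distance by 2/3). The text does not state which formula or statistic the authors used.

```python
import math

NM_PER_BP = 0.34  # assumed rise per base pair of B-form DNA

def wlc_mean_square_distance(contour_nm, persistence_nm):
    """Mean-square 3D end-to-end distance of an ideal worm-like chain:
    <R^2> = 2 P L [1 - (P / L)(1 - exp(-L / P))].
    """
    ratio = contour_nm / persistence_nm
    return 2.0 * persistence_nm * contour_nm * (1.0 - (1.0 - math.exp(-ratio)) / ratio)

contour = 2300 * NM_PER_BP                     # 2.3 kb of DNA, in nm
rms_3d = math.sqrt(wlc_mean_square_distance(contour, 50.0))
rms_2d = math.sqrt(2.0 / 3.0) * rms_3d         # isotropic projection onto a plane
```

With these inputs the root-mean-square separation comes out at about 270 nm in three dimensions and about 220 nm after projection, the same order as the ~200-nm figure quoted above and several times the measured 71 nm.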
To examine these possibilities, we first compared the r_lac/tet distribution of the λΔO_L strain to that of a control strain λΔO_L P_RM⁻ cI⁻/cI_trans (Table 1, Figure S5a and b). In this control strain, promoter P_RM was mutated to abolish transcription and the cI start codon was eliminated, but CI binding to O_R was unaffected (Figure S5c, d, and e). In addition, we expressed CI from a plasmid at ~9 times its level in λWT (Table S8). We found that the r_lac/tet distributions of the λΔO_L and λΔO_L P_RM⁻ cI⁻/cI_trans strains were indistinguishable (Figure S5a and b), demonstrating that the compact λΔO_L distribution does not depend on P_RM transcription. Furthermore, r_lac/tet distributions for the same λΔO_L P_RM⁻ cI⁻ strain with or without the CI-expressing plasmid were indistinguishable (Figure S5a and b), suggesting that nonspecifically bound CI did not interact with specifically bound CI at O_R operator sites to condense DNA in vivo [46].

In Vivo Observations of DNA Looping
We next investigated DNA looping in the context of wild-type and mutant O_R-O_L DNAs. In λWT, the wild-type λ sequence from O_R through O_L was inserted between lacO_3 and tetO_3. CI could bind all O_R and O_L sites to mediate looping with both octameric and tetrameric CI complexes (Figure 1a). In λO_R3⁻ and λO_L3⁻, mutations in O_R3 and O_L3 essentially eliminated CI binding to these operators at lysogenic CI concentrations (Table 1) [35,61].
We measured r⃗_lac/tet for these three strains and found that r_lac/tet distributions differed significantly from those of the positive and negative controls λnull and λΔO_L (p < 10⁻³, except p = 0.004 for λWT and λnull), with P(r_lac/tet) and C(r_lac/tet) being intermediate to those of the controls (Figure 3c and d). Mean r_lac/tet values for the three strains also fell in between those of λnull and λΔO_L (Table 2). The wild-type strain had lower ⟨r_lac/tet⟩ than λO_R3⁻ and λO_L3⁻, and its distribution differed from those of the mutant strains with moderate to high significance (p = 0.001 and 0.048 for λO_R3⁻ and λO_L3⁻, respectively); r_lac/tet distributions for λO_R3⁻ and λO_L3⁻ were indistinguishable from each other (p = 0.493). The trend of λnull < λWT < λO_R3⁻ ≈ λO_L3⁻ < λΔO_L for ⟨r_lac/tet⟩ was reproduced in three independent experiments (Figure S1b). Assuming that a DNA molecule in the λWT, λO_R3⁻, and λO_L3⁻ strains is in either a looped or unlooped state, the intermediate ⟨r_lac/tet⟩ values of the three strains suggested that the fraction of looped DNA molecules (herein termed looping frequency) could be estimated by comparing r_lac/tet distributions of these strains to those of the looped and unlooped controls λnull and λΔO_L.
To further investigate whether the observed DNA looping in the λWT, λO_R3⁻, and λO_L3⁻ strains could be abolished by eliminating CI cooperative binding rather than by deleting O_L, we constructed a control strain λCI_G147D (Table 1). This strain differs from λWT by a CI mutation G147D known to be defective in pairwise cooperative interaction [62,63]. Structural evidence suggests that cooperative binding interfaces are shared for pairwise binding to adjacent operator sites and the formation of CI tetramers or octamers via DNA loops [64]. We found that the r_lac/tet distribution of the λCI_G147D strain was indistinguishable from that of λΔO_L (Figure S5f and g, Table S7). We note that this G147D mutant also diminishes P_RM transcription because of its weakened ability to form a CI tetramer at the O_R1 and O_R2 sites; hence its expression level is lower than that with wild-type CI (Table S8). Therefore, we constructed another control strain (λCI_G147D/cI_G147D,trans), in which the CI G147D mutant protein was expressed constitutively at ~11 times the CI expression level in λWT from a plasmid transformed into the λCI_G147D strain (Table S8). We found that the r_lac/tet distribution of this strain was indistinguishable from that of the λΔO_L and the λCI_G147D strains, demonstrating that DNA looping could be abolished by eliminating CI cooperative binding.

Estimating DNA Looping Frequency from C(r_lac/tet)
To quantitatively examine how operator mutations influence DNA looping, we estimated looping frequencies for λWT, λO_R3⁻, and λO_L3⁻ by assuming a simple model. In this model, a DNA molecule can only exist in one of two states, looped or unlooped, with r_lac/tet distributions for each state resembling those of the looped and unlooped controls, λnull and λΔO_L, respectively. Therefore, the distribution P(r_lac/tet) or C(r_lac/tet) for one of the three strains is the linear combination of that of λnull and λΔO_L, with their distributions weighted by the looping frequency, f:

$$C_{\mathrm{strain}}(r_{lac/tet}) = f\,C_{\lambda\mathrm{null}}(r_{lac/tet}) + (1 - f)\,C_{\lambda\Delta O_L}(r_{lac/tet})$$

Using this model, we found that the looping frequency was 79% for λWT, and reduced to 53% for λO_R3⁻ and 60% for λO_L3⁻ (results with errors summarized in Table 2). The results were indistinguishable within error regardless of whether cumulative or probability density distributions were used, or whether data points from all frames or only the first frame of each molecule's movie were used (Table S1). The looping frequencies for λO_R3⁻ and λO_L3⁻ were indistinguishable from each other within error, suggesting a similar role of O_R3 and O_L3 in loop formation. Reduced looping frequencies of λO_R3⁻ and λO_L3⁻ compared to λWT suggest that while a CI octamer at O_R12 and O_L12 is sufficient to loop DNA, the resulting loop can be further stabilized by an additional CI tetramer only if both O_R3 and O_L3 are intact. To our knowledge, these measurements provide the first quantitative in vivo estimates of DNA looping frequencies that are independent of gene regulation models.
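The two-state mixture described above lends itself to a simple least-squares estimate of f from empirical CDFs. The following Python sketch is one possible way to do this, not necessarily the fitting procedure the authors used; the function names, the evaluation grid, and the toy distance lists are all hypothetical.

```python
def empirical_cdf(samples, grid):
    """Fraction of samples at or below each grid value."""
    n = len(samples)
    return [sum(1 for s in samples if s <= r) / n for r in grid]

def looping_frequency(test, looped, unlooped, grid):
    """Least-squares f in C_test = f * C_looped + (1 - f) * C_unlooped.

    Minimizing the squared residual over the grid gives the closed form
    f = sum((Ct - Cu)(Cl - Cu)) / sum((Cl - Cu)^2).
    """
    c_test = empirical_cdf(test, grid)
    c_loop = empirical_cdf(looped, grid)
    c_unl = empirical_cdf(unlooped, grid)
    num = sum((ct - cu) * (cl - cu) for ct, cl, cu in zip(c_test, c_loop, c_unl))
    den = sum((cl - cu) ** 2 for cl, cu in zip(c_loop, c_unl))
    if den == 0.0:
        raise ValueError("control distributions are identical on this grid")
    return num / den

# Toy distances in nm (hypothetical, chosen so the answer is known exactly).
looped_control = [10, 20, 30, 40]
unlooped_control = [60, 70, 80, 90]
mixed_sample = looped_control * 3 + unlooped_control  # 75% looped by construction
grid = list(range(0, 101, 5))

f_estimate = looping_frequency(mixed_sample, looped_control, unlooped_control, grid)
```

For the toy data the estimate recovers the 75% looped fraction built into the mixed sample. With real data one would also want an uncertainty on f, for example by bootstrapping molecules, which this sketch omits.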

Estimating DNA Looping Kinetics
In the above analyses, we only utilized r_lac/tet, the magnitude of the r⃗_lac/tet vector, and discarded information about the direction of r⃗_lac/tet and its evolution in time. Looping frequencies estimated from r_lac/tet distributions are analogous to equilibrium constants and lack kinetic information. While many DNA molecules only exhibited fluorescent spots in both EYFP and mCherry channels for one or two consecutive frames due to photobleaching, some molecules had fluorescent spots lasting for several consecutive frames in both channels (Figure 2c-h; also see r⃗_lac/tet plots from molecules with many frames in Figure S3). By analyzing how r⃗_lac/tet evolves in time, we can obtain additional information about DNA looping kinetics.
We calculated the autocorrelation of r

Single-Molecule Measurements of CI Expression Levels
Next, we measured average CI expression levels, ⟨[CI]⟩, in all strains to understand the extent to which DNA looping influences P_RM regulation in each. We used single-molecule fluorescence in situ hybridization (smFISH, [2,65,66]), in which multiple fluorescently labeled oligonucleotides probe targeted nonoverlapping regions of cI mRNA, to count the number of P_RM transcripts in individual cells (Figure 4b and c). Given the assumption that the average number of CI molecules translated per P_RM transcript is the same in all strains and the observation of indistinguishable cell growth rates (Figure S6a and b), we expected average mRNA expression levels to be proportional to ⟨[CI]⟩. The λnull strain does not contain the cI gene and was used as a negative control. All other strains were transcriptionally active. Under our experimental conditions, the false positive rate using the λnull strain was ~1 transcript per 50 cells, two orders of magnitude below the levels of all other strains; false positives arise when nonspecifically bound probes occasionally co-localize to create a fluorescent spot above the detection threshold. Typical smFISH images of the five strains are shown in Figure 4b. We quantified the number of transcripts in each individual cell by dividing the total intensity of fluorescent spots in each cell by the average intensity of a single-transcript spot (Figure 4c). We then determined ⟨[CI]⟩ in wild-type λ units (WLU) by dividing the average number of transcripts in cells of a given strain by the average number of transcripts in λWT cells. We found that deleting O_L increased ⟨[CI]⟩ to ~1.4 WLU (Table 2), indicating that the DNA loop formed between O_L and O_R in λWT enhances P_RM repression. Mutating either O_R3 or O_L3 further increased ⟨[CI]⟩ to ~2.5 WLU.
These observations are consistent with previous observations that although O_L3 is 2.3 kb away from the P_RM promoter, it has as important a role as O_R3 in repressing P_RM at lysogenic CI concentrations [13]. This suggests that P_RM was not strongly repressed by CI binding to O_R3 in the absence of a tetrameric interaction with an additional dimer at O_L3. Finally, elevated ⟨[CI]⟩ in λO_L3⁻ relative to λΔO_L indicated that DNA looping could also activate P_RM, which was likely mediated by the binding of a CI octamer at O_L12 and O_R12, and was consistent with recent in vivo [49,51] and in vitro [53] experiments.
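The two normalization steps in the smFISH analysis (spot intensity to transcript count, then transcript count to WLU) amount to simple ratios. A minimal Python sketch follows; the function names and all intensity and count values are hypothetical and serve only to show the arithmetic.

```python
def transcripts_per_cell(spot_intensities, single_transcript_intensity):
    """Estimate the transcript count in one cell from its total spot intensity."""
    return sum(spot_intensities) / single_transcript_intensity

def mean_level_in_wlu(strain_counts, wild_type_counts):
    """Mean transcript count of a strain relative to the wild-type mean (WLU)."""
    mean_strain = sum(strain_counts) / len(strain_counts)
    mean_wild_type = sum(wild_type_counts) / len(wild_type_counts)
    return mean_strain / mean_wild_type

# Hypothetical example: one cell with two spots, the brighter one holding
# roughly two transcripts' worth of signal (arbitrary intensity units).
count_in_cell = transcripts_per_cell([200.0, 100.0], single_transcript_intensity=100.0)

# Hypothetical per-cell counts for a mutant strain and for the wild type.
mutant_counts = [3.0, 5.0]
wild_type_counts = [2.0, 2.0]
level_wlu = mean_level_in_wlu(mutant_counts, wild_type_counts)
```

By construction the wild-type strain itself sits at 1 WLU, so values such as the ~1.4 and ~2.5 WLU reported above read directly as fold changes relative to λWT.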

Evaluating Looping Free Energies and Transcription Activation Using a Thermodynamic Model
We have shown that reduced looping frequencies in λO_L3⁻ and λO_R3⁻ compared to that in λWT corresponded to increased expression levels of CI in the two strains, and that unlooped λΔO_L has a higher expression level than the λWT strain. To establish a quantitative framework that explains all observed relationships between looping and CI expression levels, we refined a thermodynamic model, with which we estimated looping free energies and the degree to which DNA looping changes the activity of P_RM. These parameters are important because free energies describe the likelihood of interaction between two distal DNA sites, and changes in promoter activity directly reflect the influence of DNA looping on gene regulation.
The thermodynamic approach was first applied to model repression and activation of P_RM by CI bound to O_R [52] and recently modified to address looping [35,44,49,51]. Our modeling approach is unique in that we used two independent in vivo measurements (looping frequencies and the corresponding CI expression levels) to refine parameters for DNA-looping free energies and transcription activities. In previous modeling work, DNA-looping free energies were either inferred from P_RM and P_R expression-level measurements [35,49,51] or estimated using in vitro data [44].
The thermodynamic model and the fixed physical parameters that we took from previous reports to estimate P_RM expression levels and DNA looping frequencies are essentially identical to those used to analyze in vivo gene expression experiments [35]. Briefly, we assume that DNA states can be enumerated, that steady-state, in vitro DNA-binding measurements are applicable in vivo, and that the mean expression rate, ⟨k⟩, equals the sum of all products k_i P_i, where k_i is the transcription rate in a particular state and P_i is the probability of the state at a given concentration of free CI dimers, [CI_2]_free:

$$\langle k \rangle = \sum_i k_i P_i, \qquad P_i = \frac{d_i\,[\mathrm{CI}_2]_{\mathrm{free}}^{\,n_i}\, e^{-\Delta G_i/RT}}{Z}, \qquad Z = \sum_i d_i\,[\mathrm{CI}_2]_{\mathrm{free}}^{\,n_i}\, e^{-\Delta G_i/RT}$$

Each state is defined by its free energy, ΔG_i, the number of bound CI dimers, n_i, and the degeneracy, d_i, which is the number of states with the same ΔG_i, n_i, and k_i. The model is described in greater detail in the Materials and Methods section; all states considered are listed in Table S2. P_i is normalized by the partition function, Z, so that the sum of all state probabilities is 1. Following earlier work [49] and considering that the CI-mediated loop is relatively long, we assumed looping free energies to be independent of parallel or antiparallel orientation. Note that loop orientation is important in shorter DNA loops such as those mediated by Gal repressor [67]. We approximated the average CI concentration, ⟨[CI]⟩, as the concentration at which the degradation rate equaled the production rate. We refined our model to fit seven experimental observables: CI expression levels for λΔO_L, λWT, λO_R3⁻, and λO_L3⁻, and the looping frequencies for λWT, λO_R3⁻, and λO_L3⁻. We varied four free parameters: the free energies of forming a CI octamer and tetramer in the DNA loop as defined by Dodd et al. [35], ΔG_oct and ΔG_tet, and the P_RM expression rates when O_R12 is bound by CI and DNA is either looped (k_looped) or unlooped (k_unlooped).
ΔG_oct is the free energy of bringing together O_R and O_L when both are bound by two adjacent CI dimers to form a CI octamer, resulting in a looped conformation. ΔG_tet is the free energy of adding a CI tetramer to a loop already secured by a CI octamer. All other free energies and parameters such as specific and nonspecific DNA binding of CI were fixed at the values used by Dodd et al. [35]. The wild-type CI concentration was fixed to 220 nM (~150 molecules/cell) based upon our previous experiment in which CI molecules were counted at the single-molecule level in a similar strain at similar growth conditions [3]. The CI degradation rate was fixed to give a half-life equal to the observed 2-h doubling time in our experiments.
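The state-probability bookkeeping described above can be sketched generically in Python. This is an illustration of a standard statistical-thermodynamic weighting, not the authors' implementation: the Boltzmann form with RT, the temperature of 310.15 K, the 1 M standard state for concentrations, and the two toy states are assumptions introduced here; the actual states are those of Table S2, which is not reproduced.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def state_probabilities(states, free_dimer_molar, temperature_k=310.15):
    """Probability of each state from its free energy, occupancy, and degeneracy.

    Each state is a dict with keys:
      dG : free energy in kcal/mol, n : number of bound CI dimers,
      d  : degeneracy,              k : transcription rate in that state.
    The temperature default (37 C) is an assumption, not taken from the text.
    """
    rt = R_KCAL * temperature_k
    weights = [
        s["d"] * free_dimer_molar ** s["n"] * math.exp(-s["dG"] / rt)
        for s in states
    ]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

def mean_expression_rate(states, free_dimer_molar, temperature_k=310.15):
    """Mean rate <k> as the probability-weighted sum of per-state rates."""
    probs = state_probabilities(states, free_dimer_molar, temperature_k)
    return sum(s["k"] * p for s, p in zip(states, probs))

# Toy two-state system with hypothetical numbers (not the states of Table S2).
toy_states = [
    {"dG": 0.0, "n": 0, "d": 1, "k": 0.0},    # operator empty, promoter off
    {"dG": -10.0, "n": 1, "d": 1, "k": 1.0},  # one dimer bound, promoter on
]
toy_probs = state_probabilities(toy_states, free_dimer_molar=1e-7)
toy_rate = mean_expression_rate(toy_states, free_dimer_molar=1e-7)
```

A full model would enumerate every operator occupancy and loop configuration and then adjust the free parameters until the predicted looping frequencies and expression levels match the measurements; the sketch covers only the forward calculation for a given parameter set.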
The four free parameters were adjusted to best fit our experimental measurements of looping frequencies and CI expression levels. Modeled looping frequencies and CI expression rates at different CI concentrations are shown in Figure 5a and b. The best fit estimated ΔG_oct and ΔG_tet at 0.3 and −3.2 kcal/mol, respectively, and the CI expression rates at 1.9 nM/s and 4.5 nM/s for unlooped (k_unlooped) and looped (k_looped) DNA when CI binds O_R12. These results suggest that the DNA looping mediated by only a CI octamer is not strongly favored, while looping mediated by both an octamer and tetramer is the dominant configuration if all six binding sites are bound by CI dimers. Note that a small, positive ΔG_oct is consistent with measured looping frequencies greater than 50% for λO_L3⁻ and λO_R3⁻, as one unlooped configuration could lead to multiple looped configurations (Table S2). The higher CI expression rate from the looped configuration suggests that, in the absence of O_R3 binding, bringing the distal O_L and O_R sites together to form a DNA loop activates P_RM to 2.4 times the unlooped level.
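To give a feel for what these fitted values imply, the short Python sketch below converts the looping free energies into Boltzmann factors and computes the activation ratio from the two expression rates. It assumes a temperature of 310.15 K, which the text does not state, and it ignores state degeneracies, so the factors are per-configuration weights relative to the corresponding unlooped state rather than looping frequencies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def boltzmann_factor(delta_g_kcal, temperature_k=310.15):
    """Statistical weight exp(-dG / RT) relative to a reference state."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))

# Best-fit values quoted in the text.
dg_oct, dg_tet = 0.3, -3.2        # kcal/mol
k_unlooped, k_looped = 1.9, 4.5   # nM/s

weight_octamer_only = boltzmann_factor(dg_oct)                  # below 1: mildly unfavorable
weight_octamer_plus_tetramer = boltzmann_factor(dg_oct + dg_tet)  # far above 1: strongly favorable
activation_ratio = k_looped / k_unlooped                        # about 2.4
```

An octamer-only loop thus carries a per-configuration weight somewhat below that of the unlooped state, whereas adding the tetramer raises the weight by roughly two orders of magnitude under the assumed temperature.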
To test how sensitive the fitting results were to two fixed parameters that are poorly defined in previous work, we varied CI expression levels and nonspecific DNA binding affinity. We found that across the examined ranges, octameric looping energies, ΔG oct , were consistently near 0 and tetrameric looping energies, ΔG tet , were strongly favorable, between −2.8 and −4.6 kcal/mol (Table S3). Similarly, CI expression rates k unlooped and k looped remained close to the original fit values, giving activation ratios between 1.7 and 2.5 (Table S3). We also verified that our fit parameters were unique: the values of the fit parameters corresponded to a well-defined minimum in the sum of squared residuals in the four-dimensional (two free energies and two expression rates) parameter space (Figure 5c and d).
Hence we conclude that the four fit parameters resulting from the model were robust and well defined.
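To make the fitted free energies more concrete, the following minimal sketch converts them into Boltzmann weights at the growth temperature and recomputes the activation ratio from the two fitted expression rates. It is an illustration only and not the full model, which sums over many CI-binding states with their degeneracies; the function name, the gas-constant value, and the choice of 298.15 K for room temperature are assumptions made here.

```python
import math

R_KCAL = 1.987e-3    # gas constant in kcal/(mol*K) (assumed value)
T_KELVIN = 298.15    # ~25 degrees C, the stated growth temperature (assumed exact value)

def boltzmann_weight(delta_g_kcal):
    """Relative statistical weight exp(-dG/RT) of a state versus its reference state."""
    return math.exp(-delta_g_kcal / (R_KCAL * T_KELVIN))

dg_oct = 0.3     # kcal/mol, fitted octamer looping free energy
dg_tet = -3.2    # kcal/mol, fitted tetramer free energy

# Weight of a loop held by the octamer alone, relative to the unlooped state.
w_oct = boltzmann_weight(dg_oct)               # roughly 0.6
# Weight of a loop held by octamer plus tetramer, relative to the unlooped state.
w_oct_tet = boltzmann_weight(dg_oct + dg_tet)  # on the order of 100

# Activation ratio from the fitted looped and unlooped expression rates.
activation_ratio = 4.5 / 1.9

print(f"octamer-only weight: {w_oct:.2f}")
print(f"octamer + tetramer weight: {w_oct_tet:.0f}")
print(f"activation ratio: {activation_ratio:.1f}")
```

Under these assumptions a loop secured by the octamer alone has a weight below 1 relative to its unlooped reference, while the additional tetramer raises the weight by more than two orders of magnitude, which is the qualitative picture described above.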

Discussion
In this work, we directly measure the end-to-end separation between two DNA sites separated by only 2.3 kb on the E. coli chromosome with high spatial resolution, and report the first estimates of CI-mediated DNA looping frequencies in live E. coli cells. We improved a thermodynamic model to estimate the free energies of DNA looping as well as the degree to which DNA looping enhances transcription regulation. Combining independent, single-molecule measurements of looping frequencies and CI expression levels increased confidence in this model. Our results provide insight into transcription-factor-mediated DNA looping in vivo, and the new method reported here also has the potential to address questions beyond DNA looping, including understanding of chromosome structure and dynamics in vivo. In the following, we compare our results with previous work, and discuss unique information provided by our new method.

Differences with in Vitro Looping Measurements
Our estimated looping frequencies of 79% for λWT and greater than 50% for λO R 3⁻ and λO L 3⁻ are larger than those observed in vitro by TPM and AFM, where looping frequencies at lysogenic CI concentrations were approximately 60% with wild-type operators and 10%-40% in the absence of O R 3 and O L 3 [42,44,46]. As looping frequency is directly linked to looping free energy, comparison of ΔG values showed the same trend: ΔG tet values estimated in these in vitro experiments were similar to our estimate of −3.2 kcal/mol, while in vitro ΔG oct values were 1-2 kcal/mol higher than ours [44,46].
Significantly different ΔG oct values likely resulted from differences between naked DNA in an in vitro environment and the compact, protein-decorated E. coli chromosome in the crowded cellular environment. Factors such as supercoiling and nonspecific, "histone-like" DNA-binding proteins could compact DNA and lead to more frequent encounters between O R and O L . Our observation that the unlooped λΔO L DNA was extremely compact (discussed in more detail below) was consistent with this view; this level of compaction (comparable to a polymer with a 3-nm rather than a 50-nm persistence length) could lead to a 50-fold increase in the rate at which O R and O L encounter each other [68]. The relatively unchanged ΔG tet values could reflect the fact that the entropic and energetic costs of bringing O R and O L together are included in ΔG oct . Our looping frequency estimates confirm what was predicted by in vivo gene expression experiments: DNA was estimated to loop ~72% of the time for wild-type O R -O L DNA and ~69% for DNAs similar to our λO R 3⁻ and λO L 3⁻ constructs [35]. Correspondingly, the ΔG oct and ΔG tet estimated in the in vivo work (−0.5 and −3.0 kcal/mol) [35] compared well to ours (0.3 and −3.2 kcal/mol).
One important assumption we employed in calculating looping frequencies is that looped and unlooped λWT, λO R 3⁻, and λO L 3⁻ DNA molecules had r lac/tet distributions similar to those of the looped control λnull and the unlooped control λΔO L , respectively. It is possible that the unlooped states in the λWT, λO R 3⁻, and λO L 3⁻ strains were more compact than that in λΔO L if, after a DNA loop breaks, O R -O L DNA does not always completely relax before the loop reforms. In such a case, looping frequencies estimated using the linear-combination model would be upper limits on the true looping frequencies. Nevertheless, as we show above, our looping frequency estimates broadly agree with expectations from previous studies. Since this simple model only requires one free parameter and gives reasonable results, it is unnecessary to invoke more complicated models.
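The one-parameter linear-combination model referred to above can be sketched as follows: a strain's distribution is written as f times the looped-control distribution plus (1 − f) times the unlooped-control distribution, and f is chosen by least squares. This is a minimal sketch with hypothetical names and made-up example histograms; clamping f to the interval [0, 1] is an assumption added here, not something stated in the text.

```python
def fit_looping_fraction(p_sample, p_looped, p_unlooped):
    """Least-squares f in p_sample ~ f * p_looped + (1 - f) * p_unlooped.

    All three inputs are probability densities evaluated on the same bins.
    Minimizing sum((s - u - f * (l - u))**2) over f gives the closed form below.
    """
    if not (len(p_sample) == len(p_looped) == len(p_unlooped)):
        raise ValueError("distributions must share the same bins")
    num = sum((s - u) * (l - u) for s, l, u in zip(p_sample, p_looped, p_unlooped))
    den = sum((l - u) ** 2 for l, u in zip(p_looped, p_unlooped))
    if den == 0:
        raise ValueError("control distributions are identical")
    # Clamp to the physically meaningful range (an assumption of this sketch).
    return min(1.0, max(0.0, num / den))

# Hypothetical binned densities, for illustration only.
looped_control = [0.50, 0.30, 0.15, 0.05, 0.00]
unlooped_control = [0.10, 0.20, 0.30, 0.25, 0.15]
mixture = [0.79 * l + 0.21 * u for l, u in zip(looped_control, unlooped_control)]

f_hat = fit_looping_fraction(mixture, looped_control, unlooped_control)
print(f"estimated looping frequency: {f_hat:.2f}")
```

Because the mixture here is constructed exactly from the two controls, the fit recovers the mixing fraction; with real data, the residual between the fitted mixture and the measured distribution indicates how well the two-state assumption holds.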

Effects of DNA Looping on Transcription Regulation
By comparing looping frequencies and corresponding CI expression levels in λWT, λΔO L , λO R 3⁻, and λO L 3⁻, we showed that loop stabilization by the CI tetramer between O R 3 and O L 3 is important for efficient P RM repression, and that looping mediated by a CI octamer at O R 1 and O R 2 is important for P RM activation. We note that while it is possible that the presence of tetO 3 and lacO 3 binding sites flanking O R -O L DNA may influence CI binding and/or transcription, this influence is likely small. This is because CI expression levels in these strains measured using smFISH are comparable to that of a wild-type λ lysogen (Table S8), and our results are consistent with previous observations [13,49,51,53]. Furthermore, results are directly comparable as all strains used in this study are identical with respect to the presence and positioning of these binding sites.
Combining these results in our thermodynamic model, we estimated that CI-mediated DNA looping activates P RM to 2.4 times its level when the DNA does not loop. This compares well to earlier estimates of 2-4 fold [49], and 1.6-fold for a high-expression P RM mutant [53]. Another study did not find that looping activates transcription, modeling CI-concentration-dependent P R and P RM activities without invoking activation via looping (by assuming k looped = k unlooped ) [35]. A later study indicated that this discrepancy may have resulted from different constructs used in the earlier study [49].
The molecular basis for DNA loop-enhanced P RM activation is unclear. One possibility is that a CI dimer bound to O R 2 interacts with RNA polymerase to a greater extent if it is part of a higher-order CI octamer [53]. Alternatively, recent work showed that a DNA UP element proximal to O L [49,69] enhances CI expression from P RM in looped DNA by contacting the α-C-terminal domain of RNA polymerase [51]. The activation mechanism could be clarified in future experiments measuring both looping frequency and P RM activity while varying operator and UP element sequences and introducing CI mutations affecting operator binding, oligomerization, and RNA polymerase interaction.

Kinetics of DNA Looping
We estimated the time scale a DNA molecule stays in a particular state by calculating the autocorrelation function of the r lac/tet vector (Figure 4a). The r lac/tet vector was strongly correlated for at least 0.5 s, suggesting that a particular DNA conformational state, either compact or extended, persisted for at least 0.5 s. This implies an upper limit of 2 s⁻¹ for the rate of loop formation from the extended state. This upper bound on the transition rate is in the range of what was observed in a previous TPM experiment, in which looped and unlooped states lasted for tens of seconds [44], and argues against a significantly faster rate used in a recent computer simulation (~60 s⁻¹) [50]. We note that although it is possible that transient CI unbinding does not necessarily lead to immediate and complete DNA conformational relaxation at our measurement time scale, the autocorrelation analysis puts an upper limit on the true transition rate between the looped and unlooped states. The same concern also applies to in vivo 3C and in vitro TPM experiments.
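As an illustration of this kind of analysis, the sketch below computes a normalized autocorrelation of a time series of two-dimensional separation vectors and converts a persistence time into a rate bound. The specific estimator (dot products without mean subtraction), the function name, and the example tracks are assumptions made for this sketch; the text does not specify the exact form used.

```python
def vector_autocorrelation(track, lag):
    """Normalized <r(t) . r(t + lag)> / <r(t) . r(t)> for a list of (x, y) vectors."""
    n = len(track) - lag
    if lag < 0 or n <= 0:
        raise ValueError("lag must be non-negative and shorter than the track")
    dot_lag = sum(
        track[i][0] * track[i + lag][0] + track[i][1] * track[i + lag][1]
        for i in range(n)
    ) / n
    dot_zero = sum(x * x + y * y for x, y in track) / len(track)
    return dot_lag / dot_zero

# Hypothetical tracks (nm): one that never changes, one that flips every frame.
steady = [(30.0, 40.0)] * 10
flipping = [(1.0, 0.0), (-1.0, 0.0)] * 5

c_steady = vector_autocorrelation(steady, lag=3)
c_flipping = vector_autocorrelation(flipping, lag=1)

# A state that persists for at least t seconds bounds the transition rate by 1/t.
persistence_s = 0.5
rate_upper_bound = 1.0 / persistence_s   # 2 per second

print(c_steady, c_flipping, rate_upper_bound)
```

A correlation that stays near 1 over a given lag indicates that the separation vector has not reoriented or changed length appreciably over that interval, which is the basis for the 0.5-s persistence estimate.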
Slow transitions between looped and unlooped states imply that low or high expression states resulting from a particular DNA conformation could be long-lived, potentially committing a cell to a particular fate. Supporting this is a recent study that suggested that a single unlooping event could trigger induction of the lac operon [5]. We were unable to obtain time trajectories long enough to clearly identify looped/unlooped transitions for single DNA molecules. Development of brighter, faster maturing, and more photostable fluorescent proteins or in vivo labeling with synthetic fluorophores [70,71] will help in increasing the number of measurements made on one DNA molecule, possibly enabling accurate measurement of DNA looping kinetics in vivo.

The Short End-to-End Separation of λΔO L Reflects the High Compactness of Chromosomal DNA
We observed very small end-to-end separation for the unlooped control (⟨r lac/tet⟩ = 71 nm). This distance was shorter than expected from modeling the unlooped DNA as a noninteracting worm-like chain with an in vitro persistence length of 50 nm [72], but consistent with the recently observed extreme bendability of short DNA molecules [73]. A noninteracting chain with an equivalent ⟨r lac/tet⟩ to that of λΔO L would have a persistence length of only 3 nm, which is physically infeasible. Our measurements of indistinguishable conformational distributions in the absence of P RM transcription and the presence of CI overexpression suggest that neither transcription nor nonspecifically bound CI played a major DNA-compacting role in our experiments. Furthermore, C. crescentus chromosomal DNA segments of ~5 kb were found to be similarly compact and consistent with Brownian dynamics simulations of supercoiled DNA [74].
We attribute the small end-to-end separation observed for λΔO L to the high compaction of the E. coli chromosome in the crowded cellular environment. While the exact molecular mechanisms responsible for compaction remain unclear, previous studies found that in vitro binding of the histone-like HU proteins [75] (accession numbers P0ACF0, P0ACF4) and in vivo mammalian chromatin packing [76] reduced the apparent persistence length of DNA. Hence, it is possible that nucleoid-associated proteins such as HU may bring distal DNA sites together by protein-protein interactions and/or affect local DNA conformations by introducing bends and relieving torsional strain [77]. Another important factor could be negative supercoiling, which has been shown to compact the chromosomal DNA globally [78]. However, the exact effect of negative supercoiling on a 2.3-kb DNA segment is difficult to predict, because negative supercoiling could also introduce extended, plectonemic structures that promote large separations between DNA sites on relatively short length scales [78].

Potential Applications
Our two-color, high-resolution method can be applied to examine how chromosomal location, DNA length, genetic background, and growth conditions affect the distance between any two DNA sites on the E. coli chromosome. Furthermore, the spatial organization of the E. coli chromosome can be determined by systematically measuring r lac/tet distributions between DNA sites throughout the chromosome. This method is similar to how chromosome conformation capture was used to generate a 3D model of the C. crescentus chromosome [79], but with significantly improved spatial resolution and without potential artifacts from fixation.

Materials and Methods

Strain and Plasmid Construction
A plasmid, pS2391, containing lacO 3 and tetO 3 (the tetO 2 sequence [54] was used for each repeat in tetO 3 ) sites was synthesized by Genewiz, Inc. Segments of λ DNA (O R through O L for λWT, O R up to but not including O L for λΔO L ) from the wild-type lysogen JL5392 (a gift from John Little, University of Arizona) were amplified by PCR. This DNA was sequenced and inserted between lacO 3 and tetO 3 using the In-Fusion PCR cloning system (Clontech). A kanamycin-resistance cassette flanked by BamHI sites was amplified by PCR and inserted after lacO 3 . For strains with mutated operators, mutations r1 [80], O L 3-4 [13], and cI G147D [62] were introduced to the λWT template via QuikChange (Agilent). A plasmid carrying the P RM⁻ cI⁻ mutations (Figure S5c) (λΔO L P RM⁻ cI⁻) was constructed by overlapping PCR mutagenesis using complementary primers carrying the desired mutations, flanked by a forward primer that sits at the EcoRI site on the upstream end of the operon and a reverse primer at the ClaI site in the rexA gene downstream of cI. The 1.13-kb PCR product was introduced to the λΔO L plasmid by restriction ligation.
This procedure resulted in seven plasmids that were used as templates in subsequent chromosome insertion: pZH105 (λnull), pZH016 (λΔO L ), pZH107 (λWT), pZH107r1 (λO R 3⁻), pZH107O L 3-4 (λO L 3⁻), pACL006 (λWT G147D ), and pACL007 (λΔO L P RM⁻ cI⁻). Note that we use shorthand names such as λnull here for clarity; corresponding names used in our laboratory are listed in Table S4. The DNA sequence including lacO 3 , the λ DNA segment, tetO 3 , and the kanamycin resistance cassette was inserted into the chromosome of E. coli strain MG1655 by λ Red recombination [81], excising the lac operon, lacI, and all lacO sites.
To express the CI protein in trans from a plasmid, we constructed the plasmid pACL18 in which the wild-type cI ORF is driven by a constitutive promoter, P RM c , which has the wild-type −35 (TAGATA) and −10 (TAGATT) sequences, lacks O R 2, and has a mutated O R 1 sequence (CGCCTCGTGAGACCA) that eliminates binding by CI. The P RM c -cI fragment was then cloned into the ClaI site of the low-copy vector pACYC184. The plasmid pACL17 was generated similarly using a template containing the CI G147D mutation.
The two-color reporter plasmid pLau53, which expresses LacI-ECFP and TetR-EYFP polycistronically under the control of the P BAD promoter [82], was obtained from the Yale Coli Genetic Stock Center. Because the autofluorescence spectrum of live cells is generally strongest at wavelengths around 500 nm [83], single-molecule imaging of blue-shifted fluorescent proteins such as ECFP is difficult. The red fluorescent protein mCherry, which further benefits from a large Stokes shift, fast chromophore maturation rate, and high brightness relative to other monomeric RFPs [84], was inserted in place of ECFP. We also created a tandem LacI-mCherry-EYFP reporter, which was used as a fiducial marker, by inserting the linker sequence from the tandem-dimer fluorescent protein tdTomato [84] in between mCherry and EYFP.
To accurately localize a fluorescent spot arising from only a few fluorescent protein molecules above the background of unbound molecules within a cell, we reduced the reporter expression level by weakening the ribosome binding sites (RBSs). Weakened RBS sequences were designed using an online RBS calculator [85]. For example, the RBS for TetR-EYFP translation was the consensus AGGAGG Shine-Dalgarno sequence in the parent plasmid pLau53. Our reporter plasmid had an ACCAGG Shine-Dalgarno sequence, with a predicted ~300-fold decrease in the TetR-EYFP translation rate. All sequences including chromosome insertions were verified by sequencing (Genewiz Inc). Reporter plasmids are described in Table 1.

Growth Condition
For all experiments reported in this study, cells were grown and imaged at room temperature (~25 °C) in M9 minimal media supplemented with MEM amino acids (Sigma). Cells were grown overnight with 0.4% glucose and 50 µg/ml carbenicillin to an optical density (OD 600 ) of 0.4. After centrifugation at room temperature, cells were resuspended at OD 600 ≈ 0.2 with 0.4% glycerol plus 0.2% L-arabinose and grown for 2 h (~1 cell cycle) to induce LacI-mCherry and TetR-EYFP expression. Cells were again resuspended at OD 600 ≈ 0.2 with 0.4% glucose and grown for another 2 h before immediate observation to allow time for fluorescent protein chromophores to mature.
We compared growth rates for the parent strain MG1655 to the experimental strain λnull to determine whether inserting the lacO 3 and tetO 3 construct into the chromosome and/or inducing expression from the reporter plasmid introduced a significant growth defect. Under induction growth conditions (~27 °C, M9 media with 0.4% glycerol and 1× MEM amino acids) starting at OD 600 ≈ 0.1 and observing 8 h of growth, we measured doubling times of 2.7 h for MG1655 and 3.4 and 3.3 h for λnull harboring the reporter plasmid (in the absence and presence of 0.3% L-arabinose, respectively), indicating that there is no large growth defect associated with the insertion of the tandem operator sites into the chromosome and/or the expression of TetR-EYFP and LacI-mCherry fluorescent fusion proteins (Figure S6c).
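Doubling times such as those quoted above can be obtained from a series of OD 600 readings by fitting an exponential growth curve. The sketch below does this with a log-linear least-squares fit, which is one common approach (the Supporting Information mentions an exponential fit in Microsoft Excel); the function name and the synthetic readings are hypothetical.

```python
import math

def doubling_time(hours, od_values):
    """Fit ln(OD) = a + b * t by ordinary least squares and return ln(2) / b (hours)."""
    if len(hours) != len(od_values) or len(hours) < 2:
        raise ValueError("need at least two paired time and OD readings")
    logs = [math.log(od) for od in od_values]
    n = len(hours)
    mean_t = sum(hours) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, logs))
    slope = sxy / sxx          # exponential growth rate per hour
    return math.log(2) / slope

# Synthetic readings for a culture that doubles every 2.7 h, starting at OD 0.1.
times_h = [0, 2, 4, 6, 8]
ods = [0.1 * 2 ** (t / 2.7) for t in times_h]

td = doubling_time(times_h, ods)
print(f"doubling time: {td:.2f} h")
```

With noisy real readings the fit should be restricted to the exponential phase, since lag or saturation at either end of the time course biases the slope.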

Imaging Conditions
In each experiment, samples of all strains were placed on separate gel pads in the same growth chamber. Two sets of at least 30 movies were acquired for each strain, with the second set acquired in the reverse order to minimize any bias possibly introduced by observing some strains in a particular order. All images were acquired within less than one cell doubling time.
Cells were put on a gel pad made of 3% low-melting-temperature SeaPlaque agarose (Lonza) in M9 with glucose and imaged on an Olympus IX-81 inverted microscope with a 100× oil immersion objective (Olympus, PlanApo 100× NA 1.45) and additional 1.6× magnification. Images were split into red and yellow channels using an Optosplit II adaptor (Andor) and captured with an Ixon DU-895 (Andor) EM-CCD with a 13-µm pixel width using MetaMorph software (Molecular Devices). Laser illumination was provided at 514 nm by an argon ion laser (Coherent I-308), which also pumped a rhodamine dye laser (Coherent 599) tuned to ~570 nm. A quarter-wave plate (Thorlabs) was used to circularly polarize excitation light. Emitted light was split by a long-pass filter, and the red and yellow images were filtered using HQ630/60 and ET540/30 bandpass filters (Chroma).

Measuring and Analyzing r lac/tet
Images were inspected manually using a custom MATLAB script to identify spots that appeared in both EYFP and mCherry images. Images from all strains were displayed in random order without knowing the strain identity to avoid bias in spot selection. Pixel intensities within 3 pixels of the initial spot location were fitted with a symmetric, two-dimensional Gaussian distribution to estimate spot coordinates. The variance of the fit distribution was constrained to be less than 2 pixels. Spot-fitting error was estimated by scrambling residuals from a fit to the fluorescence data in 10 random permutations, adding them to the data, and fitting the resulting images; the reported error for a spot is the standard deviation of the distances between these fits and the initial fit to the raw data. Fitting error distributions are shown in Figure S1a.
The LacI-mCherry-EYFP tandem dimer (Figure S2b), in which the two fluorescent proteins were directly fused together, was used to acquire fiducial control points to transform between the mCherry and EYFP coordinate systems. A projective transform was calculated from the control points using the cp2tform function in MATLAB. We found that relatively simple, global transformations were sufficient to transform coordinates of fluorescent beads (Tetraspeck, Invitrogen) with ~10-nm registration error in our microscope setup, and did not see any further improvement with a locally weighted transformation used in in vitro two-color experiments [21]. This transformation was also used to generate the overlay images in Figure 2, Figure 3, and all supplemental movies. Fluorescent beads were not used as fiducial markers because the beads' emission spectra were different from those of the fluorescent proteins. Analysis was restricted to molecules in which mCherry and transformed EYFP coordinates were separated by less than 200 nm. Separations beyond this threshold were rare (~1% of data, see two-dimensional distributions in Figure S4) and did not correlate with strain identity in any reasonable way. They possibly arose from data in which cells contained two labeled copies of O R -O L DNA.
After transformation into a uniform coordinate system, r lac/tet was calculated from the mCherry and EYFP coordinates and multiplied by an 81-nm pixel size (resulting from 160× magnification on a CCD with a 13-µm pixel width). Probability and cumulative distributions P(r lac/tet) and C(r lac/tet) were calculated for 10-nm bins using the kernel smoothing probability density estimation (ksdensity) function in MATLAB, restricting the density to positive values and employing a uniform kernel width small enough to follow empirical cumulative density distributions without any systematic errors. Significant differences between r lac/tet distributions were determined using a two-sample Kolmogorov-Smirnov test; two-tailed Student's t tests of sample means returned smaller, more significant p values. Errors in ⟨r lac/tet⟩ and ⟨[CI]⟩ were determined by calculating the means of 1,000 bootstrapped samples; the reported error is the standard deviation of the calculated means. Looping frequencies were estimated by least squares fitting of 1,000 bootstrapped distributions (control distributions were also randomized on each iteration) and their error was calculated similarly.
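The pixel-size arithmetic and the bootstrap error estimate described above can be sketched as follows. This is a minimal illustration written in Python rather than the MATLAB used in the study; the function name, the fixed random seed, and the example data are assumptions.

```python
import random
import statistics

# Effective pixel size: 13-um camera pixels imaged at 160x total magnification.
pixel_nm = 13_000 / 160   # 81.25 nm, quoted as 81 nm in the text

def bootstrap_error_of_mean(values, n_boot=1000, seed=0):
    """Standard deviation of the means of n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)

# Hypothetical separations in nm, for illustration only.
separations_nm = [float(v) for v in range(1, 101)]
error_nm = bootstrap_error_of_mean(separations_nm)

print(f"pixel size: {pixel_nm:.2f} nm")
print(f"mean: {statistics.fmean(separations_nm):.1f} nm, bootstrap error: {error_nm:.2f} nm")
```

For well-behaved data the bootstrap error should be close to the familiar standard deviation divided by the square root of the sample size; the bootstrap becomes more useful for derived quantities, such as fitted looping frequencies, where no simple formula applies.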

Single-Molecule Fluorescence in Situ Hybridization (smFISH)
Concentration measurements by smFISH followed a previously described protocol [66]. Transcripts from P RM were labeled with a mixture of 42 oligonucleotides labeled with CAL Fluor Red 610 (Biosearch Technologies), 31 of which hybridized to cI (11 targeted sequences not found in E. coli and did not cause a problematic level of false positives). Table S5 lists all 42 oligonucleotides. Labeled cells were imaged with 561-nm excitation at six imaging planes separated by 200 nm z-depth with negligible photobleaching. For each frame, fluorescent spots were automatically detected and fit to a Gaussian using a custom MATLAB routine. Nearly all molecules appeared in multiple image slices; the slice with the largest fit amplitude was kept. The integrated fluorescence of spots was observed to be quantized with one or a few molecules localized within one diffraction-limited spot. The intensity of one transcript was estimated from the distribution of spot intensities, and the number of molecules contributing to each spot was estimated from this quantization. The number of transcripts in each cell was estimated from the sum of the number of molecules in each spot within that cell. Alternatively, the number of molecules in one cell is proportional to its integrated fluorescence; this measurement provided the same average expression levels within error. The experiment was repeated to ensure that differences in labeling efficiency between samples were not responsible for differences in the number of detected molecules; combined data from both experiments were used for analysis.

Simulation of P(r lac/tet)
To generate simulated r lac/tet distributions, we first generated 10,000 random radial distances for a chain with a contour length L and persistence length P from a worm-like, noninteracting chain model using a Gaussian distribution with Daniels' approximation, which is accurate in the regime L/P = κ ≫ 1. Each simulated r was projected onto the plane at a random angle to give a distance r′. Simulated spots were placed at (x, y) coordinates (0, 0) and (r′, 0). The MATLAB function mvnrnd was then used to add normally distributed measurement error with a standard deviation of 22 nm to the coordinates of each simulated spot. This procedure was sufficient to simulate the λnull distribution (Figure S2c) using a fixed end-to-end distance of 22 nm (approximate distance between the centers of the lacO 3 and tetO 3 sites; Figure S2a). Note here that the simulation is simplified in that it assumes that each spot has the same 22-nm localization error. In reality, localization error varies between different spots (Figure S1a) and there are other sources of measurement error. These differences may explain the slight deviation of the simulated distribution from the experimental distribution. The same procedure was used to estimate the ⟨r lac/tet⟩ expected for 2.3-kb, B-form DNA with a 50-nm persistence length (~200 nm) as well as the apparent persistence length (3 nm) implied by the 71-nm ⟨r lac/tet⟩ observed for λΔO L .
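A simplified version of this simulation can be sketched as below. Note the assumptions: the sketch uses the plain Gaussian-chain limit (mean-square end-to-end distance 2PL) instead of Daniels' corrected distribution, draws the two in-plane components of the end-to-end vector directly (equivalent to projecting an isotropic Gaussian vector), takes 0.34 nm per base pair for B-form DNA, and uses hypothetical function names. It is therefore not expected to reproduce the published numbers exactly.

```python
import math
import random

BP_NM = 0.34   # assumed rise per base pair of B-form DNA, in nm

def simulate_mean_separation(contour_nm, persistence_nm,
                             loc_error_nm=22.0, n=10_000, seed=1):
    """Mean apparent 2D separation of chain ends, including localization error."""
    rng = random.Random(seed)
    # Gaussian-chain limit: <R^2> = 2PL, so each Cartesian component of the
    # end-to-end vector is normal with variance 2PL/3.
    sigma = math.sqrt(2.0 * persistence_nm * contour_nm / 3.0)
    total = 0.0
    for _ in range(n):
        # In-plane components of the true end-to-end vector.
        dx = rng.gauss(0.0, sigma)
        dy = rng.gauss(0.0, sigma)
        # Independent localization error on each of the two spots.
        x1, y1 = rng.gauss(0.0, loc_error_nm), rng.gauss(0.0, loc_error_nm)
        x2, y2 = dx + rng.gauss(0.0, loc_error_nm), dy + rng.gauss(0.0, loc_error_nm)
        total += math.hypot(x2 - x1, y2 - y1)
    return total / n

contour = 2300 * BP_NM   # 2.3 kb of B-form DNA, about 780 nm
mean_50 = simulate_mean_separation(contour, persistence_nm=50.0)
mean_3 = simulate_mean_separation(contour, persistence_nm=3.0)

print(f"P = 50 nm: mean separation ~{mean_50:.0f} nm")
print(f"P = 3 nm:  mean separation ~{mean_3:.0f} nm")
```

Even in this simplified form, the 50-nm persistence length yields a mean apparent separation of roughly 200 nm, several times larger than what a 3-nm persistence length gives, which illustrates why the measured 71-nm separation points to a highly compact chain.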

Thermodynamic Modeling
Additional descriptions of thermodynamic states are listed in Table S3. Parameter values were determined by first scoring a wide range of parameter values and iteratively searching narrower and more finely grained parameter ranges to manually minimize the sum of the squares of the differences between experimental and modeled values for looping frequency and CI expression level. We then refined this fit by least-squares minimization using MATLAB. This was done using a reduced model that only accounted for states likely to be populated near or above lysogenic CI concentrations (e.g., disregarding states in which O R 1 and O R 2 are unbound by CI). Using the same parameters and accounting for all 176 possible states (122 unique states accounting for degeneracy) did not significantly change the fit results. Fitting with this much more complex model gave octameric and tetrameric looping free energies of 0.6 and −3.3 kcal/mol and unlooped and looped expression rates of 2.1 and 5.3 nM/min. When determining parameters, rates were expressed in terms of changes in concentration per unit time; we followed earlier work in assuming that in a typical E. coli cell, a single molecule is at a concentration of ~1.47 nM [35].
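The unit conversions used in the model can be checked with a few lines of arithmetic. The sketch below converts molecules per cell to concentration using the 1.47 nM per molecule figure quoted above and turns the 2-h half-life stated in the Results into a first-order rate constant; the function names are hypothetical.

```python
import math

NM_PER_MOLECULE = 1.47   # concentration of a single molecule in a typical cell [35]

def molecules_to_nM(n_molecules):
    """Convert a per-cell copy number to a concentration in nM."""
    return n_molecules * NM_PER_MOLECULE

def first_order_rate_per_min(half_life_min):
    """Rate constant of first-order loss that gives the stated half-life."""
    return math.log(2) / half_life_min

ci_nM = molecules_to_nM(150)                    # about 220 nM
k_loss = first_order_rate_per_min(2 * 60)       # 2-h half-life, per minute

print(f"150 molecules/cell = {ci_nM:.1f} nM")
print(f"loss rate constant = {k_loss:.5f} per min")
```

The first line reproduces the 220 nM wild-type CI concentration used as a fixed parameter in the fit.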
We do not report any estimate of fitting error; instead, we present only the parameters most consistent with our data and assumptions. Figure 5c and d shows that fit parameters were well determined at a given combination of wild-type CI concentration and nonspecific binding parameters. As noted in the main text, varying these two parameters changed the absolute best-fit parameters, but did not dramatically change our conclusions. Furthermore, fixed parameters of previous studies were determined in a number of separate experiments employing different methods at temperatures other than 25 °C; a rigorous estimate of modeling error would require knowing the error in the measurements of fixed parameters in our experimental conditions.
The basal CI expression rate, k basal , was arbitrarily fixed at k unlooped /5; this did not have any significant impact on determining other parameters, as our measurements were all at or above lysogenic [CI], where O R 2 is almost always bound by a CI dimer. Additionally, the fraction of free CI dimers was fixed at its value for 150 CI molecules per cell at a given concentration of nonspecific binding sites and nonspecific binding affinity. Fixing the concentration of free CI dimers is a reasonable approximation if (1) nearly all CI molecules are in dimers and (2) the number of free nonspecific binding sites is not significantly changed by nonspecifically bound CI dimers.
Figure 4b, and Movies S1, S2, S3, S4, S5, S6 were prepared using NIH ImageJ [86]. Raw fluorescence image intensities were scaled linearly from the lowest to highest values in the region shown. For EYFP/mCherry overlay images, brightfield images were inverted and converted to 8-bit RGB. Fluorescence images were bandpass filtered and background subtracted before being used to generate magenta (mCherry) and green (EYFP) 8-bit RGB images that were added to the brightfield image. The EYFP images were first transformed in MATLAB using the imtransform function and the same fiducial data that were used to transform EYFP spot locations into mCherry coordinates. For smFISH images (Figure 4b), the value of each pixel is the maximum value of that pixel in six images collected at different z-axis positions. Intensities for all images were scaled linearly from the minimum to the maximum of all pictures (117-4,840 counts in 16-bit images).

Supporting Information
Figure S1 Spot fitting and experimental error analysis. (a) Distribution of fitting errors for EYFP (green), mCherry (red) localizations, and r lac/tet (black). Errors were estimated using a bootstrapping procedure by fitting raw data to a Gaussian distribution. The residuals from this fit were then randomly rearranged and added back to the data in 10 different permutations. The reported error is the standard deviation of the distance between these 10 locations and the initial fit location. Error in r lac/tet was determined similarly; from the 10 bootstrapped EYFP and mCherry fits, 100 distances were obtained and the error was estimated as the standard deviation of the difference between these distances and the distance determined from fitting the raw data. (b) A compilation of all data from three separate experiments was used for all analysis in the main text. Here, ⟨r lac/tet⟩ is shown for the individual experiments. Error was estimated as the standard deviation of the means of 1,000 bootstrapped distributions. Except for one sample (λO R 3⁻, day 3), the estimated mean separations for all days followed the trend
Figure S2 Estimate of positive control dimensions and apparent end-to-end distance distribution. (a) The maximum distance between TetR-EYFP and mCherry-LacI chromophores was approximated assuming straight DNA. All distances are in nm. Here, bound fusion proteins are shown on the same face of a DNA molecule, but this need not be the case. Dimers of DNA-binding proteins were based on Protein Data Bank (PDB) entries for TetR (1QPI [87]) and LacI (1EFA [88]). Both fluorescent proteins are shown using the entry for GFP (1GFL [89]). Protein structure images were generated using VMD [90]. (b) In an alternative positive control that was used to collect fiducial data for image registration, the plasmid pZH102R33TD encodes the tandem-dimer reporter LacI-mCherry-EYFP. (c) The r lac/tet PDF for the λnull control (black line; 1 s.e.m. shown in red as in Figure 3a) is shown with the distribution of 10,000 numerically simulated end-to-end distances for two sites separated by 22 nm, randomly projected onto the 2D plane, and subjected to 22-nm localization error for both ends (dashed black line). PDFs were calculated using methods described in the main text. See Materials and Methods for simulation details.
Figure S5 Experiments showing the effects of transcription, nonspecific CI binding, and higher-order CI oligomers on DNA looping. (a) End-to-end distance (r lac/tet ) distributions (PDF) for λnull (red), λΔO L (blue), λΔO L P RM⁻ cI⁻ (purple), and λΔO L P RM⁻ cI⁻/cI trans (green). The PDF is estimated for 10-nm bins. (b) Cumulative density of r lac/tet (CDF) for λnull (red), λΔO L (blue), λΔO L P RM⁻ cI⁻ (purple), and λΔO L P RM⁻ cI⁻/cI trans (green). The CDF is estimated for 10-nm bins. (c) DNA sequence for the P RM⁻ cI⁻ mutant in comparison to the wild-type sequence. Mutated nucleotides are shown in red. (d) Gel shift assay monitoring the binding of wild-type CI protein. Lane 1-4, CI at concentrations of 0, 150, 300, and 600 nM binding to a 158-bp DNA fragment (20 nM) amplified from the plasmid pZH107 carrying the wild-type P RM DNA sequence. Lane 5-8, CI at concentrations of 150, 0, 300, and 600 nM (note loading order) binding to a 158-bp DNA fragment (20 nM) amplified from the plasmid pACL007 carrying the P RM⁻ cI⁻ sequence. Lane 9: empty. Lane 10-13, CI at concentrations of 0, 150, 300, and 600 nM binding to a 140-bp DNA fragment (20 nM) amplified from the E. coli hns promoter region, which CI does not bind specifically. Reaction mixtures were incubated in a buffer (10 mM Tris pH 8.0, 50 mM KCl, 1 mM MgCl 2 , 10% glycerol, 100 µg/ml BSA, 1 mM DTT) at room temperature for 10 min. Samples were electrophoresed in Bio-Rad 4-20% Gradient TBE gels (Bio-Rad, Hercules, CA) in a cold room and then stained with ethidium bromide for 30 min. (e) Fraction of bound DNA (intensity of low-weight band divided by intensity of lane over background) quantified using NIH ImageJ for the gel shown in (d). (f, g) Distributions of r lac/tet identical in description to those in (a, b) showing strains λnull (red), λΔO L (blue), λG147D (purple), and λG147D/cI G147D,trans (green). (TIF)
Figure S6 Growth rate comparisons. (a, b) Strains used in thermodynamic modeling were diluted from exponential growth to low optical densities in M9 minimal media supplemented with 0.4% glucose and carbenicillin as described in the main text. OD 600 was measured over 10 h of growth for two replicate experiments. Strains are λΔO L (blue), λWT (red), λO R 3⁻ (green), and λO L 3⁻ (purple). Doubling times calculated using the Microsoft Excel LOGEST function range from 1.7 to 2.5 h. Two independent replicates are shown. (c) Growth rates for the parent E. coli strain MG1655 (blue) were compared to those of the control strain λnull, in which the lac operon is replaced with a construct incorporating the lacO 3 and tetO 3 binding site arrays and which harbors the plasmid pZH102R33Y29, which expresses both TetR-EYFP and LacI-mCherry fluorescent fusion proteins upon arabinose induction. Strains were grown in M9 minimal media supplemented with 0.4% glycerol, and λnull was grown in both the absence (red) and presence (green) of 0.3% L-arabinose. Doubling times were 2.7 h for MG1655 and 3.4 and 3.3 h for λnull in the absence and presence of L-arabinose, respectively. (TIF)
Movie S1 Fluorescence movie montage for strain λnull corresponding to the data in Figure 2c. Single-color images for TetR-EYFP (top left) and LacI-mCherry (top right) data have intensities scaled linearly from the lowest to the highest pixel values in the first image in each time series. Before creating the overlay images (bottom), single-color images were background subtracted and bandpass filtered using the program ImageJ [91]. The overlay images are scaled to be twice as large as the single-color images. Scale bars correspond to 4 µm in the small, single-color images and 2 µm in the overlay image. Ten consecutive image frames are shown in real time (10 frames per second); the movie is looped 5 times.

(MOV)
Movie S2 Fluorescence movie montage for strain λΔO L corresponding to the data in Figure 2d. Single-color images for TetR-EYFP (top left) and LacI-mCherry (top right) data have intensities scaled linearly from the lowest to the highest pixel values in the first image in each time series. Before creating the overlay images (bottom), single-color images were background subtracted and bandpass filtered using the program ImageJ [91]. The overlay images are scaled to be twice as large as the single-color images. Scale bars correspond to 4 µm in the small, single-color images and 2 µm in the overlay image. Ten consecutive image frames are shown in real time (10 frames per second); the movie is looped 5 times.

(MOV)
Movie S3 Fluorescence movie montage for strain λΔO L corresponding to the data in Figure 2e. Single-color images for TetR-EYFP (top left) and LacI-mCherry (top right) data have intensities scaled linearly from the lowest to the highest pixel values in the first image in each time series. Before creating the overlay images (bottom), single-color images were background subtracted and bandpass filtered using the program ImageJ [91]. The overlay images are scaled to be twice as large as the single-color images. Scale bars correspond to 4 µm in the small, single-color images and 2 µm in the overlay image. Thirteen consecutive image frames are shown in real time (10 frames per second); the movie is looped 5 times.

(MOV)
Movie S4 Fluorescence movie montage for strain λWT corresponding to a typical, long movie. Single-color images for TetR-EYFP (top left) and LacI-mCherry (top right) data have intensities scaled linearly from the lowest to the highest pixel values in the first image in each time series. Before creating the overlay images (bottom), single-color images were background subtracted and bandpass filtered using the program ImageJ [91]. The overlay images are scaled to be twice as large as the single-color images. Scale bars correspond to 4 µm in the small, single-color images and 2 µm in the overlay image. Twelve consecutive image frames are shown in real time (10 frames per second); the movie is looped 5 times.

(MOV)
Movie S5 Fluorescence movie montage for strain λO R 3 − corresponding to a typical, long movie. Single-color images for TetR-EYFP (top left) and LacI-mCherry (top right) data have intensities scaled linearly from the lowest to the highest pixel values in the first image in each time series. Before creating the overlay images (bottom), single-color images were background subtracted and bandpass filtered using the program ImageJ [91]. The overlay images are scaled to be twice as large as the single-color images. Scale bars correspond to 4 µm in the small, single-color images and 2 µm in the overlay image. Thirteen consecutive image frames are shown in real time (10 frames per second); the movie is looped 5 times.

(MOV)
Movie S6 Fluorescence movie montage for strain λO L 3 − corresponding to a typical, long movie. Single-color images for TetR-EYFP (top left) and LacI-mCherry (top right) data have intensities scaled linearly from the lowest to the highest pixel values in the first image in each time series. Before creating the overlay images (bottom), single-color images were background subtracted and bandpass filtered using the program ImageJ [91]. The overlay images are scaled to be twice as large as the single-color images. Scale bars correspond to 4 µm in the small, single-color images and 2 µm in the overlay image. Twelve consecutive image frames are shown in real time (10 frames per second); the movie is looped 5 times.

(MOV)
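The linear intensity scaling described in the movie captions (lowest to highest pixel value of the first image in each time series) can be sketched as follows. This is a minimal illustration, not the authors' ImageJ workflow: the function name `scale_to_first_frame` is hypothetical, the output range of 0 to 1 is an assumption, and clipping later frames that fall outside the first frame's range is also an assumption. Background subtraction and bandpass filtering are not reproduced here.

```python
import numpy as np

def scale_to_first_frame(stack):
    """Linearly rescale a (frames, rows, cols) stack to [0, 1].

    The scaling limits are the minimum and maximum pixel values of the
    FIRST frame only, so every frame in the series shares one display range.
    Pixels in later frames outside that range are clipped (an assumption).
    """
    stack = np.asarray(stack, dtype=float)
    lo, hi = stack[0].min(), stack[0].max()
    if hi == lo:
        # A flat first frame has no usable range; return an all-zero stack.
        return np.zeros_like(stack)
    return np.clip((stack - lo) / (hi - lo), 0.0, 1.0)

# Hypothetical two-frame, 2x2 time series.
frames = np.array([[[10, 20], [30, 50]],
                   [[0, 60], [30, 50]]])
scaled = scale_to_first_frame(frames)
```

Fixing the limits from the first frame keeps brightness comparable across frames, so photobleaching or intensity fluctuations remain visible instead of being normalized away frame by frame.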
Table S1 Looping frequencies were estimated from alternate data sets using either all data or only the data from the first frames (for molecules appearing in more than one sequential frame) and fitting either probability density (PDF) or cumulative (CDF) distributions. The first-row results for each strain are those reported in the main text. (DOCX)

Table S2
States used in thermodynamic modeling. We used free-energy parameters that were described by Dodd et al. [35]. States that will not be populated near lysogenic CI concentrations (e.g., those without O L 1 or O L 2 bound) are ignored; the reference state (ΔG = 0) has CI dimers bound to O L 1 and O L 2. A state with O R free of CI is included to show activation in Figure 5a and b, but does not significantly change fit parameters; because O R 1 and O R 2 binding is highly cooperative, we do not model states with only one or the other operator bound. The degeneracy term indicates how many microstates exist with identical CI dimer binding patterns and free energies. A particular macrostate may have several microstates that differ in terms of parallel or antiparallel looping configurations or in the identity of binding sites participating in cooperative interactions (either through looping or through adjacent dimers). Here, we also list whether a state is looped (1 for looped; 2 for unlooped) as well as its transcription rate, k (0; 1 for k basal ; 2 for k unlooped ; 3 for k looped ). The free energy of state 2 is called ΔG 2 below. (DOCX)

Table S3 Thermodynamic model fitting using alternative choices for wild-type CI concentration (expressed here in molecules/cell; in the model, 1 molecule per cell is equivalent to 1.47 nM) and the fraction of CI molecules that are in the form of free dimers. The approximation of a constant free-dimer fraction is reasonable if specifically bound CI dimers (up to 6 dimers composed of 12 monomers) do not make up a large fraction of total CI and if CI concentration is sufficiently high that almost all CI molecules are in dimeric complexes. The free-dimer fractions used here were calculated assuming the absence of specific binding sites using the parameters for nonspecific binding site affinity and concentration estimated by Dodd et al. [35]. Results in the first row are the same as those presented in the main text. (DOCX)
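As a rough illustration of how a state table of this kind is typically evaluated, the sketch below applies standard statistical-mechanical weighting: each state contributes a weight equal to its degeneracy times the free-dimer concentration raised to the number of additional bound dimers times a Boltzmann factor, and the looping probability is the normalized sum over looped states. Only the 1.47 nM per molecule per cell conversion comes from the caption. Everything else is an assumption made for illustration: the function names, the dictionary layout, the 37 °C temperature, the 1 M standard state, the reading of the free-dimer fraction as a fraction of monomers, and the example free energies, which are placeholders and not the values of Dodd et al. [35].

```python
import math

RT = 1.987e-3 * 310.15        # kcal/mol, assuming 37 degrees C
NM_PER_MOLECULE = 1.47        # nM per molecule per cell (from the caption)

def free_dimer_conc_molar(monomers_per_cell, free_dimer_fraction):
    """Free CI dimer concentration in M.

    Assumes free_dimer_fraction is the fraction of CI monomers that sit in
    free dimers, so the dimer count is half the monomer count in that pool.
    """
    dimers_per_cell = monomers_per_cell * free_dimer_fraction / 2.0
    return dimers_per_cell * NM_PER_MOLECULE * 1e-9

def state_probabilities(states, dimer_conc):
    """Normalized probability of each state.

    Each state is a dict with:
      g      degeneracy (number of equivalent microstates)
      n      dimers bound in addition to the reference state
      dG     free energy relative to the reference state, kcal/mol
      looped True if the state is looped
    """
    weights = [s["g"] * dimer_conc ** s["n"] * math.exp(-s["dG"] / RT)
               for s in states]
    total = sum(weights)
    return [w / total for w in weights]

def looping_probability(states, dimer_conc):
    """Sum of the probabilities of all looped states."""
    probs = state_probabilities(states, dimer_conc)
    return sum(p for s, p in zip(states, probs) if s["looped"])

# Hypothetical three-state table; the free energies are placeholders.
example_states = [
    {"g": 1, "n": 0, "dG": 0.0, "looped": False},    # reference state
    {"g": 1, "n": 2, "dG": -24.0, "looped": False},  # two more dimers, unlooped
    {"g": 2, "n": 2, "dG": -24.5, "looped": True},   # two more dimers, looped
]
conc = free_dimer_conc_molar(monomers_per_cell=200, free_dimer_fraction=0.5)
p_loop = looping_probability(example_states, conc)
```

Under the constant free-dimer-fraction approximation described in the caption, changing the assumed wild-type CI level only rescales `conc`, which is why alternative choices can be compared row by row in a table of fits.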

Table S4
Names of new strains used in this study (as used internally in our lab) and shorthand names used in the main text. (DOCX)

Table S5
Sequences of oligonucleotide probes for single-molecule fluorescence in situ hybridization (smFISH) experiments. Asterisks indicate probes that do not hybridize specifically with any E. coli sequence. All other probes hybridize to nonoverlapping sequences in the cI coding region of the mRNA transcript from the P RM promoter. (DOCX)

Table S6
Measurement statistics for the experiment comparing r lac=tet distributions for looped and unlooped control strains to r lac=tet for strains lacking O L and having weakened P RM promoters with and without the overexpression of wild-type CI from a plasmid. Errors for the r lac=tet measurements are all 1 s.e.m. as estimated from 1,000 bootstrapped samples. Note that r lac=tet distributions display small, day-to-day variability between experiments (see Figure S1, this table, Table 2, Table S7), but the trend stays the same for a given set of experiments. (DOCX)

Table S7
Measurement statistics for the experiment comparing r lac=tet distributions for looped and unlooped control strains to r lac=tet for strains in which CI harbors the G147D mutation with and without the overexpression of CI G147D from a plasmid. Errors for the r lac=tet measurements are all 1 s.e.m. as estimated from 1,000 bootstrapped samples. Note that r lac=tet distributions display small, day-to-day variability between experiments (see Figure S1, this table, Table 2, Table S6), but the trend stays the same for a given set of experiments. (DOCX)
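The bootstrap error estimate mentioned for Tables S6 and S7 (1 s.e.m. from 1,000 bootstrapped samples) can be sketched as below. This is a generic bootstrap, not the authors' analysis code: the function name `bootstrap_sem`, the choice of the mean as the bootstrapped statistic, the fixed random seed, and the synthetic separation data are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_sem(values, n_boot=1000, statistic=np.mean, seed=0):
    """Standard error of `statistic` estimated by bootstrap resampling.

    Draws n_boot resamples of the data with replacement (each the same size
    as the data), evaluates the statistic on every resample, and returns the
    sample standard deviation of those values.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    stats = np.array([statistic(values[row]) for row in idx])
    return stats.std(ddof=1)

# Synthetic separations in nm (placeholder data, not from the paper).
rng = np.random.default_rng(1)
separations = rng.normal(loc=100.0, scale=1.0, size=400)
sem = bootstrap_sem(separations)
```

Passing a different `statistic` (for example `np.median`, or a function that returns a fitted parameter) gives the corresponding bootstrap error without changing the resampling logic.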