The Structural Basis for Promoter −35 Element Recognition by the Group IV σ Factors

The control of bacterial transcription initiation depends on a primary σ factor for housekeeping functions, as well as alternative σ factors that control regulons in response to environmental stresses. The largest and most diverse subgroup of alternative σ factors, the group IV extracytoplasmic function σ factors, directs the transcription of genes that regulate a wide variety of responses, including envelope stress and pathogenesis. We determined the 2.3-Å resolution crystal structure of the −35 element recognition domain of a group IV σ factor, Escherichia coli σE4, bound to its consensus −35 element, GGAACTT. Despite similar function and secondary structure, the primary and group IV σ factors recognize their −35 elements using distinct mechanisms. Conserved sequence elements of the σE −35 element induce a DNA geometry characteristic of AA/TT-tract DNA, including a rigid, straight double-helical axis and a narrow minor groove. For this reason, the highly conserved AA in the middle of the GGAACTT motif is essential for −35 element recognition by σE4, despite the absence of direct protein–DNA interactions with these DNA bases. These principles of σE4/−35 element recognition can be applied to a wide range of other group IV σ factors.


Introduction
Bacterial transcription is driven by the DNA-dependent RNA polymerase (RNAP), comprising five core subunits (α2ββ′ω) plus an initiation-specific σ subunit, which binds to the core RNAP to form the holoenzyme [1][2][3]. Promoter-specific transcription initiation first requires the formation of a closed complex in which σ domains 2 (σ2) and 4 (σ4) bind sequence-specifically to the −10 and −35 promoter DNA elements, respectively [3][4][5]. Analysis of the available bacterial genomes has revealed great variation in both the number and type of σ factors that each bacterial species possesses [6,7], allowing for promoter-specific transcription of defined regulons.
Most σ factors belong to the σ70 family, which can be broadly divided into five subgroups [7,8]. The group I (primary) σ factors, such as Escherichia coli (Ec) σ70 and Thermus aquaticus (Taq) σA, direct the transcription of housekeeping genes for which basal levels of transcription are essential for normal cellular processes and survival. The largest and most diverse subgroup, the group IV, or extracytoplasmic function (ECF) σ factors, direct the transcription of genes that regulate a wide variety of responses including periplasmic stress, iron transport, metal ion efflux, alginate secretion, and pathogenesis [7,9,10,11]. The Ec ECF σ factor σE is an essential protein that directs the response to periplasmic stress [12][13][14][15].
Like many ECF σs, Ec σE is regulated by an anti-σ, RseA [13,15]. Under normal conditions, RseA inactivates σE by sequestering it at the cytoplasmic face of the inner membrane. However, when environmental stresses lead to unfolded proteins in the periplasm, a series of proteolytic cleavage reactions releases σE from RseA [16]. σE is then free to bind RNAP and drive the transcription of a core set of genes conserved across most bacteria, as well as a more variable set of genes [17]. The core genes coordinate the assembly and maintenance of the bacterial outer membrane.
The structure of Ec σE bound to the cytoplasmic portion of its anti-σ RseA revealed that, despite little primary sequence identity, domains 2 and 4 of σE (σE2 and σE4, respectively) share striking structural similarity to the corresponding domains of Taq σA (σA2 and σA4; [22]). Domain 4 of all primary σs, which contains a helix-turn-helix DNA binding motif, recognizes the 6-base-pair (bp) −35 consensus TTGACA [4,23], while Ec σE4 is thought to directly recognize the 7-bp −35 element GGAACTT [17]. Taken together, these observations suggest that the different groups of σ factors share the same general mechanisms of −35 element binding, but that residue changes on the surface of the recognition helix account for differences in promoter specificity. Previous studies have revealed the molecular details of how domain 4 of the group I σ factor Taq σA recognizes its −35 consensus promoter element [4]. To better understand the structural basis for group IV σ factor promoter specificity, we solved the 2.3-Å resolution crystal structure of Ec σE4 bound to its −35 consensus promoter element. The structure reveals that, despite the structural similarity with Taq σA4, Ec σE4 recognizes its −35 element in a distinct manner. Conserved sequence elements of the σE −35 element, including the most highly conserved 'AA' of the GGAACTT motif, are not involved in direct interactions between the protein and the unique edges of the DNA bases. Instead, these DNA elements induce a specific DNA geometry that is required for σE4 binding. Sequence analysis of other group IV σs and their cognate −35 elements indicates that this principle is a conserved feature of −35 element recognition by group IV σ factors.

Crystallization and Structure Determination
We performed vapor diffusion crystallization trials with Ec σE4 (residues 122 to 191) in complex with DNA fragments corresponding to the Ec σE consensus −35 promoter sequence GGAACTT [17]. Thin rectangular crystals grown using a 12-bp DNA fragment (Figure 1A) diffracted to 2.3-Å resolution (see Materials and Methods and Table 1). The structure was determined by molecular replacement using both a model of Ec σE4 from the Ec σE/RseA complex structure [22] and the 6-bp −35 element from the Taq σA4/DNA structure [4] as search models. The crystals contained two σE4/DNA complexes per asymmetric unit, with a solvent content of 65%. Iterative model building and crystallographic refinement converged to an R/Rfree of 0.241/0.253 (Table 2).

Overall Structure
Two σE4 molecules in the asymmetric unit each bound a separate DNA fragment. As anticipated, the recognition helix of the σE4 helix-turn-helix motif bound in the major groove of the −35 element (Figure 1B). The crystallographically related DNA helices packed head-to-tail, forming a pseudocontinuous double helix with the 1-bp overhangs forming Hoogsteen base pairs with the adjacent double helices.

σE4-DNA Interactions
Protein-DNA interactions, which occur exclusively within the major groove, extend from −29 to −36, spanning the entire −35 element as well as one base of upstream DNA (Figures 2 and 3A). The protein anchors itself to the DNA by direct and water-mediated side chain and main chain interactions with the phosphate backbone on the nontemplate strand from −33 to −35 and the template strand from −29′ to −32′ [throughout this paper, DNA bases will be numbered as in Figure 3A, where negative numbers denote base pairs upstream of the transcription start site. Unprimed numbers denote the nontemplate (top) DNA strand, while primes denote the template (bottom) strand]. Specific protein-DNA base interactions occur through direct hydrogen bonds and van der Waals forces (Figures 2 and 3A). In addition, there is one cation-π interaction between R176 and −36. Interestingly, the primary base-specific protein-DNA interactions occur at only three positions of the 7-bp −35 element (all guanines), −35, −34, and −31′ (Figure 3A). The upstream edge of the −35 element is recognized through a series of hydrogen bonds and van der Waals interactions, mostly between R176 and S172 and the guanine bases at −35 and −34. R176 forms two hydrogen bonds with the −35G. In addition, R176 forms a cation-π interaction with the −36 DNA base, creating a stair motif along with the −35 hydrogen bonds [24,25]. S172 forms direct hydrogen bond and van der Waals interactions with the −34G. The base-specific protein-DNA interactions at the −31′ position are almost exclusively from R171, which makes two hydrogen bonds and one van der Waals interaction with the −31′G.
In contrast to the numerous base-specific interactions at the −35, −34, and −31′ positions, the −33 and −32 positions each contain only one base-specific contact, in the form of van der Waals interactions between the thymidine C5-methyl groups at −33′ and −32′ with F175 and R171, respectively (Figure 3A). The structure reveals no base-specific protein-DNA interactions at the −30 and −29 positions.
Figure 1. Overview of Ec σE4/−35 Element DNA Structure (A) Synthetic 12-mer oligonucleotides used for crystallization. The black numbers above the sequence denote the DNA position with respect to the transcription start site at +1. The −35 element is colored light green (nontemplate strand) and dark green (template strand). The flanking bases are colored light gray (nontemplate strand) and dark gray (template strand).

Geometry of the σE4 −35 Element DNA
Over four of the −35 element positions (−33, −32, −30, −29), there are a total of only two protein-DNA base contacts, both weak van der Waals contacts (Figure 3A). Nevertheless, the −33 and −32 positions are the most highly conserved positions, not only in the Ec σE −35 consensus but also across all group IV σ factors where the promoter specificity is known (Figure 3B; [7,17]). Furthermore, genetic screens for defective transcription resulting from single nucleotide substitutions in the −35 element of the Ec σE homolog from Salmonella enterica serovar Typhimurium only resulted in the selection of mutants with substitutions at positions −33 and −32 [26]. How is it, then, that the most highly conserved and essential positions in the σE −35 element are also the ones that lack strong protein-DNA base interactions? The answer to this apparent paradox comes from the unique DNA geometry of the σE −35 element (Figure 4).
The unique DNA geometry induced by oligo(dA)·oligo(dT) tracts, defined by the presence of four to six consecutive A·T bp, is well established [27][28][29][30][31]. Depending on its sequence, oligo(dA)·oligo(dT) tract DNA is rigid and straight, with a high degree of propeller twist and a very narrow minor groove. Despite not being a true oligo(dA)·oligo(dT) tract as a result of the cytosine insertion at −31, the σE −35 element DNA is relatively straight (Figure 4A), with a high degree of propeller twist (Figure S1), and the minor groove width begins to narrow at the start of the −33/−32 AA (Figure 4B). The narrow minor groove is stabilized by a network of cross-strand hydrogen bonds between adjacent DNA bases, along with a spine of hydration consisting of water-mediated hydrogen bonds between the two strands (Figure 4C). The AA at −33/−32 is the most highly conserved feature of the σE −35 consensus. After the −31 cytosine insertion, the consensus comprises TT (−30/−29). Furthermore, there is a continued run of two additional conserved Ts at −28/−27 (Figure 3B; [17]).
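The tract criterion above can be made concrete with a short sketch. It uses a simplified reading of the definition (any uninterrupted run of A or T bases on the nontemplate strand, with the lower bound of four taken from the text); the function names are illustrative and not part of the original analysis. It shows why GGAACTT, with its C at −31, is not a true tract.

```python
import re

def at_runs(seq):
    """Return (start index, run) for each maximal stretch of consecutive A/T bases."""
    return [(m.start(), m.group()) for m in re.finditer(r"[AT]+", seq.upper())]

def has_tract(seq, min_len=4):
    """True if any uninterrupted A/T run is at least min_len bases long.

    Simplification (assumption): every A or T counts toward a run, so mixed
    runs such as AATT are accepted as well as pure A or pure T runs.
    """
    return any(len(run) >= min_len for _, run in at_runs(seq))

consensus = "GGAACTT"          # Ec sigma-E -35 consensus, positions -35 to -29
print(at_runs(consensus))      # [(2, 'AA'), (5, 'TT')]
print(has_tract(consensus))    # False: the C at -31 splits the A/T run
```

Under this simplified criterion the consensus contains only two 2-bp A/T runs, so the tract-like geometry described in the text is not predicted by run length alone.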
Interestingly, the nucleosome structure [32] contains a stretch of DNA, GAAGTT, similar in sequence to −34 to −29 (GAACTT) of the Ec σE −35 element (Figure S2). Similar to Ec σE −35 element DNA, the nucleosome DNA cannot be classified as a typical oligo(dA)·oligo(dT) tract as a result of the non-A/T base, yet it too displays the hallmark DNA geometry, such as a very narrow minor groove (Figure S2B). The presence of similar DNA geometry in two different structural contexts strongly suggests that the oligo(dA)·oligo(dT)-like DNA geometry found in the Ec σE −35 element DNA complex is an intrinsic property of the DNA sequence and not due to protein-induced conformational changes.
The absence of strong, base-specific protein-DNA interactions at the −33, −32, and −30 to −27 positions (Figure 3A) is conspicuous in light of the high DNA sequence conservation, particularly at the −33/−32 positions (Figure 3B). This, combined with the observation that the DNA sequence induces a unique geometry in the −35 element DNA (Figure 4), strongly suggests that the DNA sequence is conserved at these positions to set up the global conformation of the DNA, and that this DNA conformation is essential for σE4 binding. In this light, the results of the previous genetic screen [26] make good sense. Individual mutations at positions other than −33 and −32 could be compensated for both by the binding interactions at other −35 element positions and by protein-DNA backbone interactions, which would not be lost at the mutated position. However, substitutions at the −33/−32 positions, which disrupt the highly conserved AA, would in turn disrupt the global DNA geometry necessary for σE4 binding.
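The positional argument above can be sketched as simple bookkeeping. The snippet maps each base of a candidate 7-bp site onto positions −35 to −29 and reports whether a substitution falls on the conserved AA at −33/−32. The function names are hypothetical, and the sketch only labels positions; it does not model DNA geometry or binding.

```python
CONSENSUS = "GGAACTT"        # Ec sigma-E -35 consensus
FIRST_POSITION = -35         # promoter position of the first consensus base
AA_POSITIONS = {-33, -32}    # conserved AA discussed in the text

def substituted_positions(site):
    """List promoter positions at which a 7-bp site differs from the consensus."""
    if len(site) != len(CONSENSUS):
        raise ValueError("site must be the same length as the consensus")
    return [FIRST_POSITION + i
            for i, (ref, base) in enumerate(zip(CONSENSUS, site.upper()))
            if ref != base]

def disrupts_aa(site):
    """True if any substitution falls on the -33/-32 AA."""
    return any(pos in AA_POSITIONS for pos in substituted_positions(site))

print(substituted_positions("GGATCTT"), disrupts_aa("GGATCTT"))  # [-32] True
print(substituted_positions("GGAACTA"), disrupts_aa("GGAACTA"))  # [-29] False
```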

Comparison of σE4 and σA4 −35 Element Recognition
Superposition of the DNA from the Ec σE4 and Taq σA4 [4] −35 element complexes reveals that Ec σE4 binds 4 Å further into the major groove than the group I σ factor Taq σA4, allowing Ec σE4 to form more extensive interactions with the DNA (Figure 5A). In addition, this shift extends the DNA recognition surface of the protein toward the C-terminus of the helix-turn-helix motif recognition helix of Ec σE4 (Figure [...] the σE4 −35 element at −35. Interestingly, Taq σA4 makes one van der Waals and four hydrogen bond protein-DNA contacts upstream of the −35 element at −36 and −38, whereas Ec σE4 only makes one van der Waals and one cation-π interaction with the nearby −36 DNA base. In essence, the 4-Å shift causes the regions of Taq σA4 that were involved in upstream non-promoter element contacts to be involved in sequence-specific −35 element contacts in the Ec σE4/DNA structure. For example, in both structures aligned residues K418/R176 (Taq σA4/Ec σE4), T408/P166, R411/T169, and Q414/S172 make up the majority of the upstream nontemplate strand interactions. However, in the case of Ec σE4 they all make interactions within the −35 element at −35 and −34, whereas in Taq σA4 they make interactions mostly upstream of the −35 element (−38 to −35). Similarly, the aligned residues R387/R149, L398/Y156, and E399/E157 interact in both structures with the downstream template strand DNA backbone. However, in Ec σE4 R149 and E157 make their contacts 1 to 2 bp farther downstream than Taq σA4 R387 and E399 (Figure 5B). In contrast to the genetic screen for nucleotide substitutions in the σE −35 element, which only found decreased transcription from mutations at two of the seven promoter positions (−33 and −32; [26]), systematic mutational studies of the Ec σ70 −35 element have shown decreased transcription from mutations at five of the six promoter positions (−35 to −31; [33]). The two structures also show major differences in the geometry of the −35 element DNA. Whereas Taq σA4 bends its −35 element, the protein-bound Ec σE4 −35 element DNA is relatively straight (Figure 4A). Unlike the σ70 −35 element, the Ec σE −35 element itself adopts a unique DNA geometry (described above) that leads to a rigid, straight DNA segment. In fact, unlike the primary σs, which utilize the flexibility of their −35 element DNA, Ec σE appears to use the rigidity of its −35 element DNA sequence to increase specificity.
Figure 2. The protein is shown as an α-carbon backbone worm, with σE4.1 colored yellow and σE4.2 colored light blue. Side chains are shown for those residues that make protein-DNA contacts. Carbon atoms of the side chains are colored as the backbone, except atoms involved in polar contacts with the DNA are colored (nitrogen atoms, blue; oxygen atoms, red). The DNA is color-coded as in Figure 1A, except atoms involved in polar contacts with the protein are colored (nitrogen atoms, blue; oxygen atoms, red). Water molecules are indicated with red spheres. Dashed black lines indicate hydrogen bonds or salt bridges. DOI: 10.1371/journal.pbio.0040269.g002
Figure 3. [...] [4]). The nontemplate/template strand DNA is colored light gray/dark gray (respectively), except the −35 element is colored light green/dark green (for Ec σE4) or pink/magenta (for Taq σA). Colored boxes denote protein residues. Color-coding for the proteins, as well as the meaning of the lines indicating interactions, is explained in the legend (lower right). Double thick solid black lines indicate two hydrogen bonds with the same residue. Water molecules mediating protein-DNA contacts are shown as red circles. (B) Sequence logo denoting sequence conservation within the Ec σE4 −35 element [17,51]. DOI: 10.1371/journal.pbio.0040269.g003
Superposition of the proteins from the Ec σE4 and Taq σA4 −35 element complexes highlights the significant differences in the positioning of the −35 element DNA with respect to the protein, and the different properties of the protein surfaces available for interacting with other proteins bound to the upstream DNA (Figure 5C). Conserved, basic residues of the group I σ domain 4 are key targets for interacting with acidic residues of class II transcriptional activators that bind just upstream of the −35 element [4,34,35]. The role of transcriptional activators in controlling σE transcription is largely unknown.

Implications for −35 Element Recognition by Other Group IV σ Factors
The primary sequences of the group IV σ factors are much more divergent from each other than those of the members of the other σ70-family subgroups. Furthermore, some genomes contain over 60 group IV σ factors, each of which can recognize unique, but overlapping, sets of promoter sequences. Nevertheless, the various group IV σ factors generally share a high degree of conservation in their −35 element sequences, implying that the less conserved −10 element sequences provide the primary basis for promoter specificity between the different group IV σs, especially within the same species [7,36,37]. Therefore, the mechanism of −35 element recognition revealed in the Ec σE4/DNA structure should be relevant to other group IV σ factors.
Partially to fully characterized regulons have been described for at least eight group IV σs: Ec σE [17], Bacillus subtilis (Bsu) σX [38], Bsu σW [39], Pseudomonas aeruginosa (Paer) σE [37,40], Mycobacterium tuberculosis (Mtub) σE [41], Mtub σH [42], Streptomyces coelicolor (Scoe) σR [43], and Pseudomonas syringae (Psyr) HrpL [44]. When considering the −35 elements recognized by these group IV σs together, the −35 element can clearly be divided into three distinct regions. The first is an upstream G region, the second is the previously recognized AAC motif [7], and the third is a less well-conserved downstream T-tract (Figure 6 and Figure S3). The differences and similarities between the consensus −35 elements recognized by these group IV σs can be directly explained from the σE4 sequence alignments in light of the σE4/DNA structure (Figure 6). For example, when consensus sequences for the −35 elements are aligned by the highly conserved AAC motif, all but one of them contain a G at the position equivalent to the Ec −35 position. In the structure, this position is recognized by Ec σE R176, which is conserved across all the group IV σs. At the −34 position of the promoter consensus, the occurrence of G or A correlates perfectly with the presence of S or T (respectively) at amino acid position 172.
Figure 4. Ec σE −35 Element DNA Geometry (A) Cartoon views of the DNA backbone geometry. The DNA was aligned using the template strand DNA from −35′ to −30′, giving an RMSD of 0.839 Å over 30 atoms for Ec σE4/DNA and Taq σA4/DNA. Straight B-form dsDNA is blue, Ec σE −35 element DNA is green, while Taq σA −35 element DNA is magenta. The paths of the DNA helical axes, calculated using Curves (http://www.ibpc.fr/UPR9080/Curindex.html), are also shown. (B) Graph showing the DNA minor groove width (calculated using 3DNA) for B-form DNA (blue), Ec σE4 −35 element DNA (green), and Taq σA −35 element DNA (magenta; [49]). Minor groove width was calculated as the P-P distance minus 5.8 Å to take into account the radii of the phosphate groups.
Figure 5. (A) The Ec σE4/−35 element DNA and Taq σA4/−35 element DNA complexes were aligned using the template strand DNA from −35′ to −30′, giving an RMSD of 0.839 Å over 30 atoms. The two views are related by a 90° rotation about the horizontal axis as shown. Proteins are shown as α-carbon backbone worms, color-coded as shown. The Ec σE −35 element DNA is colored light green (nontemplate strand) and dark green (template strand). The Taq σA −35 element is colored pink (nontemplate strand) and magenta (template strand). (B) Comparison of the Ec σE4 and Taq σA4 protein-DNA interactions. The Cα-backbones of Ec σE4 and Taq σA4 were aligned using Ec σE4 residues 137 to 150 and 155 to 182 with Taq σA4 residues 375 to 388 and 397 to 424, giving an RMSD of 1.00 Å over 42 atoms. Protein residue numbering is shown between the sequences (Taq/Ec). Residues in σ4.1 are highlighted in red/yellow (Taq σA/Ec σE) and those in σ4.2 are colored purple/blue. Red dots denote protein residues that make base-specific DNA contacts. Colored dots denote protein residues that make DNA contacts. Black dots denote hydrogen bonds (less than 3.2 Å) or salt bridges (less than 4.0 Å) originating from the protein side chain. Magenta dots denote hydrogen bonds originating from the protein main chain. Blue dots denote van der Waals (hydrophobic) contacts (less than 4.0 Å). Yellow dots denote cation-π interactions. The positions along the DNA that are contacted by each residue are indicated above and below the contact circles. (C) The protein α-carbon backbones of Ec σE4 and Taq σA4 were aligned as described in (B). The superimposed proteins, shown as α-carbon backbone worms, are shown on the left, color-coded as in (A). The Ec σE4/−35 element and Taq σA/−35 element complexes are shown separately (middle and right, respectively). In these views, the proteins are shown as molecular surfaces, color-coded according to electrostatic surface potential. The DNAs are shown as phosphate-backbone ribbons, with bases indicated schematically as sticks. DOI: 10.1371/journal.pbio.0040269.g005
In the Ec σE4/−35 element structure, the face of the phenyl ring of F175 makes van der Waals interactions with the C5-methyl group of the T opposite the absolutely conserved A at position −33. Consistent with this, all of the group IV σs except for Psyr HrpL have either an F or an H (which could contribute similar van der Waals interactions) at the equivalent amino acid position.
Amino acid residue R171 of σE4 donates a hydrogen bond to the G opposite the highly conserved C at position −31. Correlating with the conservation of C at this position of the promoter is the occurrence of amino acid residues R or K (which could also donate a hydrogen bond to the complementary G). The two exceptions, Mtub σH and Scoe σR, have M at this amino acid position; the Scoe σR consensus has a T at this position, while the Mtub σH −35 element has a very weak C/T at this position. Even the downstream T-rich sequence, whose primary residue-specific interaction is with R149, is found only in the consensus of those σ factors (Bsu σX, Bsu σW, Paer σE) that contain an R or equivalent residue at this position. These correlations suggest that the mechanism of binding found in the Ec σE4/DNA structure can be generalized to other group IV σ factors.
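The residue-base correlations described in this section can be collected into a small lookup table, sketched below. The table holds only the pairings stated in the text (Ec σE numbering), and the function name is hypothetical. It is a bookkeeping aid for reading the alignments, not a validated predictor of promoter specificity.

```python
# Correlations stated in the text, keyed by Ec sigma-E residue number.
CORRELATIONS = {
    176: {"R": "G at -35"},
    172: {"S": "G at -34", "T": "A at -34"},
    175: {"F": "A at -33", "H": "A at -33"},
    171: {"R": "C at -31", "K": "C at -31"},
}

def expected_features(residues):
    """Map {residue number: one-letter amino acid} to the correlated -35 feature.

    Residues not covered by the text are reported as having no stated correlation.
    """
    return {
        pos: CORRELATIONS[pos].get(aa.upper(), "no correlation stated")
        for pos, aa in residues.items()
        if pos in CORRELATIONS
    }

ec_sigma_e = {171: "R", 172: "S", 175: "F", 176: "R"}
print(expected_features(ec_sigma_e))
# {171: 'C at -31', 172: 'G at -34', 175: 'A at -33', 176: 'G at -35'}
```

For the Ec σE residues the listed features agree with the GGAACTT consensus; an M at position 171, as reported for Mtub σH and Scoe σR, falls outside the table.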

Conclusion
Despite similar function and secondary structure, the group I and IV σ factors recognize their −35 elements using distinct mechanisms. The group IV σ factor Ec σE4 binds 4 Å further into the major groove than the group I σ factor Taq σA4, making more extensive contacts. Unlike Taq σA4, Ec σE4 does not bend the DNA. Instead, conserved sequence elements of the σE −35 promoter induce DNA geometry characteristic of oligo(dA)·oligo(dT)-tract DNA, including pronounced minor groove narrowing. For this reason, the highly conserved AA at −33/−32 is essential for −35 element recognition by σE4, even in the absence of direct protein interactions with the DNA bases. It appears that these principles of σE4/−35 element recognition can be applied to a wide range of other group IV σ factors.

Materials and Methods
Cloning, expression, and purification of Ec σE4. The gene encoding Ec σE4 (residues 122 to 191) was PCR subcloned from pLC31 [22] into the NdeI/BamHI sites of the pET-15b expression vector (Novagen, Madison, Wisconsin, United States), creating pWJL3. The plasmid was transformed into Ec BL21(DE3)pLysS cells, and transformants were grown at 37 °C in LB medium with ampicillin (100 μg/ml) to an OD600 of 0.4 to 0.6. Protein expression was induced with 1 mM IPTG for 4 h. Cells containing the overexpressed protein were harvested and resuspended in lysis buffer (20 mM Tris-HCl [pH 8.0], 0.5 M NaCl, 5% glycerol, 0.1 mM EDTA, 5 mM imidazole [pH 8.0], 0.5 mM β-ME, and 1 mM phenylmethylsulfonylfluoride). Cells were lysed using a sonicator and clarified by centrifugation. Supernatants were applied to 2 × 5 ml of Ni2+-charged HiTrap metal-chelating columns (Amersham Biotech [GE Healthcare], Piscataway, New Jersey, United States). Lysis buffer with 20 mM imidazole was used to wash the column, followed by elution of the tagged protein using lysis buffer with 250 mM imidazole. To remove the (His)6-tag, samples were diluted into thrombin digestion buffer (20 mM Tris-HCl [pH 8], 0.15 M NaCl, 5% glycerol, 5 mM CaCl2, and 0.5 mM β-ME) and treated with thrombin (500 μg/100 mg protein) at 4 °C. To separate the cleaved (untagged) protein from the thrombin and uncleaved, (His)6-tagged protein, the sample was reapplied to the Ni2+-charged HiTrap column in tandem with a 1 ml Benzamidine FF HiTrap column (Amersham), and the flow-through was collected. The sample was then precipitated using ammonium sulfate (60 g/100 ml sample), centrifuged, and resuspended in gel filtration buffer (20 mM Tris-HCl [pH 8], 0.5 M NaCl, 5% glycerol, and 1 mM DTT). The resuspended sample was applied to a Superdex 75 gel filtration column (Amersham) equilibrated with gel filtration buffer. The eluted Ec σE4 was concentrated to 30 mg/ml by centrifugal filtration (ViaScience, Hanover, Germany) and exchanged into a low salt crystallization buffer (20 mM Tris-HCl [pH 8], 0.2 M NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM DTT). Since Ec σE4 rapidly precipitated at room temperature when in a low salt buffer (less than 0.3 M NaCl), all subsequent steps were done in the cold room using prechilled supplies. The final purified protein product was aliquoted, flash frozen, and stored at −80 °C. Electrospray mass spectrometry was used to confirm the mass of the purified product (8,427 Da).
Figure 6. Correlation of σ4 and −35 Element Sequences for Several Group IV σ Factors The top shows a sequence alignment of the proposed −35 element DNA binding region of several group IV σ factors. The residue positions that are important in −35 element DNA recognition in the Ec σE4/−35 element DNA structure are highlighted green (similar to Ec σE) or red (dissimilar to Ec σE). The bottom shows the alignment of the known −35 consensus sequences from several group IV σ factors. The three −35 element regions are highlighted with the upstream G region (blue), the middle AAC motif (red), and the downstream T-rich region (green). Lines connecting the two alignments indicate protein residue-DNA base interactions important for −35 element recognition in the Ec σE4/DNA structure. DOI: 10.1371/journal.pbio.0040269.g006
Nucleic acid preparation. For the purposes of crystallization, several different DNA constructs were designed, based on the Ec σE4 −35 consensus. Construct length and flanking bases were varied in an attempt to promote crystallization through end-to-end dsDNA contacts. Lyophilized, tritylated, single-stranded oligonucleotides (Oligos Etc., Wilsonville, Oregon, United States) were detritylated and purified on an HPLC using a Varian (Palo Alto, California, United States) Microsorb 300 DNA column [45]. The purified oligonucleotides were dialyzed into 5 mM TEAB (pH 8.5) and dried on a SpeedVac (Savant). The dried oligonucleotides were resuspended in 5 mM Na cacodylate (pH 7.4), 0.5 mM EDTA, 50 mM NaCl to a concentration of 1 mM. Equimolar amounts of oligonucleotides were annealed by heating to 95 °C for 5 min and then cooling to 22 °C at a rate of 0.01 °C/s. The annealed oligonucleotides were dried in a SpeedVac and stored at −20 °C.
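As a quick check on the annealing protocol, the duration of the cooling ramp follows directly from the stated temperatures and rate. The helper below is an illustrative sketch (the function name is hypothetical) and assumes a linear ramp.

```python
def cooling_time_seconds(start_c, end_c, rate_c_per_s):
    """Time for a linear cooling ramp from start_c to end_c at rate_c_per_s."""
    if rate_c_per_s <= 0:
        raise ValueError("cooling rate must be positive")
    return (start_c - end_c) / rate_c_per_s

seconds = cooling_time_seconds(95.0, 22.0, 0.01)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} h")  # 7300 s, about 2.0 h
```

Cooling from 95 °C to 22 °C at 0.01 °C/s therefore takes roughly two hours.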
Crystallization and structure determination of the Ec σE4-DNA complex. Co-crystals were obtained by vapor diffusion by mixing the duplex DNA (Figure 1A) and Ec σE4 (molar ratio 1:1.5) with the final concentration of protein at 1.8 mM (15 mg/ml). The mixture was centrifuged for 30 min, then mixed with an equal volume of well solution (0.04 M MgCl2, 0.05 M Na-cacodylate [pH 6.0], and 5% v/v 2-methyl-2,4-pentanediol). Rectangular crystals (0.3 × 0.1 × 0.06 mm) grew within 5 d. Crystals were prepared for cryocrystallography by soaking in the crystallization solution supplemented with 25% 2-methyl-2,4-pentanediol, followed by flash freezing in liquid nitrogen. A native dataset was collected to 2.3 Å at the National Synchrotron Light Source (NSLS, Brookhaven National Laboratory, Upton, New York, United States), Beamline X25 (Table 1).
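The stated protein concentration can be cross-checked against the measured mass of the purified product (8,427 Da). The conversion below is a minimal sketch with illustrative function names; it uses only the values given in the text.

```python
def mm_to_mg_per_ml(conc_mm, mass_da):
    """Convert millimolar to mg/ml: mmol/L x g/mol = mg/L, then divide by 1000."""
    return conc_mm * mass_da / 1000.0

def mg_per_ml_to_mm(conc_mg_ml, mass_da):
    """Convert mg/ml to millimolar for a molecule of mass_da daltons."""
    return conc_mg_ml * 1000.0 / mass_da

MASS_DA = 8427  # mass of purified Ec sigma-E domain 4 reported in the text

print(f"{mm_to_mg_per_ml(1.8, MASS_DA):.1f} mg/ml")  # 15.2 mg/ml
```

The result, about 15.2 mg/ml, is consistent with the 15 mg/ml quoted for the 1.8 mM crystallization mixture.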
The structure was solved by molecular replacement with Molrep 8.1 [46] using Ec σE4 from the Ec σE-RseA complex structure [22]. Initially, Molrep was used to search for solutions with 2 or 3 molecules per asymmetric unit. Both searches yielded a solution with two molecules of Ec σE4 arranged in a symmetrical dimer (Molrep Corr = 0.252). Though there were some slight clashes between the flexible N- and C-terminal regions, the crystal symmetry-related molecules did not clash and in fact stacked upon one another in one direction. Additionally, there was room for the dsDNA. However, when this solution was used to generate an electron density map there was no observable density for the DNA. In an effort to improve the solution, the two-molecule dimer was used as a search model to generate a new Molrep solution (Molrep Corr = 0.439), which yielded some clear dsDNA density. Molrep was further used to improve the dsDNA density by keeping the Ec σE4 dimer fixed and doing two tandem molecular replacement searches using the 6-bp −35 element from the Taq [...] [47] was then used to perform density modification, giving an improved electron density map in which clear density could be seen for the entirety of both dsDNAs, excluding the overhanging base at the downstream end of the DNA. The final DNA was built using a starting template of straight B-form dsDNA corresponding to the crystallization oligos (constructed using Namot2; http://namot.sourceforge.net). Model building was done using O v9.0.7 [48] and refinement using CNS v1.1 (Table 2).
Protein-DNA contacts were analyzed using the program CONTACT, followed by geometric verification using PyMOL v0.98 (http://www.pymol.org). Cation-π interactions were visualized using a custom PyMOL script based on previously determined geometric criteria [25]. DNA geometry was analyzed using 3DNA v1.5 [49] and Curves v5.1 (http://www.ibpc.fr/UPR9080/Curindex.html). Electrostatic surfaces were calculated using APBS: Adaptive Poisson-Boltzmann Solver [50]. All structural figures were prepared using PyMOL.
Figure S1. Comparisons of Ec σE4 and Taq σA4 −35 Element DNA Geometry (A) Propeller twist, (B) DNA buckle, (C) curvature, and (D) major groove width calculated using 3DNA. Found at DOI: 10.1371/journal.pbio.0040269.sg001 (569 KB TIF).
Figure S2. Comparison of Ec σE4 −35 Element DNA and Nucleosome DNA (A) The nucleosome structure contains a sequence similar to the Ec σE4 −35 element DNA. Both DNA sequences contain an AA-tract followed by a non-A/T base and then a TT-tract. Despite the non-A/T base, both structures contain narrow minor grooves, which are characteristic of oligo(dA)·oligo(dT) tracts. The DNA structures were aligned using the template strand phosphates. The minor groove narrowing is evident from the location of the nontemplate strand DNA relative to B-form DNA. The Ec σE4 −35 element DNA is in green and the nucleosome DNA orange. (B) Graph showing the DNA minor groove width (calculated using 3DNA) for B-form DNA (blue), Ec σE4 −35 element DNA (green), and nucleosome DNA (orange). Minor groove width was calculated as the P-P distance minus 5.8 Å to take into account the radii of the phosphate groups. Found at DOI: 10.1371/journal.pbio.0040269.sg002 (2.7 MB TIF).
Figure S3. Correlation of σ4 and −35 Element Sequences, along with the −10 Element Consensus, for Several Group IV σ Factors The top shows a sequence alignment of the proposed −35 element DNA binding region of several group IV σ factors. The residue positions that are important in −35 element DNA recognition in the Ec σE4/−35 element DNA structure are highlighted green (similar to Ec σE) or red (dissimilar to Ec σE). The bottom shows the alignment of the known −10 (right) and −35 (left) consensus sequence logos from several group IV σ factors. The three −35 element regions are highlighted with the upstream G region (blue), the middle AAC motif (red), and the downstream T-rich region (green). Lines connecting the two alignments indicate protein residue-DNA base interactions important for −35 element recognition in the Ec σE4-DNA structure. Despite the −10 elements being more divergent than the −35 elements, it is still possible to generate a proposed −10 element alignment. Possible regions of similarity within the −10 elements have been highlighted in light blue, magenta, and gray. The single base change thought responsible for the differential gene regulation between Bsu σX and Bsu σW is indicated with a red arrow. The column to the right of the sequence logos contains the signal and mechanism of regulation for each σ factor. Found at DOI: 10.1371/journal.pbio.0040269.sg003 (1.7 MB TIF).
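The groove-width convention quoted in the figure legends (P-P distance minus 5.8 Å) is simple to express in code. The sketch below is illustrative and is not the 3DNA implementation: it assumes the caller supplies the coordinates of the appropriate cross-strand phosphorus pair, and the coordinates in the example are hypothetical.

```python
import math

PHOSPHATE_CORRECTION = 5.8  # angstroms, subtracted to account for phosphate radii

def minor_groove_width(p_a, p_b):
    """Groove width in angstroms from two cross-strand phosphorus coordinates.

    p_a and p_b are (x, y, z) tuples in angstroms; which phosphates to pair
    is decided by the caller (assumption, not taken from the 3DNA source).
    """
    return math.dist(p_a, p_b) - PHOSPHATE_CORRECTION

# Hypothetical coordinates giving a P-P separation of exactly 10 angstroms.
width = minor_groove_width((0.0, 0.0, 0.0), (6.0, 8.0, 0.0))
print(f"{width:.1f}")  # 4.2
```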

Accession Numbers
Structure coordinates and structure factors from the Ec σE4/DNA crystals have been deposited in the Protein Data Bank (http://www.rcsb.org/pdb) under ID code 2H27. The Protein Data Bank accession number for the nucleosome structure in Figure S2A is 1KX4.