Indirect DNA Readout by an H-NS Related Protein: Structure of the DNA Complex of the C-Terminal Domain of Ler

Ler, a member of the H-NS protein family, is the master regulator of the LEE pathogenicity island in virulent Escherichia coli strains. Here, we determined the structure of a complex between the DNA-binding domain of Ler (CT-Ler) and a 15-mer DNA duplex. CT-Ler recognizes a preexisting structural pattern in the DNA minor groove formed by two consecutive regions which are narrower and wider, respectively, compared with standard B-DNA. The compressed region, associated with an AT-tract, is sensed by the side chain of Arg90, whose mutation abolishes the capacity of Ler to bind DNA. The expanded groove allows the approach of the loop in which Arg90 is located. This is the first report of an experimental structure of a DNA complex that includes a protein belonging to the H-NS family. The indirect readout mechanism not only explains the capacity of H-NS and other H-NS family members to modulate the expression of a large number of genes but also the origin of the specificity displayed by Ler. Our results point to a general mechanism by which horizontally acquired genes may be specifically recognized by members of the H-NS family.


Introduction
Enteropathogenic Escherichia coli (EPEC) and enterohaemorrhagic E. coli (EHEC) are causal agents of infectious diarrhea. While the former is responsible mainly for infantile diarrhea, EHEC infections are associated with hemorrhagic colitis and may produce a life-threatening complication known as hemolytic uremic syndrome. EPEC and EHEC are non-invasive pathogens that produce characteristic attaching and effacing (A/E) intestinal lesions [1]. The genes required for the formation of A/E lesions are clustered on a pathogenicity island known as the locus of enterocyte effacement (LEE). LEE genes are organized in five major operons (LEE1 to LEE5) and several smaller transcriptional units and they encode the components of a type III secretion system (TTSS), an adhesin (intimin) and its receptor (Tir), effector proteins secreted by the TTSS, chaperones, and several transcription regulators [2]. The first gene of the LEE1 operon encodes the LEE-encoded regulator Ler, which is essential for the formation of A/E lesions in infected cells [3,4] and for the in vivo virulence of A/E pathogenic E. coli strains [5].
Ler (123 amino acids, 14.3 kDa) is the master regulator of LEE expression and is required to activate LEE genes that are otherwise repressed by the histone-like nucleoid structuring protein H-NS [2].
The H-NS protein, best characterized in E. coli and Salmonella, is a member of a family of transcriptional regulators with affinity for AT-rich DNA sequences that mediate the adaptive response of bacterial cells to changes in multiple environmental factors associated with colonization of different ecological niches, including human hosts. H-NS is usually an environmentally-dependent transcriptional repressor. H-NS-mediated repression (usually termed silencing) is alleviated either by alterations in physicochemical parameters (e.g., a transition from low (25 °C) to high (37 °C) temperature), by the activity of proteins that displace H-NS from its target DNA sequences, such as Ler, or by a combination of both. H-NS regulation is strongly associated with pathogenicity; thus, understanding the basis of the selective regulation of virulence genes could lead to sustainable antimicrobial strategies that are less susceptible to acquiring resistance.
In addition to the LEE genes, Ler is also involved in the regulation of other horizontally acquired virulence genes located outside the LEE loci and scattered throughout the chromosome of A/E pathogenic strains [3,6,7]. However, Ler does not regulate other H-NS-silenced operons such as bgl [8] and proU [3]. This observation shows that Ler is not a general antagonist of H-NS, but a specific activator of virulence operons acquired by horizontal transfer (HT). Selective regulation of HT genes has been demonstrated in the plasmid R27-encoded H-NS paralogue (H-NS R27) and in chromosomal H-NS in the presence of a co-regulator of the Hha/YmoA family [9].
The mechanism of Ler-mediated activation has been extensively studied in operons located both within the LEE loci, such as LEE2/LEE3 [10], grlRA [11,12] and LEE5 [8], and outside, including nleA (for non-LEE-encoded effector A) [13] and the lpf1 fimbrial operon [6,14]. These studies suggest that Ler counteracts the silencing activity of H-NS by directly binding to DNA and displacing H-NS from specific promoter regions. Ler does not exert dominant negative effects on H-NS function and there is no evidence of a direct interaction between Ler and H-NS [8]. Despite the wealth of biochemical/biophysical data, including the proposal of a DNA sequence consensus motif for H-NS [15], the lack of structural data on the complexes formed between H-NS or H-NS family members and DNA has until now prevented a detailed understanding of the mechanism of DNA recognition and the basis of the selectivity within H-NS family proteins.
All H-NS-related proteins identified to date are predicted to be organized in two structurally different domains. While the oligomerization domains of Ler and H-NS differ greatly, their DNA binding domains are very similar, thereby suggesting that they account for the similar recognition properties of both proteins, and possibly also for their distinct selectivity. While a possible interplay between protein oligomerization and DNA binding cannot be ruled out, a detailed understanding of the recognition mechanism by individual DNA-binding domains is a prerequisite for further studies.
The C-terminal domain of Ler (CT-Ler) exhibits significant amino acid homology with the C-terminal H-NS DNA-binding domain (CT-H-NS; 36.0% identity, 63.8% similarity) and its deletion abolishes DNA binding [16]. CT-Ler contains a sequence (TWSGVGRQP) similar to the consensus core DNA-binding motif found in H-NS-like proteins (TWTGXGRXP) [17]. Here we present the solution structure of a complex formed by CT-Ler bound to a naturally occurring DNA sequence of the LEE2/LEE3 regulatory region. This is the first report of a DNA complex that includes a member of the H-NS family characterized in atomic detail. Our results reveal that CT-Ler does not participate in base-specific contacts but recognizes specific structural features in the DNA minor groove. The indirect readout mechanism can be extended to H-NS and other H-NS family members and explains their capacity to modulate the expression of a large number of genes. The CT-Ler/DNA structure provides clues for the mechanism by which HT genes may be specifically recognized by members of the H-NS family and illustrates the general features of DNA minor groove readout.

CT-Ler/DNA complex formation
We used a CT-Ler construct encompassing residues 70-116 (Figure 1A). This construct gave rise to a folded and functional domain (Figure S1) with excellent solubility and long-term stability. Residues 117-123 are part of an extension that is dispensable to counteract H-NS repression [18]. NMR spectra of a construct including these residues showed that they are disordered and have no effect on the structure of the folded domain, as seen by the exact coincidence of the cross-peak position of most residues in HSQC NMR spectra of different constructs (Figure S2).
The sequence of the short DNA fragment used to form the complex was based on the regulatory region of the LEE2/LEE3 operons spanning positions -221 to -101. This region was protected by Ler in footprinting experiments [10]. Seven 30 bp long dsDNA fragments, LeeA-LeeG, with a 15 bp overlap between consecutive fragments (Figure 1B, Table S1) were tested for binding to CT-Ler using fluorescence anisotropy. As positive and negative controls, we used two 30-mer duplexes: (GGCAAAAAAC)3, an adenine tract that was previously employed to study the DNA-binding properties of CT-H-NS [19], and (GTG)10 (Figure S3). CT-Ler showed the highest affinities for LeeF and LeeG (Figure 1B) and we further analyzed its binding to the 15 bp overlapping region of these two fragments, namely LeeFG (AAATAATTGATAATA). Fluorescence anisotropy titrations showed small but systematic deviations from the 1:1 model, suggesting simultaneous multiple binding to this DNA sequence (Figure 1C). Since the consensus binding motif proposed for H-NS is only 10 bp long [15], we designed a new 15 bp DNA, LeeH (GCGATAATTGATAGG), containing the central 10 bp of LeeFG flanked by GC base pairs for thermal stability. LeeH partially matches the proposed H-NS consensus sequence (tCG(t/a)T(a/t)AATT) [15]. A good fit to a 1:1 model with an apparent Kd of 1.10 ± 0.05 µM was observed for this duplex (Figure 1C).
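The statement that LeeH only partially matches the proposed consensus can be illustrated with a short sketch. It slides the 10 bp consensus along the LeeH top strand and counts matching positions. The function names are hypothetical, lower-case consensus letters are treated like upper-case ones, and only the strand as written is scanned; these are assumptions of this sketch, not part of the original analysis.

```python
# Allowed bases at each position of the proposed consensus tCG(t/a)T(a/t)AATT [15].
# Assumption: lower-case letters in the consensus are treated like upper-case ones.
CONSENSUS = ["T", "C", "G", "TA", "T", "AT", "A", "A", "T", "T"]
LEEH = "GCGATAATTGATAGG"


def match_scores(seq, consensus=CONSENSUS):
    """Return (offset, matches) for every ungapped alignment on the given strand."""
    width = len(consensus)
    scores = []
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        # Count positions where the base is among those the consensus allows.
        matches = sum(base in allowed for base, allowed in zip(window, consensus))
        scores.append((offset, matches))
    return scores


def best_match(seq, consensus=CONSENSUS):
    """Return the (offset, matches) pair with the highest number of matches."""
    return max(match_scores(seq, consensus), key=lambda pair: pair[1])


if __name__ == "__main__":
    offset, matches = best_match(LEEH)
    print(f"best offset {offset}: {matches}/{len(CONSENSUS)} positions match")
```

Under these assumptions the best ungapped alignment starts at the first base of LeeH and matches 7 of the 10 consensus positions.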

Structure of the CT-Ler/DNA complex
The complex of CT-Ler with LeeH was solved by a combination of NMR and small-angle X-ray scattering (SAXS). The structure determination protocol consisted of the independent calculation of the structure of bound CT-Ler and DNA, followed by intermolecular NOE (iNOE)-driven docking and a final scoring including SAXS data. CT-Ler structures were calculated based on 1302 NOE distance restraints, together with torsion angle restraints and experimentally determined hydrogen bonds. The restraint and structural statistics of the 20 lowest energy structures are shown in Table S2. None of the structures contained distance or dihedral angle violations >0.5 Å or 5°, respectively.
The pattern and intensities of bound DNA NOEs were typical of a B-form. The DNA structure was optimized in explicit solvent using experimental restraints determined in the bound form, starting from canonical B-DNA as described in the Materials and Methods section. The absence of major distortions in the DNA structure caused by CT-Ler binding was confirmed by the good agreement between the experimental SAXS curve of free LeeH and the prediction based on the DNA model extracted from the final complex (Figure S4).

Author Summary
Pathogenic Escherichia coli strains and other enterobacteria carry genes acquired from other bacteria by a process known as horizontal gene transfer. Proper regulation of the genes that are expressed at a given moment is crucial for the success of the bacteria. The protein H-NS is a global regulator that binds DNA and maintains a large number of genes silent until they are required, for example, to sustain the bacteria's colonization of a new host. Ler is a member of the H-NS family that competes with H-NS to activate the expression of a group of horizontally acquired genes that encode a molecular machine used by E. coli to infect human cells. Ler and H-NS share a similar DNA-binding domain and can bind to different DNA sequences. Here, we present the structure of a complex between the DNA-binding domain of Ler and a natural DNA fragment. This structure reveals that Ler recognizes specific DNA shapes, explaining its capacity to regulate genes with different sequences. A single arginine residue is key for the recognition of a narrow DNA minor groove, which is one, though not the only, hallmark of the DNA shapes that are recognized by H-NS and Ler.

The DNA region most affected by CT-Ler binding, identified by the combined chemical shift perturbations of nucleotide protons, is centered in the symmetrical 4 bp AT-tract, AATT (Figure 2A). The largest chemical shift perturbations of CT-Ler (Figure 2B) were observed for residues Val88 to Arg93. The 30 assigned iNOEs involve protein residues located in the region where the chemical shift perturbations were observed. On the basis of these iNOE restraints and the mapped interfaces, 400 CT-Ler/LeeH complex structures were generated as described in Materials and Methods and ranked by energy and NMR intermolecular restraint (irestraint) violations. The quality of the structures was confirmed by comparing the predicted and experimentally determined SAXS curves of the complex. The SAXS profile predicted for the best NMR-derived complex structure is in good agreement with the experimental curve (Figure 3A). The scatter plot in Figure 3B shows that, in general, the best NMR structures also fit SAXS data well. The final ensemble of 20 structures was selected using a scoring function that combined docking energy and measures of the agreement with experimental NMR and SAXS data (red circles). The ensemble is well defined (Figure 3C), with a pairwise RMSD (heavy atoms) of 1.30 ± 0.38 Å. All conformers exhibited good geometry, showed no violations of iNOE distance restraints >0.5 Å and correctly explained the SAXS data. Most of the protein residues are in the core region of the Ramachandran plot. The small irestraint deviations illustrate that the protein-DNA interface is well defined, allowing us to elucidate a molecular basis for CT-Ler/LeeH recognition.
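The selection of a final ensemble by a score that combines docking energy with agreement to NMR and SAXS data can be sketched as follows. The exact scoring function is not given in this section, so the equal-weight sum of z-scores below is purely an assumption, as are the field names and the made-up numbers in the example.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values; returns zeros if all values are equal."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def rank_models(models, keep=20):
    """Rank docking models by a combined score (lower is better).

    Each model is a dict with the hypothetical keys 'name', 'energy',
    'nmr_violation' and 'saxs_chi'. The combined score is an equal-weight
    sum of z-scores, which is an assumption of this sketch.
    """
    terms = ("energy", "nmr_violation", "saxs_chi")
    z = {t: zscores([m[t] for m in models]) for t in terms}
    scored = [(sum(z[t][i] for t in terms), m["name"]) for i, m in enumerate(models)]
    scored.sort()
    return [name for _, name in scored[:keep]]


# Made-up numbers for illustration only; they are not data from the study.
EXAMPLE_MODELS = [
    {"name": "a", "energy": -120.0, "nmr_violation": 0.1, "saxs_chi": 1.1},
    {"name": "b", "energy": -100.0, "nmr_violation": 0.4, "saxs_chi": 1.5},
    {"name": "c", "energy": -110.0, "nmr_violation": 0.2, "saxs_chi": 2.5},
]

if __name__ == "__main__":
    print(rank_models(EXAMPLE_MODELS, keep=2))
```

Standardizing each term before summing keeps one criterion from dominating merely because of its units; other weightings would be equally defensible.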
The structure of DNA-bound CT-Ler contains a central helix (residues 93-101) and a triple-stranded antiparallel β-sheet (β1: 76-78, β2: 84-85, β3: 109-110). The β1-β2 hairpin is connected to the α-helix by a loop (Loop2: 86-92). A turn and a short 3₁₀-helix (105-108) link the helix to the β3 strand. The similarity between the Cα and Cβ secondary chemical shifts of the free and bound forms indicates that the secondary structure is retained upon binding (Figure S5). The overall protein fold is analogous to that previously described for CT-H-NS in the absence of DNA [19].
CT-Ler binds as a monomer, inserting Loop2 and the N-terminal end of the α-helix into the DNA minor groove and contacting the central 6 bp region (A₆A₇T₈T₉G₁₀A₁₁) (Figure 4). The complex buries 953 ± 55.64 Å² of surface area and is stabilized by non-specific hydrophobic and polar contacts, involving mainly the sugar-phosphate backbone and residues of the consensus DNA-binding motif found in H-NS-like proteins. Residues Trp85, Gly89, Arg90 and Pro92 (Figure 1A), highly conserved among H-NS-like proteins, are located in the complex interface (Figure 4B), and all gave rise to iNOE restraints with DNA. A summary of the observed intermolecular contacts is shown in Figure 4D.
The interaction surface of CT-Ler is positively charged and the Arg90 side chain is deeply inserted inside a narrow minor groove (Figure 4B and C). In addition, Arg93 at the N-terminus of the α-helix and the helix-dipole moment itself create a positively charged region that points into the negatively charged minor groove.
The width of the LeeH minor groove varies along the sequence and deviates significantly from the average value of canonical B-DNA (Figure 5). The groove progressively narrows towards the A₇pT₈ base step, and widens at the T₉pG₁₀ base step. The DNA electrostatic potential is modulated by the width of the minor groove. The guanidinium group of Arg90 interacts with the narrowest region of the groove where the electrostatic potential is most negative (Figure 5A and B). The approach of Loop2, where Arg90 is located, is enabled by the adjacent widening of the minor groove.
Sequence-dependent variations of DNA structure can be described in terms of helical parameters, such as roll and helix twist (Figure 5C and D). The roll angle is most negative (−4.64° ± 1.38°) at the A₇pT₈ base step and is small or negative for most of the steps in LeeH except for the pyrimidine-purine base steps, which show large positive values. A series of consecutive small/negative roll angles leads to the narrowing of the minor groove [20]. The groove widening at T₉pG₁₀ can be traced to a combination of positive roll and a small helix twist of 33.8° ± 0.8°, indicating that the segment is slightly unwound with respect to the standard B-form. The region including the A₆A₇T₈T₉ stretch is slightly overwound, with an average helix twist of 37.4° ± 1.6°.
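As a rough worked example of what the twist values mean, the sketch below converts a helix twist into base pairs per helical turn and labels a segment as unwound or overwound relative to a reference. The 36° reference (10 bp per turn) is an assumed value for standard B-DNA that is not stated in the text, and the function names are hypothetical.

```python
# Assumed reference twist for standard B-DNA (10 bp per turn); not given in the text.
B_DNA_TWIST = 36.0


def base_pairs_per_turn(twist_deg):
    """Number of base pairs needed to complete one 360 degree helical turn."""
    return 360.0 / twist_deg


def winding(twist_deg, reference=B_DNA_TWIST):
    """Classify a segment relative to the reference twist."""
    if twist_deg > reference:
        return "overwound"
    if twist_deg < reference:
        return "unwound"
    return "canonical"


if __name__ == "__main__":
    for label, twist in (("T9pG10 step", 33.8), ("A6-T9 stretch", 37.4)):
        print(label, winding(twist), round(base_pairs_per_turn(twist), 2))
```

With these inputs, a twist of 33.8° corresponds to about 10.65 bp per turn (unwound) and 37.4° to about 9.63 bp per turn (overwound).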

Arg90 is essential for Ler binding
To verify the relevance of Arg90 in the interaction, we replaced this residue by glycine (R90G), glutamine (R90Q) or lysine (R90K) and tested the effect of each replacement on the affinity of CT-Ler for LeeH. All CT-Ler variants were properly folded, as determined from NMR, and their interaction with LeeH was measured by fluorescence anisotropy (Figure 6A). The mutated domains showed no affinity for LeeH (R90G, R90Q) or highly reduced affinity (R90K), thereby confirming that Arg90 is an essential residue.
The effect of these mutations on the binding of Ler(3-116), including the oligomerization domain, to the LEE2 regulatory region (positions −225 to +121) was determined using electrophoretic mobility shift assays (EMSA) (Figure 6B). In agreement with the results obtained with the isolated CT-domain, DNA binding by Ler is abolished by R90Q and R90G mutations and strongly reduced in the case of the R90K variant. These experiments confirm the essential role of Arg90 in the context of the oligomeric Ler protein and for the range of binding sequences present in one of its natural targets.

DNA sequence specificity of Ler binding
The structure of the CT-Ler/LeeH complex does not show base-specific contacts. On the contrary, the structure of the complex suggests that CT-Ler recognizes local structural features of the minor groove that may be associated with distinct DNA sequences. In order to gain some insight into the range of DNA sequences that can be recognized by CT-Ler, we measured the dissociation constants of complexes formed by two series of short DNA duplexes related to the LeeH sequence. In the first series we introduced a single base pair replacement in each of the ten central positions of LeeH. Adenines and thymines were replaced by guanines and cytosines, respectively, and guanine in position 10 was mutated to adenine, to preserve the purine-pyrimidine sequence. In the second series, we compared the binding of CT-Ler to several 10-mer duplexes. One of these contained the AT-tract (AATT) that interacts with CT-Ler in the LeeH complex, flanked by GC base pairs to ensure thermal stability. Variants were designed to test the effect of interrupting the AT-tract by TpA steps at a number of positions.
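The design rule for the first series can be written as a short sketch. It assumes that the ten central positions are positions 4 to 13 of the LeeH top strand, which is inferred from the construction of LeeH and from the residue numbering used in the text rather than stated explicitly; the function name is hypothetical, and only the top strand is shown.

```python
LEEH = "GCGATAATTGATAGG"

# Replacement rule from the text: A -> G and T -> C, and G (position 10) -> A,
# so that purines stay purines and pyrimidines stay pyrimidines.
SUBSTITUTION = {"A": "G", "T": "C", "G": "A"}


def single_replacements(seq, first=4, last=13):
    """Return {label: variant} for one replacement at each position first..last.

    Positions are 1-based. The default range 4-13 is an assumption about which
    ten positions count as central. The range contains no cytosine, so no rule
    for C is defined here.
    """
    variants = {}
    for pos in range(first, last + 1):
        base = seq[pos - 1]
        new_base = SUBSTITUTION[base]
        variants[f"{base}{pos}{new_base}"] = seq[:pos - 1] + new_base + seq[pos:]
    return variants


if __name__ == "__main__":
    for label, variant in single_replacements(LEEH).items():
        print(label, variant)
```

Each label follows the usual original-position-replacement convention, so the variant with adenine at position 10 is named G10A.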
Affinity for CT-Ler was measured by fluorescence anisotropy. The results are shown in Figure 7 and the DNA sequences and dissociation constants are listed in Table S3. Figure 7A shows the relative Kd values of the single base-pair replacements of LeeH. The largest effects were observed when the base pairs of A₆ or A₇ were replaced. Replacement of the G₁₀ base pair proved similarly relevant. A smaller effect was observed at the position of T₈. Small non-specific effects were observed in all the remaining sites except that of A₄. The most affected base pairs were at the sites where the minor groove width in LeeH differs most from that of standard B-DNA and define the features that we hypothesize to be recognized by CT-Ler: the narrow groove where the Arg90 side chain is inserted and the wide adjacent region that enables the approach of Loop2. Figure 7B shows the relative dissociation constants of the complexes formed by the 10-mer duplexes. The presence of TpA steps in CGCAATAGCG, CGCTATAGCG and CGCTTAAGCG results in a decrease in the stability of the complexes. The remaining three sequences (CGCAATTGCG, CGCAAATGCG, and CGCAAAAGCG) show AT-tracts of the same length but their affinity for CT-Ler differs. The complex with the A₄ stretch is 2-fold less stable than that containing the AATT motif.
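A relative dissociation constant can be translated into a free energy difference with the standard relation ΔΔG = RT ln(Kd,variant / Kd,reference). The sketch below applies it to the 2-fold difference mentioned above. The function name is hypothetical, and the default temperature of 20 °C is taken from the fluorescence conditions given in the Methods.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_kd_ratio(kd_ratio, temp_c=20.0):
    """Binding free energy difference (kcal/mol) for a given Kd ratio.

    kd_ratio is Kd(variant) / Kd(reference); a positive result means the
    variant complex is less stable than the reference complex.
    """
    return R_KCAL * (temp_c + 273.15) * math.log(kd_ratio)


if __name__ == "__main__":
    print(round(ddg_from_kd_ratio(2.0), 2), "kcal/mol for a 2-fold weaker complex")
```

A 2-fold change in Kd therefore corresponds to roughly 0.4 kcal/mol at 20 °C, which puts the differences between AT-tracts of equal length on an energetic scale.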
The AT-tract in LeeH is terminated by a TpG pyrimidine-purine step. Replacing it by a TpC pyrimidine-pyrimidine step in a 10 bp duplex had only a minor effect on the affinity for CT-Ler (cf. AATT and AATTC in Table S3). Interestingly, replacement of the T₉pG₁₀ step in LeeH by the alternative pyrimidine-purine step, TpA, resulted in a major loss of stability of the complex.

CT-Ler provides insight into DNA binding by H-NS
The DNA binding domains of Ler and H-NS share a high degree of similarity both in sequence and in structure. We carried out experiments to specifically test two key points that are apparent from the analysis of the Ler/LeeH complex, namely the role of the conserved arginine residue (Arg90 in Ler, Arg114 in H-NS) in Loop2 and the requirement for an AT-tract and the effect of interrupting TpA steps.
H-NS Arg114, corresponding to Arg90 in Ler, was mutated to glycine and the affinity towards the −225 to +121 LEE2 region was compared with that of the wild type form by EMSA. As in the case of Ler, replacing the arginine residue in Loop2 results in a substantial loss of affinity (Figure 8A). However, H-NS retains some residual activity even when arginine was replaced by glycine. The requirement for a narrow minor groove in the case of Ler can be assessed by the relative affinities towards the AATT and TATA 10-mer duplexes. Titrations of CT-H-NS with both oligonucleotides (Figure 8) provided dissociation constants of circa 41 µM for the AATT complex and 102 µM, 2-3-fold larger, for the TATA complex. CT-Ler showed similar relative affinities for the same oligonucleotides (Table S3), thereby suggesting that these two domains have similar requirements for a narrow minor groove.
As many H-NS and Ler target sequences may overlap, the relative affinity of the DNA-binding domains of these two proteins is relevant. As the CT-Ler complex studied included only the structured domain, we compared CT-Ler with the CT-domain of H-NS including only residues 95 to 137, excluding linker residues. This H-NS construct is properly folded as shown by the observation of well resolved NMR spectra (Figure 8). The same natural DNA fragment (LEE2 positions −225 to +121) used in EMSA assays with Ler (Figure 6B) and H-NS (Figure 8A) was selected to compare the affinities of the CT-domains of these two proteins. The large number of binding sites for Ler and H-NS in this extended DNA fragment, as shown by footprinting experiments, allows the assessment of the relative overall affinities of the two domains for the whole range of sequences present in one of their common natural targets. The affinity of CT-Ler is larger than that of CT-H-NS, which under the conditions of the experiment hardly caused any retardation (Figure 8C). This observation contrasts with the similar affinity towards the same DNA fragment shown by longer constructs of Ler and H-NS that include the oligomerization and linker domains (cf. Figure 6B and 8C) and highlights the varying relevance of interactions outside the folded CT-domains of these two proteins. The contribution of residues outside of the structured H-NS DNA-binding domain has been previously described [21,22].

Discussion
The structure of the complex between CT-Ler and LeeH shows that DNA shape and electrostatics, rather than base-specific contacts, form the basis for the recognition of the CT-Ler binding site. This mechanism is referred to as indirect readout. Arg90 is a key residue for the CT-Ler interaction with DNA. Its side chain is inserted deep into a narrow minor groove. The requirement for Arg90 is strict in the case of CT-Ler and the R90G and R90Q mutants of Ler are totally inactive. The R90K mutant shows some residual binding, suggesting that a positive charge is required. Arginine interactions with the DNA minor groove have been described in eukaryote nucleosomes [23,24] and in DNA interactions by a nucleoid-associated protein of Mycobacterium tuberculosis [25]. These observations suggest that this mechanism may be universal for indirect DNA recognition of AT-rich sequences. A correlation between minor groove width and the electrostatic potential has been demonstrated, as well as the preference for arginine binding to the narrowest regions, where the electrostatic potential is most negative [23].
For CT-Ler, the narrow minor groove may be provided by a relatively short AT-tract as only the Arg90 side chain has to be inserted. The minimum width in the AATT motif is observed at the ApT step, matching the site where the guanidinium group is inserted. Continuous polyA tracts of 4 (Figure 7) and 6 nucleotides (Figure S3) in length give less stable complexes than sequences combining A and T. However, the presence of highly dynamic TpA steps [26] interrupting the AT-tracts decreases the affinity for CT-Ler. The presence of guanine, with its 2-amino group extending into the minor groove and increasing its width, is also predicted to destabilize the insertion of the arginine side chain. We explored the effect of introducing TpG or TpA steps in the sequence recognized by CT-Ler. Figure 7 clearly shows that an uninterrupted AT-tract is needed for an efficient interaction with CT-Ler. However, a narrow AT-tract is not the only requirement for CT-Ler interaction. The lower affinity of the G10A variant of LeeH shows that, next to the narrow region, a rigid wide minor groove is also required to enable the access of Loop2 delivering the side chain of Arg90 into the narrowest region of the minor groove. Both sequences, T₉pG₁₀ in LeeH and T₉pA₁₀ in the mutated duplex, could adopt wide minor grooves. However, while the former is expected to provide a permanently wide groove, the flexible TpA step may switch between expanded and compressed forms, interfering with the approach of Loop2 directly or indirectly through the entropic penalty associated with the stiffening of the DNA in the complex.
The structure of the complex as well as the affinity data with DNA sequence variants show that CT-Ler recognizes a pattern in the minor groove of DNA formed by two consecutive regions that are narrower and wider, respectively, with respect to standard B-DNA and show the optimal shape and electrostatic potential distribution for binding. This structural pattern is present in the free LeeH DNA fragment as shown by the observation of diagnostic inter-strand NOEs between AdeH2 and ThyH1' protons of A₇/A₂₃ and T₂₅/T₉, respectively, supporting minor groove narrowing both in the free and bound forms of LeeH. Moreover, the SAXS data of free LeeH are better explained by the structure of LeeH in the complex than by the structure of a canonical B-DNA LeeH (Figure S4). Therefore, at least in the case of LeeH, CT-Ler recognizes preexisting DNA structural features following an indirect readout mechanism.
The molecular basis of the preference that H-NS displays for some promoter regions has been extensively studied. AT-tracts were initially postulated to be high affinity sites for H-NS and related to the presence of a narrow minor groove [27]. More recently, two short high affinity H-NS sites with an identical sequence, 5'-TCGATATATT-3', were identified in the E. coli proU promoter [28]. Lang et al. proposed that a 10 bp long consensus sequence (tCG(t/a)T(a/t)AATT) [15] acts as a nucleation site for cooperative binding to more extensive regions. In a recent study, a shorter segment of 5-6 nucleotides comprising only A/T nucleotides was found to be over-represented in genomic loci bound by H-NS in E. coli [29]. The interaction of the H-NS CT-domain, including a few residues from the linker region, with a short oligonucleotide was studied by NMR [22]. The authors concluded that a structural anomaly in the DNA associated with a TpA step was crucial for H-NS recognition.
Our results suggest that AT-tracts and wide TpA steps may be simultaneously required by H-NS family proteins. The correct positioning of a compressed and a widened minor groove region is the specific recognition signal for CT-Ler. Pyrimidine-purine steps tend to widen the minor groove and TpA steps may contribute to its widening, which is required after the AT-tract. However, in the case of Ler, a TpG step was preferred to the TpA step, suggesting that a wide minor groove after the AT-tract is the true structural requirement.
CT-Ler and CT-H-NS showed similar structural requirements: mutation of Arg114 reduced the affinity of the complex, and introduction of TpA steps in the AT-tract caused a similar decrease in stability. This result is consistent with the fact that Ler targets can also be occupied by H-NS. Ler and H-NS bind to multiple sites. An indirect readout mechanism allows recognition of multiple sequences, if they adopt similar minor groove patterns.
The absence of structural changes between the free and bound forms of CT-Ler (Figure S5) [...] with DNA. The dipoles of both indole rings are oriented with their positive end towards the negatively charged DNA backbone and the side chain NH of Trp94 forms a hydrogen bond with the DNA backbone.
We have determined for the first time the structure of a complex formed by the DNA-binding domain of a member of the H-NS family. Our results highlight the similarities in the DNA recognition mechanisms used by CT-Ler and CT-H-NS but also reveal some differences that may contribute to the differential recognition of some genes by Ler and H-NS.

Sample preparation
DNA fragments containing the coding sequence of Ler residues 65-123, 70-116 (CT-Ler) and 3-116 fused to an N-terminal His6-tag were amplified by PCR from EHEC strain O157:H7 and subcloned into the pHAT2 vector. To overexpress CT-H-NS, DNA encoding this fragment (amino acids 95-137) with six histidine residues tagged at its N terminus was amplified by PCR using the full length H-NS construct [30] as template and then subcloned into the pHAT2 vector. Point mutations were generated using the QuikChange site-directed mutagenesis kit (Stratagene).
Ler fragments 65-123, 70-116 and 3-116 and CT-H-NS were overexpressed in BL21(DE3) cells with overnight incubation at 15 °C by induction with 0.5 mM IPTG when an OD600 of 0.7 was reached. For ¹⁵N and/or ¹³C isotopic labeling, cells were grown in M9 minimal media containing ¹⁵NH₄Cl and/or ¹³C-glucose. For 10% ¹³C enrichment we used a carbon source consisting of a 1:10 mixture of ¹²C-glucose/¹³C-glucose [31,32]. Cells were harvested by centrifugation, frozen and resuspended in 20 mM HEPES (pH 8.0), 1 M NaCl, 5 mM imidazole, 5% (v/v) glycerol, treated for 30 min with lysozyme and DNAse and sonicated (6 × 10 s on ice). After centrifugation, the His-tagged fusion proteins were isolated with Ni-NTA beads (Qiagen) and further purified by size exclusion chromatography on a Superdex 75 column in 20 mM sodium phosphate, 150 mM NaCl, 0.2 mM EDTA, 0.01% (w/v) NaN₃ pH 5.7 or 20 mM sodium phosphate, 300 mM NaCl, 0.01% (w/v) NaN₃ pH 7.5. The expression and purification procedure for full length H-NS has been previously described [30]. DNA samples were prepared by hybridization of complementary oligonucleotides purchased from Sigma-Aldrich. Quality control was assessed by MALDI-TOF mass spectrometry. Oligonucleotides were mixed in equimolar amounts and annealed by heating to 92 °C for 4 min and slowly cooled to room temperature.

Fluorescence anisotropy measurements
Changes in CT-Ler intrinsic fluorescence anisotropy were monitored upon DNA addition. All measurements were recorded on a PTI QuantaMaster spectrophotometer equipped with a Peltier cell, using an excitation wavelength of 295 nm to selectively excite CT-Ler tryptophans and emission detection at 344 nm. Fluorescence measurements were performed in 40 mM HEPES (pH 7.5), 60 mM potassium glutamate, 0.01% (w/v) NaN₃ at 20 °C. More details on data acquisition and equipment settings were previously described [33]. For the initial screening of the -221 to -101 regulatory region of LEE2, the apparent fraction saturation of CT-Ler was used to infer DNA binding preferences. To measure the affinity of CT-Ler for 15 bp and 10 bp DNA fragments, titrations were performed at least in duplicate. The fitting was performed assuming a 1:1 binding using the following equations [34]: where A is the observed anisotropy, Af and Ab are the anisotropies of free CT-Ler and the complex respectively, fb is the fraction of bound CT-Ler and Q is the ratio of quantum yields of bound and free forms. Equations 1 and 2 were solved iteratively until the theoretical binding isotherm matched the experimental data. Kd and Ab were considered to be adjustable parameters.
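The two equations referred to above did not survive in this copy of the text. As a minimal sketch, the code below uses textbook expressions that are consistent with the symbol definitions given (A, Af, Ab, fb, Q): an intensity-weighted anisotropy for a mixture of free and bound protein, and the quadratic solution for the bound fraction in a 1:1 equilibrium. These forms, the grid search over Kd and all function names are assumptions of this sketch and may differ from the original Equations 1 and 2 and from the fitting procedure actually used.

```python
import math


def fraction_bound(p_total, d_total, kd):
    """Fraction of protein bound in a 1:1 complex (all values in the same units)."""
    s = p_total + d_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * d_total)) / (2.0 * p_total)


def anisotropy(fb, a_free, a_bound, q=1.0):
    """Intensity-weighted anisotropy; q is the bound/free quantum yield ratio."""
    return ((1.0 - fb) * a_free + q * fb * a_bound) / ((1.0 - fb) + q * fb)


def fit_kd(p_total, d_totals, observed, a_free, q=1.0, kd_grid=None):
    """Estimate (Kd, Ab) by a grid search over Kd.

    For each trial Kd the model is linear in Ab, so Ab is obtained in closed
    form by least squares. The default grid spans 1e-3 to 1e3 in the
    concentration units of the inputs.
    """
    if kd_grid is None:
        kd_grid = [10 ** (e / 50.0) for e in range(-150, 151)]
    best = None
    for kd in kd_grid:
        fbs = [fraction_bound(p_total, d, kd) for d in d_totals]
        # Model: A = base + a_bound * weight
        base = [(1 - fb) * a_free / ((1 - fb) + q * fb) for fb in fbs]
        weight = [q * fb / ((1 - fb) + q * fb) for fb in fbs]
        denom = sum(w * w for w in weight)
        if denom == 0.0:
            continue
        a_bound = sum(w * (a - b) for w, a, b in zip(weight, observed, base)) / denom
        sse = sum((b + a_bound * w - a) ** 2 for b, w, a in zip(base, weight, observed))
        if best is None or sse < best[0]:
            best = (sse, kd, a_bound)
    if best is None:
        raise ValueError("titration contains no point with DNA added")
    return best[1], best[2]
```

A quick self-check is to simulate a titration with `fraction_bound` and `anisotropy` for known parameters and confirm that `fit_kd` recovers them. A continuous optimizer could replace the grid search where finer resolution in Kd is needed.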
NMR spectra for structure determination were recorded on a ~1 mM sample containing a 1:1 complex of uniformly 13C- and 15N-labeled CT-Ler and unlabeled DNA in 20 mM sodium phosphate (pH 5.7), 150 mM NaCl, 0.2 mM EDTA and 0.01% (w/v) NaN3. Backbone and aliphatic assignments of free and DNA-bound CT-Ler were obtained by standard methods. Aromatic resonances were assigned using a 2D 1H-13C-edited NOESY optimized for aromatic resonances. Stereospecific assignments of Val and Leu methyl groups were obtained from a constant-time 1H-13C HSQC on a 10% 13C-labeled protein sample [31]. Non-exchangeable protons of the LeeH duplex bound to CT-Ler were assigned using 2D F1,F2-13C-filtered TOCSY and NOESY spectra in D2O [38]. Exchangeable protons and H2 protons were assigned from a 2D F1,F2-15N/13C-filtered NOESY spectrum in H2O [39]. Free DNA resonances were assigned using 2D DQF-COSY, TOCSY and NOESY spectra. Proton chemical shifts were referenced using 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) as an internal standard, whereas 15N and 13C chemical shifts were referenced indirectly. Chemical shift assignments have been deposited in the BioMagResBank database under BMRB accession number 17729.
Protein backbone dihedral angle restraints were derived using a combination of TALOS [43] and quantitative analysis of 3J(HNHα) couplings obtained from a 3D HNHA spectrum [44]. Restraints on side-chain χ1 angles and stereospecific assignments of Hβ proton resonances were based on 3J(NHβ) couplings, obtained from a 3D HNHB spectrum, in combination with observed intraresidual NOEs using the HABAS routine of the CYANA 2.1 program [45]. 1H-15N HSQC spectra for analysis of the interaction of 15N-labeled CT-H-NS (100 µM) with dsDNA were obtained at 25 °C in 20 mM sodium phosphate (pH 5.7), 150 mM NaCl, 0.2 mM EDTA and 0.01% (w/v) NaN3.
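As an illustration of how a measured 3J(HNHα) coupling constrains the backbone φ angle, the sketch below evaluates a Karplus relation. The coefficients (6.51, -1.76, 1.60 Hz) are the commonly used Vuister and Bax parameterization and are an assumption here; the text does not state which parameterization was applied. The function names and the tolerance are hypothetical.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(HN,Halpha); not taken from this text.
KARPLUS_A, KARPLUS_B, KARPLUS_C = 6.51, -1.76, 1.60

def j_hnha(phi_deg):
    """Predicted 3J(HN,Halpha) in Hz for a backbone phi angle in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return KARPLUS_A * math.cos(theta) ** 2 + KARPLUS_B * math.cos(theta) + KARPLUS_C

def phi_candidates(j_obs, tol=0.5, step=1.0):
    """Scan phi from -180 to +180 degrees and keep angles whose predicted
    coupling lies within tol Hz of the observed value."""
    n_steps = int(round(360.0 / step))
    hits = []
    for i in range(n_steps + 1):
        phi = -180.0 + i * step
        if abs(j_hnha(phi) - j_obs) <= tol:
            hits.append(phi)
    return hits
```

Because the Karplus curve is multi-valued, a single coupling usually maps to several φ ranges, which is why couplings are combined with chemical-shift-based predictions such as those from TALOS.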

Structure calculation and refinement
The structure of CT-Ler was determined by simulated annealing using the torsion angle dynamics program CYANA 2.1 [45], followed by water refinement with CNS 1.2.1 [46,47]. Protein structure calculation was based on Unio'08/CYANA-generated upper distance limits, 3J(HNHα)/3J(NHβ) couplings, and TALOS-derived dihedral angle restraints. Hydrogen bond restraints, based on H/D exchange experiments, the backbone NOE pattern and 13Cα/13Cβ chemical shifts, were also used in the structure calculation. An ensemble of 100 protein structures was generated and the 20 lowest-energy conformers were docked onto a B-DNA model.
The observed overlap and broadening of DNA resonances hampered a complete quantitative analysis of the NOESY spectra of bound DNA. Only a set of 282 well-resolved cross-peaks was converted into distances using initial build-up rates and the cytosine H5-H6 cross-peaks as a reference. Upper and lower limits were defined as ±20% of the calculated distances. The structure of LeeH was fixed as B-DNA and further energy-refined using miniCarlo [48], followed by a 20 ps molecular dynamics refinement in explicit solvent using the Amber force field [49] and including NOE-derived distance restraints. To preserve the helical conformation of the DNA, weak planarity restraints were also introduced. The DNA backbone was constrained to a range typical of B-form and all glycosidic angles were restrained as anti. Hydrogen bond restraints were used for all base pairs in which the imino proton was observed. The complex structure was generated employing 30 iNOEs, supplemented with highly ambiguous intermolecular restraints (AIRs) that were derived from the mapped binding interfaces. A total of 22 intermolecular NOE restraints were simultaneously assigned to the two symmetry-related protons in the AATT central region of the DNA and used as ambiguous restraints. HADDOCK 2.0 [50] was used to generate 2000 structures by rigid-body docking energy minimization, and the 400 structures with the lowest energy were selected for semi-flexible refinement. These 400 structures were finally refined in explicit water including all experimental restraints. Structures were then ranked using the energy-based HADDOCK scoring function (sum of intermolecular electrostatic, van der Waals, desolvation and AIR energies) and the NOE energy term. The quality of these structures was evaluated in terms of the violations of the NOE data and the value χ_i^SAXS, which defines the agreement with the SAXS curve. A final ensemble of 20 structures was obtained by re-scoring the pool of 400 structures using the following scoring function.
where σ_χi and σ_Ei correspond to the root mean squared deviations with respect to the best possible value in χ_i^SAXS and E_i, respectively. Coordinates of the final ensemble were deposited in the Brookhaven Protein Data Bank under the accession number 2lev.
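The idea of re-scoring a pool of models on two criteria at once can be sketched as follows. The combined score used here, a plain sum of the two deviations from the best value, each divided by its root mean squared deviation, is an assumed form chosen for illustration and may differ from the expression actually applied. Taking the minimum of each list as the "best possible value" is likewise an assumption, and all names are hypothetical.

```python
import math

def rmsd_from_best(values):
    """Root mean squared deviation of all values from the lowest (best) one."""
    best = min(values)
    return math.sqrt(sum((v - best) ** 2 for v in values) / len(values))

def rescore(chi_saxs, energies):
    """Return model indices sorted from best to worst combined score.
    chi_saxs[i] and energies[i] describe model i; lower is better for both."""
    if len(chi_saxs) != len(energies) or not chi_saxs:
        raise ValueError("need two non-empty lists of equal length")
    chi_best, e_best = min(chi_saxs), min(energies)
    # Fall back to 1.0 when every model has the same value (zero spread).
    sigma_chi = rmsd_from_best(chi_saxs) or 1.0
    sigma_e = rmsd_from_best(energies) or 1.0
    scores = [
        (chi - chi_best) / sigma_chi + (e - e_best) / sigma_e
        for chi, e in zip(chi_saxs, energies)
    ]
    return sorted(range(len(scores)), key=scores.__getitem__)
```

Selecting the final ensemble would then amount to keeping the first 20 indices returned for the pool of 400 refined structures.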
Minor groove geometry and helical parameters were analyzed using w3DNA [51]. Electrostatic potentials were obtained at physiological ionic strength using DelPhi [52].
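The conversion of DNA NOE build-up rates into distance restraints described earlier in this section can be sketched under the isolated spin pair approximation, in which the rate scales as the inverse sixth power of the distance. The reference distance of 2.45 Å for cytosine H5-H6 is a commonly quoted value and is an assumption here, as are the function names.

```python
# Assumed fixed cytosine H5-H6 distance in angstroms (not stated in the text).
R_REF_H5_H6 = 2.45

def noe_distance(rate, ref_rate, r_ref=R_REF_H5_H6):
    """Distance from an initial NOE build-up rate, calibrated against a
    reference pair of known distance: r = r_ref * (ref_rate / rate)**(1/6)."""
    if rate <= 0 or ref_rate <= 0:
        raise ValueError("build-up rates must be positive")
    return r_ref * (ref_rate / rate) ** (1.0 / 6.0)

def distance_bounds(r, tolerance=0.20):
    """Lower and upper restraint limits as r minus/plus a fractional tolerance
    (20% by default, matching the limits described in the text)."""
    return r * (1.0 - tolerance), r * (1.0 + tolerance)
```

For example, a cross-peak building up 64 times more slowly than the reference corresponds to twice the reference distance, because 64 to the power 1/6 equals 2.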

SAXS data collection and analysis
SAXS data for LeeH and the CT-Ler/LeeH complex were collected on a MAR345 image plate detector at the X33 European Molecular Biology Laboratory beamline (DESY, Hamburg, Germany) [53]. The scattering patterns were measured at 25 °C for 2 min at sample concentrations of 4.6 and 2.7 mg/ml for LeeH and 6.6 and 3.3 mg/ml for CT-Ler/LeeH. A momentum transfer range of 0.018 < s < 0.62 Å⁻¹ was measured. Repetitive measurements indicated that the samples did not suffer radiation damage. Buffer subtraction and the estimation of the radius of gyration, Rg, and the forward scattering, I(0), through Guinier's approach were performed with PRIMUS [54]. The scattering profile of LeeH was obtained by merging the curves at both concentrations. For CT-Ler/LeeH, SAXS profiles at both concentrations were virtually equivalent and only data from the most concentrated sample were used for further analysis. Using Guinier's approach, the radii of gyration of LeeH and CT-Ler/LeeH were estimated to be 15.6 ± 0.1 and 18.2 ± 0.1 Å, respectively. All data manipulations were performed with the program PRIMUS. Using a bovine serum albumin sample (3.3 mg/ml) as a reference, an estimated molecular weight of 18 kDa was obtained for CT-Ler/LeeH (theoretical MW of 16.3 kDa), thereby indicating the presence of a monomeric particle in solution. The agreement of the SAXS curve with various three-dimensional models was quantified with the program CRYSOL [55] using a momentum transfer range of 0.018 < s < 0.40 Å⁻¹.
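The Guinier analysis and the molecular weight estimate can be illustrated with a short sketch. It assumes the convention s = 4π sin(θ)/λ, for which ln I(s) = ln I(0) - (Rg² s²)/3 at small angles, and a nominal BSA mass of 66 kDa; both are assumptions, and the functions below are not the PRIMUS implementation.

```python
import math

def guinier_fit(s, intensity):
    """Least-squares line through ln I versus s**2.
    Returns (Rg, I0). Only points in the Guinier region (roughly
    s * Rg < 1.3) should be passed in."""
    x = [v * v for v in s]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; Rg is undefined")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)

def mw_from_standard(i0, conc, i0_std, conc_std, mw_std=66.0):
    """Molecular weight (kDa) from concentration-normalized forward
    scattering relative to a standard (BSA mass assumed to be 66 kDa)."""
    return mw_std * (i0 / conc) / (i0_std / conc_std)
```

Comparing the resulting mass with the theoretical value for a 1:1 complex is what supports the conclusion that the particle is monomeric.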

Electrophoretic mobility shift assays
The DNA fragment used in this assay (LEE2 positions -225 to +121) was obtained by PCR amplification from EHEC strain O157:H7. The indicated concentrations of PCR-generated DNA and H-NS or Ler proteins were mixed in a total volume of 20 µl of 15 mM sodium phosphate, 100 mM NaCl, 0.01% (w/v) NaN3 pH 7.5. For samples containing full-length H-NS, 1 mM tris(2-carboxyethyl)phosphine (TCEP) was included. After 20 min of incubation at room temperature, glycerol was added to a final concentration of 10% (w/v) and the reaction mixtures were electrophoresed on either 1.5% agarose or 7% polyacrylamide gels in 0.5x Tris-borate-EDTA buffer. The DNA bands were stained with ethidium bromide.

Table S1 DNA fragments used in the initial optimization of the CT-Ler/DNA complex. DNA fragments span the Ler-footprint within the LEE2/LEE3 regulatory region. Only the sequence of one of the complementary strands is shown. (DOC)

Table S2 NMR and refinement statistics. Refinement statistics including the number and type of experimental restraints and the results of quality controls performed using PROCHECK [57] and CRYSOL [55]. (DOC)