Identification of DNA Binding Motifs of the Mycobacterium tuberculosis PhoP/PhoR Two-Component Signal Transduction System

Background The Mycobacterium tuberculosis PhoP/PhoR two-component signal transduction system controls the expression of about 2% of the genome and plays a major role in pathogenicity. However, its regulon has not been well characterized. Methodology/Principal Findings The binding site of the PhoP transcription regulator was identified in the upstream regions of the msl3, pks2, lipF and fadD21 genes by using gene fusions, electrophoretic mobility shift assays and DNase I footprinting experiments. A consensus sequence for PhoP binding was deduced. It consists of two direct repeats, DR1/DR2, associated with a third repeat, DR3, which is important in some cases for PhoP binding to DR1/DR2 but is located at a variable distance from these direct repeats. The DR1/DR2 and DR3 consensus sequences were used to screen the whole-genome sequence for other putative binding sites potentially corresponding to genes directly regulated by PhoP. The 87 genes identified, which encode transcription regulators and proteins involved in secondary metabolite biosynthesis, transport and catabolism, are proposed to belong to the PhoP regulon. Conclusions/Significance A consensus sequence derived from the analysis of PhoP binding to four gene promoter regions is proposed. We show for the first time the involvement of a third direct repeat motif in this binding reaction. The consensus sequence was instrumental in studying the global regulation mediated by PhoP in M. tuberculosis. This analysis led to the identification of several genes that are potentially regulated by this key player.


Introduction
Mycobacterium tuberculosis (MTB), the causal agent of tuberculosis in humans, is one of the leading causes of mortality due to a single infectious agent. MTB is a successful intracellular pathogen that can adapt to changing environmental conditions within the host. It can infect different cell types, in which it may replicate or remain dormant for years. Differential expression of MTB genes is observed during the infection of various cells of the immune system, such as dendritic cells, macrophages and alveolar epithelial cells, during disease progression [1,2,3,4]. We have previously shown that bacterial stress responses are more strongly induced in dendritic cells, whereas genes encoding ribosomal proteins are overexpressed, indicating bacterial multiplication, in macrophages [5].
As in other prokaryotes, two-component signal transduction systems (TCS) are key elements of the adaptive response to various stimuli in the tubercle bacillus. TCS contain an environmentally sensitive histidine kinase (HK) and a response regulator (RR) that is activated by the cognate HK. These systems play a major role in bacterial responses to changing growth contexts. The MTB genome encodes 11 TCS [6]. This relatively small number of TCS probably reflects the intracellular lifestyle of MTB, as the cell environment is less variable than that confronted by soil bacteria, or a certain degree of overlap in signal processing.
The MTB phoP gene has been shown to encode one of the components of a TCS (PhoP/R) playing a major role in virulence [7]. Inactivation of phoP results in an attenuated mutant unable to replicate in animal models and in cells cultured in vitro, but able to persist, unlike auxotrophs. This led to the construction of live candidate vaccines against tuberculosis based on the inactivation of phoP [8]. Analyses of the genomes of known avirulent strains, such as the BCG vaccine or the H37Ra attenuated clinical variant, showed mutations in phoP or phoR resulting in a loss of function, thus confirming the role of phoP/phoR in virulence [9,10,11].
The MTB PhoP regulator belongs to the PhoB/OmpR subfamily, the largest subfamily of response regulators. Members of this subfamily have two domains, an N-terminal regulatory domain and a C-terminal DNA-binding domain (also called the effector domain). Many members of this subfamily have been studied in detail, and most reports have indicated that these response regulators bind DNA as a dimer recognizing tandem repeat binding sites at the -35 region of the regulated promoter [12]. The Bacillus subtilis TCS PhoP/PhoR, which belongs to this subfamily, senses phosphate. It is activated in conditions in which phosphate is limited, and it regulates the expression of 31 genes involved in phosphate utilization [13]. By contrast, in Salmonella enterica serovar Typhimurium, PhoP/PhoQ responds to other stimuli, such as Mg2+ concentration [14], low pH [15,16] and antibacterial peptides [17,18], thereby regulating genes involved in Mg2+ homeostasis and virulence [19]. The stimuli sensed by the MTB PhoP/PhoR TCS are unknown.
Efforts have been made to identify genes regulated by this PhoP/PhoR TCS, by comparing the transcriptomes of phoP null mutants and wild-type parental strains. Two studies identified a large number of genes positively or negatively regulated by PhoP, 70 of which were upregulated [20,21]. However, these two studies identified different, though overlapping, sets of regulated genes. These differences could be due to the use of two different strains in the two studies (H37Rv for one study and MT103 for the other), or to the lack of knowledge of the metabolites acting as stimuli for this TCS, resulting in non-optimal experimental conditions.
Many of the genes upregulated by PhoP are involved in general or lipid metabolism, substrate transport across the plasma membrane and the synthesis of regulators, such as DosR, controlling dormancy. phoP null mutants display deficiencies in the synthesis of sulfatides and diacyl and polyacyl trehaloses. msl3, a polyketide beta-ketoacyl synthase gene, is involved in the synthesis of polyacyl trehaloses, and the pks2 and mmpl8 genes are involved in sulfatide synthesis [21,22]. The disruption of pks2 generated a sulfolipid-deficient mutant that was unable to synthesize hydroxyphthioceranic and phthioceranic acids [23]. The msl3-disrupted mutant was unable to produce the mycolipanoic and mycolipenic acids required for the synthesis of a major class of polyacylated trehaloses. In the absence of these classes of polyacylated trehaloses, which anchor trehaloses to the cell surface, the mutants grew in bead-like aggregates, with no discernible decrease in either growth rate or virulence [24]. phoP null mutants and H37Ra lack these complex lipids. The PhoP/PhoR TCS also controls secretion of the early-secreted 6 kDa antigen (ESAT-6), an important virulence factor and antigenic component of M. tuberculosis [9].
The two transcriptomic analyses suggested that the lipF and fadD21 genes were positively regulated by PhoP [20,21]. The lipF gene of MTB encodes an esterase that has been shown to be important in pathogenesis. The insertion of a transposon between the lipF promoter region and the transcription start site significantly decreased the ability of the bacterium to grow in mouse lungs [25].
PhoP is autoregulated: it binds in a sequence-specific manner to two adjacent 9-bp direct repeat motifs. These motifs are located downstream from the transcription +1 site, contrasting with the models proposed for other members of the PhoB/OmpR TCS family [26]. We investigated PhoP binding sites and their molecular regulation in more detail, by looking for such sites in other promoters thought to be regulated by PhoP on the basis of previous transcriptome analyses.
We first used gene fusions to identify the msl3, pks2, lipF and fadD21 gene promoters. DNA-affinity electrophoretic mobility shift studies and DNase I protection assays were then used to identify PhoP binding sites. These sites were found to contain tandem repeat sequences displaying similarities that could be used to define a consensus binding motif. We then carried out whole-genome analysis with these consensus sequences, to identify genes regulated directly by PhoP.

Identification of promoter regions
We used lacZ as a reporter gene, to measure the activity of M. tuberculosis promoters thought to be regulated by PhoP in M. smegmatis, a non-pathogenic and fast-growing mycobacterial species.
Several fragments corresponding to upstream and structural gene regions of pks2 (-86, +60), msl3 (-282, +28) and fadD21 (-198, +10) were generated (Table 1). These fragments were inserted upstream from the lacZ reporter gene in the pJEM15 E. coli-mycobacterial reporter shuttle plasmid. For lipF, a 611 bp region (-596, +14) was used, because this region has previously been reported to be required for the upregulation of transcriptional activity in response to acid stress. This upregulation has been demonstrated in both pathogenic M. tuberculosis and non-pathogenic M. smegmatis strains [27]. Cultures were set up of M. smegmatis transformed with the empty pJEM15 vector, with the positive-control vector pJEM31 (PAN), which contains a promoter sequence, PAN, isolated from M. paratuberculosis [28], or with recombinant pJEM15 plasmids carrying lipF, pks2, msl3 or fadD21 gene fragments. Aliquots of cultures were collected during the exponential growth phase and beta-galactosidase activities were assessed to determine transcription levels.
As indicated in Fig. 1, expression levels were higher for M. smegmatis transformed with the recombinant plasmids carrying pks2, msl3, lipF or fadD21 gene fusions than for M. smegmatis transformed with the empty pJEM15 vector, for which no significant expression was detected. With the plasmid pJEM31 (PAN) used as a positive control, we observed expression of a magnitude similar to that observed for pks2, msl3 and fadD21. Much lower expression was observed for M. smegmatis carrying the lipF fusion, under the conditions tested. These results demonstrate that each of the cloned regions fused to the lacZ gene contains a promoter.

Identification of PhoP binding sites
We investigated whether PhoP bound directly to the DNA sequences upstream from lipF, pks2, msl3 and fadD21 identified in gene fusion studies, by carrying out electrophoretic mobility shift assays (EMSA) with phosphorylated PhoP protein (PhoP-P). Previous studies have shown that PhoP must be phosphorylated for efficient binding to specific sites. We also carried out EMSA with the regulatory binding region of the phoP gene itself, Protect40, as a positive control [29,30]. Only positive binding results are shown in Fig. 2. We started by using PCR fragments of about 200 bp in size for EMSA (Table 1, Table 2, and Fig. 2A). For the lipF promoter region, we generated three 200 bp subfragments: lipFa (-596 to -375), lipFb (-394 to -168) and lipFc (-167 to +14). Only lipFa displayed a clear shift in electrophoretic mobility in the presence of PhoP-P (Fig. 2A and Fig. S1). For msl3, we generated two subfragments: msl3a (-282 to -49), which was about 200 bp in length, and msl3b (-67 to +28), which was about 100 bp long. Only msl3a displayed a clear shift in electrophoretic mobility in the presence of PhoP-P (Fig. 2A).
The 147 bp (-86 to +60) pks2 and the 209 bp (-198 to +10) fadD21 fragments used in beta-galactosidase experiments also displayed a clear shift in electrophoretic mobility in the presence of PhoP-P (Fig. 2A). To address relative affinity differences between the DNA substrates, we varied the protein to DNA ratio to obtain a complete shift. The protein concentrations used ranged from 0 to 8 µM, for a constant DNA concentration of 40 nM. All five regulatory regions studied (lipF, pks2, msl3, fadD21 and phoP) bound PhoP-P, as shown by the detection of higher molecular weight bands. For msl3a, fadD21 and pks2, a PhoP-P:DNA ratio of 100:1 was sufficient to obtain an almost complete shift, whereas a ratio of 200:1 was required for lipFa. The various DNA segments thus have different affinities for PhoP-P (Fig. 2). A faint second band of higher molecular weight was observed for Protect-40, but not for the other targets analyzed here. All reactions were tested for the requirement for phosphorylation (Fig. 2). Assays with non-phosphorylated PhoP are shown in Fig. S2. No significant binding was observed in the absence of phosphorylation. Using the data obtained with the large fragments, we carried out a sequence alignment analysis of all the fragments shown to bind PhoP-P. Regions similar to the DR1/DR2 and DR3 repeats previously identified in the phoP promoter [26] were observed.
We applied trimming and walking techniques to the initial oligonucleotides to generate subsequent fragments of 50 to 100 bp in size (Fig. 2B, Fig. 3) and proceeded with PhoP-P binding.
The results from the EMSAs on the subfragments were analyzed in the context of the presence or absence of the DR repeats. In addition to lipFa, we also tested two shorter DNA regions for binding: lipFa1 (-575 to -508) and lipFa2 (-465 to -377). Binding was observed only with lipFa1, which contains DR1-3, at a PhoP-P concentration of 2 µM, but not with lipFa2. For pks2, we tested four additional shorter DNA regions: pks2a (-52 to +6), pks2b (-95 to -56), pks2c (+10 to +86) and pks2d (-86 to +15). Binding was observed for pks2d, which contains a partial DR1 and DR2-3 (at a PhoP-P concentration of 2 µM). Only very weak binding was observed for pks2a, which contained a DR3 sequence alone (at a PhoP-P concentration of 10 µM), and for pks2b, which contained only DR1 and DR2 sequences (at a PhoP-P concentration of 20 µM). pks2c, which does not contain any of the DRs, did not give any binding (Fig. 2B). We studied msl3a binding in more detail with two additional DNA fragments: msl3a1 (-242 to -198) and msl3a2 (-269 to -188). Binding was observed for both of these fragments: msl3a2 contains the entire DR1-3 (binding observed at a PhoP-P concentration of 2 µM), whereas msl3a1 contains only the first 3 bp of DR3 (binding at a PhoP-P concentration of 2 µM). Finally, fadD21 was studied by testing two additional DNA regions: fadD21a (-152 to -98) and fadD21b (-145 to -61). Binding was observed for fadD21b, which contained all three DR sequences (PhoP-P concentration of 2 µM). Only very weak binding was observed for fadD21a, which contained only DR1 and partial DR2 (PhoP-P concentration of 10 µM), as for pks2b, which also contained only DR1-2 (PhoP-P concentration of 20 µM). However, in assays of PhoP-P binding to Protect40, full binding was observed when we used a DR1-2-containing fragment (PhoP-P concentration of 4 µM).
Interestingly, reducing the size of the DNA targets (45-102 bp) resulted in a decrease in the protein/DNA ratio required to obtain a shift (only 50:1; Fig. 2B), showing that the affinity increased as we narrowed down the targets to match the DRs.
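The protein-to-DNA ratios quoted in this section are simple molar ratios. As a minimal sketch, assuming that the PhoP-P concentrations are micromolar (the reading under which ratios of 100:1 and 200:1 fall within the stated 0 to 8 range for 40 nM DNA), the arithmetic can be written as follows. The function names are hypothetical and are not part of the study.

```python
def molar_ratio(protein_nM, dna_nM):
    """Protein:DNA molar ratio, with both concentrations given in nM."""
    if dna_nM <= 0:
        raise ValueError("DNA concentration must be positive")
    return protein_nM / dna_nM


def protein_needed_uM(ratio, dna_nM=40):
    """PhoP-P concentration (in µM) that gives the requested ratio.

    The default of 40 nM DNA is the constant DNA concentration
    reported for the EMSA experiments.
    """
    return ratio * dna_nM / 1000


# Assumed reading of the text: 2 µM PhoP-P on 40 nM DNA is a 50:1 ratio.
print(molar_ratio(2000, 40))
# Concentration corresponding to the 200:1 ratio reported for lipFa.
print(protein_needed_uM(200))
```

Under this reading, the 2 µM concentration used for the short fragments corresponds to a 50:1 ratio, and the 200:1 ratio needed for lipFa corresponds to the upper end of the tested range.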
The transcription initiation sites were determined for lipF, pks2 and msl3 in previous studies [31,32]. This allowed us to locate DR1, DR2 and DR3 with regard to the transcription start.
We investigated the importance of the repeat motifs in sequence-specific DNA binding by PhoP, using oligonucleotides Pho1, Pho2 and Pho3, in which altered nucleotide sequences replaced DR1, DR2 or both DR1 and DR2, respectively, thereby removing one or both of the DR1-2 repeats (Table 2, Fig. 2C and 3E). The Protect-40 wild-type sequence was used as a control (Fig. 2C). The Pho1-Pho3 altered sequences severely impaired PhoP-P binding. These results confirm that the binding of PhoP to the phoP promoter region is dependent on the direct repeat motifs.

Characterization of the PhoP binding site by DNase I footprinting
We defined the DNA regions binding PhoP more precisely, by carrying out DNase I protection assays on the PhoP-P/DNA target complex. We used the lipF, pks2, fadD21 and msl3 promoter regions for these protection experiments. PhoP-P binding enhanced the protection of these four promoters against DNase I digestion (Fig. 4). The protected regions for pks2, msl3 and fadD21 were approximately 50 to 70 bp long (Fig. 4). For pks2 and fadD21, these regions cover the DRs and the spacer region between them, including a complete DR3 motif. In the case of msl3, DR3 is partially protected, up to position A7, which is known to be present in all the DRs (Table 3, [26]). The protected region of lipF is only 30-35 bp in length and the DR3 region is not protected by the PhoP protein.

DR sequence analyses
Using two different approaches (see Materials and Methods section), we identified three motifs matching the original DR1, DR2 and DR3 repeats of the phoP promoter. These repeats are also present in all target regions upstream of the lipF, pks2, msl3 and fadD21 genes (Fig. 5). The degree of sequence identity to the consensus of the canonical DR1-3 motifs of the phoP promoter region [23] ranged from 56% to 89% (Table 3). DR1 was the most conserved motif, followed by DR2, and then DR3. The DR1 and DR2 motifs were separated by 1 to 5 bp. By contrast, DR2 and DR3 were separated by 13 to 35 bp (Fig. 5). Taking this into account, we analyzed the distribution of the consensus sequences for each motif throughout the MTB genome. Motif configurations with only DR1 and DR2, or with DR1, DR2 and DR3, were observed in the upstream regions of 87 genes, accounting for 83.9% and 16.1% of total cases, respectively. The identified genes were analyzed according to the classes of the Clusters of Orthologous Groups (COGs) classification [29,30] (Fig. 6A, Table S1). Using the Tuberculist data, we further investigated the function of genes that are not listed in COGs (38 genes, Fig. 6B). Some of the identified genes have already been reported to be regulated by PhoP on the basis of transcriptomic data [20,21]. Based on the COG classification, we found that PhoP plays a key role as a putative transcription regulator mostly for genes involved in transcription (21%), secondary metabolite biosynthesis, transport and catabolism (14%), lipid transport and metabolism (9%), and coenzyme transport and metabolism (9%) (Fig. 6A).
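The spacer constraints described above lend themselves to a small classification routine. The following is a minimal sketch, not the script used in this study: it assumes that candidate 9-bp sites have already been located in an upstream region, that the repeats occur in the order DR1, DR2, DR3 along the sequence, and that spacers are counted as the number of bases between the end of one site and the start of the next. All names are hypothetical.

```python
from collections import Counter

MOTIF_LEN = 9                  # each direct repeat is 9 bp long
DR12_SPACER = range(1, 6)      # 1 to 5 bp between DR1 and DR2
DR23_SPACER = range(13, 36)    # 13 to 35 bp between DR2 and DR3


def classify_configuration(dr1_starts, dr2_starts, dr3_starts):
    """Classify one upstream region from candidate site positions.

    Each argument is a list of 0-based start offsets of candidate sites.
    Returns 'DR1/DR2/DR3', 'DR1/DR2', or None if no DR1/DR2 pair
    satisfies the spacer constraint.
    """
    best = None
    for s1 in dr1_starts:
        for s2 in dr2_starts:
            if s2 - (s1 + MOTIF_LEN) not in DR12_SPACER:
                continue
            best = best or "DR1/DR2"
            if any(s3 - (s2 + MOTIF_LEN) in DR23_SPACER for s3 in dr3_starts):
                return "DR1/DR2/DR3"
    return best


def configuration_shares(labels):
    """Percentage of regions in each configuration, ignoring None."""
    counts = Counter(label for label in labels if label)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

For instance, `classify_configuration([10], [21], [45])` returns `'DR1/DR2/DR3'`, because the DR1/DR2 spacer is 2 bp and the DR2/DR3 spacer is 15 bp. The reported shares of 83.9% and 16.1% are what `configuration_shares` gives for 73 and 14 regions out of 87; these counts are inferred here from the percentages and are not stated in the text.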

Discussion
All living organisms have developed regulatory networks allowing them to survive in different environments, through rapid adaptation to hostile conditions and the conservation of energy by blocking pointless biochemical biosynthesis. TCS regulators, which have an environmentally responsive component, play a major role in this process. One of the advantages of this mode of control is that it allows the expression of complex traits or actions, such as membrane biosynthesis, host defences escape or adaptation, to be coordinated. This regulation results in changes to the patterns of expression of multiple genes [33,34].
The MTB PhoP/PhoR TCS regulates the expression of more than 100 genes, based on differential gene expression data [20]. This includes genes encoding proteins involved in polyketide synthesis, for which differences in expression have been observed between PhoP null mutants and wild-type strains.
We aimed to identify genes directly regulated by PhoP. Transcriptome analyses have identified groups of genes regulated by PhoP, including genes encoding regulators that might function as intermediates in regulatory cascades [20,21]. An analysis and comparison of these groups showed that they had only a few genes in common. This result reflects the problems involved in dealing with genes displaying low-level regulation and the importance of taking into account different experimental conditions and different strain genotypes. Indeed, as PhoP plays a key role in the regulation of several virulence factors, its activation varies with stressful conditions and the nature of the stimulus of the cognate sensor. We used a combination of genetic, biochemical and bioinformatics approaches to define PhoP binding sites as hallmarks of PhoP-regulated genes. We studied lipF, msl3, pks2 and fadD21, which were shown to be upregulated by PhoP in genetic [29] and transcriptomic studies [20,21]. However, there has been little evidence concerning the mechanism of regulation by PhoP in terms of DNA recognition sites or binding to selected genes, and little effort to search the genome for genes with sub-threshold regulation levels.

[...] phosphorylated PhoP protein, as unphosphorylated MTB PhoP binds weakly to oligonucleotides derived from the lipF, pks2, msl3 and fadD21 promoter sequences, as previously described [31,32,35]. These observations are similar to those for PhoB from E. coli, in which PhoR and PhoB constitute a two-component system controlling phosphate uptake [35]. The first study on the interaction of the PhoP protein with its own promoter revealed the presence of three direct repeat motifs (DRs) [26]. The first two direct repeats were later shown to be sufficient for this interaction [30,32]. More recently, analyses of the promoter regions of the pks2 and msl3 genes revealed the presence of two DR motifs that are needed for recognition by the PhoP protein [31]. However, the motifs published by Goyal et al. (2011) differ from those that we identified. Although the DR regions differ, the two studies identified, overall, the same protected region. The main difference is that we took into consideration the previously confirmed DR regions identified in the promoter region of the phoP gene. This led us to clearly identify the presence of DR1-3 in some of the promoters that bound PhoP-P, and also allowed us to observe, in some cases, a loss of binding when DR3 was missing. This is confirmed by our footprinting assays and gel shift experiments.
In this study, we identified three DRs, namely DR1, DR2 and DR3, in the promoter regions of all the genes studied. In addition, the DR3 repeat is essential for the binding of PhoP to the pks2 and fadD21 DR1-2 repeats. This result is consistent with a potential cooperative interaction between DR3 and DR1-2. Comparison of all the DR motifs identified suggests that DR3 assists binding to all PhoP-P-regulated sites showing low levels of similarity to the phoP DR1 and DR2 sequences (Table 3). Moreover, the ratio of PhoP to DNA required to achieve this binding is variable, thus resulting in additional fine-tuning of expression for different genes under the control of PhoP. [...] 2007) overlaps with the newly assigned DR3 box described here. Based on the distance between the lipF DR1/DR2 sites and the transcription start site, we suggest that a tandem head-to-tail PhoP dimer, close to the -35 region, binds to the DR1/DR2 repeats, which occupy adjacent positions on the same face of the DNA. The same configuration is also observed for pks2. A similar situation has been described for E. coli PhoB, which activates transcription by interacting with the sigma 70 subunit of the RNA polymerase in promoters in which the -35 sigma recognition element is replaced by the Pho box.
PhoP dimerization would bring the two copies of the DNA-binding domain close to each other, facilitating binding to DR1 and DR2, which are separated by only two nucleotides in this case. The presence of DR1/DR2 PhoP binding sites that overlap the -35 transcription region of lipF and pks2 might be indicative of a class II activation mechanism, in which the regulator PhoP binds region 4 of the sigma factor [36]. This configuration might result in the recruitment of the RNA polymerase to the promoter and the subsequent initiation of transcription. The presence of a DR1/DR2 PhoP binding site upstream of the -35 transcription region for msl3 might involve another mechanism of transcription activation that remains to be investigated.
With the use of various bioinformatics analyses, we fine-tuned the three consensus sequences for the individual DR1, DR2 and DR3 sites (Fig. 6). Only DR1 and DR2 were highly conserved in all analysed regions. We used these confirmed consensus sequences, taking into account the length of the spacer sequences between them, to investigate the distribution of these motifs throughout the Mycobacterium tuberculosis genome, in order to identify all genes that could potentially be regulated directly by the PhoP/PhoR TCS. We found these motifs in the upstream regions of 87 genes, including genes already reported to be regulated by PhoP [21]. The classification of all identified genes into COG categories revealed a predominance of genes encoding proteins involved in transcriptional regulation (Fig. 6). This result reflects and confirms the role of phoP as a key player in the virulence of tubercle bacilli by controlling the activity of several genes. Indeed, PhoP has been shown to regulate the differential transcription of up to 600 genes in E. coli [37]. Genes encoding proteins involved in the biosynthesis of polyketide-derived lipids were also well represented, providing additional support for our approach. These genes have already been reported to be under the control of the phoP/phoR regulon [20,21]. Some of the other genes identified were classified as involved in lipid biosynthesis pathways. This finding is consistent with previous studies showing that synthesis of the lipid components of the cell surface is regulated by PhoB in E. coli.

Table 3. Identification of the DR1-3 sites on the DNA promoters recognized by PhoP.

Promoter region   DR1                     DR2                     DR3
msl3              TCTGGTAGC  8/9: 89%     CATGGCAAC  7/9: 78%     AATGTGTTC  5/9: 56%
fadD21            TGTTTCAGC  7/9: 78%     ATGCACAGC  6/9: 67%     ATGATCAGC  7/9: 78%
lipF              ACGTACAGC  8/9: 89%     ACTCCCAGT  6/9: 67%     CCTGTGATC  6/9: 67%
pks2              CCCAGTAGC  6/9: 67%     CCGCTTAGA  6/9: 67%     AGACACAGC  5/9: 56%
phoP              ACTGTTAGC  9/9: 100%    ACTGGCAAC  9/9: 100%
Consensus sequence of phoP: A C T/G T/G T/G Py A Pu C

For each DNA sequence shown to bind PhoP-P, sequences with a complete or partial match to the DR1, DR2 and DR3 sites were identified (underlined). The degree of conservation of the consensus sequence was calculated and is shown as the ratio of matching nucleotides to the consensus: % identity. doi:10.1371/journal.pone.0042876.t003
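The identity scores in Table 3 can be reproduced by comparing each 9-bp site, position by position, with the degenerate phoP consensus. The sketch below is illustrative only and is not the code used in this study; it assumes that Py means C or T and Pu means A or G, and the function names are hypothetical.

```python
# Degenerate consensus from Table 3: A C T/G T/G T/G Py A Pu C,
# written as the set of bases allowed at each of the nine positions
# (assumption: Py = C or T, Pu = A or G).
CONSENSUS = ["A", "C", "TG", "TG", "TG", "CT", "A", "AG", "C"]


def identity_to_consensus(site, consensus=CONSENSUS):
    """Return (matches, length) for a site compared with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    matches = sum(
        base in allowed for base, allowed in zip(site.upper(), consensus)
    )
    return matches, len(consensus)


def format_identity(site):
    """Format the score in the style of Table 3, e.g. '8/9: 89%'."""
    matches, length = identity_to_consensus(site)
    return f"{matches}/{length}: {round(100 * matches / length)}%"


# msl3 DR1 and DR3 sites, as listed in Table 3.
print(format_identity("TCTGGTAGC"))
print(format_identity("AATGTGTTC"))
```

Scoring against allowed-base sets rather than a single reference sequence means that either T or G counts as a match at positions 3 to 5, which is what the degenerate consensus implies.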
Our analysis provides information that cannot be obtained easily by transcriptomic approaches, because the stimuli of the PhoP cognate sensor have yet to be identified and because transcriptional regulators are often only weakly expressed. The identification of genes directly regulated by PhoP paves the way for the characterization of genes involved in pathogenicity. The role of these genes should be further investigated by genetic approaches and may be used as a starting point for the design of new drugs and vaccines.

Plasmid construction
For construction of the plipF, ppks2, pmsl3 and pfadD21 promoter-lacZ fusions, we amplified the corresponding fragments from M. tuberculosis H37Rv genomic DNA by PCR with the primers shown in Table 1. The PCR products were digested with BamHI and KpnI and inserted into the corresponding sites in the pJEM15 E. coli-mycobacterial shuttle plasmid [39]. We used pJEM31-PAN, containing a promoter sequence, PAN, isolated from M. paratuberculosis, as a positive control. This promoter lies adjacent to, and does not overlap, the 3' end of an IS900 mobile genetic element [28]. The plasmids were used to transform M. smegmatis mc²155. The strains were grown at 37 °C in 7H9 medium supplemented with kanamycin (20 µg/ml), until they reached an optical density at 600 nm (OD600 nm) of 0.6-0.8. The cells were collected by centrifugation and beta-galactosidase activity in the cell extract was evaluated to determine the levels of transcription from the various promoter regions studied (lipF, pks2, msl3 and fadD21).

Quantification of beta-galactosidase activity
Mycobacterial strains were grown to the exponential growth phase (OD600 nm = 0.8). The cultures were centrifuged, and the [...]

Electrotransformation of mycobacteria
A 50 ml culture of bacteria was grown to an OD600 nm of 0.6 to 0.8. The cells were collected by centrifugation, washed twice in 10% glycerol and resuspended in 2 ml of 10% glycerol. Aliquots (400 µl) were electroporated with vector DNA in 0.2 cm-path length cuvettes (Bio-Rad), with a single pulse (2.5 kV, 25 µF, 1000 Ω). Cells were transferred to 1 ml of 7H9-ADC-0.05% Tween 80 and incubated for 2 hours at 37 °C.

Production, purification and phosphorylation of PhoP
The PhoP protein was produced in E. coli BL21(DE3) pLysS (Agilent Technologies), from the pET15b expression plasmid (Novagen, Merck Chemicals France) [30]. PhoP overproduction was induced by adding 1 mM IPTG (Euromedex, France) to LB medium and incubating for 3 hours at 30 °C. The cells were collected by centrifugation at 5,500 rpm (SLA-4000 rotor, RC5c (Sorvall) centrifuge) for 30 min and resuspended in 50 mM Tris-HCl pH 8.0, 1 M NaCl, 5 mM KCl, 10% glycerol, 5 mM imidazole, 100 µM phenyl methyl sulfonyl fluoride (PMSF) (Buffer A). The cells were lysed by passage through an Emulsiflex-C5 (Avestin Europe) at 4 °C. PhoP was purified on a 5-ml HiTrap Chelating HP (GE Healthcare) affinity column loaded with 0.5 M NiSO4 and equilibrated with Buffer A, on an AKTA-prime FPLC machine (GE Healthcare). The histidine tag was cleaved overnight at 4 °C, with a Thrombin CleanCleavage kit (Sigma Aldrich). The released PhoP was further purified by a second passage through the 5-ml HiTrap Chelating HP Ni-affinity column. Pure protein (>95%, based on SDS-PAGE) was obtained after several rounds of FPLC, and stored at -80 °C in 50 mM Tris-HCl pH 8, 150 mM NaCl, 5 mM KCl and 20% glycerol.

DNA fragment synthesis and purification
Double-stranded DNA fragments (ds-DNAs) were labeled at the 5' end by PCR with commercially synthesized 5'-labeled IRD700 or DY682 (Eurofins MWG Operon, Germany) primers and the various templates described in Tables 1 and 2. ds-DNAs were purified by electrophoresis in 8% polyacrylamide gels, electroeluted in 0.5× TBE (Biosolve, Netherlands) through 10 kDa MWCO (molecular weight cut-off) dialysis tubing (Spectra/Por, Spectrum Labs, USA), and dried on a Speed Vac. The DNAs were resuspended in 60 mM sodium acetate pH 5.4, precipitated in 100% ethanol, washed in 70% ethanol and recovered in 10 mM Tris pH 7.8, 0.1 mM EDTA (Gibco). The DNA concentrations were deduced from A260 nm before use or storage.

Electrophoretic mobility shift assay (EMSA)
Aliquots of 40 nM labeled ds-DNAs were separately mixed with various concentrations of PhoP or PhoP-P, in a final volume of 30 µl, in the presence of poly dI-dC at 10 µg/ml. The mixture was dialyzed in several steps against 50 mM Tris-HCl, pH 8.0, 1 mM dithiothreitol (DTT), 1 mM EDTA, 100 µM PMSF, with decreasing concentrations of NaCl (750 mM, 250 mM and 150 mM, respectively), at 4 °C. The DNA/PhoP(-P) complex obtained at the end of dialysis was mixed with 3 µl of gel loading buffer (0.1% bromophenol blue and 40% sucrose), loaded onto a 20 cm × 20 cm 8% native polyacrylamide gel and subjected to electrophoresis at 180 V/cm for 3 hours at 18 °C, with 0.5× TBE used as the running buffer. The gels were then transferred onto blotting paper (Whatman 3MM CHR) and scanned on an Odyssey® Infrared Imaging System at 700 nm (Li-cor Inc, NE, USA).

DNaseI footprinting experiments
DNA probes were prepared by amplifying fragments of the lipF, pks2, msl3 and fadD21 upstream regions by PCR with the lipFDF/lipFDR, pks2DF/pks2DR, msl3DF/msl3DR and fadD21DF/fadD21DR primers, respectively, using M. tuberculosis chromosomal DNA as the template. In each case, the 5' end of the forward primer (lipFDF, pks2DF, msl3DF and fadD21DF) was labeled with [γ-32P]ATP, using T4 polynucleotide kinase. Before the DNA binding reaction, PhoP-P was obtained as described above and dialyzed against a buffer containing 20 mM Tris pH 8, 1 mM EDTA, 200 mM NaCl, 50% glycerol. DNase I footprinting reactions were performed as previously described [25]. Briefly, 0 to 15 pmol of PhoP-P was mixed with 0.2 pmol of DNA and incubated with DNase I at room temperature (~24 °C) for 1 min. The samples were analyzed by electrophoresis on a 6% polyacrylamide gel containing 7 M urea. Maxam and Gilbert sequencing ladders (G+A) were also loaded on the same gel.

Sequence analysis
We narrowed down the DNA sequences potentially recognized by PhoP-P (to ~60 bp) by EMSA, with iterative sequence alignment to DR1 and DR2 (EMBOSS package, ClustalW).
The results were confirmed by footprinting. The resulting sequence datasets were then analyzed further, by two different approaches. In the first, we used a Python script (this study) to search the identified regions for motifs identical or similar (no more than three nucleotides of difference) to those already identified in the upstream region of the phoP gene [20,26,32]. In the second, we analyzed these regions with the MEME (Multiple EM for Motif Elicitation) program [40], which uses the expectation-maximization (EM) algorithm for iterative improvement of a model of the motif. Finally, we constructed consensus motif sequences with WebLogo 3 [41,42]. The distribution of these consensus sequences in the Mycobacterium tuberculosis genome was analyzed with a Python script (this study) by scanning the 800 bp regions immediately upstream of all M. tuberculosis H37Rv genes.
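The scripts used in this study are not reproduced here. As a hedged illustration of the kind of search described above, the following sketch extracts the region upstream of a gene and reports every window that differs from a reference motif at no more than three positions. It treats the genome as a linear string, uses 0-based coordinates with an exclusive end, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=800):
    """Return up to `length` bases immediately upstream of a gene.

    genome[start:end] is the gene as it lies on the forward strand.
    For a '-' strand gene, the upstream region lies to the right of
    `end` and is returned as its reverse complement.
    """
    if strand == "+":
        return genome[max(0, start - length):start].upper()
    return reverse_complement(genome[end:end + length])


def find_similar_sites(region, motif, max_mismatches=3):
    """List (offset, site, mismatches) for windows close to the motif."""
    region, motif = region.upper(), motif.upper()
    width = len(motif)
    hits = []
    for offset in range(len(region) - width + 1):
        site = region[offset:offset + width]
        mismatches = sum(a != b for a, b in zip(site, motif))
        if mismatches <= max_mismatches:
            hits.append((offset, site, mismatches))
    return hits


# Toy example: a made-up 24-base "genome" with one near-match to the
# phoP DR1 site (ACTGTTAGC, Table 3) upstream of a gene at [15:24].
toy_genome = "TTTTACTGTAAGCGGATGAAATAA"
region = upstream_region(toy_genome, 15, 24, "+", length=15)
print(find_similar_sites(region, "ACTGTTAGC", max_mismatches=1))
```

A plain mismatch count is the simplest reading of "no more than three nucleotides of difference"; it does not allow insertions or deletions and does not weight positions by their conservation.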