The whole set of the constitutive promoters recognized by four minor sigma subunits of Escherichia coli RNA polymerase

The promoter selectivity of Escherichia coli RNA polymerase (RNAP) is determined by the sigma subunit. The model prokaryote Escherichia coli K-12 contains seven species of the sigma subunit, each recognizing a specific set of promoters. To identify the “constitutive promoters” that are recognized by each RNAP holoenzyme alone in the absence of other supporting factors, we performed Genomic SELEX screening in vitro for holoenzyme-binding sites along the E. coli K-12 W3110 genome, using each of the reconstituted RNAP holoenzymes and a collection of genomic DNA segments of E. coli K-12. The whole set of constitutive promoters for each RNAP holoenzyme was then estimated from the locations of the RNAP-binding sites. This screening was first achieved for RpoD (σ70), the principal sigma for transcription of growth-related genes. As an extension, in this study we screened the constitutive promoters for four minor sigma subunits: stationary-phase specific RpoS (σ38), heat-shock specific RpoH (σ32), flagellar-chemotaxis specific RpoF (σ28) and extra-cytoplasmic stress-response RpoE (σ24). The total numbers of constitutive promoters were 129 to 179 for RpoS, 101 to 142 for RpoH, 34 to 41 for RpoF, and 77 to 106 for RpoE. The lists of constitutive promoters were compared with those of known promoters identified in vivo under various conditions and in a variety of E. coli strains, together allowing the estimation of “inducible promoters” that require additional supporting factors.


Introduction
The genome of Escherichia coli K-12, the best-characterized model prokaryote, contains more than 4,500 genes, all transcribed by a single species of RNA polymerase (RNAP). The intracellular concentration of RNAP, however, is approximately 2,000 molecules per genome, fewer than the total number of genes or operons [1][2][3]. The pattern of genome expression is therefore determined by the selective distribution of a limited number of RNAP molecules within the genome [4,5]. For adaptation to stressful environments, the pattern of genome transcription is altered by modulating the promoter selectivity of RNAP through interaction with two groups of regulatory factors in two steps: 7 species of the sigma factor, which confer promoter recognition activity, at the first step [5,6], and approximately 300 species of transcription factor (TF), including both protein and nucleotide factors, at the second step [4,5,7,8]. To understand genome regulation at the molecular level, three kinds of basic knowledge are therefore needed for every sigma factor and TF [8,9]: (1) the whole set of regulatory target promoters, genes or operons under the control of each regulatory factor; (2) the binding affinity of each regulatory protein to its target DNA; and (3) the intracellular concentration of the functional form of each regulatory protein [note that the activity of a TF is often controlled by effector ligands or by protein modification such as phosphorylation]. Once these three lines of knowledge are available, it should become possible to predict the pattern of genome transcription.
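To make this final point concrete, here is a minimal Python sketch of how the three lines of knowledge could be combined for a single regulator. It assumes simple one-site equilibrium binding, no cooperativity or competition between factors, and a regulator in large excess over its DNA targets, so that the fractional occupancy of a promoter is [R]/([R] + Kd). The class `Target`, the promoter names and the Kd values are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Target:
    promoter: str   # (1) a member of the regulator's target set (hypothetical name)
    kd_nM: float    # (2) dissociation constant for this promoter (hypothetical value)


def occupancy(conc_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of a promoter bound, assuming one-site binding
    and a regulator in large excess over DNA (free concentration ~ total)."""
    return conc_nM / (conc_nM + kd_nM)


def predict_occupancy(targets: List[Target], active_conc_nM: float) -> Dict[str, float]:
    """Combine the target set, the affinities and (3) the concentration of
    the active regulator into a predicted occupancy for each promoter."""
    return {t.promoter: occupancy(active_conc_nM, t.kd_nM) for t in targets}


if __name__ == "__main__":
    targets = [Target("pA", 1.0), Target("pB", 10.0), Target("pC", 100.0)]
    for name, theta in predict_occupancy(targets, active_conc_nM=10.0).items():
        print(f"{name}: {theta:.2f}")
```

With 10 nM active regulator this prints occupancies of 0.91, 0.50 and 0.09 for the three hypothetical promoters, illustrating how the same regulator concentration yields very different occupancies depending on affinity.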
Since the complete genome sequencing of E. coli K-12, its transcription pattern in vivo (the transcriptome) has been analyzed with modern technologies such as microarrays for various wild-type and mutant strains grown under various stress conditions, including niches within host animals [10,11]. The localization of RNAP and TFs on the genome has also been analyzed using the ChIP-chip system [12][13][14]. More recently, microarrays have been replaced by direct sequencing of RNAs [15][16][17] or by mapping of transcription start sites [18,19]. These data are assembled in databases such as RegulonDB [20,21] and EcoCyc [22,23]. This large accumulation of background knowledge is essential for understanding how genome transcription is regulated as a whole in a single organism, and at this stage E. coli is being reassessed as a model organism. The binding sites of RNAP and TFs identified in vivo with these techniques, however, do not represent the whole set of their binding sites, because: i) their binding to regulatory sites is often blocked by other DNA-binding proteins, so that their target sequences are masked by antagonistic inhibitory proteins [8,9,24]; and ii) in activator-dependent transcription, their binding to targets depends on the simultaneous presence of supporting factors [8,9,25]. In vivo, therefore, it is in principle impossible to obtain the whole set of binding sites for either RNAP or TFs. In addition, the transcription-related data listed in the databases vary in accuracy. For instance, a number of TF-binding sites are predicted in silico from consensus sequences, and such predictions are often inaccurate. Another serious problem arises from the use of E. coli strains with different genetic backgrounds and of different culture conditions in each experiment (for details see Discussion).
To avoid the problems associated with these in vivo experiments, we decided to employ in vitro approaches. For identification of the binding sites of RNAP and TFs, we developed the Genomic SELEX system [26] and have successfully used it to search for the regulatory targets of a number of TFs [8,9]. We also employed Genomic SELEX for promoter mapping. As described in a previous report [27], we identified a total of 2,071 binding sites on the E. coli K-12 genome for the RNAP holoenzyme containing RpoD (σ70), the major sigma for transcription of most growth-related genes, and mapped the locations of "constitutive promoters" that are recognized by the RpoD holoenzyme alone in the absence of other DNA-binding proteins [note that a "constitutive promoter" is defined as a promoter recognized by RNAP alone in the absence of supporting factors, whereas promoters detected only in vivo are defined as "inducible promoters", presumably requiring the support of accessory regulatory factors].
Besides the major housekeeping sigma RpoD (σ70), E. coli K-12 contains six alternative minor sigma factors: nitrogen-regulated gene-specific RpoN (σ54), stationary-phase nutrient-starvation specific RpoS (σ38), heat-shock response-specific RpoH (σ32), flagellar-chemotaxis specific RpoF (σ28), extra-cytoplasmic stress-response RpoE (σ24), and iron-starvation specific FecI (σ19) [4][5][6]. In this study, we identified the constitutive promoters for four minor sigma factors, RpoS, RpoH, RpoF and RpoE. Since RpoN sigma requires an additional TF such as NtrC for transcription initiation, the set of promoters recognized by RpoN differs depending on the collaborating TF; the list of RpoN promoters will be described elsewhere. FecI, on the other hand, is a unique sigma that recognizes only a specific target, the fecA gene for transport of ferric citrate [28]. These two sigma factors, RpoN and FecI, are therefore not included in this report. The lists of constitutive promoters described herein provide the fundamental catalogs of promoters recognized by each of the four minor sigma factors alone. The data described in this report will be deposited in the TEC (Transcription Profile of Escherichia coli) database (https://shigen.nig.ac.jp/ecoli/tec/) [9]. The data for each minor sigma can be retrieved by selecting the sigma name (RpoS, RpoH, RpoF or RpoE) at https://shigen.nig.ac.jp/ecoli/tec/tfmap. For details, follow the instructions in TEC [9].

Genomic SELEX screening for the constitutive promoters
Constitutive promoters are those transcribed in vitro by the RNA polymerase holoenzyme alone in the absence of supporting factors. To identify the whole set of constitutive promoters on the entire genome of E. coli K-12 W3110, we performed an in vitro mass screening for all sequences recognized by reconstituted holoenzymes, each containing only one specific minor sigma factor. The sigma-free core enzyme was prepared by passing purified RNA polymerase three times through phosphocellulose column chromatography in the presence of 5% glycerol, a stabilizer of holoenzyme complexes in the storage buffer [29]. The level of remaining sigma subunits was less than 0.1%, if any, as detected by both protein staining and immunostaining with antibodies against each of the seven species of sigma factor (RpoD, RpoN, RpoS, RpoH, RpoF, RpoE and FecI). The stoichiometry of the core enzyme subunits was also checked by immunostaining with antibodies against the core subunits RpoA, RpoB, RpoC and RpoZ. Holoenzymes fully saturated with each sigma subunit were reconstituted by mixing this sigma-free core enzyme with a 4-fold molar excess of purified sigma factor (RpoS, RpoH, RpoF or RpoE). Since these sigma subunits alone are unable to bind promoter DNA, the excess free sigma does not interfere with the function of the RNAP holoenzymes. To identify the DNA sequences recognized by each holoenzyme, we employed the Genomic SELEX screening system [26], in which a library of E. coli genomic DNA fragments of 200-300 bp in length is used instead of the synthetic oligonucleotides with all possible sequences used in the original SELEX method [30][31][32].
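The 4-fold molar excess used for reconstitution can be converted into the mass of each sigma to add. The sketch below assumes approximate molecular masses: about 389 kDa for the α2ββ'ω core enzyme, and the nominal values implied by the sigma names (38, 32, 28 and 24 kDa). The sigma numbers are apparent sizes and only approximate the true masses, so measured values should be substituted; the function names are illustrative.

```python
# Approximate molecular masses in kDa (assumptions; replace with measured values).
# Core = alpha2-beta-beta'-omega; sigma values are the nominal sigma numbers.
CORE_KDA = 389.0
SIGMA_KDA = {"RpoS": 38.0, "RpoH": 32.0, "RpoF": 28.0, "RpoE": 24.0}


def to_pmol(mass_ug: float, kda: float) -> float:
    """Convert micrograms of protein to picomoles (1 ug of a 1-kDa protein = 1000 pmol)."""
    return mass_ug * 1000.0 / kda


def sigma_mass_ug(core_ug: float, sigma: str, excess: float = 4.0) -> float:
    """Mass of sigma (ug) giving an `excess`-fold molar excess over the core enzyme."""
    sigma_pmol = excess * to_pmol(core_ug, CORE_KDA)
    return sigma_pmol * SIGMA_KDA[sigma] / 1000.0


if __name__ == "__main__":
    core_ug = 3.89  # about 10 pmol of core enzyme
    for name in SIGMA_KDA:
        print(f"{name}: {sigma_mass_ug(core_ug, name):.2f} ug for 4-fold excess")
```

For about 10 pmol of core, the 4-fold excess corresponds to 40 pmol of sigma, i.e. roughly 1.52 µg of RpoS or 0.96 µg of RpoE under these nominal masses.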
A multi-copy plasmid library of random 200-300 bp DNA fragments was constructed from the E. coli K-12 W3110 genome [26]. The library used in this study contained a 6.5-fold molar excess of the entire genome, so that any given sequence is expected to be present in about 6 different overlapping segments on average, which increases the resolution of SELEX-fragment mapping. In each Genomic SELEX experiment, the mixture of genomic DNA fragments, regenerated by PCR from the genomic DNA library, was mixed with a 2-fold molar excess of each reconstituted RNAP holoenzyme and subjected to Genomic SELEX screening. The DNA-holoenzyme complexes formed were recovered with anti-RpoC antibody, which gave the highest recovery of RNAP among all the anti-core subunit antibodies. RNA polymerase-associated DNA was isolated from the antibody precipitates, amplified by PCR, and subjected to the next cycle of SELEX. After repeated SELEX cycles, the final holoenzyme-bound DNA fragments were mapped on the genome using a DNA tiling microarray (Oxford Gene Technology, Oxford, UK) [14]. The binding intensity was measured as the ratio of holoenzyme-bound DNA labeled with Cy3 to original library DNA labeled with Cy5 on the same array, and plotted along the E. coli genome for each holoenzyme. On the tiling array used, 60-nt probes are aligned along the E. coli genome at 105-bp intervals, so an approximately 300-bp SELEX fragment should hybridize to two or more consecutive probes. This criterion was used to exclude background noise from non-specific binding of holoenzyme-bound DNA fragments to the tiling array [note that peaks showing hybridization to only a single probe were judged to be false-positive noise].
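The two-consecutive-probe criterion can be expressed as a simple run-length filter over the probe-ordered Cy3/Cy5 ratios. In this sketch, probe positions are assumed to be sorted along the genome (the circular wrap-around of the chromosome is ignored), a peak is any run of at least two adjacent probes at or above the cut-off, and its height is the maximum ratio in the run. The function name `call_peaks` and the example values are illustrative only.

```python
from typing import List, Tuple


def call_peaks(positions: List[int], ratios: List[float],
               cutoff: float = 3.0, min_probes: int = 2) -> List[Tuple[int, int, float]]:
    """Return (first_probe_pos, last_probe_pos, max_ratio) for every run of
    consecutive probes with Cy3/Cy5 ratio >= cutoff. Runs covering fewer than
    `min_probes` probes are discarded as single-probe noise."""
    peaks: List[Tuple[int, int, float]] = []
    run: List[int] = []  # indices of the current above-cutoff run

    def close_run() -> None:
        if len(run) >= min_probes:
            peaks.append((positions[run[0]], positions[run[-1]],
                          max(ratios[j] for j in run)))
        run.clear()

    for i, r in enumerate(ratios):
        if r >= cutoff:
            run.append(i)
        else:
            close_run()
    close_run()  # a run may extend to the last probe
    return peaks


if __name__ == "__main__":
    pos = [i * 105 for i in range(7)]            # probes every 105 bp
    ratio = [1.0, 5.0, 1.2, 4.0, 6.5, 3.1, 0.9]  # probe 1 alone is noise
    print(call_peaks(pos, ratio))                # [(315, 525, 6.5)]
```

In the example, the isolated high value at the second probe is rejected, while the three adjacent probes above 3.0 are merged into a single peak.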
The binding sites were classified into two groups, those 'within spacers' and those 'inside genes'. Binding sites within spacers were further classified into three types: type-A spacers, located between divergently (bidirectionally) transcribed units; type-B spacers, located upstream of one transcription unit but downstream of another; and type-C spacers, located downstream of both transcription units. A type-A spacer can harbor one promoter for each of the two divergent units, a type-B spacer one promoter for its downstream unit, and a type-C spacer none. Based on the transcription direction of the flanking genes, the total number of constitutive promoters was therefore predicted to range between a minimum of [number of type-A spacers plus number of type-B spacers] and a maximum of [number of type-A spacers x 2 plus number of type-B spacers]. Intragenic binding sites were designated type-D sites. The height of binding intensity determined by the SELEX-chip system is generally in good agreement with the number of clones identified by the SELEX-clos (cloning-sequencing) system, indicating that both parameters correlate with the binding affinity of the test protein for DNA [4,5,8,9].
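The classification and the minimum/maximum estimate can be written down compactly. This sketch assumes that each spacer peak is described only by the strands ('+' or '-') of its two flanking genes, with "left" meaning the gene at the lower genome coordinate; the function names are illustrative and not part of any published pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def spacer_type(left_strand: str, right_strand: str) -> str:
    """Classify an intergenic spacer from the strands ('+' or '-') of its two
    flanking genes, 'left' being the gene at the lower genome coordinate."""
    if left_strand == "-" and right_strand == "+":
        return "A"  # divergent: upstream of both transcription units
    if left_strand == "+" and right_strand == "-":
        return "C"  # convergent: downstream of both transcription units
    return "B"      # tandem: upstream of one unit, downstream of the other


def promoter_range(counts: Dict[str, int]) -> Tuple[int, int]:
    """Minimum and maximum number of constitutive promoters: a type-A spacer
    holds one or two promoters, a type-B spacer one, a type-C spacer none."""
    a, b = counts.get("A", 0), counts.get("B", 0)
    return a + b, 2 * a + b


def classify_peaks(flanks: Iterable[Tuple[str, str]]) -> Counter:
    """Count spacer peaks by type, given (left_strand, right_strand) per peak."""
    return Counter(spacer_type(left, right) for left, right in flanks)


if __name__ == "__main__":
    print(promoter_range({"A": 50, "B": 79, "C": 16}))  # RpoS counts: (129, 179)
```

Applied to the RpoS spacer counts reported below, this reproduces the range of 129 to 179 constitutive promoters.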

The whole set of constitutive promoters for the stationary-phase sigma RpoS
In laboratory culture of E. coli, cells enter the stationary phase mainly because of the limited availability of nutrients. Upon entry into the stationary phase, the pattern of genome expression is markedly altered by turning down the growth-related genes and instead up-regulating the stress-response genes. The stationary-phase specific minor sigma RpoS is involved in switching the transcription pattern [33,34]. The rpoS gene is not essential for growth under non-stress conditions, but strains carrying mutations affecting rpoS activity are extremely sensitive to environmental stresses. As with other sigma factors, RpoS binds to the RNAP core enzyme and modulates its promoter recognition specificity, directing it to a specific but large set of genes.
As noted above, the set of genes identified in vivo includes a number of genes under the indirect control of RpoS, whereas some target promoters of RpoS are masked in vivo by competitive interference from other DNA-binding proteins. To identify the constitutive promoters directly recognized by RpoS in the absence of other DNA-binding proteins, Genomic SELEX screening was performed in vitro using the reconstituted RNAP RpoS holoenzyme. Sequences with binding affinity for the RpoS holoenzyme formed a number of peaks along the entire E. coli genome (Fig 1). The locations of the peaks were aligned along the map of the E. coli K-12 genome (Table 1). At a cut-off level of 3.0-fold intensity over the background of original library DNA, a total of 218 peaks were identified, of which 145 (67%) are located within intergenic spacers and 73 (33%) inside open reading frames (Table 2). Since the majority of promoters identified to date are located within spacers, generally upstream of open reading frames, the detailed search for constitutive promoters focused on these 145 spacer peaks. These spacers can be classified into three types: 50 peaks are located within type-A spacers between bidirectional transcription units; 79 peaks within type-B spacers, located upstream of one transcription unit but downstream of another; and 16 peaks within type-C spacers, located downstream of both transcription units. Based on the transcription direction of flanking genes, the total number of RpoS constitutive promoters was predicted to range between a minimum of 129 (50 type-A plus 79 type-B) and a maximum of 179 (50x2 type-A plus 79 type-B) (Table 2). Each type-A spacer should contain two promoters for bidirectional transcription, at least one of which should be an RpoS-dependent promoter. The RpoS holoenzyme-binding sites identified in the 50 type-A spacers should therefore represent promoters for one or both directions of transcription.
Up to the present, two general approaches have been employed to define the RpoS regulon: proteome analysis using two-dimensional gels of whole-cell lysates [35]; and transcriptome and genome-binding analyses using ChIP-chip or ChIP-seq systems [37][38][39]. Together, these studies indicated that RpoS regulates, directly or indirectly, 10% (approximately 500 genes) of E. coli genes, of which only about 140 genes were predicted to be under direct control of RpoS in vivo [38]. The total number of RpoS promoters (or transcription initiation sites) listed in the current RegulonDB database is as many as 164 [note that all these promoters were detected in vivo]. Of these, 21 were identified at the cut-off level of 3.0 (Table 1, marked by asterisk), indicating that only these represent constitutive promoters and that the majority of the other known RpoS promoters are inducible promoters, expressed only with the support of regulatory factors. Genomic SELEX analysis identified a minimum of 129 and a maximum of 179 RpoS constitutive promoters, including these 21 known RpoS-dependent promoters (Table 2). The highest peak (390-fold over the background of original library DNA) was located in the 5'-proximal region of the purT gene (Fig 1), which encodes a bifunctional enzyme with both phosphoribosylglycinamide formyltransferase activity using formate (the third step of purine nucleotide synthesis) and acetate kinase activity for the synthesis of acetyl phosphate (AcP). AcP might be utilized as a general phosphate donor for phosphorylation of many stress-response TCS (two-component system) response regulators under stressful conditions. A high-level peak (154-fold over the background of library DNA) was detected upstream of the artPIQM operon encoding the L-arginine ABC transporter. This promoter has also been shown in vivo to be RpoS-dependent [36] (Table 1). High-level binding of the RpoS holoenzyme was also observed upstream of the cydAB operon, encoding cytochrome bd-I oxidase for respiration under oxygen-limited conditions, and of the hipBA operon, encoding an antitoxin-toxin pair that controls persistence (Fig 1 and Table 1). RpoS-dependent constitutive promoter(s) also exist upstream of the nanCMS operon (N-acetylneuraminic acid transport and utilization) and/or the fimB gene (regulator of fimA, which encodes fimbrin, the major subunit of type-1 pili) (Fig 1 and Table 1). Notably, most of the RpoS-dependent promoters listed in the current databases might be under indirect control of RpoS [8,9,27]. Alternatively, a set of RpoS-dependent promoters, designated here as inducible promoters, might be activated only in the presence of additional supporting factors.
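The comparison between database promoters and SELEX peaks amounts to asking which known transcription start sites (TSSs) fall inside a holoenzyme-binding peak. A minimal sketch, assuming peaks are given as (start, end) genome coordinates and that a TSS counts as recovered if it lies within a peak extended by a hypothetical 100-bp window on each side; the window, the function names and the coordinates are assumptions for illustration. With the RpoS figures above, the recovered fraction would be 21/164, about 13%.

```python
from typing import List, Tuple


def match_known_promoters(tss_positions: List[int],
                          peaks: List[Tuple[int, int]],
                          window: int = 100) -> List[int]:
    """Return the known TSSs that lie within a SELEX peak interval extended by
    `window` bp on each side (the window size is an assumption)."""
    return [tss for tss in tss_positions
            if any(start - window <= tss <= end + window for start, end in peaks)]


def recovered_fraction(tss_positions: List[int], peaks: List[Tuple[int, int]],
                       window: int = 100) -> float:
    """Fraction of known promoters that coincide with an in vitro binding peak."""
    if not tss_positions:
        return 0.0
    return len(match_known_promoters(tss_positions, peaks, window)) / len(tss_positions)


if __name__ == "__main__":
    peaks = [(1000, 1300), (5000, 5200)]  # hypothetical peak intervals (bp)
    known = [950, 1250, 3000, 5310]       # hypothetical database TSSs
    print(match_known_promoters(known, peaks))  # [950, 1250]
    print(recovered_fraction(known, peaks))     # 0.5
```

A linear scan is adequate at the scale of a few hundred peaks and promoters; for genome-wide sets much larger than this, sorting the intervals first would be more efficient.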
Using a newly constructed collection of E. coli promoters carrying a two-fluorescent reporter system, one reporter attached to the test promoter and the other to a reference promoter, we performed a systematic quantitative search in vivo for E. coli promoters that are activated in the stationary phase [39]. The activity of RpoS-dependent promoters was measured at various growth phases under various growth conditions. The results indicated that the constitutive promoters exhibited low but steady activity, whereas the inducible promoters generally showed high activity during the transition from exponential growth to the stationary phase.
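With a two-reporter design, promoter activity is naturally expressed as the test-reporter signal divided by the reference-reporter signal in the same cells. The sketch below computes that ratio per sampling time and then applies a hypothetical rule to separate steady profiles from transition-induced ones; the 3-fold threshold, the phase labels and the function names are assumptions for illustration, not the criteria used in the cited study.

```python
from statistics import mean
from typing import List


def normalized_activity(test_fl: List[float], ref_fl: List[float]) -> List[float]:
    """Test-promoter fluorescence divided by reference-promoter fluorescence
    in the same cells, one value per sampling time."""
    return [t / r for t, r in zip(test_fl, ref_fl)]


def classify_profile(activity: List[float], phases: List[str], fold: float = 3.0) -> str:
    """Hypothetical rule: call a promoter 'transition-induced' if its mean
    activity in the transition phase is at least `fold` times its mean
    exponential-phase activity, otherwise 'steady'."""
    expo = mean(a for a, p in zip(activity, phases) if p == "exponential")
    trans = mean(a for a, p in zip(activity, phases) if p == "transition")
    return "transition-induced" if trans >= fold * expo else "steady"


if __name__ == "__main__":
    phases = ["exponential", "exponential", "transition", "transition"]
    ref = [100.0, 100.0, 100.0, 100.0]
    print(classify_profile(normalized_activity([100, 110, 400, 420], ref), phases))
    print(classify_profile(normalized_activity([50, 52, 55, 53], ref), phases))
```

Dividing by the reference reporter cancels cell-to-cell and culture-to-culture variation in plasmid copy number and general expression capacity, which is the main reason for using two fluorescent reporters rather than one.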
The RpoS regulon is involved not only in cell survival in the stationary phase but also in cross-protection against various stresses, including nutrient starvation, osmotic stress, acid shock, cold shock, heat shock, and oxidative DNA damage [33,34]. After entry into the stationary phase, E. coli forms aggregates or biofilms that are morphologically and physiologically distinct from planktonically growing cells. This requires the coordinated production of an extracellular matrix of polysaccharide polymers and protein fibers that facilitate cell aggregation and adhesion to solid surfaces. Genes involved in biofilm formation and in transformation into persister cells are included in the list of RpoS constitutive promoters [40,41].

The whole set of constitutive promoters for the heat-shock response sigma RpoH
When E. coli cells are exposed to high temperature, a set of heat-shock proteins (HSPs) is markedly and transiently induced. HSPs play major roles in controlling the structure and function of various proteins, including protein folding, assembly, transport, repair and degradation, during normal growth as well as under stress conditions [42,43]. The heat-shock response is a cellular protective system for the maintenance of protein homeostasis. The set of HSPs includes the GroEL (HSP60) and DnaK (HSP70) chaperones and the Lon and Clp proteases. RpoH is specifically required for expression of the genes encoding HSPs, as identified by proteome [44,45] and transcriptome analyses [46]. The regulatory targets of RpoH were identified by genome-wide transcription profiling under moderate induction of a plasmid-borne rpoH gene in defined, steady-state growth conditions [47]: a total of 126 genes, organized in 85 operons, were affected by the absence or overexpression of RpoH. The set of genes identified in vivo by changing the level of RpoH, however, includes a large number of indirect targets, which respond to changes in the levels of the direct targets [8,9,27]. The total number of RpoH promoters (or transcription initiation sites) listed in the current RegulonDB database is as many as 322, but the majority of these were predicted by computational analysis using a consensus sequence derived from only a few experimentally identified RpoH promoters. We previously isolated RpoH protein for the first time and confirmed its in vitro recognition of known HSP gene promoters [48]. Since then, no systematic in vitro examination has been performed to identify RpoH function and its regulatory targets. To gain insight into the regulatory role of RpoH sigma, we performed in this study Genomic SELEX screening using the reconstituted RNAP RpoH holoenzyme. At a cut-off level of 3.0-fold over the background of original library DNA, a total of 133 RpoH holoenzyme-binding peaks were identified (Fig 2 and Table 3), of which 107 (80%) are located within intergenic spacers and 26 (20%) inside open reading frames (Table 2). Since the majority of promoters identified to date are located within spacers, the detailed search for constitutive promoters focused on the 107 peaks within spacers. The spacers containing RpoH holoenzyme-binding sites were also classified into three types (see Tables 2 and 3 for the whole list): 41 peaks are located within type-A spacers, 60 within type-B spacers, and 6 within type-C spacers. Based on the transcription direction of flanking genes, the total number of RpoH constitutive promoters was predicted to range between a minimum of 101 (41 type-A plus 60 type-B) and a maximum of 142 (41x2 type-A plus 60 type-B) (Table 2).

Table 2. RNAP holoenzyme was reconstituted from the sigma-free core enzyme and a 4-fold molar excess of each sigma subunit. The binding sites of each holoenzyme on the genome of E. coli K-12 W3110 were determined in vitro using the improved Genomic SELEX screening system; details of the experimental procedures were described previously [26]. The number of constitutive promoters was estimated from the locations of holoenzyme-binding sites. The numbers of constitutive promoters recognized by the RpoD holoenzyme were described in the previous report [27]. https://doi.org/10.1371/journal.pone.0179181.t002
Among the 322 RpoH promoters (or transcription initiation sites) listed in the RegulonDB database, 20 were identified at the cut-off level of 3.0 (Table 3, marked by asterisk). The majority of RpoH promoters in the database are thus suggested to be inducible promoters, expressed only with the support of other positive regulatory factors; alternatively, these RpoH promoters might represent inaccurate predictions, as noted above. Genomic SELEX analysis identified a minimum of 101 and a maximum of 142 RpoH constitutive promoters, including 18 known RpoH-dependent promoters (Table 2). The highest peak (20-fold intensity) was detected in the promoter region of the ybeD gene, which encodes a conserved protein of unknown function under the regulation of RpoH (Fig 2) [49], followed by high-level peaks at the aspA-fxsA and rlmJ-yhbY intergenic regions. The fxsA gene encodes an inner membrane protein involved in controlling sensitivity to bacteriophage T7 [49].
The rlmE gene encodes the 23S rRNA 2'-O-ribose U2552 methyltransferase and has been proposed to carry an RpoH-dependent promoter [50]. The regulatory targets of RpoH identified by Genomic SELEX extend beyond the HSP genes to a variety of other stress-response genes. Indeed, genes for responses to environmental insults such as ethanol, alkaline pH and hyperosmotic shock, as well as genes for proteolysis and cell division, have been reported to be under the control of RpoH. The set of RpoH-regulon genes identified in vivo, however, varies depending on the culture conditions.
The whole set of constitutive promoters for the flagella-chemotaxis sigma RpoF
The bacterial flagellum is a complex organelle consisting of three distinct structural parts: the basal body, the hook and the filament [51]. The synthesis, assembly and function of the flagellar and chemotaxis system require the expression of more than 50 genes, which are divided into three temporally regulated transcriptional classes according to their order of expression: class-1 (early), class-2 (middle) and class-3 (late) [52,53]. Class-1 consists of a single operon of two genes, flhD and flhC, encoding the transcription factors FlhD and FlhC, respectively, which together form a complex (FlhD2-FlhC2 or FlhD4-FlhC2) that activates transcription of a set of class-2 genes, including both the rpoF sigma gene (also known as fliA) and the flgM gene encoding the anti-RpoF factor [51,52]. RpoF is the sigma factor for flagellar chemotaxis, recognizing the promoters of motility and flagellar synthesis genes. In Salmonella, the regulatory targets of RpoF were identified as a set of genes classified into the class-3 operons of the flagellar regulon [54,55]. More than 30 genes have been proposed to carry promoters under the control of RpoF sigma, including structural genes for flagellum formation and chemotaxis genes encoding sensors of environmental signals that affect motility control [54,56]. Using a combination of ChIP-chip, ChIP-seq and RNA-seq systems, a more comprehensive screening was recently performed to identify the regulatory targets of RpoF sigma in E. coli [57]. A total of 52 RpoF-binding sites were identified in vivo on the genome of E. coli K-12 MG1655 cells growing exponentially in rich LB medium, overlapping considerably with the previously identified target genes of the RpoF regulon. The total number of RpoF promoters (or transcription initiation sites) listed in the current RegulonDB database is as many as 144, which have been identified in vivo using ChIP-chip, ChIP-seq and RNA-seq analyses. Most of the targets predicted from the in vivo data, however, represent genes indirectly affected by knockout of the rpoF gene or by overproduction of RpoF.
We then performed Genomic SELEX screening in vitro to search for the direct target promoters, genes and operons under the control of RpoF, using the reconstituted RNAP RpoF holoenzyme. At a cut-off level of 4.0-fold over the background of original library DNA, a total of 105 RpoF holoenzyme-binding peaks were identified (Fig 3 and Table 4), of which 37 (35%) are located within intergenic spacers and 68 (65%) inside genes (Table 2). One unique feature of the RpoF holoenzyme is its high level (65%) of binding inside the open reading frames of a number of genes. A similarly high proportion (60%) of intragenic binding was also observed for the RpoD holoenzyme [27]. The identification of promoter-like sequences inside these genes awaits further analysis. The spacers containing RpoF holoenzyme-binding sites were also classified into three types (see Tables 2 and 4 for the whole list): 7 peaks are located within type-A spacers, 27 within type-B spacers, and 3 within type-C spacers. Based on the transcription direction of flanking genes, the total number of RpoF constitutive promoters was predicted to range between a minimum of 34 (7 type-A plus 27 type-B) and a maximum of 41 (7x2 type-A plus 27 type-B) (Table 2). Of the 144 RpoF promoters (or transcription initiation sites) listed in the current RegulonDB database, 14 were identified at the cut-off level of 4.0 (Table 4, marked by asterisk), indicating that these are constitutive promoters and that the majority of known promoters are inducible promoters, expressed only with the support of positive regulatory factors.
The highest peak (46-fold intensity) was detected in the promoter region of rpoF itself (Fig 3), indicating autoregulation, as previously suggested [58]. A high-level peak was also identified upstream of the flgK gene, which encodes the flagellar hook-filament junction protein connecting the filament to the hook; flgK transcription has been shown in vitro to be under the direct control of RpoF [59]. The flgM gene encodes the anti-sigma factor for RpoF [55]. FlgM forms a complex with RpoF, thereby inactivating its sigma function but also protecting it from degradation by the Lon protease [60].

The whole set of constitutive promoters for extra-cytoplasmic stress response sigma RpoE
The bacterial cell envelope is a dynamic compartment that changes its structure and function in response to environmental conditions. Accordingly, envelope integrity is maintained through frequent modulation of its composition and components. The minor sigma factor RpoE plays a central role in this process by controlling the selective expression of envelope components [61]. Its regulatory targets have been estimated by proteome and transcriptome analyses in vivo [62][63][64]. The activity of RpoE is negatively regulated by the membrane-bound anti-sigma factor RseA, which sequesters RpoE under unstressed conditions. Within the membrane, RseA is associated through its C-terminal domain with the periplasmic protein RseB, which senses misfolded proteins, leading to the release and activation of RpoE from RpoE-RseA complexes [62,65]. The total number of RpoE promoters (or transcription initiation sites) listed in the current RegulonDB database is as many as 518, most of which were identified by computational analyses based on the consensus sequence of RpoE promoters without experimental verification. Judging from the SELEX screening described below, most of these predicted RpoE promoters are likely to be inaccurate estimates owing to errors in the consensus sequence. We employed the Genomic SELEX screening system as a short-cut approach to identify the RpoE regulon. We previously purified RpoE and examined its promoter selectivity using an in vitro transcription system [66]; this purified RpoE was used for the SELEX screening. At a cut-off level of 4.0-fold over original library DNA, a total of 126 RpoE holoenzyme-binding peaks were identified (Fig 4 and Table 5), of which 84 (67%) are located within intergenic spacers and 42 (33%) inside open reading frames (see Tables 2 and 5 for the whole set). Since the majority of promoters identified to date are located within spacers, the detailed search for constitutive promoters focused on the 84 peaks within spacers. The spacers containing RpoE holoenzyme-binding sites were also classified into three types (Table 2): 29 peaks are located within type-A spacers, 48 within type-B spacers, and 7 within type-C spacers. Based on the transcription direction of flanking genes, the total number of RpoE constitutive promoters was predicted to range between a minimum of 77 (29 type-A plus 48 type-B) and a maximum of 106 (29x2 type-A plus 48 type-B) (Table 2). Within the set of constitutive promoters identified at the cut-off level of 4.0, a total of 19 known RpoE promoters were found (Table 5, marked by asterisk). The majority of known promoters are thus inducible promoters, expressed only with the support of positive regulatory factors.
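The same bookkeeping is applied to all four holoenzymes. As a cross-check, the short script below recomputes, for each sigma, the total number of peaks, the number and percentage of spacer peaks, and the minimum and maximum numbers of constitutive promoters from the per-type peak counts given in the text (type-D = intragenic). Spacer totals are derived from the type counts rather than copied, so any inconsistency among the reported figures would show up in the output; the dictionary layout is an illustrative choice.

```python
# Peak counts per site type as reported in the text (D = intragenic peaks).
COUNTS = {
    "RpoS": {"A": 50, "B": 79, "C": 16, "D": 73},
    "RpoH": {"A": 41, "B": 60, "C": 6, "D": 26},
    "RpoF": {"A": 7, "B": 27, "C": 3, "D": 68},
    "RpoE": {"A": 29, "B": 48, "C": 7, "D": 42},
}


def summarize(c: dict) -> dict:
    """Derive totals, spacer percentage and the promoter range from type counts."""
    spacer = c["A"] + c["B"] + c["C"]
    total = spacer + c["D"]
    return {
        "total_peaks": total,
        "spacer_peaks": spacer,
        "spacer_pct": round(100 * spacer / total),
        "min_promoters": c["A"] + c["B"],      # one promoter per type-A and type-B spacer
        "max_promoters": 2 * c["A"] + c["B"],  # two per type-A, one per type-B
    }


if __name__ == "__main__":
    for sigma, counts in COUNTS.items():
        print(sigma, summarize(counts))
```

The recomputed values agree with the totals, percentages and promoter ranges stated for RpoH, RpoF and RpoE, and give 145 spacer peaks out of 218 (67%) for RpoS.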
Genomic SELEX analysis thus identified a minimum of 77 and a maximum of 106 RpoE constitutive promoters. The highest peak (55-fold intensity) was detected in the promoter region of rybB, which encodes a small regulatory RNA controlling the expression of some outer membrane proteins; the rybB promoter is known to be regulated by RpoE (Fig 4) [67]. The second highest peak was located in the luxS-micA intergenic region. The micA gene also encodes a small regulatory RNA that regulates the expression of many genes, including outer membrane proteins [68,69], and the micA promoter is also known to be under the control of RpoE. These sRNAs control the repair of damage to the outer membrane caused by envelope stress [70,71]. High-intensity peaks were also detected at other known RpoE-dependent promoters such as rpoH, pgrR and ycjY [72,73]. Of the total of 126 binding sites of the RpoE holoenzyme, 36 overlap with those of the RpoH holoenzyme. Most of these overlapping sites are associated with genes expressed under envelope stress or heat-shock stress.
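Counting how many RpoE sites overlap RpoH sites is an interval-overlap problem that can be solved with a single sweep over two coordinate-sorted lists. The sketch below assumes that two sites overlap when their intervals share at least one base pair (inclusive coordinates); the actual overlap criterion used for the comparison is not stated here, and the coordinates in the example are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive genome coordinates


def count_overlapping(sites_a: List[Interval], sites_b: List[Interval]) -> int:
    """Count intervals in sites_a that share at least 1 bp with any interval
    in sites_b, using a sweep over both lists sorted by start coordinate."""
    a = sorted(sites_a)
    b = sorted(sites_b)
    j = 0
    n = 0
    for start, end in a:
        # b intervals ending before this start cannot overlap this or any later a
        while j < len(b) and b[j][1] < start:
            j += 1
        k = j
        # only b intervals starting at or before `end` can overlap
        while k < len(b) and b[k][0] <= end:
            if b[k][1] >= start:
                n += 1
                break
            k += 1
    return n


if __name__ == "__main__":
    rpoe_sites = [(100, 400), (1000, 1300), (5000, 5300)]   # hypothetical
    rpoh_sites = [(350, 650), (5200, 5500), (9000, 9300)]   # hypothetical
    print(count_overlapping(rpoe_sites, rpoh_sites))        # 2
```

Because both lists are sorted, intervals that end before the current start are skipped once and never revisited, which keeps the comparison fast even for genome-wide site lists.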
The intracellular levels of all seven sigma factors in E. coli K-12 W3110
In this study, we determined the constitutive promoters for the four minor sigma factors, RpoS, RpoH, RpoF and RpoE, from E. coli K-12 W3110. These promoters are recognized by the RNAP holoenzyme containing each sigma in the absence of other supporting factors.
Using mixed reconstitution in vitro of RNAP holoenzyme in the presence of all seven sigma factors, we estimated the binding affinity of each sigma to the common core enzyme, the order being RpoD (highest) > RpoN > RpoF > RpoH > FecI > RpoE > RpoS (lowest) [74]. Once the intracellular concentrations of these sigma factors are known, it should be possible to predict the expression levels of the regulatory target genes and operons under the control of the constitutive promoters of each sigma factor. We therefore determined the intracellular concentrations of all seven sigma subunits, including these four minor sigma factors. For this purpose, antibodies were raised against each of the purified sigma factors that were also used for SELEX screening. E. coli K-12 W3110 type-A was cultivated with shaking at 37˚C in LB medium, and whole-cell lysates were prepared from both the exponential growth phase and the stationary phase. Using quantitative immuno-blotting with the purified sigma proteins as reference controls, we measured the concentrations of all seven sigma subunits. The measurement was carried out for two independent cultures, and the immuno-blot analysis was repeated for all samples. The intracellular concentration of RpoD sigma is maintained at a constant level (on average, 160 fmol/μg total protein) throughout the transition from the exponential growth phase to the stationary phase (Fig 5). Taking into account these intracellular concentrations and the binding affinity of each sigma to the RNAP core enzyme noted above, we are now able to estimate the level of each RNAP holoenzyme. Notably, the total number of all seven sigma factors is approximately equal to that of the core enzyme. Because RNAP engaged in transcription, i.e., in the elongation of RNA chains, is considered to lack the sigma subunit, RNAP not involved in transcription should be stored in holoenzyme forms.
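The text does not state how the holoenzyme levels are computed from these two quantities. One simple way, assumed here purely for illustration, is a competitive 1:1 binding equilibrium in which every sigma factor competes for a shared pool of core enzyme. The Python sketch below solves the core-enzyme mass balance by bisection; the dissociation constants and amounts in the usage example are hypothetical values chosen only to respect the affinity order RpoD > RpoN > RpoF > RpoH > FecI > RpoE > RpoS, not measured data. The model also ignores sigma-free elongating RNAP and anti-sigma factors such as RseA.

```python
def free_core(core_total: float, sigma_totals: dict[str, float],
              kd: dict[str, float], rel_tol: float = 1e-12) -> float:
    """Free core enzyme E satisfying the mass balance, found by bisection.

    core_total = E + sum_i S_i * E / (K_i + E), where S_i is the total amount
    of sigma i and K_i its dissociation constant for core (same units, K_i > 0).
    """
    def excess(e: float) -> float:
        bound = sum(s * e / (kd[name] + e) for name, s in sigma_totals.items())
        return e + bound - core_total

    # excess(0) = -core_total <= 0 and excess(core_total) >= 0; the function is
    # monotonically increasing, so the root lies in [0, core_total].
    lo, hi = 0.0, core_total
    while hi - lo > rel_tol * max(core_total, 1.0):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def holoenzyme_levels(core_total: float, sigma_totals: dict[str, float],
                      kd: dict[str, float]) -> dict[str, float]:
    """Amount of each holoenzyme, S_i * E / (K_i + E), at equilibrium."""
    e = free_core(core_total, sigma_totals, kd)
    return {name: s * e / (kd[name] + e) for name, s in sigma_totals.items()}


if __name__ == "__main__":
    # Hypothetical inputs in arbitrary units (NOT measured values).
    kd = {"RpoD": 1.0, "RpoN": 2.0, "RpoF": 4.0, "RpoH": 8.0,
          "FecI": 16.0, "RpoE": 32.0, "RpoS": 64.0}
    sigma_totals = {"RpoD": 40.0, "RpoN": 10.0, "RpoF": 10.0, "RpoH": 5.0,
                    "FecI": 5.0, "RpoE": 10.0, "RpoS": 20.0}
    for name, level in holoenzyme_levels(100.0, sigma_totals, kd).items():
        print(f"{name}: {level:.1f} holoenzyme (of {sigma_totals[name]:.0f} sigma)")
```

Bisection is used because the mass-balance function is monotonic in the free core concentration, so it converges without any derivative; the total sigma amount is set roughly equal to the core amount, mirroring the observation in the text.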

Discussion
Seven species of the sigma subunit exist in E. coli K-12, the widely used model E. coli strain.
Here we identified the whole set of constitutive promoters for four minor sigma factors, RpoS, RpoH, RpoF and RpoE, using the Genomic SELEX system. To date, the binding sites of RNAP and TFs have been identified in vivo using high-throughput approaches such as ChIP-chip, ChIP-seq and RNA-seq. Even with these modern techniques, however, it is in principle impossible to obtain the whole set of binding sites for RNAP and TFs, both because of competition with other DNA-binding proteins for the DNA targets [8,9,24] (note that E. coli contains more than 500 DNA-binding proteins [77]) and because the binding of RNAP and TFs often depends on supporting factors [8,9,25]. Computational (in silico) approaches have also been employed to identify the target sequences of RNAP and TFs, relying on consensus sequences derived from the known targets listed in databases such as RegulonDB [20,21] and EcoCyc [22,23] (Table 2) (details of the promoter lists and the evidence are given in Supporting Information: S1). Such consensus sequences, however, often include inaccurate non-target sequences, either because experimental confirmation is lacking or because some regulators recognize a wide variety of binding sequences [8,9,27,78]. Another serious problem associated with in vivo approaches is the difference in genetic background among the E. coli strains used. To date, complete genome sequences have been determined for more than 1,000 E. coli strains, allowing the prediction of about 3,000 core genes shared by all strains, while at least one third of the genes on each E. coli genome differ among the sequenced genomes [79]. Differences in genetic background exist even in the RNAP and TF genes, and not only between different strains but also between different stocks of the same E. coli strain. For instance, differences in the gene encoding the stationary-phase sigma RpoS were first identified between laboratory stocks of one and the same E. coli K-12 W3110 strain [80]. Widely used databases such as RegulonDB [20,21] and EcoCyc [22,23] contain large collections of useful in vivo transcription data, but care should be taken when using these data for the theoretical prediction of transcription regulation, in particular with respect to the bacterial strains and culture conditions used in each experiment.
In this study, we performed SELEX screening for the constitutive promoters that are recognized in vitro by RNAP holoenzymes containing each of four minor sigma factors, RpoS, RpoH, RpoF and RpoE, in the absence of repressors, activators and other DNA-binding proteins. It should be noted that all the proteins and promoter DNA used in this study were prepared from one and the same E. coli K-12 W3110 strain. We also determined the intracellular concentrations of all seven sigma factors in both growing and stationary-phase cells of E. coli K-12 W3110. Together, these data will be used toward our ultimate goal of predicting genome expression under a given condition. The list of constitutive promoters for the minor sigma factors will be deposited in the TEC database (Transcriptional profile of Escherichia coli database: https://shigen.nig.ac.jp/ecoli/tec/top/) [9].

Bacterial strains and plasmids
E. coli K-12 W3110 type-A, which contains the full set of seven sigma factors [80], was used for the purification of RNA polymerase and as the source of template DNA for Genomic SELEX screening of RpoS, RpoH, RpoF and RpoE promoters. E. coli BL21(DE3) was used for the expression and purification of sigma and core enzyme subunit proteins. Expression plasmids for the core enzyme subunits (pRpoA, pRpoB and pRpoC) and all seven sigma subunits (pRpoD, pRpoN, pRpoS, pRpoH, pRpoF, pRpoE and pFecI) were constructed by inserting the respective coding sequences, PCR-amplified using E. coli K-12 W3110 type-A genome DNA as template, into the pET21 expression vector, essentially according to the standard procedure used in this laboratory for the expression of all sigma factors and transcription factors [74,81].
Purification of core RNA polymerase
RNAP was purified from log-phase cells of E. coli K-12 W3110 by the standard procedure [29]. The native core enzyme was separated from holoenzymes by passing the purified RNAP through a P11-phosphocellulose column in the presence of 50% glycerol. To remove trace amounts of core enzyme-associated sigma factors, the purified RNAP in storage buffer containing 50% glycerol was dialyzed against the same buffer containing 5% glycerol and fractionated by P11-phosphocellulose column chromatography in the presence of 5% glycerol. The level of remaining sigma factors was less than 0.1%, if any, as checked on SDS-PAGE gels by both silver staining and immuno-staining with antibodies against each of the seven sigma factors.

Purification of core and sigma subunits
The core enzyme subunits (RpoA, RpoB, RpoC and RpoZ) were expressed using the respective expression plasmids and purified by two cycles of column chromatography on DEAE (DE52) and P11-phosphocellulose [29]. Sigma subunits were expressed and purified by ion-exchange column chromatography on DE52 and P11, followed by Sephacryl S-300 gel filtration. The purified sigma and core subunit proteins were more than 99% pure as judged by both protein staining and immuno-staining of SDS-PAGE gels.

Preparation of antibodies
Antibodies against the sigma factors and core enzyme subunits were raised in rabbits by injecting the purified proteins [75,76]. Antibodies against each RNA polymerase protein were produced in two rabbits, and after the antibody activity was examined by immuno-blot analysis, the batch with higher activity was used in this study. The anti-RpoD, anti-RpoS, anti-RpoN, anti-RpoH, anti-RpoF, anti-RpoE, anti-FecI and anti-RpoC antibodies used in this study did not cross-react with each other. These antibodies were produced at the Nippon Institute for Biological Science (Ome, Tokyo) and the Animal Laboratory of Mitsubishi Chemical Medience Co. (Uto, Kumamoto, Japan).

Genomic SELEX screening of RNA polymerase holoenzyme binding sequences
Genomic SELEX screening was carried out according to the standard procedure [26,27]. This method is an improved version of the original SELEX method [30][31][32]. A mixture of DNA fragments of the E. coli K-12 W3110 genome was prepared by sonication of purified genome DNA and cloned into the EcoRV site of the multi-copy plasmid pBR322. In each SELEX screening, the DNA mixture was regenerated by PCR using a pair of primers corresponding to the pBR322 sequences flanking the EcoRV site. For SELEX screening, 5 pmol of the DNA fragment mixture and 10 pmol of RNA polymerase holoenzyme were mixed in binding buffer (10 mM Tris-HCl, pH 7.8 at 4˚C, 3 mM magnesium acetate, 150 mM NaCl, and 1.25 mg/ml bovine serum albumin) and incubated for 30 min at 37˚C. The DNA-RNA polymerase mixture was treated with anti-RpoC antibody, and DNA fragments recovered from the complexes were PCR-amplified and subjected to the next cycle of SELEX to enrich RNA polymerase-bound DNA fragments.
For SELEX-chip analysis, DNA samples were isolated from the DNA-protein complexes at the final round of SELEX, PCR-amplified and labeled with Cy5, while the original DNA library was labeled with Cy3. The fluorescently labeled DNA mixtures were hybridized to a DNA microarray consisting of 43,450 species of 60-nt DNA probes designed to cover the entire E. coli K-12 MG1655 genome at 105-bp intervals (Oxford Gene Technology, Oxford, UK) [14]. The fluorescence intensity of the test sample at each probe was normalized to the corresponding signal of the original library. After normalization of each pattern, the Cy5/Cy3 ratio was calculated and plotted along the E. coli K-12 MG1655 genome. The gene organization is almost identical between the two well-characterized E. coli K-12 strains, W3110 and MG1655, except for a long-range inversion between the rrnD and rrnE operons.
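The ratio calculation and the fold cut-off used to call binding sites (3.0 for RpoS, RpoH and RpoF; 4.0 for RpoE) can be sketched in Python as follows. The median normalization of each channel and the rule of merging consecutive above-cut-off probes into one peak reported at its summit are assumptions made for illustration; the text does not specify the exact normalization or peak-merging procedure. Inputs are per-probe intensities ordered by genome position (probes about 105 bp apart), and the genome is treated as linear for simplicity.

```python
from statistics import median


def enrichment_ratios(cy5: list[float], cy3: list[float]) -> list[float]:
    """Per-probe Cy5 (SELEX sample) / Cy3 (original library) ratio.

    Each channel is first divided by its own median so that the two
    patterns are on a comparable scale (assumed normalization).
    Intensities must be positive.
    """
    m5, m3 = median(cy5), median(cy3)
    return [(t / m5) / (r / m3) for t, r in zip(cy5, cy3)]


def call_peaks(positions: list[int], ratios: list[float],
               cutoff: float = 3.0) -> list[tuple[int, float]]:
    """Merge runs of consecutive probes with ratio >= cutoff into peaks.

    Each peak is reported as (position of its highest probe, that ratio).
    """
    peaks, run = [], []
    for pos, ratio in zip(positions, ratios):
        if ratio >= cutoff:
            run.append((ratio, pos))
        elif run:
            peaks.append(max(run))  # summit = highest ratio in the run
            run = []
    if run:  # close a run that reaches the last probe
        peaks.append(max(run))
    return [(pos, ratio) for ratio, pos in peaks]


if __name__ == "__main__":
    # Hypothetical intensities for six probes spaced 105 bp apart.
    positions = [0, 105, 210, 315, 420, 525]
    cy5 = [1, 1, 8, 10, 1, 1]
    cy3 = [1, 1, 1, 1, 1, 1]
    print(call_peaks(positions, enrichment_ratios(cy5, cy3), cutoff=3.0))
    # -> [(315, 10.0)]
```

Raising the cut-off from 3.0 to 4.0 only removes weaker peaks; it never splits or shifts the summit of a peak that survives.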

Immuno-blot analysis for determination of sigma levels
For the measurement of sigma factors in E. coli K-12 W3110, quantitative Western blot analysis was employed with the anti-sigma antibodies, as in previous studies [66,75,76]. In brief, cell lysates were treated with SDS (sodium dodecyl sulfate) sample buffer (50 mM Tris-HCl, pH 6.8, 2% SDS, 1% 2-mercaptoethanol, 10% glycerol, and 0.025% bromophenol blue) and separated on SDS-7.5% or 10% polyacrylamide gels. Proteins in the gels were directly electro-blotted onto polyvinylidene difluoride membranes (Nippon Genetics). Blots were blocked overnight at 4˚C in 3% BSA in PBS (phosphate-buffered saline), probed with polyclonal antibodies against each sigma factor, washed with 0.5% Tween 20 in PBS, and incubated with goat anti-rabbit immunoglobulin G conjugated with horseradish peroxidase (Cappel). The blots were developed with 3,3'-diaminobenzidine tetrahydrochloride (Dojindo). Staining intensity was measured with a PDI image analyzer system equipped with a white light scanner. The standard curve for calculating each sigma level was prepared from the immuno-blot patterns of increasing amounts of each purified sigma factor. Under the standard Western blot conditions employed here, the signal was linear over at least a 10-fold range, between 2 and 20 ng of sigma protein. Each test sigma subunit was first quantified using several different volumes of cell lysate. Using the optimum lysate volumes, which gave sigma amounts within the linear range of the standard curves, we then repeated the determination of individual sigma factors.
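The standard-curve step can be illustrated with a short Python sketch: fit a straight line to the signals of known amounts of purified sigma protein, read the amount in a lysate lane off that line, reject readings outside the 2-20 ng linear range, and convert nanograms to femtomoles per microgram of total protein (the unit used for RpoD above). The function names, the example signals, and the rounded molecular mass of 70 kDa assumed for RpoD are illustrative assumptions, not values from this study.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sigma_fmol_per_ug(signal: float, slope: float, intercept: float,
                      total_protein_ug: float, mw_da: float,
                      linear_range_ng: tuple[float, float] = (2.0, 20.0)) -> float:
    """Convert a blot signal to fmol sigma per microgram of total protein."""
    ng = (signal - intercept) / slope
    low, high = linear_range_ng
    if not low <= ng <= high:
        # Outside the calibrated range: re-run with a different lysate volume.
        raise ValueError(f"{ng:.2f} ng is outside the linear range {low}-{high} ng")
    fmol = ng * 1e6 / mw_da  # 1 ng of a protein of mw_da daltons = 1e6 / mw_da fmol
    return fmol / total_protein_ug


if __name__ == "__main__":
    # Hypothetical standard curve: known ng of purified sigma vs. scanned signal.
    standards_ng = [2.0, 5.0, 10.0, 20.0]
    signals = [250.0, 550.0, 1050.0, 2050.0]
    slope, intercept = fit_line(standards_ng, signals)
    # Hypothetical lysate lane loaded with 1 ug of total protein.
    value = sigma_fmol_per_ug(1050.0, slope, intercept,
                              total_protein_ug=1.0, mw_da=70_000)
    print(f"{value:.1f} fmol/ug total protein")
```

Raising an error outside the linear range mirrors the procedure in the text, where lysate volumes were adjusted until each sigma signal fell within the calibrated 2-20 ng window.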

Fig 1. SELEX-chip search for RNAP RpoS holoenzyme-binding sequences on the E. coli K-12 genome. The y-axis represents the relative number of RpoS holoenzyme-bound DNA fragments, whereas the x-axis represents the position on the E. coli K-12 genome, in base pairs. For high-intensity peaks, the gene adjacent to the peak position is indicated. Peaks located within spacer regions are shown in green, while peaks located within open reading frames are shown in orange. The list of RpoS holoenzyme-binding sites is given in Table 1.
Genomic SELEX was performed to search for the binding sites of the RNAP RpoS holoenzyme. With a cut-off level of 3.0, a total of 218 binding sites were identified (see Fig 1 for the SELEX pattern); they are listed in order along the E. coli K-12 genome map. A total of 125 sites are located within intergenic spacers: 50 within type-A spacers (shown on orange background) and 79 within type-B spacers (shown on green background). The constitutive promoters of RpoS were predicted based on the adjacent genes [note that only the genes next to the RpoS holoenzyme-binding sites are shown] and the gene orientation (shown by arrows in the transcription direction column). A total of 73 RpoS holoenzyme-binding sites are located inside open reading frames, as indicated by the gene symbols in the RpoS column. *Genes listed in RegulonDB as regulated targets of RpoS. https://doi.org/10.1371/journal.pone.0179181.t001

Fig 2. SELEX-chip search for RNAP RpoH holoenzyme-binding sequences on the E. coli K-12 genome. The y-axis represents the relative number of RpoH holoenzyme-bound DNA fragments, whereas the x-axis represents the position on the E. coli K-12 genome, in base pairs. For high-intensity peaks, the gene adjacent to the peak position is indicated. Peaks located within spacer regions are shown in green, while peaks located within open reading frames are shown in orange. The list of RpoH holoenzyme-binding sites is given in Table 3. https://doi.org/10.1371/journal.pone.0179181.g002

Genomic SELEX was performed to search for the binding sites of the RNAP RpoH holoenzyme. With a cut-off level of 3.0, a total of 133 binding sites were identified (see Fig 2 for the SELEX pattern); they are listed in order along the E. coli K-12 genome map. A total of 107 sites are located within intergenic spacers: 41 within type-A spacers (shown on orange background) and 60 within type-B spacers (shown on green background). The constitutive promoters of RpoH were predicted based on the adjacent genes [note that only the genes next to the RpoH holoenzyme-binding sites are shown] and the gene orientation (shown by arrows in the transcription direction column). A total of 26 RpoH holoenzyme-binding sites are located inside open reading frames, as indicated by the gene symbols in the RpoH column. *Genes listed in RegulonDB as regulated targets of RpoH. https://doi.org/10.1371/journal.pone.0179181.t003

Fig 3. SELEX-chip search for RNAP RpoF holoenzyme-binding sequences on the E. coli K-12 genome. The y-axis represents the relative number of RpoF holoenzyme-bound DNA fragments, whereas the x-axis represents the position on the E. coli K-12 genome, in base pairs. For high-intensity peaks, the gene adjacent to the peak position is indicated. Peaks located within spacer regions are shown in green, while peaks located within open reading frames are shown in orange. The list of RpoF holoenzyme-binding sites is given in Table 4. https://doi.org/10.1371/journal.pone.0179181.g003

Genomic SELEX was performed to search for the binding sites of the RNAP RpoF holoenzyme. With a cut-off level of 3.0, a total of 105 binding sites were identified (see Fig 3 for the SELEX pattern); they are listed in order along the E. coli K-12 genome map. A total of 37 sites are located within intergenic spacers: 7 within type-A spacers (shown on orange background) and 27 within type-B spacers (shown on green background). The constitutive promoters of RpoF were predicted based on the adjacent genes [note that only the genes next to the RpoF holoenzyme-binding sites are shown] and the gene orientation (shown by arrows in the transcription direction column). As many as 68 RpoF holoenzyme-binding sites are located inside open reading frames, as indicated by the gene symbols in the RpoF column. *Genes listed in RegulonDB as regulated targets of RpoF. https://doi.org/10.1371/journal.pone.0179181.t004

Fig 4. SELEX-chip search for RNAP RpoE holoenzyme-binding sequences on the E. coli K-12 genome. The y-axis represents the relative number of RpoE holoenzyme-bound DNA fragments, whereas the x-axis represents the position on the E. coli genome, in base pairs. For high-intensity peaks, the gene adjacent to the peak position is indicated. Peaks located within spacer regions are shown in green, while peaks located within open reading frames are shown in orange. The list of RpoE holoenzyme-binding sites is given in Table 5. https://doi.org/10.1371/journal.pone.0179181.g004

Fig 5. Intracellular concentrations of seven sigma factors in E. coli K-12 W3110 type-A strain.E. coli K-12 W3110 type-A strain was grown in LB medium at 37˚C with shaking.Cells were harvested at various times and cell lysates were subjected to the quantitative immuno-blot analysis of all seven sigma factors as described in Materials and Methods.[A] The sigma levels at exponential growth phase; [B] the sigma levels in the stationary phase.https://doi.org/10.1371/journal.pone.0179181.g005

(S1 Table for RpoS; S2 Table for RpoH; S3 Table for RpoF; and S4 Table for RpoE). In particular, more than 80% of RpoE-dependent promoters were predicted in silico (Table 5 and S4 Table).