Figure 1.
Schematic Representation of the Main Interactions of RNAP with Promoter DNA and Alignment of the σ70 Motifs for Recognition and Binding of −10 (2.4 Region) and −35 Promoter Sequences (4.2 Region) for Representative Eubacteria
Sequence alignments of several σ70 factors from different bacteria reveal four conserved regions that can be further divided into subregions [39]. Only regions 2 and 4 are well conserved in all members of the σ70 family [40–43] and include subregions involved in binding to the core RNAP complex, promoter melting, and recognition of the −10 and −35 promoter sequences (regions 2.4 and 4.2, respectively) [10,40,44]. CLUSTALW was used to generate the alignment with default parameters (http://www.ebi.ac.uk/clustalw) [45].
Figure 2.
Frequency Matrices for the −10 and −35 Motifs of σ70 Promoters in E. coli
This matrix pair (Matrix_18_15_13_2_1.5), selected from a collection of optimized matrices defined for E. coli in [1], was used to search across bacterial genomes. Note that, when these matrices are compared with the canonical patterns (TTGACA and TATAAT), the 13- to 19-bp spacers between the two boxes correspond to the 15- to 21-bp spacers reported in the literature, because the TGT triplet is considered part of the −10 box. Before searching for promoter-like signals, these matrices were calibrated using the noncoding base composition of each target genome.
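As a rough illustration of how a frequency-matrix pair can be adjusted to a genome's base composition and used to scan for paired boxes, the sketch below converts counts into log-odds scores against a background estimated from noncoding sequence, then searches for the best −35/−10 pair with a spacer of 13 to 19 bp. This is one common approach and not necessarily the calibration procedure used in [1]. The count matrices built by the hypothetical helper `toy_counts` are toy values derived from the canonical TTGACA and TATAAT patterns, not the Matrix_18_15_13_2_1.5 values; all function names are illustrative.

```python
import math

BASES = "ACGT"

def background_from_sequence(seq):
    """Estimate base frequencies from noncoding sequence (+1 pseudocount per base)."""
    counts = {b: 1 for b in BASES}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def toy_counts(consensus, match=8, mismatch=1):
    """Hypothetical count matrix favouring the consensus base at each position."""
    return [{b: (match if b == c else mismatch) for b in BASES} for c in consensus]

def log_odds_matrix(count_matrix, background, pseudocount=0.5):
    """Convert per-position counts into log2(p(base) / background(base)) scores."""
    pssm = []
    for column in count_matrix:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pssm.append({b: math.log2(((column[b] + pseudocount) / total) / background[b])
                     for b in BASES})
    return pssm

def score(pssm, word):
    """Sum of position-specific scores; word must be uppercase ACGT of len(pssm)."""
    return sum(col[b] for col, b in zip(pssm, word))

def best_pair(seq, pssm35, pssm10, min_spacer=13, max_spacer=19):
    """Return (total_score, start_of_minus35_box, spacer) for the best-scoring pair."""
    l35, l10 = len(pssm35), len(pssm10)
    best = None
    for i in range(len(seq) - l35 + 1):
        s35 = score(pssm35, seq[i:i + l35])
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + l35 + spacer  # start of the -10 box
            if j + l10 > len(seq):
                break
            total = s35 + score(pssm10, seq[j:j + l10])
            if best is None or total > best[0]:
                best = (total, i, spacer)
    return best
```

For example, with a hypothetical AT-rich background, `best_pair("GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG", pssm35, pssm10)` locates the −35 box at offset 2 with a 17-bp spacer. Because scores are log-odds against the genome background, the same matrix yields different scores in AT-rich and GC-rich genomes.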
Table 1.
−10 and −35 Consensus Sequences and Corresponding Average Scores for σ70 Promoter-Like Signals in Representative Bacterial Genomes
Table 2.
Density Patterns of σ70 Promoter-Like Signals in Eubacterial Genomes
Figure 3.
Signal Density in Regulatory versus Nonregulatory Regions of Large and Small Eubacterial Genomes
Regulatory regions correspond to the strictly noncoding regions located upstream of a gene start; in the genomes selected for this study, these regions average 182 bp. For plotting, the regions were extended to 500 bp upstream and 500 bp downstream of the gene start (position 0). Nonregulatory regions include the coding regions and the noncoding regions between convergently transcribed genes. For coding regions, the midpoint of each gene was taken as position 0, and 500 bp upstream and 500 bp downstream of this position were included. In the genomes analyzed here, the convergent regions average 194.5 bp; for plotting, the end of the 3′ gene was taken as position 0, and 500 bp upstream and 500 bp downstream of this position were included. The number of signals was averaged within 10-bp intervals.
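The binning step described above can be sketched as follows, assuming signal positions are first expressed relative to position 0 on the strand of the anchoring gene, the window is the half-open interval [−500, +500), and "averaged" means the mean number of signals per region in each 10-bp interval. These interpretations, and the function names `relative_position` and `density_profile`, are illustrative assumptions rather than the study's code.

```python
def relative_position(signal_pos, anchor, strand):
    """Offset of a signal from position 0; negative values are upstream.

    On the minus strand, upstream means higher genomic coordinates, so the
    sign of the offset is flipped.
    """
    return signal_pos - anchor if strand == "+" else anchor - signal_pos

def density_profile(relative_positions_per_region, flank=500, bin_size=10):
    """Mean number of signals per region in each bin of [-flank, flank).

    relative_positions_per_region: one list of relative offsets per region.
    Returns a list of (bin_start, mean_signals_per_region) tuples.
    """
    if not relative_positions_per_region:
        raise ValueError("need at least one region")
    n_bins = (2 * flank) // bin_size
    counts = [0] * n_bins
    for positions in relative_positions_per_region:
        for p in positions:
            if -flank <= p < flank:  # ignore signals outside the window
                counts[(p + flank) // bin_size] += 1
    n_regions = len(relative_positions_per_region)
    return [(-flank + k * bin_size, counts[k] / n_regions) for k in range(n_bins)]
```

For two hypothetical regions with offsets `[-500, -495, 0]` and `[5, 499, 500]`, the profile has 100 bins; the first bin (starting at −500) and the bin starting at 0 each average 1.0 signal per region, and the offset 500 falls outside the window.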
Figure 4.
Signal Density in Regulatory versus Nonregulatory Regions of M. tuberculosis and M. leprae
M. leprae shows an increase in promoter-like signals in the regions upstream and downstream of pseudogenes relative to the regulatory regions. M. leprae contains more than 1,115 recognizable pseudogenes [20]. The 500 bp upstream and 500 bp downstream of each pseudogene start (position 0) were searched for promoter-like signals. All other region definitions and methods are as in Figure 3.
Table 3.
−10 and −35 Consensus Sequences and Average Scores for the Degrading Genome of M. leprae and Its Close Relative M. tuberculosis