Figure 1.
Comparison of observed and expected −10 motif frequencies in different genomic regions of E. coli.
Distributions were obtained by counting and summing the occurrences of 185 −10 hexamers in sets of 1000 shuffled sequences. This procedure was performed independently for (A) randomly generated, (B) nonfunctional, (C) regulatory, and (D) coding sequences. In each panel, the blue lines mark ±2 standard deviations (SD) from the mean of the shuffled-sequence distribution, and the red line marks the observed count in the unshuffled E. coli sequence.
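The shuffling procedure behind these distributions can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a plain mononucleotide shuffle (the legend does not state which shuffling scheme was used), a single input sequence rather than a set of sequences, and overlapping hexamer windows. The names `count_motifs`, `shuffled_counts`, `summarize`, and the toy inputs are hypothetical.

```python
import random
import statistics


def count_motifs(seq, motifs, k=6):
    """Count overlapping windows of length k in seq that belong to motifs."""
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in motifs)


def shuffled_counts(seq, motifs, n_shuffles=1000, seed=0):
    """Return the motif count in each of n_shuffles shuffled copies of seq."""
    rng = random.Random(seed)
    letters = list(seq)
    counts = []
    for _ in range(n_shuffles):
        # Assumption: plain mononucleotide shuffle (base composition preserved).
        rng.shuffle(letters)
        counts.append(count_motifs("".join(letters), motifs))
    return counts


def summarize(seq, motifs, n_shuffles=1000, seed=0):
    """Observed count plus mean, SD, and the mean +/- 2 SD band of shuffles."""
    counts = shuffled_counts(seq, motifs, n_shuffles, seed)
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)  # sample SD of the shuffled counts
    return {
        "observed": count_motifs(seq, motifs),
        "mean": mean,
        "sd": sd,
        "lower": mean - 2 * sd,  # corresponds to the lower blue line
        "upper": mean + 2 * sd,  # corresponds to the upper blue line
    }


# Toy inputs for illustration only; these are NOT the 185 hexamers of the study.
toy_motifs = {"TATAAT", "TATACT"}
toy_seq = "GCTATAATGCCGTATACTAGGC"
summary = summarize(toy_seq, toy_motifs, n_shuffles=200)
```

In this sketch, `summary["observed"]` plays the role of the red line and `summary["lower"]` and `summary["upper"]` play the role of the blue lines.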
Figure 2.
Phylogeny of the 42 bacterial species analyzed.
Phylogeny based on a set of 51 genes present in single copy in most bacterial genomes. Concatenated multiple sequence alignments were used as input for the Bayesian tree reconstruction program MrBayes [50]. The tree topology and branch lengths shown correspond to the most likely tree obtained in two independent runs of MrBayes spanning 250,000 generations each. All clades were recovered with 100% posterior probability. The tree is rooted on the branch leading to the hyperthermophiles Aquifex aeolicus and Thermotoga maritima.
Table 1.
Bacterial species analyzed and their general features.
Figure 3.
Patterns of over- and under-representation of −10 motifs across species.
For every species, observed and expected −10 motif frequencies in the three different genomic regions were compared as in Figure 1. Over- and under-representation were considered significant when the observed frequency was beyond ±2 SD from the mean value for shuffled sequences.
Figure 4.
Deviations in the observed counts of −10 motifs in different genomic regions.
NSD values express the deviation of the observed −10 motif counts from the expected mean in the corresponding genome, in number of standard deviations, and were obtained by comparing observed counts to distributions of motif frequencies in shuffled sequences. Species are ordered by ascending NSD value for the nonfunctional regions. NSD values in the white band are not significantly different from expectation. The red line separates species with under-representation of −10 sites in nonfunctional regions (left) from those without (right).
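As a minimal sketch of this definition, the NSD of an observed count can be computed from the shuffled-sequence counts and compared with the ±2 SD threshold used in Figure 3. The function names, species labels, and numbers below are hypothetical and serve only to show the arithmetic.

```python
import statistics


def nsd(observed, shuffled_counts):
    """Deviation of observed from the shuffled mean, in standard deviations."""
    mean = statistics.mean(shuffled_counts)
    sd = statistics.stdev(shuffled_counts)
    if sd == 0:
        raise ValueError("shuffled counts have zero variance; NSD is undefined")
    return (observed - mean) / sd


def classify(nsd_value, threshold=2.0):
    """Label a value as significant only when it lies beyond +/- threshold."""
    if nsd_value < -threshold:
        return "under-represented"
    if nsd_value > threshold:
        return "over-represented"
    return "not significant"


# Hypothetical NSD values for nonfunctional regions, keyed by placeholder labels.
toy_nsd = {"species_a": -0.4, "species_b": -5.1, "species_c": -2.7}

# Order species by ascending NSD, as on the horizontal axis of the figure.
ordered = sorted(toy_nsd, key=toy_nsd.get)
labels = {name: classify(value) for name, value in toy_nsd.items()}
```

With shuffled counts of 8, 10, and 12 (mean 10, SD 2), an observed count of 4 gives an NSD of −3, which falls outside the band and is classified as under-represented.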
Figure 5.
NSD values of −10 motif counts in nonfunctional DNA correlate negatively with the number of tRNA genes in the genome.
NSD values are defined as in the text and Figure 4. The red line represents −2 SD from the mean expected value of −10 motif counts in the nonfunctional DNA of the corresponding genome. The green line indicates the average number of tRNA genes in the genomes analyzed (55). The color-coding distinguishes species according to their maximal growth rate (see Table 1); no growth rate estimate was available for Corynebacterium glutamicum (not colored). Black dotted line: regression including all analyzed genomes (r = 0.38, p < 0.02). Black solid line: regression for the subset of genomes with NSD < −2 in nonfunctional DNA (r = 0.63, p < 0.001).
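The two regressions can be illustrated with a short sketch that fits a least-squares line and computes Pearson's r, first for all genomes and then for the subset with NSD < −2. The data points are invented for illustration, the choice of tRNA gene number as the x variable is an assumption, and p-values are not computed here. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y as a function of x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical (tRNA gene count, NSD in nonfunctional DNA) pairs, not real data.
records = [(40, -1.0), (50, -3.0), (60, -4.0), (80, -6.5), (45, 0.5)]

# Regression over all genomes (dotted line in the figure).
all_x = [t for t, _ in records]
all_y = [v for _, v in records]
r_all = pearson_r(all_x, all_y)

# Regression restricted to genomes with NSD < -2 (solid line in the figure).
subset = [(t, v) for t, v in records if v < -2]
sub_x = [t for t, _ in subset]
sub_y = [v for _, v in subset]
r_subset = pearson_r(sub_x, sub_y)
slope_subset, intercept_subset = fit_line(sub_x, sub_y)
```

A negative slope and a negative r in this sketch correspond to lower NSD values (stronger under-representation) in genomes with more tRNA genes.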