Figure 1.
Approach for Detecting Genome Position-Dependent Patterns
(A) Raw sequence-derived data often contain patterns with respect to chromosome position that are not obvious from casual inspection. (This example shows the fractional gene density per kilobase for Salmonella enterica serovar Typhi strain CT18.)
(B) Wavelet analysis was used to generate a scalogram showing significant chromosome position-dependent patterns in gene density over varying periodicities. The level of significance of the patterns was determined by randomizing the order of the raw sequence data 200 times and recomputing the real and imaginary portions of the Morlet wavelet transform values at each point in the scalogram for each randomization. Regions having a false discovery rate (FDR) greater than 5% are not displayed (white). The pattern strength for this dataset is 33%.
(C) To facilitate interpretation of the wavelet scalogram, moving averages of the raw data are shown at three length scales: 1 Mb, 460 kb, and 115 kb. Regions highlighted in red/green indicate significant regions of the scalogram at that scale that lie above/below the mean real transform value.
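As a rough illustration of the significance test described in (B), the sketch below computes a Morlet wavelet transform and compares its power at every scalogram point against shuffled copies of the data. It is not the authors' implementation. The function names (`morlet_cwt`, `permutation_pvalues`, `benjamini_hochberg`), the wavelet parameter `omega0 = 6`, the use of wavelet power (real part squared plus imaginary part squared) as the test statistic, the circular FFT-based boundary handling, and the Benjamini-Hochberg step for FDR control are all assumptions made for illustration.

```python
import numpy as np

def morlet_cwt(signal, scales, omega0=6.0):
    """Circular Morlet wavelet transform via FFT.

    Returns a complex array of shape (len(scales), len(signal)).
    Normalization is simplified; only relative power matters below.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = x.size
    omega = 2.0 * np.pi * np.fft.fftfreq(n)  # angular frequency per sample
    x_hat = np.fft.fft(x)
    out = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        # Fourier transform of the analytic Morlet wavelet at scale s
        psi_hat = (np.pi ** -0.25) * np.sqrt(s) \
            * np.exp(-0.5 * (s * omega - omega0) ** 2) * (omega > 0)
        out[i] = np.fft.ifft(x_hat * psi_hat)
    return out

def permutation_pvalues(signal, scales, n_perm=200, seed=0):
    """Empirical p-value at each (scale, position) from shuffled data."""
    rng = np.random.default_rng(seed)
    observed = np.abs(morlet_cwt(signal, scales)) ** 2
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        shuffled = rng.permutation(signal)  # destroys positional order
        null_power = np.abs(morlet_cwt(shuffled, scales)) ** 2
        exceed += (null_power >= observed)
    # Add-one correction keeps p-values strictly positive
    return (exceed + 1.0) / (n_perm + 1.0)

def benjamini_hochberg(pvals, fdr=0.05):
    """Boolean mask of points declared significant at the given FDR."""
    p = np.ravel(pvals)
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, p.size + 1) / p.size
    passed = p[order] <= thresholds
    mask = np.zeros(p.size, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank that passes
        mask[order[:k + 1]] = True
    return mask.reshape(np.shape(pvals))
```

With `omega0 = 6`, a scale of `s` samples corresponds to a period of roughly `1.03 * s` samples, so a list of scales can be chosen to span the periods of interest. Points where the returned mask is `False` would be the ones left white in a scalogram drawn this way.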
Figure 2.
Generality of Chromosome Position-Dependent Patterns in Sequence Properties for 163 Prokaryotic Chromosomes
Continuous wavelet scalograms were computed for most prokaryotic chromosomes sequenced through January 2005 to identify patterns in codon adaptation index (CAI) per gene (A), fractional gene density per kilobase (B), and GC/AT content per kilobase (C). The colored portions of the scalogram indicate significant periodic patterns (FDR < 5%). The degree of patterning for each prokaryotic sequence and each parameter (called the fractional pattern strength) was taken as the percentage of the area of the scalogram containing significant patterns. The first column shows the scalograms for the maximally patterned chromosome found for each sequence property. For reference, the second column shows these scalograms for E. coli K-12 MG1655. The third column shows the rank-ordered fractional pattern strengths for the 163 sequenced prokaryotic chromosomes that were analyzed, with E. coli indicated relative to the other chromosomes on each plot.
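The fractional pattern strength defined above can be sketched in a few lines. This is a minimal illustration, assuming the scalogram is stored as a boolean grid in which every cell covers an equal share of the plotted area; the function names and the dictionary of chromosome names are hypothetical, not taken from the paper.

```python
import numpy as np

def fractional_pattern_strength(significant):
    """Percentage of scalogram cells flagged significant (FDR < 5%).

    Assumes each cell of the (scale x position) grid has equal area.
    """
    mask = np.asarray(significant, dtype=bool)
    return 100.0 * mask.mean()

def rank_by_strength(strengths):
    """Sort a {chromosome name: pattern strength} mapping, strongest first."""
    return sorted(strengths.items(), key=lambda item: item[1], reverse=True)
```

If the scale axis is plotted logarithmically, the cells would need area weights for the percentage to match the drawn figure; the equal-area assumption here is only the simplest choice.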
Table 1.
Descriptive Statistics for Pattern Strengths in GC/AT Content, Gene Density, and CAI across 163 Prokaryotic Chromosomes
Table 2.
Organisms Exhibiting Either Very High or Very Low Chromosome Position-Dependent Patterns in Sequence-Derived Data
Figure 3.
Correlations between Sequence-Derived Properties for 163 Prokaryotic Chromosomes
(A) Correlation of fractional pattern strength in GC/AT content with chromosome length.
(B) Anti-correlation of fractional pattern strength in GC/AT content with total chromosomal AT%. The correlation coefficients and associated p-values are indicated on each graph.
Table 3.
Correlation between Pattern Strength in CAI and Organism Taxon, Gram Staining, Cell Shape, and the Presence of Known Motility and Nucleoid Proteins
Figure 4.
Correlation of Specific Chromosome Position-Dependent Patterns in E. coli Functional Properties
(A) Wavelet scalograms calculated for gene expression, gene essentiality, and evolutionary retention index were converted to a binary significance matrix by setting each significant point in a scalogram (FDR < 5%) to unity and each non-significant point to zero.
(B) These binary matrices were summed across the three properties listed above to determine chromosome position-dependent patterns that were consistent across the different properties, and the resulting map was color-coded according to how many of the properties shared significant patterns. The red-colored segments indicate the periods and chromosome positions at which all three properties exhibited significant patterns. The averaged data have been normalized such that the mean is zero and the tick marks indicate SDs from the mean value.
(C) Correlation of gene expression, essentiality, and evolutionary retention averaged at a window of 325 kb (650-kb period).
(D) Correlation of gene expression with intragenic codon preferences for two of the major codons encoding leucine (CUG) and arginine (CGU), and anti-correlation of these with preferences for the corresponding minor codons, UUA and AGA, at a moving average of 325 kb. The labels are as described above in (C).
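The binary-matrix overlay in (A) and (B) and the normalized moving averages in (C) and (D) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the function names are hypothetical, each scalogram is assumed to be available as an array of per-point FDR values on a common grid, and the moving average is assumed to wrap around the ends of the data because the E. coli chromosome is circular.

```python
import numpy as np

def overlap_map(fdr_maps, threshold=0.05):
    """Count, at each scalogram point, how many properties are significant.

    fdr_maps: sequence of equally shaped arrays of FDR values.
    Each map is binarized (1 if FDR < threshold, else 0) and the maps are summed.
    """
    binary = [(np.asarray(m) < threshold).astype(int) for m in fdr_maps]
    return np.sum(binary, axis=0)

def circular_moving_average(values, window):
    """Centered moving average that wraps around the ends of the data."""
    x = np.asarray(values, dtype=float)
    half = window // 2
    padded = np.pad(x, (half, window - half - 1), mode="wrap")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def zscore(values):
    """Rescale to zero mean, with units of standard deviations."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()
```

With three input properties, a value of 3 in the overlap map marks points where all three show significant patterns. If the data were binned per kilobase (an assumption), a 325-kb window would correspond to `window=325`.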
Figure 5.
Overlay Plots of Significant Regions of Wavelet Scalograms for Various E. coli Parameters
(A) Degree of significant pattern overlap in expression, gene density, and codon adaptation in E. coli. Binary matrices corresponding to significant regions of wavelet scalograms (FDR < 5%) for gene expression, CAI, and fractional gene density in E. coli were summed as described in Materials and Methods. A periodic pattern of 600–650 kb can be seen across nearly three-quarters of the chromosome.
(B) Degree of significant pattern overlap among sequence-derived DNA-bending parameters in E. coli. Binary matrices corresponding to significant regions of wavelet scalograms (FDR < 5%) for intrinsic curvature, DNase I sensitivity, protein-induced deformability, propeller twist, stacking energy, and nucleosome position preference in E. coli [12] were summed as described in the text. The white contour lines outline the significant regions of the wavelet scalogram for GC/AT content, thus demonstrating that these parameters are not independent.
Figure 6.
Comparison of E. coli Gene Expression, Essentiality, and Evolutionary Retention at 600–650-kb Length Scale with Experimentally Identified Chromosome Macrodomains [8]
The four shaded regions correspond to four macrodomains identified previously based upon the frequency of recombination events following genetic dissection of the E. coli chromosome. The two unshaded regions correspond to less-structured macrodomains. The traces in the lower panel are exactly as described in Figure 4C. The upper panel is a section of the wavelet scalogram for E. coli gene expression at a 650-kb period. Segments of this wavelet transform trace have been colored to correspond to the experimentally identified chromosome macrodomains.