Drosophila Functional Elements Are Embedded in Structurally Constrained Sequences

Figure 1

Punctuated GC elevation around conserved elements in flies.

A) Nucleotide composition around TSSs. Nucleotide frequency (y-axis) is plotted against the distance to the nearest transcription start site (TSS; annotations obtained from the UCSC Genome Browser) (x-axis). As previously reported, the frequency of A and T peaks 200 bp upstream (left) of the TSS and decreases to the genomic average near the TSS. AT asymmetry and GC asymmetry are also pronounced downstream of the TSS. B) Nucleotide composition around DHSs. As in A), but depicting nucleotide composition relative to the center of the nearest DNase I hypersensitive site (DHS). DHSs were defined as TSS-distal (distance > 1 kb) loci that are densely covered by DNase cutting sites (top 0.992 percentile, n = 2652) in at least one of four embryonic stages (stages 5, 10, 11 and 14). C) Divergence around DHSs. Shown are inferred (solid black line) and expected (given our substitution model, gray dashed line) substitution rates as a function of the distance to the nearest DHS. Rates were estimated from multiple alignments of 12 Drosophila genomes (Methods) using an evolutionary model that takes into account nucleotide composition biases and other context effects. The minor increase in the expected substitution rate at DHSs (a consequence of their higher GC content) stands in marked contrast to the observed conservation pattern. D) Size distribution of conserved elements (CEs). CEs were identified at single-base-pair resolution as regions showing at least a two-fold reduction in divergence compared to the expected rate. The size distribution of the inferred elements is depicted, showing that most elements are smaller than 50 bp. E) Nucleotide composition around CEs, plotted over two distance scales (1 kb in the main plot, 100 bp in the inset). A regional (−300 to 300 bp) increase in GC content is observed at the larger scale, but the pattern is punctuated by high AT content (−20 to 20 bp) at the smaller scale.
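The two-fold criterion described in panel D can be illustrated with a minimal Python sketch. It assumes that per-base observed and expected substitution rates are already available as two equal-length sequences; the function name `call_conserved_elements`, its `fold` parameter, and the toy rates are hypothetical and only illustrate the thresholding idea, not the inference procedure actually used in the study.

```python
from typing import List, Sequence, Tuple


def call_conserved_elements(
    observed: Sequence[float],
    expected: Sequence[float],
    fold: float = 2.0,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of consecutive positions whose
    observed rate is at most expected / fold.

    Hypothetical helper: a plain per-base threshold, assumed here for
    illustration only.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    elements: List[Tuple[int, int]] = []
    start = None  # start index of the run currently being extended
    for i, (obs, exp) in enumerate(zip(observed, expected)):
        # A position counts as conserved if divergence is reduced >= fold.
        conserved = exp > 0 and obs <= exp / fold
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            elements.append((start, i))
            start = None
    if start is not None:
        # Close a run that extends to the end of the sequence.
        elements.append((start, len(observed)))
    return elements


# Toy rates (made up for illustration, not data from the study).
observed_rates = [0.10, 0.04, 0.03, 0.10, 0.02]
expected_rates = [0.10, 0.10, 0.10, 0.10, 0.10]
elements = call_conserved_elements(observed_rates, expected_rates)
sizes = [end - start for start, end in elements]
print(elements, sizes)
```

A histogram of `sizes` over all called elements would correspond to the kind of size distribution shown in panel D.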

