Drosophila Functional Elements Are Embedded in Structurally Constrained Sequences
A) GC gain and GC loss rates around conserved elements (CEs). Shown are the average rates of different types of substitutions around CEs (see also Figure S3A). B) GC gain and loss balance. Plotted is the ratio of the GC gain rate (combining all A/T→G/C substitution types) to the total rate of GC gain and loss, at varying distances from CEs. C) Coupling between GC gain and loss. Using pairwise alignments of D. melanogaster and D. yakuba, we identified, for each genomic distance (x axis), all pairs of loci within this distance that have both diverged between the two species. We then computed, out of all such pairs of diverged loci, the fraction in which an excess of GC in D. yakuba at one locus is compensated by an excess of GC in D. melanogaster at the other locus. This non-parametric test shows that pairs of opposite GC-changing substitutions become more tightly coupled as the distance between them decreases. The gray polygon represents the binomial 95% confidence interval. For a parametric version of this test, see Figure S3B. D) The frequency of rare alleles is correlated with GC content on a scale of 20 bp. We grouped SNPs according to mutation type and plotted the average frequency of rare alleles, p(0.01 < minor allele frequency < 0.05), for different levels of local GC content (20 bp). Binomial 95% confidence intervals are depicted as gray curves. See Figure S3D for an analysis stratified by larger-scale (200 bp) GC content.
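The pair-counting statistic described for panel C can be illustrated with a minimal Python sketch. It is one possible reading of the caption, not the authors' implementation. The function names (`gc_direction`, `wilson_interval`, `coupling_fraction`) and the input layout (a list of `(position, mel_base, yak_base)` tuples for diverged loci) are hypothetical. The caption does not say which binomial interval was used, so the Wilson score interval here is an assumption.

```python
from math import sqrt

def gc_direction(mel_base, yak_base):
    """Return +1 if D. melanogaster carries the G/C allele, -1 if D. yakuba
    does, and 0 if the substitution is GC-neutral (e.g. A<->T or G<->C)."""
    gc = set("GC")
    return int(mel_base.upper() in gc) - int(yak_base.upper() in gc)

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% binomial interval (Wilson score; an assumed choice)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def coupling_fraction(sites, max_dist):
    """Among all pairs of diverged loci at most max_dist apart, return the
    fraction whose GC changes point in opposite directions, the number of
    pairs considered, and a confidence interval for the fraction.

    sites: iterable of (position, mel_base, yak_base), one per diverged locus.
    """
    sites = sorted(sites)  # sort by position so the inner loop can stop early
    compensating = 0
    total = 0
    for i, (pos_i, mel_i, yak_i) in enumerate(sites):
        d_i = gc_direction(mel_i, yak_i)
        for pos_j, mel_j, yak_j in sites[i + 1:]:
            if pos_j - pos_i > max_dist:
                break
            total += 1
            # A product of -1 means GC excess in one species at one locus
            # and GC excess in the other species at the other locus.
            if d_i * gc_direction(mel_j, yak_j) == -1:
                compensating += 1
    fraction = compensating / total if total else float("nan")
    return fraction, total, wilson_interval(compensating, total)

# Toy input (made-up positions and bases, for illustration only).
toy_sites = [(10, "A", "G"), (12, "C", "T"), (15, "A", "T"), (100, "G", "A")]
fraction, n_pairs, ci = coupling_fraction(toy_sites, max_dist=5)
```

Calling `coupling_fraction` over a range of `max_dist` values would give one point per distance, with the interval playing the role of the gray polygon. The denominator here counts every pair of diverged loci, including GC-neutral ones, following the caption's wording "out of all pairs of diverged loci".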