Strong Purifying Selection at Synonymous Sites in D. melanogaster
(A) Overview of the bootstrap method. We sample 4D sites paired with nearby (<1 kb apart) short-intron sites, resampling the pairs with replacement, to control for linked selection and for variation in GC content and in mutation and recombination rates between the neutral reference (short introns) and the test set (4D sites). The two sites in each short-intron/4D pair must share the same major-allele nucleotide. (B) Folded site frequency spectra (SFS) of observed SNPs in short introns and 4D sites, together with the theoretical neutral distribution for a population of constant size. SNPs were resampled to 130 strains and folded using the minor allele frequency. (C) Ratio of the amount of polymorphism in short introns to that in 4D sites, for all, conserved, and variable amino acids, with standard error bars. Conserved amino acids are those present and identical in all 12 sequenced Drosophila genomes; variable amino acids are those that do not meet this definition. Ten bootstrap replicates were performed for each category (all, conserved, and variable) of 4D site. Removing the distance restriction and controlling only for GC content in the bootstrap produces identical results (not shown); to be conservative, we retained the distance restriction. Note that had we simply taken the raw density of polymorphism without correcting for GC content, we would have seen only a 7% drop in the density of polymorphism from short introns to 4D sites (5.58% vs. 6.0% of sites segregating in 4D and short-intron sites, respectively).
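The paired bootstrap in (A) and the ratio with standard errors in (C) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the figure: the `Site` record, the helper names, the use of the fraction of segregating sites as the polymorphism measure, and the use of the standard deviation of bootstrap replicates as the standard error are all hypothetical choices.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-site record: position (bp), major allele, segregating flag."""
    pos: int
    major: str
    segregating: bool


def eligible_pairs(pairs, max_dist=1000):
    """Keep (short-intron, 4D) pairs that are < max_dist bp apart and share a major allele."""
    return [(intron, fourfold) for intron, fourfold in pairs
            if abs(intron.pos - fourfold.pos) < max_dist
            and intron.major == fourfold.major]


def polymorphism_ratio(pairs):
    """Fraction of segregating short-intron sites over fraction of segregating 4D sites."""
    n = len(pairs)
    intron_density = sum(intron.segregating for intron, _ in pairs) / n
    fourfold_density = sum(fourfold.segregating for _, fourfold in pairs) / n
    return intron_density / fourfold_density  # assumes >= 1 segregating 4D site


def bootstrap_ratio(pairs, n_boot=10, seed=0):
    """Resample eligible pairs with replacement; return (mean ratio, bootstrap SE)."""
    rng = random.Random(seed)
    kept = eligible_pairs(pairs)
    ratios = []
    for _ in range(n_boot):
        # Draw whole pairs so each 4D site stays matched with its nearby intron site.
        sample = rng.choices(kept, k=len(kept))
        ratios.append(polymorphism_ratio(sample))
    # The SD of the replicate estimates serves as the bootstrap standard error.
    return statistics.mean(ratios), statistics.stdev(ratios)


# Usage with toy data: 200 matched pairs, 50 bp apart, same major allele.
toy = [(Site(100 * k, "A", k % 2 == 0), Site(100 * k + 50, "A", k % 3 == 0))
       for k in range(200)]
mean, se = bootstrap_ratio(toy)
print(f"intron/4D polymorphism ratio: {mean:.3f} +/- {se:.3f}")
```

Resampling whole pairs, rather than introns and 4D sites independently, is what lets local differences in mutation rate, recombination, and base composition affect both members of a pair equally, so they largely cancel in the ratio.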
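The resampling and folding in (B), and the neutral reference curve, might be computed along the lines below. This sketch assumes that "resampled to 130 strains" means projecting each SNP's allele count onto a sample of 130 by hypergeometric sampling without replacement, and that the neutral curve is the standard constant-size expectation in which the unfolded SFS is proportional to 1/i; the function names and the `(k, n)` input format are hypothetical.

```python
from math import comb


def project_and_fold(snps, m=130):
    """Expected folded SFS after subsampling each SNP to m strains.

    snps: list of (k, n) pairs, where k is the count of either allele among the
    n strains called at that site (requires n >= m). Returns counts for minor
    allele counts 1..m//2; mass that becomes monomorphic (bin 0) is dropped.
    """
    sfs = [0.0] * (m // 2 + 1)
    for k, n in snps:
        total = comb(n, m)
        for j in range(m + 1):
            # Hypergeometric probability of seeing j copies in a subsample of m.
            p = comb(k, j) * comb(n - k, m - j) / total
            sfs[min(j, m - j)] += p  # fold: bin by the minor allele count
    return sfs[1:]


def neutral_folded_sfs(m=130):
    """Folded SFS expected under neutrality at constant size, as proportions."""
    xi = [0.0] + [1.0 / i for i in range(1, m)]  # unfolded: xi_i proportional to 1/i
    folded = [xi[i] + (xi[m - i] if i != m - i else 0.0)
              for i in range(1, m // 2 + 1)]
    total = sum(folded)
    return [x / total for x in folded]
```

Normalizing the output of `project_and_fold` to proportions makes the observed short-intron and 4D spectra directly comparable with `neutral_folded_sfs`.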
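The uncorrected 7% figure follows directly from the two raw densities of segregating sites:

```latex
1 - \frac{5.58\%}{6.0\%} = 1 - 0.93 = 0.07
```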