Mutations of Different Molecular Origins Exhibit Contrasting Patterns of Regional Substitution Rate Variation
(A) At each distance (X-axis), CpG sites were divided into four bins according to the G+C content at that distance from the site, measured as the G+C content of the 200-bp window centered at that distance (G+C < 38%: red curve; 38% ≤ G+C < 45%: green curve; 45% ≤ G+C < 52%: blue curve; G+C > 52%: black curve). The proportion of CpG sites mutated in each bin is plotted as a function of distance from the site. At distances close to the CpG site, the substitution rate in the high local-GC bin (black curve) is clearly lower than that in the low local-GC bin (red curve). This difference progressively diminishes with increasing distance from the site, suggesting a distance-decaying relationship between G+C content and CpG substitution rate. For GpC sites, we do not observe a distance-decaying effect (see inset). (B) Results of the chi-square test for independence between the rate of CpG substitution and the G+C content of the windows at each distance, on a log scale. The blue line indicates the P-value cutoff of 0.05 [log10 (P-value) = −1.30]. The P-values are very low at distances close to the CpG site and become progressively larger as the distance from the CpG site increases (distance-decaying effect). The rate of CpG substitution becomes independent of the G+C content [log10 (P-value) > −1.30] beyond ∼2,000 bp from the CpG site. (C) Results of the chi-square test for independence between the rate of GpC substitution and the G+C content of the windows at each distance, on a log scale. Again, the blue line indicates the P-value cutoff of 0.05 [log10 (P-value) = −1.30]. The rate of GpC substitution becomes independent of the G+C content [log10 (P-value) > −1.30] at a distance very close to the GpC site, and no distance-decaying effect is observed.
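
The binning and the per-distance independence test described in this legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`gc_percent`, `gc_bin`, `chi_square_2x4`) are hypothetical, the assignment of values falling exactly on a bin edge is an assumption, and the test is assumed to be a Pearson chi-square on a 2 x 4 table (mutated vs. unmutated sites across the four G+C bins, 3 degrees of freedom).

```python
import math

# G+C bin edges in percent, taken from the legend: <38, 38-45, 45-52, above 52.
GC_EDGES = (38.0, 45.0, 52.0)


def gc_percent(window: str) -> float:
    """Percent G+C of a sequence window (for example, a 200-bp window)."""
    window = window.upper()
    return 100.0 * sum(base in "GC" for base in window) / len(window)


def gc_bin(percent: float) -> int:
    """Bin index 0..3; a value exactly on an edge goes to the higher bin (assumption)."""
    return sum(percent >= edge for edge in GC_EDGES)


def chi_square_2x4(mutated, total):
    """Pearson chi-square test of independence for a 2 x 4 table.

    mutated[i] and total[i] are the numbers of mutated sites and of all sites
    in G+C bin i at one distance. Returns (statistic, p_value) for 3 degrees
    of freedom.
    """
    unmutated = [t - m for m, t in zip(mutated, total)]
    grand = sum(total)
    stat = 0.0
    for row in (mutated, unmutated):
        row_sum = sum(row)
        for observed, col_sum in zip(row, total):
            expected = row_sum * col_sum / grand
            stat += (observed - expected) ** 2 / expected
    # Closed-form upper-tail probability of the chi-square distribution, 3 df only.
    p_value = (math.erfc(math.sqrt(stat / 2))
               + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Made-up counts for one distance, purely for illustration.
    stat, p = chi_square_2x4([50, 40, 30, 20], [100, 100, 100, 100])
    print(stat, math.log10(p))
```

Repeating the test at each distance and plotting log10 of the P-value against distance would give curves of the kind shown in panels (B) and (C), with values below −1.30 indicating rejection of independence at the 0.05 level.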