The author has declared that no competing interests exist.
A long-time observation in genomes from bacteria to humans is that the level of nucleotide diversity varies from region to region within the genome. The sequence at some spots is virtually identical among all individuals in a population, while at other spots, variation abounds. What accounts for this differential variability from place to place within the genome? In this issue of
Their work concerns how insertions and deletions (“indels”) contribute to the sequence variability within a region, and the importance of indels relative to other factors. In the most widely held model, the “mutagenic indel” hypothesis, a heterozygous indel causes the DNA repair machinery to sprinkle the surrounding region with substitutions (the essence of sequence variability) in the process of attempting to correct the mismatch. A key prediction of the hypothesis is that, because the repair machinery is only called into play when sequences on homologous chromosomes differ, once the indel becomes homozygous in the population (i.e., all individuals have it on both chromosomes), there is nothing left to repair, and the accumulation of substitutions should end.
In contrast, the “regional differences” hypothesis posits that substitutions arise because of peculiarities of the local genomic environment, independent of the presence or heterozygosity of indels. Under this hypothesis, a region should continue to accumulate substitutions whether or not the indel is homozygous in the population, or even present at that particular spot.
The authors began their test of the mutagenic indel hypothesis by examining nucleotide diversity in a prokaryote, the gut bacterium
A second test was to compare nucleotide diversity in regions without indels to comparable regions with indels. If indels promote substitutions (and therefore diversity), the region surrounding the indel should be more diverse, and the region without it should be no more diverse than expected from the background rate of sequence change. This comparison is trickier than it sounds, since it requires knowing ahead of time which of two similar sequences contains an indel. The authors proceeded by comparing similar regions in two different bacterial strains, and using the sequence from a third strain to infer the ancestral sequence. Contrary to the mutagenic indel hypothesis, they found that diversity in both sequences was elevated above the background, and that the sequence without the indel was just as diverse as the sequence with it.
Furthermore, although indels had an acute effect near the time of mutation, that effect diminished over time, whereas the regional effect persisted. This suggested that the sequence of the region, not the presence of the indel, was controlling the diversity level. And what was true for bacteria was also true for yeast and flies: indels caused a one-time spike in diversity, while the effect of the region was constant.
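The logic of using a third strain to decide which of two sequences carries the indel can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the assumption of a pre-computed three-way alignment with "-" marking gaps, and the toy sequences are all hypothetical and chosen only to show the reasoning. The strain whose gap state differs from the third (outgroup) strain is taken to carry the derived indel, and each substitution is assigned to whichever strain differs from the outgroup.

```python
GAP = "-"

def derived_indel_carrier(a, b, outgroup):
    """Return "a" or "b" for the strain carrying the derived indel, or None.

    Assumes the three sequences are already aligned (equal length, "-" = gap).
    The outgroup is treated as a stand-in for the ancestral state.
    """
    if not (len(a) == len(b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    for x, y, o in zip(a, b, outgroup):
        a_gap, b_gap, o_gap = x == GAP, y == GAP, o == GAP
        if a_gap != b_gap:
            # The strain that matches the outgroup keeps the ancestral state;
            # the other strain carries the derived insertion or deletion.
            return "b" if a_gap == o_gap else "a"
    return None  # no indel distinguishes the two strains

def lineage_substitutions(a, b, outgroup):
    """Count substitutions assigned to each strain, skipping gapped columns."""
    counts = {"a": 0, "b": 0, "unresolved": 0}
    for x, y, o in zip(a, b, outgroup):
        if GAP in (x, y, o) or x == y:
            continue
        if x == o:
            counts["b"] += 1           # strain b changed
        elif y == o:
            counts["a"] += 1           # strain a changed
        else:
            counts["unresolved"] += 1  # all three differ; cannot polarize
    return counts

# Toy alignment (invented for illustration only).
strain_a = "ACGT--ACGTAC"
strain_b = "ACGTTTACCTAC"
third    = "ACGTTTACGTAC"

print(derived_indel_carrier(strain_a, strain_b, third))
print(lineage_substitutions(strain_a, strain_b, third))
```

In this toy case the deletion is assigned to strain a, while the single substitution falls on strain b, the sequence without the indel. A real analysis would also need to handle alignment uncertainty and multiple indels per region, which this sketch ignores.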
So what characteristics of a region make it prone to accumulation of diversity? Repeat sequences are well known to cause indels, as the replication machinery slips and misaligns the two strands. But the authors propose a different mechanism to account for the substitutions surrounding the repeats. The authors note that two of
These results will be valuable in understanding the origin and distribution of genomic sequence diversity, a critical feature of both fundamental genomic studies and evolutionary models. The findings also have practical applications, since diversity is the raw material for all types of gene-hunting techniques.