Abstract
Plant mitochondrial and plastid genomes have exceptionally slow rates of sequence evolution, and recent work has identified an unusual member of the MutS gene family (“plant MSH1”) as being instrumental in preventing point mutations in these genomes. However, the effects of disrupting MSH1-mediated DNA repair on “germline” mutation rates have not been quantified. Here, we used Arabidopsis thaliana mutation accumulation (MA) lines to measure mutation rates in msh1 mutants and matched wild type (WT) controls. We detected 124 single nucleotide variants (SNVs: 49 mitochondrial and 75 plastid) and 668 small insertions and deletions (indels: 258 mitochondrial and 410 plastid) in msh1 MA lines at a heteroplasmic frequency of ≥20%. In striking contrast, we did not find any organelle mutations in the WT MA lines above this threshold, and reanalysis of data from a much larger WT MA experiment also failed to detect any variants. The observed number of SNVs in the msh1 MA lines corresponds to estimated mutation rates of 6.1 × 10⁻⁷ and 3.2 × 10⁻⁶ per bp per generation in mitochondrial and plastid genomes, respectively. These rates exceed those of species known to have very high mitochondrial mutation rates (e.g., nematodes and fruit flies) by an order of magnitude or more and are on par with estimated rates in humans despite the generation times of A. thaliana being nearly 100-fold shorter. Therefore, disruption of a single plant-specific genetic factor in A. thaliana is sufficient to erase or even reverse the enormous difference in organelle mutation rates between plants and animals.
Author summary
Plants have very low rates of mutations in the sequence of their mitochondrial and plastid (chloroplast) DNA, especially when compared to the high rates found in animal mitochondrial genomes. Unlike those of animals and many other eukaryotes, plant nuclear genomes encode a DNA repair factor called MSH1 that is targeted to the mitochondria and plastids. This enzyme is known to be necessary for maintaining the low mutation rates in these organelles, but its exact effects on heritable (germline) mutation rates have not been quantified. In this study, we used mutation accumulation lines in the model plant Arabidopsis thaliana to directly observe new mutations in both wild type plants and those lacking MSH1, providing estimates of absolute mutation rates per generation. We found that loss of MSH1 function led to massive increases in organelle mutation rates, effectively erasing or even reversing the typical gap between plants and animals.
Citation: Broz AK, Hodous MM, Zou Y, Vail PC, Wu Z, Sloan DB (2025) Flipping the switch on some of the slowest mutating genomes: Direct measurements of plant mitochondrial and plastid mutation rates in msh1 mutants. PLoS Genet 21(6): e1011764. https://doi.org/10.1371/journal.pgen.1011764
Editor: Ian R. Henderson, University of Cambridge, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: January 8, 2025; Accepted: June 9, 2025; Published: June 30, 2025
Copyright: © 2025 Broz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Illumina sequencing reads were deposited to the NCBI Sequence Read Archive and are available under BioProject PRJNA1201229. Custom scripts used in data analysis and figure generation are available via GitHub (https://github.com/dbsloan/msh1_MA_lines).
Funding: This work was supported by the National Institutes of Health (R35GM148134 to DBS), which provided salary for AKB, MMH, PCV, and DBS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Identifying the evolutionary and molecular mechanisms that determine mutation rates remains one of the defining challenges in the field of genetics [1,2]. In addition to the nuclear genome, eukaryotes harbor endosymbiotically derived organelles (mitochondria and plastids) that retain their own genomes. Despite occupying the same cells, these genomic compartments can exhibit highly divergent mutation rates [3–5]. In many eukaryotes (including most animals), the mitochondrial genome has much higher rates of nucleotide substitutions than the nucleus, but the opposite is true in seed plants, which have average nuclear, plastid, and mitochondrial substitution rates that exhibit an approximately 10:3:1 ratio [6]. Although mitochondrial and plastid mutation rates have not been directly estimated in land plants, phylogenetic analyses indicate that mitochondria experience less than one substitution per site per billion years in many lineages [7].
One likely explanation for the anomalously low mutation rates in plant organelles is the function of the “plant” MutS Homolog 1 (MSH1) gene, which was horizontally acquired by the green plant lineage (i.e., prior to the divergence of all green algae and land plants) and is absent from most other eukaryotes [8,9]. MSH1 is a member of a much larger gene family with diverse roles in DNA mismatch repair and regulating recombination [10]. MSH1 is a nuclear-encoded enzyme that is dual-targeted to the mitochondria and plastids and has long been known to maintain the structural stability of plant organelle genomes by suppressing ectopic recombination between small repeated sequences [8,11–14]. The distinctive domain architecture of the MSH1 protein [15–17] has prompted the hypothesis that it is also responsible for maintaining low mutation rates via a novel mismatch repair mechanism [18–20]. This hypothesis is supported by recent findings that disruption of MSH1 leads to a large increase in the number of de novo point mutations [9,14,21]. However, the approaches used in previous studies have precluded direct quantification of the heritable (“germline”) mutation rates in msh1 mutants.
Mutation accumulation (MA) experiments have proven to be an effective way to generate direct estimates of germline mutation rates in many model systems [22–24]. These experiments are conducted by rearing multiple MA lines in the lab. Each generation, lines are propagated by randomly choosing one or a small number of individuals for breeding, thereby minimizing effects of natural selection except in cases of mutations that are completely lethal or sterilizing. Resequencing MA lines after many generations can then identify the de novo mutations that have occurred over the course of the experiment. Here, we report the results of an MA experiment, in which we propagated msh1 mutant lines and matched wild type (WT) control lines in the model angiosperm Arabidopsis thaliana to directly quantify mitochondrial and plastid mutation rates.
Results and Discussion
Arabidopsis thaliana MA lines
All lines in this study were derived from an A. thaliana Col-0 WT plant (maternal parent) that was crossed with a homozygous knockout msh1 CS3246 mutant line (i.e., chm1–2; [8]) to generate heterozygous F1s with a “clean” cytoplasmic background that had never experienced a homozygous msh1 mutant nuclear genotype [9]. An F1 plant was then self-fertilized to generate F2 families segregating at the MSH1 locus. F2 plants that were homozygous for the WT allele (W1, W2, W3) or the mutant msh1 allele (M1, M2, M3) were then selected. The progeny from these F2 plants were used to generate the initial material for this study. For each F3 family, seven WT plants (e.g., W1, 1–7) and eight mutant plants (e.g., M1, 1–8) were selected as starting individuals and propagated to the F8 generation (i.e., seven generations in the homozygous WT or msh1 mutant state) with single-seed descent. Consistent with previous characterizations [8,11,25,26], individuals from the msh1 MA lines exhibited a diverse range of mutant phenotypes (Fig 1A). Four msh1 mutant lines were considered extinct at the F7 generation because F8 seeds would not germinate after multiple plantings, which was reflective of low germination rates in the msh1 MA lines (S1 Fig). Therefore, we analyzed F7 individuals in these cases. Although we began with 21 WT and 24 msh1 mutant lines, we only included sequence data from 20 WT and 22 msh1 lines. One WT line was excluded because the sequencing library produced very low yield. Two msh1 lines were excluded because it was discovered that they had been contaminated by WT seed and were not actually msh1 mutants.
(A) Examples of phenotypic variation arising in the msh1 MA lines. F9 progeny from sequenced F8 individuals are pictured. Nine seeds were sown per pot, but not all seeds germinated. (B) Mitochondrial (top) and plastid (bottom) variants detected by MA line sequencing. Each column represents an MA line. Each row represents a nucleotide position in the genome. Only positions with variants are shown, so spacing on the vertical axis is not a measure of distance. The color intensity of horizontal lines in each cell of the heatmaps represents variant frequency, which was normalized for background sequencing error rates by subtracting the mean variant frequency averaged across all WT MA lines. As indicated by the almost completely white panels, we did not detect any variants that passed filtering criteria in the WT MA lines. We detected 792 SNVs and small indels in the msh1 mutant MA lines.
Higher percentage of organelle-mapping reads in msh1 mutants than WT lines
Despite the small size of organelle genomes, they can be present in numerous copies and, therefore, account for a substantial proportion of DNA within a plant cell [27]. On average, we found that 2.9% and 16.1% of reads from total DNA extracted from leaf tissue mapped to the mitochondrial and plastid genomes, respectively. For both organelle genomes, msh1 mutant lines exhibited a modest but significant increase in the proportion of organelle-mapping reads (~30% increase for each genome; S2 Fig). The cause of this increase is unclear. On one hand, it could be directly related to the role of MSH1 in organelle genome recombination and repair. On the other hand, it could be a downstream consequence arising from other functional effects of MSH1. For example, msh1 mutants often exhibited reduced rates of growth in our MA lines (Fig 1A). Therefore, harvested leaf tissue might have represented effectively earlier stages in development and been more organelle-rich. Recent comparative analyses have found a negative relationship between mitochondrial genome copy number and mutation rate in seed plants [28,29]. Although this relationship runs counter to the observed increase in organelle genome copy number in the mutation-prone msh1 lines here, it may not be a relevant comparison because one hypothesized mechanism explaining the negative correlation across species depends on the role of intact MSH1 in template-based recombinational repair.
Mitochondrial and plastid germline mutation rates are extremely high in msh1 MA lines
After filtering to exclude variant calling artefacts (see Methods) related to sequencing errors, repeat-mediated recombination, structural rearrangements, and nuclear copies of mitochondrial DNA (numts), we identified a total of 49 mitochondrial single nucleotide variants (SNVs) and 75 plastid SNVs in the 22 msh1 MA lines. The total number of SNVs per msh1 line ranged from 2 to 12 and did not deviate significantly from a Poisson distribution (P = 0.31). We also observed one mitochondrial multinucleotide variant (MNV) and one plastid MNV, which consisted of either two substitutions at adjacent positions or replacement of a single base pair by a dinucleotide (S1 Dataset). We did not identify any organellar SNVs or MNVs in the 20 WT MA lines (Fig 1B) at our applied cutoff of 20% heteroplasmic allele frequency, but lowering this threshold to 10% did lead to detection of one low-frequency plastid SNV in a WT line (see Methods). Sanger sequencing of a sample of ten identified SNVs (five mitochondrial and five plastid) confirmed the accuracy of the variant calls and that they were germline mutations transmitted to the F9 generation.
One mitochondrial SNV (AT→GC at position 91,017) was shared by two msh1 MA lines derived from the same F2 parent (M3-2 and M3-6), suggesting that it may have arisen a single time in that F2 individual and been retained by both of the lines. Indeed, previous sequencing of a large pool of F3 siblings from this F2 parent showed that this mutation was present at a substantial frequency of 14.3% in that pool [20,29], providing further evidence that it was heteroplasmic in the F2 founder. This variant appears to have reached homoplasmy (100% frequency) in the two F8 individuals that carry it (S1 Dataset), whereas it seems to have been lost entirely in the remaining M3 lines (with measured allele frequencies ≤1% and not exceeding background error rates). This pattern of fixation or complete loss by the F8 generation of a variant that arose in the F2 generation is consistent with the rapid rate of heteroplasmic sorting in this system [20,30]. Every other SNV and MNV was unique to a single MA line. Therefore, the vast majority of the observed variants appear to have arisen independently during MA line propagation. In addition, we emphasize that all WT and msh1 mutant lines in this experiment were derived from a single heterozygous F1 parent. The likelihood of pre-existing heteroplasmy in this F1 individual is low because it has a functional copy of MSH1 (i.e., the msh1 mutant allele is recessive). However, if present, any heteroplasmic variants in the F1 would be expected to sort out to an equal extent in WT and mutant lines. Therefore, the fact that we did not observe SNVs in the WT lines indicates that pre-existing heteroplasmy in the F1 progenitor was not a meaningful contributor to variants identified in the msh1 mutant lines.
We found that 7 of the 49 mitochondrial SNVs (14%) and 45 of the 75 plastid SNVs (60%) were in protein-coding, rRNA, or tRNA gene sequences (Table 1). These frequencies closely mirror the percentage of genic sequence in the A. thaliana mitochondrial and plastid genomes (10% and 59%, respectively), suggesting that the mutation rates in these functionally important regions were similar to those in intronic/intergenic sequence and that MA line propagation was effective at minimizing selection. Among the variants in protein-coding sequence (CDS), 3 of the 6 mitochondrial SNVs (50%) and 13 of the 35 plastid SNVs (37%) were synonymous changes (i.e., they do not alter the amino acid sequence of the encoded protein). The observed counts of synonymous vs. nonsynonymous variants do not differ significantly from expectations generated by random permutations (P = 0.35), again suggesting that selection on functional effects of mutations was very limited in the MA lines.
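As a rough illustration of the comparison between genic SNV counts and genic sequence content, the reported counts can be checked against the stated genic fractions with an exact binomial test. This is only a sketch: the function names are hypothetical, and the test is a simple stand-in, not the permutation analysis that the text describes for synonymous vs. nonsynonymous variants.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_two_sided_p(k, n, p):
    """Two-sided exact binomial P value: the total probability of all
    outcomes that are no more likely than the observed count k."""
    observed = binomial_pmf(k, n, p)
    return sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed * (1 + 1e-9)  # tolerance for float ties
    )

# Counts and genic fractions as reported in the text.
p_mito = binomial_two_sided_p(7, 49, 0.10)      # 7 of 49 SNVs genic; 10% of genome genic
p_plastid = binomial_two_sided_p(45, 75, 0.59)  # 45 of 75 SNVs genic; 59% of genome genic
print(f"mitochondrial P = {p_mito:.2f}, plastid P = {p_plastid:.2f}")
```

Under this simplified test, neither genome shows a significant departure from the genic fraction expected if SNVs were distributed at random.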
We detected 50 of the 75 plastid SNVs (67%) at a frequency of 98% or higher in their respective MA line. Allowing for the effects of sequencing and read mapping errors, it is likely that these variants are fixed (homoplasmic) in the sequenced samples. The remaining 25 plastid SNVs are more likely heteroplasmic variants, with measured frequencies ranging from 97% down to the 20% cutoff that was applied during variant detection. It should be noted that we cannot definitively distinguish between true homoplasmic mutations and high-frequency heteroplasmic mutations and that it is even more challenging to do so for the mitochondrial genome because our sequencing data were generated from total-cellular DNA samples and the A. thaliana Col-0 nuclear genome contains a large numt that is nearly identical in sequence to the actual mitochondrial genome [31,32]. As such, even if a mitochondrial variant has reached homoplasmy, samples will still produce numt-derived reads that match the reference allele. Therefore, we applied a correction to allele frequency estimates based on the percentage of nuclear sequence in the dataset and the number of copies of the corresponding region in the numt. Given the very approximate nature of this correction, we consider mitochondrial SNVs with an estimated frequency >90% to be strong candidates for homoplasmic variants. Only 14 of the 49 mitochondrial SNVs (29%) reached this threshold. The apparently lower proportion of homoplasmic SNVs in the mitochondria compared to the plastids is consistent with previous observations in A. thaliana msh1 mutants that the plastid genome exhibits stronger effects of transmission bottlenecks and, thus, more rapid heteroplasmic sorting [20,30].
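The logic of a numt correction can be sketched as follows. This is an assumed, simplified form and not necessarily the correction applied in this study: it supposes that every numt-derived read carries the reference allele and that the numt contributes reads in proportion to nuclear sequencing depth and its copy number at the site. The function name, parameters, and example depths are all hypothetical.

```python
def numt_corrected_frequency(alt_depth, total_depth, nuclear_depth, numt_copies):
    """Estimate the organellar allele frequency after removing numt-derived reads.

    Assumes each numt copy contributes `nuclear_depth` reference-allele reads
    at the site, so the true organellar depth is total_depth minus that amount.
    """
    numt_reads = numt_copies * nuclear_depth
    organelle_depth = total_depth - numt_reads
    if organelle_depth <= 0:
        raise ValueError("numt reads account for all coverage at this site")
    # Cap at 1.0 because sampling noise can push the estimate above 100%.
    return min(1.0, alt_depth / organelle_depth)

# Hypothetical site: 460 variant reads out of 500, nuclear depth 10, three numt copies.
raw = 460 / 500
corrected = numt_corrected_frequency(
    alt_depth=460, total_depth=500, nuclear_depth=10, numt_copies=3
)
print(f"raw = {raw:.3f}, corrected = {corrected:.3f}")
```

In this made-up example the variant would fall short of a 98% cutoff before correction but approach it afterwards, which illustrates why uncorrected mitochondrial frequencies understate how close a variant is to homoplasmy.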
The observed numbers of SNVs correspond to estimated nucleotide substitution rates in msh1 mutants of 6.1 × 10⁻⁷ and 3.2 × 10⁻⁶ in the mitochondrial and plastid genomes, respectively (here and throughout, mutation rates are expressed as per bp per generation). These values represent remarkably high rates of nucleotide substitution, not only given the low mutation rates that are characteristic of plant organelle genomes but also in comparison to MA line estimates in animal systems with high mitochondrial mutation rates. For example, the mitochondrial nucleotide substitution rates estimated from MA experiments with multiple nematode species and the fruit fly Drosophila melanogaster are ~10-fold and ~50-fold lower than the mitochondrial and plastid rates we observed in A. thaliana msh1 lines, respectively (Table 2). Indeed, the observed rates in these msh1 lines are similar to the germline mitochondrial substitution rates estimated from human pedigrees [33,34], despite the ~100-fold shorter generation time in A. thaliana. To our knowledge, the only other direct estimate of a germline mitochondrial substitution rate that approaches these levels is from the water flea Daphnia magna [35], which intriguingly has an estimated rate that is almost 30-fold higher than that of its congener D. pulex [36].
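A per-bp, per-generation rate of this kind is the frequency-weighted variant count divided by the number of site-generations surveyed. The sketch below shows that arithmetic with made-up inputs; the function name, the example frequencies, and the example genome size are hypothetical and are not the values used in this study.

```python
def mutation_rate(variant_frequencies, line_generations, callable_sites):
    """Per-bp, per-generation mutation rate.

    Each variant is weighted by its heteroplasmic frequency (1.0 if homoplasmic).
    `line_generations` is the sum of generations across all MA lines.
    """
    if line_generations <= 0 or callable_sites <= 0:
        raise ValueError("line_generations and callable_sites must be positive")
    return sum(variant_frequencies) / (line_generations * callable_sites)

# Hypothetical example: 10 lines x 7 generations, a 100 kb genome, and
# four variants (two homoplasmic, two heteroplasmic).
example_frequencies = [1.0, 1.0, 0.5, 0.25]
rate = mutation_rate(example_frequencies, line_generations=10 * 7, callable_sites=100_000)
print(f"{rate:.2e} per bp per generation")
```

Weighting by heteroplasmic frequency means that a variant at 50% frequency counts as half a mutation, so rates computed this way are lower than a simple count of detected variants would suggest.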
Because we did not detect any mitochondrial or plastid SNVs in the A. thaliana WT MA lines, we cannot calculate WT mutation rates to compare to the msh1 lines. To further investigate the occurrence of organelle variants in WT backgrounds, we reanalyzed published data from a much larger A. thaliana MA experiment [37], which resequenced 107 lines at the F25 generation. The combination of more lines and more generations means that this study had a ~17-fold larger WT sample than our own. Nevertheless, we still did not detect any mitochondrial or plastid SNVs in this dataset, consistent with the lack of reported organelle variants in the original study [37]. To assess a potential upper bound on substitution rates in WT lines, we can consider that if we had observed a single (homoplasmic) SNV in these studies, it would have corresponded to mutation rate estimates of ~1 × 10⁻⁹ in the mitochondrial genome or ~3 × 10⁻⁹ in the plastid genome. More conservatively, we can calculate the lowest mutation rate that would still have yielded a 95% probability (when modeled as a Poisson process) of detecting at least one SNV given the sample size of these studies, which would correspond to rates of ~3 × 10⁻⁹ in the mitochondrial genome and ~8 × 10⁻⁹ in the plastid genome. Given these estimated upper bounds for WT mutation rates, it is not surprising that neither study detected any organelle mutations. Previous phylogenetic analysis of synonymous substitutions has estimated a mitochondrial mutation rate of ~2 × 10⁻¹⁰ (assuming a generation time of ~4 months) for the Arabidopsis lineage [7]. Based on typical ratios of plastid to mitochondrial substitution rates in angiosperms [37–40], this would correspond to a plastid rate of ~1 × 10⁻⁹. Both these values fall below the measurable range in this study.
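The Poisson-based upper bound can be written out explicitly. If the expected number of SNVs is the rate multiplied by the number of site-generations surveyed, then the probability of detecting at least one SNV is 1 − exp(−rate × site-generations), and setting this to 0.95 gives an expected count of −ln(0.05) ≈ 3.0. The sketch below uses a hypothetical site-generation total chosen only so that a single SNV corresponds to 1 × 10⁻⁹; the function names are likewise assumptions.

```python
from math import log

def rate_for_one_variant(site_generations):
    """Rate estimate implied by observing exactly one homoplasmic SNV."""
    return 1.0 / site_generations

def poisson_upper_bound(site_generations, detection_probability=0.95):
    """Lowest rate that gives at least `detection_probability` of seeing >= 1 SNV.

    Solves 1 - exp(-rate * site_generations) = detection_probability for rate.
    """
    expected_count = -log(1.0 - detection_probability)
    return expected_count / site_generations

# Hypothetical total: lines x generations x surveyed sites.
site_generations = 1e9
single = rate_for_one_variant(site_generations)
bound = poisson_upper_bound(site_generations)
print(f"single-SNV rate = {single:.1e}, 95% bound = {bound:.1e}")
```

The 95% bound is therefore about three times the single-SNV rate, which matches the relationship between the two sets of values quoted above.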
To our knowledge, the only other direct estimates of mutation rates in photosynthetic eukaryotes have been performed with unicellular species that have much shorter generation times than multicellular plants [38–41]. Most of these studies did not detect any mitochondrial or plastid SNVs, and those that did estimated mutation rates of ~1 × 10⁻⁹ or lower (Table 2). Therefore, larger scale MA studies may be needed to accurately quantify organelle mutation rates in these systems. However, investigating mutation rates in WT and msh1 mutant lines in green algal models would be valuable because characterization of MSH1 function has been largely limited to land plants to date.
Our earlier study using high-fidelity DNA sequencing to detect low-frequency variants in pooled samples of vegetative tissues estimated that SNV frequencies were ~10-fold and ~200-fold higher in msh1 mutants compared to WT for mitochondrial and plastid genomes, respectively [41,42]. Despite the large magnitude of these differences, the results from the present study indicate that comparing frequencies of somatic variants may greatly underestimate the proportional effects of disrupting MSH1 on germline mutation rates. For example, the estimated SNV rate in msh1 MA lines is ~200-fold higher than the calculated WT upper bound in the mitochondrial genome and ~400-fold higher in the plastid genome. Because those ratios are based on WT upper bounds, they represent a minimum estimate of msh1 mutant effects. Using the phylogenetic-based estimates described above as our WT rates suggests that disrupting MSH1 increases both the mitochondrial and plastid substitution rate by ~3000-fold. However, that estimate should be interpreted with caution because phylogenetic analyses have often been found to underestimate mutation rates relative to direct measurements [42,43].
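The fold-increases quoted above follow directly from the rate estimates reported in this section. The short sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative choices, and the input values are the rounded figures given in the text.

```python
# Rates per bp per generation, as reported in the text.
msh1_rate = {"mitochondrial": 6.1e-7, "plastid": 3.2e-6}
wt_upper_bound = {"mitochondrial": 3e-9, "plastid": 8e-9}     # 95% Poisson bounds
wt_phylogenetic = {"mitochondrial": 2e-10, "plastid": 1e-9}   # phylogenetic estimates

def fold_increase(mutant, wild_type):
    """Ratio of mutant to wild type rate for each genome."""
    return {genome: mutant[genome] / wild_type[genome] for genome in mutant}

minimum_fold = fold_increase(msh1_rate, wt_upper_bound)
phylogenetic_fold = fold_increase(msh1_rate, wt_phylogenetic)

for genome in msh1_rate:
    print(
        f"{genome}: >= {minimum_fold[genome]:.0f}-fold (upper bound), "
        f"{phylogenetic_fold[genome]:.0f}-fold (phylogenetic)"
    )
```

Because the upper bounds sit in the denominator, any lower true WT rate can only make the first set of ratios larger.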
The mutation spectrum in msh1 MA lines is GC-biased in the plastid genome and GC-neutral in the mitochondrial genome
The observed substitutions in A. thaliana msh1 mutant lines exhibited a highly biased mutation spectrum (Fig 2). Strikingly, 62 of the 75 (83%) plastid SNVs were AT→GC transitions. This substitution type was also the most common in the mitochondrial genome (23 of 49 SNVs; 47%). The abundance of AT→GC transitions in these lines is consistent with the disproportionate increase in this mutation class that was previously observed among low-frequency somatic variants in msh1 mutants [43–47]. In both organelles, GC→AT transitions were the second most common SNV: 6 of 75 (8%) in the plastid genome and 21 of 49 (43%) in the mitochondrial genome. Therefore, transitions represented ~90% of the observed SNVs in both genomes (Fig 2), but there was a large difference in GC bias between the plastids and the mitochondria. Substitutions were highly GC-biased in the plastid genome, whereas they were relatively GC-neutral in the mitochondrial genome. Even a GC-neutral mutation spectrum is fairly unusual because of the predominant AT bias that has been documented across the tree of life [44–48]. The observed mitochondrial mutation spectrum in msh1 MA lines is largely congruent with the relatively GC-neutral nucleotide composition of the A. thaliana mitochondrial genome (45% GC). In contrast, the GC-biased mutation spectrum in plastids is strikingly opposite the AT-biased composition of the plastid genome (35% GC). Therefore, the loss of MSH1 function may alter not just the rate but also the spectrum of germline mutations.
For mutation rate calculations, SNVs were weighted by heteroplasmic frequency and normalized to the corresponding number of GC or AT base-pairs to account for biased nucleotide compositions in the respective genomes. Note that the y-axis scales differ between the two panels.
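The effect of normalizing to base composition can be illustrated with the transition counts and GC contents reported above. This sketch is deliberately simplified and makes two assumptions that differ from the full analysis: it ignores heteroplasmic frequency weighting, and it considers only transitions. The names used are hypothetical.

```python
# Transition counts, SNV totals, and genome GC content as reported in the text.
spectra = {
    "plastid": {"AT>GC": 62, "GC>AT": 6, "total": 75, "gc_content": 0.35},
    "mitochondrial": {"AT>GC": 23, "GC>AT": 21, "total": 49, "gc_content": 0.45},
}

def transition_fraction(s):
    """Fraction of all SNVs that are transitions."""
    return (s["AT>GC"] + s["GC>AT"]) / s["total"]

def transition_gc_bias(s):
    """Ratio of AT>GC to GC>AT transitions after scaling each count by the
    proportion of AT or GC base pairs available to mutate."""
    at_to_gc_per_site = s["AT>GC"] / (1.0 - s["gc_content"])
    gc_to_at_per_site = s["GC>AT"] / s["gc_content"]
    return at_to_gc_per_site / gc_to_at_per_site

for genome, s in spectra.items():
    print(
        f"{genome}: transitions = {transition_fraction(s):.2f}, "
        f"GC bias = {transition_gc_bias(s):.2f}"
    )
```

A bias value above 1 indicates that AT sites mutate toward GC more readily than the reverse. Under these simplifying assumptions the plastid value is well above 1, whereas the mitochondrial value is close to 1.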
We posit three possible explanations (none of which are mutually exclusive) for why the observed mutation spectra in msh1 MA lines are GC-biased or at least lack the strong AT bias that is often found in other organisms/genomes. First, the extent to which repair pathways are dependent on MSH1 may alter mutation spectra. For example, even if MSH1 targets DNA damage that can cause AT-biased mutations, plant organelles may utilize redundant pathways to repair some of the most common types of damage such as deaminated cytosines [49,50] and 8-oxoG modifications formed through oxidation [51]. In contrast, if MSH1 functions in the primary or sole pathway responsible for repair of damage that leads to GC-biased mutations, msh1 mutants would be expected to show a disproportionate increase in GC-biased mutations. In animal mitochondrial genomes, oxidative damage to adenosines has been hypothesized to cause mutation signatures involving AT→GC transitions [52]. Such damage could potentially be a target of MSH1 in plant organelles. Second, the abundance of AT→GC transitions in the msh1 mutants may reflect the nucleotide misincorporation kinetics of the DNA polymerases that function in plant organelles. Steady-state kinetic analyses conducted in vitro have suggested that these polymerases are prone to misincorporate guanosines opposite thymines in the template strand [52,53]. If MSH1 normally plays a role in recognition and repair of the resulting mismatches, loss of MSH1 function would again be expected to lead to a disproportionate increase in AT→GC transitions. Finally, it is possible that observed spectra do not solely reflect mutational input but are also biased by selection or a selection-like process. The premise of MA experiments is that the random bottleneck each generation reduces effective population size to an extent that effects of selection are essentially eliminated. 
However, this experimental bottlenecking is applied at the organismal level and does not preclude selection acting on the large population of organelle genome copies that exists within an individual [53,54]. Biased gene conversion can also mimic selection in its effects on allele frequencies [55]. We previously observed that heteroplasmic sorting shows a bias towards GC alleles in A. thaliana [55]. Such a form of selection at an intracellular level could bias estimates of both the mutation rate and spectrum relative to true mutational input.
More generally, the predominance of transitions over transversions in the mutation spectra observed in MA lines is consistent with the high affinity of other MutS-family mismatch repair proteins for G-T mismatched base pairs [56]. Accordingly, mutation spectra from tumor samples arising from defects in mismatch repair often exhibit huge increases in both AT→GC and GC→AT transitions (e.g., the SBS6, SBS15, SBS21, and SBS44 spectra in the Catalogue of Somatic Mutations in Cancer [57–60]). In addition, MutS-family proteins show strong affinity for unpaired bases arising from indels in DNA synthesis [56], and “microsatellite instability” is a defining characteristic of tumors with defective mismatch repair [60,61], both of which are consistent with the extremely high rates of indels in homopolymer sequences (single-nucleotide repeats) in msh1 mutant lines (see below).
Homopolymers are hotspots for indel mutations in msh1 MA lines
The effects of disrupting MSH1 on organelle mutation rates were even more extreme for small indel variants than for SNVs. In the 22 A. thaliana msh1 mutant MA lines, we detected a total of 258 small indels in the mitochondrial genome and 410 in the plastid genome (S2 Dataset). In contrast, we did not detect any small indels in the 20 WT MA lines. The observed small indels were almost exclusively found at homopolymers, with 655 of the 668 small indels (98%) representing expansions or contractions of single-nucleotide repeats of 5 bp or longer. Of the remaining 13 indels, 10 were expansions or contractions of tandem dinucleotide repeats. Therefore, replication of simple repetitive sequences appears to be extremely error-prone in the absence of MSH1 function. The extreme mutability of these loci likely explains why there were numerous cases where mutations were detected at the same homopolymer or dinucleotide repeat in multiple MA lines (S2 Dataset). Although some of these may represent mutations that arose in an F2 progenitor and were inherited by multiple lines, we infer that the vast majority represent “multiple hits” that occurred independently in the different lines. This conclusion is supported by the following evidence: (1) it is similarly common for lines derived from different F2 individuals to share the same small indel, (2) there are many cases where the same homopolymer exhibits mutations in multiple lines that differ in indel length, and (3) sharing of alleles among lines was extremely rare for SNVs (see above).
The observed counts of small indels correspond to exceptionally high mutation rates of 4.8 × 10⁻⁶ and 2.1 × 10⁻⁵ in mitochondrial and plastid genomes, respectively. Because of the high rates of sequencing errors and other challenges to accurately estimating indel allele frequency at homopolymers [62], we did not attempt to distinguish between heteroplasmic and homoplasmic indel variants. Therefore, the above rate estimates may be inflated because we did not weight variants by their allele frequencies. On balance, however, these values are more likely to be underestimates given our inability to capture multiple independent mutations at the same site in a single line or variants with frequencies that do not rise above the high noise threshold due to sequencing error rates at homopolymers.
The higher observed rate of small indel mutations in plastids than in mitochondria may largely reflect the greater abundance of homopolymers in the plastid genome, especially the large number of long A/T homopolymers. The homopolymer composition of the respective genomes may also influence the balance of insertion vs. deletion mutations. Both genomes exhibit a bias towards deletions at A/T homopolymers and a bias towards insertions at G/C homopolymers (Fig 3A). The AT-rich plastid genome has 12.3-fold more A/T homopolymers than G/C homopolymers (Fig 3B). Accordingly, its overall indel spectrum is deletion-biased (140 insertions and 270 deletions). In the mitochondrial genome, there are only 2.8-fold more A/T homopolymers than G/C homopolymers, and the overall indel spectrum is slightly insertion-biased (144 insertions and 114 deletions). These mutation patterns are an illustration of the reciprocal causal relationships that can exist between genome nucleotide composition and mutation rates/biases and potentially lead to feedback cycles.
(A) The bars represent the counts of short indels observed in A/T homopolymers (left panel) and G/C homopolymers (right panel). Within each panel, mitochondrial indels are shown to the left of the center line, and plastid indels are shown to the right. Both genomes exhibit a deletion bias in A/T homopolymers and an insertion bias in G/C homopolymers. Indels >3 bp in length were extremely rare (S2 Dataset) and are not shown in this plot. (B) Frequency of A/T homopolymers (left panel) and G/C homopolymers (right panel) in the A. thaliana mitochondrial (gold circles) and plastid (green triangles) genomes. The fact that the plastid genome is dominated by A/T homopolymers may explain why its overall indel spectrum is deletion-biased given that A/T homopolymers appear to be more deletion-prone than G/C homopolymers in both organelles of msh1 MA lines.
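The overall insertion vs. deletion balance in each genome can be summarized from the indel counts reported above. The sketch below simply tabulates those counts; the data structure and function name are illustrative.

```python
# Small indel counts in msh1 MA lines, as reported in the text.
indel_counts = {
    "plastid": {"insertions": 140, "deletions": 270},
    "mitochondrial": {"insertions": 144, "deletions": 114},
}

def deletion_fraction(counts):
    """Fraction of small indels that are deletions."""
    total = counts["insertions"] + counts["deletions"]
    return counts["deletions"] / total

for genome, counts in indel_counts.items():
    total = counts["insertions"] + counts["deletions"]
    print(f"{genome}: {total} indels, {deletion_fraction(counts):.0%} deletions")
```

A deletion fraction above 50% corresponds to a deletion-biased spectrum (plastid), and a fraction below 50% corresponds to an insertion-biased spectrum (mitochondrial).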
A previous study that identified the spontaneous plastid mutations responsible for observed phenotypes in collections of the evening primrose Oenothera also found a predominance of small indels [63]. However, those variants did not exhibit the same overwhelming effect of homopolymer runs that we observed in A. thaliana msh1 MA lines, suggesting that the loss of MSH1 function may preferentially exacerbate error rates at homopolymers.
WT and msh1 MA lines show little difference in nuclear mutation rates
Given the known targeting of MSH1 to the mitochondria and chloroplasts, we anticipated that the mutagenic effects of disrupting its function would be limited to the organelle genomes. To test this prediction, we analyzed our sequence data to identify de novo nuclear mutations, filtering for SNVs that were found in only a single MA line and determined to be homozygous. Interestingly, the mean number of nuclear SNVs per line was significantly higher for msh1 mutants than for WT lines (7.0 vs 3.4; P = 0.006; S3 Dataset). Although this difference is tiny in comparison to the increases in mutation rate observed in the organelle genomes, it could indicate that there is a mutagenic effect in the nuclear genome resulting from the loss of MSH1 function. For example, it has been shown that msh1 mutants exhibit altered cytosine methylation patterns in the nuclear genome perhaps due to retrograde signaling from the plastids [26,64], which could potentially affect the rate and locations of mutations. In addition, the highly disruptive effects on organelle genomes in msh1 mutants may alter plant biochemistry, physiology, and growth (Fig 1A) with indirect effects on the nuclear mutation rate. However, the apparent difference between msh1 mutant and WT lines in nuclear mutation rate should be interpreted with caution, particularly because it appears to be driven by lower rates in just two of the three sets of WT lines. The lines derived from one of the WT F2 plants (W2) show nuclear SNV counts that are comparable to those of the three msh1 sets (S3 Dataset). Given that our analysis filtered for unique nuclear SNVs, it is difficult to explain why there would be an effect of the F2 progenitor, although it is possible that there were sequence variants that distinguished the F2 individuals and affected nuclear mutation rates. Regardless, the effects of MSH1 on nuclear mutation rates, if any, appear minimal compared to the massive increase in mitochondrial and plastid mutagenesis.
Conclusions
It has long been perplexing why plants and animals can differ by orders of magnitude in their germline organelle mutation rates despite their similarity in effective population size and other biological features [65]. Our study shows that disruption of a single genetic factor in plants (MSH1) is sufficient to erase or even reverse this difference. The fact that MSH1 appears to have been horizontally transferred into the green plant lineage illustrates how acquisition of foreign DNA repair machinery can have a fundamental effect on mutation rate evolution, which has also been observed for other members of the MutS gene family with mitochondrial functions [10,66]. The mechanisms by which MSH1 suppresses substitutions and small indels in plant organelle genomes have not yet been fully determined, but there is some evidence supporting a process involving mismatch recognition followed by introduction of double-stranded DNA breaks and template-based recombinational repair [65]. Regardless of the mechanism, MSH1 function appears to be critical for maintaining plant viability over generational timescales. When bottlenecking during MA line propagation limited the efficacy of selection, deleterious mutations accumulated so quickly in msh1 mutant lines that we began to see line “extinctions” within only a handful of generations. Even in natural populations where selection is more efficacious, sustaining organelle function in the face of the extreme rates of nucleotide substitutions, indels, and structural mutations that occur in the absence of MSH1 function would seem untenable.
Methods
Plant growth and MA line propagation
The original F2 families used for this experiment were those created in [66]. For MA line propagation, seeds were placed in water and vernalized at 4 °C for three days. Seeds were then transferred to 3 × 3 inch pots filled with PRO-MIX BX potting media and placed on light shelves with a 16-hr day length (light intensity of ~150 μE/m²/s). Nine seeds from each line were sown, and a single randomly chosen individual from each line was allowed to set seed for the next generation. If the randomly chosen individual failed to germinate/survive/reproduce, a backup individual was chosen at random (and so on). This process of single-seed descent was carried out until the F8 generation. Seeds for each subsequent generation were planted within one month of parental seed harvesting.
DNA extraction and sequencing
A single rosette leaf (or two in the case of very small plants) was harvested from each F8 (or F7) plant prior to bolting and stored at −80 °C until DNA was extracted using a Qiagen DNeasy Plant Kit and quantified with a Qubit HS-dsDNA kit. We performed a pilot round of DNA sequencing on samples from three MA lines (M1_2, M2_1, and W1_1) to assess whether there were likely to be enough variants to robustly measure mutation rates. For this pilot round, Illumina library construction was performed by Novogene, using an NEBNext Ultra II DNA Library Prep Kit (E7645L) and up to 80 ng of input DNA. After the pilot run, libraries for the remaining MA lines were also generated by Novogene but with an ABclonal Rapid Plus DNA Lib Prep Kit for Illumina (RK20208) and 70 ng of input DNA for each sample. In both cases, sequencing was performed on the NovaSeq X Plus platform with 2 × 150 bp paired-end reads on a 25B flow cell (partial lane sequencing to generate ≥10 Gb of data per sample).
Mitochondrial and plastid read mapping, depth analysis, and variant calling
Illumina reads were processed with Cutadapt v4.0 to remove Illumina adapters and low-quality end sequence (-q 20), applying a minimum trimmed read length of 50 bp. Trimmed reads were then mapped to the A. thaliana Col-0 mitochondrial [67] and plastid [68] genomes using Bowtie 2 v2.2.5 [69]. The plastid genome reference was modified to reflect the previous observation that our Col-0 lab stock differs from the published sequence by a 1-bp expansion in the homopolymer at position 28,673 [69]. The resulting alignments were then sorted, converted to BAM format, and indexed with Samtools v1.17 [70]. Read mapping percentages to the organelle genomes were summarized for each library with custom scripts. Variant counts at each position in the genomes were then compiled with Perbase v0.9.0 (https://github.com/sstadick/perbase).
Using custom scripts, variants were filtered to only include those with a frequency of ≥20% and a coverage depth of ≥50×. We used this relatively high frequency cutoff of 20% to minimize false positives and to avoid misidentifying low-frequency vegetative mutations as germline variants. The high cutoff was also important because some low-complexity regions such as long homopolymers cause high sequencing error rates and elevated variant frequencies across all samples, so we further filtered variants to exclude those that did not have at least a three-fold higher frequency than the mean across all WT lines. We explored the consequences of reducing the 20% frequency threshold by reanalyzing the plastid SNV calls with a more permissive threshold of 10%. We used plastid SNVs as the test for this change because they require the least amount of filtering to deal with structural and numt artefacts (see below). Reducing the threshold identified 10 additional plastid SNVs. Notably, one of these low-frequency SNVs was found in a WT line (W2-1, AT→GC substitution at position 7336; 13.2% variant frequency), while the other nine were in msh1 mutant lines. These additional SNVs increased the total count in msh1 mutant lines from 75 to 84, but because they are low-frequency, they had a negligible effect on the calculated mutation rate (an increase from 3.24 × 10⁻⁶ to 3.31 × 10⁻⁶), which is weighted by frequency (see below). Given the small effect of lowering the frequency threshold on the estimated mutation rate and our goals of minimizing effects of vegetative mutations and mapping artefacts, all reported values correspond to the 20% frequency threshold.
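The three filtering criteria described above (frequency, depth, and excess over the WT background) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the custom scripts used in the study: the `Call` record, the function name `passes_filters`, and the way WT frequencies are passed in are all hypothetical.

```python
from dataclasses import dataclass

MIN_FREQ = 0.20         # minimum variant frequency (from the text)
MIN_DEPTH = 50          # minimum coverage depth (from the text)
MIN_FOLD_OVER_WT = 3.0  # required fold-excess over the mean WT frequency


@dataclass
class Call:
    """One candidate variant in one MA line (hypothetical record layout)."""
    position: int
    alt_count: int  # reads supporting the variant allele
    depth: int      # total reads covering the site

    @property
    def freq(self) -> float:
        return self.alt_count / self.depth if self.depth else 0.0


def passes_filters(call: Call, wt_freqs: list[float]) -> bool:
    """Apply the depth, frequency, and WT-background filters to one call.

    wt_freqs holds the frequency of the same variant in each WT line.
    """
    if call.depth < MIN_DEPTH or call.freq < MIN_FREQ:
        return False
    wt_mean = sum(wt_freqs) / len(wt_freqs) if wt_freqs else 0.0
    return call.freq >= MIN_FOLD_OVER_WT * wt_mean
```

The WT-background check is what removes sites such as long homopolymers, where sequencing error inflates the apparent variant frequency in every library.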
Plant organelle genomes contain large repeated sequences that are identical in sequence and frequently interconvert via recombination and gene conversion [71,72]. To avoid double counting variants in these regions, we removed calls in one copy of the large inverted repeat in the plastid genome and one copy of each of the two pairs of large repeats in the mitochondrial genome.
Disruption of MSH1 function leads to recombinational activity between small, imperfect repeats, resulting in structural rearrangements [71]. When mapped to a reference sequence, reads from these rearranged genomes can lead to artefactual identification of de novo SNVs and small indels [72]. Therefore, we used BLASTN 2.14.1+ [73] searches to identify and exclude variants that could be explained by structural rearrangements. Variants were similarly searched against the large numt sequence [32] to exclude spurious calls related to this insertion in the nuclear genome [74]. We also corrected estimates of mitochondrial SNV frequencies to account for numt-derived sequencing reads that are identical to the reference allele in the mitochondrial genome. Average nuclear genome coverage was estimated based on the percentage of reads that were not mapped to the organelle genomes by Bowtie 2 and the total size of the nuclear genome. Using this coverage estimate and the number of copies of the corresponding region that are present in the numt, we subtracted the expected number of nuclear-derived reads from both the reference allele count and the total coverage of the mitochondrial locus and then recalculated the variant frequency.
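The numt correction of mitochondrial SNV frequencies can be illustrated with a short Python sketch. The function names and arguments are hypothetical, and estimating nuclear coverage as (non-organelle reads × read length) / nuclear genome size is an assumption about how the stated inputs were combined.

```python
def nuclear_coverage(total_reads, organelle_fraction, read_length, nuclear_genome_size):
    """Estimate mean nuclear coverage from the reads not mapped to organelle genomes."""
    nuclear_bases = total_reads * (1.0 - organelle_fraction) * read_length
    return nuclear_bases / nuclear_genome_size


def numt_corrected_frequency(alt_count, ref_count, depth, nuclear_cov, numt_copies):
    """Recalculate a mitochondrial variant frequency after removing numt-derived reads.

    Numt reads are assumed to carry the reference allele, so the expected number
    (nuclear coverage x copies of the region in the numt) is subtracted from both
    the reference allele count and the total depth.
    """
    expected_numt_reads = nuclear_cov * numt_copies
    removed = min(expected_numt_reads, ref_count)  # never remove more than observed
    corrected_depth = depth - removed
    return alt_count / corrected_depth if corrected_depth > 0 else 0.0


# Hypothetical site: 40 variant reads and 160 reference reads (depth 200),
# 30x nuclear coverage, and two copies of the region in the numt.
raw = 40 / 200
corrected = numt_corrected_frequency(40, 160, 200, nuclear_cov=30, numt_copies=2)
```

Because the subtraction only removes reference-allele reads, the corrected frequency is always at least as high as the raw one.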
To expand sampling of WT lines, we reanalyzed previously published A. thaliana MA line resequencing data [37] with the same variant calling approach. Although the raw variant calls included 231 SNVs (and no indels) that passed our coverage and frequency thresholds, these were exclusively found in just six of the 107 MA lines (39, 40, 100, 101, 102, and 103), and all of them were present at <50% frequency. Moreover, their frequencies were highly correlated across the six sequencing libraries despite the fact that these lines were propagated independently for 25 generations. Therefore, these variants appear to be the result of a biased sequencing error profile in these six libraries, and we concluded that there were no convincing organelle sequence variants in any of these MA lines, consistent with the lack of any identified variants in the original publication [37].
Mitochondrial and plastid variant confirmation
To confirm high frequency SNVs and determine whether they are germline variants transmitted to offspring, Sanger sequencing was performed (Genewiz, Azenta Life Sciences) using locus-specific primers (S1 Table). Sequenced samples included the F8 individual harboring the SNV, an F9 progeny of this plant (growing conditions as described above in “Plant growth and MA line propagation”), and a WT control line (not harboring the SNV). Sequencing traces were analyzed using Chromas Lite software.
Mutation rate calculations
Germline mutation rates (μ) were calculated as follows:

$$\mu = \frac{V}{G \times N}$$
where V is the number of identified variants, G is the respective genome size, and N is the total number of generations summed across the set of MA lines. Genome sizes were reduced by the length of the large repeat copies that were excluded for variant calling purposes (see above), resulting in sizes of 357,025 and 128,214 bp for the mitochondrial and plastid genomes, respectively. There was a total of 150 generations for homozygous msh1 mutants (7 generations for each of the 18 F8 lines and 6 generations for each of the 4 F7 lines; the F1 generation was excluded from this count because that individual was not homozygous for the msh1 mutant allele). For SNV mutation rates, each variant was scaled by its frequency. However, due to the high sequencing error rate and other challenges associated with estimating frequencies at homopolymers, we used unscaled variant counts to calculate indel mutation rates (see above for discussion of associated effects on rate estimates). To calculate an upper bound for the WT mutation rates, we used the fact that the probability of observing zero mutations given a Poisson distribution is $e^{-\lambda}$, where λ is the expected number of mutations and can be calculated as the product of μ, G, and N. By setting this probability to 0.05 and solving for μ, the minimum mutation rate that would have yielded a 95% probability of detecting at least one mutation can be calculated as follows:

$$\mu = \frac{-\ln(0.05)}{G \times N}$$
For these upper-bound calculations, N was set to 2835, reflecting the sum of 2675 (107 MA lines × 25 generations per line) from [37] and 160 (20 WT MA lines × 8 generations per line) from this study.
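As a worked illustration of the rate calculation and the Poisson upper bound described above, the following Python sketch uses the genome sizes and generation counts stated in the text. The function names are hypothetical, and the code is a reconstruction from the definitions of V, G, N, and λ rather than the scripts used in the study.

```python
import math

# Genome sizes (bp) after excluding one copy of each large repeat (from the text)
GENOME_SIZE = {"mitochondrial": 357_025, "plastid": 128_214}

N_MSH1 = 18 * 7 + 4 * 6   # homozygous msh1 generations: 18 F8 lines and 4 F7 lines
N_WT = 107 * 25 + 20 * 8  # WT generations: published MA lines plus this study


def snv_rate(frequencies, genome_size, generations):
    """Frequency-weighted SNV rate: each variant contributes its own frequency."""
    return sum(frequencies) / (genome_size * generations)


def indel_rate(n_indels, genome_size, generations):
    """Indel rate from unscaled counts (frequencies at homopolymers are unreliable)."""
    return n_indels / (genome_size * generations)


def wt_upper_bound(genome_size, generations, p_zero=0.05):
    """Smallest rate for which P(zero mutations) = exp(-mu * G * N) equals p_zero."""
    return -math.log(p_zero) / (genome_size * generations)


for genome, size in GENOME_SIZE.items():
    print(genome, wt_upper_bound(size, N_WT))
```

With these inputs, the upper bounds evaluate to roughly 3.0 × 10⁻⁹ (mitochondrial) and 8.2 × 10⁻⁹ (plastid) per bp per generation.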
Comparison of variation among MA lines to a Poisson distribution
To assess whether the observed differences in number of SNVs among msh1 lines were consistent with the variance expected from a Poisson distribution, we used the dispersiontest function in the AER package within R v4.4.1.
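For readers who want to see the logic of such a test, the sketch below implements a regression-based overdispersion z-test for an intercept-only Poisson model in Python. It is modeled on the general approach of testing whether the variance exceeds the mean; it is an assumption that this matches the AER implementation in detail, and the function name and example counts are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def poisson_overdispersion_test(counts):
    """One-sided test of Var(y) > E(y) for counts sharing a common Poisson mean.

    Returns (dispersion estimate, z statistic, one-sided p-value). Requires at
    least two counts, a nonzero mean, and some variation among the counts.
    """
    n = len(counts)
    mu = mean(counts)  # fitted value under an intercept-only model
    # Auxiliary variable: its expectation is zero under the Poisson null
    aux = [((y - mu) ** 2 - y) / mu for y in counts]
    dispersion = mean(aux) + 1.0  # ratio of variance to mean
    z = math.sqrt(n) * mean(aux) / stdev(aux)
    p_value = 1.0 - NormalDist().cdf(z)
    return dispersion, z, p_value


# Hypothetical SNV counts per line, for illustration only
dispersion, z, p_value = poisson_overdispersion_test([0, 1, 2, 3, 10, 20])
```

A dispersion estimate near 1 is what a Poisson process would produce; values well above 1 indicate more variation among lines than expected by chance.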
Functional annotation and simulations of nonsynonymous and synonymous variants
Functional characterization of the location of identified variants (genic/intronic/intergenic) and the effect on protein coding sequence (synonymous/nonsynonymous) was conducted with custom scripts. In addition, we simulated random sets of mutations to determine whether the number of synonymous vs. nonsynonymous mutations in our dataset exhibited signatures of selection against deleterious variants. To control for the mutation spectrum, we kept the number (six in the mitochondrial genome and 35 in the plastid genome) and type (AT→GC, GC→AT, GC→TA, etc.) constant but randomized their position within protein-coding sequences. Using 10,000 of these permutations, we generated a distribution of the total count of synonymous substitutions, which we used to perform a one-tailed test for an enrichment of synonymous SNVs in our dataset. We calculated the P-value for this test as the frequency of permutations with greater than or equal to the observed number of synonymous SNVs (16) in the dataset.
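The permutation scheme described above can be sketched in Python as follows. This is a simplified illustration and not the study's scripts: it assumes the standard genetic code, a single in-frame concatenation of coding sequence (`cds`, uppercase A/C/G/T), and hypothetical function names; RNA editing and overlapping genes are ignored.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Strand-symmetric substitution classes: reference base -> mutant base
SUBSTITUTION_CLASSES = {
    "AT>GC": {"A": "G", "T": "C"},
    "GC>AT": {"G": "A", "C": "T"},
    "AT>TA": {"A": "T", "T": "A"},
    "AT>CG": {"A": "C", "T": "G"},
    "GC>TA": {"G": "T", "C": "A"},
    "GC>CG": {"G": "C", "C": "G"},
}


def is_synonymous(cds, pos, new_base):
    """True if changing cds[pos] (0-based) to new_base leaves the amino acid unchanged."""
    start = pos - pos % 3
    codon = cds[start:start + 3]
    mutant = codon[:pos % 3] + new_base + codon[pos % 3 + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutant]


def synonymous_enrichment_p(cds, observed_types, observed_syn, n_perm=10_000, seed=0):
    """One-tailed P-value: share of permutations with >= observed synonymous SNVs.

    The number and type of substitutions are held fixed; only positions vary.
    """
    rng = random.Random(seed)
    eligible = {
        name: [i for i, base in enumerate(cds) if base in mapping]
        for name, mapping in SUBSTITUTION_CLASSES.items()
    }
    hits = 0
    for _ in range(n_perm):
        syn = 0
        for name in observed_types:
            pos = rng.choice(eligible[name])
            syn += is_synonymous(cds, pos, SUBSTITUTION_CLASSES[name][cds[pos]])
        hits += syn >= observed_syn
    return hits / n_perm
```

Holding the substitution types constant matters because different classes have different chances of being synonymous, so a spectrum skewed toward one class would otherwise bias the null distribution.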
Mitochondrial and plastid genome homopolymer analysis
A custom script was used to extract the number, length, and type of homopolymers in the A. thaliana mitochondrial and plastid genomes. For consistency with variant calling and mutation rate calculations, reported homopolymer data exclude one copy of each pair of large repeats in these genomes.
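A homopolymer scan of this kind can be written compactly with a regular expression. The Python sketch below is an illustration only; the function names and the minimum run length are assumptions, since the text does not state the threshold used.

```python
import re
from collections import Counter

HOMOPOLYMER = re.compile(r"A+|C+|G+|T+")


def find_homopolymers(seq, min_length=5):
    """Return (1-based start, base, length) for each run of at least min_length bases.

    min_length=5 is an arbitrary illustrative default, not a value from the study.
    """
    runs = []
    for match in HOMOPOLYMER.finditer(seq.upper()):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start() + 1, match.group()[0], length))
    return runs


def tally(runs):
    """Count homopolymers by (base, length)."""
    return Counter((base, length) for _, base, length in runs)


runs = find_homopolymers("GAAAAATCCCCCCG")
counts = tally(runs)
```

To mirror the repeat handling described above, the input sequence would first have one copy of each large repeat pair removed.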
Nuclear variant calling
To identify nuclear SNVs, BWA-MEM v0.7.18-r1243 [75] was used to map previously trimmed reads (see above) from each of our MA lines to the current A. thaliana Col-CC nuclear genome assembly (NCBI GCA_028009825.2) along with the above organelle reference sequences [76]. GATK v4.6.1.0 MarkDuplicates, HaplotypeCaller, and GenotypeGVCFs were used to identify variants in the resulting alignments, which were filtered with custom scripts to identify homozygous SNVs that had a coverage of >20× and were unique to a single MA line. These variants were further filtered to remove clusters of adjacent SNVs and indels that likely arose from mapping artefacts related to larger structural variants.
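The post-calling filters for nuclear SNVs can be sketched in Python as below. The record layout (a list of dicts), the `"1/1"` genotype string, and the 100-bp clustering window are assumptions for illustration; the text does not give the window size or the data structures used by the custom scripts.

```python
from collections import defaultdict


def filter_nuclear_snvs(calls, min_depth=20, cluster_window=100):
    """Keep homozygous, well-covered calls that are unique to one line and unclustered.

    calls: list of dicts with keys line, chrom, pos, alt, genotype, depth.
    cluster_window is a hypothetical distance (bp) used to flag adjacent variants.
    """
    # Variants seen in more than one line are unlikely to be de novo mutations
    lines_per_site = defaultdict(set)
    positions = defaultdict(list)
    for c in calls:
        lines_per_site[(c["chrom"], c["pos"], c["alt"])].add(c["line"])
        positions[(c["line"], c["chrom"])].append(c["pos"])

    def clustered(c):
        # Another call in the same line within the window suggests a mapping artefact
        return any(
            p != c["pos"] and abs(p - c["pos"]) <= cluster_window
            for p in positions[(c["line"], c["chrom"])]
        )

    return [
        c for c in calls
        if c["genotype"] == "1/1"
        and c["depth"] > min_depth
        and len(lines_per_site[(c["chrom"], c["pos"], c["alt"])]) == 1
        and not clustered(c)
    ]
```

Checking for neighbours among all calls in a line, rather than only among calls that pass the other filters, means a homozygous SNV sitting next to a heterozygous or low-coverage call is still treated as part of a suspicious cluster.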
Supporting information
S1 Table. Primers used to validate a sample of mitochondrial and plastid SNVs by PCR amplification and Sanger sequencing.
All tested sequences were confirmed as germline variants.
https://doi.org/10.1371/journal.pgen.1011764.s001
(PDF)
S1 Fig. Reduced germination percentage in msh1 mutant lines compared to WT lines in the F7, F8, and F9 generations (germination percentage was not tracked in previous generations).
Each point represents the measured germination percentage for a line, typically based on a sample of 18 seeds. Additional sets of 18 were sown in cases of extremely low germination values from the first set.
https://doi.org/10.1371/journal.pgen.1011764.s002
(PDF)
S2 Fig. Total DNA from leaf tissue exhibited a higher percentage of reads mapping to organelle genomes for msh1 mutants than WT lines.
Each point represents the read mapping percentage to the mitochondrial (left) or plastid (right) genome for an individual line.
https://doi.org/10.1371/journal.pgen.1011764.s003
(PDF)
References
- 1. Sturtevant AH. Essays on evolution. I. On the effects of selection on mutation rate. Q Rev Biol. 1937;12:464.
- 2. Lynch M, Ackerman MS, Gout J-F, Long H, Sung W, Thomas WK, et al. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet. 2016;17(11):704–14. pmid:27739533
- 3. Brown WM, George M Jr, Wilson AC. Rapid evolution of animal mitochondrial DNA. Proc Natl Acad Sci U S A. 1979;76(4):1967–71. pmid:109836
- 4. Wolfe KH, Li WH, Sharp PM. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci U S A. 1987;84(24):9054–8. pmid:3480529
- 5. Smith DR, Keeling PJ. Mitochondrial and plastid genome architecture: Reoccurring themes, but significant differences at the extremes. Proc Natl Acad Sci U S A. 2015;112(33):10177–84. pmid:25814499
- 6. Drouin G, Daoud H, Xia J. Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol. 2008;49(3):827–31. pmid:18838124
- 7. Mower JP, Touzet P, Gummow JS, Delph LF, Palmer JD. Extensive variation in synonymous substitution rates in mitochondrial genes of seed plants. BMC Evol Biol. 2007;7:135. pmid:17688696
- 8. Abdelnoor RV, Yule R, Elo A, Christensen AC, Meyer-Gauen G, Mackenzie SA. Substoichiometric shifting in the plant mitochondrial genome is influenced by a gene homologous to MutS. Proc Natl Acad Sci U S A. 2003;100(10):5968–73. pmid:12730382
- 9. Wu Z, Waneka G, Broz AK, King CR, Sloan DB. MSH1 is required for maintenance of the low mutation rates in plant mitochondrial and plastid genomes. Proc Natl Acad Sci U S A. 2020;117(28):16448–55. pmid:32601224
- 10. Sloan DB, Broz AK, Kuster SA, Muthye V, Peñafiel-Ayala A, Marron JR, et al. Expansion of the MutS Gene Family in Plants. bioRxiv. 2024. pmid:39071318
- 11. Martinez-Zapater JM, Gil P, Capel J, Somerville CR. Mutations at the Arabidopsis CHM Locus Promote Rearrangements of the Mitochondrial Genome. The Plant Cell. 1992;4(8):889.
- 12. Xu Y-Z, Arrieta-Montiel MP, Virdi KS, de Paula WBM, Widhalm JR, Basset GJ, et al. MutS HOMOLOG1 is a nucleoid protein that alters mitochondrial and plastid properties and plant response to high light. Plant Cell. 2011;23(9):3428–41. pmid:21934144
- 13. Davila JI, Arrieta-Montiel MP, Wamboldt Y, Cao J, Hagmann J, Shedge V, et al. Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis. BMC Biol. 2011;9:64. pmid:21951689
- 14. Zou Y, Zhu W, Sloan DB, Wu Z. Long-read sequencing characterizes mitochondrial and plastid genome variants in Arabidopsis msh1 mutants. Plant J. 2022;112(3):738–55. pmid:36097957
- 15. Abdelnoor RV, Christensen AC, Mohammed S, Munoz-Castillo B, Moriyama H, Mackenzie SA. Mitochondrial genome dynamics in plants and animals: convergent gene fusions of a MutS homologue. J Mol Evol. 2006;63(2):165–73. pmid:16830098
- 16. Fukui K, Harada A, Wakamatsu T, Minobe A, Ohshita K, Ashiuchi M, et al. The GIY-YIG endonuclease domain of Arabidopsis MutS homolog 1 specifically binds to branched DNA structures. FEBS Lett. 2018;592(24):4066–77. pmid:30372520
- 17. Peñafiel-Ayala A, Peralta-Castro A, Mora-Garduño J, García-Medel P, Zambrano-Pereira AG, Díaz-Quezada C, et al. Plant Organellar MSH1 Is a Displacement Loop-Specific Endonuclease. Plant Cell Physiol. 2024;65(4):560–75. pmid:37756637
- 18. Christensen AC. Genes and junk in plant mitochondria-repair mechanisms and selection. Genome Biol Evol. 2014;6(6):1448–53. pmid:24904012
- 19. Ayala-García VM, Baruch-Torres N, García-Medel PL, Brieba LG. Plant organellar DNA polymerases paralogs exhibit dissimilar nucleotide incorporation fidelity. FEBS J. 2018;285(21):4005–18. pmid:30152200
- 20. Broz AK, Keene A, Fernandes Gyorfy M, Hodous M, Johnston IG, Sloan DB. Sorting of mitochondrial and plastid heteroplasmy in Arabidopsis is extremely rapid and depends on MSH1 activity. Proc Natl Acad Sci U S A. 2022;119(34):e2206973119. pmid:35969753
- 21. Lencina F, Landau A, Prina AR. The Barley Chloroplast Mutator (cpm) Mutant: All Roads Lead to the Msh1 Gene. Int J Mol Sci. 2022;23(3):1814. pmid:35163736
- 22. Halligan DL, Keightley PD. Spontaneous mutation accumulation studies in evolutionary genetics. Annu Rev Ecol Evol Syst. 2009;40:151–72.
- 23. Katju V, Bergthorsson U. Old Trade, New Tricks: Insights into the Spontaneous Mutation Process from the Partnering of Classical Mutation Accumulation Experiments with High-Throughput Genomic Approaches. Genome Biol Evol. 2019;11(1):136–65. pmid:30476040
- 24. Wang Y, Obbard DJ. Experimental estimates of germline mutation rate in eukaryotes: a phylogenetic meta-analysis. Evol Lett. 2023;7(4):216–26. pmid:37475753
- 25. Redei GP. Extra-chromosomal mutability determined by a nuclear gene locus in Arabidopsis. Mutat Res. 1973;18:149–62.
- 26. Virdi KS, Laurie JD, Xu Y-Z, Yu J, Shao M-R, Sanchez R, et al. Arabidopsis MSH1 mutation alters the epigenome and produces heritable changes in plant growth. Nat Commun. 2015;6:6386.
- 27. Preuten T, Cincu E, Fuchs J, Zoschke R, Liere K, Börner T. Fewer genes than organelles: extremely low and variable gene copy numbers in mitochondria of somatic plant cells. Plant J. 2010;64(6):948–59. pmid:21143676
- 28. Broz AK, Waneka G, Wu Z, Fernandes Gyorfy M, Sloan DB. Detecting de novo mitochondrial mutations in angiosperms with highly divergent evolutionary rates. Genetics. 2021;218(1):iyab039. pmid:33704433
- 29. Zwonitzer KD, Tressel LG, Wu Z, Kan S, Broz AK, Mower JP, et al. Genome copy number predicts extreme evolutionary rate variation in plant mitochondrial DNA. Proc Natl Acad Sci U S A. 2024;121(10):e2317240121. pmid:38427600
- 30. Broz AK, Sloan DB, Johnston IG. Stochastic organelle genome segregation through Arabidopsis development and reproduction. New Phytol. 2024;241(2):896–910. pmid:37925790
- 31. Stupar RM, Lilly JW, Town CD, Cheng Z, Kaul S, Buell CR, et al. Complex mtDNA constitutes an approximate 620-kb insertion on Arabidopsis thaliana chromosome 2: implication of potential sequencing errors caused by large-unit repeats. Proc Natl Acad Sci U S A. 2001;98(9):5099–103. pmid:11309509
- 32. Fields PD, Waneka G, Naish M, Schatz MC, Henderson IR, Sloan DB. Complete Sequence of a 641-kb Insertion of Mitochondrial DNA in the Arabidopsis thaliana Nuclear Genome. Genome Biol Evol. 2022;14(5):evac059. pmid:35446419
- 33. Connell JR, Benton MC, Lea RA, Sutherland HG, Chaseling J, Haupt LM, et al. Pedigree derived mutation rate across the entire mitochondrial genome of the Norfolk Island population. Sci Rep. 2022;12(1):6827. pmid:35473946
- 34. Árnadóttir ER, Moore KHS, Guðmundsdóttir VB, Ebenesersdóttir SS, Guity K, Jónsson H, et al. The rate and nature of mitochondrial DNA mutations in human pedigrees. Cell. 2024;187(15):3904–18.e8. pmid:38851187
- 35. Ho EKH, Macrae F, Latta LC, McIlroy P, Ebert D, Fields PD, et al. High and highly variable spontaneous mutation rates in Daphnia. Mol Biol Evol. 2020;In Press.
- 36. Xu S, Schaack S, Seyfert A, Choi E, Lynch M, Cristescu ME. High mutation rates in the mitochondrial genomes of Daphnia pulex. Mol Biol Evol. 2012;29(2):763–9. pmid:21998274
- 37. Weng M-L, Becker C, Hildebrandt J, Neumann M, Rutter MT, Shaw RG, et al. Fine-Grained Analysis of Spontaneous Mutation Spectrum and Frequency in Arabidopsis thaliana. Genetics. 2019;211(2):703–14. pmid:30514707
- 38. Ness RW, Kraemer SA, Colegrave N, Keightley PD. Direct Estimate of the Spontaneous Mutation Rate Uncovers the Effects of Drift and Recombination in the Chlamydomonas reinhardtii Plastid Genome. Mol Biol Evol. 2016;33(3):800–8. pmid:26615203
- 39. Krasovec M, Eyre-Walker A, Sanchez-Ferandin S, Piganeau G. Spontaneous mutation rate in the smallest photosynthetic eukaryotes. Mol Biol Evol. 2017;34:1770–9.
- 40. Krasovec M, Sanchez-Brosseau S, Piganeau G. First Estimation of the Spontaneous Mutation Rate in Diatoms. Genome Biol Evol. 2019;11(7):1829–37. pmid:31218358
- 41. López-Cortegano E, Craig RJ, Chebib J, Samuels T, Morgan AD, Kraemer SA, et al. De Novo Mutation Rate Variation and Its Determinants in Chlamydomonas. Mol Biol Evol. 2021;38(9):3709–23. pmid:33950243
- 42. Parsons TJ, Muniec DS, Sullivan K, Woodyatt N, Alliston-Greiner R, Wilson MR, et al. A high observed substitution rate in the human mitochondrial DNA control region. Nat Genet. 1997;15(4):363–8. pmid:9090380
- 43. Denver DR, Morris K, Lynch M, Vassilieva LL, Thomas WK. High direct estimate of the mutation rate in the mitochondrial genome of Caenorhabditis elegans. Science. 2000;289(5488):2342–4. pmid:11009418
- 44. Morton BR. The role of context-dependent mutations in generating compositional and codon usage bias in grass chloroplast DNA. J Mol Evol. 2003;56(5):616–29. pmid:12698298
- 45. Hershberg R, Petrov DA. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 2010;6(9):e1001115. pmid:20838599
- 46. Hildebrand F, Meyer A, Eyre-Walker A. Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 2010;6(9):e1001107. pmid:20838593
- 47. Lynch M. Rate, molecular spectrum, and consequences of human mutation. Proc Natl Acad Sci U S A. 2010;107(3):961–8. pmid:20080596
- 48. Sloan DB, Wu Z. History of plastid DNA insertions reveals weak deletion and AT mutation biases in angiosperm mitochondrial genomes. Genome Biol Evol. 2014;6(12):3210–21. pmid:25416619
- 49. Boesch P, Ibrahim N, Paulus F, Cosset A, Tarasenko V, Dietrich A. Plant mitochondria possess a short-patch base excision DNA repair pathway. Nucleic Acids Res. 2009;37(17):5690–700. pmid:19625491
- 50. Córdoba-Cañero D, Dubois E, Ariza RR, Doutriaux M-P, Roldán-Arjona T. Arabidopsis uracil DNA glycosylase (UNG) is required for base excision repair of uracil and increases plant sensitivity to 5-fluorouracil. J Biol Chem. 2010;285(10):7475–83. pmid:20056608
- 51. Córdoba‐Cañero D, Roldán‐Arjona T, Ariza RR. Arabidopsis ZDP DNA 3′‐phosphatase and ARP endonuclease function in 8‐oxoG repair initiated by FPG and OGG 1 DNA glycosylases. Plant J. 2014;79:824–34.
- 52. Mikhailova AG, Mikhailova AA, Ushakova K, Tretiakov EO, Iliushchenko D, Shamansky V, et al. A mitochondria-specific mutational signature of aging: increased rate of A>G substitutions on the heavy strand. Nucleic Acids Res. 2022;50:10264–77.
- 53. Taylor DR, Zeyl C, Cooke E. Conflicting levels of selection in the accumulation of mitochondrial defects in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2002;99:3690–4.
- 54. Schaack S, Ho EKH, Macrae F. Disentangling the intertwined roles of mutation, selection and drift in the mitochondrial genome. Philos Trans R Soc Lond B Biol Sci. 2020;375(1790):20190173. pmid:31787045
- 55. Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009;10:285–311. pmid:19630562
- 56. Brown J, Brown T, Fox KR. Affinity of mismatch-binding protein MutS for heteroduplexes containing different mismatches. Biochem J. 2001;354:627–33.
- 57. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500(7463):415–21. pmid:23945592
- 58. Drost J, van Boxtel R, Blokzijl F, Mizutani T, Sasaki N, Sasselli V, et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science. 2017;358(6360):234–8. pmid:28912133
- 59. Meier B, Volkova NV, Hong Y, Schofield P, Campbell PJ, Gerstung M, et al. Mutational signatures of DNA mismatch repair deficiency in C. elegans and human cancers. Genome Res. 2018;28(5):666–75. pmid:29636374
- 60. Sondka Z, Dhir NB, Carvalho-Silva D, Jupe S, Madhumita, McLaren K, et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 2024;52:D1210–7.
- 61. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN. The repertoire of mutational signatures in human cancer. Nature. 2020.
- 62. Jeanjean SI, Shen Y, Hardy LM, Daunay A, Delépine M, Gerber Z, et al. A detailed analysis of second and third-generation sequencing approaches for accurate length determination of short tandem repeats and homopolymers. Nucleic Acids Res. 2025;53(5):gkaf131. pmid:40036507
- 63. Massouh A, Schubert J, Yaneva-Roder L, Ulbricht-Jones ES, Zupok A, Johnson MTJ, et al. Spontaneous Chloroplast Mutants Mostly Occur by Replication Slippage and Show a Biased Pattern in the Plastome of Oenothera. Plant Cell. 2016;28(4):911–29. pmid:27053421
- 64. Kundariya H, Sanchez R, Yang X, Hafner A, Mackenzie SA. Methylome decoding of RdDM-mediated reprogramming effects in the Arabidopsis MSH1 system. Genome Biol. 2022;23(1):167. pmid:35927734
- 65. Lynch M. Evolution of the mutation rate. Trends Genet. 2010;26(8):345–52. pmid:20594608
- 66. Muthye V, Lavrov DV. Multiple Losses of MSH1, Gain of mtMutS, and Other Changes in the MutS Family of DNA Repair Proteins in Animals. Genome Biol Evol. 2021;13(9):evab191. pmid:34402879
- 67. Sloan DB, Wu Z, Sharbrough J. Correction of Persistent Errors in Arabidopsis Reference Mitochondrial Genomes. Plant Cell. 2018;30(3):525–7. pmid:29519893
- 68. Sato S, Nakamura Y, Kaneko T, Asamizu E, Tabata S. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 1999;6(5):283–90. pmid:10574454
- 69. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. pmid:22388286
- 70. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. pmid:19505943
- 71. Arrieta-Montiel MP, Mackenzie SA. Plant mitochondrial genomes and recombination. In: Kempken F, Editor. Plant Mitochondria. New York: Springer Verlag. 2011. p. 65–82.
- 72. Zhu A, Guo W, Gupta S, Fan W, Mower JP. Evolutionary dynamics of the plastid inverted repeat: the effects of expansion, contraction, and loss on substitution rates. New Phytol. 2016;209(4):1747–56. pmid:26574731
- 73. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. pmid:20003500
- 74. Waneka G, Broz AK, Wold-McGimsey F, Zou Y, Wu Z, Sloan DB. Disruption of recombination machinery alters the mutational landscape in plant organellar genomes. G3 (Bethesda). 2025;In Press.
- 75. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN]. 2013.
- 76. Reiser L, Bakker E, Subramaniam S, Chen X, Sawant S, Khosa K, et al. The Arabidopsis Information Resource in 2024. Genetics. 2024;227:iyae027.
- 77. Sung W, Tucker AE, Doak TG, Choi E, Thomas WK, Lynch M. Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc Natl Acad Sci U S A. 2012;109(47):19339–44. pmid:23129619
- 78. Saxer G, Havlak P, Fox SA, Quance MA, Gupta S, Fofanov Y, et al. Whole genome sequencing of mutation accumulation lines reveals a low mutation rate in the social amoeba Dictyostelium discoideum. PLoS One. 2012;7(10):e46759. pmid:23056439
- 79. Lynch M, Sung W, Morris K, Coffey N, Landry CR, Dopman EB, et al. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci U S A. 2008;105(27):9272–7. pmid:18583475
- 80. Liu H, Zhang J. Yeast spontaneous mutation rate and spectrum vary with environment. Curr Biol. 2019;29:1584–91.e3.
- 81. Howe DK, Baer CF, Denver DR. High rate of large deletions in Caenorhabditis briggsae mitochondrial genome mutation processes. Genome Biol Evol. 2009;2:29–38. pmid:20333220
- 82. Konrad A, Thompson O, Waterston RH, Moerman DG, Keightley PD, Bergthorsson U, et al. Mitochondrial Mutation Rate, Spectrum and Heteroplasmy in Caenorhabditis elegans Spontaneous Mutation Accumulation Lines of Differing Population Size. Mol Biol Evol. 2017;34(6):1319–34. pmid:28087770
- 83. Molnar RI, Bartelmes G, Dinkelacker I, Witte H, Sommer RJ. Mutation rates and intraspecific divergence of the mitochondrial genome of Pristionchus pacificus. Mol Biol Evol. 2011;28(8):2317–26. pmid:21368317
- 84. Haag-Liautard C, Coffey N, Houle D, Lynch M, Charlesworth B, Keightley PD. Direct estimation of the mitochondrial DNA mutation rate in Drosophila melanogaster. PLoS Biol. 2008;6(8):e204. pmid:18715119