Different rates of spontaneous mutation of chloroplastic and nuclear viroids as determined by high-fidelity ultra-deep sequencing

Mutation rates vary by orders of magnitude across biological systems, being higher for simpler genomes. The simplest known genomes correspond to viroids, subviral plant replicons constituted by circular non-coding RNAs of few hundred bases. Previous work has revealed an extremely high mutation rate for chrysanthemum chlorotic mottle viroid, a chloroplast-replicating viroid. However, whether this is a general feature of viroids remains unclear. Here, we have used high-fidelity ultra-deep sequencing to determine the mutation rate in a common host (eggplant) of two viroids, each representative of one family: the chloroplastic eggplant latent viroid (ELVd, Avsunviroidae) and the nuclear potato spindle tuber viroid (PSTVd, Pospiviroidae). This revealed higher mutation frequencies in ELVd than in PSTVd, as well as marked differences in the types of mutations produced. Rates of spontaneous mutation, quantified in vivo using the lethal mutation method, ranged from 1/1000 to 1/800 for ELVd and from 1/7000 to 1/3800 for PSTVd depending on sequencing run. These results suggest that extremely high mutability is a common feature of chloroplastic viroids, whereas the mutation rates of PSTVd and potentially other nuclear viroids appear significantly lower and closer to those of some RNA viruses.


Author summary
Spontaneous mutations are the ultimate source of genetic variation and their characterization provides fundamental information about evolutionary processes. The highest mutation rate so far described corresponds to a hammerhead viroid infecting plant chloroplasts. Viroids are plant-exclusive parasites constituted by 250-400 nt-long, non-protein-coding RNAs, and are divided into two families with distinct mechanisms of Introduction Spontaneous mutations are pivotal to evolution as they constitute the ultimate source of genetic variation. The biochemical and genetic bases of replication fidelity have been extensively studied, and it is well-established that spontaneous mutation rates vary by orders of magnitude across biological systems [1,2]. Whereas bacteria and other microorganisms show highly accurate replication, RNA viruses replicate with frequent errors [3]. Yet, the lowest replication fidelity reported so far corresponds to chrysanthemum chlorotic mottle viroid (CChMVd), a chloroplastic viroid in which a mutation is incorporated approximately every 400 bases copied [4]. Viroids are small (250-400 nt), circular, highly-structured RNAs that do not encode proteins and are copied by nuclear or chloroplastic DNA-dependent RNA polymerases forced to accept RNA templates [5][6][7]. They infect plants exclusively and their pathogenicity has been linked to RNA silencing [6], although other mechanisms cannot be excluded. Chloroplastic viroids encode in both polarity strands hammerhead ribozymes that play an essential role in their replication cycle. Together with their extreme simplicity, the presence of ribozymes makes these viroids reminiscent of the primordial replicons postulated by the RNA world hypothesis for the origin of life [5,8].
Although the genetic diversity of some representative viroids has been characterized in previous work [9][10][11], CChMVd is the only viroid for which the rate of spontaneous mutation has been determined experimentally [4]. As such, it remains to be shown to what extent extremely high mutation rate is a more general property of viroids or, in contrast, is specific to CChMVd and closely-related viroids. CChMVd belongs to the family Avsunviroidae, the members of which replicate in plastids (mostly chloroplasts), where their single-stranded circular RNA is copied by a bacteriophage-like nuclear-encoded RNA polymerase (NEP) through a rolling-circle mechanism to yield linear oligomers [12,13]. The latter are cleaved co-transcriptionally [14] by the embedded hammerhead ribozymes to yield monomers [15,16], which are circularized by a tRNA ligase [17]. By convention, the (+) polarity is assigned to the most abundant strand, but the replication cycle is fully symmetric, i.e. identical for both polarities (Fig 1).
In contrast, members of the family Pospiviroidae are copied by RNA polymerase II in the nucleus, where rolling-circle replication of the circular (+) strand produces (-) oligomers, which are used directly for a second round of copying to yield (+) oligomers [18][19][20]. These replicative intermediates are then cleaved into (+) monomers by RNAse III [21] and circularized by DNA ligase I accepting RNA substrates [22] (Fig 1). Although analysis of genetic diversity suggested differences in replication fidelity between nuclear and chloroplastic viroids [9][10][11], recent work based on previously published deep sequencing data has posited that potato spindle tuber viroid (PSTVd), the type species of the family Pospiviroidae, might show extremely high copying error rates similar to those of CChMVd [23]. It therefore remains to be elucidated whether the two viroid families show different rates of spontaneous mutation. Importantly, these previous works have not disentangled the various factors contributing to genetic diversity, which include selection, mutational robustness, but also genetic drift and population structure among others, precluding an unbiased inference of mutation rates.
Here, we sought to quantify in vivo the mutation rate of one representative viroid of each family replicating in a common host. Typically, mutation rate estimation methods require highly controlled laboratory conditions as well as detailed information about the number of replication cycles elapsed [24], which makes them often unsuitable for measuring mutation rates in vivo. To circumvent this problem, it is possible to quantify the population frequency of lethal mutations [4]. This is because lethal mutations cannot be inherited, and hence, their frequency should equal the rate at which they are produced. As formalized by classical mutationselection balance models, in haploid populations the equilibrium frequency of a deleterious mutation is q = μ / s, where μ is the rate of spontaneous mutation and s the selection coefficient [25]. Hence, whereas for slightly deleterious or neutral mutations (s ! 0) the observed mutation frequency may strongly deviate from mutation rate (q >> μ), for highly deleterious mutations (s ! 1) q will approach μ. Analysis of lethal or quasi-lethal mutations has been previously used for inferring the mutation rates of hepatitis C virus [26,27], poliovirus [28], and human immunodeficiency virus 1 [29], in addition to CChMVd [4]. A complication of this approach, though, is that since lethal mutations have low population frequencies, sequencing must be carried out with both high depth and accuracy. Sanger sequencing is highly accurate but has limited depth, whereas standard next-generation sequencing (NGS) has extreme depth but low per-read accuracy. This problem has been solved recently by the development of methods that increase the accuracy of NGS by orders of magnitude, such as CircSeq and Duplex Sequencing (DS) [30,31], which now permit a better characterization of viroid genetic diversity and mutation rates.
Here we have focused on DS because CirSeq demands high amounts of starting material and is thus impractical for viroid RNA obtained from infected tissue. DS reduces considerably sequencing mistakes by tagging and sequencing independently each of the two DNA strands multiple times, wherein true mutations are detected in the same position. Whereas DS does not allow removal of errors associated with reverse transcription and PCR, it can nevertheless The asymmetric pathway with one rolling-circle (upper) is followed by members of the family Pospiviroidae, while the symmetric pathway with two rollingcircles (lower) is followed by members of the family Avsunviroidae. Orange and blue lines refer to plus (+) and minus (-) polarities, respectively and processing sites are marked by arrowheads. The enzymes and ribozymes that catalyze the replication steps are indicated. Notice that RNA polymerase II (and NEP) is redirected to transcribe RNA templates and DNA ligase I to circularize RNA substrates. Abbreviations: HHRz, hammerhead ribozyme; NEP, nuclear-encoded polymerase. Reproduced with modifications from [5] with permission.
strongly increase accuracy compared to conventional NGS by removing errors associated with sequencing. We have applied DS to the chloroplastic eggplant latent viroid (ELVd) and the nuclear PSTVd infecting eggplant to exclude possible biases caused by using different hosts. We found that, while ELVd showed an extremely high mutation rate similar to that of CChMVd [4], the mutation rate of PSTVd was lower and fell closer to the typical range of RNA viruses.

DS provides in-depth sequence coverage while strongly reducing sequencing errors
For each ELVd and PSTVd, three eggplant seedlings were agro-inoculated with infectious plasmids containing head-to-tail dimeric inserts of the corresponding viroid cDNAs and total nucleic acids from upper non-inoculated leaves were extracted six months post-inoculation (mpi) for PSTVd and 12 mpi for ELVd. Subsequent fractionation with non-ionic cellulose [32] resulted in preparations enriched in RNAs with a high content in secondary structure, including viroid RNAs. The six individual preparations were separated by denaturing PAGE and the RNAs migrating between the markers of 400 and 600 nt containing the monomeric circular RNAs were eluted and recovered (Fig 2). To assess whether levels of genetic diversity varied among different viroid RNA forms, we also recovered the strands migrating between the markers of 600 and 1000 nt, which correspond to oligomeric viroid RNAs. The extracted RNAs were used for high-fidelity RT-PCR and sequenced by the DS method. The RT-PCR was performed with adjacent primers of opposite polarities to generate full-length products from the monomeric viroid circular (+) strands and the (-) oligomers, hence allowing us to sequence the entire viroid except for the primer regions. To control for errors associated with reverse transcription, PCR, and sequencing, we also performed DS of the PCR product obtained directly from a plasmid with a PSTVd insert, as well as of the RT-PCR products from (+) PSTVd and (+) ELVd RNAs transcribed in vitro using T7 or T3 RNA polymerases. The PCR products from plant extracts and the RT-PCR controls were tagged and analyzed in the same run, using a MiSeq Illumina sequencer. DS of the direct PSTVd PCR product with an average depth of 8071 reads per site yielded 16 total mutations in 2,808,781 bases read. Hence, assuming there was no variation in the plasmid template, the joint technical error rate of PCR and DS was 5.7 × 10 −6 . This rate increased to 3.9 × 10 −5 and 4.4 × 10 −5 in RT-PCR products from the in vitro transcripts of PSTVd and ELVd, respectively, showing the important contribution of RT to sequencing errors, although mutations arising during in vitro transcription were also probably present in these controls.

ELVd and PSTVd display different mutation frequencies
Analysis of the three ELVd-infected plants yielded an average mutation frequency of (1.5 ± 0.3) × 10 −2 for circular (+) strands and of (1.8 ± 0.3) × 10 −2 for (-) oligomers (Table 1; S1 Dataset), confirming the extremely high genetic diversity of chloroplastic viroids. At an average sequencing depth of 764 reads/site/run, 264 of the 295 ELVd nucleotide sites examined (89%) were polymorphic. The genetic diversity was not uniformly distributed along the ELVd sequence, with peaks in regions encompassing sites 120-140 and 240-245, which map to a hairpin and a loop, respectively, in the secondary structure proposed in vivo for the monomeric ELVd (+) strand [33] (Fig 3). In contrast, relatively low diversity values were found in regions delimited by sites 69-79 and 95-103, which form the two strands of a base-paired stem, as well as an adjacent bulge. We also found lower-than-average diversity in the region encompassing sites 18-50, which maps to the (+) hammerhead ribozyme. Sites 152-180 and 188-200, which map to the (-) hammerhead ribozyme, also showed low diversity in circular (+) strands. Overall, there was an excellent correlation between per-site mutation frequencies in circular (+) strands and (-) oligomers of the same plant at the analyzed sampling time (Pearson correlation: r ! 0.952; P < 0.001). We also found that per-site mutation frequencies were significantly correlated among plants (r ! 0.785; P < 0.001), suggesting that genetic diversity was mainly driven by a deterministic mutation-selection balance.
In PSTVd, we found variation in 262 of the 304 sites examined (86%) at an average sequencing depth of 4471 reads/site/run. Hence, a greater depth was required to sample a number of polymorphic sites similar to that found for ELVd. The average frequency of  mutations was two orders of magnitude lower in PSTVd than in ELVd, with values of (1.5 ± 0.2) × 10 −4 and (3.0 ± 2.0) × 10 −4 for circular (+) strands and (-) oligomers, respectively ( Table 1). The higher diversity of (-) oligomers was driven by an anomalously high mutation frequency in plant 1 (7.1 × 10 −4 ). Importantly, mutation frequency was significantly higher in sequences obtained from PSTVd-infected plants than in control sequences derived from the in vitro transcript, indicating that most sequence variants detected in vivo were real and not RT-PCR-sequencing artifacts (Fisher test: P < 0.001 in all six runs). As for ELVd, we found significant correlations between the per-site mutation frequencies of circular (+) strands and (-) oligomers in plant 2 (r = 0.551; P < 0.001) and plant 3 (r = 0.735; P < 0.001). This Primers used for RT-PCR from samples taken at 6/12 mpi (run 1) are indicated with red and blue font in the sequences and those used for RT-PCR from samples taken at 18 mpi (run 2) are shown in orange and purple (see S1 Table for details). The skyline plots show mutation frequencies (total mutations/total reads per site) averaged over a 10-nucleotide sliding window to smooth the plot. Thick colored arrows in these graphs indicate the positions encompassed by RT-PCR primers, with the same color code as for the sequences. Red plots correspond to circular (+) strands and blue plots to (-) oligomers. Light, bright, and dark color correspond to plants 1, 2, and 3, respectively. In the ELVd sequence and skyline plots, key elements of the (+) hammerhead ribozyme are shown in black and those of the (-) hammerhead ribozyme in white. Flags indicate the regions encompassed by the hammerhead ribozymes of each polarity, arrows indicate self-cleavage sites in the sequences, and conserved motifs are indicated with boxes. By convention, ELVd sites are numbered starting from the (+) strand self-cleavage site. In the PSTVd sequence, the central conserved region (CCR) and terminal conserved region (TCR) are boxed and also indicated by black bars in the skyline plots. Red bases shown next to the PSTVd sequence indicate mutations previously shown to have lethal effects (see text for references). By convention, PSTVd sites are numbered starting from the left terminal loop. A site-by-site list of all mutations is available in S1 Dataset (6/12 mpi) and S2 Dataset (18 mpi correlation, albeit still significant, was much lower in plant 1 (r = 0.228; P < 0.001), with the (-) oligomers of this plant showing a distribution of mutations across sites discordant with the other five runs (Fig 3). Excluding this anomalous set, we also observed correlations between per-site mutation frequencies from different plants, albeit lower than those found for ELVd (r ! 0.468; P < 0.001). This result could be explained by the greater difficulty of reproducibly sampling rarer genetic variants, or could indicate that random genetic drift has a stronger influence on genetic diversity in PSTVd than for ELVd. Finally, no major diversity peaks were found in PSTVd except for position 117 at the terminus of an A-rich sequence in the pathogenicity domain of the secondary structure proposed in vivo for the monomeric PSTVd (+) strand [34,35] (Fig 3). This region had a marked tendency to point insertions/deletions, possibly resulting from polymerase slippage. The most abundant types of mutations in ELVd sequences from circular (+) strands were transitions (75.4 ± 4.6% of all mutations), followed by transversions (21.7 ± 4.5%) and point insertions (2.5 ± 0.1%), whereas point deletions were the rarest type (0.4 ± 0.1%). C-to-U, Gto-A, and U-to-C substitutions were found at similar frequencies, whereas A-to-G changes were slightly less frequent. A very similar pattern was found for sequences derived from (-) oligomers ( Table 2). In contrast, the mutational spectrum was markedly different in PSTVd, with 50.5 ± 2.0% transitions, 40.8 ± 1.6% transversions, 3.6 ± 1.1% insertions, and 5.2 ± 0.6% deletions in circular (+) strands, versus 56.5 ± 3.5%, 31.5 ± 6.2%, 6.4 ± 2.7%, and 5.6 ± 0.6% in (-) oligomers, respectively. Contrarily to ELVd, we found clear differences among transition types in PSTVd, such that C-to-U > G-to-A > U-to-C > A-to-G in circular (+) strands, whereas (-) oligomers showed a different pattern (G-to-A > A-to-G > U-to-C % C-to-U; Table 2). The different mutational spectrum of PSTVd (+) and (-) strands was explained in part by reverse complementarity, i.e. C-to-U > G-to-A in (+) strands as opposed to G-to-A > C-to-U in (-) strands, and U-to-C > A-to-G in (+) strands as opposed to A-to-G > U-to-C in (-) strands.

The mutation rate of PSTVd is lower than that of ELVd
To estimate the ELVd mutation rate by the lethal mutation method we focused on the hammerhead ribozymes, which mediate self-cleavage of the linear oligomers and are hence  (Fig 4). The core nucleotides are required for the catalytic activity of the ribozyme [36][37][38] and, since the vast majority of mutations at these positions inactivate self-cleavage activity, the 15 sites can be used for mutation rate inference using the lethal mutation method, as shown previously [4]. In circular (+) strands average mutation frequencies were (1.   should resemble the rate of spontaneous mutation, as opposed to those in the rest of the sequence. Averaging the above values, the estimated rate of spontaneous mutation of ELVd was (1.3 ± 0.3) × 10 −3 , or roughly one mutation every 800 bases. We adopted the same mutation rate estimation approach for PSTVd. To do this, we focused on specific sites forming the central conserved region (CCR) and the terminal conserved region (TCR). These regions mediate key functions including replication for the CCR, or are presumed to play alternative essential roles (yet unknown) for the TCR [39]. Hence, most mutations at these sites should have highly deleterious or lethal effects. We also focused on a set of 23 different single-base substitutions previously reported to impair PSTVd infectivity [40][41][42][43][44]. In circular (+) strands, average mutation frequencies were (1.4 ± 0.1) × 10 −4 for the CCR/TCR regions and (1.4 ± 0.3) × 10 −4 for the set of previously described lethal mutations (Fig 4). We found similar values in (-) oligomers, except that variance was higher and that plant 1 showed a higher value, as discussed above. Therefore, the estimated rate of spontaneous mutation of PSTVd was 1.4 × 10 −4 or roughly one mutation every 7000 bases. Unexpectedly, mutation frequencies at these essential sites were not lower than those obtained for the rest of the PSTVd sequence. It is possible that most mutations in PSTVd are highly deleterious or lethal, regardless of whether they map or not to these specific regions, meaning that PSTVd would show very low mutational tolerance. It is also possible that genetic diversity accumulated at low rates due to slow replication, such that few polymorphisms were produced at 6 mpi. Alternatively, the actual PSTVd mutation rate might be lower than 1.4 × 10 −4 and the noise introduced by sequencing errors could have precluded us from measuring this lower value. However, our overall sequencing error rate as determined using a PSTVd RNA transcribed in vitro was 3.6-fold lower (3.9 × 10 −5 ) than the estimated PSTVd mutation rate (the error rates estimated specifically for the CCR/TCR and the set of 23 predefined mutations being 2.0 × 10 −5 and 6.6 × 10 −5 , respectively). Notice that this probably represents an upperlimit to the actual sequencing error rate, because in vitro transcription is an error-prone process that may have contributed mutations in our control assays, but not in actual sequences from plants.

Analysis of plants after 18 months of viroid replication
We performed a second set of experiments from the same plants at 18 mpi to address whether viroid diversity depended on sampling time and/or sequencing run. The RNA extraction procedure was identical except that we focused only on monomeric circular RNAs (i.e. migrating between the markers of 400 and 600 nt; Fig 2C and 2D). As above, the RT-PCR was performed with adjacent primers of opposite polarities producing full-length products from monomeric circular (+) RNAs, but annealing at different regions in order to cover the portions of viroid sequence that were not analyzed in the first run (see Fig 3). This new run included six PCR products (three plants, two viroids) as well as controls of RT-PCR products from (+) PSTVd and (+) ELVd RNAs transcribed in vitro. The three ELVd-infected plants yielded an average mutation frequency of (1.45 ± 0.04) × 10 −2 , which is nearly identical to the value obtained in the first run at 12 mpi (Table 3). Furthermore, the distribution of mutations along the ELVd sequence was also highly similar between the two time points (Fig 3; within-plant Pearson correlation between per-site mutation frequencies at 12 and 18 mpi: r ! 0.795), confirming that ELVd reached a deterministic mutation-selection balance.
At 18 mpi, the average frequency of mutations in PSTVd was (2.8 ± 0.1) × 10 −3 , a value an order of magnitude higher than at 6 mpi but still five times lower than for ELVd (Table 3). In addition to position 117 at the terminus of an A-rich sequence (which already showed a high frequency of point insertions/deletions at 6 mpi) at 18 mpi, we found other single-nucleotide polymorphisms (G143A, U161C, C167A, U209del, and U309A) that independently arose at high population frequencies (> 5%) in the three plants (Fig 3). Removal of these few sites from the analysis reduced the average mutation frequency by threefold, i.e. (9.0 ± 1.1) × 10 −4 . We also found that the distribution of mutations along the sequence was markedly different at 6 and 18 mpi (within-plant correlation between per-site mutation frequencies: r 0.131; Fig 3). Contrarily, the per-site mutation frequencies from the three plants were highly correlated at 18 mpi (r ! 0.917; Fig 3). Hence, in contrast to the earlier analysis in which random processes such as genetic drift appeared to play an important role in PSTVd genetic diversity, sequences obtained at 18 mpi were more consistent with a deterministic mutation-selection balance, similar to the pattern found for ELVd. Therefore, PSTVd accumulated relatively low levels of diversity at 6 mpi and showed an unexpectedly slow onset of mutation-selection balance.
As above, we estimated the spontaneous mutation rate of ELVd by focusing on the 15 central catalytic core nucleotides of each hammerhead ribozyme. Average mutation frequencies were (1.2 ± 0.3) × 10 −3 for the (+) hammerhead and (0.9 ± 0.1) × 10 −3 for the (-) hammerhead, or approximately one mutation every 1000 bases copied. These values are similar to those obtained at 6 mpi (Fig 4). For PSTVd, we again focused on CCR/TCR, as well as on the set of 23 single-point mutations previously reported to impair PSTVd infectivity. Average mutation frequencies were (4.1 ± 0.2) × 10 −4 for the CCR/TCR and (4.5 ± 1.3) × 10 −4 for the set of previously described lethal mutations (Fig 4). As opposed to the results obtained at 6 mpi, these values were six-fold lower on average than those obtained for the rest of the PSTVd sequence (Fig 3). This supports the conclusion that the accumulation of diversity was restricted specifically at these sites by the strongly deleterious/lethal effects of mutations. On the other hand, mutation frequencies at the CCR/TCR and for the set of 23 predefined mutations were threefold higher at 18 than at 6 mpi, which was unexpected assuming that these mutations were lethal. This discrepancy could be in part explained by a higher sequencing error rate in this run. The in-vitro transcribed PSTVd control showed a mutation frequency of 1.7 × 10 −4 (172 mutations in 1,014,425 bases read) in the CCR/TCR and of 1.6 × 10 −4 (44 mutations in 271,663 bases read) for the set of 23 predefined mutations, versus 2.0 × 10 −5 and 6.6 × 10 −5 in the previous experiment, respectively. By subtracting the corresponding error rates obtained in the 18 mpi run, the estimated net mutation frequencies were 2.4 × 10 −4 and 2.9 × 10 −4 for the CCR/TCR and the predefined set, respectively, suggesting approximately one mutation every 3800 bases.

Discussion
Owing to their lethality or quasi-lethality, mutations in the catalytic core of ELVd hammerhead ribozymes as well as in the CCR/TCR and some specific sites of PSTVd should have a very small number of rounds of copying to accumulate. Specifically, mutations falling at the central catalytic core of the hammerhead ribozyme should be able to survive for 0 to 2 rounds of copying, depending on the polarities of the sequenced strand and of the mutated hammerhead ribozyme (Fig 1). This makes them an excellent target for mutation rate inference by the lethal mutation approach, and a similar argument should hold for the CCR/TCR and the set of PSTVd mutations previously shown to inactivate infectivity. For instance, changes inactivating the ELVd (+) hammerhead ribozyme should prevent production of circular (+) RNA, implying that these mutations should not be found in the catalytic core of the (+) hammerhead ribozyme in sequences derived from the circular (+) strand template. In contrast, we found mutations at a frequency in the order of 10 −3 in these sequences, a value not attributable to RT-PCR-sequencing errors because the latter were two orders of magnitude less frequent. Yet, at least two other explanations are possible. First, some mutations may have resulted from RNA editing or spontaneous RNA damage (in vivo, or during the extraction process). RNA damage appears more likely in the single-stranded circular (+) RNA than in the in (-) oligomers, a fraction of which could be forming double-stranded complexes. Second, the hammerhead ribozyme located in the 5´-end repeat of the (+) oligomer should not be required for cleavage and hence may incorporate loss-of-function mutations, as opposed to the other oligomer repeats.
We have found a mutation rate for ELVd (1/100 to 1/800) relatively similar to that of CChMVd (1/400) and, hence, our results suggest that an extremely fast mutation is shared by at least two of the four chloroplastic viroids. In contrast, the mutation rate of PSTVd was 4-8 times lower and more similar to those of RNA viruses [3]. This marked difference is probably at the origin of the higher genetic diversity of chloroplastic viroids compared with their nuclear counterparts. RNA polymerase II has proofreading capacity [45] and its estimated misincorporation rate in Caenorhabditis elegans is 4 × 10 −6 , the most frequent errors being C-to-U, followed by G-to-U [46]. Interestingly, these were also the two most frequent mutation types in PSTVd (+) circular strands, although the overall mutation rate of PSTVd was much higher than the estimated RNA polymerase II misincorporation rate. This suggests that the fidelity of RNA polymerase II is strongly reduced when the enzyme is forced to accept viroid RNA as template instead of nuclear DNA. In contrast to PSTVd, chloroplastic viroids are thought to be copied by a bacteriophage-like NEP with a lower fidelity than RNA polymerase II [47,48]. In addition to differences in replication fidelity, we cannot discard other factors that could contribute to explaining differences in mutation rates, such as RNA editing [49] and spontaneous RNA damage. The more open secondary structure of chloroplastic viroids could increase susceptibility to RNA damaging agents. Furthermore, mutagenic free radicals resulting from electron transduction during photosynthesis, as well as unbalanced nucleotide pools, may also contribute to increased mutation rates in the chloroplast. Finally, in addition to differences in replication fidelity and/or RNA damage, chloroplastic and nuclear viroids may also exhibit different tolerance to mutations [50]. We found that, at 6 mpi, PSTVd mutation frequencies showed low heterogeneity along the sequence, with few peaks of diversity and no apparent differences between the CCR/TCR and other viroid regions. However, higher diversity was apparent in some PSTVd regions at 18 mpi.
According to the RNA world hypothesis, RNA preceded DNA as the carrier of genetic information during early stages of life. Indirect evidence supporting the existence of an RNA world is provided by ribozymes, which include the hammerhead structures found in chloroplastic viroids and in viroid-like satellite RNAs. It has been suggested that their small size, circularity, high G+C content, lack of protein-coding ability, and, specially, the catalytic activity associated to ribozymes, make these minimal replicons candidates for being relics of early life-forms [5,8]. An important consequence of error-prone copying in early replicons is the existence of a limit to genome complexity, as genomes over a certain size would incur in an excessive mutational load. This limit would prevent the evolution of new functions, including repair mechanisms, thereby trapping RNA genomes in an evolutionary dead-end, a problem known as Eigen's paradox [51]. This constraint predicts a negative correlation between mutation rate and genome size, although such correlation may have other explanations, including random genetic drift [52] and mutation rate optimization [53]. Whereas there is strong evidence for such a negative correlation among viruses and bacteria [2,3,53,54], viroids, are not self-replicating entities and hence should be subject to different contraints. Except for the possible role of secondary structure, factors determining the mutation rate of nuclear viroids are mainly controlled by the host, implying that lower mutation rates may not be evolutionarily accessible to them. Whereas extremely high mutation rates may situate chloroplastic viroids close to Eigen´s error threshold and may hence impose limits to the evolution of larger sequences, this does not seem to be the case for PSTVd and, probably, other nuclear viroids.

Plant inoculation and RNA fractionation
Eggplant seedlings (Solanun melongena cv. 'Redonda morada') were PSTVd-or ELVd-agroinoculated 6 and 12 months, respectively, before RNA extraction (run 1), and 18 months before RNA extraction (run 2). Total nucleic acids were extracted by grinding systemic leaves in buffer-saturated phenol, and then fractionated on non-ionic cellulose (CF11; Whatman) with STE (50 mM Tris-HCl, pH 7.2, 100 mM NaCl, 1 mM EDTA) containing 16% ethanol [32]. The resulting preparations, enriched in RNAs with a high content in secondary structure including viroid RNAs, were electrophoresed in denaturing 5% polyacrylamide gels containing 89 mM TBE (Tris-Borate-EDTA) and 8 M urea. The gels were stained with ethidium bromide and the viroid circular RNA (migrating between the linear RNA markers of 400 and 600 nt) and the viroid oligomeric forms (migrating between the linear RNA markers of 600 and 1000 nt), were excised, eluted overnight with 10 mM Tris-HCl, pH 7.5 containing 1 mM EDTA and 0.1% SDS, and recovered by ethanol precipitation.

Preparation of controls
The substrate for the ELVd control was the dimeric product resulting from in vitro transcription driven by the T7 promotor of a recombinant plasmids containing a dimeric head-to-tail ELVd-cDNA insert of the reference variant ELVd-2 (GenBank AJ536613). For the PSTVd control, the substrate was the monomeric product resulting from in vitro transcription driven by the T3 promotor of a recombinant plasmid containing a monomeric PSTVd-cDNA insert of variant RG1 (GenBank U23058) opened between positions C1-G2 flanked by a modified version of the hammerhead ribozyme of tobacco ringspot virus satellite RNA and a modified version of the ribozyme of hepatitis delta virus minus RNA strand [20,22]. The resulting unitlength transcripts, purified by denaturing PAGE and subsequent elution, were added to leaves of healthy eggplant homogenized in buffer-saturated phenol and the RNA extraction was continued as indicated in the previous section in order to prepare the controls under conditions mimicking those of infected samples.

Duplex sequencing
This technique increases per-read accuracy by orders of magnitude compared to standard Illumina sequencing, using adapters that have random yet complementary double-stranded nucleotide sequences [55]. Since the probability of two molecules being labeled with the same adapter sequence is vanishingly small, these molecular tags can be used to identify reads originating from each individual strand of DNA in the sequencing output and calculation of a consensus sequence for each of these individual strands, hence allowing removal of sequencing errors. DS adapters were constructed by annealing two oligonucleotides, one of which contained a 12-nt single-stranded randomized sequence tag. Annealed oligonucleotides were extended using the Klenow fragment, digested with a specific restriction endonuclease to produce cohesive ends, and annealed to viroid RT-PCR products for library preparation, following previously described protocols [31]. Given the small size of viroids, no template fragmentation was required. A library was prepared to identify each RT-PCR product and run on an Illumina MiSeq machine sequencer. Sequencing of direct PCR controls was made on a separate run. FastQ files were processed with the DS software pipeline (https://github.com/loeblab/Duplex-Sequencing) using BWA 0.6.2, Samtools 0.1.19, Picard-tools 1.130 and GATK 3.3-0, and GenBank accessions AJ536613 and AJ634596 as reference sequences for ELVd and PSTVd, respectively. After parsing of tags, the first 200 bases of each read were selected to increase accuracy, and initial alignment and single stranded consensus sequence (SSCS) were assembled, followed by duplex consensus sequence (DCS) assembly. The DCS outputs were finally realigned to the reference sequence to count mutations. Previously defined default parameters were used for this process [31].

Mutation rate estimation by the lethal method
As described previously, the frequency of lethal mutations in a population should equal the rate at which these mutations are produced [4,[26][27][28][29]. For ELVd, we assumed that all mutations in the hammerhead ribozyme core sites defined in Fig 4 should be lethal [4], and the same assumption was made for the PSTVd CCR/TCR sites analyzed. The overall mutation rate was simply estimated as m ¼ where N i is the number of mutations at site i, C i is sequencing coverage at site i, and T is the number of sites analyzed. For the 23 mutations that were previously reported to have lethal effects in PSTVd [40][41][42][43][44], the estimation was more complicated because for most of the sites only one or two of the three possible base substitutions could be used for mutation rate estimation, as the other substitutions were not reported lethals. To account for this, the mutation rate was estimated as m ¼ , where ρ j is the contribution of the specific mutation considered to the total mutation spectrum of PSTVd, k is the number of different lethal substitutions at site i (k = 1, 2, or 3) and ρ i is the contribution of the three possible base substitutions at this site to the total mutational spectrum. These coefficients are provided in Table 2 for sequences obtained from circular (+) and oligomeric (-) forms. For instance, if at a given site only C-to-U substitutions were lethal, k = 1 and, for sequences from circular (+) forms, we used ρ j = 1 = 25.7 (i.e. the percentage of C-to-U mutations in the total spectrum) and ρ i = 25.7 + 4.9 + 3.3 (i.e. the percentages of C-to-U, C-to-A, and C-to-G mutations). Notice that this formula could also be used for estimating mutation rates in the PSTVd CCR/TCR and ELVd hammerhead ribozymess more precisely, but this was not necessary as long as base composition and mutational spectra are similar for these regions and the rest of the viroid sequence.