Sequence characterization of eccDNA content in glyphosate sensitive and resistant Palmer amaranth from geographically distant populations

The discovery of non-chromosomal circular DNA offers new directions in linking genome structure with function in plant biology. Glyphosate resistance through EPSPS gene copy amplification in Palmer amaranth was due to an autonomously replicating extra-chromosomal circular DNA mechanism (eccDNA). CIDER-Seq analysis of geographically distant glyphosate sensitive (GS) and resistant (GR) Palmer Amaranth (Amaranthus palmeri) revealed the presence of numerous small extra-chromosomal circular DNAs varying in size and with degrees of repetitive content, coding sequence, and motifs associated with autonomous replication. In GS biotypes, only a small portion of these aligned to the 399 kb eccDNA replicon, the vehicle underlying gene amplification and genetic resistance to the herbicide glyphosate. The aligned eccDNAs from GS were separated from one another by large gaps in sequence. In GR biotypes, the eccDNAs were present in both abundance and diversity to assemble into a nearly complete eccDNA replicon. Mean sizes of eccDNAs were similar in both biotypes and were around 5kb with larger eccDNAs near 25kb. Gene content for eccDNAs ranged from 0 to 3 with functions that include ribosomal proteins, transport, metabolism, and general stress response genetic elements. Repeat content among smaller eccDNAs indicate a potential for recombination into larger structures. Genomic hotspots were also identified in the Palmer amaranth genome with a disposition for gene focal amplifications as eccDNA. The presence of eccDNA may serve as a reservoir of genetic heterogeneity in this species and may be functionally important for survival.


Introduction
Extra-chromosomal circular DNA (eccDNA) are nucleus limited ring-like DNA entities derived from the genome and have been found in a wide range of eukaryotic organisms including yeast, Drosophila, Xenopus, mice, and humans [1][2][3][4]. In yeast, eccDNAs with functional genes and sizes of up to 38 kb that cover 23% of the genome have been reported [5].
EccDNAs have been reported in normal healthy cells in humans [6,7] with functions associated with aging and the formation of telomeric circles [8,9], cancer progression, and therapeutic resistance [10][11][12]. EccDNAs have been implicated in approximately half of all human cancers, contributing to genetic heterogeneity that provides aggressive tumors with a selective advantage; hence the higher prevalence in malignant tumors [13][14][15]. Sizes of cancer-related eccDNA have been reported to range from several hundred base pairs to 600 kb, encoding functional oncogenes and their various regulatory elements [16,17]. In plants, eccDNAs have been reported in Arabidopsis [18,19], Oryza, Pisum, Secale, Triticum, and Vicia [20,21] with sizes that range from 1 kb to 50 kb. These eccDNAs contain coding sequences commonly found within the nucleus such as ribosomal genes, tRNAs, and transposons [19,22,23]. EccDNAs are thought to arise from linear chromosomes through repeat-mediated intrachromosomal homologous recombination that results in the 'looping-out' of circular structures. These focal amplifications are mediated by multimers corresponding to 5S ribosomal DNA, noncoding chromosomal high-copy tandem repeats, and telomeric DNA [1,21,22]. In Arabidopsis, eccDNA genesis is the result of recombination among inverted repeats upstream and downstream of the various tRNAs and transposons [19]. Several follow-up studies using Arabidopsis and rice have shown that defective RNA polymerase II (Pol II) activity and simultaneous inhibition of DNA methylation lead to the activation of retrotransposons, which can induce eccDNA formation upon stress [11]. These studies suggest a possible relationship among epigenetic status, regulation of transposon bursts, and genomic focal amplifications. The presence of eccDNA with functional genes in a cell can be a signature of stress and/or function as a reservoir of genetic variation that a cell may activate as a rapid response to stress.
For example, oncogene amplification and expression via eccDNA in human cancers provides a unique mechanism for massive gene expression [16] and ultimately a reservoir of genetic heterogeneity by which cancer cells have a selective advantage for aggressive behavior and persistence [13].
Recently, the genetic entity conferring resistance to the herbicide glyphosate in Palmer amaranth (Amaranthus palmeri), now termed the eccDNA replicon, was revealed to be a massive, 399 kb extrachromosomal circular DNA (eccDNA) [24,25]. Glyphosate resistance in Palmer amaranth is achieved through replicon amplification with simultaneous gene copy amplification and expression of the 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene and its product, EPSP synthase [26], which is the herbicide target of glyphosate [27]. Glyphosate resistance may occur with as few as 5 copies of EPSPS. The increase in EPSPS functions to ameliorate the unbalanced or unregulated metabolic changes, such as shikimate accumulation, loss of aromatic amino acids, phenolic acids for lignin synthesis, and structural intermediates for plant growth regulators associated with glyphosate activity in sensitive plants [26,28]. Isolation and single-molecule sequencing of the replicon revealed a single copy of the EPSPS gene along with 58 other predicted genes whose broad functions span detoxification, replication, recombination, DNA binding, and transport [25,29]. Gene expression profiling of the replicon under glyphosate treatment showed transcription of 41 of the 59 genes in GR biotypes, with high expression of EPSPS, aminotransferase, zinc-finger, and several uncharacterized proteins [25,29].
Repeat sequences and mobile genetic elements have been associated with eccDNA formation [4,6,18,20,29,30] in higher eukaryotes. The repeat landscape of the replicon is described as a complex arrangement of repeat sequences and mobile genetic elements interspersed among arrays of clustered palindromes which may function in stability, DNA duplication, and/or a means of nuclear integration [25]. In a follow-up study, sequence analysis identified a region in the replicon with elevated A+T content and an exact match to a conserved eukaryotic extended autonomous consensus sequence (EACS) [31]. Surrounding this sequence were multiple DNA unwinding elements (DUE), which together are often associated with DNA bending and origins of replication and are typically found near EACS [32,33]. Regions flanking these elements in the replicon were cloned into an ARS-less yeast plasmid, which resulted in colony formation, suggesting autonomous replication as the mechanism by which the replicon increases in copy number [34].
Initial low-resolution FISH analysis of GR A. palmeri showed the amplified EPSPS gene was randomly distributed in the genome, suggesting a possible transposon-based mechanism of mobility [26]. A follow-up study using much longer bacterial artificial chromosome (BAC) probes coupled with high-resolution fiber extension microscopy verified the eccDNA replicon and identified various structural polymorphisms including intact, circular, dimerized circular, and linear forms [24]. Additionally, this study resolved a critical question regarding the maintenance mechanism that explains uneven segregation of glyphosate resistance among progenies: genomic tethering. Analysis of fiber-FISH images with replicon probes and meiotic pachytene chromosomes revealed very clear, single signals [24]. If the replicon were integrated into the genome, double signals would be evident; the single signals therefore suggest a tethering mechanism as a means of genomic persistence in daughter cells during cell division [24]. Other genetic entities that maintain genomic persistence through tethering include DNA viruses such as Epstein-Barr, Rhadinovirus, Papillomavirus, and others [35].
Glyphosate resistance in Palmer amaranth has been observed in individuals with EPSPS copy numbers that range from 5 to 150 copies [26,36,37]. Amplification of the EPSPS gene correlated with amplification of flanking genes and sequence [25,29], which suggests a large amplification unit and genome size enlargement in cells with many replicon copies [29]. Flow cytometry verified significant genome expansion in plants with high copy numbers (e.g., an 11% increase in genome size with ~100 extra copies of the replicon), seemingly without fitness penalty [29].
Glyphosate resistance in Palmer amaranth was originally reported in Georgia in the early 2000s [38], and a recent analysis using whole genome shotgun sequencing verified that the replicon was present and intact in GR Palmer amaranth populations across the USA [39,40]. This study also reported a lack of replicon SNP variation among GR eccDNAs from geographically distant states when aligned to the Mississippi replicon reference [25]. The replicon was not present in GS individuals, which supports a single origin hypothesis and spread of the replicon across the USA through mechanical means such as spread of GR pollen in contaminated plant products, on farm equipment, and cattle movement, or via pollen.
The genomic mechanisms and origins of the replicon, and how it assembled and gave rise to eccDNA in Palmer amaranth, remain elusive, but the above studies lead to two hypotheses: 1) the eccDNA replicon formed through intramolecular recombination among distal parts of the nuclear genome in short evolutionary time, or 2) there may exist a reservoir of smaller eccDNAs that are basal in the cell and that may have the ability to recombine to assemble larger units as part of a dynamic response to stress. In this study, we report the presence and sequence characterization of an abundant reservoir of eccDNAs in both GS and GR biotypes using single molecule sequencing and the CIDER-Seq approach [18]. We examine the similarities and differences among samples representing distant geographic locations reported in [40], quantify their abundance and diversity, and assess whether recombination to form larger multimeric units may be possible.

EccDNA content and coding structure in geographically distributed A. palmeri
Following the general methods and recommended computational pipelines outlined in the CIDER-Seq single-molecule approach [18], we identified an extensive amount of variable-sized eccDNA in all sequenced samples of GS and GR biotypes [Table 1]. The number of unique eccDNAs detected in GS samples ranged from 443 (ks_s) to 6,227 (ms_s) with a mean of 2,661 [Table 1]. Unique eccDNAs were in much higher abundance in GR samples and ranged from 2,200 (az_r) to 5,650 (ms_r), with a mean of 4,448, nearly double that of GS [Table 1]. Length distributions of eccDNA were similar between GS and GR biotypes and ranged from 27 bp to nearly 27 kb, with mean lengths of around 6 kb [Table 1 and Fig 1]. Gene prediction resulted in eccDNAs both with and without complete open reading frames. In GS samples, the number of eccDNAs with predicted genes ranged from 76 to 505, with a mean of 272 eccDNAs with genes per sample. The number of GR eccDNAs with predicted genes was nearly 4 times greater, with a range of 263 to 1,179 and a mean of 718 eccDNA with genes per sample, suggesting that glyphosate stress influenced unique gene focal amplifications [Table 1]. Of the eccDNA with predicted genes, the number of predicted genes per eccDNA ranged from 1 to 10, with an average of 2 genes per eccDNA in both GS and GR [S1 and S2 Tables]. Transfer RNAs (tRNA) were predicted exclusively on eccDNA without CDS sequences and ranged widely from 46 to 715 (average of 350 per sample) in GS samples and from 130 to 528 (average of 364 per sample) in GR samples.

Coding content of eccDNAs in glyphosate sensitive and resistant A. palmeri
Gene content from both GS and GR biotypes was compared to identify unique and common functional protein coding domains among the geographically distant samples. In GS biotypes, 9 functional protein coding domains were discovered that are common among each of the states [Fig 2]. These functional domains are annotated as ATP synthase, cytochrome P450, protein kinase, ribosomal protein, NADH dehydrogenase, Clp protease, and oxidoreductase [Table 2]. Various pairwise combinations of GS A. palmeri biotypes shared a range of 1 to 12 elements [Fig 2 and S3 Table]. Genes that regulate cell division, such as the Ras protein family, and those involved in DNA replication (helicase) were common among Arizona, Georgia, and Mississippi GS eccDNA samples [S3 Table].
Several abiotic/biotic resilience-related functional protein domains were found in Arizona and Mississippi GS samples, including an oxysterol-binding protein, pectinesterase, NmrA-like family, and WRKY DNA-binding domain elements [S3 Table]. Also discovered were shared functional domains involved in DNA methylation and histone maintenance (H2A/H2B/H3/H4) [S3 Table]. Common between Georgia and Mississippi GS biotypes were ABC transporter and Cytochrome C oxidase subunit II (periplasmic domain) protein domains [S3 Table]. Unique to Arizona were response regulators such as trehalose-phosphatase, chalcone-flavanone isomerase, O-methyltransferase, Myb-like [...] Table]. Hundreds of other unique functional domains in different GS biotypes were recorded in S3 Table. It is notable that the EPSPS gene was not found in any of the GS eccDNAs.
In GR biotypes, we identified a total of 20 functional protein domains that are shared among all 6 resistant samples [Table 3]. The shared GR domains had various cellular maintenance functions in addition to stress response domains that include ABC transporter, HSP70 protein, Ribosomal protein, WD domain, and Leucine rich repeats [Table 3]. A range of 1 to 9 protein family domains were shared by at least 5 of the GR biotypes [S4 Table]. No apical meristem (NAM) protein, peroxidase, and the TCP-1/cpn60 chaperonin family are among the stress response elements. Arizona, Delaware, Kansas, and Maryland GR biotypes all contained EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) and Arabidopsis phospholipase-like protein (PEARLI 4) functional domains, with 21 and 24 copies, respectively, distributed across various eccDNA within these four samples.

Gene ontology enrichment of A. palmeri eccDNA
Gene ontology enrichment analysis of predicted coding elements on GS eccDNA revealed a variety of enriched biological processes, cellular components, and molecular functions [Fig 3]. Enriched biological processes include regulation of transcription, membrane and lipid transport, DNA binding, fatty acid biosynthesis, protein phosphorylation, oxidation-reduction, chromatin maintenance, and protein translation [Fig 3A and S5 Table]. Cellular component and molecular function categories of interest include membrane and ribosome components [Fig 3B], cytoplasm, protein kinase activity, and ATP binding [Fig 3C].
Glyphosate resistant eccDNAs showed similar but not identical enriched biological processes such as transmembrane transport, translation, protein phosphorylation, and oxidation-reduction process [Fig 4A]. Ribosome, nucleus, membrane, and integral component of membrane were also enriched in the cellular component category [Fig 4B]. Representative molecular functions for GR eccDNA were mainly in the ribosome and membrane categories, but ATP binding, protein kinase activity, and catalytic activity were also enriched [Fig 4C and S6 Table].
[...] Table]. The most common repeat classes were simple repeats, long terminal repeats (LTR) from the Copia superfamily, low complexity regions, and LTR from the Gypsy superfamily [S7 Table]. Interestingly, simple repeat content varied drastically among the GR and GS states. For example, Arizona and Mississippi GS and GR pairs were closely balanced in terms of content, but Mississippi had nearly 6 times as many, with ~17.5k compared to ~4k simple repeats [S7 Table]. The Long Terminal Repeats/Copia class was second in abundance among eccDNAs, followed by low complexity repeats and then Gypsy elements. DNA [...] Table].

Similarity to the eccDNA replicon and replication origins on eccDNAs in A. palmeri
Alignment and comparative analysis for coding content and conserved sequence structure between GS and GR eccDNAs and the eccDNA replicon [25] identified a total of 162 GS eccDNA and 2,547 GR eccDNA with matches at least 100 bp in length and a percent identity of at least 95% [Fig 5]. A total of 7 and 11 eccDNA replicon genes were predicted in GS and GR eccDNA, respectively [S8 Table]. Predicted eccDNA replicon genes in GS eccDNA include PEARLI4, Heat shock (HSP70), no apical meristem (NAM), replication factor-A, retrotransposon, zinc finger, and suppressor of gene silencing [S8 Table]. GR predicted replicon [...] Fig 6A].
In GR eccDNA we identified 5 eccDNA with 2 copies of the EPSPS gene and 11 eccDNA with a single EPSPS copy [ Fig 6A]. A self-alignment of the GR EPSPS eccDNA shows many conserved direct and indirect repeats [ Fig 6B] with very high sequence identity (>95% with at least 100bp). Palindromic repeats that flank the EPSPS gene, previously described as possible genome tethering sites [25], were also evident among various eccDNA (Grey links in A and on the top right corner of B) indicating the potential for recombination among these smaller eccDNA, relative to the replicon.
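The alignment criteria used above (matches of at least 100 bp with at least 95% identity to the replicon) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Hit` record, its field names, and the 1-based inclusive coordinates are assumptions made for illustration, and the coverage calculation is one simple way to express how completely a set of eccDNA tiles a reference such as the 399 kb replicon.

```python
from collections import namedtuple

# Hypothetical alignment record: one eccDNA-to-replicon match.
# Coordinates are assumed to be 1-based and inclusive on the reference.
Hit = namedtuple("Hit", "ecc_id ref_start ref_end identity")

MIN_LEN = 100        # minimum match length in bp (threshold from the text)
MIN_IDENTITY = 95.0  # minimum percent identity (threshold from the text)


def passes(hit):
    """Return True if a match meets both thresholds."""
    length = abs(hit.ref_end - hit.ref_start) + 1
    return length >= MIN_LEN and hit.identity >= MIN_IDENTITY


def eccdna_with_replicon_match(hits):
    """Set of eccDNA identifiers with at least one qualifying match."""
    return {h.ecc_id for h in hits if passes(h)}


def covered_fraction(hits, ref_len):
    """Fraction of the reference covered by the union of qualifying matches."""
    intervals = sorted(
        (min(h.ref_start, h.ref_end), max(h.ref_start, h.ref_end))
        for h in hits if passes(h)
    )
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / ref_len
```

Under these assumptions, a sparse set of matches separated by large gaps gives a small covered fraction, whereas a dense and diverse set approaches 1.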
Previous work has implicated a 17bp extended autonomous consensus sequence (EACS) with a motif of WWWWTTTAYRTTTWGTT, which contains a core 11bp autonomous consensus sequence (ACS) reported in yeast [41], as a sequence where replication machinery initiates autonomous replication in plants [32], and which was functionally verified in the eccDNA replicon [34] [Fig 7]. Analysis of the GS and GR eccDNA for autonomous consensus sequences (ACS) [41] [...] Table]. A total of 36,237 core ACS sites (11bp) were predicted within 18,679 unique eccDNA out of the total 37,336 predicted eccDNAs, implicating this sequence as a possible common origin of replication sequence among smaller eccDNA in Amaranthus palmeri. Of the eccDNA that contained ARS sequences, 2,785 were predicted to contain coding sequences, whereas 16,048 eccDNA did not contain an ARS sequence.
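As an illustration of how such motif sites can be located, the following Python sketch scans a circular sequence for the 17bp EACS quoted above and for an 11bp core. It is a hedged example and not the tool used in this study: the core motif WTTTAYRTTTW is taken here as positions 4 to 14 of the EACS, which is an assumption, and the function names are hypothetical. Both strands are searched, and matches that span the circle junction are found by extending the sequence with its own first bases.

```python
import re

# IUPAC codes needed for the two motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "Y": "[CT]", "R": "[AG]"}
COMPLEMENT = str.maketrans("ACGTWYR", "TGCAWRY")

EACS = "WWWWTTTAYRTTTWGTT"  # 17 bp extended motif quoted in the text
CORE_ACS = EACS[3:14]       # assumed 11 bp core: WTTTAYRTTTW


def revcomp(motif):
    """Reverse complement of an IUPAC motif."""
    return motif.translate(COMPLEMENT)[::-1]


def to_regex(motif):
    """Compile a motif to a regex; the lookahead reports overlapping sites."""
    return re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")


def find_sites(seq, motif, circular=True):
    """Return sorted (start, strand) pairs; start is 0-based on the + strand."""
    seq = seq.upper()
    n, k = len(seq), len(motif)
    # For a circle, append the first k-1 bases so junction sites are visible.
    text = seq + seq[:k - 1] if circular and n >= k else seq
    sites = set()
    for strand, m in (("+", motif), ("-", revcomp(motif))):
        for hit in to_regex(m).finditer(text):
            if hit.start() < n:
                sites.add((hit.start(), strand))
    return sorted(sites)
```

For example, `find_sites(sequence, CORE_ACS)` lists candidate core sites on one eccDNA; counting eccDNA with a non-empty result gives the kind of tally reported above.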

Genomic origins of eccDNAs in A. palmeri
To determine the genomic origins of eccDNA and the possibility of genomic regions with a disposition for eccDNA formation, GS and GR eccDNA were mapped to the chromosome scaffolded Amaranthus palmeri assembly [42] and counted using non-overlapping genomic windows of 500 kb [Fig 8]. We identified several regions of the genome with a very high disposition for focal amplifications that are conserved between GS and GR. [...] Table]. The center of chromosome 3 contained 225 GS and 449 GR eccDNA. The genomic region of eccDNA origin among GR samples with the most eccDNA was on chromosome 4, with 487 GR eccDNA and only 51 from GS, suggesting a possible signal of glyphosate stress. Extraction and self-alignment of the 6 genomic windows from the Palmer amaranth chromosome scale assembly from [42] revealed intricate arrays of repetitive sequence [Fig 8B]. Short, inverted repeats were the most common among all 6 regions [Fig 8B]. Clustered palindromes of various sizes were discovered in segments 2, 3, 4, and 5, as indicated by box-like structures. Regions 2 and 3 (highlighted in Fig 8B) contained more complex repetitive structure with larger direct repeats (region 2) and indirect repeats (region 3) [Fig 8B].
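The window-based counting described above can be sketched in a few lines of Python. This is an illustration only, not the BedTools commands used in the study: the input format (chromosome, 0-based start, end) is hypothetical, and assigning each eccDNA to the window that contains its midpoint is an assumption chosen so that every eccDNA is counted exactly once.

```python
from collections import Counter

WINDOW = 500_000  # non-overlapping window size in bp, as in the text


def window_counts(mappings, window=WINDOW):
    """Count mapped eccDNA per (chromosome, window index).

    mappings: iterable of (chrom, start, end) tuples, 0-based half-open.
    Each eccDNA is assigned to the window containing its midpoint
    (an assumption made for this sketch).
    """
    counts = Counter()
    for chrom, start, end in mappings:
        midpoint = (start + end) // 2
        counts[(chrom, midpoint // window)] += 1
    return counts


def top_windows(counts, n=6):
    """Return the n windows with the most eccDNA, highest first."""
    return counts.most_common(n)
```

Running `window_counts` separately on GS and GR mappings and comparing the two counters for the same window is one way to flag regions that are enriched in both biotypes or only under glyphosate selection.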

PCR validation of eccDNA
To validate the circular structure of eccDNA in A. palmeri, primers were designed in various configurations to exclude the possibility of linear DNA [S1 Fig]. Two templates, Ga_r_ecc_311 and Az_r_ecc_1037, were selected at random, and PCR yielded amplification products of the expected size, verifying a circular molecule [S1 Fig]. Because of the highly repetitive, low-complexity nature of eccDNA sequence, we observed non-specific binding when using the Ga template, but the amplification products were close in size to the predicted sizes.

Discussion
Gene copy number variation is a predominant mechanism by which organisms respond to selective pressures in nature. Focal amplifications of transcriptionally active chromatin as eccDNAs have been found in both abundance and diversity across higher and lower order eukaryotic species, underscoring their importance as a vehicle for gene copy amplification. Advancements in single molecule sequencing and approaches to purify and directly sequence circular DNA have led to evidence that eccDNA may have a fundamental role in cell function and also serve as a source of genetic heterogeneity in response to environmental pressures [1-4, 18, 22, 25, 30]. Previous work in Palmer amaranth demonstrated that several genes in addition to EPSPS were co-amplified on a large eccDNA (~400 kb) with sophisticated repetitive content and origins from distal segmental genomic regions [25]. This large eccDNA served as the vehicle for EPSPS gene copy amplification, but whether construction of this large eccDNA was the result of intramolecular recombination or recombination among a population of smaller eccDNA is unclear.
Using single molecule sequencing and the CIDER-Seq analytical pipeline [18], we identified diverse and abundant eccDNA species in both GS and GR biotypes collected from distal geographic regions that were previously reported [40]. The sizes of these eccDNA ranged from a few hundred base pairs to nearly 30kb in both biotypes and between 6 and 20% were predicted to contain genes which indicates that eccDNAs are present in Amaranthus palmeri without glyphosate exposure.
Gene enrichment analysis of both GS and GR eccDNA provided insight on biological processes and molecular functions enriched for activities related to a generalized stress response or important for rapid adaptation, such as transcription regulation, development, chromatin, protein phosphorylation, oxidation-reduction, ribosomal and membrane components, protein kinase activity, and ATP binding. This indicates that eccDNAs may have a role in preserving important protein synthesis genes. Notably, transfer RNAs (tRNA) were predicted to reside on eccDNA in both GS and GR samples, but only on eccDNA that do not contain coding sequences. This was also shown in Arabidopsis [19] and suggests that protein synthesis is a key attribute or component of the early response to stress and/or the adaptive response. This finding also suggests that regulation of protein synthesis, perhaps as driven by eccDNA, is an independent component of selection and directed gene focal amplifications as eccDNA. Furthermore, plants likely require additional copies of these protein synthesis genes for stress responses to produce significant immunity or defense products, as is the case for GR A. palmeri [1]. For example, transmembrane transport genes have been shown to play an important role in adaptation of Arabidopsis to metalliferous soils [43] and in resource allocation and sensing under plant abiotic stress [44][45][46], and were enriched on GS Palmer amaranth eccDNA. Fatty acid biosynthesis is another category of enriched genes on GS eccDNA which has been implicated in signaling and plant defense to pathogens [47,48].
At the gene level, there was a core set of 9 functional protein coding domains in common among the GS samples. Ribosomal proteins (circular rDNA), which are commonly reported as functional genes among eccDNA, were found among all 9 GS samples, suggesting a common role for rDNAs as circular structures in plants [6,30,49,50]. Interestingly, Cytochrome p450 and Clp protease domains were also present in each of the GS samples. Cytochrome p450s are a superfamily of genes that perform a suite of functions in plant development and protection from various stresses via multiple biosynthetic and detoxification pathways. Cytochrome p450 activity plays a central role in detoxification of xenobiotics in various weed species [51][52][53][54] and in biosynthesis of hormones, fatty acids, sterols, cell wall components, biopolymers, and various defense compounds [55]. Clp proteases are proteolytic enzymes whose increased expression also plays a protective role for the plant in both abiotic and biotic stress [56][57][58]. Clp proteases help maintain protein homeostasis in chloroplasts and remove nonfunctional proteins, which is essential during stress episodes when proteins tend to be more vulnerable to damage [20][21][22]. These core genes encoded on GS eccDNA may contribute to Palmer amaranth's innate ability to rapidly adapt.
GR biotypes shared the same 9 core functional domains as GS biotypes, including Cytochrome p450 and Clp protease, in addition to 11 other domains, indicating that eccDNAs are dynamic and that their presence and coding structure may be the result of selective pressures. Notably, the additional functional domains in GR biotypes include additional ribosomal motifs, ABC transporters, HSP70 proteins, and leucine rich repeat (LRR) domains. ABC transporters are important for detoxification, environmental stresses, and pathogen resistance [23] and may play a complementary role in glyphosate detoxification in addition to EPSP synthase over-accumulation. The most abundant functional domain, conserved among all the samples, is the HSP70 domain, which functions in protein maintenance and a wide variety of stress response mechanisms such as response to high temperatures [59], and was also a predicted gene on the eccDNA replicon [25]. Hsp70 proteins have been reported to function by holding together protein substrates to aid movement and regulation and to prevent aggregation under physical and/or chemical pressure in plants [59,60], and have served as a functional target in improving abiotic stress resilience in Arabidopsis [61] and other species. It is notable that HSP70 is present in both GS and GR biotypes but is a core gene shared among all GR biotypes. The presence of Hsp70 on eccDNA suggests a possible role in glyphosate resistance or, perhaps, a genomic mechanism for rapid mitigation of heat and other abiotic stresses. Leucine rich repeat (LRR) domains are associated with protein-protein interactions, often as part of plant innate immune receptors [62]. Various transcription factors such as WRKY, bZIP, helicases, GATA (zinc finger), E2F, helix-loop-helix, TCP, and others were also predicted on A. palmeri eccDNA. Since transcription factor access to heterochromatin is limited by its compact structure, eccDNAs may provide a faster and more effective avenue for protein synthesis.
Cancer cells with oncogenes encoded on eccDNAs appear to produce significantly more transcript copies compared to the same oncogenes encoded on linear DNA structures [14].
A primary question underlying the origins and structural dynamics of the large eccDNA replicon (~400 kb) [24,25] is the mechanism by which it is assembled. The most likely scenarios are long-range genomic interactions and a compounded building event over short evolutionary time, or recombination among smaller eccDNA with newly selected genomic focal amplifications resulting from glyphosate stress to form the larger structure, again over short evolutionary time scales. Here we show a moderate degree of eccDNA replicon coverage with GS eccDNA [Fig 6A]; however, there are large, disconnected gaps in coverage. It is notable that the EPSPS gene was not found on any GS biotype eccDNA in this study, while several other replicon genes were. One of the primary drawbacks of the CIDER-Seq methodology is the limitation of eccDNA size to the read length of the Pacific Biosciences Sequel II instrument [18], which means that eccDNAs larger than an average read length, such as the eccDNA replicon [25], will not be sequenced intact. This limitation prevented the complete assembly of the EPSPS replicon; however, the EPSPS gene and most other predicted eccDNA replicon genes were found in GR biotype eccDNA, and coverage of the replicon was practically complete, with only a few small gaps. Furthermore, the EPSPS gene was found on smaller eccDNA in GR biotypes in multiple copies, which corroborates the work of Koo et al., who observed the extra-chromosomal EPSPS gene vehicle in multimeric forms [24]. Together, these results suggest that eccDNA are present as a basal source of genetic heterogeneity or rapid response mechanism, are selectively amplified, and that the large eccDNA structure reported to confer glyphosate resistance is likely built by recombination among smaller eccDNA over rapid evolutionary timescales.
Another important observation and similarity with the eccDNA replicon is the high abundance and seemingly random distribution of the core 11bp autonomous consensus sequence and a longer, more conserved 17bp extended autonomous consensus sequence [41] among approximately half of the GS and GR eccDNA. The greater abundance seems to be on eccDNA without coding sequences. These sequences were previously verified to function in autonomous replication and may be part of a regulated mechanism, perhaps epigenetic or other, to maintain gene copy numbers in A. palmeri. In the eccDNA replicon, there is a single copy of the 17bp consensus sequence and 46 copies of the 11bp sequence, seemingly randomly distributed among the replicon [25,34]. This observation further supports the possibility that the eccDNA replicon is the result of recombination among smaller eccDNA. It is also possible that there are alternate mechanisms or origins of replication on eccDNA in A. palmeri that are used to maintain and amplify copy number. Previous work showed that the coding components of the eccDNA replicon seem to be derived from distal regions of the genome [25], and evidence presented here shows that eccDNA in both GS and GR seem to originate from all over the genome [Fig 8A]. Here, we also demonstrate that there are segments of the genome, or perhaps a genomic context, with a disposition for focal amplifications. These genomic 'hotspots' are composed of various repeat structures that may have facilitated eccDNA formation. There are also regions of the genome that seem to be activated as 'hotspots' in response to glyphosate stress, which suggests that eccDNA formation may also be a directed event rather than random. It is still unclear if genes need to be in the 'right' genomic context for a focal amplification to occur, or if other regulatory/initiation mechanisms exist.
Validation of circular structure with overlapping PCR amplicons provided single PCR bands in most cases, but non-specific binding was also observed, likely due to the repetitive nature of eccDNA. This work provides evidence that eccDNA are a basal component of the cell and likely function as a reservoir of genetic heterogeneity in A. palmeri as part of the rapid adaptation program.

Plant material and genomic DNA extraction
Seeds were collected from individual GR plants that had survived glyphosate application as previously described [29,40]. Plants were grown in 9 × 9 × 9 cm plastic pots that contained a commercial potting mix (Metro-Mix 360; Sun Gro Horticulture, Bellevue, WA, USA). Seeds were sown on the potting mix surface and lightly covered with 2 mm of potting mix. Pots were sub-irrigated and maintained in a greenhouse set at a temperature regime of 30/25 ˚C (day/night) and a 15-h photoperiod under natural sunlight conditions supplemented with high-pressure sodium lights providing 400 μmol m−2 s−1. Sampling for whole genome sequencing was performed using a leaf from the third node of two representative plants from each population. Total DNA was extracted using a modified CTAB-based protocol with chloroform, isopropanol, and RNase A buffer [39]. Briefly, leaf material from each sample (approximately 20-100 mg) was ground into a fine powder using a mortar and pestle with liquid nitrogen, extracted with CTAB buffer, chloroform extracted, and ethanol precipitated. Total genomic DNA was resuspended in 50 μl of TE (10 mM Tris, 0.1 mM EDTA, pH 8.0) buffer containing RNase A. The tube was incubated at 37˚C for 30 minutes and stored at -20˚C.

EccDNA enrichment and sequencing (CIDER-seq)
Circular DNA enrichment sequencing (CIDER-Seq) was used to enrich, sequence, and analyze eccDNAs from the leaf tissue DNA extraction samples according to the protocol by Mehta et al. [18]. Because we wanted to survey the landscape of eccDNA, we did not perform a size exclusion step prior to enrichment. Otherwise, the circular DNA amplification, debranching reaction, and DNA branch release and repair stages closely followed the methods of Mehta et al. [18]. Enriched eccDNA for each sample [10] was individually barcoded following the manufacturer's recommended protocol (Pacific Biosciences), pooled in equimolar amounts, and sequenced on a Sequel II single molecule sequencer (Pacific Biosciences).

EccDNA sequence processing and analysis
Raw sequence reads were demultiplexed and circular consensus sequences (CCS) analyzed with the SMRT Link software (Pacific Biosciences). Parameters for CCS analysis were stringent and included: 1) predicted quality = 0.999; and 2) minimum read length = 1,000 bp. Processed reads were stored as .fastq files and analyzed with the packaged CIDER-Seq software using the suggested approach to identify circular DNA. Predicted eccDNA were matched to the A. palmeri reference genome by Montgomery et al. [42]. After processing of predicted eccDNA, shorter duplicate eccDNAs were collapsed into the longest reference eccDNA with the CD-HIT software [63] with an identity threshold of 90%. Reference eccDNA were annotated for genuine open reading frames using the MAKER annotation pipeline [64] and evidence for genes derived from the published A. palmeri annotation [42]. Alignments to the reference genome were performed with the Minimap2 software [65] and comparative genome alignments were performed with MUMmer 4.0 [66]. Transfer RNAs were determined with the tRNAscan-SE software with default settings [67]. The A. palmeri reference assembly from [42] was divided into non-overlapping windows of 500 kb and mapped eccDNA were counted with BedTools [68].

PCR validation of circular DNA
Primer pairs of forward and reverse primers were selected to yield PCR products covering the entire circular DNA structure of several selected eccDNA sequences. The primers were designed with Geneious software and produced by Integrated DNA Technologies. Primers were resuspended in water at 100 μM concentrations. Aliquots of mixed primer pairs were prepared with 20 μl of forward primer, 20 μl of reverse primer, and 160 μl of water to yield a 10 μM concentration of each primer. The PCR reactions contained 10 μl of 2x buffer, 1 μl of primer pair solution, 1 μl of genomic DNA matching the respective eccDNA origin, and 8 μl of water. The thermal cycler settings were 98˚C for 4 minutes; 98˚C for 12 seconds, 52-58˚C for 30 seconds, and 72˚C for 1 minute and 30 seconds, with a return to the second step for 34 additional cycles; 72˚C for 2 minutes; and a hold at 10˚C. After PCR, gel electrophoresis was performed to determine fragment size of the products.
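The dilution and cycling figures in this protocol can be checked with simple arithmetic. The short Python sketch below is only a worked example built from the volumes and times stated above; the function name is hypothetical, and the run time it reports excludes ramping between temperatures and the final hold.

```python
def diluted_conc(stock_conc, stock_vol, total_vol):
    """Concentration after dilution (C1 * V1 = C2 * V2)."""
    return stock_conc * stock_vol / total_vol


# Working primer mix: 20 ul forward + 20 ul reverse + 160 ul water.
working_uM = diluted_conc(100, 20, 20 + 20 + 160)   # per primer, in uM

# Reaction: 10 ul 2x buffer + 1 ul primer mix + 1 ul DNA + 8 ul water.
reaction_ul = 10 + 1 + 1 + 8
final_uM = diluted_conc(working_uM, 1, reaction_ul)  # per primer, in uM

# Cycling: one pass through the three steps plus 34 repeats.
cycles = 1 + 34
cycle_s = 12 + 30 + 90                               # seconds per cycle
program_s = 4 * 60 + cycles * cycle_s + 2 * 60       # excludes ramping and hold

print(working_uM, final_uM, cycles, program_s / 60)
```

Under these assumptions each primer is at 10 μM in the working mix and 0.5 μM in the 20 μl reaction, and the 35-cycle program takes about 83 minutes of programmed time.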