Organellar genome analysis reveals endosymbiotic gene transfers in tomato

We assembled three complete mitochondrial genomes (mitogenomes), two of Solanum lycopersicum and one of Solanum pennellii, and analyzed their intra- and interspecific variation. The mitogenomes were 423,596–446,257 bp in length. Despite numerous rearrangements between the S. lycopersicum and S. pennellii mitogenomes, over 97% of their sequences were shared. These mitogenomes were compared with plastid and nuclear genomes to investigate the transfer of genetic material among DNA-containing organelles in tomato. In all three mitogenomes, 9,598 bp of plastome sequence was found. Numerous nuclear copies of mitochondrial DNA (NUMTs) and plastid DNA (NUPTs) were observed in the S. lycopersicum and S. pennellii nuclear genomes. Several long organellar DNA fragments were tightly clustered in the nuclear genome; however, the NUMT and NUPT locations differed between the two species. Our results demonstrate the recent occurrence of frequent endosymbiotic gene transfers in tomato genomes.


Introduction
The plant cell organelles, the plastid and the mitochondrion, are known to have originated from prokaryotes via endosymbiosis, and the origin of the mitochondrion may have been contemporaneous with that of the eukaryotic cell, because there is no evidence of an amitochondriate phase in eukaryotic evolution [1]. Although both organelles coexist in the plant cell, the evolutionary histories of their genomes in land plants differ slightly. Plastid genomes (plastomes) from bryophytes to angiosperms are normally 120-170 kb in length [2][3][4][5], excluding certain contracted or expanded genomes [6,7]. They are highly conserved in gene content and gene order, and their structure is typically circular [4]. Mitochondrial genomes (mitogenomes) in land plants are more complex than plastomes. The moss mitogenome is approximately 100 kb long, and its structure has remained constant for 350 My [8]. However, seed-plant mitogenomes have changed rapidly [9][10][11]. Ribosomal protein genes and sdh genes were frequently lost from angiosperm mitogenomes during evolution and are thought to have been transferred to the nuclear genome [12,13]. Large [10,14] and small [15][16][17] repeated sequences increased the size of seed-plant mitogenomes and changed their structure via reversible and non-reversible recombination, respectively [15]. Horizontal transfers of mitogenome sequences have frequently been observed in land plant species [18]; consequently, land plant mitogenomes range from 100 kb [8] to 11.3 Mb [19] in length. In addition, certain plant species contain multichromosomal mitogenomes [19,20]. Six types of gene transfer have been observed among the three genome-containing organelles in plants [21]: from the plastid to the nucleus [22][23][24], from the mitochondrion to the nucleus and vice versa [25][26][27], and from the plastid to the mitochondrion [28][29][30][31][32][33][34][35].
Gene transfer from mitochondria to plastids has been reported recently [36][37][38][39][40], but gene transfer from the nucleus to plastids appears to be rare [21]. Nuclear copies of mitochondrial DNA (NUMTs) resulting from endosymbiotic gene transfer (EGT) have been found widely, from protists to animals [26]. Certain NUMTs are associated with human diseases [41], and NUMTs complicate DNA barcoding and phylogenetic analyses based on mitogenomes [42,43]. Data from 85 genomes of protists, fungi, plants, and animals have revealed a correlation between genome size and the total number of NUMTs, and eukaryotes with only one mitochondrion contain fewer NUMTs than those with multiple mitochondria [26]. NUMTs account for less than 0.1% of the nuclear genomes of mammals, insects, yeasts, and some plants [26], but for 0.1-0.2% of the nuclear genomes of Oryza sativa and Arabidopsis thaliana [44]. Mitochondrial DNA segments become integrated into nuclear chromosomes when they are inserted at double-strand breaks by the non-homologous end-joining machinery [45].
Similarly, nuclear copies of plastid DNA (NUPTs) have been found in many organisms, including land plants, algae, apicomplexans, and haptophytes. The cumulative lengths of NUPTs in polyplastidic organisms are greater than those in monoplastidic organisms, except for certain species of green algae and apicomplexans [46]. However, few comprehensive studies of gene transfer among all three genomes have been conducted, because few complete land plant nuclear genome sequences are available.
Solanum is one of the most economically important plant genera, as it includes valuable crops such as tomato and potato; the closely related chili pepper (Capsicum) is another major Solanaceae crop [47]. These species are used as plant models, and their complete nuclear genomes provide insights into many aspects of plant biology [48][49][50]. The organellar genomes of Solanum have also been studied, and the plastomes of 15 Solanum species have been sequenced [51][52][53][54][55][56][57]. These complete plastome sequences have improved our understanding of the evolution and phylogenetic relationships of Solanum species. In contrast, the mitogenomes of Solanum have not been completely sequenced. The first physical map of the tomato mitogenome was constructed for a male-sterile tomato generated via cell fusion between tomato and potato [58]; subsequently, draft mitogenome sequences of tomato and potato containing numerous gaps and unordered contigs were generated [49] (http://www.mitochondrialgenome.org/). Complete tomato mitogenome sequences would therefore be useful for investigating EGTs among the three DNA-containing organelles, because complete nuclear genome sequences of S. lycopersicum 'Heinz1706' and S. pennellii 'LA0716' [48,54] and plastome sequences are available, and because tomato, with its larger genome, is expected to contain more nuclear copies of organellar DNA than previously studied land plants [26,46].
In this study, we assembled three complete mitogenomes (two of S. lycopersicum and one of S. pennellii) and analyzed their intra- and interspecific variation. In addition, EGTs among three genomes, namely the two organellar genomes and the nuclear genome, were comprehensively investigated.

Materials and methods
Assembly and confirmation of complete mitogenome and plastome sequences
… 'LA0716' (SRA accession number: ERR418107) were obtained using the Illumina HiSeq 2000 system. Both ends of the reads were trimmed using Geneious [60] with an error probability of 0.01, and only paired-end reads longer than 50 bp were retained. The mitogenomes were assembled using previously developed baiting-and-iteration strategies [61,62]. First, reads were mapped to Solanaceae mitogenomes (S1 Table). The reads mapped to the reference sequences were concentrated in genic regions rather than noncoding regions. Second, the mapped reads were assembled de novo with zero mismatches and gaps to generate reference contigs, and these contigs were annotated using Geneious [60] to confirm that they included all of the ribosomal RNA (rRNA), transfer RNA (tRNA), and protein-coding genes found in other Solanaceae mitogenomes. Third, reads were realigned to the reference contigs with zero mismatches and gaps, and consensus sequences of the mapped reads were used as new, extended reference contigs. Reads were then iteratively mapped to the extended reference contigs generated in the previous iteration. Contig length increased in each iteration, and few of the contigs overlapped with each other. Finally, one circular mitogenome sequence was obtained from each raw dataset; however, the coverage depth of certain regions was higher than that of the rest of the mitogenome. The sequences of these high-depth regions were almost identical to the tomato plastome sequence. We designed primer sets based on these regions and used them with genomic DNA of S. pennellii 'LA0716' to confirm that these regions belonged to the mitogenome. To verify the coverage depths of the mitogenomes and plastomes of the three tomato accessions, raw reads were mapped to the six mitogenome and plastome sequences using the Burrows-Wheeler alignment tool (S1 Fig and S2 Table) [63].
In addition, one plastome sequence was assembled from each of the three raw datasets to identify EGTs among the three tomato genomes, following the plastome assembly strategy of Kim et al. [64].
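The baiting-and-iteration assembly described above can be illustrated with a deliberately simplified toy: map reads, build a consensus, remap, and repeat. This sketch is not the Geneious workflow used in the study. It assumes error-free, same-strand reads and replaces mapping plus consensus calling with exact suffix-prefix overlaps, which mirrors the zero-mismatch setting. The function names and the `min_overlap` parameter are hypothetical.

```python
def extend_right(contig: str, reads: list[str], min_overlap: int) -> str:
    """Extend contig rightwards using reads whose prefix exactly matches its suffix."""
    best = contig
    for read in reads:
        # Try the longest overlap first; only exact matches (zero mismatches) count.
        for ov in range(min(len(read), len(contig)), min_overlap - 1, -1):
            if contig[-ov:] == read[:ov]:
                candidate = contig + read[ov:]
                if len(candidate) > len(best):
                    best = candidate  # keep the longest extension seen so far
                break
    return best


def extend_left(contig: str, reads: list[str], min_overlap: int) -> str:
    """Left extension is right extension on reversed strings (not reverse complements)."""
    reversed_reads = [read[::-1] for read in reads]
    return extend_right(contig[::-1], reversed_reads, min_overlap)[::-1]


def iterative_extend(seed: str, reads: list[str], min_overlap: int = 20,
                     max_iterations: int = 10_000) -> str:
    """Extend both ends of a seed ("bait") contig until it stops growing."""
    contig = seed
    for _ in range(max_iterations):
        extended = extend_left(extend_right(contig, reads, min_overlap),
                               reads, min_overlap)
        if extended == contig:  # no read extends either end any further
            break
        contig = extended
    return contig
```

For example, `extend_right("ACGTAC", ["GTACGG"], min_overlap=4)` returns `"ACGTACGG"`. Greedy extension of this kind can be misled by repeats longer than `min_overlap`. This is one reason why coverage depth was checked and the high-depth, plastome-like regions were confirmed experimentally.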

Annotation of genes and repeat regions
All of the genes in the three tomato mitogenome sequences were annotated and compared with other mitogenome sequences of Solanaceae using Geneious [60], and protein-coding and tRNA genes were re-examined using blastp [65] and tRNAscan-SE [66], respectively. Open reading frames (ORFs) with a minimum length of 303 bp and the start codon "ATG" were annotated using Geneious [60].
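The ORF criterion above (start codon ATG, minimum length 303 bp) can be expressed as a six-frame scan. This is a hedged sketch, not Geneious's ORF finder. It assumes the following:

* The 303-bp threshold includes the stop codon.
* Only the standard stop codons TAA, TAG, and TGA are used.
* The first ATG after a stop opens the ORF.

Minus-strand coordinates are reported on the reverse-complemented sequence. `find_orfs` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 303) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG-initiated ORFs of at least min_len bp.

    Coordinates are 0-based, half-open, and include the stop codon.
    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        orfs.append((strand, start, end))
                    start = None  # wait for the next ATG in this frame
    return orfs
```

For example, `find_orfs("CC" + "ATG" + "GCT" * 100 + "TAA" + "GG")` returns `[("+", 2, 308)]`. Because RNA editing in plant mitochondria can create start or stop codons that are absent from the DNA, a DNA-level scan like this can miss or truncate some ORFs.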
Duplicated regions with a minimum repeat length of 100 bp and zero mismatches were identified using Geneious [60], and 56 mitogenome sequences of core eudicots (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/mitochondrion) were downloaded to investigate the relationship between repeat region length and total mitogenome length (S1 Table).
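Finding duplicated regions with a 100-bp minimum and zero mismatches amounts to finding exact repeats in direct or inverted orientation. The minimal sketch below rests on one assumption: that "duplicated region length" means the number of bases covered by any such repeat. The text does not say how overlapping copies are totalled. `repeat_coverage` is a hypothetical helper, not the Geneious repeat finder.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def repeat_coverage(seq: str, min_len: int = 100) -> int:
    """Number of bases lying in exact direct or inverted repeats of >= min_len bp."""
    seq = seq.upper()
    starts_by_kmer = defaultdict(list)
    # Index every window of min_len bp (simple but memory-hungry: about
    # len(seq) strings of min_len characters).
    for i in range(len(seq) - min_len + 1):
        starts_by_kmer[seq[i:i + min_len]].append(i)

    covered = bytearray(len(seq))  # 1 = base lies in some repeated window
    for kmer, starts in starts_by_kmer.items():
        rc = reverse_complement(kmer)
        # Inverted copies: the window's reverse complement occurs on the same strand.
        # A perfectly palindromic window is not counted as its own inverted copy.
        inverted = len(starts_by_kmer.get(rc, [])) if rc != kmer else 0
        if len(starts) + inverted > 1:
            for start in starts:
                covered[start:start + min_len] = b"\x01" * min_len
    return sum(covered)
```

Marking every repeated 100-bp window and taking the union recovers the full extent of each repeat. For example, a 150-bp segment present twice contributes 300 covered bases.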

Analysis of the structural evolution of tomato mitogenomes
To analyze the structural evolution of tomato mitogenomes, the three tomato mitogenome sequences were compared using Circoletto [67] and blastn [65] with an e-value threshold of 1 × 10^−10. Syntenic blocks longer than 1 kb that contained at least one gene are summarized in S3 Table. … <1 × 10^−5, and 50,000 maximum hits. Multiple hits to the same nuclear genomic locus caused by repetitive regions of the query sequence (duplicated regions in organellar genomes) were eliminated to avoid overestimating the migration of organellar DNA into the nuclear genome [46]. Nevertheless, the number of integration events may still be overestimated, because a single nuclear copy of organellar DNA could have become fragmented during evolution and thus be counted as several copies.
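Removing multiple hits to the same nuclear locus can be sketched as merging overlapping subject intervals per chromosome. With this approach, a nuclear region matched by two copies of an organellar repeat is counted once. This is a hedged illustration of the idea, not necessarily the exact filter used. The hit format and helper names are assumptions. BLAST tabular output reports 1-based subject coordinates, with sstart > send for minus-strand hits. The sketch assumes these coordinates have already been converted to 0-based, half-open intervals.

```python
def merge_hits(hits):
    """Collapse overlapping subject intervals so each nuclear locus is counted once.

    hits: iterable of (chromosome, start, end), 0-based half-open; start and end
    may be swapped for minus-strand hits and are normalised here.
    Intervals that merely touch (start == previous end) are also merged.
    """
    normalised = sorted((chrom, min(s, e), max(s, e)) for chrom, s, e in hits)
    merged = []
    for chrom, start, end in normalised:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(intervals):
    """Cumulative length in bp of non-overlapping half-open intervals."""
    return sum(end - start for _, start, end in intervals)
```

For example, two overlapping hits at 100-200 and 150-250 on the same chromosome collapse into a single 150-bp locus.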

Origin of nuclear-transferred organellar DNA
NUPTs and NUMTs longer than 3,000 bp in S. pennellii 'LA0716' were used to investigate 1) whether tightly clustered organellar copies in the nuclear genome originated from a single, large original sequence of nuclear-transferred organellar DNA, and 2) whether they degenerated during evolution in hotspot regions of the nuclear genome [22]. Among the selected regions, those containing more than two large organellar DNA fragments were extracted and compared with their organellar DNA counterparts.
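Selecting regions with more than two large fragments can be sketched as single-linkage grouping along each chromosome. The 3,000-bp size filter and the "more than two fragments" rule come from the text. The maximum gap that still counts as "tightly clustered" is not stated, so `max_gap` below is a hypothetical parameter, and the function name is illustrative only.

```python
def large_fragment_clusters(fragments, min_length=3000, max_gap=50_000,
                            min_members=3):
    """Group large organelle-derived fragments lying close together on a chromosome.

    fragments: iterable of (chromosome, start, end), 0-based half-open.
    Only fragments longer than min_length are kept ("over 3,000 bp").
    max_gap is a hypothetical clustering distance (not given in the study).
    min_members=3 encodes "more than two large fragments".
    """
    large = sorted(f for f in fragments if f[2] - f[1] > min_length)
    clusters, current, current_end = [], [], None
    for chrom, start, end in large:
        if current and chrom == current[-1][0] and start - current_end <= max_gap:
            current.append((chrom, start, end))
            current_end = max(current_end, end)  # rightmost end seen in this cluster
        else:
            if len(current) >= min_members:
                clusters.append(current)
            current, current_end = [(chrom, start, end)], end
    if len(current) >= min_members:
        clusters.append(current)
    return clusters
```

The result depends strongly on `max_gap`. Any real analysis would need to justify that threshold.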

Structure of tomato mitogenomes
The three mitogenome sequences were 423,596-446,257 bp in length (Fig 1 and Table 1). These lengths were similar to that of the MSA1 mitogenome, which was generated by cell fusion between tomato and potato [58], but longer than that of the first draft of the tomato mitogenome [73]. Their GC content (45.0-45.2%) was similar to that of Nicotiana sylvestris, N. tabacum (45%), and Capsicum annuum (44.5%) (Table 1). The duplicated regions of the tomato mitogenomes were 42,193-76,436 bp in length, longer than those of other Solanales mitogenomes (Table 1). Thirty-seven protein-coding genes, three rRNAs, and 20 tRNAs were identified in the three tomato mitogenomes (Fig 2). Among them, 5-8 genes were duplicated, excluding ORFs and tRNAs (Fig 2), whereas only 0-3 genes were duplicated in the mitogenomes of other Solanales plants. Specifically, rpl16, rps3, rps19, rrn5, and rrn18 were duplicated in the tomato mitogenomes, whereas rps7 and rps14 were absent from the mitogenomes of tomato and other Solanaceae plants.
Over 97% of the mitogenome sequences were conserved among the three tomato mitogenomes (Fig 3). The two S. lycopersicum mitogenomes were similar to that of S. pennellii 'LA0716' except for a 43-bp region, whereas only four regions (223, 460, 1,013, and 6,328 bp) were unique to the S. pennellii 'LA0716' mitogenome. Of these, the 6,328-bp region was almost identical to the nuclear genome sequence of S. pennellii 'LA0716', apart from certain insertions and deletions. Apart from the portion identical to the S. pennellii nuclear genome, the 6,328-bp region comprised three segments (Fig 4A). The first segment was almost identical to the S. lycopersicum nuclear genome; the second was identical to the mitogenome and nuclear genome of Nicotiana, and half of it was similar to the nuclear genomes of S. lycopersicum and C. annuum. The third segment comprised five plastome-like regions that corresponded to the tomato plastome (Fig 4B).
There were numerous inter- and intraspecific mitogenome rearrangements; however, most of the mitogenome sequence regions were shared among the three tomatoes (Fig 5). Interestingly, the longest syntenic region between S. pennellii 'LA0716' and S. lycopersicum 'LA1479' was longer than that between S. lycopersicum 'LA1421' and 'LA1479'. Nineteen syntenic blocks were conserved among the three mitogenomes (S3 Table). The longest syntenic block was 61,213 bp in length and contained five genes, and the shortest was 5,815 bp in length and contained sdh3 and exons 1 and 2 of nad2.

Duplicated regions in the tomato mitogenomes
The duplicated regions in the tomato mitogenomes ranged from 42,193 bp (S. pennellii 'LA0716') to 76,436 bp (S. lycopersicum 'LA1421') in length, and were longer than those of other Solanaceae species (Table 1). Compared with the other 56 core eudicot mitogenomes [74], the total duplicated regions of the three tomato mitogenomes were the 6th, 10th, and 12th longest (S2 Fig). The correlation between total duplicated region length and total mitogenome length was not significant according to Pearson's correlation coefficient (p = 0.2831). However, the total duplicated region length was significantly correlated with the maximum duplicated region length (p < 2.
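The significance test reported above asks whether Pearson's r differs from zero. Under the usual assumptions, r is converted to t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. Here n = 59 mitogenomes (the 56 downloaded core eudicots plus the three tomato mitogenomes), so there are 57 degrees of freedom. The dependency-free sketch below computes r and t. The two-sided p-value would then come from the t distribution, for example via `scipy.stats.pearsonr`, which returns r and p directly. No study data are reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (undefined for |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

For instance, r = 0.5 with n = 59 gives t = 0.5 × √76 ≈ 4.36.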

Intracellular gene transfer from the plastome to the mitogenome
In total, 9,598 bp of plastome sequence [2,558 bp from the large single copy (LSC) region, 32 bp from the small single copy (SSC) region, and 7,008 bp from the inverted repeat] was detected in the three tomato mitogenomes, and some of these sequences were duplicated within each mitogenome. Mitochondrial plastid DNAs (MTPTs) in the tomato mitogenomes were 9,750-12,983 bp in length, constituting 2.2-3.1% of each mitogenome (Table 2). Among the Solanales species compared, the proportion of MTPTs in the tomato mitogenomes was more similar to that in Nicotiana and Hyoscyamus than to that in Capsicum, which is phylogenetically closer to Solanum [75]. However, most of the tomato MTPTs, excluding the partial sequences of rps20, rps12, and ycf2, were similar to the C. annuum MTPTs [34]. Sequences from the SSC regions of Solanaceae plastomes were rarely transferred to the mitogenomes, whereas the mitogenome of Ipomoea nil, which belongs to the Solanales but to the Convolvulaceae rather than the Solanaceae, contained a large SSC-derived region.
As mentioned above, five MTPTs were found only in the S. pennellii 'LA0716' mitogenome among the three tomato mitogenomes, and their plastome counterparts were located close to one another, except for the partial psbB region (Fig 4B). To analyze these five MTPTs further, they were grouped into sequence A, which included two small plastome regions, and sequence B, which included three large plastome regions (S3 Fig). Sequence A was observed in the C. annuum mitogenome, whereas sequence B was only partly observed in certain angiosperm mitogenomes (S3 Fig). Interestingly, the mitogenome of Hesperelaea palmeri [76], an extinct Oleaceae species, shared four MTPTs with that of S. pennellii 'LA0716', although the MTPT similarity between the S. pennellii 'LA0716' and H. palmeri mitogenomes was lower than that between the S. pennellii 'LA0716' and Nicotiana mitogenomes. A blastn [65] comparison of the nuclear genomes of Solanum and Capsicum revealed numerous sequence-A-similar regions in the nuclear genomes of Solanum and C. annuum, except in S. lycopersicum 'Heinz1706', which had only one such region (S4 Fig). Although a phylogenetic tree constructed using Bayesian inference did not fully resolve the relationships among sequence-A-similar regions in the Solanum and C. annuum genomes (data not shown), the sequence-A-similar regions in the C. annuum genome were distinguished from those in the Solanum genomes by two deletions (4 bp and 2 bp) (S4 Fig). Sequence-B-similar regions were not observed in the nuclear genomes of S. lycopersicum, S. tuberosum, or C. annuum.

Gene transfer between the mitogenome and nuclear genome
Most of the sequences shared between the tomato mitogenome and nuclear genome were not coding regions, and noncoding regions in mitogenomes vary among land plant species. Therefore, it was difficult to determine the direction of gene transfer between the mitogenome and nuclear genome. Consequently, all of the sequences shared between the tomato mitogenome and nuclear genome were considered NUMTs.
In total, 15,670-16,844 NUMTs were observed in the nuclear genomes of S. pennellii 'LA0716' and S. lycopersicum 'Heinz1706' (Table 3). The total length of NUMTs in the S. pennellii 'LA0716' nuclear genome (3,412 kb) was greater than that in the S. lycopersicum 'Heinz1706' nuclear genome (2,944 kb), representing 0.37% of the total nuclear genome. In both species, chromosome 1 carried the largest number of NUMTs; however, they occupied only 0.29-0.32% of it. In contrast, 0.72% and 0.74% of chromosome 11 in the two species were homologous to their mitogenomes.
NUMTs were evenly distributed among the chromosomes (S5-S7 Figs). However, NUMTs longer than 1,000 bp were tightly clustered, and regions containing numerous large NUMTs were not identical between the two nuclear genomes of S. pennellii 'LA0716' and S. lycopersicum 'Heinz1706'. Specifically, the number of NUMTs longer than 5,000 bp in S. pennellii 'LA0716' was nearly twice that in S. lycopersicum 'Heinz1706' (Fig 6). Half of the large NUMTs in S. pennellii 'LA0716' were on chromosomes 5 and 11. Consequently, the median length of the NUMTs on chromosomes 5 and 11 in S. pennellii 'LA0716' was similar to that in S. lycopersicum; however, the mean length of the NUMTs on chromosomes 5 and 11 in S. pennellii 'LA0716' was greater than that in S. lycopersicum 'Heinz1706' (Table 3).
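A per-chromosome summary like Table 3 can be sketched as follows. It reports the count, cumulative length, share of the chromosome, and mean and median fragment length. The input format and the name `summarize_by_chromosome` are assumptions. The toy lengths in the note below are invented for illustration, not tomato data. They show how a few long NUMTs raise the mean while leaving the median nearly unchanged.

```python
from statistics import mean, median


def summarize_by_chromosome(fragments, chromosome_lengths):
    """Per-chromosome count, cumulative bp, % of chromosome, mean and median length.

    fragments: iterable of (chromosome, start, end), 0-based half-open.
    chromosome_lengths: dict mapping chromosome name to its length in bp.
    """
    by_chrom = {}
    for chrom, start, end in fragments:
        by_chrom.setdefault(chrom, []).append(end - start)
    summary = {}
    for chrom, lengths in sorted(by_chrom.items()):
        total = sum(lengths)
        summary[chrom] = {
            "count": len(lengths),
            "total_bp": total,
            "percent_of_chromosome": 100 * total / chromosome_lengths[chrom],
            "mean_bp": mean(lengths),
            "median_bp": median(lengths),
        }
    return summary
```

With fragment lengths of 100, 200, and 300 bp, the mean and median are both 200 bp. Adding one 10,000-bp fragment raises the mean to 2,650 bp but the median only to 250 bp.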

Gene transfer from the plastome to the nuclear genome
There were 7,445 and 7,805 NUPTs in the nuclear genomes of S. lycopersicum 'Heinz1706' and S. pennellii 'LA0716', respectively (S4 Table). The cumulative NUPT length was 1,533,904-1,739,535 bp, constituting 0.189-0.191% of the nuclear genome. Cumulative NUPT length was greatest on chromosome 1 in S. lycopersicum 'Heinz1706' and on chromosome 10 in S. pennellii 'LA0716', where NUPTs occupied 0.28% and 0.40% of the chromosome, respectively. As with the NUMTs, large NUPTs were tightly clustered; however, their locations differed between the two tomato nuclear genomes (S8 and S9 Figs).

Similarity and structural mutations in nuclear-transferred organellar DNA and counterparts
To determine why large organellar copies were clustered at certain nuclear genome loci, we made two assumptions. The first was that tightly clustered NUPTs and NUMTs originated from a single, large organellar DNA fragment. The second was that structural mutations (such as rearrangements and insertions/deletions) and base substitutions accumulate over time after EGT from the organellar genomes to the nuclear genome.
Large NUPTs (longer than 1 kb) were tightly clustered in 11 regions of the S. pennellii 'LA0716' nuclear genome (S10 Fig). In these 11 regions, some NUPTs that appeared more extensively mutated (large inversions, rearrangements, or insertions/deletions) were less similar to their plastome counterparts than those that appeared less mutated (small insertions/deletions or duplications). However, other NUPTs that appeared more extensively mutated were more similar to their counterparts than those that appeared less mutated.
Large, tightly clustered NUMTs (longer than 1 kb) were observed in 24 regions of the S. pennellii 'LA0716' nuclear genome (S11 Fig). As with the large NUPTs, some NUMTs that appeared more extensively mutated were less similar to their counterparts than those that appeared less mutated. However, certain regions with numerous large insertions/deletions or inversions showed over 99.4% similarity to their mitogenome counterparts. In particular, the large NUMTs on chromosome 5 in S. pennellii 'LA0716' showed over 96.1% similarity, although large deletions and rearrangements appeared to have occurred.
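Large indels can coexist with very high similarity, and whether they do depends partly on how similarity is computed. The text does not specify this. As an assumption-labeled illustration, one common definition is percent identity over aligned columns. If gap columns are excluded, insertions and deletions do not lower the value at all, which is one possible reason such values can stay high. `percent_identity` and its `count_gaps` switch are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = False) -> float:
    """Percent identity between two gapped, equal-length aligned sequences ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column absent from both sequences
        if (a == "-" or b == "-") and not count_gaps:
            continue  # ignore indel columns, so indels do not lower identity
        columns += 1
        matches += a == b
    return 100.0 * matches / columns if columns else 0.0
```

With a 10-bp deletion next to 10 identical bases, identity is 100% when gaps are ignored but 50% when gap columns are counted.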

Structural variations in tomato mitogenomes
Plant mitogenomes evolve slowly at the sequence level [11] and conserve their gene content [77]. Against this background, the low similarity between the mitogenomes of closely related plant genera [9,78,79] and the contribution of foreign DNA to variation in plant mitogenome length [80][81][82] suggest the following: land plant mitogenomes comprise syntenic blocks containing coding genes and unique regions containing noncoding sequences, and these unique regions are related to foreign DNA. In addition, 59 eudicot mitogenome sequences, including the three tomato mitogenomes, showed no correlation between total duplicated region length and mitogenome length (S2 Fig). Therefore, foreign DNA appears to contribute more than duplicated regions to length variation at high taxonomic levels (family and above).
However, duplication appears to be a cause of mitogenome expansion at low taxonomic levels, such as the inter- and intraspecific levels. Excluding the duplicated regions, 97.1-99.3% of the mitogenome sequences of Brassica juncea (219,766 bp) and Brassica oleracea (360,271 bp) could be aligned with each other [14]. The lengths of the tomato mitogenomes were strongly related to their duplicated regions. The duplicated regions in the S. lycopersicum 'LA1421' mitogenome were the largest among the three tomato mitogenomes (Table 1). The total length of the duplicated regions in the S. lycopersicum 'LA1479' mitogenome was 23% greater than that in the S. pennellii 'LA0716' mitogenome. Consequently, the two S. lycopersicum mitogenomes were larger than that of S. pennellii 'LA0716', although the S. pennellii 'LA0716' mitogenome contained 8,024 bp of sequence absent from the two S. lycopersicum mitogenomes. The maximum duplicated region length was significantly correlated with the total duplicated region length (S2 Fig). Length differences between the mitogenomes of closely related taxa therefore seem to be affected more by the longest duplicated region than by short duplicated regions. In contrast to plastomes, the structures of land plant mitogenomes have evolved rapidly [11], and direct and inverted repeats have facilitated rearrangements [77,83]. Therefore, land plant mitogenomes may have evolved to produce duplications frequently and easily, allowing their structures to change rapidly.
Rearrangement is a major issue in mitogenome studies, because it can generate novel chimeric ORFs [78,84]. Many previous studies have demonstrated high intraspecific [78,85-87] and interspecific [10,14] variability in mitogenome structure. The numerous repeat regions in tomato mitogenomes suggest that rearrangements are frequent. Recombination via inverted repeats induces inversion of the intervening sequence, and recombination via direct repeats produces subgenomic molecules [17,83,88].

Gene transfer from the plastome to the mitogenome via the nuclear genome
Of the five MTPTs detected only in the S. pennellii 'LA0716' mitogenome among the three tomato accessions, the entire sequence A region was also detected in the C. annuum mitogenome (S3 Fig), although its plastome counterparts were distant from each other (Fig 4B). According to Wang et al. [34], MTPT gene clusters containing psbB did not contain psaJ in 39 seed plants. Therefore, the two small plastome regions were probably joined together in the ancestor of Solanum and Capsicum. Subsequently, the sequence A region might have been transferred into the mitogenomes of both S. pennellii and C. annuum. If this scenario is correct, why was this region transferred only to the S. pennellii and C. annuum mitogenomes and not to the S. lycopersicum mitogenome? Because the sequence A region was observed in the nuclear genomes of Solanum species and C. annuum (S4 Fig), it might have initially been integrated into the nuclear genome of the common ancestor of Solanum and Capsicum. Subsequently, the sequence A region was duplicated in the nuclear genomes of both S. pennellii and C. annuum, but not in S. lycopersicum, after the speciation of extant tomato species. According to a recent phylogenetic study of the Solanaceae [89], the ancestor of Capsicum diverged from that of Solanum 19.13 Ma, and the ancestor of S. pennellii diverged from that of S. lycopersicum 1.72 Ma. Therefore, the first integration of the joined plastome sequences appears to date back at least to the Neogene, and the duplications of the sequence A region occurred during the Quaternary period. Gene transfers between the mitogenome and nuclear genome occurred frequently during evolution [23,26,90,91]. The multiple copies of sequence A in the nuclear genomes of S. pennellii and C. annuum could therefore have increased their chances of transfer into the mitogenome, compared with the single copy in the S. lycopersicum genome. Accordingly, the sequence A region in the mitogenomes of S. pennellii and C. annuum appears to have been transferred independently from the nuclear genome. This finding indicates that certain MTPTs result from two-step gene transfers, i.e., plastome → nuclear genome → mitogenome.
Table 3. Nuclear copies of mitochondrial DNA (NUMTs) in the nuclear genomes of tomato (Solanum) species.

Recent EGTs from organellar genomes to the nuclear genome vs rapid deletion of organellar copies from the nuclear genome
Original, large insertions appear to have been degraded into smaller fragments during evolution [22]. Consequently, the total length of organellar DNA copies in the nuclear genome is expected to be negatively correlated with the number of organellar DNA fragments. The total numbers of NUPTs and NUMTs in S. pennellii were similar to those in S. lycopersicum; however, their cumulative lengths were 13% and 16% greater, respectively, in S. pennellii. These differences in cumulative length were caused by the large organellar copies in the S. pennellii nuclear genome. A similar discordance of NUPTs between two Oryza subspecies has also been reported [46].
The lengths and numbers of large organellar DNA fragments in the S. lycopersicum nuclear genome were lower than those in the S. pennellii nuclear genome. If inserted fragments were shortened and deleted faster in S. lycopersicum than in S. pennellii, this would explain why there were more, and larger, organellar copies in S. pennellii than in S. lycopersicum.
However, suppose the difference in the number of large fragments were caused entirely by different rates of shortening and deletion of inserted fragments in the two tomatoes. In that case, the ratio of large fragments between S. pennellii and S. lycopersicum would be expected to be similar on every chromosome. Instead, the distribution of large NUMTs (longer than 3,000 bp) on chromosome 3 differed from that on chromosome 6 in the two tomatoes. Therefore, the hypothesis that recent EGTs occurred between organellar DNA and the nuclear genome is supported.
Consequently, the rates of shortening and deletion of fragments inserted into the nuclear genome appear to have changed in the two tomatoes after they diverged. Recent EGTs also caused the distribution of organellar copies on certain chromosomes to differ from that on other chromosomes.

Clustered, large organellar DNA in the nuclear genome
Long NUPT and NUMT fragments were frequently found in different regions of the nuclear genome within the same tomato species. Some of these large organellar DNA fragments appeared to have originated from long organellar genomic fragments, because they were derived from closely located regions of the organellar genomes and were highly similar to their organellar counterparts. Michalovova et al. [90] suggested that new organellar DNA sequences are inserted near centromeres, degraded by transposable elements, and then scattered by structural mutations. However, certain nuclear regions with long organellar DNA fragments were derived from different regions of the plastome or mitogenome. In terms of sequence divergence, these regions did not appear to be older than the long fragments derived from contiguous organellar DNA (S10 and S11 Figs). These mosaic organellar DNA fragments cannot be explained by the single organellar DNA origin hypothesis, which is based on the similarity between organelle-derived nuclear DNA and organellar DNA without structural mutations [22].
Noutsos et al. [24] suggested that mosaic organellar DNA fragments are generated by 1) random end-joining of different fragments before integration, 2) rapid rearrangements after integration, or 3) ongoing integration of organellar DNA at the same locus. The first and second scenarios could explain the clustered, large NUPTs and NUMTs in tomato nuclear genomes; however, why were the clustered NUPTs and NUMTs located at distant loci? If random end-joining of different fragments before integration occurred regardless of genomic source (mitogenome or plastome), large fragments of the mitogenome and plastome could have merged. This occurred in the five complex insertions in O. sativa that contain rearranged DNA from both the mitogenome and plastome [24], but it was not observed in the tomato nuclear genomes. The discordance between nuclear genomic regions with long plastome fragments and those with long mitogenome fragments could have been caused by very large, continuous integrants followed by consecutive rearrangements [24]. However, this hypothesis cannot apply to all nuclear organellar DNA copies, because certain copies that appear more structurally mutated are more similar to their counterparts than copies that appear less mutated. We could not identify hotspots of organellar DNA integration into the nuclear genome with the available data; however, the hotspot hypothesis could explain the discordance between nuclear genomic regions with long plastome fragments and those with long mitogenome fragments.
Supporting information
S1 Fig. Coverage depths of the mitogenomes and plastomes used in this study. Raw reads were mapped to the mitogenomes and plastomes to verify coverage depth across each genome. Two tools were used: the Geneious aligner, with zero mismatches and gaps among the reads, and the Burrows-Wheeler alignment tool, with default options. Sharp peaks up to 20-fold higher than the baseline coverage indicate mitochondrial plastid DNA (MTPT) regions. Coverage depth exceeded 200, except in certain regions containing homopolymers or AT-rich sequence, which had low coverage depth. However, these regions were also supported by numerous paired-end reads (blue bars and red lines indicate paired-end reads and the intervals between paired-end reads, respectively, in Solanum lycopersicum 'LA1479'). The X-axis and Y-axis indicate position and coverage depth, respectively.