Whole-genome sequencing of Brassica oleracea var. capitata reveals new diversity of the mitogenome

Plant mitochondrial genomes (mtDNAs) vary in sequence structure. We assembled the Brassica oleracea var. capitata mtDNA using a mean coverage depth of 25X whole genome sequencing (WGS) and confirmed the presence of eight contigs/fragments by BLASTZ using the previously reported KJ820683 and AP012988 mtDNA as reference. Assembly of the mtDNA sequence reads resulted in a circular structure of 219,975 bp. Our assembled mtDNA, NCBI acc. no. KU831325, contained 34 protein-coding genes, 3 rRNA genes, and 19 tRNA genes with similarity to the KJ820683 and AP012988 reference mtDNA. No large repeats were found in the KU831325 assembly. However, KU831325 showed differences in the arrangement of bases at different regions compared to the previously reported mtDNAs. In the reference mtDNAs KJ820683 and AP012988, contig/fragment number 4 is partitioned into two contigs/fragments, 4a and 4b. However, contig/fragment number 4 was a single contig/fragment with 29,661 bp in KU831325. PCR and qRT-PCR using flanking markers from separate parts of contig/fragment number 4 confirmed it to be a single contig/fragment. In addition, genome re-alignment of the plastid genome and mtDNAs supported the presence of heteroplasmy and reverse arrangement of the heteroplasmic blocks within the other mtDNAs compared to KU831325 that might be one of the causal factors for its diversity. Our results thus confirm the existence of different mtDNAs in diverse B. oleracea subspecies.


Introduction
Several features of plant mitochondrial genomes (mtDNAs) that distinguish them from animal and fungal mtDNAs, include larger genome size, frequent rearrangements, and an extremely low rate of point mutations [1][2][3]. Plant mtDNAs are relatively stable but complex, and variable in size compared to animal mtDNAs [4][5], their DNA evolves rapidly in structure, but slowly in sequence [6]. The mtDNAs of flowering plants are particularly variable among a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Plant material and genomic DNA extraction
Leaf samples were harvested from four-week-old plants of B. oleracea var. capitata inbred line '4119' grown at Asia Seed Company, Korea. Total genomic DNA was extracted from samples using the DNeasy Plant Mini kit (Qiagen, Germany) following the manufacturer's protocol. We reconstructed the complete mitochondrial sequence as described below.

Preparation of whole-genome NGS reads
Whole genome sequencing (WGS) reads of B. oleracea var. capitata were acquired via nextgeneration sequencing (NGS) using the Illumina HiSeq platform. A paired-end (PE) library with 500-bp inserts was constructed using the Illumina PE DNA library kit according to the manufacturer's instructions and was sequenced using an Illumina Hiseq2000 sequencing system by the National Instrumentation Center and Environmental Management (NICEM, http://nicem.snu.ac.kr/, Korea) and Macrogen (http://dna.macrogen.com/, Korea). The B. oleracea genome was completed by using a previously published reference sequence (GenBank accession no. NC_016118) of B. oleracea genome. The assemblies of WGS reads represented more than 20X genome coverage was considered the cabbage genome as previously described by Kim et al. [39] in rice. From there, we identified contigs/fragments unique to mitochondria using reference mtDNA sequences from NCBI, Acc. Nos. KJ820683, AP012988, and JF920286 of Brassica. Mitochondrial contigs/fragments were detected by BLASTN searches [40] for each assembly. The best draft assembly for each species was chosen as the assembly that maximized the total length of mitochondrial contigs/fragments and average length per mitochondrial contig/fragments. The contigs/fragments from our draft assemblies were manually aligned to the reference sequence using BioEdit v7.2.3 [41].

WGS assembly and building of complete mtDNA sequences
Raw reads with Phred scores of 20 or lower were removed from the total NGS PE reads using the CLC-quality trim tool (quality_trim software included in CLC ASSEMBLY CELL package ver. 4.06 beta. 67189). Sub-data sets were extracted from the trimmed WGS reads and were assembled using CLC de novo assembler in the CLC ASSEMBLY CELL package. Sequence gaps were filled by Gap closer with SOAP (Short Oligonucleotide Analysis Package; ver. 1.12). Contigs/fragments representative of the mtDNAs were retrieved from the total contigs/fragments using Nucmer (NUCleotide MUMmer; [42]) with the reference sequences. The retrieved contigs/fragments were ordered and arranged with the related mtDNA sequences based on built-in BLASTZ analysis (http://nature.snu.ac.kr/tools/blastz_v3.php; [43]; Fig 1) and connected into a single draft sequence by joining overlapping terminal sequences. Tentative error sites were identified by mapping raw reads to draft sequences using the CLC mapping tool (clc_ref_assemble in the CLC ASSEMBLY CELL package) and visualized using CLC viewer (clc_assembly_viewer in the CLC ASSEMBLY CELL package). Errors found in repeat, insertion/deletion (InDel), and SNP regions were manually corrected and validated by PCR amplification. . Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The analysis involved 5 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 150244 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 [48].

Sequence alignment for detecting heteroplasmy in mitogenome
In silico analysis was performed for detecting heteroplasmy within the different mtDNAs of B. olearacea (NCBI, Acc. nos. KU831325, KJ820683, JF920286, and AP012988) due to recombination from plastid; and also the block-wise similarities among the mtDNAs. Plastid genome sequence (Acc. no. KR233156) was collected from NCBI (https://www.ncbi.nlm.nih.gov) and used as reference sequence and aligned with the mtDNAs (KU831325, KJ820683, JF920286, and AP012988). Mauve (version 2.4.0) and Geneious Free trial version (https://www.geneious. com/free-trial/) were used to align the genomes and to identify the homology between sequence blocks among the plastid and mtDNA sequences (representated by differential colors); this facilitated to identify the heteroplasmy derived from plastid and simillar regions among the mtDNAs (KU831325, KJ820683, JF920286, and AP012988).

Data analysis
The 2 −ΔΔCt method was used for qRT-PCR data analysis [51]. Relative expression levels of different contigs/fragments were normalized against the expression of the housekeeping gene Bol-Actin. Variance analysis and the Tukey test were carried out using SPSS Statistics 17.0 software for Windows to determine the differences of expression of the contigs among various inbred lines of B. oleracea.

Isolation of mtDNA sequences from WGS
We assembled the mtDNA of B. oleracea var. capitata using a mean coverage depth of 20X whole genome sequencing (WGS) (NCBI acc. no. KU831325) and confirmed the presence of 'eight mitochondrial contigs/fragments' in B. oleracea var. capitata by BLASTZ using the previously reported 'KJ820683' and 'AP012988' mtDNAs of B. oleracea var. capitata as reference [39] (Figs 1 and 2). We assembled high-throughput sequencing data of the mtDNA of B. oleracea var. capitata by de novo assembly. The whole mtDNA sequence was total 219,975 bp and was assembled into a circular molecule that included 'eight contigs/fragments'. Among them, contigs/fragments number 6 was the largest (49,020 bp) and contigs/fragments number 2 was the smallest (1,833 bp) (Figs 1 and 2). Interestingly, we have found deviated distribution of the mitochondrial contigs/fragments in our newly submitted NCBI acc. no. KU831325 with reverse order of the contigs/fragments 4, 8, and 2 compared to the previously published reference mtDNAs of B. oleracea (Fig 3A). Our WGS and assembly showed contig/fragment number 1-2 connected to contig/fragment number 2-6, and contig/fragment number 2-6 connected to contig/fragment number 5-3 (Fig 3B), which differed from the reference mitochondrial genomes KJ820683, AP012988, and JF920286. Furthermore, in the reference genomes KJ820683 and AP012988, contig/fragment number 4 was partitioned into two separate sub-contigs/fragments (4a and 4b). However, contig/fragment number 4 was appeared as single contig/fragment with 29,661 bp in our reported KU831325 mtDNA ( Fig 3B). This genome contained 34 protein-coding genes, 3 rRNA genes, and 19 tRNA genes, which is as similar to the reference mtDNAs KJ820683 and AP012988 (S1 Table). We also detected 43 open reading frames (ORFs) in our B. oleracea mtDNA. The sequencing coverage was approximately 25X (S1 Fig). The overall GC content was 45.25%, which was also alike to the reference mtDNAs KJ820683 and AP012988.

Comparison of the new mtDNA with the reference mtDNA
The size of the predicted genome suggested the presence of same contigs/fragments and restriction map of Brassica species as in the reference mtDNAs. However, while the mtDNAs reported by Tanaka et al. [36] and Grewe et al. [37] shared a common sequence and structural alignment, while the mtDNA of B. oleracea reported in here showed structural variations in the contigs/fragments (Fig 3A and 3B). Primers designed at the flanking regions of contigs/ fragments 2-6, 5-3, and 4a-4b were used to confirm the presence of the contigs/fragments in different individuals covering subspecies of B. oleracea (Fig 4A and 4B). In our reported mtDNA contig/fragment number 4 (4a-4b) was found in all of the tested Brassica genotypes except Ogura male sterile cabbage line ( Fig 4A). We also looked for whether both of the previously reported mtDNAs of AP012988 or KJ820683 are present in our reported B. oleracea var. capitata mtDNA or not. The amplification of the flanking primer of contig/fragment 4a-5   confirmed the coexistence of both of the previously reported mtDNAs (Fig 4A and 4B). The differential rearrangement of contigs/fragments might be due to the presence of large and short repeat sequences within the mtDNAs of the tested B. oleracea var. capitata lines. To extent the results of coexistence of the previously reported mtDNAs, a BLAST search was carried out for contigs/fragments 2-6, 5-3, and 4a-4b, which also confirmed their coexistence with higher similarity (100%) with three previously published B. oleracea mtDNA. The other contigs/fragments also exhibited similarity (99%) with previously published B. oleracea mtDNAs (S2 Fig).

Heteroplasmy from plastid into mtDNAs
Whole-genome re-alignments of mtDNAs (KU831325, KJ820683, JF920286, and AP012988) with plastid genome sequence (Acc. no. KR233156) was performed to find out the heteroplasmy in the mtDNAs from plastid due to recombinations, and also for the similarities among the mtDNAs with our reported mtDNA KU831325. We have identified 12 blocks in the mtDNAs KU831325, KJ820683, and AP012988, whereas only nine blocks were deployed in mtDNA JF920286, when aligned with the plastid genome sequence (KR233156) (Fig 5 and Table 2). The similar blocks in the different mtDNAs are represented by the same colour in the Fig 5. However, the direction and position of different blocks in the genome were completely different between KU831325 Vs KJ820683 and KU831325 Vs AP012988. Among the 12 blocks, five blocks were arranged as reverse order in the mtDNAs KJ820683 and AP012988, whereas, they were in normal direction in the mtDNA KU831325. In addition, herteroplasmy from plastid genome was not detected for block 1, block 3, block 11, and block 12 in the mtDNAs KU831325, KJ820683, and AP012988 except mtDNA JF920286, in where block 3, block 6, and block 9 were completely absent ( Table 2). Rest of the blocks had wide range of hetroplasmy, the highest heteroplasmy was detected in the block 5 and block 10 with 2182 bp and 1547 bp, respectively in the mtDNA KU831325. Considering the whole plastid genome size, the highest hetroplasmy was also presented in the mtDNA KU831325 in terms of total bases (5519 bp out of 153366 bp of plastid; Table 2) and in terms of percent (3.6%; S2 Table). By contrast, the re-alignment of mtDNAs KJ820683, AP012988, and JF920286 in reference with KU831325 showed more than 98% similarities with KJ820683 and AP012988 (S2 Table). Therefore, it could be concluded that our reported mtDNA KU831325 is a diverse type mtDNA in B. oleracea might be due to presence of either heteroplasmy in terms of higher recombination of plastid genome or the rearrangement and positions of the blocks within the mtDNA.

Structural differences in the mtDNAs
The new B. oleracea mtDNA contained 34 protein-coding genes, 3 rRNA genes, and 19 tRNA genes (S1 Table) that were presumably inherited from the angiosperm common ancestor. In addition to known genes, angiosperms mtDNAs also contained numerous ORFs of unknown function. Thirty-eight of these ORFs were found as conserved in our reported new B. oleracea var. capitata mtDNA as the member of angiosperm (Fig 2).

Repeats and syntenic regions in the mtDNAs
The complex, multipartite structure of the Brassica mitochondrial genome was formed due to presence and recombination of repeats [6]. No large repeats were identified in KU831325 but 101 tandem repeats were confirmed, where the repeat length was 2,178 bp/repeat (S3 Table). Tandem repeats similar to those in KU831325 also exist in the reference mtDNAs KJ820683 and AP012988, but not in the JF920286. Frequency distribution analysis of the mtDNAs showed that the size of the tandem repeats was ranged between 8 and 24 bp (S3 Fig). The short tandem repeats accounted for about 1.1% of the entire B. oleracea mtDNA. Short repeats are associated with an irreversible organization and uniformly distributed throughout the Brassica mtDNAs [35,52]. To investigate the diversity and access mitochondrial genome structure variation among the four B. oleracea mtDNAs, microsyntenic analysis was performed. The syntenic sequence blocks were identified as syntenic fragments with variable size in length among the mtDNAs. However, in the mtDNA KU831325 had about 2.4 kb syntenic blocks when compared with other three mtDNAs (S4 Table). The previously reported B. oleracea mtDNAs, AP012988, JF920286, and KJ820683 had good linearity to each other. Whereas, our reported mtDNA KU831325 showed little amount of linearity with AP012988, JF920286, and KJ820683 (Fig 6). A total of 77, 117, and 74 syntenic blocks were identified in case of KU831325 compared with AP012988, JF920286, and KJ820683, respectively (S4 Table). All the syntenic comparisons in Brassica mitochondrial genomes are shown in the Circos map (Fig 6A-6D). A large duplicated region identified in JF920286 was absent from KU831325, AP012988, and KJ820683. We found that some rearrangements in block regions might be associated with short repeats. Phylogenetic analysis of the four B. oleracea mtDNAs with Arabidopsis thaliana (LUHQ01000021.1) as out-group showed that AP012988, JF920286, and KJ820683 were distributed within one group, but KU831325 was in a separate class (Fig 7). This diversity indicated that KU831325 is likely a new mtDNA, not previously described.

Discussion
Mitochondria contribute to energy production; thus, metabolism and cell homeostasis depend on the performance of the mitochondrial genetic system [53]. Plant mitochondrial DNA comprises a set of sub-genomic forms called sublimons with a mixture of linear, circular, and branched structures [54]. To elucidate the mitochondrial genomic diversity within a species of Brassica, we sequenced the mtDNA of B. oleracea var. capitata. Diversity in mtDNAs might be due to evolutionary factors such as duplications, rearrangements, InDels, and mutations [3,19,[55][56]. In this study, we identified a different type of B. oleracea mtDNA sequence with a different distribution of contigs/fragments compared to the reference mtDNAs (KJ820683, AP012988, and JF920286) of B. oleracea. Our reported mtDNA (KU831325) is slightly larger than KJ820683 and AP012988 based on previously reported data from physical mapping [6,[36][37] of B. oleracea (Fig 2). However, this mtDNA is much smaller than the previously reported 'JF820683' mtDNA.  Fig). Most of the 'KU831325'type genes may exist in several copies because of many short tandem repeat regions and are identical to those of the other previously reported mtDNAs. The only exception was contig/ fragment number 4, which we have found absent in the previously reported KJ820683, AP012988, and JF920286 mtDNAs, might be due to DNA rearrangement in the mtDNA. Recombination is an essential process for shaping and maintaining diversity in the genome, and may be the basis of generation of new diversity in mtDNAs [53]. In our reported mtDNA KU831325, we have found comparatively larger heteroplasmy from plastid DNA and completely different types of distribution and direction of some heteroplasmic blocks compared to other reference mtDNAs ( Table 2 and Fig 5) might make the diverse type of mtDNA KU831325.
MtDNAs are typically mapped as circular molecules with one or more large repeated sequences that promote active homologous recombination [9]. In our reported circular mtDNA of B. oleracea about 99 ORFs we have found, among them 34 ORFs are protein coding gene, 3 rRNA genes, and 19 tRNA genes; rest of the ORFs could not be characterized for their definite functions (Fig 2 and S1 Table). This result indicates, in addition to known genes, B. oleracea mtDNAs contain numerous ORFs of unknown function. Several of these ORFs are conserved among diverse higher plant species and are functional mitochondrial genes [57]. Some ORFs have been shown to play a role in establishing cytoplasmic male sterility (CMS), although the majority of these CMS-associated ORFs are species specific and are likely to affect male fertility by mechanisms unique to each species [58][59]. Besides ORFs, we have found the presence of large subunit (LSU) and small subunit (SSU) of ribosomal proteins might be arisen through inter-and intra-genomic recombination in plant species. Such inter-and intra-genomic recombination can lead to dynamic structural diversity consisting of multiple, coexisting mitogenomic forms, as evidenced by DNA gel blot mapping, cosmid sequencing, and readpair mapping [8,17,60], though a few species lack large mitochondrial repeats and presumably exist in a single predominant conformation [4,[61][62]. Mitochondrial gene content can also vary among angiosperms [7,63]. Some species, such as the tulip tree Liriodendron tulipifera [64], have retained all 41 protein-coding genes that were presumably present in the common angiosperm ancestor, but most present-day species have retained only a subset of these genes. Mitochondrial gene loss is commonly linked to the ongoing transfer of DNA to the nucleus, which mostly affects genes for ribosomal proteins and the subunits of the succinate dehydrogenase complex [65]. However, most mitochondrial ORFs are not conserved among angiosperms [8,60,[66][67] and are generally considered to be nonfunctional [63].
The comparison of the B. oleracea mtDNAs revealed that the KU831325 mtDNA shared 77, 177, and 74 syntenic blocks with AP012988, JF920286, and KJ820683, respectively, where the syntenic average block size was ranged 105-162 bp. By contrast, the syntenic average block size among AP012988, JF920286, and KJ820683 was ranged 134-1943 bp with very large single syntenic block up to 142 kb in AP012988 vs JF920286 (S4 Table). We also identified short tandem repeats of 2,424 bp with a repeat length of 8 to 24 bp (S3 Fig), which may have been involved in the reorganization of the 'KU831325' mtDNA. These repeats can also explain the rearrangement between the 'AP012988' and 'KU831325' mtDNAs via homologous recombination (Fig 6D). Alverson et al. [17] reported that plant mitochondrial genomes are rich in repeated sequences. Mitochondrial repeated sequences could have originated via recombination of the shorter repeats, producing sublimons in plants. Differential expression profiling of the newly identified 'contig/fragment number 4 (4a-4b)' showed lower expression than other contigs/fragments, even within the individual member of the B. oleracea genotypes reveal the presence of a separate contig/fragment in the reported mtDNA of B. oleracea (Fig 4B).
The ratios between the abundances of sublimons in the KU831325, AP012988, and KJ820683 mtDNAs are varied. Sublimons are generated by recombination of short repeats and are sometimes amplified rapidly, replacing the main genome by a substoichiometric shifting process, while the original genome is suppressed to a low-frequency level [68][69][70][71]. Substoichiometric shifting is possibly caused by environmental conditions or mutations in the nuclear recombination proofreading machinery [9,52,72].
In conclusion, the KU831325 mtDNA sequence reported here a different variety of mtDNAs in B. oleracea, thereby, coexistence of the JF920286 mtDNAs is usually at low frequency compared to KU831325 (Table 2). Phylogenetic analysis of B. oleracea mitochondrial genome suggested that the KU831325 mtDNA is a diverse type mtDNA compared to AP012988 and KJ820683 (Fig 7). The diversity of the present mtDNA also confirmed by its comparable size with already published B. oleracea mtDNA. In addition, presence and distribution of hetroplasmic and syntenic blocks also support the diversity of the new B. oleracea mtDNA reported here.