The complete chloroplast genome sequences of four Viola species (Violaceae) and comparative analyses with its congeneric species

We report the complete chloroplast genomes of four Viola species (V. mirabilis, V. phalacrocarpa, V. raddeana, and V. websteri) and the results of a comparative analysis between these species and the published plastid genome of the congeneric species V. seoulensis. The total genome length of the five Viola species, including the four species analyzed in this study and the species analyzed in the previous study, ranged from 156,507 (V. seoulensis) to 158,162 bp (V. mirabilis). The overall GC contents of the genomes were almost identical (36.2–36.3%). The five Viola plastomes each contained 111 unique genes comprising 77 protein-coding genes, 30 transfer RNA (tRNA) genes, and 4 ribosomal RNA (rRNA) genes. Among the annotated genes, 16 contained one or two introns. Based on the results of a chloroplast genome structure comparison using MAUVE, all five Viola plastomes were almost identical. Additionally, the large single copy (LSC), inverted repeat (IR), and small single copy (SSC) junction regions were conserved among the Viola species. A total of 259 exon, intron, and intergenic spacer (IGS) fragments were compared to verify the divergence hotspot regions. The nucleotide diversity (Pi) values ranged from 0 to 0.7544. The IR region was relatively more conserved than the LSC and SSC regions. The Pi values in ten noncoding regions were relatively high (>0.03). Among these regions, all but rps19-trnH, petG-trnW, rpl16-rps3, and rpl2-rpl23 represent useful molecular markers for phylogenetic studies and will be helpful to resolve the phylogenetic relationships of Viola. The phylogenetic tree, which used 76 protein-coding genes from 21 Malpighiales species and one outgroup species (Averrhoa carambola), revealed that Malpighiales is divided into five clades at the family level: Erythroxylaceae, Chrysobalanaceae, Euphorbiaceae, Salicaceae, and Violaceae. Additionally, Violaceae was monophyletic, with a bootstrap value of 100% and was divided into two subclades.


Introduction
With the development of next-generation sequencing (NGS) technology, many studies have performed whole chloroplast genome sequencing. These studies have provided much information about plant taxonomy and evolution. The rapidly evolving loci identified by these studies are very important for resolving unclear phylogenetic relationships because they have a higher resolving power than that of traditional molecular markers. Therefore, many studies have focused on finding genic regions among specific families or genera to provide useful information about molecular markers for further studies [1][2][3][4][5][6].
Violaceae Batch. consists of approximately 22 genera and 1000-1100 species of herbs, shrubs, lianas, and trees [7][8][9]. The genus Viola L. comprises 583-620 species and is distributed mainly in temperate and tropical regions [7][8][10][11]. This genus is known as one of more difficult groups to classify because of the very similar external morphology characters among species and the many intermediate forms that exist due to frequent interspecies hybridization between closely related species [12][13][14][15].
For this reason, although many studies have been carried out, the phylogenetic relationships of Viola are still unclear among sections and/or species [7,11,13,[16][17][18][19]. This lack of clarity is because the molecular markers used by previous studies has low resolution to evaluate the phylogenetic relationships of Viola. Therefore, to correctly evaluate the phylogenetic relationships of Viola, the most suitable molecular markers should be selected via analyses of the sequence variation at each locus.
Among the four Viola species discussed in this study, three species (V. mirabilis L., V. raddeana Regel and V. websteri Hemsl.) are very rare because they are endangered in Korea. In particular, it is urgent to establish a conservation strategy for V. raddeana because this species only has a single, relatively small population of individuals in Gyeongsangnam-do Province [20]. V. phalacrocarpa Maxim. is not an endangered species, but various taxonomic data are needed because its taxonomic level is ambiguous due to its close relationship to V. seoulensis.
Here, we report the whole chloroplast genome sequences of four Viola species (V. mirabilis, V. phalacrocarpa, V. raddeana, and V. websteri) and the results of a comparative analysis between these species and the published genome of a congeneric species (V. seoulensis Nakai). The main goal of this study was to provide important information about the most suitable chloroplast molecular markers for further studies to solve unclear phylogenetic relationships of Viola via the calculation of the rate of evolution of each chloroplast genome loci. Furthermore, this study could expand the current understanding of the chloroplast genome characteristics of the genus Viola and provide basic chloroplast phylogenomic data for Violaceae, thus supporting the development of conservation strategies for endangered Violaceae species.

Sample collection, DNA extraction
Among the four Viola species in this study, three (V. mirabilis, V. raddeana, and V. websteri) are legally protected species. Therefore, samples of these species were collected with permission from the Ministry of Environment in Korea, with the following license numbers: 2015-15 (V. raddeana) and 2015-39 (V. mirabilis and V. websteri).

Sequencing, assembly, annotation, genome comparison and repeat analysis
Genomic DNA was used for sequencing by an Illumina MiSeq (Illumina Inc., San Diego, CA, USA) platform. The DNA of Viola species were sequenced to produce 8,920,660-9,244,544 raw reads with a length of 301 bp. These reads were aligned with the reference genome of Viola seoulensis (GenBank accession number: KP749924). A total 542,183 to 667,526 reads were mapped to the reference genome. The genome coverage was estimated using CLC Genomics Workbench v7.0.4 software (CLC-bio, Aarhus, Denmark). The genome coverages of the sequencing data from V. mirabilis, V. phalacrocarpa, V. raddeana, and V. websteri were 1002, 875, 986, and 1073, respectively.
The protein-coding genes, transfer RNAs (tRNAs), and ribosomal RNAs (rRNAs) in the plastid genome were predicted and annotated using Dual Organellar GenoMe Annotator (DOGMA) with the default parameters [21] and manually edited by a comparison with the published chloroplast genome sequence of Violaceae. tRNAs were confirmed using tRNAscan-SE [22]. The circular plastid genome map was drawn using OGDRAW [23].
The complete chloroplast genomes of five Viola species, including the four Viola species of this study and the previously published species (V. seoulensis), were compared using MAUVE [24]. The large single copy/inverted repeat (LSC/IR) and inverted repeat/small single copy (IR/ SSC) boundaries of these species were also compared and analyzed.
The REPuter program [25] was used to identify repeats (forward, reverse, palindrome, and complement sequences). The size and identity of the repeats were limited to no less than 30 bp and 90%, respectively. The simple sequence repeats (SSRs) in the chloroplast genome of the five Viola species were detected using Phobos v.3.3.12 (http://www.ruhr-uni-bochum.de/ ecoevo/cm/cm_phobos.htm). Repeats were �10 bp in length and had three repeat units for mono-, di-tri-, tetra-, penta-and hexanucleotides.

Divergence hotspot identification
The five chloroplast genomes of Viola were analyzed to identify rapidly evolving molecular markers that can be used in further phylogenetic studies of Viola. Both coding and noncoding region fragments in each plastid genome were extracted separately by applying the "Extract" option of Geneious v7.1.8 (Biomatters Ltd., Auckland, New Zealand). Then, the homologous loci were aligned individually using MAFFT [26]. To analyze nucleotide diversity (Pi), the total number of mutations (Eta), average number of nucleotide differences (K) and parsimony informative characters (PICs) were determined using DnaSP [27].
The result of the chloroplast genome structure comparison using MAUVE [24] showed that all five Viola plastomes were the same (S1 Fig). The LSC/IR and IR/SSC boundaries were conserved in Viola. In all five Viola chloroplast genomes, trnH-GUG was located in the LSC near the IRa/LSC border, and ndhF was located in the SSC near the IRb/SSC border. Additionally, pseudogenes of rps16 and ycf1 situated in the IRb were created by IR extending into the LSC and SSC regions, respectively (Fig 2).
Four classes of tandem repeats (forward, reverse, complement and palindrome) were investigated. The number of tandem repeats for each class is shown in Fig 3A. Additionally, the tandem repeats that ranged from 30 to 39 bp were the most abundant, followed by those that ranged from 40 to 49 bp. Moreover, among the chloroplast genomes of all five Viola species, that of V. websteri had the highest number of tandem repeats longer than 50 bp ( Fig 3B). The analysis of SSRs indicated that six categories of SSRs, i.e., mono-, di-, tri-, tetra-, pentaand hexanucleotide, were detected. The total number of SSRs was 64 in V. mirabilis, 56 in V. phalacrocarpa, 50 in V. raddeana, 43 in V. seoulensis, and 73 in V. websteri. The most dominant of SSRs were A/T mononucleotides. Only the V. websteri chloroplast genome had all six types of SSRs, and those of the other species had five types of SSRs, excluding the hexanucleotide SSR (Fig 3C).

Divergence hotspot regions in Viola
A total of 259 exon, intron and intergenic spacer (IGS) fragments were compared among the five Viola species to verify divergence hotspot regions. The Pi values ranged from 0 to 0.7544 (Fig 4 and S2 Table). The IR region was much more conserved than the LSC and SSC regions  Chloroplast genomes of four Viola species because the IR region had the most fragments with a relatively low Pi value. The Pi value was 0.03 or more in ten regions. Among these, eight (rps19-trnH, trnH-psbA, trnG-trnR, trnD-trnY, psbZ-trnG, petA-psbJ, petG-trnW, rpl16-rps3) were located in the LSC region, one (rpl2-rpl23) was located in the IR region, and one (ndhF-trnL) was located in the SSC region. Additionally, all regions with a high Pi value were IGSs.

Comparison of the chloroplast genomes of five Viola plastomes
Many recent studies have been carried out to solve taxonomic problems between related taxa using complete chloroplast genome sequences. The chloroplast genome is known to be very conservative in land plants, but structural changes in chloroplast genomes, such as gene duplication and deletion and inversion due to occasional rearrangements, provide important taxonomic data [31][32][33][34][35][36][37]. This study showed that the gene order of five Viola chloroplast genomes was identical, and the sequence identity was also very similar among species in most of the chloroplast regions (Fig 6 and S1 Fig). Therefore, these results indicate that the plastid genome of Viola is very conservative.
Ycf15 in V. mirabilis was 66 bp shorter than that in the other four Viola species because there was a premature stop codon due to a point mutation of one nucleotide (S2 Fig). This important data supports the taxonomic position of V. mirabilis. In addition, future studies should be performed to determine whether ycf15 is a pseudogene.
The number of the tandem repeats in the five Viola plastomes ranged from 39 (V. seoulensis) to 47 (V. raddeana), and the number of tandem repeats according to type and length showed a slightly difference across each species (Fig 3A, 3B). The presence and abundance of repetitive sequences in the chloroplast or nuclear genome are likely to involve many phylogenetic signals [38][39][40]. Therefore, the different abundances of tandem repeats among the plastid genomes of the five Viola species may provide additional evolutionary information. In

Selection of useful molecular marker regions for additional phylogenetic studies
The genus Viola is known as one of more difficult groups to study taxonomically since Viola has many morphologically similar species, and the creation of intermediate forms due to interspecific hybridization occurs freely [12][13][14][15]. Because of this external morphological complexity among species, although many taxonomic studies [7,9,11,13,16,18] have been conducted, the taxonomic positions and phylogenetic relationships within sections level of Viola are remain insufficiently resolved.
The molecular markers used in previous studies, except for the ITS region of nuclear DNA, were ten chloroplast DNA sequences, trnL, trnL-trnF, rbcL, atpB-rbcL, atpF-atpH, matK, psbA-trnH, psbK-psbI, rpl16 and rpoC1. The Pi values of these regions were calculated in this study, and all but psbA-trnH (0.06656) showed a very low Pi of 0.02510 or less (Fig 4 and S2 Table). Therefore, the low phylogenetic resolution of the previous studies was due to the selection of molecular marker regions with very low Pi values.

Phylogenetic implications
The phylogenetic analysis in this study produced an ML tree very similar to that of the Angiosperm Phylogeny Group (APG) system [41]. However, in the APG system, the main clade of Malpighiales was an unresolved polytomy, while in this study, the phylogenetic tree formed a monophyly as follows: Erythroxylaceae and Chrysobalanaceae formed a clade, and Euphorbiaceae formed a sister of the Salicaceae and Violaceae clade. These results are attributed to the increase in resolution resulting from the greater amount of sequence data used in this study. However, only a few species were included in this study, so additional studies that include more species are needed to clarify the phylogenetic relationships in Malpighiales.
In a previous study, the phylogenetic position of V. seoulensis was not identified, as it formed an unresolved polytomy with V. phalacrocarpa [11]. Based on the results of the phylogenetic analysis in the present study, it was not possible to confirm the exact phylogenetic position of V. seoulensis because not enough species were included in the analysis, but V. seoulensis was the most closely related to V. phalacrocarpa. An analysis of the chloroplast genomes of the two species in this study revealed that the total genome size of V. phalacrocarpa was 1335 bp longer than that of V. seoulensis, and the LSC, IR, and SSC junctions also largely differed between the two species. Additionally, the Pi between the two species was 2.22%. Therefore, it would be reasonable to recognize the two taxa as independent species rather than classifying them as variants, and we will carry out additional studies including allied species of V. phalacrocarpa and V. seoulensis to clarify their taxonomic positions.

Conclusion
We first report of the complete chloroplast genome sequences of four Viola species (V. mirabilis, V. phalacrocarpa, V. raddeana, and V. websteri), and analyzed these data compared to published congeneric species in genus Viola. Results of this study, six non-coding regions (trnH-psbA, trnG-trnR, trnD-trnY, psbZ-trnG, petA-psbJ, and ndhF-trnL) will presumably be very useful for resolving the many unclear phylogenetic relationships of the genus Viola. Phylogenetic analyses showed that Malpighiales is divided into five clades at the family level. Also, Violaceae and Viola were monophyletic, and was divided into two subclades.