The complete chloroplast genome sequences of three Adenophora species and comparative analysis with Campanuloid species (Campanulaceae)

We report the complete chloroplast genomes of three Adenophora species, and analyzed these compared them to five published Campanuloid plastomes. The total genome length of Adenophora divaricata, Adenophora erecta, and Adenophora stricta ranged from 159,759 to 176,331 bp. Among the eight Campanuloid species, many inversions were found to be only in the LSC region. IR contraction was also identified in the plastid genome of Adenophora stricta. Phylogenetic analyses based on 76 protein coding genes showed that Campanuloids are monophyletic, and are composed of two major groups: Campanula s. str. and Rapunculus. When we compared each homologous locus among the four Adenophora species, ten regions showed high nucleotide divergence value (>0.03). Among these, nine loci, excepting ycf3-rpoB, are considered to be useful molecular markers for phylogenetic studies and will be helpful to resolve phylogenetic relationships of Adenophora.

The genus Adenophora, which belong to Campanuloids, is a perennial herbaceous plant with approximately 50-100 species that are distributed in temperate regions in Asia and Europe. This genus was first described by Fischer [8], and then, a classification system was established based on various studies [9][10][11][12][13][14][15][16][17] according to morphological characteristics, such as the phyllotaxy, presence or absence of petioles, shape of the calyx, and length of the disk. However, there are still many different opinions regarding the sections and subsections within the Adenophora classification system because very similar morphological characteristics are found the species. Additionally, many taxonomic studies [4,[6][7][18][19][20][21][22][23][24][25][26][27]  Campanulaceae, however, the phylogenetic relationships among the Adenophora remain to be elucidated. Among the three Adenophora species discussed in this study, A. erecta is a very important species since it is an endemic species to Ulleung-do Island in Korea. This species was first described as a new species in 1997 [28] and is distinguished from A. remotiflora because the leaves are compactly arranged along the upper part of the stem, while the length of the disc is shorter than the width. Furthermore, A. divaricata and A. stricta are distributed in China, Japan, and Korea. In addition, A. divaricata differs from Adenophora pereskiifolia because the cylindrical disk is more than 1.2 times longer than the width, while the presence of trichome in the calyx tube is characteristic of A. stricta and distinguishe A. stricta from closely allied taxa, such as Adenophora lamarckii and Adenophora polyantha [29].
The structure of the chloroplast genome is highly conserved among land plants [30]. However, gene order changes due to rearrangements sometimes occur and can be used to obtain very important phylogenetic information [31][32][33]. Campanulaceae are known to have very different chloroplast genome structures in a genus or species due to many rearrangements [30][31][34][35][36][37][38]. Therefore, the chloroplast genome structure of Campanulaceae is very useful for identifying their unclear phylogenetic relationships. However, plastid genome research studies in Campanulaceae have not been extensively performed. Only a few species (Adenophora remotiflora, Campanula punctata, Campanula takesimana, Hanabusaya asiatica, and Trachelium caeruleum) were sequenced for the whole cp genome [30,[35][36][37][38].
In this study, we report the chloroplast genome sequences of three Adenophora species (A. divaricata, A. erecta, A. stricta), and compared these sequences to five published Campanuloid chloroplast genomes. The specific goals of this study were to (1) confirm the genome features of the three Adenophora species, (2) the changing tendency of the chloroplast genome structure of Campanuloids, and (3) identify the divergence hotspot regions to provide information regarding useful molecular markers for future phylogenetic studies in Adenophora.
DNA extraction, sequencing, assembly and genome mapping Total DNA was extracted from approximately 100 mg of fresh leaves using the DNeasy Plant Mini Kit (Qiagen Inc., Valencia, CA, USA). The total genomic DNA was used for sequencing by an Illumina MiSeq (Illumina Inc., San Diego, CA, USA) platform at LabGenomics (http:// www.labgenomics.com/). Three Adenophora species were sequenced to produce 3,754,935-4,223,586 raw reads with a length of 301 bp. These paired-end reads were aligned of Adenophora remotiflora cp genome (accession no. KP889213) as a reference. After screening these paired-end reads through alignment with A. remotiflora cp genome, 325,221 to 560,114 reads were extracted. A de novo assembly was performed using Geneious v.7.1.8 (Biomatters Ltd., Auckland, New Zealand). The chloroplast genome coverage was estimated using the CLC Genomics Workbench v7.0.4 software (CLC-bio, Aarhus, Denmark). The cp genome coverages of the sequencing data of A. divaricata, A. erecta, and A. stricta were 524, 779, and 947×, respectively. In addition, the junction of large single copy (LSC), small single copy (SSC), and inverted repeat (IR) regions, along with the end points of inversion and IR contraction, were reconfirmed by PCR and Sanger sequencing.
The protein coding genes, tRNAs, and rRNAs in the plastid genome were predicted and annotated by Dual Organellar GenoMe Annotator (DOGMA) using the default parameters [39]. Based on this initial annotation, the putative starts and stops and the intron positions were determined by comparisons with homologous genes in other Campanulaceae cp genomes. tRNAs were confirmed using tRNAscan-SE [40]. A circular plastid genome map was drawn using the OGDRAW program [41].
In total, 76 protein coding genes (S1 Table) in the eight Campanuloid species and one outgroup (Brighamia insignis, KT372780) were compiled into a single file and aligned with MAFFT v.7 [43]. In addition, the rpl23 and infA genes were excluded from the phylogenetic analysis data matrix, since most of these gene regions were deleted, and only a few regions existed. Before Maximum likelihood (ML) analysis, a search for the best fitting substitution model was performed using jModeltest v. 2.1.10 [44]. Based on the Akaike Infromation Criterion (AIC) and Akaike Information Criterion with Correction (AICc), GTR+I was the best model. ML analysis was performed using RAxML v7.4.2 [45] with 1,000 bootstrap replicates and the GTR+I model. Bayesian inference was performed using MrBayes v3.0b3 [46].

Divergence hotspot identification in Adenophora
Four chloroplast genomes, including A. divaricata, A. erecta, A. remotiflora and A. stricta, were analyzed to identify rapidly evolving molecular markers that can be used in future phylogenetic studies in Adenophora. Both coding genes and non-coding region fragments >200 bp were extracted separately from each plastid genome by applying the "Extract" option in Geneious v.7.1.8 (Biomatters Ltd., Auckland, New Zealand). Then, the homologous loci were aligned individually using MAFFT v.7 [43]. To analyze the nucleotide diversity (Pi), the total number of mutations (Eta), the average number of nucleotide differences (K) and the parsimony informative characters (PICs) were determined using DnaSP v.5.10 [47].

Genome features of the three Adenophora species
The chloroplast genomes of A. divaricata (accession no. KX462129), A. erecta (accession no. KX462130), and A. stricta (accession no. KX462131) have been submitted to GenBank of National Center for Biotechnology Information (NCBI). The plastid genome sizes of three Adenophora species ranged from 159,759 to 176,331 (Table 1 and Fig 1). The genome is composed of an LSC region (105,861-113,353 bp), SSC region (8,648-27,238 bp), and two IR copies (10,100-28,098 bp). Their overall GC contents are almost identical (38.5-38.7%). The chloroplast genomes of the three Adenophora species each contain 112 unique genes (Table 1). In the three Adenophora cp genomes, the following three genes (rpl23, infA, and clpP) are presumably nonfunctional: (1) only 46 bp of the 3' end of rpl23 exists, (2) only 202 bp in the middle of infA remain, and (3) only 38 bp in exon1 of clpP exists. Additionally, two tRNAs (trnI-CAU and trnV-GAC) and one gene (psbJ) had an additional one and two copies, respectively. In the chloroplast genomes of two species (A. divaricata and A. erecta), a portion of three genes (psbB, ycf3, and rrn23) was duplicated in the IRs. In rps12, the 3' exon and 5' exon were located in the LSC and IRs, respectively. However, the 5' exon in rps12, and portion of ycf3 and rrn23 were located in the SSC in A. stricta due to the IR contraction (Fig 2).

Comparison of the chloroplast genome structure in eight Campanuloid species
The gene order changes in the plastid genomes of the eight Campanuloid species were confirmed only in the LSC region (except for A. stricta). The gene order among the three Campanula s. str. species (C. punctata, C. takesimana, and T. caeruleum) and between the two sect. Remotiflorae species (A. erecta and A. remotiflora) of Adenophora was exactly the same (S1 Fig).
Among the three Campanula s. str. species and Hanabusaya, two inversions of large gene blocks (trnT-ndhC and psbM-trnS) were found. In addition, two inversions between Hanabusaya and sect. Remotiflorae and between sect. Remotiflorae and the remaining Adenophora species (A. divaricata and A. stricta) were identified (Fig 3).
The IR/LSC and IR/SSC borders in the eight Campanuloid plastid genomes were compared (Fig 4). Two genes (ycf2 and trnH-GUG) in the eight Campanuloid species were all located in the LSC. Ycf2 was separated from the LSC/IRb border by 191 bp (C. punctata and C. takesimana) to 341 bp (T. caeruleum), and trnH-GUG was separated from the IRa/LSC border by 118 bp (in five Rapunculus species) to 140 bp (T. caeruleum). In addition, trnL-CAA was found in the IR (separated from the LSC/IRb and IRa/LSC border by 172-180 bp), and ndhF was    In the A. stricta cp genome, the psbB pseudogene was duplicated in the IR near the IRb/SSC and SSC/IRa border, while the clpP pseudogene was located in the SSC adjacent to the SSC/IRa border. Meanwhile, the cp genome of A. stricta lost duplication copies of ndhG, ndhI, ndhA, ndhH, rps15, ycf1, 5' exon of rps12, and clpP due to the IR contraction (Fig 2). Phylogenetic analysis of eight Campanuloid species using 76 protein coding genes The ML tree, which used 76 protein coding genes from the eight species, revealed that Campanuloids were monophyletic. The Campanuloids formed the following two clades: The Campanula s. str. clade and the Rapunculus clade. All nodes in the phylogenetic tree were strongly supported, with 100% bootstrap (BP) values and 1.00 Bayesian posterior probabilities (PP).
In the Campanula s. str. clade, T. caeruleum is the earliest diverging lineage, which was identified as a sister to the other species in this group. Within the Rapunculus clade, H. asiatica formed a basal branch and was a sister to all other taxa. Additionally, A. erecta and A. remotiflora as well as A. divaricata and A. stricta formed a clade (Fig 5).

Chloroplast genome organization in Campanuloids
The LSC, SSC, and IR regions in the three Adenophora cp genomes varied in length. In particular, the length of the IR regions in A. stricta was significantly shorter than that in the other species, which is thought to be due to the IR contraction. The length of the LSC region in A. erecta was approximately 7,000 bp shorter than that in the other species, and the lengths of the SSC and IR regions in A. erecta were approximately 2,500 bp and 900 bp longer than those in the A. divaricata cp genome, respectively. It is believed that the length difference in the LSC region was due to a change in the length of the intergenic spacer (IGS) in psbJ-ndhC, trnT (UGU)-psbJ and psbJ-ycf3, which is the end point of the inversion. Additionally, the length difference in the SSC and IR regions was identified to be caused by the length difference in the IGS regions in portion of ycf3-rps12(5'), which is located in the IR regions, and ndhF-rpl32, which is located in the SSC region, regardless of the inversion.
The Campanuloid plastomes were identified to have a very different gene order when compared to the cp genome of B. insignis belonging to Campanulaceae s. l. This is important evidence that many rearrangements in the cp genome occurred among the Campanulaceae species. However, it is very difficult to estimate the exact gene order change process by only the cp genomes analyzed so far, since the plastid genome of species belonging to Wahlenbergioids and Platycodonoids has not been analyzed in any species to date.
The results of this study revealed that the chloroplast genome structure of two species of Campanula (C. punctata and C. takesimana) was almost identical to that of Trachelium regarding the gene order and duplication of certain genes. In addition, we confirmed that the genome structures two Remotiflorae species (A. erecta and A. remotiflora) of Adenophora were the same. Therefore, we believe that the cpDNAs of members of Campanula s. str. and sect. Remotiflorae evolved via identical routes. Genome structure modifications in the phylogenetic groups of Campanula s. str. and Rapunculus were identified only in the LSC region (except for A. stricta). Duplicated copies of psbJ and trnV-GAC were located at the inversion endpoint (Fig  3). This duplication might be attributed to inversions, and these duplicated genes might have caused multiple inversion events within Campanuloids.
Many interspecific and intergeneric relationships of Campanuloids currently remain unclear. Specifically, the phylogenetic relationships among Adenophora are currently unresolved. In this study, the genome structures of each section of Adenophora, such as sect. Remotiflorae (A. erecta and A. remotiflora), sect. Platyphyllae (A. divaricata), and sect. Microdiscus (A. stricta) were different. Therefore, we believe that the chloroplast genome structure can be used as an important characteristic to differentiate the Adenophora.
Among the 11 plastid-encoded ndh genes, a loss of function of ndhK in the chloroplast genome has been reported in Pinus thunbergii and Phalaenopsis aphrodite [48][49]. Haberle et al. [30] have suggested that ndhK might be a pseudogene in T. caeruleum because it contains multiple internal stop codons. We compared ndhK in T. caeruleum to that in the seven Campanuloid species. The results of the comparison revealed that ndhK in T. caeruleum was not a pseudogene. Its earlier identification as a pseudogene might be due to an error in the annotation process. In conclusion, ndhK in the chloroplast genomes of Campanuloid species is a functional gene (S2 Fig).

Phylogenetic implications
Recent phylogenetic studies [4,[6][7] have proposed that Campanuloids can be divided into the following two clades: Campanula s. str. clade and Rapunculus clade. The results of this study were based on 76 protein coding genes, and these two clades were well supported and clearly distinguished by the gene order in the plastid genome structure (Fig 5).
The phylogenetic tree constructed in this study showed that A. erecta and A. remotiflora (sect. Remotiflorae) formed a sister clade to the A. divaricata (sect. Platyphyllae) and A. stricta (sect. Microdiscus) clade (Fig 5). Therefore, it is hypothesized that sect. Remotiflora had the earliest divergence among the section of Adenophora. Additionally, sect. Platyphyllae had an earlier divergence from a common ancestor than sect. Microdiscus because A. stricta had a unique IR structure among the Campanulaceae due to the IR contraction (Figs 2 and 5).
Meanwhile, H. asiatica is a very important plant resource since it is monotypic and one of the six endemic genera in Korea. It was first described as Symphyandra asiatica Nakai [50] due to its connate anthers but was segregated into a new genus, Hanabusaya, based on its morphological characteristics [51]. However, the phylogenetic position of Hanabusaya as an endemic genus remain unclear despite the many studies that have been performed [7,18,[24][25][26]52]. In this study, Hanabusaya formed a sister clade to the Adenophora clade. Thus, Hanabusaya has evolved through a different evolutionary route than the morphologically similar Campanula s. str. species, including Campanula and Symphyandra. Additionally, Hanabusaya has a unique cp genome structure compared to that in the other species used in this study (Fig 3). Therefore, the phylogenetic position of Hanabusaya was well supported in this study as an endemic genus. However, only a few species were included in this study, and thus, we believe that further studies that include various species are needed to clarify the phylogenetic position of Hanabusaya.

Useful molecular markers information for the phylogeny of Adenophora
Despite the many phylogenetic studies performed [4,6,[25][26], the phylogenetic relationships among the Adenophora species remain unclear. We believe that this is due to nucleotide variations in molecular markers, such as atpB, matK, rbcL, atpB-rbcL, atpF-atpH, rpl16, rpoC1, and trnL-trnF, which were very low in previous studies (Fig 6 and S2 Table).
The results of this study showed that the nucleotide diversity in ten regions, including one coding gene and nine non-coding regions, had relatively high calculated values (>0.03). However, among these regions, one non-coding region (ycf3-rpoB) was not considered suitable as a phylogenetic marker because it had a very small number of PICs compared to the Eta and/or aligned length (S2 Table).

Conclusions
We provide the first report of the complete plastid genome sequences of A. divaricata, A. erecta, and A. stricta and compared these sequences to those of five Campanuloid species in Campanulaceae. The results of the genome structure comparison confirmed many inversions. The phylogenetic analyses showed that Campanuloids were divided two major groups. The divergence hotspots clarified in this study could be used as molecular markers that will be helpful for elucidating the phylogenetic relationships among the Adenophora species.