Identification of plastid genomic regions inferring species identity from de novo plastid genome assembly of 14 Korean-native Iris species (Iridaceae)

Yang Jae Kang; Soonok Kim; Jungho Lee; Hyosig Won; Gi-Heum Nam; Myounghai Kwak

doi:10.1371/journal.pone.0241178

Abstract

Iris is one of the largest genera in the family Iridaceae, comprising hundreds of species, including numerous economically important horticultural plants used in landscape gardening and herbal medicine. Improved taxonomic classification of Iris species, particularly the endangered Korean-native Iris, is needed for correct species delineation. To this end, identification of diverse genetic markers from Iris genomes would facilitate molecular identification and resolve ambiguous classifications from molecular analyses; however, only two Iris plastid genomes, from Iris gatesii and Iris sanguinea, had been sequenced. Here, we used high-throughput next-generation sequencing, combined with Sanger sequencing, to construct the plastid genomes of 14 Korean-native Iris species and one outgroup and to predict their gene content. Using these data together with the previously published Iris plastid genomes, and with Sisyrinchium angustifolium as the outgroup, we constructed a Bayesian phylogenetic tree that clearly resolved relationships among the samples. We further identified sub-genomic regions that appear to have evolved neutrally and that recapitulate the Bayesian-inferred relationships. These regions contain key markers that could be used to identify Iris samples and classify them into taxonomic clades. Our results confirm previously reported speciation patterns and resolve questionable relationships within the genus Iris. These data also provide a valuable resource for studying genetic diversity and refining phylogenetic relationships among Iris species.

Citation: Kang YJ, Kim S, Lee J, Won H, Nam G-H, Kwak M (2020) Identification of plastid genomic regions inferring species identity from de novo plastid genome assembly of 14 Korean-native Iris species (Iridaceae). PLoS ONE 15(10): e0241178. https://doi.org/10.1371/journal.pone.0241178

Editor: Giuseppe Pellegrino, Universita degli Studi della Calabria Dipartimento di Biologia Ecologia e Scienze della Terra, ITALY

Received: November 14, 2019; Accepted: October 12, 2020; Published: October 26, 2020

Copyright: © 2020 Kang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The chloroplast assemblies from this study were deposited in NCBI GenBank under the accession numbers listed in Table 2.

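Funding: MHK received grant NIBR201905102 from the National Institute of Biological Resources (NIBR; https://www.nibr.go.kr). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.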

Competing interests: The authors have declared that no competing interests exist.

Introduction

The genus Iris comprises hundreds of species, making it one of the largest genera in the family Iridaceae. It includes many plants grown for ornamental purposes, such as landscape gardening, as well as many economically important medicinal plants. Parts of Iris plants have been used in traditional medicine for detoxification and for treating constipation, stomach ache, and sore throat [1]. Iris species are distributed across Europe, Asia, and America, and they display high levels of genetic diversity and variable ploidy [2–4]. In Korea, several native Iris species occur in diverse habitats, ranging from dry to wet environments. Some of these species (e.g. Iris laevigata, Iris ruthenica, and Iris koreana) are currently considered endangered and are legally protected by the Korean government.

To date, phylogenetic relationships among Iris species have been inferred from chloroplast and nuclear genomic regions, such as the nuclear ribosomal internal transcribed spacer (nrITS), matK, ndhF, trnL-trnF, trnQ-rps16, and trnS-trnfM [5–7]. Although these markers have been applied to most members of the Iridaceae, it remains unclear whether relationships among clades and among closely related species have been resolved, because of insufficient taxonomic sampling or a lack of informative sites [5–7]. Beyond insufficient sampling and the limited resolution of molecular markers, relationships among closely related species are often difficult to resolve because of factors such as frequent hybridisation and taxonomic ambiguity [8, 9].

Recently, phylogenetic analysis of whole chloroplast genomes was proposed as an alternative that provides better resolution for species delimitation [10, 11]. Consistent with this, concatenated alignments of most genes in the plastid genome have been used successfully to build species trees in a number of cases [12–14]. For example, an analysis of angiosperm relationships based on plastid genes provided strong support for Amborella as the earliest-diverging flowering plant lineage [12]. Relationships among Brassica species were likewise resolved using plastid gene trees derived from whole plastid genome sequences [13]. The development of next-generation sequencing (NGS) technology and advances in bioinformatic tools have further facilitated the assembly of complete plastid genome sequences from plants [12, 15]. However, because of sequencing costs and the computing power required, it remains difficult to obtain whole plastid genomes from enough samples to resolve low-level phylogenies and delineate species.

Currently, 280 Iris species are documented with taxonomy IDs in NCBI, yet only two Iris plastid genomes, from Iris gatesii and Iris sanguinea, have been sequenced. In this study, we used high-throughput NGS technology, together with Sanger sequencing, to determine the plastid genomes of 14 Korean-native Iris species and predict their gene content. Using these data and the published plastid genomes of I. gatesii and I. sanguinea, we compared the Iris species by calculating pairwise Ks values and constructed a Bayesian phylogenetic tree. We then used scores from a neutrality test (Tajima's D) to extract representative regions of the plastid genome that reflect the phylogeny of Iridaceae. Relationships among closely related species were re-examined with a conventional phylogenetic analysis of matK sequences from 117 Iris accessions. These representative chloroplast (CP) genomic regions could increase the resolution of Iris species classification and support the identification and protection of endangered Korean-native Iris species.

Results

Chloroplast genome sequence assembly from 14 Korean-native Iris species

The complete plastid genome sequences of 14 Korean-native Iris species and one outgroup species, Sisyrinchium angustifolium, were determined using NGS and Sanger sequencing (Table 1). Approximately 0.9–2.3 Gb of genomic sequence was generated for each species on the Illumina platform (Table 2). Plastid genome sequences ranging from 150,947 to 153,730 bp in length were then extracted and assembled. Based on these assemblies, 83 genes were predicted for each species (S1 Table). After curation to meet the NCBI submission standard, 63–73 coding genes were retained for each species (Table 3). This variation in the number of coding sequences is partly due to assembly ambiguities (erroneous insertions and variants) that prevented certain genes from being translated properly between start and stop codons: some predicted coding sequences (CDS) had lengths that were not multiples of three or had an improper codon at the start or end of the protein. Hence, the absence of a gene from an assembly does not necessarily mean that the gene was lost during evolution. In addition, 30–31 tRNAs and 12 rRNAs were annotated in each plastid assembly. In some species, one of three tRNAs (trnG-UCC, trnK-UUU, or trnnull-NNN) was not annotated, possibly because of sequencing errors. The large single copy (LSC), small single copy (SSC), and inverted repeat (IR) regions were also identified, with average lengths of 82,255 bp, 18,060 bp, and 26,053 bp, respectively (Table 3).
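Because the plastid genome is quadripartite, with two IR copies separating the LSC and SSC regions, the average region lengths reported above can be checked against the assembly sizes. This is a worked check that relies on two assumptions: each assembly length equals LSC + SSC + 2 × IR, and all averages are taken over the same set of assemblies. Under those assumptions, the mean assembly length follows from linearity of the mean:

```latex
\overline{L}_{\mathrm{total}} = \overline{L}_{\mathrm{LSC}} + \overline{L}_{\mathrm{SSC}} + 2\,\overline{L}_{\mathrm{IR}}
  = 82{,}255 + 18{,}060 + 2 \times 26{,}053 = 152{,}421~\text{bp}
```

The result lies within the reported assembly range of 150,947–153,730 bp.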


Table 1. Iris species and outgroup used in this study.

https://doi.org/10.1371/journal.pone.0241178.t001


Table 2. NGS sequencing statistics for Iris species and outgroup.

https://doi.org/10.1371/journal.pone.0241178.t002


Table 3. Summary of plastid genome assemblies and gene annotations.

https://doi.org/10.1371/journal.pone.0241178.t003

Ks value-based classification

Ks is the number of synonymous substitutions per synonymous site in a coding sequence. Because synonymous changes are thought to be largely unaffected by selection, Ks is used as a measure of the time elapsed since speciation. A pairwise comparison between two species yields a distribution of Ks values across orthologous gene pairs, and the peak (modal) value of this distribution provides a good proxy for relative species divergence time [16]. We therefore calculated pairwise Ks values to estimate speciation among the 17 taxa in our study. Ks distributions between Iris odaesanensis and each of the other species were plotted to visualise speciation signals, and these distributions showed variable peaks (Fig 1a). Peak Ks values were then extracted from all pairwise combinations of Iris species and the outgroup (S. angustifolium) and assembled into a triangular distance matrix (Fig 1b). These data showed close relationships between 1) I. koreana and Iris minutoaurea, 2) I. ruthenica and Iris uniflora, and 3) Iris rossii var. rossii and I. rossii var. latifolia (Fig 1b). A matK-based phylogenetic tree of 117 Iris accessions also indicated close relationships, grouping 1) I. koreana with I. minutoaurea and 2) I. ruthenica with I. uniflora (S1 Fig).
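The peak-extraction step can be sketched as follows. This minimal illustration assumes that per-pair Ks values have already been computed; the study used KaKs_Calculator [36] for that step. The histogram bin width, the saturation cutoff `max_ks`, and the function names `ks_mode` and `peak_ks_triangle` are hypothetical choices and were not reported in the study.

```python
from itertools import combinations


def ks_mode(ks_values, bin_width=0.005, max_ks=2.0):
    """Return the centre of the most populated histogram bin of Ks values."""
    counts = {}
    for ks in ks_values:
        if ks < 0 or ks > max_ks:  # drop invalid or saturated estimates
            continue
        b = int(ks // bin_width)
        counts[b] = counts.get(b, 0) + 1
    if not counts:
        raise ValueError("no Ks values within range")
    # Most populated bin; ties go to the lower (more recent) bin.
    best = min(counts, key=lambda b: (-counts[b], b))
    return (best + 0.5) * bin_width


def peak_ks_triangle(pairwise_ks):
    """Map each unordered taxon pair to its modal Ks (upper triangle only).

    pairwise_ks: {frozenset({taxon_a, taxon_b}): [Ks of orthologous gene pairs]}
    """
    taxa = sorted({taxon for pair in pairwise_ks for taxon in pair})
    table = {}
    for a, b in combinations(taxa, 2):
        values = pairwise_ks.get(frozenset((a, b)))
        if values:
            table[(a, b)] = ks_mode(values)
    return taxa, table
```

Keeping only the upper triangle reflects the symmetry of the comparison: the modal Ks for (a, b) is the same as for (b, a), so a triangular heatmap like Fig 1b contains all the information.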


Fig 1. Ks distributions for Iris species pairwise comparisons.

(a) Pairwise Ks histograms between I. odaesanensis and each of the other species. (b) All-to-all triangular heatmap of the modal values of the pairwise Ks distributions.

https://doi.org/10.1371/journal.pone.0241178.g001

Species tree reconstruction using the Bayesian method with 57 chloroplast genes

To infer a reliable pattern of Iris speciation, we applied Bayesian inference (BI) with the BEAST software package [17]. Using 57 intact single-copy plastid genes predicted from each genome assembly, we built a species tree comprising four distinct clades (Fig 2). Posterior probabilities at the branching nodes ranged from 0.9 to 1.0, indicating strong support, and all clades diverged from the outgroup, S. angustifolium. Clade I consists of I. gatesii (Subgenus Iris, Section Oncocyclus), together with Iris domestica and Iris dichotoma. Clades II, III, and IV represent Subgenus Limniris. Clade II comprises Iris ensata (Subgenus Limniris, Section Limniris, Series Laevigatae), Iris pseudacorus (Subgenus Limniris, Section Limniris, Series Laevigatae), Iris setosa (Subgenus Limniris, Section Limniris, Series Tripetalae), I. laevigata (Subgenus Limniris, Section Limniris, Series Laevigatae), and I. sanguinea (Subgenus Limniris, Section Limniris, Series Sibiricae). Clade III contains I. ruthenica (Subgenus Limniris, Section Limniris, Series Ruthenicae), I. uniflora (Subgenus Limniris, Section Limniris, Series Ruthenicae), and Iris lactea (Subgenus Limniris, Section Limniris, Series Ensatae). Clade IV comprises Series Chinenses and includes I. koreana (Subgenus Limniris, Section Limniris, Series Chinenses), I. minutoaurea (Subgenus Limniris, Section Limniris, Series Chinenses), I. odaesanensis (Subgenus Limniris, Section Limniris, Series Chinenses), and I. rossii (Subgenus Limniris, Section Limniris, Series Chinenses).


Fig 2. Bayesian inference phylogenetic tree generated using 57 intact single-copy plastid protein sequences that were predicted from each plastid genome assembly.

Species shown with pictures are Korean-native Iris. Branch colours and values correspond to posterior probabilities.

https://doi.org/10.1371/journal.pone.0241178.g002

Identification of plastid marker sequences to facilitate construction of phylogenetic trees

To select sub-genomic regions for this analysis, we applied Tajima's D test, which evaluates whether the polymorphism observed in a genomic region is consistent with neutral evolution. Whole plastid genomes from our study were aligned using Cactus software [18]. Well-aligned sub-genomic regions were then collected, and both nucleotide diversity (pi) and Tajima's D were calculated (Fig 3a). In principle, Tajima's D can statistically detect non-neutral evolutionary processes, including various types of selection [19]. We detected five well-aligned sub-genomic regions with Tajima's D > -0.5 (Fig 3a). This threshold is more stringent than the Tajima's D of -0.9 obtained from the matK sequence alignment of 117 Iris accessions (S2 Table). The selected sub-genomic regions were 998 bp in total length (S3 Table) and contained 124 segregating sites, excluding gaps. We then applied hierarchical clustering to the genotype matrix of segregating sites across the 17 taxa (Fig 3b). Notably, the dendrogram from this clustering analysis was consistent with the BI species tree. A maximum likelihood (ML) tree with 1,000 bootstrap replicates, inferred from the same genotype matrix, also classified the Iris species consistently with the BI tree (Fig 3c). This indicates that the 124 selected segregating sites are informative enough to recapitulate the BI phylogenetic tree.
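Tajima's D [19] compares the mean number of pairwise differences between sequences with the number of segregating sites divided by the harmonic number a1. The sketch below is a standalone re-implementation of Tajima's (1989) formulas. It is not the DendroPy code used in the study. Skipping every column that contains a gap or ambiguous base is an assumption meant to mirror the "excluding gaps" rule above, and the function name `tajimas_d` is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(alignment):
    """Return (pi per site, S, D) for equal-length aligned DNA sequences."""
    n = len(alignment)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # Drop columns with gaps or ambiguous bases (assumption, see lead-in).
    columns = [col for col in zip(*(s.upper() for s in alignment))
               if set(col) <= set("ACGT")]
    if not columns:
        raise ValueError("no usable alignment columns")
    segregating = [col for col in columns if len(set(col)) > 1]
    S = len(segregating)
    # k: mean number of pairwise differences between sequences.
    pairs = list(combinations(range(n), 2))
    k = sum(col[i] != col[j] for col in segregating for i, j in pairs) / len(pairs)
    pi = k / len(columns)
    if S == 0:
        return pi, S, float("nan")  # D is undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    D = (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return pi, S, D
```

A region passes the threshold used above when the returned D exceeds -0.5. An excess of rare variants, such as singletons introduced by a divergent outgroup, lowers k relative to S / a1 and pushes D negative.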


Fig 3. Plastid sub-genome selection by Tajima’s D statistics.

(a) Distribution of Tajima's D across the whole plastid genome, showing few regions with values higher than -0.5. The upper panel shows diversity (pi) values, and the lower panel shows Tajima's D values. (b) Hierarchical clustering of samples (rows) and polymorphic sites (columns) in the matrix of plastid sub-genomic regions selected by Tajima's D. (c) Maximum likelihood phylogenetic tree with 1,000 bootstrap replicates generated from the genotype matrix of the plastid sub-genomic regions selected by Tajima's D.

https://doi.org/10.1371/journal.pone.0241178.g003

Discussion

Using the whole-plastid genome assemblies of 14 native Korean Iris species determined in our study, together with the previously published I. gatesii and I. sanguinea plastid genomes [20, 21], we calculated pairwise Ks values and constructed a BI phylogenetic tree from 57 non-redundant plastid genes to examine speciation among Korean-native Iris. In our BI phylogenetic tree, species of Series Chinenses formed a single clade (Fig 2), although they did not cluster together in the psbA-trnH- and trnL-F-based phylogenetic trees of a previous study [7]. In addition, I. gatesii, I. dichotoma, and I. domestica clustered into a single clade. Until recently, I. domestica, known as the blackberry lily, was placed in the separate genus Belamcanda, as Belamcanda domestica, because of morphological features not found in most Iris species, such as subequal tepals and ligulate style branches [22]. However, recent molecular studies using matK [22], trnL-trnF, and plastid intergenic regions [23] clearly showed that I. domestica is nested within the genus Iris and is closely related to I. dichotoma. I. dichotoma also has unique morphological characteristics and has therefore been classified in a separate subgenus, section, or subsection of Iris, or alternatively proposed as a member of the distinct genus Pardanthopsis [24–26]. Contrary to these separate classifications, our BI tree showed that I. domestica and I. dichotoma are nested within the genus Iris and are phylogenetically closely related, with a short branch length (0.0014) and a BI posterior probability of 1.0. Our plastid genome data therefore show a close phylogenetic relationship between I. dichotoma and I. domestica and support the transfer of Belamcanda domestica into Iris.

To construct a reliable species tree, it is important to select informative genomic regions that can classify query samples into the correct clade. The entire set of single-copy genes can achieve this, but the required analyses are costly in practice. On the premise that our BI phylogenetic tree reliably represents relationships within the genus Iris, we selected a subset of genomic regions that recapitulate the BI tree topology, using speciation signals quantified with Tajima's D statistics. Tajima's D was originally developed to test for selection using polymorphism within a single species [19]. Even so, the set of regions with notably high Tajima's D values recapitulated the topology of the BI phylogenetic tree. Nevertheless, applying the test to interspecific alignments violates its original assumptions, so the selected regions cannot be generalised as evolutionarily neutral. Rather, we propose that the Tajima's D distribution can capture genomic regions that preserve speciation signals within alignment blocks of highly conserved chloroplast sequences. For the matK DNA barcode, our collection of 117 Iris samples showed a Tajima's D value of -0.9. Using a slightly more conservative threshold (Tajima's D > -0.5), we then calculated the Tajima's D distribution across our whole Iris plastid genomes after multiple sequence alignment with the outgroup, S. angustifolium. Including an outgroup introduces many rare alleles and increases the number of segregating sites in the alignments. Compared with the distribution without the outgroup, this shifts the overall Tajima's D distribution towards negative values, which are conventionally interpreted as a signature of selection. As expected, most Tajima's D values were lower than -1 (Fig 3a).
Only a few regions had Tajima's D values above this threshold, and these were selected as candidate plastid genomic regions that may retain Iris speciation signals. Notably, the phylogenetic tree constructed from concatenated sequences of these candidate regions recapitulated the topology of the BI phylogenetic tree. Genes near the candidate regions include matK, psbI, atpA, ycf3, ndhD, and psaC. Interestingly, matK and the noncoding psbK–psbI spacer have both been evaluated as candidate plant DNA barcode markers [27].
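The step that concatenates the selected regions into a genotype matrix might look like the following sketch. It assumes that each aligned block already has a Tajima's D value and that every block contains the same taxa. The block structure, the default threshold of -0.5 taken from the Results, and the function name `concatenate_selected_sites` are illustrative assumptions, not the authors' pipeline.

```python
def concatenate_selected_sites(blocks, threshold=-0.5):
    """Build per-taxon genotype strings from blocks with Tajima's D > threshold.

    blocks: list of (tajimas_d, {taxon: aligned_sequence}) pairs; every block
    is assumed to contain the same taxa (hypothetical input layout).
    """
    kept = [aln for d, aln in blocks if d > threshold]
    if not kept:
        return {}
    taxa = sorted(kept[0])
    genotypes = {taxon: [] for taxon in taxa}
    for aln in kept:
        for column in zip(*(aln[taxon].upper() for taxon in taxa)):
            # Skip gapped or ambiguous columns, keep only segregating sites.
            if "-" in column or "N" in column:
                continue
            if len(set(column)) > 1:
                for taxon, base in zip(taxa, column):
                    genotypes[taxon].append(base)
    return {taxon: "".join(bases) for taxon, bases in genotypes.items()}
```

The resulting strings form a taxa × segregating-sites matrix. Hierarchical clustering or an ML tree can then be run on that matrix, as in Fig 3b and 3c.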

Our analysis also confirmed problematic species complexes using pairwise Ks values of the genes identified in the plastid genomes. The pairwise Ks distributions indicated very close relationships between I. koreana and I. minutoaurea and between I. ruthenica and I. uniflora (Fig 1b). These species are difficult to distinguish because they lack distinct morphological features and because hybrids between them are suspected [28, 29]. In addition to their low Ks values, these species did not form separate clades in the matK-based phylogenetic tree of 117 Iris accessions (S1 Fig). A previous phylogenetic study of Korean Iris species using partial plastid DNA sequences, such as psbA-trnH and trnL-F, also found that the relationship between I. minutoaurea and I. koreana was unclear and needed to be clarified with diverse genetic markers [7]. Our results indicate that the delineation of these species complexes remains unclear and requires further examination.

In summary, we assembled the plastid genomes of 14 Korean-native Iris species and performed Ks value-based classification. Using a BI phylogeny inferred from alignments of 57 predicted plastid proteins, we also suggest ways to resolve classification ambiguities within the genus Iris, and we identify representative plastid genomic regions that may be informative for cost-efficient classification. These findings provide a valuable resource for determining phylogenetic relationships within the genus Iris and can be further used for the identification and protection of endangered Iris species.

Methods

Plant materials

Collection information for the species used in this study is shown in Table 4. The voucher specimens are deposited in the herbaria of the Korean National Institute of Biological Resources (KB) and Daegu University (DGU). Young leaves were collected from plants, dried in silica gel, and stored at -80°C until use.


Table 4. Species collected in this study.

https://doi.org/10.1371/journal.pone.0241178.t004

DNA extraction and sequencing

Total DNA was extracted from leaves using the DNeasy Plant Mini Kit (QIAGEN, Hilden, Germany) and purified with the HiGen Gel & PCR Purification Kit (Biofact Inc., Daejeon, Korea). Extracted DNA was sequenced on the Illumina NGS platform and by conventional Sanger sequencing (Table 1). Sanger sequencing was performed as previously described [30]. DNA fragments were sequenced by genome walking using the ABI Prism BigDye Terminator Cycle Sequencing Kit, ver. 3.0 (QIAGEN) and an ABI 3700 Analyzer (Applied Biosystems, Foster City, CA). Chromatograms and alignments were visually checked and verified using Sequencer 5.0 (Gene Codes Corporation, Ann Arbor, MI, USA). For Illumina sequencing, approximately 4.4–6.9 million reads were generated for each Iris species on the MiSeq platform. Approximately 3.5–5.5 million high-quality reads (about 80% of the raw reads, 0.9–1.5 Gb), obtained with the quality_trim method (minimum quality score of 20) in the CLC Assembly Cell package, were used for plastid genome assembly. Plastid genome-associated reads were extracted and assembled into full plastid genomes using CLC Genome Assembler, ver. 4.06 beta (CLC Inc, Aarhus, Denmark), with manual inspection, yielding genomes of 150–153 kb (Table 1). Genes, rRNAs, and tRNAs in the assembled plastid genomes were annotated with GeSeq [31] and tRNAscan-SE [32]. The annotations were curated to meet NCBI submission criteria as follows: 1) sequence positions producing internal stop codons were changed to 'N', and 2) genes missing a start and/or stop codon were removed.
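The two curation rules can be sketched in Python as follows. This is a hypothetical helper, not the authors' script. It assumes the bacterial and plastid genetic code (NCBI translation table 11), with TAA, TAG, and TGA as stop codons. Accepting only ATG and GTG as start codons is a simplifying assumption. The helper also drops CDS whose length is not a multiple of three, which matches the problem described in the Results.

```python
# Stop codons of NCBI translation table 11 (bacterial, archaeal and plant plastid).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def curate_cds(cds, start_codons=("ATG", "GTG")):
    """Return the curated CDS, or None if the gene should be removed."""
    cds = cds.upper()
    # Frame check: CDS whose length is not a multiple of three are dropped.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return None
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Rule 2: genes missing a start and/or stop codon are removed.
    if codons[0] not in start_codons or codons[-1] not in STOP_CODONS:
        return None
    # Rule 1: internal stop codons are masked with 'N'.
    inner = ["NNN" if codon in STOP_CODONS else codon for codon in codons[1:-1]]
    return codons[0] + "".join(inner) + codons[-1]
```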

Bayesian tree construction

Genes predicted in the plastid genomes were filtered using two criteria: 1) they must be present in only one copy, and 2) they must not contain any 'N's. Applying these criteria, 57 plastid genes were extracted from each of the 17 taxa. The encoded protein sequences were aligned using PRANK [33], and the protein alignments of the 57 gene products were imported into BEAUti to prepare the BEAST input. The Bayesian species tree was constructed with BEAST ver. 1.10.4 [17], starting from a random tree. Two independent Markov chain Monte Carlo (MCMC) runs of 50 million generations each were performed, sampling every 5,000 generations. An uncorrelated relaxed-clock model with lognormally distributed rates was used. To select the protein evolution model, we ran ProtTest on the alignments and selected the plastid-specific cpREV model [34].
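The two gene-filtering criteria can be expressed as a short sketch. It assumes that annotations are available as (gene name, sequence) pairs for each taxon, and it keeps only genes that pass both criteria in every taxon. The data layout and the function name `single_copy_genes` are hypothetical.

```python
from collections import Counter


def single_copy_genes(annotations):
    """Return genes present exactly once and free of 'N' in every taxon.

    annotations: {taxon: [(gene_name, sequence), ...]} (hypothetical layout).
    """
    shared = None
    for genes in annotations.values():
        copies = Counter(name for name, _ in genes)
        passing = {name for name, seq in genes
                   if copies[name] == 1 and "N" not in seq.upper()}
        # A gene must pass in every taxon, so intersect across taxa.
        shared = passing if shared is None else shared & passing
    return sorted(shared or set())
```

Genes duplicated in the inverted repeat, such as ndhB in the example below, fail the single-copy criterion. Any gene carrying masked 'N' positions is also excluded.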

Phylogenetic analysis

To assess departures from neutrality in each plastid gene, Tajima's D statistics were calculated with DendroPy [35]. The coding sequences of our 57 unique single-copy plastid genes were extracted and aligned using PRANK with the '-codon' option [33]. The resulting FASTA alignment files were passed to the tajimas_d function of the dendropy.calculate.popgenstat module with ignore_uncertain = True. Pairwise alignments were generated with PRANK, and Ks values were calculated with KaKs_Calculator [36]. The phylogenetic tree of the selected plastid genomic regions was inferred by the maximum likelihood method under the Tamura-Nei model [37], using the MEGA X software package [38].
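MEGA X performs the ML inference itself. The sketch below only illustrates the Tamura-Nei (TN93) distance that underlies the substitution model [37], computed for one pair of aligned sequences. Pooling base frequencies from both sequences and skipping sites with gaps or ambiguous bases are simplifying assumptions, and the function name `tn93_distance` is hypothetical.

```python
import math


def tn93_distance(seq1, seq2):
    """Tamura-Nei (1993) distance between two aligned DNA sequences."""
    # Keep only sites where both sequences carry an unambiguous base.
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(sites)
    if n == 0:
        raise ValueError("no comparable sites")
    # Base frequencies pooled over both sequences (simplifying assumption).
    counts = {base: 0 for base in "ACGT"}
    for a, b in sites:
        counts[a] += 1
        counts[b] += 1
    gA, gC, gG, gT = (counts[base] / (2 * n) for base in "ACGT")
    if min(gA, gC, gG, gT) == 0:
        raise ValueError("all four bases must be present")
    gR, gY = gA + gG, gC + gT
    purines = {"A", "G"}
    # Proportions of A<->G transitions, C<->T transitions and transversions.
    P1 = sum(1 for a, b in sites if {a, b} == {"A", "G"}) / n
    P2 = sum(1 for a, b in sites if {a, b} == {"C", "T"}) / n
    Q = sum(1 for a, b in sites if (a in purines) != (b in purines)) / n
    k1 = 2 * gA * gG / gR
    k2 = 2 * gT * gC / gY
    k3 = 2 * (gR * gY - gA * gG * gY / gR - gT * gC * gR / gY)
    # math.log raises ValueError for saturated (highly divergent) pairs.
    return (-k1 * math.log(1 - P1 / k1 - Q / (2 * gR))
            - k2 * math.log(1 - P2 / k2 - Q / (2 * gY))
            - k3 * math.log(1 - Q / (2 * gR * gY)))
```

Each logarithmic term is at least as large as its first-order approximation, and those approximations sum to the uncorrected p-distance. The TN93 distance is therefore never smaller than the raw proportion of differing sites.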

Supporting information

S1 Fig. Phylogenetic tree of 117 Iris accessions based on the matK DNA barcode.

https://doi.org/10.1371/journal.pone.0241178.s001

(PDF)

S1 Table. The 83 predicted genes in each assembled CP genome (including incomplete genes).

https://doi.org/10.1371/journal.pone.0241178.s002

(XLSX)

S2 Table. Segregating sites in the matK region of 117 Iris accessions.

https://doi.org/10.1371/journal.pone.0241178.s003

(XLSX)

S3 Table. CP sub-genomic regions with less selection pressure.

https://doi.org/10.1371/journal.pone.0241178.s004

(XLSX)

References

1. Wang H, Cui Y, Zhao C. Flavonoids of the genus Iris (Iridaceae). Mini Rev Med Chem. 2010;10: 643–661. pmid:20500154
2. Wheelwright NT, Begin E, Ellwanger C, Taylor SH, Stone JL. Minimal loss of genetic diversity and no inbreeding depression in blueflag iris (Iris versicolor) on islands in the Bay of Fundy. Botany. 2016;94: 543–554.
3. Lim KY, Matyasek R, Kovarik A, Leitch A. Parental Origin and Genome Evolution in the Allopolyploid Iris versicolor. Ann Bot. 2007;100: 219–224. pmid:17591610
4. Artiukova EV, Kozyrenko MM, Iliushko MV, Zhuravlev IN, Reunova GD. [Genetic variability of Iris setosa]. Mol Biol. 2001;35: 152–156.
5. Wilson CA. Phylogeny of Iris based on chloroplast matK gene and trnK intron sequence data. Mol Phylogenet Evol. 2004;33: 402–412. pmid:15336674
6. Guo J, Wilson CA. Molecular Phylogeny of Crested Iris Based on Five Plastid Markers (Iridaceae). Syst Bot. 2013;38: 987–995.
7. Lee HJ, Park SJ. A phylogenetic study of Korean Iris L. based on plastid DNA (psbA-trnH, trnL-F) sequences. Sigmul Bunryu Hag-hoeji. 2013 Sep;43. http://agris.fao.org/agris-search/search.do?recordID=KR2015003937
8. de Abreu NL, Alves RJV, Cardoso SRS, Bertrand YJK, Sousa F, Hall CF, et al. The use of chloroplast genome sequences to solve phylogenetic incongruences in Polystachya Hook (Orchidaceae Juss). PeerJ. 2018;6: e4916. pmid:29922511
9. Wheeler AS, Wilson CA. Exploring Phylogenetic Relationships within a Broadly Distributed Northern Hemisphere Group of Semi-Aquatic Iris Species (Iridaceae). Syst Bot. 2014;39: 759–766.
10. Bi Y, Zhang M-F, Xue J, Dong R, Du Y-P, Zhang X-H. Chloroplast genomic resources for phylogeny and DNA barcoding: a case study on Fritillaria. Sci Rep. 2018;8: 1184. pmid:29352182
11. Yang Z, Zhao T, Ma Q, Liang L, Wang G. Comparative Genomics and Phylogenetic Analysis Revealed the Chloroplast Genome Variation and Interspecific Relationships of Corylus (Betulaceae) Species. Front Plant Sci. 2018;9: 927. pmid:30038632
12. Jansen RK, Cai Z, Raubeson LA, Daniell H, Depamphilis CW, Leebens-Mack J, et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci U S A. 2007;104: 19369–19374. pmid:18048330
13. Li P, Zhang S, Li F, Zhang S, Zhang H, Wang X, et al. A Phylogenetic Analysis of Chloroplast Genomes Elucidates the Relationships of the Six Economically Important Brassica Species Comprising the Triangle of U. Front Plant Sci. 2017;8: 111. pmid:28210266
14. Choi KS, Kwak M, Lee B, Park S. Complete chloroplast genome of Tetragonia tetragonioides: Molecular phylogenetic relationships and evolution in Caryophyllales. PLoS One. 2018;13: e0199626. pmid:29933404
15. Shaw J, Shafer HL, Leonard OR, Kovach MJ, Schorr M, Morris AB. Chloroplast DNA sequence utility for the lowest phylogenetic and phylogeographic inferences in angiosperms: the tortoise and the hare IV. Am J Bot. 2014;101: 1987–2004. pmid:25366863
16. Wolfe KH, Gouy M, Yang YW, Sharp PM, Li WH. Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci U S A. 1989;86: 6201–6205. pmid:2762323
17. Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 2012;29: 1969–1973. pmid:22367748
18. Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011;21: 1512–1528. pmid:21665927
19. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123: 585–595. pmid:2513255
20. Lee H-J, Nam G-H, Kim K, Lim CE, Yeo J-H, Kim S. The complete chloroplast genome sequences of Iris sanguinea donn ex Hornem. Mitochondrial DNA A DNA Mapp Seq Anal. 2017;28: 15–16. pmid:26641138
21. Wilson CA. The Complete Plastid Genome Sequence of Iris gatesii (Section Oncocyclus), a Bearded Species from Southeastern Turkey. Aliso: A Journal of Systematic and Evolutionary Botany. 2014;32: 47–54.
22. Goldblatt P, Mabberley DJ. Belamcanda Included in Iris, and the New Combination I. domestica (Iridaceae: Irideae). Novon St Louis Mo. 2005;15: 128–132.
23. Tillie N, Chase MW, Hall T. Molecular studies in the genus Iris L.: a preliminary study. Annali di Botanica. 2000;58.
24. Sim JK. “Iridaceae” in The Genera of Vascular Plants of Korea. Flora of Korea Editorial Committee., editor. Seoul: Academy Publishing Co.; 2007. pp. 1326–1331.
25. Mathew B. The Iris. Universe Books, New York; 1981.
26. Waddick JW, Zhao YT. Iris of China. Timber Press, United States; 1992.
27. CBOL Plant Working Group. A DNA barcode for land plants. Proc Natl Acad Sci U S A. 2009;106: 12794–12797. pmid:19666622
28. Zhao YT, Noltie HJ, Mathew B. Iridaceae. Flora of China. 2000;24: 297–313.
29. Son O, Son S-W, Suh G-U, Park S. Natural hybridization of Iris species in Mt. Palgong-san, Korea. Sigmul Bunryu Hag-hoeji. 2015;45: 243–253.
30. Park J, Shim J, Won H, Lee J. Plastid genome of Aster altaicus var. uchiyamae Kitam., an endanger species of Korean asterids. Journal of Species Research. 2017;6: 76–90.
31. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq—versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45: W6–W11. pmid:28486635
32. Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005;33: W686–W689. pmid:15980563
33. Löytynoja A. Phylogeny-aware alignment with PRANK. Methods Mol Biol. 2014;1079: 155–170. pmid:24170401
34. Adachi J, Waddell PJ, Martin W, Hasegawa M. Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA. J Mol Evol. 2000;50: 348–358. pmid:10795826
35. Sukumaran J, Holder MT. DendroPy: a Python library for phylogenetic computing. Bioinformatics. 2010;26: 1569–1571. pmid:20421198
36. Wang D, Zhang Y, Zhang Z, Zhu J, Yu J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics. 2010;8: 77–80. pmid:20451164
37. Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993;10: 512–526. pmid:8336541
38. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018;35: 1547–1549. pmid:29722887

Supporting information

S1 Fig. Phylogenetic tree of the collection of 117 Iris species based on the matK DNA barcode.

S1 Table. The 83 predicted genes in each assembled chloroplast (CP) genome, including incomplete genes.

S2 Table. Segregating sites in the matK region across 117 Iris species.

S3 Table. CP sub-genomic regions under weaker selection pressure.
