Comparative Chloroplast Genomes of Photosynthetic Orchids: Insights into Evolution of the Orchidaceae and Development of Molecular Markers for Phylogenetic Applications

The orchid family Orchidaceae is one of the largest angiosperm families, including many species of important economic value. While chloroplast genomes are very informative for systematics and species identification, there is very limited information available on chloroplast genomes in the Orchidaceae. Here, we report the complete chloroplast genomes of the medicinal plant Dendrobium officinale and the ornamental orchid Cypripedium macranthos, demonstrating their gene content and order and potential RNA editing sites. The chloroplast genomes of the above two species and five known photosynthetic orchids showed similarities in structure as well as gene order and content, but differences in the organization of the inverted repeat/small single-copy junction and ndh genes. The organization of the inverted repeat/small single-copy junctions in the chloroplast genomes of these orchids was classified into four types; we propose that inverted repeats flanking the small single-copy region underwent expansion or contraction among Orchidaceae. The AT-rich regions of the ycf1 gene in orchids could be linked to the recombination of inverted repeat/small single-copy junctions. Relative species in orchids displayed similar patterns of variation in ndh gene contents. Furthermore, fifteen highly divergent protein-coding genes were identified, which are useful for phylogenetic analyses in orchids. To test the efficiency of these genes serving as markers in phylogenetic analyses, coding regions of four genes (accD, ccsA, matK, and ycf1) were used as a case study to construct phylogenetic trees in the subfamily Epidendroideae. High support was obtained for placement of previously unlocated subtribes Collabiinae and Dendrobiinae in the subfamily Epidendroideae. Our findings expand understanding of the diversity of orchid chloroplast genomes and provide a reference for study of the molecular systematics of this family.


Introduction
The orchid family Orchidaceae is one of the two largest families of flowering plants, with over 25,000 species [1] and five recognized subfamilies (Apostasioideae, Cypripedioideae, Epidendroideae, Orchidoideae, and Vanilloideae) [2]. A large number of orchids have significant economic value [3]. For example, some cultivars have been used as cut flowers or potted plants, while others can be utilized as food or medicine because of their nutritious or medical efficacy. Overexploitation and habitat destruction have threatened the survival of many wild orchid species. At the same time, numerous cultivated varieties and crossbreeds have been developed worldwide. Therefore, molecular information on orchids is of interest not only for the study of systematics, but also for species conservation and flower cultivation.
Epidendroideae is the largest of the five orchid subfamilies and includes approximately 20,000 species [1,2]. Several perspectives on its classification have long been debated [2,[4][5][6][7][8][9][10][11][12][13][14][15][16]. Burns-Balogh and Funk (1986) have reviewed previous morphological classification systems of Orchidaceae. Freudenstein and Rasmussen (1999) pointed out that most of the previously established classifications have a highly developed tribal and subtribal classification within the Epidendroideae; they first performed a cladistic analysis of Orchidaceae. However, major groups of genera are equivalent to subfamilial groups, and the detailed classifications at the tribal level are not well supported by morphological and anatomical features in most cases [12,16]. In recent years, molecular data has been used in phylogenetic studies, but some relationships among subtribes or tribes remain questionable. Major disputes were focused on whether some tribes or subtribes were monophyletic, polyphyletic, or paraphyletic; which tribe or subtribe was the most basal; and the locations of Agrostophyllinae, Collabiinae and Dendrobiinae. Limited sampling with few variable loci in most of the common DNA regions has impeded reasonable and robust estimates of phylogenetic patterns. Recent comparative chloroplast (cp) genomics has provided large quantities of data that are useful for selecting pertinent markers to resolve obscure phylogenetic relationships in seed plants [17][18][19][20][21]. However, cp genome information is still limited for the Epidendroideae.
In most land plants, the cp genome is a single circular molecule of 120-220 kb that consists of one large single-copy (LSC) region, one small single-copy (SSC) region, and a pair of inverted repeats (IRs). Although gene organization and content are conserved in cp genomes of higher plants, their genome sizes are diverse and depend largely on the extent of gene duplication, small repeats, and the size of intergenic spacers [22]. The information on sequence insertion or deletion, transition or transversion, and nucleotide repeats may help to clarify evolutionary relationships [23][24][25][26][27]. To date, cp genomes from seven orchid genera (Corallorhiza, Cymbidium, Erycina, Neottia, Oncidium, Phalaenopsis, and Rhizanthella) have been sequenced. The former six genera belong to the subfamily Epidendroideae, whereas the last one falls into the subfamily Orchidoideae. All species in these seven genera are photosynthetic orchids except Rhizanthella gardneri, Corallorhiza striata, and Neottia nidus-avis being nonphotosynthetic plants [17,[28][29][30][31][32][33][34]. However, the cp genome of the Dendrobiinae, the largest and most economically important subtribe in the Epidendroideae, has not yet been sequenced.
Dendrobium officinale Kimura et Migo, a perennial epiphytic herb of the Dendrobiinae, is endemic in moderately damp mountains in China [35]. The stems of D. officinale have been widely used as a traditional Chinese medicine (TCM) called ''Tiepi Fendou.'' The efficacious compounds in D. officinale include phenols, alkaloids, coumarins, and polysaccharides [36]; and its medical benefits include stimulation of saliva, improvement in eyesight, warming of the stomach, enhancement of immunity, and inhibition of tumor growth [36]. As a result of its habitat shrinking and human overexploitation, natural populations of D. officinale have been progressively destroyed, and in 1992 it was classified as an endangered species in the Chinese Plant Red Book [37]. D. officinale has recently been rescued by tissue culture in southern China.
The subfamily Cypripedioideae comprises approximately 155 species in five genera [2]. All Cypripedioideae species have special flowers with a saccate lip, two fertile stamens, a shield-like staminode, and a synsepal composed of fused lateral sepals [38]. Because of its attractive morphological characteristics, this subfamily has been investigated widely in theoretical and applied research. Nonetheless, molecular information on this subfamily is still limited. Cypripedium macranthos Sw. is a terrestrial herbaceous plant in the subfamily and naturally distributed in East Asia [39]. Because of the commercial value of its pretty red or pink flowers, it has been cultivated as a potted and garden plant.
In this study, we sequenced the complete cp genomes of D. officinale and C. macranthos using a next-generation sequencing (NGS) approach. Our objectives were to deepen understanding of the structural diversity of orchid cp genomes and to provide information for resolving uncertain relationships within the Epidendroideae. The cp genomes of seven photosynthetic orchid species (C. macranthos, Cymbidium mannii, D. officinale, Erycina pusilla, Oncidium Gower Ramsey, Phalaenopsis aphrodite, and Phalaenopsis equestris) were compared to elucidate the diversity of gene order, gene content, and genome structure among them. Four regions were filtered according to the sequence divergence of proteincoding genes, and 56 taxa from 36 genera were used as a case study to determine phylogenetic relationships within the Epidendroideae.

Materials and Methods
Chloroplast DNA extraction and genome sequencing, assembly, and PCR-based validation This study was approved by the Ethics Committee of Forestry Bureau of Zhejiang Province and Nanjing Normal University, China. We collected seeds of D. officinale from Yandang experimental base of Zhejiang Branch, College of Life Sciences, Nanjing Normal University.
Young leaves of D. officinale were taken from 6-month-old seedlings grown in a greenhouse. Intact chloroplasts were isolated using the Percoll gradient method (22-45%) [40]. Purified chloroplast DNA was extracted according to the 26 CTAB protocol [41]. Fresh leaves of C. macranthos were collected from Yunnan Province, China. Total DNA was extracted using a Qiagen DNeasy plant mini kit (Qiagen, Germany). DNA concentration and quality were determined using a NanoDrop 8000 Spectrophotometer (Thermo Scientific, Wilmington, DE). High quality DNA (concentration .300 ng/ml, A260/280 ratio = 1.8-2.0 and A260/230 ratio.1.7) was used for sequencing.
Purified DNA was fragmented and used to construct short-insert libraries (insert size,500 bp) according to the manufacturer's instructions (Illumina). The short fragments were sequenced using an Illumina Hiseq 2000 sequencing system [42].
The raw reads for D. officinale were trimmed with error probability ,0.001 and assembled using SOAPdenovo version 1.05 with default parameters [43]. The de Bruijn graph approach was applied to assembly with an optimal K-mer size of 79. The contigs shorter than 200 bp were removed. Then the paired-end information was used to join the contigs into scaffolds with the cp genome of P. aphrodite (Accession Number: AY916449) as a reference. Gaps among scaffolds were filled using paired-end extracted reads.
The short reads for C. macranthos were trimmed with error probability ,0.05 and assembled using CLC Genomic Workbench 6.0.1 (CLC Bio, Aarhus, Denmark). The contigs shorter than 200 bp were discarded; others were compared with plant cp genomes in the National Center for Biotechnology Information (NCBI) using BLAST (http://blast.ncbi.nlm.nih.gov) searches. Contigs matching referenced genomes with E values ,10 25 were selected for annotation.
Based on the reference genomes in Orchidaceae [28,29], gaps and four junction regions between LSC/SSC and IRs were confirmed by PCR amplification and Sanger sequencing using the primers listed in Table S1.

Analyses of RNA editing sites
Thirty protein-coding genes of D. officinale and C. macranthos cp genomes were used to predict potential RNA editing sites using the online program Predictive RNA Editor for Plants (PREP) suite (http://prep.unl.edu/) [48] with a cutoff value of 0.8.

Phylogenomic analyses
Sixty-three common protein-coding genes were extracted from 10 cp genomes. Seven photosynthetic orchid species were involved in analyses with Calamus caryotoides, Phoenix dactylifera, and Typha latifolia as outgroups. The GenBank accession numbers of all taxa are shown in Table S2. The accD, infA, rps16, rps19, ycf1, and ndh genes were not included in the data set because they were pseudogenized in some cp genomes. Alignments were performed using the MUSCLE program in Mega 5.03 [49], without including gaps, and start and stop codons. The aligned sequences were concatenated and used for phylogenetic reconstruction.
The ML tree was constructed by means of GTR+G model with raxmlGUI version 1.2 (http://sourceforge.net/projects/raxmlgui/) [50] and a rapid bootstrap value of 1,000. A Bayesian inference (BI) tree was constructed using CAT model with PhyloBayes version 3.2 [51]. Two Independent MCMC chains were run. The first 25% of the cycles were removed as burn-in, and convergence of three chains was checked on the basis of maxdiff ,0.3 by following the PhyloBayes manual.

Sequence divergence of protein-coding genes
To obtain suitable markers for phylogenetic analysis within subfamilies, complete cp genomes of six orchid species (C. macranthos, C. mannii, D. officinale, E. pusilla, O. Gower Ramsey, and P. aphrodite) were applied. The average pairwise distances of nucleotide and protein substitutions for 68 protein-coding genes were estimated using Kimura's two-parameter model and pdistance, respectively, in Mega 5.03 [49].

Phylogenetic application of cp genomes, a case study on the Epidendroideae
We selected the Epidendroideae as an example for phylogenetic analysis. Data sets for four incomplete gene sequences (ycf1, matK, ccsA, and accD) were obtained for 56 taxa from 36 genera. The data matrix included 11 subtribes and one tribe in the Epidendroideae, with C. caryotoides, P. dactylifera, and T. latifolia as outgroups. Six taxa from two additional orchid subfamilies were used as internal checks. Sequences from 10 of the taxa were extracted from complete cp genomes; sequences from the other 46 taxa were obtained by PCR amplification and sequencing of PCR products with an ABI PRISM 3730XL DNA analyzer (Applied Biosystems). Primers for accD and ccsA were designed using Primer Premier version 6 [52] based on homologous sequences from orchid cp genomes (Table S3). All newly generated sequences were deposited in GenBank with accession numbers KF361524-KF361707. Sources of species and GenBank accession numbers are indicated in Table S4. These regions were aligned separately using Mega 5.03 [49] with manual modifications, and gaps were coded as ''-.'' Sequence information was analyzed using Mega 5.03 and DnaSP version 5.0 [53]. The combined matrix was utilized for phylogenetic analyses. Modeltest version 3.7 [54] was employed to select the best nucleotide substitution model under the Akaike Information Criterion (AIC); the GTR+I+G model was chosen as the best fit for our data set. The ML and BI analyses were performed according to the same protocol as that used for phylogenomic analyses.

Sequencing and genome assembly
The raw Illumina paired-end sequencing of D. officinale produced 350 Mb of data. After quality trim, 210 Mb of data remained with an average read length of 80 bp. The subsequent de novo assembly produced 13 scaffolds, 12 of which were .2 Kb and the scaffold N50 size was 84,551 bp. The average coverage depth was 1,4006. These scaffolds were used for the following assembly.
We sequenced 2.5 Gb of Illumina paired-end reads for C. macranthos (average read length of 90 bp). The initial assembly included 12,148 contigs. After compared with plant cp genomes, 41 contigs were obtained with E values,10 25 and mean coverage depth = 266. Four of these contigs were larger than 10 kb with average depth coverage 1296, resulting in a nearly complete draft genome. After assembly and gap closure, two complete chloroplast genomes were obtained.

Characteristics of the chloroplast genomes of Dendrobium officinale and Cypripedium macranthos
The complete cp genomes of D. officinale and C. macranthos were circular, having 152,221 and 157,050 bp, respectively. Similar to other angiosperms, both cp genomes were AT-rich (62.53% and 62.17%, respectively). The D. officinale plastome contained 110 different genes, of which 91 were single-copy genes and 19 were duplicated genes (Fig. 1). Its cp genome consisted of 76 proteincoding genes, 4 rRNA genes, and 30 tRNA genes. C. macranthos encoded 113 different genes (94 single-copy and 19 duplicated genes). The C. macranthos cp genome included 79 protein-coding genes, four rRNA genes, and 30 tRNA genes (Fig. 2). The gene content of the D. officinale cp genome was relatively conserved compared with other known orchid cp genomes. The gene content of the C. macranthos cp genome was also relatively conserved with the exception of the following. A coding sequence (CDS) of infA (coding for translation initiation factor) was interrupted because of a 5-bp deletion (53 bp downstream of the start codon). This gene was lost from Discorea in monocots [55]. At the N terminus of rps19, a surplus nucleotide A in the poly (A) tract interrupted the open reading frame (ORF), causing a frameshift. Furthermore, we recognized rps16 as a pseudogene because a partial intron and the second exon were missing in it.

Potential RNA editing sites
In the present study, potential RNA editing sites were predicted for 30 genes; as a result, a total of 51 RNA editing sites were identified in genes of Cypridium and Dendrobium (Table S5). No potential editing sites were identified in seven genes (petD, petG, petL, psbB, psbE, psbL, and rpl23) in both cp genomes. Of the 51 editing sites, 9 (17.6%) and 42 (82.4%) were located at the first and the second codon position, respectively, in Cypripedium; 8 (15.7%) and 43 (84.3%) were located at the first codon and the second codon position, respectively, in Dendrobium; but no editing sites were found at the third codon position. Just as in other terrestrial plants, the editing types in Cypripedium and Dendrobium were all Cto-U [56][57][58]. The amino acid conversion S to L occurred most frequently, while P to S and R to C occurred least. Thirty-four common RNA editing sites were shared in genes of the two species. We also observed RNA editing (C to U conversion) in the initiation codon of rpl2 transcripts of D. officinale, which is a common phenomenon among angiosperms and has been verified in P. aphrodite and R. gardneri [28,30].

Phylogenomic analyses of the seven orchids
Our phylogenomic construction was based on 63 proteincoding genes of cp genomes, and the aligned data set comprised 47,736 bp. The BI and ML trees had the same topology (Fig. 3), demonstrating that Cypripedium (Cypripedioideae) was sister to the Epidendroideae. In the Epidendroideae, Dendrobium was sister to other species, and Cymbidium and Oncidium-Erycina were sister to Phalaenopsis.  Table 1). Compared with C. macranthos, the other taxa had reduced IR length. The organization of the cp genomes of the Epidendroideae was similar to that of the C. macranthos, except for three sequences: Yycf1-ndhF, ndhC-ndhJ, and ndhD-ndhH. Variations in Yycf1-ndhF sequence were due to reductions in the lengths of Yycf1, ndhF, and Yycf1-ndhF non-coding regions located at the IR B /SSC junction. Variations in ndhC-ndhJ and ndhD-ndhH sequences were caused by pseudogenization or loss of ndh genes.

Comparison of sequences flanking IR/SC junctions in the Orchidaceae
Sequences flanking IR/SC (single copy) junctions vary among cp genomes of different species [59]. Here, we compared sequences flanking IR/SC junctions among seven orchid cp genomes (Fig. 4); all of them were found to have similar structures at the IR/LSC junction. The trnH-rps19 cluster was duplicated and involved in IR. The IR B /LSC junction (J LB ) was located within rpl22 in all seven orchid cp genomes. As a result, a duplicated Yrpl22 was nested within IR A .
On the other side, the orchid chloroplast genomes had distinct characteristics at the IR/SSC junction. In P. Aphrodite, the IR A / SSC junction (J SA ) was located upstream of ycf1, whereas in other species J SA was located within ycf1. Four types of junctions in the orchid cp genomes were characterized on the basis of the organization of genes flanking the IR B /SSC junction (J SB ). Cypripedium and Dendrobium shared type I structure in which J SB was located upstream of the ndhF-rpl32 cluster. Type II junction was found in Cymbidium and was characterized by an overlap between Yycf1 and ndhF, resulting in J SB being located within these two genes. Type III was shown in Oncidium, Erycina, and P. equestris, in which J SB was located inside the Yycf1-rpl32 cluster, with the loss of ndhF gene. The type IV structure was present in P. aphrodite and characterized by the entire incorporation of the entire ycf1 into the SSC, with J SB inside trnN-rpl32.

Chloroplast-encoded ndh genes in seven orchid species
Chloroplast-encoded ndh genes were investigated in C. macranthos and the six photosynthetic Epidendroideae species (Fig. 3). The 11 ndh genes in Cypripedium cp genome were intact, but many ndh genes had either truncations or indels, resulting in frameshifts or pseudogenes in the six Epidendroideae cp genomes. The ndhD gene in all these Epidendroideae species contained indels or stop codons. The characteristics of other ndh genes differed among the genera. In Dendrobium, ndhB was intact; ndhC, I, and K were lost; and ndhF was truncated with two sequence inserts, creating two frameshifts. In the two Phalaenopsis species the ndhA and ndhF genes were absent and the remnants of seven ndh genes became pseudogenes. The ndhE genes in P. equestris and P. aphrodite were lost and incomplete, respectively. The two Oncidiinae species (Erycina and Oncidium) had similar patterns of diversity of ndh genes except ndhA, ndhE, and ndhI. Other varieties within Oncidiinae shared major characteristics of ndh genes in Erycina and Oncidium [29]. In Cymbidium, most of the ndh genes were present in the ORF and remained intact [17].

Sequence divergence of protein-coding genes in the Orchidaceae
The pairwise distances of nucleotide and protein substitutions of 68 protein-coding genes were compared among six orchid species (Table 2). According to the average pairwise distance and numbers of nucleotide substitutions, three genes (rps7, rpl2, and rpl23) located in the IR regions had relatively low mean levels of sequence divergence. The rpl and rps genes in the LSC and SSC regions showed higher evolutionary rates. Fifteen regions with relatively high divergence were identified in rps16, ycf1, matK, rps15, rpl22, ccsA, psaI, rpl32, rpl16, rpl20, atpF, psbK, psbT, accD, and rps8, located in the LSC, SSC, or SSC/IR junction regions. Similar patterns of divergence were also observed at the protein level, with the exception of psbT. Sequence divergence and gene length yielded a sufficient variety of loci (.600 bp); thus, the sequences of accD, ccsA, matK, and ycf1 were identified and used for phylogenetic analyses.

Molecular phylogeny within the Epidendroideae
To determine the availability of the accD, ccsA, matK, and ycf1 sequences for phylogenetic analyses, the Epidendroideae was used as a case study because of the disputes regarding its systematics. Sequences of the four genes were successfully amplified in all 46 taxa. The aligned combined dataset comprised 4,593 characters, of which 2,839 represented variable sites and 1,447 were parsimony-informative sites. The number of variable sites was highest in ycf1 and lowest in ccsA (Table S6).    Table 2. Pairwise distances of nucleotide and protein substitutions among six orchid species.    Phylogenetic analyses using BI and ML approaches resulted in the same topology (Fig. 5). Most nodes had high support among tribes and subtribes within the subfamily Epidendroideae. Within it, Coelogyninae was sister to all the other subtribes or tribes, with strong support (ML BP 100%, BI PP 1.00). The Bulbophyllum group clustered with the Epigeneium and Dendrobium-Flickingeria to form a monophyletic clade of Dendrobiinae that was closely allied to Malaxideae (Laparis and Oberonia). The Dendrobiinae-Malaxideae clade was sister to the rest of the subfamily. Podochilinae and Eriinae were not monophyletic clades; these two subtribes (both of tribe Podochileae) were sister to Collabiinae.

Comparison of RNA editing sites
Involved in plastid posttranscriptional regulation, RNA editing provides an effective way to create transcript and protein diversity [60,61]. Some chloroplast RNA editing sites are conserved in higher plants [62,63]. In Orchidaceae, RNA editing sites were identified in 24 protein-coding transcripts in P. aphrodite [63]. Potential editing sites also were identified in P. equestris and O. Gower Ramsey [33]. Of the examined 30 genes in abovementioned seven orchids, 15 potential RNA editing sites out of 11 genes (atpA, atpF, clpP, matK, petB, psbF, rpl20, rpoA, rpoB, rpoC1 and ycf3) were shared; the number of shared editing sites increased in Epidendroideae species (28 sites out of 16 genes) (Table S5 and [33]). Therefore, RNA editing is more conserved from the same subfamily than which from different subfamily. However, orchids and other angiosperms have relatively less common editing sites. For example, 10 potential RNA editing sites were shared by orchids and Cocos nucifera; comparisons among Nicotiana tabacum, Arabidopsis thaliana, grasses and orchid RNA editing sites showed low conservation of editing sites (only one common editing sites in rpoB)( Table S5). These cases indicate that the evolutionary conservation of RNA editing is essential for only a few plastidediting sites [64][65][66].

IR expansion or contraction in the Orchidaceae
The variability of genes flanking IR/SC junctions results in IR expansion or contraction [59,67]. At the IR/LSC boundaries, most IRs of non-orchid monocots contained trnH-rps19 gene clusters, excluding Yrpl22 genes, leading to more-progressive expansion of IRs than that having occurred in non-monocot angiosperms [17,20,55,59,[68][69][70][71]. In all known photosynthetic orchid cp genomes, trnH-rps19 clusters and Yrpl22 genes were involved in IRs at the IR/LSC junctions. The IR/LSC junctions were the standard type III [71], and IRs experienced the largest expansion at the IR/LSC junction compared with other monocots.
The IR/SSC junction types II-IV in orchids differed from those in other monocots, while type I (in Cypripedium and Dendrobium cp genomes) was similar to that in Acorus (Fig. 4) with ycf1 extending over the J SA andYycf1 located within IR adjacent to the J SB . Although Yang et al. suggested most likely evolutionary routes of IRs in monocots [59], no studies have proposed a model about the evolutionary dynamics of the IR/SSC junctions within orchids. Here, we hypothesize two evolutionary routes to explain the expansion or contraction of IRs adjacent to IR/SSC junctions from an Acorus-like ancestor to the existing orchids. The first route proceeded from type I to type II; ycf1 further expanded into the IR A , resulting in an expansion of duplicated Yycf1 in the IR B . During this period, an overlap occurred between ndhF remnant and Yycf1. On the second route, ycf1 shifted continuously into the SSC, resulting in a shorter, duplicated Yycf1 adjacent to the J SB .
Continually, ycf1 was embedded completely into the SSC, leading to the loss of duplicated Yycf1. This contractive process of IR involved the structural change from type I to type IV via type III. Moreover, IRs expansion or contraction may not correlate with the taxonomic relationships. More molecular data need to be collected for intensifying our understanding of variations in sequences flanking IR/SSC junctions.
The shift of the border between the IR and SSC in orchids was associated with the ycf1 gene. Compared with the average AT content of protein-encoding genes, all known orchid ycf1 genes exhibited usage bias of AT base pairs (see Table 1 and Table S7). AT base pairs are bound by two hydrogen bonds, while GC base pairs are bound by three hydrogen bonds; therefore, DNA with high AT content is less stable than that with low AT content. Poly (A) tract sequences at IR/LSC boundaries might be closely linked with the dynamics of IR/LSC junctions and expansion of IR [67,71]. Similarly, the AT-rich nature of ycf1 gene may be linked to the recombination of IR/SSC junction.
The loss or pseudogenization of ndh genes in orchid chloroplast genomes Instances of gene loss or pseudogenes have been elucidated in the cp genomes of monocots [21]. Chloroplast-encoded gene degeneration in photosynthetic orchids is mostly embodied in structural changes of ndh genes. There are 11 chloroplast-encoded ndh genes in the cp genomes of land plants, located in several transcriptional units and encoding for the thylakoid Ndh complex [72]. Non-functional chloroplast-encoded ndh genes have been found in CAM and C3 plants [32], including gymnosperms and grasses [68,73,74]. Sequence truncations and indels are common phenomena in orchid chloroplast-encoded ndh genes [17,[28][29][30][31][32][33]75]. Pseudogenization or loss of the ndh gene did not correlate well with the divergent patterns of Epidendroideae lineages observed in the phylogenetic trees (Fig. 3). However, 10 common ndh pseudogenes of two Phalaenopsis species showed a high degree of similarities in sequence and indel patterns [33]. Both Erycina and the allied genus Oncidium lost two ndh genes (ndhF and ndhK) and had six pseudogenes (ndhB, C, D, G, and J); similar results were obtained from Oncidium and related Oncidiinae varieties [29]. Thus, we infer that relative species had similar patterns of variation in ndh gene content.
The loss of some chloroplast-encoded genes might not affect the plant life cycle. Gene transfer from chloroplast to nucleus is known to occur frequently during evolutionary processes [76]. The ancestral plastid ndh genes of orchids are presumed to have been transferred to the nucleus [28]. Moreover, fungal symbionts may contribute to the fate of ndh genes [77]. Therefore, the functions of lost chloroplast-encoded ndh genes could be performed by homologous genes from other resources; this hypothesis needs to be tested in the future study.

Phylogenetic relationships based on complete cp genomes
The cp sequences have been used in deep phylogenetic analyses because of their low substitution rates [20,78]. Phylogenetic analyses based on complete chloroplast genomes have resolved some bewildering relationships in angiosperms. Using two tree construction methods with different models, we obtained consistent results on the relationships among Phalaenopsis (Aeridinae), Cymbidium (Cymbidiinae), Dendrobium (Dendrobiinae), Oncidium and Erycina (Oncidiinae) within Epidendroideae, which are congruent with matK and rbcL analyses by Gustafsson et al. (2010) [8] and morphological cladistic analysis by Freudenstein and Rasmussen (1999) [12]; but are inconsistent with the analyses based on nuclear ribosomal internal transcribed spacer (nrITS), matK, rbcL, trnL-F, the trnL intron, and nuclear Xdh gene [4,7]. However, wholegenome sequencing for sparse sampling can result in long-branch artifacts and incorrect evolutionary reconstructions [79]. Therefore, further genomic and taxon sampling will be necessary to resolve the relationships within this subfamily.

Gene divergence based on comparative chloroplast genomes
Variability of genes in cp genomes has been calculated according to nucleotide diversity in previous studies [80,81]. If we considered sequence divergence at the nucleotide and protein levels, rps7, rpl23, rpl2, and ycf2 were conserved with low evolutionary distance, with the exception of rps19, which exhibited medium divergence in the IR regions. These results are consistent with previous reports of slower divergence of sequences in the IR regions compared to other regions [80,82]. Although the ycf2 gene has been demonstrated to be one of the most rapidly evolved genes among 16 vascular plant species [80], the present study showed that it had relatively slow nucleotide divergence and moderate protein divergence within the Orchidaceae.
In this study, highly divergent genes were acquired according to pairwise distance of nucleotide substitutions. While ycf1 was located at the IR/SSC junction, 14 other genes bordered the LSC and SSC regions, four of which were selected to construct phylogenetic trees. Of these, matK and ycf1 have been used in previous studies [4,6], while accD and ccsA were applied for the first time to the phylogenetic analysis of the subfamily Epidendroideae in the present study. These genes can be used as good phylogenetic markers at the subfamily level because of the following three reasons. First, these regions are variable, which highlights their unusual evolutionary properties. According to the pairwise distance of protein substitutions (Table 2), ycf1 and matK have high divergence, and accD and ccsA have relatively moderate substitution rates that are higher than rbcL, which has been used in previous systematic analyses within the Epidendroideae [4,5,75]. Second, these regions are sufficiently long (.600 bp) to yield adequate loci for phylogenetic analysis. Third, the sequences are easily obtained by PCR amplification and relatively conservative for alignment.

Phylogenetic reconstruction of the Epidendroideae
The phylogeny of the Epidendroideae has long been debated. Here, eleven common subtribes and one tribe from Epidendroideae were used as a case study to identify the phylogenetic relationships within this subfamily using four cp sequences. Fig. S1 illustrates the relationships among these subtribes or tribes in previous studies based on molecular data. With polyphyletic and paraphyletic groups excluded from phylogenetic analyses, major debates were the placement of Dendrobiinae, Malaxideae, and Collabiinae, as well as identification of the basal subtribe or tribe. On the basis of a concatenated data set, we clarified several relationships that were previously poorly resolved, and the majority of nodes at the subtribe level in the gene trees had high support.
Previously, the placement of Collabiinae and Dendrobiinae was problematic, but their positions have been recovered. Collabiinae was polyphyletic based on matK and rbcL [15]. Van Den Berg (2005) proposed Collabiinae was in an unfixed position in MP and BI analyses, and Gorniak et al. (2010) posited that Collabiinae was sister to Aeridinae and Eriinae with high support based on nuclear gene Xdh [4,7]. Our results suggest that Collabiinae was sister to the Podochilinae-Eriinae (tribe Podochileae) clade with moderate support; this is congruent with MP analysis of Van Den Berg (2005), which had weak support [4]. The positions of Dendrobiinae and Malaxideae were also confirmed. Dressler (1990) placed Malaxideae and Dendrobieae in two separated groups, Cymbidioid phylad and Epidendroid phylad, according to reed stem, upper lateral inflorescences and spherical silica bodies [13]. Chase (2003) recognized Dendrobiinae as a subtribe rather than tribe Dendrobieae [2]. By inferring from nrITS and four chloroplast sequences, Van den Berg et al. (2005) held that Dendrobiinae was beside Malaxideae [4]. Dendrobiinae is similar to Malaxideae in the synapomorphic state of the naked pollinium [84]. Like other analyses based on Xdh and rbcL, our analyses support that Dendrobiinae and Malaxideae were sister relatives [5,7], which was consistent with the morphological similarities between them. Controversially, the position of Dendrobiinae-Malaxideae clade was going up in the analysis of Xdh [7], and Podochileae was sister to this clade in the analysis of plastid gene rbcL (bootstrap support ,50%) [5]; however, this clade was sister to other clades except Coelogyninae with high support in the present study. More extensive sampling and sequencing of mitochondrial and nuclear genomes should be conducted to resolve uncertain relationships within the Epidendroideae with confidence.

Conclusions
In summary, complete chloroplast genomes can provide abundant information for resolving evolutionary questions. The gene content, organization, and sequence of chloroplast genome have been used as important markers in systematic research. This study determined complete cp genomes of Dendrobium officinale and Cypripedium macranthos and compared cp genomes of seven photosynthetic orchids including the above two, which showed structural similarities but differences in IR/SSC junctions and ndh genes. We propose that the AT bias of ycf1 in the Epidendroideae may be related to recombination of the IR/SSC junction. In Figure 5. Phylogenetic tree of the Epidendroideae reconstructed based on combined genes. BI and ML analyses yielded identical topologies. Posterior probability and bootstrap proportion are indicated near the nodes. Subfamilies, tribes, and subtribes (sensu Chase et al. [2]) are indicated where applicable. doi:10.1371/journal.pone.0099016.g005 Comparison and Application of Orchid Cp Genomes addition, relationships among subtribes and tribes in the subfamily Epidendroideae were resolved with high or moderate support in the present study. The highly divergent genes of cp genomes identified in this study can be used as markers in phylogenetic analyses. Further plastome sequencing of orchids will be necessary to clarify the diversity of chloroplast genomes and to improve our understanding of the relationships within this family. Figure S1 Phylogenetic relationships among 11 subtribes and one tribe within the subfamily Epidendroideae resulting from previous studies. All trees were drawn according to the cited references. Molecular markers and methods used in phylogenetic analyses are enclosed by parentheses below the cited studies. Subtribal and tribal delimitations refer to Chase et al. [2]. (TIF)