The Complete Plastid Genome Sequence of Madagascar Periwinkle Catharanthus roseus (L.) G. Don: Plastid Genome Evolution, Molecular Marker Identification, and Phylogenetic Implications in Asterids

The Madagascar periwinkle (Catharanthus roseus in the family Apocynaceae) is an important medicinal plant and is the source of several widely marketed chemotherapeutic drugs. It is also commonly grown for its ornamental value and, due to ease of infection and distinctiveness of symptoms, is often used as the host for studies on phytoplasmas, an important group of uncultivated plant pathogens. To gain insights into the characteristics of apocynaceous plastid genomes (plastomes), we used a reference-assisted approach to assemble the complete plastome of C. roseus, which could be applied to other C. roseus-related studies. The C. roseus plastome is the second completely sequenced plastome in the asterid order Gentianales. We performed comparative analyses with two other representative sequences in the same order, including the complete plastome of Coffea arabica (from the basal Gentianales family Rubiaceae) and the nearly complete plastome of Asclepias syriaca (Apocynaceae). The results demonstrated considerable variation in gene content and plastome organization within Apocynaceae, including the presence/absence of three essential genes (i.e., accD, clpP, and ycf1) and large size changes in non-coding regions (e.g., rps2-rpoC2 and IRb-ndhF). To find plastome markers of potential utility for Catharanthus breeding and phylogenetic analyses, we identified 41 C. roseus-specific simple sequence repeats. Furthermore, five intergenic regions with high divergence between C. roseus and three other euasterids I taxa were identified as candidate markers. To resolve the euasterids I interordinal relationships, 82 plastome genes were used for phylogenetic inference. With the addition of representatives from Apocynaceae and sampling of most other asterid orders, a sister relationship between Gentianales and Solanales is supported.


Introduction
Plastids are distinctive organelles that originated from cyanobacteria and are shared by photosynthetic eukaryotes and their descendants [1]. They are crucial metabolic compartments with their own genome (i.e., plastome), which is the remnant of the cyanobacterial genome with most genes transferred to the nucleus [2]. Due to its relatively stable structure and uniparental inheritance in most angiosperms, the plastome is commonly used as a source of information for the inference of phylogenetic relationships at various taxonomic levels [3]. Previously, the prevailing approach to phylogenetic analyses based on plastomes was to sequence one or a few loci from many taxa. With the increasing availability of complete plastome sequences, analyses based on whole plastomes are becoming feasible. Compared with the analyses based on a limited number of loci, the whole-plastome approach could reduce the sampling error [4] and may hold promise for resolving previously unresolved phylogenetic relationships [5,6].
According to the latest classification system of the Angiosperm Phylogeny Group [7], Gentianales is placed in a subclade of euasterids I with unresolved relationships with Boraginaceae and the monophyletic group formed by Solanales and Lamiales. This is consistent with the analyses based on three protein-coding genes and three non-coding regions of plastomes from 132 genera [8]. However, the support for this sister relationship between Solanales and Lamiales was relatively weak (maximum parsimony jackknife < 50% [8]). In addition, several other studies with more extensive taxon sampling and/or based on more molecular markers have resulted in phylogenetic trees with different topologies. Instead of Lamiales, Gentianales forms a monophyletic group with Solanales in phylogenetic analyses that included nuclear markers [9,10,11]. It is notable that in a phylogeny based on 77 nuclear genes, the monophyly of Gentianales and Solanales received strong support (maximum likelihood bootstrap > 95% [10]). In contrast, phylogenetic analyses based on most of the available plastome protein-coding and rRNA genes have resulted in contradictory topologies. Whereas Moore et al. [5] showed monophyly of Gentianales and Solanales, Jansen et al. [6], Moore et al. [12], and Yi and Kim [13] suggested a sister relationship between Gentianales and Lamiales. Inclusion of the basal asterid Ardisia and exclusion of taxa to avoid overrepresentation of certain families and genera also indicated a closer relationship of Gentianales with Lamiales than with Solanales [14]. However, these conflicting phylogenetic hypotheses received relatively weak support in individual studies, highlighting the uncertainty of interordinal relationships within euasterids I. One possible explanation for this observation might be insufficient taxon sampling of families, particularly for Gentianales. To date, the only Gentianales taxon with a complete plastome sequence is Coffea [15] from the basal Gentianales family Rubiaceae [16]. To expand the taxon sampling for plastome phylogenetic analyses and to have a better understanding of plastome evolution within Gentianales, we chose the Madagascar periwinkle Catharanthus roseus (Apocynaceae, Gentianales) for whole plastome sequencing.
In addition to its potential implications for resolving asterid phylogeny, the complete plastome sequence may be applied to other studies related to C. roseus. With a rich repertoire of more than 130 terpenoid indole alkaloids, C. roseus has been one of the most important sources of chemotherapeutic and antihypertensive drugs [17]. Although the complete synthesis pathways are known, the alkaloids or their precursors still have to be harvested from periwinkle plants [18]. There is strong evidence that the isopentenyl pyrophosphate (IPP) precursor for secologanin biosynthesis, which is the limiting step for alkaloid accumulation in C. roseus [19], mainly comes from the MEP/DOXP pathway located in plastids [20]. Therefore, one possible approach to increasing the production of secologanin and of terpenoid indole alkaloids would be to genetically modify the plastome to express enzymes that could accelerate IPP synthesis. For this purpose, the complete plastome sequence would be needed for designing transformation vectors that could be used to engineer the C. roseus plastome.
Catharanthus roseus is also an ornamental plant grown worldwide for its traits of continuous flowering and variable flower colors. Efforts have been made to select for cultivars with various morphological traits or higher alkaloid contents [17]. Previous studies have characterized and differentiated the cultivars using approaches such as amplified fragment length polymorphisms (AFLP), random amplified polymorphic DNA (RAPD) [21], and chemotaxonomy [22]. However, since there are now over 100 cultivars of C. roseus [17], sequence-based methods are needed to provide more accurate analyses that could reveal the phylogenetic relationships among cultivars. The complete plastome sequence therefore can be used to design primers for markers such as highly variable intergenic regions or regions containing simple sequence repeats (SSRs), which are commonly used for differentiation and bar coding of cultivars as well as detection of hybrids due to the nonrecombinant, uniparentally inherited nature of plastomes [23,24,25].
In addition to its importance as a medicinal and ornamental plant, C. roseus is commonly used as the experimental host for studies on plant pathogenic phytoplasmas [26,27]. Because phytoplasmas are hitherto unculturable and can only be maintained in plants, molecular studies that require DNA samples often suffer from high levels of contamination from plant nuclear, mitochondrial, and plastid DNA. In particular, plastid DNA generally has a lower GC content than that of nuclear or mitochondrial DNA and tends to be co-purified with AT-rich Phytoplasma DNA in the commonly used cesium chloride (CsCl) gradient ultracentrifugation protocols, causing a major problem for de novo assembly of Phytoplasma genomes. The complete plastome sequence of C. roseus therefore has practical applications for genomic and transcriptomic studies on phytoplasmas by providing a reference for filtering out non-Phytoplasma sequence reads.
In this study, we aimed to determine and characterize the complete plastome sequence of C. roseus using Illumina sequencing data. To identify loci of potential utility for the characterization and phylogenetic analyses of Catharanthus cultivars and species, we examined the intergenic regions and SSRs of the C. roseus plastome. Finally, with the addition of the C. roseus plastome, we performed phylogenetic analyses to gain insights into the position of Gentianales in asterid plastome phylogenies.

Plant materials and sequencing
The plant materials used (Catharanthus roseus cultivar Pacifica Punch Halo) were grown from seeds (Asusa Spike Seeds Inc., Taiwan) and maintained in a greenhouse. The Wizard Genomic DNA Purification Kit (Promega) was used to extract total DNA from 1.4 g of midribs cut from mature leaves of plants infected with the Phytoplasma strain PnWB NTU2011 [28]. Two separate libraries were prepared and 101-bp reads were sequenced on the HiSeq 2000 platform (Illumina, USA) by a commercial sequencing service provider (Yourgene Bioscience, Taiwan), including one paired-end library (insert size = ~223 bp, 149,717,490 read pairs, ~30.2 Gb of raw data) and one mate-pair library (insert size = ~4.5 kb, 13,233,069 read pairs, ~2.7 Gb of raw data).
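The reported raw data volumes follow directly from the read-pair counts and the 101-bp read length. The short Python check below is only an illustrative sketch; the function name is ours and not part of any sequencing toolkit.

```python
def raw_yield_bases(read_pairs: int, read_length: int) -> int:
    """Total sequenced bases for a paired library (two reads per pair)."""
    return read_pairs * 2 * read_length


READ_LENGTH = 101  # bp, as stated for both libraries

paired_end = raw_yield_bases(149_717_490, READ_LENGTH)
mate_pair = raw_yield_bases(13_233_069, READ_LENGTH)

# Express the yields in gigabases for comparison with the text.
print(f"paired-end: {paired_end / 1e9:.1f} Gb")
print(f"mate-pair:  {mate_pair / 1e9:.1f} Gb")
```

Both values round to the figures given above (~30.2 Gb and ~2.7 Gb).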

Plastome assembly
Due to the high proportions of C. roseus nuclear and Phytoplasma DNA in the Illumina libraries, we adopted a reference-assisted approach for the assembly of the C. roseus plastome. The references included 14 incomplete Asclepias plastome sequences [29], as well as the complete plastome sequences from Coffea arabica and four Nicotiana spp. (Table S1). The Illumina reads were mapped onto these references using BWA 0.6.2 [30]. The mapped reads were considered to be of putative plastome origin and were used as the input for Velvet 1.2.07 [31] to perform de novo assembly, while the unmapped reads were ignored during the initial assembly. Based on our optimization tests, the parameters for Velvet were set to k = 87, expected coverage = auto, and minimum contig length = 500. The initial assembly of the C. roseus plastome included 14 contigs, which had a total length of 123,451 bp and were used as the starting point for our iterative assembly improvement process [32]. For each iteration, we mapped all raw reads from the two libraries to the existing contigs using BWA and visualized the results using IGV [33]. Neighboring contigs with mate-pair support for continuity were merged into scaffolds, and reads overhanging at margins of contigs or scaffolds were used to extend the assembly and to fill gaps. Possible assembly errors were examined by recognizing read pairs with abnormal insert size. The iterations continued until the final circular plastome sequence was obtained. Mapping of raw reads onto the final assembly using BWA resulted in coverage levels well beyond those reported for other plastome assemblies [29,34,35]: 178-fold coverage of mate-pair reads with a mapping quality of at least 37 and 4,497-fold coverage of paired-end reads with a mapping quality of 60.
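The screening for read pairs with abnormal insert size was done by visual inspection in IGV. As a rough illustration of the underlying idea only, the Python sketch below flags pairs whose insert size lies far from the library median. The threshold of five median absolute deviations and the toy insert sizes are assumptions made for this example, not values used in this study.

```python
from statistics import median


def flag_abnormal_inserts(insert_sizes, n_mads=5.0):
    """Return indices of read pairs whose insert size deviates strongly
    from the library median (robust outlier rule based on the MAD)."""
    med = median(insert_sizes)
    mad = median(abs(x - med) for x in insert_sizes)
    scale = mad if mad > 0 else 1.0  # avoid a zero threshold for uniform data
    return [i for i, x in enumerate(insert_sizes)
            if abs(x - med) > n_mads * scale]


# Hypothetical insert sizes (bp) for a paired-end library centred near 223 bp.
sizes = [220, 225, 223, 221, 226, 224, 222, 1500, 223, 40]
print(flag_abnormal_inserts(sizes))
```

Clusters of flagged pairs at one position would be the kind of signal that prompts a closer look at a contig join.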

Annotation and genome analyses
The online automatic annotator DOGMA [36] was used to generate preliminary annotations of the C. roseus plastome. Questionable regions in the DOGMA draft annotations were verified using BLAST [37,38] against other asterid plastomes. Annotations of the tRNA genes were confirmed using tRNAscan-SE [39]. The genome map and positions of SSRs (see below) were drawn with the help of OGDRAW [40] and GenomeVx [41].
The gene content and genome organization of the C. roseus plastome were compared with those of other asterid plastomes (Table S1), in particular the complete plastome of Coffea [15] and the nearly complete plastome of Asclepias syriaca (subfamily Asclepiadoideae, Apocynaceae [42]). The plastome sequence of A. syriaca, which contains two unresolved regions in rps8-rpl14 and ycf1, respectively [34], was chosen because it has fewer gaps than those from other Asclepias species [29]. Furthermore, several regions in the A. syriaca plastome have been verified by Sanger sequencing [34].
The positions and types of SSRs in the C. roseus plastome were identified using Msatfinder 2.0 [43]. The minimum numbers of repeats were set to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively. To facilitate our comparative analysis, we characterized the plastome SSRs of A. syriaca with the same procedure. Since SSRs that are conserved across genera are likely to be under selective constraint, the SSR contents of these two apocynaceous plastomes were compared to distinguish SSRs that are conserved between the two from those that are unique to each individual plastome. An SSR is defined as conserved if it consists of the same repeat unit, occurs in the same genomic region (coding, intron, or intergenic), and is bounded by sequences that are alignable in both plastomes.
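To make the repeat thresholds concrete, the following Python sketch scans a sequence for perfect SSRs using the same minimum repeat numbers. It is a simplified stand-in written for illustration and does not reproduce the Msatfinder algorithm; the function names and the toy sequence are hypothetical.

```python
# Minimum number of repeats per unit length (mono- to hexanucleotide).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter motif."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            # Skip ambiguous bases and units such as "AA" or "ATAT".
            if set(unit) - set("ACGT") or not is_primitive(unit):
                i += 1
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == unit:
                reps += 1
            if reps >= min_rep:
                hits.append((i, unit, reps))
                i += reps * k  # continue after the repeat tract
            else:
                i += 1
    return sorted(hits)


# Toy sequence with an (AT)5 tract followed directly by an A11 tract.
toy = "GGC" + "AT" * 5 + "A" * 11 + "CGT"
print(find_ssrs(toy))
```

Start positions are 0-based. Repeats found with different unit lengths are reported independently, so any overlap handling or conservation test (same unit, same region type, alignable flanks) would have to be added on top of this sketch.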
To find other markers that have potential phylogenetic utility, we calculated the sequence divergence in intergenic regions, which have been shown to be the most variable parts of plastomes [13,44]. Three reference plastomes, including those from A. syriaca, Coffea arabica, and Solanum lycopersicon, were used to perform pairwise comparisons with C. roseus to identify fast evolving intergenic regions in this lineage. For genes that are putatively pseudogenized or absent in any of these plastomes (accD, clpP, ycf1, and ycf15 in A. syriaca, ycf15 in Coffea, and infA in Solanum), the flanking intergenic regions were excluded from the 111 unique intergenic regions of C. roseus in pairwise comparisons. The intergenic regions were parsed out from the four plastomes using custom Perl scripts and aligned using MUSCLE [45] with the default settings. Sequence divergence in each pairwise comparison was calculated using the DNADIST program of PHYLIP [46].
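For readers unfamiliar with distance calculations on aligned spacers, the Python sketch below computes an uncorrected proportion of differing sites and the Jukes-Cantor correction. This is a deliberately simple illustration: DNADIST offers several substitution models, the model used is not specified here, and the sketch should not be read as a reimplementation of that program. The two aligned sequences are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns with gaps or ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned spacer fragments: 10 comparable sites, 1 difference.
seq_a = "ACGT-ACGTAC"
seq_b = "ACGTTACGAAC"
p = p_distance(seq_a, seq_b)
print(p, jukes_cantor(p))
```

The correction grows with p, which matters for the most divergent spacers where multiple substitutions at the same site become likely.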

Phylogenetic analyses
To investigate the interordinal relationships within euasterids I, phylogenetic analyses were conducted using plastome sequences from C. roseus and other asterids. Plastomes of parasitic asterids, which were reported to have accelerated evolutionary rates in plastomes [47,48], were excluded from our analyses. To avoid overrepresentation of certain taxa with complete plastomes (e.g., Olea, Solanaceae, Asteraceae), we constructed a first dataset that included only one species from each genus and at most two genera from each family. To investigate the effects of taxon sampling, a second dataset was constructed to include most asterids with complete plastomes, as well as eight asterids with most plastome protein-coding and rRNA genes sequenced [5] that expanded our sampling of asterids (Table S1). The nucleotide sequences of protein-coding and rRNA genes were parsed from the plastomes of asterids and outgroups using custom Perl scripts and clustered into ortholog groups using OrthoMCL [49]. The presence/absence of orthologous genes was examined for each plastome. Gene absence was verified using BLAST searches with the gene sequences of other asterids as queries. Gene absence due to misannotation was manually corrected. In total, 82 genes were included in the datasets, which contain all protein-coding genes in the C. roseus plastome except for infA and ycf15, which are absent in many asterid plastomes. Eight other genes absent in only a few lineages were included in the datasets (Table S1). The gene sequences were aligned with MUSCLE with the default settings and concatenated into a single alignment of 82,219 and 84,401 characters for the first and second datasets, respectively. A maximum parsimony (MP) phylogeny was generated for each dataset using PAUP* 4.0 [50] with heuristic searches, tree bisection and reconnection for branch swapping, and 1,000 randomizations. Nodal supports were estimated using 1,000 bootstrap [51] replicates with the same search and branch-swapping options and 100 randomizations. Maximum likelihood (ML) phylogenies were inferred using PhyML [52] with the GTR+I+G model and six substitution rate categories. Bootstrap supports were estimated from 1,000 samples of alignment generated by the SEQBOOT program of PHYLIP.
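The bootstrap procedure resamples alignment columns with replacement to produce pseudo-replicate datasets of the original length. The Python sketch below illustrates that resampling step on a toy alignment; it is not the SEQBOOT implementation, and the taxon labels, sequences, and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in for reproducibility.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    # The same sampled columns are used for every taxon, so each
    # resampled site keeps its original pattern across taxa.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Toy alignment of three hypothetical taxa, eight columns each.
toy = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGTTCGA",
    "taxon_C": "ACCTACGA",
}
replicates = [bootstrap_alignment(toy, random.Random(seed)) for seed in range(3)]
for rep in replicates:
    print(rep)
```

Each replicate would then be analyzed with the same tree-building settings, and the proportion of replicate trees containing a clade gives its bootstrap support.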

Gene content and plastome organization
The complete plastome of C. roseus (GenBank accession number KC561139) is 154,950 bp in length, including a large single copy (LSC) region of 85,765 bp, a small single copy (SSC) region of 17,997 bp, and a pair of inverted repeats (IRa and IRb) of 25,594 bp each (Figure 1). The gene content of the C. roseus plastome is the same as that of the basal angiosperm Amborella [53] and includes 86 protein-coding, eight rRNA, and 37 tRNA genes (Table S2). The junction between LSC and IRb (JLB) is within rps19, while the junction between SSC and IRb (JSB) is in the trnN-GUU-ndhF region. The junctions between IRa and the two single copy regions (JLA and JSA) are between rpl2 and trnH-GUG and within ycf1, respectively (Figure 2). This C. roseus sequence represents the second complete plastome in Gentianales. It is similar to the first representative, Coffea [15], in terms of gene content and genome organization. However, several differences were found between the plastomes of C. roseus and A. syriaca, both of which belong to Apocynaceae. Notably, three genes (accD, clpP, and ycf1) were pseudogenized in A. syriaca [34] and other Asclepias species [29]. These three genes have been shown to be essential in previous knockout experiments [54,55,56] and remained intact in C. roseus as well as Coffea. Another difference between the plastomes of these two apocynaceous genera lies in the position of JLB (Figure 2). In C. roseus, Coffea, and most asterids [14], JLB is located within rps19. In Asclepias, it is in the spacer between rps19 and rpl2 [34], as in Olea spp. and a few Nicotiana species [57]. In addition, the length of the spacer between IRb and ndhF is 43 bp in Nicotiana tabacum, 50 bp in C. roseus, and only 9 bp in Coffea, but it has a size of 540 bp in A. syriaca and around 500 bp in other Asclepias species. In asterids, the spacer between ndhF and IRb (or IRa in Asteraceae where SSC is inverted [58]) rarely exceeds 250 bp and a comparable size is only found in Ipomoea (510 bp).
The most notable difference in plastome organization between C. roseus and Asclepias probably lies in the size of the rps2-rpoC2 spacer. Whereas the spacer between IRb and ndhF partly accounts for the larger size of SSC in A. syriaca (~18,489 bp) than in C. roseus (17,997 bp), the enlarged rps2-rpoC2 is the main contributor to the large size of LSC in A. syriaca (~89,307 bp), which is larger than the C. roseus LSC (85,765 bp) by over 3.5 kb. The C. roseus rps2-rpoC2 (244 bp) has a size similar to that in other asterids, including the moderately rearranged Jasminum plastome (258 bp [59]) and the highly rearranged Trachelium plastome (228 bp [60]). In comparison, the A. syriaca rps2-rpoC2 has a size of 2,680 bp. The extraordinary size of rps2-rpoC2 is also found in the partial plastomes of other Asclepias [29], with the shortest one being 2,639 bp. BLAST similarity searches against the NCBI nr database [61] showed that the part of A. syriaca rps2-rpoC2 that is unalignable with the C. roseus spacer (positions 224-2,641) had a 3' portion (1,776-2,613) with all top 30 hits (excluding Asclepias plastomes in the database) to mitochondrial genomes of eudicots. In the Nicotiana tabacum mitochondrial genome (BA000042), the hit corresponded to the region containing rpl2 exon 2 (360,986-361,737). If this portion had indeed stemmed from the mitochondrial genome in the lineage leading to Asclepias, it would be an extremely rare case of lateral transfer from mitochondria to plastids. Further confirmation, including Sanger sequencing of the plastome rps2-rpoC2 and sequencing of the complete mitochondrial genome, is needed to provide adequate evidence for this putative transfer.
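The size comparisons above can be checked with simple arithmetic. The Python sketch below uses only the lengths reported in this section; the variable names are ours, and the A. syriaca region sizes are approximate because of the unresolved regions in that assembly.

```python
# Region sizes (bp) reported for the C. roseus plastome.
LSC, SSC, IR = 85_765, 17_997, 25_594

# A quadripartite plastome consists of LSC + SSC + two IR copies.
total = LSC + SSC + 2 * IR
print("C. roseus total:", total)

# Approximate A. syriaca values reported in the text.
lsc_diff = 89_307 - LSC        # LSC size difference
spacer_diff = 2_680 - 244      # rps2-rpoC2 size difference
ssc_diff = 18_489 - SSC        # SSC size difference
irb_ndhf_diff = 540 - 50       # IRb-ndhF spacer size difference

print("LSC difference:", lsc_diff, "bp; rps2-rpoC2 accounts for",
      f"{spacer_diff / lsc_diff:.0%}")
print("SSC difference:", ssc_diff, "bp; IRb-ndhF spacer difference:",
      irb_ndhf_diff, "bp")
```

The region sizes sum to the reported plastome length, the rps2-rpoC2 expansion covers roughly two thirds of the LSC difference, and the IRb-ndhF spacer difference is close to the SSC difference.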
In general, the complete plastome of C. roseus highlights the variation among apocynaceous plastomes. Whereas Catharanthus belongs to the tribe Vinceae (subfamily Rauvolfioideae), which has a relatively basal position within Apocynaceae, Asclepias is nested within the APSA clade formed by the other four subfamilies [42]. Given the relatively basal position of Catharanthus within Apocynaceae and the plastome similarity to Coffea, which belongs to the most basal family within Gentianales [16], the organization and gene content of the C. roseus plastome are probably more similar to those of the ancestral apocynaceous plastome. With the C. roseus plastome as the reference for comparative analyses, complete plastomes from other tribes and subfamilies of this speciose family would shed light on the evolutionary history of changes in plastome size (~158,598 bp in A. syriaca compared with 154,950 bp in C. roseus and 155,189 bp in Coffea) and of the losses of the three essential genes (i.e., accD, clpP, and ycf1), which may have involved functional replacement by homologs in the nucleus [62].
To distinguish between the shared and lineage-specific plastome SSRs in C. roseus and A. syriaca, the types and positions of SSRs were compared between the two species. Among the 56 SSRs in C. roseus, 15 are shared with A. syriaca (Table S5). When the SSRs are categorized by repeat length, mononucleotide SSRs account for 80% of the conserved SSRs (Figure 3A). When the locations were considered, a disproportionately high number of conserved SSRs were found in genic regions (Figure 3B). Whereas the proportion of total SSRs in genic regions is about one third of that in intergenic regions in both plastomes, the conserved SSRs in genic regions are nearly as many as those in intergenic regions. Several of the conserved intergenic SSRs (e.g., T11 in psaI-ycf4) are found near boundaries of genic regions and several (e.g., T12 in trnM-CAU-atpE) are located within polycistronic transcription units [63]. In general, this indicates that SSR conservation tends to be found in repeats that correspond to conserved amino acid residues (e.g., T11 in rpoC2) or that are located in transcribed noncoding regions, which may play a role in plastome gene expression.
Since compound or complex SSRs may show higher variability [64] and have been employed as nuclear markers for plants [65], we examined the C. roseus plastome to find regions containing adjacent SSRs with different repeat units. The only such region was in the ndhA intron and contains two adjacent SSRs, (AT)5 and A11, both of which are unique to C. roseus (Table S3). Additionally, this region has 33 positions of irregular A/T repeats ((TA)n, (AT)n, An, or Tn) upstream of the two SSRs. These features indicate that this region may have good potential for the development of SSR markers.

Divergence of intergenic spacers
To find the plastome regions of potential phylogenetic utility for Catharanthus, the divergence in intergenic regions was calculated between C. roseus and three other euasterids I taxa (Table S6). The divergence levels (average ± std. dev.) are 0.10 ± 0.07 in the pairwise comparison with A. syriaca, 0.14 ± 0.08 with Coffea arabica, and 0.17 ± 0.11 with Solanum lycopersicon. This trend is consistent with the expectations based on the taxonomy and phylogenetic relationships of these four asterids [7,8]. However, examination of the divergence levels across different regions reveals complex patterns. For instance, only nine spacers ranked among the 25 most divergent regions in all three pairwise comparisons. This discrepancy suggests that lineage-specific rate variation is a common phenomenon in asterid plastomes. One notable example is the trnH-GUG-psbA region, which has a divergence level of 0.43 in the Catharanthus-Asclepias comparison. This estimate is much higher than that of the second most divergent intergenic region in the comparisons between these two lineages (0.31 in the rpoC1-rpoB region) and also higher than that of the homologous region in the other two pairwise comparisons (0.26 with Coffea and 0.34 with Solanum). This observation may be best explained by the high evolutionary rate of trnH-GUG-psbA in the Asclepias lineage [29]. Another example is the trnN-GUU-ndhF spacer, which is the eighth most divergent region in the Catharanthus-Asclepias comparison, but ranks 95th and 96th in the other two comparisons. This observation can be explained by the presence of a pseudogenized copy of ycf1 in Asclepias, which exhibits considerable divergence from other asterids [34].
To identify the intergenic spacers that are fast evolving in C. roseus (rather than exhibiting accelerated evolution in one of the reference lineages), we examined the spacers that ranked among the 25 most divergent regions in all three pairwise comparisons. Because the phylogenetic utility of a molecular marker is determined both by its variability and length [66], we excluded the spacers that are shorter than 500 bp in C. roseus. A total of five spacers, including rpl32-trnL-UAG, ndhF-rpl32, trnE-UUC-trnT-GGU, rps16-trnQ-UUG, and trnK-UUU-rps16 (Figure 1 and Table S6), were found to satisfy the criteria described. In addition to the phylogenetic inference of Catharanthus (and possibly of related genera in the tribe Vinceae), these markers may facilitate the identification, bar coding, and breeding of C. roseus cultivars with important medicinal or ornamental characteristics.
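The two selection criteria (ranking among the most divergent spacers in every pairwise comparison, and a minimum length in C. roseus) amount to a set intersection followed by a length filter. The Python sketch below illustrates this with made-up spacer names, divergence values, and lengths; only the default thresholds (top 25, 500 bp) come from the text.

```python
def candidate_markers(divergence, lengths, top_n=25, min_length=500):
    """Select spacers that rank among the top_n most divergent regions in
    every pairwise comparison and are at least min_length bp long.

    divergence: dict comparison -> dict spacer -> divergence estimate.
    lengths: dict spacer -> spacer length (bp) in the focal plastome.
    """
    top_sets = []
    for distances in divergence.values():
        ranked = sorted(distances, key=distances.get, reverse=True)
        top_sets.append(set(ranked[:top_n]))
    shared = set.intersection(*top_sets)
    return sorted(s for s in shared if lengths[s] >= min_length)


# Hypothetical data: four spacers compared against three references.
toy_divergence = {
    "ref_1": {"s1": 0.40, "s2": 0.30, "s3": 0.10, "s4": 0.05},
    "ref_2": {"s1": 0.50, "s2": 0.20, "s3": 0.25, "s4": 0.10},
    "ref_3": {"s1": 0.30, "s2": 0.35, "s3": 0.05, "s4": 0.10},
}
toy_lengths = {"s1": 800, "s2": 300, "s3": 650, "s4": 900}

# With top_n=3, s1 and s2 are shared by all comparisons; s2 is too short.
print(candidate_markers(toy_divergence, toy_lengths, top_n=3))
```

Requiring a spacer to rank highly in all comparisons is what filters out regions whose divergence reflects acceleration in a single reference lineage.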

Phylogenetic analyses
Phylogenetic analyses of the 19 representative asterid taxa with complete plastome sequences inferred a tree topology that was supported by both the ML and MP approaches (Figure 4). ML and MP trees based on the dataset that includes more extensive taxon sampling (i.e., 47 asterids with complete or partial plastome sequences) are also largely congruent (Figure 5). The monophyly of every order received 100% bootstrap support in both ML and MP analyses of the two datasets, as did most interordinal relationships. The ML tree is shown in Figure 5 because it is more congruent with other phylogenetic analyses based on fewer genes from more asterid taxa [8,11,67]. The MP tree has three topological differences from the ML tree, including the grouping of Lonicera (Dipsacales) with Asterales, Ehretia (Boraginaceae) with Lamiales, and Antirrhinum (Plantaginaceae) with Boea (Gesneriaceae).
In general, the two ML phylogenies reconstructed (Figures 4 and 5) are consistent with the latest classification system of the Angiosperm Phylogeny Group [7] in terms of interordinal relationships, with the only exception being the sister relationship between Gentianales and Solanales. The Solanales-Lamiales clade in the APG III [7] system is consistent with an analysis based on six plastome regions from 132 taxa [8], but this interordinal relationship was weakly supported (MP jackknife < 50%) in the latter study. An analysis based on 77 nuclear genes from three euasterids I orders strongly supported (ML bootstrap > 95%) a closer relationship of Gentianales (represented by Catharanthus, Coffea and Kadua (Rubiaceae)) with Solanales (Ipomoea and Nicotiana) than with Lamiales (Antirrhinum, Mimulus (Phymaceae), Ocimum, Salvia (Lamiaceae) and Triphysaria (Orobanchaceae)) [10]. The monophyly of Gentianales and Solanales was also found in studies that utilized both nuclear and organelle genes [9,11]. In contrast, the relationships among euasterids I orders based exclusively on plastome genic regions have remained unsettled. When almost all protein-coding and rRNA genes are included, Gentianales may be sister to either Solanales [5] or Lamiales [6,12,13,14]. One possible explanation for the unresolved interordinal relationships in plastome phylogenies is inadequate taxon sampling, which could lead to the inference of an erroneous topology. Previous studies have shown that the exclusive use of three plastomes of Poaceae to represent the whole monocot clade resulted in the misplacement of Amborella to the basal position of eudicots, instead of the basal position of all angiosperms [53,68]. A similar case could be found for Gentianales, where Coffea is the only representative in phylogenies based on complete plastomes. By including divergent genera within Apocynaceae (i.e., Catharanthus, Asclepias, and Nerium), we obtained ML and MP phylogenies suggesting a sister relationship between Gentianales and Solanales (ML bootstrap support = 77% in Figure 4 and 74% in Figure 5), which is consistent with phylogenies exclusively or partially based on nuclear genes [9,10,11]. Moreover, compared with the analyses based on 77 nuclear genes, where euasterids I is represented by three orders [10], the addition of representatives from Boraginaceae and Garryales into the dataset did not change the sister relationship between Gentianales and Solanales (Figure 5). In summary, these analyses indicate that, based on both plastome and nuclear sequences, the Gentianales-Solanales monophyly is the best supported hypothesis regarding the interordinal relationships among asterids, rather than the Solanales-Lamiales monophyly suggested by the APG III [7] system. Further analyses that include plastome sequences from the other Solanales lineage (clade of Montiniaceae, Sphenocleaceae, and Hydroleaceae [11]), other Gentianales families (Gentianaceae, Loganiaceae, Gelsemiaceae [16]) and other euasterids I families will be needed to further test this hypothesis.

Conclusion
We reported the complete plastome sequence of Catharanthus roseus (Apocynaceae) in this study. Comparative analyses that included the complete Coffea arabica plastome and the nearly complete Asclepias syriaca plastome highlight variations in plastome organization and gene content within the speciose family Apocynaceae, including changes in the sizes of rps2-rpoC2 and IRb-ndhF and presence/absence of three essential genes, which merit further studies on the evolution of apocynaceous plastomes. The C. roseus plastome contains 41 lineage-specific SSRs and five intergenic regions that exhibit high divergence rates. These regions may provide phylogenetic utility at low taxonomic levels, which could be applied to the breeding of Catharanthus cultivars. With respect to the previously unresolved relationships within euasterids I using plastome sequences, the improvement in taxon sampling provided by this study supports the monophyly of Gentianales and Solanales, which is consistent with studies that used nuclear genes.

Figure 1. Plastome map of Catharanthus roseus. The within-plastome GC content variation is indicated in the inner circle. Positions of simple sequence repeats (Table S3) are drawn as lines vertical to the inner circle (color-coded by repeat length; 1-bp: red; 2- or 3-bp: blue; 4- or 5-bp: green). Genes drawn inside the outer circle are transcribed clockwise, those outside counterclockwise. Pseudogenes (Ψ) and genes containing one (*) or two (**) introns are indicated. Five intergenic regions of potential phylogenetic utility are indicated by hollow triangles outside of the outer circle. doi: 10.1371/journal.pone.0068518.g001

Figure 2. Comparison of boundaries between inverted repeats (IRs) and single-copy (SC) regions. The sizes of SC regions in Asclepias syriaca are uncertain due to the presence of a small unresolved region in each of the SC regions. doi: 10.1371/journal.pone.0068518.g002

Figure 3. Numbers of simple sequence repeats (SSRs) specific to Catharanthus roseus and Asclepias syriaca plastomes and those conserved in both. A: classification of SSRs by repeat length. B: classification of SSRs by region. doi: 10.1371/journal.pone.0068518.g003

Table S1. Accession numbers of plastome sequences of asterids included in phylogenetic analyses. (PDF)

Table S3. Distribution of simple sequence repeats in the Catharanthus roseus plastome. (PDF)

Table S4. Distribution of simple sequence repeats in the Asclepias syriaca plastome. (PDF)

Table S6. Divergence of plastome intergenic regions in pairwise comparisons between Catharanthus roseus and three other euasterids I taxa. (XLS)