Complete Plastid Genome Sequence of the Basal Asterid Ardisia polysticta Miq. and Comparative Analyses of Asterid Plastid Genomes

Ardisia is a basal asterid genus well known for its medicinal values and has the potential for development of novel phytopharmaceuticals. In this genus of nearly 500 species, many ornamental species are commonly grown worldwide and some have become invasive species that caused ecological problems. As there is no completed plastid genome (plastome) sequence in related taxa, we sequenced and characterized the plastome of Ardisia polysticta to find plastid markers of potential utility for phylogenetic analyses at low taxonomic levels. The complete A. polysticta plastome is 156,506 bp in length and has gene content and organization typical of most asterids and other angiosperms. We identified seven intergenic regions as potentially informative markers with resolution for interspecific relationships. Additionally, we characterized the diversity of asterid plastomes with respect to GC content, plastome organization, gene content, and repetitive sequences through comparative analyses. The results demonstrated that the genome organizations near the boundaries between inverted repeats (IRs) and single-copy regions (SCs) are polymorphic. The boundary organization found in Ardisia appears to be the most common type among asterids, while six other types are also found in various asterid lineages. In general, the repetitive sequences in genic regions tend to be more conserved, whereas those in noncoding regions are usually lineage-specific. Finally, we inferred the whole-plastome phylogeny with the available asterid sequences. With the improvement in taxon sampling of asterid orders and families, our result highlights the uncertainty of the position of Gentianales within euasterids I.


Introduction
Plastids are crucial organelles for photosynthesis and other metabolic pathways, which arose only once through endosymbiosis of free-living cyanobacteria within eukaryotic cells [1]. The plastid genomes (i.e., plastomes) are valuable sources of phylogenetic information due to their relatively stable genome structure and higher evolutionary rate compared to mitochondrial genomes [2]. To date, over 170 complete angiosperm plastomes have been sequenced (NCBI Organelle Genome Resources). However, the taxon sampling of these sequences is highly uneven. For the two major eudicot clades, rosids have 75 complete plastomes available and asterids have only 36 (as of December 2012). In terms of the order-level lineages recognized in the Angiosperm Phylogeny Group (APG) classification system III [3], only five out of the 14 asterid groups have completed plastomes. In addition, several of the complete asterid plastomes were sampled from multiple species of the same genus (e.g., Nicotiana, Solanum) or even subspecies of a single species (Olea europaea). Furthermore, five of the 36 available asterid plastomes were sampled from parasitic lineages (Epifagus virginiana and Cuscuta spp.), which have undergone genome reduction and exhibit accelerated sequence divergence [4,5].
To improve our understanding of plastome evolution and to expand taxon sampling in asterids, we chose the coral berry Ardisia polysticta Miq. for whole-plastome sequencing in this study. Ardisia is a member of the basal asterid order Ericales, which is the sister group to all euasterids [6]. It is one of the largest genera in the family Myrsinaceae [7] (or included in Primulaceae based on APG III [3]), estimated to have nearly 500 species distributed in the paleotropical and neotropical regions [8]. Fruits and other parts of the plant bodies are consumed for their nutritional values in Asia [9,10,11]. Additionally, many species are commonly used in traditional Chinese medicine to treat symptoms such as coughing and diarrhea [11]. Phytochemical studies have shown various medicinal properties of this genus, including antioxidant [12], anti-HIV [13], and anti-tumor [14] effects. Compounds with biological activities have also been identified from Ardisia, such as ardisicrenosides [15], ardisiaquinones [16], and ardisiphenols [17], indicating the potential for development of novel phytopharmaceuticals [18]. In addition to the nutritional and medicinal values, Ardisia also includes many well-known ornamental species cultivated worldwide (e.g., A. japonica, A. crispa, A. squamulosa, A. escallonioides). Among them, A. crenata has the longest history of cultivation, nearly 200 years since its first description as an ornamental [19]. It has also attracted great attention for being an invasive species in the USA [20]. The species chosen in this study, A. polysticta, is closely related to A. crenata according to molecular phylogenetic inference [21]. Both species are widely distributed, with A. polysticta mainly in Southeast Asia [22] and A. crenata in East Asia [8]. Due to their morphological similarities, misidentification between these two species is a common problem [23].
In addition to offering a basal asterid reference for plastome comparisons within asterids, the complete Ardisia plastome will also be important for future studies on the plastid biology, plastid engineering, and phylogenetics of Ardisia and related genera. Plastids are the compartments for one of the two synthesis pathways of isopentenyl diphosphate in plants, which is converted into isoprenoids, steroids, terpenoids, and other compounds [24]. Many of the biologically active compounds isolated from Ardisia are saponins, which are glycoside derivatives of steroids or terpenoids [25] and are synthesized within plastids [26]. A fully sequenced plastome not only adds to our knowledge of Ardisia plastids, but also facilitates development of plastid genetic engineering in Ardisia, which could be used to increase the production of biologically active metabolites synthesized within plastids. Plastid transformation also has several advantages over nuclear transformation, including polycistronic gene expression, higher expression levels, and transgene containment due to lack of pollen transmission [27].
As a valuable resource for evolutionary analyses, the completely sequenced plastome could facilitate phylogenetic studies at lower taxonomic levels. Phylogenetic analysis using the trnL-trnF region, one of the most popular plastome markers for molecular phylogenetics, resulted in a largely polytomous tree of 12 Ardisia species [28]. To resolve interspecific relationships in the speciose genus Ardisia, the complete plastome can provide a reference for designing Ardisia-specific primers that amplify fast evolving regions reported for other angiosperms. Furthermore, it is known that different plastome regions show variable rates of evolution across plant taxa [29] and it is difficult to find a set of markers applicable to a wide range of plant lineages [30]. A solution to this problem would be to use the complete plastome sequence to identify Ardisia-specific fast evolving regions. In addition to resolving interspecific relationships, these markers could also be used for the identification of Ardisia species, which is difficult and causes confusion to researchers studying their medicinal usages [18]. Furthermore, the simple sequence repeats (SSRs) in plastomes can be used for evolutionary and ecological studies at the levels of cultivars, populations, and closely related species [31]. For example, the SSR markers can be used to track the population histories of the closely related A. polysticta and A. crenata, which share much of their distribution range. These SSR markers can also supplement previous studies on the expansion history of invasive A. crenata populations such as that by Niu et al. [20], which used the largely invariable trnL-trnF as the only plastome marker. Due to their maternal inheritance, both the Ardisia-specific fast evolving regions and SSRs in the plastome could also assist in the characterization, parent identification, and selection of new cultivars of Ardisia ornamentals.
In this study, we determined the complete plastome sequence of A. polysticta and characterized its genome structure, gene content, and other characteristics such as repetitive sequences. Through comparative analysis with other asterid plastomes based on a phylogenetic framework, we aim to investigate the evolutionary history of plastomes in this major angiosperm clade. Furthermore, we examined the divergence level between Ardisia and euasterids in plastome intergenic regions to identify a list of molecular markers that can facilitate future phylogenetic studies.

Sequencing and Assembly
Fresh leaves were collected from A. polysticta at Yuanyang Valley, Hsinchu County, Taiwan. The voucher specimen (Ku028) was deposited in the National Taiwan University Herbarium (TAI). A. polysticta is not an endangered or protected species in Taiwan. According to the regulations of the Forestry Bureau (Council of Agriculture, Taiwan), no specific permits were required for collection of non-protected species at Yuanyang Valley because this location is not part of a nature reserve or national park.
For DNA extraction, 1.6 g of leaves was ground using a ceramic mortar and pestle set with 15 mL PBS. The suspension was filtered through 100 μm filters and centrifuged at 1,200 g to remove uncrushed tissues and intact plant cells. The supernatant was then centrifuged at 16,000 g to pellet subcellular parts, from which DNA was extracted using the Tri-Plant Genomic DNA Reagent Kit (Geneaid, Taipei, Taiwan). A paired-end library was prepared from the DNA sample and sequenced using the HiSeq 2000 platform (Illumina, USA) by a commercial sequencing service provider (Yourgene, Taiwan). The Illumina sequencing technology was chosen because it is more accurate for sequencing homopolymers compared with Roche 454 platforms [32] and has been shown to work well for other plastomes [33,34]. As many plastome SSRs are mononucleotide repeats with variable lengths in different haplotypes [31], it is important to accurately sequence these motifs. Approximately 224 million paired-end reads of 101 bp were obtained, with an average insert size of 251 bp. The raw reads were quality trimmed at the first position from the 5′ end that has a quality score lower than 20. Reads shorter than 70 bp after the quality trimming were discarded.
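The trimming rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the function names `phred33` and `trim_read` are hypothetical, and the Phred+33 quality encoding is an assumption about the FASTQ files.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, ASSUMING Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def trim_read(seq, quals, min_q=20, min_len=70):
    """Cut a read at the first base (scanning from the 5' end) whose quality
    is below min_q; return None if fewer than min_len bases remain."""
    cut = len(seq)
    for i, q in enumerate(quals):
        if q < min_q:
            cut = i
            break
    if cut < min_len:
        return None  # read is discarded
    return seq[:cut], quals[:cut]
```

With the thresholds stated in the text (quality 20, minimum length 70 bp), a 101 bp read whose first low-quality base is at position 81 would be kept as an 80 bp read, whereas one whose first low-quality base is at position 61 would be discarded.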
For the de novo genome assembly, we used Velvet 1.2.07 [35]. The assembly parameters were set to k = 55, expected coverage = 1,500X, maximum coverage = 7,500X, and coverage cutoff = 300X based on our iterative optimization tests. To distinguish the scaffolds of plastid origin from those of nuclear or mitochondrial origin, we used BLAST [36,37] similarity searches against the NCBI nr database [38] to identify scaffolds that encode plastid genes. Three large scaffolds that contain approximately 129 kb of unique sequence in the A. polysticta plastome were identified in the initial draft assembly. Primer walking and additional Sanger sequencing were then used to fill the gaps within and between these scaffolds and to validate the regions with possible assembly artifacts. The final complete plastome sequence was further checked by using BWA [39] for mapping all Illumina reads and IGV [40] for visual inspections.

Annotation and Genome Map Drawing
The online automatic annotator DOGMA [41] was used to annotate the A. polysticta plastome. BLAST against other plastomes was also used to verify questionable regions in the DOGMA draft annotation. For tRNA genes, the annotations were also confirmed using tRNAscan-SE [42]. The annotations exported from DOGMA were compared with those of other plastomes and manually curated. The genome map and positions of repetitive sequences (see below) were drawn with the help of OGDRAW [43] and GenomeVx [44].

Genome Analyses
To obtain a comprehensive overview of asterid plastome evolution, we compared the A. polysticta plastome with other available asterid plastomes (Table S1) with respect to GC content, genome organization, and content of repetitive sequences. Because the intergenic regions are the most variable parts in plastomes [45,46], we calculated the sequence divergence between A. polysticta and representative euasterids to find regions of potential phylogenetic utility for Ardisia at lower taxonomic levels. To avoid biases in mutation rate in the selected euasterid plastome, two plastomes with similar gene content and gene order to those of the A. polysticta plastome were chosen: Sesamum indicum (euasterids I) and Panax ginseng (euasterids II). There are a total of 126 intergenic regions in the A. polysticta plastome (the 5′ and 3′ portions of rps12 are considered different genic regions), of which 16 are IR duplicates. The 110 unique regions were parsed out from the three genomes using custom Perl scripts, aligned using MUSCLE [47] with the default settings, and analyzed using the DNADIST program in the PHYLIP package [48] to calculate the sequence divergence.
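The idea of ranking aligned intergenic regions by pairwise divergence can be illustrated with a short Python sketch. It is not a reimplementation of DNADIST, whose default substitution model is more elaborate; the simpler Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), is swapped in here for illustration. All function names are hypothetical, and the input is assumed to be a pair of already aligned sequences per region.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap or ambiguous base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance (an illustrative stand-in model)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return float("inf")  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def rank_regions(alignments):
    """alignments: {region name: (aligned seq A, aligned seq B)}.
    Returns (region, distance) pairs, most divergent first."""
    dists = {r: jc69_distance(a, b) for r, (a, b) in alignments.items()}
    return sorted(dists.items(), key=lambda kv: kv[1], reverse=True)
```

Taking the top and bottom of such a ranking for each of the two pairwise comparisons, and then intersecting the lists, mirrors the way the most conserved and most divergent regions are reported in the Results.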
For characterization of repetitive sequences, the program Msatfinder v2.0 [49] was used to find SSRs in the plastomes of A. polysticta and other asterids by setting the minimum number of repeats to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively. For tandem and dispersed repeats, the program REPuter [50] was used to identify these elements with a repeat unit of at least 26 bp and sequence identity greater than 90%.
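The SSR thresholds listed above can be expressed as a small regular-expression search. The sketch below is only an approximation of what a dedicated tool such as Msatfinder does; the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical, and edge cases such as overlapping repeats of different unit sizes are handled naively.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """False if the motif is itself a repetition of a shorter unit
    (e.g. 'AA' or 'ATAT'), so that each SSR is reported only once."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)
```

Under these thresholds every reported SSR is at least 10 bp long (for example, ten mononucleotide units or five dinucleotide units), which matches the length cutoff used in the Results.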

Phylogenetic Analysis
We used plastome genes to reconstruct a phylogeny of asterids with completed plastomes (Table S1). Holoparasitic or hemiparasitic taxa that were previously reported to have accelerated evolutionary rates in plastomes [4,5] were excluded from our analysis to avoid problems in phylogenetic reconstruction. In addition, Parthenium argentatum (Asteraceae) was also excluded due to the inconsistency between the number of protein-coding genes reported in the original study (85; Kumar et al. [51]) and the annotation found in the GenBank entry (56; accession number NC_013553). It is possible that the exclusive use of 454 reads in the assembly of this genome [51] has produced many frameshift artifacts. To avoid overrepresentation of certain genera or families, we reconstructed another tree in which only one plastome was included for each genus and at most two plastomes from different genera for each family. The protein-coding and rRNA genes were parsed from the selected plastomes of asterids and outgroups and clustered into ortholog groups using OrthoMCL [52]. We then examined the presence/absence of orthologous genes in each plastome. To confirm gene absence, we used the genic sequences of A. polysticta as the queries to run BLAST searches against the plastome in question. False negative results due to misannotation (e.g., ycf1 in Lactuca, rps19 in Boea, infA in Olea) were manually corrected to increase the number of usable genes for phylogenetic inference. In total, we included 74 protein-coding and four rRNA genes that are present in all of the plastomes analyzed. The sequences were aligned with MUSCLE with the default settings and concatenated into a single alignment of 77,976 characters. From this alignment, a maximum likelihood phylogeny was inferred using PhyML [53] with the GTR+I+G model and six substitution rate categories. Nodal supports were estimated from 1,000 bootstrap [54] samples of the alignment generated by the SEQBOOT program of PHYLIP.
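The nonparametric bootstrap used for nodal support resamples alignment columns with replacement. The following Python sketch shows the resampling step and how a support value is derived from replicate trees; it is a conceptual illustration rather than the SEQBOOT implementation, and the function names are hypothetical. Tree inference on each pseudoreplicate is outside its scope.

```python
import random


def bootstrap_columns(alignment, rng):
    """alignment: {taxon: aligned sequence}. Returns one pseudoreplicate
    built by sampling alignment columns with replacement."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n = lengths.pop()
    cols = [rng.randrange(n) for _ in range(n)]
    # The same sampled columns are applied to every taxon.
    return {t: "".join(seq[c] for c in cols) for t, seq in alignment.items()}


def support(replicate_has_clade):
    """Bootstrap support (%): share of replicate trees containing the clade."""
    flags = list(replicate_has_clade)
    return 100.0 * sum(flags) / len(flags)
```

For example, a clade recovered in 560 of 1,000 replicate trees would receive a support value of 56%.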

Genome Organization and GC Content
The complete plastome of A. polysticta (deposited in GenBank under the accession number KC465962) has a total length of 156,506 bp (Figure 1). It has a pair of inverted repeats (IRa and IRb) of 26,050 bp that separate a large single copy (LSC) region of 86,078 bp and a small single copy (SSC) region of 18,328 bp (Table 1). The genic regions account for 58.3% of the genome, including 86 protein-coding (50.7%), eight rRNA (5.8%), and 37 tRNA genes (1.8%) (Table S2). Six tRNA and ten protein-coding genes contain one intron and two protein-coding genes (ycf3 and clpP) have two introns, while the remaining genes are intronless. The rps12 gene, as in Nicotiana [55], consists of a 5′ portion (exon 1) in LSC and a 3′ portion (exons 2 and 3) in IR. The GC content of the whole plastome is 37.07%, with the IRs having a higher GC content (43.01%) than those of LSC (34.94%) and SSC (30.17%) due to the presence of GC-rich rRNA genes (Figure 1).
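The figures reported above are mutually consistent, which can be checked with a few lines of Python. The sketch below uses only the numbers given in the text; the helper `gc_content` is a hypothetical illustration of how a per-region GC percentage could be computed from sequence.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Region lengths (bp) and GC contents (%) as reported in the text.
LSC, SSC, IR = 86_078, 18_328, 26_050
GC = {"LSC": 34.94, "SSC": 30.17, "IR": 43.01}

# The plastome consists of LSC + SSC + two copies of the IR.
total = LSC + SSC + 2 * IR

# The whole-genome GC content is the length-weighted mean of the regions.
weighted_gc = (GC["LSC"] * LSC + GC["SSC"] * SSC + GC["IR"] * 2 * IR) / total
```

The sum of the region lengths reproduces the total of 156,506 bp, and the length-weighted mean of the regional GC contents agrees with the reported whole-plastome value of 37.07% to within rounding.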
It is notable that the plastome sequence of the basal asterid A. polysticta has the second lowest GC content among all reported asterid plastomes (Table S1). The asterid plastome with the lowest GC content found so far is that of Epifagus virginiana, a holoparasitic plant with the second smallest land plant plastome reported to date (70,028 bp) [56]. Because of the mutational bias of GC-to-AT substitutions [57,58], it is not surprising that the Epifagus plastome, characterized by accelerated evolution and extensive reduction [5,56], has a GC content below the norm of asterids. This effect is also evident in the hemiparasitic genus Cuscuta, where the two more reduced plastomes (C. gronovii, C. obtusiflora) have lower GC contents than those of the two other less reduced plastomes. When Epifagus is not considered, GC contents of asterid plastomes fall within the range from 37.07% to 38.33%, which is relatively narrow compared with either rosids (33.97-39.61%) or monocots (36.65-39.01%) [2]. Additionally, the GC contents show little within-genus variation (Nicotiana: 37.79-37.88%, Solanum: 37.86-37.88%, Olea: 37.79-37.81%), indicating different lineages have specific ranges of GC contents. When compared with the outgroup Spinacia (Caryophyllales, 36.82%), there is a trend toward increased plastome GC content from the outgroup to the basal asterid and then to euasterids.

Divergence of Intergenic Regions
To investigate the variation of sequence divergence rates among intergenic regions, we compared the A. polysticta sequences with those of Panax ginseng and Sesamum indicum. Subsequently, we identified the 20 most conserved and the 20 most divergent regions in these two comparisons. Among the most conserved regions, the two pairwise comparisons shared 17 homologous regions (Table 2). Twelve of these regions are in IRs, which is consistent with the observation that IRs are more conserved than LSC and SSC [45,46,59]. The other five regions are relatively short (<60 bp) and are located within polycistronic transcription units [60,61]. The high levels of sequence conservation in these regions may be explained by selective constraint that stemmed from their roles in splicing.
Among the most divergent regions, 16 were shared between the two pairwise comparisons. Surprisingly, several plastome markers commonly used for molecular phylogenetics of asterids at low taxonomic levels, such as regions between trnT-UGU and trnF-GAA [20,28,62,63] and between atpB and rbcL [63,64,65], are not included in this list. To resolve interspecific relationships in a speciose genus such as Ardisia, suitable markers should be variable and, at the same time, encompass a region of adequate length, so that there will be sufficient characters. Therefore, we suggest that the intergenic regions of over 500 bp are markers of potential phylogenetic utility for Ardisia, as highlighted in Table 2 and Figure 1. Other regions may also contain useful information for phylogenetic analyses, but their utility is limited by the short sequence lengths. For example, the trnH-GUG-psbA spacer region is frequently used for phylogenetic analyses, but has an average length of only 465 bp and thus is often too short to yield a well-resolved phylogeny [66]. All of the seven highlighted regions are located in SC regions, with two in the region between trnF-GAA and trnV-UAC in LSC, three between trnK-UUU and trnG-UCC in LSC and the other two between ndhF and trnL-UAG in SSC (Figure 1). Among them, several were found to be highly variable in other studies: three in comparison between Helianthus and Lactuca [30] and four in comparison among olive cultivars [67]. Five of them were also found in the list of nine intergenic regions recommended for angiosperm molecular phylogenetics at low taxonomic levels [68], further indicating the potential of these regions for species-level phylogenetics of Ardisia. It is notable that the longest of the 16 regions, rps16-trnQ-UUG, has a length of only 429 bp in Arabidopsis and only 407 bp in Spinacia, which has a sister relationship with asterids [3]. In asterids, its length is mostly in the range between 800 and 1,300 bp, and is over 1,700 bp in three lineages distributed in euasterids I (Oleaceae), euasterids II (Panax) and basal asterids (Ardisia). This length variation suggests that this region evolves relatively fast in asterids. In addition, its length in A. polysticta is shorter than 1,800 bp and thus could be sequenced with a single PCR run and Sanger sequencing with primers at both ends. In light of these observations, the rps16-trnQ-UUG spacer appears to be the best candidate marker for resolving interspecific relationships in Ardisia.

Figure 1 (caption fragment). [...] (Table 2) are indicated by hollow triangles outside the circle (numbered from more to less divergent). Numbers and locations of repetitive sequences (Table 4) are drawn on the four inner circles (from inside: dispersed direct repeats (forward triangles), inverted repeats (forward and reversed triangles), tandem repeats (tandem triangles), palindromic sequences and a sequence that matches its reversed sequence (hexagrams)). doi:10.1371/journal.pone.0062548.g001

Simple Sequence Repeats (SSRs)
There are 57 SSRs with a length of at least 10 bp in the A. polysticta plastome, including 45 mono-, four di-, seven tetra-, and one pentanucleotide repeats (Table 3). No trinucleotides or hexanucleotides were found. Most (43/45) of the mononucleotides consist of A or T and all of the dinucleotides are AT or TA repeats, which is consistent with the AT-richness of the plastome. We also screened the other asterid plastomes for SSRs with a length of at least 10 bp (Table S1). The number of SSRs in each asterid plastome ranges from 27 in Boea to 75 in Crithmum. It is quite surprising that the largest number of plastome SSRs found in asterids is much smaller than the number of SSRs in the rosid Arabidopsis thaliana (104) or in the monocot Dioscorea elephantipes (95; NC_009601; 152,609 bp, 37.15% GC).
At the genus or tribe level, the number of SSRs per plastome shows little variation: 59-63 in Nicotiana spp., 53-56 in Solanum spp., 51-53 in the Guizotia-Helianthus-Parthenium clade of Asteraceae [69]. The range is slightly larger in Olea (57-70), but the one with the fewest SSRs (O. europaea ssp. cuspidata) actually has seven mononucleotides just below the 10 bp cutoff, which are only one or two bp shorter than the counterparts in O. europaea ssp. maroccana. Among the Cuscuta species, the differences could be attributed to their different levels of plastome reduction. This is consistent with previous findings that SSR abundance is positively correlated with plastome size [70,71]. In contrast with the conservation in the number of SSRs in closely related taxa, the number of SSRs differs considerably from one family to another. In the order Apiales, its range is 60-75 in Apiaceae, but only 37-46 in Araliaceae. In Lamiales, it is 57-70 in Oleaceae, but only 35 in Sesamum (Pedaliaceae) and 27 in Boea (Gesneriaceae). To determine if there are any shared SSRs in asterid plastomes, the SSR positions in the A. polysticta plastome were compared with those in Helianthus annuus, Panax ginseng, Solanum lycopersicum, Boea hygrometrica, Olea europaea cv. Bianchera and Coffea arabica plastomes. There is no SSR position common to all of these asterid plastomes. Two SSR positions are found in all but the Helianthus plastome. They are T homopolymers in rpoC2 and atpB, corresponding to conserved lysine residues. Although SSRs in protein-coding regions tend to be conserved across lineages, they only represent a small portion of all plastome SSRs (14/57 in A. polysticta) and are unlikely to change in length due to the selection on maintaining reading frames. The higher evolutionary rates of noncoding regions create different sets of SSRs in different lineages that are more likely to be variable among haplotypes. This explains why the number of plastome SSRs changes dramatically from family to family and underscores the importance of a whole-plastome reference for SSR identification in related taxa.

Long Repetitive Sequences
Ten sets of repetitive sequences that are 26 bp or longer were found in the A. polysticta plastome (Table 4; Figure 1). They were further divided into five categories based on the structure, including (1) tandem repeats, (2) dispersed direct repeats, (3) inverted repeats, (4) palindromic sequences, and (5) sequences that match their own reversed sequences. This five-type classification system is different from the seven-class system used by Timme et al. [30] in that we excluded SSRs (which are more abundant and were considered separately; see the previous section) and did not separate repeats in genic or intergenic regions into distinct categories (the numbers of long repeats were too few to warrant such detailed classification).
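The five structural categories can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the REPuter procedure: "palindromic" is taken to mean that a sequence equals its own reverse complement, a tandem repeat is assumed to be a direct repeat whose second copy starts immediately after the first, and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    s = seq.upper()
    return s == reverse_complement(s)


def matches_own_reverse(seq):
    """True if the sequence reads the same when simply reversed."""
    s = seq.upper()
    return s == s[::-1]


def classify_pair(pos1, pos2, length, inverted):
    """Classify a two-copy repeat from the start positions of its copies.
    ASSUMPTION: 'tandem' means the copies are directly adjacent."""
    if inverted:
        return "inverted"
    return "tandem" if abs(pos2 - pos1) == length else "dispersed direct"
```

For instance, GAATTC is palindromic in the reverse-complement sense, whereas ACGTTGCA matches its own reversed sequence without being palindromic.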
To investigate the evolution of these long repetitive sequences, we examined other asterids and outgroups (Table S1) for regions similar to the consensus sequences of the ten sets found in A. polysticta. Four sets of repetitive sequences were found to be conserved in asterids, Spinacia, and Arabidopsis: Nos. 1, 3, 7, and 9 (Table 4). The first three sets are found in all asterids except some parasitic taxa due to deletion or pseudogenization of certain genes (ycf3, trnV-GAC, ndhA, psaA, psaB and trnS-GGA in Epifagus [56], ndhA in Cuscuta spp. [4]). Two of these sets (Nos. 3 and 7) are similar portions of different photosystem I subunit genes (No. 3) or trnS genes (No. 7). Set No. 9 consists of a single palindromic sequence found in all asterids but Cuscuta spp., Jasminum, and Trachelium, probably because of high divergence levels of ycf2 in these lineages [72]. Set No. 1 merits special attention because it has the longest consensus sequence (42 bp) among the ten sets and has been identified previously [30,45,67]. Additionally, this repeat was found in all four regions of asterid plastomes (i.e., LSC, SSC, IRa, and IRb). In contrast to the conserved repeats, sets Nos. 2, 4, 6, 8 and 10 were absent in Arabidopsis, Spinacia and most asterids. Three of them (i.e., Nos. 4, 8, and 10) are located in intergenic regions and the other two in the fast evolving genes accD (No. 6) and ycf2 (No. 2), which ranked as the third and the sixth most divergent genes in 17 tracheophyte plastomes [73]. The higher evolutionary rate explains why these repetitive sequences are more lineage-specific.
The remaining repeat (i.e., No. 5) has a more intriguing distribution across asterid lineages. It is absent in Coffea, Boea, Sesamum, Convolvulaceae and Apiaceae, but present in Oleaceae, Solanaceae, Araliaceae and four asteraceous genera. It consists of a perfect palindromic sequence in A. polysticta, but, in other asterids, it corresponds to a stretch of imperfect (except for Solanum lycopersicum) palindromic sequence capable of forming a stem-loop structure (Table S3). Additionally, the loop sequence differs among genera and even among species in Solanum. As this sequence is generally found near the 3′ end of atpH and in the middle of a single transcription unit from rps2 to atpA [61], it may play a role in the gene expression processes.

Boundaries between Inverted Repeats and Single-copy Regions
To obtain a comprehensive overview of the asterid IR/SC boundary organizations, we compared the A. polysticta plastome with all available complete plastomes of nonparasitic euasterids. The asterid plastomes can be divided into seven types according to the extent of IR at the junctions between LSC and IRb (JLB), between SSC and IRb (JSB) and between SSC and IRa (JSA) (Figure 2; Table S4). Type I, represented by A. polysticta, has JLB within rps19 or rps19-rpl2, JSB within ndhF or trnN-GUU-ndhF, and JSA within ycf1. This is the most common type in asterids, other eudicots [2] and the basal angiosperm Amborella [74]. Given its presence in the basal asterid A. polysticta and the outgroup Spinacia, this Type I organization probably represents the ancestral state of asterids. Because gene conversions occur constantly at the IR/SC boundaries, causing small expansions or contractions of IR even in within-genus comparison [75], Type I can be further divided into four subtypes. The subtyping is mainly based on whether JLB is within rps19-rpl2 or rps19 and whether JSB is within trnN-GUU-ndhF or ndhF. Among the subtypes, Type I-2, where JLB is within rps19 and JSB is within trnN-GUU-ndhF, is the most common in euasterid plastomes and is found in all five euasterid orders (Table S4; Figure S1). If we consider this subtype as ancestral for euasterids, it could be inferred that several transitions have occurred from Type I-2 to other subtypes: Type I-3 in Olea spp., Type I-4 in a subclade of Nicotiana, and Type I-1 in Boea, Hydrocotyle, and Solanum tuberosum (Figure S1).
In comparison with Type I, other types of IR/SC organizations in asterids include those with substantial IR expansion (II, III, IV), IR contraction (III, V), or large rearrangements (VI, VII). The different types are also characterized by lengthening/shortening of IRs, LSC and SSC. For instance, the Ipomoea plastome (Type III) has a longer LSC and a shorter SSC, which are indicative of IR contraction at JLB and expansion at JSA, respectively. The events of IR expansion/contraction do not seem to have an apparent pattern in euasterids. Closely related taxa, such as Petroselinum and Crithmum, could change in the opposite directions at the same IR/SC boundary (Figure 2; Figure S1). In contrast to some seed plants [33,76,77], there seems to be no significant IR expansion/contraction event at JLA (junction between LSC and IRa) in the evolutionary history of the asterid plastomes.

Phylogenetic Analysis of Asterid Plastomes
We reconstructed the phylogenetic relationships of representative asterid taxa and two outgroups using 78 orthologous genes shared by all plastomes. The phylogeny shown in Figure 3 contains 16 ingroups that best represent asterid diversity (see Materials and Methods for the over-representation issue of certain genera). Additionally, a more comprehensive sampling that includes all usable nonparasitic euasterid plastomes is shown in Figure S1. In terms of order- or family-level diversity, these are by far the most comprehensive phylogenetic analyses of asterids based on complete plastomes. The topology was not affected by the taxon sampling. Both maximum likelihood trees strongly support the basal position of Ericales within asterids and the subdivision of euasterids into euasterids I (Gentianales, Lamiales, Solanales) and euasterids II (Apiales, Asterales). It should be noted that all the nodes in the whole-plastome tree of 16 asterids received 100% support in the ML bootstrap analyses, except for the one grouping Gentianales and Lamiales (Figure 3). In the latest APG classification system, Lamiales is more closely related to Solanales than either is to Gentianales [3], which is consistent with a phylogeny of 111 taxa based on three plastid protein-coding genes and three plastid non-coding regions [6]. However, many other phylogenetic analyses with more extensive taxon sampling and/or based on more molecular markers resulted in topologies different from that of the APG III system. In the evolutionary tree based on four mitochondrial genes, the relationships among the major euasterids I orders are largely unresolved [78]. In contrast, Gentianales is the sister group to Solanales in the phylogenetic trees inferred from 77 nuclear genes [79], a combination of one nuclear and three plastid genes [80], or 17 regions from all three genomes [81].
Phylogenetic trees based on whole-plastome data show a sister relationship of Gentianales either with Solanales [82] or with Lamiales [45,83]. As these three studies used almost all protein-coding and rRNA genes in an angiosperm plastome, the discrepancy is best explained by the differences in taxon sampling. Whereas the parasitic taxa Epifagus and Cuscuta were included in Moore et al. [82], Jansen et al. [83] and Yi and Kim [45] included multiple plastomes from a single genus (Olea, Nicotiana, Solanum) or even a single species (Olea europaea) in their analyses. Our exclusion of the parasitic lineages, which are likely to cause long-branch attraction problems due to their accelerated evolutionary rates, resulted in the grouping of Gentianales and Lamiales (Figures 3 and S1). Although reducing the overrepresentation of certain taxa improves the bootstrap value of this clade (56 vs. 49; cf. Figures 3 and S1), the support is still relatively weak. Examination of the bootstrap trees showed that the clustering of Gentianales and Solanales represents the best supported alternative. This result indicates that the present dataset could not resolve the relationships within euasterids I. Further analyses with complete plastome sequences from other euasterids I and more extensive taxon sampling within Gentianales would be needed to resolve the relationships.
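To illustrate how the bootstrap replicates can be examined for competing groupings, the following minimal Python sketch tallies the percentage of replicate trees that contain each clade. It is not the procedure used in this study: the nested-tuple tree encoding, the function names (clades, clade_support), and the ten toy replicates are all assumptions made for illustration, and the resulting percentages are not the values reported above.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set, list of clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):          # a leaf is just a taxon name
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)               # the clade defined by this internal node
    return leaves, found

def clade_support(replicates):
    """Percentage of replicate trees in which each clade appears."""
    counts = Counter()
    for tree in replicates:
        _, found = clades(tree)
        counts.update(set(found))
    return {clade: 100.0 * n / len(replicates) for clade, n in counts.items()}

# Hypothetical replicates (NOT data from this study): three competing
# resolutions of the euasterids I orders.
G, L, S = "Gentianales", "Lamiales", "Solanales"
replicates = [((G, L), S)] * 5 + [((G, S), L)] * 4 + [((L, S), G)] * 1

support = clade_support(replicates)
for pair in [(G, L), (G, S), (L, S)]:
    print(" + ".join(pair), support[frozenset(pair)])
```

In this toy set the grouping with the highest percentage plays the role of the clade shown on the tree, and the runner-up corresponds to the best supported alternative.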

Gene Content Evolution
Compared with the gene content of the A. polysticta plastome, six protein-coding genes have been differentially lost in asterid lineages (Table 5). Three of these have also been pseudogenized or deleted outside of the asterids: infA (lost in many rosids and other angiosperm lineages [84]), rpl23 (Spinacia [85] and several gymnosperms [33]), and accD (many Poales families [86]). Both accD and rpl23 were found to be essential for Nicotiana [87,88], and rpl23 was suggested to be replaced by a nuclear homologue in Spinacia [85]. The infA gene was found to have been independently transferred to and expressed in the nucleus with a transit peptide in lineages without an intact plastome infA (including Solanum lycopersicum) [84]. These findings suggest that the loss of these genes in different asterid lineages might indicate independent functional replacement by a nuclear copy.
Mapping of gene loss events onto the plastome phylogeny (Figure 3; Figure S1) shows that many losses occurred in lineages with plastomes characterized by large inversions (Trachelium [72], Jasminum [89]) or IR contraction/expansion (Ipomoea, Jasminum; Figure 2; Table S4). Specifically, the Trachelium plastome, which has been extensively rearranged [72], has lost all six genes (Table 5). Moreover, these lineages have longer branches compared to their sister groups (Figure 3; Figure S1) and they have the smallest SSC and the largest LSC among asterids (Figure 2). This is intriguing because the opposite would be expected due to the higher evolutionary rate of genes in the SSC than in the LSC [45,46]. More studies are needed to investigate such association of higher evolutionary rate, changes in plastome organization, and gene loss across a larger range of angiosperm lineages.
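The idea of placing gene losses on a tree with as few events as possible can be sketched with Fitch parsimony, shown below as a minimal Python example. This is an illustration under stated assumptions, not the mapping procedure used for Figure S1: the simplified six-taxon topology, the function name fitch, and the presence/absence pattern of the placeholder gene are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested tuples.

    Returns (possible ancestral states, minimum number of state changes).
    """
    if isinstance(tree, str):                  # leaf: observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                                 # children agree: no extra change
        return common, left_cost + right_cost
    # children disagree: one change is required on a branch below this node
    return left_set | right_set, left_cost + right_cost + 1

# Simplified, hypothetical topology using genera named in the text.
tree = ("Ardisia",
        (("Jasminum", ("Ipomoea", "Nicotiana")),
         ("Panax", "Trachelium")))

# Made-up presence/absence pattern for a placeholder gene (NOT from Table 5).
gene_x = {
    "Ardisia": "present", "Jasminum": "absent", "Ipomoea": "present",
    "Nicotiana": "present", "Panax": "present", "Trachelium": "absent",
}

root_states, n_changes = fitch(tree, gene_x)
print(root_states, n_changes)
```

With this pattern the reconstruction infers the gene as present at the root and requires two changes, which is consistent with two independent losses on the terminal branches leading to the taxa that lack the gene.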

Evolution of ycf15
The hypothetical gene ycf15 has been lost six times in the asterid phylogeny (Figure S1). Based on nucleotide sequence similarity, two regions separated by an intervening sequence of 250-300 bp in plastomes of several basal angiosperms, monocots, and non-asterid eudicots correspond to the 5′ (1-154) and 3′ (155-264) portions of the Nicotiana ycf15 [70,90]. However, the intervening sequence was shown to be absent in a few asterids, including Epifagus virginiana, Cuscuta reflexa, Panax ginseng, and two other solanaceous genera, Atropa and Solanum [70,90]. In this study, we further confirmed the absence of the intervening sequence in other complete asterid plastomes, including those from Apiaceae, Lamiales, and the basal asterid Ardisia, thus placing its loss after the divergence of Caryophyllales and before the Ericales-euasterids split. Amplification of ycf15 from Spinacia cDNA showed that it was transcribed, but the intervening sequence was not removed in the RNA transcript [90]. Since premature stop codons in the intervening sequence would result in truncated protein products without the region homologous to the Nicotiana ycf15 3′ portion, this led to the suggestion that ycf15 was probably not a protein-coding gene [70,90]. However, even in asterid plastomes where a continuous region homologous to the Nicotiana ycf15 occurs, frameshift indels are found in the ycf15 3′ portion of non-Solanaceae asterids, resulting in premature stop codons in Lamiales and Ardisia or an extended but dissimilar 3′ portion in Eleutherococcus and Panax (Figure S1; Figure S2). The high length and sequence variability of the 3′ portion suggests that it plays a minor role in the function of ycf15. Compared to the 3′ portion, the 5′ portion is largely invariable and no frameshift mutation has been observed based on available plastome sequences.
Alignment of the amino acid sequences also shows that there is a conserved ycf15 region that corresponds to the central region of ycf15 in asterids and the 5′ half in non-asterids (Figure S2). If ycf15 is indeed expressed as polypeptides, this region would probably assume the main functional role.

Table 5. Loss and gain of plastome protein-coding genes relative to Ardisia polysticta in nonparasitic euasterids.
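The effect of a non-triplet indel on the reading frame, as described for the 3′ portion of ycf15, can be illustrated with the short Python sketch below. The sequences are toy examples and not real ycf15 sequences, and the function names (first_in_frame_stop, is_frameshift) are hypothetical; the only biological facts assumed are the three stop codons TAA, TAG, and TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_in_frame_stop(seq):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

def is_frameshift(indel_length):
    """An indel shifts the reading frame when its length is not a multiple of 3."""
    return indel_length % 3 != 0

# Toy reading frame (NOT a real ycf15 sequence): the stop is the eighth codon.
reference = "ATGGCTTTAGATCGTTGGAAATAA"

# Delete a single base (index 4); a 1-bp deletion is a non-triplet indel.
mutant = reference[:4] + reference[5:]

print(first_in_frame_stop(reference))  # codon index of the original stop
print(first_in_frame_stop(mutant))     # earlier, premature stop after the frameshift
print(is_frameshift(len(reference) - len(mutant)))
```

In this toy case the single-base deletion moves the first in-frame stop from the end of the reading frame to the third codon, which is the kind of truncation a frameshift indel can produce. A frameshift may equally remove the original stop and extend the reading frame, as described above for Eleutherococcus and Panax.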

Conclusions
The complete plastome sequence of the basal asterid Ardisia polysticta was obtained using Illumina technology and Sanger sequencing. The Ardisia plastome has the gene content and organization typical of asterids and most angiosperms. By comparing the divergence levels of intergenic regions between Ardisia and euasterids, we found candidate regions of potential phylogenetic utility for this speciose genus. Comparative analyses using the Ardisia plastome as a reference shed light on the characteristics and diversity of asterid plastomes with respect to GC content, plastome organization, gene content, and repetitive sequences. Phylogenetic analysis based on complete plastomes highlights uncertainty in the position of Gentianales within euasterids I, which merits further study.

Supporting Information
Figure S1 Maximum likelihood phylogeny of 78 plastome genes from 35 nonparasitic asterids (11 families, 6 orders). All nodes, except where noted, received 100% bootstrap support. Gene loss events are mapped onto the tree in the most parsimonious way (Table 5). Types of inverted repeat/single copy boundary organization are also indicated (Table S4). All euasterid taxa, except where noted, have Type I-2 plastomes. Indel events within the 3′ portion of ycf15 (Figure S2B) are also mapped onto the tree. (TIF)
Figure S2 Alignment of ycf15. A. Alignment of ycf15 amino acid sequences in Calycanthus, Arabidopsis, Spinacia, and asterids. The arrow indicates the divide between the 5′ and 3′ portions of ycf15 in Nicotiana tabacum; the regions homologous to these portions are separated by a 250-300 bp intervening sequence in non-asterid angiosperms. B. Alignment of asterid ycf15 nucleotide sequences corresponding to the boxed region of Nicotiana tabacum in A. In-frame stop codons are boxed. Arrows indicate the six non-triplet indels. (TIF)
Table S1 Accession numbers of complete plastome sequences of asterids and of those included in the phylogenetic tree in Figure 3