Understanding evolution in Poales: Insights from Eriocaulaceae plastome

In this study, we report the plastome of Eriocaulon decemflorum (Eriocaulaceae) and make an effort to understand the genome evolution, structural rearrangements and gene content of the order Poales by comparing it with other available plastomes. The size of complete E. decemflorum plastome is 151,671 bp with an LSC (81,477bp), SSC (17,180bp) and a pair of IRs (26,507 bp). The plastome exhibits GC content of 35.8% and 134 protein-coding genes with 19 genes duplicated in the IR region. The Eriocaulaceae plastome is characterized by the presence of accD, ycf1 and ycf2 genes and presence of introns in clpP and rpoC1 genes which have been lost in the Graminid plastomes. Phylogenomic analysis based on 81 protein-coding genes placed Eriocaulaceae sister to Mayacaceae. The present study enhances our understanding of the evolution of Poales by analyzing the plastome data from the order.


Introduction
The order Poales contains 15 families [1] and over 20,000 species, representing about onethird of monocots [2]. The order also includes many economically important crops, such as rice (Oryza sativa L.), wheat (Triticum aestivum L.), maize (Zea mays L.), millets, bamboo and lots of ecologically important species that dominate modern Savanna and Steppe vegetation [2]. Poales can be simplified into five major groups viz. Bromeliads, Cyperids, Xyrids, Restiids, and Graminids [2,3,4]. The order has been studied for genome evolution, and ancient polyploidy events wherein transcriptome data was generated for representatives of each clade of the order [5,6]. As far as the phylogenomic studies are concerned, most of the studies are available for the Graminids, focusing on Poaceae because of its ecological, evolutionary, and economic importance [7,8]. Also, amongst Poales, the highest number (i.e., 396) of plastid genomes have been generated for the family Poaceae (https://www.ncbi.nlm.nih.gov/genome/ ). The plastomes of Poaceae have undergone several evolutionary events, such as inversions (28 kb inversion between trnG-UCC and rps14 region, <1kb in the trnT sequence and a 6 kb in the trnG-UCC), complete loss of genes (accD, ycf1 and ycf2) and intron losses in the genes clpP, rpoC1 [7,8,9,10,11]. However, the genome information for other families from Graminids is sparse. Other than Poaceae, plastome sequence is available only for Joinvilleaceae [12]. Also, very few plastomes are available from Bromeliads (2 published, one unpublished) and a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 Cyperids (3, Cyperaceae) (S1 Fig). No chloroplast genome has been sequenced for any member of Restiids until now. Besides, these major groups, no attempt has been made to understand the gene content, structural rearrangements, and genome evolution of order Poales as a whole.
The family Eriocaulaceae belongs to Xyrids of Poales and is sister to family Xyridaceae [2,3,4]. The family consists of ten genera and ca. 1400 species which are distributed throughout the tropics [13,14,15] and the family can be easily distinguished by characteristic capitulum or head inflorescence [16]. Ruhland [17] classified the family into two subfamilies Eriocauloideae and Paepalanthoideae comprising of two and eight genera, respectively. The members of Eriocaulaceae inhabit a variety of habitats like marshy or aquatic to terrestrial and xeric habitats. Moreover, they also comprise of both annuals and perennials [18]. Eriocaulon L. (subfamily Eriocauloideae) is the largest genus of Eriocaulaceae and exhibits cosmopolitan distribution [13,18,19]. Taxonomy of this genus has remained a challenge for taxonomists due to high intraspecific variations and limited interspecific differences [20,21,22]. Several studies have been conducted to understand relationships between the family Eriocaulaceae, including both morphological and molecular techniques [13][14][15][18][19][23][24][25]. However, all these studies include a wider sampling of the subfamily Paepalanthoideae but very few from Eriocauloideae. Molecular studies mainly included nuclear and chloroplast markers such as ITS, trnL-F, and psbA-trnH intergenic spacer [13][14][15]19,25,26]. Diaz Pena [26], for the first time, included plastome sequences to understand phylogeny and biogeography of the genus Paepalanthus subg. Platycaulon. However, the study did not mention accession numbers for the plastomes. Also, one Eriocaulon plastome (E. sexangulare L., MK193813), first for the genus, has been reported recently [27] but not yet available in public database. In spite of the availability of these plastome sequences, no attempts were made to understand the gene content, structural rearrangements, and genome evolution in the family concerning the evolution of order Poales.
In China, the genus Eriocaulon is represented by 35 species, 13 of which are endemic [28]. Some of the species have considerable use in Traditional Chinese Medicine [29][30][31]. Eriocaulon decemflorum Maxim. an important medicinal plant is distributed from China to Japan, Korea, and the Far East of Russia [28,[32][33][34]. The species occurs in rice fields, marshy places, and mountain slopes at an altitude of 1600-1700 m [28]. The species has also been tested for its antibacterial activity against Staphylococcus aureus and Pseudomonas aeruginosa [35]. As per the recent assessment, the species has been listed as vulnerable in China [36]. In the present study, the assembly, annotation, and analyses of complete plastome of E. decemflorum Maxim. is reported. Attempts were also made, for the first time, to understand the position, structural arrangements, and evolution within Poales with the insights received from the Eriocaulon plastome.

Sampling, DNA extraction, and sequencing
Fresh leaf samples of Eriocaulon decemflorum were collected from Mt. Dayang, Jinyun County, Zhejiang Province, China (August 2017, Voucher No. X.L. Xie 170189). Voucher specimens were deposited at the herbarium of Zhejiang University (HZU). The total DNA was extracted using Plant DNAzol Reagent (LifeFeng, Shanghai) according to the manufacturer's protocol from approximately 20 mg of the silica-dried leaf tissue. The high molecular weight DNA was sheared (yielding � 800 bp fragments) and the quality of fragmentation was checked on an Agilent Bioanalyzer 2100 (Agilent Technologies). The short-insert (500 bp) paired-end libraries preparation and sequencing were performed by the Beijing Genomics Institute (Shenzhen, China). The sample was pooled with others and run in a single lane of an Illumina HiSeq X10 with a read length of 150 bp.

Assembly, annotation and comparative analyses
The quality of reads was checked using software FastQC v. 0.11.7. [37]. Adapters and ends were trimmed with Cutadapt 1.16 [38], a Linux based software and Trimmomatic v 0.38 was used to filter the raw reads and to get high-quality clean reads [39]. De-novo genome assembly was carried out with curated reads using the software NOVOPlasty v.2.7.1 [40]. Forward and reverse reads with a read length of 150 bp and an average insert size of 300 bp were used for assembly. The default k-mer value of 39 was given in the configuration file. The seed input was an rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase) sequence of Eriocaulon compressum Lam. (EU832954). Since no reference plastome exists for any species of Eriocaulaceae, contigs could not be scaffolded by an automated method. Four contigs were produced after assembly. The contigs were then extended by mapping reads and other assembled contigs in Geneious Prime 2019.1.1 (www.geneious.com) until perfect overlap of at least 20 base pairs (bp) with other contigs or reads was obtained. This was repeated until the quadripartite plastome structure was completed. The orientation of IRs, LSC, and SSC regions was further confirmed by NCBI blast and graphic view. Genome annotation was performed with DOGMA [41] and using GeSeq-Annotation of Organellar Genome [42], an online tool of CHLORO-BOX (https://chlorobox.mpimp-golm.mpg.de/geseq.html). For tRNAs prediction, additional tools such as ARAGORN v1.2.38 and tRNAscan-SE v2.0 were used. Sequences of Typha latifolia L. and Ananas comosus (L.) Merr. from Bromeliads were used as the references for annotation. The circular map of plastid genome was constructed by using OGDRAW [43]. The annotation was confirmed again with Geneious prime 2019.1.1 (www.geneious.com).

Phylogenomic analyses
The phylogenetic tree was constructed using 81 Coding DNA sequences (CDS) of the plastid genome. Most of the analyses were performed using the CIPRES Science Gateway [47]. The sequences were aligned using MAFFT v7.402 [48]. Maximum Likelihood (ML) analyses were performed using IQ-TREE v. 1.6.7 [49] using GTR+F+R4 model. Ingroup consisted of 57 taxa in total belonging to Bromeliaceae (1), Typhaceae (1), Eriocaulaceae (1), Cyperaceae (3), Joinvilleaceae (1) and Poaceae (50, representing all subfamilies). Data for 19 taxa available from the study of Givnish et al. [5] was also included to have a representation of all families of the order. The outgroup was composed of ten taxa belonging to Zingiberales (S1 Table)

Genome assembly
The illumina sequencing generated 99,54,908 paired end reads. Both untrimmed and trimmed reads generated a similar number of contigs after assembly. The average organelle coverage was 20X. The largest contig scored 91.93% of total organelle genome.

Genome organization and features
The plastome of Eriocaulon decemflorum exhibits a typical quadripartite structure, with an LSC (81,477bp), SSC (17,180bp) and a pair of IRs (26,507bp) (Fig 1, Table 1). The size of the complete plastome is 151,671bp (Fig 1, Table 1). The GC content of the whole plastome is 35.8%. GC contents of LSC, SSC and IR regions are 32.6%, 27.8%, and 43.2%, respectively. IR region exhibited more GC content. Higher GC content in the IR region is due to high GC content in the rRNA genes. IR region exhibits four rRNA genes containing 52.9% of GC content.
In the genome of E. decemflorum, a total of 134 genes was predicted, including 83 proteincoding genes, 31 tRNA genes, 4 rRNA genes duplicated in the IR region. List of genes is presented in Table 2. 19 genes are duplicated in IR and 16 genes contain introns, which include 10 protein-coding genes and 6 tRNAs.

Repeat and SSR analyses
Plastome of E. decemflorum contains 21 forward, 17 palindromic, one complement and one reverse repeat ( Table 3). Size of the repeats ranged from 30 to 150. Simple sequence repeats (SSRs) are another important type of repeats in the plastome used as a genetic marker because of their length polymorphism [50]. In total 48 SSRs were found in the genome of E. decemflorum including 11 mono, 12 di, three tri, 11 tetra, two hexa, and nine compound repeats. Comparison of several repeats identified in seven other genomes of Poales and one outgroup is presented in Fig 2. In LSC, SSC and IR regions 35, 9 and 2 SSRs were found respectively. All the SSRS found in E. decemflorum were AT-rich. The highest number of SSRs were found in Carex neurocarpa (Cyperaceae).

Comparative plastomic analyses
Among the nine compared genomes, Anomochloa marantoidea (Poaceae) has the smallest plastome (138,412 bp) while Carex siderosticta has the largest plastome (195,251 bp). When all eight genomes were compared with Eriocaulon decemflorum annotation as a reference. gene order and content were found to be conserved (Fig 3).
Eriocaulon decemflorum plastome exhibited two copies of the ycf1 gene (one partial and one full length), which have been lost in the Graminids. The full-length ycf1 gene has three introns in E. decemflorum. However, a functional ycf2 gene is present in E. decemflorum, which also has been lost in the Graminids. In Bromeliads too, ycf1 and ycf2 genes are partially degraded [8]. Several indels have been reported in ycf1/2 regions between Ananas and Musa [51].
Evolution of accD gene. The accD gene encodes one of the four subunits of acetyl co-A carboxylase enzyme required for the formation of malonyl-CoA from acetyl CoA, in the first step of fatty acid synthesis [52,53]. Its absence or partial degradation in some monocots (mostly in order Poales and family Acoraceae) is known [54]. Even though the gene is lost from plastome, a multifunctional nuclear-encoded enzyme is present in some monocot species [55,56]. Moreover, this region between rbcL and psaI is considered as a hotspot as it exhibits higher rates of mutations [57,58]. Katiyama and Ogihara [54] predicted the loss of the accD gene before the divergence of Poales and Commelinales. However, Konishi et al. [59] noted the presence of accD in Cyperids and Xyrids and hence proposed that the loss occurred later after Cyperid and Xyrid divergence. Harris et al. [60] however predicted loss of accD after the splitting of Eriocaulaceae and Xyridaceae. Eriocaulon decemflorum plastome exhibited the functional copy of the accD gene. In Bromeliads, partial degradation of accD was reported [8]. However, in Musa, accD is much longer as compared to Bromeliads [51]. Several studies have confirmed the presence of the accD gene in Cyperaceae [57,59,60]; however, sequences deposited on NCBI database lack accD gene, probably unannotated. No information is available for Restiid clade. Our results corroborate with those of Harris et al. [60], which supported the theory of gene loss after the Eriocaulaceae and Xyridaceae splitting.   Loss of introns. rpoC1 encodes for the β 0 subunit of RNA polymerase and consists of a single intron in most of the land plants. However, loss of rpoC1 introns has also been reported in several lineages [61]. Katayama and Ogihara [54] noticed a loss of rpoC1 introns in all the members of Poaceae and Restiid clade. However, Morris and Duvall [11] reported the presence of rpoC1 intron in Anomochloa (Anomochloideae), one of the basal member of Poaceae. rpoC1 intron has also been reported in the Bromeliads [7,8], and our study confirms the same in Eriocaulon, which is a member of Xyrids. However, further studies are required to trace the point of rpoC1 intron loss in Poales. The other protein-coding gene clpP of Eriocaulon decemflorum has maintained its two introns. The introns have also been reported for Bromeliad members [7,8,51] while they have been lost in the Graminids. Annotations provided for three Cyperaceae members on NCBI dataset, do not exhibit introns for both the genes. However, when we annotated these genomes keeping one of the three as a reference, both the genes exhibited the presence of introns.
The gene order between Xyrids and Bromeliads appears to be conserved ( Fig 4A) and between Xyrids and Graminids is characterized by three major inversions namely, 28-kb inversion between the trnG-UCC-rps14 region, a 6-kb in the trnG-UCC-psbD region and the third in trnT and flanking region (Fig 4B). Two inversions were observed between Eriocaulon and Hypolytrum in LSC (44000-55000 bp region) and SSC (around 135000 bp region). The variations observed between Xyrids and Cyperids could be due to large genome size and longer inverted repeats reported from Cyperid genomes (S2 Fig).

Contraction and expansion of IRs
The IRs in the plastomes are divided by four junctions viz. IRb/LSC, IRb/SSC, IRa/LSC, and IRa/SSC. The contraction and expansion of IR regions differ in various plant species. Such variation has already been observed in members of Poales [8,51]. All nine genomes were compared for their IR boundaries (Fig 5). All the compared genomes have expanded IRb/LSC and IRa/LSC to add both trnH-GUG and rps19 to the IR region. The extent of IR expansion into the intergenic spacers between rps19 and rpl22 varies from 15  Eriocaulaceae plastome exhibits rps15 and ndhH genes in the IR, which is characteristic to all grasses [7]. The Cyperaceae members exhibit ndhG gene at the IRb/SSC boundary. Poales have ndhF gene in SSC region at IRb/SSC junction ranging from 5 to 398 bp away from the junction. Only in Eriocaulon, it has 1 bp in the IR region. At the IRa/SSC junction, bromeliads, xyrids, and Musa have ycf1 gene while the graminids have the ndhH gene. The Cyperids have ndhE and ndhG genes at this junction.

Phylogenomic analyses
The data matrix used for phylogenetic reconstruction was composed of 87 taxa, 77 belonging to Poales (representing 14 families) and 10 from Zingiberales as outgroup. ML analysis using IQTREE resulted in a tree having lnL of -613657.791. Eriocaulon and Syngonanthus appeared to be sisters with bootstrap value = 100 (Fig 6). Eriocaulaceae appeared sister to Mayacaceae with bootstrap value 86. However, Xyridaceae (Abolboda) appeared sister to the Restiid-Graminid clade which was in accordance with Han et al. [27]. Givnish et al. [5] tried to trace evolutionary history of the order based on plastome protein coding genes using both maximum parsimony (MP) and ML methods. MP analysis yielded Xyrids (Eriocaulaceae, Xyridaceae and Mayacaceae) as monophyletic with moderate bootstrap support. However, ML analysis resulted in Xyridaceae as sister to Restiid-Graminid clade and Mayacaceae and Eriocaulaceae appeared as sisters with strong bootstrap support. Recently, Mckain et al. [6] attempted to study evolutionary history as well as ancient polyploidy of Poales. They found that Eriocaulaceae (Lachnocaulon) and Xyridaceae (Xyris) were sisters but with very low support, and Mayacaeae (Mayaca) was not included in the analysis. Results obtained in our study are in accordance with the study of Givnish et al. [5] and Han et al. [27] where Eriocaulaceae appeared sister to Mayacaceae. Earlier studies have reported Xyrid clade as the most ambiguous clade in terms of its phylogenetic relationships [2][3][4][5][6]. In some studies, Xyridaceae and Eriocaulaceae were reported as sister families [3,4] while some suggested sister relationship of Eriocaulaceae and Mayacaceae [5,27]. However, the inclusion of more plastomes from all the three families will help in resolving relationships within this clade.

Conclusion
In the last few years, plastomes have been widely used to study phylogeny and evolution in different plant groups, as well as for reconstructing the ancestral states of angiosperms. Important advances have also been made in our understanding of the relationship within the monocots [62]. Studies based on plastome data have shown that orchids and grasses together form a monophyletic group nested within the remaining angiosperms [63]. The present study enhances our understanding of the evolution of Poales by analyzing the plastome data from the order. Understanding relationships within Eriocaulaceae has always been difficult due to minute floral characters [18]. Hybridization events have also been reported for the family [36,64]. No attempts have been made to resolve species relationships and to understand evolutionary events, though Eriocaulon is the only wide-spread genus of the family. Deletion of genes like accD, ycf1, ycf2 and intron losses in clpP and rpoC1 genes are characteristic to graminids and were not found in other groups of Poales, i.e., Bromeliads and Cyperids. Our study shows that Eriocaulon plastome exhibits the presence of accD, ycf1, and ycf2 genes, and also clpP and rpoC1 introns similar to Bromeliads. ycf1 is highly variable in terms of phylogenetic information at the level of species and has been shown to be subject to positive selection in many plant lineages [65]. In the present phylogenomic analysis, Eriocaulaceae is sister to Mayacaceae, which is in accordance with the previous study of Givnish et al. [5] and Han et al. [27]. However, the inclusion of more plastomes from Xyrids will further resolve the relationships between Xyridaceae, Mayacaceae, and Eriocaulaceae and will also help to understand evolution within Poales.  Table. NCBI accession numbers of the genomes included in phylogenomic analysis. (XLSX)