Antarctic hairgrass (Deschampsia antarctica Desv.) is the only native grass species in the maritime Antarctic. It has been studied as an important ecological marker and as an extremophile plant for research on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system.
The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5′- or 3′-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions.
We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast transcriptome.
Citation: Lee J, Kang Y, Shin SC, Park H, Lee H (2014) Combined Analysis of the Chloroplast Genome and Transcriptome of the Antarctic Vascular Plant Deschampsia antarctica Desv. PLoS ONE 9(3): e92501. https://doi.org/10.1371/journal.pone.0092501
Editor: Szabolcs Semsey, Niels Bohr Institute, Denmark
Received: December 22, 2013; Accepted: February 22, 2014; Published: March 19, 2014
Copyright: © 2014 Lee et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by Functional Genomics on Polar Organisms grant (PE13020) and the Basic Research Program (PE13120) funded by Korea Polar Research Institute (KOPRI). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Chloroplasts are plant-specific organelles that conduct photosynthesis, providing essential energy for the synthesis of starch, fatty acids, pigments, and amino acids. Chloroplasts contain their own DNA and genetic information. In higher plants, chloroplast genomes exist as circular DNA molecules ranging in size from 120 kb to 150 kb and generally have a highly conserved quadripartite organization, in which two copies of an inverted repeat (IR) separate the large single-copy (LSC) and small single-copy (SSC) regions. In vascular plants, chloroplast genomes usually contain 110–130 unique genes encoding 4 rRNAs, 30–31 tRNAs, and 80–90 proteins; the proteins include ribosomal proteins and RNA polymerase subunits involved in transcription and translation, thylakoid proteins and the Rubisco large subunit for photosynthesis, and subunits of an NADH dehydrogenase complex, which mediates redox reactions. Advances in high-throughput sequencing technologies have yielded the full sequences of organelle genomes from a growing number of organisms. Currently, plastid genome resources with >420 records have been established. These provide a vast amount of high-resolution information that can be exploited in phylogenetic and ecological studies, making it possible to track the evolutionary history of a species after obtaining the full sequence of its chloroplast genome.
The grass family (Poaceae), which occurs in nearly every terrestrial habitat, is one of the most diverse angiosperm families, including approximately 10,000 species in over 700 genera. To date, 38 chloroplast genomes of grass species [32 from the BEP (Bambusoideae, Ehrhartoideae, Pooideae) clade and 6 from the PACMAD (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae) clade] have been deposited into the GenBank database, and recent studies have tried to reconstruct the phylogeny of the subfamilies and genera in the Poaceae family using whole sequences of chloroplast genomes.
Extremophile plants have evolved tolerance to unfavorable environmental conditions, such as freezing temperatures, drought, high salinity, and high UV radiation. The genetic information of such species provides clues to their evolutionary or geological history, as well as resources for genetic engineering. Antarctic hairgrass (Deschampsia antarctica Desv.) is the only native grass species that thrives in the harsh environment of Antarctica. As an extremophile, it may be useful as a source of genes associated with stress tolerance. It has also been suggested as an ecological marker of global warming because of its successful adaptation to climate change and its rapid spread. Despite the importance of this terrestrially isolated plant, its phylogenetic position is still controversial, and available genetic resources are limited.
Here, we obtained the complete chloroplast genome sequence of D. antarctica by high-throughput sequencing and de novo assembly. By comparison with the chloroplast genomes from other representative members of the BEP clade, we explored the deep-phylogenetic relationship of D. antarctica to other grass species at the genomic level. In addition, using combinatorial analysis of the RNA-seq data, we conducted high-resolution mapping of the chloroplast-derived transcripts to a reference chloroplast genome to demonstrate transcriptome profiles of the coding and noncoding genes and the posttranscriptional processing by RNA editing in the chloroplasts of D. antarctica. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family and the characteristics of the chloroplast transcriptome.
This study, including sample collection and the experimental research conducted on these materials, was carried out in accordance with the law on activities and environmental protection in Antarctica, as approved by the Minister of Foreign Affairs and Trade of the Republic of Korea.
Deschampsia antarctica Desv. (Poaceae) plants growing under natural conditions were collected in the vicinity of the Korean King Sejong Antarctic Station (62°14′29″S, 58°44′18″W) on the Barton Peninsula of King George Island. They were then transferred to the lab and grown hydroponically in 0.5× Murashige and Skoog (MS) medium containing 2% sucrose under a 16:8 h light:dark cycle with a light intensity of 150 μmol m⁻² s⁻¹ at 15°C, a temperature that results in high Rubisco activity in D. antarctica.
DNA and RNA Sequencing
Total genomic DNA was extracted from leaf tissues using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. Total RNA was extracted from whole plants using the RNeasy Plant Mini Kit (Qiagen). For the small noncoding RNA library, total RNA was extracted from leaves using the mirVana Kit (Ambion, Austin, TX, USA). The quality of the RNA and DNA was checked on a Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA). The libraries were prepared and sequenced according to the manufacturer's instructions (Illumina, San Diego, CA, USA). The DNA library was constructed using TruSeq DNA sample preparation kits and sequenced in a single lane of an Illumina HiSeq2000 sequencer (PE, 2×101 bp). For the mRNA library, multiplex libraries were obtained using TruSeq RNA sample preparation kits, and the samples were sequenced in one lane of an Illumina HiSeq2000 sequencer (PE, 2×101 bp). The small RNA library was constructed using the TruSeq Small RNA Sample Prep Kit; the resulting single-end library was sequenced in one lane of an Illumina GAIIX sequencer (SE, 1×35 bp). The files containing the sequences and quality scores of the reads were deposited in the NCBI Short Read Archive under accession numbers SRX465632 (genomic DNA-Seq), SRX465633 (mRNA-Seq), and SRX465634 (Small RNA-Seq).
Genome Assembly, Annotation, and Sequence Analysis
After trimming of low-quality reads and adapters, the reads were aligned to 330 publicly available chloroplast genomes downloaded from the NCBI organelle genome resources. De novo assembly of the collected chloroplast-related reads was performed with Celera Assembler 6.1 (Celera Genomics, Alameda, USA). The assembled contigs were ordered against the reference chloroplast genomes of two related grass species, Lolium multiflorum (NC_019651) and Festuca altissima (JX871939), which were identified as the top-hit species when the input reads were searched against the nr database using BLAST. The gaps were filled by realignment of the input reads using Geneious R6 v6.1.5 (Biomatters Ltd., Auckland, New Zealand) and by PCR-based Sanger sequencing using primers designed for the gap-flanking regions (Table S1). The sequences of the junctions and highly variable regions were validated by Sanger sequencing. The complete plastome was annotated using the online software DOGMA with default parameters. Repeat sequences were analyzed using REPuter.
Complete plastome sequences of nine Poaceae species (accession numbers are listed in Table S2) were aligned using the LAGAN program within the mVISTA online suite of computational tools. Default parameters were applied, and the annotation framework of the perennial ryegrass chloroplast genome was used. The percentage identity of each plastome, relative to that of D. antarctica, was subsequently visualized using an mVISTA plot. The plastome-based phylogeny was reconstructed for the nine Poaceae species using the whole plastome alignment generated by LAGAN. The phylogenetic tree was constructed by maximum parsimony, as implemented in MEGA 5.2. Sites with gaps or missing data were excluded from the analysis, and statistical support was assessed by bootstrapping with 1000 replicates.
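The exclusion of sites with gaps or missing data, and the percent identity underlying an identity plot, can be illustrated with a minimal Python sketch. It is not the LAGAN, mVISTA, or MEGA implementation; the function names and the toy alignment are assumptions made only for illustration.

```python
def drop_gapped_columns(aligned):
    """Keep only alignment columns in which every sequence has A, C, G, or T."""
    valid = set("ACGT")
    columns = zip(*(seq.upper() for seq in aligned))
    kept = [col for col in columns if all(base in valid for base in col)]
    if not kept:
        return ["" for _ in aligned]
    # Transpose the kept columns back into one string per sequence
    return ["".join(bases) for bases in zip(*kept)]


def percent_identity(a, b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy alignment (hypothetical): column 5 contains a gap, column 8 contains an N
toy_alignment = ["ACGT-ACGTA", "ACGTTACNTA", "ACCTTACGTA"]
clean = drop_gapped_columns(toy_alignment)
identity_1_vs_3 = percent_identity(clean[0], clean[2])
```

In this toy case two of the ten columns are discarded, and identity is then computed over the remaining eight sites.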
Transcriptome and Small Noncoding RNA Analysis
We analyzed in-house RNA-seq libraries generated from two sets of RNAs (mRNA and small RNA), obtained as described above. For transcriptome analysis, we analyzed the combined data sets of mRNAs and small RNAs. The reads of the combined data sets were mapped to the complete chloroplast genome using the Bowtie 2.0 program, allowing ≤2 mismatches, and the mapped reads were retained. The retained reads were then remapped according to the genome annotation, using TopHat for alignment of transcript variants and Cufflinks to calculate the fragments per kilobase of exon per million fragments mapped (FPKM) values of the transcripts. For small noncoding RNA analysis, we collected the reads in the size range of 20–24 nt from the small RNA data set. The size-filtered reads were mapped using Bowtie 2.0 with zero mismatches allowed. To search for RNA-editing sites in the chloroplast genome, putative target sites were predicted using two independent methods: 1) the PREP-chloroplast search program, using the chloroplast genome sequence, and 2) SAMtools/BCFtools, which calls single-nucleotide polymorphisms (SNPs) and indels by comparing transcripts against a reference. After prediction, the candidate sites were manually examined in the transcriptome data using the Integrative Genomics Viewer (IGV) genome browser.
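For readers unfamiliar with the FPKM unit, the basic normalization can be sketched as follows. This is the textbook definition only; Cufflinks estimates abundances with a more elaborate statistical model, and the input values below are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments: fragments assigned to the gene
    length_bp: exonic length of the gene in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 1,000 bp gene in a library of 1,000,000 fragments
example_fpkm = fpkm(500, 1000, 1_000_000)
```

Because the value is scaled by both gene length and library size, short genes with modest read counts can still rank among the most highly expressed.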
Chloroplast Genome Assembly and Validation
Illumina paired-end sequencing produced 153,346,825 raw reads with a sequence length of 101 bp, totaling 15,488,029,325 bases. After quality trimming and alignment of the raw reads against the publicly available chloroplast genomes reported in NCBI, we collected 1,985,544 chloroplast-related paired reads with 191,735,269 bases. The subsequent de novo assembly resulted in 18 large contigs >3 kb (max: 50,269 bp, min: 3,046 bp). To order the contigs, the chloroplast genomes of L. multiflorum and F. altissima were used as references because these species were identified as the top-hit species when the input reads were searched against the nr database using BLAST. The resulting gaps were filled by alignment of the input reads using the Geneious program and by PCR-based Sanger sequencing. The sequences of the junction regions (LSC–IRA, LSC–IRB, SSC–IRA, SSC–IRB) and the regions with high interspecific variability were validated by Sanger sequencing. The final D. antarctica chloroplast genome sequence has been submitted to GenBank (Accession No. KF887484).
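The read statistics above can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names are ours.

```python
RAW_READS = 153_346_825
READ_LENGTH_BP = 101
RAW_BASES_REPORTED = 15_488_029_325
CP_READS = 1_985_544       # chloroplast-related reads after trimming
CP_BASES = 191_735_269

# Total bases implied by read count and read length
raw_bases_computed = RAW_READS * READ_LENGTH_BP

# Share of the data set that is chloroplast-related, by reads and by bases
cp_read_fraction = CP_READS / RAW_READS
cp_base_fraction = CP_BASES / RAW_BASES_REPORTED

# Mean length of the retained chloroplast reads (shorter than 101 bp after trimming)
mean_cp_read_length = CP_BASES / CP_READS

print(f"chloroplast share: {cp_read_fraction:.2%} of reads, {cp_base_fraction:.2%} of bases")
print(f"mean trimmed read length: {mean_cp_read_length:.1f} bp")
```

The computed total matches the reported base count exactly, and the chloroplast-related share comes to a little over 1% of the data by either measure.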
Genome Organization and Gene Content
The size of the D. antarctica chloroplast genome was 135,362 bp, within the range reported for other Poaceae species, with a typical quadripartite structure (Figure 1). The LSC and SSC regions were 79,881 bp and 12,519 bp in size, respectively, separated by a pair of inverted repeats (IRa and IRb), each 21,481 bp in length. The GC content of the D. antarctica chloroplast genome was 38.3%, consistent with other reported Poaceae chloroplast genomes. The GC contents of the LSC and SSC regions were 36.3% and 32.4%, respectively, whereas that of the IR region was 43.85%.
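The reported region sizes and GC contents are internally consistent, as the following short check shows. It uses only values stated in the text and assumes the IR GC content applies equally to both IR copies.

```python
LSC_BP, SSC_BP, IR_BP = 79_881, 12_519, 21_481
GC_LSC, GC_SSC, GC_IR = 36.3, 32.4, 43.85  # percent

# Quadripartite structure: LSC + SSC + two IR copies
genome_bp = LSC_BP + SSC_BP + 2 * IR_BP

# Length-weighted average GC content across the four regions
genome_gc = (LSC_BP * GC_LSC + SSC_BP * GC_SSC + 2 * IR_BP * GC_IR) / genome_bp

print(genome_bp, round(genome_gc, 1))
```

The higher GC content of the IR, which carries the rRNA genes, pulls the genome-wide average above the values of both single-copy regions.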
Genes lying outside of the outer circle are transcribed clockwise, while those inside the circle are transcribed counterclockwise. Genes belonging to different functional groups are color coded. The innermost darker gray corresponds to GC, while the lighter gray corresponds to AT content. IR, inverted repeat; LSC, large single copy region; SSC, small single copy region.
The D. antarctica chloroplast genome contained 81 unique protein-coding genes, 12 of which were duplicated in the IR, including rps7, rps12, rps15, rps19, rpl2, rpl23, ycf1, ycf2, ycf15, ycf68, ndhB, and partial ndhH. Additionally, 29 unique tRNA genes, representing all 20 amino acids, were distributed throughout the genome (1 in the SSC region, 20 in the LSC region, and 8 in the IR region). Four rRNA genes were also identified, with complete duplication in the IR regions. Altogether, the D. antarctica chloroplast genome contained 114 unique genes (Table 1). Among them, 14 genes contained a single intron (9 protein-coding genes and 5 tRNA genes), while ycf3 contained two introns. Of the 15 genes with introns, 10 were located in the LSC (7 protein-coding genes and 3 tRNAs; 9 contained one intron and 1 contained two introns), 1 in the SSC (a protein-coding gene with a single intron), and 4 in the IR region (2 protein coding genes and 2 tRNAs, all 4 containing a single intron) (Table 2). The rps12 gene is a trans-spliced gene with a 5′-end exon located in the LSC region and duplicated 3′-end exons located in the IR region. The trnK-UUU gene contained the largest intron (2,486 bp), which included the matK gene.
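The gene and intron counts in this paragraph can be tallied as a consistency check. The sketch below restates the numbers from the text; the dictionary layout is our own.

```python
# Unique gene counts as reported in the text
unique_genes = {"protein_coding": 81, "tRNA": 29, "rRNA": 4}
total_unique = sum(unique_genes.values())

# Distribution of the unique tRNA genes
trna_by_region = {"LSC": 20, "SSC": 1, "IR": 8}

# Intron-containing genes per region as (protein-coding, tRNA)
intron_genes = {"LSC": (7, 3), "SSC": (1, 0), "IR": (2, 2)}
intron_protein = sum(p for p, _ in intron_genes.values())
intron_trna = sum(t for _, t in intron_genes.values())
intron_total = intron_protein + intron_trna

# 9 single-intron protein-coding genes plus ycf3 (two introns)
single_intron_protein = intron_protein - 1
```

The regional breakdown gives 10 intron-containing protein-coding genes, that is, the 9 single-intron genes plus ycf3, together with the 5 intron-containing tRNA genes.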
On the basis of the sequences of protein-coding genes and tRNA genes within the chloroplast genome, the frequency of codon usage was deduced (Table 3). Among these codons, 2,466 (11.22%) encode leucine and 321 (1.46%) encode cysteine, the most and least frequently used amino acids, respectively. Codon usage is biased toward A and T at the third codon position, consistent with a previous report.
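A codon usage table of this kind can be derived by counting in-frame triplets. The following is a minimal sketch using the standard genetic code; the function names and the toy coding sequence are illustrative assumptions, not the procedure used to produce Table 3.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds):
    """Count in-frame codons of a coding sequence (trailing partial codon ignored)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def amino_acid_usage(codon_counts):
    """Aggregate codon counts by encoded amino acid."""
    usage = Counter()
    for codon, n in codon_counts.items():
        usage[CODON_TABLE[codon]] += n
    return usage


def third_position_at_fraction(codon_counts):
    """Fraction of codons ending in A or T."""
    total = sum(codon_counts.values())
    at = sum(n for codon, n in codon_counts.items() if codon[2] in "AT")
    return at / total if total else 0.0


# Toy coding sequence (hypothetical): ATG TTA TTA CTT TGT TAA
toy_counts = codon_usage("ATGTTATTACTTTGTTAA")
toy_amino = amino_acid_usage(toy_counts)
toy_at3 = third_position_at_fraction(toy_counts)
```

Applied to the concatenated coding sequences of a plastome, the same counts give both the per-amino-acid percentages and the A/T bias at the third position.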
Comparison with Other Poaceae Chloroplast Genomes
The availability of multiple complete Poaceae chloroplast genomes provides an opportunity to compare sequence variation within the family at the genome level. The sequence identity of seven Poaceae chloroplast genomes was plotted using the mVISTA program, with the annotation of D. antarctica as a reference (Figure 2; percent identity plot, summarized in Table S3). The whole-genome alignment indicates that the Poaceae chloroplast genomes are highly conserved overall, although some divergent regions were found. As in other plant species, the coding regions are more conserved than the noncoding regions. Of all genes, the ycf1 pseudogene appears to be the most divergent. In addition, rpl32, ycf2, and rpoC2 also displayed high sequence divergence. Within the noncoding sequences, several intergenic regions displayed high divergence, including trnG(UCC)-trnfM(CAU), trnY(GUA)-trnD(GUC), ndhF-rpl32, and rpl32-trnL(UAG). In addition, the intron sequences of trnK(UUU), trnL(UAA), and ndhA showed high sequence divergence.
The top line shows genes in order (transcriptional direction indicated by arrows). The sequence similarity of the aligned regions between Deschampsia antarctica and the other seven species is shown as horizontal bars indicating the average percent identity between 50% and 100% (shown on the y-axis of the graph). The x-axis represents the coordinate in the chloroplast genome. Genome regions are color coded as protein-coding (exon), tRNA or rRNA, and conserved noncoding sequences (CNS).
Length variation was also examined among D. antarctica and the eight Poaceae chloroplast genomes. The most interesting region with length variation was the rbcL-psaI region, which contains four gene regions and three intergenic regions (Figure 3). Variation in gene content was detected as the presence or absence of an rpl23 translocation product and an accD pseudogene in the region between rbcL and psaI. The rpl23 gene was absent from L. perenne, F. arundinacea, and Brachypodium distachyon, and was present in the five other analyzed Poaceae species, including D. antarctica. Remnants of the accD gene were detected in D. antarctica, L. perenne, F. arundinacea, and Hordeum vulgare. This pseudogene was identified in rice but was not predicted in the other species according to DOGMA. Variation in the size of the intergenic regions was also detected among species of the Pooideae subfamily. Three intergenic regions occurred between the rbcL and psaI genes. The intergenic region between rbcL and rpl23 ranged from 288 bp (D. antarctica) to 498 bp (Triticum aestivum). Between rpl23 and accD, it ranged from 0 bp (B. distachyon) to 661 bp (H. vulgare), and between accD and psaI, it ranged from 141 bp (B. distachyon) to 392 bp (Agrostis stolonifera). When a particular gene was absent, the boundaries of the intergenic regions were determined based on homologies between the species.
The genes and intergenic regions between rbcL and psaI are indicated by boxes, with the length presented in bp. (Lp: Lolium perenne, Fa: Festuca arundinacea, As: Agrostis stolonifera, Hv: Hordeum vulgare, Ta: Triticum aestivum, Bd: Brachypodium distachyon, Os: Oryza sativa subsp. japonica).
Phylogenomic analysis of representatives from the Pooideae subfamily, including D. antarctica, produced a single, well-supported tree using maximum parsimony (Figure 4). The tree is congruent with the established relationships among these species, and the two outgroup species belonging to the BEP clade (Bambusa oldhamii from Bambusoideae and Oryza sativa subsp. japonica from Ehrhartoideae) are basal to the remaining species, which form a separate resolved clade.
Repeat Sequence Analysis
Repeat regions of DNA are an important factor in genome recombination and rearrangement. We identified 69 repeats in D. antarctica, including 43 forward, 24 palindromic, and 2 reverse repeats with a length >20 bp and a sequence identity e-value <10⁻³, using the REPuter program (Table S4). Among the 69 repeats, 58 (84%) were 25–80 bp in length, 51 (63%) were 25–40 bp in length, and 10 (21%) were 41–80 bp in length. The repeats were mostly located in intergenic sequences (54%), followed by coding sequences (37%) and intronic sequences (9%). The structure of the repeats in the other seven Poaceae species was also analyzed using REPuter. In Poaceae species, the majority of repeats within the size range of 25–80 bp are forward or palindromic (Figure 5). The total number of repeats varied among species (D. antarctica: 69, L. perenne: 72, F. arundinacea: 59, A. stolonifera: 50, B. distachyon: 60, H. vulgare: 67, T. aestivum: 79, O. sativa: 78, B. oldhamii: 74). The repeat pattern in D. antarctica was more similar to those of L. perenne and F. arundinacea in the Poeae tribe than to that of B. oldhamii from the Bambusoideae. For example, repeats in the size range of 41–80 bp represent ≤20% of the total number of repeats in species of the Pooideae subfamily, whereas they represent >28% of the total in O. sativa and B. oldhamii.
Repeat sequences are compared among eight chloroplast genomes in the Poaceae family. To identify repeat sequences, the REPuter program was used. Repeats with a length >20 bp and a sequence identity e-value <10⁻³ were selected and categorized into types based on their orientations (F: forward, P: palindromic, R: reverse).
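The three repeat orientations can be made concrete with a small sketch that classifies a pair of equal-length sequence copies. This illustrates the categories only, under the usual definitions (forward: identical copy; palindromic: reverse complement; reverse: reversed copy); it is not the REPuter algorithm, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Return 'F', 'P', or 'R' for a pair of repeat copies, or None if unrelated."""
    first, second = first.upper(), second.upper()
    if first == second:
        return "F"  # forward (direct) repeat
    if first[::-1].translate(COMPLEMENT) == second:
        return "P"  # palindromic: second copy is the reverse complement
    if first[::-1] == second:
        return "R"  # reverse: second copy is the reversed sequence
    return None


# Repeat counts by type as reported for D. antarctica
repeat_counts = {"F": 43, "P": 24, "R": 2}
total_repeats = sum(repeat_counts.values())
```

The per-type counts given in the text add up to the 69 repeats reported for D. antarctica.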
We performed an expression analysis of the 81 chloroplast protein-coding genes using in-house RNA-seq data from leaf tissues of D. antarctica (Lee et al., unpublished data). The short reads were mapped to the D. antarctica chloroplast genome, and the numbers of reads corresponding to coding genes were calculated and normalized according to gene length (Table 4). The most abundant genes were ndhC, psbJ, rps19, psaJ, and psbA, with FPKM value >10,000. Thirteen genes (ccsA, ndhI, rpoA, rpoC2, rps2, ndhA, ndhD, ycf1, rps11, rps3, ycf2, rpoC1, and rpoB) had low expression, with FPKM value <100.
A total of 247,904 reads mapped to the protein coding region. Among these, 89,675 (36.2%) and 73,054 reads (29.5%) were generated from genes encoding components of the cyclic electron transfer system and photosystem II (PSII) complex, respectively. In addition, among the 18 highly expressed genes (FPKM value >2,000), 10 genes were found to encode subunits of the PSII complex (psbA, psbB, psbE, psbF, psbH, psbI, psbL, psbM, psbN, and psbT). In contrast, rpoA, rpoB, rpoC1, and rpoC2, which encode plastid RNA polymerase, showed very low expression.
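The proportions quoted above follow directly from the read counts, as this short check shows. Only values from the text are used; the helper name is ours.

```python
TOTAL_CDS_READS = 247_904
reads_by_group = {
    "cyclic electron transfer": 89_675,
    "photosystem II": 73_054,
}


def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


shares = {name: round(percent(n, TOTAL_CDS_READS), 1) for name, n in reads_by_group.items()}
combined_share = round(percent(sum(reads_by_group.values()), TOTAL_CDS_READS), 1)

print(shares, combined_share)
```

Together, the two groups account for roughly two-thirds of all reads mapped to protein-coding regions.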
RNA editing is a sequence-specific posttranscriptional modification resulting in the conversion, insertion, or deletion of nucleotides in a precursor RNA. Such modifications are observed across organisms. In plants, RNA editing has been reported to occur as C-to-U or, rarely, U-to-C conversions in mitochondria and plastids.
In the Deschampsia chloroplast genome, we first predicted 37 RNA-editing sites in 16 genes using the PREP-chloroplast program (Table S5). As a second method, we aligned read sequences from the RNA-seq data and used variant-searching tools to compare the transcripts against the reference genome, which confirmed 30 editing sites. The 30 nucleotide substitutions occur in 23 genes in the D. antarctica chloroplast genome and result in 25 non-synonymous amino acid changes (Table 5). Of the substitutions, 17 (54.8%) were C-to-U conversions, resulting in 14 non-synonymous amino acid changes. In contrast, only 1 edit was a U-to-C conversion, which produced a synonymous change. Although RNA editing in plant plastids has been described as conversions of C to U and U to C, we observed other types of edits at 13 sites, including 3 A-to-C, 3 A-to-G, 3 G-to-A, 1 G-to-C, 1 U-to-A, 1 A-to-U, and 1 U-to-G.
We calculated the ratio between the number of reads with an alternate base and the number of reads with the same base as the reference. The conversion rate of each edit varied with the locus (16–100%) (Table 5). Notably, C-to-U conversions in several genes showed very high editing rates (>90%), especially in atpA, ycf3, ndhK, petB, rpoA, rps8, ndhD, ndhG, and ndhA, suggesting that the edited RNAs for these genes are the common forms in the processed RNA pools of D. antarctica.
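The substitution classes enumerated in this paragraph can be tallied as follows. The counts are taken from the text; the dictionary keys are our own shorthand. As listed, the classes sum to 31 rather than 30, and the reported 54.8% corresponds to 17 of 31; which total reflects Table 5 cannot be determined from the text alone.

```python
# Substitution classes and counts as enumerated in the text
edit_counts = {
    "C>U": 17, "U>C": 1,
    "A>C": 3, "A>G": 3, "G>A": 3,
    "G>C": 1, "U>A": 1, "A>U": 1, "U>G": 1,
}

canonical = {"C>U", "U>C"}
total_edits = sum(edit_counts.values())
noncanonical_edits = sum(n for kind, n in edit_counts.items() if kind not in canonical)

# Share of C-to-U conversions among all enumerated edits
c_to_u_share = round(100.0 * edit_counts["C>U"] / total_edits, 1)

print(total_edits, noncanonical_edits, c_to_u_share)
```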
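A per-site conversion rate of this kind can be sketched as below. We assume here that the rate is the fraction of edited reads among all reads covering the site, which is consistent with the reported 16–100% range; the read counts in the example are hypothetical and are not taken from Table 5.

```python
def editing_rate(alt_reads, ref_reads):
    """Percentage of reads carrying the alternate (edited) base at a site."""
    covered = alt_reads + ref_reads
    if covered == 0:
        raise ValueError("site has no read coverage")
    return 100.0 * alt_reads / covered


def is_highly_edited(alt_reads, ref_reads, threshold=90.0):
    """True if the editing rate exceeds the threshold (90% used in the text)."""
    return editing_rate(alt_reads, ref_reads) > threshold


# Hypothetical sites: (alternate-base reads, reference-base reads)
example_sites = {"site_a": (95, 5), "site_b": (16, 84)}
example_rates = {name: editing_rate(a, r) for name, (a, r) in example_sites.items()}
```

Sites with low coverage give unstable rates, so a minimum read depth would normally be applied before interpreting them.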
Discovery of Plastid Small Noncoding RNA in D. antarctica
Numerous small noncoding RNAs have been identified in the genomes of bacteria and the nuclear genomes of eukaryotes. Small noncoding RNAs are also transcribed from mitochondrial and plastid genomes. In this study, we screened for small noncoding RNAs in our deep sequencing data from the small RNA library generated from D. antarctica leaf tissues. The reads between 20 and 24 nt in length were mapped to the chloroplast genome with 100% identity. In total, 12,753,636 reads were distributed unevenly throughout the chloroplast genome (Figure 6), including the coding regions of psbA and rbcL, intergenic regions, regions encoding several tRNA genes, and the inverted repeat regions in which most of the rRNA genes exist. To exclude RNA fragments that may have been generated from abundant RNA species, we compared the distribution of reads that were 20–24 nt in length with that of reads longer than 30 nt. As a result, we identified 27 loci where short noncoding RNAs (sRNAs) of 20–24 nt in length with unique sequences were abundantly expressed (Table 6).
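The size partitioning and exact-match mapping described above can be illustrated with a minimal sketch. It considers the forward strand only and uses plain string search in place of Bowtie; the function names, parameters, and toy inputs are assumptions for illustration.

```python
def split_by_length(reads, short_range=(20, 24), long_min=30):
    """Partition reads into the 20-24 nt class and the >30 nt class."""
    low, high = short_range
    short_reads = [r for r in reads if low <= len(r) <= high]
    long_reads = [r for r in reads if len(r) > long_min]
    return short_reads, long_reads


def exact_hits(read, genome):
    """All 0-based start positions where the read matches the genome exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Toy reads of various lengths (hypothetical)
toy_reads = ["A" * n for n in (18, 20, 24, 25, 30, 31)]
toy_short, toy_long = split_by_length(toy_reads)
toy_positions = exact_hits("ACG", "TACGACG")
```

Comparing the coverage of the two size classes at each locus helps separate discrete small RNAs from degradation fragments of abundant transcripts, which would be expected in both classes.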
The reads from small RNA-seq were divided into two groups according to the length (20–24 nt and >30 nt) and aligned to the D. antarctica chloroplast genome with 100% identity. The distributions of reads were compared between the two groups. In total, 12,753,636 reads were distributed unevenly in the chloroplast genome with high density in the coding regions of psbA and rbcL, intergenic regions, and inverted repeat regions in which most of the rRNA genes exist. The 27 loci enriched with 20–24 nt RNAs are indicated in red, along with the number of reads. The y-axis shows the number of reads (from 0 to 1000).
The D. antarctica plastid sRNAs were not evenly distributed throughout the genome. The relative positions of the sRNAs showed that 19 of 27 (71%) were located in the noncoding regions (18 in intergenic regions and 1 in an intronic region). In particular, 30% and 11%, respectively, of the intergenic sRNAs were located at the 5′- and 3′-ends of genes (>100 bp from the start or termination codons) (Figure 7). Fifteen (55.6%) sRNAs were located within −150 to +50 bp from the start codon of genes, suggesting that proximity to the 5′-ends of genes is important.
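The positional summary can be reproduced from the counts in the text, and the window around the start codon can be expressed as a simple predicate. In the sketch below, `offset_bp` is a hypothetical signed distance of an sRNA from the start codon (negative values lie upstream); the computed shares may differ from the rounded percentages in the text by up to one point.

```python
TOTAL_SRNAS = 27
NONCODING_SRNAS = 18 + 1   # intergenic + intronic
NEAR_START_SRNAS = 15      # within -150 to +50 bp of a start codon

noncoding_share = round(100.0 * NONCODING_SRNAS / TOTAL_SRNAS, 1)
near_start_share = round(100.0 * NEAR_START_SRNAS / TOTAL_SRNAS, 1)


def near_start_codon(offset_bp, upstream=150, downstream=50):
    """True if the offset falls within the -150 to +50 bp window around a start codon."""
    return -upstream <= offset_bp <= downstream


print(noncoding_share, near_start_share)
```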
a Relative locations of plastid small RNAs according to the gene structure; b examples of small RNAs located proximal to the 5′ ends of the coding genes; c examples of small RNAs located proximal to the 3′ end of the coding genes.
To determine whether the identified sRNAs are evolutionarily conserved, we compared the sequences of the 27 sRNAs in D. antarctica with the sRNAs reported for other plant species by multiple sequence alignment. In total, we found that 13 sRNAs are orthologous to plastid sRNAs found in Arabidopsis, rice, or barley (Figure 8, Figure S1, and Table 6). Among the pairs identified, four sRNAs (psbH-petB, atpH 5′end, ndhB 5′end, and petD_rpoA) showed >90% sequence homology, and their locations within the genome were the same in all of the species examined, suggesting that these plastid sRNAs may be evolutionarily conserved across angiosperms (Figure 8).
To determine whether the identified sRNAs are evolutionarily conserved, Deschampsia antarctica sRNAs were compared with the plastid sRNAs identified in Arabidopsis, rice, or barley. The sequence alignments of sRNAs that have >90% sequence homology are shown. The multiple sequence alignments were performed with the ClustalW2 algorithm (http://www.ebi.ac.uk/Tools/msa/clustalw2/) and visualized with the Jalview program. The consensus sequences between orthologous sRNAs are shown at the bottom of each alignment.
We obtained the complete sequence of the chloroplast genome of D. antarctica using whole-genome sequencing data from total genomic DNA from leaves. As previous studies have reported, aligning all the reads against the plastid genome database allows the rapid and efficient assembly of the chloroplast genome. By this method, we identified 1.2% of the total genomic reads as chloroplast-related sequences.
The chloroplast genome of D. antarctica has the typical features found in the genomes of other Poaceae species. Its genome size and GC content are 135,362 bp and 38.3%, respectively, similar to those of other Poaceae species. The subfamily Pooideae, which includes one-third of all grass species, has been divided into 13 tribes, but recent analyses have demonstrated wide variations between them. For example, neither Poeae nor Aveneae are monophyletic, and the components of these two groups are intermixed within a clade. Traditional morphological phylogenetic studies placed Deschampsia within the tribe Aveneae. However, molecular studies inferred alternative phylogenetic positions of Deschampsia (i.e., Aveneae or Poeae), depending on the target sequences used for examination or the parameters used for grouping. In this study, we re-examined the phylogenetic position of D. antarctica using complete sequences of chloroplast DNA. A comparative analysis based on both whole plastome and open reading frame sequences of coding genes suggests that D. antarctica is more closely related to species in the Poeae tribe than to those in the Aveneae tribe. This is in agreement with the results of Davis and Soreng, Catalan et al., and Nadot et al., in which Deschampsia forms a closer relationship with species of the Poeae than with those of Aveneae, as suggested by Souto et al. and Hsiao et al. However, in our genome structure analysis, we found an interesting region (rbcL–psaI) where both the rpl23 translocation product and the accD pseudogene were found. This appears to be specific to Deschampsia, since other Poeae or Aveneae species have kept only one remnant of accD or rpl23 in the region, suggesting that this region could be molecular evidence for an intermixed lineage of Deschampsia.
For the transcriptome analysis of the chloroplast genome, we utilized RNA-seq data from libraries generated by two preparation methods (mRNA-seq and small RNA-seq). We found that a significant proportion of the reads from the RNA-seq data represent organelle-derived sequences, suggesting that eukaryotic RNA-seq results are very good resources for functional studies of organelle genes.
The transcriptome analysis of D. antarctica plastid RNAs revealed several interesting aspects of RNA metabolism. First, a search of the variant transcripts revealed numerous RNA-editing sites in the D. antarctica chloroplast genome. RNA editing has been observed in the chloroplasts of extant descendants of early land plants other than liverworts and mosses. In angiosperm plastids, RNA editing is mostly restricted to a C-to-U conversion, and the conversion occurs at about 30 different positions, whereas hornworts and fern plastids extensively edit U-to-C as well as C-to-U at >300 different positions . A comparative analysis of eight land plants, including hornworts, ferns, and seed plants, suggested that chloroplast RNA editing is of monophyletic origin and evolved as a system to generate new variations . Our transcriptome analysis revealed in situ editing sites beyond those predicted by computational tools (Table 4 vs. Table S5). According to the variant transcript search, the major form of RNA-editing is C-to-U conversion (54.8%), and the conversion rate of C-to-U edits (>90%) is much higher than those of other edits. Some edits with C-to-U conversion in several genes, such as atpA, ycf3, ndhK, petB, rpoA, rps8, ndhD, ndhG, and ndhA, have been reported in other species , indicating that these edits are functionally conserved in plants. Comparison between the whole genome DNA and transcriptome data also showed that various versions of edits exist and that their respective conversion rates differ. The difference in conversion rates among edits might be the result of tissue-specific, gene-specific, or developmental stage-specific RNA-editing patterns. Considering that mitochondrial RNA editing occurs with developmental and tissue specificity in plants –, exploring whether tissue-disparity exists in plastid RNA-editing and the regulatory mechanisms that underlie it would be worthwhile.
We identified 27 plastid small noncoding RNAs in the D. antarctica chloroplast genome by high-resolution mapping of the transcriptome data. In Arabidopsis, rice, maize, and barley, small RNAs are expressed in plastids and their sequences correlate with the termini of processed mRNAs. These studies also suggested that the small RNAs are footprints of the RNA-binding pentatricopeptide repeat (PPR) proteins, which protect RNAs from exonucleolytic degradation. Our results support this hypothesis. We observed a large number of small RNAs expressed in the D. antarctica plastid, and these RNAs were not randomly distributed but were located in intergenic regions, preferentially near the 5′- or 3′-ends of coding regions. This suggests that many small RNAs are evolutionarily conserved in their sequences and locations, which might have resulted from the functionally conserved gene regulatory system of higher plants.
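The positional pattern described above can be checked with a simple distance calculation between each small RNA and the nearest gene terminus. The sketch below is illustrative only: the 100-nt window, the coordinate convention (1-based, inclusive), the function name, and the example gene coordinates are assumptions made for this illustration and are not taken from the analysis reported here. It also does not test whether the small RNA lies in an intergenic region.

```python
def nearest_gene_end(srna_start, srna_end, genes, window=100):
    """Assign a small RNA to the nearest gene terminus.

    genes: list of (name, start, end, strand) with start <= end.
    Returns (gene_name, "5'" or "3'", distance) if the midpoint of the
    small RNA lies within `window` nt of a gene end, otherwise None.
    """
    mid = (srna_start + srna_end) // 2
    best = None
    for name, g_start, g_end, strand in genes:
        # On the minus strand the 5' end is the higher coordinate.
        five, three = (g_start, g_end) if strand == "+" else (g_end, g_start)
        for label, pos in (("5'", five), ("3'", three)):
            dist = abs(mid - pos)
            if best is None or dist < best[0]:
                best = (dist, name, label)
    if best is None or best[0] > window:
        return None
    return best[1], best[2], best[0]


# Hypothetical gene models, for illustration only.
example_genes = [
    ("geneA", 1000, 2000, "+"),
    ("geneB", 3000, 4000, "-"),
]
hit_a = nearest_gene_end(950, 970, example_genes)
hit_b = nearest_gene_end(4020, 4040, example_genes)
miss = nearest_gene_end(2500, 2520, example_genes)
```

Comparing the proportion of small RNAs that receive an assignment with the proportion expected for randomly placed intervals of the same length would give a rough test of the non-random distribution.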
Using Illumina high-throughput sequencing technology, we obtained the complete sequence of the D. antarctica chloroplast genome. This is the first chloroplast genome sequenced from a plant species endemic to Antarctica. Sequence divergence analysis with other plastomes of the BEP clade in the grass family suggests a sister relationship between D. antarctica and two species of the Poeae tribe, F. arundinacea and L. perenne. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts from RNA-seq data. As a result, we generated an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes in D. antarctica. Analysis of small RNA-seq data revealed 27 small noncoding RNAs that are preferentially located close to the 5′- or 3′-ends of genes. In addition, >30 RNA-editing sites were found in the D. antarctica chloroplast genome, with a predominance of C-to-U conversions. These data will be useful for molecular phylogenetic studies of the evolution of Antarctic plants and for transcriptome studies specific to plant organelles.
Comparison of small RNA sequences from different species.
List of primer pairs used in sequence verification and improvement of the Deschampsia antarctica chloroplast genome.
The GenBank accession numbers of all eight chloroplast genomes used for phylogenetic analysis.
Comparison of homologs between the Deschampsia antarctica chloroplast genome and Lolium perenne (Lp), Festuca arundinacea (Fa), Agrostis stolonifera (As), Hordeum vulgare (Hv), Triticum aestivum (Ta), Brachypodium distachyon (Bd), and Oryza sativa subsp. japonica (Os) by the percent identity of coding and noncoding regions.
Repeat sequences in the Deschampsia antarctica chloroplast genome.
The 37 RNA-editing sites predicted by the PREP-cp program.
The authors wish to thank Dr. Sanghee Kim (Korea Polar Research Institute) for providing analysis software support.
Conceived and designed the experiments: JL HL HP. Performed the experiments: JL YK. Analyzed the data: JL YK HL SCS. Wrote the paper: JL HL HP.
- 1. Neuhaus HE, Emes MJ (2000) Nonphotosynthetic metabolism in plastids. Annual Review of Plant Physiology and Plant Molecular Biology 51: 111–140.
- 2. Wicke S, Schneeweiss G, dePamphilis C, Müller K, Quandt D (2011) The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Molecular Biology 76: 273–297.
- 3. Chumley TW, Palmer JD, Mower JP, Fourcade HM, Calie PJ, et al. (2006) The complete chloroplast genome sequence of Pelargonium × hortorum: organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Molecular Biology and Evolution 23: 2175–2190.
- 4. Yang M, Zhang X, Liu G, Yin Y, Chen K, et al. (2010) The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.). PLoS ONE 5: e12762.
- 5. Bock R (2007) Structure, function, and inheritance of plastid genomes. In: Bock R, editor. Cell and Molecular Biology of Plastids: Springer Berlin Heidelberg. pp. 29–63.
- 6. Moore M, Dhingra A, Soltis P, Shaw R, Farmerie W, et al. (2006) Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biology 6: 17.
- 7. Wu Z-Q, Ge S (2012) The phylogeny of the BEP clade in grasses revisited: Evidence from the whole-genome sequences of chloroplasts. Molecular Phylogenetics and Evolution 62: 573–578.
- 8. Hand ML, Spangenberg GC, Forster JW, Cogan NOI (2013) Plastome sequence determination and comparative analysis for members of the Lolium-Festuca grass species complex. G3: Genes|Genomes|Genetics 3: 607–616.
- 9. Alberdi M, Bravo LA, Gutiérrez A, Gidekel M, Corcuera LJ (2002) Ecophysiology of Antarctic vascular plants. Physiologia Plantarum 115: 479–486.
- 10. Lee J, Noh E, Choi H-S, Shin S, Park H, et al. (2013) Transcriptome sequencing of the Antarctic vascular plant Deschampsia antarctica Desv. under abiotic stress. Planta 237: 823–836.
- 11. Xiong FS, Mueller EC, Day TA (2000) Photosynthetic and respiratory acclimation and growth response of Antarctic vascular plants to contrasting temperature regimes. American Journal of Botany 87: 700–710.
- 12. Souto DPF, Catalano SA, Tosto D, Bernasconi P, Sala A, et al. (2006) Phylogenetic relationships of Deschampsia antarctica (Poaceae): Insights from nuclear ribosomal ITS. Plant Systematics and Evolution 261: 1–9.
- 13. Davis JI, Soreng RJ (2007) A preliminary phylogenetic analysis of the grass subfamily Pooideae (Poaceae), with attention to structural features of the plastid and nuclear genomes, including an intron loss in GBSSI. Aliso: A Journal of Systematic and Evolutionary Botany 23: Article 27.
- 14. GPWG (2001) Phylogeny and subfamilial classification of the grasses (Poaceae). Annals of the Missouri Botanical Garden 88: 373–457.
- 15. Pérez-Torres E, García A, Dinamarca J, Alberdi M, Gutiérrez A, et al. (2004) The role of photochemical quenching and antioxidants in photoprotection of Deschampsia antarctica. Functional Plant Biology 31: 731–741.
- 16. Wyman SK, Jansen RK, Boore JL (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20: 3252–3255.
- 17. Kurtz S, Schleiermacher C (1999) REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics 15: 426–427.
- 18. Brudno M, Malde S, Poliakov A, Do CB, Couronne O, et al. (2003) Glocal alignment: finding rearrangements during alignment. Bioinformatics 19: i54–i62.
- 19. Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Research 32: W273–W279.
- 20. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28: 2731–2739.
- 21. Langmead B, Trapnell C, Pop M, Salzberg S (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10: R25.
- 22. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, et al. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7: 562–578.
- 23. Mower JP (2009) The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments. Nucleic Acids Research 37: W253–W259.
- 24. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
- 25. Zhang Y, Nie X, Jia X, Zhao C, Biradar SS, et al. (2012) Analysis of codon usage patterns of the chloroplast genomes in the Poaceae family. Australian Journal of Botany 60: 461–470.
- 26. Germain A, Hotto AM, Barkan A, Stern DB (2013) RNA processing and decay in plastids. Wiley Interdisciplinary Reviews: RNA 4: 295–316.
- 27. Lung B, Zemann A, Madej MJ, Schuelke M, Techritz S, et al. (2006) Identification of small non-coding RNAs from mitochondria and chloroplasts. Nucleic Acids Research 34: 3842–3852.
- 28. Ruwe H, Schmitz-Linneweber C (2012) Short non-coding RNA fragments accumulating in chloroplasts: footprints of RNA binding proteins? Nucleic Acids Research 40: 3106–3116.
- 29. Zhelyazkova P, Hammani K, Rojas M, Voelker R, Vargas-Suárez M, et al. (2012) Protein-mediated protection as the predominant mechanism for defining processed mRNA termini in land plant chloroplasts. Nucleic Acids Research 40: 3092–3105.
- 30. Zhang T, Zhang X, Hu S, Yu J (2011) An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS-FLX sequencing platform. Plant Methods 7: 38.
- 31. Wang W, Messing J (2011) High-throughput sequencing of three Lemnoideae (Duckweeds) chloroplast genomes from total DNA. PLoS ONE 6: e24670.
- 32. Davis JI, Soreng RJ (2000) Phylogenetic structure in Poaceae subfamily Pooideae as inferred from molecular and morphological characters: misclassification versus reticulation. In: Jacobs SWL, Everett J, editors. Grasses: systematics and evolution. Collingwood, Victoria, Australia: CSIRO Publishing. pp. 61–74.
- 33. Catalán P, Kellogg EA, Olmstead RG (1997) Phylogeny of Poaceae subfamily Pooideae based on chloroplast ndhF gene sequences. Molecular Phylogenetics and Evolution 8: 150–166.
- 34. Nadot S, Bajon R, Lejeune B (1994) The chloroplast gene rps4 as a tool for the study of Poaceae phylogeny. Plant Systematics and Evolution 191: 27–38.
- 35. Hsiao C, Chatterton NJ, Asay KH, Jensen KB (1995) Molecular phylogeny of the Pooideae (Poaceae) based on nuclear rDNA (ITS) sequences. Theoretical and Applied Genetics 90: 389–398.
- 36. Stern DB, Goldschmidt-Clermont M, Hanson MR (2010) Chloroplast RNA metabolism. Annual Review of Plant Biology 61: 125–155.
- 37. Tillich M, Lehwark P, Morton BR, Maier UG (2006) The evolution of chloroplast RNA editing. Molecular Biology and Evolution 23: 1912–1921.
- 38. Grosskopf D, Mulligan RM (1996) Developmental- and tissue-specificity of RNA editing in mitochondria of suspension-cultured maize cells and seedlings. Current Genetics 29: 556–563.
- 39. Howad W, Kempken F (1997) Cell type-specific loss of atp6 RNA editing in cytoplasmic male sterile Sorghum bicolor. Proceedings of the National Academy of Sciences 94: 11090–11095.
- 40. Picardi E, Horner DS, Chiara M, Schiavon R, Valle G, et al. (2010) Large-scale detection and analysis of RNA editing in grape mtDNA by RNA deep-sequencing. Nucleic Acids Research.
- 41. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Jalview Version 2-a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189–1191.