RNA Editing in Chloroplasts of Spirodela polyrhiza, an Aquatic Monocotyledonous Species

RNA editing is the post-transcriptional conversion of C to U before translation, providing a unique feature in the regulation of gene expression. Here, we used a robust and efficient method based on RNA-seq from non-ribosomal total RNA to simultaneously measure chloroplast gene expression and RNA editing efficiency in the Greater Duckweed, Spirodela polyrhiza, a species that provides a new reference for phylogenetic studies of monocotyledonous plants. We identified 66 editing sites at the genome-wide level, with an average editing efficiency of 76%. We found that the expression levels of chloroplast genes were relatively constant, but that 11 RNA editing sites showed significant changes in editing efficiency when fronds turned into turions. Thus, RNA editing efficiency contributes more to the yield of translatable transcripts than steady-state mRNA levels do. Comparison of RNA editing sites in coconut, Spirodela, maize, and rice suggests that RNA editing originated from a common ancestor.


Introduction
RNA editing in angiosperms mainly refers to the process that alters a cytosine (C) to a uracil (U) at specific positions of an RNA, so that the sequence of the mature RNA differs from that of the genomic DNA. RNA editing is a mechanism that corrects missense mutations of genes at the RNA level. It thereby restores conserved amino acid residues to maintain essential functions of the encoded proteins [1]. For example, psbF mRNA is edited in spinach plastids by a C-to-U conversion, changing a serine codon to a conserved phenylalanine codon. In tobacco, a phenylalanine codon is present at the DNA level without any editing. When the spinach psbF was introduced into tobacco plastids, the lack of RNA editing led to a defective phenotype, indicating that RNA editing is site-specific [2]. Introduction of the tobacco chloroplast genome into Atropa belladonna demonstrates that the belladonna nuclear genome is unable to edit the tobacco genes [...]
[...] with their FPKM values (fragments per kilobase of exon per million mapped reads). We defined genes as differentially expressed if the fold change was greater than 2 and the false discovery rate (FDR) was less than 0.05. SNPs were called with SAMtools to uncover potential C-to-U changes at a given position, with the coverage limit set to 10 reads. The coverage setting is arbitrary, but it is sufficient to identify all potential SNPs while excluding sequencing errors. We took advantage of the four biological replicates and kept only the SNPs that were present in at least two replicates. The mapped reads were then visualized in the Integrative Genomics Viewer [15]. RNA editing efficiency was calculated as edited reads divided by total mapped reads. The Chi-squared test was used to determine edited sites with significant changes.
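The read-count steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, "coverage" is assumed to mean the total number of reads at a site, and the Chi-squared test is assumed to be a Pearson test on a 2x2 table of edited versus unedited reads in two tissues (the text does not specify the exact table layout).

```python
import math

def editing_efficiency(edited_reads, total_reads):
    """Edited reads divided by total mapped reads at one site."""
    if total_reads == 0:
        raise ValueError("no reads cover this site")
    return edited_reads / total_reads

def passes_filters(replicate_calls, min_coverage=10, min_replicates=2):
    """replicate_calls holds one (edited_reads, total_reads) pair per replicate.

    Assumption: a replicate supports the site when its coverage reaches
    min_coverage and at least one read carries the C-to-U change.
    """
    supported = sum(
        1 for edited, total in replicate_calls
        if total >= min_coverage and edited > 0
    )
    return supported >= min_replicates

def chi_squared_2x2(edited_a, total_a, edited_b, total_b):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    Rows are the two tissues; columns are edited and unedited reads.
    Returns (chi2, p_value).
    """
    table = [[edited_a, total_a - edited_a],
             [edited_b, total_b - edited_b]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In this sketch, read counts from the replicates of each tissue would be summed before calling `chi_squared_2x2`; that pooling choice is also an assumption.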

RNA editing validation
EST sequences from total RNA, including nuclear and organellar sequences, were downloaded from the NCBI Sequence Read Archive (SRA); they had been submitted by the DOE Joint Genome Institute (JGI) from our 454 sequencing of Spirodela ESTs (SRX148325). For sites that could not be confirmed with these ESTs, we designed specific primers spanning the candidate regions and performed RT-PCR; the resulting products were directly sequenced using the same PCR primers. These sites were confirmed as RNA editing sites only if there were two overlapping peaks at the same location. To further validate editing efficiency, we cloned the RT-PCR products into pGEM-T Easy and selected 96 clones for sequencing, comparing them with the genomic DNA sequence of Spirodela.

Mapping statistics and chloroplast gene expression
Our previous study showed that more than 26% of the RNA-seq reads from ribosomal RNA-depleted total RNA could be mapped back to the chloroplast genome, equal to ~2,000-fold coverage [13]. Here, we used a stringent filter to analyze only reads with a score of 20 and a minimal length of 70 bp. Between 1,315,402 and 4,109,489 reads, equal to ~1,000-fold coverage, were mapped back to the chloroplast genome, which collectively represented more than 20% of the total reads (S1 Table). Read density varied widely between different genomic regions, reflecting the differential accumulation of chloroplast RNA. For example, psbA, rbcL and psaJ were highly expressed, whereas rpoC2, rpoC1 and rpoB were expressed at low levels (Fig 1). However, for individual genes, the FPKM value did not change significantly when turion formation was induced, suggesting that chloroplast genes were expressed at a constant level as fronds developed into turions (S2 Table).
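The differential-expression criterion used in this study (fold change greater than 2 and FDR below 0.05) can be written as a small check. The sketch below is illustrative only: the function names are hypothetical, the fold change is assumed to be symmetric (larger FPKM divided by smaller), and the FDR is taken as an input rather than computed.

```python
def fold_change(fpkm_frond, fpkm_turion):
    """Symmetric fold change between two FPKM values (always >= 1)."""
    low, high = sorted((fpkm_frond, fpkm_turion))
    if high == 0:
        return 1.0           # not expressed in either tissue
    if low == 0:
        return float("inf")  # expressed in only one tissue
    return high / low

def is_differentially_expressed(fpkm_frond, fpkm_turion, fdr,
                                min_fold=2.0, max_fdr=0.05):
    """Apply both thresholds: fold change > min_fold and FDR < max_fdr."""
    return fold_change(fpkm_frond, fpkm_turion) > min_fold and fdr < max_fdr
```

Under this rule a gene with a 1.5-fold change is not called differentially expressed however small its FDR, which matches the description of chloroplast genes as expressed at a constant level.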

Detection of chloroplast RNA editing sites
The high read density at most sites allowed a robust qualification and quantification of editing events. A total of 66 C-to-U RNA editing sites from 27 genes were found in the transcripts (Fig 1 and Table 1). All sites were validated with either 454 or traditional capillary electrophoresis (CE) platforms, where two overlapping peaks (C and T) were seen at RNA editing sites. The most heavily edited genes were ndhB (15 sites), ndhD (6 sites) and ndhA (5 sites). As expected, RNA editing at the first and second codon positions changed the identity of the amino acid, whereas it was silent at the third codon position. Of the 58 editing sites in protein-coding regions, six sites (10.3%) were at first, 49 sites (84.5%) at second, and three sites (5.2%) at third codon positions. The conversions from ACG to AUG in the rpl2 and ndhD genes were found to create an initiation codon in Spirodela. Owing to the depth of sequencing, we also detected eight silent RNA editing sites (12.1% of the total) in non-coding sequences (S3 and S4 Tables), which have rarely been reported for chloroplast genomes, with the exception of a report in Arabidopsis [23]. Two sites were located in the introns of ycf3 and ndhB, one in the 5' UTR of rps7, and five in intergenic regions. RNA editing sites in UTRs, introns, or intergenic regions may point to a new evolutionary cause for RNA editing, although we do not know whether these editing events contribute to essential functions of plastids.
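The effect of codon position can be checked directly against the standard genetic code. The following sketch is illustrative: the function name is hypothetical, and the serine codon UCU is chosen only as an example of a second-position edit, not taken from the data.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def edit_codon(codon, position):
    """Apply a C-to-U edit at a 1-based codon position.

    Returns (edited_codon, amino_acid_before, amino_acid_after).
    """
    index = position - 1
    if codon[index] != "C":
        raise ValueError("no C at the requested position")
    edited = codon[:index] + "U" + codon[index + 1:]
    return edited, CODON_TABLE[codon], CODON_TABLE[edited]

# Every third-position C-to-U change is synonymous in the standard code.
THIRD_POSITION_SILENT = all(
    CODON_TABLE[codon] == CODON_TABLE[codon[:2] + "U"]
    for codon in CODON_TABLE if codon[2] == "C"
)
```

Here `edit_codon("ACG", 2)` gives the ACG-to-AUG change that turns a threonine codon into the methionine initiation codon, as described for rpl2 and ndhD.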

RNA editing evolution in monocots
Compared with other angiosperms, the numbers of RNA editing events in Spirodela (66) and coconut (75) were about twice those in rice (35) and maize (26) [1,24].
Notably, 31 out of 66 editing sites in Spirodela were from the ndh genes, and 15 of these 31 sites were from the ndhB gene (Table 1). Because its editing sites are abundant and well studied across the plant kingdom [25,26], ndhB is a good example for the study of the conservation and evolution of RNA editing. We aligned and compared the 14 ndhB editing sites of Spirodela, excluding the one in the ndhB intron, with those of coconut [16], rice [17] and maize [18] (Fig 2). All C-to-U transitions observed in ndhB transcripts occurred at either the first or second codon position, thereby changing the amino acid identity. Six editing sites (III, V, VII, VIII, XIII and XIV) were conserved in Spirodela, coconut, rice, and maize. In contrast, two sites (VI and IX) were conserved in Spirodela, coconut and rice, but not in maize. Four sites (I, IV, XI and XII) were present only in Spirodela and coconut. Unlike the conserved sites, the newly identified sites II and X were specific to Spirodela. At all unedited locations, a T was already encoded at the DNA level, which eliminates the requirement for RNA editing.
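The conservation pattern of the 14 ndhB sites can be tallied with simple set operations. This sketch encodes only the groupings stated above; the variable and function names are hypothetical, and sites that might be edited in the other species but not in Spirodela are not represented.

```python
SPECIES = ("Spirodela", "coconut", "rice", "maize")

# Groups of ndhB sites (Roman numerals as in the text), keyed by the species
# in which each site is edited.
SITE_GROUPS = {
    ("Spirodela", "coconut", "rice", "maize"):
        ["III", "V", "VII", "VIII", "XIII", "XIV"],
    ("Spirodela", "coconut", "rice"): ["VI", "IX"],
    ("Spirodela", "coconut"): ["I", "IV", "XI", "XII"],
    ("Spirodela",): ["II", "X"],
}

sites_by_species = {name: set() for name in SPECIES}
for group, sites in SITE_GROUPS.items():
    for name in group:
        sites_by_species[name].update(sites)

def shared_with_spirodela(species):
    """Number of Spirodela ndhB sites that are also edited in `species`."""
    return len(sites_by_species["Spirodela"] & sites_by_species[species])

shared_counts = {name: shared_with_spirodela(name) for name in SPECIES[1:]}
```

With these groupings, `shared_counts` comes out as 12 sites for coconut, 8 for rice and 6 for maize, out of the 14 Spirodela sites compared.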
In the phylogenetic tree built from an rbcL alignment, Spirodela and coconut were sister groups, whereas rice and maize were sister species (Fig 3). Spirodela shared more editing sites with coconut than with rice and maize (Table 1). For example, in the well-studied ndh gene family of ndhA, ndhB, ndhD and ndhF, 21 (81%) out of 26 sites were common between Spirodela and coconut, whereas Spirodela shared only 11 (42%) of the sites with rice and 10 (38%) with maize (Fig 3). The observed distribution of shared editing sites correlated with the phylogenetic tree: closely related species shared more sites than distant ones. The conservation of RNA editing sites indicated that RNA editing originated from a common ancestor with many editing sites and was followed by lineage-specific losses and gains during monocot evolution.
Editing efficiencies could differ significantly between sites within the same transcript, for instance ranging from 30% to 92% in ndhC, 8% to 100% in ndhB, and 25% to 95% in ndhA (Fig 1 and Table 1), suggesting that individual RNA editing sites are recognized by independent PPR (pentatricopeptide repeat) proteins, a group of RNA editing factors. Furthermore, the low-efficiency intergenic editing events (6% to 26%) and the rpoC2-2 site (< 7%) appear not to be required for transcription or for the function of the translated protein, as coconut, rice, maize, Arabidopsis, tobacco and tomato develop normally even though they have the nucleotide "C" in this position (Table 1). The rpoB site is not edited in barley but is edited in maize. Lack of RNA editing at this particular site does not seem to affect chloroplast function in barley [27]. The rpoA site is edited in coconut and pre-edited in rice and maize. The ndhB-14 site is also pre-edited at the DNA level in tobacco and tomato. However, the conservation of RNA editing of rpoA and ndhB-14 in Spirodela could be due to the importance of their functions, in spite of their extremely low editing efficiencies of < 11% and < 8%, respectively (Table 1).

RNA-seq offers a method for qualifying and quantifying RNA editing
Transcripts from organellar genomes undergo extensive post-transcriptional processing, such as 5'- and 3'-end processing, RNA splicing and RNA editing [23]. RNA editing converts cytosine (C) to uracil (U) nucleotides in mRNA transcripts. Conventionally, with prior knowledge of editing sites, primers are designed and RT-PCR products are compared with the genomic DNA sequence to determine whether RNA editing occurs. Such an approach, however, misses untranslated regions (UTRs), introns, and intergenic regions [24]. Furthermore, most editing sites are reported as fully edited, whereas partial editing is greatly underestimated. For an editing efficiency of less than 10%, one has to sequence more than 10 clones to find one edited transcript when comparing cDNA with genomic sequences. Therefore, deep, strand-specific cDNA sequencing (RNA-seq) offers a new approach to identify all potential RNA editing sites, to quantify RNA editing efficiency, and to detect sites edited at very low efficiency [28].
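The clone-sampling argument can be made concrete with a short calculation. The sketch below assumes that each sequenced clone is an independent draw that is edited with probability equal to the editing efficiency; this is a simplifying model and the function names are hypothetical.

```python
import math

def prob_at_least_one_edited(efficiency, n_clones):
    """Probability that at least one of n_clones carries the edit."""
    return 1 - (1 - efficiency) ** n_clones

def clones_needed(efficiency, confidence=0.95):
    """Smallest number of clones giving at least `confidence` probability
    of observing one or more edited clones."""
    if not 0 < efficiency < 1:
        raise ValueError("efficiency must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - efficiency))
```

Under this model, 10 clones at 10% efficiency detect the edit only about 65% of the time, and `clones_needed(0.10)` returns 29 for 95% confidence, which illustrates why low-efficiency sites are easily missed by clone sequencing.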
Whereas RNA-seq has been widely used to study the nuclear transcriptome, it has not been widely applied to the organellar transcriptome because extracting transcripts from purified mitochondria and chloroplasts is very time-consuming. Although one could sequence total RNA from both the nuclear and organellar genomes, the method for preparing total RNA needs to be carefully considered. The reason is that organellar transcripts are not polyadenylated in the way nuclear transcripts are and do not have to be transported from one cellular compartment to another. Furthermore, post-transcriptional polyadenylation of organellar transcripts accelerates their degradation [29]. Therefore, rRNA removal is preferred over the general approach of oligo(dT)-based poly(A)+ enrichment for organelle transcript analysis. Compared with the isolation of RNA from purified organelles or the extraction of poly(A) mRNA for RNA-seq, rRNA removal by affinity is less biased, faster, and easily adapted to other plants.

RNA editing evolution is of monophyletic origin in monocots
RNA editing exists in various land plant lineages, such as hornworts, ferns and seed plants, but evolves very rapidly. Probably because of the limited number of verified editing sites, it was reported that the editing pattern was not correlated with the phylogeny of angiosperms [24,26], whereas other studies found that relatively closely related species shared more editing sites than distant species. For example, Nicotiana and Atropa from the Solanaceae family shared 28 out of 31 RNA editing sites [30]. A total of 18 out of the 85 chloroplast editing sites in seed plants were shared with either ferns or hornworts [7], indicating that the editing sites in seed plants could be remnants of the original editing system of land plants. After filling the phylogenetic gap with the editing-site data of a species of the order Alismatales, our results showed that 21 (81%) out of 26 sites of ndhA, ndhB, ndhD and ndhF transcripts were shared between Spirodela and its sister species coconut. In contrast, Spirodela shared only 11 (42%) of its sites with rice and 10 (38%) with maize, two more distantly related species (Fig 3 and Table 1), which is consistent with a monophyletic origin of RNA editing.

Differential regulation of RNA editing in Spirodela
Partial RNA editing generates RNA polymorphism. In the psbL gene, RNA editing creates a translation initiation codon in tobacco. In another case, a chimeric gene conferring kanamycin resistance depended on ACG being edited into AUG; the unedited RNA could not be translated because it lacked an initiation codon [31]. In yet another case, immunological analysis demonstrated that both unedited and edited rps12 RNAs were translated in maize [32] and petunia mitochondria [33], resulting in the synthesis of polymorphic polypeptides. However, the proteins translated from the unedited rps12 transcript failed to assemble into ribosomes in maize, whereas the unedited rps12 protein in petunia could integrate into ribosomes, although it is not known whether it is functional.
In maize, quantitative analysis of 10 plastid genes showed no expression differences among green tissues, including young leaves, old leaves, stems, and silks; differences were seen only in roots and tissue-cultured cells [34]. Although developing turions enter a dormant state, their chloroplasts remain quite active. They are functionally closer to amyloplasts, which are mainly responsible for the synthesis and storage of starch granules [12]. As in the green tissues of maize, we could not detect differential expression of chloroplast genes between fronds and turions. However, seven genes with 11 RNA editing sites showed a significant change in editing efficiency when fronds turned into turions. Interestingly, for these seven genes, it appears that RNA editing efficiency affects functional protein abundance more than the steady-state level of mRNA does. Whether it plays a role in the morphological transition of Spirodela needs further investigation.

Supporting Information
S1 Table. Sequences mapped to the chloroplast genome. Samples contained four replicates of fronds and four replicates of turions [13]. Qualified total reads were based on the standard of a minimum score of 20 and a length of 70 bp. Reads were mapped back to the chloroplast genome. Mapped percentage was defined as mapped reads divided by qualified total reads. (XLSB)
S2 Table. Expression of chloroplast protein-coding genes. A change was considered significant when |fold change| > 2 and p-value < 0.05. The expression unit is FPKM. (XLSB)
S3 Table. Comparison of the number of RNA editing sites. RNA editing sites were compared in monocots including Spirodela, coconut, rice and maize. "NA" means the item was not studied. (XLSB)
S4 Table. RNA editing sites in non-coding regions. "Reference coverage" means the number of mapped reads identical to the reference (four replicates are combined). "Edited coverage" means the number of mapped reads that have been edited (four replicates are combined). "Edit (%)" gives the percentage of RNA editing, calculated as edited reads divided by total mapped reads. (XLSB)