Transcriptome analysis of the typical freshwater rhodophytes Sheathia arcuata grown under different light intensities

The Rhodophyta Sheathia arcuata is exclusively distributed in freshwater, constituting an important component in freshwater flora. This study presents the first transcriptome profiling of freshwater Rhodophyta taxa. A total of 161,483 assembled transcripts were identified, annotated and classified into different biological categories and pathways based on BLAST against diverse databases. Different gene expression patterns were caused principally by different irradiances considering the similar water conditions of the sampling site when the specimens were collected. Comparison results of gene expression levels under different irradiances revealed that photosynthesis-related pathways significantly up-regulated under the weak light. Molecular responses for improved photosynthetic activity include the transcripts corresponding to antenna proteins (LHCA1 and LHCA4), photosynthetic apparatus proteins (PSBU, PETB, PETC, PETH and beta and gamma subunits of ATPase) and metabolic enzymes in the carbon fixation. Along with photosynthesis, other metabolic activities were also regulated to optimize the growing and development of S. arcuata under appropriate sunlight. Protein-protein interactive networks revealed the most responsive up-expressed transcripts were ribosomal proteins. The de-novo transcriptome assembly of S. arcuata provides a foundation for further investigation on the molecular mechanism of photosynthesis and environmental adaption for freshwater Rhodophyta.


Introduction
The Rhodophyta constitutes an ancient derived monophyletic eukaryotic lineage. As a member of archaeplastida, Rhodophyta originated from the primary photosynthetic endosymbiosis and subsequently spread plastid through secondary endosymbiosis to a diverse array of photosynthetic lineages [1,2]. They are primarily marine in distribution, with less than 3% of the over 6500 species occurring in truly freshwater habitats [3,4]. Though owning a relatively low diversity compared with the marine group, freshwater rhodophytes are usually important constituents of stream floras, either in terms of abundance or distribution from local scale to biomes [5]. Genus Sheathia is a typical freshwater Rhodophyta and inhabited exclusively in streams or rivers. It belongs to the Florideophyceae, growing as gelatinous gametophyte a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 filaments, with beaded appearance, varying from blue-green, olive, violet, and gray to brownish. Sheathia can be found the year round but most abundant in late winter and spring, with the growing rate accelerating in December and decreasing in June throughout a year [6,7]. Species of Sheathia are reported worldwide, and numerous species have been collected from different continents. S. arcuata is one of the most widespread species in the genus and has recently been reported from numerous localities [8].
Light is one of the important environmental factors regulating photosynthesis, growth and reproduction of photosynthetic organisms. Physiological responses to changing light intensity have been examined extensively [9,10]. Variation of growth rate, pigment content and photosynthetic characteristics in response to irradiance have been investigated in freshwater red algae [10,11]. However, little is currently known regarding the molecular mechanisms affecting the regulatory and biochemical pathways of freshwater red algae Sheathia in response to irradiance. Previous report has confirmed that Sheathia was typically shade-adapted plants, whereas some species can tolerate high irradiances and have mechanisms to avoid photo damage [10]. Thus, analyzing the gene expression patterns in response to different irradiance will provide a molecular basis for their environment adaption. Transcriptome analysis using nextgeneration sequencing is a powerful tool for examining complex molecular mechanisms. It provides a complete reference profile to understand genome content, gene function, gene expression under various conditions [12]. High-throughput RNA-sequencing (RNA-Seq) provides new perspectives for analyzing functional complexity of transcriptomes [13][14][15]. It has been used to analyze different gene expression patterns of different morphological types or under different conditions in higher plants [16,17]. Whereas the transcription profiling is still unknown for freshwater Rhodophyta.
In this study we presented the transcriptome profile of the typical freshwater taxa S. arcuata, analyzed the coding gene contents and function annotations based on BLAST against multiple databases. The significantly different expressed genes under different irradiances were analyzed, thus laying a foundation for investigation on the molecular mechanism for the environmental adaption of freshwater Rhodophyta.

Sample collection and preparation
Samples of S. arcuata were collected in Nanlaoquan, Jinci Park, Shanxi province, China (374 2 0 24.02@N; 112˚26 0 31.76@E) on June 20th and December 22nd, 2015. The park where the samples were collected is open to public and no specific permissions are requested for field sampling, and we confirm that the field studies did not involve endangered or protected species. According to the statistical data of Monthly Averaged Clear Sky Insolation Incident On A Horizontal Surface in Taiyuan (https://eosweb.larc.nasa.gov/cgi-bin/sse/grid.cgi?email= zhenhuawan@gmail.com), the collection dates were selected when strongest and weakest light intensities occurred empirically. Both collection days were sunny and the light intensities were measured using a curing radiometer. Specimens of Sheathia growing at the same location with similar wet weight were collected at different dates. Other parameters related to the water conditions were measured with pH & EC waterproof (HANNA instruments, Woonsocket RI USA) when samples were collected, and the results were showed in Table 1. Physiochemical factors including temperature, pH, current velocity, total dissolved solids and electrical conductivity of the underground water in the sampling site were relative stable, except for the considerable different light intensities.
The thalli were washed using distilled water and frozen in liquid nitrogen as soon as possible after collection at the sampling site. Total RNA of each specimen was extracted according to Holmes and Bonner [18]. After the sample were treated with DNase, RNA degradation and contamination were monitored on 1% agarose gels. RNA purity was checked using the Nano-Photometer1 spectrophotometer (IMPLEN, CA, USA). Concentration of RNA was quantified using Qubit (Thermo Fisher Scientific) and integrity of RNA was tested using Agilent 2100 (Agilent technology). RNA samples used for subsequent analyses were with values of A260/A280 ratios between 1.9 and 2.1, RNA 28S:18S ratios higher than 1.0, and RNA integrity numbers (RINs) ! 6.8. The extracted RNA samples of each group (high light intensity and low light intensity) were pooled from 3 individual thalli before subsequent handling.

Library preparation for transcriptome sequencing
A total amount of 1.5 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext1 Ultra™ RNA Library Prep Kit for Illumina1 (NEB, USA) following manufacturer's recommendations and index codes were added to attribute sequences to each sample. mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H). Second strand cDNA synthesis was subsequently performed using DNA polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/ polymerase activities. After adenylation of 3' ends of DNA fragments, NEBNext adaptor with hairpin loop structure were ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 μl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37˚C for 15 min followed by 5 min at 95˚C before PCR. Then PCR was performed with Phusion High-Fidelity DNA polymerase, universal PCR primers and Index (X) Primer. At last, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina Hiseq platform and paired-end reads were generated.

Transcriptome analysis
Clean reads were produced by removing reads containing adapter, reads containing ploy-N and low quality reads from raw data. At the same time, Q20 (corresponding to sequencing quality with 99% accuracy rates) and GC-content of the clean data were calculated. All the downstream analyses were based on clean data with high quality. Transcriptome assembly was accomplished based on the clean data using Trinity [19] with min_kmer_cov set to 2 and all other parameters were set default. Gene function was annotated based on BLAST search against the following seven databases: Nr (NCBI non-redundant protein sequences); Nt (NCBI non-redundant nucleotide sequences), Swiss-Prot (a manually annotated and reviewed protein sequence database) with 10 −5 e-value cutoff and KOG/COG (Clusters of Orthologous Groups of proteins) with 10 −3 e-value cutoff. Automatic annotation ServerKO (KEGG Ortholog database) was conducted with 10 −10 e-value cutoff. GO (Gene Ontology) annotation was conducted using Blast2GO v2.5 [20] with 10 −6 e-value and customized script. Pfam (Protein family) annotation was based on hmmscan in the HMMER 3.0 package with e-value 0.01 [21,22].

Quantification of gene expression levels and differential expression analysis
Gene expression levels were estimated by RSEM [23] for each sample, with the clean data mapping back onto the assembled transcriptome and read count for each gene was obtained from the mapping results. Prior to differential gene expression analysis, the read counts were adjusted by edgeR program through one scaling normalized factor for each sequenced library, which was designed for datasets with no biological replicates [24]. Differential expression analysis of two samples was performed using the DEGseq package [25,26]. P-value was adjusted using q-value [27]. q-value < 0.005 and |log 2 (foldchange)| > 1 were set as the threshold for significantly differential expression.

Quantitative real-time PCR (qRT-PCR) validation
Total RNA extracted for library preparation were used as template of RT-PCR. Reverse transcribed cDNA were used to conduct quantitative real-time PCR with SYBR Green Dye (TakaRa SYBR Premix Ex Taq Ⅱ). Five genes from differentially expressed gene pools based on the bioinformatics analysis were selected including psbU, LHCA4, petH, petB and petC. The translation initiation factor 5A (elF5a) was selected as internal control gene according to previous literature [28]. Amplification primers for selected genes were shown in S1 Table. The amplification procedures were 95˚C for 30 s, 40 cycles of 95˚C for 5 s, 53˚C for 30 s, followed by dissolution stage of 95˚C for 15 s, 53˚C for 1 min and 95˚C for 15 s. Specificity of the qPCR products was estimated based on melting curve. Expression values of each gene were calculated using the method proposed by Pfaffl [29]. Results of qPCR and RNA-seq data for the selected genes were compared.

KEGG pathway enrichment analysis and Protein-protein interactive network construction
To understand the different functional pathways between the two samples, we used KOBAS software to test the statistical enrichment of differential expressed genes in KEGG pathways [30,31]. The sequences of the differently expressed genes (DEGs) were BLAST to the genome of Cyanidioschyzon merolae, which was available of the protein-protein interaction (PPI) in the STRING database (http://string-db.org/) to get the predicted PPI of these DEGs. BLAST settings for constructing interaction networks were evalue = 1e-10 and max_target_seqs = 1.
Then the PPI of these DEGs were visualized in Cytoscape [32].

Gene annotations
All 161,483 assembled transcripts were queried against seven curated databases (Fig 2A). Databases including Nr, Nt, KOG, GO and Pfam were selected to illustrate annotation venn diagram ( Fig 2B). 2278 common genes were shared in the five annotation databases. Based on the Nr annotation result (Fig 3), the species with most homologous genes with Sheathia was the Chondrus crispus (marine Rhodophyta), followed by Oryza sativa (green plant), Galdieria sulphuraria (thermophilic Rhodophyta) and Phaeodactylum tricornutumoth (Bacillariophyta). The predicted S. arcuata transcripts were classified according to GO assignments [33]. A total of 62,008 genes (38.4%) were assigned at least one GO term (Fig 4), among which 147,479 were assigned in the biological process category (Level 1), 70,427 in the molecular function category (Level 1) and 95,177 in the cellular component category (Level 1). These transcripts were further classified into functional subcategories. Genes corresponding to the ''biological process" group (Level 1) were divided into 24 subcategories, among which "cellular process" (Level 2) comprised 22.6% and was the largest term. Genes corresponding to the ''molecular function" group (Level 1) were divided into 10 subcategories, among which "binding" (Level 2) comprised 44.9% and was the largest term. Genes corresponding to the ''cellular  Transcriptome analysis of the typical freshwater rhodophytes Sheathia arcuata component" group (Level 1) were divided into 21 subcategories, among which "cell" (Level 2) comprised 20.9% as the largest term. Based on the KOG annotation result (Fig 5), 40,556 genes belonging to 25 categories were yielded. Among these categories, the largest group was genes for "Posttranslational modification, protein turnover, chaperones" cluster, owning 6,407 (15.8% of the totally annotated transcripts) in number. The biological pathways in S. arcuata were identified according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Fig 6). A total of 31,330 transcripts were mapped to 19 KEGG pathways in 5 categories (S2 Table). Among the 5 categories, the pathways represented by most transcripts were metabolism (17,224,54.98% of the totally annotated transcripts), followed by genetic information (10,893, 34.77% of the totally annotated transcripts) and cellular processes 2285 (7.29% of the totally annotated transcripts).

Differential gene expression analysis
Gene expression levels of each sample were counted using Trinity, with different gene expression patterns observed in the two S. arcuata samples illustrated in Fig 7. Considering the similar water conditions of the sampling site when the specimens were collected in this study, the   (Fig 7A). Differentially expressed genes with statistically significance were observed with up-regulated and down-regulated genes mainly in the sample under high and low irradiance respectively (Fig 7B). Gene lists of down-regulated and up-regulated were listed in S3 and S4 Tables.
The top 18 enriched KEGG pathways involving up-regulated genes under low irradiance were illustrated as Fig 8. The enriched pathways of down-regulated genes in S. arcuata specimen under low irradiance were not statistically significant, with all the q-values larger than 0.5. Therefore they are not discussed in this study (S5 Table). On the other hand, the enriched pathways corresponding up-regulated genes are all statistically significant, with all the q-values evidently smaller than 0.5 ( Table 3). The top 18 significantly up-regulated genes under low irradiance were involved in important metabolism pathways including energy metabolism (photosynthesis, photosynthesis-antenna proteins, carbon fixation in photosynthetic organisms, sulfur metabolism, nitrogen metabolism), carbohydrate metabolism (glycolysis/gluconeogenesis, pentose phosphate pathway, glyoxylate and dicarboxylate metabolism, fructose and mannose metabolism), amino acid metabolism (glycine, serine and threonine metabolism), overview metabolism (carbon metabolism, biosynthesis of amino acids), metabolism of other amino acids (selenocompound metabolism), metabolism of cofactors and vitamins (riboflavin metabolism, folate biosynthesis), cellular processes (phagosome) and genetic information processing (sulfur relay system).
The significantly up-regulated transcripts involved in photosynthesis related pathways in S. arcuata specimen under low irradiance were showed in Figs 9-11 [30,31]. Among photosynthesis-antenna proteins, significantly up-regulated transcripts were the light-harvesting chlorophyll complex LHCA1 and LHCA4, which were associated with the photosystem I ( Fig  9). For photosynthesis apparatus, evidently up-expressed transcripts include PSBU in the photosystem II, PETB and PETC in cytochrome b6/f complex, PETH in photosynthetic electron transport, and the beta and gamma subunits of F-type ATPase (Fig 10). Moreover, the ATPase, which constituted part of the photosystem, were also up-expressed as a result of increased light absorption and electron transport. As the last step of photosynthesis, pathway of carbon fixation in photosynthetic organisms was the most enriched (Fig 11). The up-expressed transcripts involved in carbon fixation were as followed, phosphoenolpyruvate carboxykinase (EC 4. Up-expression of focused genes in S. arcuata specimen under low light were validated by qRT-PCR, with the results showed in Fig 12. And the differential expression pattern revealed   Transcriptome analysis of the typical freshwater rhodophytes Sheathia arcuata by qRT-PCR of selected genes was consistent with the high-throughput sequencing results, thus enhancing the statistical reliability based on sequencing data.

Transcription factors
Transcription factors in S. arcuata (S6 Table) were identified and classified into different families using the iTAK pipeline (http://bioinfo.bti.cornell.edu/tool/itak) [34]. Results showed most abundant transcription factors involved in S. arcuata transcription process were

Protein-protein interactions
Interactive networks involving the up-regulated transcripts of S. arcuata in response to weak light intensity were shown in Fig 13. The results revealed that the transcripts in response to light were all cross-linked and in a closely-related network. The nodes with highest degrees were transcripts corresponding ribosomal proteins, followed carbon metabolism, protein transport proteins, translation elongation factors, biosynthesis of amino acids and carbon fixation proteins in photosynthetic organisms.

Discussion
Despite the widely application of transcriptome sequencing in marine Rhodophyta [28,[35][36][37][38][39], transcriptome data have not been reported for freshwater red algae. This study presents the first transcriptome profiling of S. arcuata, which will enrich the repertoire of transcripts of the freshwater rhodophytes and provide more data for the further investigation on this plant lineage. Compared with the marine red algal samples with transcriptomes reported to date (P. yezoensiswith 18,640 annotated transcripts [28]. Assembled transcripts of freshwater taxa S. arcuata was considerably larger. The overrepresented transcripts length obtained in this study is consistent with sequencing result of P. yezoensis, with the most abundant length distributed between 200-500 bp, while the GC content of the transcriptome in S. arcuata is lower than P. yezoensis [28]. The size of each node represents interactive degree, with larger size corresponding to higher degree; the color of each node represents clustering coefficient, with colors ranging from green to red corresponding to lower to higher coefficients. https://doi.org/10.1371/journal.pone.0197729.g013 Transcriptome analysis of the typical freshwater rhodophytes Sheathia arcuata Specimens used in this study were collected at the irradiance of 1462 and 274 μmol photons/m 2 /s respectively, which were similar with light intensities used in previous research of genus Sheathia by Necchi. Necchi proved the maximal photosynthetic rate (8.1 ± 0.5) occurred in specimens collected at the irradiance of 320 μmol photons/m 2 /s and the minimal rate (4.9 ± 0.6) was under 1510 μmol photons/m 2 /s based on oxygen evolution test [10]. Photosynthetic changing trend revealed by physiological parameters measured in Necchi's research was consistent with the transcriptome regulation observed in our study.
Transcriptome analysis of S. arcuata grown under different light intensities in our study also shed light on the molecular mechanisms underlying the shade-adaption of this taxon. For S. arcuata specimen under weak light intensity, the up-expressed photosynthesis-antenna transcripts (LHCA1 and LHCA4, as observed in this study) facilitated more light absorption and thus improving the photosynthetic activity. Light-harvesting complexes (LHCs) associated with both photosystems I and II (in green lineage) and phycobilisomes (in cyanobacteria) served as the primary light-harvesting antenna for photosynthesis [1,40]. LHCs are important constituents that facilitate photosynthetic function in response to light quantity and quality [41]. Moreover it was found LHCs responded more evidently than the phycobilisome in S. arcuata when grown under low light intensity, revealing the improved adaptive ability to surrounding environment and the advanced stage of LHCs in Rhodophyta evolution. It was consistent with previous report that in red algae, the photosynthetic apparatus represented a transitional state between cyanobacteria and chloroplasts of green lineage, with enhanced light-harvesting capacity by owning LHCs (light harvesting complexes) associated with PSI [42]. Another pathway in regulation of adaptive response to weak light was photosynthesis. The up-expressed transcripts including PSBU, PETB, PETC, PETH, the beta and gamma subunits of F-type ATPase contribute to the adaptive response. It was reported that in cyanobacteria and red algae, the PS-II system gene psbU encodes protein constituting part of the oxygenevolving complex (OEC), which was also involved in stabilizing the oxygen-evolving machinery of PSII against high-temperature stress [43]. PETB participated in electron transferring in the photosynthesis, and PETC in the cytochrome b6/f complex was involved in mediating electron transfer between photosystem II (PSII) and photosystem I (PSI) [44]. In combination with the photosynthetic electron transport protein PETH, the transcripts related to protein network involved in the electron transport of photosynthesis were all up-expressed in S. arcuata under weak irradiance. Photosynthetic control of electron transport was a fundamental concept in the regulation of photosynthesis [45]. Additionally, regulatory pathway of S. aucuata in respond to weak light was carbon fixation. Phosphoenolpyruvate carboxylase (EC 4.1.1.49) was up-regulated in response to weak light in freshwater S. arcuata, combined with other enzymes to improve the carbon fixation activity. In marine macroalgae, phosphoenolpyruvate carboxylase (EC 4.1.1.49) was characterized as the only enzyme for dark carbon fixation [46]. The up-regulated transcripts of photosynthesis-antenna and photosynthesis apparatus triggered the increasing rate of carbon assimilation, thus fueling the growth of Sheathia specimen under the low light intensity. In contrast with previous report on higher plant including Bryophyllum fedtschenkoi, maize and barley, expression of phosphoenolpyruvate carboxykinase (EC 4.1.1.49) was down-regulated in response to decreased light [47]. It is speculated that the up-regulation of this enzyme under weak light in freshwater Rhodophyta S. arcuata is relevant to their shade-adaption. Our study revealed that for freshwater S. arcuata, transcripts involved in light harvesting, photosystem II, cytochrome b6 complex, photosynthetic electron transport and ATPase were all up-regulated thus enabling the increased photosynthetic function, which in turn provided sufficient energy and nutrients for fast growing of the plant under weak irradiance, which was consistent with the proposal that freshwater red algae were shade-adapted eukaryotic lineage [10].
Light provided fuel for photosynthetic electron transport and CO 2 fixation. As the primary determinant of ATP levels and carbon metabolites, it served to modulate cellular processes based on complex transcriptional networks [48]. Genus Sheathia regulated relative small amount of photosynthetic genes under different light intensities compared with marine diatom Chaetoceros neogracile, which exhibited altered expression of most photosynthesis genes (48 out of 70) in response to high light according to previous report [49]. It maybe explain the molecular mechanisms for weak ability in environmental adaption of Sheathia, leading to the current situation of strict habitat demand and limited distribution of freshwater Rhodophyta globally.
Along with the increased photosynthesis, other metabolic pathways with up-regulated transcripts were also observed including carbon metabolism, biosynthesis of amino acids, glycolysis / gluconeogenesis, sulfur metabolism, phagosome and nitrogen metabolism. S. arcuata displayed sophisticated responses to optimize their photosynthesis and growth under weak light conditions. This finding was in line with previous research on marine red algae and diatom, which indicated that light regulated many important cellular processes, physiological processes and biochemical pathways [37,[50][51]. Among the diverse responsive transcripts, those corresponding to ribosomal proteins and involved in protein synthesis, proved highly-regulated for S. arcuata under weak light. In previous research on marine Rhodophyta Chondrus crispus, stress treatments caused decreased expression of protein synthesis-related genes [38], which implied indirectly that low light intensity was more appropriate for growth and development of S. arcuata.
Both up-and down-regulation of diverse transcription factors in S. arcuata in responding to different irradiances revealed the complexity of regulation network. Transcription factors have been enriched in early light-responsive genes according to recent genomic studies [52]. The transcription factors identified in this study can direct adaptive changes in gene expression of freshwater Rhodophyta in response to environmental light signals in further study.

Conclusions
We present the first transcriptome profiling of freshwater Rhodophyta by conducting highthroughput RNA sequencing on S. arcuata. A total of 161,483 assembled transcripts were identified and different gene expression patterns under different irradiances were observed. The results revealed that photosynthesis-related pathways significantly up-regulated under the weak light, revealing the shade-adaption of freshwater red algae S. arcuata. Molecular mechanisms underlying shade-adaption are increased expression of transcripts corresponding to antenna proteins (LHCA1 and LHCA4), photosynthetic apparatus proteins (PSBU, PETB, PETC, PETH and beta and gamma subunits of ATPase) and metabolic enzymes in the carbon fixation. The most responsive up-expressed transcripts were ribosomal proteins in S. arcuata grown under low light intensity. The de-novo transcriptome assembly of S. arcuata laid the foundation for further investigation on environmental adaption of freshwater Rhodophyta.
Supporting information S1