Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Transcriptome Sequencing of Gracilariopsis lemaneiformis to Analyze the Genes Related to Optically Active Phycoerythrin Synthesis

  • Xiaoyun Huang,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Xiaonan Zang ,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Fei Wu,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Yuming Jin,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Haitao Wang,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Chang Liu,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Yating Ding,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Bangxiang He,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Dongfang Xiao,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Xinwei Song,

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

  • Zhu Liu

    Affiliation Key Laboratory of Marine Genetics and Breeding, Ministry of Education, Ocean University of China, Qingdao, Shandong, China

Transcriptome Sequencing of Gracilariopsis lemaneiformis to Analyze the Genes Related to Optically Active Phycoerythrin Synthesis

  • Xiaoyun Huang, 
  • Xiaonan Zang, 
  • Fei Wu, 
  • Yuming Jin, 
  • Haitao Wang, 
  • Chang Liu, 
  • Yating Ding, 
  • Bangxiang He, 
  • Dongfang Xiao, 
  • Xinwei Song


Gracilariopsis lemaneiformis (aka Gracilaria lemaneiformis) is a red macroalga rich in phycoerythrin, which can capture light efficiently and transfer it to photosystemⅡ. However, little is known about the synthesis of optically active phycoerythrinin in G. lemaneiformis at the molecular level. With the advent of high-throughput sequencing technology, analysis of genetic information for G. lemaneiformis by transcriptome sequencing is an effective means to get a deeper insight into the molecular mechanism of phycoerythrin synthesis. Illumina technology was employed to sequence the transcriptome of two strains of G. lemaneiformis- the wild type and a green-pigmented mutant. We obtained a total of 86915 assembled unigenes as a reference gene set, and 42884 unigenes were annotated in at least one public database. Taking the above transcriptome sequencing as a reference gene set, 4041 differentially expressed genes were screened to analyze and compare the gene expression profiles of the wild type and green mutant. By GO and KEGG pathway analysis, we concluded that three factors, including a reduction in the expression level of apo-phycoerythrin, an increase of chlorophyll light-harvesting complex synthesis, and reduction of phycoerythrobilin by competitive inhibition, caused the reduction of optically active phycoerythrin in the green-pigmented mutant.


The genus Gracilaria encompasses the greatest number of species in the family Gracilariaceae, with the majority of them being reported from warm-water and tropical regions. Gracilariopsisspp. are widely distributed around the world and are important economic macroalgae, as they are utilized for agar extraction and may play an important role in bioremediation [1, 2, 3, 4, 5]. G. lemaneiformis is the principal species for seaweed cultivation in China, used as a nutritious food source rich in beneficial proteins, minerals and trace elements [6]. Moreover, the majority of phycobiliproteins in G. lemaneiformis is phycoerythrin, as identified by fluorescence spectroscopy. Phycoerythrin is a bright red, fluorescent and water-soluble light harvesting pigment, commonly found in red algae. Phycoerythrin has a variety of characteristics, including solubility in water, antioxidant property, anti-tumor activity and so on, which promotes use of phycoerythrin in a wide range of applications in the food, cosmetic, clinical diagnostics, immunochemistry, biological engineering and other fields, as well as in research to develop other uses. The chromophore–protein complex in PBPs shows structural flexibility in response to environmental changes [7,8,9]. The green mutant of G. lemaneiformis has a completely different color, resembling green algae, along with changes of its spectral characteristics, which appear to result from the change in phycobiliprotein composition, especially the percentage of phycoerythrin. The green G. lemaneiformis pigmentation mutant could be useful for phycobiliprotein research, because the molecular mechanism of phycoerythrin biosynthesis in G.lemaneiformis is little known. Thus, in this study, next generation sequencing technology was used to help us analyze the differentially-expressed genes between the wild type and the pigment mutant at the level of transcription, with the objective of getting an improved understanding of the biosynthesis of phycoerythrin having optical activity.

In recent years, high-throughput sequencing technology has emerged as a powerful and cost-efficient tool for transcriptome analysis [10]. Transcriptome sequencing is not dependent on prior knowledge of gene sequences, and the de novo assembly of transcripts in non-model plant species that lack a reference genome is possible [11]. Transcriptome sequencing is an efficient way to measure transcriptome composition, obtain RNA expression patterns, and discover new genes [12].

In the present study, Illumina technology was used to sequence the transcriptome of G. lemaneiformis and a green pigment mutant. After merging effective reads of two samples, Trinity software was exploited to assemble these short reads into contigs, and then contigs were assembled into scaffolds. Finally, the repository of the G. lemaneiformis transcriptome was established. Taking these unigenes as reference genes; RNA-seq (quantification) technology was used to analyze the differentially transcribed genes in the green mutant with the objective of gaining a better understanding of the phycoerythrin synthesis and activity in G. lemaneiformis. This dataset will serve as a public information platform for gene expression, genomics, and functional genomics in G. lemaneiformis. The assembled and annotated transcriptome sequences will provide a valuable resource for future genetic and genomic research in G. lemaneiformis and closely related species.

Materials and Methods

Algal strains

The green mutant was discovered among wild-type fronds of Gracilariopsis lemaneiformis on the Nanao Island. The morphology and growth of the green mutant is almost the same as the normal group. After collection from Nanao island (Coastal Experimental Station of Shantou University in Nanao Island) in October, the green mutant and the wild type of G. lemaneiformis were thoroughly rinsed with sterilized sea water and cultured in 5L flasks containing 2L sterilized modified f/2 medium supplemented with 100 μM NaNO3 and 10 μM NaH2PO4·H2O, under a light intensity of 50 μmol photons m-2s-1 for a 12:12 light/dark period at 23±1°C. The medium was changed every three days. After a period of cultivation, the algae were frozen in liquid nitrogen and stored at -80°C for RNA extraction.

Total RNA extraction and library construction

Total RNA of the wild type and the green mutant was extracted from the freshly frozen tissue samples by using TIANGEN reagent (Novogene), according to the manufacturer's instructions. Then, these RNA samples were tested to ensure the quality met the standard for library construction with regard to four aspects. First, the levels of degradation and contamination of RNA samples were checked by agarose gel electrophoresis. Second, RNA purity was checked by Nanodrop to confirm that it was of sufficient quality with an OD 260/280 ratio value near 2. Third, the concentration of RNA samples was accurately quantified by Qubit. Fourth, Agilent 2100 was used to check the integrity of RNA samples. When the RNA quality parameters met the requirements, the cDNA library could be constructed.

Magnetic beads with oligo(dT) were used to isolate eukaryotic mRNA from the total RNA. Subsequently, fragmentation buffer was added to chop the mRNA into short fragments, which were then used as the template for single-stranded cDNA synthesis by adding 6-bp random hexamers. Then, the second-strand cDNA synthesis was performed by adding buffer, dNTPs, DNA polymerase I and RNase H. After purification of the double-stranded cDNA using AMPure XP beads, the ends were repaired and poly (A) was linked to the 3’-cDNA. Sequencing adapters, which contained a recognition site, were added to the end of poly(A). After amplifying the double-stranded cDNA, the PCR amplification product was purified by AMPure XP beads. Finally, the cDNA library was constructed.

After construction of the cDNA library, the preliminary initial quantitation of the library was performed by Qubit 2.0. Then, the cDNA concentration was diluted to 1.5ng/μl. Subsequently, insert size was checked by Agilent 2100. After ascertaining that the insert size met the requirement, the effective concentration of the library was accurately quantified by Q-PCR (Effective concentration of library >2 nM) to guarantee the quality of the library.

Illumina sequencing, quality controls and de novo assembly

The pools for wild type and the green mutant were used to generate two paired-end cDNA libraries, which were then sequenced using Illumina HiSeqTM2500 and MiSeqTM sequencers. The raw data were obtained and the base quality was tested using Qphred software. Q10, Q20, Q30, Q40 represent an accuracy of 90%, 99%, 99.9%, 99.99%, respectively. Then, the low-quality reads were filtered out, (i.e., reads containing primer/adaptor sequences, reads in which the proportion of N (i.e. undetermined bases) were greater than 10%, reads with more than 50% proportion of bases with Qphred5, and so on) and the remaining clean reads were retained. The raw data were submitted to NCBI SRA with accession number SRP095068.

Transcriptome de novo assembly was carried out using Trinity [11]. Clean reads were virtually chopped to form a k-mer (k = 25) library by the Trinity software, to construct a dictionary of k-mers. Then, the k-mer reads with a certain length of overlap were combined to extend the sequence until they could not be further extended. These final extended sequences are called contigs. Contigs were assembled into components based on their overlap relationships. Then pair-end reads were re-used to fill the gaps in the components in order to obtain the unigenes. Finally, we got the assembled results of TRINITY. fasta.

Gene function annotation, classification and CDS forecast

To obtain comprehensive information about gene function, unigene sequences were annotated by blast X and analyzed using the NR, NT, Swiss-prot (e-value = 1e-5), Pfam (e-value = 0.01), KOG (e-value = 1e-3), KEGG (e-value = 1e-10), and GO (e-value = 1e-6) database resources. On the basis of gene function annotation, genes can be classified into three categories: BP (biological process), CC (cellular component) and MF (molecular function) by Gene Ontology. Using the KOG database, unigenes can be divided into 26 groups. Unigenes can be mapped into different metabolic pathways by using KEGG. If the results of one database conflict with another one, the priority order of the databases should be NR, GO, KOG, Swiss-prot, NT.

ORF information was extracted according to the annotated unigene sequences directly from databases. Then, the coding region was translated into amino acid sequences (5'-3'direction).When the unigene sequences failed to match to the protein databases, ESTscan (3.0.3) software was applied to predict the ORF of these sequences. Finally, the nucleic acid sequences and the amino acid sequences encoded by them were annotated and could be applied for further study.

GO and KEGG enrichment analysis of DEGs

Pathway enrichment analysis was used to identify significantly enriched metabolic pathways or signal transduction pathways of DEGs by comparing them with the whole genome background. Screening Threshold was qvalue <0.005 and | log2 fold-change| > 1 (Log2 fold-change = log2(Read count of Gg/Read count of Gr), Gr stands for the sample of the wild type while Gg stands for the sample of green mutant.). For the differentially expressed genes, log2 fold-change >0 means DEGs of the pigment mutant were up-regulated, log2 fold-change < 0 means DGEs of the pigment mutant were down-regulated compared to the wild type. After calculating the p-value, multiple hypothesis testing was carried out to try to reduce false positives. The smaller the p-value is, the larger the difference.

DEGs were mapped to the term types of the Gene Ontology database (, and the numbers of DEGs were calculated in every term type. The significantly enriched terms were found by comparing to the whole genome background [13]. This method of enrichment analysis is termed GOseq [14].

KEGG is the main public database for pathway determination [15]. KEGG enrichment can identify the principal metabolic pathways and signal transduction pathways of DEGs. The KEGG significant enrichment was analyzed by the hypergeometric test to look for the significant enrichment pathway of DEGs compared to the all annotated genes by using KEGG Pathway as a unit. Computational formula of the analysis is:

N stands for the number of genes that can be annotated by pathways, and n is the number of DEGs in N. M is the number of genes annotated to a certain pathway among all the genes, and m is the number of DEGs in M. Those pathways with FDR (false discovery rate) ≤ 0.05 were defined as the significant enrichment pathways in DEGs.

Gene validation and expression analysis

Based on the transcriptome result, genes encoding apo-phycoerythrin were chosen for validation of the transcriptome result, and study of phycoerythrin biosynthesis based on the differential expression between the green mutant and the wild-type using qRT-PCR. Total RNA was extracted from the wild type and the green mutant cultured in a light intensity of 54 μmol m-2 s-1 for 12h everyday for one week. The algae were powdered in liquid nitrogen, followed by RNA extraction using RNA extraction kits (Takara). The synthesis of cDNA was performed using a PrimeScript II 1st Strand cDNA Synthesis Kit (TaKaRa). Oligo dT primer and random 6-mer primers were both used as the RT-primers.

Real-Time qRT-PCR was performed using a SYBR Premix Ex Taq™ II Kit (TaKaRa) with a 20 μl reaction system containing 10.4μl 2×SYBR green Mastermix (TaKaRa), 0.3μl forward and reverse primers, 1 μl cDNA and 8 μl ddH2O. Reaction steps followed the following program: 95°C for 30 s, followed by 40 cycles of 95°C for 5 s, 55°C for 20 s, 72°C for 20 s and reading fluorescence signal, followed by 1 cycle of 95°Cfor 15 s, 60°C for 60s, 95°C for 15 s.

Fluorescence spectrophotometer

First, equal thallus masses of the wild-type and the green pigment mutant were weighed and ground with liquid nitrogen in a mortar. Then, the powder was transferred into a 10 ml EP tube, 3 ml of 0.1mol/l PBS were added and the samples were sonicated at intervals for 6 minutes (repeated cycles of sonication for 2.5 seconds, pause for 5 seconds, for a total of 6 minutes). The mixture was centrifuged for 10 minutes at 10000r/min. Then 1 ml supernatant liquid was added into the cuvette to measure the fluorescence intensity of the characteristic peaks for phycoerythrin and chlorophyll.


Short-read de novo sequencing and assembly

By sequencing the transcriptome of G. lemaneiformis, 30,450,973 and 28,306,016 raw reads of the wild type and the green mutant were obtained, respectively. To guarantee the quality of data used for the analyses, stringent parameters were used to measure the quality of reads, and all reads with undetermined information greater than 10% were removed. After this filtering, 29,217,420 and 27,230,252 clean reads of the wild type and green mutant were generated, with 94.97% and 95.14% bases showing quality greater than Q20, while the GC percentage was 52.42% and 52.83% respectively. Then, the k-mer (k = 25) technique was used to measure the nucleotide sequence of reads.

As a no-reference genome project, these clean reads were assembled into contigs using overlap information until the contigs could not be further extended. After a series of subsequent steps, 86915unigenes (>200 bp) were generated with N50 of 1481 bp and N90 of 245 bp. The size of the unigenes varied from 201 bp to 38052 bp with a mean length of 679 bp, and a total length of 59,051,626 bp. The length distribution of the unigenes is given in Fig 1.

Fig 1. The length distribution of unigenes.

The vertical axis represents the number of unigenes, and horizontal axis gives the length interval of the unigenes.

Function annotation

To acquire the comprehensive genetic information of G. lemaneiformis, the unigenes were blasted against the NR, NT, KO, SwissProt, PFAM, KOG, GO, KEGG database resources to identity the functions of all of the unigene sequences.

All of the unigenes were annotated to genes having known functions in the indicated databases based on the sequences with greatest similarity. Among these, 42884 unigenes were annotated in at least one database, 2070 unigenes were annotated in all of the databases (Table 1). In total, 36036 unigenes (41.46% of all unigenes) were aligned to the NR database with an e-value < 1e-5.

Table 1. Summary of the function annotation results for G. lemaneiformis unigenes in public protein databases.

A total of 32194 (37.04%) unigenes were annotated successfully by GO annotation. These annotated unigenes were classified into the next terms of three ontologies: BP (biological process), CC (cellular component) and MF (molecular function). The distribution of unigenes is shown in Fig 2. In many cases, the same unigene was assigned to multiple terms. Within the cellular component category, the majority of genes were assigned into “Cell” (11882, 36.91%) and “Cell Part” (11877, 36.89%). For the molecular function, most of genes were involved in “Catalytic Activity” (16154, 50.18%) and “Binding” (18384, 57.10%). Among the "Biological Process", a high percentage of genes were classified into “Cellular Process” (19233, 59.74%), “Metabolic Process” (18551, 57.62%) and “Single-organism Process” (14679,45.60%).The greatest number of annotated unigenes were involved in "Biological Process".

Fig 2. Gene function classification of all annotated unigenes by Gene Ontology.

The vertical axis represents the number of unigenes, and horizontal axis gives the specific GO sub-categories.

The unigenes was searched against the KOG database in order to predict and classify their possible functions. A total of 14996 unigenes were annotated successfully and classified into 26 groups by KOG annotation (Fig 3). Among the 26 KOG categories, “Posttranslational Modification", "Protein Turnover", and "Chaperones” account for the largest proportion of unigenes (2286 unigenes, 15.24%). The second largest group of unigenes were in “Translation", and "Ribosomal Structure and Biogenesis”(1689 unigenes, 11.26%), followed by “Signal Transduction Mechanisms”(1402 unigenes, 9.35%), “RNA Processing and Modification” (973 unigenes, 6.49%), “Intracellular Trafficking, Secretion, and Vesicular Transport” (941 unigenes, 6.28%).The smallest groups were “Cell Motility” and “Extracellular Structures” (22 unigenes, 0.15%).

Fig 3. KOG distribution of the annotated unigenes.

The y-axis represents the percent (%) of unigenes classified into the specific categories.

Based on the KO annotation results, all of the 12279 annotated unigenes could be classified into 268 pathways at the level of "Pathway Hierarchy 2" on the basis of their participation in KEGG metabolic pathways. KEGG pathways can be divided into five branches: "Cellular Processes", "Environmental Information Processing", "Genetic Information Processing", "Metabolism", and "Organismal Systems". Most of the unigenes were involved in the "Translation Process" of "Genetic Information Processing"(1646unigenes, 13.41%), followed by "Folding, Sorting and Degradation" (1111, 9.05%), "Carbohydrate Metabolism" (1102, 8.97%), " Signal Transduction" (1089, 8.87%), and "Energy Metabolism" (1023, 8.33%) (Fig 4).

Fig 4. KEGG classification.

Y-axis lists the various metabolic pathways, while the x-axis gives the percent of genes in the pathways. The KEGG pathway categories can be divided into five groups: A. Cellular Processes, B. Environmental Information Processing, C. Genetic Information Processing, D. Metabolism, E. Organismal Systems.

Based on the results above, a large number of unigenes were found to match known proteins in the four public databases, which provided detailed functional information. This function analysis is essential to get a deeper understanding of the gene expression profiles in G. lemaneiformis. At the same time, these annotations contribute to identifying unigenes and the pathway involved in the synthesis of phycoerythrin.

DEG and DEG GO enrichment

In the present study, the primary goal of transcriptome sequencing was to compare gene transcriptional levels between the wild type and the green-pigmented mutant. A total of 4041 differentially expressed genes (5.6% of all detected genes) were identified. All these DEGs were screened by “the absolute value of log2 Ratio ≥ 1 and FDR ≤ 0.001” between the two libraries. In the green mutant library, 3726 DEGs were up-regulated and 315 DEGs were down-regulated compared to the wild type.

In order to get a better understanding about the biological function of these DEGs, GOseq was used to divide these DEGs into different GO terms. A total of 3239 DEGs were classified by GO functional annotation. In general, "non-membrane-bounded organelle cellular component" (GO:0043228) and "intracellular non-membrane-bounded organelle cellular component" (GO:0043232) account for the largest proportion of DEGs (572 DEGs, 17.66%), followed by "cellular component biogenesis biological process" (GO:0044085) (393 DEGs, 12.13%), "translation biological process" (GO:0006412) (344 DEGs, 10.62%), "ribonucleoprotein complex cellular component" (GO:0030529) (335 DEGs, 10.34%), "ribosome cellular component" (GO:0005840) (289 DEGs, 8.92%), "ribonucleoprotein complex biogenesis biological process" (GO:0022613) (286 DEGs, 8.83%), "ribosome biogenesis biological process" (GO:0042254) (284 DEGs, 8.77%), "structural constituent of ribosome molecular function" (GO:0003735) (241 DEGs, 7.44%) (Fig 5). The results indicate that at least 54.92% of the DEGs directly participated in the process of protein synthesis by affecting the structure of the ribosome and the protein translation process.

Fig 5. The comparison of GO enrichment between all DEGs and up-regulated DEGs.

With the aim of getting detailed information of DEGs, these DEGs were divided into two groups (up- and down-regulated DEGs). Up-regulated DEGs were compared with the total DEGs. The results indicated that DEGs with enriched GO terms were mainly up-regulated genes, and all of these GO terms’ "Corrected_pValues" were less than 0.05. In the biological process category, the end nodes of the GO terms for up-regulated DEGs were mainly enriched in “translation” and “ribosome biogenesis biological process”, and down-regulated DEGs were mostly included in “multicellular organismal development”. In the "Cellular Component" GO network analysis, the significantly enriched GO terms focus on “Ribosome Cellular Component”. In the Molecular Function category, the GO terms of up-regulated DEGs were enriched in “Structural Constituent of Ribosome” (Fig 6).

Fig 6. Biological Process, Cellular Component and Molecular Function network with significantly enriched GO terms in DGEs between wild type and green pigment mutant.

(A) Up-regulated DEGs of Molecular Function in green pigment mutant compared to control. (B) Up-regulated DEGs of Biological Process in green pigment mutant compared to wild type. (C) Up-regulated DEGs of Cellular Component in green pigment mutant compared to wild type. (D) Down-regulated DEGs of Biological Process in green pigment mutant compared to wild type. The colored nodes represent the level of enrichment, and the deeper the color, the higher the enriched level.

In conclusion, the results of the GO functional enrichment analyses of DEGs indicated that the biggest difference of all DEGs between the wild type and the green pigment mutant was for the unigenes related to biological process of protein and the structure of ribosome. The results also indicated that the significantly up-regulated ribosomal subunit composition may increase the synthesis of protein.

KEGG Pathway analysis and detection of candidate genes

Pathway-based analysis plays an important role in getting a further understanding of the interactions among genes in the same pathway. A total of 4041 DEGs were analyzed against the KEGG databases. These DEGs were enriched into 294 pathways by the hypergeometric statistical test. The top 20 comprehensive, up-regulated and down-regulated DEG KEGG enriched pathways were identified. Generally, the largest number of DEGs (141) was enriched in the pathway of Ribosome, among these, 140 DEGs were up-regulated and 1 DEG was down-regulated. The results indirectly indicated that in the green pigment mutant, the function of protein synthesis increased significantly.

Apo-phycoerythrin was identified as a down-regulated DEG, as shown in Fig 7, the expression of the green mutant's light-harvesting chlorophyll protein complex increased significantly. Meanwhile, the DGEs related to the synthesis of porphyrin and chlorophyll metabolism were enriched (Fig 8). For porphyrin and chlorophyll metabolism, a large number of genes were highly expressed in the green mutant compared to the wild type of G. lemaneiformis. Interestingly, these genes are related to the synthesis of biliverdin, which is a precursor of phycoerythrin. Peroxidase, myoglobin, cytochrome a, cytochrome c and catalase compete with biliverdin for heme. The up-regulation of COX15 and cytochrome c oxidase assembly protein subunit 15 ( might competitively decrease the synthesis of biliverdin. For phycoerythrin to be optically active, apo-phycoerythrin must be bound to phycoerythrobilin. In this study, the transcription level of apo-phycoerythrin was decreased, and the synthesis of phycoerythrobilin was competitively inhibited in the green mutant. At the same time, the transcription level of the light-harvesting chlorophyll protein complex was increased. These changes in pigment composition are clearly responsible for the green color of the mutant.

Fig 7. Apo-phycoerythrin and light-harvesting chlorophyll protein complex function-related KEGG pathway.

CpeA, and CpeB represent theα and β subunit of apo-phycoerythrin, respectively. Lhc represents the light-harvesting chlorophyll protein complex, the green color represents the down-regulated DEGs, red color represents the up-regulated DEGs.

Fig 8. The metabolic pathway of phycoerythrobilin (porphyrin and chlorophyll metabolism).

Red box represents up-regulated DEGs. The dashed lines represent omitted intermediate steps. represents the cytochrome c oxidase assembly protein subunit 15.

Analysis of the results of Real-time quantitative PCR

To validate the sequencing results obtained by RNA-seq, real-time PCR was performed on the apo-phycoerythrin gene related to the biosynthesis of phycoerythrin. The result showed that the expression of apo-phycoerythrin was much higher in the wild type than the pigment mutant, which is similar to the RNA-seq data (Fig 9). The result verified the reliability of the transcription sequencing results.

Fig 9. Differential expression analysis of apo-phycoerythrin (c13260-g1) between wild type and the green mutant.

(A) The result of qRT-PCR. (B)The result of RNA-seq. FPKM represents the expected number of fragments per kilobase of transcript sequence per millions of base pairs sequenced, which provided a favorable reliability for the quantitative analysis of DEG.

Fluorescence spectra analysis of phycoerythrin

To validate the analysis results of the KEGG pathway analysis, a fluorescence spectrophotometer was used to analyze fluorescence intensity of phycoerythrin at the characteristic peaks, for green mutant and wild type. We set the wavelength of emission light at 340nm and 480 nm, respectively (Fig 10). Phycoerythrin absorbs maximally around 580nm, while Chlorophyll absorbs maximally around 680nm. Characteristic peaks at different wavelength can show the proportion between phycoerythrin and chlorophyll in the wild type and green mutant. When fluorescence emission was at 340nm, the characteristic peak of the wild type was lower than the green mutant, indicating there was a higher level of chlorophyll in the green mutant. When fluorescence emission was at 480nm, the characteristic peak of the wild type was very high while there was no characteristic peak in the green mutant, indicating that the level of phycoerythrin in the wild type is much greater than in the green mutant.

Fig 10. Excitation spectra for phycoerythrin and chlorophyll in the wild type and green mutant.

(A) Fluorescence emission at 340 nm. (B) Fluorescence emission at 480 nm. Phycoerythrin absorbs maximally around 580 nm, while Chlorophyll absorbs maximally around 680 nm. Characteristic peaks at different wavelengths can show the proportion between phycoerythrin and chlorophyll in the wild type and green mutant.


RNA-Seq has proved useful for the detection of gene expression, the discovery of novel transcripts, and the identification of differentially expressed genes. Here, second generation sequencing technology was used to analyze the transcriptome of G. lemaneiformis, and DEG analysis was used to reveal the change of expression pattern in the green mutant.

Red algae contain phycobiliproteins in addition to chlorophyll a, chlorophyll b, carotene and lutein. Consequently, red algae have two light-harvesting systems including the peripheral antenna system consisting of phycobilisomes and the internal antenna system comprised of the chlorophyll protein complex. Phycobiliproteins are pigments composed of colorless apoproteins that bind several light-absorbing phycobilin chromophores [16]. Phycoerythrin, as the main antenna, plays an important role in normal G. lemaneiformis giving it a red color, when phycoerythrin is successfully synthesized and the phycoerythrobilin is attached to the apo-phycoerythrin.

By KEGG Pathway analysis, the transcriptional level of apo-phycoerythrin in the green mutants was significantly down-regulated, about 1.2 times, which may be the direct cause of the low level of phycoerythrin in green mutants. Meanwhile, in the porphyrin and chlorophyll metabolic pathway, cytochrome c heme-lyase, and cytochrome c oxidase assembly protein subunit 15, increased 4–6 times, which resulted in the higher level of cytochrome C and heme A. At the same time, the expression level of biliverdin decreased significantly by competitive inhibition, which negatively affected phycoerythrobilin synthesis. These results indicate a competitive relationship between phycoerythrobilin and chlorophyll synthesis in G. lemaneiformis, and also demonstrated the importance of phycoerythrobilin and phycoerythrin synthesis for the synthesis of phycoerythrin with optical activity.

In photosynthetic organisms, solar energy in the form of photons is gathered by light-harvesting complexes (LHC) and then transferred to reaction centers, where the excitation energy is converted into electrochemical energy by charge separation [17]. Concomitant with the decrease of the phycoerythrin level, the chlorophyll light-harvesting protein complexes, including Lhca1, Lhca5, Lhcb2 were up-regulated 4–8 times. Plants and microalgae respond to light quantity changes by either reducing (excess light) or increasing (limiting light) the cellular LHC level [18, 19, 20]. Therefore, the increase in light-harvesting protein may compensate for the decrease in the amount of light energy harvested due to the drop in the phycoerythrin level.

The energy captured in an excited chlorophyll molecule can have several fates: it can be transferred to another pigment, used to drive the redox reactions of photosynthetic electron transfer (PET), dissipated thermally as heat, or re-emitted as fluorescence. Because these processes are in competition with each other, any one of them will be modified by changes in any of the other pathways [21]. In this study, the expression level of proteins related to photosynthetic electron transport in the green mutant increased significantly. In PSⅡ, the expression level of three protein subunits including PsbO, PsbS, PsbU increased 4–6 times, in the cytochrome b6-f complex, the Fe-S subunit increased 5.7 times, and ferredoxin—NADP+ reductase increased 5–8 times in PSⅠ. All of these up-regulated proteins play the role of electron transport carrier between PSⅠand PSⅡ. Therefore, the speed of electron transfer might accelerate between PSⅠ and PSⅡ. Based on the outcomes for the pathways for porphyrin and chlorophyll synthesis, we know that the expression quantity of cytochrome c increased, which is interesting because cytochrome c plays the role of electron transport carrier in the oxidative phosphorylation metabolic pathways. In addition, five subunits of cytochrome c oxidase and two subunits of cytochrome c reductase increased significantly, which might indicate that cytochrome c can be oxidized and reduced quickly, increasing the rate of electron transport (Fig 11). Electron transport always couples with ATP synthesis in photosynthesis and oxidative phosphorylation, thus the information gathered suggested that the synthesis of ATP was likely increased.

Fig 11. The pathway of photosynthesis and oxidative phosphorylation.

(A) The pathway of photosynthesis. (B) The process of oxidative phosphorylation. Red numbered boxes indicate up-regulated genes in the green mutant as compared to the wild type.

The expression level of ferredox in NADP reductase increased about 5–8 times, thus more NADPH would be available for synthesizing ATP to provide energy. The expression of NADH dehydrogenase, including Ndufs1, Ndufs2, Ndufs3, Ndufv1, Ndufv2, Nduf3v, Ndufa2, Ndufa6, Ndufa9, Ndufab1, also increased significantly, which would provide energy for synthesizing ATP directly. Three subunits of F-type ATPase α, β, γ, F-type H+-transporting ATPase subunit O and almost all of the subunits of V-type ATPase increased significantly, which could increase the rate of ATP synthesis. As a result, photophosphorylation increased significantly.

The GO functional enrichment analyses of DEGs suggested that the biggest difference of all DEGs between the wild type and the green pigment mutant was that the significantly up-regulated unigenes related to the biological process of protein and the structure of ribosome. Therefore, the results of DEG GO enrichment can be explained to a certain extent. On the one hand, photophosphorylation and oxidative phosphorylation provide the energy basis for the synthesis of ribosomes and protein. On the other hand, the enhancement of ribosome metabolic activity could allow the synthesis of more proteins related to energy metabolism. Increase of energy metabolism and material metabolism of the green mutant may help to compensate for the shortage of energy due to the reduction in the amount of phycoerythrin.

Through this comparative analysis between the wild type and the green-pigmented mutant, we identified the causes of the color variation in the mutant and identified the synthesis pathway of phycoerythrin. This study also set an important foundation for detecting potential genes related to the synthesis of phycoerythrin with optical activity. Additionally, G. lemaneiformis is able to increase the expression of the light-harvesting protein complexes to increase light capture when lacking phycoerythrin, which will provide an important base for studying the Lhc gene family.


G. lemaneiformis is a suitable alga for studying the metabolic pathway of phycoerythrin, as phycoerythrin accounts for the largest proportion of pigments in the phycobiliprotein. The green mutant provides a good comparison for finding differential expression of genes for phycoerythrin synthesis. Transcriptome sequencing technology provides an effective means for analyzing the genetic information of this species. We have assembled 86915 unigenes using the Trinity de novo method, including identifying 42884 annotated unigenes in data bases, and screening 4041 differentially expressed genes. This study helps to provide a better understanding of the molecular mechanisms of photosystem regulation. The transcriptome dataset will take an active role in identifying genes related to photosynthesis in red algae, and it will also serve as a public information platform for studying gene expression and function in G. lemaneiformis.


This work was supported by the National Natural Science Foundation of China (Grant No. 31472255). We thank Dr. John P. van der Meer for his comments and editing of the manuscript. We thank Professor Weizhou Chen for providing the green mutant of Gracilariopsis lemaneiformis.

Author Contributions

  1. Conceptualization: XNZ XYH.
  2. Data curation: BXH.
  3. Formal analysis: XYH.
  4. Funding acquisition: XNZ.
  5. Investigation: FW YMJ HTW.
  6. Methodology: XYH XNZ.
  7. Project administration: XYH.
  8. Resources: YTD.
  9. Software: CL XWS DFX.
  10. Validation: XNZ.
  11. Visualization: ZL.
  12. Writing – original draft: XYH.
  13. Writing – review & editing: XYH XNZ.


  1. 1. Freile-Pelegrın Y, Murano E (2005) Agars from three species of Gracilaria (Rhodophyta) from Yucatán Peninsula. Bioresource Technology 96: 295–302. pmid:15474929
  2. 2. Lapointe BE, Ryther JH (1978) Some aspects of the growth and yield of Gracilaria tikvahiae in culture. Aquaculture 15: 185–193.
  3. 3. Marinho-Soriano E (2001) Agar polysaccharides from Gracilaria species (Rhodophyta, Gracilariaceae). Journal of Biotechnology 89: 81–84. pmid:11472802
  4. 4. Marinho-Soriano E, Bourret E (2003) Effects of season on the yield and quality of agar from Gracilaria species (Gracilariaceae, Rhodophyta). Bioresource Technology 90: 329–333. pmid:14575957
  5. 5. Zhou Y, Yang HS, Hu HY, Liu Y, Mao YZ, et al. (2006) Bioremediation potential of the macroalga Gracilaria lemaneiformis (Rhodophyta) integrated into fed fish culture in coastal waters of north China. Aquaculture 252(2): 264–276.
  6. 6. Wu HY, Jiang H, Liu CX, Deng YY (2015) Growth, pigment composition, chlorophyll fluorescence and antioxidant defenses in the red alga Gracilaria lemaneiformis (Gracilariales, Rhodophyta) under light stress. South African Journal of Botany 100:27–32.
  7. 7. MacColl R (1998) Cyanobacterial phycobilisomes. Journal of Structural Biology 124: 311–334. pmid:10049814
  8. 8. MacColl R (2004) Allophycocyanin and energy transfer. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1657: 73–81.
  9. 9. MacColl R, Eisele LE, Malak H, Endres RL, Williams EC, et al. (1999) Studies on R-phycoerythrins from two Antarctic marine red algae and a mesophilic red alga. Polar Biology 22: 384–388.
  10. 10. Sun X, Zhou S, Meng F, Liu S (2012) De novo assembly and characterization of the garlic (Allium sativum) bud transcriptome by Illumina sequencing. Plant Cell Reports 31: 1823–1828. pmid:22684307
  11. 11. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29: 644–652. pmid:21572440
  12. 12. Zhang D, Ren L, Yue J-h, Shi YB, Zhuo LH, et al. (2015) RNA-Seq-based transcriptome analysis of stem development and dwarfing regulation in Agapanthus praecox ssp. orientalis (Leighton) Leighton. Gene 565: 252–267. pmid:25865295
  13. 13. Thimm O, Bläsing O, Gibon Y, Nagel A, Meyer S, et al. (2004) MAPMAN: a user‐driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. The Plant Journal 37: 914–939. pmid:14996223
  14. 14. Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene ontology analysis for RNA-seq: accounting for selection bias. Genome biology 11: 2–r14.
  15. 15. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, et al. (2008) KEGG for linking genomes to life and the environment. Nucleic Acids Research 36: D480–D484. pmid:18077471
  16. 16. Everroad RC, Wood AM (2012) Phycoerythrin evolution and diversification of spectral phenotype in marine Synechococcus and related picocyanobacteria. Molecular Phylogenetics and Evolution 64: 381–392. pmid:22588203
  17. 17. Llansola-Portoles MJ, Uragami C, Pascal AA, Bina D, Litvin R, et al. (2016) Pigment structure in the FCP-like light-harvesting complex from Chromeravelia. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1857: 1759–1765.
  18. 18. Bonente G, Pippa S, Castellano S, Bassi R, Ballottari M (2012) Acclimation of Chlamydomonas reinhardtii to different growth irradiances. Journal of Biological Chemistry 287: 5833–5847. pmid:22205699
  19. 19. Ballottari M, Dall'Osto L, Morosinotto T, Bassi R (2007) Contrasting behavior of higher plant photosystem I and II antenna systems during acclimation. Journal of Biological Chemistry 282: 8947–8958. pmid:17229724
  20. 20. Wientjes E, van Amerongen H, Croce R (2013) Quantum yield of charge separation in photosystem II: functional effect of changes in the antenna size upon light acclimation. The Journal of Physical Chemistry B 117: 11200–11208. pmid:23534376
  21. 21. Wobbe L, Bassi R, Kruse O (2016) Multi-level light capture control in plants and green algae. Trends in Plant Science 21: 55–68. pmid:26545578