Transcriptome Sequencing and Profiling of Expressed Genes in Phloem and Xylem of Ramie (Boehmeria nivea L. Gaud)

Ramie (Boehmeria nivea L. Gaud) is a highly versatile herbaceous plant which is widely cropped in southern China. The success of this herbaceous plant relies on wide use in modern industry. Understanding the profiling of expressed genes in phloem and xylem of ramie is crucial for improving its industrial performance. Herein, we uncover the transcriptome profile in phloem and xylem in present study. Using Illumina paired-end sequencing technology, 57 million high quality reads were generated. De novo assembly yielded 87,144 unigenes with an average length of 635 bp. By sequence similarity searching for public databases, a total of 32,541 (41.77%) unigenes were annotated for their function. Among these genes, 57,873 (66.4%) and 28,678 (32.9%) unigenes were assigned to categories of Gene Ontology and Orthologous Groups database, respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 18,331 (21.0%) unigenes were mapped to 125 pathways. The metabolic pathways were assigned the most unigene (4,793, 26.2%). Furthermore, Pol II and Pol III subunits as well as the genes of Galactose metabolism pathway had higher expression in phloem compared to xylem. In addition, fatty acid metabolism pathway genes showed more abundant in xylem than phloem. These results suggest that high activities of RNA synthesis and Galactose metabolism pathway promises fiber synthesis in phloem. The present study is the initial exploration to uncover the fiber biosynthesis difference between phloem and xylem in ramie through the analysis of deep sequencing data.


Introduction
Ramie (Boehmeria nivea) is one of the most widely cropped herbaceous plants in southern China which provides important natural fiber for industry, including packing materials, fishing nets, filter cloths, industrial fuel and bio-fertilizers [1,2,3]. In ramie, as in other plants, phloem and xylem are two different complex tissues which plays crucial role in transporting food and water in plant [4]. In modern industry, these two tissues have different uses. Bast fibers in phloem of ramie are one of the best clothing materials which are the second usage clothing materials in China except cotton fiber [5,6] while, lignin in xylem is great raw material for industrial ethanol production and suitable material for bio-fertilizers after fermentation [7,8]. Hence, several studies have been focus on the mechanisms of phloem and xylem development in ramie [6,9,10,11]. These researches focus on single gene or a class of single-function genes expression in phloem and xylem. However, to better understand the differentiation of the net work genes between phloem and xylem, high-throughput technology needs to be employed.
Nowadays, system biology is applied to explore metabolic network of fiber biosynthesis in plants by improving lignocellulose biological quality with the ultimate purposes of biofuel reduction in processing and disposing, cost saving and pollution reduction [12,13]. Cell behavior and biological phenomena can be comprehended better with the study on metabolic network through analyzing biochemical data, exploring the relations between structure and function of metabolic network as well as the relations between the metabolic flow and gene expression [14]. However, up to date, little information has been given in metabolic network in ramie which could aid to planting industry and machining. In 2013, Liu et al reported a transcriptome analysis using Illumina paired-end sequencing in ramie which is the first time to characterize the ramie transcriptome [15]. Also, they detected the different expression among leaf, root, stem bast, stem xylem and stem shoot. The report offered a new sight for ramie study. Considering the industrial uses of ramie, the most urgent need is to understand the metabolic network of fiber biosynthesis in phloem and xylem. Once understanding the different metabolic network of fiber biosynthesis between these two different tissues, we could adjust the breeding and planting strategy of ramie for higher productive efficiency.
Next-generation sequencing offered a new approach which is more efficient and less cost for obtaining functional genomic information [16]. However, for non-model organisms, high sequencing coverage depth and facilitates the de novo assembly are needed. Furthermore, next-generation sequencing provides a new strategy which promises understanding the gene expression differentiation without knowing sequences. This superiority of next-generation sequencing guarantees transcriptome scanning for network genes rather than micro-arrayed in non-model organisms [17]. Thus, next-generation sequencing is powerful and efficient tools to understand the molecular mechanisms involved in fiber biosynthesis and key targets for genetic manipulation as well as germplasm improvement. In recent years, this technology has been applied in several studies for plant fiber formation and biosynthesis [18,19,20]. However, very limited genomics and functional genomics resources which related to the mechanisms of ramie fiber formation has been unveiled.
In the present study, we used the Illumina paired-end sequencing technology to characterize the transcriptome of phloem and xylem tissues in ramie and identified the different expressed genes between phloem and xylem. These data offer useful information for understand the gene controlling of the fiber biosynthesis. To our knowledge, this is the initial exploration to uncover the fiber biosynthesis difference between phloem and xylem in ramie through the analysis of deep sequencing data.

Materials and RNA extraction
All the ramie individuals in the present study were grown in the experimental field in Department of Biotechnology and Environmental Science, Changsha University. The samples of ramie were planted in Changsha University, Department of Bioengineering & Environmental Science. We used 5 different adult individuals for this study. The phloem and xylem tissues were distinguished and collected under the stereomicroscope from the middle stem of the plants (Fig. 1). The tissues were immediately frozen in liquid nitrogen and then stored at 280u until total RNA isolation. The Total RNAs were extracted from each tissue of ramie of three different individuals by using Plant DNA Isolation Kit (Invitrogen, CA, USA).

cDNA library construction and transcriptome sequencing
The cDNA library construction and transcriptome sequencing were performed by BGI-Shenzhen (Shen Zhen, China) using a HiSeq 2000 platform (Illumina, San Diego, CA, USA). Briefly, poly(A) RNA was isolated from 20 mg total RNA with Sera-mag Magnetic Oligo (dT) Beads (Illumina). First, the mRNA was fragmented into small pieces and then the cDNA was synthesis using random hexamer (N6) primers (Illumina) according to the end-repair and phosphorylation protocol. Second, 15 rounds of PCR amplification was used to enrich the end-repair cDNA. Finally, the cDNA library was underwent Illumina sequencing in an Illumina Genome Analyzer HiSeq 2000 according to the manufacturer's instruction. The sequencing data is available in the

Data process and de novo assembly
After transcriptome sequencing, for de novo assembly, all the raw reads were filtered into high-quality clean reads. The clean reads from the cDNA libraries were filtered by low-quality reads and adaptors-only with Q value #20. We used Trinity software (http://trinityrnaseq.sourceforge.net/) for de novo assembling with an optimized k-mer length of 25. The paired-end reads were assigned to different contigs for following annotation. We also used TGICL to filter out the redundant unigene and process single set of non-redundant unigene.

Gene annotation
BLASTx alignment E-value#10 25 was performed to determine the functional annotation of unigenes. The protein databases of BLASTx are non-redundant protein database (nr) form NCBI (http://www.ncbi.nlm.nih.gov), the Cluster of Orthologous Groups database (COG) (http://www.ncbi.nlm.nih.gov/COG), Swiss Prot (http://www.expasy.ch/sprot), Protein Information Resource (PIR) (http://pir.georgetown.edu/), Protein Research Foundation (PRF) (http://www.prf.or.jp/index-e.html) and Protein Data Bank (PDB) (http://www.rcsb.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (http://www.genome.jp/kegg). The best matched results were regarded as the sequence direction of unigenes. Once the results were different from the different database, we set the priority order of the database as nr, COG, Swiss Prot, PIR, PRF, PDB and KEGG. Once a unigene could not match any sequence in all the databases, we used ESTScan software to predict the coding region and to decide the sequence direction of unigenes.
Subsequently, the unigenes were annotated by Blast2GO program to obtained Gene Ontology (GO) annotation according to the three levels including molecular function, biological process and cellular component ontologies. The GO annotation plot was depict from WEGO website by BGI. The unigenes were assigned into different classify functions by match the genes in COG database. The pathway assignments of the ramie unigenes were predicted by assign into KEGG genes with BLASTx.    Analysis and screening of differentially expressed genes After sequence assembly and gene prediction of the cDNA sequences, we used reads count and length information to analyze the differentially expressed genes in phloem and xylem of ramie. To calculate the relative expression levels of phloem and xylem, the reads counts were normalized to the total number of produced reads for each sample and calculated by reads per kb per million reads (RPKM) method. The significance of gene differential expression level was assessed using R, DEGseq package. The significant differentially expressed genes were confirmed from the assay of DEGseq package with P#0.0001 based on false discovery rate.

Real-time PCR
To validate the expression profiles of the selected genes, realtime PCR was performed. Total RNAs of the tissues were isolated and then cDNA were synthesized using MMLV reverse transcriptase (Invitrogen, Carlsbad, CA, USA). The real-time PCR primers were showed in Table S1. The SYBR Green I Master Mix (TaKaRa, Dalian, China) was used. The reactions were performed as denaturation at 95uC for 5 min; followed by 40 amplification cycles at 95uC for 15 s and 60uC for 30 s. Dissociation protocols were used to measure melting curves. bactin were used as internal control. The relative expression levels were calculated by 2 2DDCt .

RNA-seq and gene annotation by searching against public databases
Sequences of the two cDNA libraries derived from phloem and xylem tissues generated 33.0 and 33.4 million reads, respectively. The total data contained 10 Gb of sequences with average length of 200 bp. After filtered the low-quality reads, approximately 57 million high quality reads were obtained. Based on these high quality reads, 162,330 contigs with an average length of 357 bp ranging from 100 to 14,587 bp were obtained after assembly (Fig. 2). For each tissue, 100,207 and 62,123 contigs were assembled with an average length of 319 and 418 bp in ramie phloem and xylem, respectively. Unigenes were produced by realignment of paired-end reads to contigs, and the contigs in one transcript which assembled with not extended on the end were considered as unigenes. In total, after assembly, 87,144 unigenes were yielded with average length of 635 bp. For phloem and xylem, 50,434 and 36,710 unigenes were obtained, respectively. About 83.88% of the unigenes were produced from more than 10 reads. All the unigenes were predicted via BLAST against public databases. With the E-value threshold of 10 25 , 32,541 (41.77%) unigenes were matched the sequences in the databases. The Evalue distribution of the top hits in the Nr database revealed that the mapped sequences showed significant homology (Fig. 3A), and 38.5% of the sequences with similarities were the 40-60% were found (Fig. 3B). The most target sequences were from Arabidopsis thaliana and Oryza sativa (Fig. 3C). For phloem and xylem, 15,219 (36.65%) and 17,322 (47.61%) unigenes were annotated from the databases, respectively (Table S2).

GO and COG analysis of the unigenes
Gene Ontology (GO) as a standardized gene functional classification system has been used for illustrating the properties of genes which is widely used in transcriptome and micro-array analysis. Three different ontologies were set up, including molecular function, cellular component and biological process. In present study, 57,873 (66.4%) unigenes were assigned to GO classes with 5,783 functional terms. Of these unigenes, 20,141 (23.1%), 24,819 (28.5%) and 12,913 (14.8%) were assigned into Molecular function, Cellular component and Biological process ontologies, respectively (Fig. 4).

Differential gene expression between phloem and xylem
To determine the differential gene expression between phloem and xylem, we first prepare the cDNA libraries with standard Illumina random-prime PCR technique to enrich mRNA of the samples and then used DEGseq package of R to examine differential gene expression. The differential expression was defined as the expression fold of gene specific mRNA levels with P value #0.0001. In total, 10,870 unigenes were found that significantly higher expressed in phloem tissue while only 4,440 unigenes were higher expressed in xylem tissue compared with phloem tissue. Interestingly, most of the genes classified in RNA polymerase pathway show an increased expression in phloem, including Pol II Core subunits B2, Pol II specific subunits B7, Pol II specific subunits B9, Pol common subunits ABC3, Pol common subunits ABC5, Pol III Core subunits C1, Pol III Core subunits C2 (Fig. 6A) (Table S4, S5). These genes encode well-described RNA polymerase pathway genes. Meanwhile, Galactose metabolism pathway increased in xylem tissue compared with phloem tissue (Fig. 6B) which showed stimulation of polysaccharides synthesis. On the other hand, most of the genes classified in fatty acid metabolism pathway show a decreased expression in phloem, including ACOX1, ACOX3, ACADM, paaF, and ECHS1 (Fig. 6C) (Table S4, S5). It is surprising that higher expression of fatty acid metabolism pathway which could produce more energy.
To confirm the expression differentiation between phloem and xylem, real-time PCR was employed. The expression levels of Pol II Core subunits B2, Pol II specific subunits B7, Pol II specific subunits B9, Pol common subunits ABC3, Pol common subunits ABC5, Pol III Core subunits C1, Pol III Core subunits C2, ACOX1, ACOX3, ACADM, paaF, and ECHS1 were further examined (n = 5). The results showed that the results of RPKM are similar with the results of real-time PCR. The gene expression of Pol II Core subunits B2, Pol II specific subunits B7, Pol II specific subunits B9, Pol common subunits ABC3, Pol common subunits ABC5, Pol III Core subunits C1, Pol III Core subunits C2 were up-regulated in phloem while ACOX1, ACOX3, ACADM, paaF, and ECHS1 showed contrary results (Fig. 7).

Discussion
The strict controlled sequential changes of gene expression are required for tissue differentiation during plant development. In previous studies, enormous surveys of transcriptional regulation of tissue differentiation were conducted using microarray and EST sequencing [21,22,23]. In most recent studies, transcriptomes of different tissues by RNA-seq have been conducted. For the most part, these data is limited in the connection between productive performance and RNA sampled from a variety of tissues. Up to date, few studies have focused on the gene expression profiles of ramie. This is because ramie is a non-model organism and small planting areas all over the world. To our knowledge, the present research is the first study to explore the global gene expression profiles of in phloem and xylem of ramie. In ramie, phloem and xylem are used for clothing materials and renewable resources, respectively [24]. However, there are currently no comprehensive sequences or comparative transcriptomic analyses of phloem and xylem in ramie by RNA-seq. The present study provides large amount of sequences in these tissues. Of all the clean reads, 58.23% were not matched to any specific genes. The unmapped reads may for the following reasons: 1) the reads were not recognized by the genomes from databases; 2) sequencing errors lead to fail assemble; 3) alternatively spliced exons of the short reads resulted in failure matching of the reference genome; 4) several reads were from the contaminated genomes, such as microbes; 5) sequence differences between trancriptomes of ramie and reference genome.
Recently, a study characterized transcriptomes of 5 mixed tissues in ramie, including leaf, root, stem bast, stem xylem and stem shoot [15]. They characterized 43,990 genes from the ramie transcriptome and provided information of ramie growth and fiber development, which was aid to understanding of ramie growth and fiber development. This work extended our knowledge in understanding of expression profiles, regulation and networks of important traits of the ramie. However, the details of the most valuable productive tissues, such as phloem and xylem, were not distinguished. In our study, the 15,219 (36.65%) and 17,322 (47.61%) genes expressed in the phloem and xylem reported herein represent a large and previously transcriptomic resource for ramie. Also, the transcriptomic of two important tissues were distinguished. For all the genes, approximately 41.77% of the unigenes discovered in this study was ascertained. These genes were annotated from the public database which exhibited their function. Furthermore, these unigenes were assigned into different COG, GO and KEGG groups or pathways. These data will provide clues which would aid to explore the major genes for important agronomic traits and molecular mechanism of fibrogenesis in ramie.
The transcript profiling of phloem and xylem has been reported in several plants. By subtractive library method, Foucart and colleagues identified 263 unique sequences by comparing phloem to xylem [25]. However, due to the limited quantity of the data by subtractive library sequencing, the overall scanning of transcript profiling needs to be provided. Microarray technology may be one of the most useful resolutions. Zhao and colleagues studied xylem and phloem transcriptomes in Arabidopsis Root-Hypocotyl using Affymetrix GeneChip which promised large amount of data at one time [26]. They found that class III, HD, ZIP and KANADI transcription factors are high expressed during secondary growth which demonstrated that the up-regulation of these factors are key evidence in regulating of xylem or phloem cell differentiation and activity. Compared to traditional ESTs scanning and microarray technology, RNA-seq provides a more effective and economic way to investigate the profile of tissues transcriptomes. First, RNA-seq could obtain enormous data in one experiment which is similar with microarray; second, RNA-seq also aid to obtain new transcripts while microarray could not. In present study, we successfully obtained large sequences in phloem and xylem cDNA pools, and the differential expression genes are also defined. Thus, the results proved that RNA-seq is an effective and useful protocol to investigate the expression profiles of phloem and xylem.
The identification and characterization of the phloem and xylem enhanced genes in ramie is of vital importance to our understanding how these molecular differences regulate phloem and xylem differentiation and development. Some of the phloem enhanced genes were found. These genes included the following: (1) Pol II and Pol III subunits, which has been characterized to be required for RNA polymerization [27,28]; (2) Galactose metabolism pathway genes which participates polysaccharides synthesis [29]. On the other hand, some of the xylem-enhanced genes have been proven to be higher expression compared to phloem. These representative genes include the following: ACOX1, ACOX3, ACADM, paaF, and ECHS1 which participates in fatty acid metabolism pathway [30,31]. From this point of sight, the transcriptomic analysis of ramie is indeed an economic and efficient approach to identify tissue-enhanced genes. Further functional characterization of these genes should be focus on molecular mechanisms of these genes during development in ramie. Herein, we speculate that the enhanced expressed genes in phloem or xylem triggers the metabolic function in each tissue. The differential expression genes are functionally expressed in phloem and xylem tissues. For phloem enhanced genes, RNA polymerization indicates the high transcriptional activity. Considering phloem is the main location for primary metabolism of ramie, it is no surprise to understand high expression of these genes compare to xylem. Similarly, up-regulation Galactose metabolism pathway promises fiber synthesis in phloem. Intriguing, fatty acid metabolism pathway was up-regulated in xylem. This result suggested that more energy production were in xylem. In ramie, the connection of galactose and/or fatty-acid metabolism and the agronomic properties are still limited. Thus, the details and its influence are needed to be elucidated in further researches.
In conclusion, the present study provides a portfolio of candidate genes for future studies on fibre formation. However, the exact roles of these specific expressed genes in differentiation of ramie tissues are still unveiled. One of the possible roles of these specific expressed genes may be to antagonize the genes responsible for the other key pathway for specific tissues differentiation involved in development. In this way, further researches needs to be characterized the function of these genes and illustrated the paradigm during different developmental stages.