Genome-wide transcriptome profiling of radish (Raphanus sativus L.) in response to vernalization

Vernalization is a key process for premature bolting. Although many studies on vernalization have been reported, the molecular mechanism of vernalization is still largely unknown in radish. In this study, we sequenced the transcriptomes of radish seedlings at three different time points during vernalization. More than 36 million clean reads were generated for each sample and the portions mapped to the reference genome were all above 67.0%. Our results show that the differentially expressed genes (DEGs) between room temperature and the early stage of vernalization (4,845) are the most in all treatments pairs. A series of vernalization related genes, including two FLOWERING LOCUS C (FLC) genes, were screened according to the annotations. A total of 775 genes were also filtered as the vernalization related candidates based on their expression profiles. Cold stress responsive genes were also analyzed to further confirm the sequencing result. Several key genes in vernalization or cold stress response were validated by quantitative RT-PCR (RT-qPCR). This study identified a number of genes that may be involved in vernalization, which are useful for other functional genomics research in radish.


Introduction
Radish (Raphanus sativus L.) is an economically important root vegetable crop grown worldwide, particularly in China, Japan, Korea, and Southeast Asia [1]. For the nutrient-rich tuberous root, many breeding efforts on radish have been devoted to developing varieties with different size [2][3][4], color [5][6][7][8][9], cultivation season [5,10,11] and other characteristics [12][13][14]. Although all-season radish is available in many areas now, there are still a lot of practical problems in production, among which premature bolting is one of the most prominent. Before bolting, the plant needs a period of low temperature to accomplish vernalization. A better understanding of the molecular mechanism of vernalization will be helpful to solve these practical problems such as premature bolting.
In the past several decades, many efforts were made to illustrate the molecular mechanisms of vernalization in various plants, among which Arabidopsis thaliana has been studied PLOS  extensively. In Arabidopsis thaliana, the vernalization requirement is mainly due to the expression level of the FLC gene [15]. Low temperature can reduce the level of the DNA methylation and affect the FLC expression [16]. FLC is a MADS-box transcription factor and functions as a flowering suppressor during vegetative growth, by directly binding to the floral integrators downstream, such as FLOWERING LOCUS T (FT), FLOWERINGLOCUS D (FD) and SUP-PRESSOR OF OVEREXPRESSIONOF CONSTANS 1 (SOC1) [17][18][19]. Additionally, in the upstream of FLC, the vernalization pathway includes other genes, such as VERNALIZATION1 (VRN1), VERNALIZATION2 (VRN2), VERNALIZATION INSENSITIVE 3 (VIN3) and VIN3-LIKE1 (VIL1)/VERNALIZATION5 (VRN5) [20]. Study on a vin3 mutant showed that the expression level of FLC was not repressed even when the plant underwent a long period of low temperature [21]. In the plants with the vrn1 and vrn2 mutants, though cold stress can reduce the FLC expression, the repression is reversed when the temperature rises again [22,23]. These results indicate that VIN3 participates in the suppression of FLC at the beginning of vernalization, while VRN1 and VRN2 function to maintain the low level of FLC expression. In addition to Arabidopsis, other crops with different gene regulatory circuitries of vernalization have also been investigated in recent years [24][25][26][27][28]. For radish, several genes related to vernalization have been found [29][30][31]. However, the molecular mechanism of vernalization is still largely unknown.
The emergence of the next generation sequencing (NGS) technology has improved the throughput and shortened the cycle time of sequencing, which facilitates molecular studies at the transcriptome and genome levels. The application of the NGS technology to transcriptome analysis, namely RNA-Seq, offers an efficient and inexpensive way for transcriptome studies. Transcriptome sequencing on radish has led to the discovery of many critical genes related to certain characteristics, and the development of numerous molecular markers [32][33][34]. However, the majority of the transcriptome sequencing analyses were analyzed by de novo assembly because the reference genome was unavailable when these studies were constructed. The recently released genome sequence [35,36] provides us a new strategy for radish transcriptome sequencing with better coverage and accuracy.
In this study, we analyzed the radish transcriptome during vernalization with RNA-Seq, using the radish genome as a reference [36]. The vernalization-related genes and the gene expression patterns were studied by three treatments with different cold exposure schemes. Our study provides an important opportunity to advance our understanding of the molecular mechanism of vernalization in radish and other plants.

Illumina sequencing and mapping against radish reference genomes
For each of the three treatments, three replicates were prepared and each replicate included ten individuals. The samples were labeled as RT1, RT2, and RT3 for the room temperature treatment (RT), VE1, VE2, and VE3 for the early stage of vernalization (VE), and VL1, VL2, and VL3 for the late stage of vernalization (VL), respectively. A total of nine cDNA libraries were constructed and sequenced by paired-end sequencing on an Illumina HiSeq 2500 platform. More than 36 million raw reads were generated for each library, and the portions of clean reads were all above 99.60% (Table 1). Two reported radish genomes, "RSA_r1.0" [35] and "rsgv1" [36], were used for mapping analysis. For RSA_r1.0, the clean reads for VE1, VE2, VE3, VL1, VL2, VL3, RT1, RT2 and RT3 were 66.93%, 67.28%, 67.02%, 66.69%, 65.34%, 66.70%, 67.42%, 67.88% and 68.01%, respectively. The portions of clean reads were higher when rsgv1 was used as the reference genome, which ranged from 67.85% to 69.91% (Table 1). Hence, the rsgv1 genome was used as the reference for subsequent analysis. For the reference genome, more than 67% of the clean reads in each library were uniquely mapped, while less than 1.5% was mapped to multiple positions. A total of 558 new genes were identified after filtering out the sequences that contain only one exon or encode a peptide less than 50 amino acids (S1 Table).

Identification of differentially expressed genes during vernalization
The transcripts were classified into five categories according to the fragments per kilobase of transcript per million fragments mapped (FPKM) [37] (Fig 1). The genes with FPKM between 10 and 100 belonged to the largest group, followed those with FPKM between 3 and 10.
According to the FPKM value, the correlations of the gene expression of the nine samples, especially the three biological replicates, were assessed, and the correlation coefficients (R 2 ) [38] were all above 0.91 among the replicates (Fig 2). Genes expressed in different treatments were screened. A total of 29,535, 29,416 and 29,540 genes were expressed in all three replicates in VE, VL and RT, respectively (S2 Table). More than 92% (27,304) were expressed in all three treatments, while only a small portion (40,35 and 129 for VE, VL and RT, respectively) was expressed specifically in one treatment. DEGs was identified by the DEGseq package [39], and a total of 1,575 (706 up-and 869 down-regulated), 4,845 (2,675 up-and 2,170 down-regulated), and 3,239 (1,653 up-and 1,586 down-regulated) genes were differentially expressed in the treatment pairs, VE vs. VL, RT vs. VE, and RT vs. VL, respectively. All DEGs were divided into seven groups according to the expression profiles, among which the expression of 122 genes was significant different in all three treatment pairs, and the genes only differentially expressed between RT and VE belonged to the largest group (Fig 3).

Gene Ontology (GO) enrichment of DEGs during vernalization
GO enrichment of the DEGs was conducted using the GO database. A total of 1,419, 4,363 and 2,925 DEGs from the three corresponding treatment pairs were annotated, respectively. All the GO terms were ranked based on the Kolmogorov-Smirnov (KS) value [40], and only the top three GO terms of the three main groups (cellular component, molecular function and biological process) of each treatment pair were picked for comparison (S3 Table). In the biological process category, "pyrimidine ribonucleotide biosynthetic process" was shared between RT vs.  VE and VE vs. VL, while "single-organism process" and "single-organism cellular process" were shared between RT vs. VL and VE vs. VL. "Protein binding" and "ATP binding" from the molecular function category were found in all treatment pairs. The term "sequence-specific DNA binding transcription factor activity" was shared between RT vs. VL and VE vs. VL. No term in the cellular component category was shared.

Metabolic pathway analysis of DEGs
Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of DEGs, based on the KEGG database, was performed to identify significantly altered pathways involved in vernalization. In total, 377, 1,309 and 773 DEGs from the VE vs. VL, RT vs.VE, and RT vs. VL treatment pairs were annotated, respectively. Only 3 pathways were significantly enriched with Q value 0.01. For VE vs. VL, "ribosome biogenesis in eukaryotes" and "purine metabolism" were enriched. "Ribosome" and "ribosome biogenesis in eukaryotes" were enriched in RT vs. VE. No pathway was significantly enriched in RT vs. VL. In addition, pathways with the most DEGs in both RT vs. VE and RT vs. VL were "Ribosome", followed by "Biosynthesis of amino acids" and "Carbon metabolism". The corresponding pathways in VE vs. VL were "Ribosome biogenesis in eukaryotes", "Protein processing in endoplasmic reticulum", "Purine metabolism" and "Ribosome" (S1-S3 Figs).

Screen vernalization related candidates according to the expression profile
Vernalization is a mechanism for plants to avoid flowering in an improper season. Plants have to undergo a period of cold, which is long enough to ensure the winter has passed. Hence, genes with expression changes that only occur after a long period of cold are more likely to function in the regulation of vernalization. In this study, these genes belong to two groups of the DEGs, which show expression difference in VE vs. VL and not in RT vs. VE (Fig 3). A total of 775 (459 up-and 316 down-regulated) vernalization related candidate genes were screened (Fig 4). The screened genes accounted for a small portion (11.57%) of the DEGs and the upregulated candidates were more than the down-regulated.

Validation of DEGs by quantitative RT-PCR
To verify the DEGs, five and four genes related to vernalization [15] and cold stress response [41] were selected for quantitative RT-PCR, respectively. The result showed that all genes exhibited the same expression tendency as in the RNA-Seq result, indicating that our RNA-Seq results are reliable (Fig 5).

Discussion
The molecular mechanism of vernalization is still largely unknown in radish. One high throughput sequencing analysis aiming at radish vernalization has been reported, which was de novo assembled and mainly focused on the FLC genes [31]. In this study, we analyzed the radish transcriptome before and after vernalization with a reference genome and identified a series of vernalization-related genes. Our experiment includes three treatments in which, the plants were exposed to low temperature for 0, 3 and 20 days, respectively. A total of 39,118 transcripts were detected. Among the three treatments, the experiment for the pair RT vs.VE discovered more DEGs than the other two experiments, with the least number of DEGs in the experiment for VE vs. VL. The large amount of DEGs in RT vs.VE may be partly because of the cold stress responsive genes which had expression changes immediately after exposure to low temperature. Vernalization is one of the four major flowering regulation pathways in plants. Before flowering, a period of low temperature is needed for winter plant to accomplish vernalization, during which a series of related genes are regulated. To date, Arabidopsis is the model plant used for the investigation of the vernalization mechanism. According to the regulatory network of Arabidopsis [15], several key genes, including FLC, VRN1, VRN2, VIN3, VIL1/VRN5, FT, FD and SOC1, were analyzed in our result. Three FLC genes were reported in a previous study [31], but only two were annotated in the reference genome. Two FLC genes (RSG13912 and RSG31600) were detected in our study, and both were down-regulated (S4 Table). RSG13912 was down-regulated in the whole process and RSG31600 was down-regulated only in RT vs. VL. For the upstream of FLC, 2, 2, 10 and 6 genes were found in the genome annotated as VIN3, VRN1, VRN2 and VIL1/VRN5, respectively, while the numbers of VRN2 (5) and VIL1/VRN5 (5) in the present study were less than the genome (S4 Table). In Arabidopsis, VRN1, VRN2 and VIL1/VRN5 are constitutively expressed regardless of vernalization. Our result showed that the expressions of most genes did not change noticeably during vernalization, except 1 VIL1/VRN5. The VIN3 gene was reported to be up-regulated in the process of vernalization in previous studies [21,42], which was also identified in this study. The two VIN3 genes did not have the same expression pattern; RSG35873 was up-regulated in the whole process, and RSG11514 was up-regulated only in RT vs. VL. FT, FD and SOC1 are at the downstream of FLC, which contains 1, 3 and 3 members in the radish transcriptome (S4 Table). Only the member of FT showed an increase of the expression. This gene involves in the flowering process of the "circadian rhythm-plant" pathway (ko04712). The up-regulation in the late stage of vernalization indicates its important role in promoting flowering (Fig 6).
In addition to the key genes in vernalization, the members of previously reported protein complexes that regulate the expression of FLC [15] were also studied in our result. The complexes involved in the activation of FLC in Arabidopsis include the COMPASS complex [43], the RAD6-BRE1 complex [44], the PAF1 complex [45], the FRI complex [46] and the SWR1 complex [46]. Among the FLC activators, the expression of most components of the complexes did not change during vernalization. Only 1, 1, 2 and 2 genes in COMPASS, PAF1, FRI and RAD6-BRE1 were down-regulated, respectively (S5 Table). Regarding the repressors of FLC, there are mainly 2 complexes, PRC2 and PRC1 [47,48]. Similar to the FLC activators, the expression of most components of PRC2 and PRC1 did not change during vernalization (S5 Table). Three and one genes in PRC2 and PRC1 were up-regulated, respectively. These results indicate that the majority of the complex members are constitutively expressed.
Among the seven groups of the DEGs, two groups that showed expression difference in VE vs. VL and not in RT vs. VE were selected as vernalization related candidates. The up-regulated genes were more than the down-regulated genes, and the fold change of the up-regulated genes was larger than the down-regulated genes, except the infinite and infinitesimal values. Among the up-regulated genes with the most dramatic changes (log2Ratio ! 4), the majority encode proteins that are involved in plant growth and development, such as glycine-rich protein, beta-1, 3-glucanase, cytochrome P450, etc (S6 Table). The thioglucoside glucohydrolase gene (TGG1) [49] was also identified, implying it may play an important role in vernalization in addition to defenses against insects and disease. In the dramatically down-regulated genes, one gene encodes indole-3-acetic acid (IAA) inducible 32 was filtered. Moreover, the "plant hormone signal transduction" pathway (ko04705) also showed a down-regulation of AUX/ IAA (S4 Fig). Previous studies showed that decrease of IAA can promote the process of flower bud differentiation [50,51]. The down-regulated IAA32 gene in this study indicates that IAA may also play an important role in the process of vernalization.
To further verify our sequencing result, the COR genes and CBF genes which constitute the predominant cold signaling pathway in plant [41] were selected for expression analysis. Among the 13 genes annotated as COR, 12 CORs were identified in this study (S7 Table). Among the COR genes, 5 were differentially expressed in vernalization, and up-regulated since the early stage of vernalization and did not change between VE and VL. All 8 CBF genes were detected, among which, only 2 were up-regulated in RT vs. VE and down-regulated in VE vs. VL. The increases in the expression of the CORs and CBFs after cold exposure are consistent with the reports in other plants [52][53][54][55].

Plant material and RNA extraction
A green radish inbred line "2 # " was used for this study. The seeds were obtained from the Institute of Vegetables and Flowers, Shandong Academy of Agricultural Sciences. Three cold exposure treatments were constructed with different exposure time (0, 3 and 20 days). These treatments stand for room temperature, early stage of vernalization and late stage of vernalization respectively. For room temperature treatment, the seeds germinated and grew in culture dishes with wet filter paper at room temperature. The seeds for the early stage of vernalization treatment also germinated and grew in culture dishes at room temperature and then the culture dishes were placed in the refrigerator at 4˚C for 3 days, while the germinating seeds for the late stage of vernalization were transferred to 4˚C for 20 days. All seedlings were cultivated in the dark and sampled at the same time when they were of the same size and stored at -80˚C until RNA extraction.
Total RNA was isolated using Trizol reagent (Invitrogen, USA) following the standard protocol. The concentration of the total RNA was determined by NanoDrop (Thermo Scientific, USA), and the RNA integrity value (RIN) was checked using RNA 6000 Pico LabChip of Agilent 2100 Bioanalyzer (Agilent, USA).

cDNA library construction and sequencing
Enrichment of mRNA was conducted with oligo magnetic adsorption. The enriched mRNA was randomly fragmented by the fragmentation buffer. Severed as a template, mRNA was used for the first-strand cDNA synthesis with random hexamers. The second-strand cDNA was synthesized using DNA polymerase I and purified using AMPure XP beads (Agencourt, Beverly, MA, USA). Sequencing adaptors were linked to the purified cDNA, and the cDNA fragments of a suitable length were then selected by AMPure XP beads. Finally, nine doublestrand Illumina libraries were obtained by PCR amplification. The libraries were sequenced by the Illumina HiSeq 2500 system (Biomarker Technologies Co., Ltd, Beijing, China). All datasets from the Illumina sequencing platform can be found in the Short Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI) under accession number SRP093947.

Data processing
Raw data were processed to remove primers and adaptor sequences. Low quality reads were filtered out and the reads with more than 80% portion Q ! 30 were obtained as clean reads. Clean reads were aligned with the TopHat2 software [56] using two reported radish genomes [35,36] as references. Reads aligned to the genome sequence were named as mapped reads, and only mapped reads were selected for subsequent analysis. The number of the mapped reads was used for the calculation of alignment efficiency and for the selection of the optimal reference genome. The mapped reads were assembled by the Cufflinks software [57], and compared with the reference genome to uncover new genes.

Analysis of differentially expressed genes
The gene expression level was calculated using the FPKM method [37]. Correlations of the biological replicates were evaluated by calculating the Pearson's Correlation Coefficient [38]. Differentially expressed genes between different treatments were identified by the DEGseq package [39]. The significant differences in the expression levels were assessed using false discovery rate (FDR) and log2Ratio with a threshold "FDR < 0.01 and the absolute value of log2Ratio ! 1". To annotate the biological functions of the DEGs, GO enrichment analysis was performed using the GO database (http://geneontology.org/), and KEGG enrichment analysis was performed using the KEGG database (http://www.genome.jp/kegg/).

Quantitative RT-PCR validation
To validate the sequencing result and the DEGs, qRT-PCR was conducted. The first strand cDNA synthesis and qRT-PCR analysis were conducted using the TransScript One-Step gDNA Removal and cDNA Synthesis SuperMix (AT311, TransGen Biotech, Beijing, China) and the TransStart Tip Green qPCR SuperMix (AQ141, TransGen Biotech, Beijing, China), respectively. The ACTIN gene was chosen as a constitutive expression control in the qRT-PCR analysis. The gene-specific primers of the validated DEGs and ACTIN are listed in S8 Table. PCR reactions were performed on an IQ5 Real-Time PCR System (BIO-RAD, Hercules, CA, USA) with the following cycling parameters: 95˚C for 2 min, followed by 45 cycles at 95˚C for 15 s, and 60˚C for 70 s. All reactions were performed with three replicates. Gene expression levels were calculated using the delta-delta Ct method [58].

Conclusions
We presented a comprehensive analysis of the gene expression profiles in radish during vernalization, using the latest published genome sequence as the reference. A series of vernalization related genes were identified according to the annotations and the expression patterns. Cold stress responsive genes were also analyzed to further confirm the sequencing result. This study offers important insights into the molecular mechanism of vernalization in radish.