Identification of Regulatory Genes Implicated in Continuous Flowering of Longan (Dimocarpus longan L.)

Longan (Dimocarpus longan L.) is a tropical/subtropical fruit tree of significant economic importance in Southeast Asia. However, a lack of transcriptomic and genomic information hinders research on longan traits, such as the control of flowering. In this study, high-throughput RNA sequencing (RNA-Seq) was used to investigate differentially expressed genes between a unique longan cultivar ‘Sijimi’(S) which flowers throughout the year and a more typical cultivar ‘Lidongben’(L) which flowers only once in the season, with the aim of identifying candidate genes associated with continuous flowering. 36,527 and 40,982 unigenes were obtained by de novo assembly of the clean reads from cDNA libraries of L and S cultivars. Additionally 40,513 unigenes were assembled from combined reads of these libraries. A total of 32,475 unigenes were annotated by BLAST search to NCBI non-redundant protein (NR), Swiss-Prot, Clusters of Orthologous Groups (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Of these, almost fifteen thousand unigenes were identified as significantly differentially expressed genes (DEGs) by using Reads Per kb per Million reads (RPKM) method. A total of 6,415 DEGs were mapped to 128 KEGG pathways, and 8,743 DEGs were assigned to 54 Gene Ontology categories. After blasting the DEGs to public sequence databases, 539 potential flowering-related DEGs were identified. In addition, 107 flowering-time genes were identified in longan, their expression levels between two longan samples were compared by RPKM method, of which the expression levels of 15 were confirmed by real-time quantitative PCR. Our results suggest longan homologues of SHORT VEGETATIVE PHASE (SVP), GIGANTEA (GI), F-BOX 1 (FKF1) and EARLY FLOWERING 4 (ELF4) may be involved this flowering trait and ELF4 may be a key gene. The identification of candidate genes related to continuous flowering will provide new insight into the molecular process of regulating flowering time in woody plants.


Introduction
Longan, Dimocarpus longan, is a member of the family Sapindaceae. Longan trees are grown in many subtropical and tropical countries with majority of the production in Southeast Asia and Australia [1]. Planting area and yield in China have become the largest and highest in the world [2].
Flowering is a key event in plant life, especially in fruit trees. Obtaining plants that flower over the year is the goal of many gardeners, so as to be able to achieve year-round fruit production. Usually, longan trees have a single spring flowering period, floral bud induction requires a period of low temperature and only the terminal meristem differentiates into an inflorescence. Off-season flowering in longan is achieved by chemical treatment with potassium chlorate (KClO 3 ) application [3,4]. However, the induction effect varies in different regions. One cultivar of longan, 'Sijimi', originating from the China (Guangxi Province) and Vietnam border region [5], has a continuous blossoming trait due to a spontaneous mutation. 'Sijimi' was found to have a closer genetic relationship with longan cultivars of Guangxi Province by use of molecular marker analysis and is clustered with Chinese cultivar groups [6,7]. This cultivar blossoms and bears fruits throughout the year, under both tropical and subtropical conditions, with no requirement of environmental control. Both terminal and axillary buds of 'Sijimi' can differentiate into inflorescences. Flowers and fruits can be observed on one tree at the same time ( Figure 1A, B). Therefore, 'Sijimi' has been successfully used to produce off-season fruits without KClO 3 application. Furthermore, 'Sijimi' has a shorter grafting juvenile phase compared with normal longan cultivars. When 'Sijimi' scions are grafted on mature rootstocks in spring, sprouting shoots become mature in summer and bloom. In typical longan cultivars, flowering will occur at least two years after grafting on mature rootstocks. Based on the observation of specific flowering traits in 'Sijimi', we speculate that the mutation in 'Sijimi' gives mature shoots in mature trees or after grafting the capacity of continuous flowering.
Flowering is one of the most important developmental stages in plants, controlled by interactions between regulatory networks that control shoot development and those that mediate response to the environment. Photoperiod and temperature are two important environmental factors that affect plant flowering [8]. In Arabidopsis, the circadian clock and light signaling tightly control CONSTANS (CO) protein activity, and then accelerate flowering through the function of integrator genes like FLOWERING LOCUS T (FT), SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1) and LEAFY (LFY) [9,10]. The key regulator of the vernalization response in Arabidopsis is FLOWERING LOCUS C (FLC) [11]. FLC and SHORT VEGETATIVE PHASE (SVP), another transcriptional repressor, form a heterodimer under a wide range of cold conditions [12]. GA affects diverse biological processes, including flowering time. However, contrary to that seen in herbaceous species, GA may repress flowering in woody plants [13,4].
Other information that plants use to provide input on initiation of flowering are endogenous changes, involving an autonomous pathway and age-related pathway which affect the juvenile to adult transition [14,15]. Recently, mechanisms of age-dependent response to winter temperature in herbaceous perennial plants were reported [16,17]. Two age-regulated microRNAs, miR156 and miR172, and their targets SQUAMOSA PROMOTER BINDING LIKE (SPL) and APETALA 2 (AP2) regulate the timing of sensitivity in response to vernalization [16,17].
Perennial plants can flower both repeatedly and with seasonality. However, the regulatory pathway of seasonal flowering in perennial plants remains unclear. In rose and woodland strawberry, some cultivars also have the ability to flower continuously during a favorable season [18]. KSN, the TERMINAL FLOWER 1 (TFL1) homologue in these two plants, is the regulator of continuous flowering, suggesting a new role of TFL1 in perennial plants in the maintenance of vegetative growth and flowering seasonality [18]. In trifoliate orange a spontaneous mutant was found to have a short juvenile phase [19]. Deep sequencing and comparative gene expression analysis between wild type and precocious trifoliate orange during flower buds formation was used, but the gene associated with mutation remains unknown [19].
The longan cultivar 'Lidongben' originates from Fujian Province, Southern China. Although 'Sijimi' and 'Lidongben' originate from different regions of China, they have close genetic relationship [6,7]. In the current study RNA sequencing was used to examine differentially expressed genes between 'Sijimi' longan and typical longan cultivar 'Lidongben',with the aim of identifying genes and regulatory pathways associated with the mutation in 'Sijimi'. As it is difficult to obtain shoots with the same developing phase from mature trees of 'Sijimi' and normal longan cultivars, grafting newly-sprouting buds before maturity were used as material to eliminate the influence of inflorescence differentiation. Our results suggest longan homologues of SVP, GI, FKF1 and ELF4 may be involved in this flowering trait and ELF4 may be a key gene. No differences were seen when comparing the TFL1 homologue sequences between 'Sijimi' and 'Lidongben', indicating the mechanism in 'Sijimi' which is mutated is different from previously studied other perennial woody plants. The results will contribute to more understanding of seasonal flowering regulation in longan and other woody plants.

Plant materials
Longan samples of 'Sijimi' (S) and 'Lidongben' (L) were collected from the experimental fields of Fujian Academy of Agricultural Science in Fuzhou. Scions collected from mature trees of these two cultivars were grafted on different branches of mature rootstock plants (the cultivar 'Fuyan' of longan was used as rootstock) ( Figure 1C). The terminal tips of newly-sprouting shoots were collected before maturity. Materials of the same cultivar from more than 20 rootstock plants were mixed. Using the same method, terminal buds from the next year grafting were used for real-time quantitative PCR. All materials were frozen in liquid nitrogen immediately and stored at 280˚C for later use.
Total RNA was isolated using Universal Plant Maxi RNA Extraction Kit (BioTeke, China) according to the manufacturer's instruction and treated with RNase-free DNase I (Takara Biotechnology, China). All RNA samples were quantified and examined for protein contamination (A260/280) and reagent contamination (A260/230) by a Nanodrop ND 1000 spectrophotometer. Thirty micrograms of total RNA were sent to Beijing Genome Institute (BGI) (Shenzhen, China) where the libraries were constructed and sequenced using Illumina HiSeq 2000. RNA-seq library was constructed as described by Liu et al. [20].

Assembly and functional annotation
The cDNA fragments were approximately 200 bp in length and sequencing of cDNA libraries was performed by paired-end Illumina sequencing. De novo assembly of the short reads was carried out using SOAPdenovo [21] as described. The obtained raw reads were pre-processed by removing adaptor sequences, lowquality reads and reads of larger than 5% unknown sequence. High-quality clean reads were assembled using Trinity [22]. The longest assembled sequences were termed contigs. Paired-end reads were then mapped back to contigs to detect contigs from the same transcript as well as the distances between these contigs. Sequences without Ns which could not be extended on either end were selected and defined as unigenes. Using the same strategy, unigenes of longan were obtained from both L and S libraries. The datasets are deposited in the NCBI's SRA database with the accession numbers SRX479329 for the S library and SRX479332 for the L library, respectively.
The assembled unigenes were annotated by BLASTx alignment (Evalue,0.00001) to protein databases including NCBI NR (http://www.ncbi.nlm. nih.gov), Swiss-Prot (http://www.expasy.ch/sprot), KEGG (http://www.genome. jp/kegg) and clusters of orthologous groups (COG database, http://www.ncbi.nlm. nih.gov/COG). Protein function was predicted based on features of the best BLASTX hits from the alignments. Sequence orientations were determined according to the best hit in the database. If results from different databases conflicted with each other, a priority order of NR, Swiss-Prot, KEGG and COG was followed. Orientation and CDS (coding region sequences) of sequences with no hit in blast were predicted using ESTScan [23] (http://www.ch.embnet.org/ software/ESTScan.html). Original transcript sequences (5'-3') were provided if their orientations can be determined. Other sequences are provided as the assembler outputs.
For NR annotation, the Blast2GO program [24] was used to get GO annotation of unigenes. After obtaining a GO annotation for every unigene and differentially expressed genes, WEGO software [25] was used to classify GO function and to understand the distribution of gene functions of Dimocarpus longan from the macro level.

Differential expression of unigenes
The reads for a specific transcript were counted by mapping reads to assembled unigene sequences using SOAP [26]. Unigene expression levels were calculated using the RPKM method (Reads Per kb per Million reads) [27]. The calculated gene expression can be used for comparing the difference of gene expression among samples.
The transcript fold change was calculated by the formula of log 2 (S_RPKM/ L_RPKM). If the value of either L_RPKM or S_RPKM was zero, we used 0.01 instead of 0 to calculate the fold change. A modified Audic's method [28] was used to analyze differential expression. The formula to calculate the probability of a specific gene being expressed equally between the two samples was defined as: Where N1 and N2 indicate the total number of clean reads in L and S, respectively, and x and y indicate the number of mapped clean reads in each sample. FDR (False Discovery Rate) method was used to determine the threshold of the p-value in multiple tests [29]. In this study, we used FDR#0.001 and the absolute value of log 2 Ratio>1 as the threshold to judge the significance of differentially expressed genes.

Gene Ontology analysis for significantly differentially expressed genes (DEGs)
All DEGs were mapped to GO terms in the database by calculating gene numbers for every term, followed by a test to find significantly enriched GO terms, using a formula for calculation as described by Liu et al. [20]. GO terms fulfilling this condition were defined as significantly enriched. This analysis is able to recognize the main biological function that each DEGs is assigned to. Pathway enrichment analysis was used to identify significantly enriched pathways which were differentially expressed. The calculating formula is the same as that in GO analysis. Pathway with FDR#0.05 is called enriched significantly in differentially expressed genes.

Identification of flowering-related genes and flowering-time genes
Published sequences of flowering-related genes or gene families were downloaded from NCBI as reference database. All DEGs were blasted to this reference database with expectation value le-5 to screen potential flowering-related genes and followed by GO enrichment analysis.
Furthermore, unigenes were identifed in the transcriptome database which were homologous to longan flowering-time genes, and confirmed by blasting to NCBI Nt and Nr database. The differential expression of identified flowering-time genes between two longan samples were determined by the value of log 2 (S_RPKM/ L_RPKM).

Real-time quantitative PCR verification
Fifteen flowering-time unigenes were selected to confirm their expression in 'Sijimi' and 'Lidongben' using real-time quantative PCR (RT qPCR). Longan Actin gene was used as the reference gene [30]. Specific primers for these genes were designed using DNAMAN 6.0 and primer sequences were listed in Table S1. All the primers were synthesized by Biosune in Shanghai, China.
The primary cDNA was synthesized from equal amounts of purified total RNA (1 mg) using the Prime Script RT reagent Kit (TaKaRa, China) follow by PCR analysis. Each PCR reaction mixture contained 2 mL of diluted cDNA (40 ng), 12.5 uL of SYBR Premix EX Taq (TaKaRa, China), 0.5 mL of each primer and 8.15 mL RNA-free H 2 O to a final volume of 25 mL. PCR was performed with following conditions: 95˚C for 30 s, followed by 40 cycles of 95˚C for 15 s, annealing temperature for 30 s and 72˚C for 30 s in a CFX96 TouchReal-Time PCR Detection System (Bio-Rad, Hercules, CA, USA). Real-time qPCR was performed in triplicate for each sample, data were indicated as means ¡ SD (n53). Duncan's multiple range test was used to analyze statistical significance of gene expression difference between two sample based on average expression level in each sample (P#0.01).

Illumina sequencing and reads assembly
In order to determine the differential gene expression between the 'Lidongben'(L) and 'Sijimi' (S) longan cultivars, cDNA samples from each were sequenced using the Illumina sequencing platform. The quality of the reads was evaluated using the base-calling quality scores from the Illumina's base-caller. Over 96% of the clean reads had Phred-like quality scores at the Q20 level (a sequencing error probability of 0.01). 25,603,968 and 27,723,240 clean reads of 90 bp were obtained from L and S and were used to assemble 74,185 contigs in L and 75,548 contigs in S, respectively. With the approach of paired-end read joining and gap-filling, the contigs from above were further assembled into 36,527 unigenes with a mean size of 701 bp in L and 40,982 unigenes with a mean size of 481 bp in S. To reduce the errors and biases during RNA sequencing and assembly process, the raw reads from L (26 million) and S (28 million) were combined, and 40,513 unigenes were assembled with a mean size of 703 bp ( Table 1). The length distributions of unigenes from the three combinations (L, S and L&S) were shown in Figure 2.

Functional annotation of unigenes
BLAST search was used against the public protein and nucleotide databases. Of the 36,527 L and 40,982 S unigenes, 29,687 and 33,855 unigenes had at least one hit within these databases, respectively. Of the 40,513 L&S unigenes, 32,475 (80.16%) were annotated with a Blast match. The remaining unigenes (18.73%, 17.39% from L, S and 19.84% from L&S unigenes respectively) with no hit in any database could be longan-specific genes, or genes with homologues in other species whose corresponding biological functions have not been identified (  Table 1).
Furthermore, after blasting the longan (L&S) unigenes against NR, Swiss-Prot, KEGG, COG protein databases and GO annotation, a total of 32,475 CDS were obtained ( Table 2). Unigenes with no hits to any of these databases were blasted by ESTScan to predict the nucleotide (5'-3') and amino acid sequences of the coding regions, with 1,452 genes were identified using this method.

Differential expression and pathway analysis in L and S
We developed an algorithm (see Methods) to identify longan unigenes differentially expressed between L and S samples. Comparing S to L, 15,429 unigenes were up-regulated and 24,816 were down-regulated ( Figure 3). The significance of gene expression differences was judged by using the threshold of false discovery rate (FDR#0.001) and the absolute value of log 2 Ratio (>1). 14,913 significantly differentially expressed genes (DEGs) were obtained between the two samples, including 9,771 down-regulated and 5,142 up-regulated genes ( Figure 3). The number of down-regulated genes was almost 2-fold that of upregulated genes.
To evaluate the potential functions of genes with significant transcriptional changes between the L and S, 8,743 of 14,913 DEGs were categorized into 54 GO terms consisting of three domains: biological process, cellular component and molecular function. It was clear that the dominant distributions were from 'Cellular process', 'Metabolic process', 'Cell' and 'Cell part' terms. We also observed a high percentage of genes assigned to 'Organelle', 'Binding', 'Catalytic activity', 'Response to stimulus', 'Membrane', 'Biological regulation' and 'Regulation of biological process'. A few genes were assigned to 'Locomotion', 'Viral reproduction', 'Extracellular matrix', 'Extracellular region part', 'Extracellular matrix part', 'Metallochaperone activity', 'Nutrient reservoir activity', 'Protein tag' and 'Translation regulator activity' (Figure 4).
Pathway-based analysis helps to further understand the product of a genes biological function. 14,913 of DEGs were mapped to the KEGG database, a total of 128 different pathways were found in this study related to 6,415 unigenes ( Table  S2). The maps with highest unigene representation were metabolic pathways (1,625 unigenes), followed by the biosynthesis of secondary metabolites (769 unigenes). Lipid metabolism, glycerophospholipid metabolism, pentose and glucuronate interconversions, oxidative phosphorylation and the spliceosome were included in the top 20 pathways ( Figure 5). Specific pathways were found that are implicated in flowering, such as plant hormone signal transduction, the spliceosome, circadian rhythm, and starch and sucrose metabolism.

Identification of flowering-related and flowering-time genes
After blast-based searching of the 14,913 DEGs to reference database, 539 potential flowering-related genes were found. GO Enrichment analysis indicated that these unigenes significantly enrich in terms of 'nucleus', 'sequence-specific DNA binding transcription factor activity', 'DNA binding', 'protein dimerization activity', 'membrane' and 'regulation of transcription, DNA dependent' ( Figure  S1). Flowering time of fruit treeshas been reported to be affected by environmental and biological factors such as photoperiod, temperature, plant age and gibberellic acid (GA) [31,32]. In order to identify DEGs related to flowering signal transduction and integration, 107 putative flowering-time homologues including  differentially transcribed genes were identified from longan unigenes in this study. These were further catalogued into circadian clock & photoperiod pathway, vernalization pathway, autonomous pathway, GA pathway, age-related pathway and floral pathway integrator genes ( Table 3 and Table S3).
In circadian clock & photoperiod pathway, eight longan unigenes showed significant similarity to Arabidopsis CO. Some genes in light signal pathway and in circadian clock loops were also identified. However CIRCADIAN CLOCK-ASSOCIATED 1 (CCA1), PSEUDO RESPONSE REGULATOR 7/9 (PRR7/9) and CCA1 HIKING EXPEDITION (CHE) were apparently absent. F-BOX 1 (FKF1) and GIGANTEA (GI) proteins, which are downstream of the circadian clock, play major roles in facilitating the CO expression [10,33]. Two putative GI homologues and one FKF1 homologue were identified in longan. According to the calculation of transcript fold change (log 2 (S_RPKM/L_RPKM)), the expression levels of GI and FKF1, were up-regulated in the S sample. Especially, the value of log 2 (S_RPKM/L_RPKM) of one gene (Unigene4309) encoding EARLY FLOWERING 4 (ELF4), which is involved in many of the same physiological processes as GI [34], reached 6.0154, markedly large increased in the S sample compared to the L sample. Another longan homolog of ELF4, named Unigene5963, also showed an increase in transcription level. The epigenetic silencing of the FLOWERING LOCUS C (FLC) [35,36] is central to the vernalization process [37]. One putative homologue for the FLC sequence was found in longan, and showed a decrease in transcript abundance in the S sample. FLOWERING LOCUS M (FLM/MAF1) and MADS-AFFECTING FLOWERING 2 (MAF2), clade members of FLC, were not found in our database. FRIGIDA (FRI) is the activator of FLC in Arabidopsis [38]. However, longan FRI (CL67.Contig2) had higher expression levels in the S sample. Genes which negatively regulate FLC in vernalization pathway were also present in the two samples, like VERNALIZATION INSENSITIVE 3 (VIN3), these genes did not show significant expression change between two genotypes.
SVP is a flowering repressor and central regulator of the flowering regulatory network [39]. Multiple copies of SVP were found in longan, and their expression levels were all down-regulated. Both SVP and FLC are targets of genes in the autonomous pathway [40,41]. Several unigenes in this pathway, such as FLOWERING LOCUS Y (FY) (Unigene10275) and FLOWERING LOCUS D (FLD) (Unigene18631) showed higher transcription levels in S.
GA promotes flowering in Arabidopsis under SD [42]. Several putative GA3ox, GA20ox and GA2ox genes were identified in the two samples. Most of their reads were relatively lower. However, the expression of two GA3ox (Unigene12858, Unigene21359) and two GA20ox (Unigene4959, Unigene22418) unigenes were up-regulated. Candidate unigenes for the GA signal were also examined. SPINDLY (SPY), PHOTOPERIOD RESPONSIVE 1 (PHOR1), GIBBERELLIC ACID INSENSITIVE (GAI), SLEEPY1 (SLY1) and SLEEP2 (SLY2) were found. With the     exception of SPY (Unigene15359), the expression levels of these six genes were all down-regulated in S. An additional flowering pathway, termed the age-related pathway, represents a developmental process with parts that are independent of environmental variables [14]. Sequences corresponding to SQUAMOSA PROMOTER BINDING PROTEIN-LIKE 9 (SPL9) and SPL3, the targets of MicroRNA156 were identified in two samples, both of their expression levels were down-regulated. One cluster of genes (CL1113) was found corresponding to the APETALA 2 (AP2) gene, the target of MiRNA172 and the repressor of FT. All contigs in this cluster seemed to be differentially transcribed and their reads were relatively low.
Two putative TFL1 genes were found in longan, the expression of Unigene6027 was slightly up-regulated in the S sample and the expression of Unigene6475 was down-regulated. Cloning and resequencing of the cDNA and genomic DNA sequence of Longan TFL1 (Unigene6475) coding region from 'Lidongben' and 'Sijimi' showed no coding differences ( Figure S2). Other floral integrators, such as LFY, APETALA1 (AP1) and SOC1 did not show significantly differential expression between the two samples. Expression of FT was not observed in these libraries, although three published longan FTs (DlFT1, DlFT2 and DlFT3 [43,44]) were all expressed in leaf tissues (not shown). Differentially expressed genes revealed by transcriptome analysis were confirmed by real-time quantitative PCR. qPCR was done with sample collections from another year, to futher validate RNA-seq results. Fifteen flowering-time genes were chosen including floral pathway integrator genes and their family members ( Figure 6). The RT-qPCR results showed a similar trend for all tested unigenes in L and S samples. For example, of the two putative TFL1 genes, the expression of Unigene6475 was down-regulated and the expression of Unigene6027 was up-regulated in S, consistent with the transcriptome data. The expression level of ELF4 (Unigene4309) was increased nearly 90 times in the S sample compared with the L sample as analyzed by real-time PCR, consistent with transcriptome data which showed 72.5 times higher expression in the S sample compared with the L sample. Genes having statistically significant differences in transcription levels are shown in Figure 6.

Discussion
Over the past decade, the development of genomic and transcriptomic technologies has contributed to a better understanding of floral biology at the molecular level. However, most of our knowledge about flower induction has arisen from studying flowering regulatory genes in Arabidopsis thaliana. Although these genes appear to be conserved in woody species [45], the regulatory pathways of flowering in woody perennials show remarkable differences [19], especially in the regulation of seasonal flowering.
To identify candidate genes associated with continuous flowering and obtain more insight into the molecular regulation of flowering seasonality in longan, genome-wide gene expression profiling was used to compare 'Sijimi' longan which has a continuous flowering habit and typical longan cultivar 'Lidongben'. Using the SOAPdenovo software, we generated a total of 40,513 unigenes from the two samples. Using this dataset, 14,913 significantly differentially expressed genes were identified, among them 539 flowering-related genes were identified.
Over one hundred putative homologues of flowering-time genes in longan were identified and their transcript abundance was compared, including floral integrator genes such as LFY, AP1 and SOC1. However, FT was not found in the two libraries. Three longan FT-like genes, DlFT1 DlFT2 and DlFT3, with flower promotion and repression functions in Arabidopsis have been described [43,44]. With KClO 3 treatment, DlFT transcripts were detected in mature leaves, and expression in other tissues including various stages of bud development and roots was not detected [44]. In Arabidopsis, FT induces flowering when expressed either in leaves or the shoot apical meristem (SAM) [46]. Experiments have demonstrated that FT moves from leaves to the SAM inducing flowering [47,48]. Also, in grafts of Cucurbita moschata, movement of an FT ortholog across a graft junction in the phloem system correlated with flowering [49]. Our results suggest that in the terminal tips of newly-sprouting longan shoots, FT has either not been expressed or imported from surrounding leaves. The expression pattern of DlFTs during flowering of longan needs further investigation.
SOC1 integrates multiple flowering signals and activates LFY, a floral meristem identity gene [15]. The expression levels of SOC1, LFY and another floral meristem identity gene AP1 were not altered significantly between the two samples, consistent with the developmental stage of materials used in this study. These results also suggest that these integrator genes are not associated with the continuous flowering trait of 'Sijimi'.
In Arabidopsis, the GA pathway actively promotes flowering under SD conditions, by up-regulating one or both of the genes LFY [50] and SOC1 [51]. The literature on the role of GAs in the floral initiation of woody perennials is large but inconsistent [52]. GAs inhibit floral initiation in mango [53] and citrus [13]. In longan, high GA 1 , GA 3 , GA 19 and GA 20 levels in the shoot tips contribute to floral bud induction [54]. However, it has also been demonstrated that GA inhibits flowering in longan as has been found for other fruit trees [4,32]. In our results, GA biosynthetic genes, GA20ox and GA3ox were up-regulated in the S cultivar. However, the reads of all these genes were very low in the two samples, suggesting the GA pathway may not be dominant for flowering in longan [55]. The GA signal promotes growth by initiating the degradation of DELLA proteins [56]. In poplar, the higher expression of GA2ox and GA INSENSITIVE (GAI) as a transgene accelerates flowering in the field, indicating GA and DELLA proteins may affect the juvenility of woody plants [57]. The repressors of DELLA proteins, PHOR1, GID1 and GID2, and DELLA domain protein GAI showed decreased expression levels in this study (Table 3). However the GA regulatory pathway in woody plants remains unclear; whether the down-regulation of these genes is associated with the shorter grafting juvenility of 'Sijimi' needs further study.
The repressor TFL1 is thought an important gene regulating juvenility and flowering seasonality in woody plants [18,58], and is involved in the thermosensory pathway of Arabidopsis [59]. One of TFL1 homologs in longan, unigene 6475, was slightly down-regulated in S according to the transcriptome data, its genomic DNA and cDNA sequences between S and L were compared and no differences were found. If longan TFL1 (Unigene6475) is involved in the regulation pathway of seasonal flowering then more experimental evidence is required. Another TFL1 gene in longan, Unigene6027, showed higher expression level in S, suggesting it may have different function in longan.
As plant maturity proceeds, there is a decline in miR156 levels, and an increase in levels of certain SPLs, which leads to the activation of FT in leaves. This increase in SPLs in the meristem leads to the activation of FUL, SOC1, LFY and AP1, promoting the floral transition [60,61]. However, the expression levels of SPL3 and SPL9 were observed to decrease in S, so these gene homologues may not be involved in the regulation of seasonal flowering. Alternatively there may be other SPL genes acting in this process.
Combining the results of gene expression levels from RNA sequencing and q-PCR, SVP, the integrator and repressor of flowering regulation, was significantly down-regulated in the S sample. SVP mediates ambient temperature signaling within the thermosensory pathway, and endogenous signals from autonomous and GA pathways [14]. Several genes in automous pathway were up-regulated in the S sample, therefore SVP may be affected by these genes. Recently, CCA1 and LHY were reported to decrease SVP protein stability, possibly through proteinprotein interaction via ELF3 [62,63]. This suggests SVP is regulated by circadian clock related genes and may participate in the regulation of seasonal flowering in longan. Furthermore, multiple copies of SVP were found in longan. Further research is needed to ascertain which SVP gene(s) participate in regulating seasonal flowering.
Plants possess a circadian clock that enables them to coordinate internal biological events with external rhythm changes [10]. In woody plants, the circadian clock may sense seasonal cues. The circadian clock in Arabidopsis is composed of at least three interlocking loops to measure day length changes and regulate FKF1, GI and CYCLING DOF FACTOR (CDF) [8,10]. FKF1 and GI form a complex which degrades CDF proteins under LD conditions, facilitating the expression of CO [33,64]. Two GI and one FKF1 genes of longan were identified in our sequences, and their expression was significantly up-regulated, especially FKF1. However, the expression of the eight CO-like genes did not vary significantly between the two cultivars. GIGANTEA (GI)-regulated miR172 defines a unique genetic pathway that regulates photoperiodic flowering by inducing FT independently of CO in Arabidopsis [65] and GI also directly activates FT [66]. So, in more mature longan tissue, FKF1and GI may directly work on miR172, FT or other floral integrator genes, affecting the flowering seasonality. GI interacts with SVP in vivo and controls expression at the FT promoter in Arabidopsis [66], whether GI pathway interacts with the repressor SVP in longan needs to be further investigated.
One gene of great interest from this current study is ELF4 (U4309), which was up-regulated nearly 90 times in 'Sijimi' compared with 'Lidongben'. The ELF4 gene controls circadian rhythms and flowering time in Arabidopsis thaliana [67]. ELF4-deficient mutants show an early flowering phenotype with increased CO expression, while over-expression of ELF4 delays flowering [67,68]. In 'Sijimi', ELF4 was significantly up-regulated, suggesting it may have important roles in regulating flowering time in longan. ELF4 and GI have a synergistic or additive effect on endogenous clock regulation [34] and temperature signals feed into the clock transcriptional circuitry through the complex repressor including ELF4 [69]. We hypothesized ELF4 may be the key gene associated with 'Sijimi' mutation, interacting with FKF1 and GI to influence flowering time ( Figure S3).
In conclusion, this is the first report of gene expression profiling in longan shoots, allowing the study of 539 potential flowering-related unigenes differentially expressed between 'Sijimi' and 'Lidongben' cultivars. More than 100 putative flowering-time genes were identified and their expression levels between two samples were compared by RPKM method. The analysis reveals the candidate genes related to continuous flowering in longan, which may provide insight into flowering regulation in woody plants.