RNA-Seq analysis of gene expression for floral development in crested wheatgrass (Agropyron cristatum L.)

Crested wheatgrass [Agropyron cristatum L. (Gaertn.)] is widely used for early spring grazing in western Canada and the development of late maturing cultivars which maintain forage quality for a longer period is desired. However, it is difficult to manipulate the timing of floral transition, as little is known about molecular mechanism of plant maturity in this species. In this study, RNA-Seq and differential gene expression analysis were performed to investigate gene expression for floral initiation and development in crested wheatgrass. Three cDNA libraries were generated and sequenced to represent three successive growth stages by sampling leaves at the stem elongation stage, spikes at boot and anthesis stages. The sequencing generated 25,568,846; 25,144,688 and 25,714,194 qualified Illumina reads for the three successive stages, respectively. De novo assembly of all the reads generated 311,671 transcripts with a mean length of 487 bp, and 152,849 genes with an average sequence length of 669 bp. A total of 48,574 (31.8%) and 105,222 (68.8%) genes were annotated in the Swiss-Prot and NCBI non-redundant (nr) protein databases, respectively. Based on the Kyoto Encyclopedia of Genes and Genome (KEGG) pathway database, 9,723 annotated sequences were mapped onto 298 pathways, including plant circadian clock pathway. Specifically, 113 flowering time-associated genes, 123 MADS-box genes and 22 CONSTANS-LIKE (COL) genes were identified. A COL homolog DN52048-c0-g4 which was clustered with the flowering time genes AtCO and OsHd1 in Arabidopsis (Arabidopsis thaliana L.) and rice (Oryza sativa L.), respectively, showed specific expression in leaves and could be a CONSTANS (CO) candidate gene. Taken together, this study has generated a new set of genomic resources for identifying and characterizing genes and pathways involved in floral transition and development in crested wheatgrass. These findings are significant for further understanding of the molecular basis for late maturity in this grass species.


Introduction
The crested wheatgrass complex (Agropyron) plays an important role in the provision of forage for ruminant animals in temperate semi-arid regions of North America. This complex has approximately 15 species and is an important perennial genus of the tribe Triticeae [1]. There are three ploidy types in the genus: diploid (2n = 2x = 14), tetraploid (2n = 4x = 28), and hexaploid (2n = 6x = 42). In Canada, Agropyron cristatum L. is the most common species of crested wheatgrass, with both diploid and tetraploid cultivars grown. The tetraploid type is more popular than diploid type. It is recognized as a valuable grass for spring grazing and hay production. However, once crested wheatgrass reaches the stage of flowering, nutritive value declines rapidly and plants become less palatable to grazing animals. Thus, the development of late maturing cultivars is a goal for crested wheatgrass breeding programs. However, it is a challenge to manipulate the timing of its floral transition, as there is little understanding of flowering genes and their expression in this species.
Flowering is a complex trait controlled by multiple genes and has considerable impacts on the adaptability, biomass and economic value in agricultural crops. Many genes involved in various flowering pathways, e.g. photoperiod, vernalization and autonomous pathways, have been identified in the model plant A. thaliana [2,3]. The photoperiod pathway controls flowering time in response to seasonal changes in day length. Two photoperiod responsive genes, namely, CONSTANS (CO) and FLOWERING LOCUS T (FT), are vital in regulating photoperiodic flowering [3,4]. In Arabidopsis, GIGANTEA (GI) protein, which is the output from the circadian clock, activates the expression of CO gene. The role of CO protein in flowering is to activate the FT gene which promotes flowering by acting upstream of the floral meristem identity genes APETALA 1 (AP1), LEAFY (LFY) and CAULIFLOWER (CAL) [4]. The CO/FT module appears to be conservative and genes with similarity to GI, CO and FT have been identified and characterized in several grass species [5][6][7][8][9][10][11]. Vernalization is the process by which prolonged exposure to cold renders plants competent to flower [12]. It prevents seeds sown in late summer or early fall from developing into flowering plants. In Arabidopsis, vernalization reduces the activity of central flowering repressor which is encoded by the FLOWERING LOCUS C (FLC) gene [13]. In cereals, three genes (VERNALIZATION1 [VRN1], VRN2 and VRN3) have been identified and are thought to form a regulatory loop to control the timing of flowering [14].
In contrast with those genes identified in important annual grass species like rice (Oryza sativa L.) and wheat (Triticum aestivum L.), almost no research has been done on genes regulating the vegetative-to-reproductive transition in crested wheatgrass. Crested wheatgrass is an obligate outcrossing species, and thus the high level of individual heterozygosity and population heterogeneity makes its genetic and molecular studies more difficult. Recently, next-generation deep-sequencing technologies such as Solexa/Illumina RNA-sequencing (RNA-Seq) have provided new approaches to study global transcriptome profiles for species without reference genomes. RNA-Seq can study transcripts of a certain trait in a given developmental stage and has been used to investigate the transcriptomic profiles in some grass species [15,16]. Zhang et al. [17] performed de novo transcriptome sequencing of tetraploid crested wheatgrass to develop genomic resources for wheat genetic improvement. However, little attention has been paid to identify and characterize genes for flowering in crested wheatgrass.
To enhance our understanding of the genetic control of flowering in crested wheatgrass, an RNA-Seq analysis was conducted of the flowering transcriptome over three developmental stages: stem elongation stage (VS), boot stage (BS) and anthesis stage (AS) in tetraploid crested wheatgrass. The specific objectives of this study were: (1) to perform de novo assembly of three different RNA-Seq libraries to develop genomic resources for flowering traits and (2) to conduct a differential gene expression analysis among the three stages to identify genes involved in specific pathways for floral transition. It was hoped that this analysis would not only provide some baseline information and genomic resources for identifying and characterizing genes associated with floral initiation and development in this species, but also allow for better understanding of the molecular mechanisms controlling plant flowering.

Plant materials and growth conditions
Four tetraploid crested wheatgrass plants, one from each line (Plant Introduction No. PI598641, PI439914, W625134 or PI439914), with the same flowering date in 2015 and 2016 growing seasons (S1 File) were selected for RNA-Seq analysis. These plants were grown in the field plots at Agriculture and Agri-Food Canada Saskatoon Research Farm, Saskatoon, SK, Canada. The field plots were established in July of 2014 on 1 m centers. The samples of the four selected plants were collected at the same time as follows: leaf tissues at stem elongation stage (approximately E0), spikes at boot stage 7-d after the first sampling (approximately R0) and spikes at the anthesis (flowering) stage (R4), 30-d from the second sampling, respectively according to Moore et al., (1991) [18]. The collected tissues were immediately frozen and stored in liquid nitrogen for RNA extraction.

RNA extraction, cDNA library construction and Illumina sequencing
For each plant, total RNA was extracted from approximately 100 mg raw material at the three developmental stages, respectively, using the Qiagen RNeasy Plant mini kit (Qiagen) according to the manufacturer's protocol. RNA quantification was performed using Nanodrop 8000 (Thermo Fisher) and RNA integrity was assessed via the RNA 6000 Nano labchip on 2100 Agilent Bioanalyzer (Agilent Technologies). For each growth stage, 1.25 ug RNA from each plant was pooled and used for subsequent analysis. The pooled RNA samples were subsequently used in cDNA library construction. Three cDNA libraries were prepared using a TruSeq1 RNA sample preparation Kit from Illumina (San Diego, CA, USA). Paired-end libraries were sequenced using the Illumina HiSeq1 2500 system at National Research Council, Saskatoon, SK, Canada. All raw sequences were deposited in the National Center for Biotechnology Information Short Read Archive under accession number SRP096782.

Transcriptome de novo assembly
The raw sequence reads for three floral development stages from the image data output from the sequencing facility were quality checked by FASTQC (http://www.bioinformatics. babraham.ac.uk/projects/fastqc/). Based on the FASTQC results, a filtering was performed to remove ribosomal RNA (rRNA) contamination and trimming to remove low quality bases and adapters. SortMeRNA was used to identify rRNA in the raw reads [19]. Trimmomatic [20] was used to clean the raw reads by removing low-quality reads (Q value = 20 as threshold), partial adapter sequences and reads with ambiguous bases 'N', based on the setting of pairedend mode, phred33 and threads 6. De novo assembly of the crested wheatgrass transcriptome was accomplished following the online instructions of Trinity [21,22]. The quality of de novo transcriptome assembly, including the number, total bases, mean length, and N50 of transcripts and genes, was checked using perl script "TrinityStats.pl" of the Trinity pipeline. Corset [23] was used to cluster relevant transcripts into genes. The bam files used for Corset analysis were produced by multi-mapping the reads to the transcriptome using Bowtie2 [24]. Each sample had one bam file. The assembled transcripts and genes were analyzed with the perl script "fasta_seq_length.pl" of the Trinity pipeline to get the sequence length for each transcript and gene.

Sequence annotation and classification
Blast2go (http://www.blast2go.com/b2ghome) was used to align genes against the National Centre of Biotechnology Information (NCBI) nr protein database for function annotation. Local blastx was performed to search gene sequences against the Swiss-Prot protein database. The e-value cutoff was set at 1e -5 . Gene name was assigned to each gene based on top Blastx hit with the highest score. The genes related to flowering time and floral development were explored based on the gene names. Similarity distribution and species distribution analysis was based on annotation from Blast2go. For each annotated transcript, the top blastx hit was used for analysis. The annotated sequences were searched in the KEGG Automatic Annotation Server (KAAS) [25,26] using the bi-directional best hit method. Specific metabolic pathways were identified from the output in KAAS.

Differential gene expression analysis
The alignment-based qualification method RSEM [28] was used to estimate gene abundance. The de novo assembly of three RNA-Seq libraries was used as reference. Each RNA-Seq library was separately aligned to the reference, using Bowtie [29]. The differentially expressed genes (DEGs) were analyzed using R Bioconductor package, edgeR [30]. The edgeR dispersion value was explored between 0.1 and 0.4, and 0.2 was adopted based on the number of DEGs. The threshold of the P-value in multiple tests was determined by the value for the false discovery rate (FDR) [31]. The threshold to judge the significance of gene expression differences was ''FDR 0.001 and the absolute value of Log 2 fold change (Log 2 FC ) ! 4". Local blastx was performed to search DEGs against the KEGG database. A tabular BLAST output from the local blastx was analyzed by the KEGG Orthology Based Annotation System (KOBAS) program [32] for pathway enrichment analysis. Significantly enriched pathways were defined by taking a corrected P-value = 0.05 as the threshold.
To assess the accuracy of DEG detection, we also performed a quantitative real-time PCR (qRT-PCR) validation of DEGs related to flowering time and floral development using two plants randomly selected from the Plant Introduction No.W625134 and 439914, respectively. Specifically, total RNA at the VS and BS stages of these two plants was extracted, respectively. The RNA extraction method was the same as described for the RNA-Seq library preparation and sequencing. Extracted RNA was treated with DNAse I (Ambion), and cDNA was synthesized using the SuperScript1 First-Strand Synthesis System for RT-PCR (invitrogen) according to the manufacturer's instruction. The sequences of the specific primer sets for each tested gene were designed using the PrimerQuest Tool (https://www.idtdna.com/Primerquest/ Home/Index) and listed in S2 File. The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene of crested wheatgrass (DN67262-c0-g1) was used as an internal control for normalization. Three separate first-strand cDNA reactions were analyzed in duplicate for each sample. The qPCR analysis was performed with SsoFast EvaGreen supermix (Bio-rad) according to the manufacturer's instructions using a Bio-rad CFX96 TM system.

Sequence annotation
Assembled genes were annotated using Blastx program with an E-value threshold of 10 −5 against Swiss-Prot and NCBI non-redundant (nr) protein databases. A total of 48,574 (31.8%) and 105,222 (68.8%) genes showed significant similarity to known proteins in the Swiss-Prot and NCBI nr protein databases, respectively. The sequences and annotation of the 105,222 genes were presented in S4 and S5 Files, respectively. The annotation information from nr database was used for similarity distribution and species distribution analysis. The similarity distribution showed that 55.0% of the matches were of high similarity ranging from 80% to 100% similarity as reported in the Blastx results whilst 38.1% of the matches were of similarity ranging from 60% to 80% (Fig 2A). Further analysis of the matching sequences indicated that 9.6% of the sequences showed the closest matches with sequences from Oryza sativa while 6.8% and 6.7% of the sequences showed closest matches with sequences from Zea mays and Aegilops tauschii, respectively ( Fig 2B).

Flowering time-associated genes and MADS-box genes
We obtained 113 genes associated with flowering time (S6 File) based on the annotations of assembled genes. The number of flowering time homologous genes in the crested wheatgrass transcriptome data is summarized in Table 2. These include photoreceptor genes cryptochromes (CRY1 and CRY2) and photochromes genes (PHYA and PHYC); photoperiod pathway genes EARLY FLOWERING 3 (ELF3), PHYTOCHROME INTERACTING FACTOR 3

Metabolic pathway assignment by KEGG
Mapping 105,222 annotated genes to the reference canonical pathways in KEGG revealed a total of 9,723 genes that were assigned to the 298 KEGG pathways (S8 File). The pathways with the most represented genes were the metabolism pathway (830 genes) and the biosynthesis of secondary metabolites (384 genes). Interestingly, a circadian rhythm pathway involving 21 genes was also found in KEGG, and the detailed metabolic pathway for the circadian rhythm is shown in Fig 3. Circadian rhythm is an important part of the photoperiod pathway for plant  flowering. In A. thaliana, the circadian oscillator, which is at the core of photoperiod pathway, is composed of the interlocked feedback loop formed by a pseudo response regulator (PRR) and major transcriptional factors CCA1, LHY and TOC1. The morning-expressed CCA1/ LHY suppresses TOC1 expression by binding to its promoter. Moreover, CCA1/LHY activates the expression of PRR7/9 in the morning and then PRR7/9 represses the transcription of CCA1/LHY during the rest of the day. By contrast, the evening-expressed TOC1 activates the expression of CCA1/LHY. In the flowering transcriptome of crested wheatgrass, not only the homologs of these major factors CCA1, LHY, TOC1 and PRR7, but also photoperiod genes ELF3, PIF3, GI, CO and FT were mapped to the circadian rhythm pathway. The pathway identified here not only confirms that the photoperiod pathway is involved in flowering in crested wheatgrass, but also provides valuable resource for investigating the photoperiod pathway and flowering-related processes in crested wheatgrass. thaliana. All members of this gene family contain one or two B-box domains at the N-terminus and a CCT domain at the C-terminus [33,34]. A total of 22 homologs of the COL genes were identified in our assembled transcriptome database (Table 2 and S6 File). Amino acid sequence alignments indicated that 11 putative proteins of these homologs in crested wheatgrass have conserved CCT domains (Fig 4A). The phylogenetic analysis of these putative COL homologs with COL proteins in A. thaliana, rice, wheat and barley revealed that all members of homologs with conserved CCT domains in crested wheatgrass could be divided into three divergent groups (Fig 4B). Seven COL homologs (DN59116-c0-g2, DN52048-c0-g1, DN52048-c0-g2, DN54810-c0-g1, DN52048-c0-g4, DN50988-c0-g4 and DN51148-c0-g2) were assigned in group I and were clustered with AtCOL3/AtCOL4/AtCOL5. They were closely related to AtCO and OsHd1. DN49022-c0-g1, DN45923-c0-g1 and DN53380-c0-g2 were closely related to AtCOL9 and AtCOL10 and clustered in group II. DN46523-c0-g2 was clustered with AtCOL6 and AtCOL16 in group III. These results suggested that the COL proteins have evolved before the divergence of the monocots and dicots [7,35,36].

Differential gene expression during floral initiation and development
Putative homologs of COL and some other important genes involved in controlling flowering time in crested wheatgrass have been identified. To better understand the molecular mechanisms that regulate the genetic pathways of floral transition and floral development, a genomewide expression analysis was performed for the VS, BS and AS developmental stages. The expression of each gene was calculated using the numbers of reads mapping onto the assembled transcriptome. Comparisons of gene expression showed that a total of 1544 genes, including 996 up-regulated and 548 down-regulated genes were identified in BS when compared to VS (Fig 5A). An overview of the expression patterns for VS vs BS was shown in Fig 5B. The expression changes of 1544 differentially expressed genes (DEGs) ranged from 17-fold to -14-fold. There were 163 genes showing specific expression in VS while 686 genes showing specific expression in BS (Fig 5C, S9 File). KOBAS was used to further identify biosynthetic pathways and to explore the functions of the DEGs in detail. Some of DEGs, which were upregulated in the BS, were significantly enriched to the pathway of steroid biosynthesis by KOBAS (S10 File). Comparing gene expressions between BS and AS revealed that 1671 DEGs were identified in AS when compared to BS, including 1256 up-regulated and 415 down-regulated genes ( Fig  5A). A corresponding view of the expression patterns for BS vs AS was depicted in Fig 5B. The changes of 1671 DEGs ranged from a 17-fold to an -18-fold. Among these genes, 322 genes were specifically expressed in BS while 1101 genes showed specific expression in AS (Fig 5C,  S9 File). In KOBAS enrichment, 13 gene sets were significantly enriched and they are related to ribosome, lysosome, starch and sucrose metabolism, fatty acid metabolism, cutin, suberine and wax biosynthesis. Most of these DEGs were up-regulated in the AS stage (S10 File).
DEGs related to flowering time and floral development were also identified. The flowering time-associated genes showed higher expression levels in VS than that in BS, and the functional annotation for these genes was listed in Table 3. DN61589-c0-g2, the homologous gene to the plant circadian clock gene LHY, was expressed 11 times higher in VS than that in BS. DN59278-c0-g2, the homologous gene of plant circadian clock gene GI, was expressed 5 times higher in VS. The CO/FT regulatory system was found to be associated with flowering initiation in crested wheatgrass. DN52048-c0-g4, which is the homologous gene of CO expressed  about 5 times higher in VS. DN56531-c3-g1 is the homologous gene for FT and its expression elevated about 8 times in VS. DN53048-c0-g2, homologous gene of flowering-promoting factor 15 also had 7 times elevation in VS compared to that in BS. DN53956-c0-g1, the homologous gene of ZCN26, a member of the maize FT-related gene family, showed about 10 times higher expression level in VS. These results indicated that photoperiod pathway is involved in flower initiation in crested wheatgrass. Genes involved in floral development were expressed significantly higher in BS (Table 4). These up-regulated genes in BS were involved in flower organ formation and floral development. DN56706-c0-g5, DN60175-c2-g2, DN56706-c0-g6 and DN36907-c0-g1, the homologous genes of MADS-box transcription factors, were expressed from 5 to 12 times higher in BS than VS and were correlated with floral development. DN55848-c0-g1, a homologous gene of auxin response factor, showed 10 times higher expression in BS. These identified genes and their sequences are useful for future functional studies on floral development in crested wheatgrass.
Our qRT-PCR analysis of eight flowering time-associated genes randomly selected from Table 3 showed up-regulation at the stem elongation stage (Fig 6A). Similarly, eight flowering development genes randomly selected from Table 4 showed higher expression at the boot stage (Fig 6B). These qRT-PCR results helped to confirm the reliability of our differential gene expression analysis.

Discussion
This study represents the first attempt to use RNA-Seq to analyze the transcriptome of different floral developmental stages in crested wheatgrass. The analysis revealed several interesting findings. First, there were 311,671 transcripts identified with a mean length of 487 bp and 68.8% of 152,849 genes were annotated in the Swiss-Prot and NCBI nr protein databases, including the identification of 113 flowering time-associated genes, 123 MADS-box genes and 22 COL genes. Second, 9,722 annotated genes were mapped onto 298 pathways using the RNA-Seq analysis of floral development in A. cristatum KEGG pathway database, which including plant circadian rhythm. Third, a COL homolog DN52048-c0-g4 seemed to be a CO candidate gene in crested wheatgrass. These findings not only demonstrated the effectiveness of RNA-Seq and differential gene expression analysis in the identification and characterization of genes related to complex traits in a species without a reference genome, but also advanced our understanding of the conserved photoperiod-circadian clock-CO-FT regulatory circuit for flowering initiation in crested wheatgrass. We generated 69 million high quality sequence reads from three cDNA libraries representing the transition from juvenile to the adult phase and identified 311,671 transcripts. These transcripts were clustered into 152,849 genes and 68.8% of genes were annotated in the NCBI nr protein database. The proportion of annotated genes was compatible with the report of 62.4% of 73,664 genes detected from a similar RNA-Seq analysis of flag leaf and young spike samples in tetraploid crested wheatgrass plants [17]. However, we do not know the reason(s) for such a considerable proportion of un-annotated genes obtained from the two similar studies. Some un-annotated genes may represent novel genes associated with vegetative and reproductive growth of crested wheatgrass in unique processes and pathways. It is possible that those un-annotated genes detected from the tetraploid crested wheatgrass may contain many separate paralogs, chimeras and differentially expressed isoforms. Also, the stringent conditions used for running Bowtie2 to map reads to the transcriptome for gene detection using Corset would generate more genes with paralogs for lowering gene annotation. RNA-Seq analysis of floral development in A. cristatum Many genes associated with vernalization and photoperiod pathways for floral transition have been identified in crested wheatgrass (Table 2 and S6 File). Vernalization is the process by which flowering is promoted by prolonged exposure to cold period. Vernalization removes blocking of flowering so that the plant can perceive inductive signals, such as long-day photoperiod. The FLC gene of A. thaliana has been identified as a central flowering repressor which can directly interfere with FT expression in leaves [37]. In winter wheat and barley, three genes VRN1, VRN2 and VRN3 (FT) form a regulatory loop that regulates the initiation of flowering [38][39][40][41]. VRN1 encodes an AP1-related MADS-box protein [42]. VRN2 has CCT zinc finger, functioning as floral repressor FLC [43][44][45][46]. VRN3 corresponds to an ortholog of A. thaliana FT [47]. During growth of vernalization-requiring cereals in the fall, VRN2 represses FT to prevent flowering and VRN1 is transcribed at very low levels [41,47,48,49]. However, when winter comes, VRN1 expression level increases, which causing the repression of VRN2 and activation of VRN3 [41, 49,50]. In our transcriptome data, homologs of VRN1 and FT were found. However, no VRN2 homologous gene has been detected. This may reflect the late sampling of tissue in our study and the high level of VRN1 after vernalization repressed the expression of the VRN2. Also, two other vernalization genes homologous to VIN3 and VIN4 were found. As a cool-season grass, crested wheatgrass requires vernalization to induce floral initiation. These newly detected genes will allow us to further study the molecular mechanism of vernalization in this perennial species.
Photoperiod is an important factors regulating flowering [51]. The photoperiod pathway consists of three parts: photoreceptors, a circadian clock and an output pathway from the clock specific to flowering. Interactions between photoreceptors and the circadian clock are thought to allow plants to distinguish different day lengths. Phytochromes (PHYA-PHYE) detect red/far-red light and the cryptochromes (CRYs) detect blue light [52][53][54][55]. In our transcriptome data, homologous genes for phytochromes (PHYA and PHYC) and two cryptochromes (CRY1 and CRY2) were found ( Table 2 and S6 File). These photoreceptors process physical signals and produce a circadian clock. The circadian clock is an endogenous timekeeper that controls many rhythmic processes in organisms as they experience 24h cycle of day and night [56]. There are several genes in the long-day flowering pathway affecting both circadian rhythms and flowering time in A. thaliana. The ELF3 gene is responsible for light input signals to the clock, measures daylength and represses flowering under short days [57]. The elf3 mutant shows early, daylength-insensitive flowering [57]. The best-characterized transcriptional-translational regulatory feedback loop in the circadian clock is comprised of two morning-expressed Myb transcription factors, CCA1 and LHY, and an evening-expressed gene TOC1 [58,59]. This central loop interlocks with the morning and evening loops, forming the basic architecture of the plant circadian clock. Overexpression of either of LHY and CCA1 resulted in the late flowering plants and the expression of several genes controlled by circadian clock was disrupted [59,60]. The mutation of TOC1 can accelerate circadian rhythms and cause early flowering A. thaliana under short days [61]. The identification of these homologous genes ( Table 2 and S6 File) confirmed that photoreceptors and circadian clock are conserved for flowering in crested wheatgrass.
The circadian clock controls the CO expression level rhythm through GI, a large plant-specific protein [62,63]. GI activates expression of CO protein, a B-box-type zinc finger transcription factor-encoding gene expressed in leaves [64,65]. When it coincides with a light period, indicating long days of summer, translated CO protein is stable and directly activates its prime target, the FT gene [64]. FT protein then acts as a transcription factor to activate expression of APETALA1 (AP1) and promote floral meristem identity gene at the shoot apex [66,67]. Expression of CO in dark periods of a long night, however, results in degradation of CO protein and no FT activation [68]. COL genes have been identified in some plant species, each of which seems to have a large family with several genes [69]. There are 17 COL genes in A. thaliana [34] and at least 16 COL genes in rice [69]. In this study, eleven of the 22 crested wheatgrass COL homologs were identified with the conserved CCT domain. The phylogenetic analysis revealed that these COL proteins were classified into three conserved COL subgroups as defined previously [33,34,69]. COL proteins of different subgroups are expected to perform distinct biological function and our 11 COL proteins are expected to perform different biological roles. CO homolog DN52048-c0-g4 clustered with AtCO and OsHd1 in class I of the COL gene family. The AtCO gene integrates inductive photoperiod information via the circadian clock and actives the FT gene promoting flowering in Arabidopsis. OsHd1 was found to encode an ortholog of the AtCO gene in rice [5]. Besides CO homologous genes, sequence homologs for GI and FT genes were also found in our transcriptome data, which suggests that the output pathway of circadian clock, CO and FT proteins all are involved in flowering in crested wheatgrass.
As crested wheatgrass is an open pollinated species, each plant is genetically heterogeneous and our bulk of four genotypes should be informative to our current DEG analysis with edgeR. Our effort has detected 1544 DEGs between VS and BS stages and 1671 genes between BS and AS stages. The circadian clock output gene GI (DN59278-c0-g2) showed higher expression in VS than that in BS, indicating its role in the transition to flowering. CO (DN52048-c0-g4) and FT (DN56531-c3-g1) homologs showed specific expressions in the leaves (VS) of crested wheatgrass. CO was reported to be expressed in the vasculature of A. thaliana leaves, and its role in flowering is to activate the expression of FT [70]. In both rice and Arabidopsis, FT is a strong promoter of flowering that is translocated from the vasculature of leaves to the shoot apical meristem [3,71]. Our phylogenetic data also showed that CO homolog (DN52048-c0-g4) clustered with AtCO and OsHd1, indicating that it could be the CO candidate gene responsible for integrating photoperiod and activating FT gene in crested wheatgrass. Thus, our DEG analysis was informative and provided additional support that ancient and evolutionarily adaptable module CO/FT is conserved in crested wheatgrass.
Altogether, our RNA-Seq analysis has generated a new set of genomic resources for characterizing genes and pathways involved in floral transition and development in crested wheatgrass. These genomic resources can be explored and utilized to develop marker-based tools for development of late maturing lines in this species. Our research findings clearly showed that vernalization and photoperiod pathways are involved in the regulation of flowering time in crested wheatgrass, and thus advanced our understanding of the mechanisms for flowering initiation in this species. Also, the photoperiod-circadian clock-CO-FT regulatory circuit is conserved in crested wheatgrass. However, many questions remain, like how GI is regulated by circadian clock and how it controls the expression of CO. The possible role of CO and FT in photoperiod response and whether FT moves from leaves to meristem as mobile flowering signals are worthy to pursue. The answers to these questions require more extensive expression analysis and successful cloning of these genes. Also, the research outputs presented here have clearly demonstrated the effectiveness of RNA-Seq and differential gene expression analysis in the identification and characterization of genes related to specific traits in species without a reference genome. These technologies have made the informative genomic investigation of a specific complex pathway such as floral development in a non-model plant practically possible.