Transcriptome Analysis by Illumina High-Throughout Paired-End Sequencing Reveals the Complexity of Differential Gene Expression during In Vitro Plantlet Growth and Flowering in Amaranthus tricolor L.

Amaranthus tricolor L. is a C4 plant, which is consumed as a major leafy vegetable in some tropical countries. Under conditions of high temperature and short daylight, Am. tricolor readily bolts and blooms, degrading leaf quality. A preliminary in vitro flowering study demonstrated that the flowering control pathway in Am. tricolor may differ from that of Arabidopsis. Nevertheless, no transcriptome analysis of the flowering process in Amaranthus has been conducted. To study Am. tricolor floral regulatory mechanisms, we conducted a large-scale transcriptome analysis—based on Illumina HiSeq sequencing of cDNA libraries generated from Am. tricolor at young seedling (YSS), adult seedling (ASS), flower bud (FBS), and flowering (FS) stages. A total of 99,312 unigenes were obtained. Using BLASTX, 43,088 unigenes (43.39%) were found to have significant similarity with accessions in Nr, Nt, and Swiss-Prot databases. Of these unigenes, 11,291 were mapped to 266 KEGG pathways. Further analysis of the four digital transcriptomes revealed that 735, 17,184, 274, and 206 unigenes were specifically expressed during YSS, ASS, FBS, and FS, respectively, with 59,517 unigenes expressed throughout the four stages. These unigenes were involved in many metabolic pathways related to in vitro flowering. Among these pathways, 259 unigenes were associated with ubiquitin-mediated proteolysis, indicating its importance for in vitro flowering in Am. tricolor. Other pathways, such as circadian rhythm and cell cycle, also had important roles. Finally, 26 unigenes were validated by qRT-PCR in samples from Am. tricolor at YSS, ASS, FBS, and FS; their differential expressions at the various stages indicate their possible roles in Am. tricolor growth and development, but the results were somewhat similar to Arabidopsis. Because unigenes involved in many metabolic pathways or of unknown function were revealed to regulate in vitro plantlet growth and flowering in Am. tricolor, the process appears to be highly complex in this species.


Introduction
Amaranthus L., a genus of approximately 70 C 4 dicotyledonous herbaceous plant species, is widely distributed throughout warm temperate and tropical regions of the world [1,2]. Many amaranth species are economically important crop plants valued for their nutritional and horticultural significance [3]. The ''grain amaranths'' are a major source of highly nutritious pseudocereals that complement cereal diets [3,4], while ''green amaranths'' are savored as a major leafy vegetable in several Asian and African countries, such as India, Bangladesh, and Zimbabwe [3,[5][6][7][8]. Some species are also valued for their ornamental, forage, or pharmaceutical properties. Leafy amaranths, with axillary determinate inflorescences and native to southeast Asia, constitute one of the most widely consumed tropical vegetable crops. Most of these Amaranthus species, especially Am. tricolor, are cultivated as green vegetables similar to spinach, broccoli, and cabbage [3], contributing to a wholesome diet. The leaves, along with the stems, may be eaten as a salad vegetable. In Africa, amaranth is usually cooked as a leafy vegetable. The mucilaginous leaves are rich in vitamins and minerals. They also contain betalain pigments, used for food coloring, that possess antioxidant, anticancer, antiviral, antiparasitic, and radical scavenging properties against certain oxidative stress-related disorders [9][10][11][12].
Am. tricolor readily bolts and blooms under high-temperature, short daylight conditions, which leads to deterioration in quality and prevents year-round production. To preserve fresh quality and increase production, growth can be regulated to maintain vegetative growth and delay blooming. Investigation of in vitro flowering in Am. tricolor is very useful for horticultural production, and more importantly, in vitro plantlets of Am. tricolor are ideal experimental materials for the study of in vitro flowering mechanisms [13].
Flowering is an important transformative event in the life cycle of higher plants. The process is central to individual plant development, as proper floral initiation timing is essential for optimal production of fruits and seeds to ensure reproductive success. Studies of plant flowering regulatory mechanisms have primarily focused on Arabidopsis thaliana. These studies have revealed the complexity of the flowering process, which is controlled by genetic and environmental factors [14,15]. The expression and interaction of multiple genes is required to accurately regulate flowering times and, ultimately, floral development [16,17]. Nevertheless, floral induction mechanisms uncovered in Arabidopsis are not adequate to fully explain floral regulation in short-day plants [16,18]. Investigation of flowering mechanisms in the short-day plant Am. tricolor can aid further elucidation of the flowering process.
Transcriptome analysis is a general phenotyping method [19,20]. Using this technique, a large number of functional genes can be identified, and genes differently expressed among samples of the same species can be uncovered. Transcriptome analysis is particularly useful for revealing relationships between plant gene expression and phenotypes. Application of transcriptome sequencing technology to study plant developmental and floral transition processes has generated large quantities of flowering-related data [21][22][23][24][25][26][27][28][29]. The results of such studies have demonstrated that this method is effective for investigating the transformations of plant growth and development.
The number of advanced genomic or transcriptomic studies of the genus Amaranthus has recently increased, with at least seven published reports appearing since 2008. One such report detailed the construction of a bacterial artificial chromosome (BAC) library for Am. hypochondriacus [30]. Using this BAC library, microsatellite markers for Amaranthus were subsequently developed and applied to clarify taxonomic relationships within the Am. hybridus complex [31,32].
De novo whole genome sequencing is an effective approach for genomic study of non-model species with unsequenced genomes [33]. Using next-generation 454 pyrosequencing technology, for example, the whole genome of Am. tuberculatus has been sequenced and used to identify herbicide-resistant genes and develop molecular markers for population genetics and phylogenetics studies [34]. The transcriptome of this species has also been sequenced, with 44,469 unigenes uncovered [35]. Thousands of single nucleotide polymorphisms (SNPs) in four different populations of Am. caudatus have been identified via genomic reduction, barcoding, and 454-pyrosequencing [36]. Next-generation 454 pyrosequencing technology has also been used to generate leaf and stem gene expression profiles for Am. hypochondriacus and to analyze responses to biotic and abiotic stress [2]. In that study, the generated Am. hypochondriacus transcriptome was also compared with that of Am. tuberculatus to highlight differences in drought stress response between cereal and weedy Amaranthus species.
Although various transcriptome studies, as described above, have been conducted on Amaranthus species, the use of this method to explore the flowering process in Am. tricolor has not yet been reported. The transcriptome of Am. tricolor is worthy of study, as a preliminary analysis of a small RNA library from Am. tricolor has suggested that flowering control pathways may differ between Am. tricolor and Arabidopsis [37]. Examination of the in vitro flowering process of Am. tricolor would help to lay a foundation for further study of flowering control mechanisms and provide evidence for regulation of growth and development of other Amaranthus species, including Am. hypochondriacus. The results would also have important significance for the breeding of Amaranthus species to enhance the quality of edible organs,both leaves and seeds.

Results
Sequencing and de novo assembly of the Am. tricolor transcriptome during in vitro plantlet growth and flowering To comprehensively cover the Amaranthus in vitro flowering transcriptome, total RNA was extracted from four Am. tricolor developmental stages: young seedling (YSS), adult seedling (ASS), flower bud (FBS), and flowering (FS) stages. Equal amounts of total RNA from each sample were pooled together. The mRNA was isolated, enriched, sheared into smaller fragments, and reverse- These values indicate that the sequencing data was of sufficient quantity and quality to ensure sequence assembly accuracy and adequate transcriptome coverage.
All high-quality reads were assembled de novo using the Trinity program [38]. Assembly of reads generated 1,161,029-3,172,183 contigs from the four stages of in vitro plantlets, with a maximum length of 15,543-18,338 bp, mean length of 63.9-92. 41 (Table 3). These results demonstrate that the sequencing data was sufficiently ample for assembly of longer scaffolds into unigenes.
Unigene length is a very important factor that reflects data assembly quality. After further gap filling and exclusion of unigenes less than 200-bp long, the scaffolds from the four plantlet developmental stages were assembled into 99,312 unigenes with a maximum length of 23,783 bp, a mean length of 831.65 bp, a GC percentage of 39.90%, and an N50 of 1,460 bp excluding unknown bases (''N'') ( Table 4). Unigenes with lengths ranging from 201-300 bp, 301-500 bp, 501-1,000 bp, 1,001-1,500 bp, 1,501-2,000 bp, 2,001-2,500 bp, and .2,500 bp accounted for 33.05% (32,827), 20.07% (19,931), 20.02% (19,881), 10.37% (10,302), 6.81% (6,760), 4.08% (4,050), and 5,561 (5.60%) of total unigenes, respectively ( Figure 1 and Table 5).  Gene Ontology functional analysis of genes expressed during Am. tricolor in vitro plantlet growth and flowering Gene Ontology (GO) is an international standardized genefunction classification system that uses a dynamically updated, controlled vocabulary and a strictly defined concept to comprehensively describe the properties of genes and their products in any organism [40]. GO terms, which are assigned to unigenes based on sequence similarities to known proteins in the UniProt database annotated with GO terms as well as their InterPro and Pfam domains, are the basic GO units. The GO database comprises three ontologies: molecular function, cellular components, and biological processes. A total of 282,582 unigenes were assigned into the three main GO categories and 51 sub-categories (functional groups). A high percentage of genes in the biological processes category fell under ''cellular process'' (37,279), ''metabolic process'' (27.575), and ''response to stimulus'' (10,796). Genes related to ''cell part'' (42,055), ''cell'' (31,611), and ''organelles'' (20,336) were heavily represented in the cellular components category, while ''binding'' and ''catalytic activity'' subcategories dominated the molecular function category ( Figure 2 and Table 6).
Within each subcategory, the number of genes functionally annotated from each developmental stage was similar to the total number of genes annotated (Figure 3a-d and Table 6). These results demonstrate that identified genes associated with molecular function, cellular component, and biological process categories play important regulatory roles during in vitro flowering in Am. tricolor.   COG functional classification of genes expressed during Am. tricolor in vitro plantlet growth and flowering The COG database contains classifications of orthologous gene products. Each protein in the COG database is assumed to have evolved from an ancestor protein. The database includes coding proteins of complete genomes as well as information about the evolutionary relationships of bacteria, algae, and eukaryotes. Out of 10,087 unigenes matching the public databases, 12,460 sequences (12.55% of all unigenes) were classified into 24 COG categories ( Figure 4 and Table 7). ''General function prediction only'' represented the largest group (1747; 14.02%), followed by ''Translation, ribosomal structure and biogenesis'' (1,100, 8.83%), and ''Replication, recombination and repair'' (1,008, 8.09%). ''Cell motility'' (14; 0.11%) and ''Nuclear structure'' (6; 0.05%) were the smallest groups.

KEGG functional classification during Am. tricolor in vitro plantlet growth and flowering
Pathway-based analysis can help us further understand the biological functions of genes. The KEGG pathway database contains information on networks of intracellular molecular interactions and their organism-specific variations. Once unigenes are assigned KEGG Orthology (KO) identifiers, or K numbers, organism-specific pathway maps and BRITE functional hierarchies are automatically generated. In total, 5,998 unigenes were assigned K numbers, pathway maps, and BRITE functional hierarchies. Among them, 11,291 unigenes were assigned to 266 KEGG pathways, including ubiquitin-mediated proteolysis, circadian rhythm, cell cycle, cytokine-cytokine receptor interaction, oxidative phosphorylation, starch and sucrose metabolism, phototransduction, and ubiquinone and other terpenoid-quinone biosynthesis, and 4,558 unigenes were assigned to 32 KEGG BRITE functional hierarchies, such as ubiquitin system, chromosome, and protein kinases (File S1).
Analysis of genes differentially expressed during Am. tricolor in vitro plantlet growth and flowering Unigene expression levels were normalized to the number of reads per kilobase of exon region per million mapped reads (RPKM) [20,41]. RPKM values of unigenes expressed during YSS, ASS, FBS, and FS varied widely, i.e., 0.00862021-106615, 0.0140538-44980.8, 0.0075633-49467.9, and 0.0141148-59971, respectively (Table 8). These results demonstrate that even extremely small expression level differences can be detected using the Illumina HiSeq 2000 system [42].
Differential gene expression analysis between unigenes from FBS and YSS, FBS and ASS, and FBS and FS was performed using DESeq software [43]. A p-value cut-off of ,0.05 was used after adjustment for multiple testing using the Benjamini and Hochberg false discovery rate method. A total of 59,517 genes were differentially expressed among the four stages, with 735, 17,184, 274, and 206 genes specifically expressed in YSS, ASS, FBS, and FS, respectively ( Figure 8).
The photoperiod-dependent flowering pathway is the main flowering pathway, with interactions between light and an internal timekeeping mechanism-the circadian clock-regulating photoperiodic flowering [44,45]. In the model plant Arabidopsis, circadian clock-related genes have been found to regulate the floral transition. Constitutive overexpression of CCA1 delays flowering under short-day conditions [46,47], whereas elf3-1, cca1, and lhy mutations accelerate such flowering [44,48]. In addition, CCA1 can interact with other circadian clock-related genes, such as ELF3 and LHY, to regulate the transition to flowering [49,50]. In our study, expression of the circadian clock-related genes assayed by qRT-PCR (excluding XTH9) declined from YSS to FBS, implying that this lowering of gene expression may promote the transition from vegetative to reproductive stages.
XTH9 is expressed in Arabidopsis at different development stages, including during differentiation of vegetative and reproductive meristems and cell elongation of inflorescence stems [51]. This gene functions in tissues closest to shoot apices, where cell division is most active. As shown in Fig. 9a, expression of XTH9 increased from YSS to FBS and then declined, but still remained higher than at YSS. The XTH9 transcript was detectable during both vegetative and reproductive phases, suggesting a correlation between XTH9 expression and Am. tricolor growth and development.
Numerous studies have suggested that ubiquitin-mediated proteolysis plays an important role in photoperiod-responsive flowering-time regulation. In particular, COP1 can regulate the entire growth and development process. In rice (Oryza sativa), the COP1 ortholog PPS has been confirmed to promote the change from juvenile to adult phases and to also suppress the transition from vegetative to reproductive stages [52]. The pps-1 mutant is able to both prolong the juvenile phase and to promote early flowering. In our research, relative gene expression of the COP1 gene declined from YSS to FBS and then increased (Fig. 9b). This result indicates that elevated COP1 gene expression at YSS promotes the change from YSS to ASS, while lowered expression at FBS promotes the transition from vegetative to reproductive stages. COP1 may interact with other ubiquitin-mediated proteolysis genes such as CUL1 to regulate growth and development.
A recent study has demonstrated that ubiquitin-specific proteases UBP12 and UBP13 are involved in circadian clock and photoperiodic floral regulation in Arabidopsis [53]. Double mutants of ubp12 and ubp13 display early flowering and short periodicity of circadian rhythms. In our study, the relative expression trend of UBC12 was consistent with that of COP1.
In general, ubiquitin-mediated proteolysis pathway (PATH: ko04120)-related genes displayed much higher expression levels at YSS than at other stages, with the lowest expression observed at FBS. This trend suggests that elevated expression of these ubiquitinmediated proteolysis pathway-related genes at YSS promotes the YSS to ASS change, and that lowered expression at FBS triggers the transition from vegetative to reproductive stages.
Many plant physiological processes are regulated by photoperiod signals [54]. Various gene expression changes reportedly  induce formation of flower buds from apical buds through photoperiod induction. GLP1, a leaf protein-encoding gene [55], shows circadian oscillations in short-day plant Pharbitis nil [56] and long-day plant Sinapis alba [57], with the gene detected specifically in cotyledons and leaves. In maize, transcripts of ZmGLP1 were found to be abundant in young leaves, but less frequent in other tissues and organs, such as mature leaves, young tassels, immature kernels, and stalks [58]. In our study, the GLP1 gene was expressed at high levels during YSS and ASS; it was expressed at low levels during FBS and FS, however, being barely detectable at FS.
Relative expression of AP1 and AGL8 genes, which are directly associated with flowering, was high at FBS and FS, consistent with other research reports [59,60]. The combined results of our qRT-PCR analysis confirm the reliability of the transcriptome sequencing results.
Temperature stress can delay or advance flowering. For example, growth temperatures above a finely tuned threshold can rapidly trigger flowering. HSP70 plays an important regulatory role in the transition from vegetative to reproductive growth for photoperiodic control of flowering [61]. In our study, the relative expression of HSP70 was reduced during Am. tricolor growth and flowering.

Feasibility of Illumina paired-end sequencing and assembly for non-model species with unsequenced genomes such as Amaranthus
Understanding the dynamics of plant transcriptomes is helpful for studying the complexity of transcriptional regulation and its impact on phenotype [19]. In recent years, next-generation sequencing technologies have enabled less costly and faster generation of large-scale sequence data compared with conventional Sanger sequencing. High-throughput mRNA sequencing technology is especially suitable for gene expression profiling in non-model organisms that lack genomic sequence data [62]. Sequencing lengths reach 26100 bp with an Illumina HiSeq 2000 paired-end 100-bp (PE 100) system, which is suitable for analysis of 200-5,000-bp lengths. This system is the most efficient and accurate approach to demarcate the boundaries of transcription units of genes and complements other methods for transcriptome studies [63]. We obtained transcriptomic data from the developmental stages YSS, ASS, FBS, and FS of Am. tricolor in the most systematic, thorough manner possible using the Illumina HiSeq 2000 PE 100-bp sequencing platform. At least 6 Gbp of high-  [70], but they were longer than those generated from I. batatas (581 bp) [62], Phyllostachys heterocycla (612 bp) [65], S. indicum (629 bp) [66], C. sinensis [68], Fagopyrum (341 bp; 454) [71], and maize (218 bp; 454) [72]. More importantly, 26,637 (26.86%) of the assembled unigenes were longer than 1,000 bp. These results demonstrate that the sequencing data was of sufficient quantity and quality to ensure the accuracy of the sequence assembly. They also demonstrate that Illumina HiSeq 2000 PE100 sequencing technology can be an effective tool for gene discovery in nonmodel organisms [62]. Among the 99,312 generated unigenes, 43,088 (about 43.39%) were successfully annotated, suggesting their relatively conserved functions. Most unigenes were of unknown function, however, suggesting that in vitro flowering in Am. tricolor is more complex than in Arabidopsis. This relative sparsity of identifiable genes may also be due to the lack of a complete genomic or transcriptomic sequence set for Am. tricolor to use as a reference, increasing the difficulty of unigene annotation in this study. Our data provide important new insights into flowering in Am. tricolor and should facilitate further studies of Am. tricolor genes and their functions.

Regulatory role of photoperiod during Am. tricolor in vitro plantlet growth and flowering
The developmental transition from nutritional growth to reproductive growth phrases is controlled by various floral integrators [55,56,73,74]. The photoperiod-dependent flowering pathway is the main flowering pathway, with interactions between light and an internal timekeeping mechanism-the circadian clock-regulating photoperiodic flowering [44,45]. Because they cannot respond to day length, plant photoperiod mutants are unable to flower normally in nature [75,76]. Long days can accelerate floral transition in Arabidopsis. In the model plant Arabidopsis, circadian clock-related genes have been found to regulate the floral transition. Constitutive overexpression of CCA1 delays flowering under short-day conditions [46,47], whereas elf3-1, cca1 and lhy mutations accelerate flowering [44,48]. In addition, XTH9 is expressed at different development stages of Arabidopsis, including differentiation of vegetative and reproductive meristems and cell elongation of inflorescence stems [51]. In our study, 196 unigenes involved in the plant circadian rhythm (PATH:ko04712) pathway were obtained from the transcriptome data, including genes such as XTH9, XTH33, ELF3, LHY, PHYA, PHYB, PIF3, CRY1, and CRY2. The results of our analysis indicate that photoperiod plays an important regulatory role during Am. tricolor in vitro plantlet growth and flowering. The qRT-PCR expression analysis confirmed the participation of these related genes in this process.
In general, ubiquitin-mediated proteolysis pathway (PATH:ko04120)-related genes showed much higher gene expressions at YSS than other stages, with the lowest expression levels observed at FBS. This result suggests that elevated expression of these genes at YSS promotes the YSS to ASS change, while reduced expression at FBS triggers the transition from vegetative to reproductive stages.
Importance of the ubiquitin-mediated proteolysis pathway during Am. tricolor in vitro plantlet growth and flowering Plant flowering is regulated by multiple external and internal signals, including photoperiod, temperature, hormone, and agerelated signals. These signals ultimately converge upon the floral pathway integrators, a group of genes that are turned on or off to determine flowering time [77]. Numerous studies suggest that ubiquitin-mediated proteolysis plays an important role in flowering time regulation in response to photoperiod. Based on studies examining ubiquitination-mediated protein control of photoperiodic flowering [50,76], the photomorphogenic repressors COP1 and DET1 are involved in light-controlled gene expression and developmental programs, while ubiquitination controls photoperiodic flowering [78]. In rice (Oryza sativa), the COP1 ortholog PPS has been confirmed to promote the change from juvenile to adult phases and to suppress the transition from vegetative to reproductive stages [52], while the pps-1 mutant is able to prolong the juvenile phase and to promote early flowering. In our study, changes in relative expression of COP1 imply that high expression at YSS promotes the change from YSS to ASS and that reduced expression at FBS stimulates the transition from vegetative to reproductive stages. A recent study has demonstrated that ubiquitin-specific proteases UBP12 and UBP13 interact to regulate circadian clock and photoperiodic flowering in Arabidopsis [53]. In our study, 259 unigenes involved in the ubiquitin-mediated proteolysis pathway (PATH:ko04120) were obtained from the transcriptome data, suggesting the importance of this pathway during Am. tricolor in vitro flowering. According to the qRT-PCR analysis, these related genes were expressed more strongly at YSS than at other stages, with the lowest expression observed during FBS. This result indicates that high expression of ubiquitinmediated proteolysis pathway (PATH:ko04120)-related genes at  Role of GLP1 during Am. tricolor in vitro plantlet growth and flowering Many plant physiological processes are regulated by photoperiod signals [54]. Various gene expression changes reportedly induce formation of flower buds from apical buds through photoperiod induction. GLP1, a leaf protein-encoding gene [55], shows circadian oscillations in short-day plant Pharbitis nil [56] and long-day plant Sinapis alba [57], with the gene detected specifically in cotyledons and leaves. In maize, transcripts of ZmGLP1 were found to be abundant in young leaves, but less frequent in other tissues and organs, such as mature leaves, young tassels, immature kernels, and stalks [58]. In our study, the GLP1 gene was expressed at high levels during YSS and ASS; it was expressed at low levels during FBS and FS, however, being barely detectable at FS. This result suggests that GLP1 plays an important regulatory role during Am. tricolor in vitro plantlet growth and flowering.
Role of HSP70 during Am. tricolor in vitro plantlet growth and flowering Temperature stress can delay or advance flowering. For example, growth temperatures above a finely tuned threshold can rapidly trigger flowering. Hsp70 plays an important regulatory role in the transition from vegetative to reproductive growth for photoperiodic control of flowering. In transgenic Arabidopsis plants, overexpression of Heat Shock-induced Gene 1 (HSG1), a grape Bcl-2associated athanogene (BAG), promotes floral transition by activating CO in the photoperiod pathway [79]. In contrast, other floweringrelated proteins regulating gibberellin, autonomous, and vernalization pathways are not altered by HSG1 overexpression. The BAG protein has a conserved BAG domain that interacts with the ATPase domain of Hsp70/HSC70, inducing early floral transition [80]. HSG1 proteins promote meristem transition from vegetative to reproductive growth, resulting in early flowering. In contrast to previous studies, HSP70 expression was reduced in our transcriptome sequencing-and qRT-PCR-based study. We propose two possible reasons for the conflicting results. First, the plantlets were cultured in different environments. Our plantlets were grown at a constant temperature (2561uC); in contrast, cultivation in previous studies involved 12-day-old Arabidopsis plants grown at 12uC and then shifted to 27uC [61] or Vitis vinifera samples heattreated at 45uC for 60 min [81]. The thermal stimulation in those studies may have induced the significant gene expression level increase. Second, we detected an HSP70 protein encoded by a different HSP70 family member than those of previous studies. Further research on the role of HSP70 during Am. tricolor in vitro plantlet growth and flowering is consequently needed.

Conclusions
In this study, we generated the first large-scale dataset of the Am.

Plant materials for transcriptome sequencing
In vitro plantlets of Am. tricolor from four developmental stagesyoung seedling stage (YSS), adult seedling stage (ASS), flower bud stage (FBS), and flowering stage (FS)-were cultured using previously published methods [13] (Figure 10). The young seedling stage is represented by 7-day-old plantlets germinated from in vitro  Total RNA extraction and quality detection Total RNAs were extracted from each sample using Trizol reagent (Invitrogen, USA) and treated with DNase I to remove genomic DNA. Extracted RNAs were quantified using a NanoDrop 2000 spectrophotometer (Thermo, USA) and checked for integrity on an Agilent 2100 Bioanalyzer (Agilent Technologies) by denaturing agarose gel electrophoresis with ethidium bromide staining. Only RNA samples with A 260 /A 280 ratios between 1.9 and 2.1, RNA 28S/18S ratios higher than 1.0, and RNA integrity numbers $8.5 were used in subsequent analyses; at least 15 mg of total RNA ($300 ng ml 21 ) was used per analysis.

cDNA library construction, quality detection, and Illumina sequencing
The cDNA library of each sample was constructed according to the instructions given in ''Preparing Samples for Sequencing of mRNA Test Kits'' (Illumina). Products were checked to verify the cDNA library quality and fragment length using an Agilent 2100 DNA 1000 Kit. Sequencing of the resulting cDNA library was carried out on an Illumina HiSeq 2000 paired-end 100-bp (PE 100) system.

Data filtering and de novo assembly
Following sequencing of each cDNA library, the raw sequencing data were transformed by base calling into sequence data termed raw data or raw reads. Adaptor fragments, duplicated sequences, low-quality reads with ambiguous bases (''N''), and reads with more than 10% of Q-values ,20 bases were removed from the raw reads to yield the clean reads required for analysis. De novo transcriptome assembly of these short reads was performed using the Trinity assembly program, which first combined reads with a certain length of overlap to form longer fragments, called contigs, having no ambiguous bases. The reads were mapped back to the contigs, and paired-end reads were used to detect contigs arising from the same transcript as well as the distances between these contigs. Next, Trinity connected the contigs, using 'N' to represent unknown sequences, to yield scaffolds. Paired-end reads were then reused to fill in the gaps between the scaffolds, yielding sequences that had the fewest Ns and that could not be extended on either end. These resulting sequences were designated as unigenes. The raw data were deposited in the NCBI SRA under accession numbers SRR924089, SRR924090, SRR924091, and SRR924092.

Gene annotation, classification, and metabolic pathway analysis
ORFs of unigene sequences greater than 200-bp long were identified using getorf software (http://emboss.bioinformatics.nl/ cgi-bin/emboss/getorf). Using BLASTX, the unigene sequences were then searched (E-value ,1610 25 ) against Nr, Swiss-Prot, KEGG, and COG databases. The best results from the alignment were used to predict unigene coding regions and direction (i.e., 5' to 3'). When results from different databases conflicted, annotations were applied according to the following order: Nr . SwissProt . KEGG . COG. Unigene annotations were used to provide information on expression patterns and functions of the identified genes.
The gene2go (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/) program was used in conjunction with the Nr annotations to obtain unigene GO annotations, with Excel 2010 (Microsoft Corporation) then used for all GO functional classifications.
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes, and various genome-wide evolutionary analyses [70]. To evaluate the completeness of our transcriptome library and the effectiveness of our annotation process, the unigene sequences were searched for the possible functions involved in COG classifications.
A metabolic pathway analysis was performed using the KEGG database and related software applications (http://www.genome. jp/kegg/kegg4.html), which form a bioinformatics resource for linking genomes to living organisms. KEGG pathway maps, BRITE functional hierarchies, and KEGG modules are represented in a generic way to be applicable to all organisms. The KEGG Orthology (KO) system is the basis for this representation. Once unigenes are assigned KO identifiers, or K numbers, organism-specific pathway maps and BRITE functional hierar-chies are automatically generated. Within the KEGG databases, the PATHWAY database contains information on networks of molecular interactions within cells, as well as variations specific to particular organisms. Consequently, the acquired Am. tricolor unigenes based on the KEGG database were further extended by assigning them to metabolic pathways using BLASTX.

Differential gene expression analysis
After RPKM-normalization of unigene expression levels [20,41], analysis of differentially expressed genes was carried out between FBS and YSS, FBS and ASS, and FBS and FS using DESeq software [43], with a p-value cut-off of ,0.05 applied after adjustment for multiple testing based on the Benjamini and Hochberg false discovery rate method.

qRT-PCR analysis
qRT-PCR analysis based on gene-specific primers that were designed with DNAMAN 6.0 was used to verify candidate unigenes having potential roles during Am. tricolor in vitro plantlet growth and flowering. RNA samples used for the qRT-PCR analysis were the same as those used for the cDNA library construction. Purified RNA was reverse-transcribed to cDNA using a SYBR PrimeScript RT-PCR kit (Perfect Real Time; Takara, China). qRT-PCR reactions were performed in triplicate with three biological replicates per sample according to the SYBR Green I Master Mix protocol (Takara) on a LightCycler 480 qPCR instrument (Roche, Switzerland). NormFinder (http:// moma.dk/normfinder-software) was used to analyze the stability of candidate unigene expression. The HERC gene was used as an internal reference gene for calculation of relative expressions of the candidate unigenes.
Gene names, primer sequences, product sizes, PCR efficiencies, and annealing temperatures are given in File S2.

Supporting Information
File S1 KEGG classifications of genes expressed during Am. tricolor in vitro plantlet growth and flowering.

(XLS).
File S2 qRT-PCR primers used to analyze candidate unigenes expressed during Am. tricolor in vitro plantlet growth and flowering.