Transcriptome Analysis of Gene Expression during Chinese Water Chestnut Storage Organ Formation

The product organ (storage organ; corm) of the Chinese water chestnut has become a very popular food in Asian countries because of its unique nutritional value. Corm formation is a complex biological process, and extensive whole genome analysis of transcripts during corm development has not been carried out. In this study, four corm libraries at different developmental stages were constructed, and gene expression was identified using a high-throughput tag sequencing technique. Approximately 4.9 million tags were sequenced, and 4,371,386, 4,372,602, 4,782,494, and 5,276,540 clean tags, including 119,676, 110,701, 100,089, and 101,239 distinct tags, respectively, were obtained after removal of low-quality tags from each library. More than 39% of the distinct tags were unambiguous and could be mapped to reference genes, while 40% were unambiguous tag-mapped genes. After mapping their functions in existing databases, a total of 11,592, 10,949, 10,585, and 7,111 genes were annotated from the B1, B2, B3, and B4 libraries, respectively. Analysis of the differentially expressed genes (DEGs) in B1/B2, B2/B3, and B3/B4 libraries showed that most of the DEGs at the B1/B2 stages were involved in carbohydrate and hormone metabolism, while the majority of DEGs were involved in energy metabolism and carbohydrate metabolism at the B2/B3 and B3/B4 stages. All of the upregulated transcription factors and 9 important genes related to product organ formation in the above four stages were also identified. The expression changes of nine of the identified DEGs were validated using a quantitative PCR approach. This study provides a comprehensive understanding of gene expression during corm formation in the Chinese water chestnut.


Introduction
The Chinese water chestnut belongs to the sedge family, and is widely cultivated throughout the world, including Asia, Australia, Africa, and the Pacific and Indian oceans. It has many uses, such as water purification and embankment protection, and its corms are also edible [1,2,3,4]. Indeed, the water chestnut is a popular aquatic vegetable because of its unique taste and the fact that it is fat-, gluten-, and cholesterol-free. It is also a critical ingredient in traditional Chinese medicine [5].
Similar to other aquatic species, this plant must be maintained in pools, water gardens, tanks, or shallow water tubs in greenhouses during the entire growth season from spring to autumn. It is often asexually propagated by directly planting a corm with a healthy apical bud into wet soil. The corm not only provides nutrition but also serves as an energy unit during the early growth of the plant. In its early developmental stage, buds first form at the end of the main stem, and then develop into stolons. The corms form at the tips of each stolon [6].
The genetic and morphometric processes involved in the formation of storage organs have been extensively studied over past decades [7,8]. Chinese water chestnuts and arrowheads share a similar developmental process in terms of underground stem morphology [9], and corm formation in both species includes an induction stage, an initial swelling stage, and a swelling stage. In the induction stage, rapid radial growth of the stolon tips occurs. In the initial stage, elongation of the stolon is terminated and the tip of the stolon begins to swell. In the swelling stage, the volume of the corm increases and nutrients such as polysaccharides, fats, and proteins are synthesized and deposited [10].
Underground formation of the storage organ can be affected by various internal and environmental factors [11] such as the photoperiod. Storage organ formation of the arrowhead is clearly regulated by the photoperiod, because short days accelerate the process of storage organ formation, whereas long days inhibit it [12]. Perceiving and responding to the photoperiod is the first step for the plant to adjust its growth and developmental processes. The leaf is the main organ that perceives the photoperiod, and the signal is then transferred to the shoot apex through the phloem. The storage organ forms at the tip of the stolon during short days. Phytochrome B (PHYB) is thought to be responsible for photoperiodic responses, and participates in storage organ development during short days. Downregulated PHYB expression leads to the promotion of storage organ formation during short or long days after elimination of the inhibition on tuberization originating from the effect of long days [13]. Flowering locus T (FT), CONSTANS (CO), and GIGANTEA are also involved in photoperiodic regulation during storage organ formation, and the CO/FT module has been shown to be required for the photoperiodic control of tuberization [14,15,16].
The photoperiod shows a relationship with gibberellic acid (GA) to control storage organ formation. Transgenic plants carrying the GA oxidase gene promote underground formation of the storage organ only under conditions of a short-day photoperiod, while earlier formation of storage organs occurs when this enzyme activity is inhibited [17]. During short days, a decrease in the gibberellin content by transgenic StBEL5 and KNOX leads to earlier formation of storage organs [18]. As well as GA, some phytohormones such as cytokinin, jasmonic acid (JA), abscisic acid, indole acetic acid, and ethylene have been identified as being involved in the formation of product organs [19,20,21,22]. For instance, axillary buds decapitated from peas and potatoes were inhibited by the exogenous application of auxin [23]. Ethylene is involved in many plant developmental processes [24,25], including the induction of storage organ formation and root bulking in carrots [26]. JA has also been shown to play an important role in plant biological processes [27]. Several studies have shown that JA has a positive effect on the swelling of underground storage organs. Endogenous JA is believed to be a strong inducer of storage organ development in dicotyledonous potato plants and monocotyledonous yam plants [28]. Moreover,  showed that exogenous JA functions in the formation of garlic bulbs [29].
Despite its importance, particularly in south China, few studies have been carried out to investigate underground corm growth of the Chinese water chestnut. Formation of the storage organ is genetically regulated, but the expression of genes involved in corm formation has not been studied in detail, although work in other species has partially described the above processes, and many storage organ-related genes have been documented [30]. An understanding of the genes expressed during corm formation is necessary to provide insights into the mechanism of corm development, which may lead to better regulation of production in the future.
DNA sequencing is an efficient approach to gain important information on genes, genetic variation, and gene function for biological processes [31,32]. Recently, many critical genes regulating growth and metabolism have been identified using a tag sequencing technique in horticultural plants [33,34]. In the present study, gene expression analysis at the transcriptional level was performed to monitor changes in biological processes during the underground formation of the storage organ [10]. Differentially expressed genes (DEGs) from four developmental stages of the Chinese water chestnut were sequenced and analyzed, with the aim of comprehensively understanding the process of corm formation.

Plant Materials
'Hongbaoshi' , a species of Chinese water chestnut that is widely cultivated in China, was planted in fields at a water depth of 5-10 cm in spring, with average temperatures of 30°C in the day and 20°C at night during the entire growth season. Four or more stolons were observed to develop and elongate in the correct order on each plant. About 90-100 days after plantation, corm formation started at the stolon tips. For tag sequencing and gene expression analysis, corms from the four developmental stages (stolon, initial swelling, middle swelling, and later swelling stage) (S1 Fig) were used (four tips from different plants were combined for each stage). To obtain materials from different developmental stages, water chestnuts were cultivated in a field (non-private) located in southeastern China. Permission for sample collection was granted from the Department of Horticulture of YangZhou University, China. No specific permissions were required for the location or the field studies because the experiments did not involve any endangered or protected species.

Screening Differentially Expressed Genes
Corm transcriptomes from the four developmental stages were analyzed. Stolon tips and corms from the initial swelling, middle swelling, and later swelling stages were collected and ground, and RNA was isolated using an RNA extraction mini kit (Qiagen, Hilden, Germany). DNase I was added to eliminate DNA contamination. Sequencing of transcripts in the form of special constructs was completed by the Beijing Institute of Genomics.
To screen the DEGs, transcriptomes from the four stages were analyzed with the aim of tracking major changes in metabolism. DEG libraries of the four samples were determined in parallel using Illumina gene expression sample preparation kits. Briefly, total RNA of these four stages was used for mRNA capture with magnetic oligo(dT) beads. First and second strand cDNA were synthesized, and the bead-bound cDNA was subsequently digested with NlaIII.
3 0 -cDNA fragments attached to oligo(dT) beads were ligated to Illumina GEX NlaIII adapter 1, which contained a recognition site for the endonuclease MmeI to enable cleavage 17 bp downstream of the recognition site (CATG) to produce tags with adapter 1. After removing the 3 0 fragment with magnetic bead precipitation, an Illumina GEX adapter 2 was introduced at the MmeI cleavage site. The resulting adapter-ligated cDNA tags were amplified using PCR primers that were annealed to the adaptor ends for 15 cycles.
The 90 bp fragments were sequenced separately at the 3 0 and 5 0 ends of the gene (PE90), and the PCR product was recovered and purified from a 6% polyacrylamide Tris-Borate-EDTA gel. The final quality of the tagged sequences was checked using an Agilent 2100 Bioanalyzer. The four tag libraries constructed underwent Illumina proprietary sequencing for cluster generation through in situ amplification, and were deep sequenced using an Illumina Genome Analyzer. For the raw data, we filtered out adaptor sequences, low-quality tags (tags with unknown nucleotides N), empty reads, tags that were too short or too long, and tags with only one copy, to obtain clean tags. The types of clean tags were represented as distinct clean tags. Subsequently, we classified the clean tags and distinct clean tags according to their library copy number, determined their percentage in the total clean and distinct tags, and analyzed the saturation of the four libraries.
Transcriptome de novo assembly was carried out with a short read assembling program, Trinity [35]. This is comprised of three independent software modules including inchworm, chrysalis, and butterfly. RNA-seq data were assembled into unique sequences of transcripts for a dominant isoform, then these sequences were partitioned into many individual De Bruijn graphs (each graph represented the transcriptional complexity at a given gene or locus). Unigenes were obtained after each graph was independently processed to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. For annotation, all tags were mapped to the reference sequence from the NCBI database (http://www.ncbi.nlm. nih.gov/), and no more than 1 bp of nucleotide mismatch was allowed. The alignment procedures were conducted by following the protocols described in the online documentation (http://maq.sourceforge.net) and adopting default parameters. To monitor mapping events on both strands, sense and complementary antisense sequences were included in the mapping process. The tags mapped to reference sequences from multiple genes were filtered. The data sets supporting the results of this article are available in the NCBI database, Submission ID: SUB1471108 and BioProject ID: PRJNA318689, which is available at: http://www.ncbi.nlm. nih.gov/bioproject/318689.

Identification of Differentially Expressed Genes
The transcriptome of the Chinese water chestnut from the four developmental stages was used as a reference for DEG screening and analysis because of the unavailability of existing data. All expressed genes were monitored, and their gene functions were explored using database annotations such as nr, Swiss-Prot, KEGG, and COG, using the criteria of BLASTx alignment (E value < 0.00001) between unigenes and protein databases such as Swiss-Prot, KEGG, and COG. The best aligned results were used to decide the sequence direction and functions of the unigenes.
In case of any conflict between the results from different databases, a priority order of nr, Swiss-Prot, KEGG, and COG was followed when deciding the sequence direction of unigenes. When a unigene aligned with none of the above databases, ESTscan was used to predict its coding region as well as to decide its sequence direction. All of the expressed unigenes were classified according to their functions in metabolic processes. For DEG screening, a false discovery rate (FDR) 0.001 and the absolute value of the log 2 ratio ! 1 were used as thresholds to judge the significance of the difference in expression of the unigenes. All DEGs were deposited into the NCBI database under GenBank accessions KX365752 to KX365849.

Gene Expression Analysis by Quantitative PCR
To study gene expression, the cultivation conditions of the Chinese water chestnut were maintained as described above. Quantitative (q)PCR analysis was performed to quantify the transcriptional level of nine novel genes of the Chinese water chestnut at the stolon stage, initial stage, middle swelling stage, and later swelling stage, to evaluate the results of tag sequencing.
Total RNA was extracted from 100 mg samples of stolon tips and corms at the initial swelling, middle swelling, and later swelling stages using an RNA extraction mini kit (Qiagen). Approximately 10 samples at each stage were the pooled. DNase I was used to digest DNA during the RNA extraction process to eliminate DNA contamination. A total of 1-2 μg of RNA was used for cDNA synthesis according to the manufacturer's instructions (Promega, Madison, WI). qPCR was performed with the Mx3000P machine (American, agilent). SYBR Green Master Mix was used to identify mRNA levels according to the manufacturer's instructions (Tiangen, Beijing, China). According to the sequencing results, primers were designed for genes that showed enhanced transcription during corm formation (S1 Table). β-Actin was used as an internal standard and amplified using the following primers: forward 5 0 -AACCTCCTCCT CATCGTACT-3 0 , and reverse 5 0 -GACAGCATCAGCCATGTTCA-3 0 . Amplification was performed in a 25 μL reaction mixture, containing 12.5 μL SYBR Premix Ex Taq II (Tli RNaseH Plus) (2×), 10 μM of each of the forward and reverse primer, 2 μL RT reaction solution (cDNA), and 8.5 μL dH 2 O. The PCR program consisted of 94°C for 30 s, then 40 cycles of 95°C for 5 s and 60°C for 60 s. qPCR reactions were performed in triplicate.

Transcriptome Profile during Corm Formation
Four libraries were constructed from the four stages of corm development (the stolon stage, the initial swelling stage, the middle swelling stage, and the swelling stage) using the Illumina sequencing platform, with the aim of identifying genes relevant to corm development in the Chinese water chestnut. The transcriptomes of the four developmental stages of the corm were preprocessed before mapping the tags of each stage. About five million tags were obtained in the B1, B2, B3, and B4 libraries, with 283,469, 245,690, 209,624, and 251,126 distinct tags, respectively. Additionally, 4,371,386, 4,372,602, 4,782,494, and 5,276,540 clean tags were obtained, including 119,676, 110,701, 100,089, and 101,239 distinct clean tags in the B1, B2, B3, and B4 libraries, respectively, after filtering all raw tags with reference sequences. A detailed analysis of the tags in each library is shown in Table 1. We found that the copy distribution of the total and distinct clean tags in the four libraries was similar, with about 5% of distinct clean tags at higher than 100 copies, and about 30% of tags at 5-50 copies. The number of distinct clean tags at between two and five copies (about 60%) was higher than that of other libraries (Fig 1). To evaluate whether the sequenced tags were sufficient to cover the whole transcriptome, analysis of sequencing saturation was also carried out for the four libraries of the different developmental stages. We observed that the number of detected genes increased until the sequencing tags reached approximately four million (S2 Fig), suggesting that the tags identified in each library were sufficient to reflect the entire transcriptional information of the Chinese water chestnut genome.
In total, 38,207 unigenes with an average length of 788 bp were obtained following assembly of these clean tags. Gene functions were identified using BLASTx software by comparison with the existing NCBI database, using a cut-off E value of 10-5. Of these, about 40% of all distinct sequences in the four libraries showed an above cut-off BLAST result, and about 60% did not match with known genes. All matched genes were grouped into 26 catalogues according to their annotated functions. We found that the function of catalogue with the largest number of genes was predicted, and the group with the smallest number of genes was relevant to extracellular structure (S3 Fig). The DEGs were also classified into three categories according to their function: biological process, cellular component, and molecular function. We found that the DEGs (from stolon stage to late swelling stage) were involved in various metabolisms. The number of genes in the biological process category related to cellular process and metabolism process was greater than for other processes. It was also shown that most genes in the cellular component were involved in cell, cell junction, organelle, organelle part, and membrane functions. For molecular function, many genes participated in binding, catalytic activity, and transport activity (S4 Fig).

Differentially Expressed Genes in the Four Stages of Corm Formation
Changes in gene expression during corm formation. To monitor the changes in gene expression, four libraries at different developmental stages of the corm were constructed. A total of 11,592, 10,949, 10,585, and 7,111 transcripts were identified from the B1, B2, B3, and B4 libraries, respectively. Of these, 9,567, 8,921, and 6,445 genes were similarly expressed in the B1/B2, B2/B3, and B3/B4 stages, respectively (Fig 2 and S2 Table). Within these four libraries, we also found a large number of genes that changed expression during corm development, suggesting that the storage organ formation process is highly complex at the molecular level. The number of genes per million clean tags was used to evaluate the abundance of transcripts in the four libraries.
In the B1/B2 stages, 1,027 genes changed expression; of these, the expression of 315 genes was enhanced and that of 712 genes was decreased. In the B2/B3 stages, 182 genes were identified as upregulated and 1,014 were downregulated. In the B3/B4 stages, we observed enhanced expression of 106 genes and decreased expression of 3,213 genes. In summary, the number of transcripts in the B1/B2 libraries showed a smaller change compared with the B1/B4 libraries, indicating that the development of the corm from the stolon stage to the initial swelling stage differs from that from the initial swelling stage to the swelling stage (Table 2).
Differentially expressed genes at each stage. During corm development of the Chinese water chestnut, we found that a large number of genes altered their expression at the B1/B2, B2/B3, and B3/B4 stages. Therefore, 20 DEGs showing high levels of change were selected at the B1/B2, B2/B3, and B3/B4 stages to monitor the metabolic process during corm formation. We observed that two types of hormone, indole acetic acid (indole-3-acetic acid-amido synthetase) and gibberellin (gibberellin 20-oxidase), were involved in the process from the stolon stage to the initial formation stage. A cation/calcium exchanger showed enhanced expression during this period, suggesting that Ca 2+ has a close relationship with corm formation. We also found increased expression of malic enzyme participating in energy metabolism. In the B2/B3 libraries, two genes were involved in storage metabolism: the glutathione s-transferase gene and the soluble starch synthase gene. We also found altered expression of extensin protein, which is related to corm swelling. Some genes such as calmodulin-like protein and calciumdependent protein kinase involved in Ca 2+ signal regulation demonstrated increased transcriptional levels from the initial swelling stage to the middle swelling stage. In the B3/B4 libraries, we found that four genes showed enhanced expression, glutathione s-transferase, starch branching enzyme, aldose reductase-related protein, and proline-rich receptor-like protein kinase, which are involved in substance metabolism. Two genes, ATPase and NADP-dependent malic enzyme, relevant to energy metabolism also showed increased transcriptional levels ( Table 3).
We also identified the expression of all transcription factors during corm formation. We found that 15, 16, and eight transcription factors were upregulated in the B1/B2, B2/B3, and B3/B4 stages, respectively (Table 4). In the B1/B2 stages, two of the transcription factors were ethylene-responsive factors (ethylene-responsive transcription factor 9 and AP2 transcription factor), which indicated that ethylene plays an important role from the stolon stage to the initial swelling stage. The expression of some important transcription factors, such as bZIP transcription factor ABI5, WRKY, MYB, and MYC was also found to increase during this period. In the B2/B3 stages, aside from ethylene-responsive factors (ethylene-responsive transcription factor CRF4, AP2 domain class transcription factor, and ethylene-responsive transcription factor 1A), we observed that the expression of CaM-binding transcription factors was also increased. The extensin gene, related to corm swelling, increased its transcriptional level during the initial swelling stage to the middle swelling stage. In the B3/B4 stages, three transcription factors (ERF1 transcription factor, ethylene-responsive transcription factor 5-like, and ethylene-responsive transcription factor 4), involved in ethylene signal transduction, were increased.  Table 2. DEGs across all libraries. All genes mapped to the reference sequence and genome sequences were examined for their expression differences across different libraries. The numbers of differentially expressed genes represent transcripts, using threshold values of FDR 0.001 and log2 ratio ! 1 for controlling false discovery rates. B1, B2, B3, and B4 represent the samples that were collected at the stolon stage, the initial swelling stage, the middle swelling stage, and the later swelling stage of the corm, respectively.  Additionally, bZIP transcription factor ABI5, transcription factor HBP-1a(c14), and heat shock transcription factor A2 showed enhanced expression during the middle swelling stage to the swelling stage. These gene expression data show that the ethylene signal transduction pathway plays a critical role throughout the entire period of corm formation. Expression profiling of genes related to corm formation. RNA sequencing data were compared with previous reports to identify whether our expression model had coverage of well-defined genes. Expression profiling showed that 10 genes were related to corm formation. Detailed expression data of these genes and the biological processes involved in plant metabolism are listed in Table 5. We found that five genes, GIGANTEA, FRUITFUL-like protein, Cycling Dof Factor, BEL1-like HD transcription factor, and Lipoxygenase, showed increased expression during corm swelling, while five were downregulated: zinc finger CONSTANS-like protein, MADS-box transcription factor, SFT family, sucrose synthase, and phytochrome B.

B1
qPCR gene expression analysis. To further monitor gene expression profiling, nine genes related to corm formation, including zinc finger protein CONSTANS-LIKE, GIGANTEA-like protein, MADS-box transcription factor, SFT2 transport protein, dof zinc finger protein DOF1.7, sucrose synthase, lipoxygenase, FRUITFUL-like protein, and BEL1-like HD transcription factor, underwent qPCR analysis of transcriptional levels. A similar gene expression pattern was observed for seven of the genes by qPCR and tag sequencing methods during corm formation, suggesting that the results of the two techniques correlate with each other. Only one gene showed a difference in expression by the qPCR technique compared with the tag sequencing analysis (Fig 3).

Discussion
The high-throughput tag sequencing technique is an effective approach for monitoring gene expression of the whole plant genome, and many metabolic mechanisms have been identified using this method [34,43]. It was previously demonstrated that storage organ formation is regulated by both genetic and environmental factors [15]. In this study, approximately five million tags were identified at the stolon stage, initial stage, middle swelling stage, and later swelling  (Tables 1 and 2). COG and GO function classification were applied to analyze these expressed genes, and detailed information is shown in S3 and S4 Figs. Some genes relevant to corm formation were found, and some differences in gene expression between tag sequencing analysis and qPCR were identified. This could be explained by differences in sample collection conditions between the  two methods. Alternatively, differences in the corm development stage may have caused differences in expression profiling. Samples were collected according to corm size and classified into the same developmental stage, although differences in growth periods could cause some corms to be larger or smaller than average. Additionally, RNA-seq focuses on the whole transcriptome, so disturbances of bias during sequencing are difficult to avoid, and gene expression changes are determined according to the number of reads. However, qPCR concentrates on the expression of a specific gene so is more reliable at reflecting gene transcription during plant metabolism.

Survival in an Anaerobic Environment
The entire growth season of the Chinese water chestnut must be maintained in shallow water, so the plant would be expected to develop molecular mechanisms to adapt to anaerobic conditions. We observed that many genes showed an altered expression profile during corm formation (Fig 2), of which some were related to anaerobic adaptation such as genes encoding alcohol dehydrogenase (B4394), NADH dehydrogenase (B2836), superoxide dismutase (B13999), CDPK (B806), MYB transcription factor (B8501), ethylene responsive transcription factor (B490), and ATPase (B227); these showed increased expression during corm formation (Table 3 and S2 Table). These genes also showed enhanced expression in the lotus root and arrowhead during storage organ formation [10,44]. Many reports show that plant products and quality are often destroyed by anaerobic environments [45,46,47,48], which can lead to the starvation of people who depend on these crops. Compared with these xerophytes, aquatic plants such as the lotus root, arrowhead, water celery, and Chinese water chestnut show increased adaptations to anaerobic conditions [44,49]. The expression of some key genes, including ethylene responsive factor, alcohol dehydrogenase, SOD, and HSP, is an essential strategy for plants to survive during storage organ formation [50,51,52,53,54]; therefore, increased transcriptional levels of alcohol dehydrogenase are necessary for aquatic plants to maintain normal growth under anaerobic conditions.

Hormonal Signal Transduction during Corm Formation
Formation of the storage organ is affected by genetic and environmental factors [11], and hormonal signal transduction is necessary for its development. We found that several ethyleneresponsive factors (B0/B1: ethylene-responsive transcription factor 9; B2/B3: ethylene-responsive transcription factor 1A, and ethylene-responsive transcription factor CRF4; B3/B4: ethylene-responsive transcription factor 4, ethylene-responsive transcription factor 5-like, and ERF1 transcription factor) showed enhanced expression during corm formation (Tables 3 and  4), suggesting that the ethylene signal transduction pathway is necessary for corm development in the Chinese water chestnut.
Ethylene is involved in various biological processes [55, 56], and has been suggested to be a promoter of plant growth and development or the anaerobic response [57]. Aside from biotic and abiotic stress, ethylene has many other functions, such as cell division, plant height, and the underground formation of storage organs [58]. Constitutive expression of an AP2 domain class transcription factor previously resulted in increased resistance to abiotic stress in the potato plant, and a transgenic plant showed decreased cell size, plant height, hypocotyl elongation, and fertility [59]. It has also been reported that transgenic plants expressing some AP2 domain class transcription factors have altered flowering times [60]. WRI1, a putative AP2 domain class transcription factor, was reported to regulate seed development in Arabidopsis, and transgenic plants with this gene have a higher content of seeds and triacylglycerols [61]. The storage content is altered in transgenic rice plants with an AP2 domain class transcription factor because of the control of expression of the waxy gene [62]. In this study, we found that several ethylene-responsive factors and AP2 domain class transcription factors showed enhanced expression during corm formation, and we believe that expression of AP2 domain class transcription factors is relevant to the development of the Chinese water chestnut storage organ.
There is increasing evidence for GA and JA in storage organ formation. Stolon elongation was previously found to be promoted and storage organ formation inhibited by a high GA content [63]. An increased GA content in transgenic plants with the GA oxidase gene requires a longer duration of short-day photoperiods for storage organ formation, while this formation is accelerated when enzyme activity is decreased [17]. In a dwarf mutant of Solanum tuberosum ssp. Andigena, storage organ formation occurred during both long and short days because the GA content was decreased. Interestingly, it has also been shown that formation of the storage organ can be affected by conditions of short days when GA biosynthesis is totally inhibited [19]. This evidence indicates that GA plays an important role in storage organ formation, which is directly regulated by the photoperiod. In the present study, expression of the GA 20-oxidase gene (B15737), which is involved in GA biosynthesis, was increased during B1/B2 stages (the stolon stage to the initial stage) ( Table 3), suggesting that a high level of GA probably benefited elongation of the stolon.
JA has also been shown to be involved in plant metabolism and the stress response [64,65]. For example, endogenous JA was identified as a necessary inducer of storage organs in potatoes and yams [66].  suggested that exogenous JA is involved in garlic bulb formation [67], while Zeas et al. (1997) suggested that JA not only induces the formation of storage organs, but also increases their number. The same phenomenon is also found in Pterostylis sanguinea, where the exogenous application of JA promotes tuber formation [68].
Some genes, including MYB, transcription factor bHLH, bZIP, AP2/ERF domain-containing transcription factor, and lipoxygenase, which have previously been demonstrated to promote organ formation [69,70,71,72,73], were increased during corm formation in the present study. We list some genes with a role in storage organ formation [74,75,76,77,78] in Table 5. Overall, the expression profiles of these genes show that corm formation is highly complex and regulated by multiple genes.

DEGs Involved in the Ca 2+ Signal Transduction Pathway during Corm Formation
We found that three proteins relevant to Ca 2+ signaling, including a calmodulin-like protein (B18064), CaM-binding transcription factor (B40), and a calcium-dependent protein kinase (B2484), showed enhanced expression during corm formation (Tables 2 and 5). Ca 2+ is a second messenger molecule that is involved in many biological processes [79]. Calmodulin perceives the signal when Ca 2+ and CaM are combined (Ca 2+ /CaM), and then modulates the cellular response [80]. Ca 2+ /CaM has also been shown to be closely associated with storage organ formation [81]. In transgenic potato plants, constitutive expression of CaM affects the underground shape and size of tubers [82]. CaM-binding proteins were also reported to be involved in the formation of storage organs [83,84]. These findings indicate that the Ca 2+ signal transduction pathway is necessary for the development of storage organs. Therefore, the increased expression of calmodulin-like protein, CaM-binding transcription factor, and calcium-dependent protein kinase genes in our present study further demonstrates that Ca 2+ signaling is critical for the formation of storage organs.

Conclusions
We analyzed gene expression during corm formation in the Chinese water chestnut using high-throughput tag sequencing. In total, 4,371,386, 4,372,602, 4,782,494, and 5,276,540 clean tags, including 119,676, 110,701, 100,089, and 101,239 distinct tags, were obtained at each stage, respectively (B1: the stolon stage, B2: the initial swelling stage, B3: the middle swelling stage, and B4: the late swelling stage). Of these, 11,592, 10,949, 10,585, and 7,111 genes were annotated from the B1, B2, B3, and B4 stage, respectively, after mapping their functions in existing databases. A number of DEGs were found in the B1/B2, B2/B3, and B3/B4 stages during corm formation, and 10 important storage organ formation genes were also identified. qPCR results were highly correlated with those of tag sequencing analysis.