RNA-Seq Analysis and De Novo Transcriptome Assembly of Jerusalem Artichoke (Helianthus tuberosus Linne)

Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Exsiting genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.


Introduction
The sunflower species Jerusalem artichoke (Helianthus tuberosus L.), in the family Asteraceae of the order Asterales, has been cultivated as a vegetable, a fodder crop, and a source of inulin for food and industrial purposes [1][2][3][4]. Jerusalem artichoke, which has been cultivated since the 17 th century, can grow well in nutritionally poor soil and has good resistance to frost and plant diseases [5,6]. In the early 1900s, systematic breeding programs began to explore the use of H. tuberosus tubers for industrial applications such as the production of ethanol [4]. Jerusalem artichoke is a hexaploid with 102 chromosomes (2n = 66= 102) [7] that is thought to have originated in the north-central U.S., although the exact origins remain a subject of debate [8,9]. Despite its cultural and economic significance, few studies have investigated the genetic origins of Jerusalem artichoke and its various cultivars. A recent study assessed the origin of Jerusalem artichoke using genome skimming [10], a new technique for assembling and analyzing the complete plastome, partial mito-chondrial genome, and nuclear ribosomal DNA genomes. This analysis showed that the genome of Jerusalem artichoke was not derived from Helianthus annuus (an annual) but instead originated from perennial sunflowers through hybridization of the tetraploid Hairy Sunflower (Helianthus hirsutus) with the diploid Sawtooth Sunflower (Helianthus grosseserratus). [11,12]. These results indicate that H. tuberosus is an alloploid species, having a set of chromosomes from each progenitor and double the chromosome number of the two parental species.
Many members of the Asteraceae family accumulate fructans (fructose polymers) in underground storage organs [13]. On such fructan is, inulin, which is stored in the vacuole in approximately 15% of flowering plant species [14]. Jerusalem artichoke and chicory (Cichorium intybus L.) are the most important cultivated sources of inulin [15][16][17]. Inulin molecules are much smaller than starch molecules, and have 2270 linked fructose moieties terminated by a glucose residue [7]. The average number of fructose subunits depends on the species, production conditions, and developmental timing [18]. Inulin has many uses in the production of food [19,20], and pharmaceuticals [21][22][23], and can be used as a storage carbohydrate for bioethanol production [24]. The inulin produced by Jerusalem artichoke is therefore a commercially valuable resource [7].
Recent advances in next-generation sequencing technology have enabled gene discovery, analysis of gene content, and measurement of gene expression in non-model organisms that lack a published genome sequence. For example, transcriptome sequencing can be used for genome-wide determination of absolute transcript levels, identification of transcripts, and delineation of transcript structure (including 59 and 39 ends, introns, and exons) [25][26][27][28]. Transcriptome sequencing can also identify genetic variations such as, single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) [29]. In recent years, RNA-Seq analysis has facilitated transcriptome characterization in hundreds of plant species lacking sequenced genomes [30][31][32][33][34].
In this study, we used RNA-Seq technology to develop the first H. tuberosus transcriptome dataset. De novo transcriptome sequencing was performed on RNA from five different H tuberosus tissues. We identified 66,322 loci, annotated 40,215 loci, and mapped 11,844 loci to 237 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We also identified 670 tissuespecific candidate loci and 10,778 SSRs. This novel dataset will be an important resource in the further genetic characterization of Jerusalem artichoke and will be particularly valuable in markerassisted breeding and investigation of traits related to inulin biosynthesis.

Plant Materials and RNA Isolation
A widely-cultivated Jerusalem artichoke cultivar, Purple Jerusalem Artichoke (PJA), was used for transcriptome analysis. PJA tubers were planted in January 2012 and were grown under normal conditions until harvesting. Stems, leaves, and tubers (stages 1 and 2; tuber1 and tuber2, respectively) were collected 6 months after planting. To avoid contamination with pathogen, roots were collected from in vitro-cultivated PJA. Tissues were snap-frozen in nitrogen upon harvest and were stored at 280uC until further processing. Total RNAs were extracted using Trizol Reagent (Invitrogen, Carlsbad, CA, USA), and were then treated with DNase I (Fermentas, Pittsburgh, PA, USA) according to the manufacturers' instructions. The OD260/230 ratio was determined using a NanoDrop ND-1000 Spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA) and was used for assessment of RNA quality and purity.

Transcriptome Sequencing
An equal amount of total RNA from each tissue was pooled for transcriptome sequencing in order to obtain a comprehensive range of transcripts. Poly(A)+ RNAs were purified from the pooled total RNA (20 mg) using oligo(dT) Dynabeads. Impurities were removed from the hybridized sample using a series of low-salt washes. First-strand cDNAs were synthesized using oligo(dT) primers. RNA was then degraded with RNase H (Invitrogen, Carlsbad, CA, USA) and second-strand cDNA were synthesized using DNA polymerase I (New England BioLabs, Ipswich, MA, USA). Double-stranded cDNAs were randomly fragmented using a nebulizer. The fragments were then repaired and extended at the 39 end by addition of a single adenine, and different adapters were ligated to the 59 and 39 ends. The ligated fragments were separated on a gel, and fragments of ,200 bp were isolated. After amplification by polymerase chain reaction (PCR), fragments were separated using electrophoresis, purified, and subjected to Illumina HiSeq2000 sequencing. Raw sequence data were generated by the Illumina analysis pipeline. Sequence data are deposited in the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih. gov/Traces/sra) under study number PRJNA258432.

De novo Transcriptome Assembly
Raw sequence data were filtered using standard RNA-Seq parameters. Briefly, low-quality and N-base reads were trimmed from the raw reads and reads were filtered by Phred quality score (Q$20 for all bases) and read length ($25 bp). The 39 ends of the clean reads were trimmed to form five sets of reads from the five different tissues. These datasets were then pooled and assembled using de novo assemblers (Velvet v1.2.07 [35] and Oases v0.2.08 [36]) based on the de Bruijn graph algorithm. Reads were assembled into contigs at distinct k-mer values (45,51,53,55,57,59,61,63,65,67,69 and 75) using Velvet. Contigs at each k-mer value were assembled into transcripts using Oases. Finally, the transcripts assembled at k-mer values 63 and 65 were merged using Oases with a minimum length of 200 bp and other default settings. Hash length (k-mer = 65) was considered for selection of the optimal de novo assembly as described previously [37]. The cleaned reads were also assembled using Trinity release_2011-11-26 [38] with k-mer of 25, minimum k-mer coverage of 1. Default settings were used for all other parameters. The performance of  [39] with an E-value cut-off of 1E-20.

Functional Annotation and Classification
BLASTX and Blast2GO software v2.4.4 [40] were used to compare the assembled loci ($200 bp) to the NR, Phytozome, and UniProt databases at a threshold E-value #1.0E-05. For Gene Ontology analysis, the gene ontology (GO) database (http://www. geneontology.org/) was downloaded and the assembled loci were annotated to the GO database using BLASTP (E-value #1.0E-06). GO term annotation was determined using GO classification results from the Map2Slim.pl script [37]. Protein sequences with the highest sequence similarities and cut-offs were retrieved for analysis. Further functional enrichment analysis was carried out using DAVID [41,42] and AgriGO (plant GO slim, FDR#0.01) [43]. Gene lists were annotated by TAIR ID, and were analyzed with default criteria (counts $2 and EASE score #0.1) for GO terms [44], Clusters of Eukaryotic Orthologous Groups (KOGs) [45], and KEGG pathways [46]. In addition, KEGG pathways were assigned to the locus sequences using the single-directional best hit method on the KEGG Automatic Annotation Server [47,48].
Coding sequences were predicted through BLAST comparisons with public protein databases. Sequences were compared with the Phytozome and Nr protein databases using BLASTX (E-value # 1.0E-5). Loci that matched sequences in the Phytozome database were not examined further. Coding sequences were derived from loci sequences according to BLASTX outcomes ($200 bp). In addition, full-length transcripts were predicted using BLASTP with the following parameters to ensure similarity of transcripts: orthologous gene of 99% similarity, minimum 90% identity.

Analysis of Differential Expression and Tissue-Specific Loci
Five mRNA libraries were generated from separate tissues using Illumina sequencing. Reads for each sequenced tag were mapped to the assembled loci using Bowtie (mismatch #2 bp, other parameters as default), and the number of clean mapped reads for each locus was counted. The DEGseq package [49] was used to identify differentially expressed genes. The five different libraries were compared pairwise using a greater than two-fold difference as the criterion for differential expression. Significant differential expression between tissues was defined by p-value , 0.001, FDR , 0.01, and log 2 . 2. Differential expression analysis between tissues was used to identify candidate loci with tissue-specific expressions, and to determine functionally enriched loci, as described above.
Tissue-specific loci were selected based on the read counts from leaf, root, stem, tuber1 and tuber2 samples of H. tuberosus. Tissuespecific candidates were those with . 200 reads from the target tissue and , 50 reads from other tissues.

Reverse Transcription (RT) and Quantitative Reverse Transcription (qRT) PCR Analyses
Total RNA was isolated from five H. tuberosus tissues using RNAiso Plus (Takara, Tokyo, Japan). The cDNAs were synthesized with M-MLV reverse transcriptase and an oligo(dT) primer in a 20 mL volume according to the manufacturer's instructions (Invitrogen, Carlsbad, CA, USA). Twenty putative tissue-specific genes (five per tissue type), were selected for RT-PCR. Quantitative RT-PCR was performed in 10 mL reactions containing gene-specific primers, 1 mL cDNA as template, and SYBR Premix Ex Taq. Reactions were performed using a CFX96 Real-Time PCR system (BioRad, Hercules, CA, USA). The thermal profile for qRT-PCR was as follows: 3 min at 95uC, followed by 40 cycles each consisting of 95uC for 25 sec, 60uC for 25 sec and 72uC for 25 sec. Primer specificities and the formation of primer-dimers were monitored by dissociation curve analysis. The expression level of H. tuberosus Actin2 (HtActin2) was used as an internal standard for normalization of cDNA template quantity. RT-PCR and qRT-PCR reactions were performed in triplicate.

RNA-sequencing and de novo Transcriptome Assembly of H. tuberosus
Total RNAs were isolated from five different tissues of the PJA cultivar: leaves, stems, roots, tuberous initial stage 1 (tuber1) and mature stage 2 (tuber2). The extracted RNAs were then mixed in equal proportions for mRNA isolation, fragmentation, cDNA synthesis, and sequencing. RNA sequencing with the Illumina Hiseq2000 produced 244,101,906 paired-end 101 bp reads corresponding to more than 24.4 billion base pairs of sequence. The raw reads were subjected to quality control using FastQC, and reads were trimmed (Table S1). The total number of highquality reads was 206,215,632, and these contained a total of 16,675,072,220 nucleotides. Of these, 68.37% reached a strict quality score threshold of Q $20 bases and read length $25 bp, and these were used for de novo assembly [31].
The clean RAN-Seq reads were assembled de novo into contigs using two assemblers with optimal parameters. First, the reads were assembled using Velvet-Oases (k-mer = 65) [35,36] to reduce redundancy and generate longer sequences: 66,322 loci and 272,548 transcripts with lengths $200 bp were produced. Second, the reads were assembled using the Trinity program [38]: 246,155 transcripts with lengths $200 bp were produced. A comparison of transcript length distribution between the two assemblies is shown in Figure S1. Overall, the mean length, maximum length, and N50 were longer for the Velvet-Oases assembled sequences than for the Trinity assembled sequences and we therefore used the Velvet-Oases assembly for subsequent analyses.  Table S2. In summary, we generated genome-wide locus sequences of H. tuberosus, a resource that will promote functional genomics approaches in Jerusalem artichoke.

Validation of Assembled Loci Against Publically Available ESTs from H. tuberosus
We used publically available EST data to validate the loci identified by our RNA-Seq and assembly. Sequence information for ESTs from H. tuberosus was retrived from the NCBI GenBank database (most recently accessed in January, 2014). BLASTN analysis of the assembled loci was performed against the H. tuberosus ESTs (40,388 ESTs) and the best hit for each locus was selected. Of the H. tuberosus ESTs, 35,402 sequences (87.65%) matched a locus from our assembly, but no match was found for 4,986 ESTs (12.35%). Most of the loci with hit matched the ESTs with good coverage and assembly quality ( Figure S2A). Of our 66,322 loci, 52,174 loci showed no BLAST hits to the H. tuberosus ESTs and were thus considered to be putative transcripts newlyidentified by our RNA-Seq analysis.
Transcriptome information is not available for the direct progenitors of H. tuberosus, Helianthus hirsutus and Helianthus grosseserratus; however, a curated unigene collection for sunflower (Helianthus annuus L.) was recently generated by EST assembly analysis [50]. We used BLASTN to compare our assembled H. tuberosus loci against the ESTs of H. annuus and found that 81.04% of H. annuus ESTs (108,984 out of 134,474) had matches among the H. tuberosus loci ( Figure S2B).

Functional Annotation of H. tuberosus Loci
After filtering out short-length and low-quality sequences, we used our assembled locus sequences to perform similarity searches against public protein databases (Phytozome [51] Nr [52], and UniProt [53]). Firstly, we searched all six frame translations of our loci against the Phytozyme protein database using BLASTX (E-value #1.0E-05). Database matches were found for 32,746 loci (49.4%). The unmatched loci were further analyzed against the NCBI non-redundant (Nr) and UniProt database. Additionally, databases were searched using BLASTN and BLASTX to identify homologous genes. Overall, 40,215 loci (60.64%) matched significantly similar sequences within the databases. The 39.36% of sequences (26,107 loci) without hits may represent novel loci specific to H. tuberosus. Alternatively, these sequences may have been too short to produce significant hits. Similar search outcomes have been observed in previous non-model plant studies [54][55][56] ( Table 2). Based on the top BLASTX hits against the Phytozome database, H. tuberosus loci were most similar to sequences from Vitis vinifera (3,556 loci, 12.02%) followed by Solanum tuberosum (2,869 loci, 9.7%) and Solanum lycopersicum (2,500 loci, 8.45%) ( Figure 1A). The E-value distribution of the top matches showed that 23.52% of the sequences had an extremely high E-value score (E-value = 0) and 76.48% of the homologous sequences had values in the range 1.0E-0521.0E-180 ( Figure 1B). The similarity distribution showed that 18.93% of these sequences had similarities greater than 80%, 42.21% had similarities of 60%280%, and 38.86% had similarities , 60% ( Figure 1C).
To assess the functionality of the H. tuberosus transcriptome, the annotated loci were matched to the Eukaryotic Orthologous Groups (KOGs) database to find homologous genes. The search outcomes were used to determine sequence directions within loci [45]. The 66,322 loci were annotated with 15,434 KOG terms in 25 classifications (Figure 3). Each KOG term represents a conserved domain; therefore, these results indicated that a large proportion of the putative proteins encoded by the assembled locus sequences had protein domains with existing functional annotations [45]. The cluster for 'General function' prediction (19.77%) was the most frequently identified group, followed by 'Signal transduction mechanisms' (16.34%), 'Post translational modification, protein turnover, chaperones' (7.37%), 'Function unknown' (7.03%), 'Transcription' (6.78%), 'Carbohydrate transport and metabolism' (5.53%), and 'Secondary metabolites biosynthesis, transport and catabolism' (3.67%).
In addition, to identify active biochemical pathways, we mapped the H. tuberosus loci onto the KEGG pathways using BLASTX and the KEGG Automatic Annotation Server [47,48]. KO identifiers were assigned to 11,844 loci, using the KEGG orthology that contains 4,531 Enzyme Codes [46]. A number of

Identification of Differentially Expressed Loci using RNA-Seq Data
RNA-Seq data were used for the identification of differentially expressed genes (DEGs) in different H. tuberosus tissues. More than 4.8 million raw reads were obtained from the libraries for each tissue (roots, stems, tuber1, tuber2, and leaves) (Table S1). To create a unified library, the reads were normalized by the total read count for gene expression in each tissue library ( Figure S3). Next, Likelihood Ratio Tests were used to correct p-values, and libraries were median normalized. DEGs were identified using the  Transcriptome Analysis of Jerusalem Artichoke PLOS ONE | www.plosone.org following filters: adjusted p-value#0.001, FDR #0.01, and log 2 ratio at 2, #22. Pairwise comparisons were performed between the five libraries. The average number of loci showing significant differences in expression between tissue pairs was 9,588 (range, 949-15,840) ( Figure 5).
Comparison of differential expression between tissues showed that the largest expression difference occurred between leaves and tuber2 with 13,863 and 1,977 loci up-and down-regulated in leaves, respectively. The top four up-regulated loci in leaves were annotated as encoding pyridoxal-59-phosphate-dependent enzyme family protein, acclimation of photosynthesis to environment (APE1) protein, single hybrid motif superfamily protein, and subunit NDH-M of NAD(P)H, suggesting an important role for these proteins in leaves. The most similar expression patterns were noted between tuber1 and tuber2 with only 949 differentially expressed loci identified (758 and 191 loci up-and down-regulated in tuber1, respectively) ( Figure 5). The similarity in gene expression pattern between the two tuber tissues suggests that metabolic processes are similar at both stages of development stages. Differentially expressed loci were subjected to functional enrichment analysis using R tools. For pathway enrichment analysis, the specifically expressed loci were assigned to terms in the KEGG database and KEGG terms were identified that were significantly enriched compared to the underlying transcriptome. A hypergeometric test was applied and p-values were adjusted using the Bonferroni method [61,62] to identify significantly enriched pathways. Functional loci involved in the 'Photosynthesis and Photosynthesis -antenna proteins' pathway were enriched in leaves compared to the other four tissues. Loci involved in the 'alpha-Linolenic acid metabolism' and 'Plant hormone signal transduction' pathways were enriched in stems, 'Phenylalanine metabolism' and 'Amino sugar and nucleotide sugar metabolism' pathways were enriched in roots, 'Protein processing in endoplasmic reticulum' and 'Zeatin biosynthesis' pathways were enriched in tuber1, and 'Ribosome' and 'Flavone and flavonol biosynthesis' pathways were enriched in tuber2.
Notably, a-linolenic acid metabolism-related loci were specific to the stem tissue. a-Linolenic acid is released from plant lipids in response to stress stimuli or biotic elicitation. In addition, alinolenic acid initates a signal cascade that stimulates the production of secondary metabolites involved in plant defense. A previous study reported that the defense hormone methyljasmonate plays a role in the biosynthesis and accumulation of inulin in Jerusalem artichoke [63]. Secondary metabolites with medicinal uses are derived from phenylalanine and are synthesized mainly in the root [64]. In the current study, functional enrichment analysis demonstrated that loci involved in zeatin biosynthesis tuber development [65] were enriched in early stage tuber1, and flavonoid biosynthesis-related loci, which could enhance the efficiency of nutrient retrieval and transport [66], were enriched in later stage tuber2. Previous research showed that, potato tubers expressed genes involved in expressed genes of potato included starch biosynthesis genes and synthesis of storage proteins [59]. Similarly, our results also showed expression of loci related to biosynthesis and transport within tubers.

Validation of Expression of Tissue-Specific Candidate Loci by RT-PCR and qRT-PCR
Quantitative reverse-transcription-PCR (qRT-PCR) was performed to validate DEGs from different H. tuberosus tissues, and to evaluate the reliability of the H. tuberosus transcriptome assembly. Candidate tissue-specific loci were chosen with read count values . 200 in one tissue and , 50 in other tissues (Table  S4). Twenty tissue-specific candidates were selected from the five tissues. Primer sets were designed to verify tissue-specific expression (Table S5) and were used for RT-PCR validation ( Figure S4). Quantification of tissue-specific loci was conducted using qRT-PCR with two tissue-specific loci for each tissue.
Locus 36956 (similar to Arabidopsis 1-AMINOCYCLOPRO-PANE-1-CARBOXYLATE OXIDASE (AT2G19590), which is involved in cell wall macromolecule metabolic processes), and locus 39880 (similar to AT4G12520, which is annotated as 'bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein) were confirmed as uniquely expressed in root tissue ( Figure 6A). Similarly, locus 63236 (similar to CYSTEINE PROTEINASES SUPERFAMILY PROTEIN (AT5G50260)), and locus 41667 (similar to HPT PHOSPHO-TRANSMITTER 4 (AT3G16360)), were highly expressed in stem ( Figure 6B). Locus 08448 (similar to MLP-LIKE PROTEIN 28 (AT1G70830)), and locus 45443 (similar to PLANT PROTEIN OF UNKNOWN FUNCTION (AT3G02645)) were confirmed to be predominently expressed in leaf tissue ( Figure 6C). Locus 58397 (similar to INTEGRASE-TYPE DNA-BINDING SUPER-FAMILY PROTEIN (AT5G52020)), and locus 40208 (similar to an F-box and associated interaction domains-containing protein (AT4G12560)) were highly expressed in tuber tissues, in either a stage-specific or non-stage-specific pattern ( Figure 6D). The Arabidopsis genes similar to each annotated locus are shown in Table S5.

SSR Markers in the H. tuberosus Transcriptome
H. Tuberosus sequences (66,322 loci) were examined for SSRs. A total number of 10,778 SSRs were identified from 8,746 unique loci. Of these, 1,604 loci contained more than one of SSR motif (Table S6). The SSR frequency in the H. tuberosus transcriptome was 16.25% and the average distance between SSRs was 4.68 kb. Di-nucleotide repeats constituted the most abundant class, followed by tri-nucleotide repeats ( Figure S5A, Table S7). In addition, among the specific repeat motifs, di-and tri-nucleotide repeats were the most common, with AG/CT motifs accounting for 41.31% of the di-nucleotide repeats, fllowed by ATC/ATG (11.1%), ACC/GGT (9.41%), and AAG/CTT (8.25%) ( Figure  S5B). SSRs are thought to affect chromatin organization, gene regulation, recombination, DNA replication, the cell cycle, and mismatch repair [67]. In addition, SSR markers are invaluable for genetic diversity analysis [68].

Loci from the H. tuberosus Transcriptome Involved in the Inulin Biosynthesis Pathway
Inulin has phamarceutical applications in treating diabetes and obesity. In H. tuberosus, inulin mainly accumulates in tuber tissue, and it was therefore of interest to identify the genes responsible for biosynthesis and vacuolar storage of inulins in tubers. We used our RNA-Seq data to conduct expression profiling of loci related to carbohydrate metabolism (Figure 7). Cytosolic sucrose is the only substrate for inulin biosynthesis. Two major enzymes, fructan 1, 2beta-fructan 1-fructosyltransferase (1-FFT) and sucrose:sucrose 1Fbeta-D-fructosyltransferase (1-SST), function in transport of sucrose [72]. The proteins encoded by the loci involved in sucrose biosynthesis are likely to be present mainly in the cytosol, whereas the proteins involved in fructose chain formation are likely be present in the vacuole. We analyzed the expression of loci encoding major carbohydrate metabolic enzymes in different tissues to understand the inulin biosynthesis pathway in H. tuberosus. The four key enzymes involved in sucrose biosynthesis, hexokinase (6 loci), sucrose phosphate synthase (6 loci), sucrose synthase (10 loci) and sucrose phosphate phosphatase (2 loci), were expressed in most tissues; however, some of the loci showed tissuespecificity and had very low expression levels (Figure 7). Two essential enzymes in inulin biosynthesis, 1-FFT and 1-SST, were more strongly expressed in tuber1 and tuber2 tissues than in other tissues. Locus 33971, annotated as 1-SST, showed 7.6-fold and 59fold higher expression in tuber tissue than in root and leaf tissues, respectively. Another key enzyme, 1-FFT (locus 01768), also showed tuber tissue-specificity with high expression levels. Locus 01768 was expressed more than 18.6-fold and 9-fold higher in tuber tissue than in leaf and root tissues, respectively. Interestingly, the fructan 1-exohydrolase (1-FEH) enzyme involved in inulin degradation was not highly expressed; rather, gene expression was lower in tuber tissue than in other tissues (Table 3). In summary, we identified loci corresponding to 1-FFT and 1-SST, which are two key enzymes involved in inulin biosynthesis, and demonstrated that these unigenes were more highly expressed in tuber than in other tissues. These results are consistent with inulin biosynthesis occurring mainly in tuber tissue.

Conclusions
We took advantage of RNA-Seq technology from the Illumina platform to investigate metabolic pathways and tissue-specific gene expression in a non-model plant species. Our transcriptome analysis used raw data at an unprecedented depth (20.6 Gbp) and produced a total of 66,322 assembled loci using de novo assemblers. Of these loci, 87.65% were novel sequences not present in the most recently released H. tuberosus EST database. We mapped 11,844 loci onto 237 KEGG pathways, including 'Carbohydrate metabolism' and 'Signal transduction and Translation pathways'. We further found 43 loci that functioned in sucrose and inulin metabolism. We performed RT-PCR with 20 tissue-specific candidate loci, and most tissue-specific candidate loci were expressed mainly in specific H. tuberosus tissues. In addition, qRT-PCR results confirmed the reliability of the H. tuberosus transcriptome assembly and tissue-specificity of expressed loci. SSR markers were identified, and these could provide primary information for analysis of polymorphisms within Jerusalem artichoke populations. The assembled transcriptome sequences and additional data make a substantial contribution to the existing genomic resources for H. tuberosus and will serve to enable functional genomics research in H. tuberosus.