Transcriptome Profiling of Tomato Fruit Development Reveals Transcription Factors Associated with Ascorbic Acid, Carotenoid and Flavonoid Biosynthesis

Tomato (Solanum lycopersicum) serves as a research model for fruit development; however, while it is an important dietary source of antioxidant nutrients, the transcriptional regulation of genes that determine nutrient levels remains poorly understood. Here, the transcriptomes of fruit at seven developmental stages (7, 14, 21, 28, 35, 42 and 49 days after flowering) from two tomato cultivars (Ailsa Craig and HG6-61) were evaluated using the Illumina sequencing platform. A total of 26,397 genes, which were expressed in at least one developmental stage, were detected in the two cultivars, and the expression patterns of those genes could be divided into 20 groups using a K-mean cluster analysis. Gene Ontology term enrichment analysis indicated that genes involved in RNA regulation, secondary metabolism, hormone metabolism and cell wall metabolism were the most highly differentially expressed genes during fruit development and ripening. A co-expression analysis revealed several transcription factors whose expression patterns correlated with those of genes associated with ascorbic acid, carotenoid and flavonoid biosynthesis. This transcriptional correlation was confirmed by agroinfiltration mediated transient expression, which showed that most of the enzymatic genes in the ascorbic acid biosynthesis were regulated by the overexpression of each of the three transcription factors that were tested. The metabolic dynamics of ascorbic acid, carotenoid and flavonoid were investigated during fruit development and ripening, and some selected transcription factors showed transcriptional correlation with the accumulation of ascorbic acid, carotenoid and flavonoid. This transcriptome study provides insight into the regulatory mechanism of fruit development and presents candidate transcription factors involved in secondary metabolism.


Introduction
Fleshy fruit development and ripening involve the coordinated regulation of numerous metabolic pathways that influence fruit nutrient levels and quality [1]. Cell division followed by cell expansion, resulting in the formation of large vacuolated cells, eventually leads to an increase in fruit weight and is accompanied by the accumulation of nutrients [2]. The subsequent increase in the accumulation of carotenoids, flavonoids, vitamins and other compounds that typically occurs during fruit ripening enhances both the sensory and nutritional quality of fruits; two parameters that are beneficial for human diet and that promote seed dispersal [3,4].
The nutrient accumulation during fruit development and ripening is the final outcome of a complex set of integrated processes involving gene expression, protein translation and metabolic accumulation [5]. For most metabolites, such as carotenoids, flavonoids and ascorbic acid, the biosynthetic pathways have been well characterized in tomato; however, the underlying regulatory mechanisms remain largely unknown [6]. Transcriptome analysis has revealed fluctuations in the expression of genes involved in various metabolic pathways during fruit development and ripening together with diverse patterns of metabolite accumulation [7]. For example, the transcript abundance of genes encoding enzymes involved in ascorbic acid biosynthesis shows diurnal fluctuations that are influenced by light, indicating the existence of regulatory networks that are affected by external factors [8]. Indeed, these metabolites accumulate to varying degrees during fruit development and have been reported to be affected by environmental factors [9]. The fact that fruit nutrient status is subject to environmental regulation in addition to being developmentally regulated indicates that multiple transcription factors or regulators may act to control their biosynthesis [10].
Studies of transcription factors related to biosynthetic genes may help dissect the regulatory machinery that influences metabolite accumulation. Several such regulatory genes have been reported to be associated with carotenoid metabolism: for example, RAP2.2 and PIF1 have been shown to modulate carotenoid accumulation by transcriptionally regulating PSY in Arabidopsis [11,12]. By comparative analysis with tomato genome, six of nineteen carotenoidassociated transcription factors were differentially expressed during fruit development and ripening in watermelon [13]. In tomato, several transcription factors, such as CNR, RIN, TAGL1 and SGR have been shown to be involved in regulating ripening and, thus, fruit carotenoid accumulation [14][15][16][17], while CSN5B, AtERF98, ABI4, and AMR1 are known to regulate ascorbic acid levels in the context of both stress responses and growth regulation [10,[18][19][20]. Ascorbic acid levels have also been shown to be modified via transcriptional regulation of the biosynthetic pathway in high-pigment tomato, a mutant in light signal transduction [21]. Moreover, it has been reported that the MYB transcription factor, SlMYB12 modulates the expression of flavonoid biosynthetic genes in tomato fruits [22] and the flavonoid biosynthetic pathway is activated in tomato fruit by the transcription factors Del and Ros1 [23].
All these regulators or transcription factors have been characterized based on their respective mutants, but another strategy is to use transcriptome analysis to identify potential transcription factors associated with specific aspects of fruit development and biosynthetic pathways [24]. In one study, differential expression profiling identified 72 signal transduction or transcriptional factors genes that are potentially involved in tomato fruit development and ripening [25] and, in parallel, genes involved in anthocyanin biosynthesis have been found to be up-regulated during the fruit ripening processes, concurrent with color change and fruit development [26]. In this regard, transcript co-expression analysis is a potentially valuable strategy to link of transcription factors and structural genes. Such a relationship was reported for the flavonoid biosynthetic pathway, where gene expression studies combined with genetic mapping and segregation analysis suggested that SlMYB12 is a likely candidate for the y locus, which is responsible for flavonoid accumulation [22].
In this current study, RNA-seq combined with metabolism analysis was used to investigate global dynamic changes in gene expression and metabolite accumulation during the development and ripening of fruit from two tomato cultivars, Ailsa Craig (AC) and HG6-61. The two cultivars originated from different area and showed different maturity progress, AC is an English variety with precocity, while HG6-61was an elite line with late-maturity from China. Co-expression analysis of transcription factors and structural genes involved in ascorbic acid, carotenoid and flavonoid biosynthesis was performed, in order to identify putative transcription factors that regulate these biosynthetic pathways. Finally, an agroinfiltration assay was also used to investigate the effect of the candidate transcription factors on these specific metabolic pathways.

Plant material and growth conditions
Tomato plants (Solanum lycopersicum; cultivars AC and HG6-61) were grown in the same greenhouse at the National Center for Vegetable Improvement (Central China) during the spring season. Plant growth and cultivation was carried out according to commercial practices with a day/night temperature of 28/20°C. Plants were pruned so that fruits were on one vine per plant. To collect fruit samples from various developmental stages, flowers were tagged when fully opened (anthesis) and fruits were harvested from 5 individual plants of each genotype at 7, 14, 21, 28, 35, 42 and 49 days after flowering (DAF). In order to ensure the uniformity of the fruit samples at each stage, harvested fruits were visually inspected externally and internally (e.g. size, shape, pigmentation, seed development and locular jelly formation), and only fruits that were developmentally equivalent were used in the subsequent analysis [5]. The fruit pericarp at each stage were mixed, snap-frozen in liquid nitrogen and kept at -80°C until further analysis. For agroinfiltration, greenhouse grown AC plants were grown at 28°C/20°C (day/ night) with a 16 h photoperiod in 10 cm (diameter) plastic pots and fruits left on the plant until the breaker stage were used for agroinfiltration [27].

RNA extraction and RNA-seq
The AC and HG6-61 fruits harvested at 7, 14, 21, 28, 35, 42 and 49 DAF were frozen in liquid nitrogen and kept at -80°C until use. Total RNA was extracted using a ZP411-2 GREENspin RNA quick extraction kit (ZOMANBIO, Beijing), excluding polyphenols and polysaccharides from the sample according to the manufacturer's instructions. Total RNA were then sent to ABlife Wuhan where the libraries were produced and sequenced using Illumina's Genome AnalyzerIIx. Fruit sampling and RNA-seq from two cultivars AC and HG6-61 were carried out in parallel as two biological replicates. The sequencing data can be accessed at the website: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64981. Raw sequences were filtered to remove the 3' adaptor sequence, low-quality reads (reads containing sequencing Ns > 5) and short reads (<16 nt) and the resulting sets of clean reads were used for the following analysis, as described previously [28]. All cleaned reads were mapped to contig assemblies using the Tophat mapping algorithm with the version 2.0.4 (http://tophat.cbcb.umd.edu/) allowing no more than 2-nucleotide mismatches. Clean reads that mapped to the genome sequences (SL2.40 version) of Solanum lycopersicum downloaded from SOL Genomics Network database (SGN, http://solgenomics.net/organism/Solanum_lycopersicum/genome). The multiple aligned reads were then filtered by tophat software and the remaining clean reads were designated as unambiguous clean reads. The number of unambiguous clean reads for each gene was calculated and then normalized to reads per kilobase of gene per million reads (RPKM), a standard unit to calculate UniGene expression [29]. The software edgeR was used to perform differential expression analysis [30]. edgeR can be used to analyze the difference in expression between two or more samples and indices of fold change (Log 2 ratio) and p-value (false discovery rate) provide an indication of whether a gene is differentially expressed. Here, genes with a p-value < 0.01 and a Log 2 ratio > 2.0 or < -2.0 were considered to be differentially expressed. The differentially expressed tomato genes extracted from ITAG2.4_proteins.fasta (ftp://ftp. solgenomics.net/genomes/Solanum_lycopersicum/annotation/ITAG2.4_release/) were used as query to identify Arabidopsis thaliana homologs (TAIR9 version) using an e-value of 1×e -5 by blastp (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Finally, DAVID software was applied to perform a GO enrichment analysis of the annotated genes, as previously described [31].

qRT-PCR
The expression pattern of selected differentially expressed genes identified in the RNA-seq analysis was validated by qRT-PCR. The expression abundance of biosynthetic genes following agroinfiltration of fruits (see below) was also investigated by qRT-PCR. The sequences of the primer pairs (designed using Primer Premier 3.0 [http://frodo.wi.mit.edu/primer3]) are listed in S1 Table. The cDNA synthesis and qRT-PCR steps were performed as previously described [32].

Ascorbic acid, carotenoid and flavonoid extraction and HPLC analysis
Ascorbic acid extraction and HPLC analysis were carried out as described previously [33]. Briefly, samples were ground under liquid nitrogen and homogenised in 5 mL of cold 0.1% (w/ v) metaphosphoric acid. The homogenate was then centrifuged at 12,000 g for 10 min at 4°C. The supernatant was filtered through a Millipore membrane (0.22 μm) to measure reduced ascorbate and an aliquot of 300 μL was incubated with 300 μL 50 mM dithiothreitol for 15 min at room temperature to measure total ascorbate. Then, the extracts were analyzed by HPLC using an SB-aq column (Agilent) eluted with acetate buffer (0.2 mol L -1 pH 4.5) at a flow rate of 1.0 mL min -1 . Elutes were detected at 254 nm, and a standard curve from 2 to 40 μg mL -1 ascorbic acid was obtained.
Carotenoids were extracted and analyzed by HPLC as described previously [17]. Samples were ground into powder after freezing in liquid nitrogen. Carotenoids were eluted with methanol-methyl tert-butyl ether-H 2 O(81: 15: 4, v/v/v; eluent A) and methanol-methyl tert-butyl ether-H 2 O (10: 90: 4, v/v/v; eluent B) by a C 30 carotenoid column (150x4.6 mm i.d., 3μm) from Waters. The linear gradient program was performed as follows: initial condition was 100% A to 100% B in 90 min, and back to the initial condition for re-equilibration. Analysis was conducted under subdued light to avoid carotenoid degradation during analysis. HPLC-grade β-carotene, lycopene, phytoene and lutein standards were obtained from Sigma (St Louis, MO, USA).
Flavonoids were extracted from 100 mg freeze-dried samples using 80% methanol which contains 0.1mg/L lidocaine as a quantification standard. The mixture was extracted for 12 h at 4°C. Flavonoids were analyzed and identified using a QToF 6520 mass spectrometer (Agilent Technologies, Palo Alto, CA, USA) coupled to a 1200 series Rapid Resolution HPLC system by a method modified from that described by [34]. 20μL of sample extract was loaded onto a Zorbax StableBond C18 1.8 μm, 2.1x100 mm reverse-phase analytical column (Agilent Technologies). Mobile phase A was 0.1% formic acid in water and mobile phase B was acetonitrile with 0.1% formic acid. The following gradient was used: 0 min-5% B; 20 min-95% B; 22 min-95%B; 22.1 min-10% B; 28 min-5% B. The flow rate was 0.3 mL min -1 and the column temperature was held at 35°C for the duration. The source conditions for electrospray ionization were as follows: gas temperature was 350°C with a drying gas flow rate of 10L min -1 and a nebulizer pressure of 55 psig. The capillary voltage was 3.5 kV in positive ion mode. The fragmentor voltage was 135V and skimmer 65V. Scanning was performed using the auto MS/MS function at 2 scans s -1 with a sloped collision energy of 3.5V/100 Da with an offset of 5V. Flavonoids were quantified by calculating the area of each individual peak and comparing this to internal standard.

Correlation analysis of structural genes and transcription factors
A correlation analysis of structural genes and transcription factors was carried out to identify transcription factors that were co-expressed with the enzymatic genes involved in ascorbic acid, carotenoid and flavonoid metabolism [22]. The transcriptome sequencing data of cultivar Heinz and the wild relative Solanum pimpinellifolium were also downloaded from TFGD (http://ted.bti.cornell.edu/cgi-bin/TFGD/digital/experiment.cgi?ID=D004). In order to exclude false positives, structural genes and transcription factors with an RPKM value !5.0 in at least one of the seven stages during fruit development were selected, and transcription factors with correlation coefficient values of !0.8 by t test (The formula to calculate t value was t = r ffiffiffiffiffi ffi nÀ2 p ffiffiffiffiffiffi ffi 1Àr 2 p , at P<0.05 and n = 7. |t|>t 0.05,5 = 2.571 means significant correlation, so r>0.754 means significant correlation) were considered to have an expression that was significantly correlated with the expression of genes in the various biosynthetic pathways. The co-expression analysis was preformed by "CORREL" function in "EXCEL2003" and confirmed by an in-house Perl scripts and IBM SPSS Statistics software.

Agroinfiltration
The full length cDNAs of three transcription factors (MYB [Solyc09g010840.1], NAC [Solyc12g013620.1] and ZIF [Solyc06g065440.1]) were amplified from the AC by reverse transcription (RT)-PCR. The PCR products were inserted into the entry vector pDONR221 using the BP enzyme (Invitrogen, USA), and then cloned into the destination vector pMV3 using the Gateway recombination reaction (Invitrogen, USA). The resultant constructs, as well as the pMV3 empty vector (control), were introduced into the Agrobacterium tumefaciens strain EHA105 and agroinfiltration of AC fruits was carried out as previously described [27]. For each construct, three fruits from the same position of two independent plants were agroinfiltrated. RNA were isolated from infiltrated fruits and analyzed by qRT-PCR. The agroinfiltration experiments were repeated three times. The sequences of the primer pairs (designed using Primer Premier 3.0) are listed in S1 Table. Results Changes in ascorbic acid, carotenoids and flavonoids content during tomato fruit development and ripening Metabolism analysis was carried out to monitor the dynamics of fruit ascorbic acid, carotenoids and flavonoids content in two tomato cultivars AC and HG6-61 (Fig 1). Most metabolites showed similar fluctuations in the two cultivars during fruit development, but the accumulation of some metabolites such as lycopene in carotenoids and naringenin chalcone belonging to flavonoids in AC fruits reached to its peak value earlier than in HG6-61 (Fig 1B and 1C). The content of total ascorbic acid showed a high-low-high pattern along fruit development and ripening but higher in HG6-61 than AC at 49 DAF (Fig 1A). The carotenoid accumulation showed a increasing trend in two cultivars but faster in AC than HG6-61 ( Fig 1B). For flavonoids, most of the metabolites showed increasing trend along fruit development and ripening except that chlorogenic acid and rutin declined gradually toward maturation (Fig 1C). When comparing the metabolite concentration between two cultivars, most of final metabolite concentration is equal except that difference occurred in ascorbic acid, phytoene, β-carotene, naringenin chalcone, caffeic acid, and naringenin-hexose. The metabolite accumulation difference in the later development stages of two genotypes is possibly due to the different maturation progress.

Transcriptome profiling of seven tomato fruit developmental stages
We used RNA-seq to profile the transcriptomes of fruit from the tomato cultivars AC and H6-61 at 7 time points: 7, 14, 21, 28, 35, 42 and 49 DAF. The average number of reads produced for each sample was 9.5 million (Table 1), with the number of clean reads per sample ranging from 3.3 to 10.9 million. The number of reads that were mapped to the S. lycopersicum genome ranged from 1,786,380 to 6,780,667, and the uniquely mapped reads from 1,657,266 to 6,576,631. We found that most of the uniquely mapped reads were mature mRNA or non-coding RNA (ncRNA), and the reads that mapped multiple times were primarily ribosomal RNA (rRNA) or transfer RNA (tRNA). Only uniquely mapped reads were used in subsequent analyses. A read distribution analysis revealed that >70% of the uniquely mapped reads originated from coding sequences (CDS), while the rest were distributed in intergenic regions, 3'UTRs  (untranslated regions), introns or 5'UTRs (Table 1). mRNA coverage analysis suggested the set of sequences derived from the RNA-seq analysis covered the complete region of mRNAs, with most reads falling in 20-80 relative position of the mRNA (S1 Fig). RPKM statistics showed that approximately 60% of the genes in each sample had RPKM values < 20, suggesting that most of the genes were expressed at low levels (S2 Fig), likely reflecting the fact that the plant does not express all its genes during its entire life cycle.

Changes in the transcriptome during fruit development and ripening
In this study, we used two tomato cultivars, AC and HG6-61, to reveal the changes in the transcriptome during fruit development and ripening. It was found that approximately 70% of the total numbers of detected genes were expressed at any one fruit developmental stage (Table 1). A total of 26,397 tomato genes were expressed in at least one of the seven sampling points, accounting for 76% of the 34,727 genes in the tomato reference genome, suggesting that the RNA-seq experiment gave a saturated coverage of expression (S2 Table). Among the 26,397 genes detected, 14,758 were expressed in all seven stages and five of these genes were most highly expressed in AC at all stages, with RPKM values >4,000. Two of them, Solyc05g053070.2 and Solyc05g054090.2, are located on chromosome 5 and encode proteins with unknown function. Two others, located on chromosome 1, encode a pre-mRNA-splicing factor ATP-dependent RNA helicase (Solyc01g110700.2) and the CCR2 glycine-rich RNA-binding protein (Solyc01g109660.2). The fifth gene (Solyc11g008510.1) encodes a 60S ribosomal protein. Transcriptome changes during fruit development and ripening were examined using a K-mean cluster analysis of gene expression patterns, which divided the 26,397 genes into 20 groups (Fig 2). Each group exhibited was characterized by a unique expression pattern and the largest group A correlation analysis using the 14 sampling points from the two tomato cultivars also showed that AC and HG6-61 had similar transcription patterns at each development stage (Fig  3), so there was a strong resemblance in terms of gene expression between AC and HG6-61 during fruit development. According to the correlation coefficient analysis, we observed that the number of differentially expressed genes increased through fruit development, with the correlation coefficient decreasing toward ripening for both cultivars.  Fig). To identify significantly altered biological processes during fruit development and ripening based on these data, the differentially expressed genes from the seven stages of both cultivars were subjected to a GO term enrichment analysis. The differentially expressed genes were divided into 34 groups including "RNA regulation", "cell wall", "secondary metabolism", "hormone metabolism", "stress", "lipid metabolism", "protein", "signaling", "development" and "transport" (Fig 4) and this analysis also indicated that the two genotypes had similar transcript expression patterns (Fig 5). Some groups, such as "photosynthetic light reactions", and "cell wall", contained genes that showed continuous down-regulation during fruit development and ripening, while other groups were comprised of genes that showed the opposite pattern; specifically "development", "ethylene response/ signaling pathways", "ABA response" and "hormone metabolism". Transcript abundance of genes involved in the "micro RNA, natural antisense" category did not change substantially during fruit development, while the expression of several genes changed dramatically during fruit development and ripening in AC (Table 2).
Two transcription factors in the "RNA regulation" category (Solyc01g091000.2, Solyc12g087940.1) showed a decrease in expression throughout fruit development and ripening, while another transcription factor (HSFB2A, Solyc03g026020.2) showed an increase in expression during fruit maturation. In contrast, the expression of a homolog of the A. thaliana ATHB13 DNA binding transcription factor (Solyc05g007180.2) declined during early fruit development but increased sharply at the mature stage.
The expression of several genes involved in secondary metabolism (e.g. flavonoid and phenylpropanoid biosynthesis) showed notable changes during fruit development. For example, the transcript abundance of three flavonoid related genes (Solyc10g052490.1, Solyc09g059170.1, Solyc07g043500.1) decreased during fruit development and ripening, while a gene encoding a 2OG-Fe(II) oxygenase family protein (Solyc07g054920.1), which is involved in flavonoid metabolism (http://solgenomics.net/), was up-regulated at the early stages of fruit development and then declined sharply to undetectable levels at the final ripening stage. The expression of a gene encoding the caffeoyl-CoA 3-O-methyltransferase (Solyc02g093270.2), which is associated with phenylpropanoid metabolism (http://solgenomics.net/), similarly showed an 'up-downup pattern' and peaked at 28 DAF and 49 DAF.
In the context of hormone biology, two genes related to gibberellin (GA) synthesis and signaling (Solyc06g007910, Solyc03g006880.2) showed decreasing expression during fruit development Tomato Transcriptome and Transcription Factors in Nutrient Synthesis and ripening, while genes related to ethylene biosynthesis (Solyc02g036350.2, Solyc07g049530.2) and signaling (Solyc02g077370.1) exhibited increased expression during the later stages of fruit maturation. However, an ethylene response factor 1 gene (Solyc12g010520.1) showed stable expression throughout fruit development.
Some of the differentially expressed genes are annotated as being associated with cell wall metabolism and fruit softening. For example a gene encoding a pectinesterase (Solyc12g008530.1) showed a decreasing expression pattern during early development but was up-regulated in mature fruits, which is consistent with the well characterized phenomenon of pectin modification in ripening fruit [25]. In contrast, genes encoding a pectate lyase (Solyc03g071570.2) and a xyloglucan endotransglucosylase hydrolase (Solyc07g052980.2) showed a pattern of deceasing expression during fruit ripening. Other patterns were also observed: the transcript levels of an expansin (Solyc06g051800.2) increased during the early stages of fruit development, peaked at 35 DAF and then gradually declined, while an endo-1,4-β-mannanase gene (Solyc01g008710.2) showed high expression at 42 DAF but was barely detectable at other stages.

Transcription factors associated with enzymes involved in ascorbic acid, carotenoid and flavonoid metabolism
The flavonoid, carotenoid and ascorbic acid biosynthetic pathways have been well characterized in terms of the constituent enzyme encoding genes [35][36][37]. However, the regulators or transcription factors that modulate the expression of these genes remain largely unknown and so we sought to identify some candidates based on the transcriptome analysis. In total 46, 18, and 14 structural genes putatively involved in the ascorbic acid, carotenoid and flavonoid metabolic pathways, respectively, were selected (S3 Table). Based on the transcriptome sequencing data of AC and HG6-61, a transcription abundance correlation analysis was carried out between selected structural genes from the three pathways and 823 annotated tomato transcription factors (S4 Table) and the transcription factors whose expression levels were highly correlated with those of the structural genes were identified (Table 3). Moreover, some metabolic related transcription factors were validated by two sequenced varieties cultivar Heinz and the wild relative Solanum pimpinellifolium (S5 Table).
In the ascorbic acid metabolic pathway, the expression of a MYB gene (Solyc12g088320.1) showed a high correlation with the transcript levels of many of the selected structural genes (25 and 17 of the 46 targeted genes from AC and HG6-61, respectively). In total, 34 transcription factors showed a high correlation with at least 15 structural enzyme encoding genes in AC, and most of them showed a positive correlation with the expression of structural genes, including Gene expression data are presented as log 2 fold change values compared to the first sampling point (7 DAF) within each genotype. The data were subjected to a Wilcoxon test in PageMan [46], and the results are displayed as a false-color code. Bins colored in red correspond to genes that were significantly up-regulated and bins colored in blue correspond to genes that were significantly down-regulated.  2) and a bHLH protein (Solyc01g096050.2) was negatively correlated with that structural genes involved in the ascorbic acid metabolic pathway. Interestingly, GPI-1, PMI-3, PMM-1, GME-1, GME2, GalDH, GalUR-2, GalUR-3, GalUR-6, AO-2, APX-1, APX-6, APX-7 and MDHAR-3 showed a high positive correlation with most of the 34 transcription factors, while GMP-2, APX-3, APX-10 and DHAR-5 showed a strong negative correlation with many of the 34 transcription factors (S4 Table). Twenty transcription factors were associated with the flavonoid metabolic pathway, of which 12 had a positive correlation and 8 a negative association (Table 3). MYB12 (Solyc01g079620.2), which showed a highly positive correlation with many of the selected structural genes (6 and 7 of the 14 selected genes from AC and HG6-61, respectively), is considered to be an important regulator of flavonoid biosynthesis in pink tomato fruit [22]. The expression of other transcription factors, such as bHLH proteins (Solyc05g006650.2, Solyc09g083220.2 and Solyc06g083170.2), MYB proteins (Solyc02g088190.2, Solyc04g078420.1) and MYB-like proteins (Solyc08g077230.2, Solyc12g017370.1) was highly correlated with many of the selected structural genes in the flavonoid pathway. A total of 37 transcription factors showed a significant correlation with at least 5 structural genes from the carotenoid metabolic pathway in both AC and HG6-61. A MADS-box gene were positively correlated with the expression of genes in the carotenoid pathway. Other transcription factors, such as AUX/IAA, bHLH, MYB, SBP-box, transcriptional factor B3 and zinc finger proteins also showed a high correlation with carotenoid metabolism, suggesting a complex underlying regulatory network (S4 and S5 Tables). Furthermore, a correlation analysis was carried out between metabolites accumulation and transcription abundance of structural genes or related transcription factors (S6 Table). Most structural genes and transcription factors involved in ascorbic acid metabolic pathway were significantly positively correlated with oxidized ascorbate (DHA) accumulation, but few were significantly correlated with total ascorbate abundance. The transcription factors showed a higher percentage of correlation with ascorbic acid than structural genes. In flavonoid metabolic pathway, the content of chlorogenic acid and rutin exhibited positive correlation with most structural and regulatory genes, but only two structural genes (Solyc03g117870 and Solyc05g010320) and one transcription factor (Solyc03g006000) significantly correlated with naringenin chalcone. It was surprising that the expression level of most structural genes and related transcription factors in carotenoid metabolic pathway didn't show significant correlation with carotenoids content, including lutein, phytoene, β-Carotene or lycopene.
Verification of gene expression related to ascorbic acid, carotenoids and flavonoid metabolism qRT-PCR was employed to verify the RNA-seq based transcription profiles. RNA was extracted from fruits at the same seven developmental stages and cDNA was synthesized as a template for qRT-PCR. The expression of 15 transcription factors putatively related to ascorbic acid, carotenoid and flavonoid biosynthesis was assessed by qRT-PCR (Fig 6A-6C) and correlation analysis revealed a high degree of consistency between transcript abundance determined by qRT-PCR or RNA-seq (Fig 6D).

Agroinfiltration verification of the association between selected transcription factors and genes encoding enzymes involved in ascorbic acid metabolism
To further validate the transcriptional association between the transcription factors and structural genes presented in Table 3, we performed agroinfiltration to verify the putative regulatory activity of the selected transcription factors on genes associated with ascorbic acid biosynthesis. Specifically, we selected three representative transcription factors: a MYB protein (Solyc09g010840.1), a NAC protein (Solyc12g013620.1) and a ZIF protein (Solyc06g065440.1) (S6 Fig). After agroinfiltration, the relative expression abundance of the corresponding genes increased at least three fold (MYB-2) and in one instance more than 30 fold (NAC-9). The expression levels of the selected structural genes involved in ascorbic acid metabolism was then assessed (Fig 7). In the MYB over-expressing fruit, most of the structural genes (13 of 18) were up-regulated, including GMP2, GalUR, AO2, and APX6. Only 5 of the genes were either down-regulated (3 of 5) or show no change in expression (2 of 5). A similar expression profile was seen in the NAC and ZIF over-expressing fruit (Fig 7). Furthermore, we found a quantitative relationship between the degree of over-expression of the transgene and the expression of the associated structural genes, further suggesting a correlation between expression of each of the transcription factors and the ascorbic acid biosynthesis genes (Fig 7).

Discussion
Fruit development has been extensively investigated with regards to ethylene synthesis, signal transduction and cell wall-modifying proteins [25,[38][39][40][41][42], and studies of non-ripening tomato mutants have provided insights into the signal transduction networks that govern ethylene synthesis and production, perception and signaling, as well as other aspects of ripening related metabolism [5]. However, the transcriptional profiles related to fruit nutritional quality have been less studied. Here, two cultivated tomato genotypes, AC and HG6-61, were analyzed by RNA-seq to characterize the genome-wide transcriptome dynamics and differential expression of genes during fruit development and ripening. A total of 26,397 tomato genes were detected in the present study, providing a more detailed overview of gene expression than obtained previously using microarrays [25]. Furthermore, the parallel analysis of two different cultivated Tomato Transcriptome and Transcription Factors in Nutrient Synthesis tomato varieties, Ailsa Craig, an English heritage variety, and HG6-61, an elite tomato inbred line from China, allowed as assessment of conserved and divergent patterns of gene expression. From this we concluded that the gene expression profiles of the two genotypes were highly similar during fruit development and ripening (Figs 3 and 5) and this high concurrence between two genotypes can prove our transcriptional data is reliable in the absence of replicate. Moreover, given that the expression profiles of genes determined by the RNA-seq evaluation and further validated by q-PCR were extremely similar (Fig 6), we are confident that our RNA-seq data set will be broadly useful for studies of fruit transcriptome dynamics.
Many studies of the tomato fruit transcriptome to date have broadly addressed hormonemediated fruit ripening, rather than targeting the development of molecular basis of the underlying quality traits [5,43]. Transcriptome analyses of ripening related tomato mutants, such as ripening-inhibitor (rin) [25], non-ripening (nor) and never-ripe (Nr) [5] have helped dissect processes associated with ethylene-mediated fruit ripening. This late stage of fruit development is accompanied by the accumulation of carotenoids, flavonoids and ascorbic acid, all of which are beneficial to human health. The identity of the factors that regulate the accumulation of these compounds is therefore of great commercial interest. A study of the transcriptomes of orange(Citrus sinensis) fruit and its red fleshed mutant showed the differential expression of genes involved in carotenoid metabolism; however, associated transcription factors were not reported [44].
A key hypothesis in this study was that an investigation of the structural genes that contribute to the formation of nutritionally valuable compounds, such as ascorbic acid, carotenoids and flavonoids, during tomato fruit development, in parallel with an evaluation of regulatory genes would help elucidate the molecular basis of fruit nutritional quality. The expression of genes involved in metabolite accumulation during fruit development is known to vary between different species. For example, the L-galactose-1-phosphate phosphatase (GGP) gene which catalytic L-Galactose1-P to form L-Galactose was initially reported to have the highest expression during maturation, and was thus assumed to be the limiting step for ascorbic acid biosynthesis in tomato [7]. However, GGP expression showed a decreasing pattern in the ascorbic acid-rich fruit of chestnut rose, but the expression of the dehydroascorbate reductase (DHAR) gene which was the key enzyme involved in ascorbic acid recycling pathway correlated with ascorbic acid accumulation [45]. Such examples underline the complexity and diversity of the key pathways that determine metabolite levels. Some transcription factors that regulate metabolite biosynthesis have been isolated by map-based cloning or mutant screening, including MYB12, which regulates flavonoid biosynthesis in tomato [22], ERF98 which modulates ascorbic acid biosynthesis in Arabidopsis [18] and PIF1 which controls carotenoid biosynthesis in Arabidopsis [12]. However, correlation analysis of enzyme encoding genes and transcription factors also provides an effective approach to finding candidate transcription factors [12,22]. Using transcriptional and metabolic association analysis, we identified transcription factors that are potentially associated with the biosynthesis of ascorbic acid, carotenoids and flavonoids during tomato development ( Table 3). The expression of the transcription factor LIM (Solyc06g071310.2) is highly correlated with the expression of genes involved ascorbic acid and flavonoid biosynthesis (Table 3 and S4 Table) and also with the total ascorbate concentration in an introgression population derived from the wild tomato species S. pennelli (http://ted.bti. cornell.edu/cgi-bin/TFGD/array_data/probe_expression.cgi?array_ID=A03&probe_ID= LE1O19). A MYB transcription factor (Solyc09g010840.1) show a decreasing transcriptional patter during fruit development and ripening in AC fruit, while its transcription abundance in ripe fruit of HG6-61 is higher than that in AC (S3 Table). This transcription pattern is consistent with metabolite level of ascorbic acid. The ascorbic acid concentration in fruits of HG6-61 is higher than that in AC at 49 DAF (Fig 1A). These results support our hypothesis that the transcription factors listed in Table 3 are potentially involved in regulating ascorbic acid biosynthesis. We also identified a gene encoding MYB12 (Solyc01g079620.2), which has been reported to be an important regulator of flavonoid biosynthesis in tomato fruit, and to be positively associated with the expression levels of 9 of 14 structural genes from the flavonoid biosynthesis pathway [22]. Moreover, the transcription abundance of MYB12 in reaches very high level in green mature fruit stage (S3 Table), which is consistent to the high level of major flavonoid compound (e.g. naringenin chalcone) (Fig 1C). In our correlation analysis, expression of MYB12 showed a high correlation with transcript levels of several flavonoid biosynthetic genes in the two tomato genotypes (Table 3 and S4 Table), while several other MYB and bHLH genes also showed a significant expression correlation with such genes, and may therefore also be involved in the regulation of flavonoid biosynthesis as previously report [46]. Likewise, three AUX/IAA genes (Solyc03g120500, Solyc09g090910 and Solyc12g007230) were selected in ascorbic acid biosynthesis, consistent with previously report that ascorbic acid content in tomato fruit is associated with genes involved in hormone signaling [47]. Unexpectedly, regulators related to ripening and carotenoid accumulation such as RIN, CNR and Nor were not found in the correlation analysis (Table 3). It is most likely that RIN, CNR and Nor genes were coexpressed with individual genes involved in carotenoid biosynthesis, e.g. PSY1 or PDS1, instead of major of biosynthetic genes.
Most metabolites showed similar fluctuations in the two cultivars during fruit development, but the accumulation of several compounds in carotenoids and flavonoids in AC fruits reached to its peak value earlier than in HG6-61, suggesting different ripening progress (Fig 1). This can also be reflected by the differential response to ripening at transcription level. The expression of PSY1 showed a delayed activation in HG6-61 compared to AC (S3 Table). Also, the expression of RIN gene (Solyc05g012020) and AP2 (Solyc03g044300) which were normally ripening induced were low in both cultivars during fruit enlargement stage (before 35 DAF), and were induced earlier in AC than in HG6-61. Although the ripening progress is slightly different between two genotypes, the focus of our study is on the correlation of transcription abundance of transcription factors and enzymatic genes or metabolites. We carried out correlation analysis in these two varieties respectively and the ripening variation will not affect our independent correlation analysis in each cultivar. Same transcription factors were selected by correlation analysis from these two different ripening genotypes even in some published data (Heizn1706 and S. pimpinellifolium) (S5 Table), revealing their conserved involvement in carotenoids, flavonoids and ascorbic acid metabolism.
Agroinfiltration provides a high throughput transient gene expression system that has been widely used in the gene functional analyses [48]. In this current study, this transient expression system was employed to investigate the influence of candidate transcription factors on metabolite pathway genes. The expression levels of the three transcription factors investigated here were significantly enhanced in vivo by agroinfiltration ( S6 Fig) and, as a result, most of the structural genes that we surveyed in the various biosynthetic pathways were up-regulated as hypothesized, while expression of only a few of the genes was down-regulated or remained unchanged. This suggests that most of the selected transcription factors are involved in the regulation of these structural genes. One exception was GMP2, the expression of which was negatively correlated with the expression of MYB (Solyc09g010840) in the co-expression analysis (S4 Table), but was up-regulated when MYB was overexpressed (Fig 7). In contrast, the APX5 gene was positively associated with MYB in the co-expression analysis (S4 Table), but was down-regulated when MYB was overexpressed (Fig 7), highlighting the complex regulatory network controlling metabolite biosynthesis. The co-expression analysis of transcription factors and enzyme encoding genes did not inherently indicate whether the identified transcription factors were potential inducers or suppressors of the regulated genes. If the transcription factors work downstream of the enzyme encoding genes, it is possible that a feedback inhibition would occur in the agroinfiltrated fruits to maintain metabolite levels. On the other hand, the expression levels of different members of a gene family may be differentially affected by transcriptional regulation, and the expression of the enzyme encoding genes may affect other genes involved in the biosynthetic pathway.
The RNA-seq transcriptome profiling provides an indication of the molecular mechanisms that govern fruit development at the transcriptional level. Processes that are known to be stage related were evident the patterns of gene expression: an example was the general down-regulation of photosynthetic light reactions during fruit development in the last two developmental stages (Fig 5), in agreement with the reduced photosynthesis capacity and carbon assimilation in ripe fruit [49]. The gene encoding the ethylene response factor 9 (Solyc07g053740.1) showed high expression levels particularly at later stages, suggesting that it may play an important role in fruit ripening (S3 Table), and its expression was also negatively correlated with flavonoid biosynthetic genes (Table 3), which are expressed earlier in development. However, there were also some inconsistencies between the gene expression and the corresponding biological process. The expression of genes encoding a pectinesterase, a pectate lyase and a xyloglucan endotransglucosylase hydrolase (Solyc12g008530.1, Solyc03g071570.2 and Solyc07g052980.2) decreased throughout fruit development and was almost undetectable at the last two stages, which is consistent with previously reported results [44,50] but contrary to previously reported patterns of cell wall degradation during fruit ripening [50].
In summary, the transcriptome of tomato fruit during development and ripening was extensively investigated. Correlations between the expression of metabolite biosynthetic genes and transcription factors were used to suggest candidate transcription factors that may regulate metabolite formation. Using RNA-seq analysis, the transcript abundance of a total of 26,397 genes was revealed. A total of 823 transcription factors were identified and their expression levels were compared to those of genes encoding enzymes involved in flavonoid, ascorbic acid and carotenoid biosynthesis. This revealed 20, 34 and 37 transcription factors putatively involved in the biosynthesis of flavonoids, ascorbic acid and carotenoids, respectively. Most of these candidate transcription factors have not previously been associated with metabolite biosynthesis, although functional evidence is available for a few, such as MYB12 [22]. Finally, three selected transcription factors (MYB, NAC and ZIF) were shown to modulate the expression levels of genes involved in the biosynthesis of ascorbic acid.  Table. Correlation analysis between metabolite abundance and the transcript abundance of its related transcription factors and structural genes. (XLS)