Comparison of fruit organic acids and metabolism-related gene expression between Cerasus humilis (Bge.) Sok and Cerasus glandulosa (Thunb.) Lois

Cerasus humilis (Bge.) Sok and Cerasus glandulosa (Thunb.) Lois are economically important fruit-producing shrubs. Although these two species have similar looking fruits, their fruit organic acid contents differ drastically. In this study, we focused on comparing the organic acid content, activity of enzymes gene expression involved in organic acid metabolism in both C. humilis and C. glandulosa fruits. To investigate the differences of organic acid metabolism in fruits of these two species, a comparative transcriptome analysis was performed. Our results showed that temporal changes of two main organic acids exhibited different trends in these two species. Transcriptome sequencing of developing C. humilis and C. glandulosa fruits systematically revealed 6,594 differentially expressed genes. Compared with C. humilis, the expression levels of 3,469 and 3,125 genes were up- and down-regulated in C. glandulosa, respectively, including one PEPC gene, 12 malic acid metabolism genes, 25 citric acid cycle genes, and 194 NAD/NADP metabolism genes. The correlation analysis and principal component analysis of gene expression, enzymatic activity and organic acid content showed that differences in the expression of genes encoding the NAD-malate dehydrogenase (NAD-MDH) and NADP-malate enzyme (NADP-ME) contributed substantially to the observed differences in organic acid accumulation of two species. Our results provide a solid foundation for future elucidation of key mechanisms regulating organic acid biosynthesis in C. humilis and C. glandulosa fruits and could lead to efficient and highly targeted generation of more commercially accepted cultivars.


Introduction
The acidity of fleshy fruit is an important component of fruit organoleptic quality [1,2]. Fruit acidity is due to the presence of organic acids synthesized within fruits [3], with considerable developmental stages (8,10,12,14,16,18, and 19 weeks after flowering), while C. glandulosa fruit samples were collected at five developmental stages (8,10,12,13, and 14 weeks after flowering), from plants grown at the experimental farm of the Horticultural College, Shanxi Agricultural University, Taigu, China. Following collection, the fruit samples were immediately frozen in liquid nitrogen and stored at -80˚C until analysis.

Determination of total acid, malic acid, and citric acid contents
Fruit organic acid extraction and determination were carried out with some modifications to the methods described by Nergiz and Ergönül [30]. Samples were analyzed with HPLC (Ulti-Mate3000 system, Thermo Fisher, USA), and the analysis conditions were as follows: chromatographic separation was performed on a Syncronis C18 column (5 μm, 250 mm × 4.6 mm); mobile phase was methanol:0.01mol/L K 2 HPO 4 solution (10:90, pH = 2.0) with a flow rate of 0.5 mL/min; column temperature was set at 25˚C; sample injection volume was 15 μL; detection wavelength was set at 210 nm. Three replicates of each sample were analyzed.

Determination of enzyme activity
Enzyme samples were prepared as described by Chen et al [8]. Enzyme activity was determined as described by Hirai and Ueno [31], but the reaction volume was changed to 3 mL. The enzyme activity of NAD-MDH was measured in reaction mixture containing 300 μL Tris-HCl (800 mM/L, pH 8. The reaction components were mixed just before adding the substrate. The absorbance (at 340 nm) of the enzyme samples was measured using a UV-2450 spectrophotometer (Shimadzu Corporation, Japan) immediately after adding the substrate (1 s for the unit readings with a 3-min scan). Changes in absorbance were recorded and the analysis was repeated three times. One enzyme unit corresponded to an absorbance change of 0.01 in 1 min, and enzyme activity was expressed in units per gram of pulp (fresh weight) per minute (U g −1 FW min −1 ).

RNA extraction
Frozen fruit samples (1 g) from each of the different developmental stages (seven stages for C. humilis and five stages for C.glandulosa) were used for the total RNA extraction using a modified CTAB method [27]. RNA quality was then assessed on 1% agarose gels for degradation and contamination. The purity of the extracted RNA was checked using the NanoPhotometer 1 spectrophotometer (Implen, CA, USA), while the concentration was measured using the Qubit 1 RNA Assay Kit and the Qubit 1 2.0 Fluorometer (Life Technologies, CA, USA). Finally, RNA integrity was assessed using the RNA Nano 6000 Assay Kit and the 2100 Bioanalyzer system (Agilent Technologies, CA, USA).

Library preparation for transcriptome sequencing
The sequencing cDNA libraries were constructed for C.humilis and C.glandulosa respectively with the NEBNext 1 Ultra™ RNA Library Prep Kit for Illumina 1 (NEB, USA) following the manufacturer's recommendations. Index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads and then fragmented by heating in 5X NEBNext First Strand Synthesis Reaction Buffer containing divalent cations. First-strand cDNA was synthesized using a random hexamer primer and M-MuLV Reverse Transcriptase (RNase H−). Second-strand cDNA was synthesized with DNA Polymerase I and RNase H. The remaining overhanging ends were converted to blunt ends via exonuclease/polymerase activities. After adenylating the 3 0 ends of cDNA fragments, the NEBNext Adaptor with a hairpin loop structure was ligated for a subsequent hybridization.
To select cDNA fragments that were 150-200 bp long, the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA). Then, 3 μL USER enzyme (NEB, USA) was added to the size-selected, adaptor-ligated cDNA samples, which were then incubated at 37˚C for 15 min and then at 95˚C for 5 min. The subsequent polymerase chain reaction (PCR) was completed with the Phusion High-Fidelity DNA polymerase, universal PCR primers, and the Index (X) Primer. Finally, the PCR products were purified with the AMPure XP system and the library quality was assessed using the 2100 Bioanalyzer system.

Clustering and sequencing
The index-coded samples were clustered using the cBot Cluster Generation System and the TruSeq PE Cluster Kit v4 cBot-HS (Illumina) according to the manufacturer's instructions. The libraries were then sequenced with the HiSeq 2500 platform (Illumina) to generate pairedend reads. Raw data (raw reads) in fastq format were first processed using in-house Perl scripts. Clean data (clean reads) were obtained by removing reads with an adaptor, reads containing multiple Ns, and low-quality reads. Additionally, the Q20, Q30, GC content, and sequence duplication level of the clean data were calculated. All downstream analyses were based on high quality clean data.

Transcriptome assembly and function annotation
For all libraries/samples, the left files (read1 files) were pooled into one left.fq file, while the right files (read2 files) were combined into one right.fq file. The transcriptome was assembled based on the left.fq and right.fq files using Trinity [32] with min_kmer_cov set to 2 by default. All other parameters were also set to default values. The transcriptome data of C. humilis and C. glandulosa were submitted to NCBI BioProject under the accession number PRJNA417674.
Gene function was annotated based on the following databases: Nr (NCBI non-redundant protein sequences); Nt (NCBI non-redundant nucleotide sequences); Pfam (Protein family); KOG/COG (Clusters of Orthologous Groups of proteins); Swiss-Prot (manually annotated and reviewed protein sequence database); KO (KEGG Ortholog database); and GO (Gene Ontology).

Quantification of gene expression levels and identification of differentially expressed genes
Gene expression levels were estimated by RSEM [33] using a two-step process. First, clean data were mapped back to the assembled transcriptome. Second, the read count for each gene was obtained based on the mapping results.
For samples with three biological replicates, the analysis of differentially expressed genes between two conditions/groups was completed using the DESeq R package (1.10.1), which detects differentially expressed genes using a model based on the negative binomial distribution. The resulting P-values were adjusted with the Benjamini and Yekutieli approach for controlling the false discovery rate [34]. Genes with an adjusted P-value < 0.05 were considered differentially expressed. Prior to the analysis of differentially expressed genes, the read counts for each sequenced library were adjusted by the edgeR program package with one scaling normalization factor. Genes differentially expressed between two samples were analyzed with the DEGseq (2010) R package. The P-value was adjusted using the q-value [35]. The following threshold was applied to identify differentially expressed genes: q-value < 0.005 and |log 2 (fold change)| ! 1. The WEGO online program (http://wego.genomics.org.cn/cgi-bin/ wego/index.pl) was used for the GO functional classification of all unigenes and determination of the distribution of gene functions.

Quantitative real-time PCR analysis
Total RNA was used as the template to synthesize cDNA with the PrimeScript™ RT reagent Kit with gDNA Eraser (Perfect Real Time) (TaKaRa, Japan). The 20-μL reverse transcription reaction mixture contained 5 μL total RNA (1 μg), 1 μL 10 mM dNTP, 1 μL random primer (9-mer) (50 mM), 1 μL oligo-d(T) 18 primer (50 mM) (TaKaRa, Japan), and 6 μL DEPC water. The quantitative real-time PCR (qPCR) was completed using qPCR SYBR 1 Premix Ex Taq™ II (Tli RNaseH Plus) (TaKaRa) and the 7500 Real-Time PCR System (Applied Biosystems, USA). The qPCR primers used to validate the differentially expressed genes were synthesized by Beijing Liuhe Genomics Technology Co., Ltd. (Table 1). Each gene was analyzed using four replicates, after which the average threshold cycle was calculated for each sample. An endogenous actin gene was used for data normalization [36]. Relative fold changes in gene expression were calculated using the 2 −ΔΔCt method.

Statistical analysis
Results are expressed here as mean ± SD. The differences among means were evaluated using Tukey's multiple comparison test and the statistical significance was set at p < 0.05. Principal component analysis (PCA) was carried out using SAS 8.0 (SAS Institute, Cary, NC, USA).

Analysis of organic acid content in developing C. humilis and C. glandulosa fruits
The C. humilis and C. glandulosa fruits reached maturity at 19 and 14 weeks after flowering (WAF), respectively (Fig 1). Malic acid and citric acid were the main organic acids in their Study on fruit organic acids and fruit transcriptomes of C. humilis and C. glandulosa fruits. Similar trends were observed for the total acid, malic acid, and citric acid content of C. humilis fruits, with an increase during the early fruit-developmental stage, followed by a rapid increase during the fruit-enlargement stage (FES), and then a decrease in the mature fruits. The highest total acid (28.62±1.05 mg/g), malic acid (23.81±1.20 mg/g), and citric acid (3.53±0.23 mg/g) content were observed at 18 WAF. In mature fruits, the total acid, malic acid, and citric acid content decreased to 22.58±1.07 mg/g, 19.27±0.98 mg/g, and 2.48±0.21 mg/g, respectively (Table 2). Moreover, the malic acid and citric acid content of mature fruits accounted for 85.34% and 10.98% of the total acid content, respectively. Similar trends among the total acid, malic acid, and citric acid content were also observed for C. glandulosa fruits. The total acid and citric acid content decreased throughout the fruit development period, while the malic acid content increased slightly before fruit maturation. Total acid (17.59±0.88 mg/g), malic acid (14.13±0.33 mg/g), and citric acid (3.15±0.18 mg/g) content were highest at 8 WAF. The total acid, malic acid, and citric acid content decreased to 7.39 mg/g, 6.57 mg/g, and 0.35 mg/g in mature fruits, respectively. The lowest malic acid content (6.12 mg/g) was recorded at 13 WAF (Table 2). In mature C. glandulosa fruits, malic acid and citric acid accounted for 88.90% and 4.74% of the total acid content, respectively.

Analysis of organic acid-related enzyme activities in developing C. humilis and C. glandulosa fruits
Phosphoenolpyruvate carboxylase (PEPC) catalyzes the β-carboxylation of PEP to produce oxaloacetic acid (OAA) and inorganic phosphoric acid. The OAA is reduced to malic acid in a reaction catalyzed by NAD-MDH [11]. The activity of PEPC in C. humilis fruits increased Fig 1. C. humilis (A: 8, 10, 12, 14, 16, 18, and 19 weeks after flowering) and C. glandulosa (B: 8, 10, 12, 13, and 14 weeks after flowering)  Study on fruit organic acids and fruit transcriptomes of C. humilis and C. glandulosa gradually from 8 to 14 WAF, then decreased slightly from 14 to 16 WAF, followed by a rapid increase to the maximum value of 70.91 U g −1 FW min −1 (Fig 2A). In contrast, in C. glandulosa fruits, the PEPC activity was 72.69 U g −1 FW min −1 at 8 WAF and then decreased to 39.75 U g −1 FW min −1 at maturity (Fig 2A). NAD-MDH activity exhibited the opposite trend in the developing fruits of C. humilis and C. glandulosa (Fig 2B). In C. humilis, NAD-MDH activity at 8 WAF was 82.43 U g −1 FW min − 1, which then gradually increased to 214.32 U g −1 FW min −1 as the fruits developed. In C. glandulosa, NAD-MDH activity was 192.69 U g −1 FW min −1 at 8 WAF, which then gradually decreased to the minimum value of 76.46 U g −1 FW min −1 at 13 WAF, after which the activity slowly increased to 91.25 U g −1 FW min −1 in mature fruits.
In developing C. humilis fruits, the activity of NADP-ME exhibited a decreasing trend from 8 to 18 WAF and then increased as the fruits matured ( Fig 2C). Meanwhile, NADP-ME activity in C. glandulosa fruits gradually increased from 8 to 13 WAF, followed by a slight decrease as the fruits matured.
The above-mentioned 223 genes involved in malic acid metabolism and citric acid metabolism were classified into GO categories (Fig 3). There were 16, 10, and 12 subclasses in the biological process, molecular function, and cellular component categories, respectively. In the biological process category, the metabolic process (211 genes), single-organism process (204 genes), and cellular process (153 genes) subgroups contained the most genes, while the rhythmic process (one gene) contained the fewest genes. In the molecular function category, the catalytic activity (202 genes) and binding (151 genes) subgroups contained the most genes, while the nucleic acid binding transcription factor activity, antioxidant activity, and molecular function regulator subgroups contained the fewest genes, each with only three genes. In the cellular component category, the cell part and cell subgroups included the most genes, both with 168 genes, while the supramolecular fiber subgroup contained only one gene.
The KEGG pathway analysis indicated that 233 differentially expressed genes were involved in 10 metabolic pathways, with the carbohydrate metabolism (42 genes), global and overview (37 genes), and energy metabolism (26 genes) pathways associated with the most genes. The pathways with the fewest genes were the metabolism of other amino acids (four genes) and nucleotide metabolism (two genes). The genetic information processing group contained folding, sorting, and degradation, as well as translation and transcription metabolic pathways. The environmental information processing group comprised signal transduction and membrane transport metabolic pathways. The cellular processes group contained transport and catabolism metabolic pathways, while the organismal systems group included environmental adaptation metabolic pathways (Fig 4).

Expression analysis of genes encoding key enzymes for malic acid metabolism
Mature C. humilis and C. glandulosa fruits contained more malic acid than citric acid ( Table 2), implying that malic acid is the main organic acid responsible for the differences in taste. The expression levels of genes encoding key enzymes involved in organic acid metabolism during different fruit-developmental stages were analyzed based on the transcriptome sequencing results and homologous sequences in the NCBI database.
In C. humilis fruits, the PEPC gene expression level increased slightly from the young fruit stage (YFS) to the pit-hardening stage (PHS), followed by a rapid increase from the PHS to the fruit-enlargement stage (FES) (Fig 5A). The highest and lowest PEPC gene expression levels were observed at the FES and the color changing stage (CCS), respectively. The PEPC gene had a similar expression trend in C. glandulosa fruits. The PEPC gene expression level at the FES was much lower in C. glandulosa fruits than in C. humilis fruits. In contrast, at all other fruitdevelopment stages, the PEPC gene expression level was slightly higher in C. glandulosa fruits than in C. humilis fruits. These results implied that the PEPC gene affected the accumulation of organic acids in C. humilis fruits primarily at the FES (Fig 5A). In C. humilis fruits, the expression of the NAD-MDH gene decreased slightly from the YFS to the PHS, and then increased rapidly to reach the maximum value at the FES, which was followed by a rapid decrease at the CCS and a drastic increase at the FRS (Fig 5B). The NAD-MDH gene was highly expressed at the FES and at the FRS, with expression levels that were 5.58 and 7.42 times that of fruits at the YFS, respectively. The lowest expression level observed at the PHS was 0.81 times that of fruits at the YFS. Meanwhile, the highest NAD-MDH gene expression in C. glandulosa fruits was found at the FRS, which was 9.05 times that of fruits at the YFS, and the lowest NAD-MDH gene expression was observed at the PHS ( Fig  5B). The NAD-MDH gene expression level was higher in C. humilis fruits than in C. glandulosa fruits throughout the fruit-developmental stage, but there was relatively little difference prior to the FES.
In C. humilis fruits, the NADP-ME gene expression level was the highest at the FES (2.17 times that of fruits at the YFS) and the lowest at the PHS (0.83 times that of fruits at the YFS). Meanwhile, in C. glandulosa fruits, the NADP-ME gene expression level was the highest at the FES (4.55 times that of fruits at the YFS) and the lowest at the YFS. Overall, the NADP-ME gene was more highly expressed in C. glandulosa fruits than in C. humilis fruits, except at the CCS. The relatively high NADP-ME gene expression levels induced the degradation of malic acid, especially at the FES and at the FRS (Fig 5C). The Citrate synthase (CS) gene expression levels were similar in two species and the CS gene was more highly expressed in C. humilis fruits than in C. glandulosa fruits (Fig 5D).

Correlation analysis and principal component analysis
The correlation between gene expression, enzymatic activity and organic acid content in developing fruits of C. humilis and C. glandulosa was analyzed. Strong correlations are apparent among total acid, malic acid, and citric acid content in both C. humilis and C. glandulosa fruits (Fig 6). The p-value was the probability and was used to compare the confidence coefficient among the data groups in the the paired Tukey's-test [37]. Significant correlation between data groups is indicated if p-value is less than 0.05 (p < 0.05). In this study, the levels of total acid, malic acid, and citric acid contents are all less than 0.05, indicating significant correlations. The correlation of organic acid contents with enzyme activities and the gene expression is also shown in Fig 6. PEPC and NAD-MDH are significantly positively correlated with the contents Study on fruit organic acids and fruit transcriptomes of C. humilis and C. glandulosa of organic acid while NADP-ME is significantly negatively correlated with the organic acid contents, PEPC and NAD-MDH in both C. humilis and C. glandulosa (Fig 6A and 6B).
In C. humilis fruits, the expression of PEPC gene is significantly positively correlated with organic acid contents and enzyme activity of NAD-MDH while it is significantly negatively correlated with the expression of the NADP-ME gene (Fig 6A). The expression of NAD-MDH gene is significantly positively correlated with total acid, malic acid and PEPC expression, but its correlations with citric acid content, PEPC activity, NAD-MDH activity and NADP-ME activity are not significant. The correlations between the expression of NADP-ME gene and other data are not significant which indicate that NADP-ME gene doesn't have a decisive effect on the organic acid accumulation in the fruits of C. humilis. However, NADP-ME gene plays a very important role in the organic acid metabolism of C. glandulosa fruits. For example, the NADP-ME gene expression is significantly negtively correlated with the organic acid contents and enzyme activity of NAD-MDH.
The PCA results identified four components that explained 100% of the total variation in organic acid metabolism of C. humilis and C. glandulosa ( Table 3). The first principal component (PC1) contributed 99.65% and 91.85% of the total variation in organic acid metabolism of C. humilis and C. glandulosa respectively, suggesting that the variation in PC1 could explain the differences in organic acid metabolism of these two species. In C. humilis, NADP-ME gene has the largest eigenvalue meaning that the NADP-ME gene plays the most important part in the organic acid metabolism of the fruits. In C. glandulosa, NAD-MDH gene and NADP-ME gene have the largest and the second largest eigenvalues respectively, suggesting that in addition to NAD-MDH gene, NADP-ME gene also has a substantial effect on the organic acid metabolism.

Discussion
To the best of our knowledge, we systematically report for the first time, the comparative study on the organic acid accumulation in fruits of C. humilis and C. glandulosa. In this study, the malic acid and citric acid content in C. humilis fruits accounted for 85.34% and 10.98% of the total acid content, respectively. In contrast, the malic acid and citric acid content in C. glandulosa fruits accounted for 88.90% and 4.74% of the total acid content, respectively. These results suggest that malic acid and citric acid are the main organic acids in both fruits, and likely affect Study on fruit organic acids and fruit transcriptomes of C. humilis and C. glandulosa fruit taste. The organic acid contents decreased in mature fruits, which is consistent with the results of earlier studies on nectarine, pear, and apple [6,7,18]. Additionally, the organic acid content decreased continuously throughout the fruit-developmental stages of C. glandulosa. However, in the developing fruits of C. humilis, the organic acid content increased gradually until one week before maturation. Although it underwent a sharp decrease in the fruit-ripening stage, the organic acid content in the fruits of C. humilis was still high, making the fruits very sour. In our study, 6,594 differentially expressed genes were detected based on transcriptome sequencing results, including 12 genes involved in malic acid metabolism, 25 genes involved in the citric acid cycle, 194 genes related to NAD/NADP metabolism, and one PEPC gene. These genes may be involved in the biosynthesis or catabolism of organic acids in C. humilis and C. glandulosa fruits, with potential consequences for fruit quality. There are at present relatively few reports describing the transcriptional regulation and cloning of genes regulating organic acid contents in C. humilis and C. glandulosa fruits. Therefore, the differences in gene expression levels described herein may be useful for future studies.
Through correlation analysis, we discovered strong correlations between organic acid content, enzyme activity and gene expression in both C. humilis and C. glandulosa fruits. First of all, PEPC, NAD-MDH and NADP-ME are significantly correlated with the organic acid contents in both C. humilis and C. glandulosa. In addition, NADP-ME gene doesn't have a decisive effect on the organic acid accumulation in C. humilis while it plays a very important role in the organic acid metabolism of C. glandulosa. The PCA analysis was carried out and the results verified the important role of NADP-ME gene in the organic acid metabolism in C. glandulosa.
Our results suggest that NAD-MDH, NADP-ME, and PEPC are all involved in regulating organic acid biosynthesis in C. humilis and C. glandulosa fruits. However, the gene expression levels were not directly related to changes in malic acid content.