Transcriptomic analysis of Lycium ruthenicum Murr. during fruit ripening provides insight into structural and regulatory genes in the anthocyanin biosynthetic pathway

Fruit development in Lycium ruthenicum Murr. involves a succession of physiological and biochemical changes reflecting the transcriptional modulation of thousands of genes. Although recent studies have investigated the dynamic transcriptomic responses during fruit ripening in L. ruthenicum, most have been limited in scope, and thus systematic data representing the structural genes and transcription factors involved in anthocyanin biosynthesis are lacking. In this study, the transcriptomes of three ripening stages associated with anthocyanin accumulation, including S1 (green ripeness stage), S2 (skin color change) and S3 (complete ripeness stage) in L. ruthenicum were investigated using Illumina sequencing. Of a total of 43,573 assembled unigenes, 12,734 were differentially expressed during fruit ripening in L. ruthenicum. Twenty-five significantly differentially expressed structural genes (including PAL, C4H, 4CL, CHS, CHI, F3H, F3’H, F3’5’H, DFR, ANS and UFGT) were identified that might be associated with anthocyanin biosynthesis. Additionally, several transcription factors, including MYB, bHLH, WD40, NAC, WRKY, bZIP and MADS, were correlated with the structural genes, implying their important interaction with anthocyanin biosynthesis-related genes. Our findings provide insight into anthocyanin biosynthesis and regulation patterns in L. ruthenicum and offer a systematic basis for elucidating the molecular mechanisms governing anthocyanin biosynthesis in L. ruthenicum.


Introduction
As the largest subclass of water-soluble pigments, anthocyanins determine the coloration of flowers and fruits [1,2]. Anthocyanins are synthesized via the flavonoid pathway and possess a multitude of biological roles, such as protecting plants against solar exposure and ultraviolet radiation, and also possess antioxidative capacity and free radical scavenging activity [3,4]. PLOS  In response to this limitation, the current study evaluated anthocyanin biosynthesis at three different development or ripeness stages in the fruits of L. ruthenicum using RNA-Seq. The RNA-Seq data allowed us to investigate the structural genes and TFs related to anthocyanin biosynthesis throughout fruit coloration. Our findings offer insight into the molecular mechanisms of anthocyanin biosynthesis in L. ruthenicum and provide a foundation for the future genetic engineering of improved anthocyanin content in plants.

Plant materials
Lycium ruthenicum plants were collected from Shizuishan in Ningxia, China (38˚56.799 0 N, 106˚24.711 0 E, 1088 H) in March 2016 and transplanted into a horticultural field of Gansu Agriculture University, Gansu, China. All the shrubs were of similar age and were cultivated on homogenous loessal soil under the same management practices (soil management, irrigation, fertilization, pruning, and disease control). No specific permission was required for the location of Shizuishan. This study did not involve endangered or protected species.
Fruit samples were harvested at three developmental stages associated with anthocyanin accumulation, namely S1 (green ripeness stage), S2 (skin color change) and S3 (complete ripeness stage). On July 26 th 2017, the fruits in S1 were harvested, and the fruits in S2 and S3 were harvested after 15 days and 35 days, respectively. Three shrubs at each development stage were constituted the biological replications of the material. Five representative fruits were sampled from each shrub at the same time of the day (9-10 AM). The combined samples of fifteen berries from three shrubs were sliced and then immediate frozen in liquid nitrogen and stored at −80˚C until use.

Evaluation of the color coordinates L � , a � , and b �
To quantify the fruits color perception from different developmental stages by using CIELab color space, individual fruit was evaluated through Photoshop CS3 10.0 software, in which 5 different points were randomly selected, respectively. Three independent variables were used to quantify color comparisons. Coordinate L � represented lightness, meanwhile, coordinate a � and b � represented chromaticity (a � , green-red axis, b � , blue-yellow axis).

Determination of anthocyanin content
The same powdered freeze-dry samples with a pestle and mortar used for the RNA extraction were used for the anthocyanin determination. Anthocyanin content was determined by the modified extinction coefficient method [27]. Briefly, approximately 0.1 g sample was ground to fine powder and extracted in a 4 mL extraction solution (1% HCl in ethanol) with oscillation at 60˚C, 120 r/min for 100 min. Following centrifugation (10,000 g, 25˚C) for 10 min, the supernatant was transferred into a clean tube and diluted to 25 mL with extraction solution. The absorbance of the supernatant was measured at 528 nm using a spectrophotometer (TU-1901; Beijing Puxi, Co., Ltd) [39]. Anthocyanin content was calculated as milligrams of cyanidin-3-O-glucoside equivalent per gram dry weight. The following formula was used to determine anthocyanin content: MF = [(A × v × M 0 )/(e × M 1 )] × 100, MF: the anthocyanin content (mg/100 g DW), A: absorbance, v: 25× dilution ratio (mL), e: molar extinction coefficient of cyanidin-3-O-glucoside, 26,900 [40,41]; M 0 : molecular weight of cyanidin-3-O-glucoside, 449 [40,41]; M 1 : dry weight (g) of the sample. The anthocyanin contents at different ripeness stages were analyzed statistically by one-way analysis of variance (ANOVA) followed by Duncan's post-hoc tests at significance level of P � 0.05. Statistical analyses were done using SPSS version 16.0 (IBM Corporation, USA, Chicago, 2007).

RNA preparation
Total RNA was extracted from nine samples (three biological replicates in each treatment) using a mirVana miRNA Isolation Kit (Thermo Fisher Scientific, Waltham, MA, USA). The RNA quantity and quality were determined using a NanoDrop 2000 instrument (Thermo Fisher Scientific), and RNA integrity was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), with A260/A280 ratios for all samples being greater than 2.0.

Illumina sequencing
The three triplicate biological samples at the three stages yielded nine non directional cDNA libraries, which were prepared from 4 μg of total RNA using the TruSeq Stranded mRNA LT Sample Prep Kit (Illumina, San Diego, CA, USA). The samples with an RNA Integrity Number (RIN) � 8 were selected for subsequent analysis [42]. The size and purity of the library were determined using an Agilent 2100 Bioanalyzer. These libraries were then sequenced on the Illumina HiSeq XTen sequencing platform at Shanghai Oe Biotech Co., Ltd. (Shanghai, China) and 150-bp paired-end reads were generated.

De novo assembly and functional annotation
Considering the effects of error rates on data, quality control of the raw reads in FASTQ format was performed using NGS QC Toolkit software freely available at http://www.nipgr.res. in/ngsqctoolkit.html [43], firstly filtering the reads with more than 30% low-quality bases (Qvalue < 20), then discarding the reads with low-quality bases (Q-value < 20) from 3' end, lastly discarding the reads with "N" bases from 5' end, and the reads with the length less than 50 bp were removed. Pollution test of the clean reads was fulfilled by comparing 250 thousand pairs reads to the NCBI non-redundant (Nr) database. De novo assembly of the high-quality reads to transcripts was performed using Trinity (vesion: trinityrnaseq_r20131110) in paired-end method. Clustering and de-redundancy of transcripts were operated by TGICL, and the longest transcript that could not be extended on either end was defined as a unigene based on the similarity and length of a sequence for subsequent analysis. The sequences were annotated using a Blastx search (E-value <10 −5 ) from several protein databases, includingNr and nucleotide (Nt) databases (http://www.ncbi.nlm.nih.gov), Swiss-Prot (http://www.expasy.ch/sprot), Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg), Clusters of Orthologous Groups (COG) (http://www.ncbi.nlm.nih.gov/COG), and Gene Ontology (GO) (http://www.geneontology.org/), based on sequence similarity. The Blast2GO program (http://www.blast2go.de) was used to obtain GO annotations for the unigenes [44].

Differential gene expression (DEG) analysis
Nine independent cDNA libraries from three different maturation stages (S1, S2, and S3) of the L. ruthenicum fruits were constructed according to the method detailedly described by Li et al. [45]. Each library was sequenced in parallel using the Illumina HiSeq XTen sequencing platform at Shanghai Oe Biotech. After removing low quality tags, including tags with unknown nucleotide "Ns", empty tags, and tags with only one copy number, the clean tags were mapped to our transcriptome reference database. For the analysis of gene expression, FPKM (Fragments Per Kilobase of transcript per Million mapped reads) and read counts of each unigene were calculated and normalized using bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) and eXpress (http://www.rna-seqblog.com/express-a-tool-for-quantification-of-rna-seq-data/). DEGs were identified using the DESeq (http://bioconductor.org/packages/release/bioc/html/DESeq.html) with estimateSizeFactors function and nbinom Test. The significant difference was tested by NB (negative binomial distribution test). A P-value � 0.05 in multiple tests and an absolute log 2 fold change value � 2 were used as thresholds for determining significant differences in gene expression.

Clustering analysis
To evaluate the dynamic changes in absolute expression during fruit maturation in L. ruthenicum, we performed a hierarchical clustering analysis of the expression patterns using the Short Time-series Expression Miner (STEM) program (version 1.3.11) [46]. A unique STEM clustering algorithm was used for the hierarchical clustering analysis, and the gene expression values were standardized by log-transforming the data prior to analysis.

Correlation analysis of structural genes and TFs
Correlation analysis of anthocyanin structural genes and TFs was performed as described by Fang et al. [8]. To obtain main putative TFs related to anthocyanin biosynthesis, we only selected unigenes with a FPKM value > 50 in at least one of the three stages during fruit ripening. Pearson correlation analysis was carried out between TFs and structural genes by using the "correlate" function in SPSS 16.0 software.

Real-time quantitative (RT-q) PCR validation
Sixteen candidate DEGs were selected to validate the transcriptome results by RT-qPCR. Total RNA was extracted from the nine samples as described above. First-strand cDNA synthesis used 0.5 μg of total RNA and followed the manufacturer's protocol (Vazyme, R223-01). In the second step, 2 μL of 5 × HiScript II Q RT SuperMix IIa was added and then amplification was achieved in a GeneAmp PCR System 9700 (Applied Biosystems, USA). The RT-qPCR was performed using a QuantiFast SYBR Green PCR Kit (Qiagen, Germany) and finished on the LightCycler 480 II Real-time PCR System (Roche, Swiss). The EF1a gene (JX427553) was used as the internal control gene. Each sample was run in triplicate. Melting curve analysis was used to validate the specific generation of the expected PCR product. The primer sequences were designed in the laboratory using the software Primer premier 5.0 and synthesized by Generay Biotech Co., Ltd. (Shanghai, China) based on the mRNA sequences obtained from the NCBI database (S1 Table). The expression levels of the mRNAs were normalized to EF1a and the relative expression levels of the genes were calculated using the 2 −ΔΔCt method [47].

Statistical analysis
Results of anthocyanin contents are presented as means ±SE (n = 3) and data analysis was performed by one-way ANOVA using SPSS statistical software (Ver. 16.0, SPSS Inc., Chicago, IL, USA). Duncan's multiple range tests were used to detect differences among means at a significance level of P < 0.05. Pearson correlation between the RNA-Seq data and RT-qPCR data, also between structural genes and transcription factors were calculated using the "correlate" function SPSS 16.0 software.

Anthocyanin accumulation during ripening in L. ruthenicum fruits
As indicated in Fig 1A, the color of the L. ruthenicum fruits changed from green to purple-red during ripening and the flesh became pigmented. From the mean coordinates L � , a � , and b � , color differences for different developmental stages were calculated ( Fig 1B). The color changes of the coordinates ranged from 45.5 to 5.8 for L � , from -14.6 to -1.6 for a � , from 29.2 to -0.2 for b � . The color coordinates L � and b � were consistently declined from S1 to S3, and conversely, the color coordinate a � increased significantly. The anthocyanin content of the L. ruthenicum fruits increased from 13.7 mg/100 g dry weight (DW) to 60.3 mg/100 g DW as ripening proceeded (Fig 1C).

Illumina sequencing, De novo assembly, and functional annotation and classification
To obtain complete gene expression information in the L. ruthenicum fruits, nine cDNA libraries representing the cDNA in three different stages of ripeness were subjected to RNA-Seq using an Illumina HiSeq XTen platform (  46.84, and 46.33 million clean reads were obtained from the nine libraries. Using Trinity, the clean reads were assembled into 43,573 unigenes with an average length of 1,267 bp. All of the unigenes are available in the NCBI SRA database (accession number SRR7700825). The size-distribution analysis showed that the lengths of 19,453 unigenes were greater than 1,000 bp while the lengths of 35,227 unigenes were greater than 500 bp (Fig 2). These results suggest that the quality of the unigene data was sufficient for the subsequent analyses.
A total of 43,573 unigenes were searched against public databases, including Nr, Nt, Swiss-Prot, KEGG, COG and GO, with an E-value cut-off of 10 −5 . We were able to identify 54.44% of the unigenes (23,723), while the remaining unigenes (45.56%) could not be annotated with known genes (S2 Table), most likely because of an absence of genome information. Of the above 23,723 unigenes with functional annotations, 15,064, 13,128, and 4,951 unigenes could be annotated in GO, KOG, and KEGG, respectively, based on sequence homology (S2 Table). The top three matches for the L. ruthenicum fruit unigenes were with Solanum tuberosum (4,586, 19.47%), followed by Capsicum annuum (3,475,14.75%) and Nicotiana attenuata (3,337, 14.16%) (S1 Fig).

DEGs during fruit ripening
The unigenes from the nine libraries were mapped to the assembled transcriptome. Normalized-FPKM was used to quantify the transcript levels. The differences in gene expression were then analyzed by comparing the three different ripeness stages, using the thresholds of false discovery rate (FDR) -value <0.05 and fold change > 2, respectively (Fig 3). A total of 8,123 DEGs were identified between S1 and S2, with 4,264 up-regulated unigenes and 3,859 down-regulated unigenes, including 135 and 488 unigenes that were expressed exclusively at S1 and S2, respectively. Between S2 and S3, 1,529 DEGs were identified, with 602 up-regulated unigenes and 927 downregulated unigenes, including 94 and 102 unigenes that were expressed exclusively at S2 and S3, respectively. Between S1 and S3, 10,152 DEGs were identified with 5,254 up-regulated unigenes and 4,898 down-regulated unigenes, including 186 and 481 unigenes that were expressed exclusively at S1 and S3, respectively. In combination, a total of 12,734 unigenes were differentially expressed during the L. ruthenicum fruit ripening process. This finding suggests that the developmental period with the most dynamic transcriptional changes was between S1 and S2, with more than half of the DEGs (63.8%) showing significant changes during this period. All of these unigene sequences can be accessed in Supporting information files: S4, S5 and S6 Tables.

Changes in gene expression profiles during L. ruthenicum fruit maturation
In order to gain further insights into the biological processes involved in L. ruthenicum fruit maturation, the 11,765 annotated genes commonly modulated during fruit ripening were subjected to clustering analysis, allowing for the identification of 11,703 genes exhibiting similar expression trends. Three clusters comprising 9,917 transcripts with significant differential expression at P-value � 0.05 are illustrated in Fig 4. To provide a global description of the enriched biological pathways in each cluster of similarly regulated transcripts, we also presented an overview of the KEGG pathway enrichment (Fig 4). Cluster 1 contained 1,991 genes whose expression declined consistently throughout the ripening process, including "photosynthesis", "carbon fixation in photosynthetic organisms", and "porphyrin and chlorophyll metabolism." In Cluster 2, 2,520 genes were negatively modulated from S1 to S2 and remained stable from S2 to S3, including a broad range of genes responsible for "necroptosis", "PI3K-Akt signaling pathway", and "cell cycle-yeast." Cluster 3 comprised 5,406 genes whose expression increased between S1 and S2, and the top three KEGG pathways included "plant hormone signal transduction", "MARK signaling pathway-plant", and "metabolism of xenobiotics by cytochrome P450", respectively. Overall, Cluster 3 constituted the largest group with 46.2% gene coverage.

Structural genes related to anthocyanin biosynthesis
Anthocyanin biosynthesis is a dynamic and complex pathway that is regulated by a series of enzymes. In our study, 25 candidate transcripts were identified and assigned to the anthocyanin  (Table 2). Furthermore, five 4CL genes with fold change > 2 exhibited differential expression. The CL6132Contig1, CL41122Contig1, and comp97802_c0_seq7_2 genes were down-regulated from S1 to S3. In contrast, CL32840Contig1 was upregulated from S1 to S3. The transcriptional levels of CL36675Contig1 peaked at S2. Among the genes with fold change > 2, CL25405Contig1 and CL31014Contig1 encoding CHS, three CHI genes (CL2953Con-tig1, CL25937Contig1, and CL29423Contig1), CL2320Contig1 and CL6713Contig1 encoding F3H, F3'5'H gene (CL6842Contig1), a DFR gene (CL6494Contig1), and an ANS gene (CL29803Contig1) displayed similar expression patterns, and were significantly upregulated during fruit maturation in L. ruthenicum. However, the expression of CL31595Contig1 encoding CHI and UFGT genes (CL11932Contig1) decreased with fruit ripening (Table 2). In total, 17 of 25 structural genes were assembled to 4 different clusters, respectively, and 11 structural genes were belonged to cluster 3 ( Table 2). To analyze the anthocyanin content of the L. ruthenicum fruits with structural genes Transcriptome of Lycium ruthenicum during fruit ripening expression, we performed the correlation analysis. Among 10 different enzymes family (F3'H excepted), at least one member was highly correlated with the anthocyanin accumulation pattern, respectively ( Table 2).

TFs related to anthocyanin biosynthesis
TFs play crucial roles in plant growth and development. We selected TFs with FPKM � 50 in at least one of the three stages during fruit ripening, from which 72 putative TFs with highly dynamic anthocyanin biosynthesis-related expressional changes were identified. To identify significant TFs that were co-expressed with the structural genes involved in anthocyanin biosynthesis, 12 structural genes with FPKM � 50 were also selected. A transcription abundance correlation analysis was carried out between the 72 differentially expressed TFs and 12 structural genes from the anthocyanin biosynthetic pathway (S3 Table). Fifty-seven TFs were significantly correlated with at least one of the 12 structural genes. Sixteen TFs were evidently correlated with five or more structural genes from the anthocyanin biosynthetic pathway (Table 3), including one MYB gene  Table 2. Expression profiles, clusters distribution, and correlation analysis of the structural genes related to anthocyanin biosynthesis in L. ruthenicum fruit ripeness. "r" represents the Pearson correlation coefficient between structural genes and anthocyanin contents in different developmental fruits.  Table). Of these, only one of the MYB-related genes (CL8773Contig1) showed a negative correlation with the expression of structural genes, while most of them were positively correlated with the structural genes involved in anthocyanin biosynthesis. Interestingly, 15 of 16 TFs were assembled to cluster 3 whose expression increased between S1 and S2 (Table 3).

Gene name
To analyze the anthocyanin content of the L. ruthenicum fruits with the expression patterns of TFs, we performed the correlation analysis. The results showed that all the TFs were highly correlated with the anthocyanin accumulation pattern, respectively (Table 3).

Experimental validation
To confirm the reliability of the RNA-Seq data, the relative expression levels of 16 DEGs were examined by RT-qPCR (Fig 6). PCR amplification indicated that all the primers for RT-qPCR generated only single segments with the expected lengths (61-134 bp, S1 Table). The RT-qPCR results for the randomly selected unigenes were generally consistent with their transcript abundance changes determined by DEG profiling, which confirmed the reliability of the RNA-Seq data. However, discrepancies were observed between the fold change in CL37168Contig1 (in S2) and CL38545Contig1 (in S1) (Fig 6).

Discussion
The fruit of L. ruthenicum is renowned for its high anthocyanin content, and is thus valued for its nutritional contribution to human health [4,33]. The measured color coordinates L � , a � , and b � of fruits indicated that, the lightness and yellow color were declined, and the red color increased definitely with fruit maturation (Fig 1A and 1B). The total anthocyanin content of Table 3. Expression profiles, clusters distribution, and correlation analysis of 16 TFs related to anthocyanin biosynthesis in the different fruit maturation stages of L. ruthenicum. The table shows the TFs that were significantly correlated with five or more structural genes. "r" represents the Pearson correlation coefficient between TFs and anthocyanin contents in different developmental fruits.  [36] and Zeng et al. [37]. The anthocyanin content increased 3.4-fold from S1 to S3 (Fig 1C), suggesting that anthocyanin accumulation plays a role in the maturation process, and L. ruthenicum is suitable for studying anthocyanin biosynthesis during fruit maturation. Comparative transcriptome analysis by RNA-Seq has been effectively used for investigating the genes involved in anthocyanin biosynthesis in several plants, such as Vitis vinifera [27], Solanum melongena L. [29], Litchi chinensis Sonn. [48], and Lactuca sativa [49]. In the present study on L. ruthenicum, a total of 43,573 unigenes were assembled, of which only 54.44% were identified (S2 Table). The proportion of unigenes annotated is higher than that in L. barbarum (33.62%) [50]. The reason of some unigenes unannotation might be too short to have a characterized protein domain, whereas others with a known protein domain are highly diverged from other genes in the databases. Additionally, unannotated unigenes might represent specific genes with novel functions, and thus warrant further investigation.

Gene name
The transcriptional response of L. ruthenicum fruits among three ripeness stages associated with anthocyanin accumulation provides a framework for evaluating important transcriptional changes and their associated physiological mechanisms during the maturation process. A total of 8,123 DEGs were observed between the S2 and S1 stages, whereas only 1,529 DEGs were detected between the S3 and S2 stages (Fig 3). The decreased number of DEGs suggested that most of the genes governing important physiological changes were expressed at the stage of skin color change. Three clusters comprising 9,917 transcripts with significant differential expression at P � 0.05 were detected (Fig 4), and probably represent the core transcriptome of L. ruthenicum fruit development.
The majority of the genes in the three main clusters represented physiological processes that have been documented to occur during fruit development. For example, many genes (Cluster 1) involved in photosynthesis and carbon fixation in photosynthetic organisms were downregulated from S1 to S3, suggesting the shutdown of photosynthesis, as observed in previous study on grapevine [27]. Conversely, the substantial up-regulation of 57 genes related to plant hormone signal transduction indicated that plant hormone depletion might induce the relative expression of genes associated with fruit maturation in L. ruthenicum. This speculation was supported in many plants such as pea, the cell division and elongation during pea fruit growth were maintained by the hormonal interaction of GA and auxin [51]. Besides, Gene modulation relating to necroptosis was observed during the maturation phase, with genes exhibiting declined expression in S2 and remaining relatively stable in S3, suggesting a role for necroptosis in the transcriptomic reprograming that accompanies maturation [52,53].
The anthocyanin biosynthesis pathway has been extensively studied in several plant species, including apple [26], pak choi [17], grape [54], and others. In the present study, 25 DEGs implicated in anthocyanin biosynthesis, including PAL, C4H, 4CL, CHS, CHI, F3H, F3'H, F3'5'H, DFR, ANS, and UFGT, were identified (Table 2). Moreover, 16 structural genes were filtered with FPKM � 5 and fold change > 2 in at least one of the three stages during fruit ripening, most of which were significantly up-regulated. Although five structural genes exhibited declined expressional change patterns, including three 4CL genes (CL6132Contig1, CL41122Contig1 and comp97802_c0_seq7_2), one CHI gene (CL31595Contig1), and one UFGT gene (CL11932Con-tig1). In combination, our results confirmed that most of these structural genes were up-regulatedin response to anthocyanin biosynthesis, which is in accordance with previous studies [8,37], and the expression patterns of half of the members from 11 type enzymes families were highly correlated with the anthocyanin accumulation pattern, suggesting that these genes might play an extensive and important role in anthocyanin biosynthesis.
Anthocyanin biosynthesis is regulated by several well-studied TFs, including MYB, bHLH, and WD40 [13,19,[55][56][57]. Yan et al. [38] annotated 83 MYB family TFs in L. ruthenicum using RNA-Seq, and the observed expression patterns suggested that some MYB TFs might play a role in the regulation of anthocyanin synthesis during different fruit development periods. In the present study, 62 unigenes encoding MYB genes and 192 unigenes encoding MYBrelated genes were identified. Furthermore, we selected two MYB genes and 19 MYB-related genes with FPKM � 50 in at least one of the three stages during fruit ripening, and found that the transcripts of the 13 genes decreased significantly during maturation. The results implied that the anthocyanin biosynthetic pathway might be partially negatively controlled by MYB repressors in L. ruthenicum. Similar phenomena have been observed in many plants, such as Arabidopsis [58], strawberry [59], and plum [8]. In addition, an MYB gene (CL450Contig1) was significantly positively correlated with five structural genes, including a CHS gene (CL25405Contig1, r = 1.000 �� ), a CHI gene (CL25937Contig1, r = 0.998 � ), an F3H gene (CL6713Contig1, r = 1.000 �� ), a DFR gene (CL6494Contig1, r = 1.000 �� ), and an ANS gene (CL29803Contig1, r = 1.000 �� ) (S3 Table). Moreover, another set of five MYB genes were also significantly correlated with at least five structural genes. This finding suggested that these MYB genes might interact with structural genes related to anthocyanin biosynthesis. In Arabidopsis and rice, 162 and 167 bHLH-encoding genes were respectively identified that potentially participate in a variety of combinatorial interactions, including the capacity to regulate a multitude of transcriptional programs, particularly flavonoid and anthocyanin biosynthesis [22,23,60]. We also identified 110 bHLH-encoding genes among the L. ruthenicum fruit maturation stages, and five genes with FPKM � 50 were selected for further analysis. From this we discovered that four bHLH-encoding genes were evidently up-regulated during fruit ripening, and the orthologues of a bHLH-encoding gene (CL1763Contig1) were identified in Capsicum annuum (EU046275.1), Nicotiana tomentosiformis (XM_009588892.2) and so on. Interestingly, all the genes were significantly correlated with several structural genes, including 4CL, CHS, CHI, F3'5'H, and DFR, indicating a possible link between bHLH and the structural genes in L. ruthenicum. Yan et al. [38] speculated that LrAN11, encoding a WD40 repeat protein, might be involved in the regulation of anthocyanin biosynthesis in L. ruthenicum. Our results indicated that CL30472Contig1 encoding LrAN11 (KY131959) accumulated at higher levels in the later stages of ripening, and showed a significant correlation with structural genes, such as a 4CL gene (CL32840Contig1, r = 1.000 �� ), a CHI gene (CL25937Contig1, r = 0.998 � ), and an F3'5'H gene (CL6842Contig1, r = 1.000 �� ). Further analysis is required to verify whether the identified candidate genes are related to anthocyanin biosynthesis in L. ruthenicum.
In addition to the TFs mentioned above, TFs such as WRKY, NAC, bZIP, and MADS have also been implicated in anthocyanin biosynthesis in plants [61][62][63][64]. In our study, several genes, including NAC, WRKY, bZIP, and MADS, were identified (S3 Table). Among them, two NAC genes (CL1987Contig1 and CL28914Contig1) and two WRKY genes (CL11403Con-tig1 and CL16938Contig1) were positively correlated with four and five structural genes, respectively. Furthermore, two bZIP genes (CL29599Contig1 and CL39763Contig1) and two MADS genes (CL6945Contig1 and CL22157Contig1) were upregulated and significantly positively correlated with six structural genes, implying that they might directly modulate the transcription of structural genes. Interestingly, almost all of the putative TFs were assembled to cluster 3 with increasing expression changes, and all of the TFs were highly correlated with the anthocyanin accumulation (Table 3). From these results, we believed that the high expressions of TFs were coupled with the anthocyanin biosynthesis, eventually leading the fruits to display a purple color.Based on the above results, an anthocyanin biosynthesis pathway in L. ruthenicum fruit was inferred (Fig 7). Briefly, P-coumaric acid is generated following the negative regulation of PAL and the stable regulation of C4H. Next, 4CL plays a key role in the generation of P-coumaroyl-CoA. Then, under the regulation of CHS, CHI, F3H, F3'H, F3'5'H, DFR, ANS and UFGT, anthocyanin is generated. The anthocyanin transcription factors including MYB, MYB-related, bHLH, NAC, WRKY, bZIP, MADS, and WD40 play roles in the regulation of anthocyanin structural genes expression, although the relationship between these TFs and anthocyanin biosynthesis requires further experimental validation.

Conclusions
In this study, we evaluated the transcriptome of L. ruthenicum during fruit development and identified a variety of processes associated with different stages of fruit maturation. There were substantial transcriptomic variations between the different ripeness stages that were correlated with differences in anthocyanin accumulation. Out of a total of 43,573 assembled unigenes, 12,734 unigenes were differentially expressed during fruit ripening in L. ruthenicum. Candidate genes related to anthocyanin biosynthesis, including structural genes and TFs, were identified, but further analysis is required to confirm their role in anthocyanin biosynthesis. This study is the first comprehensive study using transcriptomic techniques to investigate how anthocyanin biosynthesis responds to fruit maturation in L. ruthenicum. The results in this study paved groundwork for further functional study on anthocyanin-related genes that will ultimately decipher the mechanism underlying persistentlyhigh anthocyanin contents in L. ruthenicum, and such knowledge can also be useful for the other berries.  Table. Correlation analysis of the TFs and structural genes involved in anthocyanin biosynthesis. ' � ' means significant correlation at P-value < 0.05 (2-tailed), and ' �� ' means significant correlation at P-value < 0.01 (2-tailed). ' � ' and ' �� ' were indicated with the red tag. TFs which had significant correlation with five or more structural genes were showed with the yellow tag, and the blue tag indicates the TFs correlated with none of the structural genes. (XLSX) S4