Understanding the molecular mechanisms underlying the effects of light intensity on flavonoid production by RNA-seq analysis in Epimedium pseudowushanense B.L.Guo

Epimedium pseudowushanense B.L.Guo, a light-demanding shade herb, is used in traditional medicine to increase libido and strengthen muscles and bones. The recognition of the health benefits of Epimedium has increased its market demand. However, its resource recycling rate is low and environmentally dependent. Furthermore, its natural sources are endangered, further increasing prices. Commercial culture can address resource constraints of it.Understanding the effects of environmental factors on the production of its active components would improve the technology for cultivation and germplasm conservation. Here, we studied the effects of light intensities on the flavonoid production and revealed the molecular mechanism using RNA-seq analysis. Plants were exposed to five levels of light intensity through the periods of germination to flowering, the flavonoid contents were measured using HPLC. Quantification of epimedin A, epimedin B, epimedin C, and icariin showed that the flavonoid contents varied with different light intensity levels. And the largest amount of epimedin C was produced at light intensity level 4 (I4). Next, the leaves under the treatment of three light intensity levels (“L”, “M” and “H”) with the largest differences in the flavonoid content, were subjected to RNA-seq analysis. Transcriptome reconstruction identified 43,657 unigenes. All unigene sequences were annotated by searching against the Nr, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. In total, 4008, 5260, and 3591 significant differentially expressed genes (DEGs) were identified between the groups L vs. M, M vs. H and L vs. H. Particularly, twenty-one full-length genes involved in flavonoid biosynthesis were identified. The expression levels of the flavonol synthase, chalcone synthase genes were strongly associated with light-induced flavonoid abundance with the highest expression levels found in the H group. Furthermore, 65 transcription factors, including 31 FAR1, 17 MYB-related, 12 bHLH, and 5 WRKY, were differentially expressed after light induction. Finally, a model was proposed to explain the light-induced flavonoid production. This study provided valuable information to improve cultivation practices and produced the first comprehensive resource for E. pseudowushanense transcriptomes.

The effects of light are likely to be mediated through the upregulation of the expression of genes involved in the secondary metabolite biosynthesis. For example, light can promote the upregulation of genes involved in the biosynthesis and accumulation of flavonoids in Catharanthus roseus and Ligustrum vulgare [18,19]. In the study of Azumaet [20], light treatment led to induced higher expression levels of CHS, CHI, F3H, flavonoid 3',5'-hydroxylase (F3'5'H), DFR, O-methyltransferase (OMT) as well as UFGT compared to dark grown berries. Pacheco [16] reported that Piper aduncum grown under 50% natural light irradiance had higher PAL expression than others. Leyva [21] also found that the regulation of CHS was up with the increased light intensity in Arabidopsis thaliana.
Based on the information described above, we hypothesize that (1) the accumulation of flavonoid is induced by light in an intensity dependent manner; (2) the induction is mediated by the differential expression of genes involved in the biosynthesis of the active components, flavonoids. To test this hypothesis, we first treated the plants with different light intensity levels. Second, we determined the abundance of the flavonoid contents with HPLC. Third, we compared the flavonoid abundance against the light intensity to identify the optimal levels. Forth, we selected plant materials treated at three levels with lowest, middle and highest levels of flavonoids for RNA-seq analysis. Fifth, analysis of the RNA-seq results identified genes involved in flavonoid biosynthesis and differential expressed genes (DEGs) between different light treatment groups. Last, models were proposed to explain the light-induced flavonoid accumulation.

Plant materials and growth conditions
Ninety 2-year-old healthy E. pseudowushanense plants were collected from Lei Shan County (16˚N, 108˚E) in Guizhou Province. The plants were transferred to plastic pots (10 cm × 10 cm for inner diameter and height, 1 plant per pot) filled with a substrate mixture of 75% peat and 25% vermiculite, and then placed in the greenhouse of the Institute of Medicinal Plant Development on March 1, 2015. The plants were randomly subjected to radiation with five level I1 (5.5 ± 2.5 μmolÁ m −2 Ás −1 ), I2 (14.5 ± 2.5μmolÁ m −2 Ás −1 ), I3 (18.2 ± 2.5 μmolÁ m −2 Ás −1 ), I4 (54.6 ± 2.5 μmolÁ m −2 Ás −1 ), and I5 (90.9 ± 2.5μmolÁ m −2 Ás −1 ) light intensities for 16 h per day (T5-fluorescent lamps were used as the light resource, and there were 30 pots per level). A 20-21˚C temperature range was set for entire cultivation, and humidity was maintained at 60%. Except for the light intensity, the other culture conditions are same at each pot. To control the light intensity is the same for all plants in each light treat level, the thin paper were used which eliminated the effect of light from outside. The light conditions were confirmed by Li-6400 external quantum sensor (LI-COR, Lincoln, NE, USA) system. After treatment for 30 days, the plants in each group were further divided into three subgroups with 10 plants each. Fresh leaves from plants belonging to the same subgroups were randomly collected, pooled, and then stored in liquid nitrogen until use.

Profiling of chemical compositions using HPLC
E. pseudowushanense leaf powder (200 mg) was passed through a No. 3 pharmacopoeia sieve (Each treatment group had 30 plants, they were divided into three sub groups, with 10 plants. The sub group leaves were mixed and each treatment group had 3 biological replications) and then extracted with 50 mL of 70% EtOH by ultrasonication at room temperature for 30 min. The solution was passed through a 0.45 μm microfiltration membrane, and a 20 μL aliquot of the filtrate was injected into HPLC for analysis. HPLC separation was performed on a Zorbax SB-C18 column (Agilent Technologies, Palo Alto, CA, USA) (5 μm, 250 mm × 4.6 mm). Eluents A and B were water and acetonitrile, respectively. The gradient elution program was as follows: 0-17 min (25%-26% B) and 17-26 min (26%-100% B). The column was washed with 100% eluent B for 15 min between every two testing samples and then re-equilibrated with 25% eluent B for 10 min. The elution was performed under the following conditions: flow rate, 1.0 mL/min; column temperature, 25˚C; and detection wavelength, 270 nm. Data processing was performed using PerkinElmer ChemStation software (version 6.3.1).

RNA isolation and quantification
For RNA-seq experiments, plant samples from two subgroups of each treatment group were subjected to total RNA extraction using the RNAprep Pure Plant Kit (Polysaccharides and Polyphenolics-rich) (Cat No. DP441, TianGene, China). RNA degradation and contamination were monitored using GeneGreen-stained 1% agarose gels, and RNA purity was determined using a NanoPhotometer 1 spectrophotometer (IMPLEN, Westlake Village, CA). RNA concentration was measured using Qubit 1 RNA Assay Kit in Qubit 2.0 Fluorometer (Life Technologies, Foster City, CA), and RNA integrity was assessed using the RNA Nano 6000 Assay Kit of a Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA).

RNA-seq library construction and sequencing
The sequencing libraries were constructed using the NEBNext 1 Ultra™ RNA Library Prep Kit for Illumina (NEB, USA) in accordance with the manufacturer's protocol. In brief, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent captions under elevated temperature in the NEBNext First-Strand Synthesis Reaction Buffer (5×). First-strand cDNA was synthesized using a random hexamer primer and M-MuLV Reverse Transcriptase (RNase H). Subsequently, second-strand cDNA was synthesized using DNA Polymerase I and RNase H. The remaining overhangs were converted into blunt ends via exonuclease/polymerase activities, and the enzymes were removed. After adenylation of 3 0 ends of DNA fragments, NEBNext Adaptor with a hairpin loop structure was ligated to the cDNA fragments, which were then purified, end-repaired, A-tailed, and then ligated to index adapters (NEB). The templates were amplified by PCR and then sequenced on an Illumina Hiseq™ 2500 platform, which led to the generation of 125 bp pairedend reads. Data analysis and base calling were performed using Illumina instrument software. DNA sequencing was performed at Beijing Ori-Gene Science and Technology Co., Ltd. Raw data had been deposited in the Short Read Archive of GenBank with the accession numbers: xxx (to be provided).

De novo assembly and function annotation
Raw sequencing reads were processed with SolexaQA (http://solexaqa.sourceforge.net/) to filter out low-quality reads with default parameters and short reads with length 60 bp. The resulting high-quality RNA-seq data from the libraries were assembled using the computer program Trinity [22]. In case several transcripts were identified for the same gene, the longest transcript was selected as the representative sequence of the gene and will be called unigene sequence in the following text. For functional annotation, all unigene sequences were searched against several databases, including the NCBI non-redundant protein sequences (Nr, ftp://ftp. ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz), Gene Ontology (GO http://www.geneontology. org/), Swiss-Prot/Trembl (http://www.uniprot.org/), Pfam (http://pfam.xfam.org/), and Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg/), by using the program BLASTX with E value 1e −5 and percentage of similarity ! 30%.

Gene expression quantification and differential gene expression analysis
To estimate the abundance of the transcripts, all transcripts assembled by Trinity were treated as the reference sequences. The clean reads were then mapped to the reference sequences using TopHat (version 2.0.10, http://tophat.cbcb.umd.edu/) with default parameters. The program Cuffdiff (version 2.2.1,(http://cuffdiff.cbcb.umd.edu/) was used to calculate the expression levels of genes and transcripts in terms of reads per kilobases per million reads (RPKM) and the p-value for differentially expressed genes (DEGs) based on two-tailed unpaired Student's t-test. Genes with the number of mapped reads ! 10, fold change ! 2, and uncorrected p 0.05 were deemed significant DEGs.

Enrichment analysis
GO enrichment analysis was conducted using GOseq [23]. We identified the significantly enriched GO term of DEGs with corrected p 0.05. For KEGG analysis, we used the KEGG pathway as a unit and applied the hyper geometric test to find significantly enriched pathways [24]. We identified the significantly enriched KEGG pathway of DEGs with corrected p 0.05.

Validation of RNA-seq experiments
The RNA samples used for RNA-seq analyses were subjected to reverse transcription quantitative real-time PCR (RT-qPCR) analysis. Each experiment was conducted with three technical replicates. For each sample, reverse transcription was performed on 1 μg total RNA by Trans-Script One-Step gDNA Removal and cDNA Synthesis SuperMix (TransGen) in a 20 μl volume with anchored oligo(dT)18 primer. The reaction was carried out at 42˚C for 15 min and 80˚C for 5 s using an ABI 7500 Fast instrument (Applied Biosystems). Gene-specific primers were designed using PrimerQuest (http://www.idtdna.com/Primerquest/Home/Index). The primers used in this study are listed in S1 Table. The actin gene was chosen as the endogenous control. Each qPCR reaction contained 10 μL of 2× TransStart 1 Top Green qPCR SuperMix (TransGen), 25 ng of cDNA sample, and 200 nM gene-specific primers in a final volume of 20 μL. The cycling conditions were 94˚C for 30 s, followed by 40 cycles of 94˚C for 5 s and then 60˚C for 34 s. Melting curve analyses were performed to verify the specificity by ABI 7500 Fast instrument. The relative expression levels were calculated using the 2 -ΔΔCt method [26].

Sequence analysis
For selected proteins, homologous sequences were retrieved from Genbank with an E value cutoff of 1e −5 . The sequences were then aligned with ClustalW software. Phylogenetic trees were constructed using the neighbor-joining algorithm with MEGA 7.0. The bootstrap score was calculated based on 1000 replications.

Statistical analysis
Correlation coefficients among flavonoid contents, gene expression levels of related enzymes, and transcription factors were calculated using Excel. All values are presented as the mean standard error of the mean. Statistical significance of differences was evaluated using Student's t-test or ANOVA in SPSS10 software. The significance of pearson correlation was calculated as described by VassarStats (http://www.vassarstats.net/).

Effects of light intensities on flavonoid content
The methodology validated in our previous study was applied to analyze the flavonoid content by HPLC at five light levels [27]. Fig 1 shows the changes in the contents of four different flavonoid glycosides in E. pseudowushanense under different light intensities. Interestingly, epimedin A showed different changes from epimedin B, epimedin C and icariin at I4 and I5 treatments. Epimedin A content increased as light intensity increased from I1 to I5. Thus, I5 increased epimedin A by 360.6% (p<0.05) compared with I1. Furthermore, epimedin B, epimedin C and icariin amounts showed similar changes. Epimedin B, epimedin C and icariin increased when light intensity increased from I1 to I4, whereas all decreased under I5. The highest epimedin B, epimedin C and icariin contents were observed under I4. Epimedin B, epimedin C and icariin contents were 421.9% (p<0.05), 624.0% (p<0.05) and 659.9% higher, respectively, than under I1.

RNA-seq analysis of E. pseudowushanense treated with different light intensities
In order to explore the molecular mechanism of light-induced flavonoid synthesis and accumulation in E. pseudowushanense, six cDNA libraries constituting two biological repeats were constructed from three treatment groups which the flavonoid contents were found most significantly different (i.e., the low I1, the middle I3 and the high I4) (Fig 1) and sequenced using Illumina high-throughput sequencing platform. Six samples were named L1, L2 (I1); M1, M2 (I3); H1 and H2 (I4). The RNA-seq results were summarized in Table 1. For the six samples, the total number of raw reads ranged from 45 to 55 million. After removing the adapters, lowquality sequences, and reads shorter than 35 bp, the numbers of clean reads were 40.3, 27.3, 27.0, 28.9, 32.9, and 32.9 million for the six samples, respectively ( Table 1). All the clean reads were combined and then assembled into 57,962 contigs by using Trinity. For each unigene, the longest transcript (in case of multiple transcripts) was selected as the representative and was called "unigene sequence." A total of 43,657 unigene sequences with lengths ranging from 224 bp to 17,683 bp, with an average length of 837 bp and an N50 of 1383 bp, were obtained (S1 File, S2 Table). To assess the quality of our assembly, the clean reads were mapped to unigenes. The ratios of all mapped reads ranged from 80.27% to 90.38%, whereas the ratios of uniquely mapped reads ranged from 71.67% to 83.44% (S3 Table).
We then examined the length distribution of these unigene sequences. In contrast to 12,127 unigene sequences that were longer than 1000 bp, 18,591 unigene sequences had lengths between 200 and 400 bp (Fig 2). We then compared our assembly with those for E. sagittatum based on information provided in the manuscript. Our assembled transcript was, on average, 1.23 times longer than those of E. sagittatum. Moreover, the number of genes was 15.3% greater than that of the E. sagittatum dataset.
To determine the potential functions of these unigene sequences, they were searched against the databases Nr, Nt, Trembl, Swiss-Prot, and Pfam by using BLAST with an E value cutoff of 1e −5 . The ratios of annotated unigene sequences ranged from 35% to 61% (S4 Table). Among the 43,657 unigenes, 25,989 (59.5%) and 15,441 (35.4%) had at least one significant match with an E value 1e −5 against the Nr and Nt databases. The mapping rates of unigene sequences to the Swiss-prot, Trembl, and Pfam protein databases were 45.2%, 60.1%, and 47.8%, respectively. In terms of the species source of top hit sequences, sequences from Nelumbo nucifera represented 36.5% of the top hits of our unigene sequences, followed by Vitis vinifera (10.7%), Ricinus communis (3.4%), Theobroma cacao (2.9%), Jatropha curcas

Functional classification of unigenes
We mapped the transcripts to GO terms and KEGG pathways; 23,553 unigene sequences were assigned GO terms. These terms belong to 57 functional groups, which were distributed under three main categories: molecular function, biological process, and cellular component (Fig 3).

Determination of gene abundance and identification of differentially expressed genes
The abundance of unigene sequences was quantified using the program cuffdiff and represented by FPKM [28].

Functional enrichment analysis of DEGs
To further narrow down the genes that are involved in light-induced flavonoid biosynthesis, the DEGs were first mapped to GO terms. The distribution of mapped GO classifications is shown in S2 Fig. The details for the mapping can be found in S8 Table. The most mapped terms of DEGs for the categories of biological process and cellular component were "defense response" and "integral component of membrane." In the category of molecular function, the most mapped term for DEGs between groups L and M was "Metal ion binding." By contrast, the most mapped term for DEGs between groups M and H was "ATP binding." Furthermore, the most mapped term between groups L and H was "protein serine/threonine kinase activity." In parallel, the DEGs were mapped to KEGG pathways. The most enriched pathways between groups L and M included a two-component system in environmental information processing and signal transduction (22 DEGs), phenylpropanoid biosynthesis (14 DEGs), and glyoxylate and dicarboxylate metabolism in carbohydrate metabolism (14 DEGs) (S9 Table). By contrast, the most enriched pathways between groups M and H included starch and sucrose metabolism (39 DEGs), amino sugar and nucleotide sugar metabolism (27 DEGs), and phenylpropanoid biosynthesis (20 DEGs) (S9 Table). The results confirmed that light-induced flavonoid accumulation is mediated through the increased expression levels of genes involved in the biosynthesis of phenolic acids and flavonoids. Furthermore, a dose-response relationship exists between light intensity and gene expression levels. Optimal light intensity and molecular mechanism of Epimedium pseudowushanense B.L.Guo  Optimal light intensity and molecular mechanism of Epimedium pseudowushanense B.L.Guo 3.6. Enzyme genes involved in the biosynthesis of active compounds in E. pseudowushanense The flavonoid pathway can be divided into three pathways leading to the production of anthocyantin, proanthocyanin, and flavonol, respectively [29]. Basing on the structural characteristics of the compounds, we proposed a pathway for the biosynthesis of flavonoids in E. pseudowushanense (S3 Fig). In this proposed pathway, L-phenylalanine is first converted to trans-cinnamic acid by phenylalanine ammonia-lyase (PAL, EC: 4.3.1.24) and subsequently to p-coumaric acid by trans-cinnamate 4-hydroxylase (C4H, EC: 1.14.13.11). p-Coumaric acid can be converted into p-coumaroyl-CoA by 4-coumarate-CoA ligase (4CL, EC: 6.2.1.12) and then catalyzed by chalcone synthase (CHS, EC: 2.3.1.74), chalcone isomerase (CHI, EC: 5.5.1.6), and flavanone 3-hydroxylase (EC: 1.14.11.9). As the product of these steps, dihydrokaempferol can be further converted into kaempferol by flavonol synthase (FLS, EC: 1.14.11.23), which is then converted to prenyl-flavonoids such as icariin by UGT, OMT, and some unknown methoxy transferase and isopentenyl transferase. Alternatively, kaempferol can be either hydroxylated by flavonoid 3 0 hydroxylase (EC: 1.14.13.21) to produce dihydroquercetin. Furthermore, kaempferol can be converted successively by dihydroflavonol 4-reductase (DFR, EC: 1.1.1.219) and leucoanthocyanidin dioxygenase (EC: 1.14.11.19) to generate anthocyanin. Optimal light intensity and molecular mechanism of Epimedium pseudowushanense B.L.Guo In our study, 21 unique sequences that encode 14 enzyme families involved in the flavonoid biosynthetic pathways were identified ( Table 2). The short and full gene names were shown. A prefix "epps" was added to the short gene name to indicated that it is derived from E. pseudowushanense. Multiple sequence alignments of the identified proteins and their homologous sequences were conducted to determine if the full-length sequences have been obtained. Furthermore, phylogenetic trees were constructed to examine the relationship of the following proteins: PAL (S4

Correlation between the expression profiles of biosynthetic genes and flavonoid contents
To determine which flavonoid biosyntheis genes were most strongly induced by light, we examined the differential expression of these genes as well as the correlation of the expression profiles of these genes and those of the flavonoid contents across the three treatment conditions ( Table 3). The expression level of FLS was upregulated between groups L and M. The expression levels of CHS and FLS were upregulated between groups L and H. Lastly, the Table 2 Optimal light intensity and molecular mechanism of Epimedium pseudowushanense B.L.Guo expression levels of C4H, CHI, and FLS were upregulated, whereas that of caffeoyl-CoA Omethyltransferase (COMTEC: 2.1.1.104) was downregulated between groups M and H. The expression profiles of four of the twenty-one unigenes were found to be highly correatled with those of the flavonid contents, with pearson correlation coefficents ! 0.9. In summar, the genes FLS, CHS, C4H and CHI seemed to be most strongly associated with the light-induced flavonoid accumulation.

Unigene_id Short Gene name KEGG Pathway ID (KO) Full Gene Name EC number
To study the co-expression patterns of these genes, we performed hierarchical clustering of the expression profiles of these 21 flavonoid biosynthesis genes (Fig 8). Three main clusters were readily discernable, the first cluster contained 15 genes that showed the highest expression levels in the "H" group. The second cluster contain 3 genes that were expressed at the highest levels in the "M" group. The remaining three genes belonged to the third cluster and had the highest expression levels in the "L" group. Genes in the cluster I have expression profiles that were better corrleated with the flavonoid contents comparing to those of the cluster II and III. It should be pointed out that the four genes FLS, CHS, C4H and CHI all belonged to cluster I.

RT-qPCR validation
To validate the RNA-seq data, 15 genes were selected and subjected to RT-qPCR analysis. These genes include the 2 genes that are highly correlated and differentially expressed, two genes (TR2108|c0_g1, TR575|c1_g1) belonged to the PAL family, one gene (TR6321|c0_g1) TR9038|c0_g1, the expression profiles determined by RNA-Seq experiments correlated well with those obtained from RT-qPCR experiments for 11 out of 15 (73.3%) genes with pearson correlation coefficients (r) > 0.9 (S10 Table). And 15 of 16 pairs of expression profiles were found to be significantly correlated (p < 0.05). This finding suggests that the results of our RNA-seq experiments are reliable (Fig 9).
To see if any correlations existed between flavonoid content and expression patterns of the flavonoids biosynthesis genes, we analyzed transcript abundance of four related genes (C4H, CHS1, FLS, and CHI) by RT-qPCR from 5 different light intensity described in Fig 1. The relative expression level of the four genes showed similar changes from I1 to I5 light intensity. The expression of four related genes under I1 and I2 are lower than I3 to I5. Interestingly the changes of CHS1 and FLS showed similarly changes to epimedin B, epimedin C and icariin. The I4 treatment showed highest expression level at C4H, CHS1 and FLS while the I5 treatment showed highest expression level at CHI gene (Fig 10).

Transcriptional factors involved in the light-induced flavonoid accumulation in E. pseudowushanense
To understand how the expression of genes involved in flavonoid biosynthesis was regulated in response to light, we first identified all unique sequences encoding the transcription factors in our RNA-seq dataset by comparing to sequences in the plant transcription factor database using BLAST with an E value cutoff of 1e −5 (S11 and S12 Tables). We identified 4621 unigene sequences that likely encode transcription factors. The lengths of unigene sequences representing these transcription factors varied from 224 to 13,144 bp, with an average length of 1241.5 bp and an N50 value of 1863 bp. The length distribution of these putative transcriptional factor genes is shown in S18 Fig. In terms of types, the identified transcription factors were distributed in 59 families, such as C3H, bHLH, FAR1, WRKY, NAC, MYB-related, and so on Optimal light intensity and molecular mechanism of Epimedium pseudowushanense B.L.Guo (S19 Fig). The differentially expressed transcription factors after light treatment mainly belong to the families FAR1, WRKY, bHLH, and MYB-related families (S13 Table).
To select further the transcription factors that are involved in the light-induced flavonoid accumulation, we first collected the sequences of all transcription factors from Arabidopsis thaliana, Oryza sativa, V. vinifera, and E. sagittatum, based on (1) similarity to known transcription factors involved in flavonoid biosynthesis; (2) p value for differential gene expression in any contrast group; and (3) correlation between gene expression profiles and flavonoid contents. Transcription factors including 31 FAR1, 17 MYB, 12 bHLH, and 5 WRKY are likely involved in light-induced flavonoid accumulation (S11 Table).

Light signalingl factors involved in the light-induced flavonoid accumulation in E. pseudowushanense
To select the light signaling factors that are most likely involved in the light-induced flavonoid accumulation, we collected them, based on (1) similarity to known light signal factors involved in flavonoid biosynthesis; (2) p value for differential gene expression in any contrast group; and (3) correlation between gene expression profiles and flavonoid contents(>|0.9|). Light signal factors including 3 COP1, 1 pif, 1 HY5, 1 SPA, 1 DET, 3 phy and 3 cry are likely involved in light-induced flavonoid accumulation (S14 Table).

Enzymatic genes involved in flavonoid biosynthesis
Previous studies demonstrated that light treatment of grape and kale could influence gene expression, leading to the accumulation of specific flavonol glycosides [28,30]. Further studies in grape berries reported that flavonol levels are sensitive to changes in light conditions; flavonols accumulate with increased expression of FLS [31][32][33]. These studies suggest that the expression levels of genes involved in flavonoid biosynthesis are regulated by light. In the present study, we found that C4H, CHS, CHI, and FLS were all upregulated under the different light treatments, partially explaining the light-induced flavonol accumulation in E. pseudowushanense. Optimal light intensity and molecular mechanism of Epimedium pseudowushanense B.L.Guo

Transcription factors involved in light-induced flavonoid biosynthesis
Transcription factors regulate the secondary metabolite biosynthesis and accumulation of flavonoids. Several families of transcription factors play roles in the production of flavonol compounds. Qiu et al. [34] identified a WRKY protein (OsWRKY13) as a transcriptional regulator of flavonoid biosynthesis in O. sativa, which could induce the expression of CHS. WRKY transcription factors are defined by the presence of the DNA-binding domain WRKY. The identified WRKY genes are significant regulators involved in plant developmental processes and responses to biotic and abiotic signals [35]. The inducible expression patterns of WRKY genes suggest that they are involved in the regulation of plant secondary metabolis [36].
As for flavonol biosynthesis, several specific regulators belonging to the MYB transcriptional factor family have been identified in model species. MYB proteins are characterized by the presence of one or many MYB repeat (R) DNA-binding domains. In A. thaliana, AtMYB12 activates the expression of AtFLS and AtCHS [37]. In grape, VvMYBF1, orthologous to AtMYB12, markedly upregulated the expression levels of VvFLS and VvCHI [38]. In E. sagittatum, some MYB members have been isolated and characterized, among which EsMYBF is homologous to AtMYB12 that is related to flavonol synthesis [30,39]. In grape, light induces the expression of an array of MYB transcription factors, such as VvMYBF1 and VvMYB12, which are positive regulators of the general flavonoid biosynthesis pathway as well as those specifically responsible for flavonol biosynthesis [31,40]. MYB transcription factors can directly and specifically interact with MYB recognition element (MRE). MRE is part of the light regulatory unit, which also contains bZIP recognition element (ACE). MREs can be found in the promoter regions of light-induced structural flavonoid genes, such as CHS and FLS in Arabidopsis and grapevine [41,42].
The expression levels of these MYB are also regulated by other transcription factors, such as Elongated Hypocotyl 5 (HY5). HY5 is a bZIP transcription factor that can promote photomorphogenesis [43] by recognizing ACE. In particular, HY5 has been linked to the activation of MYB and key structural genes (CHS and FLS) of the flavonoid pathway as well as the accumulation of flavonoids in response to light in Arabidopsis [44][45][46][47].
Located further upstream of the regulatory pathway, HY5 is a direct target of RING-fingertype ubiquitin E3 ligase Constitutive Photo-morphogenic 1 (COP1). COP1 acts as a negative regulator of light signaling directly downstream of the photoreceptors and controls different light-regulated plant development processes by adjusting its subcellular localization. In the presence of light, the interaction of the COP1/Suppressor of PhyA (SPA) complex with activated photoreceptors inhibits COP1/SPA function through the dissociation of COP1 from the complex and exportation from the nucleus. The downregulation of COP1 in the nucleus allows nuclear-localized transcription factors, such as HY5, to accumulate and induce the expression of genes responsible for flavonoid biosynthesis [48].
Aside from the transcription factors described above, other important classes of transcriptional factors that might be involved in flavonoid biosynthesis include the Far-red impaired Response 1 (FAR1) and Far-Red Elongated Hypocotyl 3 (FHY3) families [49]. FAR1 and FHY3 participate in diverse developmental and physiological processes and are essential for PhyA signaling in A. thaliana [50][51]. HY5 physically interacts with FHY3/FAR1 through their respective DNA binding domains in A. thaliana [52].

Other pathways related to light-induced flavonoid accumulation
Enrichment analysis showed that DEGs are significantly enriched for those involved in the two-component regulatory system, suggesting that this pathway might be involved in lightinduced flavonoid accumulation. A two-component regulatory system is a basic stimulus-response coupling mechanism to allow organisms to sense and respond to changes in different environmental conditions [53]. Two-component systems typically consist of a membranebound histidine kinase that senses a specific environmental stimulus and a corresponding response regulator that mediates the cellular response, mostly through the differential expression of target genes [54]. Two-component regulatory systems are also commonly found in plants. How this system is involved in light-induced flavonoid accumulation in E. pseudowushanense represents an interesting research question in the future.

Model proposed
To date, the mechanism by which light induces the biosynthesis of specific flavonoids in Epimedium is unknown. However, analysis of our transcriptome data implies that the mechanism of flavonoid accumulation in E. pseudowushanense is rather complex. Basing on previous studies, we proposed a model explaining light-induced flavonoid accumulation (Fig 11). In this model, light signals are received either by photoreceptors such as phytochrome or the twocomponent regulatory system through downstream signaling pathways, leading to the upregulation of genes involved in flavonoid biosynthesis and ultimately resulting in the accumulation of these compounds. This model will serve as a central hypothesis for the light-induced flavonoid biosynthesis that will be tested in the future.

Conclusions
This study represents the first comprehensive investigation of the genetic makeup responsible for the flavonol biosynthesis in E. pseudowushanense. Firstly, we find I4 light intensity is optimal for flavonoid ingredient accumulation. Then, we identified 43,657 unigene sequences in E. pseudowushanense from samples treated with light at three intensity levels by using RNA-seq technology. We determined the full-length sequences of 21 enzymatic genes involved in the flavonol biosynthesis. Among them, the FLS, CHS1 genes were strongly associated with lightinduced flavonoid accumulation. We also found 65 transcription factors, including 31 FAR1, 17 MYB-related, 12 bHLH, and 5 WRKY, which might participate in light-induced flavonoid accumulation. A model was proposed to explain the underlying molecular mechanism. This work provides valuable resources for further studies on flavonoid production in Epimedium. These information can help us to know why the flavonoid content changed under different light conditions. Besides in vitor experiments could be conduct to examine the fouction of FLS and CHS1 under diferent light intensities.
Supporting information S1