Conserved amino acid residues and gene expression patterns associated with the substrate preferences of the competing enzymes FLS and DFR

Background: Flavonoids, an important class of specialized metabolites, are synthesized from phenylalanine and present in almost all plant species. Different branches of flavonoid biosynthesis lead to products like flavones, flavonols, anthocyanins, and proanthocyanidins. Dihydroflavonols form the branching point towards the production of non-colored flavonols via flavonol synthase (FLS) and colored anthocyanins via dihydroflavonol 4-reductase (DFR). Despite the wealth of publicly accessible data, there remains a gap in understanding the mechanisms that mitigate competition between FLS and DFR for the shared substrate, dihydroflavonols. Results: An angiosperm-wide comparison of FLS and DFR sequences revealed the amino acids at positions associated with the substrate specificity in both enzymes. A global analysis of the phylogenetic distribution of these amino acid residues revealed that monocots generally possess FLS with Y132 (FLSY) and DFR with N133 (DFRN). In contrast, dicots generally possess FLSH and DFRN, DFRD, and DFRA. DFRA, which restricts substrate preference to dihydrokaempferol and was previously believed to be unique to strawberry species, is found to be more widespread in angiosperms and has evolved independently multiple times. Generally, angiosperm FLS appears to prefer dihydrokaempferol, whereas DFR appears to favor dihydroquercetin or dihydromyricetin. Moreover, in the FLS-DFR competition, the dominance of one over the other is observed, with typically only one gene being expressed at any given time. Conclusion: This study illustrates how almost mutually exclusive gene expression and substrate-preference determining residues could mitigate competition between FLS and DFR, delineates the evolution of these enzymes, and provides insights into mechanisms directing the metabolic flux of the flavonoid biosynthesis, with potential implications for ornamental plants and molecular breeding strategies.

From the gene expression analysis (Additional File 9), we distilled five important observations. Firstly, we observed diverse substrate preferences among multiple DFR/FLS copies, as seen in nodulation [1]. They were found to have a greater gene expression ratio of FLS to DFR, while others like Vitis vinifera with a high anthocyanin content [2] have a higher DFR to FLS expression ratio. Fourthly, the presence of an alternate pigmentation pathway may also influence the DFR expression levels. The betalain-pigmented Beta vulgaris does not accumulate anthocyanins [3] and hence shows no DFR expression. Similar patterns were observed in some high-carotenoid species like Solanum lycopersicum and Carica papaya, which exhibited minimal to no DFR gene expression. Lastly, we observed that the expression of the same FLS/DFR type differed between samples of the same plant tissue with different phenotypes. For example, Vigna unguiculata green fruit showed lower expression of DFRN, DFRD, F3'H, and F3H compared to red fruit, indicating increased substrate availability in red fruit. Despite this, FLS expression remained constant in both red and green fruit. This pattern was consistent in other plants, namely in the white and red fruit of Fragaria vesca, the green and red leaf of Lactuca sativa, and the white, red, and orange flowers of Dianthus caryophyllus, suggesting a complex regulation mechanism. Previous studies suggested feedback mechanisms in flavonoid biosynthesis regulation [4][5][6]. Another possible explanation could be the influence of anthocyanin biosynthesis transcription factors like the MBW complex [7]. Surprisingly, in Theobroma cacao, green fruit exhibited higher expression of F3H, F3'H, and DFRN compared to mauve fruit. This apparent disparity between gene expression and observed phenotype may be explained by the need for anthocyanin biosynthesis genes to be expressed before the anthocyanin pigmentation can become visible. Previous studies have reported that the accumulation of flavonols/anthocyanins may not always be directly correlated with the expression levels of FLS/DFR [8,9]. A likely explanation is that the accumulation of anthocyanins is preceded by the expression of the required biosynthesis genes [10,11], i.e., enzymes need to be available for anthocyanin production.

Method
The accession numbers of different tissues (leaf, stem, flower, root, seed, and fruit) for the 43 species were retrieved from the SRA (SRA IDs used in this study are available at https://github.com/bpucker/DFR_vs_FLS). This sample selection was further reduced to avoid the inclusion of any specific stress treatments, i.e., the different samples should represent different plant parts under 'normal' conditions. We aimed to assess the expression of a specific gene in a particular sample. The Python script generate_subset.py [12] was deployed to extract an expression data subset from the count table containing exclusively the values belonging to the accession numbers of the desired samples. The transcripts per million (TPMs) belonging to isoforms of a gene and close paralogs were aggregated per RNA-seq sample as previously described [13]. To obtain an overall representation of the expression of a gene, we calculated the median value of the aggregated TPMs across a group of samples (Supplementary Fig 2).
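The subsetting, aggregation, and median steps described above can be illustrated with a minimal sketch. It is not the generate_subset.py script itself: the toy table, the accession IDs, the transcript-to-gene mapping, and all function names are hypothetical, and summation is assumed as the way of aggregating TPMs of isoforms and close paralogs.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy count table: transcript ID -> {SRA accession: TPM}.
tpm_table = {
    "DFR_iso1": {"SRR0001": 10.0, "SRR0002": 4.0, "SRR0003": 200.0, "SRR0099": 7.0},
    "DFR_iso2": {"SRR0001": 2.0, "SRR0002": 1.0, "SRR0003": 50.0, "SRR0099": 1.0},
    "FLS_iso1": {"SRR0001": 30.0, "SRR0002": 25.0, "SRR0003": 20.0, "SRR0099": 3.0},
}
# Hypothetical assignment of isoforms / close paralogs to one gene label.
transcript_to_gene = {"DFR_iso1": "DFR", "DFR_iso2": "DFR", "FLS_iso1": "FLS"}
# Accessions of the selected (non-stressed) samples of one tissue group.
leaf_samples = ["SRR0001", "SRR0002", "SRR0003"]


def subset_table(table, accessions):
    """Keep only the values belonging to the requested accessions."""
    return {t: {a: v[a] for a in accessions if a in v} for t, v in table.items()}


def aggregate_per_gene(table, mapping):
    """Sum the TPMs of all transcripts assigned to the same gene, per sample."""
    gene_tpm = defaultdict(lambda: defaultdict(float))
    for transcript, values in table.items():
        gene = mapping[transcript]
        for accession, tpm in values.items():
            gene_tpm[gene][accession] += tpm
    return {gene: dict(values) for gene, values in gene_tpm.items()}


def median_per_gene(gene_tpm):
    """Median of the aggregated TPMs across all samples of a group."""
    return {gene: median(values.values()) for gene, values in gene_tpm.items()}


subset = subset_table(tpm_table, leaf_samples)
aggregated = aggregate_per_gene(subset, transcript_to_gene)
medians = median_per_gene(aggregated)
print(medians)  # {'DFR': 12.0, 'FLS': 25.0}
```

In this toy example, the aggregated DFR values are 12, 5, and 250 TPM; the single high sample barely affects the median (12), whereas it would pull the mean to 89.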
The median was chosen due to its tolerance towards outliers. To visualize the gene expression patterns, a heatmap was generated using the seaborn package [14] in Python. The heatmap depicted the relative gene expression levels of F3H, F3'H, F3'5'H, FLS, and DFR. Additionally, ratios of F3'H/F3H, F3'5'H/F3H, and DFR/FLS were plotted to provide insights into their relative expression levels and potential interdependencies within the flavonoid biosynthesis.
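As a rough illustration of the ratio calculation, the following sketch derives the three ratios from median TPM values. The input values and the function name are hypothetical, and the pseudocount added to numerator and denominator is an assumption made here to avoid division by zero for genes without detectable expression; the text does not specify how such cases were handled.

```python
def expression_ratio(numerator, denominator, pseudocount=1.0):
    """Ratio of two TPM values; the pseudocount is an assumption of this sketch."""
    return (numerator + pseudocount) / (denominator + pseudocount)


# Hypothetical median TPMs per gene for one tissue group.
median_tpm = {"F3H": 39.0, "F3'H": 79.0, "F3'5'H": 0.0, "FLS": 9.0, "DFR": 49.0}

# Ratio pairs named in the text: (numerator, denominator).
ratio_pairs = [("F3'H", "F3H"), ("F3'5'H", "F3H"), ("DFR", "FLS")]

ratios = {
    f"{num}/{den}": expression_ratio(median_tpm[num], median_tpm[den])
    for num, den in ratio_pairs
}

for name, value in ratios.items():
    print(f"{name}\t{value:.3f}")
```

A DFR/FLS ratio above 1 would point towards the anthocyanin branch in that sample group, and a ratio below 1 towards flavonols. A table of such ratios per species and tissue could then be passed to a plotting function such as seaborn's heatmap.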
and seed. The samples displayed per individual species depend on data availability. The heatmap analysis revealed distinct expression patterns of FLS and DFR in different plants. In Supplementary Fig 1, the monocot Miscanthus sinensis demonstrates a higher F3'H to F3H and DFR to FLS ratio, suggesting a potential predominance of dihydroquercetin utilization by DFR for anthocyanin or proanthocyanidin production. Cotton exhibits multiple FLS and DFR candidates of the FLSH, FLSY, DFRN, and DFRD types. Its FLSY, limited to the Malvales order in the dicot clade, has low gene expression. Arabidopsis thaliana lacks F3'5'H and exhibits relatively low DFR gene expression. The F3H and FLS gene expression follow the same patterns, suggesting a preference for DHK as a substrate. Species with exclusive DFRD (aspartate), like oranges and tomatoes, exhibit low DFR gene expression despite significant F3'H gene expression. In Dianthus caryophyllus, red and orange flowers exhibit significant upregulation in F3H and DFRN expression compared to the white flower, while FLS expression remains relatively constant. Interestingly, F3'H expression is lowest in orange flowers, suggesting that its DFRN is predominantly acting on DHK as substrate, leading to the formation of pelargonidin-based anthocyanins and hence the orange colour. The red flower has high gene expression of both F3H and F3'H, suggesting that DFRN here might be converting both DHK and DHQ, leading to the production of cyanidin- and pelargonidin-based pigments and contributing to the observed red phenotype.

Supplementary Fig 1:
Gene expression heatmap and gene expression ratio analysis of 10 selected species across various tissue samples. White tiles indicate that no gene was discovered for a certain function/type. The heatmap tiles represent the actual expression values in TPMs, with the number of samples indicated on the right. The expression heatmap shows genes associated with the branching of FLS and DFR within the flavonoid biosynthesis. Blue and yellow colors indicate high and low expression levels, respectively (see color bar). On the right, a heatmap illustrates the ratios of gene expression to investigate the substrate hydroxylation pattern in the FLS-DFR branching. Dark and light colors indicate high and low expression levels, respectively, based on the color bar. A full heatmap with all 43 investigated species is available in Additional File 8.
Lotus japonicus and various Gossypium species harboring both DFRN and DFRD, as well as FLSH and FLSY, with DFRN favoring DHK and DFRD preferring DHQ. The FLSY in L. japonicus is an ancestral FLS (aFLS), while the ones in Gossypium are re-emerged FLSY. Interestingly, FLSY in both species exhibited almost negligible expression. Secondly, tissue-specific expression of FLS and DFR was evident, as seen in Fragaria ananassa, where FLS displayed high expression in white flowers while DFR was more highly expressed in red fruits. This tissue-specific expression of FLS and DFR would mitigate competition between both in the same tissue. Thirdly, a preference for flavonols or anthocyanins might also affect the metabolic flux. Legume species like Vigna unguiculata, Phaseolus acutifolius, Glycine max, and Cicer arietinum from the order Fabales are known to produce flavonoids and isoflavonoids since they have a physiological role in