Comparative transcriptomic analysis reveals key components controlling spathe color in Anthurium andraeanum (Hort.)

Anthurium andraeanum (Hort.) is an important ornamental in the tropical cut-flower industry. However, there is currently insufficient information to establish a clear connection between the genetic model(s) proposed and the putative genes involved in the differentiation between colors. In this study, 18 cDNA libraries related to the spathe color and developmental stages of A. andraeanum were characterized by transcriptome sequencing (RNA-seq). For the de novo transcriptome, a total of 114,334,082 primary sequence reads were obtained from the Illumina sequencer and were assembled into 151,652 unigenes. Approximately 58,476 transcripts were generated and used for comparative transcriptome analysis between three cultivars that differ in spathe color (‘Sasha’ (white), ‘Honduras’ (red), and ‘Rapido’ (purple)). A large number of differentially expressed genes (8,324), potentially involved in multiple biological and metabolic pathways, were identified, including genes in the flavonoid and anthocyanin biosynthetic pathways. Our results showed that the chalcone isomerase (CHI) gene presented the strongest evidence for an association with differences in color and the highest correlation with other key genes (flavanone 3-hydroxylase (F3H), flavonoid 3’5’ hydroxylase (F3’5’H)/ flavonoid 3’-hydroxylase (F3’H), and leucoanthocyanidin dioxygenase (LDOX)) in the anthocyanin pathway. We also identified a differentially expressed cytochrome P450 gene in the late developmental stage of the purple spathe that appeared to determine the difference between the red- and purple-colored spathes. Furthermore, transcription factors related to putative MYB-domain protein that may control anthocyanin pathway were identified through a weighted gene co-expression network analysis (WGCNA). The results provided basic sequence information for future research on spathe color, which have important implications for this ornamental breeding strategies.


Introduction
The large diversity of floral traits in angiosperms have fascinated researchers and ornamental flower enthusiasts alike for generations. Evolutionary biologists have long held that the large diversity of floral traits is associated with high rates of diversification among species, a process that is likely mediated by natural selection [1]. Among floral traits, flower color and scent are undoubtedly of great evolutionary importance given the direct impact they can have on pollinator choice and the plant's fitness [2][3][4].
Anthurium andraeanum (Hort.) is a species complex created by interspecific hybridization between A. andraeanum Linden ex André and other related species [5]. The large diversity in spathe colors such as white, red, pink, orange, coral, green, brown, and purple and patterns such as obake, striped, and colored borders [5][6][7] has made this species an important tropical ornamental crop [8]. It is well known that anthocyanins (synthesized in the cytosol and localized in vacuoles), widely found in the flowers, seeds, fruits, and vegetative tissues of vascular plants as soluble flavonoid, confers pigmentation and also participate in the defense against a variety of biotic and abiotic stressors in plants [9][10][11]. Besides being directly beneficial to plants, anthocyanins have also been used as natural food colorants and display vital nutraceutical properties that could be advantageous for human health [11][12][13].
Extensive crossing analyses have been done on cultivars of Anthurium to unveil the genetic mechanisms involved in the color formation and segregation [6,14]. Elibox and Umaharan (2008) showed that spathe color is controlled by duplicate recessive epistasis and proposed two genes (O and R) as responsible for the distinction between color vs white and a modifier gene (M) as responsible for the distinction between reds and pinks/oranges [14]. The biosynthetic pathway generating anthocyanins is highly conserved in plants and the genes involve has been extensively studied [15]. For example, the chalcone synthase (CHS) gene catalyzes the first reaction leading to anthocyanin biosynthesis and assists in forming the intermediate chalcone, which is the primary precursor for all classes of flavonoids. The chalcone isomerase (CHI) gene, is the second key enzyme in the anthocyanin biosynthetic pathway that catalyzes the stereospecific and intramolecular isomerization of naringenin chalcone into its corresponding (2S)-flavanones. Dihydroflavonol 4-reductase (DFR) gene is the first committed of anthocyanin biosynthesis in the flavonoid biosynthetic pathway and is responsible for the formation of leucoanthocyanidins which can be converted into colored anthocyanidins by the anthocyanidin synthase (ANS) gene [11]. The enzymes which catalyze specific steps of the anthocyanin biosynthesis pathway are encoded by structural genes which are in turn under the control of regulatory genes (transcription factors (TF)). For example, TFs like R2R3-MYB and members of the basic helix-loop-helix (bHLH) (R/B) family, have been reported as regulatory elements that controls pigmentation in flowers and fruits in different species [16][17][18][19][20]. A putative TF AaMYB2, ectopically expressed in tobacco increased anthocyanin accumulation and increased the expression of DFR, flavonoid 3'-hydroxylase (F3'H), ANS, and possibly CHS genes [21]. However, regulatory genes driving the expression of the enzymes in the pathway might differ depending on the species [18][19][20][21][22][23][24][25][26].
In A. andraeanum, the major color pigments in the spathe are anthocyanins, predominantly cyanidin, and pelargonidin derivatives, of which the content and ratio determine the color and its intensity [27]. A study showed that the interaction between a R2R3-MYB TF with a basic helix-loop-helix (AabHLH1) regulates the proanthocyanidin accumulation [28]. Besides, temporal and spatial expression analysis of genes involved in flavonoid synthesis (CHS, F3'H, F3H (Flavanone 3-hydroxylase) DFR, and ANS) suggests that DFR transcript levels vary significantly between developmental stages and might represent a relevant regulator [29]. A phenotype and transcriptome analysis of anthurium leaf color mutants suggested that color formation in leaves was greatly affected by chloroplast development and pigment biosynthesis with increased expression of flavonoid 3'5' hydroxylase (F3'5'H) and DFR genes in the dark green leaf mutant compared to the wildtype [30]. However, the genes involved in the regulation of color in these mutants could not be identified. Thus, a more detailed analysis of gene expression was essential to improve our understanding between the genetic model(s) and the putative regulatory genes involved in the differentiation between colors. Prior studies have focused on the analysis of specific genes within the pathway [29,30], sometimes failing to perform appropriate comparisons between white cultivars and colored ones or providing quantitative support for the observed differences [29].
Approaches such as RNA sequencing (RNA-Seq) have allowed the profiling of global gene expression patterns and the mapping of simple or complex traits, respectively, both in model and non-model plants. Several recent studies have exploited this technology to study traits of particular interest in a wide range of species due to reduction in library construction and sequencing costs [23,[31][32][33]. In this work, we performed a comprehensive transcriptomic analysis of white, red, and purple spathes of three cultivars of A. andraeanum from two cutflower development stages. The aims of our study were: 1) to generate a new reference transcriptome based on the Honduras cultivar, 2) to identify differentially expressed genes directly involved in the biosynthesis of anthocyanins, and 3) to identify putative regulatory genes involved in the expression of spathe color. The present study provides an important molecular basis for understanding the color formation mechanism for germplasm evaluation and ornamental breeding.

Plant material
To identify the genes differentially expressed between white (W), red (R), and purple (P) spathes, cut-flowers were collected from mature 3-to 4-year-old plants of 'Sasha', 'Honduras', and 'Rapido' cultivars respectively (Fig 1). The samples were obtained from Kairi Blooms Ltd., a commercial anthurium farm located at Carapo village, Trinidad, and were harvested in triplicate (3 individual spathes from different plants) from two development stages: early (E) (stage 2 -cut-flower first visible) and late (L) (stage 6-spathe newly opened and fully expanded) ( Fig  1). All samples were harvested and immediately stored in liquid nitrogen.

RNA isolation, library preparation and, sequencing
RNAqueous Kit (Applied Biosystems, Foster City, CA) was used for total RNA isolation from spathe samples of three replicates from each of three cultivars and two developmental stages following the manufacturer's guidelines. DNA was removed from the samples using Turbo DNA-free (Applied Biosystems, Foster City, CA), also according to the manufacturer's instructions. Samples were then shipped in dried ice to Washington State University where they were purified with ethanol (70%) and resuspended in 25 μL of ultrapure water. The quantity of isolated RNA was checked using a Qubit fluorometer and the quality of each sample was determined on an Agilent 2100 Bioanalyzer.
To construct the RNA libraries, the Illumina TruSeq Stranded Total RNA with Ribo-Zero Plant kits were used according to the manufacturer's instructions. To assess the quality and determine the molarity of the prepared libraries, each sample was run on an Agilent 2100 Bioanalyzer. The samples were then normalized to 10 nM, randomized, and sequenced in a single lane (for a total of two lanes) of Illumina HiSeq 2500 System (paired-end, 100 bp reads) at the Washington State University Genomics Core. Raw Sequence data for 18 samples of this study are available under the BioProject ID PRJNA721430.

De novo transcriptome of anthurium and functional annotation
Total RNA was extracted as previously described and combined into one pool and shipped to Beijing Genomics Institute (BGI), Shenzhen, China to develop a de novo reference transcriptome from the spathes of 'Honduras' cut-flowers. Primary sequence data (paired-end, 91 bp reads) was generated using the Illumina HiSeq 2000 (Illumina, Inc., San Diego, CA). Trinity software [34] with default parameters, was used to assemble reads. The software's TGICL [35] and Phrap [36] were utilized to get sequences that could not be extended on either end.
Unigenes were aligned with blastx (expect value < 10 −5 ) to NCBI non-redundant protein (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Cluster of Orthologous Groups (COG) databases, to identify proteins with the highest sequence similarity to the respective unigene along with their functional annotation. Proteins with the highest ranks in the BLAST results were used to determine the coding regions in the unigenes and translate them into peptide sequences. To further characterize the function of the genes identified, DEGs and consensus sequences of isoforms were mapped against the UniProtKB/Swiss-Prot database using an expected value threshold of 1x10 -5 . The BlastX output was subjected to Blas-t2GO for Gene Ontology (GO) analysis.

PLOS ONE
sequences, removal of reads containing poly-N, and removal of reads with low quality (quality value of over 50% bases of the read was < 5 and error rate of 0.2). Mapping and alignment of reads to transcripts was performed with bowtie2 v2.4.2 [40,41]. All ambiguously mapping positions were maintained to estimate, using a probabilistic approach, the number of reads supporting the expression of any given transcript, as implemented in RSEM [42]. We fitted a generalized linear model with color and stage treatments with a negative binomial using the quasi-likelihood approach to reduce the impact of false positives with the edgeR software [43].
In this model, we tested simultaneously for the effect of developmental stage (E and L) and color (W, R, and P) and controlled for differences in library size. We set two sets of contrasts a priori, in which we aimed to identify differential expression between colors at each stage (Fig 2).
Normalized read counts from the top 10,000 variable genes were analyzed with multidimensional scaling (MDS) to visualize the overall variation in the samples using plotMDS in the limma package [44] in R [45]. An initial screening of the data was carried out through a hierarchical clustering dendrogram for each treatment based on the Euclidian distance and grouped with the unweighted pair group method with arithmetic mean (UPGMA) using the dendrogram function of the ape package [46] of R. All code is available at https://github.com/ oeco28/Anthurium_differential_expression.
Differentially expressed genes (DEGs) were calculated by using the total number of transcripts per million (TPM) for filtering purposes and using the estimated effective counts of reads mapping to each transcript with the software package edgeR [43,47]. DEGs were defined as genes having a false discovery rate (FDR) [48] < 5x10 -4 and an absolute log fold change (logFC) � 2. Pearson's correlations were calculated for all genes with putative connection with the anthocyanin pathway. Gene sets were generated considering those with 0.8 < correlation < -0.8 and p-value < 0.001 as significantly co-expressed. Finally, for the genes that present a function of interest, we took the Log 2 of the TPM and estimated a centered and normalized measure of expression in the following way: where i corresponds to individual measure of expression for that gene in individual sample i.
All differences in expression for individual genes of interest are shown for these individual genes in the normalized form. All statistical analyses and graphical representations were generated in R.

Identification of putative transcription factors
An expression network was used to identify potential transcription (specifically MYB-domain TF) responsible for the regulation of differential expression of genes involved in the anthocyanin metabolism. For this, we performed a weighted correlation network analysis (WGCNA) as implemented in WCGNA package [49] in R. Before network construction the proper softthresholding power (β) was determined at the inflection point of the curve of the soft threshold (power) vs scale free topology model fit. Followed by a search for modules of potentially coregulated genes in association with differences in color between spathes. We extracted all of the transcripts found in each module and identified the modules containing genes involved in anthocyanin metabolism that showed significant differential expression (p � 0.0005). Then, we extracted the putative MYB-domain containing genes that were identified as being in the same module that contained DEGs of interest (involved in anthocyanin metabolism) and retained those with the highest correlation in expression.

De novo transcriptome and sequencing
For the de novo transcriptome, a total of 114,334,082 primary sequence reads were obtained from the Illumina sequencer. After stringent quality control checks and filtering, 105,143,382 high-quality clean reads (9,462,904,380 nucleotides) were identified. Among the reads, 98.08% of the nucleotides had quality values > Q20, while the proportion of unknown nucleotides was < 0.00%, and the GC content was 42.97%. Filtered reads were assembled into 151,652 unigenes, the presence of unknown bases in the sequences varying between 0% to 5%, and the length of the unigenes ranged from 201 to 12548 bp.
For the RNA-seq, we obtained a total of 391.29 million clusters (782.58 million reads) of paired-end sequencing with Illumina Phred qualities � Q30. The expected number of reads per sample was 21.78 million reads or 5.5% of the total reads. The distribution of the reads among samples was relatively even across samples, with samples having between 3.8% and 6.7% of the total reads (Additional file 1: S1 Fig). On average, 85% of reads mapped to the reference transcriptome per sample. There was a total of 151,652 transcripts in the reference transcriptome. There were 58,476 transcripts for which the number of mapped reads across samples exceeded a minimum threshold of TPM on average across samples. One of the samples (RE2) was excluded from the analysis because it showed a very low rate of mapping to the reference transcriptome and was a clear outlier based on the UPGMA cluster analysis (Additional file 2: S2 Fig). Finally, the MDS analysis clearly separated the stages in principal component 1 and spathe colors in principal component 2 (Fig 3).

Differential expression between stages and across different color spathes
We identified 8,324 DEGs at a conservative FDR threshold value of 5x10 -4 and logFC � 2 for all the pairwise contrasts after fitting a quasi-likelihood model. There was a similar number of DEGs between stages within each color of the spathe (Table 1 and Fig 4). The relative number of DEGs between colors in the early stage of spathe development was generally smaller than the number of DEGs in the late stage. We found groups of genes showing similar patterns of differential expressions across comparisons (Fig 5).
Based on annotations, we identified that a large number of genes involved in transcriptional regulation and cell wall expansion were differentially regulated between stages for all colors. The pattern of differential expression was more complex between colors within each developmental stage, which is consistent with the fact that between the cultivars, the spathes were also highly differentiated in shape in addition to color. A comprehensive list of differentially expressed genes for each comparison is described in the additional file 3: S1 Table. Quantitative set diagrams showed how DEGs were shared across comparisons for each set of contrasts ( Fig 5). In our stage comparison (E vs L), we observed for all colors a general trend of having more statistically significant downregulated genes than upregulated in the late stage. Although we did not find a remarkable difference between the number of downregulated and upregulated genes in the rest of the comparisons, the general trend suggested that there was a slightly larger number of downregulated genes in colored spathes (R, P) when compared to the white (W) spathe at both development stages (E, L) (Additional file 3: S1 Table). We found a large number of DEGs in our comparisons, but we focused our attention to those genes involved in the anthocyanin biosynthetic pathway (Fig 6A, Table 2). We found that the CHI gene is highly expressed in the red cultivar for both stages in comparison with the other samples. The CHS gene, at the top of the regulatory cascade of the anthocyanin biosynthetic pathway, had higher expression levels in the white spathe in both stages of development and for the red early stage; however, for this gene, the red late, the early and late stages of purple cultivar presented lower expression. The gene related with the cytochrome P450 presented higher expression in the late stage of purple. The DFR gene presented higher expression for the red cultivar on stage late compared to the other stages in the other two cultivars. The genes F3'5'H, F3H, and LDOX presented the higher expression in both stages of the red cultivar ( Fig  6B, Table 2) compared to the white and purples cultivars. In addition, a putative MYB-domain was highly expressed in the purple stages, following by the red ones and finally down expressed in the white cultivar. Details on the identification of this putative MYB-domain are provided below in section Identification of putative transcription factors. Finally, the UFGT gene presented higher expression in the purple and red cultivars compared to the white one.
We analyzed the co-expression between genes in the anthocyanin pathway by estimating the correlation of normalized TPMs across the different samples. Using a minimal TPM value of 5 as a threshold, the Pearson coefficient heatmap (Fig 6C) suggests at least 6 highly correlated groups. A clear pattern emerges from this analysis showing that the CHI gene is highly

Identification of putative transcription factors
It has been shown that tissue-specific regulation of CHS/CHI and other relevant genes in the anthocyanin pathway is controlled by a MYB-like domain protein in different systems [18][19][20]. The Anthurium transcriptome, and many plant genomes have a large number of  Table). In order to reduce the number of potential candidates that could be involved in the control of CHI differential expression, we performed an expression network analysis across all samples using WGCNA. The soft-thresholding power and connectivity between DEGs, were screened prior to the WGCNA module analysis. The soft threshold power of 6 (β = 6) was selected according

PLOS ONE
Differential expression of genes responsible for differences in color in Anthurium andraeanum Table 2 to the preconditions of approximate scale-free topology. The general results of the modules and clusters of genes showing similar patterns of expression are shown in Fig 7A and 7B. Our analysis suggested that 21 modules or clusters of genes with similar expression can explain the diversity of gene expression profiles across spathe developmental stages and colors. A large number of transcripts were assigned to a module with low assignment scores (grey, n Genes = 10,998) indicative of genes that cannot follow a characteristic expression profile. Gene clusters varied in size from 557 (light cyan) to 8774 (turquoise). The large number of genes present in a moderate number of clusters, for such a divergent set of plants, suggested that changes in color across developmental stage was a tightly controlled process. We focused our attention on the red cluster containing the CHI gene (Unigene109494) and used it to identify MYB-like domain proteins that showed similar patterns of expression (Additional file 5: S3 Table). Our analyses showed that this cluster contained nine MYBdomain proteins encoding genes, that showed a highly significantly correlated transcriptional profile (modular correlation r2 > 0.8 and p < 0.0001)_compared to CHI (Fig 7A). Out of these, three MYB encoding genes putatively produced proteins larger than 300 aa, and one of them (Unigene7734) has a target regulatory sequence for miRNA recognition.

Discussion
The large diversity in anthocyanin pigmentation in flowers has been an important feature in the coevolution of plants and pollinators. Specifically, for A. andraeanum, it has been already established that the spathes accumulate anthocyanins progressively, reaching large quantities at stage 3 and above, and producing other flavonoids [27,50]. More recent work performed in dark red, red, light red, white, orange, and coral cultivars of Anthurium, suggested that F3'H expression might be a key control point in the regulation of anthocyanin biosynthesis and is strongly associated to the intensity of pink color [51,52]. In concordance, Collette et al., (2004) [29] established that high levels of CHS and F3H genes during the early stages of development could generate the differences between colors. Although previous analyses have implicated CHS among other enzymes in the biosynthetic pathway in the determination of color in spathe in Anthurium, none have been conclusive. For this reason, to gain insight into the molecular events that regulate the spathe color in A. andraeanum, we used RNA-seq and WGCNA analysis to examined the expression patterns of various structural genes (CHI, CHS, F3H, F3'5'H/ F3'H, DFR, and ANS) to determine differences in color in plants with white, red and purple spathes over different developmental stages. We divided the variable genes into co-expression modules by unsigned network construction based on their expression patterns to identify putative TFs like MYB that could help the understanding of the color in this species.
In this work, we generated a new de novo reference transcriptome for A. andreanum L. from a standard and established variety, Honduras, well known among growers and producers. We anticipate that this will be a resource that will be broadly used by growers interested in characterizing transcriptional profiles for the species. We identified differentially expressed genes among three cultivars to identify genes involved in the determination of color on spathes in A. andreanum L. The CHS gene was the main regulator differentially down-regulated in the red late and early and late stages of purple spathe when compared to the white spathe. The CHS gene followed a profile of expression in red spathes characterized by higher levels of expression in early stages followed by a subsequent decay. This result was surprising, and we hypothesize that the decline in expression of CHS in red spathe could be due to the fact that this gene catalyzes the first reaction in the anthocyanin biosynthesis, but is not needed in later stages to produce Naringenin Chalcone in higher proportion compared to early stages. Similar results consistent with the behavior of CHS content across stages have been reported previously in other species [53,54]. However, CHS seems to present different profiles of expression in time, suggesting a different role in different systems, like petunia flowers where the expression of CHS increases as development progresses [55]. Our results suggest that CHS plays a role in the biosynthesis of flavonoid in A. andraeanum and in the differentiation between white and colored spathes. In contrast with previous studies, we identified the importance of the CHI gene due to its highly correlated expression. In other species like tobacco [56] and tomato [57], it has been demonstrated that manipulating the CHI caused changes in the anthocyanin content as well as flower, peels, and flesh color intensity. Additionally, findings from other studies show that genes implicated with CHI are coordinately expressed with F3'H in the anthocyanin biosynthetic pathway [58]. Studies also show that the coordinated expression for these genes becomes more evident as transcript levels increase with plant maturity [58]. Taken together, the profiles of expression of CHI and F3'H, suggests that their coordinated expression plays an important role in the determination of color in A. andraeanum, especially when contrasting red and white spathe varieties. The high positive correlation found between CHI and other genes in the pathway (Fig 6C) is consistent with previous findings in other systems [59][60][61]. Our analysis suggests that differences in color are likely to be driven by differential expression of multiple in the anthocyanin pathways that work coordinately to favor the production of pigments.
We revealed that most of the structural genes (CHI, F3H, F3´5´H/F3'H, DFR, and UFGT) are involved in the anthocyanin biosynthetic pathway were significantly down-regulated in the white flowers. We detected significant differential expression of the DFR gene between white and red. Our analyses strongly suggests that DFR plays a key role in the formation of anthocyanins, which have been reported to directly determines the development of pink or white color in tobacco flowers [62]. This result suggested, that the expression of DFR could contribute to the color differences between the white and colored spathes in this study one the developmental stage is advanced. Similar findings have also been noted by Gopaulchan et al. (2014) [51], where the white spathes displayed equivalent levels of DFR transcript to the red and orange spathes at stage 2, and had higher expression at stage 6 compared to the colored spathes. In contrast, the UFGT genes were strictly inhibited in white flowers. It has been reported that the down-regulation of McDFR in apple fruit reduced the expression levels of some structural genes (F3H, F3'H, DFR, ANS, and UFGT), while the CHS and CHI genes were up-regulated [63]. It indicates that the altered expression of DFR also affected the expression of other anthocyanin biosynthetic genes. In this study, interestingly, reveals that CHI genes are the most significantly up-regulated genes in the comparison between white and red, which seem to be one of the important candidate genes in determining the flower color. Because of the great similarity between F3'5'H and F3'H, we couldn't distinguish between those two genes in the de novo transcriptome using amino acid sequence alone. Yet, the expression results showing a significant over-expression of this gene in red cultivar when compared to white, but not in the purple to white, strongly suggests that this gene is the F3'H gene driving the pathway towards the production of red pigment.
We find compelling evidence that the determination of purple color at late stages might be regulated by increased expression of F3'5'H/F3'H, cytochrome P450, MYB-domain, and UFGT. More importantly, in the purple spathe, we identified a cytochrome P450 oxidase differentially up-regulated when compared to red colored spathe which appeared to determine the difference between the red and purple hues. Functional annotation analysis suggested that the cytochrome P450 gene may encode an additional F3'H or F3'5'H genes. This latter result was striking because initially the purple spathe displays a red hue at the early stage of spathe development and then acquires the purple hue as it develops into a mature spathe progresses and this cytochrome P450 gene could be a highly divergent F3'H or F3'5'H gene. Hence, the CHS gene may have a stronger influence on anthocyanin accumulation than other structural genes, at least in the early stages of 'Honduras' and 'Rapido' cultivars. Taken together, our results suggest that F3'H is more likely to be a key structural gene in the late stages of the purple spathe. Besides, the accumulation of higher amounts of anthocyanin in purple (Rapido cultivar) may also be due to a cumulative effect of the high expression levels of all the other structural genes like P450. Previous studies in other species indicated that a single gene is not responsible for anthocyanin accumulation and that anthocyanin biosynthesis involves the coordinated mechanism of many genes [64,65]. However, additional research is necessary to test this hypothesis in A. andraeanum.
In this paper, a weighted gene co-expression network (WGCNA) analysis was also performed in order to identify other genes involved in anthocyanin accumulation in Anthurium. WGCNA is a method for analyzing the gene expression patterns of multiple samples. Genes with similar expression patterns can be clustered, moreover, the relationship between modules, allowing to obtain information on both genes function and the mechanisms controlling the traits of interest [66]. This method has been recently used to dissect anthocyanin pathway in apricots [67], apples [68] and eggplant [69].
Among the genes present in each weighted module that correlated with the metabolite of interest, we found 9 DEGs coding for transcription factors, indicating that there is a strong relationship between the CHI gene and MYB-domain, as previously hypothesized, even if the mechanism of this connection is still unclear.
Analyses in different plant systems have focused on understanding the pattern of expression of specific genes and demonstrated how diverse these underlying changes can manifest in differences in coloration can be [70][71][72]. The importance of R2R3 MYB TF in the co-regulation of the anthocyanin biosynthetic pathway have been demonstrated in other species. For example, in Paeonia ostia, a study found that two TF, PoMYB2, and PoSPL1, seem to negatively regulate anthocyanin accumulation by interfering with the formation of the MYB-bHLH-WDR complex [73]. Besides, analyses in Petunia hybrida have shown that two bHLH TFs interacts with a MYB and WD repeat protein to regulate the expression of CHS and DFR genes [17,50,[74][75][76]. In our study, the use of WGCNA to identify co-expression patterns among the six different tissues was especially key to identifying potential regulatory elements for the pathways of interest. Additionally, the mRNA encoding the putative MYB-domain contained a silencing RNA-target motif that was polymorphic among samples of white and colored spathes. At this point, the identification of this MYB-domain putative regulator remains hypothetical and further studies are needed to confirm its implication in the regulation of color in A. andraeanum.

Conclusions
A. andraeanum is a monocot plant type that presents a spathe with highly diverse pigmented colors. The present study showed that differences between the three colored spathes as well as differences among developmental stages could be explained by differences in their relative gene's expression abundances. Furthermore, could be explained due to coordinated expression between the genes CHI, DFR, F3H, and F3'5'H. There is also evidence from our study that the accumulation of higher amounts of anthocyanin in the purple spathe could be due to additional regulatory cumulative effect of the high expression levels of structural genes like the cytochrome P450. While the correlation between gene expression provides preliminary evidence of the role of these genes in determining spathe color intensity, further work is required to elucidate the roles of these genes.