Validation of suitable genes for normalization of diurnal gene expression studies in Chenopodium quinoa

Quinoa depicts high nutritional quality and abiotic stress resistance, attracting strong interest in the last years. To unravel the function of candidate genes for agronomically relevant traits, studying their transcriptional activities by RT-qPCR is an important experimental approach. The accuracy of such experiments strongly depends on precise data normalization. To date, validation of potential candidate genes for normalization of diurnal expression studies has not been performed in C. quinoa. We selected eight candidate genes based on transcriptome data and literature survey, including conventionally used reference genes. We used three statistical algorithms (BestKeeper, geNorm and NormFinder) to test their stability and added further validation by a simulation-based strategy. We demonstrated that using different reference genes, including those top ranked by stability, causes significant differences among the resulting diurnal expression patterns. Our results show that isocitrate dehydrogenase enzyme (IDH-A) and polypyrimidine tract-binding protein (PTB) are suitable genes to normalize diurnal expression data of two different quinoa accessions. Moreover, we validated our reference genes by normalizing two known diurnally regulated genes, BTC1 and BBX19. The validated reference genes obtained in this study will improve the accuracy of RT-qPCR data normalization and facilitate gene expression studies in quinoa.

regulate the time of consumption and production of energy. Thus, the circadian system manipulation is a strategy for optimizing plant growth and yield and overcoming environmental stress [14,15]. The study of the circadian clock by diurnal gene expression through RT-qPCR implies a careful and rigorous selection of reference genes. It has been shown that a reference gene that has been qualified as stable could depict diurnal expression patterns. When such a gene is used to normalize expression data, it causes considerable changes in the expression profiles of the target genes [16].
Quinoa (Chenopodium quinoa Willd.) is a dicotyledonous annual species from the family Amaranthaceae. It has an allotetraploid genome (1.39 Gb) presumably originating from the diploid species Chenopodium pallidicaule (A-genome) and Chenopodium suecicum (B-genome) [17]. Quinoa has high protein content and an exceptional balance between oil, protein and fat. It also supplies balanced amounts of the amino acid lysine along with the other eight essential amino acids [18]. This, together with its great adaptability to extreme climatic and soil conditions, conveys significant potential for quinoa production expansion around the world [19]. To date, no reference genes validation has been performed for diurnal expression studies in C. quinoa.
In this study, we aimed to select suitable reference genes for diurnal expression studies in C.
quinoa. For that, we selected eight candidate reference genes and used three statistical algorithms (geNorm, NormFinder and BestKeeper) to test their stability. We performed further validation by normalizing the expression data of the diurnally regulated DOUBLE B-BOX TYPE ZINC FINGER (BBX19) [20] against each of the candidate reference genes followed by a comparison between the obtained expression profiles and a model expression profile, which we refer to as "simulation". This Spectrophotometer (Thermo Fisher Scientific Inc., Waltham, United States). An A 260 /A 280 ratio between 2.0 and 2.2 and an A 260 /A 230 ratio higher than 1.7 indicated appropriate RNA quality.
Exemplary RNA agarose gels and the A 260 /A 280 and A 260 /A 230 ratios for the Titicaca accession are presented in S1 Fig and S1 Table, respectively. We used the DNase-treated RNA to perform the reverse transcription with the First Strand cDNA Synthesis Kit (Thermo Fisher Scientific Inc., Waltham, United States), according to the manufacturer's instructions. We used the RNA concentration obtained during the NanoDrop analysis to calculate the required volume of every sample to load a final amount of 1 ug of RNA to the reverse transcription process. Later, we used the products of the reverse transcription to perform a standard PCR (annealing temperature: 56 °C; 36 cycles; RAN-3 primers in Table 1) followed by agarose gel electrophoresis (2.0%) to verify the success of the cDNA synthesis (S1 Fig). The synthesized cDNA was stored at -20 °C for further use.

Selection of candidate reference genes
We made a literature survey to select potential candidate reference genes. The most commonly used reference genes, preferably the ones that have been previously reported to be efficient in species of the Amaranthacea family, were selected [16,23,24]. Afterwards, we validated this list of genes based on the available C. quinoa transcriptome data [17] by selecting those genes, which were stably expressed at least in two out of four stress conditions (low phosphate, salt, drought and heat) for soil-and hydroponic-grown plants within root and shoot tissues (one-way ANOVA and Tukey's Multiple Comparison Test; α=0.05) (S2 Table S1 Table). We designed paralog-specific primer pairs for amplification of the candidate reference genes except for GAPDH, whose primers were designed in a paralog-conserved region. We assessed the primer quality with the OligoCalc program (http://biotools.nubic.northwestern.edu/OligoCalc.html). We considered the next primer quality parameters: length (18-26 bp), similar melting temperature (50-60 °C), amplicon length (100 -160 bp), GC content (>40 %) and lack of self-annealing and primerdimmer formation. To design the GAPDH primers, a Multiple Sequence Alignment of GAPDH paralogs with CLC Main Workbench (QIAGEN Aarhus, Denmark) was performed with the following parameters; gap open cost: 10, gap extension cost: 1, end gap cost: as any other and very accurate alignment. We verified the specificity of the primers by standard PCR using DNA from the Danish accession Titicaca followed by agarose gel electrophoresis. Moreover, we sent the products of locus-specific primers for Sanger sequencing to the Institute for Clinical Molecular Biology (IKMB, Kiel University) and the sequencing results were analyzed with CLC Main Workbench. A further verification of the primer combinations specificity was made by examination of the RT-qPCR melting curves using the Bio-Rad CFX Manager 3.1 software (Bio-Rad Laboratories GmbH, Munich, Germany).

Quantitative real-time PCR (RT-qPCR) and determination of amplification efficiency
We performed RT-qPCR with a Bio-Rad CFX96 Real-Time System, which has a built-in Bio-Rad

Gene stability analysis
We assessed the expression stability of the candidate genes by three algorithms: geNorm [7], NormFinder [8] and BestKeeper [9]. The data used for these analyses were the averaged Cq value of the three biological replicates (three technical replicates each) obtained for seven different Zeitgeber points. For the determination of the stability by geNorm and NormFinder, transformation of the raw Cq values into relative quantities (RQ) was performed as follows: where, E represents the PCR efficiency and ΔCq for any given gene is the Cq value of each sample subtracted from the lowest Cq value among all the samples for that gene. In contrast, for the utilization of BestKeeper, the raw non-transformed Cq data were used, as required by the algorithm.
The geNorm algorithm calculates an average expression stability, M-value, which is defined as the average pairwise variation in a particular gene with all other potential reference genes. This software was also used to calculate the minimum number of genes required for normalization, which is given by the pairwise variation as follows: where, V n is the pair-wise variation of a given number of reference genes and V n+1 is the pair-wise variation produced when an extra reference gene is added to the analysis. Below the default cut-off value of 0.15, all gene pairs were considered stable and the addition of an extra gene for normalization was not required [7]. NormFinder identifies stably expressed genes based on a mathematical model that enables the estimation of the intra and inter-group variation of the sample set as a stability value (SV) [8]. In order to obtain the best combination of reference genes together with its stability value and the intra and intergroup variations, we added group identifiers to distinguish the expression data of CHEN-109 and Titicaca. In the case of BestKeeper, it helps in selection of the best reference genes after the calculation of the standard deviation (SD) and a coefficient of variance (CV) [9]. Following the algorithms protocols, we considered candidate reference genes with the lowest M, SV and CV values as the most stably expressed genes.

Validation of the reference genes
To validate the reliability of the top ranked reference genes and to clarify the ambiguous results given by the algorithms, we normalized the expression data of the diurnally regulated CqBBX19 Expression levels for both accessions were determined by2 − ∆ ∆ Cq method [25]. We designed the primer combination for CqBBX19 (Table 1) according to the criteria mentioned above and its specificity was assessed in the same way as the previously described for the reference genes primer pairs.

Selection of candidate genes
First, we performed a selection of candidate genes based on a literature survey. We preferentially selected reference genes that have been validated in species of the Amaranthacea family [16,23,24]. Afterwards, we validated these genes with C. quinoa transcriptome data, where plants had been grown under abiotic stress conditions [17]. We selected eight candidate reference genes (Table 1), which were stably expressed in root and shoot tissues in at least two out of four stress conditions (S2 Table S1 Table).    (Table 1).

Diurnal expression profiles of candidate genes
In a next step, we determined the expression profiles of the candidate genes by RT-qPCR and the raw Cq values were used to quantify the expression levels. We choose two accessions from Peru (CHEN-109) and Denmark (Titicaca), because of their different photoperiod sensitivity, which might be related to its geographical origin [26]. We extracted RNA from samples taken during one

Expression stability of the candidate genes in two quinoa accessions
We assessed the stability of the candidate genes by three algorithms: geNorm [7], NormFinder [8] and BestKeeper [9]. geNorm and BestKeeper placed PTB and IDH-A as the most stable genes for normalization of Titicaca expression data. On the other hand, the same algorithms placed TUB and IDH-A as the most stable genes for normalization of CHEN-109 expression data (Fig 3). NormFinder determined the most suitable reference gene for the two sets of expression data, simultaneously (Titicaca and CHEN-109), ranking ACT as the most suitable single gene and IDH-A together with TUB as the best combination of genes (stability value 0.035) (Fig 3 and S6 Fig). Additionally, we performed a pairwise variation analysis (cut-off value of 0.15) of the candidate genes for determination of the appropriate number of reference genes by geNorm. The appropriate number of genes required for normalization in this study was two (Fig 4). shown on the right side (green arrow) and the least stable genes on the left (red arrow).

Validation of selected reference genes for normalization of diurnal expression data
We performed a further validation of the candidate genes due to their diurnal expression patterns ( Fig   2) and their dissimilarity in the rankings obtained from the algorithms (Fig 3). We used the diurnally regulated gene CqBBX19 that shares sequence homology with BvBBX19, a major flowering time regulator of sugar beet (Beta vulgaris L.) [20]. Because both species belong to the same plant family, it is tempting to speculate that CqBBX19 has a similar function in quinoa. We normalized the expression data of the diurnally regulated gene CqBBX19 against each of the candidate reference genes. Furthermore, we carried out the normalization of CqBBX19 with a constant Cq value of 20, to identify the expected expression patterns (simulation). As an outcome, normalizing CqBBX19 expression data against GAPDH, IDH-B, UBQ, TUB, ACT and RAN-3 resulted in expression patterns that clearly differed from the simulation in number, shape and position of the expression peaks ( Fig   5). For instance, expression patterns obtained for CHEN-109 by normalization against UBQ and ACT depicted a peak at ZT-4, which is absent in the simulated expression pattern (one-way ANOVA; α=0.05). A different effect of inaccurate normalization was seen for the pattern obtained with RAN-3, which disregards peaks observed during the simulation (e.g. ZT-8 of Titicaca) (one-way ANOVA; α=0.05). A critical assessment of the results showed that the normalization of the expression data against GAPDH alters the significant differences in the expression levels between accessions (at ZT-4 and ZT-20; two-tailed t-test, α=0.05). On the other hand, the combination of the two best genes for normalization in Titicaca determined by BestKeeper (IDH-A + PTB) exhibited analogous diurnal expression patterns to the simulation. In conclusion, the geometric mean of IDH-A and PTB turned out to be the best reference to normalize diurnal expression data in C. quinoa. Our results show that using different reference genes for normalization, including those top ranked by stability, causes significant differences among the diurnal expression patterns of CqBBX19 for two different C. quinoa accessions under short day conditions. We were able to find a suitable reference for the normalization of our diurnal expression data through a novel approach, which avoids selection of reference genes that produce false transcriptional profiles.

Discussion
RT-qPCR is a powerful tool to study gene expression, which allows the measurement of small amounts of nucleic acids in a wide range of samples from numerous sources. Nevertheless, the accuracy of RT-qPCR results strongly depends on accurate transcript normalization, which is very frequently overlooked during the data assessment. Moreover, misled interpretations guided by inaccurate results have been reported by using a single reference with no evidence that it is stably expressed across all conditions [6]. Furthermore, finding a gene that is stably expressed through a 24 h cycle represents a challenge, because transcript abundance of thousands of genes interplay during the day and conventionally used reference genes may show cyclic expression patterns [16]. In this study, we aimed to select suitable reference genes for diurnal expression studies to normalize expression data from two different quinoa accessions. We selected eight candidate reference genes (ACT, GAPDH, RAN-3, IDH-A, IDH-B, PTB, UBQ, TUB) based on C. quinoa transcriptome data [17] and literature survey [16,23,24] and tested them for stability during the course of a day.
Through an approach based on the comparison between the expected expression pattern of the target gene and the expression patterns obtained by normalization against the candidates, we determined that a combination of IDH-A and PTB was the most suitable reference to normalize our diurnal expression data. To date, this is the first report of a systematic analysis of reference genes for normalization of gene expression data in diurnal studies in C. quinoa.
Housekeeping genes are typically used as reference genes because their transcript levels are assumed stable during the execution of basic cellular functions. Nevertheless, the circadian regulatory pathways involve complex and intricate processes [14,27], in which a housekeeping gene can undergo circadian regulation during the execution of its housekeeping function or the execution of a different function coupled to a circadian pathway (e.g. light induced GAPDH response in corn seedlings [28] and TUB carbon-induced diurnal response in Arabidopsis rosettes [29]. Furthermore, the complexity of the circadian pathways is reflected in the total number of diurnally expressed genes in plants, which varies widely from species to species. For instance, the percentage of rhythmic genes in transcriptome datasets of Arabidopsis rosettes and rice seedlings is 37.1 % (14,019) and 33.2 % (24,957), respectively [30]. In addition, the number of diurnally expressed genes highly depends on the tissue and developmental stage [30,31]. Therefore, it is expected that conventionally used reference genes depict cyclic regulation, as it has been already illustrated for rice plants subjected to circadian cycles of different temperature and day length conditions [16]. Additionally, the observed differences in the expression profiles between Titicaca and CHEN-109 may be due to the originrelated photoperiod sensitivity of the accessions [26]. Similar intra-species variations, where a gene which is stable in one experiment is unstable in another one, has been observed in peach [3], rice [13] and petunia [32]. Ultimately, the described differences between accessions together with the observed diurnal pattern of the expression profiles enhance the need for a rigorous expression stability analysis. Moreover, when the validation of the reference genes is well performed, the use of suitable reference genes reduces experimentally induced or inherent technical variations [6].
From the analysis of the stability of the candidate reference genes, we obtained contrasting results using geNorm, NormFinder and BestKeeper. The variations we observed in the rankings across all the algorithms are explicable as they use different statistical approaches. Ranking the reference genes by pairwise comparison approaches (geNorm and BestKeeper) is problematic, since some genes may have similar but not constant expression profiles and will be ranked as the most stable reference genes irrespective of their expression stability [8]. of the expected target gene expression. The simulation is particularly useful to disregard inefficient reference genes, whose resulting target gene expression patterns differed significantly from the simulated pattern. This meticulous approach overcomes failures related to stability-based selection of reference genes.
The results from geNorm pairwise variation showed that two reference genes are sufficient for accurate normalization of our diurnal expression data. The diurnal expression patterns of our target gene CqBBX19 normalized against CqIDH-A and CqPTB did not differ from the simulation pattern.
Therefore, CqIDH-A and CqPTB were suitable as reference genes and their geometric mean was appropriate to normalize the diurnal expression of CqBBX19 [20]. quinoa to normalize the expression of saponin biosynthesis genes after methyl jasmonate treatment [23]. In beet, BBX19 acts together with BOLTING TIME CONTROL 1 (BTC1) in the circadian regulation of the flowering pathway [33] and shows diurnal regulation with an expression peak at dawn [20] . Considering the close phylogenetic relationship between B. vulgaris and C. quinoa, CqBBX19 might similarly regulate the flowering time in quinoa through photoperiod pathway. Thus, the difference in the expression patterns of CqBBX19 between the accessions might be due to difference in their response to photoperiod.
In the current study, commonly used reference genes were considered unstable because they resulted in contrasting diurnal expression patterns of the target genes, which differed from the simulated pattern. Moreover, reference genes selected based on quinoa transcriptome data 18 showed unstable expression pattern in our dataset, which emphasizes the fact that they should be further validated for to be unsuitable as reference genes under certain experimental conditions [2,4,16]. The lack of stability of some of the widely used reference genes could not be detected by the algorithms in diurnal studies in rice [16]. Moreover, the use of some of these genes could misrepresent a nondiurnal gene as diurnally regulated. To illustrate this, the three least stable reference genes (BestKeeper) and their geometric mean were used to normalize IDH-A expression data (now treated as target gene) (Fig 6). As a result, the stably expressed IDH-A ( Fig 6C) acquired a diurnal fashion ( Fig 6A and Fig 6B), showing that the use of unsuitable reference genes greatly distorts the resulting expression patterns. As the conclusion of this study, we were able to find a suitable reference gene combination, CqIDH-A and CqPTB, for the normalization of our diurnal expression data. Here we used a novel approach, which is straightforward, meticulous and overcomes failures related to stability-based selection of reference genes. Our results also showed that using different reference genes for normalization, including those top ranked by stability, causes significant differences among the diurnal expression patterns of CqBBX19 for two different accessions of C. quinoa under short day conditions. This reinforces the fact that a more meticulous validation of reference genes is required for diurnal studies. Thus, we recommend the use of the presented comparison-based approach along with the use of the algorithms. In the future, the stable reference genes obtained in this study will significantly facilitate diurnal expression studies in quinoa by improving the accuracy of RT-qPCR data normalization. 5 S1 Table. A260/A280 and A260/A230 ratios of RNA isolated from diurnal leaf samples of the Titicaca accession.
S2 Table. Reference gene candidates obtained from cross validation of literature survey and C. quinoa shoot and root transcriptome data [17]. Selected genes were stably expressed at least in two out of four stress conditions in soil-and hydroponic-grown plants. S1_raw_images. Full-length gels corresponding to cropped gels in (A) S1 Fig A and