Selection and validation of reference genes for quantitative real-time PCR in the green microalgae Tetraselmis chui

Quantitative real-time reverse transcription PCR (RT-qPCR) is a highly sensitive technique that can be applied to analyze how genes are modulated by culture conditions, but identification of appropriate reference genes for normalization is a critical factor to be considered. For this reason, the expression stability of 18 candidate reference genes was evaluated for the green microalgae Tetraselmis chui using the widely employed algorithms geNorm, NormFinder, BestKeeper, the comparative ΔCT method, and RefFinder. Microalgae samples were collected from large scale outdoor photobioreactors during the growing phase (OUT_GP), and during the semi-continuous phase at different times of the day (OUT_DC). Samples from standard indoor cultures under highly controlled conditions (IND) were also collected to complement the other data. Different rankings for the candidate reference genes were obtained depending on the culture conditions and the algorithm employed. After comparison of the achieved ranks with the different methods, the references genes selected for samples from specific culture conditions were ALD and EFL in OUT_GP, RPL32 and UBCE in OUT_DC, and cdkA and UBCE in IND. Moreover, the genes EFL and cdkA or EFL and UBCE appeared as appropriate combinations for pools generated from all samples (ALL). Examination in the OUT_DC cultures of genes encoding the large and small subunits of ADP-glucose pyrophosphorylase (AGPL and AGPS, respectively) confirmed the reliability of the identified reference genes, RPL32 and UBCE. The present study represents a useful contribution for studies of gene expression in T. chui, and also represents the first step to set-up an RT-qPCR platform for quality control of T. chui biomass production in industrial facilities.

In the present study, the suitability of a set of 18 different candidate reference genes for normalization in RT-qPCR assays in the microalgae species T. chui were evaluated. The candidate genes were chosen based on their previous assessment in other organisms, particularly in other microalgae. The candidate genes included, 18S rRNA (18S), actin (ACT), aldolase A (ALD), alpha tubulin (aTUB-1 and aTUB-2), beta tubulin (bTUB), cyclin dependent kinase A (cdkA), elongation factor-like (EFL), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), histone 2A (His2A), beta-ketoacyl-ACP synthase (KAS), large subunit of the ribulose-1,5-biphosphate carboxylase (rbcL), ubiquitin conjugating enzyme (UBCE), phosphoglycerate kinase (PGK), ribosomal protein S10 (RPS10) and L32 (RPL32), and eukaryotic translation initiation factor 2 (eIF2-1 and eIF2-2). T. chui microalgae samples were first collected from outdoor photobioreactors dedicated to the industrial production of biomass operated in semicontinuous mode. Sampling was performed from initial inoculation in photobioreactors until the cultures reached the appropriate cell density for daily harvesting and spanned 18 days. Thereafter, samples were drawn from outdoor photobioreactors at five different hours of the diurnal cycle from dawn to dusk. The collected samples represented microalgae exposed to strong differences in light irradiance (ranging from 3 to 1300 μmol photons m -2 s -1 of photosynthetically active radiation (PAR)) and also in culture temperature (ranging from 10.6 to 22.8˚C). Further samples were obtained from indoor cultures grown under stable temperature and light irradiance and spanned the growth curve from the initial inoculation to the stationary phase. Thereafter, the transcript abundance of the candidate reference genes were determined by RT-qPCR, and analyzed using geNorm, NormFinder, BestKeeper, the comparative ΔC T method, and RefFinder to identify reference genes suitable for gene expression normalization. Finally, the expression of the genes encoding for the large and small subunits of ADP-glucose pyrophosphorylase (AGPL and AGPS, respectively) were examined by using the most stable reference genes selected.

Strain and culture conditions
T. chui strain CCFM-03 (belonging to the Culture Collection of Fitoplancton Marino, S.L.) is cultured for industrial production of biomass in the facilities of the company Fitoplancton Marino, S.L. For the purpose of this study, 4000 L closed tubular outdoor photobioreactors were sampled. Microalgae cells were grown in seawater with f/2 culture medium [59] throughout the production cycle, including the first indoor stages during culture up-scaling through to outdoor cultures. The composition of f/2 medium was: 8. In outdoor cultures, the microalgae were exposed daily to ambient conditions, with temperature and light intensity being probably the most significant parameters that changed (S1 Table). Culture conditions that were strictly controlled included pH (maintained at 7.5-8.5 by on-demand CO 2 injections) and nitrate content, which was maintained over 500 μM. Indoor cultures of microalgae were maintained under controlled conditions at 22˚C in a thermoregulated room with a photoperiod of 24 h of cool fluorescent white light, and~150 μmol photons m -2 s -1 of PAR. All microalgae cultures were continuously aerated using compressed air containing 2% CO 2 .

Microalgae samples
Three different conditions were tested in the study: i) highly controlled indoor cultures (IND), ii) outdoor cultures during the growth phase (OUT_GP), that was, from initial inoculation of the photobioreactors until the cultures reached an appropriate cell density for semi-continuous cultivation, and iii) outdoor cultures during the semi-continuous phase (OUT_DC), in which the culture was maintained in exponential growth over~2 months by daily harvesting of biomass followed by addition of nutrients to restore adequate levels of nitrate, salts and vitamins.
For the IND cultures, three 5 L flasks were inoculated on day 1 (D1) at an average initial cell density of 85 x 10 3 cells/ml (1/20 dilution of a well grown source culture), which was determined using a haematocytometer (Neubauer counting chamber) and a microscope. Microalgae samples were withdrawn from each of the triplicate flasks for cell density determination and RNA isolation on D1, D4, D7, D10 and D13. Samples from D4 and D7 corresponded to the exponential growth phase and samples from D10 and D13 corresponded to the stationary growth phase (S1 Fig). A total of 15 samples were collected (three per time point) for gene expression analysis. Collection of samples for both cell density determination and RNA isolation occurred at 9:00 a.m. on the five sampling days.
OUT_GP samples were collected from three 4000 L photobioreactors of the same production unit after inoculation on D1 (with an average cell density of 20 x 10 3 cells/ml), and then at D4, D7, D11, and finally at D18 (the day before the commencement of the semi-continuous cultivation mode) during the exponential growth phase. In all instances, the samples were taken at 1:00 p.m. Samples were collected for cell density determination (S2 Fig) and 15 samples (three per time point) were collected for RNA isolation and further gene expression analysis.
OUT_DC samples were collected from three 4000 L photobioreactors of the same production unit and during the semi-continuous cultivation mode. Samples were withdrawn 13 days after the commencement of the semi-continuous phase (approximately 30 days after inoculation of the photobioreactors) for RNA isolation at five different times of the diurnal cycle: 7:00 a.m., 10:00 a.m., 1:00 p.m., 4:00 p.m. and 7:00 p.m. Thus, a total of 15 samples (three per time point) were processed for gene expression analysis. The average cell density at 7:00 a.m. (the first sample of a given was 7.4 x 10 6 cells/ml. The temperature of cultures in OUT_GP and OUT_DC was automatically controlled with a probe at each sampling time, and PAR was also recorded using a radiometer sensor LI-250A Light Meter (LICOR Biosciences). The data recorded is presented in S1 Table. Total RNA isolation and cDNA synthesis In each sample, an appropriate volume of culture was drawn to give~4 x 10 6 cells for RNA isolation. Cells were centrifuged at 5,000 g for 5 min, and the supernatant was removed. Microalgae cells were homogenized using stainless steel beads (0.2 mm, Next Advance) for 3 min at speed 10 in a Bullet Blender 1 24 (Next Advance). Total RNA was extracted using a NucleoSpin 1 Plant II kit (Macherey-Nagel) following the manufacturer's instructions. The extracted RNA was treated twice with DNase I to avoid further PCR amplification of residual traces of genomic DNA. RNA samples were quantified using a NanoDrop 2000 spectrophotometer (Thermo Scientific) and the quality was checked in agarose gels.
Total RNA (1 μg) from each sample was reverse-transcribed using an iScript™ cDNA Synthesis kit (Bio-Rad) in a reaction volume of 20 μL according to the manufacturer's protocol. All cDNA reactions were finally diluted 10-fold by adding 180 μL of nuclease-free water. The absence of genomic DNA contamination was confirmed by direct PCR amplification of two randomly selected RNA samples in the absence of cDNA synthesis.

Selection of candidate reference genes for RT-qPCR
An initial set of 16 genes involved in different cellular processes were pre-selected to assess their suitability as reference genes for RT-qPCR analysis. The selection of candidate reference genes was based on previous gene expression studies of other microalgae species and other organisms. The nucleotide sequence corresponding to 18S and rbcL from T. chui was directly retrieved from GenBank/EMBL/DDBJ database (S2 Table). The nucleotide sequences of the other candidate genes were identified in the available transcript sequences (~22,600) for the T. chui strain PLY429 in the iMicrobe database (www.imicrobe.us) after annotation with the AutoFACT tool [60]. EditSeq v8.1.3 (DNASTAR) was used to identify the coding sequences of candidate genes as well as the deduced protein sequences, and then gene identities were confirmed with BLASTp (S2 Table). Two different coding sequences were found in T. chui for both alpha tubulin (aTUB-1 and aTUB-2) and eukaryotic translation initiation factor 2 (eIF2-1 and eIF2-2), and all four genes were included in this study. The isolation of the target genes AGPL and AGPS followed the same general strategy outlined above.

Primer design and RT-qPCR
Specific primer pairs (Table 1) for the 18 selected candidate reference genes (and also for the target genes AGPL and AGPS) were designed using Oligo v7.60 software (Molecular Biology Insights). To ensure maximum specificity and efficiency in PCR amplification, highly stringent conditions were initially selected for primer design. For each primer pair these were: ΔG of the most stable 3'-dimer > -3.5 Kcal/mol, ΔG of the most stable dimer overall > -7.5 Kcal/mol, no hairpins with ΔG < -1 Kcal/mol and a melting temperature (T m ) > 40˚C, primer efficiency (PE) > 480, PE of false priming sites < 100, optimal annealing temperature (T a ) of the PCR product between 62-65˚C, primer lengths of 20-28 nucleotides, and amplicon length ranging between 100-200 base pairs. When most of the criteria were successfully met, the recommended temperature for annealing/extension in PCR was 68˚C. All designed primers were finally ordered from a commercial supplier (Eurofins). Appropriate performance of each primer pair was initially tested by PCR amplification of the target amplicons employing the same conditions described below for RT-qPCR. Specificity of each primer pair was verified by melting curve analysis from 70˚C to 95˚C with a ramp speed of 0.5˚C every 10 s. Single, sharp peaks were obtained in all instances, thus ruling out amplification of non-specific products or primer dimer artifacts (S3 Fig). To determine real-time PCR efficiency (E) of each primer pair, standard curves were prepared from serial dilutions of cDNA (from 100 ng to 0.01 ng) and the results plotted in Excel against their C T values (S4 Fig). Then, slopes of the linear regressions were determined applying the equation E = 10 −1/slope as previously reported [61]. The linear correlation coefficient (R 2 ) of each primer pair was also determined in Excel. The PCR products were analyzed by standard agarose gel electrophoresis (2.5% w/v in TAE 1X) and clear DNA bands of the expected sizes were obtained (S5 Fig). RT-qPCR was conducted using a CFX96™ Real-Time PCR Detection System (Bio-Rad). Each 10 μL reaction contained 5 μL of 2X iQ™ SYBR 1 Green Supermix (Bio-Rad), 300 nM of both forward and reverse primers (0.3 μL of a 10 μM stock each), 2 μL of cDNA (corresponding to the cDNA retrotranscribed from 10 ng of RNA), and 2.4 μL of nuclease-free water. All   reactions were run in duplicate, and the mean threshold cycle (C T ) of each sample (S3 Table) was used for further calculations. Relative transcript levels of the target genes were determined using the 2 -ΔΔCt method [62]. The thermal cycling profile included an initial incubation at 95˚C for 3 min, followed by 40 cycles of 95˚C for 15 s and 68˚C for 30 s.

Analysis of expression stability of the candidate reference genes
Five different approaches were employed to determine the expression stability of the candidate reference genes. The geNorm, or pairwise comparison approach, ranks candidate genes according to their expression stability [27]. The gene stability measure, or M, is determined for a given reference gene as the average pairwise variation for that gene in relation to the remaining tested reference genes. The gene with the highest M is stepwise excluded, with a new M value being repeatedly calculated for the remaining genes until a final pair of genes with the lowest M (the most stable) are obtained. Moreover, this Excel-based software determines the optimal number of references genes required for an accurate normalization of real-time PCR data. For this, the pairwise variation (V n /V n+1 ) was analyzed between the normalization factors NF n and NF n+1 ; V n values were calculated by stepwise inclusion of additional reference genes until the (n+1) gene did not contribute significantly to the newly determined normalization factor. NormFinder software (also Excel-based) is a model-based approach that ranks the candidate reference genes according to their minimal combined inter-and intra-group expression variation [28]. Such variations are combined to give a stability value, with the lowest value corresponding to the most stable expression. BestKeeper (an Excel-based statistical method) calculates the standard deviation (SD) and coefficient of variation (CV) from the input raw C T values, with genes exhibiting the lowest SD and CV being considered as the most stable [29]. Reference genes with SD higher than 1 are considered as inconsistent and are excluded. The comparative ΔC T algorithm uses the average SD values to rank the stability of all candidate reference genes, and the reference gene with the lowest SD is considered as the most stable [30]. Finally, as a complementary analysis to assess reference gene stability, the web-based tool RefFinder was employed to integrate the results from the four previous software packages, generating a comprehensive rank list of candidate genes [31].

Statistical analysis of RT-qPCR data
Statistical analyses to determine relative gene expression variations in OUT_DC, both for AGPL and AGPS, were conducted using Prism 6 (GraphPad Software). Changes in transcript abundance were obtained after: i) normalization with the two most stable reference genes using the combined outputs of geNorm, NormFinder, BestKeeper, and the comparative ΔC T method for the OUT_DC condition (RPL32 and UBCE), ii) normalization with the two genes exhibiting the highest stability with RefFinder in OUT_DC (RPS10 and rbcL), or iii) normalization with the two genes exhibiting the lowest stability with all the employed methods in OUT_DC (PGK and eIF2-1). The relative gene expression levels were also determined for AGPL and AGPS after normalization with the most stable genes found in the global analysis of all samples and conditions (ALL) based on the combined results of geNorm, NormFinder, BestKeeper, and the comparative ΔC T method (EFL and cdkA), as well as with the combination of EFL (top-ranked by RefFinder) with another of the most stably-expressed genes found by RefFinder (ACT). In all instances, data were analyzed using the Friedman test (non-parametric one-way ANOVA), and when significant, the Dunn's multiple comparison test was performed. Significance was accepted for P < 0.05.

Expression levels of the candidate reference genes
The expression stability of the candidate reference genes in T. chui was first assessed through the analysis of C T values in real-time PCR (a high C T means that the gene has a low expression level, and vice versa). Variation of data was compared between OUT_GP, OUT_DC and IND conditions, and with all data (ALL) being analyzed together (Fig 1). All the examined genes exhibited amplification prior to cycle n˚40 of the PCR amplification profile, with eIF2-1 always showing the highest C T (in some instances close to 33) across all conditions. In contrast, the lowest C T values (and hence the highest expression levels) were observed for 18S, and in some samples it was slightly higher than 9. Expression stability of reference genes geNorm analysis. As a starting point, the average expression stability (M) values of the 18 candidate reference genes were determined using the geNorm algorithm (Fig 2). Lower M values indicate higher stability. In all instances, candidate genes included in this study displayed an M value below the geNorm threshold of 1.5, thus revealing all of them as suitable for gene expression normalization. The genes with the most stable expression differed with condition and ALD and KAS, RPL32 and RPS10, and cdkA and UBCE displayed the lowest M values in OUT_GP, OUT_DC and IND samples, respectively. In ALL, the combination of the genes EFL and RPS10 was the best. As a whole, the least stable genes were eIF2-1 and PGK, although unexpectedly the former had the most stable expression in IND samples. To determine the optimal number of reference genes that would be required for an accurate normalization of gene expression analysis, pairwise variation values (V) were calculated with the cut-off of suitability set at 0.15, and below this value the addition of an additional internal reference gene does significantly improve normalization. As shown in Fig 3, the V2/3 was below the cut-off for all the experimental conditions studied and for the global analysis ALL. Thus, at least two genes were predicted as necessary for optimal normalization of gene expression levels in our experimental set-up.
NormFinder analysis. The stability of the 18 candidate reference genes evaluated with NormFinder software (Fig 4) should prevent misinterpretations of results owing to artificial selection of co-regulated genes. The stability ranking of candidate reference genes was significantly different between conditions. The top three stable candidate genes were EFL, cdkA, and RPS10 in OUT_GP, while UBCE, eIF2-2, and GAPDH were ranked top in OUT_DC. In IND, eIF2-1 was the most stable gene followed by UBCE and RPS10, although for OUT_GP, OUT_DC and ALL (pool of samples from all conditions) it was the least stable gene together with PGK. Finally, cdkA was the most stable gene in ALL followed by RPS10 and ACT.
BestKeeper analysis. The BestKeeper algorithm is a Microsoft Excel-based software and calculates the standard deviation (SD) and the coefficient of variation (CV) based on C T values. Candidate reference genes with lower SD and CV have higher stability and are more suitable for normalization and genes with SD > 1 are considered unstable and unsuitable for normalization. The results of the BestKeeper analysis are shown in Table 2. The top-ranked candidate genes in OUT_GP exhibited significantly higher SD (~2-fold) than the top-ranked genes in OUT_DC and IND. Based on SD values, the most stable genes were UBCE in OUT_GP, RPL32 in OUT_DC, and EFL in IND and ALL. Moreover, EFL was also ranked among the top three genes in OUT_GP and OUT_DC. RPL32 was ranked second in OUT_GP and ALL but ninth in IND, and UBCE ranked seventh in OUT_DC and eighth in IND. Overall, with the BestKeeper algorithm PGK, eIF2-1 and KAS were the least stable genes.
Comparative ΔC T analysis. The comparative ΔC T method uses the average standard deviation (SD) value of candidate reference genes as an indicator of expression stability. Genes with the lowest SD values are the most stable. As a whole, the highest stability of reference genes was found for IND samples and the 14 stable genes had an SD value < 1, compared to four reference genes for OUT_GP samples and one reference gene for OUT_DC samples ( Fig  5). EFL ranked first in OUT_GP samples and UBCE was the most stable gene in OUT_DC and IND samples. RPS10 had the lowest SD in the pooled samples, ALL, and was always among the top five most stable genes across all experimental conditions OUT_GP, OUT_DC and IND.
Comprehensive ranking of RefFinder. The web-based program RefFinder integrates the results of the four above-mentioned approaches (geNorm, NormFinder, BestKeeper and ΔC T method), and calculates the geometric mean of the ranking of all the candidate reference genes included in the analysis. In OUT_GP samples, the most stable genes were EFL, ALD and ACT, whereas RPS10, rbcL and UBCE were the most suitable reference genes in OUT_DC (Fig 6). In IND, UBCE, aTUB-2 and cdkA exhibited the highest expression stability. And in pooled samples, ALL, EFL, RPS10 and RPL32 were the most stable genes.
In summary, quite different rankings of the best candidate reference genes were observed when experimental samples from different culture conditions were compared and analyzed using different methods, which highlights that the results can vary significantly with the algorithm used. The geNorm output indicated that two reference genes were enough for normalization in all the conditions tested. Comparison of the ranking of the candidate reference genes with the different methods, revealed that the optimum references genes were ALD and EFL for OUT_GP, RPL32 and UBCE for OUT_DC and cdkA and UBCE for IND. Moreover, EFL and cdkA or EFL and UBCE were the most appropriate for the pooled samples (ALL). In all the above recommendations, pairs of genes belonging to the same functional group were excluded to avoid possible existence of co-regulation that might interfere with the reliability of the results. For instance, although EFL, RPS10 and RPL32 were among the most stable genes for ALL, these three genes are involved in protein biosynthesis and hence only EFL (as top-ranked by geNorm, BestKeeper and RefFinder) was chosen.

Expression profile of AGPL and AGPS in OUT_DC
ADP-glucose pyrophosphorylase catalyzes the first step of starch biosynthesis, that is, the synthesis of ADP-glucose. In eukaryotes, it is a heterotetramer comprised of two distinct subunits, two alpha (AGPS) and two beta (AGPL). As starch accumulates due to carbon fixation during photosynthesis and is further degraded in the dark, we hypothesized that AGPS and AGPL were likely co-regulated and would have similar expression patterns modulated by starch content fluctuations. To test this hypothesis and verify the validated reference genes in RT-qPCR, we quantified AGPL and AGPS expression in OUT_DC samples. The reference genes for normalization were RPL32 and UBCE, since they were the most stable for the OUT_DC samples and this was compared with the outcome of normalization with the least stable genes PGK and eIF2-1. In addition, AGPL and AGPS expression was normalized with RPS10 and rbcL, the most stable reference genes for OUT_DC samples using the RefFinder program ranking. As a complementary analysis and considering the results obtained in the combined condition ALL, the suitability of other candidate reference genes such as cdkA and ACT in combination with EFL was also evaluated. As shown in Fig 7, when RPL32 and UBCE genes were used for normalization, either alone or in combination, a statistically significant increase in AGPL transcripts was observed at 7 p.m. being 2.67-fold (UBCE) and 3.23-fold (RPL32) higher than at 7 a.m. (the calibrator time point). The gene expression results obtained with rbcL, RPS10 and their combination matched those previously mentioned, with a peak in transcript abundance at 7 p.m. (ranging between 2.76-fold and 3.10-fold higher than at 7:00 a.m. with rbcL and RPS10, respectively). This profile was also conserved when the data were analyzed with EFL and cdkA, or EFL and ACT (S6 Fig), with some slight differences in the ratios. However, when the analysis was performed with the least stable genes PGK and eIF2-1, strong differences could be found. Although the expression pattern with PGK was similar to those found with stable genes in the sense that a peak in gene expression was detected at 7 p.m. and the lowest transcript levels were found at 10 a.m., the ratios were significantly different. Up to 90.81-fold higher transcripts were measured at 7 p.m., representing a higher than 500 fold-increase in relation to 10 a.m., this ratio being significantly higher than the ratios found with the most stable genes (always lower than 25-fold). Moreover, when eIF2-2 was employed as internal control, it was striking that the highest expression levels of AGPL were detected at 4 p.m. (~28-fold higher than at 7 a.m.), although the lowest transcript amounts were measured again at 10 a.m. (0.48-fold lower than at 7 a.m.). With the combination of PGK and eIF2-1 the maximum and minimum expression levels were observed again at 7 p.m. and 10 a.m. respectively, but with ratios not only at 7 p.m. but also at 1 p.m. and 4 p.m. being strongly different to those shown with the most stable genes. As a whole, analyses with stable reference genes fitted our hypothesis in the sense that the highest transcript amounts of AGPL in OUT_DC were observed at the end of the light phase, when maximum starch accumulation might be expected. The expression of AGPS was evaluated using the same reference genes for normalization as was used for AGPL (Fig 8). The AGPS gene expression profile and expression ratios after Relative AGPL expression profiles in OUT_DC samples determined by qRT-PCR using different reference genes. In all instances, data are expressed (at the outside end of the corresponding column) as the mean fold change (mean + SEM, n = 3) from the calibrator group (OUT_DC-7 a.m.). Different letters denote significant differences (P < 0.05) between time points using the Friedman test (non-parametric one-way ANOVA) followed by the Dunn's multiple comparison test.
https://doi.org/10.1371/journal.pone.0245495.g007 normalization with RPL32 and UBCE alone or in combination, revealed a significant downregulation at 7 p.m. (~5-fold lower transcripts) compared to 7 a.m. (used as a calibrator). Similar results were also obtained for AGPS expression normalized with rbcL, RPS10 and their combination. The large drop in AGPS transcription at 7:00 p.m. was even more pronounced (~6-fold lower than at 7:00 a.m.) when EFL and cdkA or EFL and ACT were used for In all instances, data are expressed (at the outside end of the corresponding column) as the mean fold change (mean + SEM, n = 3) from the calibrator group (OUT_DC-7 a.m.). Different letters denote significant differences (P < 0.05) between times points using the Friedman test (non-parametric one-way ANOVA) followed by the Dunn's multiple comparison test.
https://doi.org/10.1371/journal.pone.0245495.g008 normalization (S6 Fig). However, expression patterns were substantially modified when normalization was performed using the least stable reference genes. For example, a significant upregulation of AGPS was observed at 7 p.m. when PGK was used for normalizarion (6.01-fold higher than at 7 a.m.), and at 4 p.m. when eIF2-1 was used for normalization (12.79-fold higher) or when both genes was used (4.17-fold higher). In contrast to what was initially hypothesized, transcriptional regulation of AGPS was opposite to that of AGPL, as the lowest transcript abundance of AGPS occurred concomitantly with the highest transcription of AGPL at the end of the light phase.
In conclusion, when appropriate reference genes are selected for normalization in gene expression analysis using different algorithms similar outcomes are obtained. However, inappropriate selection of reference genes is a major concern as it can lead to substantially modified gene expression patterns.

Discussion
In this stydy, the expression stability of 18 candidate reference genes was evaluated in the green microalgae species T. chui using geNorm, NormFinder, BestKeeper, the comparative ΔC T method, and RefFinder. Three different experimental conditions for microalgae production were tested (OUT_GP, OUT_DC, and IND), as well as a pool of all samples (ALL). The most prominent observations was that each of the above tools produced a different set of topranked reference genes in each group of experimental samples and conditions. This finding was not entirely unexpected, as such heterogenous results may be caused by application of different mathematical algorithms as found in previous studies [21,33,35,37]. As each algorithm has its own drawbacks [63], the use of different methods to define the most appropriate reference genes for target gene expression normalization in RT-qPCR analysis is strongly recommended to improve the reliability of results. Moreover, to our knowledge, the present study represents the first RT-qPCR survey of candidate genes for normalization in T. chui. The only related study was focused on the expression of a high-affinity phosphate transporter gene in T. chui using absolute rather than relative PCR quantification so that no normalization was needed [64]. Absolute quantification requires construction of a standard sense RNA for every target gene of interest to create an external calibration curve, and this other well-known constraints of absolute quantification [65], makes it significantly less practical when quantifying ten or more genes in a single study. Herein the suitability of 18 candidate reference genes for relative real-time expression analysis in microalgae samples harvested in a production setting from 4000 L photobioreactors under different conditions, OUT_GP and OUT_DC, was evaluated. The main stress conditions for microalgae were related to the daily environmental changes, with light intensity and temperature being probably the most relevant factors. This study aimed to have a final application in industry and as such is very different from the highly controlled, low culture volumes of laboratory conditions, either standard or stressful, previously used to identify reference for other microalgae species [44,47,[49][50][51][52][53]66].
Gene expression analyses represent a powerful tool for gaining insight into the molecular and biochemical mechanisms that govern physiological features and responses to environmental changes in a wide range of different organisms, including microalgae. For this reason high throughput sequencing approaches have been employed to provide a comprehensive view of the responsiveness of different metabolic pathways under defined experimental conditions, nitrogen starvation being probably the most widely assessed [58,[67][68][69]. Once the genetic resources containing thousands of predicted transcripts are generated and become available, such information can be further processed and used to study the expression of selected sets of genes involved in relevant metabolic pathways through RT-qPCR platforms containing a variable number of representative genes. For instance, 16 differentially expressed genes involved in lipid accumulation and carbon fixation were studied in the species Chorella sorokiniana grown under nitrogen limiting conditions to validate RNA-seq data [70]. In Tetraselmis suecica, a set of 14 selected genes exhibited significant up-and down-regulation using RT-qPCR in agreement with RNA-seq results [58]. Recently, a RT-qPCR platform containing more than 100 targeted genes was used for Nannochloropsis gaditana to study the influence of light quality on metabolism [71]. A similar strategy with an array of~100 genes was also employed in human cell lines to screen for the potential bioactivity of microalgae extracts [72]. These studies demonstrate that modulation of metabolic processes can be detected via gene expression analysis of gene markers. When dealing with microalgae production at an industrial scale, expression analysis with a species-specific RT-qPCR platform containing selected gene markers represents and innovative biotechnological approach for quality control of biomass. For T. chui, although there are available genetic resources, biotechnological tools still need to be developed. The present study identifying appropriate reference genes represents a first and key step for the accurate detection of varying gene expression under different conditons.
Most of the candidate reference genes included in this study have previously been tested in other microalgae species. In particular, the stability of the traditional GAPDH, ACT, and 18S reference genes has been simultaneously assessed in a range of microalgae species [17,43,47,49,50,52,53,55,70], macroalgae [40,73,74] and plants [22,75,76]. The two candidate alpha tubulin encoding genes here analyzed, aTUB-1 and aTUB-2, probably arose by gene duplication and this observation is not restricted to T. chui. Tubulin belongs to a multigene family in other eukaryotes including mammals, ciliates, fungi, or plants [77][78][79][80], and a variable number of gene duplicates have also been found in other evolutionary distant microalgae such as Bigelowiella, Cyanophora, Emiliania or Chlamydomonas [81]. Two genes encoding the eukaryotic translation initiation factor 2 (eIF2-1 and eIF2-2) were identified in T. chui and were analyzed. The eIF2-2 isoform in T. chui was most similar to mitochondrial isoforms, but lacked a signal peptide as shown by analysis with SignalP 5.0 [82]. To our knowledge, this is the first time that these gene duplicates have been tested as reference genes in microalgae, as the eukaryotic translation initiation factor 4E (eIF4E) was assessed in the dinoflagellates Prorocentrum minimum [47] and Karenia mikimotoi [50], and another gene referred to as eukaryotic (translation) initiation factor was tested in the green microalgae Closterium ehrembergii [56] or the dinoflagellate Alexandrium catanella [49].
The EFL gene included in this study is related to, but clearly distinguishable from the eukaryotic elongation factor-1 alpha gene (EF-1α) previously analyzed in several microalgae [52][53][54]. EF-1α expression was also assessed in green microalgae such as Volvox carteri [55] and Chlamydomonas [17,57], although since only EFL has been reported in green algae it seems likely that the putative EF-1α was misannotated [66,83]. In addition, the inclusion in the present study of KAS as a potential reference gene for normalization is new to microalgae, and came from previous data about gene stability in the microalgae Nannochloropsis gaditana [84]. Finally, the remaining genes (bTUB, PGK, His2A, rbcL, cdkA, UBCE, ALD, RPL32 and RPS10) were selected as they have been previously tested as reference genes in microalgae.
The results for the candidate reference genes in T. chui comparted to previous studies of other microalgae species revealed they were variable. For instance, although GAPDH and ACT are the most used reference genes for normalization in metazoan [26], few studies have demonstrated them to be optimal for normalization. For example, GAPDH was the most stable gene under different experimental conditions in Chlamydomonas [17,57], but in the present study it was not top-ranked. In contrast, ACT, which led the stability ranking in Chlamydomonas sp. [17], Nannochloropsis sp. [52], Ditylum brightwellii [45], Closterium ehrenbergiii [56], Isochrysis zhangjiangensis [51], Tetraselmis suecica [58], and was the second most stable gene in Alexandrium catanella [49], was not well ranked in our study or Volvox [55], Similarly, 18S was not the most stable gene in any condition in T. chui.
The use of alpha and beta tubulin genes has been has been reported in other microalgae species and in some cases these genes were two of the most stable genes [44]. In other studies only one of the two genes was stable [47,[50][51][52]54,56,58], and in other microalgae species none of the two genes was among those with the highest stability [17,43,55,57]. In this study, bTUB was never among the top seven most stable genes in any of the analysis or conditions assayed, and with the exception of IND, neither aTUB-1 nor aTUB-2 was in the first seven ranked genes. Overall, our study revealed that choice of the reference gene depends on species, experimental conditions and the analytical approach, highlighting the need for appropriate stability testing of reference genes before their use in expression analysis of target genes.
From an industrial perspective we assessed the expression of the T. chui genes encoding both the large (AGPL) and small (AGPS) subunits of ADP-glucose pyrophosphorylase, as the functional enzyme is a hetero-tetrameric protein composed of two alpha (or small, S) and two beta (or large, L) subunits [85,86]. The results confirmed that the use of inappropriate reference genes can lead to strong differences not only in the expression profile of the target gene, but also in its relative expression levels (expressed as fold-changes) between sampling points. This is in agreement with the findings of previous studies [17,52,54,87], which highlights the importance of validating reference gene stability. The second main objective was to study possible transcriptional regulation of both genes in an outdoor production system. The ADP-glucose pyrophosphorylase is involved in the synthesis of ADP-glucose, which is the donor for the elongation of alpha-1,4-glucosidic chains that serve as intracellular carbon and energy storage, as starch in plants and green and red algae and glaucophytes [88,89]. In microalgae, starch production can be strongly affected by light intensity and temperature [90,91], factors that significantly varied in samples collected from OUT_DC. Although AGPL and AGPS are both involved in starch synthesis the genes had a different regulation, as the peak of AGPL expression by the end of the light phase coincided with the lowest expression levels of AGPS. In a diurnal cycle, an increase in transcripts of ADP-glucose pyrophosphorylase gene was reported for the picoalgae Ostreococcus [92]. This finding suggests that AGPL and AGPS have different daily rhythms in T. chui under natural photoperiod. In this regard, genes exhibiting differential transcriptional regulation under light/dark cycles have been reported in a range of different microalgae such as Chlamydomonas [93,94], Ostreococcus [93,95], Nannochloropsis [96], Phaeodactylum [97], Tetradesmus [98], Bigelowiella [99], or Chrysochromulina [100]. More research will be needed to confirm if the transcriptional pattern of AGPL and AGPS is conserved in a diel cycle, as well as to analyze the effect of changing climate conditions on their expression.

Conclusions
As far as we are aware the present study represents the first systematic analysis of candidate reference genes for the species T. chui. Moreover, and in contrast to all previous reports on microalgae, the samples analyzed were collected from large scale outdoor photobioreactors from operating industrial production units. For each of the conditions assessed (OUT_GP, OUT_DC, and IND), distinct rankings of genes were obtained with the tested algorithms. Overall, the following references genes were identified as stably-expressed: ALD and EFL for OUT_GP, RPL32 and UBCE for OUT_DC, and cdkA and UBCE for IND. Moreover, the genes EFL and cdkA or EFL and UBCE were the most appropriate for pooled samples (ALL). Expression analysis of the genes AGPL and AGPS in OUT_DC samples confirmed that the selected reference genes were appropriate for normalization of target genes and the consequence of inappropriate reference genes for accuracy and reliability of the quantification. Finally, the present study forms the basis of an RT-qPCR platform to be used for quality control of T. chui biomass in industrial production facilities. The expression of AGPL and AGPS were determined by RT-qPCR using different reference genes. In all instances, data are expressed (at the outside end of the corresponding column) as the mean fold change (mean + SEM, n = 3) from the calibrator group (OUT_DC-7 a.m.). Different letters denote significant differences (P < 0.05) between time points using the Friedman test (non-parametric one-way ANOVA) followed by the Dunn's multiple comparison test. (TIF) S1 Table. Culture temperature (average ± SD,˚C) and PAR (μmol m -2 s -1 ) figures at sampling days and times in outdoor conditions. (DOCX) S2 Table. Candidate reference genes for normalization of RT-qPCR expression data. Information regarding the target genes AGPL and AGPS included in the study is also shown below. (DOCX) S3