Statistical differences resulting from selection of stable reference genes after hypoxia and hypothermia in the neonatal rat brain

Real-time reverse transcription PCR (qPCR) normalized to an internal reference gene (RG), is a frequently used method for quantifying gene expression changes in neuroscience. Although RG expression is assumed to be constant independent of physiological or experimental conditions, several studies have shown that commonly used RGs are not expressed stably. The use of unstable RGs has a profound effect on the conclusions drawn from studies on gene expression, and almost universally results in spurious estimation of target gene expression. Approaches aimed at selecting and validating RGs often make use of different statistical methods, which may lead to conflicting results. Based on published RG validation studies involving hypoxia the present study evaluates the expression of 5 candidate RGs (Actb, Pgk1, Sdha, Gapdh, Rnu6b) as a function of hypoxia exposure and hypothermic treatment in the neonatal rat cerebral cortex–in order to identify RGs that are stably expressed under these experimental conditions–using several statistical approaches that have been proposed to validate RGs. In doing so, we first analyzed RG ranking stability proposed by several widely used statistical methods and related tools, i.e. the Coefficient of Variation (CV) analysis, GeNorm, NormFinder, BestKeeper, and the ΔCt method. Using the Geometric mean rank, Pgk1 was identified as the most stable gene. Subsequently, we compared RG expression patterns between the various experimental groups. We found that these statistical methods, next to producing different rankings per se, all ranked RGs displaying significant differences in expression levels between groups as the most stable RG. As a consequence, when assessing the impact of RG selection on target gene expression quantification, substantial differences in target gene expression profiles were observed. Altogether, by assessing mRNA expression profiles within the neonatal rat brain cortex in hypoxia and hypothermia as a showcase, this study underlines the importance of further validating RGs for each individual experimental paradigm, considering the limitations of the statistical methods used for this aim.

Introduction In qPCR analysis, reference genes (RGs) with stable expression levels are essential internal controls for relative quantification of mRNA expression. RGs normalize variations of candidate gene expression under different conditions [1,2]. The ideal RG should be expressed at constant levels regardless of e.g. experimental conditions, developmental stages or treatments [3,4], and should have expression levels comparable to that of the target gene [5]. Nevertheless, increasing evidence suggests that the expression of commonly used RGs often varies considerably under different experimental conditions, as reviewed previously [6,7]. The choice of unstable RGs for the normalization of qPCR data may give rise to inaccurate results, concomitant with potential expression changes in genes of interest being easily missed or overemphasized. Thus, the identification of stable RGs is a prerequisite for reliable qPCR experiments [8][9][10].
RG selection should be performed using the same samples that will be compared when looking at genes of interest. For this purpose, several statistical methods have been proposed, i.e. GeNorm [11], qBase [12], BestKeeper [13], NormFinder [14], Coefficient of Variation (CV) analysis [15], and the comparative ΔCt method [16]. As previously reported [17], each of these strategies is based on certain assumptions that make the stability ranking depending on the method used, potentially leading to conflicting results.
To combine these stability rankings, two main approaches have been proposed, i) a weighted rank [18][19][20] and ii) the Geometric mean rank [21,22]. These methods use the average of the stability ranks, and in doing so ignore the limitations of each statistical method. To overcome this limitation, recently, Sundaram and colleagues have suggested an integrated approach [17], including a first selection step making use of the CV analysis (eliminating genes with CV>50%), then statistically comparing RG expression profiles between conditions and, finally, ranking them using NormFinder.
In the present study, we applied this same approach in the evaluation of the stability of five candidate RGs in a murine model of perinatal asphyxia and therapeutic hypothermia. Perinatal asphyxia is a clinical condition defined as oxygen deprivation that occurs around the time of birth and may be caused by perinatal events such as placental abruption, cord prolapse, or tight nuchal cord, limiting the supply of oxygenated blood to the fetus [23]. Recently, hypothermia has emerged as the standard of care for perinatal asphyxia. Although this treatment has been demonstrated to be effective in reducing mortality and long-term consequences of perinatal asphyxia, the underlying mechanisms of this therapy are still not completely understood [24][25][26][27][28]. Assessing gene expression changes in the neonatal hypoxic-ischemic brain may be of added value in order to further decipher the mechanism of perinatal asphyxia and to increase the effectivity of therapeutic hypothermia and related therapies.
Here, we used a murine model of perinatal hypoxic-ischemic encephalopathy that causes well-described physiological and behavioral impairments and recapitulate several key features of human perinatal hypoxic-ischemic injury [29][30][31], to address the abovementioned problems in RG selection and qPCR normalization. Several in vivo and in vitro studies on hypoxia, making use of qPCR, have been reported [32][33][34][35][36][37][38], indicating that hypoxia significantly impacts the expression of various commonly used RGs. Although some of these studies use the same or similar hypoxia models, the results vary substantially across studies, emphasizing the need to publish these validation studies prior or parallel to reporting qPCR results. We selected five candidate RGs based on published RG validation studies involving hypoxia (Table 1). Subsequently, we applied various validated methodological and statistical methods to evaluate the effects of anoxia and hypothermia on the expression stability of the candidate RGs. To evaluate the impact of the resulting RG selection, we assessed the expression levels of the Repressor Element 1-Silencing Transcription Factor (Rest), a gene that has been shown to be upregulated by hypoxic-ischemic injury in the peri-infarct cortex of adult rats following transient focal ischemia induced by middle cerebral artery occlusion (MCAO) [39]. Moreover, the proapoptotic gene BCL2/BCL-XL-associated death promoter (Bad), a gene that has been shown to be up-regulated by hypoxia in the MCAO rat model, was assessed [40]. This study provides evidence on the limitations of the current most used algorithms employed for the selection of stable RGs.

Ethical statement
Sprague-Dawley albino rats with genetic quality and sanitary certification from the animal facility of our Institution were used following the international rules and guidelines of the Federation of European Laboratory Animal Science Associations (FELASA). Animals were kept under standard laboratory conditions, with light/dark cycles of 12/12 h. Standard rat chow and water were given ad libitum. During housing, animals were monitored twice daily for health status. No adverse events were observed. All the procedures concerning the animal manipulation and treatment were performed according to the Guide of Animal laboratory Care (revisited in 1996) by the National Institute of Health Guide for the Care and Use of Laboratory Animals (Publications No. 80-23). The animal model described below has been approved by the Ethical Committee of CICUAL: "Comité Institucional para el Uso y Cuidado de Animales de Laboratorio" (Resolution N˚2079/07), Facultad de Medicina, Universidad de Buenos Aires, Argentina. All sections of this report adhere to the ARRIVE Guidelines for reporting animal research [41]. A completed ARRIVE guidelines checklist is included in S1 Checklist. Severe acute perinatal asphyxia was induced to the term fetuses using a model of hypoxiaischemia as described previously [26][27][28]. In the study, n refers to number of animals. In total twenty-four male rats were used 24/24. Four groups of 6 rats each were studied. The first group of offspring studied consisted of normally delivered naive pups that were used as controls (CTL; n = 6). After vaginal delivery of the first pup, pregnant dams were euthanized by decapitation and immediately hysterectomized. All full-term fetuses, still inside the uterus, were subjected to asphyxia by transient immersion of both uterine horns in a saline bath for 20 min at either 37˚C (perinatal asphyxia in normothermia, PA, n = 6) or 15˚C (perinatal asphyxia in hypothermia, [HYPPA]; n = 6). After asphyxia, the uterine horns were opened, pups were removed, dried of delivery fluids, and stimulated to breathe, and their umbilical cords were ligated. After recovery, one group of PA animals was placed on a cooling pad at 8˚C for 15 minutes for hypothermic treatment (PAHYP, n = 6), while hypothermic control animals (HYP, n = 6) received the same treatment. After 15 minutes of exposure to the cold environment, the core temperature of the newborns was measured with a rectal probe (mean temperature: 20.1˚C; n = 8).

Hypoxic-ischemic injury animal model
The pups were subsequently placed under a heating lamp for recovery after which they were and placed with a surrogate mother which had delivered normally within the 24 h before the experiments. Litters of 8 pups were maintained with each surrogate mother. Time of asphyxia was measured as the time elapsed from the hysterectomy up to the recovery from the water bath. To minimize individual variability groups were formed with litters from at least two different dams and only pups that adjusted to the following parameters were included: 1. Occipitocaudal length > 41mm, 2. Weight > 5g. Animals were sacrificed by decapitation 24 h post-treatment.

Total RNA extraction and reverse transcription cDNA synthesis
The brain cortex was isolated, snap-frozen in liquid nitrogen, ground into powder with pestle and mortar cooled in liquid nitrogen and then stored at −80˚C. Total RNA was isolated from about 80 mg tissue powder using TRIzol (Invitrogen Life Technologies, USA) following the manufacturer's instructions. The residual DNA was removed by the TURBO DNA free kit (Ambion Inc., UK). Yield and purity of RNA were determined by the NanoDrop ND-1000 spectrophotometer (Nanodrop Technologies, USA). RNA samples with an absorbance ratio OD 260/280 between 1.9-2.2 and OD 260/230 greater than 2.0 were used for further analysis. RNA integrity was assessed using agarose gel electrophoresis. One microgram of RNA from each sample was reverse transcribed using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) according to the manufacturer's instructions. cDNA was stored at −20˚C for future use. For qPCR analysis, each cDNA sample was diluted 20 times with nuclease-free water.

Real-time PCR
Real-time PCRs were conducted using the LightCycler 480 Multiwell Plate 96 (Roche, Mannheim, Germany) containing 1μM of each primer. For each reaction, the 20 μl mixture contained 1 μl of diluted cDNA, 5 pmol each of the forward and reverse primers, and 10 μl 2 × SensiMix SYBR No-ROX Kit (Bioline, UK). The amplification program was as follows: 95˚C for 30 sec, 40 cycles at 95˚C for 15 sec, and 60˚C for 15 sec, and 72˚C for 15 sec. After amplification, a thermal denaturing cycle was conducted to derive the dissociation curve of the PCR product to verify amplification specificity. Reactions for each sample were carried out in triplicate. qPCR efficiencies in the exponential phase were calculated for each primer pair using standard curves (5 ten-fold serial dilutions of pooled cDNA that included equal amounts from the samples set). The mean threshold cycle (Ct) values for each serial dilution were plotted against the logarithm of the cDNA dilution factor and calculated according to the equation E = 10(−1/slope) × 100, where the slope is the gradient of the linear regression line.

Reference gene selection
Based on their common usage as endogenous control genes in previous studies (Table 1), five candidate RGs were analyzed, i.e., Actb, Pgk1, Gapdh, Sdha, Rnu6b. These genes represent commonly used endogenous control genes chosen from the relevant literature and have been previously validated in rat, mouse and human brain tissues exposed to hypoxia. The selected RGs belong to different molecular pathways to minimize the risk of co-regulation between genes. The primers were designed from nucleotide sequences identified using NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Rnu6b TaqMan MicroRNA Assay (Rnu6b) was commercially available (Thermo Fisher Scientific, Product number: 4427975-001093). All other primers were ordered from Thermo Fisher Scientific with their certificates of analysis. The primer characteristics of nominated RGs are listed in Table 2. The primer sequences (5´-3´) of the target genes were as follows: Rest;-F, AACTCACACAGGAGAACGCC-R, GAGGTTTAGGCCCGTTGTGA. Bad;-F, GCCCTAGGCTTGAGGAAGTC-R, CAAACTCTGGGATCTGGAACA.

Analysis of expression stability using multiple statistical approaches
To assess the stability of candidate RGs, five statistical methods, each with unique characteristics, were used: GeNorm, BestKeeper, NormFinder, Coefficient of Variation analysis, and the comparative ΔCt method. Ct values were converted to non-normalized relative quantities according to the formula: 2−ΔCt. CV analysis, GeNorm and NormFinder calculations are based on these converted quantities, whereas BestKeeper and the ΔCt method make use of raw Cq values. To obtain a integrated ranking, we used the workflows as described in more detail previously [17,21].

Impact of selection of RGs on gene expression normalization
The impact of RG selection on gene expression quantification was assessed via examining the expression of Rest and Bad. The relative expression profiles of Rest and Bad were determined and normalized with all tested RGs. Relative fold changes in gene expression were calculated using the DDCt and Pfaffl methods. Data was expressed as mean ± standard error of the mean (SEM) from six independent samples/group with triple qPCR reactions. One-way analysis of variance (ANOVA) test was applied to analyze significant differences between conditions for each house-keeping gene. Differences were reported as statistically significant when p<0.05. GraphPad Prism 6 (GraphPad Software, USA) was used for statistical procedures and graph plotting.

qPCR
Pilot assays were performed to optimize cDNA and primer quantities. A total of 0.9 mg of RNA that was previously treated with DNase was used for the reverse transcription reaction in a total volume of 40ml. One microliter of the resulting cDNA was used for the qPCR reaction. Each gene amplification was analyzed, and a melting curve analysis was performed, showing a single peak indicating the temperature of dissociation. Efficiencies are shown in Table 2. All Ct values were between 17.0 and 33.0.

Coefficient of variation analysis
We calculated the raw expression profiles of RGs as changes of Ct values across groups and ranked the gene stability according to the CV. The CV estimates the variation of a gene across all samples, therefore, a lower CV value indicates higher stability (Fig 1). This analysis on the cortical samples revealed Gapdh as the most stable RG, and Actb as the least stable RG. This method however does not consider potential expression differences between different conditions; hence, a CV analysis alone cannot determine the best set of RGs.
To assess if the mean mRNA levels across groups were significantly different from one another, a One-way ANOVA was used. The results demonstrated that variations in the Ct values for the different treatments were different for all candidate RGs. Four of the five genes tested (Sdha, Rnu6b, Pgk1, Actb) showed significant variation in mRNA levels when comparing different conditions (Fig 2). Only Gapdh showed no significant changes. These results, making use of the raw expression profiles of the RGs, suggest that the various experimental conditions were associated with changes in RG expression levels that, as such, could skew the normalized profile of target genes. As a result, RG selection without accounting for potential expression differences between conditions is accompanied by a significant bias in the results and their interpretation. Hence, it is of utmost importance to validate the stability of RGs prior to normalization in gene expression studies. Next, to identify the optimal RG(s), the expression stability of candidate RGs was analyzed using four well known statistical methods (Table 3). First, a GeNorm analysis was performed on all five candidate genes. GeNorm calculates a stability value (M) based on pairwise variation of every two genes. In our analysis, except for Rnu6b, which presented the highest M-value (M = 1.923), all of the other candidate RGs presented M-values lower than 1.5, which is considered to be the cut-off for suitability [42]. Based on this analysis for the neonatal cortex, the most stable RGs were Pgk1 and Actb. This is in contrast to the CV analysis, that showed those genes as the least stable ones (higher CV), and to the expression profiles that showed inter-group differences.
NormFinder calculates the stability score (S) based on the inter-and intra-group variation. However, it has been reported that including genes with high overall variation can affect the stability ranking of all genes with this method [17]. Actb, Sdha and Pgk1 were the most stable RGs, presented stability values lower than 0.3. Gapdh (SV = 1.736) and Rnu6b (SV = 3.17) were the least stable.
BestKeeper uses the cycle threshold (Ct) values to calculate a standard deviation (SD), coefficient of variance (CV), and Pearson correlation coefficient (r) for each gene. Lower SD and CV values indicate more stable gene expression, and genes that exhibit a SD in Ct values above 1.0 should be eliminated and regarded as unreliable controls. Then, the remaining RG are ranked according to r values, with a higher r value indicating more stable gene expression. None of the genes analyzed were excluded for having SD above 1. The most stable RG was Pgk1 (r = 0.825), while Rnu6b was considered the least stable gene (r = 0.106). The ranking obtained from this analysis was the same as the one obtained with GeNorm.
Using the Δ-Ct method, the ranking was similar to previous rankings. The most stable RGs were Pgk1 (Av. SD = 1.41) and Sdha (Av.SD = 1.45), and the least stable Rnu6b (Av.SD = 3.23). The overall ranking depicted in Table 3 was based on the geometric mean of the previous gene ranks. This ranking indicates that for this tissue and treatment, the most stable RG was Pgk1.

Impact of RG selection on target gene expression profiles
The impact of RG selection on gene expression quantification was assessed by examining the expression of Rest and Bad. These genes have shown to be influenced by hypoxia and hypothermia. Five gene expression normalizing strategies were used to select the least and most stable RGs, and the best combination of two genes, Actb/Pgk1 (Fig 3). Expression values were calculated relative to expression in control animals, using both the ΔΔCt method (Livak & Schmittgen, 2001) and the primer efficiency method (Pfaffl, 2001 , Fig 3). Results were similar using Livak or Pfaffl methods. As expected, even when the general pattern of target gene expression was similar for most of the RGs across treatments, target gene expression levels were different depending on the RG used for normalization causing differences in the significance level of the expression patterns.

Discussion
The selection of RGs in qPCR experiments has an enormous impact on the reliability and interpretation of results in gene expression studies making it a crucial, yet often understated, process. It is now recognized that normalization of qPCR results against a single RG is likely to be inadequate and that normalization against a panel of RGs containing at least three stable RGs is preferred. However, for most of the RGs used in published qPCR studies, no thorough investigation of their variation over experimental conditions has been performed and/or reported. Many researchers continue to use a single, unvalidated RG to normalize data. The majority of studies where assessment of the RGs' stability is included make use of statistical tools like GeNorm, BestKeeper, NormFinder, CV analysis, and the comparative ΔCt method, and the results usually differ depending on the method used, making the choice of the validation method a critical step in qPCR assays. In our study, when using Geomean, Pgk1 was the most stable gene across treatments, while U6 and Gapdh were ranked as the most variable. This is in stark contrast to the CV% Analysis and intergroup ANOVA Ct variations that indicated that Gapdh was the most stable gene among groups, and Actb the least stable.
Notably, using any of these methods alone is not sufficient in obtaining bias-free results. Often, stability validation studies rank genes using Geomean, a ranking obtained from the mean rank of the statistical tools used. This method does not take into account the limitations of each algorithm separately, which is why it is increasingly considered an erroneous approach when validating RGs. This makes the identification of the best RGs very unwieldy. Using the same statistical methods, new approaches have been proposed, such as the "Integrated approach" introduced by Sundaram and colleagues [17] that has been shown to provide a more accurate estimate of RG stability. Applying the same approach in a distinct experimental paradigm, the present study underscores the validity and importance of such an integrated approach.
Although we analyzed a small set of candidate RGs, we found substantial differences in the stability rankings obtained with the different methodologies, and the associated bias was reflected in our target gene quantification. Our study emphasizes the necessity of validating RGs previous to assessing target gene qPCR data and the importance of choosing the right set of statistical methods for doing so. Such an approach would lead to more accurate and reproducible expression assessments. Supporting information S1 Checklist. (PDF)