Selection and validation of reference genes for normalisation of gene expression in ischaemic and toxicological studies in kidney disease

Normalisation to standard reference gene(s) is essential for quantitative real-time polymerase chain reaction (RT-qPCR) to obtain reproducible and comparable results of a gene of interest (GOI) between subjects and under varying experimental conditions. There is limited evidence to support selection of the commonly used reference genes in rat ischaemic and toxicological kidney models. Employing these models, we determined the most stable reference genes by comparing 4 standard methods (NormFinder, qBase+, BestKeeper and comparative ΔCq) and developed a new 3-way linear mixed-effects model for evaluation of reference gene stability. This new technique utilises the intra-class correlation coefficient as the stability measure for multiple continuous and categorical covariates when determining the optimum normalisation factor. The model also determines confidence intervals for each candidate normalisation gene to facilitate selection and allow sample size calculation for designing experiments to identify reference genes. Of the 10 candidate reference genes tested, the geometric mean of polyadenylate-binding nuclear protein 1 (PABPN1) and beta-actin (ACTB) was the most stable reference combination. In contrast, commonly used ribosomal 18S and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) were the most unstable. We compared the use of PABPN1×ACTB and 2 commonly used genes 18S and GAPDH on the expression of 4 genes of interest know to vary after renal injury and expressed by different kidney cell types (KIM-1, HIF1α, TGFβ1 and PECAM1). The less stable reference genes gave varying patterns of GOI expression in contrast to the use of the least unstable reference PABPN1×ACTB combination; this improved detection of differences in gene expression between experimental groups. Reduced within-group variation of the now more accurately normalised GOI may allow for reduced experimental group size particularly for comparison between various models. This objective selection of stable reference genes increased the reliability of comparisons within and between experimental groups.

Introduction Quantitative real-time polymerase chain reaction (RT-qPCR) is a widely used, sensitive and specific method for detection and quantification of messenger RNA (mRNA) expression over a large dynamic range for validation of selected genes identified by other techniques (e.g. RNA-seq) [1] [2]. For rare mRNA species, RT-qPCR may be the only practical way to quantitate gene expression. A critical aspect of RT-qPCR assessment of gene expression is controlling for the amount of starting material, i.e., normalisation of gene expression to an endogenous gene not affected by the experimental conditions. Normalisation is critical for comparison of samples from different sources that may contain varying quantities of mRNA [3]. Differences in the quantity of mRNA may result from differences in extraction and isolation efficiency of mRNA species. Variation in input quantity to RT reactions and inefficiency in complimentary deoxyribose nucleic acid (cDNA) synthesis may also introduce experimental variation [4]. For this study, these sources of variation were termed 'experimental error'. Varying normalisation strategies have been used to minimise the intrinsic variability resulting from such sources. Techniques include normalisation to the initial total RNA, addition of known quantities of cDNA, and the use of internal reference genes [4,5]. Endogenous reference genes are regarded as optimal, since detection and amplification of reference genes and target genes occur under the same conditions [6]. Nevertheless, critical prerequisites for an ideal reference gene are a broad dynamic range and constant level of expression compared to the gene of interest (GOI) under the same experimental conditions and, ideally, under different experimental conditions [7,8]. Selection of appropriate reference genes is critical to reducing experimental error. However, there is no consensus regarding the best reference genes based on rat strain, tissue type and injury model with a variety of genes suggested for various organs or regions of various organs [9][10][11][12][13][14][15][16].
Housekeeping genes (HKGs) are ideal candidate reference genes as these are constitutive genes required for basic cellular function and expressed in most cells under normal physiological conditions [9]. However, HKGs may be affected by experimental conditions as many represent metabolic pathways or structural genes that may be altered by experimental interventions [10]. Commonly employed HKGs for RT-qPCR include beta actin (ACTB), glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and 18S ribosomal RNA (18S). The choice of these mRNA species stems from use in traditional non-or semi-quantitative methods such as Northern blotting and have often not been validated for RT-qPCR across diverse experimental conditions [8]. A growing body of evidence suggests that, both in vivo and in vitro, there may be considerable fluctuation under varying experimental conditions making them unsuitable as reference genes for RT-qPCR [2,10,12,[17][18][19][20][21]. For example, GAPDH varies under hypoxic stress [13] and 18S varies under both toxic and hypoxic stress [2]. In addition, reference genes may be tissue specific, so that ideal HKGs may differ between tissues and experimental conditions [22,23]. Currently, normalisation against the geometric mean of the most stable reference genes is regarded as the best strategy for error reduction in raw qPCR data [24].
A number of statistical algorithms such as Normfinder [11], geNorm (14, improved to qBase+), BestKeeper [25,26] and comparative delta quantification cycle (ΔCq) approach [27] were devised to evaluate stable reference gene/s for given experimental conditions. While these represent an improvement over arbitrary selection, there are limitations to these algorithms. They do not allow for randomly missing data and do not account for multiple systemic effects (experimental conditions), systemic effects related to continuous variables, or for reference gene-systemic effect interactions. In addition, stability values are not accompanied by confidence intervals (CIs). This precludes determination of minimal sample size, which could reduce error in reference gene selection. It also prevents ranking of reference genes with close stability values. In order to address these limitations a 3-way linear mixed-effects model (LMM) was employed based on the intra class correlation coefficient (ICC) [28].
The rat is one of the most commonly used models in the study of renal disease [29]. Ischaemia-reperfusion injury, IRI, and toxic injury represent the majority of causes of acute kidney injury [30]. However, pre-clinical animal studies have been plagued by inconsistent, poorly reproducible results. Gene expression studies often guide pre-clinical studies and direct further investigation. A consistent method of normalising GOI expression between experimental groups in time and across treatment strategies is critical. Thus, validated stable endogenous reference genes are needed to normalise RT-qPCR data for rat kidney and other studies.
There has been an increase in the number of studies evaluating the stability of gene expression in various tissues, however, few have studied the reference genes in rodent kidney. Validation studies of reference genes in rodents have been performed mainly in liver and heart injury models, usually for ischaemic and toxic injury [2,12,13,31]. We selected 10 candidate genes for analysis based on an exhaustive literature review of rat studies performed over the last 20 years to evaluate the suitability of reference genes. Table 1 summarises the most pertinent of these rat studies and compares these with human studies.
Experimental conditions which overlapped with the present study were weighted to a greater extent. Hence studies concerned with rat and kidney models were weighted to a greater extent than those of IRI which in turn were weighted to a greater extent than the rest of the studies. An arbitrary scale out of 5 was used to weigh each study when selecting 10 candidate reference genes for the present study based on this criteria and is detailed in Table 1. Ppia and Polr2a were excluded from analysis as although they have been identified as stable in myocardial and carotid body IRI models [13,38], the aforementioned Ppia and Polr2a genes were shown to be the most unstable in kidney tissue in a study analysing reference genes in obese zucker rats [9].
The primary aim was to determine the most stable reference genes for normalisation of gene expression studies in ischaemic and toxic rat renal kidney injury models. A second aim was to compare the results of the current standard techniques with that of a 3-way LMM and to estimate the sample size for determination of accurate qPCR results for given combinations of reference genes with known ICCs. Kidney injury molecule (KIM)-1, Hypoxia Inducible Factor 1 Subunit Alpha (HIF1α), Platelet and Endothelial Cell Adhesion Molecule 1 (PECAM1), Transforming Growth Factor Beta 1 (TGFβ1), were assayed as representative GOIs to allow statistical comparisons.

Descriptive statistics and gene expression
Descriptive statistics of the gene expression of the candidate reference genes is in Table 2 and plate specific efficiencies and correction factors are listed below in Table 3. As expected 18S showed the highest expression with an arithmetic and a geometric mean Cq of 15 while the lowest expression was YWHAG. The maximum standard deviation of Cq values was found for 18S. ACTB and PABPN1 had the lowest dispersion of Cq values from the mean. Reference gene Cq distributions were normal in each case and D'Agostino-Pearson test p > 0.05. A box plot of reference gene RQ values for a representative experimental run is presented in Fig 1. Amplification efficiency values were > 1.85 for all reference genes with the highest efficiency being 2. Mean efficiencies for the same reference gene were similar ±0.05 for all 3 experimental runs.

Reference gene stability
Reference gene ranking. Normfinder, qBase+, BestKeeper and comparative ΔCq approaches were used to assess the initial stability of candidate reference genes. The candidate reference genes are ranked in Table 4 with corresponding stability values in descending order with the most stable reference gene at the top. Normfinder ranked HMBS as the most stable (0.23) and PABPN1 and YWHAG as second (0.36). For Normfinder, lower values have greater stability. For a 2 gene stability factor, Normfinder found HMBS and YWHAG as the best two genes to construct a normalisation factor of 0.187. The qBase+ algorithm ranked ACTB,  comparative ΔCq approach ranked HMBS, PABPN1 and ACTB as the most stable. All techniques ranked GAPDH and 18S as the least stable.
As the different algorithms produced slightly differing rankings, we determined a consensus ranking using weighted rank aggregation. As there were 10 reference genes, we initially used the brute force approach with Spearman footrule distance applied to generate all possible rankings with minimum value of objective function. A second weighted consensus ranking was obtained with the cross-entropy Monte Carlo algorithm (in RStudio) (S2 File). Both brute force and cross-entropy Monte-Carlo approaches gave identical results (Table 5). PABPN1 and HMBS were most stable using these methods, while GAPDH and 18S were the least stable.

3-way linear mixed-effects model
Ideal normalisation factors for animal studies employ combinations of at least 2 reference genes [39]. ICC values ranged from 0 to 1, with higher values indicating greater stability. ICC estimates with a lower 95% CI limit less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.90 are indicative of poor, moderate, good, and excellent reliability [40]. A normalisation factor constructed from ACTB and PABPN1 was the most stable with an ICC of 0.86 (95% CI, 0.65-1), indicating moderate to good reliability. The best 3-reference gene normalisation factor was ACTB, PABPN1 and YWHAG with an ICC of 0.77 (95% CI, 0.55-0.99), but with a lower limit less than ACTB×PABPN1 (Table 6). Given the experimental conditions, a minimum sample size of 106 would have been required to improve the precision of the ICC estimate and to narrow the width of the 95% CI to a width of 0.1 (S3 File, section 3). S4 File in Table 1 in Excel™ S4 File represents a tabulation of minimal sample sizes necessary to arrive at two to five 'true reference genes' with ICC ranging between 0.7 and 0.9 and two-sided confidence interval width = 0.1 and 0.2. 'True reference genes' are the reference genes determined to be the most stable for the specific experiment instead of the candidate reference genes. For an example, in the present experiment, there are 10 candidate reference genes but only two 'true reference genes' as determined by the 3-way LMM method. The accompanying Excel™ calculator tool can also provide the researcher with the minimum sample sizes necessary for a desired ICC and confidence interval width. Relevance of selecting a particular normalisation factor. Four commonly queried GOIs in the IRI and toxicological renal injury literature (KIM-1, PECAM1, HIF1α and TGFβ1) were analysed to test the impact of choice of normalisation factors/reference genes. Depending on reference gene/normalisation factor used, significant discrepancies in GOI fold differences between experimental groups were found in both ischaemic and toxicological injury models.
KIM-1 is a transmembrane glycoprotein expressed in proximal tubules and upregulated following AKI. Its ectodomain appears in urine after cisplatin toxicity, IRI and after feeding 0.25% adenine [41,42]. KIM-1 was selected to demonstrate the effect of selection of reference genes on normalisation of GOI since its expression varies with different acute challenges. PECAM1 is an endothelial cell junction molecule also expressed to different degrees on leukocyte sub-types and platelets [43]. Paralleling peritubular capillary rarefaction, PECAM1 is reduced in kidney cortex and outer strip of outer medulla in rodent IRI models [44]. HIF1α is a master regulator of cell responses to hypoxia, expressed in both proximal and distal tubular cells and leads to expression of several genes involved in adaptation to decreased oxygen availability [45]. TGFβ1 is a modulator of fibrosis in many models of tissue injury and is upregulated in proximal tubular epithelial cells after renal IRI [46].
GOI expression was assayed under the same conditions as the candidate reference genes and the normalised relative quotients (NRQs) were obtained by normalising the GOIs against reference genes/reference gene combinations (Figs 2-9). Tables 7, 9, 11 and 13 summarises the statistical methods used to analyse the respectively KIM-1, PECAM1, HIF1α and TGFβ1 gene expression and Tables 8, 10, 12 and 14 summarises the significance of experimental treatment  HMBS×PABPN1 and ACTB×PABPN1. However, as expected from stable normalisation factors, the pattern of fold differences between various experimental treatment groups were similar when KIM-1, PECAM1 and HIF1α were normalised against HMBS×PABPN1 (Figs 2c, 4c and 6c) or ACTB×PABPN1 (Figs 2d, 4d and 6d). KIM-1 and PECAM1 gene expression patterns were more consistent with histological injury when normalised against the stable normalisation factors HMBS×PABPN1 or ACTB×PABPN1 compared to either 18S or GAPDH.
For IRI, the fold expression of KIM-1, PECAM1, HIF1α and TGFβ1 observed among treatment groups depended on the normalisation factor employed (Figs 3, 5, 7 and 9 respectively). As expected, KIM1 and TGFβ1 showed a pattern of upregulation with increasing injury when normalised against the stable ACTB×PABPN1 (Figs 3d and 9d). PECAM1 gene expression tended to decrease with severity of renal injury and normalisation against GAPDH (Fig 5b) along with more stable HMBS×PABPN1 (Fig 5c) and ACTB×PABPN1 (Fig 5d) produced this pattern. HIF1α which is induced by reperfusion and oxygen availability is downregulated 24 hours after insult [47]. Tissues from all treatment groups in the present study were obtained either on day 7 or 14 after cisplatin or IRI; this might explain the discrepant HIF1α results irrespective of reference gene or reference gene combination (Figs 6a-6d, 7a-7d).
2-way ANOVA analysis of KIM-1, PECAM1, HIF-1α and TGFβ1 gene expression results (NRQs) confirmed superiority of using the geometric mean of 2 reference genes rather than one (p < 0.05) (S5 File, S5 Figs 1-4 in S5 file and S4 File in Table 1). For example, normalisation against PABPN1 did not demonstrate a difference in gene expression between control and IRI on sCKD (acute) or IRI on sCKD (chronic) groups, in contrast to normalisation against the geometric mean of ACTB×PABPN1.

Discussion
This is the first study to our knowledge that compares the most reliable reference genes for normalisation for RT-qPCR for transcript GOI levels in rat kidneys under ischaemic and toxicological conditions using multiple statistical approaches [48]. HMBS and PABPN1 were among the top 3 most stable genes according to Normfinder, qBase+, BestKeeper and comparative ΔCq methods. In contrast, the most commonly used reference genes in rat studies, 18S and GAPDH, were the least stable of the 10 genes assessed by all algorithms. The robustness of these results was highlighted by analysis using brute force and Monte Carlo cross entropy weighted aggregation that produced identical rankings. After weighted aggregation, PABPN1, HMBS and ACTB were ranked the most reliable reference genes. In contrast, in mouse models of renal cystic disease, GAPDH, peptidylprolyl isomerase A and phosphoglycerate kinase have been ranked highest, highlighting the need for model, species and strain specific housekeeping genes [31].

PLOS ONE
Reference gene selection was further refined and allowance made for experimental group variation not accounted for by the current models by development of a 3-way LMM. This model defined confidence intervals for stability values and group sizes and was superior in selecting optimal reference genes. In addition, it accommodated multiple continuous and categorical variables with sample random effects, gene fixed effects, systematic effects, and gene by systematic effect interaction. These are major advantages of using the 3-way LMM. Reference gene combinations can help reduce the measurement errors in single reference gene and improve reliability in the gene normalisation process, and the 3-way LMM provides statistical inference, including p-value and confidence intervals for stability. Statistical inference allows the selected reference gene combination to be generalizable to other experiments. geNorm and its' newer version qbase+, selects reference genes by using the standard deviation of respective gene expressions as a stability measure. It calculates the variation in the logtransformed gene expression ratio 'M', between a candidate reference gene with respect to all other reference genes in pairwise comparisons across all samples [14,27]. A higher 'M' value represents greater variation in gene expression and less stability. qbase+ follows a step-down approach to remove genes with the highest M-value step by step and recalculates M-values for the remaining genes. There is no objective cut-off point to determine when to stop the process. This approach means that the algorithm tends to select the most correlated genes rather than the candidate reference genes with the least variable expression. qbase+ cannot take into account covariates such as systemic effects or interactions between genes and systematic effects in analysis.
To determine the optimal number of reference genes required to construct the most stable normalisation factor, qbase+ calculates the average variation between log-transformed expression ratios of sequential normalisation factors (NF n /NF n+1 ). The authors showed that a

GOI = gene of interest
https://doi.org/10.1371/journal.pone.0233109.t013 pairwise variation (NF n /NF n+1 ) < 0.15 is unlikely to be improved with the inclusion of additional reference genes [14]. The linear mixed model (LMM) searches for the optimum number and gene combinations to construct the most stable normalisation factor. The linear mixed model (LMM) algorithm stops if the lower bound of 95% confidence interval of ICC does not increase for higher-order gene combinations. Using the lower bound of 95% confidence interval for ICC takes the variation of the stability measure into account and avoids selection of genes with ICC estimated so imprecisely that the researcher cannot be confident of a high value. Sample size calculation is based on the accuracy of the stability measure to aid optimal experimental design. The minimum effective sample size calculations take into account the number of reference genes studied, study design, stability level and desired confidence interval. The aim of the present study was to determine the most stable normalisation factor, i.e., the best combination of reference genes for a specific set of experimental conditions. GOIs will vary with experimental conditions and the correct sample size for the GOI must be determined independently. However, sample sizes in a gene expression study are dependent on both the effect size of the intervention (numerator) and stability of the reference genes utilised (denominator). The 3-way LMM, takes into account both continuous and categorical covariates and allows the minimal sample sizes required to determine this denominator with the least the error i.e. the most stable reference gene combination (normalisation factor) for a given experiment. As all statistical inferences were incorporated in the unified mixed-effects model, statistical inference for stability measures and systematic effects can be analyzed simultaneously. Furthermore, combining multiple statistical inferences in one model prevents inflation of Type I error. Calculation of the overall optimal sample size for a specific power (1-β) in a GOI experiment depends on the fold difference, i.e., the effect size of the specific GOI under the experimental conditions used (e.g., KIM-1) [49]. However, since GOI expression is shown as a ratio of the raw GOI data to the normalisation factor. Reduced error or variation in references genes reduces variation after GOI normalisation. Choosing a normalisation factor with an ICC of close to 0.9 (ACTB×PABPN1) increases the reliability of the group differences in GOI expression for a given experimental group size. Less reliable normalisation factors with lower ICC will have higher co-variates. If variance between groups is increased, larger experimental group sizes will be necessitated to maintain power and reliability of any experimental group differences. This can be further demonstrated by the following; essentially, the researcher is looking at the variance of the quotient of GOI/reference gene combination. The formula to calculate the variance of a product is complex and shown below [50].
Here, X = GOI, Y = 1/reference gene combination, μ x = mean of x and μ y = is mean of Y. If X and Y are not independent, the product of the variance is calculated as below: VarðX:YÞ ¼ CovðX 2 ; Y 2 Þ þ ððVarðXÞ þ m 2 x Þ:ðVarðYÞ þ m 2 y ÞÞ À ðCovðX; YÞ À m x m y Þ 2 ; However, if covariance were zero and GOI and HKG are independent i.e. not varying with each other in some way then: This value for Var(X.Y) is clearly going to be a greater value if there is any significant covariance. The lesser the ICC the greater the covariates hence a greater variance and standard deviation GOI/reference gene. The standard deviation of GOI/reference gene is used to calculate power and a reduced standard deviation results in potentially smaller group sizes required to detect difference between experimental groups.
Thus, it is clear that any covariance between the variables increases the variance of GOI/reference gene and hence a larger sample size will be required for the same level of significance in a power calculation for determination of significant differences in expression of GOIs between groups. However, calculation of a change in sample size based on the variance of the product of GOI and 1/reference gene is technically challenging unless the covariance values are known in advance for the experimental conditions under study.
Two further considerations are the relative abundance of reference gene expression and functional class. The most stable reference genes (HMBS, PABPN1 and ACTB) had similar levels of expression but the Cq values were almost an average 5 cycles lower than less stable genes such as GAPDH and TBP. While lower abundance may theoretically increase error due to lower fluorescence intensity, this did not occur. GAPDH and 18S were of higher abundance and ranked the lowest by all 4 algorithms. Identifying genes from different functional classes (HMBS, PABPN1 and ACTB) reduces the risk of selecting co-regulated genes.
Normalisation by the geometric mean of at least 2 reference genes has been strongly advocated due to potential for confounding [4,51,52]. The data confirm that the geometric mean of 2 genes was superior to a single reference gene. Most modern PCR machines can run 2 fluorescence detection channels simultaneously using a probe-primer system. While additional probes are more expensive, a single reaction with reduced risk of technical and experimental variation may ultimately be more time and cost efficient.
There are limitations to this study. The pattern of injury is not homogenous throughout the cortex and medulla under light microscopy; kidney samples that included both unaffected cortex and medulla do not take account of heterogeneity of injury or cell type [53,54]. Nevertheless, most gene expression studies in rat kidneys are performed on pooled tissue and use methodology consistent with ours. Consequently, the most stable reference genes in this study are likely to have external validity for toxic and ischaemic injury in rodent kidney. Note that protein expression does not necessarily correlate with gene expression for many reasons including variable effective translation and protein turnover [55][56][57].
Determination of gene expression stability requires evaluation within the context of treatment and tissue [10,52]. Major practical considerations in performing assessments of the best reference genes include cost, animal usage and tissue of interest. Combining the data set for analysis in our studies likely resulted in higher expression variability of the reference genes. This was reflected in the high sample size required to minimise the 95% confidence interval of even the most stable normalisation factor and highlights problems with variation due to the effect of covariates even with the most stable reference genes when treatment groups are pooled for evaluation.
Species and intervention type may also be important considerations. No studies have assessed the biological variability of reference genes in kidneys of different rat strains. In liver, 18S was the most stable reference gene for both Wistar and Zucker rats [38]. The SD rat strain was used in the present kidney studies. It is likely that both intervention and tissue type contribute to variability of gene expression. Cost and logistical difficulties likely limit academic laboratories from using multiple sets of reference genes to normalise RT-qPCR study data with identical animal species and developmental stages.

Conclusion
The results emphasise the need to determine stable reference genes and a geometric mean of 2 stable reference genes is superior to 1 for normalising a GOI. ACTB and PABPN1, validated as stable under multiple experimental conditions, provided optimal stability as reference genes. The 3-way LMM provided an effective method for identifying stable pairs of reference genes in any context. These techniques should reduce variance and increase reproducibility and reliability of pre-clinical studies.

Animals
Animal work was conducted in strict accordance with the Australian code for the care of animals for scientific purposes (National Health and Medical Research Council, 2013) and approved by the Animal Ethics Committee at the University of New South Wales (ACEC Approval 14\133A). All surgery was performed under isofluorane anaesthesia, and all efforts were made to minimise suffering. Rats were randomly allocated to the experimental treatment groups and subjected to nephrotoxic challenges (adenine or cisplatin) and/or ischaemia-reperfusion induced kidney injury as shown in Table 15 and S6 File (S6 Figs 1-3 in S6 File). Adverse events or deaths prior to endpoint were not observed. Animals were humanely euthanised under anaesthesia. The animal models, creatinine assays and histopathology are described in S6 and S7 Files [58].

Reference gene selection
Reference genes were selected from a literature search of rat studies (Table 16). Primers were designed using Beacon Designer (Palo Alto, California, USA). All primer pairs were intronspanning, non-homologous to other rat genes and avoided structural folding areas.

RT-qPCR
Specific RNA extraction and RT-qPCR methods are detailed in S8 File. Reference genes and KIM-1 were amplified in triplicate in all samples per experimental run. Triplicates were reassayed if there were missing values or if Cq differences were > 1 cycle within the triplicate. The maximum Cq value for reference genes was 34 cycles. Intra-experiment variation between technical replicates was < 1.2% and inter-experiment variation was < 1.5%. Samples were distributed over two 96-well plates in each experimental run and hence a factor correction for inter-plate variation was performed for each run [59]. Additional details of inter-plate variation correction are in S8 File. Plate specific amplification efficiencies were generally similar (± 0.05) for a given reference gene.

Statistical analyses
For expression stability analysis, plate corrected triplicate averages (i.e., raw Cq values) or relative quantities (RQs) were used (S9 File). Expression stability was analysed using NormFinder, qBase+, BestKeeper and comparative ΔCq statistical algorithms to determine the most stable reference gene or gene pairs for normalisation. The 4 algorithms produced slightly varying results. Hence a universal rank was constructed by using brute force and Monte Carlo cross entropy methods (S2 File) utilising RankAggreg package in RStudio [60]. The workflow describing input data and expression stability analysis for these statistical algorithms is detailed in S8 File. Linear mixed-effects model. To refine the reference gene selection process further, a 3-way LMM was developed that used the ICC of gene expression levels as the stability measure to rank reference genes with low residual variation within the intervention group and minimal between group variation. The 3-way LMM accommodated nested experimental designs, estimated variance components for determination of confidence intervals of stability values and provided minimum effective sample sizes for selection of reference genes for future studies [27].
The 3-way LMM was constructed with samples nested in experimental treatment groups. Systemic effects included gene expression variation due to experimental intervention and effects due to interactions with the reference genes. The reference gene combination with an ICC (ρ) with a 95% CI that has the highest lower limit is selected as the most stable normalisation factor. This provides an algorithm to determine the total sample size necessary to select reference genes with least uncertainty [28]. The minimum effective sample size necessary can be calculated to accurately estimate ICC of a given set of reference genes with desired precision (i.e. width of the [100(1-α)]% CI) [28,61]. A detailed workflow of the 3-way LMM and the formulas for effective sample size calculation are described in S3 File. 3-way LMM calculations were performed using SAS software (version 9.4, 2017, SAS Institute Inc, Cary, North Carolina, USA).
Normalised relative quotients. NRQs for GOIs was calculated as described by Hellemans, et al [26] and this is detailed in S10 File, formulas 11-13. Normality of the NRQ data set for each GOI and reference gene/ reference gene combination was confirmed by D'Agostino-Pearson test. The mean amplification efficiency and RQs for all GOIs are listed in S11 File (S11 File, Table 1 in S11 File).
Further statistical analysis. Gene expression results are presented as NRQ and expressed as means ± standard deviations. Comparisons were made using 1-way or 2-way ordinary ANOVA and Tukey's or Bonferroni's multiple comparison tests (post hoc) when the distribution of the variables was normal. The Kruskal Wallis test statistic was calculated and Dunnett's multiple comparison test was performed (post hoc) for non-normally distributed data. The D'Agostino-Pearson test was used to assess normality.