The authors have declared that no competing interests exist.

Conceived and designed the experiments: UD UZ MT. Performed the experiments: MT JA. Analyzed the data: RF SV. Contributed reagents/materials/analysis tools: UZ UD RF. Wrote the paper: RF SV JA UZ OS. Conceptual work: OS RF.

Gene expression analysis is an essential part of biological and medical investigations. Quantitative real-time PCR (qPCR) is characterized by excellent sensitivity, dynamic range and reproducibility, and is still regarded to be the _{t}, respectively. They rely on one or several stable reference genes (RGs) for normalization, thus potentially causing biased results. We therefore applied multivariable regression with a tailored error model to overcome the necessity of stable RGs.

We developed a RG independent data normalization approach based on a tailored _{t} values within samples of similarly treated groups are equal. Performance of LEMming was evaluated in three data sets with different stability patterns of RGs and compared to the results of

If RGs are coexpressed but are not independent of the experimental conditions, the stability criteria based on inter- and intragroup variation fail. The linear error model developed, LEMming, overcomes the dependency on RGs for parallel qPCR measurements, besides resolving biases of both technical and biological nature in qPCR. However, to distinguish systematic errors per treated group from a global treatment effect, an additional measurement is needed. Quantification of total cDNA content per sample helps to identify systematic errors.

Fluorescence-based quantitative real-time PCR (qPCR) is the commonly accepted gold standard to quantitate the amount of mRNA transcripts in biological samples. Benefits of this procedure over conventional methods for measuring RNA include its sensitivity, large dynamic range and its potential for high throughput as well as accurate quantification [

Depending on the number of measured gene transcripts and samples, the available normalization strategies can be roughly divided into three groups: (i) knowledge-driven approaches, (ii) data-driven approaches and (iii) modeling approaches. Knowledge-driven approaches are usually applied to small data sets that measure only a few gene transcripts in a limited number of samples. In such cases, preselection of internal standards is used for data normalization. Usually, such an internal standard is represented by a number of reference genes (RGs) known to be stably expressed under the different experimental conditions. While a single RG is sufficient for data normalization under ideal assumptions, according to MIQE guidelines [

The second group of approaches comprises the data-driven normalization methods such as quantile normalization and rank-invariant set normalization. These methods were initially developed for the normalization of high-throughput gene expression data such as microarrays. In their study, Mar et al. [

The third group of methods is formed by modeling approaches, i.e., approaches that model the gene expression values as a composition of the true gene expression and various effects of technical and biological nature. For example Steibel et al. [

The use of RGs in a knowledge-driven approach is widely spread for normalization and is implemented in many software tools for qPCR data analysis. Software tools are reviewed by Pabinger et al [_{t} method described by Livak and Schmittgen [_{t} method relies on a single RG. Current tools like ^{+}_{n}) using the geometric mean of expression levels of

Bas et al [

Numerous studies [

To solve these problems we introduce a new modeling approach based on multivariable regression, which is specialized for the normalization of parallel qPCR data. Spurgeon et al. [

We selected three data sets with different stability patterns of RGs reflecting different experimental situations:

^{5} cells/well density in high-glucose DMEM medium (11965-092, Life Technologies), containing glutamine and 10% FCS. 24h later, 100 μM WY14,643 (4-chloro-6-(2,3-xylidino)-2-pyrimidinylthioacetic acid; C7081, Sigma) or the solvent control DMSO (dimethyl sulfoxide; D9170, Sigma) was added to 4 wells each (DMSO for the samples A, B, C, and D; WY14,643 for the samples E, F, G, and H). After 48h of treatment, all wells were lysed using 300 μL of RLT buffer from the RNeasy kit (74104, Qiagen) and RNA was immediately isolated following the manufacturer’s instructions. Next, three independent RT reactions were performed from each sample, resulting in 3 cDNAs for each sample (i.e. A1, A2, A3, etc.). Finally, the 24 generated cDNAs were run in technical duplicates on a 48×48 microfluidic dynamic array. To sum up, DS1 has four biological replicates, three technical replicates for each cDNA conversion step and two replicates for the qPCR step per condition. DS1 is available in

_{2}, 2 μL of 2.5 nM dNTP-Mix, 0.5 μL of 50 μM random hexamers, 0.2 μL of RNase Inhibitor, 0.25 μL of 50 U/μL Multiscribe reverse transcriptase and 1.85 μL RNase-free water. All reagents were purchased from Applied Biosystems (TaqMan Reverse Transcription Reagents: N808-0234). The reaction mixtures were mixed with RNA and incubated at 25°C for 10 min, 48°C for 30 min and then 95°C for 5 min. All RNA samples were transcribed twice to detect systematic errors during the cDNA synthesis. In the end, the 48 generated cDNAs were loaded in technical duplicates on a 96×96 microfluidic dynamic array. Likewise, all 46 primers were loaded twice on the array as technical replicates. DS2 is available in

Liver tissue samples, generated in a complex

The experiment was originally designed to explore the effect of vaso-active drugs (L-NAME, Molsidomine, saline) on the recovery from focal outflow obstruction in rats with portal hypertension due to liver resection (named: ligation/PH). Focal outflow obstruction was induced by ligation of the right median hepatic vein. Animals subjected to ligation only (named: ligation) and untreated rats (named: untreated) were used as controls. Samples were obtained after observation times of 0h, 24h, 48h or 7d from 3 different locations of the right median lobe: the obstructed zone, the border zone and the normal zone. Further details regarding the experimental design and procedures are published in Hai et al. [

The mRNAs were extracted from the frozen tissues following the manufacturer’s protocol for the Qiagen RNeasy Mini Kit (Valencia, CA). RNA quantity was determined using Nanodrop (Thermo Scientific, Waltham, MA) and the quality was assessed with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). The RNA integrity number (RIN) was above 8.5 for all samples. cDNA synthesis was performed as described for DS2 with reagents from Applied Biosystems (TaqMan Reverse Transcription Reagents: N808-0234).

Pre-designed validated Taqman Gene expression assays were purchased from Life Technologies (Darmstadt, Germany) for the detection of human (DS1), mouse (DS2) and rat (DS3) transcripts. Gene expression assays are listed in the corresponding data set table (

For the analysis of DS1, the following genes were used as potential reference genes:

The assays of 9 reference genes used for DS2 and DS3 analysis are:

qPCR was carried out using the 48×48 (for the DS1) and 96×96 (for the DS2 and DS3) Dynamic Array (Fluidigm Corporation, CA, USA) with minor modifications to the manufacturer’s protocol found in Spurgeon et al. [

A dynamic array for Fluidigm Biomark measurements has 48 or 96 probe slots and 48 or 96 sample slots. The center of the chip is an integrated fluidic circuit (IFC), which is a network of fluid lines, NanoFlex valves and reaction chambers [

A protocol for quantification of mRNA by real-time RT-qPCR (reverse transcription (RT) followed by polymerase chain reaction (PCR)) is given by Nolan et al. [_{0}, the cDNA is roughly doubled (PCR efficiency _{t} are registered.
_{t} value indicates a lower mRNA starting concentration _{0}.
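
The relation between the threshold cycle and the starting concentration can be written out as a short worked equation. This is a sketch of the standard exponential amplification model, not a formula taken from the text: the symbols N_0 (starting amount), N_c (amount after c cycles), E (PCR efficiency, with E = 1 meaning perfect doubling) and N_thr (amount at the detection threshold) are assumed names chosen for illustration.

```latex
N_c = N_0 \,(1+E)^{c},
\qquad
N_{C_t} = N_{\mathrm{thr}}
\;\Longrightarrow\;
C_t = \frac{\log_2 N_{\mathrm{thr}} - \log_2 N_0}{\log_2 (1+E)} .
```

Under perfect doubling (E = 1) this reduces to C_t = \log_2 N_{\mathrm{thr}} - \log_2 N_0, so a C_t value that is higher by one cycle corresponds to half the starting amount.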

RT-qPCR measurements have technical and biological sources of variation. Sources of technical variation are pipetting errors of probes and samples, as well as errors within the steps of RNA extraction, reverse transcription (RT) and qPCR. The most critical step is the RT, which contributes most to the variation of mRNA measurements [

_{t} method_{t} deviation of a sample (_{t} method for normalization as well as identification of stable RGs using

A framework for data analysis of relative qPCR using linear mixed models has been introduced by Steibel et al [

We distinguish two types of technical errors: The probe error per array (_{P:A}) is the effect or mean of all 48/96 sample measurements of a gene on an array. Pipetting of the probe and the channel for probe transportation on the array influence _{P:A}. The sample error (_{S}) is the effect or mean of all 48/96 gene measurements of a sample. Sample preparation steps which are exclusive to a sample like pipetting, the channel for sample transportation and the RNA-extraction influence _{S}. Both technical errors are systematic influences which are present in 48/96 measurements at a time. They can be excluded in order to reduce variance of all measurements.

Furthermore, systematic errors or batch effects

Other systematic effects like influence of treatment and biological variance are part of the measurement and should be retained for visualization. The treatment effect splits up into two parts, the global effect (Δ_{T}) and the treatment effect per gene (Δ_{T:G}). Biological variance and non-systematic technical errors are described by the variable

To sum up, a measurement _{t} measurements may change with the treatment. This effect is the global effect Δ_{T} which needs to be retained for

The different effects are estimated in the following order:

Estimate the probe error per array (_{P:A}).

Estimate systematic errors/batch effects (

Estimate the treatment/tissue effect (Δ_{T}).

Estimate the sample error (_{S}).

Estimate the treatment effect per gene (Δ_{T:G}).
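
The estimation order listed above can be illustrated with a small sketch. It is a simplified stand-in, not the LEMming implementation: it assumes a single array with no missing values, estimates every effect as a plain mean of the current residuals, and omits the batch-effect step. The function name `estimate_effects`, the data layout and the toy numbers are hypothetical.

```python
from statistics import mean


def estimate_effects(ct, groups, control):
    """Sequentially estimate effects from Ct values of ONE array (sketch only).

    ct:      dict sample -> dict gene -> Ct value (complete, no missing values)
    groups:  dict sample -> treatment label
    control: label of the untreated group used as reference
    """
    samples = list(ct)
    genes = list(next(iter(ct.values())))
    labels = sorted(set(groups.values()))

    # Step 1: probe effect = mean of all sample measurements of a gene.
    probe = {g: mean(ct[s][g] for s in samples) for g in genes}
    resid = {s: {g: ct[s][g] - probe[g] for g in genes} for s in samples}

    # Step 2 (systematic errors / batch effects) is omitted in this sketch.

    # Step 3: global treatment effect = group mean, relative to the control.
    group_mean = {
        t: mean(resid[s][g] for s in samples if groups[s] == t for g in genes)
        for t in labels
    }
    delta_t = {t: group_mean[t] - group_mean[control] for t in labels}
    for s in samples:
        for g in genes:
            resid[s][g] -= group_mean[groups[s]]

    # Step 4: sample error = mean over all genes of what remains per sample.
    # Estimated AFTER step 3, so it cannot absorb the global treatment effect.
    sample_err = {s: mean(resid[s].values()) for s in samples}
    for s in samples:
        for g in genes:
            resid[s][g] -= sample_err[s]

    # Step 5: treatment effect per gene, relative to the control group.
    tg_mean = {
        (t, g): mean(resid[s][g] for s in samples if groups[s] == t)
        for t in labels for g in genes
    }
    delta_tg = {(t, g): tg_mean[(t, g)] - tg_mean[(control, g)]
                for t in labels for g in genes}
    for s in samples:
        for g in genes:
            resid[s][g] -= tg_mean[(groups[s], g)]

    return {"probe": probe, "delta_t": delta_t, "sample_err": sample_err,
            "delta_tg": delta_tg, "resid": resid}


# Toy data (hypothetical): two genes, two control and two treated samples.
ct = {
    "A": {"g1": 20.5, "g2": 25.5},
    "B": {"g1": 19.5, "g2": 24.5},
    "E": {"g1": 19.5, "g2": 23.9},
    "F": {"g1": 19.1, "g2": 23.5},
}
groups = {"A": "control", "B": "control", "E": "treated", "F": "treated"}
result = estimate_effects(ct, groups, control="control")
```

In this toy example the sample errors average to zero within each group, while the shift shared by all genes of the treated samples ends up in `delta_t` rather than in `sample_err`.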

It is important to stick to this order. If the sample error were estimated first, systematic errors and treatment effects would be included in this sample error. Sticking to the estimation order above guarantees that Δ_{T} contains no systematic errors and is not removed by the sample error _{S}. The variables _{t} values of a gene under a treatment compared to the untreated group

The fold change values are 2^{−ΔCt}. They are visualized per gene and treatment group in boxplots with the fold change on a base-2 log scale. In each boxplot, the thick central line marks the median, and the lower and upper bounds of the box are the 25% and 75% quantiles. Red points are classified as outliers (more than 1.5 times the interquartile range below the 25% quantile or above the 75% quantile).
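
As a minimal sketch of the two rules in this paragraph, the following Python snippet converts a ΔCt value into a fold change and flags boxplot outliers with the 1.5 × interquartile-range rule. The function names and example values are hypothetical, and the quartile definition (`method="inclusive"`) is an assumption; plotting tools differ slightly in how they compute quartiles.

```python
from statistics import quantiles


def fold_change(delta_ct):
    """Fold change 2^(-dCt): a Ct lower by one cycle means twice the amount."""
    return 2.0 ** (-delta_ct)


def boxplot_outliers(values):
    """Return values more than 1.5 * IQR below the 25% or above the 75% quantile."""
    q25, _median, q75 = quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    lower, upper = q25 - 1.5 * iqr, q75 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical log2 fold changes of one gene in one treatment group.
log2_fc = [-0.2, -0.1, 0.0, 0.1, 0.2, 3.0]
outliers = boxplot_outliers(log2_fc)
```

Working on the log2 scale keeps up- and down-regulation symmetric around zero, which matches the base-2 log axis used in the boxplots.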

We recommend using a proper experimental design (e.g. a Latin square). Based on such a design, systematic errors can be identified and excluded with LEMming.

We calculated the average expression stability and the pairwise variation of common RGs in DS1 according to _{t} method. The normalization factor (NF) calculated from the genes _{t}, _{t} and LEMming normalized data is presented in _{0}-hypothesis of equality of ^{−11}) and std. devs of ΔΔ_{t} normalized data (p ≤ 10^{−6}). Additionally, we present a proof in _{t} with a single RG is greater than of std. devs LEMming processed data. The boxplots per gene and condition comparing raw data,

The x-axis shows the difference between standard deviation (sd) of raw values and sd of

The average expression stability and the pairwise variation of common RGs in DS2 according to _{t} of -0.28 at 24h, of -0.14 at 48h and -0.15 at 72h compared to 0h) which is present for all genes measured in DS2 (see

Left: Log2-fold differential expression of the gene

T-tests on the three RGs in raw data have no significant results comparing the 24h, 48h and 72h condition to the 0h condition with a Bonferroni corrected ^{−9}, 0h vs 72h ^{−9}, other RGs have under all comparisons ^{−11}). LEMming normalized data shows also significant results for the three RGs (see

Std. devs of ^{−11}). The mean reduction of std. dev. per gene and condition is 49.4% for

The average expression stability and the pairwise variation of common RGs in DS3 according to

The fold change values (

Boxplots of the untreated conditions are black, boxplots of treatment conditions (dedicated in

Using the three RGs in this case distorts the results as it is demonstrated by the variance-mean plot in

We analyzed how variance relates to the absolute _{t}-value with raw values of DS1 in _{t}-value, which means with lower mRNA content, the standard deviation of technical replicates increases (_{t}-values are less reliable than measurements with lower ones. The biological variance in the control group is significantly smaller than in the treated WY14,643 group (ANOVA of standard deviation values per gene for the mean expression values of 4 biological replicates: p = 0.01).
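
The relation between replicate spread and Ct level can be checked with an ordinary least-squares line through the points (mean Ct, standard deviation of technical replicates). The sketch below is only an illustration of that idea under stated assumptions: the function name and the replicate values are hypothetical and are not taken from DS1.

```python
from statistics import mean, stdev


def sd_vs_mean_ct(replicate_sets):
    """Fit sd = intercept + slope * mean_ct by ordinary least squares.

    replicate_sets: list of lists; each inner list holds the technical
    replicate Ct values of one gene in one sample (at least two values).
    """
    xs = [mean(r) for r in replicate_sets]   # mean Ct per gene and sample
    ys = [stdev(r) for r in replicate_sets]  # spread of technical replicates
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical duplicates: the spread grows as Ct increases.
replicate_sets = [[20.0, 20.1], [25.0, 25.3], [30.0, 30.6]]
slope, intercept = sd_vs_mean_ct(replicate_sets)
```

A positive slope indicates that measurements with higher Ct values, and hence lower mRNA content, scatter more between technical replicates.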

_{t} value of each gene and well over the mean _{t} value for data set 1 (DS1). Each gene is measured six times per biological replicate (3× cDNA and 2× PCR per cDNA). The regression line shows that the standard deviation increases with the _{t} values (lower mRNA content). _{t} value for DS1. The mean of all six technical replicates is computed per biological replicate and gene. The standard deviation of these means is computed with 4 biological replicates for each gene. The biological variance is higher under treated conditions compared to the control conditions.

DS1 and DS2, with their multiple technical replicates, allow further discrimination of the sources of variation in the data. Thus, the contribution of non-systematic technical errors (cDNA conversion error and qPCR error) to the sample error _{S} and the residuals of LEMming

Proportions of variance contribution are estimated from raw data

The contribution of biological variance, cDNA conversion error and qPCR error to the overall variance varies from gene to gene. In raw data the cDNA conversion contributes most (median over all genes: 62%) to the overall variance, while biological variance only accounts for 23% and qPCR error for 15% of the overall variance (see

The same analysis was performed with DS2 (see _{S} into a cDNA conversion error and a sample pipetting error. Here the cDNA conversion is the dominant variance source in the raw data. Due to the multiple technical replicates in the experiment, this effect is nearly completely removed by the LEMming approach.

Black—raw data, Green—LEMming processed data.

The residuals _{0} hypothesis (Student-t distribution of residuals)) for DS1 accepted the _{0} hypothesis (Kolmogorov-Smirnov test _{0} hypothesis (Student-t distribution of residuals) was accepted by a Kolmogorov-Smirnov test (

Density plot of raw data (a) and residuals of LEM-method (b) of reference genes in DS3. Blue: kernel density estimation of raw data/residuals. Red: estimated Student-t distribution. (c) Quantile-Quantile plot with quantiles of estimated Student-t distribution versus quantiles of residuals.

Here we generated and validated a method for normalization of parallel qPCR measurements, called LEMming, which is based on a linear error model to exclude technical and systematic errors. LEMming takes advantage of the experimental design of qPCR studies conducted on microfluidic arrays, which is a high throughput platform for qPCR. Our LEMming tool allows, therefore, the analysis of such data without usage of reference genes (RGs). We applied LEMming to three data sets with different stability patterns of common RGs.

DS1 represents a data set with stable RGs available. _{t} method.

DS2 evaluated liver samples of mice under starvation conditions. It represents a data set where the stability of RGs is questionable after normalization. RGs were not significantly differentially expressed in raw data and were selected according to

DS3 evaluated rats after partial hepatectomy under vaso-active drugs in different zones of the liver after various observation times with a control of untreated and ligation of the right median hepatic vein. Thus, DS3 presents a very complex

This raises the question of whether M-values and pairwise variation criteria for stable RG selection are complete. The criteria are based on inter- and intragroup variation of the RGs. The selected RGs

The analysis of variance in DS1 and DS2 revealed that the reverse transcription step (cDNA conversion error) is the dominant technical error. Treatment increased the biological variance compared to the untreated condition. LEMming was able to remove large proportions of technical errors and retained the biological variation. We showed theoretically and practically that applying LEMming results in reduced gene-wise variances per treatment group compared to normalization with a single RG. The reduction of these variances is based on the removal of systematic errors which are part of a linear mixed model estimated from the data. It is important to estimate the effects of this model in a particular order, otherwise effects like the global treatment effect would be removed as a sample error. Usually the residuals of linear mixed models are assumed to be normally distributed. If this is not the case, estimated parameters might be biased. We observed a Student t-distribution of residuals of the linear mixed model. This distribution is symmetric like a normal distribution, but has heavier tails. Thus, the estimated effects are not biased, but the standard error of the estimated effects might be inaccurate. To avoid overestimating the significance of differential expression, we recommend using robust tests to analyze it. We used the function

LEMming uses the assumption that the means of _{t} values within the samples of similarly treated groups are equal. Since the genes are selected by the criterion to see a difference between conditions, a global treatment effect Δ_{T} can be shown in most data sets. However, a systematic sample error per treated group _{T} are indistinguishable by LEMming. The use of RGs which are provably independent of the treatment would automatically compensate for this. If such RGs are available, we strongly recommend using them because of their capability to automatically remove systematic batch effects. However, with a growing number of experimental conditions, the chance of finding such RGs decreases.

Thus, we recommend using quantification of total cDNA content per sample as a second independent measurement in order to identify systematic sample errors per treated group (_{T}) is not distorted.

Although the method is independent of RGs, we still recommend measuring at least two RGs, as recommended by the MIQE guidelines [


This work was funded by the Bundesministerium fuer Bildung und Forschung (BMBF)—Virtual Liver Network (grant FKZ 0315755, FKZ 0315736, FKZ 0315765, FKZ 0315751)—and Robert Bosch Foundation, Stuttgart and supported by the DFG funding program Open Access Publishing. We gratefully thank Dr. Hai Huang for the provision of animal samples for data set 3 (Grant support: Klinische Forschergruppe 117-Optimierung der Leberlebendspende Grant number: Da251/5-2 und 3, Project B2, KFO117) and Igor Liebermann (IKP Stuttgart) for technical support.