A unified censored normal regression model for qPCR differential gene expression analysis

Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is considered the gold standard for accurate, sensitive, and fast measurement of gene expression. Prior to downstream statistical analysis, RT-qPCR fluorescence amplification curves are summarized into a single value, the quantification cycle (Cq). When RT-qPCR does not reach the limit of detection, the Cq is labeled as “undetermined”. Current state-of-the-art qPCR data analysis pipelines acknowledge the importance of normalization for removing non-biological sample-to-sample variation in the Cq values. However, their strategies for handling undetermined Cq values are very ad hoc. We show that popular methods for handling undetermined values can have a severe impact on the downstream differential expression analysis: they introduce a considerable bias and suffer from a lower precision. We propose a novel method that unites preprocessing and differential expression analysis in a single statistical model that provides a rigorous way of handling undetermined Cq values. We compare our method with existing approaches in a simulation study and on published microRNA and mRNA gene expression datasets. We show that our method outperforms traditional RT-qPCR differential expression analysis pipelines in the presence of undetermined values, both in terms of accuracy and precision.


Introduction
High-throughput reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a popular technology for gene expression profiling. Important advantages of qPCR are the speed, specificity and sensitivity of its assays. qPCR is often referred to as the gold standard for gene expression profiling [e.g. [1]]. Therefore, it is commonly used within the context of diagnostic and prognostic testing as well as for biological validation of biomarkers discovered in large screening experiments with microarray or next generation sequencing technologies. RT-qPCR is a cyclic process in which targeted molecules are amplified and simultaneously quantified by measuring a fluorescence intensity. The raw RT-qPCR profiles are typically summarized into a single value, the quantification cycle Cq. Common procedures for calculating Cq-values are based on the number of cycles needed to cross a certain threshold, or on second derivatives of the amplification curve [e.g. [2]]. If a target is not expressed or the amplification step fails, the threshold is not reached after the maximum number of cycles (limit of detection or LOD) and the Cq is undetermined. A typical qPCR dataset thus consists of both expressed and undetermined Cq-values (UV). To our knowledge, the existing methods for handling UV in qPCR experiments are very ad hoc. Popular approaches either remove UV or perform imputation. The former approach ignores data that is informative whereas the latter results in artifacts. Although the truncation at the LOD has implications for all types of subsequent data analysis (e.g. cluster analysis, absolute quantification, ...), our method is specifically developed for differential expression analysis.
Similar to other gene expression workflows, differential expression analysis with qPCR involves a) preprocessing, b) statistical analysis and c) correction for multiple testing. The existing qPCR analysis pipelines are sequential: first, technical sample variation is reduced in a separate normalization step and subsequently multiple hypothesis tests are conducted. Although the effect of the preprocessing can be quite substantial, it is typically ignored in the subsequent analysis steps. This can imply an incorrect control of the significance level α, which can lead to an increased false positive rate or a reduced power. Within a microarray context, it has been shown that error propagation can improve the accuracy of the differential expression analysis substantially. But, even when error propagation is provided, the analysis can still be suboptimal: each step in a modular approach is optimized individually without taking the previous and the future analysis steps into account [3].
We present a unified censored normal regression (UCNR) model for differential expression analysis of RT-qPCR data. In contrast to a sequential approach, a "unified" method simultaneously performs normalization and statistical testing, while correctly accounting for UV. In the presence of UV, our method outperforms state-of-the-art RT-qPCR data analysis pipelines in terms of accuracy and precision. Our hypothesis tests are more robust to UV and the method increases the stability of the estimated normalization factors. The paper is organised as follows: we introduce the unified censored normal regression model and elaborate on the interpretation of the parameters for differential microRNA/gene expression. Next, we compare the robustness of hypothesis testing and normalization of our model-based approach to popular methods for analysing RT-qPCR data. Finally, we illustrate the method on a microRNA gene expression study in neuroblastoma to find differentially expressed genes between MYCN amplified and MYCN single copy tumor samples [4], and on an mRNA gene expression study in neuroblastoma to detect differentially expressed prognostic genes between patients with higher risk of death from disease or higher risk of relapse or progression and patients with low risk [5].

Materials and methods
MicroRNA gene expression data in neuroblastoma (NB) with 430 profiled microRNAs and 18 small RNA controls, as described in [4], is used to set up a simulation study and is analysed as a case study for differential expression analysis between two tumor groups in 61 samples, with 22 MYCN amplified (MNA) and 39 MYCN single copy (MNSC) tumor samples (61 samples × 448 microRNAs = 27328 observations). A second case study involves a cohort of 343 neuroblastoma patients from a study of the International Society of Pediatric Oncology, European Neuroblastoma Group (SIOPEN) with 58 prognostic genes, which serve as a multigene-expression signature for patients with neuroblastoma [5], and 5 reference genes. Finally, we

Unified censored normal regression model
From a statistical point of view, UV can be considered as right censored, i.e. the data is incompletely observed, but UV are known to correspond with a Cq of at least LOD cycles (e.g. LOD = 40); we thus observe an undetermined value as the LOD. Suppose that $C^*$ represents the hypothetical Cq value that would have been observed if there was no LOD. In statistical terms the partly unobservable $C^*$ is referred to as a latent process. Our approach consists in modelling this latent process, and the parameters of this model are expected to give an unbiased assessment of differential expression.
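The censoring mechanism described above can be sketched in a few lines of Python. This is only an illustration: the function name `observe` and the toy values are hypothetical and not part of the authors' code, and the latent values are of course never available in a real experiment.

```python
def observe(latent_cq, lod):
    """Map latent Cq values to what the instrument would report.

    A latent value at or above the LOD is right censored: it is recorded
    as the LOD itself and flagged as undetermined.
    """
    observed = [min(value, lod) for value in latent_cq]
    censored = [value >= lod for value in latent_cq]
    return observed, censored


# Hypothetical latent values; the second and third would be undetermined at LOD = 40.
observed, censored = observe([33.2, 41.7, 40.0], lod=40.0)
```

The censoring flags, rather than the recorded value alone, are what a censored regression needs: they mark the observations for which only a lower bound on the latent Cq is known.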
Formally, if $C$ refers to the observed Cq and $C^*$ to the latent Cq process, then they are related through

$$C = \begin{cases} C^* & \text{if } C^* < \mathrm{LOD} \\ \mathrm{LOD} & \text{if } C^* \geq \mathrm{LOD} \end{cases} \qquad (1)$$

Consider a study with $J$ samples and $I$ targets. The samples are divided into $K$ groups and the objective of the study is testing for differential expression between the groups. The censored regression model is a hierarchical statistical model. On the top layer, Eq (1) relates the observed $C$ to the latent process $C^*$. The latter is further modeled as a classical linear model. In particular,

$$C^*_{ij} = \mu + \alpha_i + \beta_j + (\alpha\gamma)_{ik} + \varepsilon_{ij} \qquad (2)$$

where
• $\mu$: intercept
• $\alpha_i$: effect of target $i$, with $i = 1, \ldots, I$
• $\beta_j$: effect of sample $j$, with $j = 1, \ldots, J$
• $(\alpha\gamma)_{ik}$: interaction effect between target $i$ and group $k$, with $k = 1, \ldots, K$ (here $k$ is the group to which sample $j$ belongs)
• $\varepsilon_{ij} \sim N(0, \sigma_i^2)$: error term reflecting the random noise.

While the sample effect $\beta_j$ is included for preprocessing purposes, the interaction effect between target $i$ and group $k$ is a measure for differential expression of target $i$ between a reference group and the group $k$ of interest (see S1 Appendix). The parameter $\beta_j$ represents a normalization factor that is similar to the modified global mean [6], which is an improvement of the global mean strategy [4]. The modified global mean procedure (MOD) consists of (a) centering the responses within each target for attributing an equal weight to each target in the subsequent normalisation; (b) centering the modified responses of (a) within each sample around the mean of all expressed targets (or around the mean of the targets expressed in all samples). MOD results in adequate removal of technical variability, as evidenced by more pronounced and balanced differential expression [6].
The model defined by Eqs (1) and (2) is referred to as the UCNR model, which is a variation of the Tobit model [7]. Parameters can be estimated by means of maximum likelihood [8]. The maximum likelihood estimators are consistent and asymptotically normal. The UCNR can be used for testing differential expression of targets and also for the estimation of differential expression in terms of the $\log_2$ fold change, while simultaneously normalizing the data. For target $i$, this $\log_2$ fold change is denoted by $\delta_i$, which is a contrast of the interaction effect parameters (see S1 Appendix). Within our framework, generalised Wald tests for $H_0: \delta_i = 0$ can be used for assessing differentially expressed targets. In the absence of UV this procedure is very similar to a sequential analysis that consists of MOD normalization followed by a t-test (see S1 Appendix).
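To make the role of censoring in the likelihood concrete, the following minimal Python sketch evaluates a censored normal log-likelihood for a single mean. It is not the authors' implementation and it is far simpler than the full UCNR model: all function names and the toy data are hypothetical, the standard deviation is held fixed, and the maximizer is found by a crude grid search rather than by a proper optimizer.

```python
import math


def normal_logpdf(value, mean, sd):
    """Log density of N(mean, sd^2) at value."""
    z = (value - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_logsf(value, mean, sd):
    """Log of P(X >= value) for X ~ N(mean, sd^2).

    Uses the complementary error function; for values extremely far above
    the mean this underflows, which a production implementation must handle.
    """
    z = (value - mean) / sd
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def censored_loglik(observed_cq, lod, mean, sd):
    """Censored normal log-likelihood: density terms for expressed values,
    tail-probability terms for values recorded at the LOD."""
    total = 0.0
    for value in observed_cq:
        if value >= lod:
            total += normal_logsf(lod, mean, sd)
        else:
            total += normal_logpdf(value, mean, sd)
    return total


# Hypothetical toy data for one target in one group; two values are undetermined.
toy_cq = [36.0, 37.0, 38.0, 40.0, 40.0]
toy_lod = 40.0
toy_sd = 2.0  # assumption: fixed only to keep the sketch short

grid = [35.0 + 0.01 * step for step in range(1001)]
best_mean = max(grid, key=lambda m: censored_loglik(toy_cq, toy_lod, m, toy_sd))
naive_mean = sum(toy_cq) / len(toy_cq)  # treats the LOD as if it were observed
```

In this toy example the censored-likelihood estimate of the mean lies above the naive average of the LOD-imputed values, because the likelihood only assumes that the undetermined values are at least as large as the LOD.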
The estimator of $\delta_i$ takes the UV correctly into account. Treating UV as censored observations has the advantage that we continue working with all raw observed data and no ad hoc data manipulations are required. We illustrate the robustness of the estimator in the presence of UV by means of a simulation study with real data characteristics and compare it with classical sequential analyses. To make both methods comparable, we consider classical analyses using multiple t-tests, i.e. a single t-test for testing for differential expression for each target.
In the RT-qPCR literature, there is no consensus on preprocessing UV. Imputation of the undetermined Cq values by the LOD is suggested [9], while other approaches rely on a regression to the mean. In a simulation study we consider three common strategies for handling UV. The first strategy imputes the undetermined value by the LOD and normalizes the imputed values by subtracting the modified global mean or the mean of the selected reference genes (LOD). A second strategy imputes the UV by the maximum normalized value of each individual target and adds 1 so as to preserve the undetermined realization (MNV+1). The third strategy is a k-nearest neighbor algorithm that determines the k nearest neighbors for a gene using a Euclidean metric and imputes the UV by the average of the normalized expressed values of its neighbors (KNN). The latter strategy is provided within the SAM algorithm [10] for imputing missing values.
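The first two imputation strategies can be sketched as follows. This is a simplified illustration under stated assumptions: undetermined values are encoded as `None`, the function names are hypothetical, the normalization step is left out, and KNN imputation is not covered.

```python
def impute_lod(cq_values, lod):
    """LOD strategy: replace each undetermined Cq (None) by the LOD.

    Normalization of the imputed values would follow as a separate step.
    """
    return [lod if value is None else value for value in cq_values]


def impute_mnv_plus_one(normalized_values):
    """MNV+1 strategy: replace each undetermined value (None) of one target
    by the maximum normalized value observed for that target, plus 1."""
    expressed = [value for value in normalized_values if value is not None]
    if not expressed:
        raise ValueError("target has no expressed values to impute from")
    fill = max(expressed) + 1.0
    return [fill if value is None else value for value in normalized_values]


# Hypothetical values for a single target across three samples.
lod_imputed = impute_lod([30.0, None, 35.5], lod=40.0)
mnv_imputed = impute_mnv_plus_one([-1.0, 0.5, None])
```

Both strategies assign the same single number to every undetermined value of a target, so those values become ties in the imputed data.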

Robustness of the differential expression estimator in the presence of undetermined values
We conducted a simulation study based on the microRNA gene expression NB data [4]. From the total of 448 microRNAs profiled in the NB set, we discarded the microRNAs with at least 1 UV regardless of the group (MNA or MNSC) and only considered the remaining set of 201 microRNAs. For each individual microRNA, we shifted one group to equalize the mean normalized Cq-values (MOD) in the two groups. For computational reasons and optimal graphical display, we considered a random subset of 50 microRNAs from which we altered 20 microRNAs by adding a $\delta_i = 2$ or $\delta_i = -2$ differential expression to the Cq values in one group. We divided the number of up- and downregulated microRNAs equally over the study. The difference $\delta_i$ is thus interpreted as a linear fold change of 4. The remaining 30 microRNAs are not differentially expressed ($\delta_i = 0$). The $\delta$ parameters in this uncensored data set ($\delta_i = 0$ or $\delta_i = \pm 2$) are considered as the true parameter values. Hence, a total of 61 samples × 50 microRNAs = 3050 observations are involved.
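The relation between a Cq shift and a linear fold change, and the way a known effect is added to one group, can be illustrated with a short sketch. It assumes that the template doubles in every cycle, which is the assumption behind reading a Cq difference as a $\log_2$ fold change; the function names and values are hypothetical.

```python
def linear_fold_change(delta_cq):
    """Magnitude of the linear fold change implied by a Cq difference,
    assuming the template doubles in every cycle."""
    return 2.0 ** abs(delta_cq)


def spike_in(cq_values, in_group, delta_cq):
    """Add a known Cq shift to the samples of one group only."""
    return [
        value + delta_cq if member else value
        for value, member in zip(cq_values, in_group)
    ]


# A shift of 2 cycles in either direction corresponds to a 4-fold change.
fold = linear_fold_change(-2.0)

# Hypothetical target measured in three samples; the first two belong to the shifted group.
shifted = spike_in([30.0, 31.0, 29.5], [True, True, False], 2.0)
```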
We evaluated the impact of UV for the different methods in an iterative procedure. First the LOD is set to the largest Cq value in the dataset. At each step $s = 1, \ldots, 1000$ of the procedure, we censored the maximum uncensored observation in the dataset, resulting in a stepwise decrease of the LOD, and we tested for differential expression of the microRNAs using the UCNR and the classical analyses with the three imputation strategies (LOD, MNV+1 and KNN with k = 10). At the end of the procedure, about a third of the data is censored. We evaluate the estimates $\hat{\delta}_i$ in terms of their bias and root mean squared error (RMSE) (Fig 1). Both statistics assess the robustness of the estimators in the presence of UV: the smaller the bias and the RMSE, the better the estimator. The graphs suggest that the bias of the estimator obtained by UCNR is minimal and approximately piecewise constant in this study. The estimators provided by the sequential analyses fluctuate heavily due to the ad hoc imputations. Fig 1B shows a smaller RMSE for our new estimator in comparison with the classical analyses. One may argue that in a small interval (between about 300 and 600 steps) UCNR seems to have a slightly larger bias as compared to the other methods, but in this interval the UCNR method has a very good precision (Fig 1). Both graphs demonstrate the robustness of the new estimator (UCNR) of differential expression in the presence of UV, which improves upon the traditional approaches with respect to both accuracy and precision.
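The two summary statistics can be computed as in the sketch below, evaluated once per censoring step over all targets. The sign convention (estimate minus true value) and the function names are assumptions made for illustration; the toy numbers are not taken from the study.

```python
import math


def bias(estimates, truths):
    """Mean difference between estimated and true differential expression.

    Assumption: the difference is taken as estimate minus truth.
    """
    differences = [est - true for est, true in zip(estimates, truths)]
    return sum(differences) / len(differences)


def rmse(estimates, truths):
    """Square root of the mean squared error of the estimates."""
    squared = [(est - true) ** 2 for est, true in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical estimates that are shrunk towards zero for the two
# differentially expressed targets (true values +2 and -2).
true_delta = [2.0, -2.0, 0.0]
estimated_delta = [1.5, -1.5, 0.0]
step_bias = bias(estimated_delta, true_delta)
step_rmse = rmse(estimated_delta, true_delta)
```

Note that in this toy example the bias averages out to zero because the up- and downregulated targets are shrunk symmetrically, while the RMSE still reveals the estimation error. This is one reason to report both statistics.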
S1 and S2 Figs show results for the bias and the RMSE in a similar simulation study, but starting from 100 and 200 completely observed microRNAs (i.e. no UV at the start of the simulations). The conclusions are the same as for the 50 microRNA setting, demonstrating the scalability of the method.

Robustness of the hypothesis tests in the presence of undetermined values
Using the same data and simulation setup as in the previous paragraph, Figs 2 and 3 illustrate the behaviour of individual hypothesis tests for differential expression ($H_0: \delta_i = 0$; $H_1: \delta_i \neq 0$). The figures track the differential expression estimates and p-values for two representative targets for which $\delta_i = \pm 2$, for both the classical sequential approaches with multiple t-tests and the UCNR. As before, the estimator from the UCNR is more robust in the presence of UV, and improves upon the sequential approaches in terms of accuracy and precision. This is also reflected in the UCNR p-values (Fig 2(b)), which do not vary as much as with the sequential approaches and remain significant even when a large fraction of the Cq values is set to UV. The uncertainty of the estimates from the four approaches is shown in a box plot (Fig 2(c)). After the introduction of approximately 350 UV the microRNA was removed from the study due to the large amount of censoring (19 observations in the MNA group and 37 in the MNSC group), which is common practice when too many datapoints are missing (typically 80% or more).
In the presence of UV, the estimates of differential expression with the sequential approaches differ due to the different imputation strategies. Imputation with MNV+1 treats the UV as ties, which explains the better performance of this estimator and an increase of the p-values only at the end. The estimates of the four analyses tend towards zero as censoring increases, but the bias is clearly larger for the sequential analyses. The p-values obtained from classical analyses inflate heavily at low levels of censoring, while the p-values of the UCNR remain stable over a larger range of censoring. Correctly rejecting the null hypothesis is more often guaranteed with the UCNR. Intuitively, the proportion of censored observations that a

Latent mean normalization
Centering the response around the global mean of the expressed microRNAs adequately removes technical variation and reduces the number of false negatives [4]. This normalization procedure is further improved by first centering the targets and thus attributing equal weight to the individual targets [6]. Our model extends this approach by including the UV. In particular, the estimator of the $\beta_j$ parameter in the UCNR model (2) has the interpretation of a normalisation factor for sample $j$. The robustness of this estimator ($\hat{\beta}_j$) is illustrated in Fig 4, which tracks the estimates of the true normalization factor for two representative samples in the simulation study obtained by UCNR, MOD normalization and MOD normalization on common targets. The latter computes the normalization factor using only the targets that are expressed in all samples. Since the censored observations are not considered, both MOD normalization estimates rapidly diverge from the true normalization factor (Fig 4), which explains the peaks and the crossing curves in Fig 1. The UCNR method requires no imputations and takes the uncertainty of the estimates into account. We refer to this normalization procedure as latent mean normalization (LMN). Finally, note that the normalisation does not have to be performed as a separate step when the UCNR method is used for testing for differential expression, because testing and normalisation are combined in the unified statistical framework.
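For comparison with LMN, the two centering steps of the modified global mean (MOD) procedure can be sketched as below. This is a simplified reading of steps (a) and (b), not the reference implementation of [6]: undetermined values are encoded as `None` and simply skipped, every target and every sample is assumed to have at least one expressed value, and the function name is hypothetical.

```python
def mod_normalize(cq):
    """Modified global mean normalization on a targets x samples table.

    cq[i][j] is the Cq of target i in sample j, or None when undetermined.
    Returns the normalized table and the per-sample normalization factors.
    """
    n_samples = len(cq[0])

    # (a) center each target on the mean of its expressed values
    centered = []
    for row in cq:
        expressed = [value for value in row if value is not None]
        target_mean = sum(expressed) / len(expressed)
        centered.append([None if v is None else v - target_mean for v in row])

    # (b) per sample, average the centered values of the expressed targets
    factors = []
    for j in range(n_samples):
        column = [row[j] for row in centered if row[j] is not None]
        factors.append(sum(column) / len(column))

    normalized = [
        [None if v is None else v - factors[j] for j, v in enumerate(row)]
        for row in centered
    ]
    return normalized, factors


# Hypothetical data: two targets, two samples, sample 2 runs one cycle late.
complete = [[20.0, 22.0], [30.0, 32.0]]
_, factors_complete = mod_normalize(complete)

# Same data, but target 2 is undetermined in sample 2.
with_uv = [[20.0, 22.0], [30.0, None]]
_, factors_with_uv = mod_normalize(with_uv)
```

In the toy data a single undetermined value in sample 2 changes the estimated factor of sample 1, although none of the measurements of sample 1 changed. This is a small-scale version of the instability that arises when censored observations are left out of the normalization.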

Normalization using reference genes
Many normalization strategies have been described in literature. Both MOD and LMN require a large and unbiased set of genes to be profiled. An empirical approach to select stably expressed reference genes for normalization purposes has previously been described using a gene-stability measure based on the mean pairwise variation between a given candidate reference gene and other tested candidate reference genes [11]. In the field of microRNA gene expression, however, only a few candidate reference microRNAs are reported and small noncoding RNAs are often used instead. This approach assumes, however, that none of these small RNAs are differentially regulated in the experiment. The selection of reference microRNAs is thus rather empirical. In large scale microRNA expression profiling studies, global mean expression value normalization is characterized by a high expression stability and thus results in an adequate removal of technical variability [4]. This normalization procedure avoids the necessity of identifying good reference RNAs. For small datasets, the procedure using reference genes [11] is often the default procedure. In [4], a strategy to identify stably expressed microRNAs is presented.
The UCNR Eq (2) can be easily adapted to perform normalization using multiple reference genes. First, we introduce an indicator $R_i$ for the reference genes: $R_i = 1$ if gene $i$ is a reference gene and $R_i = 0$ otherwise. Let $C^*$ again refer to the latent Cq. Then, the UCNR with normalization based on reference genes makes use of model (3), where the interpretations of the parameters $\mu$, $\alpha_i$ and $(\alpha\gamma)_{ik}$ and the error term $\varepsilon_{ij}$ remain as for model (2), and $\mu + \beta_j + z_j$ now represents the normalization factor. The parameter of interest, differential expression $\delta_i$ of gene $i$, is again a contrast of the parameters and it is estimated simultaneously with normalization. As before, generalized Wald tests can be used for hypothesis testing (further details in S1 Appendix).

Case study I: Upregulation in the miR-17-92 cluster in MYCN amplified cancer cells
The use of model (2) is illustrated by analyzing the NB dataset [4] to detect up- and downregulated microRNAs between MNA and MNSC tumor samples. The LOD was set to 35, as values above this threshold were considered to be noise [12]. This corresponds with 32.5% censored observations. The parameters of the UCNR model are again estimated using maximum likelihood estimation, which allows for potential heteroskedasticity between the microRNAs. MYCN amplification is the most prominent genetic alteration in neuroblastoma. Here we focus on the miR-17-92 cluster, which is known to be upregulated in the MYCN amplified tumors [13]. The miR-17-92 cluster is among the first microRNAs recognized as key components of a molecular network that impacts tumorigenesis and tumor maintenance [14]. Table 1 lists the results of the differential expression analysis for the miR-17-92 cluster. All microRNAs from the cluster were found to be significantly upregulated in the MNA tumor samples (at the false discovery rate level of 5%). These findings are similar to those described in [4] and [6], except for miR-17-3p, which is now also found to be differentially expressed. This makes sense because the entire cluster is simultaneously transcribed.

Case study II: Differential gene expression analysis of a multigene-expression signature for patients with neuroblastoma
In a second case study, we demonstrate model (3) for a multigene-expression signature that serves as a risk predictor for patients with neuroblastoma [5]. The signature comprises 59 genes that were carefully selected using an innovative data-mining strategy. The prediction model was built using 30 training samples, randomly selected from a cohort of 343 neuroblastoma patients from the SIOPEN study. We perform a differential expression analysis on 58 prognostic genes on the 30 training samples (15 deceased high-risk (HR) patients and 15 low-risk (LR) patients with a long progression-free survival time). Furthermore, 5 reference genes (AluSq, HMBS, HPRT1, SDHA, and UBC) are included for normalization.
The LOD is set at 39. This choice is based on the application of a data-driven LOD selection criterion to the COG data (see S1 Appendix). Since the COG data and the SIOPEN data were generated on the same platform, an appropriate LOD for the former is expected to be good for the latter too. Moreover, by using an independent dataset for the selection of the LOD, the statistical inference procedures described earlier in the paper (e.g. hypothesis testing) remain valid. The selection of LOD = 39 was also confirmed as follows. With the SIOPEN data, a 95% confidence interval for the average Cq value for the detection of 1 molecule for an individual gene, based on the y-intercepts of a 5-point 10-fold serial dilution standard curve, is given by [37.90; 38.32] (qbase PLUS version 2.4). This gives a biological justification for the choice of 39. With a LOD of 39, 2.5% of the observations are censored. Differential expression analysis between HR and LR was performed by the UCNR method.
The UCNR detects 43 out of the 58 genes as differentially expressed (5% false discovery rate) between the HR and LR group. A full listing of the analysis results is available as supplementary material (S1 Table). The analysis was also performed with the classical MNV+1 and LOD on the normalized data, using reference gene normalization. Both analyses detect 38 differentially expressed genes. Fig 5 displays a Q-Q plot of the $-\log_{10}$ transformed p-values from the UCNR model (3) versus those from the classical MNV+1. The figure illustrates that the UCNR method very often has larger $-\log_{10}$ p-values (i.e. smaller p-values). Since UCNR correctly controls the type I error, the method thus guarantees a higher sensitivity. Table 2 compares the number of called significant and non-significant genes for both analyses (UCNR Eq (3) and MNV+1). UCNR Eq (3) detects 5 (7-2) extra differentially expressed genes.

Discussion
We present a unified censored normal regression (UCNR) model for assessing differential expression in qPCR experiments. The method acts on the raw Cq-values. It performs normalization and differential expression analysis simultaneously while providing a statistically rigorous way of handling undetermined Cq values (UV). Generalized Wald tests are used for assessing differential expression and the model parameters of interest have an interpretation in terms of $\log_2$ fold changes.

Table 1. Differential expression analysis with UCNR in the miR-17-92 cluster. The 8 microRNAs are upregulated in the MNA tumor samples. For each individual microRNA in the cluster, estimates $\hat{\delta}_i$ of the $\log_2$ fold change (MNSC-MNA), adjusted p-values (correcting for multiple testing according to [15]) and 5% false discovery rate-adjusted confidence intervals [16] for the average fold change are given.

Ignoring censoring will generally lead to inconsistent estimators [8]. Figs 1 to 4 illustrate the influence of censoring on the differential expression estimator, the normalization factor estimator and the hypothesis tests. Sequential analyses are sensitive to censored observations, as suggested by the fluctuations in the estimates of differential expression in Fig 1. The impact of ignoring censoring on two representative differentially expressed microRNAs (true $|\delta_i| = 2$) is illustrated in Figs 2 and 3: the p-values and estimates of the $\log_2$ fold change remain more stable with increasing degree of censoring with the UCNR method than with the other analysis methods. Figs 2 and 3 show the results for microRNAs with heavy censoring, in the sense that these particular microRNAs are removed from the study after about 300-350 steps of reducing the LOD. Note that all methods result in biased $\log_2$ fold change estimates and low power to detect differential expression when censoring reaches levels at which few samples with concentrations above the LOD remain. For microRNAs containing fewer censored observations, or when the microRNA is not differentially expressed, UCNR still outperforms the other methods, but the differences between the methods are smaller (S4 and S5 Figs). The proportion of censored observations in a microRNA before a type II error (false negative) or type I error (false positive) occurred is used to measure the robustness of the test. For individual microRNAs, the proportions resulting from our model are never smaller and mostly larger than for the classical analyses, indicating its robustness (less sensitive to undetermined values).
The UCNR method extends the common normalization strategies, such as modified global mean normalization and the usage of reference genes or genes resembling the mean. Classical approaches first normalize the data to remove technical variability and statistical analyses are conducted on the normalized data. As a result, the standard errors of the fold change estimates are incorrectly estimated. Our method accounts for estimating the normalization factor. When normalizing using reference genes, the unified model uses the reference genes throughout the full analysis, resulting in a larger sample size and more degrees of freedom. This also affects the p-values. Classical sequential analyses are thus found to be more conservative than the unified method with reference genes. Note that the unified method also correctly controls the type I error rate. Hence, the power gain does not come at the expense of false positives.
Large scale experiments with many targets being measured are less vulnerable to incorrect standard error estimates, even when a sequential approach is applied. From a theoretical perspective, correct standard error estimates and p-values can be obtained from a sequential procedure by applying an adjustment factor and corrected degrees of freedom. Note, however, that after these adjustments, the sequential approaches will still suffer from lower accuracy and precision in the presence of UV.
The UCNR method is successfully applied on a large scale neuroblastoma study to detect up- and downregulated microRNAs between MYCN amplified (MNA) and MYCN single copy (MNSC) tumor samples. We compared the results for the MNA upregulated miR-17-92 cluster with the results obtained by the sequential analyses using multiple t-tests after MOD normalization and imputation of UV according to LOD and MNV+1 (S2 and S3 Tables). The findings are similar, except for miR-17-3p, which is not detected as a differentially expressed microRNA with the classical approaches. The estimates of the $\log_2$ fold change differences obtained with UCNR are considerably larger. UCNR employs the information contained in the censored observations throughout the full analysis, resulting in more robust and efficient estimators in the presence of UV.
The method is also applied on an mRNA neuroblastoma dataset for detecting differentially expressed genes within a 58 gene-expression signature. Reference gene normalization was used. Since the signature is validated as an accurate risk predictor for patients with neuroblastoma, it is expected that most genes are differentially expressed between the high-risk and the low-risk group. UCNR illustrates its power by detecting more differentially expressed genes than the classical analyses.
The LOD censoring threshold plays a non-ignorable role in a censored regression context. In the SIOPEN case study, an optimal LOD was selected through the evaluation of a data-driven log-likelihood-based criterion on two independent datasets that were profiled on a similar platform as the SIOPEN data. Both analyses rendered an optimal LOD which could be biologically validated.
Since the UCNR model relies on a normally distributed process, the method is only applicable when the assumption of normality is not violated. The same holds for analyses with multiple t-tests. However, in the absence of the normality assumption the properties of the estimator still hold asymptotically (i.e. for a large number of observations). If the UV results from technical error such as failure of amplification rather than a concentration below the LOD, an additional assumption of this technical failure being random (i.e. the failure of quantification is not related to the concentration, the particular target, ... being assessed) is needed.
Since the UCNR model is basically a linear regression model, it can also be adapted to more complex study designs (e.g. k-group designs) or can be extended by including one or more confounder variables. The method guarantees to correctly account for the normalization, which is performed simultaneously with the estimation.
The R code and data used to conduct the simulation and case studies are available in a GitHub repository accessible at https://github.com/CenterForStatistics-UGent/UCNR. The case studies have been documented so that they can be adapted to analyse the users' own data.

Conclusion
We proposed a unified censored normal regression (UCNR) model for analyzing differential expression in qPCR experiments. The model acts on the raw Cq-values and accounts for undetermined values (UV) without requiring ad hoc imputation algorithms. The model integrates the normalization procedure within the statistical analysis. We showed that the estimator and hypothesis tests are robust in the presence of UV and that our method outperforms popular imputation methods in terms of accuracy and precision.

The study illustrates the robustness of the estimator of differential expression in the presence of UV (Fig 1). For iteration $s$, each analysis provides a set of estimates $\hat{\delta}_i$ of the parameters $\delta_i$. Fig 1A tracks the mean difference between $\delta_i$ and $\hat{\delta}_i$, which is an estimate of the bias, while Fig 1B tracks the square root of the mean squared error (RMSE).

Fig 1. (A) Bias and (B) RMSE of the differential expression estimates of 50 microRNAs, as a function of the number of censored Cq values. At the bottom the grey circles indicate the removal of a complete miRNA (as a consequence of censoring). The numbers on top of some of the grey circles represent the number of remaining miRNAs in the study. Estimators are obtained by UCNR (green solid line), multiple t-tests with MOD normalization and LOD imputation of the UV (red dashed line), multiple t-tests with MOD normalization and MNV+1 imputation (blue dotted line) and multiple t-tests with MOD normalization and KNN imputation (black dotted-dashed line). A bias closer to zero suggests more accurate estimates. A small RMSE indicates a high precision of the estimator. The sharp jumps in the curves happen when a complete miRNA gets censored, which heavily affects the normalisation constants. https://doi.org/10.1371/journal.pone.0182832.g001

Fig 2. A differentially expressed microRNA (true $\delta_i = 2$) tracked during the simulation study. (A) Estimates of differential expression by UCNR (green solid line), multiple t-tests with MOD normalization and LOD imputation (red dashed line), MNV+1 imputation (blue dotted line) and KNN imputation (black dotted-dashed line). Censoring an observation at some point for this particular microRNA is marked by a black circle (MNA group) or a grey square (MNSC group) on the horizontal axis. (B) Plot of $-\log_{10}$ p-values for the hypothesis test ($H_0: \delta_i = 0$; $H_1: \delta_i \neq 0$). (C) Box plot of differential expression estimates. https://doi.org/10.1371/journal.pone.0182832.g002

Fig 3. A differentially expressed microRNA (true $\delta_i = -2$) tracked during the simulation study. (A) Estimates of differential expression by UCNR (green solid line), multiple t-tests with MOD normalization and LOD imputation (red dashed line), MNV+1 imputation (blue dotted line) and KNN imputation (black dotted-dashed line). Censoring an observation at some point for this particular microRNA is marked by a black circle (MNA group) or a grey square (MNSC group) on the horizontal axis. (B) Plot of $-\log_{10}$ p-values for the hypothesis test ($H_0: \delta_i = 0$; $H_1: \delta_i \neq 0$). (C) Box plot of differential expression estimates. https://doi.org/10.1371/journal.pone.0182832.g003

Fig 4. Estimates of the normalization factor of two representative samples ((A) sample 2, (B) sample 3) in the simulation study. Estimates are obtained by LMN (green solid line), MOD normalization (red dashed line) and MOD normalization on common targets (blue dotted line). The true normalization factor is represented by the horizontal line. https://doi.org/10.1371/journal.pone.0182832.g004

Fig 5. Q-Q plot of the $-\log_{10}$ p-values for the UCNR model (3) versus MNV+1. The p-values result from the differential gene expression analysis of the multigene-expression signature for patients with neuroblastoma. The solid line represents the bisector. The UCNR method typically has larger $-\log_{10}$ p-values than the MNV+1 method, resulting in a higher sensitivity. https://doi.org/10.1371/journal.pone.0182832.g005