Table 1.
Summary of normalization methods and software packages applied to the different datasets for DEG analysis.
Fig 1.
Comparison of nine normalization methods.
(A) Illustrated are boxplots of log2(counts + 1) for the MAQC data with two replicates in each of two conditions (uhr and hbr). The samples in the hbr and uhr conditions are shown in green and red, respectively. Med-pgQ2 and UQ-pgQ2 are our proposed methods. (B) Illustrated are boxplots of the intra-condition coefficient of variation for the uhr and hbr conditions.
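The two quantities plotted in panels (A) and (B) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the example counts are hypothetical, and it assumes the coefficient of variation is the sample standard deviation divided by the mean of the normalized counts for one gene within one condition.

```python
import math
from statistics import mean, stdev

def log2_counts(counts):
    """Transform counts to log2(count + 1); the +1 keeps zero counts finite."""
    return [math.log2(c + 1) for c in counts]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (None if the mean is 0)."""
    m = mean(values)
    if m == 0:
        return None
    return stdev(values) / m

# Hypothetical normalized counts for one gene, two replicates per condition.
uhr = [100.0, 120.0]
hbr = [30.0, 30.0]

cv_uhr = coefficient_of_variation(uhr)  # replicates differ, so CV > 0
cv_hbr = coefficient_of_variation(hbr)  # identical replicates, so CV == 0
```

A lower intra-condition coefficient of variation indicates that replicates of the same condition agree more closely after normalization.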
Fig 2.
RMSD (root-mean-square deviation) between the log2 expression fold changes of MAQC2 and qRT-PCR.
Illustrated is the RMSD between the log2 fold changes of the DEGs computed with each method and the values measured by qRT-PCR. FPKM (yellow) is the least similar to qRT-PCR (largest RMSD), whereas DESeq normalization (brown) is the most similar (smallest RMSD).
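For reference, the RMSD between two paired vectors of log2 fold changes can be computed as in the sketch below. The function name and the example values are hypothetical; it assumes the two vectors list the same genes in the same order.

```python
import math

def rmsd(x, y):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical log2 fold changes for three genes (uhr vs. hbr).
rnaseq_lfc = [1.0, -2.0, 0.5]
qpcr_lfc = [1.5, -1.0, 0.5]

value = rmsd(rnaseq_lfc, qpcr_lfc)
```

A smaller RMSD means the RNA-seq fold changes track the qRT-PCR fold changes more closely.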
Fig 3.
ROC curve and AUC values from MAQC2 data.
The ROC curves and AUC values (inset) for evaluating the performance of the nine normalization methods were computed using the MAQC2 data with two conditions (uhr and hbr). Our proposed methods, Med-pgQ2 and UQ-pgQ2 (blue and grey, respectively), performed slightly better than the other methods.
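A ROC curve and its AUC can be obtained from per-gene scores and a reference set of true DEGs roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, a higher score is taken to mean stronger evidence of differential expression (for example, 1 minus the adjusted p-value), and the AUC is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels[i] is True when gene i is a true DEG in the reference set.
    """
    n_pos = sum(1 for flag in labels if flag)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every gene tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for four genes; two are true DEGs.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
area = auc(roc_points(scores, labels))
```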
Table 2.
A one-sided z-test on the AUC values from Fig 3, comparing Med-pgQ2 to the other methods.
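One common way to run a one-sided z-test on two AUC values is sketched below. The caption does not state how the standard errors were obtained, so this sketch assumes the Hanley and McNeil (1982) approximation and treats the two AUCs as independent; both are assumptions, as are the function names. Because AUCs computed on the same genes are correlated, this version is likely to be conservative.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

def one_sided_z_test(auc_a, auc_b, n_pos, n_neg):
    """Test H1: auc_a > auc_b, assuming independent AUCs (an assumption).

    Returns (z, p), where p is the upper-tail normal probability.
    """
    se = math.sqrt(hanley_mcneil_se(auc_a, n_pos, n_neg) ** 2
                   + hanley_mcneil_se(auc_b, n_pos, n_neg) ** 2)
    z = (auc_a - auc_b) / se
    p = 0.5 * math.erfc(z / math.sqrt(2))  # equals 1 - Phi(z)
    return z, p

# Hypothetical AUCs and gene counts, for illustration only.
z, p = one_sided_z_test(0.90, 0.85, n_pos=300, n_neg=100)
```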
Table 3.
Analysis of DEGs for MAQC2 and MAQC3 given a nominal FDR ≤ 0.05.
Table 4.
The actual FDR, sensitivity, and specificity rates from the MAQC2 data given a nominal FDR ≤ 0.05.
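Given a reference set of true DEGs, the actual FDR, sensitivity, and specificity of a set of called DEGs follow from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the gene identifiers are hypothetical.

```python
def classification_rates(called, truth, all_genes):
    """Return (actual FDR, sensitivity, specificity) for a set of called DEGs."""
    called, truth, all_genes = set(called), set(truth), set(all_genes)
    tp = len(called & truth)              # true DEGs that were called
    fp = len(called - truth)              # called genes that are not true DEGs
    fn = len(truth - called)              # true DEGs that were missed
    tn = len(all_genes - called - truth)  # non-DEGs correctly left uncalled
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return fdr, sensitivity, specificity

# Hypothetical example: 10 genes, 4 true DEGs, 4 genes called at nominal FDR <= 0.05.
all_genes = [f"g{i}" for i in range(1, 11)]
truth = ["g1", "g2", "g3", "g4"]
called = ["g1", "g2", "g3", "g5"]

fdr, sensitivity, specificity = classification_rates(called, truth, all_genes)
```

Comparing the actual FDR with the nominal threshold shows whether a method controls false discoveries at the level it claims.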
Fig 4.
ROC curve and AUC values from the simulated data at a fold-change of 1.5 and 2.
Illustrated are the ROC curves for detecting 1,500 DEGs (750 up-regulated and 750 down-regulated) using a fold change of 1.5 (A) and a fold change of 2 (B) with unequal library sizes. Calculated AUC values are shown in the inset. The simulated data, containing a total of 15,000 genes in two conditions with 10 replicates per condition, were used to evaluate the performance of eight normalization methods. Our methods (UQ-pgQ2 and Med-pgQ2) are in cyan and blue, respectively.
Fig 5.
ROC curve and AUC values from the simulated data with 4 and 6 replicates in each condition.
Illustrated are the ROC curves and AUC values (inset) for analyzing the impact of the number of biological replicates on the performance of the normalization methods. We used simulated data with four biological replicates (A) and six biological replicates (B) per condition, each containing 1,500 DEGs with a fold change of 2 between the two conditions. Our methods (UQ-pgQ2 and Med-pgQ2) are in cyan and blue, respectively.