Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge

doi:10.1371/journal.pone.0068141

Figure 1.

Heatmap of RNA-seq data from (a) Pickrell data and (b) Montgomery data.

Rows represent subjects (or individuals) and columns represent genes. Using k-means clustering on corresponding RPKM normalized data, subjects are grouped in three clusters and genes are also grouped into three clusters. As observed later, these broad clustering patterns are primarily driven by confounding factors such as sequencing depth.
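For reference, RPKM (reads per kilobase of transcript per million mapped reads) scales each gene's read count by its length and by the sample's total mapped reads. The sketch below is a minimal illustration for a single sample, not the authors' pipeline. The function name `rpkm` and its arguments are hypothetical, and it assumes the total mapped reads equal the sum of the listed counts.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Compute RPKM for one sample.

    read_counts: reads mapped to each gene (hypothetical input).
    gene_lengths_bp: length of each gene in base pairs.
    Assumption: total mapped reads = sum of the listed counts.
    """
    total_reads = sum(read_counts)
    # 1e9 combines the per-kilobase (1e3) and per-million-reads (1e6) factors.
    return [
        1e9 * count / (total_reads * length)
        for count, length in zip(read_counts, gene_lengths_bp)
    ]
```

Because every count is divided by the sample total, doubling the depth of a sample leaves its RPKM values unchanged. This corrects only for a uniform scaling of counts.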

Figure 2.

Detection of cis-eQTLs on (a) Pickrell data and (b) Montgomery data.

Correlation between (c) SNP-level p-values and (d) gene-level p-values in the Montgomery and Pickrell datasets. P-values for the correlation coefficients for SNP-level comparisons are shown on top of each bar. For the gene-level comparison, all p-values for the correlation coefficients are smaller than . Error bars show the 90% confidence intervals for the correlation coefficients.

Table 1.

Fraction of shared cis-eQTLs at 10% FDR between pairs of various versions of normalized Pickrell RNA-seq data.

Figure 3.

Mean average precision (AUP) in predicting gene function from co-expression networks constructed from various normalization methods, on (a) Pickrell, and (b) Montgomery data.

The figure shows the performance of SVD at two different parameter settings (SVD (10) with , and SVD (2) with ). Error bars show the standard errors. Panels (c) and (d) show the cumulative performance on the Pickrell and Montgomery datasets, respectively, for the top 50 best-predicted GO categories for each method.

Figure 4.

Performance of SVD on the GO prediction test with varying number of removed PCs (i.e., setting of ) on (left) Pickrell and (right) Montgomery data.

The red star marks the optimal setting of for the cis-eQTL task.
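Removing principal components with SVD, as varied in this figure, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `remove_top_pcs` and the parameter `k` (number of removed PCs) are hypothetical, and the input is assumed to be a subjects-by-genes matrix of already normalized expression values.

```python
import numpy as np

def remove_top_pcs(expr, k):
    """Return expr (subjects x genes) with its top-k principal components removed.

    k is a hypothetical name for the number of removed PCs.
    """
    # Center each gene (column) so the SVD captures variation around the mean.
    centered = expr - expr.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Zero the k largest singular values, then rebuild the matrix.
    s_kept = s.copy()
    s_kept[:k] = 0.0
    return (u * s_kept) @ vt
```

With `k = 0` the function returns only the centered data. Larger values of `k` remove more of the dominant variation, which may include biological signal as well as confounders.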
