Fig 1.
SDImpute imputes dropouts and retains true zeros in the Trapnell dataset.
(A) The plots show the fraction of zero counts in the scRNA-seq data against the mean bulk expression across sample replicates of T0, T24, T36, and T72, respectively. The expression values are divided into five bins based on the mean bulk gene expression across sample replicates. (B) Pearson correlation between the average expression levels of the same cell type in the imputed scRNA-seq data and the mean of the bulk RNA-seq data across sample replicates of T0, T24, T36, and T72, respectively.
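The two quantities in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the genes-by-cells matrix layout is an assumption, and the five bins are assumed to be equal-frequency (quantile) bins because the caption does not say how the bin edges were chosen. No log transformation is applied, which may differ from the original analysis.

```python
import numpy as np

def zero_fraction_by_bulk_bin(sc_counts, bulk_mean, n_bins=5):
    """Mean fraction of zero counts per bin of mean bulk expression.

    sc_counts: genes x cells matrix (assumed layout).
    bulk_mean: per-gene mean of bulk expression across replicates.
    """
    sc_counts = np.asarray(sc_counts, dtype=float)
    bulk_mean = np.asarray(bulk_mean, dtype=float)
    # Assumption: equal-frequency bins defined by quantiles of the bulk mean.
    edges = np.quantile(bulk_mean, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges map each gene to a bin index in 0..n_bins-1.
    bin_idx = np.digitize(bulk_mean, edges[1:-1], right=True)
    zero_frac = (sc_counts == 0).mean(axis=1)  # per-gene fraction of zeros
    return np.array([
        zero_frac[bin_idx == b].mean() if np.any(bin_idx == b) else np.nan
        for b in range(n_bins)
    ])

def pearson_with_bulk(imputed, bulk_mean):
    """Pearson correlation between mean imputed expression and the bulk mean."""
    sc_mean = np.asarray(imputed, dtype=float).mean(axis=1)
    return np.corrcoef(sc_mean, np.asarray(bulk_mean, dtype=float))[0, 1]
```

A curve that falls from the lowest to the highest bin would indicate that zeros are concentrated in genes with low bulk expression.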
Fig 2.
SDImpute improves the distribution and maintains the heterogeneity of gene expression in the Camp dataset.
(A) Boxplots show the difference between the coefficient of variation (CV) of gene expression after imputation and the CV of non-zero expression values (FPKM, fragments per kilobase of transcript per million mapped reads, greater than 0) before imputation in definitive endoderm (DE) cells. (B) The plot shows the genes unexpressed across DE cells in the raw data. Here, the CV of an unexpressed gene is defined as zero, and the colored bars show the number of these genes with zero CV and non-zero CV in the imputed data, respectively. (C) Scatter plots show the genes expressed in all DE cells before imputation. Here, the x-axis and y-axis represent the CV before imputation and the CV after imputation, respectively. (D) Density plots show the distribution of six genes across iPS cells in the raw data versus the data imputed by SDImpute.
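The CV comparison in panel (A) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, matrices are assumed to be genes by cells, and the population standard deviation (ddof=0) is assumed because the caption does not specify the estimator. A gene with zero mean gets a CV of zero, matching the convention stated for panel (B).

```python
import numpy as np

def cv(values):
    """Coefficient of variation (std / mean); zero when the mean is zero."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return 0.0
    mean = values.mean()
    if mean == 0:
        return 0.0  # unexpressed gene: CV defined as zero
    return values.std(ddof=0) / mean  # assumption: population std

def cv_difference(raw, imputed):
    """Per gene: CV after imputation minus CV of non-zero raw values."""
    raw = np.asarray(raw, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return np.array([
        cv(imputed[g]) - cv(raw[g][raw[g] > 0])
        for g in range(raw.shape[0])
    ])
```

Values near zero would mean that imputation left a gene's relative variability close to that of its observed non-zero expression.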
Fig 3.
SDImpute improves the visualization of cell types in simulated datasets.
(A), (C) Visualization after t-SNE [27] dimensionality reduction of simulated data with two cell types and four cell types, respectively. (B), (D) Heat maps of the top 500 differentially expressed genes (DEGs) in simulated data with two cell types and four cell types, respectively.
Fig 4.
SDImpute improves the visualization of cell types in real datasets.
(A) PCA plots of the raw data and the SDImpute-imputed data of the Camp dataset. (B) t-SNE plots of the raw data and the SDImpute-imputed data of the Romanov dataset.
Fig 5.
SDImpute improves the clustering accuracy in the Camp dataset.
(A) Plots show the results of three clustering evaluation indexes; the dashed line represents the clustering accuracy of the raw data. (B) The plot shows the distribution of the Pearson correlation coefficients between definitive endoderm (DE) cells; the y-axis represents the mean of the correlation coefficients between each cell and all other cells. (C) Heat maps show the correlation coefficients between 10 randomly selected DE cells in the raw data and the SDImpute-imputed data.
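The per-cell quantity on the y-axis of panel (B) can be sketched as follows. This is a minimal sketch, not the authors' code: the function name is hypothetical and the expression matrix is assumed to be genes by cells.

```python
import numpy as np

def mean_correlation_with_others(expr):
    """For each cell, the mean Pearson correlation with all other cells.

    expr: genes x cells matrix (assumed layout).
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr, rowvar=False)  # cells x cells correlation matrix
    n_cells = corr.shape[0]
    # Remove each cell's self-correlation before averaging.
    return (corr.sum(axis=1) - np.diag(corr)) / (n_cells - 1)
```

Applying it to the raw and the imputed matrices of the same cells gives the two distributions being compared.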
Fig 6.
SDImpute improves differential expression analysis in the Cell Type dataset.
(A) Box plots show the expression levels of marker genes in the raw data and the SDImpute-imputed data. (B) Density plots present the differential expression of two exemplary genes (LEFTY1 and DNMT3B) between H1 cells and DECs in the bulk data, raw data, and SDImpute-imputed data, respectively. (C) Venn diagram of the differentially expressed genes (p-value < 0.01) detected by DESeq2 in the raw data and the SDImpute-imputed data. (D) Enriched GO terms (p-value < 10^-3) related to the molecular function of the up-regulated genes of H1 cells that were detected only in the SDImpute-imputed data.
Table 1.
A summary of the scRNA-seq datasets.