scRNMF: An imputation method for single-cell RNA-seq data by robust and non-negative matrix factorization

doi:10.1371/journal.pcbi.1012339

Fig 1.

The overview of the scRNMF framework.

(a) Data pre-processing. Before dropouts are imputed, each dataset is filtered by removing low-expression genes. Log-normalization is then applied to the filtered matrix to obtain the processed matrix X. (b) Missing value imputation. scRNMF extends matrix factorization: it seeks two low-rank matrices whose product closely approximates the original matrix. scRNMF combines two loss functions, the L₂ loss and the C-loss. Because the L₂ loss is highly sensitive to outliers, which can introduce substantial errors, the C-loss is used for the zero values in the raw data. (c) Downstream analysis. The imputed matrix is used for downstream analysis.
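The pre-processing in panel (a) and the contrast between the two losses in panel (b) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `min_cells` filtering threshold, the `scale` factor, the `sigma` bandwidth and the exact form of the correntropy-style loss are all assumptions, since the caption does not specify them.

```python
import numpy as np

def filter_genes(counts, min_cells=3):
    """Keep genes (rows) expressed in at least `min_cells` cells (columns).
    The threshold is an assumed value, not taken from the paper."""
    expressed = (counts > 0).sum(axis=1)
    keep = expressed >= min_cells
    return counts[keep], keep

def log_normalize(counts, scale=1e4):
    """Scale each cell to `scale` total counts, then apply log(1 + x).
    One common form of log-normalization; assumed here."""
    lib = counts.sum(axis=0, keepdims=True).astype(float)
    lib[lib == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / lib * scale)

def l2_loss(residual):
    """Squared error: grows without bound, so outliers dominate."""
    return residual ** 2

def c_loss(residual, sigma=1.0):
    """A correntropy-style loss (assumed form): never exceeds 1,
    so large residuals saturate instead of dominating."""
    return 1.0 - np.exp(-residual ** 2 / (2.0 * sigma ** 2))

# Toy genes-by-cells count matrix (made-up numbers).
counts = np.array([[0, 0, 0, 1],
                   [5, 0, 3, 2],
                   [1, 2, 0, 4]])
filtered, keep = filter_genes(counts, min_cells=2)
X = log_normalize(filtered)

# A large residual inflates the L2 loss but not the bounded loss.
big = np.array([100.0])
print(l2_loss(big), c_loss(big))
```

In this toy matrix the first gene is expressed in a single cell and is dropped, leaving a 2 × 4 processed matrix X.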

Fig 2.

Gene expression data recovery after imputation.

(a) UMAP plots for true data, raw data and imputed data by scRNMF on the Simulated 1 dataset. (b) RMSE and PCC between normalized true counts and imputed values on eight simulated datasets.
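The two recovery metrics in panel (b) can be computed as in the sketch below. It assumes that RMSE and PCC are taken over all entries of the matrices after flattening; the paper may aggregate differently (for example per gene or per cell), and the function names and toy values are illustrative only.

```python
import numpy as np

def rmse(true, imputed):
    """Root mean squared error over all matrix entries."""
    diff = np.asarray(true, dtype=float) - np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def pcc(true, imputed):
    """Pearson correlation coefficient between the flattened matrices."""
    t = np.asarray(true, dtype=float).ravel()
    p = np.asarray(imputed, dtype=float).ravel()
    return float(np.corrcoef(t, p)[0, 1])

# Toy normalized "true" values and a noisy reconstruction (made-up numbers).
true = np.array([[1.0, 2.0], [3.0, 4.0]])
imputed = np.array([[1.1, 1.9], [3.2, 3.8]])
print(rmse(true, imputed), pcc(true, imputed))
```

A lower RMSE and a PCC closer to 1 both indicate that the imputed values track the true expression more closely.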

Fig 3.

ARI and NMI of cell clustering results of different imputation methods on five datasets.

Fig 4.

Evaluation of imputation methods through differential expression analysis on the H1-DEC dataset.

(a) Volcano plots of DE genes detected from the raw data and from the data imputed by scRNMF. (b) ACC and AUC scores, with the reference set to the top 200, 400, 600, 800 and 1000 genes ranked by adjusted P value from the bulk data.

Fig 5.

Evaluation of imputation methods through pseudo-time analysis by Monocle 2 on the Time-course dataset.

(a) Visualization of cellular trajectories reconstructed from raw and imputed data. (b) POS and KOR scores measuring the correlation between the real time labels and the pseudo-time labels.

Fig 6.

Comparison of imputation methods in reducing false positive signals on the Simulated 1 dataset.

Fig 7.

The impact of varying k values on the low-rank-based imputation methods.

Cell clustering analysis is performed on the imputed data, and NMI and ARI are used as evaluation metrics.

Fig 8.

Comparative running time of different imputation methods with a fixed gene count of 2000 (a) and a fixed cell count of 2000 (b).
