scDSSC: Deep Sparse Subspace Clustering for scRNA-seq Data

doi:10.1371/journal.pcbi.1010772

Fig 1.

Network architecture of scDSSC.

An autoencoder based on the ZINB distribution is constructed, in which the encoder and decoder act as two nonlinear mappings, g and h. The output of the hidden layer is a low-dimensional representation of the data after dimension reduction and denoising. The last layer of the decoder feeds four fully connected layers, which generate the reconstructed data and the three parameters of the ZINB distribution, respectively. The self-expressive layer outputs the self-expressive matrix (SEM), which is sparse and is used for spectral clustering. Based on the clustering results, a series of downstream analyses is applied to verify the performance of the model.
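
To make the role of the three ZINB parameters concrete, the following is a minimal sketch of the zero-inflated negative binomial log-likelihood for a single count. It assumes the common mean/dispersion/dropout parameterization; the names `mu`, `theta` and `pi` and all function names are illustrative assumptions and are not taken from the scDSSC code.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under ZINB, where pi in [0, 1)
    is the probability of an extra (dropout) zero."""
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        nb_zero = math.exp(nb_log_pmf(0, mu, theta))
        return math.log(pi + (1.0 - pi) * nb_zero)
    # A nonzero count can only come from the NB component.
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def zinb_nll(counts, mus, thetas, pis):
    """Negative log-likelihood summed over paired counts and parameters."""
    return -sum(
        zinb_log_pmf(x, m, t, p)
        for x, m, t, p in zip(counts, mus, thetas, pis)
    )
```

In an autoencoder of this kind, a loss such as `zinb_nll` would typically be evaluated on the decoder outputs for each gene and cell; how scDSSC weights it against the other loss terms is not stated in this caption.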

Table 1.

Details of the scRNA-seq datasets used in this paper.

Fig 2.

The clustering performance assessed by NMI.

Each subplot shows the clustering results of six clustering methods on one dataset. Different colors correspond to different methods, and the vertical axis shows the NMI score.
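
For readers unfamiliar with the metric, here is a minimal sketch of normalized mutual information (NMI) between true and predicted labels. It assumes normalization by the arithmetic mean of the two label entropies, which is one common convention; the caption does not state which normalization the paper uses, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(n * nij / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (assumed convention)."""
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    if h_true == 0 and h_pred == 0:
        # Both assignments put every cell in one cluster: identical partitions.
        return 1.0
    return 2.0 * mutual_information(true_labels, pred_labels) / (h_true + h_pred)
```

NMI depends only on how cells are grouped, not on the label names, so a clustering that matches the true cell types up to a relabeling scores 1.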

Fig 3.

The results of cell visualization and annotation.

The true cell types are labeled on the visualization results, which makes the biological information contained in the scRNA-seq datasets easier to interpret. The figure shows the relative positions and distributions of the different cell types. Panel A (left) is the visualization result for the Romanov dataset, which has 7 cell types. Panel B is the result for the Human1 dataset, which has 14 cell types.

Fig 4.

The results of marker genes and cell trajectory inference.

A shows the distribution of candidate marker gene expression across cell types. The seven cell types are listed on the left, and the marker gene names along the bottom. The shape of each violin approximates the expression distribution of the marker gene in that cell type, and the color shade reflects the average expression level. B shows the result of cell trajectory inference, which reveals the dynamic process of cell differentiation. The seven dot colors correspond to the seven cell types, and the dot size reflects the number of cells.

Fig 5.

scDSSC is scalable and robust to batch effects.

Plot A shows the cell visualization result for the Human2 dataset, in which the 14 names indicate 14 different cell types. Plot B shows the results of differential expression analysis on Human2. The rows and columns indicate gene names and cell type names, respectively, and the color brightness reflects the average expression of the gene. In Plot C, ARI is used to evaluate the clustering performance of eight methods on six small datasets containing batch effects. Plot D shows the performance of scDSSC, assessed by ACC, NMI and ARI, on the large dataset Macosko_mouse. Plots C and D share the same legend.
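
As a reference for the ARI scores in Plot C, the following is a minimal sketch of the adjusted Rand index computed from pair counts. It follows the standard Hubert-Arabie definition; the function name is illustrative, and the paper may well rely on a library implementation instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two label assignments of equal length."""
    n = len(true_labels)
    if n < 2:
        # No cell pairs exist, so the two partitions trivially agree.
        return 1.0
    # Number of cell pairs placed together in both, in the true, and in the
    # predicted partition, respectively.
    pairs_joint = sum(
        comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values()
    )
    pairs_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    # Agreement expected by chance, and the maximum attainable agreement.
    expected = pairs_true * pairs_pred / comb(n, 2)
    maximum = 0.5 * (pairs_true + pairs_pred)
    if maximum == expected:
        # Degenerate case, e.g. both partitions consist of a single cluster.
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)
```

Unlike NMI, ARI is corrected for chance and can be negative when two partitions agree less than random assignments would.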

Fig 6.

scDSSC performs well on simulated datasets.

Plot A shows the clustering performance of scDSSC on 12 simulated datasets. The six datasets in the left panel were generated with a dropout rate of 0.25, and the six in the right panel with a dropout rate of 0.05. Plot B shows the cell visualization result for one of the datasets with five cell types, generated with a dropout rate of 0.25. Plot C shows the differential expression analysis for the dataset used in Plot B. The rows, columns, and color shading have the same meaning as in the differential expression analysis described above.
