Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders
Fig 5
scAAnet identified archetypes with correspondence to known cell types in the pancreatic islet dataset.
(a) UMAP visualizations computed from the top 35 PCs of the observed scRNA-seq data (left), from the cell usage matrix inferred by the encoder (middle), and from the reconstructed expression profile at the output layer of scAAnet (right). UMAPs are colored by the 10 known cell clusters. The average silhouette scores for the three UMAPs are 0.604, 0.398 and 0.613, respectively. Black dots mark the cells that have the largest usage of the corresponding gene expression program (GEP), labeled with Arabic numerals. (b) UMAPs of the observed data colored by the inferred cell usage of each GEP. (c) Heatmap showing the usage of all GEPs (rows) in all cells (columns). Cells are ordered by hierarchical clustering. (d) Heatmap showing the percentage of cells with usage > 25% of each GEP (rows) in each cell type (columns). Colors in (c) and (d) are coded in the same way as those in (a). (e) Normalized gene scores (Methods) of known markers (columns) in each GEP (rows). (f) Mean z-scores of known markers (columns) in each cell cluster (rows).
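The summary in panel (d) can be illustrated with a minimal sketch. It assumes a usage matrix with one row per cell and one column per GEP, plus one cell-type label per cell. The function name `percent_high_usage`, the toy values and the labels are hypothetical and are not taken from the scAAnet code base or the pancreatic islet dataset.

```python
from collections import defaultdict


def percent_high_usage(usage, cell_types, threshold=0.25):
    """Percentage of cells in each cell type whose usage of each GEP exceeds `threshold`.

    usage      : list of rows, one per cell; each row holds that cell's usage of every GEP
    cell_types : list of cell-type labels, one per cell (hypothetical input format)
    Returns a dict mapping cell type -> list of percentages, one entry per GEP.
    """
    if len(usage) != len(cell_types):
        raise ValueError("usage and cell_types must have one entry per cell")

    n_geps = len(usage[0])
    counts = defaultdict(lambda: [0] * n_geps)  # cells above threshold, per type and GEP
    totals = defaultdict(int)                   # number of cells per type

    for row, ctype in zip(usage, cell_types):
        totals[ctype] += 1
        for k, u in enumerate(row):
            if u > threshold:  # strict inequality, as in "usage > 25%"
                counts[ctype][k] += 1

    return {
        ctype: [100.0 * c / totals[ctype] for c in counts[ctype]]
        for ctype in totals
    }


# Toy example: four cells, two GEPs, two cell types (made-up values).
toy_usage = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
toy_types = ["alpha", "alpha", "beta", "beta"]
toy_result = percent_high_usage(toy_usage, toy_types)
```

Transposing the result so that GEPs are rows and cell types are columns gives the layout described for the heatmap.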