Fig 1.
Overview of hierarchical marker gene selection in PBMC3k data.
(a) Marker gene heatmap generated by the one-vs-all FindMarker approach in Seurat. (b) Our constructed hierarchical structure of cell clusters in the PBMC3k dataset. (c) Assembled heatmap that concatenates marker gene heatmaps for individual splits in the constructed cell cluster hierarchy.
Fig 2.
Hierarchical marker gene selection in PBMC control dataset.
(a) Marker gene heatmap generated by the one-vs-all FindMarker approach in Seurat. (b) Constructed hierarchy of cell clusters in PBMC control dataset. (c) Assembled heatmap that summarizes all marker genes for various splits in the cell cluster hierarchy.
Fig 3.
Hierarchical marker gene selection in PBMC stim dataset.
(a) Marker gene heatmap generated by the one-vs-all FindMarker approach in Seurat. (b) Constructed hierarchy of cell clusters in PBMC stim dataset. (c) Assembled heatmap that summarizes all marker genes for various splits in the cell cluster hierarchy.
Fig 4.
Comparison of hierarchical marker genes with two baselines and three existing marker gene selection methods.
The baselines are all genes and highly variable genes. The three existing approaches are the flat one-vs-all FindMarker in Seurat, the flat version of scGeneFit, and the hierarchical version of scGeneFit. For each evaluation dataset, we trained a K-Nearest Neighbor classifier on 70% of the cells and tested classification accuracy on the remaining 30% of the cells. (a) Classification accuracies for the PBMC3k dataset; (b) Classification accuracies for the PBMC control dataset; (c) Classification accuracies for the PBMC stim dataset.
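The 70%/30% evaluation described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the helper names (`knn_predict`, `holdout_accuracy`, `gene_idx`) are hypothetical, and the choices of k = 5 and Euclidean distance are assumptions, since the caption does not state them.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training cells closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def holdout_accuracy(X, y, gene_idx, train_frac=0.7, k=5, seed=0):
    """Shuffle cells, train on train_frac of them, and score on the rest,
    using only the expression columns listed in gene_idx."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_train = int(train_frac * len(order))
    sub = [[row[g] for g in gene_idx] for row in X]  # keep selected genes only
    train_X = [sub[i] for i in order[:n_train]]
    train_y = [y[i] for i in order[:n_train]]
    test = order[n_train:]
    hits = sum(knn_predict(train_X, train_y, sub[i], k) == y[i] for i in test)
    return hits / len(test)


# Synthetic stand-in for an expression matrix (assumption, not PBMC data):
# two cell types, four genes, of which only genes 0 and 1 are informative.
rng = random.Random(1)
X, y = [], []
for label, mean in (("T", 0.0), ("B", 5.0)):
    for _ in range(50):
        X.append([rng.gauss(mean, 1), rng.gauss(mean, 1),
                  rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)

acc_markers = holdout_accuracy(X, y, gene_idx=[0, 1])      # "marker" subset
acc_all = holdout_accuracy(X, y, gene_idx=[0, 1, 2, 3])    # all-genes baseline
```

Comparing `acc_markers` with `acc_all` mirrors the comparison in the figure: the same classifier and split are reused, and only the gene subset changes.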
Fig 5.
UMAP visualization of hierarchical marker genes, two baselines, and three existing marker gene selection methods, applied to three datasets.
(a) UMAP visualizations of the PBMC3k dataset colored by cell type; (b) UMAP visualizations of the PBMC control dataset; (c) UMAP visualizations of the PBMC stim dataset.
Fig 6.
Comparison of hierarchical marker genes with two baselines and three existing marker gene selection methods in the context of cell type mapping.
Given two scRNA-seq datasets with a significant batch effect between them, we trained a K-Nearest Neighbor classifier on one dataset (reference) and tested classification accuracy on the other dataset (query). (a) Classification accuracies with PBMC control as reference and PBMC stim as query; (b) Classification accuracies with PBMC stim as reference and PBMC control as query.
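The reference-to-query mapping described in this caption can be sketched in the same spirit. This is an illustration under stated assumptions, not the authors' pipeline: the two "datasets" are simulated, the batch effect is modeled as a simple shift in one gene, and `mapping_accuracy`, `simulate`, and k = 5 are hypothetical choices.

```python
import random
from collections import Counter


def knn_predict(ref_X, ref_y, x, k=5):
    """Majority vote among the k reference cells closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(ref_X, ref_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def mapping_accuracy(ref_X, ref_y, query_X, query_y, gene_idx, k=5):
    """Fit on every reference cell, then classify every query cell,
    restricted to the expression columns listed in gene_idx."""
    ref_sub = [[row[g] for g in gene_idx] for row in ref_X]
    hits = 0
    for row, truth in zip(query_X, query_y):
        cell = [row[g] for g in gene_idx]
        hits += knn_predict(ref_sub, ref_y, cell, k) == truth
    return hits / len(query_y)


def simulate(rng, shift, n_per_type=40):
    """Toy dataset (assumption): gene 0 separates the two cell types,
    gene 1 carries only a dataset-specific shift standing in for batch effect."""
    X, y = [], []
    for label, mean in (("T", 0.0), ("B", 5.0)):
        for _ in range(n_per_type):
            X.append([rng.gauss(mean, 1), rng.gauss(shift, 1)])
            y.append(label)
    return X, y


rng = random.Random(0)
ctrl_X, ctrl_y = simulate(rng, shift=0.0)   # stand-in for the control dataset
stim_X, stim_y = simulate(rng, shift=3.0)   # stand-in for the stimulated dataset

# Both mapping directions, using only the type-informative gene.
acc_ctrl_to_stim = mapping_accuracy(ctrl_X, ctrl_y, stim_X, stim_y, gene_idx=[0])
acc_stim_to_ctrl = mapping_accuracy(stim_X, stim_y, ctrl_X, ctrl_y, gene_idx=[0])
```

Unlike the within-dataset split, no query cell is seen during training, so the score reflects how well the selected genes transfer across datasets.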
Fig 7.
Marker gene heatmap generated by the one-vs-all FindMarker approach in Seurat.