Fig 1.
t-SNE visualization of supercell clustering results.
Table 1.
Comparison of intermediate-resolution approaches. Columns indicate whether a method explicitly incorporates each design property.
Fig 2.
The workflow includes: (1) second-order co-occurrence neighbor extraction from multi-omics data; (2) cross-omics consistency fusion; (3) probabilistic pruning to identify high-order structural units (supercells); (4) supercell clustering via optimization; and (5) visualization and downstream analyses.
Table 2.
Top enriched markers in cluster 3 and cluster 7 based on one-sided Welch’s t-test.
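As a minimal sketch of the statistic behind such a ranking, the snippet below computes Welch's t value and the Welch-Satterthwaite degrees of freedom for one marker. The function name `welch_t` and the toy expression values are illustrative assumptions, not taken from the paper. The one-sided p-value would be the upper tail of a Student t distribution with `dof` degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance test of mean(a) > mean(b)."""
    # Squared standard errors of the two sample means.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical expression of one marker inside a cluster vs. all other cells.
in_cluster = [5.0, 6.0, 7.0, 8.0, 9.0]
rest = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, dof = welch_t(in_cluster, rest)
```

A large positive `t_stat` indicates higher expression in the cluster, which is the direction a one-sided test for upregulated markers looks at.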
Fig 3.
Functional enrichment analysis of upregulated genes.
(a) GO circular plot showing total and differentially expressed genes (DEGs) associated with each GO term. (b) Chord diagram linking representative upregulated genes to enriched immune-related processes. (c) GO enrichment bubble plot. (d) Bar plot of significantly enriched GO terms.
Fig 4.
GO enrichment analysis of upregulated protein-level markers.
(a) Circular GO plot summarizing enriched terms. (b) Network linking representative proteins to enriched GO terms. (c) Bubble plot of top enriched GO terms. (d) Bar plot of significantly enriched GO terms.
Fig 5.
Pathway enrichment analysis for Supercell 62 across four databases.
(a) KEGG pathway enrichment. (b) GO component enrichment. (c) Reactome pathway enrichment. (d) MSigDB enrichment. The results consistently highlight antigen presentation, dendritic cell programs, vesicle trafficking, and immune signaling pathways.
Fig 6.
Visualization of supercells in t-SNE embedding.
Cells belonging to Supercell 62 and Supercell 363 are highlighted using triangular markers. (a) Supercell 62 corresponds to a rare dendritic-cell population. (b) Supercell 363 contains a group of B cells with embedded NK cells, representing a rare composite population.
Table 3.
Performance comparison of different methods in ARI on the considered datasets.
Table 4.
Performance comparison of different methods in NMI on the considered datasets.
Fig 7.
Comparative t-SNE visualization of latent representations on the PBMC10× dataset: (a) scHG, (b) GBS, (c) scAI, (d) SMSC, (e) CGD, (f) scMNMF, (g) GSTRPCA.
Table 5.
Performance comparison of different methods in running time (s) on the considered datasets.
Fig 8.
Execution time comparison (log scale) of different methods on the considered datasets.
Fig 9.
Peak memory consumption comparison (log scale) of different methods on the considered datasets.
Table 6.
Cell-level confusion statistics for the seven supercells (IDs 62, 63, 363, 453, 1771, 1852, 1917) in the PBMC10× dataset before and after high-order refinement.
Fig 10.
Connectivity graph of Supercell 62 before high-order pruning.
Node size indicates degree centrality and colors denote ground-truth clusters. Dashed ellipses mark low-centrality cells removed during high-order refinement. (This plot was generated using the CNSknowall platform (https://cnsknowall.com), a comprehensive web service for data analysis and visualization.)
Table 7.
Performance comparison of additional baselines and scHG in ARI on the benchmark datasets.
Table 8.
Performance comparison of additional baselines and scHG in NMI on the benchmark datasets.
Table 9.
Ablation results of the angle-aware metric and probabilistic pruning modules on six datasets across eight clustering evaluation metrics; E denotes the Euclidean metric and A the angle-aware metric.
Fig 11.
Ablation experiment results (NMI & ARI) on the considered datasets.
Fig 12.
ARI variation across six benchmark datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.
Dots indicate the ground-truth cluster number (yellow) and the algorithm-derived optimal cluster number (green). The blue dashed line indicates the top-10 ARI threshold.
Fig 13.
NMI variation across six benchmark datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.
Dots indicate the ground-truth cluster number (yellow) and the algorithm-derived optimal cluster number (green). The blue dashed line indicates the top-10 NMI threshold.
Fig 14.
Clustering performance (ARI) across the (α, β) hyperparameter space on six datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.
Fig 15.
Clustering performance (NMI) across the (α, β) hyperparameter space on six datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.
Fig 16.
Clustering performance (ARI & NMI) across the γ hyperparameter space on six datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.
Solid lines denote clustering performance under different values of γ, while dashed lines indicate the median ARI (blue) or NMI (red) for each dataset.
Table 10.
Randomness analysis of scHG across six datasets (mean ± variance over 20 runs).
Fig 17.
Convergence behavior of the omics-weighted optimizer across datasets.
Objective value as a function of the iteration number for six representative datasets (PBMC10×, PBMC_Inhouse, Bmcite, mESC, PBMC_Cao, and Sim). Each curve corresponds to one dataset.
Fig 18.
Comparison between Euclidean and angle-aware metrics.
Although Euclidean distance suggests grouping cells i and k, the angular metric based on Pearson correlation indicates higher similarity between cells i and j, illustrating the advantage of the angle-aware metric.
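The contrast described in this caption can be reproduced with a toy example. The sketch below is illustrative only: the three profiles are made up, and the angular distance is assumed to be arccos of the Pearson correlation, which may differ from the exact form used by the method.

```python
from math import acos, dist, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    num = sum(a * b for a, b in zip(cx, cy))
    den = sqrt(sum(a * a for a in cx)) * sqrt(sum(b * b for b in cy))
    return num / den

def angular(x, y):
    """Assumed angle-aware distance: arccos of the Pearson correlation."""
    r = max(-1.0, min(1.0, pearson(x, y)))  # clamp against rounding error
    return acos(r)

# Hypothetical expression profiles for three cells.
cell_i = [1.0, 2.0, 3.0, 4.0]
cell_j = [10.0, 20.0, 30.0, 40.0]  # same pattern as i at a higher magnitude
cell_k = [4.0, 3.0, 2.0, 1.0]      # similar magnitude to i, reversed pattern

euclid_ij, euclid_ik = dist(cell_i, cell_j), dist(cell_i, cell_k)
angle_ij, angle_ik = angular(cell_i, cell_j), angular(cell_i, cell_k)
```

Here `euclid_ik` is smaller than `euclid_ij`, so Euclidean distance pairs i with k, while `angle_ij` is smaller than `angle_ik`, so the correlation-based angle pairs i with j.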
Fig 19.
(a) Example graph constructed from adjacency matrix M. (b) Illustration of second-order co-occurrence neighbors, defined by a number of shared nearest neighbors exceeding a threshold. (c) Example of probabilistic pruning within a connected component.
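The shared-neighbor idea in panel (b) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: neighbors are taken by Euclidean distance, and the function name `shared_neighbor_edges`, the neighborhood size `k`, and the cutoff `threshold` are all hypothetical.

```python
from itertools import combinations
from math import dist

def shared_neighbor_edges(points, k, threshold):
    """Link two points when they share more than `threshold` of their k nearest neighbors."""
    n = len(points)
    knn = []
    for a in range(n):
        # Rank all other points by distance to point a and keep the k closest.
        others = sorted((b for b in range(n) if b != a),
                        key=lambda b: dist(points[a], points[b]))
        knn.append(set(others[:k]))
    # Keep pairs whose neighbor sets overlap in more than `threshold` points.
    return {(a, b) for a, b in combinations(range(n), 2)
            if len(knn[a] & knn[b]) > threshold}

# Two well-separated toy groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = shared_neighbor_edges(points, k=2, threshold=0)
```

With these toy points, every pair inside a group shares one neighbor and is linked, while no pair across groups shares any neighbor.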
Table 11.
Detailed information of single-cell multi-omics datasets.