scHG: A supercell framework with high-order graph learning enables scalable multi-omics analysis

doi:10.1371/journal.pcbi.1013851

Fig 1.

t-SNE visualization of supercell clustering results.

Table 1.

Comparison of intermediate-resolution approaches. Columns indicate whether a method explicitly incorporates each design property.

Fig 2.

The workflow includes: (1) second-order co-occurrence neighbor extraction from multi-omics data; (2) cross-omics consistency fusion; (3) probabilistic pruning to identify high-order structural units (supercells); (4) supercell clustering via optimization; and (5) visualization and downstream analyses.

Table 2.

Top enriched markers in cluster 3 and cluster 7 based on a one-sided Welch’s t-test.
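As a rough illustration of the test named in this caption, the sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom for one marker. The function name `welch_t` and the expression values are hypothetical and made up for illustration; they are not taken from the paper.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) vs mean(b)."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means (sample variance / n)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Toy values (assumed): one marker inside a cluster vs. in all other cells
in_cluster = [5.1, 4.8, 5.6, 5.3, 4.9]
other_cells = [3.9, 4.2, 3.5, 4.4, 4.0, 3.8]
t_stat, dof = welch_t(in_cluster, other_cells)
```

For the one-sided alternative "higher in the cluster", the p-value is the upper tail of a t distribution with `dof` degrees of freedom evaluated at `t_stat`. If SciPy is available, `scipy.stats.ttest_ind(in_cluster, other_cells, equal_var=False, alternative="greater")` should give the same statistic together with that p-value.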

Fig 3.

Functional enrichment analysis of upregulated genes.

(a) GO circular plot showing total and differentially expressed genes (DEGs) associated with each GO term. (b) Chord diagram linking representative upregulated genes to enriched immune-related processes. (c) GO enrichment bubble plot. (d) Bar plot of significantly enriched GO terms.

Fig 4.

GO enrichment analysis of upregulated protein-level markers.

(a) Circular GO plot summarizing enriched terms. (b) Network linking representative proteins to enriched GO terms. (c) Bubble plot of top enriched GO terms. (d) Bar plot of significantly enriched GO terms.

Fig 5.

Pathway enrichment analysis for Supercell 62 across four databases.

(a) KEGG pathway enrichment. (b) GO component enrichment. (c) Reactome pathway enrichment. (d) MSigDB enrichment. The results consistently highlight antigen presentation, dendritic cell programs, vesicle trafficking, and immune signaling pathways.

Fig 6.

Visualization of supercells in t-SNE embedding.

Cells belonging to Supercell 62 and Supercell 363 are highlighted using triangular markers. (a) Supercell 62 corresponds to a rare dendritic-cell population. (b) Supercell 363 contains a group of B cells with embedded NK cells, representing a rare composite population.

Table 3.

Comparison of different methods by adjusted Rand index (ARI) on the considered datasets.
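For readers unfamiliar with the metric, the following is a minimal sketch of the standard adjusted Rand index computed from pair counts. The function name and toy labels are illustrative assumptions; the paper's own evaluation code is not shown in this listing.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI from the contingency table of two labelings."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))   # contingency cells
    rows = Counter(labels_true)                      # true cluster sizes
    cols = Counter(labels_pred)                      # predicted cluster sizes
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    expected = sum_a * sum_b / comb(n, 2)            # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                        # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Same partition with permuted label names scores 1.0
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster scores below zero
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The score is invariant to how clusters are named, which is why it suits comparisons where each method assigns its own arbitrary cluster IDs.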

Table 4.

Comparison of different methods by normalized mutual information (NMI) on the considered datasets.
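A minimal sketch of normalized mutual information follows. It assumes normalization by the arithmetic mean of the two label entropies, which is one common convention; the caption does not say which variant the paper uses, and the function names here are illustrative.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    pa, pb = Counter(labels_true), Counter(labels_pred)
    # Mutual information from joint and marginal label counts
    mi = sum((c / n) * log(c * n / (pa[u] * pb[v]))
             for (u, v), c in joint.items())
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0

same = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])         # relabeled copy
independent = normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1])  # no shared information
```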

Fig 7.

Comparative t-SNE visualization of latent representations on the PBMC10× dataset: (a) scHG, (b) GBS, (c) scAI, (d) SMSC, (e) CGD, (f) scMNMF, (g) GSTRPCA.

Table 5.

Comparison of different methods by running time (s) on the considered datasets.

Fig 8.

Execution time comparison (log scale) of different methods on the considered datasets.

Fig 9.

Peak memory consumption comparison (log scale) of different methods on the considered datasets.

Table 6.

Cell-level confusion statistics for the seven supercells (IDs 62, 63, 363, 453, 1771, 1852, 1917) in the PBMC10× dataset before and after high-order refinement.

Fig 10.

Connectivity graph of Supercell 62 before high-order pruning.

Node size indicates degree centrality, and colors denote ground-truth clusters. Dashed ellipses mark low-centrality cells removed during high-order refinement. This plot was generated using the CNSknowall platform (https://cnsknowall.com), a comprehensive web service for data analysis and visualization.

Table 7.

Performance comparison of additional baselines and scHG in ARI on the benchmark datasets.

Table 8.

Performance comparison of additional baselines and scHG in NMI on the benchmark datasets.

Table 9.

Ablation results for the angle-aware metric and probabilistic pruning modules on six datasets across eight clustering evaluation metrics; E denotes the Euclidean metric and A the angle-aware metric.

Fig 11.

Ablation experiment results (NMI & ARI) on the considered datasets.

Fig 12.

ARI variation across six benchmark datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.

Dots indicate: yellow (ground-truth cluster numbers), green (algorithm-derived optimal cluster numbers). Dashed lines indicate: blue (top-10 ARI threshold).

Fig 13.

NMI variation across six benchmark datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.

Dots indicate: yellow (ground-truth cluster numbers), green (algorithm-derived optimal cluster numbers). Dashed lines indicate: blue (top-10 NMI threshold).

Fig 14.

Clustering performance (ARI) across the (α, β) hyperparameter space on six datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.

Fig 15.

Clustering performance (NMI) across the (α, β) hyperparameter space on six datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao.

Fig 16.

Clustering performance (ARI & NMI) across the γ hyperparameter space on six datasets: (a) PBMC10×, (b) PBMC_Inhouse, (c) Sim, (d) Bmcite, (e) mESC, (f) PBMC_Cao. Solid lines denote clustering performance under different values of γ, while dashed lines indicate the median ARI (blue) or NMI (red) for each dataset.

Table 10.

Randomness analysis of scHG across six datasets (mean ± variance over 20 runs).

Fig 17.

Convergence behavior of the omics-weighted optimizer across datasets.

Objective value as a function of the iteration number for six representative datasets (PBMC10×, PBMC_Inhouse, Bmcite, mESC, PBMC_Cao, and Sim). Each curve corresponds to one dataset.

Fig 18.

Comparison between Euclidean and angle-aware metrics.

Although Euclidean distance suggests grouping cells i and k, the angular metric based on Pearson correlation indicates higher similarity between cells i and j, illustrating the advantage of the angle-aware metric.
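The contrast described in this caption can be reproduced with a toy example. The sketch below assumes the angular metric is the arccosine of the Pearson correlation, which is one plausible reading; the exact definition used in the paper is not given in this listing, and the three cell vectors are invented for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def angular_distance(x, y):
    """Angle (radians) implied by the Pearson correlation; assumed form."""
    r = max(-1.0, min(1.0, pearson(x, y)))  # clamp against rounding error
    return math.acos(r)

cell_i = [1.0, 2.0, 3.0, 4.0]
cell_j = [3.0, 6.0, 9.0, 12.0]  # same profile shape as i, three times the magnitude
cell_k = [2.0, 1.0, 4.0, 3.0]   # close to i in magnitude, different shape

d_ij, d_ik = euclidean(cell_i, cell_j), euclidean(cell_i, cell_k)
a_ij, a_ik = angular_distance(cell_i, cell_j), angular_distance(cell_i, cell_k)
```

Here the Euclidean distance ranks k as the nearer cell to i, while the correlation-based angle ranks j as nearer, because j differs from i only in overall scale.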

Fig 19.

Construction of supercells.

(a) Example graph constructed from adjacency matrix M. (b) Illustration of second-order co-occurrence neighbors based on shared k-nearest neighbors exceeding a threshold. (c) Example of probabilistic pruning within a connected component.
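The idea in panel (b) can be sketched as a shared-nearest-neighbor rule: two cells are linked when the number of nearest neighbors they have in common exceeds a threshold. The parameter names `k` and `min_shared`, the Euclidean distance, and the toy points are all assumptions made for illustration; the paper's actual symbols, metric, and values are not recoverable from this caption.

```python
import math

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors

def second_order_neighbors(points, k, min_shared):
    """Link i and j when they share more than `min_shared` of their k nearest neighbors."""
    nn = knn_sets(points, k)
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if len(nn[i] & nn[j]) > min_shared}

# Two well-separated toy groups of four points each
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
edges = second_order_neighbors(points, k=3, min_shared=1)
```

In this toy case every pair inside a group shares two neighbors and is linked, while pairs across groups share none, so the connected components of `edges` recover the two groups.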

Table 11.

Detailed information of single-cell multi-omics datasets.
