Fig 1.
(A) Balls; (B) elongated with bridge; (C) Swiss roll; and (D) GL manifold. (A) and (B) show the 2-dimensional data sets. (C) plots the first two coordinates of the Swiss roll. (D) shows the 2-dimensional PCA plot of the SO(3) manifolds.
Fig 2.
UMAP and PCA on Tabula Muris data sets.
The Tabula Muris data sets show elongated clusters in the PCA embedding and clusters connected by a bridge of points in the UMAP embedding. In both the PCA and UMAP embeddings, certain clusters are not well separated and are connected by high-density regions.
Table 1.
Notations.
Fig 3.
Optimal ℓp path between two points in a moon data set.
Table 2.
Clustering accuracy (ARI) for manifold data.
Table 3.
Geometric perturbation for manifold data.
Table 4.
Clustering accuracy (ARI) for scRNA-seq data.
Fig 4.
Comparison of cluster structure preservation on PCA, UMAP and t-SNE embeddings.
Top row: 2-dimensional PCA, PM2, UMAP, and t-SNE embeddings of the Cell Mix data set, colored by true cell type. Bottom row: average linkage dendrograms of cluster means for the r-dimensional embeddings, where r = 40 for PCA, r = 4 for PM2 and UMAP, and r = 3 for t-SNE.
Table 5.
Geometric perturbation for RNA data.
Fig 5.
Processing and clustering time for PBMC4K and Baron’s Pancreatic data sets.
Fig 6.
Clustering performance for different values of p.