Randomized Spatial PCA (RASP): A computationally efficient method for dimensionality reduction of high-resolution spatial transcriptomics data

doi:10.1371/journal.pcbi.1013759

Fig 1.

RASP overview.

A: Method workflow. RASP takes as input a cell-by-gene expression matrix and spatial coordinates from an ST experiment. The expression matrix is reduced via randomized PCA and spatially smoothed by a sparse inverse distance matrix. Non-transcriptomic covariates can be added at this stage (not pictured). The output of RASP can be used for downstream analyses, including cell type annotation and spatial domain identification. B: Evaluation pipeline. RASP was tested on four ST datasets, each generated using a different platform (MERFISH, Stereo-seq, Xenium, Visium), and on two synthetic datasets (100 replicates each). RASP results were compared against standard PCA and alternative spatially aware dimensionality reduction models (SEDR, GraphST, SpatialPCA, BASS, STAGATE, CellCharter, nichePCA, and MENDER) for both accuracy and speed.
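
The workflow in panel A can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`randomized_pca`, `inverse_distance_weights`), the parameter defaults, the exclusion of each cell from its own neighborhood, and the row normalization are all choices made for this sketch. The `beta` argument plays the role of the exponent β applied to the inverse distance weights, and the input is synthetic toy data.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree


def randomized_pca(X, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate PC scores with a randomized range finder plus power iterations."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                        # center each gene (column)
    k = min(n_components + n_oversamples, min(Xc.shape))
    Q = rng.standard_normal((Xc.shape[1], k))      # random test matrix (genes x k)
    for _ in range(n_iter):                        # power iterations, re-orthonormalized
        Q, _ = np.linalg.qr(Xc @ Q)
        Q, _ = np.linalg.qr(Xc.T @ Q)
    Q, _ = np.linalg.qr(Xc @ Q)                    # orthonormal basis (cells x k)
    B = Q.T @ Xc                                   # small projected matrix (k x genes)
    U_small, s, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return (U * s)[:, :n_components]               # PC scores (cells x components)


def inverse_distance_weights(coords, k=10, beta=2.0):
    """Sparse kNN matrix with weights 1 / d**beta, rows normalized to sum to 1.

    Assumption of this sketch: a cell is not its own neighbor.
    """
    n = coords.shape[0]
    dist, idx = cKDTree(coords).query(coords, k=k + 1)
    dist, idx = dist[:, 1:], idx[:, 1:]            # drop the query point itself
    w = 1.0 / np.maximum(dist, 1e-12) ** beta      # guard against zero distances
    rows = np.repeat(np.arange(n), k)
    W = sparse.csr_matrix((w.ravel(), (rows, idx.ravel())), shape=(n, n))
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    return (sparse.diags(1.0 / row_sums) @ W).tocsr()


# Toy example on synthetic data: 300 "cells", 40 "genes", 2-D coordinates.
rng = np.random.default_rng(42)
expression = rng.poisson(3.0, size=(300, 40)).astype(float)
coordinates = rng.uniform(0.0, 100.0, size=(300, 2))

scores = randomized_pca(expression, n_components=10)
W = inverse_distance_weights(coordinates, k=10, beta=2.0)
smoothed_scores = W @ scores                       # spatially smoothed PCs
```

The smoothed scores would then feed a clustering step. Keeping the weight matrix sparse (k nonzeros per row) is what keeps the smoothing step cheap as the number of cells grows.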

Table 1.

RASP parameter suggestions.

Fig 2.

Overview of mouse ovary analysis using Vizgen MERFISH data.

A: Cell Type Annotation Comparison — Ground truth cell type labels (top left) alongside predictions from RASP, normal PCA, and other methods, illustrating accuracy of spatial mapping. B: Runtime Performance — Comparison of computational runtime across methods, highlighting RASP efficiency relative to standard randomized PCA. C: Clustering Accuracy (ARI) Across Methods — Adjusted Rand Index values for all methods, color-coded by clustering algorithm; shows full ARI range at default RASP parameters (kNN = 2–20, ) and the default parameter single-point performance (kNN=10, ). RASP-moran and RASP-CHAOS indicate ARI when using label-agnostic metrics for parameter selection. D: Effect of β on ARI — ARI values for RASP with inverse distance weighting raised to (left), 0.5 (middle), and 2 (right), plotted against kNN values; colors indicate clustering algorithm used. E: Spatial Expression Patterns of Lhcgr — Normalized expression (left), reduced rank reconstructed expression (center), and spatially smoothed reconstructed expression (right). White lines depict luteinizing mural and luteal cell compartment boundaries.
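
For readers unfamiliar with the Adjusted Rand Index reported in panels C and D, the following sketch computes it from a contingency table using the standard pair-counting formula. The function name `adjusted_rand_index` and the toy label vectors are illustrative assumptions; the source does not state which implementation the authors used.

```python
import numpy as np


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the contingency table of two labelings."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    n = t.size
    if n < 2:
        return 1.0                                  # no pairs to disagree on

    # Contingency table: counts of cells with true label i and predicted label j.
    contingency = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(contingency, (t, p), 1)

    def pairs(x):
        return x * (x - 1) / 2.0                    # number of unordered pairs

    sum_cells = pairs(contingency).sum()
    sum_rows = pairs(contingency.sum(axis=1)).sum()
    sum_cols = pairs(contingency.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(n)       # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0                                  # degenerate case, e.g. one cluster each
    return (sum_cells - expected) / (max_index - expected)


# Toy labelings: the same partition with cluster names swapped.
truth = ["a", "a", "b", "b", "c", "c"]
predicted = [2, 2, 0, 0, 1, 1]
ari = adjusted_rand_index(truth, predicted)
```

ARI is invariant to how clusters are named, so it equals 1 for identical partitions and stays near 0 for random assignments, which is why it suits comparing unsupervised cluster labels against annotated ground truth.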

Fig 3.

Covariate analysis (Mouse ovary dataset) for the cell type clustering case.

A: RASP Model Architectures — Illustrations of the baseline RASP model without covariates (top left), the single-stage RASP model incorporating one covariate (top right), and the two-stage RASP model architecture that integrates an additional covariate for improved clustering. B: Clustering Performance with Covariate Adjustment — Adjusted Rand Index (ARI) comparisons across models without covariates, single-stage RASP, and two-stage RASP, plotted against varying kNN values. Columns represent different clustering algorithms, while rows show the impact of different covariates: local cell density (top), cell library size (middle), and cell volume (bottom).

Fig 4.

RASP performance across different scales in the Mouse Sagittal brain dataset.

A: Cell Type Annotations — Ground truth cell type labels (top) alongside predicted labels identified by RASP (bottom), demonstrating cell-level classification accuracy. B: Cell Type Clustering Accuracy — Adjusted Rand Index (ARI) values for all methods based on cell type annotations in A, with colors indicating clustering algorithms. Displays full ARI range at default RASP parameters (kNN = 2–20, ) and single-point ARI for default parameters (kNN = 10, ). RASP-moran and RASP-CHAOS indicate ARI when using label-agnostic metrics for parameter selection. C: Effect of β on Cell Type Clustering — ARI values for RASP with inverse distance weighting raised to (left), 1 (middle), and 2 (right), plotted against kNN values; colors denote clustering algorithm. D: Region Annotations — Ground truth spatial brain region labels and corresponding regions identified by RASP. E: Region-Level Clustering Accuracy — ARI quantification similar to B, but computed using region annotations from D. Shows full ARI range at default RASP parameters (kNN = 50–100, ) and single-point ARI performance (kNN = 50, ). RASP-moran and RASP-CHAOS indicate ARI when using label-agnostic metrics for parameter selection. F: Runtime Comparison — Computational runtime of all methods relative to randomized PCA, highlighting efficiency. H: Effect of β on Region-Level Clustering — ARI values computed on region annotations (D) with inverse distance weighting raised to (left), 1 (middle), and 2 (right), plotted against kNN values; colors indicate clustering algorithms.

Fig 5.

Human dorsolateral prefrontal cortex (DLPFC) spatial transcriptomics analysis, 10X Visium data.

A: Cortical Layer Identification — Ground truth cortical layer annotations (left) compared to spatial domains identified by RASP, PCA, and other competing methods, demonstrating spatial domain delineation accuracy. B: Runtime Comparison — Computational runtimes for all methods relative to randomized PCA, highlighting relative efficiency. C: Clustering Accuracy (ARI) Across Methods — Adjusted Rand Index (ARI) values for methods with colors indicating clustering algorithms. Shows the full range of ARI values at default RASP parameters (kNN = 3–10, ) alongside the single-point ARI performance at the default parameter (kNN = 5, ). RASP-moran and RASP-CHAOS values reflect ARI from label-agnostic metric-based parameter selection. D: Impact of β on Clustering Performance — ARI values for RASP with inverse distance weighting raised to (left), 1 (middle), and 2 (right), plotted against varying kNN values; colors denote clustering algorithms. E: Spatial Expression of TMSB10 — Normalized TMSB10 expression (left), reduced rank reconstructed expression (center), and spatially smoothed reconstructed expression (right), all plotted on tissue sections. White lines indicate the border of cortical layer 5.

Fig 6.

Spatial domain analysis of the mouse olfactory bulb using STOmics Stereo-seq data.

A: Laminar Structure Identification — Ground truth cortical laminar structure (top left) alongside spatial domains identified by RASP, PCA, and other methods, demonstrating domain detection accuracy. B: Runtime Performance — Comparison of computational runtimes for all methods relative to normal PCA, highlighting efficiency differences. C: Spatial Autocorrelation Metrics — Moran’s I (top) and CHAOS (bottom) scores for all methods, with colors indicating clustering algorithms. Default and best scores are reported for RASP with default parameters (kNN = 30–100, ). D: Effect of β on Spatial Metrics — Moran’s I (top) and CHAOS (bottom) scores for RASP as inverse distance weighting is raised to (left), 1 (middle), and 2 (right), plotted against varying kNN; colors indicate clustering algorithm. E: Spatial Expression of Doc2g — Normalized expression (left), reduced rank reconstructed expression (center), and spatially smoothed reduced rank reconstructed expression (right) of Doc2g plotted on tissue.

Fig 7.

Spatial domain analysis of human breast cancer tissue using 10x Xenium data.

A: Ground Truth and Predicted Spatial Domains — Ground truth spatial domain annotations (top left) compared to spatial domain predictions from RASP and normal PCA, illustrating method performance in domain delineation. B: Clustering Accuracy Across Methods — Adjusted Rand Index (ARI) values for all methods, with colors representing different clustering algorithms. Displays full range of ARI scores at default RASP parameters (kNN = 30–100, ), along with the single-point performance at default parameter values (kNN = 50, ). RASP-moran and RASP-CHAOS indicate ARI computed using label-agnostic metrics for parameter selection. C: Spatial Expression of FAM3B and CD52 — Normalized expression (left column), reduced rank reconstructed expression (center column), and spatially smoothed reduced rank reconstructed expression (right column) of FAM3B and CD52 plotted on tissue sections. White boundaries indicate DCIS (top row) and Immune cell compartments (bottom row). D: Influence of Smoothing Parameters on Clustering — ARI values for RASP plotted against smoothing distance (top) and kNN (right), with colors indicating different clustering algorithms.
