SpaMask: Dual masking graph autoencoder with contrastive learning for spatial transcriptomics

doi:10.1371/journal.pcbi.1012881

Fig 1.

Overview of SpaMask.

(A) SpaMask employs two distinct masking techniques to handle the gene expression matrix and spatial topology structure separately. (B) SpaMask integrates Masked Graph Autoencoders (MGAE) and Masked Graph Contrastive Learning (MGCL) modules. MGAE employs node masking to infer missing features based on spatial neighbor information. MGCL applies edge masking to create a contrastive loss framework that tightens embeddings of adjacent nodes based on spatial proximity and feature similarity. (C) The learned latent representations are applied to spatial clustering, trajectory inference, gene expression imputation, and other downstream analytical tasks.
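The two masking operations in panels (A) and (B) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mask_nodes` and `mask_edges`, the masking rates, the zero-fill for masked features, and the toy graph are all assumptions chosen for illustration.

```python
import numpy as np

def mask_nodes(X, mask_rate, rng):
    """Node masking: hide the features of a random subset of spots.

    Masked rows are zero-filled here (an assumption); a model could
    instead substitute a learnable mask token.
    """
    n = X.shape[0]
    n_mask = int(round(mask_rate * n))
    masked_idx = rng.choice(n, size=n_mask, replace=False)
    X_masked = X.copy()
    X_masked[masked_idx] = 0.0
    return X_masked, masked_idx

def mask_edges(edges, mask_rate, rng):
    """Edge masking: split an edge list into kept and removed edges."""
    m = edges.shape[0]
    n_mask = int(round(mask_rate * m))
    perm = rng.permutation(m)
    return edges[perm[n_mask:]], edges[perm[:n_mask]]

# Toy data: 6 spots with 4 genes each, connected in a chain.
rng = np.random.default_rng(0)
X = rng.random((6, 4))
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])

X_masked, masked_idx = mask_nodes(X, 0.5, rng)   # input to the MGAE-style channel
kept, removed = mask_edges(edges, 0.4, rng)      # input to the MGCL-style channel
```

In this sketch, a reconstruction loss would be computed only on the rows in `masked_idx`, while `kept` defines the perturbed graph on which a contrastive objective could be evaluated.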

Fig 2.

SpaMask enhances tissue structure identification in human DLPFC tissue.

(A) Boxplots of ARI (left), ACC (middle), and DIS (right) scores across all DLPFC slices. (B) Tissue image, manually annotated layer structures, and spatial domains detected by nine methods on slice 151507. (C) UMAP visualizations and PAGA graphs generated from the SpaMask, GraphST, STAGATE, SEDR, and DeepST embeddings, respectively, on slice 151507. (D) Manually annotated layer structures of slice 151673. (E) Spatial expression patterns of SVGs detected by SpaMask on slice 151673. (F) Spatial domains detected by SpaMask. (G) Spatial expression patterns of meta-genes detected by SpaMask.

Fig 3.

SpaMask enhances spatial patterns of layer-specific marker genes in the DLPFC dataset.

(A) Visualization of the raw and SpaMask-denoised spatial expression data for six layer-specific marker genes on slice 151674. (B) Nissl images sourced from the publicly available Allen Human Brain Atlas. (C) Violin plots showing the raw and denoised expression levels of layer-specific marker genes. Red boxes highlight the cortical layers corresponding to each layer-specific marker gene. (D) The Moran’s I and Geary’s C indices for the top fifty differentially expressed genes from SpaMask, STAGATE, and raw data.
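For readers unfamiliar with the two spatial autocorrelation indices in panel (D), the standard textbook definitions can be sketched as follows. The spatial weight matrix used in the paper is not given in this caption, so the binary chain adjacency below is an assumption, as are the function names.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I: values near +1 indicate that neighbors have similar values."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def gearys_c(x, W):
    """Geary's C: values below 1 indicate positive spatial autocorrelation."""
    z = x - x.mean()
    diff2 = (x[:, None] - x[None, :]) ** 2   # squared pairwise differences
    return ((len(x) - 1) / (2.0 * W.sum())) * (W * diff2).sum() / (z @ z)

# Toy example: four spots on a line, with expression low on the left
# and high on the right.
x = np.array([0.0, 0.0, 1.0, 1.0])
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

print(morans_i(x, W))  # 0.333...
print(gearys_c(x, W))  # 0.5
```

A spatially coherent gene therefore scores a higher Moran's I and a lower Geary's C than the same values scattered at random over the tissue.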

Fig 4.

SpaMask improves the identification of known tissue structures in human breast cancer and melanoma tissues.

(A) Manually annotated layer structures of human breast cancer tissue (left) alongside spatial domains detected by SpaMask, GraphST, STAGATE, and SEDR (right). (B) Bar chart of the clustering accuracy of various methods on breast cancer, measured using ARI, Accuracy, and Discreteness scores. (C) Heatmap of the expression, across structural domains, of the top three DEGs from domains 1, 2, 11, and 18. (D) and (E) show the volcano plots of DEGs between domain 18 (tumor edge) and domain 2 (healthy), along with the differential expression analysis of specific genes. (F) Raw and denoised spatial expression of the marker gene IGHG1. (G) Tissue image, manually annotated layer structures, and spatial domains detected by various methods on human melanoma tissue. (H) Top ten key GO:BP terms for cluster 1 (lymphoid, left) and cluster 3 (melanoma, right).

Fig 5.

SpaMask performs spatial domain identification across various SRT datasets generated on different platforms.

(A) and (B) show the true annotations and clustering results from different methods for mouse embryo and mouse olfactory bulb, respectively, generated using the Stereo-seq platform. (C) Visualization of spatial domains identified by SpaMask in the mouse olfactory bulb data. (D) Visualization of the mouse somatosensory cortex dataset generated by the osmFISH platform, along with the spatial domains identified by various methods. (E) and (F) are visualizations of two specific slices of the mouse hypothalamic preoptic area, located at Bregma –0.04 mm and –0.09 mm, respectively, generated by the MERFISH platform.

Fig 6.

SpaMask effectively alleviates batch effects across consecutive slices of human DLPFC tissue.

(A) Adapted from an open-source image available at Openclipart (https://openclipart.org/detail/38533/brain-side-cutaway). (B) Four consecutive slices (151673–151676) from Donor 3 with annotated cortical layers. (C) Box plots comparing SpaMask and various methods (SPIRAL, STitch3D, Splane, GraphST, STAligner, SEDR, and DeepST) across four metrics: ARI, ACC, DIS, and F1LISI. (D) Spatial domains identified by SpaMask, SPIRAL, STitch3D, and DeepST on Donor 3 slices, showing effectiveness in domain detection. (E) UMAP embeddings with color-coded batches (top) and cortical layers (bottom) for SpaMask and comparison methods, highlighting SpaMask’s ability to address batch effects while preserving domain structure across layers.

Fig 7.

Ablation studies of SpaMask components on human DLPFC Donor 3 data from 10x Visium.

(A) The box plot shows ARI, ACC, and DIS metrics, comparing SpaMask configurations to assess component contributions. (B) The spatial domain identification results for Donor 3 emphasize the cortical layer structures identified by SpaMask under different configurations, including dual-channel without node masking or without edge masking, MGAE Channel with/without Mask, MGCL Channel with Mask and Positive Selection with Remain.

Table 1.

Ablation studies on multiple datasets from various platforms, validating the contribution of each SpaMask component.
