Integrating multiple spatial transcriptomics data using community-enhanced graph contrastive learning

Wenqian Tu; Lihua Zhang

doi:10.1371/journal.pcbi.1012948

Abstract

Owing to the rapid development of spatial sequencing technologies, large numbers of spatial transcriptomic datasets have been generated across various technological platforms and biological conditions (e.g., control vs. treatment). Spatial transcriptomics data from different platforms usually have different resolutions. Moreover, current methods do not consider the heterogeneity of spatial structures within and across slices when modeling spatial transcriptomics data with graph-based methods. In this study, we propose Tacos, a community-enhanced graph contrastive learning-based method for integrating multiple spatial transcriptomics datasets. We applied Tacos to several real datasets from different platforms under different scenarios. Systematic benchmark analyses demonstrate Tacos’s superior performance in integrating different slices. Furthermore, Tacos can accurately denoise spatially resolved transcriptomics data.

Author summary

Integrating multiple spatial transcriptomics datasets can provide a more comprehensive understanding of the tissue environment. However, batch effects and different sequencing resolutions make the integrative analysis of various spatial datasets challenging. Current integration methods require spatial transcriptomics data with similar structures and similar resolutions, a requirement that real heterogeneous datasets may violate. We present Tacos, which adopts a community-enhanced contrastive graph neural network to model spatial transcriptomics data while accounting for heterogeneous structures. Tacos uses a triplet loss to facilitate the alignment of different slices by pulling mutual nearest neighbor pairs between spots from different slices close together and pushing randomly selected negative pairs apart. Applications to spatial transcriptomics data from various sequencing platforms demonstrate that Tacos not only efficiently removes batch effects but also preserves biological structures, especially for spatial transcriptomics data with different resolutions. Moreover, Tacos is able to denoise spatial transcriptomics data.

Citation: Tu W, Zhang L (2025) Integrating multiple spatial transcriptomics data using community-enhanced graph contrastive learning. PLoS Comput Biol 21(4): e1012948. https://doi.org/10.1371/journal.pcbi.1012948

Editor: Wei Li, Children's National Hospital, George Washington University, UNITED STATES OF AMERICA

Received: November 10, 2024; Accepted: March 10, 2025; Published: April 3, 2025

Copyright: © 2025 Tu, Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Source codes and tutorials have been deposited at the GitHub repository (https://github.com/0617spec/tacos; https://anonymous.4open.science/r/tacos-5EDE).

Funding: The study is supported by the National Natural Science Foundation of China (Grant Number: 62202343 to LZ) and the Key Technologies Research and Development Program (Grant Number: 2023YFF0725400 to LZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors declare that they have no competing interests.

Introduction

Spatial transcriptomics (ST) sequencing technologies provide both spatial location information and gene expression information, enabling comprehensive characterization of gene expression patterns in the context of the tissue microenvironment [1]. It is well known that biological functions are highly associated with spatial information. ST technologies have been widely adopted in neuroscience, plant biology and disease research [2,3]. An increasing number of spatial transcriptomic datasets have been generated with varying spatial resolutions. For example, 10x Visium achieves a spatial resolution of 55 μm [4]; Slide-seq, with a spatial resolution of 10 μm, approaches single-cell resolution [5,6]; and Stereo-seq and seqFISH achieve subcellular resolution [7,8]. However, these ST datasets often suffer from severe noise owing to the shallow sequencing depth of each spot and to other steps involved in preserving the spatial locations during sequencing.

Many data integration methods for single-cell transcriptomics data have been proposed [9] (e.g., Seurat [10], Harmony [11] and scMC [12]), but they cannot be applied directly to spatial transcriptomics data because they do not consider the spatial coordinates. Some computational methods have been proposed for processing and analyzing spatial transcriptomics data. For example, SpaGCN [13] combines gene expression and spatial coordinates with additional histological image information to detect spatial domains using a graph convolutional network. SpaceFlow [14] uses a spatially regularized graph contrastive neural network to integrate spatial information and gene expression data, generating low-dimensional embeddings. STAGATE [15] adopts a graph attention autoencoder framework to selectively integrate information from neighboring spots. GraphST [16] combines a graph contrastive neural network with self-supervised learning to fully leverage spatial information and gene expression profiles for various analytical tasks. These methods have been comprehensively benchmarked in a previous study [17]. However, SpaGCN, SpaceFlow and STAGATE are designed for a single slice and cannot be applied to integrating multiple spatial transcriptomics datasets.

Slices from different experimental conditions, technology platforms or section locations often differ at various levels. Several computational methods have been proposed to integrate multiple slices of spatial transcriptomics data. PASTE [18] applies an optimal transport algorithm to align different slices. GPSA [19] uses a deep Gaussian process to align the spatial coordinates of different slices. These methods perform well in integrating multiple slices at the same resolution. STAligner [20] and SLAT [21] integrate multiple slices based on graph neural networks. SPIRAL aligns multiple slices by combining a graph neural network with an optimal transport method [22]. However, these methods do not consider the different community densities of slices with different resolutions, which may limit their performance in integrating multiple slices from different platforms.

To fill this gap, we developed a multiple spatial Transcriptomics data integration method using community-enhanced graph contrastive learning (Tacos). We applied Tacos to several real datasets, including multiple human cortex slices from the same platform, two mouse olfactory bulb slices from different platforms, two mouse embryo slices with different spatial resolutions, and spatial transcriptomic data of healthy and Alzheimer's disease samples from different platforms. Systematic benchmark analyses against other methods show that Tacos is an efficient method for integrating multiple spatial transcriptomic datasets across different conditions.

Results

Overview of Tacos

Tacos takes both the spatial coordinates and the normalized gene expression profiles as inputs. As depicted in Fig 1, Tacos first constructs a spatial graph for each slice based on its spatial coordinates. Then a graph contrastive learning-based encoder is used to extract a spatially aware embedding of each slice. To account for heterogeneous spatial structures within each slice or across different slices, Tacos adopts communal attribute voting and communal edge dropping strategies to generate augmented graph views (Methods). Specifically, the communal attribute voting strategy identifies the node features that are more likely to be masked, and the communal edge dropping strategy computes the mask probabilities of the edges. With the community-enhanced contrastive learning-based encoder, Tacos obtains the spatially aware embeddings. Next, Tacos detects mutual nearest neighbor (MNN) pairs between spots from different slices based on the spatially aware embeddings to facilitate the alignment of different slices. Tacos treats the MNN pairs as positive pairs and randomly selected spots as negative points. Tacos then adopts a triplet loss that pulls the positive pairs close together and pushes the negative pairs apart to update the embeddings. Finally, downstream analyses can be performed on the embeddings, which are also used for denoising the spatial transcriptomics data (Methods: “Spatial Transcriptomics Data Denoising”).
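The MNN step can be illustrated with a small sketch. It is not the Tacos implementation: the function names, the brute-force search, and the toy embeddings are assumptions made for illustration, and a spot pair is kept only when each spot is among the other's k nearest neighbors in the other slice.

```python
import math


def nearest_neighbors(query, reference, k):
    """For each query vector, return the index set of its k nearest reference vectors."""
    result = []
    for q in query:
        order = sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(emb_a, emb_b, k=1):
    """Return (i, j) pairs where spot i of slice A and spot j of slice B
    are each among the other's k nearest neighbors in embedding space."""
    a_to_b = nearest_neighbors(emb_a, emb_b, k)
    b_to_a = nearest_neighbors(emb_b, emb_a, k)
    return sorted(
        (i, j) for i, nbrs in enumerate(a_to_b) for j in nbrs if i in b_to_a[j]
    )


# Toy embeddings (hypothetical values) for two slices.
emb_a = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
emb_b = [[0.1, 0.0], [5.0, 5.2], [9.0, 9.0]]
pairs = mutual_nearest_neighbors(emb_a, emb_b, k=1)
print(pairs)  # [(0, 0), (2, 1)]
```

Spot 1 of slice A is closest to spot 0 of slice B, but the reverse does not hold, so that pair is discarded. This mutual requirement is what makes MNN pairs usable as positive pairs.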

Fig 1. Overview of Tacos.

Here, we take two spatial transcriptomics datasets as an example. Tacos takes the two normalized gene expression matrices X₁, X₂ and their corresponding spatial coordinates Y₁, Y₂ as inputs. Tacos builds graphs G₁ and G₂ based on Y₁ and Y₂. Two augmented graph views are generated for each of G₁ and G₂. A two-layer GCN encoder is adopted to extract embeddings Z₁ and Z₂. To align the slices, MNN pairs are identified based on Z₁ and Z₂. Tacos adopts a triplet loss to pull the MNN pairs close together and push the negative pairs apart to align the embeddings, where a negative pair consists of one spot from an MNN pair and one randomly selected spot. The total loss of Tacos is composed of three constraints (a contrastive constraint, a within-slice constraint and an across-slice constraint). Red edges and red nodes are masked ones.

https://doi.org/10.1371/journal.pcbi.1012948.g001

Tacos achieves superior alignment performance on different slices from the same platform

We applied Tacos to the ST datasets of the dorsolateral prefrontal cortex (DLPFC) from 10x Visium. First, we focused on adjacent slices (slice numbers 151674 and 151675) from the same donor. We compared Tacos with Scanpy [23], Harmony [11], SLAT [21], SPIRAL [22] and STAligner [20]. Tacos, SLAT, Harmony, SPIRAL and STAligner removed the batch effect, as visualized in the UMAP [24] space (Fig 2A). Then we inferred the developmental trajectory using PAGA [25] on the integrated embeddings. PAGA estimates the connectivity of manifold partitions, providing an interpretable, graph-like map of the embedding manifold. A good integrated embedding should preserve the linear structure of the different layers. The PAGA graphs from STAligner and Tacos clearly depicted a linear developmental trajectory, whereas Scanpy, Harmony, SPIRAL and SLAT were unable to capture such a pattern.

Fig 2. Benchmarking Tacos with other methods on DLPFC datasets from 10X Visium.

(A) UMAP and PAGA visualization of aligned space of Scanpy, Harmony, SLAT, SPIRAL, STAligner and Tacos on slices 151674 and 151675. Spots are colored by slice numbers and annotated layers respectively. In a PAGA graph, thicker edges indicate stronger connections. (B) Bar plots of different metric scores of aligned performances of these methods on slices of 151674 and 151675. Batch Entropy Score, Graph connectivity, bASW and bLISI are metrics used for evaluating batch correction, while cASW and cLISI are used for evaluating biological conservation. (C) Visualization of the reconstructed marker gene (VIM for Layer1, PCP4 for Layer5) for slices 151507, 151669 and 151673, respectively.

https://doi.org/10.1371/journal.pcbi.1012948.g002

Then we applied Tacos to integrate non-adjacent slices (slice numbers 151508 and 151675), which were derived from different donors. This setup introduced greater variability between the slices, making alignment more difficult. The trajectory inferred from the integrated embeddings of Tacos remained linear (Fig Aa in S1 Text). In comparison, STAligner was slightly less effective in maintaining the integrity of the hierarchical structure. Harmony, SLAT and SPIRAL exhibited more misalignments, likely exacerbated by the inherent donor-specific differences between the slices. These misalignments disrupted the continuity of key structures, resulting in a less reliable representation of the underlying biological architecture. We employed batch effect removal metrics (Methods; Batch Entropy Score, Graph connectivity, bASW, bLISI) and label preservation metrics (cASW, cLISI) to compare these methods quantitatively. Tacos consistently ranked among the top methods in capturing the true layers (Figs 2B, Ab in S1 Text). SLAT and SPIRAL demonstrated the best alignment performance in terms of batch entropy score. However, they had lower scores in preserving the annotated layers.

We compared the community-enhanced graph contrastive learning with a graph convolutional network (GCN) encoder and with graph contrastive learning (GraphCL) on slices 151508 and 151675 of the DLPFC data. Specifically, we replaced the community-enhanced graph contrastive learning with GCN and GraphCL, while keeping the other components of Tacos unchanged. The community-enhanced graph contrastive learning approach consistently exhibited the best overall performance (Table A in S1 Text). We also applied PAGA to the embeddings of each method and found that the community-enhanced graph contrastive learning method captured the linear structure between different layers more accurately than GCN and GraphCL (Fig B in S1 Text).

Finally, we investigated the effectiveness of Tacos in data denoising using one slice from each donor (slice numbers 151507, 151669 and 151673). Specifically, we focused on the marker genes associated with distinct brain layers, such as VIM for Layer 1 and PCP4 for Layer 5. Tacos accurately recovered the marker expression levels (Fig 2C). For example, the recovered spatial pattern of VIM was highly consistent with that of Layer 1 in slice 151673. Moreover, Tacos is able to integrate multiple slices at the same time. Specifically, we applied Tacos to integrate the four adjacent slices of each donor in the DLPFC data. We found that Tacos could also accurately detect the linear structure of adjacent layers (Fig C in S1 Text).

Tacos identifies slice-specific structures with clear boundaries across different platforms

We employed Tacos to analyze spatial transcriptomics data of the mouse olfactory bulb collected with Slide-seqV2 [5] and Stereo-seq [7], which had different sequencing resolutions (Table 1). We compared the integration results of Tacos with those of Scanpy, Harmony, SLAT, SPIRAL and STAligner. Specifically, we identified clusters by applying the Louvain [26,27] algorithm to the PCA space of the normalized data and to the low-dimensional embeddings derived from Harmony, SLAT, SPIRAL, STAligner and Tacos, respectively. Then we annotated each cluster according to the laminar structure in the DAPI-stained image [28] and the Allen Mouse Brain Atlas annotation. Tacos notably detected clear layer boundaries within the mouse olfactory bulb (MOB) tissue (Fig 3A). These two slices had strong batch effects (Fig 3B). Tacos aligned clusters with the same structures together while preserving the uniqueness of Slide-seqV2-specific structures. Specifically, the accessory olfactory bulb (AOB) and the granular layer of the accessory olfactory bulb (AOBgr) were Slide-seqV2-specific clusters, which was consistent with the spatial expression patterns of their corresponding markers (Fxyd6 and Atp2b4). Tacos thus preserved slice-specific structures while effectively integrating the shared laminar architecture, avoiding the risk of overcorrecting for batch effects.

Table 1. Summary of the spatial datasets.

https://doi.org/10.1371/journal.pcbi.1012948.t001

Fig 3. Benchmarking Tacos with other methods on the MOB dataset.

(A) Visualization of spatial domain detection results of Tacos, STAligner, SPIRAL, SLAT, Harmony and Scanpy. (B) UMAP visualization of Tacos aligned space colored by louvain clusters and platforms. The expression levels of Atp2b4 and Fxyd6 are also shown. (C) Spatial visualization of spatial domains identified by Tacos and corresponding marker genes from Slide-seq slice (top) and Stereo-seq slice (bottom). The layers progressed from the outer to the inner layers, including the olfactory nerve layer (ONL), glomerular layer (GL), external plexiform layer (EPL), mitral cell layer (MCL), granule cell layer (GCL), rostral migratory stream (RMS), granular layer of the accessory olfactory bulb and accessory olfactory bulb.

https://doi.org/10.1371/journal.pcbi.1012948.g003

To further investigate the shared clusters, we compared the spatial pattern of each cluster and its corresponding denoised marker gene. The shared clusters were spatially ordered from the outer to the inner layers of the MOB, including olfactory nerve layer (ONL), glomerular layer (GL), mitral cell layer (MCL), granule cell layer (GCL) and rostral migratory stream (RMS) (Fig 3C). Tacos accurately detected the RMS structure, a crucial structure in the mouse olfactory bulb slices. The boundaries of adjacent layers detected by STAligner and SPIRAL were not clear. Scanpy and Harmony failed to detect the structures, while SLAT only revealed the boundaries of major clusters.

Tacos accurately maps tissues during different developmental stages of mouse embryo across different platforms

We employed Tacos to analyze spatial transcriptomic data of mouse embryos at E8.5 and E9.5 from seqFISH [29] and Stereo-seq [7], respectively. These two slices had different resolutions (Table 1). In addition to the technical variations, the two slices also came from different developmental stages. We annotated each slice with the labels from previous studies [7,29] (Fig 4A). Then we compared the performance of Tacos with that of Harmony, SLAT, SPIRAL and STAligner in removing batch effects and preserving cell types. SPIRAL, SLAT and Tacos successfully aligned the slices despite their substantial technical differences, while Harmony and STAligner failed to remove the technological discrepancies (Fig 4B). SPIRAL, SLAT and Tacos had higher scores on the batch effect removal metrics. However, SLAT and SPIRAL had lower scores than Tacos on the cell type preservation metrics (Fig 4C). Tacos had the largest cASW score, suggesting that Tacos outperformed the other methods in preserving data structures.

Fig 4. The performance of Tacos on datasets of mouse embryo from seqFISH and Stereo-seq.

(A) Spatial visualization of mouse embryo slices from seqFISH and Stereo-seq with the annotated labels. (B) UMAP visualization of the aligned embeddings obtained by Scanpy, Harmony, SLAT, STAligner, SPIRAL and Tacos. (C) Bar plots of the metric scores of these methods in aligning the mouse embryo slices. (D) UMAP visualization of the aligned slices. Spots with the same labels are shown in distinct colors, while spots with different labels are shown in grey. (E) Sankey plot of the connections between seqFISH and Stereo-seq in the low-dimensional embedding of Tacos.

https://doi.org/10.1371/journal.pcbi.1012948.g004

Next, we investigated the aligned structures in the integrated space of Tacos. Specifically, we examined pairs of spots with the same cell types across slices (Fig 4D). Tacos accurately aligned the neural crest from seqFISH and Stereo-seq. The neural crest displayed a dispersed pattern due to its complex migratory behavior and its role in differentiating into various cell types across multiple regions during embryonic development [30]. Tacos aligned Cardiomyocytes (seqFISH) and Heart (Stereo-seq) spots together. Moreover, Tacos accurately aligned the brain regions from Stereo-seq with the Forebrain/Midbrain/Hindbrain regions from seqFISH. Spots corresponding to the Aorta–gonad–mesonephros (AGM) region from Stereo-seq were located close to the Splanchnic mesoderm and Intermediate mesoderm from seqFISH. The AGM is a critical site for hematopoiesis in mammals [31], and the mesonephros originates from the intermediate mesoderm [32]. Additionally, cells from the Splanchnic mesoderm serve as vasculogenic precursors, contributing to the formation of blood vessels and cells [33].

To further explore the biological relationships between tissues, we used a Sankey diagram based on Euclidean distances. Beyond the aforementioned pairs, we discovered significant connections between the Neural crest (Stereo-seq) and Cranial mesoderm (seqFISH), a relationship supported by previous studies, which highlighted the crucial interactions between these two regions during craniofacial development, facilitating coordination and regulatory processes [34]. Another strong association was observed between the Liver (Stereo-seq) and the Gut tube (seqFISH), which reflected their developmental connection, as the foregut, part of the gut tube, gave rise to the liver during organogenesis [35] (Fig 4E). Overall, Tacos accurately mapped tissues across different developmental stages, providing a comprehensive view of embryonic development trajectories.

Tacos accurately detects different sophisticated structures between healthy and Alzheimer disease slices

Next, we applied Tacos to the spatial transcriptomics data of a healthy mouse hippocampal sample [5] and an Alzheimer’s disease (AD) sample [36]. Both datasets were obtained with the Slide-seqV2 platform. We identified spatial domains by applying the Louvain algorithm to the embeddings of Tacos and compared the results with those of Harmony, SLAT, SPIRAL and STAligner. Tacos and STAligner effectively delineated distinct anatomical structures within the hippocampus. In contrast, Harmony, SLAT and SPIRAL struggled to produce clear boundaries between spatial domains, which made it challenging to identify some of the more subtle structural variations (Fig 5A).

Fig 5. The performance of Tacos on healthy hippocampus slice and Alzheimer hippocampus slice.

(A) Spatial visualization of domains detected by Scanpy, Harmony, SLAT, SPIRAL, STAligner and Tacos. (B) Spatial visualization of domains corresponding to CA1, CA2, CA3 and DG from Tacos and STAligner. (C) Spatial visualization of marker genes of spatial domains CA1, CA2, CA3 and DG. (D) Spatial visualization of disease-related genes in the two slices, including C1qa, C1qb, C1qc and Csf1r.

https://doi.org/10.1371/journal.pcbi.1012948.g005

Precise boundary detection is crucial for identifying pathologically relevant regions. We found that Tacos was able to capture the well-characterized hippocampal formations, such as the cord-like Cornu Ammonis (CA1, CA2, and CA3) and the arrow-like dentate gyrus (DG), structures that are prominently recognizable and essential for hippocampal function (Fig 5B). In contrast, STAligner could only discern the CA1 and CA3 regions and failed to detect the relatively small CA2 region, a transitional area that bridges CA1 and CA3. A previous study showed that CA2 has distinct characteristics and treated it as an independent domain [37]. Moreover, Tacos efficiently denoised the spatial transcriptomic data, and the recovered spatial patterns of marker genes were consistent with the spatial domains (Fig 5C). Aβ plaques, composed of aggregated Aβ protein polymers, are a hallmark of Alzheimer’s disease and serve as one of the key pathological indicators [38]. These plaques are highly localized within the brain and are thought to contribute to neurodegeneration by disrupting neural function. We observed a strong spatial association between the early upregulation of C1q genes (C1qa, C1qb, C1qc), which have been implicated in the formation of Aβ plaques [36], and their high expression specifically in regions dense with Aβ deposits (Fig 5D). Moreover, we detected a significant spatial correlation between the expression of Csf1r, a key growth factor receptor involved in microglial function, and Aβ plaque regions [36]. Microglia, the brain’s resident immune cells, are thought to play a critical role in the progression of AD, particularly in the clearance of Aβ deposits [36,39].

Finally, we applied Tacos to analyze the human healthy and AD slices coming from Xenium [40]. These two slices had large batch effects and Tacos aligned the healthy and Alzheimer’s disease slices (Fig 6A). We further compared the domains 1, 3, and 6, which had regular spatial patterns in both healthy and Alzheimer’s disease slices (Fig 6B). Interestingly, in the Alzheimer’s disease slice, we observed that the expression level of CHODL was relatively higher in domain 1. CHODL has been implicated in influencing cell survival and neuron growth in animal models, suggesting that its elevated expression in this domain may reflect an adaptive [41] or pathological response in the Alzheimer’s brain.

Fig 6. The performance of Tacos on datasets of Xenium (Healthy and Alzheimer disease human brain).

(A) UMAP plot of the raw data (left) and the aligned low-dimensional embedding of Tacos (right). (B) Spatial visualization of domains detected on the healthy slice (left) and the Alzheimer's disease slice (right). (C) The spatial distribution of CHODL denoised by Tacos. (D) Violin plot of CHODL in cluster 1 across the healthy and Alzheimer's disease slices.

https://doi.org/10.1371/journal.pcbi.1012948.g006

Discussion

Aligning spatial transcriptomics data from different platforms or experiments is a major challenge. In this study, we present Tacos, a graph contrastive learning-based method for integrating spatial transcriptomic data from different conditions. Tacos considers the heterogeneity of spatial domains and adopts a community-enhanced strategy to build the augmented graph views used for learning embeddings of spatial transcriptomics data. We applied Tacos to several spatial transcriptomics datasets from different platforms, with resolutions varying from subcellular to multicellular levels. Comprehensive benchmark analyses against other methods show its superior performance. Compared with other alignment methods such as SLAT and STAligner, Tacos exhibits strong robustness in aligning multiple spatial transcriptomic datasets, especially for data from different sequencing platforms.

Although Tacos achieves good performance in integrating spatial transcriptomics data, its performance could be further improved as the number of annotated slices increases. Tacos is an unsupervised method and does not take advantage of annotation information. A semi-supervised method will be needed in the future. Moreover, the current framework of Tacos focuses on transcriptomics information and spatial coordinates without considering histological images. Future work is expected to incorporate such information to facilitate a more comprehensive understanding of spatial heterogeneity. Large memory utilization is a critical issue for graph-based methods. To reduce memory usage, we have optimized our code by modifying the training process. Instead of feeding all slices into the model simultaneously, we process them individually during training, which can decrease memory requirements. Additionally, a lightweight graph convolutional approach for the community-enhanced graph learning step will be explored in the future.

Methods

Datasets and data preprocessing

In this study, we applied Tacos to six ST datasets from different platforms, including 10x Visium, Stereo-seq, Slide-seqV2, seqFISH, Xenium and MERFISH (Table 1). The dorsolateral prefrontal cortex (DLPFC) dataset was from 3 donors, and each donor had 4 slices (labeled as 151507–151510, 151669–151672 and 151673–151676). Each slice was manually annotated into 4–6 cortical layers and white matter [7]. The mouse olfactory bulb dataset included two slices obtained from Slide-seqV2 [5] and Stereo-seq [7]. The mouse embryo dataset consisted of two slices, at somite stage E8.5 and stage E9.5, obtained from seqFISH [42] and Stereo-seq [7]. The mouse hippocampus dataset contained one healthy slice and one Alzheimer's disease slice from the Slide-seqV2 platform. The high-resolution hippocampus dataset from Xenium [40] contained 2 slices.

We used the Scanpy [23] package to preprocess these data. Specifically, the gene expression values in each spot were normalized by dividing by the total UMI count across all genes in that spot and multiplying by 10,000. The data were then transformed to a natural log scale with a pseudocount of 1. We used all genes if the spatial transcriptomics data had fewer than 3000 genes. Otherwise, we selected the top 3000 highly variable genes using the function sc.pp.highly_variable_genes with flavor="seurat_v3" in the Scanpy package. Finally, the genes in the intersection of the highly variable gene sets of all slices were used.
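The arithmetic of the normalization step can be sketched in plain Python. This is only an illustration of the described transformation, not the authors' pipeline; the helper names and toy counts are assumptions. In Scanpy the same steps would typically be written with sc.pp.normalize_total (with target_sum=1e4) followed by sc.pp.log1p.

```python
import math


def normalize_log1p(counts, target_sum=10_000):
    """Scale each spot (row) to target_sum total counts, then apply ln(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: spots without any counts are left at zero.
            normalized.append([0.0] * len(row))
            continue
        normalized.append([math.log1p(v / total * target_sum) for v in row])
    return normalized


def shared_genes(gene_lists):
    """Genes present in the selected gene set of every slice."""
    shared = set(gene_lists[0])
    for genes in gene_lists[1:]:
        shared &= set(genes)
    return sorted(shared)


# Toy UMI counts (hypothetical): 3 spots x 3 genes.
counts = [[1, 3, 0], [0, 0, 0], [5, 5, 10]]
expr = normalize_log1p(counts)
common = shared_genes([["Gad1", "Pcp4", "Vim"], ["Pcp4", "Vim", "Mbp"]])
```

After this transformation every non-empty spot has the same total on the original scale, so differences in sequencing depth between spots no longer dominate the expression values.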

Building spatial graph for spatial transcriptomic data

We constructed a spatial graph G_t for slice t using the alpha complex [43] based on the spatial coordinates Y_t. Spots were the vertices, and the normalized expression matrix X_t was treated as the node features. Then we built a geometry-aware spatial proximity adjacency matrix for the graph G_t. Specifically, we computed a unique site V(s) for each spot or cell s using V(s) = {x : ‖x − s‖ ≤ ‖x − v‖, ∀v ∈ C}, where C was the set of coordinates of all spots. That is, V(s) is the set of all points x that are at least as close to s as to any other spot v. Next, we identified the neighborhood edges E by connecting spots i and j as follows:

(1)

where is a local neighborhood around s with a radius r. The radius r was estimated by the mean distance of k nearest neighbors of the spot based on spatial coordinates. The alpha complex-based method uses geometric structures to define neighborhoods, which makes it more suitable for spatial data with non-uniform densities.
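The radius estimate described above can be sketched as follows. The sketch covers only the radius estimation and a plain distance threshold; it omits the Voronoi-cell condition of the alpha complex, so it is a simplification and not the graph construction used by Tacos. It also assumes a single global radius obtained by averaging over all spots, which is one possible reading of the text. Function names are hypothetical.

```python
import math


def estimate_radius(coords, k):
    """Mean distance from a spot to its k nearest neighbors, averaged over all spots."""
    per_spot = []
    for i, p in enumerate(coords):
        dists = sorted(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        per_spot.append(sum(dists[:k]) / k)
    return sum(per_spot) / len(per_spot)


def radius_edges(coords, r):
    """Undirected edges (i, j), i < j, between spots at most r apart."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= r:
                edges.append((i, j))
    return edges


# Four spots on a unit square (hypothetical coordinates).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
r = estimate_radius(coords, k=2)
edges = radius_edges(coords, r)
```

On the unit square each spot has two neighbors at distance 1, so the estimated radius is 1 and only the four sides are connected, while the two diagonals are left out.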

Extracting low-dimensional embeddings with community-enhanced encoder

We used a contrastive learning strategy to capture local information on the spatial neighbor graph G_t. In the first stage, we detected the putative communities using Leiden [44] on the normalized gene expression or on the embedding obtained by the following strategy. Specifically, we generated two augmented graph views by randomly perturbing the graph G_t. Then we applied a two-layer Graph Convolutional Network (GCN) to the two views to extract their low-dimensional embeddings. For gene expression data with n genes, the first layer transforms the input feature dimension from n to 2h, followed by a Parametric Rectified Linear Unit (PReLU) activation, while the second layer reduces the feature dimension to h (default: 50). We used the InfoNCE loss [45] to train the GCN as follows:

(2)

where and τ is the temperature of similarity. The tissue usually had complex spatial structures and the slices coming from different platforms might be in different resolutions, leading to the heterogenous communities within the spatial graphs. Therefore, we extracted the low-dimensional embeddings of slices by a community-enhanced encoder in the second stage [46]. The number of edges in community c was represented by , and the number of edges in the whole graph was represented by . We defined the community strength of c by as follows:

(3)

where was node degree. The community strength matrix was represented by S. Two strategies (communal attribute voting and communal edge dropping) were used to generate augmented graph views [46]. The dimensional attributes with higher scores were more likely to be masked. The probability of attribute masking was computed as: where was one-dimensional normalization operation and was community indicator. Then the masking matrices were generated from Bernoulli distributions. Afterward, the masking matrices were multiplied by the data matrices, achieving the purpose of masking a portion of the attribute as follows:

(4)

(5)

(6)

(7)

where ∘ was Hadamard product, and were two hyperparameters.
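The masking step can be illustrated with a short sketch. It assumes one masking probability per feature dimension, shared by all spots, and takes these probabilities as given rather than deriving them from the community strength. The names and values are hypothetical, and the sketch is not the Tacos implementation.

```python
import random


def sample_mask(mask_probs, rng):
    """Draw one Bernoulli indicator per feature dimension: 0 = masked, 1 = kept."""
    return [0 if rng.random() < p else 1 for p in mask_probs]


def apply_mask(features, mask):
    """Hadamard (element-wise) product of every spot's feature row with the mask."""
    return [[x * m for x, m in zip(row, mask)] for row in features]


rng = random.Random(0)
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 spots x 3 genes
mask_probs = [0.0, 1.0, 0.5]  # assumed per-gene masking probabilities
mask = sample_mask(mask_probs, rng)
masked = apply_mask(features, mask)
```

A gene with probability 0 is always kept and a gene with probability 1 is always zeroed out. Drawing two independent masks from the same probabilities yields the two augmented feature matrices.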

For each edge , the probability of masking was computed by:

(8)

(9)

where meant was an intra-community edge. The edge masking results were:

(10)

(11)

where and were two hyperparameters. To mask the graph edge, we manipulated the adjacency matrix of the graph:

(12)

(13)
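Masking edges through the adjacency matrix, as in Eqs (12) and (13), amounts to zeroing sampled entries symmetrically. A minimal sketch, assuming each undirected edge already has a drop probability. How those probabilities are derived from community strength in Eqs (8) to (11) is not reproduced here, and `drop_edges`, its arguments, and the probability values are hypothetical.

```python
import random

def drop_edges(adj, drop_prob, seed=0):
    """Return a copy of a symmetric 0/1 adjacency matrix with edges randomly removed.

    adj: list of lists (n x n), symmetric, zero diagonal
    drop_prob: function (i, j) -> probability of removing edge {i, j}
    """
    rng = random.Random(seed)
    n = len(adj)
    out = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):          # visit each undirected edge once
            if adj[i][j] and rng.random() < drop_prob(i, j):
                out[i][j] = 0              # remove both directions
                out[j][i] = 0
    return out

adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
community = [0, 0, 0, 1]
# Illustrative values only: different probabilities for intra- and inter-community edges.
view = drop_edges(adj, lambda i, j: 0.2 if community[i] == community[j] else 0.8)
```

Calling the function twice with different seeds gives two differently perturbed views of the same graph.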

With these attribute and edge masking steps, we obtained two augmented graph views that carry community strength information. We then employed a two-layer GCN to learn the low-dimensional embeddings of these two graph views by:

(14)

(15)

We used the InfoNCE loss to train the GCN as follows:

(16)

(17)

where the weighting term is a dynamic balancing coefficient that gradually increases during training. Finally, the embedding Z_t was obtained by applying the GCN trained in the second stage to graph G_t.
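The InfoNCE objective used in both stages can be sketched as follows. This is a simplified version under stated assumptions: similarity is cosine similarity, negatives are taken only from the other view, and the temperature value `tau=0.5` is arbitrary. Eqs (2) and (16) may include further terms, and the name `info_nce` is hypothetical.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """Symmetric InfoNCE between two views; row i of z1 and z2 is the same spot."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # cosine similarity / temperature

    def one_direction(l):
        l = l - l.max(axis=1, keepdims=True)     # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))       # positives sit on the diagonal

    return 0.5 * (one_direction(logits) + one_direction(logits.T))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # views agree
shuffled = info_nce(z, z[::-1])                             # views mismatched
```

The loss is lower when the two views of each spot agree than when rows are mismatched, which is what drives the encoder toward view-invariant embeddings.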

Spatial similarity constraint within each slice

To capture the similarity between spatially neighboring spots within each slice, we added a spatial similarity constraint [14] on the low-dimensional embedding Z_t of the t-th slice as follows:

(18)

where the two distance terms are the spatial Euclidean distance between spots i and j and the Euclidean distance between the low-dimensional embeddings of spots i and j, respectively.

Aligned constraint across different slices

We adopted triplet learning to align different slices; this approach has shown superior performance in removing batch effects when integrating scRNA-seq and spatial transcriptomics data [20,47]. We identified mutual nearest neighbor (MNN) pairs on the low-dimensional embeddings of these slices and treated each MNN pair as an anchor point and a positive point. Negative points were randomly sampled. The aligned constraint minimizes the distance between positive pairs and maximizes the distance between negative pairs as follows:

(19)

where each triplet consists of an anchor point, a positive point, and a negative point; the loss is defined over the set of triplets and the number of triplets in that set; and θ (default: 1.0) is the margin parameter.
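The two steps above, finding MNN pairs and scoring triplets with a margin, can be sketched as follows. This is a minimal illustration, assuming Euclidean distances, a standard hinge-style triplet loss averaged over triplets, and negatives drawn at random from the anchor's own slice (the text does not specify the sampling pool). The names `mnn_pairs` and `triplet_loss`, the neighbor count `k`, and the toy data are hypothetical.

```python
import numpy as np

def mnn_pairs(za, zb, k=2):
    """Pairs (i, j) where za[i] and zb[j] are in each other's k nearest neighbors."""
    d = np.linalg.norm(za[:, None, :] - zb[None, :, :], axis=2)
    nn_ab = np.argsort(d, axis=1)[:, :k]      # neighbors in b for each spot in a
    nn_ba = np.argsort(d, axis=0)[:k, :].T    # neighbors in a for each spot in b
    return [(i, int(j)) for i in range(len(za)) for j in nn_ab[i] if i in nn_ba[j]]

def triplet_loss(anchor, positive, negative, theta=1.0):
    """Mean hinge loss: pull anchor-positive together, push anchor-negative apart."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + theta, 0.0)))

rng = np.random.default_rng(0)
slice_a = rng.normal(size=(20, 5))
slice_b = slice_a + 0.05 * rng.normal(size=(20, 5))   # toy second slice
pairs = mnn_pairs(slice_a, slice_b)
a_idx = np.array([i for i, _ in pairs])
b_idx = np.array([j for _, j in pairs])
# Assumption: negatives are sampled uniformly from the anchor's slice.
neg_idx = rng.integers(0, len(slice_a), size=len(pairs))
loss = triplet_loss(slice_a[a_idx], slice_b[b_idx], slice_a[neg_idx])
```

A triplet contributes zero loss once the negative is at least θ farther from the anchor than the positive, so only poorly aligned triplets drive the update.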

Finally, the total loss function is:

(20)

where α and β are hyperparameters that control the relative weighting of the different components of the loss. We trained the model with the Adam optimizer in PyTorch, using a learning rate of 0.001 for 2000 epochs on an NVIDIA RTX 3090 GPU. The computational efficiency of Tacos on each dataset is shown in Table B in S1 Text.

Identification of spatial domains

We identified spatial domains by applying mclust or the graph-based community detection algorithms Louvain or Leiden to the embeddings obtained from Tacos and other methods. Specifically, we recommend mclust for data from the 10x Visium platform and Leiden for other platforms.

Spatial transcriptomics data denoising

The learned embeddings preserve the local context of each spot and therefore have potential for denoising spatial transcriptomics data. Specifically, we fed the embeddings into an additional two-layer perceptron whose structure mirrors that of the GCN encoder and which maps the low-dimensional embeddings back to the original feature space. The corresponding loss function is:

(21)

where the reconstruction target is the normalized gene expression matrix of slice t.

Evaluation metrics

We used the following metrics to evaluate the performance of integrating spatial domains of different slices.

Batch entropy score

The batch entropy score [48] quantifies the degree of alignment of different slices and is calculated as:

(22)

where the two quantities are the proportion of spots from the i-th slice in a given region and the total number of spots in the i-th slice. A higher batch entropy score indicates better alignment of different slices.
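A per-region entropy of this kind can be sketched as follows. This is an assumed form: the slice proportions in a region are corrected for slice size, renormalized, and passed through a Shannon entropy so that higher values mean better mixing. The sign convention, normalization, and averaging over regions in Eq (22) may differ, and `regional_batch_entropy` is a hypothetical name.

```python
import math

def regional_batch_entropy(region_counts, slice_sizes):
    """Entropy of slice proportions in one region, corrected for slice size.

    region_counts[i]: number of spots from slice i that fall in the region
    slice_sizes[i]:   total number of spots in slice i
    """
    total = sum(region_counts)
    p = [c / total for c in region_counts]                # proportion per slice
    w = [p_i / n_i for p_i, n_i in zip(p, slice_sizes)]   # correct for slice size
    x = [w_i / sum(w) for w_i in w]                       # renormalize to sum to 1
    return -sum(x_i * math.log(x_i) for x_i in x if x_i > 0)

mixed = regional_batch_entropy([5, 5], [100, 100])        # evenly mixed region
single = regional_batch_entropy([10, 0], [100, 100])      # one slice only
corrected = regional_batch_entropy([10, 20], [100, 200])  # mixed after size correction
```

With B slices the value ranges from 0 (a region containing a single slice) to log B (perfect mixing after size correction).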

Label ASW and batch ASW score

Average silhouette width (ASW) [49] quantifies the degree of separation among spots with different labels. A higher label ASW score indicates better performance. The label ASW score is calculated from the silhouette width of each spot, where b represents the average distance from spot i to other spots in the same cluster and a is the average distance from spot i to other spots in different clusters. To ensure that higher scores indicate better slice mixing, the batch ASW score (bASW) is rescaled by subtracting it from 1. The label ASW score (cASW) and bASW are calculated on the PCA embedding.

cLISI and bLISI

The local inverse Simpson’s index (LISI) [11] reports the effective number of labels or slices in the local neighborhood of each spot, and is used to assess the separation of labels (cLISI) and the mixing of slices (bLISI). Supposing there are B slices, we rescaled cLISI and bLISI to the range 0–1 as a function of the LISI score x.
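The quantity underlying LISI is the inverse Simpson's index of the labels in a neighborhood. The sketch below computes it for a single neighborhood with equal weights. This is a simplification, since LISI as defined in [11] weights neighbors by distance, and the rescaling to 0–1 is not reproduced here. The name `inverse_simpson` is hypothetical.

```python
def inverse_simpson(labels):
    """Effective number of distinct labels in a neighborhood: 1 / sum(p_k^2)."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 / sum((c / n) ** 2 for c in counts.values())

pure = inverse_simpson(["slice1"] * 4)               # one slice only
mixed = inverse_simpson(["slice1", "slice2"] * 3)    # two slices, evenly mixed
```

Applied to slice labels, a value near B indicates that all B slices are evenly represented around a spot; applied to domain labels, a value near 1 indicates a pure neighborhood.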

Graph connectivity [50]

The graph connectivity (GC) metric measures whether the kNN graph representation of the integrated data connects cells with the same label. For the kNN subgraph that contains only cells with label c, GC is calculated as:

(23)

where the largest-connected-component term is the number of nodes in the largest connected component of the subgraph.
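A minimal sketch of this metric, assuming the commonly used form from [50]: for each label, take the size of the largest connected component of the label-restricted kNN subgraph, divide by the number of cells with that label, and average over labels. Treating the kNN graph as undirected is an assumption, and `graph_connectivity` and the toy graph are hypothetical.

```python
from collections import deque

def graph_connectivity(neighbors, labels):
    """Mean over labels of |largest connected component| / |nodes with that label|.

    neighbors: dict node -> iterable of neighboring nodes (treated as undirected)
    labels:    dict node -> label
    """
    undirected = {v: set() for v in labels}
    for u, nbrs in neighbors.items():
        for v in nbrs:
            undirected[u].add(v)
            undirected[v].add(u)
    scores = []
    for lab in set(labels.values()):
        nodes = {v for v in labels if labels[v] == lab}
        seen, largest = set(), 0
        for start in nodes:
            if start in seen:
                continue
            seen.add(start)
            queue, size = deque([start]), 0
            while queue:                      # breadth-first search within the label
                u = queue.popleft()
                size += 1
                for v in undirected[u]:
                    if v in nodes and v not in seen:
                        seen.add(v)
                        queue.append(v)
            largest = max(largest, size)
        scores.append(largest / len(nodes))
    return sum(scores) / len(scores)

neighbors = {0: [1], 1: [2], 2: [3], 3: [4], 4: [], 5: []}
labels = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
gc = graph_connectivity(neighbors, labels)
```

In the toy graph, label "x" forms one connected component (score 1) while label "y" has an isolated node (score 2/3), so the overall score is 5/6.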

Supporting information

S1 Text. Supplementary materials.

https://doi.org/10.1371/journal.pcbi.1012948.s001

(PDF)

References

1. Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353(6294):78–82. pmid:27365449
2. Rao A, Barkley D, França GS, Yanai I. Exploring tissue architecture using spatial transcriptomics. Nature. 2021;596(7871):211–20. pmid:34381231
3. Chen W-T, Lu A, Craessaerts K, Pavie B, Sala Frigerio C, Corthout N, et al. Spatial transcriptomics and in situ sequencing to study Alzheimer’s disease. Cell. 2020;182(4):976-991.e19. pmid:32702314
4. Moses L, Pachter L. Museum of spatial transcriptomics. Nat Methods. 2022;19(5):534–46. pmid:35273392
5. Stickels RR, Murray E, Kumar P, Li J, Marshall JL, Di Bella DJ, et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol. 2021;39(3):313–9. pmid:33288904
6. Rodriques SG, Stickels RR, Goeva A, Martin CA, Murray E, Vanderburg CR, et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363(6434):1463–7. pmid:30923225
7. Chen A, Liao S, Cheng M, Ma K, Wu L, Lai Y, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022;185(10):1777-1792.e21. pmid:35512705
8. Eng C-HL, Lawson M, Zhu Q, Dries R, Koulena N, Takei Y, et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature. 2019;568(7751):235–9. pmid:30911168
9. Tran HTN, Ang KS, Chevrier M, Zhang X, Lee NYS, Goh M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 2020;21(1):12. pmid:31948481
10. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888-1902.e21. pmid:31178118
11. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16(12):1289–96. pmid:31740819
12. Zhang L, Nie Q. scMC learns biological variation through the alignment of multiple single-cell genomics datasets. Genome Biol. 2021;22(1):10. pmid:33397454
13. Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021;18(11):1342–51. pmid:34711970
14. Ren H, Walker BL, Cang Z, Nie Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat Commun. 2022;13(1):4076. pmid:35835774
15. Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun. 2022;13(1):1739. pmid:35365632
16. Long Y, Ang KS, Li M, Chong KLK, Sethi R, Zhong C, et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun. 2023;14(1):1155. pmid:36859400
17. Yuan Z, Zhao F, Lin S, Zhao Y, Yao J, Cui Y, et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat Methods. 2024;21(4):712–22. pmid:38491270
18. Zeira R, Land M, Strzalkowski A, Raphael BJ. Alignment and integration of spatial transcriptomics data. Nat Methods. 2022;19(5):567–75. pmid:35577957
19. Jones A, Townes FW, Li D, Engelhardt BE. Alignment of spatial genomics data using deep Gaussian processes. Nat Methods. 2023;20(9):1379–87. pmid:37592182
20. Zhou X, Dong K, Zhang S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat Comput Sci. 2023;3(10):894–906. pmid:38177758
21. Xia C-R, Cao Z-J, Tu X-M, Gao G. Spatial-linked alignment tool (SLAT) for aligning heterogenous slices. Nat Commun. 2023;14(1):7236. pmid:37945600
22. Guo T, Yuan Z, Pan Y, Wang J, Chen F, Zhang MQ, et al. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol. 2023;24(1):241. pmid:37864231
23. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15. pmid:29409532
24. McInnes L, Healy J, Melville J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint. 2018.
25. Wolf FA, Hamey FK, Plass M, Solana J, Dahlin JS, Göttgens B, et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 2019;20(1):59. pmid:30890159
26. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;2008(10):P10008.
27. Levine JH, Simonds EF, Bendall SC, Davis KL, Amir ED, Tadmor MD, et al. Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell. 2015;162(1):184–97. pmid:26095251
28. Xu H, Fu H, Long Y, Ang KS, Sethi R, Chong K, et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 2024;16(1):12. pmid:38217035
29. Lohoff T, Ghazanfar S, Missarova A, Koulena N, Pierson N, Griffiths J. Highly multiplexed spatially resolved gene expression profiling of mouse organogenesis. BioRxiv. 2020.
30. Vega-Lopez GA, Cerrizuela S, Tribulo C, Aybar MJ. Neurocristopathies: New insights 150 years after the neural crest discovery. Dev Biol. 2018;444 Suppl 1:S110–43. pmid:29802835
31. de Bruijn MF, Speck NA, Peeters MC, Dzierzak E. Definitive hematopoietic stem cells first develop within the major arterial regions of the mouse embryo. EMBO J. 2000;19(11):2465–74. pmid:10835345
32. Grubb BJ. Developmental Biology. Gilbert SF, editor. Oxford University Press; 2006.
33. Shalaby F, Ho J, Stanford WL, Fischer KD, Schuh AC, Schwartz L, et al. A requirement for Flk1 in primitive and definitive hematopoiesis and vasculogenesis. Cell. 1997;89(6):981–90. pmid:9200616
34. Noden DM, Trainor PA. Relations and interactions between cranial mesoderm and neural crest populations. J Anat. 2005;207(5):575–601. pmid:16313393
35. Ober EA, Verkade H, Field HA, Stainier DYR. Mesodermal Wnt2b signalling positively regulates liver specification. Nature. 2006;442(7103):688–91. pmid:16799568
36. Cable DM, Murray E, Shanmugam V, Zhang S, Zou LS, Diao M, et al. Cell type-specific inference of differential expression in spatial transcriptomics. Nat Methods. 2022;19(9):1076–87. pmid:36050488
37. Dudek SM, Alexander GM, Farris S. Rediscovering area CA2: unique properties and functions. Nat Rev Neurosci. 2016;17(2):89–102. pmid:26806628
38. Walker LC. Aβ Plaques. Free Neuropathol. 2020;1:1–31. pmid:33345256
39. Elmore MRP, Najafi AR, Koike MA, Dagher NN, Spangenberg EE, Rice RA, et al. Colony-stimulating factor 1 receptor signaling is necessary for microglia viability, unmasking a microglia progenitor cell in the adult brain. Neuron. 2014;82(2):380–97. pmid:24742461
40. Salas SM, Czarnewski P, Kuemmerle LB, Helgadottir S, Matsson-Langseth C, Tismeyer S. Optimizing Xenium In Situ data utility by quality assessment and best practice analysis workflows. BioRxiv. 2023.
41. Jia L, Li F, Wei C, Zhu M, Qu Q, Qin W, et al. Prediction of Alzheimer’s disease using multi-variants from a Chinese genome-wide association study. Brain. 2021;144(3):924–37. pmid:33188687
42. Lohoff T, Ghazanfar S, Missarova A, Koulena N, Pierson N, Griffiths JA, et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat Biotechnol. 2022;40(1):74–85. pmid:34489600
43. Edelsbrunner H, Kirkpatrick D, Seidel R. On the shape of a set of points in the plane. IEEE Trans Inform Theory. 1983;29(4):551–9.
44. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9(1):5233. pmid:30914743
45. Oord A van den, Li Y, Vinyals O. Representation learning with contrastive predictive coding. arXiv preprint. 2018.
46. Chen H, Zhao Z, Li Y, Zou Y, Li R, Zhang R. Csgcl: Community-strength-enhanced graph contrastive learning. arXiv preprint. 2023.
47. Simon LM, Wang Y-Y, Zhao Z. Integration of millions of transcriptomes using batch-aware triplet neural networks. Nat Mach Intell. 2021;3(8):705–15.
48. Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol. 2018;36(5):421–7. pmid:29608177
49. Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987;20:53–65.
50. Luecken MD, Büttner M, Chaichoompu K, Danese A, Interlandi M, Mueller MF, et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods. 2022;19(1):41–50. pmid:34949812
