Abstract
Spatial transcriptomics enables the measurement of gene expression in intact tissues. Despite this, reconstructing anatomically accurate spatial domains remains challenging, primarily due to expression sparsity, complex tissue architecture that is characterized by sharp boundaries and long-range continuity, and weak spatial signals. Traditional pipelines typically rely on expression-driven clustering and spatial smoothing, which underperform at boundaries and in sparse regions while neglecting morphological information. To address these challenges, AugGCL, an augmented graph-convolutional learning framework, is proposed; it enhances spatial structure decoding and gene expression reconstruction through targeted augmentation of both gene and image data. A key component of AugGCL is a neighborhood information aggregation mechanism, which integrates expression similarity and spatial proximity to construct a weighted graph and an enhanced expression matrix, addressing sparsity without sacrificing boundary clarity. Additionally, a two-stream weighted graph convolutional network jointly models refined gene features and image-derived morphological information, with image-aware auxiliary reconstructions enhancing weak spatial signals and sharpening boundaries. On datasets from the human dorsolateral prefrontal cortex, breast cancer, and mouse embryo, AugGCL outperforms baseline methods across multiple metrics, showing robustness and generalization across a range of datasets. Downstream analysis validated the reliability of the method, confirming its effectiveness in cell annotation, functional enrichment, and mechanistic studies. AugGCL generates clearer spatial domains and significantly advances the application of spatial transcriptomics in tissue structure and disease research.
Author summary
Spatial transcriptomics is an important technique for revealing tissue structure and disease mechanisms. However, existing spatial domain identification methods have not fully exploited the spatial information embedded in the data, especially when it comes to detailed exploration of tissue structure. This study presents a new tool for spatial domain identification, which makes full use of various types of spatial transcriptomic data from multiple perspectives to accurately identify real cell groupings. The method significantly improves spatial domain recognition accuracy by integrating gene expression with tissue morphology analysis. Experimental results show that this tool not only accurately identifies spatial domains but also provides strong support for the comprehensive exploration of biological tissue structure. On this basis, a series of downstream analyses, including volcano plot generation, functional enrichment analysis, and gene heatmap visualization, were performed. These analyses not only validated the effectiveness of the method but also revealed the functional characteristics and expression patterns of cells within spatial domains. This step further confirmed the broad application potential of the method in cell type annotation, functional enrichment, and mechanistic research, highlighting its significant potential in advancing biological and disease research.
Citation: Ji T, Yang B, Wang M, Ji H, Yang H, Liu Y (2026) AugGCL: Multimodal graph learning for spatial transcriptomics analysis with enhanced gene and morphological data. PLoS Comput Biol 22(1): e1013912. https://doi.org/10.1371/journal.pcbi.1013912
Editor: Michael Hawrylycz, Allen Institute for Brain Science, UNITED STATES OF AMERICA
Received: September 9, 2025; Accepted: January 12, 2026; Published: January 23, 2026
Copyright: © 2026 Ji et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The datasets used in this study are publicly available. The first dataset consists of human dorsolateral prefrontal cortex data, captured using 10× Visium technology, and can be accessed at http://research.libd.org/spatialLIBD/. The second dataset includes spatial transcriptomic data from mouse embryos at E9.5, obtained via Stereo-seq technology, and is available for download at https://db.cngb.org/stomics/mosta/. The third dataset includes human breast cancer data, sourced from the public 10× Genomics database, and can be downloaded from https://www.10xgenomics.com/datasets/human-breast-cancer-block-a-section-1-1-standard-1-1-0. Additionally, the AugGCL source code and reproduction scripts are publicly available at GitHub: https://github.com/Jtfboom/AugGCL.git.
Funding: This study was partially supported by the National Natural Science Foundation of China (6210023056 to HJ) and the Natural Science Basic Research Program of Shaanxi (2024JC-YBMS-473 to BY). Additionally, this research was supported by the Scientific and Technological Program of Xi’an (24GXFW0016 to MW) and the Graduate Innovation Fund of Xi’an Polytechnic University (chx2025026 to TJ). The funder 6210023056 contributed to the study design, data collection and analysis, publication decision, and manuscript preparation. The funder 2024JC-YBMS-473 supported the study design, data collection, and analysis. The funder 24GXFW0016 supported the technical implementation and data analysis of the study. The funder chx2025026 primarily supported the data collection and analysis work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Recent advancements in spatial molecular imaging have opened new avenues for studying tissue architecture and gene expression at the subcellular level [1–3]. Deeper insight into cellular interactions within their microenvironment is crucial for elucidating disease mechanisms. Leading technologies such as 10× Visium and Vizgen MERSCOPE/MERFISH [4,5] are highly effective in capturing the spatial context of transcripts, cellular positioning, and boundaries, often integrated with high-resolution multi-channel immunohistochemistry (IHC) imaging. As these spatial transcriptomics (ST) tools continue to evolve, they are reshaping our approach to spatial biology, offering unparalleled insights into tissue organization and the molecular pathways underlying diseases [6–8].
Accurately identifying spatial domains using multimodal data remains a significant challenge in spatial transcriptomics research. Traditional spatial domain identification methods, such as Seurat [9] and Louvain-based Scanpy [10], rely solely on gene expression data, which often leads to suboptimal clustering results. In contrast, spatial clustering methods that incorporate spatial information and tissue image features have shown enhanced clustering performance. For example, BayesSpace [11] utilizes Bayesian statistics to analyze both gene expression matrices and spatial neighborhood information, achieving effective spatial clustering. Similarly, DeepST [12] improves correlations between spatially adjacent points by early fusion of gene expression data and tissue image features, leading to richer latent representations of spatial transcriptomics data. SiGra [13] applies an image-enhanced graph transformer model to analyze single-cell spatial data, which decodes spatial domains while amplifying spatial signals.
Although these methods integrate tissue image features to enhance spatial gene expression representations and improve domain identification, gene expression matrices often suffer from sparsity [14–16], with many genes exhibiting very low or even zero expression in certain cells or regions. This issue is particularly prevalent in spatial transcriptomics datasets. Existing approaches to this sparsity include imputation methods such as MAGIC [17], spatial smoothing, and dimensionality reduction. These methods aim to fill in missing data, reduce noise, and extract meaningful latent representations from sparse gene expression matrices. However, the sparsity and noise in gene expression data continue to significantly impact the latent information in spatial transcriptomics, making it a critical challenge for accurate spatial domain identification.
To address these challenges, the AugGCL method was developed, which uniquely tackles gene expression matrix sparsity while enhancing the identification of spatial domains. Unlike traditional methods that primarily rely on gene expression data, AugGCL integrates spatial neighborhood information with original gene features to create a smoother, more consistent spatial gene expression representation. By leveraging the spatial relationships between genes and cells, this method alleviates the expression void problem, which refers to regions of missing expression in gene expression matrices due to sparsity or noise. Additionally, the method uses graph convolutional networks (GCN) [18–20] to fuse multimodal data, including raw gene expression, enhanced gene expression, multi-channel tissue imaging, and cell spatial locations. By incorporating both gene and morphological information, AugGCL is able to capture complex spatial structures and dynamic tissue changes that traditional methods often overlook.
In contrast to existing methods, AugGCL addresses the inherent sparsity in gene expression matrices, improving the consistency and clarity of gene expression representations across spatial domains. Existing approaches, while effective in many cases, struggle to handle these sparsity issues, often leading to suboptimal clustering and low spatial resolution. AugGCL’s ability to fuse multimodal data and enhance the spatial consistency of gene expression maps makes it a more robust solution for identifying complex spatial domains and accurately modeling tissue structures.
Extensive tests across various ST datasets and benchmarks against existing algorithms demonstrate that AugGCL outperforms other methods in spatial domain identification, latent embedding, and data augmentation. AugGCL significantly improves clustering performance and spatial resolution, providing more biologically relevant and accurate spatial gene expression maps. AugGCL will play a key role in revealing the intricate spatial architecture within heterogeneous tissues, exploring cell-to-cell interactions, and advancing the field of spatial transcriptomics.
Results
Overview of AugGCL
The AugGCL workflow (Fig 1) consists of several key stages, each corresponding to a specific component of the framework. First, data preprocessing filters out low-expression genes and generates an enhanced gene expression matrix, and spatial graph construction integrates gene expression and spatial adjacency data [21], laying the foundation for spatial analysis (Fig 1A). Neighborhood Information Aggregation (NIA) then combines these elements to refine the expression matrix [22], improving spatial coherence (Fig 1B). The AugGCL model then fuses gene expression and tissue image features to improve spatial structure decoding, achieved through a multi-layer graph convolutional network, whose design is shown in Fig 1C. Finally, spatial clustering and biological analysis are conducted to interpret the learned representations, yielding insights into cell interactions and molecular processes, as illustrated in Fig 1D.
(A) The data preprocessing and spatial graph construction steps are shown, where histological images are divided into patches, and spatial coordinates are used to create an adjacency matrix that captures cell proximity. (B) The neighborhood information aggregation process, where cosine similarity between cells is calculated based on both image and gene expression embeddings, constructing aggregation graphs with weighted edges. (C) The graph-based encoding and latent feature learning, where graph neural networks fuse the image and gene feature aggregation graphs to generate latent representations for each cell. These fused features are then used for spatial clustering and gene expression reconstruction, enabling a more accurate decoding of complex spatial structures. (D) The downstream biological analysis, where reconstructed gene expression matrices are used for spatial analyses such as heatmaps, volcano plots, and GO enrichment of upregulated genes, showcasing the biological relevance of the reconstructed data.
Spatial clustering of AugGCL on human dorsolateral prefrontal cortex tissue
To evaluate the clustering performance of AugGCL, it was compared with six commonly used spatial transcriptomics analysis methods: stLearn [23], SpaGCN [24], SEDR [25], GraphST [26], DeepST, and SiGra. All methods were applied to the human dorsolateral prefrontal cortex (DLPFC) dataset [27–30] obtained from the 10× Visium platform. This dataset contains 12 tissue slices, each annotated with six cortical layers (Layer 1–6) and white matter (WM).
To assess clustering accuracy, the widely adopted external clustering metric Adjusted Rand Index (ARI) was used. As shown in Fig 2A, AugGCL achieved the highest median ARI score among all compared methods, with the smallest interquartile range, demonstrating not only high accuracy but also exceptional stability across the heterogeneous tissue slices. In contrast, methods such as stLearn and SpaGCN exhibited broader ARI distributions and lower median scores, indicating that they are more sensitive to expression variability and structural complexity. These results emphasize that AugGCL is robust and generalizable in delineating spatial structures, consistently performing well across different tissue slices.
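For readers unfamiliar with the metric, the following is a minimal sketch of how ARI can be computed from two label vectors using the standard pair-counting definition. It is an illustration only, not the evaluation code used in this study; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI for two labelings of the same spots (needs >= 2 spots)."""
    n = len(labels_true)
    # Contingency counts: spots sharing a (true label, predicted label) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case (e.g. a single cluster in both labelings).
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Toy labels: the same partition with renamed clusters, and a mismatched one.
ari_same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
ari_mixed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because ARI depends only on which spots are grouped together, renaming cluster labels does not change the score, which is why predicted clusters need no matching to annotated layers before evaluation.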
(A) ARI scores distribution of seven clustering models across 12 tissue slices. (B) ARI scores of AugGCL across individual samples, showing the stability of its performance across diverse tissues. (C) Ground truth annotation (layered cortical structure) for DLPFC sample 151507. (D) Spatial clustering result of AugGCL (ARI = 0.67) on sample 151507, showing clear identification of cortical layers. (E) UMAP visualization of AugGCL-learned embeddings on sample 151507. (F) Comparison of clustering results and UMAP visualizations from baseline models (DeepST, GraphST, SEDR, SpaGCN, stLearn, SiGra) on sample 151507.
Further analysis is provided in Fig 2B, where the per-sample ARI scores of AugGCL across all 12 slices are shown. The results indicate that AugGCL achieved an ARI score greater than 0.5 for all slices, with most slices reaching scores close to or exceeding 0.6. This further validates AugGCL’s stable and reliable performance across various slice variations, emphasizing its robustness in different biological contexts.
To visually assess the spatial clustering capability of AugGCL, a representative sample from the DLPFC dataset (ID: 151507) was selected. As depicted in Fig 2C, the ground truth annotation clearly reveals the layered structure of the cortex, from Layer 1 to Layer 6, as well as the white matter. The spatial clustering result from AugGCL, shown in Fig 2D (ARI = 0.67), is highly consistent with this ground truth, presenting clear boundaries between layers and excellent structural continuity. This confirms that AugGCL effectively captures the complex spatial structures of the DLPFC tissue.
To further validate this result, the learned embeddings were visualized using UMAP, as shown in Fig 2E. The UMAP plot demonstrates well-separated clusters that align closely with the tissue structure annotations, indicating that AugGCL has successfully learned to represent the spatial features in the embedding space. In contrast, clustering results from baseline methods, presented in Fig 2F, exhibit varying degrees of segmentation noise, boundary blurring, and layer mixing. For instance, GraphST misclassifies several regions, while SpaGCN and stLearn exhibit over-segmentation at layer boundaries. These comparisons further highlight the superior performance of AugGCL in accurately reconstructing complex cortical structures with minimal distortion.
Biological validation and analysis of spatial clustering results in AugGCL
To better understand the biological significance of gene expression changes, differential expression analysis was first performed, and the results were visualized using a volcano plot (Fig 3A). This plot highlights several genes with significant expression changes, such as CXCL14 [31], RELN [32], PTN [33], and CNN3 [34], all of which show large fold changes and statistical significance. These findings suggest that these genes may play crucial roles in the human DLPFC and could serve as potential biomarkers for further investigation into brain function and disease.
(A) Volcano plot: The relationship between log2 fold change and -log10 p-value of differentially expressed genes in the DLPFC dataset. Genes are categorized as upregulated (pink), downregulated (blue), and non-significant (yellow). (B) GO enrichment analysis: The enrichment results of upregulated genes in DLPFC Layer 2, with the most enriched terms including chemical synaptic transmission and glutamate receptor signaling pathways. (C) Gene expression visualization: Visualization of key genes (RELN, HPCAL1, PCP4, KRT17, VAMP1, MBP) in both raw and enhanced formats. The enhanced images reveal clearer spatial expression patterns.
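As an illustration of how the volcano plot categories can be assigned, the sketch below maps a gene's log2 fold change and p-value to the plotted coordinates and to one of the three classes. The cutoffs (|log2 fold change| ≥ 1, p < 0.05), the function names, and the example values are assumptions made for illustration; the thresholds used in this study are not stated in this section.

```python
import math


def volcano_point(log2_fc, p_value):
    """Return the (x, y) position of a gene on a volcano plot (requires p_value > 0)."""
    return log2_fc, -math.log10(p_value)


def classify_gene(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'ns' (non-significant).

    fc_cutoff and p_cutoff are illustrative defaults, not values from the paper.
    """
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical (log2 fold change, p-value) pairs for placeholder genes.
examples = {
    "gene_1": (2.0, 0.001),
    "gene_2": (-1.5, 0.01),
    "gene_3": (0.3, 0.001),
    "gene_4": (2.0, 0.2),
}
labels = {name: classify_gene(fc, p) for name, (fc, p) in examples.items()}
```

A gene is flagged only when it passes both cutoffs, so a large fold change with weak statistical support (gene_4 above) remains in the non-significant class.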
Further Gene Ontology (GO) enrichment analysis showed that the upregulated genes in Layer 2 of the DLPFC are primarily involved in neurotransmitter signaling and synaptic activity. Notably, GO terms such as chemical synaptic transmission, anterograde trans-synaptic signaling, and regulation of glutamate receptor signaling pathways were highly enriched (Fig 3B). These results suggest that the upregulated genes in Layer 2 play a critical role in synaptic communication and the regulation of glutamate receptors, both of which are essential for the proper functioning of the DLPFC.
To visually assess gene expression in the tissue, the expression patterns of several key genes were examined using both raw and enhanced visualization methods (Fig 3C). While the raw expression maps displayed sparse and uneven distributions of gene activity, the enhanced maps provided clearer and more coherent visualizations, revealing distinct spatial patterns of gene expression. Specifically, genes such as PCP4 and KRT17 exhibited clear spatial clustering in the enhanced images, highlighting the spatial organization of gene activity in the DLPFC. The enhanced results enabled more precise identification and analysis of gene distribution across different regions of the DLPFC, offering deeper insights into the molecular architecture of this brain area.
With the enhancement provided by AugGCL, the spatial clustering and regional distribution of gene expression became much clearer, highlighting the significant advantages of this method in extracting and enhancing gene expression patterns in spatial transcriptomics. Compared to the raw images, the enhanced visualizations not only improved the clarity of gene activity but also revealed more accurately the spatial distribution and structural patterns of genes within the tissue.
AugGCL for spatial domain recognition and biological evaluation in breast cancer dataset
In this study, AugGCL was applied to the spatial transcriptomics analysis of the human breast cancer dataset [35–38], assessing its performance in spatial domain recognition and clustering accuracy. Compared to several existing methods, AugGCL demonstrated distinct advantages in identifying spatial domains within tumor tissues.
The performance of different models on the dataset was evaluated using two key metrics: ARI and Normalized Mutual Information (NMI) (Fig 4A). The results showed that AugGCL outperformed other models with an ARI of 0.61 and an NMI of 0.73, surpassing models such as stLearn (ARI = 0.57), SpaGCN (ARI = 0.55), and SiGra (ARI = 0.55). These results suggest that AugGCL not only identifies spatial domains more accurately but also delineates the boundaries between tumor and healthy tissue more effectively, demonstrating higher clustering precision. In contrast, other models exhibited lower ARI values, indicating their limited ability to distinguish spatial domains.
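NMI can be sketched in the same spirit as ARI. The example below computes the mutual information between two labelings and normalizes it by the arithmetic mean of their entropies. The choice of normalization is an assumption (other variants divide by the minimum, maximum, or geometric mean of the entropies), and this section does not state which one was used; the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    mutual_info = 0.0
    for (t, p), c in joint.items():
        mutual_info += (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))

    denom = 0.5 * (entropy(labels_true) + entropy(labels_pred))
    # Both labelings consist of a single cluster: treat as perfect agreement.
    return mutual_info / denom if denom > 0 else 1.0


nmi_same = normalized_mutual_information([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
nmi_independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike ARI, NMI is not corrected for chance, which is one reason the two metrics are commonly reported side by side.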
(A) Performance comparison of different models (stLearn, SpaGCN, SEDR, GraphST, DeepST, SiGra, and AugGCL) based on ARI and NMI scores. (B) Volcano plot showing the differential expression of genes, highlighting upregulated (pink) and downregulated (blue) genes based on log2 fold change and -log10 P-values. (C) Ground truth spatial clustering, with tissue regions labeled for DCIS/LCIS, IDC, healthy tissue, and tumor edges. (D) AugGCL clustering result (ARI = 0.61) showing spatial clustering of tumor and healthy tissue regions. (E) Comparison of spatial clustering results from various models, including stLearn, DeepST, GraphST, SEDR, SpaGCN, and SiGra, with corresponding ARI values.
Next, the significantly upregulated and downregulated genes in the dataset were explored using a volcano plot analysis (Fig 4B). The volcano plot clearly highlights several key upregulated genes, such as SERPINA3 [39], KLK6 [40], and MGP [41], whose expression patterns are closely associated with different tumor subtypes, offering important insights for biomarker discovery and targeted therapies.
To better understand how these genes contribute to tumor progression, their functions and mechanisms were explored. SERPINA3, a serine protease inhibitor, may act as either a tumor promoter or suppressor depending on the tumor type, but its precise mechanisms remain unclear. KLK6 plays a crucial role in regulating tumor cell migration, invasion, and response to radiotherapy. Its elevated expression in invasive tumor regions is considered a key factor in tumor progression and metastasis. MGP is involved in tumor angiogenesis, regulating the maturation of blood vessels, affecting nutrient supply and tumor growth, and providing a potential therapeutic target for future treatment strategies.
In terms of spatial clustering, AugGCL effectively distinguished different tissue regions, including Invasive Ductal Carcinoma (IDC), healthy tissue, and tumor edges (Fig 4C). The clustering results from AugGCL were highly consistent with the ground truth labels (Fig 4D), particularly in clustering the tumor edge regions. This indicates that AugGCL is capable of effectively preserving spatial information and accurately reflecting the complex structure of the tumor microenvironment when processing spatial transcriptomics data.
Through comparison with other models (Fig 4E), the advantages of AugGCL were further validated. For instance, GraphST (ARI = 0.56) and DeepST (ARI = 0.57) performed similarly to each other, but both remained below AugGCL (ARI = 0.61). This further highlights the advantage of AugGCL in spatial data analysis, particularly in distinguishing tumor microenvironments from healthy tissues.
Model performance evaluation of the E9.5 mouse embryo dataset
This study applies spatial transcriptomic analysis to the Mouse Embryo E9.5 dataset [42] to explore gene expression patterns across various tissue regions during embryonic development. The dataset includes key developmental tissue regions, such as the aorta-gonad-mesonephros (AGM), brain, heart, liver, and neural crest (Fig 5A). These regions exhibit distinct spatial distributions and transcriptomic features, providing insights into cell fate determination and organ formation during early development.
(A) Ground Truth: The true labels for tissue regions in the mouse embryo E9.5 dataset, including key regions such as AGM, Brain, Heart, Liver, and Neural Crest. (B) Model Performance (ARI & NMI): Bar plot comparing the performance of different clustering models (stLearn, SpaGCN, SEDR, GraphST, AugGCL) based on ARI and NMI. (C) AugGCL Spatial Clustering Result: Spatial clustering results using AugGCL, with regions color-coded according to their predicted labels. (D) Gene Expression Analysis for Heart Marker Gene: Comparison of raw gene expression (Raw_Myh7) and AugGCL reconstructed gene expression (AugGCL_Myh7) for the heart marker gene Myh7. (E) Spatial Clustering Results Comparison: Spatial clustering results from different methods (SpaGCN, SEDR, stLearn, GraphST) with corresponding ARI values for each method.
The performance of different clustering models, including SpaGCN, SEDR, stLearn, GraphST, and AugGCL, was evaluated using the ARI and NMI (Fig 5B). AugGCL demonstrated superior precision in spatial domain identification and tissue differentiation, particularly in regions such as AGM, neural crest, and heart, with an ARI of 0.36 and an NMI of 0.58. In contrast, models like SEDR and GraphST showed relatively lower performance, revealing limitations in handling complex embryonic development data.
The spatial clustering results demonstrate that AugGCL effectively distinguishes key tissue regions, including the brain, heart, and neural crest, with strong alignment between the clustering results and ground truth labels (Fig 5C). This indicates that AugGCL accurately captures spatial tissue information, providing reliable data for further investigation of developmental processes.
Gene expression patterns for Myh7 were also compared between raw data and those reconstructed by AugGCL (Fig 5D). AugGCL demonstrated a clearer reconstruction of gene expression patterns, especially in the heart region.
A comparison of spatial clustering results from different methods (SpaGCN, SEDR, stLearn, GraphST) (Fig 5E) further confirms the superiority of AugGCL. It showed higher clustering accuracy and better retention of spatial structure between tissues compared to other methods.
Discussion
Current spatial transcriptomics analysis methods still face considerable challenges in integrating multimodal information and improving the quality of spatial expression representations. In particular, when addressing challenges such as high gene expression sparsity, complex tissue architecture, and weak spatial signals, traditional approaches that rely on unimodal features or rule-based clustering often struggle to balance biological interpretability with structural resolution. To address these limitations, this study proposes the AugGCL framework, which aims to systematically enhance spatial structure decoding and expression pattern reconstruction through a neighborhood information aggregation mechanism and multimodal graph neural network modeling.
The key innovation of AugGCL lies in its integration of the neighborhood information aggregation module. This module jointly considers gene expression similarity and spatial proximity to dynamically generate a weighted graph structure, which is then used to perform spatial enhancement on the raw gene expression matrix. Compared to strategies that rely solely on spatial coordinates, neighborhood information aggregation incorporates functional-level regulatory signals, thereby improving both the consistency and discriminative power of cellular features. This is particularly advantageous in regions of sparse gene expression, effectively mitigating the expression void problem and enabling the construction of smoother, structure-aware expression representations for downstream modeling.
In terms of graph neural network modeling, AugGCL introduces a multimodal fusion architecture that integrates enhanced expression features with image-based morphological features as joint inputs. Through graph convolutional layers, the model unifies the encoding of spatial, functional, and morphological information of cells. The introduction of the image modality not only provides spatial boundary and morphological priors, but also enhances the clarity of cluster boundaries and continuity of tissue structures via auxiliary loss supervision. This multimodal fusion strategy enables AugGCL to effectively capture both local spatial variations and global structural partitions.
Experimental results demonstrate that AugGCL achieves superior performance across several representative ST datasets, including human cerebral cortex, breast cancer tissue, and embryonic development samples. In tasks such as identifying cortical layering, tumor boundary delineation, and cell lineage differentiation, AugGCL significantly outperforms mainstream baseline models, exhibiting strong robustness and generalizability. Furthermore, AugGCL introduces new approaches for gene expression reconstruction and spatial visualization. The enhanced expression maps it generates exhibit improved spatial coherence and biological relevance, providing a solid foundation for downstream functional annotation and mechanistic exploration.
In conclusion, AugGCL establishes a unified analytical framework based on multimodal collaborative modeling of expression, structure, and image data. By incorporating neighborhood-level information enhancement and image-guided fusion mechanisms, it significantly improves the interpretability of spatial expression data and the accuracy of tissue structure resolution. This makes AugGCL a generalizable and scalable computational tool for advancing spatial transcriptomics research.
Materials and methods
Data and image preprocessing
In the preprocessing phase, both the spatial transcriptomics data and corresponding tissue images were processed. First, the gene expression matrices were loaded from various platforms, along with their associated tissue images. A filtering process was applied to the gene expression data to remove low-expressing genes: (1) genes expressed in fewer than three cells were discarded to reduce noise; (2) genes known to introduce biases were excluded. The processed data was saved in .h5ad format for subsequent analysis, and the gene expression matrix was exported as a CSV file for further examination.
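The gene filtering step can be sketched as follows. This is a simplified, dependency-free illustration, assuming a dense count matrix stored as a list of rows; the function name, the `excluded` argument, and the toy counts are hypothetical, and the actual pipeline operates on .h5ad data.

```python
def filter_genes(expr, gene_names, min_cells=3, excluded=()):
    """Keep genes detected in at least `min_cells` cells and not listed in `excluded`.

    expr: list of rows, one per cell; each row holds one count per gene.
    Returns the filtered matrix and the names of the retained genes.
    """
    keep = []
    for g, name in enumerate(gene_names):
        n_expressing = sum(1 for row in expr if row[g] > 0)
        if n_expressing >= min_cells and name not in excluded:
            keep.append(g)
    filtered = [[row[g] for g in keep] for row in expr]
    return filtered, [gene_names[g] for g in keep]


# Toy count matrix: 4 cells x 4 genes (values are made up).
counts = [
    [0, 5, 1, 0],
    [0, 3, 0, 2],
    [1, 4, 0, 0],
    [0, 2, 0, 7],
]
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
filtered, kept = filter_genes(counts, genes, min_cells=3)
```

In this toy matrix only GeneB is detected in three or more cells, so it is the only gene retained.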
For the image preprocessing, the original histology image was read and converted to grayscale if necessary. To prepare the image for model input, it was then transformed into a tensor. Using the spatial coordinates (such as cell positions), image patches around each coordinate, typically of size 50 × 50 pixels, were extracted and flattened into one-dimensional vectors. All the extracted patches were stored in a dataframe, aligned with the index of the original gene expression data for easy access and further analysis. This combined preprocessing workflow ensures the retention of biologically relevant genes and image features, significantly enhancing the quality and reliability of the dataset for downstream analysis.
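The patch extraction described above can be illustrated with a small sketch. It assumes a grayscale image stored as a 2D list, coordinates given as (row, column) pixel positions, and windows clamped to the image borders; the border handling, the function name, and the spot identifiers are assumptions for illustration rather than details taken from the AugGCL implementation.

```python
def extract_patch(image, row, col, size=50):
    """Return a flattened size x size patch centred near (row, col).

    The window is shifted inward when it would cross the image border
    (an assumed convention); the image must be at least size x size.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    top = min(max(row - half, 0), height - size)
    left = min(max(col - half, 0), width - size)
    return [
        image[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    ]


# Toy 60 x 80 grayscale image whose pixel value encodes its position.
image = [[r * 80 + c for c in range(80)] for r in range(60)]
spots = {"spot_1": (30, 40), "spot_2": (0, 0)}  # hypothetical spot IDs

# One flattened vector per spot, keyed by the same IDs as the expression data.
patches = {spot_id: extract_patch(image, r, c) for spot_id, (r, c) in spots.items()}
```

Keying the patches by spot identifier mirrors the alignment with the gene expression index, so that each expression profile can be paired with its image vector.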
Spatial graph construction
The construction of a spatial graph aims to better integrate information from neighboring cells, enhancing the relationships between the data points. By using spatial coordinates to compute the distances between cells and applying a predefined distance threshold or number of neighbors, it is possible to effectively capture the neighboring cells and their respective distance relationships.
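Let $\{x_1, x_2, \ldots, x_N\}$ represent the $N$ cell samples. Correspondingly, one matrix holds the raw gene expression data, a second matrix holds the augmented gene expression data, and a third holds the tissue image data. $d_{ij}$ is the distance between cells $x_i$ and $x_j$. The adjacency matrix $A$ is determined as follows:

$$A_{ij} = \begin{cases} 1, & d_{ij} \le r \text{ and } i \ne j \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

In Eq. (1), $d_{ij}$ is the Euclidean distance between the spatial locations of the two cells, and $r$ denotes the predefined distance threshold.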
This equation illustrates that cell i and cell j are neighbors if their distance is less than or equal to a predefined threshold, and non-neighbors otherwise. Self-loops are removed to ensure that only valid neighbor relationships are considered. This adjustment helps maintain a graph where each node (cell) is connected to others based on proximity, which is essential for certain graph-based algorithms.
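The thresholding rule described here can be sketched in a few lines. The example assumes two-dimensional spot coordinates and a radius-based neighborhood; the function name, the `radius` parameter, and the coordinates are hypothetical, and a k-nearest-neighbor variant would follow the same pattern.

```python
import math


def spatial_adjacency(coords, radius):
    """Binary adjacency matrix: 1 if two distinct cells lie within `radius`."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Starting at i + 1 skips the diagonal, so no self-loops are created.
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                adj[i][j] = adj[j][i] = 1
    return adj


# Toy coordinates: three nearby cells and one distant cell.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
adjacency = spatial_adjacency(coords, radius=1.0)
```

The resulting matrix is symmetric with a zero diagonal. The pairwise loop is quadratic in the number of cells, so for large datasets a spatial index (for example a k-d tree) would be the usual substitute.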
This process constructs a spatial relationship graph between cells, providing a quantitative perspective on cell-to-cell interactions and offering critical biological insights into the spatial distribution and interactions of cells within tissues.
Neighborhood information aggregation
In spatial data analysis, data augmentation methods combine gene expression similarity with spatial neighborhood information between cells to generate richer expression data. Specifically, the method calculates the similarity between cells using the gene expression matrix and combines it with the spatial adjacency matrix to produce an enhanced expression matrix. This approach not only retains the spatial structural characteristics of the cells but also enhances the correlation of gene expression among neighboring cells. For ST data with morphological information, image segmentation is performed based on the spatial position of each cell to extract the image features of each cell. Using the same data augmentation method, a weighted spatial adjacency graph and edge weights are generated, thereby improving the ability of the model to analyze complex spatial structures and providing more accurate data support for downstream tasks such as spatial clustering.
In this process, the similarity metric $S_{ij}$ is used to measure the similarity between cells based on their gene expression and spatial neighborhood relationships. It can be expressed as:
$$S_{ij} = \exp(\beta - d_{ij}) \tag{2}$$
where $\exp$ is the exponential function and $\beta$ is a parameter. Here $d_{ij}$ denotes the distance between the gene expression profiles of cells $x_i$ and $x_j$; it could be selected as the cosine distance [43], in which case $\beta$ is set to 2, as shown in Eq. (2).
The cosine distance $d_{ij}$ reflects the similarity between the gene expression profiles of two cells. A smaller $d_{ij}$ indicates higher similarity, which results in a larger $S_{ij}$. This is because the exponential function $\exp(\beta - d_{ij})$ increases as $d_{ij}$ decreases, emphasizing the similarity between cells. The higher the $S_{ij}$ value, the more strongly cells with similar expression patterns are connected. A visualization of this function is provided in Figure 1 in S1 Appendix, under (1) Explanation of the Neighborhood Information Aggregation.
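The aim of this step is to effectively integrate the spatial proximity relationship and gene expression similarity, and the integration method is as follows:
$$C_{ij} = A_{ij} \, S_{ij}$$
where $A$ is the spatial adjacency matrix, $S$ is the expression similarity matrix, and $C$ is the resulting weighted adjacency matrix. This weighting operation reflects gene expression relationships while incorporating spatial information, facilitating further analysis.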
The weighted adjacency matrix $C$ and the edge weights are then normalized, with a small value $\varepsilon$ (e.g., $10^{-10}$) added to the denominator to avoid division by zero.
For the given gene expression matrix $X$, the enhancement aims to address the sparsity issue in gene expression data by combining spatial information from neighboring cells with gene expression similarities. The data is enhanced using the normalized weighted adjacency matrix $\hat{C}$. The resulting matrix $\tilde{X}$ incorporates spatial neighbor information and strengthens the expression profile of each cell by leveraging the similarities between adjacent cells, with a coefficient $\alpha$ controlling the degree of enhancement.
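The aggregation steps of this section can be sketched as follows. This is an illustrative reading rather than the reference implementation: the similarity form exp(beta − d) with cosine distance, the row-wise normalization, and the additive update `X + alpha * (C_hat @ X)` are assumptions chosen to match the description, and the names `aggregate_neighbors`, `beta`, `alpha`, and `eps` are hypothetical.

```python
import numpy as np

def aggregate_neighbors(X, A, beta=2.0, alpha=1.0, eps=1e-10):
    """Weight spatial neighbors by expression similarity and enhance X."""
    # Cosine distance between expression profiles (values in [0, 2]).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, eps)
    d = 1.0 - unit @ unit.T
    S = np.exp(beta - d)                  # larger for more similar cells
    C = A * S                             # keep similarity only for spatial neighbors
    # Assumption: row normalization, with eps guarding against empty rows.
    C_hat = C / (C.sum(axis=1, keepdims=True) + eps)
    # Assumption: additive enhancement scaled by alpha.
    X_aug = X + alpha * (C_hat @ X)
    return C_hat, X_aug

# Toy data: cells 0-1 and 1-2 are spatial neighbors, cell 3 is isolated.
X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [5.0, 0.0]])
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
C_hat, X_aug = aggregate_neighbors(X, A)
```

In this toy case the isolated cell keeps its original profile, while cell 1 draws more from cell 2 than from cell 0 because its expression profile is closer to that of cell 2.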
Graph neural network model design
In this study, a graph convolutional network model with a two-layer architecture is proposed to enhance spatial data analysis by simultaneously processing both gene expression and tissue image features. The gene encoder processes the enhanced gene feature matrix $\tilde{X}$ with the gene adjacency matrix $A_g$, while the image encoder processes the image feature matrix $I$, which represents the tissue image data, with the image adjacency matrix $A_{\mathrm{img}}$. Each encoder uses hidden dimensions of [512, 30], with the ELU activation function applied after each convolution. The model is trained with a learning rate of 0.001 for 200 epochs. The model also integrates edge weights in a weighted multi-layer GCN structure to improve feature extraction, enabling effective decoding of complex spatial structures. After feature fusion, the output of the model is used for downstream spatial clustering tasks. These design choices allow the model to process and integrate multimodal spatial data.
The model's inputs include the enhanced gene feature matrix $\tilde{X}$, the original gene expression matrix $X$, and the image feature matrix $I$. The gene and image features are associated with the weighted gene adjacency matrix $A_g$ and the weighted image adjacency matrix $A_{\mathrm{img}}$, respectively. The forward propagation process for gene features can be represented as:
$$H_g^{(l+1)} = \sigma\left(A_g H_g^{(l)} W_g^{(l)}\right)$$
where $H_g^{(l)}$ is the feature matrix of the l-th layer for gene features, with $H_g^{(0)} = \tilde{X}$ the enhanced gene expression matrix, $W_g^{(l)}$ is the weight matrix for the l-th layer of gene features, and $\sigma$ is the activation function.
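The forward propagation process for image features can be represented as:
$$H_{\mathrm{img}}^{(l+1)} = \sigma\left(A_{\mathrm{img}} H_{\mathrm{img}}^{(l)} W_{\mathrm{img}}^{(l)}\right)$$
where $H_{\mathrm{img}}^{(l)}$ is the feature matrix of the l-th layer for image features, with $H_{\mathrm{img}}^{(0)} = I$ the image feature matrix, $A_{\mathrm{img}}$ is the weighted image adjacency matrix, $W_{\mathrm{img}}^{(l)}$ is the weight matrix for the l-th layer of image features, and $\sigma$ is the activation function.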
After the individual feature propagation, the next step is to combine the gene and image features. The gene embedding $H_g$ and the image embedding $H_{\mathrm{img}}$ are fused to obtain $H_c$, which is then passed through a final fusion convolution layer with weight matrix $W_c$ to produce the output $Z$.
After further convolution and activation operations, the final output of the fused features will be used for downstream spatial clustering tasks. Additionally, the model supports the option to use image loss by setting parameters, allowing for optimization based on image features. This design enables the AugGCL model not only to retain the spatial structural characteristics of cells but also to enhance the correlation of gene expression among neighboring cells, providing more accurate support for spatial data analysis.
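The two-stream forward pass can be sketched in NumPy as below. This is a simplified illustration under stated assumptions: each stream uses a single weighted graph convolution instead of the [512, 30] stack, fusion is done by concatenation, and the fusion layer reuses the gene graph. None of these three choices is confirmed by the text, and all names (`gcn_layer`, `two_stream_forward`, `W_g`, `W_img`, `W_c`) are hypothetical.

```python
import numpy as np

def elu(x):
    """ELU activation; the exponent is clipped at 0 to avoid overflow."""
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def gcn_layer(adj, H, W):
    """One weighted graph convolution: aggregate neighbors, project, activate."""
    return elu(adj @ H @ W)

def two_stream_forward(X_aug, img_feats, A_g, A_img, params):
    H_g = gcn_layer(A_g, X_aug, params["W_g"])            # gene stream
    H_img = gcn_layer(A_img, img_feats, params["W_img"])  # image stream
    H_c = np.concatenate([H_g, H_img], axis=1)            # assumed fusion: concatenation
    Z = gcn_layer(A_g, H_c, params["W_c"])                # assumed: fusion layer on gene graph
    return H_g, H_img, Z

rng = np.random.default_rng(0)
N, n_genes, n_pix, hidden, out = 5, 8, 6, 4, 3
X_aug = rng.random((N, n_genes))
img_feats = rng.random((N, n_pix))
A_g = np.full((N, N), 1.0 / N)      # toy graphs: every cell averages all cells
A_img = np.full((N, N), 1.0 / N)
params = {
    "W_g": rng.normal(scale=0.1, size=(n_genes, hidden)),
    "W_img": rng.normal(scale=0.1, size=(n_pix, hidden)),
    "W_c": rng.normal(scale=0.1, size=(2 * hidden, out)),
}
H_g, H_img, Z = two_stream_forward(X_aug, img_feats, A_g, A_img, params)
```

The rows of `Z` are the per-cell embeddings that would be handed to the clustering step. In a trainable version the weight matrices would be learned by minimizing the reconstruction losses.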
Loss function construction
The overall optimization objective consists of three components. The gene reconstruction loss (gloss) calculates the difference between the original gene data and the reconstructed gene matrix, measured using Mean Squared Error (MSE). The goal is to minimize this difference, helping the model more accurately restore the gene data. Next, the image reconstruction loss (iloss) measures the difference between the original image features and the reconstructed image matrix, ensuring that the spatial features of the image are well restored. Finally, the regularization loss (reloss) calculates the difference between the fused reconstructed matrix and the original gene data, ensuring that the model can better fuse image and gene data while maintaining consistent feature representations. Note that the regularization loss is optional and can be enabled when the task requires it. The specific calculation process is as follows:
$$\mathcal{L}_{\mathrm{gloss}} = \mathrm{MSE}\big(X, D_g(H_g)\big), \qquad \mathcal{L}_{\mathrm{iloss}} = \mathrm{MSE}\big(I, D_{\mathrm{img}}(H_{\mathrm{img}})\big), \qquad \mathcal{L}_{\mathrm{reloss}} = \mathrm{MSE}\big(X, D_c(Z)\big)$$
where $X$ is the original gene expression matrix, $I$ is the image feature matrix, $H_g$, $H_{\mathrm{img}}$, and $Z$ are the gene, image, and fused embeddings, and $D_g$, $D_{\mathrm{img}}$, and $D_c$ represent the decoding operations for gene features, image features, and the fused features, respectively.
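The final total loss is the weighted sum of these three components, with weights controlled by hyperparameters $\lambda_1$, $\lambda_2$, and $\lambda_3$. The formula is as follows:
$$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{gloss}} + \lambda_2 \mathcal{L}_{\mathrm{iloss}} + \lambda_3 \mathcal{L}_{\mathrm{reloss}}$$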
By minimizing this total loss, the model can simultaneously optimize the reconstruction of gene data, image data, and the fused representation of both, thereby improving overall performance.
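A small sketch of how the three terms combine, assuming the reconstructions are already available as arrays. The function names, the weight names `lam_g`, `lam_i`, `lam_re`, and the toy matrices are illustrative assumptions; setting `lam_re=0.0` corresponds to switching off the optional regularization term.

```python
import numpy as np

def mse(a, b):
    """Mean squared error over all entries."""
    return float(np.mean((a - b) ** 2))

def total_loss(X, img, X_rec, img_rec, X_fused_rec,
               lam_g=1.0, lam_i=1.0, lam_re=0.0):
    gloss = mse(X, X_rec)          # gene reconstruction
    iloss = mse(img, img_rec)      # image reconstruction
    reloss = mse(X, X_fused_rec)   # fused reconstruction against original genes
    total = lam_g * gloss + lam_i * iloss + lam_re * reloss
    return total, (gloss, iloss, reloss)

# Toy matrices chosen so that each term is easy to verify by hand.
X = np.zeros((2, 2))
img = np.zeros((2, 3))
loss, parts = total_loss(
    X, img,
    X_rec=np.ones((2, 2)),             # error 1 everywhere -> gloss = 1
    img_rec=2 * np.ones((2, 3)),       # error 2 everywhere -> iloss = 4
    X_fused_rec=3 * np.ones((2, 2)),   # error 3 everywhere -> reloss = 9
    lam_g=1.0, lam_i=0.5, lam_re=0.1,
)
```

Here the weighted sum is 1.0 × 1 + 0.5 × 4 + 0.1 × 9 = 3.9.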
Spatial clustering and visualization
In this study, two clustering methods, K-means and MCLUST, are employed to effectively identify complex spatial structures. The K-means clustering method, based on centroid optimization, efficiently detects cell types and their interactions, making it particularly suitable for large-scale datasets. On the other hand, MCLUST employs a Gaussian mixture model, enabling it to handle clusters of varying shapes and sizes, thus enhancing the interpretability of spatial data. Dimensionality reduction techniques, such as t-SNE or UMAP, are used to map high-dimensional data into lower-dimensional space, complemented by heatmaps and spatial distribution maps for visualization, thereby improving the understanding of cell distribution. By combining these two clustering methods with visualization techniques, an in-depth analysis of cellular spatial distribution can be conducted, specific cell subpopulations can be identified, and their variations under different biological conditions can be explored, providing crucial support for subsequent biomedical research.
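To evaluate the clustering performance quantitatively, the adjusted Rand index (ARI), a widely used external clustering metric, is employed [44–46]. The ARI assesses the agreement between the predicted clustering labels and the manually annotated ground truth while adjusting for chance. The ARI is calculated as follows:
$$\mathrm{ARI} = \frac{\sum_{i,j} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{N}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{N}{2}}$$
where $n_{ij}$ is the number of cells assigned to both predicted cluster $i$ and ground-truth class $j$, $a_i = \sum_j n_{ij}$, $b_j = \sum_i n_{ij}$, and $N$ is the total number of cells.
The calculation of the ARI compares pairs of cells in the clustering result with pairs of cells in the ground truth labels, counting the pairs that are grouped together, or separated, in both. The ARI is at most 1, is close to 0 for random labelings, and can be negative when the agreement is worse than expected by chance. A value of 1 signifies perfect agreement between the predicted clusters and the true cell types.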
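For illustration, the ARI can be computed from the contingency counts with the Python standard library alone. This is a sketch of the standard Hubert and Arabie formulation [45], not the evaluation code of the study; the function name and the guard for degenerate inputs are choices made here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from pair counts of the contingency table."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:               # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # same partition, renamed labels
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # every true pair is split
```

The first call gives 1.0 because the ARI ignores label names, and the second gives −0.5, a case where agreement is worse than chance.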
Normalized mutual information (NMI) measures the amount of shared information between two clustering results and is commonly used to assess the accuracy of unsupervised clustering methods. Given two sets of cluster labels U and V, NMI is the mutual information MI(U, V) normalized by the entropies K(U) and K(V) of the two label sets. The mutual information and the entropies are computed as follows:
$$\mathrm{MI}(U, V) = \sum_{i} \sum_{j} P(i, j) \log \frac{P(i, j)}{P(i)\, P'(j)}$$
$$K(U) = -\sum_{i} P(i) \log P(i), \qquad K(V) = -\sum_{j} P'(j) \log P'(j)$$
where $P(i)$ is the probability of a sample belonging to the i-th cluster in U, and $P'(j)$ is the probability of a sample belonging to the j-th cluster in V. The joint probability is $P(i, j) = n_{ij} / N$, where $n_{ij}$ denotes the number of samples simultaneously assigned to cluster $U_i$ and cluster $V_j$, and N is the total number of samples. An NMI value closer to 1 indicates a higher level of agreement between the two clustering results.
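The quantities above can be computed directly from label counts, as in the following standard-library sketch. The normalization by the arithmetic mean of the two entropies is an assumption made here (other variants use the geometric mean, minimum, or maximum), and the function names are illustrative.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a label set (natural log)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """MI(U, V) with P(i, j) = n_ij / N."""
    n = len(u)
    size_u, size_v = Counter(u), Counter(v)
    mi = 0.0
    for (i, j), n_ij in Counter(zip(u, v)).items():
        # P(i, j) / (P(i) P'(j)) simplifies to N * n_ij / (|U_i| * |V_j|).
        mi += (n_ij / n) * log(n * n_ij / (size_u[i] * size_v[j]))
    return mi

def normalized_mutual_information(u, v):
    # Assumption: normalize by the arithmetic mean of the two entropies.
    denom = 0.5 * (entropy(u) + entropy(v))
    if denom == 0.0:   # both label sets contain a single cluster
        return 1.0
    return mutual_information(u, v) / denom

nmi_same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
nmi_indep = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions with renamed labels give an NMI of 1, while the second pair of label sets is statistically independent and gives 0.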
Ablation study and statistical significance analysis
The ablation study assesses the impact of different components of the AugGCL model, specifically gene modality, image modality, and multimodal fusion. The results show that combining both gene and image modalities consistently improves performance across all samples, highlighting the advantages of multimodal integration. The impact of the neighborhood information aggregation mechanism was also demonstrated, showing how the augmented gene matrix, after applying the mechanism, mitigates sparsity and enhances spatial domain identification. For more details, consult Figure 2 and Table 1 in S1 Appendix, which appear in (2) Ablation Study.
To compare the performance of AugGCL against the baseline models, statistical significance analysis was conducted using the independent-samples t-test. The results show that AugGCL outperforms the other models in most comparisons, and the p-values indicate that these improvements are statistically significant. The statistical performance metrics and p-value comparisons can be found in Table 2 and Table 3 in S1 Appendix, which appear in (3) Statistical Significance Analysis.
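As a rough illustration of the test statistic, the sketch below computes the pooled-variance (Student) t statistic for two independent samples using only the standard library. Whether the study used the pooled or the unequal-variance (Welch) form is not stated, so the pooled form is an assumption; the score lists are toy values, not results from the paper.

```python
from math import sqrt
from statistics import mean, variance

def independent_t_statistic(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    std_err = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(a) - mean(b)) / std_err, dof

# Toy values standing in for per-sample scores of two methods.
scores_a = [3.0, 4.0, 5.0]
scores_b = [1.0, 2.0, 3.0]
t_stat, dof = independent_t_statistic(scores_a, scores_b)
```

The p-value then follows from the t distribution with `dof` degrees of freedom; a statistics library such as SciPy (`scipy.stats.ttest_ind`) returns both the statistic and the p-value in one call.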
Downstream analysis
In this section, downstream biological analyses [47] are performed to gain deeper insights into the clustering results and data processing. These analyses help uncover cellular and molecular characteristics. The gene expression heatmap offers a clear visualization of expression patterns across various tissue regions, enabling the identification of gene variability and its distribution among different cell types. The volcano plot compares log-fold changes in gene expression with statistical significance, helping to identify genes that are significantly upregulated or downregulated, and revealing cellular responses under various biological conditions. Gene Ontology enrichment analysis identifies biological processes, molecular functions, and cellular components that are significantly overrepresented among differentially expressed genes, providing further insight into the biological functions and regulatory mechanisms of the identified gene clusters. By combining these analyses, a comprehensive understanding of gene distribution in spatial contexts is achieved, functional differences across cell types are explored, and essential support is provided for subsequent biomedical research.
Supporting information
S1 Appendix. Supplementary materials.
(1) Explanation of the Neighborhood Information Aggregation. (2) Ablation Study. (3) Statistical Significance Analysis.
https://doi.org/10.1371/journal.pcbi.1013912.s001
(PDF)
References
- 1. Moffitt JR, Lundberg E, Heyn H. The emerging landscape of spatial profiling technologies. Nat Rev Genet. 2022;23(12):741–59. pmid:35859028
- 2. Moses L, Pachter L. Museum of spatial transcriptomics. Nat Methods. 2022;19(5):534–46. pmid:35273392
- 3. Palla G, Fischer DS, Regev A, Theis FJ. Spatial components of molecular tissue biology. Nat Biotechnol. 2022;40(3):308–18. pmid:35132261
- 4. He S, Bhatt R, Brown C, Brown EA, Buhr DL, Chantranuvatana K, et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat Biotechnol. 2022;40(12):1794–806. pmid:36203011
- 5. Fang R, Xia C, Close JL, Zhang M, He J, Huang Z, et al. Conservation and divergence of cortical cell organization in human and mouse revealed by MERFISH. Science. 2022;377(6601):56–62. pmid:35771910
- 6. Giacomello S, Salmén F, Terebieniec BK, Vickovic S, Navarro JF, Alexeyenko A, et al. Spatially resolved transcriptome profiling in model plant species. Nat Plants. 2017;3:17061. pmid:28481330
- 7. Berglund E, et al. Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. Nat Commun. 2018;9:2419.
- 8. Thrane K, Eriksson H, Maaskola J, Hansson J, Lundeberg J. Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Res. 2018;78(20):5970–9. pmid:30154148
- 9. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587.e29. pmid:34062119
- 10. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15. pmid:29409532
- 11. Zhao E, Stone MR, Ren X, Guenthoer J, Smythe KS, Pulliam T, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021;39(11):1375–84. pmid:34083791
- 12. Xu C, Jin X, Wei S, Wang P, Luo M, Xu Z, et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 2022;50(22):e131. pmid:36250636
- 13. Tang Z, Li Z, Hou T, Zhang T, Yang B, Su J, et al. SiGra: single-cell spatial elucidation through an image-augmented graph transformer. Nat Commun. 2023;14(1):5618. pmid:37699885
- 14. Chitra U, Arnold BJ, Sarkar H, Sanno K, Ma C, Lopez-Darwin S, et al. Mapping the topography of spatial gene expression with interpretable deep learning. Nat Methods. 2025;22(2):298–309. pmid:39849132
- 15. Li B, Tang Z, Budhkar A, Liu X, Zhang T, Yang B, et al. SpaIM: single-cell spatial transcriptomics imputation via style transfer. Nat Commun. 2025;16(1):7861. pmid:40849313
- 16. Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353(6294):78–82. pmid:27365449
- 17. van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ, et al. Recovering gene interactions from single-cell data using data diffusion. Cell. 2018;174(3):716-729.e27. pmid:29961576
- 18. Yun S, Jeong M, Kim R, Kang J, Kim HJ. Graph transformer networks. Advances in Neural Information Processing Systems. 2019;32:11983–93.
- 19. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint. 2016. arXiv:1609.02907
- 20. Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun. 2022;13(1):1739. pmid:35365632
- 21. İnkaya T. Consensus similarity graph construction for clustering. Pattern Anal Appl. 2023;26:703–33.
- 22. Wei S. Multi-angle information aggregation for inductive temporal graph embedding. PeerJ Comput Sci. 2024;10:e2560. pmid:39650384
- 23. Pham D, Tan X, Balderson B, Xu J, Grice LF, Yoon S, et al. Robust mapping of spatiotemporal trajectories and cell-cell interactions in healthy and diseased tissues. Nat Commun. 2023;14(1):7739. pmid:38007580
- 24. Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021;18(11):1342–51. pmid:34711970
- 25. Xu H, Fu H, Long Y, Ang KS, Sethi R, Chong K, et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 2024;16(1):12. pmid:38217035
- 26. Long Y, Ang KS, Li M, Chong KLK, Sethi R, Zhong C, et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun. 2023;14(1):1155. pmid:36859400
- 27. Camacho J, Ejaz E, Ariza J, Noctor SC, Martínez-Cerdeño V. RELN-expressing neuron density in layer I of the superior temporal lobe is similar in human brains with autism and in age-matched controls. Neurosci Lett. 2014;579:163–7. pmid:25067827
- 28. Arnsten AFT, Woo E, Yang S, Wang M, Datta D. Unusual molecular regulation of dorsolateral prefrontal cortex layer iii synapses increases vulnerability to genetic and environmental insults in Schizophrenia. Biol Psychiatry. 2022;92(6):480–90. pmid:35305820
- 29. Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK, Williams SR, et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat Neurosci. 2021;24(3):425–36. pmid:33558695
- 30. Alon S, Goodwin DR, Sinha A, Wassie AT, Chen F, Daugharthy ER, et al. Expansion sequencing: spatially precise in situ transcriptomics in intact biological systems. Science. 2021;371(6528):eaax2656. pmid:33509999
- 31. Zhang Y, Jin Y, Li J, Yan Y, Wang T, Wang X, et al. CXCL14 as a key regulator of neuronal development: insights from its receptor and multi-omics analysis. Int J Mol Sci. 2024;25(3):1651. pmid:38338930
- 32. Camacho J, Ejaz E, Ariza J, Noctor SC, Martínez-Cerdeño V. RELN-expressing neuron density in layer I of the superior temporal lobe is similar in human brains with autism and in age-matched controls. Neurosci Lett. 2014;579:163–7. pmid:25067827
- 33. Zhao X, Liu L, Huang Z, Zhu F, Zhang H, Zhou D. PTN from Leydig cells activates SDC2 and modulates human spermatogonial stem cell proliferation and survival via GFRA1. Biol Res. 2024;57(1):66. pmid:39285301
- 34. Xia L, Yue Y, Li M, Zhang Y-N, Zhao L, Lu W, et al. CNN3 acts as a potential oncogene in cervical cancer by affecting RPLP1 mRNA expression. Sci Rep. 2020;10(1):2427. pmid:32051425
- 35. Li M, Zhang X, Ang KS, Ling J, Sethi R, Lee NYS, et al. DISCO: a database of deeply integrated human single-cell omics data. Nucleic Acids Res. 2022;50(D1):D596–602. pmid:34791375
- 36. Hu Q, Hong Y, Qi P, Lu G, Mai X, Xu S, et al. Atlas of breast cancer infiltrated B-lymphocytes revealed by paired single-cell RNA-sequencing and antigen receptor profiling. Nat Commun. 2021;12(1):2186. pmid:33846305
- 37. Zhang Y, Chen H, Mo H, Hu X, Gao R, Zhao Y, et al. Single-cell analyses reveal key immune cell subsets associated with response to PD-L1 blockade in triple-negative breast cancer. Cancer Cell. 2021;39(12):1578-1593.e8. pmid:34653365
- 38. Bassez A, Vos H, Van Dyck L, Floris G, Arijs I, Desmedt C, et al. A single-cell map of intratumoral changes during anti-PD1 treatment of patients with breast cancer. Nat Med. 2021;27(5):820–32. pmid:33958794
- 39. Soman A, Asha Nair S. Unfolding the cascade of SERPINA3: inflammation to cancer. Biochim Biophys Acta Rev Cancer. 2022;1877(5):188760. pmid:35843512
- 40. Schrader CH, Kolb M, Zaoui K, Flechtenmacher C, Grabe N, Weber K-J, et al. Kallikrein-related peptidase 6 regulates epithelial-to-mesenchymal transition and serves as prognostic biomarker for head and neck squamous cell carcinoma patients. Mol Cancer. 2015;14:107. pmid:25990935
- 41. Nieddu V, Melocchi V, Battistini C, Franciosa G, Lupia M, Stellato C, et al. Matrix Gla Protein drives stemness and tumor initiation in ovarian cancer. Cell Death Dis. 2023;14(3):220. pmid:36977707
- 42. Chen A, Liao S, Cheng M, Ma K, Wu L, Lai Y, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022;185(10):1777-1792.e21. pmid:35512705
- 43. Estrada E. Communicability cosine distance: similarity and symmetry in graphs/networks. Comp Appl Math. 2024;43(1).
- 44. Shi Q, Li X, Peng Q, Zhang C, Chen L. scDA: Single cell discriminant analysis for single-cell RNA sequencing data. Comput Struct Biotechnol J. 2021;19:3234–44. pmid:34141142
- 45. Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2(1):193–218.
- 46. Zhang C, Liu J, Shi Q, Zeng T, Chen L. Comparative network stratification analysis for identifying functional interpretable network biomarkers. BMC Bioinformatics. 2017;18(Suppl 3):48. pmid:28361683
- 47. Liu J, Tran V, Vemuri VNP, Byrne A, Borja M, Kim YJ, et al. Concordance of MERFISH spatial transcriptomics with bulk and single-cell RNA sequencing. Life Sci Alliance. 2022;6(1):e202201701. pmid:36526371