
SpatialKNifeY (SKNY): Extending from spatial domain to surrounding area to identify microenvironment features with single-cell spatial omics data

  • Shunsuke A. Sakai,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan, Department of Radiation Oncology, National Cancer Center Hospital East, Kashiwa, Chiba, Japan

  • Ryosuke Nomura,

    Roles Investigation, Methodology, Validation, Visualization, Writing – review & editing

    Affiliations Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan

  • Satoi Nagasawa,

    Roles Investigation, Methodology, Writing – review & editing

    Affiliations Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan, Department of Breast Surgery, National Cancer Center Hospital East, Kashiwa, Chiba, Japan

  • SungGi Chi,

    Roles Investigation, Methodology, Writing – review & editing

    Affiliation Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan

  • Ayako Suzuki,

    Roles Investigation, Methodology, Writing – review & editing

    Affiliation Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan

  • Yutaka Suzuki,

    Roles Investigation, Methodology, Writing – review & editing

    Affiliation Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan

  • Mitsuho Imai,

    Roles Data curation, Investigation, Resources, Writing – review & editing

    Affiliations Translational Research Support Office, National Cancer Center Hospital East, Chiba, Japan, Department of Genetic Medicine and Services, National Cancer Center Hospital East, Chiba, Japan

  • Yoshiaki Nakamura,

    Roles Data curation, Investigation, Resources, Writing – review & editing

    Affiliations Translational Research Support Office, National Cancer Center Hospital East, Chiba, Japan, Department of Gastroenterology and Gastrointestinal Oncology, National Cancer Center Hospital East, Chiba, Japan

  • Takayuki Yoshino,

    Roles Data curation, Investigation, Resources, Writing – review & editing

    Affiliations Translational Research Support Office, National Cancer Center Hospital East, Chiba, Japan, Department of Gastroenterology and Gastrointestinal Oncology, National Cancer Center Hospital East, Chiba, Japan

  • Shumpei Ishikawa,

    Roles Methodology, Writing – review & editing

    Affiliations Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan, Division of Pathology, National Cancer Center Exploratory Oncology Research & Clinical Trial Center, Kashiwa, Chiba, Japan

  • Katsuya Tsuchihara,

    Roles Supervision, Writing – review & editing

    Affiliations Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan

  • Shun-Ichiro Kageyama,

    Roles Conceptualization, Supervision, Writing – review & editing

    riuyamas@east.ncc.go.jp (RY); skageyam@east.ncc.go.jp (SIK)

    Affiliations Department of Radiation Oncology, National Cancer Center Hospital East, Kashiwa, Chiba, Japan, Division of Radiation Oncology and Particle Therapy, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan

  • Riu Yamashita

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    riuyamas@east.ncc.go.jp (RY); skageyam@east.ncc.go.jp (SIK)

    Affiliations Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan

Abstract

Single-cell spatial omics analysis requires consideration of biological functions and mechanisms in a microenvironment. However, microenvironment analysis using bioinformatic methods is limited by the need to detect histological morphology and extend it to the surrounding area. In this study, we developed SpatialKNifeY (SKNY), an image-processing-based toolkit that detects spatial domains that potentially reflect histology and extends these domains to the microenvironment. Using spatial transcriptomic data from breast cancer, we applied the SKNY algorithm to identify tumor spatial domains, followed by clustering of the domains, trajectory estimation, and spatial extension to the tumor microenvironment (TME). The results of the trajectory estimation were consistent with the known mechanisms of cancer progression. We observed tumor vascularization and immunodeficiency in the TME at mid- and late-stage progression. Furthermore, we applied SKNY to integrate and cluster the spatial domains of 14 patients with metastatic colorectal cancer, and the resulting clusters were distinguished by their TME characteristics. In conclusion, SKNY facilitates the determination of functions and mechanisms in the microenvironment and the cataloguing of microenvironmental features.

Author summary

The advent of high-resolution and high-density spatial omics platforms has created a growing need for practical analytical tools in cancer research. While significant efforts have been made to develop unsupervised clustering methods, advancements in downstream analyses have been relatively slower. To address this issue, we developed SpatialKNifeY (SKNY), a versatile toolkit designed to analyze spatial omics data by defining the spatial domains of cancer cells and their microenvironment. SKNY offers a suite of analyses, including clustering and trajectory analysis, with a unique capability to extract spatial domains and their surrounding regions. The tool enables integrated studies of cancer cells alongside their stroma, immune cells, and vascular environment. Using SKNY, we quantified the vascular and immune environments surrounding cancer cells during progression, revealing insights consistent with established cancer pathology and progression models. These results highlight the toolkit’s utility and the biological interpretability of its analyses, providing a valuable resource for spatial omics research.

Introduction

Single-cell spatial omics platforms, such as Xenium, CosMx [1], and PhenoCycler [2], enable the investigation of hundreds or thousands of genes in various organs and tissue types. Because these methods operate at single-cell resolution, they provide deep insights into the localization of the expression of multiple genes in a particular microenvironment, which includes not only cancer cells but also immune cells and non-immune stromal cells. A key consideration in microenvironment analysis is the integration of gene expression and histological features to obtain a comprehensive understanding of biological functions and mechanisms. Classical methods that examine the tumor microenvironment (TME) under a microscope capture histological features through staining or fluorescence-based technologies, leading to the discovery of pathological mechanisms in the microenvironment [3]. However, in the current omics era, with large numbers of specimens and large gene panels, such manual approaches are inefficient and impractical.

To address the high throughput of omics data, several third-party tools, such as Seurat and Scanpy, have been developed to efficiently analyze expression data covering thousands of genes and samples [4–10]. Methods inherited from single-cell RNA-seq have been implemented, including cell clustering [11–14], trajectory analysis [15–18], and ligand-receptor analysis [19–21]. These analytical methods use gene expression but do not consider molecular or cellular location. Hence, the integration of gene expression and location information is necessary for optimizing spatial omics analysis of the microenvironment.

In response to this need, several tools dedicated to spatial omics have been developed, such as clustering methods that integrate positional information with gene expression [22] and ligand-receptor enrichment analysis at each spot in a space partitioned into a grid [23]. Although these methods are attractive for analyzing spatial information, their use for microenvironmental analysis is limited by the lack of direct histological information. Recently, the STAGATE algorithm [24] was developed to detect spatial domains (i.e., regions with similar spatial expression patterns), and Sopa [25] was built to extend spatial domain analysis to single-cell spatial omics data. These methods can detect spatial domains that reflect and functionally resemble tumor, stromal, and vascular histologies. Even though these tools are invaluable for extracting and characterizing spatial domains, they are limited in analyzing the area surrounding a particular spatial domain, that is, its microenvironment.

Here, we extended the concept of the spatial domain to the microenvironment, which encompasses the inner, peri-, and outer sections of the spatial domain, with the aim of estimating the functions and mechanisms of the microenvironment (Fig 1A). We developed an image-processing-based toolkit, SpatialKNifeY (SKNY), to analyze spatial domains in spatial omics data (Outputs 1–3) and extend them to the microenvironment (Output 4) (Fig 1B). Single-cell spatial transcriptomics data from Xenium [26] were used to detect tumor spatial domains for analyzing the TME (Output 1: Detection) (Fig 1C). Clustering of the spatial domains yielded clusters consistent with malignancy and subtypes (Output 2: Clustering), and the trajectory among spatial domains was estimated to represent the tumor progression process (Output 3: Trajectory estimation). The analysis extended from the spatial domain into the TME and assessed the infiltration of endothelial cells into the tumor (Output 4: Spatial stratification). Moreover, to conduct an integrated analysis of multiple samples, SKNY was applied to a Xenium dataset of 14 patients with metastatic colorectal cancer. The results suggest that SKNY enables microenvironment analysis and may provide essential insights into pathological functions. The SKNY algorithm is available under an open-source license (https://github.com/shusakai/skny).

Fig 1. SpatialKNifeY analysis landscape.

(A) The concept of the extension from spatial omics data and spatial domain to the microenvironment. (B) Implementation of SpatialKNifeY (SKNY). A Python library of SKNY depends on stlearn [23] and scanpy [9] functions (see “Methods”) and AnnData object programming [10]. (C) Outputs from SKNY analysis. Detection (Output 1, see “Fig 2”) delineates spatial domains based on a user’s positive and negative marker gene expressions. Clustering (Output 2, see “Fig 3”) makes clusters of spatial domain units based on the mean expression of each gene. Trajectory estimation (Output 3, see “Fig 4”) refers to the trajectory among spatial domains and pseudotime. Spatial stratification (Output 4, see “Fig 2”, “Fig 5”, and “Fig 6”) measures the distance from tumor boundary to each coordinate on the space and makes contour lines based on the distance.

https://doi.org/10.1371/journal.pcbi.1012854.g001

Results

SKNY detects tumor spatial domains from Xenium breast cancer data

In the present study, to detect the spatial domain with the SKNY, Xenium breast cancer data from a previous report were used [26]. A hematoxylin and eosin (HE)-stained image of the specimen from a previous report is shown (Fig 2A). The specimen on a single slide contained various tumor tissues, including ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Using Xenium data, the SKNY algorithm was applied to detect tumor spatial domains (yellow) and extract their boundaries (green) based on the expression levels of the epithelial cell marker CDH1 (Fig 2B). Independently, the STAGATE algorithm [24] was used to identify tumor spatial domains (S1A, S1B, and S1C Fig), resulting in high concordance with the SKNY results (Jaccard similarity coefficient = 0.85). The results suggest that the image-processing-based spatial domain extraction of the SKNY method is consistent with previous methods. Moreover, spatial domains were extracted from ovarian cancer, colorectal cancer, and melanoma data and were found to be visually consistent with the HE-stained images (S2 Fig). The inward/outward areas from the extracted spatial domain boundaries were measured (S3A Fig), and the contour line was delineated at 30-µm intervals to spatially stratify the TME (Figs 2C and S3B). High-power field images, including single (Fig 2C Left), triple (Fig 2C Middle), and multiple spatial domains (Fig 2C Right), showed visual concordance between the spatial domains and HE staining images for tumor detection.
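The detection step can be illustrated with a minimal sketch. It assumes transcript coordinates (in µm) for a single positive marker such as CDH1, bins them into a square grid, keeps bins whose count reaches a threshold, and treats 4-connected groups of such bins as spatial domains; the Jaccard coefficient then compares two sets of bins, for example SKNY-derived and STAGATE-derived labels. This is not the SKNY implementation: the function names (`rasterize`, `detect_domains`, `jaccard`), the bin size, and both thresholds are hypothetical, and SKNY's negative-marker handling and other image-processing steps are omitted.

```python
from collections import deque


def rasterize(points, bin_um):
    """Count marker transcripts per square bin; points are (x_um, y_um) tuples."""
    counts = {}
    for x, y in points:
        key = (int(y // bin_um), int(x // bin_um))  # (row, col) of the bin
        counts[key] = counts.get(key, 0) + 1
    return counts


def detect_domains(points, bin_um=10.0, min_count=2, min_bins=2):
    """Threshold binned marker counts and return 4-connected components.

    Each domain is a set of (row, col) bins; components smaller than
    `min_bins` bins are discarded as noise. All parameters are illustrative.
    """
    counts = rasterize(points, bin_um)
    positive = {b for b, n in counts.items() if n >= min_count}
    seen, domains = set(), []
    for start in positive:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:  # breadth-first flood fill over neighboring positive bins
            r, c = queue.popleft()
            component.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in positive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(component) >= min_bins:
            domains.append(component)
    return sorted(domains, key=min)  # deterministic order for reporting


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of bins."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Thresholding followed by connected-component labeling is the simplest image-processing route from expression to regions; smoothing or morphological closing before labeling would make the boundaries less sensitive to sparsely populated bins.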

Fig 2. Detection of spatial domain using Xenium data accurately discriminates between the tumor and stromal region.

(A) H&E staining image of breast cancer. (B) Detected spatial domains. The yellow and green colors indicate spatial domains and their boundaries, respectively. (C) H&E staining images and spatial domain(s) from three ROIs. The red contour lines indicate the distance from the surface of the spatial domains at 30-μm intervals. (D) Dotplot showing marker genes of each cell type. The color bar indicates the scaled mean count, and the dot size indicates the percentage of gene expression. (E) Spatial expression distribution of cell marker genes in the ROI. The color bar indicates the scaled mean count.

https://doi.org/10.1371/journal.pcbi.1012854.g002

To confirm that the spatial domains were partitioned correctly between the tumor and stroma, the expression levels of several marker genes were examined in the stratified (−90, −60] to (+120, +150] sections across the whole field. Cancer cell marker genes, such as CDH1, EPCAM, FOXA1, and GATA3, were enriched within the spatial domain (sections (−120, −90], (−90, −60], (−60, −30], and (−30, 0]) (Fig 2D). The myoepithelial cell marker genes, such as KRT5, KRT14, MYLK, and ACTA2, were enriched around the spatial domain boundary (the (0, +30] section), and the macrophage, lymphocyte, endothelial cell, and stromal cell markers, such as CD68, TRAC, PECAM1, and MMP2, respectively, were enriched outside the domain (the (+30, +60], (+60, +90], (+90, +120], and (+120, +150] sections). The spatial localization of gene expression showed that EPCAM was overrepresented within the spatial domain, ACTA2 at the boundary, and PECAM1, TRAC, and MMP2 outside the domain (Fig 2E). The results suggest that the sections stratified using the SKNY algorithm can be divided into tumor, peritumoral, and stromal regions.
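Stratification into sections such as (−30, 0] and (0, +30] can be sketched as a signed distance followed by binning into left-open, right-closed intervals. The sketch below is a hypothetical illustration, not SKNY's code: it assumes the domain mask is a small boolean grid with a known pixel size, computes distances by brute force, and places the boundary half a pixel beyond the nearest pixel of the opposite class. The names `signed_distance` and `section_label` are illustrative, and the labels use ASCII minus signs.

```python
import math


def signed_distance(mask, pixel_um):
    """Approximate signed distance (µm) to the domain boundary for each pixel.

    `mask` is a list of rows of booleans (True = inside a spatial domain) and
    must contain both classes. Inside pixels get negative values, outside
    pixels positive ones. Brute force, O(pixels^2): fine only for small grids.
    """
    cells = [(r, c, v) for r, row in enumerate(mask) for c, v in enumerate(row)]
    inside = [(r, c) for r, c, v in cells if v]
    outside = [(r, c) for r, c, v in cells if not v]
    if not inside or not outside:
        raise ValueError("mask must contain both inside and outside pixels")
    dist = [[0.0] * len(row) for row in mask]
    for r, c, v in cells:
        targets = outside if v else inside
        nearest = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
        d = (nearest - 0.5) * pixel_um  # boundary sits half a pixel away
        dist[r][c] = -d if v else d
    return dist


def section_label(d_um, width_um=30):
    """Map a signed distance to a left-open, right-closed section such as '(0, +30]'."""
    k = math.ceil(d_um / width_um)
    lo, hi = (k - 1) * width_um, k * width_um

    def fmt(v):
        return "0" if v == 0 else f"{v:+g}"

    return f"({fmt(lo)}, {fmt(hi)}]"
```

For example, with a 30-µm pixel, a one-row mask `[True, True, False, False]` yields distances of −45, −15, 15, and 45 µm, which fall into the (−60, −30], (−30, 0], (0, +30], and (+30, +60] sections. A production version would use a proper Euclidean distance transform rather than the brute-force loop.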

SKNY clusters the spatial domains with multiple mixed cell types into subclusters using the UMAP algorithm

Next, to assess the diversity of cells within the extracted spatial domains, we compared the α-diversity index Chao1, computed from gene expression, between cancer cells and spatial domains. Gene expression in the spatial domains was significantly more diverse than that in cancer cells (P<0.001) (S4A Fig), suggesting that the spatial domains contain various cells, not only cancer cells. A similar trend was confirmed between the various cells independently annotated by Janesick et al. [26] and the spatial domains using three α-diversity indices: Chao1, observed features (the number of unique genes detected in each cell or spatial domain), and Shannon (S4B–S4D Fig). Moreover, the variability of Chao1 was greater among spatial domains (standard deviation [SD]=62.1) than among cancer cells (SD=34.2). Hence, we hypothesized that the heterogeneity among spatial domains originated not only from cancer cells but also from diverse cells in the microenvironment. We therefore clustered the spatial domains to evaluate the heterogeneity among spatial domain microenvironments. The dimensionality of the gene expression data (313 genes) was reduced by principal component analysis (PCA), and clustering based on similarity in PCA space yielded nine clusters (0–8). Each spatial domain was placed in a two-dimensional UMAP space (Fig 3A) and in the original tissue space (Fig 3B). To annotate the clusters with histology, we used HE staining images from the previous report (Fig 3C). Combining this histology with the clusters shown in Fig 3B, we found that clusters 2, 3, 5, and 8 corresponded to non-invasive ductal carcinoma in situ (DCIS), whereas clusters 0, 1, 4, 6, and 7 corresponded to invasive ductal carcinoma (IDC).
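For reference, the three α-diversity indices can be computed from a vector of per-gene transcript counts for one cell or one spatial domain. This is a minimal sketch; the paper does not state which implementation or logarithm base was used, so the natural-log Shannon index and the bias-corrected Chao1 fallback (used when no gene is observed exactly twice) are assumptions.

```python
import math


def observed_features(counts):
    """Number of genes with at least one detected transcript."""
    return sum(1 for n in counts if n > 0)


def chao1(counts):
    """Chao1 richness estimate: S_obs + F1^2 / (2 * F2).

    F1 and F2 are the numbers of genes seen exactly once and exactly twice.
    When F2 == 0, the bias-corrected form S_obs + F1 * (F1 - 1) / 2 is used.
    """
    s_obs = observed_features(counts)
    f1 = sum(1 for n in counts if n == 1)
    f2 = sum(1 for n in counts if n == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def shannon(counts):
    """Shannon index -sum(p * ln p) over genes with nonzero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)
```

Because Chao1 adds an estimate of unobserved genes driven by singletons, a spatial domain that pools many cell types with many rarely detected genes will tend to score higher than a single cancer cell, which is the contrast reported above.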

Fig 3. Clustering and annotation of spatial domain based on gene expression.

(A) Two-dimensional UMAP embedding of the gene expression of spatial domains. The colors indicate clusters. (B) Spatial distribution of each cluster. (C) H&E image with histological annotations. (D) Dotplot showing markers of cell types and expression patterns of genes associated with tumor subtypes.

https://doi.org/10.1371/journal.pcbi.1012854.g003

To annotate each spatial domain cluster in more detail, we examined the expression of several marker genes. In clusters 0, 1, 4, 6, and 7 (IDC clusters), MKI67 and ERBB2 were highly expressed. Conversely, in clusters 2, 3, 5, and 8 (DCIS clusters), the myoepithelial cell markers ACTA2, MYLK, and KRT14 were highly expressed. The results suggest that gene expression in each spatial domain was consistent with the histological annotation (Fig 3D). Interestingly, cluster 1 showed high expression of endothelial cell markers, including PECAM1, VWF, and CD93, as well as of the chemokine CXCL12 and the chemokine receptor CXCR4, which are associated with cell migration. Furthermore, expression of MKI67, ABCC11, and FOXA1 was moderate in cluster 1 compared with that in other IDC clusters (Fig 3D). Considering the moderate expression of these cancer-associated genes and its intermediate position in the UMAP space (Fig 3A), cluster 1 may represent a spatial domain at an intermediate stage in the transition from DCIS to IDC.

SKNY estimates spatial domain trajectory, which reflects tumor progression

To estimate the spatial domain trajectory from DCIS to IDC, we used the partition-based graph abstraction (PAGA) algorithm [15] to construct an adjacency graph representing the topology of expression patterns among clusters (Fig 4A). The adjacency graph separates clusters 2, 3, 5, and 8 (DCIS) from clusters 0, 4, 6, and 7 (IDC), with cluster 1 connecting the DCIS and IDC clusters. Additionally, cluster 3, which had the lowest tumor marker gene expression (Fig 3D), was located at the lower end of the graph. This structure is consistent with the hypothesis that DCIS spatial domains transition to IDC via cluster 1. Moreover, we confirmed that other algorithms, such as Monocle [27] and Slingshot [17], estimated a similar trajectory (S5A and S5B Fig). Pseudotime was computed with cluster 3 as the root and mapped onto the two-dimensional PAGA space and the original tissue space (Fig 4B Left and 4C). Pseudotime correlated positively with MKI67 expression (Pearson r=0.52, P<0.001) and negatively with ACTA2 expression (r=−0.47, P<0.001), consistent with pseudotime tracking tumor progression (Fig 4B Middle and Right).
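The reported pseudotime–expression relationships are Pearson correlations. A minimal standard-library sketch of the coefficient and its t statistic (n − 2 degrees of freedom) is shown below; the function names are hypothetical, and converting t to a P value would require a t-distribution CDF (for example from SciPy), which is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences (non-constant)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare against t with n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1 - r * r))
```

Here `x` would hold the pseudotime of each spatial domain and `y` the scaled expression of a gene such as MKI67 or ACTA2 in the same domains.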

Fig 4. Estimating spatial domain trajectory reveals temporal gene expression gradient along cancer progression.

(A) PAGA graph constructed based on the expression data of the spatial domains. (B) PAGA-initialized spatial domain embeddings with estimated pseudotimes, MKI67, and ACTA2 expressions. Pearson’s correlation coefficients and P values were used to evaluate linear relationship between pseudotimes and scaled expression of MKI67/ACTA2. (C) Spatial distribution of clusters and pseudotimes. (D) Heatmap showing gene expression level along with pseudotimes on three progression paths. (E) Representative HE staining images and gene expression on the ROI. Color bar indicates the scaled mean count.

https://doi.org/10.1371/journal.pcbi.1012854.g004

To identify characteristic gene expression at points on this pseudotime axis, we hypothesized three tumor progression paths (cluster 3→8→1→7→4: IDC path #1, 3→5→1→7→4: IDC path #2, and cluster 3→2: DCIS path) and evaluated trends in gene expression along the paths. In IDC paths #1 and #2, the expression of myoepithelial cell markers (ACTG2 and MYLK) tended to decrease in the early stages of progression, whereas that of malignant markers (ERBB2) tended to increase in the later stages (Fig 4D). In contrast, the myoepithelial cell and malignant marker fluctuations appeared to be moderate in the DCIS path. Moreover, in IDC paths #1 and #2, marker genes for endothelial cells (VWF and PECAM1), lymphocytes (CD4), macrophages (CD68), chemokines (CXCL12 and CCL5), and chemokine receptors (CXCR4) were expressed highly at the intermediate stages of cancer progression. Similarly, in the DCIS path, CD4, CD68, and CCL5 showed increased expression with progression. The findings suggest that endothelial cells and chemokine signaling are involved in the transition from DCIS to IDC. We also examined the spatial distribution of gene expression within the region of interest (ROI) corresponding to the transition phase from DCIS to IDC. The results showed a pattern in which PECAM1, VWF, CXCR4, and CXCL12 appeared to infiltrate regions of the tumor delineated by HE staining and EPCAM (Fig 4E). This also suggests that during the transition from DCIS to IDC, endothelial cells may infiltrate tumors and activate chemokine signals.
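Inspecting expression along a hypothesized path, such as IDC path #1 (3→8→1→7→4), amounts to keeping the spatial domains whose cluster lies on the path, ordering them by pseudotime, and smoothing each gene. The sketch below assumes each domain is a dictionary with hypothetical keys `cluster`, `pseudotime`, and `expr`, and uses a centered moving average as the smoother; the smoothing actually used for Fig 4D is not specified here.

```python
def path_trend(domains, path, gene, window=3):
    """Order on-path domains by pseudotime and smooth one gene's expression.

    domains: iterable of dicts with hypothetical keys 'cluster', 'pseudotime',
             and 'expr' (gene -> scaled expression).
    path:    cluster labels on the hypothesized path, e.g. ["3", "8", "1", "7", "4"].
    Returns (pseudotimes, smoothed values) in pseudotime order.
    """
    on_path = set(path)
    ordered = sorted((d for d in domains if d["cluster"] in on_path),
                     key=lambda d: d["pseudotime"])
    values = [d["expr"][gene] for d in ordered]
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))  # window shrinks at the ends
    return [d["pseudotime"] for d in ordered], smoothed
```

Running this once per gene and path and stacking the smoothed rows gives a heatmap-style matrix comparable in spirit to Fig 4D.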

SKNY quantifies the infiltration of endothelial cells into spatial domains in the microenvironment

To quantitatively compare endothelial cell infiltration into tumors, we extended the spatial domains and their expression into inner, peri-, and outer sections, namely, the microenvironment. We stratified the distance from the spatial domain boundary into 30-μm sections and extracted the (−30, 0] (inner), (0, +30] (peri-), and (+30, +60] (outer) sections of each cluster (Fig 5A). Although no significant differences in expression levels were observed in the (+30, +60] section, the endothelial cell markers PECAM1 and VWF differed significantly among clusters in the (−30, 0] section (P=0.0053 and P<0.001, respectively; Kruskal−Wallis test), with relatively high expression in cluster 1 (Fig 5B). The spatial autocorrelation coefficient Geary’s C was calculated for PECAM1, VWF, and CD93 within each cluster; the DCIS-to-IDC cluster (cluster 1) tended to have lower scores, indicating stronger spatial autocorrelation (S5C Fig). To confirm the spatial expression patterns, ROIs from clusters 3, 8, 1, and 0 were extracted, and the distributions of cancer cell (EPCAM and CDH1) and endothelial cell (VWF, PECAM1, and CD93) markers were examined using Xenium Explorer. In clusters 3 and 8 (DCIS clusters), endothelial cell markers were localized outside the spatial domain, whereas in cluster 1 (DCIS-to-IDC cluster), the markers were localized within the tumor spatial domain (Fig 5C). Moreover, in cluster 0 (IDC cluster), the endothelial cell markers appeared to remain in the gaps where the cancer cells had migrated (Fig 5C Right). The results demonstrate that the analysis, extended from the spatial domain to the microenvironment, can reflect the infiltration of endothelial cells into the tumor.
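Geary’s C, used above to score the spatial autocorrelation of endothelial markers, compares squared differences between neighboring values with the overall variance. The sketch below assumes binary weights defined by a distance cutoff (`radius`, hypothetical) between point coordinates; the neighborhood definition used in the paper may differ, for example a k-nearest-neighbor graph.

```python
import math


def gearys_c(coords, values, radius):
    """Geary's C with binary weights w_ij = 1 when i != j and dist(i, j) <= radius.

    C = (N - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * W * sum_i (x_i - mean)^2),
    where W is the sum of all weights. C < 1 suggests positive spatial
    autocorrelation (similar values cluster), C > 1 negative autocorrelation.
    """
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    w_total, num = 0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                w_total += 1
                num += (values[i] - values[j]) ** 2
    if w_total == 0 or ss == 0:
        raise ValueError("no neighbor pairs or constant values")
    return (n - 1) * num / (2 * w_total * ss)
```

On four equally spaced points, a smooth gradient such as 1, 2, 3, 4 gives C = 0.3, whereas an alternating pattern 1, 4, 1, 4 gives C = 1.5, illustrating why lower scores indicate stronger autocorrelation.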

Fig 5. Spatial stratification of each spatial domain cluster elucidating endothelial cell invasion into the tumor.

(A) Spatial distributions of stratified spatial domain clusters into (−30, 0], (0, +30], and (+30, +60] sections. (B) Violin plots showing the endothelial cell marker gene expressions (PECAM1, VWF, and CD93) for each cluster in the (−30, 0], (0, +30], and (+30, +60] sections. The x-axes indicate cluster numbers, and the y-axes indicate scaled gene expression levels. The annotated values are the P values of the significance test. (C) Representative images of DAPI with epithelial cell markers (CDH1 and EPCAM) and endothelial cell marker (CD93, PECAM1, and VWF) expression for four ROIs.

https://doi.org/10.1371/journal.pcbi.1012854.g005

To analyze other stromal components, we examined changes along pseudotime (IDC path #1) in the expression of markers for endothelial cells (PECAM1), macrophages (CD68), matrix metalloproteinases (MMP2), chemokine receptors (CXCR4), and chemokines (CXCL12) in each TME section: (−30, 0] (inner), (0, +30] (peri-), and (+30, +60] (outer). In the inner section, PECAM1 (P=0.023), CD68 (P<0.001), and CXCR4 (P<0.001) increased during the transition period from DCIS to IDC (clusters 8, 1, and 7), and in the peri-section, PECAM1 (P=0.035) and CXCR4 (P<0.0024) also increased during this period (Kruskal–Wallis test, Bonferroni-corrected P values) (S6A Fig). In contrast, in the peri- and outer sections, MMP2 (P<0.001 and P=0.020, respectively) peaked in early DCIS (cluster 3) and late IDC (cluster 4). We summarized the temporal sequence of expression of these genes. In the early stages of cancer progression, MMP2 expression was upregulated in the peri- and outer regions (S6B Fig). During progression from non-invasive to invasive cancer, infiltration of endothelial cells (PECAM1) and macrophages (CD68) was noted in the tumor interior, together with increased chemokine signaling (CXCR4). After invasion, MMP2 expression was upregulated in the peritumoral and outer regions.
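The Kruskal–Wallis tests with Bonferroni correction can be reproduced in outline with the standard library alone. This sketch computes the tie-corrected H statistic, its chi-squared approximation with k − 1 degrees of freedom (closed form for integer degrees of freedom), and Bonferroni-adjusted P values. It is an illustration rather than the analysis code; in practice `scipy.stats.kruskal` would normally be used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in x^(r - 1/2) / (1*3*...*(2r-1)).
    series, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


def average_ranks(values):
    """1-based ranks with ties averaged, plus sum(t^3 - t) over tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_term, i = [0.0] * len(values), 0, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-squared approximation P value."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks, tie_term = average_ranks(values)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Each group would hold the per-domain expression of one gene in one section for one cluster on the path; one test per gene and section is then corrected jointly with `bonferroni`.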

SKNY quantifies the spatial localization of immune cells around spatial domains as a microenvironment

We focused on the microenvironment around the spatial domains and compared the localization of several immune cell types between DCIS and IDC areas. Four ROIs containing sufficient tumor and stroma were extracted for each of DCIS and IDC (Fig 6A). To quantify the spatial localization of gene expression outside the spatial domains, the area was stratified into (0, +30], (+30, +60], (+60, +90], (+90, +120], and (+120, +150] sections based on the measured distance from the surface of the spatial domains. We compared the expression of immune cell markers, including CD19, CD4, CD8A, FOXP3, ITGAX, CD68, and CD163, in these sections between DCIS and IDC areas (Fig 6B). A generalized linear model (GLM) was constructed to test differences in marker expression among sections. The response variable was expression density (Y_density), and the explanatory variables were section (X_section), group (X_group), and the section × group interaction term (X_section×group). CD8A was markedly elevated in DCIS (P<0.001), and CD163 was significantly increased in IDC (P=0.0039). Additionally, the GLM indicated a significant interaction effect for CD8A (P=0.037), with a distinct spatial distribution pattern peaking at (+30, +60] in DCIS and at (+120, +150] in IDC; that is, CD8A was concentrated closer to DCIS and farther away from IDC (Fig 6C).
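The model structure described above (density ~ section + group + section × group) can be sketched as an ordinary least-squares fit, which is a GLM with a Gaussian family and identity link. That family, the numeric coding of sections (1 for (0, +30], 2 for (+30, +60], and so on), and the 0/1 coding of DCIS/IDC are assumptions, since the family and link used in the paper are not stated here; the helper names are hypothetical, and standard errors and P values are not computed.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_interaction_model(section, group, density):
    """Least-squares fit of density = b0 + b1*section + b2*group + b3*section*group.

    section: numeric section index (assumed coding, e.g. 1 = (0, +30])
    group:   0 for DCIS, 1 for IDC (assumed coding)
    Returns [b0, b1, b2, b3]; b3 is the section-by-group interaction.
    """
    rows = [[1.0, s, g, s * g] for s, g in zip(section, group)]
    p = len(rows[0])
    # Normal equations X^T X beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, density)) for i in range(p)]
    return solve_linear(xtx, xty)
```

A nonzero interaction coefficient b3 means the slope of density against distance from the domain differs between DCIS and IDC, which is the pattern reported for CD8A.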

Fig 6. Comparison of immune cell distribution in microenvironments between DCIS and IDC regions.

(A) ROIs for the DCIS and IDC clusters. ROIs in red belong to DCIS clusters, and those in blue to IDC clusters. (B) Bar plot showing the expression density of various immune cell markers stratified by section. The red series indicates DCIS, and the blue series indicates IDC. The respective color gradients indicate each stratified interval. Error bars represent 95% confidence intervals (CI). The P values and regression coefficients [95% CI] for each constructed GLM are shown on the right side. (C) Spatial expression distribution of CDH1 and CD8A in each ROI.

https://doi.org/10.1371/journal.pcbi.1012854.g006

To identify molecular pathways associated with the differences in CD8A localization, genes whose distribution was strongly correlated with that of CD8A were extracted from all 313 genes in the Xenium panel [26], separately for DCIS and IDC (S7A Fig). Correlation coefficients for CCL5, CCL8, CD8B, and other genes were higher in DCIS, whereas those for CX3CR1, KRT14, CD8B, and other genes were higher in IDC. The spatial localizations of CCL5 and CX3CR1 were also correlated with that of CD8A in DCIS and IDC areas, respectively (S7B and S7C Fig). Moreover, KEGG enrichment analysis was conducted on genes with correlation coefficients ≥ 0.75. Pathways such as “Antigen processing and presentation” (hsa04612) in DCIS and “Primary immunodeficiency” (hsa05340) in IDC were significantly enriched (S7D Fig). The results demonstrate that the environment surrounding the spatial domain is altered at the pathway level during cancer progression.
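Selecting genes whose spatial profile tracks CD8A and testing pathway overlap can be sketched as a correlation filter followed by a one-sided hypergeometric test. The profile layout (gene mapped to values over shared spatial units), the use of the 313-gene panel as the background, and all function names are assumptions; the enrichment tool used for the KEGG analysis is not specified here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; constant profiles are treated as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(profiles, anchor="CD8A", threshold=0.75):
    """Genes whose profile correlates with the anchor gene at r >= threshold."""
    ref = profiles[anchor]
    return sorted(g for g, v in profiles.items()
                  if g != anchor and pearson_r(ref, v) >= threshold)


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for the overlap between a selected gene list and a pathway.

    population: background size (e.g. the panel), successes: pathway genes in
    the background, draws: number of selected genes, k: observed overlap.
    """
    upper = min(successes, draws)
    total = math.comb(population, draws)
    return sum(math.comb(successes, i) * math.comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total
```

For example, if 2 selected genes both fall in a pathway with 3 members in a 10-gene background, `hypergeom_sf(2, 10, 3, 2)` gives 3/45 ≈ 0.067; with the real panel the background would be 313 genes and the pathway membership would come from KEGG.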

SKNY can integrate multiple samples of metastatic colorectal cancer and catalogue microenvironmental features

Finally, to conduct an integrated analysis of multiple samples, SKNY was applied to the Xenium dataset of metastatic colorectal cancer (24 samples from 14 patients) from the TRIUMPH trial [28] (S8 Fig). A total of 2,151 spatial domains, containing not only tumor cells but also non-tumor cells, were extracted and classified into 12 clusters in the tSNE space (Fig 7A). In contrast, as a conventional cell-level analysis, 391,639 tumor cells were extracted in isolation from non-tumor cells such as fibroblasts and immune cells (S9A–S9C Fig). These tumor cells were subsequently clustered into 15 distinct clusters in the tSNE space (Fig 7A). Patient IDs were linked to the spatial domains and tumor cells (Fig 7B). In the spatial domain-based space, each cluster encompassed multiple patients, whereas in the tumor cell-based space, the clusters were distinctly separated by patient. Furthermore, spatial domain cluster 3 exhibited high COL1A1 expression, while spatial domain cluster 5 showed elevated FGFR2 and CXCL12 expression. These genes are expressed by cells in the TME and play crucial roles in immune response and treatment resistance. However, such environmental clusters could not be identified through single-cell cluster analysis of tumor cells alone (S10 Fig). These findings demonstrate that SKNY’s integrated analysis of multiple samples can catalogue critical microenvironmental factors.
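The contrast between patient-mixed spatial domain clusters and patient-specific tumor cell clusters can be quantified with a simple per-cluster summary: the number of distinct patients and the Shannon entropy of patient composition normalized by ln(total patients), which is 0 when a cluster comes from one patient and 1 when all patients contribute equally. This metric and the function name are hypothetical illustrations and were not reported in the study.

```python
import math
from collections import Counter, defaultdict


def patient_mixing(clusters, patients):
    """Per cluster: (distinct patients, normalized Shannon entropy of patient shares).

    clusters and patients are parallel sequences, one entry per spatial domain
    (or per cell). The entropy is divided by ln(total number of patients).
    """
    n_patients = len(set(patients))
    by_cluster = defaultdict(Counter)
    for cluster, patient in zip(clusters, patients):
        by_cluster[cluster][patient] += 1
    summary = {}
    for cluster, counts in by_cluster.items():
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
        norm = entropy / math.log(n_patients) if n_patients > 1 else 0.0
        summary[cluster] = (len(counts), abs(norm))  # abs() avoids reporting -0.0
    return summary
```

Applied to both embeddings, high scores across spatial domain clusters and scores near 0 for tumor cell clusters would express numerically what the patient coloring in Fig 7B shows visually.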

Fig 7. Cataloguing spatial domains using Xenium data of metastatic colorectal cancer in TRIUMPH trial.

(A) t-distributed stochastic neighbor embedding (tSNE) plot of 2,151 spatial domains (left) and 391,639 tumor cells (right) from 23 Xenium datasets. The plots are colored according to the clusters determined by the Leiden algorithm [29]. (B) tSNE plots from A colored by patient ID.

https://doi.org/10.1371/journal.pcbi.1012854.g007

Discussion

In the present study, the SKNY algorithm was applied to spatial transcriptomics data to predict the cellular and molecular functions and mechanisms in the TME. The TME includes diverse cells, such as cancer-associated fibroblasts, stromal cells, and immune cells, which are involved in cancer progression [30], and the TME concept has also been incorporated into clinical research on breast cancer [31]. For example, immunohistochemical pathological analysis has shown that intratumoral macrophages stained by CD68 are correlated with malignancy [32,33] and that intertumoral microvessel density assessed based on CD31, which reflects angiogenesis, is a key poor prognostic factor [34,35]. In breast cancer, high Ki67 and HER2 expression is associated with malignancy [36], whereas destruction of myoepithelial cells is associated with tumor invasion [37]. Consistent with these previous pathology reports, the spatial stratification (Output 4) analysis showed an overrepresentation of CD68 and PECAM1 (CD31) within the spatial domains of the invasive tumor (Figs 5 and S6), reflecting the infiltration of macrophages and endothelial cells into malignant cancer. Moreover, MMP2 was overexpressed in the stromal area in the early and late stages of tumor progression, and CXCR4 and CXCL12 were enriched inside the tumor after mid-stage progression (S6 Fig). MMPs contribute to the sprouting of vascular endothelial cells by degrading the vascular basement membrane and extracellular matrix in the early stages of tumor angiogenesis [38], and the CXCR4/CXCL12 signaling pathway mediates cell migration signals and metastasis processes [39]. These results are consistent with previous findings, suggesting that our algorithm can recover biological mechanisms in the TME that agree with established pathology.

The trajectory estimation (Output 3) analysis was used to construct the tumor progression trajectory of the spatial domains (Fig 4). The interaction of various cells in the TME is considered crucial for cancer progression [30]. Therefore, the progression trajectory should be determined by integrating all cells in the TME rather than by focusing solely on cancer cells. In our results, the transition from DCIS to IDC was accompanied by an overrepresentation of vascular endothelial cells expressing PECAM1 and VWF and by an increase in the CXCL12 and CXCR4 chemokine-chemokine receptor pair. These results are consistent with the known mechanisms by which cancer cells acquire invasive potential through endothelial cells [40] and the associated induction of cell migration signals by chemokines [39]. Most importantly, gene expression from non-cancer cells provided the ‘missing link’ between DCIS and IDC in the trajectory, supporting the utility of integrating all cells within the spatial domain. Furthermore, our analysis estimated a trajectory from the root to PGR-positive DCIS without progression to IDC. Reduced PGR expression has been suggested as a surrogate marker for GATA3 mutations, one of the genetic factors involved in DCIS progression [41,42]. Conversely, these previous reports, combined with our results, suggest that the transition to PGR-positive DCIS may slow cancer progression. The thin edge from PGR-positive DCIS to other clusters in the PAGA graph also supports this hypothesis.

The spatial stratification algorithm (Output 4) extended the spatial domain to encompass not only its inner section but also the surrounding area (Figs 2, 5, and 6). Our results showed that CD8A was localized closer to DCIS regions and farther from IDC regions (Fig 6). A previous report observed fewer activated CD8+ T cells in IDC than in DCIS [43]. Our results are consistent with this report and support the validity of the SKNY analysis. Furthermore, by extracting gene sets highly correlated with CD8A expression in the stratified sections, SKNY evaluated biochemical pathways and cell-cell interactions within the regions where CD8+ T cells were present. Because the present study used a relatively limited gene panel, only typical antigen-presenting pathways were detected. Applying a more comprehensive gene panel could facilitate the identification of specific drug target molecules.

The detection algorithm (Output 1) delineated tumors of different shapes based on histological features (Fig 2). The enrichment of cancer-cell markers inside the spatial domains and of stromal markers outside them indicates accurate separation of the tumor and stroma. Myoepithelial cells surround the ductal epithelium for structural support [44]. Our results also showed that myoepithelial cell markers, including ACTA2, MYLK, and KRT14, were enriched in the perispatial domain of the tumor, suggesting high-quality detection of tumor contours by our algorithm. These high-quality contours provided a reliable basis for the subsequent SKNY analyses, including clustering, trajectory estimation, and spatial stratification.

This study had some limitations. First, our analyses of breast cancer and metastatic colorectal cancer samples confirmed the advantages of SKNY, but its performance needs to be verified in larger sample sets. In an integrated analysis of data from 14 metastatic colorectal cancer patients, SKNY characterized the microenvironment better than conventional cell-level analysis methods. We expect it to continue to catalog the microenvironment appropriately as gene panels and sample sizes grow. Second, in the present analysis, the spatial omics data were converted to 10 × 10-μm grids, which may make it difficult to detect thin tissues, such as monolayered epithelium. However, a smaller grid size would reduce the sensitivity of marker-gene detection on each grid. Therefore, the balance between grid size and marker-gene sensitivity must be considered for each specimen and gene panel.

Conclusion

In conclusion, SKNY can be used to analyze microenvironments and provide valuable insights into their pathological functions. It should be applicable not only to the TME but also to a wide range of other microenvironments, such as tertiary lymphoid structures and myocardial and neuronal microenvironments.

Methods

Ethics statement

The study protocol was approved by the Institutional Review Board of the National Cancer Center (UMIN000027887).

Data acquisition and pre-processing

Breast cancer data from Xenium were downloaded from a public repository (https://www.10xgenomics.com/jp/products/xenium-in-situ/preview-dataset-human-breast). The ‘ReadXenium’ function from stlearn (v0.4.12) was used to read the H&E image (https://www.dropbox.com/s/th6tqqgbv27o3fk/CS1384_post-CS0_H%26E_S1A_RGB-shlee-crop.png?dl=1) and the files containing gene expression and cell coordinates (Xenium_FFPE_Human_Breast_Cancer_Rep1_cell_feature_matrix.h5 and Xenium_FFPE_Human_Breast_Cancer_Rep1_cells.csv.gz). The ‘tl.cci.grid’ function in stlearn was used to convert the cell-level coordinate data into grid data at 10-µm intervals.
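
The following NumPy sketch illustrates the gridding step. It bins cell centroids into 10-µm squares and sums the per-cell counts in each square. It is not stlearn's ‘tl.cci.grid’ implementation. The helper name `grid_counts` is hypothetical. The use of summation as the aggregation rule is also an assumption, as is the input layout (cell x/y coordinates in micrometres plus a cells × genes count matrix).

```python
import numpy as np


def grid_counts(xy_um, counts, bin_um=10.0):
    """Aggregate cell-level counts into square grids of side `bin_um`.

    Hypothetical helper (assumption: counts are summed per grid square).
    Returns an array of shape (n_rows, n_cols, n_genes).
    """
    xy_um = np.asarray(xy_um, dtype=float)
    counts = np.asarray(counts, dtype=float)
    origin = xy_um.min(axis=0)
    # Integer grid index of every cell: column from x, row from y.
    idx = np.floor((xy_um - origin) / bin_um).astype(int)
    n_cols, n_rows = idx.max(axis=0) + 1
    grid = np.zeros((n_rows, n_cols, counts.shape[1]))
    # np.add.at accumulates correctly when several cells share one grid.
    np.add.at(grid, (idx[:, 1], idx[:, 0]), counts)
    return grid
```

For example, cells at (0, 0), (5, 5), (12, 3), and (25, 25) µm fall into three grids, and the first two share grid (0, 0).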

Data for ovarian cancer, colorectal cancer, and melanoma were downloaded from the following URLs: https://www.10xgenomics.com/jp/datasets/ffpe-human-ovarian-cancer-data-with-human-immuno-oncology-profiling-panel-and-custom-add-on-1-standard, https://www.10xgenomics.com/jp/datasets/ffpe-human-colorectal-cancer-data-with-human-immuno-oncology-profiling-panel-and-custom-add-on-1-standard, https://www.10xgenomics.com/jp/datasets/human-skin-data-xenium-human-multi-tissue-and-cancer-panel-1-standard

In the TRIUMPH trial [28], formalin-fixed paraffin-embedded (FFPE) biopsy specimens were collected from 14 patients with HER2-amplified metastatic colorectal cancer. Twenty-four FFPE tissue sections were obtained, representing pre- and post-treatment time points or a single time point, depending on the patient’s treatment course. Spatial gene expression profiling was performed on the samples using the Xenium platform (10x Genomics, Pleasanton, CA, USA), which enables in situ analysis of RNA expression at subcellular resolution. For our analysis, a custom 300-gene panel designed for colorectal cancer research was used. The Xenium workflow consists of several key steps: tissue permeabilization and pretreatment, hybridization with gene-specific probes, rolling circle amplification (RCA) of target sequences, detection using fluorescently labelled oligonucleotides, and high-resolution imaging and data acquisition.

Detection of spatial domain

The pre-spatial domain (Spre) was determined by subtracting the grids expressing a negative marker from the grids expressing a positive marker (S11A Fig). The positive markers were:

- CDH1 for breast cancer and for the worked example.
- MLANA for melanoma.
- EPCAM for ovarian cancer, colorectal cancer, and metastatic colorectal cancer.

SFTPB served as the negative marker in the worked example. No negative marker was used for breast cancer, melanoma, ovarian cancer, colorectal cancer, or metastatic colorectal cancer. The SKNY program detects pre-spatial domains based on user-selected markers. For example, a user who wants only the tumor region without the normal epithelium can perform a logical subtraction between the expression of a positive marker (e.g., CDH1) and that of a negative marker (e.g., SFTPB).
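
The logical subtraction can be written as follows. This assumes a grid counts as marker-positive when at least one transcript of the marker is detected, a threshold chosen here for illustration. G is the set of grids, and m+ and m- are the positive and negative markers; this notation is introduced for the sketch. When no negative marker is used, the second set is empty.

```latex
S_{\mathrm{pre}} = \{\, g \in G \mid \mathrm{Expr}(g, m_{+}) > 0 \,\} \setminus \{\, g \in G \mid \mathrm{Expr}(g, m_{-}) > 0 \,\}
```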

where Expr denotes a function that extracts gene expression counts from a grid (S11A Fig). To remove noise from the pre-spatial domain, the ‘medianBlur’ function (kernel size: 3×3) from the Python library OpenCV (v4.8.1) was applied, yielding a denoised spatial domain (S11B Fig).
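
Below is a minimal sketch of the pre-spatial domain and denoising steps on a grid array. It assumes that a grid is marker-positive when its count exceeds zero. To avoid an OpenCV dependency, OpenCV's ‘medianBlur’ is replaced with a hand-written NumPy 3×3 median filter that replicates the image edges. The helper names are hypothetical, and border behaviour may differ slightly from `cv2.medianBlur(mask, 3)`.

```python
import numpy as np


def pre_spatial_domain(pos_counts, neg_counts=None):
    """Grids expressing the positive marker minus grids expressing the negative one.

    Assumption: a grid is marker-positive when its count is greater than zero.
    """
    mask = np.asarray(pos_counts) > 0
    if neg_counts is not None:  # no negative marker -> keep all positive grids
        mask &= ~(np.asarray(neg_counts) > 0)
    return mask.astype(np.uint8)


def median_blur_3x3(mask):
    """NumPy stand-in for a 3x3 median filter with edge-replicated borders."""
    padded = np.pad(mask, 1, mode="edge")
    h, w = mask.shape
    # Stack the nine shifted views of every 3x3 neighbourhood, then take the median.
    windows = np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return np.median(windows, axis=0).astype(mask.dtype)
```

In a toy case, the median filter fills back in a negative-marker grid punched into the middle of a tumor block. It removes an isolated positive grid on the image edge.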

The STAGATE algorithm [24] was also used to extract spatial domain clusters for comparison with an existing method. To annotate the extracted clusters, the expression levels of epithelial markers (CDH1, EPCAM) were compared. Clusters 1, 3, and 9, which overexpressed these markers, were extracted as the spatial domain of the tumor. To assess the concordance between the SKNY and STAGATE spatial domains, the Jaccard coefficient was calculated. This is the number of grids assigned to the tumor domain by both methods divided by the number assigned by either method.
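
The concordance measure can be sketched as follows, assuming both methods' tumor domains are available as boolean masks on the same grid. The helper names are hypothetical, and treating two empty masks as perfect agreement is a convention chosen here. The STAGATE mask is derived from the cluster labels 1, 3, and 9 named above.

```python
import numpy as np


def stagate_tumor_mask(cluster_labels, tumor_clusters=(1, 3, 9)):
    """Boolean mask of grids whose STAGATE cluster is one of the tumor clusters."""
    return np.isin(cluster_labels, tumor_clusters)


def jaccard(mask_a, mask_b):
    """Jaccard index of two boolean grid masks: |A ∩ B| / |A ∪ B|."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(np.logical_and(a, b).sum() / union)
```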

Measurement of distance from the boundary line of the spatial domain

The boundary line of the spatial domain was identified using the ‘findContours’ function from OpenCV (S11B Fig). All adjacent grids were connected by edges weighted by Euclidean distance: 1 for vertical and horizontal edges and √2 for diagonal edges (S12 Fig). The distance of each grid from the spatial domain edge was then determined as its shortest path from the boundary line, computed with the multi-source Dijkstra method [45].
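
The distance computation can be sketched with Python's `heapq` as a single Dijkstra run. Every boundary grid seeds the run at distance 0, and edges use the 1 and √2 weights described above. Two parts are assumptions for illustration:

- `boundary_mask` approximates OpenCV's ‘findContours’ by taking domain grids that have at least one horizontal or vertical neighbour outside the domain.
- `signed_distance_um` converts grid units to micrometres (10 µm per grid). It makes distances negative inside the domain, a sign convention inferred from the (−30, 0] and (0, +30] section labels in S6 Fig.

```python
import heapq
import math

import numpy as np

# 8-connected neighbourhood: weight 1 for vertical/horizontal steps, sqrt(2) for diagonal ones.
STEPS = [(dr, dc, math.hypot(dr, dc))
         for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def boundary_mask(domain):
    """Domain grids with at least one 4-neighbour outside the domain (findContours stand-in)."""
    d = np.pad(np.asarray(domain, dtype=bool), 1, constant_values=False)
    inner = d[1:-1, 1:-1]
    all_in = d[:-2, 1:-1] & d[2:, 1:-1] & d[1:-1, :-2] & d[1:-1, 2:]
    return inner & ~all_in


def distance_from_boundary(boundary):
    """Multi-source Dijkstra: shortest weighted path length from any boundary grid."""
    h, w = boundary.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for r, c in zip(*np.nonzero(boundary)):
        dist[r, c] = 0.0  # every boundary grid is a source
        heap.append((0.0, int(r), int(c)))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr, dc, wgt in STEPS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and d + wgt < dist[nr, nc]:
                dist[nr, nc] = d + wgt
                heapq.heappush(heap, (d + wgt, nr, nc))
    return dist


def signed_distance_um(domain, grid_um=10.0):
    """Assumed convention: negative inside the domain, positive outside, in micrometres."""
    domain = np.asarray(domain, dtype=bool)
    dist = distance_from_boundary(boundary_mask(domain)) * grid_um
    return np.where(domain, -dist, dist)
```

Seeding all boundary grids at once gives every grid its distance to the nearest boundary point in one pass. This avoids running a separate shortest-path search per boundary grid.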

Segmentation from a spatial domain to individual spatial domains

The ‘connectedComponentsWithStats’ function from OpenCV was used to divide the entire spatial domain image into individual spatial domains, and the gene expression within each spatial domain was averaged.
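
The segmentation and averaging step can be sketched without OpenCV as a breadth-first flood fill over 8-connected grids. 8-connectivity is also the default for OpenCV's ‘connectedComponentsWithStats’. The helper names are hypothetical. OpenCV's returned label count includes the background (label 0), whereas `label_components` returns only the number of foreground domains. Expression is assumed to be a (rows, columns, genes) grid array.

```python
from collections import deque

import numpy as np

NEIGHBOURS_8 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def label_components(mask):
    """Label 8-connected components; 0 is background. Returns (n_domains, labels)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already part of an earlier domain
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:  # breadth-first flood fill
            r, c = queue.popleft()
            for dr, dc in NEIGHBOURS_8:
                nr, nc = r + dr, c + dc
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = n
                    queue.append((nr, nc))
    return n, labels


def mean_expression_per_domain(labels, expr):
    """Average (rows, cols, genes) grid expression within each labelled domain."""
    n = labels.max()
    if n == 0:
        return np.empty((0, expr.shape[-1]))
    return np.stack([expr[labels == k].mean(axis=0) for k in range(1, n + 1)])
```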

Spatial stratification by spatial domains

Using the measured distances (S12 Fig), stratification was performed with half-open intervals of 30 µm to define each partial area (P) as a stratified spatial domain (S11B Fig). For the stratified spatial domains in the outer section of each spatial domain, the rectangle enclosing each spatial domain was extracted. Each rectangle was then enlarged by x μm to produce a rectangle that includes its stratified spatial domain. The stratified spatial domain exclusive to the others was calculated as follows:
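
Based on the three-domain example in the next paragraph (S11C Fig), one plausible form of this calculation is given below. The notation is assumed for this sketch:

- P_{a<μ≤b} is the stratified band.
- R_{i,x} is the enclosing rectangle of spatial domain S_i enlarged by x µm.
- ¬ denotes the complement.

```latex
P_{S_i,\, a<\mu\le b} = P_{a<\mu\le b} \wedge R_{i,x} \wedge \neg \bigcup_{j \neq i} \left( R_{i,x} \wedge R_{j,x} \right)
```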

where ⋀ represents the logical product (intersection) and ⋃ represents the union. The gene expression of each exclusive stratified spatial domain was defined as the average gene expression of the grids within it. For simplicity, the procedure is illustrated for three adjacent spatial domains (S1, S2, and S3) and their (0, 30] stratified spatial domains (P0<μ≤30) (S11C Fig). First, the rectangles enclosing each of S1, S2, and S3 are extracted and expanded by 30 μm (R1,30, R2,30, and R3,30), and their pairwise intersections are taken. Then, the stratified spatial domains specific to each spatial domain (PS1,0<μ≤30, PS2,0<μ≤30, and PS3,0<μ≤30) are defined as the intersection of each rectangle (R1,30, R2,30, or R3,30) with the complement of the pairwise intersections.
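
The exclusive stratification can be sketched with NumPy as below. The function `exclusive_bands` is hypothetical, and so is its input layout. It takes a label image of individual spatial domains and a signed distance map in micrometres. It builds each domain's bounding rectangle enlarged by x µm. It then keeps the grids that meet three conditions:

- They lie in the half-open distance band (lo, hi].
- They lie inside the domain's own rectangle.
- They lie outside every pairwise rectangle intersection.

For simplicity, the exclusion is applied to any band passed in, not only to outer sections.

```python
import numpy as np


def bounding_box(mask):
    """Row/column extent (inclusive) of the True grids in `mask`."""
    rows, cols = np.nonzero(mask)
    return rows.min(), rows.max(), cols.min(), cols.max()


def exclusive_bands(labels, signed_dist_um, lo, hi, x_um, grid_um=10.0):
    """Per-domain masks of grids with lo < distance <= hi, excluding rectangle overlaps.

    Sketch of: band AND R_i AND NOT(union over j != i of (R_i AND R_j)),
    where R_i is domain i's bounding rectangle enlarged by x_um.
    """
    h, w = labels.shape
    pad = int(np.ceil(x_um / grid_um))  # enlargement in grid units
    rects = []
    for k in range(1, labels.max() + 1):
        r0, r1, c0, c1 = bounding_box(labels == k)
        rect = np.zeros((h, w), dtype=bool)
        rect[max(r0 - pad, 0):r1 + pad + 1, max(c0 - pad, 0):c1 + pad + 1] = True
        rects.append(rect)
    band = (signed_dist_um > lo) & (signed_dist_um <= hi)
    out = []
    for i, r_i in enumerate(rects):
        overlap = np.zeros((h, w), dtype=bool)
        for j, r_j in enumerate(rects):
            if j != i:
                overlap |= r_i & r_j  # pairwise intersection with every other rectangle
        out.append(band & r_i & ~overlap)
    return out
```

Take x = 30 µm and two single-grid domains six grids apart. Their enlarged rectangles share one column of grids, and that column is assigned to neither domain.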

Diversity analysis in the spatial domain

To compare the alpha diversity of gene expression between the segmented spatial domains and previously annotated cancer cells [26], the ‘diversity.alpha.chao1’, ‘diversity.alpha.shannon’, and ‘diversity.alpha.observed_otus’ functions in the Python library scikit-bio were used to calculate Chao1, Shannon, and observed features, respectively [46].
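
The three indices can be computed directly from a vector of integer transcript counts per gene, treating each gene as a "species". This is a sketch, not scikit-bio's code:

- Chao1 uses the bias-corrected form S_obs + F1(F1 − 1)/(2(F2 + 1)), where F1 and F2 are the numbers of genes observed exactly once and exactly twice.
- Shannon uses the natural logarithm by default.

Defaults such as the logarithm base may differ between scikit-bio versions, so results should be checked against the installed library.

```python
import numpy as np


def observed_features(counts):
    """Number of genes with a non-zero count."""
    return int(np.count_nonzero(counts))


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    counts = np.asarray(counts)
    f1 = np.sum(counts == 1)  # singletons
    f2 = np.sum(counts == 2)  # doubletons
    return observed_features(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts, base=np.e):
    """Shannon entropy of the relative abundances of the non-zero genes."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))
```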

Clustering of the spatial domains

The ‘pp.log1p’ function from scanpy (v1.9.8) was used to log-transform gene expression in each spatial domain. The ‘pp.pca’ function was then used for dimension reduction by PCA, and 50 principal components were extracted in decreasing order of explained variance. The ‘pp.neighbors’ and ‘tl.leiden’ functions from scanpy were applied to form spatial domain clusters by Leiden clustering. The ‘tl.umap’ function was used to embed the spatial domains in a two-dimensional UMAP space.
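
The log-transform and PCA steps can be sketched in NumPy to show what happens before the neighbour graph is built. The corresponding scanpy calls are:

- `sc.pp.log1p(adata)`
- `sc.pp.pca(adata, n_comps=50)`
- `sc.pp.neighbors(adata)`
- `sc.tl.leiden(adata)`
- `sc.tl.umap(adata)`

The hypothetical function `log1p_pca` below reproduces only the first two calls, in simplified form. It uses a full SVD of the centred matrix, so component signs and numerical details may differ from scanpy's solver.

```python
import numpy as np


def log1p_pca(expr, n_comps=50):
    """log(1 + x) transform followed by PCA via SVD of the centred matrix.

    Returns (scores, explained_variance), with components ordered by decreasing
    explained variance; n_comps is capped by the data dimensions.
    """
    x = np.log1p(np.asarray(expr, dtype=float))
    x -= x.mean(axis=0)  # centre each gene
    n_comps = min(n_comps, min(x.shape) - 1)
    u, s, _ = np.linalg.svd(x, full_matrices=False)  # s is sorted in descending order
    scores = u[:, :n_comps] * s[:n_comps]
    explained_variance = s[:n_comps] ** 2 / (x.shape[0] - 1)
    return scores, explained_variance
```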

Trajectory estimation of the spatial domains

For trajectory inference with the PAGA algorithm [15], the ‘tl.paga’ function of scanpy was used to construct an abstracted graph of connectivity between spatial domain clusters. Pseudotime was then estimated with the ‘tl.dpt’ function. The R packages Monocle [27] and Slingshot [17] were also used to confirm the trajectory estimated by PAGA.

Statistical analysis

Pearson’s product-moment correlation coefficient was used to analyze the correlation between pseudotime and gene expression. Welch’s t-test was used to compare alpha diversity between two groups. The Kruskal-Wallis test was used to compare gene expression among multiple groups. The ‘GLM’ class in the Python library statsmodels (v0.14.1) was used to construct a generalized linear model. KEGG pathway analysis was conducted using ShinyGO (v0.80) [47] with a false discovery rate (FDR) cutoff of 0.001.
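
The two parametric statistics can be sketched directly in NumPy. The helper names are hypothetical, and the functions return only test statistics. P values would come from a library routine such as `scipy.stats.pearsonr` or `scipy.stats.ttest_ind(a, b, equal_var=False)` (Welch's test).

```python
import numpy as np


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient (e.g. pseudotime vs expression)."""
    return float(np.corrcoef(x, y)[0, 1])


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size  # squared standard errors
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return float(t), float(df)
```

For a = [1, 2, 3] and b = [4, 5, 6], the Welch-Satterthwaite formula gives 4 degrees of freedom. This matches a pooled test because the two variances and sample sizes are equal.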

Visualization

Xenium Explorer (v1.3) or the ‘pl.gene_plot’ function in stlearn was used to visualize the Xenium data.

Supporting information

S1 Fig. Annotation of the spatial domain using the STAGATE algorithm.

Spatial distribution of each cluster identified by the STAGATE algorithm at (A) the single-cell level and (B) the grid level. (C) Dot plot showing markers of cell types and expression patterns of genes associated with tumor subtypes. Clusters 1, 3, and 9 correspond to the tumor spatial domain.

https://doi.org/10.1371/journal.pcbi.1012854.s001

(TIF)

S2 Fig. Detection of tumor spatial domain using SKNY for multiple cancer types.

H&E staining images and detected spatial domains of ovarian cancer, colorectal cancer, and melanoma. The yellow and green colors indicate spatial domains and the boundary, respectively.

https://doi.org/10.1371/journal.pcbi.1012854.s002

(TIF)

S3 Fig. Measurement of distance from the surface of spatial domains.

(A) Heatmap indicating distance from surfaces of spatial domains. (B) The red contour lines indicate distance from the surface of spatial domains at the 30-μm interval.

https://doi.org/10.1371/journal.pcbi.1012854.s003

(TIF)

S4 Fig. Comparison of alpha-diversity index based on gene expression.

(A) Box plot of alpha-diversity index (Chao1) of expression between cancer cells and spatial domains. Box plot of (B) Chao1, (C) observed features, and (D) Shannon of expression among cells annotated by Janesick et al. and spatial domains.

https://doi.org/10.1371/journal.pcbi.1012854.s004

(TIF)

S5 Fig. Trajectory analysis of spatial domains using the Monocle and Slingshot algorithms and autocorrelation analysis of endothelial cell marker genes.

(A) Uniform Manifold Approximation and Projection (UMAP) representation of spatial domains, colored by Leiden cluster assignment and overlaid with the trajectories predicted by the Monocle algorithm. The root of the trajectory is marked by a white node, and branch points are marked by gray nodes. (B) UMAP representation of spatial domains colored by Leiden cluster assignment (left) and overlaid with the trajectories predicted by the Slingshot algorithm (right). (C) Line plot depicting the relationship between Leiden clusters and Geary’s C for PECAM1 (blue), VWF (orange), and CD93 (green).

https://doi.org/10.1371/journal.pcbi.1012854.s005

(TIF)

S6 Fig. Trajectory analysis illustrating the progression flow for the tumor microenvironment.

(A) Violin plots showing the expression of PECAM1, CD68, MMP2, CXCR4, and CXCL12 along the estimated trajectory path (IDC path #1) in the (−30, 0], (0, +30], and (+30, +60] sections. The color scale indicates the mean pseudotime in each cluster. The annotated values represent P values of the significance test. Red dots indicate the mean gene expression level. (B) Summary of the gene expression dynamics. Red and blue indicate overrepresentation and underrepresentation of gene expression, respectively.

https://doi.org/10.1371/journal.pcbi.1012854.s006

(TIF)

S7 Fig. Identification of gene clusters correlated with CD8A distribution in DCIS and IDC.

(A) Top 10 genes most correlated with CD8A in DCIS or IDC. (B) Bar plots showing the expression density of CD8A, CCL5, and CX3CR1 stratified by section. The red series indicates DCIS, and the blue series indicates IDC. The respective color gradients indicate each stratified interval. Error bars represent 95% confidence intervals (CI). (C) Spatial expression distribution of CD8A, CCL5, and CX3CR1 in each ROI. (D) Lollipop plot showing significantly enriched KEGG pathways. The x-axis denotes fold enrichment, point size represents the number of genes, and color represents −log10(FDR).

https://doi.org/10.1371/journal.pcbi.1012854.s007

(TIF)

S8 Fig. SKNY application to spatial domain detection in the Xenium data set from the TRIUMPH trial.

Spatial domains of metastatic colorectal cancer. The yellow and green colors indicate spatial domains and the boundary, respectively.

https://doi.org/10.1371/journal.pcbi.1012854.s008

(TIF)

S9 Fig. Clustering at the cellular level and extraction of tumor cell clusters.

(A) t-Distributed stochastic neighbor embedding (tSNE) plot of 1,133,514 cancer cells. The plot is colored according to the clusters determined by the Leiden algorithm. (B) tSNE from A colored by EPCAM expression. (C) Extraction of clusters with high EPCAM expression and rearrangement of the tSNE space.

https://doi.org/10.1371/journal.pcbi.1012854.s009

(TIF)

S10 Fig. Expression distribution of fibroblast markers and cytokines in spatial domain-based and tumor cell-based tSNE spaces.

t-Distributed stochastic neighbor embedding (tSNE) plots of spatial domains and cancer cells, colored according to the expression of COL1A1, FGFR2, CXCL12, and CXCL13.

https://doi.org/10.1371/journal.pcbi.1012854.s010

(TIF)

S11 Fig. Workflow of SpatialKNifeY algorithm.

(A) The schematic illustrates the process of identifying pre-spatial domains that express positive markers and do not express negative markers within a 10 × 10 µm grid. (B) The schematic shows the workflow for identifying and stratifying the spatial domain using three sequential steps: median filtering, contouring, and measurement of distance. (C) The schematic demonstrates the procedure for extracting stratified spatial domains that do not overlap with other ones.

https://doi.org/10.1371/journal.pcbi.1012854.s011

(TIF)

S12 Fig. Measurement of shortest distance from contour of spatial domains.

Left: Representation of a graph with weighted edges, where nodes are categorized as outside the spatial domain (black), contour (green), and inside the spatial domain (yellow). Right: Result of applying the multi-source Dijkstra algorithm to compute the shortest distance from the contour. The calculated distance for each node is indicated in red.

https://doi.org/10.1371/journal.pcbi.1012854.s012

(TIF)

References

  1. He S, Bhatt R, Brown C, Brown EA, Buhr DL, Chantranuvatana K, et al. High-plex multiomic analysis in FFPE at subcellular level by spatial molecular imaging. 2021.
  2. Goltsev Y, Samusik N, Kennedy-Darling J, Bhate S, Hale M, Vazquez G, et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell. 2018;174(4):968–81.e15. pmid:30078711
  3. Kim S-W, Roh J, Park C-S. Immunohistochemistry for pathologists: protocols, pitfalls, and tips. J Pathol Transl Med. 2016;50(6):411–8. pmid:27809448
  4. Hao Y, Stuart T, Kowalski MH, Choudhary S, Hoffman P, Hartman A, et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol. 2024;42(2):293–304. pmid:37231261
  5. Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573–87.e29. pmid:34062119
  6. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–902.e21. pmid:31178118
  7. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411–20. pmid:29608179
  8. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502. pmid:25867923
  9. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15. pmid:29409532
  10. Virshup I, Bredikhin D, Heumos L, Palla G, Sturm G, Gayoso A, et al. The scverse project provides a computational ecosystem for single-cell omics data analysis. Nat Biotechnol. 2023;41(5):604–6. pmid:37037904
  11. Cheng C, Easton J, Rosencrance C, Li Y, Ju B, Williams J, et al. Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data. Nucleic Acids Res. 2019;47(22):e143. pmid:31566233
  12. Lin P, Troup M, Ho JWK. CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol. 2017;18(1):59. pmid:28351406
  13. Wan S, Kim J, Won KJ. SHARP: hyperfast and accurate processing of single-cell RNA-seq data via ensemble random projection. Genome Res. 2020;30(2):205–13. pmid:31992615
  14. Yu L, Cao Y, Yang JYH, Yang P. Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data. Genome Biol. 2022;23(1):49. pmid:35135612
  15. Wolf FA, Hamey FK, Plass M, Solana J, Dahlin JS, Göttgens B, et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 2019;20(1):59. pmid:30890159
  16. Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nat Biotechnol. 2019;37(5):547–54. pmid:30936559
  17. Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19(1):477. pmid:29914354
  18. Cannoodt R, Saelens W, Sichien D, Tavernier S, Janssens S, Guilliams M, et al. SCORPIUS improves trajectory inference and identifies novel modules in dendritic cell development. 2016.
  19. Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan C-H, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021;12(1):1088. pmid:33597522
  20. Armingol E, Officer A, Harismendy O, Lewis NE. Deciphering cell-cell interactions and communication from gene expression. Nat Rev Genet. 2021;22(2):71–88. pmid:33168968
  21. Nagai JS, Leimkühler NB, Schaub MT, Schneider RK, Costa IG. CrossTalkeR: analysis and visualization of ligand-receptor networks. Bioinformatics. 2021;37(22):4263–5. pmid:35032393
  22. Palla G, Spitzer H, Klein M, Fischer D, Schaar AC, Kuemmerle LB, et al. Squidpy: a scalable framework for spatial omics analysis. Nat Methods. 2022;19(2):171–8. pmid:35102346
  23. Pham D, Tan X, Balderson B, Xu J, Grice LF, Yoon S, et al. Robust mapping of spatiotemporal trajectories and cell-cell interactions in healthy and diseased tissues. Nat Commun. 2023;14(1):7739. pmid:38007580
  24. Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun. 2022;13(1):1739. pmid:35365632
  25. Blampey Q, Mulder K, Gardet M, Christodoulidis S, Dutertre C-A, André F, et al. Sopa: a technology-invariant pipeline for analyses of image-based spatial omics. Nat Commun. 2024;15(1).
  26. Janesick A, Shelansky R, Gottscho AD, Wagner F, Williams SR, Rouault M, et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat Commun. 2023;14(1):8353. pmid:38114474
  27. Qiu X, Mao Q, Tang Y, Wang L, Chawla R, Pliner HA, et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017;14(10):979–82. pmid:28825705
  28. Nakamura Y, Okamoto W, Kato T, Esaki T, Kato K, Komatsu Y, et al. Circulating tumor DNA-guided treatment with pertuzumab plus trastuzumab for HER2-amplified metastatic colorectal cancer: a phase 2 trial. Nat Med. 2021;27(11):1899–903. pmid:34764486
  29. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9(1):5233. pmid:30914743
  30. de Visser KE, Joyce JA. The evolving tumor microenvironment: from cancer initiation to metastatic outgrowth. Cancer Cell. 2023;41(3):374–403. pmid:36917948
  31. Li JJ, Tsang JY, Tse GM. Tumor microenvironment in breast cancer-updates on therapeutic implications and pathologic assessment. Cancers (Basel). 2021;13(16):4233. pmid:34439387
  32. Larionova I, Cherdyntseva N, Liu T, Patysheva M, Rakina M, Kzhyshkowska J. Interaction of tumor-associated macrophages and cancer chemotherapy. Oncoimmunology. 2019;8(7):1596004. pmid:31143517
  33. Mahmoud SMA, Lee AHS, Paish EC, Macmillan RD, Ellis IO, Green AR. Tumour-infiltrating macrophages and clinical outcome in breast cancer. J Clin Pathol. 2012;65(2):159–63. pmid:22049225
  34. Choi WWL, Lewis MM, Lawson D, Yin-Goen Q, Birdsong GG, Cotsonis GA, et al. Angiogenic and lymphangiogenic microvessel density in breast carcinoma: correlation with clinicopathologic parameters and VEGF-family gene expression. Mod Pathol. 2005;18(1):143–52. pmid:15297858
  35. Arapandoni-Dadioti P, Giatromanolaki A, Trihia H, Harris AL, Koukourakis MI. Angiogenesis in ductal breast carcinoma. Comparison of microvessel density between primary tumour and lymph node metastasis. Cancer Lett. 1999;137(2):145–50. pmid:10374835
  36. Synnestvedt M, Borgen E, Russnes HG, Kumar NT, Schlichting E, Giercksky K-E, et al. Combined analysis of vascular invasion, grade, HER2 and Ki67 expression identifies early breast cancer patients with questionable benefit of systemic adjuvant therapy. Acta Oncol. 2013;52(1):91–101. pmid:22934555
  37. Pandey PR, Saidou J, Watabe K. Role of myoepithelial cells in breast tumor progression. Front Biosci (Landmark Ed). 2010;15(1):226–36. pmid:20036817
  38. Potente M, Gerhardt H, Carmeliet P. Basic and therapeutic aspects of angiogenesis. Cell. 2011;146(6):873–87. pmid:21925313
  39. Domanska UM, Kruizinga RC, Nagengast WB, Timmer-Bosscha H, Huls G, de Vries EGE, et al. A review on CXCR4/CXCL12 axis in oncology: no place to hide. Eur J Cancer. 2013;49(1):219–30. pmid:22683307
  40. De Palma M, Biziato D, Petrova TV. Microenvironmental regulation of tumour angiogenesis. Nat Rev Cancer. 2017;17(8):457–74. pmid:28706266
  41. Takaku M, Grimm SA, Roberts JD, Chrysovergis K, Bennett BD, Myers P, et al. GATA3 zinc finger 2 mutations reprogram the breast cancer transcriptional network. Nat Commun. 2018;9(1):1059. pmid:29535312
  42. Nagasawa S, Kuze Y, Maeda I, Kojima Y, Motoyoshi A, Onishi T, et al. Genomic profiling reveals heterogeneous populations of ductal carcinoma in situ of the breast. Commun Biol. 2021;4(1):438. pmid:33795819
  43. Gil Del Alcazar CR, Huh SJ, Ekram MB, Trinh A, Liu LL, Beca F, et al. Immune escape in breast cancer during in situ to invasive carcinoma transition. Cancer Discov. 2017;7(10):1098–115. pmid:28652380
  44. Balachander N, Masthan KMK, Babu NA, Anbazhagan V. Myoepithelial cells in pathology. J Pharm Bioallied Sci. 2015;7(Suppl 1):S190-3. pmid:26015706
  45. Dijkstra EW. A note on two problems in connexion with graphs. Numer Math. 1959;1(1):269–71.
  46. Chao A. Nonparametric estimation of the number of classes in a population. Scand J Stat. 1984;11(4):265–70.
  47. Ge SX, Jung D, Yao R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics. 2020;36(8):2628–9. pmid:31882993