Fig 1.
SpatialKNifeY analysis landscape.
(A) Concept of extending the analysis from spatial omics data and spatial domains to the microenvironment. (B) Implementation of SpatialKNifeY (SKNY). The SKNY Python library builds on stlearn [23] and scanpy [9] functions (see “Methods”) and on the AnnData object model [10]. (C) Outputs of SKNY analysis. Detection (Output 1, see “Fig 2”) delineates spatial domains based on the expression of user-specified positive and negative marker genes. Clustering (Output 2, see “Fig 3”) groups spatial domain units based on the mean expression of each gene. Trajectory estimation (Output 3, see “Fig 4”) infers the trajectory among spatial domains and their pseudotime. Spatial stratification (Output 4, see “Fig 2”, “Fig 5”, and “Fig 6”) measures the distance from the tumor boundary to each spatial coordinate and draws contour lines based on that distance.
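The distance measurement behind spatial stratification (Output 4) can be illustrated with a minimal sketch. It is not the SKNY implementation: the function name `signed_distance_map`, the grid representation, the brute-force search, and the sign convention (non-positive inside a domain, positive outside) are all assumptions made for illustration.

```python
import math

def signed_distance_map(mask, pixel_size=1.0):
    """Signed distance from every grid cell to the nearest domain-boundary cell.

    mask: list of rows of 0/1, where 1 marks a cell inside a spatial domain.
    Assumed convention: cells inside the domain get values <= 0, cells outside > 0.
    Cells beyond the grid edge are treated as outside the domain.
    """
    n_rows, n_cols = len(mask), len(mask[0])

    def is_boundary(r, c):
        # A boundary cell is a domain cell with at least one non-domain 4-neighbor.
        if not mask[r][c]:
            return False
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < n_rows and 0 <= cc < n_cols) or not mask[rr][cc]:
                return True
        return False

    boundary = [(r, c) for r in range(n_rows) for c in range(n_cols) if is_boundary(r, c)]
    if not boundary:
        raise ValueError("mask contains no spatial domain")

    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Brute-force nearest boundary cell; fine for a small illustrative grid.
            d = min(math.hypot(r - br, c - bc) for br, bc in boundary) * pixel_size
            row.append(-d if mask[r][c] else d)
        result.append(row)
    return result

# Hypothetical 5 x 5 grid with a 3 x 3 domain in the center, 10 um per cell.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = signed_distance_map(mask, pixel_size=10.0)
```

Contour lines such as those at 30-μm intervals then correspond to level sets of this distance map. For realistic image sizes, a Euclidean distance transform would be far more efficient than the brute-force search used here.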
Fig 2.
Detection of spatial domains using Xenium data accurately discriminates between tumor and stromal regions.
(A) H&E staining image of breast cancer. (B) Detected spatial domains. Yellow and green indicate the spatial domains and the boundary, respectively. (C) H&E staining images and spatial domains from three ROIs. The red contour lines indicate the distance from the surface of the spatial domains at 30-μm intervals. (D) Dot plot showing marker genes of each cell type. The color bar indicates the scaled mean count, and the dot size indicates the percentage of expression of each gene. (E) Spatial expression distribution of cell marker genes in the ROI. The color bar indicates the scaled mean count.
Fig 3.
Clustering and annotation of spatial domains based on gene expression.
(A) Two-dimensional UMAP embedding of the gene expression of spatial domains. The colors indicate clusters. (B) Spatial distribution of each cluster. (C) H&E image with histological annotations. (D) Dot plot showing markers of cell types and expression patterns of genes associated with tumor subtypes.
Fig 4.
Estimating spatial domain trajectory reveals a temporal gene expression gradient along cancer progression.
(A) PAGA graph constructed from the expression data of the spatial domains. (B) PAGA-initialized spatial domain embeddings colored by estimated pseudotime and by MKI67 and ACTA2 expression. Pearson’s correlation coefficients and P values were used to evaluate the linear relationship between pseudotime and the scaled expression of MKI67 and ACTA2. (C) Spatial distribution of clusters and pseudotime. (D) Heatmap showing gene expression levels along pseudotime on three progression paths. (E) Representative H&E staining images and gene expression in the ROI. The color bar indicates the scaled mean count.
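As a minimal sketch of the correlation in panel B, Pearson’s coefficient between pseudotime and scaled expression can be computed as follows. The function `pearson_r` and the input values are hypothetical and are not taken from the study. The P value is omitted here; a library routine such as `scipy.stats.pearsonr` returns both the coefficient and the P value, although the caption does not state which tool the authors used.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: pseudotime of five spatial domains and the scaled
# expression of one gene in each of them.
pseudotime = [0.0, 0.25, 0.5, 0.75, 1.0]
scaled_expression = [0.1, 0.3, 0.4, 0.8, 0.9]
r = pearson_r(pseudotime, scaled_expression)
```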
Fig 5.
Spatial stratification of each spatial domain cluster elucidating endothelial cell invasion into the tumor.
(A) Spatial distributions of spatial domain clusters stratified into the (−30, 0], (0, +30], and (+30, +60] sections. (B) Violin plots showing the expression of endothelial cell marker genes (PECAM1, VWF, and CD93) for each cluster in the (−30, 0], (0, +30], and (+30, +60] sections. The x-axes indicate cluster numbers, and the y-axes indicate scaled gene expression levels. The annotated values are the P values of the significance test. (C) Representative images of DAPI with the expression of epithelial cell markers (CDH1 and EPCAM) and endothelial cell markers (CD93, PECAM1, and VWF) for four ROIs.
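The half-open sections used above, such as (−30, 0] and (0, +30], can be assigned from a signed distance with a short sketch. The helper `section_bounds`, the 30-unit width default, and the example distances are illustrative assumptions rather than SKNY code.

```python
import math
from collections import defaultdict

def section_bounds(distance, width=30.0):
    """Return (lower, upper) of the half-open section (lower, upper] containing distance."""
    upper = math.ceil(distance / width) * width
    return (upper - width, upper)

# Hypothetical signed distances from a domain boundary, in micrometers.
distances = [-12.0, 0.0, 5.5, 30.0, 30.1, 59.9]

# Group the distances by the section they fall into.
sections = defaultdict(list)
for d in distances:
    sections[section_bounds(d)].append(d)
```

Because each section is closed on the right, a distance of exactly 0 falls into (−30, 0] and a distance of exactly 30 falls into (0, +30].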
Fig 6.
Comparison of immune cell distribution in microenvironments between DCIS and IDC regions.
(A) ROIs for the DCIS and IDC clusters. ROIs in red belong to the DCIS cluster and those in blue to the IDC cluster. (B) Bar plot showing the expression density of various immune cell markers stratified by section. The red series indicates DCIS, and the blue series indicates IDC. The respective color gradients indicate each stratified interval. Error bars represent 95% confidence intervals (CI). The P values and regression coefficients [95% CI] for each fitted GLM are shown on the right side. (C) Spatial expression distribution of CDH1 and CD8A in each ROI.
Fig 7.
Cataloguing spatial domains using Xenium data of metastatic colorectal cancer in the TRIUMPH trial.
(A) t-Distributed stochastic neighbor embedding (tSNE) plots of 2,151 spatial domains (left) and 391,639 tumor cells (right) from 23 Xenium datasets. The plots are colored according to the clusters determined by the Leiden algorithm [29]. (B) tSNE plots from A colored by patient ID.