Fig 1.

Deconvolution challenges due to coherence and heterogeneity.

A, Two-dimensional UMAP representation of scRNA-Seq data from the developing human heart. Each color denotes cell types (clusters), revealing local and global structures based on their similarity. B, Illustration of high cellular heterogeneity within Cluster-0, represented by the line connecting the centroid (star) of Cluster-0 to the cell at the maximum distance. C, Overlapping clusters (Cluster-2 in green, Cluster-3 in orange, and Cluster-4 in blue) due to cell type similarity, with cluster radii calculated using centroid distances and cell locations at maximum distances. Large intersecting areas indicate similarity between cell types. D, Heatmap of Pearson’s correlations among clusters, with increased red intensity signifying higher correlations between functionally coherent cell types. The correlation values between clusters are displayed. E, The 2D UMAP plot of scRNA-Seq reference data showing cellular clusters from two cell types (grey) and (orange) as separate clusters with their DEG vectors and . F, A spatial transcriptomics capture spot containing multiple cells from one cell type , , and (grey) and background noise (brick-red), illustrating the complexity of the spot’s transcriptomic profile. G, General deconvolution strategy using scRNA-Seq reference vectors to infer cell-type contributions in capture spots. Conventional over-permissive methods assign non-negative coefficients, and , to both cell types and even when only is present, leading to inaccurate, non-sparse solutions. H, Illustration of noise tolerance in deconvolution. Over-permissive tools distribute coefficients across both cell types to account for noise, while WISpR uses thresholding to eliminate the contribution of and recover the true sparse coefficient for , enhancing deconvolution accuracy. 
I–N, The reference scRNA-Seq data from the developing human embryonic heart is used to assess deconvolution approaches for distinguishing human heart-specific cell types from a mouse brain spatial transcriptomics dataset. Model predictions for atrial cardiomyocyte cell types in the mouse brain are presented. I, In the DWLS prediction, human atrial cardiomyocytes show low percentages in mouse brain spots, with occurrences observed in cortical layers and thalamus. J, The RCTD prediction reveals high percentages of cells in spots located in hippocampal, thalamic, and cortical regions. K, The S-DWLS prediction, similar to RCTD, displays high percentages of cells in spots concentrated in cortical and hippocampal regions. L, SPOTlight and M, Stereoscope predict a high abundance of cells across various brain zones, while N, WISpR predicts a minimal number of spots in the thalamus, supporting its more accurate prediction on mismatched datasets.
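The cluster radius used in panels B and C (the distance from a cluster centroid to its farthest cell) can be illustrated with a minimal sketch. The function names `cluster_radius` and `clusters_overlap` and the toy 2D coordinates are assumptions made for illustration; they are not taken from the article or its software.

```python
import math

def cluster_radius(points):
    """Return (centroid, radius); radius is the largest centroid-to-cell distance."""
    n = len(points)
    centroid = tuple(sum(coord) / n for coord in zip(*points))
    radius = max(math.dist(centroid, p) for p in points)
    return centroid, radius

def clusters_overlap(points_a, points_b):
    """Two clusters overlap when their centroids are closer than the sum of their radii."""
    centroid_a, radius_a = cluster_radius(points_a)
    centroid_b, radius_b = cluster_radius(points_b)
    return math.dist(centroid_a, centroid_b) < radius_a + radius_b

# Hypothetical 2D embedding coordinates (toy values, not real UMAP output).
cluster_a = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
cluster_b = [(x + 1.5, y) for x, y in cluster_a]    # shifted slightly: overlaps cluster_a
cluster_far = [(x + 10.0, y) for x, y in cluster_a]  # shifted far away: no overlap

centroid_a, radius_a = cluster_radius(cluster_a)
print(centroid_a, radius_a)
print(clusters_overlap(cluster_a, cluster_b), clusters_overlap(cluster_a, cluster_far))
```

Treating each cluster as a circle is a coarse summary: a single outlying cell inflates the radius, which is why a large radius is read in panel B as a sign of heterogeneity within the cluster.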

Fig 2.

Overcoming limitations in sparse cell-type predictions for a biologically reliable task.

A, Illustration of the mouse hippocampus on a coronal section of the mouse brain. B, 2D UMAP representation of the 8 HPF cell types. C, Overlapping CA1 and CA3 clusters due to cell type similarity, with cluster radii calculated using centroid distances and cell locations at maximum distances. Model predictions for CA3 are presented for the P8 (D-I) and adult (J-O) mouse brain. D, J, In the DWLS prediction, CA3 cells are slightly over-represented in mouse brain spots (blue shade), with frequent spot prediction failures (red). E, K, The RCTD prediction reveals high percentages of cells in spots located in hippocampal, thalamic and cortical regions. F, L, The S-DWLS prediction displays confined but still disproportionate percentages of cells in spots concentrated in the stratum oriens, stratum pyramidale, stratum lucidum and thalamus regions. G, M, SPOTlight predicts a high abundance of CA3 cells in DG and across various brain zones including cortex. H, N, Stereoscope predicts CA3 in almost all hippocampus, thalamus and cortical regions, while I, O, WISpR predicts CA3 cells exactly in the CA3 morphological region in both P8 and adult mice, supporting its more accurate prediction on mismatched datasets.

Fig 3.

WISpR spatial deconvolution workflow overview.

A, UMAP visualization of scRNA-Seq data showcasing four distinct cell types, each depicted in a unique color. B, Gene expression profiles for individual cell types derived from the intersection of genes (N) in scRNA-Seq and spatial transcriptomics datasets. C, Representative x- and y-coordinates of spots in the spatial transcriptomics dataset, with each color indicating a zone profile identified through clustering analysis. D, Gene expression profiles for individual spatial zones based on a selected gene set from the intersection of scRNA-Seq and spatial transcriptomics datasets. E, Identification of differentially expressed genes specific to cell types () and zones (), defining cluster differentiation strength. The union of and forms the reference scRNA-Seq and spatial transcriptomics datasets (G) for WISpR deconvolution. F, WISpR algorithm summary. The spatial transcriptomics matrix (MxL) and reference scRNA-Seq matrix (MxQ) guide the prediction of the best sparse coefficient matrix (QxL), where M is the number of genes in G, L is the number of spots, and Q is the number of reference cell types. WISpR iteratively thresholds coefficients based on spot-specific weights (w) and thresholds (). G, Spatial deconvolution results reveal estimated abundance patterns, locations, and co-occurring cell type compositions for each reference cell type using the WISpR algorithm.
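The idea of iteratively thresholding coefficients in panel F can be illustrated for a single spot with a generic sequentially thresholded least-squares loop. This is a sketch under stated assumptions, not the WISpR implementation: it uses one fixed threshold, omits the spot-specific weights (w), and the function names and toy numbers are hypothetical.

```python
def solve_least_squares(A, b, active):
    """Fit b on the active columns of A by solving the normal equations."""
    k = len(active)
    rows = range(len(A))
    G = [[sum(A[i][p] * A[i][q] for i in rows) for q in active] for p in active]
    h = [sum(A[i][p] * b[i] for i in rows) for p in active]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(G[r][col]))
        G[col], G[pivot] = G[pivot], G[col]
        h[col], h[pivot] = h[pivot], h[col]
        for r in range(col + 1, k):
            factor = G[r][col] / G[col][col]
            for c in range(col, k):
                G[r][c] -= factor * G[col][c]
            h[r] -= factor * h[col]
    # Back substitution.
    coef = [0.0] * k
    for r in reversed(range(k)):
        rest = sum(G[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = (h[r] - rest) / G[r][r]
    return coef

def thresholded_least_squares(A, b, threshold):
    """Refit repeatedly, dropping cell types whose coefficient falls below threshold."""
    n_types = len(A[0])
    active = list(range(n_types))
    while active:
        fitted = solve_least_squares(A, b, active)
        kept = [j for j, c in zip(active, fitted) if c >= threshold]
        if kept == active:  # nothing was dropped: converged
            coef = [0.0] * n_types
            for j, c in zip(active, fitted):
                coef[j] = c
            return coef
        active = kept
    return [0.0] * n_types

# Toy reference: 4 genes (rows) x 2 similar cell types (columns). Values are invented.
reference = [[1.0, 0.9], [0.0, 0.2], [2.0, 1.8], [1.0, 1.1]]
# Toy spot: three cells of type 0 plus small noise, none of type 1.
spot = [3.05, 0.02, 5.97, 3.04]

permissive = solve_least_squares(reference, spot, [0, 1])
sparse = thresholded_least_squares(reference, spot, threshold=0.5)
print(permissive, sparse)
```

In this toy case the unthresholded fit spreads part of the signal onto the absent cell type to absorb noise, while the thresholded fit removes that coefficient and refits using only the remaining cell type.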

Table 1.

Blended data generated using mouse brain cells and human embryonic heart cells, labeled Mix[a,b,c], where a is the proportion of mouse brain cells found in both the reference scRNA-Seq and synthetic spatial datasets, b is the proportion of human embryonic heart data, and c is the proportion of mouse brain cells that are present in the reference data but absent from the synthetic spatial dataset.

Fig 4.

Overview of the comparative analysis of deconvolution model performances using synthetic spatial data in four different scenarios and six different mixture percentages.

Four benchmark scenarios are generated from selected cells of 15 cell types in the developing human embryonic heart scRNA-Seq dataset. Six different mixtures are generated using human heart cells and mouse brain cells. The tested deconvolution algorithms are DWLS, RCTD, S-DWLS, SPOTlight, Stereoscope, and WISpR. A, Mean Root Mean Squared Error (RMSE) of the six methods' predictions. Except for Scenario 3, WISpR outperforms the other five models. In Scenario 3, the RCTD model showed a slightly better mean predictive performance, but the difference between the error distributions of RCTD and WISpR was not statistically significant. B, The mean standard deviations of RMSE scores. WISpR emerged as less error-prone, especially in blended data (Mix[50,50,0], Mix[50,40,10], Mix[50,30,20], Mix[50,20,30], Mix[50,10,40], Mix[50,0,50]). C, The mean F1 scores and D, their mean standard deviations in all scenarios. When there are highly matching cell types between scRNA-Seq and synthetic data, the F1 scores of WISpR and S-DWLS are the highest; however, when unmatched cell types are introduced, the S-DWLS score decreases dramatically. The low standard deviations in DWLS, RCTD and SPOTlight indicate their ubiquitous false predictions of non-existent cell types.
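The two metrics reported in this figure can be sketched for a single synthetic spot as follows. The exact definitions used in the article are not given in the caption, so this is an assumption: RMSE is computed over per-cell-type proportions, and F1 treats a cell type as predicted present when its proportion exceeds a cutoff `eps`. The function names and proportions are illustrative only.

```python
import math

def rmse(true_props, pred_props):
    """Root mean squared error between true and predicted cell-type proportions."""
    squared = [(t - p) ** 2 for t, p in zip(true_props, pred_props)]
    return math.sqrt(sum(squared) / len(squared))

def f1_presence(true_props, pred_props, eps=0.0):
    """F1 score on presence/absence calls; a cell type is present if proportion > eps."""
    pairs = list(zip(true_props, pred_props))
    tp = sum(1 for t, p in pairs if t > eps and p > eps)
    fp = sum(1 for t, p in pairs if t <= eps and p > eps)
    fn = sum(1 for t, p in pairs if t > eps and p <= eps)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy spot with four reference cell types, two of which are truly present.
truth = [0.5, 0.5, 0.0, 0.0]
sparse_pred = [0.6, 0.4, 0.0, 0.0]      # sparse prediction: absent types stay at zero
permissive_pred = [0.4, 0.3, 0.2, 0.1]  # permissive prediction: every type gets weight

print(rmse(truth, sparse_pred), f1_presence(truth, sparse_pred))
print(rmse(truth, permissive_pred), f1_presence(truth, permissive_pred))
```

Under these definitions a method that assigns small positive weight to every cell type is penalized in F1 through false positives even when its RMSE is moderate, which is consistent with the contrast the caption draws for unmatched cell types.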

Fig 5.

Predictive performance evaluation on real heart data with selected cell types.

Top: Predictions for ventricular cardiomyocytes. Bottom: Predictions for outflow tract cells. Blue denotes unrealistic negative predictive coefficients, while the red gradient represents the cell type percentages in the corresponding capture spots. A and G, DWLS predictions reveal numerous spots with unrealistic negative coefficients. Biologically incorrect positive coefficients are also evident, particularly around the atria and outflow tract, especially for ventricular cardiomyocyte cell prediction. B and H, RCTD shows a high abundance of false-positive predictions for ventricular cardiomyocyte cells. For the outflow tract, four spots represent false-positive predictions at a low percentage. C and I, S-DWLS exhibits improved predictions; however, false-positive predictions persist for both ventricular cardiomyocytes and the outflow tract. One spot shows a negative coefficient in C. D and J, SPOTlight predictions indicate overpredicted capture spots in the corresponding tissue. E and K, Stereoscope also overpredicts cells belonging to ventricular cardiomyocytes. Moreover, cells for the outflow tract are incorrectly predicted in the epicardial zone of the heart. F and L, WISpR predictions accurately emphasize abundant ventricular cardiomyocytes and outflow tract cells in associated spots. M, WISpR estimates an average of cell types per spot, aligning with expected spatial complexity. N–O, WISpR has the fewest positive spots for ventricular cardiomyocytes (n = 119) and one of the fewest for the outflow tract (n = 33), and achieves the highest precision (0.92 and 0.88) across validated regions. P, WISpR and S-DWLS show the highest proportion estimates for dominant cell types in biologically validated regions. R, WISpR demonstrates the lowest off-target assignment rates, confirming that its sparsity-aware design minimizes overfitting and enhances biological fidelity.

Fig 6.

Precise spatial delineation of brain cell types in adult mouse brain.

A, Illustration of the right hemisphere coronal section of an adult mouse brain, outlining major brain regions with dashed lines. B, Identification of 10 spatial clusters, extracted from the 10X Genomics database, representing CTX+AMYG (violet), HY+cerebral peduncle (blue), OSNs (yellow), ventral posterolateral TH (dark green), Ndnf+ neurons (cyan), fiber tracts (magenta), HPF and DG (brown), Npy+ neurons (light green), ventricles (white), and stria terminalis (grey). C, 3D UMAP illustration of scRNA-Seq [69], representing major cell types with distinct colors and cellular subtypes as shades of the major color. For simplicity, Neuron 25 (violet), Ependymal 47 (orange), Oligodendrocytes 5 (light green), and Astrocytes 14 (light blue) are depicted. D-I Spatial localization of some selected cell types and their selected marker genes are given below. D, Spatial localization of Astrocytes 14 on AMYG, CA1, CA3, DG, and outer layers of CTX, with expression levels of a critical astrocyte-specific DEG, Acsbg1. E and F, Coinciding spatial locations of Blood 73 cells and Vascular 70 cells, respectively, on the third ventricle, the ventral intersection point of the cerebral peduncle and the molecular layer of the DG. The expression of Alas2 and Crabp2, which are an erythrocyte and a fibroblast marker, respectively, are enriched. G, Cells of the Oligodendrocytes 5 localized in the corpus callosum, fimbria, stria terminalis, and cerebral peduncle, enriched in Ermn K (left), specifically expressed in myelinating oligodendrocytes. H, Ependymal 47 cells predominantly located in the lateral and third ventricles expressing Ccdc153, an ependymal cell marker gene. I, Representation of six distinctly located neuronal subtypes, Neuron 12 (yellow), Neuron 14 (orange), Neuron 25 (green), Neuron 27 (blue), Neuron 28 (magenta), and Neuron 59 (red). 
The brain-specific Ak5 is differentially expressed within Neuron Cluster 25 but is also found in other neuronal cell types and a few excluded cells, indicating their possible neuronal characteristics.

Fig 7.

Spatial characterization of cancer-associated cell types in human breast cancer malignancies.

A, 8 clusters of ER+ breast cancer (Patient ID: CID4535). Invasive cancer-affected tissues were shown in shades of cyan, yellow (cancer within lymphatic tissues), and blue (cancer in adipose and lymphatic tissues). Notably, stromal regions (dark-green) and lymphocytes (red) formed distinct clusters without cancerous involvement. Adipose tissue (brown) exhibited minimal representation within the dataset. A cluster labeled as “Uncertain” (white) was identified, alongside sparse occurrences of artifact spots (magenta) [74]. B, Localization of CAFs myCAF-like cells, with their abundance proportions in the stroma (less abundance: white, complete abundance: black). C, Basal cells were not detected in the ER+ tissue sample. D, Cycling cells were localized in the “Uncertain” zone. E, LumB SCs were highly concentrated in zones annotated as invasive cancer while absent in stromal and lymphatic zones. F, 7 clusters of TNBC breast cancer (Patient ID: CID44971). Invasive cancer and lymphatic zones (yellow), and DCIS (cyan). Normal tissue, including lymphocytes and stroma, was represented in blue, with stroma+adipose tissue in brown. Stromal regions were clustered in dark-green. Healthy lymphocytes were distributed across both invasive cancer and DCIS regions but distinctly clustered in red. A few magenta spots were artifacts [74]. G, CAFs myCAF-like cells were predominantly predicted in the invasive cancer and stromal+adipose tissue, representing their cancer-prone characteristics. H, Basal SCs and I, cycling cells were confined to the DCIS and a bounded region of the invasive cancer spatial cluster, representing similar gene expression patterns in distinct zones at high resolution. J, The DCIS had the capacity to diversify its cancer microenvironment based on the existence of LumB SCs. K, Expression profiles of representative genes, namely MAGED2, STARD10, and DHRS2, for TNBC, and L, C4orf48 and APOE, for ER+ cancer, identified by the “SCSubtypes” algorithm [74]. 
(Top) The ridge plot represents the distribution of the gene expression levels and (bottom) the spatial distribution of the expression of the genes.
