Improving cell-type composition inference in spatial transcriptomics with SpaDAMA

doi:10.1371/journal.pcbi.1013354

Fig 1.

The network architecture of SpaDAMA.

(A) Procedure for pseudo-ST data generation. The SpaDAMA training phase consists of three stages, (B), (C), and (D), in which the pseudo-ST and masked ST data are processed by a shared encoder that generates latent variables using a masked autoencoder. (E) The optimized model is then applied to estimate the cell-type composition of real ST data. (F) Downstream analysis.


Fig 2.

Across 32 simulated datasets, SpaDAMA showed superior performance, with higher PCC, SSIM, and AS values and lower RMSE and JS values.

AS (Accuracy Score) is a composite metric that combines PCC, SSIM, RMSE, and JS.
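
As an illustration of three of the metrics named above, the following minimal sketch computes PCC, RMSE, and a Jensen-Shannon divergence between a ground-truth and a predicted proportion vector. It is not the authors' evaluation code. The function names and toy values are hypothetical, the base-2 divergence is an assumption (the paper's JS may instead be the JS distance, its square root), SSIM is omitted because it requires values arranged on a spatial grid, and the AS combination is not reproduced because its formula is not given here.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, assumed) between two non-negative
    vectors. Each vector is normalized to sum to 1 first."""
    sp, sq = sum(p), sum(q)
    p = [a / sp for a in p]
    q = [b / sq for b in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy values: proportions of four cell types in one spot.
true_prop = [0.10, 0.20, 0.30, 0.40]
pred_prop = [0.12, 0.18, 0.33, 0.37]

print("PCC :", pcc(true_prop, pred_prop))
print("RMSE:", rmse(true_prop, pred_prop))
print("JS  :", js_divergence(true_prop, pred_prop))
```

Higher PCC and lower RMSE and JS indicate closer agreement between predicted and true proportions, which matches the direction of the comparison described in the caption.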


Fig 3.

SpaDAMA accurately identifies the two subtypes of cardiomyocytes (atrial cardiomyocytes and ventricular cardiomyocytes) in the Human Developing Heart (HDH) dataset.

(A) Compares the prediction results of various methods applied to the HDH dataset. (B) Displays the marker gene MYH6 and the estimated proportions of atrial cardiomyocytes (a rare cell type), as determined by SpaDAMA and other methods. (C) Shows the marker gene MYH7 and the estimated proportions of ventricular cardiomyocytes (a major cell type), as predicted by SpaDAMA and alternative methods. (D) Compares the performance of SpaDAMA and other methods in estimating the proportions of atrial cardiomyocytes, using PCC and JSD values for the marker gene MYH6. (E) Compares the performance of SpaDAMA and other methods in estimating the proportions of ventricular cardiomyocytes, using PCC and JSD values for the marker gene MYH7.


Fig 4.

SpaDAMA effectively analyzes the cell composition in the Murine Lymph Node (MLN) dataset.

(A) Shows the cell composition predicted by SpaDAMA. (B) Lists the marker genes selected for each cell type. (C) Presents the marker gene Ly6a and the estimated proportions of Ifit3-high B cells, as predicted by SpaDAMA and other methods. (D) Shows the marker gene Cd79a and the estimated proportions of Mature B cells, as predicted by SpaDAMA and other methods. (E) Compares the PCC and JSD scores between SpaDAMA and other methods for the marker gene Ly6a and the estimated proportion of Ifit3-high B cells. (F) Compares the PCC and JSD scores between SpaDAMA and other methods for the marker gene Cd79a and the estimated proportion of Mature B cells.


Fig 5.

SpaDAMA shows strong performance in analyzing cell composition within the Zebrafish Embryo (ZE) dataset.

(A) Clustering of scRNA-seq data by cell type from the same tissue. (B) Cell composition predicted by SpaDAMA. (C) Marker genes selected for each cell type. (D) Gene expression levels of blf, followed by Erythroid Lineage Cell proportions predicted by SpaDAMA and other methods. (E) Expression levels of si:ch211-157c3.4, followed by Periderm_krt17 proportions predicted by SpaDAMA and other methods. (F) PCC and JSD scores for blf and Erythroid Lineage Cell proportions. (G) PCC and JSD scores for si:ch211-157c3.4 and Periderm_krt17 proportions.


Fig 6.

SpaDAMA demonstrates strong performance in analyzing cell composition within the Human Pancreatic Ductal Adenocarcinoma (PDAC) dataset.

(A) Shows the true labels for the four regions (cancer, stroma, pancreatic, and duct epithelium). (B) Presents the clustering of scRNA-seq data, grouped by cell type, from the same tissue. (C) The first panel shows the labeled cancer regions, followed by panels displaying the predictions of the various methods. (D) The first panel shows the labeled pancreatic regions, followed by panels displaying the predictions of the various methods. (E) Shows the ROC curves for the prediction of the four regions (cancer, stroma, pancreatic, and duct epithelium) by the different methods.
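
To make the ROC comparison in (E) concrete, the sketch below builds an ROC curve and its area under the curve (AUC) for one region. It assumes, purely for illustration, that each spot has a binary label (inside or outside the annotated region) and a score equal to the predicted proportion of a cell type associated with that region. How the paper derives its scores is not stated here, and the function names and toy values are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping a threshold from the
    highest score downward. Spots with tied scores are added together."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: six spots, 1 = spot lies in the annotated cancer region.
in_cancer = [1, 1, 0, 1, 0, 0]
cancer_score = [0.8, 0.6, 0.3, 0.4, 0.5, 0.1]

curve = roc_points(in_cancer, cancer_score)
print("ROC points:", curve)
print("AUC:", auc(curve))
```

An AUC near 1 means the score ranks in-region spots above out-of-region spots almost everywhere, while 0.5 corresponds to a random ranking.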


Fig 7.

The heatmap displays the Pearson correlation scores of cell types, computed using the cell-type proportions inferred by SpaDAMA across four real ST datasets.

The color scale represents the correlation values. (A) shows the Zebrafish Embryo (ZE) dataset, (B) shows the Murine Lymph Node (MLN) dataset, (C) shows the Human Developing Heart (HDH) dataset, and (D) shows the Human Pancreatic Ductal Adenocarcinoma (PDAC) dataset.
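
The matrix behind such a heatmap can be sketched as follows: for every pair of cell types, compute the Pearson correlation of their inferred proportions across spots. This is a minimal illustration under assumed inputs, not the authors' code. The cell-type names, proportion values, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(proportions):
    """proportions maps a cell-type name to its per-spot proportions.
    Returns the ordered names and the square matrix of pairwise correlations."""
    names = list(proportions)
    matrix = [[pearson(proportions[a], proportions[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical inferred proportions for three cell types over four spots.
inferred = {
    "type_A": [0.6, 0.4, 0.2, 0.1],
    "type_B": [0.3, 0.4, 0.5, 0.6],
    "type_C": [0.1, 0.2, 0.3, 0.3],
}

names, corr = correlation_matrix(inferred)
for name, row in zip(names, corr):
    print(name, [round(v, 3) for v in row])
```

Positive entries indicate cell types whose proportions rise and fall together across spots, and negative entries indicate cell types that tend to occupy different spots.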


Fig 8.

Comparison between SpaDAMA and RCTD on real spatial transcriptomics datasets.

For each dataset, the figure presents the predicted spatial distributions of selected cell types, with the corresponding marker genes indicated in italicized titles (Marker Gene vs. Cell Type). (A) Human Developing Heart (HDH) dataset. (B) Murine Lymph Node (MLN) dataset. (C) Zebrafish Embryo (ZE) dataset. (D) Human Pancreatic Ductal Adenocarcinoma (PDAC) dataset.


Table 1.

Summary of real datasets used for analysis.
