From spots to cells: Cell segmentation in spatial transcriptomics with BOMS

doi:10.1371/journal.pone.0311458

Fig 1.

Workflow of the BOMS algorithm.

A: BOMS takes as input the gene labels and the spot locations. B: For each spot, the number of occurrences of each gene among its k spatial nearest neighbors is counted to form the Neighborhood Gene Expression (NGE) vector. These NGE vectors can be visualized in color space by projecting them onto three dimensions with PCA. The spatial locations together with the NGE vectors form separate clusters for individual cell instances in the joint spatial-NGE space. BOMS takes advantage of this structure and finds the modes in this joint domain by iteratively moving towards the maxima of the underlying (estimated) density function. C: Sample trajectories of the mean shift procedure are shown along with the final mode locations. D: Cell segmentation labels are estimated by grouping together all the molecules that were mapped to the same mode. E: Final cell outlines are shown with the NGE vectors. F: Movement of a spot after incorporating Cellpose flows at different confidence levels. The mean shift direction is marked in red and the direction of the Cellpose flow in blue. The final location of the spot is a convex combination of the two vectors: at one extreme of the confidence level it coincides with the mean shift update, and at the other with the update given by the Cellpose flow.
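The NGE construction in panel B and the blending in panel F can be illustrated with a small sketch. This is not the BOMS implementation: the function names, the brute-force neighbor search, the choice to count a spot as its own neighbor, and the weight `w` (with `w = 0` meaning pure mean shift and `w = 1` meaning pure Cellpose flow) are all assumptions made for illustration, and the caption's own symbol and convention for the confidence level are not reproduced here.

```python
import math
from collections import Counter

def nge_vectors(coords, genes, k):
    """Count gene occurrences among each spot's k spatial nearest neighbors."""
    gene_names = sorted(set(genes))
    vectors = []
    for p in coords:
        # Brute-force neighbor search; the spot itself is included (assumption).
        order = sorted(range(len(coords)), key=lambda j: math.dist(p, coords[j]))
        counts = Counter(genes[j] for j in order[:k])
        vectors.append([counts[g] for g in gene_names])
    return gene_names, vectors

def blend(meanshift_pos, cellpose_pos, w):
    """Convex combination of two candidate positions.

    w is a hypothetical weight on the Cellpose update:
    w = 0 returns the mean shift position, w = 1 the Cellpose position.
    """
    return tuple((1 - w) * m + w * c for m, c in zip(meanshift_pos, cellpose_pos))

# Toy data: two spatially separated groups of three spots each.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
genes = ["A", "A", "B", "C", "C", "A"]

names, nge = nge_vectors(coords, genes, k=3)
blended = blend((1.0, 0.0), (0.0, 1.0), w=0.75)
```

In this toy input, spots in the same group share an identical NGE vector, which is the structure that mode seeking in the joint spatial-NGE space relies on.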

Table 1.

Segmentation parameters for the different datasets.

Fig 2.

Examples of BOMS results on the published datasets.

A: BOMS segmentation results on the Allen smFISH dataset. All molecules are colored by taking a PCA projection of the NGE vectors. Cell boundaries are shown by black contours. The right column shows a zoomed-in version of BOMS, Baysor, and the original published segmentation overlaid on the DAPI image. Green circles indicate that the method has correctly detected cell boundaries, whereas red circles indicate incorrect segmentation. B: BOMS result on the STARmap dataset [15] overlaid on the DAPI image. C: BOMS result on the osmFISH dataset [14] overlaid on the poly(A) image. D: BOMS result on the MERFISH dataset [13].

Fig 3.

Effect of Cellpose flows on BOMS segmentation.

The figure demonstrates the impact of incorporating Cellpose flows into BOMS segmentation for the Allen smFISH (A) and osmFISH (B) datasets. For the Allen smFISH dataset, Cellpose was applied to the DAPI image, while for the osmFISH dataset, it was applied to the poly(A) image. Cyan boundaries represent Cellpose segmentations, while black outlines depict the resulting BOMS segmentations. All molecules are colored based on a PCA projection of their NGE vectors, where spots with similar colors have a similar molecular neighborhood. Transcriptional variation within cells is reflected by differences in the coloring of spots. As the influence of Cellpose flows increases, BOMS segmentation aligns more closely with Cellpose boundaries. Lower values are particularly necessary in cases with substantial intracellular transcriptional variation to ensure faithful alignment with Cellpose results.

Fig 4.

Comparison of BOMS with related methods.

A: The runtime performance of BOMS vs. Baysor. BOMS produces results of similar quality to Baysor while being 10 times faster. B: Mutual Information Scores with respect to the Silver Standard (original published segmentations). The scores are similar to those of Baysor and pciSeq, except on the STARmap dataset. C: Schematic showing the calculation of the correlation score for comparing a source and a target segmentation. For each cell in the source segmentation, the target cell with the maximum overlap is found. The correlation score between the molecules in the overlapping region and the remaining molecules in the source cell is then estimated. If the source segmentation is correct, the corresponding correlation scores should be high. D: The number of detected cells reported by different methods, showing that BOMS is able to recover more cells than the other methods. E: The fraction of molecules assigned to cells by different methods, showing that BOMS leaves the fewest transcripts unassigned. F: Correlation scores for (a) BOMS vs. Baysor, (b) BOMS vs. pciSeq, and (c) BOMS vs. Silver Standard, showing higher performance of BOMS relative to pciSeq and the original published segmentation.
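The schematic in panel C can be read as the following procedure, sketched here under several assumptions that the caption does not confirm: label `0` marks unassigned molecules, expression is summarized as per-gene molecule counts, and the correlation is a Pearson correlation over those count vectors. The function names are hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

def correlation_scores(source, target, genes):
    """Per source cell: correlate the part overlapping its best-matching
    target cell with the remainder of the source cell."""
    gene_names = sorted(set(genes))
    scores = {}
    for cell in set(source) - {0}:  # 0 = unassigned (assumption)
        members = [i for i, s in enumerate(source) if s == cell]
        overlaps = Counter(target[i] for i in members if target[i] != 0)
        if not overlaps:
            continue  # no target cell overlaps this source cell
        best = overlaps.most_common(1)[0][0]
        inside = Counter(genes[i] for i in members if target[i] == best)
        rest = Counter(genes[i] for i in members if target[i] != best)
        if not rest:
            continue  # source cell lies entirely inside the target cell
        scores[cell] = pearson([inside[g] for g in gene_names],
                               [rest[g] for g in gene_names])
    return scores

# Toy example: one source cell that the target segmentation splits in two.
source = [1, 1, 1, 1, 1, 1]
target = [1, 1, 1, 1, 2, 2]
genes = ["A", "A", "B", "C", "A", "B"]
scores = correlation_scores(source, target, genes)
```

A high score means the leftover molecules look like the overlapping ones, which supports the source cell's larger boundary.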

Table 2.

Summary of benchmarking results across datasets and methods. The table compares BOMS with other segmentation methods, including Baysor, pciSeq, and the Silver Standard, based on key metrics: runtime, mutual information, correlation score, number of detected cells, and fraction of molecules assigned. ‘S’ and ‘T’ refer to Source and Target segmentations, respectively. See Fig 4 for visual representations of these comparisons.

Fig 5.

Breakdown of the correlation metric proposed by [9].

The figure illustrates the behavior of the correlation metric when comparing Baysor with BOMS at varying settings, inducing under-segmentation or over-segmentation, on the Allen smFISH dataset. The Silver Standard segmentation is depicted with red contours in A–F. A: Baysor segmentation. B: BOMS with optimal parameters (h_s = 17.5, h_r = 0.4). C: BOMS with a high spatial bandwidth to induce under-segmentation (h_s = 30, h_r = 0.4). D: BOMS with a high range bandwidth to induce under-segmentation (h_s = 17.5, h_r = 0.9). E: BOMS with a low spatial bandwidth to cause over-segmentation (h_s = 10, h_r = 0.4). F: BOMS with a low range bandwidth to cause over-segmentation (h_s = 17.5, h_r = 0.08). G: Correlation scores are high, implying good performance, when BOMS under-segments because of a high spatial bandwidth or over-segments because of a low range bandwidth, in contrast to the poor visual results. H: Normalized mutual information values with respect to the Silver Standard show that these results are actually worse despite the good correlation scores.
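For readers unfamiliar with the metric in panel H, normalized mutual information between two molecule-level labelings can be sketched as below. The normalization by the arithmetic mean of the two entropies is an assumption; the caption does not state which variant was used, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_info(u, v):
    """NMI = 2 * I(U; V) / (H(U) + H(V)); arithmetic normalization (assumption)."""
    n = len(u)
    joint = Counter(zip(u, v))
    pu, pv = Counter(u), Counter(v)
    mi = sum(c / n * math.log(c * n / (pu[a] * pv[b]))
             for (a, b), c in joint.items())
    denom = entropy(u) + entropy(v)
    return 2 * mi / denom if denom else 1.0

# Same partition with different label names vs. an unrelated partition.
same = normalized_mutual_info([1, 1, 2, 2], [5, 5, 7, 7])
unrelated = normalized_mutual_info([1, 1, 2, 2], [1, 2, 1, 2])
```

Because the score depends only on how molecules are grouped, not on label names, it penalizes both merged and split cells relative to the reference labeling.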

Fig 6.

Comparison of cells obtained from BOMS and Cellpose.

The figure shows results from Cellpose and BOMS on adjacent cells in the (A) Allen smFISH dataset, where Cellpose was applied to the DAPI image, and (B) osmFISH dataset, where Cellpose was applied to the poly(A) image. For the Allen smFISH dataset, BOMS identifies larger cell boundaries compared to Cellpose as it also includes transcripts outside the nuclei, which Cellpose cannot detect when only DAPI staining is available. For osmFISH, the results from BOMS align closely with Cellpose boundaries.

Fig 7.

Detection of cells missed by Cellpose but identified by BOMS.

This figure presents examples from the osmFISH dataset where Cellpose, applied to the poly(A) image, failed to detect cells, while BOMS successfully segmented them using transcriptomic data. For each instance, the DAPI image, poly(A) image, and transcriptomic data are shown with Cellpose boundaries (cyan) and BOMS boundaries (black) overlaid. These examples highlight BOMS’s ability to segment cells in cases where auxiliary images are insufficient due to imaging artifacts or lack of signal.

Table 3.

Comparison of the advantages and disadvantages of different cell segmentation methods. The table summarizes the strengths and limitations of BOMS and other methods (Baysor, pciSeq, and Auxiliary Stain Segmentation), providing qualitative insights into their performance characteristics.

Fig 8.

Application of BOMS to the Xenium dataset.

A: Segmented cells using BOMS. B: Segmented cells using the Silver Standard. In (A) and (B), the spots are colored by taking a PCA projection of the NGE vectors. C: Cellpose boundaries overlaid on the DAPI image, where Cellpose was applied to define nuclear boundaries. D: Correlation score of BOMS with respect to the Silver Standard.

Fig 9.

Results of clustering and DEG analysis on the Allen smFISH dataset.

A–D: UMAP embeddings of cells segmented by (A) Silver Standard, (B) BOMS, (C) Baysor, and (D) pciSeq, with clusters represented by distinct colors. The number of clusters is indicated in the title of each subplot. E–G: Heatmaps showing the Jaccard similarity of the top-5 DEGs identified for clusters in the Silver Standard and clusters from (E) BOMS, (F) Baysor, and (G) pciSeq.
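The heatmaps in panels E–G compare sets of top-ranked DEGs between clusterings. A minimal sketch of that comparison follows, assuming each cluster is already summarized by a list of its top DEGs; the cluster names and gene identifiers are placeholders, not values from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_matrix(reference, other):
    """Rows follow reference clusters, columns follow the other method's clusters."""
    return [[jaccard(ref_genes, oth_genes) for oth_genes in other.values()]
            for ref_genes in reference.values()]

# Placeholder top-5 DEG lists per cluster (hypothetical gene names).
reference = {"ref_0": ["g1", "g2", "g3", "g4", "g5"],
             "ref_1": ["g6", "g7", "g8", "g9", "g10"]}
other = {"m_0": ["g1", "g2", "g3", "g6", "g7"],
         "m_1": ["g6", "g7", "g8", "g9", "g10"]}

matrix = jaccard_matrix(reference, other)
```

A row with one value close to 1 indicates that a reference cluster has a clear counterpart in the other segmentation's clustering.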

Fig 10.

Results of clustering and DEG analysis on the MERFISH dataset.

A–D: UMAP embeddings of cells segmented by (A) Silver Standard, (B) BOMS, (C) Baysor, and (D) pciSeq, with clusters represented by distinct colors. The number of clusters is indicated in the title of each subplot. E–G: Heatmaps showing the Jaccard similarity of the top-5 DEGs identified for clusters in the Silver Standard and clusters from (E) BOMS, (F) Baysor, and (G) pciSeq.

Fig 11.

Cell type label transfer for the osmFISH dataset.

A: Bar plot showing the number of cells assigned to each cell type by the Silver Standard, BOMS, Baysor, and pciSeq. B: Example of an astrocyte cell identified both in the Silver Standard and by BOMS. C: Example of an astrocyte detected by BOMS but not present in the Silver Standard due to lack of poly(A) signal. The grayscale background shows poly(A) staining. D–E: Joint UMAP embedding of cells from BOMS and the Silver Standard colored by (D) cell type and (E) dataset, illustrating overlap and unique contributions.
