BARcode DEmixing through Non-negative Spatial Regression (BarDensr)

doi:10.1371/journal.pcbi.1008256

Fig 1.

BarDensr uses non-negative regression to demix and deconvolve the observed image stack, yielding a sparse intensity image for each barcode.

The key task in spatial transcriptomics data analysis is to take a stack of images (right) and use it to infer the locations of rolonies in the tissue. To solve this problem, BarDensr posits an ‘observation model’: a description of the physical process by which rolonies in the tissue give rise to the brightness we observe at each voxel. In particular, BarDensr assumes there is an unobserved ‘rolony density’ for each gene at each voxel (left), and the observation model describes mathematically how this rolony density is transformed into the image stack we observe (right). Once this observation model is formulated, sparse regression solves the inverse problem: starting from an image stack, it estimates the (unobserved) rolony density.
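To make the inverse problem concrete, here is a minimal NumPy sketch of non-negative sparse regression under a heavily simplified observation model. It assumes (our simplification, not BarDensr's full model) that the image stack, flattened into a matrix X with one row per round-channel pair and one column per voxel, equals G F exactly, where G is a binary codebook and F holds the non-negative per-barcode densities; blur, background, and per-channel gains are ignored. The helper names `build_codebook` and `nonneg_sparse_regression` are hypothetical.

```python
import numpy as np

def build_codebook(codes, n_channels):
    """Binary codebook G of shape (rounds * channels, n_barcodes).

    codes[j][r] is the channel that barcode j lights up in round r
    (hypothetical one-channel-per-round encoding, for illustration).
    """
    codes = np.asarray(codes)
    n_barcodes, n_rounds = codes.shape
    G = np.zeros((n_rounds * n_channels, n_barcodes))
    for j, code in enumerate(codes):
        for r, c in enumerate(code):
            G[r * n_channels + c, j] = 1.0
    return G

def nonneg_sparse_regression(X, G, lam=0.0, n_iter=200):
    """Minimize 0.5 * ||X - G F||^2 + lam * sum(F) subject to F >= 0.

    Simplified model (assumption): X = G F, with no blur, background,
    or per-channel gain. X has shape (rounds * channels, n_voxels) and
    the returned F has shape (n_barcodes, n_voxels).
    """
    # Step size 1/L, where L is the largest eigenvalue of G^T G
    # (the Lipschitz constant of the gradient of the quadratic term).
    step = 1.0 / np.linalg.norm(G, 2) ** 2
    F = np.zeros((G.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = G.T @ (G @ F - X) + lam        # smooth gradient plus L1 term
        F = np.maximum(F - step * grad, 0.0)  # gradient step, then project onto F >= 0
    return F

# Four 3-round barcodes over 3 channels; five voxels.
G = build_codebook([[0, 1, 2], [1, 2, 0], [2, 0, 1], [0, 0, 0]], n_channels=3)
F_true = np.array([[0.0, 2.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.5, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0, 0.0, 0.5]])
X = G @ F_true
F_hat = nonneg_sparse_regression(X, G)
```

With noise-free data and `lam=0`, the projected-gradient iterations converge to `F_true`; a positive `lam` pushes small densities to exactly zero, which is one standard way to encourage the sparsity described above.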

Table 1.

Notation.

Fig 2.

BarDensr accurately estimates sparse, per-gene rolony densities from the proposed observation model.

(A) Rolony densities make it easier to detect rolonies. The left plot shows the max-projection of the original experimental image across all rounds and channels; detecting blob-like structures in this image can be challenging, especially when two rolonies are in close proximity. By contrast, the rolony densities for individual genes are sparser, so it is easier to identify the positions of individual rolonies in the tissue. The middle and right plots show examples of these rolony densities. The orange marks represent rolonies detected by a hand-curated approach. Note that the rolony densities appear to show several rolonies that were missed by the hand-curated approach (see S3 Fig for further details). (B) BarDensr accurately recovers the ground truth in simulated data. The left plot shows the simulated data in all rounds and channels. The right plot shows the result of applying BarDensr to these data: the first column shows the true rolonies in this simulation, the middle column shows a blurred version of the learned rolony densities together with the spots discovered from them, and the final column shows the rolony densities learned by BarDensr. The algorithm recovers most of the simulated ground-truth rolonies, with a few mistakes.
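Panel (B) mentions blurring the rolony densities and discovering spots from them. A minimal NumPy sketch of one common way to do this, assuming a Gaussian blur followed by 3 × 3 local-maximum detection above a threshold; the `sigma` and `threshold` values and all function names are illustrative and are not taken from BarDensr.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma) if radius is None else radius
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur2d(img, sigma):
    """Separable Gaussian blur with zero padding at the borders."""
    k = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)

def find_spots(density, sigma=1.0, threshold=0.1):
    """(row, col) coordinates of 3x3 local maxima of the blurred density
    whose value exceeds `threshold` (hypothetical spot-calling rule)."""
    b = blur2d(density, sigma)
    p = np.pad(b, 1, mode="constant", constant_values=-np.inf)
    h, w = b.shape
    is_max = b > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_max &= b >= neighbor  # not smaller than any of the 8 neighbors
    return np.argwhere(is_max)

density = np.zeros((20, 20))
density[5, 5] = 1.0    # two isolated "rolonies"
density[14, 12] = 2.0
spots = find_spots(density)
```

Blurring first merges nearby non-zero voxels belonging to the same rolony into a single peak, so each rolony yields one local maximum.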

Fig 3.

BarDensr outperforms the existing methods by discovering more rolonies.

(A) Performance of BarDensr on simulated data. What percentage of rolonies are correctly detected? We use the Receiver Operating Characteristic (ROC) curve to examine this percentage (the complement of the False Negative Rate, or 1-FNR) as a function of the tolerated False Positive Rate (FPR), for BarDensr (red), starfish (orange), Single Round Matching (SRM, green), and the correlation-based method (‘corr’, gray); cf. S1 Appendix, Sections E and F for details on these other methods. S4 Fig illustrates these simulated data. In drawing these curves, we consider two qualitatively different kinds of errors: errors because a rolony is not detected at all (dotted lines), and errors because a rolony is detected but assigned the wrong barcode (solid lines). The left plots show these curves for simulated data; the right plots show them for simulated data with ‘dropout’, a form of noise present in some spatial transcriptomics methods (cf. S1 Appendix, Section G for details). In all four kinds of simulations, BarDensr finds significantly more spots. (B) Performance of BarDensr on the hybrid simulation. Simulated data are always imperfect; to measure performance on a more realistic dataset, we used a hybrid method à la [18]: we injected fake rolonies into real data and quantified how well different methods could recover these fake spots. The plots show 1-FNR (y-axis) as a function of the intensity scale of the fake rolonies (x-axis) and the number of fake rolonies injected (S, colored lines), without (top) and with (bottom) dropout, using BarDensr (left) and SRM (right). See S1 Appendix, Section G for details.
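For readers less familiar with ROC curves, here is a generic NumPy sketch of how 1-FNR (the true positive rate) and FPR can be traced out by sweeping a threshold over detection scores, plus the trapezoidal area under the curve (AUROC). It assumes each candidate detection already has a score and a ground-truth label (as in a simulation); it does not separate the two error types described above, and the function names are hypothetical.

```python
import numpy as np

def roc_curve(scores, labels):
    """ROC points from a threshold sweep over detection scores.

    scores: confidence of each candidate detection (higher = more likely real)
    labels: 1 if the candidate matches a ground-truth rolony, else 0
    Returns (fpr, tpr), where tpr = 1 - FNR.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # most confident first
    sorted_labels = labels[order]
    n_pos = sorted_labels.sum()
    n_neg = len(sorted_labels) - n_pos
    tp = np.cumsum(sorted_labels)       # true positives kept at each threshold
    fp = np.cumsum(1 - sorted_labels)   # false positives kept at each threshold
    tpr = np.concatenate([[0.0], tp / n_pos])
    fpr = np.concatenate([[0.0], fp / n_neg])
    return fpr, tpr

def auroc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr = roc_curve([0.9, 0.8, 0.7, 0.6], [0, 1, 0, 1])
print(auroc(fpr, tpr))  # 0.25
```

Tied scores are treated as separate steps here; a production implementation would group ties into a single threshold.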

Fig 4.

BarDensr is the best choice for dense and/or low-resolution experimental data.

(A) The 5× downsampled image is compared to the original ‘fine scale’ image. All plots show the max-projection across all rounds and channels; the right two show the zoomed region indicated by the red rectangles in the left two. Note that it is difficult to visually isolate single spots in the downsampled image. To test the performance of BarDensr on this low-resolution data, we compare two approaches: in the first, we run the model on the original data to obtain rolony densities and then downsample these rolony densities; in the second, we run BarDensr directly on the downsampled data and examine the estimated rolony densities. (B) The rolony densities for a selected gene (Slc17a7) estimated at the original fine scale (left), as well as with these two approaches. For a more complete example, see S5 Fig. (C) Cell-level gene expression quantification for the genes with more than four spots at the fine scale in a 1000 × 1000 region. The heatmap color indicates the proportion of gene counts (i.e., the total counts of each gene divided by the total counts of all genes detected in the region). The x-axis shows the 24 selected genes, ordered by their counts at the fine scale. The y-axis shows the cells, ordered according to the hierarchical clustering of the fine-scale result, as shown in the dendrogram on the left. A total of 43 cells were segmented from the original image using a seeded watershed algorithm (cf. S1 Appendix, Section I). The two results yield nearly identical clusterings, indicating that BarDensr recovers gene activity accurately enough to cluster cells even from low-resolution images. (D) To evaluate the performance of BarDensr and compare it with other state-of-the-art methods, we took a 200 × 200 region of real experimental data from a recent study [14] and created a benchmark dataset (cf. S1 Appendix, Section J). Above we plot the ROC performance on this benchmark for four methods (BarDensr, the correlation-based method, and starfish with ‘spot-based’ and ‘pixel-based’ approaches), using both the original data and denser, lower-resolution data. While all methods performed well on the original images (left), BarDensr outperforms the others on the lower-resolution images (right).
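Two simple operations from this figure can be sketched in NumPy under stated assumptions: 5× downsampling, which we assume is done by averaging non-overlapping 5 × 5 blocks (the exact scheme is described in the paper's supplement, not here), and the proportion of gene counts shown in the heatmap of (C), which we compute by normalizing each cell's counts by that cell's total. Both function names are hypothetical.

```python
import numpy as np

def downsample(img, factor=5):
    """Average non-overlapping factor x factor blocks; edge rows/columns that
    do not fill a whole block are cropped. Assumed scheme, for illustration."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    cropped = img[: h * factor, : w * factor]
    return cropped.reshape(h, factor, w, factor).mean(axis=(1, 3))

def gene_proportions(counts):
    """Row-normalize a (cells x genes) count matrix; cells with no counts stay zero."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

img = np.arange(100, dtype=float).reshape(10, 10)
small = downsample(img, 5)  # shape (2, 2)
```

Averaging (rather than subsampling) keeps every voxel's signal, which mimics a lower-resolution acquisition in which several rolonies can fall into the same coarse pixel.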

More »

Expand

Fig 5.

BarDensr can be scaled up with sparsifying and coarse-to-fine approaches.

(A) Coarse-to-fine acceleration. The Area Under the ROC curve (AUROC) summarizes the performance of a method as the integral of the ROC curve (higher is better). The black curve plots the AUROC on the simulated data against the number of seconds the default iterative algorithm has been allowed to run, up to a maximum of 15 iterations. Alternatively, we can use a coarse-to-fine strategy: we first run the algorithm on downsampled data for 20 iterations, then use the results to perform 10 additional iterations on the full high-resolution data; the red curve plots the performance of this strategy. (B) BarDensr can take advantage of gene sparsity. Here we used two different approaches to analyze a 1000 × 1000 region of the experimental data. The first approach applies BarDensr naively, directly to the image. The second approach, illustrated in the first two plots, accelerates the method with a ‘coarse-to-fine’ strategy that exploits ‘gene sparsity.’ Specifically, we split this region into 4 × 4 patches (the patch borders are indicated by the white lines in the left plot). After the relatively fast ‘coarse’ step, barcodes with very low maximum rolony densities are removed before the subsequent ‘fine’ step. This leaves only a relatively small number of barcodes to consider for each patch (ranging from 38 to 65 out of 81 barcodes, as shown in the middle plot), reducing the computation time and memory usage of the ‘fine’ step (cf. S1 Appendix, Section K for more detail). Both approaches yield nearly the same result, as shown by the ROC curves in the right plot: we treated one method's output as the ‘truth’ and constructed an ROC curve indicating the accuracy of the other method, and then did the reverse. The results suggest strong agreement.
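The gene-sparsity pruning in (B) can be sketched as follows: split the coarse rolony densities into a grid of patches and, within each patch, keep only the barcodes whose maximum coarse density exceeds a cutoff; the fine step would then fit only those barcodes in that patch. The cutoff rule (a fraction `rel_threshold` of the global maximum) and the function name are assumptions made for illustration; S1 Appendix, Section K describes the actual procedure.

```python
import numpy as np

def active_barcodes_per_patch(F_coarse, n_patches=4, rel_threshold=0.05):
    """Barcodes worth keeping in each patch (hypothetical pruning rule).

    F_coarse: (n_barcodes, H, W) rolony densities from the coarse step;
              assumes H and W are both at least n_patches.
    Returns {(i, j): array of barcode indices kept in patch (i, j)}.
    """
    n_barcodes, H, W = F_coarse.shape
    rows = np.array_split(np.arange(H), n_patches)
    cols = np.array_split(np.arange(W), n_patches)
    cutoff = rel_threshold * F_coarse.max()
    keep = {}
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            patch = F_coarse[:, r[0]:r[-1] + 1, c[0]:c[-1] + 1]
            # Maximum density of each barcode within this patch.
            patch_max = patch.reshape(n_barcodes, -1).max(axis=1)
            keep[(i, j)] = np.flatnonzero(patch_max > cutoff)
    return keep

F_coarse = np.zeros((3, 8, 8))
F_coarse[0, 0, 0] = 1.0    # strong barcode 0 in the top-left patch
F_coarse[2, 7, 7] = 0.5    # barcode 2 in the bottom-right patch
F_coarse[1, 4, 4] = 0.01   # barcode 1 too weak to keep
keep = active_barcodes_per_patch(F_coarse, n_patches=2)
```

Because each patch typically contains only a subset of genes, the fine step solves a smaller regression per patch, which is where the time and memory savings come from.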

More »

Expand

Fig 6.

BarDensr reveals the laminar distribution of the cell type marker genes, as well as the identified cell types in the motor cortex.

The data were obtained from a recent study [14]. Top: each dot indicates a detected rolony; for each cell type, all rolonies associated with that cell type's marker genes share the same color. Bottom: each dot indicates a cell, colored by its cell type; cell types are estimated from the detected rolonies shown in the top plot.
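The caption does not say how cell types are estimated from the detected rolonies. Purely as a hypothetical illustration, one simple rule is a majority vote: count each cell's rolonies per cell type's marker genes and take the most frequent type. The function name, the input layout, and the gene and type names below are invented for the example.

```python
import numpy as np

def assign_cell_types(cell_ids, gene_ids, marker_to_type, n_cells):
    """Hypothetical majority vote: label each cell with the cell type whose
    marker genes account for most of its assigned rolonies.

    cell_ids[i]: index of the cell containing rolony i (-1 = outside any cell)
    gene_ids[i]: gene decoded for rolony i
    marker_to_type: dict mapping marker gene -> cell-type name
    """
    types = sorted(set(marker_to_type.values()))
    counts = np.zeros((n_cells, len(types)), dtype=int)
    for cell, gene in zip(cell_ids, gene_ids):
        if cell >= 0 and gene in marker_to_type:
            counts[cell, types.index(marker_to_type[gene])] += 1
    # Cells without any marker-gene rolonies are left unlabeled (None).
    labels = [types[k] if counts[i].sum() > 0 else None
              for i, k in enumerate(counts.argmax(axis=1))]
    return labels, counts

labels, counts = assign_cell_types(
    cell_ids=[0, 0, 1, 1, 1, -1],
    gene_ids=["geneA", "geneA", "geneB", "geneB", "geneA", "geneB"],
    marker_to_type={"geneA": "type1", "geneB": "type2"},
    n_cells=3,
)
```

Ties are broken toward the alphabetically first type by `argmax`; a real pipeline would more likely cluster full expression profiles, as in Fig 4C.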

More »

Expand

Fig 7.

Using cleaned images and SVD to examine model fit quality and variability.

(A) SVD analysis example using one gene. Spots are identified in $F_{j^*}$ for each barcode $j^*$ using local-max peak finding. For the gene barcode $j^*$ (Deptor) shown here (top left), the three spots with the highest accuracy are analyzed. The right panel shows the zoomed-in $R \times C$ plots of the raw image $X$ (top) and the ‘cleaned’ image $X^{(j^*)}$ (bottom) at these three spot locations for barcode $j^*$. Note that the ‘cleaned’ images are significantly sparser than the raw images, as desired. We then applied SVD to the cleaned image $X^{(j^*)}$ at these three spot locations. The first two columns on the bottom left show the zoomed-in image of the original spot, $(KF)_{j^*}$, and the learned weighted barcode matrix $G_{j^*}$ corresponding to this gene barcode $j^*$. The top singular vectors are plotted in the last two columns; they match $G_{j^*}$ and the cropped $(KF)_{j^*}$ well. $R^2$ is the squared correlation coefficient between $X^{(j^*)}$ and the outer product of these two singular vectors; the high $R^2$ values seen here indicate that the model accurately summarizes $X^{(j^*)}$. (B) Results of SVD analysis of cleaned images for the top high-$R^2$ spots. This plot summarizes the results of the analysis illustrated in (A). The first column shows $(KF)_{j^*}$ around the brightest spots; the second column shows the top spatial singular vectors for the same region; and the last column compares the scaled $G_{j^*}$ learned from the model (top row) with the corresponding top temporal singular vectors for these spots (bottom row). Only the six barcodes most abundant in the selected region are shown here; S10 Fig provides a more complete illustration.
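The rank-one check in (A) can be sketched in NumPy: arrange the cleaned image around a spot as a matrix with one row per round-channel pair ($R \times C$ rows) and one column per pixel, take the top singular vectors, and compute the squared correlation between the patch and their outer product. The matrix layout, the sign convention, and the name `rank1_r2` are our assumptions.

```python
import numpy as np

def rank1_r2(patch):
    """Top singular vectors of a (rounds*channels, n_pixels) patch and the
    squared correlation R^2 between the patch and their outer product."""
    U, s, Vt = np.linalg.svd(patch, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    # SVD signs are arbitrary; flip both so the barcode-like vector is mostly positive.
    if u.sum() < 0:
        u, v = -u, -v
    approx = np.outer(u, v)
    r = np.corrcoef(patch.ravel(), approx.ravel())[0, 1]
    return u, v, r ** 2

# An exactly rank-one patch: barcode profile x spatial footprint.
clean = np.outer([1.0, 0.0, 2.0], [0.5, 1.0, 0.5, 0.0])
u, v, r2 = rank1_r2(clean)
```

Because the correlation coefficient is invariant to scaling, the singular value can be omitted from the outer product; a patch that is well explained by a single barcode times a single spatial blob gives $R^2$ close to 1.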
