
The authors have declared that no competing interests exist.

Current address: Department of Electrical Engineering, Technion, Haifa, Israel

Progress in modern neuroscience critically depends on our ability to observe the activity of large neuronal populations with cellular spatial and high temporal resolution. However, two bottlenecks constrain efforts towards fast imaging of large populations. First, the resulting large video data is challenging to analyze. Second, there is an explicit tradeoff between imaging speed, signal-to-noise, and field of view: with current recording technology we cannot image very large neuronal populations with simultaneously high spatial and temporal resolution. Here we describe multi-scale approaches for alleviating both of these bottlenecks. First, we show that spatial and temporal decimation techniques based on simple local averaging provide order-of-magnitude speedups in spatiotemporally demixing calcium video data into estimates of single-cell neural activity. Second, once the shapes of individual neurons have been identified at fine scale (e.g., after an initial phase of conventional imaging with standard temporal and spatial resolution), we find that the spatial/temporal resolution tradeoff shifts dramatically: after demixing we can accurately recover denoised fluorescence traces and deconvolved neural activity of each individual neuron from coarse scale data that has been spatially decimated by an order of magnitude. This offers a cheap method for compressing this large video data, and also implies that it is possible to either speed up imaging significantly, or to “zoom out” by a corresponding factor to image order-of-magnitude larger neuronal populations with minimal loss in accuracy or temporal resolution.

The voxel rate of imaging systems ultimately sets the limit on the speed of data acquisition. These limits often mean that only a small fraction of the activity of large neuronal populations can be observed at high spatio-temporal resolution. For imaging of very large populations with single cell resolution, temporal resolution is typically sacrificed. Here we propose a multi-scale approach to achieve single cell precision using fast imaging at reduced spatial resolution. In the first phase the spatial location and shape of each neuron is obtained at standard spatial resolution; in the second phase imaging is performed at much lower spatial resolution. We show that we can apply a demixing algorithm to accurately recover each neuron’s activity from the low-resolution data by exploiting the high-resolution cellular maps estimated in the first imaging phase. Thus by decreasing the spatial resolution in the second phase, we can compress the video data significantly, and potentially acquire images over an order-of-magnitude larger area, or image at significantly higher temporal resolution, with minimal loss in accuracy of the recovered neuronal activity. We evaluate this approach on real data from light-sheet and 2-photon calcium imaging.

This is a PLoS Computational Biology Methods paper.

A major goal of neuroscience is to understand interactions within large populations of neurons, including their network dynamics and emergent behavior. This ideally requires the observation of neural activity over large volumes. Recently, light-sheet microscopy and genetically encoded indicators have enabled unprecedented whole-brain imaging of tens of thousands of neurons at cellular resolution [. With most other current technologies, however, only populations on the order of 10^{2} − 10^{3} neurons can be observed simultaneously with adequate temporal resolution [

Another critical challenge is the sheer amount of data generated by these large-scale imaging methods. A crucial step for further neural analysis involves a transition from voxel-space to neuron-source space: i.e., we must detect the neurons and extract and demix each neuron’s temporal activity from the video. Simple methods such as averaging voxels over distinct regions of interest (ROIs) are fast, but more statistically-principled methods based on constrained non-negative matrix factorization (CNMF) better conserve information, yield higher signal-to-noise ratio, recover more neurons, and enable the demixing of spatially overlapping neurons [

Decimation ideas do not just lead to faster computational image processing, but also offer prescriptions for faster image acquisition over larger fields of view (FOV), and for observing larger neural populations. Specifically, we propose the following two-phase combined image acquisition/analysis approach. In the first phase, we use conventional imaging methods to obtain estimates of the visible neuronal locations and shapes. After this cell-identification phase is complete we switch to low-spatial-resolution imaging, which in the case of camera-based imaging simply corresponds to “zooming out” on the image, i.e., expanding the spatial size of each voxel. This has the benefit of projecting a larger FOV onto the same number of voxels; alternatively, if the number of voxels recorded per second is a limiting factor, then recording fewer (larger) voxels per frame implies that we can image at higher frame-rates. We are thus effectively trading off spatial resolution for temporal resolution; if we cut the spatial resolution too much we may no longer be able to clearly identify or resolve single cells by eye in the obtained images. However, we show that, given the high-spatial-resolution information obtained in the first imaging phase, the demixing stage of CNMF can recover the temporal signals of interest even from images that have undergone radical spatial decimation (an order of magnitude or more). In other words, CNMF significantly shifts the tradeoff between spatial and temporal resolution, enabling us to image larger neuronal populations at higher temporal resolution.

The rest of this paper is organized as follows. We first describe how temporal and spatial decimation (along with several other improvements) can be used within the CNMF algorithm to gain order-of-magnitude speed-ups in calcium imaging video processing. Next we investigate how decimation can enable faster imaging of larger populations for light-sheet and 2P imaging. We show the importance of the initial cell identification phase, quantitatively illustrate how CNMF changes the tradeoff between spatial and temporal resolution, and discuss how spatially decimated imaging followed by demixing can be interpreted as a simple compression and decoding scheme. We show that good estimates of the neural shapes can be obtained on a small batch of standard-resolution data, corresponding to a short cell-identification imaging phase. Finally we demonstrate that interleaved imaging that translates the pixels by subpixel shifts on each frame further improves the fidelity of the recovered neural time series.

Constrained non-negative matrix factorization (see

Mean-squared-error as a function of wall time.

We begin by considering imaging data obtained at low temporal resolution, specifically a whole-brain light-sheet imaging recording acquired at a rate of 2 Hz using nuclear localized GCaMP6f in zebrafish. We restricted our analysis to a representative patch shown in

The first algorithmic improvement follows from the realization that some of the constraints applied in CNMF are unnecessary, at least during early iterations of the algorithm, when only crude estimates of the spatial and temporal components are available.

Next we found that significant additional speed-ups in this simplified problem could be obtained by simply changing the order in which the variables in this simplified block-coordinate descent scheme are updated [

Next we reasoned that a good preliminary estimate of the spatial shape matrix does not require the data at full temporal resolution: temporal decimation (averaging over blocks of consecutive frames) retains the slow calcium signal needed to estimate the shapes, while shrinking the data, and hence the per-iteration cost, by the decimation factor.

Besides downsampling methods, we also considered dimensionality reduction via structured random projections [

Further speed gains were obtained by additionally applying spatial decimation (computing a mean within small blocks of neighboring pixels). Perhaps surprisingly, decimation even improved the final residual sum of squares (RSS), apparently because the spatially-decimated solutions are near better local optima in the squared-error objective function than are the non-decimated solutions.
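To make the decimation operation concrete, here is a minimal NumPy sketch of temporal and spatial decimation by local averaging (array sizes and the trimming behavior are our own illustrative choices, not the pipeline's exact parameters):

```python
import numpy as np

def decimate(movie, t_factor=1, s_factor=1):
    """Average a (T, H, W) movie over blocks of t_factor consecutive frames
    and s_factor x s_factor pixels (simple local means)."""
    T, H, W = movie.shape
    # Trim so each dimension divides evenly; a real pipeline might pad instead.
    movie = movie[:T - T % t_factor, :H - H % s_factor, :W - W % s_factor]
    T, H, W = movie.shape
    blocks = movie.reshape(T // t_factor, t_factor,
                           H // s_factor, s_factor,
                           W // s_factor, s_factor)
    return blocks.mean(axis=(1, 3, 5))

movie = np.random.rand(100, 96, 96)              # e.g. a 96x96 patch, 100 frames
small = decimate(movie, t_factor=5, s_factor=4)
print(small.shape)                               # (20, 24, 24)
```

Each output voxel is simply the mean of a 5-frame, 4×4-pixel block, so the decimated movie here is 80× smaller than the original.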

In summary, by simplifying the early iterations of the CNMF algorithm (removing the temporal deconvolution constraints and running fast HALS iterations on temporally and spatially subsampled data), we obtained remarkable speed-ups without compromising the accuracy of the obtained solution, at least in terms of the sum-of-squares objective function. But how do these modifications affect the extracted neural shapes and activity traces? To check, we ran the algorithm without decimation until convergence, and with decimation for 1 and 10 s respectively.

Our focus has been on speeding up CNMF, one computational bottleneck of the entire processing pipeline. For completeness, we report the times spent on each step of the pipeline in

|  | CNMF method [ | here & [ |
|---|---|---|
| load data^{†} | 22 ± 1 | 22 ± 1 |
| decimate | N/A | 33 ± 1 |
| detect ROIs^{‡} | 5,360 ± 20 | 135 ± 2 |
| NMF | 35,100 ± 400 | 990 ± 10 |
| Δ | 410 ± 10 | 410 ± 10 |
| denoise | 17,900 ± 300 | 600 ± 20 |

Average computing time (± SEM) in ms over ten runs for individual steps of the processing pipeline on a standard laptop. The 96×96 patch contained

^{†}Loading the whole data as single binary file; loading a frame at a time was an order of magnitude slower.

^{‡}Using greedy initialization; group lasso initialization [

We have shown that decimation leads to much faster computational processing of calcium video data. More importantly, these results inspired us next to propose a method for faster image acquisition or for imaging larger neural populations. The basic idea is quite simple: if we can estimate the quantities of interest (the neural shapes and activity traces) from data whose pixel count has been reduced by an order of magnitude (though of course this situation is slightly more complex in the case of scanning two-photon imaging; we will come back to this issue below), then we should be able to use our newly-expanded pixel budget to image more cells, or image the same population of cells faster.

As we will see below, this basic idea can be improved upon significantly: if we have a good estimate of the spatial neural shape matrix from an initial imaging phase at standard resolution, then we can tolerate radical spatial decimation (an order of magnitude or more in the pixel count) with minimal loss in accuracy of the estimated activity traces.

We began by quantifying the potential effectiveness of this strategy using the zebrafish light-sheet imaging data examined in the last section. In the following we write A for the matrix of neural shapes, C for the temporal activities, and Y for the data, with a subscript l denoting the spatial decimation factor (l = 1 corresponds to the original resolution). We emulated the decrease in spatial resolution by decimating the original imaging data as well as the neural shapes we had obtained by the CNMF approach, with a variety of decimation factors l. We then used the decimated shapes A_{l} to extract and demix the activities C_{l} from the corresponding downscaled data Y_{l}. The reconstruction based on A_{l} and C_{l} fits the decimated data Y_{l} and thus does not capture the fine spatial structure present in Y_{1}. We evaluated the quality of the recovered C_{l} traces by comparing them to the original traces C_{1} obtained from the full original data Y_{1}. The correlation between C_{1} and C_{l} decreases gracefully with the decimation factor. For comparison we also estimated the shapes A_{l} directly from the decimated data Y_{l}, instead of decimating the shapes A_{1} estimated from the full data Y_{1}. The problem in this ‘single-phase’ imaging setting is that ROI detection fails catastrophically once the pixelization becomes too coarse.
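As a toy numerical sketch of this two-phase recovery (synthetic Gaussian-blob shapes, and plain non-negative least squares standing in for the constrained demixing used in the paper; all sizes and parameters here are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Synthetic stand-ins for the full-resolution shapes and traces.
H = W = 32; N, T = 4, 150
yy, xx = np.mgrid[:H, :W]
centers = [(8, 8), (8, 24), (24, 8), (24, 24)]
A1 = np.stack([np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 18.0).ravel()
               for cy, cx in centers], axis=1)            # (pixels, neurons)
C_true = rng.random((N, T)) * (rng.random((N, T)) < 0.2)  # sparse activity
Y1 = A1 @ C_true + 0.01 * rng.standard_normal((H * W, T))

def block_mean_cols(M, H, W, f):
    """Decimate each column (an H x W image) by averaging f x f blocks."""
    B = M.T.reshape(-1, H // f, f, W // f, f).mean(axis=(2, 4))
    return B.reshape(B.shape[0], -1).T

f = 8                                  # emulate 8x8 coarser pixels
Al = block_mean_cols(A1, H, W, f)      # shapes known from the first phase
Yl = block_mean_cols(Y1, H, W, f)      # low-resolution second-phase movie

# Demix: recover each frame's activity by non-negative regression on Al.
Cl = np.array([nnls(Al, Yl[:, t])[0] for t in range(T)]).T

# Even from 16 coarse pixels, the recovered traces track the ground truth.
corr = [np.corrcoef(C_true[n], Cl[n])[0, 1] for n in range(N)]
```

The key point this sketch illustrates: once the shapes are known, demixing from the coarse movie is just a small, well-conditioned regression per frame, even though individual cells are no longer visible in the 4×4-pixel images.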

Decimated shapes A_{l}, and the correlation between the calcium traces C_{l} inferred from the decimated data and the original traces C_{1} obtained from the non-decimated data; results obtained with merely 1-phase imaging for decimation by 8×8 pixels are shown as cyan traces. Correlation between C_{l} and C_{1} for all cells recovered from this patch: thick lines show the median, thin lines and shaded region the interquartile range (IQR). The correlation decays slowly with the decimation factor, but drops more steeply if the shapes A_{l} are estimated directly from decimated data (cyan) instead of being decimated from the shapes A_{1} estimated in a standard-resolution imaging phase. Mean-squared-error (dashed, right y-axis) normalized so that without decimation the value is 1.

Thus the results are quite promising; with the two-phase imaging strategy we can effectively increase our imaging budget (as measured by the number of recorded pixels, which shrinks by the factor l^{2}) by over an order of magnitude with minimal loss in the accuracy of the obtained activity traces C_{l}, and we also observe clear advantages of the two-phase over the single-phase decimation approaches. Finally, the data can be reconstructed accurately even with quite large decimation levels l: spatially decimated imaging followed by demixing can be interpreted as a simple compression and decoding scheme, with the decimation serving as the encoder and the demixing of C_{l} from Y_{l} with A_{l} serving as the decoder. Here the compression ratio is l^{2}, which could be useful e.g. for wireless recordings, or any bandwidth- or memory-limited pipeline [

Turning towards imaging data acquired at a faster frame-rate, we next consider the case of a 2P calcium imaging dataset from mouse visual cortex acquired at 20 Hz. We chose ROIs based on the correlation image and max-projection image, and extracted the neural shapes A_{1} and fluorescence activity C_{1} using CNMF. The upper row shows the raw data next to the reconstruction A_{1} ⋅ C_{1} based on CNMF, illustrating that all relevant ROIs have been detected and the matrix decomposition computed by CNMF captures the data well. The lower row shows the spatially decimated data Y_{l} and the reconstruction A_{1} ⋅ C_{l} based on sparse demixing of Y_{l} using knowledge of the neural shapes A_{1} from an initial cell identification imaging phase. As in the example in the last section, this illustrates that sparse demixing paired with cell shape identification applied to decimated data (panel D) captures the data virtually as well as CNMF without decimation (panel B). See

Reconstruction A_{1} ⋅ C_{1} (plus the estimated background). Spatially decimated data Y_{l}; reconstruction A_{1} ⋅ C_{l}. The reconstruction looks very similar to the denoised high-resolution data of (B). Note: contours in (B-D) are not recomputed in each panel, but rather are copied from (A), to aid comparison.

The traces C_{l} recovered from spatially decimated data (using the 2-phase imaging approach) depend quite weakly on the decimation factor, remaining close to C_{1} even for coarse pixelizations. We further deconvolved the denoised traces C_{l} into estimates of neural activity S_{l}, using the sparse non-negative deconvolution method described in [. The correlation between S_{1} and S_{l} is strongest for the highest-ranked ROIs.

In each row, a denoised trace from C_{l} is shown above the corresponding deconvolved trace from S_{l}. Shapes were decimated by averaging 4×4 or 16×16 pixels. The legend shows the resulting shapes A_{l} as well as the correlation of the inferred denoised fluorescence C_{l} versus the estimate C_{1} obtained without decimation. Further, the results obtained with merely 1-phase imaging for decimation by 4×4 pixels are shown as cyan traces.

We summarize the results over all neurons in this dataset. The correlation between C_{1} and C_{l} decreases gracefully with the decimation factor, and drops more steeply if the shapes A_{l} were obtained directly on low-resolution data (1-phase imaging; cyan lines), similarly but more dramatically than in the light-sheet example above. Similar conclusions hold for the deconvolved activities S_{1} and S_{l} (dashed), though the correlation between S_{1} and S_{l} does decay more quickly than does the correlation between C_{1} and C_{l}.

Correlations between the denoised traces C_{1} and C_{l}, and between the deconvolved traces S_{1} and S_{l}, as a function of the decimation factor. The correlations drop more steeply if the shapes A_{l} are not inferred in the pre-screening phase but are instead estimated directly from downscaled data (cyan). The same holds for the deconvolved traces (dashed). Analogous results hold when comparing against simulated ground truth S^{s} instead of inferred traces.

So far we have used the correlation between C_{1} and C_{l} (or S_{1} and S_{l}) to quantify the robustness of signal recovery from the decimated data Y_{l}. However, C_{1} and S_{1} should not be considered “ground truth”: these are merely estimates of the true underlying neural activity, inferred from noisy data. If C_{1} and C_{l} are close for small values of the decimation factor, might C_{l} nonetheless be a significantly worse estimate of the true underlying neural activity than C_{1}? Of course ground truth neural activity is not available for this dataset, but we can simulate data Y^{s} with known ground truth and compare how well activity is recovered from decimated versions of Y^{s}. To generate this simulated dataset, we started with the estimates A^{s} ≔ A_{1} and C^{s} ≔ C_{1} recovered from the full-resolution original data, and set Y^{s} = A^{s}C^{s} + E^{s}, where the noise E^{s} is chosen to match the statistics of the original residual in each pixel while keeping the link between variance and mean; see Methods. We then ran the same decimation and demixing pipeline on Y^{s} and compared the recovered activity to the ground truth. The correlation between ground truth and activity recovered from Y^{s} is about 0.8 even if no decimation is applied, and the correlation between ground truth and the recovered activity again decays only gradually with the decimation factor.

The autoregressive model for the calcium dynamics is of course not exact. Thus as a further control simulation we generated another dataset that used Poisson noise and ground truth fluorescence traces obtained from real data rather than from the autoregressive model.

In practice we envision that the first phase of cell identification will only be performed on a small initial batch of the data (and maybe also at the end of the experiment to check for consistency). Therefore we experimented with reducing the amount of data used to infer the neural shapes from the full data (2,000 frames, 100 s) to 1,000 frames (50 s) and 500 frames (25 s) respectively.

Correlation between the traces C_{l} obtained on decimated data and the reference C_{1} obtained without any decimation. Similar results hold for median and IQR, but the resulting plot is too cluttered. Analogous results versus simulated ground truth S^{s}: traces were obtained on decimated data with reshuffled residuals, otherwise analogous to (B).

Finally, we investigated whether it would be possible to further improve the recovery of the traces C_{l} from very highly decimated data via an interleaving strategy [

Correlation between the traces C_{l} obtained on decimated data and the reference C_{1} obtained without any decimation. The average correlation (±SEM) decays faster without (cyan) than with interleaving (green). Similar results hold for median and IQR, but the resulting plot is too cluttered. Analogous results versus simulated ground truth S^{s}: traces were obtained on decimated data with reshuffled residuals, otherwise analogous to (B).

The basic message of this paper is that standard approaches for imaging calcium responses in large neuronal populations, which have historically been optimized so that humans can clearly see cells blink in the resulting video, lead to highly redundant data, and we can exploit this redundancy in several ways. In the first part of the paper, we saw that we can decimate standard calcium imaging video data drastically, to obtain order-of-magnitude speedups in processing time with no loss (and in some cases even some gain) in accuracy of the recovered signals. In the second part of the paper, we saw that, once the cell shapes and locations are identified, we can drastically reduce the spatial resolution of the recording (losing the ability to cleanly identify cells by eye in the resulting heavily-pixelated movies) but still faithfully recover the neural activity of interest. This in turn leads naturally to a proposed two-phase imaging approach (first, identify cell shapes and locations at standard resolution; then image at much lower spatial resolution) that can be seen as an effort to reduce the redundancy of the resulting video data.

We anticipate a number of applications of the results presented here. Regarding the first part of the paper: faster computational processing times are always welcome, of course, but more fundamentally, the faster algorithms developed here open the door towards guided experimental design, in which experimenters can obtain images, process the data quickly, and immediately use this to guide the next experiment. With more effort this closed-loop approach can potentially be implemented in real-time, whether for improving optical brain-machine interfaces [

Highly redundant data streams are by definition highly compressible. The results shown in

Regarding applications of the proposed two-phase imaging approach: we can potentially use this approach to image either more cells, or image cells faster, or some combination of both. In most of the paper we have emphasized the first case, in which we ‘zoom out’ to image larger populations at standard temporal resolution. However, a number of applications require higher temporal resolution. One exciting example is the larval zebrafish, where it is already possible to image the whole brain, but light-sheet whole-brain volumetric imaging rates are low [

A number of previous papers can be interpreted in terms of reducing the redundancy of the output image data. Our work can be seen as one example of the general theme of increasing the ratio

We expect that different strategies for increasing

One critical assumption in our simulations is that the total recorded photon flux per frame is the same for each decimation level

In traditional two-photon imaging the situation is more complicated. The image is created by serially sweeping a small, diffraction-limited point across the sample. Along the “fast” axis, the beam moves continuously, and the integrated signal across a line is constant regardless of detection pixelation: the signal is simply partitioned into more or fewer bins. Along the “slow” axis, however, the galvanometers are moved in discrete steps, and low pixel numbers generally mean that portions of the image are not scanned; this increases frame speed, but these ‘missed’ areas generate no signal, reducing the total number of photons collected. Thus to achieve the same photon flux over the larger (lower spatially sampled) pixels, while maintaining the same SNR, we require an enlarged PSF, which maps a larger sampled volume to each pixel. This approach was recently demonstrated to be effective in [

In any instantiation, maximal imaging speed will be limited by the time required to collect enough photons for adequate SNR, which in turn is limited by photophysics and the light tolerance of the sample. In future work we plan to pursue both light-sheet and 2P implementations of the proposed two-phase imaging approach, to quantify the gains in speed and FOV size that can be realized in practice.

We also expect techniques for denoising, demixing, and deconvolution of calcium imaging video to continue to improve in the near future, as more accurate nonlinear, non-Gaussian models for calcium signals and noise are developed; as new demixing methods become available, we can easily swap these methods in in place of the CNMF approach used here. We expect that the basic points about temporal and spatial decimation discussed in this paper will remain valid even as newer and better demixing algorithms become available.

Light-sheet imaging of zebrafish was conducted according to protocols approved by the Institutional Animal Care and Use Committee of the Howard Hughes Medical Institute, Janelia Research Campus. Two-photon imaging of mouse was carried out in accordance with animal protocols approved by the Columbia University Institutional Animal Care and Use Committee.

The calcium fluorescence of the whole brain of a larval zebrafish was recorded using light-sheet imaging. It was a transgenic (GCaMP6f) zebrafish embedded in agarose but with the agarose around the tail removed. The fish was in a fictive swimming virtual environment as described in [

In vivo two-photon imaging was performed in a transgenic (GCaMP6s) mouse through a cranial window in visual cortex. The mouse was anesthetized (isoflurane) and head-fixed on a Bruker Ultima in vivo microscope with resonant scanners, and spontaneous activity was recorded. The field of view extended over 350 μm × 350 μm and was recorded for 100 seconds with a resolution of 512×512 pixels at 20 frames per second.

In the case of 2P imaging, the field of view contained

For the zebrafish data we ensured that the spatial components are localized by constraining them to lie within spatial patches (not large compared to the size of the cell body) around the neuron centers, thus imposing sparsity on the shape matrix A, whose column a_{n} denotes the spatial footprint of the n-th neuron.

The matrix products A^{⊤}Y and YC^{⊤} in Algorithm 1 are computationally expensive for the full data. These matrix products can also be performed on a GPU instead of a CPU; whereas for the comparatively small 96×96 patches we did not obtain any speed-ups using a GPU, we verified on patches of size 256×256 that some modest overall speedups (a factor of 1.5–2) can be obtained by porting this step to a GPU.

For the decimated data the matrix products are cheap enough that we iterate just once over all neurons per product computation and instead alternate more often between updating shapes and activities. For the full data, where computing the products A^{⊤}Y and YC^{⊤} dominates the cost, we instead increase the number of inner iterations within HALS for each computation of these products.

Given the precomputed products U = A^{⊤}Y and V = A^{⊤}A, the temporal update for each neuron n is

c_{n} ← max(0, c_{n} + (U_{n·} − V_{n·}C) / V_{nn}),

and given P = YC^{⊤} and Q = CC^{⊤}, the spatial update is

a_{n} ← max(0, a_{n} + (P_{·n} − AQ_{·n}) / Q_{nn}).

Algorithm 1 is a constrained version of fast HALS. To further improve on fast HALS, it has been suggested [ to restrict the updates to the nonzero elements of each column a_{n}; in our case the spatial locality constraint already confines each a_{n} to a small patch, i.e., we are already focusing on the nonzero elements.
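For concreteness, a minimal unconstrained version of such alternating fast HALS iterations might look as follows (a sketch under our own simplifications: no spatial locality, deconvolution, or background terms, all of which the full CNMF problem includes):

```python
import numpy as np

def fast_hals(Y, A, C, outer=50):
    """Alternating fast HALS updates for Y ~ A @ C with nonnegativity on
    both factors. A: (pixels x neurons), C: (neurons x time)."""
    eps = 1e-12
    for _ in range(outer):
        U, V = A.T @ Y, A.T @ A          # expensive products, computed once per sweep
        for n in range(A.shape[1]):      # temporal (activity) updates
            C[n] = np.maximum(0, C[n] + (U[n] - V[n] @ C) / (V[n, n] + eps))
        P, Q = Y @ C.T, C @ C.T          # products for the spatial updates
        for n in range(A.shape[1]):      # spatial (shape) updates
            A[:, n] = np.maximum(0, A[:, n] + (P[:, n] - A @ Q[:, n]) / (Q[n, n] + eps))
    return A, C

rng = np.random.default_rng(1)
A_true, C_true = rng.random((60, 3)), rng.random((3, 80))
Y = A_true @ C_true                      # exact low-rank nonnegative "movie"
A0 = A_true + 0.2 * rng.random((60, 3))  # perturbed initial estimates
C0 = C_true + 0.2 * rng.random((3, 80))
A, C = fast_hals(Y, A0.copy(), C0.copy(), outer=100)
```

Each inner step is an exact coordinate minimization clipped at zero, so the residual decreases monotonically; the decimated variant simply runs the same updates on block-averaged Y.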

Because for the 2P data the observed imaging rate is much higher than the decay rate of the calcium indicator, we constrain the temporal traces to follow a second-order autoregressive process. We estimate the noise level σ_{d} of each pixel/trace by averaging the power spectral density (PSD) over a range of high frequencies, and estimate the coefficients of the AR(2) process for each cell following [
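One common way to implement such a PSD-based noise estimate is sketched below (the specific frequency band, Welch parameters, and white-noise normalization are our own assumptions, not necessarily those of the original pipeline):

```python
import numpy as np
from scipy.signal import welch

def noise_std(trace, fs=1.0):
    """Estimate the noise s.d. of a trace from the mean PSD over the top
    quarter of frequencies, where slow calcium transients contribute little
    and the spectrum is dominated by (approximately white) noise."""
    f, pxx = welch(trace, fs=fs, nperseg=min(256, len(trace)))
    band = f > 0.375 * fs                 # upper quarter of [0, fs/2]
    # For white noise of variance sigma^2 the one-sided PSD is flat at
    # 2 * sigma^2 / fs, so invert that relation:
    return np.sqrt(pxx[band].mean() * fs / 2.0)

rng = np.random.default_rng(3)
t = np.arange(20000)
trace = 5 * np.sin(2 * np.pi * 0.001 * t) + 2.0 * rng.standard_normal(t.size)
sigma = noise_std(trace)   # close to the true noise s.d. of 2.0
```

The slow sinusoid (standing in for calcium dynamics) falls far below the evaluation band, so it barely affects the estimate.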

Every spatial component in A is normalized to unit ℓ_{2}-norm, with the corresponding temporal component scaled accordingly. Following [, the components are ordered using the ℓ_{4}-norm of the corresponding spatial footprints, to penalize overly broad and/or noisy spatial shapes.

In order to calculate ΔF/F, the baseline fluorescence for each neuron was obtained by projecting the estimated background bf^{⊤} onto the corresponding shape a_{n}.

To compress the data using truncated SVD, we approximated Y ≈ U(U^{⊤}Y), where the columns of U are the leading left singular vectors of Y obtained from the eigendecomposition of YY^{⊤}. This method was faster than the randomized method due to [
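A sketch of this kind of SVD compression (computing the left singular vectors via the small pixel-by-pixel Gram matrix, one cheap route when pixels ≪ frames; the paper's exact implementation may differ):

```python
import numpy as np

def svd_compress(Y, k):
    """Rank-k compression of a (pixels x frames) movie: store the top-k
    left singular vectors U and the projections U.T @ Y instead of Y."""
    w, U = np.linalg.eigh(Y @ Y.T)   # eigendecomposition of the small Gram matrix
    U = U[:, -k:]                    # eigh sorts ascending; keep k largest
    return U, U.T @ Y

rng = np.random.default_rng(2)
Y = rng.random((100, 5)) @ rng.random((5, 2000))   # (numerically) rank-5 movie
U, Z = svd_compress(Y, 5)
rel_err = np.linalg.norm(Y - U @ Z) / np.linalg.norm(Y)
# Storage drops from 100*2000 values to 100*5 + 5*2000 values.
```

Storing U (pixels × k) and U^{⊤}Y (k × frames) in place of Y gives a compression ratio of roughly pixels/k when frames greatly outnumber pixels.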

Random compression was performed as described in [: the data Y is multiplied by structured random matrices to obtain small left and right sketches of the data. The iterated alternating fast HALS updates were then performed using these sketches in place of the full matrix products A^{⊤}Y and YC^{⊤}, with the sketches computed once initially.

Applying the code of [ to the full data, we obtained estimates of the neural shapes A_{1} and spatial background b_{1}. We use the convention that a lower index l indicates spatial decimation by a factor of l in each dimension; thus A_{1}, b_{1} as well as Y_{1} refer to the original resolution. To emulate imaging with lower spatial resolution, spatial decimation was performed by converting each column of A_{1} back into a 512 × 512 image, averaging over l × l blocks of pixels, and collecting the results into the decimated shapes A_{l} (and analogously decimating the data to obtain Y_{l}). We proceeded analogously for the spatial background to obtain b_{l}. The corresponding temporal traces were estimated by solving the constrained temporal problem with A_{l} replacing A_{1} and Y_{l} replacing Y_{1}. We initialized C_{l} and f_{l} with the result of plain NMF that does not impose temporal constraints, i.e. minimizing the squared reconstruction error of Y_{l} subject to C_{l} ≥ 0.

In order to obtain the results for 1-phase imaging without previous shape identification, we solved the full factorization problem directly on the decimated data Y_{l}, initializing the spatial components A_{l}, b_{l} by decimating A_{1}, b_{1} and setting the temporal components to C_{1}, f_{1}. With increasing decimation factors an increasing number of shapes got purged and absorbed into the background, reflecting the fact that it would have been difficult to detect all ROIs on low-resolution data in the first place. Using the remaining shapes we again solved the constrained temporal problem.

To obtain some form of ground truth we simulated data Y^{s} by taking the inferred quantities as actual ground truth: A^{s} ≔ A_{1}, C^{s} ≔ C_{1}, b^{s} ≔ b_{1}, f^{s} ≔ f_{1}. We calculated the residual Y_{1} − A^{s}C^{s} − b^{s}f^{s⊤} and reshuffled it randomly, but in a signal-dependent way, for each pixel in time: we partitioned the residual for each pixel into 200 strata according to signal size and reshuffled it within each stratum, thus retaining any potential link between noise variance and signal mean. The simulated dataset Y^{s} was obtained by adding the reshuffled residual to A^{s}C^{s} + b^{s}f^{s⊤}, and the same analysis as for the original data was performed.
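The stratified reshuffling step can be sketched as follows (our own minimal implementation of the procedure just described; the stratum count is a parameter):

```python
import numpy as np

def reshuffle_residual(residual, signal, n_strata=200, seed=0):
    """For each pixel, permute the residual over time only within strata of
    similar signal amplitude, preserving any noise-variance vs. signal-mean
    relationship."""
    rng = np.random.default_rng(seed)
    shuffled = residual.copy()
    for p in range(residual.shape[0]):                # one pixel at a time
        order = np.argsort(signal[p], kind="stable")  # frames sorted by signal
        for idx in np.array_split(order, n_strata):   # strata of similar signal
            shuffled[p, idx] = rng.permutation(residual[p, idx])
    return shuffled

residual = np.random.randn(4, 600)
signal = np.random.rand(4, 600)
shuffled = reshuffle_residual(residual, signal, n_strata=50)
```

Adding the reshuffled residual back to the noiseless reconstruction then yields the simulated movie.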

We performed additional control simulations that also took the inferred quantities as actual ground truth, but added Poisson or Gaussian noise instead of the reshuffled residual. In contrast to Gaussian noise with fixed variance σ^{2}, the Poisson model results in heteroscedastic noise because its variance grows with the mean. The variance of the Gaussian noise was chosen to be equal to the average variance of the Poisson noise. The results shown in

Another control simulation merely took A_{1}, b_{1} and f_{1} as ground truth. However, instead of taking the denoised fluorescence traces C_{1}, which by construction followed the autoregressive model, the ground truth fluorescence traces were extracted directly from the real data without imposing the autoregressive model, and Poisson noise was added.

Projecting the noise of each pixel onto the neural shapes yields the noise of each neural time series. In practice the latter is estimated based on the noisy trace obtained by projecting the fluorescence data onto the shapes. For interleaved imaging we estimated the noise levels σ_{odd} and σ_{even} based on the PSD of all odd and even frames respectively. The residuals in the noise constraint of the non-negative deconvolution were weighted accordingly by the inverse of the noise level.

where σ_{odd} and σ_{even} denote the noise levels estimated from the odd and even frames respectively, and c_{odd} and c_{even} denote the vectors obtained by taking only every second component of the trace c.

All analyses were performed on a MacBook Pro with Intel Core i5-5257U 2.7 GHz CPU and 16 GB RAM. We wrote custom Python scripts that called the Python implementation [

Structures that are smaller in size are more sensitive to binning.

(PDF)

The simulated ground truth traces were obtained on decimated simulated data with Poisson (orange) or Gaussian (cyan) noise instead of reshuffling (

(PDF)

The simulated ground truth traces were obtained on decimated simulated data with Poisson noise and calcium responses that were not modeled as an AR process, but instead obtained from real data using

(PDF)

The supplementary video shows in the upper row the raw data, its reconstruction based on CNMF, and the residual. It illustrates that all relevant ROIs have been detected and the matrix decomposition afforded by CNMF captures the data well. The lower row shows spatially decimated raw data, corresponding to data acquisition at lower resolution, its reconstruction based on CNMF and knowledge of the neural shapes from an initial cell identification imaging phase, and the residual. It illustrates that demixing paired with cell shape identification captures the data virtually as well as CNMF without decimation; i.e., it enables reconstruction at high resolution from low-resolution data.

(MP4)

All panels of the supplementary video are analogous to

(MP4)

We would like to thank Weiqun Fang for preparing the mouse and Eftychios Pnevmatikakis, Lloyd Russell, Adam Packer, and Jeremy Freeman for helpful conversations. We thank Andrea Giovannucci for his efforts to make a Python implementation of CNMF available. Finally, thanks to Guillermo Sapiro and Mariano Tepper for helpful conversations about [

Part of this work was previously presented at the NIPS (2015) workshop on Statistical Methods for Understanding Neural Systems [

The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government.