## Abstract

Fluorescent calcium indicators are a popular means for observing the spiking activity of large neuronal populations, but extracting the activity of each neuron from raw fluorescence calcium imaging data is a nontrivial problem. We present a fast online active set method to solve this sparse non-negative deconvolution problem. Importantly, the algorithm progresses through each time series sequentially from beginning to end, thus enabling real-time online estimation of neural activity during the imaging session. Our algorithm is a generalization of the pool adjacent violators algorithm (PAVA) for isotonic regression and inherits its linear-time computational complexity. We gain remarkable increases in processing speed: more than one order of magnitude compared to currently employed state of the art convex solvers relying on interior point methods. Unlike these approaches, our method can exploit warm starts; therefore optimizing model hyperparameters only requires a handful of passes through the data. A minor modification can further improve the quality of activity inference by imposing a constraint on the minimum spike size. The algorithm enables real-time simultaneous deconvolution of *O*(10^{5}) traces of whole-brain larval zebrafish imaging data on a laptop.

## Author summary

Calcium imaging methods enable simultaneous measurement of the activity of thousands of neighboring neurons, but come with major caveats: the slow decay of the fluorescence signal compared to the time course of the underlying neural activity, limitations in signal quality, and the large scale of the data all complicate the goal of efficiently extracting accurate estimates of neural activity from the observed video data. Further, current activity extraction methods are typically applied to imaging data after the experiment is complete. However, in many cases we would prefer to run closed-loop experiments—analyzing data on-the-fly to guide the next experimental steps or to control feedback—and this requires new methods for accurate real-time processing. Here we present a fast activity extraction algorithm addressing both issues. Our approach follows previous work in casting the activity extraction problem as a sparse nonnegative deconvolution problem. To solve this optimization problem, we introduce a new algorithm that is an order of magnitude faster than previous methods, and progresses through the data sequentially from beginning to end, thus enabling, in principle, real-time online estimation of neural activity during the imaging session. This computational advance thus opens the door to new closed-loop experiments.

**Citation: **Friedrich J, Zhou P, Paninski L (2017) Fast online deconvolution of calcium imaging data. PLoS Comput Biol 13(3): e1005423. https://doi.org/10.1371/journal.pcbi.1005423

**Editor: **Joshua Vogelstein, Johns Hopkins University, UNITED STATES

**Received: **November 7, 2016; **Accepted: **February 24, 2017; **Published: ** March 14, 2017

**Copyright: ** © 2017 Friedrich et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All code is available on https://github.com/j-friedrich/OASIS and linked repositories therein. We used a publicly available dataset provided by the GENIE project, Svoboda lab, at Janelia Research Campus on http://crcns.org (Chen et al., 2013; GENIE project, 2015).

**Funding: **Funding for this research was provided by Swiss National Science Foundation (http://www.snf.ch) Research Award P300P2_158428 (JF), National Institutes of Health (NIH, https://www.nih.gov) 2R01MH064537 and R90DA023426 (PZ), Simons Foundation (https://www.simonsfoundation.org) Global Brain Research Awards 325171 and 365002 (JF,LP), Army Research Office (ARO, https://www.arl.army.mil?page=472) MURI W911NF-12-1-0594, NIH BRAIN Initiative (https://www.braininitiative.nih.gov) R01 EB22913 and R21 EY027592, Defense Advanced Research Projects Agency (DARPA, http://www.darpa.mil) N66001-15-C-4032 (SIMPLEX), and a Google Faculty Research award (LP); in addition, this work was supported by the Intelligence Advanced Research Projects Activity (IARPA, https://www.iarpa.gov) via Department of Interior/ Interior Business Center (DoI/IBC) contract number D16PC00003 (LP) and D16PC00007 (LP, PZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government.

**Competing interests: ** The authors have declared that no competing interests exist.

This is a *PLOS Computational Biology* Methods paper.

## Introduction

Calcium imaging has become one of the most widely used techniques for recording activity from neural populations in vivo [1]. The basic principle of calcium imaging is that neural action potentials (or spikes), the point process signal of interest, each induce an optically measurable transient response in calcium dynamics. The nontrivial problem of extracting the activity of each neuron from a raw fluorescence trace has been addressed with several different approaches, including template matching [2] and linear deconvolution [3, 4], which are outperformed by sparse non-negative deconvolution [5]. The latter can be interpreted as the maximum a posteriori (MAP) estimate under a simple generative model (linear convolution plus noise; Fig 1), whereas fully Bayesian methods [6–8] can provide some further improvements, but are more computationally expensive. Supervised methods trained on simultaneously-recorded electrophysiological and imaging data [9, 10] have also recently achieved state of the art results, but are more black-box in nature; Bayesian methods based on a well-defined generative model are somewhat easier to generalize to more complex multi-neuronal or multi-trial settings [11–13].

**Fig 1.** Spike train **s** gets filtered to produce calcium trace **c**; here we used *p* = 2 as order of the AR process. Added noise yields the observed fluorescence **y**.

The methods above are typically applied to imaging data offline, after the experiment is complete; however, there is a need for accurate and fast real-time processing to enable closed-loop experiments, a powerful strategy for causal investigation of neural circuitry [14]. In particular, observing and feeding back the effects of circuit interventions on physiologically relevant timescales will be valuable for directly testing whether inferred models of dynamics, connectivity, and causation are accurate in vivo, and recent experimental advances [15, 16] are now enabling work in this direction. Brain-computer interfaces (BCIs) also rely on real-time estimates of neural activity. Whereas most BCI systems rely on electrical recordings, BCIs have been driven by optical signals too [17], providing new insight into how neurons change their activity during learning on a finer spatial scale than possible with intracortical electrodes. Finally, adaptive experimental design approaches [18–20] also rely on online estimates of neural activity.

Even in cases where we do not require the strict timing/latency constraints of real-time processing, we still need methods that scale to large data sets as for example in whole-brain imaging of larval zebrafish [21, 22]. A further demand for scalability stems from the fact that the deconvolution problem is solved in the inner loop of constrained non-negative matrix factorization (CNMF) [13], the current state of the art for simultaneous denoising, deconvolution, and demixing of spatiotemporal calcium imaging data.

In this paper we address the pressing need for scalable online spike inference methods. Building on previous work, we frame this estimation problem as a sparse non-negative deconvolution. Current algorithms employ interior point methods to solve the ensuing optimization problem and are fast enough to process hundreds of neurons in about the same time as the recording [5], but cannot handle larger data sets such as whole-brain zebrafish imaging in real time. Furthermore, although these interior point methods scale linearly in the length of the recording, they cannot be warm started [23], i.e., initialized with the solution from a previous iteration to gain speed-ups, and do not run online.

Here we note a close connection between the MAP problem and isotonic regression, which fits data by a monotone piecewise constant function. A classic algorithm for isotonic regression is the pool adjacent violators algorithm (PAVA) [24, 25], which can be understood as an online active-set optimization method. We generalized PAVA to derive an Online Active Set method to Infer Spikes (OASIS); this new approach to solve the MAP problem yields speed-ups in processing time by at least one order of magnitude compared to interior point methods on both simulated and real data. Further, OASIS can be warm-started, which is useful in the inner loop of CNMF, and also when adjusting model hyperparameters, as we show below. Importantly, OASIS is not only much faster, but operates in an online fashion, progressing through the fluorescence time series sequentially from beginning to end. The advances in speed paired with the inherently online fashion of the algorithm enable true real-time online spike inference during the imaging session (once the spatial shapes of neurons in the field of view have been identified), with the potential to significantly impact experimental paradigms.

## Methods

This section is organized as follows. The first subsection introduces the autoregressive (AR(*p*)) model for calcium dynamics.

In the second subsection we derive an Online Active Set method to Infer Spikes (OASIS) for an AR(1) model. The algorithm is inspired by the pool adjacent violators algorithm (PAVA, Alg 1), which we review first and then generalize to obtain OASIS (Alg 2). This algorithm requires some hyperparameter values; the optimization of these hyperparameters is described next, along with several computational tricks for speeding up the hyperparameter estimation. We finally discuss thresholding approaches for reducing the number of small values returned by the original L1-penalized approach. The resulting problem is non-convex, and so we lose guarantees on finding global optima, but we can easily adapt OASIS to quickly find good solutions.

In the third subsection we generalize to AR(*p*) models of the calcium dynamics and describe a dual active set algorithm that is analogous to the one presented for the AR(1) case (Alg 2). However, this algorithm is greedy if *p* > 1 and yields only a good approximate solution. We can refine this solution and obtain the exact result by warm-starting an alternative primal active set method we call ONNLS (Alg 3). Finally, Alg 4 summarizes all of these steps.

### Model for calcium dynamics

We assume we observe the fluorescence signal for *T* timesteps, and denote by *s*_{t} the number of spikes that the neuron fired at the *t*-th timestep, *t* = 1, …, *T*, cf. Fig 1. Following [5, 13], we approximate the calcium concentration dynamics **c** using a stable autoregressive process of order *p* (AR(*p*)) where *p* is a small positive integer, usually *p* = 1 or 2,

$$c_t = \sum_{i=1}^{p} \gamma_i c_{t-i} + s_t. \tag{1}$$

The observed fluorescence is related to the calcium concentration as [5–7]:

$$y_t = a\, c_t + b + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2), \tag{2}$$

where *a* is a non-negative scalar, *b* is a scalar offset parameter, and the noise is assumed to be i.i.d. zero mean Gaussian with variance *σ*^{2}. For the remainder we assume units such that *a* = 1 without loss of generality. We begin by assuming *b* = 0 for simplicity, but we will relax this assumption later. (We also assume throughout that all parameters in sight are fixed; in case of e.g. drifting baselines *b* we could generalize the algorithms discussed here to operate over shorter temporal windows, but we do not pursue this here.) The parameters *γ*_{i} and *σ* can be estimated from the autocovariance function and the power spectral density (PSD) of **y** respectively [13]. The autocovariance approach assumes that the spiking signal **s** comes from a homogeneous Poisson process and in practice often gives a crude estimate of *γ*_{i}. We will improve on this below by fitting the AR coefficients directly, which leads to better estimates, particularly when the spikes have some significant autocorrelation.
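The generative model can be sketched in a few lines: each calcium value is the current spike count plus a weighted sum of the previous *p* calcium values, and the fluorescence adds a baseline and Gaussian noise (with *a* = 1). This is a minimal illustration, not code from the OASIS repository; the function names `simulate_calcium` and `observe` and the example values are hypothetical.

```python
import random


def simulate_calcium(spikes, gammas):
    """Apply the AR(p) recursion c_t = sum_i gamma_i * c_{t-i} + s_t."""
    c = []
    for t, s in enumerate(spikes):
        value = float(s)
        for i, g in enumerate(gammas, start=1):
            if t - i >= 0:  # terms before the start of the recording are zero
                value += g * c[t - i]
        c.append(value)
    return c


def observe(calcium, sigma, baseline=0.0, seed=0):
    """Return y_t = c_t + b + eps_t with i.i.d. zero-mean Gaussian noise (a = 1)."""
    rng = random.Random(seed)
    return [c + baseline + rng.gauss(0.0, sigma) for c in calcium]


# Illustrative values only: an AR(1) process with gamma = 0.5.
spikes = [0, 1, 0, 0, 2, 0]
calcium = simulate_calcium(spikes, gammas=[0.5])
fluorescence = observe(calcium, sigma=0.1)
```

Rearranging the recursion gives the spikes back from a noiseless trace, *s*_{t} = *c*_{t} − ∑ *γ*_{i} *c*_{t−i}, which is the linear map that the deconvolution problem inverts under noise.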

The goal of calcium deconvolution is to extract an estimate $\hat{\mathbf{s}}$ of the neural activity **s** from the vector of observations **y**. As discussed in [5, 13], this leads to the following non-negative LASSO problem for estimating the calcium concentration:

$$\underset{\hat{\mathbf{c}}}{\text{minimize}} \quad \frac{1}{2}\|\hat{\mathbf{c}} - \mathbf{y}\|^2 + \lambda \|\hat{\mathbf{s}}\|_1 \quad \text{subject to} \quad \hat{\mathbf{s}} = G\hat{\mathbf{c}} \ge 0, \tag{3}$$

where the *ℓ*_{1} penalty on $\hat{\mathbf{s}}$ enforces sparsity of the neural activity and the lower triangular matrix *G* is defined as:

$$G = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\gamma_1 & 1 & 0 & \cdots & 0 \\ -\gamma_2 & -\gamma_1 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -\gamma_2 & -\gamma_1 & 1 \end{pmatrix}. \tag{4}$$

The deconvolution matrix *G* is banded with bandwidth *p* for an AR(*p*) process. Equivalently, **s** = **c** ∗ **g** with **g** a finite impulse response filter of order *p* (*p* + 1 filter taps) and ∗ denoting convolution. To produce calcium trace **c**, spike train **s** is filtered with the inverse filter of **g**, an infinite impulse response **h**, **c** = **s** ∗ **h**. (Although our main focus is on the autoregressive model, we will discuss more general convolutional observation models below as well, and touch on nonlinear effects such as saturation in the Appendix.) Following the approach in [5], note that the spike signal is relaxed from non-negative integers to arbitrary non-negative values.

### Derivation of the active set algorithm

The optimization problem (3) could be solved using generic convex program solvers. Here we derive the much faster Online Active Set method to Infer Spikes (OASIS). The algorithm is inspired by the pool adjacent violators algorithm (PAVA) [24, 25], which we review first for readers not familiar with this classic algorithm before generalizing it to the non-negative LASSO problem.

#### Pool Adjacent Violators Algorithm (PAVA).

The pool adjacent violators algorithm (Alg 1) is a classic exact algorithm for isotonic regression, which fits data by a non-decreasing piecewise constant function. This algorithm is due to [24] and was independently discovered by other authors [26, 27] as reviewed in [25, 28]. It can be considered as a dual active set method [29]. Formally, the (convex) problem is to

$$\underset{\mathbf{x}}{\text{minimize}} \quad \sum_{t=1}^{T} (y_t - x_t)^2 \quad \text{subject to} \quad x_1 \le x_2 \le \dots \le x_T. \tag{5}$$

We first present the algorithm in a way that conveys its core ideas (see Alg A in S1 Appendix), then improve the algorithm’s efficiency by introducing “pools” of variables (adjacent *x*_{t} values) which are updated simultaneously. We introduce temporary values **x**′ and initialize them to the unconstrained least squares solution, **x**′ = **y**. Initially all constraints are in the “passive set” and possible violations are fixed by subsequently adding the respective constraints to the “active set”. Starting at *t* = 2 the algorithm moves to the right until a violation of the constraint $x'_{\tau} \ge x'_{\tau-1}$ at some time *τ* is encountered. Now the monotonicity constraint is added to the active set and enforced by setting $x'_{\tau} = x'_{\tau-1}$. (Supposing the opposite, i.e. $x'_{\tau} > x'_{\tau-1}$, we could move $x'_{\tau-1}$ and $x'_{\tau}$ by some small *ϵ* to decrease the objective without violating the constraints, yielding a proof by contradiction that the monotonicity constraint should be made “active” here, i.e., the constraint holds with strict equality.) We update the values at the two time steps to the best possible fit with constraints. Minimizing their contribution to the residual by setting the derivative with respect to $x'_{\tau}$ to zero, $\frac{\partial}{\partial x'_{\tau}}\left[(y_{\tau-1} - x'_{\tau})^2 + (y_{\tau} - x'_{\tau})^2\right] = 0$, amounts to replacing the values with their average, $x'_{\tau-1} = x'_{\tau} = \frac{y_{\tau-1} + y_{\tau}}{2}$. However, this updated value can violate the constraint $x'_{\tau-1} \ge x'_{\tau-2}$ and we need to add this constraint to the active set and update $x'_{\tau-2}$ as well, $x'_{\tau-2} = x'_{\tau-1} = x'_{\tau} = \frac{y_{\tau-2} + y_{\tau-1} + y_{\tau}}{3}$, etc. In this manner the algorithm continues to back-average to the left as needed until we have backtracked to time *t*′ where the constraint $x'_{t'} \ge x'_{t'-1}$ is already valid. Solving

$$\underset{x'_{t'}}{\text{minimize}} \quad \sum_{t=0}^{\tau - t'} \left(y_{t'+t} - x'_{t'}\right)^2 \tag{6}$$

by setting the derivative to zero yields an update that corresponds to averaging

$$x'_{t'} = x'_{t'+1} = \dots = x'_{\tau} = \frac{1}{\tau - t' + 1} \sum_{t=0}^{\tau - t'} y_{t'+t}. \tag{7}$$

The optimal solution that satisfies all constraints up to time *τ* has been found and the search advances to the right again until detection of the next violation, backtracks again, etc. This process continues until the last value *x*_{T} is reached and having found the optimal solution we return **x** = **x**′.

**Algorithm 1** Pool Adjacent Violators Algorithm (PAVA) for isotonic regression

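**Require:** data *y*_{t} ∈ **y** at time of reading

1: initialize set of pools $\mathcal{P} \leftarrow \{\}$, data index *t* ← 0, pool index *i* ← 0

2: **for** *y* in **y** **do** ▹ read next data point *y*

3:   *t* ← *t* + 1

4:   $\mathcal{P} \leftarrow \mathcal{P} \cup \{(v_{i+1}, w_{i+1}, t_{i+1}) = (y, 1, t)\}$ ▹ add pool

5:   **while** *i* > 0 and *v*_{i+1} < *v*_{i} **do** ▹ merge pools if necessary

6:     $\mathcal{P}_i \leftarrow \left(\frac{w_i v_i + w_{i+1} v_{i+1}}{w_i + w_{i+1}},\; w_i + w_{i+1},\; t_i\right)$

7:     remove $\mathcal{P}_{i+1}$

8:     *i* ← *i* − 1

9:   *i* ← *i* + 1

10: **for** (*v*, *w*, *t*) in $\mathcal{P}$ **do** ▹ construct solution for all *t*

11:   **for** *τ* = 0, …, *w* − 1 **do** *x*_{t+τ} ← *v*

12: **return** **x**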

In a worst case situation a constraint violation is encountered at every step of the forward sweep through the series. Updating all *t* values up to time *t* yields overall $\sum_{t=2}^{T} t = \frac{T(T+1)}{2} - 1$ updates and an *O*(*T*^{2}) algorithm. However, note that when a violation is encountered the updated time points all share the same value (the average of the data at these time points, Eq 7) and it suffices to track this value just once for all these updated time points [30]. The constraints between the updated time points hold with equality, $x'_{t'} = x'_{t'+1} = \dots = x'_{\tau}$, and are part of the active set. In order to obtain a more efficient algorithm, cf. Algorithm 1 and S1 Video, we introduce “pools” or groups of the form (*v*_{i}, *w*_{i}, *t*_{i}) with value *v*_{i}, weight *w*_{i} and event time *t*_{i} where *i* indexes the groups. Initially the ordered set of pools is empty. During the forward sweep through the data the next data point *y*_{t} is initialized as its own pool (*y*_{t}, 1, *t*) and appended to the set of pools. Adjacent pools that violate the constraint *v*_{i+1} ≥ *v*_{i} are combined to a new pool $\left(\frac{w_i v_i + w_{i+1} v_{i+1}}{w_i + w_{i+1}},\; w_i + w_{i+1},\; t_i\right)$. Whenever pools *i* and *i* + 1 are merged, former pool *i* + 1 is removed. It is easy to prove by induction that these updates guarantee that the value of a pool is indeed the average of the corresponding data points (see S1 Appendix) without having to explicitly calculate it using Eq (7). The latter would be expensive for long pools, whereas merging two pools has *O*(1) complexity independent of the pool lengths. With pooling the considered worst case situation results in a single pool and only its value and weight are updated at every step forward, yielding *O*(*T*) complexity. Constructing the optimal solution *x*_{t} for all *t* in a final effort after the optimal pool partition has been reached is also *O*(*T*). At convergence all constraints have been enforced; further note that convergence to the exact solution occurs after a finite number of steps, in contrast to interior point methods, which only approach the optimal solution asymptotically.
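The pooled version of PAVA described above can be written compactly with a list used as a stack of pools. The sketch below is illustrative: the function name `pava` is hypothetical, time indices are 0-based rather than 1-based, and the pool weight doubles as the pool length.

```python
def pava(y):
    """Non-decreasing least-squares fit via pools [value, weight, start index]."""
    pools = []
    for t, obs in enumerate(y):
        pools.append([float(obs), 1, t])  # every new data point starts as its own pool
        # Merge while the newest pool is smaller than its left neighbour.
        while len(pools) > 1 and pools[-1][0] < pools[-2][0]:
            v2, w2, _ = pools.pop()
            v1, w1, t1 = pools[-1]
            # O(1) update: weighted mean of the two pool values.
            pools[-1] = [(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, t1]
    # Expand the pools into the full solution.
    x = [0.0] * len(y)
    for v, w, t in pools:
        for tau in range(w):
            x[t + tau] = v
    return x


fit = pava([1, 3, 2, 4])
```

Each data point is pushed once and popped at most once, which is where the linear run time comes from.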

#### Online Active Set method to Infer Spikes (OASIS).

Now we adapt the PAVA approach to problem (3). PAVA solves a regression problem subject to the constraint that the value at the current time bin must be greater than or equal to the last. The AR(1) model posits a more general but very similar constraint that bounds the rate of decay instead of enforcing monotonicity. The key insight is that problem (3) is a generalization of problem (5): if *p* = 1 in the AR model and we set *γ* = 1 (we skip the index of *γ* for a single AR coefficient) and λ = 0 in Eq (3) we obtain Eq (5). Therefore we focus first on the *p* = 1 case and deal with *p* > 1 and arbitrary calcium response kernels in the next section.

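We begin by inserting the definition of $\hat{\mathbf{s}} = G\hat{\mathbf{c}}$ (Eq 3). Using that $\hat{\mathbf{s}}$ is constrained to be non-negative yields for the sparsity penalty

$$\lambda \|\hat{\mathbf{s}}\|_1 = \lambda \sum_{t=1}^{T} \hat{s}_t = \lambda \sum_{t=1}^{T} \sum_{k=1}^{T} G_{k,t}\, \hat{c}_t = \sum_{t=1}^{T} \mu_t \hat{c}_t \tag{8}$$

with *μ*_{t} ≔ λ(1 − *γ* + *γδ*_{tT}) (with *δ* denoting Kronecker’s delta) by noting that the sum of the last column of *G* is 1, whereas all other columns sum to (1 − *γ*).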
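The identity between the *ℓ*_{1} penalty and the linear form with weights *μ*_{t} is easy to confirm numerically for any AR(1) trace whose deconvolved activity is non-negative. The helper names and the example numbers below are hypothetical, chosen only for illustration.

```python
def sparsity_weights(T, gamma, lam):
    """mu_t = lam * (1 - gamma + gamma * delta_{tT}), returned with 0-based indexing."""
    return [lam * (1.0 - gamma + (gamma if t == T - 1 else 0.0)) for t in range(T)]


def deconvolve_ar1(c, gamma):
    """s = G c for an AR(1) model: s_1 = c_1 and s_t = c_t - gamma * c_{t-1}."""
    return [c[0]] + [c[t] - gamma * c[t - 1] for t in range(1, len(c))]


gamma, lam = 0.5, 2.0
c = [1.0, 0.5, 2.25, 1.125]            # decays by gamma except for a jump at the third step
s = deconvolve_ar1(c, gamma)           # non-negative by construction
l1_penalty = lam * sum(abs(v) for v in s)
linear_form = sum(m * v for m, v in zip(sparsity_weights(len(c), gamma, lam), c))
```

Both quantities agree here, which is what allows the penalty to be absorbed into a shift of the data by **μ**.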

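Now the problem

$$\underset{\hat{\mathbf{c}}}{\text{minimize}} \quad \frac{1}{2} \sum_{t=1}^{T} \left(\hat{c}_t - (y_t - \mu_t)\right)^2 \quad \text{subject to} \quad \hat{c}_{t+1} \ge \gamma \hat{c}_t \ge 0 \quad \forall t \tag{9}$$

shares some similarity to isotonic regression with the constraint $x_{t+1} \ge x_t$ (Eq 5). However, our constraint bounds the rate of decay instead of enforcing monotonicity. Thus we need to generalize PAVA to handle the additional factor *γ*.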

For clarity we mimic our approach from the last section: we first present the algorithm in a way that conveys its core ideas, and then improve the algorithm’s efficiency using pools. We introduce temporary values **c**′ and initialize them to the unconstrained least squares solution, **c**′ = **y** − **μ**. Starting at *t* = 2 one moves forward until a violation of the constraint $c'_{\tau} \ge \gamma c'_{\tau-1}$ at some time *τ* is detected (Fig 2A). Updating the two time steps by minimizing $\frac{1}{2}\left(y_{\tau-1} - \mu_{\tau-1} - c'_{\tau-1}\right)^2 + \frac{1}{2}\left(y_{\tau} - \mu_{\tau} - \gamma c'_{\tau-1}\right)^2$ yields an updated value $c'_{\tau-1} = \frac{(y_{\tau-1} - \mu_{\tau-1}) + \gamma (y_{\tau} - \mu_{\tau})}{1 + \gamma^2}$. However, this updated value can violate the constraint $c'_{\tau-1} \ge \gamma c'_{\tau-2}$ and we need to update $c'_{\tau-2}$ as well, etc., until we have backtracked some Δ*t* steps to time *t*′ = *τ* − Δ*t* where the constraint $c'_{t'} \ge \gamma c'_{t'-1}$ is already valid. At most one needs to backtrack to the most recent spike, because $c'_{t'} > \gamma c'_{t'-1}$ at spike times *t*′ (Eq 1). Solving

$$\underset{c'_{t'}}{\text{minimize}} \quad \frac{1}{2} \sum_{t=0}^{\Delta t} \left(\gamma^t c'_{t'} - (y_{t'+t} - \mu_{t'+t})\right)^2 \tag{10}$$

by setting the derivative to zero yields

$$c'_{t'} = \frac{\sum_{t=0}^{\Delta t} (y_{t'+t} - \mu_{t'+t})\, \gamma^t}{\sum_{t=0}^{\Delta t} \gamma^{2t}} \tag{11}$$

and the next values are updated according to $c'_{t'+t} = \gamma^t c'_{t'}$ for *t* = 1, …, Δ*t*. Note the similarity of Eqs (7) and (11), which differ by weighting the summands by powers of *γ* due to the altered constraints, and by subtracting **μ** from the data **y** due to the sparsity penalty. (Along the way it is worth noting that, because a spike induces a calcium response described by kernel **h** with components *h*_{1+t} = *γ*^{t}, $c'_{t'}$ could be expressed in the more familiar regression form as $\frac{\mathbf{h}_{1:\Delta t+1}^{\top} (\mathbf{y} - \boldsymbol{\mu})_{t':\tau}}{\mathbf{h}_{1:\Delta t+1}^{\top} \mathbf{h}_{1:\Delta t+1}}$, where we used the notation **v**_{i:j} to describe a vector formed by components *i* to *j* of **v**.) Now one moves forward again (Fig 2C–2E) until detection of the next violation (Fig 2E), backtracks again to the most recent spike (Fig 2G), etc. Once the end of the time series is reached (Fig 2I) we have found the optimal solution and set $\hat{\mathbf{c}} = \mathbf{c}'$.

**Fig 2.** Red lines depict true spike times. The shaded background shows how the time points are gathered in pools. The pool currently under consideration is indicated by the blue crosses. A constraint violation is encountered for the second time step **(A)** leading to backtracking and merging **(B)**. The algorithm proceeds moving forward **(C-E)** until the next violation occurs **(E)** and triggers backtracking and merging **(F-G)** as long as constraints are violated. When the most recent spike time has been reached **(G)** the algorithm proceeds forward again **(H)**. The process continues until the end of the series has been reached **(I)**. The solution is obtained and pools span the inter-spike-intervals.

While this yields a valid algorithm, it frequently updates each value and recalculates the full sums in Eq (11) for each step of backtracking. A similar algorithm has been suggested by [31] for the problem without sparsity penalty. However, it passes through the time series in reverse direction, from its end to its beginning, and is thus not applicable to online processing. It considers directly the deconvolved activity and efficiently does not update all time steps but only suspected spike times. However, their algorithm uses the inefficient updates of Eq 11, rendering it an *O*(*T*^{2}) algorithm.

As in PAVA, next we introduce “pools” into the algorithm; these are of critical importance in order to obtain a true *O*(*T*) algorithm. In PAVA these pools serve as sufficient statistics summarizing the data between jumps in the estimated output *x*_{t}; here the pools summarize the data between estimated spike times, where the estimated calcium signal jumps. Pools are now tuples of the form (*v*_{i}, *w*_{i}, *t*_{i}, *l*_{i}) with value *v*_{i}, weight *w*_{i}, event time *t*_{i} and pool length *l*_{i}. Here we explicitly track the pool length, which was identical to its weight for PAVA. Initially the ordered set of pools is empty. During the forward sweep through the data the next data point *y*_{t} is initialized as its own pool (*y*_{t} − *μ*_{t}, 1, *t*, 1) and appended to the set of pools. During backtracking pools get combined and only the first value is explicitly considered, while the other values are merely defined implicitly via $c_{t+1} = \gamma c_t$. The constraint $c_{t+1} \ge \gamma c_t$ translates to $v_{i+1} \ge \gamma^{l_i} v_i$ as the criterion determining whether pools need to be combined. The introduced weights allow efficient value updates whenever pools are merged by avoiding recalculating the sums in Eq (11). Values are updated according to

$$v_i \leftarrow \frac{w_i v_i + \gamma^{l_i} w_{i+1} v_{i+1}}{w_i + \gamma^{2 l_i} w_{i+1}} \tag{12}$$

where the denominator is the new weight of the pool and the pool lengths are summed

$$w_i \leftarrow w_i + \gamma^{2 l_i} w_{i+1} \tag{13}$$

$$l_i \leftarrow l_i + l_{i+1} \tag{14}$$

Whenever pools *i* and *i* + 1 are merged, former pool *i* + 1 is removed. It is easy to prove by induction that the updates according to Eqs (12–14) guarantee that Eq (11) holds for all values (see S1 Appendix).
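The claim that a constant-time merge reproduces the full weighted sum can be checked on a small example. In this sketch the merged value is taken to be (*w*_{i}*v*_{i} + *γ*^{l_i} *w*_{i+1}*v*_{i+1}) divided by the new weight *w*_{i} + *γ*^{2 l_i} *w*_{i+1}, and a pool's directly computed value is the *γ*-weighted sum of its shifted data divided by the sum of *γ*^{2t}; the function names and data are hypothetical.

```python
def direct_value(shifted, gamma):
    """Pool value and weight from the full sums over the shifted data y_t - mu_t."""
    weight = sum(gamma ** (2 * t) for t in range(len(shifted)))
    value = sum(d * gamma ** t for t, d in enumerate(shifted)) / weight
    return value, weight


def merge(pool_a, pool_b, gamma):
    """Merge adjacent pools (v, w, t, l) in O(1); pool_a precedes pool_b."""
    v1, w1, t1, l1 = pool_a
    v2, w2, _, l2 = pool_b
    w = w1 + gamma ** (2 * l1) * w2
    v = (w1 * v1 + gamma ** l1 * w2 * v2) / w
    return (v, w, t1, l1 + l2)


gamma = 0.8
shifted = [1.0, 0.2, 0.7, 0.1, 0.3]
va, wa = direct_value(shifted[:2], gamma)
vb, wb = direct_value(shifted[2:], gamma)
merged = merge((va, wa, 0, 2), (vb, wb, 2, 3), gamma)
v_all, w_all = direct_value(shifted, gamma)   # same pool computed from scratch
```

The merged value and weight match the from-scratch computation up to rounding, without touching the individual data points again.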

Analogous to PAVA, the updates solve Eq (9) not just greedily but optimally, finding the exact solution to the convex problem in *O*(*T*). One important point (especially relevant for online use) is that the computation time per observation timestep is not fixed but random, since we might have to backtrack to update an unpredictable number of pools. We found empirically that, over a wide range of hyperparameters, in about 80% of the cases 0–1 merge operation was performed per observation timestep. With less than 0.5% probability four or more merges were necessary; in all our experiments so far, never more than seven merges were needed.

The final algorithm is summarized in Algorithm 2 and illustrated in Fig 2 as well as in S2 Video. Comparing Algorithm 1 with 2 clearly reveals the modifications made and shows that for *γ* = 1 and λ = 0 the algorithm reduces to PAVA.

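**Algorithm 2** Fast online deconvolution algorithm for AR1 processes with positive jumps

**Require:** decay factor *γ*, regularization parameter λ, data *y*_{t} ∈ **y** at time of reading

1: initialize set of pools $\mathcal{P} \leftarrow \{\}$, time index *t* ← 0, pool index *i* ← 0, solution $\hat{\mathbf{s}} \leftarrow \mathbf{0}$

2: **for** *y* in **y** **do** ▹ read next data point *y*

3:   *t* ← *t* + 1

4:   $\mathcal{P} \leftarrow \mathcal{P} \cup \{(v_{i+1}, w_{i+1}, t_{i+1}, l_{i+1}) = (y - \lambda(1 - \gamma + \gamma\delta_{tT}), 1, t, 1)\}$ ▹ add pool

5:   **while** *i* > 0 and $v_{i+1} < \gamma^{l_i} v_i$ **do** ▹ merge pools if necessary

6:     $\mathcal{P}_i \leftarrow \left(\frac{w_i v_i + \gamma^{l_i} w_{i+1} v_{i+1}}{w_i + \gamma^{2 l_i} w_{i+1}},\; w_i + \gamma^{2 l_i} w_{i+1},\; t_i,\; l_i + l_{i+1}\right)$

7:     remove $\mathcal{P}_{i+1}$

8:     *i* ← *i* − 1

9:   *i* ← *i* + 1

10: **for** (*v*, *w*, *t*, *l*) in $\mathcal{P}$ **do** ▹ construct solution for all *t*^{†}

11:   **for** *τ* = 0, …, *l* − 1 **do** $\hat{c}_{t+\tau} \leftarrow \gamma^{\tau} \max(0, v)$ ▹ enforce $\hat{c}_t \ge 0$ via max

12:   **if** *t* > 1 **then** $\hat{s}_t \leftarrow \hat{c}_t - \gamma \hat{c}_{t-1}$

13: **return** $\hat{\mathbf{c}}, \hat{\mathbf{s}}$

^{†}For online estimates of $\hat{\mathbf{c}}$ and $\hat{\mathbf{s}}$ the solution can be constructed within the loop over **y** not just after it.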
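The AR(1) procedure can be sketched in plain Python along the same lines as the pooled PAVA. This is an illustrative batch version written for this text rather than the reference implementation in the OASIS repository: the name `oasis_ar1` is hypothetical, indices are 0-based, the series length is assumed known so that the last data point receives the full shift λ, and the merge rule assumed is *v* ← (*w*_{i}*v*_{i} + *γ*^{l_i} *w*_{i+1}*v*_{i+1}) / (*w*_{i} + *γ*^{2 l_i} *w*_{i+1}).

```python
def oasis_ar1(y, gamma, lam=0.0):
    """Sparse non-negative AR(1) deconvolution with pools [v, w, t, l]."""
    T = len(y)
    pools = []
    for t, obs in enumerate(y):
        # Shift the data by mu_t; the final time step gets lam instead of lam * (1 - gamma).
        mu = lam * (1.0 - gamma + (gamma if t == T - 1 else 0.0))
        pools.append([obs - mu, 1.0, t, 1])
        # Merge while the decay constraint is violated at the boundary between the last two pools.
        while len(pools) > 1 and pools[-1][0] < gamma ** pools[-2][3] * pools[-2][0]:
            v2, w2, _, l2 = pools.pop()
            v1, w1, t1, l1 = pools[-1]
            w = w1 + gamma ** (2 * l1) * w2
            pools[-1] = [(w1 * v1 + gamma ** l1 * w2 * v2) / w, w, t1, l1 + l2]
    c = [0.0] * T
    s = [0.0] * T
    for v, w, t, l in pools:
        for tau in range(l):
            c[t + tau] = gamma ** tau * max(0.0, v)  # clip to keep calcium non-negative
        if t > 0:
            s[t] = c[t] - gamma * c[t - 1]           # a jump can only occur at a pool start
    return c, s


# A noiseless AR(1) trace (gamma = 0.5, spikes of size 1 and 2) is recovered unchanged.
c_hat, s_hat = oasis_ar1([0.0, 1.0, 0.5, 0.25, 2.125, 1.0625], gamma=0.5)
```

Setting `gamma=1.0` and `lam=0.0` turns the merge rule into plain averaging, so on non-negative data the sketch returns the isotonic regression fit.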

#### Dual formulation with hard noise constraint.

The formulation above contains a troublesome free sparsity parameter λ (implicit in **μ**). A more robust deconvolution approach chooses the sparsity implicitly by inclusion of the residual sum of squares (RSS) as a hard constraint and not as a penalty term in the objective function [13]. The expected RSS satisfies 〈‖**c** − **y**‖^{2}〉 = *σ*^{2}*T* and by the law of large numbers ‖**c** − **y**‖^{2} ≈ *σ*^{2}*T* with high probability, leading to the constrained problem

$$\underset{\hat{\mathbf{c}}}{\text{minimize}} \quad \|\hat{\mathbf{s}}\|_1 \quad \text{subject to} \quad \hat{\mathbf{s}} = G\hat{\mathbf{c}} \ge 0 \quad \text{and} \quad \|\hat{\mathbf{c}} - \mathbf{y}\|^2 = \sigma^2 T. \tag{15}$$

(As noted above, we estimate *σ* using the power spectral estimator described in [13]; see also [8] for a similar approach.)

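We will solve this problem by increasing λ in the dual formulation until the noise constraint is tight. We start with some small λ, e.g. λ = 0, to obtain a first partitioning into pools $\mathcal{P}$, cf. Fig 3A. From Eqs (11–13) along with the definition of **μ** (Eq 8) it follows that given the solution (*v*_{i}, *w*_{i}, *t*_{i}, *l*_{i}), where

$$v_i = \frac{\sum_{t=0}^{l_i - 1} \left(y_{t_i+t} - \lambda(1 - \gamma + \gamma\delta_{t_i+t,\,T})\right) \gamma^t}{\sum_{t=0}^{l_i - 1} \gamma^{2t}} \tag{16}$$

for some λ, the solution for λ + Δλ is

$$v'_i = \begin{cases} v_i - \Delta\lambda\, \dfrac{1 - \gamma^{l_i}}{w_i} & \text{if } i \ne z \\[2ex] v_i - \Delta\lambda\, \dfrac{1}{w_i} & \text{if } i = z \end{cases} \tag{17}$$

where *z* is the index of the last pool and because pools are updated independently we make the approximation that no changes in the pool structure occur. Inserting Eq (17) into the noise constraint (Eq 15) results in

$$\sum_{i=1}^{z} \sum_{t=0}^{l_i - 1} \left(v'_i \gamma^t - y_{t_i+t}\right)^2 = \sigma^2 T \tag{18}$$

and solving the quadratic equation for Δλ yields

$$\Delta\lambda = \frac{-\beta + \sqrt{\beta^2 - 4\alpha\epsilon}}{2\alpha} \tag{19}$$

with $\alpha = \sum_{i,t} \xi_{it}^2$, *β* = 2 ∑_{i,t} *χ*_{it} *ξ*_{it} and $\epsilon = \sum_{i,t} \chi_{it}^2 - \sigma^2 T$, where $\xi_{it} = \frac{1 - \gamma^{l_i}}{w_i} \gamma^t$ if *i* ≠ *z*, $\xi_{zt} = \frac{1}{w_z} \gamma^t$, and *χ*_{it} = *y*_{t_i + t} − *v*_{i} *γ*^{t}.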

**Fig 3.** **(A)** Running the active set method, with conservatively small estimate $\hat{\gamma}$, yields an initial *denoised* estimate (blue) of the data (gray) roughly capturing the truth (red). We also report the correlation between the *deconvolved* estimate and true spike train as a direct measure for the accuracy of spike train inference. **(B)** Updating sparsity parameter λ according to Eq (18) such that RSS = *σ*^{2} *T* (left) shifts the current estimate downward (right, blue). **(C)** Running the active set method enforces the constraints again and is fast due to warm-starting. **(D)** Updating $\hat{\gamma}$ by minimizing the polynomial function RSS($\hat{\gamma}$) and **(E)** running the warm-started active set method completes one iteration, which yields already a decent fit. **(F)** A few more iterations improve the solution further. The obtained estimate (blue) is hardly distinguishable from the one obtained with known true *γ* (yellow dashed trace, plotted in addition to the traces in A-E, is on top of blue solid line). Note that determining $\hat{\gamma}$ based on the autocovariance (additionally plotted purple trace) yields a crude solution that even misses spikes (at 24.6 s and 46.5 s).

The solution Δλ provides a good approximate proposal step for updating the pool values *v*_{i} (using Eq 17). Since this update proposal is only approximate it can give rise to violated constraints (e.g., negative values of *v*_{i}). To satisfy all constraints Algorithm 2 is run to update the pool structure, cf. Fig 3C, but with a *warm start*: we initialize with the current set of merely *z* pools instead of the *T* pools for a cold start (Alg 2, line 4). This step returns a set of *v*_{i} values that satisfy the constraints and may merge pools (i.e., delete spikes); then the procedure (update λ then rerun the warm-started Algorithm 2) can be iterated until no further pools need to be merged, at which point the procedure has converged. In practice this leads to an increasing sequence of λ values (corresponding to an increasingly sparse set of spikes), and no pool-split (i.e., add-spike) moves are necessary. (Note that it is possible to cheaply detect any violations of the KKT conditions in a candidate solution; if such a violation is detected, the corresponding pool could be split and the warm-started Algorithm 2 run locally near the detected violations. However, as we noted, due to the increasing λ sequence we did not find this step to be necessary in the examples examined here.)

This warm-starting approach brings major speed benefits: after the residual is updated following a λ update, the computational cost of the algorithm is linear in the number of pools *z*, hence warm starting drastically reduces computational costs from *k*_{1} *T* to *k*_{2} *z* with proportionality constants *k*_{1} and *k*_{2}: if no pool boundary updates are needed then after warm starting the algorithm only needs to pass once through all pools to verify that no constraint is violated, whereas a cold start might involve a couple passes over the data to update pools, so *k*_{2} is typically significantly smaller than *k*_{1}, and *z* is typically much smaller than *T* (especially in sparsely-spiking regimes).
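The λ proposal step can be illustrated on a toy example with a fixed pool partition. The sketch assumes that raising λ by Δλ lowers the value of every pool except the last by Δλ(1 − *γ*^{l_i})/*w*_{i}, and the value of the last pool by Δλ/*w*_{z}, which follows from *μ*_{t} = λ(1 − *γ* + *γδ*_{tT}); Δλ is then the positive root of the quadratic obtained by setting the RSS to *σ*^{2}*T*. Function names, data and the value of `sigma` are hypothetical.

```python
import math


def pool_stats(y, start, length, gamma):
    """Least-squares pool (v, w, t, l) for lam = 0."""
    w = sum(gamma ** (2 * t) for t in range(length))
    v = sum(y[start + t] * gamma ** t for t in range(length)) / w
    return (v, w, start, length)


def rss(y, pools, gamma):
    """Residual sum of squares of the pooled fit c_{t_i + t} = v_i * gamma^t."""
    return sum((v * gamma ** t - y[t0 + t]) ** 2
               for v, _, t0, l in pools for t in range(l))


def lambda_step(y, pools, gamma, sigma):
    """Return (d_lam, shifted pools) such that the RSS equals sigma^2 * T."""
    last = len(pools) - 1
    # Per-pool sensitivity of v_i to lam; the last pool differs because mu_T = lam.
    scales = [(1.0 if i == last else 1.0 - gamma ** l) / w
              for i, (_, w, _, l) in enumerate(pools)]
    alpha = beta = eps = 0.0
    for (v, _, t0, l), scale in zip(pools, scales):
        for t in range(l):
            xi = scale * gamma ** t
            chi = y[t0 + t] - v * gamma ** t
            alpha += xi * xi
            beta += 2.0 * chi * xi
            eps += chi * chi
    eps -= sigma ** 2 * len(y)
    d_lam = (-beta + math.sqrt(beta * beta - 4.0 * alpha * eps)) / (2.0 * alpha)
    shifted = [(v - d_lam * scale, w, t0, l)
               for (v, w, t0, l), scale in zip(pools, scales)]
    return d_lam, shifted


gamma, sigma = 0.5, 0.3
y = [1.0, 0.6, 0.2, 1.5, 0.7]
pools = [pool_stats(y, 0, 3, gamma), pool_stats(y, 3, 2, gamma)]
d_lam, new_pools = lambda_step(y, pools, gamma, sigma)
```

The cost of this step is one pass over the residuals; the subsequent constraint check only has to visit the pools, not all time points.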

#### Additional baseline.

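For ease of exposition we thus far assumed no offsetting baseline. Adding a known baseline *b* ≠ 0 the problem reads

$$\underset{\hat{\mathbf{c}}}{\text{minimize}} \quad \frac{1}{2}\|b\mathbf{1} + \hat{\mathbf{c}} - \mathbf{y}\|^2 + \lambda \|\hat{\mathbf{s}}\|_1 \quad \text{subject to} \quad \hat{\mathbf{s}} = G\hat{\mathbf{c}} \ge 0. \tag{20}$$

For known baseline one merely needs to initialize OASIS by subtracting not only the sparsity parameter **μ**(λ) from the data **y**, cf. Eq (11) and Algorithm 2, but also the baseline *b*. The fluorescence depends only on the sum **ϕ** = *b***1** + **μ**.

If the baseline is not known, we want to optimize it too by solving the noise constrained dual problem

$$\underset{\hat{b},\, \hat{\mathbf{c}}}{\text{minimize}} \quad \|\hat{\mathbf{s}}\|_1 \quad \text{subject to} \quad \hat{\mathbf{s}} = G\hat{\mathbf{c}} \ge 0 \quad \text{and} \quad \|\hat{b}\mathbf{1} + \hat{\mathbf{c}} - \mathbf{y}\|^2 = \sigma^2 T. \tag{21}$$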
We denote all except the differing last component of ** μ** by

*μ*= λ(1 −

*γ*) (Eq 8) and of

*ϕ*by

*ϕ*=

*b*+ λ(1 −

*γ*).

*ϕ*is the total shift applied to the data (except for the last time step) due to the baseline and sparsity penalty before running OASIS. We increase

*ϕ*until the noise constraint is tight.

*ϕ*can be initialized by min

*y*

_{t}or better by a small percentile of

**, e.g. 15%. Once OASIS has been run with some**

*y**ϕ*the baseline is obtained by minimizing the objective Eq (20) with respect to it, yielding , and the sparsity parameter is . Appropriately adding to Eq (18) (22) and plugging the analytic expression into Eq (22) to account for the changing baseline, we obtain an estimate of Δ

*ϕ*using a block coordinate update of

*ϕ*and . Solving the ensuing quadratic equation for Δ

*ϕ*, yields (23) with ,

*β*= 2 ∑

_{i,t}

*χ*

_{it}

*ξ*

_{it}and where and . All pools are updated according to , cf. Eq (17). To satisfy all constraints Algorithm 2 is run, warm-started by initializing with the current set of pools.
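The bookkeeping between the total shift *ϕ*, the baseline *b* and the sparsity parameter λ can be illustrated with a short sketch. It assumes a least-squares objective, so that the optimal baseline for a fixed denoised trace is the mean residual, and it uses the relation *ϕ* = *b* + λ(1 − *γ*) stated above. The helper names and the linear-interpolation percentile are illustrative choices, not taken from the source.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty sequence, with linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def baseline_and_lambda(y, c, phi, g):
    """Split the total shift phi into baseline b and sparsity parameter lam.

    b is the mean residual of the data y around the denoised trace c;
    lam then follows from phi = b + lam * (1 - g).
    """
    b = sum(yt - ct for yt, ct in zip(y, c)) / len(y)
    lam = (phi - b) / (1 - g)
    return b, lam


# a low percentile of the raw trace as a starting value for phi
phi0 = percentile([2.1, 2.0, 3.5, 2.8, 2.2, 2.0, 4.0], 15)
```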

#### Optimizing the AR coefficient.

Thus far the parameter *γ* has been known or been estimated based on the autocovariance function. We can improve upon this estimate by optimizing *γ* as well, which is illustrated in Fig 3. After updating λ (and *b*) followed by running Algorithm 2, we perform a coordinate descent step in *γ* that minimizes the RSS, cf. Fig 3D. The RSS as a function of *γ* is a high order polynomial, cf. Eq (11), and we need to settle for numerical solutions of
(24)
We used Brent’s method [32] with bounds to solve this problem. One iteration now consists of steps B-E in Fig 3, while for known *γ* only B-C were necessary. If optimizing the baseline too, we obtained better results by minimizing the RSS jointly with respect to *b* and *γ* using L-BFGS-B [33] instead of keeping the baseline fixed.
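As a rough illustration of a bounded one-dimensional search for *γ*, the sketch below minimizes a toy RSS with golden-section search. This is a deliberately simpler stand-in for Brent's method, which is what the text reports using. The toy RSS treats the whole trace as a single pool (one exponential transient), and the function names and the upper bound 0.999 are assumptions made for this example only.

```python
import math


def rss_single_pool(y, g):
    """RSS of the best fit v * g**t to y when the trace forms a single pool."""
    num = sum(yt * g ** t for t, yt in enumerate(y))
    den = sum(g ** (2 * t) for t in range(len(y)))
    v = num / den
    return sum((yt - v * g ** t) ** 2 for t, yt in enumerate(y))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2


# noise-free transient with true decay 0.95
y = [0.95 ** t for t in range(100)]
g_hat = golden_section_min(lambda g: rss_single_pool(y, g), 0.0, 0.999)
```

In a full implementation the RSS would be summed over all (or a subset of) pools and the search would be interleaved with the λ update, as described above.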

#### Faster optimization of hyperparameters.

We have presented methods to estimate the hyperparameters λ, *b* and *γ*, which require a handful of warm-started iterations of OASIS. To gain further speed-ups these parameters can be estimated on decimated data. When downsampling by a factor *k*, the average of *k* subsequent frames is calculated, the noise divided by a factor and the initial estimate of the AR coefficient scaled to . Alternatively, one could estimate *σ* and *γ* based on the decimated data. Once the hyperparameters have been obtained, the corresponding inverse transformations are performed: , and such that the shrinkage due to the penalty term stays invariant. The final run of OASIS on the full data is warm started using the solution obtained on the decimated data. Data points that are not in the proximity of a spike of the downsampled solution are already combined into large pools, instead of initializing each data point as its own separate pool. More precisely, if the deconvolved decimated data has positive values at times {*t*_{i}}, for deconvolving the full data time steps ⋃_{i}{(*k* − 1)*t*_{i}, …, (*k* + 1.5)*t*_{i}} are initialized as individual pools, while the remaining time steps are pooled together into bigger pools, separated from each other by the individual ones, with values given by Eq (11) and weights by its denominator.
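The decimation step can be sketched as follows. The scalings used here are assumptions based on standard arguments rather than formulas quoted from the source: averaging *k* frames with independent noise reduces the noise standard deviation by √*k*, and an exponential decay with per-frame factor *γ* decays by *γ*^{k} per decimated frame. The function names are hypothetical, and the rescaling of λ is intentionally left out.

```python
import math


def decimate(y, k):
    """Average k subsequent frames; trailing frames that do not fill a bin are dropped."""
    n = len(y) // k
    return [sum(y[i * k:(i + 1) * k]) / k for i in range(n)]


def rescale_for_decimation(sigma, g, k):
    """Noise level and AR(1) coefficient to use on data decimated by a factor k."""
    return sigma / math.sqrt(k), g ** k


def restore_ar_coefficient(g_decimated, k):
    """Inverse transformation of the AR(1) coefficient back to the full frame rate."""
    return g_decimated ** (1.0 / k)


y_dec = decimate([1, 2, 3, 4, 5], 2)
sigma_dec, g_dec = rescale_for_decimation(1.0, 0.95, 4)
```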

In particular the estimation of the AR coefficient *γ* is computationally burdensome, because it involves expensive repeated evaluations of the RSS in order to minimize it as a function of *γ* (and *b*). The computing time depends linearly on the number of pools *z* and we gain further speed-ups by restricting the attention to merely a subset of pools. In particular, because *γ* can be well estimated based on large isolated calcium events, we restrict the calculation of the RSS to the pools with largest product of value and length. A large value indicates a large event and a long pool an isolated event. We present detailed results in the Results section, indicating that altogether we can save about an order of magnitude of computation, with the greatest savings obtained by reducing the optimization of *γ* from *O*(*z*) to *O*(1).
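Selecting the pools used for the RSS evaluation is a one-liner. The sketch assumes pools are stored as `[value, weight, start, length]`; both this layout and the function name are illustrative.

```python
def largest_pools(pools, n):
    """Return the n pools with the largest product of value and length."""
    return sorted(pools, key=lambda p: p[0] * p[3], reverse=True)[:n]


pools = [[1.0, 1.0, 0, 5], [3.0, 1.0, 5, 1], [0.5, 1.0, 6, 20]]
subset = largest_pools(pools, 2)  # keeps the pools starting at t = 6 and t = 0
```

Because `n` is fixed, the cost of each RSS evaluation on the subset no longer grows with the number of pools.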

It is also worth noting that the hyperparameter estimation discussed above is performed in ‘batch’ mode, not online. However, once good hyperparameter values are obtained on a short initial batch we can switch into online mode (with the hyperparameters held fixed) and handle the remaining data in a stream.

#### Hard shrinkage and *ℓ*_{0} penalty.

It is well-known that *ℓ*_{1} penalization results in “soft-thresholding” [34], in which small values are zeroed out and large values are shifted to lower values (where the size of this shift is proportional to the penalty λ). We can perform hard instead of soft thresholding (avoiding this shrinkage of large values) by replacing the sparsity penalty by a constraint on the minimum spike size *s*_{min}. The problem
(25)
is non-convex and we are not guaranteed to find the global minimum. However, we obtain a good local minimum by merely changing the condition to merge pools from to , modifying lines 3 and 5 in Algorithm 2.

Now we must choose a value for *s*_{min}. In many cases we found that simply setting *s*_{min} as a small multiple of the noise level led to good results. If the scaling factor *a* (Eq 2) relating fluorescence to action potentials was known, we could properly normalize the spike train such that corresponds to one spike and choose *s*_{min} = 0.5, or a slightly higher value to avoid splitting one spike into two of size 0.5. However, often the factor is unknown or difficult to estimate, rendering the choice of *s*_{min} cumbersome. Analogous to the variation of λ, we can start with *s*_{min} = 0 and increase it until the RSS crosses the *σ*^{2} *T* threshold by sequentially removing the smallest ‘spike’ and merging the pools it used to separate. By maximizing *s*_{min} under the noise constraint we minimize the number of non-zero values of . De facto, we try to find a parsimonious description of the data by minimizing the number of non-zero values of , thus solving a sparsity problem with *ℓ*_{0} penalty:
(26)
Instead of sequentially removing the smallest ‘spike’ we actually obtained the best performance by sequentially adding spikes at the highest values of the *ℓ*_{1}-solution until the RSS is smaller than . While the updates resemble those of matching pursuit [35], in practice we found that adding spikes at the positions suggested by the *ℓ*_{1}-solution yields better results than matching pursuit (which adds spikes at positions that greedily lead to the highest RSS reduction per step). Specifically, we found that often matching pursuit cannot resolve spikes in close proximity, but instead results in erroneous placement of one big spike as an explanation for all nearby spikes. Instead of merging pools we now need to split pools. Denoting the time where to add a spike by *t*_{s}, i.e. the time where the *ℓ*_{1}-solution has its highest value after ruling out times where spikes have already been added, one searches for the pool *i* in which it falls, i.e. *t*_{i} < *t*_{s} < *t*_{i} + *l*_{i}. Pool *i* gets updated as , , and , which follows directly from Eq (11) with *μ*_{t} = 0. All pool indices greater than *i* are increased by one and a new pool is inserted after pool *i* with , , , and .

As is the case with all optimized hyperparameters, once we have obtained a decent estimate of *s*_{min} on an initial subset of the data we can switch back into online mode. In online mode our algorithm is typically faster than matching pursuit, since matching pursuit requires updating *O*(Δ) points of the residual with each update, where Δ is the length of the calcium transient (in number of frames).

### Generalization beyond the AR(1) case

#### A greedy solution for the AR(*p*>1) processes.

An AR(1) process models the calcium response to a spike as an instantaneous increase followed by an exponential decay. This is a good description when the fluorescence rise time constant is small compared to the length of a time-bin, e.g. when using GCaMP6f [36] with a slow imaging rate. For fast imaging rates and slow indicators such as GCaMP6s it is more accurate to explicitly model the finite rise time. Typically we choose an AR(2) process, though more structured responses (e.g. multiple decay time constants) can also be modeled with higher values for the order *p*.

For an AR(*p*) process the sparsity penalty can again be expressed as , because
(27)
with , by evaluating the column sums of *G*. For *p* > 1 the dynamics are no longer first-order Markov and the next value depends not only on the current but on possibly multiple previous time steps. Now following along the lines of the previous section just leads to a greedy, approximate solution; we will present an exact algorithm later. We use matrix and vector notation to describe the dynamics of *c*_{t}. Let the transition matrix *A*, multi time step calcium vectors **ζ**_{t}, and vector **e** be defined as
(28)
The calcium dynamics are given by **ζ**_{t} = *A* **ζ**_{t−1} + *s*_{t} **e**. Analogously to the AR(1) case we derive an algorithm that moves through the time series until it finds a violation of the constraint for some time *τ*, updates and , and backtracks further until the updates do not violate any constraints at previous time steps. Note that we also implicitly have constraints on **ζ**_{t}, enforcing the fact that **ζ**_{t+1} is mostly a time-shifted version of **ζ**_{t}.

Assuming we need to backtrack by Δ*t* steps and introducing again *t*′ = *τ* − Δ*t*, the objective is to minimize with respect to under the active constraints **ζ**_{t} = *A* **ζ**_{t−1} for *t* = *t*′ + 1, …, *τ*. Plugging in the constraints on the dynamics the objective reads
(29)
Setting the derivative with respect to to zero and solving for yields
(30)
where (*A*^{t})_{1,1}^{2} denotes the square of the entry in the first row and column of the *t*-th matrix power of *A*. Again, note that these entries describe the calcium kernel **h** with components *h*_{1+t} = (*A*^{t})_{1,1}. Eq (30) reduces to Eq (11) for *p* = 1 where *A* is just a 1 × 1-matrix with entry *γ*. The next values are updated according to the active constraints, **ζ**_{t′+t} = *A*^{t} **ζ**_{t′} for *t* = 1, …, Δ*t*.

We derive again an efficient formulation of the algorithm using pools. Considering the denominator in Eq (30) as a weight in analogy to the AR(1) case and calculating the weighted sum upon merging of pools is not valid for *p* > 1 because in general (*A*^{t})_{1,1}(*A*^{u})_{1,1} ≠ (*A*^{t+u})_{1,1}. Introducing pools is still useful as it allows us to keep track of only a small number of *p* elements in each pool. While for the case of AR(1) we only kept track of each group’s first element, we now keep track of the first as well as the *p* − 1 last elements. In order to speed up the update in Eq (30), we can precompute the powers of *A* and store (*A*^{t})_{1,:} in memory. Note that only the powers up to the maximal inter-spike-interval are needed, which can be much smaller than *T*; of course, for very large values of *t*, (*A*^{t})_{1,:} ≈ 0, by the stability of *A*; thus for high powers the entries of (*A*^{t})_{1,:} can also be well approximated by a quickly computable exponential function or simply be truncated. Analogous to the case *p* = 1, we can also impose a constraint on the minimum spike size *s*_{min} at the expense of having to deal with a non-convex problem by merely changing the condition to merge pools from to where *v*_{i} and *u*_{i} denote the first and last value of pool *i*.
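The precomputation of (*A*^{t})_{1,:} can be sketched in a few lines. The sketch assumes the usual companion form for the transition matrix (AR coefficients in the first row, ones on the subdiagonal); this is an assumption, since Eq (28) is not reproduced here, and the function names are hypothetical. The AR(2) coefficients are the ones used for the simulated data in the Results.

```python
def companion_matrix(gammas):
    """Assumed AR(p) transition matrix: coefficients in row 1, ones on the subdiagonal."""
    p = len(gammas)
    A = [[0.0] * p for _ in range(p)]
    A[0] = list(gammas)
    for i in range(1, p):
        A[i][i - 1] = 1.0
    return A


def first_rows_of_powers(A, n):
    """Return the first rows of A^0, A^1, ..., A^(n-1)."""
    p = len(A)
    row = [1.0] + [0.0] * (p - 1)  # first row of the identity
    rows = []
    for _ in range(n):
        rows.append(row)
        # first row of A^(t+1) equals (first row of A^t) times A
        row = [sum(row[i] * A[i][j] for i in range(p)) for j in range(p)]
    return rows


g1, g2 = 1.7, -0.712
rows = first_rows_of_powers(companion_matrix([g1, g2]), 200)
h = [r[0] for r in rows]  # calcium kernel, h[t] = (A^t)_{1,1}
```

For these coefficients (*A*^{1})_{1,1}(*A*^{1})_{1,1} = 2.89 while (*A*^{2})_{1,1} = 2.178, which illustrates why the weighted-sum merge rule of the AR(1) case does not carry over, and the kernel entries decay towards zero for large *t*, so the table can be truncated.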

According to Eq (30) the solution is a linear function of **μ**, and hence of λ. Thus the hard noise constraint for the RSS ‖**c** − **y**‖^{2} = *σ*^{2} *T* is a quadratic equation in λ that can be solved analytically under the assumption of invariant pool structure, analogous to the above case of AR(1), but involving lengthier expressions which we state explicitly in S1 Appendix. Updating all pools independently according to Eq (30) can give rise to violated constraints, requiring us to rerun the algorithm, warm-started by initializing with the current set of pools, as described above. After 2–3 iterations no pools need to be merged and the final solution has been found. We can again interleave an update step for optimizing the parameters *γ*_{i}, as described above.

#### Online Non-Negative Least Squares (ONNLS).

We noted above that Eq (30) is not first-order Markovian: it includes a dependency on *p* − 1 previous time steps and hence in general the previous pool. In updating only the first value within a pool and using the current values of the *p* − 1 last values of the previous pool within the update Eq (30), we actually performed greedy updates. These greedy updates can yield remarkably good results, in particular for long pools, such that the last value is already well constrained by a number of data points and hardly affected by the next pool. Nonetheless, in some cases these greedy updates lead to errors in the timing of inferred activity, in particular when the rise time of the calcium response is slow compared to the frame rate. The method described in this section can be used to correct these small errors. It is again an active set method that can be run in online mode; however, the method introduced above is a *dual* active set method, whereas here we describe a *primal* active set method.

We begin by reformulating the problem as
(31)
where *K* = *G*^{−1} is the convolution matrix with entries *K*_{t,u} = *h*_{1+t−u} if *t* ≥ *u* else zero; the kernel vector **h** can be taken as an arbitrary response kernel for most of the development in this section. As noted earlier, *h*_{1+t} = (*A*^{t})_{1,1} for the special case of an AR process. As we have seen previously, the effect of the sparsity penalty (together with the non-negative constraint) is to shift the data down by a vector **μ** = λ*K*^{−⊤}**1**, and the problem reduces to a non-negative least squares (NNLS) problem.
(32)
(Note that the gradient of Eq (32) is the same as the gradient of Eq (31), *K*^{⊤}(*K***s** − **y**) + λ**1**. In addition, *K* is triangular with positive numbers on the main diagonal, hence det *K* > 0 and *K* is invertible.)
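For the AR(1) special case the relation between *G*, *K* and the shift vector can be checked numerically. The sketch below assumes that *G* is the lower bidiagonal matrix with ones on the diagonal and −*γ* on the subdiagonal, so that **s** = *G***c**. With *K* = *G*^{−1}, the shift λ*K*^{−⊤}**1** equals λ times the column sums of *G*. Function names are hypothetical.

```python
def ar1_difference_matrix(T, g):
    """Assumed G for AR(1): ones on the diagonal, -g on the subdiagonal."""
    G = [[0.0] * T for _ in range(T)]
    for t in range(T):
        G[t][t] = 1.0
        if t > 0:
            G[t][t - 1] = -g
    return G


def ar1_convolution_matrix(T, g):
    """K with entries h_{1+t-u} = g**(t-u) for t >= u, zero otherwise."""
    return [[g ** (t - u) if t >= u else 0.0 for u in range(T)] for t in range(T)]


def shift_vector(G, lam):
    """mu = lam * G^T 1, i.e. lam times the column sums of G."""
    T = len(G)
    return [lam * sum(G[t][u] for t in range(T)) for u in range(T)]


def matmul(X, Y):
    """Plain matrix product of two square lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


G = ar1_difference_matrix(5, 0.5)
K = ar1_convolution_matrix(5, 0.5)
mu = shift_vector(G, 2.0)  # lam*(1-g) everywhere except lam in the last entry
```

The result reproduces the structure of Eq (8): every component equals λ(1 − *γ*) except the last, which equals λ.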

A classic algorithm for solving a NNLS problem is the active set method of [37] and [38]. This algorithm alternates between normal equation matrix solves involving sub-matrices of *K*^{⊤} *K* and updates of the active set. A naive application of this algorithm would scale cubically with the number of spikes. Instead, we exploit the locality of the problem (the fact that changing a spike height at time *t* does not affect the solution at very distant times *s*) and apply the NNLS algorithm in the inner loop of a sequential coordinate block descent method. Specifically, we apply warm-started NNLS on blocks of size Δ (where Δ is the length of the calcium transient), stepping the block in steps of size Δ_{m} < Δ (we found to be effective for offline applications; for online applications Δ_{m} would be set smaller) and applying NNLS while holding the values of *s* outside the block fixed. We further exploit the Toeplitz structure of *K* to precompute the necessary sub-matrices of *K*^{⊤} *K*.

**Algorithm 3** Fast online deconvolution for arbitrary convolution kernels

**Require:** kernel **h**, regularization parameter λ, window size Δ, shift size Δ_{m}, data subset **y**_{t:t+Δ−1} ⊂ **y** at time of reading

1: initialize *K*_{t,u} ← *h*_{1+t−u} for 1 ≤ *u* ≤ *t* ≤ Δ, **y** ← **y** − λ*K*^{−⊤}**1**, *A* ← *K*^{⊤}*K*, *t* ← 1

2: **while** *t* + Δ ≤ *T* **do**

3: NNLS ▹classic NNLS on , but warm-started^{†}

4: ▹peel off contribution of previous activity

5: *t* ← *t* + Δ_{m}

6: NNLS ▹robustness to

7: **return**

^{†}The function NNLS implements a minor variation of the classic algorithm of [38] to solve : *K*^{⊤}*K* and *K*^{⊤}**y** are precomputed outside the function, to exploit that NNLS is called several times with the same *K*. Further, the solution is warm-started instead of being initialized as **0**.

The resulting algorithm (Alg 3) runs in *O*(*T*) time. It involves solving a least squares problem for the time points within the considered window where ; thus it scales cubically with the number of spikes per window and depends on the sparsity of . (In fact, for AR(*p*) models, the required matrix solves can be performed using linear-time (not cubic-time) Kalman filter-smoother methods, but the matrix sizes were sufficiently small in the examples examined here that the Kalman implementation was not necessary.) Further speedups can be obtained by restricting the set of possible spike times, for example, by running the AR(1) version of OASIS on a temporally decimated version of the signal to crudely identify the set of spike times, then never updating away from zero on the complement of this set.

To summarize, we describe in Algorithm 4 how the algorithmic variants introduced here are combined into a final full algorithm that includes hyperparameter optimization, the variants for AR(1) or AR(2), and soft (*ℓ*_{1} penalty) or hard shrinkage (*ℓ*_{0} penalty).

**Algorithm 4** Full algorithm with hyperparameter optimization

**Require:** data **y**, order *p* of the AR-process, sparsity norm *q*

1: initialize

2: AR parameters using autocorrelation of *y*

3: noise level using PSD of *y*

4: background using percentile of *y*

5: dual variable λ ← 0

6: temporally decimate batch of **y** ▹for faster hyperparameter optimization

7: rescale hyperparameters due to decimation

8: **while** hyperparameters not converged **do** ▹optimize hyperparameters, cf. Fig 3

9: Run warm-started Alg 2 on the decimated data with current hyperparameters

10: Update hyperparameters ▹Eqs (19, 23 and 24)

11: **if** *q* = 0 **then** determine *s*_{min} ▹Sec. ‘Hard shrinkage and *ℓ*_{0} penalty’

12: rescale hyperparameters using the inverse transformations of line 7

13: run Alg 2 on full data *y*

14: **if** *p* = 1 **then**

15: **return**

16: **else**

17: run warm-started Alg 3 on full data *y*

18: **return**

## Results

### Benchmarking OASIS

We generated datasets of 20 fluorescence traces each for *p* = 1 and 2 with a duration of 100 s at a framerate of 30 Hz, such that *T* = 3,000 frames. The spiking signal came from a homogeneous Poisson process. We used *γ* = 0.95, *σ* = 0.3 for the AR(1) model and *γ*_{1} = 1.7, *γ*_{2} = −0.712, *σ* = 1 for the AR(2) model. Fig 4A–4C confirm that our suggested (dual) active set method indeed yields the same results as other convex solvers for an AR(1) process and that spikes are extracted well. For an AR(2) process OASIS is greedy and yields good results that are similar to those obtained with convex solvers (lower panels in Fig 4B and 4C), with virtually identical denoised fluorescence traces (upper panels).

**(A)** Raw and inferred traces for simulated AR(1) data, **(B)** simulated AR(2) and **(C)** real data from [36] fitted with an AR(2) model. OASIS solves Eq (3) exactly for AR(1) and just approximately for AR(2) processes, nevertheless well extracting spikes. **(D)** Computation time for simulated AR(1) data with given λ (blue circles, Eq 3) or inference with hard noise constraint (green x, Eq 15). GUROBI failed on the noise constrained problem. The inset shows the same data in logarithmic scale. **(E)** Computation time for simulated AR(2) data. **(F)** Normalized computation time of OASIS for simulated AR(1) data with given λ (blue circles, Eq 3) and inference with hard noise constraint on full data (green x, Eq 15) or small initial batch followed by processing in online mode (orange crosses).

Fig 4D and 4E report the computation time (±SEM) averaged over all 20 traces and ten runs per trace on a MacBook Pro with Intel Core i5 2.7 GHz CPU. We compared the run time of our algorithm to a variety of state of the art convex solvers that can all be conveniently called from the convex optimization toolbox CVXPY [39]: embedded conic solver (ECOS, [40]), MOSEK [41], splitting conic solver (SCS, [42]) and GUROBI [43]. ECOS and MOSEK are the most competitive methods; these are interior-point methods that cannot use warm starts. With a known sparsity parameter λ (Eq 3), OASIS is about two orders of magnitude faster than any other method for an AR(1) process (Fig 4D, blue disks) and more than one order of magnitude faster for an AR(2) process (Fig 4E). Whereas several of the other solvers take almost the same time for the noise constrained problem (Eq 15, Fig 4D and 4E, green x), our method takes about three times longer to find the value of the dual variable λ compared to the formulation where the residual is part of the objective; nevertheless it still outperforms the other algorithms by a huge margin.

We also ran the algorithms on longer traces up to a length of *T* = 300,000 frames (Fig 4F), confirming that OASIS scales linearly with *T*, where we obtained a proportionality constant of 1 *μ*s/frame. For an unknown hyperparameter λ we obtained its value not only on the full data but also on a small initial batch (1,000 frames) and then kept it fixed, which sped up activity inference by a factor of three once *T* is sufficiently large (Fig 4F, orange vs green) without compromising quality (correlation between *deconvolved* activity and ground truth spike train 0.882 ± 0.001 vs 0.881 ± 0.002 for *T* = 300,000). Our active set method maintained its lead by 1–2 orders of magnitude in computing time. Further, compared to our active set method the other algorithms required at least an order of magnitude more RAM, confirming that OASIS is not only faster but much more memory efficient. Indeed, because OASIS can run in online mode the memory footprint can be *O*(1), instead of *O*(*T*).

We verified these results on real data as well. Running OASIS with the hard noise constraint and *p* = 2 on the GCaMP6s dataset of 14,400 frames collected at 60 Hz from [36, 44] took 0.101±0.005 s per trace, whereas the fastest other methods required 2.37±0.12 s. Fig 4C shows the real data together with the inferred denoised and deconvolved traces as well as the true spike times, which were obtained by simultaneous electrophysiological recordings [36].

We also extracted each neuron’s fluorescence activity using CNMF from an unpublished whole-brain zebrafish imaging dataset from the M. Ahrens lab. Running OASIS with hard noise constraint and *p* = 1 (chosen because the calcium onset was fast compared to the acquisition rate of 2 Hz) on 10,000 traces out of a total of 91,478 suspected neurons took 81.5 s whereas ECOS, the fastest competitor, needed 2,818.1 s. For all neurons we would hence expect about 745 s for OASIS, which is below the 1,500 s recording duration (3,000 frames), and roughly 25,780 s for ECOS, with other candidates slower still.
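The extrapolation to the full population is a linear scaling of the measured times, which can be reproduced directly from the numbers quoted above (variable names are illustrative):

```python
n_total = 91_478        # suspected neurons in the dataset
n_timed = 10_000        # traces actually timed
t_oasis_timed = 81.5    # seconds for OASIS on the timed subset
t_ecos_timed = 2818.1   # seconds for ECOS on the timed subset

scale = n_total / n_timed
t_oasis_all = t_oasis_timed * scale   # approximately 745.5 s
t_ecos_all = t_ecos_timed * scale     # approximately 25,779 s

recording_duration = 3000 / 2.0       # 3,000 frames at 2 Hz = 1,500 s
real_time_capable = t_oasis_all < recording_duration
```

Under this linear extrapolation OASIS finishes in about half the recording time, whereas ECOS would need roughly 17 times the recording duration.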

OASIS solves the non-negative deconvolution problem exactly for an AR(1) process; however, as discussed above, for *p* > 1 the solution is only a good (greedy) approximation. To obtain the exact solution we ran the ONNLS algorithm on the simulated AR(2) traces using a window size of 200 frames, which was about ten times larger than the fluorescence decay time, and shifting the window by 100 frames. We obtained higher accuracy results than all the state of the art convex solvers we compared to, requiring merely 27.8±0.4 ms per trace for λ = 0 and 20.0±0.4 ms per trace for λ = 30, the value that ensures that the hard noise constraint is tight. The choice of λ regulated the sparsity of the solution, which affects the run time of ONNLS. The fastest state of the art convex solver (ECOS) required 305±9 ms and was thus an order of magnitude slower. It took merely 8.56±0.04 ms to obtain an approximate greedy solution using OASIS, independent of the choice of sparsity parameter λ. Though obtaining the exact solution requires more computing time, it is well within the same order of magnitude. In contrast, running batch NNLS was significantly slower, requiring 2,430±53 ms for λ = 0 and 1,620±37 ms for λ = 30. Solving the noise constrained problem by iterating warm-started ONNLS to obtain the corresponding value of the dual variable λ took 73±1 ms. However, we can improve on that by first running the fast but (for *p* > 1) approximate dual method to obtain a good estimate of λ as well as *s*, and then switching to the slower but exact primal method. Running OASIS and executing warm-started ONNLS just once required collectively merely 23±1 ms, similarly to cold-started ONNLS with given λ. Running ONNLS not just once, but until the value of λ has been further tuned such that the noise constraint holds not approximately but exactly, took altogether 31±1 ms.

### Hyperparameter optimization

We have shown that we can solve Eqs (3) and (15) faster than interior point methods. The AR coefficient *γ* was either known or estimated based on the autocorrelation in the above analyses. The latter approach assumes that the spiking signal comes from a homogeneous Poisson process, which does not generally hold for realistic data. Therefore we were interested in optimizing not only the sparsity parameter λ, but also the AR(1) coefficient *γ*. To illustrate the optimization of both, we generated a fluorescence trace with spiking signal from an inhomogeneous Poisson process with sinusoidal instantaneous firing rate (Fig 3). We conservatively initialized *γ* to a small value of 0.9. The value obtained based on the autocorrelation was 0.9792 and larger than the true value of 0.95. The left panels in Fig 3B and 3D illustrate the update of λ from the previous value λ^{−} to λ* by solving a quadratic equation analytically (Eq 18) and the update of *γ* by numerical minimization of a high order polynomial, respectively. Note that after merely one iteration (Fig 3E) a good solution is obtained and after three iterations the solution is virtually identical to the one obtained when the true value of *γ* has been provided (Fig 3F). This holds not only visually, but also when judged by the correlation between *deconvolved* activity and ground truth spike train, which was 0.869 compared to merely 0.773 if *γ* was obtained based on the autocorrelation. The optimization was robust to the initial value of *γ*, as long as it was positive and not, or only marginally, greater than the true value. The value obtained based on the autocorrelation was considerably greater and partitioned the time series into pools in a way that missed entire spikes.

After illustrating the hyperparameter optimization we next quantify the computing time and quality of spike inference for various optimization scenarios. We generated 20 fluorescence traces with sinusoidal instantaneous firing rate as used in the illustration (Fig 3), again having a duration of 100 s at a framerate of 30 Hz, such that *T* = 3,000 frames; however, we offset the data by an additional positive baseline *b* that can be present in real data. This baseline can be optimized together with the sparsity parameter λ, as shown in Methods (subsection “Additional baseline”). The fastest deconvolution method is to merely estimate all parameters and run OASIS just once, cf. first row in Table 1, which shows the mean (±SEM) for computing time as well as correlation of the inferred spike train. As a baseline estimate we used the 15th percentile of the fluorescence trace. The sparsity penalty was set to λ = 0. A better choice of λ is actually obtained by optimizing it, such that the hard noise constraint holds, cf. second row in Table 1. The next rows show that optimizing *b* further improves the result, as does adding *γ*. However, the increased number of optimized parameters results in extra computational cost. The computation time can be reduced by estimating *γ* not using the full data but only a limited number of pools, which does not affect the quality of the result, cf. rows five and six in Table 1. Note that by restricting the optimization to a fixed number of pools, its computational load does not increase with the duration of the recording, hence the gain would be even more dramatic for longer time series. Further speed-ups are obtained by estimating the parameters on a decimated version of the data, as the last rows in Table 1 illustrate. Here we decimated the fluorescence traces by a factor of ten, without harming the inference quality.

### Hard thresholding

OASIS solves a LASSO problem resulting in soft shrinkage. The deconvolved trace typically has values smaller than 1 and often shows “partial spikes” in neighboring bins reflecting the uncertainty regarding the exact position of a spike, cf. Fig 4. While this information can be useful, one sometimes wants to merely commit to one event within a time bin instead and get rid of remaining small values in the deconvolved trace. We ran a slightly modified version of the algorithm that replaces the sparsity penalty by a constraint on the minimal spike size *s*_{min}, yielding sparser solutions but rendering the problem non-convex. Although we are not guaranteed to find the global minimum, we obtained good results, cf. Fig 5. To quantify directly the similarity between the inferred deconvolved trace and ground truth spike train we calculated the correlation between the two. The best results were obtained for *s*_{min} = 0.5 yielding correlation 0.899 ± 0.009 with the true spike train compared to 0.879 ± 0.006 for the solution of the problem with hard noise constraint (Eq 15). However, in a practical application the scaling factor between calcium fluorescence and a single spike, which is 1 for our simulated data, is often unknown, rendering it impossible to simply set the threshold *s*_{min} to half of it. Instead, we can vary the threshold until the RSS crosses the threshold *σ*^{2} *T*. The order in which the pools are merged or split matters for this non-convex case and sequentially adding spikes at the highest values of the *ℓ*_{1}-solution yielded the best performance with correlation 0.888 ± 0.007.

**(A)** Inferred trace using L1 penalty (L1, blue) and the thresholded OASIS (Thresh., green). The data (gray) are simulated with AR(1) model. **(B)** Inferred spiking activity. **(C)** The detected events using thresholded OASIS depend on the selection of *s*_{min}. The ground truth is shown in red. **(D,E,F)**, same as **(A,B,C)**, but the data are simulated with AR(2).

Fig 5 also shows results with a constraint on the minimal spike size for an AR(2) process. Adding the constraint helps when pressed for a binary decision whether to assign a spike or not, yielding visually excellent results. However, with a finite rise time of the calcium response the onset detection is notoriously difficult, because for a low threshold there are a lot of false positives due to noise, whereas for a high threshold, closer to the peak of the calcium kernel, the onset has already occurred earlier. Indeed, the greedy method for an AR(2) process tends to register spikes too late, which is further exacerbated when a threshold on the spike size (*s*_{min} = 0.5) is introduced, leading to low values of spike similarity (correlation 0.419 ± 0.016) compared to the solution of basis pursuit denoising (Eq 15) (correlation 0.497 ± 0.013). We can incorporate a correction step that whenever a new spike is added, slightly jitters the previous one and calculates the change in the optimization objective in order to determine the optimal placement of the spike. For simplicity and low computational burden, we restrict the consideration of the changing RSS to the pools prior and after the jittered spike, which improves the spike detection (correlation 0.462 ± 0.015) while only marginally increasing computational cost (from 8.65 ms to 11.65 ms). Further improvements can be obtained by following up with (O)NNLS. The solution obtained by OASIS with threshold on the minimal spike size and jittering can be used to restrict (O)NNLS to have non-zero values only in close proximity to the spikes of the greedily obtained solution. This processing step increased the performance of spike inference to correlation 0.530 ± 0.010, which is better than the already mentioned one obtained for exactly solving the convex problem (Eq 15). Hence, though imposing a minimal spike size renders the problem non-convex, a tractable approximate solution to this problem can improve over the exact solution of the convex basis pursuit denoising problem.

In the AR(2) case the exact solutions (ONNLS with λ or ONNLS with support only in the proximity of the thresholded solution) consistently improved over the faster greedy methods, as measured by spike train correlation. The performance was hardly affected by whether the penalized or the thresholded version was chosen. Spike train correlation harshly penalizes spikes that are detected but at an incorrect time, no matter how close; therefore the activity plots and correlation values convey somewhat complementary information about the quality of the inference. We attribute the performance gap between greedy and exact solutions to greedy methods missing the exact time step more often. However, the optimally attainable time resolution is already limited by low SNR, in particular if the rise time of the calcium indicator is finite. Indeed, to be more lenient regarding the exact spike timing, we calculated the correlations after convolving the spike trains with a Gaussian with a standard deviation of one bin width. The correlation values increased to 0.731 ± 0.008 for the greedy thresholded solution and to 0.800 ± 0.007 if followed up by ONNLS, but did not increase further for wider Gaussian kernels. This indicates that in the considered SNR regime single time bin resolution is out of reach, but spike times can be inferred with an uncertainty of about one time bin width.
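The lenient similarity measure described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the reported numbers: the function names, the kernel truncation at four standard deviations, the zero-padding at the edges, and the toy spike trains are all assumptions made for the example.

```python
import math


def gaussian_kernel(sigma_bins, truncate=4.0):
    """Normalized discrete Gaussian kernel; sigma is given in units of time bins."""
    radius = int(math.ceil(truncate * sigma_bins))
    weights = [math.exp(-0.5 * (k / sigma_bins) ** 2)
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(train, kernel):
    """Convolve a spike train with a symmetric kernel (zero-padded edges)."""
    radius = len(kernel) // 2
    n = len(train)
    out = [0.0] * n
    for t, value in enumerate(train):
        if value == 0:
            continue  # only non-zero bins contribute
        for k, w in enumerate(kernel):
            j = t + k - radius
            if 0 <= j < n:
                out[j] += value * w
    return out


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def smoothed_correlation(inferred, truth, sigma_bins=1.0):
    """Correlation after smoothing both spike trains with the same Gaussian."""
    kernel = gaussian_kernel(sigma_bins)
    return pearson(smooth(inferred, kernel), smooth(truth, kernel))


# Toy example (hypothetical data): one spike is detected one bin too late.
truth = [0.0] * 50
truth[10] = truth[30] = 1.0
inferred = [0.0] * 50
inferred[11] = inferred[30] = 1.0

raw = pearson(inferred, truth)                       # strict, bin-exact measure
lenient = smoothed_correlation(inferred, truth, 1.0)  # tolerant to small jitter
print(raw, lenient)
```

In this toy case the strict correlation is penalized heavily for the single one-bin offset, whereas the smoothed correlation is considerably higher, which mirrors the qualitative effect described in the text.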

### Online spike inference with limited lag

For an exact solution of the non-negative deconvolution problem of an AR(1) process, OASIS needs to backtrack to the most recent spike. (For an AR(2) process the solution is greedy and merely approximate. ONNLS yields an exact solution in this case but considers an even wider time window.) Such delays could be too long for some interesting closed-loop experiments; therefore we were interested in how well the method performs if backtracking is limited to just a few frames. We varied the lag in the online estimator, i.e. the number of future samples observed before assigning a spike at time zero, for different signal-to-noise ratios (SNR). For each lag we chose the sparsity parameter λ such that the noise constraint was tight. This yielded increasing values of λ for smaller lags, compensating for the fact that limiting backtracking to fewer frames also imposes fewer constraints on the dynamics. In the case of hard thresholding, better results were likewise obtained with higher *s*_{min} for smaller lags, which avoids splitting one spike in two. We used a hand-chosen value of *s*_{min} = 0.5 + 0.175 *e*^{−τ}, where *τ* is the lag, which asymptotically approaches the value of 0.5 used for batch processing. The obtained results are depicted in Fig 6. For realistic SNR (3–5, though [36] report even higher values, cf. Fig 4C) and sample rates (30 Hz), lags of 2–5 yielded virtually the same results as offline estimation. The exact number depends on the noise; however, the main effect of noise was to reduce the optimal performance attainable even with batch processing, as the asymptotic values in Fig 6A and 6B reveal.
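As a small worked example, the hand-chosen threshold schedule from the paragraph above can be tabulated for a few lags. The function and parameter names are hypothetical; only the formula *s*_{min} = 0.5 + 0.175 *e*^{−τ} is taken from the text.

```python
import math


def s_min_for_lag(lag, s_min_batch=0.5, extra=0.175):
    """Threshold for a given lag: s_min = 0.5 + 0.175 * exp(-lag).

    `s_min_batch` is the batch-processing value that the schedule approaches
    for large lags; `extra` is the additional margin applied at lag zero.
    """
    return s_min_batch + extra * math.exp(-lag)


# Tabulate the schedule for the lags considered in the text (illustrative).
for lag in range(6):
    print(f"lag = {lag}: s_min = {s_min_for_lag(lag):.4f}")
```

The threshold starts at 0.675 for zero lag and decays quickly, so for lags of a few frames it is already close to the batch value of 0.5.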

**(A,B)** Performance of spike inference as function of lag for various noise levels (i.e., inverse SNR) without (A) and with positive threshold *s*_{min} (B). We used correlation of the inferred spike train as similarity measure and compared to ground truth as well as to the optimally recoverable activity when the lag is unlimited as in offline processing. **(C)** Inferred trace with positive threshold *s*_{min} for increasing lag using the data depicted in Fig 4A with high noise level (*σ* = 0.3). The gray lines indicate the true spike times.

## Discussion

We presented an online active set method for spike inference from calcium imaging data. We assumed that the forward model to generate a fluorescence trace from a spike train is linear-Gaussian. Further development will extend the method to nonlinear models [45] incorporating saturation effects and a noise variance that increases with the mean fluorescence to better resemble the Poissonian statistics of photon counts. In S1 Appendix we already extend our mathematical formulation to include weights for each time point as a first step in this direction.
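To make the linear-Gaussian assumption concrete, the following sketch simulates a fluorescence trace under an AR(1) forward model, assuming calcium dynamics of the form c_t = γ c_{t−1} + s_t observed with additive Gaussian noise. The decay factor γ = 0.95, the random seed, and the function names are assumptions chosen for illustration; the noise level σ = 0.3 matches the high-noise setting mentioned for Fig 6C.

```python
import random


def simulate_trace(spikes, gamma=0.95, sigma=0.3, seed=0):
    """Simulate calcium c_t = gamma * c_{t-1} + s_t and noisy fluorescence y_t.

    Returns the noise-free calcium trace and the observed trace with additive
    Gaussian noise of standard deviation `sigma`.
    """
    rng = random.Random(seed)
    calcium, trace = [], []
    c_prev = 0.0
    for s in spikes:
        c = gamma * c_prev + s  # linear AR(1) dynamics driven by spikes
        calcium.append(c)
        trace.append(c + rng.gauss(0.0, sigma))  # Gaussian observation noise
        c_prev = c
    return calcium, trace


def deconvolve_noise_free(calcium, gamma=0.95):
    """Invert the AR(1) dynamics: s_t = c_t - gamma * c_{t-1}."""
    spikes = []
    c_prev = 0.0
    for c in calcium:
        spikes.append(c - gamma * c_prev)
        c_prev = c
    return spikes


# Hypothetical spike train with two events.
spikes = [0.0] * 20
spikes[3] = spikes[12] = 1.0
calcium, trace = simulate_trace(spikes)
recovered = deconvolve_noise_free(calcium)
```

Applying the inverse filter to the noise-free calcium trace recovers the spikes exactly; applied to the noisy trace it would also produce negative values, which is why the inference problem is posed with a non-negativity constraint and a sparsity penalty.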

Our method considered spike inference as a sparse non-negative deconvolution problem. We focused on the formulation that imposes sparsity using an *ℓ*_{1} penalty that renders the problem convex. This problem formulation has a long-standing record of success for spike inference within the neuroscientific community. We were able to speed it up by an order of magnitude compared to previously employed interior point methods and derived an algorithm that lends itself to online applications. However, several investigators [46–48] have recently advocated sparser methods, e.g. by using an *ℓ*_{q}-norm with *q* < 1 instead of *q* = 1 [46] or by enforcing refractoriness [47] (see also [13] for some further discussion of sparsening beyond *ℓ*_{1} penalization). They report improved results, though in some cases at the cost of non-convexity, thus losing the guarantee of finding the global optimum. We leave it to future work to incorporate refractoriness into the methods developed here, but we did slightly modify the sparse non-negative deconvolution problem by adding the constraint that positive spikes need to be larger than some minimal value. A minor modification to our algorithm enabled it to find an (approximate) solution of this non-convex problem, which can be marginally better than the solution obtained with the *ℓ*_{1} regularizer. The *ℓ*_{1}-penalized solution reflects the uncertainty regarding the exact position of a spike by distributing it as “partial spikes” over neighboring bins. The thresholded solution lets go of this potentially useful information and instead commits to one event within the locally optimal time bin. We leave it up to the user which approach to choose.

### Availability

We provide Python and MATLAB implementations of our algorithm online (https://github.com/j-friedrich/OASIS and linked repositories therein). The code is readily usable on new data and includes example scripts that produce all figures and Table 1 of this article.

Here we focused on temporal data, i.e. noisy neural fluorescence data that has been extracted and demixed from raw pixel data. We further added OASIS as deconvolution subroutine to CaImAn (https://github.com/simonsfoundation/CaImAn) [49], which implements CNMF for simultaneous denoising, deconvolution, and demixing of spatio-temporal calcium imaging data.

## Supporting information

### S1 Appendix. Technical appendix.

The supplementary material includes a naïve isotonic regression algorithm without pooling. We generalize OASIS to the case of weighted regression and provide a mathematical proof for updates according to Eqs (12–14). We further discuss how to account for elevated initial calcium fluorescence levels and provide explicit expressions of the hyperparameter updates for an AR(2) model.

https://doi.org/10.1371/journal.pcbi.1005423.s001

(PDF)

### S1 Video. Illustration of PAVA.

The supplementary video illustrates PAVA. The pool currently under consideration is indicated by the blue crosses. The algorithm sweeps through the time series and enforces the order constraints *x*_{1} ≤ … ≤ *x*_{T}.

https://doi.org/10.1371/journal.pcbi.1005423.s002

(MP4)

### S2 Video. Illustration of OASIS.

The supplementary video illustrates OASIS for an AR(1) process. As in Fig 2, red lines depict true spike times and the shaded background shows how the time points are gathered in pools. The pool currently under consideration is indicated by the blue crosses. The upper panel shows how the calcium fluorescence trace **c**′ develops while the algorithm runs, cf. Fig 2. The video additionally shows the deconvolved trace **s**′ = *G* **c**′ (Eq 3) in the lower panel. The algorithm sweeps through the time series and enforces the constraint **s**′ ≥ 0.

https://doi.org/10.1371/journal.pcbi.1005423.s003

(MP4)

## Acknowledgments

We would like to thank Misha Ahrens and Yu Mu for providing whole-brain imaging data of larval zebrafish. We thank John Cunningham and Eftychios Pnevmatikakis for helpful conversations as well as Scott Linderman and Daniel Soudry for valuable comments on the manuscript.

Part of this work was previously presented at the Thirtieth Annual Conference on Neural Information Processing Systems (NIPS, 2016) [50].

## Author Contributions

**Conceptualization:** JF LP. **Formal analysis:** JF LP PZ. **Funding acquisition:** JF LP. **Investigation:** JF PZ. **Methodology:** JF LP PZ. **Software:** JF PZ. **Supervision:** LP. **Writing – original draft:** JF LP PZ. **Writing – review & editing:** JF LP PZ.

## References

- 1. Grienberger C, Konnerth A. Imaging calcium in neurons. Neuron. 2012;73(5):862–885. pmid:22405199
- 2. Grewe BF, Langer D, Kasper H, Kampa BM, Helmchen F. High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision. Nat Methods. 2010;7(5):399–405. pmid:20400966
- 3. Yaksi E, Friedrich RW. Reconstruction of firing rate changes across neuronal populations by temporally deconvolved Ca^{2+} imaging. Nat Methods. 2006;3(5):377–383. pmid:16628208
- 4. Holekamp TF, Turaga D, Holy TE. Fast three-dimensional fluorescence imaging of activity in neural populations by objective-coupled planar illumination microscopy. Neuron. 2008;57(5):661–672. pmid:18341987
- 5. Vogelstein JT, Packer AM, Machado TA, Sippy T, Babadi B, Yuste R, et al. Fast nonnegative deconvolution for spike train inference from population calcium imaging. J Neurophysiol. 2010;104(6):3691–3704. pmid:20554834
- 6. Vogelstein JT, Watson BO, Packer AM, Yuste R, Jedynak B, Paninski L. Spike inference from calcium imaging using sequential Monte Carlo methods. Biophys J. 2009;97(2):636–655. pmid:19619479
- 7. Pnevmatikakis EA, Merel J, Pakman A, Paninski L. Bayesian spike inference from calcium imaging data. Asilomar Conference on Signals, Systems and Computers. 2013.
- 8. Deneux T, Kaszas A, Szalay G, Katona G, Lakner T, Grinvald A, et al. Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nat Commun. 2016;7.
- 9. Sasaki T, Takahashi N, Matsuki N, Ikegaya Y. Fast and accurate detection of action potentials from somatic calcium fluctuations. J Neurophysiol. 2008;100(3):1668–1676. pmid:18596182
- 10. Theis L, Berens P, Froudarakis E, Reimer J, Rosón MR, Baden T, et al. Benchmarking spike rate inference in population calcium imaging. Neuron. 2016;90(3):471–482. pmid:27151639
- 11. Mishchenko Y, Vogelstein JT, Paninski L. A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. Ann Appl Stat. 2011; p. 1229–1261.
- 12. Picardo MA, Merel J, Katlowitz KA, Vallentin D, Okobi DE, Benezra SE, et al. Population-Level Representation of a Temporal Sequence Underlying Song Production in the Zebra Finch. Neuron. 2016;90(4):866–876. pmid:27196976
- 13. Pnevmatikakis EA, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, et al. Simultaneous Denoising, Deconvolution, and Demixing of Calcium Imaging Data. Neuron. 2016;89(2):285–299. pmid:26774160
- 14. Grosenick L, Marshel JH, Deisseroth K. Closed-loop and activity-guided optogenetic control. Neuron. 2015;86(1):106–139. pmid:25856490
- 15. Rickgauer JP, Deisseroth K, Tank DW. Simultaneous cellular-resolution optical perturbation and imaging of place cell firing fields. Nat Neurosci. 2014;17(12):1816–1824. pmid:25402854
- 16. Packer AM, Russell LE, Dalgleish HW, Häusser M. Simultaneous all-optical manipulation and recording of neural circuit activity with cellular resolution in vivo. Nat Methods. 2015;12(2):140–146. pmid:25532138
- 17. Clancy KB, Koralek AC, Costa RM, Feldman DE, Carmena JM. Volitional modulation of optically recorded calcium signals during neuroprosthetic learning. Nat Neurosci. 2014;17(6):807–809. pmid:24728268
- 18. Lewi J, Butera R, Paninski L. Sequential optimal design of neurophysiology experiments. Neural Comput. 2009;21(3):619–687. pmid:18928364
- 19. Park M, Pillow JW. Bayesian active learning with localized priors for fast receptive field characterization. In: Adv Neural Inf Process Syst; 2012. p. 2348–2356.
- 20. Shababo B, Paige B, Pakman A, Paninski L. Bayesian inference and online experimental design for mapping neural microcircuits. In: Adv Neural Inf Process Syst; 2013. p. 1304–1312.
- 21. Ahrens MB, Orger MB, Robson DN, Li JM, Keller PJ. Whole-brain functional imaging at cellular resolution using light-sheet microscopy. Nat Methods. 2013;10(5):413–420. pmid:23524393
- 22. Vladimirov N, Mu Y, Kawashima T, Bennett DV, Yang CT, Looger LL, et al. Light-sheet functional imaging in fictively behaving zebrafish. Nat Methods. 2014;.
- 23. Potra FA, Wright SJ. Interior-point methods. J Comput Appl Math. 2000;124(1):281–302.
- 24. Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E. An empirical distribution function for sampling with incomplete information. Ann Math Stat. 1955;26(4):641–647.
- 25. Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical inference under order restrictions: The theory and application of isotonic regression. Wiley New York; 1972.
- 26. van Eeden C. Testing and Estimating Ordered Parameters of Probability Distributions. PhD thesis, University of Amsterdam; 1958.
- 27. Miles RE. The complete amalgamation into blocks, by weighted means, of a finite set of real numbers. Biometrika. 1959;46(3/4):317–327.
- 28. Mair P, Hornik K, de Leeuw J. Isotone optimization in R: pool-adjacent-violators algorithm (PAVA) and active set methods. J Stat Softw. 2009;32(5):1–24.
- 29. Best MJ, Chakravarti N. Active set algorithms for isotonic regression; a unifying framework. Math Prog. 1990;47(1–3):425–439.
- 30. Grotzinger SJ, Witzgall C. Projections onto order simplexes. Appl Math Optim. 1984;12(1):247–270.
- 31. Podgorski K, Haas K. Fast non-negative temporal deconvolution for laser scanning microscopy. J Biophotonics. 2013;6(2):153–162. pmid:22438321
- 32. Brent RP. Algorithms for Minimization Without Derivatives. Courier Corporation; 1973.
- 33. Nocedal J, Wright S. Numerical optimization. Springer Science & Business Media; 2006.
- 34. Donoho DL. De-noising by soft-thresholding. IEEE Trans Inf Theory. 1995;41(3):613–627.
- 35. Mallat SG, Zhang Z. Matching pursuits with time-frequency dictionaries. IEEE Trans Signal Processing. 1993;41(12):3397–3415.
- 36. Chen TW, Wardill T, Sun Y, Pulver SR, Renninger SL, Baohan A, et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature. 2013;499(7458):295–300. pmid:23868258
- 37. Bro R, De Jong S. A fast non-negativity-constrained least squares algorithm. J Chemometrics. 1997;11(5):393–401.
- 38. Lawson CL, Hanson RJ. Solving least squares problems. vol. 15. SIAM; 1995. https://doi.org/ 10.1002/(SICI)1099-128X(199709/10)11:5%3C393∷AID-CEM483%3E3.3.CO;2-C
- 39. Diamond S, Boyd S. CVXPY: A Python-Embedded Modeling Language for Convex Optimization. J Mach Learn Res. 2016;17(83):1–5.
- 40. Domahidi A, Chu E, Boyd S. ECOS: An SOCP solver for embedded systems. In: European Control Conference (ECC); 2013. p. 3071–3076.
- 41. Andersen ED, Andersen KD. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In: High performance optimization. Springer; 2000. p. 197–232.
- 42. O’Donoghue B, Chu E, Parikh N, Boyd S. Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding. J Optim Theory Appl. 2016; p. 1–27.
- 43. Gurobi Optimization Inc. Gurobi Optimizer Reference Manual; 2015. Available from: http://www.gurobi.com.
- 44. GENIE project, Janelia Research Campus, HHMI; Karel Svoboda (contact). Simultaneous imaging and loose-seal cell-attached electrical recordings from neurons expressing a variety of genetically encoded calcium indicators. CRCNS.org; 2015. Available from: http://dx.doi.org/10.6080/K02R3PMN.
- 45. Pologruto TA, Yasuda R, Svoboda K. Monitoring neural activity and [Ca^{2+}] with genetically encoded Ca^{2+} indicators. J Neurosci. 2004;24(43):9572–9579. pmid:15509744
- 46. Quan T, Lv X, Liu X, Zeng S. Reconstruction of burst activity from calcium imaging of neuronal population via Lq minimization and interval screening. Biomed Opt Express. 2016;7(6):2103–2117. pmid:27375930
- 47. Dyer EL, Studer C, Robinson JT, Baraniuk RG. A robust and efficient method to recover neural events from noisy and corrupted data. In: Int IEEE/EMBS Conf Neural Eng (NER). 2013; p. 593–596.
- 48. Pachitariu M, Stringer C, Schröder S, Dipoppa M, Rossi LF, Carandini M, et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. bioRxiv. 2016.
- 49. Giovannucci A, Friedrich J, Deverett B, Staneva V, Chklovskii D, Pnevmatikakis E. CaImAn: An open source toolbox for large scale calcium imaging data analysis on standalone machines. Cosyne Abstracts 2017, Salt Lake City USA.
- 50. Friedrich J, Paninski L. Fast Active Set Methods for Online Spike Inference from Calcium Imaging. In: Adv Neural Inf Process Syst; 2016. p. 1984–1992.