Deconvolution of calcium imaging data using marked point processes

Calcium imaging has been widely used for measuring spiking activities of neurons. When using calcium imaging, we need to extract summarized information from the raw movie beforehand. Recent studies have used matrix deconvolution for this preprocessing. However, such an approach can neither directly estimate the generative mechanism of spike trains nor use stimulus information that has a strong influence on neural activities. Here, we propose a new deconvolution method for calcium imaging using marked point processes. We consider that the observed movie is generated from a probabilistic model with marked point processes as hidden variables, and we calculate the posterior of these variables using a variational inference approach. Our method can simultaneously estimate various kinds of information, such as cell shape, spike occurrence time, and tuning curve. We apply our method to simulated and experimental data to verify its performance.


Introduction
In recent years, many researchers have used calcium imaging to measure spiking activities in large neuron populations. Calcium imaging is an imaging technique that observes calcium ion concentrations inside neurons as a movie. This technique enables researchers to obtain various types of information about the neurons that cannot be obtained by the potential measurement approach.
Although calcium imaging offers a considerable amount of information, it is difficult to deal with the observed movie directly. Therefore, when using calcium imaging to investigate the structure of the brain system, we need to extract crucial information from the raw movie beforehand. In calcium imaging, for example, the cell shapes, positions, and spiking times of neurons are extracted from the raw movie.
Two statistical problems arise with this preprocessing. The first problem involves detecting the shape of each neuron. When a neuron spikes, the fluorescent value at the pixel contained within this neuron rises sharply. Therefore, to ascertain the firing activity of each neuron, it is necessary to know which pixels belong to a particular neuron. Some previous studies have calculated the summarized image averaged in the time direction and applied an image segmentation method to clarify the locations of the neurons [1,2]. However, each pixel in the movie may be contained in two or more neurons, and the observed calcium ion concentrations at such pixels become the sum of the calcium concentrations for the two neurons. Fig 1A illustrates such overlaps in the spatial domain. If neurons overlap each other, we need to deconvolve such overlaps to extract the calcium fluorescent dynamics of each neuron.
The second problem involves detecting the spike occurrence time. Calcium concentration is observed as a continuous time series that sharply increases when the neuron fires and subsequently slowly decays exponentially with a small time constant. If multiple spikes occur within a short time, slow decay of calcium fluorescence may cause these firings to overlap in the temporal domain in the movie. Fig 1B illustrates the overlap in the temporal domain. Therefore, to extract a spike train from the movie, we must also deconvolve calcium traces in the temporal domain. Previous studies provide several types of solutions for such deconvolution problems Calcium fluorescence observed at pixels W and X contained in the same neuron shows similar behavior, which differs from the calcium fluorescence observed at pixel Y contained in another neuron. The calcium fluorescence observed at pixel Z is the sum of these series when pixel Z is contained by both neurons. (B) Calcium fluorescence overlap in the temporal domain. Calcium fluorescence observed at each pixel is the convolution of calcium ion dynamics into a spike train. If two spikes occur in a short time relative to the time constant of these dynamics, increases in calcium ion concentration due to these spikes will overlap each other.
To solve these two problems simultaneously, recent studies have proposed a matrix deconvolution approach [8,9,10]. In this approach, the observed movie is converted into a matrix form, and this matrix is approximated by the product of two different matrices: one contains the cell shapes in its columns and the other contains the calcium fluorescent series in its rows. Initially, [8] proposed a deconvolution method using independent component analysis and succeeded in deconvolving a calcium imaging movie in both the temporal domain and the spatial domain simultaneously. However, because independent components analysis is a linear decomposition, it cannot deal with the nonlinearity of a calcium imaging movie. To address such nonlinearity, [9] introduced nonnegative matrix factorization to this deconvolution. Nonnegative matrix factorization can perform nonlinear decomposition, thereby allowing the effective separation of neuron overlapping. Recently, [10] used constrained nonnegative matrix factorization to incorporate the nature of calcium ion dynamics.
The matrix deconvolution approach deconvolves the calcium imaging movie simultaneously in both the temporal and spatial domains, and it can dramatically improve the deconvolution accuracy. However, such an approach still suffers from several problems. First, it cannot detect spike trains as 0-1 sequences. Using matrix deconvolution, we solve the optimization problem to obtain two matrices whose product approximates the movie matrix well. In the optimization step, we relax the 0-1 condition imposed on the time series expressing spiking activities. Consequently, to estimate spike occurrence time from the deconvolution results, we must perform an additional threshold procedure, which may cause either overestimation or underestimation of the number of spikes. Second, even if there are covariates that may influence spiking activities, we cannot incorporate this information in the deconvolution procedure. This affects the estimation of the tuning curve, which is the function of the firing rate with respect to an external stimulus.
Separating spike assignment and subsequent estimation as distinct steps may cause errors and loss of information. This problem has also been reported in neural decoding research. When applying neural decoding, we need to create spike trains from the measured potential beforehand by using a preprocessing procedure called spike sorting. Spike sorting assigns spikes to neurons by applying clustering methods to the waveform information of spikes. This preprocessing generally does not use stimulus information in this spike assignment, and it causes bias and a loss of information for tuning curve estimation [11]. To avoid such information loss, clusterless decoding methods using marked point processes are proposed [12,13,14]. In this approach, spike sequences and the attached waveform information are expressed as marked sequences, and the distribution of the marked sequences is directly estimated. Clusterless decoding enables us to avoid performing spike sorting and improves the decoding accuracy.
Such a problem due to two-step analysis also occurs in calcium demixing. Even if the aim is to investigate the relation between spike occurrence and other simultaneously measured covariates, the existing approaches ignore this covariate information, which may be most effective for extracting spiking activities from the raw movie. Consider two overlapping neurons that are tuned to an external stimulus, and their receptive fields are isolated. Then, spikes emitted from these two neurons can be easily assigned to true components by observing the stimulus value at each spike occurrence time. However, the existing approaches do not use this relation, resulting in the misassignment of the spikes. Moreover, existing approaches assume that the spikes occur at a constant firing rate along the time axis even though the analysis that follows is based on the existence of the tuning curves. Such model misspecification leads to bias in detecting the spikes and cell shapes in the demixing phase, and this bias also affects the subsequent analysis. Therefore, the simultaneous extraction of the spikes and estimation of the tuning curves has various advantages in calcium imaging analysis as well.
In this paper, we propose a new deconvolution method using a probabilistic generative model. We consider that the calcium imaging movie is generated from a stochastic process that has marked point processes as hidden variables and obtains various types of information as a posterior of the hidden variables using variational Bayesian inference. The marked point process is a stochastic process used to express a random series of events with characteristic values called marks. We express the injected calcium added to the pixels in the imaging movie when a spike occurs as vectors, and call these vectors as marks. Our model expresses the distribution of the sequences of the marks as the marked point process. Reflecting the dependent structure between the spiking activities and the stimulus change in the definition of the marked point processes, we simultaneously perform calcium demixing and tuning curve estimation. This generative modeling avoids a two-step analysis and reduces the error and loss of information that occurred in previous approaches.
The remainder of this paper is organized as follows: In Results, we first apply our method to the simulated data and verify its performance. We also apply our method to a measured calcium imaging movie to extract the place cell spiking activities. In Discussion, we discuss our method. We provide the details of our generative model and the estimation procedure in Methods and S1 Appendix.

Simulation study
In this section, we apply our method and the existing approach to artificially generated data and compare their performance. The details of the mathematical terms that appear in this section are explained in Methods.
First, we generated the data using a model. Consider that the activities of nine neurons are observed simultaneously by calcium imaging and that each neuron emits spikes corresponding to external stimulus x t 2 R. Let κ be a vectorization of the fluorescence image, and assume that the marked spike sequence of the k-th neuron is generated from the marked inhomogeneous Poisson process modulated by x t whose intensity is Here, N(�j�) is a Gaussian kernel, λ k is a mean rate, m k k is the mean vector of the density that explains the fluorescent image fluctuation of this neuron, m x k is a stimulus value to which this neuron reacts most extremely, and L k k ; t x k are the scale parameters for κ and x, respectively. Then, we assumed that the spike train of each neuron is generated independently, given x t . This means that the overall intensity function is expressed as Under such an assumption, we generated a marked spike sequence from this marked point process. Given this sequence, we generated a movie that lasted for 150 s including jumps from the state space model, which is defined as (1) and (2). We omit the details of the other hyperparameters' settings in this paper. A correlation image calculated from the generated movie and the external stimulus used for the data generation are shown in Fig 2. Each pixel of the correlation image contains correlation coefficients with neighboring pixels. It indicates which parts of the movie change simultaneously so that we can roughly estimate the spatial locations of the neurons from it.
Next, we applied our method to the generated movie. For kernel function f x (x j θ x ) that expresses the tuning curve of each cell, we selected the univariate Gaussian kernel and set the dimension of hidden state d κ as 40 and the number of mixtures K as 20. In the estimation step, we initialized the distribution of the hidden variables and the hyperparameters, as described in S1 Appendix. Then, we updated these distributions and hyperparameters for 10 iterations and obtained the variational posteriorq. Using the calculated variational posterior, we found the variational expectation of the hidden variables to obtain the deconvolution results.
To compare the performance of our method with the existing approach, we also applied constrained nonnegative matrix factorization proposed by [10]. Hereafter, we denote this method as CNMF. Since CNMF estimates the spike sequence as the real value sequence and not as the 0-1 sequence, we decided a threshold value for each neuron and detected the spike occurrence time from these estimated spike sequences using this threshold. Specifically, we calculated distances d r , r = 1, . . ., R from the estimated spike sequences s r 2 R; r ¼ 1; . . . ; R as wherem is a sample mean ands is a sample variance calculated from s r . Using these distances, we considered that spike occurred at r if d 2 r is larger than the upper 0.1% point of the chisquared distribution with one degree of freedom. Then, we assumed that the detected spike train of the k-th component is generated from the unmarked inhomogeneous Poisson process modulated by x t . Under such a Poisson assumption, the intensity function can be expressed as a function of x t ; this function is called the tuning curve in neuroscience. We assumed that the tuning curve of the k-th component is expressed as and estimated the parameters ðl k ; m x k ; t x k Þ; k ¼ 1; . . . ; K using the maximum likelihood estimation. For details of the tuning curve estimation, see [15], for example.
The estimation results of our method and CNMF are shown in Figs 3 and 4. Our method detected 9 neurons while CNMF detected 12 neurons, and this result shows that our method was able to estimate the true number of neurons, whereas CNMF overestimated this number. Although CNMF contains heuristic procedures such as merging and discarding components, to detect the number of neurons, it cannot adjust the model complexity along the minimization of their loss function. By contrast, our method uses the weighted gamma process as a prior and can perform model selection automatically in the estimation step. Figs 3 and 4 also show that our method estimated the cell shape and the tuning curve of each neuron accurately. Conversely, CNMF underestimated the tuning curve for some neurons. Adjusting threshold values may reduce such estimation errors; however, it is difficult to determine these values properly beforehand. Specifically, components 4A, 4B, and 4C in Fig 4 are separated into two different components in CNMF. If CNMF can incorporate the covariate information into calcium demixing, then it would integrate these components into one component by observing similar tuning profiles. This result shows the advantage of using the tuning curve at demixing phase.
To compare the performance quantitatively in terms of spike detection and regions of interest (ROI) detection, we calculated the F-measure to evaluate the performance of binary

PLOS COMPUTATIONAL BIOLOGY
classification. It is defined as the harmonic mean of precision and recall: The mean values of the F-measures in terms of spike detection were 0.998 (our model) and 0.808 (CNMF) with β 2 = 0.3. On the other hand, the mean values of the F-measures in terms of ROI detection were 0.985 (our model) and 0.503 (CNMF) with the same β 2 value. These values show that our model can detect both spike times and cell shapes better than CNMF.
We also compared the two methods in terms of tuning curve estimation. In this comparison, we ignored the marks and assumed that the spike train was generated from one neural population modulated by x t . Here, the tuning curve of this neuron population corresponds to the summation of all estimated tuning curves. Under this assumption, we calculated the likelihood, given the ground truth spikes. Then, the log likelihood of our model and CNMF was 92.88 and 85.16, respectively, indicating that our model provided better tuning curve estimates than CNMF.

Application to the experimental dataset
In this section, we apply our method to the dataset published by [16,17,18]. The authors investigated place cell dynamics of hippocampus CA1 between wild-type and Df(16)A +/− mice, an animal model of the 22q11.2 deletion syndrome, by using two-photon Ca 2+ imaging with GCaMP6f. In their experiment, mice on a treadmill were imposed to a head-fixed goal oriented task. The treadmill was composed of a 2 m-long belt consisting of three different fabrics and six different tactile cues. The mice searched for hidden rewards on the treadmill using these cues as clues. After the recording, the authors obtained the ROI from the observed movie by drawing GCaMP6f-labeled somata manually. The number of detected cells per wild-type mouse was 463 ± 37 (mean ± std, n = 6).
One imaged session of a wild-type mouse is used in our paper. Its frame rate was 7.5 Hz. The length of the movie was 2250 frames, the size of the one frame was 498 × 490 pixels, and there were 1.7007 pixels per micron. For the experimental details, see [17].
The size of the movie in this dataset was approximately 500 × 500, and it was too large for our method. Therefore, we divided the raw movie into 25 patches, each sized approximately 100 × 100 (d y = 10000), and applied our method to each patch independently. In this section, we show the result for one of these patches. A correlation image calculated from this patch and the mouse's position on the treadmill during the task are shown in Fig 5. A place cell has a place field, and it fires when the animal passes through its place field [19]. Hence, we assumed that the position of the mouse on the treadmill is external stimulus x t and that it affects the spiking activities of the neurons. We regarded this treadmill as a unit circle in R 2 and defined the external stimulus x t by the angle of this circle with range (−π, π). For the kernel function f x (x j θ x ), we selected the von Mises kernel where μ x and τ x are the center location and the scale parameter, respectively, of its place field. We also set d κ = 40 and K = 30.
In the estimation step, we initialized the distribution of the hidden variables and the hyperparameters as described in S1 Appendix. Then, we updated the distribution and the hyperparameters for 10 iterations and obtained the variational posteriorq. Using this variational posteriorq, we calculated the variational expectation of hidden variables to obtain the deconvolution results.
To compare performance, we also applied CNMF proposed by [10]. Similar to Simulation study, we decided threshold values to detect spikes from the deconvolution results of CNMF. Then, we assumed that the detected spike train of the k-th component is generated from an unmarked point process whose intensity is The estimation results of our method and CNMF are shown in Figs 6, 7 and 8. In order to evaluate our model performance in a quantitative manner, we manually detected the ROI of the neurons located in the raw movie. These ground truths of the ROIs and the corresponding calcium traces calculated by these ROIs are also shown in these figures. Note that the calcium traces shown in these figures are equal to the mean calcium values within each ROI and they do not necessarily reflect true calcium traces; they contain calcium fluctuations arising from other overlapping neurons and background noise. Our method detected 11 components and CNMF detected 17 components. From this point on, we denote the estimated component whose label is (A) in Fig 6 as Neuron 6A. Fig 6 shows that the spike trains obtained by our method and the calcium fluorescence obtained by CNMF are consistent for the neurons whose cell shapes are similar. For example, the spikes estimated by both methods for Neuron 6A are almost the same. This observation shows that our method provides estimations consistent with the existing approach.
Next, we compare each component estimated by the two methods in detail. Initially, we focus on the difference in terms of the estimated cell shapes between our method and CNMF. The estimated cell shapes for Neuron 7A are totally different. This is due to the regularization constraints imposed in CNMF. Because CNMF imposes regularization constraints to the estimated cell shape as being localized, it may fail to estimate neurons that are widely spread. As our method does not impose such a requirement, it can detect neurons whose cell shape is widely spread on the microscope plane.
The cell shape estimated by our method for Neuron 7B contained Neuron 8A, whereas CNMF correctly divided them into two components. This is because our model does not use any spatial information; the cell has a localized shape and is not divided into two or more parts separately. In CNMF, the initial value for the shape of the neuron is defined as being localized. Therefore, two neurons are regarded as different components as long as the their positions are isolated. We can introduce such spatial assumptions into our model as regularization or heuristics; however, it may result in miss-detection for a wide spread neurons. For instance, if we

PLOS COMPUTATIONAL BIOLOGY
define the limit size of the cell shape beforehand, we may fail to detect components such as Neuron 7A. If Neurons 7B and 8A have different tunings to the stimulus, our method would be able to distinguish them. However, these neurons fired simultaneously and had similar tuning to the covariates within the observation interval. The other reason for the result is that these neurons had similar calcium fluorescence. This observation indicates that these neurons may share another relation behind and should not be separated.
Neuron 7C, estimated as a single neuron by our method, was divided into two components in CNMF, and Neuron 7D was also divided into two components. Such differences arise from the definitions of the calcium footprint in the two methods. Our model assumes the calcium footprint to be a random variable and thus allows random fluctuations to a certain extent. On the other hand, CNMF expresses the footprints as deterministic vectors and divides the PLOS COMPUTATIONAL BIOLOGY random fluctuation of one neuron activity into two or more components. The assumption of the tuning curve existences in our model also works as a regularization for merging components with similar tuning profiles into one component.
Neurons 8B and 8C detected by CNMF were not detected by our method. These components had a common feature: the estimated calcium fluorescences of these components were noisy. Because of the high fluctuation of the calcium fluorescence, the number of spikes of these components was larger than that of the other components, which seems to be unreliable. Our method discards components whose calcium fluorescence cannot be approximated by convolution exponential decay with the spike sequence.
Neurons 8D and 8E were not detected by both methods. This is because the activities of these neurons were too small within the observed interval. Both methods considered these components as background noise fluctuation.

PLOS COMPUTATIONAL BIOLOGY
To compare the performance in terms of ROI detection, we calculated the F-measures of the two methods. We first detected the ROI of each neuron manually from the correlation image shown in Fig 5. The mean values of the F-measures are 0.578 (our model) and 0.501 (CNMF) with β 2 = 0.3. This result also shows the advantage of not using spatial information in our model.
Next, we compare the results in terms of spike detection and tuning curve estimation. On the whole, the tuning curves estimated by our method are sharper than those estimated by CNMF. This difference is due to the assumption of tuning curve existence at the demixing phase. CNMF assumes that the spikes are generated at a constant firing rate along the time axis, and thus, it tends to express the background fluctuation in the interval where true spikes do not occur as false positive spikes. For example, for Neuron 7D, the estimated spikes and tuning curves of the two methods are different. The calcium traces obtained using ground truth increase when the mouse passes through a specific area, and our method can express this response of the neuron to the mouse's movement. On the other hand, CNMF treats background fluctuations as spikes and estimates the tuning curve as a more widely spread shape.

Discussion
When using calcium imaging to measure the spiking activities of neurons, we need to extract the summarized information from the raw movie in advance. Recent studies have used the matrix deconvolution approach for this preprocessing; however, such an approach ignores the spiking nature of the neurons, and it cannot use the covariates information, which may have a great influence on the spiking activities.
To solve such problems, we proposed a new deconvolution method using a probabilistic approach. In our method, we assume that a calcium imaging movie is generated from a generative model with marked point processes as hidden variables, and we calculate the posterior using a variational inference approach. Our method detects cell shapes and spike times, and estimates tuning curves simultaneously, which reduces estimation biases and loss of information that occurs in the conventional two-step analysis. By applying our method to simulated and experimental data, we showed that it can estimate various types of information simultaneously from a raw movie.
Although our method provides a new framework for calcium fluorescence deconvolution, several problems exist at this point. Hereafter, we discuss the disadvantages of our method and suggest future extensions of this work.
Our model assumes that spike trains of different neurons occur independently, given external covariates. However, this assumption is quite strong; cortical spike trains generated by different neurons have some positive or negative correlations. Our model ignores such correlations as introducing such structure into the generative model makes estimations difficult. We aim to improve the model via the inclusion of such correlations in the future.
Our method approximates the true posterior using variational inference, and the variational posterior obtained by our method may differ from the true posterior. If this difference is not negligible, the estimation results may be unreliable. To obtain a good solution, we must relax the independence requirement imposed on the variational posterior. It is also important to investigate the asymptotic behavior of our method to evaluate the difference between the true posterior and the variational posterior.
Our method can be applied to various kinds of analyses of calcium imaging movies. One example is neural decoding. Decoding an external stimulus from a calcium imaging movie has already been attempted in previous studies [20]. In these studies, spike trains are detected from the calcium imaging movie beforehand using preprocessing procedures, and decoding is treated as another step in this preprocessing. As mentioned in Introduction, separating spike assignment and subsequent estimation as distinct steps causes errors and loss of information. On the other hand, since our method assigns spike trains and estimates the tuning curve simultaneously, it may be able to improve the decoding accuracy. Moreover, our method will enable us to decode stimuli from a calcium imaging movie in an online manner because it provides the probability distribution for the calcium imaging movie itself and does not require preprocessing beforehand.

Methods
We focus on neuron populations whose firing rates change according to an external stimulus and on the measurement of these neurons by calcium imaging. The explanation for our method comprises two parts. First, we explain a probabilistic model that decides the generative mechanism of the calcium imaging movie. Next, we demonstrate the method for estimating the hidden variables of this generative model.

Generative model
Our model comprises three components. The first component explains how calcium fluorescent dynamics construct the observed movie, given the spiking activities of neurons. The second component describes how these spiking activities are generated depending on the external stimulus. The third component specifies a prior for the parameters used in the first and second components.
Box 1 shows the summary of our generative model in mathematical form. The definitions of the distributions and kernels we use in our generative model are also summarized in S1 Appendix. To distinguish the distributions and probability density functions, we express the density functions as shorthand notations.
Calcium fluorescence model. Our generative model assumes that there is a hidden stochastic process that explains the dynamics of the true calcium fluorescence and that the observed movie is the linear transformation of this process. Fig 9 illustrates this component.
We define several variables. Let (0, T] be an observation interval, and consider that the calcium imaging movie is measured in this observation interval at sampling interval Δ.
Let c r 2 R d k ; r ¼ 1; . . . ; R be the values of this hidden process and y r 2 R d y ; r ¼ 1; . . . ; R be a vectorization of each frame of the raw movie. We assume that the movie has a low-rank nature and set the dimension of the hidden process d κ to be a relatively small value compared to the dimension of each frame d y . Suppose that the external stimulus is also simultaneously measured and denoted as x r 2 R d x ; r ¼ 1; . . . ; R. For notational simplicity, we denote these variables as In previous studies [4,6], the calcium fluorescence is expressed as convolution of the calcium ion decay with a spike train, which is considered as the 0-1 sequence. Conversely, in our method, we replace the 0-1 sequence with the marked spike sequence whose mark contains information of a fluorescent footprint emitted by each neuron. Let k r 2 R d k [ f;g; r ¼ 2; . . . ; R be this marked spike sequence; κ r takes a vector of the fluorescent footprint added to the calcium fluorescence c r if a neuron fires between the (r − 1)-th frame and r-th frame; otherwise, it takes ;. We consider κ ¼ fk r g R r¼2 . Given this κ, we assume that the true calcium PLOS COMPUTATIONAL BIOLOGY fluorescence c is distributed to a vector autoregressive model with jumps: where Normal(�) is a Gaussian distribution, F 2 R d k �d k is a transition matrix, V 2 R d k �d k is a transition precision matrix, m c init is an initial state mean, and S c init is an initial state covariance matrix. This assumption can be interpreted as follows. If any spike has not occurred, the calcium fluorescence c r generally stays around zero, and small fluctuations induced by the noise v r distribute to the Gaussian distribution whose precision matrix is V. If a spike has occurred between the (r − 1)-th frame and the r-th frame, then the fluorescent footprint of this spike κ r Box 1: Summary of our generative model • Calcium fluorescence model r ¼ 2; . . . ; R; ( y r ¼ Gc r þ w r ; w r � Normalð0; WÞ; r ¼ 1; . . . ; R; • Spike generation model • Prior • Other variables x is given; ðm c init ; S c init ; F; V; G; W; a 0 ; u x Þ are hyperparameters to be estimated; ðb 0 ; m k ; g k ; n k ; S k Þ are fixed hyperparameters: is added to c r . The instantaneous increase of c r due to this spike gradually decreases with the time constant decided by the transition matrix F.
Next, we consider that the movie y is observed as where G 2 R d y �d k is an observation matrix, and W 2 R d y �d y is an observation precision matrix. The observation matrix G maps the calcium fluorescence c r into the space of the images; Gc r corresponds to the denoised frame of the movie, and Gκ r corresponds to the vectorization of the fluorescent image of each spike. To reduce the computational cost in the estimation step, we restrict the observation precision matrix W as a diagonal matrix. Therefore, the likelihood of κ, given (y, c), is pðy; c j κÞ Nðy r À Gc r j 0; WÞ: Spike generation model. Next, we evaluate how the marked spike sequence is generated, given the external stimulus. In our model, we assume that the marked spike sequence κ is generated from marked point processes. Fig 10 illustrates this component. A marked point process is a stochastic process that expresses a random occurrence of events (for details, see [21]). The probability structure of the point process is decided by a conditional where k 2 R d k is a mark attached to each event, and H t is the history of this process up to t that contains the information about external covariates. Using point process for modeling the series of events, we choose a family of conditional intensity function indexed by parameters and estimate these parameters given the observed data. We consider that the marked spike sequences are generated from a spatio-temporal inhomogeneous Poisson process whose rate depends on the external stimuli x t . This means that the stimulus x, the marked spike sequences κ are generated from the marked point process whose intensity is λ(κ j x, ξ). Each mark in these sequences indicates a fluorescent footprint of a spike emitted by each neuron, and this mark is expressed as a transformed version Gκ r in this figure. Because each neuron in the movie has a characteristic fluorescent footprint and a characteristic receptive field for stimulus x, λ(κ j x, ξ) can be expressed as the multimodal function with peaks corresponding to each neuron. The blue and green components indicate the mean florescent footprints and the receptive fields of two different neurons, respectively. https://doi.org/10.1371/journal.pcbi.1007650.g010

PLOS COMPUTATIONAL BIOLOGY
conditional intensity function is expressed as where λ(κ j x, ξ) is a function defined on the product space R d k � R d x and ξ is a parameter of the function. This λ(κ j x, ξ) indicates how often spikes occur and what kinds of spikes occur depending on the external stimulus x; λ(κ j x, ξ)Δ is the probability of observing a spike whose fluorescent footprint is κ in an interval with length Δ when the stimulus takes x.
What kind of parametric model is suitable for expressing this λ(κ j x, ξ)? As explained in Introduction, when a neuron spikes, the calcium ion concentration inside this neuron rises sharply. Thus, each neuron emits a fluorescent footprint similar to its cell shape. Because we assume that the spiking rate is modulated by an external stimulus, each neuron also has a characteristic receptive field for the stimulus x. Therefore, the contribution of each neuron to λ (κjx, ξ) may become unimodal on the product space R d k � R d x . Hence, λ(κ j x, ξ) can be expressed as a multimodal function of κ and x with peaks corresponding to each neuron. Therefore, we model λ(κ j x, ξ) as a mixture of kernel functions. Let f(κ, x j θ) be a kernel function defined on R d k �d x indexed by θ, and let ξ be a finite measure defined on Θ. We assume that λ(κ j x, ξ) is expressed as the integral f(κ, x j θ) respect to ξ as Let ξ be a discrete measure defined on Θ, let fu k g 1 k¼1 be atoms of this ξ, and let fp k g 1 k¼1 be the weights attached to these atoms. Then, λ(κ j x, ξ) in (3) is also expressed as This expression can be interpreted as follows. An infinite number of neurons are influenced by the external stimulus. When the stimulus takes value x, the k-th neuron emits a spike whose mark is κ in a unit time interval with the probability π k f(κ, x j u k ). Note that the integral of π k f (κ, x j u k ) with respect to κ refers to the firing rate of the k-th neuron when the stimulus takes x, which is the tuning curve of this neuron.
In our model, we assume that f(κ, x j θ) is expressed as the product of two kernels defined on R d k and R d x , respectively, as Here, f κ (κ j θ κ ) is the distribution of the fluorescent footprints emitted by a neuron, f x (x j θ x ) is the normalized tuning curve of the neuron, and θ κ 2 Θ κ and θ x 2 Θ x are the parameters of the kernels. This decomposition means that an external stimulus does not affect the shape of the fluorescent footprint of each spike. This assumption seems to be reasonable. To simplify the estimation procedure, we select a Gaussian kernel for f κ (κ j θ κ ): where N(� j �) is a Gaussian kernel defined on R d k , μ κ is a mean vector, and Λ κ is a precision matrix about κ. For f x (x j θ x ), we select an appropriate kernel according to the type of stimulus we focus on. For example, if the stimulus takes a value on a Euclid space and the receptive field of the neuron has a unimodal shape, then a Gaussian kernel may be suitable. If the stimulus takes a value on the sphere, then the von Mises kernel is more appropriate for this f x (x j θ x ).
For notational simplicity, we use these general notations for f κ and f x in the remainder of this paper. The likelihood of ξ, given κ, is expressed under the notion of the point process. In our model, we use a discretized expression for the point process. Consider that the sampling interval Δ is sufficiently small and assume that each bin ((r − 1)Δ, rΔ], r = 1, . . ., R contains at most one spike. Under this assumption, the probability of observing a spike whose footprint is κ r in ((r − 1)Δ, rΔ] is λ(κ r j x r , ξ)Δ and that of not observing any spike in ((r − 1)Δ, rΔ] is 1 − R λ(κ j x r , ξ)dκ Δ. Multiplying these probabilities for the all bins and approximating the term 1 − R λ(κ j x r , ξ)dκ Δ as we obtain the likelihood of ξ, given κ, as Here, η is an empirical measure of {x t } t2(0,T] approximated by For the sake of notational simplicity, we omit the conditioning by x in the following equation.
Prior. To estimate ξ and other hyperparameters from the data, we take a Bayesian approach. In other words, we set a prior for ξ and calculate a posterior of ξ given the observed data. Specifically, we adopt a mixture of weighted gamma processes as a prior for ξ.
A weighted gamma process is a stochastic process defined on the space of finite discrete measures. A random measure ξ α is said to be distributed according to a gamma process with parameter α if where α is a finite measure on Θ, and Gamma(α(A)) is a gamma distribution with mean and variance α(A). A random measure ξ α,β is said to be distributed according to a weighted gamma process with parameters α and β if where β is a nonnegative function defined on Θ. These processes can be considered as random variables defined on the space MðYÞ that is composed of all finite measures on Θ. The probability measure of ξ α,β defined on MðYÞ is denoted as Gðdx j a; bÞ. It is known that this process can be used as a prior for intensity estimation [22,23,24].
Although a weighted gamma process is also a conjugate prior for ξ in our model, it is difficult to handle the posterior expressed as this process because the weighted gamma process generates a random discrete measure whose support consists of an infinite number of atoms, and we cannot simulate samples from this process on the computer. [24] addressed this problem by using a mixture of gamma processes instead of the weighted gamma process.
In our model, we modify the definition of the mixture of gamma processes in [24]. We define G K ð�Þ as where α 0 > 0 and β 0 > 0 are scalar parameters, are kernel parameters, and H κ is a prior for u k defined by Here, NW(�j�) is a normal-Wishart distribution, with location vector m κ and scale matrix S κ . The parameters γ κ > 0 and ν κ > d κ − 1 are scalars. We formulate u x as hyperparameters. Because the dimension of u x k is relatively small in our problem, this formulation works reasonably well. In the estimation step, we calculate the posterior of u k and also estimate u x in the empirical Bayes approach.

Another expression of the generative model
To construct an efficient estimation procedure, we provide another expression of our generative model. First, we introduce a new latent variable. Because we set the mixture of weighted gamma processes as a prior for ξ, each atom of ξ corresponds to one of ðu k k ; u x k Þ; k ¼ 1; . . . ; K. Here, Fig 11. Prior. Intensity function λ(κ j x, ξ) is expressed as the sum of kernels, and it is indexed by the discrete measure ξ defined on Θ κ × Θ x . This discrete measure ξ consists of atoms fðm k k ; L k k ; y x k Þg and each m k k ; L k k ; y x k corresponds to the mean fluorescent image, the precision matrix of the fluorescent images, and the tuning curve parameter of each neuron. A weight attached to each atom expresses the average firing rate of this neuron. We set the mixture of gamma processes G K ðdx j a 0 ; b 0 ; H; u x Þ as a prior for this ξ. https://doi.org/10.1371/journal.pcbi.1007650.g011

Variational inference
To obtain the approximate posterior, we use a variational inference approach [25,26]. The log marginal likelihood of y can be decomposed as log pðyÞ ¼ LðqÞ þ KLðq jj pÞ; LðqÞ ≔E qðc;z;π;u k Þ log pðy; c; z; π; u k j u x Þ qðc; z; π; u k Þ � � ; KLðq jj pÞ ≔E qðc;z;π;u k Þ log qðc; z; π; u k Þ pðc; z; π; u k j u x ; yÞ � � : Here, q is an arbitrary distribution of ðc; z; π; u k Þ, and KL(q||p) is the Kullback-Leibler divergence between q and the true posterior pðc; z; π; u k j u x ; yÞ. Because the left-hand side of this equation is independent of q, maximization of LðqÞ with respect to q corresponds to minimization of the Kullback-Leibler divergence between q and the true posterior. If q is the true posterior, then LðqÞ takes the maximum value. This LðqÞ is called the evidence lower bound.
However, it is difficult to calculate LðqÞ for an arbitrary q. Therefore, we specify the distribution family Q that makes it is easy to calculate LðqÞ, and find the distribution with the largest LðqÞ among this family. In other words, we solve the constrained functional optimization problem arg max q2Q LðqÞ and consider the solution of this problem as the approximation of the true posterior.
In this paper, we introduce an independence constraint qðc; z; π; u k Þ ¼ qðcÞ qðzÞ qðπÞ qðu k k Þ: Under this constraint, we maximize LðqÞ using coordinate ascent maximization. In the estimation step, we update the posterior of each parameter to the distribution that maximizes LðqÞ, while keeping the distributions of the other variables as fixed. Under our model, the following distribution families satisfy the stationary condition of the functional optimization problem: qðcÞ ¼ Nðc j fm c r g R r¼1 ; fS c r g R r¼1 ; fS c rÀ 1;r g R r¼2 Þ; qðπÞ ¼ Dirðπ j a 1 ; . . . ; a K Þ; NWðm k k ; L k k j m k k ; g k k ; n k k ; S k k Þ: The details of these distributions are shown in S1 Appendix. Therefore, instead of numerically updating the distribution, we can maximize LðqÞ with respect to the parameters of these distribution families.
At each iteration, we update the distribution of each parameter one by one and also maximize LðqÞ with respect to the hyperparameters. We repeat this operation several times and obtain the variational posteriorq. By calculating the variational expectation of the hidden variables using thisq, we obtain the deconvolution result.
Supporting information S1 Appendix. Technical appendix. The supplementary material includes the technical details of our model. We provide a proof for Proposition 2, the updated formulas in the estimation step, implication of the obtained posterior under our model, and some heuristic techniques for obtaining a good posterior approximation. (PDF)