An integrated calcium imaging processing toolbox for the analysis of neuronal population dynamics

The development of new imaging and optogenetics techniques to study the dynamics of large neuronal circuits is generating datasets of unprecedented volume and complexity, demanding the development of appropriate analysis tools. We present a comprehensive computational workflow for the analysis of neuronal population calcium dynamics. The toolbox includes newly developed algorithms and interactive tools for image pre-processing and segmentation, estimation of significant single-neuron single-trial signals, mapping event-related neuronal responses, detection of activity-correlated neuronal clusters, exploration of population dynamics, and analysis of clusters' features against surrogate control datasets. The modules are integrated in a modular and versatile processing pipeline, adaptable to different needs. The clustering module is capable of detecting flexible, dynamically activated neuronal assemblies, consistent with the distributed population coding of the brain. We demonstrate the suitability of the toolbox for a variety of calcium imaging datasets. The toolbox open-source code, a step-by-step tutorial and a case study dataset are available at https://github.com/zebrain-lab/Toolbox-Romano-et-al.

the imaged plane. If cell nuclei are unlabeled, avgImg Norm is converted to its complement (i.e., its negative), converting the unlabeled nuclei into bright spots. The user then has to interactively set two threshold parameters, thr Soma and thr Neuropil , that will result in binary masks that identify cell somata and the regions that surround them (i.e., the neuropil) , Mask Soma and Mask Neuropil , respectively (see left panel in Fig. 5a).
Mask Neuropil is calculated through a simple thresholding procedure on avgImg Norm using thr Neuropil . Mask Soma is calculated through an extended-maxima transform on avgImg Norm , which consists of the regional maxima of the H-maxima transform of avgImg Norm (obtained by suppressing all maxima in avgImg Norm whose height is less than thr Soma ). Then, both Mask Soma and Mask Neuropil are imposed as regional minima to the image complement of avgImg Norm . Finally, a watershed transformation is performed to obtain the ROI perimeters.
The imposition of regional minima in Mask Neuropil prevents the watershed algorithm from finding "catchment basins" (i.e., ROIs) in unwanted regions (i.e., the neuropil). This allows the implementation of the watershed transform in imaged regions where labeled cells are spatially distributed in a sparse or scattered manner.
Finally, the GUI can be used to manually curate ROIs, as previously explained.

Calculation of relative fluorescence variation
The data sanity test discards ROIs whose fluorescence signal is too low and/or presents artifactual fluorescence traces. For the detection of these artifacts, the module first calculates for each ROI a smooth estimate of its slowly varying fluorescence baseline (F smooth ). The latter is calculated using a running-window average of the 8th percentile of the ROI's fluorescence [1], with a time window 40 times larger than the decay time constant of the calcium reporter (t). This procedure results in a F smooth that robustly tracks the ROI's basal fluorescence level, without being affected by the fast activation transients. Thus, if an ROI is associated with a F smooth that shows sudden variations exceeding a user-selected threshold, the ROI is discarded (in practice, this eliminates ROIs of unhealthy neurons or healthy neurons that move in or out of focus during the imaging session). As previously explained, the user can then choose to use F smooth as an estimate of F0 in the calculation of the ∆F/F0.

The module for detection of assemblies
Besides PCA-promax, this module allows to cluster data using k-means and hierarchical clustering algorithms.
Briefly, k-means partitions data in a fixed number of clusters k, defined by the user. It involves randomly selecting k initial centroids and assigning each point (e.g., each neuron) to their closest centroids, thus forming k preliminary clusters. The centroids are then updated according to the points in the clusters, and this process continues until the points stop changing their clusters (i.e., convergence of the centroids). Typically, the clusters defined by k-means are highly independent and uncorrelated (i.e., they present low inter-cluster correlations).
On the other hand, (agglomerative) hierarchical clustering groups data by creating a cluster tree or dendrogram. The tree represents a multilevel hierarchy, where clusters at one level are joined as clusters at the next level. First, the similarity between every pair of variables (e.g., neuronal fluorescence traces) is calculated. Then, these similarities are used to determine the proximity of variables to each other. Variables are successively paired into binary clusters, and newly formed clusters are grouped into larger clusters in a bottom-up manner, until a hierarchical tree is formed. Finally, clustering is performed by determining where to cut the hierarchical tree, and assigning all the objects below each cut to a single cluster.
For both k-means and hierarchical clustering, the user can choose to cluster either the original z-scored dataset or the dimensionality-reduced dataset obtained through PCA. In contrast to PCA-promax, for both kmeans and hierarchical clustering, a number of important parameters have to be set. The user must choose between using a euclidean or a pair-wise correlation metric to calculate distances between variables. For hierarchical clustering only, the user must choose if the distance between two clusters is defined by the shortest or longest distance between two points in each cluster (single and complete linkage clustering, respectively). Being classic methods, the particularities of clustering with these metrics and linkages has been extensively reviewed [2,3]. Finally, to obtain the final clustering for both methods, the user has to set the total number of clusters to look for in the data (k for k-means, and the smallest height at which a horizontal tree cut leaves k clusters for hierarchical clustering).
The assemblies module also calculates the time series of the assemblies' significant activations. For this, it uses a matching index, MI [4][5][6]. The MI is defined as where Pat i is the binary activity pattern of imaging frame i, and Pat j is the binary target pattern of assembly j (i.e., binary N x 1 vectors representing the complete population of N ROIs, with ones indicating active ROIs and zeros indicating those inactive). Norms are equal to the number of ones in each vector. The MI quantifies the proportion of ROI activations that are common to both patterns with respect to the total number of activations present in both patterns. It is valued between 0 (no overlap in activations) and 1 (perfect overlap in activations). To estimate the significance of the assemblies' MIs over the course of the experiment, the algorithm uses the hypergeometric distribution. Under the null hypothesis of independent ROI activations, this is a discrete distribution that describes the probability of having k "hits" with n target ROI activations in a population of N ROIs, showing K activations at a given moment. Therefore, it allows estimation of the probability of observing a given activation match by chance, with ROIs independently activated. In step 47 the user can select the threshold p-value to consider an assembly activation significant.