Density-based clustering: A ‘landscape view’ of multi-channel neural data for inference and dynamic complexity analysis

Two, partially interwoven, hot topics in the analysis and statistical modeling of neural data, are the development of efficient and informative representations of the time series derived from multiple neural recordings, and the extraction of information about the connectivity structure of the underlying neural network from the recorded neural activities. In the present paper we show that state-space clustering can provide an easy and effective option for reducing the dimensionality of multiple neural time series, that it can improve inference of synaptic couplings from neural activities, and that it can also allow the construction of a compact representation of the multi-dimensional dynamics, that easily lends itself to complexity measures. We apply a variant of the ‘mean-shift’ algorithm to perform state-space clustering, and validate it on an Hopfield network in the glassy phase, in which metastable states are largely uncorrelated from memories embedded in the synaptic matrix. In this context, we show that the neural states identified as clusters’ centroids offer a parsimonious parametrization of the synaptic matrix, which allows a significant improvement in inferring the synaptic couplings from the neural activities. Moving to the more realistic case of a multi-modular spiking network, with spike-frequency adaptation inducing history-dependent effects, we propose a procedure inspired by Boltzmann learning, but extending its domain of application, to learn inter-module synaptic couplings so that the spiking network reproduces a prescribed pattern of spatial correlations; we then illustrate, in the spiking network, how clustering is effective in extracting relevant features of the network’s state-space landscape. Finally, we show that the knowledge of the cluster structure allows casting the multi-dimensional neural dynamics in the form of a symbolic dynamics of transitions between clusters; as an illustration of the potential of such reduction, we define and analyze a measure of complexity of the neural time series.


Introduction
Technology nowadays allows neuroscientists to simultaneously record brain activity from increasingly many channels, at multiple scales; indeed, recent years witnessed a kind of 'Moore's Law' for neural recordings [1], and this poses new challenges and opens new opportunities.
One obvious challenge is to devise data representations that easily convey in a compact form the spatio-temporal structure of the recorded data.Various forms of dimensional reduction are now commonly used in the analysis of multiple recordings.In general terms, if one views recorded experimental data as a matrix whose columns are the 'feature vectors' (in the case at hand, the set of recorded activities in a given time bin), and whose rows span the 'sample space' (in the case at hand, the successive time bins), dimensional reduction across the column direction provides a reduced representation in terms of few suitably identified features obtained from the original ones (e.g.principal component analysis); on the other hand, one can view clustering across the rows direction as a way to reduce the dimensionality of the data matrix by lumping together, according to some similarity measure, groups of activity vectors sampled at different times.This latter viewpoint, which we take here, to our knowledge has been much less used in neuroscience.
On the other hand, one recently explored opportunity is to take advantage of multiple recordings to revive old approaches to infer estimates of synaptic couplings from measured correlations between neural activities.Correlations measured from single neuron pairs obviously can only provide ambiguous estimates of the direct synaptic couplings (due to confounding causes like common input to the sampled neurons).However it was noticed in a landmark paper [2] that when many (order 100 for instance) simultaneous recordings are available, even though the underlying biological neural network is still dramatically undersampled, the global pattern of (individually small) pairwise correlations allows to extract meaningful information about the synaptic connectivity.This was achieved by assuming a maximum entropy Ising model, for which an "inverse Ising" problem was solved to infer the parameters (couplings and external input) for the given data.Many efforts were subsequently devoted both to extend the approach to non-equilibrium estimates, and to lighten the computational load of maximum entropy estimates (Boltzmann learning) through various mean-field approximations (see e.g.[3][4] [5]).
In the present work we propose an approach, based on clustering in the multi-dimensional state space of simultaneous recordings, that provides both an advantage for a compact representation of data, also amenable to efficient estimation of the complexity of the system's dynamics, and besides allows to improve inference on the network couplings.
After describing the clustering method (which is a slightly modified version of the 'mean-shift' algorithm [6] [7]), we first illustrate its working on time series generated from the dynamics of a Hopfield network which, since it possesses an energy function, naturally lends itself to density-based clustering in the state space; here we choose a 'hard' regime where the Hopfield network is in a disordered phase and spatio-temporal structures related to the energy landscape are not easily discernible from the time series.At this stage we also formulate a parametrization of the model's synaptic matrix, based on the identified clusters, and show that it allows to obtain an inference of the synaptic couplings which is much more insensitive to noise.
We then move on to the more complex and biologically motivated case of a multi-modular attractor spiking network: its integrate-and-fire neurons are endowed with spike-frequency adaptation (SFA), acting as an activity-dependent self-inhibition that introduces history-dependent effects making the 'landscape' dynamic; modules are approximately bistable between high and low activity states; within-modules connectivity is stronger than between-modules.
Reasons to choose this particular context include recently published evidence [8] [9] that cortical dynamics can occur in the form of abrupt switches, between an 'Up' and a 'Down' states, of local self-excited modules, such that the whole dynamics appears as the evolution of a 'binary word', each bit being the 'Up' or 'Down' state of each cortical module.
In the chosen setting of weakly coupled bistable modules, the network dynamics is largely determined by inter-modular couplings; in choosing the latter we wanted to preserve some a priori knowledge of key features of the state space, to be later checked against the found cluster structure, and we do this by seeking inter-modular couplings such that the network possesses a prescribed pattern of spatial pairwise correlations, derived in turn from a template Hopfield network.
To achieve this we develop a procedure, inspired by Boltzmann learning but extending its domain of application, which we believe has an interest beyond the contingent purpose.
We check the good performance of this new learning procedure by direct comparison between the prescribed spatial correlations and those generated by the optimal network at the end of learning; by comparing the state-space cluster structure of the optimal network and the template Hopfield network; by inferring effective inter-module synaptic couplings from simulations of the optimal network and comparing them to the synaptic efficacies of the template model.Knowledge of the cluster structure also allows to cast the multi-dimensional time series generated by the network dynamics in the reduced form of a symbolic dynamics in the clusters' centroid space.We perform such a reduction to show that it allows to expose in a compact way time-dependent features like historydependent effects; we do this by exploring a large range of characteristic times of SFA, and showing that the Lempel-Ziv complexity measure, applied to the centroid sequence, nicely captures the memory effects induced by SFA.
In summary, what we propose here is a way to use knowledge of the spatial correlations to develop informative and compact representations of multidimensional neural data, also allowing for improved inference and useful reduced representation of the multidimensional dynamics, and to develop a strategy for data-driven model building with spiking networks.

Results
2.1 When the landscape is (partially) known: The Hopfield Model as a first test ground for state-space clustering We start by providing an example of the information gained from clustering in the state space, i.e. finding the local maxima of the density distribution of the configurations generated by a system's dynamics, or -equivalently -the minima of an effective energy landscape (see Fig. 1).
Though we want to ultimately apply clustering to data from realistic neural simulations, to illustrate the method we consider here a well known neural network (the Hopfield model) for which an equilibrium distribution of states is defined.
Figure 1: Sketch of the steps involved in the density-based clustering of multidimensional time series through the mean shift algorithm.The time-binned Ndimensional data ('spikes') in panel A define a density profile in a N -dimensional space, interpreted as sampling a stationary probability distribution (panel B), from which an effective energy landscape can be defined (panel C).Panel D illustrates how the mean-shift operates to find the local maxima of the density distribution.Blue dot: current position of a point to be moved by the algorithm.Big yellow area: the chosen neighbourhood of the blue point.Green dots: neighbouring points of the blue point.Red dots: points that are not neighbours of the blue point.Small yellow dot: the new position of the blue point is given by the algorithm as the center of mass of its neighbouring points.See Section 4 for details.
For the Hopfield model [10], the equilibrium probability distribution of the neural states σ = {σ 1 , σ 2 , . . .σ N } (σ i = ±1) is given by: where β is interpreted as the inverse of a temperature and H is the energy function: where the P discrete vectors ξ µ are chosen as random uncorrelated configurations of the N spins (ξ µ i = ±1) which, depending on P , N and β can act as 'stored patterns', i.e. the dynamics relaxes to the neighborhood of one of the configurations ξ µ , which are minima of the energy (maxima of the probability distribution) together with their mirror patterns −ξ µ [11].Depending on P , N and β, besides the patterns ξ µ a large number of energy local minima is also present, notably various linear combinations of the stored patterns themselves or states uncorrelated with the patterns in the 'spin glass' phase.
In the following we consider the same Hopfield model (same synaptic couplings, therefore same energy minima) in two operating regimes: low noise (i.e.temperature, where the network is expected to reside most of the time in the proximity of the deepest energy minima), and high noise (where the network explores larger regions of the state space, less constrained by the structure of the underlying energy function).The network comprises N = 50 neurons and P = 4 stored patterns.Although for such small numbers relying quantitatively on known theoretical estimates of memory capacity is unwarranted, for the chosen values of N and P the model would be predicted to be well above the retrieval phase for both temperatures, and for its non-trivial ('glassy') energy landscape the energy minima are not expected to coincide with the memory patterns.
The neural configuration time series were generated simulating the Hofpield model for 20000 Monte Carlo steps (a suitable number of initial thermalization steps were neglected).
To the N -dimensional resulting time series we applied the (slightly modified) mean-shift (MS) clustering procedure as described in Section 4: essentially, we iteratively move the points in the N -dimensional space, each of which represents one state generated in the Monte Carlo sequence, towards the center of mass of their neighboring points, thereby identifying at the end the estimated position of the local maxima of the density distribution in the state space.
The states of the Hopfield model are defined over the N -dimensional hypercube, where N is the number of neurons.In order to perform the mean shift displacements we take the sign of the mean value of each coordinate, so that the points remain in the original space.Any time the mean value of a given coordinate is zero, we don't shift that coordinate.
Finally, after reaching a convergence criterion (see Section 4) of the MS algorithm, we re-run it over the set of found centroids, with a fixed radius given by Hamming distance 2 (overlap 0.92) 1 ; we found this further step to make clustering more resistant to noise.
In general, we will consider only clusters containing data points above a minimal fraction ("cutoff") of the whole time series, i.e. minimal mass (1% unless specified).In this way we avoid to consider as clusters small bumps of density due to finite sample fluctuations.
Figure 2, left panels, show representative time courses of the neural states for the low-temperature (β = 1.3, top) and high-temperature (β = 0.83, bottom) cases.Right panels show, for the two cases, the distribution of the overlaps between all configurations in the time series.
The subsequent Figure 3 illustrates the result of the clustering procedure for the low-and high-temperature cases. 1 The overlap q αβ between two binary configurations, and weighting each centroid with its mass in the analogous of Eq. 5. σ α and σ β is defined as  In the high-temperature case it is very difficult to discern a structure in the raster plot, and considering also the unimodal overlap distribution, broadly symmetric around zero, it seems a challenging case for a procedure aimed at extracting the energy minima.
This temperature dependence is reflected in the distribution of overlaps between the visited states.We note that, if energy minima would mostly coincide with the stored (uncorrelated) patterns, the overlap distribution would be trimodal, with a peak in zero and two symmetric peaks at high (positive and negative) overlap, which is not the case even for low temperature (top-right panel in Figure 2), with the distribution of visited states having average overlap about 0.5 in magnitude.For higher temperature (bottom-right panel in Figure 2) the overlap distribution is approximately Gaussian.
From Fig. 3 it is seen that for low temperature most centroids are indeed different from the stored patterns; for high temperature more centroids are identified, including the stored patterns.
While we did not perform a thorough analysis of clusters' centroids in terms of energy local minima, we checked that clusters are indeed akin to attractor basins, by measuring the configurations flow at zero temperature: starting the deterministic (zero-temperature Monte Carlo) dynamics from each one of the configurations assigned to a given cluster, the fraction of them for which the final overlap with their assigned centroid is larger than the initial one is 0.90 ± 0.04 (for the clusters found at β = 0.83), and 0.85 ± 0.05 (for the clusters found at β = 1.3).

Clustering-aided Inference of Synaptic Connectivity
As mentioned in the Section Section 1, several methods have been and are being developed to infer effective synaptic connectivities from the time series of simultaneous neural recordings, some of which fall in the category of so-called "Inverse Ising" problems [2,4].Obvious hindrance in the application of such methods is the large number of parameters to be inferred (effective synaptic efficacies, of order N 2 for N neurons), which makes them sensitive to noise in the data.
In the following, we show that the proposed clustering method can be used to formulate the inference problem in a reduced parameter space, using the activity configurations that are the identified centroids to parametrize a coupling matrix (inspired to the construction of the Hopfield connectivity matrix) with a number of parameters equal to the number of centroids.
Specifically (see also Section 4), the inference model has the form: where the effective coupling matrix J is the sum of weighted Hopfield-like terms c µ i c µ j (see Eq. 2)2 : c µ being the C centroids identified by the clustering procedure, and the weights ω µ are to be inferred.
To test this approach, we first use again the Hopfield model data of Fig. 3, bottom panel.Since 14 out of the 18 identified centroids are pairs of reflected patterns and give rise to the same Hopfield-like term in Eq. 8, the number of parameters ω to be estimated is reduced to 11.
In Figure 4, left panel, we show the inferred values of ω µ .Only four of them (marked with an asterisk) are significantly different from zero, and they correspond to the centroids that coincide with the 4 stored patterns.2).The red stars mark the weights corresponding to the patterns actually stored in the system.Right: Scatter plot of the inferred (reduced model) synaptic couplings against the actual ones.The red curve shows the identity as a reference Notice that, when the ωs are roughly equal, they play a role similar to an inverse temperature; indeed, the value of β used to generate the time series (0.83) is very close to the ω values for the centroids corresponding to the patterns.
We also remark that the obtained values for ω would not be trivially expected from the structure of the state space, in that not only 7 out of 11 centroids (not counting reflections) do not belong to the stored patterns, but the clusters of largest mass are not centered on patterns (remember that the Hopfield system is far from the retrieval phase).
In Figure 4, right panel, we compare the inferred J with the real J (N × (N − 1) = 2450 elements, taking only P + 1 = 5 values); the inference is actually good (the continuous identity line is drawn to guide the eye), and better than the one obtained inferring directly the full J matrix (i.e.not adopting the parametrization of the synaptic matrix in Eq. 8), as shown in Fig. 5, where we compare the relative error (( for the two cases.From this simple example, we confirm that the adopted parametrization, besides the obvious greater simplicity and lesser computational load, is effective in reducing the effect of noise in the data and makes the inference less prone to overfitting, while allowing a good match between model and data.
2.3 Towards a more realistic scenario: a multi-modular network of spiking neurons matching a prescribed spatial correlation structure In the previous Section we showed that, from the spatial correlation structure of the configurations sampled by Monte Carlo, MS clustering was effective in reconstructing the main local energy minima, and allowed for a parsimonious parametrization of the synaptic matrix that afforded better inference.In that case, the correspondence between local density maxima in the state space, and local energy minima, was ensured by the existence of a Gibbs equilibrium probability for the Hopfield model.
In the perspective of applicability to real electrophysiology data, in the present Section we extend the approach to networks of spiking neurons, and to a situation where the notion of a static energy landscape is no longer strictly applicable.
To establish a meaningful benchmark, we want to preserve some a priori knowledge of key features of the state space, to be checked against the found cluster structure, and we do this by setting up a spiking network constructed so as to possess a prescribed spatial correlation structure, from which we generate time series to be clustered.To achieve this we develop a method that, we believe, has a wider interest beyond the case at hand.
The chosen network architecture is composed of strongly self-coupled modules of spiking (integrate-and-fire) neurons, with much (10 −2 − 10 −3 ) weaker synaptic couplings between modules.Intra-module synapses are drawn from a Gaussian distribution, with mean and average chosen (using mean-field predictions) such that each module in isolation is approximately bistable, between states of low (DOWN) and high (UP) firing activity.
The interest in this choice for the architecture of the spiking network stems from accumulating evidence that, not only a modular structure is suggested by the mesoscopic anatomical organization of the cortex, but it also appears to be recognizable in the cortical neural dynamics, which can proceed as the dynamic composition of abrupt jumps between UP and DOWN states ( [8,9]), as mentioned in the Introduction (see also [12][13] [14]).
We remark that synapses are not be constrained to be symmetric, therefore the clustering procedure will not, strictly speaking, match the minima of an energy function of the system.Besides, the Integrate-and-fire spiking neuron model (see Section 4) is endowed with spike-frequency adaptation (SFA), a much studied (and pervasively observed) self-inhibitory mechanism depending on the activity history of the neuron; SFA makes the effective landscape, even when it exists, locally dynamic at the point currently corresponding to the network state.
Fig. 7 describes the main steps involved in the network construction.The multi-modular network is sketched (panel A) as a collection of 64 neural modules, each composed of 32 adapting excitatory and 16 inhibitory neurons (see see Section 4 for details).The approximate bistability of the single modules, which is to a large extent preserved in the interacting network, is illustrated in panel B by the time course of the firing rate of the excitatory neurons from two sample modules; the resulting bimodal distribution of firing rates allows to binarize the modules' activity, as shown in panel C 3 .Once binarized, the time course of the whole network activity can be represented as a sequence of binary vectors (see the 'raster plot' in panel D), to which clustering is applied.
We choose to assign a spatial correlation structure mimicking the one of a Hopfield attractor network, each module corresponding to one binary neuron of the Hopfield network: inter-module synapses, initially drawn from a Gaussian distribution, are subject to an iterative procedure (see below, and Section 4 for details) to match those of the reference Hopfield model.
Were it not for the effects of SFA, the correlation-matched multi-modular network would be expected to behave as an attractor, Hopfield-like system, with the Up and Down states of each module playing the role of the binary values of the Hopfield neurons.However, as the network enters the basin of one attractor, the active modules start lowering their activity because of SFA, thereby destabilizing the state: as noted above, the attractor landscape becomes 'dynamic', in that its depth and curvature around attractor states get lower when the system visits them, promoting transitions to other basins, which are biased by the correlations with other attractor states (due to finite-size effects) 4 .
The new procedure we developed to determine the inter-module synaptic couplings is inspired to the Boltzmann learning strategy where, at each iteration, the change in the couplings is proportional to the difference between the spatial correlations in the model and in the data (and analogously for the external fields).For Boltzmann learning, such difference is the gradient of the Kullback-Leibler distance between the state probabilities in the model and in the data.In our spiking network the explicit form of such a function is lacking; still, intuition suggests, and we assume, that a monotonic relation still holds between the synaptic couplings and the spatial correlations, and this is equivalent to assume that we know the sign of an unknown gradient.Formulated in this way, our 'pseudo-Boltzmann' iterative process to find the optimal synaptic couplings is naturally mapped onto the Rprop algorithm (see Section 4, Eqs.17, 18).
In summary, the sequence of steps we take is the following: 1) store a set of patterns a la Hopfield in a network of binary neurons (as in the previous section, the Hopfield network will be in its glassy phase); 2) from the Hopfield network, measure spatial correlations and site magnetizations (average activity); 3) set up a multi-modular spiking network of approximately bistable modules; 4) use pseudo-Boltzmann learning to find inter-modular couplings and external rates to mimic correlations and magnetizations of the Hopfield system; 5) perform MS clustering on the configurations generated by simulations of the spiking multi-modular system; 6) check the quality of the result (see below) .
The chosen 'reference' Hopfield model has N = 64, P = 4, β = 1.1, for which we generate a long Monte Carlo sequence, and measure the spatial correlations and site magnetization that are to be matched by the multi-modular spiking network with optimal inter-module couplings J pq and external inputs ν ext p obtained from the pseudo-Boltzmann procedure explained above.One may ask whether the simplest choice that intuition would suggest, i.e. simply taking the inter-module synapses as proportional to the computed Hopfield ones, would work.We checked this option, with poor results.One reason is that the low-and high-firing rate states (which are subject to binarization in the clustering procedure) are not dynamically equivalent (contrary to the binary neurons of the Hopfield case): one state can be deterministically more stable than the other (e.g. in the sense of the linear stability of the two corresponding fixed points in the mean-field approximation); furthermore, noise is higher in the high-firing rate state (finite-size noise is activity-dependent and acts multiplicatively).Finally, there are 'quenched noise' effects: the synaptic connectivity in each module is a different (small) realization of the same probabilistic model, and can lead to quite different dynamics between modules.
We checked the success of pseudo-Boltzmann learning in several ways.First, the correlation between the partial correlations measured from the optimal spiking network and those from the reference Hopfield model is very high (R 2 = 0.975); the average 'magnetization' for the optimal spiking network is order 10 −3 , to be compared with the potential range [−1, 1] and the theoretical null value for the Hopfield network.The found optimal external firing rates have large variations between modules (over ±20% with respect to their average), which is expected, since they contribute to compensate for the heterogeneous excitability of the different modules.
Second, we checked the similarity between the cluster structures emerging from the optimal spiking network and that of the reference Hopfield network, by computing the absolute value |q| of the overlaps between the centroids found for the two networks (c S and c H respectively for the spiking and Hopfield networks).
Out of the 12 c H centroids, 9 of them (75%) have |q| = 1 with at least one of the c S ; among the others, 2 of them have |q| ∼ 0.97 with at least one of the c S ; 1 of them has |q| ∼ 0.84 with one of the c S .
Conversely, out of the 13 c S centroids, 8 of them (62%) have |q| = 1 with at least one of the c H ; among the others, 2 of them have |q| ∼ 0.97 with at least one of the c S ; 3 of them have |q| ∼ 0.75 with at least one of the c H .This shows that the pseudo-Boltzmann iterative procedure, by enforcing approximately equal mean activities and spatial pair correlations between the Hopfield model and the modules of the spiking network results in fact in similar mostly visit regions of the state space for the two systems.
Third, we inferred inter-modular synaptic efficacies (using MPF as in the previous Section) from the spiking network time series, and found that for all modules pairs the inferred synapses are close to corresponding synapses of the reference Hopfield model (the mean absolute value of the relative error, taken over the J entries, is 11%, likely mostly due to finite-sample noise, as suggested by comparison with Fig. 5); this confirms that, despite the large differences between the synapses of the Hopfield network and the optimal inter-modular aver-age synaptic efficacies determined by pseudo-Boltzmann learning (not shown), the dynamics of the resulting spiking network effectively embodies inter-module interactions consistent with the reference Hopfield network.
To summarize, the somewhat complex procedure described allowed us to construct a modular spiking system with some control on desired features of the state space, not easily enforced by simple ad hoc assignment of the synaptic structure; while in this case the construction was guided by a reference Hopfield model, whose neurons were naturally mapped onto the network's modules, more in general we believe the procedure is interesting per se, as a means to enforce a prescribed pattern of spatial correlations (and associated state-space structure) in relatively complex networks of spiking neurons.

Dynamics in the centroid space
When performing MS clustering in the state space, information about the dynamics of the original time series is -by construction -lost.However, once clustering is done, knowledge of the clusters' centroids allows to go back to the multi-dimensional dynamics, and cast it in a useful compact form: to each one of the vectors expressing the states at successive sampling times, we substitute the label of the cluster that vector was assigned to.
The description of the system's dynamics is reduced to a 'symbolic dynamics' in the centroids space, which opens up options to expose dynamic features that may be difficult to uncover directly from the analysis of the spiking activity, as we illustrate through an example in the present section.
Based on the procedure developed in the previous Section to set up multimodular spiking networks, we want to generate a family of networks for which different degrees of 'complexity' can be expected, instantiated here in different dynamic memory span, induced by the SFA component that, as discussed, affects locally the dynamics in a history-dependent way.
In the previous section SFA was chosen small, just enough to obtain measurable state transition rates in the simulation time.Here we set up a series of spiking multi-modular networks, each one constructed as in the previous section, but with different parameters for SFA.
We span a range of values for the timescale of SFA (τ SF A ), while keeping the product τ SF A g SF A constant (in order to keep the SFA 'strength' constant and have comparable systems, see Section 4).
We show here that the reduced dynamics in the centroid space lends itself naturally to methods for symbols-oriented measures of complexity, able to easily capture memory effects.
It has long been suggested, and reported in several published works, that the Lempel-Ziv (LZ) complexity measure [24] may be usefully adapted to characterize neural data series [25,26,27]; in particular, a suitably normalized LZ complexity has been successfully employed as a diagnostic measure of the distance to the conscious state in neurological patients [28].In essence, the approach is based on the intuitive idea that the more complex the signal, the less its compressibility; in other words, more structure in the signal increases its predictability.Therefore, a memoryless stochastic time series would have maximal complexity, and any memory embedded in the dynamics generating the time series would make it decrease.Such entropic measures provide information beyond what linear correlation analysis can provide.
We therefore measure, for the symbolic sequence reduction of the multidimensional dynamics obtained from networks with different τ SF A , LZ complexity and a relative complexity index, as detailed in Section 4.
Expectation is that increasing τ SF A generates multidimensional time series with decreasing complexity.We checked that such expectation is met in our spiking simulation data, and that differences in LZ complexity for high and low τ SF A are statistically significant for the simulated time span.
In order to quantify the difference, for each value of τ SF A and g SF A we also simulated ten realizations of Markov chains in the centroid space, generated by the transition probabilities estimated from the simulation, thereby producing surrogate memoryless sequences with the same transition statistics as the data; we computed the LZ complexity averaged over the ten Markov chains ('c Markov '), and to compare it to the LZ complexity from the simulation data ('c sample '), we computed the ratio R = (c Markov − c sample )/c Markov (see Section 4), 5 .
For a meaningful comparative analysis of the complexity of the dynamics in the centroid space for widely different values of τ SF A (and related g SF A ), we should make sure that the corresponding networks are indeed similar between themselves, and with the reference Hopfield network, in terms of the state space structure.
For this purpose we did a preliminary analysis of the clusters found for all the explored values of τ SF A ; since, as τ SF A increases, an increasing number of small clusters appears 6 , with small differences from main ones, we first clustered the centroids with standard methods (hierarchical clustering using Hamming distance and complete linkage), to obtain a 'fuzzy' version of the whole set of main centroids; for those (76 clusters), we found that: the first 8 fuzzy centroids account for 66.3% of the total configuration mass and are detected, on average, for 84.0% of the τ SF A values; the first 10 fuzzy centroids account for 71.7% of the total configuration mass and are detected, on average, for 73.0% of the τ SF A values; the centroids of the reference Hopfield network are recovered in 87.2% of cases (by 'recovered' we mean that at least one of the centroids found for an Hopfield simulation-clustering has overlap > 0.719 with the considered Hopfield centroid, the threshold value 0.719 being determined such that 95% of the overlaps between the centroids found for the considered case are below such threshold. The results of the analysis are summarized in Figure 7: we see that the ratio R increases with increasing τ SF A or, in other words, that as memory effects increase the complexity of the actual centroid sequences gets increasingly larger than that of surrogate Markov sequences, which provides a quantitative information on the non-Markovian nature 7 ) of dynamics for higher τ SF A cases.In the figure we also report for comparison (red line) the R value obtained for the centroid sequence extracted from the time series generated by the reference Hopfield network (which is inherently Markovian); it is seen that, although the centroid sequence deviates from a Markov process, the relative difference w.r.t. the surrogate sequence is very small, much smaller than the one for the spiking dynamics, where SFA plays a major role.Of course, deviations from a Markov, memoryless dynamics can take many forms; a generic expectation is that, for a non-Markov process, sequences of states of given length are more, or less, likely to occur than predicted based only on the transition probability matrix.
To gain insight for the case at hand, we considered the centroid sequence for the spiking network with the highest τ SF A = 4s (of length about 1.3 × 10 3 ).For each of the triplets of centroid labels (13 3 triplets, since 13 centroids were extracted in this case by MS clustering) the probability of occurrence was estimated from the actual sample, and computed from the Markov transition probabilities estimated from the same sample.
Fig. 8, left panel, shows (blue points) a scatter plot of such probabilities (limited to the triplets occurring more than 10 times).With reference to the black identity line (close to which the points would obviously cluster if the actual sequence was Markovian), it is seen that many triplets are over-or underrepresented in the actual sequence (particularly in the region of higher probabilities which matters most), as a reflection of its non-Markovian nature.
In order to assess the significance of the observed differences, we also generated a truly Markovian sequence with the same transition probabilities and of the same length as the sample, and from it we estimated the triplets probabilities; the green crosses show the corresponding scatter plot with the computed Markov probabilities, and it is clearly seen that finite-sample effects are much smaller than the spread observed for the actual sequence, confirming its genuine non-Markovian nature.
Finally, given the dependence of R on τ SF A shown in Fig. 7, it was natural to ask how the non-Markovian estimated triplets occurrence probabilities would depend on τ SF A ; this is illustrated in Fig. 8, right panel, where we report the Kullback-Leibler distance between the sampled triplets distributions from the actual sequence and the surrogate Markov sequence, as a function of τ SF A ; although in principle for different τ SF A we may expect different contributions to the non-Markov nature of the sequence from sub-sequences of different length, we observe an approximately monotonic dependence of the KL distance on τ SF A .In summary, reducing the multidimensional time series of spiking data to the sequence of labels of the clusters identified by the state-space clustering procedure, casts the dynamics in a form easily suited to capture and quantify traces of memory effects, for instance allowing in principle the comparison between recordings in different experimental conditions.

Discussion
In this work we considered pre-existing strategies and developed new tools that, taken together, compose a methodology with good potential, we believe, in analysis and modeling of neuroscience data.
We started from a simple idea (though, to our knowledge, it was not exploited so far in the analysis of multiple neural recordings): to represent the multidimensional time series of neural activities as a density distribution in a corresponding multidimensional space, and perform a density-based clustering procedure to extract the local density maxima.We strived to show that this type of dimensional reduction is in fact a versatile instrument; in particular we illustrated examples that it can be useful in achieving better inference of synaptic couplings from neural activities, and also to cast the multi-dimensional neural dynamics in a compact form amenable to symbol-oriented methods of complexity analysis.
The clusters, and the associated centroids, have an obvious interpretation when the original time series is generated by an attractor dynamics, but retain an informative value even when (like in the case of SFA in the spiking network) the picture of a static attractor landscape is no longer appropriate; preliminary work in progress on multiple in-vivo recordings during motor tasks makes us confident that the approach can provide compact and usable information even in strongly non-stationary conditions.
We first validated the method on the activity generated by a simple Hopfield network, albeit we choose for it a highly nontrivial working regime in which the energy landscape explored during the network dynamics is 'glassy', with many local minima uncorrelated from the memory pattern embedded in the Hebbian synaptic matrix.We showed that our (modified) Mean-Shift clustering is indeed effective in revealing features of the energy landscape from very noisy time series.Besides, knowledge of the clusters' centroids allowed an efficient parametrization of the synaptic matrix, which in turn allowed a much better result when inferring the synaptic efficacies from the network's sampled activity.
We should remark that the generality and robustness of this latter result is still to be thoroughly investigated; in the case shown here, the mathematical form of the inference model is similar (though not equal) to the one of the model network generating the data used for inference.While this is indeed a limitation shared with most published works dealing with 'inverse Ising' approaches to synaptic inference, in future work we want to systematically explore more generic forms of parametrization of the synaptic matrix (still using information from the clusters extracted from the activity time series).
Furthermore, the proposed approach addresses, though admittedly in a specific context, a general issue.Whatever the specific method adopted, the extent to which the inference of single synapses can be trusted can be severely affected by several factors, like inherent inadequacy of the model used for inference, poor quality of the data and noise, limited data sample.Still, it is usually (explicitly or implicitly) assumed that even when the inference procedure fails to match single synaptic efficacies, if the synaptic matrix has a global structure it should still be captured in the inferred matrix.This would almost inevitably call for some kind of dimensional reduction of the inferred synaptic matrix, such that the informative relevant 'mesoscopic' structure is retained.In a sense, what we propose here can be viewed as a way to formulate a 'mesoscopic' inference problem in the first place; again, we cannot claim full generality here, but our clear success in the case of attractor networks examined in the present work makes the approach, we believe, an interesting option to be further explored.
Wanting to move to more realistic network models, we were naturally led to networks of spiking neurons (Integrate-and-fire, with SFA); in the Introduction and in the Results we provided motivations for a specific choice of the network architecture as composed of weakly coupled, individually bistable neural populations.In order to make contact with the study of clustering in the Hopfield network and, more importantly, to have some a priori knowledge of the effective landscape of the spiking system, we also wanted to set up the inter-module couplings such that the spiking network as a whole would share static properties of the state space with an Hopfield attractor network.This need motivated us to develop a procedure that determines the inter-modular couplings such that the network's activity generates a prescribed pattern of pairwise spatial correlations.
The procedure involves a new algorithm that extends the domain of appli-cability of Boltzmann learning, and uses Rprop learning; as already remarked, we believe this approach has a value beyond the specific purpose it served in the present work.It rests on the intuition that between excitatory synaptic connection strength and neuronal activity correlation a monotone relationship should hold.On a more general level, this is just an instance of a strategy (which is also a human ability) to identify the relevant variables in a problem and code them in such a way that they have a conditionally monotone relationship with the relevant observables or, more specifically, with the statistic deemed sufficient for the problem under exam [29].Such ability, coupled with the proven effectiveness in a variety of contexts of optimization algorithms that depend only on the sign of the derivative of the function to be optimized (as Rprop) [30,31], can make the idea behind the proposed algorithm robustly generalizable to a wide array of problems, dealing both with static and dynamic properties of neuronal networks, e.g. by taking into account spatial as well as temporal correlations.
Finally we went back to dynamics.A natural step was to substitute the original multidimensional time series with the corresponding 'symbolic dynamics' of centroid labels, and to ask whether the resulting paths in the centroid space would allow to extract information on the system's original dynamics that would be difficult to directly expose.A case in point, in our context, was to inspect how the sequences of transitions between centroids in the spiking modular network would depend on the strength of SFA.SFA introduces 'memory' in the dynamics, and higher SFA makes the original time series more history-dependent, and the corresponding symbolic dynamics of centroid labels is expected to be less Markovian.As discussed in the text, the pattern of transitions between centroids results from an interplay, in the original time series, between noise, spatial overlaps between attractor states of the multi-modular network, and SFA-dependent effects.
To quantify the memory-related complexity of the network activity, we defined a measure based on Lempel-Ziv (LZ) complexity (inspired to previous work in various scientific domains, including neuroscience): for different time scales of SFA, we compared the LZ complexity of the centroid time series with surrogate Markov sequences with the same transition probabilities, finding that, as expected, longer timescales of SFA correspond to less complex (and less Markovian) centroid sequences.We also provided insight into such SFA-dependent non-Markovian nature by studying the occurrence probability of selected subsequences.
Again, while we illustrated in some detail this reduction to symbolic dynamics and the analysis of its complexity in the specific case under consideration, its value rests with its generic applicability to multidimensional time series.
A few closing remarks, to facilitate comparison with other approaches to the characterization of multidimensional time series in neuroscience (including Hidden Markov Models, that recently have been frequently used in the analysis of neural data (see e.g.)).
First, the proposed state-space approach is free from bias towards spherical clusters and from a pre-defined number of clusters.Such freedom is inherent in the approach taken here, which is also quite easy to implement.
Second, for large data sets the density-based clustering approach can become computationally expensive, and of course it is meaningful to try to speed it up.A recent successful attempt was made in [6], where a preliminary distance-based selection procedure excludes outliers (points with low local density) from the iterative procedure.The validity and performance of the approach is tested in a variety of benchmark data sets, including the Olivetti face dataset [32].In comparing to the present work, we remark that on the one hand we did not make an effort towards computational efficiency; rather, we wanted to show the potential of density-based clustering for the analysis of neural data from multiple simultaneous recordings.On the other hand, we tried the method proposed in [6], and checked that it performs poorly in several representative situations, due to the high level of noise.A reasonable strategy would probably be a mixed method which first performs a number of iterations (dependent on the noise level) of the mean-shift algorithm (to 'clean up' enough the data distribution in the configuration space), followed by the faster procedure described in [6].

Clustering in the state space: a modified Mean Shift Algorithm
We start with a data set {x 1 , x 2 , . . .x M } consisting of M (real valued) vectors in a N -dimensional space.The Mean Shift (MS) algorithm [Fukunaga Hostetler][7] iteratively picks a random i (1 ≤ i ≤ M ) and at step t updates the vector x i according to the rule: where w(•) is a non increasing function of its argument, dist(x t i , x t j ) is a distance measure between vectors, and the original vector set is {x 0 i }.In other words, at each step x t i is replaced by a weighted average of the other points x t j : points that are closer to x t i are given more weight, thus defining a "soft" neighborhood of x t i (see Fig. 2).Therefore the update rule effectively moves x i towards regions of the N dimensional space where the (local) density of points is higher at step t.The other M − 1 vectors are instead left unchanged (x t+1 j = x t j for j = i); though a parallel, deterministic update rule is in principle possible applying Eq. 5 to all the data points at the same time, the random, sequential update just described proved more robust in practice.
The iteration terminates upon a threshold condition: when all x i (t) belong to a discrete space, the iteration terminates when the fraction of cases (out of the last M ) in which x i (t + 1) = x i (t) is below a fixed threshold; in the continuous case, an additional threshold on a minimal displacement is required.Clusters are finally determined as sets of initial points x 0 i absorbed by a same final point (their number we call the 'mass' of the centroid); each final point can be taken as representative of a different cluster (its 'centroid'.) The choice of the weighting function w(•) is central to the MS algorithm.Such function can in general depend both on the step t and the configuration of the neighborhood, thus adapting to different conditions.We used a step weight function: The choice of the radius r is critical; on the one hand, a small radius can make the algorithm too sensitive to local variation of density (e.g.due to noisy data), leading to many non-representative clusters; on the other, as the radius increases, the reduced sensitivity to local fluctuations can lead to merging different meaningful clusters.We therefore resorted to an adaptive heuristic criterion to select r.At each step t, we compute the standard deviation σ(n) of the distances between x t i , the point selected for update, and its n nearest neighbors.We then take n min = arg min n σ(n), and set r as the distance between x t i and its n min -th nearest neighbors.We found such heuristics to work well in our case of discrete (hyper cubic) space, for different data sets and for a wide distribution of cluster sizes.
This criterion for determining the radius, together with the additional MS clustering performed on the centroids found in a first run (see Section 2), constitute a modification of the original MS algorithm, which we found advantageous.
In the MS algorithm the initial ordering of data points is irrelevant, as they are treated as samples of a static distribution.For time series this means that the time structure of the data is completely ignored and points that are distant in time can end up being clustered together, e.g. as the result of subsequent 'jumps' of the system in a same region of the state-space.Nonetheless, as we saw in Section 2.4, segmenting the original time series in terms of transitions from one cluster to another can highlight relevant dynamical features (nonstationarities, memory effects) in the data.

Inference
Specifically, we will assume for the system under consideration an effective Isinglike energy function: We propose to decompose the effective coupling matrix J in a set of weighted Hopfield-like terms c µ i c µ j (see Eq. 2): where the vectors c µ are the C centroids identified by the clustering procedure, and the weights ω µ are to be found through an optimization procedure.Notice that, when the ωs are roughly equal, they play a role similar to an inverse temperature; indeed, we found that the value of β used in the Hopfield simulations is very close to the ω values for the centroids corresponding to the patterns.
For the optimization we use a recently introduced technique, Minimum Probability Flow (MPF) [33], with the advantage that it does not rely neither on Monte Carlo (MC) runs (like Boltzmann learning does) nor on mean field approximations (like most Inverse Ising methods do [34]).It is based on the maximization of a function whose maxima are close to the Boltzmann Likelihood ones, but with no needs of performing computationally expensive Gibbs samplings.
Whilst Boltzmann learning aims to minimize the distance (more precisely the Kullback-Liebler divergence) between the distribution of the data and the distribution of the model, MPF aims to minimize the distance between the distribution of the data and the model distribution after an infinitesimal Monte Carlo step, starting from the data.As the name of the algorithm suggests, this amounts to minimizing the outflow, determined by the model, of probability from the data.Since the latter is expressed only through the transition probabilities, performing the Monte Carlo is not actually needed, and the gradients of the objective function can be explicitly computed.
For completeness, we briefly summarize the main points in the MPF approach.The model probability density p, dependent on a set of parameters θ, it is assumed to evolve according to the master equation: The transition probabilities Γ are assumed to satisfy the detailed balance condition: The Maximum Likelihood estimate θ of the parameters, given the data, is: MPF proposes to consider instead, for infinitesimal ǫ: It was proven in [33] that: The MPF strategy is therefore to search for:

Spiking network and Pseudo-Boltzmann learning
The single neuron obeys the sub-threshold dynamics: with the additional condition that if V i ≥ V th , the emission of a spike is recorded and V i is reset to a value H for the duration of a refractory period t ref .In Eq. 15, V i is the membrane potential of neuron i, τ is the membrane time constant, J ij is the synaptic efficacy from neuron j to neuron i, k j label spikes emitted by neuron j, d ij is the spike transmission delay from j to i.  1. Inter-module synapses are only between excitatory neurons, and can be positive or negative, which can can be viewed as an effective way to also incorporate the excitatory input to inhibitory neurons; their mean efficacies are determined through the learning procedure described below.
All the synapses in the network, given their (learnt or assigned) mean efficacies, are drawn from a Gaussian distribution 25% relative variance.
The probability of connection between two excitatory neurons belonging to different modules is 0.5 while the delays for inter-module communication have d max EE = 50ms.
To choose the intra-module synaptic connectivity such that the isolated module is approximately bistable, we used predictions from mean-field theory [8].The bistable behavior and a narrow range of firing rates for the low and high activity states proved to be robust against quenched noise due to different realizations of the synaptic matrices.This allowed us to define a threshold θ bin on the firing activity to binarize the dynamic states of the modules (the binarization is actually performed on a smoothed version of activity by averaging over a 20 ms window).In this way, the dynamics of the network is represented by a sequence of binary vectors corresponding to the high or low state of all modules.The procedure to construct the inter-module connectivity (see below) preserves to a large extent the modules' bistability for the interconnected network.
Numerical simulations were performed using the event-driven simulator we described in [35].
The procedure to construct inter-module synaptic connectivity aims to "store" prescribed patterns of activities using correlations between the binarized network state vectors to build an effective Hopfield-like synaptic matrix.To store patterns in the modular network we start from a Hopfield model with given P stored patterns, from the simulation of which we estimate the mean single spin magnetization m Hopf p ≡ σ p t , and spatial spin-spin correlation functions c Hopf pq ≡ σ p σ q t − σ p t σ q t , where t is the average over the whole time series.We then iteratively minimize the difference between m Hopf p and c Hopf pq , and the corresponding quantities m p and c pq measured from the sequence of binarized states of the multi-modular spiking network (such that p and q are module labels).Such minimization is performed adopting an educated guess inspired by Boltzmann learning that would be appropriate for a binary spin system, and that we verify a posteriori.Specifically we implement the following procedure: • Run a long simulation of the multi-modular system • Binarize the activity of each module, obtaining a time-series of binary state vectors, from which • Measure the spatial connected correlations c pq and magnetizations m p • Perform a step of the pseudo-Boltzmann iteration over the values of J pq and ν ext p , where J pq is the average synaptic efficacy from neurons belonging to module q to neurons belonging to module p; and ν ext p is the common value of ν ext for all neurons in module p In essence, Rprop adapts the learning rate for each parameter only using the sign of the derivatives: when the sign of the derivative does not change between two successive iteration steps, learning should speed up, while when the sign of the derivative changes it should slow down.
We checked that, although every simulation during the training procedure starts from the same exact condition, discarding the first 5 × τ SF A steps of each run is enough to ensure that no appreciable correlations are detectable between different realizations.

Complexity measure of dynamics in the centroid space
After performing MS clustering on the time series generated by a simulation of the multi-modular spiking network with given τ SF A and g SF A , to each of the 64dimensional vectors containing the modules' firing activities we substitute the label of the centroid that vector belongs to.In this way, the multi-dimensional dynamics is converted into a symbolic sequence of centroid labels.
From such label sequence we measure the matrix of transition probabilities between all pairs of centroids, and generate surrogate label sequences as Markov processes with the same transition probabilities as the actual sequence.While, by construction, the surrogate sequences are memoryless, this may not be the case for the actual one.We focus on such possible memory effects, which are in general interesting to uncover, and for this purpose we adopt a measure typically used to evaluate the complexity of symbolic sequences, the Lempel-Ziv (LZ) complexity, which is a measure of compressibility, suitably normalized in order to eliminate trivial dependence on the length of the sequence.
The LZ complexity is computed as [24] C LZ = S LZ |S| log(|A|)/ log(|S|) (19) where S LZ is the length of the compressed sequence, |S| is the length of the sequence, |A| is the length of the alphabet.And the relative complexity index is defined as:

Figure 2 :
Figure 2: Left column: The horizontal axis represents the time in MCS, while the vertical axis identifies the different units.White (black) pixel in the matrix entry (i, j) means that the i unit had a value -1 (1) at MCS j.Right column: Histogram of the overlaps (see footnote 1) between the MC configurations.

Figure 3 :
Figure 3: Clustered configurations for the raster plots in Fig. 2. White is for units with value -1, while in colors (different for different clusters) are indicated units with value 1.The clusters whose centroids correspond to stored patterns or their reflections are marked by red dots.The centroids of the clusters appearing in the upper panel are a subset of those in the lower panel.Equal numbers refer to the same centroid.The clusters' masses can change due to finite size sampling, and clusters in each figure are ordered from biggest to smallest masses.

Figure 4 :
Figure4: Left: Inferred values of the weights ω α when fitting the reduced model (Eq.8) to the raster plot shown in the lower column of Fig.2).The red stars mark the weights corresponding to the patterns actually stored in the system.Right: Scatter plot of the inferred (reduced model) synaptic couplings against the actual ones.The red curve shows the identity as a reference

Figure 5 :
Figure 5: A posteriori probabilities of inference errors for the cases of the reduced model (red curve) and the full matrix inference (blue curve).Details in the text.

Figure 6 :
Figure 6: Sketch of the main steps involved in the network construction.See text for details.

Figure 7 :
Figure 7: Dependence of the complexity measure R vs τ SF A .Red line: R computed for the Markovian sequence generated by the reference Hopfield system; blue line: R vs τ SF A for the actual sequence.

Figure 8 :
Figure 8: Left: for all triplets occurring more than 10 times, the blue points are the estimated probability from the actual sequence vs the computed Markov probabilities from the estimated transition probability matrix; green crosses are the estimated probability from the actual sequence vs the estimated probability from the Markov surrogates.Right: the Kullback-Leibler distance between the sampled triplets distributions from the actual sequence and the surrogate Markov sequence, as a function of τ SF A .
and it is related with the Hamming distance h αβ by q αβ = 1 − 2 h αβ N .
The neuron model includes spike-frequency adaptation (SFA) in the form of a calcium-dependent inhibitory (potassium) current: g SF A is the strength of SFA, c i represents the intra-cellular calcium concentration, and τ SF A is a characteristic time of calcium dynamics.The c i variable acts as an integrator of the spiking activity of the neuron i, such that SFA implements an activity-dependent self-inhibition.I i is an external current applied to the neuron, implemented as a Poisson process with rate ν ext i .The network comprises 64 modules.Each module includes n E = 32 excitatory (E) neurons and n I = 16 inhibitory (I) neurons.E/I neurons within a module are connected with probabilities c XY , X, Y = E/I; each E/I neuron also receives C ext E/I external spike trains through synapses with mean J ext E/I , and with rate ν ext init (init refers to the fact that external rates are set to an initial value, then they are optimized during the learning process -see below).Delays d ij are drawn from an exponential distribution between d min XY d max XY , X, Y = E/I, where d min XY = 0.1ms always.The values of these parameters are given in Table