## Figures

## Abstract

Sorting spikes from extracellular recording into clusters associated with distinct single units (putative neurons) is a fundamental step in analyzing neuronal populations. Such spike sorting is intrinsically unsupervised, as the number of neurons are not known a priori. Therefor, any spike sorting is an unsupervised learning problem that requires either of the two approaches: specification of a fixed value k for the number of clusters to seek, or generation of candidate partitions for several possible values of c, followed by selection of a best candidate based on various post-clustering validation criteria. In this paper, we investigate the first approach and evaluate the utility of several methods for providing lower dimensional visualization of the cluster structure and on subsequent spike clustering. We also introduce a visualization technique called improved visual assessment of cluster tendency (iVAT) to estimate possible cluster structures in data without the need for dimensionality reduction. Experimental results are conducted on two datasets with ground truth labels. In data with a relatively small number of clusters, iVAT is beneficial in estimating the number of clusters to inform the initialization of clustering algorithms. With larger numbers of clusters, iVAT gives a useful estimate of the coarse cluster structure but sometimes fails to indicate the presumptive number of clusters. We show that noise associated with recording extracellular neuronal potentials can disrupt computational clustering schemes, highlighting the benefit of probabilistic clustering models. Our results show that t-Distributed Stochastic Neighbor Embedding (t-SNE) provides representations of the data that yield more accurate visualization of potential cluster structure to inform the clustering stage. Moreover, The clusters obtained using t-SNE features were more reliable than the clusters obtained using the other methods, which indicates that t-SNE can potentially be used for both visualization and to extract features to be used by any clustering algorithm.

**Citation: **Mahallati S, Bezdek JC, Popovic MR, Valiante TA (2019) Cluster tendency assessment in neuronal spike data. PLoS ONE 14(11):
e0224547.
https://doi.org/10.1371/journal.pone.0224547

**Editor: **Gennady Cymbalyuk,
Georgia State University, UNITED STATES

**Received: **January 4, 2019; **Accepted: **October 16, 2019; **Published: ** November 12, 2019

**Copyright: ** © 2019 Mahallati et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **Dataset-1 is available at https://www135.lamp.le.ac.uk/hgr3/. Dataset-2 is available via public database at CRCNS under doi: http://dx.doi.org/10.6080/K02Z13FP.

**Funding: **This research was supported by the Natural Sciences and Engineering Research Council of Canada. MRP was funded by Dean Connor and Maris Uffelmann Donation, Canadian Fund For Innovation and Natural Sciences and Engineering Research Council: Discovery Grant (RGPIN-2016-06358).

**Competing interests: ** The authors have declared that no competing interests exist.

## 1 Introduction

Recording of extracellular signatures of action potentials, referred to as spikes, is a standard tool for revealing the activity of populations of individual neurons (single units). Single unit activity contains fundamental information for understanding brain microcircuit function in vivo and in vitro [1]. Inferences about network activity can be made by identifying coincident activity and other temporal relationships among spiking patterns of different neurons [2]. However, the reliability of these inferences is strongly influenced by the quality of spike sorting, i.e., the detection and classification of spike events from the raw extracellular traces with the goal of identifying single-unit spike trains. Poor sorting quality results in biased cross-correlation estimates of the spiking activity of the different identified units [3].

The typical workflow for spike sorting includes spike detection, feature extraction, and clustering. While detection is pretty straightforward and can be efficiently done with simple thresholding, the feature extraction and clustering procedures are far from being satisfactorily settled [4]. It has been estimated that single or tetrode type electrodes (i.e. impedance< 100*K*Ω) can record neuronal activity within a spherical volume of 50 *μ*m radius with amplitudes large enough to be detected (> 60*μ*V). This volume of brain tissue constitutes about 100 neurons. While noting that many neurons are expected to be silent [1, 5], commonly, not more than a handful identified neurons are reported per electrode. Studies on current sorting algorithms used for individual electrode recordings have shown that they are limited in distinguishing 8 to 10 out of 20 units with less than 50% false positive and false negative rates [6, 7]. Other methods using high density electrode arrays reported simulations with no more than 10 units [8, 9].

Since we can’t physiologically verify how many neurons have been recorded, assigning the spikes within a recording to individual neurons remains a fundamental technical issue. The sorting is in essence an unsupervised learning challenge. Therefore, methods require one of two approaches: specification of a fixed value of the number of clusters to seek (*c*); or generation of candidate partitions for several possible values of c, followed by selection of a best candidate based on various post-clustering validation criteria. Moreover, improving spike classification to correctly identify cell types is a topic of interest highlighted by initiatives that aim to characterize and reconstruct different cell types in the brain and their role in health and disease [10, 11]. For that goal, [12] list the identification of the number of clusters as the first outstanding question in techniques for neuronal classification.

In summary, identifying the spike trains of individual units within a recording is a three-faceted problem: (i) assessing the cluster tendency in the pre-clustering phase (before initializing any clustering algorithm); (ii) clustering (i.e. finding partitions of the data); and (iii) evaluation of the validity of the clusters that have been isolated, post-clustering [13]. Spike sorting algorithms usually start by projecting the data to a lower dimensional space. There are several reasons to do this. For example, lower dimensional data usually reduces such as reduction of computation time. In this paper, the fundamental reason for dimensionality reduction (essentially to two or three dimensions) is that 2D and 3D projections allow visualization of high dimensional input data. In turn, this facilitates the choice of a few selected values of the integer c. Pre-specification of k is needed by almost all clustering algorithms as an input parameter (hyper parameter). In algorithms such as density based clustering or mean shift, the choice of k is implicit in the choice of parameters such as *γ* (cluster density times cluster distance) or *h* (the bandwidth parameter), respectively. In practice, since reduced dimensionality embedding of the data often does not provide visually well separated clusters, it is common to exclude large number of spikes and only manually include (also known as cluster cutting) a small core portion of the subsets to achieve well-isolated clusters [14]. Omitting spikes to obtain well-separated clusters may lead to single units with recognizable spike waveforms, but it discards spikes that, as mentioned before, are fundamental for analyses of temporal structure of spiking activity [15, 16]. Therefore, the bottleneck in spike sorting is at the pre-clustering stage: viz., inaccuracy of the assumed data structure that is inferred by visualization of it in the lower dimensional space. If clustering is to be done in a lower dimensional data space, errors here will affect both the initial estimate of the cluster number and the performance of the clustering algorithm. Thus, this study concerns itself with visual assessment in the pre-clustering stage.

We compare the visualization of neuronal spike data created using six methods (i) three well-known dimensionality reduction techniques: principle component analysis (PCA), t-student stochastic neighborhood embedding (t-SNE) and Sammon’s algorithm, (ii) two methods that extract features from the waveforms: wavelet decomposition and features such as peak to valley amplitude and Energy (PV-E), and (iii) a method that operates directly on waveforms in the input space: improved visual assessment of tendency (iVAT). The analysis is performed on two different types of ground truth data (labeled data): simulated spike sets and real recorded spike sets, called dataset-1 and dataset-2, respectively. Our results indicate that iVAT often suggests a most reasonable estimate for the primary (or coarse) cluster structure, while t-SNE is often capable of displaying finer cluster structure. While iVAT is only a visualization tool, t-SNE can be used for both visualization and to extract features. We provide an objective measure of comparison between t-SNE and the other methods, we evaluate the quality of partitions obtained by clustering in the upspace (input dimensional space; i.e., the waveforms) and also in the five two-dimensional representations. This test is performed by running k-means and generating a number of clusters equal to the actual (i.e. known) number of units. The quality of the partitions generated by each method is evaluated with *Dunn’s index (DI)* (an internal index describing the intrinsic quality of the generated clusters); the generalized Dunn’s index GDI_{33}; and the *Adjusted Rand’s index (ARI)* (an external measure of agreement between computed partitions and the ground truth partition).

The outline of the paper follows: Section 2 describes the datasets that we used (2.1), defines the problem (2.2), explains the methods used for data visualization (2.3 and 2.4), and lastly describes the measures used for evaluating clustering structure of the data (2.5). The results of the experiments on the datasets are given in section 3. Insights gained from the experiments are summarized in section 4.

## 2 Materials and methods

### 2.1 Datasets

The importance of model data or ground truth data, where the label or membership of each spike to an individual neuron is known, for spike sorting validation is emphasized by [17]. We use two labeled datasets: our first experiment uses simulations of extracellular traces as model data or surrogate ground truth data (hereafter called dataset-1), and the second experiment uses data obtained from in-vivo experiments as real ground truth data (hereafter called dataset-2).

**Dataset-1** Pedreira et al. [6] simulated extracellular traces that contain the activity of 2 to 20 neurons with additive background noise. The spike activity is generated by using 594 average spike waveforms (templates) compiled from Basal Ganglia and Neocortical recordings. The background noise (i.e., LFP noise) is simulated by superimposition of thousands of spikes at random times which were then scaled down to a standard deviation of 0.1. Each simulated trace also contains multi-unit activity, which was generated by superimposing 20 waveforms with normalized amplitudes limited to 0.5. The amplitude of single unit spikes had a normal distribution with mean of 1.1 and limited to range of 0.9 and 2 to reproduce the signal to noise ratio of the real recordings. Dataset-1 thus provides us with simulated extracellular traces each containing 3 to 21 subsets of spikes. For each cluster number *k*, five simulations using different sets of templates were generated (for a total of 90 datasets). The spikes were detected by voltage thresholding. The length of each spike waveform is 2 ms, with a sampling rate of 24 kHz, comprising 48 sample points. Thus, the upspace dimension for subsets in Dataset-1 is 48.

**Dataset-2**: We used the in vivo simultaneous intracellular and extracellular hippocampal recording datasets that are publicly available from the CRCNS website [18]. These are raw data of simultaneous intracellular and extracellular recordings from neurons in the CA1 region of the hippocampus of anesthetized rats. The experimental procedure consisted of inserting extracellular electrodes (either tetrodes, 13-*μm* polyimide-coated nichrome wires, or a single 60-*μm* wire) into the cell body layer of CA1, confirming the presence of unit activity in the recordings, and then inserting intracellular sharp-electrodes into the same region in close proximity to the extracellular electrode to impale a single neuron and record stable action potentials induced by current injections. With this method, it was possible to capture simultaneous spikes in the intracellular and extracellular recordings. We detected the intracellular spikes, and used those spike times to extract the extracellular spike train of that neuron (i.e. a labeled subset of spikes). Each spike waveform is 2 ms long which, with a sampling rate of 40 kHz, results in 80 sample points, so the upspace dimension of each spike and subsequent sets in dataset-2 is 80. Dataset-2 is valuable because each spike waveform in the subset is a recorded signal from a physiological setting; hence, the variability in the probability distribution of each subset comes from either natural (e.g. the effect of other current sources in the extracellular medium) or experimental conditions (e.g. electrode drift). Each recorded trace has one labeled subset of spikes. Hence, to generate each mixture, we used the extracted spike subsets from different traces. In total, we obtained 9 subsets of spikes from the database and then used combinations of 2,3,4,… to 9 of these subsets to create datasets containing spikes of 2 or more neurons (for a total of 502 datasets).

In summary, the data for our experiments were mixtures of subets of spikes each with different population size in either 48 or 80 dimensional input space, for dataset-1 and dataset-2 respectively. All the 9 subsets of waveforms used to create the mixtures in dataset-2 are displayed in S1 Fig in supplementary material. The 594 templates that are used to make the 95 mixtures in dataset-1 are available publicly through the creators [6].

### 2.2 Problem definition

Let *X* = {**x**_{1}, …‥**x**_{n}} ⊂ ℜ^{p} denote a set of vector data representing *n* spikes generated by one or multiple neurons. The coordinates of **x**_{i} are voltage samples that describe a spike event (they are always voltage samples in this article). The *k*-partitions of the *n* objects in a set *X* can be represented by a *k* × *n* matrix *U* in *M*_{hkn}, written in terms of the *k* crisp subsets of it (the clusters X_{i}) as
(1a)
(1b)

Here is an example. Suppose *X* = {*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}} is partitioned into 3 subsets, as shown at (1b). An equivalent matrix representation in the notation at (1a) is
The partition matrix U is in *M*_{h34}, the h denoting hard (not soft), 3 for the number of subsets, and 4 for the number of objects. More generally, *M*_{hcn} is the set of all hard partitions on n objects into k subsets.

Finding clusters in *X* comprises three steps: deciding how many clusters (*c*) to look for; constructing a set of candidate partitions {*U* ∈ *M*_{hcn}} of X; and selecting a “best” partition from CP (cf. Eq (2) below) using a *cluster validity index (CVI)*.

### 2.3 Dimensionality reduction and feature extraction

Data vectors in ℜ^{p} usually have high dimensionality (*p* > 3) (e.g., images, videos, and multi-variate data streams). Feature selection and dimensionality reduction algorithms are used to (i) make pre-clustering visual assessment of the structure of the data and (ii) to improve the performance of data-driven procedures, such as those for classification and clustering. Typical approaches for these procedures include those discussed in [19] and [20]. Methods for obtaining an “optimal” set of features (the intrinsic dimension of the input data) for clustering abound. Campadelli et al. [21] have a good survey of such methods. We point out finding the intrinsic dimension of a data set is not directed towards visual assessment, since the intrinsic dimension (there are many competing algorithms that find different features) is usually greater than 2 or 3.

Here we focus on visualization based on several well-known dimensionality reduction algorithms that have been used in a multitude of domains including neurosciences; we refer the interested reader to [20] and references therein for technical details.

Principal component analysis (PCA) is one of the most important and widely utilized linear dimensionality reduction techniques [22]. In order to find a low-dimensional subspace that accounts for the maximum amount of projected variance, PCA projects the data along the directions given by the leading eigenvectors of the data covariance matrix, i.e., the directions associated with the largest eigenvalues of the sample covariance matrix.

In neuroscience research, another common approach is to extract features of the waveforms that have a physical meaning such as waveform’s negative or positive amplitudes, known as valley and peak, their ratio or width, or waveform’s energy, among others [23, 24]. For our experiments we selected peak to valley (PV), and energy (E), hereby called PV-E. We remark that the PV-E features contain essentially the same information as the min peak-max peak features used by [25]. Another method based on wavelet transforms that enables visualizing the data in the wavelet coefficient subspace has also been implemented in clustering packages such as Waveclus and Combinato [7, 26].

We also consider two nonlinear dimensionality reduction techniques. The first of these is t-SNE (t-student Stochastic Neighbor Embedding), developed by [27]. It works by converting Euclidean distances between high-dimensional input data into conditional probabilities. In doing so, t-SNE converts the geometric notion of similarity into a statistical concept: if *x*_{j} is a neighbor of *x*_{i}, then the conditional probability *p*_{j|i} is high. Then, t-SNE finds low-dimensional representations *y*_{i} and *y*_{j} of *x*_{i} and *x*_{j} by minimizing the discrepancy between the upspace *p*_{j|i} and downspace conditional probabilities *q*_{j|i}, technically achieved by minimizing the Kullback-Leibler divergence between them. The objective of t-SNE is to minimize the sum of the divergences over all the data points. The downspace dimension is a choice made by the user.

Two features of t-SNE should be noted. First, it is not a linear projection like PCA but rather has a non-convex cost function, so its output may be different for different initializations. Second, it is a parametric technique. Different settings of hyperparameters such as the learning rate, the *perplexity*, and the iteration rate in the t-SNE algorithm generate different maps in the scatterplots, and may cause misinterpretation of the data structure [27].

The main parameter that affects the results of t-SNE is the perplexity, which is the limiting condition for the entropy of the probability distribution of the similarities of datapoints in the upspace. This means that the variance of the Gaussian that is centered over each datapoint, i.e., the extent of the neighborhood around that point, is limited by the choice of perplexity.

This limitation affects each datapoint separately based on the local density of the data. This is the feature that enables t-SNE to avoid crowding points in the center of the map so that cluster structure of the data in the upspace data is often seen in the t-SNE downspace projection. This feature, however, comes at the cost of sacrificing the shape of the distribution so that the distances between the clusters may not be meaningful. In other words, it is not possible to infer reliable spatial information from the topology of the low-dimensional maps.

Fortunately, the topology is not relevant for our application: viz. suggesting clusters in the neuronal waveform data. The optimal choice of perplexity is dependent on the number of points (spikes) in the dataset. We found that for neuronal datasets with thousands of spikes (data points), as long as the extreme values in the parameter ranges are not selected, the t-SNE algorithm is not very sensitive to changes in perplexity. On the other hand, the reliability of t-SNE visualizations seems to decrease as the number of samples decreases. See [28] for an example. We tried a large range of parameters for our datasets and selected a perplexity value of 30, learning rate of 500 and an iteration limit of 2000 based on the visual quality of the t-SNE embeddings.

We also consider another traditional nonlinear dimensionality reduction technique called the Sammon mapping [29], which is one form of multidimensional scaling. Multidimensional scaling (MDS) seeks a low dimensional embedding of the input data while preserving all pairwise Euclidean distances (In a more general setting, t-SNE can be interpreted as a form of probabilistic MDS). However, high-dimensional data usually lies on a low-dimensional curved manifold [30]. In such cases, preserving pairwise Euclidean distances will not capture the actual neighboring relationships: the actual distance between two points over the manifold might be much larger than the distance measured by the length of a straight line connecting them, i.e., their Euclidean distance). Sammon mapping improves upon classic multidimensional scaling by directly modifying its original cost function, i.e., the distortion measure to be minimized. In particular, the Sammon mapping cost function weights the contribution of each pair of data points relative to the overall cost by taking into account the inverse of their pairwise distance in the original high-dimensional input space. In this way, Sammon mapping often preserves the local structure of the data better than classical multidimensional scaling.

While these five methods do not all produce lower dimensional data with an analytic projection function, we will call all downspace data sets projections.

### 2.4 iVAT

There are a number of imaging techniques that can be applied directly to the upspace data before clustering it. Here we describe the iVAT method described in [31], which is a generalization of the original VAT algorithm given by [32]. *Improved Visual Assessment of Tendency* (iVAT) is a visualization tool that uses any dissimilarity matrix, D, of the data to display potential cluster structure. The steps of the iVAT method are the following. The vectors in the dataset are represented as vertices in a complete graph, with the distances between them the weights of the graph. The algorithm first finds the longest edge in the graph. Then, starting at either end, it finds the minimal spanning tree (MST) of D based on Prim’s algorithm [33]. Then, it reorders the rows (and columns) of D based on the order of edge insertion in the MST, creating D* (up to this point this is the original VAT algorithm). Then, iVAT transforms D* to D’* by replacing each distance *d*_{ij} in D* with the maximum edge length in the set of paths in the MST between vertices *i* and *j*. When displayed as a gray-scale image, I(D’*), possible clusters are seen as dark blocks along the diagonal of the image. Images of this type are often called *cluster heat maps* in the neuroscience literature [34, 35].

iVAT is not a clustering method or measure of performance for clustering algorithms, but a method to visually extract some information about the cluster structure from the input space before application of any clustering algorithm. iVAT does not alter the physical meaning of the input data (even after the shortest path transformation), it just rearranges the objects in a way that emphasizes possible cluster substructure. The recursive computation of *D*′* given in [31] is *O*(*n*^{2}). S1 Appendix contains the pseudocode for iVAT. The iVAT algorithm requires no parameters to pick other than the dissimilarity function (d) used to convert X to D. This input matrix can actually be a bit more general than a true distance because its only requirements are that *D* = *D*^{T}; *d*_{ij} ≥ 0∀*i*, *j*; *d*_{ii} = 0∀*i*. The most important points about this display technique are that it is applied directly to (a distance matrix of) the upspace data, so there is no distortion of the structural information introduced by a feature extraction function from the upspace to a chosen downspace, and iVAT preserves the physical meaning of the measured features. While any vector norm can be used to build an input matrix D(X) from a set X of feature vectors, the only distance used in this article is Euclidean distance. It is very important to understand that an iVAT image merely suggests that the input data has a certain number of clusters. Since iVAT can produce images from data of arbitrary dimensions, we can use it (or its scalable relative siVAT, [36]) to make a visual estimate of possible cluster structure in any upspace. While the iVAT algorithm is occasionally “wrong” (misleading), iVAT images usually provide some idea about the cluster structure of the input data [13].

Thus, iVAT provides clues about potential starting points for finding a useful partition of the input data. [28] have shown the connection of VAT and iVAT to Dunn’s index and single linkage (SL) clustering. The intensity of the blocks in iVAT images are a (more or less) visual representation of the structure identified by single linkage clustering for labeled or unlabeled data. This suggests that iVAT might be regarded as a tool for “taking a peek” at a specific type of data structure in the input space.

### 2.5 Evaluating cluster quality

Cluster validity comprises computational models and algorithms that identify a “best” member amongst a set of *candidate partitions (CP)*
(2)
where *c*_{m} and *c*_{M} are the minimum and maximum specified values of the numbers of clusters sought.

The approach to identify a “best” partition U (and concomitant value of *c*) in *CP* can be internal: using only information from the output results of the clustering algorithm, or external: using the internal information together with an outside reference matrix, usually the ground truth labels. Here, we use a classic internal scalar measure called *Dunn’s index*(DI) [37], and one generalization of it given by Bezdek and Pal [1998] called the generalized Dunn’s index (GDI_{33}). Dunn defined the diameter of a subset *X*_{k} as the maximum distance between any two points in that subset (△(*X*_{k})), and the distance between subsets *X*_{i} and *X*_{j} as the minimum distance between any two points of the two subsets (*δ*(*X*_{ij})). This index is based on the geometrical premise that “good” sets of clusters are compact (dense about their means) and well separated from each other. Larger values of DI imply better clusters, so we call DI a max-optimal cluster validity index (CVI).

Let *X*_{i} and *X*_{j} be non empty subsets of ℜ^{p}, and let *d*: ℜ^{p} × ℜ^{p} ↦ ℜ^{+} be any metric on ℜ^{p} × ℜ^{p}. Define the diameter △ of *X*_{k} and the set distance *δ* between *X*_{i} and *X*_{j} as:
(3)
(4)

Then for any partition *U* ↔ *X* = *X*_{1} ∪ … .*X*_{i} ∪ …*X*_{c}, **Dunn’s separation index** of *U* is:
(5)

Since we have labeled mixtures, we can calculate Dunn’s index on ground truth partitions in the upspace (input dimensional space) to give a measure of the compactness and isolation quality of the subsets in the original space. We have previously shown that this measure usually correlates with the quality of the visual assessment of potential cluster structure given by iVAT [28]. In the present work, we will also use a generalized version of Dunn’s index developed by [38] that alters the average distance from the mean as △ and the average linkage clustering distance as *δ*. The **Generalized Dunn’s index** (*GDI*_{33}) is:
(6)
(7)
(8)
where is the mean or centroid of the cluster. The notation |*d* in Eqs 3 to 8 for Δ and *δ* indicate that these formulas are valid for any metric d on the input space.

It has been shown that *GDI*_{33} is more robust with regards to sensitivity to outliers and hence produces more meaningful values for real life datasets with abundant aberrant points [39].

Although the focus of this paper is to advocate the utility of pre-clustering visualization of the data, we complemented our analysis with a clustering phase. We applied k-means clustering to our labeled mixtures (both dataset 1 and 2) to illustrate the effect of the dimensionality reduction phase on the quality of spike sorting. K-means is a classic clustering algorithm that takes the assumed number of clusters (k) and assigns cluster labels based on minimizing the squared error between the datapoints and centroids. K-means clustering was applied on the upspace data and in all the 5 downspace projections.

To evaluate the quality of the clustering on different spaces, we used the external **adjusted Rand index (ARI)** developed in [40], which is a well-known and fairly reliable criterion for performance assessment of the clustering results. Let *V* ∈ *M*_{hrn} be the crisp partition of the n objects possessing r clusters, according to ground truth labels. Let *U* ∈ *M*_{hcn} be any crisp partition of n objects with the k clusters generated by any clustering algorithm. Note that r does not necessarily equal c. The ARI is a measure of similarity between U and V, computed as:
(9)
where,

- a = Number of pairs of data objects belonging to the same subset in U and V.
- b = Number of pairs belonging to the same subset in V but to different subsets in U
- c = Number of pairs belonging to the same subset in U but to different subsets in V.
- e = Number of pairs not in the same subset in V nor the same subset in U.

[40] developed this correction to eliminate bias due to chance from Rand’s index. The ARI is also a max-optimal index in the sense that larger values imply a better match between the ground truth and the results of the clustering. As evident from the equation, this formula incorporates simpler measures such as percentage of spikes in the real set present in the sorted set or percentage of spikes in the sorted set that come from the real set.

## 3 Results and discussion

### 3.1 Visual assessment of cluster tendency

#### Analysis on dataset-1.

It is impossible to make a direct visual assessment of a set of recorded spike waveforms *X* = {**x**_{1}, …‥**x**_{n}} ⊂ ℜ^{p}, since each waveform has more than three voltage samples (i.e., dimensions), *p* > 3. The *upspace* dataset X can be mapped to a *downspace* dataset *Y* ⊂ ℜ^{q} by a feature extraction function *ϕ*: ℜ^{p} ↦ ℜ^{q} in many different ways. Dimensionality reduction methods are commonly employed for visualization purposes to gain insights into the data structure; and to provide clustering algorithms with lower-dimensional data to increase the computational efficiency. Next, we will demonstrate that different dimensionality reduction methods provide different scatterplots of the data, and hence, visually suggest different numbers of clusters. Towards this end, we used spike subsets of the simulated dataset that includes 5 simulations for each combination of different number of spike subsets for k from 3 to 21. Below we show some representative results: two cases of k = 3 (one with low Dunn’s index, DI, and one with higher DI) and then one case each for k = 5, k = 10, k = 15 and k = 20.

Fig 1 shows two cases from the dataset with k = 3. The colors in Fig 1 correspond to the three data labels. Bear in mind, as you view this and subsequent figures, that in the real case, the data are always unlabeled, so the projected data will be just one color, and the apparent structure will be much less evident than it seems to be in these figures. Fig 1a is a ‘good’ case in which all the algorithms map the spikes to projections with visually well-separated clusters and iVAT agrees with them (the larger diagonal block contains two less apparent, sub-blocks). In Fig 1b however, all 2D projections except t-SNE produce a single cluster (when plotted without colors), while t-SNE seems most successful in separating the three subsets (arguably, t-SNE shows k = 2 clusters when colors are omitted). The iVAT image suggests k = 2, conforming to the apparent (uncolored) pair of t-SNE clusters. The low value of DI is a warning that there is not much separation between these three subsets of waveforms.

The colors correspond to the three known membership labels, i.e. each color, yellow, blue or green, represents all the spikes from one of the three neurons. Note: In practice the membership labels are not known (hence the figures would be mono-color). On the right, all the spike waveforms are overlaid with the average waveform of each subset (bold black line). Note ref. to description of dataset 1: all waveforms are 2 ms long and with normalized amplitude. And at the far right is the iVAT image. Case (a), fairly well separated subsets with Dunn’s index = 0.168 and (b) subsets with low Dunn’s index at 0.068.

Figs 2 to 5 show representative mixtures of k = 5, 10, 15 and 20 component mixtures. These examples, and many others not reported here, show that iVAT and t-SNE usually provide useful visual estimates of the number of clusters up to around k = 15, but the other methods almost always fail with k = 4 or more subsets. We had 5 cases for each number of subsets (e.g. 5 different cases of mixture at k = 10, etc.) and overall t-SNE provided the most consistent estimate of the presumptive numbers of mixture components. There were some cases for which iVAT failed to display the expected number of dark blocks in mixtures having fewer than 10 components. The block structure in some of the reproduced iVAT images is more apparent at higher resolutions than shown here. Our experiments suggest that iVAT is somewhat sensitive to noise in the waveforms, which often manifests itself as a falloff in intensity towards one end of the diagonal. See Fig 2 for an example.

The spike waveforms are projected on the 2D feature space by: PCA, Peak-to-Valley and Energy features, Wavelet decomposition, t-SNE, and Sammon methods. The colors correspond to the three known membership labels which in real cases are not known. On the right, all the spike waveforms are overlaid with the average waveform of each subset in bold black line and at the far right is the iVAT image. The iVAT image displays 4 clear blocks and some disconnected data due to noise in the lower right: the only projection that clearly shows k = 5 is t-SNE.

From left to right projections (PCA, Peak-to-Valley and Energy features, Wavelet decomposition, t-SNE, and Sammon), the spike waveforms with the average of each known subset in bold black, and the iVAT image. The iVAT image displays 10 blocks; the projection that clearly shows k = 10 is t-SNE.

From left to right projections (PCA, Peak-to-Valley and Energy features, Wavelet decomposition, t-SNE, and Sammon), the spike waveforms with the average of each known subset in bold black, and the iVAT image. The iVAT image displays 9 blocks and t-SNE shows 13 (colored), and 9 or 10 in black.

From left to right projections (PCA, Peak-to-Valley and Energy features, Wavelet decomposition, t-SNE, and Sammon), the spike waveforms with the average of each known subset in bold black, and the iVAT image. The iVAT image displays 13 blocks and t-SNE shows 12.

Overall, from the t-SNE visualizations of the data, we infer that there is a common waveform shape among many cells suggested by the existence of a large central region populated by most of the clusters. However, there are cells that have clearly distinct signatures. The individual clusters exhibit variability: for some neurons the spike waveforms define homogeneous and compact clusters, while others are elongated clusters in the nonlinear space. This suggests that the relation between waveform samples for different neurons is different (or the way the waveform samples interrelate is different in different neurons’ spikes).

The next example in this section highlights the ability of iVAT to address two additional problems encountered in spike sorting, namely, anomaly detection and the need for multistage clustering (aka “re-iteration” amongst subclusters). Fig 6a is a set called Z (an arbitrary label) of n = 4665 waveform vectors comprising a mixture of k = 10 labeled subsets from simulated dataset-1. The 10 waveforms shown in Fig 6b are the average waveforms, , of the ten labeled subsets.

Panel (a) shows the waveform with annotation to the subset of waveforms that constitute the *Z*_{4} subset. Panel (b) highlights the templates (i.e. mean profile) of the 10 units. Panel (c) is the iVAT image of the 10 templates. Panel (d) is the dendogram given by applying single linkage hierarchical clustering to the 10 templates. Panels (e) and (f) display t-SNE projections with different settings of the perplexity parameter.

Visual inspection of Fig 6b suggests that , the average waveform of the 488 spikes for unit 4, here called Z_{4}, is an outlier (an anomaly) to the general shape of the other 9 graphs. This (easily seen) visual evidence suggests that Z_{4} may form an anomalous cluster in the input or projection spaces. But this observation does not justify removal of all 488 unit 4 spikes from the input data. However, the iVAT image of Z_{4} will corroborate our suspicion that Z_{4} is an anomalous cluster in Z.

Fig 6c is the iVAT image of , and Fig 6d is the dendrogram of the clusters produced by extracting the single linkage hierarchy of clusters from the vectors in . The integers along the borders of the iVAT image of show the identity of each pixel after iVAT reordering. The visualization in Fig 6c is quite informative: it not only isolates as an outlier (the single pixel at the lower right corner of the image), but it also depicts the other 9 subsets as members of a second large cluster. Moreover, this image suggests a hierarchical substructure within the 9x9 block. The intensities of {5,7} and {6,10} suggest that these pairs of subsets are closely related. The {3,9} block is next in intensity, followed by the 5x5 grouping of {8, 5, 7, 9, 3}, which are then coupled to {6,10}, and then this whole structure is embedded within the 9x9 block which includes {1,2}. We remark that the SL hierarchy is easily extracted by applying a back pass that cuts edges in the iVAT MST that reordered (cf. [41]). Fig 6c and 6d make the relationship between iVAT and single linkage quite transparent. And Fig 6c illustrates how an iVAT image can suggest multicluster substructure in a data set.

Fig 6e and 6f are scatterplots of t-SNE projections of corresponding to perplexity settings of 2 and 3. Both views show the labels of the 10 mean profiles, and both views seem to indicate that is an outlier in the set . We show these two projections to emphasize that every run of t-SNE with different settings of its hyperparameters may produce different visualizations of its input data. On the other hand, the iVAT image is uniquely determined up to a choice of the distance measure used to construct D.

Fig 7a is the iVAT image of the data set Z shown in Fig 6a. Comparing Fig 6c to 6f shows that iVAT very clearly suggests the same coarse cluster structure (k = 2) in all of the upspace data that it sees in , the set of mean profiles. Neither image suggests that k = 10; instead, both suggest that the best interpretation of the input data or its means is to first isolate the unit 4 waveform(s), and then regard the remaining spikes as a new cluster, which becomes a candidate for multistage clustering (reclustering, or re-iteration per Niediek et al. [2016]). Note that iVAT makes this information available whether the data are labeled or not.

Panel (a) is iVAT applied to the mixture. The iVAT image includes 2 blocks only: the spikes of subset *Z*_{4} constitute one block and the spikes of the other 9 units form the bigger block. Panel (b) displays the t-SNE projections of all the spikes in the mixture, color coded based on the membership of the data points to the units labeled with numbers. Panel (c) is the same t-SNE projections without labels. Comparing panels (b) and (c) shows that if the labels were not known the t-SNE map doesn’t show distinction between all the 9 units.

Finally, Fig 7b and 7c are labeled and unlabeled t-SNE scatter plots of Z. Both views suggest that Z contains 5 clusters. Subset Z_{4} is isolated in view Fig 7b, but not more isolated than subset Z_{2}, so t-SNE is less assertive about the anomalous nature of Z_{4} than iVAT is. If the labels are available, reclustering might be applied to 9, 7, 5, 8 and/or 3,6,10 to make a finer distinction between spike subsets. If the labels are unavailable, it’s hard to see what can be inferred from the t-SNE projection about Z beyond the suggestion provided by view Fig 7c that we should seek 5 clusters in Z.

We conclude this example with some general observations. First, the iVAT image is unique, while t-SNE plots are a function of three user-defined parameters. Second, single linkage clusters of the input data are available via clusiVAT [41] once an iVAT image is built. Third, while Z has 10 labeled subsets of input spikes, neither iVAT nor t-SNE makes this prediction. This emphasizes the fact that labeled subsets may not necessarily be clusters with respect to any computational scheme designed to detect clusters.

#### Analysis on dataset-2.

Now we turn to dataset-2 to further investigate the limits of discernible spike subsets since as mentioned previously, sets drived from dataset-2 are combinations of real spikes originated from pyramidal cells in rat hippocampus (ref to [42]). We extracted the spike subsets of nine individual neurons obtained from the different experimental trials. From these nine subsets, we built 36 mixtures at k = 2; 84 mixtures at k = 3; 126 mixtures at k = 4; 126 mixtures at k = 5; 84 mixtures at k = 6; 36 mixtures at k = 7; 9 mixtures at k = 8; and one mixture of all nine subsets (k = 9). This yields a total of 502 mixtures of labeled waveforms.

For the sake of brevity, we showcase four representative units and the various mixtures that can be built from them at k = 2, k = 3, and k = 4. Fig 8 shows the four representative subsets (all nine subsets of the waveforms are shown in S1 Fig in the supplementary materials).

Note that the waveforms in X4 are visually different than most of the waveforms in the other three subsets. This fuels an expectation that mixtures with X4 as one component will be somewhat more separable than mixtures without it.

Fig 9 shows all six views of pairs (Xi, Xj) made with 2D transformations of the 80D (upspace, p = 80) datasets for the mixtures of two representative single units. We will name the mixtures (Xk, Xj) = Xkj and will follow this convention for all mixtures. For example, the mixture of X1 and X2 is X12, and the mixture of X1, X2, and X4 becomes X124. The order in which X124 is built is not relevant because the union operation is both commutative and associative: i.e., (X1 ∪ X2) ∪ (X2 ∪ X1) = X12 = X21, and (X1 ∪ X2) ∪ X3 = X1 ∪ (X2 ∪ X3) = X1 ∪ X2 ∪ X3 = X123. The notation X124 means all the waveforms from X1, X2 and X4, put together in arbitrary order. The waveforms comprising each mixture are also shown, with the average waveform for each single unit in thick black. The colors of points in the 2D scatterplots correspond to class labels of the mixed data. It is important to remember that in a real application, the data are not labeled, so the associated 2D scatterplots will be mono-color dots in the plane. The mixtures are ordered according to increasing values Dunn’s index. Observe that for each mixture, different 2D projections may offer different interpretations of the cluster structure in the upspace data. In Fig 9a, all five projections show one big cluster, far more evident if the color labeling is missing, which is the case for real experiments in which we do not know the membership of the waveforms. In cases like this, since the clusters are projected densely side by side, human operators or algorithms tend to select only the core of the clusters. This usually produces better values for cluster validity indices, but at the expense of unwarranted confidence in subsequent analyses.

From left to right projections (PCA, Peak-to-Valley and Energy features, Wavelet decomposition, t-SNE, and Sammon) color coded with the known labels, the spike waveforms with the average of each subset in bold black, and the iVAT image. The figure shows disagreements between projection methods in providing maps of distinct subsets. The figure also shows an increase in quality of visualization with increasing Dunn’s index.

First, some general observations. Fig 9a, 9b and 9c all have a DI value of around 0.09. This is a relatively low value that indicates a lack of separation between the two components of the mixture. The iVAT images for these three cases are basically uniform (no strongly visible dark blocks), which indicates that the upspace data are not well separated. Separation emerges in Fig 9d, 9e and 9f, the three cases that have X4 as one component. Dunn’s index is essentially doubled (0.18 up to 0.23), so upspace separation of the pair of clusters has increased. The most visible separation is seen in the t-SNE downspace scatterplot, which is mirrored in the upspace iVAT images: the strong dark block corresponds to subset X4. Now we will discuss the six cases in more detail.

In Fig 9b, PCA, Wavelet, t-SNE, and Sammon show two clusters, while the PV-E plot shows just one. In Fig 9c, PV-E, wavelet, and Sammon projections show one cluster, it can be argued that PCA shows two while t-SNE shows two well-seperted clusters. In the other images, which include X4, a less distorted and noisy set of spikes, all the projections do a good job of mapping the clusters in a separable manner (for the wavelet projection in Fig 9e, it is hard to see two clusters when there are no color labels). The iVAT image also follows the same trend: the clarity of the two blocks generally becomes higher with a higher Dunn’s index.

The Peak to Valley and Energy (PV-E), are the only real (physically meaningful) 2D features. All the other 2D projections are dimensionless, i.e., they do not have physical meaning. It is important to emphasize that neither the 2D projections nor iVAT produce clusters, all these visual methods just suggest how many to look for.

The projections and the iVAT image of X23 with DI = 0.091 (Fig 9b) is a bit more separable and clear than the mixture of X13 with DI = 0.097 (Fig 9c). Both values are relatively small, and the difference between these two values (0.008) is negligible, indicating that these two cases are somewhat indistinguishable. The iVAT image for X14 clearly suggests the k = 2 at a Dunn Index of 0.228. This provides a much stronger indication of reliability than the smaller DI values. Indeed, Dunn characterized a partition as being compact and separated if and only if DI > 1. DI values less than about 0.5 usually characterize relatively poor cluster structure.

All the cases of mixtures of three subsets are portrayed in Fig 10, again ordered by their Dunn’s index, which is quite low and nearly equal in all four views. The numerator of DI is the minimum distance between any pair of subsets, and the denominator is the largest distance between points in some clusters, so it is dominated by the smallest between-subset distance and largest in-subset distance. Consequently, DI fails to recognize competing clusters that cannot dominate either of the two factors in Dunn’s formulation. These non-dominant clusters can often be seen in iVAT imagery. For example, in Fig 10b, the small yellow cluster seen in the t-SNE scatterplot of X124 appears as the small dark block in the lower right corner of the corresponding iVAT image. In Fig 10a all the projections except for t-SNE fail to point to k = 3 and the iVAT image is not informative either. In Fig 10c for X234, the PV-E and Wavelet projections suggest that k = 2, while PCA, t-SNE and Sammon point to k = 3. The t-SNE features provide the widest and most visible separation between the three clusters. The iVAT image of X234 is weakly suggestive of k = 3. Fig 10d for X134 provides a striking contrast in the ability of the visualization methods to correctly portray the presumed structure in the data. PV-E and Wavelet suggest k = 1, PCA and Sammon imply k = 2, and t-SNE points clearly to k = 3. The iVAT image is pointing to k = 2, at a relatively low value of DI = 0.097.

In each row, from left to right projections (PCA, Peak-to-Valley and Energy features, Wavelet decomposition, t-SNE, and Sammon) color coded with the known labels, the spike waveforms with the average of each subset in bold black, and the iVAT image. The contrast in the ability of the visualization methods to correctly portray the presumed structure in the data is more evident in this figure.

Finally, a similar trend continues in the k = 4 subset mixture of X1, X2, X3, and X4. The PV-E and Wavelet features indicate only one big cluster, and PCA, Sammon, and the iVAT image single out X4 while packing the other three sets of waveforms into a single cluster, whereas, t-SNE maps the four subsets with arguably enough clarity to declare that X1234 probably has four clusters. It can be argued that while the input has k = 4 labelled subsets, the primary visual evidence does not support k = 4, nor will there be a “best” set of clusters in the upspace at this value of c. In other words, just because the subsets have 4 labels does not guarantee that a cluster analysis of the data will agree. When you imagine the scatterplots in Fig 11 without colors there are not four distinguishable clusters present.

From left to right projections (PCA, Peak-to-Valley and Energy features, Wavelet decomposition, t-SNE, and Sammon) color coded with the known labels, the spike waveforms with the average of each subset in bold black, and the iVAT image. All the projections except for t-SNE fail to point to k = 3 and the iVAT image is not informative either.

In summary, we note again that although the subsets of spikes were obtained from independent trials, they were all induced by intracellular current injections to hippocampal pyramidal cells and recorded from their close proximity in extracellular medium. This to some degree explains the similar average waveforms and high variability between spikes of one subset. We observed that, for example, from the thirty-six mixtures of two subsets that can be created from the nine spike sets, not all of them were mapped as two clusters with the same quality (i.e. the quality of visualization was not consistent). overall, visualizations of mixtures using any of the other projection methods and iVAT did not suggest discernible clusters. This points out the challenge in identifying neurons of the same class (e.g., pyramidal) from their spike waveforms, at least when they are induced by current injections.

### 3.2 Objective assessment of clustering quality using different projections of the data

So far, we have shown that the lower-dimensional representations in our study may give different interpretations of the upspace data. This problem is highly dependent on the definition of similarity between spike waveforms of different units. Overall, iVAT and t-SNE were most helpful in assessing the pre-cluster presumptive structure of the waveform mixtures. In order to provide a more quantitative assessment of the effectiveness of the different low-dimensional representations in processing spike waveforms, we ran the k-means clustering algorithm on each of the 95 mixtures from dataset-1 and the 502 mixtures from dataset-2. For each mixture, five replicates of k-means were run using new initial centroid points and the run with lowest within-cluster sums of point-to-centroid distances was selected.

Dunn’s index and its generalizations provide measures of the intrinsic quality of the computed clusters (based on their distribution with respect to each other). Fig 12 shows the average Dunn’s index (DI) and generalized Dunn’s index (GDI_{33}) of the mixtures for the two datasets.

The two indices have the same trend: they decrease almost monotonically as the number of components (c) increases. However, the generalized version, GDI_{33}, provides a much clearer idea of the trend than DI because it has higher values that reflect separation more clearly, and it avoids the bias of inliers and outliers that may affect Dunn’s index. On the other hand, Fig 12 also suggests that both indices tend to favor lower numbers of clusters. This is a different type of empirically observed bias that must be accounted for when relying on cluster validity indices. See [43] for a discussion related to this point.

The DI and GDI_{33}, as internal measures, were used to give a sense of the structure inherent in ground truth partitions of the data in the upspace. Then, to evaluate candidate partitions produced by k-means in the upspace and downspace data sets, we used the adjusted rand index (ARI), which compares the cluster structure of each k-means partition to its ground truth partner at every value of c.

The k-means clustering algorithm is executed on each pair of features obtained with the five methods. The number of clusters to be generated (i.e., c) is set equal to the number of labeled subsets in the ground truth partition (i.e., 2, 3, 4,‥to 9 for cases in dataset-2 and 3, 4, 5, … .to 21 for cases in dataset-1). For each computed partition, we calculate the ARI measure of agreement between the computed waveform memberships and the memberships as given by the ground truth partition (recall that our data is labeled).

Fig 13a and 13b report the average ARI for mixtures in dataset-2 and dataset-1, respectively. In each figure, the first column is the ARI of the clusters achieved by running k-means on the input dimension space (the 48D waveforms for Dataset-1 and the 80D spike waveforms for Dataset-2). The next columns show the average ARIs calculated for the clusters achieved by k-means clustering on the 2D datasets produced by the five techniques. The ARI maximizes at 1, so clustering in the 2D t-SNE downspace data provides k-means clusters that, on average, slightly better match the ground truth partitions than k-means clusters in the input space.

## 4 Discussion and conclusions

Spike sorting is an unsupervised learning problem since the number of neurons associated with the recorded spikes are unknown. Hence, what we consider to be the spike train of a single unit may, in fact, be the spike train of multiple units. This does not negate the usefulness of the previous findings which apply the results of these sorting algorithms to infer neural coding and brain function. Most of this research is based on the rate coding principle [44], which uses the spike rate of the sorted units to model the neuronal response. Rate coding models neglect sorting errors assuming that as long as the spike rate changes according to the stimulus, the model will capture the response, whether the spike train consists of spikes of one or multiple neurons. However, the real units’ tuning curves might be different and a contaminated tuning curve may give misleading results.

This has been shown in studies concerned with issues arising from sorting quality on the results of rate coding models. For example, [45] evaluated the quality of the off-line reconstruction of arm trajectories from electrode array recordings and showed that discarding spikes substantially degrades the decoding of the movement to the extent that decoding the unsorted recordings reached higher performance results. They also showed that adding the tuning model (temporal features) of the spiking to the sorting process does not always improve the sorting based on waveform features. We can use the analogy of a verbal fight or discussion among a few people. An observer can tell if the discussion is going smoothly or if it is heated based on the overall volume of the voices of the group, even if the words uttered by individuals is not discernible. This is why rate coding models are popular and successful in certain respects, but they cannot elucidate how neurons interact to give rise to brain functions [46–50]. This emphasizes that reliable spike sorting is much more critical if any temporal or multiplex coding schemes are to be used to infer neural responses.

Current spike sorting packages use variety of clustering techniques all of which can benefit from an insight to the cluster structure in the data: k-means, Gaussian mixture decomposition or similar algorithms need explicit specification of the number of neurons to seek; mean shift needs the threshold on the bandwidth parameter to be defined; [51] and density based clustering relies on a threshold on density parameter *γ* for the number of density centers [52]. In high-dimensional data, the role of visualization in gaining knowledge of the data structure is critical. There is no doubt, as in Plato’s allegory of the cave, that there is always a loss or distortion of structural information in any transformation from the upsace (aka: input space or input dimension) to any downspace. We investigated this issue using iVAT, a tool that enables direct visualization of cluster structure in the upspace as well as five dimensionality reduction methods, including t-SNE. We showed that better sorting can be achieved by securing a visual assessment prior to clustering which affords an estimate of the cluster structure of the data (i.e., the number of clusters, c), or at least a small interval of integers that presumably bracket the true (meaning most distinguishable by some clustering algorithm) but unknown number of clusters.

In this paper, we demonstrated that the visual assessment of *c* from the iVAT images is often possible, highlighting that if clustering in the upspace is preferred, a visualization tool such as iVAT can be integrated into the package to inform the manual curation process. However, there were cases, in particular when *c* was high (> 10), that the iVAT image did not clearly indicate the number of clusters. But the visual assessment of a user that makes the estimate of *c* subjective. We showed that extracellular neuronal waveforms generate noisy datasets that at times do not comprise well-separated clusters. So, iVAT does not provide the definitive answer to the problem of spike sorting. Nevertheless, it provides insight into the coarse structure of the dataset. Moreover, we mentioned the relationship between iVAT and the single linkage clustering algorithm that is illustrated in Fig 6c and 6d (See [53] and [28] for further discussion). The majority of the edges in the MST that iVAT builds connect neighbor points and hence have very small values. The largest values in the MST usually correspond to edges that connect clusters (and the outlier points). The threshold between the small and large values reflects finer distinctions between clusters in the upspace, which can be used in assigning spikes to clusters. This feature was integrated into scalable and faster implementations of the iVAT algorithm which can be used for big datasets [54].

In order to highlight the importance of dimensionality reduction and feature extraction techniques (the pre-clustering stage), this subsection presented a comparison between clustering in the different downspaces and also the input space, using the same clustering algorithm in all spaces. It is important to recognize that the choice of clustering algorithm contributes to the accuracy of membership assignments. Also, the techniques and parameter choices applied in pre-processing of the signals such as in filtering and spike detection phases alter the end results. Here, we used the classic k-means clustering algorithm as opposed to common spike sorting softwares used by the neuroscience community to bypass the different pre-processing techniques that may affect the down-space representation of spikes. In this way, we provided a controlled set of spikes to be visualized by the 6 methods in our study. We further note that the choice of 2D as opposed to 3D in our manuscript was for publication purposes. Using 3D scatterplots may provide more information that improves the visualization and clustering, but this does not address the “best features” or “best algorithm” issues that follow pre-clustering assessment. Those issues will be addressed in a followup paper.

Our examples show that t-SNE performs well for projection of high dimensional data to the viewing plane. Given the good estimate for k provided by visualization with t-SNE projections, a logical next step would be be to substitute this t-SNE features with the one used within one of the softwares and compare the results. We note that t-SNE for the present analysis was parameterized (perplexity of 30 and learning rate of 500). The creators of t-SNE provide evidence that the performance of the algorithm is robust to these tunable parameters within a range, for example for perplexity within 5 to 50 [27, 55]. Nevertheless, we tested a range of parameters to empirically arrive to the chosen values. We observed that for datasets with large sample numbers (i.e. many spikes), as in our datasets, the parameter tuning within the said range didn’t change the quality of the projections. However, we acknowledge that the presence of parametrization in an algorithm is a disadvantage towards automated spike sorting and may limit the use of t-SNE to a good pre-clustering visualization as opposed to providing the features actually used for clustering. With regards to computational efficiency: the Barnes-Hut variant of t-SNE (publicly available through GitHub) highly accelerates the computations [55]. Since this study focused on investigating the fundamental relevance of pre-clustering methods using standard ground truth datasets the computation time was not reported. Dimitriadis et al. [56] report computation speeds of hours for big datasets containing > 10^{5} spikes.

Our experiments on visualization of the two labeled datasets provided further insights into spike sorting. In the first dataset, simulations were generated using average waveforms obtained from extracellular recordings in behavioral experiments. For the mixtures of spike subsets extracted from this dataset it was possible to estimate the presumptive cluster number in the data from the dark blocks in the iVAT images, even in some cases of mixtures of twelve subsets. Mixtures of higher subsets were sometimes displayed as compact and isolated clusters in the t-SNE projections. Our experiments confirm that when the data possess compact, well-separated clusters, visualization can be quite useful. Dataset-1 represents mixtures of spike sets that are generated by different cell types, brain regions and brain states, and these can be distinguished based on their spike waveforms. In contrast, dataset-2 represents mixtures of spike sets that are induced from cells of the same class receiving intracellular current injections, hence providing spikes with similar waveforms. Therefore, classifying based on extracellular waveforms alone may not be feasible in the latter case (cells of the same type receiving the same input). It should be noted that in sorting of spikes for each electrode, different distances of the units from the electrode improves sorting, since the amplitude (energy) of the waveforms is different. However, these results indicate that a further subtype classification beyond the two main classes of inhibitory and pyramidal categories (i.e. subtypes of pyramidal cells) may not be feasible by considering only the spike waveforms.

The extracellularly recorded potentials are already distorted signatures of intracellular action potentials, which makes the dimensionality reduction stage even more critical. The problem of crowding in lower dimensional maps such as PCA is well known. Analysis of the simultaneous extracellular and intracellular recordings have shown that the probability distributions of spikes from different neurons in the PCA feature space have some degree of overlap [57]. There is an inherent variability in the extracellular waveforms imposed by the background field potential [58], variations in the intracellular action potentials due to factors such as bursting [42], and slight electrode drift over the course of the experiment [59]. Activity of neighboring neurons is also a possible source of distortion in the waveform shape. Such activity may sometimes overlap in time and make multi-unit spike waveforms. The problem of overlapping spikes has been addressed by methods such as independent component analysis, ICA, (if number of electrodes is equal to or more than the number of neurons) [25] or template matching [60]. It is worth noting that in pre-clustering visualization of the data, overlapping spikes (multi-unit spikes) construct a cluster or block. Visualization is not to substitute post-clustering use of ICA or template matching. These methods can be used after clustering to extract individual sources of multi-unit spikes and reassign these spikes to the previously identified cluster templates. In fact, with the advances in high density probes and the need for computational efficiency, fast algorithms have been proposed based on the use of linear kernel banks [61] or template matching filters that enables real-time sorting [62]. Template matching methods rely on an initial clustering phase to obtain the ‘templates’ which is the clustering phase that naturally follows the pre-clustering visualization presented in this article.

With regards to the inherent variability in the extracellular waveforms, our results show that the t-SNE projection is the most reliable feature extraction scheme (for visualization) that we tested. We believe that t-SNE works well since it is a probabilistic-based approach that is appropriate for neuronal data. In a nutshell, the variability caused by the noisy spikes can often be circumvented by converting the deterministic dissimilarity measure between two waveforms into a probability of dissimilarities. We didn’t investigate the effect of increasing noise levels on the presented methods. Robustness to noise is an important aspect of any computational method. In a future study we will examine the effects of noise levels in visual assessment techniques, informed by observations of different types of noise such as: common-noise sources [63], Gaussian noise [64], and overlapping spikes [65].

Another reason why having a reliable dimensionality reduction stage is important is revealed by our results on Dunn’s index, which showed that, DI, in common with many other internal cluster validity indices, tends to be monotonic in *c*. This emphasizes the point that the common practice of running a clustering algorithm for several values of *c* and then choosing the best partition based on the optimal value of any cluster validity index may not be very effective. Moreover, by computing both DI and GDI_{33} for the same data, we demonstrated that there is no agreement about a generic CVI, a fact that has been shown before in previous experiments on internal CVIs [66]. Indeed, in the real (unlabeled data) case, it is wise to compute a number of different internal CVIs, with a view towards ensemble aggregation of the results. To appreciate the disparity that different CVIs can cause, see [39] for an extensive survey of 30 internal CVIs tested on 20 real data sets. See [67] for a survey of ensemble approaches to clustering. [68] have applied this method to aggregation of partitions obtained by different clustering methods used for sorting spike waveforms. Here we suggest using an ensemble approach on the votes cast by different internal cluster validity indices—DI and its 18 GDIs are just a few of the ones available in [39]—for each partition in CP. We think this approach will greatly improve the final interpretation of structure in unlabeled data. This will be the objective of our next foray into spike sorting clustering algorithms.

This study again confirms that there is no such thing as the best set of features for clustering or the best clustering algorithm for spike sorting, but that sorting is an iterative process that always comprises making a compromise between the best feature set and clustering algorithm. [12] list the identification of the number of clusters as the most outstanding question in techniques for neuronal classfication. This challenge can be partially addressed by subjective visual assessment of cluster tendency. While visual evidence is never enough, it has great value, as noted by the eminent statistician Sir Ronald Fisher, who said, nearly 100 years ago: “The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitute for critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining conclusions founded upon them” [69].

## Supporting information

### S1 Fig. The subsets of spikes of the 9 individual neurons used in the study.

Each subplot title displays the label of the experiment in Henze2009 dataset and the number of spikes in each subset: e.g., #745 means there are 745 waveforms in the sample.

https://doi.org/10.1371/journal.pone.0224547.s002

(PDF)

### S2 Fig. All the waveforms in the 90 mixtures of spikes detected from the simulations taken from Pedreira2012 dataset.

Each subset is generated from a template taken from real recordings as described in Material and Methods section. All the waveforms are in black and the average waveforms of subsets are in color. The color of the average waveforms doesn’t imply any order or significance. The colormap is Parula in MATLAB. The mixtures in subplots denoted by a yellow star label are the ones displayed as representative in the manuscript Figs 1 to 5. The one subplot with a red star is the mixture investigated in Fig 6.

https://doi.org/10.1371/journal.pone.0224547.s003

(PDF)

### S3 Fig. In this figure we reproduce the projections in Figs 1 to 5 (the representative mixtures from dataset-1) with axes labels.

In the PCA, Wavelet, t-SNE and Sammon plots, x axis is dimension 1 and y axis is dimension 2. In the PV-E plots, x axis is the peak to valley time (PV) in ms and y axis is the energy (E) of the waveform, unitless since derived from unitless simulations.

https://doi.org/10.1371/journal.pone.0224547.s004

(PDF)

### S4 Fig. In this figure we reproduce the projections in Figs 9, 10 and 11 with axes labels.

Those figures investigated all possible mixtures of the 4 subsets (i.e. sets of spikes fired by each neuron) shown in Fig 8. The subsets are taken from dataset-2 which are waveforms in microvolt units. In the PCA, Wavelet, t-SNE and Sammon plots, x axis is dimension 1 and y axis is dimension 2. In PV-E, x axis is peak to valley time (PV), in *μs* and y axis is energy (E) in *μV*^{2}.

https://doi.org/10.1371/journal.pone.0224547.s005

(PDF)

## Acknowledgments

This research was supported by the Natural Sciences and Engineering Research Council of Canada. MRP was funded by Dean Connor and Maris Uffelmann Donation, Canadian Fund For Innovation and Natural Sciences and Engineering Research Council: Discovery Grant (RGPIN-2016-06358). We confirm that the preprint of this article has been submitted to BioRxiv [70].

## References

- 1. Buzsáki G. Large-Scale Recording of Neuronal Ensembles. Nature Neuroscience. 2004;7(5):446–451. pmid:15114356
- 2. Brown EN, Kass RE, Mitra PP. Multiple Neural Spike Train Data Analysis: State-of-the-Art and Future Challenges. Nature Neuroscience. 2004;7(5):456–461. pmid:15114358
- 3. Ventura V, Gerkin RC. Accurately Estimating Neuronal Correlation Requires a New Spike-Sorting Paradigm. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(19):7230–7235. pmid:22529350
- 4. Rossant C, Kadir SN, Goodman DFM, Schulman J, Hunter MLD, Saleem AB, et al. Spike sorting for large, dense electrode arrays. Nature Neuroscience. 2016;19(4):634–641. pmid:26974951
- 5. Shoham S, O’Connor DH, Segev R. How Silent Is the Brain: Is There a “Dark Matter” Problem in Neuroscience? Journal of Comparative Physiology A. 2006;192(8):777–784.
- 6. Pedreira C, Martinez J, Ison MJ, Quian Quiroga R. How Many Neurons Can We See with Current Spike Sorting Algorithms? Journal of Neuroscience Methods. 2012;211(1):58–65. pmid:22841630
- 7. Niediek J, Boström J, Elger CE, Mormann F. Reliable Analysis of Single-Unit Recordings from the Human Brain under Noisy Conditions: Tracking Neurons over Hours. PloS one. 2016;11(12):e0166598. pmid:27930664
- 8.
Pachitariu M, Steinmetz N, Kadir S, Carandini M, Harris KD. Kilosort: Realtime Spike-Sorting for Extracellular Electrophysiology with Hundreds of Channels. bioRxiv. 2016; p. 061481.
- 9.
Yger P, Spampinato GLB, Esposito E, Lefebvre B, Deny S, Gardella C, et al. Fast and Accurate Spike Sorting in Vitro and in Vivo for up to Thousands of Electrodes. bioRxiv. 2016; p. 067843.
- 10. Jorgenson LA, Newsome WT, Anderson DJ, Bargmann CI, Brown EN, Deisseroth K, et al. The BRAIN Initiative: Developing Technology to Catalyse Neuroscience Discovery. Philosophical Transactions of the Royal Society B: Biological Sciences. 2015;370 (1668).
- 11. Markram H, Muller E, Ramaswamy S, Reimann MW, Abdellah M, Sanchez CA, et al. Reconstruction and Simulation of Neocortical Microcircuitry. Cell. 2015;163(2):456–492. pmid:26451489
- 12. Armañanzas R, Ascoli GA. Towards Automatic Classification of Neurons. Trends in neurosciences. 2015;38(5):307–318. pmid:25765323
- 13.
Bezdek JC. A Primer on Cluster Analysis: 4 Basic Methods That (Usually) Work. Sarasota, FL: First Edition Design Publishing; 2017.
- 14. Dehghani N, Peyrache A, Telenczuk B, Le Van Quyen M, Halgren E, Cash SS, et al. Dynamic Balance of Excitation and Inhibition in Human and Monkey Neocortex. Scientific Reports. 2016;6(1).
- 15. Pazienti A, Grün S. Robustness of the Significance of Spike Synchrony with Respect to Sorting Errors. Journal of Computational Neuroscience. 2006;21(3):329–342. pmid:16927209
- 16. Cohen MR, Kohn A. Measuring and Interpreting Neuronal Correlations. Nature neuroscience. 2011;14(7):811–819. pmid:21709677
- 17. Einevoll GT, Franke F, Hagen E, Pouzat C, Harris KD. Towards Reliable Spike-Train Recordings from Thousands of Neurons with Multielectrodes. Current Opinion in Neurobiology. 2012;22(1):11–17. pmid:22023727
- 18.
Henze DA, Harris KD, Borhegyi Z, Csicsvari J, Mamiya A, Hirase H, et al. Simultaneous Intracellular and Extracellular Recordings from Hippocampus Region CA1 of Anesthetized Rats. 2009; http://dx.doi.org/10.6080/K02Z13FP.
- 19. Zhao Z, Wang L, Liu H, Ye J. On Similarity Preserving Feature Selection. IEEE Transactions on Knowledge and Data Engineering. 2013;25(3):619–632.
- 20. van der Maaten L, Postma E, Van den Herik J. Dimensionality reduction: A comparative review. Journal of Machine Learning Research. 2009;10:66–71.
- 21.
Campadelli P, Casiraghi E, Ceruti C, Rozza A. type [; 2015] https://www.hindawi.com/journals/mpe/2015/759567/.
- 22.
Theodoridis S. Pattern Recognition. Koutroumbas K, editor. London: Academic Press; 2009.
- 23. Hattori S, Chen L, Weiss C, Disterhoft JF. Robust Hippocampal Responsivity during Retrieval of Consolidated Associative Memory. Hippocampus. 2015;25(5):655–669.
- 24. Truccolo W, Donoghue JA, Hochberg LR, Eskandar EN, Madsen JR, Anderson WS, et al. Single-Neuron Dynamics in Human Focal Epilepsy. Nature Neuroscience. 2011;14(5):635–641. pmid:21441925
- 25. Takahashi S, Anzai Y, Sakurai Y. Automatic Sorting for Multi-Neuronal Activity Recorded With Tetrodes in the Presence of Overlapping Spikes. Journal of Neurophysiology. 2003;89(4):2245–2258. pmid:12612049
- 26. Quiroga RQ, Nadasdy Z, Ben-Shaul Y. Unsupervised Spike Detection and Sorting with Wavelets and Superparamagnetic Clustering. Neural Computation. 2004;16(8):1661–1687. pmid:15228749
- 27. van der Maaten L, Hinton G. Visualizing Data Using T-SNE. Journal of Machine Learning Research. 2008;9(Nov):2579–2605.
- 28.
Mahallati S, Bezdek JC, Kumar D, Popovic MR, Valiante TA. Interpreting Cluster Structure in Waveform Data with Visual Assessment and Dunn’s Index. In: Frontiers in Computational Intelligence. Studies in Computational Intelligence. Springer, Cham; 2018. p. 73–101.
- 29. Sammon JW. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers. 1969;100(5):401–409.
- 30. Tenenbaum JB, De Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290(5500):2319–2323. pmid:11125149
- 31. Havens TC, Bezdek JC. An Efficient Formulation of the Improved Visual Assessment of Cluster Tendency (iVAT) Algorithm. IEEE Transactions on Knowledge and Data Engineering. 2012;24(5):813–822.
- 32.
Bezdek JC, Hathaway RJ. VAT: A Tool for Visual Assessment of (Cluster) Tendency. In: Proceedings of the 2002 International Joint Conference on Neural Networks, 2002. IJCNN’02. vol. 3; 2002. p. 2225–2230.
- 33. Prim RC. Shortest Connection Networks And Some Generalizations. Bell System Technical Journal. 1957;36(6):1389–1401.
- 34. Jensen AM, Tregellas JR, Sutton B, Xing F, Ghosh D. Kernel Machine Tests of Association between Brain Networks and Phenotypes. PLOS ONE. 2019;14(3):e0199340. pmid:30897094
- 35. Negi SK, Guda C. Global Gene Expression Profiling of Healthy Human Brain and Its Application in Studying Neurological Disorders. Scientific Reports. 2017;7(1):897. pmid:28420888
- 36. Kumar D, Bezdek JC, Rajasegarar S, Leckie C, Palaniswami M. A Visual-Numeric Approach to Clustering and Anomaly Detection for Trajectory Data. The Visual Computer. 2017;33(3):265–281.
- 37. Dunn JC. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. Journal of Cybernetics. 1973;3(3):32–57.
- 38. Bezdek JC, Pal NR. Some New Indexes of Cluster Validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 1998;28(3):301–315.
- 39. Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I. An Extensive Comparative Study of Cluster Validity Indices. Pattern Recognition. 2013;46(1):243–256.
- 40. Hubert L, Arabie P. Comparing Partitions. Journal of Classification. 1985;2(1):193–218.
- 41. Kumar D, Bezdek JC, Palaniswami M, Rajasegarar S, Leckie C, Havens TC. A Hybrid Approach to Clustering in Big Data. IEEE Transactions on Cybernetics. 2016;46(10):2372–2385. pmid:26441434
- 42. Henze DA, Borhegyi Z, Csicsvari J, Mamiya A, Harris KD, Buzsáki G. Intracellular Features Predicted by Extracellular Recordings in the Hippocampus In Vivo. Journal of Neurophysiology. 2000;84(1):390–400. pmid:10899213
- 43. Lei Y, Bezdek JC, Romano S, Vinh NX, Chan J, Bailey J. Ground Truth Bias in External Cluster Validity Indices. Pattern Recognition. 2017;65:58–70.
- 44.
Dayan P, Abbott LF. THEORETICAL NEUROSCIENCE. 2001; p. 432.
- 45. Todorova S, Sadtler P, Batista A, Chase S, Ventura V. To Sort or Not to Sort: The Impact of Spike-Sorting on Neural Decoding Performance. Journal of neural engineering. 2014;11(5):056005. pmid:25082508
- 46. Rullen RV, Thorpe SJ. Rate Coding Versus Temporal Order Coding: What the Retinal Ganglion Cells Tell the Visual Cortex. Neural Computation. 2001;13(6):1255–1283. pmid:11387046
- 47. Mehta MR, Lee AK, Wilson MA. Role of Experience and Oscillations in Transforming a Rate Code into a Temporal Code. Nature. 2002;417(6890):741–746. pmid:12066185
- 48. Huxter J, Burgess N, O’Keefe J. Independent Rate and Temporal Coding in Hippocampal Pyramidal Cells. Nature. 2003;425(6960):828–832. pmid:14574410
- 49. Zuo Y, Safaai H, Notaro G, Mazzoni A, Panzeri S, Diamond ME. Complementary Contributions of Spike Timing and Spike Rate to Perceptual Decisions in Rat S1 and S2 Cortex. Current Biology. 2015;25(3):357–363. pmid:25619766
- 50. Akam T, Kullmann DM. Oscillatory Multiplexing of Population Codes for Selective Communication in the Mammalian Brain. Nature Reviews Neuroscience. 2014;15(2):111–122. pmid:24434912
- 51. Comaniciu D, Meer P. Mean Shift: A Robust Approach toward Feature Space Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24:603–619.
- 52. Rodriguez A, Laio A. Clustering by Fast Search and Find of Density Peaks. Science. 2014;344(6191):1492–1496. pmid:24970081
- 53. Havens TC, Bezdek JC, Keller JM, Popescu M, Huband JM. Is VAT Really Single Linkage in Disguise? Annals of Mathematics and Artificial Intelligence. 2009;55(3-4):237.
- 54. Rathore P, Kumar D, Bezdek JC, Rajasegarar S, Palaniswami MS. A Rapid Hybrid Clustering Algorithm for Large Volumes of High Dimensional Data. IEEE Transactions on Knowledge and Data Engineering. 2018; p. 1–1.
- 55. van der Maaten L. Accelerating T-SNE Using Tree-Based Algorithms. Journal of Machine Learning Research. 2014;15:3221–3245.
- 56.
Dimitriadis G, Neto J, Kampff A. T-SNE Visualization of Large-Scale Neural Recordings. bioRxiv. 2016; p. 087395.
- 57. Harris KD, Henze DA, Csicsvari J, Hirase H, Buzsáki G. Accuracy of Tetrode Spike Separation as Determined by Simultaneous Intracellular and Extracellular Measurements. Journal of Neurophysiology. 2000;84(1):401–414. pmid:10899214
- 58. Buzsáki G, Anastassiou CA, Koch C. The Origin of Extracellular Fields and Currents—EEG, ECoG, LFP and Spikes. Nature Reviews Neuroscience. 2012;13(6):407–420. pmid:22595786
- 59. Harris KD, Quiroga RQ, Freeman J, Smith SL. Improving Data Quality in Neuronal Population Recordings. Nature Neuroscience. 2016;19(9):1165–1174. pmid:27571195
- 60. Abeles M, Goldstein MH. Multispike Train Analysis. Proceedings of the IEEE. 1977;65(5):762–773.
- 61. Aksenova TI, Chibirova OK, Dryga OA, Tetko IV, Benabid AL, Villa AEP. An Unsupervised Automatic Method for Sorting Neuronal Spike Waveforms in Awake and Freely Moving Animals. Methods. 2003;30(2):178–187. pmid:12725785
- 62. Wouters J, Kloosterman F, Bertrand A. Towards Online Spike Sorting for High-Density Neural Probes Using Discriminative Template Matching with Suppression of Interfering Spikes. Journal of Neural Engineering. 2018;15(5):056005. pmid:29932426
- 63. Paralikar KJ, Rao CR, Clement RS. New Approaches to Eliminating Common-Noise Artifacts in Recordings from Intracortical Microelectrode Arrays: Inter-Electrode Correlation and Virtual Referencing. Journal of Neuroscience Methods. 2009;181(1):27–35. pmid:19394363
- 64. Takekawa T, Ota K, Murayama M, Fukai T. Spike Detection from Noisy Neural Data in Linear-Probe Recordings. European Journal of Neuroscience. 2014;39(11):1943–1950. pmid:24827558
- 65. Pillow JW, Shlens J, Chichilnisky EJ, Simoncelli EP. A Model-Based Spike Sorting Algorithm for Removing Correlation Artifacts in Multi-Neuron Recordings. PLoS ONE. 2013;8(5):e62123. pmid:23671583
- 66. Vendramin L, Campello RJGB, Hruschka ER. Relative Clustering Validity Criteria: A Comparative Overview. Statistical Analysis and Data Mining. 2010;3(4):209–235.
- 67. Vega-Pons S, Ruiz-Shulcloper J. A SURVEY OF CLUSTERING ENSEMBLE ALGORITHMS. International Journal of Pattern Recognition and Artificial Intelligence. 2011;25(03):337–372.
- 68. Fournier J, Mueller CM, Shein-Idelson M, Hemberger M, Laurent G. Consensus-Based Sorting of Neuronal Spike Waveforms. PLOS ONE. 2016;11(8):e0160494. pmid:27536990
- 69.
Fisher RA. Statistical Methods for Research Workers. –. 13th ed. Edinburgh: Oliver and Boyd; 1958.
- 70.
Mahallati S, Bezdek JC, Popovic MR, Valiante T. Cluster Tendency Assessment in Neuronal Spike Data. bioRxiv. 2018; p. 285064.