Fig 1.
Consensus-based clustering of pseudo-Gaussian clouds.
a. Synthetic dataset built from Gaussian distributions. b. Four examples of partitions obtained by K-means clustering with a large number of clusters. The boundaries of each K-means cluster are represented as the convex hull of the clustered points (black lines). c. Core clusters identified from 200 K-means clustering solutions, with a threshold for the core cluster size of 8. d. Pairwise probability of misclassification between core clusters (Pmis, see Methods). e. Dendrogram obtained by hierarchical clustering with the Single-Link method over the 1 – Pmis distance matrix. Colors indicate the 5 clusters identified when the cluster tree is cut at 1 – Pth, with Pth = 0.05. f. Final clusters identified by consensus clustering from the dendrogram shown in e. Inset, pairwise probability of misclassification between final clusters.
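The merging step described in panels e and f can be illustrated with a minimal sketch. It assumes that a symmetric matrix of pairwise misclassification probabilities between core clusters is already available; the function name `merge_core_clusters` and the toy matrix in the usage note are hypothetical and are not taken from the original work. SciPy's single-linkage routine is used here as a stand-in for whatever implementation the authors used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def merge_core_clusters(p_mis, p_th=0.05):
    """Group core clusters by single-link clustering over 1 - Pmis.

    p_mis : symmetric (n, n) array of pairwise misclassification
            probabilities between core clusters (assumed given).
    p_th  : probability threshold; the tree is cut at 1 - p_th.
    Returns the linkage matrix and one group label per core cluster.
    """
    p_mis = np.asarray(p_mis, dtype=float)
    # Distance between two core clusters is 1 - Pmis.
    dist = 1.0 - p_mis
    # A cluster is at distance 0 from itself.
    np.fill_diagonal(dist, 0.0)
    # linkage() expects the condensed (upper-triangle) form.
    condensed = squareform(dist, checks=True)
    tree = linkage(condensed, method="single")
    # Cutting at 1 - p_th merges groups whose closest members
    # have Pmis >= p_th.
    labels = fcluster(tree, t=1.0 - p_th, criterion="distance")
    return tree, labels
```

For example, with four core clusters where only the pairs (0, 1) and (2, 3) have a misclassification probability above Pth = 0.05, the cut is expected to leave two groups; the label values themselves are arbitrary.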
Fig 2.
Consensus-based clustering applied to simultaneous intracellular/tetrode recordings.
a. Band-pass filtered extracellular voltage signal (black) obtained from a tetrode recording in rat hippocampal CA1. One cell in the vicinity of the tetrode was simultaneously recorded intracellularly (red, [9]). b. Dendrogram (top) obtained by hierarchical clustering with the Single-Link method over the distance matrix (bottom) defined from the pairwise probability of spike misclassification between core clusters. Each leaf of the tree corresponds to a core cluster, while the height of each node represents the probability of spike misclassification between the closest core clusters of the linked groups. The final clusters were identified by cutting the cluster tree at 1 – Pth, with Pth = 0.15. c. Dendrogram and distance matrix measured between the final clusters, after consensus clustering. d. Whitened spatiotemporal waveforms of 7 of the final clusters identified by consensus-based clustering. The cluster templates, defined as the average waveforms, are shown in black. Cluster #3 was identified as the single unit matching the intracellularly recorded cell. Spike trains corresponding to clusters #22, #23 and #24 were merged and considered to be the same unit because their cross-correlograms were typical of a bursting cell. The estimated probability of misclassified spikes (see Methods) is shown below each final cluster. After comparing the spike train of cluster #3 to the ground truth spikes, we found a true rate of misclassification of FPrate + FNrate = 1.7%.
Fig 3.
Comparison of the true rate of misclassified spikes (FPrate + FNrate) between the consensus-based clustering method and a quadratic support vector machine trained on the ground truth data (SVM, see Methods). Each dot corresponds to the rate of misclassification for the single unit matching the ground truth cell spike times. Symbols and colors correspond to different experiments. A few cells could not be identified correctly and showed much higher rates of misclassification than the optimal performance reached by the SVM classifier. Nevertheless, these clusters (circled symbols) were not validated as well-isolated single units because their signal-to-noise ratio (SNR) was too low (SNR < 4) or their estimated rates of misclassified spikes (see Methods) were higher than 20%.
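As an illustration of how a rate such as FPrate + FNrate might be measured against ground truth spike times, the following sketch matches sorted spikes to ground truth spikes one-to-one within a tolerance window. Every specific here is an assumption made for illustration: the function name, the 1-ms tolerance, the greedy matching rule, and the normalization (false positives relative to the cluster's spike count, false negatives relative to the ground truth spike count). The definitions actually used are those given in the Methods.

```python
import numpy as np


def misclassification_rates(sorted_times, truth_times, tol=1e-3):
    """Return (fp_rate, fn_rate) for one cluster against ground truth.

    sorted_times : spike times (s) assigned to the cluster.
    truth_times  : ground truth spike times (s) of the matching cell.
    tol          : matching window in seconds (assumed value).
    """
    s = np.sort(np.asarray(sorted_times, dtype=float))
    t = np.sort(np.asarray(truth_times, dtype=float))
    i = j = matched = 0
    # Greedy one-to-one matching over the two sorted spike trains.
    while i < len(s) and j < len(t):
        if abs(s[i] - t[j]) <= tol:
            matched += 1
            i += 1
            j += 1
        elif s[i] < t[j]:
            i += 1  # sorted spike with no ground truth partner
        else:
            j += 1  # ground truth spike that was missed
    n_fp = len(s) - matched
    n_fn = len(t) - matched
    # Assumed normalization; guard against empty spike trains.
    fp_rate = n_fp / len(s) if len(s) else 0.0
    fn_rate = n_fn / len(t) if len(t) else 0.0
    return fp_rate, fn_rate
```

Under these assumptions, the quantity plotted for each unit would be the sum of the two returned values.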
Fig 4.
Consensus-based clustering applied to simultaneous intracellular/MEA recordings.
a. Band-pass filtered extracellular voltage signal (black) obtained from a multi-electrode array recording (59-electrode MEA, MultiChannel System) performed in vitro in turtle dorsal cortex (Mark Shein-Idelson, Lorenz Pammer, Mike Hemberger and Gilles Laurent, unpublished). One neuron whose spikes could be detected on the MEA was simultaneously recorded intracellularly (*, red). b. Top, dendrogram obtained by hierarchical clustering with the Single-Link method over the distance matrix defined from the pairwise probability of spike misclassification between final clusters (Pmis, see Methods). Bottom, pairwise probabilities of spike misclassification between final clusters. c. Whitened spatiotemporal waveforms of 3 of the final clusters identified by consensus-based clustering. We only show the waveforms over six channels centered on the position of maximal amplitude (shaded area). The cluster templates were defined as the average waveform (black). Cluster #1 was identified as the single unit matching the intracellularly recorded cell. The estimated probability of misclassified spikes (see Methods) is shown for each cluster. All ground truth spikes were correctly classified (FPrate + FNrate = 0%).
Fig 5.
Consensus-based clustering applied to replicated tetrode recordings with ground truth.
a. Tetrode recordings obtained from rat hippocampal CA1 with simultaneous intracellular recording from one cell in the vicinity of the electrode were spatially replicated 8 times with a time shift. This resulted in artificial 32-channel recordings with 8 ground-truth clusters. These recordings were processed as if they had been obtained from a 32-channel linear electrode array with 20-μm spacing between recording sites. b. Left, dendrogram obtained by hierarchical clustering with the Single-Link method over the distance matrix defined from the pairwise probability of spike misclassification between final clusters (Pmis, see Methods). Right, pairwise probability of spike misclassification between final clusters. c. Whitened spatiotemporal waveforms of 5 of the final clusters identified by consensus-based clustering. The cluster templates were defined as the average waveform (black). Clusters #5 and #6 were identified as matching the intracellularly recorded cells #3 and #8, respectively. The estimated probability of misclassified spikes (see Methods) is shown below each cluster. Almost all replicated ground-truth spikes were correctly identified (FPrate + FNrate ≤ 1%).
Fig 6.
Consensus-based clustering applied to in vivo MEA recordings with artificial ground truth spikes.
a. Eight artificial spike trains were added to the raw voltage signal recorded with a 32-channel linear electrode array in the dorsal cortex of the anaesthetized turtle during visual stimulation (see Methods). The artificial spike waveforms were generated from the cluster templates of single units recorded simultaneously but from another electrode array. Artificial spikes were randomly varied in amplitude by ± 10% standard deviation to mimic realistic spike shape variability. b. Left, dendrogram obtained by hierarchical clustering with the Single-Link method over the distance matrix defined from the pairwise probability of spike misclassification between the final clusters (Pmis, see Methods). Right, pairwise probability of spike misclassification between final clusters. c. Whitened spatiotemporal waveforms of 6 of the final clusters identified by consensus-based clustering. The cluster templates were defined as the average waveform (black). Clusters #1 and #63 were identified as matching the ground truth spikes of cells #4 and #2, respectively. The estimated probability of misclassified spikes (see Methods) is shown below each cluster, as well as the true rate of misclassification (color).
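The construction of artificial ground truth described in panel a can be sketched as follows. This is only an illustrative reading of the caption: it assumes that each artificial spike is the cluster template multiplied by a gain drawn from a normal distribution with mean 1 and standard deviation 0.10, and that spikes are added at given sample indices. The function name, argument layout (channels by samples), and the handling of spikes that would run past the recording edges are hypothetical.

```python
import numpy as np


def add_artificial_spikes(raw, template, spike_samples, amp_sd=0.10, seed=0):
    """Add amplitude-jittered copies of a template to a raw recording.

    raw           : (n_channels, n_samples) voltage array.
    template      : (n_channels, n_template_samples) average waveform.
    spike_samples : start sample of each artificial spike.
    amp_sd        : standard deviation of the gain around 1 (10% here).
    Returns the modified copy of the recording and the gains drawn.
    """
    rng = np.random.default_rng(seed)
    out = np.array(raw, dtype=float, copy=True)  # leave the input untouched
    template = np.asarray(template, dtype=float)
    n_t = template.shape[1]
    # One random gain per artificial spike.
    gains = rng.normal(loc=1.0, scale=amp_sd, size=len(spike_samples))
    for start, gain in zip(spike_samples, gains):
        # Skip spikes that would not fit inside the recording.
        if start < 0 or start + n_t > out.shape[1]:
            continue
        out[:, start:start + n_t] += gain * template
    return out, gains
```

Keeping the spike sample indices alongside the returned recording gives the ground truth spike train against which the sorted clusters can later be compared.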
Fig 7.
Comparison between estimated and measured rates of misclassified spikes.
a. Comparison of estimated rates of false positive spikes with the actual proportion of false positive errors (%FP). Each dot corresponds to the error rate for the single unit matching the ground truth cell spike times. Same symbols and colors as in Fig 3. b. Same as a but for false negative spikes. c. Same as a but for the total proportion of misclassified spikes. Circled symbols correspond to clusters that were not validated as well-isolated single units, either because their SNR was too low (< 4) or because their estimated rates of misclassified spikes were too high (> 20%).
Table 1.
Main steps of the consensus-clustering approach for spike sorting.