Data-driven segmentation of cortical calcium dynamics

doi:10.1371/journal.pcbi.1011085


Fig 6

Spatial and morphological metrics are the most important for classifying components.

(A) Correlation and t-statistic between artifact and neural components for each feature (N = 7, n = 2190). Spatial (circles), morphological (triangles), temporal (diamonds), and frequency (squares) metrics are plotted. Cutoff values that guided feature selection are shown as dotted lines, and rejected values are shown in gray. Closed points are components that meet the requirements. The relative importance of each metric from the Random Forest classifier is plotted, grouped by feature class. Selected metrics are listed within each feature type, sorted by t-statistic magnitude (largest first). (B) The dataset was divided into an ML modeling dataset (N = 7, n = 2190), used to establish the machine learning pipeline, and a novel dataset of full experiments (N = 5, n = 1661) that was held out so that it would not influence the classifier. The modeling data were split 70/30, stratified by classification. Over 1000 iterations, the classifier was trained on the selected metrics and its classifications were validated against human classifications. (C) Performance of ML training on subsets of the ML modeling dataset: boxplots of accuracy, precision, and recall across the 1000 iterations. (D) The classifier was trained 1000 times on the full ML modeling dataset, and its performance was assessed on the novel dataset. (E) Performance of the classifier on each novel dataset, with animals plotted separately to show the distribution across the 1000 trained classifiers. (F) SVD projection of the metric data, mapped by human classification (top) and by ML classifier confidence (bottom). (G) Approximate locations of false negatives and false positives in the novel datasets.
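The per-feature screening in panel A (a t-statistic and a correlation between artifact and neural components, then cutoffs) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: it assumes Welch's two-sample t-test and a point-biserial correlation (Pearson correlation with a 0/1 label, 1 = neural, 0 = artifact), because the caption does not say which tests were used. The cutoff values `t_cutoff` and `r_cutoff` and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances) for samples a and b."""
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def point_biserial(values, labels):
    """Pearson correlation between a feature and a binary (0/1) class label."""
    mx, my = mean(values), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(values, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in values))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    return cov / (sx * sy)


def screen_features(features, labels, t_cutoff=2.0, r_cutoff=0.1):
    """Score each feature and keep those passing both (hypothetical) cutoffs.

    features: dict mapping feature name -> list of values, one per component
    labels:   list of 1 (neural) / 0 (artifact), aligned with the values
    Returns (name, t, r) for kept features, sorted by |t|, largest first.
    """
    kept = []
    for name, values in features.items():
        neural = [v for v, y in zip(values, labels) if y == 1]
        artifact = [v for v, y in zip(values, labels) if y == 0]
        t = welch_t(neural, artifact)
        r = point_biserial(values, labels)
        if abs(t) >= t_cutoff and abs(r) >= r_cutoff:
            kept.append((name, t, r))
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Toy usage: "area" separates the classes, "noise" does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "area": [10, 12, 11, 13, 3, 4, 2, 5],
    "noise": [1, 2, 1, 2, 2, 1, 2, 1],
}
result = screen_features(features, labels)
```

Sorting by |t| mirrors the caption's ordering of selected metrics by t-statistic magnitude; the Random Forest importance shown in the same panel is not reproduced here.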

doi: https://doi.org/10.1371/journal.pcbi.1011085.g006
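The evaluation loop in panels B and C (a stratified 70/30 split, repeated training, and accuracy, precision, and recall per iteration) can be sketched with the standard library. This is a minimal sketch, not the authors' pipeline: `nearest_centroid` is a hypothetical stand-in classifier used only so the example runs without third-party packages, and it is not the Random Forest classifier used in the figure. The names `stratified_split`, `repeated_evaluation`, and `fit_predict` are illustrative, and treating label 1 (neural) as the positive class is an assumption.

```python
import random


def stratified_split(labels, test_frac=0.3, rng=random):
    """Return (train_idx, test_idx), splitting each class ~70/30 independently."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def scores(y_true, y_pred, positive=1):
    """Accuracy, precision and recall, with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier (NOT a Random Forest): nearest class mean."""
    centroids = {}
    for cls in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def repeated_evaluation(X, y, fit_predict=nearest_centroid, n_iter=1000, seed=0):
    """Repeat stratified split, train, and score; return one score tuple per iteration."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        train, test = stratified_split(y, 0.3, rng)
        y_pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        results.append(scores([y[i] for i in test], y_pred))
    return results


# Toy usage: two well-separated classes of 10 components each.
X = [[float(i), 0.0] for i in range(10)] + [[float(i) + 100, 0.0] for i in range(10)]
y = [0] * 10 + [1] * 10
res = repeated_evaluation(X, y, n_iter=100)
```

The per-iteration tuples are what a boxplot like panel C would summarize. Passing the classifier in as `fit_predict` keeps the split and scoring independent of the model, so a Random Forest (for example, scikit-learn's `RandomForestClassifier` with its `fit` and `predict` methods) could be wrapped in the same signature.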