Somnotate: A probabilistic sleep stage classifier for studying vigilance state transitions

doi:10.1371/journal.pcbi.1011793

Fig 1.

Establishing a probabilistic sleep stage classifier.

(A) Continuous EEG and EMG recordings were made across a full sleep-wake cycle from freely behaving mice. (B) A fifteen-minute segment of the consensus of manual annotations by four independent experienced sleep researchers (top) and the corresponding anterior EEG, posterior EEG, and EMG recording. (C) Anterior EEG, posterior EEG and EMG multi-taper spectrograms. (D) Two-dimensional representation of the segment after targeted dimensionality reduction via LDA. Negative values in the first component (‘LD1’) and in the second component (‘LD2’) indicate the awake state; positive LD1 with negative LD2 indicates NREM; negative LD1 with positive LD2 indicates REM. (E) Probability of each state when fitting two-dimensional Gaussian distributions to the values in ‘D’. (F) Likelihood of each state given the probability of each state (as shown in ‘E’) and all possible state sequences, weighted by their likelihood given the state transition probabilities (as shown in ‘I’). (G) Distribution of values after dimensionality reduction by LDA. Each dot corresponds to a randomly chosen 1-second epoch. Colour indicates the state assigned in the manual consensus annotation. Lines indicate the standard deviations of multivariate Gaussian distributions, one for each state, fitted to all samples in the data set. (H) The state occupancy based on the time spent in each state across six 24-hour data sets, according to at least four manual annotations. (I) The corresponding state transition probabilities. (J) Accuracy of the LDA classifier, the naive Bayes classifier, and the HMM classifier (i.e. Somnotate). Accuracy was evaluated across six 24-hour data sets in a hold-one-out fashion. Error bars indicate standard deviation. P-values are derived from a Wilcoxon signed rank test with a Bonferroni-Holm correction for multiple comparisons.
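
Panels E, F and I describe how per-epoch state probabilities are combined with state transition probabilities to give the likelihood of each state. The following is a minimal sketch of the standard forward-backward algorithm for a hidden Markov model, not Somnotate's own code. The function name `state_posteriors` is hypothetical, and the emission, transition and initial probabilities are made-up illustrative numbers, not values from the paper.

```python
# Illustrative forward-backward pass for a three-state HMM.
# All numbers below are invented for the example (assumption).
STATES = ("awake", "NREM", "REM")


def state_posteriors(emissions, transitions, initial):
    """Return, for each epoch, the posterior probability of each state.

    emissions[t][s]   : probability of the observation at epoch t under state s
    transitions[i][j] : probability of moving from state i to state j
    initial[s]        : probability of starting in state s
    """
    n_states = len(initial)
    n_epochs = len(emissions)

    # Forward pass, normalised at every step to avoid numerical underflow.
    prev = [initial[s] * emissions[0][s] for s in range(n_states)]
    total = sum(prev)
    prev = [p / total for p in prev]
    forward = [prev]
    for t in range(1, n_epochs):
        cur = [
            emissions[t][j] * sum(prev[i] * transitions[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        total = sum(cur)
        prev = [c / total for c in cur]
        forward.append(prev)

    # Backward pass, also normalised at every step.
    backward = [[1.0] * n_states for _ in range(n_epochs)]
    for t in range(n_epochs - 2, -1, -1):
        nxt = [
            sum(transitions[i][j] * emissions[t + 1][j] * backward[t + 1][j]
                for j in range(n_states))
            for i in range(n_states)
        ]
        total = sum(nxt)
        backward[t] = [b / total for b in nxt]

    # Combine both passes and renormalise each epoch.
    posteriors = []
    for f, b in zip(forward, backward):
        joint = [fi * bi for fi, bi in zip(f, b)]
        total = sum(joint)
        posteriors.append([x / total for x in joint])
    return posteriors


# Toy example: an ambiguous middle epoch flanked by clear NREM epochs.
emissions = [
    [0.05, 0.90, 0.05],
    [0.40, 0.35, 0.25],  # taken alone, this epoch slightly favours "awake"
    [0.05, 0.90, 0.05],
]
transitions = [
    [0.98, 0.02, 0.00],
    [0.01, 0.98, 0.01],
    [0.02, 0.00, 0.98],
]
initial = [1 / 3, 1 / 3, 1 / 3]

posteriors = state_posteriors(emissions, transitions, initial)
for row in posteriors:
    print(STATES[max(range(3), key=lambda s: row[s])], [round(p, 3) for p in row])
```

In this toy case the middle epoch, which slightly favours the awake state when considered in isolation, is assigned to NREM once its neighbours and the low transition probabilities are taken into account.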

Fig 2.

Automated sleep stage classification by Somnotate exceeds manual accuracy.

(A) Somnotate was trained and tested, in a hold-one-out fashion, on six 24-hour data sets. Using a consensus annotation based on at least three manual annotations, the accuracy of the classifier was compared to the accuracy of individual manual annotations (n = 25 manual annotations from 13 experienced sleep researchers). (B) The confusion matrix for individual manual annotations compared to the manual consensus (left), for the automated classifier compared to the manual consensus (middle), and the difference between these two confusion matrices (right). (C) Comparison of state occupancies between the automated and manual consensus annotations. (D) State transition probabilities in the automated annotation, normalised to the state transition probabilities in the manual consensus annotation. (E) Cumulative frequency plot showing the duration of the differences between the automated annotation and the manual consensus. Note that the manual annotation had a temporal resolution of 4 s (vertical dashed line), whereas the automated classification was performed at a temporal resolution of 1 s. (F) Venn diagram of the time points at which the automated annotation and manual consensus differed. (G) Excluding samples where Somnotate is not certain improves accuracy. Classifier accuracy was compared between cases when all samples were included (‘All data’) and when 5.5% of samples were removed because the likelihood of the predicted state dropped below 0.995 (‘High certainty’). The plot indicates mean ± standard deviation and p-values are derived from a Wilcoxon signed rank test. (H) Somnotate was trained on six 24-hour data sets and then tested on a 12-hour data set, which had been independently annotated by ten experienced sleep researchers (as in Fig 1). The accuracy of the annotation by Somnotate was compared to consensus annotations generated from different numbers of manual annotations. Error bars indicate standard deviation. P-values are derived from a Wilcoxon signed rank test.
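
The comparison in panel G can be sketched as follows: accuracy is computed once over all samples and once over only those samples whose predicted state has a likelihood of at least 0.995. The function names and the toy annotations are hypothetical and only illustrate the filtering step; they are not taken from Somnotate.

```python
def accuracy(predicted, reference):
    """Fraction of samples where the prediction matches the reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def high_certainty_accuracy(predicted, reference, certainty, threshold=0.995):
    """Accuracy over samples whose certainty is at least `threshold`.

    Returns (accuracy, fraction_of_samples_excluded).
    """
    kept = [(p, r) for p, r, c in zip(predicted, reference, certainty)
            if c >= threshold]
    excluded_fraction = 1 - len(kept) / len(reference)
    acc = sum(p == r for p, r in kept) / len(kept)
    return acc, excluded_fraction


# Toy data (invented for illustration).
predicted = ["awake", "awake", "NREM", "NREM", "REM", "REM", "NREM", "awake"]
reference = ["awake", "NREM", "NREM", "NREM", "REM", "REM", "NREM", "awake"]
certainty = [0.999, 0.80, 0.999, 0.998, 0.999, 0.97, 1.0, 0.996]

all_data = accuracy(predicted, reference)
high_certainty, excluded = high_certainty_accuracy(predicted, reference, certainty)
print(all_data, high_certainty, excluded)
```

In this toy example the single misclassified sample is also a low-certainty sample, so removing low-certainty samples raises the accuracy.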

Table 1.

Performance of Somnotate and other state-of-the-art algorithms for automated mouse polysomnography compared to manual annotation by experienced experts.

Manual and automated annotations of six 24-hour data sets were evaluated based on the consensus of multiple expert annotations. Values represent mean ± standard deviation.

Fig 3.

Somnotate is robust to errors in the training data, changes in the features of the data, and changes in the vigilance state transition probabilities.

(A) Somnotate’s accuracy was evaluated on six 24-hour data sets, in a hold-one-out fashion, while permuting an increasing fraction of annotations in the training data. Confusion matrices show the results when permuting 10% of the training data annotations (resulting in 6% mislabelled time points; left), permuting 50% of the training data annotations (resulting in 28% mislabelled time points; middle), or permuting 90% of the training data annotations (resulting in 51% mislabelled time points; right). Values represent mean ± standard deviation. (B) Somnotate’s accuracy as a function of the percentage of permuted training data annotations. (C) The accuracy of Somnotate, pre-trained on 24-hour standard sleep-wake cycle data sets, was evaluated against a manual annotation of baseline control data (six 12-hour light cycle-only data sets), and compared to its accuracy on data from the same animals after undergoing a sleep deprivation protocol (six 12-hour light cycle-only data sets). Confusion matrices are shown for the baseline (left), following sleep deprivation (middle), and as the difference between these two confusion matrices (right). (D) Comparison of Somnotate’s overall accuracy on baseline data and data collected after sleep deprivation. P-value is derived from a Wilcoxon signed rank test. (E) Mice experienced experimentally induced awakenings via optogenetic stimulation of ChR2-expressing inhibitory neurons in the lateral preoptic hypothalamus. (F) State transition probabilities during optogenetic stimulation, normalised to the state transition probabilities during the baseline condition. The optogenetic manipulation increased the probability that the animals transitioned from NREM sleep and REM sleep to the awake state. (G) The accuracy of Somnotate, pre-trained on 24-hour standard sleep-wake cycle data sets, was evaluated against a manual annotation for eleven 24-hour data sets recorded during optogenetic stimulation, and for baseline recordings from the same animals on days when optogenetic stimulation was not performed. Confusion matrices are shown for the baseline recordings (left) and the recordings with optogenetic stimulation (right). (H) The accuracy of Somnotate’s annotations was nearly identical for both data sets and not significantly different (p > 0.9, Wilcoxon signed rank test).
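
Panel A reports fewer mislabelled time points than permuted annotations (for example, 6% mislabelled when 10% are permuted). One way this can arise is that a permuted label sometimes lands on a time point that already had the same state. The sketch below illustrates this under an assumed permutation scheme (a random subset of labels is shuffled among itself); the exact procedure used in the paper is not stated in the caption, and the function names and state occupancies are hypothetical.

```python
import random


def permute_annotations(labels, fraction, seed=0):
    """Shuffle the labels of a randomly chosen `fraction` of time points.

    Assumed scheme: the selected labels are shuffled among themselves,
    so the overall number of labels per state is unchanged.
    """
    rng = random.Random(seed)
    n_permuted = round(fraction * len(labels))
    indices = rng.sample(range(len(labels)), n_permuted)
    shuffled = [labels[i] for i in indices]
    rng.shuffle(shuffled)
    permuted = list(labels)
    for i, label in zip(indices, shuffled):
        permuted[i] = label
    return permuted


def mislabelled_fraction(original, permuted):
    """Fraction of time points whose label differs after permutation."""
    return sum(a != b for a, b in zip(original, permuted)) / len(original)


# Toy annotation with invented state occupancies.
labels = ["awake"] * 450 + ["NREM"] * 450 + ["REM"] * 100
permuted = permute_annotations(labels, fraction=0.5)
print(mislabelled_fraction(labels, permuted))
```

Because many shuffled labels coincide with the original label, the mislabelled fraction stays below the permuted fraction.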

Fig 4.

Somnotate identifies intermediate states associated with successful and failed vigilance state transitions.

(A) Three examples of intermediate states identified by Somnotate, in which the probability of the most likely state dropped below 0.995. In each case, the consensus annotation, input signals, power spectra and likelihood of each state assigned by Somnotate are shown. The first example (left) shows a successful state transition from awake to NREM sleep. Just before the transition, Somnotate identifies time points with intermediate states in which the probability of being awake has decreased and the probability of NREM sleep has increased. The second example (middle) shows a brief state transition from NREM sleep to awake, and then back to NREM sleep, which includes time points with intermediate states. The third example (right) shows a failed transition from NREM sleep to awake, which includes a series of time points with intermediate states in which there is a partial decrease in the probability of NREM sleep and a partial increase in the probability of being awake. (B) Ternary plot of the state probabilities assigned to each time point with an intermediate state in six 24-hour data sets (left). In the vast majority of cases, the probability mass was concentrated in one or two states. This was different from a theoretical distribution in which the probability mass outside the most likely state was randomly assigned to the other two states (right). (C) Power spectra extracted for time points with intermediate states (solid lines). For reference, the power spectra for the “pure” states are also shown (dashed lines). (D) Relative frequencies of successful state transitions (per day; left), failed state transitions (middle) and the ratio between these (right). Values indicate mean ± standard deviation. The failure rates of REM transitions were significantly different from one another (p < 0.001; χ² contingency test), and are hence indicated with black arrows; the failure rates of NREM transitions were not significantly different (p > 0.05; χ² contingency test), and are hence indicated with grey arrows.
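
The definition of an intermediate state in panel A (probability of the most likely state below 0.995) can be sketched in code. The grouping of consecutive intermediate time points into runs, and the rule that a run flanked by the same state on both sides counts as a failed transition, are simplifying assumptions made for this illustration; the function names and probabilities are hypothetical.

```python
STATES = ("awake", "NREM", "REM")


def most_likely(probs):
    """Index of the state with the highest probability."""
    return max(range(len(probs)), key=lambda s: probs[s])


def find_intermediate_runs(posteriors, threshold=0.995):
    """Return (start, stop) pairs of consecutive time points whose most
    likely state has a probability below `threshold` (stop is exclusive)."""
    runs = []
    start = None
    for t, probs in enumerate(posteriors):
        uncertain = max(probs) < threshold
        if uncertain and start is None:
            start = t
        elif not uncertain and start is not None:
            runs.append((start, t))
            start = None
    if start is not None:
        runs.append((start, len(posteriors)))
    return runs


def classify_run(posteriors, run):
    """Assumed rule: same state before and after the run -> 'failed',
    different states -> 'successful'. Runs touching either end of the
    recording cannot be classified."""
    start, stop = run
    if start == 0 or stop == len(posteriors):
        return "undetermined"
    before = most_likely(posteriors[start - 1])
    after = most_likely(posteriors[stop])
    return "failed" if before == after else "successful"


# Toy state probabilities (invented): columns are awake, NREM, REM.
toy_posteriors = [
    [0.001, 0.998, 0.001],    # NREM
    [0.30, 0.69, 0.01],       # intermediate
    [0.40, 0.59, 0.01],       # intermediate
    [0.001, 0.998, 0.001],    # NREM again: failed NREM-to-awake transition
    [0.002, 0.997, 0.001],    # NREM
    [0.60, 0.39, 0.01],       # intermediate
    [0.999, 0.0005, 0.0005],  # awake: successful NREM-to-awake transition
]

runs = find_intermediate_runs(toy_posteriors)
outcomes = [classify_run(toy_posteriors, run) for run in runs]
print(runs, outcomes)
```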

Fig 5.

Vigilance state transition failure rates depend upon sleep-wake history.

(A) State occupancy during the first two hours of recovery from a period of sleep deprivation versus baseline recordings performed in the same animals over an equivalent period but not following sleep deprivation. (B) Failure rates for transitions during the awake state (i.e. awake-to-NREM transitions; left), transitions during NREM (NREM-to-awake and NREM-to-REM transitions; middle), and transitions during REM (REM-to-awake and REM-to-NREM transitions; right). P-values are derived from Wilcoxon signed rank tests with a Bonferroni-Holm correction for multiple comparisons.
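
The Bonferroni-Holm correction mentioned above can be sketched as a short step-down procedure. This is a generic illustration of the method, not the authors' analysis code; the function name is hypothetical and the p-values are invented.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are then made non-decreasing and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Invented p-values for three comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_bonferroni(raw)
print(adjusted)
```

A comparison is then considered significant at level α if its adjusted p-value is below α.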
