
Fig 1.

Conceptual overview of the SPINDLE framework.

(a) Measured EEG activity may vary depending on where the electrodes are placed. The assumed input in our setting is two EEG channels and one EMG channel; the EMG signal is recorded from the neck muscle (not depicted for simplicity). (b) Raw signals are processed by windowed Fourier transforms applied to overlapping frames. The output of this preprocessing is a set of time-frequency representations of the EEG/EMG signals, which are further preprocessed. (c) The three two-dimensional spectrograms are then sectioned into epochs corresponding to 4 sec intervals. Each epoch is independently processed by two CNNs. (d) The first CNN estimates whether the evaluated epoch is an artifact. (e) The second CNN estimates the probability of each vigilance state. (f) The sequence of estimated vigilance state probabilities is then corrected using the Viterbi decoding algorithm and a predetermined HMM transition matrix which encodes the transition rules. If an epoch is not designated as an artifact, the most probable vigilance state is assigned. (g) If an epoch is marked as an artifact, the most probable vigilance state determines the type of the artifact: WAKE-artifact, NREM-artifact or REM-artifact. NREM/N, non-rapid eye movement; REM/R, rapid eye movement; WAKE/W, wakefulness; CNN, convolutional neural network; HMM, hidden Markov model.
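A minimal sketch of the decision logic in steps (d)-(g), assuming the state probabilities have already been HMM-smoothed; the function name and structure are illustrative, not SPINDLE's actual code:

```python
# Sketch of the final label assignment (steps (d)-(g) in Fig 1).
# Names and thresholds are illustrative, not taken from SPINDLE.

STATES = ["WAKE", "NREM", "REM"]

def assign_label(is_artifact, state_probs):
    """Combine the artifact CNN decision with the (HMM-smoothed)
    vigilance-state probabilities into a single epoch label."""
    best_state = STATES[max(range(3), key=lambda i: state_probs[i])]
    if is_artifact:
        # Artifacts are typed by the most probable underlying state (step (g)).
        return best_state + "-artifact"
    return best_state        # step (f): assign the most probable state

print(assign_label(False, [0.1, 0.8, 0.1]))  # NREM
print(assign_label(True,  [0.7, 0.2, 0.1]))  # WAKE-artifact
```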


Table 1.

Collected data overview.

Presented are the notable properties of the EEG/EMG animal recordings produced in our study. All recordings were segmented into 4 sec time intervals (epochs) and then annotated, yielding 21600 × 2 labels per animal. The table columns for each cohort and lab give: (a) the number of wildtypes; (b) the number of mutants; (c) the rodent species; (d) the sampling rate of the recording device; (e) the derivation of the 2 EEG signals with respect to the placement of the corresponding EEG electrodes; (f) the derivation of the EMG signal; (g) the number of human experts who scored the data; (h) the duration of each animal recording within the given cohort; and lastly (i) the degree of signal corruption, taken as the average percentage of artifacts computed from the scorings of the corresponding experts. Cohort C was scored by an expert from BaumannLab as well as by an expert from BrownLab; all other cohorts were scored by experts from the same lab. Data acquisition for each animal cohort is explained in detail in Materials and Methods.
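The 21600 × 2 figure follows from a 24 h recording split into 4 sec epochs and scored twice (the twofold annotation); as a quick sanity check:

```python
# Back-of-envelope check of the 21600 x 2 labels per animal quoted above:
# a 24 h recording at 4 s per epoch, annotated by two scorers.
seconds_per_day = 24 * 60 * 60       # 86400 s
epochs_per_animal = seconds_per_day // 4
labels_per_animal = epochs_per_animal * 2
print(epochs_per_animal, labels_per_animal)  # 21600 43200
```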


Fig 2.

Intra-lab and inter-lab human expert agreement.

Confusion matrices derived from the twofold annotation procedure of the EEG/EMG data, with the number of common epochs shown at each intersection and the overall percentage agreement given above each matrix. We evaluated the agreement of human experts from the same lab (intra-lab agreement), and additionally compared the scorings of a BrownLab human expert with those of a BaumannLab human expert on cohort C (inter-lab agreement). The agreements were computed per cohort, for non-artifact and artifact data separately, and again taking all epochs into account.


Fig 3.

Spectral profiles.

For each animal, the averaged frequency spectrum of the EEG recording (its spectral profile) was computed per vigilance class. Each plot in the left column corresponds to one of the 4 animal cohorts and consists of the mean spectral profile curve together with half of the corresponding standard deviation. All curves are normalized relative to the total power of the signal. The middle and right columns respectively show coarse-grained (following the classical delta, theta, sigma and beta bands) and fine-grained histogram binning applied to the raw spectral profiles, with bars representing the summed spectral power in each bin.
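The coarse-grained binning can be sketched as follows; the band edges are the conventional ones and the toy spectrum is illustrative, neither is taken from the paper:

```python
import numpy as np

# Sketch of a normalized spectral profile with coarse band binning
# (middle column of Fig 3). Band edges are conventional assumptions.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "sigma": (10.0, 15.0), "beta": (15.0, 30.0)}

def spectral_profile(psd, freqs):
    """Normalize a power spectrum by total power, then sum per band."""
    psd = np.asarray(psd, float)
    rel = psd / psd.sum()                       # relative power
    return {name: rel[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}

freqs = np.arange(0.5, 30.0, 0.5)               # 0.5 Hz resolution
psd = np.exp(-freqs / 5.0)                      # toy 1/f-like spectrum
profile = spectral_profile(psd, freqs)
```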


Fig 4.

Effects of preprocessing on spectral profiles.

The top row shows per-vigilance-state spectral profiles normalized relative to the total signal power, after the log transformation was applied; the relative differences in amplitude between cohorts are attenuated (compared to Fig 3). The bottom row shows the log-transformed curves after standardization per frequency component, which emphasizes the differences between vigilance states.
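A minimal sketch of these two steps, assuming a power spectrogram laid out as (frequencies × time frames); the epsilon guard is our addition:

```python
import numpy as np

# Sketch of the preprocessing above: log transform, then standardize
# each frequency component across time (zero mean, unit variance).
def log_standardize(spec, eps=1e-12):
    """spec: (n_freqs, n_frames) power spectrogram."""
    log_spec = np.log(spec + eps)                # attenuates amplitude differences
    mean = log_spec.mean(axis=1, keepdims=True)  # per-frequency statistics
    std = log_spec.std(axis=1, keepdims=True)
    return (log_spec - mean) / (std + eps)

rng = np.random.default_rng(0)
spec = rng.lognormal(size=(49, 100))             # toy spectrogram
z = log_standardize(spec)                        # each row: mean ~0, std ~1
```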


Fig 5.

Qualitative analysis of SPINDLE.

150 epochs were extracted from an animal in cohort C, where we found automated classification to be more challenging, in order to qualitatively compare the predictions of SPINDLE to the scorings of two experts from different labs. The first three signals from the top represent the input: two EEG channels and one EMG channel. The spectrogram in the middle is a time-frequency representation of one of the EEG signals. The bottom three plots are hypnograms: the first derived from the scorings of one human expert, the second from the predictions of SPINDLE, and the third from the scorings of the other human expert.


Table 2.

Predicting vigilance states—Agreement analysis.

The evaluation was performed with and without the application of HMM-based post-processing (the outputs of steps (f) and (e) in Fig 1, respectively). The predictive power is quantified by the global accuracy, and for each vigilance state separately by the precision, recall and F1-score according to Eq 2.
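The per-state measures follow the standard definitions of precision, recall and F1 computed from a confusion matrix (the exact form of Eq 2 is given in the paper); as a sketch, with a toy matrix rather than data from the study:

```python
# Standard per-class precision, recall and F1 from a confusion matrix.
def class_metrics(conf, c):
    """conf[i][j]: number of epochs with true state i predicted as state j."""
    tp = conf[c][c]
    fp = sum(conf[i][c] for i in range(len(conf))) - tp  # column sum minus TP
    fn = sum(conf[c]) - tp                               # row sum minus TP
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

conf = [[90, 5, 5],    # WAKE   (toy counts, not study data)
        [10, 80, 10],  # NREM
        [0, 10, 40]]   # REM
p, r, f1 = class_metrics(conf, 0)   # metrics for WAKE: 0.9, 0.9, 0.9
```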


Fig 6.

Comparison against expert intersection—Confusion matrices.

For each cohort independently, our predictions were compared to the annotation intersection of the two human experts. Presented are the corresponding confusion matrices. The total and per-vigilance-state scoring agreement was calculated with respect to Eq 2, whereas the artifact scoring agreement was calculated as described in Eq 1.


Table 3.

Predicting vigilance states—Comparison against individual human experts.

The table shows the global agreement rate measured by comparing (a) individual experts with each other; (b) individual experts with SPINDLE; and (c) the scoring intersection of the two experts with the predictions of SPINDLE. The evaluation was performed on each cohort separately, and only non-artifactual epochs were taken into account.


Table 4.

Predicting artifacts—Comparison against individual human experts.

The table shows the global agreement rate in artifact detection (evaluated with respect to Eq 1) obtained by comparing (a) individual experts with each other; (b) individual experts with SPINDLE; and (c) only the epochs marked as artifacts by both experts with the artifact predictions of SPINDLE. Note that cohort D was omitted since it contained practically no epochs labeled as artifacts.


Fig 7.

Comparative analysis of SPINDLE.

SPINDLE was compared against three other state-of-the-art solutions (FASTER [10], SCOPRISM [6] and Autoscore [9]). Evaluations were performed for each animal separately and the results were grouped per cohort (top four panels). The global error rate was measured as ER = 100 − AC, and for each vigilance state separately the class-specific error rate was measured as CER(C) = 100 − F1(C), where AC and F1(C) are defined in Eq 2. The evaluation of errors was performed on the scoring intersection of the two human raters and did not take corrupted epochs into account. Execution times for scoring 24 hour EEG/EMG animal recordings are given in the bottom-right panel.
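The global error rate follows directly from a confusion matrix; the matrix below is a toy example, not data from the study:

```python
# ER = 100 - AC, with AC the overall accuracy in percent (per Eq 2).
def global_error_rate(conf):
    """conf[i][j]: epochs with true state i predicted as state j."""
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(len(conf)))
    return 100.0 * (1 - correct / total)

conf = [[90, 5, 5], [10, 80, 10], [0, 10, 40]]  # toy WAKE/NREM/REM counts
er = global_error_rate(conf)                    # 210/250 correct -> ER = 16.0
```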


Fig 8.

Scoring agreement comparison of FASTER and SPINDLE on 8 second intervals.

The overlap was measured with respect to all epochs from all animal cohorts, and additionally with respect to each vigilance state individually.


Fig 9.

Predicting parameters of sleep architecture.

Fraction of sleep (top row), bout duration (second row), number of bouts (third row), and number of sleep transitions (bottom row). Shown are box plots computed by evaluating these parameters per hour for each cohort, using data from each human scorer and from the SPINDLE output. W→N, transition from WAKE to NREM; N→W, transition from NREM to WAKE; N→R, transition from NREM to REM.
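Given a hypnogram (the per-epoch state sequence), these parameters reduce to simple run-length and pair counts; a sketch assuming 4 sec epochs:

```python
from itertools import groupby

# Sketch of deriving Fig 9's parameters from a hypnogram; the 4 s
# epoch length matches the study, the toy hypnogram is illustrative.
EPOCH_SEC = 4

def bouts(hypnogram):
    """Collapse the hypnogram into (state, duration_in_seconds) bouts."""
    return [(s, sum(1 for _ in g) * EPOCH_SEC) for s, g in groupby(hypnogram)]

def transitions(hypnogram, src, dst):
    """Count direct src -> dst transitions (e.g. 'N' -> 'R')."""
    return sum(1 for a, b in zip(hypnogram, hypnogram[1:]) if (a, b) == (src, dst))

hyp = list("WWWNNNNRRNNWW")
print(bouts(hyp))                 # [('W', 12), ('N', 16), ('R', 8), ('N', 8), ('W', 8)]
print(transitions(hyp, "N", "R"))  # 1
```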


Fig 10.

Detecting mutation-induced cohort differences in sleep timing.

Fraction of NREM sleep, REM sleep and wakefulness per 2 hour interval across 24 hours, in cohorts A (wildtypes) and B (mutants). Results were evaluated from the scorings of the corresponding two experts and from SPINDLE. The prediction curves were calculated for two different values of the artifact threshold applied to the probabilistic output of the corresponding CNN (see step (d) in Fig 1). 0.5 is the default threshold and indicates that only epochs with > 50% confidence of being non-corrupted were kept in the analysis; a similar procedure was applied for the 0.7 artifact threshold. Overall statistical significance was measured using two-way ANOVA (marked as A). ▲ marks p < 0.05 regions, indicating statistically significant differences in the corresponding bins measured using a two-tailed T-test with equal variances. The curves represent mean ± SEM. ZT, zeitgeber time.


Fig 11.

Detecting mutation-induced cohort differences in EEG spectra.

EEG spectral power density plots of cohorts A (wildtypes) and B (mutants) during NREM, REM and wakefulness. Results were evaluated from the scorings of the corresponding two experts and from SPINDLE. The prediction curves were calculated for two different values of the artifact threshold applied to the probabilistic output of the corresponding CNN (see step (d) in Fig 1). 0.5 is the default threshold and indicates that only epochs with > 50% confidence of being non-corrupted were kept in the analysis; a similar procedure was applied for the 0.7 artifact threshold. Overall statistical significance was measured using two-way ANOVA (marked as A). ▲ marks p < 0.05 regions, indicating statistically significant differences in the corresponding 0.5 Hz frequency bins measured using a two-tailed T-test with equal variances. The curves represent mean ± SEM.


Fig 12.

Data preprocessing and CNN input preparation.

The figure depicts the creation of the three-channel two-dimensional input for the CNNs. In step (a), the raw time series of the EEG/EMG signals are separately transformed into the corresponding time-frequency domain (the power spectral density is computed) via a sequence of short-time Fourier transforms applied to overlapping Hamming windows. In step (b), the EEG signals are band-pass filtered (0.5–24 Hz) and the EMG power is integrated over the 0.5–30 Hz frequency range, resulting in a one-dimensional representation of muscle activity over time; this one-dimensional EMG representation is then converted into a two-dimensional one by replicating the signal. Finally, in step (c), the data is log transformed and standardized per frequency component.
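Step (a) can be sketched as follows; the window length, hop size and test signal are illustrative choices, not SPINDLE's actual parameters:

```python
import numpy as np

# Sketch of step (a): short-time Fourier transform over overlapping
# Hamming windows, yielding a power spectrogram.
def power_spectrogram(x, fs, win_len, hop):
    win = np.hamming(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power per frame
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return freqs, spec.T                              # (n_freqs, n_frames)

fs = 128                                # illustrative sampling rate
t = np.arange(0, 8, 1 / fs)             # 8 s of signal
x = np.sin(2 * np.pi * 6 * t)           # toy 6 Hz "theta-like" oscillation
freqs, spec = power_spectrogram(x, fs, win_len=256, hop=128)
peak = freqs[spec.mean(axis=1).argmax()]  # peak recovered near 6 Hz
```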


Fig 13.

Sleep scoring CNN architecture.

Presented are the architectural details of the CNN which estimates the probability distribution over vigilance states for the target epoch Et. The input to the CNN is formed as shown in Fig 12. In addition to Et, the CNN operates over the four neighboring epochs Et−2, Et−1, Et+1 and Et+2 to capture contextual information. The illustrated CNN consists of two max-pooling layers (depicted in blue), one convolutional layer (depicted in green), and two fully-connected layers (depicted in red); at the very end, a softmax layer outputs the class probabilities. The dimensions of the first max-pooling layer are (width, height) = (2, 3) with corresponding strides (2, 3); the dimensions of the second max-pooling layer are (2, 2) with strides (2, 2); and the dimensions of the convolutional kernel are (3, 3) with strides (1, 1).
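The spatial dimensions propagate through these layers as follows, assuming 'valid' (no-padding) pooling and convolution; the input shape and the layer ordering here are our assumptions for illustration, not taken from the paper:

```python
# Dimension bookkeeping through the pooling/convolution layers above.
def out_size(w, h, kw, kh, sw, sh):
    """Output (width, height) after a valid kernel (kw, kh) with stride (sw, sh)."""
    return (w - kw) // sw + 1, (h - kh) // sh + 1

w, h = 160, 24                       # hypothetical input patch (width, height)
w, h = out_size(w, h, 2, 3, 2, 3)    # first max-pooling: kernel (2, 3), stride (2, 3) -> (80, 8)
w, h = out_size(w, h, 3, 3, 1, 1)    # convolution: kernel (3, 3), stride (1, 1)       -> (78, 6)
w, h = out_size(w, h, 2, 2, 2, 2)    # second max-pooling: kernel (2, 2), stride (2, 2) -> (39, 3)
```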


Fig 14.

CNN-HMM for constraining state transitions.

The figure illustrates how the HMM is used on top of the CNN to enforce prediction sequences which adhere to physiological constraints. In particular, we disallow REM→NREM and WAKE→REM vigilance state transitions. The constraints are encoded through the transition probability matrix of the HMM, and the observation likelihoods are implicitly calculated by the CNN.
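A minimal Viterbi decoder with such a constrained transition matrix might look as follows; the allowed-transition probabilities are illustrative placeholders, not SPINDLE's actual matrix:

```python
import math

STATES = ["WAKE", "NREM", "REM"]
TRANS = [            # rows: from-state, cols: to-state (W, N, R)
    [0.9, 0.1, 0.0], # WAKE -> REM forbidden (probability 0)
    [0.1, 0.8, 0.1],
    [0.5, 0.0, 0.5], # REM -> NREM forbidden (probability 0)
]
EPS = 1e-12          # avoids log(0) for forbidden transitions

def viterbi(obs_probs):
    """Decode the most probable state path from per-epoch softmax outputs."""
    n = len(STATES)
    score = [math.log(p + EPS) for p in obs_probs[0]]
    back = []
    for probs in obs_probs[1:]:
        prev, ptr = score[:], []
        for j in range(n):
            i = max(range(n), key=lambda i: prev[i] + math.log(TRANS[i][j] + EPS))
            ptr.append(i)
            score[j] = prev[i] + math.log(TRANS[i][j] + EPS) + math.log(probs[j] + EPS)
        back.append(ptr)
    path = [max(range(n), key=score.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return [STATES[i] for i in reversed(path)]

# A spurious one-epoch WAKE -> REM flip is smoothed away by the constraint.
decoded = viterbi([[0.9, 0.05, 0.05], [0.4, 0.1, 0.5], [0.9, 0.05, 0.05]])
print(decoded)  # ['WAKE', 'WAKE', 'WAKE']
```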
