Detection of sleep apnea from single-channel electroencephalogram (EEG) using an explainable convolutional neural network (CNN)

doi:10.1371/journal.pone.0272167

Fig 1.

Data processing and subjectwise 10-fold cross-validation.

(A) We used recordings from the SHHS dataset [34, 35]. For each subject, we low-pass filtered, downsampled, normalised, segmented, labelled, and undersampled the recordings as indicated. (B) On each fold of our 10-fold cross-validation procedure, each subject’s recordings appeared in either the training set (white) or the testing set (gray), never in both. For example, subjects 1, 2, and 3 contributed to the training sets of folds 1 to 9 but formed part of the test set of fold 10. Our assessment of the model’s performance therefore captured its ability to generalize across subjects.
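
The subjectwise split described in (B) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `subjectwise_folds`, the round-robin assignment of subjects to folds, and the toy data (30 subjects with 4 segments each) are all assumptions made for the example.

```python
import random

def subjectwise_folds(subject_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) so that no subject spans both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Deal subjects round-robin into n_folds disjoint groups.
    groups = [set(subjects[k::n_folds]) for k in range(n_folds)]
    for test_subjects in groups:
        train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
        test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
        yield train_idx, test_idx

# Toy data (hypothetical): 30 subjects with 4 segments each.
segment_subject = [s for s in range(30) for _ in range(4)]
folds = list(subjectwise_folds(segment_subject))
```

Because folds are built from subjects rather than from segments, every segment is tested exactly once and always by a model that never saw its subject during training.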

Table 1.

Optimized hyperparameters and hyperparameter search spaces.

The rightmost column shows the optimal hyperparameters identified by Hyperband-based tuning.
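
Hyperband builds on successive halving: many configurations are tried on a small budget, and only the best fraction is given more. The sketch below shows one such bracket under stated assumptions. The search space (a single `dropout` value), the scoring function `toy_score`, and the parameters `min_budget` and `eta` are hypothetical and are not taken from Table 1; full Hyperband runs several brackets with different starting budgets.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of configs each round; survivors get eta times more budget."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = scored[:max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]

def toy_score(config, budget):
    # Stand-in for "validation score after `budget` epochs" (hypothetical):
    # best at dropout = 0.3, and improving as the budget grows.
    return -(abs(config["dropout"] - 0.3) + 1.0 / budget)

# Hypothetical search space with nine candidate dropout rates.
candidates = [{"dropout": d} for d in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)]
best = successive_halving(candidates, toy_score)
```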

Fig 2.

The architecture of our CNN trained to detect SA, comprising three convolutional layers.

Convolutions had a stride size of one and used zero padding. Each convolutional layer was followed by batch normalisation, ELU activation, and dropout (these operations are not illustrated). The dense layer was preceded by a flattening operation, and followed by ELU activation and a dropout layer. The output layer used a softmax classifier [42]. (Symbology after Schirrmeister and colleagues [27]).
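
To make the layer ordering concrete, here is a minimal NumPy forward pass through three stride-one, zero-padded convolutions with ELU, followed by flattening, a dense layer, and a softmax output. It is a sketch under assumptions: the filter counts, kernel widths, segment length, and random weights are hypothetical (the tuned values are in Table 1), and batch normalisation and dropout are omitted for brevity (dropout is inactive at inference time).

```python
import numpy as np

def conv1d_same(x, kernels):
    """x: (channels_in, time); kernels: (channels_out, channels_in, width), odd width.
    Stride 1 with zero padding, so the output length equals the input length."""
    c_out, c_in, width = kernels.shape
    pad = width // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))          # zero padding on the time axis
    out = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for i in range(c_in):
            out[o] += np.correlate(xp[i], kernels[o, i], mode="valid")
    return out

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0)) - 1))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 256))                 # hypothetical single-channel EEG segment
kernel_shapes = [(4, 1, 5), (8, 4, 5), (8, 8, 5)]  # hypothetical filter counts and widths
h = x
for shape in kernel_shapes:
    h = elu(conv1d_same(h, 0.1 * rng.standard_normal(shape)))

dense_w = 0.01 * rng.standard_normal((16, h.size))
hidden = elu(dense_w @ h.ravel())                 # flatten, dense layer, ELU
out_w = 0.1 * rng.standard_normal((2, 16))
probs = softmax(out_w @ hidden)                   # two classes: non-apnea, apnea
```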

Table 2.

The mean distribution of annotations within the training and testing sets.

The validation set has the same proportions as the training set. Each EEG segment has an apnea annotation (i.e., “apnea” or “non-apnea”) and a sleep-stage annotation (i.e., “wake”, “REM” or “NREM”). On average, there were 1,144 segments per patient before undersampling and 378 segments per patient after undersampling.
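
The caption does not spell out the undersampling rule. One common choice, assumed here purely for illustration, is to randomly discard segments from the larger class until both classes are the same size. The function name `undersample` and the toy label counts are hypothetical.

```python
import random
from collections import defaultdict

def undersample(labels, seed=0):
    """Return sorted indices with every class cut down to the size of the rarest class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_keep = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_keep))      # random subset, without replacement
    return sorted(kept)

# Hypothetical, imbalanced segment labels for one subject.
labels = ["non-apnea"] * 90 + ["apnea"] * 10
kept = undersample(labels)
```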

Fig 3.

ROC curves summarizing the performance of our SA network.

We represent each fold of our subjectwise 10-fold cross-validation with a separate curve. Across folds, the AUC averaged 0.804 (s.t.d. = 0.031).
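
A per-fold AUC and its across-fold summary can be computed as sketched below. The AUC is written as the probability that a randomly chosen positive segment receives a higher score than a randomly chosen negative one. The three toy folds are made up, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev

def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical (labels, scores) pairs, one per fold.
fold_results = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.5, 0.1]),
]
aucs = [roc_auc(y, s) for y, s in fold_results]
auc_mean, auc_std = mean(aucs), stdev(aucs)
```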

Fig 4.

Confusion matrix, summarizing the performance of our SA network.

The area of each square represents the value of the corresponding matrix entry. Values are counts averaged across our subjectwise 10-fold cross-validation. The intervals (±) associated with each value show s.t.d. across folds. Overall, the network achieved an accuracy of 76.8%, as indicated by the mass along the matrix’s main diagonal.
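
The relationship between a confusion matrix and accuracy can be illustrated with a short sketch. The row and column orientation (rows are true classes, columns are predictions) and the toy labels are assumptions for the example and may differ from the figure.

```python
def confusion_matrix(y_true, y_pred, classes=("non-apnea", "apnea")):
    """Rows are true classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of all counts that lie on the main diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(map(sum, matrix))

# Hypothetical labels and predictions for eight segments.
y_true = ["apnea", "apnea", "non-apnea", "non-apnea",
          "apnea", "non-apnea", "non-apnea", "apnea"]
y_pred = ["apnea", "non-apnea", "non-apnea", "non-apnea",
          "apnea", "apnea", "non-apnea", "apnea"]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
```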

Fig 5.

Confusion submatrices, each corresponding to one or other of three sleep stages: Wake (left), REM (middle), and NREM (right).

For wake and REM, our SA network appeared to behave in a biased fashion. Graphical conventions are as in Fig 4.
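
Splitting one confusion matrix into per-stage submatrices amounts to grouping segments by their sleep-stage annotation before counting. The sketch below assumes binary labels (1 = apnea, 0 = non-apnea) and uses made-up data; the function name `confusion_by_stage` is hypothetical.

```python
from collections import defaultdict

def confusion_by_stage(y_true, y_pred, stages):
    """Map each sleep stage to its own 2x2 matrix [[TN, FP], [FN, TP]] (1 = apnea)."""
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for t, p, stage in zip(y_true, y_pred, stages):
        tables[stage][t][p] += 1                  # row = true label, column = prediction
    return dict(tables)

# Hypothetical annotations for six segments.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0]
stages = ["wake", "wake", "REM", "REM", "NREM", "NREM"]
by_stage = confusion_by_stage(y_true, y_pred, stages)
```

Summing the submatrices entry by entry recovers the overall confusion matrix, so a stage in which one off-diagonal entry dominates is where the bias shows up.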

Fig 6.

(A) Effect of critical-band masking on our SA network’s performance. We used high-, medium-, and low-intensity noise: SNR = 5, 10, and 20, respectively (Methods). Overall, high-intensity noise decreased performance more than low-intensity noise. The deleterious effect of noise was pronounced in some frequency bands but not in others. For example, adding bandlimited noise to test signals in the delta band (< 4 Hz) caused MCC to decrease from 0.38 to 0.14. The lower horizontal solid line (MCC = 0.0) indicates the performance baseline (Methods), and the upper horizontal solid line indicates network performance in the absence of noise (MCC = 0.38). The error bars (shown only for high-intensity noise) are 95% confidence intervals computed across folds of our subjectwise 10-fold cross-validation. The Greek letters (top) mark traditional frequency bands; sigma marks the band associated with sleep spindles. (B) Effect of critical-band masking on our re-trained SA network; we re-trained the network after the data were stage-wise shuffled. Here, adding bandlimited noise to test signals in the delta and alpha (8 to 13 Hz) bands had little effect on network performance. By contrast, noise in the lower beta and gamma bands heavily reduced performance. The upper horizontal line marks the performance of the stage-wise shuffled network in the absence of noise (MCC = 0.275). Other graphical conventions are as in (A).
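
Band-limited masking noise at a chosen SNR can be generated as in the sketch below. Several details are assumptions rather than the authors' procedure (which is given in Methods): SNR is treated as a linear ratio of signal power to noise power over the whole segment (if the values are in decibels, convert with `10 ** (snr_db / 10)` first), the noise is Gaussian and shaped by zeroing FFT bins, and the sampling rate, segment length, and test signal are hypothetical.

```python
import numpy as np

def bandlimited_noise(n_samples, fs, f_lo, f_hi, rng):
    """White Gaussian noise with all FFT bins outside [f_lo, f_hi) set to zero."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=n_samples)

def add_masking_noise(signal, fs, f_lo, f_hi, snr, rng):
    """Add band-limited noise scaled so that signal power / noise power equals snr."""
    noise = bandlimited_noise(signal.size, fs, f_lo, f_hi, rng)
    target_noise_power = np.mean(signal ** 2) / snr
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise, noise

rng = np.random.default_rng(0)
fs = 64.0                                  # hypothetical sampling rate in Hz
t = np.arange(int(30 * fs)) / fs           # hypothetical 30-s segment
eeg = np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.sin(2 * np.pi * 10.0 * t)
# Mask the delta band with high-intensity noise (SNR = 5).
masked, noise = add_masking_noise(eeg, fs, f_lo=0.5, f_hi=4.0, snr=5.0, rng=rng)
```

Under this definition a lower SNR means stronger noise, which matches the caption's pairing of SNR = 5 with high intensity.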

Fig 7.

Importance matrix, showing 1st-layer filters comprising our trained SA network.

Matrix columns correspond to folds from our subjectwise 10-fold cross-validation; rows correspond to importance rank (i.e., the most important filter on each fold is shown in row 1). For example, on the first fold of cross-validation, the filter kernel illustrated at column 1, row 1 (top-left) was determined to be the most important: lesioning this filter reduced the trained SA network’s performance more than lesioning any other filter on this fold.
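
Ranking filters by lesioning can be sketched as a loop that silences one filter at a time and records the drop in performance. Here lesioning is assumed to mean setting a filter's weights to zero, and `toy_score` is a hypothetical stand-in for evaluating the trained network on a test fold; neither detail is confirmed by the caption.

```python
import numpy as np

def rank_filters_by_lesion(filters, score_fn):
    """Zero each filter in turn and rank filters by the resulting drop in score."""
    baseline = score_fn(filters)
    drops = []
    for k in range(len(filters)):
        lesioned = filters.copy()
        lesioned[k] = 0.0                    # lesion: silence filter k only
        drops.append(baseline - score_fn(lesioned))
    order = np.argsort(drops)[::-1]          # largest drop first (rank 1)
    return order, np.array(drops)

# Toy stand-in for "evaluate the trained network on the test fold" (hypothetical).
weights = np.array([0.1, 0.7, 0.2])
def toy_score(filters):
    return float(weights @ np.abs(filters).sum(axis=1))

filters = np.ones((3, 5))                    # 3 hypothetical first-layer kernels of width 5
order, drops = rank_filters_by_lesion(filters, toy_score)
```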

Fig 8.

Amplitude spectra of 1st-layer filters important to the SA network’s performance.

We show spectra for the 1st-, 2nd-, and 3rd-most important filters (“Rank 1, 2, and 3”, respectively). Important filters appeared to attenuate the delta band and amplify the beta and gamma bands. The shaded rectangle marks a 95% confidence interval (i.e., -1.96 < z-score < 1.96), within which the spectral amplitude of rank 1, 2, and 3 filters is not appreciably different from that of all other 1st-layer filters comprising the ensemble.
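
The amplitude spectrum of a filter kernel is the magnitude of its discrete Fourier transform. The sketch below computes it and then z-scores the top-ranked filters against the remaining ones at each frequency. The z-scoring rule, the sampling rate, the FFT length, and the random kernels are assumptions for illustration and may not match how the figure was produced.

```python
import numpy as np

def amplitude_spectra(kernels, fs, n_fft=256):
    """Zero-pad each kernel to n_fft samples and return (freqs, |FFT|) per kernel."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, np.abs(np.fft.rfft(kernels, n=n_fft, axis=1))

def zscore_against_rest(spectra, top_idx):
    """z-score of the top filters' mean spectrum against all other filters, per frequency."""
    rest = np.delete(spectra, top_idx, axis=0)
    return (spectra[top_idx].mean(axis=0) - rest.mean(axis=0)) / rest.std(axis=0)

rng = np.random.default_rng(0)
kernels = rng.standard_normal((16, 9))       # 16 hypothetical first-layer kernels of width 9
fs = 64.0                                    # hypothetical sampling rate in Hz
freqs, spectra = amplitude_spectra(kernels, fs)
z = zscore_against_rest(spectra, top_idx=[0, 1, 2])
outside_band = np.abs(z) > 1.96              # frequencies outside the shaded interval
```

As a sanity check on the method, a first-difference kernel `[1, -1]` has zero amplitude at 0 Hz and an amplitude of 2 at the Nyquist frequency, which is the high-pass shape the caption describes for important filters.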
