
SMMTM: Motor imagery EEG decoding algorithm using a hybrid multi-branch separable convolutional self-attention temporal convolutional network

  • DianGuo Cao ,

    Contributed equally to this work with: DianGuo Cao, ZhenYuan Yu

    Roles Formal analysis, Supervision

    caodg@qfnu.edu.cn

    Affiliation The College of Engineering, Qufu Normal University, Rizhao, Shandong, China

  • ZhenYuan Yu ,

    Contributed equally to this work with: DianGuo Cao, ZhenYuan Yu

    Roles Formal analysis, Validation, Visualization, Writing – original draft

    Affiliation The College of Engineering, Qufu Normal University, Rizhao, Shandong, China

  • Jinqiang Wang,

    Roles Supervision

    Affiliation The College of Engineering, Qufu Normal University, Rizhao, Shandong, China

  • Yuqiang Wu

    Roles Funding acquisition, Supervision

    Affiliation The College of Engineering, Qufu Normal University, Rizhao, Shandong, China

Abstract

Motor imagery (MI) is a brain-computer interface (BCI) technology with the potential to change human life in the future. MI signals have been widely applied in various BCI applications, including neurorehabilitation, smart home control, and prosthetic control. However, the limited accuracy of MI signal decoding remains a significant barrier to the broader growth of BCI applications. In this study, we propose the SMMTM model, which combines spatiotemporal convolution (SC), multi-branch separable convolution (MSC), multi-head self-attention (MSA), a temporal convolutional network (TCN), and multimodal feature fusion (MFF). Specifically, we use the SC module to capture both temporal and spatial features. We design an MSC to capture temporal features at multiple scales. In addition, MSA is designed to extract valuable global features with long-term dependencies. The TCN is employed to capture higher-level temporal features. The MFF consists of feature fusion and decision fusion, using the features output from the SMMTM to improve robustness. The SMMTM was evaluated on the public benchmark BCI Competition IV 2a and 2b datasets; the within-subject classification accuracies on the two datasets were 84.96% and 89.26%, respectively, with kappa values of 0.797 and 0.756. The cross-subject classification accuracy on the 2a dataset was 69.21%, with a kappa value of 0.584. These results indicate that the SMMTM significantly enhances decoding performance, providing a strong foundation for advancing practical BCI implementations.

Introduction

BCI is a technology that enables direct communication between the brain and external machines [1], allowing computers to interpret brain signals and use them to control external devices directly. Consequently, BCI technology holds the potential to transform human life. Currently, BCIs have been widely used in human-computer interaction, sports rehabilitation, and disease treatment [2–4]. The main BCI paradigms include steady-state visual evoked potential (SSVEP), P300, and MI [5]. Among these, MI is one of the key research focuses in BCI, with MI-based systems already capable of controlling devices such as electric wheelchairs, cursors, and exoskeletons using EEG signals [6–8]. Therefore, MI technology can be applied across various industries, significantly enhancing people’s lives. However, the practical application of MI remains constrained by the low accuracy of signal decoding, limiting its broader implementation.

Currently, two main types of decoding methods exist for MI-EEG signals: machine learning (ML) and deep learning (DL) [9]. ML involves two stages: feature extraction and classifier design. Generally, before extracting EEG features, it is essential to remove signals and artifacts unrelated to the EEG, such as electromyography (EMG), electrooculography (EOG), and electrocardiography (ECG) [10]. Subsequently, temporal and spatial features are extracted from the processed signals to decode the EEG data [11]. After feature extraction, traditional ML algorithms, such as support vector machine (SVM) and k-nearest neighbour (KNN) methods, are used to classify MI [12,13]. However, removing artifacts and extracting features require rich prior knowledge, and the performance of ML algorithms depends on the feature selection [14].

DL can automatically extract specific features from raw EEG signals without the need for manually designed features [15]. Over the past six years, the use of DL for MI classification has increased significantly [16]. Several architectures have been proposed for decoding MI signals, including recurrent neural networks (RNN) [17], deep belief networks (DBN) [18], and convolutional neural networks (CNN) [19]. Heilmeyer et al. [20] showed that different deep CNNs can achieve accuracies comparable to those of traditional methods in EEG decoding tasks. The compact EEGNet [21] extracts both the temporal and spatial features of EEG signals by using different convolution kernel shapes, while generalizing to multiple EEG datasets and achieving relatively good performance. The temporal convolutional network (TCN), based on the CNN, was proposed for time-series modelling [22]; it can expand the receptive field exponentially with only a linear increase in the number of parameters. EEG-TCNet [23] combines EEGNet and a TCN to build a high-performance network, feeding the temporal features produced by EEGNet into the TCN to acquire high-level temporal features. However, a single convolution with a single kernel size cannot efficiently extract high-level features at multiple scales. MTFB-CNN [24] constructs parallel convolutions to acquire high-level features in the time-frequency domain. Incep-EEGNet [25] uses an Inception-based network model to capture multi-scale features from the raw signal. In recent research, MSA [26], which can compute multiple valuable temporal-sequence features in parallel, has been used for decoding EEG signals. ATCNet [27] uses MSA to highlight the most important information in EEG signals, achieving high classification accuracy. AE-FBCSP combines autoencoder-based feature compression with transfer learning to significantly improve cross- and within-subject motor imagery EEG decoding performance across multiple classification tasks [28]. Few-shot transfer learning approaches have also shown promise in improving MI decoding performance and generalization across tasks [29].

Currently, DL methods for decoding two-class MI tasks have become relatively mature and show potential for further practical applications [30]. However, as the number of classes increases, decoding performance remains limited, and further exploration is required for four-class MI tasks [31].

In this study, we propose the SMMTM model, which extracts the primary spatiotemporal features from EEG signals using the serial convolutions of the SC module. However, this alone overlooks a significant amount of valuable information from the intermediate layers of the neural network. To address this issue, we designed the MSC, a parallel structure that captures multi-scale temporal features. Additionally, to overcome the limitation of CNNs in capturing long-term dependencies in time-series data, we introduced the TCN to extract high-level temporal features. Furthermore, by using MSA, we were able to extract more valuable global information from the MI data. To address the issues of weak feature stability and the difficulty of extracting hidden information, we propose multimodal feature fusion (MFF). Through feature fusion, the multi-scale features extracted by the MSC and the global features extracted by the MSA are fused along the depth dimension to obtain hybrid features, enhancing their representational capacity. Additionally, decision fusion combines the outputs of the MSC and TCN to improve the robustness of the final result. Ultimately, SMMTM achieves high classification accuracy in four-class tasks and demonstrates a certain degree of generalization capability. The performance of the SMMTM model was evaluated on the publicly available BCI-2a and BCI-2b datasets. The main contributions of this paper are as follows:

  1. We propose SMMTM, a high-performance MI-EEG decoding model that utilizes SC to extract primary spatiotemporal features, MSC to capture multi-scale temporal features, MSA to extract global temporal features, TCN to capture high-level temporal features, and MFF to enhance the robustness and generalization of the results.
  2. To address the issue of insufficient feature extraction across different frequency bands, filters of varying lengths were designed for each branch, enabling the multi-branch structure to capture multi-scale temporal features. Results from ablation experiments demonstrate that the MSC significantly improves the classification accuracy of MI. The effectiveness of the MSC in extracting multi-scale temporal features is demonstrated through weight visualization, while the rationality of the SMMTM model is validated using t-SNE visualization.
  3. The proposed model achieves excellent results on the publicly available datasets BCI Competition IV-2a (BCI-2a) and BCI Competition IV-2b (BCI-2b), outperforming most of the existing models in terms of classification accuracy.

Proposed method

Overall structure of SMMTM

In this paper, we propose the SMMTM model, which consists of five modules. The structure of the SMMTM model is illustrated in Fig 1, comprising the SC module, MSC module, MSA module, TCN module, and MFF module.

Initially, spatiotemporal features are extracted using the SC, which consists of serial temporal and spatial convolutions. Subsequently, the MSC module, which is composed of multi-branch separable convolutions, captures temporal features at multiple scales. The output features are then passed to the MSA to highlight valuable information within the temporal sequence. Through feature fusion, the features obtained from the MSC and MSA are merged along the depth dimension and subsequently passed to the TCN to extract high-level temporal features. The results generated by the MSC and TCN are then passed to the decision fusion module to enhance classification accuracy and robustness. Finally, the classification results of the SMMTM model are produced using a SoftMax function.

Spatiotemporal convolution module

EEG signals exhibit distinct spatiotemporal characteristics. The temporal dynamics and spatial distribution of the signals are critical for MI classification tasks [32–34]. Therefore, the temporal convolution in the spatiotemporal convolution (SC) module is used to extract the dynamic temporal features of EEG signals, and the spatial convolution is used to extract the spatial distribution features of the electrodes. The combination of temporal and spatial convolutions has been proven essential for the efficient classification of EEG signals in multiple studies [21,27,35]. The SC module consists of serial temporal and spatial convolutions. Initially, temporal convolution is applied to extract primary temporal features, followed by spatial convolution to capture primary spatial features. This dual-feature extraction is crucial for EEG-based tasks, as it allows the model to simultaneously process temporal series over time and spatial relationships across different EEG channels. By integrating temporal and spatial information, the SC enhances the model’s ability to understand complex brain activities.

The core of the SC consists of two convolutional layers. The first layer is the temporal CNN. We utilize F1 = 32 temporal filters with a kernel size of (1, 32) to obtain temporal features from the MI-EEG signals. The second layer is the spatial CNN, which uses depthwise convolution with F2 spatial filters of size (C, 1) to capture spatial features. C represents the number of EEG channels, with 22 for BCI-2a and 3 for BCI-2b. Preliminary experiments with additional temporal CNN layers showed minimal improvement in representation but led to overfitting and reduced generalization due to the limited sample size of the datasets. Therefore, using a single layer for both temporal and spatial convolutions strikes an optimal balance between feature extraction and model generalization. The depth of the output feature from the SC module is F2 = D × F1, where D is typically set to 2, representing the number of filters linked to each filter of the previous layer. Following the spatial convolution, an average pooling layer is applied with a kernel size of (1, 8) and a stride of (1, 8). The pooling operation reduces the signal sampling rate to approximately 32 Hz.
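The layer stack described above can be sketched in PyTorch as a minimal illustration under our own naming; only F1 = 32, the (1, 32) temporal kernel, the (C, 1) depthwise spatial kernel, D = 2, and the (1, 8) average pooling come from the text, while the batch size and the 1000-sample (4 s at 250 Hz) input length are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the SC module for BCI-2a (C = 22 channels).
F1, D, C = 32, 2, 22
F2 = D * F1  # 64

sc = nn.Sequential(
    nn.Conv2d(1, F1, kernel_size=(1, 32), padding="same", bias=False),  # temporal filters
    nn.BatchNorm2d(F1),
    nn.Conv2d(F1, F2, kernel_size=(C, 1), groups=F1, bias=False),       # depthwise spatial filters
    nn.BatchNorm2d(F2),
    nn.ELU(),
    nn.AvgPool2d(kernel_size=(1, 8), stride=(1, 8)),                    # downsample 250 Hz -> ~32 Hz
)

x = torch.randn(4, 1, C, 1000)   # (batch, 1, channels, samples)
out = sc(x)                      # (batch, F2, 1, 125)
```

The depthwise spatial convolution (groups = F1) ties each spatial filter to a single temporal feature map, mirroring the EEGNet-style design the module builds on.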

Multi-branch separable convolution module

EEG signals have multi-band characteristics. To capture the differences in EEG signals across different frequency bands, the multi-branch separable convolution (MSC) module was designed. This approach enables the model to consider both low-frequency and high-frequency components simultaneously, effectively enhancing its ability to understand the signals. The MSC module builds upon the Inception architecture, which has proven effective for multi-scale feature extraction [25]. The MSC module consists of three branches, as visualized in Fig 2.

Fig 2. Structure of multi-branch separable convolution.

https://doi.org/10.1371/journal.pone.0333805.g002

The selection of three branches is motivated by the need to cover short-, medium-, and long-range temporal dependencies. Each branch consists of two convolution layers. The first convolutional layers in the three branches use kernel sizes of (1, 8), (1, 16), and (1, 32), respectively, to capture temporal features at the multiple scales characteristic of EEG signals. Each branch employs depthwise separable convolutions, where the second layer uses a kernel size of (1, 1) for channel-wise integration, enabling efficient feature extraction with fewer parameters. In the first and second branches of the MSC module, the number of filters in both convolution layers is set to F2/4; in the third branch, it is F2/2. Batch normalization (BN) is applied after the second convolution layer to enhance the generalization capability of the model, followed by an exponential linear unit (ELU) activation function. The feature outputs of the three branches are then concatenated along the depth dimension.
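A hedged PyTorch sketch of one possible MSC implementation follows; the branch kernel sizes (1, 8)/(1, 16)/(1, 32) and the F2/4, F2/4, F2/2 filter counts come from the text, while the "same" padding and the input shape (the SC output) are our assumptions:

```python
import torch
import torch.nn as nn

F2 = 64  # feature depth coming out of the SC module

def branch(k, out_ch):
    # Depthwise temporal convolution followed by a (1, 1) pointwise convolution,
    # then BN and ELU as described in the text.
    return nn.Sequential(
        nn.Conv2d(F2, F2, kernel_size=(1, k), groups=F2, padding="same", bias=False),
        nn.Conv2d(F2, out_ch, kernel_size=(1, 1), bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ELU(),
    )

branches = nn.ModuleList([
    branch(8,  F2 // 4),   # short-range temporal features
    branch(16, F2 // 4),   # medium-range
    branch(32, F2 // 2),   # long-range
])

x = torch.randn(4, F2, 1, 125)                    # SC module output
out = torch.cat([b(x) for b in branches], dim=1)  # concat along depth
```

Note that the branch widths sum to F2/4 + F2/4 + F2/2 = F2, so the concatenated output preserves the feature depth.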

Multi-head self-attention module

The attention mechanism mimics the behavior of the human brain by selectively focusing on important elements while ignoring others. Integrating attention with a deep learning model helps it automatically focus on the most important parts of the input data through learning. MSA [36] is a technique widely used in computer vision and natural language processing [37], capable of computing multiple global time-dependent features in parallel to enhance model accuracy. Therefore, we utilize MSA to further extract critical information from the global temporal sequence. MSA takes each element of the same sequence as input, then computes and aggregates attention weights between the elements to obtain a representation of each element. The structure of the MSA is shown in Fig 3.

Fig 3. Structure of multi-head self-attention.

https://doi.org/10.1371/journal.pone.0333805.g003

Calculating multi-head attention for an input sequence Y typically involves three main steps:

1. Linear transformation: The input sequence Y undergoes three sets of learnable linear transformations to obtain the three vectors Q (query), K (key), and V (value) of the sequence. These formulas can be calculated as follows:

Q = Y W_Q (1)

K = Y W_K (2)

V = Y W_V (3)

These linear transformations allow the model to learn the relationships and dependencies among various elements in the input sequence. These elements are then used in the subsequent attention calculation.

2. Attention calculation: Calculate the attention distribution for each element in the sequence to obtain an attention vector of length m. The formula is as follows:

Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V (4)

where √d_k is the square root of the dimension of the K vectors. A dropout layer with a drop rate of 0.3 is used to prevent overfitting.

3. Multi-head attention: The m attention vectors obtained from the individual heads are fused; the head attention vectors are concatenated to form the final representation, which is calculated as follows:

MultiHead(Q, K, V) = Concat(head_1, …, head_m) W_O (5)

where m = 8 is the number of attention heads and head_i is the i-th attention head. The MSA builds upon single-head attention: the embedding sequence is transformed through per-head projections to acquire Q, K, and V. Concat represents a concatenation operation, and W_O is a learnable weight matrix used to map the concatenated attention to the final representation. MSA computes several self-attention projections simultaneously, allowing it to consider information from distinct subspaces at various locations.
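The three steps above can be reproduced with PyTorch's built-in nn.MultiheadAttention, used here as a stand-in for the authors' implementation; m = 8 heads and the 0.3 dropout rate come from the text, while the embedding size F2 = 64 and the sequence length of 15 are assumptions:

```python
import torch
import torch.nn as nn

F2, m = 64, 8  # assumed embedding dim; m = 8 heads per the text
msa = nn.MultiheadAttention(embed_dim=F2, num_heads=m, dropout=0.3, batch_first=True)

seq = torch.randn(4, 15, F2)       # (batch, sequence length, feature depth)
out, weights = msa(seq, seq, seq)  # self-attention: Q, K, V all come from seq
```

Because self-attention maps the sequence back to the same shape, the MSA output can be concatenated with the MSC features along the depth dimension, as the feature fusion step requires.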

Temporal convolutional network module

TCNs have gained significant attention in recent years for their effectiveness in processing sequential data. They have been widely applied in various fields, such as time-series forecasting, natural language processing, and EEG signal decoding, owing to their ability to capture long-range dependencies while maintaining stability during training. A TCN does not need to explicitly maintain the state of the sequence data, which results in more efficient computation and the ability to capture longer temporal dependencies. The TCN module has the same architecture as described in [23]. The structure of the TCN is shown in Fig 4. The TCN comprises multiple residual blocks, each consisting of two dilated causal convolutions. BN [38], ELU activation, and dropout are applied after each layer, as shown in Fig 4.

Fig 4. Structure of the TCN consisting of two residual blocks.

https://doi.org/10.1371/journal.pone.0333805.g004

Causal convolution [39] constrains the convolution kernel to the current and previous time steps to prevent the SMMTM from learning future data. Dilated convolution [40] enhances the receptive field of the SMMTM without requiring additional convolutional layers.

Therefore, dilated causal convolution enables the model to capture information from extended time series. Pointwise convolutions are used as residual connections when the dilated causal convolution transforms the data into a different dimensionality before it is input to the residual block. In the SMMTM model, the input and output of the residual blocks have the same dimensions for the MI-EEG data sequences, so identity mapping is used for the residual connection.

The dilation factor of the dilated causal convolution increases exponentially with the number of residual blocks L. For the i-th residual block, the dilation factor is 2^(i−1). The receptive field size (RFS) of the TCN module is given by:

RFS = 1 + 2 (K_t − 1) (2^L − 1) (6)

In the SMMTM, the input sequence length of the TCN is 15 and the number of residual blocks L is 2. The RFS needs to be larger than the length of the input sequence to prevent loss of information during the convolutional process. Accordingly, the kernel size of the convolutional layer is set to Kt = 4, resulting in an RFS of 19, which is greater than 15.
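Eq (6) can be checked directly; the helper below is our own, encoding two dilated causal convolutions per residual block with the dilation doubling from block to block:

```python
def receptive_field(kernel_size: int, num_blocks: int) -> int:
    """RFS of a TCN per Eq (6): 1 + 2*(Kt - 1)*(2^L - 1)."""
    return 1 + 2 * (kernel_size - 1) * (2 ** num_blocks - 1)

# Kt = 4 and L = 2, as in the SMMTM, give an RFS of 19 > 15,
# so every output element can see the whole 15-step input sequence.
rfs = receptive_field(4, 2)
```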

The output of the TCN module is the final element of the sequence, represented by a vector of size F_T. The TCN outputs from all windows are concatenated and then sent to the fully connected (FC) layer. Fig 1 illustrates this architecture.

Multimodal feature fusion module

To improve the model’s decoding accuracy and stability, we introduced the MFF module, which consists of feature fusion and decision fusion, as shown in Fig 1. Detailed architectural dimensions and computational complexities are as follows:

Feature fusion combines outputs from within the SMMTM to capture hidden information and enhance the feature representation. The MSC module outputs features of dimension (F2, 1, T//64), where F2 represents the feature depth and T//64 denotes the temporal dimension. These features are fed into the MSA module; since the MSA module does not change the dimensions of its input, its output features have dimension (F2, T//64). The features output by the MSC are multi-scale features, while the output of the MSA module consists of global features. Therefore, the outputs of the MSC and MSA modules are concatenated along the depth dimension to obtain a comprehensive fused feature. This fused feature is then fed into the TCN to extract higher-level time-dependent information. Decision fusion merges the outputs of multiple classifiers, reducing the uncertainty or error of any single classifier. The prediction results of the TCN and MSC modules are integrated to achieve more accurate probabilities. Finally, the probabilities are input into the SoftMax function to obtain the final classification result. The detailed configuration and parameter settings of the proposed model are summarized in Table 1.
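Decision fusion can be sketched as follows in NumPy; whether the paper averages logits or probabilities before the SoftMax is not specified, so the averaging of class logits here is our assumption, and the logit values are hypothetical:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical 4-class outputs of the two heads for one trial.
logits_msc = np.array([[2.0, 0.5, -1.0, 0.0]])
logits_tcn = np.array([[1.5, 1.0, -0.5, 0.2]])

# Fuse by averaging the logits, then apply SoftMax for the final decision.
probs = softmax((logits_msc + logits_tcn) / 2)
pred = probs.argmax(axis=-1)
```

Averaging two heads damps the effect of a single overconfident classifier, which is the robustness benefit the text attributes to decision fusion.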

Table 1. Parameters of the proposed model, where C is the number of channels, T is the number of sample points, F1 is the number of temporal filters, D is the depth multiplier, F2 = D × F1, and N is the number of classes.

https://doi.org/10.1371/journal.pone.0333805.t001

Experiments

Datasets

This section presents the test results obtained from the EEG data of the well-known BCI competition datasets to validate the effectiveness of the SMMTM.

First, the dataset and experimental settings are presented. Then, we describe the results and discuss our findings. In this work, the SMMTM model was evaluated on the BCI Competition IV-2a [41] and 2b [42] datasets, which have been widely used in the research community and are thus considered benchmark datasets in MI-EEG decoding.

This section provides a description of the 2a dataset, which includes the EEG data of four MI movements of nine subjects. The data were collected on separate days during two sessions for each subject. Both sessions included training and test sets. The EEG data were recorded at a sampling frequency of 250 Hz using 22 Ag/AgCl electrodes. In each trial, the MI data was collected for 4 seconds, as shown in Fig 5.

Fig 5. Timing scheme for each trial, with 4 seconds of MI activity.

https://doi.org/10.1371/journal.pone.0333805.g005

The BCI-2b dataset includes MI data recorded from 9 subjects using 3 electrodes, with a sampling rate of 250 Hz. Each subject participated in five sessions, which included MI tasks for both the left and right hands. For each subject, two sessions were conducted without visual feedback, followed by three sessions with visual feedback. The first three sessions were used as the training set, and the last two were used as the test set. The trial lengths in the experiment were between 3 and 7 seconds.

Data preprocessing

Data preprocessing is typically employed to eliminate noise and artifacts, and EEG feature extraction plays a significant role in identifying the subject’s imagined movements before classification. In this study, however, we apply no artifact-removal preprocessing: the raw MI-EEG signals, comprising C channels and T sample points, are input directly into the SMMTM. We applied only Z-score normalization to reduce variability in the EEG signals, enhance training speed, and improve model accuracy:

x_0 = (x_i − μ) / σ (7)

where the standardized output and the training/test data are denoted as x_0 and x_i, respectively. The mean μ and standard deviation σ of the training data are calculated and used to standardize both the training and test data.
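A minimal NumPy illustration of Eq (7) follows: the statistics μ and σ are computed on the training set only and reused for the test set, so no test-set information leaks into the normalization (the array shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: (trials, channels, samples) with nonzero mean and scale.
train = rng.normal(loc=3.0, scale=5.0, size=(100, 22, 1000))
test = rng.normal(loc=3.0, scale=5.0, size=(20, 22, 1000))

# Statistics from the training set only (Eq 7).
mu, sigma = train.mean(), train.std()
train_z = (train - mu) / sigma
test_z = (test - mu) / sigma   # test data standardized with training statistics
```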

Experimental details

SMMTM is implemented using the PyTorch 1.12 DL framework in Python 3.7. Training is performed on an Nvidia RTX 3090 with 24 GB of memory. We use the Adam optimizer, with a learning rate of 0.001 and a weight decay of 0.001. The overall model loss is computed using the cross-entropy function, calculated as follows:

Loss = −(1/M) Σ_{i=1}^{M} y_i log(x_i) (8)

where x represents the output of the model, y is the label, and M is the batch size.

The classification accuracy and kappa score are employed as evaluation metrics to assess the overall performance of the model:

Accuracy = (TP + TN) / (TP + TN + FP + FN), kappa = (p_a − p_e) / (1 − p_e) (9)

where p_a denotes the model’s classification accuracy and p_e denotes its expected chance-level agreement; TP and TN represent true positives and true negatives, while FP and FN represent false positives and false negatives.
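Both metrics in Eq (9) can be computed from a confusion matrix; the sketch below (our own helper, applied to a hypothetical two-class result) illustrates the calculation of accuracy and Cohen's kappa:

```python
import numpy as np

def accuracy_and_kappa(cm):
    """Accuracy and Cohen's kappa from a confusion matrix
    (rows: true class, columns: predicted class)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    pa = np.trace(cm) / n                                   # observed accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return pa, (pa - pe) / (1 - pe)

# Hypothetical binary result: TP = 45, FN = 5, FP = 10, TN = 40.
acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
```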

The SMMTM was used to conduct cross-subject and within-subject experiments. Both sets of experiments used 5-fold cross-validation, and the training parameters were determined through cross-validation on the validation set. Additionally, an early stopping [43] method was implemented to prevent overfitting: model training was stopped if the loss showed no significant reduction within 300 iterations.
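The early-stopping rule described above can be sketched as a simple patience counter; the class name and the min_delta threshold are our assumptions, while the 300-iteration patience comes from the text (a shorter patience is used below purely for illustration):

```python
class EarlyStopper:
    """Stop training when the loss has not improved for `patience` steps."""

    def __init__(self, patience=300, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.count = float("inf"), 0

    def step(self, loss):
        """Record one loss value; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best, self.count = loss, 0   # improvement: reset the counter
        else:
            self.count += 1                   # no improvement
        return self.count >= self.patience

stopper = EarlyStopper(patience=3)
history = [1.0, 0.9, 0.95, 0.96, 0.97]       # loss plateaus after step 2
stops = [stopper.step(l) for l in history]
```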

Results

Ablation study

Ablation experiments were conducted on BCI-2a to evaluate the impact of each component of the SMMTM on EEG decoding performance. Table 2 presents the within-subject decoding results on BCI-2a for experiments in which each module was individually removed, with the best results highlighted. Modules were removed prior to the training and validation procedures. The training parameters are the same as those described in the within-subject decoding experiment. The model’s average accuracy decreased by 5.5% with the removal of the MSC module and by 6.7% with the removal of the TCN module, indicating that effective temporal feature extraction significantly contributes to improving signal decoding accuracy. When the MSA module was removed, the model’s average accuracy dropped to 82.7%. Adding feature fusion ahead of the TCN module improves the accuracy of the SMMTM by 2.6%, and using decision fusion after the TCN module improves it by 1.5%; applying the full MFF increases accuracy by 3.3%. The test results demonstrate that each module effectively improves EEG decoding accuracy.

Table 2. Ablation experiments: Mean accuracy and k-score of deep learning models with various combinations.

https://doi.org/10.1371/journal.pone.0333805.t002

Within-subject decoding experiment

The within-subject decoding performance of the SMMTM was tested on BCI-2a and BCI-2b and compared with that of other strong algorithms, some of which were reproduced from the original papers. Tables 3 and 4 list the MI decoding accuracy, average accuracy (percentage), k-score, p-value, and std for the nine subjects, with the best results highlighted. The results show that the proposed algorithm outperforms the other algorithms. The model showed good decoding performance on both datasets and achieved higher classification accuracy for poorly performing subjects. In the 2a dataset, subject 7 showed the best MI decoding performance, achieving an accuracy of 93.06%; in the 2b dataset, subject 4 performed best, achieving an accuracy of 96.56%. SMMTM was compared with the other algorithms using the Wilcoxon signed-rank test. The results show that SMMTM significantly outperformed most algorithms (p < 0.05), and its decoding accuracy was also higher than that of the algorithms for which the difference was not statistically significant.

Table 3. Within-subject comparison of decoding performance of state-of-the-art methods on BCI-2a.

https://doi.org/10.1371/journal.pone.0333805.t003

Table 4. Within-subject comparison of decoding performance of state-of-the-art methods on BCI-2b.

https://doi.org/10.1371/journal.pone.0333805.t004

After training with DeepConvNet, ShallowConvNet, EEGNet, and the SMMTM, the feature distribution of subject S07 from the BCI-2a dataset was mapped onto a 2D plane using t-SNE, as illustrated in Fig 6. The visualized feature distribution shows that the features extracted by the SMMTM exhibit more distinct boundaries and well-separated clusters compared to those extracted by ShallowConvNet, DeepConvNet, and EEGNet. This indicates that the SMMTM effectively captures discriminative information, improving the separability of different classes.

Fig 6. The feature vectors for S07 in BCI-2a are distributed in 2D space using the t-SNE method.

https://doi.org/10.1371/journal.pone.0333805.g006

The convolutional kernel weights of the MSC are visualized in Fig 7. The kernels in MSCConvs 1, 2, and 3 have lengths of 8, 16, and 32 samples, corresponding to 0.25, 0.5, and 1 second in time, respectively. The frequency bands learned by MSCConv 1 are high and wide, while those learned by MSCConv 3 are low and narrow. This demonstrates that the MSC can learn multi-scale band information with temporal filters of different sizes.

Fig 7. Visualization of the convolutional weights of the MSC for subject 7 in BCI-2a.

https://doi.org/10.1371/journal.pone.0333805.g007

The test results on BCI-2a indicate that the SMMTM achieves the highest average classification accuracy with the smallest std and the highest k-score, as shown in Table 3. Additionally, the SMMTM outperforms the other algorithms on most subjects. Its accuracy is 7.61% higher than that of EEG-TCNet, which also uses a TCN, and 12.6% higher than that of EEGNet, a CNN-based model that captures only local information from EEG signals; in contrast, SMMTM enhances the CNN architecture by incorporating MSA and TCN to extract both local and global dependencies, thereby improving decoding performance. Its accuracy is also 5.68% higher than that of CNN-TFCSP, which likewise uses self-attention, demonstrating that the multiple attention heads of MSA can extract more comprehensive global information, and 4.49% higher than that of MSFNet, which also uses a multi-branch structure. Fig 8 shows the confusion matrices used to evaluate the model’s performance on the four MI tasks, with the diagonal entries giving the overall accuracy for each task. The model performed better when classifying left- and right-hand imagery in most cases, but less well when classifying the feet and tongue. Subject 5 found it difficult to distinguish feet from tongue imagery, resulting in low classification accuracy.

Fig 8. The confusion matrices of the SMMTM on the MI BCI IV-2a dataset.

https://doi.org/10.1371/journal.pone.0333805.g008

The test results of BCI-2b indicate that the SMMTM achieves the highest average classification accuracy with the smallest std and the highest k-score, as shown in Table 4. Additionally, the SMMTM outperforms other algorithms on most subjects. The accuracy is 4.25% higher than that of EEG-TCNet, which also uses TCN; it is 4.02% higher than the accuracy of EEGNet, which also uses SC, and it is 4.18% higher than that of MSHCNN, which also uses multi-branch structure.

Cross-subject decoding experiment

Table 3 reported the overall accuracy and k-scores for within-subject MI-EEG classification on the BCI-2a dataset. Because of the significant inter-subject variability of EEG signals, the cross-subject decoding performance of the SMMTM on BCI-2a was also tested to measure its generalizability and compared with that of other strong algorithms. The replicated results in Table 5 were obtained by reimplementing the network models with the parameters reported in the original papers and training them with the Leave-One-Subject-Out (LOSO) method used in this paper; these models were also validated on the corresponding public datasets (BCI-2a and BCI-2b) in their original papers. The remaining results in Table 5 are cited directly from the original papers.

Table 5. Cross-subject comparison of decoding performance of state-of-the-art methods on BCI-2a.

https://doi.org/10.1371/journal.pone.0333805.t005

The model was evaluated with the LOSO method in the cross-subject experiments: one of the nine subjects is held out for testing, while the remaining eight form the training set. Table 5 lists the average accuracy (percentage) for each algorithm, with the best result highlighted. The proposed SMMTM achieves the highest average classification accuracy and the highest k-score, as shown in Table 5, and it outperforms the other algorithms on most subjects. Its accuracy is 8.16% higher than that of EEG-TCNet, which also uses a TCN, and 12.01% higher than that of EEGNet, which also uses SC. The Wilcoxon signed-rank test indicates that SMMTM significantly outperformed EEGNet (p = 0.009), ShallowConvNet (p = 0.011), CCNN (p = 0.009), and CRAM (p = 0.039), and its decoding accuracy was also notably higher than that of EEG-TCNet and G-CRAM. These results validate the SMMTM as an effective algorithm for accurately decoding EEG signals.
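The LOSO protocol can be sketched as follows. This is an illustrative sketch of the splitting scheme only; the commented calls (`build_smmtm`, `fit`, `evaluate`) are hypothetical placeholders, not the authors' training pipeline.

```python
def loso_splits(subjects):
    """Leave-One-Subject-Out: each subject serves as the test set once,
    with all remaining subjects forming the training set."""
    for test_subj in subjects:
        train = [s for s in subjects if s != test_subj]
        yield train, test_subj

subjects = list(range(1, 10))  # the nine BCI-2a subjects
for train, test in loso_splits(subjects):
    print(f"test on subject {test}, train on {train}")
    # model = build_smmtm()               # hypothetical model constructor
    # fit(model, data_of(train))          # hypothetical training call
    # accuracy[test] = evaluate(model)    # one accuracy per held-out subject
```

With nine subjects this yields nine folds and nine per-subject accuracies per model; a paired test such as `scipy.stats.wilcoxon` applied to two models' per-subject accuracies is presumably how the reported p-values were obtained.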

Conclusion

This study proposes the SMMTM, a hybrid high-performance neural network built from five modules: SC, MSC, MSA, TCN, and MFF. Each module plays a crucial role in the model's performance, as demonstrated by the ablation experiments, in which the inclusion of each component improves decoding accuracy. The model's interpretability is illustrated through visualization methods, which demonstrate the effectiveness of MSC in capturing multi-scale temporal features. The proposed SMMTM achieved a within-subject accuracy of 84.96% on the BCI-2a dataset, outperforming the other state-of-the-art algorithms considered in this paper. In the cross-subject evaluation on BCI-2a, its accuracy reached 69.21%, again outperforming comparable methods. On the BCI-2b dataset, the within-subject results of SMMTM also surpassed the referenced algorithms, further validating its effectiveness across public datasets, and statistical significance tests on both BCI-2a and BCI-2b confirmed the effectiveness of the proposed algorithm. These results indicate that the SMMTM model not only captures intricate features of EEG signals but is also robust across subjects, which is critical for real-world applications; its high classification accuracy and robustness suggest that SMMTM could serve as a powerful tool for future BCI applications, improving the efficiency and reliability of BCI systems. In future work, we will conduct online experimental verification, implementing and evaluating the model in real-time BCI scenarios to assess its real-world performance and make the necessary optimizations.

References

  1. Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM. Brain-computer interfaces for communication and control. Clin Neurophysiol. 2002;113(6):767–91. pmid:12048038
  2. Condori KA, Urquizo EC, Diaz DA. Embedded brain machine interface based on motor imagery paradigm to control prosthetic hand. In: 2016 IEEE ANDESCON. IEEE; 2016. p. 1–4.
  3. Cho W, Guger C, Heilinger A, Ortner R, Murovec N, Xu R, et al. Motor rehabilitation for hemiparetic stroke patients using a brain-computer interface method. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE; 2018. p. 1001–5. https://doi.org/10.1109/smc.2018.00178
  4. Grover N, Chharia A, Upadhyay R, Longo L. Schizo-net: a novel schizophrenia diagnosis framework using late fusion multimodal deep learning on electroencephalogram-based brain connectivity indices. IEEE Trans Neural Syst Rehabil Eng. 2023;31:464–73. pmid:37022027
  5. Gu L, Yu Z, Ma T, Wang H, Li Z, Fan H. EEG-based classification of lower limb motor imagery with brain network analysis. Neuroscience. 2020;436:93–109. pmid:32283182
  6. Xiong M, Brandenberger A, Bulger M, Chien W, Doyle A, Hao W, et al. A low-cost, semi-autonomous wheelchair controlled by motor imagery and jaw muscle activation. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE; 2019. p. 2180–5. https://doi.org/10.1109/smc.2019.8914544
  7. Fabiani GE, McFarland DJ, Wolpaw JR, Pfurtscheller G. Conversion of EEG activity into cursor movement by a brain-computer interface (BCI). IEEE Trans Neural Syst Rehabil Eng. 2004;12(3):331–8. pmid:15473195
  8. Bundy DT, Souders L, Baranyai K, Leonard L, Schalk G, Coker R, et al. Contralesional brain-computer interface control of a powered exoskeleton for motor recovery in chronic stroke survivors. Stroke. 2017;48(7):1908–15. pmid:28550098
  9. Alazrai R, Abuhijleh M, Alwanni H, Daoud MI. A deep learning framework for decoding motor imagery tasks of the same hand using EEG signals. IEEE Access. 2019;7:109612–27.
  10. Wang G, Teng C, Li K, Zhang Z, Yan X. The removal of EOG artifacts from EEG signals using independent component analysis and multivariate empirical mode decomposition. IEEE J Biomed Health Inform. 2016;20(5):1301–8. pmid:26126290
  11. Hernández D, Trujillo L, Z-Flores E, Villanueva O, Romo-Fewell O. Detecting epilepsy in EEG signals using time, frequency and time-frequency domain features. In: Computer Science and Engineering—Theory and Applications. 2018. p. 167–82.
  12. Güler I, Ubeyli ED. Multiclass support vector machines for EEG-signals classification. IEEE Trans Inf Technol Biomed. 2007;11(2):117–26. pmid:17390982
  13. Saeedi M, Saeedi A, Maghsoudi A. Major depressive disorder assessment via enhanced k-nearest neighbor method and EEG signals. Phys Eng Sci Med. 2020;43(3):1007–18. pmid:32662038
  14. Nedelcu E, Portase R, Tolas R, Muresan R, Dinsoreanu M, Potolea R. Artifact detection in EEG using machine learning. In: 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP). 2017. p. 77–83. https://doi.org/10.1109/iccp.2017.8116986
  15. Dose H, Møller JS, Iversen HK, Puthusserypady S. An end-to-end deep learning approach to MI-EEG signal classification for BCIs. Expert Systems with Applications. 2018;114:532–42.
  16. Craik A, He Y, Contreras-Vidal JL. Deep learning for electroencephalogram (EEG) classification tasks: a review. J Neural Eng. 2019;16(3):031001. pmid:30808014
  17. Kumar S, Sharma R, Sharma A. OPTICAL+: a frequency-based deep learning scheme for recognizing brain wave signals. PeerJ Comput Sci. 2021;7:e375. pmid:33817023
  18. Xu J, Zheng H, Wang J, Li D, Fang X. Recognition of EEG signal motor imagery intention based on deep multi-view feature learning. Sensors (Basel). 2020;20(12):3496. pmid:32575798
  19. Wang J, Yu G, Zhong L, Chen W, Sun Y. Classification of EEG signal using convolutional neural networks. In: 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA). IEEE; 2019. p. 1694–8. https://doi.org/10.1109/iciea.2019.8834381
  20. Heilmeyer FA, Schirrmeister RT, Fiederer LDJ, Volker M, Behncke J, Ball T. A large-scale evaluation framework for EEG deep learning architectures. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2018. p. 1039–45. https://doi.org/10.1109/smc.2018.00185
  21. Lawhern VJ, Solon AJ, Waytowich NR, Gordon SM, Hung CP, Lance BJ. EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J Neural Eng. 2018;15(5):056013. pmid:29932424
  22. Musallam YK, AlFassam NI, Muhammad G, Amin SU, Alsulaiman M, Abdul W, et al. Electroencephalography-based motor imagery classification using temporal convolutional network fusion. Biomedical Signal Processing and Control. 2021;69:102826.
  23. Ingolfsson TM, Hersche M, Wang X, Kobayashi N, Cavigelli L, Benini L. EEG-TCNet: an accurate temporal convolutional network for embedded motor-imagery brain–machine interfaces. In: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE; 2020. https://doi.org/10.1109/smc42975.2020.9283028
  24. Li H, Chen H, Jia Z, Zhang R, Yin F. A parallel multi-scale time-frequency block convolutional neural network based on channel attention module for motor imagery classification. Biomedical Signal Processing and Control. 2023;79:104066.
  25. Riyad M, Khalil M, Adib A. Incep-EEGNet: a ConvNet for motor imagery decoding. In: Image and Signal Processing: 9th International Conference, ICISP 2020, Marrakesh, Morocco, June 4–6, 2020, Proceedings. Springer; 2020. p. 103–11.
  26. Hu Z, Chen L, Luo Y, Zhou J. EEG-based emotion recognition using convolutional recurrent neural network with multi-head self-attention. Applied Sciences. 2022;12(21):11255.
  27. Altaheri H, Muhammad G, Alsulaiman M. Physics-informed attention temporal convolutional network for EEG-based motor imagery classification. IEEE Trans Ind Inf. 2023;19(2):2249–58.
  28. Mammone N, Ieracitano C, Adeli H, Morabito FC. AutoEncoder filter bank common spatial patterns to decode motor imagery from EEG. IEEE J Biomed Health Inform. 2023;27(5):2365–76. pmid:37022818
  29. Mammone N, Ieracitano C, Spataro R, Guger C, Cho W, Morabito FC. A few-shot transfer learning approach for motion intention decoding from electroencephalographic signals. Int J Neural Syst. 2024;34(2):2350068. pmid:38073546
  30. Luo J, Gao X, Zhu X, Wang B, Lu N, Wang J. Motor imagery EEG classification based on ensemble support vector learning. Comput Methods Programs Biomed. 2020;193:105464. pmid:32283387
  31. Ma W, Xue H, Sun X, Mao S, Wang L, Liu Y, et al. A novel multi-branch hybrid neural network for motor imagery EEG signal classification. Biomedical Signal Processing and Control. 2022;77:103718.
  32. Ding Y, Robinson N, Zhang S, Zeng Q, Guan C. TSception: capturing temporal dynamics and spatial asymmetry from EEG for emotion recognition. IEEE Trans Affective Comput. 2023;14(3):2238–50.
  33. Sharma R, Meena HK. Emerging trends in EEG signal processing: a systematic review. SN Comput Sci. 2024;5(4).
  34. Xie J, Zhang J, Sun J, Ma Z, Qin L, Li G, et al. A transformer-based approach combining deep learning network and spatial-temporal information for raw EEG classification. IEEE Trans Neural Syst Rehabil Eng. 2022;30:2126–36. pmid:35914032
  35. Liang G, Cao D, Wang J, Zhang Z, Wu Y. EISATC-fusion: inception self-attention temporal convolutional network fusion for motor imagery EEG decoding. TechRxiv. 2023.
  36. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.
  37. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint. 2020. https://arxiv.org/abs/2010.11929
  38. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. PMLR; 2015. p. 448–56.
  39. van den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A. WaveNet: a generative model for raw audio. arXiv preprint. 2016. https://arxiv.org/abs/1609.03499
  40. Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv preprint. 2015. https://arxiv.org/abs/1511.07122
  41. Brunner C, Leeb R, Müller-Putz G, Schlögl A, Pfurtscheller G. BCI competition 2008 – Graz data set A. Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology. 2008;16:1–6.
  42. Leeb R, Brunner C, Müller-Putz GR, Schlögl A. BCI competition 2008 – Graz data set B. Graz University of Technology. 2008.
  43. Liang G, Cao D, Wang J, Zhang Z, Wu Y. EISATC-fusion: inception self-attention temporal convolutional network fusion for motor imagery EEG decoding. IEEE Trans Neural Syst Rehabil Eng. 2024;32:1535–45. pmid:38536681
  44. Schirrmeister RT, Springenberg JT, Fiederer LDJ, Glasstetter M, Eggensperger K, Tangermann M, et al. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum Brain Mapp. 2017;38(11):5391–420. pmid:28782865
  45. Zhi H, Yu Z, Yu T, Gu Z, Yang J. A multi-domain convolutional neural network for EEG-based motor imagery decoding. IEEE Trans Neural Syst Rehabil Eng. 2023;31:3988–98. pmid:37815970
  46. Zhang R, Liu G, Wen Y, Zhou W. Self-attention-based convolutional neural network and time-frequency common spatial pattern for enhanced motor imagery classification. J Neurosci Methods. 2023;398:109953. pmid:37611877
  47. Wang C, Wu Y, Wang C, Ren Y, Shen J, Pang T, et al. MSFNet: a multi-scale space-time frequency fusion network for motor imagery EEG classification. IEEE Access. 2024;12:8325–36.
  48. Qin Y, Yang B, Ke S, Liu P, Rong F, Xia X. M-FANet: multi-feature attention convolutional neural network for motor imagery decoding. IEEE Trans Neural Syst Rehabil Eng. 2024;32:401–11. pmid:38194394
  49. Ha K-W, Jeong J-W. Motor imagery EEG classification using capsule networks. Sensors (Basel). 2019;19(13):2854. pmid:31252557
  50. Tang X, Yang C, Sun X, Zou M, Wang H. Motor imagery EEG decoding based on multi-scale hybrid networks and feature enhancement. IEEE Trans Neural Syst Rehabil Eng. 2023;31:1208–18. pmid:37022411
  51. Xie X, Chen L, Qin S, Zha F, Fan X. Bidirectional feature pyramid attention-based temporal convolutional network model for motor imagery electroencephalogram classification. Front Neurorobot. 2024;18:1343249. pmid:38352723
  52. Amin SU, Alsulaiman M, Muhammad G, Mekhtiche MA, Shamim Hossain M. Deep learning for EEG motor imagery classification based on multi-layer CNNs feature fusion. Future Generation Computer Systems. 2019;101:542–54.
  53. Zhang D, Yao L, Chen K, Monaghan J. A convolutional recurrent attention model for subject-independent EEG signal analysis. IEEE Signal Process Lett. 2019;26(5):715–9.
  54. Zhang D, Chen K, Jian D, Yao L. Motor imagery classification via temporal attention cues of graph embedded EEG signals. IEEE J Biomed Health Inform. 2020;24(9):2570–9. pmid:31976916