Abstract
Motor imagery electroencephalogram (MI-EEG) analysis is essential for natural interaction and autonomous control in brain-computer interfaces (BCIs). However, deep learning models often struggle with inter-subject variability, which limits their ability to generalize across subjects. This study proposes RMETNet, a novel framework that integrates TSLANet, a spatio-temporal convolution module, and a multi-scale Riemannian geometry feature module. TSLANet suppresses noise and captures complex temporal patterns for preliminary signal decoding, while the spatio-temporal convolution module extracts higher-order representations. The Riemannian branch learns geometry-based distribution features across subjects, and the fused features are used for classification. To address inter-subject distribution shifts, RMETNet incorporates Maximum Mean Discrepancy (MMD) loss for domain adaptation, aligning feature distributions between source and target domains. Experiments show that on the four-class BCI Competition IV 2a (BCICIV2a) dataset, RMETNet achieved accuracies of 71.39% in the cross-subject setting and 80.71% in the subject-dependent setting; on the two-class BCI Competition IV 2b (BCICIV2b) dataset, it achieved 80.93% and 86.76%, respectively. The model consistently outperformed baseline algorithms. Ablation and visualization analyses further validated its effectiveness in reducing inter-subject feature distribution disparities and enhancing MI-EEG decoding. The code is available at: https://github.com/rokanfeermecer486/RMETNet.
Citation: Zhao Y, He D, Ren F, Xia Q, Xu L, Xie G, et al. (2026) RMETNet: A cross-subject motor imagery EEG signal classification model based on TSLANet and riemannian geometry features. PLoS One 21(4): e0347671. https://doi.org/10.1371/journal.pone.0347671
Editor: Onder Aydemir, Karadeniz Technical University (Karadeniz Teknik Universitesi), Türkiye
Received: December 22, 2025; Accepted: April 6, 2026; Published: April 22, 2026
Copyright: © 2026 Zhao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The code for this study is available at: https://github.com/rokanfeermecer486/RMETNet.
Funding: This research was funded by the Scientific and Technological Research Program of the Chongqing Education Commission (KJZD-K202303103, KJQN202501104), and the Natural Science Foundation of Chongqing (CSTB2025NSCQ-GPX0794). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Brain-computer interface (BCI) technology enables direct interaction between the human brain and external devices by decoding the electrophysiological signals generated by brain activity, bringing revolutionary breakthroughs to fields such as medical rehabilitation and intelligent control [1]. Among these signals, motor imagery electroencephalography (MI-EEG) is a non-invasive neural signal with high temporal resolution. It is elicited when participants mentally rehearse specific movements, such as left-hand, right-hand, or foot movements, without overt limb motion, thereby activating the motor cortex. This process induces event-related desynchronization (ERD) and event-related synchronization (ERS) in motor-related regions, which are reflected in marked changes in the energy of the μ rhythm (8–12 Hz) and the β rhythm (13–30 Hz) [2]. These characteristics provide an important physiological basis for MI-based BCI research. MI-EEG has become one of the core control signals in BCI systems. Accordingly, accurate MI-EEG decoding is central to efficient brain-computer interaction.
In MI-EEG decoding, the main methods include traditional approaches and deep learning methods. Traditional methods primarily rely on signal processing and shallow machine learning models to analyze EEG signals; however, these methods heavily depend on manually selected features and struggle to capture complex nonlinear relationships [3]. In contrast, deep learning methods, with their robust nonlinear modeling capabilities, have found widespread application in MI-EEG decoding [4]. Among these, convolutional neural networks (CNNs) can effectively extract time-frequency domain features of EEG through their hierarchical structure; recurrent neural networks (RNNs) and their variants, long short-term memory networks (LSTMs), can effectively capture the temporal dynamics of EEG; and models based on the Transformer architecture can significantly enhance the modeling capability of global dependencies in EEG. These models can automatically extract various spatiotemporal features or latent features of MI-EEG, significantly improving EEG decoding accuracy to a certain extent [5]. However, most existing studies focus on subject-dependent models, whereas cross-subject decoding is essential for improving model generalization and supporting broader BCI deployment. Because of substantial neurophysiological heterogeneity across individuals, the same motor imagery task may evoke different EEG response patterns in different subjects. In addition, differences in recording devices and environmental noise further increase the difficulty of cross-subject EEG analysis. As a result, existing models still show limited generalization because they do not effectively capture the relationships between intra-subject and inter-subject features [5]. To address this, researchers have begun to incorporate transfer learning strategies into inter-subject EEG analysis to enhance MI-EEG decoding capabilities. Zhang et al. 
proposed an adaptive transfer learning algorithm that adjusts pre-trained deep convolutional neural network models to adapt to new target subjects, thereby improving cross-subject decoding performance [6]. Additionally, semi-supervised multi-source transfer learning models have also improved the performance of cross-subject MI-EEG classification tasks to some extent by learning information-rich and domain-invariant representations [7]. Although these methods have made progress in cross-subject EEG decoding, they still show insufficient learning of domain-invariant features and limited robustness to inter-individual EEG noise, which constrains overall decoding performance.
To address these issues, this paper proposes RMETNet (Riemannian MMD-enhanced EEG TSLANet), a neural network designed for cross-subject MI-EEG decoding. First, the Time Series Lightweight Adaptive Network (TSLANet) is introduced to capture long- and short-term interactions in MI-EEG signals while adaptively attenuating high-frequency noise. This module contains an Adaptive Spectral Block (ASB) and an Interactive Convolution Block (ICB), which perform denoising and preliminary temporal decoding. Second, a spatio-temporal convolution module is used to extract higher-level spatio-temporal representations from the TSLANet output. Finally, a multi-scale Riemannian geometry feature module is designed to learn domain-invariant geometric features across subjects, and these features are fused with the convolutional features to improve classification performance. In addition, Maximum Mean Discrepancy (MMD) loss is introduced during training to align source- and target-domain feature distributions, thereby reducing inter-subject variability and improving cross-subject MI-EEG decoding performance.
The main contributions of this study are summarized as follows:
- RMETNet integrates TSLANet, spatio-temporal convolution, and multi-scale Riemannian feature learning to capture complementary temporal, spatial, and geometric information, thereby improving MI-EEG decoding in both subject-dependent and cross-subject settings.
- To better model cross-subject distribution differences, an MMD-loss-based domain adaptation strategy is introduced during training to align source- and target-domain features, reduce inter-subject variability, and improve cross-subject MI-EEG decoding performance.
- The effectiveness of RMETNet was validated on the public BCI Competition IV 2a (BCICIV2a) and BCI Competition IV 2b (BCICIV2b) datasets. On BCICIV2a, the model achieved accuracies of 71.39% in the cross-subject setting and 80.71% in the subject-dependent setting; on BCICIV2b, it achieved 80.93% and 86.76%, respectively. In both evaluation settings, RMETNet outperformed the competing methods.
Related work
Subject-dependent MI-EEG decoding
In recent years, deep learning algorithms have demonstrated strong feature extraction capabilities in subject-dependent MI-EEG classification tasks [8]. Researchers have proposed various targeted deep learning architectures based on classical models, such as convolutional neural networks (CNNs) [9–11], recurrent neural networks (RNNs) [12,13], and hybrid models [14,15], which effectively address the limitations of traditional methods in terms of spatio-temporal feature extraction. For example, Lawhern et al. [16] proposed a CNN model for analyzing MI-EEG, capable of extracting spectral and temporal features; however, this network is a shallow model with limited feature extraction capacity. Li et al. [17] used CNNs to capture inter-channel dependencies and temporal features of subject-dependent MI-EEG and further extracted higher-order features, effectively improving decoding performance in subject-dependent tasks; Yang et al. [18] constructed a dual-branch CNN model (TBTF-CNN) to simultaneously capture EEG temporal and frequency features, thereby enhancing decoding performance to some extent. Although these models achieved good results in subject-dependent tasks, their adaptability to individual differences is limited, and the features extracted by the dual-branch model are ultimately integrated through concatenation or simple fusion, which limits the model’s robustness. Courav et al. [19] utilized the Transformer architecture to model raw EEG signals and explored the impact of input window and convolutional layer parameters on performance, demonstrating the Transformer’s strong generalization capability in small-sample environments; however, the model tended to focus on noisy frequency bands of the signal, leading to overfitting. To enhance the modeling capability of complex dynamic features in EEG signals, Ghosh et al. 
[20] designed a hierarchical network structure by simulating the oscillatory synchronization characteristics of brain neurons, enabling adaptive extraction of dynamic features across different frequency and temporal scales in multi-channel EEG signals. Based on a similar multi-scale modeling approach, Liao et al. [21] integrated a dual-branch structure, improved attention convolutional blocks, and temporal convolutional networks to construct a composite feature extraction framework. Through a multi-level feature fusion strategy, this method not only effectively enhances classification performance in motor imagery tasks but also demonstrates good scalability in terms of computational efficiency. Additionally, Zhang et al. [22] constructed a classification model based on the Inception network, extracting EEG features through a parallel approach combining depth and width, and utilizing residual modules to mitigate the vanishing gradient problem. The model demonstrated potential in cross-subject experiments, but its adaptability to cross-subject tasks was limited due to the absence of a cross-subject training strategy.
Although the aforementioned deep learning methods have improved subject-dependent MI-EEG classification, several limitations remain. Most approaches emphasize temporal or time-frequency characteristics while paying insufficient attention to electrode topology and interactions among brain regions, which restricts model accuracy and robustness in MI-EEG decoding. In addition, some models can capture richer spatio-temporal information but still generalize poorly because they do not fully account for the low signal-to-noise ratio and non-stationarity of MI-EEG signals. As a result, methods designed for subject-dependent settings often fail to align feature distributions effectively across individuals, leading to limited performance in cross-subject decoding.
Cross-subject MI-EEG decoding
Cross-subject MI-EEG classification tasks are of great significance for improving the generalization ability and robustness of BCI in practical applications. Researchers have begun to introduce transfer learning strategies into inter-subject EEG analysis to alleviate the inconsistency of feature distributions among different subjects and improve MI-EEG decoding ability [23]. For example, Wu et al. [24] proposed a parallel multi-scale filter bank convolutional neural network, which uses hierarchical feature extraction and fine-tuning strategies to construct individualized models on small-sample datasets, thereby significantly improving the adaptability of cross-subject motor imagery classification; EEGSym [25] employs a symmetric network architecture to integrate multi-subject data, effectively mitigating differences in feature distributions across subjects. This method enhances the model’s ability to model common features and improves its generalization performance in cross-subject tasks by introducing a shared parameter mechanism and symmetry loss constraints in the encoder structure. Additionally, a multi-scale adaptive network based on Transformers [26] utilizes a subject adapter module to dynamically fine-tune target data and employs multi-head attention mechanisms to capture signal temporal dependencies, further enhancing the efficiency of cross-subject transfer learning.
Although the aforementioned methods have improved the performance of cross-subject MI-EEG classification to some extent, several key issues remain to be further investigated. First, most algorithms primarily focus on static feature modeling, neglecting the dynamic evolution of EEG signals in the temporal dimension during motor imagery and lacking in-depth modeling of long-term and short-term temporal dependencies. Second, while these methods generally emphasize improving model generalizability, they still fall short in capturing and adapting to individual differences in features. To mitigate the non-stationarity and noise interference of EEG signals, some studies have introduced Riemannian geometry methods [27], which map the covariance matrices of the source domain and target domain to a symmetric positive-definite manifold space, optimizing feature alignment from both statistical and geometric perspectives, and integrating learnable wavelet transforms and Riemannian features into a deep neural network framework. This method demonstrates certain advantages in enhancing cross-subject robustness. However, such models involve high computational complexity during training, limiting their real-time applicability; additionally, over-reliance on a single Riemannian geometric feature makes them susceptible to the non-stationarity and noise fluctuations of EEG signals, leading to unstable covariance estimates. To further enhance cross-subject generalization capabilities, researchers have also introduced strategies based on transfer learning [28]. This type of method effectively enhances the model’s ability in multi-scale feature extraction and cross-domain feature alignment through knowledge transfer between the source domain and the target domain, demonstrating superior cross-subject classification performance on multiple datasets. 
However, existing transfer learning methods still have shortcomings in spatio-temporal dynamic modeling, cross-domain information fusion, and multi-modal feature collaboration, and further optimization is needed.
To address these challenges, this paper proposes a cross-subject MI-EEG decoding framework that combines TSLANet, spatio-temporal convolution, and domain-adaptive learning. The adaptive spectral and interactive convolution components capture short- and long-term temporal interactions while suppressing high-frequency noise. In addition, the transfer learning framework reduces domain inconsistency between subjects and improves feature alignment across domains. By integrating TSLANet with domain-adaptive learning, the proposed model can jointly handle static distribution shifts and dynamic neural activity patterns during motor imagery, thereby improving MI-EEG decoding performance.
Methodology
Overview of RMETNet
RMETNet is a deep learning framework designed for cross-subject MI-EEG decoding. As shown in Fig 1, it consists of three main modules: (1) TSLANet for temporal feature learning, (2) a spatio-temporal convolution module for higher-level spatio-temporal representation learning, and (3) a multi-scale Riemannian convolution module for cross-subject geometric feature extraction. The inputs include preprocessed raw EEG signals and Riemannian features computed from the source and target domains. The network contains two branches. The first branch takes raw EEG as input, extracts temporal features with TSLANet, and further refines them through temporal and depthwise convolutions. The second branch takes Riemannian features as input and learns geometric representations that characterize cross-subject distribution differences. The features from the two branches are then fused and fed into a fully connected layer with a Softmax classifier for MI-EEG classification. The detailed structure of RMETNet is listed in Table 1.
More importantly, the proposed model uses Maximum Mean Discrepancy loss (MMD loss) as a domain alignment objective to measure feature distribution discrepancies, as shown in Fig 2. The goal of domain adaptation is to transfer knowledge learned from the source domain to a related target domain by minimizing the discrepancy between their feature distributions. MMD is an unsupervised distribution alignment method that embeds data from the source and target domains into a Reproducing Kernel Hilbert Space (RKHS) and calculates the distance between their mean embeddings in that space to quantify and reduce inter-domain distribution differences. During training, this discrepancy is added to the overall loss as a regularization term. In this way, the global distributions of the two domains are aligned, and cross-domain feature learning is achieved through shared network parameters, as defined below:
\[
\mathcal{L} = \mathcal{L}_{cls} + \lambda\,\mathcal{L}_{MMD}
\]

where $\mathcal{L}_{cls}$ represents the classification loss, $\lambda$ denotes the weighting factor calculated from the source-domain labels, and $\mathcal{L}_{MMD}$ denotes the MMD loss, which minimizes the distribution discrepancy between the source- and target-domain features. Its calculation formula is as follows:

\[
\mathcal{L}_{MMD} = \Bigg\| \frac{1}{m}\sum_{i=1}^{m}\phi(x_i) - \frac{1}{n}\sum_{j=1}^{n}\phi(y_j) \Bigg\|_{\mathcal{H}}^{2}
= \frac{1}{m^{2}}\sum_{i=1}^{m}\sum_{i'=1}^{m} k(x_i, x_{i'}) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j) + \frac{1}{n^{2}}\sum_{j=1}^{n}\sum_{j'=1}^{n} k(y_j, y_{j'})
\]

where $x_i$ and $y_j$ represent the feature representations of the source and target domains obtained from RMETNet, respectively, $\phi(x_i)$ and $\phi(y_j)$ are the feature mappings of the source and target domains in the RKHS, $k(x_i, y_j)$ is the kernel function, and $m$ and $n$ are the numbers of samples in the source and target domains, respectively. The MMD loss is used to align the feature distributions of the source and target domains, thereby reducing the impact of inter-subject variability on the model.
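As an illustration, the squared MMD between two feature batches can be estimated with a Gaussian kernel. The sketch below is a minimal NumPy version; the kernel choice, bandwidth, and feature dimensions are illustrative assumptions rather than RMETNet's exact configuration:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=4.0):
    """Pairwise Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(x, y, sigma=4.0):
    """Biased estimate of the squared MMD between source x and target y."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 16))        # source-domain features
tgt_shift = rng.normal(0.5, 1.0, size=(64, 16))  # target domain with a mean shift
tgt_same = rng.normal(0.0, 1.0, size=(64, 16))   # target from the same distribution
print(mmd2(src, tgt_shift), mmd2(src, tgt_same))
```

The shifted pair yields a noticeably larger MMD than the matched pair; adding this quantity to the training loss therefore penalizes source-target distribution mismatch.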
TSLANet module
Time-series data are recorded continuously, and each time point contains only limited scalar information; as a result, subtle time-frequency variations in EEG signals are difficult to characterize from single samples alone. Traditional time-frequency analysis methods usually extract only simple temporal descriptors, such as power and phase synchronization, and may therefore miss richer dynamic information [29]. To address this issue, the TSLANet module in RMETNet adopts a hybrid convolutional design that combines spectral modeling with cross-time-frequency interaction, enabling more effective extraction of complex temporal dynamics from EEG signals.
As shown in Fig 3, TSLANet consists of two main components: the Adaptive Spectral Block (ASB) and the Interactive Convolution Block (ICB). The ASB transforms EEG time-series data into the frequency domain, applies adaptive thresholding to suppress high-frequency noise and emphasize informative components, and then reconstructs enhanced time-domain features through the inverse Fourier transform. The ICB is a lightweight convolutional block that uses multiple kernel sizes to refine features interactively, thereby improving the extraction of both local and global temporal patterns. The output of TSLANet is a set of features that jointly encode spectral and temporal information and serve as the input to the subsequent spatio-temporal convolution module.
The TSLANet module processes a multi-channel EEG signal $X \in \mathbb{R}^{N \times M}$, where $N$ is the number of channels and $M$ is the number of time points. This signal is first passed through an embedding layer to obtain a deep feature representation $X^{0}$, which serves as the input to the first TSLANet layer. The $l$-th TSLANet layer takes the output of the preceding layer, $X^{l-1}$, and extracts deep temporal variations via a residual connection, formulated as:

\[
X^{l} = X^{l-1} + X_{out}, \qquad X_{out} = \mathrm{ICB}\big(\mathrm{ASB}(X^{l-1})\big)
\]
Specifically, the TSLANet module begins by transforming the input features into a comprehensive spectral representation $F$ generated by applying a Fast Fourier Transform (FFT) to each channel. To address high-frequency noise in non-stationary EEG signals, the module employs an adaptive filter. This filter first calculates the signal’s Power Spectral Density (PSD), $P = |F|^{2}$, and then utilizes a learnable threshold $\tau$, optimized via backpropagation, to dynamically remove noise. This filtering operation is defined by:

\[
F_{filtered} = F \odot M, \qquad M = \mathbb{1}(P > \tau)
\]

where $M$ is a binary mask and $\odot$ denotes element-wise multiplication. This operation retains the spectral components of $F$ that exceed the threshold, effectively filtering out noise while preserving significant features.
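The ASB-style filtering step can be sketched as follows. Here the paper's backpropagation-learned threshold is replaced by a fixed power-percentile rule, and the signal parameters are illustrative assumptions:

```python
import numpy as np

def adaptive_spectral_filter(x, keep_ratio=0.05):
    """Sketch of ASB filtering: FFT -> PSD threshold -> binary mask -> IFFT.

    x: (channels, time) EEG segment. The learnable threshold of the paper
    is approximated here by a per-channel power quantile (an assumption).
    """
    f = np.fft.rfft(x, axis=-1)           # spectral representation F
    psd = np.abs(f) ** 2                  # power spectral density P = |F|^2
    tau = np.quantile(psd, 1.0 - keep_ratio, axis=-1, keepdims=True)
    mask = psd > tau                      # binary mask M
    return np.fft.irfft(f * mask, n=x.shape[-1], axis=-1)

# A 10 Hz mu-band sinusoid buried in broadband noise (4 s at 250 Hz).
t = np.arange(1000) / 250.0
clean = np.sin(2 * np.pi * 10.0 * t)
noisy = clean + 0.3 * np.random.default_rng(1).normal(size=t.size)
denoised = adaptive_spectral_filter(noisy[None, :])[0]
```

Because the dominant oscillatory component carries far more power than any single noise bin, the mask retains it while discarding most broadband noise, so the filtered trace lies closer to the clean signal than the noisy input does.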
To further enrich the features, the module uses a dual-path filter architecture: a global filter $W_G$ and a local filter $W_L$ operate on the original spectrum $F$ and the filtered spectrum $F_{filtered}$, respectively, and the two paths are fused through element-wise addition:

\[
\tilde{F} = W_G \odot F + W_L \odot F_{filtered}
\]

where $W_G$ and $W_L$ are learnable parameters that adaptively adjust the contributions of the global and local filters.
The integrated features are subsequently reconstructed back into the time domain via an Inverse Fast Fourier Transform (IFFT), resulting in an enhanced signal:

\[
\tilde{X} = \mathrm{IFFT}(\tilde{F})
\]
Finally, this enhanced time-domain representation $\tilde{X}$ is fed into an interactive convolutional module designed to capture both local details and long-range temporal dependencies. This module uses a convolutional layer with a small kernel ($\mathrm{Conv1D}_1$) in parallel with a layer having a large kernel ($\mathrm{Conv1D}_2$), interacting through a gating mechanism:

\[
Z_1 = \mathrm{Conv1D}_1(\tilde{X}) \odot \sigma\big(\mathrm{Conv1D}_2(\tilde{X})\big), \qquad
Z_2 = \mathrm{Conv1D}_2(\tilde{X}) \odot \sigma\big(\mathrm{Conv1D}_1(\tilde{X})\big)
\]

where $\sigma(\cdot)$ is the GELU activation function. The summed output of these two branches is passed through a final convolutional layer to produce the module’s final output, $X_{out}$, which represents the residual term in the residual connection:

\[
X_{out} = \mathrm{Conv1D}_3(Z_1 + Z_2)
\]
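A minimal single-channel sketch of this gated interaction is given below; the weights and kernel lengths are random placeholders rather than trained parameters:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def icb(x, w_small, w_large, w_out):
    """ICB sketch: small- and large-kernel branches gate each other via GELU,
    and their sum passes through a final convolution."""
    a = np.convolve(x, w_small, mode="same")  # Conv1D_1 (small kernel)
    b = np.convolve(x, w_large, mode="same")  # Conv1D_2 (large kernel)
    return np.convolve(a * gelu(b) + b * gelu(a), w_out, mode="same")

rng = np.random.default_rng(3)
x = rng.normal(size=250)  # one feature sequence
out = icb(x, rng.normal(size=3), rng.normal(size=15), rng.normal(size=3))
```

The cross-gating lets each branch modulate the other, so local detail (small kernel) and broader context (large kernel) interact before the final projection.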
Spatio-temporal convolution module
To enable the model to learn local temporal features, two sequential convolutional layers are employed. A two-dimensional convolutional kernel of size $(1, K_e)$, denoted as $W_t$, is used to capture temporal information. Additionally, a two-dimensional convolutional kernel of size $(C, 1)$, denoted as $W_s$, is applied to capture spatial information, where $C$ corresponds to the number of EEG channels. The spatial convolution serves to learn spatial filters (i.e., correlations among EEG channels), effectively extracting local features in the temporal dimension and global features in the spatial dimension from the EEG signals.
Let $X_{1D}$ denote the 1D feature sample output by the TSLANet module. The output is reshaped into a 2D sample as follows:

\[
X_{2D} = \mathrm{Expand}(X_{1D})
\]

where $\mathrm{Expand}(\cdot)$ denotes the expansion operation along the second dimension, i.e., transforming the input from $\mathbb{R}^{b \times c \times s}$ to $\mathbb{R}^{b \times 1 \times c \times s}$, so that the EEG features can match the 2D convolutional kernels.
To preserve the non-stationary characteristics of EEG signals, a square activation function is applied. To reduce noise and the number of model parameters, a pooling layer is used to downsample the feature maps. Furthermore, to normalize the mean of the latent features to zero, thereby improving training stability and accelerating the convergence process [30], batch normalization is applied after the activation function. The output of the convolution operations is denoted as $X_{conv} \in \mathbb{R}^{b \times f \times c \times s}$, where $b$ is the number of samples, $f$ is the number of output feature maps, $c$ is the number of channels, and $s$ is the number of feature points per sample. The operations are formally defined as:

\[
X_{conv} = \mathrm{AvgPool}\Big(\mathrm{BatchNorm}\big(\mathrm{Square}\big(\mathrm{Conv2D}_{(C,1)}\big(\mathrm{Conv2D}_{(1,K_e)}(X_{2D})\big)\big)\big)\Big)
\]

where $(1, K_e)$ and $(C, 1)$ represent the kernel sizes of the two convolution operations, $\mathrm{Conv2D}(\cdot)$ denotes the two-dimensional convolution operation, $\mathrm{Square}(\cdot)$ denotes the square activation function, and $\mathrm{AvgPool}(\cdot)$ denotes average pooling.
To ensure the stability of feature distributions and to avoid internal covariate shift within the model, a logarithmic activation function and batch normalization are applied after the square activation, as defined below:

\[
X_{st} = \mathrm{BatchNorm}\big(\mathrm{Log}(X_{conv})\big)
\]

where $X_{st}$ is the output feature map after the convolution and pooling operations, $\mathrm{Log}(\cdot)$ denotes the logarithmic activation function, and $\mathrm{BatchNorm}(\cdot)$ denotes batch normalization.
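Taken together, the square, average-pooling, and log operations act as a log band-power estimator, which can be sketched directly; the pooling length and stride below are illustrative assumptions:

```python
import numpy as np

def square_pool_log(x, pool=75, stride=15, eps=1e-6):
    """Square activation + average pooling + log on a (feature_maps, time) input.

    Squaring converts amplitudes to power, average pooling smooths it over a
    sliding window, and the log compresses the dynamic range, approximating
    log band power. Pool/stride values are illustrative, not the paper's own.
    """
    power = x ** 2
    n_windows = 1 + (x.shape[-1] - pool) // stride
    pooled = np.stack(
        [power[:, i * stride: i * stride + pool].mean(axis=-1)
         for i in range(n_windows)],
        axis=-1,
    )
    return np.log(pooled + eps)  # eps guards against log(0)

feats = square_pool_log(np.random.default_rng(4).normal(size=(8, 1000)))
```

For an input with 1000 time points and the parameters above, each feature map is reduced to 62 smoothed log-power values.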
Multi-scale Riemannian convolution module
Riemannian geometry provides an effective framework for aligning covariance-based EEG representations on a symmetric positive-definite manifold. After defining an appropriate metric, the mapped features preserve both global structure and inter-channel relationships, which helps reduce distribution differences across subjects and domains. For EEG signal processing, this study computes covariance matrices from the source and target domains and maps them into Riemannian space to obtain aligned feature representations. This process captures both global and local structural information in EEG signals and improves the model’s generalization ability in cross-subject learning.
The input to the Riemannian convolution module is the raw EEG data, which is first processed to obtain the covariance matrices of the source-domain and target-domain data. The covariance matrix is calculated as follows:

\[
\mathrm{Cov} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^{T}
\]

where $X_i$ is the $i$-th sample of the EEG signal, $\bar{X}$ is the mean of the samples, and $N$ is the number of samples. The covariance matrix captures the second-order statistics of the EEG signal, reflecting the relationships between different channels.
Next, the Riemannian distance between the covariance matrices of the source and target domains is computed as follows:

\[
d(\mathrm{Cov}_1, \mathrm{Cov}_2) = \big\| \log(\mathrm{Cov}_1) - \log(\mathrm{Cov}_2) \big\|_{F}
\]

where $\mathrm{Cov}_1$ and $\mathrm{Cov}_2$ are the covariance matrices of the source and target domains, respectively, $\|\cdot\|_{F}$ denotes the Frobenius norm, and the matrix logarithm $\log(\cdot)$ maps the covariance matrices into Riemannian space, where the distance is measured.
To align all covariance matrices to a reference point in Riemannian space (commonly the geometric mean $\bar{C}$ of all matrices), we project each covariance matrix $C_i$ to the tangent space at $\bar{C}$ via:

\[
S_i = \log\big(\bar{C}^{-1/2}\, C_i\, \bar{C}^{-1/2}\big)
\]
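A NumPy sketch of this projection is shown below, using eigendecompositions for the SPD matrix logarithm and inverse square root; for simplicity the reference is taken as the arithmetic mean of the covariance matrices rather than their geometric mean (an assumption):

```python
import numpy as np

def spd_log(c):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T

def spd_inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * (w ** -0.5)) @ v.T

def tangent_space(covs, ref):
    """Project covariance matrices to the tangent space at the reference point."""
    r = spd_inv_sqrt(ref)
    return np.stack([spd_log(r @ c @ r) for c in covs])

rng = np.random.default_rng(2)
trials = rng.normal(size=(10, 3, 1000))                   # 10 trials, 3 channels
covs = np.stack([x @ x.T / x.shape[-1] for x in trials])  # per-trial covariance
ref = covs.mean(axis=0)  # arithmetic-mean reference (geometric mean in the paper)
feats = tangent_space(covs, ref)
```

Projecting at a shared reference recenters each domain's covariance matrices, so the resulting tangent vectors are directly comparable across subjects.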
To enhance the robustness of the aligned covariance matrices and extract multi-scale features, three convolutional layers with different kernel sizes are used to capture local, mid-range, and global structures. These extracted features are subsequently fused, flattened, and projected into a compact feature space:

\[
F_{R} = \mathrm{FC}\Big(\mathrm{Flatten}\big(\mathrm{Conv}_{k_1}(S) \oplus \mathrm{Conv}_{k_2}(S) \oplus \mathrm{Conv}_{k_3}(S)\big)\Big)
\]

where $S$ denotes the tangent-space features, $k_1$, $k_2$, and $k_3$ are the three kernel sizes, $\oplus$ denotes feature fusion, and $\mathrm{FC}(\cdot)$ is a fully connected projection.
The output of this multi-scale Riemannian module is then fused with the features extracted from the time-frequency convolution module and passed to the classifier.
The dimensionality of the Riemannian geometric features depends on the number of EEG channels, so the module configuration differs slightly across datasets. For the BCICIV2a dataset, the three convolutional kernel sizes are set to 1 × 1, 3 × 3, and 5 × 5, respectively. For the BCICIV2b dataset, two convolutional layers with kernel sizes of 1 × 1 and 2 × 2 are used.
Classifier module
The classifier module consists of a fully connected layer and a Softmax layer. The fully connected layer takes the concatenated features from the TSLANet module and the multi-scale Riemannian convolution module as input, applies a linear transformation, and outputs a feature vector of size 128. This is followed by a ReLU activation function to introduce non-linearity. A dropout layer with a dropout rate of 0.5 is applied to prevent overfitting. Finally, the Softmax layer outputs the classification probabilities for each class. Then, the Cross-Entropy loss function is used to calculate the classification loss, which is defined as:

\[
\mathcal{L}_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log(\hat{y}_{ij})
\]

where $N$ is the number of samples, $C$ is the number of classes, $y_{ij}$ is the true label for sample $i$ and class $j$, and $\hat{y}_{ij}$ is the predicted probability for sample $i$ and class $j$. The model is trained to minimize this loss function.
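For reference, the cross-entropy computation can be written in a few lines (the shapes are illustrative):

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between one-hot labels and predicted probabilities.

    y_true: (N, C) one-hot labels; y_prob: (N, C) Softmax outputs.
    eps guards against log(0).
    """
    return -np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1))

labels = np.eye(4)[[0, 2, 1, 3]]    # four samples, four classes (one-hot)
uniform = np.full((4, 4), 0.25)     # maximally uncertain predictions
print(cross_entropy(labels, uniform))  # ln(4) ~= 1.386
```

Uniform predictions over four classes give a loss of ln(4), while confident correct predictions drive the loss toward zero.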
Experimental setup
Two publicly available motor imagery EEG datasets, summarized in Table 2, were used to evaluate the proposed model under both subject-dependent and cross-subject settings. We compared RMETNet with several state-of-the-art methods, conducted ablation studies, and used visualization analyses to examine the learned feature distributions.
Datasets
The BCICIV2a [31] and BCICIV2b [32], provided by Graz University of Technology, are widely used benchmarks for motor imagery-based EEG classification tasks. BCICIV2a contains EEG recordings from nine subjects performing four motor imagery tasks: left hand, right hand, both feet, and tongue. The EEG signals were recorded using 22 Ag/AgCl electrodes at a sampling rate of 250 Hz. Data were collected in two sessions at different times, with each session containing 288 trials (72 trials per class). BCICIV2b also includes recordings from nine subjects but focuses on two motor imagery tasks: left hand and right hand. EEG data were collected using 3 channels at the same sampling rate of 250 Hz. This dataset consists of five sessions: the first two sessions, without feedback, contain 120 trials in total (60 trials per class), while the last three sessions, with feedback, comprise 480 trials (160 per session, 80 per class). To remain consistent with prior studies, in this study, only the last three feedback sessions were used for all experiments on BCICIV2b. Therefore, each subject contributed 480 trials in our experiments. In addition, the original BCICIV2a and BCICIV2b recordings had already undergone acquisition-stage preprocessing, including a 0.5–100 Hz band-pass filter and a 50 Hz notch filter [31,32].
To visualize the characteristics of the two datasets, representative motor imagery trials from both BCICIV2a and BCICIV2b are shown in Fig 4. For each dataset, three subjects (subjects 1, 5, and 9) were selected to illustrate cross-subject variability, and for each subject, one trial from each motor imagery class was randomly chosen. The EEG waveforms from channels C3, Cz, and C4 were plotted over the interval from 0.5 to 4.5 s after cue onset, which is the analysis window used in this study. In addition, scalp topographies were computed from the same interval using band power in the 8–30 Hz range, with values expressed as z-scored log10 band power to highlight the spatial patterns of task-related neural activity. As can be observed, the temporal waveforms and spatial distributions vary considerably across subjects, indicating substantial inter-subject variability that poses a major challenge for cross-subject classification. Meanwhile, EEG patterns elicited by different motor imagery tasks often differ only subtly in both temporal and spatial characteristics, which increases inter-class similarity and makes discriminative feature extraction and robust classification more difficult.
The top panel shows BCICIV2a and the bottom panel shows BCICIV2b; columns correspond to subjects 1, 5, and 9. For BCICIV2a, rows represent left-hand, right-hand, feet, and tongue imagery, respectively; for BCICIV2b, rows represent left-hand and right-hand imagery. In each subpanel, the left plot shows EEG waveforms from channels C3, Cz, and C4 over the trial interval from 0.5 to 4.5 s, whereas the right plot shows the scalp topography computed from the same interval using band power in the 8–30 Hz range. Topographic values are expressed as z-scored log10 band power.
Data preprocessing and division
The original EEG samples have dimensions of ch × sp, where ch denotes the number of electrode channels and sp denotes the number of temporal samples. Specifically, ch = 22 for BCICIV2a and ch = 3 for BCICIV2b. For each trial, a fixed EEG segment from 0.5 s to 4.5 s after cue onset was extracted for analysis. Thus, the segment length was 4 s, corresponding to 1000 temporal samples at a sampling rate of 250 Hz. The same time window was used for both BCICIV2a and BCICIV2b. No overlapping windows or trial-level augmentation were used. The extracted EEG signals were directly used for subsequent processing, and Z-score normalization was applied to reduce signal variability and improve training stability:

x_o = (x_i − μ) / σ

where x_i and x_o represent the raw and normalized data, respectively, and μ and σ denote the mean and standard deviation. These statistics are computed from the training set and directly applied to the test set to maintain consistency during model training and evaluation.
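The normalization step can be sketched as follows; this is a minimal NumPy example in which the per-channel statistics convention and the array shapes are our assumptions, not details stated in the paper:

```python
import numpy as np

def zscore_normalize(train, test, eps=1e-8):
    """Z-score normalization using training-set statistics only.

    train, test: arrays of shape (trials, channels, samples).
    The mean and standard deviation are estimated per channel on the
    training set and re-used unchanged on the test set, as described
    in the text, so no test-set information leaks into the statistics.
    """
    mu = train.mean(axis=(0, 2), keepdims=True)    # per-channel mean
    sigma = train.std(axis=(0, 2), keepdims=True)  # per-channel std
    return (train - mu) / (sigma + eps), (test - mu) / (sigma + eps)

rng = np.random.default_rng(0)
Xtr = rng.normal(5.0, 2.0, size=(32, 22, 1000))    # e.g., BCICIV2a: 22 ch, 1000 samples
Xte = rng.normal(5.0, 2.0, size=(8, 22, 1000))
Xtr_n, Xte_n = zscore_normalize(Xtr, Xte)
```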
Covariance matrices were computed separately for the training and validation sets, and Riemannian alignment was applied to reduce inter-subject variability and improve model generalization. Accordingly, the inputs for the cross-subject experiments included both raw EEG signals and Riemannian-aligned features. For cross-subject evaluation, a leave-one-subject-out (LOSO) strategy was adopted. In each fold, all available trials from one subject were used as the test set, while the data from the remaining subjects were used for training and validation. For BCICIV2b, only the three feedback sessions were included, resulting in 480 trials per subject. For subject-dependent evaluation, each subject was processed independently. Only the data from the same subject were used, and they were further divided into training, validation, and test sets with a ratio of 8:1:1.
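The alignment step can be illustrated with a simplified sketch. Here the reference matrix is the arithmetic mean of the trial covariances; the Riemannian (geometric) mean used in the paper would replace that step, and the data shapes below are illustrative:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def align_trials(X):
    """Whiten EEG trials with the inverse square root of a reference
    covariance so that aligned trials share a common center.
    The arithmetic mean is used as the reference here for simplicity;
    a Riemannian (geometric) mean would replace np.mean below.
    X: (trials, channels, samples) -> aligned array of the same shape."""
    covs = np.stack([x @ x.T / x.shape[1] for x in X])  # per-trial covariances
    ref = covs.mean(axis=0)                             # reference matrix
    ref_inv_sqrt = fractional_matrix_power(ref, -0.5).real
    return np.einsum('ij,njk->nik', ref_inv_sqrt, X)

rng = np.random.default_rng(1)
X = rng.normal(size=(16, 3, 500))                       # e.g., BCICIV2b: 3 channels
Xa = align_trials(X)
# After alignment, the mean covariance of the trials equals the identity.
mean_cov = np.stack([x @ x.T / x.shape[1] for x in Xa]).mean(axis=0)
```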
Evaluation metrics
To evaluate the performance of the proposed model, this paper uses two metrics commonly used in motor imagery EEG classification: accuracy (ACC) and the kappa coefficient (KAPPA). Accuracy is defined as the ratio of correctly classified samples to the total number of samples, while the kappa coefficient is a measure of inter-rater agreement that accounts for chance agreement. These metrics are calculated as follows:

ACC = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN represent the true positives, true negatives, false positives, and false negatives, respectively.

KAPPA = (Po − Pe) / (1 − Pe)

where Po is the observed agreement and Pe is the expected agreement by chance. The kappa coefficient ranges from −1 to 1, where 1 indicates perfect agreement, 0 indicates no agreement beyond chance, and negative values indicate less than chance agreement.
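For concreteness, both metrics can be computed directly from predicted labels (a minimal NumPy sketch; the toy labels are illustrative only):

```python
import numpy as np

def accuracy_and_kappa(y_true, y_pred, n_classes):
    """Accuracy and Cohen's kappa from predicted labels.
    kappa = (Po - Pe) / (1 - Pe), where Po is the observed agreement
    (i.e., the accuracy) and Pe is the chance agreement computed from
    the marginal label frequencies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    po = (y_true == y_pred).mean()
    pe = sum((y_true == k).mean() * (y_pred == k).mean() for k in range(n_classes))
    return po, (po - pe) / (1 - pe)

# Toy four-class example (left hand, right hand, feet, tongue).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 1]
acc, kappa = accuracy_and_kappa(y_true, y_pred, n_classes=4)  # acc=0.75, kappa=2/3
```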
Implementation details
The proposed model was implemented in PyTorch with Python 3.8 and trained on an NVIDIA GeForce RTX 4080 SUPER GPU. The batch size is set to 64, with a total of 500 training epochs. The Adam optimizer is used with a learning rate of 0.0001 and a weight decay of 0.075; this weight decay acts as L2 regularization to prevent overfitting and support proper convergence. The TSLANet module consists of three layers. During training, the model weights corresponding to the lowest validation loss are saved, and these weights are loaded for testing.
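This training configuration can be sketched in PyTorch as follows; the model and the validation loss below are placeholders for the actual RMETNet pipeline, not the paper's implementation:

```python
import torch

# Configuration from the text: Adam, lr=1e-4, weight decay (L2) of 0.075,
# batch size 64, 500 epochs; the weights with the lowest validation loss
# are kept for testing. `model` and the validation loss are stand-ins.
model = torch.nn.Linear(22 * 1000, 4)                 # placeholder, not RMETNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.075)

best_val, best_state = float('inf'), None
for epoch in range(2):                                # 500 in the paper
    # ... forward/backward passes over batches of size 64 would go here ...
    val_loss = 1.0 / (epoch + 1)                      # placeholder validation loss
    if val_loss < best_val:                           # checkpoint on best val loss
        best_val = val_loss
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
model.load_state_dict(best_state)                     # load best weights for testing
```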
Results and discussion
Comparison with state-of-the-art methods
To evaluate the feature recognition performance of RMETNet on MI-EEG signals, we conducted both subject-dependent and cross-subject experiments. Several representative state-of-the-art methods were selected as baselines for systematic comparison on the two public datasets. The baseline methods are briefly described below:
- FBCSP [33]: A traditional machine learning method, its principle is based on implementing a spatial feature extraction algorithm using Common Spatial Pattern (CSP) on partitioned frequency bands, combined with a feature selection algorithm. Specifically, the frequency band is first sliced, then CSP filtering is applied to each sub-band. Finally, features are selected from the filtered signals to produce the classification result.
- EEGNet [16]: A compact convolutional neural network that learns temporal filters via convolutional kernels to capture motion-related frequency information. The network also employs a separable convolution structure, including Depthwise and Pointwise Convolutions, to learn spatial filters. This design significantly reduces the number of model training parameters while preserving robust feature extraction capabilities.
- ShallowConvNet [3]: The data transformation process of ShallowConvNet bears similarity to the feature extraction method of FBCSP, with its core lying in the joint modeling of temporal and spatial features. Specifically, ShallowConvNet extracts spatio-temporal features through a temporal convolution and spatial filters, followed by the application of a squaring nonlinearity, a mean pooling layer, and a logarithmic activation function to further extract deep features and enhance their discriminability.
- EEG-TCNet [34]: Proposed as an extension of EEGNet, EEG-TCNet introduces a Temporal Convolutional Network (TCN) structure after EEGNet’s initial feature extraction to further mine information in the temporal dimension. This design combines the efficient feature extraction of EEGNet with TCN’s capacity for modeling long-term dependencies, thereby enhancing the representational power for temporal characteristics of EEG signals and improving classification performance.
- EEG-ITNet [35]: This model introduces Inception modules and dilated causal convolutions to efficiently extract rich spectral, spatial, and temporal features from multi-channel EEG signals. The Inception modules capture features from different frequency bands through multi-scale convolutions, while the dilated causal convolutions model long-term dependencies by leveraging an expanded receptive field, thus comprehensively enhancing the feature representation and classification performance for EEG signals.
- DS-TKL [36]: This method achieves cross-domain learning by selecting discriminative features from the source domain and performing pseudo-label correction in the target domain. The DS-TKL method first preprocesses samples through centroid alignment to reduce the distribution discrepancy between the source and target domains. Subsequently, it employs Riemannian tangent space features for feature adaptation to further promote feature space alignment between the domains. During the feature adaptation process, a dual selection is implemented through a regularization mechanism to optimize the feature selection process, significantly improving classification performance during iteration.
- Transformer-Based Method [37]: This method proposes an end-to-end EEG signal decoding algorithm based on swarm intelligence theory and virtual adversarial training, which integrates the Transformer mechanism to optimize traditional convolutional classification methods. By introducing a self-attention mechanism, the research aims to expand the receptive field for EEG signals to capture global dependencies, thereby capturing broader spatio-temporal features. This approach trains the neural network by optimizing the model’s global parameters, enhancing the model’s global feature learning ability while improving EEG signal classification performance.
- EEGConformer [38]: This is a compact, EEG-based Transformer architecture. Its principle involves using convolutional modules to learn local one-dimensional temporal and spatial features. The advantage of this model lies in its subsequent use of a self-attention module to extract global dependencies within these local temporal features. Compared to previous global temporal feature extraction strategies, the features learned by a model incorporating a self-attention mechanism are more comprehensive, achieving higher model performance.
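To make the FBCSP baseline concrete, its core CSP step for a single sub-band can be sketched as follows. This is a minimal two-class sketch on synthetic data; the band-pass filter bank and the mutual-information feature selection stages of FBCSP are omitted:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_pairs=1):
    """Minimal two-class Common Spatial Patterns (CSP), the spatial
    filtering core that FBCSP applies per frequency sub-band.
    X1, X2: (trials, channels, samples) per class.
    Returns spatial filters W of shape (2*n_pairs, channels)."""
    cov = lambda X: np.mean([x @ x.T / np.trace(x @ x.T) for x in X], axis=0)
    C1, C2 = cov(X1), cov(X2)
    # Generalized eigenproblem C1 w = lambda (C1 + C2) w; the extreme
    # eigenvalues give filters with the most discriminative variance ratio.
    vals, vecs = eigh(C1, C1 + C2)
    order = np.argsort(vals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return vecs[:, picks].T

def csp_features(X, W):
    """Log-variance features after spatial filtering."""
    Z = np.einsum('fc,ncs->nfs', W, X)
    return np.log(np.var(Z, axis=2))

rng = np.random.default_rng(2)
X1 = rng.normal(size=(10, 4, 200)); X1[:, 0] *= 3.0   # class 1: strong channel 0
X2 = rng.normal(size=(10, 4, 200)); X2[:, 1] *= 3.0   # class 2: strong channel 1
W = csp_filters(X1, X2)
F1, F2 = csp_features(X1, W), csp_features(X2, W)
```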
In this paper, accuracy and the Kappa coefficient are used as metrics to quantitatively evaluate the proposed model and compare it with the baseline classification methods.
Results on BCICIV2a dataset
Table 3 summarizes the classification accuracy and Kappa scores of the proposed RMETNet and several representative deep learning models on the BCICIV2a dataset under the subject-dependent setting. To ensure a fair comparison of feature extraction capability, the cross-domain modules, including MMD loss and Riemannian geometry feature fusion, were removed, and only the base network was evaluated. As shown in the table, RMETNet achieves the best overall performance, reaching an average accuracy of 80.77% and a Kappa value of 0.76. In addition, it obtains the highest accuracies on four subjects (S01, S03, S04, and S07), while maintaining competitive performance on the remaining subjects, indicating strong subject-dependent decoding capability.
Compared with classical CNN-based approaches, RMETNet improves the average accuracy by 8.37 and 6.46% over EEGNet and ShallowConvNet, respectively. More importantly, the paired Wilcoxon signed-rank test shows that these improvements are statistically significant at the p < 0.01 level, demonstrating the clear advantage of the proposed model over conventional lightweight convolutional architectures in four-class motor imagery classification. RMETNet also outperforms EEG-ITNet by 4.03%, and this improvement is statistically significant at the p < 0.05 level, further confirming the effectiveness of the proposed architecture. In comparison with EEG-TCNet, EEGConformer, and LMDANet, RMETNet still achieves the highest average accuracy, with gains of 3.46, 2.11, and 1.91%, respectively, although no statistically significant difference is observed according to the reported test results. Overall, these findings indicate that RMETNet can learn more discriminative spatio-temporal representations and provides superior performance under the subject-dependent evaluation protocol.
As shown in Table 4, RMETNet achieves the best performance in the cross-subject setting, obtaining an average classification accuracy of 71.39% and a Kappa value of 0.59. Notably, these results are achieved using the same network configuration and hyperparameter settings for all target subjects, which further demonstrates the robustness of the proposed framework. RMETNet attains the highest accuracies on seven out of nine subjects (S01, S03, S04, S05, S07, S08, and S09), showing strong resistance to inter-subject variability.
In terms of average accuracy, RMETNet surpasses EEGNet, ShallowConvNet, EEG-TCNet, EEG-ITNet, and the Transformer-based model by 9.05, 10.87, 6.61, 6.19, and 7.83%, respectively. According to the paired Wilcoxon signed-rank test, the improvements over EEGNet and EEG-ITNet are statistically significant at the p < 0.05 level, while the improvement over ShallowConvNet is statistically significant at the p < 0.01 level. These results indicate that classical and conventional deep learning baselines still face substantial difficulty in handling cross-subject EEG distribution shifts. Although RMETNet also achieves higher average accuracies than EEG-TCNet and the Transformer-based model, the corresponding differences are not marked as statistically significant. Taken together, the above results suggest that RMETNet is more effective at learning transferable representations, reducing cross-subject distribution discrepancy, and improving the generalization performance of motor imagery EEG decoding.
Results on BCICIV2b dataset
The BCICIV2b dataset is a widely used benchmark for three-channel, binary motor imagery classification and therefore provides an appropriate testbed for evaluating the effectiveness of RMETNet in a low-channel scenario. As reported in Table 5, RMETNet achieves the best overall performance in the subject-dependent setting, with an average classification accuracy of 86.76% and a Kappa value of 0.74. Moreover, RMETNet obtains the best results on five of the nine subjects (S2, S3, S4, S5, and S7) and reaches a peak accuracy of 98.75% on subject S4.
Compared with FBCSP, EEGNet, ShallowConvNet, EEGConformer, and LMDANet, RMETNet improves the average accuracy by 6.75, 5.95, 7.17, 2.28, and 1.12%, respectively. Among these comparisons, the gains over FBCSP and ShallowConvNet are statistically significant at the p < 0.01 level according to the paired Wilcoxon signed-rank test, highlighting the clear superiority of RMETNet over traditional feature-engineering methods and shallow convolutional baselines. Although RMETNet also achieves higher average accuracy than EEGNet, EEGConformer, and LMDANet, these differences are not marked as statistically significant in the current results. In addition, RMETNet exhibits the smallest standard deviation (9.57), indicating more stable performance across subjects. These findings demonstrate that the proposed model remains highly effective even with only three EEG channels and can extract discriminative task-related features for binary motor imagery decoding more reliably than competing methods.
Furthermore, the robustness and generalization ability of the proposed model were evaluated through cross-subject experiments on the BCICIV2b dataset, with the results shown in Table 6. In this more challenging setting, RMETNet again achieves the best overall performance, yielding an average classification accuracy of 80.93% and a Kappa value of 0.54. More importantly, RMETNet ranks first on eight of the nine subjects, and only on subject S4 does it perform slightly below the best competing method, while still achieving a high accuracy of 91.63%. This demonstrates that RMETNet can maintain strong and consistent decoding performance under substantial inter-subject distribution differences.
From the perspective of average accuracy, RMETNet improves upon EEGNet, ShallowConvNet, and DS-TKL by 21.33, 17.44, and 10.40%, respectively. The paired Wilcoxon signed-rank test further shows that all these improvements are statistically significant at the p < 0.01 level, providing strong evidence for the superiority of RMETNet in cross-subject decoding on the BCICIV2b dataset. In addition, RMETNet achieves a lower standard deviation (7.12) than DS-TKL (9.14), indicating better stability across subjects. These results suggest that RMETNet is capable of learning more transferable and domain-robust EEG representations, thereby effectively alleviating the negative impact of inter-subject variability. Such a property is particularly important for practical BCI systems, where reducing calibration effort and improving generalization to unseen users are key requirements.
Overall, the statistical analysis based on paired Wilcoxon signed-rank tests demonstrates that the proposed RMETNet not only achieves the highest average accuracies on both datasets, but also exhibits statistically significant improvements over several representative baselines, especially in the more challenging cross-subject setting. These results suggest that the proposed model can effectively learn more discriminative and transferable representations for motor imagery EEG decoding, thereby improving the generalization performance across subjects. The consistent superiority of RMETNet across both datasets and evaluation protocols further confirms its robustness and potential for practical BCI applications.
Ablation study
To assess the contributions of the two cross-domain components, namely the multi-scale Riemannian feature module and the MMD loss, ablation experiments were conducted on both datasets. Table 7 reports four settings: removing both components, removing only the multi-scale Riemannian feature module, removing only the MMD loss, and using the complete model. The ablation results indicate that the multi-scale Riemannian feature module and the MMD-based domain adaptation strategy provide complementary gains in RMETNet. Relative to the setting without either component, the Riemannian module alone increases average accuracy by 0.04% and 0.23% on BCICIV2a and BCICIV2b, respectively, whereas the MMD loss alone increases it by 2.41% and 1.12%. When both components are used together, the gains reach 4.28% on BCICIV2a and 1.89% on BCICIV2b. According to the paired Wilcoxon signed-rank tests, all three ablated settings are significantly worse than the full model on BCICIV2a (p < 0.01). On BCICIV2b, the settings without both components and without the MMD loss are significantly worse than the full model (p < 0.01), whereas the setting without the Riemannian module is not significantly different from the full model. These findings suggest that TSLANet and the spatio-temporal convolution module provide robust temporal and local spatial representations, while the multi-scale Riemannian branch captures global covariance structure across subjects. At the same time, the MMD loss encourages the network to learn more domain-invariant features by reducing cross-subject distribution differences. Together, these components improve the overall performance of RMETNet on both public datasets and support its potential for practical MI-BCI applications.
Although the standalone improvement brought by the multi-scale Riemannian feature module is relatively limited in the ablation study, its contribution should be understood together with the MMD-based domain adaptation strategy rather than in isolation. Specifically, the Riemannian module alone yields only marginal gains over the setting without both components, whereas the full model consistently achieves the best performance when the Riemannian feature learning and MMD loss are jointly applied. More importantly, on BCICIV2a, the benefit of the Riemannian module becomes evident when it is combined with MMD: adding the Riemannian branch to the MMD-based model increases the average accuracy from 69.52% to 71.39%, and the paired Wilcoxon signed-rank test confirms that this improvement is statistically significant (p < 0.01). This observation suggests that the main role of the Riemannian branch is not merely to provide an independent accuracy boost, but to supply geometry-aware covariance representations that are more amenable to cross-subject alignment under MMD regularization. In other words, the Riemannian module and the MMD loss appear to act synergistically: the former enhances the structural representation of EEG covariance patterns, while the latter promotes distribution matching across subjects in the learned feature space. This complementary interaction is particularly important in cross-subject MI decoding, where reducing inter-subject variability is as important as improving feature discriminability. Therefore, although the additional operations of covariance computation, logarithmic mapping, and tangent-space projection increase model complexity, the experimental results suggest that their value lies mainly in strengthening the effectiveness of domain adaptation rather than serving as a purely standalone enhancement. 
Future work will further investigate this synergy by analyzing computational cost, feature distributions, and transfer behavior under different subject-transfer settings.
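For reference, the MMD quantity that the domain adaptation loss minimizes can be sketched with a Gaussian kernel. This is a biased NumPy estimate of the squared MMD; the kernel bandwidth and the synthetic feature data are our assumptions:

```python
import numpy as np

def mmd2_rbf(Xs, Xt, sigma=4.0):
    """Biased estimate of the squared Maximum Mean Discrepancy (MMD) with a
    Gaussian (RBF) kernel. This is the quantity the MMD loss drives toward
    zero so that source (Xs) and target (Xt) feature distributions align.
    Xs: (n, d) and Xt: (m, d) feature matrices; sigma is the kernel width."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()

rng = np.random.default_rng(3)
# Identically distributed features give a near-zero MMD; a mean shift does not.
same = mmd2_rbf(rng.normal(size=(100, 8)), rng.normal(size=(100, 8)))
shifted = mmd2_rbf(rng.normal(size=(100, 8)), rng.normal(2.0, 1.0, size=(100, 8)))
```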
Visualization
UMAP [40] was used to visualize the deep feature distributions and to examine how the two cross-domain components affect class separation. Compared with t-SNE, UMAP preserves more global structure and therefore provides a clearer view of the learned feature organization. After training, the feature distribution of subject S1 on the BCICIV2a dataset is shown in Fig 5. Without the MMD loss and Riemannian feature fusion, the inter-class distances are smaller. When the Riemannian branch is removed, the features show larger intra-class dispersion and weaker inter-class separation. By contrast, the complete model produces more compact intra-class clusters and clearer inter-class boundaries, which supports the effectiveness of RMETNet in learning discriminative representations for unseen subjects.
(a) Original test features. (b) Features without the Riemannian branch. (c) Features without MMD-loss-based training. (d) Features produced by the complete model.
Complexity analysis
To evaluate the computational cost and deployment feasibility of RMETNet, we analyzed the model size, FLOPs, and single-trial inference latency. On BCICIV2a, RMETNet contains 0.198 M trainable parameters and requires 421.681 M FLOPs per trial; on BCICIV2b, it contains 0.159 M parameters and requires 57.752 M FLOPs. Under CPU-based single-trial inference, the average forward latency is 3.74 ± 0.04 ms on BCICIV2a and 0.52 ± 0.02 ms on BCICIV2b. In addition, covariance estimation and tangent-space alignment are performed during offline preprocessing rather than within the online forward pass, introducing only about 0.15 ms per trial for 22-channel data and 0.03 ms per trial for 3-channel data. These results indicate that RMETNet has a lightweight forward architecture and is suitable for near-real-time MI-BCI deployment. Nevertheless, because the cross-subject evaluation in this study relies on precomputed alignment statistics, the current experimental setting is more appropriately interpreted as session-level adaptation rather than fully streaming online inference.
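The parameter count and CPU latency measurements can be reproduced in outline as follows. The tiny model below is a stand-in, not RMETNet, and FLOPs counting would additionally require a profiler such as fvcore or ptflops:

```python
import time
import torch

# Measure trainable-parameter count and single-trial CPU latency, mirroring
# the complexity analysis. The model here is a small placeholder network.
model = torch.nn.Sequential(
    torch.nn.Conv1d(22, 16, kernel_size=25),   # 22 channels -> 16 feature maps
    torch.nn.Flatten(),
    torch.nn.LazyLinear(4),                    # 4 motor imagery classes
)
x = torch.randn(1, 22, 1000)                   # one BCICIV2a trial (0.5-4.5 s @ 250 Hz)
with torch.no_grad():
    model(x)                                   # first call materializes the lazy layer
    t0 = time.perf_counter()
    for _ in range(20):
        model(x)                               # average over repeated forward passes
    latency_ms = (time.perf_counter() - t0) / 20 * 1e3
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
```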
Conclusion
This study proposes RMETNet, a neural network for cross-subject motor imagery EEG decoding that combines TSLANet, spatio-temporal convolution, multi-scale Riemannian feature learning, and MMD-based domain adaptation. After Z-score normalization, the raw EEG signals are processed by TSLANet and the convolutional branch to extract temporal and spatio-temporal representations, while a parallel Riemannian branch learns cross-subject geometric features. In cross-subject experiments, RMETNet achieves average accuracies of 71.39% on BCICIV2a and 80.93% on BCICIV2b. In subject-dependent experiments, it reaches 80.71% and 86.76% on the two datasets, respectively, outperforming the compared deep learning baselines. Visualization and ablation analyses further confirm the effectiveness of the Riemannian feature mapping and the MMD-based cross-subject learning strategy. Future work will further optimize TSLANet for EEG analysis and investigate lighter model designs while preserving classification performance. Overall, RMETNet provides an effective and robust framework for MI-EEG-based BCI applications.
References
- 1. McFarland DJ, Wolpaw JR. Brain-computer interface use is a skill that user and system acquire together. PLoS Biol. 2018;16(7):e2006719. pmid:29965965
- 2. Thomas KP, Guan C, Lau CT, Vinod AP, Ang KK. A new discriminative common spatial pattern method for motor imagery brain-computer interfaces. IEEE Trans Biomed Eng. 2009;56(11 Pt 2):2730–3. pmid:19605314
- 3. Tibor Schirrmeister R, Gemein L, Eggensperger K, Hutter F, Ball T. Deep learning with convolutional neural networks for decoding and visualization of EEG pathology. arXiv e-prints. 2017;arXiv-1708.
- 4. Craik A, He Y, Contreras-Vidal JL. Deep learning for electroencephalogram (EEG) classification tasks: a review. J Neural Eng. 2019;16(3):031001. pmid:30808014
- 5. Chen Y, Yang R, Huang M, Wang Z, Liu X. Single-Source to Single-Target Cross-Subject Motor Imagery Classification Based on Multisubdomain Adaptation Network. IEEE Trans Neural Syst Rehabil Eng. 2022;30:1992–2002. pmid:35849678
- 6. Zhang K, Robinson N, Lee S-W, Guan C. Adaptive transfer learning for EEG motor imagery classification with deep Convolutional Neural Network. Neural Netw. 2021;136:1–10. pmid:33401114
- 7. Zhang F, Wu H, Guo Y. Semi-supervised multi-source transfer learning for cross-subject EEG motor imagery classification. Med Biol Eng Comput. 2024;62(6):1655–72. pmid:38324109
- 8. Al-Saegh A, Dawwd SA, Abdul-Jabbar JM. Deep learning for motor imagery EEG-based classification: A review. Biomed Signal Process Control. 2021;63:102172.
- 9. Wang L, Li M, Zhang L. Recognize enhanced temporal-spatial-spectral features with a parallel multi-branch CNN and GRU. Med Biol Eng Comput. 2023;61(8):2013–32. pmid:37294411
- 10. Liu X, Wang K, Liu F, Zhao W, Liu J. 3D Convolution neural network with multiscale spatial and temporal cues for motor imagery EEG classification. Cogn Neurodyn. 2023;17(5):1357–80. pmid:37786651
- 11. Fan Z, Xi X, Gao Y, Wang T, Fang F, Houston M, et al. Joint Filter-Band-Combination and Multi-View CNN for Electroencephalogram Decoding. IEEE Trans Neural Syst Rehabil Eng. 2023;31:2101–10. pmid:37083516
- 12. Bore JC, Li P, Jiang L, Ayedh WMA, Chen C, Harmah DJ, et al. A Long Short-Term Memory Network for Sparse Spatiotemporal EEG Source Imaging. IEEE Trans Med Imaging. 2021;40(12):3787–800. pmid:34270417
- 13. Song Y, Yin Y, Xu P. A Customized ECA-CRNN Model for Emotion Recognition Based on EEG Signals. Electronics. 2023;12(13):2900.
- 14. Zhang D, Chen K, Jian D, Yao L. Motor Imagery Classification via Temporal Attention Cues of Graph Embedded EEG Signals. IEEE J Biomed Health Inform. 2020;24(9):2570–9. pmid:31976916
- 15. Sun B, Liu Z, Wu Z, Mu C, Li T. Graph Convolution Neural Network Based End-to-End Channel Selection and Classification for Motor Imagery Brain–Computer Interfaces. IEEE Trans Ind Inf. 2023;19(9):9314–24.
- 16. Lawhern VJ, Solon AJ, Waytowich NR, Gordon SM, Hung CP, Lance BJ. EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J Neural Eng. 2018;15(5):056013. pmid:29932424
- 17. Li Y, Zhang X-R, Zhang B, Lei M-Y, Cui W-G, Guo Y-Z. A Channel-Projection Mixed-Scale Convolutional Neural Network for Motor Imagery EEG Decoding. IEEE Trans Neural Syst Rehabil Eng. 2019;27(6):1170–80. pmid:31071048
- 18. Yang J, Gao S, Shen T. A Two-Branch CNN Fusing Temporal and Frequency Features for Motor Imagery EEG Decoding. Entropy (Basel). 2022;24(3):376. pmid:35327887
- 19. Siddhad G, Gupta A, Dogra DP, Roy PP. Efficacy of transformer networks for classification of EEG data. Biomed Signal Process Control. 2024;87:105488.
- 20. Ghosh S, Chandrasekaran V, Rohan N, Chakravarthy VS. Electroencephalogram (EEG) classification using a bio-inspired deep oscillatory neural network. Biomed Signal Process Control. 2025;103:107379.
- 21. Liao W, Miao Z, Liang S, Zhang L, Li C. A composite improved attention convolutional network for motor imagery EEG classification. Front Neurosci. 2025;19:1543508. pmid:39981403
- 22. Zhang C, Kim Y-K, Eskandarian A. EEG-inception: an accurate and robust end-to-end neural network for EEG-based motor imagery classification. J Neural Eng. 2021;18(4):046014. pmid:33691299
- 23. Maswanganyi RC, Tu C, Owolawi PA, Du S. Multi-Class Transfer Learning and Domain Selection for Cross-Subject EEG Classification. Appl Sci. 2023;13(8):5205.
- 24. Wu H, Niu Y, Li F, Li Y, Fu B, Shi G, et al. A Parallel Multiscale Filter Bank Convolutional Neural Networks for Motor Imagery EEG Classification. Front Neurosci. 2019;13:1275. pmid:31849587
- 25. Perez-Velasco S, Santamaria-Vazquez E, Martinez-Cagigal V, Marcos-Martinez D, Hornero R. EEGSym: Overcoming Inter-Subject Variability in Motor Imagery Based BCIs With Deep Learning. IEEE Trans Neural Syst Rehabil Eng. 2022;30:1766–75. pmid:35759578
- 26. Hu L, Hong W, Liu L. MSATNet: multi-scale adaptive transformer network for motor imagery classification. Front Neurosci. 2023;17:1173778. pmid:37389361
- 27. Paillard J, Hipp JF, Engemann DA. GREEN: A lightweight architecture using learnable wavelets and Riemannian geometry for biomarker exploration with EEG signals. Patterns (N Y). 2025;6(3):101182. pmid:40182177
- 28. Barachant A, Bonnet S, Congedo M, Jutten C. Multiclass brain-computer interface classification by Riemannian geometry. IEEE Trans Biomed Eng. 2012;59(4):920–8. pmid:22010143
- 29. Morales S, Bowers ME. Time-frequency analysis methods and their application in developmental EEG data. Dev Cogn Neurosci. 2022;54:101067. pmid:35065418
- 30. Deng X, Zhang B, Yu N, Liu K, Sun K. Advanced TSGL-EEGNet for Motor Imagery EEG-Based Brain-Computer Interfaces. IEEE Access. 2021;9:25118–30.
- 31. Brunner C, Leeb R, Müller-Putz G, Schlögl A, Pfurtscheller G. BCI Competition 2008–Graz data set A. Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology. 2008;16(1-6):34.
- 32. Leeb R, Brunner C, Müller-Putz G, Schlögl A, Pfurtscheller G. BCI Competition 2008–Graz Data Set B. Graz Univ Technol. 2008;16:1–6.
- 33. Ang KK, Chin ZY, Zhang H, Guan C. Filter Bank Common Spatial Pattern (FBCSP) algorithm using online adaptive and semi-supervised learning. In: The 2011 International Joint Conference on Neural Networks. IEEE; 2011. p. 392–6.
- 34. Ingolfsson TM, Hersche M, Wang X, Kobayashi N, Cavigelli L, Benini L. EEG-TCNet: An accurate temporal convolutional network for embedded motor-imagery brain–machine interfaces. In: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE; 2020. p. 2958–65.
- 35. Salami A, Andreu-Perez J, Gillmeister H. EEG-ITNet: An Explainable Inception Temporal Convolutional Network for Motor Imagery Classification. IEEE Access. 2022;10:36672–85.
- 36. Luo T-J. Dual selections based knowledge transfer learning for cross-subject motor imagery EEG classification. Front Neurosci. 2023;17:1274320. pmid:38089972
- 37. Tan X, Wang D, Chen J, Xu M. Transformer-Based Network with Optimization for Cross-Subject Motor Imagery Identification. Bioengineering (Basel). 2023;10(5):609. pmid:37237679
- 38. Song Y, Zheng Q, Liu B, Gao X. EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization. IEEE Trans Neural Syst Rehabil Eng. 2023;31:710–9. pmid:37015413
- 39. Miao Z, Zhao M, Zhang X, Ming D. LMDA-Net:A lightweight multi-dimensional attention network for general EEG-based brain-computer interfaces and interpretability. Neuroimage. 2023;276:120209. pmid:37269957
- 40. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018.