The Synergy between Complex Channel-Specific FIR Filter and Spatial Filter for Single-Trial EEG Classification

Common spatial pattern analysis (CSP), a frequently used feature extraction method in brain-computer interface applications, is believed to be time-invariant and sensitive to noise, mainly because it relies purely on spatial filtering. Temporal/spectral filtering, which can be very effective at counteracting the unfavorable influence of noise, is therefore usually used as a supplement. This work integrates the CSP spatial filters with complex channel-specific finite impulse response (FIR) filters in a natural and intuitive manner. Each hybrid spatial-FIR filter is high-order, data-driven, and unique to its corresponding channel. The filters are derived by introducing multiple time delays and regularization into conventional CSP. The general framework of the method follows that of CSP but performs better, as demonstrated in single-trial classification tasks such as event-related potential detection and motor imagery.


Introduction
The success of common spatial pattern analysis (CSP) in brain-computer interface applications such as motor imagery (MI) and event-related potential (ERP) detection has received considerable attention [1][2][3][4][5]. Being a supervised method, CSP extracts a set of optimal spatial filters from labeled data, which maximize the separability between two distinct mental conditions. The filters obtained by CSP rely heavily on spatial projections. CSP therefore underuses the temporal/spectral information of the electroencephalogram (EEG), which nevertheless plays an important role in feature extraction. To address this pitfall, researchers have taken various steps to restructure CSP so that temporal/spectral filters are also exploited [6][7][8].
A notable attempt is the common spatiospectral pattern (CSSP) [9], which constructs channel-specific temporal filters by applying time delay embedding. Given that the temporal filters of CSSP are rather basic, the common sparse spectral spatial pattern (CSSSP) goes a step further by iteratively and simultaneously optimizing a complex temporal filter together with CSP spatial filters, under a regularization scheme [10]. It is noteworthy that in CSSSP, the obtained temporal filter is applied equally to all EEG channels. Unlike CSSP and CSSSP, which are characterized by time delays, spectrally weighted common spatial patterns (SPEC-CSP) [11] and iterative spatio-spectral patterns learning (ISSPL) [12] introduce a linear time-invariant temporal filter and a circulant temporal filter matrix, respectively. Both use the Fourier transform so that the optimization of temporal filters can be carried out in the spectral domain. On the other hand, the common spatio-temporal pattern (CSTP) [6] and the bilinear common spatial pattern (BCSP) [7] show that temporal filters can be naturally obtained by modifying the objective function of CSP. Another interesting variant of CSP, namely analytic common spatial patterns (ACSP), incorporates the Hilbert transform into CSP to extract complex-valued filters which already contain temporal information [13,14].
In this paper, a method named common complex-spatiospectral pattern (CCSSP) is proposed to combine the benefits of both spatial and temporal filtering. It acquires a number of filters that are data-driven and tailored to each EEG channel. These filters are the mathematical synergy of spatial filters and high-order finite impulse response (FIR) filters, whose flexibility, if well regularized, enables a better dissociation between two distinct mental conditions and consequently better classification performance.

FIR Filter
Suppose $x_{i,j}$ is the $j$th sample in the $i$th channel of the multivariate EEG matrix $X$. If an FIR filter of order $(2k+1)$ is applied, a filtered sample will be

$$\tilde{x}_{i,j} = \sum_{t=-k}^{k} a_{i,t}\, x_{i,j-t} \qquad (1)$$

where $a_{i,t}$ are the FIR filter coefficients that are specific to the $i$th channel and $t \in [-k, k]$. (1) can be further rewritten in matrix format as follows:

$$\tilde{X} = \sum_{t=-k}^{k} L_t X_t \qquad (2)$$

where $X_t$ is the $t$ time-point delayed version of $X$, and $L_t$ is a diagonal matrix with the $i$th diagonal element being $a_{i,t}$.
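The channel-specific filtering described above can be illustrated with a minimal NumPy sketch. It assumes zero padding at the trial edges and a coefficient array `A` whose column `t + k` holds the coefficients for delay `t`; the function names `delayed` and `channel_fir` are hypothetical and not taken from the original work.

```python
import numpy as np

def delayed(X, t):
    """Return X delayed by t samples (zero-padded): out[:, j] = X[:, j - t]."""
    out = np.zeros_like(X, dtype=float)
    n = X.shape[1]
    if t > 0:
        out[:, t:] = X[:, :n - t]
    elif t < 0:
        out[:, :n + t] = X[:, -t:]
    else:
        out[:] = X
    return out

def channel_fir(X, A, k):
    """Apply a separate (2k+1)-tap FIR filter to each channel.

    X: (channels, samples) EEG matrix.
    A: (channels, 2k+1) coefficients; A[i, t + k] is the tap of channel i at delay t.
    Computes the sum over t of L_t @ X_t, with L_t = diag(A[:, t + k]).
    """
    Y = np.zeros_like(X, dtype=float)
    for t in range(-k, k + 1):
        L_t = np.diag(A[:, t + k])      # diagonal matrix of per-channel taps
        Y += L_t @ delayed(X, t)        # weight the delayed copy channel by channel
    return Y

# Illustrative random data (not from the paper)
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))
A = rng.standard_normal((3, 5))         # k = 2, so 5 taps per channel
Y = channel_fir(X, A, k=2)
print(Y.shape)
```

Under these assumptions, each row of `Y` equals the ordinary convolution of the corresponding row of `X` with that channel's taps, which is what makes the matrix form a compact way of writing per-channel filtering.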

Filtering
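CSP aims to maximize the difference between signals of two conditions after spatial filtering. Defining $w$ as the desired spatial filter, the objective function of CSP in this circumstance can be stated as

$$\max_{w} \frac{w' \tilde{X}_1 \tilde{X}_1' w}{w' \tilde{X}_2 \tilde{X}_2' w} \qquad (3)$$

where $(\cdot)'$ stands for the transpose operator and $\tilde{X}_c$ is the EEG matrix in condition $c$ after FIR filtering. Here multiplying (2) by $w'$ will yield

$$w' \tilde{X} = \sum_{t=-k}^{k} w' L_t X_t = \sum_{t=-k}^{k} w_t' X_t \qquad (4)$$

where $w_t = L_t w$. (4) can be further reorganized as

$$w' \tilde{X} = \hat{w}' \hat{X}, \quad \hat{w} = [w_{-k}', \ldots, w_{k}']', \quad \hat{X} = [X_{-k}', \ldots, X_{k}']' \qquad (5)$$

Inserting (5) into the objective function (3) gives

$$\max_{\hat{w}} \frac{\hat{w}' \hat{X}_1 \hat{X}_1' \hat{w}}{\hat{w}' \hat{X}_2 \hat{X}_2' \hat{w}} \qquad (6)$$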
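The reorganization above says that FIR filtering each channel and then projecting spatially is the same as applying one long filter to a stack of time-delayed copies of the data. The following sketch checks this numerically on random data. It assumes zero padding at the edges, and `delay_embed` is a hypothetical helper name.

```python
import numpy as np

def delay_embed(X, k):
    """Stack the delayed copies X_{-k}, ..., X_{k} row-wise (zero-padded edges).

    X: (channels, samples). Returns an array of shape (channels * (2k+1), samples).
    """
    n = X.shape[1]
    P = np.pad(X, ((0, 0), (k, k)))     # k zeros on each side of the time axis
    # X_t[:, j] = X[:, j - t] corresponds to P[:, j - t + k]
    return np.vstack([P[:, k - t:k - t + n] for t in range(-k, k + 1)])

rng = np.random.default_rng(0)
c, n, k = 4, 200, 2
X = rng.standard_normal((c, n))
A = rng.standard_normal((c, 2 * k + 1))  # hypothetical FIR taps, column t + k is delay t
w = rng.standard_normal(c)               # hypothetical spatial filter

X_hat = delay_embed(X, k)
# Stacked filter: each block is w_t = L_t w, i.e. the taps times w elementwise
w_hat = np.concatenate([A[:, t + k] * w for t in range(-k, k + 1)])

# Left-hand side: FIR filter every channel, then apply the spatial filter
X_tilde = sum(np.diag(A[:, t + k]) @ X_hat[(t + k) * c:(t + k + 1) * c]
              for t in range(-k, k + 1))
lhs = w @ X_tilde
rhs = w_hat @ X_hat
print(np.max(np.abs(lhs - rhs)))         # difference should be at rounding level
```

In the actual method the stacked filter is optimized directly, so its blocks are not constrained to come from a pre-specified spatial filter and tap set. The sketch only verifies the algebraic identity.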

Singular Problem
The optimization of $\hat{w}$ in (6) is actually equivalent to solving a generalized eigenvalue problem, just like conventional CSP [7,15]:

$$\hat{X}_1 \hat{X}_1' \hat{w} = \lambda\, \hat{X}_2 \hat{X}_2' \hat{w} \qquad (7)$$

However, in (7), $\hat{X}$ is a matrix generated by concatenating several $X_t$. Thus it becomes practically possible that the number of 'channels' will exceed the number of samples in each channel. Such an imbalance could cause $\hat{X}_c \hat{X}_c'$ to be singular. The singularity, especially in cases where training data are very limited, can induce a biased outcome. Hence, in order to enforce a more trustworthy result, (7) shall be regularized:

$$\left(\hat{X}_1 \hat{X}_1' + \mathcal{R}_1(\cdot)\right) \hat{w} = \lambda \left(\hat{X}_2 \hat{X}_2' + \mathcal{R}_2(\cdot)\right) \hat{w} \qquad (8)$$

where $\mathcal{R}_1(\cdot)$ and $\mathcal{R}_2(\cdot)$ are regularization terms.

Regularization
It is known that conventional CSP is sensitive to both noise and overfitting [1,16]. These disadvantages can be addressed by introducing regularization terms into CSP, which has been well discussed in [17]. In this work, an efficient and effective strategy is employed to realize regularization:

$$\mathcal{R}_1(\cdot) = \mathcal{R}_2(\cdot) = \alpha I \qquad (9)$$

where $\alpha$ is a relatively small scalar and $I$ is an identity matrix. (9) will assure the uniqueness of $\hat{w}$.
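As a rough illustration of the regularized problem, the sketch below adds a small multiple of the identity to both scatter matrices and solves the resulting generalized eigenvalue problem with NumPy alone, by whitening with a Cholesky factor. The function name `regularized_csp`, the solver route, and the absence of any data scaling before regularization are assumptions made for illustration.

```python
import numpy as np

def regularized_csp(X1, X2, alpha=1e-5):
    """Solve (S1 + alpha*I) w = lam * (S2 + alpha*I) w, with S_c = X_c @ X_c.T.

    X1, X2: (rows, samples) delay-embedded data of the two conditions.
    Returns eigenvalues in ascending order and the filters as columns of W.
    """
    d = X1.shape[0]
    S1 = X1 @ X1.T + alpha * np.eye(d)
    S2 = X2 @ X2.T + alpha * np.eye(d)
    # With S2 = L L', substituting v = L' w gives a standard symmetric problem
    L = np.linalg.cholesky(S2)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S1 @ L_inv.T
    M = (M + M.T) / 2                    # remove rounding asymmetry
    lam, V = np.linalg.eigh(M)
    W = L_inv.T @ V                      # map back to the original coordinates
    return lam, W

# Illustrative case with more rows than samples: 4 channels, k = 5, 20 samples
rng = np.random.default_rng(0)
X1 = rng.standard_normal((44, 20))
X2 = rng.standard_normal((44, 20))
lam, W = regularized_csp(X1, X2)
print(W.shape, bool(np.all(np.isfinite(lam))))
```

In this example the unregularized scatter matrices have rank at most 20, so they are singular. The identity term makes them positive definite, which is what allows the Cholesky factorization to succeed.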

Experimental Setup
Two types of datasets were collected for the assessment of the proposed method. One was taken from publicly available BCI competition datasets for motor imagery classification, and the other was obtained from self-conducted ERP detection experiments in the scenario of rapid serial visual presentation (RSVP) [6,18,19].

MI Datasets
Three publicly available datasets, i.e. data set IVa and data set IIIa from BCI Competition III [20], and data set IIa from BCI Competition IV (http://bbci.de/competition/iv/), recorded subjects' EEG signals while they imagined limb movements. Each dataset contained training sets and testing sets. A brief summary of the experimental paradigms pertaining to this work is presented in Table 1, whilst full details of these three datasets are available in the literature [17][21][22][23].

RSVP Experiments
The experiments, approved by the National University of Singapore Institutional Review Board (NUS-IRB), consisted of training sessions and testing sessions. In each session, a sequence of small images (400×400 pixels) was presented to the subject, who was instructed to press a button immediately when images of interest (targets) appeared. In this work, the targets were images containing target objects, and all others were regarded as distractors. There were 41 targets and around 4000 distractors in each session. After providing written consent using forms approved by NUS-IRB, 4 subjects participated in the experiments. Scalp EEG signals were collected at 250 Hz, using a 62-channel ANT amplifier (ANT B.B., Enschede, Netherlands), referenced to linked ears and grounded to the forehead. Figure 1 illustrates the experimental paradigm.

Preprocessing
For all three MI datasets, the preprocessing procedure followed the work in [17]. That is, each trial, extracted from the time segment from 0.5 s to 2.5 s after the cue, was band-pass filtered to 8-30 Hz by a fifth-order Butterworth filter. In RSVP experiments, with no analog filter applied, the EEG signals were first high-pass filtered twice and then low-pass filtered using the 'eegfilt' function from EEGLAB [24], with cut-off frequencies of 1 Hz and 25 Hz, respectively. The filtered signals were segmented into an event-locked window from the onset of each image to 500 ms after the onset.

Evaluation
Whether the proposed coupling of high-order channel-specific temporal filters with spatial filters yields better single-trial classification capability than conventional CSP, which relies solely on spatial filters, can be verified from the overall performance on MI datasets and RSVP experiments. Additionally, results of competing methods, i.e. CSSSP, BCSP, and ACSP, are also reported for comparison.

Feature Extraction
Like conventional CSP, the features extracted by each method are the log-variances of the filtered signals. Only filters associated with the largest or smallest eigenvalues are used for extracting discriminative features. In the scenario of MI datasets, 3 pairs of features corresponding to the most discriminative filters were used [3,17]. For RSVP experiments, 2 pairs of features were extracted [5,7].
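The feature extraction step can be sketched as follows. The sketch assumes that eigenvalues and filters come from a generalized eigendecomposition with filters stored as columns, and that no variance normalization is applied before the logarithm. The helper names `select_filters` and `log_variance_features` are hypothetical.

```python
import numpy as np

def select_filters(lam, W, n_pairs):
    """Keep the filters with the n_pairs smallest and n_pairs largest eigenvalues."""
    order = np.argsort(lam)
    idx = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return W[:, idx]

def log_variance_features(trial, W_sel):
    """Project one trial (rows, samples) and return the log-variance of each output."""
    Z = W_sel.T @ trial
    return np.log(np.var(Z, axis=1))

# Illustrative use: 3 pairs for MI data gives a 6-dimensional feature vector
rng = np.random.default_rng(0)
lam = rng.random(22)
W = rng.standard_normal((22, 22))
trial = rng.standard_normal((22, 500))
features = log_variance_features(trial, select_filters(lam, W, n_pairs=3))
print(features.shape)
```

With 2 pairs, as used for the RSVP experiments, the same call would return a 4-dimensional feature vector.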

Classifier
The classifier adopted was the weighted support vector machine (WSVM) based on LIBSVM [25][26]. WSVM imposes higher penalties on the misclassification of the minority class [27]. Hence it is less vulnerable to the unbalanced classification problem, e.g. in RSVP experiments, where distractors far outnumbered targets. For a similar reason, the balanced accuracy (BA) was chosen as the performance measure for RSVP experiments in this work [6]. On the other hand, the performance measure for MI datasets followed the work in [17].
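To see why balanced accuracy suits this setting, consider a minimal sketch that assumes the usual definition of BA as the mean of the true-positive and true-negative rates. The function name is hypothetical, and the example uses the session sizes reported above (41 targets, about 4000 distractors) with a degenerate classifier that labels everything as a distractor.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of the true-positive rate and the true-negative rate.

    Assumes both classes occur at least once in y_true.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

# 41 targets (label 1) and 4000 distractors (label 0), all predicted as distractors
y_true = [1] * 41 + [0] * 4000
y_pred = [0] * 4041
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

Plain accuracy rewards this useless classifier with a score close to 1, whereas balanced accuracy stays at chance level.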

Parameter Selection
As indicated in Section II, two parameters are left undetermined, i.e. the order of the FIR filter, $2k+1$, and the regularization scalar $\alpha$. In this work, their values were chosen in two ways: 1) $k$ and $\alpha$ were fixed (5 and $10^{-5}$, respectively) and applied to all data sets; 2) $k$ and $\alpha$ were automatically selected among [0, 1, 3, 5] and [$10^{-4}$, $10^{-5}$, $10^{-6}$], respectively, using a 5-fold cross-validation procedure.

Results
In Table 2, CCSSP with and without automatic parameter selection is denoted by Pcv and Pfix, respectively. It can be seen that both Pcv and Pfix outperformed conventional CSP. Specifically, Pcv and Pfix achieved 2.7% and 3.7% higher average accuracy, respectively. Their advantage was larger in RSVP experiments, where the achieved accuracies were 7.9% and 8.1% higher than CSP, respectively. Among 21 subjects, Pcv and Pfix had better performance than CSP in 15 subjects. Moreover, a paired t-test showed that the improvement of Pcv over CSP was marginally significant (p-value = 0.06), and the improvement offered by Pfix was statistically significant (p-value < 0.001). Additionally, the proposed method surpassed the other methods, i.e. CSSSP, BCSP, and ACSP, according to Table 2, and this difference was also statistically significant (p-value < 0.001). Moreover, the general performances of CSSSP, BCSP, and ACSP were worse than that of CSP. However, this phenomenon was absent in RSVP experiments, where BCSP obtained 5.8% higher average accuracy in comparison to CSP, and ACSP and CSSSP performed slightly better than CSP.

Discussion
Intuitively, the difference between CSP and the proposed method lies in the types of filters that are extracted. CSP filters are purely spatial filters, whilst the proposed CCSSP exploits more complicated spatial-FIR filters. It is worth noting that a spatial-FIR filter is essentially a combination of several spatial filters, each of which acts on its corresponding time-delayed EEG data. Therefore, a spatial-FIR filter can be split up into a set of spatial filters. For instance, given k = 5 in this work, 11 spatial filters could be derived, some of which are shown in Figure 2.
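The split into per-delay spatial filters amounts to a reshape. The sketch below assumes the stacked filter lists the blocks for delays -k through k in that order, each with one weight per channel. The function name `split_spatial_fir` is hypothetical, and the channel count of 62 is taken from the RSVP recordings.

```python
import numpy as np

def split_spatial_fir(w_hat, n_channels, k):
    """Split a stacked spatial-FIR filter into its 2k+1 spatial filters.

    Assumes w_hat stacks the blocks for delays -k, ..., k in that order,
    each of length n_channels. Returns a dict mapping delay t to its filter.
    """
    parts = np.asarray(w_hat).reshape(2 * k + 1, n_channels)
    return {t: parts[t + k] for t in range(-k, k + 1)}

# Hypothetical stacked filter for 62 channels and k = 5
w_hat = np.arange(62 * 11, dtype=float)
filters = split_spatial_fir(w_hat, n_channels=62, k=5)
print(len(filters), filters[0].shape)
```

Each entry of `filters` can then be drawn as a scalp map for its delay, in the manner of the topographies discussed for Figure 2.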
For Subject aa in Figure 2, the filters in the first row, whether CSP or CCSSP, all imposed heavy weights on the left somatosensory area, which is in accordance with the fact that the characteristic EEG signals are contralateral to the imagined hand movement [28]. On the other hand, for imagined right foot movement, the characteristic area located in the central region between the left and right hemispheres [28] seemed to be overlooked by the CSP filter, yet it was observable in the filters at t = -5, -3, 1, 5. This indicates that CCSSP could uncover distinctive spatial distributions which might have been obscured by CSP. This is accomplished by exploring the temporal information of individual channels in addition to the overall spatial projection. Furthermore, Subject R3 showed a much more prominent contrast between CSP filters and the spatial-FIR filters in Figure 2. Typically in an RSVP experiment, the main component of the ERP elicited by a target is the P300, which emerges and propagates across the scalp from frontal to parietal regions, and has strong correlation in neighboring regions [5,29]. Hence, it is interesting to see that the CCSSP filters contained large patches of heavy weights in the parietal area as well as other regions, while the CSP filters appeared more moderate and did not show discriminative regions.
Besides being perceived as a mixture of spatial filters, a spatial-FIR filter can be interpreted as a pool of channel-specific FIR filters. That these FIR filters are channel-specific can be seen in Figure 2, where the spatial weighting changes differently over the time delay t at different channels. This is because the FIR filter coefficients defined in (1) are not uniform but are assigned specifically to each channel. Mathematically, the selection of a channel-specific FIR filter is determined by whether this particular FIR filter can help extremize the objective function (6). Its corresponding physical meaning in this study is that such an FIR filter makes the characteristic frequency band of that channel more prominent than other bands. Figure 3 shows the frequency responses of FIR filters in Channel CP3 for Subject aa and Channel CP1 for Subject R3. Specifically, the frequency response in Channel CP3 (see Figure 3A) indicates that the brain oscillations of the μ rhythm (8-12 Hz), which corresponds to the imagined movements, reside within the pass-band of FIR filter 1. Unlike FIR filter 1, FIR filter 2 relatively suppressed the μ rhythm in CP3. This difference is understandable, as the FIR filters were synthesized for the purpose of further differentiating two conditions, e.g. right hand vs. right foot, where the characteristic areas are distinct from each other. It is worth noting that slow, low-frequency ERPs such as the P300 are the signature in RSVP experiments. Although the magnitude appeared small in Figure 3B, FIR filter 1 functioned like a low-pass filter in the window of 0 Hz to 20 Hz. On the other hand, FIR filter 2 relatively suppressed low-frequency signals and placed more emphasis on a higher frequency band (from 10 Hz to 20 Hz).
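A frequency response of the kind discussed here can be computed from the FIR taps of one channel with a zero-padded FFT. In the sketch below, the taps are a hypothetical 11-point moving average (k = 5), not coefficients from the paper, and the sampling rate of 250 Hz is that of the RSVP recordings. For a learned spatial-FIR filter, the taps of channel i would be the i-th entries of the per-delay spatial filters, which equal the FIR coefficients up to the scale of that channel's spatial weight.

```python
import numpy as np

def fir_magnitude_response(taps, fs, n_fft=512):
    """Return (frequencies in Hz, magnitude of H(f)) for FIR taps sampled at fs Hz."""
    H = np.fft.rfft(taps, n=n_fft)               # zero-padded FFT of the taps
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)   # frequency axis from 0 to fs / 2
    return freqs, np.abs(H)

# Hypothetical taps: 11-point moving average, a simple low-pass filter
taps = np.ones(11) / 11
freqs, mag = fir_magnitude_response(taps, fs=250)
near_50 = int(np.argmin(np.abs(freqs - 50.0)))
print(mag[0], mag[near_50])                      # gain at 0 Hz and near 50 Hz
```

Indexing the taps from -k to k instead of 0 to 2k only changes the phase of the response, so the magnitude curve is unaffected by that choice.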
The effect of the FIR filters in Figure 3B becomes clearer in Figure 4, where the relative signal powers before and after filtering in CP1 are presented. As demonstrated in Figure 4, the low-frequency components of both the target ERP and the distractor ERP were strengthened after FIR filter 1 in Figure 3B was applied, while the signal powers above approximately 5 Hz were significantly attenuated. This observation is consistent with FIR filter 1 in Figure 3B serving as a low-pass filter.
Furthermore, the waveform of the filtered signals could shed some light on the impact of the filters on the discrimination between targets and distractors. Figure 5A and Figure 5B depict the EEG signals after being filtered by the CSP filter and the proposed spatial-FIR filter, respectively. The general waveforms in both cases were similar. That is, the target ERP diverged from the distractor ERP in terms of signal power. It is noteworthy, however, that the target ERP in Figure 5B had a sharper and higher peak than that in Figure 5A. In addition, the recurring ripples of the distractor ERP in Figure 5A were cleaned up in Figure 5B, while the target ERP in Figure 5B fluctuated less than its counterpart in Figure 5A. Such improved smoothness can be attributed to filters like FIR filter 1 in Figure 3B, which narrowed down the frequency band. Mathematically, the sharpened peak and the attenuated distractor ERP in Figure 5B indicated a larger difference in variances, and consequently a better separation between the two conditions. Given these observations and the classification performance, it could be stated that unlike CSP, which tries to find stationary filters, CCSSP searches for a set of filters originating from different time delays with different weighting. This voting process could grant CCSSP the desirable flexibility and dynamics, which in turn yielded better performance than conventional CSP.
It is necessary to point out that Pcv generally underperformed Pfix in Table 2. To a great extent, this was because there were insufficient training samples relative to the number of channels of $\hat{X}$ in (5). If high-density electrodes are adopted and/or a high-order FIR filter is used, a sizable number of training samples is required for model selection or parameter tuning. This is a notable drawback of the proposed method. In RSVP experiments, since there were many more training data, reliable parameters could be selected, and the performance of Pcv was comparable to that of Pfix. Another drawback of Pcv is its computational burden. The matrix size of $\hat{X}$ grows linearly with the time delay k, so identifying a suitable regularization parameter and k takes considerable time.
Among the other competing methods listed in Table 2, CSSSP is the one which also makes use of an FIR filter to explore the temporal information. It optimizes a single FIR filter, which is applied to the entire multivariate EEG signal and thus differs little from the filters in the ordinary preprocessing step. Hence, CSSSP might improve the performance (e.g. in RSVP experiments), but the improvement could be constrained and counteracted by the necessity of careful regularization if there is a lack of training samples (e.g. IVa, IIIa and IIa). Compared to CSSSP, BCSP performed much better in RSVP experiments, coming very close to CCSSP. However, BCSP did not perform well on MI datasets. The reason might be that BCSP is suitable for ERP detection, since the time course of an ERP is well defined and can be modeled by the common temporal patterns of BCSP. However, the characteristic signals of MI datasets are oscillatory rhythms, for which FIR filters appeared to be preferable. Similarly, ACSP, which has evident strength in applications where phase relationships of data are critical, was found to be not very effective on MI datasets. However, its average accuracy in RSVP experiments was slightly better than that of CSP. This might be partially attributed to the stronger phase relationship of ERPs in comparison with that of oscillatory rhythms.

Conclusion
In this study, CCSSP has been introduced to the CSP family. It naturally integrates and optimizes complex, specially tailored FIR filters together with spatial filters for desirable separation of two distinct conditions. The merits of such a data-driven pass-band selection for individual channels, as a supplement to broadband CSP filtering, have been validated on datasets with different characteristic EEG signals. The quantitative and qualitative comparisons suggest superior discriminating capability of the proposed method over conventional CSP, e.g. 8.1% higher average accuracy in RSVP experiments.