Channel Selection Based on Phase Measurement in P300-Based Brain-Computer Interface

Most EEG-based brain-computer interface (BCI) paradigms prescribe specific electrode positions. As the structures and activities of the brain vary with each individual, the contributing channels should instead be chosen from each user's own BCI recordings. Phase measurement is an important approach in EEG analysis, but is seldom used for channel selection. In this paper, a phase locking and concentrating value-based recursive feature elimination approach (PLCV-RFE) is proposed to produce robust EEG channel selection in a P300 speller. The PLCV-RFE, which derives from the phase resetting mechanism, measures the phase relation between EEG channels and ranks the channels by a recursive strategy. Data recorded from 32 electrodes on 9 subjects are used to evaluate the proposed method. The results show that the PLCV-RFE substantially reduces the channel set and significantly improves recognition accuracy. Moreover, the PLCV-RFE achieves better performance than two other state-of-the-art feature selection methods (SSNRSF and SVM-RFE). Thus phase measurement is applicable to channel selection in BCIs, and this may provide indirect evidence that phase resetting is at least one cause of ERP generation.


Introduction
Brain-Computer Interfaces (BCIs) are communication systems that allow people to send information to a computer, or commands to other electronic devices, solely by measuring brain activity, without any body movement. Such systems can be the sole means of communication for people who suffer from severe neuromuscular diseases and are incapable of any motor function but are cognitively intact [1], [2]. To date, among non-invasive functional brain monitoring methods, electroencephalography (EEG) is the preferable solution in most circumstances, offering a high time resolution as well as simple and affordable recording requirements [3].
The P300 speller is one of the most popular EEG-based BCI paradigms and has many clinical applications [4][5][6]. As described by Farwell and Donchin, a P300 speller presents a character matrix on a computer display in front of the BCI user. Each cell of the matrix contains a character, and the rows and columns are intensified one at a time in random order. To spell with the BCI, users pay attention to the character they wish to communicate [7]. Since the occurrence rate of the row (column) containing the attended character is below 20%, intensifications of this row (column) act as target stimuli and elicit P300 responses. The BCI system identifies these P300 potentials and translates the user's attention into a character output (i.e., the intersection of the target row and the target column). The P300 response is an intrinsic mechanism of the human brain, which allows the P300 speller to be used without BCI user training [8].
The performance of the P300 speller depends greatly on the quality and amount of the information acquired from EEG records [9]. A major current challenge for the P300 speller is optimizing the number of electrodes for each user [9], [10]. A reduced number of electrodes takes less time to install, is more user-friendly, reduces the expense of BCI equipment and consumes less power, which may in turn support wireless EEG caps [10], [11]. Previous works on this subject rely mostly on neuroscientific evidence. Following neurophysiological suggestions, early research focused on the standard locations (i.e., Fz, Cz, Pz) [8], [12], [13]. Some offline studies suggest that the use of additional locations, particularly posterior sites, may improve classification accuracy [14][15][16][17][18], and six- or eight-electrode configurations have been proposed that provide satisfactory classification when used appropriately [19][20][21][22]. However, P300 signals are subject-specific [19], and the optimal EEG recording locations for P300 identification may vary in practice. It is possible that a different montage would be required for patients with various neuromuscular pathologies. Accordingly, an adaptive channel optimization method is necessary in practical applications to identify an individual montage [23].
The classical greedy strategy known as "backward elimination" has been widely recommended and used in recent BCI channel selection studies [9], [10], [23], [24]. In general, it starts with a full set of electrodes according to the 10-20 system (covering all areas of the head) and reduces the number of required EEG channels while keeping the classification accuracy optimal [9]. A robust feature space should contain more discriminative but less redundant information. To this end, features that contribute nothing to accuracy should be removed. One approach relies on purely mathematical evidence, the so-called dependent criteria, i.e., constructing a series of feature combinations and selecting the one with the highest classification rate. Some methods of this kind, such as genetic algorithms or SVM-RFE, have been used successfully for channel optimization in BCIs [23][24][25]. However, they are difficult to use in P300 spellers because of their high computational complexity. Another approach, called independent criteria, is computationally simple. It directly evaluates all features, and then removes the features identified as "worthless". Algorithms of this kind are popular in many fields, but rarely used in EEG feature selection. The comprehensive work on this subject by Cecotti et al. [10] indicates that, for channel selection by backward elimination in P300 spellers, a cost function based on the signal to signal-plus-noise ratio (SSNR) is better than one based on classification accuracy. Moreover, pre-processing with a spatial filter (SF) based on the xDAWN algorithm [26], [27] helps considerably in selecting the optimal channels [10]. Thus it is promising to evaluate the importance of EEG features for classification by using a global measure of the EEG signals [10].
Up to now, two distinct mechanisms have been proposed for the generation of averaged ERP responses. The evoked model, which is the basis of SSNRSF (i.e., SSNR plus SF), holds that the ERP response results from neuronal activity with fixed polarity and fixed latency superimposed on the background electroencephalographic oscillations [28][29][30][31], while the oscillatory model holds that it is generated by a partial phase synchronization of the ongoing EEG [32][33][34][35][36][37]. The debate between the two models has existed for a long time, since neither of them can explain the evoked potential exclusively. However, some recent literature suggests that the event-related potential is at least influenced by oscillatory brain activity [38][39][40]. Phase relations reflect the cooperative interactions between anatomically disparate neural populations [41], [42]. Such cooperative brain processes, which are detectable at various spatial scales, are thought to be fundamental to the dynamic organization of sensory and cognitive brain functions [43], [44]. In ERP studies, phase-based measurement provides robust and sensitive monitoring of task-related fluctuations [39]. Moreover, recent studies imply that it is possible to evaluate how EEG features contribute to classification by using phase-related measurement. On the one hand, EEG records from the same or connected functional areas should show more phase synchronization than those from different or unconnected functional areas [43]. Such phase synchronization between two specific channels would also differ between the target mental task and the non-target one [41]. On the other hand, task-relevant cortical areas contribute greatly to a mental task, and the corresponding channels should play an important role in classification. Therefore, channel selection through phase measurement is a promising approach in the P300 speller, but it has not been reported before.
In this study, different from additive evoked model based methods, the PLCV-RFE as a phase measurement based method is developed and tested to verify its effectiveness. By measuring phase relationship between EEG channels, the PLCV-RFE separates channels into diverse clusters, and then ranks them to ensure that the first n channels reflect as many important sources as possible. This paper is organized as follows. Section 2 addresses the methodology including the BCI experiment and the PLCV-RFE algorithm. Section 3 presents the results and discussion.

Ethics Statement
This study was approved by the Institution Research Ethics Board at Tianjin University. All subjects gave written informed consent after the nature and possible consequences of the study were explained.

Experiment
The experimental paradigm followed Farwell and Donchin [7]. During the experiment, the participant sat in front of a computer monitor and viewed a 6 by 6 letter matrix (Figure 1). The task was to pay attention to a specified letter in the matrix and silently count the number of times the target character was intensified, until a new character was specified for the next selection. The character currently specified for selection was presented at the top left of the screen. At the beginning of each character block the matrix was blank for 2.5 s. Then, the rows and columns were intensified for 100 ms each, with 75 ms blank between intensifications. The matrix flashing presented 12 different stimuli to the user, two of which contained the target character (one particular row and one particular column). A complete cycle of six row and six column intensifications constituted an epoch, and 15 epochs constituted a character block. 80 character blocks were conducted in the experiment for each subject. Thus there were 2,400 target trials and 12,000 non-target trials for each subject.
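The trial counts above follow directly from the stimulus design. The short Python sketch below only reproduces that arithmetic; the variable names are illustrative, and the epoch duration assumes that a 75 ms blank follows every intensification, including the last one.

```python
# Stimulus bookkeeping for the 6 x 6 speller described above.
ROWS, COLS = 6, 6
EPOCHS_PER_BLOCK = 15
CHARACTER_BLOCKS = 80
FLASH_MS, BLANK_MS = 100, 75

stimuli_per_epoch = ROWS + COLS      # 12 intensifications per epoch
targets_per_epoch = 2                # one target row and one target column
epochs_total = EPOCHS_PER_BLOCK * CHARACTER_BLOCKS

target_trials = epochs_total * targets_per_epoch
nontarget_trials = epochs_total * (stimuli_per_epoch - targets_per_epoch)
target_probability = targets_per_epoch / stimuli_per_epoch
# Assumption: each flash is followed by one blank interval.
epoch_duration_ms = stimuli_per_epoch * (FLASH_MS + BLANK_MS)

print(target_trials, nontarget_trials)   # 2400 12000 (per subject)
print(round(target_probability, 3))      # 0.167, i.e. below 20%
print(epoch_duration_ms)                 # 2100
```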
Nine right-handed healthy subjects (23-25 years of age; 2 females) participated in the study. None of the subjects had prior experience with a P300-based BCI system. The EEG signal was recorded using a Neuroscan Synamps2 system with an EEG cap whose electrodes were located according to the 10-20 system (Figure 1). All channels were referenced to the central lobe and grounded at the prefrontal lobe, and then re-referenced to the bilateral mastoids. EEG signals were bandpass filtered at 0.1-200 Hz, digitized at a rate of 1,000 Hz and stored. In the pre-processing, EEG signals were first filtered to 0.1-40 Hz and downsampled to 200 Hz for phase measurement, and then downsampled to 20 Hz for classification. The 700 ms of EEG following stimulus onset in the 32 channels was defined as the stimulus response and extracted. The first 40 character blocks were used for training, while the remaining 40 formed the test session.
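The pre-processing chain can be sketched in Python with SciPy as follows. This is only an illustration under stated assumptions: the text does not specify the filter design, so the zero-phase 4th-order Butterworth band-pass, the plain subsampling to 200 Hz and the use of `scipy.signal.decimate` for the 20 Hz stream are all assumed choices, and the function names `preprocess` and `epoch` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

FS_RAW = 1000    # Hz, recording rate stated in the text
FS_PHASE = 200   # Hz, rate used for phase measurement
FS_CLF = 20      # Hz, rate used for classification
EPOCH_S = 0.7    # 700 ms post-stimulus window

def preprocess(raw):
    """raw: (n_channels, n_samples) array sampled at FS_RAW."""
    # 0.1-40 Hz zero-phase band-pass; the Butterworth design is an assumption.
    sos = butter(4, [0.1, 40.0], btype="bandpass", fs=FS_RAW, output="sos")
    filtered = sosfiltfilt(sos, raw, axis=-1)
    # 1000 -> 200 Hz by subsampling; content above 40 Hz is already attenuated.
    x_phase = filtered[:, :: FS_RAW // FS_PHASE]
    # 200 -> 20 Hz; decimate() applies its own anti-aliasing filter first.
    x_clf = decimate(x_phase, FS_PHASE // FS_CLF, axis=-1)
    return x_phase, x_clf

def epoch(x, onset_s, fs):
    """Cut the 700 ms stimulus response that starts at onset_s seconds."""
    start = int(round(onset_s * fs))
    return x[:, start : start + int(round(EPOCH_S * fs))]

# Synthetic 5 s, 32-channel recording, used only to show the resulting shapes.
rng = np.random.default_rng(0)
raw = rng.standard_normal((32, 5 * FS_RAW))
x_phase, x_clf = preprocess(raw)
ep_phase = epoch(x_phase, onset_s=1.0, fs=FS_PHASE)   # 140 samples per channel
ep_clf = epoch(x_clf, onset_s=1.0, fs=FS_CLF)         # 14 samples per channel
```

With these rates, each stimulus response holds 140 samples per channel for phase measurement and 14 samples per channel for classification.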

Phase Locking and Concentrating Estimation
Phase locking values (PLVs) may be an effective approach to measure the variability of phase difference between two EEG signals [45]. However, neglecting the initial phase value makes it impossible to measure the degree of phase concentration which induces the ERP from the view of the oscillatory model. In this paper, we propose a novel method to estimate the phase locking and concentrating value (PLCV) of EEG signals.
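The instantaneous phase can be obtained from the analytic signal. For a causal signal $s(t)$, the analytic signal $z(t)$ is a complex function defined by (1) and (2), where $\hat{s}(t)$ is the Hilbert transform of $s(t)$:
$$z(t) = s(t) + j\,\hat{s}(t) \tag{1}$$
$$z(t) = A(t)\,e^{j\varphi(t)} \tag{2}$$
where $j$ is the imaginary unit, $A(t)$ is the amplitude, $\varphi(t)$ is the instantaneous phase and $e$ is the base of the natural exponential.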
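As an illustration of the analytic-signal construction, the sketch below extracts the instantaneous phase with `scipy.signal.hilbert`, which returns the analytic signal itself (the input plus j times its Hilbert transform). The helper name `instantaneous_phase` and the 10 Hz test tone are assumptions made for the example only.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_phase(s, axis=-1):
    """Phase (radians, wrapped to (-pi, pi]) of the analytic signal of s."""
    z = hilbert(s, axis=axis)   # z(t) = s(t) + j * Hilbert{s}(t)
    return np.angle(z)

fs = 200                         # Hz, the rate used for phase measurement
t = np.arange(200) / fs          # 1 s window
s = np.cos(2 * np.pi * 10 * t)   # 10 Hz tone with an integer number of cycles
phase = instantaneous_phase(s)
```

For this tone the recovered phase equals 2π·10·t up to wrapping, because the window contains a whole number of cycles; for real EEG the phase is usually computed after narrow band filtering.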
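If x and y denote the xth and the yth EEG channels, the phase locking value (PLV) is defined as follows [45]:
$$\mathrm{PLV}^{(t)}_{x,y} = \frac{1}{n}\left|\sum_{k=1}^{n} \exp\!\left(j\,\Delta\varphi^{(t)}_{x,y}(k)\right)\right|, \qquad \Delta\varphi^{(t)}_{x,y}(k) = \varphi^{(t)}_{x}(k) - \varphi^{(t)}_{y}(k) \tag{3}$$
where $\varphi^{(t)}_{x}$ and $\varphi^{(t)}_{y}$ represent the instantaneous phases of the EEG signals in the xth and the yth channels, and k indexes the trials. The PLV is an average over n trials and ranges from 0 to 1. If channel x and channel y are more (less) likely to be homologous, $\Delta\varphi^{(t)}_{x,y}$ will show less (more) variability, which yields a high (low) PLV.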
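A minimal NumPy sketch of the PLV follows, assuming the instantaneous phases of each channel are stored as arrays of shape (n_trials, n_times). The function name `plv` and the synthetic phases are illustrative only.

```python
import numpy as np

def plv(phase_x, phase_y):
    """PLV per time point; phase_*: (n_trials, n_times) phases in radians."""
    delta = phase_x - phase_y                       # phase difference per trial
    return np.abs(np.exp(1j * delta).mean(axis=0))  # length of the mean vector

rng = np.random.default_rng(0)
n_trials, n_times = 500, 140
phi = rng.uniform(-np.pi, np.pi, (n_trials, n_times))

# Constant phase lag between the two channels: PLV is 1 at every time point.
locked = plv(phi, phi - 0.8)
# Independent random phases: PLV stays close to 0.
unrelated = plv(phi, rng.uniform(-np.pi, np.pi, (n_trials, n_times)))
```

Note that the locked pair reaches a PLV of 1 even though the phases themselves are random from trial to trial, which is why PLV alone cannot capture phase concentration.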
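The phase concentration value (PCV) can be defined similarly as follows:
$$\mathrm{PCV}^{(t)}_{x,y} = \frac{1}{n}\left|\sum_{k=1}^{n} \exp\!\left(j\,\Sigma\varphi^{(t)}_{x,y}(k)\right)\right|, \qquad \Sigma\varphi^{(t)}_{x,y}(k) = \varphi^{(t)}_{x}(k) + \varphi^{(t)}_{y}(k) \tag{4}$$
PCV derives from the idea of intertrial phase coherence (ITC), whose equation is
$$\mathrm{ITC}^{(t)} = \frac{1}{n}\left|\sum_{k=1}^{n} \exp\!\left(j\,\varphi^{(t)}(k)\right)\right| \tag{5}$$
PCV extends ITC to the case of two channels. Thus PCV measures the consistency across trials of the two channels. If channel x or channel y is phase-synchronized to a certain mental task, $\varphi^{(t)}_{x}$ or $\varphi^{(t)}_{y}$ will be more constant across trials when the task occurs, according to the oscillatory model. Otherwise, $\varphi^{(t)}_{x}$ or $\varphi^{(t)}_{y}$ will be more random, as it has no relation to the task. Therefore, if channel x and channel y are more (less) likely to be phase-synchronized to a certain mental task, $\Sigma\varphi^{(t)}_{x,y}$ will show more (less) concentration, which results in a high (low) PCV.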
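The sketch below contrasts ITC and PCV on synthetic phases. It assumes that the two-channel extension uses the sum of the two instantaneous phases in place of the single-channel phase of ITC; this reading, the function names and the test data are assumptions of the example, not a definitive implementation.

```python
import numpy as np

def itc(phase):
    """Intertrial phase coherence; phase: (n_trials, n_times) in radians."""
    return np.abs(np.exp(1j * phase).mean(axis=0))

def pcv(phase_x, phase_y):
    """Two-channel phase concentration (assumed: ITC of the summed phases)."""
    return np.abs(np.exp(1j * (phase_x + phase_y)).mean(axis=0))

rng = np.random.default_rng(1)
n_trials, n_times = 500, 140

# Both channels reset to a fixed phase on every trial (small jitter): high PCV.
px_task = 0.3 + 0.1 * rng.standard_normal((n_trials, n_times))
py_task = -1.0 + 0.1 * rng.standard_normal((n_trials, n_times))
pcv_task = pcv(px_task, py_task)

# Phase-locked to each other but not to the stimulus: low PCV and low ITC.
px_free = rng.uniform(-np.pi, np.pi, (n_trials, n_times))
py_free = px_free + 0.5
pcv_free = pcv(px_free, py_free)
itc_free = itc(px_free)
```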
PLV and PCV characterize different relationships between the EEG behaviors of two locations. PLV measures whether they come from the same source, while PCV represents whether or not they are related to a mental task. Here we combine them to analyze ERPs, and the phase locking and concentration value (PLCV) is defined as:
$$\mathrm{PLCV}^{(t)}_{x,y} = \left(1 - \mathrm{PLV}^{(t)}_{x,y}\right) \times \mathrm{PCV}^{(t)}_{x,y} \tag{6}$$
The factor $1 - \mathrm{PLV}^{(t)}_{x,y}$ is used here because homologous features rarely provide useful information for recognition and can be considered as making negative contributions. From (6), a high PLCV represents a pair of heterologous channels synchronized to a certain mental task.
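To make the combination concrete, the following sketch computes a PLCV as the product of (1 - PLV) and PCV. As in the text, PLV uses the phase difference; PCV is assumed here to use the phase sum. All names and the synthetic phases are illustrative.

```python
import numpy as np

def plv(px, py):
    return np.abs(np.exp(1j * (px - py)).mean(axis=0))

def pcv(px, py):
    # Assumption: concentration of the summed phases of the two channels.
    return np.abs(np.exp(1j * (px + py)).mean(axis=0))

def plcv(px, py):
    """High only for non-homologous channels with concentrated phases."""
    return (1.0 - plv(px, py)) * pcv(px, py)

rng = np.random.default_rng(2)
theta = rng.uniform(-np.pi, np.pi, (500, 140))

# A channel paired with itself is fully homologous: PLV = 1, so PLCV = 0.
same = plcv(theta, theta)
# Variable phase difference but constant phase sum: PLV near 0, PCV = 1.
hetero = plcv(theta, 0.5 - theta)
```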

Channel Ranking Using PLCV
It is the difference between the target response and the nontarget response that matters most in the recognition of event-related potentials. Accordingly, the target effect (TE) of phase locking and concentration is defined as:
$$\mathrm{TE}^{(t)}_{x,y} = \mathrm{PLCV}^{(t)}_{x,y}\Big|_{\text{target}} - \mathrm{PLCV}^{(t)}_{x,y}\Big|_{\text{nontarget}} \tag{7}$$
The TE value combines the effects of mental task relativity and channel homology. A high value of $\mathrm{TE}^{(t)}_{x,y}$ signifies that the phase concentration of channel x and channel y in the target response is substantially higher than that in the nontarget response, and that such concentration is more likely to be multisource. A low $\mathrm{TE}^{(t)}_{x,y}$ value, by contrast, may result from a homologous source, or from brain fluctuations that bear little relation to the target mental task. For example, the value of $\mathrm{TE}^{(t)}_{x,x}$ is zero, because channel x and itself reflect the same source. In this view, a low $\mathrm{TE}^{(t)}_{x,y}$ value may indicate that the EEG features of the xth and the yth channels are similar. Under this condition, if the features from the xth channel have already been selected for training the learning machine, the yth channel will provide little additional useful information and can be regarded as a redundant channel, i.e., the yth channel contributes little to the classification.
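The target effect can be evaluated for all channel pairs at once. The sketch below assumes that TE is the target PLCV minus the nontarget PLCV, that PCV uses the summed phases, and that phases are stored as (n_trials, n_channels, n_times) arrays; the names `plcv_matrix` and `target_effect` and the random data are hypothetical.

```python
import numpy as np

def plcv_matrix(phase):
    """phase: (n_trials, n_channels, n_times) -> PLCV of shape (n_ch, n_ch, n_times)."""
    n_trials = phase.shape[0]
    z = np.exp(1j * phase)
    # Trial average of exp(j(phi_x - phi_y)) equals mean(z_x * conj(z_y)).
    plv = np.abs(np.einsum("kxt,kyt->xyt", z, z.conj())) / n_trials
    # Assumption: PCV uses the phase sum, i.e. mean(z_x * z_y).
    pcv = np.abs(np.einsum("kxt,kyt->xyt", z, z)) / n_trials
    return (1.0 - plv) * pcv

def target_effect(phase_target, phase_nontarget):
    """Assumed TE: target PLCV minus nontarget PLCV, per pair and time point."""
    return plcv_matrix(phase_target) - plcv_matrix(phase_nontarget)

rng = np.random.default_rng(3)
phase_t = rng.uniform(-np.pi, np.pi, (60, 4, 140))     # target trials
phase_nt = rng.uniform(-np.pi, np.pi, (300, 4, 140))   # nontarget trials
te = target_effect(phase_t, phase_nt)
idx = np.arange(4)
```

The diagonal entries vanish because a channel is fully phase-locked to itself, matching the statement that TE of a channel with itself is zero.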
However, it is hard to select the channel with the least contribution directly from the TE measurements. To solve this problem, we use a hierarchical clustering method combined with a recursive strategy to group channels and iteratively rank each of them. We call this method "phase locking and concentrating value based recursive feature elimination (PLCV-RFE)". In the clustering of PLCV-RFE, the TE is used to characterize the behavioral similarity between two channels. Since TE is a time-varying value, we define its maximum in the time window after the stimulus onset as the similarity:
$$S_{x,y} = \max_{t}\, \mathrm{TE}^{(t)}_{x,y}, \qquad x, y = 1, 2, \ldots, n \tag{8}$$
where n is the number of channels. Once the xth and the yth channels are identified as the most similar couple among all available channel couples, the less important of the two is the one with the lower task relativity value (TRV), which is evaluated over the k channels that survive at the current step.
Here k is a decreasing variable in the recursive procedure, and its initial value is n. In each step of the recursive procedure, the current similarity matrix $[S_{x,y}]_{k \times k}$ is used to construct a hierarchical cluster tree Z, in which each EEG channel is a leaf. According to Z, the couple with the least $S_{x,y}$ (i.e., the most similar couple) is identified first, and then the channel with the lower TRV is eliminated from the identified couple. Before the next repetition, the similarity matrix is reconstructed, since the number of surviving channels has decreased. This process is carried out iteratively until all channels are sorted in ascending order of importance. Figure 2 illustrates a flow chart of PLCV-RFE that takes a 5-channel set as an example.
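The pseudocode of PLCV-RFE is as follows:
Initialize: surviving channel set SC = [1, 2, ..., n]; channel ranked list R = [ ]; k = n.
Repeat until a single channel survives in SC:
1) compute the similarity matrix $[S_{x,y}]_{k \times k}$ of the channels in SC.
2) construct the hierarchical cluster tree Z from the similarity matrix.
3) group channels into k-1 classes by using the cluster tree Z.
4) identify the only class that contains two channels: C = [p, q], where p, q ∈ SC.
5) compute the TRV of each channel in C.
6) remove the channel in C with the lower TRV from SC, append it to R, and set k = k-1.
Append the last surviving channel to R.
Output: channel ranked list R.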
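The recursion can be sketched in Python as below. Two points are assumptions rather than the authors' implementation: (a) the TRV formula is not recoverable from the text, so the mean of a channel's similarity values to all surviving channels is used as a stand-in; (b) cutting a hierarchical tree of k leaves into k-1 classes merges exactly the closest pair, so the sketch finds the pair with the smallest off-diagonal similarity directly instead of building the tree. The function name `plcv_rfe` and the toy matrix are hypothetical.

```python
import numpy as np

def plcv_rfe(te):
    """te: (n_channels, n_channels, n_times) target effect.
    Returns channel indices ranked in ascending order of importance."""
    S = np.asarray(te, dtype=float).max(axis=-1)   # similarity: max TE over time
    surviving = list(range(S.shape[0]))
    ranked = []
    while len(surviving) > 1:
        sub = S[np.ix_(surviving, surviving)]      # rebuilt k x k similarity matrix
        trv = sub.mean(axis=1)                     # ASSUMED stand-in for the TRV
        np.fill_diagonal(sub, np.inf)              # ignore self-pairs
        i, j = np.unravel_index(np.argmin(sub), sub.shape)   # closest couple
        drop = int(i) if trv[i] <= trv[j] else int(j)        # lower TRV is eliminated
        ranked.append(surviving.pop(drop))
    ranked.append(surviving[0])                    # last survivor ranks highest
    return ranked

# Toy 3-channel similarity matrix, treated as a single time point.
S_toy = np.array([[0.0, 0.8, 0.7],
                  [0.8, 0.0, 0.1],
                  [0.7, 0.1, 0.0]])
ranking = plcv_rfe(S_toy[..., None])
print(ranking)   # [2, 0, 1]
```

In the toy example channels 1 and 2 form the closest couple and channel 2 has the lower assumed TRV, so it is eliminated first. Reversing the returned list gives the channels from most to least important.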

Recognition Method
To evaluate the performance of a selected subset of channels, we measured the character recognition accuracy. The 80-character spelling data of each subject were divided into two parts. The channel ranking was run on the first 40 characters, while the following 40 characters were used to calculate the accuracies of the previously selected channel sets. Fisher's Linear Discriminant Analysis (FLDA) was used for character recognition. As a benchmark method for BCI classification, FLDA has been shown to provide good performance in P300-based BCI spelling [22].

Results and Discussion
Figure 3 shows PLV and PCV differences between target and nontarget responses in certain channel pairs. In the pair Cz and Pz, target PLVs reach lower values from about 100 ms to 250 ms, while staying at the nontarget level at other times. In contrast, the pair C4 and P8 gives higher target PLVs from about 100 ms to 450 ms. For PCV, a decrease and an increase from nontarget to target can be found in the pairs Fc1-Oz and Fp1-Cz, respectively. Therefore, the phase measurement can reflect the changing relationship between channels.
Figure 4 presents the character recognition errors (CREs) against the number of channels selected by PLCV-RFE for 1, 8 and 15 epochs (repetitions). In most cases, as the number of selected channels increases, the test error curves decrease rapidly and then remain steady or rise slowly. In general, the more repetitions used, the fewer the test errors. For further analysis, the optimal channel subset (OCS) is introduced in this study, defined as the subset that yields the least test error with the least number of channels. For example, if the least CRE, 0%, is achieved with both 10 and 12 channels, then the OCS is the first 10 channels.
Compared with the full channel set, the OCS has fewer CREs with fewer repetitions. For example, as illustrated by Table 1, the averaged CRE decreases 6.4%, 3.6%, 2.5%, 2.8%, 1. In addition, as the number of repetitions increases, the size of the OCS is reduced significantly. For example, the OCS contains fewer than 10 channels on average when more than 6 repetitions are used.
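The OCS rule (least test error, and among ties the fewest channels) can be sketched in a few lines of Python. The function name and the error values below are made up for illustration and are not results from this study.

```python
import numpy as np

def optimal_channel_subset(errors):
    """errors[i]: test error (%) when the first i + 1 ranked channels are used.
    Returns the smallest channel count that reaches the minimum error."""
    errors = np.asarray(errors, dtype=float)
    # argmin returns the first minimum, i.e. the fewest channels among ties.
    return int(np.argmin(errors)) + 1

# Hypothetical error curve: 0% is reached with both 10 and 12 channels.
cre = [40, 25, 18, 12, 9, 6, 4, 3, 2, 0, 1, 0]
n_best = optimal_channel_subset(cre)
print(n_best)   # 10
```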
The important channel sites may give a further understanding of the P300 speller. Table 2 lists the top 12 ranked channels of all subjects. P7, P8 and Cz are common to most of the subjects. In this study, all channels are weighted by z scores. P7 receives the highest weight, followed by P8, Cz, T8, FC5, Oz and Pz. Figure 5 displays the channel weights as topographies. The weight distribution is subject-specific. For example, the midline channels are less important for S5, but they are essential for S8 to achieve good classification performance. In the last, averaged topography, the color distribution is approximately symmetric. Consistent with neurophysiological expectations, Cz, Pz and Oz play important roles in the P300 speller. In contrast, Fz ranks 22nd, which makes it less essential than FC5, a lateral channel in the frontal area. In addition, P7, P8 and T8 also contribute significantly to character recognition accuracy.
To confirm the efficiency of PLCV-RFE, two state-of-the-art feature selection methods (SVM-RFE [24] and SSNRSF [10]) are included in this study for comparison. Figure 6 shows the averaged size of the OCS against the number of repetitions, with the corresponding CREs. The PLCV-RFE chooses fewer channels than SSNRSF and SVM-RFE in most cases. For example, with 8 repetitions, the OCS has 7.9, 11.9 and 13 channels for PLCV-RFE, SSNRSF and SVM-RFE respectively. Two paired t-tests show that the PLCV-RFE is significantly superior to the other two methods in channel reduction (PLCV-RFE versus SSNRSF: t(134) = -1.9, p < 0.05; PLCV-RFE versus SVM-RFE: t(134) = -2.4, p < 0.01). In addition, all three methods have comparable performance in CREs. For example, the least averaged CREs with 5 repetitions are 2.5%, 3.1% and 3.3% when using PLCV-RFE, SSNRSF and SVM-RFE respectively, and all CREs are 0.8% with 8 repetitions. A two-way analysis of variance (ANOVA) shows no significant difference among these methods in terms of CREs (F(2, 360) = 0.03, p = 0.97). Therefore, PLCV-RFE achieves a better performance, since its OCS uses fewer electrodes without loss of accuracy.
A further comparison is shown in Figure 7. Since a practical BCI system prefers a small electrode set, one of the core objectives of a channel selection method is to find the fewest electrodes with the least test error. Figure 7 displays the averaged CREs of the 1 to 10 best subject-specific electrodes obtained with the different methods. A channel subset with six electrodes can be found in many studies on the P300 speller, and has been shown to provide a classification performance as good as that of other, expanded channel sets [20], [21]. Therefore, we make another comparison with the performance of a subject-independent six-channel set whose locations are predefined following previous studies on the P300 speller [21].
Among these four channel sets, the PLCV-RFE achieves the lowest CRE, a comparison assessed with three paired t-tests.
In this study, three feature selection methods are compared in terms of their channel selection performance in the P300 speller. These methods are derived from different mathematical ideas. The SVM-RFE is a dependent criterion that has been used successfully in many areas, such as channel selection in motor imagery-based BCIs. As the SVM-RFE is an efficient general-purpose algorithm, it provides satisfactory results in many cases, but it may not be the best choice, as in this study. The SSNRSF and PLCV-RFE are both independent criteria based on global measurements of the EEG signal. The SSNRSF assumes that spelling responses are additive to the spontaneous electroencephalogram and other background artifacts, in accordance with the evoked model. In the SSNRSF, the quality of the evoked potentials is reflected by the signal to signal-plus-noise ratio, which is related to the classification accuracy. The aim of using the xDAWN spatial filter is to amplify the evoked energy while suppressing the background noise. The PLCV-RFE, in contrast, works under the assumption of phase correlation between spelling responses. The idea is rooted in the oscillatory model, which holds that stimuli induce a phase reset of ongoing neural activity. If phase resetting is an acceptable explanation for the generation of ERPs, as suggested in [38][39][40], a proper phase measurement approach can reflect abundant information about the EEG evolution with which to select channels. Since the PLCV-RFE procedure uses no energy or SNR information, its outstanding performance may provide indirect evidence that the oscillatory model accounts at least partially for ERP generation in BCI spelling tasks.
Previous studies have suggested that visual response and cognitive processing are the two main neural activities in response to an overt target stimulus [46], [47]. Other neurophysiological studies have found that at least eight classes of independent components contribute to visual evoked responses [35]. Frenzel et al. have realized a P300-based BCI system with two parallel communication lines by detecting different brain activities [48]. Therefore, the response potentials of the P300 speller can be regarded as a mixed effect of many sources, which is consistent with the idea behind PLCV-RFE of seeking features from different sources to achieve robust channel selection.
The results indicate that the best set of channel locations for controlling the P300 speller is subject-specific. For patients suffering from central nervous disorders in particular, determining the best channel locations is beneficial for using the BCI system, since large changes in brain structure and function may alter the normal EEG signal.
This paper introduces a novel approach to channel selection in P300 speller paradigms. The PLCV-RFE, as a phase measurement based channel selection algorithm, can effectively remove the less important channels without loss of classification accuracy, and shows better performance than the other state-of-the-art methods compared in this study. Thus, phase measurement is effective for channel selection in BCI spelling.