Overlapped Partitioning for Ensemble Classifiers of P300-Based Brain-Computer Interfaces

A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and their classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300 ERPs). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and of a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and that the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.


Introduction
The P300 is a component of an event-related potential (ERP) in a non-invasive scalp electroencephalogram (EEG) that was discovered by Sutton et al. [1]. The P300 appears as a positive peak approximately 300 milliseconds (ms) after a rare or surprising stimulus. The P300 is elicited by the oddball paradigm: rare (target) and non-rare (non-target) stimuli are presented to a participant, and then he/she counts the occurrence of the target stimuli silently. The P300 can be seen in the ERPs corresponding to the target stimuli. Visual and auditory stimuli have often been used to elicit the P300 [2,3]. Currently, the P300 is used in brain-computer interfaces (BCIs) for controlling devices.
The P300 was first utilized for spelling out letters by Farwell and Donchin in 1988 [4]. They proposed a BCI system that typed letters according to the detected P300 elicited by the visual target stimuli, referred to as a P300-based BCI or a P300 speller. The P300-based BCI can control not only a speller but also a wheelchair [5,6], computer-mouse [7], web browser [8], virtual reality system [9], game [10], or smart phone [11]. Since the BCI does not depend on muscle activity, it constitutes a new interface that will provide a better quality of life for patients disabled by neuromuscular diseases, such as amyotrophic lateral sclerosis (ALS) [12]. The interface, classification methods, and their extensions have been studied for more than 20 years (e.g., [13][14][15]).
Stepwise linear discriminant analysis (SWLDA) has been widely used as a standard classification algorithm for the P300-based BCI [16][17][18][19]. Farwell and Donchin first proposed the SWLDA, together with the entire classification protocol for P300 [4]. Schalk et al. proposed a general-purpose BCI system, named BCI2000, in which the P300-based BCI was implemented together with the SWLDA [20]. Krusienski et al. compared the classification algorithms for BCI [21]. Specifically, they compared the classification accuracy of Pearson's correlation method, linear discriminant analysis (LDA), SWLDA, linear support vector machine (SVM), and Gaussian kernel SVM. The results showed that LDA and SWLDA achieved a better performance than the others. Blankertz et al. proposed an LDA with shrinkage for the P300-based BCI that yielded a better performance than SWLDA when a small amount of training data were given [22].
Ensemble classifiers are among the most powerful classifiers for the P300-based BCI; however, they were developed and evaluated using a relatively large amount of training data. The ensemble of SVMs proposed by Rakotomamonjy and Guigue won the BCI competition III data set II that contains a huge amount of training data (15300 ERP data) [23]. They applied the ensemble classifiers to reduce the influence of signal variability using the classifier output averaging technique [24]. Salvaris et al. compared the classification accuracies of ensemble LDA and ensemble SVM classifiers using the BCI competition III data set II and BCI competition II data set IIb (7560 training data) [25]. They also employed an ensemble of six linear SVM classifiers and evaluated classification accuracies using their own data by 16-fold cross-validation [26]. An ensemble SWLDA classifier was first proposed by Johnson et al. and evaluated on their own P300-based BCI data (6480 training ERP data) [27]. Arjona et al. evaluated a variety of ensemble LDA classifiers using 3024 training data [28].
In online (real-time) P300-based BCI experiments, smaller amounts of training data than in the BCI competition III data set II and BCI competition II data set IIb tend to be used. Townsend et al. recorded 3230 ERP training data for a row-column paradigm and 4560 ERP training data for a checkerboard paradigm [15]. Guger et al. evaluated the online performance of a P300-based BCI in which LDA was trained on 1125 ERP training data [29]. EEG data are usually high dimensional, and the target training data that contain the P300 are rare (e.g., 1/6 of all data) and have different statistical properties from the non-target data. In other words, researchers must address the class imbalance problem [30], under which classifiers are severely prone to overfitting. Thus, even thousands of training data can be considered small in this field. To be practical, the amount of training data should be small in order to reduce the training time [21]. However, most studies on ensemble classifiers for the P300-based BCI did not evaluate the classification accuracy using a practical amount of training data, e.g., less than 1000 ERP data.
In an online experiment where less than 1000 training data are given, the ensemble classifier may not perform well because of its method of partitioning training data. Most ensemble classifiers employ naive partitioning, which divides the training data into partitions by sets of data associated with a target command [23]; with naive partitioning, the training data are partitioned without overlaps. Johnson et al. also employed the same partitioning method [27]. Due to the naive partitioning, however, each weak learner in the ensemble classifier is trained on a smaller amount of training data than a single classifier. In addition, the dimension of the EEG data is usually high. In such cases, classifiers are prone to overfitting [32]. Thus, the classification performance of ensemble classifiers may deteriorate when the amount of training data is small, and ensemble classifiers should therefore be evaluated when less than 1000 training data are given.
To develop a better classifier that requires less than 1000 training data, we propose a new overlapped partitioning method to train an ensemble LDA classifier, which we evaluated when 900 training data were given. The overlapped partitioning allows a larger amount of training data to be contained in each partition, although a part of the training data is reused. The proposed classifiers were evaluated on our original P300-based BCI data set and the BCI competition III data set II, using small (900) training data and large (over 8000) training data. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. Our objective was to clarify how the ensemble LDA classifiers with overlapped or naive partitioning and the single LDA classifier performed when 900 training data were given.

Figure 1. Experimental design. We analyzed two P300-based BCI data sets, A and B, respectively. Data set A was recorded in this online experiment. The recorded data set A is divided into q pairs of training and test data by p/q cross-validation (see Figure 4). Then the classification is performed for all pairs to compute the classification accuracy (see Figure 5). The overlapped partitioning is employed to train ensemble classifiers. Data set B (BCI competition III data set II) contains separate training and test data. This data set was also classified by the proposed classifiers. doi:10.1371/journal.pone.0093045.g001
Overlapped partitioning is a new partitioning method that is applied in the training of an ensemble classifier and is designed to be suitable for the P300-based BCI. When we evaluated the performance of the new method, we also assessed the influence of dimension reduction methods. The algorithms were first compared under the condition that 900 training data were used, which is the smallest amount of data used to date for the evaluation of ensemble classifiers for the P300-based BCI. In addition, the influence of the degree of overlap used in the ensemble classifier with overlapped partitioning was demonstrated for the first time. We consider the overlapped partitioning essential for implementing ensemble classifiers in an online system. This study contributes towards reducing the required amount of training data and achieving better classification performance in an online experiment.

Figure 2. Structure of the P300-based BCI system. A target letter is presented to a participant, and then letters on the stimulator are intensified by row or by column. The participant must perform a mental task: silently counting when the target letter is intensified. During this, the event-related potentials (ERPs) that contain the P300 component are recorded from the scalp. The signals are amplified, digitized, and then stored in a computer. After all intensifications have finished, the signals are processed to predict a letter, and the feedback is displayed. doi:10.1371/journal.pone.0093045.g002

Figure 3. Stimulator for the P300-based BCI. It has 36 gray letters that form a matrix in the center. Each column of the matrix is numbered 1-6 and each row 7-12. A target letter is provided at the top center of the stimulator, and the predicted letter is shown at the top right as feedback in test sessions. doi:10.1371/journal.pone.0093045.g003

Ethics Statement
This research plan was approved by the Internal Ethics Committee at Kyushu Institute of Technology. The possible risks, mental task, and approximate measurement time were explained to all participants. In addition, all participants gave their written informed consent before participating in this experiment.

Experimental Design
Ensemble classifiers with the proposed overlapped partitioning were evaluated on our original P300-based BCI data set (data set A) and BCI competition III data set II (data set B), as shown in Figure 1. The primary objective is to clarify how the overlapped partitioning for ensemble classifiers influences the classification accuracy. The secondary objective is to confirm how the three conditions for dimension reduction (stepwise method, PCA, or none) affect classification performance.

Data Set A: Our Original P300-based BCI Data Set
To evaluate ensemble classifiers, we recorded EEG data using an online P300-based BCI and then computed the classification accuracy offline. During the EEG recording, visual stimuli were presented to the participant while the participant performed a mental task. The recorded signals were amplified, digitized, and then preprocessed before a letter was predicted. Our data set contains P300-based BCI data from 10 participants, which allows better statistical analysis. The parameters used for the stimuli and the recording method of data set A are summarized in Table 1.
Participants. Eleven healthy participants (ten males and one female, aged 22-28 years) took part in this study. None had prior experience of controlling a P300-based BCI. During the experiment, we checked the participants' recorded waveforms as well as their health status. However, one male participant could not complete the task due to sickness. We therefore analyzed data from the remaining ten participants in the offline analysis.
Devices. The P300-based BCI consisted of a stimulator, amplifier, A/D converter, and computer, as shown in Figure 2. EEG signals were recorded at the Fz, Cz, P3, Pz, P4, PO7, Oz, and PO8 scalp sites according to the international 10-20 system, which is the montage commonly used for the P300-based BCI [9]. The ground electrode was located at the AFz site and the reference electrodes were located on the mastoids. The EEG signals were filtered (0.11-30 Hz band-pass filter) and amplified 25000 times with a BA1008 (TEAC Co. Ltd., Japan). Then, the signals were digitized by an AIO-163202FX-USB analog I/O unit (CONTEC Co. Ltd., Japan). The sampling rate was 128 Hz. The P300-based BCI was implemented in MATLAB/Simulink (Mathworks Inc., USA). The recorded signals were analyzed offline using MATLAB. Stimuli for the P300-based BCI were presented on a TFT LCD display (HTBTF-24W, 24.6 inches wide with 1920×1080 resolution; Princeton Technology Ltd., Japan) located 60 cm in front of the participant.
Stimuli. We employed most of the stimulus parameters used in the BCI competition III data set II [23]. The stimulator of the P300-based BCI consists of 36 gray letters that form a 6×6 matrix, a target indicator, and a feedback indicator (see Figure 3). All the columns and rows of the matrix were numbered to manage intensifications and for the subsequent prediction of a letter. The set of column numbers was C = {1, 2, 3, 4, 5, 6}, while the set of row numbers was R = {7, 8, 9, 10, 11, 12}. In addition, the set of all intensifications was I = C ∪ R. A row or a column of gray letters in the matrix turned white for 100 ms (intensification duration) and then changed back to gray for 75 ms (blank duration). At least n(I) = 12 intensifications were required to identify an input letter out of the 36 letters; this is called a sequence. Each row or column in a sequence was selected by a random permutation. The number of intensification sequences N_s was fixed to 15 in the online experiment.

Preprocessing. EEG data were preprocessed in the same way for online recording and offline analysis. The data were trimmed from the beginning of each intensification to 700 ms (8 channels × 89 samples). Each 100 ms pre-stimulus baseline was subtracted from the corresponding ERP data. Subsequently, the ERP data were smoothed (using a moving average with a window size of 3), downsampled to 43 Hz (8 channels × 30 samples), and vectorized (240 = channels × samples).
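The preprocessing chain described above can be sketched as follows. This is a minimal illustration, not the paper's code; the function name, argument names, and array layout are ours, and the pre-stimulus baseline is assumed to be prepended to each epoch.

```python
import numpy as np

def preprocess(epoch, n_pre=13, win=3, down=3):
    """Baseline-correct, smooth, downsample, and vectorize one ERP epoch.

    epoch : (channels, samples) array covering the 100 ms pre-stimulus
            baseline (n_pre samples at 128 Hz) plus 0-700 ms post-stimulus.
    """
    baseline = epoch[:, :n_pre].mean(axis=1, keepdims=True)
    x = epoch[:, n_pre:] - baseline               # baseline-corrected 0-700 ms
    kernel = np.ones(win) / win                   # moving average, window size 3
    x = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 1, x)
    x = x[:, ::down]                              # 128 Hz -> ~43 Hz
    return x.reshape(-1)                          # feature vector (channels*samples,)
```

With 8 channels and 89 post-stimulus samples at 128 Hz, downsampling by 3 yields 30 samples per channel and a 240-dimensional feature vector, matching the dimensions stated above.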
Sessions and a mental task. EEG data of the P300-based BCI were recorded through one training session and ten test sessions; only the data in the test sessions were evaluated by our proposed p/q cross-validation in the offline analysis. In each session, a participant was required to spell out five letters using the P300-based BCI. A target letter to be inputted was selected randomly by the system. Thus, 900 ERPs (5 letters × 1 session × 12 intensifications × 15 sequences) were recorded in the training session and 9000 ERPs (5 letters × 10 sessions × 12 intensifications × 15 sequences) in the test sessions. A target letter was displayed for 3 s, and then the intensifications were presented. The participant was asked to perform the oddball task to elicit the P300: focusing on the cued target letter and silently counting when the letter was intensified. During the sessions, the observed EEG data were recorded. In the training session, no feedback was displayed. In the test sessions, the feedback was shown in the feedback indicator for 1 s at the end of all 15 intensification sequences for the target letter. The online feedback was computed using the single LDA classifier [21] and was presented to the participants in order to confirm whether the participant conducted the mental task appropriately in the test sessions. Feedback of success or failure also helps motivate participants [33], even though presenting feedback does not improve the classification accuracy of the P300-based BCI [34]. In addition, feedback is essential for participants to acquire the appropriate mental task [35]. An experimenter also checked the feedback to make sure that appropriate classification performance was observed using LDA. All the data gathered before the current session were used for training the classifier in the online recording.

Data Set B: BCI Competition III Data Set II
We also evaluated the proposed ensemble classifiers using BCI competition III data set II because many novel and traditional BCI algorithms have been evaluated using this data set. Since the competition data set contains a large amount of training data, we evaluated the classification performance using limited training data (900 ERPs) in addition to the full training data (15300 ERPs). Parameters used in the stimulus and data recording of data set B are also summarized in Table 1.

Figure 4. Procedure of 1/10 cross-validation used for the evaluation on data set A. In this case, p = 1 and q = 10. ERP data sets corresponding to fifty letters inputted by a participant were measured. The square aligned at the top illustrates a data set that contained 180 ERP data, 30 of which were labeled as target ERPs, while the others were labeled as non-target ERPs. These data sets were sorted according to measured time. The data sets were divided into ten groups. Then, two successive groups were selected. The former group was assigned to training data and the latter to test data. Then, each weak learner in the ensemble classifier was trained on the assigned training data and tested using the following test data. doi:10.1371/journal.pone.0093045.g004
Overview of the data set and stimulator. The data set contains EEG data for participants A and B. The EEG data were recorded from 64 channels. The recorded signals were bandpass filtered (0.1-60 Hz) and digitized at 240 Hz. The same procedure of intensifications and mental tasks for data set A was also applied to the data set B. The differences in the stimulators between data sets A and B were in the size, the font and the brightness of letters, horizontal/vertical distances of letters, and the method of presenting the target and feedback letters. It should be noted that the target and feedback presentation times were different between these two data sets, though these parameters were not directly related to the offline analysis. The data set contains EEG data corresponding to 85 target letters for training (85 letters 6 12 intensifications 6 15 sequences = 15300 ERPs) and EEG data of 100 target letters for testing (18000 ERPs) for each participant. A more detailed description of the data set can be found in [36].
Preprocessing. The same preprocessing method was used for data sets A and B; however, different parameters were employed because the sampling rate and the number of channels for data set B were larger than those for data set A. All 64 channels were used for the offline analysis. The data were trimmed from the beginning of each intensification to 700 ms (64 channels × 168 samples). Each 100 ms pre-stimulus baseline was subtracted from the ERP data. The ERP data were smoothed (using a moving average with a window size of 18), downsampled to 20 Hz (64 channels × 14 samples), and vectorized (896 = channels × samples). The vectorized data are handled as feature vectors in the classification.

Ensemble classifiers with overlapped partitioning
The ensemble classifier divides the given training data into partitions, and those partitions are used to train multiple classifiers in the ensemble. A classifier in the ensemble is called a ''weak learner.'' The number of weak learners is denoted by N_c. The training data were divided into N_c partitions using overlapped partitioning. A dimension reduction method was applied to these partitioned data, and then N_c LDA weak learners were trained. The test data corresponding to a letter were processed to compute the scores, and then the scores were translated into a predicted letter. To evaluate the classification performance using thousands of training data, the proposed p/q cross-validation was applied.

Figure 5. Procedure of training and testing the proposed ensemble classifier (see Figure 6). One of three conditions for dimension reduction (DR) is applied to each partitioned data set: the stepwise method, PCA, or none. Then, N_c LDA weak learners are trained on these dimension-reduced data. The training data are used only for the training of weak learners, as illustrated by blue lines. After the training session, the test data are processed to compute scores for decision making. doi:10.1371/journal.pone.0093045.g005

p/q cross-validation. p/q cross-validation is a special cross-validation that can reduce the amount of training data. For a fair comparison of the classification accuracy, the amount of training data used in the offline analysis should be reduced to less than 1000. The traditional cross-validation method is not suitable because it provides at least 4500 training data in this case. Instead, we employed the proposed p/q cross-validation, which performs q-fold cross-validation where p/q of all data are assigned to the training data.
First, the ERP training data are divided into q groups. Second, assuming that the groups are aligned around a circle, p+1 consecutive groups starting from the u-th group (u ∈ {1, 2, ..., q}) are selected. Then, the first p consecutive groups are assigned to the training data, and the last group is assigned to the test data. The above procedure is repeated for all u. In total, q pairs of training and test data are prepared, and classification is performed for each pair. The classification accuracy is computed as #correct letters / #total letters, where #total letters is the total number of letters and #correct letters is the number of correct predictions among all pairs. It should be noted that (q−1)/q cross-validation is equivalent to the conventional q-fold cross-validation.
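The group-selection step above can be sketched as follows; this is our own minimal illustration (the function name and the representation of groups as index lists are ours), not code from the paper.

```python
def pq_cross_validation_splits(q, p):
    """Return the q (train_groups, test_groups) pairs of p/q cross-validation.

    Groups 0..q-1 are assumed aligned around a circle; for each start u,
    p consecutive groups are training data and the next group is test data.
    """
    splits = []
    for u in range(q):
        train = [(u + i) % q for i in range(p)]   # p consecutive groups
        test = [(u + p) % q]                      # the following single group
        splits.append((train, test))
    return splits
```

For q = 10 and p = 9, every split uses nine groups for training and one for testing, which is exactly conventional 10-fold cross-validation, as noted above.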
In the present study, we evaluated data set A using 1/10 cross-validation, as shown in Figure 4. In other words, five letters out of 50 were assigned to the training data, which contained 900 ERPs (9000 ERPs × 1/10). It takes 180.125 seconds to spell out five letters under the conditions of this online experiment, which does not overly tire the participant. In addition to the 1/10 cross-validation, we also used the conventional 10-fold cross-validation (9/10 cross-validation) in order to compare the ensemble classifiers when a large amount of training data was provided. In that case, ERPs for 45 letters out of 50 were used as training data, which contained 8100 ERPs (9000 ERPs × 9/10). The p/q cross-validation was not applied to data set B because the competition data set has separate training and test data.
Overlapped partitioning. In a BCI study on ensemble classifiers, naive partitioning was used [23]: the given training data were divided into partitions by letters, without overlaps. Because of the partitioning without overlaps, the amount of training data in a partition becomes small, so covariance matrices might not be estimated precisely. Instead of this method, we propose a generalized partitioning method.
All the procedures for training and testing the proposed ensemble classifier for the P300-based BCI are shown in Figure 5. In overlapped partitioning, the training data are divided into N_c partitions, where partitions are allowed to overlap. In the first step of the overlapped partitioning method, training data assigned to input commands were sorted by recorded time and divided into N_c blocks without overlaps. Then, assuming that the blocks were aligned around a circle, N_b consecutive blocks starting from the v-th block (v ∈ {1, 2, ..., N_c}) were selected to form a partition. The procedure was repeated for all v. An example of overlapped partitioning is shown in Figure 6. Each weak learner was trained on the partitioned data (see Figure 5). The advantage of this partitioning method compared to naive partitioning is that a larger amount of data is stored in each partition. Thus, overlapped partitioning may be robust against a shortage of training data. In the present study, N_c was fixed, while N_b was varied in the offline analysis.

Figure 6. Overlapped partitioning when N_c = 5 and N_b = 3. Training data were first divided into five blocks. Assuming that those five blocks were aligned around a circle, three continuous blocks were selected to form a partition. As a result, five partitions were prepared. The partitioned training data sets were used to train weak learners in the ensemble classifier. doi:10.1371/journal.pone.0093045.g006
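A minimal sketch of the block-to-partition step, assuming the time-sorted training data have already been split into N_c blocks (the function name and list-based representation are our illustration):

```python
def overlapped_partitions(blocks, n_b):
    """Form N_c overlapping partitions from N_c non-overlapping blocks.

    blocks : list of N_c blocks (each a list of training examples),
             sorted by recording time and imagined on a circle.
    n_b    : number of consecutive blocks per partition (degree of overlap).
    """
    n_c = len(blocks)
    partitions = []
    for v in range(n_c):
        part = []
        for i in range(n_b):
            part.extend(blocks[(v + i) % n_c])   # wrap around the circle
        partitions.append(part)
    return partitions
```

Setting n_b = 1 reproduces naive partitioning, and n_b = N_c makes every partition a copy of the full training set, matching the special cases discussed later in this section.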
An ensemble classifier with the overlapped partitioning can be considered a special case of bagging used in pattern recognition [37]. In bagging, random sampling with replacement from the available training data is used, which is also referred to as bootstrap sampling. In contrast, the overlapped partitioning has no randomness, so no duplicated partitions are made except in a special case. Unlike a standard pattern recognition problem, a set of EEG data was recorded for every letter, in which 30 ERPs contained the P300 and the other 150 ERPs did not. Random sampling from the full set of EEG data runs the risk that only a few ERP data containing the P300 are selected for a partition, which may deteriorate classification performance. Random sampling of five blocks of EEG data is also not effective because duplicated partitions could be prepared. The proposed overlapped partitioning does not have such risks and provides different partitions with a constant ratio of EEG data with the P300 to those without it. Thus, the weak learners of the ensemble classifier can be trained efficiently by overlapped partitioning.
Dimension reduction. A dimension reduction method has often been applied in BCI because EEG data are usually high dimensional. However, the influence of the dimension reduction methods has not been evaluated for ensemble classifiers. In this study, one of three conditions for dimension reduction was applied: two dimension reduction methods (the stepwise method and PCA) and a control condition without dimension reduction (none).
• Stepwise method. The stepwise method selects suitable spatiotemporal predictor variables for classification through forward and backward steps. First, an empty linear regression model is prepared, and variables are then appended through the following steps. In the forward step, a variable is appended to the model, and the model is evaluated by an F-test, which yields a p-value: the probability of obtaining the observed result by chance. The variable is added if the p-value of the F-test is lower than a threshold p_in. The forward step is repeated until no variable is appended. In the following backward step, a variable of the tentative model is removed and the model is again evaluated by the F-test; the variable is removed if its p-value is higher than a threshold p_out. The backward step is continued until no variable is removed. Then, the forward step is repeated again. The final model is determined when no variable is appended to or removed from the model. The remaining variables in the final model are used for classification. More details are given in [21,38]. We set p_in = 0.1 and p_out = 0.15, which are commonly used for the P300-based BCI [21,22].
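The forward/backward loop above can be sketched with partial F-tests as follows. This is our own simplified illustration, not the exact SWLDA implementation used in the paper; the function name, the least-squares helper, the degrees-of-freedom bookkeeping, and the iteration cap are all our assumptions.

```python
import numpy as np
from scipy import stats

def stepwise_select(X, y, p_in=0.1, p_out=0.15, max_vars=60):
    """Forward/backward stepwise variable selection via partial F-tests.

    A variable enters when its partial F-test p-value is below p_in and
    leaves when it rises above p_out (thresholds as in SWLDA for P300).
    """
    n = len(y)

    def rss(cols):
        # residual sum of squares of an OLS fit with intercept + cols
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return float(r @ r)

    selected = []
    for _ in range(100):                      # safety cap on rounds
        changed = False
        # forward step: try to add the most significant remaining variable
        if len(selected) < max_vars:
            base = rss(selected)
            best_p, best_j = 1.0, None
            for j in range(X.shape[1]):
                if j in selected:
                    continue
                new = rss(selected + [j])
                df2 = n - len(selected) - 2   # residual dof with j added
                if new <= 0 or df2 <= 0:
                    continue
                F = (base - new) / (new / df2)
                p = 1.0 - stats.f.cdf(F, 1, df2)
                if p < best_p:
                    best_p, best_j = p, j
            if best_j is not None and best_p < p_in:
                selected.append(best_j)
                changed = True
        # backward step: drop any variable whose p-value exceeds p_out
        full = rss(selected)
        for j in list(selected):
            rest = [c for c in selected if c != j]
            df2 = n - len(selected) - 1
            F = (rss(rest) - full) / (full / df2)
            p = 1.0 - stats.f.cdf(F, 1, df2)
            if p > p_out:
                selected.remove(j)
                full = rss(selected)
                changed = True
        if not changed:
            break
    return selected
```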
• Principal component analysis. Principal component analysis (PCA) is a typical dimension reduction method based on the eigenvalue decomposition [39], and it has also been applied to the P300-based BCI [10,40]. In summary, the covariance matrix of the training data is computed and its eigenvalue decomposition is performed. The data projected onto the normalized eigenvector corresponding to the largest eigenvalue are called the first principal component (PC); the other PCs are computed likewise. We applied PCA to the data in each partition and then used PCs 1-140 for classification on data set A and PCs 1-400 for classification on data set B.
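The PCA projection via eigenvalue decomposition can be sketched as follows (a minimal illustration; the function name and interface are ours, and the number of retained PCs is treated as a tuning parameter):

```python
import numpy as np

def pca_project(X_train, X_test, n_components):
    """Project training and test data onto the leading principal components.

    The covariance matrix is estimated from the training partition only,
    and both data sets are centered with the training mean.
    """
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]   # largest first
    W = eigvecs[:, order]                              # (features, n_components)
    return (X_train - mean) @ W, (X_test - mean) @ W
```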
Linear discriminant analysis. Linear discriminant analysis (LDA) is a frequently used classifier for the P300-based BCI. In the ensemble classifier, N_c LDA weak learners are implemented. One of three conditions for dimension reduction is applied to the k-th partitioned data, and then the weight vector of the k-th LDA weak learner is trained as follows:

w_k = X_k^{-1} (m_{k,2} − m_{k,1}), (1)

where X_k is the total covariance matrix over the target and non-target training data, and m_{k,2} and m_{k,1} are the mean vectors of the target and non-target training data in the k-th partition, respectively. The trained weight vectors of the LDA weak learners are used to compute the score for decision making. See [22] for more details of a single LDA classifier.
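The weight computation of one weak learner can be sketched as follows; the small diagonal load is our addition for numerical stability when features outnumber samples, and is not part of the paper's formulation.

```python
import numpy as np

def lda_weight(X_target, X_nontarget, shrink=1e-6):
    """Weight vector of one LDA weak learner: w_k = C^{-1} (m2 - m1).

    X_target, X_nontarget : (n_samples, n_features) training data of one
    partition. C is the pooled (total) covariance over both classes.
    """
    m2 = X_target.mean(axis=0)        # target mean
    m1 = X_nontarget.mean(axis=0)     # non-target mean
    X = np.vstack([X_target - m2, X_nontarget - m1])
    C = X.T @ X / (len(X) - 1)        # pooled covariance estimate
    C += shrink * np.eye(C.shape[0])  # diagonal load (our addition)
    return np.linalg.solve(C, m2 - m1)
```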
Decision making. To predict a letter, its corresponding test data were processed to compute scores for decision making. A test feature vector belonging to the i-th intensification in the j-th sequence in the k-th partition after applying dimension reduction is denoted by x_{i,j,k}, i ∈ I, j ∈ {1, 2, ..., N_s}. The score s_i corresponding to an intensification was computed as

s_i = Σ_{k=1}^{N_c} Σ_{j=1}^{N_s} w_k^T x_{i,j,k}. (2)
In the offline analysis, the number of intensification sequences N_s was varied from 1 to 15. The inputted letters were then predicted by finding the maximum scores among the column and the row intensifications, respectively:

d = ( argmax_{i∈C} s_i , argmax_{i∈R} s_i ). (4)

The first element of d represents the column number of a predicted letter, while the second represents the row number. For example, d = (2, 9) denotes ''N'' in Figure 3.
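As a minimal sketch of the decision rule (the dict-based score container and function name are our illustration), the column and row with the maximum accumulated scores are selected:

```python
def predict_letter(scores):
    """Pick the column and row intensifications with the maximum scores.

    scores : dict mapping intensification number (columns 1-6, rows 7-12)
             to its accumulated score s_i over sequences and weak learners.
    Returns d = (column, row) of the predicted letter.
    """
    col = max(range(1, 7), key=lambda i: scores[i])    # argmax over C
    row = max(range(7, 13), key=lambda i: scores[i])   # argmax over R
    return (col, row)
```

Because only the argmax matters, multiplying every score by a positive constant leaves the predicted letter unchanged, which is the property used in the special-case analysis below.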
Special cases of overlapped partitioning. The ensemble classifiers with the proposed overlapped partitioning are equivalent to ensemble classifiers with naive partitioning or to a single classifier in special cases. The ensemble classifier with overlapped partitioning becomes the ensemble classifier with naive partitioning when N_b = 1 and N_c > 1. In this case, partitions do not overlap each other, which can easily be seen in Figure 6. Moreover, when N_b = N_c, all the partitioned data sets are just duplications of all the given training data. After a dimension reduction method has been applied, the same data are stored in all partitions. As a result, all the weight vectors of the classifiers become the same:

w = w_1 = w_2 = ... = w_{N_c}. (5)

Since the final model of the stepwise method or the projection of the PCA is adjusted by the same training data, the test data after the dimension reduction are also the same:

x_{i,j} = x_{i,j,1} = x_{i,j,2} = ... = x_{i,j,N_c}. (6)

Considering Equations 5 and 6, the score for decision making, instead of Equation 2, is computed by

s'_i = N_c Σ_{j=1}^{N_s} w^T x_{i,j}. (7)

On the other hand, the score for a single classifier is formed as

s''_i = Σ_{j=1}^{N_s} w^T x_{i,j}. (8)

Thus, the relationship between the single classifier and the overlapped ensemble classifier with N_b = N_c is

s'_i = N_c s''_i. (9)

From Equation 4, s'_i and s''_i work in the same way for decision making, because multiplying all scores by the constant N_c does not change the argmax. Therefore, the ensemble classifier with overlapped partitioning that satisfies N_b = N_c corresponds to a single classifier.

Comparison Protocol
We evaluated varieties of ensemble classifiers with overlapped partitioning in order to assess the influence of the degree of overlap together with the dimension reduction methods. One of three conditions for dimension reduction was applied: stepwise, PCA, or none. The resulting classifiers are denoted by overlapped ensemble SWLDA (OSWLDA), overlapped ensemble PCA LDA (OPCALDA), and overlapped ensemble LDA (OLDA) classifiers, respectively.
Those three classifiers were evaluated on data sets A and B. Data set A, recorded by us, was analyzed in the small training data case using 1/10 cross-validation and in the large training data case using 9/10 cross-validation (conventional 10-fold cross-validation). Thus, the same amount of training data was provided for each classifier: 900 training data for the former and 8100 training data for the latter. Additionally, in the cross-validation, the training and test data were clearly separated so that none of the training data were used as test data. Data set B (BCI competition III data set II) was also analyzed using limited training data (ERPs corresponding to the first 5 letters) and using full training data (ERPs corresponding to 85 letters). The former contained 900 ERPs while the latter contained 15300 ERPs for training.
To confirm the influence of overlapped partitioning, the degree of overlap N_b was varied while the number of weak learners N_c was fixed in the offline analysis. The evaluated combinations of N_c and N_b for data sets A and B are summarized in Tables 2 and 3, respectively. In particular, when N_b = N_c, the ensemble classifier with overlapped partitioning is equivalent to the single classifier, and when N_b = 1 and N_c > 1, it behaves as a conventional ensemble classifier with naive partitioning. It should be noted that the algorithms were trained on 900 training data for both data sets, which is much smaller than the amounts used in previous studies, for example, the 15300 training data used in the BCI competition III data set II [23] and the 7560 data used in the BCI competition II data set IIb [41]. In our comparison, the single SWLDA, which is commonly used in this field, and the ensemble SWLDA proposed by Johnson et al. were also included.
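Under the partitioning scheme above, the amount of training data available to each weak learner scales with the overlap: it equals the given number of training ERPs times N_b / N_c. A short sketch (the helper name is ours, for illustration):

```python
def samples_per_weak_learner(n_train, n_c, n_b):
    """Training samples seen by one weak learner when the data are split
    into n_c equal blocks and each partition holds n_b of them."""
    return n_train * n_b // n_c

# 900 training ERPs with N_c = 5, for N_b = 1..5:
small = [samples_per_weak_learner(900, 5, b) for b in range(1, 6)]
# -> [180, 360, 540, 720, 900]

# Full data set B (15300 ERPs) with N_c = 17 and naive partitioning
# (N_b = 1) again leaves 900 samples per weak learner.
full_naive = samples_per_weak_learner(15300, 17, 1)
```

This makes explicit why N_b = 1 starves each weak learner of data in the small-training-data case, while N_b = N_c hands every learner the complete set.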
For the statistical analysis of data set A, the effects of the intensification sequence (N_s = 1, ..., 15), dimension reduction condition (stepwise, PCA, or none), and degree of overlap (N_b = 1, ..., 5) were evaluated by a three-way repeated-measures ANOVA followed by post hoc pairwise t-tests with Bonferroni's method. No statistical analysis was applied to data set B because of the limited number of participants.
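The post hoc step can be sketched with SciPy as Bonferroni-corrected paired t-tests. The variable names and toy accuracies below are illustrative only, not the study's data:

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_rel

def pairwise_bonferroni(acc_by_level, alpha=0.05):
    """Post hoc paired t-tests between factor levels (e.g. N_b values),
    Bonferroni-corrected: each p-value is multiplied by the number of
    comparisons before testing against alpha.  Returns significant pairs."""
    levels = sorted(acc_by_level)
    pairs = list(combinations(levels, 2))
    corrected = {}
    for a, b in pairs:
        _, p = ttest_rel(acc_by_level[a], acc_by_level[b])
        corrected[(a, b)] = min(p * len(pairs), 1.0)
    return {pair: p for pair, p in corrected.items() if p < alpha}

rng = np.random.default_rng(0)
# hypothetical per-participant accuracies for three overlap levels
acc = {1: rng.normal(0.60, 0.02, 10),
       4: rng.normal(0.75, 0.02, 10),
       5: rng.normal(0.74, 0.02, 10)}
significant = pairwise_bonferroni(acc)
```

With five N_b levels the correction factor would be 10 (all pairs), matching the comparisons reported in the Results.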

Results
The classification performances of OSWLDA, OPCALDA, and OLDA were evaluated on data set A using 1/10 or 9/10 cross-validation and on data set B with limited or full training data. The degree of overlap used in the overlapped partitioning (N_b) was varied while the number of weak learners in the ensemble classifier (N_c) was fixed. As mentioned above, an overlapped ensemble classifier behaves as an ensemble classifier with naive partitioning when N_c > 1 and N_b = 1, and becomes a single classifier when N_c = N_b.
Data Set A Using 1/10 Cross-validation
EEG data in data set A were classified by OSWLDA, OPCALDA, and OLDA using 1/10 cross-validation with the parameters in Table 2. The classification performances of these classifiers for each participant are shown in Figure 7. The mean accuracies of these algorithms are shown in Figure 8 and in Table 4.
The key finding was that OSWLDA showed higher classification performance than the single SWLDA classifier (N_b = 5) and the ensemble SWLDA classifier with naive partitioning (N_b = 1) when 900 training data were provided. As can be seen in Table 4, most algorithms achieved their best performance when N_b = 4, while the worst accuracy was observed when N_b = 1. Regarding OLDA, when N_b = 1, the classification accuracy was close to the chance level (1/36). As can be seen in Figure 8, OSWLDA (N_b = 4) achieved a higher classification accuracy than the single SWLDA classifier (N_b = 5), especially for 3 ≤ N_s ≤ 7. At N_s = 5, OSWLDA (N_b = 4) obtained an 11.2% higher accuracy than the ensemble SWLDA classifier with naive partitioning and a 4.8% higher accuracy than the single SWLDA classifier. Moreover, OPCALDA (N_b = 4) achieved better classification accuracies than OPCALDA (N_b = 5) for 6 ≤ N_s ≤ 9, although the differences were small. In contrast, the accuracy of OLDA (N_b = 4) was close to that of the single LDA classifier (N_b = 5), although OLDA (N_b = 4) achieved slightly higher accuracies in some sequences. A three-way repeated-measures ANOVA with the intensification sequence, dimension reduction condition, and degree of overlap as factors was applied. The main effects of the intensification sequence (F(14,126) = 166.6, p < 0.01), dimension reduction condition (F(2,18) = 614.6, p < 0.01), and degree of overlap (F(4,36) = 1356, p < 0.01), as well as all their interactions (p < 0.01 for all), were significant.
In addition, the post hoc pairwise t-tests with Bonferroni's method revealed significant differences between all dimension reduction conditions (p < 0.01 for all) and between all pairs of N_b except the pair N_b = 3 and 5 (p < 0.01 for all).
Data Set A Using 9/10 Cross-validation
EEG data in data set A were also classified by the three algorithms using 9/10 cross-validation with the parameters in Table 2. Classification performances of the three algorithms for each individual participant are shown in Figure 9. The mean classification performances are shown in Figure 10 and Table 5.
The classification performances of the ensemble classifiers with overlapped partitioning were comparable to, or slightly better than, that of the single classifier when 8100 training data were provided. As shown in Figure 10, the worst classification performance was achieved by the ensemble classifiers with naive partitioning (N_c = 45, N_b = 1) for all algorithms, as in the analysis of data set A using 1/10 cross-validation. However, only a small performance improvement of the overlapped ensemble classifiers over the single classifier (N_c = 45, N_b = 45) was found.

Data Set B with Limited Training Data
EEG data in data set B were classified by OSWLDA, OPCALDA, and OLDA using 900 training data with the parameters in Table 3. Classification performances of OSWLDA, OPCALDA, and OLDA evaluated on data set B using a limited amount of training data (900 ERPs) are shown in Tables 6, 7, and 8, respectively.
OSWLDA and OPCALDA (N_c = 5 and N_b = 3, 4) achieved better classification accuracies than those with naive partitioning (N_c = 5 and N_b = 1) and the single classifiers (N_c = 5 and N_b = 5) when 900 training data were available. For OSWLDA, the best classification accuracies were obtained when N_b = 4. Further, most of the best mean classification performances of OPCALDA occurred when N_b = 3 or N_b = 4. These tendencies are similar to those in the analysis of data set A using 1/10 cross-validation. OSWLDA achieved about 10% (15% at best, when N_s = 6) higher mean classification accuracy than the single SWLDA classifier (N_c = 5, N_b = 5). OPCALDA also achieved a 5.5% higher mean classification accuracy than the single PCALDA classifier (N_c = 5, N_b = 5) when N_s = 12. However, all the classification performances of OLDA were close to chance level.

Data Set B with Full Training Data
EEG data of data set B were classified by OSWLDA, OPCALDA, and OLDA using 15300 training data with the parameters in Table 3. Classification performances of the three algorithms evaluated on data set B using full training data (15300 ERPs) are presented in Tables 9, 10, and 11, respectively.
The classification performances of the ensemble classifiers with overlapped partitioning (OSWLDA, OPCALDA, and OLDA; N_c = 17, 1 < N_b < 15) were comparable to, or slightly better than, those with naive partitioning (N_c = 17 and N_b = 1) and that of the single classifier (N_c = 17 and N_b = 17) in most sequences when 15300 training data were available. The best classification performance was achieved by OSWLDA: 98% when N_s = 15, N_c = 17, and N_b = 7, 9, or 11. In other words, OSWLDA achieved a 1.5% higher classification performance than the ensemble of SVMs of the winner of BCI competition III data set II [23], and about a 3% improvement over the single SWLDA (N_c = 17, N_b = 17). However, as in the analysis of data set A using 9/10 cross-validation, little improvement by the ensemble classifier with overlapped partitioning over the single classifier can be seen.

Discussion
To assess the influence of overlapped partitioning compared with traditional naive partitioning and a single classifier, the classification accuracies of ensemble classifiers with these partitioning methods were compared when 900 training data were given. Two P300-based BCI data sets were evaluated: data set A with 1/10 cross-validation and data set B using limited training data. The single classifier (N_c = N_b) and the traditional ensemble classifier with naive partitioning (N_c > 1 and N_b = 1) were compared at the same time. One of three conditions for dimension reduction (stepwise, PCA, or none) was also applied. The results show that OSWLDA trained on 900 ERPs achieved higher classification accuracy than the single SWLDA classifier (N_c = 5, N_b = 5) and the ensemble SWLDA classifier with naive partitioning (N_c = 5, N_b = 1) for both data sets (see Tables 4 and 6). More specifically, the proposed OSWLDA trained on 900 ERPs achieved a 4.8% higher accuracy than the single SWLDA for data set A (N_c = 5, N_b = 4, N_s = 5) and a 15% higher accuracy for data set B (N_c = 5, N_b = 4, N_s = 6), where the single SWLDA is an established and commonly used classification algorithm for P300-based BCIs.
The performance improvement of the proposed classifiers trained on 900 ERPs was due to the mutual effect of the overlapped partitioning and the dimension reduction. In the statistical analysis of data set A using 1/10 cross-validation, the main effects of the intensification sequence, degree of overlap (N_b), and dimension reduction condition, as well as their interactions, were significant. Indeed, according to the results shown in Figure 8 (c), the overlapped ensemble LDA classifier without dimension reduction (OLDA) did not achieve higher classification accuracies than a single LDA classifier (N_b = 5) in many cases. Applying a dimension reduction method might in itself seem a solution to improve the classification performance of the ensemble classifier with naive partitioning. However, as shown in Figures 8 (a) and (b), when N_b = 1, the dimension reduction method alone did not improve the classification accuracy compared to the corresponding single classifiers.
On the other hand, as also shown in Figures 8 (a) and (b), the overlapped ensemble LDA classifier together with the stepwise method (OSWLDA, N_b = 4) or PCA (OPCALDA, N_b = 4) achieved higher classification accuracy than the corresponding single classifiers (N_b = 5). This tendency was especially clear for OSWLDA. Thus, the improvement in classification accuracy was due not only to the dimension reduction or the partitioning method by themselves but also to their mutual effect. In other words, the overlapped partitioning method, together with a dimension reduction method, effectively improved the classification performance of the P300-based BCI. The performance improvement of the proposed classifiers compared to the single classifier was small when a large amount of training data was provided. Nevertheless, the classification performances of the proposed classifiers trained on a large amount of data were high, reaching 99.6% for data set A (see Table 5) and 98% for data set B (see Table 9). In those cases, however, no major performance improvement caused by overlapped partitioning was confirmed, because the training data were large enough that overfitting should not occur in most cases. Thus, the advantage of overlapped partitioning appears when a small amount of high-dimensional training data is provided, as in the analysis of data set A using 1/10 cross-validation and data set B with limited training data.
We suggest using conventional cross-validation to find the optimal overlapping ratio N_b / N_c before an online experiment. However, this prolongs the training time for the classifier. Alternatively, we suggest using N_b / N_c ≈ 0.8 (e.g., N_b = 4 and N_c = 5) because it showed near-optimal results for both data sets. In the small training data case (900 ERPs), OSWLDA and OPCALDA with N_b / N_c = 0.8 (N_b = 4 and N_c = 5) were near-optimal for both data sets A and B, but OLDA with N_b / N_c = 0.8 performed as well only for data set A. In the large training data case, OSWLDA, OPCALDA, and OLDA with N_b / N_c ≈ 0.78 (N_b = 35 and N_c = 45) evaluated on data set A and with N_b / N_c ≈ 0.82 (N_b = 14 and N_c = 17) evaluated on data set B achieved reasonable classification accuracies. In this way, the overlapping ratio N_b / N_c ≈ 0.8 was near-optimal and can be employed to avoid the cross-validation.
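The suggested cross-validation search for the overlapping ratio can be sketched as follows. This is an illustrative scikit-learn implementation on synthetic two-class data, assuming the circular-block partitioning described earlier and a summed-score ensemble decision; the actual study uses ERP features and the scoring of Equation 2:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold

def overlapped_partitions(idx, n_c, n_b):
    # Partition i = union of n_b circularly consecutive blocks of indices.
    blocks = np.array_split(idx, n_c)
    return [np.concatenate([blocks[(i + k) % n_c] for k in range(n_b)])
            for i in range(n_c)]

def ensemble_cv_accuracy(X, y, n_c, n_b, folds=5):
    """Cross-validated accuracy of an LDA ensemble whose weak-learner
    scores are summed before thresholding."""
    accs = []
    cv = StratifiedKFold(folds, shuffle=True, random_state=0)
    for tr, te in cv.split(X, y):
        score = np.zeros(len(te))
        for part in overlapped_partitions(tr, n_c, n_b):
            lda = LinearDiscriminantAnalysis().fit(X[part], y[part])
            score += lda.decision_function(X[te])
        accs.append(np.mean((score > 0) == y[te]))
    return float(np.mean(accs))

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 600)
X = rng.normal(size=(600, 20)) + 0.5 * y[:, None]   # toy two-class features
best_nb = max(range(1, 6), key=lambda nb: ensemble_cv_accuracy(X, y, 5, nb))
```

In an online setting this search would run once on the recorded training session, after which only the chosen N_b is used.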
This study showed for the first time that ensemble LDA classifiers with conventional naive partitioning were not effective compared to the single LDA classifier and the ensemble classifier with overlapped partitioning when 900 training data were given. This result implies that the ensemble LDA classifier with naive partitioning requires a longer training session, to collect more than 900 training data, before an online experiment. It should be noted that 900 training data is the smallest amount used for the evaluation of an ensemble classifier to date. In contrast, the ensemble classifiers with the proposed overlapped partitioning method showed a significant improvement in classification accuracy, better even than a single classifier when the stepwise method or PCA was applied for dimension reduction. Thus, overlapped partitioning was shown to be more practical than naive partitioning when the given training data are small (e.g., 900 training data).
The performance deterioration of the ensemble LDA classifiers with naive partitioning may be due to poor estimation of the covariance matrices of the LDA weak learners. Such deterioration can be seen in the results of OLDA on data set A using 1/10 cross-validation (N_c = 5, N_b = 1), OLDA on data set A using 9/10 cross-validation, OSWLDA and OPCALDA on data set B with limited training data (N_c = 5, N_b = 1, 2), OLDA on data set B with limited training data, and OLDA on data set B with full training data (N_c = 17, N_b = 1). The problem appears when N_b = 1 or 2 because only a small amount of training data is provided to each weak learner (see Tables 2 and 3). Regarding data set B, 900 training data were not sufficient to train the weak learners of OLDA (N_c = 5, N_b ≤ 5 with limited training data and N_c = 17, N_b = 1 with full training data). Compared to data set A, data set B seems to require more training data because its EEG data were of higher dimension (896 dimensions). Estimated covariance matrices are imprecise when a small amount of high-dimensional training data is given [22]. Johnson and Krusienski first evaluated the classification performance of the ensemble SWLDA classifier with naive partitioning [27]. They evaluated the algorithm by changing the number of classifiers (N_c was varied while N_b was fixed to 1), and additionally compared three weighting methods for the ensemble classifier. They found that the ensemble SWLDA classifier showed better performance than the single SWLDA classifier for some participants, though no statistical difference was revealed. They also discussed that the classification performance decreased when N_c > 6 and N_b = 1 because the amount of training data for a weak learner becomes small. We consider that a similar problem arose in the application of the ensemble classifier with overlapped partitioning when N_c = 5 and N_b = 1, which is similar to their conditions.
Such a problem can be avoided by applying the overlapped partitioning together with a dimension reduction method.
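The covariance problem is easy to reproduce. The snippet below is a self-contained illustration (random data standing in for ERP features): with 180 samples of 896-dimensional data, roughly a weak learner's share of 900 ERPs under N_c = 5 and N_b = 1, the sample covariance matrix cannot reach full rank, so plain LDA must invert a singular estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_dims = 180, 896          # one weak learner's data; data set B dimensionality
X = rng.normal(size=(n_samples, n_dims))

cov = np.cov(X, rowvar=False)         # 896 x 896 sample covariance
rank = int(np.linalg.matrix_rank(cov))
# rank is at most n_samples - 1 = 179, far below 896: the matrix is
# singular, and the inverse needed by LDA is undefined without
# regularization or dimension reduction.
```

Reducing the dimensionality below the per-learner sample count (stepwise selection or PCA) restores an invertible covariance estimate, which is exactly the mutual effect discussed above.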
The ensemble classifiers with overlapped partitioning trained on 900 ERPs showed better classification performances than a single classifier at intermediate intensification sequences in the offline analysis. According to Figure 8 (a), OSWLDA (N_b = 4) achieved higher classification accuracy than the single SWLDA classifier (N_b = 5) for 3 ≤ N_s ≤ 5. OPCALDA (N_b = 4) also showed higher classification accuracy than the single classifier. Performance saturates as N_s becomes larger, while the classification performance is unreliable when N_s is small; in both cases, differences between classification performances were hard to confirm. This might explain why the performance difference was most obvious at an intermediate number of sequences. The selection of the number of intensification sequences in an online P300-based BCI experiment depends on the application of the BCI system. One criterion is the information transfer rate (ITR), which takes the accuracy, the number of outputs, and the output time (the number of sequences) into consideration [35]. OSWLDA on data set A using 1/10 cross-validation (N_b = 4) achieved the highest ITR (15.7 bits per minute) at N_s = 3, although only a 71.4% accuracy would be expected in an online experiment. On the other hand, accuracy must be prioritized, for example, when the BCI is used for precise control of a robotic manipulator that could be dangerous. To decide parameters such as the number of intensification sequences, we should consider which criterion (accuracy, speed, or ITR) should be optimized for the target BCI application.
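The ITR criterion [35] can be computed directly from Wolpaw's definition. A minimal sketch follows; the conversion to bits per minute needs the selection time, which depends on stimulus timing and N_s and is therefore left as a parameter here.

```python
from math import log2

def wolpaw_bits(n, p):
    """Bits conveyed per selection for an n-class interface with
    classification accuracy p (Wolpaw ITR definition [35]); valid
    for 0 < p <= 1."""
    if p >= 1.0:
        return log2(n)
    return log2(n) + p * log2(p) + (1 - p) * log2((1 - p) / (n - 1))

def itr_bits_per_minute(n, p, selection_time_s):
    return wolpaw_bits(n, p) * 60.0 / selection_time_s

# 36-class speller at the 71.4% accuracy reported for N_s = 3:
bits = wolpaw_bits(36, 0.714)   # about 2.84 bits per selection
```

With these numbers, the reported 15.7 bits per minute corresponds to a selection time of roughly 10.9 s (2.84 × 60 / 15.7).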
The amount of training data also determines the expected online classification accuracy. If the system needs over 70% mean classification accuracy, only 900 training data are required; if over 95% mean accuracy is required, a large amount of training data should be prepared. Most BCI applications do not require over 95% classification accuracy because they are free from danger. Thus, 900 training data are sufficient to achieve over 70% mean accuracy for most BCI applications.
We would like to emphasize that the ensemble classifiers with overlapped partitioning required less training data than those with naive partitioning. OSWLDA and OPCALDA performed well enough to achieve over 90% classification accuracy using only 900 training data, outperforming the ensemble classifier with naive partitioning. In particular, the mean classification accuracy of OSWLDA (N_c = 5, N_b = 4) with the small training data was comparable to that of the ensemble SWLDA with naive partitioning (N_c = 5, N_b = 1) for data set A. In this way, the ensemble classifier with overlapped partitioning requires fewer training samples than that with naive partitioning, which could make lengthy and costly training sessions unnecessary.
In this research, PCA and the stepwise method were applied for dimension reduction. The two have different statistical properties: PCA finds the projection that maximizes the data variance, while the stepwise method selects spatiotemporal variables. Although no great difference in classification accuracy was found for data set A using 1/10 and 9/10 cross-validation or for data set B with full training data, OSWLDA showed better performance than OPCALDA for data set B with limited training data. In this sense, the stepwise method was robust for both P300-based BCI data sets. The two methods also differ in their test-time computational cost, both online and offline: the stepwise method requires a smaller processing burden than PCA because it does not project the test data. This difference becomes more pronounced as N_c grows. Considering the computational cost, the stepwise method is preferable when a large number of classifiers is required.
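The test-time difference can be seen in a few lines. This is an illustrative sketch only (the retained indices and the 60-component PCA size are hypothetical): the stepwise method merely indexes the features its final model kept, while PCA must multiply every test vector by the full projection matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=896)                    # one test ERP feature vector
selected = np.array([3, 41, 100, 512])      # indices kept by a stepwise model (illustrative)
W = rng.normal(size=(896, 60))              # PCA projection matrix (hypothetical 60 components)

x_step = x[selected]   # stepwise at test time: indexing, O(len(selected))
x_pca = x @ W          # PCA at test time: matrix-vector product, O(896 * 60)
```

In an ensemble, this indexing-versus-projection cost is paid once per weak learner, which is why the gap widens with N_c.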
In future research, LDA with shrinkage [22] or Bayesian LDA [32] will be applied to the ensemble classifier with overlapped partitioning. These two methods estimate covariance matrices in ways that make LDA itself robust against a lack of training data, so combining them with overlapped partitioning may achieve better classification accuracy with an even smaller amount of training data. The proposed ensemble classifiers with overlapped partitioning may also be applicable to other types of BCIs, such as event-related desynchronization/synchronization (ERD/ERS)-based BCIs [42]; some ensemble classifiers for ERD/ERS-based BCIs have already been evaluated [43], and our overlapped ensemble classifiers might be applicable there as well. Moreover, the ensemble classifier with overlapped partitioning can be used in other pattern recognition problems, e.g., cancer classification [44] or fMRI data analysis [45]. Furthermore, clustering algorithms such as k-means clustering [46] could be used to build a new overlapped partitioning for the ensemble classifiers. By clustering the data with overlaps, classifiers that perform well for specific features can be trained, so clustered partitioning with overlaps may show an even better classification performance.
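As a pointer to that future direction, shrinkage LDA is already available in scikit-learn. The snippet below is a minimal sketch on synthetic data (not the study's pipeline), showing that Ledoit-Wolf shrinkage keeps LDA usable when samples are scarcer than features:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 120)
X = rng.normal(size=(120, 300)) + 0.3 * y[:, None]   # 120 samples, 300 features

# 'lsqr' with Ledoit-Wolf shrinkage regularizes the covariance estimate,
# avoiding the singular-matrix problem of plain LDA in this regime.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
train_acc = lda.score(X, y)
```

Each weak learner in an overlapped ensemble could be replaced by such a shrinkage LDA without changing the partitioning or scoring scheme.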

Conclusion
In this study, ensemble LDA classifiers with the newly proposed overlapped partitioning method were evaluated on our original P300-based BCI data set and the BCI competition III data set II. In the comparison, the classifiers were trained on limited training data (900) and on large training data. The ensemble LDA classifier with traditional naive partitioning and the single classifier were also evaluated, and one of three conditions for dimension reduction (stepwise, PCA, or none) was applied. As a result, the ensemble LDA classifier with overlapped partitioning and the stepwise method (OSWLDA) showed higher accuracy than the commonly used single SWLDA classifier and the ensemble SWLDA classifier with naive partitioning when 900 training data were available. In addition, the ensemble LDA classifiers with naive partitioning showed the worst performance under most conditions. For an online implementation, we suggest using the stepwise method for dimension reduction. In future research, LDA with shrinkage or Bayesian LDA will be applied to the ensemble classifier with overlapped partitioning.