
Comparing P300 flashing paradigms in online typing with language models

  • Nand Chandravadia,

    Roles Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Computer Science, Columbia University, New York, NY, United States of America

  • Shrita Pendekanti,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America

  • Dustin Roberts,

    Roles Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America

  • Robert Tran,

    Roles Data curation, Formal analysis, Validation, Writing – review & editing

    Affiliation Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America

  • Saarang Panchavati,

    Roles Formal analysis, Validation, Writing – review & editing

    Affiliation Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America

  • Corey Arnold,

    Roles Conceptualization, Formal analysis, Supervision, Writing – review & editing

    Affiliation Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America

  • Nader Pouratian,

    Roles Conceptualization, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Neurological Surgery, University of Texas, Southwestern, Dallas, TX, United States of America

  • William Speier

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

    Speier@ucla.edu

    Affiliation Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America

Abstract

The P300 Speller is a brain-computer interface system that allows people with motor neuron diseases to regain the ability to communicate by typing characters into a computer by thought. Because the system has a relatively slow typing speed, different stimulus presentation paradigms have been proposed to let users input information faster, either by reducing the number of required stimuli or by increasing signal fidelity. This study compares the typing speeds of the Row-Column, Checkerboard, and Combinatorial Paradigms to examine how their performance compares in online and offline settings. When tested in conjunction with other established optimization techniques such as language models and dynamic stopping, the different flashing patterns did not make a significant impact on P300 speller performance. This result could indicate that further performance improvements to the system lie beyond optimizing flashing patterns.

Introduction

People with amyotrophic lateral sclerosis (ALS), brain-stem stroke, and other upper motor neuron diseases lack the ability to vocalize their thoughts and emotions. With a sustained loss of speech, their capacity to write, speak, and laugh is irreversibly impacted. However, the advent of augmentative and alternative communication (AAC) devices, such as brain-computer interfaces (BCIs), has provided a possible avenue to restore their ability to communicate with the external world.

The P300 speller, an electroencephalogram (EEG)-based BCI, translates neural signals recorded from the scalp into communication in the form of virtual commands on a computer screen [1]. This system uses the P300 signal, an endogenous event-related potential (ERP) characterized by a positive deflection approximately 300 milliseconds after stimulus presentation [2]. In the system first introduced by Farwell and Donchin, users attend to a 6x6 matrix of alphanumeric characters. The user focuses on one character while the rows and columns of the matrix flash in random order. Because the target character flashes relatively infrequently within a stream of repeated non-target stimuli, attending to it elicits a P300 signal, according to the "oddball" paradigm. The P300 signal observed in the EEG is then classified to determine which character on the matrix was selected. Although the P300 signal is robust, these systems generally have relatively slow typing speeds. Therefore, many studies have focused on system optimization to improve overall speed.
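As a minimal sketch of how the row-column paradigm localizes a character, assume each row and column flash receives a classifier score (higher meaning more P300-like) and that scores from repeated flashes are simply summed. The grid layout, the `rcp_select` helper, and the summation rule are illustrative assumptions, not the classifier used in this study (described in the Classifier section).

```python
from collections import defaultdict

# Hypothetical 6x6 layout: A-Z, digits 1-9, and "_" standing in for space.
GRID = [list("ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789_"[r * 6:(r + 1) * 6]) for r in range(6)]

def rcp_select(grid, flashes):
    """Pick the character at the intersection of the best-scoring row and column.

    flashes: iterable of (kind, index, score) with kind "row" or "col".
    Scores from repeated flashes of the same row or column are summed.
    """
    totals = {"row": defaultdict(float), "col": defaultdict(float)}
    for kind, index, score in flashes:
        totals[kind][index] += score
    best_row = max(totals["row"], key=totals["row"].get)
    best_col = max(totals["col"], key=totals["col"].get)
    return grid[best_row][best_col]

# Two sets of flashes in which only row 2 and column 3 look like targets.
flashes = [(kind, i, 1.0 if (kind, i) in {("row", 2), ("col", 3)} else 0.0)
           for _ in range(2) for kind in ("row", "col") for i in range(6)]
print(rcp_select(GRID, flashes))
```

Each character lies in exactly one row and one column, so two flashes per set (12 flash groups in total) are enough to address all 36 cells.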

System optimization studies have traditionally focused on enhancing specific components of the P300 speller. For instance, Allison et al. modified the matrix size, demonstrating that increasing the size of the matrix increases the amplitude of the P300 signal [3]. Lu et al. evaluated the inter-stimulus interval (ISI), suggesting that a longer ISI yields both higher online accuracy and a higher selection rate [4]. Townsend et al. and Jin et al. each developed novel flashing patterns, demonstrating significant improvements in bit rate and practical bit rate over the traditional row-column paradigm (RCP) [5, 6]. Recent work has shown that a viable strategy for enhancing system performance is to combine distinct optimization techniques into a single method. For instance, Speier et al. integrated a 'famous faces' stimulus paradigm with a previously published particle filtering algorithm, establishing that the combined approach outperforms either approach alone [7, 8].

This study surveys the differences in system performance between three proposed flashing patterns: Row-Column Paradigm (RCP), Checkerboard Paradigm (CBP), and the Combinatorial Paradigm (COMB) along with the integration of a language model using a particle filtering algorithm [8]. We hypothesize that the improvements offered by different flashing patterns are negligible in comparison to those from the incorporation of a language model, and therefore that improvements to BCI performance lie outside of flashing pattern optimization.

Checkerboard paradigm

The checkerboard paradigm (CBP) was introduced to reduce the errors associated with the RCP while improving overall BCI performance [5]. The goal of the CBP was therefore to design a novel flashing pattern that addressed two constraints of the RCP: the adjacency effect and the double-flash problem [9, 10]. The adjacency effect describes situations in which flashes of a row or column adjacent to the target (i.e., containing only non-target characters) draw the user's attention, producing false-positive P300 signals and ultimately erroneous detection of the intended character [9]. The double-flash problem arises because, in a random sequence, a flash of the target's row can be immediately followed by a flash of its column (or vice versa), which can decrease the temporal resolution of the P300 signal [10]. Because the oddball paradigm requires the presentation of deviant (i.e., random) stimuli, such consecutive flashes can impair detection of the second flash: only the first of the two target flashes will elicit the P300 signal, and the second will not. Kanwisher reported this observation as the repetition blindness phenomenon [11]. In a standard rapid serial visual presentation (RSVP) task, consecutive stimuli presented less than 500 milliseconds apart impair recognition of the succeeding stimulus. In our P300 speller, the flash duration and the ISI are both 62.5 milliseconds, so a second flash begins 125 milliseconds after the onset of the preceding flash, diminishing the user's ability to detect it. The CBP sought to mitigate these issues through its stimulus design.
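The double-flash problem can be quantified with a short simulation. Assuming each RCP set is an independent, uniformly random ordering of the 6 rows and 6 columns (and ignoring double flashes that straddle two sets), the target's row and column flash back to back, one 125 ms SOA apart, with probability 2/12 = 1/6 per set. The function name and simulation setup are illustrative assumptions.

```python
import random

SOA_MS = 62.5 + 62.5  # flash duration + ISI, as in this study

def double_flash_rate(n_rows=6, n_cols=6, target=(2, 3), n_sets=20000, seed=1):
    """Estimate how often the target's row and column flash consecutively in one RCP set.

    Assumes each set is an independent, uniformly random order of all rows and columns
    and ignores double flashes across the boundary between two sets.
    """
    rng = random.Random(seed)
    groups = [("row", r) for r in range(n_rows)] + [("col", c) for c in range(n_cols)]
    row_flash, col_flash = ("row", target[0]), ("col", target[1])
    hits = 0
    for _ in range(n_sets):
        order = groups[:]
        rng.shuffle(order)
        if abs(order.index(row_flash) - order.index(col_flash)) == 1:
            hits += 1  # consecutive flashes, one SOA apart
    return hits / n_sets

print(f"simulated {double_flash_rate():.3f} vs exact 2/12 = {2 / 12:.3f}; gap {SOA_MS} ms < 500 ms")
```

The exact value follows because two given items are adjacent in 2·(n-1)! of the n! orderings, i.e. with probability 2/n for n = 12 flash groups.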

The CBP superimposes an imaginary checkerboard over the matrix so that each pair of adjacent characters belongs to different classes [5]. Because a checkerboard has an alternating pattern of two colors, the characters are grouped into two distinct classes. The characters of each class then randomly populate one of two corresponding virtual matrices, which the user never observes. These virtual matrices determine the stimulus pattern for each trial (i.e., each target selection). During each target selection, the rows of both virtual matrices are flashed, followed by the columns of both virtual matrices. As the rows and columns of the virtual matrices are flashed, the corresponding characters on the real matrix are presented to the user. This methodology reduces the adjacency effect by ensuring that adjacent characters never flash simultaneously, and further safeguards against sequential flashes (i.e., double flashes). However, it requires a larger number of flashes to distinguish between all of the characters in the grid. Fig 1 depicts a schematic of the CBP.
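A minimal sketch of the CBP grouping for the 6x6 grid used here follows. Each checkerboard class holds 18 of the 36 characters; shaping each virtual matrix as 3x6 is an assumption (the paper does not state the shape), as are the helper names. The code checks the two properties described above: no flash group contains orthogonally adjacent characters, and each character is identified by a unique pair of flashes.

```python
import random

def checkerboard_classes(n_rows, n_cols):
    """Split grid positions (row, col) into the two colours of a checkerboard."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return ([p for p in cells if sum(p) % 2 == 0],
            [p for p in cells if sum(p) % 2 == 1])

def virtual_matrix(cells, v_rows, v_cols, rng):
    """Randomly place one colour class into a hidden v_rows x v_cols matrix."""
    if len(cells) != v_rows * v_cols:
        raise ValueError("virtual matrix shape must hold the whole class")
    shuffled = cells[:]
    rng.shuffle(shuffled)
    return [shuffled[i * v_cols:(i + 1) * v_cols] for i in range(v_rows)]

def cbp_flash_groups(n_rows=6, n_cols=6, v_rows=3, v_cols=6, seed=0):
    """Flash groups for one trial: rows of both virtual matrices, then their columns."""
    rng = random.Random(seed)
    matrices = [virtual_matrix(cls, v_rows, v_cols, rng)
                for cls in checkerboard_classes(n_rows, n_cols)]
    groups = [list(row) for m in matrices for row in m]
    groups += [[m[r][c] for r in range(v_rows)] for m in matrices for c in range(v_cols)]
    return groups

def orthogonally_adjacent(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

groups = cbp_flash_groups()
print(len(groups), "flashes per set")  # versus 12 for the RCP
```

Under this assumed shape a set has 2 x (3 + 6) = 18 flashes, which illustrates why the CBP needs more flashes than the RCP; a new random assignment would be drawn for each trial.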

Fig 1. Checkerboard paradigm (CBP) in a 6x6 matrix.

Left: Matrix with imaginary checkerboard superimposed over it; adjacent characters are assigned to different classes. Center: Characters arranged in a virtual matrix, each matrix represents a different class. In this example, the first column has been selected from the top matrix. Right: Actual display seen by user; a face is flashed on top of the selected characters from the virtual matrices.

https://doi.org/10.1371/journal.pone.0303390.g001

Combinatorial paradigm

The Combinatorial Paradigm (COMB) proposed by Jin et al. uses mathematical combinations to minimize the number of flashes per trial, with the intention of optimizing the practical bit rate (PBR) of the system [6]. Reducing the number of flashes per trial would hypothetically improve the selection rate, because fewer flashes are needed for classification, leading to an improved PBR while preserving the P300 amplitude. The goal of the COMB paradigm was therefore to optimize the number of target flashes to improve the efficiency of the system. To choose an optimal number of flashes per trial, Jin et al. used the binomial coefficient of the $x^k$ term of $(1 + x)^n$, i.e., $\binom{n}{k}$, where n is the total number of flashes per trial and k is the number of flashes on the target character. A schematic of COMB is depicted in Fig 2.
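For reference, the binomial coefficient counts the distinct k-flash codes available in an n-flash trial, and hence the number of characters a pattern can address:

```latex
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad
\binom{7}{3} = \frac{7 \cdot 6 \cdot 5}{3!} = 35, \qquad
\binom{9}{2} = \frac{9 \cdot 8}{2!} = 36
```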

Fig 2. Combinatorial paradigm (COMB) with $\binom{9}{2}$ flashing pattern.

Left: Each character is assigned a unique, two-number identifier corresponding to the flashes in which it will be highlighted. For example, (1,3) indicates that the character will be flashed in the first and third flash. For simplicity, the characters in this figure are assigned indices sequentially; in practice, the assignment would be random. In this case, we depict the third flash, so all characters corresponding to the number 3 are flashed. Right: The output seen by a user; a face is flashed over characters that are assigned flash index 3.

https://doi.org/10.1371/journal.pone.0303390.g002

Since a traditional 6x6 matrix holds up to thirty-six characters, Jin et al. proposed using 7-flash and 9-flash patterns, which locate thirty-five and thirty-six characters, respectively. The 7-flash pattern is based on the combination $\binom{7}{3}$, meaning that each trial has seven flashes and the target character flashes three times. Here, a single trial refers to one set of stimuli for a single target selection. Therefore, selecting the character "A" in a single trial should elicit three P300 responses in a set of seven flashes. Since $\binom{7}{3} = 35$, the 7-flash pattern locates thirty-five characters in a traditional 6x6 matrix. The 9-flash pattern, on the other hand, is based on the combination $\binom{9}{2}$, so each trial has nine flashes and the target character flashes twice. Since $\binom{9}{2} = 36$, the corresponding flashing pattern locates thirty-six characters on the traditional 6x6 matrix. The 9-flash pattern therefore locates the same number of characters as the RCP, while the 7-flash pattern locates one fewer. Likewise, a 12-flash pattern, which mirrors the RCP, can be modeled as "12 choose 2", $\binom{12}{2}$: twelve flashes per trial, with the target character flashing twice. Compared with the 7- and 9-flash patterns, the 12-flash pattern uses 71.43% and 33.33% more flashes per trial, respectively.
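A sketch of the combinatorial assignment, assuming codes are drawn at random from all k-subsets of the n flash indices and that a character is decoded by summing the classifier scores of the flashes in its code. The helper names and the summation decoder are illustrative assumptions, not Jin et al.'s implementation.

```python
import random
from itertools import combinations

def comb_codes(n_flashes, k, n_chars, seed=0):
    """Randomly assign each character a distinct k-subset of flash indices 1..n_flashes."""
    codes = list(combinations(range(1, n_flashes + 1), k))
    if n_chars > len(codes):
        raise ValueError(f"C({n_flashes},{k}) = {len(codes)} < {n_chars} characters")
    random.Random(seed).shuffle(codes)
    return codes[:n_chars]

def flash_groups(codes, n_flashes):
    """Map each flash index to the characters (positions in `codes`) lit on that flash."""
    return {i: [ch for ch, code in enumerate(codes) if i in code]
            for i in range(1, n_flashes + 1)}

def decode(codes, flash_scores):
    """Return the character whose code has the largest summed classifier score."""
    totals = [sum(flash_scores[i] for i in code) for code in codes]
    return max(range(len(codes)), key=totals.__getitem__)

codes_9 = comb_codes(9, 2, 36)   # 9-flash pattern: every character flashes twice
groups_9 = flash_groups(codes_9, 9)
```

With 36 characters every 2-subset of the nine flashes is used, so each flash highlights 8 characters; `comb_codes(7, 3, 35)` gives the 7-flash variant. Because two distinct codes share at most one flash, a noise-free target response sums to 2 while every other character sums to at most 1.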

Methods

Subjects

Ten healthy subjects (6 male, 4 female) aged 20 to 35 years participated in this study. All subjects were cognitively intact with no noticeable neurological deficits. All subjects formally consented to participate. This study was approved by the Institutional Review Board (IRB) at UCLA.

Data collection

EEG data were collected with a 32-electrode cap (g.GAMMAcap2, Guger Technologies), and signals were amplified with two 16-channel g.tec biosignal amplifiers (Guger Technologies). Signals were sampled at 256 Hz, referenced to the left ear, grounded to AFz, and band-pass filtered from 0.1 to 60 Hz. BCI2000, a general-purpose BCI research platform, was used for stimulus presentation and data collection [12]. Users were presented with a 6x6 matrix of alphanumeric characters with 'famous faces' flashes [13]. Three distinct flashing patterns, RCP, CBP, and COMB, were presented to each user to assess differences in performance (Table 1). Each flash lasted 62.5 ms with a 62.5 ms ISI, yielding a 125 ms stimulus onset asynchrony (SOA). Subjects completed two training sessions for each flashing pattern, for a total of six sessions per subject. Each session consisted of 10 characters, specifically "THE QUICK" and "BROWN FOX" (including the spaces). Each character was presented with 10 sets of flashes, so each character was flashed 20 times, with a 3.5 s interval between characters.
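To make the timing concrete, the arithmetic below computes the time per character when all 10 sets of flashes are shown, i.e., the worst case that dynamic stopping (see Classifier) can shorten. These are derived figures, not measured results; the RCP and 9-flash COMB counts follow from the text, while the CBP count assumes two 3x6 virtual matrices.

```python
FLASH_MS = 62.5
ISI_MS = 62.5
SOA_MS = FLASH_MS + ISI_MS  # 125 ms stimulus onset asynchrony

# Flashes per set; the CBP value is an assumption (2 virtual matrices x (3 rows + 6 columns)).
FLASHES_PER_SET = {"RCP": 12, "COMB": 9, "CBP": 18}

def seconds_per_selection(flashes_per_set, n_sets=10, pause_s=3.5):
    """Time for one character if every set is shown (no dynamic stopping)."""
    return flashes_per_set * n_sets * SOA_MS / 1000 + pause_s

def selections_per_minute(flashes_per_set, n_sets=10, pause_s=3.5):
    """Upper-bound selection rate implied by a fixed number of sets."""
    return 60 / seconds_per_selection(flashes_per_set, n_sets, pause_s)

for name, n in FLASHES_PER_SET.items():
    print(f"{name}: {seconds_per_selection(n):.2f} s/char, "
          f"{selections_per_minute(n):.2f} char/min")
```

Under these assumptions a full 10-set selection takes 18.5 s with the RCP and 14.75 s with the 9-flash COMB; when dynamic stopping ends a selection after fewer sets, the gap between patterns shrinks proportionally.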

Table 1. Cross-subject mean offline selection rate (SR), accuracy (ACC), and information transfer rate (ITR) for each flashing pattern and classifier.

https://doi.org/10.1371/journal.pone.0303390.t001

Training data from each session were used to train the classifier for the corresponding flashing pattern. If classification performance on the calibration data reached a predefined benchmark, online testing was performed with that flashing pattern. Classification performance met this benchmark for all subjects, so every subject completed three online testing sessions. The order of the online sessions was randomized across flashing patterns to mitigate order and familiarity effects. Classification was performed using a previously established particle filtering (PF) algorithm [8].

Language model

In this study, we use a probabilistic automata model as described by Speier et al. [8]. The model employs a directed graph with a state for each prefix of a word in the corpus, beginning with a blank root node. Each node is connected by directed edges to the nodes that extend its string by one character. For example, if the model only contained the word "CAR," it would have four states: the root node representing a blank string, "C," "CA," and "CAR." When the word "CAKE" is added to the model, it shares the root node and the "C" and "CA" states, and adds two additional states: "CAK" and "CAKE." The state "CA" then links to both "CAR" and "CAK." If a state represents a completed word, it also links back to the root, from which a new word can begin. The state "CAR," for instance, links to the root because "CAR" is a complete word, but it is also the beginning of other words, so it has additional links to states such as "CARD" or "CART." Transition probabilities between nodes were determined by the relative frequencies of substrings in the Brown English language corpus [14]. For instance, the probability of typing the letter "R" after "CA" has already been entered is the number of occurrences of words that begin with "CAR" divided by the number of occurrences of words that begin with "CA" in the corpus. Similarly, the probability that a word ends and the state transitions back to the root is the number of times that word occurs in the corpus divided by the number of word occurrences starting with that substring.
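A minimal sketch of these transition probabilities, using a made-up word-count table in place of the Brown corpus; `prefix_counts` and `transitions` are hypothetical helpers. Each state is a word prefix, and a state that is a complete word can also return to the root, shown here as `<end>`.

```python
from collections import Counter

def prefix_counts(word_counts):
    """Count corpus occurrences of every word prefix (the automaton's states)."""
    counts = Counter()
    for word, n in word_counts.items():
        for i in range(len(word) + 1):  # i = 0 gives the blank root state ""
            counts[word[:i]] += n
    return counts

def transitions(state, word_counts):
    """Transition probabilities out of `state`; "<end>" means back to the root."""
    counts = prefix_counts(word_counts)
    total = counts[state]
    if total == 0:
        return {}
    probs = {prefix[-1]: n / total
             for prefix, n in counts.items()
             if len(prefix) == len(state) + 1 and prefix.startswith(state)}
    if word_counts.get(state, 0):
        probs["<end>"] = word_counts[state] / total
    return probs

TOY_COUNTS = {"CAR": 3, "CARD": 1, "CAKE": 2, "CAT": 4}  # made-up counts
print(transitions("CA", TOY_COUNTS))
print(transitions("CAR", TOY_COUNTS))
```

Probabilities out of each state sum to 1: from "CAR" the word ends with probability 3/4 or continues to "CARD" with probability 1/4, matching the ratios described above.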

Classifier

Determining the probability distribution over all possible strings in real time is computationally impractical. Instead, a PF classifier was used to estimate the distribution by sampling a batch of possible output strings. Each possible string is a particle that has a pointer to a node in the language model. Particles move independently through the model according to its transition probabilities. Particle weights are updated based on the EEG responses, and during resampling, high-probability particles replace low-probability ones. The final distribution is estimated from the proportion of particles that point to each state after they have passed through the model.

Stepwise linear discriminant analysis (SWLDA) was used to identify the signal features used to determine which character the user intended to type. At training time, ordinary least squares regression was used to predict whether each flash response was attended or non-attended. Features are added to or removed from the model based on their statistical significance until a maximum number of features is reached or the feature set converges. At testing time, the score for flash i at time t, $y_t^i$, is given by the dot product of the feature weight vector and the features derived from that flash's EEG response. Given the target character, the scores can be approximated as independent samples from a normal distribution [10]:

$$p(y_t^i \mid x_t) = \begin{cases} \mathcal{N}(y_t^i;\ \mu_a, \sigma_a^2), & x_t \in A_t^i \\ \mathcal{N}(y_t^i;\ \mu_n, \sigma_n^2), & x_t \notin A_t^i \end{cases} \tag{1}$$

where $\mu_a$, $\sigma_a^2$, $\mu_n$, and $\sigma_n^2$ are the means and variances of the distributions for the attended and non-attended flashes, respectively, and $A_t^i$ is the set of characters highlighted in flash i. The conditional probability of the target character at time t, given the EEG scores $\mathbf{y}_t$ and the previous target characters $x_{0:t-1}$, can then be found:

$$p(x_t \mid \mathbf{y}_t, x_{0:t-1}) = \frac{p(x_t \mid x_{0:t-1}) \prod_i p(y_t^i \mid x_t)}{\sum_{x} p(x \mid x_{0:t-1}) \prod_i p(y_t^i \mid x)} \tag{2}$$

where $p(x_t \mid x_{0:t-1})$ is the prior probability of character $x_t$ given the previously selected characters, derived from the language model. The distribution over all output strings is approximated by particle filtering. In PF, we first generate a fixed number of samples, or "particles," to estimate the distribution. Each particle j consists of four elements: a pointer to a state in the language model, $x_t^{(j)}$; a sequence representing the states in the particle's history, $x_{0:t}^{(j)}$; an index m denoting the most recent time point at which the particle was at the root node; and an associated weight, $w^{(j)}$. At initialization, we generate P particles that point to the root node, with no history and a uniform weight $w^{(j)} = 1/P$. When a new character begins, a character is sampled for each particle from its proposal distribution, which is defined by the transition probabilities of the language model and the particle's history $x_{0:t-1}^{(j)}$:

$$x_t^{(j)} \sim q(x_t \mid x_{0:t-1}^{(j)}) = p(x_t \mid x_{m:t-1}^{(j)}) \tag{3}$$

where $p(x_t \mid x_{m:t-1}^{(j)})$ is the language-model transition probability from the particle's current state, i.e., the same language-model prior used in Eq 2.

After each stimulus response, the score for that response, $y_t^i$, is computed and the weight of each particle is updated:

$$w^{(j)} \leftarrow w^{(j)}\, p(y_t^i \mid x_t^{(j)}) \tag{4}$$

where $p(y_t^i \mid x_t^{(j)})$ is computed as in Eq 1. The weights are then normalized, and the probability of an output string is found by summing the weights of all particles that correspond to that string:

$$\hat{p}(x_{0:t} \mid \mathbf{y}_{0:t}) = \sum_{j=1}^{P} w^{(j)}\, \delta(x_{0:t}, x_{0:t}^{(j)}) \tag{5}$$

where δ is the Kronecker delta. Dynamic classification was implemented by setting a threshold probability, $p_{\text{thresh}}$, to determine when a decision should be made. The system continues flashing until either the maximum probability exceeds the threshold or the number of sets of flashes reaches the maximum (10 sets). The classifier then selects the string $\hat{x}_{0:t} = \arg\max_{x_{0:t}} \hat{p}(x_{0:t} \mid \mathbf{y}_{0:t})$. If this output differs from the previous output, the older characters are treated as errors and replaced. A new batch of particles is then sampled from the current particles based on their weights, and each new particle is given the same uniform weight as before. The subject moves to the next character and the process repeats. Because online optimization of $p_{\text{thresh}}$ is impractical, all trials used a previously reported value of 0.95 [8].
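A minimal, single-character sketch of the particle-filter update with dynamic stopping, under simplifying assumptions: the language model is reduced to a fixed next-character prior, the Gaussian parameters and the noise-free scores in the usage example are made up, stopping is checked once per set of flashes rather than after every flash, and string-level bookkeeping (histories, error correction) is omitted. `select_character` and its parameters are hypothetical names, not the study's implementation.

```python
import math
import random
from collections import defaultdict

def normal_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def select_character(prior, flash_groups, score_fn, mu_a=1.0, var_a=1.0,
                     mu_n=0.0, var_n=1.0, n_particles=500, p_thresh=0.95,
                     max_sets=10, seed=0):
    """One character step of a simplified particle filter with dynamic stopping.

    prior: dict char -> language-model probability of the next character.
    flash_groups: list of sets of characters highlighted by each flash in a set.
    score_fn: returns the classifier score y for a flash group (hypothetical).
    """
    rng = random.Random(seed)
    chars = list(prior)
    # Proposal step (Eq 3): each particle draws its next character from the prior.
    particles = rng.choices(chars, weights=[prior[c] for c in chars], k=n_particles)
    weights = [1.0 / n_particles] * n_particles
    for n_sets in range(1, max_sets + 1):
        for group in flash_groups:
            y = score_fn(group)
            # Weight update (Eq 4) using the Gaussian score model (Eq 1), then normalize.
            weights = [w * (normal_pdf(y, mu_a, var_a) if c in group
                            else normal_pdf(y, mu_n, var_n))
                       for w, c in zip(weights, particles)]
            total = sum(weights)
            weights = [w / total for w in weights]
        # Posterior over characters: summed weights per character (Eq 5).
        posterior = defaultdict(float)
        for c, w in zip(particles, weights):
            posterior[c] += w
        best = max(posterior, key=posterior.get)
        if posterior[best] >= p_thresh:  # dynamic stopping
            break
    # Resample in proportion to weight, then reset to uniform weights.
    resampled = rng.choices(particles, weights=weights, k=n_particles)
    return best, posterior[best], n_sets, resampled, [1.0 / n_particles] * n_particles

# Usage: a toy 2x3 row-column grid with the user attending to "A".
GROUPS = [set("ABC"), set("DEF"), set("AD"), set("BE"), set("CF")]
PRIOR = {c: 1 / 6 for c in "ABCDEF"}
best, prob, n_sets, particles, weights = select_character(
    PRIOR, GROUPS, lambda g: 1.0 if "A" in g else 0.0)
print(best, round(prob, 3), n_sets)
```

With these made-up parameters the decision is typically reached in about five sets rather than ten, which is the mechanism by which dynamic stopping can offset differences in flashes per set.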

Predictive spelling

In the adjusted model with predictive spelling (PS), the same language model and classifier are used, but the projection step is altered to estimate probabilities for complete words. A subset of particles, $\rho$, continues to the root node during projection. Because particles can move multiple steps in a single transition phase, a particle's history can be longer than t; we denote its length as $n_j$. After projection, the probability distribution over complete words is calculated by summing the weights of the relevant projected particles.

The top k words are inserted into specific positions in the character grid. The EEG signals associated with flashes of those cells are applied to the particles mapped to those words. Particles mapped to less likely words are assigned a probability of 0 and are replaced in the next step of the algorithm. The probability of a complete word selection was determined empirically and set to 0.40. At any given point, the user was presented with six word suggestions (k = 6).
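The word-suggestion step can be sketched as follows. As a simplification, each particle's weight is spread over the completions of its current prefix in proportion to corpus frequency (an expected projection), instead of sampling particles forward to the root as described above; the particle weights and word counts are made up, and the helper names are hypothetical.

```python
from collections import defaultdict

def word_distribution(particles, word_counts):
    """Spread each particle's weight over the words its prefix can complete to.

    particles: list of (prefix, weight) pairs; weights assumed normalized.
    Completion probabilities are proportional to (made-up) corpus counts.
    """
    dist = defaultdict(float)
    for prefix, weight in particles:
        completions = {w: n for w, n in word_counts.items() if w.startswith(prefix)}
        total = sum(completions.values())
        for word, n in completions.items():
            dist[word] += weight * n / total
    return dict(dist)

def top_k_words(particles, word_counts, k=6):
    """The k most probable complete words, i.e., the suggestions placed in the grid."""
    dist = word_distribution(particles, word_counts)
    return sorted(dist, key=dist.get, reverse=True)[:k]

PARTICLES = [("CA", 0.6), ("TH", 0.4)]
COUNTS = {"CAR": 3, "CAT": 1, "THE": 10}
print(top_k_words(PARTICLES, COUNTS, k=2))
```

Here "CAR" receives 0.6 x 3/4 = 0.45 of the mass and "THE" receives 0.4, so both are suggested ahead of "CAT" (0.15).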

Evaluation

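The performance of a BCI system depends on the balance between its ability to perform a particular task and the time it takes to achieve the goal. Given this tradeoff, evaluation is commonly based on the bit rate (BR), the mutual information between the intended character x and the selected character y:

$$B = \sum_{x}\sum_{y} p(x)\, p(y \mid x) \log_2 \frac{p(y \mid x)}{p(y)} \tag{6}$$

The most common BR-based metric is the information transfer rate (ITR). We assume a uniform distribution across all characters, $p(x) = 1/N$, where N = 36 is the size of the alphabet. The same assumption applies to errors, so

$$p(y \mid x) = \begin{cases} P, & y = x \\ \dfrac{1-P}{N-1}, & y \neq x \end{cases} \tag{7}$$

where P is the single-character accuracy, i.e., the fraction of the n selected characters that were correct. This reduces the bit rate to

$$B = \log_2 N + P \log_2 P + (1-P) \log_2\left(\frac{1-P}{N-1}\right) \tag{8}$$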

This is then multiplied by the average number of characters selected per minute (selection rate) to produce the ITR.
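Eq 8 and the ITR can be computed directly. The sketch below assumes N = 36 by default and treats 0·log2(0) as 0 when P is 0 or 1; the function names are illustrative.

```python
import math

def bits_per_selection(p, n=36):
    """Bit rate per selection (Eq 8) for accuracy p over an alphabet of n symbols."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n)
    if p > 0:
        bits += p * math.log2(p)
    if p < 1:
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return bits

def itr(p, selections_per_min, n=36):
    """ITR in bits/min: bits per selection times selection rate."""
    return bits_per_selection(p, n) * selections_per_min

print(f"{bits_per_selection(1.0):.3f} bits/selection at 100% accuracy")
print(f"{itr(0.9, 5.0):.2f} bits/min at 90% accuracy and 5 selections/min")
```

At perfect accuracy each selection carries log2(36) ≈ 5.17 bits, and at chance accuracy (P = 1/36) it carries 0 bits.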

Evaluating predictions at the character level is not reasonable in the predictive spelling (PS) scheme, because sentences containing incorrect words can differ in length from the target sentence. To circumvent this, accuracy is based on the Levenshtein distance (LD) [15]. We then have $P = 1 - \mathrm{LD}/n$, and the equations above hold.
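A standard dynamic-programming Levenshtein distance is sketched below, assuming accuracy is taken as P = 1 - LD/n with n the length of the target phrase; that choice of n and the helper names are assumptions for illustration.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def ld_accuracy(output, target):
    """Accuracy P = 1 - LD/n, taking n as the target length (an assumption)."""
    return 1 - levenshtein(output, target) / len(target)

print(levenshtein("THE QUACK", "THE QUICK"), ld_accuracy("THE QUACK", "THE QUICK"))
```

Because LD counts insertions and deletions, an output that drops or adds a word is penalized by its edit cost rather than misaligning every following character.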

Because the metrics were not normally distributed, significance was tested using the nonparametric Kruskal-Wallis test.
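For readers reproducing the statistics, a dependency-free sketch of the Kruskal-Wallis H statistic with the standard tie correction is below; `scipy.stats.kruskal` computes the same statistic and its p-value. The helper names are illustrative. With three groups the reference chi-square distribution has 2 degrees of freedom, whose survival function is exp(-H/2).

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more independent groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction

def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom: exp(-H/2)."""
    return math.exp(-h / 2)

print(round(p_value_three_groups(2.96), 3))
```

For example, `p_value_three_groups(2.96)` ≈ 0.228, in line with the p = 0.227 reported below for offline SR.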

Results

Offline analysis

In the offline analysis, no paradigm significantly outperformed any other on the three measured metrics. Table 2 shows the offline selection rate (in characters per minute), accuracy, and ITR for each flashing pattern. The differences in median SR between the three flashing paradigms were not statistically significant (H = 2.96, p = 0.227). Accuracy across the paradigms, while high, was also not significantly different (H = 0.581, p = 0.748). Although the combinatorial paradigm had slightly better accuracy, there were no significant differences in ITR between the three paradigms (H = 5.257, p = 0.0722).

Table 2. The offline selection rate (SR), accuracy (ACC), and information transfer rate (ITR), for each flashing pattern.

https://doi.org/10.1371/journal.pone.0303390.t002

We repeated the offline analysis using two subsets of the training data, one with half the data (5 characters) and the other with 30% (3 characters). With half the training data, the ITR values for the RC, CBP, and COMB paradigms decreased by 7.34%, 2.06%, and 3.57%, respectively. None of these decreases was statistically significant (p = 0.13, p = 0.41, and p = 0.19, respectively). With 30% of the training data, the three ITR values decreased by 16.13%, 3.29%, and 9.81%. CBP performance was not significantly different from that with the full dataset (p = 0.28), but RC and COMB performance decreased significantly (p = 0.0025 and p = 0.0024, respectively). With 50% of the dataset, the COMB paradigm's performance was still significantly better than RC and CBP (p = 0.02 and p = 0.03, respectively). With 30% of the dataset, the difference between COMB and RC remained significant (p = 0.008), but the difference with CBP was no longer significant (p = 0.42). With reduced training data, the optimal method varied across subjects: one subject performed best with RC, four with CBP, and five with COMB.

Different classifiers were also analyzed to assess whether any approach was superior to LDA [16]. We explored a support vector machine (SVM), a random forest (RF), and a Riemannian geometry classifier implemented using the covariancetoolbox package (https://github.com/alexandrebarachant/covariancetoolbox), modified as in Barachant et al. [17]. We found that LDA achieved a significantly better ITR across all three flashing paradigms compared with SVM (p = 0.0015, 0.0208, and 0.0354, respectively) and RF (p = 0.0022, 0.0018, and 0.0018, respectively). The Riemannian geometry classifier achieved an ITR statistically similar to LDA (p = 0.34, p = 0.41, and p = 0.15, respectively). The COMB flashing paradigm had the highest ITR with each of these classifiers. However, the difference was only statistically significant with the RF classifier (p = 0.002 and p = 0.007 compared with RC and CBP, respectively). No significant difference was found between flashing paradigms with the SVM (p = 0.47 and 0.08, respectively) or Riemannian geometry classifiers (p = 0.20 and 0.32, respectively).

Online analysis

Table 3 shows the online selection rate (in characters per minute), accuracy, and ITR for each flashing pattern. Although the RCP yielded the highest mean SR, there were no significant differences in SR between the flashing patterns (H = 0.98, p = 0.611) (Table 1). In contrast, the difference in median accuracy across the flashing patterns was statistically significant (H = 7.399, p = .025). Pairwise Mann-Whitney tests showed that accuracy with the CBP was significantly higher than with the COMB flashing pattern (p = .0045). There were no appreciable differences in accuracy between RCP and CBP (p = .271) or between RCP and COMB (p = .112). Further, the mean ITR, which is a function of both accuracy and SR, was not significantly different between the flashing patterns (H = 1.46, p = 0.481), consistent with the SR results and with the offline analysis. Therefore, no appreciable differences in overall BCI performance, as measured by ITR, were detected among the flashing patterns.

Table 3. The online selection rate (SR), accuracy (ACC), and information transfer rate (ITR), for each flashing pattern.

https://doi.org/10.1371/journal.pone.0303390.t003

Waveform analysis

The P300 signal for each flashing pattern was evaluated at CPz, POz, PO7, and PO8 to look for meaningful differences in waveform amplitude [18]. Stimulus responses during online sessions were grouped by whether the stimulus contained the target character. The average attended and non-attended responses were calculated for each subject, and a grand average across subjects was produced for each channel. At each latency, Wilcoxon signed-rank tests were used to determine whether attended and non-attended responses differed significantly, correcting for multiple comparisons using the false discovery rate (FDR).
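The paper states only that the false discovery rate was controlled. The Benjamini-Hochberg step-up procedure below is the standard way to do so and is an assumption here, applied to one p-value per latency sample (for example, from `scipy.stats.wilcoxon` on the paired per-subject attended and non-attended averages). The function name is illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Reject/keep flag per test, controlling the FDR at level q (Benjamini-Hochberg)."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    cutoff = 0  # largest rank whose sorted p-value is at or under its BH threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            reject[idx] = True
    return reject

print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

The step-up rule rejects every test ranked at or below the largest rank that meets its threshold, so a p-value can be rejected even if it misses its own threshold, e.g. `[0.02, 0.03, 0.04, 0.045]` are all rejected at q = 0.05.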

For all three stimulus paradigms, there were significant differences between attended and non-attended stimulus responses. In each of the four channels, the attended responses had a large positive peak preceded by a smaller negative peak. In the parieto-occipital channels, the negative peak occurred at a latency of approximately 200 ms and the positive peak at a latency of 300 ms. In the CPz channel, these peaks were slightly later, occurring at approximately 300 ms and 400 ms, respectively. In the CPz, POz, and PO8 channels, the positive peak was significantly different from the non-attended response. The peak in the PO7 channel was not statistically significant, most likely because of high variance across subjects.

The average signals were compared across the responses for the three stimulus paradigms. While the positive peaks for the parieto-occipital channels were generally larger for the checkerboard paradigm, no significant trend was seen between the three groups. This result suggests that the stimulus paradigm does not significantly affect the stimulus response produced.

The peak latency varied across subjects from 200 ms to 500 ms, so P300 amplitudes were compared only within subjects. For subject 1, there were no significant differences in P300 amplitude at any electrode location (F(2, 195) = 2.75, p = 0.066; F(2, 195) = 0.81, p = 0.445; F(2, 192) = 2.72, p = 0.068; F(2, 192) = 0.60, p = 0.548). In this subject, the P300 signal peaked around 400 ms, with similar latencies for each flashing pattern. In subject 2, however, there were significant differences in amplitude at CPz and PO7 (F(2, 99) = 7.45, p < 0.05; F(2, 81) = 9.72, p < 0.05), whereas at POz and PO8 there were no significant differences (F(2, 84) = 0.89, p = 0.416; F(2, 195) = 1.89, p = 0.158) (Fig 3). Because subject 2's mean selection accuracy was 100% for each flashing pattern, these P300 responses are undiluted by incorrect selections. Interestingly, subject 3 also had a mean selection accuracy of 100% for each flashing pattern, yet only PO8 showed a significant difference in mean P300 amplitude (F(2, 90) = 3.1849, p < 0.05), suggesting that P300 amplitude is not a function of the flashing pattern but of some psychological variable, such as attention or motivation.

Fig 3. Target P300 waveforms.

The average target response for each flashing pattern at CPz, POz, PO7, and PO8 for subject 2 when using the row/column (blue), checkerboard (green), and combinatorial (red) flashing paradigms.

https://doi.org/10.1371/journal.pone.0303390.g003

Discussion

A robust, clinically viable BCI speller requires high accuracy (>90%) and speed (at least 15-19 characters per minute) [19]. Although the functional utility of the P300 speller has been demonstrated with invasive recordings, specifically electrocorticography (ECoG), its long-term safety and utility have yet to be determined. To avoid the risks of an invasive procedure, several studies aim to optimize the P300 speller with a non-invasive, EEG-based paradigm. Much work has been done to optimize the flashing pattern, with mixed results [5, 6].

Our study aimed to provide a meaningful, standardized comparison of the flashing patterns while incorporating optimization methods that have been shown to enhance performance. In our study, alternative flashing paradigms did not significantly improve typing performance in a system with dynamic stopping and language model priors. The mean online selection rate and mean online ITR were not significantly different for any of the three flashing patterns. This observation contrasts with reports from Townsend (2010) and Jin (2010) that the traditional RCP flashing pattern underperformed the CBP and COMB flashing patterns, respectively [5, 6].

Townsend (2010) reported that the CBP flashing pattern yielded both greater online accuracy and a higher practical bit rate [5]. In a 72-character grid, the CBP uses 24 flashes per target selection compared with 17 for the RCP. Because the CBP requires more flashes than the RCP, each target selection naturally takes longer, leading to a lower SR. Dynamic stopping, in which the number of flashes per target selection is modulated by the classification threshold, would dilute this disparity, normalizing the selection rate for the CBP and RCP flashing patterns. While we find that the CBP pattern has significantly higher accuracy than the COMB pattern, we hypothesize that the excellent accuracy of the CBP in the online paradigm arises because CBP optimizes for accuracy while making significant concessions to speed.

Jin (2010) reported that the mean offline practical bit rate differed significantly between the 9-flash pattern and the 12-flash pattern (RCP) [6]. This difference resulted from the reduced number of flashes required per character selection, a 25% decrease in the number of flashes per selection. Although fewer flashes were required for each character selection, this did not necessarily translate into a higher online selection rate.
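The difference between the two patterns can be expressed relative to either one. The symbols below are introduced only for illustration: $F$ is flashes per sequence, $k$ is a fixed number of sequences per selection, and $\mathrm{SOA}$ is the stimulus onset asynchrony. The time expression ignores pauses between selections.

$$
\frac{F_{\mathrm{RCP}} - F_{9}}{F_{\mathrm{RCP}}} = \frac{12 - 9}{12} = 0.25,
\qquad
\frac{F_{\mathrm{RCP}} - F_{9}}{F_{9}} = \frac{12 - 9}{9} \approx 0.333
$$

$$
T = k \, F \, \mathrm{SOA}
\quad\Rightarrow\quad
\frac{T_{9}}{T_{\mathrm{RCP}}} = \frac{9}{12} = 0.75
$$

The 9-flash pattern uses 25% fewer flashes than the RCP, and the RCP uses about 33% more flashes than the 9-flash pattern. When the number of sequences is static, the 9-flash pattern shortens each selection to 75% of the RCP duration.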

Our results suggest that dynamic stopping reduces the performance advantage that a nine-flash pattern offers when the number of flashes is static. Under dynamic stopping, the number of flashes per target selection varies with the classification threshold required to select a character. The system can therefore make decisions without waiting for a fixed number of flashes, which reduces the impact of the flashing pattern on performance.
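The dynamic stopping idea can be sketched as a Bayesian update over characters. A prior, such as language-model probabilities, is combined with the classifier score after each flash. Flashing stops as soon as one character's posterior crosses a threshold. The sketch rests on several assumptions. Classifier scores are assumed to be Gaussian with separate target and non-target means. The threshold and maximum flash count are illustrative, and `score_fn` stands in for a real classifier. This is not a reproduction of the study's implementation.

```python
import numpy as np


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd**2) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))


def row_col_groups(side):
    """Row/column flash groups for a side x side grid, characters indexed row-major."""
    rows = [np.arange(r * side, (r + 1) * side) for r in range(side)]
    cols = [np.arange(c, side * side, side) for c in range(side)]
    return rows + cols


def dynamic_stop(prior, flash_groups, score_fn, mu_target=1.0, mu_nontarget=0.0,
                 sd=1.0, threshold=0.9, max_flashes=240):
    """Bayesian dynamic stopping (a sketch under Gaussian score assumptions).

    prior        -- strictly positive probabilities over characters (e.g. a language model)
    flash_groups -- index arrays, flashed cyclically in this order
    score_fn     -- hypothetical classifier: returns the score observed after a flash
    Returns (selected index, number of flashes used, posterior)."""
    log_post = np.log(np.asarray(prior, dtype=float))
    log_post -= np.logaddexp.reduce(log_post)
    for t in range(1, max_flashes + 1):
        group = flash_groups[(t - 1) % len(flash_groups)]
        score = score_fn(group)
        flashed = np.zeros(log_post.size, dtype=bool)
        flashed[group] = True
        # A character's likelihood depends only on whether it was in the flashed group.
        log_post += np.where(flashed,
                             gaussian_logpdf(score, mu_target, sd),
                             gaussian_logpdf(score, mu_nontarget, sd))
        log_post -= np.logaddexp.reduce(log_post)  # renormalize in log space
        if np.exp(log_post.max()) >= threshold:
            break  # confident enough: stop instead of finishing a fixed flash count
    posterior = np.exp(log_post)
    return int(posterior.argmax()), t, posterior


# Simulated session on a 6 x 6 grid with noisy scores; all values are hypothetical.
rng = np.random.default_rng(0)
target = 7


def noisy_score(group):
    return rng.normal(1.0 if target in group else 0.0, 1.0)


uniform = np.full(36, 1.0 / 36)
lm_prior = uniform.copy()
lm_prior[target] *= 5.0          # a prior that favors the target, as a language model might
lm_prior /= lm_prior.sum()
for name, prior in (("uniform prior", uniform), ("LM-style prior", lm_prior)):
    choice, used, _ = dynamic_stop(prior, row_col_groups(6), noisy_score)
    print(f"{name}: selected {choice} after {used} flashes")
```

Because the stopping rule looks only at the posterior, the flash count emerges from the data rather than from the pattern. A stronger prior on the correct character can only make the threshold arrive sooner, never later.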

One factor not addressed in this study is the effect of a multi-stage approach, which allows either a reduction in the number of targets on screen or an increase in the number of potential targets [20]. In this case, the targets on screen still need to be highlighted, which could be done using any of the methods presented in this article (or others, such as single-character flashing). Because each character selection would require multiple decisions, an error at any stage can corrupt the selection. Accuracy would therefore take priority, possibly making the CBP preferable in this context. Future work should investigate how these flashing paradigms interact with a multi-stage approach.
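To make the accuracy argument concrete, a simplified model can be used. It assumes that a character requires $k$ sequential decisions with independent per-stage accuracies $p_i$ and no error correction. Under that model, character-level accuracy is the product of the stage accuracies:

$$
P_{\mathrm{char}} = \prod_{i=1}^{k} p_i,
\qquad
p_1 = p_2 = 0.90 \;\Rightarrow\; P_{\mathrm{char}} = 0.81,
\qquad
p_1 = p_2 = 0.95 \;\Rightarrow\; P_{\mathrm{char}} = 0.9025
$$

Under these assumptions, raising per-stage accuracy from 0.90 to 0.95 lifts two-stage character accuracy from 0.81 to about 0.90. Gains in per-decision accuracy therefore carry more weight in a multi-stage speller.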

Conclusion

This study shows that, when combined with other established methods such as dynamic stopping and language model priors, the proposed flashing paradigms do not significantly affect P300 speller performance. A major contributing factor may be that dynamic stopping allows the system to make decisions without waiting for a fixed number of flashes, which reduces the impact of the flashing paradigm. This result suggests that current bottlenecks in P300 speller performance lie outside the choice of flashing paradigm. Optimization efforts should instead focus on improving language models and predictive spelling.

References

  1. Farwell LA, Donchin E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology. 1988;70(6):510–23. pmid:2461285
  2. Picton TW. The P300 wave of the human event-related potential. Journal of Clinical Neurophysiology. 1992;9(4):456–79. pmid:1464675
  3. Allison BZ, Pineda JA. ERPs evoked by different matrix sizes: implications for a brain computer interface (BCI) system. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2003;11(2):110–3. pmid:12899248
  4. Lu J, Speier W, Hu X, Pouratian N. The effects of stimulus timing features on P300 speller performance. Clinical Neurophysiology. 2013;124(2):306–14. pmid:22939456
  5. Townsend G, LaPallo BK, Boulay CB, Krusienski DJ, Frye GE, Hauser CK, et al. A novel P300-based brain–computer interface stimulus presentation paradigm: moving beyond rows and columns. Clinical Neurophysiology. 2010;121(7):1109–20. pmid:20347387
  6. Jin J, Horki P, Brunner C, Wang X, Neuper C, Pfurtscheller G. A new P300 stimulus presentation pattern for EEG-based spelling systems. Biomedizinische Technik/Biomedical Engineering. 2010;55(4):203–10. pmid:20569051
  7. Speier W, Deshpande A, Cui L, Chandravadia N, Roberts D, Pouratian N. A comparison of stimulus types in online classification of the P300 speller using language models. PLOS ONE. 2017;12(4). pmid:28406932
  8. Speier W, Arnold CW, Deshpande A, Knall J, Pouratian N. Incorporating advanced language models into the P300 speller using particle filtering. Journal of Neural Engineering. 2015;12(4):046018. pmid:26061188
  9. Fazel-Rezai R. Human error in P300 speller paradigm for brain-computer interface. In: 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2007.
  10. Woldorff MG. Distortion of ERP averages due to overlap from temporally adjacent ERPs: analysis and correction. Psychophysiology. 1993;30(1):98–119.
  11. Kanwisher NG. Repetition blindness: type recognition without token individuation. Cognition. 1987;27(2):117–43. pmid:3691023
  12. Schalk G, McFarland DJ, Hinterberger T, Birbaumer N, Wolpaw JR. BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Transactions on Biomedical Engineering. 2004;51(6):1034–43. pmid:15188875
  13. Kaufmann T, Schulz SM, Grünzinger C, Kübler A. Flashing characters with famous faces improves ERP-based brain–computer interface performance. Journal of Neural Engineering. 2011;8(5):056016. pmid:21934188
  14. Francis W, Kucera H. Brown Corpus Manual. Department of Linguistics, Brown University; 1979.
  15. Levenshtein V. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady. 1966;10(8).
  16. Lotte F, Bougrain L, Cichocki A, Clerc M, Congedo M, Rakotomamonjy A, Yger F. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. Journal of Neural Engineering. 2018;15(3):031005. pmid:29488902
  17. Barachant A, Congedo M. A plug & play P300 BCI using information geometry. arXiv. 2014 Aug 30. http://arxiv.org/abs/1409.0107
  18. Speier W, Deshpande A, Pouratian N. A method for optimizing EEG electrode number and configuration for signal acquisition in P300 speller systems. Journal of Neural Engineering. 2011;8(5):056016.
  19. Huggins JE, Wren PA, Gruis KL. What would brain-computer interface users want? Opinions and priorities of potential users with amyotrophic lateral sclerosis. Amyotrophic Lateral Sclerosis. 2011;12(5):318–24. pmid:21534845
  20. Treder MS, Blankertz B. (C)overt attention and visual speller design in an ERP-based brain-computer interface. Behavioral and Brain Functions. 2010;6(1):1–13.