Characterization of the non-stationary nature of steady-state visual evoked potentials using echo state networks

State Visual Evoked Potentials (SSVEPs) arise from a resonance phenomenon in the visual cortex that is produced by a repetitive visual stimulus. SSVEPs have long been considered a steady-state response resulting from purely oscillatory components phase locked with the stimulation source, matching the stimulation frequency and its harmonics. Here we explore the dynamical character of the SSVEP response by proposing a novel non-stationary methodology for SSVEP detection based on an ensemble of Echo State Networks (ESN). The performance of this dynamical approach is compared to stationary canonical correlation analysis (CCA) for the detection of 6 visual stimulation frequencies ranging from 12 to 22 Hz. ESN-based methodology outperformed CCA, achieving an average information transfer rate of 47 bits/minute when simulating a BCI system of 6 degrees of freedom. However, for some subjects and stimulation frequencies the detection accuracy of CCA exceeds that of ESN. The comparison suggests that each methodology captures different features of the SSVEP response: while CCA captures purely stationary patterns, the ESN-based approach presented here is capable of detecting the non-stationary nature of the SSVEP.


Introduction
Steady-state visual evoked potentials (SSVEPs) are a neural phenomenon that appears when neural oscillatory activity, mainly in the primary visual cortex, synchronizes to a repetitive visual light source. It can be measured in the EEG as oscillatory components that match the stimulation frequency and/or its harmonics [1]. Specifically, the SSVEP response is characterized by an energy increase that is phase-locked with the visual stimulus and is elicited for stimulation frequencies in the 1 to 100 Hz range [2]. The characteristics of the SSVEP response, mainly the amplitude and latency, show large inter-subject variations [3,4,5] and are also influenced by the stimulus source, its frequency, intensity, color, and duty cycle [1].
SSVEPs have proved to be useful for many paradigms in cognitive and clinical neuroscience [6]. For example, they are widely used for the characterization of spatial selective attention [7], and recent studies correlate SSVEP patterns with age-related memory performance loss [8,9]. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 complex and computationally more expensive than training feedforward neural networks. Additionally, cyclic connections can undergo bifurcations that lead to drastic changes in the system's behavior [26]. Echo State Networks (ESN) [27] and Liquid State Machines [28] constitute an approach towards training and applying Recurrent Neural Networks, grouped together under what has come to be known as reservoir computing (RC). ESN is based on the principle that supervised adaptation of all interconnection weights in RNNs is not necessary if the untrained weights, known as the dynamical reservoir, fulfills certain algebraic properties [29], and thus significantly few number of connections need to be trained. This approach has certain analogies with kernel methods in machine learning, with the reservoir performing a nonlinear high-dimensional projection of the input signal for discriminating samples that are not linearly separable in the original space, and at the same time, serving as a memory that provides the temporal context [30].
In previous works we have demonstrated that among other dynamics ESNs can detect chaotic synchronization variations within temporal series [31]. Reservoir computing, and in particular ESNs, have been successfully applied in the past for EEG feature extraction and classification. In brain computer interfaces, ESNs were used in P300-based [32] and motor imagery [33] modalities, but never to our knowledge for SSVEP characterization. In the field of EEG biomarker discovery, ESNs have been employed to characterize abnormal EEG patterns in children diagnosed of Attention Deficit Hyperactivity Disorder (ADHD) [34] and Parkinson Disease (PD) patients [35] as well as for epileptic seizure detection [36].
The global structure of an RNN is depicted in Fig 1. Following the nomenclature and model used by Jaeger in [21], we consider a network of K input units, N internal units, and L output units. Input, internal and output connection weights are defined respectively by the connection weight matrices W in , W and W out . RNNs are characterized by having cyclic paths of connections defined by the back-projection weight matrix W back . Input (W in ), internal (W) and back-propagation weights (W back ) matrices conform the dynamical reservoir (DR) and are randomly constructed. The weight matrix W is characterized by its spectral radius, defined as the largest absolute eigenvalue of the weight matrix. The spectral radius is closely connected with the intrinsic dynamical timescale of the reservoir, and is therefore a key ESN training parameter. A small spectral radius leads to a faster RNN response.
For the ESN principle to work, the reservoir must fulfill the echo state property (ESP). The ESP holds if the current state of the network, which is running for an infinite time, is uniquely determined by the history of the input and the teacher-forced output (i.e. the initial state of the RNN does not matter, since it is forgotten). The echo state property has proved to be linked with the characteristics of the reservoir, with the input signals and with the input and backpropagation weights [37,38,39]. In most practical applications, a spectral radius below unity ensures the echo state property [40], although this is not a sufficient condition. Other key training parameters in RC are the input scaling and the model size [40]. The model size is defined by the number of internal units N. Generally, a larger DR can learn more complex dynamics. It is very important however to be aware of the possible over-fitting if a large number of internal units is chosen, which would lead to poor generalization. The input scaling determines the degree of nonlinearity in the reservoir responses. Tasks close to linear require small input scaling factors, while highly nonlinear tasks demand larger input scaling values.
When constructing an ESN the first step is thus the random dynamical reservoir (DR) construction for the selected global parameterization. Its role is to provide temporal context and work as a nonlinear expansion of the input. The DR random and feedback connections guarantee different dynamics or states to echo within the reservoir. The larger the reservoir is, the richer its dynamics. That is why the number of units in ESNs is larger compared to other recurrent neural network architectures. During the training process the input signal may provoke dynamics that follow certain trajectories within the randomly constructed DR. ESNs training is a supervised machine learning problem in which for a training input and target output, the output weights of the RNN linearly combine these trajectories to minimize the error between the network output and the desired target. Output weights can be calculated using any linear regression algorithm.

Stationary SSVEP detection
Canonical correlation analysis [41] is employed to evaluate the stationary characteristics of the SSVEP response. CCA is a multivariable calibration-less statistical method to calculate the maximal correlation between two multi-channel signals. CCA is widely used in statistical analysis and information mining [42,43]. SSVEP-based systems have largely used CCA-based methods in recent years due to its excellent accuracy and information transfer rate [44,45]. Given two multidimensional random variables X, Y and their linear transformationx ¼ w T X andỹ ¼ v T Y, CCA finds the weight vectors w and v that maximize the correlation betweenx andỹ. Canonical correlation therefore seeks a pair of linear transformations for X and Y such that when the multidimensional variables are transformed, the corresponding coordinates are maximally correlated [46]: r ¼ max E½xỹ� ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi E½x 2 �E½ỹ 2 � p ¼ w T XYv ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi The SSVEP response is characterized by oscillations in the visual cortex matching the stimulation frequency and its harmonics. The performance of a given stimulation frequency f i is evaluated by computing the canonical correlation between the EEG sequence under evaluation (X) and a reference signal (Y), constructed as a set of sine-cosine series at the stimulation frequency and its N h harmonics of duration equal to that of the EEG sequence: The maximal canonical correlation (ρ i ) is calculated for all N stimulation frequencies (f i ) under test. The stimulation frequency delivering the largest canonical correlation (ρ) is selected as responsible of eliciting the visual response [47].

Dynamical SSVEP detection
A novel approach for extraction and classification of SSVEP non-stationary patterns is proposed here. This approach is based on an ensemble of as many echo state networks as stimulation frequencies under test. Each ESN is therefore responsible of detecting the elicited response of a particular flickering frequency.

3.2.1.
Temporal SSVEP feature extraction. SSVEP EEG temporal components are calculated for each of the N e electrode signals (which are denoted as x i (n), i = 1,2,3. . .N e ). The SSVEP temporal response at f i for the harmonic K is computed by filtering the raw EEG using a narrow band-pass FIR filter with its central frequency at K�f i . The ensemble temporal response (x f i i ðnÞ) is computed by adding the calculated individual response at every calculated harmonic. FIR filters were selected as they present linear phase. Their induced delay is thus the same at all frequencies not causing phase or delay distortion. They are inherently stable since they have no feedback elements. The EEG signal was feed-forward filtered and then the fixed delay introduced by the filtering corrected. Filters were designed using a Hamming window using the Matlab fir1 function. The order of filter was 501 coefficients. Bandpass filter cut-off frequencies were set at 0.5 Hz above and below the target frequency.

ESN construction.
ESNs (one per stimulation frequency) have been configured to have as many input nodes as there are EEG electrodes that measure the visual evoked response (N e ). As will be explained in following sections, the optimal number of internal units, spectral radius, and input scaling factor has been selected by using three-fold cross validation exhaustive search. The temporal decomposition of SSVEP components coming from each electrode (x f i i ðnÞ)) feeds the N e input nodes of the ESN targeting detection of f i . The proposed ESN detection and classification methodology is applied to an EEG recording acquired during the interleaving of N s non-stimulation periods followed by N s stimulation ones, where the visual stimulation is presented at f i . The function of the ESN is to discriminate visual stimulation periods at f i from non-stimulation periods or stimulation periods at other frequencies. For this two-class regression problem, during ESN training, the network outputs (y f i ðnÞ) of samples corresponding to stimulation periods at f i are set to 1, while the output of samples corresponding to non-stimulation periods are set to -1. Therefore, during the ESN recall the associated ESN output is maximized during the visual stimulation period. Fig 2. depicts the architecture for the characterization of f i with N e = 2 and N h = 2. It is important to recall, that while being trained on binary labels, the ESN output is continuous.

Stimulation frequency detection. SSVEP frequency detection aims at determining
which stimulation frequency among the ones under evaluation (f i with i = 1,2. . .N) elicited the visual response during a stimulation trial. To achieve this, the ESN-based architecture has been designed as an ensemble of N ESNs, where each ESN has been trained for each stimulation frequency f i . The SSVEP response at each frequency (R f i ) is assessed as the difference between the averaged ESN output (y f i ðnÞ) during the stimulation observation window and a baseline sequence prior the stimulation. As the ESN output is maximized during stimulation periods at the trained stimulation frequency, the stimulation frequency with maximal R f i will be selected as responsible of eliciting the visual response.

Experimental method
Five Caucasian male volunteers S1 to S5 with average age 33.6 years, and ages ranging from 29 to 46, participated in six recording sessions each, where oscillatory visual stimuli were presented at six different frequencies f i : 12, 14, 16, 18, 20 and 22 Hz. The visual stimuli were presented using stimulation sources consisting of an array of flickering light emitting diodes (LEDs) through a diffusing panel of 100 squared centimeters. Neuroelectric's regulatory department acting as ethics committee stated that this experimental campaign was exempt from ethical review. The electroencephalographic recordings were performed on healthy adults. Neither the recording technique, nor the experimental protocol deemed to entail any danger or discomfort for the participants. Participant's enrolled were professionals working in the field of neuroscience, who were fully aware of the recording technique and protocol. Consequently only oral informed consent was only required. Participants invited to take part in this study received detailed information of the experimental procedure before the session. This explanation was provided by the technician in charge of performing the recording. They were informed that 1) they will undertake periodic visual stimulation, 2) EEG sensors will be placed on their head and conductive gel inserted between the sensors and the scalp in order to ensure adequate contact, 3) a remote possibility of a mild headache and\or skin irritation, 4) that the study aimed to analyze alterations in brain patterns induced by a visual stimulation, and 5) they could quit the experiment at any time if they so desire. They had the opportunity to ask the technician to clarify anything they did not understand or would like to know more about. Participants confirmed they had understood the explanations, and that they voluntarily engage in the study. The researcher responsible of performing the experiment documented this consent in a signed document in which for each participant the date, time and location where the experiment took place were recorded. In this document the researcher declared to have properly informed the participant and that he/she voluntarily agreed to take part in this study.
Each session consisted of one recording per stimulation frequency. In each recording, N s = 15 stimulation trials (duration randomly ranging from 4 to 5 seconds), where the visual stimulus was presented, were followed by the same number of non-stimulation trials (duration randomly ranging from 5 to 8 seconds) with no visual stimulation. One stimulation source placed on the right of the subject presented the stimulation frequency under evaluation, while another source on the left presented a frequency randomly selected among the other frequencies used in the experiment. The goal of this montage was to simulate background interferences as in brain computer interfaces applications. Stimulation sources were separated by approximately 25 cm.
The user was comfortably seated at one-meter distance from the stimulation sources and was instructed to look at the stimulation source placed on his right when hearing a beep sound (played one second before the stimulation started). EEG was acquired using Neuroelectrics Enobio, a wearable, wireless electrophysiology sensor system for the recording of EEG, at a sampling rate of 250Hz and from three channels placed in O1, Oz and O2, according to the 10-20 system [48]. The electrical reference was placed in the right ear-lobe. Background ambient light remained homogeneous throughout all experimental sessions. The AsTeRICS [49] platform was used to record the EEG streaming data, control the stimulation panels, and trigger the recording.

Performance evaluation and results
The goal of the performance evaluation is to compare how well the SSVEP response is detected using stationary and dynamical approaches. Using previously presented ESN and CCA based methods, the stimulation frequency among the 6 under evaluation (12, 14, 16, 18, 20 and 22 Hz) responsible for eliciting the evoked potential at each trial is assessed. The performance will be estimated for observation windows ranging from 0.5 to 4 seconds in terms of detection accuracy and information transfer rate (ITR) [50].

ESN parameterization.
ESN networks have been configured to have N e = 3 input nodes and one output node. Input nodes are fed with the filtered signals coming from O1, Oz and O2. The activation function of the network nodes is set to a hyperbolic tangent. Leastmean-square error minimization between the network output and the target output has been used to compute output weights. A washout duration of 250 samples has been applied. The optimal number of internal units, spectral radius and input scaling has been calculated through exhaustive search. The detection accuracy calculated over 4-second observation windows and a 0.5-second baseline has been assessed for every combination of internal units (10,50,100,150,200,250 and 300), spectral radius (from 0 to 1 in steps of 0.1) and input scaling factor (0.001, 0.005, 0.01 0.05 0.1 and 1). A total of 462 combinations were attempted.
Data is split into a training and a test set. The training set consists of the first 10 stimulation sequences, while the test set contains the remaining 5. The training set is used to calculate the optimal ESN parameterization, whereas the test set is used to evaluate the performance of the final model parameters. Parameter optimization was done based on 5 times 2-fold-cross validation (5 x 2-FCV). The cross-validation process was repeated 5 times in order minimize the effect introduced by the random dynamical reservoir initialization. In this process, the training set is partitioned in two non-overlapping folds for independently calculating the readout connections of the ESN and assessing the model under evaluation performance. Each fold is formed by 5 sequences of a stimulation trial (duration randomly ranging from 4 to 5 seconds), where the visual stimulus was presented, followed by a non-stimulation trial (duration randomly ranging from 5 to 8 seconds). The average fold duration was therefore 55 seconds (13750 samples).
Parameter optimization was done through grid search for every possible combination of the spectral radius, input scaling and network size values. In each stimulation sequence under evaluation, the frequency maximizing the difference between the average ESN output in the 4-second observation window and the average ESN output in a 0.5-second baseline measured right before the stimulation trial is selected as responsible of eliciting the visual evoked response. A spectral radius of 0.5, 250 internal units and 0.05 input scaling factor delivered the best average classification accuracy across subjects (66%).

RC-ESN SSVEP detection.
Final results were calculated for the parameters that delivered the best average classification accuracy in Section 5.1.1, namely a spectral radius of 0.5, 250 internal units and 0.05 input scaling factor, that are applied to SSVEP detection in 0.5, 1, 1.5, 2, 2.5, 3, 3.5, and 4-second observation windows using a baseline of 0.5 seconds. The test set as described in Section 5.1.1, consisting of the 5 stimulation sequences not used for the ESN global parameters search, is employed to evaluate the ESN detection performance at each observation window. The average of five independent iterations has been calculated in order to reduce the effect produced by the random dynamical reservoir construction. Table 1 presents the calculated detection accuracy and ITR, considering the trial duration as the addition of the observation window and the initial baseline (0.5-seconds). Fig 3A and 3B show respectively individual ITR and detection accuracy for each participant. The observation window duration does not influence the detection accuracy of the RC-ESN-based detection. Similar average and individual detection performance is achieved for every observation window length under test. This fact boosts the maximum obtained ITR that reaches 49 bits/minute in 0.5-second windows. An important subject variability is observed, while some subjects achieve an excellent communication performance (Subject 3 has an ITR of 87 bits/min in 0.5-second windows), in others it significantly decreases (ITR for subject 2 is 22 bits per minute).

Canonical correlation analysis
Recordings have been band-pass filtered using a finite impulse response filter with 250 coefficients and high and low cut-off frequencies set to 1 and 45 Hz, respectively. The filtering aims to reduce the influence of high amplitude low-frequency components caused by motion artifacts and bad electrode-skin contact, as well as removing power line-noise interferences. After filtering, stimulation sequences have been extracted and split into shorter observation windows starting at the beginning of the visual stimulation. Maximal canonical correlation has been calculated for each stimulation frequency and its second harmonic. Canonical correlation performance has been evaluated on the test set used to assess the performance of the final ESN model. Table 1 presents the detection accuracy and ITR for 0.5, 1, 1.5, 2, 2.5, 3, 3.5, and 4-second observation windows. Fig 4A and 4B displays respectively the individual ITR and detection accuracy for each participant for the proposed observation windows. Detection accuracy increases along with the observation window length, achieving a maximum average of 66% in the 4-second observation window. The maximum information transfer rate, 13 bits/minute, is reached at 2-second observation windows.
The data shows an important amount of subject variability: while some subjects achieve an excellent detection accuracy and ITR (Subject 3 has 100% classification accuracy for a 4-second observation window and a ITR of 75 bits/minute in 1.5-second windows), others deliver poor classification results (Subject 5 has 27% detection accuracy for 4-second observation windows and a maximum ITR of 3 bits/minute).

Stationary Vs dynamical SSVEP frequency detection
SSVEP is a subject-dependent phenomenon in which a given stimulation frequency has proved to range from excellent to random classification across subjects [36]. Table 2 presents the detection accuracy for each stimulation frequency obtained for the observation windows delivering largest average detection accuracy for CCA and ESN-based methodologies, respectively 4 and 2 seconds.
ESN-based methodologies significantly improve the average detection accuracy of subjects 1 and 5, while standard CCA performs better in subject 2 and 4. Both methodologies deliver excellent average classification performance for subject 3. Results show that the ESN-based method significantly improves classification in some stimulation frequencies and subjects compared to CCA (subject 5, stimulation frequency 22 Hz) and vice versa (subject 2 stimulation frequency 14 Hz).

Discussion and conclusions
We have presented a novel SSVEP-detection methodology based on reservoir computing with non-stationary pattern recognition capabilities whose performance has been compared with state of the art SSVEP stationary detection, Canonical Correlation Analysis. In this study the response to 6 visual stimulation frequencies ranging from 10 to 22 Hz is studied in 5 participants. Stationary and dynamical analysis of SSVEP response delivered disparate classification accuracy as displayed in Table 2. While for some subjects ESN-based methodologies outperformed the stationary analysis in terms of detection accuracy (Subject 5 at 22Hz) in others the opposite occurred (Subject 2 at 16Hz). Our hypothesis is that this performance may prove the elicitation of evoked responses of different nature, explaining why stationary and dynamical detection methodologies behave differently.
Traditionally brain response to a flickering visual stimulation has been considered steadystate, in which the elicited effect is believed to be unchanging in time. In this scenario CCA can capture this stationary nature by linearly transforming the EEG into a vector space that reproduces the flickering frequency and its harmonics. In contrast, as previously stated in Section 2, Reservoir Computing based approaches can detect dynamical patterns and complex synchronization between EEG channels, and therefore identify the non-stationary nature of the SSVEP. The study of this SSVEP response duality can be used to further understand brain mechanisms in cognitive and clinical neuroscience. Our hypothesis is that SSVEP dynamical response may serve to study alterations in brain patterns linked attention, aging neurodegeneration and psychiatric disorders.
The hereby proposed methodology can be also employed by itself or combined with other stationary techniques for the construction of BCI applications. Its detection capabilities proved to be independent from the observation window duration, delivering similar detection accuracy for windows ranging from 0.5 to 4 seconds (from 63% to 68%). In contrast, in CCA detection, performance increases along with window length, starting at random classification for 0.5-second windows and reaching a detection accuracy of 66% in 4-second windows. These results highlight the communication capabilities of the ESN-based method, which achieves an average information transfer rate of 47 bits/minute (with a maximum ITR of 87 bits/minute Table 2

. Detection accuracy percentage of canonical correlation analysis (4-seconds observation window) and proposed ESN-based methodologies (2-seconds observation window).
In red accuracies below 50%, in green in the 50-75% range and in green above 75%.

Subject 1
Subject 2 Subject 3 Subject 4 Subject 5 cca w = 4" esn w = 1.5" cca w = 4" esn w = 1.5" cca w = 4" esn w = 1.5" cca w = 4" esn w = 1.5" cca w = 4' esn w = 1. Characterization of the non-stationary nature of SSVEP using ESN for Subect 3), that outperforms the average information transfer rate of 13 bits/minute achieved by the CCA method. The methodology proposed here achieves a high communication rate for a BCI system with 6 degrees of freedom, compared to other state of the art applications [1,51,52]. However, given the reduced data set of this study, although training-test split methodology provides a reasonable sanity check, the low number of samples reduce the ability of this statistical test to robustly generalize. The applicability of the here proposed ESN-based methodology in BCI systems needs therefore to be further explored on larger data sets.