Suppression of overlearning in independent component analysis used for removal of muscular artifacts from electroencephalographic records

This paper addresses the overlearning problem in the independent component analysis (ICA) used for the removal of muscular artifacts from electroencephalographic (EEG) records. We note that for short EEG records with high number of channels the ICA fails to separate artifact-free EEG and muscular artifacts, which has been previously attributed to the phenomenon called overlearning. We address this problem by projecting an EEG record into several subspaces with a lower dimension, and perform the ICA on each subspace separately. Due to a reduced dimension of the subspaces, the overlearning is suppressed, and muscular artifacts are better separated. Once the muscular artifacts are removed, the signals in the individual subspaces are combined to provide an artifact free EEG record. We show that for short signals and high number of EEG channels our approach outperforms the currently available ICA based algorithms for muscular artifact removal. The proposed technique can efficiently suppress ICA overlearning for short signal segments of high density EEG signals.


Introduction
During the measurement of head surface potentials with electroencephalography (EEG), weak brain signals are often corrupted by various sources of interference. One of the most common interference sources are potentials generated by muscles contracted by a measured subject. These are so called muscular or electromyographic (EMG) artifacts. The EMG artifacts are commonly occurring nuisance in many EEG records; therefore, good methods for their removal are valuable tools for EEG processing.
In the past it was suggested to use the independent component analysis (ICA) for the EMG artifact removal. Under suitable circumstances the ICA was demonstrated as a good approach for this task, and its use was developed in many works [1][2][3][4][5][6][7][8][9][10][11]. PLOS  There are however limitations to what the ICA can achieve. If the signal length is not sufficient with respect to the number of EEG channels, the ICA suffers from so called overlearning [12][13][14], which prevents the separation and subsequent removal of EMG artifacts.
In the reminder of this section we introduce the problem of artifact removal with ICA more formally, specify the problem of overlearning, and review the current state-of-the-art methods in which this problem is addressed. Let where A denotes an MxM 0 mixing matrix. Note that we do not expect the number of sources M 0 to be less or equal to the number of signals M. This is a common assumption in the derivation of the ICA, but cannot be realistically expected in the addressed problem. We further assume that the source signals s m [n] are either generated by a brain or by other interfering sources (e.g. contracted muscles).
Our task is to remove the artifacts from the brain signals. As mentioned above, for this purpose it is possible to use the ICA, which allows to estimate a separation matrix B that transforms signals x[n] into source componentsŝ½n ¼ ½ŝ 1 ½n; . . . ;ŝ M ½n T that have their mutual dependence minimizedŝ ½n ¼ Bx½n: If the measured signals x[n] are sufficiently long, this approach was reported to achieve a fairly good separation of brain signals and artifacts [2,6]-this means that the most of the componentsŝ m ½n contain either brain signals or artifacts, while the mutual intermixing of these signals is noticeably reduced. Thus, it is possible to either manually [10,11], or automatically [11] classify eachŝ m ½n as either a brain signal or an artifact, and retain only the brain signals, which we will denote ass½n ¼ ½s 1 ½n; . . . ;s M ½n T s m ½n ¼ŝ m ½n; ifŝ m ½n is a brain signal; 0; ifŝ m ½n is an artifact: ( ð3Þ With brain signal componentss½n we can try to reconstruct the measured signals without the artifactsx ½n ¼ B À 1s ½n; ð4Þ wherex½n ¼ ½x 1 ½n; . . . ;x M ½n T are the estimates of the measured signals with artifacts suppressed. The above-mentioned methodology was used before [6,11,15,16] with a certain success; however, it will work only if the measured signals are sufficiently long compared to the number of measured channels. This limitation is illustrated on the following two examples. Example 1: We have used a 2s long record from 9 EEG electrodes located in the frontal area. This record contains a clearly visible muscular artifact present in each of the 9 channels. We applied the above-mentioned method to the data (the ICA was implemented using the popular FastICA algorithm [17], Eq (14,21,24)]). Fig 1A shows  In this example the muscular artifacts were suppressed, and we have obtained a nice looking reconstruction of the original EEG.
What we illustrated in Example 2 is a phenomenon known as overlearning [12][13][14]. The overlearning occurs when the ICA algorithm has too many degrees of freedom, which allows it to find componentsŝ½n that appear to be even more independent than ifŝ½n contained separated EEG and EMG signals.
There are several ways how to tackle the problem of overlearning. First, we could use longer EEG records. This, however, is not desirable. While the abovementioned algorithm might sometimes be able to suppress muscular artifacts, it will inevitably distort the reconstructed EEG. For example, by removing 4 of the 9 componentsŝ m ½n in the Example 1 we have reduced the rank of the reconstructed datax½n from the original 9 to 5 (note that the rank of an M dimensional signal x[n] with length N we understand the rank of the matrix [x m [n]] m=1. . .M,n=1. . .N ). Thus even though the result may be 'nice looking' some loss of information in the reconstructed signals must have occurred. Therefore, we would like to limit the application of muscular artifact removal and the corresponding rank reduction only to the time intervals, where the muscular artifacts have really occurred, avoiding unnecessary distortion of the artifact free EEG. Better yet, the interval length should not exceed the time during which the EMG artifact origin position (and the corresponding mixing matrix A) is close to stationary. These time intervals can be fairly short, often less than a few seconds [18].
Next, in the past there were several published works that suggested how to suppress overlearning.
Works [12][13][14]19] suggested that the proper choice of contrast functions in the ICA algorithm can decrease the level of overlearning. However, the improvement provided is often not sufficient to completely solve the problem. In our Example 2 we have used the FastICA algorithm with robust approximation of negentropy as the contrast function (which was suggested as the best choice in [12]); however, the overlearning is still distinctly present.
Works [13,14] suggested to use the PCA to decrease the dimensionality of x[n] prior to the ICA computation. When the dimension reduction is sufficient, this approach will undoubtedly suppress overlearning; however, this will be at the cost of EEG distortion. The brain signals may have smaller power than the EMG artifacts, thus the dimensionality reduction will inevitably remove some energy from EEG, possibly removing useful information that may be missing in a subsequent EEG analysis. Thus, the application of the PCA for the dimension reduction is not always desirable, especially, if there is a way to avoid it.
Works [13,14] also noted that the 'bumps' observable in the separated components produced by an overlearned ICA may have their energy predominantly at low frequencies, and therefore suggested to high-pass filter the analyzed signals prior to the ICA application. It was suggested to use either a 1Hz fixed cut-off frequency high-pass filter or an AR-process based high pass filter. It was further claimed that this high-pass filtering lessens the overlearning. In our experience this approach is not sufficient. In Example 2, the signals were already high pass filtered with a 1Hz cut-off filter (to suppress the baseline wonder), but the overlearning has still occurred. The 'bumps' in the separated components are simply faster (with duration of about 50ms); therefore, their occurrence cannot be prevented by prior high pass filtering without damaging useful brain signals.
Work [3], which concentrated on the suppression of ocular artifacts, suggested to use a PCA based dimension reduction to extract 3 principal components with the highest power, separate these components using the ICA, and then subtract the separated components from the original signals using a linear regression. This method is peculiar in that it uses no artifact classification. It assumes that the ocular artifacts will have the highest power, and thus will dominate the strongest principal components. Due to a distinct dimension reduction by the PCA, this methodology was claimed to be resilient to overlearning; however, we do not find it useful for other than ocular artifacts-not all EMG (or other) artifacts have power higher than brain signals, and consequently they may not get included into a few strongest principal components. In fact, with weaker artifacts, the strongest principal components may contain mostly brain signals, the removal of which would lead to the corruption of artifact free EEG signals. Still, for the sake of completeness, we include this method in our evaluation of the current state-of-the-art.
Last, there are some methods that try to alter the basic approach (2)-(4). The wavelet enhanced ICA [7,8] extends the brain signal selection by adding wavelet filtering. Surface Laplacian [10] was also proposed to filter the EEG channels after the basic approach (2)-(4) was applied. These techniques, however, have no effect on the overlearning, which already occurs in the separation step (2). Another technique, the ensemble empirical mode decomposition and subsequent ICA [20], first decomposes each of the measured signals x[n] into the intrinsic mode functions, and then the ICA is applied. This does affect overlearning, but in a negative way, because the number of signals separated by ICA increases, which makes the effects of overlearning more severe.
All in all, in our opinion, none of the above-mentioned methods provides an efficient way to suppress the ICA overlearning without problematic side effects. We have therefore developed our own approach to the ICA overlearning suppression, which seems to outperform the above-mentioned methods, and provides better EEG reconstruction of short EEG records with EMG artifacts.

Methods
In this section we will first describe the suggested algorithm, then point out some of its properties, and suggest how to choose its parameters.

Suggested algorithm
We propose to separate the M-dimensional signal x[n] into K subspaces with dimension L where P k are LxM subspace projection matrices (we assume that their rank is L, and their choice will be discussed in subsection Choice of subspaces), and X k ½n are L dimensional signals. Next, we apply the ICA separately to each signal X k ½n where B k is a separation matrix estimated for X k ½n (note that this means that the ICA will be computed K times for each X k ½n separately).
After, we classifyŜ k ½n ¼ ½ŝ k;1 ½n; . . . ;ŝ k;L ½n T as either brain signals or artifacts, and retain only brain signalsS k ½n ¼ ½s k;1 ½n; . . . ;s k;L ½n T s k;i ½n ¼ŝ k;i ½n ifŝ k;i ½n is a brain signal; 0 ifŝ k;i ½n is an artifact: ( ð7Þ This operation can be expressed in a matrix form as where Q k is a diagonal matrix with its diagonal comprised of zeros and ones, positioned so that the signalsŝ k;i ½n classified as artifacts are eliminated according to (7). Now, we reconstruct signalsX k ½n, in which the muscular artifacts are suppressed The above-mentioned sequence of steps can be expressed as where we denoted C k ¼ B À 1 k Q k B k . Last, we combine the individual signalsX k ½n, reconstructing the artifact free EEG. For this purpose we denote ; ð11Þ : ð12Þ Using notations (11) and (12), all the above-mentioned steps can be expressed as To combine the elements ofX ½n into the reconstructed signalsx½n, we reverse the projection into subspaces (matrix P) by employing the Moore-Penrose pseudoinverse of P Thus, the entire transformation that suppresses the artifacts can be expressed as where we denoted D ¼ P y CP.

Notes about properties of suggested algorithm
Since the dimension L of the subspaces is smaller than the dimension M of the original signals x[n], the ICA that is used to find the separation matrices B k may be less prone to overlearning. In essence by choosing L sufficiently small, the problems that we illustrated in our Example 2 disappear, and the ICA behaves as in low dimensional case shown in Example 1.
If the subspace projection matrices P k are chosen so that P has full rank, and if no artifacts are found in the separated signalsŜ k ½n, thenS k ½n ¼Ŝ k ½n (i.e. Q k are identity matrices), C k and C become identity matrices, and D becomes the identity matrix. Consequently, no distortion is introduced into the processed signals. This is a great advantage over the PCA based overlearning suppression approach, where the rank of measured signals is always reduced, irrespective to the number of components that are being removed (i.e. the rank is reduced even if no artifacts are being removed), which causes unavoidable distortion of the resulting reconstructed signals.
If artifacts are detected in the source signalŜ k ½n the rank ofS k ½n and C will be reduced. The rank of D may then also be reduced; however, in the Results section we will show that the newly proposed method causes dimensionality reduction much less severe than the one caused by the PCA based overlearning suppression.

Choice of subspaces
Three points should be observed when choosing the subspaces defined by the projection matrix P.
1. The dimension L should be small enough so that the overlearning does not pose a problem.
It should however remain big enough so that there is still a sufficient number of signals so that the ICA can separate brain signals and artifacts.
2. The matrix P should be full rank (otherwise D would not be identity when no artifacts are being removed, and unnecessary EEG distortion would occur).
3. The orientation of subspaces should be chosen so that the individual ICAs can achieve good separation between brain signals and artifacts in separated signalsŜ k ½n even with the reduced dimension L of X k ½n.
To address point (i), in this paper we identify the optimal range of values L by applying the algorithm to EEG data, and evaluating for which L we obtain the best reconstruction of brain signals. In the following sections we will introduce a methodology for the evaluation of the quality of brain signal reconstruction, and we will show that even for various EEG electrode systems we can identify a fixed range of values of L where the algorithm always performs well. The reader can then use the value of L suggested by our evaluation, and the optimization of L does not need to be repeated.
Point (ii) can be achieved by construction of P-simply by choosing sufficient number of sufficiently diverse subspaces, P can be full rank.
Point (iii) is somewhat more difficult to tackle. In this paper we would like to take a simple approach that is based on our experience with processing of EEG data, and suggest a choice that can be shown to produce good quality of reconstruction of brain signals. Specifically, we have observed that when we use a high density EEG array (e.g. with 111 electrodes), but limit ourselves to a small group of adjacent electrodes (e.g. 10-15 adjacent electrodes), the ICA usually provides a nice separation of brain signals and muscular artifacts. We have actually already illustrated this situation in our Example 1 which shows signals from a subgroup of 9 adjacent electrodes chosen from the 111 electrode array used in Example 2. In this paper we would therefore suggest the following steps to choose the matrices P k .
First, let us denote d ij the Euclidean distance between i-th and j-th electrode. Next, we define vectors where ℓ k,i , i = 1,. . ., L are indexes of L electrodes with the smallest distance d ' k;i; k from the k-th electrode (with k-th electrode included). Now, we define the vectors and construct the projection matrices as Thus, each projection matrix P k will extract a subspace composed of L signals from a group of L electrodes closest to the k-th electrode (k-th electrode included). Before continuing we will make a few notes about the projection matrices P k given by (18). Note that each p k,i (i.e. each row of P k ) is composed of zeros and a single value of 1. Thus, the columns of P k are orthogonal, and consequently the columns of P are orthogonal as well. Further, because k = 1,. . ., M, each column of P contains at least a single value of 1, and so has a nonzero norm. Consequently, P is a full rank matrix, just as we have required in point (ii).
We do not claim that our choice of P k is optimal, but we will illustrate and claim that it provides better results than all the above-mentioned state-of-the-art methods available today. Further optimization of P k may be possible, but it will not be addressed in this paper.
Last, some additional rationale for this choice of subspaces can be provided. The energy of surface EEG is typically dominated by shallow brain sources and artifacts. The head surface regions affected by these sources are typically somewhat spread due to the volume conduction; however, the greatest surface potential changes are still somewhat localized to the proximity of the origin of the source [21]. By choosing each subspace as the signals from a group of adjacent electrodes, we try to minimize the number of sources that dominate the signals in a subspace (at least as opposed to a subspace that would be composed of signals from nonadjacent electrodes spread all around the subject's head). Consequently, it becomes easier for the ICA to separate brain signals and artifacts as was requested in point (iii).

ICA algorithm
To find the separation matrices B k , we suggest to use the FastICA [17,Eq (14,21,24)] on data that were pre-whitened by PCA [22]. This is a popular approach, with fast convergence and previous successes in artifact separation [2,3]. Note that the suggested methodology may work with some other ICA algorithm, but testing the performance with other ICA algorithms is out of the scope of this paper. It is also noteworthy that works [12][13][14] reported that the overlearning does not seem to be dependent on the choice of a specific ICA algorithm.

Artifact classification
To decide whether the separated componentsŜ k ½n are of a brain or a muscular origin, we used a simple classifier that compares the power of a signal at lower and higher frequencies.
Compared to clean EEG, the EMG artifacts are known to have higher energy at higher frequency bands [11,18,23], which allows their easy identification.
We first filtered each signalŝ k;i ½n into three frequency bands 3-30Hz, 60-90Hz and 110-140Hz. Then, we estimated the average power of the signals in each of these bands as P k,i,1 , P k,i,2 and P k,i,3 , respectively. Next, we computed the minimum of power ratios a k;i ¼ min ðP k;i;1 =P k;i;2 ; P k;i;1 =P k;i;3 Þ: Last, the values of α k,i were compared with a chosen threshold T to determine whetherŝ k;i ½n is of a brain or a muscular origin.
To determine the optimal value of the threshold T, we computed α k,i for 2992 separated signalsŝ k;i ½n that were manually classified as either brain signals or muscular artifacts. The computed α k, i were then used to plot histograms shown in Fig 3, where a clear separation can be seen between the values of α k, i for the individual classes. From Fig 3 we can read that the optimal value of T is about 2.5. Thus, a signalŝ k;i ½n was classified as a muscular artifact if α k,i < 2.5.

Evaluation of algorithm performance
To evaluate the properties of the suggested method, we may apply it to EEG records containing EMG artifacts. This approach has however one serious drawback. While it is possible to observe that the artifacts are suppressed, it is not possible to judge how well the brain signals are reconstructed-with a common EEG record we have no way of knowing what exactly should the EEG look like, when the artifacts are removed. Therefore, this method cannot be objectively evaluated on real world EEG signals corrupted by EMG artifacts.
To circumvent this limitation while keeping the evaluation as realistic as possible, we have devised a testing procedure that uses real world EEG and EMG records, but simulates their Suppression of overlearning in ICA used for removal of EMG artifacts from EEG records mixing using a realistic head model. Specifically, we used the following approach. We started with artifact free EEG signals, henceforth denoted as x clean [n]. These signals were recorded on 20 healthy subjects (10 male, 10 female) with age ranging from 19 to 37 years (25.4 ± 5.1 years). During the acquisition of EEG data, the participants were seated with their head supported by a headrest, and instructed to relax. From each subject we obtained a 10-minutes long record with their eyes open and 2-minute long records with their eyes closed. These measurements were approved by a local ethical committee and an informed consent in the written form was obtained from all participants. The data from 111 electrode system with a reference placed on a forehead were digitized with 16 bit resolution and sampled at the frequency of 1024Hz. The data were filtered by a notch filter removing any possible power noise interference at 50Hz and higher harmonics, and by a high pass filter with 1Hz cut-off frequency to suppress the baseline wonder. The channels showing poor electrode connection were visually identified and rejected from further processing. After recording, the EEG was visually checked for artifacts. Special attention was paid to the fringe electrodes that are typically most affected by muscular activity. Only the parts of EEG records with no visually recognizable artifacts were used for testing.
Once, the artifact free EEG signals x clean [n] were gathered, we added the muscular artifacts obtained by a simulation on boundary element method (BEM) based realistic head model with the same electrode arrangement as the one used for the EEG measurement. We used a BEM model composed of 19440 elements arranged in 3 layers representing air-skin, skinskull and skull-brain boundaries. This model was previously used in [24,25] (see Fig 1 in [25]). In this simulation we placed 5 sources representing cervical muscles and 6 sources representing mandibles into the head model. Temporal signals for these sources were obtained independently for each source from separate surface EMG records measured by electrodes placed on mandibles and across neck during jaws contraction and head movements. Using this combination of EMG signals and realistic head modeling, we created EMG artifacts that could be realistically observed in EEG electrodes. Once the EMG artifacts, henceforth denoted as x artifact [n], were generated, they were added to the artifact-free EEG, creating a mixture x[n] that was subsequently processed by the newly suggested and other state-of-theart algorithms The constant η was chosen as where k.k is the Euclidean vector norm, and the ξ sets the ratio between the energy of signals   Suppression of overlearning in ICA used for removal of EMG artifacts from EEG records Once the reconstructed signalsx½n were obtained, the quality of their reconstruction was evaluated using an average correlation coefficient Because the values of r are random numbers, the comparison between the values of r obtained for various methods necessitates the use of statistical testing. We therefore obtained signals x[n] 35000 times, applied the above-mentioned methods to each of these trials, averaged the resulting values of r obtained for each trial into one mean value " r, and computed their standard deviation σ. These values were then compared with the one way ANOVA and the post-hoc Scheffe test.
The evaluation described above was used for two purposes. First, we examined the performance of the newly suggested method for various values of L and three different electrode systems (111 electrode system, 10-10 electrode system and 10-20 electrode system). The results were used to identify the values of L, for which the newly suggested method provides the best results. Second, we examined and compared the performance of all the above mentioned methods (the newly proposed method and the state-of-the-art methods)-for each method we computed the value of " r, and used these values to compare how well the individual methods remove EMG artifacts.
The evaluation is performed for EEG records that are 1s long (in practice a longer EEG record would be segmented to these shorter segments, and each segment would then be processed separately by an artifact removing algorithm). We find this segment length sufficient for the artifact removal, but short enough to accommodate the nonstationarity of EMG artifacts.
Last, to further test whether the proposed algorithm does not corrupt useful information in EEG data, we followed approach from [11], and checked whether our algorithm does not impair the detection of changes in alpha activity occurring when the measured subjects open or close their eyes. For this purpose, we used two minute long EEG records measured with subjects' eyes closed, and two minute long EEG records measured with subjects' eyes open. We used only records from the subjects that manifested distinct augmentation of alpha activity when their eyes were closed (based on visual inspection of signals in the time domain). This was the case in 13 out of 20 participants. The measured signals were processed by the proposed method using segmentation into 1 second long segments. To examine the effect of our method, we estimated PSDs before and after its application. The PSDs were computed for each subject using signals from an electrode that would correspond to the electrode OZ of the 10-10 electrode system. The PSDs estimation procedure was based on Welch's method with segmentation into 1s long segments with 75% overlap and weighting with the Hamming window. Last, the individual PSDs were averaged over subjects providing the average PSDs presented in section Results.

Results
The Fig 6 shows the values of " r computed with different subspace dimensions L for various electrode systems (ξ was chosen as 4). Note that for all electrode systems the best performance is achieved with L within interval 10-20. For further computations we chose value L = 12, and we suggest to use this value with future applications of our algorithm (the optimization of L does not need to be repeated).
We have also performed this optimization for the signal length of 2s and 4s, and found very similar results: the best performance was achievable for L within interval 10-20. Tables 1, 2 and 3 show the values of " r and σ obtained for the suggested algorithm (with L = 12) and other state-of-the-art algorithms. The last column indicates whether a statistically significant difference (SSD) was found between the sets of values r for respective method and the newly proposed method. These results were obtained for a reference electrode placed on a forehead. We also computed these values with a common average reference and also with data  Table 1. The correlation coefficients from the processing of EEG signals without muscular artifacts (ξ = 0).

Method
" r σ SSD re-referenced to several randomly picked electrodes and always achieved virtually identical results.
To present the best performance for the state-of-the-art methods we show the results for methods C and D with different numbers of principal components and the best performing Table 2. The correlation coefficients from the processing of signals with equally strong muscular artifacts and EEG (ξ = 1).

Method
" r σ SSD number of principal component (this number was found by the evaluation of all possible numbers of principal components, and cannot be realistically achieved in practice, where we have no way of knowing which result provides the highest " r).
To assess the dimensionality reduction caused by the newly proposed method, for each D we computed how many strongest singular values comprise 99% of the energy of all singular values. On average, we found this to be 78 for ξ = 0, 67 for ξ = 1 and 54 for ξ = 4. Note that these numbers are much higher than the number of PCs retained in the application of method C (Tables 1, 2 and 3). Therefore, our approach causes much lower dimensionality reduction, and consequently smaller corruption of clean EEG data.
In addition to the quantitative analysis provided above, we also present some illustrative results achieved when our method was applied to real EEG signals.
In Fig 7 we show the results obtained when the proposed method was applied to the real EEG signals with EMG artifacts shown in Fig 2A (the signal was split to 1s long segments and our method was applied separately in each of these segments). Fig 7 shows that the EMG artifacts were nicely removed, while the effects of overlearning (shown in Fig 2B) were not observed in any of the estimated source subsetsŜ k ½n during the processing. In addition, Fig 8 shows the topographical power distribution of these signals before and after the application of our method. In Fig 9 we also show the PSDs computed for five different positions on the scalp (these PSDs were computed using Welch's method with 0.5s long segments overlapped by 75% and weighted by the Hamming window). Note the decrease of power in fringe electrodes, which is associated with the removal of muscular artifacts. These results are similar to those previously presented in [11].  Suppression of overlearning in ICA used for removal of EMG artifacts from EEG records Overall, the results shown in Figs 7,8 and 9 show nice suppression of EMG artifacts without any noticeable effects of overlearning. However, as stated above, this kind of application cannot be used to quantitatively evaluate the quality of EEG reconstruction; therefore, we include it for illustrative purposes only.
Last, Fig 10 illustrates the changes in the alpha band power in EEG measured on subjects with eyes open and closed. The average PSDs clearly indicate the increase of alpha band power when subjects' eyes were closed. More importantly, Fig 10 also shows that this change does not seem to be impaired by the application of our method.

Discussion
The results show that the newly proposed method outperforms all the presented state-of-theart methods.  Suppression of overlearning in ICA used for removal of EMG artifacts from EEG records The presented results also confirm the weaknesses of the individual state-of-the-art methods pointed out in our introduction. The PCA based dimensionality reduction (method C) distorts the EEG even if no artifacts are being removed. This can be seen as decrease of " r for clean EEG (ξ = 0). The high pass filtering methods (methods Ba and Bb) show consistently poor performance for all setups, because with their use the overlearning is not suppressed. Method D performs poorly for artifact free EEG (ξ = 0), where the strongest principal components are composed of brain signals only, and are incorrectly removed from the original EEG. For the stronger artifacts the performance improves; however, it is never as good as for the newly proposed method.
Further, we would like to discuss some additional points related to our method. In this paper we concentrate primarily on EMG artifacts caused by transient muscular contractions (henceforth, transient EMG artifacts). Besides these artifacts, EEG can also be contaminated by EMG artifacts caused by tonic muscular contraction (henceforth, tonic EMG artifacts). The tonic EMG artifacts can be quite weak, even difficult to recognize upon visual inspection, and of little concern in some applications (e.g. see a small effect of tonic EMG artifacts on evoked response potentials in [11]).
We of course expect that our method will suppress even the tonic artifacts to some extentcompared to transient EMG artifacts they are just weaker and more stationary. Therefore, as long as the ICA is able to separate them, they will be identified and removed from the EEG. In fact, the decrease of power on higher frequencies shown in Fig 10 can likely be attributed to the suppression of tonic EMG artifacts, and this suppression is similar to the results presented in [11].
However, in principle, a stationary interfering source will be better removed using a longer EEG record. Therefore, if a researcher faces a situation where any residual tonic EMG artifacts remaining after the application of our method are still concerning, the application of our https://doi.org/10.1371/journal.pone.0201900.g010 method can be followed by the application of a 'classical' EMG removal procedure (e.g. [11]) that operates on longer EEG records. In this arrangement our method would be better suited to deal with transient EMG artifacts, and the 'classical' EMG removal procedure would be better suited to remove any residual tonic EMG artifacts. In applications where the presence of residual tonic EMG artifacts is not concerning, the application of our method alone would be sufficient.
In relation to tonic EMG artifacts, one could additionally point out that the EEG records x clean [n] may not be completely devoid of EMG artifacts, because the tonic EMG artifacts are always present, unless the measurement is performed in paralysis [11,23]. However, we would argue that while their presence cannot be excluded, visually unrecognizable tonic EMG artifacts have total power much smaller than the total power of EEG, and therefore the effect of these tonic EMG artifacts on the average correlation coefficients " r will be negligible, especially when we work with transient EMG artifacts with energy equal or greater than the energy of EEG.
Another point that we would like to address is related to our EMG classifier. Note that the classifier described in section Artifact classification is not the only possible approach. For example, work [11] suggested a classifier based on the slope of a power spectrum. In fact, we have also retested our method with the classifier from [11] and obtained virtually identical results. The reader should note that the classifier itself is not an unalterable part of the proposed method, and if necessary a researcher can adjust it to suit application-specific needs.
Further, in the motivation and evaluation of our method, we have limited ourselves to the removal of EMG artifacts, while making few comments about other types of artifacts. This was mostly dictated by a need to fit our presentation into a single paper, and it does not mean that the presented method cannot be generalized for other type of artifacts. In fact, as long as the ICA is able to extract the artifacts into separate components (which was already demonstrated for many different types of artifacts [3,4,16]), and we are able to classify these components respectively, the algorithm will allow to remove these components without the manifestation of overlearning.
In this paper we limit ourselves to EMG artifacts only, but because EMG artifacts are an omnipresent nuisance in a great number of EEG records, to have an algorithm that removes these artifacts alone, is, in our opinion, a quite useful contribution.
Regarding the future work, there are also other blind source separation algorithms that might be used for the suppression of EMG artifacts (e.g. SOBI [26], SSA [27], CCA [28]). It may be interesting to explore their overlearning properties, and compare their performance with the ICA based ones. However, before doing so we find it important to first improve the ICA based algorithm, as we attempt to do in this paper.

Conclusion
This paper presents an ICA based algorithm for the removal of muscular artifacts that is not prone to overlearning for short signals. This allows its application to short signal segments, which makes it better suited for the removal of artifacts that are distinctly non-stationary and appear, disappear or change their origin within a time frame of a few seconds or less. Thus, the algorithm can remove the artifacts present only in short segments to which it is applied. If the artifacts are not detected, the original EEG is not distorted. If the composition of artifacts is changed, the algorithm can react quickly.
We show that the proposed algorithm outperforms all the current state-of-the-art approaches, and provides the best reconstruction of the original EEG.
The algorithm was presented for the purpose of muscular artifact removal; however, we note that it can be easily extended for the removal of other types of artifacts.
Because the muscular artifacts are a recurring problem in a great number of EEG records, we believe that the presented algorithm will provide a useful and efficient tool for their removal.