Effect of number and placement of EEG electrodes on measurement of neural tracking of speech

Measurement of neural tracking of natural running speech from the electroencephalogram (EEG) is an increasingly popular method in auditory neuroscience and has applications in audiology. The method involves decoding the envelope of the speech signal from the EEG signal, and calculating the correlation with the envelope of the audio stream that was presented to the subject. Typically, EEG systems with 64 or more electrodes are used. However, practical applications require set-ups with fewer electrodes. Here, we determine the optimal number of electrodes and the best positions for a limited number of electrodes on the scalp. We propose a channel selection strategy based on a utility metric, which allows a quick quantitative assessment of the influence of a channel (or a group of channels) on the reconstruction error. We consider two use cases: a subject-specific case, where the optimal number and positions of the electrodes are determined for each subject individually, and a subject-independent case, where the electrodes are placed at the same positions (in the 10-20 system) for all subjects. We evaluated our approach using 64-channel EEG data from 90 subjects. In the subject-specific case, we found that the correlation between the actual and reconstructed envelopes first increased with a decreasing number of electrodes, with an optimum at around 20 electrodes, yielding 29% higher correlations with the optimal number of electrodes than with all electrodes. This means that our strategy of removing electrodes can be used to improve the correlation metric in high-density EEG recordings. In the subject-independent case, we obtained a stable decoding performance when decreasing from 64 to 22 channels. When the number of channels was decreased further, the correlation decreased. For a maximal decrease in correlation of 10%, 32 well-placed electrodes were sufficient in 91% of the subjects.


To understand how the human brain processes an auditory stimulus, it is essential to use ecologically valid stimuli. An increasingly popular method is to measure neural …

… outside the passband. Finally, the data were downsampled to 64 Hz and re-referenced to Cz in the channel subset selection stage, and to a common-average reference (across the selected channels) in the decoding performance evaluation stage. The delta band was chosen because it yields the highest correlations and because most of the information in the stimulus envelope is contained within this frequency band (Vanthornhout et al., 2018; Ding and Simon, 2014). To assess the effect of frequency band on the results, we also analyzed the optimal number and placement of electrodes for measurement of neural tracking of speech in the theta band (4-8 Hz); the results are shown in the appendix.
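The downsampling and the two referencing schemes can be illustrated with a minimal NumPy/SciPy sketch; it is not the authors' pipeline. It assumes the EEG is stored as a (samples × channels) array, uses `scipy.signal.resample_poly` for downsampling (which applies an anti-aliasing filter), and, as an assumption on our part, drops the reference channel after referencing to Cz because that channel becomes identically zero. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import resample_poly

def downsample(eeg, fs_in, fs_out=64):
    """Resample a (samples x channels) array from fs_in Hz to fs_out Hz."""
    # resample_poly low-pass filters before decimating to avoid aliasing
    return resample_poly(eeg, up=fs_out, down=fs_in, axis=0)

def reref_to_channel(eeg, ref_idx):
    """Reference all channels to one electrode (e.g. Cz) and drop that electrode."""
    ref = eeg[:, [ref_idx]]                 # keep 2-D shape for broadcasting
    return np.delete(eeg - ref, ref_idx, axis=1)

def reref_common_average(eeg):
    """Subtract, at every sample, the mean over the (selected) channels."""
    return eeg - eeg.mean(axis=1, keepdims=True)
```

For example, with an assumed recording rate of 8192 Hz, `downsample(eeg, 8192)` yields 64 Hz data. `reref_to_channel` would be applied in the selection stage, and `reref_common_average`, on the selected channels only, in the evaluation stage.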

The pre-processing pipeline does not include an artifact rejection step, as this would require the use of electrodes that may later be eliminated and could therefore leak information from the unselected channels to the selected ones. However, to investigate the effect of artifact rejection, we repeated the full analysis with artifact rejection, using the Sparse Time Artifact Removal method (STAR) (de Cheveigné, … signed rank test of the effect of artifact rejection on correlation showed no significant effect.

… (2017), who showed that good reconstruction accuracy can be achieved with a gammatone filterbank followed by a power law. We used a gammatone filterbank (Søndergaard et al., 2012; Søndergaard and Majdak, 2013) with 28 channels spaced 1 equivalent rectangular bandwidth apart and centre frequencies from 50 Hz to 5000 Hz. From each subband, we took the absolute value of each sample and raised it to the power of 0.6. The resulting 28 signals were then downsampled to 1024 Hz, averaged, bandpass filtered with a Chebyshev filter (0.5-4 Hz) to obtain the final envelope, and finally downsampled again to 64 Hz. The power law was chosen because the human auditory system is not linear and includes compression. The gammatone filterbank was chosen because it mimics the auditory filters of the basilar membrane in the cochlea.

…

$$\hat{w} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2 \qquad (1)$$

where $X \in \mathbb{R}^{T \times (N\tau)}$ is the EEG data matrix concatenated with its time-shifted (zero-padded) versions, so that each channel contributes τ columns, $y \in \mathbb{R}^{T \times 1}$ is the speech envelope, $w \in \mathbb{R}^{(N\tau) \times 1}$ is the decoder, T is the total number of time samples, N is the number of channels, τ is the number of time samples covering the time integration window of interest, and λ is a regularization parameter. The solution to this backward problem (ŵ) is usually referred to as a decoder.
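A minimal NumPy sketch of the backward model in (1) follows, under stated assumptions: the EEG is a (T × N) array; the lag columns are ordered channel by channel, so the τ columns of one channel are contiguous and removing a channel with all its lags is a simple column selection; and the EEG is shifted forward in time relative to the envelope, because the neural response follows the stimulus. The helper names `lag_matrix` and `ridge_decoder` are hypothetical. The closed-form solution of (1) is ŵ = (XᵀX + λI)⁻¹Xᵀy, computed here with a linear solve rather than an explicit inverse.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Build X (T x N*n_lags) from eeg (T x N).

    Column c*n_lags + lag holds channel c advanced by `lag` samples,
    i.e. X[t, c*n_lags + lag] = eeg[t + lag, c]; the tail is zero-padded.
    """
    T, N = eeg.shape
    X = np.zeros((T, N * n_lags))
    for c in range(N):
        for lag in range(n_lags):
            X[: T - lag, c * n_lags + lag] = eeg[lag:, c]
    return X

def ridge_decoder(X, y, lam):
    """Solve min_w ||y - X w||^2 + lam ||w||^2 via the normal equations."""
    R = X.T @ X + lam * np.eye(X.shape[1])   # regularized covariance
    r = X.T @ y                              # cross-correlation vector
    return np.linalg.solve(R, r)
```

The reconstructed envelope is then `X @ w_hat`. At 64 Hz, a τ of, say, 16 samples would span 250 ms; the integration window actually used is not given in this excerpt.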

In order to choose the regularization parameter λ, we computed and sorted the eigenvalues of the covariance matrix associated with X. We then set λ to the eigenvalue at which the cumulative percentage of explained variance exceeds 99%.

2.2.4. Channel selection

To select channels, we used the utility metric (Bertrand, 2018), which quantifies the effective loss, i.e., the increase in the LS cost, if a group of columns (corresponding to one channel or a set of channels, together with all their τ − 1 corresponding time-shifted versions) were removed and the model (1) were reoptimized afterwards:

$$U_g = \min_{w_{-g}} \left( \lVert y - X_{-g} w_{-g} \rVert_2^2 + \lambda \lVert w_{-g} \rVert_2^2 \right) - \left( \lVert y - X\hat{w} \rVert_2^2 + \lambda \lVert \hat{w} \rVert_2^2 \right)$$

where $X_{-g}$ denotes the EEG data matrix X after removing the columns associated with the g-th group of channels and their corresponding time-shifted versions. We define how channels are grouped in our experiments in Subsection 2.2.5.
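Both computations in this paragraph can be sketched in NumPy. `choose_lambda` follows the 99% explained-variance rule; we assume the covariance is the unnormalized XᵀX, so that λ is on the same scale as the ridge problem in (1). `group_utilities` avoids re-solving one LS problem per group by using a block-matrix-inversion identity, U_g = ŵ_gᵀ(S_gg)⁻¹ŵ_g with S = (XᵀX + λI)⁻¹, where ŵ_g and S_gg are the entries of ŵ and S belonging to group g. To our understanding this is the group form of the utility in Bertrand (2018), but the paper should be consulted for the exact formulation. `naive_utility` re-optimizes explicitly and serves only as a check. All names are hypothetical.

```python
import numpy as np

def choose_lambda(X, threshold=0.99):
    """Eigenvalue at which the cumulative explained variance first exceeds threshold."""
    eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]          # sorted, largest first
    explained = np.cumsum(eigvals) / eigvals.sum()
    return eigvals[np.argmax(explained > threshold)]     # first index above threshold

def group_utilities(R, r, groups):
    """Utility of every column group from a single inverse of R = X^T X + lam*I."""
    S = np.linalg.inv(R)
    w = S @ r                                            # all-channel decoder
    utils = []
    for cols in groups:
        w_g = w[cols]
        utils.append(float(w_g @ np.linalg.solve(S[np.ix_(cols, cols)], w_g)))
    return utils

def naive_utility(R, r, cols):
    """Same quantity by explicit re-optimization (one LS solve per group)."""
    keep = np.setdiff1d(np.arange(len(r)), cols)
    # the minimum of the cost in (1) is ||y||^2 - r^T R^{-1} r, so the
    # increase after removing `cols` is the drop in r^T R^{-1} r
    full = r @ np.linalg.solve(R, r)
    reduced = r[keep] @ np.linalg.solve(R[np.ix_(keep, keep)], r[keep])
    return float(full - reduced)
```

The fast version shares one matrix inverse across all groups, whereas the naive version solves one reduced LS problem per group.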

Note that a naive implementation of computing U_g would require solving one LS … workflow for finding the best k groups of EEG channels can be summarized as follows … and remove the group with the lowest utility. Next, we recalculate the values of the utility metric taking into account only the remaining groups, and once again we remove the one with the lowest utility. We continue iterating these steps until k groups remain.

We used the utility metric in two conditions: (1) in the subject-specific case, where optimal electrodes are selected for each subject, and (2) in the generic case, where the same set of electrodes is used for all subjects.
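The greedy backward elimination described above can be sketched as follows. This is an illustrative NumPy version, not the toolbox the authors used. `greedy_elimination` and its arguments are hypothetical: `R` is assumed to be the regularized covariance XᵀX + λI of all columns, `r` = Xᵀy, and each entry of `groups` lists the column indices of one channel group including all its lags. At every iteration the utilities are recomputed on the reduced problem with the closed form U_g = ŵ_gᵀ(S_gg)⁻¹ŵ_g, so only one matrix inverse is needed per iteration.

```python
import numpy as np

def greedy_elimination(R, r, groups, k):
    """Remove, one at a time, the group with the lowest utility until k remain.

    Returns (kept, removed): indices into `groups`; `removed` is in removal
    order, so reversing it ranks the discarded groups by importance.
    """
    remaining, removed = list(range(len(groups))), []
    while len(remaining) > k:
        # reduced problem restricted to the columns of the remaining groups
        cols = np.concatenate([np.asarray(groups[g]) for g in remaining])
        S = np.linalg.inv(R[np.ix_(cols, cols)])
        w = S @ r[cols]
        utils, start = [], 0
        for g in remaining:
            idx = np.arange(start, start + len(groups[g]))  # positions in reduced problem
            start += len(groups[g])
            utils.append(w[idx] @ np.linalg.solve(S[np.ix_(idx, idx)], w[idx]))
        removed.append(remaining.pop(int(np.argmin(utils))))
    return remaining, removed
```

Because λI is diagonal, the submatrix of XᵀX + λI for the kept columns equals the regularized covariance of the reduced data, so no refit on X is needed.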

In the subject-specific case, we computed (for each subject i) the regularized covariance matrix $R^{(i)} = X^{(i)\top} X^{(i)} + \lambda I$ (I denotes the identity matrix) and the cross-correlation vector $r^{(i)} = X^{(i)\top} y$ in order to compute the optimal all-channel decoder … corresponding to the channels in group g were removed. We kept repeating this process until only k groups remained.

Next, during the decoding evaluation stage, we computed a decoder by solving the backward problem using the best k selected groups of channels for each subject. In this stage, we re-referenced the channels with respect to the common average across the selected channels and discarded the reference electrode Cz. We solved each backward problem using a 7-fold cross-validation approach, where 6 folds were used for training and 1 for testing. This corresponds to approximately 12 and 2 minutes of data, respectively.
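The 7-fold evaluation for one subject and one channel subset can be sketched by combining the ridge decoder of (1) with a Spearman correlation on each held-out fold. We assume contiguous, equally sized folds (the excerpt does not say how folds were formed) and a lagged data matrix `X` already restricted to the selected, re-referenced channels; `cv_correlations` is a hypothetical name. With about 14 minutes of data, each of the 7 folds holds roughly 2 minutes, consistent with the 12/2 minute split above.

```python
import numpy as np
from scipy.stats import spearmanr

def cv_correlations(X, y, lam, n_folds=7):
    """Train on n_folds-1 contiguous folds, test on the held-out one.

    Returns one Spearman correlation per test fold.
    """
    folds = np.array_split(np.arange(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        Xtr, ytr = X[train], y[train]
        # ridge solution of (1) on the training folds only
        w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
        rho, _ = spearmanr(X[test_idx] @ w, y[test_idx])
        scores.append(rho)
    return np.array(scores)
```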

Using the decoder ŵ, we computed the reconstructed envelope as ŷ = Xŵ, after which we computed the Spearman correlation between the reconstructed speech envelope (ŷ) and the true one (y). Following this procedure, we obtained, for each subject and each number of selected groups, 7 correlation values (one per test fold), which can be arranged in an array $S \in \mathbb{R}^{90 \times k \times 7}$ (number of subjects × number of groups × number of test folds).
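Given the array S, the group-level curves and the per-subject optimum used in the results can be computed as sketched below. We assume the second axis of S indexes the evaluated subset sizes and that `n_channels` lists the corresponding numbers of channels; both function names are hypothetical. The aggregation order (median across folds, then median across subjects) follows the description in the results.

```python
import numpy as np

def summarize(S):
    """S: (subjects x subset sizes x folds) array of correlations.

    Median across folds, then median across subjects: one value per subset size.
    """
    per_subject = np.median(S, axis=2)        # (subjects, subset sizes)
    return np.median(per_subject, axis=0)

def optimal_per_subject(S, n_channels):
    """Per subject: the number of channels with the highest fold-median correlation."""
    per_subject = np.median(S, axis=2)
    best = np.argmax(per_subject, axis=1)
    return np.asarray(n_channels)[best], per_subject[np.arange(len(best)), best]
```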

To compare with the literature, we also implemented the DMB approach, in which we iteratively solved a backward problem for each subject and, at each iteration, removed the group of electrodes with the lowest corresponding coefficient magnitudes in the decoder before the next iteration.
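The DMB baseline can be sketched in the same style. How the coefficient magnitudes of a group (several channels and lags) are aggregated is not specified in this excerpt; we assume the Euclidean norm of the group's coefficients. `dmb_elimination` is a hypothetical name, and each entry of `groups` is assumed to list the column indices of one group, including all its lags.

```python
import numpy as np

def dmb_elimination(X, y, lam, groups, k):
    """Decoder-magnitude-based backward elimination (illustrative sketch).

    At each iteration a ridge decoder is fitted on the remaining groups and the
    group whose coefficients (all lags) have the smallest Euclidean norm is
    removed. Returns the indices of the k remaining groups.
    """
    remaining = list(range(len(groups)))
    while len(remaining) > k:
        cols = np.concatenate([np.asarray(groups[g]) for g in remaining])
        Xs = X[:, cols]
        w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(cols)), Xs.T @ y)
        norms, start = [], 0
        for g in remaining:
            n = len(groups[g])
            norms.append(np.linalg.norm(w[start:start + n]))  # group magnitude
            start += n
        remaining.pop(int(np.argmin(norms)))
    return remaining
```

Unlike the utility metric, this criterion refits the decoder on X at every iteration and ranks groups by coefficient size rather than by the change in the LS cost.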

As a reference, we also implemented the forward model, in which the correlation between the actual EEG and the EEG predicted from the speech envelope is obtained for each subject and electrode. The results are shown in the appendix.

In the subject-independent case, where the same set of electrodes is used for all subjects, we only used the utility metric. The evaluation consisted of the same two stages described above. The only difference was that, during the channel selection stage, we computed a grand average model by averaging the covariance matrices of all the subjects, which is equivalent (up to a scaling factor) to concatenating all the data from all the subjects in the data matrix X in (1). Finally, the decoding evaluation stage followed exactly the same steps described for the subject-specific case above, i.e., using a subject-specific decoder (computed, however, over electrodes that were selected in a subject-independent fashion).

‡ We used the utility metric toolbox from Narayanan and Bertrand (2019), available at https://github.com/mabhijithn/channelselect

2.2.5. Symmetric grouping of the EEG channels

In addition to selecting individual channels to remove (no grouping of channels), we also evaluated a strategy in which symmetric groups of channels were removed, to avoid hemisphere bias effects across subjects. Each group is composed of two EEG channels (see Figure 1). For channels …

Figure 1: Channel grouping strategy. For channels located over either the left or the right hemisphere (groups 1, 2, ..., 27), each group is composed of one channel located over the left hemisphere and its closest symmetric counterpart located over the right hemisphere. For channels located over the central line dividing both hemispheres (groups 28, 29, 30, 31), each group is composed of one channel located over the frontal lobe and its closest symmetric counterpart located over either the parietal or the occipital lobe.
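Two details of this subsection can be checked numerically: averaging the per-subject statistics XᵀX and Xᵀy gives the statistics of the concatenated data divided by the number of subjects, and a symmetric group of two channels maps to the union of both channels' lag columns. The sketch assumes a channel-major column layout (the τ lag columns of channel c are contiguous) and uses made-up channel indices; the function names are hypothetical.

```python
import numpy as np

def grand_average_stats(X_list, y_list):
    """Average X^T X and X^T y over subjects (one lagged matrix per subject)."""
    n = len(X_list)
    R = sum(X.T @ X for X in X_list) / n
    r = sum(X.T @ y for X, y in zip(X_list, y_list)) / n
    return R, r

def group_columns(channel_groups, n_lags):
    """Column indices of each channel group, including all n_lags lags."""
    return [np.concatenate([np.arange(c * n_lags, (c + 1) * n_lags) for c in grp])
            for grp in channel_groups]
```

Because the factor 1/S (for S subjects) multiplies both R and r, the decoder and the utility ranking match those of the concatenated data, provided λ is scaled accordingly, for instance by choosing it from the eigenvalues of the averaged matrix.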

We compared the performance of the utility metric and DMB in the subject-specific case, where the optimal electrode locations were determined for each subject individually.

We compared the median of the correlation between y and ŷ for each subject, as well as the number of channels required to obtain it (from now on referred to as the optimal number of channels). For both methods, we observe a large increase in correlation when we use a reduced number of channels, with the optimum of the median at around 20 and 30 channels for the utility metric and DMB, respectively (see Figure 2a). This means that the evaluated strategies of removing electrodes can be used to substantially improve the correlation metric in high-density EEG recordings.

We can see in Figure 2a that the utility metric globally outperforms the DMB approach, obtaining consistently higher (median) correlations across subjects. In Figure 2b, we can see that the utility metric also outperforms the DMB approach on an …

… remove channels one by one, obtaining the best channels for each subject independently. Figure 3a shows the median correlation, computed as the median across folds followed by …

Figures 3a and 3b suggest that we could obtain a higher correlation with a reduced number of channels. However, these are group results. Figure 3c shows, separately for each subject, the difference between the correlation when we use all 64 channels and when we use a reduced number of channels. We can see that this effect is indeed consistently present for all subjects when we use between 19 and 57 channels.

This behaviour can be seen more clearly in Figure 5a, which shows the percentage of subjects with a correlation greater than or equal to 100%, 95% and 90% of the correlation obtained using all the channels (green, purple and cyan lines, respectively). Figure 5a clearly shows that for 98% of the subjects it is possible to reduce the number of channels to 19 and still obtain a correlation higher than the one obtained using all the channels.

Even if we go all the way down to 8 channels, 82%, 91% and 96% of the subjects still obtain a correlation higher than 100%, 95% and 90% of the correlation obtained using all channels, respectively. In the appendix, the same analysis is conducted for the theta frequency band.
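The percentages plotted in Figure 5 can be computed from per-subject correlations as sketched below. We assume `corr_reduced` holds each subject's median correlation for every evaluated number of channels and `corr_all` the correlation with all channels; the function name is hypothetical. The relative threshold is only meaningful when the all-channel correlation is positive: for a negative value, 90% of it is a stricter bound than 100%.

```python
import numpy as np

def percent_meeting_threshold(corr_reduced, corr_all, fraction):
    """Percentage of subjects whose reduced-set correlation is at least
    `fraction` times their all-channel correlation, per channel count.

    corr_reduced: (subjects, n_counts); corr_all: (subjects,).
    """
    ok = corr_reduced >= fraction * corr_all[:, None]
    return 100.0 * ok.mean(axis=0)
```

Calling it with `fraction` set to 1.0, 0.95 and 0.9 gives the green, purple and cyan curves, respectively.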

Generally, the same trends are observed as in the delta band. A Wilcoxon signed rank test showed a significant difference (W=0, p < 0.001) between the correlation obtained using the optimal number of channels according to the utility metric (median=0.12) and the one obtained using all the channels (median=0.06), which is a 100% improvement. This suggests that these results are robust to the choice of frequency band and filter parameters. However, different electrodes are selected.

Figure 2: Comparison of channel selection strategies: utility metric vs DMB (subject-specific scenario). (b) Comparison of the correlation obtained using the optimal number of channels (the number of channels at which each subject obtained the highest correlation) with DMB and with the utility metric. Marker size is proportional to the optimal number of channels (one marker per subject); for comparison, the grey marker has a size equivalent to 64 channels. A Wilcoxon signed rank test showed a significant difference (W=6, p < 0.001) between the correlation obtained using the optimal number of channels according to the utility metric (median=0.22) and the one obtained using DMB (median=0.19). Another Wilcoxon signed rank test showed a significant difference (W=424.5, p < 0.001) between the optimal number of channels selected by the utility metric (median=19) and the one selected by DMB (median=32).
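The paired comparisons reported in these captions (one value per subject under each condition) can be reproduced with `scipy.stats.wilcoxon`. The data below are synthetic, not the study's; they only illustrate that when every subject improves, the two-sided statistic W, which SciPy reports as the smaller of the positive and negative rank sums, equals 0. This is how the W=0 values above can be read.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(6)
corr_all = rng.uniform(0.10, 0.20, size=90)             # synthetic, one value per subject
corr_opt = corr_all + rng.uniform(0.01, 0.05, size=90)  # every subject improves
res = wilcoxon(corr_opt, corr_all)                       # paired test, two-sided by default
print(res.statistic, res.pvalue)
```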

Subject-independent electrode locations
We now consider the case where the same set of electrodes is used for all subjects. Figure 4a shows the correlation across subjects, computed as the median across folds followed by the median across subjects. In this figure, we can see that at least 50% (median) of the subjects exhibit a stable correlation for 22 to 64 channels.

Contrary to the subject-specific electrode locations, here we found a small benefit of using the symmetric channel grouping strategy: median correlations with the optimal number of channels significantly improved when moving from the channel-by-channel to …

Figure 3: Comparison of the channel selection based on the utility metric vs using all the channels (subject-specific scenario). (d) Comparison of the correlation obtained using the optimal number of channels (the number of channels at which each subject obtained the highest correlation) vs the correlation obtained using all the channels. Marker size is proportional to the optimal number of channels (one marker per subject). The grey marker has a size equivalent to 64 channels. A Wilcoxon signed rank test showed a significant difference (W=0, p < 0.001) between the correlation obtained using the optimal number of channels suggested by the utility metric (median=0.22) and the one obtained using all the channels (median=0.17).

… using all the 64 channels).

Figure 4: Comparison of the channel selection based on the utility metric vs using all the channels (subject-independent scenario). (d) Comparison of the correlation obtained using the optimal number of channels (the number of channels at which each subject obtained the highest correlation) vs the correlation obtained using all the channels. Marker size is proportional to the optimal number of channels (one marker per subject). The grey marker has a size equivalent to 64 channels. A Wilcoxon signed rank test showed a significant difference (W=0, p < 0.001) between the correlation obtained using the optimal number of channels suggested by the utility metric (median=0.18) and the one obtained using all the channels (median=0.16).

… obtained using all the available channels even if we use a reduced number of channels.

However, these are group results. Figure 4c shows, separately for each subject, the value of the correlation when we use a reduced number of channels. We can see that this effect is not consistently present for all subjects (if it were, all the lines would appear above 0 for a reduced number of channels $n_k$, $22 \le n_k < 64$). Nevertheless, a certain percentage of subjects do exhibit a higher correlation when using a reduced number of channels. Figure 5b quantifies this property by showing the percentage of subjects with a correlation greater than or equal to 100%, 95% and 90% of the correlation obtained using all the channels (green, purple and cyan lines, respectively). In this figure, we can see that for 59%, 74% and 91% of the subjects it is possible to reduce the number of channels to 32 and still obtain a correlation higher than 100%, 95% and 90% of the correlation obtained using all channels, respectively. These percentages increase to 62%, 81% and 91%, respectively, if we increase the number of channels from 32 to 40.

Figure 5: Percentage of subjects with a correlation greater than or equal to 100%, 95% and 90% of the correlation obtained using all the channels. In the subject-specific scenario, for 98% of the subjects it is possible to reduce the number of channels to 19 and still obtain a correlation higher than the one obtained using all the channels. In the subject-independent scenario, for 59%, 74% and 91% of the subjects it is possible to reduce the number of channels to 32 and still obtain a correlation higher than 100%, 95% and 90% of the correlation obtained using all channels, respectively. These percentages increase to 62%, 81% and 91%, respectively, if we increase the number of channels from 32 to 40.

… electrode was selected in the corresponding subject-specific layouts. The ranges are 20-28, 31-40, 37-55 and 44-67 out of 90 subjects for the 8-, 16-, 24- and 32-channel layouts, respectively. These relatively small proportions again indicate that the generic layouts are not optimal. Note that in this analysis we did not apply the symmetric grouping constraint in the subject-independent case, for the sake of comparison.

… in a multi-speaker scenario). They processed EEG recordings from 12 and 29 subjects, acquired using EEG systems with 96 and 64 channels, respectively. They found that, on average, the decoding accuracy dropped when fewer than 25 channels were used. Both studies used the same channel selection strategy, which is based on an iterative backward elimination approach where, at each iteration, the channel with the lowest average decoder coefficient is removed before the next iteration. This strategy assumes that important channels will have a large coefficient in the LS solution. However, as explained in the introduction, this is not necessarily a suitable assumption. They did not report optimal electrode positions.

Figure 6: Optimal channel selection. (a) Best 8 channels. The number next to each group of channels (formed by two electrodes, see Figure 1) indicates the ranking of the group with respect to its influence on the LS cost (see text). The lower this number, the more important the group.

In the case where the same channels were selected for all subjects, the initial increase in correlation with a decreasing number of channels was smaller and not present for all subjects. Therefore, in this case our strategy is not useful for increasing correlation.

While the presented channel layouts for 8, 16, 24 and 32 channels are the best we can achieve with our current data and methods, and may be useful for some applications, it should be pointed out that they yield relatively poor performance compared to subject-specific layouts and are therefore certainly not optimal for all subjects.

… where an objective measure of speech intelligibility is needed. Our suggested electrode positions could be used to configure an electrode cap or headset for this specific application. We chose to run our calculations with the speech envelope as the stimulus feature and for the delta band (0.5-4 Hz), as these parameters are the most commonly used. Note that when deviating from these parameters, the selection should be re-run. In particular, when higher-order stimulus features are used, we expect significant changes in topography and therefore in optimal electrode positions.

Subject-specific electrode locations are at this point mainly useful to increase correlations when a full electrode cap is available. In this case, the utility-based algorithm would be part of the processing pipeline to retain the optimal number of electrodes.

In the future, subject-specific locations may also be useful to design a subject-specific headset based on initial recordings with a full cap. However, this will require validation of generalisability between test sessions and between EEG systems, which is currently unknown.

In this work, the effect of selecting a reduced number of EEG channels was investigated within the context of the stimulus reconstruction task. We proposed a utility-based greedy channel selection strategy, aiming to induce the selection of symmetric EEG channel groups. We evaluated our approach using 64-channel EEG data from 90 subjects. …

Figure A1: Comparison of channel selection strategies: utility metric vs DMB (subject-specific scenario in the theta band). (b) Comparison of the correlation obtained using the optimal number of channels (the number of channels at which each subject obtained the highest correlation) with DMB and with the utility metric. Marker size is proportional to the optimal number of channels (one marker per subject). The grey marker has a size equivalent to 64 channels. A Wilcoxon signed rank test showed a significant difference (W=0, p < 0.001) between the correlation obtained using the optimal number of channels according to the utility metric (median=0.12) and the one obtained using DMB (median=0.09). Another Wilcoxon signed rank test showed a significant difference (W=758.5, p < 0.001) between the optimal number of channels selected by the utility metric (median=20) and the one selected by DMB (median=29).

Figure A2: Comparison of the channel selection based on the utility metric vs using all the channels (subject-specific scenario in the theta band). (c) Normalized correlation per subject (each line is a different subject), defined as the difference between the correlation obtained using all the channels and the correlation obtained using a reduced number of channels. For the best electrode selection, correlations were on average 98% higher than when using all the available electrodes. (d) Comparison of the correlation obtained using the optimal number of channels (the number of channels at which each subject obtained the highest correlation) vs the correlation obtained using all the channels. Marker size is proportional to the optimal number of channels (one marker per subject). The grey marker has a size equivalent to 64 channels. A Wilcoxon signed rank test showed a significant difference (W=0, p < 0.001) between the correlation obtained using the optimal number of channels suggested by the utility metric (median=0.12) and the one obtained using all the channels (median=0.06).

Figure A3: Comparison of the channel selection based on the utility metric vs using all the channels (subject-independent scenario in the theta band). (d) Comparison of the correlation obtained using the optimal number of channels (the number of channels at which each subject obtained the highest correlation) vs the correlation obtained using all the channels. Marker size is proportional to the optimal number of channels (one marker per subject). The grey marker has a size equivalent to 64 channels. A Wilcoxon signed rank test showed a significant difference (W=0, p < 0.001) between the correlation obtained using the optimal number of channels suggested by the utility metric (median=0.08) and the one obtained using all the channels (median=0.06).

Figure A4: Percentage of subjects with a correlation greater than or equal to 100%, 95% and 90% of the correlation obtained using all the channels in the theta band (axes: number of selected EEG channels; % of subjects with corr. ≥ threshold). (b) Subject-independent scenario. In the subject-specific scenario, for 99% of the subjects it is possible to reduce the number of channels to 9 and still obtain a correlation higher than the one obtained using all the channels. In the subject-independent scenario, for 71%, 76% and 80% of the subjects it is possible to reduce the number of channels to 32 and still obtain a correlation higher than 100%, 95% and 90% of the correlation obtained using all channels, respectively. These percentages increase to 76%, 82% and 84%, respectively, if we increase the number of channels from 32 to 36.

Figure A5: Optimal channel selection for the theta band. The number next to each group of channels (formed by two electrodes, see Figure 1) indicates the ranking of the group with respect to its influence on the LS cost (see text). The lower this number, the more important the group.

Figure B1: Median rank order across subjects of each channel, subject-specific scenario, utility metric. The rank order of a channel, provided by the utility metric, reflects the importance of a channel with respect to the other selected channels. The lower the rank, the more important the channel.

Figure B2: Median correlation for the forward model, subject-specific scenario. The forward model was calculated for each subject and electrode, and the median correlation between actual and predicted EEG is plotted per electrode. By comparing with Figure ??, we can see that for both the delta and theta bands, channels with a lower rank order (more important channels) are generally also channels with a higher correlation in the forward model. For the delta band, such channels are primarily concentrated in the temporal and pre-frontal regions, whereas for the theta band they are located in the temporal and frontal regions. These locations agree with the optimal channel selection for both the delta and theta bands.

Figure B3: Channel selection frequency (delta band). (d) 32 most frequently selected channels. The color indicates, for each subject-independent layout (without the symmetric grouping constraint), for how many subjects each channel was selected in the subject-specific case. The number next to the channel label indicates the number of subjects, out of 90, for whom the channel was selected.

Figure B4: Channel selection frequency (theta band). (d) 32 most frequently selected channels. The color indicates, for each subject-independent layout (without the symmetric grouping constraint), for how many subjects each channel was selected in the subject-specific case. The number next to the channel label indicates the number of subjects, out of 90, for whom the channel was selected.