Accelerated sparsity based reconstruction of compressively sensed multichannel EEG signals

Wearable electronics capable of recording and transmitting biosignals can provide convenient and pervasive health monitoring. A typical EEG recording produces large amount of data. Conventional compression methods cannot compress date below Nyquist rate, thus resulting in large amount of data even after compression. This needs large storage and hence long transmission time. Compressed sensing has proposed solution to this problem and given a way to compress data below Nyquist rate. In this paper, double temporal sparsity based reconstruction algorithm has been applied for the recovery of compressively sampled EEG data. The results are further improved by modifying the double temporal sparsity based reconstruction algorithm using schattern-p norm along with decorrelation transformation of EEG data before processing. The proposed modified double temporal sparsity based reconstruction algorithm out-perform block sparse bayesian learning and Rackness based compressed sensing algorithms in terms of SNDR and NMSE. Simulation results further show that the proposed algorithm has better convergence rate and less execution time.


Introduction
In Brain Computer Interface (BCI), a non-muscular connection between computers and human is made to assist in conversion of coded brain signals into external commands [25,32]. EEG based BCI has shown significant importance in recent years for health-care monitoring, including early detection of seizure, trauma, alzheimer and stroke [29]. Normal EEG signal contain large number of data that cannot be sampled and transmitted during many real life scenarios. Epileptic seizure detection of patients require continuous EEG monitoring that may last upto a number of hours [22]. Saving, processing and transmission of this huge data requires bulk storage and immense processing power [1]. As an example, multichannel EEG recording ranges from 24 to several hundred electrodes. With 24 electrodes, if sampled at 200 HZ using 12 bits of resolution, it produces a data of at least 1 GB per day [8]. Compressive PLOS ONE | https://doi.org/10.1371/journal.pone.0225397 January 7, 2020 1 / 15 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 sensing (CS) theory suggests solution to this problem by taking samples far fewer than the Nyquist rate along with faithful recovery [38]. Traditional way of compression discards huge amount of data resulting in a lossy compression. To overcome this issue, signal compression techniques with better sampling patterns have been developed, which enables to store the same amount of data in more compact form [24]. Some methods are matched filters, autocorrelation based euclidian distance, bayesian interface methods, wavelet compression and jpeg 2000 [4]. All these methods need sampling measurements at Nyquist rate using analog to digital converter (ADC), resulting in computationally complex data, high processing time, and expensive hardware. In order to mitigate all these problems, CS provides a promising solution [4]. CS reconstructs from highly under-sampled data, even below the Nyquist rate, discarding the redundant information. This results in a huge reduction in dimension due to less number of measurements. The basic theory of CS rely on two necessary conditions, sparsity and incoherence [14].
The main idea behind CS is that, signal can be represented by only few non-zero coefficients, this is done by using some sparse sensing matrix [11]. For CS, two assumptions are made. Firstly, either data is itself sparse or sparse in some transform domain. Secondly, the measurement basis and representation basis are mutually incoherent [9], this results in a compression below the Nyquist rate. As number of measurements are far fewer than original signal, thus recovering the original signal is an NP-hard problem [23]. Due to non-sparse representation of EEG in time domain, EEG signal is made sparse by using different basis or dictionary functions. There have been many publications indicating dictionaries such as slepian basis and Gabor framework [38]. Hesham [19], presented the CS framework for EEG using dirac sensing matrix, and efficiently reconstructed the EEG signal after compression. Angshul [10], illustrated CS recovery of EEG using 2-D fourier transform, however [38], claimed that better reconstruction can be achieved using wavelet domain instead of Gabor domain. Jun [3], claimed that using daubechies wavelets, the reconstruction accuracy achieved is better than that of other basis functions.

Compressed sensing
Compressed sensing depend on the hypothesis that the signal x is compressed by Φ 2 R M�N (sampling or measurement matrix). The sampling model is formulated as, where y 2 R M�1 represent the compressed measurements with M�N, indicating that number of sampled measurements are far less than the original signal. If x is sparse, then recovery problem only requires the compressed measurements and sampling matrix, but if not, than signal x should be sparse (transformed) in representation matrix (dictionary) Ψ 2 R N�P with N�P. This can be written as, where y 2 R P�1 is sparse. x can be recovered using measurements y, dictionary C, and sampling matrix F. The minimization problem formed is, where k.k 0 is the ℓ 0 norm, i.e it counts the number of non-zero entries. The x is called K-sparse when number of non-zero entries are vector equal to K. The task of recovering x from measurements y is an under-determined inverse problem, and finding its solution is NP-hard problem [37], as the sensing matrix C due to large undersampling is ill conditioned. So general problem is regularized in order to achieve recovery. Sparsity regularization is comparatively elementary solution to this problem. In sparse recovery problem, the expected signal is said to be sparse by representing it with transform C, to regularize it in transformed domain, ℓ 1 norm is used as the substitute for ℓ 0 norm. Hence, for EEG reconstruction problem, the sparsity regularization is formulated as [21], F represent the basis of sparsifying transform and k.k 1 is ℓ 1 norm. The minimization of variance of noise is done with fedility term in Eq 4. The regularization term λ 1 is added to induce sparsity on x over basis C, and k:k 2 F represents the Frobenius norm, which can be defined as, kyk The proposed method is based on double sparsity based framework. The motivation behind proposed method is to increase the existing reconstruction accuracy of multi-channel EEG signals using following contributions: First, the multi-channel EEG signal is pre-processed using zero mean and whitening transform. The total variation matrix exploited in previous work exhibits redundancy in reconstruction accuracy, so instead of using total variation matrix, proposed algorithm explored the concept of circulant matrix as a sparse sensing matrix. In addition, the shattern-p norm is used as non-convex surrogate function to exploit the double sparsity in CS recovery of multi-channel EEG signals. The flow diagram of proposed method is shown in  The rest of the paper is organized as follows, section III summarize the existing methods used for the reconstruction of compressively sampled EEG signals. Section IV includes the discussion and analysis of proposed method on the basis of quantitative analysis, section V gives the concluding remarks.

Rackness based compressive sensing
Power consumption during wireless transmission has been focus of many researchers, Nicola et al. [16] worked on solving the power consumption issue by introducing the rackness based compressed sensing. Using rackness based CS, good signal reconstruction of EEG signal can be achieved. Rackness approach is based on assumption that certain signal exhibits non-flat energy distribution. Using this assumption, there is no need to construct F from randomly selected i.i.d entries. Instead of constructing randomly, F is tuned statistically which match with the input signal. This property increases the average energy of y, which ultimately increases the reconstruction accuracy. To formulate this property, rackness ρ between two stochastic process x and F is defined as, where the static expectation over x and F is represented by E (F,x) , with h.i as standard inner product, F j is the sensing sequence, and x is the signal instances. Using rackness based approach, both noise suppression and good signal reconstruction is achieved.

Block sparse bayesian learning
Using field programmable gate array (FPGA), Liu et al. [27] proved that comparing to wavelet domain, block sparse bayesian learing (BSBL) shows prominent results in terms of power consumption and reconstruction accuracy. Using fast marginalized (FM) likelihood method, fast implementation of BSBL was developed in which EEG signal is structured into blocks as shown in Eq (7), which represents that x has g blocks, with few non-zero blocks and d i is the size of ith block. The BSBL uses intra-block structure correlation to model the signal x with Gaussian distribution. The resulting reconstruction is not robust to all methods, as blocking introduces noise, therefore, regularization is done in order to achieve better results.

Simultaneous co-sparsity and low rank
In a very recent work [2], it was discussed that due to correlation in EEG signals, same sparsity pattern is adopted after transformation i.e. the values of transform coefficients have values at same positions leading to row-sparse recovery [2]. This theory was formulated using ℓ 2,1 norm minimization problem.X where ℓ 2,1 norm is the sum of the rows of ℓ 2 norm. In Eq 8, ℓ 2 norm gives the dense solution in selected rows, hence sum of ℓ 2 norm promotes selection of very few rows.

Blind compressed sensing
The theory of CS relies on the assumption that sparsity of the signal is known in some basis. The Blind Compress Sensing (BCS) [17], avoid this assumption by using both CS and dictionary learning. BCS estimates the sparse signal as well as the sparsifying dictionary from the data. The assumption that data is sparse in learned dictionary, i.e. X = DJ where J are the sparse coefficients and D is the unknown dictionary (to be estimated) [17].

Double temporal sparsity based reconstruction
Using double temporal sparsity reconstruction (DTSR), better reconstruction can be achieved with acceleration in time. Priya [28] proposed DTSR for sparse signal recovery of fMRI data with prominent results. In this work we have sued a modified form of DTSR along with some pre-processing for sparse recovery of EEG signal. The DTSR algorithm uses total variation based algorithm and imposes two ℓ 1 norm constraints. First constraint is applied on transformed domain of temporal data and other is imposed on the consecutive difference of same data. The cost function of Eq 4 can be written as,x where λ 1 , λ 2 are positive regularization terms, F is the 2-D Fourier transform while the third term shows the consecutive difference of columns of data. The matrix formulation of a Eq 10 can be shown asx where D shows the consecutive difference on the successive columns of x, known as total variation temporal sparsity shown as

Proposed method
In this section, alternating direction multiplier method (ADMM) is modified on the basis of desired EEG signal recovery problem using DTSR. In the pre-processing of signal x, first of all signal is made zero mean and then it is made white by making each column of signal un-correlated to each other. The unit variance and zero mean can be expressed in mathematical terms as, where μ is the mean, and σ is the variance of EEG signal x, this is done in fashion that each value of x is subtracted form mean of individual columns of x, and divided by σ resulting EEG signal with zero mean. This zero mean EEG data is made white by using optimal whitening method [30]. The reason behind this step is that close channel in multichannel EEG signal exhibits strong correlation, removing this correlation by making the columns of EEG signal orthogonal to each other results in less search time in optimal sparse signal recovery. This can be shown in Fig 2, where zero-mean and white data becomes stable in less number of iteration than original algorithm. The whitening of zero-mean signal can be shown mathematically as where d is the dimension of zero-mean vector, W is the d×d whitening matrix. Whitening in general terms can be viewed as [30], where V is the variance, V ¼ diagðs 2 1 ; . . . ; s 2 d Þ, such that, varðm i Þ ¼ s 2 i . The whitening transform should follow, WSW T = I and thus W(SW T W) = W, which is only achieved if W satisfy the constraint Using Mahalanobis whitening method [30], the whitening transform employed in this work is The difference between zero-mean and whitened data can be seen by number of iterations in Fig 2. Instead of using total variation matrix D used in Eq 11, in this paper we explored the idea of using circulant matrix [13]. For CS, the binary measurement matrices are formed using parity check matrix of array coding. A parity matrix H(r,q) is the identity matrix of dimension r×q and (i,j)-th circulant permutation matrix P (i−1)(j−1) with q as odd prime and r as positive integer, such that 1�r�q [36]. The i-th row circulant matrix is formed by cyclic shift of (i+1)-th or (i-1)-th row by one term. This can be shown as, After making these changes in the cost function of DTSR, the modified version of algorithm can be rewritten as,ẑ Instead of using frobenious norm, which is basically square root of the sum of the absolute squares defined in Eq 11, we have used the schattern-p norm. Schattern-p norm has been successfully used for sparse synthesis model and shows accurate results [3,7,18,33]. Eq 17 can be re-written asẑ where k:k p S p is the schattern-p norm and it is the sum of all singular values σ of data z upto value p, for matrix T 2 H m and p2 [1,+1], k:k p S p is defined as, where H m is monotone Hilbert space, and σ is the singular values of T in non-increasing order such that σ 1 (T) � σ 2 (T). . .. To solve Eq 18, optimized solution is obtained using ADMM [6].

Optimization algorithm
Strained optimization problem in recent literature [20,26,34]. The ADMM ease the solution by breaking down the original cost function into several objective function, that are comparatively easy to solve. Following [28], two auxiliary matrices P 2 R M�N and Q 2 R M�N are introduced in Eq 18 aŝ By adding these new constraints for each of auxiliary matrices, the objective function formed isẑ where B 1 , B 2 are lagrange multipliers to satisfy the equality auxiliary and original matrices, and η 1 , η 1 are penalty parameters. ADMM updates variables P, Q and z alternatively in the above lagrange function. By keeping the other two variables fixed, one variable is minimized in each iteration. Thus, the above function can be decomposed into three sub-problems, with new objective function as, where j is the number of iterations, A 1 and A 2 subproblems minimize objective function over P and Q respectively with fixed z. Similarly, A 3 minimizes z keeping P and Q fixed. Subproblems A 1 , A 2 and A 3 are solved iteratively by updating the lagrange multipliers B 1 and B 2 . A 1 and A 2 subproblems. Subproblems A 1 and A 2 are ℓ 1 minimization problem, for general ℓ 1 minimization problem as the solution is [28], where W is unitary matrix and U; V 2 R M�N with α, β > 0. V in Eq 25 is initial approximate of U. Hence for j iteration, the solution for subproblem A 1 is For j iteration P is computed for subproblem A 1 . Similarly Q is computed for subproblem A 2 using z j−1 and B jÀ 1 2 as A 3 subproblem. Subproblem A 3 is quadratic, to solve it conjugate gradient algorithm is used [12,15]. In this paper, line search conjugate gradient algorithm is used [28]. This algorithm is iterative in which descent direction is selected on the minimization of function. Further, step size is determined by using line search method. For general quadratic equations, line search conjugate gradient algorithm gives the finite convergence.
Updating lagrange multiplier. In the last step lagrange multiplier is updated iteratively. Lagrange multiplier helps in achieving the convergence in subsequent iteration. The pseudo algorithm of proposed method is shown in Algorithm 1. The convergence is achieved by comparing convergence of objective function with threshold or with maximum number of iterations achieved.

EEG dataset
The publicly available EEG dataset [5] is used for the purpose of sparse signal recovery of multi-channel EEG signal. This commonly used EEG dataset contain 32 channel EEG signal of length 30720 data points, with each signal of channel contain 80 epochs and 384 points. For compression of each epoch, sensing matrix Φ 2 R 192�384 is used as sparse circulant matrix and Ψ 2 R 384�384 is used as Fourier domain sparsifying matrix formed by calculating the Fourier transform of z along each row. To recover EEG signal, ADMM algorithm is used [6].

Quantitative analysis
This section includes the results of the proposed modified DTSR method in comparison with few of the existing CS-based EEG signal reconstruction techniques. For the purpose of reconstruction quality measurement, normalized mean square error (NMSE), mean square error (MSE) and signal to noise distortion ratio (SNDR) are used.
For reference EEG signal x and its reconstructed versionx, the NMSE is computed as where k.k 2 represent the ℓ 2 -norm. Similarly SNDR is calculated as MSE can be calculated as where x andx are the reference and reconstructed EEG signals, C is the number of EEG channels, N is the length of epoch, and L is the number of experiments. Reconstructed NMSE (average), MSE (average), band SNDR (average) for all 32 channel EEG signal is presented in Table 1. The compression rate of 25%(4:1), and 50%(2:1) is used for evaluation. The MSE of multi-channel EEG signal is shown in Fig 2. The proposed method gives best results in less number of iteration than other existing algorithms. Multi channel EEG signal along with its reconstructed versions with different algorithms are shown in  Table 1 indicates that the proposed algorithm outperforms in terms of accuracy and execution time as compared to other state of the art algorithms.

Conclusion
In this work compressively sampled EEG data is recovered using DTSR algorithm. Conventional DTSR algorithm which was originally designed for fMRI data is tailored for application to EEG sparse recovery by making three main contributions. As a first step pre-processing is done by making the EEG data zero mean and unit variance. Second step is to formulate circulant matrix instead of total variation matrix for limiting the search space for fast convergence of the algorithm. Finally it is shown that instead of frobenius norm, using shattern-p norm yields better reconstruction accuracy. The proposed modified DTSR algorithm outperforms conventional DTSR as well as other state of the art CS recovery techniques in terms of NMSE and SNDR.