Facilitating Joint Chaos and Fractal Analysis of Biosignals through Nonlinear Adaptive Filtering

Background Chaos and random fractal theories are among the most important for fully characterizing nonlinear dynamics of complicated multiscale biosignals. Chaos analysis requires that signals be relatively noise-free and stationary, while fractal analysis demands signals to be non-rhythmic and scale-free. Methodology/Principal Findings To facilitate joint chaos and fractal analysis of biosignals, we present an adaptive algorithm, which: (1) can readily remove nonstationarities from the signal, (2) can more effectively reduce noise in the signals than linear filters, wavelet denoising, and chaos-based noise reduction techniques; (3) can readily decompose a multiscale biosignal into a series of intrinsically bandlimited functions; and (4) offers a new formulation of fractal and multifractal analysis that is better than existing methods when a biosignal contains a strong oscillatory component. Conclusions The presented approach is a valuable, versatile tool for the analysis of various types of biological signals. Its effectiveness is demonstrated by offering new important insights into brainwave dynamics and the very high accuracy in automatically detecting epileptic seizures from EEG signals.


Introduction
Biological signals often exhibit both ordered and disordered behavior. Two of the most important theories for biosignal analysis are chaos theory and random fractal theory [1,2]. Chaos theory is mainly concerned about apparently irregular behaviors in a complex system that are generated by nonlinear deterministic interactions with only a few degrees of freedom, where noise or intrinsic randomness does not play an important role. For it to be applicable, signals under study have to come from a predominately deterministic system, be relatively noise free, and be stationary (i.e., the statistics of the signals remain fairly constant over time). For a better understanding of the concepts of stationarity, contamination with noise, and determinism, we refer to [3][4][5] for some simple and very illustrative examples. On the other hand, random fractal theory assumes that the dynamics of the system are inherently random and requires the signals be scale-free. Therefore, the foundations of chaos theory and random fractal theory are fundamentally different.
Experimental biological signals are often noisy and nonstationary. These factors complicate tremendously analysis of biosignals using chaos theory. On the other hand, fractal analysis may be hindered by rhythmic activity, which is a signature of biology but is incompatible with the notion of scale-free. These problems can at best partially be mitigated by frequency-domain filtering or wavelet analysis. Rapid accumulation of complex data in life sciences has made it increasingly important to develop new methods to better cope with these difficulties. Here, we present an adaptive algorithm, which has a number of interesting properties: (1) it can readily remove nonstationarities from the signal, including baseline drifts and signal components due to nonphysiological body movements; (2) it can more effectively reduce noise in the signals than linear filters, wavelet denoising, and chaosbased noise reduction schemes; (3) it can readily decompose a multiscale biosignal into a series of intrinsically bandlimited functions; (4) it offers a new formulation of fractal and multifractal analysis, and is better than existing methods when a biosignal contains a strong oscillatory component.

Nonlinear adaptive multiscale decomposition
The proposed adaptive algorithm first partitions a time series into segments (or windows) of length w~2nz1 points, where neighboring segments overlap by nz1 points, and thus introducing a time scale of wz1 2 t~(nz1)t, where t is the sampling time. For each segment, we fit a best polynomial of order M. Note that M~0 and 1 correspond to piece-wise constant and linear fitting, respectively. Denote the fitted polynomial for the i-th and (iz1)-th segments by y (i) (l 1 ), y (iz1) (l 2 ), l 1 ,l 2~1 , Á Á Á ,2nz1, respectively. Note the length of the last segment may be smaller than 2nz1. We define the fitting for the overlapped region as y (c) (l)~w 1 y (i) (lzn)zw 2 y (iz1) (l), l~1,2, Á Á Á , nz1 ð1Þ can be written as (1{d j =n), j~1,2, where d j denotes the distances between the point and the centers of y (i) and y (iz1) , respectively. This means the weights decrease linearly with the distance between the point and the center of the segment. Such a weighting ensures symmetry and effectively eliminates any jumps or discontinuities around the boundaries of neighboring segments. In fact, the scheme ensures that the fitting is continuous everywhere, is smooth at the non-boundary points, and has the right-and left-derivatives at the boundary.
To appreciate how the algorithm copes with an arbitrary trend without any a-priori knowledge, we have shown in Fig. 1 two scalp EEG signals that were heavily contaminated by head movements. The thick red curves were obtained by the adaptive algorithm, which captured the head movement very well. The thin black curves were obtained by a popular smoothing method based on LOESS [6], which is also a polynomial based nonlinear filtering. Its parameters were chosen to match those of the adaptive filter. While it is also good, it is not as effective.
Since the adaptive detrending can deal with an arbitrary trend without a-priori knowledge, we can conclude that it can readily deal with nonstationarity in a biosignal, including baseline drifts and motion artifacts such as those shown in Fig. 1.
Note that the trend is not necessarily the undesired signal. When it is treated as noise, the adaptive filter is high-pass. When it is considered as signals, the filter is low-pass. When we use two window sizes and take the difference between the trend signals, the filter is band-pass. More generally, if we introduce a series of window sizes, w 1~2 n 1 z1vw 2~2 n 2 z1vw 3~2 n 3 z1v Á Á Á, then we get a sequence of trend signals. The difference between two trend signals of window sizes w i~2 n i z1 and w j~2 n j z1 is a band limited signal, with cutoff frequencies 1=(n i t) and 1=(n j t), where t is the sampling time. For convenience, those signals may be called intrinsically band limited functions (IBFs) and the procedure multiscale decomposition. This procedure will be made more concrete when we consider fractal structure of sunspot numbers and discuss epileptic seizure detection from EEG in Section Results.
In [7,8], we have shown that the adaptive filter is more effective in reducing noise from time series data than linear filters, wavelet shrinkage, and chaos-based noise reduction schemes. To appreciate this property, we have shown in Fig. 2 a comparison of this algorithm with wavelet denoising and chaos-based projective filtering for reducing noise in the chaotic Lorenz data. Indeed, we observe that the adaptive denoising is the most effective. This can be further corroborated by the smallness of the remaining noise, the root mean square error (RMSE), shown in Fig. 3.
To better understand the meaning of H, it is useful to mathematically be more precise. Let {x 1 ,x 2 , Á Á Á ,x n } be a stationary stochastic process with mean x and autocorrelation function of the type, This is often called an increment (or noise) process. Its power spectral density (PSD) is 1=f 2H{1 . Its integration, is called a random walk process having PSD 1=f 2Hz1 . Simple non-overlapping smoothing of {x 1 ,x 2 , Á Á Á ,x n } yields a new time series, with variance where s 2 is the variance of original stochastic process {x 1 ,x 2 , Á Á Á ,x n }. Eq. (5) offers an excellent means of understanding H. For example, if H~0:50, m~100, then var(X (m) )~s 2 =100. When H~0:75, in order to have var(X (m) )~s 2 =100, then we need m~10 4 , which is much larger than m~100 for the case of H~0:50. On the other hand, when H~0:25, if we still want var(X (m) )~s 2 =100, then m&21:5, much smaller than m~100, the case of H~0:50. An interesting lesson from such a simple discussion is that if a time series is short while its H is close to 1, then smoothing is not a viable option for reducing the variations there. Many excellent methods have been proposed for estimating H. The most popular is perhaps the detrended fluctuation analysis (DFA) [24]. Indeed, it is among the most reliable [23]. The adaptive decomposition algorithm proposed here can be used to formulate a new fractal and multifractal analysis approach, and is even better than DFA when a signal contains a strong trend. For convenience, we call it AFA.
AFA works as follows. If we start from an increment process, x(1),x(2), Á Á Á, similar to DFA, we first construct a random walk process using Eq. (3). If the original data can be considered as a random walk-like process, which is true for EEG [1,25,26] and sea clutter radar returns [23,27,28], then this step is not necessary. However, for ideal fractal processes, there is no penalty if this is done, even though the process is already a random walk process.  Next, for a window size w, we determine, for the random walk process u(i) (or the original process if it is already a random walk process), a global trend v(i),i~1,2, Á Á Á ,N. Here N is the length of the random walk process. The residual, u(i){v(i), characterizes fluctuations around the global trend, and its variance yields the Hurst parameter H, To prove Eq. (6), we start from an increment process with H. The PSD for the corresponding random walk process, is 1=f 2Hz1 . Using Parseval's theorem [1], The variance of the residual data corresponding to a window size w may be equated to the total power in the frequency range (f w ,f cutoff ), Total power* where f w~1 =w, and f cutoff is the highest frequency of the data. When f w %f cutoff , we immediately see that Eq. (6) has to be valid. In fact, the above proof makes it clear that even if we start from a random walk process with H, integration will make the process to have a spectrum of 1=f 2Hz1z2~1 =f 2(Hz1)z1 , and therefore, the final ''Hurst'' parameter will be simply Hz1. This indicates that there is no penalty if one uses Eq. (3) when the data are already a random walk process. To extend Eg. (6) to a multifractal formulation, we can simply write where q is a real number: depending on whether q is positive or negative, large or small values of deviations are emphasized, respectively. In many applications, the case of q~2 may be most concerned, since H(2)~H. For notational convenience, F (2) (w) may be simply denoted as F(w). Eq. (6) can also be extended to high-dimensional case, such as an image or a high-dimensional trajectory. In the case of 2-D, this can be achieved by first applying the algorithm to the xcomponent of the data, then applying it to the y-component. In fact, the order of whether x-component first or y-component first  does not matter. This is best seen by considering polynomial order to be 1 and functions f (x,y) having the property d dx y). The approach will work in more general situations, including non-differentiable random surfaces. The fractal analysis approach formulated here has two important features that are better than DFA: (1) the trend for each window size w obtained here is smooth, while that obtained by DFA changes abruptly at the boundary of neighboring segments; (2) it can more readily estimate H from a signal with a strong oscillatory trend. The latter property will be made clearer when we analyze the sunspot numbers in Section Results.

Analysis of sunspot numbers
To appreciate the effectiveness of AFA, we examine how it estimates the fractal scaling exponent from sunspot numbers  (which can be downloaded at http://sidc.oma.be/sunspot-data/). The best known property of sunspot numbers is the approximate 11 year cycle, which can be clearly seen from the data shown in Fig. 4(a). Because of this cyclic trend, DFA cannot readily detect the fractal structure in the sunspot variations [29].
Sunspot numbers up to year 2006 have been examined by Movahed et al. [30] using Fourier filtering based DFA, by Zhou and Leung [31] using empirical mode decomposition (EMD) based DFA, and by Hu et. al. [29] by first using the adaptive detrending algorithm described here then applying DFA. The results based on EMD is consistent with that of Hu et. al. [29]. The latter is much simpler. Referring to Fig. 4, the latter approach is to first get the trend data, shown as the solid black curve in Fig. 4(a) (whose phase diagram is shown in Fig. 4(c), which suggests chaos-like dynamics), then obtains the residual signal shown in Fig. 4(b), and finally applies DFA to the residual signal. The H parameter for the shorter data analyzed in Hu et. al. [29] is about 0.74. When the same approach is applied to the longer data analyzed here, H is 0.78. Therefore, the variation of the sunspot numbers around its 11-year cycle is a fractal process with long-range correlations.
When we apply AFA to the sunspot numbers, we obtain the results shown in Fig. 5. H estimated with polynomial order 1 is 0.80, with a short scaling range, while that estimated with polynomial order 2 is 0.83, with a fairly long fractal scaling range up to about 60 months, or half of the 11-year cycle. Therefore, H value estimated is consistent with that by other more complicated methods, including EMD based DFA and adaptive detrending based DFA.

Epileptic seizure detection from EEG
We now demonstrate how AFA can shed new lights on the dynamics of brainwaves and help detect epileptic seizures from EEG.
Following earlier studies, we treat EEG as a random walk process instead of increment process, therefore, the first step, forming a random walk process, is not necessary here. For ease of comparison with the result of [32], we shall work on the same data sets analyzed there, which consist of three groups, H (healthy), E (epileptic subjects during a seizure-free interval), and S (epileptic subjects during seizure), each group contains 100 data segments, whose length is 4097 data points with a sampling frequency of 173.61 Hz. The data can be downloaded at http://www.meb.unibonn.de/epileptologie/science/physik/eegdata.html. Examples of the EEG signals for the three groups, H, E, and S, are shown in Figs. 6(a1,b1,c1), together with their phase diagrams in Figs. 6 (a2,b2,c2). For the details of the data, we refer to [33]. Figure 7 shows a typical log 2 F (w) vs. log 2 w curve for an EEG signal. We observe that there are two short scaling regions, whose Hurst parameters are denoted as H s and H L in the plot. The first scaling determines a time scale of (w 1 z1)=2~9 samples, which amounts to 173:61=9~19:3 Hz. The second scaling break determines a time scale of (w 2 z1)=2~33 samples, which amounts to 173:61=33~5:3 Hz. Using these two time scales, we can obtain two trend signals for each EEG signal. Their difference yields one IBF for each EEG signal. They are shown in Figs. 8(a1,b1,c1) for the signals of Figs. 6(a1,b1,c1). The corresponding phase diagrams are shown in Figs. 8(a2,b2,c2). Fig. 8(c2) is especially interesting, since it suggests chaos-like dynamics for the seizure EEG signal.
When we use the three parameters, H S ,H L , and the saturation level to classify the three EEG groups, we obtain the results shown in Fig. 9. We observe that the three groups almost perfectly separate. This excellent classification result suggests that the two time scales identified above must be generic. This is indeed so, after we visually examine a large subset of the data analyzed here. Note that such an excellent classification accuracy cannot be obtained by using DFA.
It is interesting to note that the seizure detection accuracy shown in Fig. 9 is comparable to that of Adeli et. al. [32] using a complicated approach consisting of (1) decomposing the EEG signals into delta, theta, alpha, beta, and gamma subbands, (2) calculating features such as standard deviations, largest Lyapunov exponent, and correlation dimension for each subband, and (3) using neural networks to classify the three different EEG groups. In particular, it is noted [32] that correlation dimension is more useful for the beta (13-30 Hz) and gamma (30-60 Hz) subbands, while the Lyapunov exponent is more useful for the alpha (8-12 Hz) band. While qualitatively, this observation is consistent with our finding [34] that the largest Lyapunov exponent, because of the particular algorithm of computing it, is more pertinent to larger scale (i.e., slower or lower-frequency dynamics), while correlation dimension characterizes smaller scale (i.e., faster or higher-frequency dynamics). AFA presented here has suggested that the more precise time scales are not given by the traditional idea of the 5 EEG subbands, but are given by the fractal scaling breaks, which are w19:3 Hz and 5:3{19:3 Hz.

Discussion
Motivated by the pressing need of joint chaos and fractal analysis of complex biological signals, we have proposed a nonlinear adaptive algorithm, which has a number of interesting properties, including removing arbitrary nonphysiological trends or baseline drifts from physiological data, reducing noise, and carrying out fractal analysis. The latter property is utilized to analyze sunspot numbers and three different EEG groups for the purpose of detecting epileptic seizures. It is found that the approach is highly effective. In particular, we have found that the approach can automatically partition the frequency into three bands, below 5.3 Hz, above 19.3 Hz, and between 5.3 and 19.3 Hz. This suggests that a more convenient and more intrinsic way of partitioning EEG signals would be to partition them into these three bands, instead of the traditional delta, theta, alpha, beta, and gamma subbands.
The validity of the proposed approach hinges on being able to locally represent a continuous time function by its Taylor series. Therefore, it will work better when the signal is sampled more densely. This is especially true when denoising is concerned. On the other hand, it may lose power when dealing with signals generated by discrete maps or sampled from a continuous time system with very large sampling time. We do not expect this to be a true difficulty, however, since experimental systems usually are continuous time systems, and there is no shortage of technology to adequately sample the dynamics of the system.
While we have used sunspot numbers and EEGs for example applications, we surmise that the approach proposed here can readily be used to analyze a broad range of biological and nonbiological signals. Furthermore, some of the IBFs (such as shown in Fig. 4(c) and Fig. 8(c2)) may better be amenable to chaos analysis. To maximally realize the potential of the approach, interested readers are welcome to contact the authors for the codes.