
Bayesian Wavelet Shrinkage of the Haar-Fisz Transformed Wavelet Periodogram

  • Guy Nason ,

    Contributed equally to this work with: Guy Nason, Kara Stevens

    Affiliation School of Mathematics, University of Bristol, Bristol, United Kingdom

  • Kara Stevens

    Contributed equally to this work with: Guy Nason, Kara Stevens

    Affiliation School of Mathematics, University of Bristol, Bristol, United Kingdom


Abstract

It is increasingly being realised that many real-world time series are not stationary and exhibit evolving second-order autocovariance or spectral structure. This article introduces a Bayesian approach for modelling the evolving wavelet spectrum of a locally stationary wavelet time series. Our new method works by combining the advantages of a Haar-Fisz transformed spectrum with a simple, but powerful, Bayesian wavelet shrinkage method. The new method produces excellent and stable spectral estimates, and this is demonstrated on simulated data and on differenced infant electrocardiogram data. A major additional benefit of the Bayesian paradigm is that we obtain rigorous and useful credible intervals for the evolving spectral structure. We show how the Bayesian credible intervals provide extra insight into the infant electrocardiogram data.

1 Introduction

For a real-life time series it is sometimes difficult to determine whether the underlying process is really stationary using only observations from a section of the process. Often, the spectral behaviour of a real-life time series can change from one time point to another and nonstationarity may only become apparent with continued observation. If we drop the stationarity assumption, there is an abundance of different models that can be considered. One class of nonstationary models, which we consider here, is the class of locally stationary processes with slowly evolving second-order structure. Two prominent sub-classes are the locally stationary (Fourier) processes due to [1] and the locally stationary wavelet (LSW) processes due to [2]. However, nonstationary Fourier processes have a long history; see, e.g., [3–5]. Reviews can be found in [6] and [7]. The second-order structure of a time series can be assessed via the (auto-)covariance or spectrum, and accurate specification and estimation of these quantities is particularly important to improve our understanding of the data.

This article assumes that a time series can be modelled by an LSW process and considers the estimation of the associated evolutionary wavelet spectrum (EWS). As is the case for stationary spectral estimation, obvious ‘raw’ estimators are not statistically consistent and require smoothing. For example, [2] introduced a kind of ‘method of moments’ spectral estimator and used wavelet shrinkage to smooth it, and [8] used kernel smoothing to produce estimates for forecasting. See also [9], who introduce a pointwise estimator. [10] introduced a powerful new approach based on a Haar-Fisz transformation of the raw wavelet periodogram, essentially using universal thresholding [11] on the Haar-Fisz coefficients.

This article builds on the work of [10] by bringing Bayesian wavelet shrinkage to bear on the Haar-Fisz coefficients, and does so for two reasons. First, recent Bayesian wavelet shrinkage techniques based on the Berger-Müller prior and empirical marginal maximum likelihood hyperparameter determination, such as [12], show dramatic performance improvements over earlier methods such as universal thresholding. The Bayesian approach uses priors well-adapted to the known mathematical theory underlying wavelet coefficients of a wide class of functions from Besov scales. Secondly, the coherent Bayesian approach permits rational and effective quantification of credible intervals for the EWS. Our simulation results and results on real data show good performance and new insights.

Section 2 reviews the locally stationary wavelet model, the associated evolutionary wavelet spectrum and the wavelet periodogram. Section 3 briefly reviews the Haar-Fisz transformation and establishes notation for the subsequent Bayesian wavelet shrinkage. Section 4 first reviews wavelet shrinkage and Bayesian wavelet shrinkage and then describes each of the components of our Bayesian wavelet shrinkage method adapted to the Haar-Fisz-transformed spectral coefficients. Section 5 outlines some implementation issues, presents a simulation study, analyses an infant electrocardiogram (ECG) data set and compares the results to earlier analyses. Finally, Section 6 concludes and provides some ideas for further developments.

2 Locally Stationary Wavelet Processes

Locally stationary wavelet (LSW) processes were introduced by [2], and extended to encompass a larger range of processes in [9], which we use here. As in [2], assume that the wavelets used are Daubechies [13] compactly supported wavelets, and denote the length of support of the wavelet ψ_{j,0} by ℒ_j := |supp(ψ_{j,0})|. Therefore, if we have J scales, where j = 1 is the finest scale, then |supp(ψ_{j,k})| = ℒ_j = (2^j − 1)(ℒ_1 − 1) + 1 for all j ≥ 1. For example, for Haar wavelets ℒ_1 = 2 and so ℒ_j = 2^j. Here ℕ is the set of natural numbers {1, 2, 3, …}.

Definition 1 (The Locally Stationary Wavelet Process) An LSW process is a sequence of doubly indexed stochastic processes, {X_{t,T}}_{t = 0, …, T−1}, where T = 2^J for some J ∈ ℕ. The process has the representation

X_{t,T} = ∑_{j=1}^{∞} ∑_{k∈ℤ} w_{j,k;T} ψ_{j,k}(t) ξ_{j,k},   (1)

where {ψ_{j,k}(t)}_{j,k} is a discrete non-decimated family of wavelets for scale j ∈ ℕ and location k ∈ ℤ, based on a mother wavelet, ψ(t), of compact support, which we shall refer to as the synthesis wavelet; and ξ_{j,k} is a random, zero-mean, Gaussian orthonormal increment sequence. The component w_{j,k;T} ξ_{j,k} can be thought of as a random amplitude of the oscillation ψ_{j,k}(t).

The quantities in Eq (1) possess the following properties:

  1. 𝔼[ξ_{j,k}] = 0, ∀ j ∈ ℕ, k ∈ ℤ (⇒ 𝔼[X_{t,T}] = 0).
  2. Cov(ξ_{j,k}, ξ_{j′,k′}) = δ_{j,j′} δ_{k,k′}, ∀ j, j′ ∈ ℕ, k, k′ ∈ ℤ, where δ denotes the Kronecker delta.
  3. For each j ∈ ℕ there exists a function W_j(z), for z ∈ (0,1), that possesses the following properties:
    1. There exists a sequence of constants C_j such that, for each T, sup_k | w_{j,k;T} − W_j(k/T) | ≤ C_j / T.
    2. The total variation (TV) of W_j²(z) on (0,1) is bounded by a constant L_j.
    3. The constants C_j and L_j satisfy joint summability conditions across scales, detailed in [9].

The time evolution of LSW processes is governed by the time-scale varying evolutionary wavelet spectrum which we define next.

2.1 Evolutionary Wavelet Spectrum and its Estimation

The evolutionary wavelet spectrum (EWS) measures the ‘contribution to the variance’ of Xt, T at scale level j ∈ ℕ and location z ∈ (0,1) and is defined as follows.

Definition 2 (Evolutionary Wavelet Spectrum) The EWS is defined by

S_j(z) := |W_j(z)|², for z ∈ (0,1) and j ∈ ℕ.   (2)

Estimation of the EWS can be achieved by first computing the raw wavelet periodogram, defined as follows.

Definition 3 (Raw Wavelet Periodogram) The raw wavelet periodogram is defined as

I_{j,k} = | ∑_t X_{t,T} ψ_{j,k}(t) |²,   (3)

where X_{t,T} = 0 for t ∉ {0, …, T − 1}, j = 1, …, J, k = 0, …, T − 1, J = log_2(T), and {ψ_{j,k}(t)}_{j,k} is a discrete non-decimated family of wavelets that we shall refer to as the analysis wavelet.

In theory, the analysis wavelet in Eq (3) is the same as the synthesis wavelet in Eq (1). However, in practice the synthesis wavelet is often unknown. For the purposes of our analysis we shall assume that the synthesis wavelet is known and identical to the analysis wavelet. The raw wavelet periodogram, I_{j,k}, is a biased estimator of the EWS, but can be made asymptotically unbiased after a simple correction, which we explain next. To proceed, the autocorrelation wavelet (ACW) is defined as follows.

Definition 4 (Discrete Autocorrelation Wavelet) The ACW at scale j ∈ ℕ and lag τ ∈ ℤ is defined by

Ψ_j(τ) = ∑_k ψ_{j,k}(0) ψ_{j,k}(τ).

The discrete ACW describes the autocorrelation of a wavelet at a particular scale, j, across different lags, τ. The discrete ACWs form a family of symmetric, compactly supported, positive semi-definite functions on τ ∈ ℤ. Further theoretical details can be found in [2] and [14]. To form an asymptotically unbiased estimator of the spectrum we require the inner product matrix of the ACW, defined as follows.

Definition 5 (The Inner Product Matrix) The operator A = (A_{j,l})_{j,l ≥ 1} is defined by

A_{j,l} := ⟨Ψ_j, Ψ_l⟩ = ∑_τ Ψ_j(τ) Ψ_l(τ),   (4)

and the J-dimensional matrix is A_J = (A_{j,l})_{j,l = 1, …, J}.

Then, using Definitions 1 and 5, Proposition 3.3 of [2] shows that, asymptotically as T → ∞,

𝔼[I_{j,k}] ≈ ∑_l A_{j,l} S_l(k/T),   (5)

for j ∈ ℕ, k ∈ ℤ, where A is calculated using the chosen analysis wavelet, and that the variance of the wavelet periodogram satisfies

Var(I_{j,k}) ≈ 2 { ∑_l A_{j,l} S_l(k/T) }².   (6)

This result implies that the variance does not vanish as the sample size increases (T → ∞). [2] show that the obvious asymptotically unbiased estimator A_J^{-1} I_k of {S_j(k/T)}_{j=1}^{J}, where I_k = (I_{1,k}, …, I_{J,k})′, is not statistically consistent. As is typical in spectral analysis of time series, the periodogram needs to be smoothed to obtain consistency.
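As a concrete illustration of Eqs (3)–(5), the raw wavelet periodogram and the inner product matrix A_J can be computed in R with the wavethresh package [29]. The sketch below is ours rather than the paper's: it assumes the wavethresh functions wd (with type = "station"), accessD and ipndacw behave as we recall, the placeholder series x stands in for real data, and the scale ordering of the resulting matrices should be checked against the package documentation.

```r
library(wavethresh)

x <- rnorm(512)                      # placeholder series of dyadic length for illustration
J <- log2(length(x))

# Raw wavelet periodogram: square the coefficients of the non-decimated wavelet transform.
xwd <- wd(x, filter.number = 1, family = "DaubExPhase", type = "station")
I <- t(sapply((J - 1):0, function(lev) accessD(xwd, level = lev)^2))
# Row j of I holds I_{j,k}; row 1 corresponds to the finest scale, as in the text.

# Inner product matrix A_J of the discrete autocorrelation wavelets (Definition 5),
# and the asymptotically unbiased (but inconsistent) corrected estimate A_J^{-1} I_k.
A <- ipndacw(-J, filter.number = 1, family = "DaubExPhase")
L <- solve(A) %*% I
```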

2.2 Wavelet Periodogram Smoothing

Various techniques have already been developed to smooth the wavelet periodogram, such as those of [2, 9, 10]. The approach of [9] is theoretically attractive but practically challenging.

In [2] each level, j, of the raw wavelet periodogram is smoothed as a function of z using translation-invariant (TI) de-noising [15]. Non-linear wavelet shrinkage is performed on the (approximately χ²-distributed) raw wavelet periodogram, which is then bias-corrected using the inverse inner product matrix A_J^{-1}. An appropriate threshold for the shrinkage was determined in [2]. The technique raises a number of questions, such as: what is an appropriate smoothing wavelet? [2] believe that smoother wavelets, such as Daubechies extremal phase with 10 vanishing moments, help to avoid ‘leakage’ of power into the surrounding scales because of their short support in the Fourier domain. They also produce less spiky and variable estimates in their example.
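For later comparison, an implementation of this TI de-noising estimator is available, to the best of our knowledge, through the ewspec function of the wavethresh package [29]; the call below is a minimal illustration in which the argument names and the returned component $S reflect our reading of that package rather than anything specified in this article.

```r
library(wavethresh)

x <- rnorm(1024)                                # placeholder series for illustration
# Smoothed EWS estimate by wavelet shrinkage of the wavelet periodogram, as in [2];
# filter.number/family select Daubechies extremal phase with 10 vanishing moments.
est <- ewspec(x, filter.number = 10, family = "DaubExPhase")
Shat <- est$S                                   # estimated evolutionary wavelet spectrum
plot(Shat)
```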

[10] suggested applying the soft shrinkage rule to the Haar-Fisz coefficients of the raw wavelet periodogram, using a scale-dependent threshold. The methodology produced an estimator which is mean-square consistent, rapidly computable, easy to implement and performs well in practice. However, the theoretical validation of this technique was restricted to locally stationary processes with a time-varying, but piecewise constant, spectral structure.

The Haar-Fisz transform of [10] is very attractive, producing transformed periodogram ordinates that are very close to being uncorrelated and Gaussian. We apply Bayesian wavelet shrinkage in this enticing situation, without having to worry about first-order estimation error in the variance.

3 Spectral Normalisation using the Haar-Fisz Transform

The Haar-Fisz transformation works by normalising the wavelet coefficients of a signal to obtain elements that are close to Gaussian and have near-constant variance. We adapt the definition from [10], Section 6, which applies the Haar-Fisz transform to the raw wavelet periodogram Ij, k as follows. To prevent unnecessary notational overload we will temporarily drop the j subscript and write Im for Ij, m. The next algorithm is applied to each scale j of the periodogram separately.

  1. Let c_{J,m} := I_m for m = 1, …, T, where T = 2^J.
  2. For l = (J − 1), …, 0, recursively form the vectors
     d_{l,m} = (c_{l+1, 2m−1} − c_{l+1, 2m}) / 2 and c_{l,m} = (c_{l+1, 2m−1} + c_{l+1, 2m}) / 2,
     where m = 1, …, 2^l, and d_{l,m} and c_{l,m} are, respectively, the level-l Haar wavelet and scaling coefficients of the scale-j raw wavelet periodogram.
  3. Divide the wavelet coefficients by the scaling coefficients to produce the Haar-Fisz coefficients
     f_{l,m} = d_{l,m} / c_{l,m},   (7)
     for c_{l,m} ≠ 0. For c_{l,m} = 0 set f_{l,m} = 0.
  4. For l = 0, …, J − 1, recursively rebuild the vectors c_l, starting from the unchanged c_{0,0} and using the f_{l,m} in place of the d_{l,m}:
     c_{l+1, 2m−1} = c_{l,m} + f_{l,m} and c_{l+1, 2m} = c_{l,m} − f_{l,m}, for m = 1, …, 2^l.
  5. Define Ĩ_m := c_{J,m}, for m = 1, …, 2^J.

In other words, we have transformed the input vector (I_1, …, I_T) into the Haar-Fisz output vector (Ĩ_1, …, Ĩ_T). Now we re-introduce the j subscript, as this Haar-Fisz processing is replicated at each scale, and let F denote the non-linear, invertible Haar-Fisz operator, so that Ĩ_{j,k} = (F I_j)_k. An R sketch of this recursion is given below.
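The recursion above translates directly into a few lines of R. The function below is our own sketch of steps 1–5 for a single scale of the periodogram, using the divide-by-two Haar coefficients assumed in our statement of step 2; it is not code taken from the paper.

```r
# Haar-Fisz transform of one scale of the raw wavelet periodogram (steps 1-5 above).
# 'I' must have dyadic length T = 2^J; the transformed vector of the same length is returned.
haar_fisz <- function(I) {
  n <- length(I)
  J <- log2(n)
  stopifnot(J == round(J))
  cvec <- I                                   # step 1: c_{J,m} := I_m
  f <- vector("list", J)                      # f[[l + 1]] stores the f_{l,m} at level l
  for (l in (J - 1):0) {                      # step 2: Haar detail and smooth coefficients
    odd  <- cvec[seq(1, 2^(l + 1), by = 2)]
    even <- cvec[seq(2, 2^(l + 1), by = 2)]
    dl <- (odd - even) / 2
    cl <- (odd + even) / 2
    f[[l + 1]] <- ifelse(cl != 0, dl / cl, 0)  # step 3: Fisz-normalised coefficients
    cvec[1:2^l] <- cl
  }
  for (l in 0:(J - 1)) {                      # step 4: rebuild, using f in place of d
    cl <- cvec[1:2^l]
    out <- numeric(2^(l + 1))
    out[seq(1, 2^(l + 1), by = 2)] <- cl + f[[l + 1]]
    out[seq(2, 2^(l + 1), by = 2)] <- cl - f[[l + 1]]
    cvec[1:2^(l + 1)] <- out
  }
  cvec                                        # step 5: the Haar-Fisz transformed vector
}

# Example use, scale by scale: Itilde <- t(apply(Iraw, 1, haar_fisz))
```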

[10] model the raw wavelet periodogram as I_{j,k} = R_j(k/T) Z_{j,k}², where R_j(z) = (A S)_j(z), z = k/T, and the Z_{j,k} are approximately standard Gaussian, for j ∈ ℕ and k = 1, …, 2^J = T.

Proposition 6.1 in [10] details a number of properties possessed by F. Property 6.1(2) states that the Haar-Fisz transformation possesses a log-like property, which suggests a potential model for the Ĩ_{j,k} of

Ĩ_{j,k} = R̃_j(k/T) + e_{j,k},   (8)

for j = 1, …, J and k = 1, …, 2^J, where R̃_j(z) = (F R_j)(z), z = k/T, and e_{j,k} is a zero-mean error term. As the distribution of Ĩ_{j,k} is approximately Gaussian, the e_{j,k} are approximately uncorrelated with near-constant variance, due to Proposition 6.1 (3, 4, 5) of [10]. Model Eq (8) is conducive to Bayesian wavelet shrinkage, as explained next.

4 Bayesian Wavelet Shrinkage

4.1 Brief Review of Wavelet Shrinkage

Wavelet shrinkage is a form of nonparametric regression introduced in a series of seminal articles such as [11, 16]. See [17] or [18] for more details and further references. Suppose we have a set of noisy observations, y = (y_1, …, y_n), of an unknown function f(x), taken at regularly spaced locations x = (x_1, …, x_n). In our context, we can use the well-known additive signal-plus-noise model for each scale level, j, in Eq (8):

y_i = f(x_i) + e_i, for i = 1, …, n,

where e = (e_1, …, e_n) are random variables which are usually assumed to be iid with zero mean and some variance σ². The aim is to devise an estimator to recover the signal f (in our setting, R̃_j) from the noisy observations y_i (here the Ĩ_{j,k}). Wavelet shrinkage is very simple and the estimator can be obtained by the following three steps (a short R sketch is given after the list).

  1. Apply the discrete wavelet transform (DWT) to the noisy data y, giving d = β + ε, where d = W y, β = W f(x), ε = W e and W is the orthogonal DWT matrix for a particular smoothing wavelet (SW). The vector β contains the ‘true’ wavelet coefficients and d the noisy empirical wavelet coefficients.
  2. Apply a shrinkage method and threshold (such as hard shrinkage and the universal threshold) to the noisy coefficients, d, to obtain estimates, , of the wavelet coefficients β.
  3. Apply the inverse DWT to the estimated coefficients to obtain an estimate, , of the underlying function f(x) at the data points x.
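The three steps correspond to a handful of lines using the wd, threshold and wr functions of wavethresh [29]; hard thresholding with the universal policy is shown purely as an example of step 2 (our method replaces it with the Bayesian rule of Section 4.2), and the signal f below is an arbitrary illustrative choice.

```r
library(wavethresh)

# Step 0: noisy observations of an unknown function on a regular dyadic grid.
n <- 256
x <- (1:n) / n
f <- sin(2 * pi * x)                            # illustrative 'true' signal
y <- f + rnorm(n, sd = 0.3)

# Step 1: discrete wavelet transform of the noisy data.
ywd <- wd(y, filter.number = 4, family = "DaubLeAsymm")

# Step 2: shrink the empirical wavelet coefficients
# (here, hard thresholding with the universal threshold).
ywd_thr <- threshold(ywd, policy = "universal", type = "hard")

# Step 3: invert the transform to obtain the estimate of f at the data points.
fhat <- wr(ywd_thr)
```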

To obtain good estimates, together with a sound basis for constructing credible intervals, we adopt a Bayesian wavelet shrinkage approach, as described next.

4.2 Bayesian Wavelet Shrinkage

Bayesian statistical methods start with existing prior knowledge of the model parameters (β), which is updated using the data (y) to give posterior knowledge. The resulting posterior knowledge can be used to make inferences about these parameters. The model commonly used for Bayesian inference is

p(β | y) ∝ p(y | β) p(β),   (9)

where p(y | β) is the likelihood, p(β) is the prior density function and p(β | y) is the posterior density function of β given y. Credible intervals can be obtained from the upper and lower tail quantiles of the posterior distribution.

Adopting a Bayesian approach to wavelet shrinkage has become increasingly popular for wavelet de-noising due to its excellent theoretical and practical properties; see, for example, [19–23] and [12]. Bayesian wavelet shrinkage has also been used for stationary spectral estimation in [24] and for credible intervals in regression by [25, 26] and [27]. The usual procedure is to place a prior distribution on the wavelet coefficients and then use the Bayesian paradigm specified by Eq (9), with the necessary components specified as follows, to derive closed-form expressions for the posterior mean and variance. For parts of our specification below we shall use the empirical Bayes approach of [12].

4.3 Regression Model

We shall apply Bayesian wavelet shrinkage to the Haar-Fisz transformed wavelet periodogram, Ĩ_{j,k}. Taking the DWT of Eq (8), for a particular scale j, we obtain

h_{l,m} = β_{l,m} + ε_{l,m},   (10)

where h_{l,m} = (W Ĩ_j)_{l,m}, β_{l,m} = (W R̃_j)_{l,m} and ε_{l,m} = (W e_j)_{l,m}, for scales l = 0, …, J − 1 and locations m = 1, …, 2^l, and W is the T × T orthogonal DWT matrix associated with some Daubechies [13] compactly supported wavelet. Due to the orthogonality of the wavelet transform and the approximate error structure of the e_{j,k} noted above, the distribution of the wavelet-transformed error ε_{l,m} is approximately zero-mean Gaussian with variance equal to that of the e_{j,k}. For notational clarity we shall cease mention of the scale index j; however, it should be remembered that we are applying Bayesian wavelet shrinkage scale-by-scale to Eq (8).

4.4 Prior

We propose using the Berger-Müller mixture prior for β_{l,m}:

p(β_{l,m}) = α_l δ_0(β_{l,m}) + (1 − α_l) ξ_{τ_l}(β_{l,m}),   (11)

where ξ_τ(β) = τ ξ(τβ), δ_0(x) is the Dirac delta function at zero, α_l is the prior probability that the wavelet coefficient is zero, τ_l is the prior precision and ξ is the distribution of a non-zero wavelet coefficient. [12] recommend using a heavy-tailed distribution, such as the Laplace distribution, for this component, and we do so here. Therefore

ξ_{τ_l}(β) = (τ_l / 2) exp(−τ_l |β|),   (12)

where τ_l is the prior precision and 2/τ_l² is the prior variance at scale l.

4.5 Hyperparameter Determination

As in [12], we use marginal maximum likelihood estimation (MMLE) to determine the hyperparameters: the prior probability and precision (α_l, τ_l) and the error variance ν_l². To do this, we maximise over the hyperparameters the log marginal likelihood obtained by integrating the error distribution against the prior,

ℓ_l(α_l, τ_l, ν_l) = ∑_m log{ α_l φ_{ν_l}(h_{l,m}) + (1 − α_l)(φ_{ν_l} ⋆ ξ_{τ_l})(h_{l,m}) },   (13)

where

(φ_{ν_l} ⋆ ξ_{τ_l})(h) = ∫_{−∞}^{∞} φ_{ν_l}(h − β) ξ_{τ_l}(β) dβ.   (14)

The maximum cannot be obtained analytically and requires numerical maximisation.
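To illustrate this step, the sketch below evaluates the marginal density in Eq (13) by numerical integration (rather than via the closed form of Lemma 2) and maximises it with optim's L-BFGS-B method; the starting values, box constraints and placeholder coefficients h are our own illustrative choices rather than those used in the paper.

```r
# Marginal density of one empirical coefficient h under the mixture prior: a point mass at
# zero (weight alpha) plus a Laplace(tau) slab, both observed through N(0, nu^2) noise.
marg_density <- function(h, alpha, tau, nu) {
  slab <- integrate(function(b) dnorm(h - b, sd = nu) * 0.5 * tau * exp(-tau * abs(b)),
                    lower = -Inf, upper = Inf)$value
  alpha * dnorm(h, sd = nu) + (1 - alpha) * slab
}

# Negative marginal log-likelihood (cf. Eq (13)) for one scale's coefficients.
neg_loglik <- function(par, coefs) {
  -sum(log(sapply(coefs, marg_density, alpha = par[1], tau = par[2], nu = par[3])))
}

h <- rnorm(128)                                  # placeholder coefficients for illustration
fit <- optim(par = c(0.5, 1, 1),                 # starting values for (alpha, tau, nu)
             fn = function(par) neg_loglik(par, h),
             method = "L-BFGS-B",
             lower = c(1e-3, 1e-3, 1e-3),
             upper = c(1 - 1e-3, 1e2, 1e2))
fit$par                                          # MMLE estimates of (alpha_l, tau_l, nu_l)
```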

4.6 Likelihood

Due to Property 6.1(4) of [10], the Haar-Fisz transformation bestows approximate/asymptotic Gaussianity upon the data. Hence, we assume a likelihood of the form

p(h_{l,m} | β_{l,m}) = φ_{ν_l}(h_{l,m} − β_{l,m}),   (15)

where φ_{ν_l}(⋅) is the probability density function of the zero-mean Gaussian distribution with variance ν_l², which we shall assume is equal to the error variance.

4.7 Posterior Distribution

By combining the prior and the likelihood, we obtain a posterior distribution of the form

p(β_{l,m} | h_{l,m}) ∝ φ_{ν_l}(h_{l,m} − β_{l,m}) { α_l δ_0(β_{l,m}) + (1 − α_l) ξ_{τ_l}(β_{l,m}) },   (16)

where θ_l = α_l (1 − α_l)^{−1} denotes the prior odds ratio of a zero coefficient.

We will use the posterior mean as our estimator of the wavelet coefficients {β_{l,m}}. The posterior mean can be obtained by evaluating the integral

𝔼(β_{l,m} | h_{l,m}) = ∫_{−∞}^{∞} β p(β | h_{l,m}) dβ.   (17)

For credible intervals we require the posterior variance, which can be calculated via the integral

Var(β_{l,m} | h_{l,m}) = ∫_{−∞}^{∞} β² p(β | h_{l,m}) dβ − { 𝔼(β_{l,m} | h_{l,m}) }².   (18)

To simplify notation, define

Q_i(h) := ∫_{−∞}^{∞} β^i φ_{ν_l}(h − β) ξ_{τ_l}(β) dβ, for i = 0, 1, 2.   (19)

Lemma 1 The quantities Q_i(h), for the Laplace mixture prior in Eq (12), are given by

  1. i = 0:  Q_0(h) = (τ_l/2) exp(τ_l²ν_l²/2) [ exp(−τ_l h) Φ(μ_−/ν_l) + exp(τ_l h) Φ(−μ_+/ν_l) ],
  2. i = 1:  Q_1(h) = (τ_l/2) exp(τ_l²ν_l²/2) [ exp(−τ_l h) { μ_− Φ(μ_−/ν_l) + ν_l² φ_{ν_l}(μ_−) } + exp(τ_l h) { μ_+ Φ(−μ_+/ν_l) − ν_l² φ_{ν_l}(μ_+) } ],
  3. i = 2:  Q_2(h) = (τ_l/2) exp(τ_l²ν_l²/2) [ exp(−τ_l h) { (μ_−² + ν_l²) Φ(μ_−/ν_l) + μ_− ν_l² φ_{ν_l}(μ_−) } + exp(τ_l h) { (μ_+² + ν_l²) Φ(−μ_+/ν_l) − μ_+ ν_l² φ_{ν_l}(μ_+) } ],

where Φ(⋅) is the standard Gaussian cdf and μ_∓ := h ∓ τ_l ν_l².

Proof. See the appendix.

Proposition 1 The posterior mean of the wavelet coefficients in model (10), with components specified in Sections 4.4 to 4.6, is given by

𝔼(β_{l,m} | h_{l,m}) = Q_1(h_{l,m}) / { θ_l φ_{ν_l}(h_{l,m}) + Q_0(h_{l,m}) },   (20)

and the posterior variance by

Var(β_{l,m} | h_{l,m}) = Q_2(h_{l,m}) / { θ_l φ_{ν_l}(h_{l,m}) + Q_0(h_{l,m}) } − { 𝔼(β_{l,m} | h_{l,m}) }².   (21)

Proof. Substitute the quantities defined in Eq (19) into Eqs (17) and (18).
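As a numerical check on Proposition 1, the quantities Q_i(h) of Eq (19), and hence the posterior mean and variance, can also be computed by direct numerical integration; the sketch below does this for a single coefficient, and all parameter values are purely illustrative.

```r
# Q_i(h): integral of beta^i * phi_nu(h - beta) * Laplace_tau(beta) over the real line (Eq (19)).
Q <- function(i, h, tau, nu) {
  integrate(function(b) b^i * dnorm(h - b, sd = nu) * 0.5 * tau * exp(-tau * abs(b)),
            lower = -Inf, upper = Inf)$value
}

# Posterior mean and variance of one wavelet coefficient given its noisy value h, under the
# point-mass-at-zero (probability alpha) plus Laplace(tau) prior and N(0, nu^2) noise.
posterior_moments <- function(h, alpha, tau, nu) {
  theta <- alpha / (1 - alpha)                   # prior odds of a zero coefficient
  denom <- theta * dnorm(h, sd = nu) + Q(0, h, tau, nu)
  pmean <- Q(1, h, tau, nu) / denom
  pvar  <- Q(2, h, tau, nu) / denom - pmean^2
  c(mean = pmean, var = pvar)
}

posterior_moments(h = 2, alpha = 0.8, tau = 1, nu = 1)   # illustrative values only
```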

The next result gives us the necessary log-likelihood function of our Bayesian model from Eq (13) for the Laplace mixture prior.

Lemma 2 The log-likelihood function in Eq (13) for the Laplace mixture prior is

ℓ_l(α_l, τ_l, ν_l) = ∑_m log[ α_l φ_{ν_l}(h_{l,m}) + (1 − α_l)(τ_l/2) exp(τ_l²ν_l²/2) { exp(−τ_l h_{l,m}) Φ(h_{l,m}/ν_l − τ_l ν_l) + exp(τ_l h_{l,m}) Φ(−h_{l,m}/ν_l − τ_l ν_l) } ],

where φ_ν(⋅) is the zero-mean Gaussian pdf with variance ν² and Φ(⋅) is the standard Gaussian cdf.

Proof. The proof uses the same methods as for the proof of Lemma 1.

5 Implementation, Simulation and an Example

5.1 Implementation Issues

We determine the hyperparameters via MMLE of Eq (13) using the function optim in R, which implements the L-BFGS-B method of [28]. Empirical investigation revealed that, for the four coarsest scales, l = 0, 1, 2, 3, which contain only 1, 2, 4 and 8 wavelet coefficients respectively, numerically maximising the log-likelihood separately for each scale resulted in strongly biased hyperparameter estimates. Therefore, instead of maximising the log-likelihood for the four coarsest scales separately, their coefficients were grouped together and the maximisation was performed over all four scales jointly. To distinguish between scales, the hyperparameter estimates were then scaled appropriately, such that as the scale decreased α_l decreased and τ_l increased by a factor of two.

Ultimately, we are seeking an estimate of the posterior mean and variance of R̃_j(z). Formula (21) gives us an estimate of the posterior variance of the β_{l,m}, the wavelet coefficients of R̃_j. We could use the approximate method of [25] to obtain the posterior variance of R̃_j(z). This works well for Haar wavelets (where the square of the wavelet, ψ²(z), is equal to the father wavelet) but is less accurate for non-Haar wavelets. Hence, we adopt the following simple sampling strategy to obtain posterior credible intervals for R̃_j(z).

We simulate S realisations of a complete set of wavelet coefficients {β_{l,m}} from the posterior distributions given by Eq (16). Each realisation of wavelet coefficients is then subjected to the inverse wavelet transform, which provides a posterior realisation of the vector R̃_j = {R̃_j(z_1), …, R̃_j(z_n)}. We then use the sample mean and variance of the R̃_j(z_i) to provide the ‘estimate’ and credible intervals.
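In outline, the sampling scheme looks as follows. The sketch is ours: the posterior sampler draw_coef is deliberately left abstract (its form follows from the spike-and-slab posterior of Eq (16)), template is assumed to be a wavethresh wd object holding the DWT of the Haar-Fisz transformed periodogram at the current scale j, and the wavethresh calls nlevelsWT, putD and wr are used as we understand them.

```r
library(wavethresh)

# Monte Carlo posterior mean and pointwise credible limits for the curve at the data points.
# 'draw_coef(l, m)' is assumed to return one draw of beta_{l,m} from its posterior (Eq (16)).
posterior_band <- function(template, draw_coef, S = 1000,
                           probs = c(0.05, 0.25, 0.75, 0.95)) {
  J <- nlevelsWT(template)
  draws <- replicate(S, {
    wdraw <- template
    for (l in 0:(J - 1)) {
      v <- sapply(seq_len(2^l), function(m) draw_coef(l, m))
      wdraw <- putD(wdraw, level = l, v = v)    # insert the sampled coefficients at level l
    }
    wr(wdraw)                                   # one posterior realisation of the curve
  })
  list(mean = rowMeans(draws),
       quantiles = apply(draws, 1, quantile, probs = probs))
}
```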

Fig 1 depicts a flow diagram of the entire computational process required to produce an estimate of the EWS via Bayesian wavelet shrinkage of the Haar-Fisz transformed wavelet periodogram and credible intervals.

Fig 1. Flow diagram of Bayesian modelling of the discrete wavelet transformation (DWT) of the Haar-Fisz (H-F) transformation of the raw wavelet periodogram using a pre-determined smoothing wavelet (SW).

5.2 Simulation

To test the performance of our method we simulated 200 realisations, {X_{t,T}}, from the EWS shown in Fig 2, with Gaussian innovations, as shown in Fig 3. The EWS was designed to encapsulate a time series with slowly varying power at a middle scale along with a burst of power at the finest scale.

Fig 3. Simulated locally stationary wavelet process generated using the Haar synthesis wavelet and Gaussian innovations from the spectrum in Fig 2.

These simulations were executed using the wavethresh [29] package. First, we create a blank spectral object using the cns() function and then, using the inserter function putD(), we installed the sinusoidal spectral energy at level four and the small block of power at the finest scale. Realisations can then be generated by calling the LSWsim() function with the specified spectrum as its argument.
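For concreteness, the scaffold of such a simulation looks roughly as follows; the series length, level choices and spectral shapes are illustrative stand-ins for the spectrum in Fig 2 rather than its exact specification, and the wavethresh calls are used as we understand them.

```r
library(wavethresh)

n <- 1024
z <- (1:n) / n

spec <- cns(n)                                   # blank (all-zero) spectrum object, Haar wavelet
# Slowly varying sinusoidal power at a mid scale (illustrative level and shape).
spec <- putD(spec, level = 4, v = sin(pi * z)^2)
# Short burst of power at the finest scale (wavethresh level log2(n) - 1).
burst <- numeric(n)
burst[round(0.55 * n):round(0.65 * n)] <- 1
spec <- putD(spec, level = log2(n) - 1, v = burst)

x <- LSWsim(spec)                                # one realisation of the LSW process
ts.plot(x)
```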

For each realisation we produced a Bayesian Haar-Fisz estimator and a translation-invariant (TI) de-noised estimator, using Daubechies extremal phase (EP) smoothing wavelets with 1–10 vanishing moments and Daubechies least asymmetric (LA) smoothing wavelets with 4–10 vanishing moments. The TI estimator was described in Section 2.2. The average mean squared error (AMSE) was calculated for the Haar-Fisz estimator with twenty cycle spins, to remove any features of the wavelet alignment which might unduly influence the estimator. See [15] for further details on cycle spinning.

We calculated the mean EP smoothing-wavelet estimate for each of the 200 processes and then calculated the AMSE for both methods. The overall AMSE for the TI de-noising estimators was 0.192 and for the Bayesian Haar-Fisz estimators 0.131.

Table 1 shows the AMSE for each estimator and choice of smoothing wavelet. EP1 corresponds to the Haar wavelet, which gives the poorest estimator for the Haar-Fisz method and the second poorest for TI de-noising; it is only the best wavelet to use if the underlying structure of the EWS at each scale is known to be piecewise constant. We found that both methods seemed fairly robust to the choice of wavelet, as the differences between the AMSEs were fairly small. We also noticed that the AMSE of the TI de-noising estimator decreased as the support of the wavelet increased, which was not the case for the Bayesian Haar-Fisz estimator. However, the Bayesian Haar-Fisz estimator consistently outperformed the TI de-noising estimator, and with much smaller variability (as indicated by the median absolute deviation figures).

Table 1. Average mean square error (×10−3) over 200 simulations for the translation-invariant de-noising (TI-D) and Bayesian Haar-Fisz (H-F) estimators using the smoothing wavelets: Daubechies extremal phase (EP) with 1–10 vanishing moments and Daubechies least asymmetric (LA) with 4–10 vanishing moments. Figures in parentheses show the median absolute deviation (mad() in R) of the mean squared errors.

We compared the best TI de-noising estimator [2, SW = EP10], shown in Fig 4, to our best estimator using Bayesian modelling of the Haar-Fisz periodogram (SW = LA6), shown in Fig 5, as determined by the results in Table 1.

Fig 4. Estimated evolutionary wavelet spectrum using translation-invariant de-noising with SW = EP10, arising from the realisation in Fig 3.

Fig 5. Estimated evolutionary wavelet spectrum using our Bayesian Haar-Fisz method with SW = LA6, arising from the realisation in Fig 3.

Comparing the plots in Figs 4 and 5, we can see that the Bayesian Haar-Fisz estimator is less susceptible to Gibbs-type phenomena, but the leakage of power into neighbouring scales appears to be fairly comparable for both estimators. Some of the power from scale j = 6 has leaked into scales j = 5 and 7, which has made recovery of the true underlying signal difficult.

Figs 6–9 show the EWS estimation for the simulated example in greater detail. The new method is certainly better at detecting the burst at the finest scale, shown in Fig 6. In Fig 9 we judge our method to be comparable to TI de-noising away from z = 0.6 and considerably better near z = 0.6: the red and blue lines both roughly match the solid-line truth, but the blue is much better near z = 0.6, where the TI de-noising estimate (red) suffers from extreme variability.

Fig 6. True evolutionary wavelet spectrum (black solid), translation-invariant estimator (red dotted) and our Bayesian Haar-Fisz estimator (blue dashed) for the j = 1 finest scale along with the 50% (dark grey) and 90% (light grey) credible intervals for the Bayesian Haar-Fisz estimator.

These estimates are all obtained by denoising the single realisation from Fig 3.

Fig 7. True evolutionary wavelet spectrum (black solid), translation-invariant estimator (red dotted) and our Bayesian Haar-Fisz estimator (blue dashed) for the j = 2 scale along with the 50% (dark grey) and 90% (light grey) credible intervals for the Bayesian Haar-Fisz estimator.

These estimates are all obtained by denoising the single realisation from Fig 3.

Fig 8. True evolutionary wavelet spectrum (black solid), translation-invariant estimator (red dotted) and our Bayesian Haar-Fisz estimator (blue dashed) for the j = 3 scale along with the 50% (dark grey) and 90% (light grey) credible intervals for the Bayesian Haar-Fisz estimator.

These estimates are all obtained by denoising the single realisation from Fig 3.

Fig 9. True evolutionary wavelet spectrum (black solid), translation-invariant estimator (red dotted) and our Bayesian Haar-Fisz estimator (blue dashed) for the j = 4 scale along with the 50% (dark grey) and 90% (light grey) credible intervals for the Bayesian Haar-Fisz estimator.

These estimates are all obtained by denoising the single realisation from Fig 3.

A key advantage of our new methodology is the ability to easily generate credible intervals, which are shown in grey-scale in Figs 6–9. For example, even though the estimator for S_3(z) appears to be non-zero in Fig 8, the 50% credible intervals completely contain zero, which indicates (correctly) that there is no real power at this scale. The same is true, though perhaps less clearly, in Fig 7.

5.3 ECG Example

To test our methods further, we consider a study of infant sleep [30]. Five mothers and their healthy first-born infants slept, once a month for the first five months, in a sleep laboratory designed to be similar to a normal domestic bedroom. The rooms were thermally controlled and all infants slept supine in a cot beside their mother, who was free to care for her infant as she would at home (e.g. feed, change nappy, etc.). Most studies commenced around 8–9 pm and finished around 8–9 am the next morning.

Amongst the measurements taken of each infant were their heart rate via ECG (electrocardiogram) monitors, their brain waves via an EEG (electroencephalogram) sensor and their eye movements via an EOG (electro-oculogram) sensor. The infant’s sleep state was then determined through manual analysis, in which a trained observer visually interprets the EEG and EOG at predetermined time periods; this can be time-consuming and laborious. Four sleep states were recorded: AWAKE, ACTIVE SLEEP, BETWEEN and QUIET SLEEP. For simplicity, we have combined the latter three states into ASLEEP.

These data are freely available as part of the wavethresh [29] package for R [31] in the data sets BabyECG and BabySS.
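The data can be loaded directly from wavethresh. In the sketch below, the differencing-and-padding step is our own illustrative handling of the odd length that differencing produces, and ewspec is used only as a readily available (TI de-noising) spectral estimator; our Bayesian Haar-Fisz estimator itself is not part of the package as far as we know.

```r
library(wavethresh)

data(BabyECG)                      # infant heart rate: 2048 observations at 16 s intervals
data(BabySS)                       # manually determined sleep state

xd <- diff(BabyECG)                # differenced ECG series (length 2047)
xd <- c(xd, xd[length(xd)])        # pad back to dyadic length 2048 (illustrative choice)

est <- ewspec(xd, filter.number = 1, family = "DaubExPhase")
plot(est$S)                        # estimated EWS of the differenced infant ECG
```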

Fig 10 plots 2048 observations of the ECG, sampled every 16 seconds and recorded from 21:17:59 to 06:27:18, together with the determined sleep state for a sixty-six day old infant. The plot indicates that when the infant is awake there is a larger variance in the infant’s heart rate compared to the two different sleep stages, of which quiet sleep appears to possess the smallest variance. We have produced an estimate of the EWS for the differenced ECG data to establish whether we could use the second-order structure of the data to determine the infant’s sleep state. The plot in Fig 11 implies that the majority of the power of the spectrum is present at the finest scale. There appears to be some difficulty in discerning the infant’s sleep state when it changes quickly (such as for locations z ∈ [0.2, 0.4]). As with earlier analyses, such as that in [18], there appears to be a link between active sleep and higher power at the finest scale. However, our new analysis reveals much more: there is more uncertainty associated with the higher power estimates and more certainty when the power is lower. The arrangement of the posterior mean estimate relative to the 50/90% credible intervals indicates skew in the posterior distribution, which is especially noticeable around the peak near z = 0.65. A blow-up of the finest-scale power is shown in Fig 12.

Fig 10. Electrocardiogram plot (light grey line) and sleep state (black solid line) of a 66 day old infant sampled every 16 seconds recorded from 21:17:59 to 06:27:18.

Fig 11. Estimated evolutionary wavelet spectrum for all scales of the infant ECG data.

Fig 12. Estimated evolutionary wavelet spectrum (blue dashed line) for j = 1 with 50% (dark grey) and 90% (light grey) credible interval for the differenced Infant ECG data and sleep state (black solid line).

6 Conclusion and Further Work

This article combines the Haar-Fisz transform with Bayesian wavelet shrinkage to obtain a new method for modelling the evolutionary wavelet spectrum of a locally stationary wavelet process. Bayesian wavelet shrinkage is a known, powerful and well-established technique for noisy data contaminated by uncorrelated Gaussian noise, which the Haar-Fisz transform approximately, but effectively, provides. Although there are competing methods for spectral estimation, there are, as far as we know, no methods for generating credible intervals for evolving spectra, certainly in the wavelet case. Our Bayesian wavelet shrinkage gives a rational method for assessing uncertainty in this setting, providing us with approximate credible intervals.

Further work would be to improve our method of determining the hyperparameters and to investigate the application of our approach to irregularly spaced time series. Another interesting possibility is to apply Bayesian wavelet shrinkage to Haar-Fisz transformed spectra in the stationary or locally stationary Fourier case.

A Proofs

Proof of Lemma 1

The integral in Eq (19) can be shown to be equal to:

Q_i(h) = ∫_{−∞}^{∞} β^i φ_ν(h − β) (τ_l/2) exp(−τ_l |β|) dβ   (22)

= (τ_l/2) exp(τ_l²ν²/2) [ exp(−τ_l h) ∫_0^{∞} β^i φ_ν(β − (h − τ_l ν²)) dβ + exp(τ_l h) ∫_{−∞}^{0} β^i φ_ν(β − (h + τ_l ν²)) dβ ],   (23)

where φ_ν(⋅) is the zero-mean Gaussian pdf with variance ν². Formula (23) is obtained by substituting the Laplace density of Eq (12) into Eq (19) and splitting the integral into two parts over the negative and positive domains. Then, in each integral, the exp(−τ_l |β|) term is merged with the exponential in the normal density and the square completed in each case.

Finally, to obtain the quoted formulae in Lemma 1, use the following properties of the Gaussian distribution: for any μ ∈ ℝ,

∫_0^{∞} φ_ν(β − μ) dβ = Φ(μ/ν),

∫_0^{∞} β φ_ν(β − μ) dβ = μ Φ(μ/ν) + ν² φ_ν(μ),

∫_0^{∞} β² φ_ν(β − μ) dβ = (μ² + ν²) Φ(μ/ν) + μ ν² φ_ν(μ),

together with the analogous expressions for the corresponding integrals over (−∞, 0].


Acknowledgments

Kara Stevens was supported by a studentship funded by the SuSTaIn Science and Innovation Award grant EP/D063485/1. Guy Nason was partially supported by EPSRC grants EP/I01687X/1 (“The Energy Programme”, an RCUK cross-council initiative led by EPSRC and contributed to by ESRC, NERC, BBSRC and STFC) and EP/K02951/1. The authors would like to thank Peter Fleming, Jeanine Young and K Pollard of the Institute of Child Health, The Royal Hospital for Sick Children, Bristol, for supplying the data.

Author Contributions

Conceived and designed the experiments: GPN KS. Performed the experiments: KS. Analyzed the data: KS GPN. Contributed reagents/materials/analysis tools: GPN. Wrote the paper: GPN KS.


References

  1. Dahlhaus R. Fitting Time Series Models to Nonstationary Processes. Annals of Statistics. 1997;25:1–37.
  2. Nason GP, von Sachs R, Kroisandt G. Wavelet Processes and Adaptive Estimation of the Evolutionary Wavelet Spectrum. J Roy Statist Soc B. 2000;62:271–292.
  3. Page CH. Instantaneous power spectra. Journal of Applied Physics. 1952;23:103–106.
  4. Silverman RA. Locally stationary random processes. IRE Trans Information Theory. 1957;IT-3:182–187.
  5. Priestley MB. Evolutionary Spectra and Non-Stationary Processes. Journal of the Royal Statistical Society: Series B. 1965;27:204–237.
  6. Nason GP, von Sachs R. Wavelets in time series analysis. Phil Trans R Soc Lond A. 1999;357:2511–2526.
  7. Dahlhaus R. Locally stationary processes. Handbook of Statistics. 2012;30.
  8. Fryzlewicz P, Van Bellegem S, von Sachs R. Forecasting non-stationary time series by wavelet process modelling. Ann Inst Statist Math. 2003;55:737–764.
  9. Van Bellegem S, von Sachs R. Locally Adaptive Estimation of Evolutionary Wavelet Spectra. Ann Stat. 2008;36:1879–1924.
  10. Fryzlewicz P, Nason GP. Haar-Fisz Estimation of Evolutionary Wavelet Spectra. J Roy Statist Soc B. 2006;68:611–634.
  11. Donoho DL, Johnstone IM. Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika. 1994;81:425–455.
  12. Johnstone IM, Silverman BW. Empirical Bayes Selection of Wavelet Thresholds. Ann Stat. 2005;33:1700–1752.
  13. Daubechies I. Ten Lectures on Wavelets. Philadelphia: SIAM; 1992.
  14. Eckley IA, Nason GP. Efficient computation of the discrete autocorrelation wavelet inner product matrix. Statistics and Computing. 2005;15:83–92.
  15. Coifman RR, Donoho DL. Translation-invariant de-noising. In: Antoniadis A, Oppenheim G, editors. Wavelets and Statistics. vol. 103 of Lecture Notes in Statistics. Berlin: Springer-Verlag; 1995. p. 125–150.
  16. Donoho DL, Johnstone IM. Adapting to Unknown Smoothness via Wavelet Shrinkage. J Am Statist Ass. 1995;90:1200–1224.
  17. Vidakovic B. Statistical Modeling by Wavelets. New York: Wiley; 1999.
  18. Nason GP, Silverman BW. The stationary wavelet transform and some applications. In: Antoniadis A, Oppenheim G, editors. Wavelets and Statistics. vol. 103 of Lecture Notes in Statistics. Berlin: Springer-Verlag; 1995. p. 281–300.
  19. Chipman HA, Kolaczyk ED, McCulloch RE. Adaptive Bayesian Wavelet Shrinkage. J Am Statist Ass. 1997;92:1413–1421.
  20. Vidakovic B. Wavelet-Based Nonparametric Bayes Methods. In: Practical Nonparametric and Semiparametric Bayesian Statistics. vol. 133 of Lecture Notes in Statistics. Berlin: Springer-Verlag; 1998. p. 133–155.
  21. Clyde MA, George EI. Empirical Bayes estimation in wavelet nonparametric regression. In: Bayesian Inference in Wavelet Based Models. Berlin: Springer-Verlag; 1999. p. 309–322.
  22. Müller P, Vidakovic B. Bayesian inference with wavelets: density estimation. J Comput Graph Stat. 1999;7:456–468.
  23. Ruggeri F, Vidakovic B. Bayesian Modeling in the Wavelet Domain. In: Dey DK, Rao CR, editors. Bayesian Thinking Modeling and Computation. vol. 25 of Handbook of Statistics. Amsterdam: Elsevier; 2005. p. 315–338.
  24. Pensky M, Vidakovic B, De Canditiis D. Bayesian Decision Theoretic Scale-Adaptive Estimation of a Log-Spectral Density. Statistica Sinica. 2007;17:635–666.
  25. Barber S, Nason GP, Silverman BW. Posterior Probability Intervals for Wavelet Thresholding. J Roy Statist Soc B. 2001;64:189–205.
  26. Semadeni C, Davison AC, Hinkley DV. Posterior probability intervals in Bayesian wavelet estimation. Biometrika. 2004;91:497–505.
  27. Davison A, Mastropietro D. Saddlepoint approximation for mixture models. Biometrika. 2009;96:479–486.
  28. Byrd RH, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM J Sci Comp. 1995;16:1190–1208.
  29. Nason GP. wavethresh: Wavelets Statistics and Transforms; 2012. R package, version 4.6.1. Available from:
  30. Sawczenko A, Galland B, Young J, Fleming P. Night time mother-infant interactive behaviour and physiology: a longitudinal comparison of room sharing versus bed sharing (‘co-sleeping’). Pediatric Pulmonology. 1995;20:341.
  31. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2012. ISBN 3-900051-07-0. Available from: