Multi-channel framelet denoising of diffusion-weighted images

Diffusion MRI derives its contrast from MR signal attenuation induced by the movement of water molecules in microstructural environments. Associated with the signal attenuation is the reduction of signal-to-noise ratio (SNR). Methods based on total variation (TV) have shown superior performance in image noise reduction. However, TV denoising can result in stair-casing effects due to the inherent piecewise-constant assumption. In this paper, we propose a tight wavelet frame based approach for edge-preserving denoising of diffusion-weighted (DW) images. Specifically, we employ the unitary extension principle (UEP) to generate frames that are discrete analogues to differential operators of various orders, which will help avoid stair-casing effects. Instead of denoising each DW image separately, we collaboratively denoise groups of DW images acquired with adjacent gradient directions. In addition, we introduce a very efficient method for solving an ℓ0 denoising problem that involves only thresholding and solving a trivial inverse problem. We demonstrate the effectiveness of our method qualitatively and quantitatively using synthetic and real data.


Introduction
Diffusion MRI affords in vivo insights into brain tissue microstructure and allows reconstruction of white matter pathways for neuroscience studies involving development, aging, and disorders [1][2][3][4][5]. However, since diffusion MRI derives its contrast from MR signal attenuation, it suffers from low signal-to-noise-ratio (SNR), which complicates subsequent quantitative analyses. To improve SNR, multiple repetitive scans are typically acquired and averaged for noise reduction. This however inevitably prolongs acquisition times and is hence prohibitive in clinical settings. Post-acquisition algorithms, such as total variation (TV) denoising [6], have been widely adopted due to their ability to remove noise without requiring additional acquisition time. PLOS  Diffusion-weighted (DW) images are typically acquired with non-collinear gradient directions. As shown in Fig 1, DW images that are scanned with similar gradient directions share a lot of commonalities. However, these commonalities diminish very quickly if the difference between the gradient directions increases. Denoising performance can be improved by making full use of information between images scanned with similar gradient directions; however, images scanned with very different gradient directions have to be avoided to reduce artifacts. We can also observe from the figure that the DW images are typically very noisy, indicating the great importance of denoising.
In this paper, we propose a group ℓ 0 minimization denoising framework that utilizes tight wavelet frames and takes advantage of the correlation between DW images scanned with neighboring gradient directions. The power of tight wavelet frames lies in their ability to sparsely approximate piecewise smooth functions and the existence of fast decomposition and reconstruction algorithms associated with them. In contrast, TV based methods are effective on restoring images that are piecewise constant, e.g., binary or cartoon-like images. They will, however, cause staircasing effects in images that are not piecewise constant [6].
TV denoising is typically realized by penalizing the ℓ 1 -norm of image gradients. Instead of ℓ 1 regularization, which has been shown in the theory of compressed sensing [7] to produce sparse solutions, we opt to use ℓ 0 regularization. In [8], wavelet frame based ℓ 0 regularization shows better edge-preserving quality compared with the conventional ℓ 1 regularization. In [9], iterative hard threshoding algorithms show better performance than iterative soft thresholding algorithms. Based on these facts, we propose a group version of ℓ 0 minimization to take advantage of the correlation between DW images. Extensive experiments were carried out using synthetic data with different levels of noncentral chi (nc-χ) noise and real diffusion MRI data. The experimental results demonstrate that the proposed method outperforms TV denoising and non-local means (NLM) denoising [10]. Part of this work has been presented in a workshop [11]. Herein, we provide additional examples, results, derivations, and insights that are not part of the workshop publication. The rest of the paper is organized as follows: In Approach Section, we will provide detailed descriptions for our method. In Experiments Section, we will demonstrate the effectiveness of our method using extensive experiments on synthetic data and real data. In Discussion Section, we will provide in-depth discussions of our method. Finally, we will conclude this work in Conclusion Section.

Approach
We will provide first a brief introduction to framelets, followed by details on how framelets can be incorporated into an ℓ 0 minimization framework for DWI denoising.

Tight framelets
where h�, �i is the inner product of L 2 ðRÞ. It is clear that an orthonormal basis is a tight frame, since the identity hold for arbitrary orthonormal bases in L 2 ðRÞ. When X(C) forms an orthonormal basis of L 2 ðRÞ, X(C) is called an orthonormal wavelet basis. Tight frames are generalization of orthonormal bases with greater redundancy-a property central to sparse representation and often desirable in applications such as denoising [13]. Given a set of generators C :¼ fc 1 ; . . . ; c R g � L 2 ðR d Þ, which are desirably (anti)symmetric and compact functions, the corresponding quasi-affine system X(C) from level J is the collection of dilations and shifts of C: with c l;r;k ≔ 2 l 2 c r ð2 l � À kÞ; l � J; When X(C) forms a (tight) frame of L 2 ðRÞ, each function ψ r , r = 1, . . ., R, is called a (tight) framelet and the whole system X(C) is called a (tight) wavelet frame system. A tight wavelet frame is also called a Parseval frame. Note that in the literature the affine (or wavelet) system, which corresponds to the decimated wavelet (frame) transforms, is commonly used. The quasi-affine system above, introduced and analyzed in [14], corresponds to the undecimated wavelet (frame) transforms and essentially oversamples the wavelet frame system starting from level J − 1 and downwards. In this paper, we focus on the quasi-affine system because it has been shown to work better in image restoration [12]. We set J = 0 and consider only l < 0. An approach to constructing framelets C is by utilizing multiresolution analysis (MRA) [12]. One starts with a refinable function ϕ with refinement mask a 0 2 ' 2 ðZÞ satisfying and�ð0Þ ¼ 1, where� denotes Fourier transform of ϕ. The key is to find the masks a r 2 ' 2 ðZÞ that gives The finite sequences a 1 , . . ., a R are called wavelet frame masks, or the high pass filters of the system. The refinement mask a 0 is also known as the low pass filter. The two equations above can be combined by defining ψ 0 ≔ ϕ. The unitary extension principle (UEP) [14] provides a general theory for constructing MRA-based tight wavelet frames. That is, as long as {a 1 , . . ., a R } are finitely supported and their Fourier series satisfy for all ν 2 {0, π} and ξ 2 [−π, π], the quasi-affine system X(C) forms a tight frame in L 2 ðRÞ. For example, consider the centered B-splines of order p, i.e., with j = 0 when p is even and j = 1 when p is odd. The corresponding refinement mask is given asâ where r = 1, 2, . . ., p. It is straightforward to show that the UEP conditions (6) are satisfied. Wavelet frame masks for p = 1, 2, 4 are shown in Table 1. It is worth noting that these masks corresponds to differential operators of various orders. For example, for piecewise linear Bspline, the masks a 1 and a 2 correspond to the first order and second order difference operators respectively up to a scaling factor. When a tight wavelet frame is used, the given data is considered to be sampled as a local average Noting that [12] hf ; c lÀ 1;r;k i ¼ where the dilated sequence is defined as  The decomposition and reconstruction down to level −L [12], i.e., where can be realized with convolution using the masks. Denoting by W the L-level framelet decomposition, i.e., where � denotes the convolution operator. If we use W > to denote the framelet reconstruction, we have W > W = I, i.e., f = W > Wf. Given a 1-dimensional framelet system for L 2 ðRÞ, the d-dimensional tight wavelet frame system for L 2 ðR d Þ can be easily constructed by using tensor products of the 1-dimensional framelets [12].

Problem formulation
Given a multi-channel or vector-valued image f of an arbitrary dimension with voxel i 2 {1, . . ., N} consisting of vector f i 2 < M , where N is the number of voxels and M is the number of channels, we are interested in restoring its denoised counterpart u by solving the following problem: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi X Here, u (m) is the m-th channel of u. The regularization term is in fact a summation of G terms, each of which grouping a number of channels. The g-th grouping (with associated tuning parameter λ g,l,r ), where g = {1, 2, . . ., G}, is defined according to the set of weights {w g,m }, where m 2 {1, 2, . . ., M}. Channels with w g,m 6 ¼ 0 are included in the grouping and their weighted framelet coefficients are jointly considered via ℓ 2 -norm for penalization. The different groups can possibly overlap, implying that each channel can be included in different groups at the same time. This is in spirit similar to the overlapped group Lasso [15]. We set l g;l;r ¼ lð P m w 2 g;m Þ 1 2 ; l; r 6 ¼ 0; 0; otherwise: Here λ is a constant that can be set independently of the weights.

Optimization
Problem (18) can be solved effectively using penalty decomposition (PD) [16]. Defining auxiliary variables In PD, we (i) alternate between solving for u and v using block coordinate descent (BCD). Once this converges, we (ii) increase μ > 0 by a multiplicative factor δ > 1 and repeat step (i). This is repeated until increasing μ does not result in further changes to the solution [16]. See Algorithm 1 for a summary of the algorithm. Convergence analysis is provided in the S1 Appendix.
First subproblem. We solve for v in the first problem, i.e., min v L μ (u, v). This is a group ℓ 0 problem and the solution can be obtained via hard-thresholding: ðv g;m;l;r Þ i ¼ w g;m ðW l;r u ðmÞ Þ i ; if ðh g;l;r Þ i � 2l g;l;r m ; 0; if otherwise; where This subproblem can be replaced using soft-thresholding to obtain an ℓ 1 version of the algorithm. Second subproblem. By taking the partial derivative with respect to u (m) , the solution to the second subproblem, i.e., min u L μ (u, v), is for each m I þ m 2 where we have dropped the subscript i for notation simplicity. Note that since we have Otherwise, set u k+1,0 = u k . (4) Set k ! k + 1 and go to Step (1).

Setting the weights
In our case, each channel corresponds to a DW image. In setting the weights {w g,m }, we note that the weights should decay with the dissimilarity between gradient directions associated with a pair of DW images. To reflect this, we let G = M and set, for g, m 2 {1, . . ., M}, w g;m ¼ e k½ðn > m n g Þ 2 À 1� ; jn > m n g j ⩾ cosðyÞ; 0; otherwise; ( ð26Þ where κ � 0 is a parameter that determines the rate of decay of the weight. The exponential function is in fact modified from the probability density function of the Watson distribution [17] with concentration parameter κ. Essentially, this implies that for the g-th DW image acquired at gradient direction ν g , there is a corresponding regularization group that includes a set of images with associated weights {w g,m }. The weight is maximal at w g,g = 1 and is attenuated when m 6 ¼ g. Weights of images scanned at a gradient direction with angle greater than θ in relation to ν g are set to 0, and the respective images are hence discarded from the group. We set θ = 30˚.

Debiasing
The magnitude of the complex MR signal is commonly used because the phase of the complex signal is highly sensitive to many experimental factors [18,19]. The magnitude MR signal is not affected by the phase error and it follows a nc-χ distribution [20,21] rather than a Gaussian distribution and bias correction needs to be carried out especially when the SNR is low [18]. Bias correction can be performed before [22] or after [23] denoising. In our case, we adopted the latter for unbiased noise reduction [23].

Experiments
The main goal in the following experiments is to demonstrate that denoising performance can be improved by using 1. UEP-based tight wavelet frames, which avoids the staircasing effect; 2. ℓ 0 over ℓ 1 regularization; 3. Collaborative utilization of angularly neighboring DW images.
Unless stated otherwise, we used the piecewise linear tight wavelet frame with L = 2 levels of decomposition. The optimal λ values for ℓ 0 and ℓ 1 were in (1,8], determined using grid search from 0.2 to 50 in steps of 0.2 based on the maximal peak signal-to-noise ratio (PSNR) defined  as where MAX is the maximal signal value and MSE is the mean square error. For debiasing, the noise level is estimated from the image background using the method described in [24]. More advanced noise estimation methods [23,25] can be used for improved accuracy. We utilized NLM filtering as a comparison baseline. Following the work presented in [26], we set the patch radius to 1 and search radius to 2.

Datasets
Spiral data. A synthetic dataset of a spiral was generated for quantitative evaluation. The parameters used for synthetic data simulation were consistent with the real data described next: b = 2000s/mm 3 , 48 gradient directions, 64 × 64 × 16 voxels with resolution 2 × 2 × 2 mm 3 . Three levels of 32-channel nc-χ noise [27] was added: σ = 5, 7.5, and 10, corresponding to SNR = 30, 20, 10. SNR is defined as η/σ [28], where η is the true signal value, which in our case is the white matter non-DW signal. The varying curvature reflects the various degree of bending of white matter pathways and gives us a good basis for evaluating how denoising performance changes in different conditions. ISBI phantom. Evaluation was also performed using the realistic diffusion MRI phantom adopted in the ISBI 2013 HARDI challenge (http://hardi.epfl.ch/static/events/2013_ISBI/). A python package, called phantomαs [29], was used to generate the noise free phantom, with gradient directions and diffusion weighting consistent with the spiral data described above. Three levels of 32-channel nc-χ noise, similar to the spiral data, was added to the noise free phantom.
Real data. DW images were acquired using Siemens 3T TRIO MR scanner with the same gradient directions and b-value as the spiral data. The imaging protocol is as follows: 128 × 96 imaging matrix, voxel size of 2 × 2 × 2 mm 3 , TE = 97 ms, TR = 11, 300 ms, 32-channel receiver coil. Imaging acquisition was repeatedly performed on the same subject for 8 times. We averaged the 8 sets of DW images and removed the nc-χ noise bias to obtain the ground truth for evaluation. Informed written consent was obtained from the subject and the experimental protocol was approved by the Institutional Review Board of the University of North Carolina (UNC) School of Medicine. The study was carried out in accordance with the approved guidelines.

Results
The staircasing effect. The staircasing effect is often observed in denoising based on TV regularization [30]. The power of tight wavelet frames lies in their ability to sparsely approximate piecewise smooth functions. They are hence better suited for images with gradual intensity changes. In Fig 2, we show an example of how piecewise linear framelet denoising avoids the staircasing effect and results in a smoother image without blocking artifacts. In contrast, TV denoising causes patch artifacts when the image is not piecewise constant.
Bias correction. The noise-induced bias on the estimated magnitude signal is especially prominent when the diffusion weighting is high. We removed the nc-χ bias using the method described in [27]. signal is low [27]. This manifests as elevation of intensity value after denoising. Removing the noise bias produces an image that is closer to the ground truth.
Type of framelets and number of levels. Using the spiral data for evaluation, our results shown in Fig 4 indicate that piecewise linear framelet denoising performs better than other types of framelets. Fig 5 indicates that denoising performance improves with the increase in the number of levels, L. However, the time cost increases dramatically with L, i.e., 27 s for L = 1, 29 s for L = 2, and 378 s for L = 3 (based on a 4-core Intel i7 processor). Therefore, we choose L = 2 for reasonable denoising performance with a reasonable time cost.
Effects of grouping. Fig 6 shows the results of denoising with and without grouping of angularly neighboring images. Grouping can be observed to significantly improve PSNR.
Comparison between methods. The PSNR and SSIM [31] results for the spiral data and ISBI phantom, shown in Figs 7 and 8, indicate that the proposed ℓ 0 -based framelet method gives the best performance for all noise levels. The DW images, shown in Figs 9 and 10, indicate that both ℓ 1 and ℓ 0 give sharper edges compared with NLM. Noise, however, is not totally removed for the case of ℓ 1 . Only ℓ 0 is able to effectively remove noise and preserve edges. Note that, for both synthetic and real data, nc-χ bias was removed using the method described in [27]. We also compared the computational times of different methods using the spiral data with a computer equipped with a 4-core Intel i7 processor. The results, shown in Table 2, indicate that our method and ℓ 1 perform more efficiently than NLM.
Real data. For the real data, we used the average image as the ground truth for quantitative evaluations. The results for all 8 datasets, shown in Fig 11, are consistent with Figs 7 and 8, indicating that ℓ 0 gives the best performance. The visual results in Fig 12 indicate that the results given by ℓ 0 is closest to the ground truth. This is confirmed by the root-mean-square  Multi-channel framelet denoising error (RMSE) map computed between the denoised data and the ground truth data. In contrast, NLM over-smooths the image and edge information is hence lost.

Discussion
In this paper, we have introduced a method that harnesses correlations between DW images scanned with similar gradient directions for effective edge-preserving denoising. Our main contributions lie in three aspects. Firstly, UEP was employed to generate frames that were discrete analogues to differential operators of various orders; Secondly, instead of the conventional ℓ 1 regularization, a very efficient method was proposed in order to solve an ℓ 0 denoising problem that involves only thresholding and a trivial inverse problem; Thirdly, DW images acquired using neighboring gradient directions were used for collaborative denoising. NLM was used as a comparison baseline in our evaluation. However, similar to [22,32], we found the performance of NLM to be unsatisfactory. NLM can be improved by designing better metrics for patch matching, instead of the conventional Euclidean distance. For instance, inspired by the human visual system, Foi and Boracchi [33] proposed a patch foveation operator for measuring patch distance. Baselice [34] proposed to measure pixel using the Kolmogorov-Smirnov distance, showing promising performance in reducing speckle noise in ultrasound images. NLM can be further improved by extending its search volume. For instance, collaborative NLM [35] extended the search volume to a number of co-denoising images to enrich the similar information used in noise reduction. Chen et al. [36,37] proposed to improve NLM by considering the similar information in both spatial domain and diffusion wavevector domain. This idea was further employed to improve atlas building [38] and resolution enhancement [39].

Conclusion
In conclusion, we have proposed a method to remove the noise in DW images. The proposed method takes advantage of multi-channel framelet and the correlations among DW images for effective noise removal. The associated ℓ 0 optimization problem is solved by an effective iterative hard thresholding algorithm. Extensive experiments on synthetic data and real data demonstrate the advantage of our method over various noise reduction methods, including TV regularization, NLM, and the ℓ 1 counterpart of our method.
Supporting information S1 Appendix. Proof of convergence. (PDF)