PCANet based nonlocal means method for speckle noise removal in ultrasound images

Speckle reduction remains a critical issue for ultrasound image processing and analysis. The nonlocal means (NLM) filter has recently attached much attention due to its competitive despeckling performance. However, the existing NLM methods usually determine the similarity between two patches by directly utilizing the gray-level information of the noisy image, which renders it difficult to represent the structural similarity of ultrasound images effectively. To address this problem, the NLM method based on the simple deep learning baseline named PCANet is proposed by introducing the intrinsic features of image patches extracted by this network rather than the pixel intensities into the pixel similarity computation. In this approach, the improved two-stage PCANet is proposed by using Parametric Rectified Linear Unit (PReLU) activation function instead of the binary hashing and block histograms in the original PCANet. This model is firstly trained on the ultrasound database to learn the convolution kernels. Then, the trained PCANet is utilized to extract the intrinsic features from the image patches in the pre-denoised version of the noisy image to be despeckled. These obtained features are concatenated together to determine the structural similarity between image patches in the NLM method, based on which the weighted mean of all pixels in a search window is computed to produce the final despeckled image. Extensive experiments have been conducted on a variety of images to demonstrate the superiority of the proposed method over several well-known despeckling algorithm and the PCANet based NLM method using ReLU function and sigmoid function. Visual inspection indicates that the proposed method outperforms the compared methods in reducing speckle noise and preserving image details. The quantitative comparisons show that among all the evaluated methods, our method produces the best structural similarity index metrics (SSIM) values for the synthetic image, as well as the highest equivalent number of looks (ENL) value for the simulated image and the clinical ultrasound images.


Introduction
Medical imaging plays a critical role in disease monitoring and diagnosis. Compared with other imaging techniques such as X-ray, CT and MRI, ultrasound imaging is a noninvasive, real-time and radiation-free imaging modality. However, the ultrasound images are inevitably corrupted by speckle noise due to the coherent imaging mechanism from the scatters [1]. Such a noise reduces the sharpness of image details and complicates the diagnosis of the tiny structure of lesions. Therefore, despeckling is of great significance for improving ultrasound image quality. Moreover, as a pre-processing step, denoising will also benefit image post-processing tasks such as image segmentation, classification and registration.
In ultrasound imaging, speckle noise has a random granular pattern. The distribution of speckle noise is signal dependent and is governed by Fisher-Tippett distribution [2] or Gamma distribution [3], which can be represented [4,5]  where (x,y) is the pixel location, v(x,y) is the clean image, u(x,y) is the noisy image, η is a Gaussian noise distribution with zero-mean and variance σ 2 , and the factor γ is related to the ultrasound devices and additional processing. Extensive studies indicate that γ = 0.5 can be used to model speckle noise in the ultrasound images [4,6]. Many methods have been developed for ultrasound images despeckling [7]. In general, the existing despeckling methods can be classified as the spatial domain based methods and the frequency domain based ones. The traditional spatial domain based methods, such as Frost filter [8], Kuan filter [9], squeeze box filter (SBF) [10] and speckle reducing anisotropic diffusion (SRAD) filter [11], are based on the local comparison of pixels. One disadvantage of these approaches is that they cannot deliver sufficient noise reduction while preserving image details effectively. For the frequency domain based methods, the most popular techniques are the wavelet based methods [12][13][14]. These methods work by transforming speckle noise into additive noise and then removing it within the wavelet domain. They tend to introduce the artifacts related to the choice of mother wavelet.
Recently, the nonlocal means (NLM) filter proposed by Buades et al. [15] is considered as a state-of-the-art algorithm for eliminating the additive noise. This method explores the selfsimilarities between image patches instead of individual pixels, and each image pixel is restored by the weighted average of all pixels in a search window. Nowadays, the NLM method has been widely applied to denoise the medical images such as MR images and CT images [16][17][18]. Despite the success in removing Gaussian noise, the traditional nonlocal means (TNLM) method by its very nature becomes unsuitable for speckle noise reduction since speckle noise is different from Gaussian noise significantly. To overcome this drawback, several modified NLM-based approaches have been presented [5,[19][20][21][22] for despeckling. Specifically, Zhan et al. [5] have proposed a weight refining method for speckle noise reduction in which the weights are calculated in the lower-dimensional principal component analysis (PCA) subspace. Coupe et al. [19] have introduced the optimized Bayesian nonlocal means (OBNLM) filter. In contrast to the TNLM, the OBNLM filter determines the similarity between two image patches based on the Pearson distance derived by the Bayesian framework instead of the Euclidean distance of TNLM approach. Yang et al. [21] have presented a hybrid despeckling approach which combines the NLM with the local statistics of noise (NLMLS). In this method, the local statistics of speckle noise are used to pre-filter the ultrasound image and the similarity is computed based on the pre-filtered image to produce the final denoised result. Two total variation (TV) model based NLM methods have been proposed by Dong et al. [22] to reduce the multiplicative noise. Compared with the classical TV-based methods, the nonlocal TV methods perform better in preserving image textures and repetitive structures. The above-mentioned methods as the extension of TNLM method have contributed to the development of ultrasound image restoration techniques.
However, most of the NLM-based despeckling methods depend on the utilization of graylevel information and hand-crafted features. These features are not sufficient to accurately represent the structural similarity between image patches in the ultrasound images. If the intrinsic features can be learned from the images of interest for structural similarity computation, the better despeckled results can be obtained.
Deep learning, as a popular algorithm within the research community of machine learning, can automatically learn the intrinsic features from the training data. Up to now, various deep learning models, such as deep belief network (DBN), stacked auto-encode (SAE), convolutional neural networks (CNN) and nonlocal deep network [23], have been proposed. Among these models, the CNN is very popular due to the use of convolutional architectures. In recent years, many CNN-based variations have been introduced and successfully applied to various visual tasks such as feature extraction, classification [24,25], super-resolution [26], object detection [27,28] and image denoising [29][30][31]. In particular, Zhang et al. [31] have proposed a denoising convolutional neural network (DnCNN) model for removing the additive white Gaussian noise (AWGN), in which a very deep convolutional architecture is constructed while residual learning strategy and batch normalization are integrated to speed up the training process. Although the DnCNN and other CNN-based denoising models exhibit the promising denoised results, they generally involve the following problems. The training of CNN network is very complicated because it involves numerous parameters to be learned and requires too much empirical knowledge and special skills in parameter setting. In addition, most of the CNN-based denoisers are specially designed for denoising AWGN, and they cannot handle speckle noise very well. Thus, how to construct a simple and effective learning network for ultrasonic speckle reduction will pose a great challenge.
More recently, the principal components analysis network (PCANet), a very simple unsupervised deep learning baseline introduced by Chan et al. [32], has been proposed for image classification and hand-written digit recognition. This network consists of three simple data processing components: cascaded PCA, binary hashing and blockwise histograms. In this network, the basic PCA is first employed for learning the multistage convolution kernels, and the subsequent binary hashing and the blockwise histogram are utilized to produce the output features. Compared with the CNN-based models, the training of the PCANet model is extremely simple and efficient because no any regularized parameters or numerical optimization solvers are required for the involved filter learning [32].
Due to the advantage of the PCANet, it will be of significance to introduce this model into the NLM method to characterize structural similarity of image patches by extracting the robust intrinsic features from the ultrasound images. However, the binary hashing used in the PCANet may lead to the loss of useful feature information. To overcome the problem, we will present an improved PCANet in which an activation function Parametric Rectified Linear Unit (PReLU) [33,34] is used instead of the binary hashing and block histograms at the output stage to extract the intrinsic features. To test the restoration performance of the proposed method, the extensive experiments on the synthetic image, the simulated image and the real ultrasound images are performed to make the comparisons among the proposed method and other despeckling methods. The experimental results demonstrate the superiority of the proposed method in speckle reduction and image detail preservation.

The modified PCANet model
The PCANet model is a simple and valuable baseline for image classification and recognition tasks. The intrinsic features can be effectively extracted through three processing layers including the PCA-based convolutional layer, the binary hashing-based nonlinear layer and the histograms-based output layer. These processing methods used in the nonlinear layer and the output layer, which are similar to sparse representation strategies [35,36], will result in the loss of image features. To overcome the disadvantage, we propose a modified PCANet, which consists of three processing components: the two convolutional layers and the output layer. The detailed structure of this network is given as follows.
A. The convolutional layer. Let the number of training images be N, the patch size be k 1 ×k 2 at all stages. For the p-th training image, all the patches around each pixel are collected step by step. For each patch in this training image, its mean is subtracted from the intensities of all pixels in this patch. Accordingly, the mean-removed matrix A p = [a p,1 ,a p,2 ,. . .,a p,S ] is produced for the whole image, where a p,s denotes the s-th mean-removed vectorized patch, and S is the number of patches produced from the p-th image. By processing all the training images in the same way, the matrix A is obtained as: The PCA operation is then implemented on A to derive the convolution kernels. The reconstruction error within a family of orthonormal filters is minimized by the PCA algorithm, i.e., where L 1 is the number of filters in the first layer, I L 1 is the identity matrix of size L 1 ×L 1 , and ||�|| F denotes the Frobenius norm. The solutions of Eq (3) are the L 1 principal eigenvectors of AA T . Therefore, the PCA filters are expressed as: where q l (AA T ) represents the l-th principal eigenvector of AA T , and mat k 1 ;k 2 ðq l ðAA T ÞÞ is the function that maps q l (AA T ) to the matrix O 1 l . Similar to the deep neural networks (DNNs), the multiple stages of PCA filters can be stacked for extracting higher level features. All outputs of the first convolutional layer are utilized as the inputs of the second convolutional layer. By repeating almost the same process as in the first convolutional layer, the leading L 2 eigenvectors are obtained as the filters in the second one.
B. The output layer. For the p-th training image, there are L 1 outputs fF 1 from the first convolutional layer. Similarly, each input F 1 p;v will produce L 2 corresponding outputs fF 2 p;w g L 2 w¼1 from the second convolutional layer. In the original PCANet, all these L 1 ×L 2 outputs are binarized using the Heaviside step function while each group of L 2 binary outputs is weighted summed to obtain L 1 decimal-valued images. However, the previous analysis has shown that this operation will degrade the effectiveness of the extracted features. To ensure that all those features obtained from the input image can be kept as accurate as possible, the binary hashing and block histograms will be replaced by the PReLU function [34]. The reason of using PReLU function instead of ReLU and sigmoid functions is that it can help to preserve image details better by additionally utilizing the structural information carried by the negative entries. In this modified PCANet model, the PReLU will be utilized as the final output layer to map nonlinearity into the data to ensure the accuracy and robustness of the extracted intrinsic features. Here, the PReLU function is defined as: ( When a = 0, the function degenerates into the Rectified Linear Unit (ReLU). If a is a very small fixed value, the PReLU will be reduced to the Leaky ReLU (LReLU). In this paper, a is fine-tuned as 0.25 to ensure the optimal despeckled results.

Feature extraction using the modified PCANet model
To realize robust intrinsic features extraction, the modified PCANet must be trained to obtain the convolution filter kernels. Considering that the PCANet uses the image patch based training strategy, we have chosen 200 general ultrasound images from the open ultrasound database [37] to train the PCANet. They consist of abdomen images, urinary tract images, pediatrics images, gynaecology images and musculo skeletal joints images, among which the number of each type of ultrasound images is 40. All the 200 images are cropped to be the size of 480×320 and pre-processed by the OBNLM filter.
By using the trained PCA filters, the modified PCANet can extract the intrinsic features. A flowchart is given to illustrate how the modified PCANet extracts the features from the considered patches of an input image. As shown in Fig 1, the original noisy ultrasound image is preprocessed by the OBNLM filter before input into the PCANet. The patches in the pre-processed ultrasound image are convoluted with the trained PCA filters to produce L 1 feature maps in the first convolutional layer. Then, each feature map is convoluted with the trained PCA filters to generate L 2 feature maps in the second convolutional layer. All the L 1 ×L 2 feature maps are processed by the PReLU function to produce the final outputs.

The modified PCANet based nonlocal means method
The main aim in this study is to improve the despeckling performance of NLM method. To refine the calculation of structural similarity between image patches in the NLM method, the proposed where h acts as a decay parameter controlling the degree of filtering, X(i,j) and X(m,n) are the concatenated feature vectors of image patches centered at (i,j) and (m,n), respectively. ||�|| 2,α is a Gaussian weighted Euclidean distance with α denoting the standard deviation of Gaussian function.
Based on the structural similarity ω(i,j,m,n), the restored intensity NLM[I(i,j)] at (i,j) in the noisy image I is determined as: where O(i,j) is the search window centered at (i,j).
To further improve the denoised results, the structural similarity between image patches will be refined by using the image restored by the proposed method instead of the pre-filtered results produced by the OBNLM method. The refined structural similarity will help the NLM filter to provide better despeckling performance.

Implementation of the proposed method
The implementation of the proposed method is summarized in Fig 3, the detailed description is as follows.
Step 1: Pre-processing. The original noisy image is pre-filtered by the OBNLM filter to produce the pre-processed image. The noise standard deviation is estimated, and the decay parameter is set as suggested in [38].
Step 2: Generation of the feature images. The pre-filtered image is input into the modified PCANet which has been trained using the open ultrasound database [37] to generate L 1 ×L 2 feature images as the outputs of this network. Step 3: Computation of similarity weight. For each considered image patch in the original noisy image, the corresponding feature vector is constructed by using the feature images obtained in Step 2. The produced feature vectors are employed to compute the structural similarity between two image patches via Eq (6).
Step 4: Image restoration. Based on the similarity weight and the pre-fixed decay parameter, each pixel in the noisy image is restored by the NLM method based on Eq (7).
Step 5: Refinement of similarity weight. The restored image obtained in Step 4 is input into the modified PCANet to produce L 1 ×L 2 feature images, and then they are used to compute the structural similarity via Eq (6).
Step 6: Output of the final despeckled image. Based on the derived similarity weight in Step 5, the final despeckled image is produced by Eq (7).

Experimental results and discussion
In this section, the proposed method is tested on the synthetic image, the simulated image and the real ultrasound images. To demonstrate the superiority of the proposed PCANet based NLM method (PPCA-NLM), it will be compared with such traditional well-known despeckling algorithms as Frost, Kuan, SBF, SRAD, TNLM, OBNLM and NLMLS. Meanwhile, the proposed method will be also compared with the DnCNN method and the NLM methods using the original PCANet model (OPCA-NLM), the ReLU-based PCANet model (RPCA-NLM) and the sigmoid-based PCANet model (SPCA-NLM). In all experiments, the window size of Frost and Kuan filters is fixed to be 3×3. For SBF and SRAD filters, the parameters are fine-tuned as referred in [10,11]. The sizes of similarity window and search window are set as 7×7 and 17×17 in the TNLM, OBNLM, NLMLS and PCANet based NLM filters, respectively. For the DnCNN denoiser, the same database is used as the PCANet to train the DnCNN model based on Eq (1), and the parameters and network structures are set as suggested in [31]. For these NLM-based filters, the decay parameter is determined using the ruleof-thumb, i.e., h = β�σ, where β and σ denote a predefined constant and the noise standard deviation [38], respectively. In these PCANet based NLM methods, the number of PCA filters in the two convolutional layers is fixed to be 12 and the patch size is 7×7, respectively.

The synthetic image
The experiment is conducted on the synthetic image corrupted by various levels of speckle noise with σ = 3, 4, 5, and 6, which is simulated based on Eq (1).  The quantitative evaluations are also made among all the tested methods. Two well-known evaluation indexes as peak signal-to-noise ratio (PSNR) and structural similarity index metrics (SSIM) [39] are used for performance appreciation, which are defined as:   SSIM ¼ where W and H represent the width and height of the image, respectively. u is the noise-free image andû is the denoised image. mû and μ u are the mean intensity of imagesû and u, respectively. dûu is the covariance between imagesû and u, sû and σ u are the standard deviations of imagesû and u, respectively. C 1 and C 2 are the small constants to stabilize SSIM. Tables 1 and 2 list the PSNR and SSIM values for all evaluated methods operating on the synthetic image corrupted with different levels of speckle noise, where the best values are marked in bold. It can be seen that the DnCNN can produce the maximum PSNR at all noise  levels, and the PPCA-NLM method ranks the second. However, the proposed method provides significantly higher SSIM values than the DnCNN. The overall evaluation based on the abovementioned visual inspection and the objective indexes demonstrates that the PPCA-NLM method exhibits the excellent despeckling performance for the synthetic image.

The simulated image
A more challenging and relevant ultrasound image has been generated for the "cyst" phantom based on FieldⅡ simulation, which is a program to simulate ultrasound transducer fields and ultrasound imaging using the linear acoustics. The "cyst" phantom consists of a collection of point targets, five cyst regions, and five highly scattering regions [40]. Fig 7 presents the "cyst" phantom, the simulated image, and the despeckled results of all the tested methods. The Frost, Kuan and SBF filters keep much noise in the images as shown in Fig 7(C)-7(E). The SRAD filter performs well in noise removal, but it leads to the blurry edges and the staircase effect in the despeckled image. All NLM-based methods smooth out speckle noise more effectively than the above-mentioned local filters. However, the TNLM and NLMLS filters cannot maintain the sharpness of point targets and the OBNLM method produces the obvious artifacts. For the DnCNN method, many unwanted artifacts can be observed in the smooth region as shown in  Fig 7(N). Moreover, two other widely used evaluation indexes, i.e., equivalent number of looks (ENL) and contrast-to-noise ratio (CNR) [41] are utilized to quantitatively appreciate the despeckling performance, which are defined as: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where μ b and μ o are the mean intensity of background area and object area, σ b and σ o are the standard deviation of background area and object area, respectively.  Four pairs of ROIs are selected to evaluate the despeckling performance for these tested methods, and the ROIs are marked with different colors as shown in Fig 7(B). The ENL and CNR are listed in Tables 3 and 4, respectively. As regards ENL, the PPCA-NLM method provides the maximum values at all ROIs. Likewise, the proposed method performs the best for three ROIs except that its CNR is slightly less than that of the RPCA-NLM method for the blue ROI. The performance appreciation based on ENL and CNR confirms that the PPCA-NLM method is superior to other compared methods.

The real ultrasound images
Three real ultrasound images are also used to further assess the effectiveness of the proposed method.    Tables 5 and 6. With regard to ENL results, the PPCA-NLM method produces the maximum values at three ROIs among all the filters. In terms of CNR, the OPCA-NLM method provides the competitive results, followed by the PPCA-NLM and DnCNN methods. The above quantitative comparisons and visual inspection illustrate that the PPCA-NLM method is very suitable and practicable for despeckling the real ultrasound image.
Besides, two other clinical ultrasound images including the fetal image and the parotid gland image are used to further display the visual impression for the TNLM, OBNLM,  NLMLS, DnCNN and PPCA-NLM filters. As indicated in Figs 9 and 10, the PPCA-NLM method outperforms the compared methods in that it not only effectively removes speckle noise but also enhances the sharpness of boundaries and retains image structures very well.

Conclusion
In this paper, the modified PCANet based deep learning baseline is introduced into NLM method for reducing speckle noise in the ultrasound images. This proposed method utilizes the intrinsic features extracted from an input noisy image by the PCANet to refine the similarity computation in the NLM method. The introduction of the intrinsic features instead of the gray-level information into the NLM method can facilitate the effective despeckling of ultrasound images due to its effectiveness in representing their structural information. The experimental results demonstrate that the proposed method outperforms the other compared methods in speckle reduction and image detail preservation. Therefore, the proposed approach has the great potential application to ultrasound-based clinical diagnosis. In future, the new deep learning models will be developed to denoise the images corrupted by Rician noise or Poisson noise.