A new development of non-local image denoising using fixed-point iteration for non-convex ℓp sparse optimization

We propose a new efficient image denoising scheme that makes four main contributions, each differing from existing approaches. First, we show the equivalence between group-based sparse representation and the Schatten-p norm minimization problem, so that the sparsity of the coefficients of each group can be measured by estimating the underlying singular values. Second, we construct the proximal operator for sparse optimization in ℓp space with p ∈ (0, 1] using fixed-point iteration and obtain a new solution of the Schatten-p norm minimization problem that is more rigorous and accurate than currently available results. Third, we analyze the suitable setting of the power p for each noise level σ = 20, 30, 50, 60, 75, 100. We find that the optimal value of p is inversely proportional to the noise level except at high noise levels, where the best values of p are 1 and 0.95 when the noise levels are 75 and 100, respectively. Last, we measure the structural similarity between two image patches and extend a previous deterministic annealing-based solution of the sparsity optimization problem by incorporating the idea of dictionary learning. Experimental results demonstrate that, for every given noise level, the proposed Spatially Adaptive Fixed Point Iteration (SAFPI) algorithm attains the best denoising performance in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) and is able to retain image structure information, outperforming many state-of-the-art denoising methods such as Block-Matching and 3D filtering (BM3D), Weighted Nuclear Norm Minimization (WNNM) and Weighted Schatten p-Norm Minimization (WSNM).


Introduction
Images are generally contaminated by noise during acquisition, transmission and compression; real-life images are often degraded by mixed noise, and it is hard to identify the type and model of the noise [1][2][3][4][5][6][7][8]. High-resolution images are desirable in many applications, e.g., object recognition, image classification, and image segmentation in the medical and biological sciences. As an essential low-level image processing procedure, image denoising has been studied extensively and belongs to a special class of classical inverse problems. The general observation model with additive noise is Y = X + N, where Y is the noisy observation, and X and N represent the original image and white Gaussian noise, respectively. A plethora of noise removal techniques have appeared in recent years; for example, Convolutional Neural Networks (CNNs) [9,10] have proved very promising on denoising tasks for which large training sets are available, but when training data are scarce, their performance suffers from overfitting. Therefore, image denoising for real-life noise remains an important challenge for recovering high-quality images [11].
The image denoising problem is in general ill-posed and requires appropriate regularization. Over the past few decades, numerous image denoising methods have been developed [12]. Denoising is usually achieved by minimizing a suitable energy functional that characterizes a tradeoff between data fidelity and regularity. The Frobenius norm is often employed to measure the data-fitting loss for additive Gaussian noise.
Sparse signal representation describes a signal as a linear combination of as few atoms as possible from a given dictionary. Elad [13] showed that the sparse overcomplete representation approach is quite effective in denoising images, supported by recent studies showing that better denoising performance can be achieved with variants of sparse coding methods [14,15]. To promote sparsity more strongly than convex regularization, it is also standard practice to employ non-convex optimization [16].
In image denoising, following [17], each noisy patch y_i is extracted from the noisy image Y. In order to better exploit group sparsity, we group a set of similar patches Y = [y_1, y_2, . . ., y_n] ∈ R^(m×n). Thus, the denoising problem becomes the recovery of x_i from y_i. Now let us consider the group sparsity defined by a group norm ||A||_{p,2}:

(D, A) = arg min_{D,A} ||Y − DA||_F² + λ||A||_{p,2},    (1)

where A = [α_1, α_2, . . ., α_n] ∈ R^(m×n) is related to the image patches by X = DA. We note that the group norm (a quadratic symmetric gauge function, see 2.4.2 of [18]) ||·||_{p,2} is defined by

||A||_{p,2} ≜ ||(||α_1||_2, . . ., ||α_n||_2)||_p,

where α_i = [α_{i,1}, . . ., α_{i,m}]^T denotes the i-th column of the matrix A ∈ R^(m×n). In recent years, much research has been devoted to the group sparse optimization problem (1), aiming to improve efficiency and accuracy (e.g., see the survey paper [16] and references therein). Once all group sparse codes A are obtained, the latent clean image X can be reconstructed as X = DA by a standard approach (see Theorems 1 and 2 in [19]).
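The group norm above can be sketched in a few lines of NumPy (an illustrative helper, not the paper's code):

```python
import numpy as np

def group_norm(A, p):
    """Group norm ||A||_{p,2}: the l_p norm of the vector of
    column-wise l_2 norms of A."""
    col_norms = np.linalg.norm(A, axis=0)          # ||alpha_i||_2 for each column
    return np.sum(col_norms ** p) ** (1.0 / p)     # l_p norm of that vector

A = np.array([[3.0, 0.0],
              [4.0, 0.0]])
print(group_norm(A, 1))  # columns have norms 5 and 0 -> 5.0
```

Note that p = 2 recovers the Frobenius norm, consistent with the gauge-function view.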
The main contributions of this paper are as follows: 1. We unify the group-based sparse coding in [20] and the Schatten-p norm minimization problem in [21] by proving their mathematical equivalence. 2. We construct the proximal operator for sparse optimization in ℓp space with p ∈ (0, 1] using fixed-point iteration, obtaining a new solution of the Schatten-p norm minimization problem. 3. We analyze the suitable setting of the power p for each noise level. 4. The proposed Spatially Adaptive Fixed Point Iteration (SAFPI) algorithm attains the best denoising performance in terms of PSNR and SSIM and is able to retain image structure information, outperforming many state-of-the-art denoising methods such as BM3D, WNNM and WSNM.
The rest of the paper is organized as follows. In Section 2.1, we prove the equivalence of group-based sparse coding and the Schatten-p norm minimization problem and propose a new solution to the latter. A fixed-point iteration for solving sparse optimization in ℓp space with p ∈ (0, 1] is formulated and discussed. In Section 2.2, we establish an image denoising scheme using nonlocal self-similarity and Schatten-p norm minimization. In Section 3, based on the newly developed Spatially Adaptive Fixed Point Iteration (SAFPI) algorithm, we present experimental results on a set of standard benchmark images. A comparison with several existing methods is also provided to demonstrate our improvement. Finally, the paper ends with concluding remarks.

Proximal operator for Schatten-p norm minimization
The square roots of the eigenvalues of Y^T Y are called the singular values of Y, denoted by σ_1(Y), . . ., σ_min{m,n}(Y) in decreasing order (see page 246 of [22]). Let r = rank(Y); then σ_{r+1}(Y) = · · · = σ_min{m,n}(Y) = 0. The matrix Y also has the Singular Value Decomposition (SVD) Y = UΣV^T, where U ∈ O(m), V ∈ O(n) (O denotes the set of orthogonal matrices) and Σ is an m×n diagonal matrix with diagonal entries σ_1(Y), . . ., σ_min{m,n}(Y). We introduce the Schatten-p norm of Y, defined as

||Y||_{S_p} = ( Σ_{i=1}^{min{m,n}} σ_i(Y)^p )^(1/p).

Special cases of the Schatten-p norm include the nuclear norm (p = 1) and the Frobenius norm (p = 2); for 0 < p < 1 it is a quasi-norm. Next we analyze the relationship between group-based sparse coding and the Schatten-p norm minimization problem, which improves Theorem 2 in [23]. Our approach is based on a "symmetry" technique (similar to [17], used there for another purpose), which is essentially different from [23].
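The Schatten-p norm can be computed directly from the singular values; a minimal NumPy sketch (not the paper's code):

```python
import numpy as np

def schatten_norm(Y, p):
    """Schatten-p (quasi-)norm: l_p norm of the singular values of Y.
    p = 1 gives the nuclear norm, p = 2 the Frobenius norm."""
    s = np.linalg.svd(Y, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

Y = np.diag([3.0, 4.0])          # singular values are 4 and 3
print(schatten_norm(Y, 1))       # nuclear norm: 7.0
print(schatten_norm(Y, 2))       # Frobenius norm: 5.0
```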
Theorem 1 The group-based sparse coding problem (1) is equivalent to a Schatten-p norm minimization problem.
Eqs (12), (13) and (14) imply that any operation designed for the sparse coefficient vectors α can be conveniently implemented on the singular values of X (differing only by a constant scalar).
The Schatten-p norm (0 < p ≤ 1) has been widely used to replace the nuclear norm for better approximation of the rank function. There is an extensive body of work on the Schatten-p norm optimization problem (14) in the literature [24,25]. Note that the main difference between group sparse coding and the Schatten-p norm minimization problem is that group sparse coding involves a dictionary learning operator while the Schatten-p norm minimization problem does not.
2.1.2 Computation of proximal mapping using fixed point iterative method. Now let us recall the definition of proximal mapping.

Definition 2
The proximal mapping of a function f : R^(m×n) → R is

Prox_f(Y) = arg min_X { f(X) + (1/2)||X − Y||_F² }.    (2)

In particular, the proximal mapping of λ||Y||_{S_p}^p is defined as

Prox_{λ||·||_{S_p}^p}(Y) = arg min_X { λ||X||_{S_p}^p + (1/2)||X − Y||_F² }.    (3)

And we have the following celebrated theorem:

Theorem 3 [Theorem 1 of [26]] If the matrix Y ∈ R^(m×n) has the Singular Value Decomposition (SVD) Y = UΣV^T, where U ∈ O(m), V ∈ O(n) and Σ is an m × n diagonal matrix with diagonal entries σ_1(Y), . . ., σ_min{m,n}(Y), then the matrix proximal mapping in Eq (3) satisfies

Prox_{λ||·||_{S_p}^p}(Y) = U diag{σ_1(X̂), . . ., σ_min{m,n}(X̂)} V^T,    (4)

where each σ_i(X̂) is obtained from the scalar proximal mapping in (2): σ_i(X̂) = Prox_{λ|·|^p}(σ_i(Y)). To make our proposed approach to solving Eq (4) transparent, we next recall two important concepts in convex optimization.
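As a concrete instance of Theorem 3 (an illustrative NumPy sketch, not the paper's code): for p = 1 the scalar proximal mapping has the closed form max(σ − λ, 0), so the matrix prox reduces to singular-value soft-thresholding.

```python
import numpy as np

def prox_nuclear(Y, lam):
    """Proximal mapping of lam*||.||_{S_1} (nuclear norm), computed as in
    Theorem 3: apply the scalar prox (soft-thresholding) to the singular
    values, then rebuild the matrix from the SVD factors."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)     # scalar soft-threshold
    return U @ np.diag(s_shrunk) @ Vt

Y = np.diag([5.0, 1.0])
X = prox_nuclear(Y, 2.0)
print(np.round(X, 6))   # singular values shrink to 3 and 0
```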
Definition 4 (see Chapter 2, p. 82 of [27]) Let R^n be paired with itself by a bilinear functional (inner product) ⟨·,·⟩ and let f : R^n → R be any extended real-valued function on R^n. Then the function f* on R^n defined by

f*(y) = sup_x { ⟨x, y⟩ − f(x) },  y ∈ R^n,

is called the Fenchel conjugate of f (with respect to the given pairing). Note that f* is always a closed convex function, regardless of the structure of f.

Definition 5 Given a proper convex function f : R^n → (−∞, +∞], the subdifferential of f is the (generally multivalued) mapping ∂f : R^n → R^(n*) defined by

∂f(x) = { x* ∈ R^(n*) | f(z) ≥ f(x) + ⟨x*, z − x⟩ for all z ∈ R^n }.

The elements x* ∈ ∂f(x) are called subgradients of f at x. The same definition works for nonconvex f (although subgradients need not exist).
A point x 2 R n is a minimizer of a function f (not necessarily convex) over R n if and only if f is subdifferentiable at x and 0 2 @f(x).

Lemma 6
The scalar proximal problem

Prox_{λ|·|^p}(x) = arg min_w { (1/2)(w − x)² + λ|w|^p }

is well defined (the objective is closed and coercive), but it has no closed-form solution for general p.

Definition 7
Given a function g : [a, b] → R, find ξ ∈ [a, b] such that ξ = g(ξ). If such a ξ exists, it is called a fixed point of g, and it can be computed by the iteration ξ^(n) = g(ξ^(n−1)), n ≥ 1. The map g is said to be a contraction on [a, b] if there exists a constant L with 0 < L < 1 such that |g(x) − g(y)| ≤ L|x − y| for all x, y ∈ [a, b].
Theorem 8 Let τ_{λ,p} = max{…} denote the thresholding value (the explicit maximization is given in the proof). Then for 0 < p < 1 and any x with |x| ≥ τ_{λ,p}, the magnitude of Prox_{λ|·|^p}(x) is the fixed point of the contraction

w ↦ |x| − λp · w / |w|^(2−p),

and the prox carries the sign of x.
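The fixed-point iteration of Theorem 8 can be sketched as follows (a minimal sketch; the iteration count, starting point and stopping behavior are our assumptions, not the paper's exact settings):

```python
import numpy as np

def prox_lp_scalar(x, lam, p, iters=100):
    """Fixed-point iteration for the scalar proximal mapping of lam*|.|^p,
    p in (0, 1): iterate w <- |x| - lam*p*w**(p-1), starting from w = |x|.
    Returns 0 when the iteration collapses (|x| below the threshold)."""
    w = abs(x)
    if w == 0:
        return 0.0
    for _ in range(iters):
        w_new = abs(x) - lam * p * w ** (p - 1.0)
        if w_new <= 0:          # below threshold: prox is 0
            return 0.0
        w = w_new
    return np.sign(x) * w

# The fixed point satisfies the stationarity condition
# w - |x| + lam*p*w**(p-1) = 0 of (1/2)(w-|x|)**2 + lam*w**p.
w = prox_lp_scalar(4.0, 0.5, 0.8)
print(w)                        # close to, but less than, 4.0
```

The iteration converges very quickly here because the contraction factor λp(1−p)w^(p−2) is small for moderate w.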

Spatially Adaptive Fixed Point Iteration (SAFPI) denoising algorithm
In [20,21], the authors proposed a group sparse representation framework and a Schatten-p norm minimization framework for image denoising. In Theorem 1, we showed that these two approaches are equivalent. Combining Theorem 3 and Theorem 8, we obtain a fixed-point iteration solution of Eq (14) in Theorem 1, which is more rigorous than [20,21].
After grouping a set of similar patches Y = [y_1, y_2, . . ., y_n] ∈ R^(m×n), the denoising problem becomes the recovery of x_i from y_i. As shown in Theorem 1, the Schatten-p norm minimization problem (14) converts the denoising problem into recovering the low-rank matrix X from the full-rank noisy matrix Y, thereby filtering out the noise in the structure group. The second identity in Eq (14) can be solved using Theorem 3 and Theorem 8.
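The grouping step can be sketched as follows (illustrative block matching; the patch size, stride and group size are assumed values, not the paper's settings):

```python
import numpy as np

def group_similar_patches(img, ref_xy, size=8, stride=4, n_similar=16):
    """Group the n_similar patches closest (in Euclidean distance) to the
    reference patch at ref_xy into the columns of a matrix Y in R^{m x n}."""
    ry, rx = ref_xy
    ref = img[ry:ry + size, rx:rx + size].ravel()
    candidates = []
    for y in range(0, img.shape[0] - size + 1, stride):
        for x in range(0, img.shape[1] - size + 1, stride):
            patch = img[y:y + size, x:x + size].ravel()
            candidates.append((np.sum((patch - ref) ** 2), patch))
    candidates.sort(key=lambda t: t[0])           # most similar first
    cols = [p for _, p in candidates[:n_similar]]
    return np.stack(cols, axis=1)                 # m x n patch matrix

img = np.random.default_rng(0).normal(size=(32, 32))
Y = group_similar_patches(img, (8, 8))
print(Y.shape)   # (64, 16)
```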
Wavelet-based image denoising assumes that the wavelet coefficients obey a Laplace distribution, and a thresholding method is used to filter the noise in the image. The prior distribution of the singular values of the patch matrix can also be approximated by a Laplace distribution. The parameter λ for each group, which balances the fidelity term and the regularization term, should be adaptively determined for better denoising performance. Using the spatially adaptive Laplacian prior as in [17,31], the threshold parameter can be set as

λ_i = 2√2 σ_n² / (σ_i + ε),    (5)

where σ_n is the noise standard deviation, σ_i denotes the locally estimated standard deviation at position i, and ε is a small constant avoiding division by zero. The second identity in Eq (14) then becomes a weighted singular-value proximal problem. From Eq (4), if Y has the singular value decomposition Y = USV^T, we have X̂ = UŜV^T, where Ŝ = diag{ε̂_1, . . ., ε̂_min{m,n}} and each ε̂_i can be computed using Theorem 3 and Theorem 8.
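A sketch of the per-group shrinkage with the adaptive threshold above, assuming p = 1 so that the scalar prox is plain soft-thresholding (the function name, parameter defaults and local-variance estimate are illustrative assumptions):

```python
import numpy as np

def denoise_group(Y, sigma_n, eps=1e-8):
    """Shrink the singular values of a patch-group matrix Y with the
    adaptive threshold lambda_i = 2*sqrt(2)*sigma_n**2 / (sigma_i + eps).
    p = 1 is assumed so the scalar prox is soft-thresholding."""
    U, chi, Vt = np.linalg.svd(Y, full_matrices=False)
    n = Y.shape[1]
    # local signal standard deviation estimated from each singular value
    sigma_i = np.sqrt(np.maximum(chi ** 2 / n - sigma_n ** 2, 0.0))
    lam = 2.0 * np.sqrt(2.0) * sigma_n ** 2 / (sigma_i + eps)
    chi_hat = np.maximum(chi - lam, 0.0)       # scalar prox for p = 1
    return U @ np.diag(chi_hat) @ Vt

rng = np.random.default_rng(1)
clean = np.outer(rng.normal(size=16), rng.normal(size=8))   # rank-1 group
noisy = clean + 0.1 * rng.normal(size=clean.shape)
X = denoise_group(noisy, 0.1)
print(np.linalg.norm(noisy - clean), np.linalg.norm(X - clean))
```

Small singular values (mostly noise) receive large thresholds and are suppressed, while the dominant signal components are barely shrunk.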
Some recently developed iterative regularization techniques in [17] offer an alternative approach to spatial adaptation. The basic idea of iterative regularization is to add filtered noise back to the denoised image, i.e.,

y^(k+1) = x̂^(k) + δ(y − x̂^(k)),    (6)

where k denotes the iteration number and δ is a relaxation parameter. We can execute the above denoising procedure for several iterations to obtain better results. In the (k+1)-th iteration, the iterative regularization strategy in [17] is used to update the estimate of the noise variance, so the standard deviation of the noise in the (k+1)-th iteration is adjusted as

σ̂_n^(k+1) = γ √( σ_0² − ||y − y^(k)||² ),

where γ is a scaling factor controlling the re-estimation of the noise variance, and the locally estimated standard deviation at the i-th position is

σ̂_i^(k+1) = √( max( χ_i²/n − (σ̂_n^(k+1))², 0 ) ),

where χ_i is the i-th singular value of the current patch group of y.
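The iterative regularization loop of Eq (6) can be sketched as follows (the toy denoiser, the initial noise estimate, and the values of δ and γ are placeholders, not the paper's settings; the residual norm is averaged per pixel here, which is an assumption):

```python
import numpy as np

def iterate_regularization(y, denoise, iters=3, delta=0.1, gamma=0.6):
    """Iterative regularization (Eq 6): add filtered noise back to the
    denoised image each round. `denoise(img, sigma)` is any denoiser."""
    sigma0 = 0.1 * np.std(y)            # assumed initial noise estimate
    y_k = y.copy()
    x_hat = y.copy()
    for _ in range(iters):
        # re-estimate the remaining noise level (guarded against negatives)
        sigma_k = gamma * np.sqrt(max(sigma0 ** 2 -
                                      np.mean((y - y_k) ** 2), 0.0))
        x_hat = denoise(y_k, sigma_k)
        y_k = x_hat + delta * (y - x_hat)   # add filtered noise back
    return x_hat

# toy denoiser: simple shrinkage toward the mean, just to exercise the loop
toy = lambda img, s: img.mean() + 0.9 * (img - img.mean())
rng = np.random.default_rng(2)
y_noisy = rng.normal(size=(16, 16))
out = iterate_regularization(y_noisy, toy)
print(out.shape)   # (16, 16)
```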
The higher the structural similarity of the blocks in a structure group, the more correlated the column vectors of the block matrix will be, which means the matrix has a low-rank property corresponding to the noise-free matrix. The information is mainly concentrated in the largest singular values. During the proximal operation, selecting an appropriate threshold parameter for the larger singular values keeps the processed singular values closer to the noise-free ones, which preserves the useful information in the image while filtering out the noise.
Therefore, choosing blocks with more similar structure helps improve the denoising result. There are many commonly used similarity measures, such as Euclidean distance, cosine angle, and correlation coefficient. Traditional block similarity measures have shortcomings, however. Euclidean distance simply computes the differences between the pixel gray values of the blocks and adds them up as a measure of similarity. Although this method is simple and easy to implement, it treats the blocks as collections of isolated pixels and neglects the statistical dependence between local pixels, which leads to inaccurate similarity measurement. This is because the blocks do not live in a Euclidean space: there is a very strong correlation between the pixels within a block, and this local pixel correlation carries important structural information.
To address this problem, the Structural SIMilarity (SSIM) index [32,33] is often used to evaluate image quality. SSIM is defined as

SSIM(X, X̂) = ( (2μ_X μ_X̂ + c_1)(2σ_{XX̂} + c_2) ) / ( (μ_X² + μ_X̂² + c_1)(σ_X² + σ_X̂² + c_2) ),

where μ_X, μ_X̂, σ_X², σ_X̂², and σ_{XX̂} denote the mean of X, the mean of X̂, the variance of X, the variance of X̂, and the covariance of X and X̂, respectively. c_1 and c_2 are two constants that stabilize the division when the denominator is weak.
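A simplified single-window version of SSIM can be sketched directly from the formula (real SSIM averages the quantity over local windows; the constants c1 = (0.01 L)² and c2 = (0.03 L)² follow a common convention and are an assumption here):

```python
import numpy as np

def ssim_global(x, x_hat, data_range=1.0):
    """Global (single-window) SSIM as defined above: means, variances and
    covariance are computed over the whole image."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x, var_y = x.var(), x_hat.var()
    cov = ((x - mu_x) * (x_hat - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

a = np.random.default_rng(3).random((32, 32))
print(ssim_global(a, a))   # identical images give SSIM = 1.0
```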
A detailed step-by-step description of the Spatially Adaptive Fixed Point Iteration (SAFPI) denoising algorithm is given by Algorithm 1:

Input: noisy image y.
For k = 1, 2, . . ., K:
1. Iterative regularization: compute y^(k) from x̂^(k−1) via Eq (6).
2. Divide y^(k) into overlapping patches.
3. For each patch, group its similar patches into a noisy data matrix Y_i.
4. SVD of each noisy data matrix Y_i.
5. Compute the adaptive threshold λ_i for each group.
6. Application of the proximal operator: update the ε̂_i values by fixed-point iteration with the computed λ_i and singular values from steps 5 and 4.
7. Reconstruct each denoised group X̂_i = UŜV^T.
8. Image update: obtain an improved denoised image x̂^(k) by weighted averaging of all denoised patches X̂_i.
Output: x̂^(k).
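The pipeline above can be condensed into a toy sketch (all parameter values are illustrative assumptions; a simplified fixed threshold replaces the spatially adaptive λ_i of step 5, and the paper's full algorithm is more elaborate):

```python
import numpy as np

def safpi_sketch(y, sigma_n, iters=2, delta=0.1, size=4, n_sim=8, p=0.8):
    """Condensed SAFPI-style loop: iterative regularization, block
    matching, SVD, fixed-point lp shrinkage of singular values, and
    weighted-average aggregation."""
    def prox_lp(chi, lam):
        w = chi
        if w <= 0:
            return 0.0
        for _ in range(50):                       # Theorem 8 fixed point
            w_new = chi - lam * p * w ** (p - 1.0)
            if w_new <= 0:
                return 0.0
            w = w_new
        return w

    x_hat = y.copy()
    H, W = y.shape
    for _ in range(iters):
        yk = x_hat + delta * (y - x_hat)          # Eq (6)
        acc = np.zeros_like(y)
        wts = np.zeros_like(y)
        coords = [(r, c) for r in range(0, H - size + 1, size)
                  for c in range(0, W - size + 1, size)]
        patches = {rc: yk[rc[0]:rc[0] + size, rc[1]:rc[1] + size].ravel()
                   for rc in coords}
        for rc in coords:
            # group the n_sim most similar patches (block matching)
            order = sorted(coords, key=lambda q: np.sum(
                (patches[q] - patches[rc]) ** 2))[:n_sim]
            Y = np.stack([patches[q] for q in order], axis=1)
            U, chi, Vt = np.linalg.svd(Y, full_matrices=False)
            lam = 2 * np.sqrt(2) * sigma_n ** 2   # simplified fixed threshold
            chi_hat = np.array([prox_lp(c, lam) for c in chi])
            X = U @ np.diag(chi_hat) @ Vt
            for j, q in enumerate(order):         # aggregate denoised patches
                acc[q[0]:q[0] + size, q[1]:q[1] + size] += \
                    X[:, j].reshape(size, size)
                wts[q[0]:q[0] + size, q[1]:q[1] + size] += 1
        x_hat = acc / np.maximum(wts, 1)
    return x_hat

rng = np.random.default_rng(4)
clean = np.tile(np.linspace(0, 1, 16), (16, 1))   # smooth test image
noisy = clean + 0.1 * rng.normal(size=clean.shape)
out = safpi_sketch(noisy, 0.1)
print(out.shape)   # (16, 16)
```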

Results and discussion
In recent years many denoising algorithms have been developed, and adaptive image noise removal algorithms [34][35][36] are a hot trend in signal and image denoising. To demonstrate the effectiveness of the proposed denoising algorithm, in this section we compare its denoising performance with recently proposed state-of-the-art methods: BM3D [37], WNNM [38], WSNM [21], Expected Patch Log-likelihood (EPLL) [39], Spatially Adaptive Iterative Singular-value Thresholding (SAIST) [17], Patch-Based Near-Optimal image denoising (PBNO) [40], Global Image Denoising (GID) [41], an iterative denoising system based on Wiener filtering (WIENER) [34], and the Linear Complex Diffusion Process (LCDP) [35]. We used well-known images common in the literature, e.g., [17,21,38,42], added noise to them, and tested the proposed denoising algorithm with different powers p under different noise levels. The experimental images are shown in Fig 1. Several image quality indicators measure the success of denoising, such as kurtosis and the signal-to-noise ratio (SNR). Low kurtosis indicates superior performance; it is defined through the cumulants as κ = c_4/c_2², where c_k is the k-th cumulant. In our work, we evaluated performance with three criteria: the Structural Similarity index (SSIM), kurtosis, and the Peak Signal-to-Noise Ratio (PSNR), defined as 10 log_10(M²/MSE), where M denotes the maximum intensity of the underlying image and MSE is the mean squared error between the denoised image X̂ and the noiseless image X. All experiments were carried out in Matlab (R2016a) on a PC with an Intel(R) Xeon(R) CPU E5−1630 V4 @ 3.7GHz and 32GB RAM.
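PSNR as defined above, sketched in NumPy (the 8-bit maximum intensity M = 255 is an assumption):

```python
import numpy as np

def psnr(x_hat, x, max_intensity=255.0):
    """PSNR = 10*log10(M^2 / MSE) between a denoised image x_hat and the
    clean image x, with M the maximum intensity (255 for 8-bit images)."""
    mse = np.mean((x_hat.astype(float) - x.astype(float)) ** 2)
    return 10.0 * np.log10(max_intensity ** 2 / mse)

x = np.zeros((8, 8))
x_hat = x + 1.0          # MSE = 1
print(psnr(x_hat, x))    # 10*log10(255^2) ~ 48.13 dB
```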

Analysis of over-shrinkage problem and optimal power p
First, we noticed that not all values of the power p work well in the proposed Spatially Adaptive Fixed Point Iteration (SAFPI) algorithm. When the value of p is unsuitable, the solved singular values deviate from the true ones and the algorithm produces excessive contraction. As shown in Fig 2, we tested SAFPI on a low-rank approximation of the red patch in Fig 2B with noise level σ = 50, randomly marked from "Monarch" in Fig 2A. In Fig 2C, {σ_i^(p)} denotes the singular values of the denoised similar patches for different powers p. The ground-truth line is the connection line of the singular values of the similar blocks of the noiseless red patch in Fig 2A. We can see that {σ_i^(0.8)} (shown in green) is closest to the ground-truth line, which means the other {σ_i^(p)} curves suffer from a serious over-shrinkage problem. In this case, setting p = 1 as in WNNM leads to poor denoising results. The advantage of the SAFPI algorithm is that it overcomes the over-shrinkage problem, provided we can find the optimal value of the power p.
Second, to find the optimal values of p under different noise levels for the SAFPI algorithm, we randomly chose 10 test images from Fig 1 and set the power p from 0.05 to 1 with an interval of 0.05. The zero-mean additive white Gaussian noise levels were set to σ_n = {20, 30, 50, 60, 75, 100}, and the other parameters were the same as in WSNM [21]. The results are shown in Fig 3, where the horizontal axis denotes the value of p and the vertical axis the average PSNR under a given noise level. The red dots mark the optimal points for each noise level.
We can see that the best values of the power p are 1.0, 0.90, 0.85 and 0.6 when the noise levels are low or medium (20, 30, 50 and 60, respectively). When handling very high noise levels (75 and 100), the average PSNR first decreases and then increases, and the best values of p are 0.95 and 0.9, respectively. In summary, the optimal value of p is inversely proportional to the noise level except at high noise levels. We then applied these best empirical values in the subsequent experiments; detailed results are listed in Tables 1, 2, 3, 4, 5 and 6. It can be seen from Table 7 that our algorithm always obtains the best average PSNR values under different noise levels. The proposed approach achieves 0.3dB to 0.51dB average improvement over BM3D when the noise levels are between 20 and 100, and 0.02dB, 0.06dB and 0.14dB average improvement over WSNM when the noise levels are 30, 50 and 100, respectively. Our average SSIM values are the best when the noise levels are 20, 30, 50 and 60. In summary, for every given low and medium noise level, our algorithm attains the best denoising performance in both SSIM and PSNR, which leads to better image denoising and high robustness to noise strength in comparison with several existing denoising algorithms.

Performance comparison with different methods
For visual quality, comparative images are shown in Figs 4, 5, 6 and 7. As shown in Fig 4, our algorithm recovered the structure of the ear (magnified in the highlighted red window) better than the other algorithms. When the noise level is very high, as shown in the zoom-in window in Fig 7, our algorithm reconstructs clear texture structures, while the competing methods produce blurrier textures. Other visual improvements can be seen in Figs 5 and 6. Sometimes the noise variation within a single image is large, differing from one part of the image to another. To demonstrate our method, we randomly selected two small pieces from the given image; although their local noise levels differ, our algorithm always produces the best visual texture. We conclude that the proposed SAFPI algorithm delivers excellent denoising performance, producing good visual quality and rebuilding textures well.
If the noise is non-Gaussian, one popular approach is to transform it into a more tractable Gaussian model, e.g., via the generalized Anscombe transformation (GAT) [43,44]. In this paper we also deal with non-Gaussian noise using the proposed algorithms. In the first experiment, we assumed the noise was a mix of Gaussian noise (σ_n = 20, 50, 100) and speckle noise (density d = 1 × 10^−3), and used the SAFPI algorithm directly to remove it. We randomly selected six images (Lena, Monarch, Barbara, Cameraman, House, Peppers) from Fig 1 for experimental verification and compared against the denoising algorithms mentioned in the previous experiments. The results are shown in Table 8. In the second experiment, we assumed mixed Poisson-Gaussian noise. We transformed the Poisson-Gaussian hybrid noise into approximately Gaussian noise using the GAT [43] and obtained the restored images using the proposed denoising algorithm followed by the exact unbiased inverse GAT [44]. We used Lena and C.man as test images with eight peak values (1, 2, 5, 10, 20, 30, 60, 120); the Gaussian component of the Poisson-Gaussian noise was set to σ = peak/10. We compared with BM3D, SAIST and WNNM and recorded the average PSNR and kurtosis of the two images. The results are shown in Table 9; bold numbers represent the best values of each evaluation index. From Table 8, we can see that when the standard deviation is not large (σ_n = 20, 50), our algorithm achieved the best values of almost all three quality indicators, and it obtained the best PSNR values in all hybrid-noise experiments. From Table 9, the SAFPI algorithm attains the highest average PSNR under almost all peak values. In most cases, our algorithm obtained relatively good kurtosis metrics and the optimal PSNR value, which is 0.2 to 0.3dB higher than BM3D and about 0.1dB to 1.11dB higher than WNNM.
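The forward GAT used in the second experiment has a standard closed form; a sketch (the gain convention and zero pedestal are assumptions, and the exact unbiased inverse of [44] is not reproduced here):

```python
import numpy as np

def gat(x, sigma=0.0, gain=1.0):
    """Generalized Anscombe transform: approximately Gaussianizes
    Poisson-Gaussian data with unit target variance. For pure Poisson
    data (sigma = 0, gain = 1) it reduces to the classical Anscombe
    transform 2*sqrt(x + 3/8)."""
    arg = gain * x + (3.0 / 8.0) * gain ** 2 + sigma ** 2
    return (2.0 / gain) * np.sqrt(np.maximum(arg, 0.0))

rng = np.random.default_rng(5)
peak = 20.0
poisson = rng.poisson(peak, size=10000).astype(float)
stabilized = gat(poisson)
print(round(stabilized.std(), 1))   # close to 1 after stabilization
```

After denoising in the stabilized domain, the exact unbiased inverse GAT maps the estimate back to the original intensity range.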
Finally, we briefly discuss the complexity of the SAFPI algorithm. Assume each patch is of size A × A, where A is the side length of each block, and K is the number of similar patches in each structure group y_i. Computing the SVD (step 4 of Algorithm 1) needs O(min(A²·K², A⁴·K)) flops in each iteration, and computing the singular values in step 6 costs O(K). Since the image y^(k+1) is divided into N blocks in step 2, the total cost is i · N · O(min(A²·K², A⁴·K) + K) flops, where i is the number of iterations in Algorithm 1. We also recorded the execution times that several denoising algorithms spent on the above experiments with white Gaussian noise of standard deviation σ_n = 20: SAFPI 4843.339s, WSNM 5453.311s, WNNM 4410.991s, SAIST 923.9837s, BM3D 17.6242s and EPLL 1550.607s. Our algorithm does not take much longer while delivering the best denoising results.

Conclusions
In this paper, a fixed-point iteration scheme was developed for sparse optimization in ℓ_p space with p ∈ (0, 1] using the proximal operator. We showed that group sparse coding is equivalent to the Schatten-p norm minimization problem, so the sparse coefficients of each group can be measured by estimating the singular values of each group. Analyzing the optimal value of the power p, we found that it is related to the noise level: as the noise level increases, the optimal value of p gradually decreases, but when the noise reaches a high level, the optimal value of p is again close to 1. The developed SAFPI algorithm obtains higher PSNR indices and retains promising texture structure and visual quality, leading to better image denoising than competing algorithms. There are several future research directions. We are exploring other non-convex optimization strategies for more effective convergence and further improvement. Convolutional neural network (CNN)-based denoising methods are becoming increasingly popular, and we will also investigate CNN architectures for image denoising in the future.