Image denoising via a non-local patch graph total variation

Total variation (TV) based models are very popular in image denoising but suffer from some drawbacks. For example, local TV methods often cannot preserve edges and textures well when they face excessive smoothing. Non-local TV methods constitute an alternative, but their computational cost is huge. To overcome these issues, we propose an image denoising method named non-local patch graph total variation (NPGTV). Its main originality stands for the graph total variation method, which combines the total variation with graph signal processing. Schematically, we first construct a K-nearest graph from the original image using a non-local patch-based method. Then the model is solved with the Douglas-Rachford Splitting algorithm. By doing so, the image details can be well preserved while being denoised. Experiments conducted on several standard natural images illustrate the effectiveness of our method when compared to some other state-of-the-art denoising methods like classical total variation, non-local means filter (NLM), non-local graph based transform (NLGBT), adaptive graph-based total variation (AGTV).


Introduction
Image denoising is one of the most fundamental and widely studied problems in low-level image processing. Its main purpose is to reduce undesirable distortions and noise present in images while preserving important features such as discontinuities, edges, corners and textures. Image denoising can be described by the following model [1]: where u is the original image, u 0 is the observed image and z is assumed to be an additive white Gaussian noise. Image denoising is an ill-posed problem and requires an appropriate regularization to restore the original image [2]. Solutions proposed so far can be divided into two categories [3]. The first one makes use of some prior knowledge about images such as image smoothness or sparsity of image coefficients in certain transformed domains (e.g., DFT or Wavelet). The PLOS ONE | https://doi.org/10.1371/journal.pone.0226067 December 12, 2019 1 / 16 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 second one relies on existing self-similarity in images as, for example, bilateral filter and nonlocal means (NLM) filter [4]. The solution we propose in this paper belongs to the first class of method where the denoising problem is expressed as: where ku 0 À uk 2 2 is a fidelity term which requires that the desired image u approaches to the observation image u 0 while the prior(u) is a term representing the prior knowledge we have on the original image u and μ is a weighting parameter which determines the tradeoff between prior knowledge and the fidelity of the observation.
Total variation (TV) algorithm [5] and its improved versions [6] are among the most popular prior knowledge-based methods due to their efficiency for image denoising. They also constitute powerful tools for multiscale image analysis [7]. Most total variation-based image denoising methods consider the original image as a continuous function defined on < 2 , and evaluate the noise through the integration of the absolute gradient of the observed function. Under this hypothesis, it is then natural to take the image as a smooth function over a discrete sampling structure [8]. Reducing the total variation indicates that the unwanted details have been removed. However, reducing the gradient may not be sufficient in the case of images with noise or with many details.
Recently, image denoising has also been studied from the point of view of graph signal processing (GSP) [9]. Kheradmand and Milanfar [3] proposed a general graph-based regularization framework which was later modified by Pang et al. [10] who used patch gradients instead of patch intensities to define the patch self-similarity. Mahmood et al. [11] proposed an adaptive graph total variation (AGTV) for tomographic reconstruction. Though these methods provide good experimental performance, some problems remain: (i) there is neither theoretical justification nor intuitive interpretation of the relationship between the graph structure derived from the image and the image denoising performance; (ii) these algorithms involve complicated mathematical construction and large calculations.
In addition, one of the key issues in performing graph signal processing (i.e. denoising in our case) concerns the selection of edge weights [12]. Indeed, these weights have a significant effect on the amount of noise removal. In [13], Smolka et al. compute the weights using a Gibbs distribution of the intensities for the adjacent pixels. Black et al. [14] derived these weights via robust statistics. A more common but less robust approach exploits a Gaussian kernel function where the weights are calculated only from two isolated pixels based on their intensities and location information [15]. A more reliable idea is to consider the pixel neighborhood due to the high degree of redundancy in natural images [6]. In such a way Buades et al. [4] proposed to use a windowed non-local means filter to characterize one pixel instead of only using the pixel itself. Some graph-signal based image denoising methods also borrow the image patch thought to construct the graph, the most typical scheme being AGTV. However, they only take the image patch intensity into consideration and ignore the location information of the patch. Thus, image spatial information has not been utilized.
To overcome the above problems, we propose a non-local patch graph total variation (NPGTV) as a novel method for natural image denoising. Our method can be seen as an improved version of AGTV by considering the pixel coordinate as an ingredient to construct the K nearest neighbor graph (KNN graph). More clearly, both the image patch intensity and patch location information are taken into account. By doing so, image details can be preserved at a greatest extent. In addition, in this paper, we also analyze the impact of the patch size and of the K value of the KNN graph on the denoising performance. It is important to notice that AGTV reconstructs and utilizes graph total variation, repeatedly, our method merely constructs a non-local patch graph and use GTV model once. In this way, time cost is significantly reduced. As we will see in the sequel, our proposed method can achieve better performance compared to some recent and efficient non-local based denoising methods and total variation based denoising methods.
The rest of this paper is organized as follows. Some basic knowledge about graph signal processing is reviewed in Section 2. Section 3 presents our NPGTV method and details its implementation. In Section 4, we experimentally illustrate its effectiveness and compare it to several state-of-the-art denoising methods. Some future works are sketched in Section 5.

Graph signal and weighted graph
A dataset consisting of N elements with a known relationship between these elements can be represented by a graph G = {V, A, W}, where V, A and W stand for the nodes set, the adjacency matrix and the weight matrix of graph G, respectively. Each element in set V corresponds to a node in graph G while each W i,j in W reflects the degree of relationship in-between the two nodes v i and v j . In the general case, G can have directed or undirected edges (i.e. arcs) and W i,j can take arbitrary real or complex values. For a denoising image, the direction of edge is meaningless and, considering one node v i and nodes connected to it, one can construct the neighborhood of v i as N i = {j|W i,j 6 ¼ 0}.
The graph signal of G is defined as a map from the nodes of G into the real number set <, where f is a real value function and f(v i ) is the graph signal on vertex v i . A graph signal can also be represented as a vector f 2 < n . In the image denoising problem, the signal at each node (i.e. pixel) corresponds to the image intensity.

Signal smoothness with respect to the intrinsic structure of graph
As is stated above, image denoising is an ill-posed problem, and thus prior knowledge about the sought image is required for the regularization. When an image is represented in the form of graph signal, the attribute that can describe graph signal should be chosen as the prior knowledge accordingly. In this paper, we take the graph signal smoothness as prior knowledge.
Smoothness is one of the most important properties of graph signals and requires taking into consideration the intrinsic structure of the data domain. Here, the intrinsic structure refers to the weighted graph onto a sampled manifold [16]. For analyzing continuous signals on differentiable manifolds, discrete calculus provides the right tools to operate [17]. Discrete differential operators defined on a graph have been widely explored in the literature. Herein, we only examine some important concepts and definitions [18] in order to derive an accurate mathematical description of the smoothness of signal on a graph. The edge derivative of a signal f along the edge e = (i, j) connecting the nodes i and j is defined as: ½f ðjÞ À f ðiÞ�: ð4Þ Based on the Eq (4), the graph gradient of f at node i is the vector: The local variation at node i can be defined as: This formula provides a measure of local smoothness of graph signal f around node p. The global smoothness using the discrete p-Dirichlet form of f is then defined as:

AGTV algorithm
AGTV algorithm was proposed for tomographic data reconstruction. The whole algorithm can be divided into five steps: (i) Project input data into the sinogram space to obtain a filtered back projection (FBP). (ii) Construct a patch graph from the FBP. (iii) Formulate an objective function that takes graph total variation and adjoint operator of the wavelet transform as regular terms. (iv) Solve the objective function with the forward-backward primal dual (FBPD) algorithm. (v) Repeat step (ii) to (iv) until convergence. Although the AGTV algorithm can perform well on the tomographic data denoising, there still exist some shortages that hinder it from being applicable to the natural image denoising problem. First, only image intensity is taken into consideration during the graph construction while ignoring the patch location information. Second, the patch graph needs to be constructed repeatedly in the whole algorithm, which will lead to the tedious hyperparameter tuning problem. How to overcome both deficiencies and construct an effective GTV-based image denoising algorithm becomes our main motivation to propose the NPGTV algorithm.

A non-local patch graph total variation
In this section, we describe the proposed NPGTV algorithm accordingly to three steps: (i) representation of an image as a weighted undirected graph; (ii) establishment and solving of the total variation model; (iii) description of our complete NPGTV proposal. This choice stands for the main steps of our algorithm depicted in Fig 1. First, as shown in Fig 1(a) and 1(b), a group of non-local image patches are extracted from a noisy image. By next, a KNN graph is derived from these patches as illustrated in Fig 1(c). Then the NPGTV model is established. Finally, the denoised image is achieved by performing the Douglas-Rachford splitting algorithm on to this model (Fig 1(d)).

Total variation of graph signal
As we discuss above, Eq (7) can measure the graph signal smoothness. The p value in (7) can take 1, 2 and 1. When p = 1, S 1 (f) is the total variation of the signal on a graph [20]: From (8), one can easily find that edge weights have an important effect on graph total variation. Thus, for a same image, changing the graph topology by modifying its edges will lead to different graph total variation. Notice that in [21] another GTV based on the l p norm was used: where A stands for the adjacent matrix and ρ max (A) denotes the eigenvalue with the largest magnitude. Although (9) can measure graph smoothness generally, it is not necessary vanished for constant graph signal and may be zero for non-constant signal as mentioned in [20]. As a consequence, in this paper, we limit our discussion to (8).

Modified graph representation
One of the core steps of our method is to construct from an image a graph G for GTV regularization. To do so, we build a weighted undirected graph G = (V, E) to describe an image by considering its pixels as elements of V. The set E contains the corresponding edge information.
The edge e m,n only exists if the node v m and v n are connected. One naïve way consists in connecting each pixel to its neighbors. One can thus obtain a 4-connect graph or an 8-connect graph. Another strategy, like in AGTV [11], connects image patch center at each pixel through  the K nearest neighbor (KNN) algorithm. However, both methods suffer from various drawbacks. The former, being a local model, will tend to alter image details when denoising. The latter is a non-local model that merely uses patch intensity to calculate the distance between two image patches without considering central pixel coordinate. To go beyond these disadvantages, we propose a graph construction method that combines patch intensity with pixel coordinates. The method consists of four steps: 1. The whole image u 2 < n×n is first divided into a series of overlapped patches. Let us note S i be a s × s image patch whose center is at the ith pixel; 2. Each patch is then vectorized and concatenated with its center pixel coordinates. More clearly a given image patch S i is expanded into a vector v i into which, the coordinates of the ith pixel (i row , i col ) are incorporated so to get an access to a new vector v 0 i ¼ ðv i ; li row ; li col Þ, where λ is a parameter that expresses the spatial constraint; 3. Each image patch is connected with its k nearest neighbors depending on the Euclidean distance metric through the KNN algorithm. In details, the Euclidian distance between two image patches S i and S j is such as dði; jÞ ¼ kv 0 4. Finally, using the Gaussian kernel weighting scheme (10), the graph weight matrix W is computed as follows: where σ is a parameter that controls the sensitivity of the similarity measure to the noise. This one is empirically fixed to 20% of the sum of the noise variance like in [22]. Such procedure is illustrated in Fig 2.

Parameters analysis
From the above, it is easy to find that the patch size s, the K value of KNN algorithm and the spatial constraint parameter λ will have an impact on the denoising result. All three parameters are relevant to the noise level. Generally speaking, these parameters will take large values under high noise levels, and vice versa. But they cannot be too high. For the first two parameters, the larger the patch size and K, the smoother the image will be. If the patch size and K take too large values, some image details will be removed. Besides, too large patch size will lead to image edge blur (just as illustrated in Fig 3). Too large K values will also increase the calculation cost slowing down our method. Thus in the experiments presented in our paper, we set a patch size of 9×9 pixels under high noise level (larger than 15db) and 5×5 pixels under weak noise level (less than15db). The value of K is fixed to 5. Such parameters have been shown to be robust while details and fine structure can be better preserved.
As for the spatial constraint parameter λ, the location information of the patch is normalized into the range of gray level of the image. By doing so, patch intensity and location are more relevant when constructing the graph with the KNN algorithm. Notice that, when λ = 0, v 0 i is degraded into v i . As a consequence, the similarity between two patches is measured based on the distance of their intensity level. When λ takes a high value, patch location will become the main ingredient to determine the similarity between two patches. Taking such value may not work well since it is extremely sensitive to minor transformations, both in geometry (shifts and rotations) and in imaging conditions (lighting or noise) [23]. Therefore, the selection of the value of λ heavily depends on the range of gray level of the image. In our experiment, the gray level is normalized between 0 to 1 considering images of size 512×512 or 256×256 pixels. λ should guarantee that the values of λi row and λi row not too far from this range. Thus, we set λ in the range between 0.01 and 0.1. Fig 4 well validates the effectiveness of our strategy to select the value of λ.

Optimization solution of NPGTV
After image graph representation is obtained, combining with the definition of graph total variation in (8), we construct the NPGTV model and solve it. We set u 0 the graph signal derived from the original noisy image and u the corresponding recovered one, i.e. the original image. Our model can be formulated as a convex optimization problem as follows: arg min u kuk TV s:t:ku À u 0 k 2 � ε; ð11Þ where ε is the radius of a L 2 ball. In this paper, the operator splitting method, one of the most important techniques, is employed to solve the model (11). The basic idea of the method is to divide the optimized object into several convex functional in the form of summation. Thus, a complicated problem can be decomposed into several subproblems that are easier to solve. For (11), we set f 1 (u) = ||u|| TV and f 2 as the indicator function of the set H  Image denoising via a non-local patch graph total variation where i H (u) is zero if u is in the set H and infinity otherwise. prox f : < n ! < n is the proximal operator which is defined as follow [24]: The function on the right-hand in (14) is strongly convex and not every infinite, so it has a unique minimizer for every x 2 < n . Eqs (12) and (13) can be solved by various approximate splitting algorithms, such as Forward-Backward Splitting (FBS) [25], Douglas-Rachford Splitting (DRS) [26] or Alternating direction method (ADM) [27], and Primal-Dual hybrid gradient (PDHG) [28]. The convergence conditions required by Douglas-Rachford splitting (DRS) are slack. Besides, it has more general convergence character when solving the finitedimensional problem with the fixed-step DRS iterative scheme. More important, it does not need the decomposed subproblems to be differentiable like for the FBS algorithm [29]. Therefore, in our paper, we choose the DRS algorithm to solve the NPGTV model. Applications of the DRS algorithm in signal and image processing can be found in [30][31][32][33]. DRS algorithm was originally used to solve the equation μ = Ax + Bx, where A and B are both positive definite matrices. Later, it was used to solve non-linear problems. For any γ > 0, there exists at least one solution for the unconstrained non-convex problem shown in (15) min This solution satisfies the following two conditions: with prox γ,f y expressed as [24]: where γ is a parameter. Algorithm 1 presents the derivation process of the DRS algorithm for NPGTV. Here the graph signal u 0 derived from the original noisy image is set as the algorithm initial input. λ n stands for the iterative step. tol denotes the stopping tolerance parameter. The detailed derivation process of the algorithm can be found in [34].

Algorithm 1: Douglas-Rachford Splitting algorithm for NPGTV
INPUT: y 0 = u 0 , γ > 0, ε 2[0,1], tol > 0 ITERATIVELY: for n = 0, 1, . . ..I-1 do x n ¼ prox g;f 2 y n λ n 2 [ε, 2 − ε] y nþ1 ¼ y n þ l n ðprox g;f 1 y n ð2x n À y n Þ À x n Þ if y nþ1 À y n y nþ1 < tol then BREAK end if end for OUTPUT: u = y n+l The whole procedure of our NPGTV algorithm is summed-up in Algorithm 2. Algorithm 2: Non-local patch graph total variation Input: Noisy grayscale image Processing steps: A. Transform the noisy image into a modified non-local patch graph structure G; B. According to Eq (10) calculate the edge weights; C. Calculate Graph Total Variation according to Eq (8); D. Construct the optimization form of the denoising problem of total variation on graph according to Eq (11); E. Solve the NPGTV model by means of the DRS algorithm as Algorithm 1. Output: The denoised image

Algorithm complexity analysis
The computational complexity of the NPGTV algorithm depends on the KNN graph construction and on the optimization algorithm given in Algorithm 1. For an image of n pixels, n overlapped patches can be achieved. The computational complexity when directly building the graph with KNN algorithm is O(n 2 ). In order to reduce such complexity, and inspired by the AGTV algorithm, we decided to use the open-sourced library called FLANN [35]. Set the size of each image patch is s × s pixels and K value in KNN algorithm is fixed. With FLANN, the computational complexity of the graph construction is reduced to O(nlog(n)). Let I be the maximum number of iterations for the DRS algorithm to converge. In that case, the DRS computational cost is O(I|E|), where |E| represents the number of edges in the graph G = (V, E). As G is a KNN graph where E � Kn, the computational complexity of our optimization algorithm is linear in the size of the number of the graph vertex n, i.e., O(IKn). Based on the above analysis, the whole complexity of the NPGTV algorithm is O (IKn + nlog(n)).

Experiments
This section presents the experimental results we obtained by applying our method to the classic benchmark images considered in the literature. Performances of our scheme are also compared to several state-of-the-art denoising methods. Tests have been conducted using Graph Signal Processing MATLAB toolbox (GSPBox) on a PC with an Intel 4.0 GHz CPU and 16 GB of memory.
In the following examples, images were contaminated by additive independent and identically distributed zero-mean Gaussian noise (AIIDZMGN) with the standard deviation σ. To investigate the effect of the denoising process, as in [1] [4] and [8], we choose the standard Baboon, Boat and Pepper images. The top row of Fig 4 highlights the denoising results obtained on the standard Baboon image with NPGTV considered an AIID zero-mean Gaussian noise of 10 dB. Zoomed medallions in Fig 5 columns (d) and (e) allow a better screening of the result. We can see that an effective suppression of noise is achieved while complicated skin textures and periodic patterns are preserved. Similar comments can be made on the Boat and Pepper images (middle and bottom Fig 5 rows, respectively). Such qualitatively good performance can be explained by the presence of similar textures not only in the immediate neighborhood of a given pixel but also of distant pixels.
Comparisons with some state-of-the-art denoising methods have been carried out for various levels of noise. These state-out-of-art method set includes the non-local graph based transform (NLGBT) method [22], non-local means (NLM) filter scheme [4], and the classical totalvariation approach (TV) [37]. In order to make a fair assessment, we manually choose the parameters' values of each method in order to obtain the best results they can provide for a given image and noise level. Fig 6 displays the results obtained for the Barbara image contaminated by an additive zero-mean Gaussian noise of 30dB. (More experiment results of some other standard images can be founded in Figs A-E of S1 Appendix) In order to quantitatively evaluate and compare these methods, we use Peak Signal to Noise Ratio (PSNR) and structural  Image denoising via a non-local patch graph total variation similarity index (SSIM) [38]. Considering images of m×l pixels, they are defined as: where u andû denote the original image and its corresponding denoised version, respectively; μ u and mû stand for the average of u andû while σ u and sû represent their standard deviation; s uû means the covariance of u andû; c 1 and c 2 are constant values to avoid denominator be zero.
The PSNR and SSIM measures we obtained are reported in Table 1. In both cases, our NPGTV method provides noticeably better results for both noise elimination and feature preservation than the classical TV algorithm and the low-pass filtering on graph method. The proposed NPGTV algorithm not only achieves better performance than the NLGBT algorithm, but also needs less time than the NLGBT algorithm. In addition, the TV method eliminates some details that are preserved by NPGTV like the details and textures of the Barbara image. Although our method has a close link with the AGTV algorithm [11], it is far better than AGTV in terms of complexity and denoising effect. In Fig 7, we compare our method and AGTV on some benchmark images. These images are corrupted by the noise at σ = 30 dB. Intuitively, the images denoised with our method preserve more image details compared to AGTV. Table 2 shows that our method achieves higher PSNR and SSIM values on the most testing images.  Image denoising via a non-local patch graph total variation

Conclusion
In this paper, we have proposed a non-local patch graph total variation algorithm for image denoising. Unlike most continuous Total Variation-based methods for image denoising, the problem has been formulated using a graph-spectral approach. An imagewas represented as an undirected graph whose edge weights are computed by means of a Gaussian kernel function, where Euclidean distance is calculated with two modified image patches instead of two isolated pixels. The numerical implementation of the algorithm was performed through the Douglas-Rachford Splitting algorithm. We also have demonstrated the relationships between our graph denoising method and a number of alternatives including the classical total variation and adaptive graph total variation. Qualitative and quantitative assessments, as well as the comparison of our approach with several state-of-the-art denoising methods, demonstrate its effectiveness and better behavior in general. Future works will focus on convex optimization-it could be improved-and extending our approach to some other kind images, such as color image etc. Besides, we plan to bring in some technologies like superpixel/supervoxel based methods [40,41] to reduce the complexity of the graph construction.

Author Contributions
Formal analysis: Yan Zhang.