A Convex Functional for Image Denoising based on Patches with Constrained Overlaps and its vectorial application to Low Dose Differential Phase Tomography

We solve the image denoising problem with a dictionary learning technique by writing a convex functional of a new form. This functional contains beside the usual sparsity inducing term and fidelity term, a new term which induces similarity between overlapping patches in the overlap regions. The functional depends on two free regularization parameters: a coefficient multiplying the sparsity-inducing $L_{1}$ norm of the patch basis functions coefficients, and a coefficient multiplying the $L_{2}$ norm of the differences between patches in the overlapping regions. The solution is found by applying the iterative proximal gradient descent method with FISTA acceleration. In the case of tomography reconstruction we calculate the gradient by applying projection of the solution and its error backprojection at each iterative step. We study the quality of the solution, as a function of the regularization parameters and noise, on synthetic datas for which the solution is a-priori known. We apply the method on experimental data in the case of Differential Phase Tomography. For this case we use an original approach which consists in using vectorial patches, each patch having two components: one per each gradient component. The resulting algorithm, implemented in the ESRF tomography reconstruction code PyHST, results to be robust, efficient, and well adapted to strongly reduce the required dose and the number of projections in medical tomography.


I. INTRODUCTION
Users of X-ray tomography aim to push the frontiers of their studies towards new domains which require finer time resolution, better signal to noise ratio, and less radiation damage.
All these three requirements bring to a data-starving situation where for a given quality goal, the available data volume is never enough. A solution to this problem consists in filling in the gap left by the missing data with an a-priori knowledge of the solution. The study. There are mainly two ways : either the sparsity structure is a-priori known and an appropriate basis of functions can be built from the beginning, or it must be automatically learned from a learning set with the dictionary learning technique. The dictionary learning technique has recently been applied to tomography reconstruction using the Orthogonal Matching Pursuit (OMP) denoising procedure [1]. This procedure consists in obtaining first an over-complete basis of functions and then in fitting every patch of the image using at most N omp components selected from this basis. The components are heuristically selected choosing, each time, the one having the maximum overlap with the remaining error. This optimization method cannot be implemented as a convex objective function optimization problem because the linear combination of two candidate solutions can have more than N omp components. In other words the optimization domain is not convex. In this paper we present a more advanced formalism based on a convex functional that we describe in section II. For the solution of our functional minimization problem we have applied the recently developed tools taken from the field of convex optimization [2]. The result is a robust and efficient algorithm that we discuss and illustrate with synthetic and medical data in section III.

II. METHODS
In this section we introduce first the decomposition of an image into non-overlapping patches and the related objective function for denoising. Then we introduce our original formalism which ensures, using overlapping patches, a smooth transition at the patches borders, and finally we apply this formalism to image denoising and tomography reconstruction.
We denote by 1 p the indicator function of patch p, which is equal to 1 over the patch support ( typically an m*m square) and is zero otherwhere. For non-overlapping patches, covering the whole domain, we have : where i denotes the pixel position and can be thought as a two-dimensional vector. We are looking for the ideal solution x that we express by the vector, w, of its coefficients in the basis of patch functions: where the set {ϕ k } is an over-complete basis of functions over the patch support; r p is the closest to the origin corner of the patch p, and w kp is the component k, p of vector w which multiplies the basis function ϕ k in the patch p. The denoising problem, given an image y, consists in finding the minimum of a functional F (w) = f (w) + g(w) which is sum of two terms. The term f (w) = y − x 2 2 links the solution to the the data y. The other term, g(w), contains the a-priori knowledge about the solution. This way of breaking the functional in two terms has his roots in the Bayesian theorem. From a probabilistic point of view the denoising problem consists in finding, given a noisy image y of an object, the most probable object x that can generate that image. We represent the object x through the patches coefficients w. The Bayes theorem, applied to denoising, states that the conditional probability of w being the exact object given a measurement y, is the product of the probability of y being the measure given the exact solution w, times the a-priori probability of w.
Assuming gaussian noise, the conditional probability of y being the measure given the , where x is expressed through the patches co-efficients w by equation 2. The a-priori probability of w is written as exp (−g(w)/ (2σ 2 )).
The a-priori knowledge that the solution is sparse in our patches basis can be expressed by using the sparsity-inducing L 1 penalization [3]: g(w) = β w 1 . The most probable solution w is obtained by finding the F (w) minimum: The solution can be obtained by using the iterative shrinkage thresholding algorithm [2] (ISTA) iterations: where T α is the shrinkage operator defined as and γ is a positive number lesser than the inverse of the Lipschitz condition number L: γ ∈]0, 1/L].
The Lipschitz number L is such that : The ISTA algorithm can be accelerated by the Fast Iterative Shrinkage Thresholding [4](FISTA) method.
In its non-overlapping version, the image denoising with patches is able to detect features that are within the field of the patch: if a line crosses the central region of a patch, it will be detected if the basis of function has been trained to detect such lines. But in the situation where a line intersects only one point in a corner of the patch square, the signal of this point is indistinguishable from that of a noisy point, no matters the dictionary training.
For this reason the patches denoising technique is often used with overlapping patches using post-process averaging [5]. In this case the minimization problem is solved for each patch separately first, and then the averaging is performed in the overlapped regions.
In this study we do not follow this procedure but we add an overlap term into the objective function. We choose a system of patches which covers the whole domain, and we allow for overlapping. In this case the sum of all indicator functions is greater or equal to one : We define the core indicator functions 1 c p , which indicate the core of the patches, and make a non-overlapping covering: For a given point i, 1 c p (i) indicates which patch p has its center C p closest to point i: The solution x is composed using the central part of the patches as indicated by the functions 1 c p : Now we introduce the P operator which is the projection operator, for tomography reconstruction, and is the identity for image denoising. The functional F (w) whose minimum gives the optimal solution is written, for both applications, as: where the ρ factor weights a similarity-inducing term which pushes all the overlapping patches, which touch a point i, toward the value x i of the global solution x in that point.
For future reference we call y − P(x) 2 2 the fidelity term. The solution is found with the FISTA method, using the gradient of f (w) which is easily written in compact form: where P T is the adjoint operator of P, and is called back-projection operator in the case of tomography, and is still the identity for image denoising.

III. APPLICATION
In this section we compare the fit with patches of a given image with or without overlapping. We analyze the quality of the recovered image as a function of the two regularization parameters β and ρ. The same study is then performed on the same image with additive noise. Then we apply the method to tomography reconstruction of virtual phantoms using our convex functional with patch overlapping. We analyze the quality of the reconstruction as a function of the two regularization parameters projections. Finally, we apply the method onto experimental cases for the reduction of number of projection in order to reduce the deposited dose during a tomography.
A. Fitting problem pixels support.
We apply this set of patches that we have learned from Lena on a completely different image (Boy [7]). In figure 2 we show the effect of fitting with non-overlapping patches a noiseless image. The original image is subfigure 2a). It has been normalized to have its  With these values these images have nearly the same sparsity than the non-overlapping results in the same columns of the first row.
One can see in figure 2 that using overlapping patches the discontinuities have disappeared and the lines are homogeneously represented along their length even for the strongly regularized case. The overlap constraint reduces the tessellation effects but we can see that for the same sparsity ratio, another kind of distortion, in the form of smoothing, appears.
This is due to the fact that the fidelity term, which links the data to the solution, is proportional to the area of the image, while the L 1 term acts on all the w pk coefficients whose number increases not only with the area of the image but also with the number of replica.
Taking into account that we have 9 replica per patch, we show in the first image of second column the result obtained with β = 0.2/9. This rescale back the L 1 term to a situation which is comparable with the β = 0.2 case of image b) of the first row. The level of preserved details is comparable in these two cases. The sparsity, however is bigger in the overlapping case: s = 0.068 which means nearly seven functions per patch. In the non-overlapping case, instead, it was about 3 components per patch. This means that the similarity inducing term is constraining a part of the available degrees to realize smooth transitions between neighboring patches.

B. Denoising problem
For the denoising problem two things must be kept in mind to select the regularization weights: for an image with a stronger noise a stronger value of β will be necessary, but it is also true that for a stronger value of the regularizing parameter more features of the noiseless image will be filtered out. The best value of the regularization parameters is therefore a compromise between the necessity of filtering out the noise and that of preserving image features. We define the quality improvement factor Q based on the the Structural SIMilarity (SSIM [8]) index S which is 1 when the images are identical. Our definition of Q is : We We observe that the quality improvement factor, for the overlapping case, peaks at a β value which is about 1/9 of the β value of the no-overlapping peak. As explained before this lower β is compensated by the nine-fold increase in the number of components which are weighted by L 1 norm. We note that the optimal quality is better with the overlapping method. The trend of β as a function of the noise level is as expected : a stronger noise needs to be regularized with a stronger β.
In figure 3 we show the denoising results (a and c) for two noised images y noised (b and d) obtained adding a σ {noise} = 0.1, 0.2 strong white gaussian to the original image : . The denoising has been performed with a replication step of 3 pixels and with the β and ρ giving the optimal SSIM.

C. Dose reduction in computed tomography
In this section we apply the method using overlapping patches for tomography reconstruction on synthetic data. The figure 5 presents a 1024 × 1024 pixels phantom and a bases of patches that has been learned from the phantom .
We show in figure 6 the reconstruction obtained from 150 projections between 0 and 180 degree. Note that this is well below the number of projections, of the order of thousand, that would be required in this case by the Shannon-Nyquist sampling theorem. The lack of information due to the small number of projections results in a noised reconstruction when the filtered-back-projection reconstruction is applied (left) while the image recovered with our method (right), with (3l, 3n) translations for replication, maintains a high quality.

FIG. 5: A phantom and its K-SVD basis
For this noiseless case the choice for β and ρ was easy and figure 6 was produced with our first guess which gave already an excellent reconstruction, without need of further optimization. The first guess was β = 0.001 and ρ = 50. The quality improvement Q between the FBP result and our method is 20. The improvement factor is in this case the FBP error (the SSIM index distance from 1 as in 14) divided by the error of our method.
The noisy case is instead more complex. To study it we add to the phantom sinogram a gaussian white noise with a σ equal to 5% of the maximum sinogram value. In figure 7 we show the Q dependency versus β(squares) and ρ(circles) not far from the optimal choice.
We have not fully optimized Q because we have found that in this extreme case where ρ tends to be very big, we get a substantial decrease in convergence speed after the shoulder in the ρ(circles) curve. Before the shoulder we have convergence in some hundreds of iterations while after the shoulder it takes some thousands to converge.
We observe that at these high values of ρ the effective behavior of our method is similar to those of a total variation penalization : the image is flattened over large regions. We think that the increase in convergence time is due to the fact that the algorithm takes a longer time to propagate when the flattened regions are larger.

D. Differential Phase tomography
Finally we show a promising application of our method to medical tomography for a sample imaged using X-ray phase contrast imaging(PCI). This technique has shown an enhancement of soft tissue visualization in comparison to conventional techniques [9]. PCI employs the dual property of X-rays of being simultaneously absorbed and refracted while passing through a tissue. Among all the phase contrast techniques we chose to test our method on the analyzer based imaging because of the high sensitivity and unique results provided by this modality for investigating large and highly absorbing biological tissues (i.e. full human breasts) [10]. Because breast are highly radio sensitive organs, X-ray CT of these organs are not clinically applied, even if a 3D would be benefit for radiologists. A reduction of the deposited radiation dose in CT combined with the unprecedented contrast improvement offered by PCI is thus of high interest for breast cancer detection.
In the analyzer based PCI technique, the projection data contain a signal which is proportional to the gradient of the X-ray phase in one direction (i.e. the direction perpendicular to the plane formed by the incoming and diffracted X-rays on a perfect Bragg crystal which is used for analyzing the radiation passing through the sample). When the object is rotated around an axis (Z-axis, for instance), this signal contains contributions from the X and Y gradient components, where the X and Y axis corotate with the sample. The two components are de-phased by a rotation angle of 90 degrees and can be reconstructed separately by multiplying before-hand the sinogram with the cosine and sine of the rotation angle. We apply our formalism considering that the reconstructed and learning images are vectorial objects : the value associated with a pixel is not a scalar but a two-component vector. More details on the principles and technical aspects of PCI are available in [9].
The sample studied is a 7cm human breast imaged with a pixel size of 100 µm. The experiment was conducted at the biomedical beam line at the European Synchrotron Radiation Facility (ESRF). The sample was a human breast mastectomy specimen. The study was performed in accordance with the Declaration of Helsinki and was approved by the local ethics committee. A monochromatic X-ray beam with energy of 60 keV was used to image the breast cancer sample. Result of reconstruction is shown in figure 10(top). On this image, radiologists could easily identify the skin, fat and glandular tissue.
The training set is obtained from another breast sample imaged by the same technique but with an high quality reconstruction. We consider a slice image, we apply a Sobel filter to extract the two derivative components and we run the KSVD algorithm. Figure 9 shows the patches basis functions that we use to fit both components at the same time. The patches size is 7 × 7 pixels and each basis function is displayed as a 14x7 rectangle whose upper 7 × 7 part is the X component and the lower one the Y component. For an eventual future clinical application of the PCI method it is important to investigate which is the acceptable compromise in terms of low dose and sufficient level of image quality.
We need therefore to better explore how the quality of the reconstruction is degraded when we reduce the dose (i.e. number of projections and the acquisition time) further below the standard values. To this end, we performed a reconstruction with only 125 projections and results are shown in the figure III-D for one gradient differential image.
For an eventual future clinical application of the PCI method it is important to investigate which is the acceptable compromise in terms of low dose and sufficient level of image quality.
We need therefore to better explore how the quality of the reconstruction is degraded when we reduce the dose (i.e. number of projections and the acquisition time) further below the standard values. To this end, we performed a reconstruction with only 125 projections and results are shown in the figure11. The first column present the result using our method, the second column is the result of reconstruction using FBP algorithm. If a slightly higher noise level is tolerable, the method may be used with very few pro-jections and thus applied to the screening and diagnosis of human breast cancers with an even lower radiation dose than conventional dual mammography. The results of our reconstruction show an image quality and a capability of discriminating fine structures that are still clinically acceptable. On the contrary, images produced with the standard FBP reconstruction method are very noisy and not diagnostically satisfactory.

IV. CONCLUSION
We have presented a new convex functional which implements in a mathematically pure form the concept of overlapping-patches-averaging, which was used so-far with a non-convex formalism. The resulting algorithm is robust, efficient, and well adapted to strongly reduce the noise in a natural image. The method was applied also to a medical diagnostic case by considering phase contrast tomographic data of whole cancer bearing human breasts acquired with phase contrast imaging. We demonstrated that it is possible to reduce the deposited dose in breast CT by a factor 5 compared to the standard algorithm while keeping the same image quality. Although we used this specific example as proof of principle in this study, the method we developed and described can be easily applied to other medical tomography fields.
The numerical results have been generated with PyHST [11,12], tthe ESRF tomography reconstruction code which uses the GPU implementation of the presented methods.