Efficient learning-based blur removal method based on sparse optimization for image restoration

In imaging systems, image blurs are a major source of degradation. This paper proposes a parameter estimation technique for linear motion blur, defocus blur, and atmospheric turbulence blur, and a nonlinear deconvolution algorithm based on sparse representation. Most blur removal techniques use image priors to estimate the point spread function (PSF); however, many common forms of image priors are unable to exploit local image information fully. In this paper, the proposed method does not require models of image priors. Further, it is capable of estimating the PSF accurately from a single input image. First, a blur feature in the image gradient domain is introduced, which has a positive correlation with the degree of blur. Next, the parameters for each blur type are estimated by a learning-based method using a general regression neural network. Finally, image restoration is performed using a half-quadratic optimization algorithm. Evaluation tests confirmed that the proposed method outperforms other similar methods and is suitable for dealing with motion blur in real-life applications.


Introduction
Imaging systems suffer from several types of image degradations. Motion blur is a common phenomenon, as are defocus blur and atmospheric turbulence blur. All of them reduce the image quality significantly. Therefore, it is essential to develop methods for recovering approximated latent images from blurry ones in order to increase the performance of imaging systems. Such methods will find wide applicability in various fields. However, the issue of blur removal is a notoriously ill-defined inverse problem that has perplexed scholars for decades [1]. A point spread function (PSF) can be used to describe image blur. The PSF models how the imaging system captures a point source or object and describes how the point spreads across an image. Generally, the PSF can be transformed into a parametric model [2], with the parameters being the motion length, defocus radius, and turbulence degree [3].
For estimating the PSF, Fergus et al. [4] proposed a method that combines the gradient domain of natural images with the maximum a posteriori (MAP) method. Xu et al. [5]  • A learning-based parameter estimation scheme is proposed. With the help of the GRNNs, the blur model parameters for input blurry images can be estimated based on their blur features. Then, their PSFs can be constructed.
• A nonblind deblurring algorithm based on sparse representation is presented. Its target image is constrained by an p quasi-norm in the cost function.
• The solutions for different p values are discussed and their performances on some typical p values are analyzed. Further, the proposed deblurring method is compared with several related blur removal approaches.
The remainder of this paper is organized as follows: Section 2 introduces the imaging system and the parametric model for the three most common blurs. Next, the proposed blur feature is described. Section 3 describes the proposed deblurring framework, including the properties of the GRNN and the restoration algorithm. The results of the simulations and comparative analysis performed are described in Section 4. Section 5 summarizes the study and presents concluding remarks.

Blur models and features
Imaging system. An imaging system can generally be regarded as a linear-translationinvariant system [19], and the image blurring procedure, shown in Fig 1, can be described as follows: where f(x,y) and g(x,y) represent the original image and the observed blurry image, respectively; h(x,y) is the PSF; η(x,y) is the additive noise generated during image acquisition or transmission; x and y represent the coordinates of a digital image; and the symbol " � " represents the convolution operator. For simplicity, we can also use bold notation (such as f and h) to represent the image and PSF, respectively.

Parametric model of blurs
a) Linear motion blur [20]. Relative movement between the camera and the target when the exposure time is insufficiently small results in linear motion blur. Its PSF can be modeled as follows: hðx; y; L; FÞ ¼ L À 1 ; ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where L represents the motion length and F is its orientation. An example of linear motion blur is shown in Fig 2A. b) Defocus Blur [21]. In the case of defocusing, blurring arises because of the optical system of the lens. The path followed by light through the lens depends on its wavelength. However, natural light consists of components with several different wavelengths. This results in physical limitations with respect to the construction of lenses that prevent the camera from producing perfectly sharp images, leading to defocus blur. The extent of blur that is visible in an image is a function of the lens aperture: where the radius, r, determines the blur extent. Further, (x 0 ,y 0 ) is the center of the PSF. This is depicted in Fig 2B. c) Atmospheric turbulence blur [7]. The variations in the heat, pressure, and wind velocity in the atmosphere result in small-scale, irregular air motions characterized by winds, which vary in speed and direction. These have a determining effect on light propagation and cause atmospheric turbulence blur, which is also known as Gaussian blur. This type of blur usually occurs during remote sensing. Generally, a low-pass Gaussian filter [3,22,23] can be used to model it: where σ indicates the blur degree and (x 0 ,y 0 ) is the center of the PSF. Since Eq (4) is an infinite impulse response filter, proper truncation and normalization are necessary in practice. Normally, the support domain is set as [x 0 −3σ,x 0 +3σ]×[y 0 −3σ,y 0 +3σ], since 99.7% of the energy is contained in this region; here, × is the Cartesian product. Fig 2C show an example of the Gaussian blur model.

Blur features
Natural images are diverse, and their statistical characteristics vary significantly. Nevertheless, in recent years, an increasing number of studies have indicated that the gradient domains of natural images share common features and that their histograms tend to have sharp tops and long tails [24]. Generally, a digital image can be treated as a discrete binary function mathematically, and its gradient domain can be represented based on the pixel difference. In practice, image edge detection operators [3,25] such as the Roberts and Prewitt operators are used commonly for detecting the first-order derivatives while the Sobel and Laplacian operators are used for the second-order derivatives.
In this study, inspired by the properties of image edges, a blur feature based on the autocorrelation of blurry images in the gradient domain is proposed. By definition, autocorrelation is the correlation of a signal with a delayed copy of itself as a function of the delay [26]. It is a useful tool for measuring the similarity between a signal and its shifted versions. A digital image can be treated as a two-dimensional (2D) real signal and its gradient domain e(x,y) as a binary function. Thus, its autocorrelation R(x,y) can be described as where Z x and Z y are the support domains of e(x,y).
In addition, based on the conjugate symmetry of real signals [27,28], the computation of Eq (5) can be accelerated with the help of the fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT). Therefore, the autocorrelation of the image edge, e(x,y), can be simplified to where R and e represent the matrix form of R(x,y) and e(x,y), respectively, for brevity, and F ð�Þ and F À 1 ð�Þ are the FFT and IFFT operators. Further, |�| 2 represents the element-wise square of the modulus of the former.
Since the value range of image edges will vary widely, their blur features should be adjusted on a notionally common scale. Feature scaling [29], also known as unity-based normalization, is a technique that is used to bring all values within the range [0,1]. Hence, the normalized blur feature, R norm , can be described as follows: where R max and R min are the maximum and minimum values of R. In Figs 3-5, the left column shows examples of natural images that were blurred artificially by motion blur, defocus blur, and Gaussian blur using different parameters. Their gradient domain versions were extracted by using the Sobel operator both in the horizontal and the vertical directions. The image edges, which were obtained by adding them, are shown in the middle column. As can be seen, with an increase in the image blurriness, the information contained in the edges decreases. The right column shows the amplitude of the blur feature described in Eqs (5)- (7). It can be seen from Fig 3 that the features of the motion-blurred images retain the orientation of the PSF, but the length of the shape of the blur features increases with the motion length. In the case of the defocus blur, the amplitude range of the feature increases with the defocus radius, r, as shown in Note that, in Figs 3-5, because the size of the blur features depends on the size of the images, according to Eqs (5)- (7), and most of the amplitude energy is located near the center of the blur feature, their boundaries have been truncated to ensure that the blur features are all the same size.

The proposed learning-based blur removal method
General regression neural network. A GRNN is a probabilistic neural network (PNN), and was first proposed by Specht [30]. It can converge to the underlying function of the data with only a few training samples available, and its spreading rate is the only additional parameter that requires adjustment by the user. This makes GRNNs a very useful tool in areas such as regression, prediction, and classification.  A GRNN is composed of an input layer, a pattern layer, a summation layer, and an output layer [31]. Its overall block diagram is shown in Fig 6. In the input layer, X = (X 1 ,X 2 ,� � �X n ) represents the input data of the GRNN G(X).
In the pattern layer, the number of units, s, is equal to the length of training data (x 1 ,x 2 ,� � � x s ) and (y 1 ,y 2 ,� � �y s ). The activation function of each unit, i, is given by where δ is the spreading rate, and T represents the transpose.
In the summation layer, S D ¼ P s i¼1 r i and S N ¼ P s i¼1 r i y i need to be determined. Their ratio yields the final output, G(X) = S N /S D .
Taking another perspective, a GRNN is a generalization of a radial basis function network (RBFN) and a PNN. In the pattern layer, the activation function in Eq (8) is similar to the radial basis function kernel (Gaussian kernel), and it uses the training sample x i as the mean value of each unit. In the output layer, S N /S D indicates the probability of how well the training sample can represent the prediction position. In practice, the GRNN outperforms the RBFN as well as traditional back propagation neural networks in predictions, as the former only involves a one-pass learning scheme instead of repetitive iterations during the training process [30]. Therefore, the advantages of the GRNN are fast learning and convergence, even though the number of inputs is very high.
Framework of the deblurring method. The principal framework of the proposed deblurring method is shown in Fig 7. It consists of two primary steps. The first is a learning-based parameter estimation procedure while the second involves the use of a nonlinear deblurring algorithm.
In order to estimate the blur parameters efficiently, more than ten thousand natural images were artificially blurred using the three above-mentioned blur models and different parameters, and their blur features were extracted using Eq (7). The Frobenius norms [32] of these features, k R norm k F , were taken as the inputs to train a GRNN for each blur model. In this manner, for a given blurred image, the PSF could be estimated from its blur feature while using the GRNN. Next, the latent image was restored via a nonblind blur removal algorithm, which is discussed in the following subsection. The main steps for parameter estimation are shown in Algorithm 1. (2) GRNN: G(X); Output: Deblurring result: f; 1: Extract the edges, e, from the input blurry image, g; 2: Calculate the blur feature, R norm , using Eq (7); 3: Estimate the blur model parameter, Y, using the network, Y ¼ Gðk R norm k F Þ; 4: Generate the PSF, h, from the estimated Y; 5: Restore the latent image, f, using Algorithm 2; 6: return f; Latent image restoration. Once the PSF has been acquired, a variety of nonblind deconvolution methods can be used to recover the latent image. Wiener filtering and the Lucy-Richardson method are frequently used. These deblurring techniques were proposed decades ago based on the MLE. However, they both have the disadvantage of being sensitive to any incorrect PSF estimate, and the ringing effect [33,34] is unavoidable. In recent years, the theory of sparse representation and machine learning have been introduced in the field of image restoration [35][36][37]. Based on a statistical analysis of natural images, several studies have shown that the image gradients tend to have a heavy-tailed distribution [38,39]. The most commonly used form of this distribution is the hyper-Laplacian model [13,38]. For each element of the image gradient rf i , the hyper Laplacian model can be expressed by the joint distribution of f as follows: where � 2{x,y} represents the gradient in two orientations and α signifies the slope of the exponential function. From the perspective of MAP estimation, the image prior is turned into a regularization term in the logarithm cost function [17]. Let k�k p represent a quasi-norm, which is defined as In this study, the cost function was defined as follows: where rf is short for (r x f, r y f). The first term on the right-hand side of the equation is a data fidelity term to ensure the best approximation of the original image while the second term is a constraint on the image gradient. Because the optimization problem described in Eq (11) is nonlinear, traditional descent methods are ineffective because of their slow convergence. Fortunately, inspired by the halfquadratic penalty method [15,16,40], which can alternately optimize the p -based expression in Eq (11) when 0<p�1, the problem can be simplified to an alternating optimization problem: where β is an intermediate coefficient that is varied during the alternating scheme. The solution of Eq (12) converges to that of Eq (11) as β!1, and the optimization procedure can be described by the two subproblems given below: (1) Update u. For a given f, Eq (12) becomes an p -based constrained optimization problem,û (2) Update f. For a given u, the minimization problem is reduced to an 2 -norm constrained optimization problem; this is also known as least-squares minimization, For the u subproblem, the solution of Eq (13) varies for different p values. In particular, when p = 1, it becomes a total variation problem, whose analytical solution can be obtained by the soft thresholding algorithm [15], where sgn(�) is the signum function and threshold T = α/(2β).
In general, for a given p, the minimization problem can be solved by setting the derivative of Eq (13) to zero. Thus, for the ith element of u and rf, we get In particular, when p = 0.5, Eq (16) can be simplified to a cubic function, which can be solved using Cardano's formula [41]. When p = 2/3, Eq (16) can be expanded as a quartic function, which can be solved using Ferrari's and Descartes' solutions [41]. For some special cases with 1<p<2, analytical solutions can be obtained as described in [42]. Finally, for all other p values, no analytical solution exists. However, the Newton-Raphson method is more effective in these cases.
For the f subproblem, the closed-form solution of Eq (14) can be obtained in the frequency domain:f where F ð�Þis the conjugate operation and � is the Hadamard product. Recognizing that the operation in the frequency domain is based on an assumption of a circular shift, the image boundaries should be preprocessed to second-order smoothness, as reported previously [43]. The pseudocode in Algorithm 2 describes the deblurring method. First, the observed image, g, is used to initialize f; the temporary coefficient, β, should be set to a small value.
Next, the algorithm updates the intermediate optimal values,û andf , alternately. Meanwhile, ε is set to a large value and it is ensured that the condition k>1 is met, so that β!1. Finally, the algorithm converges to a stable state, yielding the best approximation of the original image.

Experimental evaluation
Experimental setup. A training dataset from the Pascal Visual Object Classes (VOC) dataset was used [44]. The dataset consisted of 16,135 natural images of animals, humans, transportation, landscapes, and architectures, among other entities. Fig 8 shows some examples of images from the Pascal VOC dataset. Different artificial blurs were induced in them using increasing parameters so as to build a training set for each blur type, including a series of Gaussian blurs whose σ values were increased from 1 to 5 in steps of 0.2 and whose dimensions were of 25×25 pixels. For linear motion blur, estimating the motion length and the orientation simultaneously will lead to overfitting. Thus, in the experiment, we maintained the same orientation and trained a GRNN for each of the five values for the angle F; i.e., 0˚, 30˚, 45˚, 60˚, and 90˚, respectively. The motion length, L, increased from 2 pixels to 20 pixels.
Next, the blur features, R norm , were extracted from each training set. Hypothesis testing [45,46] was performed to counter the adverse effects of the outliers. Finally, the features and their corresponding parameters were used as the training sets for the GRNNs while setting the spreading rate, δ, to 0.6-0.9. The training procedure is similar to that of an RBFN: 1) An unsupervised learning method is needed to determine a set of offsets in the activation function. 2) A least squares method is used to train the weights in the summation before the output. Furthermore, the difference is that each unit of a GRNN is influenced by every sample from the training set. Thus, each training sample x q acts as the offset of the activation function in each pattern unit of a GRNN. Additionally, in the summation layer, the outputs of the pattern unit ρ q are weighted with the corresponding values of the training samples, y q , when going to the denominator unit S D , but are weighted with one when going to the numerator unit S N . Because of the use of the one-pass algorithm, the training procedures converged rapidly and stably. The training was implemented in MATLAB using an Intel Xeon E5-2620 v4 CPU (2.1 GHz) and an NVIDIA 1080 GPU. It took approximately 10-15 minutes to generate a network.
As can be seen from the flowchart in Fig 7, for an image blurred by an unknown PSF, the blur feature should be extracted first. Then, using the GRNN as well as the parametric blur models, its PSF can be estimated successfully. In this manner, the latent image can be restored from the blurry image and the PSF estimated by the deconvolution method described in Algorithm 2. The value of coefficient α in Eq (12) was determined by trial and error, while the criteria for convergence in the algorithm were set as ε = 1e4 and k = 2, respectively.
Next, we evaluated the convergence speed of Algorithm 2 for different p values. A series of defocus blurs with increasing radii were simulated in order to test the performance of the solution methods for Eq (14). The simulations were performed on the same platform as that used in the training procedure discussed above. Fig 9 shows the cost times for p values of 1/2, 2/3, 3/ 4, 4/5, 1, and 2. Generally, the Newton-Raphson method is effective for solving Eq (14). However, as discussed in the previous section, for special cases, such as those where p = 1/2 and p = 2/3, analytical solutions exist, as given in Eq (17) and Eq (18), that converge faster than those for 0<p<1 values. This fact is also shown in Fig 9A and 9B. Moreover, when p = 1, Eq (14) becomes a one-dimensional shrinkage problem [15] and converges much faster, as shown in Fig 9C. When p =2, the image prior reduces to a Gaussian prior, and its deblurred equivalent exhibits a greater number of ringing artifacts and more noise than does the sparse prior [47]. As a result, we used p = 1 for the rest of the evaluations and comparisons.
The test images were taken from the LIVE database [48]. In this paper, they are numbered as "Img01," "Img02," etc. for ease of comprehension. Table 1 shows their labels and filenames as well as the parameters for the simulated blur models. Fig 10 presents the examples of ground truth images. The testing process is described in the following subsection.
We compared the proposed method with several other blind image deblurring methods. Cho and Lee [49] used the shock filter and bilateral filter with multiple orientations of images to enhance the structures, with the aim of ensuring that the histograms of the gradient domain of the reconstructed images were similar to those of the original one. Krishnan et al. [50] proposed an image prior model based on a piecewise function that approximates the statistical characteristics of natural images using a sparsity representation technique. They solved the deblurring problem using an iterative shrinkage-thresholding algorithm. Zhao et al. [51] proposed another PSF-estimating method based on the bilateral filter and shock filter, whereas Wu and Su [52] restored an image by adopting a unified probabilistic model of the image for solving the MAP problem.  To measure the performance of the proposed method and compare it with other similar methods, we used the peak signal-to-noise ratio (PSNR) [53], structural similarity (SSIM) index [53,54], and feature similarity (FSIM) index [55] as the image quality assessment (IQA) metrics.

Experimental results
The test images were blurred by motion blur, defocus blur, and Gaussian blur using the model parameters listed in Table 1. An example of "Img04" blurred by motion blur using (L,F) = (18,30˚) and the deblurring results are shown in Fig 11. The brand names written on the caps and the grains in the wood wall behind them are blurred and indistinct in Fig 11A. The deblurring methods described in [49] and [50] introduce ringing effects both around the caps and the image borders, as shown in Fig 11B and 11C. This phenomenon is reduced by the methods described in [51] and [52], as is evident from Fig 11D and 11E. However, the wood grains in the image are not sharp and clear. In contrast, as can be seen from Fig 11F,

PLOS ONE
Sparse optimization for image restoration method restores both the texture and the background. Table 2 lists the IQA data of a comparative analysis performed on the blurred images. Fig 12 shows another example of motion blur with (L,F) = (16,45˚); the image here is "Img13." The lighthouse and cabins are blurred and unclear in Fig 12A. As can be seen from Fig 12B, the deblurring method described in [49] reduced some of the blurriness; however, the visual effect is still lacking. Further, as can be seen from Fig 12C, the method described in [50] restored the sharpness but introduced strong artifacts. These limitations were overcome by the methods described in [51] and [52] (see Fig 12D and 12E, respectively) as well as by the  [49]. (C) Method in [50]. (D) Method in [51]. (E) Method in [52]. (F) Proposed method. https://doi.org/10.1371/journal.pone.0230619.g011

PLOS ONE
proposed method (see Fig 12F), which restored most of the details. The statistics related to the IQAs as determined from a comparative analysis of the motion-blurred images are shown in Fig 13. For both examples, namely, "Img04" and "Img13," the PSNR, SSIM, and FSIM were the highest for the proposed method. In addition, Table 2 lists the original IQA data.
Similarly, Fig 14 shows another example, namely, "Img15," which was blurred by defocus blur with r = 4. In daily life, photograph degradation resulting from the camera being out of focus is a common phenomenon. As can be seen from the figure, the texture on the wings are blurred and fuzzy, although the extent of the blurriness is low. The compared methods are all able to restore the details effectively, but our method has the least ringing effect. It should be noted that there are spatially varying blurs in part of the ground truth image, such as the background and some of the flowers; consequently, these details cannot be restored. Nonetheless, in the case of the "mornach.bmp" image, the proposed method outperformed the others, as is evident from the IQA data obtained from a comparison of the defocus-blurred images (see Table 2 and Fig 15).
In Fig 16, an image, namely, "Img19," blurred by Gaussian blur with σ = 3.2 is shown. The outline of the plane is unclear and so are the words on the fuselage (see Fig 16A). Atmospheric   [49]. (C) Method in [50]. (D) Method in [51]. (E) Method in [52]. (F) Proposed method. https://doi.org/10.1371/journal.pone.0230619.g012 turbulence is a complex phenomenon that has troubled scholars for decades. Its PSF is symmetrical and its support domain is infinite both in the spatial and frequency domains. In practice, truncation in the spatial domain makes the deblurring procedure feasible. However, it  [49]. (C) Method in [50]. (D) Method in [51]. (E) Method in [52]. (F) Proposed method. https://doi.org/10.1371/journal.pone.0230619.g014 leads to unavoidable errors that reduce the accuracy of the restoration results. As shown in Fig  16B-16F, all the methods could recover the outline of the plane successfully but were incapable of restoring the words on the fuselage. Their quality assessments are listed in Table 2 [49]. (C) Method in [50]. (D) Method in [51]. (E) Method in [52]. (F) Proposed method. https://doi.org/10.1371/journal.pone.0230619.g016 shows the IQA parameters for all the tested images. It can again be seen that the proposed method was superior to the other methods in most cases. In addition, the comparative analysis showed that ringing effects are inevitable in the case of the conventional blind blur removal method [49,50]. Further, while the ringing effects were reduced to a certain extent in the cases of the methods given in [51] and [52], they did exist. On the other hand, the proposed deblurring method exhibited promising results in terms of both the visual quality and objective assessment parameters. Fig 18 compares the proposed method with the other methods in terms of speed. The size of the testing images was 768×512×3, and the sizes of the testing PSFs ranged from 3×3 to 25×25. As can be seen, the processing time of the methods in [50][51][52] increases rapidly with the PSF size. The method in [49] converges rapidly, but the IQAs of its results are unsatisfactory. Our method requires the lowest runtime and produces a deblurring result within five seconds, whereas tens of seconds or even more than a minute are required by the other methods.

Real-life applications
In the electronics industry, integrated circuit boards moving on a conveyor line are examined by a fixed-position camera hanging over them. As a result, these boards suffer from a significant degree of motion blur and their photographs are of extremely poor quality (see Fig 19 and Fig 20A). The deblurring results from the methods in [49][50][51][52] are shown in Fig 20B-20E. As the movement of the conveyor can be regarded as linear motion with a constant orientation,  [49]. (C) Method in [50]. (D) Method in [51]. (E) Method in [52]. (F) Proposed method.
https://doi.org/10.1371/journal.pone.0230619.g020 its PSF can be estimated using the proposed method. The thus-obtained deblurring results are shown in Fig 20F. It can be seen clearly that the text is clearer and the area around the pins of the chip shows more details.

Conclusions
In this study, a blur parameter estimation technique based on deep learning and a nonlinear regularized deblurring method based on sparse representation for image blur removal were proposed and evaluated. The blur features were determined based on the autocorrelation of the gradient domain of the image, which is strongly related to the blur degree. Further, a GRNN was adopted for learning-based parameter estimation. More than ten thousand natural images were used to generate training samples of motion blur, defocus blur, and atmospheric turbulence blur. Using the GRNN, the PSF of the test blurry images could be estimated correctly. Moreover, the deblurring algorithm converged quickly owing to the use of a half-quadratic method. The results of simulations and a comparative analysis confirmed that the proposed method is superior to other deblurring methods. The main disadvantage of GRNN is that its size can be huge, which results in expensive computation. Another limitation of the proposed method is that it can only deal with spatially invariant blurs in specific scenarios. Although the training process is time-consuming, it only needs to be carried out once. In future work, a better training dataset and blur features will be considered to improve the performance of the neural network, and the deblurring model can be extended into nonlinear blurs for wider application.