Joint Bayesian convolutional sparse coding for image super-resolution

We propose a convolutional sparse coding based super-resolution (CSC-SR) algorithm with a joint Bayesian learning strategy. Because CSC-SR involves unknown parameters, the performance of the algorithm depends on how these parameters are chosen. To this end, a coupled Beta-Bernoulli process is employed to infer appropriate filters and sparse coding maps (SCMs) for both the low resolution (LR) image and the high resolution (HR) image. The filters and the SCMs are learned in a joint inference. Experimental results validate the advantages of the proposed approach over the previous CSC-SR and other state-of-the-art SR methods.


Introduction
Image super-resolution (SR) aims to reconstruct a high resolution (HR) image from a single low resolution (LR) image or from several LR images of the same scene [1][2][3][4][5]. SR methods overcome the resolution limitations of low-cost imaging sensors, and they exploit the degradation in the LR images caused by blur and by camera or scene motion to reconstruct the HR image. Since the motion parameters are estimated along with the HR image solely from the LR images, SR is a difficult inverse problem, especially single image SR.
Information is usually lost during the down-sampling procedure, so reconstructing an HR image from an LR image requires exploiting additional prior knowledge. Estimating the missing pixels of the HR image from simple assumptions easily introduces errors, because those assumptions do not hold in many practical systems. To better reconstruct complicated structures in natural images, single image super resolution (SISR) methods exploit local priors on image patches when estimating the HR image [6][7][8][9]. Example-based methods are among the most important SISR methods. Existing example-based methods can be categorized into sparse coding-based and mapping-based methods [10][11][12]. Sparse coding-based methods train a pair of dictionaries for LR and HR image patches, and many approaches have been proposed to build mapping functions between the LR and HR patches [13][14][15][16]. The mapping functions are learned using LR and HR patch pairs [6], [17][18]. To obtain the final result, previous example-based methods average the pixels in the overlapping patches. However, the pixels in the overlapping patches are usually not consistent, and the averaging process may destroy the consistency of the reconstruction. Some approaches have been proposed to overcome this problem and show improved performance in image reconstruction [19].
Recently, the convolutional sparse coding based super-resolution (CSC-SR) method [19] utilized a global sparse coding scheme to better preserve consistency. CSC has been applied in unsupervised learning of visual features [20], [21]. It represents the input signal as a linear combination of N sparse feature maps, each convolved with one of N filters. In CSC-SR, the number of filters used to decompose the LR image differs from the number used to reconstruct the HR image. By processing the image globally rather than patch by patch, the previous CSC-SR has outperformed the sparse coding (SC) based methods. Although CSC-SR adopts a more adaptive decomposition-reconstruction strategy, the model still uses a fixed number of filters. These parameters must therefore be assigned a priori, and a structure on the latent space representing the input data must be imposed to solve the CSC problem. In that case, the sparse coding maps and the filters should be learned optimally; otherwise, errors in estimating the parameters may destabilize the reconstruction of the HR image.
To address the above problems, we present a novel convolutional sparse coding based super-resolution method with two stages. First, we learn the filters and the sparse coding maps adaptively by modeling them with a Beta-Bernoulli distribution when decomposing the LR image. Second, we reuse the same distributions associated with the sparse coding maps when reconstructing the HR image. In each stage, the sparse coding maps are learned by minimizing the CSC objective with a Bayesian inference process. The Bernoulli distributions control how frequently each factor is used. Since the learning processes on the LR and HR images are based on the same distribution, the filters and the sparse coding maps of both stages are inferred simultaneously in a joint inference process. Experimental results show that the proposed method is competitive with state-of-the-art methods.

Related works
Because overlapping patches are coded independently, the existing sparse coding-based methods [13][14][15][16] may break the consistency between them and smooth out high frequency edges and structures of the image. To preserve consistency, convolutional sparse coding was proposed to encode the whole image in [22]. The CSC based super-resolution (CSC-SR) method learns the sparse coding maps of the LR and HR images separately by solving the CSC problem. As in [19], we use a 3×3 low pass filter with all coefficients equal to 1/9 to extract the smooth component of the LR image. Denote by y the LR image, by Y_s the smooth component of y, and by Y the residual component of y; Y carries the high frequency edges and texture structures. In CSC-SR, after the smooth component Y_s is extracted by the low pass filter, a group of LR filters is learned to decompose the residual component Y:

  min_{f^l, A^l} ‖Y − Σ_{i=1}^{N_1} f_i^l * A_i^l‖_F² + λ Σ_{i=1}^{N_1} ‖A_i^l‖_1,    (1)

where {f_i^l}_{i=1...N_1} are the filters, {A_i^l}_{i=1...N_1} are the sparse coding maps, ‖·‖_F is the Frobenius norm, ‖·‖_1 is the l1 norm, and * is the convolution operator. We use the initial sparse coding maps provided on the website http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm. Gu et al. [19] use a stochastic average alternating direction method of multipliers (SA-ADMM) algorithm to efficiently solve the CSC model (1) with a large number of filters. The HR image is likewise decomposed into a smooth component and a residual component; X denotes the residual component carrying the high frequency edges and structures of the HR image. The N_1-dimensional sparse coding space of (A_1^l, A_2^l, ..., A_{N_1}^l) can be transformed into an N_2-dimensional sparse coding space by multiplying by a trained mapping function M ∈ R^{N_1×N_2} [19].
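As a sanity check on notation, the data and sparsity terms of Eq (1) can be sketched in a few lines of NumPy. The helper names are ours, and we assume 'valid' convolution so that an (m+s−1)×(n+s−1) map convolved with an s×s filter returns an m×n image, matching the dimensions used later in the paper:

```python
import numpy as np
from scipy.signal import fftconvolve

def csc_reconstruct(filters, maps):
    # Sum of convolutions: each s x s filter with its sparse coding map.
    return sum(fftconvolve(a, f, mode="valid") for f, a in zip(filters, maps))

def csc_objective(Y, filters, maps, lam):
    # Eq (1): ||Y - sum_i f_i * A_i||_F^2 + lam * sum_i ||A_i||_1.
    residual = Y - csc_reconstruct(filters, maps)
    return np.sum(residual ** 2) + lam * sum(np.abs(a).sum() for a in maps)
```

If Y is built exactly from a set of filters and maps, the objective with λ = 0 is zero, which is a convenient consistency check when implementing a solver.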
Amplifying the transformed matrix (A_1^l, A_2^l, ..., A_{N_1}^l)M by the factor k, the column vectors of the transformed matrix form the initial sparse coding map set {A_j^h}_{j=1...N_2}. The HR filters and the mapping function can be learned by solving another CSC minimization problem,

  min_{f^h, M} ‖X − Σ_{j=1}^{N_2} f_j^h * A_j^h‖_F² + λ Σ_{j=1}^{N_2} ‖A_j^h‖_1,    (2)

where A_j^h = A^l(x, y) m_j is the j-th sparse coding map of the HR image, M is the mapping function matrix of size N_1×N_2, and {f_j^h}_{j=1...N_2} is the set of HR filters. After solving the minimization problem Eq (2) by the SA-ADMM algorithm, the mapping function M and the filters f_j^h, j = 1...N_2 are learned, and finally the residual component of the HR image, carrying the high frequency texture structure, is reconstructed as the summation of the convolutions of the HR filters and the sparse coding maps: X = Σ_{j=1}^{N_2} f_j^h * A_j^h.
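The feature-space transform described above is just a matrix product followed by scaling. A minimal sketch (the function name is ours, not from the paper's code):

```python
import numpy as np

def transform_scm(A_l, M, k):
    """Map LR sparse coding maps to initial HR maps.

    A_l: (P, N1) matrix whose columns are the vectorized LR maps A_i^l.
    M:   (N1, N2) trained mapping function (non-negative columns summing to 1).
    k:   zooming factor used to amplify the transformed maps.
    """
    return k * (A_l @ M)
```

Because each column of M sums to one, every HR map is a convex combination of the LR maps, then amplified by k.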

Joint Bayesian convolutional sparse coding for image super-resolution
The performance of convolutional sparse coding (CSC) based super resolution still depends on an appropriate choice of the unknown parameters [19]. To address this limitation, we present a joint Bayesian convolutional sparse coding framework for image super resolution (JB-CSC SR). First, rather than using nonparametric Bayesian sparse coding directly, we develop a new beta process to build the Bayesian CSC model. Second, since different numbers of filters and corresponding sparse coding maps are used to decompose the LR image and reconstruct the HR image, the two CSC problems are solved by jointly learning a mapping function between the sparse coding maps in the two feature spaces.

Bayesian based convolutional sparse coding model for decomposing LR image
In this paper, we propose a new CSC model for super resolution with a Bayesian prior. As shown in S1 Fig, the SR model is a linear combination of N atoms with the corresponding coefficients. Likewise, the traditional CSC model is a linear combination of convolutions whose linear coefficients all equal one. Therefore, adaptive linear coefficients should be added to the CSC model.
For convenience, we vectorize the residual component of the LR image Y, the filters f_i^l (i = 1...N_1), and the sparse coding maps A_i^l in Eq (1); thus we have the residual component Y ∈ R^P, filters f_i^l ∈ R^S, and sparse coding maps a_i^l ∈ R^{(m+s−1)(n+s−1)}, where P = m×n and S = s×s. The number of filters for decomposing the LR image is N_1. With the base measure ℏ_0 and the parameters a, b > 0, a representation of the beta process is

  ℏ = Σ_{i=1}^{N_1} π_i δ_{d_i^l},  π_i ~ Beta(a/N_1, b(N_1−1)/N_1),  d_i^l ~ ℏ_0,    (3)

where δ_{d_i^l} is a unit point mass at d_i^l. A draw ℏ from the process is a set of N_1 probabilities π_i (i = 1...N_1), each associated with a d_i^l drawn i.i.d. from the base measure ℏ_0. Taking π_i as a Bernoulli parameter, we use ℏ to draw a binary vector z^l,

  z_i^l ~ Bernoulli(π_i),  i = 1...N_1.    (4)

In the limit N_1 → ∞, the number of non-zero elements in z^l follows Poisson(a/b), which controls the number of convolutions that are actually used. Since the residual component Y can be represented as a linear combination of the convolutions d_1^l, d_2^l, ..., d_{N_1}^l, it can be expressed as

  Y = Σ_{i=1}^{N_1} d_i^l w_i^l + ε^l = D^l w^l + ε^l,    (5)

where D^l = (d_1^l, ..., d_{N_1}^l), w^l ∈ R^{N_1} is the coefficient vector, and ε^l ∈ R^P is the error term. With the binary vector z^l ∈ R^{N_1}, the parameters in Eq (5) can be expressed as

  w^l = z^l ∘ s^l,  s^l ~ N(0, γ_s^{-1} I_{N_1}),  ε^l ~ N(0, γ_ε^{-1} I_P),

where ∘ denotes the Hadamard (element-wise) product, and I_{N_1}, I_S, and I_P are identity matrices of size N_1×N_1, S×S, and P×P, respectively. γ_s ~ Gamma(c, d) and γ_ε ~ Gamma(e, f) are drawn from Gamma distributions. Following the model Eq (5) and the general structure of the beta process described in [23], the convolutional sparse coding model with Bayesian prior can be expressed as

  min_{f^l, a^l} ‖Y − Σ_{i=1}^{N_1} (f_i^l * a_i^l) w_i^l‖_F² + λ Σ_{i=1}^{N_1} ‖a_i^l w_i^l‖_1,    (6)

where w^l is defined as in Eq (5). We rewrite Eq (6) as

  min_{f^l, a^l} ‖Y − Σ_{i=1}^{N_1} f_i^l * (a_i^l w_i^l)‖_F² + λ Σ_{i=1}^{N_1} ‖a_i^l w_i^l‖_1,    (7)

which is a typical convolutional sparse coding minimization problem with Bayesian priors.
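The finite beta-Bernoulli construction in Eqs (3)-(5) can be sketched directly. This is a toy draw from the prior, not the inference procedure, and the function name is ours:

```python
import numpy as np

def draw_weights(N1, a, b, gamma_s, rng):
    # pi_i ~ Beta(a/N1, b(N1-1)/N1): usage probabilities of the beta process.
    pi = rng.beta(a / N1, b * (N1 - 1) / N1, size=N1)
    # z_i ~ Bernoulli(pi_i): which convolutions are active.
    z = rng.binomial(1, pi)
    # s ~ N(0, gamma_s^{-1} I): Gaussian magnitudes.
    s = rng.normal(0.0, gamma_s ** -0.5, size=N1)
    # w = z o s (Hadamard product), as in Eq (5).
    return z * s, z
```

As N_1 grows, the number of active entries of z^l concentrates around Poisson(a/b), which is what lets the model infer how many filters are actually needed rather than fixing the number a priori.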
The weighted linear combination of the convolutions, Σ_{i=1}^{N_1} (f_i^l * a_i^l) w_i^l, is equivalent to the summation of the convolutions of the filters f_i^l, i = 1...N_1, with the weighted sparse coding maps a_i^l w_i^l, i = 1...N_1. Analogous to [19], we adopt the SA-ADMM algorithm to solve the minimization problem Eq (7), training the filters f_i^l (i = 1...N_1) and learning the new sparse coding maps a_i^l w_i^l (i = 1...N_1) with the Bayesian prior.
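The equivalence used here, w_i (f_i * a_i) = f_i * (w_i a_i), is just linearity of convolution, and is easy to verify numerically:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
f = rng.normal(size=(3, 3))    # a filter f_i^l
a = rng.normal(size=(10, 10))  # its sparse coding map a_i^l
w = 0.7                        # a scalar weight w_i^l

# Weighting the convolution result equals convolving with the weighted map.
lhs = w * fftconvolve(a, f, mode="valid")
rhs = fftconvolve(w * a, f, mode="valid")
assert np.allclose(lhs, rhs)
```

This is what allows the Bayesian weights to be absorbed into the sparse coding maps so that the standard SA-ADMM solver can be reused unchanged.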

Bayesian based convolutional sparse coding model for reconstructing HR image
The authors of [19] train a mapping function to obtain the sparse coding maps of the HR image from those of the LR image. However, inappropriate choices of the unknown parameters, including the number of convolutions for the HR image and the parameters of the mapping function, may destabilize the algorithm. To address this problem, we build a convolutional sparse coding (CSC) model with Bayesian prior to learn the HR filters for reconstructing the HR image. For convenience, given the sparse coding maps learned from Eq (7), let A^l = (a_1^l w_1^l, ..., a_{N_1}^l w_{N_1}^l) ∈ R^{(m+s−1)(n+s−1)×N_1}, and let N_2 be the number of filters for reconstructing the HR image. The authors of [19] transform A^l from the N_1-dimensional feature space to the N_2-dimensional feature space by right-multiplying it with the mapping function M ∈ R^{N_1×N_2} (m_j ≥ 0, |m_j|_1 = 1, where m_j is the j-th column of M), giving

  A^h = A^l M.    (8)

Analogous to Eq (3), we have a representation of the beta process,

  ℏ = Σ_{j=1}^{N_2} π_j δ_{d_j^h},  π_j ~ Beta(a/N_2, b(N_2−1)/N_2),  d_j^h ~ ℏ_0,    (9)

where δ_{d_j^h} is a unit point mass at d_j^h. As with the beta process of the previous subsection, we use ℏ to draw a binary vector z^h with z_j^h ~ Bernoulli(π_j).
The HR residual component can then be expressed as

  X = Σ_{j=1}^{N_2} d_j^h w_j^h + ε^h = D^h w^h + ε^h,    (10)

where D^h = (d_1^h, ..., d_{N_2}^h), and w^h ∈ R^{N_2} and ε^h ∈ R^{kP} are the coefficient vector and the error term, respectively. The coefficient vector w^h is decomposed by the binary vector z^h ∈ R^{N_2}; similar to Eq (5), we have

  w^h = z^h ∘ s^h,  s^h ~ N(0, γ_s^{-1} I_{N_2}),  ε^h ~ N(0, γ_ε^{-1} I_{kP}),

where I_{N_2}, I_{kS}, and I_{kP} are identity matrices of size N_2×N_2, kS×kS, and kP×kP, respectively, and γ_s and γ_ε are defined as in Eq (5). Following the parameter settings in Eq (10), the convolutional sparse coding model with Bayesian prior for learning the HR filters can be expressed as

  min_{f^h, a^h, M} ‖X − Σ_{j=1}^{N_2} (f_j^h * a_j^h) w_j^h‖_F² + λ Σ_{j=1}^{N_2} ‖a_j^h w_j^h‖_1,    (11)

where w^h is defined as in Eq (10).
Since Σ_{j=1}^{N_2} (f_j^h * a_j^h) w_j^h is a weighted linear combination of the convolutions, we rewrite Eq (11) as

  min_{f^h, a^h, M} ‖X − Σ_{j=1}^{N_2} f_j^h * (a_j^h w_j^h)‖_F² + λ Σ_{j=1}^{N_2} ‖a_j^h w_j^h‖_1,    (12)

which is also a convolutional sparse coding minimization problem with Bayesian priors. We solve it by the SA-ADMM algorithm. After training the HR filters {f_j^h}_j, the sparse coding maps {a_j^h}_j, and the mapping function M, the residual component of the HR image is formulated as the summation of the convolutions of the HR filters and the sparse coding maps:

  X = Σ_{j=1}^{N_2} f_j^h * (a_j^h w_j^h).    (13)
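Putting the pieces together, the final HR estimate is the reconstructed residual plus the zoomed smooth component. A sketch, where bicubic zoom via scipy.ndimage is our assumption for the k-factor upscaling of the smooth part:

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import zoom

def reconstruct_hr(hr_filters, hr_maps, Y_s, k):
    # Residual component: sum of HR-filter convolutions with the
    # weighted HR sparse coding maps (each map already includes w_j^h).
    X = sum(fftconvolve(a, f, mode="valid") for f, a in zip(hr_filters, hr_maps))
    # Smooth component of the LR image zoomed by the factor k (bicubic, order=3).
    X_s = zoom(Y_s, k, order=3)
    return X + X_s
```

The shapes line up when each HR map has size (km+s−1)×(kn+s−1), so that the 'valid' convolution with an s×s filter yields a km×kn residual matching the zoomed smooth component.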

Bayesian inference process
Gibbs sampling is usually used to perform Bayesian inference, sampling each hidden variable from its conditional distribution given the other variables and the observations [24]. The authors of [25] update the parameters for the Gibbs sampler according to the posterior distributions of the model; the inference process amounts to iteratively sampling from these posteriors. Inspired by [23], [25], we derive analytical expressions for the Gibbs sampler, starting with the convolutions {d_i^l}_{i=1...N_1} of the LR image instead of the dictionary atoms of [25]. The posterior distribution used to sample the i-th convolution is

  p(d_i^l | −) ∝ p(Y | D^l, w^l, γ_ε) p(d_i^l).    (14)

Let y_{d_i^l} denote the contribution of the convolution d_i^l to the residual component of the LR image; then

  y_{d_i^l} = Y − Σ_{j≠i} d_j^l w_j^l.    (15)

With y_{d_i^l}, the posterior distribution in Eq (14) can be rewritten as

  p(d_i^l | −) ∝ exp(−(γ_ε/2) ‖y_{d_i^l} − d_i^l w_i^l‖²) p(d_i^l).    (16)

Given Eq (15), and taking the base measure ℏ_0 = N(0, P^{-1} I_P), the posterior over a convolution is Gaussian,

  d_i^l | − ~ N(μ_i, Σ_i),    (17)

where Σ_i = (P + γ_ε (w_i^l)²)^{-1} I_P and μ_i = γ_ε w_i^l Σ_i y_{d_i^l}. Once the convolutions have been sampled, we sample the indicator z_i^l. With the prior probability of z_i^l = 1 given by π_i, the posterior probability satisfies

  p(z_i^l = 1 | −) ∝ π_i exp(−(γ_ε/2) ‖y_{d_i^l} − s_i^l d_i^l‖²),    (18)

and with the prior probability of z_i^l = 0 given by 1 − π_i,

  p(z_i^l = 0 | −) ∝ (1 − π_i) exp(−(γ_ε/2) ‖y_{d_i^l}‖²).    (19)

Analogous to the K-SVD algorithm, z_i^l is sampled as a Bernoulli draw with probability

  p(z_i^l = 1 | −) / (p(z_i^l = 1 | −) + p(z_i^l = 0 | −)).    (20)

Having sampled the convolutions and the binary vector, we sample s^l, γ_s, π_i, and γ_ε according to [25].
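The indicator update in Eqs (18)-(20) reduces to a Bernoulli draw whose log-odds compare the residual with and without the i-th convolution's contribution. A sketch with our variable names, worked in log space for numerical stability:

```python
import numpy as np

def sample_z_i(y_di, d_i, s_i, pi_i, gamma_eps, rng):
    """Gibbs draw for the binary indicator z_i^l.

    y_di: residual with the i-th convolution's contribution removed (Eq (15)).
    d_i, s_i: the i-th convolution and its Gaussian magnitude.
    """
    ll1 = -0.5 * gamma_eps * np.sum((y_di - s_i * d_i) ** 2)  # z_i = 1, Eq (18)
    ll0 = -0.5 * gamma_eps * np.sum(y_di ** 2)                # z_i = 0, Eq (19)
    log_odds = np.log(pi_i) - np.log1p(-pi_i) + ll1 - ll0
    p1 = 1.0 / (1.0 + np.exp(-log_odds))                      # normalized, Eq (20)
    return int(rng.binomial(1, p1))
```

When the contribution s_i d_i explains the residual well, the log-odds are strongly positive and z_i is almost surely set to one, activating that convolution.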

Summary of algorithm
The proposed method consists of two stages: first, we solve the CSC problem to learn the LR filters and sparse coding maps; second, we solve the CSC problem to learn the HR filters, the sparse coding maps, and the mapping function for reconstructing the HR image. Incorporating the Bayesian prior and the inference process into the convolutional sparse coding model, we summarize the proposed method in Algorithm 1.

Algorithm 1
1. Input the training LR image y and decompose it into the smooth component Y_s and the residual component Y;
2. Initialize the parameters z^l, s^l, ε^l, d_i^l, and f_i^l (i = 1...N_1) as in Eq (5);
3. Solve Eq (6) by the SA-ADMM algorithm to obtain f_i^l (i = 1...N_1) and the weighted sparse coding maps A^l = (a_i^l w_i^l)_{i=1...N_1}, updating z^l by Eq (20) and updating s^l, ε^l, and f_i^l (i = 1...N_1) accordingly; then zoom the smooth component Y_s into X_s by the factor k;
4. Initialize M = (m_1, m_2, ..., m_{N_2}) ∈ R^{N_1×N_2} with m_j ~ N(0, S^{-1} I_{N_1}), and the parameters z^h, s^h, ε^h, d_j^h, and f_j^h (j = 1...N_2) as in Eq (10);
5. Transform the sparse coding maps by A^h = A^l M and solve Eq (12) by the SA-ADMM algorithm to obtain the HR filters f_j^h (j = 1...N_2), the weighted sparse coding maps a_j^h w_j^h, and the mapping function M; reconstruct the residual component X as the summation of the convolutions of the HR filters and the sparse coding maps, and output the HR image X + X_s.
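The control flow above can be sketched as follows. Here `learn_lr` and `learn_hr` stand in for the SA-ADMM solvers with the Gibbs updates described in the text (they are placeholders, not the paper's implementation), and the 3×3 averaging filter is the one used to split off the smooth component:

```python
import numpy as np
from scipy.ndimage import uniform_filter, zoom
from scipy.signal import fftconvolve

def decompose(y):
    # Smooth component via the 3x3 low-pass filter with coefficients 1/9.
    Y_s = uniform_filter(y, size=3, mode="nearest")
    return Y_s, y - Y_s

def jb_csc_sr(y, k, learn_lr, learn_hr):
    Y_s, Y = decompose(y)                       # split smooth / residual
    lr_filters, lr_maps = learn_lr(Y)           # solve Eq (7) on the LR residual
    X_s = zoom(Y_s, k, order=3)                 # zoom smooth component by k
    hr_filters, hr_maps = learn_hr(lr_maps, k)  # solve Eq (12) for HR filters/maps
    # Residual of the HR image: sum of HR-filter convolutions, then add X_s.
    X = sum(fftconvolve(a, f, mode="valid") for f, a in zip(hr_filters, hr_maps))
    return X + X_s
```

Plugging in dummy learners with consistent shapes is enough to check that the pipeline's dimensions agree before wiring in the actual solvers.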

Experimental results
In this section, we compare the proposed joint Bayesian convolutional sparse coding (JB-CSC) method with four state-of-the-art SR methods: the Beta process joint dictionary learning method (BPJDL) [16], the convolutional neural network based method (CNN-SR) [18], the convolutional sparse coding based method (CSC-SR) [19], and adjusted ANR (A-ANR) [26]. We use the source code provided on the authors' websites with the parameters they recommend. The original ground truth images are downsized by bi-cubic interpolation to generate LR-HR image pairs for both training and evaluation.

Parameter setting
For a fair comparison between CSC-SR [19] and our model, we adopt the same training set provided in [19], the same regularization parameters, and the same treatment of the image boundary. The beta-distribution parameters are set as a = N_1 for the LR image and a = N_2 for the HR image, and b = 1 for both. The hyper-parameters in Eq (5) and Eq (10) are set as c = d = e = f = 10^−6, as suggested in [24]. None of the parameters of the Bayesian inference process were tuned. All comparison experiments were conducted on a PC with an Intel Core i7 2.5 GHz CPU and 4 GB RAM on the Matlab platform.
In S3 Fig, as can be seen from the small window at the bottom-right of the images, the SR results of CNN-SR over-smooth the edges, while BPJDL, A-ANR, and CSC-SR generate very few ringing artifacts since the zooming factor is low. The PSNR values at factor k = 2 show that our method outperforms the others.
In S4 Fig, the small windows show that competing methods such as BPJDL and A-ANR over-smooth more edges, whereas the proposed method preserves the edges better than CSC-SR. In S5 Fig, the results of the proposed method are compared with those of BPJDL, A-ANR, CNN-SR, and CSC-SR. In the small windows, the textures of the hair are preserved much better by BPJDL. This comparison verifies that models with a Bayesian prior, such as BPJDL and the proposed method, are superior in preserving the texture details of the images.
In S6 Fig, the proposed method is compared with the competing methods at zooming factor k = 4. In (b), the result of BPJDL shows that ringing artifacts increase as the zooming factor increases. As shown in (c), (d), (e), and (f) of S6 Fig, ringing artifacts remain in the results of A-ANR, CSC-SR, and the proposed method but not in the result of CNN-SR; however, CNN-SR over-smooths the edges. The PSNR value of the proposed method is higher than those of the competing methods.

Conclusion
In this paper, we present a convolutional sparse coding based super resolution method with a joint Bayesian learning strategy (JB-CSC). JB-CSC employs a coupled Beta-Bernoulli process to incorporate the Bayesian prior into the convolutional sparse coding model, which avoids the instability caused by estimating the unknown parameters. Unlike the previous CSC method, the filters and sparse feature maps for both the low resolution (LR) and high resolution (HR) images are learned adaptively through the Bayesian learning strategy. Experimental results validate the advantages of the proposed approach over the previous CSC-SR and other state-of-the-art SR methods.