Enhancing sparse representation of color images by cross channel transformation

Transformations for enhancing sparsity in the approximation of color images by 2D atomic decomposition are discussed. Sparsity is first considered with respect to the most significant coefficients in the wavelet decomposition of the color image. The discrete cosine transform is singled out as an effective 3-point transformation for this purpose. The enhanced sparsity is further exploited by approximating the transformed arrays with an effective greedy strategy using a separable, highly redundant dictionary. The relevance of the achieved sparsity is illustrated by a simple encoding procedure. On typical test images the compression at high quality recovery is shown to improve significantly upon the JPEG and WebP formats.


Introduction
In the signal processing field sparse representation usually refers to the approximation of a signal as a superposition of elementary components, called atoms, which are members of a large redundant set, called a dictionary [1]. The superposition, termed atomic decomposition, aims at approximating the signal with as few atoms as possible [1][2][3][4]. Sparsity is also relevant to data collection. Within the emerging sampling theory known as compressive sensing (CS) this property is key for reconstructing signals from a reduced number of measurements [5][6][7]. In particular, distributed compressive sensing (DCS) algorithms exploit the inter-signal correlation structure for multiple signal recovery [8,9]. Sparse image representation using redundant dictionaries has been considered in numerous works, e.g. [11][12][13], and in the context of applications such as image restoration [14][15][16], feature extraction [17][18][19][20] and super resolution [21][22][23][24].
The sparsity property of some types of 3D images benefits from 3D processing [25][26][27][28]. In particular, most RGB (Red-Green-Blue) images admit a sparser atomic decomposition if approximated by 3D atoms [29]. Within the 3D framework, though, the gain in sparsity comes at the expense of computational cost.
The purpose of this work is to show that the application of a transformation across the direction of the colors improves the sparsity of the wavelet domain representation of the 2D color channels. The relevance of this feature is demonstrated by a simple encoding procedure rendering good compression results in comparison to the state of the art.
The contributions of the paper are listed below.
• Highlighting of a distinctive feature concerning sparse representation of RGB images. Namely, that transformations in the direction of the color channels significantly improve the sparsity of the 2D atomic decomposition of the image.
• Demonstration that the discrete cosine transform (dct) can improve sparsity with respect to the optimal decorrelation transform, i.e. the transform given by the eigenvectors of the correlation matrix, which is thereby image dependent.
• Showcasing of the relevance of sparsity to image compression with high quality recovery. On a set of 15 typical test images the achieved compression is shown to improve significantly upon the JPEG and WebP formats. The results are competitive with those produced by JPEG2000.
The work is organised as follows: Sec. 3 introduces the mathematical notation. Sec. 4 compares several cross color transformations enhancing sparsity. A numerical example on a large data set is used to illustrate the suitability of the dct for that purpose. Sec. 5 demonstrates the gain in sparsity obtained by the atomic decomposition of color images when the dct is applied across the RGB channels. Sec. 6 illustrates the relevance of the approach to image compression with high quality recovery. The conclusions are summarised in Sec. 7.

Mathematical Notation
Throughout the paper we use the following notational convention. R represents the set of real numbers. Boldface letters indicate Euclidean vectors and 2D and 3D arrays. Standard mathematical fonts indicate components, e.g., d ∈ R^N is a vector of components d(i) ∈ R, i = 1, . . ., N. The elements of a 2D array I ∈ R^{N_x×N_y} are indicated as I(i, j), i = 1, . . ., N_x, j = 1, . . ., N_y, and the color channels of I ∈ R^{N_x×N_y×3} as I(:, :, z), z = 1, 2, 3. The transpose of a matrix, G say, is indicated as G^⊤. A set of, say M, color images is represented by the arrays I{m} ∈ R^{N_x×N_y×3}, m = 1, . . ., M. The inner product between arrays A, B ∈ R^{N_x×N_y} is given by the Frobenius inner product

⟨A, B⟩_F = Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} A(i, j) B(i, j).

Consequently, the Frobenius norm is calculated as ∥A∥_F = ⟨A, A⟩_F^{1/2}. The inner product between vectors a, b ∈ R^N is given by the Euclidean inner product ⟨a, b⟩ = Σ_{i=1}^{N} a(i) b(i).

Cross color transformations
Given a color image I(i, j, k), i = 1, . . ., N_x, j = 1, . . ., N_y, k = 1, 2, 3, the processing of the 3 channels can be realised either in the pixel/intensity or in the wavelet domain. Since the representation of most images is sparser in the wavelet domain [29][30][31][32][33], we approximate in that domain and reconstruct the approximated image by the inverse wavelet transform. Thus, using a 3 × 3 matrix T, we construct the transformed arrays U and W as follows:

U(:, :, z) = Σ_{l=1}^{3} I(:, :, l) T(l, z), z = 1, 2, 3, (1)

W(:, :, z) = dwt(U(:, :, z)), z = 1, 2, 3, (2)

where dwt indicates the 2D wavelet transform. For the transformation T we consider the following cases:
(i) The dct.
(ii) The reversible YCbCr color space transform.
(iii) The principal components (PC) transform.
(iv) A transformation learned from an independent set of images.
The dct is given by the matrix

T_dct = [ 0.5774   0.7071   0.4082
          0.5774   0       -0.8165
          0.5774  -0.7071   0.4082 ],

whose columns are the orthonormal 3-point dct basis vectors. The YCbCr color space transform is given by the matrix below [35]:

T_YCbCr = [ 0.299  -0.169   0.5
            0.587  -0.331  -0.419
            0.114   0.5    -0.0813 ].
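As an illustration, the 3-point dct matrix above and the cross channel operation of Eq. (1) can be sketched in a few lines of pure Python. This is a minimal sketch, not the MATLAB implementation used in the paper; the function names are ours.

```python
import math

def dct_matrix_3():
    """Build the orthonormal 3-point DCT-II matrix T, with the basis
    vectors stored column-wise so that U = I * T matches Eq. (1)."""
    T = [[0.0] * 3 for _ in range(3)]
    for k in range(3):           # basis-vector index (column)
        w = math.sqrt(1.0 / 3) if k == 0 else math.sqrt(2.0 / 3)
        for l in range(3):       # colour-channel index (row)
            T[l][k] = w * math.cos(math.pi * (2 * l + 1) * k / 6.0)
    return T

def cross_channel_transform(image, T):
    """Apply U(i, j, z) = sum_l I(i, j, l) * T[l][z] at every pixel.
    `image` is a nested list of [R, G, B] triples."""
    return [[[sum(px[l] * T[l][z] for l in range(3)) for z in range(3)]
             for px in row] for row in image]
```

Note that for a gray pixel (equal R, G, B values) the second and third transformed channels vanish, which is precisely the energy-compaction effect exploited in the sequel.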
The columns of the principal components transform are the normalised eigenvectors of the covariance matrix of the RGB pixels. Thus, the transformation is image dependent and optimal with respect to decorrelating the color channels. As a first test, the approximation of the transformed channels is realised by keeping a fixed number of the largest absolute value entries, and setting the others equal to zero. In relation to this, for an image of size L_x × L_y × 3 we define the Sparsity Ratio (SR) as follows:

SR = (L_x · L_y · 3) / (number of nonzero entries in the three channels). (3)

The quality of a reconstructed image Ĩ, with respect to the original 8-bit image I, is compared using the Peak Signal-to-Noise Ratio,

PSNR = 10 log_10 (255² / MSE),

where MSE is the mean squared error between I and Ĩ over all the entries of the three channels. For the numerical examples below the transformation corresponding to case (iv) is learned from a set of images I{m}, m = 1, . . ., M, all of the same size. Starting from an invertible 3 × 3 matrix T_k, with k = 1, the learning algorithm proceeds through the following instructions.
1) Use T_k to transform the arrays I{m} → U_k{m} → W_k{m} as in (1) and (2).
2) Approximate each transformed array W_k{m} to obtain W̃_k{m} by keeping the largest K absolute value entries.
3) Apply the inverse 2D wavelet transform idwt to reconstruct the approximated arrays Ũ_k{m}, m = 1, . . ., M, as Ũ_k{m}(:, :, z) = idwt(W̃_k{m}(:, :, z)), z = 1, 2, 3.
4) Use the original images I{m}, m = 1, . . ., M, to find G = T^{-1} by least squares fitting, i.e. as the matrix minimising the error

E = Σ_{m=1}^{M} Σ_{z=1}^{3} ∥I{m}(:, :, z) − Σ_{l=1}^{3} Ũ_k{m}(:, :, l) G(l, z)∥²_F.

5) While E decreases, or the maximum number of allowed iterations has not been reached, set T_{k+1} = G^{-1}, increase k → k + 1 and repeat from 1).

Given the arrays Ũ_k{m}, m = 1, . . ., M, the least squares problem for determining the transformation T_k has a unique solution. However, the joint optimisation with respect to the arrays Ũ_k{m}, m = 1, . . ., M, and the transformation T_k is not convex. Hence, the algorithm's outcome depends on the initial value.
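The alternating structure of steps 1)-5) can be sketched as follows. This is a hypothetical pure-Python illustration: the dwt/idwt pair is omitted (taken as the identity) and each image is flattened to an N × 3 array of pixels, so only the transform / hard-threshold / least-squares alternation is shown; all function names are ours.

```python
def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

def hard_threshold(W, K):
    """Keep the K largest-magnitude entries of the N x 3 array W (step 2)."""
    flat = [(abs(W[i][z]), i, z) for i in range(len(W)) for z in range(3)]
    keep = set((i, z) for _, i, z in sorted(flat, reverse=True)[:K])
    return [[W[i][z] if (i, z) in keep else 0.0 for z in range(3)]
            for i in range(len(W))]

def learn_transform(images, T0, K, iters=10):
    """Alternating learning loop (steps 1-5), with dwt taken as identity."""
    T = T0
    for _ in range(iters):
        # steps 1-2: transform each image and hard-threshold it
        approx = [hard_threshold(matmul(I, T), K) for I in images]
        # steps 3-4: least-squares fit of G = T^{-1} from I ~ U_tilde * G,
        # solved through the normal equations accumulated over all images
        UtU = [[0.0] * 3 for _ in range(3)]
        UtI = [[0.0] * 3 for _ in range(3)]
        for I, U in zip(images, approx):
            Ut = transpose(U)
            UtU = [[UtU[i][j] + v for j, v in enumerate(row)]
                   for i, row in enumerate(matmul(Ut, U))]
            UtI = [[UtI[i][j] + v for j, v in enumerate(row)]
                   for i, row in enumerate(matmul(Ut, I))]
        G = matmul(inv3(UtU), UtI)
        T = inv3(G)                  # step 5: next transform
    return T
```

When K keeps every entry the approximation is exact and the loop leaves the initial transform unchanged, which is a quick sanity check of the alternation.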
The transformation (iv) used in the numerical examples of Secs. 4.1 and 5.1 has been learned from M = 100 images, all of size 384 × 512 × 3, from the UCID data set [34], which contains images of buildings, places and cars. The learning curves for two random orthonormal initialisations are shown in Fig. 1.
It is worth mentioning that, as shown in Fig. 1, learning is richer when starting comparatively distant from a local minimum (left graph in Fig. 1). However, since the convergence is to a local minimum, other random initialisations, even if generating less learning, may give better results (right graph in Fig. 1).
The aim of the next numerical test is to demonstrate the effect on the SR (cf. (3)) produced by the transformations (i)-(iv) above, applied across the color channels.

Numerical Test I
Using the whole Berkeley data set [36], consisting of 300 images all of size 321 × 481 × 3, we proceed with each image as in (1) and (2). The dwt corresponds to the Cohen-Daubechies-Feauveau 9/7 (CDF97) wavelet family. Fig. 3 shows the transformed channels of the image in Fig. 2, with the dct transformation across channels (right graph) and without any transformation T (left graph).
After approximating the arrays W by keeping only their largest absolute value entries, the image is reconstructed as

Ĩ(:, :, z) = Σ_{l=1}^{3} Ũ(:, :, l) T^{-1}(l, z), z = 1, 2, 3,

where T^{-1} is the inverse of T and Ũ(:, :, z) = idwt(W̃(:, :, z)). When no transformation T is considered the image is reconstructed directly by the inverse wavelet transform. The 2nd and 4th columns of Table 1 show the mean value PSNR (PSNR), with respect to the whole data set, for SR=20 and SR=10 respectively, corresponding to the transformations T listed in the 1st column of the table. The 3rd and 5th columns give the standard deviations (std).
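The measures used in this test, namely hard thresholding to a fixed number of nonzero entries, the SR of (3) and the PSNR, can be sketched as follows. This is a minimal pure-Python illustration on flat coefficient lists; the function names are ours.

```python
import math

def keep_largest(coeffs, K):
    """Keep the K largest-magnitude entries of a flat coefficient list,
    zeroing the rest (the thresholding used for the first test)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                   reverse=True)
    keep = set(order[:K])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

def sparsity_ratio(total_entries, nonzero):
    """SR of Eq. (3): total number of entries over nonzero entries."""
    return total_entries / nonzero

def psnr(original, reconstructed):
    """PSNR for 8-bit data: 10*log10(255^2 / MSE)."""
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    return 10.0 * math.log10(255.0 ** 2 / mse)
```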
While Table 1 shows that all the transformations (i)-(iv) render comparably superior results with respect to not applying a transformation (case (v)), the dct slightly exceeds the others. Case (iv) refers to the best results achieved when initialising the learning algorithm with 500 different random orthonormal transformations. When initialising the algorithm with the transformations (i)-(iii) it appears that each of these transformations is close to a local minimiser of the algorithm. This stems from the fact that such initialisations do not generate significant learning.
The common feature of most of the 300 images in the data set used in this numerical example is the correlation of the three color channels. This property is assessed by the correlation coefficients

r(z₁, z₂) = ⟨Γ(:, :, z₁), Γ(:, :, z₂)⟩_F / (∥Γ(:, :, z₁)∥_F ∥Γ(:, :, z₂)∥_F), z₁, z₂ = 1, 2, 3,

where Γ(i, j, z) = I(i, j, z) − Ī(:, :, z), and Ī(:, :, z) indicates the mean value of channel z.
As seen in the 3 histograms of Fig. 4, for most images in the data set the correlation of the color channels is high. It is surprising then that the PC transformation (iii), which completely decorrelates the channels, does not outperform the other transformations; on the contrary, it performs slightly worse. This feature has also been noticed in the context of bit allocation for subband color image compression [37].

Approximations by atomic decomposition
We have seen (cf. Table 1) that by transformation of the channels it is possible to gain quality when reducing the number of nonzero entries in the channels. Now we discuss how to improve quality further by approximating the 2D arrays (2) by an atomic decomposition, rather than just by neglecting their less significant entries. For the approximation to be successful it is important to use an appropriate dictionary. To this end, one possibility could be to learn the dictionary from training data [38][39][40][41][42][43][44][45]. However, as demonstrated in previous works [29,32,33], a separable dictionary, which is easy to construct, is well suited for the purpose of achieving sparsity and delivers a fast implementation of the approach. Since we use that dictionary in the numerical examples, below we describe the method for constructing the atomic decomposition of the array W considering specifically a separable dictionary. Firstly we concatenate the 3 planes W(:, :, z), z = 1, 2, 3, into an extended 2D array W ∈ R^{3L_x×L_y} and divide this array into small blocks W_q, q = 1, . . ., Q. Without loss of generality the blocks are assumed to be square, of size N_b × N_b say, and are approximated using the separable dictionaries D^x = {d^x_n ∈ R^{N_b}}_{n=1}^{M_x} and D^y = {d^y_n ∈ R^{N_b}}_{n=1}^{M_y}. For q = 1, . . ., Q every element W_q is approximated by an atomic decomposition as below:

W^{k_q}_q = Σ_{n=1}^{k_q} c^q(n) d^x_{ℓ^{x,q}_n} (d^y_{ℓ^{y,q}_n})^⊤, (9)

where ℓ^{y,q}_n is the index in the set {1, 2, . . ., M_y} corresponding to the label of the atom in the dictionary D^y contributing to the n-th term in the approximation of the q-th block. The index ℓ^{x,q}_n has the equivalent description. The assembling of the approximated blocks gives rise to the approximated array W^K, i.e., W^K = Ĵ^Q_{q=1} W^{k_q}_q, where K = Σ^Q_{q=1} k_q and Ĵ represents the assembling operation.
The approximation of the partition W_q, q = 1, . . ., Q, is carried out iteratively as a two step process which selects i) the atoms in the atomic decomposition (9) and ii) the sequence in which the blocks in the partition are approximated. The procedure is called the Hierarchized Block Wise (HBW) implementation of greedy pursuit strategies [31,46]. For the selection of the atoms we apply the Orthogonal Matching Pursuit (OMP) approach [47] dedicated to 2D with separable dictionaries (OMP2D) [48]. Thus, the whole algorithm is termed HBW-OMP2D [31]. The method iterates as described below.
On setting k_q = 0 and R⁰_q = W_q, at iteration k_q + 1 the algorithm selects the indices ℓ^{x,q}_{k_q+1} and ℓ^{y,q}_{k_q+1} as follows:

(ℓ^{x,q}_{k_q+1}, ℓ^{y,q}_{k_q+1}) = arg max_{n=1,...,M_x; m=1,...,M_y} |⟨d^x_n, R^{k_q}_q d^y_m⟩|, (10)

where R^{k_q}_q = W_q − W^{k_q}_q. The calculation of W^{k_q}_q is realised in order to minimise ∥R^{k_q}_q∥_F, which is equivalent to finding the orthogonal projection onto the subspace spanned by the selected atoms {A_n = d^x_{ℓ^{x,q}_n} (d^y_{ℓ^{y,q}_n})^⊤}^{k_q}_{n=1}. In our implementation, the calculation of the coefficients c^q(n), n = 1, . . ., k_q, in (9) is realised as

c^q(n) = ⟨B^{k_q}_n, W_q⟩_F, n = 1, . . ., k_q,

where the set {B^{k_q}_n}^{k_q}_{n=1} is biorthogonal to the set {A_n}^{k_q}_{n=1} and needs to be upgraded and updated to account for each newly selected atom. Starting from B¹_1 = A_1/∥A_1∥²_F, the updating and upgrading is realised through the recursive equations [48,49]:

B^{k_q+1}_{k_q+1} = γ_{k_q+1}/∥γ_{k_q+1}∥²_F, with γ_{k_q+1} = A_{k_q+1} − Σ^{k_q}_{n=1} A_n ⟨B^{k_q}_n, A_{k_q+1}⟩_F,

B^{k_q+1}_n = B^{k_q}_n − B^{k_q+1}_{k_q+1} ⟨A_{k_q+1}, B^{k_q}_n⟩_F, n = 1, . . ., k_q,

with an additional re-orthogonalisation step. As discussed in [31,46], for ℓ^{x,q}_{k_q+1} and ℓ^{y,q}_{k_q+1}, q = 1, . . ., Q, the indices resulting from (10), the block to be approximated in the next iteration corresponds to the value q̂ such that

q̂ = arg max_{q=1,...,Q} |⟨d^x_{ℓ^{x,q}_{k_q+1}}, R^{k_q}_q d^y_{ℓ^{y,q}_{k_q+1}}⟩|.
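The selection rule (10) and the rank-one residual update can be illustrated with a simplified pure-Python sketch. Note that this is plain matching pursuit: the biorthogonal projection of the full HBW-OMP2D is omitted, so the coefficients coincide with the orthogonal ones only when the selected atoms happen to be mutually orthogonal; the function name and toy sizes are ours.

```python
def mp2d_separable(W, Dx, Dy, k_max):
    """Greedy 2D pursuit with separable dictionaries. At each iteration
    pick the atom pair maximising |<d_x, R d_y>| as in Eq. (10), store
    that inner product as the coefficient, and subtract the rank-one
    term c * d_x d_y^T from the residual (plain matching pursuit)."""
    Nb = len(W)
    R = [row[:] for row in W]       # residual, initialised to the block
    chosen = []                     # list of (coefficient, n, m) triples
    for _ in range(k_max):
        best = (0.0, 0, 0)
        for n, dx in enumerate(Dx):
            for m, dy in enumerate(Dy):
                ip = sum(dx[i] * R[i][j] * dy[j]
                         for i in range(Nb) for j in range(Nb))
                if abs(ip) > abs(best[0]):
                    best = (ip, n, m)
        c, n, m = best
        chosen.append((c, n, m))
        for i in range(Nb):         # rank-one residual update
            for j in range(Nb):
                R[i][j] -= c * Dx[n][i] * Dy[m][j]
    return chosen, R
```

With an orthonormal dictionary the residual after k_max steps is exactly the projection error, so the sketch reproduces the OMP2D coefficients in that special case.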
The algorithm stops when the required total number K of atoms has been selected. This number can be fixed using the SR, which is now calculated as SR = (L_x · L_y · 3)/K. Remark 1. The above described implementation of HBW-OMP2D is very effective in terms of speed, but demanding in terms of memory (the partial outputs corresponding to all the blocks in the partition need to be stored at every iteration). An alternative implementation, termed HBW Self Projected Matching Pursuit (HBW-SPMP) [32,50], would enable the application of the identical strategy to much larger images than the ones considered in this work.

Numerical example II
For this and the next numerical example, we use a mixed dictionary consisting of two classes of sub-dictionaries of different nature:
I) The trigonometric dictionaries D^x_C and D^x_S, defined as

D^x_C = {w_c(n) cos(π(2j−1)(n−1)/(2M)), j = 1, . . ., N_b}^M_{n=1},
D^x_S = {w_s(n) sin(π(2j−1)n/(2M)), j = 1, . . ., N_b}^M_{n=1},

where w_c(n) and w_s(n), n = 1, . . ., M, are normalisation factors.
II) The dictionary D^x_L, which is constructed by translation of the prototype atoms in Fig. 5. The mixed dictionary D^x is built as D^x = D^x_C ∪ D^x_S ∪ D^x_L. Notice that, while case (v), which does not include any transformation T, gives superior results compared with merely disregarding entries (cf. Table 1), the results improve further when applying any of the transformations (i)-(iv). The PC transform, however, appears significantly less effective than the others. In addition to rendering the best results, the dct brings along the additional advantage of being orthonormal. Consequently, it does not magnify errors at the inversion step. Because of this, we single out the dct as the most convenient cross color transformation out of the four considered here.

Application to image compression
In order to achieve compression by filing an atomic decomposition we need to address two issues: namely, the quantisation of the coefficients c^q(n), n = 1, . . ., k_q, q = 1, . . ., Q, in (9), and the storage of the indices (ℓ^{x,q}_n, ℓ^{y,q}_n), n = 1, . . ., k_q, q = 1, . . ., Q. We tackle these matters by simple but effective procedures [33].
For q = 1, . . ., Q the absolute value coefficients |c^q(n)|, n = 1, . . ., k_q, are converted to integers through uniform quantisation with step Δ as

c^Δ_q(n) = ⌊|c^q(n)|/Δ + 1/2⌋, n = 1, . . ., k_q.

The signs of the coefficients are encoded separately as a vector, s_q, using a binary alphabet. Each pair of indices (ℓ^{x,q}_n, ℓ^{y,q}_n) corresponding to the atoms in the decomposition of the block W_q is mapped into a single index o_q(n). The set o_q(1), . . ., o_q(k_q) is sorted in ascending order, o_q(n) → õ_q(n), n = 1, . . ., k_q, to take the differences δ_q(n) = õ_q(n) − õ_q(n−1), n = 2, . . ., k_q, and construct the string of non-negative numbers õ_q(1), δ_q(2), . . ., δ_q(k_q). The order of the set õ_q(n), n = 1, . . ., k_q, induces order in the unsigned coefficients, c^Δ_q(n) → c̃^Δ_q(n), and in the corresponding signs, s_q(n) → s̃_q(n).
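A sketch of the uniform quantisation step and its approximate inverse follows. This is a hypothetical pure-Python illustration; the exact quantiser of the MATLAB codec [52] may differ in detail, and the function names are ours.

```python
def quantise(coeffs, delta):
    """Uniform quantisation of absolute values: round |c|/delta to the
    nearest integer. Signs are kept in a separate binary vector, as in
    the text (1 for non-negative, 0 for negative)."""
    mags = [int(abs(c) / delta + 0.5) for c in coeffs]
    signs = [1 if c >= 0 else 0 for c in coeffs]
    return mags, signs

def dequantise(mags, signs, delta):
    """Approximate inverse: c ~ +/- delta * quantised magnitude, so the
    reconstruction error per coefficient is at most delta / 2."""
    return [(m * delta) * (1 if s else -1) for m, s in zip(mags, signs)]
```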
For each q the number 0 is added at the end of the indices õ_q(n), n = 1, . . ., k_q, before concatenation, to be able to separate the strings corresponding to different blocks. Each sequence of strings corresponding to q = 1, . . ., Q is concatenated and encoded using the off-the-shelf MATLAB function Huff06 [51], which implements Huffman coding.
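The index mapping, sorting and delta coding, together with the 0 terminator separating blocks, can be sketched as follows (a pure-Python illustration with 1-based atom indices; the function names are ours):

```python
def encode_indices(pairs, Mx):
    """Map each (lx, ly) atom-index pair (1-based) to a single index,
    sort in ascending order, and store the first value followed by the
    successive differences; a terminating 0 separates blocks. Since no
    atom is selected twice, all differences are at least 1, so the 0
    terminator is unambiguous."""
    o = sorted((ly - 1) * Mx + lx for lx, ly in pairs)
    deltas = [o[0]] + [o[n] - o[n - 1] for n in range(1, len(o))]
    return deltas + [0]

def decode_indices(stream, Mx):
    """Invert the delta coding, stopping at the 0 terminator, and map
    each combined index back to its (lx, ly) pair."""
    pairs, acc = [], 0
    for d in stream:
        if d == 0:
            break
        acc += d
        lx = (acc - 1) % Mx + 1
        ly = (acc - 1) // Mx + 1
        pairs.append((lx, ly))
    return pairs
```

The small delta values produced by the sorting are precisely what makes the subsequent Huffman coding effective.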
The compression rate is given in bits-per-pixel (bpp), which is defined as

bpp = (size of the file in bits) / (number of intensity pixels in a single channel).
At the reconstruction stage the indices (ℓ̃^{x,q}_n, ℓ̃^{y,q}_n), n = 1, . . ., k_q, are recovered from the string of differences δ_q(n), n = 2, . . ., k_q. The signs of the coefficients are read from the binary string. The quantised unsigned coefficients are read and transformed into real numbers as |c̃^q(n)| = Δ · c̃^Δ_q(n), n = 1, . . ., k_q. The codec for reproducing the examples in the next sections has been made available on [52].

Numerical Example III
The relevance to image compression of the sparsity achieved by the dct cross color transformation is illustrated in this section by comparison with the results yielded by the compression standards JPEG, WebP, and JPEG2000, on the 15 images in Table 3. These are typical images, commonly used for testing compression algorithms. All the results have been obtained in the MATLAB environment (version R2019a), using a machine with an Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz and 8 GB RAM. For the image approximation the HBW-OMP2D method was implemented as a C++ MEX file. All the channels of all the images were partitioned into blocks of size 16 × 16. The approximation times to produce the results in Table 4 are displayed in the last column of Table 3.
For realising the comparison we proceed as follows: we set the required value of PSNR as that produced by JPEG at quality=95 and tune the compression with the other methods to produce the same PSNR. In our codec the tuning is realised by approximating the image up to PSNR_o = 1.025 · PSNR (where PSNR is the targeted quality) and setting the quantisation parameter Δ so as to reproduce the targeted quality. For compression with JPEG and JPEG2000 we use the MATLAB imwrite command.

Conclusions
The application of a cross color transformation for enhancing sparsity in the atomic decomposition of RGB images has been proposed. It was demonstrated that the effect of the transformation is to re-distribute the most significant values in the dwt of the 2D channels. As a result, when approximating the arrays by disregarding the less significant entries, the quality of the reconstructed image improves with respect to disabling the cross color transformation. Four transformations have been considered: (i) a 3-point dct, (ii) the reversible YCbCr color space transform, (iii) the PC transform, and (iv) a transformation learned from an independent set of images. The quality of the image approximation was improved further by approximating the transformed arrays by an atomic decomposition using a separable dictionary and the greedy pursuit strategy HBW-OMP2D. The dct was singled out as the most convenient cross color transformation for approximating RGB color images in the wavelet domain.
The approximation approach has been shown to be relevant to image compression. By means of a simple coding strategy the achieved compression considerably improves upon the compression standards JPEG and WebP. On a set of 15 typical test images the results are at least as good as those produced by JPEG2000. We feel confident that the findings communicated here will motivate the consideration of the dct as a cross color transformation for other image processing applications which also benefit from the sparsity of a representation.

Figure 1 :
Figure 1: Learning curves for the transformation (iv) corresponding to two different random orthogonal transforms initialising the process. The mean value PSNR with respect to the 100 images in the training set corresponds to SR=15 for all the images.

Figure 2 :
Figure 2: One of the RGB images in the Berkeley data set.

Figure 3 :
Figure 3: Magnitude of the entries in the array W constructed as in (2) from the image of Fig. 2, with T the dct (right graph) and without T (left graph).

Figure 4 :
Figure 4: Histograms of the correlation coefficients between the RGB channels.

Figure 5 :
Figure 5: Prototypes (each in a different color) that generate the dictionary D^x_L by sequential translations of one point.

Table 1 :
Mean value PSNR, with respect to the 300 images in the Berkeley data set. The approximation of each image is realised by setting the least significant entries in the arrays W{m}, m = 1, . . ., 300, equal to zero, in order to obtain SR=20 for all the images (2nd column) and SR=10 for all the images (4th column).

Table 2
shows the improvement in PSNR achieved by atomic decompositions using the mixed dictionary for SR=20 and SR=10.

Table 2 :
Mean value PSNR, with respect to the 300 images in the Berkeley data set, produced by the 2D atomic decomposition of the arrays W{m}, m = 1, . . ., 300, in order to obtain SR=20 (2nd column) and SR=10 (4th column).

Table 3 :
Test images. The last column gives the approximation times to produce the results in Table 4. (Compression with JPEG and JPEG2000 was realised via the MATLAB imwrite command; the comparison with WebP used the software for Ubuntu distributed on [55].)