A competitive scheme for storing sparse representation of X-Ray medical images

A competitive scheme for economic storage of the informational content of an X-Ray image, as it can be used for further processing, is presented. It is demonstrated that a sparse representation of this type of data can be encapsulated in a small file without affecting the quality of the recovered image. The proposed representation, which falls within the context of data reduction, provides a format for saving the image information in a way that could assist methodologies for analysis and classification. The size of the resulting file is compared against those produced by the compression standards JPEG and JPEG2000.


Introduction
Sparse image representation refers to particular techniques for data reduction. Rather than representing the informational content of an image by the intensity of its pixels, the image is transformed with the aim of reducing the number of data points needed to reproduce the equivalent information. Lessening the cardinality of medical data is crucial for remote diagnosis and treatment [1][2][3][4]. Interest in this matter within the area of medical technology has recently been invigorated by the prospect of earlier disease detection, using neural networks and deep learning methodologies for automatic analysis of X-Ray plates [5][6][7].
In recent work [8] we have demonstrated that sparse representation, obtained by a large dictionary and greedy algorithms, renders high quality approximation of X-Ray images. The framework was proven to produce approximations which are far sparser than those arising from traditional transformations such as the Cosine and Wavelet Transforms. Nevertheless, for the approximation to be useful within the context of automatic health care systems and remote reporting, it is necessary to ensure that the high levels of achieved sparsity are not affected by saving the reduced data in a small file. This paper follows on the study in [8] by presenting a simple scheme to store the sparse approximation of an X-Ray medical image in a file of competitive size with respect to the most commonly used formats, namely JPEG and JPEG2000 (JPEG2).
Although in general X-Ray images are sparser if the approximation is performed in the wavelet domain, for analysis purposes the approximation in the pixel intensity domain may be needed. Hence, we consider here approximations in both domains. It is pertinent to stress that the aim of this work is not to propose yet one more method for image compression, but a method for economic storage of the informational content of an X-Ray image as it can be used for further processing [7,[9][10][11][12][13][14][15]. The comparison of the resulting file size, against those produced by JPEG and JPEG2, is carried out with the purpose of illustrating the effectiveness of the proposed scheme.

Sparse image representation
Throughout the paper $\mathbb{R}$ stands for the set of real numbers. Lower boldface letters are used to represent vectors and upper boldface letters to represent matrices. Standard mathematical fonts indicate their corresponding components, e.g., $\mathbf{d} \in \mathbb{R}^{N}$ is a vector of components $d(i),\; i = 1,\ldots,N$, and $\mathbf{I} \in \mathbb{R}^{N_x \times N_y}$ a matrix of elements $I(i,j),\; i = 1,\ldots,N_x,\; j = 1,\ldots,N_y$.
Suppose that an image, given as a 2D array $\mathbf{I} \in \mathbb{R}^{N_x \times N_y}$ of intensity values, is to be approximated by the linear decomposition

$$\mathbf{I}^k = \sum_{n=1}^{k} c(n)\, \mathbf{D}_{\ell_n}, \qquad (1)$$

where each $c(n)$ is a scalar and each $\mathbf{D}_{\ell_n}$ is an element of $\mathbb{R}^{N_x \times N_y}$ normalized to unity, called an 'atom', to be selected from a redundant set, $\mathcal{D} = \{\mathbf{D}_n\}_{n=1}^{M}$, called a 'dictionary'. The selection of the atoms $\mathbf{D}_{\ell_n}$ in (1) is carried out according to an optimality criterion.
The goal of a sparse representation is to produce an approximation $\mathbf{I}^k$ of an image $\mathbf{I}$ using as few atoms as possible. The mathematical methods for performing the task are either based on the minimization of the $\ell_1$-norm [16,17] or are greedy strategies for stepwise selection of atoms from the dictionary [18,19]. When using large dictionaries, the latter are especially suited for practical applications. We focus here on the greedy algorithm known as Orthogonal Matching Pursuit (OMP), which is very effective at furnishing sparse solutions. Although the method is computationally intensive, the implementation described in the next section is very efficient in terms of processing time.

Orthogonal Matching Pursuit in 2D
The OMP method was introduced in [19] and has been implemented by a number of different algorithms. We describe a particular implementation for 2D, henceforth referred to as OMP2D. The dictionary is restricted to be separable, i.e. a 2D dictionary $\mathcal{D}$ which corresponds to the tensor product of two 1D dictionaries, $\mathcal{D} = \mathcal{D}^x \otimes \mathcal{D}^y$. Our implementation of OMP2D is based on adaptive biorthogonalization and Gram-Schmidt orthogonalization procedures, as proposed in [20] for the one dimensional case. For the convenience of the interested researcher we include the description of its generalization to separable 2D dictionaries, as it is implemented by the MATLAB and C++ MEX functions we have made available.
Given a gray level intensity image, $\mathbf{I} \in \mathbb{R}^{N_x \times N_y}$, and two 1D dictionaries, $\mathcal{D}^x = \{\mathbf{d}^x_n \in \mathbb{R}^{N_x}\}_{n=1}^{M_x}$ and $\mathcal{D}^y = \{\mathbf{d}^y_m \in \mathbb{R}^{N_y}\}_{m=1}^{M_y}$, we approximate the array $\mathbf{I}$ by the atomic decomposition of the form:

$$\mathbf{I}^k = \sum_{n=1}^{k} c(n)\, \mathbf{d}^x_{\ell^x_n} \left(\mathbf{d}^y_{\ell^y_n}\right)^\top, \qquad (2)$$

with residual

$$\mathbf{R}^k = \mathbf{I} - \mathbf{I}^k. \qquad (3)$$

The coefficients $\{c(n)\}_{n=1}^{k}$ in (3) are such that $\|\mathbf{R}^k\|_F$ is minimum ($\|\cdot\|_F$ being the Frobenius norm induced by the Frobenius inner product $\langle \cdot, \cdot \rangle_F$). This is guaranteed by calculating the coefficients as

$$c(n) = \langle \mathbf{B}^k_n, \mathbf{I} \rangle_F, \quad n = 1,\ldots,k, \qquad (4)$$

where the matrices $\{\mathbf{B}^k_n\}_{n=1}^{k}$ are recursively constructed, at each iteration, to account for each newly selected atom. Denoting by $\mathbf{A}_n = \mathbf{d}^x_{\ell^x_n} (\mathbf{d}^y_{\ell^y_n})^\top$ the selected atoms, and starting from $\mathbf{W}_1 = \mathbf{A}_1$ and $\mathbf{B}^1_1 = \mathbf{A}_1 / \|\mathbf{A}_1\|_F^2$, the set of matrices is upgraded and updated through the formula:

$$\mathbf{B}^{k+1}_n = \mathbf{B}^{k}_n - \mathbf{B}^{k+1}_{k+1} \langle \mathbf{A}_{k+1}, \mathbf{B}^{k}_n \rangle_F, \quad n = 1,\ldots,k, \quad \text{with} \quad \mathbf{B}^{k+1}_{k+1} = \frac{\mathbf{W}_{k+1}}{\|\mathbf{W}_{k+1}\|_F^2}, \quad \mathbf{W}_{k+1} = \mathbf{A}_{k+1} - \sum_{n=1}^{k} \frac{\mathbf{W}_n}{\|\mathbf{W}_n\|_F^2} \langle \mathbf{W}_n, \mathbf{A}_{k+1} \rangle_F. \qquad (5)$$

For numerical accuracy of the orthogonal set $\{\mathbf{W}_n\}_{n=1}^{k+1}$ at least one re-orthogonalization step is usually needed. It requires the recalculation of the element $\mathbf{W}_{k+1}$ as

$$\mathbf{W}_{k+1} \leftarrow \mathbf{W}_{k+1} - \sum_{n=1}^{k} \frac{\mathbf{W}_n}{\|\mathbf{W}_n\|_F^2} \langle \mathbf{W}_n, \mathbf{W}_{k+1} \rangle_F. \qquad (6)$$

The algorithm iterates until, for a given parameter $\rho$, the stopping criterion $\|\mathbf{I} - \mathbf{I}^k\|_F^2 < \rho$ is met.
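The greedy selection and orthogonal projection just described can be sketched in a few lines. The fragment below is an illustrative sketch only: unlike the MATLAB/C++ MEX functions accompanying the paper, it computes the orthogonal projection by least squares rather than by the adaptive biorthogonalization recursion (the two are mathematically equivalent, though the recursion is more efficient). Function name and signature are our own.

```python
import numpy as np

def omp2d(I, Dx, Dy, tol, max_atoms=None):
    """Greedy 2D Orthogonal Matching Pursuit with a separable dictionary.

    I   : (Nx, Ny) array to approximate.
    Dx  : (Nx, Mx) matrix whose columns are unit-norm 1D atoms.
    Dy  : (Ny, My) matrix whose columns are unit-norm 1D atoms.
    tol : stop when ||I - I^k||_F^2 < tol.
    Returns the approximation Ik, coefficients c, and selected index pairs."""
    Nx, Ny = I.shape
    if max_atoms is None:
        max_atoms = Nx * Ny
    R = I.copy()
    sel = []              # selected (ix, iy) index pairs
    A = []                # vectorized selected 2D atoms
    Ik = np.zeros_like(I)
    c = np.zeros(0)
    while np.sum(R**2) >= tol and len(sel) < max_atoms:
        # inner products of the residual with every separable atom:
        # G[i, j] = <d^x_i (d^y_j)^T, R>_F
        G = Dx.T @ R @ Dy
        ix, iy = np.unravel_index(np.argmax(np.abs(G)), G.shape)
        sel.append((ix, iy))
        A.append(np.outer(Dx[:, ix], Dy[:, iy]).ravel())
        # orthogonal projection of I onto the span of the selected atoms
        M = np.stack(A, axis=1)
        c, *_ = np.linalg.lstsq(M, I.ravel(), rcond=None)
        Ik = (M @ c).reshape(Nx, Ny)
        R = I - Ik
    return Ik, c, sel
```

For instance, with the trivial (orthonormal) dictionaries `Dx = Dy = np.eye(4)`, an image consisting of a single nonzero pixel is recovered exactly with one atom.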

Approximation by partitioning
The OMP2D approach described above is to be applied on an image partition. This implies the division of the image into small disjoint blocks $\mathbf{I}_q,\; q = 1,\ldots,Q$, which without loss of generality we assume to be square of dimension $N_b \times N_b$. Each element of the partition is approximated independently of the others to produce the approximations $\mathbf{I}^{k_q}_q,\; q = 1,\ldots,Q$, of the form (2). The superscript $k_q$ indicates the number of atoms intervening in the decomposition of the block $q$. As already mentioned, although the approximation of most X-Ray images is significantly sparser if performed in the wavelet domain, results in the pixel intensity domain may be required for particular applications.
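The partition into disjoint square blocks, and the inverse operation of joining the approximated blocks back into an image, can be sketched as follows (an illustrative Python sketch with our own names; it assumes the image dimensions are multiples of the block size, as is the case after standard padding):

```python
import numpy as np

def partition(I, Nb):
    """Split image I into disjoint Nb x Nb blocks, in row-major order.
    Assumes both image dimensions are multiples of Nb."""
    Nx, Ny = I.shape
    return [I[i:i + Nb, j:j + Nb]
            for i in range(0, Nx, Nb)
            for j in range(0, Ny, Nb)]

def assemble(blocks, Nx, Ny, Nb):
    """Inverse of partition: join the blocks back into an (Nx, Ny) image."""
    I = np.empty((Nx, Ny))
    it = iter(blocks)
    for i in range(0, Nx, Nb):
        for j in range(0, Ny, Nb):
            I[i:i + Nb, j:j + Nb] = next(it)
    return I
```

Applying `assemble` to the output of `partition` restores the original image exactly.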
The approximation in the wavelet domain involves a transformation, via a Wavelet Transform, which converts the image into the array to be approximated. The inverse transformation is taken after the approximation is concluded. Fig 1 provides graphical illustrations of image partitions in the pixel intensity and the wavelet domain.
Remark 1: The OMP2D algorithm described in Sec. 2.1 is very effective up to some block size. While the actual size depends on the sparsity of the image, previous studies indicate that in general for block sizes greater than 24 × 24 an alternative implementation is advisable. The alternative implementation, called Self Projected Matching Pursuit (SPMP2D) [8,21] is not only dedicated to tackling large dimensional problems, but also potentially suitable for implementation in Graphic Processing Unit (GPU) programming. However, because for X-Ray medical images a partition into blocks of size 16 × 16 is a good compromise between sparsity and processing time, we have focussed on the implementation of OMP2D as given in Sec. 2.1. Other possibilities for approximating a partition are discussed in [8,22].

Mixed dictionaries
The available literature in relation to the construction of dictionaries for image representation is mainly concerned with methodologies for learning atoms from training data [23][24][25][26][27][28]. Those methodologies, however, are not designed for learning the types of dictionaries of interest here. We restrict the dictionary to be separable in order to reduce the computational burden and memory requirements. In previous works [8,21,29] we have demonstrated that very large separable dictionaries, which are easy to construct, render high levels of sparsity. Such dictionaries are not specific to a particular class of images. A discrimination is made only to take into account whether the approximation is carried out in the pixel intensity or in the wavelet domain. In each domain we use mixed dictionaries which are similar in nature, but not equal. They have the trigonometric (cosine and sine) dictionaries $\mathcal{D}^x_C$ and $\mathcal{D}^x_S$ as common components.
For approximations in the wavelet domain we add the dictionary $\mathcal{D}^x_{Lw}$, as proposed in [8], which is built by translation of the prototype atoms in the right graph of Fig 2. The mixed dictionary $\mathcal{D}^x_{wd}$, to be used only in the wavelet domain, is formed as $\mathcal{D}^x_{wd} = \mathcal{D}^x_C \cup \mathcal{D}^x_S \cup \mathcal{D}^x_{Lw}$. For approximations in the pixel intensity domain we add the dictionary $\mathcal{D}^x_{Lp}$, which is built by translation of the prototype atoms in the left graph of Fig 2; in this case the mixed dictionary is $\mathcal{D}^x_{pd} = \mathcal{D}^x_C \cup \mathcal{D}^x_S \cup \mathcal{D}^x_{Lp}$. The resulting 2D dictionaries are very large, but are not needed as such: all calculations are carried out using the 1D dictionaries.
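The construction pattern of a redundant trigonometric 1D dictionary can be sketched as below. This is only an illustration of the idea: the exact prototype atoms of $\mathcal{D}^x_C$ and $\mathcal{D}^x_S$ used in [8] and Fig 2 may differ from the particular cosine/sine grids chosen here, and the function names are our own. The key features are redundancy ($M > N$ atoms of length $N$) and column-wise normalization to unity.

```python
import numpy as np

def cosine_dict(N, M):
    """Redundant cosine dictionary: M unit-norm atoms of length N (M > N).
    The exact trigonometric prototypes of the paper may differ; this only
    illustrates the construction pattern."""
    D = np.cos(np.pi * np.outer(2 * np.arange(1, N + 1) - 1,
                                np.arange(M)) / (2 * M))
    return D / np.linalg.norm(D, axis=0)   # normalize each atom to unity

def sine_dict(N, M):
    """Redundant sine dictionary, built on the same grid."""
    D = np.sin(np.pi * np.outer(2 * np.arange(1, N + 1) - 1,
                                np.arange(1, M + 1)) / (2 * M))
    return D / np.linalg.norm(D, axis=0)

# a mixed 1D dictionary: concatenation of the trigonometric components
Dx = np.hstack([cosine_dict(16, 32), sine_dict(16, 32)])
```

A localized component (the translated prototypes of Fig 2) would be appended in the same way, as extra columns.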

Coding strategy
For the sparse representation of an X-Ray image to be useful within the current trend of medical technology developments it should be suitable for encapsulation in a small file. Accordingly, the coefficients of the atomic decompositions need to be converted into integer numbers. This operation is known as quantization. We adopt a simple and commonly used uniform quantization technique. For $q = 1,\ldots,Q$ the absolute value coefficients $|c_q(n)|,\; n = 1,\ldots,k_q$ are converted to integers as follows:

$$c^{\Delta}_q(n) = \left\lceil \frac{|c_q(n)| - \theta}{\Delta} \right\rceil, \quad n = 1,\ldots,k_q, \qquad (7)$$

where $\lceil x \rceil$ indicates the smallest integer number greater than or equal to $x$, $\Delta$ is the quantization parameter, and $\theta$ the threshold to disregard coefficients of small magnitude (coefficients with $|c_q(n)| < \theta$ are mapped to zero). The signs of the coefficients are encoded separately, as a vector $\mathbf{s}_q$, using a binary alphabet.
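The quantization map (7) and its inverse, used later in the decoding stage, can be sketched as follows (Python, illustrative; function names are our own):

```python
import numpy as np

def quantize(c, delta, theta):
    """Uniform quantization of coefficient magnitudes as in Eq. (7).
    Coefficients with |c| < theta are discarded (mapped to 0)."""
    mag = np.abs(c)
    q = np.ceil((mag - theta) / delta).astype(int)
    q[mag < theta] = 0
    signs = np.sign(c).astype(int)
    return q, signs

def dequantize(q, signs, delta, theta):
    """Inverse map: |c| = delta*q + (theta - delta/2) for nonzero q,
    i.e. the midpoint of the quantization bin."""
    mag = delta * q + (theta - delta / 2.0)
    return signs * np.where(q > 0, mag, 0.0)
```

The recovered magnitudes fall at bin midpoints, so the error on every retained coefficient is at most $\Delta/2$.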
In order to store the information about the particular atoms present in the approximation of each block, we proceed as follows. Firstly, each pair of indices $(\ell^{x,q}_n, \ell^{y,q}_n)$ corresponding to the atoms in the decomposition of the block $\mathbf{I}_q$ is mapped into a single index $o_q(n)$. Then the set $o_q(1),\ldots,o_q(k_q)$ is sorted in ascending order, $o_q(n) \to \tilde{o}_q(n),\; n = 1,\ldots,k_q$. This guarantees that, for each $q$-value, $\tilde{o}_q(i) < \tilde{o}_q(i+1),\; i = 1,\ldots,k_q - 1$. The order of the indices induces an order in the unsigned coefficients, $c^{\Delta}_q \to \tilde{c}^{\Delta}_q$, and in the corresponding signs, $\mathbf{s}_q \to \tilde{\mathbf{s}}_q$. The advantage introduced by the ascending order of the indices is that they can be stored as smaller positive numbers, by taking differences between two consecutive values. Certainly, by defining $\delta_q(n) = \tilde{o}_q(n) - \tilde{o}_q(n-1),\; n = 2,\ldots,k_q$, the string $\tilde{o}_q(1), \delta_q(2),\ldots,\delta_q(k_q)$ stores the indices for the block $q$ with unique recovery. The number 0 is then used to separate the strings corresponding to different blocks, giving rise to the string $st_{ind}$ (8). The quantized magnitudes of the re-ordered coefficients are concatenated in the string $st_{cf}$ (9), and the corresponding signs in the string $st_{sg}$ (10). The next encoding/decoding scheme summarizes the above described procedure.
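The index re-ordering and difference coding just described can be sketched as follows. This is an illustrative Python sketch: the particular pair-to-single-index map is our assumption (any bijection onto the positive integers works, keeping 0 free to act as the block separator), and the function names are our own.

```python
def pair_to_index(ix, iy, My):
    """Map an atom index pair (ix, iy) to a single positive index.
    The +1 keeps 0 free to act as the block separator; the exact map
    used in the reference software is an assumption here."""
    return ix * My + iy + 1

def encode_indices(o):
    """Sort one block's single indices; store the first value followed by
    consecutive differences, which are smaller positive numbers."""
    o_sorted = sorted(o)
    return [o_sorted[0]] + [b - a for a, b in zip(o_sorted, o_sorted[1:])]

def decode_indices(codes):
    """Recover the sorted indices by cumulative sums."""
    out, acc = [], 0
    for d in codes:
        acc += d
        out.append(acc)
    return out
```

Because the sorted indices are strictly increasing, every stored difference is at least 1, so a 0 unambiguously marks the boundary between blocks.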

Encoding
• Given an image partition $\mathbf{I}_q \in \mathbb{R}^{N_b \times N_b},\; q = 1,\ldots,Q$, use OMP2D (or another method) to approximate each element of the partition by the atomic decomposition:

$$\mathbf{I}^{k_q}_q = \sum_{n=1}^{k_q} c_q(n)\, \mathbf{d}^x_{\ell^{x,q}_n} \left(\mathbf{d}^y_{\ell^{y,q}_n}\right)^\top. \qquad (11)$$

• The approximation is carried out on each block, independently of the others, until the stopping criterion is reached.
• For each $q$, quantize as in (7) the magnitude of the coefficients in the decomposition (11) to obtain $c^{\Delta}_q(n),\; n = 1,\ldots,k_q$. Store the signs of the non-zero coefficients as components of a vector $\mathbf{s}_q$.
• For each $q$, map the pairs of indices $(\ell^{x,q}_n, \ell^{y,q}_n),\; n = 1,\ldots,k_q$ in (11) into the single indices $o_q(n),\; n = 1,\ldots,k_q$, and sort these numbers in ascending order to have the re-ordered sets $\tilde{o}_q(1),\ldots,\tilde{o}_q(k_q)$; $\tilde{c}^{\Delta}_q(1),\ldots,\tilde{c}^{\Delta}_q(k_q)$; and $\tilde{s}_q(1),\ldots,\tilde{s}_q(k_q)$, to create the strings $st_{ind}$, as in (8), and $st_{cf}$ and $st_{sg}$, as in (9) and (10):
$st_{ind}$ contains the indices of the atoms, stored as a first value followed by consecutive differences, with 0 separating the blocks.
$st_{cf}$ contains the magnitudes of the corresponding coefficients (quantized to integer numbers).
$st_{sg}$ contains the signs of the coefficients in binary format.
The quantization parameter $\Delta$ also needs to be stored in the file. We fix $\theta = 1.3\Delta$ for all the images.

Decoding
• Recover the indices from their differences. This operation also gives the information about the number of coefficients in each block.
• Read the quantized unsigned coefficients from the string $st_{cf}$ and transform them into real numbers as $|c^r_q(n)| = \Delta\, c^{\Delta}_q(n) + (\theta - \Delta/2)$. Read the corresponding signs from the string $st_{sg}$.
• Recover the approximated partition, for each block, through the linear combination

$$\mathbf{I}^{k_q}_q = \sum_{n=1}^{k_q} c^r_q(n)\, \mathbf{d}^x_{\ell^{x,q}_n} \left(\mathbf{d}^y_{\ell^{y,q}_n}\right)^\top, \quad q = 1,\ldots,Q. \qquad (12)$$

• Assemble the recovered image as

$$\mathbf{I}^k = \hat{J}_{q=1}^{Q}\, \mathbf{I}^{k_q}_q, \qquad (13)$$

where $\hat{J}$ indicates the operation for joining the blocks to restore the image.
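The linear combination (12), rebuilding one block from its recovered coefficients and atom index pairs, can be sketched as follows (illustrative Python; name and signature are our own):

```python
import numpy as np

def reconstruct_block(coeffs, pairs, Dx, Dy):
    """Eq. (12): rebuild one block as a sum of rank-one separable atoms,
    each weighted by its recovered (dequantized, signed) coefficient."""
    Iq = np.zeros((Dx.shape[0], Dy.shape[0]))
    for c, (ix, iy) in zip(coeffs, pairs):
        Iq += c * np.outer(Dx[:, ix], Dy[:, iy])
    return Iq
```

The assembled image of (13) is then obtained by placing the $Q$ recovered blocks back at their positions in the partition.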

Note:
The MATLAB scripts for implementing the scheme and reproducing the results presented in the next section have been made available on [30].

Evaluation metrics
The quality of the approximation is quantified by the Mean Structural SIMilarity (MSSIM) index [31,32] and the classical Peak Signal-to-Noise Ratio (PSNR), calculated as

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^8 - 1)^2}{\mathrm{MSE}}, \quad \text{with} \quad \mathrm{MSE} = \frac{\|\mathbf{I} - \mathbf{I}^k\|_F^2}{N_x N_y}.$$

The required MSSIM is set to be MSSIM ≥ 0.997. This limit guarantees that the approximation is indistinguishable from the image at the original size (notice that the MSSIM between an image and itself is 1). For the comparison with standard formats all the PSNRs are fixed at the values for which JPEG and JPEG2 produce the required MSSIM. For producing a requested PSNR with the sparse representation approach we proceed as follows: the approximation routine is set to yield a slightly larger value of PSNR, and the required one is then obtained by tuning the quantization parameter $\Delta$.
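The PSNR computation for 8-bit images is a one-liner (illustrative sketch; the peak value $2^8 - 1 = 255$ corresponds to the 8-bit corpora used in this study):

```python
import numpy as np

def psnr(I, Ik, peak=255.0):
    """PSNR in dB for images with the given peak value:
    10 log10(peak^2 / MSE), MSE being the mean squared error."""
    mse = np.mean((I - Ik) ** 2)
    return 10.0 * np.log10(peak**2 / mse)
```

For example, a constant error of 25.5 on an 8-bit image gives MSE $= 255^2/100$ and hence a PSNR of exactly 20 dB.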
As a measure of sparsity we use the Sparsity Ratio, which is defined as [33]:

$$\mathrm{SR} = \frac{\text{Number of pixels in the image}}{\text{Number of coefficients in the representation}}.$$

Accordingly, the sparsity of a representation is manifested as a high value of SR. In addition to the SR, which is a global measure of sparsity, a meaningful description of the variation of the image content throughout the partition is rendered by the local sparsity ratio, which is given as

$$sr(q) = \frac{N_b^2}{k_q}, \qquad (14)$$

where $k_q$ is the number of coefficients in the decomposition of the $q$-th block and $N_b^2$ is the number of pixels in the block.
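Both measures amount to simple ratios; a minimal sketch (our own function names):

```python
def sparsity_ratio(num_pixels, num_coefficients):
    """Global SR: pixels in the image over coefficients in the representation."""
    return num_pixels / num_coefficients

def local_sparsity_ratios(k, Nb):
    """Eq. (14): sr(q) = Nb^2 / k_q for every block q, given the list k
    of atom counts per block."""
    return [Nb * Nb / kq for kq in k]
```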

Data sets
We illustrate the effectiveness of the proposed encoding scheme by storing the outputs of high quality sparse approximation of two data sets: 1) the Lukas 2D 8 bit medical image corpus, available on [34] and 2) the sample of 25 X-ray chest images taken randomly from the National Institute of Health (NIH) dataset available on [35].
The Lukas corpus consists of 20 images which cover different parts of the human body, as shown in Fig 3. The average size of the images in the set is 1943 × 1364 pixels.
The chest X-Ray images are available in the raster graphics file format Portable Network Graphics (png). Information related to the NIH X-Ray collection is given in [36]. The 25 images used here have been placed on [30]. The sample consists of common views of the chest in radiology: the posteroanterior, anteroposterior, and lateral views, which are obtained by changing the relative orientation of the body and the direction of the X-Ray beam. The images in this set are of similar size and smaller than those in the Lukas corpus. The average size is 496 × 512 pixels.

Results
The approximation of all the images in the Lukas corpus is performed in both the pixel intensity and the wavelet domain. The size of the blocks in the image partition is fixed taking into account previously reported results [8,21], which indicate that 16 × 16 is a good trade-off between the resulting sparsity and the processing time. The information about the sizes of the corresponding files is given in bits per pixel (bpp) in the third and fourth columns of Table 1. The results for JPEG and JPEG2 are placed in the fifth and sixth columns, respectively. The last two rows of the table are the mean value and standard deviation (std) of the corresponding columns.
As shown in Table 1, all the files corresponding to approximations in the wavelet domain ($S_{wd}$) are smaller than those corresponding to approximations in the pixel intensity domain ($S_{pd}$), and also smaller than the JPEG ones. On average the files with the sparse representation in the wavelet domain are 22% smaller than those with the representation in the pixel domain, and 27% smaller than the JPEG files. In its present form JPEG2 produces the smallest files (on average 9% smaller than the files with the representation in the wavelet domain). However, both the JPEG and JPEG2 formats involve an entropy coding step, which is not included in our scheme. Instead, the outputs of our algorithm are stored in HDF5 format [37,38]. In the provided software [30] this is implemented in a straightforward manner using the MATLAB function save. Adding an entropy coding step to the software would reduce the file sizes further, but at the expense of increased processing time.
A very interesting feature of the numerical results is that the quantization process, intrinsic to the economic storage of the coefficients in the image approximation, does not reduce the sparsity. For the sake of comparison, in addition to calculating the SRs obtained with the dictionary approach in both domains, before and after quantization, we have also calculated the corresponding SRs produced by nonlinear thresholding of the wavelet coefficients. The results are shown in Fig 4. As already discussed, since the quantization of coefficients degrades quality, to achieve the required PSNR the approximation of the image has to be carried out up to a higher PSNR value. Nevertheless, because the quantization process maps some coefficients to zero, for the corpus of 20 images in this study quantization does not affect sparsity. On the contrary, as can be observed in Fig 4, for the sparsest images (first 5 images in Table 1) sparsity actually benefits from quantization. It is also clear that for the images in the upper part of the table the SR in the wavelet domain is significantly larger than in the pixel intensity domain. Moreover, the level of sparsity achieved by the dictionary approach is, in both domains, significantly higher than that achieved by nonlinear thresholding of the wavelet coefficients. If the dictionary approach operates in the wavelet domain, then after quantization the mean value gain in SR with respect to thresholding of the wavelet coefficients is 163%, with a standard deviation of 17%. In the pixel intensity domain the corresponding gain is 113%, with a standard deviation of 30%.

Table 1. Comparison of size rate (in bpp) for the Lukas corpus, listed in the first column. The third column shows the bpp values corresponding to the sparse representation in the pixel domain ($S_{pd}$). The fourth column shows the corresponding results in the wavelet domain ($S_{wd}$). The fifth and sixth columns are the bpp values for the formats JPEG and JPEG2, respectively. All the approaches render MSSIM ≥ 0.997 and the values of PSNR listed in the second column.
It is worth mentioning that the local sparsity ratio (c.f. (14)) can be used to produce a digital summary of the images. Indeed, each of the graphs of Fig 5 depicts the local sparsity ratio, across the blocks of the partition, for one of the images in the Lukas corpus: Hand0, Foot1, Head1, Sinus1, Leg0, Thorax0, Breast0, Pelvis1 and Spine1. This suggests that the digital summary of the images could be of assistance to classification and feature extraction techniques. For recent publications in that area of application see [39][40][41][42][43][44][45].
The approximation of the 25 chest X-Ray images is carried out to achieve a PSNR of 45 dB for all the images. This guarantees an MSSIM of at least 0.99 for every image. The resulting file sizes lead to the same remarks as the values displayed in Table 1 for the previous example. Indeed, as shown in Fig 6, the size rate produced by the dictionary approach in the pixel intensity domain ($S_{pd}$) is competitive with the JPEG format for the same quality. The files produced in the wavelet domain ($S_{wd}$) are smaller than the JPEG files but larger than the JPEG2 ones. However, we recall that the entropy coding step, which is part of the JPEG and JPEG2 compression standards, is not included in our scheme. This results in larger files but fast processing. Certainly, using the MATLAB environment on a small notebook with a 2.9 GHz dual-core i7 3520M CPU and 4 GB of memory, the average time for carrying out the approximation and creating the files of the 25 images with the dictionary approach is 1.7 s per image (with standard deviation 0.5 s). This time is the average of five independent runs in the pixel intensity domain and another five independent runs in the wavelet domain.
In this case the sparsity of the image representation benefits from quantization even more than in the previous case: the dictionary approach in the pixel intensity domain yields a mean value SR ($\overline{\mathrm{SR}}$) equal to 25.4 before quantization and $\overline{\mathrm{SR}} = 36.7$ after quantization. In the wavelet domain $\overline{\mathrm{SR}} = 35.4$ before quantization, while $\overline{\mathrm{SR}} = 52.0$ after quantization.

Conclusions
A scheme for encapsulating the sparse representation of an X-Ray image in a small file has been proposed. The approach operates within the over-complete dictionary framework, which achieves a notable level of sparsity in the image representation. An interesting feature, emerging from the numerical results illustrating the approach, is that the process of devising the small file does not affect the sparsity of the representation. We consider this a very important outcome, because the central aim of the approach is to reduce, as much as possible, the cardinality of the data representing the informational content of the images. Additionally, the competitiveness of the resulting file size was assessed against the standard formats JPEG and JPEG2. The MATLAB routines implementing all the steps of the process to reproduce the results in Table 1 and Fig 6 have been made available on a dedicated website [30].
As a final remark it should be mentioned that the quality of the image representation in the present study is very high. A lower value of MSSIM might be acceptable as a diagnostically lossless representation of X-Ray images [32]. Within the proposed scheme the only effects of setting a lower value of MSSIM are: a) the approximation routine runs faster, since fewer atoms are chosen, and b) the storage file is of course smaller. The right value of MSSIM is to be set according to how the file is to be used, whilst taking into account specialized opinions.

Acknowledgments
Thanks are due to Prof. Rainer Köster and Dr. Jürgen Abel for making available the Lukas corpus of 8-bit 2D medical images [34].