
Simultaneous optimized orthogonal matching pursuit with application to ECG compression

Abstract

A greedy pursuit strategy which finds a common basis for approximating a set of similar signals is proposed. The strategy extends the Optimized Orthogonal Matching Pursuit approach to the selection of a subspace containing the approximations of all the signals in the set. The method, called Simultaneous Optimized Orthogonal Matching Pursuit, is stepwise optimal in the sense of minimizing, at each iteration, the mean of the squared error norms of the signals in the set. When applied to compression of electrocardiograms, significant gains over other transformation based compression techniques are demonstrated on the MIT-BIH Arrhythmia dataset.

1 Introduction

Important signals in everyday life, such as natural images, audio, and electrocardiogram records, are in general highly compressible. This implies that the original signal, available as a large set of numerical values, can be transformed into a set of much smaller cardinality or a set containing a large proportion of zero values. The transformation, which should not compromise the informational content of the data, is frequently called a sparse representation. Traditional methods for sparse representation of signals are realized by applying an orthogonal transformation and disregarding the least relevant points in the transformed domain. The signal is subsequently recovered by means of the inverse transformation. However, alternative transformations, which are not orthogonal but adapted to the signal at hand, have been shown to render a higher level of sparsity. Such transformations aim at representing a signal as a superposition of elements, called ‘atoms’, which are selected from a large set called a ‘dictionary’. The superposition is said to be sparse if it involves a number of atoms much smaller than the number of numerical values representing the original signal.

Given a dictionary, the problem of finding the sparsest approximation of a signal, up to some acceptable error, is NP-hard [1]. In practice it is addressed by tractable methodologies known as Pursuit Strategies. Such methodologies can be grouped for the most part into two broad categories, namely Basis Pursuit and Greedy Pursuit Strategies. The Basis Pursuit (BP) approach endeavors to obtain a tractable sparse solution by minimization of the 1-norm [2]. Greedy algorithms seek a sparse solution by stepwise selection of dictionary atoms. When dealing with real data the latter are in general more convenient. From the seminal Matching Pursuit (MP) [3] and Orthogonal Matching Pursuit (OMP) [4] methods, a number of Greedy Pursuit Strategies have been developed to improve the process of sparsely representing single signals [5–16]. Due to complexity issues and memory requirements, most of these techniques are applied by segmenting the signal and approximating each segment independently of the others. Nonetheless, when the segments bear similarity to each other, for some applications it is convenient to look for the dictionary atoms suitable to represent all the segments simultaneously. The Greedy Pursuit Strategy dedicated to simultaneously approximating a set of signals is based on OMP [4] and has been termed Simultaneous Orthogonal Matching Pursuit (SOMP) [17]. Since in this work we extend the Optimized Orthogonal Matching Pursuit (OOMP) method [6] to simultaneously approximate a set of signals, we term the new approach Simultaneous Optimized Orthogonal Matching Pursuit (SOOMP).

The difference between SOMP and the SOOMP approach introduced in this work is equivalent to the difference between the OMP and OOMP methods for approximating single signals. OOMP is stepwise optimal in the sense of minimizing at each iteration the norm of the residual error. Whilst OMP minimizes the norm of the error only with respect to the coefficients of the atomic superposition, OOMP minimizes the norm of the error with respect to both those coefficients and the selection of a new atom. In the case of multiple signals SOOMP is designed to minimize the mean value of the squared error norm. An additional advantage arises from the proposed implementation. Based on adaptive biorthogonalization, the SOOMP method produces at each iteration the dual basis common to the basis of selected atoms. This makes it possible to calculate the coefficients of the representation of each signal in the set simply by computation of inner products. We implement the previous SOMP method in an equivalent manner and compare the two approaches for the approximation of stereo music, by selecting atoms from a highly coherent trigonometric dictionary. The practical relevance of the SOOMP approach is further illustrated by using it for compression of electrocardiogram (ECG) records.

An ECG signal represents a sequence of heartbeats which, if properly segmented and aligned, are suitable to be simultaneously approximated. This property is shown to benefit compression. Reliable comparison with other compression techniques is made possible by recourse to an adaptive quantization procedure that makes it possible to reconstruct the whole ECG record at the required quality. The compression results are shown to significantly improve upon results produced by different transformation based approaches.

The paper is organized as follows: Sect 2 introduces the problem and the mathematical notation. Sect 3 establishes the proposed SOOMP approach for simultaneous approximation of a set of similar signals. Sect 4 compares the SOOMP and SOMP approaches for the simultaneous approximation of stereo music. Sect 5 applies the proposed SOOMP method for compressing digital ECG records and produces reliable comparisons with previously reported results. The conclusions are presented in Sect 6.

2 Mathematical introduction of the problem

In order to pose in mathematical terms the problem to be addressed we need to introduce the notation used throughout the paper as well as some preliminary background.

The sets of real, integer, and natural numbers are indicated by $\mathbb{R}$, $\mathbb{Z}$, and $\mathbb{N}$, respectively. Boldface letters are used to indicate Euclidean vectors or matrices whilst standard mathematical fonts indicate components, e.g., $\mathbf{f} \in \mathbb{R}^N$ is a vector of components $f(i),\, i=1,\ldots,N$, and $\mathbf{A} \in \mathbb{R}^{N \times M}$ is a matrix of elements $A(i,j)$, which when not leaving room for ambiguity will also be represented as $a_{i,j}$. A set of $Q$ signals of equal length $N$, to be simultaneously approximated in a common subspace, is represented as a set of vectors $\{\mathbf{f}_q \in \mathbb{R}^N\}_{q=1}^{Q}$. The inner product is indicated as $\langle \cdot, \cdot \rangle$, e.g. for $\mathbf{f}, \mathbf{g} \in \mathbb{R}^N$

$$\langle \mathbf{f}, \mathbf{g} \rangle = \sum_{i=1}^{N} f(i)\, g(i).$$

The 2-norm induced by the inner product is denoted as $\|\cdot\|$, e.g. for $\mathbf{f} \in \mathbb{R}^N$

$$\|\mathbf{f}\| = \sqrt{\langle \mathbf{f}, \mathbf{f} \rangle}.$$

A set of $M$ vectors

$$D = \{\mathbf{d}_n \in \mathbb{R}^N\}_{n=1}^{M},$$

such that $\|\mathbf{d}_n\| = 1,\, n=1,\ldots,M$, and $N < M$, is called a redundant dictionary for $\mathbb{R}^N$ and its elements are called atoms.

In our context a signal $\mathbf{f} \in \mathbb{R}^N$ is assumed to be well approximated by an element, say $\mathbf{f}^k$, belonging to a finite dimensional subspace $\mathbb{S}_k \subset \mathbb{R}^N$. This assumption implies that, within a tolerance $\epsilon$ much larger than the numerical errors in the calculations, $\mathbf{f}^k$ is accepted to be a good approximation of $\mathbf{f}$ if $\|\mathbf{f} - \mathbf{f}^k\| < \epsilon$. Examples of signals fulfilling this definition are, amongst others, audio signals and electrocardiograms. These are all signals admitting acceptable approximations which, without affecting their informational content, do not necessarily produce a highly accurate point-wise reproduction of the signals. Signals of this type are suitable for lossy compression.

Since this work concerns approximation of similar signals we need to make an assumption on the signals that will be considered. We say that a finite set of $Q$ signals $\{\mathbf{f}_q\}_{q=1}^{Q}$ are similar if they can be well approximated in a subspace $\mathbb{S}_k$ of dimension $k$, with $k$ significantly smaller than $N$. This is equivalent to assuming that there exists a common basis $\{\mathbf{v}_i\}_{i=1}^{k}$ for $\mathbb{S}_k$ such that each signal is approximated as

$$\mathbf{f}_q^k = \sum_{i=1}^{k} c_q(i)\, \mathbf{v}_i, \quad q=1,\ldots,Q.$$

The quality of the approximated set will be assessed in mean value,

$$\Delta = \frac{1}{Q} \sum_{q=1}^{Q} \|\mathbf{f}_q - \mathbf{f}_q^k\|^2,$$

where $\mathbf{f}_q^k = \hat{P}_{\mathbb{S}_k} \mathbf{f}_q$ with $\hat{P}_{\mathbb{S}_k}$ the orthogonal projection operator onto $\mathbb{S}_k$.

3 Strategy for simultaneous approximation of a set of signals

Given a set of similar signals and a dictionary, the aim is to simultaneously approximate all the signals in the set within a common subspace $\mathbb{S}_k = \operatorname{span}\{\mathbf{d}_{\ell_i}\}_{i=1}^{k}$. In other words, each signal is to be approximated as a $k$-term atomic superposition

$$\mathbf{f}_q^k = \sum_{i=1}^{k} c_q(i)\, \mathbf{d}_{\ell_i}, \quad q=1,\ldots,Q, \tag{1}$$

where the atoms in (1) are selected from the given dictionary according to the criterion of optimality that will be established by Proposition 1 in the next subsection. Let us suppose for the moment that these atoms are known. Assigning a weight $w_q$ to the signal $\mathbf{f}_q$, with $\sum_{q=1}^{Q} w_q = 1$, the coefficients in (1) are required to minimize the mean value of the square norm of the errors in the approximation of the set of signals, i.e.

$$\Delta = \sum_{q=1}^{Q} w_q \|\mathbf{f}_q - \mathbf{f}_q^k\|^2. \tag{2}$$

Since $\mathbf{f}_q^k \in \mathbb{S}_k$, the above minimization is equivalent to finding, for each $q$, the vector $\mathbf{f}_q^k$ such that

$$\mathbf{f}_q^k = \hat{P}_{\mathbb{S}_k} \mathbf{f}_q, \quad q=1,\ldots,Q. \tag{3}$$

Accordingly, the minimization with respect to the coefficients in (1) can be implemented by adaptive biorthogonalization [18], as proposed within the OOMP algorithm for a single signal [6],

$$c_q(i) = \langle \mathbf{b}_i^k, \mathbf{f}_q \rangle, \quad i=1,\ldots,k, \; q=1,\ldots,Q, \tag{4}$$

with vectors $\mathbf{b}_i^k,\, i=1,\ldots,k$, calculated as will be described in the next section.

The selection of the atoms in the decomposition (1) such that $\Delta$ in (2) is minimized over all possible $k$-term superpositions

poses an intractable problem (for a dictionary of $M$ atoms there are $\binom{M}{k}$ possibilities to be checked). We address the selection in a tractable manner by extending the OOMP strategy to simultaneously approximate a set of similar signals. The extended strategy is referred to as SOOMP (Simultaneous OOMP).

3.1. SOOMP algorithm

The algorithm is initialized by setting: $\mathbf{r}_q^0 = \mathbf{f}_q,\ q=1,\ldots,Q$, $\mathbb{S}_0 = \{\mathbf{0}\}$, and $k = 0$. The first atom is selected as the one corresponding to the index $\ell_1$ such that

$$\ell_1 = \operatorname*{arg\,max}_{n=1,\ldots,M} \sum_{q=1}^{Q} w_q\, |\langle \mathbf{d}_n, \mathbf{f}_q \rangle|^2. \tag{5}$$

This first atom is used to assign $\mathbf{w}_1 = \mathbf{d}_{\ell_1}$, calculate $\mathbf{b}_1^1 = \mathbf{w}_1 / \|\mathbf{w}_1\|^2$ and iterate as prescribed below.

  (1) Upgrade the set of selected indices $\{\ell_i\}_{i=1}^{k} \to \{\ell_i\}_{i=1}^{k+1}$, increase $k \to k+1$, and select the index of a new atom for the approximation as

$$\ell_{k+1} = \operatorname*{arg\,max}_{n=1,\ldots,M} \frac{\sum_{q=1}^{Q} w_q\, |\langle \mathbf{d}_n, \mathbf{r}_q^k \rangle|^2}{\|\mathbf{d}_n - \hat{P}_{\mathbb{S}_k} \mathbf{d}_n\|^2}, \quad \text{for } n \text{ such that } \|\mathbf{d}_n - \hat{P}_{\mathbb{S}_k} \mathbf{d}_n\| \neq 0. \tag{6}$$

  (2) Compute the corresponding new vector $\mathbf{w}_{k+1}$ as

$$\mathbf{w}_{k+1} = \mathbf{d}_{\ell_{k+1}} - \sum_{i=1}^{k} \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|^2} \langle \mathbf{w}_i, \mathbf{d}_{\ell_{k+1}} \rangle, \tag{7}$$

    including for numerical accuracy the re-orthogonalization step:

$$\mathbf{w}_{k+1} \leftarrow \mathbf{w}_{k+1} - \sum_{i=1}^{k} \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|^2} \langle \mathbf{w}_i, \mathbf{w}_{k+1} \rangle. \tag{8}$$

  (3) Upgrade vectors $\mathbf{b}_i^k \to \mathbf{b}_i^{k+1}$ as

$$\mathbf{b}_i^{k+1} = \mathbf{b}_i^k - \mathbf{b}_{k+1}^{k+1} \langle \mathbf{d}_{\ell_{k+1}}, \mathbf{b}_i^k \rangle, \; i=1,\ldots,k, \quad \text{with } \mathbf{b}_{k+1}^{k+1} = \frac{\mathbf{w}_{k+1}}{\|\mathbf{w}_{k+1}\|^2}. \tag{9}$$

  (4) Update $\mathbf{r}_q^k \to \mathbf{r}_q^{k+1}$ as

$$\mathbf{r}_q^{k+1} = \mathbf{r}_q^k - \frac{\mathbf{w}_{k+1}}{\|\mathbf{w}_{k+1}\|^2} \langle \mathbf{w}_{k+1}, \mathbf{r}_q^k \rangle, \quad q=1,\ldots,Q. \tag{10}$$

  (5) If the stopping criterion is met finish the iterations. Otherwise repeat steps (1)–(5).

Note: given a tolerance $\epsilon$, as stopping criterion we set:

$$\sum_{q=1}^{Q} w_q \|\mathbf{r}_q^k\|^2 < \epsilon, \tag{11}$$

or

$$\|\mathbf{r}_q^k\|^2 < \epsilon, \quad q=1,\ldots,Q, \tag{12}$$

depending on convenience for the particular application.

Once the iterations have finished calculate the coefficients for the decomposition (1) as

$$c_q(i) = \langle \mathbf{b}_i^k, \mathbf{f}_q \rangle, \quad i=1,\ldots,k, \; q=1,\ldots,Q.$$

For $q=1,\ldots,Q$ calculate the final approximation of each signal as

$$\mathbf{f}_q^k = \sum_{i=1}^{k} c_q(i)\, \mathbf{d}_{\ell_i}.$$
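
For concreteness, the iteration can be condensed into a short NumPy sketch. This is our own illustrative implementation, not the published MATLAB code [25]: it stores the normalized vectors $\mathbf{w}_i/\|\mathbf{w}_i\|$ as an orthonormal basis, maintains the duals $\mathbf{b}_i^k$, and stops with criterion (11); all variable names are ours.

```python
import numpy as np

def soomp(F, D, tol, w=None, max_k=None):
    """SOOMP sketch. F: (N, Q) signals as columns; D: (N, M) dictionary
    with unit-norm columns; tol: tolerance for the weighted mean of the
    squared residual norms, criterion (11)."""
    N, Q = F.shape
    M = D.shape[1]
    w = np.full(Q, 1.0 / Q) if w is None else w
    max_k = N if max_k is None else max_k
    R = F.astype(float).copy()       # residuals r_q^k
    U = []                           # orthonormal basis of span{d_l1, ..., d_lk}
    B = np.zeros((N, 0))             # dual vectors b_i^k as columns
    proj2 = np.zeros(M)              # ||P_{S_k} d_n||^2 for every atom
    idx = []
    while len(idx) < max_k:
        num = ((D.T @ R) ** 2) @ w   # sum_q w_q |<d_n, r_q^k>|^2
        den = 1.0 - proj2            # ||d_n - P_{S_k} d_n||^2 (unit-norm atoms)
        den[den < 1e-10] = np.inf    # exclude atoms (nearly) in the span
        n = int(np.argmax(num / den))
        idx.append(n)
        g = D[:, n].copy()           # steps (7)-(8): orthogonal component
        for _ in range(2):           # second pass = re-orthogonalization (8)
            for u in U:
                g -= u * (u @ g)
        nrm2 = g @ g
        b_new = g / nrm2             # b_{k+1}^{k+1} in (9)
        B -= np.outer(b_new, D[:, n] @ B)   # update the previous duals (9)
        B = np.hstack([B, b_new[:, None]])
        u_new = g / np.sqrt(nrm2)
        U.append(u_new)
        R -= np.outer(u_new, u_new @ R)     # residual update (10)
        proj2 += (u_new @ D) ** 2
        if ((R ** 2).sum(axis=0) @ w) < tol:  # stopping criterion (11)
            break
    C = B.T @ F                      # coefficients c_q(i), Eq. (4)
    return idx, C, D[:, idx] @ C     # approximations f_q^k
```

The coefficients of every signal are obtained at the end by plain inner products against the duals, which is the advantage of the biorthogonal implementation pointed out above.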

Remark 1. The set of vectors $\{\mathbf{b}_i^k\}_{i=1}^{k}$ as given in (9) fulfills that

$$\hat{P}_{\mathbb{S}_k} \mathbf{f} = \sum_{i=1}^{k} \mathbf{d}_{\ell_i} \langle \mathbf{b}_i^k, \mathbf{f} \rangle, \quad \forall\, \mathbf{f} \in \mathbb{R}^N,$$

where $\hat{P}_{\mathbb{S}_k}$ is the orthogonal projector of $\mathbb{R}^N$ onto $\mathbb{S}_k = \operatorname{span}\{\mathbf{d}_{\ell_i}\}_{i=1}^{k}$. Please find the proof in [6], or as a particular case of the more general proof in [18].

Proposition 1. The recursive selection of the indices $\ell_{k+1}$, as proposed in (6), is stepwise optimal. It minimizes, at each iteration, the mean of the square distances between the signals in the set and their corresponding approximations $\mathbf{f}_q^{k+1},\ q=1,\ldots,Q$.

Proof: For $k = 0$ it is clear that $\ell_1$ selected as in (5) minimizes the mean of the square distance as given by

$$\sum_{q=1}^{Q} w_q \|\mathbf{f}_q - \mathbf{d}_{\ell_1} \langle \mathbf{d}_{\ell_1}, \mathbf{f}_q \rangle\|^2 = \sum_{q=1}^{Q} w_q \left( \|\mathbf{f}_q\|^2 - |\langle \mathbf{d}_{\ell_1}, \mathbf{f}_q \rangle|^2 \right).$$

Let us assume that the indices $\ell_1,\ldots,\ell_k$ selected as proposed in (6) minimize, in the specified stepwise sense, the mean square distance

$$\sum_{q=1}^{Q} w_q \|\mathbf{f}_q - \hat{P}_{\mathbb{S}_k} \mathbf{f}_q\|^2.$$

We shall prove by induction that if the atoms $\mathbf{d}_{\ell_1},\ldots,\mathbf{d}_{\ell_k}$ are fixed, at iteration $k+1$ the atom selected as in (6) minimizes $\sum_{q=1}^{Q} w_q \|\mathbf{f}_q - \hat{P}_{\mathbb{S}_{k+1}} \mathbf{f}_q\|^2$. The proof stems from the fact that at iteration $k$ the approximation of each signal is the orthogonal projection of $\mathbf{f}_q$ onto the subspace $\mathbb{S}_k$ (c.f. Remark 1).

Consider that $\mathbb{S}_k$ is augmented by one element, say $\mathbf{d}_n$, so that $\mathbb{S}_{k+1} = \mathbb{S}_k \oplus \operatorname{span}\{\boldsymbol{\gamma}_n\}$, with $\boldsymbol{\gamma}_n = \mathbf{d}_n - \hat{P}_{\mathbb{S}_k} \mathbf{d}_n$, where $\oplus$ indicates direct sum. The orthogonal projection of each signal onto $\mathbb{S}_{k+1}$ can be expressed as

$$\hat{P}_{\mathbb{S}_{k+1}} \mathbf{f}_q = \hat{P}_{\mathbb{S}_k} \mathbf{f}_q + \frac{\boldsymbol{\gamma}_n \langle \boldsymbol{\gamma}_n, \mathbf{f}_q \rangle}{\|\boldsymbol{\gamma}_n\|^2}.$$

Thus

$$\sum_{q=1}^{Q} w_q \|\mathbf{f}_q - \hat{P}_{\mathbb{S}_{k+1}} \mathbf{f}_q\|^2 = \sum_{q=1}^{Q} w_q \|\mathbf{f}_q - \hat{P}_{\mathbb{S}_k} \mathbf{f}_q\|^2 - \sum_{q=1}^{Q} w_q \frac{|\langle \boldsymbol{\gamma}_n, \mathbf{f}_q \rangle|^2}{\|\boldsymbol{\gamma}_n\|^2}.$$

Since $\hat{P}_{\mathbb{S}_k} \mathbf{f}_q$ is optimized and fixed at iteration $k$, it is true that at iteration $k+1$ the index of the atom which minimizes the above left hand side fulfils

$$\ell_{k+1} = \operatorname*{arg\,max}_{n=1,\ldots,M} \sum_{q=1}^{Q} w_q \frac{|\langle \boldsymbol{\gamma}_n, \mathbf{f}_q \rangle|^2}{\|\boldsymbol{\gamma}_n\|^2}. \tag{13}$$

The proof is concluded using the self-adjoint property of $\hat{P}_{\mathbb{S}_k}$ to write:

$$\langle \boldsymbol{\gamma}_n, \mathbf{f}_q \rangle = \langle \mathbf{d}_n - \hat{P}_{\mathbb{S}_k} \mathbf{d}_n, \mathbf{f}_q \rangle = \langle \mathbf{d}_n, \mathbf{f}_q - \hat{P}_{\mathbb{S}_k} \mathbf{f}_q \rangle = \langle \mathbf{d}_n, \mathbf{r}_q^k \rangle. \tag{14}$$

Moreover, since all atoms are normalized and the set $\{\mathbf{w}_i / \|\mathbf{w}_i\|\}_{i=1}^{k}$ is an orthonormal basis for $\mathbb{S}_k$ we have

$$\|\boldsymbol{\gamma}_n\|^2 = \|\mathbf{d}_n - \hat{P}_{\mathbb{S}_k} \mathbf{d}_n\|^2 = 1 - \sum_{i=1}^{k} \frac{|\langle \mathbf{w}_i, \mathbf{d}_n \rangle|^2}{\|\mathbf{w}_i\|^2},$$

so that

$$\ell_{k+1} = \operatorname*{arg\,max}_{n=1,\ldots,M} \frac{\sum_{q=1}^{Q} w_q |\langle \mathbf{d}_n, \mathbf{r}_q^k \rangle|^2}{1 - \sum_{i=1}^{k} |\langle \mathbf{w}_i, \mathbf{d}_n \rangle|^2 / \|\mathbf{w}_i\|^2},$$

which shows the equivalence between (13) and (6).

Corollary 1. The selection criterion (6) guarantees that, whenever the right hand side of (6) is not zero, the selected elements are linearly independent.

Proof: For $k = 1$ the single atom $\mathbf{d}_{\ell_1}$ is linearly independent. Let us assume that the first $k$ selected atoms are linearly independent and prove that then the newly selected atom is also linearly independent of them. The proof is achieved by contradiction. Indeed, if $\mathbf{d}_{\ell_{k+1}}$ is linearly dependent then $\mathbf{d}_{\ell_{k+1}} = \sum_{i=1}^{k} \alpha_i \mathbf{d}_{\ell_i}$, for some scalars $\alpha_i$, so that $\langle \mathbf{d}_{\ell_{k+1}}, \mathbf{r}_q^k \rangle = 0,\ q=1,\ldots,Q$, and from (14) we gather that the right hand side of (6) is zero. This contradiction leads to the conclusion that the elements selected by criterion (6) are linearly independent.

Remark 2. If the dictionary is complete or over-complete, i.e., $\operatorname{span}(D) = \mathbb{R}^N$, then by selecting $N$ linearly independent atoms the method can reconstruct the exact signals $\mathbf{f}_q,\ q=1,\ldots,Q$. Otherwise, if $k < N$, the selected atoms provide the orthogonal projections $\hat{P}_{\mathbb{S}_k} \mathbf{f}_q$, where $\mathbb{S}_k = \operatorname{span}\{\mathbf{d}_{\ell_i}\}_{i=1}^{k}$. It should be stressed, though, that with a suitable dictionary all the signals are expected to be well approximated in a subspace with $k$ significantly smaller than $N$. Otherwise the representation would not qualify as sparse.

Remark 3. The complexity of the SOOMP algorithm, at each iteration, is O(NMQ). For equal weights the difference between the complexity of the SOOMP selection criterion (13) and the SOMP one [17] is the denominator in the right hand side of (13). This introduces extra computations of complexity O(NM) so that the order of complexity of both algorithms is equivalent. However, as will be illustrated in the next section, due to the reduction in the number of selected atoms, and hence the number of iterations, the approximation may run faster if SOOMP rather than SOMP is used.

4 Numerical example

We illustrate here the algorithm’s implementation by simultaneously approximating stereophonic music. This type of music, commonly called stereo music, is usually produced by using two independent audio channels, $\mathbf{f}_1$ and $\mathbf{f}_2$. The example is dedicated to showing the sparsity obtained with the proposed SOOMP method and the precursor SOMP one, when simultaneously approximating both channels using a highly coherent trigonometric dictionary.

Since the signal structure varies with time, approximations with trigonometric dictionaries are carried out on a partition of the signal. In this example the partition consists of disjoint segments, called frames, which are assumed to be all of the same size L. The signal representation is realized by independent approximation of each frame.

As shown in [15,16], for music representation the combination $\mathcal{D} = \mathcal{D}_C \cup \mathcal{D}_S$ of a Redundant Discrete Cosine Dictionary $\mathcal{D}_C$ and a Redundant Discrete Sine Dictionary $\mathcal{D}_S$, defined below, renders higher sparsity than when using pure $\mathcal{D}_C$ or $\mathcal{D}_S$ dictionaries of the same redundancy as the combination $\mathcal{D}$:

$$\mathcal{D}_C = \left\{ \frac{1}{w_c(n)} \cos\left(\frac{\pi (2i-1)(n-1)}{2M}\right),\ i=1,\ldots,L \right\}_{n=1}^{M}, \qquad \mathcal{D}_S = \left\{ \frac{1}{w_s(n)} \sin\left(\frac{\pi (2i-1) n}{2M}\right),\ i=1,\ldots,L \right\}_{n=1}^{M},$$

where $w_c(n)$ and $w_s(n)$ are normalization factors.
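
As an illustration, a dictionary of this form can be generated in a few lines. The sketch below assumes the cosine/sine arguments written above, sampled on a frame of length L, and simply normalizes the columns afterwards, so the factors $w_c(n)$ and $w_s(n)$ never need to be formed explicitly; the exact phase conventions of [15,16] should be checked against the published codes.

```python
import numpy as np

def trig_dictionary(L):
    """Joint redundant cosine/sine dictionary for frames of length L.
    Each sub-dictionary has M = 2L atoms, so D has 4L unit-norm columns."""
    M = 2 * L
    i = np.arange(1, L + 1)[:, None]     # sample index i = 1, ..., L
    n = np.arange(1, M + 1)[None, :]     # atom index n = 1, ..., M
    DC = np.cos(np.pi * (2 * i - 1) * (n - 1) / (2 * M))   # cosine atoms
    DS = np.sin(np.pi * (2 * i - 1) * n / (2 * M))         # sine atoms
    D = np.hstack([DC, DS])
    return D / np.linalg.norm(D, axis=0)  # column normalization
```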

The atoms are selected for simultaneously representing both channels $\mathbf{f}_1$ and $\mathbf{f}_2$ in the $i$th frame, for $i=1,\ldots,I$, with $I$ the integral part of $N/L$. In the numerical simulations the number of atoms in each of the dictionaries $\mathcal{D}_C$ and $\mathcal{D}_S$ is $2L$, so that the total number of atoms in $\mathcal{D}$ is $4L$.

As a metric of approximation quality we use the standard Signal to Noise Ratio (SNR), calculated as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{s=1}^{2} \|\mathbf{f}_s\|^2}{\sum_{s=1}^{2} \|\mathbf{f}_s - \mathbf{f}_s^a\|^2},$$

where $\mathbf{f}_s^a,\ s=1,2$, are the approximations of the channels $\mathbf{f}_s,\ s=1,2$. Since the frames are disjoint, the approximation of each channel is obtained by the concatenation of the approximations of the corresponding frames, i.e., $\mathbf{f}_s^a = \hat{J}_{i=1}^{I}\, \mathbf{f}_{s,i}^a$, where $\hat{J}$ indicates the concatenation operation. The numbers of atoms for approximating each $i$-frame are decided to meet the condition:

$$\sum_{s=1}^{2} \|\mathbf{f}_{s,i} - \mathbf{f}_{s,i}^a\|^2 < \epsilon_i^2. \tag{15}$$

On defining

$$e_i = \sum_{s=1}^{2} \|\mathbf{f}_{s,i}\|^2,$$

the parameters $\epsilon_i^2$ in (15) are determined as

$$\epsilon_i^2 = e_i\, 10^{-\mathrm{SNR}_0 / 10},$$

where $\mathrm{SNR}_0$ is fixed to the same value for each frame in order to achieve the expected approximation quality.

The metric of sparsity is considered to account for the number of elements in the approximation of the whole signal. For this we define the Sparsity Ratio (SR) as

$$\mathrm{SR} = \frac{N}{K},$$

where $K$ is the total number of atoms in the signal representation and $N$ the number of samples in each of the channels. Thus, a large value of SR indicates a high level of sparsity.
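
Both metrics are immediate to evaluate; the sketch below assumes the forms given above (sum over the two channels in the SNR, samples per channel over total atoms in the SR):

```python
import numpy as np

def snr_db(channels, approximations):
    """SNR over the two channels, in dB (assumed form of the formula above)."""
    num = sum(np.sum(f ** 2) for f in channels)
    den = sum(np.sum((f - a) ** 2) for f, a in zip(channels, approximations))
    return 10.0 * np.log10(num / den)

def sparsity_ratio(n_per_channel, total_atoms):
    """SR = N / K (assumed form): N samples per channel, K atoms in total."""
    return n_per_channel / total_atoms
```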

The numerical example is realized using four stereo clips of melodic music: 1. Classic Orchestra. 2. Classic Guitar. 3. Chopin Piano. 4. Piazzolla Tango. All four clips are of the same length, N = 256000 samples in each channel (5.8 secs). The approximation is carried out on frames of length L = 1024 and for qualities corresponding to SNR = 20 dB, 25 dB, and 30 dB. The improvements in the values of SR obtained with SOOMP, with respect to those obtained with SOMP, are noticeable from the comparison of the 3rd and 4th columns in Table 1, for all four clips and all approximation qualities. The 5th column displays the gain in SR over SOMP yielded by SOOMP. The 6th and 8th columns show the approximation times (the values are the average of five independent runs with MATLAB on a laptop with a Core i7-1165G7 processor). The 7th and 9th columns give the corresponding standard deviations.

Table 1. Comparison of sparsity (SR values) for approximations of four clips of music up to the same SNR (20, 25 and 30 dB). The 5th column gives the gain in SR achieved by SOOMP (2) over SOMP (1). The 6th and 8th columns give the approximation times in secs. (average of five independent runs). The 7th and 9th columns are the corresponding standard deviations.

https://doi.org/10.1371/journal.pone.0325555.t001

The left graphs in Fig 1 show 2000 samples in channel 1 of the original clips as well as the corresponding approximations up to 25 dB. The right graphs have the same description but correspond to 2000 samples in channel 2. As illustrated by the graphs in Fig 1, SNR = 25 dB already produces a very good pointwise approximation of the signals.

Fig 1. 2000 samples in Clips 1–4 (blue lines) and the corresponding approximations (red lines) up to SNR = 25 dB.

The graphs on the left correspond to one of the channels and the graphs on the right to the other.

https://doi.org/10.1371/journal.pone.0325555.g001

5 Application to compression of ECG records

A digital ECG signal represents a sequence of heartbeats. In a typical record each heartbeat is characterized by a combination of three graphical deflections, known as the QRS complex, and two lateral and less visually noticeable P and T waves. A short segment of a typical ECG record is illustrated in Fig 2.

In order to simultaneously approximate all the beats in a record we need to segment and align the beats to meet the requirement of being similar. The procedure is discussed in the next subsection.

5.1. Segmentation and alignment of heartbeats

The QRS complex is segmented once the central R peak is detected. This can be done effectively by the Pan–Tompkins method [19]. In our numerical examples we use the off-the-shelf MATLAB implementation of this algorithm [20]. Since the distance between peaks in a record is not uniform, the lengths of the segmented beats should be passed to the decoder. The segmented beats are placed in arrays of equal length L by padding with zeros. Fig 3 illustrates the resulting configuration with 80 heartbeats. Fig 4 shows the two dimensional images of the segmented and aligned heartbeats corresponding to records 111 and 100 in the MIT-BIH Arrhythmia database [21].
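
The sketch below illustrates this step. It cuts the record at the midpoints between consecutive detected R peaks and zero-pads the beats to a common length; the exact cutting convention used in the paper may differ, and the peak locations are assumed to come from a Pan–Tompkins detector [20]:

```python
import numpy as np

def segment_beats(ecg, r_peaks):
    """Split an ECG record into beats around detected R peaks and zero-pad
    them to a common length L. Returns the (L, Q) array of aligned beats
    and the original beat lengths, which are passed to the decoder."""
    mids = [(a + b) // 2 for a, b in zip(r_peaks[:-1], r_peaks[1:])]
    cuts = [0] + mids + [len(ecg)]
    beats = [ecg[c0:c1] for c0, c1 in zip(cuts[:-1], cuts[1:])]
    lengths = np.array([len(b) for b in beats])
    padded = np.zeros((lengths.max(), len(beats)))
    for q, b in enumerate(beats):
        padded[:len(b), q] = b          # one zero-padded beat per column
    return padded, lengths
```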

Fig 3. Configuration resulting by segmentation and alignment of 80 heartbeats for illustration purposes.

https://doi.org/10.1371/journal.pone.0325555.g003

Fig 4. Images of the magnitude of the aligned heartbeats in records 111 (left graph) and 100 (right graph).

https://doi.org/10.1371/journal.pone.0325555.g004

For simultaneously approximating heartbeats we use a wavelet dictionary. Given a partition of the interval [c, d] the dictionary is constructed as follows [22,23].

(16)

with

(17)

and

(18)

where $\cdot|_{[c,d]}$ indicates the restriction of the function to the interval [c, d]. Different families of wavelet bases and dictionaries for the approximation of heartbeats have been compared in [22], where the Cohen–Daubechies–Feauveau family was singled out as the most effective one. We have confirmed the same outcome for simultaneous approximations and adopted the Cohen–Daubechies–Feauveau dictionary of redundancy approximately two introduced in [22].

The prototype scaling and wavelet functions $\phi$ and $\psi$ are plotted in the left and right graphs of Fig 5, respectively. The MATLAB codes for producing both functions numerically and for building the dictionary (16) are described in [23]. The codes have been made available in [25], together with the complete MATLAB software for reproducing the numerical examples in this work.

Fig 5. Cohen–Daubechies–Feauveau scaling and wavelet functions [24].

https://doi.org/10.1371/journal.pone.0325555.g005

The segmented and aligned heartbeats are simultaneously approximated using the SOOMP approach by assigning the same weight to each heartbeat, i.e. $w_q = 1/Q,\ q=1,\ldots,Q$. In this case the algorithm stops at iteration $k$ if

$$\mathrm{mprd}_k < \mathrm{PRDN}_0,$$

with

$$\mathrm{mprd}_k = 100\, \sqrt{\frac{\sum_{q=1}^{Q} \|\mathbf{r}_q^k\|^2}{\sum_{q=1}^{Q} \|\mathbf{f}_q - \bar{f}_q \mathbf{1}\|^2}}, \tag{19}$$

indicating the mean of the relative residual errors over the heartbeats, and $\mathrm{PRDN}_0$ a fixed value of the metric of quality PRDN, which is defined by

$$\mathrm{PRDN} = 100\, \frac{\|\mathbf{f} - \mathbf{f}^R\|}{\|\mathbf{f} - \bar{f} \mathbf{1}\|}, \tag{20}$$

where $\mathbf{f}$ is the whole ECG record, $\mathbf{f}^R$ is the record reconstructed from the approximated heartbeats, and $\bar{f}$ is the mean of $\mathbf{f}$.
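
Since (20) is the standard normalized percentage root-mean-square difference, its computation is a one-liner:

```python
import numpy as np

def prdn(f, f_rec):
    """PRDN of Eq. (20): normalized percentage RMS difference."""
    return 100.0 * np.linalg.norm(f - f_rec) / np.linalg.norm(f - np.mean(f))
```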

Given a required value of PRDN, at the approximation step of the processing the parameter $\mathrm{PRDN}_0$ is fixed as $0.8\,\mathrm{PRDN}$, in order to achieve the target value at the quantization step described in Subsect 5.2.

Even if the approximation is realized to achieve the required PRDN for the whole record, it is interesting to calculate the quality metric for each heartbeat in the array. To this end we define

$$\mathrm{prdn}(q) = 100\, \frac{\|\mathbf{f}_q - \mathbf{f}_q^k\|}{\|\mathbf{f}_q - \bar{f}_q \mathbf{1}\|}, \quad q=1,\ldots,Q, \tag{21}$$

where $\mathbf{f}_q^k$ is the approximation of the beat $\mathbf{f}_q$ and $\bar{f}_q$ its mean value. The values of $\mathrm{prdn}(q)$ for the simultaneous approximation of records 111 and 100 are shown in the left and right graphs of Fig 6, respectively.

Fig 6. Values of prdn from the simultaneous approximation of the aligned beats in records 111 (left graph) and 100 (right graph).

https://doi.org/10.1371/journal.pone.0325555.g006

The total PRDN produced by the reconstruction of the whole record is the same in both cases. The mean value of $\mathrm{prdn}(q)$ for record 111 is 9.0 with std = 4, while for record 100 the mean is 9.4 with std = 1.4. However, as noticeable in the figures, for some values of $q$ the prdn values are much higher than for others. This is a consequence of the irregularities of the beats, which can be perceived in the left image of Fig 4.

The top left graph of Fig 7 depicts a heartbeat in record 111 and its approximation. This heartbeat yields an outlier value of $\mathrm{prdn}(q)$. As shown by the red line in the figure, such a high value is produced by a smooth version of the noisy signal. On the contrary, for regular heartbeats the $\mathrm{prdn}(q)$ values are close to the PRDN of the whole record. The right graph of the figure shows the approximation of one of those beats. The bottom left graph is the approximation of one of the few beats in record 100 which yield an outlier $\mathrm{prdn}(q)$ value. The bottom right graph shows the approximation of one of the other beats.

Fig 7. The top left graph shows one of the heartbeats in record 111, and its approximation, which yields an outlier value of $\mathrm{prdn}(q)$.

The right graph corresponds to a beat yielding $\mathrm{prdn}(q)$ close to the mean value. The bottom graphs have the same description but the heartbeats are from record 100.

https://doi.org/10.1371/journal.pone.0325555.g007

By the simultaneous approximation of the aligned heartbeats these are transformed into a reduced set of numbers from which the approximated heartbeats can be reconstructed. This set consists of a) the $k$ indices corresponding to the common atoms in the decomposition of the heartbeats (c.f. (1)) and b) the different coefficients in the decomposition of each heartbeat (c.f. (1)). These coefficients can be placed in a two dimensional array $\mathbf{C}$ of size $Q \times k$, as illustrated in Fig 8. The top left image in this figure shows the magnitude of the array arising from the approximation of record 111. The bottom left image corresponds to the approximation of record 100.

Fig 8. The top image on the left depicts the magnitude of the entries in the array containing the coefficients in the approximation of record 111.

The bottom left image depicts the magnitude of the entries in the array corresponding to the approximation of record 100. The images on the right are the magnitudes of the entries in the transformed arrays (top and bottom) arising by applying the discrete cosine transform on the columns of the arrays represented by the images on the left.

https://doi.org/10.1371/journal.pone.0325555.g008

It is clear from the location of the brightest pixels in the left images of Fig 8 that the coefficients of largest magnitude are concentrated in vertical lines. This suggests that, to favor compression when storing these values, it is convenient to apply an orthogonal transformation mapping the coefficients in the vertical direction to smaller values, which eventually might be quantized to zero. Consequently, by applying the discrete cosine transform on each column of $\mathbf{C}$, we create the transformed array $\mathbf{T}$ with the following entries

$$t(m,n) = \mathrm{DCT}\big(c(\cdot,n)\big)(m), \quad m=1,\ldots,Q,\; n=1,\ldots,k, \tag{22}$$

where $\mathrm{DCT}(c(\cdot,n))$ indicates the one dimensional discrete cosine transform operating on the $n$-th column of the array $\mathbf{C}$. The transformed points corresponding to the left images in Fig 8 are represented in the right images of this figure. The introduction of this step to decorrelate the vertical entries in the array is key to boosting the performance of the adopted encoding strategy described below. The noticeable change of intensity in the images on the right of Fig 8 indicates that after quantization some of the entries of the transformed arrays will be mapped to zero. Within the encoding strategy described in the next subsection, this effect enhances compression.
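
In code, (22) and its inverse amount to a column-wise orthonormal DCT, e.g. with SciPy:

```python
import numpy as np
from scipy.fft import dct, idct

def decorrelate_columns(C):
    """Apply the 1D DCT to each column of the (Q, k) coefficient array,
    as in Eq. (22); 'ortho' keeps the transformation orthonormal."""
    return dct(C, axis=0, norm='ortho')

def restore_columns(T):
    """Invert the column-wise DCT at the decoding stage."""
    return idct(T, axis=0, norm='ortho')
```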

5.2. Encoding

At the encoding step the array $\mathbf{T}$ is expressed as a vector $\mathbf{t}$ of $Q \cdot k$ components, adopting the column-major order. The encoding of this vector follows the procedure outlined in [26]. The components of $\mathbf{t}$ are converted to integer numbers by a mid-tread uniform quantizer as follows:

$$t^{\Delta}(l) = \left\lfloor \frac{|t(l)|}{\Delta} + \frac{1}{2} \right\rfloor \operatorname{sign}(t(l)), \quad l=1,\ldots,Q \cdot k, \tag{23}$$

where $\lfloor x \rfloor$ indicates the largest integer smaller than or equal to $x$ and $\Delta$ is the quantization parameter. For comparison with results in other publications, in the numerical examples the quantization parameter is set to produce the required quality of the reconstructed signal.

The absolute values of the elements (23) are placed in a smaller vector, say $\mathbf{c}^{a}$, after the elimination of zeros. The signs are encoded separately in a vector $\mathbf{s}$ using a binary alphabet: 1 for + and 0 for –.

Assuming that the nonzero values in (23) occur at the positions $l_j,\ j=1,\ldots,P$, these indices are re-ordered in ascending order, i.e. $l_1 < l_2 < \cdots < l_P$. This induces a new order in the coefficients and in the corresponding signs. Defining $\delta(1) = l_1$ and $\delta(j) = l_j - l_{j-1},\ j=2,\ldots,P$, the array $\boldsymbol{\delta}$ stores the indices with unique recovery.
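
A sketch of the quantization and packing steps, assuming the mid-tread form (23) and the magnitude/sign/position separation just described (the published encoder [26] should be consulted for the exact conventions):

```python
import numpy as np

def quantize_pack(t, delta):
    """Mid-tread uniform quantization (23) followed by separation into
    magnitudes, signs and ascending positions of the nonzero entries,
    ready for entropy coding (positions may be stored as differences)."""
    q = np.floor(np.abs(t) / delta + 0.5) * np.sign(t)
    nz = np.flatnonzero(q)                  # positions l_1 < l_2 < ...
    mags = np.abs(q[nz]).astype(int)        # vector c^a of magnitudes
    signs = (q[nz] > 0).astype(int)         # 1 for +, 0 for -
    return mags, signs, nz
```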

Finally the vectors $\mathbf{c}^{a}$, $\mathbf{s}$ and $\boldsymbol{\delta}$, as well as the lengths of the heartbeats, are compressed using adaptive Huffman coding implemented by the off-the-shelf MATLAB function Huff06 [27]. The additional numbers which have to be passed to the decoder are:

  (i) The indices of the selected dictionary atoms forming the common basis.
  (ii) The quantization parameter $\Delta$.
  (iii) The mean value of the 1D ECG record (if not previously subtracted).
  (iv) The number of rows and columns of $\mathbf{C}$, i.e. $Q$ and $k$.

5.3. 1D ECG signal recovery

At the decoding stage, after reverting the Huffman coding, the locations of the nonzero entries in the quantized transformed array are readily obtained. This allows the recovery of the array $\mathbf{C}$ as follows (a code sketch is given after the list).

  (i) Set $t^{\Delta}(l_j) = c^{a}(j)\,(2 s(j) - 1),\ j=1,\ldots,P$, with $t^{\Delta}(l) = 0$ at all other positions, and $t(l) = \Delta\, t^{\Delta}(l),\ l=1,\ldots,Q \cdot k$.
  (ii) Reshape the vector $\mathbf{t}$ to produce a 2D array of size $Q \times k$. The array $\mathbf{C}$ is recovered from the transformed one by inverting the transformation (c.f. (22)).
  (iii) Each row of the recovered array gives the coefficients in the decomposition (1) of one of the approximated heartbeats, i.e. $c_q(i),\ i=1,\ldots,k$.
  (iv) Finally the reconstructed beats are assembled into a 1D record using the lengths of the heartbeats that were stored at the encoding stage.
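
The whole chain can be sketched as follows, with atoms standing for the L × k matrix of selected dictionary atoms (L being the padded beat length); the function name and the trailing-zero-padding assumption are ours:

```python
import numpy as np
from scipy.fft import idct

def decode(mags, signs, nz, delta, Q, k, atoms, lengths):
    """Rebuild the quantized transformed array, invert the column-wise
    DCT of Eq. (22), recover each beat as the k-term superposition (1)
    and reassemble the 1D record from the stored beat lengths."""
    t = np.zeros(Q * k)
    t[nz] = mags * np.where(signs == 1, 1.0, -1.0) * delta
    T = t.reshape((Q, k), order='F')     # undo the column-major vectorization
    C = idct(T, axis=0, norm='ortho')    # invert Eq. (22)
    beats = atoms @ C.T                  # column q: approximated beat q
    return np.concatenate([beats[:lengths[q], q] for q in range(Q)])
```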

The achieved compression ratio (CR), which is defined as

$$\mathrm{CR} = \frac{\text{size in bits of the original record}}{\text{size in bits of the compressed file}}, \tag{24}$$

depends on the required quality of the recovered signal. In the numerical examples the quality of the recovered records is assessed by the PRDN as defined in (20). It is pertinent to stress the importance of adopting this normalized metric for comparisons of reconstruction quality: the subtraction of the mean $\bar{f}$ avoids dependence on the signal baseline.

6 Numerical tests

For the numerical tests we use the MIT-BIH Arrhythmia database [21]. Each of the records is 30 min long, consisting of N = 650000 11-bit samples taken at a frequency of 360 Hz.

For comparison purposes we compress the subset of records reported in [28,29], and [30], and reproduce the values of PRDN in those publications. This is achieved as follows: the SOOMP method is applied to approximate the set of heartbeats in each record up to 80% of the target PRDN. The quantization parameter $\Delta$ is then automatically fixed, by a bisection algorithm, in order to reproduce the target PRDN for the whole record within two decimal places.
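
A sketch of this bisection, assuming the reconstruction distortion grows monotonically with the quantization parameter; prdn_of_delta is a placeholder that runs the quantization and reconstruction pipeline for a trial value of $\Delta$ and returns the resulting PRDN:

```python
def fit_delta(prdn_of_delta, target, lo=1e-4, hi=1.0, tol=0.005):
    """Bisection on the quantization parameter so that the reconstructed
    record meets the target PRDN within two decimal places."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        err = prdn_of_delta(mid)
        if abs(err - target) < tol:
            break
        if err > target:
            hi = mid     # quantization too coarse: decrease delta
        else:
            lo = mid     # distortion below target: increase delta
    return 0.5 * (lo + hi)
```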

The first, second and third columns of Table 2 reproduce the results published in [28]. The comparison is relevant because the approach [28] is also based on approximation of heartbeats using a dictionary. The techniques are very different though. Whilst our dictionary does not have to be stored because it is numerically generated, the dictionary in [28] is part of the ECG record to be compressed. Moreover, the method for finding the sparse representation is different and so is the procedure to store the parameters that should be passed to the decoder.

Table 2. Comparison with the results in [28]. The first column lists the records considered in [28]. The second column displays the values of PRDN and the third column their CRs. Our CRs for the same PRDN are shown in the fourth column. The fifth column shows the CRs obtained with the fast approach [26].

https://doi.org/10.1371/journal.pone.0325555.t002

Our compression results are shown in the fourth column of Table 2. These results demonstrate a significant gain in CR for the same recovery quality. For further comparison we apply the fast compression algorithm [26], which does not require peak segmentation or Huffman coding. This method has already been shown to improve the average CR for the 48 records in the MIT-BIH Arrhythmia dataset with respect to the results in [31,32], and [33], for a broad range of average qualities. For comparison with [28], in Table 2 the compression is realized to reproduce the PRDN listed in the second column for each record.

The first, second and third columns of Table 3 reproduce the results published in [29], which are achieved with an approach based on the Singular Value Decomposition (SVD).

Table 3. Same description as in Table 2 but the comparison is with the results of Table I in [29].

https://doi.org/10.1371/journal.pone.0325555.t003

Our compression ratios (CRs) are shown in the fourth column of Table 3. The fifth column shows the CRs produced by the fast compression algorithm [26].

The first, second and third columns of Table 4 reproduce the results published in [30], which are also obtained with a Singular Value Decomposition based approach. Our CRs are shown in the fourth column of this table. The fifth column shows the CRs produced by the fast compression algorithm [26].

Table 4. Same description as in Table 2 but the comparison is with the results of Table I in [30].

https://doi.org/10.1371/journal.pone.0325555.t004

Note: The MATLAB software for reproducing the tables is available at http://www.nonlinear-approx.info/examples/node017.html

7 Conclusions

The Optimized Orthogonal Matching Pursuit approach has been extended with the purpose of selecting a common basis for the simultaneous approximation of a set of similar signals. The extended approach, termed Simultaneous Optimized Orthogonal Matching Pursuit, minimizes at each iteration the mean of the squared error norms of the joint approximation. The algorithm’s implementation was demonstrated by approximating stereophonic music using a highly coherent trigonometric dictionary. The applicability of the method to ECG compression was illustrated on records taken from the MIT-BIH Arrhythmia database. The particular records were selected for comparison purposes, as in [28,29], and [30]. The simultaneous approximation of aligned heartbeats was used for compressing a whole record. The adopted compression strategy was shown to improve upon compression results achieved by other methods for the same reconstruction quality. The comparison was made possible by means of an iterative quantization procedure which delivers the required quality.

While the proposed approach involves detection and alignment of R-peaks, it is the approximation step which introduces the highest computational cost. In order to address this matter as a line of future work, it would be interesting to investigate the possibility of selecting a suitable subspace from a whole data set (instead of a subspace for each single record, as is done here). Certainly, approximating new records using previously selected atoms would significantly speed up the compression procedure. It is still to be discerned whether the SOOMP approach could pick out a common low dimensional subspace to approximate, up to a given quality, any new ECG record. We feel confident that the results presented in this work will motivate further research on the topic.

References

  1. Natarajan BK. Sparse approximate solutions to linear systems. SIAM J Comput. 1995;24(2):227–34.
  2. Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J Sci Comput. 1998;20:33–61.
  3. Mallat SG, Zhang Z. Matching pursuits with time-frequency dictionaries. IEEE Trans Signal Process. 1993;41:3397–415.
  4. Pati YC, Rezaiifar R, Krishnaprasad PS. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Proceedings of the 27th ACSSC. 1993. p. 40–4.
  5. Gribonval R. Fast matching pursuit with a multiscale dictionary of Gaussian chirps. IEEE Trans Signal Process. 2001;49(5):994–1001.
  6. Rebollo-Neira L, Lowe D. Optimized orthogonal matching pursuit approach. IEEE Signal Process Lett. 2002;9(4):137–40.
  7. Andrle M, Rebollo-Neira L, Sagianos E. Backward-optimized orthogonal matching pursuit approach. IEEE Signal Process Lett. 2004;11(9):705–8.
  8. Andrle M, Rebollo-Neira L. A swapping-based refinement of orthogonal matching pursuit strategies. Signal Process. 2006;86:480–95.
  9. Donoho DL, Tsaig Y, Drori I, Starck J. Stagewise orthogonal matching pursuit. IEEE Trans Inf Theory. 2012;58(2):1094–121.
  10. Needell D, Tropp JA. CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl Comput Harmon Anal. 2009;26(3):301–21.
  11. Needell D, Vershynin R. Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J Select Topics Signal Process. 2010;4:310–6.
  12. Eldar YC, Kuppinger P, Bolcskei H. Block-sparse signals: uncertainty relations and efficient recovery. IEEE Trans Signal Process. 2010;58(6):3042–54.
  13. Rebollo-Neira L, Bowley J. Sparse representation of astronomical images. J Opt Soc Am A Opt Image Sci Vis. 2013;30(4):758–68. pmid:23595338
  14. Rebollo-Neira L, Matiol R, Bibi S. Hierarchized block wise image approximation by greedy pursuit strategies. IEEE Signal Process Lett. 2013;20:1175–8.
  15. Rebollo-Neira L. Cooperative greedy pursuit strategies for sparse signal representation by partitioning. Signal Process. 2016;125:365–75.
  16. Rebollo-Neira L, Aggarwal G. A dedicated greedy pursuit algorithm for sparse spectral representation of music sound. J Acoust Soc Am. 2016;140(4):2933. pmid:27794341
  17. Tropp JA, Gilbert AC, Strauss MJ. Algorithms for simultaneous sparse approximation. Part I: greedy pursuit. Signal Process. 2006;86(3):572–88.
  18. Rebollo-Neira L. Constructive updating/downdating of oblique projectors: a generalisation of the Gram-Schmidt process. J Phys A Math Theor. 2007;40:6381–94.
  19. Pan J, Tompkins WJ. A real-time QRS detection algorithm. IEEE Trans Biomed Eng. 1985;32(3):230–6. pmid:3997178
  20. Sedghamiz H. Complete Pan-Tompkins implementation ECG QRS detector. MATLAB Central File Exchange. Accessed 2024 June 4. Available from: https://www.mathworks.com/matlabcentral/fileexchange/45840-complete-pan-tompkins-implementation-ecg-qrs-detector
  21. MIT-BIH Arrhythmia Database. Available from: https://physionet.org/physiobank/database/mitdb/
  22. Rebollo-Neira L, Cerna D. Wavelet based dictionaries for dimensionality reduction of ECG signals. Biomed Signal Process Control. 2019;54:101593.
  23. Černá D, Rebollo-Neira L. Construction of wavelet dictionaries for ECG modeling. MethodsX. 2021;8:101314. pmid:34434834
  24. Cohen A, Daubechies I, Feauveau JC. Biorthogonal bases of compactly supported wavelets. Comm Pure Appl Math. 1992;45:485–560.
  25. Nonlinear Approximation Example. Available from: http://www.nonlinear-approx.info/examples/node017.html
  26. Rebollo-Neira L. Effective high compression of ECG signals at low level distortion. Sci Rep. 2019;9(1):4564. pmid:30872627
  27. Skretting K. Huffman coding and arithmetic coding. MATLAB Central File Exchange. Accessed 2024 June 4. Available from: https://www.mathworks.com/matlabcentral/fileexchange/2818-huffman-coding-and-arithmetic-coding
  28. Adamo A, Grossi G, Lanzarotti R, Lin J. ECG compression retaining the best natural basis k-coefficients via sparse decomposition. Biomed Signal Process Control. 2015;15:11–7.
  29. Liu TY, Lin KJ, Wu HC. ECG data encryption then compression using singular value decomposition. IEEE J Biomed Health Inform. 2018;22(3):707–13. pmid:28463208
  30. Zheng L, Wang Z, Liang J, Luo S, Tian S. Effective compression and classification of ECG arrhythmia by singular value decomposition. Biomed Eng Adv. 2021;2:100013.
  31. Lee S, Kim J, Lee M. A real-time ECG data compression and transmission algorithm for an e-health device. IEEE Trans Biomed Eng. 2011;58(9):2448–55. pmid:21606020
  32. Ma J, Zhang T, Dong M. A novel ECG data compression method using adaptive Fourier decomposition with security guarantee in e-health applications. IEEE J Biomed Health Inform. 2015;19(3):986–94. pmid:25222961
  33. Tan C, Zhang L, Wu H-T. A novel Blaschke unwinding adaptive-Fourier-decomposition-based signal compression algorithm with application on ECG signals. IEEE J Biomed Health Inform. 2019;23(2):672–82. pmid:29993788