A pyramid-like model for heartbeat classification from ECG recordings

Heartbeat classification is an important step in the early-stage detection of cardiac arrhythmia, a type of cardiovascular disease (CVD) affecting millions of people around the world. Current work on heartbeat classification from ECG recordings struggles to achieve high classification sensitivity on disease heartbeats while keeping a satisfactory overall accuracy. Most existing work treats individual heartbeats as independent data samples. Furthermore, using a single static feature set to classify all types of heartbeats often causes confusion when identifying supraventricular (S) ectopic beats. In this work, a pyramid-like model is proposed to improve the performance of heartbeat classification. The model handles the separation of normal (N) and S beats as a dedicated step and takes advantage of neighbor-related information to assist the identification of S beats. The proposed model was evaluated on the benchmark MIT-BIH-AR database, and on the St. Petersburg Institute of Cardiological Technics (INCART) database to measure generalization performance. The reported results show that the proposed pyramid-like model achieves higher performance than state-of-the-art rivals in the identification of disease heartbeats while maintaining a reasonable overall classification accuracy.


Introduction
An electrocardiogram (ECG) is a recording of the electrical activity of the heart over a period of time. It provides a noninvasive and inexpensive way of studying the heart. Heartbeat classification is one of the important fields in ECG analysis. The Association for Advancement of Medical Instrumentation (AAMI) categorizes heartbeats into 5 classes: Normal (N), Supraventricular (S) ectopic, Ventricular (V) ectopic, Fusion (F) and Unknown (Q) beats [1]. Heartbeat classification is an essential step toward identifying arrhythmias. Arrhythmias affect the body by impairing the heart's ability to pump blood. They can be divided into life-threatening and non-life-threatening ones [2]. For example, ventricular fibrillation and tachycardia are life-threatening arrhythmias, which can be fatal and require immediate medical attention. Non-life-threatening arrhythmias, such as atrial fibrillation, present a chronic health threat to patients, and special care is still needed to avoid further deterioration of heart function.
Although performing an electrocardiography test is simple, the manual interpretation of ECG recordings can be time-consuming and error-prone, especially for long-term recordings. Hence, an intelligent approach to automatic heartbeat classification from ECG recordings is in high demand and would be of great assistance to clinicians in diagnosing heart diseases.
Many research attempts have been made to address the heartbeat classification problem. Current methods have difficulty guaranteeing a high detection sensitivity on disease heartbeats while maintaining a good overall classification accuracy. Most existing work treats heartbeats as mutually independent data samples, with no connection to their predecessors or successors [2-6]. The neighbor-related information is therefore ignored in the classification process. In addition, using a single static feature set to classify all types of heartbeats together may cause a high misclassification rate on S beats in particular. Several factors need further consideration: (1) ECG recordings are imbalanced and usually dominated by N beats; (2) some shape-related features must be included to distinguish V beats from N beats, because they have different QRS complexes; (3) N and S beats are similar in QRS complex morphology, but S beats have a faster heart rhythm. In other words, the presence of shape-related features makes it easy for an S beat to be misidentified as an N beat. In this study, we propose a pyramid-like model to address these problems and improve heartbeat classification performance.

Related work
Related studies on heartbeat classification from ECG recordings are reviewed in this section. We also introduce two feature extraction techniques: higher-order statistics and the discrete wavelet transform. The Earth mover's distance (EMD) is then discussed as a measure of the dissimilarity between two multi-dimensional distributions.

Literature review
Many machine-learning approaches have been proposed for automatic heartbeat classification over the last two decades. The differences in classification performance among these approaches stem primarily from the features and the classifiers used.
The features used to represent a heartbeat are usually extracted from the cardiac rhythm or from the time/frequency domains, and the RR-interval is reported as one of the most widely used features [2,3,7-10]. The RR-interval holds indispensable information about heart rhythm and is able to discriminate disease heartbeats from normal ones. Other features, such as higher-order statistics (HOS) [7,11], wavelet coefficients [12-17], morphological amplitudes [2,18], signal energy [17], and random projection features [19,20], are also common in the literature. As irrelevant features can harm classification performance and reduce generalization power, different feature selection techniques have been applied to remove noise and reduce the feature dimension, such as the floating sequential search [4] and the weighted linear discriminant model with a forward-backward search strategy [21].
Although some promising results have been achieved, current methods for heartbeat classification still have some problems. The associations among heartbeats are often ignored in the classification process, and all types of heartbeats are represented using the same set of static features. This can limit classification performance and may lead to failures in identifying S beats. A solution that delivers high accuracy is therefore still needed.

Higher-order statistics
Higher-order statistics (HOS) methods are commonly used to estimate signal shape. They contain both amplitude and phase information of non-Gaussian linear processes and offer high immunity to Gaussian background noise compared with lower-order statistics [30]. In this work, we include the skewness (3rd order statistic) and the kurtosis (4th order statistic) in our feature set.
The skewness measures the symmetry of a distribution. The kurtosis denotes whether the distribution is heavy-tailed or light-tailed compared with the normal distribution. For an input signal, let X_1, ..., X_N denote the data samples, \bar{X} the mean and s the standard deviation. The skewness and kurtosis can then be defined respectively as
$$\text{skewness} = \frac{\frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})^3}{s^3}, \qquad \text{kurtosis} = \frac{\frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})^4}{s^4}.$$
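As an illustration, both shape statistics can be computed directly from their standard moment definitions. The sketch below is only a minimal example: the function name `skewness_kurtosis` is hypothetical, and it assumes the population standard deviation (division by N) and the non-excess form of the kurtosis, neither of which is confirmed by the text.

```python
import math

def skewness_kurtosis(samples):
    """Return (skewness, kurtosis) of a sequence using population moments."""
    n = len(samples)
    mean = sum(samples) / n
    # Population standard deviation (dividing by N is an assumption of this sketch).
    s = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    if s == 0:
        raise ValueError("constant signal: skewness and kurtosis are undefined")
    skew = sum((x - mean) ** 3 for x in samples) / n / s ** 3
    kurt = sum((x - mean) ** 4 for x in samples) / n / s ** 4
    return skew, kurt
```

A symmetric input such as `[1, 2, 3, 4, 5]` gives a skewness of zero, which is a quick sanity check for the sign convention.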

Discrete wavelet transform
The discrete wavelet transform (DWT) provides a time-frequency representation of a signal and is widely used in data compression, noise reduction and multi-frequency-band signal analysis. The DWT iteratively decomposes a signal into different frequency bands with a scaling function and a wavelet function. The high-frequency component provides the detail information, while the low-frequency component is a coarse approximation of the upper-level signal. Each component is represented by a collection of wavelet coefficients, obtained as the inner products of the mother wavelet function and the upper-level signal. The choice of the mother wavelet function is the key to the discrete wavelet transform and depends heavily on the application. For noise reduction on raw ECG signals, we use the Daubechies-4 wavelet for its good orthogonality and short vanishing moment. For morphology feature extraction, the Haar wavelet is chosen because of its simplicity. It has also been demonstrated to be the ideal wavelet for short-time signal analysis [17]. The Haar function can be represented as
$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise.} \end{cases}$$
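The iterative decomposition described above can be sketched with the Haar wavelet, where each level reduces to scaled pairwise sums (approximation) and differences (detail). This is a minimal illustration, not the authors' code: the function names are hypothetical, and padding odd-length inputs by repeating the last sample is an assumption of this sketch.

```python
import math

def haar_step(signal):
    """One Haar DWT level: return (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        signal.append(signal[-1])
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Repeatedly split the low-frequency part; return (approximation, [cD1, cD2, ...])."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Only the approximation is fed into the next level, so the detail coefficients of every level stay available as candidate features.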

Earth mover's distance
The Earth mover's distance (EMD) is a metric of dissimilarity between two multi-dimensional distributions [31]. A distribution can be represented by a set of clusters. Such a representation is called the signature of the distribution. Data points from a distribution are grouped into a set of clusters, with each cluster denoted by its mean (or mode) and the fraction of the distribution that belongs to the cluster. Thus, one cluster can be regarded as a single feature in a signature. The distance between the features is called the ground distance. Signatures can differ in length. For example, simple distributions have shorter signatures than complex ones.
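The Earth mover's distance can be formulated and solved as a transportation problem [32]. Assume that there is a signature P with m clusters,
$$P = \{(p_1, w_{p_1}), \ldots, (p_m, w_{p_m})\},$$
and a signature Q with n clusters,
$$Q = \{(q_1, w_{q_1}), \ldots, (q_n, w_{q_n})\},$$
where p and q are the cluster representatives (mean or mode), and w denotes the cluster weight.
Let D = [d_{i,j}] be the ground distance between p_i and q_j and F = [f_{i,j}] be the flow between p_i and q_j. The optimal F is obtained by minimizing the overall work
$$WORK(P, Q, F) = \sum_{i=1}^{m}\sum_{j=1}^{n} d_{i,j} f_{i,j}$$
subject to the following constraints:
$$f_{i,j} \ge 0, \quad 1 \le i \le m, \; 1 \le j \le n,$$
$$\sum_{j=1}^{n} f_{i,j} \le w_{p_i}, \quad 1 \le i \le m,$$
$$\sum_{i=1}^{m} f_{i,j} \le w_{q_j}, \quad 1 \le j \le n,$$
$$\sum_{i=1}^{m}\sum_{j=1}^{n} f_{i,j} = \min\left(\sum_{i=1}^{m} w_{p_i}, \sum_{j=1}^{n} w_{q_j}\right).$$
The Earth mover's distance is defined as the work normalized by the total flow:
$$EMD(P, Q) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} d_{i,j} f_{i,j}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{i,j}}.$$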
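A small worked example may help fix the notation. The two signatures below are made up for illustration, and the absolute difference is assumed as the ground distance. Take P with two clusters and Q with one:

```latex
P = \{(0,\ 0.5),\ (2,\ 0.5)\}, \qquad Q = \{(1,\ 1.0)\}, \qquad d_{i,j} = |p_i - q_j|
% Ground distances: d_{1,1} = |0 - 1| = 1, \quad d_{2,1} = |2 - 1| = 1
% Total flow must equal \min(0.5 + 0.5,\ 1.0) = 1, and each f_{i,1} \le 0.5,
% so the only feasible flow is f_{1,1} = f_{2,1} = 0.5.
\text{work} = 1 \cdot 0.5 + 1 \cdot 0.5 = 1, \qquad
EMD(P, Q) = \frac{1}{0.5 + 0.5} = 1
```

Because the weight constraints leave a single feasible flow here, no optimization is needed; in general the flow is found by solving the transportation problem.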

Methodology
This section presents the proposed methodology. First, we introduce the preprocessing method. Then we discuss the appropriate features for heartbeat classification. After that, we present the pyramid-like model in detail.

Preprocessing
Raw ECG signals always come with Gaussian white noise and baseline wander. Baseline wander is the effect whereby the base axis (X-axis) of individual heartbeats appears to move up or down rather than staying straight, as shown in Fig 2. To keep these two problems from propagating to the classification stage, an effective method for cleaning up the ECG recordings is indispensable.
To correct the baseline wander, each ECG recording is processed with a 200-ms median filter followed by a 600-ms median filter to obtain the signal baseline, which is then subtracted from the raw ECG signal to get the baseline-corrected data. Then, a discrete wavelet transform is applied to remove the Gaussian white noise. The baseline-corrected recordings are decomposed into different frequency bands with various resolutions. We select the Daubechies-4 as the mother wavelet function because its short vanishing moment is ideal for analyzing signals with sudden changes, such as ECG. The detail coefficients (cD_x) in each frequency band are then processed by a high-pass filter with a threshold value
$$T = \sqrt{2 \log(n)}, \qquad (13)$$
where n indicates the length of the input signal. The coefficients that do not pass the filter are set to zero. Finally, the clean recordings are obtained by applying the inverse discrete wavelet transform to the coefficients. After noise reduction, the ECG recordings are segmented into individual heartbeats using the R locations provided by the databases. For each R peak, 90 samples (250 ms) before the R peak and 144 samples (400 ms) after the R peak are taken to represent a heartbeat. This is long enough to capture the samples representing ventricular repolarization and short enough to exclude the neighboring heartbeats [7].
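The baseline correction, coefficient thresholding and beat segmentation steps can be sketched in plain Python as follows. This is a rough illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, window lengths are rounded up to odd sample counts, the logarithm in the threshold is taken as the natural logarithm, a coefficient "passes" when its magnitude reaches T, and the wavelet decomposition itself is left out.

```python
import math
from statistics import median

FS = 360  # sampling rate in Hz, as used for the resampled recordings

def median_filter(signal, width):
    """Sliding median with an odd window; edges use a truncated window."""
    half = width // 2
    return [median(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]

def remove_baseline(signal, fs=FS):
    """Estimate the baseline with 200-ms then 600-ms median filters and subtract it."""
    w1 = round(0.2 * fs) | 1   # force an odd window length (assumption)
    w2 = round(0.6 * fs) | 1
    baseline = median_filter(median_filter(signal, w1), w2)
    return [x - b for x, b in zip(signal, baseline)]

def hard_threshold(coeffs, n):
    """Zero every detail coefficient whose magnitude is below T = sqrt(2 * ln(n))."""
    t = math.sqrt(2 * math.log(n))
    return [c if abs(c) >= t else 0.0 for c in coeffs]

def segment_beats(signal, r_locations, before=90, after=144):
    """Cut 90 samples before and 144 after each R peak (235 samples per beat)."""
    beats = []
    for r in r_locations:
        # Skip beats whose window would run past either end of the recording.
        if r - before >= 0 and r + after < len(signal):
            beats.append(signal[r - before: r + after + 1])
    return beats
```

Each segmented beat holds 90 + 1 + 144 = 235 samples, with the R peak at index 90.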

Feature extraction
Three types of features are used to characterize a heartbeat in this work: RR-interval, HOS and wavelet coefficients. Table 1 summarizes the statistics of these features and gives their p-values among the N, S and V beats. The RR-interval is the time distance between two successive R peaks. Specifically, the interval between the current R peak and the previous R peak is known as pre-RR, while the interval between the current R peak and the following R peak is post-RR. The RR-interval is one of the most indispensable features used for heartbeat classification. Zhancheng et al. [2] have done extensive work to show that pre-RR is the top distinguishing feature for recognizing S beats. Table 1 shows that the p-value of pre-RR between classes N and S is 2.16e−58, which means that pre-RR differs significantly between the N and S beats.
The skewness (3rd order statistic) and the kurtosis (4th order statistic) are effective in estimating shape parameters of ECG signals. They distinguish V beats well because the major difference between V beats and other types of heartbeats is the shape. The corresponding p-values in Table 1 support this statement.
The wavelet coefficients provide multi-frequency-band information about a signal. Since each heartbeat contains only 235 data samples, the maximum level of wavelet decomposition is 7. As reported by Asl et al. [12], each type of heartbeat has its own representative and distinct components in the detail information at levels 4-7. In this study, the detail information at these levels is used to represent the morphology-related features of an ECG signal.
In conclusion, each of the above-mentioned features is sensitive to at least one type of heartbeat. However, grouping all these features into a single feature set to classify all types of heartbeats together is likely to lead to poor classification performance. Therefore, a pyramid-like model is proposed to select and organize these features to improve performance.
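The pre-RR and post-RR definitions translate directly into code. The sketch below assumes that R-peak locations are given as sample indices and that the sampling rate is 360 Hz; the function name `rr_features` is hypothetical, and the first and last beats are skipped because they lack one neighbour.

```python
def rr_features(r_locations, fs=360):
    """Return (pre_rr, post_rr) in seconds for every beat that has both neighbours."""
    feats = []
    for i in range(1, len(r_locations) - 1):
        pre_rr = (r_locations[i] - r_locations[i - 1]) / fs
        post_rr = (r_locations[i + 1] - r_locations[i]) / fs
        feats.append((pre_rr, post_rr))
    return feats
```

For R peaks at samples 0, 360, 540 and 900, the second beat has a pre-RR of 1.0 s and a post-RR of 0.5 s.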
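One common way to arrive at such a bound, assuming that each decomposition level halves the number of samples and that at least one coefficient must remain, is:

```latex
L_{\max} = \left\lfloor \log_2 235 \right\rfloor = 7,
\qquad \text{since } 2^{7} = 128 \le 235 < 256 = 2^{8}.
```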

Pyramid-like classification model
The proposed pyramid-like model is made up of the nsDispatcher, nRefiner and sRefiner. Heartbeats are processed by the nsDispatcher first (level-1 classification), where each heartbeat is categorized into the N or S group. After that, in the level-2 classification, the nRefiner classifies the heartbeats in the upper N group into the N, V, F or Q group. Simultaneously, the sRefiner classifies the heartbeats in the upper S group into the S, V, F or Q group.
When shape-related features are taken into consideration, N and S beats are difficult to distinguish because they share a similar QRS complex. Therefore, we treat the classification of N and S beats as a separate problem. In the nsDispatcher, only the heart rhythm information (RR-interval) is considered.
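The two-level routing can be summarized in a few lines. In this sketch the three components are passed in as plain callables, which is an assumption made for illustration; the function name `classify_beat` is hypothetical.

```python
def classify_beat(beat, ns_dispatcher, n_refiner, s_refiner):
    """Route one beat through the two levels and return its final label."""
    group = ns_dispatcher(beat)      # level 1: "N" or "S"
    if group == "N":
        return n_refiner(beat)       # level 2: one of "N", "V", "F", "Q"
    return s_refiner(beat)           # level 2: one of "S", "V", "F", "Q"
```

Keeping the refiners separate is what allows each of them to use its own classifier structure and feature set.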
Model training. The core of the nsDispatcher is the set of decision rules shown in lines 20-33 of Algorithm 1. They determine which group (N or S) a heartbeat belongs to. Let hb denote a heartbeat and t be the threshold value. The decision rules are expressed as rule 1 and rule 2, where normalPreRR represents the median value of the pre-RR values of the normal heartbeats.
The rules are motivated by two observations: (1) an S beat generally has a shorter pre-RR value than a surrounding N beat; and (2) the gap between the pre-RR values of an S beat and an N beat varies between patients. Therefore, a heartbeat should not be treated as an independent data sample, but should be associated with the surrounding beats as well as the patient-specific information. Rule 1 uses the surrounding beats to help classification. Suppose that in an ECG recording, there is an S beat followed by an N beat. The S beat can be easily caught by rule 1. However, when there are two successive S or N beats, rule 1 can fail because there is not enough information. Rule 2 is therefore applied to complement rule 1 by taking advantage of the patient-specific information (normalPreRR).
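The exact expressions of the two rules are not reproduced in this text. The sketch below shows one plausible form that is consistent with the description and with a threshold in (−1, 0): both rules are assumed to be relative differences of the pre-RR value, against the beat's own post-RR (rule 1) and against normalPreRR (rule 2). The function name and both formulas are assumptions for illustration, not the authors' definitions.

```python
def is_s_beat(pre_rr, post_rr, normal_pre_rr, t):
    """Return True if either assumed rule flags the beat as S (t is negative)."""
    # Assumed rule 1: pre-RR is much shorter than the beat's own post-RR.
    rule1 = (pre_rr - post_rr) / post_rr < t
    # Assumed rule 2: pre-RR is much shorter than the patient's normal pre-RR.
    rule2 = (pre_rr - normal_pre_rr) / normal_pre_rr < t
    return rule1 or rule2
```

With `t = -0.2`, a beat with pre-RR 0.5 s and post-RR 1.0 s is caught by the first rule, while two successive short beats (pre-RR and post-RR both 0.5 s) are only caught by the second rule when the normal pre-RR is 0.8 s.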
If either rule is satisfied, the heartbeat is categorized as class S, otherwise as class N. The goal of the training process is to find, for each patient, the threshold value (t) that gives the decision rules a high detection sensitivity on both the N and S beats. We traverse every possible t in the range (−1, 0). Values beyond this range are practically impossible. The parameter step controls the precision of t. The smaller the step, the more precise the t, but the more time-consuming the training process. The objective function is evaluated at line 39 of Algorithm 1, and the trained threshold values are stored in trsValues (line 39 in Algorithm 1).
For the nRefiner and the sRefiner, Table 2 summarizes their compositions and the training features. Notice that the N group is seriously imbalanced and dominated by normal heartbeats. To reduce the impact of the imbalance problem, a mixed classifier ensemble method is applied in the nRefiner. The heartbeat rhythm is excluded from the training of the sRefiner because V beats, like S beats, can also have irregular RR-interval values.
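The threshold search can be sketched as a simple grid scan. The objective function itself is not reproduced in this text, so the sketch assumes it is the sum of the N and S sensitivities; the function names, the `classify(beat, t)` callable and the convention of returning 0 when a patient has no S beat are likewise assumptions for illustration.

```python
def sensitivity(labels, predictions, cls):
    """Fraction of beats of class `cls` that were predicted as `cls`."""
    hits = [p == cls for l, p in zip(labels, predictions) if l == cls]
    return sum(hits) / len(hits) if hits else 0.0

def train_threshold(beats, labels, classify, step=0.01):
    """Scan t over (-1, 0) and keep the value maximising Se_N + Se_S (assumed objective)."""
    if "S" not in labels:
        return 0.0                       # assumed convention: no S beat for this patient
    best_t, best_score = 0.0, -1.0
    k = 1
    while k * step < 1.0:
        t = -k * step
        preds = [classify(b, t) for b in beats]
        score = sensitivity(labels, preds, "N") + sensitivity(labels, preds, "S")
        if score > best_score:           # keep the first (largest) t on ties
            best_t, best_score = t, score
        k += 1
    return best_t
```

A smaller `step` gives a finer grid at the cost of more passes over the patient's beats, which mirrors the precision and runtime trade-off described above.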
Classification. The details of level-1 and level-2 classification are presented in Algorithm 2 and Algorithm 4, respectively. The algorithm goes on by looking for the patient p_b in DS_training whose pre-RR value distribution is most similar to that of p_a, and assigns p_b's threshold value to p_a (lines 12-13 in Algorithm 2). We implement a function named getNeighbor (Algorithm 3) to perform this task. The function uses the Earth mover's distance (EMD) to measure the dissimilarity of two distributions. Notice that if p_b's threshold value equals 0, which means that no S beat is found in p_b, there is believed to be a low probability of finding S beats in p_a as well. However, we never want to miss a potential S beat, as this may have serious consequences for a patient. In such a case, we assign the smallest value in trsValues to p_a (lines 14-16 in Algorithm 2). This means that the algorithm tries to search for potential S beats while avoiding the classification of N beats as S beats.
Once the estimate E(normalPreRR) and the threshold t are ready, the heartbeats are processed by the decision rules.
Algorithm 3: find the nearest neighbor of a patient
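The listing of Algorithm 3 is not reproduced in this text. The following sketch shows one way such a nearest-neighbor lookup could work, assuming that each patient is represented by the raw list of pre-RR values with uniform weights. In that one-dimensional case, with the absolute difference as ground distance, the EMD equals the area between the two empirical cumulative distribution functions. The names `emd_1d` and `get_neighbor` are hypothetical.

```python
def emd_1d(a, b):
    """EMD between two 1-D samples with uniform weights (area between empirical CDFs)."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    i = j = 0
    dist = 0.0
    for left, right in zip(points, points[1:]):
        # Advance both CDFs up to and including the left edge of the interval.
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        dist += abs(i / len(a) - j / len(b)) * (right - left)
    return dist

def get_neighbor(test_pre_rrs, training_patients):
    """Return the id of the training patient whose pre-RR distribution is closest in EMD.

    `training_patients` maps a patient id to that patient's list of pre-RR values.
    """
    return min(training_patients,
               key=lambda pid: emd_1d(test_pre_rrs, training_patients[pid]))
```

The returned patient id can then be used to look up the stored threshold value for that patient.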

Experimental ECG databases
In this section, two ECG databases are introduced, namely the MIT-BIH-AR database and the INCART database. They are publicly accessible from PhysioBank [33]. S1 File contains hyperlinks for downloading the data.
Most works on heartbeat classification trained and evaluated their models on the MIT-BIH-AR database. To allow a fair comparison, both the training and the evaluation of the pyramid-like model are performed on the MIT-BIH-AR database as well. In addition, we use the INCART database to assess the generalization performance of the proposed model.
All ECG recordings in these databases have an equal length of 30 minutes, but they are not sampled at the same frequency. They need to be re-sampled to 360Hz before use. The recordings are labeled at the heartbeat level. The original heartbeat annotations include 15 classes, which are further grouped into 5 super-classes by the AAMI [1], as shown in Table 3.
Details of these databases are given below.
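The text does not state which resampling method is used. As a simple illustration, a recording could be brought to 360Hz by linear interpolation between neighbouring samples, as sketched below; the function name is hypothetical, the input is assumed to hold at least two samples, and a band-limited (polyphase) resampler would be the more careful choice in practice.

```python
def resample_linear(signal, fs_in, fs_out=360):
    """Resample by linear interpolation between neighbouring input samples."""
    n_out = int((len(signal) - 1) * fs_out / fs_in) + 1
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out              # position in input-sample units
        i = min(int(pos), len(signal) - 2)    # left neighbour, clamped at the end
        frac = pos - i
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out
```

Beat annotations given as sample indices would need to be rescaled by the same ratio `fs_out / fs_in`.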

MIT-BIH-AR database
The database contains 48 two-lead ambulatory ECG recordings from 47 patients (22 females and 25 males). Each recording is denoted by a 3-digit number. The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10-mV range. For most of them, the first lead is the modified limb lead II (except for recording 114). The second lead is a precordial lead (usually V1, sometimes V2, V4 or V5, depending on the subject). In this study, only the modified limb lead II is used. The database is seriously imbalanced, with N beats dominating most of the recordings. Therefore, the k-fold validation scheme cannot be applied to split the database for training and testing. Two different paradigms are found in the literature to solve this problem [2,3,6,7]. One is the intra-patient paradigm, which first mixes up the heartbeats from all recordings and then evenly allocates each category of heartbeats into two groups. The other is the inter-patient paradigm, in which the ECG recordings are divided into two datasets (DS1 and DS2), each containing approximately the same proportion of heartbeat classes. Table 4 shows the division and the corresponding distribution of heartbeat classes. DS1 is used for model training and DS2 is used for model performance evaluation.
It has been shown empirically that the intra-patient paradigm can bias the classification result by allowing training and testing heartbeats to come from the same patient [9]. By contrast, the inter-patient paradigm is more objective. To reveal the true performance of the pyramid-like model and allow a fair comparison with state-of-the-art rivals, the inter-patient paradigm is strictly followed in this work.

Experimental evaluation
In this section, we conduct a benchmark evaluation of the proposed pyramid-like model on the MIT-BIH-AR database and compare the result with state-of-the-art methods. In addition, we use the INCART database to assess the generalization performance of the model.
All the experiments presented in this work were programmed in Python 3.6.3 and run on a 64-bit Windows 10 PC with an i5-4590 CPU and 12 GB of memory.

Evaluation metrics
In this work, the performance is evaluated by sensitivity (Se), positive predictive value (+P) and accuracy (Acc), defined as
$$Se = \frac{TP}{TP + FN}, \qquad +P = \frac{TP}{TP + FP}, \qquad Acc = \frac{TP + TN}{\Sigma},$$
where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively, and Σ represents the number of instances in the data set.
It should be noted that penalties would not be applied for the misclassification of class F and Q, as recommended by the AAMI standard.
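The metrics can be computed from label lists as in the sketch below. It is a minimal illustration with hypothetical function names: Se and +P are computed per class with that class treated as positive, the accuracy is the overall fraction of correctly classified beats, and the AAMI exemption for class F and Q misclassifications is not modeled.

```python
def class_metrics(labels, predictions, cls):
    """Return (Se, +P) for one class, treating it as the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(l == cls and p == cls for l, p in pairs)
    fn = sum(l == cls and p != cls for l, p in pairs)
    fp = sum(l != cls and p == cls for l, p in pairs)
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return se, ppv

def accuracy(labels, predictions):
    """Overall accuracy: correctly classified beats over all beats."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)
```

For example, if one N beat is wrongly flagged as S and the only true S beat is found, class S has a sensitivity of 1.0 but a positive predictive value of 0.5.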

Classification result and discussion
Table 6 shows the result of the level-1 classification. The majority of the N and S beats are correctly classified by the nsDispatcher. Although 3153 N beats are misclassified as S beats, they account for only a small portion of the total N beats. A good classification sensitivity and positive predictive value on the N beats is still achieved. On the other hand, the misclassified N beats lower the positive predictive value of the S beats. However, as heartbeat classification plays an important role in identifying cardiac arrhythmia, the accuracy on class S is considered most important [28]. Overall, the nsDispatcher does a decent job. Table 7 gives the final classification results of the proposed pyramid-like model in detail. It is worth noting that, from level-1 to level-2 classification, only 164 N beats and 87 S beats are misclassified by the nRefiner and the sRefiner. In addition, the level-2 classification achieves superior performance in the detection of V beats. The results indicate the effectiveness of the nRefiner and the sRefiner. For the F and Q beats, poor performance is obtained, which is expected because both F and Q beats are originally unclassifiable. The same issue is commonly found in existing research.
The proposed model is compared to the state-of-the-art methods on the same test set (DS2). Table 8 summarizes the comparative result. The proposed model exhibits higher performance in terms of the positive predictive value of N beats and the sensitivity on the disease heartbeats (S and V). In addition, it takes second place in global accuracy (91.5%) and in the sensitivity on class N (99.0%).
Although our model has the lowest positive predictive value on the S beats, it achieves a breakthrough in sensitivity (91.0%). In fact, the positive predictive values of class S are low in most existing methods. The best one is obtained by Ye C et al. [6], which is just 17% better than ours, whereas our sensitivity is more than 30% higher.

Generalization result and discussion
The classification result on the INCART database is summarized in Table 9. The performance is compared to the latest work by Mariano L. and Juan P. [4], which is the only work we could find that evaluates a model on both the MIT-BIH-AR and the INCART databases. Table 10 presents the comparative result.
Notice that the compared method [4] follows the AAMI2 labeling, where classes F and Q are merged into class V. To allow a fair comparison, we adapt the proposed model to the AAMI2 labeling.
As seen from Table 10, the proposed model performs comparably to the rival on the INCART database. Both works achieve similar values in all metrics. However, looking back at Table 8, the proposed pyramid-like model performs better on DS2.
It is worth noting that, from DS2 to the INCART database, the proposed model maintains stable heartbeat classification performance. This is very important, as robustness is indispensable for an algorithm to be applied in clinical practice.
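The label adaptation amounts to a small mapping. The sketch below assumes labels are stored as single-letter AAMI class strings; the names `AAMI2_MERGE` and `to_aami2` are hypothetical.

```python
# F and Q are folded into V; N, S and V stay as they are.
AAMI2_MERGE = {"F": "V", "Q": "V"}

def to_aami2(labels):
    """Map AAMI labels to the AAMI2 labeling."""
    return [AAMI2_MERGE.get(label, label) for label in labels]
```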

Conclusion
Millions of people around the world suffer from cardiac arrhythmia. Automatic heartbeat classification helps identify this condition early, making it possible for people to get the right treatment sooner. In this paper, a pyramid-like model has been proposed for automatic heartbeat classification. The model integrates three components, namely the nsDispatcher, nRefiner and sRefiner. During the classification process, the nsDispatcher first allocates the heartbeats to the N or S group. The nRefiner and the sRefiner then further classify the heartbeats in the N and S groups, respectively, to give the final decision. The significance of the proposed model is that it takes the surrounding heartbeats as well as the patient-specific information into consideration to help identify an S beat. In addition, the nRefiner and the sRefiner are customized with different classifier structures and training features to suit the classification requirements of the N and S groups.
The proposed model has been evaluated on the MIT-BIH-AR database, with its performance compared against state-of-the-art methods. In addition, the INCART database is used to measure the generalization performance of the proposed model. The experimental results demonstrate the effectiveness and robustness of the proposed model in heartbeat classification.
Fig 1 presents the whole decomposition process. Only the low-frequency components are decomposed.

Fig 1. A demonstration of discrete wavelet decomposition. cA_x and cD_x denote the wavelet coefficients of the coarse approximation and the detail information at level x, respectively. https://doi.org/10.1371/journal.pone.0206593.g001
Fig 3 gives a visual demonstration of the feature significance via boxplots. The boxplot of each wavelet coefficient can be found in the S2 File.

Table 1. Feature statistics and the corresponding p-values between heartbeat classes.
https://doi.org/10.1371/journal.pone.0206593.t001
Algorithm 1 presents the training process of the nsDispatcher. The input training database is denoted as DS_training, where each ECG recording represents a patient.

Table 2. The nRefiner and the sRefiner.
https://doi.org/10.1371/journal.pone.0206593.t002
In level-1 classification, one important step is the estimation of the normal pre-RR value of a patient (lines 4-11 in Algorithm 2). For each patient p_a in DS_test, we perform a statistical analysis of p_a's heartbeat pre-RR values via boxplotting. If less than 10% of the data are considered outliers, we assume that the ECG recording is dominated by normal heartbeats and use median(heartbeats.preRRs) as the normal pre-RR value. Such an assumption is practical and reasonable because S beats occur sparsely in real-world applications. On the other hand, if more than 10% of the data are considered outliers, the ECG recording is likely to be distorted by S beats, and median(heartbeats.preRRs) could represent the pre-RR value of an S beat. In such a case, a different estimate is used.
In level-2 classification (Algorithm 4), each heartbeat in the N group is further classified by the nRefiner into class N, V, F or Q. Similarly, the sRefiner reclassifies the S beats into class S, V, F or Q.
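The boxplot-based check can be sketched as follows. The usual 1.5 × IQR whisker rule is assumed for flagging outliers, which the text does not confirm, and the function name is hypothetical. Because the estimate used for the second case is not recoverable from this text, the sketch returns `None` there instead of guessing.

```python
from statistics import median, quantiles

def estimate_normal_pre_rr(pre_rrs, max_outlier_ratio=0.10):
    """Return the median pre-RR if fewer than 10% of values are boxplot outliers, else None."""
    q1, _, q3 = quantiles(pre_rrs, n=4)
    iqr = q3 - q1
    # Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sum(1 for x in pre_rrs if x < low or x > high)
    if outliers / len(pre_rrs) < max_outlier_ratio:
        return median(pre_rrs)
    return None   # the alternative estimate is not specified in this text
```

`statistics.quantiles` requires Python 3.8 or later.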

Table 3. ECG-based heartbeat annotations. Columns: AAMI class, Original class, Type of beat.
The INCART database consists of 75 ECG recordings sampled at 257Hz. Each recording contains 12 standard leads. Similarly, only the modified limb lead II is used in this study. The annotations were first produced by an automatic algorithm and then corrected manually based on the standard PhysioBank beat annotation definitions. None of the recordings contains pacemakers, but most of them have ventricular ectopic beats. The heartbeat distribution of the INCART database is shown in Table 5.