Robust, real-time generic detector based on a multi-feature probabilistic method

Robust, real-time event detection from physiological signals acquired during long-term ambulatory monitoring still represents a major challenge for highly-artifacted signals. In this paper, we propose an original and generic multi-feature probabilistic detector (MFPD) and apply it to real-time QRS complex detection under noisy conditions. The MFPD method calculates a binary Bayesian probability for each derived feature and makes a centralized fusion, using the Kullback-Leibler divergence. The method is evaluated on two ECG databases: 1) the MIT-BIH arrhythmia database from Physionet containing clean ECG signals, 2) a benchmark noisy database created by adding noise recordings of the MIT-BIH noise stress test database, also from Physionet, to the MIT-BIH arrhythmia database. Results are compared with a well-known wavelet-based detector, and two recently published detectors: one based on spatiotemporal characteristic of the QRS complex and the second, as the MFDP, based on feature calculations from the University of New South Wales detector (UNSW). For both benchmark Physionet databases, the proposed MFPD method achieves the lowest standard deviation in sensitivity and positive predictivity (+P) despite its online algorithm architecture. While the statistics are comparable for low-to mildly artifactual ECG signals, the MFPD outperforms reference methods for artifacted ECG with low SNR levels reaching 87.48 ± 14.21% in sensitivity and 89.39 ± 14.67% in +P as compared to 88.30 ± 17.66% and 86.06 ± 19.67% respectively from UNSW, the best performing reference method. With demonstrations on the extensively studied QRS detection problem, we consider that the proposed generic structure of the multi-feature probabilistic detector should offer promising perspectives for long-term monitoring applications for highly-artifacted signals.


Introduction
Event detections from physiological signals are often faced with important noise perturbations, especially in clinical monitoring context. Main strategies are focused on finding an efficient feature reliable in most cases. Generally, these methods such as [1] get interesting results under low-to mid-level noise conditions, but performances decrease significantly with the signal-to-noise ratio (SNR) diminution or with a change in the noise type since all features have vulnerabilities to specific distortions. To circumvent this weakness, multi-feature detectors were proposed [2] but the decentralized fusion method does not fully exploit feature informations using statistical learning strategies. The main objective of this paper is to propose a generic event detection method with centralized fusion, in which the final decision is made by a weighted sum of posterior detection probabilities derived from each feature's statistical properties. The power of the method is illustrated by its application to real-time QRS complex detection from electrocardiogram (ECG) signals. QRS complex is the most prominent deflection in ECG signal and corresponds to the electrical depolarization of ventricles. The detection is often the first analysis performed on ECG signal processing, in order to estimate basic cardiac markers, such as heart rate or to perform further ECG segmentation and analysis. The QRS complex detection has been investigated for many decades [1] and yet remains a challenge [3] as an event detection problem from physiological signals. Many different methods have been proposed and a number of review publications have been dedicated to this subject [4] [5]. The main proposed methods are based on filtering and non linear transformations [1], fuzzy hybrid neural networks [6], S-Transform [7] or wavelet analysis [8][9][10][11]. Although these QRS detection methods perform well in low-to mid-level noise conditions, their applications on long-term periods of ECG recordings under noisy conditions such as in ambulatory care and intensive care units still pose a significant challenge as evidenced by the large number of RR correction methods reported in [12]. Indeed, these ECG recordings are often prone to episodes of strong signal non-stationarity, sudden modifications of beat morphologies and most importantly the presence of several types of noise (baseline drift, saturation, power-line pickup, muscular contractions and motion artifacts [13]). Recently in [14], the authors explained that QRS detection has not been completely assessed in terms of robustness to noise. As a consequence, recent publications [15] [16] [17] and a recent PhysioNet challenge [3] have been focused on the specific problem of robust QRS detection. Furthermore, the emergence of wearable cardiac monitors, with a limited number of leads [18] [19] for long-term daily-life recordings [20] further revives the research interests on robust QRS detection for low-quality electrocardiograms.
In our previous works, we have proposed different methods to improve the robustness of QRS detection, through multisensor fusion [2], adaptive selection of QRS detectors as a function of the signal context [21] or through optimal detector parameter configuration, using evolutionary methods [22] [23]. More recently, we revisited this optimization process in order to identify optimal parameter configurations with respect to changes in signal noise [24].
In this paper, we propose and evaluate a novel, generic event detector, that provides improved robustness through the probabilistic combination of a set of signal features. Section 2 presents the general architecture of the proposed Multi-Feature Probabilistic Detector (MFPD) and a specific implementation adapted to robust QRS detection. Section 4 evaluates its performances on two ECG databases: 1) the benchmark MIT-BIH [25] to validate the proposed method on clean ECG signals with consensus annotations, 2) the benchmark noisy database (created by adding noise recordings to the MIT-BIH) with known artifact types and levels, to validate the proposed method on highly artifactual ECG signals including various artifacts.
1. Pre-processing: Raw signals are processed in order to improve the SNR and to pre-select potential candidates (to be validated by the detector) of the events of interest at instants t.
2. Feature extraction: For every event candidate selected at instant t, a vector CðtÞ ¼ fC i ðtÞji 2 Ig is created, where I is a set of complementary features extracted from the preprocessed signals.
3. Probability density estimation: The probability density functions (pdf), noted as P i ðC i ðtÞ; Y i0=1 ; H 0=1 Þ are used to model feature i on the observed candidate C(t), with the two hypotheses: where D(t) is the final detection decision (D(t) = 1 for detection and D(t) = 0 otherwise) and Θ i0/1 the parameter set for each hypothesis. Note that for each feature, the two pdf belong to the same distribution family, whose parameters Θ i0/1 are initialized at the beginning of the recording and updated throughout the detection process (see Section 2.2.3 for more details).

Probabilistic characterization:
The posterior probability P i ðH 1 jC i ðtÞÞ is calculated by applying the Bayes law. Moreover, the Kullback-Leibler divergence (KLD) between each pdf pair characterizing feature i, D i KL , is calculated. 5. Decision fusion: The posterior probabilities, weighted by their respective KLD, are combined to build a binary decision D(t) on whether candidate C(t) is a valid event (D(t) = 1) or not (D(t) = 0). According to the decision for the current candidate, distribution parameters are updated to complete the real-time learning process.

Real-time QRS detection implementation
In this section, we detail the realization of the above-mentioned MFPD, adapted for real-time detection of QRS complexes. From the generic approach of Fig 1, the specific adaptions to QRS complex detection include mainly the pre-processing (step 1) and the feature extraction (step 2), they are depicted in Fig 2. 2.2.1 Pre-processing. As in many other QRS detection methods, the first step consists in applying to the raw ECG signal different transformations. Typically, a band-pass filter, a derivative filter, a non-linear transformation and a final smoothing filter are applied. We adopt here the pre-processing method used in [1,26]: Fig 2 represents a diagram of these steps. In the following, the band-pass filtered ECG signal, computed using finite impulse response (FIR) low pass (f cutoff = 19 Hz, order = 256) and high pass (f cutoff = 8 Hz, order = 256) filters, will be denoted SAecg; and the output of a squared transformation followed by a derivative FIR filter (f cutoff = 30 Hz, order = 129) then a smoothing filter (length = 101 ms) on signal SAecg will be denoted SFecg (cf Fig 2). FIR filters were designed using Remez exchange algorithm. Each local maximum detected at an instant t on SFecg is considered as a potential QRS candidate.

Feature extraction.
In this paper, a set of 3 features CðtÞ ¼ fC i ðtÞ; i 2 Ig; I ¼ fs; a; cg is extracted for each QRS candidate. These features are the input to the probability density estimation step (step 3 in Section 2.1 and Fig 1). The squared slope (s), the raw amplitude (a) and the absolute correlation with a beat template (c) constitute the most common features used in the literature for QRS detection. In the current study, we prove the concept of decision fusion with adaptive weights to track the relative importance of each feature while the optimization of feature selection lies beyond the scope of our investigation.
1. Feature probability models: in the proposed QRS detection application, each candidate is characterized by a set of 3 features I ¼ fs; a; cg. The probability distribution function (pdf) of features are chosen with several conditions: 1) distributions should have the same support as the calculated feature, e.g. (0, 1) for the squared slope, (−1, +1) for the amplitude and [0, 1] for the absolute correlation, 2) pdf parameter estimations should be easy to calculate, with low complexity in the updating methods, 3) the distance measures on these pdfs should also be easy to compute and tractable, without numerical integrations. In this paper, we implemented three different pdfs: (i) a Beta distribution with closed support, (ii) a Gamma distribution with one-sided open support, and (iii) a generalized normal distribution (GND) with two-side open support. Thus, pdf parameter update and distance calculation methods in this paper can be generalized to include most continuous features in the future.
• The squared slope of the peak (s) is the value of SFecg signal at instant t. This feature is represented with the Gamma distribution with two degrees of freedom: where k 2 R þ is the shape parameter, and y 2 R þ the scale parameter. The indicator function 1 1 x>0 typically limits the function support to R þ .
• The peak amplitude (a) is the value of SAecg signal at instant t. We characterized it using the GND defined as: where GðtÞ ¼ R 1 0 x tÀ 1 e À x dx is the gamma function and m 2 R the position parameter, a 2 R þ the scale parameter and b2 R þ the shape parameter. Note that both positive and negative peak values can be fitted with the GND model.
• The absolute Bravais-Pearson correlation (c) is calculated between the candidate peak (represented by 50 ms of raw ECG signal centered 20 ms before the peak) at instant t and an adaptive template. The template duration was chosen in order to extract mainly the information around the peak, where the information is most characteristic of the QRS complex in our opinion. With a longer template duration, QRS complexes affected by noise can obtain a lower correlation, this weakness is less present with a short template duration.
With an even shorter duration, high-frequency artifacts can be very similar to QRS complexes. In order to model this feature, we have chosen the Beta distribution, defined as: This can also be considered as a special case of the Dirichlet distribution, with two positive shape parameters α and β. Parameters are estimated using maximum a posteriori (MAP) method, further details are reported in section 4.2.
Note again that the same pdf family is proposed for both H 0 and H 1 considering the numerical tractability of the KLD calculations (see discussion in 2.2.4).
2. Initialization: As in step 3) in section 2.1, the model parameters-{k, θ} for the Gamma distribution, {α, β, μ} for the GND, and {α, β} for the Beta distribution-are initialized during the heating-up period (cf block distribution initialization in Fig 1) using the Pan-Tompkins detector [1] for the 40 first validated QRS detections (note that these detections are considered in the performance evaluation step). At the end of the heating-up period, all QRS candidates are labelled as either validated (H 1 ) or invalidated (H 0 ) and the model parameters are estimated for each distribution, using the maximum likelihood estimator (MLE) for the s and a feature, and MAP estimator for the c feature. Beat templates (for both H 1 and H 0 ) are also initialized (aligned and averaged) using the validated and invalidated candidates of the heating-up.
3. Learning: Once initialized, the MFPD detector shifts to the decision fusion mode (step 3-5 in section 2.1 and Fig 1) whose final detection results (of step 5) feed the pdf parameter updating (step 3) in the same manner as during the initialization. KLDs are updated as a direct consequence (step 4). In a similar manner, the beat templates are updated after each decision using 80% of the previous template and 20% of the current candidate. If no detection occurs during 3.5 sec, all parameters are reset and a new initialization starts.

Probabilistic characterization.
Based on the pdf model for each feature in 2.2.3, two probabilistic markers are calculated for each feature: the posterior probability and the KLD. The posterior probability of validating H 1 for feature C i (t) is given by using the Bayes rule. A binary decision can typically be made by thresholding this probability as in most single feature-based QRS detectors. We propose in this paper a centralized fusion by making a weighted sum of these posterior probabilities. The key here is to update dynamically a metric to measure the pertinence of each feature, or the power of separating two classes in the detection context by measuring the distance between the two antagonist pdfs. The KLD is such a non-negative measure defined by: It is particularly well-suited to assess the distance between the distribution pair P i ðC i ðtÞ; Y i0=1 ; H 0=1 Þ. Analytic expressions can be found in the literature in the case of Beta [27] and Gamma distributions [28]. However for the GND case and to our best knowledge, no analytic expression can be found in the general case when μ p 6 ¼ μ q . One theoretical contribution of this paper is to efficiently calculate the KLD between two GND distributions in the case of general settings. While detailed derivation is reported in the appendix B, here we give a summary of the main results: 1. Analytic expressions for b q 2 N þ [f0g have been derived; 2. Eq (5) is monotonously increasing with β q ; 3. The computational complexity requires 2 × (β q + 1) gamma function evaluations.
Thus without numerical integration of Eq (5), we are able to obtain a close approximation of the KLD value for all b q 2 R þ . Note that the other parameters (α p , α q , β p , μ p , μ q ) have no effect on the calculation complexity.
Finally, and without loss of generality, the KLD is one of the f-divergence functions that measures the difference between two probability distributions. They are non-negative, monotonous and jointly convex. For example, the reverse KLD D R KL ðpkqÞ ¼ D KL ðqkpÞ is another fdivergence and can be easily calculated by reversing the role of the two distributions. However, it is less evident to derive tractable numerical methods for the Hellinger distance, or the χ 2divergence for the exponential pdf family.

Decision fusion.
Based on the probabilistic markers, the following decision rule is applied where λ is the decision threshold. The modified � D i KL is calculated by letting i � ¼ argmaxfD i KL g and: Such that the most significant KLD should not exceed 2/3 after normalization. Intuitively, the decision rule represents the sum of all posterior probabilities, weighted by their normalized KLD, such that features that are better separated in distributions (between H 0 and H 1 ) have more weight in the final decision making.

Databases
Two databases were used: • Benchmark database: we first performed a comparative study using the first lead of the MIT-BIH Arrhythmia Database with consensus annotations to show the detection performances on clean ECG signals even though the main objective of the paper is to evaluate detector's robustness under artifact conditions.
• Benchmark noisy database: we then created a benchmark simulated database by adding to the MIT-BIH Arrhythmia Database three noise sources (baseline wander, muscle and electrode motion artifact) extracted from the MIT-BIH noise stress test database. Noises were recorded from physically active volunteers [25] and with SNR levels from −6dB to 24dB by 6dB increments. It is composed of 864 noisy signals for which the reference annotations are simply copies of those for the MIT-BIH Arrhythmia Database. The purpose of this database test is to provide a ground truth for detection performance comparison with different levels and types of noises. A quick access to this benchmark database is available at: https://github. com/dge996/MIT_NoiseStress/ though it can also be constructed using the method described in [25].
We choose to report results on two databases to prove the design concept of the MFPD robustness for clean and noisy databases with consensus annotations. For the second one, both artifact type and level classification annotations are provided, they allow differentiated comparisons.

Comparison methods
As in recent publications [29] [15], performance of the proposed MFPD was compared with the following state of the art QRS detection methods: • University of New South Wales detector (UNSW): a feature-based (ECG amplitude and derivative) detector, with adaptive thresholding, suitable for both clinical and poorer quality tele-health ECG [15].
• Spatiotemporal Characteristics Detector (SCD): a simple and robust realtime QRS detection algorithm based on spatiotemporal characteristics [30], with state-of-the-art performances reported for noisy signals.
WBD was chosen because it is a reference in QRS detection. The other two methods were selected because their objectives are the same as MFDP, which is not the case for many detectors from the literature focused on clean ECG signals. SCD is described by their authors as robust and real-time, UNSW detector is described as suitable for poorer quality ECG signals. Moreover, both were recently published (2016) with open access toolbox. We underline however that contrary to UNSW and WBD methods, the MFPD is an online algorithm designed for monitoring applications which forbids typically bi-directional filtering and beat by beat corrections in post-treatment. We thus further included detection delay statistics, that cannot be obtained for classical detectors.

Performance criterion
In this study, we used the bxb program (as part of the WFDB applications available on physionet) to obtain the beat-by-beat performance statistics for QRS complex detection according to the American national standard for ambulatory ECG analyzers (ANSI/AAMI EC38). We report for the two databases the total number of TP, FP and FN as well as the sensitivity (Se) and positive predictivity (+P) metrics calculated per record and a mean and standard deviation alongside the overall result. As in recent publications [24] [34], we chose a match window of 50 ms. Only one detection is considered as true positive (TP) inside the match window centered at each QRS annotation while the rest are considered as false positive (FP) detections. A false negative (FN) is counted when no detection is found inside the match window. Due to their direct influences on heart rate variability analysis, we also report the mean and standard deviation of jitters: the difference in time between a TP and its associated annotation. To easily observe the increase in detection performance as a whole, the detection error rate DER = (FN+FP)/(TP+FN) is also reported. Furthermore, computational complexity and detection delay of the MFPD are also given since they are key implementation issues in realtime monitoring applications.

Results
In addition to the global performance evaluation and comparison summarized in 4.4, we also provide here some intermediate results to illustrate the multi-feature complementarity in 4.1, their distribution estimation results in 4.2, and the importance of KLD weighting in the centralized decision making in 4.3. Fig 3 shows an example of the processed ECG signals and the features extracted in this paper. Panel (a) shows an ECG segment from record MIT-101, with added electrode movement noise at 6 dB. Panels (b) (c) and (d) represent, respectively, the SAecg (feature a), the SFecg (feature s) signals and the Bravais-Pearson correlation (feature c) related to each QRS complex candidates in panel (a). Candidates with the × symbol are not validated, while those with the ⚬ symbol are validated as QRS by the MFPD method. In this example, all validated detections were TP and all invalidated candidates were true negatives (TN), not belonging to any annotation's match window. The vertical box pinpoints a segment that, if analyzed individually with thresholding, would have produced a false positive. The weighted fusion by Eq (6) however takes the opposite decision. This example shows the power of the multi-feature complementarity in such complex signal context.

Distribution estimation
Estimated parametric distributions (in dashed line) and normalized histogram (in vertical boxes) for each feature are illustrated in Fig 4. The x-axis stands for the dimensionless feature values while y-axis the probability density (also dimensionless). Generally speaking, the pdf type associated with updated parameters can reasonably fit the feature histograms. Kolmogorov-Smirnov tests were realized at each pdf parameter update, with the current candidates For mildly-artifacted ECG signals, due to the stability of the QRS complex morphology, the correlation features C c (t) can be close to 1 for most valid candidates. Thus large α values are estimated for the beta distribution P c ðC c ðtÞ; Y c1 ; H 1 Þ yielding a rapid convergence towards a dirac-like distribution around 1, and a large D c KL value (cf the net separation in the two distributions of the third column in Fig 4). As a consequence, future candidates must have a very high correlation to be validated, giving a decision with a high +P, but with a low sensitivity in the case of sudden noise artifacts or even mild morphology change. In order to obtain a better trade-off between +P and sensitivity, we propose to limit the estimated α parameter of the beta distribution by imposing a conjugate prior law: Pða; bÞ / Bða; bÞ K e À aa e À bb for K; a; b 2 N þ . The exp −aα term indeed forbids large values in estimating α to control the distribution shape of H 1 . Numerical implementation is detailed in appendix A for the MAP estimator.

KLD weighting
In this section, we show the importance of the KLD measure in comparison with both Single Featured Probabilistic Detector (SFPD) and reference methods in Fig 5. SFPD is implemented as MFPD but with only one feature (s, a or c). Note that the evolution of KLD values during the successive (in)-validation process is a direct consequence of the parameter-fitting process since Eq (5) is a function of the updated pdf parameters (cf [27,28], Appendix B). Indeed, the relative importance of the KLD of the a-feature during artifactual period suggests a better separability between the two antagonist distributions (H 0 vs H 1 ) and consequently highlights the decision from the a-feature in the centralized fusion (see Eq (6)). Interestingly, the KLD values tend to approach each other once the artifact period ends (near 50s), from which point all feature decisions participate more equally in the fusion. On the other hand, that the SFPD using the s-feature performs the worst (with multiple FP detections) is inline with its lowest KLD measure, or the smallest distances between the related H 0 and H 1 pdfs. Therefore, depending on the noise onset, the KLD measure of the distribution distances evolve dynamically to quantify the pertinence of each feature and to adapt their relative weights in the decision fusion. The power of the proposed method lies on this engineered flexibility since KLD are not learned from a particular database, but from the past observations of the features labelled by the detection results. Furthermore, the detection statistics of the whole record MIT-231 are given in Fig 5: the MFPD outperforms the SCD method in both sensitivity and +P, has lower sensitivity than UNSW and WBD but highest in +P (see Table 1), which is consistant with the illustration of the 60s segment detection results in Fig 5. Table 2, we first present the performance comparison on the MIT-BIH database, containing mainly clean ECG recordings, and high Se and +P results for all tested methods. We simply note that the MFPD has the lowest  Robust, real-time generic detector based on a multi-feature probabilistic method standard deviation for both Se and +P statistics, a phenomenon also observed in the next tests to illustrate its stability. On the whole, the results indicate a trade-off in favor of the +P as compared with the reference methods while the overall score DER also shows high detection accuracy in both mean and standard deviation terms. To illustrate the method's inter-subject stability, 2.08% of signals had a DER higher than 60% with MFPD versus 6.25% for UNSW, 2.08% for SCD and 8.33% for WBD. This test validates the MFPD on a clean ECG database with consensus annotations and peer methods.

MIT-BIH noise stress database.
A second performance analysis was done using the MIT-BIH noise stress database. Fig 6 compares Se and +P (in %) for different noise types. Firstly, for +P the MFPD outperforms almost all reference methods except in the case of muscle artifact with SNR lower than 0dB. As for Se, the MFPD has slightly lower performances in  Robust, real-time generic detector based on a multi-feature probabilistic method the high SNR region for all artifact types, but higher performances in the low SNR regions. In general the MFPD has better Se scores in comparison with the SCD and WBD method as reported by Table 3. The UNSW, on the other hand, has shown comparable performances in most cases, but suffers from FP detections in the presence of electrode motion artifacts. The MFPD also achieves the lowest overall DER score in terms of mean and standard deviation. To illustrate the method's inter-subject stability, 11.81% of signals had a DER higher than 60% with MFPD versus 16.78% for UNSW, 17.82% for SCD and 16.90% for WBD. These results demonstrate the MFPD viability and highlight its efficiency in noisy context. Finally, we can notice that performances with baseline wander artifacts are generally higher than with other noise types for all tested methods, which can be attributed to the preprocessing filtering steps that attenuate baseline wanders.

Complexity and realtime detection delay
For the computation complexity, the MFPD algorithm implemented in the C++ language costs 19.32 ± 4.75 s to analyze an hour of ECG signal resampled at 1000 Hz using a MacBook Pro laptop (2017, 3,1 GHz Intel Core i7) without multi-threading or graphic parallel computing. The resampling at 1000 Hz was done because, according to us, this frequency is more representative of actual ECG acquisition devices and easier to interpret for readers.
In realtime monitoring applications, event detection delay is also part of the performance that can be used to trade-off for more accuracy for example. It is different from the jitter reported in Tables 2 and 3 that measures the time distance between annotation and detection. The delay reported in Table 4 measures the time distance between the annotation and the sample time at which the MFPD fusion decision validates the corresponding QRS and is typically influenced by lengths of the filters and the templates to calculate the correlation feature. Table 4 shows that this delay in the monitoring is rather stable across the two databases. These results confirm the feasibility of the MFPD in realtime monitoring of ECG signals, even for multi-lead recordings.

Discussion
In this paper, a novel, generic and robust detector combining different features extracted from the signal of interest has been proposed. The original aspects of this method concern  Table 4. MFPD detection delay (mean and std in ms).

Detection delay in ms
particularly i) the probabilistic approach with online learning ii) the multi-feature design iii) the centralized fusion method based on KLD.
In the proposed method, the pdf of each feature is patient, device and even experience specific. Parametric probability models are designed with regard to the real-time constraints of our application to avoid the tuning of the number of bins and widths as in histogram based approaches, and the increasing evaluation costs, inherent to variable-bandwith kernel density estimation approaches [35]. Our proposed online learning method requires a small data sample (40 validated candidates) to initialize the probabilistic model for each recording. No manual annotations are needed since the Pan-Tompkins detector [1] results are used for feature extraction and classification labeling (H 0 /H 1 ). Prior laws of the Beta distribution for the correlation feature are fixed and not database dependent. Among the wide variety of pdf families, the GND seems well-suited for uncentered features with long tail distributions, but to our best knowledge, no analytic expression of the KLD can be found in the general case. Existing solutions [36] are limited to cases of equal means (see 2.2.4). We proposed in this paper an innovative estimation method of the KLD in the general case. With a reasonable computational cost, it can be used in real-time context.
The proposed MFPD QRS detector has been evaluated using two different databases and compared with two most recent state of the art detectors and one reference detector from the literature. Results show that the features provide complementary information to improve detection performance compared with single featured based QRS detectors (see Fig 5), particularly under noisy conditions. This is essentially due to the fact that different types of noise or uncommon pathological artifacts might influence features in different manners while MFPD makes a centralized decision using the KLD as weight to measure the relative pertinence of each feature. In a way, the MFPD is capable of neglecting features (that yield low KLD between the antagonist distributions) that are mostly corrupted by artefacts. Previously proposed methods based on decentralized fusion [2] and algorithm-switching [21] also prove the relevance of multi-feature approaches. To our knowledge, this is the first real-time QRS detection method integrating such an adaptive, multi-feature, centralized decision fusion. Furthermore, MFPD method is more compact and easy to implement than [2] or [21]. Quantitative comparisons results are particularly encouraging for challenging monitoring situations, in which the heterogeneity and levels of noise may be particularly high. For the two databases under evaluation, the ms-level jitter for all tested methods should be acceptable for further HRV parameter analysis and is thus not regarded as a distinguishing factor for comparison.
Even though this method was implemented for single-lead ECG signals, it can be extended to the multi-lead and multi-source cases. Indeed, further improvements in detection robustness are expected by combining multiple ECG leads, but also by integrating other physiological signals (pulse oximetry, phonocardiography, etc) or other sensors sensitive to noise (accelerometers for movement noise, etc). Future works will be directed towards the extension and evaluation of the method in these multi-channel, multi-source contexts. Finally, in addition to the qualitative results of the selected features' relevance shown in Fig 3 and in Fig 5 through KLD evolution, integrating the amplitude and derivative features as used by UNSW [15] into the MFPD architecture should lead to higher sensitivity in QRS detection.

Conclusion
We proposed an original multi-feature probabilistic detector working in real-time and applicable to different physiological signal applications. The method, illustrated on QRS complex detection, has been compared to two latest detectors in the literature, using the MIT-BIH arrhythmia benchmark database and a benchmark noisy database by adding noise recordings [25] to the MIT-BIH database. The proposed MFPD has achieved significant performance improvements on artifacted signals and comparable (in mean) but more stable (in std) performances for low-to mildly artifacted signals. These performance improvements are mainly due to the multi-feature probabilistic model and the KLD-based decision fusion that adaptively adjust the relative contribution of each feature's decision in real-time.
The following contributions and originalities can be highlighted. The MFPD uses a probabilistic approach with online learning and an original adaptive, multi-feature, centralized decision fusion based on KLD. Its application on QRS detection achieved notable performance gains on artifacted ECG signals in comparison with one classical and two recent methods. The proposed method can also be easily extended to the multi-lead and multi-source cases, or to other physiological event detection applications. Besides the theoretical contribution and experimental validation, this new approach boasts a reasonable computational cost and thus can be embedded into low-power devices offering interesting possibilities in the current context of connected health applications.

A.1 About the KKT
Consider the non-linear optimization problem: maximize f ðxÞ; R n ! R (the cost function) subject to m inequality and l equality constraints: We note that in the particular case of m = 0, the KKT conditions are reduced to the Lagrange conditions. If g i and h j are affine functions (MAP of beta distribution), then no regularity condition is needed.

A.2 MAP for the Beta distribution
We derive here the numerical method to calculate the maximum-likelihood estimator (MLE) and MAP estimator given N independent samples of the Beta distribution. Recall the Beta density function: Pðx; a; bÞ ¼ 1 Bða; bÞ x aÀ 1 ð1 À xÞ bÀ 1 1 0�x�1 where B(α, β) is the normalization constant. For the MLE, the function to maximize is the joint log-likelihood function: log Pðx i ; a; bÞ ¼ ða À 1ÞXþðb À 1ÞY À NlogBða; bÞ; Robust, real-time generic detector based on a multi-feature probabilistic method In a Bayesian setting typically to avoid the diraclike shape of the beta distribution (see discussion in Sect. 4.2), a prior law can be added: Pða; bÞ / Bða; bÞ K e À aa e À bb for K < N; a; b 2 R þ : The objective function of the MAP estimator becomes: with inequality constraints g 1 = � − α � 0 and g 2 = � − β � 0. � > 0 is close to zero to form a closed space. Note that these constraints are affine functions and satisfy the regularity conditions. We thus search the solution of: and checking the inequality constraints in (12) and (13). We next show that J(z) is always invertible given the inequality constraints. The JðzÞ ¼ ½ @F i @z j � writes: where B(α, β) is the normalization constant. For the MLE, the function to maximize is the joint log-likelihood function: . In a Bayesian setting typically to avoid the dirac-like shape of the beta distribution (see discussion in Sect. 4.2) , a prior law can be added: The objective function of the MAP estimator becomes: with inequality constraints g 1 = − α ≤ 0 and g 2 = − β ≤ 0. > 0 is close to zero to form a closed space. Note that these constraints are affine functions and satisfy the regularity conditions. We thus search the solution of: where ψ(·) represents the di-gamma function. We apply the Newton Raphson method on and checking the inequality constraints in (12) and (13). We next show that J(z) is always invertible given the inequality constraints. The J(z) = [ ∂Fi ∂zj ] writes: where ψ (1) (·) is the tri-gamma function (second derivative of the log-gamma). Its determinant is: The second equality is due to the fact that C and D commute (i.e. CD = DC). From the relation it can be verified that det J(z) > 0. Thus the Jacobian matrix is always invertible.

14/20
where ψ (1) (�) is the tri-gamma function (second derivative of the log-gamma). Its determinant is: where B(α, β) is the normalization constant. For the MLE, the function to maximize is the joint log-likelihood function: . In a Bayesian setting typically to avoid the dirac-like shape of the beta distribution (see discussion in Sect. 4.2) , a prior law can be added: The objective function of the MAP estimator becomes: with inequality constraints g 1 = − α ≤ 0 and g 2 = − β ≤ 0. > 0 is close to zero to form a closed space. Note that these constraints are affine functions and satisfy the regularity conditions. We thus search the solution of: where ψ(·) represents the di-gamma function. We apply the Newton Raphson method on F (z) = [F 1 , F 2 , F 3 , F 4 ] t with: and checking the inequality constraints in (12) and (13). We next show that J(z) is always invertible given the inequality constraints. The J(z) = [ ∂Fi ∂zj ] writes: where ψ (1) (·) is the tri-gamma function (second derivative of the log-gamma). Its determinant is: The second equality is due to the fact that C and D commute (i.e. CD = DC). From the relation it can be verified that det J(z) > 0. Thus the Jacobian matrix is always invertible.

B Kulback-Leiber divergence for generalized normal distributions
The probability density of the generalized normal distribution writes: Pðx; a; b; mÞ ¼ b 2aGð1=bÞ e À ðjxÀ mj=aÞ b for a; b > 0 Thus, the Kullback-Leibler divergence is: Since α p > 0, we have: |ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl {zffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl fflffl } ð�Þ Since Γ(z + 1) = z Γ(z), the second term can be further simplified: In the following, we treat the term in ( � ). First defineũ ¼ jm p À m q j a p to replace ( � ) with: Since both |t| and |t + 1| exist in the expression, we further decompose ( � ) into: Notice that when β q is even, the two functions inside the integrals are identical since ðt À 1Þ b q ¼ ð1 À tÞ b q . For instance, if β q = 2: by exploiting the relation: This result can be generalized for all even-valued β q including β q = 0 though it is supposed to be strictly positive by definition. Let's then investigate the case of β q = 2n + 1 with n 2 N. First, for β q = 1, to coerce (�) into a sum of lower and upper incomplete gamma functions. As in (15), we use the following relations: . ð16Þ To generalize, (�) is a sum of either weighted gamma functions using (15) if β q is even or a sum of weighted upper and lower incomplete gamma functions using (16) and (17) if β q is odd. β q -degree binomial coefficients are used to calculate the weights. Next we show that (�) is monotonously increasing: in which k 2 and e À xjtj b p j1 þ tj b q are positive for t 2 R. log|1 + t| is negative for t 2 [−2, 0], and positive otherwise. A sufficient condition is: By splitting the integral into 2 parts and letting y = −t in the first part, the integral of the above Robust, real-time generic detector based on a multi-feature probabilistic method inequality becomes Z 2 0 e À xy b p j1 À yj b q logj1 À yjdy þ Z 2 0 e À xy b p ð1 þ yÞ b q logð1 þ yÞdy ¼ Z 2 0 e À xy b p ðj1 À yj b q logj1 À yj þ ð1 þ yÞ b q logð1 þ yÞÞ |ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl {zffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl } G b q ðyÞ dy Note that the function G b q ðyÞ is a continuous function of y, differentiable by piece and G b q ðyÞ � 0 for all y 2 [0, 2], β q > 0, which validates the condition in (18).