ECG R-wave peaks marking with simultaneously recorded continuous blood pressure

Background and objectives ECG signal is relatively weak and vulnerable to various noise interferences, such as electromyography. There will be robustness problems when detecting the instantaneous heart rate independently. In some cases, multiple human physiologic parameters are monitored to help in heart rate detection. Methods In this paper, an algorithm that marks the R-wave peaks with the help of simultaneously recorded continuous blood pressure is proposed and tested on two databases. One database, called the challenge database, is provided by the PhysioNet/Computing in Cardiology Challenge 2014, and the other is the MGH/MF waveform database. Results The final scores of the proposed algorithm are 97.3% for the challenge database and 96.6% for the MGH/MF waveform database. Conclusions The experimental results show that this algorithm has high detection accuracy and a relatively strong robustness.


Introduction
Long-term and continuous monitoring of the instantaneous heart rate is the main means of human care [1] [2]. A common method to obtain the instantaneous heart rate is to decide the RR intervals of an ECG signal, making the key problem the detection of R-wave peaks [3].
A variety of algorithms have been applied to detect the R-wave peaks of an ECG. Generally, all these algorithms can be divided into three steps: (1)

2.2Proposed training algorithm
Before the main detection algorithm, introduction of a training stage is suggested. In real corresponding applications, it is reasonable and feasible to pre-monitor some people for a period of time to determine some individual parameters for the following long-time, regular detection, which is an important philosophy. The training algorithm proposed by us includes the following three aspects. First, an offline pacemaker pulse template library was formed. Compared to the regular ECG signal, a pacemaker ECG has pacemaker pulses with a narrow width and higher height, and it is easy to determine the pacemaker pulses as R-wave peaks, bringing about the false detection shown in Fig 1A, while the correct marks are shown in Fig 1B. Our solution is to first eliminate the pacemaker pulses based on the pacemaker pulse template library before the normal R-wave detection. To make the offline pacemaker pulse template library, we did the following: (1) for each pacemaker record, its pacemaker pulse peaks were marked with the help of the human eye, and (2) took 0.5 � LEN before each pulse peak and 0.5 � LEN after it as a pacemaker pulse complex (LEN = 1s), then averaged all the cut out pulse complexes to form a pacemaker pulse template.
Second, the combination vector of features used in R-wave confirmation were trained. Usually, we need to judge the initial ECG R-wave detection result and modify it. The confirmation of an R wave true or false detection should be based on multiple features of the wave. The method of linear discriminant analysis (LDA) can be applied to train the combination vector on the multiple features, with which we can form a combined feature to judge the R wave detection result. In this paper, two features of the wave are defined, one is the waveform similarity with the obtained current R-wave template, defined by their correlation coefficient, and the other one is the normalized wave peak (or amplitude) value. The steps of the LDA-based combination vector training using the two features are: (1) to establish the training samples, calculating the two features of the truly detected R waves and falsely detected R waves in the database and marking their class labels; (2) to calculate the combination vector with the established training samples according to the LDA theory. In this paper, we obtained the values of the combination vector as [0.9303, 0.3667], where 0.9303 is for the correlation coefficient, and 0.3667 is for the normalized amplitude.  Third, the time delay between BP peak and corresponding ECG peak was trained, marked as 'differ' here and is shown in Fig 2A. As we hope improve the algorithm effectiveness, this delay parameter is the most important one to use in the fusion algorithm. The 'differ' time delay between two corresponding peaks of ECG and BP is obtained from the cross-correlation function, as shown in Fig 2B. Noticing, the quality of the pre-monitored ECG and BP signals for determining the 'differ' should be good enough. In this paper, we select the choice ECG and BP segment to calculate the cross-correlation function, and if there are pacemaker pulses on ECG, we should eliminate them first based on the built pacemaker pulse template library as in the later main detection algorithm. The basic choice ECG and BP segment selection algorithm is as follows: (1) Initially detect peaks on each 12-s-length segments of ECG and BP, and make a waveform template for each of them respectively; (2) For each ECG segment, get the cross-correlation coefficient of each detected R wave with the template signal, called C i i = 1,. . ., N, N is the number of detected R peaks; then, set 'copeak' = median(C i i = 1,. . .,N); similarly, calculate 'cobp' for each BP segment; (3) Set the coefficient 'coeff' = 'copeak' � 'cobp' for each segment. If the max value of all 'coeff's is higher than 0.8, the corresponding ECG and BP segments are taken as the choice ones to calculate the best 'differ'. If not, set 'differ' to an empirical value (0.2 s) statistically.

Proposed detection algorithm
The overall block diagram for our proposed R-wave peak detection algorithm can be found in The basic strategy behind the algorithm is that when BP is not monitored, we detected the ECG R-wave peaks directly and output them as the final results; on the contrary, the fusion of ECG and BP respective detection results were used as an output. First, a threshold algorithm was used to determine whether BP is monitored. Take the threshold 'a' = 10 and 'b' = 2. If the peak value is less than or equal to threshold 'a' or the mean value of its absolute value is less than threshold 'b', BP is regarded as not being monitored.
In the following subsections, three blocks, including the ECG detection, BP detection and fusion, will be thoroughly explained.
2.3.1 ECG detection. The flowchart of ECG R-wave peak detection is shown in Fig 4. Preprocessing-ecg: The ECG preprocessing procedure can be seen in Fig 5. To remove the 50 Hz interference, a notch filter (at 50 Hz) is used. To suppress the baseline wander, a twoorder smooth filter (0.6 s window) is applied to ECG. Finally, a bandpass FIR filter between 0.5-80 Hz is used to further suppress the noise of the ECG. Remove pacemaker pulse if it exists: The overall decision tree for removing pacemaker pulse is shown in Fig 6. I. Preliminary peak detection: This step aims to find all paced beats if they exist, and a peak detector based on the max search is applied (Sameni 2010 [14]). In this detector, the window length of max search is variable, and the smaller it is, the more peaks will be detected. After many tests, we set it to 0.2 s, which is small enough to detect all the possible pacemaker pulses. Before detecting the wave peaks, we cut the signal into segments, with lengths set to 24 s, and then performed the peak detection above on each segment separately.
II. Determining whether the pacemaker pulse exists: To determine whether the pacemaker pulse exists, the specific procedures are as follows.
(1) Get the offline pacemaker pulse template library TemplateðiÞ; i ¼ 1; � � � ; M, which are made in advance. M is the number of templates in this library.
(2) Get the above preliminary peak detection results RpeakðjÞ; j ¼ 1; � � � ; N, N is the number of detected R waves; then, do the cross-correlation between Template (i) and Rpeak (j), and then obtain coeff (i, j), coeff is an M � N matrix.
(4) After processing all the rows similarly, the M median values can be obtained, and the template that has the highest median value, named P_template, can be selected as the most correlated one. Lastly, set coef ¼ maxðmeÞ.
(5) If coef>0.9, it indicates that on the ECG signal, pacemaker pulses exist.
III. Remove the pacemaker pulse when it exists: After determining the ECG signal pacemaker pattern, mark the peaks whose correlation coefficients with P_template are higher than 0.8, and then set their voltage to 0.
After determining the ECG signal pacemaker pattern, mark the peaks whose correlation coefficients with P_template are higher than 0.8, and then set their voltage to 0.
Peak Detection: The detection accuracy of the ECG signal will surely affect the final result, especially when BP is not monitored. Therefore, besides the ECG preprocessing method shown in Fig 5, we can use an additional preprocessing method called wavelet transformation to decrease the noise and interference. In this article, a CWT (continuous wavelet transform) was used. The wavelet basis is cmor1-1.5, and the wavelet scale is 120, when the sampling rate is 1000 Hz. When the sampling rate is 'fs', the wavelet scale calculation formula is as follows: After obtaining the enhanced signal using CWT, we also cut it into segments with lengths set to 24 s. Then, the peak detection will be performed on each segment separately.
The peak detection is also based on the peak detector (Sameni 2010). However, the differences from the detection of pacemaker pulses are: (1) the max search window length is set to 0.4 s; (2) a threshold is applied. The method of obtaining the threshold is: (1) cut the current detected segment into 10 shorter segments and get the maximal value of each shorter segment first; (2) the threshold is just 1/2 as much as the minimum of all the maximal values. The peaks whose values are higher than the threshold will remain.
Peak modification-ecg: The modification of the ECG peak detection result is based on pattern recognition. The main steps are: (1) extracting two features of each detected R-wave peak, one is the correlation coefficient with the ECG template (the template is made with R-wave peak detection results), and the other is the normalized peak amplitude; (2) using linear discriminant analysis (LDA) to combine the two features into a new one (the combination vector has been calculated in advance); (3) setting a threshold, and if the value of the new feature is higher than the threshold, the corresponding R-wave peak will remain; otherwise, it will be deleted.
Modification is performed on each segment whose duration is set to 24 s. When modifying, after calculating the feature combination value for each detected peak, the threshold for selecting the R-wave peaks is adapted. Try different thresholds (from small to large) and select the threshold that corresponds to the smallest standard deviation value of the RR intervals as the final threshold.
2.3.2 BP detection. The flowchart of BP wave peak detection is shown in Fig 7. Preprocessing-bp: The flowchart of BP preprocessing is shown in Fig 8. To suppress the baseline wander, a two-order smooth filter (2s window) is applied to the BP. Peak detection: Implementing the same peak detection based on the max search and threshold as ECG on the preprocessed BP signal, the BP detection results can be obtained.
Peak Modification-bp: The BP modification is based on the template method due to its simple and relatively stable waveform.
(1) Make a BP complex template whose duration is set to 1 s.
(2) Get the correlation coefficient between each detected BP complex and the template signal.
(3) Compare each coefficient with 0.6: if it is lower than 0.6, the corresponding detected peak will be deleted from the result.

Fusion.
In this step, first cut the signals into shorter segments, and for each segment, when BP is monitored abnormally (Fig 9A), we can only output ECG initial detection results directly. When the BP waveform is better than the ECG waveform (Fig 9B and 9C), considering that the continuous BP signal features a simple waveform, which is easy-to-understand and shows periodic synchronization with ECG, the BP initial detection result can be used to help detect the ECG R-wave (taking the detected BP peak positions as a reference and searching for ECG R-wave peaks based on the 'differ' within the range of 'thrm', which is set to 0.1 s after a large number of tests). In contrast, when the ECG waveform is better (Fig 9D), we can ignore the BP and output ECG initial detection results directly. The flow diagram of the fusion can be seen in Fig 10.
Compare the quality of ECG and BP: The template matching algorithm is used to compare the quality of the BP waveform and ECG waveform on each segment. Segments that are too short will make it difficult to establish a waveform template, while segments that are too long will lower the performance of the fusion step. Thus, first, we cut the ECG and BP signal into segments of 24 s in length and made templates for each segment. Then, each segment was cut into shorter segments of 4s in length, and the quality of ECG and BP on each shorter segment was compared. The specific steps are as follows: (1) Calculate the QRS template with the detected ECG R-wave peak positions on 24 s-long segments and the BP template similarly; (2) Cut the current segment into shorter segments of 4 s in length. For each shorter segment, compare the quality of the ECG and BP: (a) get the cross-correlation coefficient of each detected R wave with the template signal, called C i = (i = 1,. . .,n) (n is the number of detected ECG peaks on the current shorter segment); (b) calculate the ECG matching coefficient: Re ecg ¼ medianð½C 1 ; C 2 ; . . . ; C n �Þ; (c) similarly, obtain the BP matching coefficient Re bp ; (d) compare the matching coefficients of BP and ECG: Re bp and Re ecg , and if Re bp � Re ecg , this indicates that the quality of BP is better than ECG for this segment. Otherwise, it shows that the quality of ECG is better.
(3) For each shorter segment, output the detection results of the signal with better quality.

A special step for the MGH/MF waveform database
The above algorithm is based on the condition of one ECG and one BP signal. However, there are 3 ECG signal leads for each record of the MGH/MF waveform database, so a special step is added when testing on this database. This step, based on the template matching method, is used to pick out the best-quality ECG signal from the three ECG leads. For each ECG signal, we calculated its quality index name 'Reecg' and chose the lead with the highest 'Reecg' for further processing. The steps for calculating 'Reecg' are as follows: (1) Detect the R-wave peaks using the peak detector above, and make a 0.3-s-length QRS template; (2) Get the cross-correlation coefficient of each detected R wave with the template signal, called C i (i = 1,. . .,N); N is the number of detected R peaks; (3) For i = 1: N, if C i >th, T = T+1, endif, endfor, (th = 0.7); (4) Calculate the ECG quality index: Reecg = T / N.

Test and score
First, we tested the algorithm with the 200 training records from the challenge database given by the official website and then compared this with the official 'qrs' marks from the '.atr' files to get the final score. The details about the scoring rules can be found in Silva et al. [12]. In addition, Se (Sensitivity) = TP/(TP+FN) and PPV (Positive Predictivity) = TP/(FP+TP), where TP, FN, and FP represent the true positive (i.e., correct detected QRS), false negative (i.e., missed QRS) and false positive (i.e., extra detected QRS), respectively. Table 1 shows the results of the improved algorithm proposed in this paper and the recently published algorithms. The overall score of our algorithm is between the 1 st of Urška Pangerc et al. [15] and the 2 nd of Alistair E W Johnson et al. [16]. Similarly, we tested the algorithm with the MGH/MF waveform database and compared the score with that of Urška Pangerc et al. The overall comparative results are shown in Table 2.
The comparative scores show that our algorithm is superior to most competitors and comparable to the team (Urška Pangerc et al.) with the highest score.

Discussion
From the test results, it has been shown that the detection accuracy and robustness of the proposed algorithm are both superior to the most competitors' algorithms and very competitive with that of the best algorithm. Although fusion strategies are commonly used in the competitors' works in the PhysioNet/Computing 2014 Cardiology Challenge [15,16], the fusion method proposed in this manuscript is different and seems more efficient. We also think that there are two main factors contributing to our algorithm's effectiveness.
First is the setup of the offline pacemaker pulse template library in the training stage or premonitoring stage, which allows for the pacemaker pulses on the ECG to be detected and eliminated efficiently. From above Tables 1 and 2, we can know that the performance score of our algorithm is similar for recordings with and without artificial pacemaker pulses. We did T-test (α = 0.05) between recordings with artificial pacemaker and all recordings, also T-test (α = 0.05) between recordings without artificial pacemaker and all recordings, whose results are shown in Tables 1 and 2, indicating that our method has a good robustness to whether a data has pacemaker pulses or not.
Second is the R-wave judgment method. In the method, two features, excluding the RR intervals, are used, and their combination vector is determined using LDA at the training stage, which can improve the detection accuracy robustly. Since we do not use RR interval uniformity as a feature, the judgment method has good robustness in case of sudden RR-interval change. As seen in Fig 12C, when the RR-intervals show a sudden change, our algorithm still did not lead to false marks. Further, we tested the independence of our algorithm from sudden changes in heart rate using all data in each database. We defined the heart rate variability of each data as its RMSSD, and assigned every data properly into one of 8 equal-number groups divided by RMSSD. After calculating the performance scores SE and PPV for each data, we statistically obtained their minimum, maximum, average and standard deviation σ, and the gross and overall performances for each divided group. After that, a T-test (α = 0.05) was performed on Se and PPV of each group of data and them of all data. Tables 3 and 4 have listed the tested results. Fig 13A  and 13B are the graphs showing the overall performance score versus heart rate variability for two databases respectively.
Likewise, we tested the independence of our algorithm from the average heart rate size using all data in each database. We assigned every data into one of 10 pre-divided equal-bandwidth heart rate intervals, starting from 40 to 160 or 197 beats per minute. After calculating the performance scores SE and PPV for each data, we obtained the several statistic indexes and the gross and overall performance scores for each group, and the T-test (α = 0.05) was also performed. Their results for two databases are shown in Tables 5 and 6. Fig 13C and 13D are the curves of the performance score versus average heart rate for two databases.
In the above tables, H represents the T-test result, H = 0 represents that the group and the whole database pass the T-test at the level of α = 0.05, and the H 0 hypothesis is established, that is, we have sufficient confidence to say that there is no significant difference between the group and the population. Most of the H = 1 cases in the tables are caused by the concentration of the high scores (the average is high and the σ is small), and a small part is caused by the low scores caused by the poor signal in the sample (the average is low and the σ is large), which may be related to the fact that the data is not a standard normal distribution. Seemly, the scores are slightly reduced with the increase of HRV and average heart rate from the figures, but the T-test results show that this is not a significant trend. Therefore, generally speaking, the performance of our algorithm is not affected by the heart rate variability and average heart rate size. Compared with the official reference annotations, our algorithm still has some false marks or missed detections. Fig 14A shows a missed detection case when the quality of the BP is Interesting, in our experiment, we found a data that its official reference annotations may have too large R-peak position mark errors, resulting in our algorithm's low score for it, shown in Fig 14D (data mgh026 of MGH/MF database).
Above we discuss how to do a fusion to improve the R-wave peaks marking in this paper, actually, our core algorithm may allow for synchronous extraction of RR interval and systolic blood pressure, which can be used for spontaneous baroreflex testing. The synchronous (beatby-beat) extraction of RR intervals and systolic blood pressure values in the testing is not trivial, as the two time series may get out of synchronization if R-waves are missed but the systolic blood pressure peaks are still detected (or vice versa). We are interested in doing some work in adjusting our algorithm to adapt to spontaneous baroreflex testing better in the future. Positively, we must not use fusion of the detected peak results of the two kinds of series any more.

Conclusion
In this article, we put forward an algorithm to detect R-wave peaks with the help of the BP signal under the condition that the ECG and BP signal are recorded simultaneously. Experimental results show that this algorithm can surely improve the detection rate and robustness of the ECG R-wave peak detection. The algorithm is robust to ECG pacemaker pulses and sudden changes in heart rate.