State-dependent Gaussian kernel-based power spectrum modification for accurate instantaneous heart rate estimation

Accurate estimation of the instantaneous heart rate (HR) using a reflectance-type photoplethysmography (PPG) sensor is challenging because the dominant frequency observed in the PPG signal corrupted by motion artifacts (MAs) does not usually overlap the true HR, especially during high-intensity exercise. Recent studies have proposed various MA cancellation and HR estimation algorithms that use simultaneously measured acceleration signals as noise references for accurate HR estimation. These algorithms provide accurate results with a mean absolute error (MAE) of approximately 2 beats per minute (bpm). However, some of their results deviate significantly from the true HRs by more than 5 bpm. To overcome this problem, the present study modifies the power spectrum of the PPG signal by emphasizing the power of the frequency corresponding to the true HR. The modified power spectrum is obtained using a Gaussian kernel function and a previous estimate of the instantaneous HR. Because the modification is effective only when the previous estimate is accurate, a recently reported finite state machine framework is used for real-time validation of each instantaneous HR result. The power spectrum of the PPG signal is modified only when the previous estimate is validated. Finally, the proposed algorithm is verified by rigorous comparison of its results with those of existing algorithms using the ISPC dataset (n = 23). Compared to the method without MA cancellation, the proposed algorithm decreases the MAE value significantly from 6.73 bpm to 1.20 bpm (p < 0.001). Furthermore, the resultant MAE value is lower than that obtained by any other state-of-the-art method. Significant reduction (from 10.89 bpm to 2.14 bpm, p < 0.001) is also shown in a separate experiment with 24 subjects.


Introduction
In recent years, instantaneous heart rate (HR) estimation has attracted considerable attention owing to the advent of wearable devices such as wristwatches and bands that can be used to obtain photoplethysmographs (PPGs). At present, various commercially available reflectancetype wrist-worn PPG devices, such as Apple Watch, Fitbit Surge, and Samsung Gear, are capable of producing instantaneous HR estimates. However, the accuracy of most of these devices is limited to situations in which the wearer is at rest, walking, or performing low-intensity PLOS  multichannel PPG signals and multi-axis accelerometer signals simultaneously acquired by wrist-worn devices during physical exercise. In addition, chest ECG signals were simultaneously recorded using wet ECG sensors. The ECG-based HRs were considered to be true values; they were calculated using 8-s windows with 2-s shifts (6-s overlap), yielding HRs every 2 s. The same window length (8 s) and shift (2 s) were used throughout this study to assess the performance of the proposed algorithm against that of existing algorithms [1-3, 20-25, 27]. The ISPC dataset comprises two-channel PPG signals and three-axis acceleration signals sampled at 125 Hz, and it is publicly downloadable [28]. The data correspond to three exercise types: Type 1 (T1), Type 2 (T2), and Type 3 (T3). The T1 dataset (n = 12) includes the data of each subject running on a treadmill at varying speeds: 30 s of rest, 1 min at 6-8 km/h, 1 min at 12-15 km/h, 1 min at 6-8 km/h, 1 min at 12-15 km/h, and another 30 s of rest. The T2 dataset (n = 5) includes the data of each subject performing a variety of actions, including running, jumping, push-ups, shaking hands, stretching, and pushing. The T3 dataset (n = 6) includes the data of each subject performing high-intensity arm movements, such as boxing. The true HR was calculated from the ECG signals by manually identifying the individual R peaks in each time window. No R-peak detection algorithm was used to avoid any possible detection error.
The BAMI dataset comprises three-channel PPG signals and three-axis acceleration signals sampled at 50 Hz. Fig 1(A) shows the BAMI device, which consists of three photosensors (NJL5310R, NJR Corporation, Japan) for acquiring the PPG signals and an inertial measurement unit (IMU; LSM6DSMUSTR, STMicroelectronics) for acquiring the acceleration signals. The BAMI dataset is also publicly downloadable [26]. Fig 1(B) and 1(C) show examples of the measured PPG signals corrupted by low-and high-intensity MAs, respectively. Fig 1(D) and 1 (E) show the corresponding power spectra. It can be seen that the dominant frequency in the power spectrum corresponds to the true HR in the case of low-intensity MAs. On the other hand, the dominant frequency may not correspond to the true HR in the case of high-intensity MAs. The data were collected from 24 healthy subjects (10 male, 14 female; average age: 26.9 ±4.8 years) at Wonkwang University, recruited by trained personnel. The exercise protocol included 1 min of rest, 2 min of walking for warming up, 3 min of running at 6-8 km/h, 2 min of walking, 3 min of running at 8-12 km/h, and 1 min of walking for cooling down. The entire process was executed on a treadmill. For the reference true HRs, ECG data were simultaneously recorded at a sampling rate of 125 Hz by a 24-h Holter monitor (SEER Light, GE Healthcare, Milwaukee, WI, USA). Then, we manually identified the R peaks and computed the average RR intervals in each time window. Note that in the ISPC dataset, the number of cardiac cycles was manually counted in each 8-s window with 2-s shifts from the measured ECG signals [28].

Preprocessing
In this study, the instantaneous HR HR est (i) was estimated on the basis of the 8-s segmented PPG signals S np (i) and the acceleration signals A m (i) in the i th window, where np = {1, 2,. . . NP} (NP is the number of photosensors) and m = {1, 2,. . . M} (M is the number of acceleration axes). HR estimation was performed every 2 s (2-s shift; thus, 6-s overlap). To assess the performance of our algorithm, we used the same window length and shift as those used in previous studies [1-3, 20-25, 27]. A fourth-order Butterworth bandpass filter (BPF) with cutoff frequencies of 0.4 and 4 Hz was applied to the signals S np (i). The range of approximately 40-200 bpm covers the HRs of subjects of all ages, both at rest and during high-intensity physical activity [29,30]. The filtered signals were subsequently normalized to zero mean with unit variance in the i th window. The normalized signals were then averaged and down-sampled to 25 Hz to reduce the computational load. The power spectrum was subsequently computed by 2,048-point fast Fourier transformation (FFT) to provide high-frequency resolution (12.5 Hz/ 2,048 = 0.0061 Hz; 0.3662 bpm), and the results were normalized to have a minimum value of zero and maximum value of one, denoted by P S (i). Given a signal with a sampling rate of 125 Hz (without down-sampling), 8,192-point FFT is required to provide a similar frequency resolution (62.5 Hz/8,192 = 0.0076 Hz). The same steps (BPF, down-sampling, and normalization) were applied to the signals A m (i), and their normalized 2,048-point FFTs were denoted by P A (i).

HR estimation with MA cancellation
Given the computed power spectra P S (i) and P A (i), the true clean power spectrum P C (i) can be estimated as Eq (1) can also be expressed as By substituting P S (i) in the term P S ðiÞÀ P A ðiÞ Assuming that the power spectra P C (i−1) and P C (i) in consecutive windows nearly overlap, (3) can be substituted with P C (i−1) obtained in the (i-1) th window. Hence, P C (i) can be recursively estimated as Finally, we determined the dominant frequency in P C (i) for HR values of 0.6-3.3 Hz (approximately 40-200 bpm) to obtain HR est (i). It is shown that the power spectrum P C (i) is recursively obtained from the previous power spectrum P C (i−1) in the (i-1) th window. It is important to consider the previous power spectrum in order to efficiently suppress MAs, because MAs originate from dynamic frequency changes with higher uncertainties whereas clean PPG signals change slowly, assuming that the HRs in consecutive windows are close. It can be shown that the recursive estimation is similar to the Bayesian approach in that it estimates the current HR density function on the basis of the previous density function. On the other hand, it differs from the Bayesian approach in that its prediction is not based on the HR transition model. Fig 2 shows time-frequency spectrum (TFS) of the PPG signals for subject 1 measured by the BAMI device. In Fig 2(A), the black circles represent the true HRs, and it can be seen that some frequencies from the measured PPG signals reflect the true HRs. In Fig 2(B), the black circles represent the dominant frequencies of the PPG signals, and it can be seen that they do not overlap the true HRs. Hence, if the HR is estimated by directly identifying the dominant frequency in the power spectrum P S (i), the result would be inaccurate. In Fig 2(C), the black circles represent the dominant frequencies of the three-axis acceleration signals, and they indicate the detection of MAs by the power spectrum P A (i). In Fig 2(D), the black circles represent the estimated HRs after MA cancellation using the acceleration signals. The results show that the use of P C (i) after MA cancellation improves the accuracy of HR estimation.
For more rigorous comparison and analysis, the performances of the proposed algorithm with and without MA cancellation were compared for both the ISPC dataset and the BAMI dataset. The results are summarized in Table 1A and 1B. It can be seen that the use of P C (i) significantly decreases the MAE from 13.71 to 6.73 bpm and from 21.27 to 10.89 bpm for the ISPC dataset and the BAMI data, respectively. The overall MAE from both datasets decreases from 17.57 bpm to 8.86 bpm, which is summarized in Table 2. However, MA cancellation has an inherent limitation and does not always work for all data. It is inapplicable under certain conditions, such as when the MAs are uncorrelated with the acceleration signals, the SNR is extremely low, and the overlap between the true HR and the MA frequency is extremely small. For instance, in the cases of subjects 10,14,15,17, and 20 in the ISPC dataset and subjects 5, 6,9,11,16, and 18 in the BAMI dataset, the MAEs were extremely high both with and without

Result validation using FSM framework
The FSM framework was used to eliminate inaccurate estimation results and thus overcome the above-mentioned limitation. Four states are defined in the framework, namely stable, recovery, alert, and uncertain states. After the estimation of HR est (i) with MA cancellation, the FSM framework is used to determine the state and validate the estimation result in real time. A stable state indicates that the estimated HR is highly likely to be accurate and it is thus declared valid. A recovery state indicates that the estimated HR is somewhat likely to be accurate with the need to explore possible transition to the stable state. An alert state indicates that the estimated HR is somewhat likely to be inaccurate. An uncertain state indicates that the estimated HR is highly likely to be inaccurate. The FSM framework transits from one state to another in every window in response to the estimation accuracy indicators, namely the crest factor (CF) and the HR change between consecutive windows. The CF is the ratio of the dominant frequency power to the root mean square of the total power of P S (i). The higher the CF, the less corrupted is the PPG signal. Thus, the CF condition CF(i)�TH CF indicates that P S (i) is acceptable for HR estimation, where CF(i) is the CF value in the i th window and TH CF is the CF threshold value. The HR change between consecutive windows is based on the observation that the absolute HR difference every 2 s is approximately 5 bpm at the 99% level in the ISPC database. Thus, the HR change condition |HR est (i)−HR est (i−1)|�TH HR , where TH HR is the HR change threshold, indicates acceptable estimation of HR est (i) on the basis of HR est (i−1) in the (i-1) th window. The FSM framework thus determines the state on the basis of the CF and HR change, and only a stable state result is declared valid. Other state results are declared invalid and discarded. Accordingly, the FSM automatically validates the estimation results without the true HR value and ignores inaccurate estimation results caused by extremely low SNRs in the PPG signals or by MAs uncorrelated with the accelerometer signals. The details of the framework are presented elsewhere [27].   frequencies in the PPG signals. However, even after MA cancellation, the most dominant frequency in the power spectrum P C (i) is 0.8 Hz, as shown in Fig 4(C), whereas the frequency power at the true HR (2.1 Hz) appears as a slightly weaker peak. Even if the MA frequencies were correctly identified using the acceleration signals and the corresponding MAs were removed, the MA frequency power was still greater than any other frequency power. This is because the MA frequency power in the power spectrum P S (i) overwhelmed the frequency power of the true HR; thus, the MA frequency power was slightly greater than the frequency power of the true HR in the power spectrum P C (i). Such inaccurate estimates are often observed when the SNR is extremely low, with the pure PPG signals overwhelmed by the MAs. When the FSM framework is applied under these conditions, the estimate may be declared invalid and ignored, assuming that HR est (i−1) is accurate. This would prevent degradation of the estimation accuracy, although the valid HR rate (VHR) would decrease. To overcome the low SNR problem, the power spectrum P S (i) was modified by emphasizing the power of the frequency corresponding to HR est (i−1), assuming that it is accurate. As expressed by Eq (5), a modified power spectrum denoted byP S ðiÞ was then obtained by multiplying the power spectrum P S (i) with the Gaussian kernel function, which has a mean value of HR est (i−1).

Gaussian kernel-based power spectrum modification
where HR est (i−1) is assumed to be accurate. Otherwise,P S ðiÞ ¼ P S ðiÞ. The standard deviation was set to σ = 1 Hz or 60 bpm and its effect was investigated, as discussed in the "Results" section below. Fig 4(D) shows the Gaussian kernel-based modified power spectrumP S ðiÞ. By emphasizing the power of the frequency corresponding to HR est (i−1), the highest peak of the power spec-trumP S ðiÞ occurs at the true HR frequency of 2.1 Hz. Furthermore, after MA cancellation, the resultant clean power spectrumP C ðiÞ has a much more dominant peak at the true HR frequency, as shown in Fig 4(E). The FSM framework validates the HR estimate in real time and the Gaussian kernel-based modification is only applied when HR est (i−1) is declared valid. In addition, the modification requires the assumption that the HRs in two consecutive windows are close. This assumption was satisfied in the present study because the HR estimation was performed within 8-s windows with 2-s shifts (6-s overlap).
The flowchart of the proposed algorithm is shown in Fig 5. In the i th window, the PPG and acceleration signals (i.e., S n (i) and A m (i)) are acquired, and the power spectra (P S (i) and P A (i)) are computed in the preprocessing stage. If the FSM framework declares HR est (i−1) to be valid in the (i-1) th window, P S (i) is modified using the Gaussian kernel function in Eq (5). Otherwise, P S ðiÞ ¼ P S ðiÞ. The clean power spectrumP C ðiÞ is subsequently computed with MA cancellation based onP S ðiÞ and P A (i). Finally, HR est (i) is obtained by identification of the dominant frequency inP C ðiÞ over the HR range 0.6-3.3 Hz and validated by the FSM framework.

Evaluation metrics
The performance of the proposed algorithm was evaluated by comparing its results with those of previously developed algorithms [1-3, 20-25, 27] using the ISPC (n = 23) and BAMI (n = 24) datasets. It should be noted that N = 2 for the ISPC dataset and N = 3 for the BAMI dataset. For both datasets, M = 3. Two methods were adopted to evaluate the HR estimation performance of the proposed algorithm using state-dependent Gaussian kernel-based power spectrum modification with the FSM framework (FSM-SGPS). The first method involved direct frequency determination of the dominant frequency (DFDF) in the power spectrum P C (i) after MA cancellation. The second method involved a combination of the FSM framework and DFDF (FSM-DFDF). The accuracy of the algorithm was further evaluated by calculating the absolute error (AE) of its estimation: where HR true (i) is the true HR (bpm) in the i th window. The overall evaluation of HR estimation was performed on the basis of the MAE (bpm), average of the relative AEs (ARE) (%), and valid HR rate (VHR) (%) as the percentage of valid results among all windows: where N window is the total number of windows used for HR estimation and N valid is the number of windows declared valid by the FSM framework. In accordance with the previous FSM framework-based study [27], we used TH CF = 2.4 and TH HR = 5.03 bpm. In addition, for statistical analysis, we performed one-way analysis of variance (ANOVA) using MATLAB (Math-Works, Natick, MA, USA) to compare the resultant HR means from different algorithms in order to obtain statistical evidence as to whether the associated means are significantly different. A p value below 0.05 was considered significant.

Optimal parameter search
To determine the optimal value of the standard deviation σ in Eq (5), various values between 10 bpm and 80 bpm in increments of 10 bpm were used to apply the proposed FSM-SGPS algorithm to all the 23 subjects of the ISPC dataset. Fig 6(A) and 6(B) show the estimation MAEs and VHRs with respect to σ. It can be seen that the MAE drastically decreases as σ increases from 10 to 30 bpm and then becomes constant as σ further increases to 80 bpm. Similarly, VHR drastically decreases as σ increases from 10 to 30 bpm and then becomes constant as σ further increases to 60 bpm. The optimal value of σ may vary with the subject, exercise type, and exercise intensity, but the performance is nearly the same for σ values of 30-60 bpm. Hence, σ = 60 bpm was adopted for further performance analysis.  Power spectrum modification for accurate HR estimation

Results for ISPC data
It can be seen that FSM-SGPS produces more accurate estimates compared with DFDF and higher VHR results compared with FSM-DFDF. Table 3 summarizes the overall performance comparison of the three estimation methods. Applying the FSM-SGPS method to all the subjects of the ISPC dataset yielded an MAE of 1.20 bpm, an ARE of 1.05%, and a VHR of 78.84%. Thus, the MAE was 0.21 bpm higher than that of FSM-DFDF but 82.17% lower (-5.51 bpm) than that of DFDF. The FSM-SGPS results were statistically different from the DFDF results (p < 0.001) but not significantly different from the FSM-DFDF results. However, the VHR of the proposed algorithm was much higher (21.40% higher) than that of FSM-DFDF. This implies that FSM-SGPS minimizes the invalidity rate of the HR results by increasing the overall accuracy even under severe MA corruption, while FSM-DFDF ignores the results as far as possible on the basis of the CF and the HR condition. For instance, for subjects 10,13,14,15,17,18,19,20, and 21, the VHRs of FSM-DFDF were less than 50% (average = 26.9%), indicating that more than 50% of the results were ignored. For the same subjects, FSM-SGPS increased the average VHR to 60.03% by minimizing the invalidity rate of the HR results. Table 4 summarizes the performance comparison of the proposed method and other previously proposed methods. It should be noted that some of the previous methods were tested for only the first 12 subjects of the ISPC dataset, while the others considered subjects 14-23. Only a few methods considered all 23 subjects. For a more accurate comparison, the test results are shown for subgroups of the study participants (subjects 1-12, subjects 13-23, subjects 14-23, all subjects except subject 13, and all 23 subjects). The results indicate that the FSM-SGPS algorithm yields more accurate HR estimates than all the other methods.    The results of subject 6, whose PPG signals were severely corrupted with MAs, are noteworthy. DFDF produced an extremely high MAE of 33.06 bpm, and FSM-DFDF did not produce any significant accuracy enhancement (MAE = 33.88 bpm). Fig 9(A) shows the estimated HR trace for FSM-DFDF. As can be observed from the figure, the estimated HRs in regions A, B, and C deviate drastically from the true HRs, indicating that the FSM framework validated the estimates in these regions because of the absence of large variations in the estimated HR changes for consecutive windows and consistently high CF values over a long period. Conversely, FSM-SGPS increased the overall accuracy through modification of the power spectrum P S (i), as reflected by the MAE and VHR values (see also Fig 9(B)). The results indicate that the estimation results were improved, especially in regions B and C. The incorrect estimation results in region A will be discussed in the next section.

Discussion and conclusions
In the pre-processing stage, we down-sampled the signals to 25 Hz. Regarding the effects of the sampling rate, it was reported that the HR estimation performance was nearly the same but the computational time was drastically reduced by down-sampling the signals [20]. Moreover, in [2], the HR estimation performances were compared at different sampling frequencies of 25, 125, 250, and 500 Hz. The results showed that the HR estimation results were similarly accurate for the first 12 subjects of the ISPC dataset, i.e., MAEs of 1.02 bpm (25 Hz), 1.06 bpm (125 Hz), 1.10 bpm (250 Hz), and 1.12 bpm (500 Hz). Our results also showed consistent trends. Even with a different sampling rate, the HR estimation performance was nearly the same. The MAE values with a sampling rate of 125 Hz were 1.21 bpm and 2.15 bpm for the ISPC and BAMI datasets, respectively. Note that the MAE values with a sampling rate of 25 Hz were 1.20 bpm and 2.14 bpm for the ISPC and BAMI datasets, respectively. As down-sampling reduces the computational load without accuracy degradation, most existing algorithms including our method down-sampled the signals to around 25 Hz [1-3, 20-25, 27].
There are numerous MA patterns in PPG signals, all of which are unpredictable. Fig 10  shows some examples of corrupted PPG signals along with the simultaneously measured ECG signals during high-intensity exercise. Moreover, in the measured PPG signals, the MAs are shaped similarly to ordinary pulses and are therefore difficult to distinguish. To obtain accurate HR estimation results, considerable research effort has been devoted toward the use of simultaneously measured acceleration signals as noise references in HR estimation algorithms. As numerous MAs are combined with real pulse waves in an unpredictable manner, exact mathematical formulation is difficult. In this study, we simplified the additive relationship between the two spectra. Furthermore, many such approaches have been proposed. For instance, in [25], the MA frequency powers were suppressed by dividing a constant value in the MA frequency range. In [31], the MA frequency power from an acceleration signal was considered as the probability of the event that the corresponding frequency is not the HR while the frequency power from the PPG signal was considered as the probability of the event that the corresponding frequency is the HR. In [21], the frequency peaks in the PPG spectrum were compared with the frequency peaks in the accelerometer spectra, and the HR candidates were eliminated if the peaks were overlapped. We will investigate the relation in future work. Based on each proposed relationship, the accelerometer-assisted MA cancellation algorithms provided accurate HR estimation results. However, these results were not always accurate. To address this issue, we modified the power spectrum P S (i) by applying a Gaussian kernel function with a mean value of HR est (i−1) when the state in the i th window was stable. As a result, the MAE values improved to 1.20 bpm from 6.73 bpm for the ISPC dataset and to 2.14 bpm from 10.89 bpm for the BAMI dataset. On the other hand, when the power spectrum modifications were performed for all the windows regardless of the state, the MAEs increased to 1.79 bpm from 1.20 bpm, and to 4.75 bpm from 2.14 bpm, for the ISPC and BAMI datasets, Power spectrum modification for accurate HR estimation respectively. The AREs also increased to 1.67% from 1.05%, and to 3.55% from 1.73%, for the ISPC and BAMI datasets, respectively. In addition, we found that the VHRs slightly decreased to 77.92% from 78.84%, and to 87.82% from 90.48%, for the ISPC and BAMI datasets, respectively. Thus, the power spectrum modification is more effective when the state in the i th window is stable on the basis of the FSM framework.
Compared with FSM-DFDF for the ISPC dataset, FSM-SGPS slightly increased the MAE from 0.99 to 1.20 bpm, whereas it increased the VHR considerably from 57.44% to 78.84%. For the BAMI dataset, the MAE decreased from 3.08 to 2.14 bpm, while the VHR increased from 72.71% to 90.48%. However, as shown in Fig 9(B), the proposed FSM-SGPS algorithm still has some limitations for accurate estimation and validation of all HRs. Its deficiency can be observed when the PPG signal does not reflect the true HR over a long period, even with MA cancellation. The power spectrum P S (i) cannot be modified under this condition owing to the unavailability of the estimation result of the previous window. Fig 9(C) and 9(D) show the 8-s segment example of the measured PPG signal and the corresponding reconstructed PPG signal after MA removal followed by inverse FFT in region A, respectively. Fig 9(E) shows the simultaneously measured 8-s ECG signal, in which the R peaks do not correspond to the pulse peaks from the PPG signal. Extremely low SNR causes this condition, which is often encountered when there is a severe pressure change between the photosensor and the measurement site (wrist), with the pressure change completely overwhelming the pure PPG signal. Regardless of how tightly the device is worn on the wrist to avoid pressure changes, this situation may occur under certain circumstances, such as during high-intensity exercise. Hence, there is a need for further investigation of hardware that minimizes pressure changes. Alternatively, a pressure sensor may be embedded in the wearable device and additional MA cancellation can be applied on the basis of its signals.