Validity of (Ultra-)Short Recordings for Heart Rate Variability Measurements

Objectives: In order to investigate the applicability of routine 10s electrocardiogram (ECG) recordings for time-domain heart rate variability (HRV) calculation, we explored to what extent these (ultra-)short recordings capture the “actual” HRV.

Methods: The standard deviation of normal-to-normal intervals (SDNN) and the root mean square of successive differences (RMSSD) were measured in 3,387 adults. SDNN and RMSSD were assessed from (ultra-)short recordings of 10s (3x), 30s, and 120s and compared to 240s-300s (gold standard) measurements. Pearson’s correlation coefficients (r), Bland-Altman 95% limits of agreement and Cohen’s d statistics were used as agreement analysis techniques.

Results: Agreement between the separate 10s recordings and the 240s-300s recording was already substantial (r = 0.758–0.764/Bias = 0.398–0.416/d = 0.855–0.894 for SDNN; r = 0.853–0.862/Bias = 0.079–0.096/d = 0.150–0.171 for RMSSD), and improved further when three 10s periods were averaged (r = 0.863/Bias = 0.406/d = 0.874 for SDNN; r = 0.941/Bias = 0.088/d = 0.167 for RMSSD). Agreement increased with recording length and reached near perfect agreement at 120s (r = 0.956/Bias = 0.064/d = 0.137 for SDNN; r = 0.986/Bias = 0.014/d = 0.027 for RMSSD). For all recording lengths and agreement measures, RMSSD outperformed SDNN.

Conclusions: Our results confirm that it is unnecessary to use recordings longer than 120s to obtain accurate measures of RMSSD and SDNN in the time domain. Even a single 10s (standard ECG) recording yields a valid RMSSD measurement, although an average over multiple 10s ECGs is preferable. For SDNN we would recommend either 30s or multiple 10s ECGs. Future research projects using time-domain HRV parameters, e.g. genetic epidemiological studies, could calculate HRV from (ultra-)short ECGs, enabling such projects to be performed at a large scale.


Methods
[...] cardiovascular disease. PREVEND subjects completed a first survey between 1997 and 1998. During the second (2001-2003) and third (2003-2006) screening rounds, beat-to-beat blood pressure recordings were collected during a 15-minute supine resting period; these recordings were used for HRV calculations (details given below). All subjects gave written informed consent. The PREVEND study was approved by the medical ethics committee of the University Medical Center Groningen and conducted in accordance with the Helsinki Declaration guidelines.

Measurement procedure
Using a standardized procedure, NN-interval time series were measured from continuous beat-to-beat pressure recordings taken on the middle finger with a Portapres 1 pressure recording device (FMS Finapres Medical Systems BV, Amsterdam, The Netherlands) and Beatscope software (Finapres Medical Systems, Amsterdam, The Netherlands). The cuff of the Portapres 1 was placed on the middle finger of the dominant arm. The subjects were measured in the supine position in a quiet room at constant temperature (22°C), breathing spontaneously and holding the Portapres cuff at heart level, and were not allowed to talk or move during the measurement.

Processing of data
Before HRV analysis the pulse wave data were visually pre-processed to exclude non-sinus rhythm, ectopic beats, and artifacts, such as premature ventricular beats, electrical 'noise', or aberrant beats. NN-intervals from the beat-to-beat blood pressure signals were detected with an accuracy of ±5ms. Artifacts were removed and the resulting gaps were interpolated. The NN-interval detection and interpolation algorithm used has been previously described [17]. When more than 5% of the NN-intervals in a recording were interpolated, the data were considered invalid and discarded. From these processed beat-to-beat blood pressure signals the deflections were detected and all intervals in between these deflections (NN-intervals) were used to calculate SDNN and RMSSD. SDNN and RMSSD were obtained using the CARSPAN 2.0 program (IECProgramma, Groningen, the Netherlands), which is a software package specifically designed for cardiovascular spectral analysis [18]. From the 15 min of recorded signal we selected the last 4 to 5 min with a stationary time series. This recording length of 240s to 300s of high-quality signal was considered the gold-standard recording length. SDNN and RMSSD were calculated for this total recording length. Three non-overlapping 10s recordings were randomly selected from a subject's total recording, while periods of 30s and 120s were selected from the start of the total recording. In addition, we calculated the average SDNN and RMSSD of the three 10s recordings (Avg10s) (Fig 1). After data processing we had HRV data of 3,387 subjects that were used for analysis.
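To make the two time-domain measures concrete, the following is a minimal Python sketch of how SDNN and RMSSD could be computed from a list of NN-intervals, and how a sub-window (e.g. the first 30s) could be cut from a longer series. It is not the CARSPAN implementation. The function names, the use of the sample (n − 1) standard deviation, and the rule that an interval belongs to a window when its closing beat falls inside it are all assumptions made for illustration.

```python
import math
import statistics


def sdnn(nn_ms):
    """SDNN: standard deviation of the NN-intervals (ms).
    Assumption: sample SD (n - 1 denominator)."""
    return statistics.stdev(nn_ms)


def rmssd(nn_ms):
    """RMSSD: root mean square of successive NN-interval differences (ms)."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def window(nn_ms, start_s, length_s):
    """Return the NN-intervals whose closing beat falls in
    [start_s, start_s + length_s). Hypothetical helper for illustration."""
    elapsed = 0.0
    selected = []
    for nn in nn_ms:
        elapsed += nn / 1000.0  # running time in seconds
        if start_s <= elapsed < start_s + length_s:
            selected.append(nn)
    return selected


if __name__ == "__main__":
    # Synthetic NN-intervals in ms, not study data.
    nn = [800, 810, 790, 805, 795, 820, 780, 800, 815, 785, 800, 810] * 30
    first_30s = window(nn, 0, 30)
    print(len(first_30s), sdnn(first_30s), rmssd(first_30s))
```

Because RMSSD depends only on beat-to-beat differences, it needs an uninterrupted run of intervals, whereas SDNN depends on the spread of all intervals in the window.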

Statistical analyses
Prior to the analyses, SDNN and RMSSD data were log-transformed to obtain approximately normal distributions. Pearson's correlation coefficients (r) for SDNN and RMSSD were calculated between the gold-standard recording and the three separate 10s, the Avg10s, the 30s, and the 120s recordings. However, a correlation coefficient is blind to the possibility of bias caused by differences in the mean and/or standard deviation (SD) between the two measurements. More specifically, a strong correlation does not necessarily imply close agreement. Therefore the Bland-Altman procedure was used to calculate 95% limits of agreement (LoA) [12,13]. In contrast to the traditional Bland-Altman plots, we plotted the measurement of the gold standard on the x-axis [19]. The bias was calculated as the mean difference between the HRV measurements of the gold standard and those of the (ultra-)short recording periods. Furthermore, we calculated Cohen's d statistics to quantify the bias of the HRV measurements of different recording lengths relative to their within-group variations [14]. This was done by dividing the bias in HRV by the SD of the total recording. For example, a Cohen's d of 0.027 means that the difference between two recording means is 2.7% of the SD of the total recording, which could be interpreted as a very small effect (where d = 0.20 is a small, d = 0.50 is a moderate, and d = 0.80 is a large difference) [14,20]. In addition, to measure the reliability of our 10s recording periods we calculated the intra-class correlation coefficients (ICC; absolute agreement, two-way analysis of variance) between the three 10s measurements for both RMSSD and SDNN. Stata v11.2 (StataCorp LP, Texas, USA) was used for all statistical analyses. P-values <0.05 were considered statistically significant.
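The agreement measures described above can be sketched in a few lines of Python. This is an illustrative sketch, not the Stata code used in the study. It assumes that all statistics are computed on natural-log-transformed values, that the bias is taken as gold standard minus short recording, and that the limits of agreement are bias ± 1.96 × SD of the paired differences. The function names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def agreement(gold, short):
    """Agreement between gold-standard and short-recording HRV values.
    Assumption: values are log-transformed first, bias = gold - short."""
    g = [math.log(v) for v in gold]
    s = [math.log(v) for v in short]
    diffs = [a - b for a, b in zip(g, s)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return {
        "r": pearson(g, s),
        "bias": bias,
        "loa": (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff),
        # Cohen's d: bias relative to the SD of the total recording.
        "d": bias / statistics.stdev(g),
    }


if __name__ == "__main__":
    # Synthetic SDNN values in ms, not study data.
    gold = [35.0, 42.0, 28.0, 55.0, 31.0, 47.0]
    short = [26.0, 30.0, 22.0, 41.0, 20.0, 39.0]
    print(agreement(gold, short))
```

Two recordings can correlate perfectly and still disagree: if every short value is a fixed fraction of the gold-standard value, r equals 1 while the bias and Cohen's d are far from zero, which is why the correlation is reported alongside the other two measures.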

Simulation study
As a result of our study design, measures based on the (ultra-)short segments are not independent from the total (gold standard) period from which they were selected, which automatically generates an inflation of the correlations, Cohen's ds, and 95% LoAs that we determine in this study. Therefore, we conducted a simulation study using a bootstrapping procedure in order to assess the correlations, 95% LoAs, and Cohen's d statistics expected under the null hypothesis of no agreement between the measurements of the (ultra-)short recordings and the remainder of the total recording. That is, the only agreement between HRV measurements from the (ultra-)short and total recordings arises from the (ultra-)short recording being part of the total recording.
The HRV values for the remainders of the total recording (i.e. of length 230-290s for the 10s recordings, of length 210-270s for the 30s recordings, and of length 120-180s for the 120s recordings) were approximated by subtracting HRV based on the (ultra-)short recording from HRV of the total recording using a mathematical formula for decomposing variances. Formula (1) shows how HRV from a 290s recording is approximated by subtracting HRV from a 10s recording from a total recording of 300s.

$$\mathrm{HRV}(290\mathrm{s}) = \sqrt{\mathrm{HRV}(300\mathrm{s})^{2} \cdot \left(N(300\mathrm{s}) - 1\right) - \mathrm{HRV}(10\mathrm{s})^{2} \cdot \left(N(10\mathrm{s}) - 1\right)} \qquad (1)$$

where N(xs) is the number of NN intervals for the xs recording. Next, 3,387 HRV values from the actual data set of (ultra-)short recordings and 3,387 HRV values from the actual data set of corresponding remainders were drawn independently of each other with replacement, and each pair of HRV values was then combined to approximate HRV from a total recording using a mathematical formula for adding independent SDs. For example, to simulate HRV from a 300s recording under the null hypothesis, HRV from a 10s recording was selected as well as HRV from a 290s recording, and from these two values HRV from a 300s recording was approximated using Formula (2).
Correlation coefficients, 95% LoAs, and Cohen's ds were computed to determine the agreement of the 10s, 30s, and 120s NN-interval measurements with the total recording under the null hypothesis. This procedure was repeated 1,000 times, and for each of the HRV variables (SDNN or RMSSD) measured from each of the (ultra-)short recordings (10s, Avg10s, 30s, and 120s) 95% reference ranges were determined for the correlation coefficients, 95% LoAs, and Cohen's ds. The observed values were compared to these ranges, with the expectation that they would show more agreement than expected under the null hypothesis and hence fall outside the simulated reference ranges (see Fig 2). An observed value outside the corresponding reference range indicates a significant difference (p<0.05).
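The simulation procedure can be illustrated with the following Python sketch. It is a reading of the description above under explicit assumptions, not the authors' code. In particular, the sketch treats the variance decomposition as the usual sum-of-squares identity and divides by the remainder's degrees of freedom; that normalization, the form of the recombination step, the percentile rule for the reference range, and all function names are assumptions. Only the correlation coefficient is simulated here; LoAs and Cohen's d would follow the same pattern.

```python
import math
import random
import statistics


def remainder_sd(sd_total, n_total, sd_short, n_short):
    """Approximate the SD of the remainder by removing the short segment's
    sum of squares from the total. Assumption: normalized by the
    remainder's degrees of freedom, (n_total - n_short) - 1."""
    ss = sd_total ** 2 * (n_total - 1) - sd_short ** 2 * (n_short - 1)
    return math.sqrt(max(ss, 0.0) / (n_total - n_short - 1))


def combine_sd(sd_a, n_a, sd_b, n_b):
    """Recombine two independent segments into a total SD
    (assumed inverse of remainder_sd)."""
    ss = sd_a ** 2 * (n_a - 1) + sd_b ** 2 * (n_b - 1)
    return math.sqrt(ss / (n_a + n_b - 1))


def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_reference_range(short, rem, reps=1000, seed=0):
    """95% reference range of r under the null hypothesis.
    short, rem: lists of (sd, n_intervals) tuples, one per subject."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        # Draw short and remainder values independently, with replacement.
        s_draw = rng.choices(short, k=len(short))
        r_draw = rng.choices(rem, k=len(rem))
        totals = [combine_sd(s, ns, r, nr)
                  for (s, ns), (r, nr) in zip(s_draw, r_draw)]
        rs.append(pearson([math.log(s) for s, _ in s_draw],
                          [math.log(t) for t in totals]))
    rs.sort()
    lo_idx = int(0.025 * reps)
    hi_idx = max(lo_idx, int(0.975 * reps) - 1)
    return rs[lo_idx], rs[hi_idx]


if __name__ == "__main__":
    # Synthetic per-subject values, not study data.
    gen = random.Random(42)
    short = [(gen.uniform(10, 50), 12) for _ in range(300)]
    rem = [(gen.uniform(20, 60), 328) for _ in range(300)]
    print(null_reference_range(short, rem, reps=200))
```

An observed correlation above the upper bound of this range would indicate more agreement than the part-whole overlap alone can explain.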

Results
In our sample of 3,387 subjects the mean age was 53 years and 51% were women. The average total recording length was 294s (min-max: 239-302s) with a total population heart rate average of 68 (SD: ±10) beats per minute. At the second screening, 6.7% of subjects in our total population had a recent cardiovascular event, 39% were hypertensive, 7.6% had diabetes mellitus type 2, 34% had hypercholesterolemia and 22% had chronic kidney disease. Median values for SDNN and RMSSD were similar for males and females (see Table 1). We observed the usual strong inverse correlation with age for both SDNN (r = -0.30) and RMSSD (r = -0.28). The 3,387 subjects used in the current study that had good quality HRV measures available constituted about half of the total sample size of the second screening of PREVEND. As shown in Table 2, characteristics of the subjects used in the current study were very similar to those of subjects not used in the current study. Table 3 shows the raw and natural log-transformed SDNN and RMSSD categorized by recording length. It shows that the mean values of RMSSD and, particularly, SDNN increased for longer recording lengths. This increase was 1.32ms for the mean RMSSD (from 28.16 for Avg10s to 29.48 for the total recording), while the mean SDNN increased 9.94ms (from 25.87 for Avg10s to 35.81 for the total recording).

Pearson's correlation coefficients
Correlation between a single 10s recording and the gold-standard recording was already substantial (r = 0.758-0.764 for SDNN; r = 0.853-0.862 for RMSSD) and increased significantly for Avg10s (r = 0.863 for SDNN; r = 0.941 for RMSSD) [Table 4; Fig 3a]. For both SDNN and RMSSD the correlations of Avg10s were similar to those of the 30s recordings (r = 0.863 and 0.859, respectively for SDNN; r = 0.941 and 0.932, respectively for RMSSD). Near perfect correlations with the gold standard were found for the measurements of the 120s recording (r = 0.956 for SDNN and r = 0.986 for RMSSD). Overall the correlations were lower for SDNN compared to RMSSD, but this difference became smaller with the increase of recording length. The differences in correlation between SDNN and RMSSD were significant as shown by their non-overlapping 95% CI.
Table footnotes: (d) Defined as total cholesterol ≥ 6.21 mmol/L, or lipid-lowering treatment. (e) Calculated using the CKD-EPI serum creatinine-serum cystatin C equation.

Intra-class correlation coefficients
To measure the reliability of our three 10s recording periods we calculated their ICCs for RMSSD and SDNN (Table 5). The ICC was modest between the three 10s recordings for SDNN (0.657-0.670) and improved for RMSSD (0.740-0.751).
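For readers unfamiliar with the statistic, a single-measure, two-way, absolute-agreement ICC can be computed from the mean squares of a two-way analysis of variance. The Python sketch below assumes this is the ICC variant meant here, often written ICC(A,1) or ICC(2,1); the study itself used Stata, and the function name and input layout are illustrative.

```python
def icc_absolute_agreement(rows):
    """Single-measure, two-way, absolute-agreement ICC.
    rows: one list per subject, each holding k repeated measurements
    (e.g. the three 10s values of RMSSD)."""
    n = len(rows)        # subjects
    k = len(rows[0])     # repeated measurements per subject
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-measurement mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Synthetic RMSSD values (ms) for four subjects, three 10s segments each.
    data = [[22.0, 25.0, 24.0], [40.0, 37.0, 43.0],
            [15.0, 18.0, 14.0], [31.0, 29.0, 35.0]]
    print(icc_absolute_agreement(data))
```

Unlike a consistency ICC, the absolute-agreement form penalizes systematic differences between the repeated measurements through the between-measurement mean square.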

Discussion
In order to investigate the utility of routine 10s ECG recordings for HRV calculation in large-scale epidemiologic studies, we evaluated the agreement of SDNN and RMSSD between (ultra-)short recordings and a gold-standard recording of 240s to 300s in 3,387 adults. We showed that RMSSD consistently outperformed SDNN. RMSSD values measured from recordings of only 10s in length are already reliable and good proxies for those measured from longer recordings (240s-300s), in particular when the measurements from multiple 10s recordings are averaged. For SDNN the measurements from 10s recordings were reliable, but although they correlated moderately (for the single recordings) to strongly (for Avg10s) with the gold standard, agreement was poor in both cases (i.e. Cohen's d close to 1), and hence they are bad proxies. For SDNN measured from 30s recordings the agreement with the gold standard was still only moderate, but sufficient to yield reliable estimates of "actual" SDNN. SDNN and RMSSD measured from 120s recordings were both in high agreement with the gold-standard recordings. Our findings that RMSSD measured from 10s recordings is a good proxy for the "actual" RMSSD, but that this does not hold for SDNN, are in line with previous studies [8][9][10][11]. All of these also observed that measurements from ultra-short recordings yield good estimates of RMSSD, while for SDNN the agreement is not sufficient to provide reliable estimates for the "actual" SDNN. In addition, we and others observed that the correlation or agreement increased with an increase of the recording length for RMSSD and especially for SDNN [10,11]. The high dependence of SDNN on recording length is to be expected because SDNN reflects the total power of all HRV frequency components combined, whereas RMSSD is a reflection of high frequency HRV components only [3].
Furthermore in line with our findings others have shown that averaging HRV measures obtained from sequential time periods reduces the error imposed by the analysis of very short segments [8,11]. We found that the reliability of the three individual 10s recording periods was substantial, in particular for RMSSD.
In our study we chose to extract the (ultra-)short recordings from the total recording length to specifically address our research question whether HRV measured from (ultra-)short recordings reflects the "actual" HRV. Our design differs from that of Schroeder and colleagues [11], who measured HRV at sequential time periods. Their design is more suited to assess the repeatability (or reliability) of HRV measurements, while our study design reflects our focus on the validity of (ultra-)short recordings for HRV measurements in the time domain (SDNN, RMSSD) compared to a gold-standard recording period of 240s to 300s. A consequence of our study design is that the measurements of the (ultra-)short recordings are not independent of the total recording and hence correlations and agreement measures are expected to be inflated. Nevertheless, for both HRV measures all observed correlations were significantly higher and all 95% LoAs significantly smaller than those simulated under the null hypothesis, supporting the validity of HRV measurements based on (ultra-)short recordings. The biases and Cohen's d for both HRV measures did not differ from the expectation. This can be explained by the fact that the distributions of the simulated HRV measures from the (ultra-)short and total recordings are similar to those of the observed ones, leading to similar mean differences between the HRV measurements of the gold standard and those of the different (ultra-)short recording periods. However, the variation in the paired differences between the observed measurements of the (ultra-)short segments and those of the total recording is smaller than the variation in the respective paired differences of the simulated measurements, explaining the much higher correlation and tighter 95% LoAs. In this study we analyzed a general population in which the mean age was 53 years and both sexes were included [15,16].
Previous studies [8][9][10][11] only included healthy individuals, and Dekker et al. [8] further limited their study population to young men (mean ± SD age 25.9 ± 3.8 years), thereby reducing the generalizability of their results even more. Therefore our results are more representative of the general population. However, 10s ECGs in cases with cardiac arrhythmias should be used with caution because, given the very low number of beats in 10s, one artefact caused by cardiac arrhythmia will make up about 5% of the total duration of the recording, depending on the heart rate. Therefore, for calculating RMSSD and SDNN we suggest the following criteria: (a) one artefact (of any kind, such as detection failure or arrhythmia, harmless or not) at the beginning or at the end of a recording should be excluded and the remaining part of the segment should be used, and (b) any artefact that is not at the beginning or at the end, or more than one artefact, should lead to exclusion of the entire segment. This is because we would need a continuous segment to calculate the successive differences (i.e. SDNN and RMSSD) and one interruption would imply a great loss of successive differences.
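The two exclusion criteria suggested above can be expressed as a small decision rule. The Python sketch below is one possible reading of them: it assumes each NN-interval in a 10s segment carries a flag marking it as an artefact, and it interprets "at the beginning or at the end" as the first or last interval of the segment. The function name and the flag representation are hypothetical.

```python
def usable_segment(nn_ms, is_artifact):
    """Apply criteria (a) and (b) to one short segment.

    nn_ms: NN-intervals (ms) of the segment.
    is_artifact: one boolean per interval, True if flagged as an artefact.
    Returns the intervals to analyse, or None if the segment is excluded.
    """
    flagged = [i for i, flag in enumerate(is_artifact) if flag]
    if not flagged:
        return list(nn_ms)              # clean segment, use everything
    if len(flagged) == 1:
        if flagged[0] == 0:
            return list(nn_ms[1:])      # (a) drop artefact at the start
        if flagged[0] == len(nn_ms) - 1:
            return list(nn_ms[:-1])     # (a) drop artefact at the end
    return None                         # (b) interior or multiple artefacts


if __name__ == "__main__":
    # Synthetic segment, not study data.
    segment = [800, 810, 1450, 805, 795, 820]
    flags = [False, False, True, False, False, False]
    print(usable_segment(segment, flags))
```

Trimming only at the edges keeps the remaining intervals contiguous, so every successive difference used for RMSSD still comes from adjacent beats.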
A major strength of our study was the large sample size of 3,387 subjects, which allowed for precise estimates of agreement measures between different recording periods. Furthermore, the significance of our study is reinforced by our statistical approach. We calculated not only Pearson's correlation coefficients to measure the strength of linear association between the recordings, but also used Bland-Altman's statistics [12,13] and Cohen's d [14] to evaluate the degree of bias. As pointed out by Altman and Bland, correlation coefficients are not sufficient to demonstrate the agreement of measurements [12,13]. No previous studies have used these different agreement analysis techniques. The importance of considering measurements of differences is demonstrated when comparing our results for the Pearson's correlation coefficients and Cohen's d statistics. For instance, a substantial decrease in the Cohen's d statistic from Avg10s to 30s is shown for SDNN, while the Pearson's correlation coefficient remains the same. Therefore, only considering Pearson's correlation coefficient results for SDNN would lead to an erroneous interpretation.
Unlike other studies [8][9][10][11] that also measured frequency domain HRV parameters such as the high frequency (HF) component we limited our study to time domain parameters RMSSD and SDNN. This was because ECGs of less than 60s duration are not sufficient to assess the HF components and ECGs of at least 120s should be used to address the low frequency components [1][2][3]. Therefore our conclusions do not apply to HRV parameters in the frequency domain.
An important implication of our study is that 10s ECG recordings could be used for calculating time-domain HRV parameters, particularly RMSSD, in future epidemiologic studies. In standard in-clinic evaluation of heart rate dynamics, 300s is the recommended length of measurement [3]. Nevertheless, 10s recordings from 12-lead ECGs are already commonly used to detect resting abnormalities in interval lengths, wave morphology and segment elevation/depressions [10] and have already shown their usefulness as a diagnostic tool [5,6,21]. For example, reduced HRV measured from three 10s ECG recordings was recently found to be associated with an increased incidence of heart failure [21]. An example of the applicability of our findings is genome-wide association studies (GWAS), where large sample sizes are needed to detect small effects of genetic variants. A large number of cohorts may have short ECG recordings available but may not (yet) have measured RMSSD (and SDNN). The increase in sample size when using RMSSD (and SDNN) from these cohorts in a GWAS will most likely outweigh the loss in accuracy of the phenotype measurements and hence permit the identification of more genetic variants.
In summary, based on our unprecedentedly large sample size, the selection of our (ultra-)short recordings from our total recording, our careful data processing and our sophisticated statistical analysis, we can conclude that RMSSD in particular, when measured from (ultra-)short recordings, captures HRV well. Even a single 10s (standard) ECG recording yields a valid RMSSD measurement, although averaging over multiple 10s ECGs is preferable. For SDNN we would recommend recordings of at least 30s or, if not available, multiple 10s ECGs. In addition, our study suggests that it is unnecessary to use recordings longer than 120s to obtain accurate measures of RMSSD and SDNN.