How Accurate Is the Prediction of Maximal Oxygen Uptake with Treadmill Testing?

Background Cardiorespiratory fitness measured by treadmill testing has prognostic significance for mortality in cardiovascular and other chronic disease states. A recently developed method for estimating maximal oxygen uptake (VO2peak), the heart rate index (HRI), depends only on heart rate (HR); its accuracy was tested against oxygen uptake (VO2) either measured or predicted from conventional treadmill parameters (speed, incline, protocol time). Methods The HRI equation, METs = 6 x HRI - 5, where HRI = maximal HR/resting HR, provides a surrogate measure of VO2peak. Forty large-scale treadmill studies in which VO2peak was either measured (TM-VO2meas; n = 20) or predicted from treadmill parameters (TM-VO2pred; n = 20) were identified through a systematic search of MEDLINE, Google Scholar and Web of Science. All studies were required to report group mean data for both resting and maximal HR, allowing determination of HR index-derived oxygen uptake (HRI-VO2). Results The 20 studies with measured VO2 (TM-VO2meas) involved 11,477 participants (median 337), whereas the 20 studies with predicted VO2 (TM-VO2pred) involved a total of 105,044 participants (median 3,736). A difference of only 0.4% was seen between mean (±SD) VO2peak for TM-VO2meas and HRI-VO2 (6.51±2.25 METs and 6.54±2.28 METs, respectively; p = 0.84). In contrast, there was a highly significant 21.1% difference between mean (±SD) TM-VO2pred and HRI-VO2 (8.12±1.85 METs and 6.71±1.92 METs, respectively; p<0.001). Conclusion Although mean TM-VO2meas and HRI-VO2 were almost identical, mean TM-VO2pred was more than 20% greater than mean HRI-VO2.


Introduction
When assessed as oxygen consumption (VO2), cardiorespiratory fitness (CRF) may be either measured using a treadmill with conventional gas analysis equipment (TM-VO2meas) or predicted from equations based on treadmill speed, incline or treadmill time (TM-VO2pred) [1]. The prognostic importance of CRF has been extensively investigated in recent meta-analyses confirming strong inverse relationships between CRF and all-cause mortality in healthy individuals [2] and in patients with either coronary artery disease (CAD) or congestive heart failure (CHF) [3-6]. The prospective studies included in these reviews involve large numbers of subjects and have shown that each 1 MET (3.5 mL O2·kg⁻¹·min⁻¹) increase in CRF is associated with an approximately 10-20% reduction in all-cause and cardiovascular mortality [2,7], with a similar effect observed in CHF [6,8].
Cardiovascular pathology frequently screened for with treadmill testing includes both CAD and CHF. When CRF is used as an outcome measure from a treadmill test, VO2peak is commonly expressed in METs, 1 MET being the VO2 at rest, which by current convention equals 3.5 mL O2·kg⁻¹·min⁻¹ [29]. Kaplan-Meier curves have been used extensively to document the link between CRF and long-term morbidity/mortality [30,31]. Although VO2 can be predicted from treadmill speed, incline or the test time for a particular protocol, direct measurement with gas analysis is currently the only way to ensure an accurate measurement of VO2. The recently published HR index (HRI = activity HR/rest HR) predicts VO2, expressed in METs, from only two simple measurements, resting HR and an activity HR (either submaximal or maximal), using the equation METs = 6 x HRI - 5; HRI correlates highly with VO2 [32]. The HRI equation was derived from group mean data from 60 studies in which an exercise test reported a resting HR (HRrest) and a VO2 measured at the activity HR (either submaximal or peak), expressed in mL O2·kg⁻¹·min⁻¹ or METs. The original data are shown as a regression plot in Fig 1. The utility of this equation is that it provides a simple, independent surrogate method of estimating VO2 using only the resting HR and either the submaximal or maximal activity HR. Although the HRI equation was developed from aggregate data, no analysis to date has established its predictive accuracy for assessment of VO2.
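As a minimal sketch, the HRI equation from [32] can be written as a pair of Python functions. The function names are illustrative assumptions, and the conversion to mL O2·kg⁻¹·min⁻¹ uses the conventional 3.5 mL O2·kg⁻¹·min⁻¹ per MET.

```python
# Sketch of the heart rate index (HRI) equation [32]:
#   HRI  = activity HR / resting HR
#   METs = 6 x HRI - 5
# Function names are illustrative, not taken from the original paper.

ML_O2_PER_KG_MIN_PER_MET = 3.5  # conventional value of 1 MET


def heart_rate_index(activity_hr: float, rest_hr: float) -> float:
    """Return HRI = activity HR / resting HR (both in beats/min)."""
    if rest_hr <= 0:
        raise ValueError("resting heart rate must be positive")
    return activity_hr / rest_hr


def hri_mets(activity_hr: float, rest_hr: float) -> float:
    """Estimate VO2 in METs with METs = 6 x HRI - 5."""
    return 6.0 * heart_rate_index(activity_hr, rest_hr) - 5.0


def mets_to_ml_per_kg_min(mets: float) -> float:
    """Convert METs to mL O2/kg/min using 1 MET = 3.5 mL O2/kg/min."""
    return mets * ML_O2_PER_KG_MIN_PER_MET


if __name__ == "__main__":
    mets = hri_mets(activity_hr=150, rest_hr=75)  # HRI = 2.0
    print(mets, mets_to_ml_per_kg_min(mets))      # 7.0 24.5
```

At rest the activity HR equals HRrest, so HRI = 1 and the equation returns 1 MET, consistent with the definition of the MET.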
The objective of this study was to compare aggregate HRI-derived VO2 (HRI-VO2) data against VO2peak from two different types of treadmill test: 1) VO2 measured with conventional gas analysis equipment (TM-VO2meas) or 2) VO2 predicted from equations based on treadmill speed, incline or treadmill time (TM-VO2pred).
A systematic search was conducted from October 2011 to March 2013 using MEDLINE, Google Scholar and Web of Science. Search terms included (in various combinations) exercise testing, oxygen uptake, VO2, CRF, cardiovascular disease (CVD), CAD, CHF and physical activity. Publications containing the prerequisite HR data were extensively cross-referenced to source other eligible publications [33].
Eligibility criteria for study inclusion were: 1) >100 patients enrolled, 2) documented VO2peak (either measured or predicted) expressed as mL O2·kg⁻¹·min⁻¹ or METs, 3) measured maximal HR (HRmax) associated with VO2peak, and 4) measured HRrest. Large-scale studies that combined cycle ergometry with treadmill testing were excluded. Where publications were likely to share a subject cohort, judged by 1) participating authors, 2) study location, 3) the time period of the study and 4) characteristics of the study population (e.g. healthy, suspected or known CAD), the most recent publication was chosen. From the HR data, a predicted MET value (VO2peak) was derived using the HRI equation (METs = 6 x HRI - 5, where HRI = HRmax/HRrest).
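The quantitative inclusion criteria and the per-study HRI-VO2 derivation can be sketched as follows. The `StudyRecord` fields and study names are hypothetical, VO2peak is assumed to have already been converted to METs (dividing mL O2·kg⁻¹·min⁻¹ by 3.5), and the duplicate-cohort rule, which required judgement about authors, location, time period and population, is deliberately not encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StudyRecord:
    """Hypothetical summary of one published treadmill study (group means)."""
    name: str
    n_participants: int
    vo2peak_mets: Optional[float]  # measured or predicted VO2peak, in METs
    hr_max: Optional[float]        # group mean HRmax, beats/min
    hr_rest: Optional[float]       # group mean HRrest, beats/min
    used_cycle_ergometry: bool = False


def is_eligible(study: StudyRecord) -> bool:
    """Apply the >100 participants, VO2peak, HRmax and HRrest criteria and
    exclude studies that combined cycle ergometry with treadmill testing."""
    return (
        study.n_participants > 100
        and study.vo2peak_mets is not None
        and study.hr_max is not None
        and study.hr_rest is not None
        and not study.used_cycle_ergometry
    )


def hri_vo2(study: StudyRecord) -> float:
    """HRI-VO2 in METs from the study's group mean HRs: 6 x (HRmax/HRrest) - 5."""
    return 6.0 * (study.hr_max / study.hr_rest) - 5.0


def eligible_with_hri(studies: List[StudyRecord]) -> List[tuple]:
    """Return (name, reported VO2peak, HRI-VO2) for each eligible study."""
    return [(s.name, s.vo2peak_mets, hri_vo2(s)) for s in studies if is_eligible(s)]


if __name__ == "__main__":
    # Invented records for illustration only.
    studies = [
        StudyRecord("study A", 337, 6.5, 150.0, 75.0),  # eligible
        StudyRecord("study B", 80, 7.0, 150.0, 75.0),   # too few participants
        StudyRecord("study C", 500, 8.0, 150.0, None),  # HRrest not reported
    ]
    print(eligible_with_hri(studies))  # [('study A', 6.5, 7.0)]
```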
At the close of data acquisition in March 2013, a total of 40 studies (TM-VO2meas, n = 20; TM-VO2pred, n = 20) had been identified, all but one published since 1991. MEDLINE searching identified 19 of the 40 studies (TM-VO2meas, n = 11; TM-VO2pred, n = 8), with the remaining 21 sourced through Web of Science, Google Scholar and cross-referencing. The TM-VO2meas studies were biased towards clinical outcomes related to CHF, whereas the TM-VO2pred studies were frequently concerned with long-term outcome (survival) in screening for CVD. Although multiple search strategies were used to obtain studies meeting the selection criteria, it is acknowledged that, even with rigorous attention to search detail, suitable studies may have been missed.

Statistical analysis
Categorical variables were expressed as numbers and percentages and continuous variables as mean ± standard deviation. Student's paired t-test was used to compare HRI-VO2 against both TM-VO2meas and TM-VO2pred. Results are presented in two formats: 1) pooled data for each of TM-VO2meas and TM-VO2pred against HRI-VO2, expressed as group means and shown as line-of-identity and Bland-Altman plots [34]; and 2) CRF data divided into tertiles for both the TM-VO2meas and TM-VO2pred groups against HRI-VO2.

Studies used in the analyses

…representing a data point, there were 57 data points. Age and gender distribution was similar for the TM-VO2meas (51.0 years, 64.9% male) and TM-VO2pred groups (52.9 years, 71.0% male). The principal details of the 40 treadmill studies used in the analysis are outlined in Table 1, including the test protocol, use of handrail support and the health status of participants. Of the 20 TM-VO2meas studies, 14 (70%) involved subjects with CHF, and all 14 used protocols other than the standard Bruce protocol [35]. These alternative protocols reduced the stage increment of VO2, usually to 2 METs or less, with certain ramp protocols having increments of less than 1 MET per minute. Handrail support was mentioned in only two of the TM-VO2meas studies, being 'not permitted' in one (Dressendorfer [36]) and 'discouraged' in the other (Oliveira [37]).
The TM-VO2pred studies typically involved subjects with known or suspected CVD or with significant cardiovascular risk factors (Table 1). A Bruce protocol, either standard or modified, was used in 13 (65%) of the 20 TM-VO2pred studies. Use of handrail support was described in seven (35%) of the TM-VO2pred studies and not stated in the remaining 13. In these seven studies, handrail support was described as 'discouraged' in three, 'not permitted' in three and 'light handrail support' in one. Predictive treadmill equations were either given or referenced in only 12 (60%) of the 20 TM-VO2pred studies.

Characterization of study groups
A. Group means: oxygen consumption and heart rate. The mean TM-VO2pred reported in the 20 studies was 8.12 METs and the mean TM-VO2meas reported in the other 20 studies was 6.51 METs, a difference of 1.61 METs or 24.7% (Table 2). The mean HRrest was 75.6 beats·min⁻¹ with TM-VO2pred and 77.6 beats·min⁻¹ with TM-VO2meas; the mean HRmax was 146.3 beats·min⁻¹ for TM-VO2pred and 147.1 beats·min⁻¹ for TM-VO2meas (Table 2). In contrast to the VO2peak difference, the absolute differences in group means between TM-VO2pred and TM-VO2meas were small: 2.0 beats·min⁻¹ for HRrest and only 0.8 beats·min⁻¹ for HRmax (Table 2).
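The arithmetic behind these comparisons is simple; a minimal Python sketch using the reported group means is shown below. It assumes the percentage is expressed relative to TM-VO2meas, which reproduces the 24.7% figure; the function name `pct_difference` is illustrative.

```python
def pct_difference(value: float, reference: float) -> float:
    """Percentage by which `value` exceeds `reference`."""
    return 100.0 * (value - reference) / reference


# Group means as reported in the text (Table 2).
tm_vo2pred_mets, tm_vo2meas_mets = 8.12, 6.51
hr_rest = {"TM-VO2pred": 75.6, "TM-VO2meas": 77.6}
hr_max = {"TM-VO2pred": 146.3, "TM-VO2meas": 147.1}

mets_gap = tm_vo2pred_mets - tm_vo2meas_mets
print(f"VO2peak gap: {mets_gap:.2f} METs "
      f"({pct_difference(tm_vo2pred_mets, tm_vo2meas_mets):.1f}%)")
# VO2peak gap: 1.61 METs (24.7%)
print(f"HRrest gap: {abs(hr_rest['TM-VO2meas'] - hr_rest['TM-VO2pred']):.1f} beats/min")
# HRrest gap: 2.0 beats/min
print(f"HRmax gap: {abs(hr_max['TM-VO2meas'] - hr_max['TM-VO2pred']):.1f} beats/min")
# HRmax gap: 0.8 beats/min
```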
Alternatively, if VO2peak is determined by HRI-VO2, the difference between the TM-VO2pred and TM-VO2meas groups is reduced to only 0.17 METs or 2.6% (6.71 METs in the TM-VO2pred group, 6.54 METs in the TM-VO2meas group), a not unexpected result in view of the small differences in HRrest and HRmax between these two groups (Table 2).
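A short sketch can check the 2.6% figure and also illustrates a subtlety of working with group means. Plugging the pooled mean HRs from Table 2 into the HRI equation gives about 6.61 and 6.37 METs rather than the reported 6.71 and 6.54 METs. This is consistent with HRI-VO2 having been derived for each study and then averaged, because the mean of ratios generally differs from the ratio of means; the two-study toy data at the end are invented purely to show that effect.

```python
from statistics import mean


def hri_mets(hr_max: float, hr_rest: float) -> float:
    """HRI equation: METs = 6 x (HRmax / HRrest) - 5."""
    return 6.0 * hr_max / hr_rest - 5.0


def pct_difference(value: float, reference: float) -> float:
    """Percentage by which `value` exceeds `reference`."""
    return 100.0 * (value - reference) / reference


# Reported group mean HRI-VO2 values (Table 2).
print(f"{pct_difference(6.71, 6.54):.1f}%")  # 2.6%

# Applying the equation to the pooled mean HRs instead of averaging per-study values.
print(f"{hri_mets(146.3, 75.6):.2f}")  # 6.61 (TM-VO2pred group)
print(f"{hri_mets(147.1, 77.6):.2f}")  # 6.37 (TM-VO2meas group)

# Toy example (not study data): mean of per-study values vs. equation on pooled means.
studies = [(120.0, 60.0), (150.0, 100.0)]  # (HRmax, HRrest)
mean_of_study_values = mean(hri_mets(mx, rs) for mx, rs in studies)
pooled = hri_mets(mean(mx for mx, _ in studies), mean(rs for _, rs in studies))
print(mean_of_study_values, pooled)  # 5.5 5.125
```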
The plot of TM-VO2meas against HRI-VO2 shows a uniform distribution around the line of identity, and the Bland-Altman plot suggests no bias between these two separate methods of determining VO2peak (Fig 4A and 4B). In contrast, the corresponding line-of-identity plot for TM-VO2pred against HRI-VO2 shows a strong bias, with the Bland-Altman plot indicating systematic over-prediction by TM-VO2pred (Fig 5A and 5B).
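A Bland-Altman analysis [34] plots the difference between two methods against their mean: a bias near zero with differences scattered evenly about it suggests agreement, whereas a consistently positive mean difference indicates systematic over-prediction by the first method. A minimal standard-library sketch is shown below, assuming one pair of values per data point and the conventional 95% limits of agreement (bias ± 1.96 SD); the example values are invented, not data from this study.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def bland_altman(method_a: Sequence[float], method_b: Sequence[float]
                 ) -> Tuple[List[float], List[float], float, float, float]:
    """Return per-pair means (x-axis), per-pair differences a - b (y-axis),
    the bias (mean difference) and the 95% limits of agreement."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired values")
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return means, diffs, bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Illustrative values only (METs).
    treadmill = [7.0, 8.0, 9.0]
    hri = [6.0, 6.5, 7.0]
    _, _, bias, lower, upper = bland_altman(treadmill, hri)
    print(f"bias {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
    # bias 1.50, limits of agreement 0.52 to 2.48
```

Here every treadmill value exceeds its paired HRI value, so the bias is positive: the pattern the text describes for TM-VO2pred.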

Discussion
It is crucial to have high-quality CRF data for use in epidemiological studies, as management strategies involving both pharmacological and lifestyle intervention rely on this accuracy. The utility of the HRI equation [32] as a surrogate measure of VO2 expressed in METs is confirmed in this study when assessed against VO2peak, both for TM-VO2meas measured with conventional gas analysis equipment and for TM-VO2pred predicted from equations based on treadmill speed, incline or treadmill time. Close agreement between HRI-VO2 and TM-VO2meas was observed in the 20 TM-VO2meas studies, with only a 0.4% difference (p = 0.84) between group means. By comparison, a highly significant 21.1% (p<0.001) over-prediction of VO2peak by TM-VO2pred relative to HRI-VO2 was observed in the 20 TM-VO2pred studies. The magnitude of this potential error challenges current methods of treadmill prediction of CRF, which appear to overestimate CRF and may lead to false prognostic classification. If a disparity between HRI-VO2 and TM-VO2pred of the magnitude shown in this study were applied, for example, to the CRF outcome data expressed in METs in the meta-analysis by Kodama [2], false classification based on over-prediction of CRF would be likely. Indeed, in treadmill studies investigating the effect of handrail support, a practice that lengthens treadmill time, VO2peak is over-predicted by 20% to 30% [9-13,17], which could lead to a false prognostic classification of CRF.

Table 2. Heart rate and oxygen consumption data for TM-VO2meas and TM-VO2pred. Group mean (±1 SD) HRrest, HRpeak, HRI-VO2 and VO2peak for TM-VO2meas and TM-VO2pred.
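The p-values quoted above come from Student's paired t-test, presumably pairing each data point's treadmill VO2peak with its HRI-VO2. A minimal sketch of the test statistic using only the Python standard library is shown below; the example values are invented. The p-value would then be read from the t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`, which returns the same statistic together with its p-value.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired (dependent-samples) t statistic and degrees of freedom.

    t = mean(d) / (SD(d) / sqrt(n)), where d are the within-pair differences.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matched pairs")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    sd = stdev(d)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(d) / (sd / sqrt(n)), n - 1


if __name__ == "__main__":
    # Illustrative values only (METs), not data from this study.
    t, df = paired_t([7.0, 8.0, 9.0], [6.0, 6.5, 7.0])
    print(f"t = {t:.3f}, df = {df}")  # t = 5.196, df = 2
```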
To correct for the consistently observed over-prediction of VO2peak of around 20% resulting from handrail support, Foster developed simple modifications of the ACSM equations for use when handrail support is observed during treadmill testing [17]. None of the 20 TM-VO2pred studies in this analysis referenced use of the Foster or similar equations to correct for observed handrail support. This prediction error could also apply to other published studies that express results as survival tables and Kaplan-Meier curves. The value of CRF measurement is not limited to CVD: CRF also defines long-term risk in healthy subjects and in other common medical conditions, such as stroke [38], dementia [39] and diabetes mellitus [40]. In the TM-VO2pred group of studies, the smallest difference (9.1%) between HRI-VO2 and TM-VO2pred was observed in the highest CRF tertile. Presumably, the fittest subjects find treadmill walking less difficult and so have less need for handrail support. Conversely, the least fit subjects (the lowest tertile) are most likely to use handrail support, even when instructed otherwise, and in the present study they showed a 31.2% difference between HRI-VO2 and TM-VO2pred. Results from the HUNT 3 Fitness Study likewise showed the greatest overestimation of VO2peak in the least fit subjects [18].
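The tertile comparison can be sketched as follows. How the tertiles were formed is not specified here, so ranking the data points by the treadmill value is an assumption, as are the function names; the six data points in the example are invented for illustration.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (TM-VO2pred, HRI-VO2) for one data point, in METs


def split_tertiles(pairs: Sequence[Pair]) -> List[List[Pair]]:
    """Rank data points by the treadmill value and cut them into three near-equal groups."""
    if len(pairs) < 3:
        raise ValueError("need at least three data points")
    ranked = sorted(pairs, key=lambda p: p[0])
    n = len(ranked)
    cuts = [0, round(n / 3), round(2 * n / 3), n]
    return [ranked[cuts[i]:cuts[i + 1]] for i in range(3)]


def tertile_pct_differences(pairs: Sequence[Pair]) -> List[float]:
    """Percentage by which mean TM-VO2pred exceeds mean HRI-VO2 in each
    tertile, lowest CRF tertile first."""
    result = []
    for group in split_tertiles(pairs):
        tm = mean(p[0] for p in group)
        hri = mean(p[1] for p in group)
        result.append(100.0 * (tm - hri) / hri)
    return result


if __name__ == "__main__":
    # Invented data points for illustration only.
    pairs = [(5.0, 4.0), (6.0, 5.0), (8.0, 7.0), (8.5, 7.5), (11.0, 10.0), (12.0, 11.0)]
    print([round(x, 1) for x in tertile_pct_differences(pairs)])  # [22.2, 13.8, 9.5]
```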

Studies
Collectively, the 20 TM-VO2pred studies in this analysis involve roughly ten times as many subjects as the 20 TM-VO2meas studies, whether considering the total number of subjects (105,044 TM-VO2pred versus 11,477 TM-VO2meas) or the median number (3,736 TM-VO2pred versus 337 TM-VO2meas). This observation indicates that epidemiological use of CRF data is inherently weighted towards studies with predicted VO2. In recognition of the need for high-quality population CRF data, the Fitness Registry and the Importance of Exercise: A National Database (FRIEND) was established in 2014 [41]. A recent publication from this group provided age-related reference standards for CRF from 7783 tests in which VO2max was determined by gas analysis, and the authors highlighted the shortcomings of TM-VO2pred, largely because of the over-prediction of VO2max associated with handrail support [42]. Their statement, together with the observations in the present review, suggests that if TM-VO2pred data are to remain in use, current methods for predicting VO2peak warrant reappraisal.
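The "tenfold" figure is approximate; a quick check of the reported participant counts (the helper name `size_ratio` is illustrative) gives about 9.2 for the totals and 11.1 for the medians:

```python
def size_ratio(pred_count: int, meas_count: int) -> float:
    """How many times larger the TM-VO2pred figure is than the TM-VO2meas one."""
    return pred_count / meas_count


# Participant counts reported for the two groups of studies.
print(f"total:  {size_ratio(105_044, 11_477):.1f}x")  # total:  9.2x
print(f"median: {size_ratio(3_736, 337):.1f}x")       # median: 11.1x
```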
One important question arising from this analysis is the value of using the maximal HRI to predict VO2peak from HR values (rest and peak) as opposed to treadmill parameters (speed, incline or treadmill time). The maximal HRI incorporates two independent predictors of future CVD risk, namely an estimated VO2peak [2,43] and HRrest [44]. Because the maximal HRI is based on two measured HR values, its predictive error as an index is minimal, especially when compared with VO2pred from equations based on speed, incline or treadmill time. As a 1.0 MET increment corresponds to an HRI increment of 0.167 (1/6), Kaplan-Meier curves ranging from <5 to >10 METs have a corresponding HRI range from <1.67 to >2.50 (e.g., 5 METs = rest [HRI = 1] + 4 METs [4 x 0.167] = HRI of 1.67). Across the range of activity from rest (1.0 MET) to the maximal aerobic performance of an elite athlete (e.g. 19 METs), the corresponding HRI would range from 1 to 4. For example, 5, 10 and 15 METs correspond to HRIs of 1.67, 2.50 and 3.33. The simplicity of calculating HRI, together with the range of index values used for clinical evaluation, suggests that it could be a useful addition to the assessment of CRF.
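Rearranging METs = 6 x HRI - 5 gives HRI = (METs + 5)/6, which reproduces the HRI values quoted above. A minimal Python sketch (function names are illustrative):

```python
def mets_from_hri(hri: float) -> float:
    """HRI equation: METs = 6 x HRI - 5."""
    return 6.0 * hri - 5.0


def hri_for_mets(mets: float) -> float:
    """Inverse of the HRI equation: HRI = (METs + 5) / 6.

    Each 1-MET step therefore corresponds to an HRI step of 1/6 (about 0.167).
    """
    return (mets + 5.0) / 6.0


if __name__ == "__main__":
    # Rest (1 MET), the Kaplan-Meier range (5-10 METs), 15 METs and an elite athlete (19 METs).
    for mets in (1, 5, 10, 15, 19):
        print(f"{mets:>2} METs -> HRI {hri_for_mets(mets):.2f}")
    # prints 1.00, 1.67, 2.50, 3.33 and 4.00 respectively
```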

Study Limitations
This review has used the simple concept of HRI as a surrogate measure of VO2. The equation was established from aggregate data acquired from 60 studies. In applying the HRI to this analysis, we compared aggregate data from TM-VO2pred and TM-VO2meas against HRI-VO2, with no intention of indicating the predictive accuracy of the equation for individuals. Individual rather than aggregate data would have been preferable, but obtaining them was beyond the scope of this analysis.

Conclusions
The usefulness of CRF for assessing cardiovascular risk is well established, and treadmill testing provides a simple and convenient method of assessing CRF. The aggregate analysis used in this study shows close agreement, i.e., a non-significant 0.4% difference, between HRI-VO2 and TM-VO2meas, but a large and highly significant 21.1% difference between HRI-VO2 and TM-VO2pred. This overestimation of VO2peak, and hence CRF, by TM-VO2pred challenges the validity of predicting VO2peak from equations based on treadmill speed, incline or protocol time when attempting to document a link between CRF and long-term morbidity/mortality.