Abstract
Introduction
Field tests to estimate maximal oxygen consumption (VO2max) are an alternative to traditional exercise testing methods. Published field tests and their accompanying estimation equations account for up to 80% of the variance in VO2max with an error rate of ~4.5 ml.kg-1.min-1. These tests are limited to very specific age-range populations. The purpose of this study was to create and validate a series of easily administered walking and stepping field equations to predict VO2max across a range of healthy 18-79-year-old adults.
Methods
One-hundred-fifty-seven adults completed a graded maximal exercise test to assess VO2max. Five separate walking and three separate stepping tests of varying durations, number of stages, and intensities were completed. VO2max estimation equations were created using hierarchical multiple regression. Covariates including age, sex, body mass, resting heart rate, distance walked, gait speed, stepping cadence, and recovery heart rate were entered into each model using a stepwise approach. Each full model created had the same base model consisting of age, sex, and body mass. Validity of each model was assessed using a Jackknife cross-validation analysis, and percent bias and root mean square error (RMSE) were calculated.
Results
Base models accounted for ~72% of the total variance of VO2max. Explained variance for the full models ranged from ~79% to 83%, and bias was minimal (<±1.0%) across models. RMSE for all models was approximately 4.5 ml.kg-1.min-1. Stepping tests performed better than walking tests, explaining ~2.5% more of the variance and displaying a smaller RMSE.
Conclusion
All eight models accounted for a large percentage of VO2max variance (~81%) with an RMSE of ~4.5 ml.kg-1.min-1. The variance and level of error of the models examined highlight good group mean prediction, with greater error expected at the individual level. All the models perform similarly across a broad age range, highlighting flexibility in application of these tests to a more general population.
Citation: Rowley TW, Cho C, Swartz AM, Cho Y, Strath SJ (2022) Validation of a series of walking and stepping tests to predict maximal oxygen consumption in adults aged 18–79 years. PLoS ONE 17(2): e0264110. https://doi.org/10.1371/journal.pone.0264110
Editor: Dalton Müller Pessôa Filho, Universidade Estadual Paulista Julio de Mesquita Filho - Campus de Bauru, BRAZIL
Received: October 16, 2020; Accepted: February 3, 2022; Published: February 25, 2022
Copyright: © 2022 Rowley et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files (titled "Data Set").
Funding: This study was funded by the National Institutes of Health R01HL091019 (SJS). Website: https://grants.nih.gov/grants/oer.htm The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Maximal oxygen consumption (VO2max) is a key indicator of health and cardiorespiratory fitness [1] and is considered a “clinical vital sign” and strong predictor of mortality [2]. The traditional, gold standard method to assess VO2max is open circuit spirometry in conjunction with a graded exercise test (GXT) to volitional fatigue. Open circuit spirometry, a method of indirect calorimetry, requires the use of a computerized metabolic measurement system to analyze expired gasses to determine oxygen utilization [1]. A standard GXT protocol, typically performed on a treadmill or cycle ergometer, incrementally increases exercise intensity until the participant achieves VO2max [3]. Despite valuable information obtained from VO2max testing, it is not always feasible in certain settings. The cost of the equipment required to complete such tests is high, and testing requires trained professionals, often making this form of testing inaccessible to the general public. Economic factors aside, VO2max testing is not always a safe option for certain populations [1], such as the elderly who are at a higher risk for falling or those with an increased risk of experiencing an adverse cardiac event during vigorous exercise.
Submaximal VO2 testing to predict VO2max is an alternative to traditional maximal testing without requiring the participant to work to a maximal intensity [1]. Two popular submaximal modalities are the treadmill and cycle ergometer [4–8]. Similar to maximal exercise testing, the cost associated with submaximal VO2 testing can be high and requires specialized equipment and trained personnel. Submaximal field testing, which involves simple equipment and measures (e.g. distance wheel, heart rate monitor), is another alternative to maximal exercise testing. Traditionally, these alternative, low cost options include over-ground walking/running [9–11] or stepping tests [7, 12, 13]. These tests can provide a safe testing alternative for high risk populations and can be easily administered in the field or clinical setting with little expense to estimate VO2max.
Ease of delivery and physical burden of a test are only two components to consider when selecting a field test to estimate VO2max. How well a field test prediction equation estimates VO2max, as determined through methodological validation research, and what population(s) the test is designed for are also important factors to consider. Explained variance and error of the estimate reported in the literature vary among submaximal field tests predicting VO2max, with the highest performing prediction equations reporting in the region of 80% of the shared variance and an error of approximately 4.5 ml.kg-1.min-1 [8, 10]. Unfortunately, a limitation within the current body of literature is a lack of consistency in validation and reporting efforts [8]. Additionally, many of the published field tests tend to target homogeneous groups of recreationally active young adults [6, 12] or adults with a narrow age range [10], with few studies developing and comparing field tests across a broad age range [13, 14]. Further, the modalities of these tests may be deemed inappropriate for certain populations, limiting their application to a broad, generalized population. Thus, there is a scientific need to examine the precision and accuracy of easily administered, low cost, submaximal field tests that transcend a wide age range. Accordingly, the purpose of this study was to determine the validity of various walking and stepping tests to predict VO2max among a broad age range of adults.
Materials and methods
Participants and study overview
This study had a cross-sectional design that spanned three days and two different settings. Day one of testing took place within a university laboratory on a large, midwestern campus. There, participants completed demographic, anthropometric, and VO2max assessments, using the equipment and techniques outlined under the measures section. Days two and three took place at a separate, on-campus gymnasium with a climate controlled environment and a 200-meter indoor track. These testing days consisted of different walking and stepping exercise tests. One hundred and sixty-two individuals were recruited based on the following inclusion criteria: a.) age between 18–79 years old; b.) ambulatory (i.e. free of any walking limitations, such as use of an assistive device or amputation); c.) able to walk on a treadmill; and d.) healthy as determined by a physical examination within the past three years. Individuals were excluded if they: a.) had a diagnosis of a cardiovascular, metabolic, or pulmonary condition; b.) were pregnant or nursing; or c.) had a history of severe arthritis or other orthopedic conditions. Participants were recruited via telephone, flyers, and word of mouth from a large, metropolitan area and surrounding communities. This study was approved by the University of Wisconsin-Milwaukee Institutional Review Board, #08.298.
Written informed consent from the participants was obtained prior to enrollment to the study.
Measures
Demographic and anthropometric assessment.
Participants completed a health history questionnaire that assessed current health status and family health history. Height was measured to the nearest quarter of an inch using a stadiometer (Detecto, Webb City, MO, USA) and weight was measured to the nearest quarter of a pound using a calibrated physician’s scale (Detecto, Webb City, MO, USA), with which body mass index (BMI) was calculated. Resting blood pressure and heart rate were assessed using auscultation and palpation, respectively, following standard procedures [15].
Maximal exercise test.
A modified Balke treadmill protocol [1] was used to measure VO2max. Participants were fitted with a 3-way, non-rebreathing mouthpiece, nose clip, and head support (Hans-Rudolph) that were connected by a tube to a metabolic cart (TrueOne 2400, ParvoMedics, Sandy, UT, USA) to assess expired gas. Measurement of oxygen consumption through expired gasses using this metabolic cart has been previously validated against the traditional Douglas bag method. Specifically, excellent accuracy and precision were reported for gas exchange variables, and VO2 was found to differ by [0.018] l/min [4]. Heart rate and electrical activity were monitored using a 12-lead EKG (Case System, GE Healthcare, USA). Volitional fatigue or the following criteria had to be met to be considered a maximal exercise test: a plateau <2.1 ml/kg/min between two stages, a respiratory exchange ratio of 1.1 or greater, and a heart rate within 10 bpm of age-predicted maximal heart rate (220-age) [16].
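As an illustration only, the three physiological criteria listed above can be written as a simple check. The function and parameter names are hypothetical, and the sketch assumes that "within 10 bpm" means a peak heart rate no more than 10 bpm below the 220 - age prediction. The text does not say how a peak heart rate above the prediction was treated, nor how volitional fatigue was weighed against these criteria, so neither is modeled here.

```python
def age_predicted_max_hr(age_years):
    """Age-predicted maximal heart rate (bpm) using the 220 - age formula."""
    return 220 - age_years


def meets_physiological_criteria(age_years, peak_hr_bpm, peak_rer, vo2_change):
    """Return True when all three criteria named in the text hold.

    vo2_change is the VO2 difference (ml/kg/min) between two consecutive
    stages; a change below 2.1 is treated as a plateau.
    """
    plateau = vo2_change < 2.1
    rer_ok = peak_rer >= 1.1
    # Assumption: exceeding the age-predicted value also satisfies this criterion.
    hr_ok = peak_hr_bpm >= age_predicted_max_hr(age_years) - 10
    return plateau and rer_ok and hr_ok


# Hypothetical participant: 40 years old, peak HR 175 bpm, RER 1.15, VO2 change 1.2.
print(meets_physiological_criteria(40, 175, 1.15, 1.2))
```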
Field tests.
During the field tests, participants were fitted with a heart rate monitor (Polar, Polar Electro Inc., Bethpage, NY, USA) to measure recovery heart rate. All tests were separated by a minimum of 5 minutes of seated recovery. Additional time was given to participants as they deemed necessary. Return of heart rate to baseline before each new test was used as a further marker of sufficient rest between tests. This was consistent for each field test.
Walking tests.
Participants completed a series of over-ground walking tests (Table 1). Total distances (m) for single stage tests and individual-stage distance for ramped-intensity, multi-stage tests were measured using a Pittsburgh brand 10,000 ft/m distance wheel. Walking speed (m.s-1) was calculated by dividing distance by time and was recorded for single stage tests and individual stages for ramped-intensity protocol tests. Walking speeds were selected for ease of administration.
Depending on the protocol (tests 3–5), participants were instructed to walk at a self-selected slower than normal, normal, and/or faster than normal walking speed. These walking speeds were self-determined. Additionally, the progressive nature of these walking tests emulates traditional graded exercise tests. Recovery heart rate was recorded at 30-second time points for two minutes after each test.
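The walking variables described above (individual-stage speed, total distance, and average speed) follow directly from measured distance and time. A minimal sketch is given below; the function names are hypothetical, and the stage distances and durations are made-up values, since the actual protocols are defined in Table 1.

```python
def stage_speeds(distances_m, durations_s):
    """Per-stage walking speed (m/s): distance divided by time for each stage."""
    if len(distances_m) != len(durations_s):
        raise ValueError("each stage needs one distance and one duration")
    return [d / t for d, t in zip(distances_m, durations_s)]


def summarize_walk(distances_m, durations_s):
    """Collect the distance and speed variables for one walking test."""
    total_distance = sum(distances_m)
    return {
        "stage_speeds_m_s": stage_speeds(distances_m, durations_s),
        "total_distance_m": total_distance,
        "average_speed_m_s": total_distance / sum(durations_s),
    }


# Hypothetical three-stage test with 120-second stages (values are illustrative only).
summary = summarize_walk([120.0, 150.0, 180.0], [120.0, 120.0, 120.0])
print(summary)
```

A single stage test is the same calculation with one-element lists.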
Step tests.
Test duration, stages per test, and stepping cadence were selected to mimic the progressive nature of traditional graded exercise tests (Table 2). Step height was selected to mimic traditional step height (e.g. on a flight of stairs) and two different heights were selected to further modify intensity levels. Stepping cadence was assigned based on age (Table 3) with the older age group(s) starting at a lighter intensity than the younger age group(s), to ensure that the test remained submaximal. Recovery heart rate was recorded at 30-second time points for two minutes after each test.
Statistical analysis
Statistical analysis was completed in SPSS Version 22. Hierarchical regression analysis (using stepwise selection) was used to build models to predict VO2max. The base model for each equation consisted of age (years), sex (male = 1, female = 0), and body mass (kg), and was entered as the first step of the model. Resting heart rate (bpm) and recovery heart rate (bpm) variables were entered into each model. Walking distance (m) and walking speed (m.s-1) were entered into walking test models, and step cadence (bpm) and step height (in) were entered into step test models. For ramped protocol walking tests, individual-stage distance, individual-stage speed, total distance, and average speed were included when building the equations. Variables that significantly predicted VO2max were kept in the model, while variables that did not significantly predict VO2max were excluded. Only main effects were considered due to sample size limitations. The models resulting from the hierarchical and selection process were tested for multicollinearity using the variance inflation factor (VIF). Variables identified with a high VIF (>1.0) were removed from the model. Explained variance (R2), adjusted R2 (R2adj), and root mean square error (RMSE) were generated for each model.
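The fit statistics named above can be sketched with their standard textbook definitions. This is not the SPSS implementation: the function names are hypothetical, and RMSE is assumed here to be the square root of the mean squared residual, whereas some packages divide by the residual degrees of freedom instead. The VIF of a predictor is computed from the R2 obtained when that predictor is regressed on the other predictors.

```python
import math


def r_squared(measured, predicted):
    """Explained variance: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R2 penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def rmse(measured, predicted):
    """Root mean square error (assumed definition: sqrt of mean squared residual)."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


def vif(r2_on_other_predictors):
    """Variance inflation factor from the R2 of one predictor on the others."""
    return 1.0 / (1.0 - r2_on_other_predictors)


# n = 157 comes from the text; 5 predictors is a hypothetical count for illustration.
print(round(adjusted_r_squared(0.835, 157, 5), 3))
```

By this definition a predictor that is uncorrelated with all others has a VIF of exactly 1, and the value grows as shared variance among predictors increases.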
Each regression equation was then cross-validated using the Jackknife (leave-one-subject-out) method [17] in SAS Version 9.4. Bias and RMSE were calculated for each test predicting VO2max. Bland-Altman plots [18] with 95% limits of agreement (LoA; SD of the differences × 1.96) were created, and a t-test was used to assess differences between measured and predicted VO2max values. Significance for all tests was set at p<0.05.
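The leave-one-subject-out idea can be sketched as follows: each participant is held out in turn, the model is refit on the remaining participants, and the held-out value is predicted. For brevity this sketch uses a single predictor with a closed-form least-squares line rather than the multiple regression models of the study, and it is not the SAS procedure. The function names and toy data are hypothetical, and percent bias is assumed to be the mean prediction error expressed as a percentage of the mean measured value, a definition the text does not spell out.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Predict each subject from a model fit on all other subjects."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(intercept + slope * xs[i])
    return predictions


def percent_bias(measured, predicted):
    """Assumed definition: mean (predicted - measured) as a % of mean measured."""
    n = len(measured)
    mean_error = sum(p - y for y, p in zip(measured, predicted)) / n
    return 100.0 * mean_error / (sum(measured) / n)


def rmse(measured, predicted):
    """Root mean square error of the held-out predictions."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


# Made-up data: recovery heart rate (bpm) and measured VO2max (ml/kg/min).
recovery_hr = [90.0, 100.0, 110.0, 120.0, 130.0]
vo2max = [45.0, 40.0, 36.0, 30.0, 26.0]
held_out = leave_one_out_predictions(recovery_hr, vo2max)
print(percent_bias(vo2max, held_out), rmse(vo2max, held_out))
```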
Results
Five of the 162 participants recruited did not qualify for the study. Of the final 157 participants, two-thirds of the sample was female (66%) and the average age was 48.9 ± 17.4 years (mean ± SD). Average measured VO2max was 34.3 ± 10.1 ml.kg-1.min-1 and average BMI was 25.7 ± 4.3 kg.m-2. Participant characteristics broken down by sex are presented in Table 4.
Base model
The base model for each regression equation included age (years), sex (male), and body mass (kg). While the specific values for the base model varied among tests, this model alone accounted for ~72% of the explained variance in VO2max and the RMSE was approximately 5.45 ml.kg-1.min-1. Age and body mass had a negative relationship with VO2max meaning that as age or body mass increased, VO2max decreased. Male sex, alternatively, was associated with a higher VO2max. This relationship was true across all base models, which are reported in Tables 5 and 6 for the walking and stepping equations, respectively.
Full models.
Models were constructed on a test-by-test basis. Estimation of VO2max was strong across all prediction equations. The explained variance for the field test equation models varied from 79.7% to 83.5%, with Test 1 (the five-minute walking test) being the weakest predictor of VO2max. Test 8 (the three stage, nine-minute step test using an 8-inch step) was the strongest predictor of VO2max. Likewise, RMSE for these tests ranged from 4.138 ml.kg-1.min-1 to 4.656 ml.kg-1.min-1 for Test 8 and Test 1, respectively. By adding variables to the base models, the full models were able to account for approximately 10% more explained variance in VO2max.
Walking regression equations.
Walking regression results are presented in Table 5. Gait speed and recovery heart rate were common predictors among the walking equations. Gait speed, when significant, had a positive relationship with VO2max, where a faster self-selected gait speed was associated with a higher VO2max. For the tests with multiple stages (Tests 3–5), slower than usual gait speed was never a significant predictor. Heart rate variables varied among the tests and included 30- or 60-second recovery heart rate. All heart rate variables had a negative relationship with VO2max.
Stepping regression equations.
Stepping regression results are presented in Table 6. Thirty-second recovery heart rate was a significant predictor for each step test. Like the walking tests, heart rate variables were negatively related to VO2max. Test 8 performed better than any of the other tests (walking or stepping) for predicting VO2max (R2 = 0.835, R2adj = 0.830, and RMSE = 4.138 ml.kg-1.min-1).
Jackknife validation results
Results of the jackknife validation revealed that bias was relatively small for each test, with each model reporting a bias well within ± 1%. Root mean square error ranged from 4.102 ml.kg-1.min-1 to 4.662 ml.kg-1.min-1, for Test 8 and Test 1, respectively. Jackknife results are presented in Table 7.
Of the walking tests, the model for Test 2 still accounted for the greatest explained variance in VO2max, with a Jackknife-adjusted R2 of 0.824 (bias of -0.0000421%) and an RMSE of 4.287 ml.kg-1.min-1 (bias of 0.0000406%). Of the stepping tests, the model for Test 8 accounted for the greatest explained variance in VO2max, with a Jackknife-adjusted R2 of 0.834 (bias of -0.0000411%) and an RMSE of 4.102 ml.kg-1.min-1 (bias of 0.000104%). Bland-Altman plots were created for Test 2 (Fig 1) and for Test 8 (Fig 2). The plots show mean error to be close to zero, with LoA of +8.599 to -8.599 ml/kg/min (t-test, -0.000445) for Test 2 and +8.250 to -8.250 ml/kg/min (t-test, -0.001) for Test 8. Both Figs 1 and 2 show no systematic bias of the prediction across the sample.
Fig 1. Bland-Altman plot for Test 2. The figure shows mean error to be close to zero (-0.0004), and the limits of agreement are +/- 8.599 ml/kg/min. This indicates that there is minimal bias between the measured and predicted VO2max values.
Fig 2. Bland-Altman plot for Test 8. The figure shows mean error to be close to zero (-0.001), and the limits of agreement are +/- 8.250 ml/kg/min. This indicates that there is minimal bias between the measured and predicted VO2max values.
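The quantities summarized in a Bland-Altman analysis can be sketched as the mean of the measured-minus-predicted differences and that mean plus or minus 1.96 times the SD of the differences. The function name and the example values below are hypothetical, and the sample SD (n - 1 denominator) is an assumption, since the text does not state which SD was used.

```python
import statistics


def bland_altman_limits(measured, predicted):
    """Return (mean difference, lower LoA, upper LoA) for paired values."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    mean_diff = statistics.mean(diffs)
    # Assumption: sample standard deviation of the differences.
    half_width = 1.96 * statistics.stdev(diffs)
    return mean_diff, mean_diff - half_width, mean_diff + half_width


# Made-up measured and predicted VO2max values (ml/kg/min).
measured = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]
print(bland_altman_limits(measured, predicted))
```

Read in reverse, the reported limits of ±8.599 and ±8.250 ml/kg/min correspond to SDs of the differences of roughly 4.39 and 4.21 ml/kg/min for Tests 2 and 8, respectively.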
Discussion
The purpose of this study was to determine the validity of several easily administered walking and stepping field-tests to predict VO2max across a broad age range. We found that among all eight tests examined, the 9-minute stepping test with three stages, using an 8-inch step yielded the highest bias-adjusted R2 (0.834) and lowest RMSE (4.102 ml.kg-1.min-1) while maintaining minimal bias, well within ±1%. Overall, the stepping tests outperformed the walking tests for predicting VO2max by having the highest bias-adjusted R2 values and lowest RMSE. However, of the walking tests, a single stage, two-minute test to walk as far as possible yielded the highest bias-adjusted R2 (0.824) and lowest RMSE (4.287 ml.kg-1.min-1), also maintaining a minimal bias within ±1%.
Three widely used field tests are the Queen’s College Step Test [12], Cooper 12-minute run [9], and the one-mile walk test [10]. The Queen’s College Step Test is a 3-minute, single stage step test that requires participants to maintain a cadence of 22 steps/min as they step up and down from a 16.25-inch step and then manually measure and record recovery heart rate [12]. Although the single stage makes the test shorter, a step height close to a foot and a half makes it rigorous, and concerns related to balance and fall risk need to be considered. In contrast, the step tests presented in the current study use steps that are 6 and 8 inches tall, which is comparable to a standard step height.
Stepping tests can be difficult to administer at times, as they require the participant to maintain a certain cadence while stepping up and down. A benefit of walking and running tests is that the participant can self-regulate. For example, the Cooper 12-minute run test instructs participants to cover as much ground as possible within the time frame, and the one-mile walk test instructs them to walk as quickly as possible to complete the mile [9, 10]. The simplest of the walking tests in the current study was a two-minute test that asked participants to cover as much ground as possible while still maintaining a walk. These simple instructions paired with a short duration make this test very easy to administer and highly achievable for most individuals. Further, as the participants are walking, it is possible to measure the distance as they go, unlike the Cooper 12-minute run where distance can be difficult to gauge depending on the location of the test.
The field tests in the current study performed well when predicting VO2max, accounting for approximately 80% of the explained variance and yielding RMSE of approximately 4.5 ml.kg-1.min-1. The Queen’s College Step Test reports a low R2 value of 0.563 [12], which accounts for ~30% less of the explained variance of VO2max than our highest performing step test. The Cooper 12-minute run and the one-mile walk test report explained variances for VO2max of around 77% and 81%, respectively [9, 10]. The explained variance for both the one-mile walk and Cooper 12-minute test is similar to, albeit lower than, the explained variance we report for our walking tests in the current study. McArdle et al. report a standard error; however, the units are in ml.min-1, making it difficult to compare error rates among tests [12]. Cooper did not report an error for the 12-minute run estimation equation [9], but the one-mile walk test reported an associated error of 5.0 ml.kg-1.min-1 [10], which is marginally higher than what we report in the current study. Error associated with an equation can impact the interpretation of a score. Too large an error of the estimate can make it difficult to detect true change in a variable (i.e. VO2max), and thus smaller error is preferred.
Cross-validation analysis showed that our tests yielded minimal bias, meaning that the estimated VO2max values were very similar to the measured VO2max values. Unfortunately, there is inconsistency within the literature regarding validation reporting efforts, including the three previously published field tests listed above [9, 10, 12]. Kline and colleagues did, however, perform a cross-validation analysis in a separate sample and reported a final, adjusted variance of ~77% (R2 = 0.774) and standard error of 4.4 ml.kg-1.min-1 [10]. Although the error is similar to the ones we report here, the explained variance is lower than we found in the current study.
Some considerations are warranted when utilizing any of the field tests we report on. First, when considering feasibility and safety, the 9-minute stepping test using an 8-inch step might not be appropriate for elderly or frail populations. As there was minimal difference in equation performance between the 9-minute stepping test using a 6-inch step and the 6-minute stepping test using a 6-inch step (~1% in variance and ~0.1 ml.kg-1.min-1 in error), the shorter duration test with the shorter step could be a safer, more practical option. Still, any form of stepping test could carry a risk of falls. The two-minute over-ground walking test could be the best option for a quick estimation of VO2max as it requires minimal equipment and is shorter in duration. Additionally, the instructions are simple (“cover as much ground as possible in two minutes”), whereas the stepping tests require a ramped cadence protocol which could cause confusion. The two-minute walking test accounts for a similar amount of variance in VO2max as the stepping tests (~82%) and contains a similar level of error (~4.2 ml.kg-1.min-1).
This study is not without limitations. First, the sample size was relatively small, which limited the analysis to main effects only. Future studies should aim for a larger sample to allow for the investigation of interactions to potentially strengthen the model(s) to better predict VO2max. Second, while these models are statistically sound, the application of these measures warrants further investigation. In a clinical setting or as a baseline estimate, any of these tests should be acceptable for estimating VO2max. The testing environment should also be considered when administering these tests, as they were developed in a climate-controlled environment. Factors such as temperature, humidity, and wind could impact test results, thus altering the reliability of the estimation. Further, these models were developed in healthy adults, thus these results are limited to that population. Finally, despite assessing how well our models performed compared to the traditional gold standard of open circuit spirometry for assessing VO2max, we did not compare our models to previously validated field tests, which may have been a beneficial comparison to make.
In conclusion, this study generated VO2max estimation equations from eight different stepping and over-ground walking field tests. A jackknife cross-validation assessment followed the creation of each equation to provide information on bias of each equation. By incorporating this bias, which was small, each equation accounted for ~80% of the explained variance for predicting VO2max with an error of ~4.5 ml.kg-1.min-1. These results highlight that reported tests perform well to estimate group mean VO2max values, but larger error would be expected for a given individual as the Bland-Altman plots display errors of ±8–9 ml.kg-1.min-1. Compared to previously published field tests, the tests presented here are appropriate for a broad age range and are simple to administer, requiring minimal equipment.
References
- 1. American College of Sports Medicine. ACSM’s guidelines for exercise testing and prescription. 10th ed. Wolters Kluwer. 2018. 81p.
- 2. Ross R, Blair SN, Arena R, Church TS, Després JP, Franklin BA, et al. Importance of assessing cardiorespiratory fitness in clinical practice: a case for fitness as a clinical vital sign: a scientific statement from the American Heart Association. Circulation. 2016 Dec 13;134(24):e653–99. pmid:27881567
- 3. Astrand P, Ryhming I. A nomogram for calculation of aerobic capacity from pulse rate during submaximal work. Journal of Applied Physiology. 1954; 7:218–221. pmid:13211501
- 4. Bassett DR Jr, Howley ET, Thompson DL, King GA, Strath SJ, McLaughlin JE, et al. Validity of inspiratory and expiratory methods of measuring gas exchange with a computerized system. Journal of Applied Physiology. 2001 Jul 1;91(1):218–24. pmid:11408433
- 5. Maritz JS, Morrison JF, Peter J, Strydom NB, Wyndham CH. A practical method of estimating an individual’s maximal oxygen intake. Ergonomics. 1961 Apr 1;4(2):97–122.
- 6. Coleman AE. Validation of a submaximal test of maximal oxygen intake. The Journal of Sports Medicine and Physical Fitness. 1976; 16(2): 106–111. pmid:966743
- 7. YMCA of the USA, Golding LA. YMCA Fitness Testing and Assessment Manual. 4th ed. Champaign (IL): Human Kinetics; 2000. 247 pg.
- 8. Akalan C, Robergs R, Kravitz L. Prediction of VO2max from an individualized submaximal cycle ergometer protocol. Journal of Exercise Physiology Online. 2008;11(2):1–7.
- 9. Cooper KH. A means of assessing maximal oxygen intake: correlation between field and treadmill testing. Jama. 1968 Jan 15;203(3):201–4. pmid:5694044
- 10. Kline CJ, Porcari JP, Hintermeister R, Freedson PS, Ward A, McCarron RF, et al. Estimation of VO2max from a one-mile track walk, gender, age and body weight. Med Sci Sports Exerc. 1987; 19:253–259. pmid:3600239
- 11. Ribisl PM, Kachadorian WA. Maximal oxygen intake prediction in young and middle-aged males. The journal of sports medicine and physical fitness. 1969 Mar;9(1):17. pmid:5789284
- 12. McArdle WI, Katch F, Pechar G, Jacobson LO, Ruck S. Reliability and interrelationships between maximal oxygen intake, physical work capacity and step-test scores in college women. Medicine and science in sports. 1972 Dec;4(4):182–186. pmid:4648576
- 13. Jetté M, Campbell J, Mongeon J, Routhier R. The Canadian Home Fitness Test as a predictor of aerobic capacity. Canadian Medical Association Journal. 1976 Apr 17;114(8):680. pmid:1260614
- 14. Billinger SA, Van Swearingen E, McClain M, Lentz AA, Good MB. Recumbent stepper submaximal exercise test to predict peak oxygen uptake. Medicine and science in sports and exercise. 2012 Aug;44(8):1539. pmid:22382170
- 15. Swain DP, Brawner CA, American College of Sports Medicine. ACSM’s resource manual for guidelines for exercise testing and prescription. Wolters Kluwer Health/Lippincott Williams & Wilkins; 2014.
- 16. Howley ET, Bassett DR, Welch HG. Criteria for maximal oxygen uptake: review and commentary. Medicine and science in sports and exercise. 1995 Sep 1;27:1292–1301. pmid:8531628
- 17. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Springer Series in Statistics, New York. 2001.
- 18. Bland JM, & Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; 1:307–310. pmid:2868172