Predicting VO2peak from Submaximal- and Peak Exercise Models: The HUNT 3 Fitness Study, Norway

Purpose Peak oxygen uptake (VO2peak) is seldom assessed in health care settings, although it is inversely linked to cardiovascular risk and all-cause mortality. The aim of this study was to develop VO2peak prediction models for men and women based on directly measured VO2peak from a large healthy population.
Methods VO2peak prediction models based on submaximal and peak performance treadmill work were derived from multiple regression analysis. A total of 4637 healthy men and women aged 20–90 years were included. Data splitting was used to generate validation and cross-validation samples.
Results The accuracy of the peak performance models was 10.5% (SEE = 4.63 mL·kg⁻¹·min⁻¹) and 11.5% (SEE = 4.11 mL·kg⁻¹·min⁻¹) for men and women, respectively, with 75% and 72% of the variance explained. For the submaximal performance models, accuracy was 14.1% (SEE = 6.24 mL·kg⁻¹·min⁻¹) and 14.4% (SEE = 5.17 mL·kg⁻¹·min⁻¹) for men and women, respectively, with 55% and 56% of the variance explained. The validation and cross-validation samples displayed SEE and variance explained in agreement with the total sample. Cross-classification between measured and predicted VO2peak accurately classified 91% of the participants within the correct or nearest quintile of measured VO2peak.
Conclusion Judicious use of the exercise prediction models presented in this study offers valuable information by providing a fairly accurate assessment of VO2peak, which may be beneficial for risk stratification in health care settings.


Introduction
Peak oxygen uptake (VO2peak) is widely referred to as cardiorespiratory fitness (CRF) [1], and is inversely linked to cardiovascular disease, hypertension, certain cancers, metabolic syndrome [2,3], and all-cause mortality [4]. At present there is no consensus on a precise threshold of cardiorespiratory fitness associated with increased cardiovascular risk. However, values below 8 METs and 6 METs in healthy men and women, respectively, are linked with higher all-cause mortality and adverse cardiovascular effects [4]. Additionally, data suggest that MET levels > 9 and > 7 (vs. lower MET levels) among men and women, respectively, are associated with a mortality risk reduction of ≥ 50% over an average follow-up of 8 years [5]. Despite being an essential health indicator, VO2peak is rarely assessed in health care settings [5,6], likely because direct gas analysis measurements of VO2peak are expensive and necessitate advanced equipment and trained personnel [2]. However, reliable and valid prediction models should be considered, as several studies have shown that either directly measured or estimated VO2peak enhances CVD-mortality prediction beyond traditional risk factors [7,8].
Although a maximal test is considered a safe practice, complications and adverse effects do occur, normally linked to underlying disease [9]. Consequently, health care personnel should monitor the test when testing individuals at high risk.
Therefore, the aim of the present study was to develop VO2peak prediction models from both submaximal and peak treadmill performance, on the basis of data from a large healthy population of both men and women aged 20-90 years, with a great diversity in measured VO2peak. If these models show fair predictive accuracy, they will provide a safe and feasible method for estimating VO2peak for a wide variety of people.

Study sample
In 2006-2008, the total population above 20 years of age in Nord-Trøndelag county, Norway, was invited to the third wave of the HUNT study (HUNT 3). Out of a total population of 94194, 54% accepted the invitation (n = 50821). A sub-study (the HUNT Fitness Study) invited healthy subjects (without cardiovascular disease, cancer, pulmonary disease or use of blood pressure medication) in three pre-selected municipalities within the county to perform treadmill testing with direct measurement of maximal oxygen uptake (VO2max). Out of 12609 potentially eligible participants, 5633 appeared, and 1003 failed to complete the cardiopulmonary exercise test (CPET), withdrew or were excluded for medical reasons detected during the medical interview. 4637 participants completed the exercise testing.

Ethics statement
The study was approved by REK-Regional Committees for Medical and Health Research Ethics (2013/1788/REK nord), the Norwegian Data Inspectorate and the National Directorate of Health. The study was conducted in conformity with the Declaration of Helsinki and all participants signed a document of informed consent.

Exercise test procedures
A 10-minute warm-up was implemented with the workload individualized to induce some sweating and a moderately elevated heart rate and breathing, but without exhaustion. Following the warm-up, subjects moved to the treadmill used for testing (DK7830; DK City, Taichung, Taiwan) and were equipped with a heart rate monitor (Polar S610 or RS400; Polar, Kempele, Finland) and face mask (Hans Rudolph; Shawnee, KS). Subjects were instructed to avoid grasping the handrail. Cardiorespiratory variables were measured continuously using ergospirometry (MetaMax II; Cortex Biophysik GmbH, Leipzig, Germany) connected to computer software (Cortex Meta-Soft, version 1.11.5). A graded individualized treadmill protocol, starting with the warm-up workload, was used with subjects walking or running at gradually increased speed and/or inclination. Treadmill speed was increased (0.5-1.0 km·h⁻¹) when VO2 measurements remained stable for > 30 s, keeping a fixed inclination if possible. The test was terminated when the subject reached volitional exhaustion (e.g. leg fatigue and shortness of breath), preferably within 8-12 minutes (Table 1). VO2max was taken as the mean of the three successive highest 10-s VO2 values and defined by a leveling off of VO2 (<2 mL·kg⁻¹·min⁻¹ change over the span of these successive measurements) despite increasing speed and/or inclination, in combination with a respiratory exchange ratio (R) above 1.05 and subjective volitional exhaustion (e.g. leg fatigue and shortness of breath). Since a total of 17.6% of the subjects failed to reach all the criteria, the term VO2peak was used. During the incremental test most subjects had their steady-state VO2 measured at one (n = 2827) or two (n = 2576) submaximal levels. At the first submaximal level (VO2 below the ventilatory anaerobic threshold, established by V-slope), steady-state VO2 was obtained from each subject after 3 minutes. Measurements at this level were used to develop the submaximal models.
At each level, as well as at peak performance, treadmill velocity and inclination, in addition to heart rate, were also registered. Velocities in the range 5.9-8.0 km·h⁻¹ typically represent the transition from walking to running, with individual variation attributed to differences in e.g. stride length, leg length and body size [20-22]. Test velocities used in development of the VO2peak prediction models suggest that most participants (92%) walked during the first submaximal measurement, whereas approximately 80% were running during peak measurements. An average of 87% of all participants used 10% treadmill inclination during both the first submaximal and the peak measurements. For development of the submaximal performance models, peak heart rates (HRpeak) were predicted from age in two gender-specific linear regression models based on the HUNT 3 fitness data (men: HRpeak = 215.336 − 0.73 × age,
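The VO2max criteria described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure implemented in the Cortex software: the function name and arguments are hypothetical, "the three successive highest 10-s values" is interpreted as the window of three consecutive 10-s values with the highest mean, and volitional exhaustion is passed in as a simple flag.

```python
def peak_from_10s_samples(vo2_10s, rer_at_peak, exhausted):
    """Return (vo2_peak, met_all_criteria) from consecutive 10-s VO2 values.

    vo2_10s     : list of consecutive 10-s VO2 values in mL/kg/min (hypothetical input)
    rer_at_peak : respiratory exchange ratio at peak effort
    exhausted   : True if the subject reported volitional exhaustion
    """
    if len(vo2_10s) < 3:
        raise ValueError("need at least three 10-s values")

    # Assumption: take the window of three successive values with the highest mean.
    windows = [vo2_10s[i:i + 3] for i in range(len(vo2_10s) - 2)]
    best = max(windows, key=lambda w: sum(w) / 3)
    vo2_peak = sum(best) / 3

    # Leveling off: less than 2 mL/kg/min change across the three values.
    plateau = (max(best) - min(best)) < 2.0

    # All three criteria must hold for the value to count as VO2max.
    met_all_criteria = plateau and rer_at_peak > 1.05 and exhausted
    return vo2_peak, met_all_criteria
```

When the second return value is False, the measurement would be reported as VO2peak rather than VO2max, which is the situation the text describes for 17.6% of the subjects.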

Statistical analysis
Descriptive statistics are given as mean and standard deviation for men and women, respectively. Potential variables were chosen on the basis of correlation with measured VO2peak in previous literature, and entered successively in a hierarchical linear regression model. All the retained variables (treadmill inclination and velocity, weight, age and fraction of HRpeak) contributed considerably to total model fit. The models were checked for normality and homoscedasticity of residuals, and these assumptions were satisfied. All models presented in this paper were derived from the total sample. Internal cross-validation was checked by data-splitting procedures, i.e. SPSS randomly selected approximately 50% of all cases, here denoted the validation sample, with the remaining cases denoted the cross-validation sample. In these subsets, linear regression analyses were performed on the validation sample and applied to predict VO2peak in both the validation and the cross-validation samples. Model fit was evaluated by squared multiple regression coefficients (R²) and standard errors of the estimate (SEE). R² and adjusted R² increased similarly for each new independent variable added to the models, and were either identical or differed only in the third decimal place. As a result, R² is reported throughout this paper. To be able to compare the model precision to models derived from external samples, we also calculated the % SEE, i.e. the SEE expressed as a percentage of the measured mean VO2peak. In the total sample, as well as in subgroups of age, VO2peak and treadmill velocity, we calculated constant error (CE) and total error (TE) for the model. CE represents the mean difference between measured and predicted values, CE = ∑(measured − predicted)/n, while TE represents the root mean squared difference, TE = √(∑(measured − predicted)²/n).
Pearson correlation and variance explained between measured and predicted VO2peak were used to examine potential shrinkage between validation and cross-validation samples. Further internal validation was done by cross-classifying subjects into quintiles of measured and predicted VO2peak. Measures of rank correlation and agreement were tested by use of Kendall's tau and Cohen's kappa statistics. A two-sided paired-samples t-test was used to establish differences between measured and predicted VO2peak. Statistical analyses were performed with SPSS 20.0 (Statistical Package for the Social Sciences, Chicago, IL, USA).
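The error measures defined above can be illustrated with a short Python sketch. The analyses in this study were run in SPSS; the function below is a hypothetical stand-in, and it assumes the conventional definition of SEE for a regression with k predictors, √(∑(measured − predicted)²/(n − k − 1)), which the text does not spell out. The input values in the usage note are invented for illustration.

```python
import math

def error_metrics(measured, predicted, n_predictors):
    """Compute CE, TE, SEE and %SEE for paired measured/predicted values."""
    n = len(measured)
    diffs = [m - p for m, p in zip(measured, predicted)]
    sse = sum(d * d for d in diffs)  # sum of squared errors

    ce = sum(diffs) / n                            # constant error (mean difference)
    te = math.sqrt(sse / n)                        # total error (root mean squared difference)
    see = math.sqrt(sse / (n - n_predictors - 1))  # assumed SEE definition (see lead-in)
    pct_see = 100.0 * see / (sum(measured) / n)    # SEE as % of measured mean
    return {"CE": ce, "TE": te, "SEE": see, "%SEE": pct_see}

# Invented example values in mL/kg/min, not study data.
example = error_metrics([30, 40, 50, 45, 35, 40], [32, 38, 49, 46, 36, 39], n_predictors=1)
```

Read the other way around, the reported %SEE lets a reader recover the approximate sample mean: for the men's peak model, 4.63 / 0.105 gives roughly 44 mL·kg⁻¹·min⁻¹.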
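A minimal sketch of the quintile cross-classification is given below, assuming rank-based quintiles with ties broken by input order; the function names are hypothetical and the actual analysis was done in SPSS. It returns the proportion classified into the same quintile, the proportion within the same or an adjacent quintile, and an unweighted Cohen's kappa. Kendall's tau is left out for brevity.

```python
def quintile_labels(values):
    """Assign each value a quintile label 1..5 by rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 5 // n + 1
    return labels

def cross_classify(measured, predicted):
    """Return (exact agreement, agreement within one quintile, Cohen's kappa)."""
    qm = quintile_labels(measured)
    qp = quintile_labels(predicted)
    n = len(qm)

    exact = sum(a == b for a, b in zip(qm, qp)) / n
    within_one = sum(abs(a - b) <= 1 for a, b in zip(qm, qp)) / n

    # Kappa corrects the observed agreement for agreement expected by chance.
    p_chance = sum((qm.count(k) / n) * (qp.count(k) / n) for k in range(1, 6))
    kappa = (exact - p_chance) / (1 - p_chance)
    return exact, within_one, kappa
```

Because quintiles of equal size give a chance agreement of about 0.20, a kappa well below the raw agreement is expected even when most subjects land in the correct or a neighbouring quintile.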

Results
Descriptive characteristics are presented in Table 2. Descriptive data in the validation and cross-validation samples were equally distributed (Table 3). Additional descriptive data of the HUNT 3 fitness population are displayed in a previous study [23].

Predicting VO2peak from peak treadmill performance
Peak treadmill inclination and velocity accounted for most of the variance explained by the VO2peak prediction model (men: R² = 0.72, p<0.001; women: R² = 0.68, p<0.001), with velocity being the paramount factor. Weight and age had a modest influence, and the total explained variance for the peak performance prediction model was R² = 0.75 (p<0.001) in men and R² = 0.72 (p<0.001) in women. Including resting heart rate and peak heart rate in the model did not change R² and SEE considerably, and these variables were thus excluded from the models. A strong correlation was demonstrated between predicted and measured VO2peak (men: r = 0.87; women: r = 0.85) (Fig 1). Two gender-specific VO2peak prediction equations were derived from multiple linear regression using the total sample (Tables 4, 5 and 6):
Men: VO2peak = 24.24 + (0.599 × treadmill inclination in %) + (3.197 × treadmill velocity in km·h⁻¹) − (0.122 × body weight in kg) − (0.126 × age in years)
Women: VO2peak = 17.21 + (0.582 × treadmill inclination in %) + (3.317 × treadmill velocity in km·h⁻¹) − (0.116 × body weight in kg) − (0.099 × age in years)
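The two peak performance equations can be applied directly, as in the following Python sketch. The coefficients are those reported above; the function name, argument names and example subjects are hypothetical and chosen only for illustration.

```python
# Coefficients from the peak performance equations:
# (intercept, inclination in %, velocity in km/h, body weight in kg, age in years)
PEAK_MODEL_COEFFICIENTS = {
    "male": (24.24, 0.599, 3.197, 0.122, 0.126),
    "female": (17.21, 0.582, 3.317, 0.116, 0.099),
}

def predict_vo2peak_peak_model(sex, inclination_pct, velocity_kmh, weight_kg, age_yr):
    """Predicted VO2peak in mL/kg/min from peak treadmill performance."""
    b0, b_inc, b_vel, b_wt, b_age = PEAK_MODEL_COEFFICIENTS[sex]
    return (b0
            + b_inc * inclination_pct
            + b_vel * velocity_kmh
            - b_wt * weight_kg
            - b_age * age_yr)

# Hypothetical subjects: a 40-year-old, 80 kg man reaching 8.0 km/h at 10% inclination,
# and a 40-year-old, 65 kg woman reaching 7.0 km/h at 10% inclination.
man = predict_vo2peak_peak_model("male", 10, 8.0, 80, 40)      # about 41.0
woman = predict_vo2peak_peak_model("female", 10, 7.0, 65, 40)  # about 34.7
```

Given the reported SEE of 4.63 and 4.11 mL·kg⁻¹·min⁻¹ for men and women, such point estimates are best read as the centre of a fairly wide range rather than as exact values.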

Cross-validation of the peak performance prediction model
The coefficient of determination (R²) remained stable between the total sample (0.75 and 0.72) and the validation sample (0.76 and 0.72) in men and women, respectively (Tables 5 and 7), suggesting an internally robust prediction model. In addition, differences between measured and predicted VO2peak were non-significant, and CE values were close to zero.

[Table fragment] RHR (beats·min⁻¹): 59 ± 10; 58 ± 9; 61 ± 10. Data are presented as arithmetic mean ± SD. VO2peak: peak oxygen uptake; HRpeak: peak heart rate; RHR: resting heart rate.

Cross-classification of participants in the peak performance prediction model
The models categorized participants fairly accurately into the correct measured VO2peak group when cross-classifying participants into quintiles of measured and predicted VO2peak (Table 11). In total, 75.3% and 77.6% of the men and women predicted to be in the lowest quintile were classified correctly into the lowest measured quintile, respectively, while 95.4% and 96.7% were classified within the correct or closest measured quintile. Of the men and women predicted to be in the highest quintile, 77.4% and 78.0% were correctly classified into the highest measured quintile, respectively, with 95.8% and 95.9% being classified into one of the two highest quintiles (Table 11). The rank correlation between measured and predicted quintiles was 0.74 and 0.70 in men and women, respectively, while the measure of agreement by kappa statistic was 0.45 in men and 0.41 in women.

Predicting VO2peak from submaximal treadmill performance

Cross-validation of the submaximal performance prediction model
R² was stable between the total sample (0.55 and 0.56) and the validation sample (0.54 and 0.56) in men and women, respectively (Tables 13 and 14), indicating an internally robust prediction model. Furthermore, differences between measured and predicted VO2peak were non-significant, and CE was close to zero in the total sample (−0.14 and −0.12), validation sample (−0.27 and −0.03) and cross-validation sample (−0.09 and −0.11) in men and women, respectively (Tables 15-17).

Cross-classification of participants in the submaximal performance prediction model
Cross-classification of predicted (from submaximal performance) and measured VO2peak achieved a fairly accurate placing of subjects into the correct VO2peak quintile (Table 18). In total, 62.0% and 50.3% of the men and women were predicted appropriately into the lowest measured quintile, respectively, with an increase to 91.3% and 80.9% within the closest measured quintiles. A total of 59.0% and 72.5% of the men and women in the highest predicted quintile were correctly categorized into the highest measured quintile, respectively, increasing to 84.9% and 89.0% within one of the two highest quintiles (Table 18). The rank correlation between measured and predicted quintiles was 0.61 and 0.60 in men and women, respectively, while the measure of agreement by kappa statistic was 0.30 in men and 0.28 in women.

Discussion
The exercise-based prediction models generated in this study accurately placed approximately 91% of the low- and high-fit participants within the correct or nearest quintile of measured VO2peak, and predicted VO2peak with fair precision using both the peak performance and submaximal models.

Accuracy of the VO2peak prediction models
The peak performance models displayed accuracy (SEE) of 10.5% (R² = 0.75) and 11.5% (R² = 0.72) in men and women, respectively. This is better than the accuracy of 13.3-16.6% reported in some previous research [14,24], but equal to or less accurate than that reported by others (4.5-11.4%) [10-13,15,18,25]. Better prediction accuracy in other models may partly be attributed to the homogeneous fitness level of their sample subjects [10,12,13,15,18] and/or a narrow age range [10-12]. Validating other models using HUNT 3 data is difficult given the use of different independent variables, e.g. watts on a cycle ergometer [13,18,25] or the 20-m shuttle run [10-12]. Although the ACSM running model [20], like ours, uses speed and gradient, it was developed from steady-state submaximal aerobic exercise and can be used exclusively for predicting VO2 during steady-state submaximal work. Hence, it will overestimate VO2 for peak exercise, since the contribution from anaerobic metabolism is significant [20], which was confirmed in a previous validation study [24]. However, we were able to validate a model by Uth and colleagues [15] using the heart rate ratio (HRpeak/resting heart rate) as predictor variable for VO2max. The Uth model, derived from 46 well-trained men, presented a SEE of 4.5%. Accuracy was considerably poorer when the model was validated using HUNT 3 data (18% and 19% SEE in men and women, respectively), which is supported by Esco and colleagues [14], who also observed a substantial reduction in accuracy (SEE of 16.6%) using 109 healthy men to validate the Uth model. This underscores the importance of similar gender, age and physical fitness between the subjects using the model and the subjects used in developing the model to assure the best possible accuracy [2,19].
Submaximal VO2peak prediction models are generally outperformed on accuracy by models derived from peak workload [26], which is also the case in this study, with accuracies (SEE) of 14.1% (R² = 0.55) and 14.4% (R² = 0.56) in men and women, respectively. Moreover, non-exercise based prediction models derived from HUNT 3 fitness data [27] yielded a somewhat better accuracy (12.8% and 14.3% in men and women, respectively) than the present submaximal models, while the present peak models had better accuracy. Previous research reported prediction error in the range 7.3-20.9% [2,16,28-36]. The benchmark Åstrand-Ryhming nomogram [37] reported accuracy of approximately 10%, which was confirmed when validated by Cink & Thomas [38]. Both Åstrand and Cink observed minor differences between measured and predicted VO2peak. However, both used small groups of physically fit college students for their calculations. Validating the Åstrand-Ryhming nomogram using untrained sedentary subjects [39] showed a 26.5% systematic underestimation of VO2max. Several peak [13,18,25] and submaximal models [16,29-31,34,35] used a cycle ergometer to measure VO2peak/max; however, compelling evidence points to a 6-15% lower VO2peak on a cycle ergometer compared to that obtained when running [40-44].

Cross-validation of VO2peak prediction models
Randomly splitting data into validation and cross-validation samples established good stability throughout all models, suggesting minor shrinkage in accuracy if used on other similar populations. Moreover, data splitting will minimize potential overfitting that might deteriorate the external validity of the models [45]. For the peak performance models, error estimates are fairly stable across subgroups of both age and treadmill velocity. Conversely, in the VO2peak subgroups we observed a trend of systematic under- and overestimation of the predicted values in the high- and low-fit participants, respectively. This is consistent with previous findings [12,46,47].
Similarly, for the submaximal models, error estimates are reasonably stable across the treadmill velocity subgroups, whereas across the age subgroups there is a tendency towards under- and overestimating VO2peak in the youngest and oldest, respectively. For the VO2peak subgroups, an even greater tendency towards under- and overestimation in the high- and low-fit participants, respectively, is observed compared to the peak performance models. Wier and colleagues [26] argue that the underestimation of the fittest participants is of less importance from a public health perspective, since a high level of fitness is not associated with adverse health outcomes. However, it highlights the necessity of using models derived from aerobically fit subjects to obtain high predictive accuracy and stability for a well-trained population. Such models have previously been developed [12,15,18], while models with high predictive accuracy for low-fit populations are scarce. The models' inability to accurately identify fitness level in the low-fit subjects represents a potential concern, since low aerobic fitness is associated with increased prevalence of chronic disease, e.g. cardiovascular disease and metabolic syndrome, as well as a higher mortality risk [3,4,48]. However, cross-classification accurately predicted approximately 91% of participants, in both sexes, within the correct or nearest quintile of measured VO2peak. There are several factors that might contribute to the systematic over- and underestimation of VO2peak, as well as to the attenuation of prediction accuracy. The statistical rationale is that our models are based on linear regression, in which predictions lie closer to the grand mean than extreme observations do; the models may therefore underpredict high observations and conversely overpredict low observations (the regression-to-the-mean phenomenon).
For the submaximal performance models there are additional plausible factors. Genetics is a further source of prediction inaccuracy, as maximal heart rate is heterogeneous, with significant variation within a population [1]. Based on HUNT 3 fitness data, our group recently reported a standard deviation of measured maximal heart rate of ±14 beats·min⁻¹ [23]. Consequently, embedding the fraction of maximal/peak heart rate as a separate equation in the model weakens the accuracy of the VO2peak prediction [1]. Furthermore, the underestimation of the best trained may arise because they have good movement economy, and conversely the overestimation of the least fit may be attributed to poor movement economy. These additional possible explanations are supported by the considerably higher over- and underestimation of VO2peak in the submaximal performance models compared to the peak performance models. Moreover, if a person using the prediction equation has a better movement economy than that of the subjects in the HUNT 3 fitness study, he or she will be overestimated using the submaximal model, and conversely underestimated with poor movement economy. In such cases the predicted aerobic capacity is better or worse because of movement economy, not because of VO2peak.
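The regression-to-the-mean pattern described above can be reproduced with simulated data. The sketch below uses entirely hypothetical values (one standardized predictor, an outcome centred on 40 with noise chosen so that roughly half the variance is explained) and ordinary least squares; it is not based on HUNT 3 data. It returns the mean of measured minus predicted in the lowest and highest quintiles of the measured outcome.

```python
import random

def simulate_quintile_bias(n=2000, seed=1):
    """Mean (measured - predicted) in the lowest and highest measured quintiles."""
    rng = random.Random(seed)

    # Hypothetical data: y = 40 + 5*x + noise, with noise SD equal to the signal SD.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [40.0 + 5.0 * x + rng.gauss(0.0, 5.0) for x in xs]

    # Ordinary least squares fit of y on x.
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx

    # Sort by measured value and average the errors in the extreme quintiles.
    pairs = sorted((y, y - (intercept + slope * x)) for x, y in zip(xs, ys))
    k = n // 5
    low_mean_error = sum(e for _, e in pairs[:k]) / k
    high_mean_error = sum(e for _, e in pairs[-k:]) / k
    return low_mean_error, high_mean_error

low, high = simulate_quintile_bias()
```

In this setup the lowest measured quintile shows a negative mean error (overestimation) and the highest a positive one (underestimation), even though the model is unbiased overall, which mirrors the direction of the bias reported for the low- and high-fit participants.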

The independent variables' influence on VO2peak
Calculating standardized β weights for the models based on peak performance revealed velocity as the key determinant of VO2peak, followed by age and weight, in both sexes. Not surprisingly, inclination had the least impact on VO2peak, since approximately 87% of the subjects were tested at 10% treadmill inclination in the peak performance models. Likewise, for the submaximal models, velocity was paramount in determining VO2peak in both sexes. In men, the subsequent determinants of VO2peak in order of importance were fraction of HRpeak (consisting of age and work heart rate), weight and inclination. For women, the order was inclination, fraction of HRpeak and weight. Inclination being more potent in women may be related to a larger diversity in running inclination. Explained variances in the submaximal models were 55% and 56% in men and women, respectively, which is better than some previous models (31-51%) [17,31,34], yet worse than others (60-83%) [2,16,28,29,32,33,35].

Strengths and Limitations
The large sample size, including both men and women, and the wide age range make this study robust. Direct measurement of VO2peak by ventilatory gas analysis during a test to volitional exhaustion is preferable to indirect estimates when deriving prediction equations from population studies, since direct measurements display higher correlations as well as lower standard errors of estimate [5]. The low participation rate may contribute to bias caused by self-selection. Still, 5633 (45%) of those invited to the present Fitness study from the total HUNT population volunteered for the cardiopulmonary exercise test. Out of these 5633, 1003 candidates withdrew, did not complete the CPET or were excluded for medical reasons, leaving 4631 (37%) completed tests. Some potential candidates declined participation due to long waiting lines caused by limited capacity at test sites. Consequently, it is possible that those who finally took part were healthier than those who withdrew from testing. However, comparing the Fitness study participants to a healthy sample from the total HUNT population (i.e. free from pulmonary and cardiovascular diseases, sarcoidosis or cancer) established that there were no considerable differences between the two [3]. The consistent overestimation of the least fit candidates, who carry the highest health risks, is more of a concern. This should be taken into account when applying the models.

Practical implications
In a health care setting, the models' ability to detect subjects with low VO2peak is paramount for identifying persons in need of physical activity and lifestyle intervention. Cross-classification of participants into quintiles of measured and predicted VO2peak demonstrates the models' reasonable ability to classify participants appropriately. More importantly, use of both peak and submaximal performance models is considered generally safe in patients at high risk of cardiovascular disease [49]. Our models are derived from a large population of both men and women, with a wide heterogeneity in fitness levels and a large age span (20-90 years). This provides a high degree of applicability for widespread use.

Conclusions
The VO2peak prediction models presented in this study are inexpensive and uncomplicated to use, and are thus a convenient option both for recreational athletes and in health care settings. Judicious and appropriate use of these predictive models will offer valuable information by providing a fairly accurate estimate of peak oxygen uptake, which is beneficial for establishing cardiorespiratory fitness and may improve risk stratification.