External validation of VO2max prediction models based on recreational and elite endurance athletes

In recent years, numerous prognostic models have been developed to predict VO2max. Nevertheless, their accuracy in endurance athletes (EA) remains largely unvalidated. This study aimed to compare predicted VO2max (pVO2max) with directly measured VO2max by assessing the transferability of the currently available prediction models based on their R2, calibration-in-the-large, and calibration slope. A total of 5,260 healthy adult EA underwent a maximal exertion cardiopulmonary exercise test (CPET) (84.76% male; age 34.6±9.5 yrs.; VO2max 52.97±7.39 mL·min-1·kg-1; BMI 23.59±2.73 kg·m-2). Thirteen models were selected to establish pVO2max. Participants were classified into four endurance subgroups (high-, recreational-, low-trained, and “transition”) and four age subgroups (18–30, 31–45, 46–60, and ≥61 yrs.). Validation was performed according to TRIPOD guidelines. pVO2max was low-to-moderately associated with direct CPET measurements (p>0.05). The models with the highest accuracy were Kokkinos for males on a cycle ergometer (CE) (R2 = 0.64), Kokkinos for females on CE (R2 = 0.65), Wasserman for males on a treadmill (TE) (R2 = 0.26), and Wasserman for females on TE (R2 = 0.30). However, the selected models underestimated VO2max for younger and higher-trained EA and overestimated it for older and lower-trained EA. All equations demonstrated merely moderate accuracy and should only be used as a supplemental method for physicians to estimate CRF in EA. New models derived on EA populations are needed for routine use in clinical practice and sports diagnostics.


Introduction
The concept of maximal oxygen uptake (VO2max) was first suggested by Hill et al. in the 1920s [1]. VO2max is the highest attained oxygen uptake during an incremental exercise test with large muscle groups (e.g., treadmill or cycling). VO2max is an important parameter to objectively assess cardiorespiratory fitness (CRF) both in healthy people and those suffering from cardiovascular diseases (CVD) [2,3]. The American Heart Association (AHA) recognized that CRF, described mainly as VO2max, should be used as an essential factor in the comprehensive diagnostic process [3]. Moreover, a lower level of CRF is strongly related to a higher risk of CVDs, death from numerous cancer types, and all-cause mortality [4]. This represents a switch from risk factors widely discussed in recent decades, such as smoking, hypertension, or hyperlipidemia [2,3,5].

VO2max in sports & performance diagnostics
VO2max is an important variable in endurance sports, such as running, cycling, swimming, triathlon, or team sports [6]. VO2max strongly correlates with an athlete's aerobic performance, could be applied to prescribe training properly, and is useful to assess adaptation to exercise [7][8][9]. Furthermore, VO2max could help in the prediction of a race time [10,11]. Elite athletes achieve varied VO2max values, dependent on their discipline and training experience [12,13]. Males typically have higher VO2max than females [14], and VO2max values decrease with age [15]. Body weight and height are also related to VO2max, as is the testing mode: higher VO2max values are observed on a treadmill than on a cycle ergometer [16]. Briefly, Kaminsky et al. indicate the level of training, the method of testing (cycle ergometry, treadmill, rowing machine, etc.), co-existing CVDs, and respiratory exchange ratio (RER) as factors influencing VO2 [17]. Other contributors may include the athlete's psychological attitude to the effort (i.e., a CPET conducted until refusal may potentially last longer), race, and ethnicity [18,19]. Due to its numerous practical implications and variability, it is important to precisely assess VO2max in different athletic disciplines and populations [20].

VO2max in clinical practice
Measuring VO2max is also especially important under clinical conditions during the examination of the cardiovascular system [3,21]. It could be regarded as the integrated function of (amongst others) lungs, heart, blood vessels, and muscles [9,22]. Recommendations for VO2max testing include the presence of ambiguous pathologic exertional symptoms, cardiovascular risk estimation, and monitoring response to applied treatment [21]. Moreover, understanding the exercise limitation is crucial information for healthcare professionals to monitor cardiac status and could be used to prescribe treatment properly for those suffering from CVDs [2,23]. Therefore, VO2max is a practically relevant parameter for a new, growing population of patients in cardiologic ambulatory care: endurance athletes (EA) [21]. Highly trained endurance athletes (HTEA), recreational endurance athletes (REA), and low-trained endurance athletes (LTEA), whether with suspected CVD or undergoing cardiopulmonary exercise testing (CPET) for periodic training evaluation, are potential candidates for VO2max assessment [21].

Epidemiology of CVDs among EA
In light of current literature, there is growing importance in preventing and treating CVDs in athletes. As the number of people practicing endurance sports increases, new patient populations have arisen: professional and former EAs [3]. For example, in recent times, some athletes have had cardiac involvement due to SARS-CoV-2 infection. CPET and VO2max assessment are important elements of a comprehensive diagnostic approach [24]. Regarding epidemiology, arterial hypertension is the most common CVD among physically active people. The risk of CVD is found especially in people older than 35 years (thus former and retired EA). Although sport is recognized as a preventive factor for CVD, medical practitioners should be aware of the prevalence of risk factors among EA [25]. What is more, as reported by Petek et al., in a wide cohort of collegiate athletes, persistent or exertional symptoms on return to exercise occurred in only 44/3529 (1.2%). This has been achieved, among other things, by a properly conducted diagnostic and screening process consisting of CPET and VO2 assessment [21].
Again, as observed by Petek et al., the comparison of VO2max with cardiac morphology and echocardiography may facilitate the correct planning of the therapy [24]. Moreover, Moulson et al. recently found that CPET is a valuable component of the Return to Play Program and cardiac screening in young competitive EAs following SARS-CoV-2 infection [26]. To summarize, directly measured VO2max can be used as a valuable predictive cardiometabolic risk factor.

CPET protocol and applicability of prediction formulae
The gold standard to measure VO2max is performing a CPET [27]. VO2max is reached when the subject meets the physiological limit and maintains it for some time (usually 15 s, 30 s, or 60 s) [28]. For practical reasons, such as the high cost of the procedure, a lack of testing devices, or health contraindications, this form of measurement is often not possible in a sports setting [27].
Parameters such as age, sex, and heart rate (HR) could be used to predict VO2max through various models [27,29]. However, the reliability of this simple and potentially valuable method is doubtful because of low accuracy, especially in women, extremely small or tall subjects, and individuals with high BMI values [30,31]. In its 2013 statement, the AHA pointed out that there is a need for a universal and transferable prediction standard [32].
Prediction formulae undoubtedly have numerous advantages; however, those currently used were created on different populations and with heterogeneous testing modes [33]. Indeed, proper external validation should be a mandatory stage before a new model is widely used [34,35]. Moreover, the risk of using only predicted values lies in the inaccuracy and error inherent in the particular equations [36]. On the other hand, the benefit is that there is no need to undergo a full CPET, which may be expensive or unavailable where access to specialized clinics, equipment, etc. is limited (e.g., in field settings) [37,38].
Validation studies are performed to evaluate a given model in varied conditions and on different populations to assess its possible measurement bias and the ability to extrapolate its results [39]. This study aimed to externally evaluate prediction formulae on EA tested under the same conditions at one tertiary care sports diagnostic center. EAs were selected as the study population because VO2max is an important parameter in the evaluation of the overall fitness level and the selected equations are often derived from the athletic population [22]. The secondary aim was to assess the impact of age and CRF on the risk of error and bias in the tested models. We hypothesized that their validity may not be sufficient to make them an equivalent method for directly measured VO2max.

Material and methods
We applied TRIPOD guidelines for the development and validation of prediction models (for the detailed protocol, see Supplementary information: TRIPOD Checklist for Prediction Model Validation) [39]. Results from CPETs collected between 2013 and 2021 were retrospectively analyzed. Maximal-effort examinations consisted of treadmill (TE) or cycle ergometry (CE) tests paired with body composition (BC) analysis, and took place in a medical clinic (www.sportslab.pl, Warsaw, Poland). Tests were performed on individual request as a part of regular endurance assessment or training monitoring.

Cardiopulmonary exercise testing protocol
Cardiopulmonary exercise tests (CPET) were preceded by body mass (BM) and fat mass (FM) analysis with the 5 kHz/50 kHz/250 kHz electrical bioimpedance method on a body composition (BC) monitor (Tanita, MC 718, Japan). Conditions during BC and CPET were: a 40 m2 indoor, air-conditioned area, 40-60% humidity, temperature 20-22°C, altitude 100 m MSL. Endurance athletes (EA) were instructed via e-mail on how to prepare: avoid any demanding exercises 24 hours before CPET, consume a high carbohydrate meal and hydrate with isotonic beverages 2-3 hours earlier, and exclude any stimulants or caffeine on the day of the procedure.
Cycle ergometry (CE) examination was performed on a Cyclus-2 cycle ergometer (RBM elektronik-automation GmbH, Leipzig, Germany) and treadmill (TE) examination was conducted on a mechanical treadmill (h/p/Cosmos quasar, Germany). CPET scores were measured using a Hans Rudolph V2 Mask (Hans Rudolph, Inc, Shawnee, KS, USA), a Cosmed Quark CPET gas exchange analyzer (Rome, Italy), and dedicated manufacturer's software (from PFT Suite to Omnia 10.0E). Data collection was performed with a breath-by-breath acquisition system and a 15-s filter was used for data analysis. Each breath was considered as a separate point and all points were included in the calculation of the average VO2max value. HR was measured via ANT and a torso strap as a part of the Cosmed Quark set (product accuracy comparable to ECG; ±1 bpm). The CPET device was calibrated with reference gas (16% O2; 5% CO2) and turbine flow for each person separately, according to manufacturer recommendations. Equipment software was regularly updated between 2013 and 2021. Three gas analyzing devices were utilized and each one was replaced after 36-48 months. Every part of the CPET equipment was periodically verified by manufacturer employees to keep its mechanical certificates valid. Blood lactate (LA) was assessed with a Super GL2 analyzer (Müller Gerätebau GmbH, Freital, Germany). The instrument was also individually prepared before each round of analysis and calibrated with reference solution before each sample set.
Exercise began with a 5-min warm-up (walking or pedaling with minimal resistance). Participants' endurance capacities were used to set the starting load. The initial power for CE was 60-150 W and was increased in 2-min intervals by 20-30 W. The initial speed for TE was 7-12 km·h-1 (described by the participant as a "conversation pace") at 1% inclination. The pace was raised by 1 km·h-1 every 2 min. The observer verbally encouraged athletes to sustain the effort as long as possible so that their endurance could be assessed as accurately as possible. Achievement of an oxygen uptake (VO2) or heart rate (HR) plateau, or volitional inability to maintain intensity, were the reasons for test termination. LA was measured by taking a 20 μL blood sample from a fingertip: directly prior to exercise, after any resistance or pace modification, and 3 min after termination. Samples were obtained without an interruption in CE and TE tests. Before a proper sample was obtained, the first drops were gathered in a swab. HR (not averaged) was recorded at the highest point during intervals and used in further analysis [40]. Maximal oxygen uptake (VO2max) was defined as the averaged maximum oxygen uptake during the 15-s period at the end of the CPET.
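The VO2max definition given above (oxygen uptake averaged over the final 15 s of the test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function name vo2max_final_window, the input layout (ascending time stamps in seconds with matching breath-by-breath VO2 readings), and the decision to include a breath lying exactly on the window boundary are all hypothetical.

```python
def vo2max_final_window(times_s, vo2_values, window_s=15.0):
    """Return the mean VO2 over the last `window_s` seconds of a test.

    Hypothetical helper: `times_s` are breath time stamps in seconds
    (ascending), `vo2_values` the matching breath-by-breath VO2 readings.
    """
    if len(times_s) != len(vo2_values) or not times_s:
        raise ValueError("inputs must be non-empty and of equal length")
    window_start = times_s[-1] - window_s
    # Assumption: a breath exactly on the window boundary is included.
    window = [v for t, v in zip(times_s, vo2_values) if t >= window_start]
    return sum(window) / len(window)


# Toy data (not from the study): seven breaths recorded 5 s apart.
times = [0, 5, 10, 15, 20, 25, 30]
vo2 = [30.0, 40.0, 45.0, 50.0, 52.0, 54.0, 56.0]
print(vo2max_final_window(times, vo2))  # mean of the last four readings: 53.0
```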

Derivation cohort
A rigorous inclusion/exclusion process was applied to narrow the validation group to only those EAs who achieved maximum exertion during CPET and were free of any factors that could lower VO2max (see Fig 1. Flowchart of the inclusion-exclusion and further groups classification process). A total of 6,439 EAs underwent CPET. Participants were eligible for preliminary inclusion if they had: (1) experience in regular running or cycling training ≥3 months, (2) age ≥18 years, (3) values within ±3 standard deviations (SD) of the mean for all of the testing variables (extreme outliers were excluded), (4) no acute or chronic medical condition (including musculoskeletal injuries or addictions), (5) no medication use, and (6) no active smoking.
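Criterion (3), the exclusion of extreme outliers beyond ±3 SD, could be applied along the following lines. This is only a sketch: the record layout, the variable names vo2max and hr_max, and the function name within_3sd are hypothetical, and computing the mean and sample SD once on the full, unfiltered group is an assumption that the text does not specify.

```python
from statistics import mean, stdev


def within_3sd(records, variables, limit=3.0):
    """Keep records whose every listed variable lies within ±limit SD of its mean."""
    stats = {}
    for var in variables:
        values = [r[var] for r in records]
        stats[var] = (mean(values), stdev(values))  # sample SD (assumption)
    kept = []
    for r in records:
        is_outlier = any(
            sd > 0 and abs(r[var] - m) / sd > limit
            for var, (m, sd) in stats.items()
        )
        if not is_outlier:
            kept.append(r)
    return kept


# Toy data (not from the study): 19 typical records and one extreme VO2max.
records = [{"vo2max": 50.0, "hr_max": 170.0 + i} for i in range(19)]
records.append({"vo2max": 150.0, "hr_max": 180.0})
print(len(within_3sd(records, ["vo2max", "hr_max"])))  # 19 records remain
```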
Finally, 5,260 EA met all inclusion criteria. The population was divided by sex into four age groups (18-30, 31-45, 46-60, and ≥61 years) and four endurance groups (HTEA, REA, LTEA, and "transition"). Endurance classification was based on the speed (km·h-1) or power (W·kg-1) at the respiratory compensation point (RCP), calculated independently for each sex. Speed/power at RCP was the variable of choice because it is currently described as the parameter most closely corresponding to the critical endurance capacity [41,42]. Moreover, classifying participants' endurance capacity with a variable other than VO2max made group assignments independent of the factor directly validated in the study. Participants above +1.5 SD were classified as HTEA (n = 309), those between -0.5 SD and +0.5 SD as REA (n = 2,033), and those below -1.5 SD as LTEA (n = 339). To precisely distinguish endurance subgroups, those placed between +0.5 SD and +1.5 SD or between -1.5 SD and -0.5 SD were classified as "transition" (n = 2,579). Model validation was conducted on each of the age and endurance cohorts independently (except the "transition" group), both for TE VO2max and CE VO2max.
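The SD-based endurance classification described above can be illustrated with a short sketch. It assumes that speed or power at RCP is standardized within one sex using the sample SD, and that values falling exactly on a threshold go to the "transition" group; the function names label_from_z and classify_endurance are hypothetical and the data are invented.

```python
from statistics import mean, stdev


def label_from_z(z):
    """Map a standardized RCP speed/power value to an endurance group."""
    if z > 1.5:
        return "HTEA"
    if z < -1.5:
        return "LTEA"
    if -0.5 < z < 0.5:
        return "REA"
    return "transition"  # 0.5 to 1.5 SD on either side of the mean


def classify_endurance(rcp_values):
    """Classify athletes of one sex from their speed or power at RCP."""
    m, sd = mean(rcp_values), stdev(rcp_values)  # sample SD (assumption)
    return [label_from_z((v - m) / sd) for v in rcp_values]


# Toy running speeds at RCP in km/h (not from the study).
speeds = [10.0, 12.0, 14.0, 16.0, 18.0]
print(classify_endurance(speeds))
```

With such a small toy sample no value reaches ±1.5 SD, so only "REA" and "transition" labels appear; in the study cohort the same rule produced all four groups.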

Selected prediction models
Candidate models were identified from previous systematic reviews of CPET testing (up to February 2019) [43,44] and an additional literature search in PubMed, MEDLINE, EMBASE, Scopus, and Web of Science databases (covering March 2019 to December 2021 and including meta-analyses) for the keywords: Cardiopulmonary exercise testing, Cardiorespiratory fitness, Exercise testing, VO2max, VO2peak.

Fig 1. Flowchart of the inclusion-exclusion and further groups classification process.
Age classification is presented in years. Endurance classification was performed based on speed/power at the respiratory compensation point (RCP), which is currently described as the variable most closely corresponding to the critical power. Moreover, classifying participants' endurance capacity with a variable other than VO2max makes group assignments independent of the factor directly validated in the study. Abbreviations: EA, endurance athlete; CPET, cardiopulmonary exercise testing; SD, standard deviation; RER, respiratory exchange ratio; VO2, oxygen uptake (mL·min-1·kg-1); LA, lactate concentration (mmol·L-1); fR, breathing frequency (breaths·min-1); RCP, respiratory compensation point; HRpeak, peak heart rate during CPET (bpm); HRmax, maximal heart rate during CPET (bpm); F, female; M, male; HTEA, high-trained endurance athletes; REA, recreational endurance athletes; LTEA, low-trained endurance athletes.
Moreover, the Wasserman et al. [22] model was validated in the study due to its well-established reputation. Equations from 2 meta-analyses [46,47] were also considered because of their wide range of applications for EA.
During study selection, we did not define criteria for the VO2max measurement protocol due to the high variability of the currently described methods. Nevertheless, according to the current literature, different testing protocols were applied for runners and cyclists [48][49][50][51]. Similar values of VO2max were observed, which suggests that it is possible to provide an exact comparison between them.
Thirteen equations from 8 different publications were included in the analysis. Their detailed characteristics are presented in the supplementary material (S2 File).

Statistical analysis
Baseline statistics were exported into an Excel file (Microsoft Corporation, Washington, USA) and are presented as mean (±SD and 95% CI) or frequency (percentage) for categorical variables, and median for continuous variables. Differences between subgroups (all continuous variables) were analyzed using analysis of variance (ANOVA) and the post-hoc Tukey HSD test. There were no missing data in the whole population. Thus, the entire cohort was validated.
External validation was conducted by following the recommendations for the validation and interpretation of diagnostic prediction models [34]. In summary, we assessed the accuracy of the equations by comparing values from the originally established formulas with data obtained directly from CPETs and BC examinations (e.g., VO2max, BMI). A linear model regressing measured VO2max on pVO2max was generated for each equation. Performance, considered as the proximity of the observed and expected CRF, was evaluated using R2 and root mean square error (RMSE). Cutoffs for R2 were: (1) R2<0.3 for none or very weak effect size; (2) 0.3<R2<0.5 for weak or low effect size; (3) 0.5<R2<0.7 for moderate effect size; (4) R2>0.7 for high effect size [52]. Additionally, the calibration slope (the slope of a linear regression model that includes the model's linear predictor as the only covariate, with 1 being ideal; C1) and calibration-in-the-large (mean observed compared to mean predicted value, with 0 being ideal; C2) were calculated. The ggplot2 package in RStudio (R Core Team, Vienna, Austria; version 3.6.4), a custom-written Python script (Python Software Foundation, Delaware, USA; version 3.10.1), and STATA software (StataCorp, College Station, Texas, USA; version 15.1) were used in the statistical analysis. The significance threshold was a two-sided p-value <0.05.
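The performance measures named above can be computed from paired measured and predicted values roughly as follows. This is a hedged sketch, not the authors' Python script: the function names validation_metrics and r2_category are hypothetical, RMSE is taken here as the root mean squared prediction error (the text does not say whether it refers to prediction error or regression residuals), calibration-in-the-large is taken as mean observed minus mean predicted, and R2 values falling exactly on a cutoff are assigned to the higher category because the stated cutoffs use strict inequalities.

```python
from math import sqrt
from statistics import mean


def validation_metrics(measured, predicted):
    """R2, RMSE, calibration slope and calibration-in-the-large for one model."""
    mx, my = mean(predicted), mean(measured)
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, measured))
    n = len(measured)
    return {
        # R2 of the linear model regressing measured on predicted values
        "r2": sxy ** 2 / (sxx * syy),
        # Root mean squared prediction error (assumption, see lead-in)
        "rmse": sqrt(sum((y - x) ** 2 for x, y in zip(predicted, measured)) / n),
        # Slope of measured regressed on predicted; 1 is ideal
        "calibration_slope": sxy / sxx,
        # Mean observed minus mean predicted; 0 is ideal
        "calibration_in_the_large": my - mx,
    }


def r2_category(r2):
    """Effect-size label from the cutoffs in the text (boundary handling assumed)."""
    if r2 < 0.3:
        return "none or very weak"
    if r2 < 0.5:
        return "weak or low"
    if r2 < 0.7:
        return "moderate"
    return "high"


# Toy data (not from the study): the model compresses the true range.
measured = [40.0, 50.0, 60.0]
predicted = [42.0, 50.0, 58.0]
print(validation_metrics(measured, predicted))
print(r2_category(0.64), "/", r2_category(0.26))
```

In the toy data the calibration slope is above 1 while calibration-in-the-large is 0: the predictions are right on average but too narrow, overestimating low values and underestimating high ones, which is the pattern the study reports for lower- and higher-trained athletes.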

Ethical approval
All parts of the study were approved by the Bioethical Committee-IRB of the Medical University of Warsaw (AKBE/32/2021) and were conducted in line with the Declaration of Helsinki. Moreover, each EA provided written consent in a separate document.

Results
Briefly, VO2max differed significantly between the selected equations. The performance of prediction models is presented in Tables 2 and 3 along with R2, root mean square error (RMSE), calibration-in-the-large (C1), and calibration slope (C2). Figs 2-5 show the regression analysis of observed vs predicted VO2max stratified by age for the whole population, HTEA, REA, and LTEA, respectively. Subgroups that did not meet the TRIPOD guidelines [30] to consider their validation results as reliable (i.e., n<100) were additionally marked in tables and graphs.
Performance calculations for the whole population and each subgroup, with comparison (mean and SD) between observed and predicted VO2max, are presented in the supplementary material (Table 3a-3d in S4 File). For TE, the lowest non-significant differences (mean and CI) were for Petek's equation both in males (mean = -0.11; CI, -0.42, 0.20) and females (mean = -0.52; CI, -1.20, 0.16). For CE, the lowest non-significant differences (mean and CI) were for Petek's equation in the male population (mean = -0.08; CI, -0.68, 0.52). Similarly, for the female population, the lowest but significant differences were also for Petek's equation [...] (1) and (2) for females). For CE, significant overestimation was observed in Wilson's and Fitzgerald's models respectively for males and females, and underestimation in Wasserman's, Mylius's, and Kokkinos's formulae for males, females, and the whole population.
For HTEA, a group size ≥100 was reached only for male runners (n = 188). For TE, the differences between the observed and predicted VO2max were significant in both males and females of the HTEA subgroup. The lowest differences on TE were for Wilson's equation in males (mean = 2.52; CI, 1.50, 3.54) and for Fitzgerald's equation in females (mean = 4.1; CI, 1.92, 6.28).
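The mean differences and confidence intervals reported above could be obtained along these lines. This is a sketch under assumptions the text does not confirm: the difference is taken as measured minus predicted, the interval is a 95% normal-approximation interval (mean ± 1.96 standard errors), and a difference is read as significant when the interval excludes 0. The function name mean_difference_ci and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_difference_ci(measured, predicted, z=1.96):
    """Mean of (measured - predicted) with a normal-approximation CI."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    d = mean(diffs)
    half_width = z * stdev(diffs) / sqrt(len(diffs))
    return d, d - half_width, d + half_width


# Toy data (not from the study): the model underestimates every athlete.
measured = [50.0, 52.0, 54.0, 56.0]
predicted = [48.0, 51.0, 53.0, 52.0]
d, lower, upper = mean_difference_ci(measured, predicted)
significant = lower > 0 or upper < 0  # the interval excludes zero
print(round(d, 2), round(lower, 2), round(upper, 2), significant)
```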

Discussion
The aim of the current study was to assess the accuracy of common VO2max prediction equations in a large sample of healthy EA tested under standardized conditions. We hypothesized that their accuracy may not be adequate to make them a comparable alternative to CPET.
The main novelty of this study is a comprehensive comparison of the accuracy of various formulas and their usefulness for determining VO2max in the athletic population (including different subpopulations depending on participants' training level). The analysis of the accuracy of prediction equations suggests that more precise models are required to better establish the VO2max level, which may be crucial for the clinical assessment of EAs.
The main findings are that: (1) the currently available equations show limited accuracy, (2) it is advisable to use models derived from populations whose characteristics are as similar as possible to the target group, (3) models derived from active athletic populations were the most accurate and showed the highest transferability, and (4) a steeper decline in predicted VO2max for older participants was noted.

Current limitations in model transferability
Until now, underestimation of results in younger EA and overestimation in older ones have most frequently been observed [21,29]. Malek et al. found that 16 of 18 commonly used prediction equations were inaccurate when used in an athlete population [29]. Moreover, there was a lack of equations to predict VO2max developed in large samples of trained participants, especially elite athletes [21]. In one recent study, Petek et al. developed new VO2max equations for EA, although the sample size was relatively small [21]. Their main finding was that the previously established models, derived both from general cohorts and from EA, perform poorly when used for EA undergoing CPET for clinical reasons.
Valid VO2max prediction equations are important because inaccurate ones can lead to false-negative or false-positive results and inadequate recommendations regarding a safe level of physical activity or the level of advancement of the training plan [8,21]. Furthermore, establishing whether VO2max values are normal is often a very important step in determining the cause of the exercise limitation [53].
A practical application of the most accurately derived predictive equations is a better distinction between physiological and impaired endurance. Moreover, it improves the clinical use of VO2max assessment for EA examined with suspected or confirmed CVD and helps to precisely prepare individualized training plans.
One of the reasons for obtaining very heterogeneous predicted results is the discrepancy in methodology [21,30]. Potential complications were mainly related to the use of cardiology-specific CPET protocols for TE (e.g., the Bruce protocol [54]), which are not commonly used in sports-performance diagnostics [55]. Individual running economy, general fatigue, or nonspecific stress during testing raise the probability of bias [20]. The error could be as high as 40% of the actual value [28]. Our study population is larger and contains EA from individual or team sports disciplines. The testing protocol was strictly standardized, and measurements included advanced parameters influencing performance: LA and BM [55].

Repercussions of using predictive equations with low to moderate accuracy
Consequences of applying prediction models with limited accuracy could be seen in sports and performance diagnostics, clinical practice, the applicability of particular equations, training prescription, and follow-up. In sports diagnostics, this can lead to prescribing incorrect, ineffective training [56]. In sports medicine and cardiology, obtaining accurate VO2max values is especially important for patients with CVDs, given the growing data suggesting the role of CRF in stratifying the risk in such groups [3,57]. Moreover, there are currently more inaccuracies in the estimation of VO2max among patients suffering from CVDs than in healthy individuals. Overestimation is especially noticeable among patients with impaired cardiac output during exercise. Relying on inaccurate VO2max values may result in a missed diagnosis and incorrectly prescribed therapy, which does not bring the expected results and poses a health risk. Among other conditions, heart failure is of particular importance, as noted by Kokkinos et al. [57]. Furthermore, the applicability of such equations is limited to narrow populations with characteristics as close as possible to the group from which they were originally created (i.e., derivation cohorts) [21].

Specificity of particular subgroups
An outcome of the present study was that the examined VO2max prediction equations had limited predictive value in the locomotion (running versus cycling), age, and performance subgroups of participants. This limited predictive value might be explained by the selection of specific predictors (sex, age, and weight) that are not measures of CRF. Among the selected predictors, only mechanical workload was a measure of CRF. CRF in EA is not only a health-related but also a sport-related physical fitness parameter; thus, it would be of great practical importance for predicted VO2max to reflect changes in sports performance. Furthermore, performance subgroups of participants might differ in body composition (i.e., lower body fat percentage in HTEA than in LTEA), which in turn might constitute a bias in the assessment of CRF [58]. Ceaser and Hunter point out that endurance capacity may also depend on participant ethnicity, so this factor should be considered when deriving new models [18,58]. It is worth noting that models derived from wide populations or EA groups showed the highest performance. This is in line with the results observed so far by Petek et al. and Malek et al. [21,29]. The sex-specific equations (i.e., those provided by Kokkinos et al. for CE) did not show noticeably higher accuracy. The underlying mechanism requires further investigation, as females physiologically present with lower VO2max than males [59]. A steeper decline in predicted values of VO2max for older EA may be explained by EA maintaining a higher VO2 with age through regular physical activity. Similar results have already been confirmed by Kaminsky et al. [59]. EAs show a smaller decline in VO2 with age than their corresponding reference group [59]. The highest inaccuracies were noted for HTEA and younger participants, perhaps due to their physiologically higher VO2max, which is also supported by additional endurance training. Thus, those EA are placed above normal reference values [59]. As we underlined, there is a need for more advanced prediction models that consider additional parameters (like age and physical activity) and fit the demands of HTEA and young individuals.

Source of errors measured & differences between protocols
The observed R2 ranged from 0.02 (Fitzgerald et al. [...]). We stipulate that this finding could result from the similar characteristics of the primary cohort and our validation cohort. As different CPET results are achieved on CE and TE, the Petek et al. models adjusted for both modalities and athletic group showed the best performance. It is worth mentioning that the tested formulae generally showed lower validity for women. It is well established that female athletes achieve lower CPET scores compared to male athletes, although the underlying mechanism for the reduced VO2 prediction accuracy in this sex remains unclear. Moreover, the results between the models differed significantly (p<0.05), despite validating them on the same population. It is suggested to consider the most outlying results. More research is needed to refine the effects and recalibration of the currently available equations. The models derived from broader cohorts (provided from FRIEND by Kokkinos et al.) or sports cohorts (provided by Petek et al.) showed less inaccuracy in both direct comparison of measured and predicted VO2 and statistical indices (R2, RMSE, and calibrations). Thus, we recommend them for estimating VO2max in male and female EAs. We would like to note that the equations derived from meta-analyses, i.e., Fitzgerald and Wilson, showed the smallest inaccuracy between directly measured and predicted VO2max. The salient feature of the meta-analysis equations is that they utilize attainable demographic information for the widest cohorts. This advantage is not feasible for equations derived from original papers due to practical limitations in recruiting such a large number of participants. To summarize, the models differ widely, and inaccuracies were lower when they were applied to cohorts with comparable profiles.

Directions of future research
We recommend that the formulas used to estimate VO2max be applied to groups with a profile similar to the one from which they were originally derived, especially in narrow populations like LTEA, REA, or HTEA [45]. At the same time, we emphasize that there is a significant need to create new, more advanced models under unified guidelines and with the incorporation of the PROBAST-AI [60] and TRIPOD [35] checklists. This will facilitate the further selection of the appropriate equation to apply in EA depending on their level of CRF. In addition, the selection of other predictors, such as oxygen uptake at submaximal exercise intensity, ethnicity, or daily number of steps, should be considered in future studies.

Conclusions
To conclude, we have accomplished an independent external validation of prognostic models for the prediction of the CRF level, defined as VO2max. Each included prognostic model showed only moderate discriminatory ability, but acceptable performance in its derivation population. For EA, direct VO2max determination by CPET cannot be replaced by, or treated as interchangeable with, predictive equations based only on their own results. An updated and unified prognostic formula for clinical and experimental use in EA populations is necessary. Although no formula was completely exact, the best performance was noted for the Kokkinos model in males on the CE (R2 = 0.64) and the Wasserman model in males on the TE (R2 = 0.26), and likewise for the Kokkinos equation in females on the CE (R2 = 0.65) and the Wasserman equation in females on the TE (R2 = 0.30). Those models seem to better predict VO2max in our EA population and may provide utility as a method of choice for assessment during sports diagnostics or clinical practice. The overall lowest model accuracy was observed for HTEA and EA aged 18-30 yrs. A potential limitation of the study was the ethnic homogeneity of our group, as the subjects were mainly Caucasian.