Is a verification phase useful for confirming maximal oxygen uptake in apparently healthy adults? A systematic review and meta-analysis

1 Graduate Program in Exercise Science and Sports, University of Rio de Janeiro State, Rio de Janeiro, Brazil, 2 Laboratory of Physical Activity and Health Promotion, University of Rio de Janeiro State, Rio de Janeiro, Brazil, 3 Department of Sport and Physical Activity, Edge Hill University, Ormskirk, Lancashire, England, 4 Department of Sport, Health and Exercise Science, University of Hull, Hull, England, 5 Department of Kinesiology, California State University, San Marcos, California, United States of America, 6 Department of Clinical Medicine, Clinics of Hypertension and Associated Metabolic Diseases, University of Rio de Janeiro State, Rio de Janeiro, Brazil


Objective
To compare the highest VO 2 responses observed in different verification phase procedures with their preceding CPET for confirmation that VO 2max was likely attained.

Methods
MEDLINE (accessed through PubMed), Web of Science, SPORTDiscus, and Cochrane (accessed through Wiley) were searched for relevant studies that involved apparently healthy adults, VO 2max determination by indirect calorimetry, and a CPET on a cycle ergometer or treadmill that incorporated an appended verification phase. RevMan 5.3 software was used to analyze the pooled effect of the CPET and verification phase on the highest mean VO 2 . Meta-analysis effect size calculations incorporated random-effects assumptions due to the diversity of experimental protocols employed. I 2 was calculated to determine the heterogeneity of VO 2 responses, and a funnel plot was used to check the risk of bias, within the mean VO 2 responses from the primary studies. Subgroup analyses were used to test the moderator effects of sex, cardiorespiratory fitness, exercise modality, CPET protocol, and verification phase protocol.

Introduction
Maximal oxygen uptake (VO 2max ) represents the upper physiological limit of the utilization of oxygen for producing energy during strenuous exercise performed until volitional exhaustion [1,2]. The VO 2max is widely regarded as the gold standard measure of cardiorespiratory fitness and is typically determined using a cardiopulmonary exercise test (CPET) in clinical, applied physiology, and sport and exercise science settings [1,[3][4][5][6]. The VO 2max is often used to diagnose cardiovascular disease [7], predict all-cause mortality [8][9][10], develop exercise prescriptions [3,11,12], and evaluate the efficacy of exercise programmes [13][14][15]. Consequently, the validity of VO 2max values obtained during CPETs has widespread importance in clinical, sporting, and research-related contexts. The use of indirect calorimetry for the determination of VO 2max during exercise testing to volitional exhaustion on a treadmill or cycle ergometer has become common during the past few decades [16][17][18]. This has largely been attributed to the development of fast-responding metabolic gas analyzers allowing the time-efficient acquisition of real-time, breath-by-breath, respiratory gas exchange and flow rate data during CPET [see 19 for a review]. These technological advances have contributed to a transition from the Douglas bag method and time-consuming discontinuous step-incremented protocols to more time-efficient continuous ramp or pseudo-ramp protocols for determining VO 2max [20][21][22][23][24][25]. Despite the considerable progress in the efficiency by which CPET can be conducted and evaluated, there is still much to be learned about the determination of VO 2max [2,[24][25][26][27][28][29][30]. One particularly problematic aspect has been the challenge in identifying a lack of VO 2max attainment due to inappropriate test protocols, premature fatigue, or poor participant motivation and lack of effort [31].
The concept of a VO 2max originated almost 100 years ago with the seminal works of Hill and colleagues [32,33]. They proposed the existence of an individual upper limit or 'ceiling' of VO 2 during maximal exercise, beyond which no further increase in VO 2 occurs despite increasing work rate (WR) and higher metabolic demand. The primary criterion for confirming that a VO 2max has been elicited has historically been based on the occurrence of a VO 2 plateau, commonly defined as a small or no increase in VO 2 despite a continued increase in WR [34]. The landmark study of Taylor et al. [34] was the first to use a formal VO 2 plateau criterion, which was defined as an increase in VO 2 of less than 0.150 L/min (or ≤ 2.1 mL·kg⁻¹·min⁻¹, considering an average body mass of 72 kg from 115 male participants) in response to a specific discontinuous step-incremented protocol performed over 3-5 laboratory visits. Subsequent studies have often used the Taylor et al. [34] criterion or alternative thresholds to confirm the attainment of a VO 2 plateau [see 29 for a review]. Since the widespread adoption of continuous short-duration and ramp-based CPET protocols, several studies have reported low incidences of the VO 2 plateau [35][36][37][38][39]. The variability in VO 2 plateau incidence has been attributed to differences in the criteria used for detecting the VO 2 [49][50][51].
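As a quick check on the two forms of the Taylor et al. [34] plateau threshold quoted above, the absolute value can be converted to a relative value using the stated average body mass of 72 kg. The notation below is ours and is only a worked arithmetic sketch.

```latex
\Delta\dot{V}\mathrm{O}_2
  = \frac{0.150\ \mathrm{L\,min^{-1}} \times 1000\ \mathrm{mL\,L^{-1}}}{72\ \mathrm{kg}}
  = \frac{150\ \mathrm{mL\,min^{-1}}}{72\ \mathrm{kg}}
  \approx 2.08\ \mathrm{mL\,kg^{-1}\,min^{-1}}
  \approx 2.1\ \mathrm{mL\,kg^{-1}\,min^{-1}}
```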
In the absence of a VO 2 plateau, secondary VO 2max criteria based upon achievement of threshold values for the respiratory exchange ratio (RER), percentage of age-predicted maximal heart rate, post-exercise blood lactate concentration, and ratings of perceived exertion (RPE) have become commonly used to evaluate whether a true VO 2max has been attained [29,40]. However, this approach has been widely criticized by numerous investigators due to the individual variability in maximal physiological responses for these variables and lack of specificity in identifying individuals who did not continue the CPET to their limit of exercise tolerance. Research has shown that some individuals can satisfy some of the secondary criteria thresholds long before the highest VO 2 value observed in the CPET has been attained [2,29,37,39]. The maximal RER criterion, for example, can be satisfied at VO 2 values 27-39% lower than the highest VO 2 value achieved in the CPET [37,39]. Like the VO 2 plateau, secondary VO 2max criteria are often dependent on exercise modality, test protocol, and participant characteristics [29].
A review by Midgley et al. [29] suggested that a new set of standardized VO 2max criteria should be developed that are independent of exercise modality, test protocol, and participant characteristics, so they can be universally applied. In 2009, Midgley and Carroll [28] provided an early narrative review of an evolving test procedure that showed promise for developing more standardized VO 2max criteria, the so-called 'verification phase'. The verification phase consists of an appended square-wave bout of severe-intensity exercise (e.g. above critical power), or similar multistage exercise bout, performed until the limit of exercise tolerance [28]. It is commonly applied after a short recovery period from a CPET; however, longer recovery periods of up to 24-48 hours have also been used [52]. The verification phase is based on the premise that when the highest VO 2 values in the CPET are consistent with the verification phase (typically within 2-3% in accordance with the test-retest reliability of VO 2max ), this provides substantial empirical support that the highest possible VO 2 has been elicited. Poole and Jones [2] recently stated that, to confirm the attainment of VO 2max , a verification phase should be performed at a higher WR than the last load attained in the CPET (i.e. > WR peak ) in all future studies. Conversely, Iannetta et al. [25] recommended WRs within the upper limit of the severe exercise intensity domain to allow the verification phase to be maintained long enough for VO 2max attainment. According to their recent findings, verification phases performed at 110% of the WR peak attained during CPETs with increment rates of 25 and 30 W/min resulted in exercise durations that were too short to allow VO 2 to reach the highest VO 2 recorded at the end of the preceding ramp CPETs [25].
Along with exercise intensity and duration, it is also unclear whether other factors affect the utility of the verification phase such as exercise modality, differences in the type and duration of the recovery period between the verification phase and CPET, whether a verification criterion threshold is adopted, and participant characteristics such as sex and cardiorespiratory fitness levels.
Given the considerable uncertainty regarding the application of the verification phase, a systematic review and meta-analysis is needed to summarize the evidence comprehensively and to improve our understanding of the strengths and weaknesses of the many different verification procedures that have been used, and of their impact on the attainment of VO 2max . Thus, the aim of the present study was to systematically review and provide a meta-analysis on the application of the verification phase for confirming whether the highest possible VO 2 has been attained during ramp or step-incremented CPETs in apparently healthy adults.

Protocol and registration
The systematic review was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A completed PRISMA checklist is shown in S1 Checklist. The protocol for this study was registered at http://www.crd.york.ac.uk/PROSPERO (CRD42019123540). The main questions addressed by the present study were: (1) To what extent does the highest VO 2 attained in the CPET differ from that attained in the verification phase? (2) Are the highest VO 2 values in the CPET and verification phase affected by the verification-phase characteristics (e.g. intensity, adoption of a criterion threshold, and aspects of the recovery period between the CPET and the verification phase), or do they differ across particular subgroups (e.g. sex, cardiorespiratory fitness levels, exercise test modality, and CPET protocol design) in apparently healthy adults?

Search strategy
MEDLINE (accessed through PubMed), Web of Science, SPORTDiscus, and Cochrane (accessed through Wiley) were searched for peer-reviewed literature using a combination of medical subject heading (MeSH) descriptors, with a time frame that spanned the inception of each database until the search date (September 30, 2020). The search strategy was developed based on the PICO method [i.e. Participants: apparently healthy humans; Interventions: any intervention involving exercise; Comparisons: incremental CPET and an appended square-wave or multistage verification phase; and Outcome: VO 2max confirmation]. The electronic search strategies for all databases are provided in S1 Text.
The terms were adapted for use with other bibliographic databases. Reference lists and citations of eligible articles were also hand searched for additional relevant studies. The search was performed in a standardized manner by two independent researchers (VABC and TP). Only English language studies were eligible for inclusion and only if they satisfied three a priori criteria: (1) involved apparently healthy participants who were ≥ 18 years of age; (2) determined VO 2max using expired gas analysis indirect calorimetry; and (3) the CPET was carried out using bipedal cycle ergometry or bipedal treadmill running or walking. Studies were excluded if they involved: (1) participants who had taken dietary supplements or drugs that could affect body mass, metabolic profile, or exercise performance; or (2) the use of non-maximal test protocols.

Data extraction and management
Two independent reviewers extracted data using a standardized form. The following data were summarized: (1) characteristics of study participants (total sample number, sex, age, body mass index [BMI], and cardiorespiratory fitness); (2) type of intervention (CPET and verification-phase duration, exercise modality, and exercise test protocol used); and (3) outcome measures (mean ± standard deviation [SD] for group VO 2max and protocol duration during the CPET and verification phase). Disagreements were resolved by consensus. When the relevant quantitative data were not reported, authors of the original studies were contacted to request the data.

Quality assessment
The risk of bias for all eligible studies was not assessed because it does not apply to the characteristics of the present review. For example, randomization sequence generation and treatment allocation concealment were not applied, since there were no comparison groups and each individual acted as their own control. It is also noteworthy to mention the absence of blinding in both participants undergoing testing and evaluators who applied the CPET and verification phases, because procedurally all exercise protocols were performed in a fixed order (i.e. CPET followed by the verification phase). Given that VO 2max is the evaluation of an objective numerical variable, the blinding of the evaluator does not generate a different interpretation of the VO 2max values obtained in a CPET and verification phase. Finally, the assessment of incomplete outcome data (sample loss) and selective reporting of outcomes also does not apply, because it is a cross-sectional study with a single outcome of interest.

Statistical analysis
All meta-analyses were performed using Review Manager (RevMan) software version 5.3 (Copenhagen, The Nordic Cochrane Centre, The Cochrane Collaboration, 2014). Data are presented as the mean ± SD unless otherwise stated. The outcome was the mean difference (95% confidence interval [CI]) between the CPET and verification phase for the highest absolute VO 2 (L/min). Given that absolute VO 2 values are continuous data, the weighted mean difference (WMD) method was used for combining study effect size estimates. With the WMD method, the pooled effect estimate represents a weighted mean of all included study group comparisons. The weighting assigned to each individual study group (i.e. the comparison of the CPET and verification phase results) in the analysis is inversely proportional to the variance of the absolute VO 2 (L/min). This method assigns more weight in the meta-analysis to studies with the highest precision (i.e. the smallest variance), which are typically those with larger sample sizes. The WMDs were calculated using random-effects models given the study group differences in CPET modalities and protocols, types of recovery, and verification phase protocols.
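The inverse-variance weighting and random-effects pooling described above can be sketched in a few lines. This is a minimal illustration that assumes the DerSimonian-Laird estimator of between-study variance and assumes that each study group's mean difference and its standard error are already available. The function name `pool_random_effects` and the input values are hypothetical and are not data from the included studies.

```python
import math

def pool_random_effects(diffs, ses):
    """Pool per-study mean differences with DerSimonian-Laird random effects.

    diffs: verification-phase minus CPET mean VO2 per study group (L/min)
    ses:   standard error of each difference (L/min), assumed already known
    """
    k = len(diffs)
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, diffs)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, diffs))
    # Method-of-moments estimate of between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each study's own variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * d for wi, d in zip(w_re, diffs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "q": q,
    }

# Hypothetical inputs, not data from the included studies
result = pool_random_effects([0.05, -0.02, 0.04], [0.03, 0.04, 0.05])
print(result)
```

When the between-study variance estimate is zero, the random-effects weights reduce to the fixed-effect inverse-variance weights, so the two models give the same pooled estimate.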
Heterogeneity of net study group changes in VO 2max (L/min) was examined using the Q statistic. Cochran's Q statistic is computed by summing the squared deviations of each trial's estimate from the overall meta-analytic estimate and weighting each trial's contribution in the same manner as in the meta-analysis. P-values were obtained by comparing the statistic with a χ 2 distribution with k-1 degrees of freedom (where k is the number of trials). A P-value of < 0.10 was adopted since the Q statistic tends to suffer from low differential power. The formal Q statistic was used in conjunction with the methods for assessing heterogeneity. The I 2 statistic measures the extent of inconsistency among the results of the primary study groups, interpreted approximately as the proportion of total variation in point estimates that is due to heterogeneity rather than sampling error. Effect sizes with a corresponding I 2 value of ≤ 50% were considered to have low heterogeneity. The publication bias of the articles was assessed using a funnel plot.
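For reference, the verbal definitions of Q and I 2 above correspond to the standard expressions below. The symbols are our own notation rather than the authors': the estimated mean difference of study group i, its standard error, and the inverse-variance weight. The closing numerical example uses a hypothetical Q value purely to show the arithmetic.

```latex
w_i = \frac{1}{SE_i^{2}}, \qquad
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}, \qquad
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^{2}

I^{2} = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%

% Hypothetical example: k = 54 study groups and Q = 60
I^{2} = \frac{60 - 53}{60} \times 100\% \approx 11.7\%
```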
Subgroup analyses were defined a priori to investigate the magnitude of differences between CPETs and verification phases due to variations in sex, cardiorespiratory fitness level, exercise modality, CPET protocol design, or how the verification phase was performed. Forest plots were constructed to display values at the 95% confidence level. Effect sizes were calculated by subtracting the highest mean values for VO 2 (L/min) observed in the CPET from the verification phase values, on the basis of grouping studies with selected verification-phase characteristics for intensity (i.e. sub vs. supra WR peak ) and type of recovery between the CPET and verification phase (i.e. active vs. passive). The studies were also classified according to whether a criterion threshold for VO 2max was used for the verification phase (i.e. yes vs. no), whether the verification phase was performed in the same testing session as the CPET or on a different day, and the duration of the verification phase (i.e. ≤ 80 s, 81-120 s, and > 120 s). Stratified analyses were also conducted according to particular subgroups such as sex (i.e. male and female), cardiorespiratory fitness level using the cut-off points proposed by Astorino et al. [53] (i.e. low: < 40 mL·kg⁻¹·min⁻¹; moderate: 40-50 mL·kg⁻¹·min⁻¹; high: > 50 mL·kg⁻¹·min⁻¹), exercise test modality (i.e. cycling and running), and CPET protocol design (i.e. discontinuous step-incremented, continuous step-incremented, and ramp protocols).
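The cardiorespiratory fitness stratification above amounts to a simple threshold rule. The sketch below encodes the stated cut-off points; the function name `classify_fitness` is hypothetical, and applying the rule to a study group's mean VO 2max (rather than to individual participants) is our assumption.

```python
def classify_fitness(vo2max_ml_kg_min):
    """Assign a fitness stratum from relative VO2max (mL/kg/min).

    Cut-offs follow the text: low < 40, moderate 40-50, high > 50.
    """
    if vo2max_ml_kg_min < 40:
        return "low"
    if vo2max_ml_kg_min <= 50:
        return "moderate"
    return "high"

# Hypothetical group means, used only to exercise each branch
for value in (35.2, 46.9, 58.0):
    print(value, classify_fitness(value))
```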

Results
The literature search identified 371 potential articles, with 334 obtained from electronic database searches and 37 from the wider inspection of reference lists and electronic citations of these articles. Eighty studies published between 1980 and 2020 met the eligibility criteria and were included in the systematic review (see Fig 1).

Participants
The total number of participants recruited across all included studies was 1,680 (1,077 men, 473 women, and 130 participants whose sex was not specified). Included studies had a median (interquartile range [IQR]) sample size of 13 [10] participants. Participants were aged between 19 and 68 yr, all apparently healthy, and with a physical activity status ranging from sedentary to highly-trained endurance athletes. Thirty-six studies included only men, two included only women, 41 included both men and women, and one study did not specify the sex of the participants (see Table 1). On average, participants had a BMI within the normal range (mean ± SD [range]: 24.4 ± 2.5 [19.4-32.0] kg/m²) and a moderate level of cardiorespiratory fitness (VO 2max mean ± SD [range]: 46.9 ± 12.1 [23.9-68.6] mL·kg⁻¹·min⁻¹). Table 2 summarizes the characteristics of the CPET and verification phase protocols of the 80 studies included in this systematic review. Forty-three studies (54%) performed the CPET on a cycle ergometer, 35 (44%) on a treadmill, and two studies (3%) used both modalities. Seventy-three studies (91%) used continuous step-incremented or ramp/pseudo-ramp CPET protocols. Three (4%) used only discontinuous step-incremented protocols. Two studies (3%) used both discontinuous and continuous step-incremented protocols and another two studies (3%) applied self-paced protocols. Thirty-three (41%) of the 80 studies included in the review used one or more VO 2 plateau or secondary VO 2max criteria to confirm the attainment of VO 2max . Thirty studies used the VO 2 plateau, 21 used the heart rate plateau or a criterion based on age-predicted maximal heart rate, 18 used the maximal RER attained in the CPET (RER max ), and 8 used the post-CPET blood lactate concentration.

Characteristics of studies regarding the CPET and verification phase protocols to evaluate VO 2max
In terms of processing respiratory VO 2 data at volitional exhaustion, the most common approach was based on time averages. Thirty-eight studies (48%) reported stationary time averages of 5 to 30 s, whereas 29 (36%) used VO 2 data points at fixed intervals of 15 to 30 s, two studies (3%) used 15-breath averages, two studies (3%) used 10-25-s moving averages, one (1%) used 10-s epochs, two (3%) used 20-s rolling averages, one (1%) used 30-s rolling means, and one study (1%) used Douglas bag collections. Four studies (5%) did not detail which VO 2 data processing method was applied.
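To illustrate why the data processing method matters, the sketch below applies two of the approaches listed above, stationary (non-overlapping) time averages and rolling averages, to the same series and reports the highest value from each. The sample times and VO 2 values are hypothetical, as are the function names, and the 30-s window is only one of the window lengths reported by the included studies.

```python
def highest_binned_average(times_s, vo2, window_s=30.0):
    """Highest mean VO2 over consecutive, non-overlapping time bins."""
    bins = {}
    for t, v in zip(times_s, vo2):
        bins.setdefault(int(t // window_s), []).append(v)
    return max(sum(vals) / len(vals) for vals in bins.values())

def highest_rolling_average(times_s, vo2, window_s=30.0):
    """Highest mean VO2 over a window of window_s ending at each sample."""
    best = float("-inf")
    for t_end in times_s:
        if t_end - times_s[0] < window_s:
            continue  # skip windows that would start before the first sample
        vals = [v for t, v in zip(times_s, vo2) if t_end - window_s < t <= t_end]
        best = max(best, sum(vals) / len(vals))
    return best

# Hypothetical samples every 10 s (L/min), not data from any included study
times = list(range(0, 120, 10))
values = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0]
print(highest_binned_average(times, values))
print(highest_rolling_average(times, values))
```

In this constructed series the rolling window captures the three consecutive peak samples, whereas the fixed bins split them, so the two methods report different highest values from identical data.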
Regarding the period between the CPET and verification phase procedure, 34 studies (43%) used a short-term active recovery (e.g. pedaling at light intensity, walking at a slow pace, or stretching) of 1, 3, 5, 6, 8, 10, or 5-10 min, while 26 studies (33%) employed passive recovery of 5, 6, 9, 10, 15, 20, 60, or 60-90 min. Two studies (3%) employed a combination of passive and active recovery and another (1%) used a self-paced approach where participants were permitted to choose their own WR. Three studies (4%) employed short-term recovery (e.g. 8-10 min) without stating whether it was active or passive. Fifteen studies (19%) carried out the verification phase on a different day to the CPET. Sixty studies (75%) used square-wave verification phase protocols, while 20 studies (25%) used multistage verification protocols characterized by an initial warm-up stage. Overall, 53 studies (66%) adopted "supra WR peak " verification phases based upon the WR peak achieved during the CPET (e.g. one treadmill or cycle ergometer WR stage higher than that completed in the CPET, or 105-130% of the WR peak achieved in the previous CPET). Seven studies (9%) used only 100% of WR peak , while two other studies (3%) used both WR peak and supra WR peak verification phases. Three studies (4%) examined both sub and supra WR peak within the same study and one study (1%) used a predicted WR based on the following formula to elicit the participant's limit of tolerance within 180 s: power output = (finite work capacity / 180 s) + critical power. Fourteen studies (18%) used only sub WR peak verification phases ranging from 85%-95% WR peak (typically two stages below the WR peak achieved during the CPET) (see Table 2).
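The predicted work rate formula quoted above for a 180-s limit of tolerance has the form of the two-parameter critical power model, in which finite work capacity is divided by the target duration and added to critical power. The sketch below assumes that reading; the function name and the example values (18 kJ and 250 W) are hypothetical.

```python
def predicted_work_rate(finite_work_capacity_j, critical_power_w, time_limit_s=180.0):
    """Work rate (W) expected to exhaust a participant in time_limit_s.

    Assumes power = finite work capacity / time limit + critical power,
    with finite work capacity in joules and critical power in watts.
    """
    if time_limit_s <= 0:
        raise ValueError("time_limit_s must be positive")
    return finite_work_capacity_j / time_limit_s + critical_power_w

# Hypothetical participant: 18 kJ finite work capacity, 250 W critical power
print(predicted_work_rate(18000.0, 250.0))  # 350.0
```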
Forty-two studies (53%) employed cut-off points to analyze differences between the highest VO 2 values obtained during the CPET and verification phase to confirm that VO 2max was likely attained. Criteria for VO 2max verification were frequently based on the intra-subject coefficient of variation acquired from the researchers' laboratories or from published literature, including a VO 2 difference ≤ 2%, ≤ 3%, ≤ 5.0-5.5%, ≤ 1.5-2.2 mL·kg⁻¹·min⁻¹, ≤ 50-150 mL/min, or alternative methods. Table 3 shows comparisons between the highest VO 2 values elicited in the CPET and verification phase for each study. Fig 2 displays the forest plots of effect sizes and 95% CIs for the highest VO 2 values (54 studies) based on the random effects meta-analysis results. Notably, the mean highest VO 2 values were similar between the CPET and verification phase (mean difference = 0.03 [95% CI = -0.01 to 0.06] L/min, P = 0.15). Pooled data for VO 2max following the CPET and verification phase showed no significant heterogeneity among the studies overall (see Fig 2). Except for one of the included studies judged to have a high risk of bias [68], the meta-analyzed studies were judged to have a low risk of bias as shown by the funnel plot (Fig 3).
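A percentage cut-off of the kind listed above can be applied to an individual participant as follows. This is a sketch under stated assumptions: the difference is taken as an absolute value and expressed relative to the CPET value, whereas individual studies may define the direction and the denominator differently. The function names and example values are hypothetical.

```python
def percent_difference(cpet_vo2, verification_vo2):
    """Absolute difference in highest VO2, as a percentage of the CPET value."""
    return abs(verification_vo2 - cpet_vo2) / cpet_vo2 * 100.0

def meets_verification_criterion(cpet_vo2, verification_vo2, threshold_pct=3.0):
    """True when the two highest VO2 values agree within threshold_pct."""
    return percent_difference(cpet_vo2, verification_vo2) <= threshold_pct

# Hypothetical values in L/min
print(meets_verification_criterion(3.50, 3.55))  # True: about 1.4% apart
print(meets_verification_criterion(3.50, 3.70))  # False: about 5.7% apart
```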

Quantitative data synthesis: Differences between the highest VO 2 attained in the CPET and verification phase
Results of subgroup analyses according to the characteristics of the verification phase protocol are summarized in Fig 4. There were no significant differences between the CPET and verification phase for the highest VO 2 values attained after stratifying studies for verification-phase intensity (mean difference = 0.03 [95% CI = -0.01 to 0.07] L/min, P = 0.11), type of recovery utilized (mean difference = 0.02 [95% CI = -0.02 to 0.07] L/min, P = 0.36), VO 2max verification criterion adoption (mean difference = 0.02 [95% CI = -0.02 to 0.06] L/min, P = 0.29), verification procedure with regards to whether or not it was performed on the same day as the CPET (mean difference = 0.03 [95% CI = -0.01 to 0.06] L/min, P = 0.21), or verification-phase duration (i.e. no longer than 80 s, from 81 to 120 s and longer than 120 s) (mean difference = 0.03 [95% CI = -0.03 to 0.09] L/min, P = 0.35).
Subgroup analyses regarding sex, cardiorespiratory fitness level, exercise modality, and CPET protocol are summarized in Table 4. The median time to exhaustion was 665 s (IQR, 600 s) for the CPET and 148 s (IQR, 110 s) for the verification phase.

Discussion
A growing number of studies have included the verification phase procedure to increase confidence that the highest possible VO 2 has been elicited by apparently healthy adults during a CPET. To the best of our knowledge, this is the first systematic review and meta-analysis of these studies, 90% of which have been published since 2009. The major findings were: (a) in general, the verification phase protocols elicited similar highest VO 2 values to those obtained in the preceding CPET protocols; and (b) concordance between the highest VO 2 values in the CPETs and verification phases was not affected by sex, cardiorespiratory fitness level, exercise modality, CPET protocol, or verification phase protocol. The present systematic review and meta-analysis shows that the highest mean VO 2 values elicited by verification phase bouts were similar to those elicited in continuous ramp or pseudo-ramp CPET protocols in the majority of studies. In fact, the mean absolute difference of 0.03 L/min for the 54 studies included in the meta-analysis represents a relative difference of only 0.85% between the highest VO 2 values attained in the CPET and verification phase. This is within the most commonly adopted measures of test variability of 2-3% [57,97]. The present findings also provide evidence that the similarity between the highest VO 2 values attained during the CPETs and verification phases is not affected by sex, cardiorespiratory fitness, exercise modality, CPET protocol design, or how the verification phase was performed (see Table 4 and Fig 4). This contrasts with traditional VO 2max criteria, which are test-protocol [35], for example, observed that participants with lower cardiorespiratory fitness had a lower tendency to exhibit a deceleration in the VO 2 response at the end of a CPET compared to those with higher cardiorespiratory fitness and, therefore, are less likely to exhibit a VO 2 plateau.
Six of the 54 meta-analyzed studies reported significant mean differences between the highest VO 2 values observed in the CPET and verification phase [25,55,56,68,87,95]. Astorino and DeRevere [56], for example, observed significantly higher mean VO 2max values by 0.03 and 0.04 L/min during the CPET than in the verification phase for two samples of participants heterogeneous for cardiorespiratory fitness. However, sub-group analyses revealed that while maximal VO 2 in the CPET was higher than that attained in the verification phase for participants with moderate and high cardiorespiratory fitness, the opposite was true for those with lower cardiorespiratory fitness. Similar findings have been reported by Arad et al. [55], indicating that cardiorespiratory fitness level may be a key moderator of the differences between the highest VO 2 values attained in the CPET and verification phase. A plausible explanation is that individuals with low cardiorespiratory fitness are more susceptible to stopping early during the CPET due to fatigue-associated symptoms [29], which would tend to result in lower VO 2 values. In the present meta-analyses, the mean VO 2max in the verification phase was 8% higher than in the CPET in the low cardiorespiratory fitness group, but 12% and 10% higher in the CPET than in the verification phase in the moderate and high cardiorespiratory fitness groups, respectively (see Table 4). The lack of statistical significance, however, highlights the uncertainty regarding the effects of cardiorespiratory fitness on the differences between the highest VO 2 values in the CPET and verification phase.
Regarding verification-phase duration, Keiller and Gordon [87] observed significantly higher VO 2 values during the incremental treadmill CPETs versus the verification phase with a mean duration of approximately 2 min. This is consistent with the findings of McGawley [95] for 10 recreational runners who performed five consecutive treadmill CPET trials, plus an appended verification phase with a mean duration of < 2 min. Iannetta et al. [25] analyzed the VO 2 responses to ramp-incremented cycling CPETs with WR increments of 5, 10, 15, 25, and 30 W/min, each followed by two verification phases performed at different WRs. The verification phase bouts performed at 110% of the WR peak from ramp protocols with ramp rates of 25 and 30 W/min (i.e. short verification phase bouts of ~80 s) yielded VO 2 values significantly lower than those attained in the CPETs. In contrast, the highest VO 2 values attained during verification phase bouts based on slower WR increments of 5, 10, and 15 W/min, which allowed sufficient time for VO 2max attainment (i.e. 162, 122 and 103 s, respectively) were not different to those achieved in the preceding CPETs. Although the aforementioned studies suggest that verification phase duration is a key moderator for the mean differences between the highest VO 2 observed in the CPET and verification phase, our sub-analysis found no difference for verification-phase durations of ≤ 80 s, ranging from 81 to 120 s, and > 120 s (see Fig 4). Notably, however, only three studies reported short durations of 80 s or less [25,79,113] and the lack of statistical significance may be due to the paucity of data. In contrast to the aforementioned studies [25,87,95], Colakoglu et al. [68] observed significantly lower VO 2 values in the CPET versus the verification phase in nine cycling and track and field athletes. According to Midgley et al. [97], if the mean highest VO 2 attained in the verification phase is significantly higher than in the CPET, the investigator should consider that the CPET protocol was inadequate in eliciting the highest possible VO 2 response in all or some of the participants. In the study by Colakoglu et al. [68], participants performed a prolonged step-incremented CPET consisting of one 4-min, three 2-min, and then 1-min increments until volitional exhaustion after 1 h of recovery from a submaximal CPET of at least four 5-min stages. It is feasible that the procedures performed before the maximal CPET may have led to poor participant motivation, lack of effort and premature fatigue in the following test. Additionally, the four verification phase bouts at 100%, 105%, 110%, and 115% of the WR peak attained in the CPET were performed on four different days to the CPET without any preceding maximal exercise. This may also have favored the significantly higher mean VO 2 values in the verification phase compared to the CPET and contrasts with the same-day verification phase used by Keiller and Gordon [87], McGawley [95], and Iannetta et al. [25].
An aim of the present systematic review was to suggest best practices for the application of verification phase protocols. The subgroup analyses revealed no systematic bias between the highest VO 2 values observed in the CPET and verification phase according to the verificationphase intensity (i.e. sub WR peak vs. supra WR peak ), type of recovery between the CPET and verification phase (i.e. active vs. passive), whether a VO 2max criterion threshold was used for the CPET (i.e. yes vs. no), whether the verification phase was performed in the same testing session or on a different day, and the verification-phase duration (see Fig 4). Considering that differences in the verification phase procedure do not appear to influence its effectiveness, a specific verification procedure currently cannot be recommended. However, some caution must be exercised to avoid an inappropriately high verification-phase WR that results in a short test duration and insufficient time for the highest possible VO 2 to be elicited [25], especially in untrained individuals characterized by slow VO 2 kinetics [127]. Midgley et al. [97] stated that this is a plausible rationale for the early recommendations of Thoden [128], that individuals who do not reach 3 min in a supra WR peak verification phase should undertake a subsequent verification phase at the same WR or one stage lower than verification-phase the last completed WR stage in the CPET. Poole and Jones [2] suggested that researchers should select a WR that is sufficiently higher than the WR peak attained in the CPET, such as~110% WR peak , to give the VO 2 signal for the higher WR the opportunity to emerge from the extant noise. If the subsequent verification phase produces a VO 2 plateau signifying VO 2max , this signal would be lower than expected for the WR based on the previous VO 2 -WR slope. Conversely, Iannetta et al. 
[25] advocated a verification-phase WR lower than the WR peak attained in the CPET in order to allow VO 2max to be elicited, since WRs above critical power should elicit VO 2max if the time to exhaustion is sufficiently long. Midgley et al. [39] proposed an alternative approach based on a multistage verification phase protocol that combines WRs below and above WR peak to obtain a protocol that incorporates a supra WR peak intensity with a relatively prolonged verification-phase duration. This approach has since been adopted in other studies [39,53,54,61,62,64,69,76,82,87,89,90,99,104,108,110,111,115,117,122]. Notably, the only study to observe a statistically significant influence of verification phase intensity employed a multistage verification phase protocol incorporating 2 min at 50% of WR peak , increasing to 70% for an additional minute, and then 105% or 115% until volitional exhaustion [107]. Based on their findings, the authors recommended the use of 105% of the WR peak attained in the CPET rather than 115% WR peak . The conflicting results and various recommended approaches regarding the verification phase intensity indicate that more research is required before an evidence-based recommendation can be made.
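As an illustration only, the multistage structure described above (2 min at 50% of WR peak, 1 min at 70%, then a supra WR peak stage to exhaustion) can be written as a simple schedule. The function name, the argument names, and the use of `None` to mark an open-ended final stage are assumptions made for this sketch and are not taken from any of the cited studies.

```python
def multistage_verification(wr_peak, final_fraction=1.05):
    """Return a list of (duration_s, work_rate) stages.

    wr_peak: highest work rate attained in the CPET (any consistent unit).
    final_fraction: supra WR peak intensity of the last stage, e.g. 1.05 or 1.15.
    A duration of None marks the final stage, continued to volitional exhaustion.
    """
    if wr_peak <= 0:
        raise ValueError("wr_peak must be positive")
    return [
        (120, 0.50 * wr_peak),             # 2 min at 50% of WR peak
        (60, 0.70 * wr_peak),              # 1 min at 70% of WR peak
        (None, final_fraction * wr_peak),  # to exhaustion at 105% or 115%
    ]

# Hypothetical WR peak of 300 W, comparing the two final intensities.
for fraction in (1.05, 1.15):
    print(fraction, multistage_verification(300, fraction))
```

Writing the protocol as data makes it easy to compare variants that differ only in the final supra WR peak intensity.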
Regarding the recovery time between the CPET and verification phase, intervals of 10-20 min have been commonly used, although overall a wide range of intervals from 1-3 min [65,77,88,113] to 90 min [41] has been used. The present meta-analysis found no significant effect of recovery time on the difference between the mean VO 2 elicited in the CPET and verification phase. An alternative method is to perform the verification phase on a separate day, although the additional visit to the laboratory and the day-to-day variability in VO 2max [129] might considerably reduce the utility and robustness of this approach. Scharhag-Rosenberger et al. [111] specifically investigated this issue by comparing a 10-min recovery to a verification phase performed on a separate day. No significant difference was observed between the two verification protocols, even though the time to exhaustion was significantly longer when the verification phase was performed on a separate day (2:06 ± 0:22 min vs. 2:42 ± 0:38 min). These findings suggest no advantage in performing the CPET and verification phase on separate days.
Inadequate data processing may negatively impact the utility of the verification phase procedure. Myers et al. [36] suggested that small sampling intervals such as 5 and 10 s result in unacceptable variability in VO 2 data, whereas large intervals such as 60 s may not be sufficiently sensitive to accurately track rapid changes in VO 2 such as those observed in ramp and pseudo-ramp CPET protocols. Midgley et al. [130] observed that the reproducibility of VO 2max during continuous step-incremented treadmill CPETs is not affected by the length of the VO 2 time-average interval within the range of 10 to 60 s; however, the actual VO 2max values were significantly different between time averages. The authors suggested that a 30-s stationary time-average for CPETs provides a good compromise between removing noise and maintaining the underlying trend in the VO 2 data. However, no study to date has addressed the effect of the VO 2 sampling interval on the verification phase.
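To make the dependence on the averaging interval concrete, the following minimal sketch computes the highest VO 2 from consecutive, non-overlapping time bins of a chosen length. It reads "stationary time-average" as fixed bins, which is an assumption, and the function name, the input format, and the sample values are all hypothetical rather than taken from the cited studies.

```python
from collections import defaultdict

def highest_time_averaged_vo2(times_s, vo2, interval_s=30):
    """Highest mean VO2 over consecutive, non-overlapping time bins.

    times_s: sample times in seconds from test start (assumed input format).
    vo2: VO2 values in the same order, in any consistent unit.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    bins = defaultdict(list)
    for t, v in zip(times_s, vo2):
        bins[int(t // interval_s)].append(v)
    # A final, partly filled bin is kept as-is in this sketch.
    return max(sum(vals) / len(vals) for vals in bins.values())

# Invented samples every 10 s, for illustration only.
times = [0, 10, 20, 30, 40, 50]
values = [3.0, 3.4, 3.2, 3.6, 4.0, 3.8]
for interval in (10, 30, 60):
    print(interval, highest_time_averaged_vo2(times, values, interval))
```

Because the highest bin mean generally falls as the bin lengthens, the same interval would need to be applied to both the CPET and the verification phase for their highest VO 2 values to be comparable.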
A final issue to be addressed refers to appropriate criteria to accept that the highest possible VO 2 has been achieved. The most common criterion used in the reviewed studies is that the highest VO 2 observed in the verification phase should not exceed the highest VO 2 obtained in the CPET by more than 3%. This threshold can be justified by the technical error of measurement and intra-individual biological variation associated with the determination of VO 2max [15, 56, 57, 62, 63, 69, 71, 78, 82, 86, 89-91, 95, 107, 108, 113, 116, 120-122]. The more restrictive value of ≤ 2% [97,110] and the less restrictive values of ≤ 5-5.5% [104][105][106]111] may also be appropriate for single or different-day variability. Further research is required before an appropriate verification-phase threshold can be recommended, one that provides a high degree of confidence that the difference between the highest VO 2 values observed in the CPET and verification phase is beyond the technical error of measurement and intra-individual biological variation.
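The threshold criterion described above can be expressed as a short check applied to each participant. This is a sketch under stated assumptions: the function name, the default of 3%, and the participant values are hypothetical, and the threshold is exposed as a parameter because the reviewed studies used different values.

```python
def verification_confirms(cpet_vo2, verification_vo2, threshold_pct=3.0):
    """True if verification VO2 does not exceed CPET VO2 by more than threshold_pct."""
    if cpet_vo2 <= 0:
        raise ValueError("cpet_vo2 must be positive")
    # Percentage by which the verification value exceeds the CPET value.
    excess_pct = 100.0 * (verification_vo2 - cpet_vo2) / cpet_vo2
    return excess_pct <= threshold_pct

# Invented per-participant values (CPET, verification), for illustration only.
participants = {"P1": (50.0, 51.0), "P2": (50.0, 52.0)}
for pid, (cpet, verif) in participants.items():
    print(pid, verification_confirms(cpet, verif))
```

In this example, P1 (verification value 2% higher) meets the 3% criterion and P2 (4% higher) does not. Running the check per participant, rather than on group means, identifies individuals whose CPET value may not represent VO 2max .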
Some limitations of the present review need to be acknowledged. First, the meta-analysis only included 79% of the participants who underwent CPET with verification phase protocols in the 80 studies included in the systematic review. This was due to unsuccessful attempts to acquire the required unpublished information from some authors. Second, the meta-analysis was based on comparison of the highest VO 2 responses in the CPET and verification phase averaged across study participants. Noakes [131] criticized this approach, stating that the CPET is performed on individuals and not groups and, therefore, the group average approach does not identify individuals who may not have attained VO 2max . A meta-analysis using individual participant data is therefore required. Finally, the present systematic review and meta-analysis comprised only apparently healthy adults, and it is still unclear to what extent the verification phase procedure is applicable to special or clinical populations. A growing number of studies have included special or clinical populations such as obese adults [132,133], breast and prostate cancer survivors [134], wheelchair athletes [135], individuals with spinal cord injuries [136], patients with heart failure [137] or cystic fibrosis [138][139][140], and pediatric populations [141][142][143][144][145][146][147], including children with spina bifida in an outpatient condition [148], and adolescents with cystic fibrosis [149].

Conclusions
The present meta-analysis showed that the effect sizes calculated from the highest mean VO 2 in apparently healthy adults were similar between CPETs and verification phases performed on a cycle ergometer or treadmill. Furthermore, mean differences between the highest VO 2 values elicited in the CPETs and verification phases were not affected by participant characteristics, exercise modality, or the CPET and verification protocol design. Our findings indicate that, from a practical perspective, different procedures may be applied to establish similar highest mean VO 2 responses during the verification phase as compared to the ramp or continuous step-incremented CPETs. It is worth mentioning, however, that some caution must be exercised concerning the selection of sub or supra WR peak verification phases, since any exercise above the critical power must be of sufficient duration to allow the achievement of the highest possible VO 2 response in the verification phase. Our data reinforce the notion that a verification phase applied after ramp or continuous step-incremented CPETs may provide additional and unbiased evidence that the highest possible VO 2 has been achieved. On the other hand, the invalidation of the highest VO 2 obtained in CPETs by subsequent verification phases was less likely on a group basis. The mean differences in highest VO 2 responses were typically within the test-retest variability of the experimental protocols employed. Accordingly, our findings support the usefulness of the verification phase to confirm the likely attainment of VO 2max on incremental CPET. However, the necessity or mandatory application of the verification phase, especially constant supra WR peak verification bouts, in all CPET situations remains open to question.