Manipulating graded exercise test variables affects the validity of the lactate threshold and V˙O2peak

Background To determine the validity of the lactate threshold (LT) and maximal oxygen uptake (V˙O2max) determined during graded exercise tests (GXTs) of different durations and using different LT calculations. Trained male cyclists (n = 17) completed five GXTs of varying stage length (1, 3, 4, 7 and 10 min) to establish the LT, and a series of 30-min constant power bouts to establish the maximal lactate steady state (MLSS). V˙O2 was assessed during each GXT and a subsequent verification exhaustive bout (VEB), and 14 different LTs were calculated from four of the GXTs (3, 4, 7 and 10 min), yielding a total of 56 LTs. Agreement was assessed between the highest V˙O2 measured during each GXT (V˙O2peak) as well as between each LT and the MLSS. V˙O2peak and LT data were analysed using mean difference (MD) and intraclass correlation (ICC). Results The V˙O2peak value from GXT1 was 61.0 ± 5.3 mL.kg-1.min-1 and the peak power 420 ± 55 W (mean ± SD). The power at the MLSS was 264 ± 39 W. V˙O2peak from GXT3, 4, 7 and 10 underestimated the GXT1 value by ~1–5 mL.kg-1.min-1. Many of the traditional LT methods were not valid and a newly developed Modified Dmax method derived from GXT4 provided the most valid estimate of the MLSS (MD = 1.1 W; ICC = 0.96). Conclusion The data highlight how GXT protocol design and data analysis influence the determination of both V˙O2peak and LT. It is also apparent that V˙O2max and LT cannot be determined in a single GXT, even with the inclusion of a VEB.


Introduction
Sampling of expired gas and blood data during a graded exercise test (GXT) to exhaustion permits identification of the gas exchange threshold (GET), the respiratory compensation point (RCP), the lactate threshold (LT), and maximal oxygen uptake (V˙O2max). These indices can distinguish cardiorespiratory fitness, and demarcate the domains of exercise [1,2] that can be used to prescribe exercise and to optimize training stimuli [3][4][5][6]. However, despite the popularity of these indices, the methods used to determine them can differ substantially and there has been little systematic investigation of their validity [7][8][9].
The recommended duration of a GXT to assess V˙O2max is 8 to 12 minutes [10][11][12][13]. However, there is little consensus on an appropriate GXT protocol design, including duration, stage length, or number of stages, needed to establish the LT. A stage length of at least 3 minutes has been recommended [13], although an 8-minute stage length has also been suggested to allow blood lactate concentrations to stabilize [14]. The number of stages and GXT duration will depend on the starting intensity and power increments. Power is typically increased by the same increment for all participants [15], regardless of sex or fitness, leading to a heterogeneous GXT duration and number of stages completed [16]. A customized approach to LT testing has been recommended to ensure a more homogeneous GXT duration [17].
More than 25 methods have been proposed to calculate the LT [18]; these include the power preceding a rise in blood lactate concentration of more than 0.5, 1.0 or 1.5 mmol.L-1 from baseline [19], the onset of a fixed blood lactate accumulation (OBLA) ranging from 2.0 to 4.0 mmol.L-1 [20,21], or the use of curve fitting procedures such as the Dmax or modified Dmax (ModDmax) methods [22,23]. However, many of these 'accepted' methods are influenced by GXT protocol design [8,24] and their underlying validity has not been reported.
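As an illustration of the fixed-value and baseline-plus approaches described above, the following is a minimal sketch that locates the power at a target blood lactate concentration by linear interpolation between stages. The function names and the stage data are hypothetical and are not taken from the study; note also that some definitions take the last stage preceding the rise rather than an interpolated value.

```python
from typing import Optional, Sequence


def power_at_lactate(power: Sequence[float], lactate: Sequence[float],
                     target: float) -> Optional[float]:
    """Power (W) at which lactate first crosses `target` (mmol/L).

    Uses linear interpolation between adjacent stages (an assumption of
    this sketch). Returns None if the target is never crossed.
    """
    for i in range(1, len(power)):
        lo, hi = lactate[i - 1], lactate[i]
        if lo < target <= hi:
            frac = (target - lo) / (hi - lo)
            return power[i - 1] + frac * (power[i] - power[i - 1])
    return None


def baseline_plus(power: Sequence[float], lactate: Sequence[float],
                  baseline: float, delta: float) -> Optional[float]:
    """Baseline-plus threshold: power at (baseline + delta) mmol/L."""
    return power_at_lactate(power, lactate, baseline + delta)


# Hypothetical end-of-stage data, for illustration only.
power = [100, 150, 200, 250, 300]      # W
lactate = [1.0, 1.2, 1.8, 3.0, 6.0]    # mmol/L

obla_4 = power_at_lactate(power, lactate, 4.0)
base_15 = baseline_plus(power, lactate, baseline=1.0, delta=1.5)
print(obla_4, base_15)
```

Because both methods read a single point off the lactate curve, any change in the curve caused by stage length shifts the estimated threshold directly.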
Assessing the validity of a measurement requires comparison with a criterion measure. The maximal lactate steady state (MLSS) represents the highest intensity where blood lactate appearance and disappearance are in equilibrium and where energy demand is adequately met by oxidative phosphorylation [25]. Exercise performed above the MLSS results in accelerated blood lactate appearance and it has therefore been suggested as an appropriate criterion measure for the LT [25,26]. The primary advantages of the MLSS test are that it is independent of participant effort, submaximal, and reliable [27]. However, its disadvantages are that it requires multiple laboratory visits and yields only one index of performance.
V˙O2max is considered the "gold standard" for assessing cardiorespiratory fitness [28] and the highest recorded V˙O2 from a GXT is often accepted as the V˙O2max [10]. Establishing the LT requires a GXT that typically exceeds 20 minutes [13]; however, in these instances the highest V˙O2 may underestimate the V˙O2max [12] and is termed V˙O2peak. Recently, the use of a verification exhaustive bout (VEB) has been recommended to confirm the V˙O2max. However, it is unknown if a VEB performed after a longer duration GXT provides a valid estimate of V˙O2max.
The aim of this study was to determine the validity of the LT and V˙O2max derived from a single visit GXT. We hypothesized that our results would yield one or more combinations of GXT stage length and LT calculation method that provide a valid estimation of the criterion measure of the LT (i.e., MLSS). We also hypothesized that the highest V˙O2 measured during longer duration GXTs would underestimate V˙O2max and that the highest V˙O2 value measured during each VEB would be similar to the V˙O2peak measured during the 8- to 12-minute GXT.

Ethical approval
All procedures were performed in accordance with the ethical standards of the institutional and/or national research committee, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Participants/Experimental design
Seventeen trained male cyclists (V˙O2max 62.1 ± 5.8 mL.kg-1.min-1, age 36.2 ± 7.4 years, body mass index (BMI): 24.1 ± 2.0 kg.m-2) volunteered for this study, which required 7 to 10 visits to the laboratory. Informed consent was obtained from all individual participants included in the study.
Visit one included risk stratification using the American College of Sports Medicine Risk Stratification guidelines [29], written informed consent, self-reported physical activity rating (PA-R) [30], measurement of height and body mass, and completion of a cycling GXT with 1-minute stages (GXT1) followed by a VEB. The remaining visits consisted of four cycling GXTs with varying stage length (3-, 4-, 7- and 10-min stages) and a series of 30-min constant power bouts to establish the MLSS. The GXTs and constant power bouts were performed in an alternating order and the order of the GXTs was randomised. Prior to each GXT and each constant power bout, a 5-min warm-up was administered at a self-selected power followed by 5 min of passive rest. Participants performed each test at their preferred cadence determined during the initial visit. Antecubital venous blood (1.0 mL) was sampled during all visits (excluding GXT1) at rest, and at the end of every stage during the GXTs or every 5 min during the constant power exercise bouts. All participants self-reported abstaining from alcohol, caffeine and heavy exercise for 24 h prior to each visit. Participants were given at least 48 h between visits and all tests were completed within 6 weeks. The Victoria University Human Research Ethics Committee approved all procedures (HRE 017-035).

Equipment/Instruments
All exercise testing was conducted using an electronically-braked cycle ergometer (Lode Excalibur v2.0, The Netherlands). A metabolic analyser (Quark Cardiopulmonary Exercise Testing, Cosmed, Italy) was used to assess oxygen uptake (V˙O2) on a breath-by-breath basis, and heart rate was measured throughout all tests. Antecubital venous blood was analysed using a blood lactate analyser (YSI 2300 STAT Plus, YSI, USA).

GXTs with verification exhaustive bout
Demographic data, PA-R, and measurements of height and body mass were used to estimate V˙O2max [31] and maximum power output (W˙max) [30,32], where V˙O2max is expressed in millilitres per kilogram per minute, BMI is in kg.m-2, and W˙max is in Watts.
A custom GXT protocol with a desired time limit of 10 min was then designed for each participant using: W˙max / 10 min = 1-min intensities (W.min-1). Additional customized protocols were designed for each of the remaining GXTs based on a percentage of the measured W˙max from GXT1. The predicted W˙max was 80%, 77%, 72% and 70% of this value for GXT3, GXT4, GXT7 and GXT10, respectively. The target number of stages for each participant was nine; the initial stage and subsequent stages of the remaining GXTs were determined from the predicted W˙max, with stage 1 power, predicted W˙max and subsequent power increments expressed in Watts. A 5-min recovery was administered after each GXT, followed by a VEB performed at 90% of the W˙max measured from GXT1 to determine the highest measured V˙O2 (V˙O2peak) [17].

Constant power exercise bouts to establish the maximal lactate steady state
The power associated with the respiratory compensation point (RCP) from GXT1 was used in a regression equation (Eq 5) to estimate the MLSS (RCP MLSS) and to set the power of the first constant power exercise bout [33]; the RCP MLSS and RCP are both expressed in Watts. The RCP was determined as the average of the power output associated with: 1) the break point in ventilation relative to expired carbon dioxide (V˙E/V˙CO2), 2) the second break point in V˙E and 3) the fall in end-tidal carbon dioxide (PETCO2) after an apparent steady state [34][35][36].
Participants performed 3 min of baseline cycling at 20 W prior to each constant power bout. The MLSS was established as the highest intensity where blood lactate increased <1.0 mmol.L-1 from the 10th to the 30th minute [26]. If the blood lactate concentration increased >1.0 mmol.L-1 the power was decreased by 3%, otherwise the power was increased by 3% [27]. This process continued until the MLSS was obtained.
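The decision rule for each 30-min bout can be sketched as follows. This is a minimal illustration, assuming the 1.0 mmol.L-1 criterion and 3% step stated above; the function names are hypothetical, and the handling of a rise of exactly 1.0 mmol.L-1 (treated here as not exceeding the limit) is an assumption.

```python
def rise_exceeds_limit(lactate_10min: float, lactate_30min: float,
                       limit: float = 1.0) -> bool:
    """True if lactate rose by more than `limit` mmol/L from minute 10 to 30."""
    return (lactate_30min - lactate_10min) > limit


def next_power(power: float, exceeded: bool, step: float = 0.03) -> float:
    """Power (W) for the next bout: down 3% if the limit was exceeded, else up 3%."""
    return power * (1 - step) if exceeded else power * (1 + step)


# Hypothetical bout at 264 W with lactate of 3.0 and 3.6 mmol/L.
exceeded = rise_exceeds_limit(3.0, 3.6)
print(exceeded, round(next_power(264.0, exceeded), 2))
```

Repeating this step until the highest power that does not exceed the limit is bracketed yields the MLSS, which is why the procedure requires several laboratory visits.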

LT and respiratory compensation point calculations
The LTs were calculated from GXT3, GXT4, GXT7 and GXT10 using 14 methods (4 GXTs × 14 LTs = 56 LTs in total), and the RCP and the RCP MLSS were also calculated from GXT1 (56 LTs + RCP and RCP MLSS = 58 total estimates).

Data analysis
Breath-by-breath data were edited individually, with values greater than three standard deviations from the mean excluded [43]. The data were interpolated on a second-by-second basis and averaged into 5- and 30-s bins [44,45]. The highest measured V˙O2 value from every GXT and VEB was determined as the highest 20-s rolling average. The V˙O2max was computed as the highest V˙O2 measured from any GXT or VEB. The V˙O2peak for each GXT was defined as the highest measured V˙O2 from either the GXT or the subsequent VEB. The W˙max for every GXT was determined as the power from the last completed stage plus the time completed in the subsequent stage multiplied by the slope (Eq 6). The V˙O2 response at the MLSS was determined as the average V˙O2 value during the last two minutes of the 30-minute constant power bout. Calculated LTs were excluded if the mean difference between the MLSS and calculated LT was greater than the error of the measurement of the MLSS [coefficient of variation (CV%) = 3%, 7.9 W] [27], the effect size (ES) was greater than 0.2, or the Pearson product moment correlation coefficient (r) was less than 0.90. Using these criteria, 10 of the 56 LTs and the RCP MLSS (Eq 5) were included in the analysis (Table 1).
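Two of the calculations described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the input is assumed to be already interpolated to one value per second, and the slope is assumed to be the stage increment divided by the stage duration, since Eq 6 is described only in words here.

```python
from typing import Sequence


def w_max(last_stage_power: float, seconds_in_next_stage: float,
          stage_seconds: float, increment: float) -> float:
    """Last completed stage power plus time in the next stage times the slope.

    The slope (W per second) is assumed to be increment / stage duration.
    """
    slope = increment / stage_seconds
    return last_stage_power + seconds_in_next_stage * slope


def highest_rolling_mean(vo2_per_second: Sequence[float],
                         window: int = 20) -> float:
    """Highest mean over any `window` consecutive 1-s samples."""
    if len(vo2_per_second) < window:
        raise ValueError("need at least `window` samples")
    running = sum(vo2_per_second[:window])
    best = running
    for i in range(window, len(vo2_per_second)):
        # Slide the window forward by one sample.
        running += vo2_per_second[i] - vo2_per_second[i - window]
        best = max(best, running)
    return best / window


# Hypothetical values: 300 W completed, then 120 s of a 240-s stage (+30 W).
print(w_max(300, 120, 240, 30))
print(highest_rolling_mean([1] * 30 + [2] * 20 + [1] * 10))
```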
Also shown are the mean difference (MD), the Pearson product moment correlation (r) and the effect size (ES) of the difference when compared with the MLSS. (log = using the log-log method as the initial data point when calculating the Dmax or Modified Dmax; poly = Modified Dmax method calculated using a third order polynomial regression equation; exp = Modified Dmax method calculated using a constant plus exponential regression equation; OBLA = onset of blood lactate accumulation; B + = baseline lactate value plus an absolute lactate value). Bold represents the LTs that met the three criteria for inclusion in our final analysis: mean difference less than 7.9 Watts, Pearson product moment correlation >0.90, and a less than trivial ES difference from the MLSS (ES <0.2).
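The three inclusion criteria can be expressed as a single check. The sketch below is hypothetical: the function name is ours, boundary values are treated as in the exclusion rule stated in the Data analysis section, and the ES values in the example calls are placeholders rather than study results.

```python
def meets_inclusion(mean_diff_w: float, effect_size: float, pearson_r: float,
                    max_md: float = 7.9, max_es: float = 0.2,
                    min_r: float = 0.90) -> bool:
    """True unless |MD| > 7.9 W, |ES| > 0.2, or r < 0.90."""
    return (abs(mean_diff_w) <= max_md
            and abs(effect_size) <= max_es
            and pearson_r >= min_r)


# MD and r below are figures quoted in the text; the ES values are placeholders.
print(meets_inclusion(1.1, 0.03, 0.96))   # small MD, high r
print(meets_inclusion(22.0, 0.50, 0.85))  # large MD, lower r
```

Requiring all three criteria together guards against accepting a method on correlation alone.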

Statistical analysis
A one-way analysis of variance with repeated measures was used to assess significant differences between the MLSS and the calculated LTs. Agreement between the MLSS and the calculated LTs was evaluated using a two-way mixed intraclass correlation coefficient (ICC), standard error of the measurement (SEM), Lin's concordance correlation coefficient (ρc) [46], Bland-Altman plots [47], the Pearson product moment correlation (r), CV% [48,49] and a magnitude-based inference approach involving standardised differences (ES) [50,51]. Differences between V˙O2peak values measured during each GXT were assessed using ES, p-values, and the CV%. Agreement between V˙O2 measured during each GXT and subsequent VEB was evaluated using the intraclass correlation coefficient (ICC), SEM, and CV% [49]. Descriptive statistics are reported as the mean ± SD. Alpha was set to P ≤ 0.05.
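Two of the agreement statistics named above are simple to compute directly. The following sketch uses only the Python standard library; the function names and data are hypothetical, the Bland-Altman limits use the conventional bias ± 1.96 SD, and Lin's coefficient uses population (1/n) moments as in its original definition.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(estimate: Sequence[float],
                 criterion: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [e - c for e, c in zip(estimate, criterion)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def lin_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical data: the estimate is perfectly correlated with the
# criterion but offset by one unit.
criterion = [1, 2, 3]
estimate = [2, 3, 4]
print(bland_altman(estimate, criterion))
print(lin_ccc(criterion, estimate))
```

In this example the Pearson correlation would be 1.0, yet the concordance coefficient is well below 1 because of the constant offset, which is why agreement statistics are reported alongside r.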

Validity of LT estimates
Comparisons between the 58 estimates of the MLSS and the MLSS are detailed in Table 1.  total) and the MLSS (all log-log methods were excluded given an ES > 1.0). Ten of the calculated LTs and the RCP MLSS met our inclusion criteria for final analysis; detailed comparisons with the MLSS are provided in Table 3 and

Discussion
The main findings of the present study are as follows. Only 11 of the 58 threshold values met our inclusion criteria as valid estimates of the MLSS. Of the 11 methods included in our analysis, three of the ModDmax methods yielded the most favourable estimations of the MLSS, and the Log-Poly-ModDmax derived from GXT4 provided the best estimation of the MLSS. There was an inverse relationship between stage length and LT, and this effect was larger in all Dmax methods compared with the OBLA and baseline plus absolute lactate value methods. The V˙O2peak values measured during the longer duration GXTs (GXT3-10) underestimated the V˙O2max and the V˙O2peak values obtained from GXT1 (MD = 1.2 to 4.8 mL.kg-1.min-1). Finally, contrary to our hypothesis, the VEB after the longer duration GXTs did not yield V˙O2peak values comparable to the V˙O2peak derived from GXT1.
The use of five GXT protocols, 14 common LT methods, the RCP and RCP MLSS resulted in 58 unique thresholds. However, despite their common use, we observed that only 11 of these values met our criteria for inclusion (MD < 7.9 W; ES < 0.2; r > 0.90). Of the four Dmax methods included in our analysis, one consisted of the traditional ModDmax method [22]. This had the poorest agreement relative to the other ModDmax methods included in our analysis. The remaining three Dmax methods are new variations of the ModDmax method, and the Log-Poly-ModDmax derived from GXT4 had the highest correlation and lowest mean difference with the MLSS. These variations of the ModDmax method use the power at the log-log LT as the initial intensity to calculate the ModDmax and then either the traditional third-order polynomial or an exponential plus-constant regression curve to fit the lactate curve [23,41]. Although the validity of these three methods has not previously been assessed, the favourable estimations of the MLSS may be related to the greater objectivity with which they determine the intensity that corresponds with the initial rise in blood lactate concentration [37].

Fig 2. (A-D) Forest plots of the difference (ES ± 95% CI) between the MLSS and the power calculated from the 13 lactate thresholds derived from (A) GXT3, (B) GXT4, (C) GXT7 and (D) GXT10 (52 in total and excluding log-log). The solid vertical bar represents no difference from the MLSS and the dashed vertical bars represent the threshold between a trivial and small difference (ES = 0.2) established by Cohen [50] and Hopkins [49]. log = using the log-log method as the initial data point when calculating the Dmax or Modified Dmax; poly = Modified Dmax method calculated using a third order polynomial regression equation; exp = Modified Dmax method calculated using a constant plus exponential regression equation; OBLA = onset of blood lactate accumulation.
https://doi.org/10.1371/journal.pone.0199794.g002

Table 3. Mean ± standard deviation, mean difference (MD), intraclass correlation coefficient (ICC), Lin's concordance correlation coefficient (ρc), standard error of the measurement (SEM), effect size (ES) with 95% confidence limits, and coefficient of variation (%CV) between the maximal lactate steady state (MLSS) and the eleven thresholds included in our analysis.
(RCP MLSS = MLSS estimate based on the respiratory compensation point; log = Modified Dmax method using the log-log method as the initial lactate point; poly = Modified Dmax method calculated using a third order polynomial regression equation; exp = Modified Dmax method calculated using a constant plus exponential regression equation; OBLA = onset of blood lactate accumulation).

Three previous studies have compared the Dmax with the MLSS [15,52,53]. One concluded that the Dmax method derived from GXT3 was a valid estimation of the MLSS (r = 0.97) [54]. We also observed a high correlation between Dmax and the MLSS (r = 0.94 to 0.97) (Table 1), but, as indicated by the MD and other measures, a high correlation is not sufficient to establish validity [55]. Another study examined Dmax derived from two GXTs with similar durations (36 vs. 39 min), but with different stage lengths (30-s vs. 6-min) [15]. The Dmax derived from GXT30s was not correlated (r = 0.51) with the MLSS, even though the MD was 5 W, whilst the Dmax derived from GXT6 was correlated (r = 0.85); however, it underestimated the MLSS (MD = 22 W). The third study concluded the Dmax derived from GXT1 yielded poor estimates of the MLSS (r = 0.56; bias = -1.8 ± 38.1 W) [53].
Thus, although some studies [15,54] have used correlation analysis to suggest the Dmax provides a valid estimate of the MLSS, this is not supported by the more comprehensive assessment of validity performed in the present and other studies [53].
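For readers unfamiliar with the Dmax family of methods discussed above, the following is a minimal sketch of the general idea: fit a third-order polynomial to the lactate-power data and find the point on the curve with the greatest perpendicular distance from the straight line joining a chosen initial point and the final point. It is not the authors' implementation. The function name, the `start_index` parameter, the grid search, and the use of the fitted curve (rather than the raw data) for the line end points are assumptions of this sketch, and NumPy is assumed to be available. Setting `start_index` to a later stage mimics a Modified Dmax, where the initial point is chosen by a separate rule such as the log-log LT.

```python
import numpy as np


def dmax_power(power, lactate, start_index=0, n_grid=2001):
    """Power (W) at the maximum perpendicular distance from the chord.

    The chord joins the fitted curve at power[start_index] and power[-1].
    """
    p = np.asarray(power, dtype=float)
    la = np.asarray(lactate, dtype=float)
    coefs = np.polyfit(p, la, 3)          # third-order polynomial fit

    x0, x1 = p[start_index], p[-1]
    y0, y1 = np.polyval(coefs, x0), np.polyval(coefs, x1)

    xs = np.linspace(x0, x1, n_grid)
    ys = np.polyval(coefs, xs)

    # Perpendicular distance from each curve point to the chord.
    dist = np.abs((y1 - y0) * (xs - x0) - (x1 - x0) * (ys - y0))
    dist /= np.hypot(x1 - x0, y1 - y0)
    return float(xs[np.argmax(dist)])


# Synthetic data (not from the study): lactate = 1 + 1e-4 * power^2.
power = np.arange(100, 301, 25)
lactate = 1 + 1e-4 * power.astype(float) ** 2

print(dmax_power(power, lactate))                 # chord from the first stage
print(dmax_power(power, lactate, start_index=2))  # chord from a later stage
```

With this synthetic curve, moving the initial point from the first stage to a later one shifts the estimate upward, which illustrates why the rule used to choose the initial point matters for the resulting threshold.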

Five fixed blood lactate methods and one baseline plus absolute value method met our inclusion criteria, and, as previously reported [15,24], these varied with the GXT protocol used. The baseline + 1.5 mmol.L-1 was the only LT derived from GXT3 included in our analysis (bias = -6 ± 35 W). This is consistent with the results of one previous study (bias = 0.5 ± 24 W), which also recruited trained male cyclists and had a similar GXT protocol design [56]. Consistent with our findings, this study also reported that an OBLA of 3.5 mmol.L-1 derived from GXT3 did not provide a valid estimation of the MLSS. In contrast, another study confirmed the validity of the OBLA of 3.5 mmol.L-1 [52], despite recruiting trained cyclists and using an identical GXT protocol. These conflicting results are likely attributable to the low reproducibility of the OBLA methods [16].
While none of the OBLAs from GXT3 met our inclusion criteria, the OBLA of 2.5 mmol.L-1 derived from GXT4 and GXT7 provided valid estimations of the MLSS, as did the OBLA of 3.0 mmol.L-1 derived from GXT7 and GXT10. The OBLA of 3.5 mmol.L-1 from GXT10 was the highest fixed blood lactate value that identified the MLSS. There are no previous data investigating the validity of these OBLA methods. However, it is worth noting that these five methods provided superior estimations of the MLSS compared with the original ModDmax, but were less favourable than the newly-developed ModDmax methods.
An OBLA of 4.0 mmol.L-1 is the most commonly-accepted fixed blood lactate value for estimating the LT or MLSS. Three previous studies have attempted to validate the use of an OBLA of 4.0 mmol.L-1 with cycle ergometry [15,53,57]. One study found that it overestimated the MLSS (MD = 49 W) when derived from GXT1 [53]. Another study reported poor agreement (bias 7 ± 49 W) when the OBLA of 4.0 mmol.L-1 was derived from GXT4 [57]. The final study observed a poor correlation between an OBLA of 4.0 mmol.L-1 and the MLSS (r = 0.71) [15]. Our results indicated the OBLA of 4.0 mmol.L-1 overestimated the MLSS across all GXTs. Thus, in agreement with previous research, our results indicate that the OBLA of 4.0 mmol.L-1 does not accurately estimate the MLSS. It is also worth noting that the original authors cautioned against the use of this OBLA method, given the lack of a significant correlation when comparing OBLA methods from a GXT and the MLSS [24].
The RCP derived from an 8- to 12-minute GXT consistently overestimates the MLSS [44,53], and this was confirmed in our study (Table 1). Therefore, we used a regression equation (Eq 5) to estimate the MLSS from the RCP (RCP MLSS) (Table 3). Nonetheless, for many participants the difference between MLSS and RCP MLSS exceeded the CV% for the MLSS (Fig 3). Therefore, although the RCP MLSS can be used as a convenient 'starting point' when establishing the MLSS, we recommend methods based on blood sampling from the current study and assessing blood lactate kinetics in real time, as recommended by Hering et al. [58], for a more accurate estimation of the MLSS.
Although a single GXT can be used to estimate both V˙O2max and LT, the optimal test duration for each measure is different [11,13]. To address this challenge, we added a supramaximal VEB after each GXT, equivalent to that performed following GXT1, expecting all VEBs would yield similar V˙O2 values. However, the V˙O2peak values from the VEB after the longer duration GXTs underestimated the V˙O2peak from GXT1. Although the V˙O2peak values from GXT3 and GXT4 were similar to GXT1, the differences were larger than the typical coefficient of variability for V˙O2peak (CV < 3%) [59]. Our results are consistent with previous recommendations that longer duration GXTs are not optimal for establishing V˙O2peak [10,60]. Furthermore, while a VEB can be used to verify that V˙O2peak was achieved, it appears that a VEB following a prolonged GXT cannot be used to establish V˙O2max.

Mean and standard deviation of GXT duration, max power (Watts) from each GXT, maximum power from the prolonged GXT expressed as a percentage of maximum power from GXT1, and power of each VEB (Watts) from the GXTs. Relative power of the verification exhaustive bout expressed as a percentage (%) of the maximal power measured during the GXT. The subscript (i.e., 1, 3, 4, 7 or 10) refers to the stage duration (minutes) for each test.

Extending the duration of the GXT stages results in a lower W˙max [61]. This has implications for exercise prescription, as it is common in sport and exercise science research to prescribe exercise intensity as a percentage of W˙max. For example, in the present study the MLSS ranged from 63 ± 4% (range = 52 to 72%) of W˙max from GXT1 to 82 ± 4% (range = 74 to 88%) of W˙max from GXT4. Prescribing exercise in the current study cohort at a fixed percentage of W˙max (e.g., 73% of W˙max) would result in all participants exercising above the MLSS when W˙max is taken from GXT1, and below the MLSS when it is taken from GXT4. This is important as it has previously been reported that prescribing exercise relative to LT results in a more homogeneous physiological response than when exercise is prescribed relative to W˙max [62]. This also highlights why it is important to consider the GXT protocol and the method used to determine relative exercise intensity when comparing results between studies.
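The 73% example can be checked against the reported ranges with a few lines of code. This is a minimal sketch; the function name and return labels are hypothetical, while the percentages are those reported above.

```python
def relative_to_mlss(fixed_pct: float, mlss_pct_min: float,
                     mlss_pct_max: float) -> str:
    """Classify a fixed %W˙max prescription against the cohort's MLSS range.

    mlss_pct_min and mlss_pct_max are the lowest and highest individual
    MLSS values, expressed as a percentage of W˙max from a given GXT.
    """
    if fixed_pct > mlss_pct_max:
        return "above MLSS for all participants"
    if fixed_pct < mlss_pct_min:
        return "below MLSS for all participants"
    return "above for some, below for others"


print("GXT1:", relative_to_mlss(73, 52, 72))
print("GXT4:", relative_to_mlss(73, 74, 88))
```

The same nominal intensity therefore falls on opposite sides of the MLSS depending only on which GXT supplied W˙max.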

The wide range of W˙max for each GXT is also noteworthy: the W˙max range for GXT1 was 320 to 517 W and the duration ranged from 9 to 12 minutes. Had we employed a standardized GXT (e.g., 35 W increments), and assuming W˙max stayed constant, the duration would have ranged from 9 to 15 min. Applying our individualized approach to the longer duration GXTs resulted in a homogeneous duration (GXT4: 32 to 39 min), whereas a standardised approach (e.g., 35 W increments) would have resulted in a range of 27 to 46 min [57]. Thus, individualizing GXT protocol design is a useful approach to ensure homogeneous test duration [17].
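The arithmetic behind the 9 to 15 min range for a standardized GXT1 can be reproduced with a short sketch. The function name is hypothetical, and the calculation assumes 1-min stages, a start at 0 W, and that W˙max is unaffected by the protocol, as stated above.

```python
def gxt_duration_min(w_max: float, increment_w: float,
                     stage_min: float = 1.0) -> float:
    """Test duration (min) when power rises by `increment_w` every stage."""
    return w_max / increment_w * stage_min


# Standardized protocol: 35 W per 1-min stage for every participant.
print(round(gxt_duration_min(320, 35)), round(gxt_duration_min(517, 35)))

# Individualized protocol: increment set to W˙max / 10 for each participant.
print(gxt_duration_min(320, 320 / 10), gxt_duration_min(517, 517 / 10))
```

Scaling the increment to each participant's predicted W˙max fixes the target duration by design, so differences in actual duration arise only from prediction error.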

Conclusion
In conclusion, the traditional Dmax and OBLA of 4.0 mmol.L-1 did not provide valid estimates of the MLSS. The best estimation of the MLSS was the Log-Poly-ModDmax derived from GXT4. The validity of our newly-developed ModDmax model may relate to the objectivity with which it determines the initial rise in blood lactate concentration. However, we advise caution with the use of our newly-developed method until future research investigates its reliability and reproducibility. It is apparent that V˙O2max and LT cannot both be determined in a single GXT, even if the GXT is followed by a VEB. Therefore, to appropriately determine V˙O2max, the optimum duration of a GXT is 8 to 12 minutes and the V˙O2 values measured during the GXT and VEB should be within 3% (CV) [63]. Our data also highlight how differences in GXT protocol design and methods used to calculate the relative exercise intensity may contribute to the conflicting findings reported in the literature.