## Abstract

Measurement of oxygen uptake during exercise (V̇O_{2}) is currently inaccessible to most individuals without expensive and invasive equipment. The goal of this pilot study was to estimate cycling V̇O_{2} from easy-to-obtain inputs, such as heart rate, mechanical power output, cadence and respiratory frequency. To this end, a recurrent neural network was trained on laboratory cycling data to predict V̇O_{2} values. Data were collected from 7 amateur cyclists during a graded exercise test, two arbitrary protocols (Prot-1 and -2) and an “all-out” Wingate test. In Trial-1, a neural network was trained with data from the graded exercise test, Prot-1 and the Wingate test, before being tested against Prot-2. In Trial-2, a neural network was trained with data from the graded exercise test, Prot-1 and Prot-2, before being tested against the Wingate test. Two analytical models (Models 1 and 2) were used as benchmarks for the predictive performance of the neural network. Predictive performance of the neural network was high during both Trial-1 (MAE = 229(35) mlO_{2}min^{-1}, r = 0.94) and Trial-2 (MAE = 304(150) mlO_{2}min^{-1}, r = 0.89). As expected, the predictive ability of Models 1 and 2 deteriorated from Trial-1 to Trial-2. These results suggest that recurrent neural networks have the potential to predict the individual V̇O_{2} response from easy-to-obtain inputs across a wide range of cycling intensities.

**Citation: **Zignoli A, Fornasiero A, Ragni M, Pellegrini B, Schena F, Biral F, et al. (2020) Estimating an individual’s oxygen uptake during cycling exercise with a recurrent neural network trained from easy-to-obtain inputs: A pilot study. PLoS ONE 15(3):
e0229466.
https://doi.org/10.1371/journal.pone.0229466

**Editor: **Juan M. Murias, University of Calgary, CANADA

**Received: **October 21, 2019; **Accepted: **February 6, 2020; **Published: ** March 12, 2020

**Copyright: ** © 2020 Zignoli et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **Data and Python Code are available from Kaggle: https://www.kaggle.com/andreazignoli/cycling-vo2

**Funding: **AZ received a research grant from the Fondazione Cassa di Risparmio di Trento e Rovereto (CARITRO, https://www.fondazionecaritro.it/) (grant 2017.0379). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors declare that they have no relationship or interest that could bias or influence the work.

## 1 Introduction

Aerobic metabolism, measured universally via oxygen uptake (V̇O_{2}) [1], is the principal mechanism by which humans generate energy from ingested foodstuffs to sustain life. Physical activity demands additional O_{2} delivery to the working muscles, which is matched by the cardiopulmonary system to limit reliance on the less efficient anaerobic pathways. The V̇O_{2} kinetics, the maximal attainable V̇O_{2} (i.e. V̇O_{2MAX}) and the V̇O_{2} required for sub-maximal activities are highly related to health, fitness and exercise performance [2,3]. Direct measurement of V̇O_{2} requires expensive, invasive and fragile instrumentation, such as metabolimeters [4]. As a consequence, the study of exercising V̇O_{2} is mostly confined to laboratories and clinics. During outdoor activities, wearing and carrying a metabolimeter can put both the athlete and the instrumentation at risk. Therefore, estimating V̇O_{2} without reliance on a metabolimeter would be highly useful for a number of performance assessment applications.

Typically, when a metabolimeter is not available, the steady-state value of V̇O_{2} is estimated from heart rate (HR). However, this methodology has limitations [5]. For example, at very low and very high exercise intensities, the HR-V̇O_{2} relationship is not linear. Furthermore, heart rate is affected by a high day-to-day variability [6]. Alternatively, V̇O_{2} can be estimated directly through its relationship with mechanical power output (P). Indeed, cycling is a repetitive and easily testable activity in which the mechanical power output can be measured directly and reliably using a power meter [7], or even estimated using simple energetic relationships [8].

However, like heart rate, V̇O_{2} does not respond promptly to variations in mechanical power output, and V̇O_{2} dynamics must be taken into account [9]. The three distinct phases of the V̇O_{2} dynamics are: 1) a cardio-dynamic phase-I, 2) a fundamental phase-II and 3) a slow phase-III component. Whilst phases I and II are always present in V̇O_{2} responses to step-changes in the workload, phase-III only becomes discernible at heavy and severe exercise intensities [10]. If the V̇O_{2} dynamic is considered to be linear (an assumption that has been questioned multiple times [11–13]), a first-order model can be used to roughly approximate the V̇O_{2} at the next instant k+1 (i.e. V̇O_{2}(k+1)) from the V̇O_{2} and mechanical power output (in Watt) at the current instant k (i.e. V̇O_{2}(k) and P(k)) (Fig 1A):

V̇O_{2}(k+1) = V̇O_{2}(k) + (Δt(k)/τ)·(G·P(k) + V̇O_{2R} − V̇O_{2}(k))

where G is the “gain” [14], τ is the time constant, V̇O_{2R} is the resting V̇O_{2} [15] and Δt(k) is the time that separates the two instants k and k+1. This formulation has some shortcomings, e.g.: 1) G and τ change across exercise intensity domains [16] (or with transitions from greater baseline intensities [17]) and cadences (ω) [18], 2) prolonged exercise affects the relationship between G and P [19] and 3) the V̇O_{2} response to exercise is affected by recent exercise history [20]. Such a description can be improved by including those features known to be relevant or related to V̇O_{2}, e.g.: current and past values of mechanical power output, pedalling cadence, heart rate and respiratory frequency (RF).
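As an illustration, the first-order model can be stepped forward in time with a simple Euler loop. This is a minimal sketch with illustrative parameter values (G, τ and the resting V̇O_{2} here are placeholders, not the study's calibrated estimates):

```python
import numpy as np

def vo2_first_order(P, dt, G=10.0, tau=45.0, vo2_rest=300.0):
    """Euler integration of the first-order model:
    VO2(k+1) = VO2(k) + dt/tau * (G*P(k) + VO2_rest - VO2(k)).
    P in W, dt and tau in s, G in mlO2 min^-1 W^-1, VO2 in mlO2 min^-1."""
    vo2 = np.empty(len(P) + 1)
    vo2[0] = vo2_rest                       # start from resting VO2
    for k in range(len(P)):
        vo2[k + 1] = vo2[k] + dt / tau * (G * P[k] + vo2_rest - vo2[k])
    return vo2

# Step to 150 W: VO2 rises exponentially toward G*150 + VO2_rest = 1800
step = np.full(600, 150.0)                  # 600 s at 150 W, dt = 1 s
trace = vo2_first_order(step, dt=1.0)
```

With a time constant of 45 s, the simulated response reaches its steady state well within the 600-s bout, which is exactly the behaviour (and the limitation) the text describes: a single exponential with fixed G and τ.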

A schematization of the analytical equations approach (A) and the artificial intelligence (AI) approach (B) is given. A: the current values of the power (P) and oxygen uptake (V̇O_{2}) (i.e. P(k) and V̇O_{2}(k), respectively) are used in a formula (i.e. ΔV̇O_{2}e^{-t/τ}, with ΔV̇O_{2} calculated as the difference between the steady-state and the current V̇O_{2}) to forecast future values of V̇O_{2} (i.e. V̇O_{2}(k+1)). B: in an AI approach to the time-series problem, current (k) and past values (k-1, k-2, … k-n) of heart rate (HR), P and cadence (ω) are used to forecast future values of V̇O_{2}.

The problem of forecasting V̇O_{2} data starting from observations of other variables taken sequentially in time can be considered a time series prediction problem. Analytical equations of the V̇O_{2} dynamics have limited capacity to accurately model such complex data without requiring very large and complex formulations. An alternative approach may be found among the artificial intelligence technologies [21]. In particular, machine learning algorithms can be used to learn from data and individuate patterns of variation between variables (Fig 1B). A machine learning algorithm that considers time sequences can be implemented by means of so-called artificial neural networks, biologically-inspired computational systems that mathematically formalize the connections among and within layers of artificial neurons [22]. Artificial neurons receive one or more inputs and sum them to provide an output. Inside the neuron, each input is weighted, and the weighted sum is passed through a non-linear activation function. Given a sufficient number of layers and neurons, a neural network can always be trained (i.e. the weights of the neural network are calibrated) to approximate a real relationship between inputs and outputs [23].

Examples of the application of artificial intelligence to time series problems include financial time series forecasting [24], as well as arrhythmia detection from ECG signals [25]. Recurrent neural networks and, in particular, long short-term memory networks [26], are well suited for time series forecasting and sequence problems [27]. Unlike feed-forward neural networks, recurrent neural networks make use of an internal memory to process sequences of inputs. This is a very important property when the prediction of the neural network depends on the historical context of the inputs.

With respect to the field of cycling performance, for example, an artificial intelligence approach has been proposed for training data processing [28]. In the field of exercise physiology, a neural network has been developed [29] to model the heart rate versus mechanical power output relationship. With this neural network, it was possible to find the heart rate associated with the anaerobic threshold non-invasively in soccer players. In V̇O_{2} estimation, Laitinen & Rasanen [30] used a neural network to estimate V̇O_{2} in children with congenital heart disease from inputs like heart rate and blood pressure. However, the accuracy attained suggested that the predictive power of their neural network was “insufficient” at that time. In 2017, Gonzalez et al. [31] presented an accurate mathematical description of V̇O_{2} dynamics during high-intensity variable cycling exercise, and the same authors suggested that a neural network could “perform even better” than their analytical model [32]. More recently, machine learning has been used to predict V̇O_{2} accurately during walking [33] and during different daily activities [34], including cycling [35].

In light of the promise of artificial intelligence technologies [21], the purpose of this pilot study was to predict the individual V̇O_{2} response during high-intensity cycling exercise starting from easy-to-obtain inputs. We hypothesised that a recurrent neural network could provide accurate individualised V̇O_{2} predictions across a variety of exercise conditions. To highlight the potential of this methodology, we compared the predictive accuracy of the neural network with that of a first-order kinetics equation and a previously published higher-order model.

## 2 Methods

### 2.1 Experimental data

Seven recreational cyclists (6 males, Table 1) participated in the study and visited the laboratory on three separate occasions. The ethics committee of the Department of Neurological and Movement Sciences of the University of Verona approved the study.

The participants gave informed consent and the research was conducted in accordance with the Declaration of Helsinki. All tests were performed on an electromagnetically-braked bicycle ergometer (Excalibur Sport, Lode). Measurements of mechanical power output and pedalling cadence were collected continuously. Respiratory measurements, such as V̇O_{2} and respiratory frequency, were collected breath-by-breath using an automated open-circuit gas analyser (Quark CPET, Cosmed). Immediately before every test session, the gas analyser and the flow meter were calibrated. Invalid breaths (i.e. those lying outside the following min-max ranges: respiratory frequency (b/min) 2–90; ventilation (L) 0.100–10000; FeO_{2} (%) 5–20; FeCO_{2} (%) 1–10) were automatically removed in real-time by the CPET software. Heart rate was recorded continuously (beat-by-beat) during the test with a heart rate monitor incorporated into the gas analyser. Heart rate was interpolated and provided at the breath-by-breath time sequence by the metabolimeter.

During the first visit, participants underwent a graded exercise test (GXT) for aerobic assessment. V̇O_{2} and respiratory frequency data were averaged over 4 min at rest, to give a resting metabolic rate (V̇O_{2R}) and a resting respiratory frequency (RF_{R}), respectively. Participants warmed up for 10 minutes at 85 W and a freely chosen pedalling cadence. The graded exercise test started at a workload of 100 W for 4 min and, subsequently, the workload increased by 40 W every 4 min until exhaustion. The pedalling cadence during the test was kept constant at 90 rpm, using a monitor that provided participants with visual feedback. The peak power output (PPO) of the test was determined using the power of the last completed stage and the time of the last uncompleted stage [36]. The maximal oxygen uptake (V̇O_{2MAX}) and the maximal respiratory frequency (RF_{MAX}) were defined as the highest values of V̇O_{2} and respiratory frequency registered during the test over a 20-s rolling average [37]. The first ventilatory threshold (VT1) was determined from visual inspection of: 1) the first disproportionate increase in minute ventilation (VE); 2) an increase in VE/V̇O_{2} with no increase in VE/V̇CO_{2} (where V̇CO_{2} is the exhaled volume of CO_{2}); 3) an increase in end-tidal O_{2} tension with no consequent fall in end-tidal CO_{2} tension. The second ventilatory threshold (VT2) was determined from: 1) the second disproportionate increase in minute ventilation; 2) the first systematic increase in VE/V̇CO_{2}; 3) the first systematic decrease in end-tidal CO_{2} tension [38,39]. To account for the differences that exist in the V̇O_{2} versus power output relationship in graded versus constant exercise [9], the power values expected to elicit the V̇O_{2} associated with the first and second ventilatory thresholds were estimated using the equations established by Kuipers et al. [40].
We are aware that graded exercise testing protocols can influence the V̇O_{2} versus power output relationship [41], hence reducing the validity of the specific power values associated with the first and second ventilatory thresholds, i.e. P_{VT1} and P_{VT2}, respectively. P_{VT1} and P_{VT2} were obtained by considering the power output of the last completed stage and the time spent in the current stage when the ventilatory threshold occurred (e.g. a threshold at 18 min gives 240 W: 220 W completed at 16 min, plus 2 min/4 min × 40 W).

Three mechanical power output levels were defined for every participant as follows: moderate intensity P_{1} = 0.5·P_{VT1}, heavy intensity P_{2} = P_{VT1} + 0.5·(P_{VT2} − P_{VT1}) and severe intensity P_{3} = P_{VT2} + 0.5·(PPO − P_{VT2}) (**Table 1**). After a recovery period of 1 hour, participants performed a 30-s Wingate test (WG) on a mechanically braked cycle ergometer (Ergomedic 894-Ea, Monark). During this test, the highest mechanical power output P_{MAX} and the maximal cadence ω_{MAX} were determined.
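The threshold-power interpolation and the three intensity targets described above can be sketched as follows (the numbers in the usage lines are illustrative, not participant data):

```python
def stage_power(p_completed, t_into_stage, stage_len=240.0, increment=40.0):
    """Interpolated power at a ventilatory threshold: power of the last
    completed stage plus the completed fraction of the current stage.
    Times in seconds; e.g. a threshold at 18 min of the GXT is
    220 W (completed at 16 min) + 120 s / 240 s * 40 W = 240 W."""
    return p_completed + t_into_stage / stage_len * increment

def intensity_levels(p_vt1, p_vt2, ppo):
    """Moderate, heavy and severe target powers P1, P2, P3."""
    p1 = 0.5 * p_vt1                       # moderate: half of P_VT1
    p2 = p_vt1 + 0.5 * (p_vt2 - p_vt1)     # heavy: midway between thresholds
    p3 = p_vt2 + 0.5 * (ppo - p_vt2)       # severe: midway between P_VT2 and PPO
    return p1, p2, p3

p_vt = stage_power(220.0, 120.0)           # 240.0 W, the worked example above
levels = intensity_levels(160.0, 240.0, 320.0)
```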

During the second and third visits to the laboratory, athletes performed a warm-up for 10 min at a constant power of 85 W and rested for 4 minutes, before performing two different protocols on separate days. The first protocol (Test 1) consisted of a constant mechanical power output of 100 W for 4 minutes, followed by three repetitions of three constant 4-minute bouts at P_{2}, P_{3} and P_{1} (please see the additional material for a graphical representation of the protocol). The second protocol (Test 2) started with a linear increase in mechanical power output (i.e. a ramp) from P_{1} to P_{3} in 4 min. The initial ramp was followed by: a 1-min bout at P_{3}, a 4-min bout at P_{1}, a 1-min ramp from P_{1} to P_{2}, a constant 3-min bout at P_{2}, a 4-min bout at P_{1}, a 2-min ramp from P_{1} to P_{2}, a constant 2-min bout at P_{2} and a 4-min bout at P_{1} (please see the additional material for a graphical representation of the protocol). These two arbitrary protocols were designed to elicit different V̇O_{2} dynamics and facilitate the convergence of the parameter estimation.

### 2.2 Dataset preparation

Metabolic and power data were synchronized in time in a post-processing phase. In particular, mechanical power output and pedalling cadence signals were resampled to the breath-by-breath frequency of the cardiopulmonary data. Data were normalized between 0 and 1 to facilitate convergence during parameter optimization [42]. V̇O_{2} data were set to 1 if they matched V̇O_{2MAX} and 0 if they matched V̇O_{2R}. Respiratory frequency data were set to 1 if they matched RF_{MAX} and 0 if they matched RF_{R}. Mechanical power output data were set to 1 if they matched PPO and 0 if they matched zero, while pedalling cadence data were set to 1 if they matched ω_{MAX} and 0 if they matched zero.
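The normalization described above amounts to a min-max mapping of each signal's reference extremes to 0 and 1. A minimal sketch, with made-up reference values:

```python
import numpy as np

def minmax(x, lo, hi):
    """Map x linearly so that lo -> 0 and hi -> 1
    (e.g. VO2_R -> 0 and VO2_MAX -> 1 for the VO2 channel)."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

# Illustrative reference values: VO2_R = 300, VO2_MAX = 3900 mlO2 min^-1, PPO = 340 W
vo2_norm = minmax([300.0, 2100.0, 3900.0], lo=300.0, hi=3900.0)
power_norm = minmax([0.0, 170.0, 340.0], lo=0.0, hi=340.0)
```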

Past input values were included and used for predicting the output values. As a result, the input **x** and the output **y** for our machine learning algorithms became:

**x**(k) = [HR(k−n+1), …, HR(k); P(k−n+1), …, P(k); ω(k−n+1), …, ω(k); RF(k−n+1), …, RF(k)], **y**(k) = V̇O_{2}(k+1)

Therefore, the shape of the input was n×4, while the shape of the output was 1×1. This means that every single exercise provided a number of samples equal to the total number of breaths N minus the number of past inputs n (i.e. N−n). While N was determined by the duration of the exercise, a value of n = 70 breaths was adopted as a good estimate of the decay of the time-dependence between the output and past values of the inputs. This implied that the machine learning algorithm could hypothetically learn relationships between inputs and outputs spanning up to 70 breaths. This number was chosen because it provided the best combination of computational time and prediction accuracy.
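The construction of the n×4 input windows and 1×1 outputs can be sketched as follows (random toy data; function and variable names are illustrative):

```python
import numpy as np

def make_samples(hr, p, cad, rf, vo2, n=70):
    """Stack the 4 normalized input channels and cut them into overlapping
    windows of the n most recent breaths; the target is the VO2 of the
    breath following each window. Returns inputs of shape (N-n, n, 4)
    and outputs of shape (N-n, 1)."""
    X_full = np.stack([hr, p, cad, rf], axis=1)              # shape (N, 4)
    X = np.stack([X_full[k - n:k] for k in range(n, len(vo2))])
    y = np.asarray(vo2)[n:].reshape(-1, 1)
    return X, y

# Toy example: N = 100 breaths and n = 70 past inputs give N - n = 30 samples
N = 100
rng = np.random.default_rng(0)
hr, p, cad, rf, vo2 = rng.random((5, N))
X, y = make_samples(hr, p, cad, rf, vo2, n=70)
```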

The entire dataset was split into 2 sub-datasets: the training set and the test set. The training set included 3 of the 4 tests performed by every cyclist and was used to adjust the weights of the neural network. The test set included the remaining test and was used to confirm the actual predictive power of the neural network. In a first Trial (Trial 1), the training set included the graded exercise test, Wingate test and Test 2, while the test set included Test 1. In a second Trial (Trial 2), the training set included the graded exercise test, Test 1 and Test 2, while the test set included the Wingate test.

### 2.3 Neural network design

An artificial intelligence regressor was developed and used to predict values of V̇O_{2}. The neural network was coded in Python (ver. 3.6, Python Software Foundation), a high-level programming language for general-purpose programming. In particular, the open-source library Keras was adopted to design and test the neural network. The neural network was created using a TensorFlow *backend* with *cuda* support (2xNVidia GT750M i74xxx). A summary of the model is given in **Table 2**.

Three LSTM layers of 32 neurons were used, with 1 hidden layer of 10 feed-forward neurons and 1 output layer of 1 neuron. The input shapes for the LSTM layers were determined from the batch size (10), the number of past inputs considered in the time series (70) and the number of neurons in the layer.

The neural network was composed of long short-term memory (LSTM) neurons [26], which are well suited for time-series analyses and sequence detection [27]. A total of 3 LSTM layers of 32 neurons each were used, plus 1 hidden layer of 10 neurons and 1 output layer of 1 neuron. The neural network was trained with a stochastic gradient method (*adagrad*) minimising a categorical cross-entropy loss. The training dataset entries were shuffled and the whole dataset was traversed in 20 epochs. The weights were initialized with random values, while biases were initialised with random positive values. The batch size (which defines the number of samples propagated through the neural network at each update) was set to 10.

There are no specific and scientifically proven steps to be followed in the design of a neural network. However, the choice of the number of layers, the number of neurons, the number of epochs and the batch size is known to affect the accuracy of the output and the computational time. Therefore, to select these parameters, we proceeded by trial and error until the best combination of accuracy and computational time was found. The final architecture, with 3 long short-term memory layers, has been shown to work well in other time-series classification problems using physiological data [43].
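The architecture summarised in Table 2 can be sketched in tf.keras roughly as follows. This is an illustrative reconstruction, not the study's exact code: the loss here is mean squared error, a common default for a continuous regression target, which may differ from the training configuration reported above.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_past, n_features = 70, 4   # 70 past breaths, 4 input channels (HR, P, cadence, RF)

model = keras.Sequential([
    keras.Input(shape=(n_past, n_features)),
    layers.LSTM(32, return_sequences=True),   # stacked LSTMs pass the full sequence on
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(32),                          # final LSTM returns its last state only
    layers.Dense(10, activation="relu"),      # 10-neuron feed-forward hidden layer
    layers.Dense(1),                          # single normalized VO2 output
])
# Adagrad optimizer as reported; MSE is an assumed stand-in regression loss.
model.compile(optimizer="adagrad", loss="mse")
```

A call like `model.fit(X, y, batch_size=10, epochs=20, shuffle=True)` would then mirror the training settings described above.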

### 2.4 The analytical models

Two analytical models for V̇O_{2} data prediction were used to benchmark the predictive power of the neural network. The two candidate models were chosen because they had already been tested in the prediction of V̇O_{2} data from mechanical power output data in cycling [32].

Parameters of the models were calculated using a particle swarm optimization algorithm [44], with the goal of finding the model parameters that best matched the experimental data (in the least-squares sense). The number of iterations was fixed at 250, a number that was found to provide stable solutions in a reasonable amount of time. The particle swarm optimization algorithm was implemented and run in the Matlab (ver. 2017a, Mathworks) numerical environment as follows:

- Model 1: the V̇O_{2} dynamics were approximated using the first-order equation given in the Introduction.
- Model 2: the complete description of this model can be found in the original article [31]. Dynamics equations of the model are reported in the Appendix using a formulation best suited for spreadsheets.
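For readers without access to the Matlab implementation, a generic particle swarm optimizer follows the scheme below. This is a minimal illustrative sketch in Python (the inertia and acceleration coefficients are typical textbook values, not those used in the study), shown here minimising a toy quadratic in place of the least-squares model loss:

```python
import numpy as np

def pso(loss, bounds, n_particles=30, n_iter=250, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each particle tracks its personal
    best, the swarm tracks a global best, and velocities blend inertia with
    attraction toward both bests. `bounds` is a (dim, 2) sequence."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    x = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_particles, len(bounds)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([loss(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2,) + x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, bounds[:, 0], bounds[:, 1])
        f = np.array([loss(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Toy stand-in for the least-squares loss: true optimum at G = 10, tau = 45
best, best_f = pso(lambda p: (p[0] - 10.0) ** 2 + (p[1] - 45.0) ** 2,
                   bounds=[(0.0, 20.0), (0.0, 100.0)])
```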

### 2.5 Statistics

To assess the prediction ability of the different models, a residual analysis was conducted. Residuals were calculated as the difference between the experimental V̇O_{2} values and the values predicted by the models. The mean absolute error (MAE) and root mean squared error (RMSE) of the residuals were calculated. A regression analysis provided a Pearson’s correlation coefficient *r* and a variance-explained *R*^{2} statistic for the fit of each output. A Bland-Altman analysis [45] was used to assess the level of agreement between measured and predicted V̇O_{2} data. The mean bias and the limits of agreement at 95% probability (LA_{95%}) were calculated. The bias was considered significant if the equality line fell outside the confidence intervals (CI_{95%}) of the mean bias for the sample. The confidence limits of the mean bias were calculated with the significance level set to 0.05. Additionally, following best practice, we defined *a priori* a meaningful level of maximal acceptable limits. This limit was set to 200 mlO_{2}min^{-1}, which, in our experience, is comparable with the magnitude of the typical noise underlying V̇O_{2} measurements at high exercise intensities.
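The residual and Bland-Altman summary statistics described above can be computed as follows (the usage line uses toy numbers; LA_{95%} is taken as bias ± 1.96 SD of the differences):

```python
import numpy as np

def agreement_stats(measured, predicted):
    """Residual and Bland-Altman summary: MAE, RMSE, mean bias and the
    95% limits of agreement (bias +/- 1.96 * SD of the differences)."""
    m, p = np.asarray(measured, float), np.asarray(predicted, float)
    d = m - p                               # residuals: measured minus predicted
    sd = d.std(ddof=1)                      # sample SD of the differences
    return {
        "MAE": np.mean(np.abs(d)),
        "RMSE": np.sqrt(np.mean(d ** 2)),
        "bias": d.mean(),
        "LoA": (d.mean() - 1.96 * sd, d.mean() + 1.96 * sd),
    }

stats = agreement_stats([1000.0, 2000.0, 3000.0, 4000.0],
                        [1100.0, 1900.0, 3100.0, 3900.0])
```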

An autocorrelation analysis quantified the strength of the relationship between a residual and the residuals at prior time steps. An autocorrelation consistently falling outside the confidence bands meant that the model failed to incorporate important relationships between the current output and past values of the inputs. Confidence bands were calculated with the significance level set to 0.05 [46].
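The sample autocorrelation of the residuals, together with the approximate 95% white-noise band of ±1.96/√N, can be sketched as:

```python
import numpy as np

def autocorr(residuals, max_lag=40):
    """Sample autocorrelation of the residuals up to max_lag and the
    approximate 95% white-noise confidence band +/- 1.96/sqrt(N)."""
    r = np.asarray(residuals, float) - np.mean(residuals)
    n = len(r)
    denom = np.sum(r ** 2)
    acf = np.array([np.sum(r[:n - k] * r[k:]) / denom for k in range(max_lag + 1)])
    band = 1.96 / np.sqrt(n)
    return acf, band

# White-noise residuals should stay inside the band at almost every lag
rng = np.random.default_rng(1)
acf, band = autocorr(rng.standard_normal(500))
```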

## 3 Results

Training the neural network required approximately 30 min (PC equipped with 2xTitan i75900), while the calibration of Models 1 and 2 (particle swarm optimization) required 10 min and 20 min, respectively (MacBook Pro, 2.8 GHz Intel Core i7). Testing the models required a few seconds per simulation.

In Trial 1, after the particle swarm optimization, the mean values of the parameters of Model 1 were: G = 10.07 (0.85) mlO_{2}min^{-1}W^{-1} and τ = 45 (3) s. Values for Model 2 are reported in the Appendix. For the neural network and Models 1 and 2, the results of the residual and Bland-Altman analyses are presented collectively in **Table 3** for both experimental Trials. The performance of the neural network in Trials 1 and 2 is shown in **Fig 2A** and **2B** for a representative participant.

**A)** The performance of the regressor is shown for a single representative athlete during Test 1. Experimental data (circles) of oxygen uptake (V̇O_{2}) are reported. Predicted values of V̇O_{2} (solid line) are superimposed on the experimental data. In this example, the MAE was 0.028 (i.e. 164 mlO_{2}min^{-1}), with a RMSE of 0.04 (i.e. 229 mlO_{2}min^{-1}). **B)** The performance of the regressor is shown for a single representative athlete during a WG test. Experimental data (circles) of oxygen uptake (V̇O_{2}) are reported. Predicted values of V̇O_{2} (solid line) are superimposed on the experimental data. In this example, the MAE was 0.03 (i.e. 176 mlO_{2}min^{-1}), with a RMSE of 0.05 (i.e. 294 mlO_{2}min^{-1}). Please see the S1 Material for the other individuals’ responses.

In Trial 1 we compared predicted and experimental data during a variable high-intensity exercise. In Trial 2 we compared predicted and experimental data during a brief 30” “all-out” Wingate test.

The residual analysis in Trial 1 shows that the predictive power of the neural network was significantly superior to that of the other models, as seen in the smaller mean absolute error and root mean squared error and the higher correlation coefficient and variance explained (**Table 3**). For both the neural network and Models 1 and 2, the Bland-Altman analysis of measured versus predicted V̇O_{2} showed no proportional bias, with differences unrelated to the magnitude of the measurement. In the case of the neural network, the bias was not significant. For Model 1, the equality line fell outside the confidence intervals of the mean bias of the sample and outside the limits of 200 mlO_{2}min^{-1}. Model 2 performed slightly better: the equality line fell outside the confidence intervals of the mean bias of the sample but inside the limits of 200 mlO_{2}min^{-1}. For the neural network, the autocorrelation analysis suggested that there was no significant autocorrelation between observations and lagged observations. In fact, the autocorrelation consistently stayed within the confidence bands. In the case of Models 1 and 2, the autocorrelation fell outside the confidence bands for the initial portion of the signal.

During Trial 2, the residual analysis highlighted that the neural network could accurately predict the actual V̇O_{2} data during both the ascending and descending phases of the V̇O_{2} evolution. On the contrary, Models 1 and 2 did not predict the additional V̇O_{2} required after high-intensity exercise. This is confirmed by the high values of the correlation coefficient and variance explained for the predictions of the neural network (**Table 3**). The Bland-Altman analysis suggested that, in the case of the regressor, the bias was not significant. On the other hand, for Model 1, the equality line fell outside the confidence intervals of the mean bias of the sample and outside the limits of 200 mlO_{2}min^{-1}. Model 2 performed slightly worse: the equality line fell outside the confidence intervals of the mean bias of the sample and outside the limits of 200 mlO_{2}min^{-1}. The Bland-Altman analysis suggested that, in the case of Models 1 and 2, the biases were significant. The autocorrelation analysis for the predicted values of the neural network showed that there was no significant autocorrelation between observations and lagged observations. When predictions were made with Models 1 and 2, the autocorrelation analysis highlighted that these models failed to incorporate important relationships between current and past input values. In fact, a consistent portion of the autocorrelation lay outside the confidence bands.

## 4 Discussion

We hypothesized that a recurrent neural network approach could be used to accurately predict individual cycling V̇O_{2} data from easy-to-collect inputs [21]. Indeed, the mechanical power output and the pedalling cadence are both easily collectable with portable power meters [7], while heart rate and respiratory frequency are both measurable with chest belts [47,48] and have already been successfully used by Beltrame et al. [49] for the prediction of V̇O_{2} from wearable sensors.

The ability of neural networks to model complex data was already known, and other, more basic learners could have been used (e.g. k-nearest-neighbours or support vector machines). While simpler learners like Hammerstein-Wiener models have already been tested [32,50], we are not aware of any existing application of k-nearest-neighbour or support vector machine methods to the prediction of V̇O_{2} during high-intensity cycling.

However, the major innovation of our neural network lies in the long histories of input values it considers (reflected in the number of input neurons). Laitinen & Rasanen [30] used a neural network with 14 input neurons, one hidden layer of 4 neurons and one output neuron for V̇O_{2}. Beltrame et al. [33] used a neural network with 7 input neurons, one hidden layer of 11 neurons and one output neuron. Both studies only used current inputs and not past values. Beltrame et al. [49] used only 1 s of lag to include “dynamic changes” of heart rate. Very recently, Borror et al. [35] presented a neural network that can predict V̇O_{2} in cycling, with a workflow similar to ours. They included body mass, mechanical power output, pedalling cadence and heart rate as inputs. However, in their work, no past input values are passed to the neural network, and the heart rate dynamics is only considered by means of its “time derivative”. In our neural network, there were 3 hidden layers of 32 long short-term memory neurons each, one hidden layer of 10 neurons and one output neuron. The long short-term memory neurons adopted are particularly suited for time-series analysis [26]. To the best of our knowledge, we are the first to apply recurrent neural networks to the prediction of cycling V̇O_{2}.

The predictive power of the neural network was very high during Trial 1, as measured and predicted V̇O_{2} showed a nearly perfect agreement (MAE = 229 (35) mlO_{2}min^{-1}, *r* = 0.94). The performances of Models 1 and 2, although inferior, were still good in Trial 1 (MAE = 355 (86) mlO_{2}min^{-1}, *r* = 0.83 and MAE = 273 (49) mlO_{2}min^{-1}, *r* = 0.90 for Models 1 and 2, respectively). This means that the performances of the neural network and Models 1 and 2 were very close. However, a robust model should predict V̇O_{2} data across a wide range of scenarios. To this end, we tested our models on a short “all-out” sprint effort (Wingate test). Importantly, Models 1 and 2 were not designed specifically for the Wingate test and have a limited number of tunable parameters. In fact, the number of parameters may limit the number of physiological mechanisms that can be mathematically described. The neural network, on the other hand, thanks to the considerable number of parameters used, can work well across a wider range of exercising scenarios.

In Model 1, a single phase is included and characterised by the parameter τ. The time constant τ and the oxygen gain G are, in Model 1, constant across all exercise intensities. Therefore, it becomes difficult to predict experimental values of V̇O_{2} during brief “all-out” exercises [51]. In Model 2, the parameter T_{I} (see Appendix) has been included to account for the delayed component that sums to the principal element. However, if the duration of the exercise is shorter than T_{I} (e.g. the Wingate test lasts 30 s), then this additional component is not activated. In Model 2, the high number of parameters affected the confidence of the parameter estimation, as confirmed by the large variability of the parameter estimates. The means and standard deviations of the parameters found with our experimental data were compatible with those reported in the original article [31] (Appendix).

On one hand, the high predictive power of the neural network, although reduced, was remarkably well conserved during the Wingate test (Trial 2: MAE = 304(150) mlO_{2}min^{-1}, *r* = 0.89). This means that the dataset used to train the neural network for every participant was large enough, in terms of duration of the exercises, to provide good predictive power. Further studies are needed to establish the minimal amount of data needed to train a neural network while retaining a high predictive ability. On the other hand, the performances of Models 1 and 2 deteriorated during the Wingate test (Trial 2: MAE = 391(71) mlO_{2}min^{-1}, *r* = 0.75 and MAE = 463(112) mlO_{2}min^{-1}, *r* = 0.48 for Models 1 and 2, respectively). It can be noticed (**Fig 2B**) that a small lag is present between the measurements and the predictions. This might be because two of the inputs used by the neural network (i.e. respiratory frequency and heart rate) did not promptly react to abrupt changes in power output. However, the autocorrelation analysis showed that our neural network could incorporate relevant relationships between current and past input values, whereas Models 1 and 2 could not. Due to the reduced number of parameters of Models 1 and 2, their predictive power does not heavily depend on the amount of data used to calibrate the parameters. We suggest that the performance of the analytical models, although inferior, is preserved even if smaller datasets are used for their calibration. We did not investigate the influence of the size of the training set on the performance of the neural network, but we believe that the performance would deteriorate with smaller and smaller training datasets. This is a first limitation of a neural network approach: it relies on large datasets.

The second main limitation of the neural network method lies in its “black box” nature. It is unlikely that we can understand how the non-linearities of the V̇O_{2} dynamics are represented inside the neural network. Additionally, our exercises were carried out in a laboratory environment and were limited in time (max duration ~1400 s). In practical settings (e.g. training and races), cardiovascular drift could mislead our estimations. The use of long short-term memory neurons makes it difficult to understand which variables contribute the most to the total estimation. In our study, the pedalling cadence was kept constant, so the contribution of this variable is likely limited. Likewise, respiratory frequency, as an index of the ventilatory response to exercise, has an important link with V̇O_{2}, while heart rate has additional known associations with exercising V̇O_{2} [5,52].
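One generic way to probe which inputs a black-box model relies on is permutation importance: shuffle one input column at a time and measure how much the error grows. The sketch below illustrates the idea on a toy linear model; the model, data and variable names are assumptions for illustration, not part of the study.

```python
import random

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def permutation_importance(predict, X, y, feature, seed=0):
    """Increase in MAE when one input column is shuffled: a model-agnostic
    probe of which inputs a black-box predictor relies on (a sketch, not
    the analysis performed in the paper)."""
    base = mae([predict(row) for row in X], y)
    rng = random.Random(seed)
    col = [row[feature] for row in X]
    rng.shuffle(col)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, col)]
    return mae([predict(row) for row in X_perm], y) - base

# Toy model that depends strongly on feature 0 and weakly on feature 1.
predict = lambda row: 10.0 * row[0] + 0.1 * row[1]
X = [[float(i), float(i % 7)] for i in range(50)]
y = [predict(row) for row in X]
imp_strong = permutation_importance(predict, X, y, 0)
imp_weak = permutation_importance(predict, X, y, 1)
# Shuffling the influential input hurts far more than shuffling the weak one.
```

For a recurrent network, the same shuffling would be applied to whole input channels (e.g. the heart-rate trace) so that temporal structure within the other channels is preserved.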

Even though we investigated several different exercise conditions (i.e. moderate, heavy and severe intensity and “all-out” efforts), we should interpret the results of this pilot study with caution. More work is needed before this algorithm could be embedded in a portable system able to estimate cycling V̇O_{2} in real time: the predictive ability of the neural network must be verified on a larger sample (7 cyclists is a small sample) and in different environmental conditions (e.g. outdoor). Also, including input parameters such as body mass, gender and fitness level may in the future provide even better predictive outcomes for the estimation of aerobic performance. Importantly, the ability of the neural network to predict the V̇O_{2} values of an individual who was not included in the training dataset has yet to be assessed.

## 5 Conclusions

In the context of forecasting V̇O_{2} values, the results of our pilot study suggest that a recurrent neural network can exploit large quantities of information from mechanical (such as mechanical power output and pedalling cadence) and physiological markers (such as heart rate and respiratory frequency), as well as past input values, to attain accurate predictions of cycling V̇O_{2}. Results suggest that this algorithm has the potential, in the near future, to be embedded in a portable system and to provide real-time assessment of individual cycling V̇O_{2} during training or racing.

## Appendix

The dynamics equations of Model 2 (see [31,53]) are reported with a formulation that can be readily implemented in a spreadsheet. The main difference between this model and Model 1 is that phase-II (“fast” phase) and phase-III (“slow” phase) of the V̇O_{2} dynamics are considered. Gonzalez et al. included these two additional phases with two delayed components that become active only after a given period. The principal governing equation is:

Where V̇O_{2,II} is the principal component, which is active after T_{II} and is characterized by a time constant τ_{II}.

Where A_{II}(k) can be computed as:

Where s is the gain for the fast phase. V̇O_{2,III} is the slow component of the V̇O_{2} dynamics, which activates after T_{III} and is characterized by a time constant τ_{III}.

Where A_{III}(k) can be computed as:

Where Pc is a “critical power” threshold.
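The structure described above can be sketched as a discrete-time two-component model: a fast phase with gain s and time constant τ_{II} activating after T_{II}, and a slow phase with time constant τ_{III} activating after T_{III} only above the critical power Pc. The amplitude rules and parameter values below are illustrative assumptions, not the fitted formulation of [31]:

```python
def model2_vo2(power, dt=1.0, vo2_rest=400.0, s=8.7,
               tau2=40.0, tau3=180.0, T2=10.0, T3=110.0, pc=360.0):
    """Two-component (fast + slow) VO2 kinetics in the spirit of Model 2.
    Assumed targets: s*P for the fast phase after delay T2, and
    s*max(0, P - Pc) for the slow phase after delay T3 (both assumptions).
    """
    fast, slow, out = 0.0, 0.0, []
    for k, p in enumerate(power):
        t = k * dt
        a_fast = s * p if t >= T2 else 0.0
        a_slow = s * max(0.0, p - pc) if t >= T3 else 0.0
        fast += (a_fast - fast) * dt / tau2  # forward-Euler fast phase
        slow += (a_slow - slow) * dt / tau3  # forward-Euler slow phase
        out.append(vo2_rest + fast + slow)
    return out

# A 30-s Wingate-like effort ends before T3, so the delayed slow component
# never activates; a long supra-critical effort engages both components.
wingate = model2_vo2([700.0] * 30)
long_effort = model2_vo2([500.0] * 600)
```

This makes the limitation noted in the Discussion concrete: during a 30 s test the slow component contributes nothing, and the fast component is still far from its steady-state target when the effort ends.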

In Trial-1, after PSO, the values of the parameters were (mean(SD)): = 397(398) mlO_{2}min^{-1}, Pc = 359(39) W, Δ = 79(72) W, s = 8.67(0.49) mlO_{2}min^{-1}W^{-1}, τ_{I} = 43(1.38) s, τ_{II} = 199(52) s, T_{I} = 10(6.72) s, T_{II} = 113(27) s. In Trial-2, after PSO, the values of the parameters were: = 779(445) mlO_{2}min^{-1}, Pc = 383(15) W, Δ = 64(34) W, s = 9.03(0.9) mlO_{2}min^{-1}W^{-1}, τ_{I} = 42(1.7) s, τ_{II} = 183(30) s, T_{I} = 11(4.5) s, T_{II} = 103(34) s.

## References

- 1. Jones AM, Poole DC. Oxygen uptake dynamics: from muscle to mouth–an introduction to the symposium. Medicine and Science in Sports and Exercise. 2005;37: 1542–1550. pmid:16177607
- 2. Rossiter HB. Exercise: Kinetic Considerations for Gas Exchange. In: Terjung R, editor. Comprehensive Physiology. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2010. https://doi.org/10.1002/cphy.c090010 pmid:23737170
- 3. Bassett DR Jr, Howley ET. Limiting factors for maximum oxygen uptake and determinants of endurance performance. Medicine & Science in Sports & Exercise. 2000;32: 70.
- 4. American Thoracic Society, American College of Chest Physicians. ATS/ACCP Statement on cardiopulmonary exercise testing. Am J Respir Crit Care Med. 2003;167: 211–277. pmid:12524257
- 5. Achten J, Jeukendrup AE. Heart rate monitoring: applications and limitations. Sports Med. 2003;33: 517–538. pmid:12762827
- 6. Lamberts RP, Lambert MI. Day-to-Day Variation in Heart Rate at Different Levels of Submaximal Exertion: Implications for Monitoring Training: Journal of Strength and Conditioning Research. 2009;23: 1005–1010. pmid:19387374
- 7. Passfield L, Hopker Jg, Jobson S, Friel D, Zabala M. Knowledge is power: Issues of measuring training and performance in cycling. Journal of Sports Sciences. 2017;35: 1426–1434. pmid:27686573
- 8. di Prampero P, Cortili G, Mognoni P, Saibene F. Equation of motion of a cyclist. Journal of Applied Physiology. 1979;47: 201–206. pmid:468661
- 9. Keir DA, Paterson DH, Kowalchuk JM, Murias JM. Using ramp-incremental responses for constant-intensity exercise selection. Applied Physiology, Nutrition, and Metabolism. 2018;43: 882–892. pmid:29570982
- 10. Jones AM, Grassi B, Christensen PM, Krustrup P, Bangsbo J, Poole DC. Slow Component of V˙O2 Kinetics: Mechanistic Bases and Practical Applications. Medicine & Science in Sports & Exercise. 2011;43: 2046–2062. pmid:21552162
- 11. Koga S, Shiojiri T, Shibasaki M, Kondo N, Fukuba Y, Barstow TJ. Kinetics of oxygen uptake during supine and upright heavy exercise. J Appl Physiol. 1999;87: 253–260. pmid:10409583
- 12. Keir DA, Robertson TC, Benson AP, Rossiter HB, Kowalchuk JM. The influence of metabolic and circulatory heterogeneity on the expression of pulmonary oxygen uptake kinetics in humans: Pulmonary oxygen uptake kinetics are slowed in relation to work rate. Exp Physiol. 2016;101: 176–192. pmid:26537768
- 13. Beltrame T, Hughson RL. Linear and non-linear contributions to oxygen transport and utilization during moderate random exercise in humans: Aerobic system linearity in frequency domain. Experimental Physiology. 2017;102: 563–577. pmid:28240387
- 14. Barstow TJ, Molé PA. Linear and nonlinear characteristics of oxygen uptake kinetics during heavy exercise. Journal of Applied Physiology. 1991;71: 2099–2106. pmid:1778898
- 15. Ainsworth BE, Haskell WL, Leon AS, Jacobs DR, Montoye HJ, Sallis JF, et al. Compendium of physical activities: classification of energy costs of human physical activities. Med Sci Sports Exerc. 1993;25: 71–80. pmid:8292105
- 16. Whipp B, Rossiter H, Ward S. Exertional oxygen uptake kinetics: a stamen of stamina? Biochemical Society Transactions. 2002;30: 237–247. pmid:12023858
- 17. Keir DA, Copithorne DB, Hodgson MD, Pogliaghi S, Rice CL, Kowalchuk JM. The slow component of pulmonary O_{2} uptake accompanies peripheral muscle fatigue during high-intensity exercise. Journal of Applied Physiology. 2016;121: 493–502. pmid:27339183
- 18. Abbiss CR, Laursen PB, Peiffer JJ. Optimal cadence selection during cycling. International SportMed Journal. 2009;10: 1–15.
- 19. Passfield L, Doust JH. Changes in cycling efficiency and performance after endurance exercise. Medicine & Science in Sports & Exercise. 2000;32: 1935–1941.
- 20. Burnley M, Jones AM, Carter H, Doust JH. Effects of prior heavy exercise on phase II pulmonary oxygen uptake kinetics during heavy exercise. J Appl Physiol. 2000;89: 1387–1396. pmid:11007573
- 21. Zignoli A, Fornasiero A, Bertolazzi E, Pellegrini B, Schena F, Biral F, et al. State-of-the art concepts and future directions in modelling oxygen consumption and lactate concentration in cycling exercise. Sport Sciences for Health. 2019; 1–16.
- 22. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521: 436–444. pmid:26017442
- 23. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks. 1989;2: 359–366.
- 24. Kaastra I, Boyd M. Designing a neural network for forecasting financial and economic time series. Neurocomputing. 1996;10: 215–236.
- 25. Rajpurkar P, Hannun AY, Haghpanahi M, Bourn C, Ng AY. Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks. arXiv:170701836 [cs]. 2017 [cited 4 Jun 2019]. Available: http://arxiv.org/abs/1707.01836
- 26. Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Computation. 1997;9: 1735–1780. pmid:9377276
- 27. Graves A. Generating Sequences With Recurrent Neural Networks. arXiv:13080850 [cs]. 2013 [cited 3 Jun 2019]. Available: http://arxiv.org/abs/1308.0850
- 28. Jobson SA, Passfield L, Atkinson G, Barton G, Scarf P. The Analysis and Utilization of Cycling Training Data: Sports Medicine. 2009;39: 833–844. pmid:19757861
- 29. Erdogan A, Cetin C, Goksu H, Guner R, Baydar ML. Non-invasive detection of the anaerobic threshold by a neural network model of the heart rate—work rate relationship. Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology. 2009;223: 109–115.
- 30. Laitinen PO, Rasanen J. Measured versus predicted oxygen consumption in children with congenital heart disease. Heart. 1998;80: 601–605. pmid:10065031
- 31. Gonzalez AA, Bertschinger R, Brosda F, Dahmen T, Thumm P, Saupe D. Kinetic analysis of oxygen dynamics under a variable work rate. Human movement science. 2017.
- 32. Artiga Gonzalez A, Bertschinger R, Saupe D. Modeling V̇O_{2} and V̇CO_{2} with Hammerstein-Wiener Models. 2016. pp. 134–140.
- 33. Beltrame T, Amelard R, Villar R, Shafiee MJ, Wong A, Hughson RL. Estimating oxygen uptake and energy expenditure during treadmill walking by neural network analysis of easy-to-obtain inputs. Journal of Applied Physiology. 2016;121: 1226–1233. pmid:27687561
- 34. Beltrame T, Amelard R, Wong A, Hughson RL. Extracting aerobic system dynamics during unsupervised activities of daily living using wearable sensor machine learning models. Journal of Applied Physiology. 2017; jap.00299.2017. pmid:28596271
- 35. Borror A, Mazzoleni M, Coppock J, Jensen BC, Wood WA, Mann B, et al. Predicting oxygen uptake responses during cycling at varied intensities using an artificial neural network. Biomedical Human Kinetics. 2019;11: 60–68.
- 36. Kuipers H, Verstappen F, Keizer H, Geurten P, van Kranenburg G. Variability of Aerobic Performance in the Laboratory and Its Physiologic Correlates. International Journal of Sports Medicine. 1985;06: 197–201. pmid:4044103
- 37. Robergs RA, Dwyer D, Astorino T. Recommendations for improved data processing from expired gas analysis indirect calorimetry. Sports Medicine. 2010;40: 95–111. pmid:20092364
- 38. Davis JA. Anaerobic threshold: review of the concept and directions for future research. Med Sci Sports Exerc. 1985;17: 6–21. pmid:3884961
- 39. Ahmaidi S, Hardy J, Varray A, Collomp K, Mercier J, Prefaut C. Respiratory gas exchange indices used to detect the blood lactate accumulation threshold during an incremental exercise test in young athletes. European journal of applied physiology and occupational physiology. 1993;66: 31–36. pmid:8425510
- 40. Kuipers H, Rietjens G, Verstappen F, Schoenmakers H, Hofman G. Effects of stage duration in incremental running tests on physiological variables. Int J Sports Med. 2003;24: 486–491. pmid:12968205
- 41. Pallarés JG, Morán-Navarro R, Ortega JF, Fernández-Elías VE, Mora-Rodriguez R. Validity and Reliability of Ventilatory and Blood Lactate Thresholds in Well-Trained Cyclists. Sandbakk Ø, editor. PLoS ONE. 2016;11: e0163389. pmid:27657502
- 42. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 2015. pp. 448–456.
- 43. Zignoli A, Fornasiero A, Stella F, Pellegrini B, Schena F, Biral F, et al. Expert-level classification of ventilatory thresholds from cardiopulmonary exercising test data with recurrent neural networks. European Journal of Sport Science. 2019; 1–9. pmid:30880591
- 44. Das S, Abraham A, Konar A. Particle Swarm Optimization and Differential Evolution Algorithms: Technical Analysis, Applications and Hybridization Perspectives. In: Liu Y, Sun A, Loh HT, Lu WF, Lim E-P, editors. Advances of Computational Intelligence in Industrial Systems. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. pp. 1–38. Available: http://link.springer.com/10.1007/978-3-540-78297-1_1
- 45. Bland JM, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. The lancet. 1986;327: 307–310.
- 46. Box GE, Jenkins GM. Time series analysis, control, and forecasting. San Francisco, CA: Holden Day. 1976;3226: 10.
- 47. Kim J-H, Roberge R, Powell J, Shafer A, Jon Williams W. Measurement Accuracy of Heart Rate and Respiratory Rate during Graded Exercise and Sustained Exercise in the Heat Using the Zephyr BioHarnessTM. International Journal of Sports Medicine. 2012;34: 497–501. pmid:23175181
- 48. Laukkanen RMT, Virtanen PK. Heart rate monitors: State of the art. Journal of Sports Sciences. 1998;16: 3–7. pmid:22587712
- 49. Beltrame T, Amelard R, Wong A, Hughson RL. Prediction of oxygen uptake dynamics by machine learning analysis of wearable sensors during activities of daily living. Scientific Reports. 2017;7: 45738. pmid:28378815
- 50. Su SW, Wang L, Celler BG, Savkin AV. Oxygen Uptake Estimation in Humans During Exercise Using a Hammerstein Model. Annals of Biomedical Engineering. 2007;35: 1898–1906. pmid:17687652
- 51. Beneke R, Pollmann C, Bleif I, Leithäuser R, Hütler M. How anaerobic is the Wingate Anaerobic Test for humans? European journal of applied physiology. 2002;87: 388–392. pmid:12172878
- 52. Nicolò A, Massaroni C, Passfield L. Respiratory Frequency during Exercise: The Neglected Physiological Measure. Front Physiol. 2017;8: 922. pmid:29321742
- 53. Artiga Gonzalez A, Bertschinger R, Brosda F, Dahmen T, Thumm P, Saupe D. Modeling Oxygen Dynamics under Variable Work Rate. SCITEPRESS - Science and Technology Publications; 2015. pp. 198–207. https://doi.org/10.5220/0005607701980207