Estimating oxygen uptake in simulated team sports using machine learning models and wearable sensor data: A pilot study

Dermot Sheridan; Arne Jaspers; Dinh Viet Cuong; Tim Op De Beéck; Niall M. Moyna; Toon T. de Beukelaar; Mark Roantree

doi:10.1371/journal.pone.0319760

Abstract

Accurate assessment of training status in team sports is crucial for optimising performance and reducing injury risk. This pilot study investigates the feasibility of using machine learning (ML) models to estimate oxygen uptake (VO₂) with wearable sensors during team sports activities. Six healthy male team sports athletes participated in the study. Data were collected using inertial measurement units (IMU), heart rate monitors, and breathing rate sensors during incremental fitness tests. The performance of different ML models, including multiple linear regression (MLR), XGBoost, and deep learning models (LSTM, CNN, MLP), was compared using raw and engineered features from IMU data. Results indicate that while LSTM models with raw IMU data provided the most accurate predictions (RMSE: 4.976, MAE: 3.698), MLR models remained competitive, especially with engineered features. Multi-sensor configurations, particularly those including sensors on the torso and limbs, enhanced prediction accuracy. The findings demonstrate the potential of ML models to monitor VO₂ noninvasively in real time, offering valuable insights into the internal physiological demand during team sports activities.

Citation: Sheridan D, Jaspers A, Viet Cuong D, Op De Beéck T, Moyna NM, de Beukelaar TT, et al. (2025) Estimating oxygen uptake in simulated team sports using machine learning models and wearable sensor data: A pilot study. PLoS ONE 20(4): e0319760. https://doi.org/10.1371/journal.pone.0319760

Editor: Noman Naseer, Air University, PAKISTAN

Received: November 8, 2024; Accepted: February 6, 2025; Published: April 21, 2025

Copyright: © 2025 Sheridan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data can be accessed on Zenodo at the following DOI: doi: 10.5281/zenodo.14609092.

Funding: This work was conducted with the financial support of Science Foundation Ireland under grant numbers SFI/12/RC/2289_P2 and SFI/18/CRT/6223. SFI, Insight Research Centre for Data Analytics, URL: https://www.sfi.ie/sfi-research-centres/insight, SFI/12/RC/2289_P2, Mark Roantree; SFI, Center for Research Training in Artificial Intelligence, URL: https://www.crt-ai.ie, SFI/18/CRT/6223, Dermot Sheridan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In team sports, including the various football codes, accurately assessing the players’ training status is of significant interest to coaches and sports scientists [1]. Systematic monitoring of the training process allows the evaluation of athletes’ physical output and internal physiological responses [2]. Various tracking technologies, including Global Navigation Satellite System (GNSS) and accelerometry, are available for monitoring external load [3], while monitoring heart rate (HR) and rating of perceived exertion (RPE) are standard methods for assessing an athlete’s internal load [4,5]. Understanding the relationship between internal and external loads is crucial as it provides insights into an athlete’s adaptations to training, indicating changes in fitness or fatigue state [6]. It is highly beneficial to connect external training load measures to relevant outcomes for effective training [1].

Maximal oxygen uptake (VO₂max) is a traditional indicator of an athlete’s aerobic power, which is crucial for sustaining high-intensity efforts over prolonged periods in team sports like soccer [7]. Studies have shown that players with higher VO₂max values tend to cover more distance during games, a critical factor for running-based team sports [8]. Monitoring physiological parameters is therefore crucial; VO₂max and oxygen consumption at the anaerobic threshold are essential for assessing the metabolic demands of different field roles in team sports, including soccer [9]. However, scheduling constraints often make laboratory testing, such as spiroergometry or other cardiopulmonary fitness assessments (e.g., VO₂max testing), impractical during competitive periods [10]. Consequently, finding a method to monitor these changes unobtrusively throughout the season is key to accurately assessing training loads and tracking athletes’ physical fitness [11]. Such an approach would negate the need for frequent, invasive testing procedures like VO₂max tests, traditionally used to gauge adaptations over time.

A recent systematic review of the relationship between external, wearable sensor-based, and internal parameters emphasises two challenges. The first is whether we can capture and quantify the complex loading of team sports [12]. Running-based team sports are intermittent sports, consisting of hundreds of brief and very intense actions, such as jumps, tackles, changes of direction, accelerations, and decelerations [13]. Thus, more specific measures and devices are needed to identify loading in sports. Inertial measurement units (IMU) represent a valuable integration of sensor technologies, typically including 3D accelerometers, 3D gyroscopes, and 3D magnetometers in a single device. IMU data (100 Hz), as captured by current wearable devices in team sports, provide a sensitive measure of high-intensity actions [14]. These data have been used to create custom accelerometer metrics, which quantify three-dimensional movements and have been used to estimate VO₂ with varying accuracy for different physical activities [15,16]. These calculations reduce the 3D raw accelerometer data to a 1D vector, which is more convenient to handle but could result in losing valuable information. Additionally, the gyroscope data may offer a way to capture more information about movement during team sports. When combined with accelerometer data, it has been shown to enhance fatigue detection in runners [17].

The second challenge is whether we can accurately model the relationship between the sensor-based measure of external load and the athlete’s individual internal response [12]. Traditional statistical models are limited in modelling human physiological responses as they rely heavily on significant predictors and struggle with non-linear relationships and complex data. ML provides advantages, such as interpreting complex and non-linear patterns essential for accurate predictions and understanding of physiological responses [18]. Unlike traditional ML, deep neural networks can process raw data directly and autonomously learn to identify complex, hidden features, thereby eliminating the need for manual feature extraction [19]. Notably, in the development of models for action pattern recognition, the prevalent deep models include Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, along with their hybrid forms (Chang et al. 2023). Temporal convolutional neural networks using cardiorespiratory biosignals and power as input have been utilised to accurately predict VO₂ responses to varying exercise intensities, leveraging past data to forecast future cardiorespiratory dynamics [20]. Such models have demonstrated capability in estimating slower VO₂ kinetics with increasing exercise intensity, facilitating nonintrusive monitoring across different exertion levels [21]. Using a non-linear ML model with heart rate, respiration rate, and acceleration data from medical-grade wearables as input significantly reduced VO₂ estimation errors during the Bruce treadmill test, a progressively intense workout. This method has shown superior performance compared to previous heart rate-based estimations [22].

Further, integrating motion data from IMU, GNSS, and a heart rate sensor has markedly enhanced the accuracy of VO₂ estimation during outdoor running and walking, underlining the potential of neural networks for real-time assessments [23]. A personalised LSTM neural network model, trained on heart rate, mechanical power output, cadence, and respiratory rate, estimated individual VO₂ responses with high predictive accuracy across a range of cycling intensities [24]. This suggests a pathway for creating personalised models that can account for individual variations to improve the estimation of VO₂, enabling precise monitoring during games and training. Accurately estimating VO₂ for each player is an essential step in determining the contributions of the aerobic energy system to activities. Techniques like the excess post-exercise oxygen consumption plus delta lactate method (EPOC [La-]) and the accumulated oxygen deficit method (O2deficit) can provide insights into the anaerobic contribution, with anaerobic contributions accounting for a significant portion during high-intensity intermittent activities [25]. These advancements allow for a more accurate determination of the internal response, which helps us to evaluate the athlete’s training status.

This study compared the accuracy of various ML models, including deep learning models, in estimating individuals’ VO₂ from wearable sensor data during outdoor jogging and simulated team sports activities. We compared the prediction accuracy of the ML models using two IMU data representations: raw and engineered features. Finally, we analysed various combinations of body-worn accelerometers to evaluate their impact on VO₂ prediction. This comparative study seeks to determine the most effective data input method for ML models to estimate VO₂ during the high-intensity actions typical in team sports.

Materials and methods

Participants

A total of six healthy male team sports athletes (height: 182.55 ± 3.64 cm; mass: 79.62 ± 11.26 kg; VO₂max: 56.78 ± 3.83 mL⋅kg⁻¹⋅min⁻¹) participated in the study. See Table 1 for detailed participant baseline characteristics. Participants were selected based on the following inclusion criteria: a minimum of four training sessions per week over two years, with at least three sessions being pitch-based training or games. Eligibility also required no self-reported history of metabolic, neurological, pulmonary, or cardiovascular diseases and no symptoms of lower extremity injuries for at least six months prior to the study. All participants provided written informed consent in accordance with the Declaration of Helsinki. The study was approved by the local ethics committee of Dublin City University (DCUREC/2021/256).

Protocol/data acquisition

Each athlete participated in two sessions at Dublin City University (DCU), spaced at least 48 hours apart. The first laboratory visit comprised three phases to assess VO₂: a resting phase, a sub-maximal exercise protocol, and a maximal graded exercise test (GXT). During the resting phase, VO₂ and respiratory frequency were averaged over five minutes to establish baseline metabolic rates [24]. The sub-maximal trial started at 9 km/h, increasing by 1 km/h every six minutes until blood lactate levels reached ≥ 3 mmol/L, with intermittent 1-minute rest periods for lactate sampling [26]. The treadmill gradient remained fixed at 1% to simulate the energetic cost of outdoor running [27]. Following a 5-minute rest, the maximal ramp incremental test commenced, starting at a speed 1 km/h below the final sub-maximal speed and increasing by 1 km/h each minute until reaching 16 km/h, followed by incremental increases in slope by 1% each minute until voluntary exhaustion. All tests were conducted under similar conditions (20–21 °C).

Table 1. Participant VO₂max and Resting VO₂.

https://doi.org/10.1371/journal.pone.0319760.t001

The second visit involved on-field tests on a synthetic pitch, incorporating a steady-state jog and an intermittent team sport simulated circuit developed from existing protocols [28]. Each circuit included three counter-movement jumps, an eight-meter jog, an eight-meter change of direction (COD) agility section, two jumps for distance, a 10 m sprint, seven meters of walking, and a tackle bag to be hit with force. These activities are designed to reflect the dynamic nature of team sports and lasted approximately 45 seconds, followed by 15 seconds of rest, repeated five times.

Sensor measurements

This section describes the data measured during the laboratory and field visits and the features used for the dynamic oxygen prediction models. Fig 1 shows the measurement setup of wearable sensors worn by the athlete during the protocol.

Oxygen Uptake (VO₂): Pulmonary gas exchange data were captured using a Cosmed K5 breath-by-breath metabolic analyser, calibrated with a specific gas mixture and a flow meter before each session. VO₂ values were normalised by body mass, providing a detailed measure of aerobic capacity for each breath [29].

Heart Rate (HR) & Breathing Rate (BR): HR and BR were monitored using Zephyr Bio Harness 3.0, a validated tool for physiological monitoring in sports settings [30]. The device was worn on the chest to ensure accurate measurement of cardiac and respiratory parameters.

Inertial Measurement Units (IMU): IMU measured linear acceleration, angular velocity, and magnetic field variations at five body locations: the lower back, both tibiae, and both wrists. These placements were chosen to capture whole-body movement dynamics, including limb-specific and core-generated movements relevant to team sports activities. Sensors were secured with adjustable straps and calibrated before each session to ensure alignment with anatomical landmarks and reduce signal noise. Data were sampled at 250 Hz, providing high-resolution motion data.

The IMU collected multidimensional signals, including:
- Linear acceleration: Used to identify movement intensity and transitions between activity states.
- Angular velocity: Captured rotational dynamics, particularly during changes in direction.
- Magnetic field variations: Used for orientation tracking to complement acceleration and angular velocity data.

Calibration procedure: All wearable sensors underwent a multi-step calibration process prior to data collection:
1. Static Calibration: The sensors were placed on a stable surface to establish a baseline for signal offset correction.
2. Dynamic Calibration: Participants performed a series of controlled movements (for example, walking, jogging, and arm swings) to synchronise signals across all devices.
3. Signal Quality Check: Data were inspected in real time to confirm synchronisation and detect potential artifacts.

This setup ensured accurate and reliable data collection across all test conditions, facilitating detailed feature extraction for oxygen uptake prediction.

Fig 1. Measurement setup: the Cosmed K5 portable metabolic analyser (the gas analyser is worn on the back, and the face mask covers the mouth and nose), the Bio Harness 3 HR and BR device worn under the shirt, and the Shimmer 3 IMU sensors attached at five locations on the athlete’s body.

https://doi.org/10.1371/journal.pone.0319760.g001

Pre-processing

For each treadmill test, the athlete’s VO₂max was calculated as the maximum value of the rolling average of the VO₂ signal with a window length of 30 seconds. Data processing strategies that reduce variability in VO₂ measurements have been recommended for exercise physiologists: a 15-breath average can correct a residual error in VO₂ datasets to within 10% of the raw variability [31]. We smoothed the VO₂ signal with a 31-point moving average window to reduce interference noise [22], and repeated the VO₂max calculation as the maximum value of this 31-point moving average for comparison. The treadmill speed (km/h) for each stage of the sub-maximal and maximal tests was added to the VO₂ data for visit 1; for the outdoor test, the GNSS speed (km/h) recorded on the Cosmed K5 device was used to determine the speed of outdoor running. Four activity class labels (Resting, Treadmill Running, Outdoor Running, Simulated Team Sports Circuit) describing the movement during the protocol were engineered and added to the VO₂ data. The subject’s physical characteristics, age (yrs), height (cm), weight (kg), resting oxygen uptake (VO₂rest), and VO₂max were included as features (Table 1). The Zephyr data (1 Hz) were merged directly into the breath-by-breath VO₂ data. The five Shimmer IMU data files were merged into a single file; to facilitate matching timestamps, the data were resampled from 250 Hz to 125 Hz. The raw IMU data were then merged with the breath-by-breath VO₂ data in a way that preserved their sampling frequency: the IMU samples are marked by the window of each breath recorded in the VO₂ data, and an example can be seen in Fig 2. Because two sensors (right arm and left leg) failed to record during different sessions, these two of the five IMU sensors were removed from the final data, and only data from the torso, right tibia, and left wrist were used. The magnetometer data were excluded from the analysis. This forms the RAW dataset.
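The breath-based smoothing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `moving_average` and `vo2max_from_breaths`, the centred window, and the shrinking of the window at the signal edges are all assumptions; only the 31-point window length comes from the text, and the VO₂ signal is synthetic.

```python
import numpy as np

def moving_average(values, window=31):
    """Centred moving average; the window shrinks at the edges (assumption)."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    out = np.empty_like(values)
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out[i] = values[lo:hi].mean()
    return out

def vo2max_from_breaths(vo2, window=31):
    """VO2max taken as the peak of the smoothed breath-by-breath signal."""
    return float(moving_average(vo2, window).max())

# Synthetic breath-by-breath VO2: a ramp plus breath-to-breath noise.
rng = np.random.default_rng(0)
vo2 = np.linspace(10.0, 55.0, 400) + rng.normal(0.0, 3.0, 400)
smoothed = moving_average(vo2)
print(round(vo2max_from_breaths(vo2), 2), round(float(vo2.max()), 2))
```

Because every smoothed value is a mean of raw breaths, the smoothed peak can never exceed the raw peak, which is why the smoothed maximum is less sensitive to a single noisy breath.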

Fig 2. Examples of IMU raw data from the Accelerometer and Gyroscope over a 3-breath window during three different activities: Treadmill Running (TM), Outdoor Running, and a Simulated Circuit.

https://doi.org/10.1371/journal.pone.0319760.g002

Data representation

To investigate the impact of dataset representation on estimating oxygen uptake during team sports activity, the time series data were represented in a different format by transforming the raw data. The 6-axis data of the IMU sensor that consists of 3-axis acceleration and 3-axis angular velocity was engineered into axis-specific mean amplitude deviation (MAD) values, and their resultant MADxyz were determined as follows:

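\[
\mathrm{MAD}_{a} = \frac{1}{n}\sum_{i=1}^{n}\left|a_{i}-\bar{a}\right|,\quad a\in\{x,y,z\};\qquad \mathrm{MAD}_{xyz}=\sqrt{\mathrm{MAD}_{x}^{2}+\mathrm{MAD}_{y}^{2}+\mathrm{MAD}_{z}^{2}} \tag{1}
\]

where a_i is the i-th sample on axis a within the window, ā is the mean of that axis over the window, and n is the number of samples in the window.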

The calculations were computed on the window of IMU data between the breaths, and the dataset was compressed into a single row for each breath. The MADxyz is sensitive to changes in the axis’s inclination angle and movement, making its magnitude always greater than or equal to MAD [15]. The same procedure was used to process each IMU sensor. In this way, two MADxyz (Accel xyz, Gyro xyz) individual features were obtained from one IMU, and six features were calculated for three IMUs. Taken together, this forms the engineered features dataset for the experiments.
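As an illustration of the per-breath compression, the sketch below computes the two resultant features for one IMU. It assumes the axis-wise definition of MAD associated with [15] (mean absolute deviation from the window mean on each axis, combined as a Euclidean resultant); the function names, the breath timestamps, and the synthetic signals are hypothetical.

```python
import numpy as np

def mad_xyz(window):
    """Resultant mean amplitude deviation of an (n_samples, 3) window.

    Assumed definition: MAD_a = mean(|a_i - mean(a)|) per axis,
    MAD_xyz = sqrt(MAD_x^2 + MAD_y^2 + MAD_z^2).
    """
    window = np.asarray(window, dtype=float)
    per_axis = np.abs(window - window.mean(axis=0)).mean(axis=0)
    return float(np.sqrt((per_axis ** 2).sum()))

def breath_features(imu_time, accel, gyro, breath_times):
    """One row per breath: [MADxyz of accel, MADxyz of gyro] between breath marks."""
    rows = []
    for start, end in zip(breath_times[:-1], breath_times[1:]):
        mask = (imu_time >= start) & (imu_time < end)
        rows.append([mad_xyz(accel[mask]), mad_xyz(gyro[mask])])
    return np.array(rows)

# Synthetic 125 Hz IMU stream for 10 s with a breath every 2.5 s (hypothetical values).
rng = np.random.default_rng(1)
imu_time = np.arange(0, 10, 1 / 125)
accel = rng.normal(0, 1, (len(imu_time), 3))
gyro = rng.normal(0, 50, (len(imu_time), 3))
breath_times = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
features = breath_features(imu_time, accel, gyro, breath_times)
print(features.shape)  # (4, 2)
```

Repeating `breath_features` for each of the three retained IMUs and concatenating the columns would give the six engineered features per breath described above.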

Data structuring

The input data structure can significantly influence deep learning performance. Four input data structures, with windows of 1, 3, 5, and 7 breaths, were studied here for better input adaptation.
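One way to build such breath windows is sketched below. It is a hedged illustration: the function name `make_breath_windows` is hypothetical, and labelling each window with the VO₂ of its middle breath is an assumption based on the later statement that the models focus on the middle breath.

```python
import numpy as np

def make_breath_windows(features, targets, n_breaths):
    """Stack n_breaths consecutive breaths per sample; the label is the middle breath.

    features: (n_total_breaths, n_features); targets: (n_total_breaths,)
    Returns X of shape (n_samples, n_breaths, n_features) and y of shape (n_samples,).
    """
    if n_breaths % 2 == 0:
        raise ValueError("use an odd window so a middle breath exists")
    half = n_breaths // 2
    X = np.stack([features[i - half:i + half + 1]
                  for i in range(half, len(features) - half)])
    y = np.asarray(targets)[half:len(features) - half]
    return X, y

# Toy data: 10 breaths with 2 features each.
features = np.arange(20, dtype=float).reshape(10, 2)
targets = np.arange(10, dtype=float)
shapes = {n: make_breath_windows(features, targets, n)[0].shape for n in (1, 3, 5, 7)}
print(shapes)
```

Wider windows give the model more temporal context but yield fewer samples per recording, since breaths near the start and end have no complete window.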

Machine learning approach

A supervised machine learning regression approach was utilised, employing a modified Leave-One-Subject-Out (LOSO) cross-validation strategy. This approach was selected to mimic real-world scenarios where models are deployed to predict outcomes for new, unseen individuals. By leaving one subject out for testing while training the model on data from all other subjects, LOSO cross-validation evaluates the model’s generalisation capability across individuals.

Our approach is in line with methodologies used in similar studies, such as Zignoli et al. (2020), which split trials into training and testing sets to evaluate model generalisability. In our study, data from the first visit of the test subject was included in the training phase, allowing the model to capture intra-individual variability before testing on data from the second visit. This modification ensures that the model is evaluated on entirely unseen session-level data while benefiting from information about the individual’s general fitness profile, simulating practical applications in sports monitoring.
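The modified split can be expressed as a small index generator. This sketch assumes that the training fold contains both visits of every other subject plus visit 1 of the held-out subject, and that the test fold contains only visit 2 of the held-out subject; the function name and the record layout are hypothetical.

```python
def modified_loso_splits(records):
    """Yield (test_subject, train_idx, test_idx) for each subject.

    records: list of (subject_id, visit) tuples, visit in {1, 2}.
    Training uses every other subject's data plus the test subject's visit 1;
    testing uses only the test subject's visit 2.
    """
    subjects = sorted({s for s, _ in records})
    for held_out in subjects:
        train_idx = [i for i, (s, v) in enumerate(records)
                     if s != held_out or v == 1]
        test_idx = [i for i, (s, v) in enumerate(records)
                    if s == held_out and v == 2]
        yield held_out, train_idx, test_idx

# Six subjects, two visits, three rows per visit (toy data).
records = [(s, v) for s in range(1, 7) for v in (1, 2) for _ in range(3)]
folds = list(modified_loso_splits(records))
for subject, train_idx, test_idx in folds:
    print(subject, len(train_idx), len(test_idx))
```

Each fold keeps the field session of the held-out athlete completely unseen while still exposing the model to that athlete's laboratory data.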

A portion of the training data was set aside as a validation set to ensure the model’s reliability and accuracy. The validation set was used to select hyperparameters by minimising the root mean square error (RMSE). A grid search technique was employed to systematically identify the optimal hyperparameters, such as the number of layers and neurons, that achieved the lowest RMSE. This method ensures an unbiased and exhaustive search across the hyperparameter space, improving model performance and robustness. The same hyperparameter selection process was applied uniformly across all models to maintain consistency in model evaluation.
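A grid search driven by validation RMSE can be sketched generically as below. The stand-in model is a polynomial ridge regression rather than any of the networks used in the study, and the names `grid_search`, `fit_poly_ridge`, and the grid values are hypothetical; the point is only the selection loop.

```python
import itertools
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def grid_search(build_and_fit, grid, X_train, y_train, X_val, y_val):
    """Try every combination in `grid`; return (best_params, best_rmse)."""
    best_params, best_score = None, float("inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        predict = build_and_fit(X_train, y_train, **params)
        score = rmse(y_val, predict(X_val))
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in model (hypothetical): polynomial ridge regression instead of a neural network.
def fit_poly_ridge(X, y, degree, weight_decay):
    A = np.vander(X, degree + 1)
    w = np.linalg.solve(A.T @ A + weight_decay * np.eye(degree + 1), A.T @ y)
    return lambda X_new: np.vander(X_new, degree + 1) @ w

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
y = 3 * x ** 2 - x + rng.normal(0, 0.1, 200)
grid = {"degree": [1, 2, 3], "weight_decay": [1e-4, 1e-2]}
best_params, best_rmse = grid_search(fit_poly_ridge, grid, x[:150], y[:150], x[150:], y[150:])
print(best_params, round(best_rmse, 3))
```

Keeping the validation rows separate from the test subject's second visit is what prevents the selected hyperparameters from being tuned on the evaluation data.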

Linear regression.

Linear regression [32] is a linear method for modeling the relationship between dependent and independent variables by fitting a linear function to observed data. Its simplicity and interpretability make it a common starting point for evaluating more complex models. The coefficients of the function are derived from minimising the difference between the predicted values (outputs of the linear function) and actual data points. In this study, principal component analysis (PCA) was applied to transform the inputs before being forwarded to the linear regression model. Although PCA was not strictly necessary given the limited number of features, it was included to ensure consistency with standard preprocessing practices and to represent the input data in a compact, orthogonal space. Experiments revealed that retaining all six principal components yielded the best test scores, as dimensionality reduction did not improve performance. This preprocessing step concentrated information in the first few components, which, in theory, could facilitate easier learning for the model. This approach maintains the computational efficiency and interpretability of the model while allowing for a direct comparison with more complex models to estimate the breath-by-breath VO₂.
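The PCA-then-regression pipeline can be illustrated with plain NumPy. This is a sketch on synthetic data, not the study's implementation: `fit_pca_linear` is a hypothetical helper, and the feature matrix and coefficients are made up.

```python
import numpy as np

def fit_pca_linear(X, y, n_components):
    """Centre X, project onto the top principal components, then fit least squares."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]              # rows are orthonormal directions
    Z = (X - mean) @ components.T
    A = np.column_stack([Z, np.ones(len(Z))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict(X_new):
        Z_new = (X_new - mean) @ components.T
        return np.column_stack([Z_new, np.ones(len(Z_new))]) @ coef
    return predict

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))   # correlated synthetic features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0, 1.5]) + 40 + rng.normal(0, 1.0, 300)
predict_all = fit_pca_linear(X[:200], y[:200], n_components=6)
predict_two = fit_pca_linear(X[:200], y[:200], n_components=2)
rmse_all = float(np.sqrt(np.mean((y[200:] - predict_all(X[200:])) ** 2)))
rmse_two = float(np.sqrt(np.mean((y[200:] - predict_two(X[200:])) ** 2)))
print(round(rmse_all, 3), round(rmse_two, 3))
```

With all components retained, PCA is only a rotation of the centred inputs, so the fitted predictions coincide with ordinary least squares on the original features; any change in accuracy can therefore come only from discarding components.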

Non-linear regression model.

XGBoost (Extreme Gradient Boosting): XGBoost is a machine learning technique that builds on the Gradient Boosting Decision Trees (GBDT) framework and is known for handling non-linear relationships and for high performance and speed in both regression and classification problems. It offers specific advantages for model performance and computational efficiency, such as handling of missing data, regularisation to prevent overfitting, and scalability to large datasets. In the XGBoost regression model, predictions result from summing the outputs of K decision trees, allowing for complex non-linear relationships [33].

Architecture details: Various configurations were tested, including different numbers of trees (Number of Trees: 500, 1000, 3000, 5000) and depths (Maximum Depth: 3, 5, 7, 10).

Deep learning models

Deep learning models were considered because their architectures have demonstrated the ability to capture complex patterns and temporal dependencies in sequential data, which is critical for estimating VO₂ in dynamic sports performance [19]. The impact of different deep learning neural network models was analysed to predict breath-by-breath VO₂ using raw time series data as input.

Multi-Layer Perceptron (MLP).

MLP networks are adept at handling non-linearity, internal randomness, and long-term unpredictability in time series data; they transform the high-dimensional input data into a manageable latent space to make accurate predictions [34].

Architecture details: The activation function is ReLU; there is no dropout, and L2-regularisation is used with the weight decay 1e-4 to reduce overfitting. Various configurations test different depths (from one to four layers) and widths (32 or 64 neurons per layer), examining how each configuration affects the model’s performance.

Long Short-Term Memory (LSTM).

The LSTM model is well-suited for time series predictions due to its ability to remember patterns based on previous timestep states, making it effective in capturing long-term dependencies within sequential data [35]. We adopt a bidirectional LSTM version, which processes its inputs in a bidirectional manner along the breath dimension to capture contextual information from past and future breaths, enhancing predictive accuracy.

Architecture details: In LSTM, we consider breath a time step, so we have a fixed length (7, 5, 3, 1 breath); no padding is needed. There is no dropout; we use regularisation with the weight decay 1e-4 to reduce overfitting. We experiment with various configurations, ranging from 1 to 4 hidden layers, each with 32 or 64 units.
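To make the bidirectional processing along the breath dimension concrete, the following NumPy sketch runs a forward pass of a single-layer bidirectional LSTM with random, untrained weights. It is illustrative only: the per-breath input size `n_in = 8`, the weight initialisation, the linear output head, and all function names are assumptions, and reading the prediction from the middle breath follows the description given for the CNN.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(x_seq, W, U, b):
    """Run one LSTM direction over x_seq (T, n_in); returns hidden states (T, n_hidden)."""
    n_hidden = U.shape[0]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    outputs = []
    for x_t in x_seq:
        gates = x_t @ W + h @ U + b              # (4 * n_hidden,)
        i, f, g, o = np.split(gates, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                        # cell state update
        h = o * np.tanh(c)                       # hidden state
        outputs.append(h)
    return np.stack(outputs)

def bilstm_middle_breath(x_seq, fwd, bwd, w_out, b_out):
    """Bidirectional pass along the breath axis; predict from the middle breath."""
    h_fwd = lstm_pass(x_seq, *fwd)
    h_bwd = lstm_pass(x_seq[::-1], *bwd)[::-1]   # re-align backward states in time
    h = np.concatenate([h_fwd, h_bwd], axis=1)   # (T, 2 * n_hidden)
    return float(h[len(x_seq) // 2] @ w_out + b_out)

def init_direction(rng, n_in, n_hidden):
    return (rng.normal(0, 0.1, (n_in, 4 * n_hidden)),
            rng.normal(0, 0.1, (n_hidden, 4 * n_hidden)),
            np.zeros(4 * n_hidden))

rng = np.random.default_rng(3)
n_breaths, n_in, n_hidden = 7, 8, 32             # n_in is a hypothetical feature size
x_seq = rng.normal(size=(n_breaths, n_in))
fwd, bwd = init_direction(rng, n_in, n_hidden), init_direction(rng, n_in, n_hidden)
w_out, b_out = rng.normal(0, 0.1, 2 * n_hidden), 0.0
prediction = bilstm_middle_breath(x_seq, fwd, bwd, w_out, b_out)
print(prediction)
```

Because the backward direction has already seen the later breaths when it reaches the middle of the window, the middle-breath representation combines past and future context, which a forward-only LSTM cannot do.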

Convolutional Neural Network (CNN).

A CNN is well-suited for time series predictions due to its effectiveness in extracting local temporal features from sequential data [36]. The model uses a series of convolutional layers; each CNN layer has kernels that slide along the breath dimension, computing and extracting temporal features at every timestep. Like the LSTM, the CNN focuses on the output from the middle breath for final processing, ensuring relevance to the temporal centre of the data.

Architecture details: A fully connected layer initially processes each breath and generates breath-level latent vectors prepared for subsequent 1D convolution. The 1D convolution is performed along the breath dimension, utilising ’same’ padding to maintain the original input shape by adding zero padding at the edges. A stride of 1 is used, ensuring the convolution operation moves along the breath dimension one step at a time. The CNN configurations vary in depth, ranging from 3 to 5 convolutional layers and in the number of kernels, using either 32 or 64 kernels per layer. This variability allows the model to learn features at different levels of abstraction. A consistent kernel size of 3 is applied across all convolutional layers, effectively capturing local temporal patterns while maintaining broader contextual information.
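The convolution along the breath dimension can be sketched in NumPy as follows. The latent size of 16, the ReLU between layers, the random weights, and the function names are assumptions made for illustration; kernel size 3, stride 1, zero ’same’ padding, and the use of the middle breath come from the description above.

```python
import numpy as np

def conv1d_same(x, kernels, bias):
    """1D convolution along the breath axis with zero 'same' padding and stride 1.

    x: (n_breaths, n_in); kernels: (kernel_size, n_in, n_out); bias: (n_out,)
    Returns (n_breaths, n_out).
    """
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))     # zeros before the first and after the last breath
    out = np.stack([
        np.tensordot(padded[t:t + k], kernels, axes=([0, 1], [0, 1]))
        for t in range(x.shape[0])
    ])
    return out + bias

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(4)
n_breaths, n_latent, n_kernels = 7, 16, 32
h = rng.normal(size=(n_breaths, n_latent))       # breath-level latent vectors
for layer in range(3):
    kernels = rng.normal(0, 0.1, (3, h.shape[1], n_kernels))
    h = relu(conv1d_same(h, kernels, np.zeros(n_kernels)))
middle = h[n_breaths // 2]                       # features of the middle breath
print(h.shape, middle.shape)  # (7, 32) (32,)
```

With kernel size 3, each additional layer widens the receptive field of the middle breath by one breath on either side, so three layers already cover the full 7-breath window.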

Sensor configurations.

The study also investigates the impact of various sensor placements on predictive accuracy:

Input Set A: HR + BR + IMU Torso
Input Set B: HR + BR + IMU Torso + IMU Arm
Input Set C: HR + BR + IMU Torso + IMU Leg
Input Set D: HR + BR + IMU Torso + IMU Arm + IMU Leg
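In code, the four input sets amount to selecting different column groups from the merged data. The sketch below uses hypothetical column names (for example `imu_torso_acc_x`) and assumes six raw channels per IMU, since the magnetometer was excluded.

```python
SENSOR_SETS = {
    "A": ["hr", "br", "imu_torso"],
    "B": ["hr", "br", "imu_torso", "imu_arm"],
    "C": ["hr", "br", "imu_torso", "imu_leg"],
    "D": ["hr", "br", "imu_torso", "imu_arm", "imu_leg"],
}

# Hypothetical channel names: 3-axis acceleration plus 3-axis angular velocity.
IMU_CHANNELS = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]

def columns_for(set_name):
    """Expand an input set into the list of feature columns it uses."""
    cols = []
    for source in SENSOR_SETS[set_name]:
        if source.startswith("imu_"):
            cols += [f"{source}_{ch}" for ch in IMU_CHANNELS]
        else:
            cols.append(source)
    return cols

for name in SENSOR_SETS:
    print(name, len(columns_for(name)))
```

Under these assumptions, Set A uses 8 raw channels and Set D uses 20, so comparing sets also changes the input dimensionality seen by each model.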

Statistics

In this study, we explore the impact of different IMU data representations on model accuracy by comparing features derived from raw data (RAW) versus engineered data (MAD). We then evaluate the influence of various sensor configurations on the accuracy of VO₂ predictions. Subsequently, we conduct a residual analysis to assess the model’s prediction ability. This involves calculating residuals as the difference between the measured VO₂ values and the predicted VO₂ values. To quantify the accuracy of the predictions, we calculate the mean absolute error (MAE) and the RMSE for the residuals.
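The two error metrics follow directly from the residuals. The sketch below uses made-up VO₂ values; `residual_metrics` is a hypothetical helper name.

```python
import numpy as np

def residual_metrics(measured, predicted):
    """MAE and RMSE of the residuals (measured minus predicted)."""
    residuals = np.asarray(measured, float) - np.asarray(predicted, float)
    return {"mae": float(np.mean(np.abs(residuals))),
            "rmse": float(np.sqrt(np.mean(residuals ** 2)))}

# Residuals here are 2, -3, 1, -4.
metrics = residual_metrics([30.0, 42.0, 51.0, 25.0], [28.0, 45.0, 50.0, 29.0])
print(metrics["mae"], round(metrics["rmse"], 3))  # 2.5 2.739
```

Both metrics are unaffected by the sign convention of the residual, and RMSE is never smaller than MAE because it weights large errors more heavily.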

Further, a regression analysis of these residuals yields Pearson’s correlation coefficient (r) and the proportion of variance (R²) accounted for by each model. Additionally, we employ a Bland-Altman analysis to assess the agreement between the measured and predicted VO₂ values, calculating both the mean bias and the 95% limits of agreement (approximately twice the standard deviation of the differences on either side of the mean bias). The Bland-Altman plot is particularly suited for this study as it provides a visual representation of the agreement between two measurement methods (predicted and measured VO₂ values), highlighting systematic biases and variability across the range of measurements. This complements traditional error metrics like MAE and RMSE by offering insight into how prediction errors vary with VO₂ magnitude, thus enhancing the interpretability of the model’s performance.
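The agreement statistics can be computed as in the sketch below. Several choices are assumptions: differences are taken as predicted minus measured (so a positive bias means overprediction), the sample standard deviation is used, and the multiplier defaults to the conventional 1.96; passing `z=2.0` follows the "twice the standard deviation" description. The data are made up.

```python
import numpy as np

def bland_altman(measured, predicted, z=1.96):
    """Mean bias and limits of agreement for predicted minus measured values."""
    diff = np.asarray(predicted, float) - np.asarray(measured, float)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return bias, bias - z * sd, bias + z * sd

def pearson_r(measured, predicted):
    return float(np.corrcoef(measured, predicted)[0, 1])

measured = np.array([12.0, 25.0, 38.0, 44.0, 51.0])
predicted = np.array([13.5, 24.0, 40.0, 45.5, 50.0])
bias, lower, upper = bland_altman(measured, predicted)
r = pearson_r(measured, predicted)
print(round(bias, 2), round(lower, 2), round(upper, 2), round(r ** 2, 3))
```

The limits are symmetric about the bias by construction, which gives a quick consistency check on reported values: limits of 10.24 and –9.23 are centred on about 0.505, in line with a mean bias of 0.50.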

Results

Model performance across configurations: Table 2 presents the performance metrics (RMSE and MAE) for the tested machine learning models using different input configurations and data representations. The LSTM model with RAW data and Set C input configuration achieved the best overall performance, with the lowest RMSE (4.976) and MAE (3.698) on the test set. This indicates the LSTM model had the most accurate predictions among all tested configurations.

Other models, such as CNN and MLP, also demonstrated competitive results. For instance, the CNN with RAW data and Set C achieved an MAE of 4.174, while the MLP with the same configuration achieved an MAE of 4.326. Models using MAD data generally performed less effectively than those using RAW data.

The analysis of sensor configurations revealed that multi-sensor setups, such as Set B (torso and arm sensors) and Set C (torso and leg sensors), provided the highest prediction accuracy. Single-sensor setups, such as Set A (torso only), also yielded acceptable performance but were slightly less accurate.

Table 2. Performance metrics of ML models.

https://doi.org/10.1371/journal.pone.0319760.t002

Evaluation of agreement and prediction bias: Fig 3a illustrates the linear relationship between the predicted and measured VO₂ values for the LSTM model using RAW data and the Set C configuration. The R² value of 0.87 indicates that the model accounts for 87% of the variance in the measured VO₂ values, showcasing strong predictive performance.

Fig 3. (a) Linear correlation plot showing the relationship between predicted VO₂ and measured VO₂ for the LSTM model using RAW data and Set C input configuration, with an R² value of 0.87.

(b) Bland-Altman plot illustrating the difference between measured and predicted VO₂ values against the average of the two for all subjects combined. (c) Kernel Density Estimation (KDE) plot indicating the bimodal distribution of predicted VO₂ values.

https://doi.org/10.1371/journal.pone.0319760.g003

Fig 3b presents the Bland-Altman plot, which assesses the agreement between predicted and measured VO₂ values. The mean bias of 0.50 reflects a slight tendency toward overprediction. Most data points fall within the 95% limits of agreement (Upper LoA: 10.24, Lower LoA: –9.23), indicating consistent predictions across the range of VO₂ values. The Bland-Altman analysis provides an additional layer of insight into prediction discrepancies, complementing conventional performance metrics.

Fig 4. The box plot illustrates the residuals (predicted VO₂ minus measured VO₂) across different exercise conditions for the LSTM model using RAW data and Set C input configuration.

The exercise conditions include baseline, jogging, recovery1, circuit1, recovery2, circuit2, and recovery3.

https://doi.org/10.1371/journal.pone.0319760.g004

Fig 3c shows the kernel density estimation (KDE) of the predicted VO₂ values, revealing a bimodal distribution with peaks in lower and higher ranges. This distribution suggests that the model effectively differentiates between inactive and active states. However, the bimodal pattern indicates potential challenges in capturing nuanced transitions within active states or across varying intensities. This observation highlights an area for future improvements in feature engineering and model refinement to better capture intermediate states.

Residual analysis across exercise conditions: Residuals (predicted VO₂ minus measured VO₂) were analyzed across different exercise phases to assess the model’s performance in varying conditions. Fig 4 shows that the median residual is close to zero for most conditions, indicating minimal systematic bias. However, residual variability is higher during recovery phases, suggesting challenges in accurately capturing VO₂ kinetics during rapid transitions between exercise and rest.

Temporal predictions: Fig 5 compares breath-by-breath VO₂ predictions with measured values for the LSTM model using RAW data and Set C. While the model tracks the overall trends of VO₂ changes, deviations occur during high-intensity and recovery phases. The smoothed predictions (Fig 5) reduced MAE from 3.374 to 2.902, demonstrating the utility of post-processing techniques in improving predictive accuracy.

Fig 5. The graph compares breath-by-breath VO₂ predictions (blue line) with measured VO₂ values (green line) for the LSTM model using RAW data and Set C input configuration for Subject 2.

The left plot shows unsmoothed predictions with an MAE of 3.374, while the right plot shows smoothed predictions with an MAE of 2.902. The plot includes different exercise and recovery phases, shaded as follows: baseline and recovery phases (light blue), jogging (light pink), and simulated soccer circuit (light green).

https://doi.org/10.1371/journal.pone.0319760.g005

Discussion

In this study, we investigated the ability of ML models to estimate individual VO₂ during simulated team sports activities using wearable sensor data. The residual analysis demonstrated that our ML models could accurately predict measured VO₂ during these activities, utilising data collected from wearable sensors during fitness testing. We compared deep learning models that have shown potential in predicting VO₂ kinetics during intermittent transitions [20,21,23,24] against an MLR model, which served as a baseline. This comparison revealed no significant advantage of deep learning models over the baseline MLR model in terms of predictive power (Table 2). Our best MLR model achieved an MAE of 3.79 between predicted and measured VO₂, second only to the LSTM model, which achieved an MAE of 3.69.

We also investigated two different data representations for model performance. While deep learning models like LSTM and CNN showed strong performance with RAW data, MLR models remained competitive, particularly with MAD data representations. The choice of sensor configuration also played a significant role, with multiple sensor setups, such as torso and leg (Set C) or torso and arm (Set B), providing the most accurate predictions. A single torso-mounted sensor (Set A) notably provided good predictive performance.

This research represents the first application of ML models to predict VO₂ during simulated team sports activities, making direct comparisons with existing studies challenging. Some comparisons can be drawn with similar research. An LSTM model using inputs such as heart rate, mechanical power output, pedalling cadence, and respiratory frequency was employed to estimate VO₂ during variable high-intensity cycling exercises, achieving an MAE of approximately 3.5 and an value of 0.89 [24]. In our study, the LSTM model achieved an value of 0.87 (Fig 3a), which is closely comparable. Similar to this approach, our LSTM model was trained using data from a GXT, with two arbitrary protocols of varying intensities used to evaluate predictive performance. The same study also compared their LSTM model against two baseline analytical models, which showed values of 0.83 and 0.90, respectively, indicating comparable performance between the LSTM and the baseline models. Our findings align with this observation, as our baseline MLR model performed similarly to our LSTM model, achieving an value of 0.87. These results suggest that while deep learning models like LSTM can effectively predict VO₂, simpler models like MLR also offer competitive accuracy.

Comparing our performance to an LSTM model that used motion features from GNSS and IMU data during unconstrained outdoor walking and running, the reported MAE was 1.36, which outperformed our best LSTM model by 2.33 [23]. The experimental protocol in their study differs from ours; it involved four distinct three-minute exercise conditions, including two walking and two running sessions. These continuous conditions likely resulted in steady-state activity, which is supported by the performance of their LSTM model with HR-only input, achieving an MAE of 2.52. Their best-performing LSTM model utilised 93,151 total parameters and trained for over 8,000 epochs, indicating a substantial learning capacity that could potentially lead to overfitting, particularly when working with a smaller dataset.

As model complexity increases, so does the risk of overfitting [37], often with only marginal improvements over simpler models. This raises the question of whether deep learning significantly benefits this particular task. Notably, the performance of our deep learning and MLR models was comparable across different data representations, with both methods featuring among the top-performing models. One possible explanation is that deep learning models may overfit the training data, as the results in Table 2 indicate. The RMSE and MAE values for the test sets were consistently higher than those for the validation sets across all models. All deep learning models (LSTM, CNN, and MLP) exhibited signs of overfitting to varying degrees, with the most pronounced overfitting observed in the MLP models, followed by the CNN and LSTM models. This pattern suggests that while these models perform well on validation data, their ability to generalise to unseen test data is less robust. We employed L2 regularisation, cross-validation, and early stopping techniques to mitigate the overfitting risk. Despite these measures, the challenge of achieving robust generalisation remains, highlighting the need for further research into optimising model complexity and training strategies.
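For readers unfamiliar with early stopping, the idea can be sketched with a generic patience rule. This is an illustration only: the patience value, the `min_delta` threshold, the function name, and the loss values are hypothetical, and the sketch does not reproduce the training configuration used in this study.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the best epoch under a simple patience rule.

    Training is considered stopped once the validation loss has failed to
    improve by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop: no improvement for `patience` epochs
    return best_epoch


# Hypothetical validation losses for illustration only.
val_losses = [5.0, 4.2, 3.9, 4.0, 4.1, 4.3, 3.0]
best = early_stopping_epoch(val_losses, patience=3)
print(best)
```

In this hypothetical run, training halts before the final epoch is reached, so the weights from the best epoch seen so far would be kept. The rule limits overfitting to the training set but, as noted above, cannot by itself guarantee generalisation to unseen test data.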

A study conducted in a simulated futsal setting demonstrated that while VO₂ estimation using a simple linear regression equation derived from treadmill test HR data matched measured VO₂ at a group level (p-value = 0.38), it failed to provide reliable predictions at an individual level. This was evidenced by weak correlations and significant bias, as indicated by a Bland-Altman analysis showing a bias of −28, with errors reaching up to 19 [38]. In contrast, our model shows a reduced bias of 0.51, with limits of agreement varying by only 9, demonstrating superior accuracy in reflecting individual physiological responses. Another study employed a mixed-effects unpenalised linear regression model to predict VO₂max using HR and accelerometer data during submaximal running. This model achieved an MAE of 2.33 [39]. Our baseline MLR models have the most consistent performance of all models explored in this study, with performance for all six models presented in Table 2. These findings underscore the potential improvements that MLR models and data fusion from wearable sensors offer in enhancing VO₂ estimation. Linear models offer interpretability and robustness against overfitting, particularly in studies with small sample sizes [40]. Both Silva et al. (2018) and De Brabandere et al. (2018) used data from incremental fitness tests to build VO₂ estimation models, employing linear regression analysis of HR and VO₂ derived from treadmill tests, using a traditional approach [41]. Similarly, as shown in Fig 6, our incremental fitness test data demonstrated a clear linear relationship between HR and VO₂. However, this relationship was not as consistent during the outdoor simulated circuit test, where a significant variation in HR was observed while VO₂ remained steady.
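The traditional HR-based approach discussed above amounts to fitting a straight line to paired HR and VO₂ values from an incremental test and then applying it to HR recorded elsewhere. The sketch below shows this with ordinary least squares; the function name `fit_line` and all values are hypothetical and are not data from this study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


# Hypothetical incremental-test data for illustration only.
test_hr = [100, 120, 140, 160, 180]
test_vo2 = [15, 25, 35, 45, 55]
slope, intercept = fit_line(test_hr, test_vo2)

# Apply the fitted line to a hypothetical HR value recorded in the field.
estimated_vo2 = slope * 150 + intercept
print(slope, intercept, estimated_vo2)
```

Such a line maps every HR value to exactly one VO₂ estimate, so it cannot follow situations where HR varies while VO₂ stays steady, which is the limitation noted for the outdoor circuit.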

Fig 6. Examples of oxygen uptake (VO₂) measurements (blue) and the easy-to-obtain input physiological variables, HR (red) and BR (green), during the first (left) and second (right) visits for Subject 2.

https://doi.org/10.1371/journal.pone.0319760.g006

One of the strengths of deep learning models is their ability to capture these transitions, but they need adequate training data to learn the patterns [20]. One of our research questions was whether data from laboratory fitness tests could be used to build ML models that predict VO₂ during team sports activities. The structured exercise protocol employed during the incremental fitness test may not fully represent the unstructured activities we are trying to predict. The box plot in Fig 4 illustrates the variability in the performance of the LSTM model when predicting VO₂ across different exercise conditions. With RAW data and the Set C input configuration, the median residual is generally around 0 across exercise conditions, indicating no significant prediction bias. However, there is noticeable variability in the residuals, particularly during the recovery phases. The model predicts more consistently during circuit activities than during the baseline and recovery phases.

Simulated team sports activities consist of variable-speed locomotion and high-intensity actions, such as changing direction, jumping, and sprinting [13]. These activities pose unique challenges when modelling VO₂. Fig 2 shows that the raw 3D accelerometer and 3D gyroscope data can capture differences between treadmill running, outdoor running, and team-sport simulated circuits over a 3-breath window, offering high-frequency data sources to detect intense movement changes. Each circuit in our protocol included eight individual movements repeated five times, significantly increasing the number of transitions. It has been shown that IMU signals can detect high-intensity sports movements [28].

Fig 5 shows the five individual simulated circuits captured in the unsmoothed predicted VO₂ output. Fig 5 also shows the smoothing function applied to the input VO₂ during steady-state measurements. However, this smoothing operation may not be suitable for processing VO₂ data during intermittent activities. It may have removed too much dynamic information from the data for the model to learn the VO₂ kinetics, contributing to the lag in our model’s performance. If transitional periods are not accounted for, accuracy can be lost [42]. Fig 5 shows that during the recovery phase, our model struggles with these slow components, consistently overestimating the demands of recovery. When the same smoothing used on the VO₂ input is applied to the VO₂ output (Fig 5), underestimation is apparent during jogging and circuit 1, whereas circuit 2 fits well. While the model captures the general trend, significant variations and inaccuracies are observed, particularly during recovery. These discrepancies suggest that the model may require further refinement and training to improve its accuracy and reliability in tracking VO₂ dynamics during team sports activities.

One limitation of our study is the bimodal distribution observed in the predicted VO₂ values (Fig 3c). This pattern suggests that the model mainly distinguishes between active and inactive states. Although this behaviour aligns with physiological principles, where VO₂ varies significantly between rest and activity, it indicates that this distinction may dominate the predictive signal. As a result, the model may struggle to accurately capture nuanced changes within active states or during transitions between intensities. To address this, future work should consider incorporating features that could better represent transitions and intermediate states. Furthermore, expanding the dataset to include a wider range of intensities and activity transitions may improve the model’s ability to fully capture VO₂ dynamics.

Another limitation of our study is the need for more suitable intermittent testing protocols to better represent transitional periods and the dynamic nature of team sports in the training data. Addressing this limitation is crucial for improving model accuracy and reliability, as deep learning neural networks have demonstrated the ability to predict slower VO₂ kinetics and transitions effectively in structured protocols [21].

Finally, our dataset consisted of six participants, reduced from an initial eight due to incomplete data, all of whom were young, healthy adult males. This limits the generalisability of our findings to a broader population. Although this sample size was sufficient to provide good predictive power for neural network models, it is widely acknowledged that larger datasets are necessary for optimal performance and generalisability in neural network approaches.

The findings of this study highlight the potential practical applications of ML models in providing personalised predictions based on individual physiological responses. These models can facilitate the assessment of an athlete’s training status without the need for traditional fitness testing, offering real-time feedback that enables on-the-fly adjustments to training plans. This capability may enhance athletic performance and reduce the risk of injury by delivering precise, individualised feedback tailored to each athlete’s unique physiological profile. Recent studies have shown similar benefits, where ML indices predicted soccer players’ training status using heart rate and other physiological data, correlating strongly with submaximal run test outcomes [43].

Conclusion

This pilot study demonstrates the feasibility of using ML models to predict VO₂ using wearable sensor data during simulated team sports activities. To enhance the generalisability and accuracy of these models, future research should focus on expanding the dataset and incorporating more varied and complex training protocols. An intermittent field test, such as the Yo-Yo Intermittent Recovery Test Level 1, could provide more relevant training data while capturing different fitness levels [44]. Once fully developed, ML models could enable non-invasive monitoring of VO₂ during training sessions and competitive matches. By integrating wearable sensors with advanced algorithms, these models can provide real-time, individualised feedback, optimising athlete performance and well-being.

References

1. Impellizzeri FM, Shrier I, McLaren SJ, Coutts AJ, McCall A, Slattery K, et al. Understanding training load as exposure and dose. Sports Med 2023;53(9):1667–79. pmid:37022589
2. Impellizzeri FM, Marcora SM, Coutts AJ. Internal and external training load: 15 years on. Int J Sports Physiol Perform 2019;14(2):270–3. pmid:30614348
3. Seshadri DR, Drummond C, Craker J, Rowbottom JR, Voos JE. Wearable devices for sports: new integrated technologies allow coaches, physicians, and trainers to better understand the physical demands of athletes in real time. IEEE Pulse 2017;8(1):38–43. pmid:28129141
4. Schneider C, Hanakam F, Wiewelhove T, Dweling A, Kellmann M, Meyer T, et al. Heart rate monitoring in team sports: A conceptual framework for contextualizing heart rate measures for training and recovery prescription. Front Physiol. 2018;9.
5. Haddad M, Stylianides G, Djaoui L, Dellal A, Chamari K. Session-RPE method for training load monitoring: validity, ecological usefulness, and influencing factors. Front Neurosci. 2017;11:612. pmid:29163016
6. Halson SL. Monitoring training load to understand fatigue in athletes. Sports Med. 2014;44(S2):139–47. pmid:25200666
7. Osgnach C, di Prampero PE. Metabolic power in team sports - part 2: aerobic and anaerobic energy yields. Int J Sports Med 2018;39(8):588–95. pmid:29902809
8. Daly LS, Catháin CÓ, Kelly DT. Do players with superior physiological attributes outwork their less-conditioned counterparts? A study in Gaelic football. Biol Sport 2024;41(1):163–74. pmid:38188097
9. Manari D, Manara M, Zurini A, Tortorella G, Vaccarezza M, Prandelli N, et al. VO2Max and VO2AT: athletic performance and field role of elite soccer players. Sport Sci Health 2016;12(2):221–6.
10. Doeven SH, Brink MS, Frencken WGP, Lemmink KAPM. Impaired player-coach perceptions of exertion and recovery during match congestion. Int J Sports Physiol Perform 2017;12(9):1151–6. pmid:28095076
11. Lacome M, Simpson B, Broad N, Buchheit M. Monitoring players’ readiness using predicted heart-rate responses to soccer drills. Int J Sports Physiol Perform 2018;13(10):1273–80. pmid:29688115
12. Helwig J, Diels J, Röll M, Mahler H, Gollhofer A, Roecker K, et al. Relationships between external, wearable sensor-based, and internal parameters: a systematic review. Sensors (Basel) 2023;23(2):827. pmid:36679623
13. Taylor JB, Wright AA, Dischiavi SL, Townsend MA, Marmon AR. Activity demands during multi-directional team sports: a systematic review. Sports Med 2017;47(12):2533–51. pmid:28801751
14. Nedergaard NJ, Kersting U, Lake M. Using accelerometry to quantify deceleration during a high-intensity soccer turning manoeuvre. J Sports Sci 2014;32(20):1897–905. pmid:25394197
15. Vähä-Ypyä H, Bretterhofer J, Husu P, Windhaber J, Vasankari T, Titze S, et al. Performance of different accelerometry-based metrics to estimate oxygen consumption during track and treadmill locomotion over a wide intensity range. Sensors (Basel) 2023;23(11):5073. pmid:37299803
16. Gómez-Carmona CD, Pino-Ortega J, Sánchez-Ureña B, Ibáñez SJ, Rojas-Valverde D. Accelerometry-based external load indicators in sport: too many options, same practical outcome? Int J Environ Res Public Health 2019;16(24):5101. pmid:31847248
17. Chang P, Wang C, Chen Y, Wang G, Lu A. Identification of runner fatigue stages based on inertial sensors and deep learning. Front Bioeng Biotechnol. 2023;11:1302911. pmid:38047289
18. Zignoli A, Fornasiero A, Bertolazzi E, Pellegrini B, Schena F, Biral F, et al. State-of-the-art concepts and future directions in modelling oxygen consumption and lactate concentration in cycling exercise. Sport Sci Health 2019;15(2):295–310.
19. Tunca C, Salur G, Ersoy C. Deep learning for fall risk assessment with inertial sensors: utilizing domain knowledge in spatio-temporal gait parameters. IEEE J Biomed Health Inform 2020;24(7):1994–2005. pmid:31831454
20. Amelard R, Hedge ET, Hughson RL. Temporal convolutional networks predict dynamic oxygen uptake response from wearable sensors across exercise intensities. NPJ Digit Med 2021;4(1):156. pmid:34764446
21. Hedge ET, Amelard R, Hughson RL. Prediction of oxygen uptake kinetics during heavy-intensity cycling exercise by machine learning analysis. J Appl Physiol (1985) 2023;134(6):1530–6. pmid:37199779
22. Wang Z, Zhang Q, Lan K, Yang Z, Gao X, Wu A, et al. Enhancing instantaneous oxygen uptake estimation by non-linear model using cardio-pulmonary physiological and motion signals. Front Physiol. 2022;13:897412. pmid:36105296
23. Davidson P, Trinh H, Vekki S, Müller P. Surrogate modelling for oxygen uptake prediction using LSTM neural network. Sensors (Basel) 2023;23(4):2249. pmid:36850848
24. Zignoli A, Fornasiero A, Ragni M, Pellegrini B, Schena F, Biral F, et al. Estimating an individual’s oxygen uptake during cycling exercise with a recurrent neural network trained from easy-to-obtain inputs: A pilot study. PLoS One 2020;15(3):e0229466. pmid:32163443
25. Panissa VLG, Fukuda DH, Caldeira RS, Gerosa-Neto J, Lira FS, Zagatto AM, et al. Is oxygen uptake measurement enough to estimate energy expenditure during high-intensity intermittent exercise? Quantification of anaerobic contribution by different methods. Front Physiol. 2018;9:868. pmid:30038583
26. Garcia-Tabar I, Rampinini E, Gorostiaga EM. Lactate equivalent for maximal lactate steady state determination in soccer. Res Q Exerc Sport 2019;90(4):678–89. pmid:31479401
27. Jones AM, Doust JH. A 1% treadmill grade most accurately reflects the energetic cost of outdoor running. J Sports Sci 1996;14(4):321–7. pmid:8887211
28. Wundersitz DWT, Josman C, Gupta R, Netto KJ, Gastin PB, Robertson S. Classification of team sport activities using a single wearable tracking device. J Biomech 2015;48(15):3975–81. pmid:26472301
29. Perez-Suarez I, Martin-Rincon M, Gonzalez-Henriquez JJ, Fezzardi C, Perez-Regalado S, Galvan-Alvarez V, et al. Accuracy and precision of the COSMED K5 portable analyser. Front Physiol. 2018;9:1764. pmid:30622475
30. Hailstone J, Kilding AE. Reliability and validity of the Zephyr™ BioHarness™ to measure respiratory responses to exercise. Meas Phys Educ Exerc Sci 2011;15(4):293–300.
31. Robergs RA, Dwyer D, Astorino T. Recommendations for improved data processing from expired gas analysis indirect calorimetry. Sports Med 2010;40(2):95–111. pmid:20092364
32. Bishop CM. Pattern recognition and machine learning. Information science and statistics. New York: Springer; 2006.
33. Chen T, Guestrin C. XGBoost. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016. p. 785–794.
34. Rozos E, Dimitriadis P, Mazi K, Koussis AD. A multilayer perceptron model for stochastic synthesis. Hydrology 2021;8(2):67.
35. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80. pmid:9377276
36. Reusch RS, Juracy LR, Moraes FG. Assessment and optimization of 1D CNN model for human activity recognition. In: 2022 XII Brazilian symposium on computing systems engineering (SBESC). 2022. p. 1–7.
37. Shirdel M, Asadi R, Do D, Hintlian M. Deep learning with kernel flow regularization for time series forecasting. arXiv. 2021. http://arxiv.org/abs/2109.11649.
38. Silva P, Santos ED, Grishin M, Rocha JM. Validity of heart rate-based indices to measure training load and intensity in elite football players. J Strength Cond Res 2018;32(8):2340–7. pmid:28614162
39. De Brabandere A, Op De Beéck T, Schütte KH, Meert W, Vanwanseele B, Davis J. Data fusion of body-worn accelerometers and heart rate to predict VO2max during submaximal running. PLoS One 2018;13(6):e0199509. pmid:29958282
40. Allen AEA, Tkatchenko A. Machine learning of material properties: Predictive and interpretable multilinear models. Sci Adv. 2022;8(18):eabm7185. pmid:35522750
41. Achten J, Jeukendrup AE. Heart rate monitoring: applications and limitations. Sports Med 2003;33(7):517–38. pmid:12762827
42. Altini M, Penders J, Amft O. Estimating oxygen uptake during nonsteady-state activities and transitions using wearable sensors. IEEE J Biomed Health Inform 2016;20(2):469–75. pmid:25594986
43. Mandorino M, Clubb J, Lacome M. Predicting soccer players’ fitness status through a machine-learning approach. Int J Sports Physiol Perform 2024;19(5):443–53. pmid:38402880
44. Krustrup P, Mohr M, Amstrup T, Rysgaard T, Johansen J, Steensberg A, et al. The yo-yo intermittent recovery test: physiological response, reliability, and validity. Med Sci Sports Exerc 2003;35(4):697–705. pmid:12673156

[ref1] 1. Impellizzeri FM, Shrier I, McLaren SJ, Coutts AJ, McCall A, Slattery K, et al. understanding training load as exposure and dose. Sports Med 2023;53(9):1667–79. pmid:37022589
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Impellizzeri FM, Marcora SM, Coutts AJ. Internal and external training load: 15 years on. Int J Sports Physiol Perform 2019;14(2):270–3. pmid:30614348
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Seshadri DR, Drummond C, Craker J, Rowbottom JR, Voos JE. Wearable devices for sports: new integrated technologies allow coaches, physicians, and trainers to better understand the physical demands of athletes in real time. IEEE Pulse 2017;8(1):38–43. pmid:28129141
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Schneider C, Hanakam F, Wiewelhove T, Dweling A, Kellmann M, Meyer T, et al. Heart rate monitoring in team sports: A conceptual framework for contextualizing heart rate measures for training and recovery prescription. Front Physiol. 2018;9.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref5] 5. Haddad M, Stylianides G, Djaoui L, Dellal A, Chamari K. Session-RPE method for training load monitoring: validity, ecological usefulness, and influencing factors. Front Neurosci. 2017;11:612. pmid:29163016
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref6] 6. Halson SL. Monitoring training load to understand fatigue in athletes. Sports Med. 2014;44(S2):139–47. pmid:25200666
View Article
PubMed/NCBI
Google Scholar

[21] View Article

[22] PubMed/NCBI

[23] Google Scholar

[ref7] 7. Osgnach C, di Prampero PE. Metabolic power in team sports - part 2: aerobic and anaerobic energy yields. Int J Sports Med 2018;39(8):588–95. pmid:29902809
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref8] 8. Daly LS, Catháin CÓ, Kelly DT. Do players with superior physiological attributes outwork their less-conditioned counterparts? A study in Gaelic football. Biol Sport 2024;41(1):163–74. pmid:38188097
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Manari D, Manara M, Zurini A, Tortorella G, Vaccarezza M, Prandelli N, et al. VO2Max and VO2AT: athletic performance and field role of elite soccer players. Sport Sci Health 2016;12(2):221–6.
View Article
Google Scholar

[33] View Article

[34] Google Scholar

[ref10] 10. Doeven SH, Brink MS, Frencken WGP, Lemmink KAPM. Impaired player-coach perceptions of exertion and recovery during match congestion. Int J Sports Physiol Perform 2017;12(9):1151–6. pmid:28095076
View Article
PubMed/NCBI
Google Scholar

[36] View Article

[37] PubMed/NCBI

[38] Google Scholar

[ref11] 11. Lacome M, Simpson B, Broad N, Buchheit M. Monitoring players’ readiness using predicted heart-rate responses to soccer drills. Int J Sports Physiol Perform 2018;13(10):1273–80. pmid:29688115
View Article
PubMed/NCBI
Google Scholar

[40] View Article

[41] PubMed/NCBI

[42] Google Scholar

[ref12] 12. Helwig J, Diels J, Röll M, Mahler H, Gollhofer A, Roecker K, et al. Relationships between external, wearable sensor-based, and internal parameters: a systematic review. Sensors (Basel) 2023;23(2):827. pmid:36679623
View Article
PubMed/NCBI
Google Scholar

[44] View Article

[45] PubMed/NCBI

[46] Google Scholar

[ref13] 13. Taylor JB, Wright AA, Dischiavi SL, Townsend MA, Marmon AR. Activity demands during multi-directional team sports: a systematic review. Sports Med 2017;47(12):2533–51. pmid:28801751
View Article
PubMed/NCBI
Google Scholar

[48] View Article

[49] PubMed/NCBI

[50] Google Scholar

[ref14] 14. Nedergaard NJ, Kersting U, Lake M. Using accelerometry to quantify deceleration during a high-intensity soccer turning manoeuvre. J Sports Sci 2014;32(20):1897–905. pmid:25394197
View Article
PubMed/NCBI
Google Scholar

[52] View Article

[53] PubMed/NCBI

[54] Google Scholar

[ref15] 15. Vähä-Ypyä H, Bretterhofer J, Husu P, Windhaber J, Vasankari T, Titze S, et al. Performance of different accelerometry-based metrics to estimate oxygen consumption during track and treadmill locomotion over a wide intensity range. Sensors (Basel) 2023;23(11):5073. pmid:37299803
View Article
PubMed/NCBI
Google Scholar

[56] View Article

[57] PubMed/NCBI

[58] Google Scholar

[ref16] 16. Gómez-Carmona CD, Pino-Ortega J, Sánchez-Ureña B, Ibáñez SJ, Rojas-Valverde D. Accelerometry-based external load indicators in sport: too many options, same practical outcome?. Int J Environ Res Public Health 2019;16(24):5101. pmid:31847248
View Article
PubMed/NCBI
Google Scholar

[60] View Article

[61] PubMed/NCBI

[62] Google Scholar

[ref17] 17. Chang P, Wang C, Chen Y, Wang G, Lu A. Identification of runner fatigue stages based on inertial sensors and deep learning. Front Bioeng Biotechnol. 2023;11:1302911. pmid:38047289
View Article
PubMed/NCBI
Google Scholar

[64] View Article

[65] PubMed/NCBI

[66] Google Scholar

[ref18] 18. Zignoli A, Fornasiero A, Bertolazzi E, Pellegrini B, Schena F, Biral F, et al. State-of-the art concepts and future directions in modelling oxygen consumption and lactate concentration in cycling exercise. Sport Sci Health 2019;15(2):295–310.
View Article
Google Scholar

[68] View Article

[69] Google Scholar

[ref19] 19. Tunca C, Salur G, Ersoy C. Deep learning for fall risk assessment with inertial sensors: utilizing domain knowledge in spatio-temporal gait parameters. IEEE J Biomed Health Inform 2020;24(7):1994–2005. pmid:31831454
View Article
PubMed/NCBI
Google Scholar

[71] View Article

[72] PubMed/NCBI

[73] Google Scholar

[ref20] 20. Amelard R, Hedge ET, Hughson RL. Temporal convolutional networks predict dynamic oxygen uptake response from wearable sensors across exercise intensities. NPJ Digit Med 2021;4(1):156. pmid:34764446
View Article
PubMed/NCBI
Google Scholar

[75] View Article

[76] PubMed/NCBI

[77] Google Scholar

[ref21] 21. Hedge ET, Amelard R, Hughson RL. Prediction of oxygen uptake kinetics during heavy-intensity cycling exercise by machine learning analysis. J Appl Physiol (1985) 2023;134(6):1530–6. pmid:37199779
View Article
PubMed/NCBI
Google Scholar

[79] View Article

[80] PubMed/NCBI

[81] Google Scholar

[ref22] 22. Wang Z, Zhang Q, Lan K, Yang Z, Gao X, Wu A, et al. Enhancing instantaneous oxygen uptake estimation by non-linear model using cardio-pulmonary physiological and motion signals. Front Physiol. 2022;13:897412. pmid:36105296
View Article
PubMed/NCBI
Google Scholar

[83] View Article

[84] PubMed/NCBI

[85] Google Scholar

[ref23] 23. Davidson P, Trinh H, Vekki S, Müller P. Surrogate Modelling for oxygen uptake prediction using LSTM neural network. Sensors (Basel) 2023;23(4):2249. pmid:36850848
View Article
PubMed/NCBI
Google Scholar

[87] View Article

[88] PubMed/NCBI

[89] Google Scholar

[ref24] 24. Zignoli A, Fornasiero A, Ragni M, Pellegrini B, Schena F, Biral F, et al. Estimating an individual’s oxygen uptake during cycling exercise with a recurrent neural network trained from easy-to-obtain inputs: A pilot study. PLoS One 2020;15(3):e0229466. pmid:32163443
View Article
PubMed/NCBI
Google Scholar

[91] View Article

[92] PubMed/NCBI

[93] Google Scholar

[ref25] 25. Panissa VLG, Fukuda DH, Caldeira RS, Gerosa-Neto J, Lira FS, Zagatto AM, et al. Is Oxygen uptake measurement enough to estimate energy expenditure during high-intensity intermittent exercise? Quantification of anaerobic contribution by different methods. Front Physiol. 2018;9:868. pmid:30038583
View Article
PubMed/NCBI
Google Scholar

[95] View Article

[96] PubMed/NCBI

[97] Google Scholar

[ref26] 26. Garcia-Tabar I, Rampinini E, Gorostiaga EM. Lactate equivalent for maximal lactate steady state determination in soccer. Res Q Exerc Sport 2019;90(4):678–89. pmid:31479401

[ref27] 27. Jones AM, Doust JH. A 1% treadmill grade most accurately reflects the energetic cost of outdoor running. J Sports Sci 1996;14(4):321–7. pmid:8887211

[ref28] 28. Wundersitz DWT, Josman C, Gupta R, Netto KJ, Gastin PB, Robertson S. Classification of team sport activities using a single wearable tracking device. J Biomech 2015;48(15):3975–81. pmid:26472301

[ref29] 29. Perez-Suarez I, Martin-Rincon M, Gonzalez-Henriquez JJ, Fezzardi C, Perez-Regalado S, Galvan-Alvarez V, et al. Accuracy and precision of the COSMED K5 portable analyser. Front Physiol. 2018;9:1764. pmid:30622475

[ref30] 30. Hailstone J, Kilding AE. Reliability and validity of the Zephyr™ BioHarness™ to measure respiratory responses to exercise. Meas Phys Educ Exerc Sci 2011;15(4):293–300.

[ref31] 31. Robergs RA, Dwyer D, Astorino T. Recommendations for improved data processing from expired gas analysis indirect calorimetry. Sports Med 2010;40(2):95–111. pmid:20092364

[ref32] 32. Bishop CM. Pattern recognition and machine learning. Information science and statistics. New York: Springer; 2006.

[ref33] 33. Chen T, Guestrin C. XGBoost. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016. p. 785–794.

[ref34] 34. Rozos E, Dimitriadis P, Mazi K, Koussis AD. A multilayer perceptron model for stochastic synthesis. Hydrology 2021;8(2):67.

[ref35] 35. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80. pmid:9377276

[ref36] 36. Reusch RS, Juracy LR, Moraes FG. Assessment and optimization of 1D CNN model for human activity recognition. In: 2022 XII Brazilian symposium on computing systems engineering (SBESC). 2022. p. 1–7.

[ref37] 37. Shirdel M, Asadi R, Do D, Hintlian M. Deep learning with kernel flow regularization for time series forecasting. arXiv. 2021. http://arxiv.org/abs/2109.11649.

[ref38] 38. Silva P, Santos ED, Grishin M, Rocha JM. Validity of heart rate-based indices to measure training load and intensity in elite football players. J Strength Cond Res 2018;32(8):2340–7. pmid:28614162

[ref39] 39. De Brabandere A, Op De Beéck T, Schütte KH, Meert W, Vanwanseele B, Davis J. Data fusion of body-worn accelerometers and heart rate to predict VO₂max during submaximal running. PLoS One 2018;13(6):e0199509. pmid:29958282

[ref40] 40. Allen AEA, Tkatchenko A. Machine learning of material properties: Predictive and interpretable multilinear models. Sci Adv. 2022;8(18):eabm7185. pmid:35522750

[ref41] 41. Achten J, Jeukendrup AE. Heart rate monitoring: applications and limitations. Sports Med 2003;33(7):517–38. pmid:12762827

[ref42] 42. Altini M, Penders J, Amft O. Estimating oxygen uptake during nonsteady-state activities and transitions using wearable sensors. IEEE J Biomed Health Inform 2016;20(2):469–75. pmid:25594986

[ref43] 43. Mandorino M, Clubb J, Lacome M. Predicting soccer players’ fitness status through a machine-learning approach. Int J Sports Physiol Perform 2024;19(5):443–53. pmid:38402880

[ref44] 44. Krustrup P, Mohr M, Amstrup T, Rysgaard T, Johansen J, Steensberg A, et al. The yo-yo intermittent recovery test: physiological response, reliability, and validity. Med Sci Sports Exerc 2003;35(4):697–705. pmid:12673156
