Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Data augmentation of time-series data in human movement biomechanics: A scoping review

  • Christina Halmich ,

    Roles Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    christina.halmich@salzburgresearch.at

    Affiliations Department of Artificial Intelligence and Human Interfaces, Paris-Lodron-University Salzburg, Salzburg, Austria, Salzburg Research Forschungsgesellschaft mbH, Salzburg, Austria

  • Lucas Höschler ,

    Contributed equally to this work with: Lucas Höschler, Christoph Schranz

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – review & editing

    Affiliation Department of Sport and Exercise Science, Paris-Lodron-University Salzburg, Salzburg, Austria

  • Christoph Schranz ,

    Contributed equally to this work with: Lucas Höschler, Christoph Schranz

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – review & editing

    Affiliation Salzburg Research Forschungsgesellschaft mbH, Salzburg, Austria

  • Christian Borgelt

    Roles Writing – review & editing

    Affiliation Department of Artificial Intelligence and Human Interfaces, Paris-Lodron-University Salzburg, Salzburg, Austria

Abstract

Background: The integration of machine learning and deep learning methodologies has transformed data analytics in biomechanics. However, the field faces challenges such as limited large-scale data sets, high data acquisition costs, and restricted participant access that hinder the development of robust algorithms. Additional issues include variability in sensor placement, soft tissue artifacts, and low diversity in movement patterns. These challenges make it difficult to train models that perform reliably across individuals, tasks, and settings. Data augmentation can help address these limitations, but its use in biomechanical time-series data remains insufficiently evaluated. Objective: This scoping review on data augmentation for biomechanical time-series data focuses on examining current techniques, evaluating their effectiveness, and offering recommendations for their application. Design: Four online databases (PubMed, IEEE Xplore, Scopus, and Web of Science) were used to find studies published between 2013 and 2024. Following PRISMA-ScR guidelines, two screening processes were conducted to identify relevant publications. Results: 21 publications were identified as relevant. There is no universal best practice for augmenting biomechanical time-series data; instead, methods vary based on study aims. A key issue identified is the absence of soft tissue artifacts in synthetic data, leading to discrepancies and emphasizing the need for realistic techniques. Furthermore, many studies lack proper evaluation of augmentation methods, making it difficult to understand the effects of different techniques. This understanding is crucial for assessing the impact of the augmented data set on downstream models and evaluating the quality of the data augmentation process. Conclusion: This review highlights the importance of data augmentation in addressing limited data availability and improving model generalization in biomechanics. Tailoring augmentation to data characteristics can enhance the performance and relevance of predictive models. However, understanding how different augmentation techniques impact data quality and downstream performance remains essential for developing better methods.

Introduction

With the rise and advancements of Machine Learning and Deep Learning methodologies, there has been a noticeable transformation in data analytics across various fields, including human movement biomechanics [1]. The availability of wearable sensor data in recent years has further fueled this shift by enabling data-driven analytics through the generation of extensive data sets [2]. However, the complex multi-variable nature of human movement data poses challenges for traditional analytical methods, necessitating approaches capable of handling data-intensive tasks [3].

Machine learning, and deep learning in particular, have achieved remarkable success across domains such as computer vision, natural language processing, and medical imaging. These advances have also extended to biomechanics, where data-driven models are increasingly used to analyze complex movement patterns, extract meaningful features, and support clinical decision-making [3,4]. In addition, deep learning methods have emerged as a solution to address these changing challenges in human movement biomechanics by facilitating the analyses of large biomechanical data sets, extraction of relevant features, uncovering hidden relationships, and revealing emerging trends, thus advancing the understanding of human movement dynamics [3,4].

As a result, machine and deep learning methods are increasingly used for tasks such as gait classification [57], joint angle and moment prediction [8,9], fall detection [10,11], and hand movement recognition in sign language [12]. These applications typically rely on time-series data from wearable or laboratory-based sensors, such as inertial measurement units (IMUs), force plates, and electromyography (EMG), spanning domains including clinical assessment and sports performance analysis.

Despite the increasing adoption of deep learning in human movement biomechanics, the scarcity of large-scale biomechanical data sets remains a significant obstacle [13]. Most prominent deep learning architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are inherently data-intensive [14]. They rely on large, diverse and well-annotated training sets to capture the full range of variability in human motion [14]. These architectures contain millions of parameters and require abundant data to learn hierarchical representations and avoid overfitting. With sparse training data, these models often memorize noise or irrelevant features, resulting in poor generalization to new participants, tasks, or sensor configurations [5,15]. Although regularization strategies such as dropout, weight decay, or cross-validation are commonly used, they offer limited protection when the data is fundamentally sparse [16]. This challenge is further compounded in biomechanical datasets, which frequently exhibit class imbalance, session-specific artifacts, and subject-specific variability. As a result, deep learning pipelines trained on limited biomechanical time-series data often yield fragile models that perform well only under narrowly defined conditions.

This limitation is especially problematic in domains like rehabilitation, sports science, or rare condition monitoring, where collecting large-scale, labeled datasets is time-consuming, expensive, or ethically constrained [13,17]. The acquisition of biomechanical data comes with particularly high effort, as the measurement setup requires physical effort from participants, extended measurement duration under varying influence factors, and time-consuming data preprocessing. This makes many research projects economically challenging both in terms of cost and time [13,17]. Additionally, the limited availability of participants, coupled with challenges in obtaining informed consent, is further underlining this issue [13]. Moreover, data collection using sensors can be impeded by uncontrollable conditions (e.g., equipment malfunctions, human error, sensor errors due to soft tissue) leading to data quality issues or loss [4]. Consequently, enhancing the robustness and generalizability of machine learning models in these settings requires strategies that extend or enrich existing datasets. While merging datasets could offer increased diversity, it is often impractical due to heterogeneity in sensor types and placements, feature representations, and activity definitions, especially across datasets collected in different research settings [18]. Therefore, data augmentation techniques have emerged as a promising solution to simulate variability without requiring data set merging.

Data augmentation involves expanding existing data sets and increasing data diversity by modifying original samples, thus enhancing the robustness and generalization capabilities of machine learning models [15]. It is not uncommon for data generation and data augmentation to be used interchangeably. However, it is important to note that data generation involves the creation of new synthetic samples, whereas data augmentation pertains to the modification of existing data samples. For the sake of simplicity within this work, data augmentation will be taken to encompass data generation methods.

Initially applied in image recognition, data augmentation techniques have proven effective in expanding data sets and creating diversity [1921]. However, the application of these techniques to time-series data, prevalent in human movement biomechanics, presents unique challenges.

Time-series data augmentation requires careful consideration of temporal dependencies and domain-specific constraints to generate synthetic samples that accurately represent real-world phenomena [15,19,21]. In computer vision, the effects of data augmentation are often visually interpretable, making it easier to detect unrealistic or invalid transformations. In contrast, the temporal dependencies and biomechanical plausibility required in time-series data make it far more difficult to identify when an augmentation distorts the underlying signal structure or introduces biologically implausible patterns [15,19,21]. Moreover, the aim of data synthesis in biomechanics is to generate data that not only looks visually realistic but also improves the quality of subsequent analysis such as classification tasks.

Existing augmentation techniques in biomechanics primarily rely on physics- or statistics-based methods, which may not represent valid biomechanical samples due to simplifications in formulas used or realism of rotations that are introduced [4,10,22]. Alternatively, data-driven methods utilizing machine learning techniques, such as Generative Adversarial Networks (GANs), diffusion models, or auto-regression, are not yet commonly used in biomechanics [23,24] nor validated in the biomechanical setting.

To the best of our knowledge, there exists no comprehensive comparison of various augmentation techniques for biomechanical time-series data nor clear guidance on which methods to employ. This gap in the literature can lead to inefficiencies and inconsistencies in research, as well as hinder advancements in the field. By systematically exploring and analyzing current data augmentation techniques used for biomechanical time-series data across various tasks, this work aims to provide a systematic overview of existing data augmentation methods and to offer guidance for researchers working with biomechanical time-series data. This topic is particularly timely as the use of machine learning in biomechanics continues to expand and pipelines become more standardized, yet data augmentation methods are often adopted from other domains, such as computer vision, without sufficient validation for the unique constraints of biomechanical time-series data. Improvements in augmentation techniques in these areas are crucial because only with sufficient and well-augmented data can we enhance the accuracy and robustness of predictive models, facilitate the development of personalized training and rehabilitation programs, and ultimately lead to better health outcomes and performance optimizations. Understanding when and how to apply these methods is critical, given their growing use to address fundamental limitations in biomechanical data collection.

Materials and methods

Eligibility criteria

Aligned with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews [25]) framework, this scoping review systematically searches, selects, analyzes, and reports relevant literature. Thereby ensuring transparency throughout the reviewing process. The eligibility criteria were set to ensure a focused and comprehensive review of studies and research concerning data augmentation for biomechanical time-series data in human movement biomechanics. The following criteria were established:

  1. Publication Types:
    • Peer-reviewed journal articles
    • Conference proceedings
  2. Publication Period:
    • Publications between January 2013 and July 2024.
  3. Language:
    • Publications must be written in English.
  4. Topic Relevance:
    • Publications must address biomechanical aspects using data augmentation techniques on time-series data.
  5. Exclusion Criteria:
    • Publications that solely address biomechanical aspects without discussing data augmentation techniques.
    • Studies that only utilize non-time-series data.
    • Publications that do not analyze human movement, such as a focus on animal biomechanics or non-biological systems.

These criteria ensure that the review focuses specifically on relevant studies that contribute to the understanding of data augmentation techniques in the context of biomechanical time-series data analysis within human movement biomechanics.

Information sources

Four databases, PubMed, Scopus, IEEE Xplore, and Web of Science , were searched for peer-reviewed journals and conference proceedings. While Scopus and Web of Science cover interdisciplinary research across various fields, IEEE Xplore added conference proceedings from engineering, technology, and computer science. Additionally, PubMed was incorporated to encompass biomedical, life science, and health-related disciplines. Publications from January 2013 until July 2024 were considered. The initial database search was conducted in November 2023, extracting publications from PubMed, Scopus, and IEEE Xplore. The decision to include Web of Science was made in January 2024. The final database search for all databases was conducted in July 20204.

Search

Our search strategy was designed to identify all indexed publications utilizing time-series data generation methods within the biomechanics domain.

Initial query formulation.

To formulate effective search queries, we began by defining primary terms related to our topic. Key terms such as “time-series,” “data augmentation,” and “biomechanics” were selected to ensure the inclusion of papers that utilized data augmentation for biomechanical time-series data sets. We then explored synonyms and variations of these terms to achieve comprehensive coverage. To capture the temporal aspects, we used multiple variants of the keywords including “time-series,” “temporal,” “sequential,” “waveform,” and “one-dimensional.” The domain-specific terms were centered around biomechanics. Keywords such as “data augmentation” and its synonyms like “synthetic*” and “generate*” were employed to filter for papers focused on data augmentation and synthetic data generation. The term “generate” produced numerous false positives, necessitating additional qualifiers like “data” and “sample” to improve specificity. Additionally, domain-specific features indicative of time-series data, such as “gait” and “kinematics,” were incorporated.

Iterative refinement process.

Our search strategy underwent iterative refinement based on initial search results and feedback from collaborators. After an initial database search, we reviewed the retrieved articles and identified areas for refinement or expansion of our search terms. Adjustments were made to incorporate additional keywords, synonyms, and database-specific search techniques, enhancing the thoroughness of our approach. This iterative process systematically improved the effectiveness of our search strategy, ensuring comprehensive coverage of relevant literature.

Initial search challenges.

The initial search yielded only a few relevant publications. Identifying two key papers that were missing from our results highlighted the need to explore alternative search terms commonly used in biomechanics. It was discovered that machine learning terms are infrequently used in biomechanical literature; instead, verbs denoting the outcome of the method, such as “synthetic data set” or “generated data,” are preferred.

The term “data augmentation” yielded limited results, leading to the inclusion of synonyms and qualifiers. Similarly, alternative mechanical features indicative of time-series data were incorporated. To ensure relevance to machine learning, the term “learning” was refined to “machine learning” and “deep learning” due to initial challenges.

Final search queries.

This procedure resulted in the formulation of four final search queries, where an asterisk is a wildcard operator:

  1. (“time-serie*” OR “timeserie*” OR “time serie*” OR “temporal data” OR “temporal sequence” OR “periodical data” OR “sequential data” OR “time structured” OR “time sequence”) AND (“data augmentation” OR (“synth*” AND “data”) OR “data generation” OR “data enhancement” OR “data enrichment” OR “data creation”) AND “biomech*” AND (“machine learning” OR “deep learning”)
  2. (“time-serie*” OR “timeserie*” OR “time serie*” OR “temporal data” OR “temporal sequence” OR “periodical data” OR “sequential data” OR “time structured” OR “time sequence”) AND ((“synth*” OR “generated” OR “augmented” OR “enhanced” OR “enriched” OR “created”) AND (“data” OR “sample*”)) AND “biomech*” AND (“machine learning” OR “deep learning”)
  3. ((“synth*” OR “generat*” OR “augment*” OR “enhance*” OR “enrich*” OR “creat*” OR “simulat*”) AND (“data” OR “sample*”)) AND “biomech*” AND (“machine learning” OR “deep learning”) AND ( “gait” OR “inertial sensors” OR “IMU” OR “kinematics” OR “Ground Reaction Force” OR “inverse dynamics” OR “joint moments” OR “kinetics” OR “waveform” OR “joint angles” OR “power” OR “one dimensional data”)
  4. (“time-series” OR “timeseries” OR “time series” OR “temporal data” OR “temporal sequence” OR “periodical data” OR “sequential data” OR “time structured” OR “time sequence”) AND ((“synth*” OR “generat*” OR “augment*” OR “enhance*” OR “enrich*” OR “creat*” OR “simulat*”) AND (“data” OR “sample*”))AND “biomech*” AND (“machine learning” OR “deep learning”)

Table 1 shows the final queries and the number of publications found in each database.

thumbnail
Table 1. Number of publications found in each database with the corresponding search terms that include duplicates.

https://doi.org/10.1371/journal.pone.0327038.t001

Validation.

To validate the effectiveness of our search strategy, we compared the retrieved search results to a set of known relevant articles identified from preliminary literature searches. This validation process ensured that our search strategy captured a comprehensive range of relevant literature. Additionally, we consulted with domain experts in biomechanics and data augmentation to review the search strategy and provide feedback on its comprehensiveness and relevance. By incorporating these validation steps, we enhanced the credibility of our methodology and ensured the robustness of our search strategy.

Selection of sources of evidence

Using the queries listed in the previous section Final search queries, a total of 710 publications were found, published between January 2013 and July 2024. After identifying these publications, the next steps in the PRISMA guidelines to ensure a transparent and systematic selection process are screening and assessing the eligibility of publications to comprehensively capture relevant literature for our scoping review. Fig 1 provides a visual summary of the systematic selection process, adhering to PRISMA guidelines for scoping reviews. After removing duplicates, 372 unique publications remained.

thumbnail
Fig 1. Flow diagram of the systematic selection.

PRISMA 2020 flow diagram for systematic reviews based on the publications on this topic.

https://doi.org/10.1371/journal.pone.0327038.g001

First screening process

In the initial screening, each publication’s title and abstract were independently reviewed by at least two researchers to assess relevance to this scoping review. The criteria for relevance included:

  • Alignment with the scope of biomechanical time-series data augmentation in human movement
  • Discussion of data augmentation methods
  • Explicit aims to augment data or proposed methods suitable for data augmentation

This inclusion criterion ensured that only publications with data augmentation as the primary goal or relevant methodologies specifically designed for data augmentation were considered. Any discrepancies or uncertainties during the screening were resolved through discussion and consensus among all researchers. This rigorous process ensured the quality and relevance of the included publications.

Full-text analysis

After the initial screening, 48 publications were identified as fitting the scope of this scoping review and were selected for full-text analysis. However, one publication was not available online and was excluded after the authors did not respond to our inquiries, leaving 47 publications for full-text review. Each of the 47 publications underwent a thorough review by at least two researchers. In cases of uncertainty, all researchers collectively examined and discussed the publication in question. During this process, the main data items explained in Section Data items were extracted.

Final selection

At the end of the detailed review process, 21 publications were included in the scoping review. This selection was based on the rigorous application of inclusion criteria and the detailed examination of each publication, ensuring the comprehensiveness and relevance of the included evidence. Table S1 presents the final 21 publications, including their titles, and authors.

Data items

To systematically and consistently capture the relevant information from the publications, the researchers completed a survey addressing various aspects of the studies. This survey facilitated the extraction of essential details from the publications, including the year and location of publication, specifics about the data used (such as type and sensor placement), the disciplinary context, and details about machine learning models trained on the augmented data set. Furthermore, the survey included an analysis of research gaps that necessitated the use of data augmentation techniques. The primary focus of the analysis was identifying the type of augmentation methods employed and the methodologies used to generate synthetic data in the reviewed publications.

Results

Geographic and temporal trends

The publications show a diverse geographical distribution, as illustrated in Fig 2. The data was extracted based on the affiliation of the corresponding first author. Germany was the most prolific contributor with five publications [4,17,2628]. China and the United States followed closely with four publications [11,23,29,30], [3134]. Japan [12,22], Korea [10,35], Italy [36,37], and the United Kingdom [24,38] each contributed two publications.

thumbnail
Fig 2. Geographical distribution of publications.

Geographical distribution of the publications extracted from the affiliation of the first author and ranked by the number of publications.

https://doi.org/10.1371/journal.pone.0327038.g002

An analysis of the temporal distribution of publications reveals interesting trends in research activity over the period considered. As shown in Fig 3, a notable peak is observed in 2020 with seven publications [4,10,12,26,28,32,36], indicating significant interest in the topic at that time. This is followed by four publications in 2024 [11,30,37,38], reflecting sustained research activity. In 2022, three publications within this area were published [23,24,31]. In 2019 [17,22], 2021 [33,34] and 2023 [29,35], two publications each were recorded, indicating consistent research presence during these periods. Conversely, 2018 saw one publication [27]. Notably, no publications were found from 2013 to 2017, suggesting an emergence of interest in the topic in subsequent years.

thumbnail
Fig 3. Temporal trends of publications.

Number of publications published over the years 2018 to 2024. Notably, the years from 2013 to 2017 are not included as no publications were made that fall within the inclusion criteria.

https://doi.org/10.1371/journal.pone.0327038.g003

Reasons for data augmentation

The authors of the publications list multiple reasons for using data augmentation for biomechanical time-series data sets:

  1. Limited Data Set Size [4,10,12,2224,26,28,29,3234,36,38]
    The main reason for employing data augmentation methods is the limitation in data set size, which impedes the performance of training machine and deep learning models. In this context, the model utilized to solve the actual objective is referred to as a downstream model. Synthetic data is frequently used to improve generalization and prevent overfitting in the downstream model, as reported by several studies [10,12,2224,29,33].
  2. Limited Variation of Movement and Parameters [22,34,35]
    When data is sparse within a data set and a limited number of participants is used to acquire data, the number of movements and the variety of their execution is limited. Data augmentation can be used to enhance model performance by introducing a broader range of data variations.
  3. Data Acquisition Challenges [11,31]
    Acquiring data can be troublesome due to issues such as sensor malfunctions or errors in the data processing pipeline, resulting in data loss. Data augmentation can aid the data collection process and account for missing data.
  4. Sensitivity of IMU Sensors [11,27,35]
    Inertial measurement unit (IMU) sensors are highly sensitive to the positioning on the body and the rotation they are in. Slight modifications in sensor placement and rotation result in different signals collected from the IMU sensors and thereby hinder a good performance of the model. Data augmentation can help increase the variability in the signals such that the model can adapt to changes in position and rotation. Here, [27] focused on evaluating and explaining the existing gap between synthetic and measured data, related to the influence of soft tissue.
  5. Financial Constraints [11,17,24,30,3537]
    Collecting data and testing new inventions in sports equipment is time and money-consuming. A lot of resources need to be spent on imitatively testing new development of sports equipment, (e.g. footwear). Acquiring data from a large number of participants is expensive in terms of money and resources, such as equipment, expert knowledge ,and wages of employees. Data augmentation can help limit these costs and resources due to introducing variability in the data set and thus limit the amount of data acquisition that needs to take place.
  6. Unavailability of Combined Data Set [36]
    There may not be a data set that includes several desired data sources for specific environments. For example, the absence of a Motion Capture (MOCAP) data set in combination with IMU data in microgravity, together with the huge cost of acquiring it, necessitated its synthetic generation. [36]

Hence, the reasons for employing a data augmentation method predominantly either revolve around addressing challenges related to limited data set availability such as restricted movement variations and lower downstream model performance, or focus on the underlying causes for the data limitations, such as expensive data acquisition and sensor issues during data collection. In summary, the reasons listed by the authors demonstrate that data augmentation serves as a versatile and indispensable tool for addressing various challenges in sports product development and related research domains. It enables enhanced model performance, robustness, and generalization capabilities despite data limitations and resource constraints.

Disciplines and aims

The publications included explore a diverse range of applications and sports disciplines where data augmentation methods are used for biomechanical time-series data sets. The scoping review reveals a predominant focus on walking tasks, with studies spanning various terrains and surfaces. Additionally, research extends to fall detection, diverse sports disciplines, and specialized areas such as sign language recognition and microgravity training. Each study employs data augmentation techniques to enhance model performance and overcome data limitations, reflecting its crucial role in advancing research across these domains.

Walking

Most publications primarily focused on different types of walking tasks. Even though these publications all focused on increasing the data set size and thereby improving the downstream model’s ability to generalize more efficiently, the aims of each publication cover a wide range of use cases:

  • Creating virtual IMUs from optoelectronic marker trajectories [26]
  • Detecting chronic ankle instability [23]
  • Analyze errors caused by rotation and misalignment of IMU sensory [27]
  • Estimation of gait parameters [28,33,34]
  • Improve exoskeleton controls [29,31]
  • Reducing resources needed for experiments [22]
  • Estimation of myoelectric activity [30]
  • Development of a novel data augmentation technique [24,37]

Diverse disciplines

The remaining publications covered a diverse range of sports and disciplines related to human movement biomechanics:

  • Fall detection applications that aim to predict and categorize falls [10,11,35]
  • Head impact in contact sports (i.e. American football) to predict head injuries [32]
  • Running with a focus on the mechanical properties of footwear [17]
  • Training in microgravity with the aim of creating a data set for categorizing correctly and incorrectly performed exercises in microgravity [36]
  • Sign language and the recognition of different signs [12]
  • Running and walking with the aim of estimating sagittal lower body kinetics and kinematics [4]
  • Walking and jumping to test strategies for optimizing model performance [38]

Data sources and data set details

Table S2 shows the data sources used across all publications within the scope of this review. While ten publications [10,12,2224,2931,34,35] acquired custom data sets within their publication, seven publications [4,17,26,32,33,37,38] used previously collected or public data. The remaining four publications [11,27,28,36] used previously collected or public data sets in combination with custom data.

Data sources

Most publications [10,12,23,24,27,32,33,35,38] used one type of sensor, mostly MOCAP systems or IMUs. Eight publications [11,17,22,26,2830,36] used two different kinds of sensors to create their data set with the most common combination being MOCAP and force plates. [4,31,34] were the only publications that used three different types of sensors - IMUs, MOCAP and Force Plates - to acquire their data set, while [37] used a data set containing data from five sensors. [32] utilized a data set that was collected using specifically designed mouth guards measuring head impact and [22] used a Nintendo Wii Balance Board as a cheaper alternative to force plates. Fig 4 shows the quantity of different data sources used across the publications.

thumbnail
Fig 4. Quantity of data sources.

Frequency with which each sensor was used across the publications included in this scoping review.

https://doi.org/10.1371/journal.pone.0327038.g004

Number of participants

The number of participants varied considerably across all publications.

  • For custom data sets, the number of participants ranged from six to 42 ().
  • For publications using previously collected or public data sets, the number of participants ranged from ten to 2295 (), with [26] and [32] not reporting the total number of participants.
  • For publications using combined data sets (both newly acquired and previously collected data), the number of participants in the newly acquired data ranged from six to 32 (). The number of participants from public or previously collected data ranged from 17 to 93 (). Note that [36] did not report the number of participants in the added data set.

In summary, the variability in the number of participants across studies underscores the challenges of limited data availability. This presents several issues due to limited movements and limited variations in said movements. The small sample sizes in many studies can lead to limited statistical power, making it difficult to generalize findings to a larger population. Small data sets are more susceptible to random variations and may not capture the full spectrum of potential outcomes. With fewer participants, the diversity of movements captured in the data set is restricted. This limitation can affect the training of machine learning models, which require a wide range of input data to learn effectively. A limited variety of movements means the models may not perform well in real-world scenarios where movements can vary significantly. This includes variations in age, anthropometry, and fitness levels. Consequently, models trained on these data sets may have biased performance and may not apply to all user groups. Therefore, data augmentation emerges as a crucial tool to overcome these challenges as it can help to introduce more variety in the data thereby enabling the development of more effective and generalizable models in the field of biomechanics.

Data set size

Only eight publications reported the size of their final data set. However, the size of training instances is hardly comparable and may depend on the data representation.

  • [4] 595 walking and running cycles
  • [32] 572 head impact kinematics
  • [12] 16890 signs
  • [10] reported a total of 1278 samples, however, it was not clear what these samples were.
  • [38] 75732 bilateral trials
  • [11] 3313 fall samples
  • [30] 215 slow, 241 normal, and 258 fast gait cycles
  • [37] 37687 gait cycles

Augmentation method

The augmentation methods reported in the publications can be categorized into three main groups: physics-based, classical, and data-driven methods. The following sections will analyze each group regarding their usage and characteristics.

Physics-based methods

We categorized ten methods as physical-based methods. These methods either augment data by rotating IMU data to create more IMU signals or use musculoskeletal (MS) models to generate new data samples. These techniques mostly ensure the biomechanical validity of the generated samples, but validity may be limited due to simplifications in the formulas used or the realism of the rotations introduced. Physics-based methods can be distinguished into the following two categories:

Rotation methods

Rotation methods do not create new synthetic participants but enhance the performance of downstream models by creating variations in sensor signals. In this way, these methods increase the variability in movements and reduce the influence of misaligned sensor data on the performance of downstream models. In four publications, rotation methods are employed to augment the training data and increase the data set size.

  • [35] proposed two rotation-based methods, uniform and normal augmentation, sampling rotation angles from a uniform or truncated normal distribution.
  • [33] drew random rotations from a normal distribution around the true orientation of the sensor. Added gravity and random Gaussian noise to account for noise experienced by true sensors.
  • [26] randomly rotated the relative orientation of virtual sensors to simulate acceleration and gyroscope data. To obtain the angular velocity, they calculated the second derivative of the body segment origin from an MS model.
  • [27] randomly sampled rotation angles and added Gaussian noise to generate synthetic IMU data.

Musculoskeletal models

The remaining six publications utilized MS models to generate new data, ensuring biomechanical plausibility through anthropometric scaling and fitting of validated body models with appropriate degrees of freedom.

  • [17] created MS models to solve 1120 optimal control problems using their proposed method.
  • [4] randomly drew measured joint angles, GRFs, and speeds from training data to generate unique MS models, solving optimal control problems to generate training samples, where simulated data was chosen randomly from 1000 simulations of each subject.
  • [36] built an MS model using OpenSim [39] and added 3D modeled geometry to simulate microgravity.
  • [31] simulated IMU data using a OpenSim model [39] due to saturation and dropout of the sensors.
  • [11] build an openSim model based on markerless MOCAP and augmented data by scaling the simulation model.
  • [28] Set up an anatomical coordinate system based on a biomechanical model, rotated and translated it to match possible sensor positions, and calculated derivatives to obtain acceleration and angular rates.

Classic methods

Six publications used classical methods like adding noise or jittering. While these techniques augment original samples, they do not guarantee biomechanical validity nor add meaningful information of biomechanical patterns, but increase robustness and generalization by adding variability to sensor data.

  • [22] proposed probabilistic augmentation, generating an arbitrary number of new steps data based on insole sensor pressure data of one step by drawing from a multivariate normal distribution.
  • [29] applied multi-window sampling, scanning input data with a shifting window to provide more data samples.
  • [12] applied time and magnitude warping techniques, slicing input into equal lengths and applying distortions and noise.
  • [10] compared time warping, jittering, and scaling techniques to change temporal characteristics, scale data magnitude, and add mechanical noise.
  • [34] compared various warping techniques.
  • [38] applied jittering, magnitude and random guided warping, window slicing and a spawner.

Data-driven methods

Five publications used data-driven methods, which are characterized by learning the inherent data distributions and patterns to generate realistic yet synthetic data instances.

  • [23] used a Dual-GAN with gradient penalty to generate spatio-temporal and kinematic data, approximating the real data distribution by a nonlinear dimensionality reduction method (t-SNE algorithm).
  • [24] used a GAN with an autoencoder, where the generator (autoencoder) compressed and reconstructed input data, and the discriminator distinguished real from simulated data, creating samples following the training distribution.
  • [32] employed PCA for dimensionality reduction, followed by an emulator to generate new time series, creating a stochastic dimensionality reduction with time-dependent modes.
  • [37] used a Wasserstein GAN with gradient penalty to generate synthetic movement patterns from only anthropometric measures.
  • [30] used a Wasserstein GAN based on the TimeGAN framework to produce synthetic IMU and EMG data.

Fig 5 illustrates that the various augmentation methods are fairly evenly distributed across the publications. This distribution indicates that there is no single, universally accepted standard method for data augmentation in the field of biomechanical data analysis. Instead, the choice of augmentation method appears to be highly dependent on the specific aims and requirements of each study. Overall, the selection of an augmentation method should consider the specific study goals, the available computational resources, and the desired balance between simplicity and biomechanical fidelity.

thumbnail
Fig 5. Data augmentation methods.

Frequency of each data augmentation category used across the publications included in this scoping review.

https://doi.org/10.1371/journal.pone.0327038.g005

Biomechanical validity.

If the biomechanical validity of the generated data is crucial, physics-based methods are superior due to adhering to realistic biomechanical constraints. Additionally, these techniques can address and mitigate the impact of misaligned sensor data, enhancing the performance of the downstream model.

Robustness and limited resources.

If the aim is to improve the robustness of downstream models without the need for realistic synthetic samples and under limited time resources, classic methods like jittering or warping might be the right choice to increase data variability.

Variability and biomechanical patterns.

If the aim is to increase data variability and add underlying biomechanical patterns, data-driven methods can be effective, as these methods can learn intricate relationships within the data. While they follow the distribution of the original data set, they can still produce more realistic synthetic samples that reflect the natural variability presented in the data.

Noteworthy methods

Some publications, although excluded during the full text analysis in the selection process, offer interesting methods that could be beneficial for developing improved data augmentation techniques. Notably, the integration of Reinforcement Learning (RL) in Musculoskeletal (MS) models is a promising approach for enhancing these methods. This integration primarily aims to improve MS models to enhance their biomechanical validity and understanding of complex movements. For instance, [40] implemented a Deep Reinforcement Learning (DRL) algorithm using Proximal Policy Optimization (PPO) combined with reward shaping and imitation learning to simulate the walking patterns of both healthy individuals and users of transfemoral prostheses. The results indicated that the prosthesis model required higher muscle forces, demonstrating the added complexity in achieving a natural gait compared to a healthy leg. Similarly, [41] trained an RL agent to simulate human walking by interacting with an MS model. The agent’s actions, based on muscle excitations, were optimized through continuous feedback from the environment, leading to improved walking simulations. Additionally, [42] integrated bioinspired reward reshaping strategies to enhance the simulation and analysis of human locomotion and falls.

These approaches primarily focus on leveraging RL to enhance the biomechanical validity of MS models, allowing for a better understanding of complex human movements. By incorporating biomechanical principles into reward function design, optimizing action and state space management, and ensuring rigorous validation, these models can also be used for data augmentation and may help improve MS models to generate more realistic synthetic data.

Downstream model

In the publications, various downstream models were employed to predict features and evaluate the data augmentation methods. A summary of the number of downstream models used per publication and commonly used downstream models is provided below.

Overview of downstream model usage

  • Single Model Usage: Ten publications [4,10,11,24,28,29,31,34,35,38] used one type of downstream model.
  • Comparison of Two Models: Five publications [22,26,27,33,36] compared two types of downstream models against each other.
  • Comparison of Multiple Models: Three publications compared three [23], seven [12] and eight [30] types of downstream models, respectively.
  • Publications Without Downstream Models: Three publications [17,32,37] did not use any downstream model.

Common downstream models

  1. Long Short-Term Memory Networks (LSTMs):
    • Standard LSTM:
      • Sign language recognition [12]
      • Detecting falls [11]
      • Predicting chronic ankle disease [23]
      • Prediction of joint kinematics and/or kinetics of various joints [26,33]
    • ConvLSTM (Combination of LSTM and CNN):
      • Sign language recognition [12]
      • Pre-impact fall detection [35]
      • Predicting chronic ankle disease [23]
      • Evaluating joint training [27]
    • Bi-Directional LSTM:
      • Predicting knee joint angles [34]
      • Pre-impact fall detection [10]
      • Estimation of electrical muscle activity [30]
    • LSTM-Fully Convolutional Network Variant: Predicting chronic ankle disease [23]
  2. Convolutional Neural Networks (CNNs):
    • Standard CNN:
      • Prediction of joint kinematics and/or kinetics of various joints [4,24,33]
      • Evaluating joint training [27]
      • Predicting foot placement [29]
      • Estimating ground reaction forces [24]
      • Sign language recognition [12]
      • Estimation of electrical muscle activity [30]
    • Temporal Convolutional Network (TCN): Predicting sagittal hip moments [31]
    • Attention-Based Multiple-Input Multiple-Output Conv2D Regression Model: Estimation of electrical muscle activity [30]
  3. Other Machine Learning Models:
    • K-Nearest Neighbour:
      • Sign language recognition [12]
      • Estimation of electrical muscle activity [30]
    • Random Forest:
      • Sign language recognition [12]
      • Estimation of electrical muscle activity [30]
    • Decision Tree: Estimation of electrical muscle activity [30]
    • Support Vector Machines (SVMs):
      • Sign language recognition [12]
      • Categorization of correctly and incorrectly performed movements [36]
      • Estimation of electrical muscle activity [30]
    • Multi-Layer Perceptrons (MLPs) - Non-CNN:
      • Sign language recognition [12]
      • Prediction of joint angles and moments of lower limbs [26,28]
      • Categorization of correctly and incorrectly performed movements [36]
      • Estimation of electrical muscle activity [30]
    • Regression Variants:
      • Gaussian Processes and Linear Regression: Estimating vertical GRFs [22]
      • Multinomial Logistic Regression: Categorize participants in different groups [38]

The use of LSTMs, particularly their variants such as convLSTMs, was prevalent in many publications, highlighting their popularity in processing sequential biomechanical data. However, the diversity of models used suggests that no single model is universally superior. This variety underscores the importance of selecting appropriate models based on the specific task and data characteristics.

Moreover, the reliance on these models further emphasizes the necessity of data augmentation techniques, especially when dealing with limited data sets. LSTMs and CNNs, in particular, depend on a large amount of training data to avoid overfitting and to enhance generalization. Without sufficient data, these models can struggle to learn robust features and may perform poorly on unseen data [19,20].

Evaluation of augmentation methods

Evaluation methods

The majority of publications [4,10,12,22,24,26,27,29,33,35,38] evaluated the quality of data augmentation methods based on the performance of downstream models. Typically, the performance of downstream models trained on non-augmented data was compared to those trained on augmented data. Only a few works [11,28,32,37] validated synthetic data against the original data set. This was done either visually [11,32] or statistically [28,37], ensuring the augmented data preserved statistical properties. Three publications [23,30,34] used both methods to evaluate their augmentation method. Some publications did not evaluate the performance of the data augmentation method [17,31,36].

Limited comparability.

Focusing primarily on downstream model performance provides a direct measure of how effective data augmentation methods are in improving the predictive capabilities of models. While this is crucial, the following limitations should be considered. The lack of comparison of augmentation methods means it is unclear why certain methods work better than others, especially if the details of the data augmentation are not well understood. Understanding the effect of data created using different augmentation techniques is crucial for comprehending the impact of the increased data set on the downstream model and evaluating the quality of certain aspects of the data augmentation technique. Furthermore, comparing synthetic and measured data can help identify gaps and discrepancies. Recognizing where synthetic data falls short can guide the development of more sophisticated augmentation methods that better capture the complexities of real data.

Findings

As shown in Table S3, the consensus was that data augmentation generally enhances the accuracy and generalization of downstream models across individual tasks.

Kinetics vs kinematics.

Findings on different effects of augmented data on kinematics and kinetics were reported by [4,28]. Both found that while predictions for kinematics were increased using synthetic samples created by data augmentation methods, there was no improvement in terms of root mean squared error (RMSE) and correlation for kinetics. [28] further reported that adding noise to the training data did increase the performance on kinetics. As [4] used an MS model to generate new synthetic data, the suggestion was made that this discrepancy is due to noisy, oscillating joint moments produced by the musculoskeletal model, which only tracked joint angles and GRFs, but not joint moments. Additionally, the reference joint moments could be too smooth due to filtering applied to marker and force plate data before computing joint moments, leading to inaccuracies, especially for hip joint moments, as they were estimated using inverse dynamics, which accumulated errors. [28], which used a rotation-based augmentation technique, argued that predicting kinematics is more challenging than predicting joint moments due to the closer relationship between acceleration and joint moments and the more complex initial value problem of joint angles. Consequently, kinematics predictions benefit from larger data sets, as demonstrated by improved accuracy seen with measured, combined, and simulated data. Conversely, kinetics predictions improve with additional noise in the input data rather than larger data sets, due to the absence of larger soft tissue movements in simulated data. Soft tissue movements affect the calculation of both joint angles and moments, and simulated IMU data lacks these movements, which is a limitation that will be discussed in the next paragraph. [24] on the other hand reported an improvement in both, kinematic and kinetic predictions, by sampling synthetic data of markers and GRFs simultaneously. This approach demonstrated enhanced test set performances in both kinematic and kinetic predictions, highlighting the potential benefits of more comprehensive data augmentation strategies.

Soft tissue artifacts.

Five publications highlighted issues related to soft tissue artifacts in synthetic data introduced by IMUs attached to various body segments. Synthetic data often lacks typical soft tissue movement artifacts present in measured data, leading to discrepancies known as the “synthetic gap” [4,2628,34]. This gap significantly influences sensor measurements, especially at higher movement velocities. For instance, determining the hip joint center is challenging due to soft tissue, and rigidly attaching the pelvis IMU is problematic, causing larger artifacts in pelvis measurements [26,34]. Some studies suggested that noise from soft tissue movements could improve kinetic predictions, whereas kinematic predictions benefited more from larger data sets [28]. Additionally, the assumption of rigid sensor attachment to the skeleton when generating synthetic IMU data from MS models does not reflect the loose connections caused by soft tissue in reality, which could be addressed with more realistic models such as wobbling mass models [4]. Overall, integrating soft tissue artifacts into data augmentation models is crucial for accurately representing measured signals and improving the performance of neural networks in biomechanical analysis [27]. Most of these publications employed rotation-based augmentation methods, which could be particularly susceptible to soft tissue errors [2628]. The other two publications used an MS model [4] and different warping techniques [34], respectively. This further emphasizes that soft tissue artifacts might be an issue for several types of data augmentation techniques and therefore need to be considered in the development of more effective methods. On the other hand [30] offers an approach where the positioning and orientation of IMUs do not need to match exactly, as long as the IMUs are approximately attached to, in their case, thigh, and shank. Even though, they do not address soft tissues within their publication, it would be interesting to know whether they had also issues related to this topic. In conclusion, synthetic IMU data should be used cautiously when augmenting data sets, as they tend to be too clean and thus not realistic. Additionally, further research is needed to evaluate how soft tissue artifacts can be effectively simulated to ensure a more accurate representation of real-world data.

Discrepancies in simulated IMU accuracy and model performance.

An interesting discrepancy arises when comparing the simulated IMU accuracy and downstream model performance between two studies, [34] and [28]. In [34], the accuracy of the pelvis IMU was notably poorer than that of other segments, especially for rotational velocities. This study also found that including synthetic IMU data significantly improved kinematic predictions for hip and knee degrees of freedom. However, predictions for internal-external rotation of the hip and knee remained less accurate. Conversely, [28] reported that the pelvis sensor achieved the highest accuracy among all sensors, with better performance than the leg sensors. Their model results showed that joint moment predictions for the hip were particularly accurate, as were joint angle predictions in the sagittal plane. These differences highlight the variability in simulated IMU data accuracy and suggest that the effectiveness of data augmentation techniques can vary significantly based on sensor locations and types of movements analyzed. The contrasting findings underscore the importance of tailoring data augmentation methods to specific biomechanical contexts and rigorously validating synthetic data against measured data to ensure model reliability and accuracy.

Future research

Only a few works mentioned future research directions on data augmentation in biomechanical studies. However, they indicate that future research should address several key areas to enhance the effectiveness and realism of synthetic data.

Incorporating subject-specific information.

Conditioning neural networks with inputs that define subjects (e.g. anthropometry) and trials (e.g. gait cycle duration) during training can help overcome current limitations in estimating absolute differential quantities like velocities and accelerations from synthetic data [24]. However, it is worth noting that using only anthropometric data, as in [37], is not effective for generating synthetic data samples. It may be beneficial to merge these two approaches, incorporating the anthropometric methodology from [37] into the framework presented by [24].

Modeling soft tissue and motion artifacts.

Developing more sophisticated soft tissue models and general motion artifact compensation strategies is essential to bridge the reality gap between simulated and measured data. Future work should focus on improving these models to better replicate the dynamics of real-world movements [4,27].

Weighted augmentation strategies.

Exploring scientific augmentation strategies that give more weight to challenging motions can improve model robustness and performance. This approach can help in better preparing models to handle difficult scenarios [35].

Domain adaptation and iterative training.

Employing domain adaptation techniques, such as GANs, to learn mappings between simulated and measured data can reduce discrepancies. Additionally, iterative data generation during training within a closed loop, such as meta-learning algorithms, can dynamically adjust simulator parameters to produce synthetic data that enhances model accuracy [4].

In conclusion, advancing data augmentation methods by integrating subject-specific details, improving soft tissue models, employing weighted augmentation strategies, and leveraging domain adaptation and iterative training approaches will accelerate the development of more accurate and realistic biomechanical models. By focusing on these areas, future research can significantly improve the quality and applicability of synthetic biomechanical data, ultimately enhancing the performance of predictive models in this field.

Conclusion

This work represents the first scoping review on data augmentation and synthetic data generation for time-series data in human movement biomechanics. It demonstrates the indispensable role of data augmentation methods in addressing limited data set availability in biomechanical time-series data.

Key reasons for employing data augmentation include overcoming restricted movement variations, enhancing downstream model performance, and mitigating issues related to expensive data acquisition and sensor inaccuracies. By enabling improved model performance, robustness, and generalization, data augmentation is vital for advancing research in sports product development, health, rehabilitation, and related domains.

This review shows a predominant focus on walking tasks, complemented by other areas of interest, including fall detection, various sports, sign language recognition, and microgravity training. This diversity underscores the importance of data augmentation in improving research across different applications.

Variability in participant numbers highlights the challenges of limited data, which restricts movement variations and model training effectiveness. Data augmentation not only increases the sample size but also introduces more variety into a data set, thereby, facilitating the development of more effective and generalizable models.

Our findings reveal no single, universally accepted standard for data augmentation in biomechanical data analysis. Instead, methods are chosen based on specific study goals, available resources, and the desired balance between simplicity and biomechanical fidelity. Physics-based methods are recommended for ensuring biomechanical validity, classic methods like jittering or warping are suitable for improving robustness with limited resources, and data-driven methods excel in increasing data variability and reducing biases.

The prevalent use of LSTMs and their variants, such as convLSTMs, as downstream models highlights their popularity in processing sequential biomechanical data. These downstream models are commonly used to evaluate the effectiveness of augmentation techniques. However, the variety of models used suggests the necessity of selecting appropriate downstream models based on the specific task and data characteristics. This reliance on models emphasizes the need for data augmentation techniques, especially to avoid overfitting and enhance generalization with limited data sets.

The evaluation of data augmentation methods suggests that kinematics predictions benefit from larger data sets, while kinetics predictions improve with additional noise due to the absence of soft tissue movements in simulated data. Synthetic IMU data should be used cautiously as it tends to be too clean and overly smoothed. Thus, further research is needed to evaluate how soft tissue artifacts can be simulated to ensure a more accurate representation of real-world data.

While focusing on downstream model performance is crucial, it also has limitations. The lack of comparison among augmentation methods means it is unclear why certain techniques work better than others. Understanding the effects of different augmentation techniques is essential for evaluating their impact on downstream models and identifying where synthetic data falls short. This understanding will guide the development of more sophisticated methods that better capture the complexities of real data.

Hence, we conclude that a systematic comparison of data augmentation techniques for biomechanics, which is currently lacking, is an important task for future research. Such comparisons are vital for advancing the field by providing clearer guidance on the most effective strategies across various applications and data characteristics. Moreover, integrating soft tissue artifacts into data augmentation models, rigorously evaluating synthetic data, and tailoring augmentation methods to specific study needs are essential steps for enhancing the accuracy, robustness, and applicability of predictive models in biomechanical research.

Supporting information

S1 Table. Included publications.

All publications that were included in the final selection of this review.

https://doi.org/10.1371/journal.pone.0327038.s001

(PDF)

S2 Table. Data sources and data set details.

Data sources used in the individual publications, such as Motion Capture (MOCAP), inertial measurement units (IMUs), and electromyography (EMG). Additionally, the details of the data set created using the data sources is listed for each publication. No asterisk (*) denoted publications that used only their custom data set, acquired within the scope of the corresponding publication. * marks all publications that used previously recorded or public data sets. ** marks publications using a combination of own data and previously recorded or public data. Numbers with exponent c were corrected based on the reference and participant details provided.

https://doi.org/10.1371/journal.pone.0327038.s002

(PDF)

S3 Table. Data augmentation results.

Reported evaluation methods and results of data augmentation across publications.

https://doi.org/10.1371/journal.pone.0327038.s003

(PDF)

References

  1. 1. Xiang L, Wang A, Gu Y, Zhao L, Shim V, Fernandez J. Recent machine learning progress in lower limb running biomechanics with wearable technology: a systematic review. Front Neurorobot. 2022;16:913052. pmid:35721274
  2. 2. Camomilla V, Bergamini E, Fantozzi S, Vannozzi G. Trends supporting the in-field use of wearable inertial sensors for sport performance evaluation: a systematic review. Sensors (Basel). 2018;18(3):873. pmid:29543747
  3. 3. Halilaj E, Rajagopal A, Fiterau M, Hicks JL, Hastie TJ, Delp SL. Machine learning in human movement biomechanics: best practices, common pitfalls, and new opportunities. J Biomech. 2018;81:1–11. pmid:30279002
  4. 4. Dorschky E, Nitschke M, Martindale CF, van den Bogert AJ, Koelewijn AD, Eskofier BM. CNN-based estimation of sagittal plane walking and running biomechanics from measured and simulated inertial sensor data. Front Bioeng Biotechnol. 2020;8:604. pmid:32671032
  5. 5. Mekni A, Narayan J, Gritli H. Quinary classification of human gait phases using machine learning: investigating the potential of different training methods and scaling techniques. BDCC. 2025;9(4):89.
  6. 6. Mekni A, Narayan J, Gritli H. Classification of eight gait phases using machine learning: integration of multi-source gait data. In: 2025 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI). vol. 3; 2025. p. 1–6.
  7. 7. Mekni A, Narayan J, Gritli H. Multi-class classification of gait cycle phases using machine learning: a comprehensive study using two training methods. Netw Model Anal Health Inform Bioinforma. 2025;14(1):30.
  8. 8. Höschler CSCFJKAL, Halmich H, Schwameder H. Towards real-time assessment: wearable-based estimation of 3D knee kinetics in running and the influence of preprocessing workflows. In: Proceedings of the 42nd International Conference of Biomechanics in Sports (ISBS). vol. 42. International Society of Biomechanics in Sports; 2024. p. 72. https://commons.nmu.edu/isbs/vol42/iss1/72/
  9. 9. Höschler L, Halmich C, Schranz C, Fritz J, Čigoja S, Ullrich M, et al. Wearable-based estimation of continuous 3D knee moments during running using a convolutional neural network. Sports Biomech. 2025;:1–19. pmid:40125937
  10. 10. Kim TH, Choi A, Heo HM, Kim H, Mun JH. Acceleration magnitude at impact following loss of balance can be estimated using deep learning model. Sensors (Basel). 2020;20(21):6126. pmid:33126491
  11. 11. Tang J, He B, Xu J, Tan T, Wang Z, Zhou Y, et al. Synthetic IMU datasets and protocols can simplify fall detection experiments and optimize sensor configuration. IEEE Trans Neural Syst Rehabil Eng. 2024;32:1233–45. pmid:38408008
  12. 12. Hernandez V, Suzuki T, Venture G. Convolutional and recurrent neural network for human activity recognition: application on American sign language. PLoS One. 2020;15(2):e0228869. pmid:32074124
  13. 13. Knudson D. Confidence crisis of results in biomechanics research. Sports Biomech. 2017;16(4):425–33. pmid:28632059
  14. 14. Mienye ID, Swart TG, Obaido G. Recurrent neural networks: a comprehensive review of architectures, variants, and applications. Information. 2024;15(9):517.
  15. 15. Fister I, Vrbančič G, Podgorelec V, Fister I. Synthetic data augmentation of cycling sport training datasets. Lect Notes Netw Syst. 2022;371:65–74.
  16. 16. Brigato L, Iocchi L. A close look at deep learning with small data. 2020. https://arxiv.org/abs/2003.12843
  17. 17. Dorschky E, Krüger D, Kurfess N, Schlarb H, Wartzack S, Eskofier BM, et al. Optimal control simulation predicts effects of midsole materials on energy cost of running. Comput Methods Biomech Biomed Engin. 2019;22(8):869–79. pmid:30987457
  18. 18. Fleischmann S, Dietz S, Shanbhag J, Wuensch A, Nitschke M, Miehling J, et al. Exploring dataset bias and scaling techniques in multi-source gait biomechanics: an explainable machine learning approach. ACM Trans Intell Syst Technol. 2024;16(1):1–19.
  19. 19. Iglesias G, Talavera E, González-Prieto Á, Mozo A, Gómez-Canaval S. Data Augmentation techniques in time series domain: a survey and taxonomy. Neural Comput Applic. 2023;35(14):10123–45.
  20. 20. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):1–48.
  21. 21. Iwana BK, Uchida S. An empirical survey of data augmentation for time series classification with neural networks. PLoS One. 2021;16(7):e0254841. pmid:34264999
  22. 22. Eguchi R, Takahashi M. Insole-based estimation of vertical ground reaction force using one-step learning with probabilistic regression and data augmentation. IEEE Trans Neural Syst Rehabil Eng. 2019;27(6):1217–25. pmid:31094691
  23. 23. Liu X, Zhao C, Zheng B, Guo Q, Yu Y, Zhang D, et al. Spatiotemporal and kinematic characteristics augmentation using Dual-GAN for ankle instability detection. Math Biosci Eng. 2022;19(10):10037–59. pmid:36031982
  24. 24. Bicer M, Phillips ATM, Melis A, McGregor AH, Modenese L. Generative deep learning applied to biomechanics: a new augmentation technique for motion capture datasets. J Biomech. 2022;144:111301. pmid:36201910
  25. 25. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73. pmid:30178033
  26. 26. Mundt M, Thomsen W, Witter T, Koeppe A, David S, Bamer F, et al. Prediction of lower limb joint angles and moments during gait using artificial neural networks. Med Biol Eng Comput. 2020;58(1):211–25. pmid:31823114
  27. 27. Zimmermann T, Taetz B, Bleser G. IMU-to-segment assignment and orientation alignment for the lower body using deep learning. Sensors (Basel). 2018;18(1):302. pmid:29351262
  28. 28. Mundt M, Koeppe A, David S, Witter T, Bamer F, Potthast W, et al. Estimation of gait mechanics based on simulated and measured IMU data using an artificial neural network. Front Bioeng Biotechnol. 2020;8:41. pmid:32117923
  29. 29. Xiong J, Chen C, Zhang Y, Chen X, Qian Y, Leng Y, et al. A probability fusion approach for foot placement prediction in complex terrains. IEEE Trans Neural Syst Rehabil Eng. 2023;31:4591–600. pmid:37971912
  30. 30. Liang W, Muhammad Rehan Afzal H, Qiao Y, Fan A, Wang F, Hu Y, et al. Estimation of electrical muscle activity during gait using inertial measurement units with convolution attention neural network and small-scale dataset. J Biomech. 2024;167:112093. pmid:38615480
  31. 31. Molinaro DD, Kang I, Camargo J, Gombolay MC, Young AJ. Subject-independent, biological hip moment estimation during multimodal overground ambulation using deep learning. IEEE Trans Med Robot Bionics. 2022;4(1):219–29.
  32. 32. Arrué P, Toosizadeh N, Babaee H, Laksari K. Low-rank representation of head impact kinematics: a data-driven emulator. Front Bioeng Biotechnol. 2020;8:555493. pmid:33102454
  33. 33. Rapp E, Shin S, Thomsen W, Ferber R, Halilaj E. Estimation of kinematics from inertial measurement units using a combined deep learning and optimization framework. J Biomech. 2021;116:110229. pmid:33485143
  34. 34. Sharifi Renani M, Eustace AM, Myers CA, Clary CW. The use of Synthetic IMU signals in the training of deep learning models significantly improves the accuracy of joint kinematic predictions. Sensors (Basel). 2021;21(17):5876. pmid:34502766
  35. 35. Yu X, Ma T, Jang J, Xiong S. Data augmentation to address various rotation errors of wearable sensors for robust pre-impact fall detection. IEEE J Biomed Health Inform. 2023;27(5):2197–207. pmid:37015700
  36. 36. Ravizza M, Pedrocchi A, DeWitt J, Ferrigno G. From mocap data to inertial data through a biomechanical model to classify countermeasure exercises performed on ISS. Convegno Nazionale di Bioingegneria. 2020; p. 418–421.
  37. 37. Kárason H, Ritrovato P, Maffulli N, Tortorella F. Generative data augmentation of human biomechanics. Image analysis and processing - ICIAP 2023 workshops. Cham: Springer. 2024. p. 482–93.
  38. 38. Liew BXW, Pfisterer F, Rügamer D, Zhai X. Strategies to optimise machine learning classification performance when using biomechanical features. J Biomech. 2024;165:111998. pmid:38377743
  39. 39. Delp SL, Anderson FC, Arnold AS, Loan P, Habib A, John CT, et al. OpenSim: open-source software to create and analyze dynamic simulations of movement. IEEE Trans Biomed Eng. 2007;54(11):1940–50. pmid:18018689
  40. 40. De Vree L, Carloni R. Deep reinforcement learning for physics-based musculoskeletal simulations of healthy subjects and transfemoral prostheses’ users during normal walking. IEEE Trans Neural Syst Rehabil Eng. 2021;29:607–18. pmid:33646954
  41. 41. Su B, Gutierrez-Farewik EM. Simulating human walking: a model-based reinforcement learning approach with musculoskeletal modeling. Front Neurorobot. 2023;17:1244417. pmid:37901705
  42. 42. Nowakowski K, Carvalho P, Six J-B, Maillet Y, Nguyen AT, Seghiri I, et al. Human locomotion with reinforcement learning using bioinspired reward reshaping strategies. Med Biol Eng Comput. 2021;59(1):243–56. pmid:33417125
  43. 43. Fukuchi RK, Fukuchi CA, Duarte M. A public dataset of running biomechanics and the effects of running speed on lower extremity kinematics and kinetics. PeerJ. 2017;5:e3298. pmid:28503379
  44. 44. University CM. CMU graphics lab - motion capture library; 2024. http://mocap.cs.cmu.edu/
  45. 45. Dorschky E, Nitschke M, Seifer A-K, van den Bogert AJ, Eskofier BM. Estimation of gait kinematics and kinetics from inertial sensor data using optimal control of musculoskeletal models. J Biomech. 2019;95:109278. pmid:31472970
  46. 46. Komnik I, Weiss S, Fantini Pagani CH, Potthast W. Motion analysis of patients after knee arthroplasty during activities of daily living–a systematic review. Gait Posture. 2015;41(2):370–7. pmid:25680471
  47. 47. Hernandez F, Wu LC, Yip MC, Laksari K, Hoffman AR, Lopez JR, et al. Six degree-of-freedom measurements of human mild traumatic brain injury. Ann Biomed Eng. 2015;43(8):1918–34. pmid:25533767
  48. 48. Laksari K, Kurt M, Babaee H, Kleiven S, Camarillo D. Mechanistic insights into human brain impact dynamics through modal analysis. Phys Rev Lett. 2018;120(13):138101. pmid:29694192
  49. 49. Horsak B, Slijepcevic D, Raberger A-M, Schwab C, Worisch M, Zeppelzauer M. GaiTRec, a large-scale ground reaction force dataset of healthy and impaired gait. Sci Data. 2020;7(1):143. pmid:32398644
  50. 50. Liew BXW, Rugamer D, Abichandani D, De Nunzio AM. Classifying individuals with and without patellofemoral pain syndrome using ground force profiles - development of a method using functional data boosting. Gait Posture. 2020;80:90–5. pmid:32497981
  51. 51. Özdemir AT, Barshan B. Detecting falls with wearable sensors using machine learning techniques. Sensors (Basel). 2014;14(6):10691–708. pmid:24945676
  52. 52. Camargo J, Ramanathan A, Flanagan W, Young A. A comprehensive, open-source dataset of lower limb biomechanics in multiple conditions of stairs, ramps, and level-ground ambulation and transitions. J Biomech. 2021;119:110320. pmid:33677231
  53. 53. Young AD, Ling MJ, Arvind DK. IMUSim: a simulation environment for inertial sensing algorithm design and evaluation. In: Proceedings of the 10th International Conference on Information Processing in Sensor Networks. Chicago, IL, USA; 2011. p. 199–210.