Abstract
Variability of myoelectric activity during walking results from the human capability to adapt to both intrinsic and extrinsic perturbations. A comprehensive analysis of the variability of surface electromyographic (sEMG) signals requires recordings lasting at least several minutes rather than seconds. The current study introduces a dataset of long-lasting sEMG signals recorded during walking sessions of 31 healthy subjects, aged between 20 and 30 years, conducted at the Movement Analysis Lab of Università Politecnica delle Marche, Ancona, Italy. The sEMG signals were captured from ten lower-limb muscles (five per leg): gastrocnemius lateralis (GL), tibialis anterior (TA), rectus femoris (RF), hamstrings (Ham), and vastus lateralis (VL). Synchronized electrogoniometric and foot-floor-contact signals are also supplied to enable the spatial/temporal analysis of the sEMG signals. In the experimental procedure, subjects walked barefoot on level ground for approximately 5 minutes at their natural speed and pace, following an eight-shaped path featuring linear diagonal segments, curves, accelerations, and decelerations. An advanced analysis of the sEMG signals was performed to test the reliability and usability of the dataset. The considerable duration of the signals makes this dataset particularly useful for studies where a large volume of data is crucial, such as machine/deep learning approaches, investigations of the variability of muscle recruitment during physiological walking, validation of novel sEMG-based algorithms, and assembly of reference datasets for characterizing pathological conditions.
Citation: Di Nardo F, Morbidoni C, Iadarola G, Spinsante S, Fioretti S (2025) Long duration multi-channel surface electromyographic signals during walking at natural pace: Data acquisition and analysis. PLoS ONE 20(2): e0318560. https://doi.org/10.1371/journal.pone.0318560
Editor: Monika Błaszczyszyn, Opole University of Technology: Politechnika Opolska, POLAND
Received: December 1, 2023; Accepted: January 17, 2025; Published: February 12, 2025
Copyright: © 2025 Di Nardo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data used in the present article are freely available from PhysioNet, the public repository of medical research data, at https://physionet.org/content/semg/1.0.0/ (https://doi.org/10.13026/bwvb-ht51).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Electromyography (EMG) is acknowledged as one of the main techniques able to supply suitable and reliable biological signals to characterize and study the neuromotor system [1]. Surface electromyography (sEMG) and fine-wire electromyography (fwEMG) can both be employed to identify and describe the recruitment of body muscles during a specific motor task. Nevertheless, fwEMG is an invasive and sometimes painful approach. For this reason, it is rarely adopted, except in particular conditions such as capturing the activity of deep muscles or muscles with small cross-sectional areas [2,3]. Specifically, its invasiveness makes fwEMG inappropriate for monitoring dynamic and cyclic tasks such as walking. By contrast, sEMG analysis is frequently recommended for this kind of motor task because it is non-intrusive, does not cause patients discomfort or pain, is easy to perform, allows the monitoring of long-lasting tasks, and records muscle activity from a large proportion of motor units, which is likely representative of whole-muscle activity [4–6].
Walking is an activity of great importance in everyday human life [7]. It is an ordinary movement that mainly involves the lower-limb muscles, so acquiring sEMG signals from those muscles is a prime concern of gait analysis. Indeed, identifying and analyzing muscle activity during gait provides relevant information for clinics, rehabilitation, and recovery from neurological and orthopaedic disorders [8–10]. During physiological walking, myoelectric activity varies considerably both from person to person and within the same subject [11,12]. This variability seems to intensify further in pathological conditions [13]. The phenomenon is reported to result from the human capability to adapt to both intrinsic and extrinsic challenges and perturbations [14]. Quantifying this wide variability improves the interpretation of sEMG signals in both physiological and pathological populations. A deeper investigation of sEMG variability could help to comprehend the neural basis of muscle synergies [15], to describe the flexibility of the musculoskeletal system [16], and to deepen the understanding of the processes the neuromuscular system adopts to adjust to ongoing mechanical conditions [17]. The high variability of the sEMG signal is one of the main reasons why recent literature has started to recommend analyzing muscle recruitment during natural walking lasting at least 3–5 min [18]. Moreover, walking continuously for several minutes allows the subject to move more naturally, as in everyday life: after some strides, subjects start feeling comfortable and walk at their natural pace. Thus, although a few gait cycles might be enough in some applications, acquiring EMG signals over a larger number of strides is strongly recommended. Although valuable datasets including sEMG signals are accessible [19–25], to our knowledge no free databases of long-lasting sEMG signals during walking are available.
The aim of the present study is to introduce a dataset composed of long-lasting (4–5 minutes) sEMG signals, together with their analysis. The sEMG signals were recorded during uninterrupted ground walking of 31 young able-bodied subjects in the Movement Analysis Lab of Università Politecnica delle Marche, Ancona, Italy. In addition to the complete raw sEMG signals from ten lower-limb muscles (five per leg), the dataset includes synchronized footswitch and knee electrogoniometric signals acquired in the same population, which are useful for a spatial/temporal characterization of muscle recruitment during each walking trial. In a standard gait-analysis protocol, at least four lower-limb muscles are typically considered for sEMG-based evaluations: gastrocnemius lateralis, tibialis anterior, rectus femoris, and hamstrings [18]. The idea is to include one agonist-antagonist pair for each joint (ankle, knee, and hip). Four muscles suffice because two of them are bi-articular: rectus femoris (hip and knee) and gastrocnemius lateralis (knee and ankle). Moreover, the vasti muscles are known to play a prominent role in stabilizing the patella and knee joint during walking [26]. Thus, analyzing vasti behavior could add further insight into walking physiology and the etiology of common knee pathologies [27]. For these reasons, sEMG signals from gastrocnemius lateralis, tibialis anterior, rectus femoris, hamstrings, and vastus lateralis were collected and stored in the current dataset.
To test the reliability and usability of the current dataset, sEMG signal quality was assessed through the signal-to-noise ratio (SNR), sEMG frequency content was quantified by continuous wavelet analysis, and a comparative analysis was performed against the results of acknowledged scientific studies. Further details on the characteristics of the dataset can be found in [28–30].
2. Data acquisition
2.1. Involved participants
This study is based on gait data recorded between 2011 and 2018 and already used in previous publications. Thirty-one young able-bodied subjects were involved in the study. Detailed anthropometric characteristics of the participants are reported in Table 1. All participants were students accustomed to attending the Movement Analysis Lab. They were selected according to the following inclusion criteria: (i) 20 years < age < 30 years; (ii) absence of known locomotor disorders; and (iii) body mass index (BMI) between 18 kg/m² and 25 kg/m², to exclude underweight and overweight conditions. Subjects who reported overt disorders, confirmed diseases, or pain, or who had undergone surgical intervention, were excluded.
Before the experiment, each subject was introduced to the experimental protocol and the motor task to perform and informed of any potential risk. Every subject provided written informed consent. The experiment was performed by trained investigators, using approved and non-invasive protocols. The subject’s welfare was guaranteed during the whole experiment. The data was analyzed anonymously. Authors had no access to information identifying individual participants during or after data collection. Experiments and data acquisition have been conducted according to the ethical principles of the Helsinki Declaration and approved by the local ethical committee.
2.2. Test setup
Three types of signals were recorded (Fig 1): basographic, electrogoniometric, and sEMG. The signals were acquired at a sampling rate of 2 kHz with 12-bit resolution by the multi-channel recording system Step32 (Version PCI-32 ch2.0.1. DV, Medical Technology, Italy). For the acquisition of the foot-floor-contact (i.e., basographic) signal, each subject was instrumented with three footswitches (size: 11 × 11 × 0.5 mm; activation force: 3 N; manufacturer: Medical Technology) applied bilaterally beneath the heel (Heel) and the first (1st MH) and fifth (5th MH) metatarsal heads of each foot, as shown in panel A of Fig 2.
Fig 2. Footswitches applied bilaterally beneath the heel (Heel) and the first (1st MH) and fifth (5th MH) metatarsal heads of each foot (panel A); knee electrogoniometer (panel B).
For the acquisition of dynamic knee joint angles in the sagittal plane, an electrogoniometer (accuracy: 0.5°; manufacturer: Medical Technology) was attached to the lateral side of each lower limb (panel B of Fig 2). In some subjects (9 out of 31; see the .hea file described in the “Data analysis” section), the hip angle was also measured using the same type of electrogoniometric sensor.
sEMG signals were recorded with single differential probes of fixed geometry constituted by Ag/AgCl disks (manufacturer: Medical Technology; size: 7 × 27 × 19 mm; electrode diameter: 4 mm; inter-electrode distance: 8 mm; gain: 1000; high-pass filter cut-off frequency: 10 Hz; input impedance > 1.5 GΩ; CMRR > 126 dB; input-referred noise ≤ 1 µVrms), and with probes of variable geometry constituted by Ag/AgCl disks (manufacturer: Medical Technology; minimum inter-electrode distance: 12 mm; gain: 1000; high-pass filter cut-off frequency: 10 Hz; input impedance > 1.5 GΩ; CMRR > 126 dB; input-referred noise ≤ 200 nVrms). sEMG signals were further amplified and low-pass filtered (cut-off frequency: 450 Hz) by the recording system. Probes with fixed geometry were applied over gastrocnemius lateralis (GL), tibialis anterior (TA), and hamstrings (Ham), whereas probes with variable geometry were applied over rectus femoris (RF) and vastus lateralis (VL), as depicted in Fig 3. Note that the sEMG signal labeled “Hamstrings” reflects the combined activity of the hamstring muscle group during walking, including the medial (semimembranosus and semitendinosus) and lateral (biceps femoris) muscles, rather than the signal of any individual muscle within the hamstring complex.
Fig 3. sEMG probes with fixed geometry were applied over tibialis anterior (TA), gastrocnemius lateralis (GL), and hamstrings (Ham), and probes with variable geometry were applied over rectus femoris (RF) and vastus lateralis (VL).
Before positioning the probes, the skin was shaved, cleaned with abrasive paste, and then moistened with a wet cloth. To ensure proper electrode-skin contact, electrodes were coated with highly conductive gel. Electrode location and orientation over the muscle with respect to tendons, motor point, and fiber direction followed the European Recommendations for Surface ElectroMyoGraphy (SENIAM) [31]. Footswitches and electrogoniometers were positioned following the standard procedure indicated by the system manufacturer. All sensors are connected to the Patient Unit of the multi-channel recording system. This Patient Unit (Fig 1) is positioned at the lower back, over the lumbar spine, just above the sacral region and the gluteal muscles, following the natural curve of the lumbar region. The Unit is secured with an adjustable elastic belt wrapped around the subject’s waist. The belt provides a stable and comfortable fit, can be adjusted to different body types, and minimizes side-to-side and up-and-down movement of the Unit during physical activity, especially walking.
Signals were checked for quality and sensor positioning, and sensors were adjusted as necessary. sEMG probes, footswitches, and electrogoniometers are parts of the same acquisition system (Step32 - Version PCI-32 ch2.0.1. DV, Medical Technology, Italy); thus, all signals are acquired synchronously. The sampling rate of 2 kHz captures the full bandwidth of the sEMG signal: the corresponding Nyquist frequency (1 kHz) is well above the 450 Hz low-pass cut-off applied by the recording system.
To mitigate cross-talk, electrode placement followed the SENIAM guidelines [31], with attention to electrode orientation and alignment with the muscle. The procedure indicated in [32] was also adopted. Once the EMG probes had been positioned, cross-talk was checked by visual inspection. Cross-talk was suspected if simultaneous activity with similar amplitude variations was observed in two muscles of the same limb segment. In that case, double differential probes were employed to enhance spatial selectivity, and the signal obtained from these probes was compared with that of the single differential ones. Cross-talk was confirmed when the double differential signal displayed a noticeably reduced amplitude; in such cases, the signal was discarded.
2.3. Acquisition protocol
Before starting a trial, subjects were asked to stand in a comfortable posture for 5 seconds. Then, volunteers were encouraged to walk barefoot continuously on level ground for around 5 minutes. The average trial duration (± standard deviation, SD) over the whole 31-subject population is 258 (± 53) s. Details of the trial duration are reported in Table 1. Each subject performed a single trial. Subjects were given no specific indication about speed, pace, acceleration, deceleration, or reversing; they were only asked to freely follow an eight-shaped path like the one described by the black feet in Fig 4. No markers or signs were added on the floor. The shape of this path (including acceleration, deceleration, and reversing) and the long duration of the trial were chosen to allow subjects to move more naturally, as in everyday life. Although subjects were given no instructions, three different paces were observed during the experimental trials. The three zones characterized by these paces are superimposed on the eight-shaped path in Fig 4 to illustrate pace variability during this walking task: the red zones, where subjects typically curved or reversed; the yellow zones, where subjects typically accelerated or decelerated; and the orange zone, where subjects typically walked along linear diagonal segments. The eight-shaped path was intentionally chosen to introduce variability in sEMG patterns and emulate everyday walking with natural deviations.
Fig 4. Orange indicates the zone where subjects walked following linear diagonal segments; yellow indicates the zones where subjects accelerated or decelerated; and red indicates the zones where subjects curved or reversed.
3. Data analysis
The dataset is composed of raw walking signals recorded from 31 young healthy subjects. Following the PhysioNet requirements [29], the files are provided in waveform database (WFDB) format. Specifically, two WFDB files are provided for each subject, with “.dat” and “.hea” extensions; for example, the files S1.dat and S1.hea are available for the first subject (S1). The .dat file is structured as a data matrix composed of 14 rows. Each row contains a whole signal (basographic, electrogoniometric, or sEMG) as listed in the corresponding .hea file. The .hea file provides the sampling rate (in Hz) and the number of samples for each signal and then describes the order in which the signals are stored (from row 1 down to row 14). sEMG signals are expressed in μV, footswitch signals in V, and electrogoniometric signals in degrees. Footswitch and electrogoniometric data are provided to allow users to achieve a spatial/temporal characterization of muscle recruitment during walking. Footswitch, electrogoniometric, and sEMG signals are synchronized. The basographic signals from the footswitches were converted to four levels and processed to segment and classify the different gait cycles following the acknowledged procedure introduced in [33]. Briefly, the procedure is based on the computation of the least significant bit (LSB), defined as the difference between the maximum and minimum values of the basographic signal (expressed in V) divided by the number of quantization levels provided by the footswitches, which in this case is 8 (each of the three footswitches is either ON or OFF, giving 2³ = 8 combinations). This allows the extraction of 8 gait phases from the basographic signal recorded by the footswitches. Then, an additional quantization step groups some of these 8 levels to obtain 4 phases.
Each new level corresponds to a specific phase of foot-floor contact: Heel contact (H), Flat foot contact (F), Push off (P), and Swing (S). The complete procedure is reported in [33]. The total uncompressed size of the whole dataset is 427.0 MB. A text file with the anthropometric information of the participants reported in Table 1 is also included. An example of each type of signal, visualized in a single stride of a representative subject (subject 12), is shown in Fig 5.
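The LSB quantization and the grouping into four phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of [33]: the mapping from footswitch states to voltage steps (heel = 4, 5th MH = 2, 1st MH = 1) is hypothetical, as is the grouping of the 8 levels into phases (heel only = H; heel plus any metatarsal head = F; metatarsal head(s) without heel = P; no contact = S) and the rule that a gait cycle starts at the first foot contact after swing. Only the LSB definition follows the text above.

```python
import numpy as np

# Hypothetical encoding: each footswitch adds a fixed voltage step when pressed.
# The real weights depend on the recording hardware and are NOT given in the text.
HEEL, MH5, MH1 = 4, 2, 1


def quantize_basography(v, n_levels=8):
    """Map a basographic voltage trace to integer levels 0..n_levels-1.

    LSB = (max - min) / n_levels, as defined in the text. The maximum voltage
    maps to n_levels exactly and is clipped to the top level.
    """
    v = np.asarray(v, dtype=float)
    vmin, vmax = v.min(), v.max()
    if vmax == vmin:
        raise ValueError("flat basographic signal: no foot contacts recorded")
    lsb = (vmax - vmin) / n_levels
    levels = np.floor((v - vmin) / lsb).astype(int)
    return np.clip(levels, 0, n_levels - 1)


def levels_to_phases(levels):
    """Group the 8 footswitch states into H, F, P, S (assumed grouping)."""
    heel = (levels & HEEL) > 0
    forefoot = (levels & (MH5 | MH1)) > 0
    phases = np.full(levels.shape, "S", dtype="<U1")  # no contact: swing
    phases[heel & ~forefoot] = "H"  # heel only: heel contact
    phases[heel & forefoot] = "F"   # heel and metatarsal head(s): flat foot
    phases[~heel & forefoot] = "P"  # metatarsal head(s) only: push off
    return phases


def cycle_starts(phases):
    """Sample indices where a new gait cycle starts: first contact after swing."""
    prev, curr = phases[:-1], phases[1:]
    return np.flatnonzero((prev == "S") & (curr != "S")) + 1
```

With real recordings, the basographic channel would first be read from the S<n>.dat/S<n>.hea pair (for example with the `rdrecord` function of the `wfdb` Python package), and noisy level transitions would typically need debouncing; both steps are omitted here.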
3.1. SNR analysis
sEMG signal quality was tested by evaluating the signal-to-noise ratio (SNR). The SNR of each raw signal was computed (in dB) using the approach described by Bonato et al. [34], which defines it as the logarithm of the ratio between the squared standard deviation (SD) of the sEMG signal and the squared SD of the noise:

$$\mathrm{SNR} = 10\log_{10}\left(\frac{\sigma_{s}^{2}}{\sigma_{n}^{2}}\right)\ \mathrm{dB}$$

where $\sigma_{n}$ is the SD of the noise and $\sigma_{s}$ is the SD of the actual signal. In the current study, this method was applied exactly as described, without any additional filtering of the signal (aside from the hardware filtering described in Section 2.2) or outlier removal. The only contribution made by the authors was the selection of the signal portions used to compute the SD of the noise and the SD of the signal, as follows. Before the beginning of the experimental trial, subjects were asked to stand in a comfortable posture for 5 seconds (i.e., 5 s × 2000 Hz = 10000 samples). These 10000 samples were considered as noise. The first 10000 samples of each sEMG signal were visually inspected to identify potential undesired muscle activations or spikes; $\sigma_{n}$ was computed over this segment, excluding undesired muscle activations and spikes. $\sigma_{s}$ was computed over the remaining samples, from sample 10000 to the end of the signal. The SNR value computed for each signal is reported in Table 2.
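A minimal NumPy sketch of this SNR computation is given below. The 5-s quiet-standing window and the 2 kHz sampling rate come from the text; the optional boolean mask stands in for the visual exclusion of spikes and unwanted activations and is a hypothetical interface, not part of the original method description.

```python
import numpy as np

FS = 2000                 # Hz, sampling rate of the recordings
QUIET_SAMPLES = 5 * FS    # 5 s of quiet standing = 10000 samples


def snr_db(emg, n_noise=QUIET_SAMPLES, noise_mask=None):
    """SNR in dB as 10*log10(sigma_s^2 / sigma_n^2).

    emg        : raw sEMG trace (1-D array, any unit)
    n_noise    : number of initial quiet-standing samples treated as noise
    noise_mask : optional boolean array of length n_noise; False marks samples
                 (spikes, unwanted activations) to exclude from the noise SD.
                 Hypothetical stand-in for the visual inspection in the text.
    """
    emg = np.asarray(emg, dtype=float)
    noise = emg[:n_noise]
    if noise_mask is not None:
        noise = noise[np.asarray(noise_mask, dtype=bool)]
    signal = emg[n_noise:]
    # np.var is the squared (population) SD, so the ratio matches the definition.
    return 10.0 * np.log10(np.var(signal) / np.var(noise))
```

Because the ratio is taken between variances, 10 dB corresponds to a signal SD about 3.2 times the noise SD, and 20 dB to a factor of 10.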
A common guideline indicates that an SNR higher than 10 dB can be considered suitable for most clinical and research applications of sEMG signals. Under the definition above, 10 dB means that the signal variance is at least 10 times that of the background noise (i.e., a signal SD at least √10 ≈ 3.2 times the noise SD), providing a relatively clear distinction between the desired muscle activity and unwanted noise components. However, lower SNR values, such as 5–10 dB, may also be considered adequate in specific experimental conditions. The last two rows of Table 2 show that the average SNR over the whole population is higher than 11 dB for each muscle. Moreover, only 10% of the SNRs in Table 2 are lower than 10 dB, and a negligible percentage (0.97%) is lower than 5 dB.
3.2. Frequency analysis
The acquired sEMG signals were also validated by quantifying their frequency content. First, the basographic signals were processed to segment and identify the different gait cycles following the acknowledged procedure introduced in [33]. For each muscle and each gait cycle, the time-frequency energy density of the signal was obtained from the scalogram of the continuous wavelet transform [35]. The region of maximum energy density was then identified as the time-frequency interval in which the energy density exceeds 75% of its peak value over the gait cycle. An example of a wavelet scalogram in a random stride is depicted in Fig 6 for two muscles, representative of distal-leg muscles (tibialis anterior) and proximal-leg muscles (rectus femoris).
Fig 6. Energy density of the sEMG signal represented through the wavelet scalogram in a random stride for two representative muscles. Gait cycle percentage and frequency are reported in the horizontal plane. The normalized color scale represents the amplitude of energy density: red = maximum energy density; blue = minimum energy density.
For all muscles and all strides, the energy density of the sEMG signal lies entirely within the range [0–500] Hz, which is the typical frequency content of this electrophysiological signal [33]. Specifically, the maximum frequency detected in a single stride ranges from around 150 Hz up to almost 500 Hz, in line with [35]. The region of maximum energy density spans [60–170] Hz for GL, [60–220] Hz for TA, [65–220] Hz for RF, [60–185] Hz for Ham, and [65–220] Hz for VL.
Although the frequency content changes from muscle to muscle, a common band of [65–170] Hz (the intersection of the five ranges above) can be identified across all muscles, consistent with the EMG literature [36,37]. Motion artifacts, typically reported at low frequencies ([0–15] Hz), ECG interference (typical in trunk muscles but very rare in leg muscles, [0–30] Hz), and power-line noise (50 Hz) are not noticeable in these signals.
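The scalogram and the 75% maximum-energy region described in this section can be sketched as follows. The study does not state the mother wavelet or its parameters here, so this example assumes a complex Morlet wavelet with 6 cycles, implemented directly in NumPy, and a user-supplied frequency grid (e.g., 20 to 495 Hz in 5-Hz steps); the function names are hypothetical. The input is assumed to be one stride of sEMG, so the time bounds of the region are reported as a percentage of the gait cycle.

```python
import numpy as np


def morlet_scalogram(x, fs, freqs, n_cycles=6.0):
    """Energy density |W(f, t)|^2 of a complex Morlet CWT (assumed wavelet).

    Each wavelet is normalized so a unit-amplitude sinusoid at its centre
    frequency yields |W| close to 0.5, making energies comparable across f.
    """
    x = np.asarray(x, dtype=float)
    energy = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)      # SD of the Gaussian envelope (s)
        half = int(np.ceil(4.0 * sigma_t * fs))      # truncate at +/- 4 SD
        t = np.arange(-half, half + 1) / fs          # symmetric, odd length
        envelope = np.exp(-t**2 / (2.0 * sigma_t**2))
        wavelet = envelope * np.exp(2j * np.pi * f * t) / envelope.sum()
        if len(wavelet) > len(x):
            raise ValueError(f"segment too short for {f} Hz with n_cycles={n_cycles}")
        energy[i] = np.abs(np.convolve(x, wavelet, mode="same")) ** 2
    return energy


def max_energy_region(energy, freqs, threshold=0.75):
    """Bounds of the region where energy >= threshold * peak.

    Returns ((t_start_pct, t_end_pct), (f_low, f_high)), with time expressed
    as a percentage of the analysed segment (e.g., one gait cycle).
    """
    rows, cols = np.nonzero(energy >= threshold * energy.max())
    n = energy.shape[1]
    t_pct = (100.0 * cols.min() / (n - 1), 100.0 * cols.max() / (n - 1))
    return t_pct, (freqs[rows.min()], freqs[rows.max()])
```

The wavelet width scales with 1/f, so low frequencies need long segments; the explicit length check fails loudly instead of silently returning a scalogram dominated by edge effects.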
3.3. Comparison to other datasets
To test the reliability of the current dataset, a comparative analysis was performed against the results of acknowledged scientific studies with walking conditions as similar as possible to those considered here. In those studies, muscle activation intervals are typically expressed as a function of the gait cycle. Thus, the basographic signals were processed to segment and classify the different gait cycles following the acknowledged procedure introduced in [33]. Furthermore, within each gait cycle the four main gait phases were identified chronologically: Heel contact (H), Flat foot contact (F), Push off (P), and Swing (S). Electrogoniometric signals were low-pass filtered (FIR filter, 100 taps, cut-off frequency 15 Hz). Most gait studies reported in the literature were performed during straight walking. For a suitable comparison with those studies, it was necessary to detect and discard non-straight or altered cycles from the present dataset, such as those related to deceleration, acceleration, and reversing. A multivariate statistical filter [33,38] was adopted to test the knee angles and gait phase durations of every stride against the mean values computed for each subject; if the knee angles and/or gait phase durations of a stride differed significantly from the mean values, that stride was rejected. The sEMG signal was high-pass filtered (FIR filter, 100 taps, cut-off frequency 20 Hz) to reduce movement artifacts. Muscle activation intervals were computed in each gait cycle by applying a widely adopted double-threshold statistical detector to the sEMG signals [34]. Onset and offset instants of activation were expressed as percentages of the gait cycle.
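The two preprocessing filters and the activation detector mentioned above can be sketched as follows. Three assumptions apply. First, the FIR filters are designed with a Hamming-windowed sinc, which the text does not specify. Second, 101 taps are used instead of 100, because a linear-phase (symmetric) FIR with an even number of taps has zero gain at the Nyquist frequency and therefore cannot implement a high-pass filter; whether the original "100 taps" denotes the filter order is not stated. Third, the detector is a simplified version of the double-threshold idea of [34] (a first threshold ζ on the energy of consecutive sample pairs normalized by the noise variance, and a second threshold requiring at least r0 supra-threshold pairs out of m consecutive pairs), with illustrative parameter values and a hypothetical 30-ms minimum burst duration; these are not the values used in the study.

```python
import numpy as np

FS = 2000  # Hz


def fir_lowpass(cutoff, fs=FS, numtaps=101):
    """Linear-phase low-pass FIR by the Hamming-windowed sinc method (assumed design)."""
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd (type I FIR) for zero-delay use and high-pass")
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2.0 * cutoff / fs * n) * np.hamming(numtaps)
    return h / h.sum()  # unit gain at 0 Hz


def fir_highpass(cutoff, fs=FS, numtaps=101):
    """High-pass via spectral inversion: delta at the centre tap minus the low-pass."""
    h = -fir_lowpass(cutoff, fs, numtaps)
    h[(numtaps - 1) // 2] += 1.0
    return h


def filt(x, h):
    """Apply FIR h; mode='same' removes the (numtaps-1)/2-sample group delay."""
    return np.convolve(x, h, mode="same")


def double_threshold(x, noise_var, zeta=10.0, m=5, r0=3, min_len=60):
    """Simplified double-threshold detector (illustrative parameters).

    1st threshold: pair energy (x[k]^2 + x[k+1]^2) / noise_var > zeta.
    2nd threshold: at least r0 of m consecutive pairs exceed zeta.
    Returns [(onset, offset)] sample indices (offset exclusive); bursts shorter
    than min_len samples (60 samples = 30 ms at 2 kHz) are dropped.
    """
    n_pairs = len(x) // 2
    pairs = np.asarray(x[: 2 * n_pairs], dtype=float).reshape(n_pairs, 2)
    above = ((pairs**2).sum(axis=1) / noise_var > zeta).astype(int)
    counts = np.convolve(above, np.ones(m, dtype=int), mode="valid")
    active = np.zeros(n_pairs, dtype=int)
    active[: len(counts)] = counts >= r0   # pair i starts a qualifying window
    edges = np.diff(np.concatenate(([0], active, [0])))
    starts, stops = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    return [(2 * s, 2 * e) for s, e in zip(starts, stops) if 2 * (e - s) >= min_len]


def to_percent_cycle(intervals, cycle_start, cycle_end):
    """Express (onset, offset) sample indices as % of the gait cycle."""
    length = cycle_end - cycle_start
    return [(100.0 * (on - cycle_start) / length, 100.0 * (off - cycle_start) / length)
            for on, off in intervals]
```

In practice, `noise_var` would be estimated from the quiet-standing segment of the high-pass-filtered signal, and the detector applied to each stride between consecutive cycle starts.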
Then, mean activation intervals over all strides of the population were computed as follows: (i) since the number of muscle activations varies from cycle to cycle (stride to stride) [38], the gait cycles of each subject were grouped by number of activation intervals; (ii) for each subject, the onset and offset instants of each activation were averaged within each group; (iii) the values computed in (ii) were averaged over the whole population of 31 subjects, respecting the grouping; (iv) the standard error was computed; and (v) the mean data for each group were plotted as a function of gait cycle percentage. This approach to data organization is known as statistical gait analysis (SGA). Results are depicted in Figs 7 and 8.
Average (± standard error) activation intervals for the three main modalities of activations for rectus femoris (upper panel), hamstrings (middle panel), and vastus lateralis (lower panel) during gait cycle.
Average (± standard error) activation intervals for the three main modalities of activations for gastrocnemius lateralis (upper panel) and tibialis anterior (lower panel) during gait cycle.
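Steps (i)–(iv) of the SGA averaging described above can be sketched as follows; plotting (step v) is omitted. The input layout (a mapping from subject to per-cycle lists of onset/offset percentages) is a hypothetical interface, and skipping cycles with no detected activation is an assumption not stated in the text.

```python
from collections import defaultdict

import numpy as np


def sga(subjects):
    """Group-wise averaging of activation intervals, steps (i)-(iv) of SGA.

    subjects : dict subject_id -> list of gait cycles; each cycle is a list of
               (onset_pct, offset_pct) tuples in chronological order.
    Returns  : dict n_activations -> (mean, sem, n_subjects), where mean and
               sem have shape (n_activations, 2) = (onset, offset) per interval.
    """
    per_group = defaultdict(list)  # n_activations -> one mean array per subject
    for cycles in subjects.values():
        groups = defaultdict(list)
        for cycle in cycles:                       # (i) group by number of activations
            groups[len(cycle)].append(cycle)
        for n, grouped in groups.items():
            if n == 0:                             # assumption: silent cycles skipped
                continue
            subject_mean = np.mean(np.asarray(grouped, dtype=float), axis=0)  # (ii)
            per_group[n].append(subject_mean)
    result = {}
    for n in sorted(per_group):
        stack = np.stack(per_group[n])             # (n_subjects, n, 2)
        mean = stack.mean(axis=0)                  # (iii) population mean
        if len(stack) > 1:                         # (iv) standard error across subjects
            sem = stack.std(axis=0, ddof=1) / np.sqrt(len(stack))
        else:
            sem = np.full_like(mean, np.nan)
        result[n] = (mean, sem, len(stack))
    return result
```

Averaging within each subject before averaging across subjects keeps subjects with many strides from dominating the population mean, and the standard error is taken across subjects accordingly.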
Overall, the activation intervals follow the typical pattern reported for these five muscles by acknowledged physiological references [3,4]. Beyond this first confirmation, a more reliable comparison requires recent results obtained with the same approach (SGA) on different datasets. Figs 7 and 8 illustrate how the same muscle can exhibit different activation patterns (and thus sEMG signals with different characteristics) across strides. Specifically, the analysis shows distinct activation modalities, with one, two, or even three activation intervals per gait cycle in different strides of the same muscle. This variability underscores the complexity of muscle recruitment during natural walking and provides empirical evidence of the non-uniformity of sEMG signals. Only the extended duration of data acquisition made it possible to perform SGA and to identify and categorize multiple activation modalities per muscle with statistical reliability. To be useful, SGA requires sEMG signals acquired over numerous strides; thus, SGA-based studies are typically performed on datasets of long-lasting sEMG signals, which makes them the most suitable comparison for the proposed dataset. Unfortunately, despite a detailed web search, we were not able to find freely downloadable datasets for direct comparison. Several sEMG datasets recorded during walking are available; however, they are limited by the short duration of the trials (only a few consecutive strides per subject), a small number of muscles, and/or treadmill walking [19–25]. The current comparison was therefore performed against the results of acknowledged SGA-based studies [32,39–41] and is reported in Table 3, in terms of the onset and offset of each activation interval for the two most frequent activation modalities of each muscle.
The third column of Table 3 reports the results of the EMG-based study performed by Agostini et al. [32] on a population of 100 healthy school-age children. The acquisition was performed with the same model of recording system used in the current study (Step32, Medical Technology), and the experimental procedure and sEMG analysis (SGA) were also the same. Trial duration was 2.5 minutes. The only substantive difference between the two datasets is the age of the subjects (mean ± SD): (24.2 ± 1.9) years in the current study versus (9.0 ± 1.4) years in [32]. However, many studies indicate that the mature pattern of muscle activity is usually achieved around six years of age in normally developing children [32,42–44]. Thus, given the difficulty of finding suitable signals for comparison, this rich dataset can be considered a reliable reference. The results in Table 3 (column 2 versus column 3) show that the activation intervals assessed in the two studies are substantially superimposable for each muscle, supporting the quality of the proposed dataset. A further SGA-based study is also useful to evaluate the quality of the current dataset [39]. It includes a population of 18 young healthy subjects (10 females and 8 males, age range 20–30 years, trial duration = 3 minutes) that matches the sample considered here. However, that manuscript presents its results only graphically (figures), so the detailed onset and offset instants cannot be extracted for comparison. Nonetheless, a visual comparison indicates full agreement between the present results and those reported in [39]. For completeness, the results of two previous SGA-based studies [40,41] by the same research group as the current study are also reported in columns 4 and 5 of Table 3. The first study [40] focuses on the analysis of ankle muscles during ground walking.
It included fourteen healthy volunteers (mean age ± SD: (23.9 ± 2.3) years; male-female ratio: 7/7; task duration: 5 minutes). Strazza et al. [41] considered a population of 30 healthy subjects (15 females and 15 males, age range 20–30 years, trial duration = 5 minutes). As highlighted in Table 3, these results further support the reliability of the current dataset. However, this comparison could be influenced by the fact that some of the signals analyzed in [40,41] are also included in the current dataset.
To conclude this section, signals from the current dataset have already been used in numerous published studies on human walking, including the statistical evaluation of ankle muscle activation intervals [40], the quantification of muscle co-contraction [41], and as a control group for examining the variability of muscle recruitment in hemiplegic walking [13]. More recently, these signals were employed as input to neural networks to classify and predict gait phases from sEMG signals [30,45], and to evaluate the reliability of novel algorithms for muscle activation detection [35].
4. Data usage notes and data limitations
The substantial duration of these signals makes this dataset highly suitable for Compressed Sensing [46] or Machine Learning approaches that rely on extensive data volumes. Additionally, the dataset can serve as a reference for characterizing pathological conditions and for analyzing and quantifying the variability of muscle recruitment during normal walking. Specifically, methods such as the Lyapunov exponent (LyE), approximate entropy (ApEn), and detrended fluctuation analysis (DFA) are powerful tools for assessing signal variability and complexity. Since the present dataset is publicly available through PhysioNet, these non-linear measures can be computed on it to analyze variability and complexity.
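As an example of such a non-linear measure, approximate entropy can be computed with the sketch below, which follows Pincus' standard definition (embedding dimension m, tolerance r, Chebyshev distance between templates, self-matches included). The default r = 0.2 × SD is a common convention rather than a recommendation from this study, and the function name is hypothetical. The pairwise-distance matrix grows quadratically with the series length, so this version is meant for short windows or derived per-stride series (e.g., onset times), not for a full 5-minute raw sEMG trace.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def approximate_entropy(x, m=2, r=None):
    """Approximate entropy ApEn(m, r) following Pincus' definition.

    x : 1-D series (keep it short: memory grows as len(x)**2)
    m : embedding dimension
    r : tolerance; defaults to 0.2 * SD(x), a common convention (assumption)
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)

    def phi(dim):
        emb = sliding_window_view(x, dim)                  # (N - dim + 1, dim) templates
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = np.mean(dist <= r, axis=1)                     # C_i^dim(r), self-match included
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)
```

A strictly periodic series gives ApEn close to zero, while white noise gives clearly positive values; because ApEn is biased by series length, values should only be compared across windows of equal length and identical (m, r).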
Despite the care taken in the experimental procedure, some limitations of the current dataset must be reported. The first concerns the population, which is not perfectly gender-balanced (19 F vs. 12 M). Differences in muscle activity and gait patterns between genders are well documented, and this imbalance could influence the generalizability of sEMG-based findings. However, we emphasize that the primary objective of this work is to introduce a freely available dataset, showcasing its complexity, uniqueness, and potential applications, rather than to provide clinical or research indications or results (which could be interesting for future studies but go beyond the scope of the present work). The analyses conducted here were intended solely to highlight the value of the dataset and its capacity to support further studies. Notably, this dataset could also facilitate future research on gender-specific muscle recruitment during walking, as it allows balanced subgroup analyses (e.g., 12 females vs. 12 males), as done in previous work by the same research group [47].
Second, participants were not given specific instructions regarding their walking speed, pace, or acceleration. The lack of standardized instructions may introduce variability in gait patterns among participants, affecting data consistency. However, the primary objective of this study was to create an sEMG dataset that captures muscle activity during walking that is as natural as possible, closely reflecting the gait patterns adopted in daily life. The variability due to the lack of standardized instructions is therefore regarded not as a limitation but as an inherent strength of the dataset: by including a broader range of walking behaviors, the dataset provides a richer and more realistic foundation for research aimed at understanding muscle recruitment in real-world scenarios. Nonetheless, the analyses conducted here to highlight the potential of the dataset could be influenced by this variability of gait patterns. Potential users should therefore be aware of this context and use the dataset accordingly in their studies.
In the current study, a wired multi-channel recording system was used to capture sEMG signals, electrogoniometric data, and foot-floor contact signals. The presence of wires can, in principle, introduce constraints or affect gait to some extent. Careful attention was therefore given to cable management in order to minimize any impact on the participants’ natural gait. Specifically, the cables were secured and organized to reduce tension and allow freedom of movement; they were routed along the body and affixed so as to prevent excessive slack or interference with the participants’ stride. This approach was intended to keep the walking experience as natural as possible. It is also worth mentioning that the extended duration of the sEMG recordings helps mitigate initial sensor-induced discomfort, allowing participants to settle into more natural gait patterns over time. Indeed, at the end of the test, participants reported no significant restrictions during the trials.
The last two columns in Table 2 show that the average SNR over all muscles is higher than 13 dB for every subject except subject 20, whose mean SNR is below 10 dB (7.7 dB); for this subject, some sEMG signals (left Ham, right Ham, and right RF) have a very low SNR (< 5 dB). These signals should be used only after selective noise filtering has been applied.
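The following Python sketch does not reproduce the procedure used to obtain the SNR values in Table 2. As a generic illustration only, it uses the common definition SNR_dB = 20·log10(RMS_active / RMS_baseline), where the baseline segment is assumed to contain background noise only, and flags channels below the 5 dB level mentioned above. The function names, the channel labels, and the choice of active and baseline segments are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def rms(x: Sequence[float]) -> float:
    """Root mean square of a signal segment."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def snr_db(active: Sequence[float], baseline: Sequence[float]) -> float:
    """Generic SNR in dB: RMS of an active-muscle segment over RMS of a noise-only segment."""
    return 20.0 * math.log10(rms(active) / rms(baseline))

def low_snr_channels(snr_by_channel: Dict[str, float], threshold_db: float = 5.0) -> List[str]:
    """Return, in sorted order, the channels whose SNR is below threshold_db."""
    return sorted(ch for ch, snr in snr_by_channel.items() if snr < threshold_db)
```

Channels returned by `low_snr_channels` would be the candidates for selective noise filtering before further analysis.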
Saturation of the sEMG signal was identified in two cases: in the left GL of subject 2, from sample 187280 to sample 248245, and in the right Ham of subject 3, from sample 23343 to sample 88880. Since these cases involve only two of the 310 sEMG signals recorded across the entire dataset, we believe that these isolated instances do not compromise the overall quality and reliability of the dataset. Moreover, in the left Ham of subject 16, a high spike, probably due to unexpected interference, was identified at sample 260730. This spike lasts only about 300 samples, which is negligible compared with the total signal duration of approximately 514000 samples, so its impact on data integrity is minimal. Nevertheless, these issues call for caution when using the affected portions of the data; data-cleaning techniques and selective exclusion of compromised segments are recommended.
Subject 18 interrupted her walking trial at around sample 240700 because one of the sensors detached; data from that sample onward are therefore not reliable. In subject 22, some of the right (around sample 165363) and left (around sample 176765) footswitches detached because of excessive sweating. From these samples onward, the basographic signal is reliable only for discriminating the two main gait phases (stance and swing), not for a more detailed description. No further significant issues were detected. Users are advised to take all of these issues into account when using the current dataset.
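One way to apply the selective exclusion recommended above is to encode the reported issues once and derive a per-sample usability mask before any analysis. The Python sketch below uses the sample indices reported in this section. The channel labels (e.g., "GL_left"), the data structure, and the function name are hypothetical; samples are assumed to be 0-based indices into each record, and the end of the subject 16 spike is approximated from its reported duration of about 300 samples.

```python
from typing import Dict, List, Optional, Tuple

# Inclusive sample ranges reported as unreliable in this section.
# Key: (subject number, channel label); "*" applies to every channel of the subject.
# Channel labels are hypothetical: map them to the channel order in each .hea file.
# Indices are assumed 0-based; None as end means "until the end of the record".
KNOWN_ISSUES: Dict[Tuple[int, str], List[Tuple[int, Optional[int]]]] = {
    (2, "GL_left"): [(187280, 248245)],    # signal saturation
    (3, "Ham_right"): [(23343, 88880)],    # signal saturation
    (16, "Ham_left"): [(260730, 261030)],  # spike of about 300 samples (end approximated)
    (18, "*"): [(240700, None)],           # sensor detachment, trial interrupted
}
# Not masked here: for subject 22 the footswitch channels remain usable for
# stance/swing discrimination only, from about sample 165363 (right)
# and 176765 (left).

def usable_mask(n_samples: int, subject: int, channel: str) -> List[bool]:
    """Return a per-sample mask: True where no known issue affects the sample."""
    mask = [True] * n_samples
    ranges = KNOWN_ISSUES.get((subject, channel), []) + KNOWN_ISSUES.get((subject, "*"), [])
    for start, end in ranges:
        stop = n_samples if end is None else min(end + 1, n_samples)
        for i in range(max(start, 0), stop):
            mask[i] = False
    return mask
```

Keeping the issues in a single table makes the exclusions explicit and easy to audit, and strides overlapping masked samples can then be discarded as a whole.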
5. Code availability
The basic code developed to identify each stride from the whole basographic signal is available in the general-purpose open repository figshare [48]. The main purpose of the algorithm is to extract from the basographic signal the starting sample of each stride (which also gives the end sample, since the strides are consecutive) in order to segment the sEMG signal into strides. The script runs in the MATLAB environment with a minimal interactive interface. First, the user is asked to enter the number of the stride to analyze. Then, the user is prompted to indicate the channel number (i.e., the row of the matrix) of the basographic signal to display (right or left, according to the protocol reported in the .hea file) and the corresponding channel for the muscle. Finally, the user enters the name of the input .mat file. At the end of the procedure, the starting sample of each stride is saved in the variable “indice”, and a figure is provided showing the basographic signal and the corresponding sEMG signal in the selected stride.
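The figshare script itself is written in MATLAB [48] and is not reproduced here. To illustrate the underlying idea, the following Python sketch detects stride onsets as swing-to-contact transitions of the foot-floor contact signal and slices the sEMG signal between consecutive onsets. It assumes, hypothetically, that swing samples take values at or below a user-chosen threshold and that every contact level lies above it; the actual level coding of the basographic signal should be checked in the dataset documentation before use.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def stride_starts(baso: Sequence[float], swing_threshold: float, min_gap: int = 0) -> List[int]:
    """Indices where the foot passes from swing (<= threshold) to contact (> threshold).

    Assumption (hypothetical coding): swing samples are <= swing_threshold and
    every foot-floor contact level is above it. min_gap (in samples) discards
    onsets that follow the previous one too closely, e.g. switch bounce.
    """
    starts: List[int] = []
    for i in range(1, len(baso)):
        if baso[i - 1] <= swing_threshold < baso[i]:
            if starts and i - starts[-1] < min_gap:
                continue
            starts.append(i)
    return starts

def segment_strides(emg: Sequence[T], starts: Sequence[int]) -> List[Sequence[T]]:
    """Slice emg into consecutive strides: stride k spans starts[k] .. starts[k+1] - 1.

    Samples before the first onset and after the last onset (an incomplete
    stride) are not returned.
    """
    return [emg[a:b] for a, b in zip(starts[:-1], starts[1:])]
```

The onset list returned by `stride_starts` plays the same role as the variable “indice” in the original script; the `min_gap` debounce is an addition of this sketch, not a feature of the published code.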
References
- 1. Loeb GE, Gans C. Electromyography for experimentalists. Chicago, IL, USA: University of Chicago Press; 1986.
- 2. Onishi H, Yagi R, Akasaka K, Momose K, Ihashi K, Handa Y. Relationship between EMG signals and force in human vastus lateralis muscle using multiple bipolar wire electrodes. J Electromyogr Kinesiol. 2000;10(1):59–67. pmid:10659450
- 3. Sutherland DH. The evolution of clinical gait analysis part l: kinesiological EMG. Gait Posture. 2001;14(1):61–70. pmid:11378426
- 4. Perry J. Gait analysis: normal and pathological function. Thorofare, NJ, USA: Slack Inc.; 1992.
- 5. Spulák D, Cmejla R, Bačáková R, Kračmar B, Satrapová L, Novotný P. Muscle activity detection in electromyograms recorded during periodic movements. Comput Biol Med. 2014;47:93–103. pmid:24561347
- 6. Wang W, De Stefano A, Allen R. A simulation model of the surface EMG signal for analysis of muscle activity during the gait cycle. Comput Biol Med. 2006;36(6):601–18. pmid:16029872
- 7. Martinsen B, Haahr A, Dreyer P, Norlyk A. High on walking: conquering everyday life. West J Nurs Res. 2018;40(5):633–47. pmid:28256935
- 8. Benedetti MG, Catani F, Bilotta TW, Marcacci M, Mariani E, Giannini S. Muscle activation pattern and gait biomechanics after total knee replacement. Clin Biomech (Bristol). 2003;18(9):871–6. pmid:14527815
- 9. Frigo C, Crenna P. Multichannel SEMG in clinical gait analysis: a review and state-of-the-art. Clin Biomech (Bristol). 2009;24(3):236–45. pmid:18995937
- 10. Mahaudens P, Banse X, Mousny M, Detrembleur C. Gait in adolescent idiopathic scoliosis: kinematics and electromyographic analysis. Eur Spine J. 2009;18(4):512–21. pmid:19224255
- 11. Ivanenko YP, Poppele RE, Lacquaniti F. Five basic muscle activation patterns account for muscle activity during human locomotion. J Physiol. 2004;556(Pt 1):267–82. pmid:14724214
- 12. Winter DA. Biomechanics and motor control of human gait: normal, elderly and pathological. Waterloo, ON: Waterloo Biomechanics Press; 1991.
- 13. Di Nardo F, Spinsante S, Pagliuca C, Poli A, Strazza A, Agostini V, et al. Variability of muscular recruitment in hemiplegic walking assessed by EMG analysis. Electronics. 2020;9(10):1572.
- 14. Dingwell JB, Joby J, Cusumano JP. Do humans optimally exploit redundancy to control step variability in walking? PLoS Comput Biol. 2010;6:1–15.
- 15. Ranganathan R, Krishnan C. Extracting synergies in gait: using EMG variability to evaluate control strategies. J Neurophysiol. 2012;108(5):1537–44. pmid:22723678
- 16. Araujo RC, Duarte M, Amadio AC. On the inter- and intra-subject variability of the electromyographic signal in isometric contractions. Electromyogr Clin Neurophysiol. 2000;40(4):225–9. pmid:10907600
- 17. Huber C, Federolf P, Nüesch C, Cattin PC, Friederich NF, Tscharner V von. Heel-strike in walking: assessment of potential sources of intra- and inter-subject variability in the activation patterns of muscles stabilizing the knee joint. J Biomech. 2013;46(7):1262–8. pmid:23518206
- 18. Agostini V, Ghislieri M, Rosati S, Balestra G, Knaflitz M. Surface electromyography applied to gait analysis: how to improve its impact in clinics? Front Neurol. 2020;11:994.
- 19. Moreira L, Figueiredo J, Fonseca P, Vilas-Boas JP, Santos CP. Lower limb kinematic, kinetic, and EMG data from young healthy humans during walking at controlled speeds. Sci Data. 2021;8(1):103. pmid:33846357
- 20. Lencioni T, Carpinella I, Rabuffetti M, Marzegan A, Ferrarin M. Human kinematic, kinetic and EMG data during different walking and stair ascending and descending tasks. Sci Data. 2019;6(1):309. pmid:31811148
- 21. Macaluso R, Embry K, Villarreal D, Gregg R. Human leg kinematics, kinetics, and EMG during phase-shifting perturbations at varying inclines. IEEE Dataport. 2020.
- 22. Hu B, Rouse E, Hargrove L. Benchmark datasets for bilateral lower-limb neuromechanical signals from wearable sensors during unassisted locomotion in able-bodied individuals. Front Robot AI. 2018;5:14. pmid:33500901
- 23. Chereshnev R, Kertész-Farkas A. HuGaDB: human gait database for activity recognition from wearable inertial sensor networks. In: van der Aalst W, et al., editors. Analysis of images, social networks and texts. AIST 2017. Lecture notes in computer science, vol. 10716. Cham: Springer; 2018.
- 24. Kirtley C. CGA normative gait database. 2014. Available from: http://www.clinicalgaitanalysis.com/data/.
- 25. Wei W, Tan F, Zhang H, Mao H, Fu M, Samuel OW, et al. Surface electromyogram, kinematic, and kinetic dataset of lower limb walking for movement intent recognition. Sci Data. 2023;10(1):358. pmid:37280249
- 26. Grelsamer RP, McConnell J. The patella: a team approach. Gaithersburg (MD): Aspen; 1988.
- 27. Cowan SM, Bennell KL, Hodges PW, Crossley KM, McConnell J. Delayed onset of electromyographic activity of vastus medialis obliquus relative to vastus lateralis in subjects with patellofemoral pain syndrome. Arch Phys Med Rehabil. 2001;82(2):183–9. pmid:11239308
- 28. Di Nardo F, Morbidoni C, Fioretti S. Surface electromyographic signals collected during long-lasting ground walking of young able-bodied subjects (version 1.0.0). PhysioNet. 2022.
- 29. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):E215–20. pmid:10851218
- 30. Di Nardo F, Morbidoni C, Cucchiarelli A, Fioretti S. Influence of EMG-signal processing and experimental set-up on prediction of gait events by neural network. Biomed Signal Process Control. 2021;63:102232.
- 31. Hermens HJ, Freriks B, Disselhorst-Klug C, Rau G. Development of recommendations for SEMG sensors and sensor placement procedures. J Electromyogr Kinesiol. 2000;10(5):361–74. pmid:11018445
- 32. Agostini V, Nascimbeni A, Gaffuri A, Imazio P, Benedetti MG, Knaflitz M. Normative EMG activation patterns of school-age children during gait. Gait Posture. 2010;32(3):285–9. pmid:20692162
- 33. Agostini V, Balestra G, Knaflitz M. Segmentation and classification of gait cycles. IEEE Trans Neural Syst Rehabil Eng. 2014;22(5):946–52. pmid:24760911
- 34. Bonato P, D’Alessio T, Knaflitz M. A statistical method for the measurement of muscle activation intervals from surface myoelectric signal during gait. IEEE Trans Biomed Eng. 1998;45(3):287–99. pmid:9509745
- 35. Di Nardo F, Basili T, Meletani S, Scaradozzi D. Wavelet-based assessment of the muscle-activation frequency range by EMG analysis. IEEE Access. 2022;10:9793–805.
- 36. Clancy EA, Bertolina MV, Merletti R, Farina D. Time- and frequency-domain monitoring of the myoelectric signal during a long-duration, cyclic, force-varying, fatiguing hand-grip task. J Electromyogr Kinesiol. 2008;18(5):789–97. pmid:17434755
- 37. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech. 1997;13(2):135–63.
- 38. Agostini V, Knaflitz M. Statistical gait analysis. In: Distributed diagnosis and home healthcare. vol. 2. American Scientific Publishers; 2012. p. 99–121.
- 39. Agostini V, Lo Fermo F, Massazza G, Knaflitz M. Does texting while walking really affect gait in young adults? J Neuroeng Rehabil. 2015;12:86. pmid:26395248
- 40. Di Nardo F, Ghetti G, Fioretti S. Assessment of the activation modalities of gastrocnemius lateralis and tibialis anterior during gait: a statistical analysis. J Electromyogr Kinesiol. 2013;23(6):1428–33. pmid:23886485
- 41. Strazza A, Mengarelli A, Fioretti S, Burattini L, Agostini V, Knaflitz M, et al. Surface-EMG analysis for the quantification of thigh muscle dynamic co-contractions during normal gait. Gait Posture. 2017;51:228–33. pmid:27825072
- 42. Sutherland DH, Olshen R, Cooper L, Woo SL. The development of mature gait. J Bone Joint Surg Am. 1980;62(3):336–53. pmid:7364807
- 43. Berger W, Altenmueller E, Dietz V. Normal and impaired development of children’s gait. Hum Neurobiol. 1984;3(3):163–70. pmid:6480437
- 44. Stout JL. Gait: development and analysis. In: Campbell SK, Palisano RJ, Vander Linden DW, editors. Physical therapy for children. Philadelphia: W.B. Saunders Company; 2004. p. 161–7.
- 45. Di Nardo F, Morbidoni C, Mascia G, Verdini F, Fioretti S. Intra-subject approach for gait-event prediction by neural network interpretation of EMG signals. Biomed Eng Online. 2020;19(1):58. pmid:32723335
- 46. Iadarola G, Meletani S, Di Nardo F, Spinsante S. A new method for sEMG envelope detection from reduced measurements. In: 2022 IEEE International Symposium on Medical Measurements and Applications (MeMeA). Messina, Italy; 2022. p. 1–6. https://doi.org/10.1109/memea54994.2022.985643s6
- 47. Di Nardo F, Mengarelli A, Maranesi E, Burattini L, Fioretti S. Gender differences in the myoelectric activity of lower limb muscles in young healthy subjects during walking. Biomed Signal Process Control. 2015;19:14–22.
- 48. Di Nardo F. Identification of single strides from a foot-floor contact signal. figshare. 2023.