Using wearable sensors to classify subject-specific running biomechanical gait patterns based on changes in environmental weather conditions

Running-related overuse injuries can result from a combination of various intrinsic (e.g., gait biomechanics) and extrinsic (e.g., running surface) risk factors. However, it is unknown how changes in environmental weather conditions affect running gait biomechanical patterns, since these data cannot be collected in a laboratory setting. Therefore, the purpose of this study was to develop a classification model based on subject-specific changes in biomechanical running patterns across two different environmental weather conditions using data obtained from wearable sensors in real-world environments. Running gait data were recorded during winter and spring sessions, with average air temperatures of -10 °C and +6 °C, respectively. Classification was based on measurements of pelvic drop, ground contact time, braking, vertical oscillation of the pelvis, pelvic rotation, and cadence obtained from 66,370 strides (~11,000/runner) from a group of six recreational runners. A non-linear ensemble machine learning algorithm, random forest (RF), was used to classify the runs and to compute a heuristic for determining the importance of each variable in the prediction model. To validate the subject-specific models, two cross-validation methods (one-against-another and partitioning datasets) were used, yielding mean classification accuracies of 87.18% and 95.42%, respectively, which indicates an excellent discriminatory ability of the RF-based model. Additionally, the ranked order of variable importance differed across the individual runners. These results demonstrate that processing gait biomechanical signals from a single wearable sensor with an RF-based machine learning algorithm can successfully detect changes in an individual's running patterns from data obtained in real-world environments.


Introduction
Running is one of the most common recreational activities around the world, but despite its popularity, approximately 50% of runners experience a running-related musculoskeletal injury each year [1][2][3]. The etiology of overuse running injuries is multifactorial and can result from the interaction of many extrinsic risk factors, such as environmental conditions, running surface, footwear, and weekly training mileage, as well as intrinsic risk factors, such as age, foot strike pattern, and gait biomechanics [1][2][3][4]. Prolonged exposure to these intrinsic and extrinsic risk factors may lead to overuse running injury [5]. One risk factor that has received very little attention in the literature is whether gait biomechanical patterns change as a result of environmental weather conditions. Previous investigations of injury risk based on ambient temperature have suggested that tissue damage may occur due to a lack of proper warm-up. For example, Milgrom et al. reported an increased risk of Achilles paratendinitis among infantry recruits in winter conditions compared with summer [6]. On the other hand, cold weather has been shown to reduce shoe-surface traction, resulting in a reduced risk of acute knee and ankle injuries among football players [7,8]. Only a handful of studies have investigated the effect of environmental weather conditions on running performance, and none have investigated whether gait biomechanics change as a result of environmental weather. For example, Ely et al. [9] reported a progressive reduction in marathon performance as temperatures increased from 5 to 25 °C for both males and females and across competitive and recreational runners, although performance was more negatively affected for slower runners. These studies suggest that weather can affect both physiological and mechanical aspects of running gait.
Thus, it is possible that different weather conditions are associated with concomitant changes in gait biomechanical running patterns; however, to our knowledge, no study has directly investigated this hypothesis.
The main reason the inter-relationship between environmental weather conditions and gait biomechanics has not been investigated is most likely the inability to collect such data in a laboratory setting. However, the availability and utility of modern portable inertial measurement units (IMUs) and global positioning systems (GPS) now make it possible to collect data outside of the laboratory [10][11][12]. Since large quantities of data can be collected using wearable devices, machine learning (ML) techniques are also needed to better understand the complexities of gait biomechanics and how concomitant changes in biomechanical patterns may be related to injury or performance [13,14]. Furthermore, traditional biomechanics research generally investigates potential differences between two groups using group-based analyses. For example, several researchers have identified differences in running patterns based on age group, gender and/or injury status [15][16][17]. In contrast, more recent research has shown that group-based comparisons are not efficacious due to the existence of sub-groups [18,19], and other studies have shown that subject-specific models are necessary to understand individual differences and risk factors [20][21][22][23]. Several authors have also used different ML algorithms to develop these sub-group-based models, including principal component analysis, support vector machines and hierarchical cluster analysis [17][18][19]. However, to our knowledge, no study has investigated whether subject-specific models can provide deeper insight into IMU-based biomechanical changes associated with environmental weather conditions. Therefore, the purpose of this study was to develop a classification model based on subject-specific changes in biomechanical running patterns across two different environmental weather conditions using data obtained from wearable sensors in out-of-laboratory environments.
We hypothesized that changes in subject-specific running patterns could be classified by weather condition with an accuracy greater than 80%, and that the ranked order of variable importance would be subject-specific, i.e., would differ between the individual runners' ML models. A secondary objective was to rank the biomechanical variables by their importance to the classification, in order to better understand changes in subject-specific running patterns.

Participants
Six recreational runners (five females: age = 47.5 ± 9.69 years, height = 169.17 ± 6.56 cm, weight = 67.42 ± 11.5 kg; and one male: age = 29 years, height = 170 cm, weight = 75 kg) volunteered to participate in the study. The runners were free of any neuromuscular diseases or musculoskeletal injuries and were registered in a half-marathon training program managed by a local running group. This protocol was approved by the University of Calgary Conjoint Health Research Ethics Board (REB), and all runners provided written informed consent.

Instrumentation
Biomechanical gait variables from each runner were recorded using the Lumo Run (Lumo Bodytech Inc., Mountain View, CA) wearable inertial measurement unit (IMU), which consists of a 3-dimensional (3D) accelerometer, magnetometer, and gyroscope (dimensions: 4.98 cm × 2.84 cm × 0.99 cm). The Lumo Run IMU was attached to the posterior aspect of either the runner's waistband or running belt as per the manufacturer's instructions [24] (Fig 1). This wearable sensor measured and recorded six biomechanical variables, which it averaged over every ten strides (Table 1); a complete description of these variables can be found on the manufacturer's website [24]. A GPS watch (Garmin vívoactive HR; Garmin International Inc., KS, USA) was worn on each runner's preferred wrist (Fig 1) and recorded running speed (m/s), distance (km), and global positioning data, including latitude, longitude and altitude, every second.
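The ten-stride averaging is performed internally by the Lumo Run, and its exact algorithm is not described here. As a rough illustration only, the sketch below assumes non-overlapping blocks of ten consecutive strides and that an incomplete final block is discarded; the function name `ten_stride_means` and the cadence values are hypothetical.

```python
import numpy as np


def ten_stride_means(per_stride, block=10):
    """Average per-stride values over consecutive, non-overlapping blocks.

    Assumption (hypothetical): the final block is dropped if it holds fewer
    than `block` strides.
    """
    values = np.asarray(per_stride, dtype=float)
    n_full = len(values) // block          # number of complete blocks
    trimmed = values[: n_full * block]     # discard the incomplete tail
    return trimmed.reshape(n_full, block).mean(axis=1)


# Hypothetical cadence values (steps/min) for 25 consecutive strides
cadence = np.arange(160, 185)
print(ten_stride_means(cadence))  # [164.5 174.5]
```

Whether the device uses overlapping windows or keeps partial blocks is not stated; either choice would change only the `trimmed`/`reshape` step.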

Data collection
Gait variables from winter runs were recorded from mid-February to mid-March, while spring runs were recorded from late April to mid-May. Each runner performed two training runs in each weather condition, for a total of four runs used in this analysis. Each run began at 8:30 AM on a Sunday and was completed outdoors on pavement along a similar route. Temperature (°C), snow depth (cm), precipitation (mm), and humidity (%) for each run were derived from three different International Air Transport Association-affiliated weather stations in Calgary, AB: Canada Olympic Park (WDU), Calgary International Airport (YYC), and Calgary INT'L CS Alberta (PCI).
For each run, data from km 0 to 1 were discarded as a warm-up period, and data recorded after km 6 were excluded to minimize any effects of fatigue. Therefore, only 5 km of data (i.e., from km 1 to 6) were analyzed from each run; in total, the input data consisted of 66,370 strides (~11,000/runner) across the four runs. Altitude, latitude and longitude data from the Garmin watch were used to ensure that the elevation profiles of the four runs were similar and that each run followed a route with minimal changes in elevation, in order to minimize the effects of uphill and downhill running.
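A minimal sketch of this trimming step, assuming each sample carries a cumulative GPS distance in km. The helper names (`analysis_window`, `elevation_gain`) and the sample values are hypothetical, and summing positive altitude changes is only one simple way to compare elevation profiles; it is not necessarily the check used in this study.

```python
import numpy as np


def analysis_window(distance_km, values, start_km=1.0, end_km=6.0):
    """Keep samples recorded from start_km (inclusive) to end_km (exclusive)."""
    distance_km = np.asarray(distance_km, dtype=float)
    values = np.asarray(values)
    mask = (distance_km >= start_km) & (distance_km < end_km)
    return values[mask]


def elevation_gain(altitude_m):
    """Total ascent (m): sum of positive changes between consecutive GPS fixes."""
    diffs = np.diff(np.asarray(altitude_m, dtype=float))
    return float(diffs[diffs > 0].sum())


# Hypothetical samples: cumulative distance (km) and one gait value per sample
distance = np.array([0.5, 0.99, 1.0, 3.2, 5.99, 6.0, 7.1])
gait_values = np.arange(7)
print(analysis_window(distance, gait_values))    # [2 3 4]
print(elevation_gain([1100, 1102, 1101, 1105]))  # 6.0
```

Raw GPS altitude is noisy, so in practice the altitude trace would likely be smoothed before summing ascents.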

Data analysis
A robust, non-linear machine learning classifier, Random Forest (RF), was used to develop a classification model that measured the accuracy and importance of gait biomechanical variables in classifying runs performed under differing environmental weather conditions. The RF classifier has been shown to provide higher classification accuracy than many other ML classifiers, with faster computation, while accommodating complex interactions among predictor variables and providing information about the importance of each predictor variable [25][26][27]. In other words, RF provides variable importance measures that rank predictors according to their predictive power [28]. Two validation methods (Method 1: one-against-another and Method 2: partitioning datasets) were used to ensure that the proposed RF-based subject-specific classification approach was robust and that the data were not overfit [29]. With Method 1 (one-against-another), one winter run and one spring run were combined to form the training dataset, and the remaining winter and spring runs formed the testing dataset. With Method 2 (partitioning datasets), 70% of each runner's total strides from both weather conditions were randomly selected for training, and the remaining 30% were used for testing. Individual training and test sets were generated for each subject. Each classification method was implemented in Python (version 3.6, www.python.org) [30]. The RF model was trained and cross-validated using the Anaconda distribution of Python with packages including matplotlib, numpy, scipy, and scikit-learn ("sklearn.ensemble.RandomForestClassifier") [31,32]. The number of trees in the RF was set to 100, as previous research has shown this to be sufficient for obtaining high-accuracy solutions to similar classification problems [33,34].
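The two validation schemes can be sketched with scikit-learn as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: synthetic data stand in for one runner's four runs (label 0 = winter, 1 = spring), each row stands for one stride's six gait variables, and `method1_accuracy`, `method2_accuracy` and `simulate_run` are hypothetical helpers. `RandomForestClassifier(n_estimators=100)` matches the 100 trees described above; `random_state` is fixed only for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

N_TREES = 100  # number of trees, as stated in the text


def _fit_and_score(X_train, y_train, X_test, y_test, seed):
    rf = RandomForestClassifier(n_estimators=N_TREES, random_state=seed)
    rf.fit(X_train, y_train)
    return rf.score(X_test, y_test)  # mean accuracy on the test set


def method1_accuracy(X_runs, y_runs, train_runs, test_runs, seed=0):
    """Method 1 (one-against-another): train on one winter + one spring run,
    test on the remaining winter + spring runs."""
    X_train = np.vstack([X_runs[i] for i in train_runs])
    y_train = np.concatenate([y_runs[i] for i in train_runs])
    X_test = np.vstack([X_runs[i] for i in test_runs])
    y_test = np.concatenate([y_runs[i] for i in test_runs])
    return _fit_and_score(X_train, y_train, X_test, y_test, seed)


def method2_accuracy(X_runs, y_runs, seed=0):
    """Method 2 (partitioning): pool all strides, random 70/30 train/test split."""
    X = np.vstack(X_runs)
    y = np.concatenate(y_runs)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.7, random_state=seed)
    return _fit_and_score(X_train, y_train, X_test, y_test, seed)


# Hypothetical data for ONE runner: 4 runs x 300 strides x 6 gait variables
rng = np.random.default_rng(42)


def simulate_run(shift, n=300):
    return rng.normal(loc=shift, scale=1.0, size=(n, 6))


X_runs = [simulate_run(0.0), simulate_run(0.0),   # runs 0, 1: winter
          simulate_run(1.5), simulate_run(1.5)]   # runs 2, 3: spring
y_runs = [np.zeros(300, dtype=int), np.zeros(300, dtype=int),
          np.ones(300, dtype=int), np.ones(300, dtype=int)]

acc1 = method1_accuracy(X_runs, y_runs, train_runs=[0, 2], test_runs=[1, 3])
acc2 = method2_accuracy(X_runs, y_runs)
print(f"Method 1: {acc1:.2%}  Method 2: {acc2:.2%}")
```

Method 1 could also be repeated for the other pairing of winter and spring runs (e.g., training on runs 0 and 3 and testing on runs 1 and 2) and the accuracies averaged; whether the study did so is not stated.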
Additionally, the RF used the Gini index from the CART (classification and regression tree) learning system to calculate node impurity when constructing the decision trees [26]. From these trees, the RF computes a heuristic for how important each variable (the six Lumo Run gait variables) is in predicting the target (weather condition). Statistical analyses were performed using repeated measures ANOVA with significance set at P<0.05, and Cohen's d effect sizes were calculated for the difference in each outcome measure between weather conditions.
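To make the Gini-based importance heuristic concrete, the sketch below computes the Gini impurity of a node, 1 - sum(p_k^2) over the class proportions p_k, and the weighted impurity decrease produced by a split. In scikit-learn, the `feature_importances_` attribute aggregates these decreases per variable across all trees (mean decrease in impurity) and normalizes them to sum to 1. The feature names, helper functions and synthetic data are hypothetical, and the example assumes importance was read from `feature_importances_` rather than, e.g., permutation importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labels for the six Lumo Run variables
FEATURES = ["pelvic_drop", "ground_contact_time", "braking",
            "vertical_oscillation", "pelvic_rotation", "cadence"]


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_k p_k**2 over class proportions p_k."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def impurity_decrease(labels, goes_left):
    """Parent impurity minus the size-weighted impurity of the two children."""
    labels = np.asarray(labels)
    goes_left = np.asarray(goes_left, dtype=bool)
    n = len(labels)
    left, right = labels[goes_left], labels[~goes_left]
    children = ((len(left) / n) * gini_impurity(left)
                + (len(right) / n) * gini_impurity(right))
    return gini_impurity(labels) - children


def rank_variables(X, y, seed=0):
    """Rank variables by scikit-learn's mean decrease in Gini impurity."""
    rf = RandomForestClassifier(n_estimators=100, criterion="gini",
                                random_state=seed).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return [(FEATURES[i], float(rf.feature_importances_[i])) for i in order]


# A perfectly separating split removes all impurity from a 50/50 node
print(impurity_decrease([0, 0, 1, 1], [True, True, False, False]))  # 0.5

# Hypothetical data in which only vertical oscillation depends on the season
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 6))
X[:, 3] += 3.0 * y
for name, importance in rank_variables(X, y):
    print(f"{name:22s} {importance:.3f}")
```

Impurity-based importances are computed on the training data; permutation importance on held-out strides is a common cross-check.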

Results
Fig 2 presents an overview of the RF-based classification accuracy obtained on test data generated with the two validation methods. Using Method 2 (partitioning datasets), the RF-based model demonstrated an excellent overall mean classification accuracy of 95.42%. All runners yielded a classification accuracy higher than 90% with the exception of Runner 5, who exhibited a classification accuracy of 89.06%. In contrast, the overall mean classification accuracy obtained with Method 1 (one-against-another) was 87.18%, and all runners yielded a classification accuracy higher than 85% except Runner 5, who exhibited an accuracy of 70.47%. The overall classification accuracies differed significantly between the methods (P<0.05), and for all runners Method 2 yielded a higher classification accuracy than Method 1. The differences between Methods 1 and 2 were moderate for Runner 5 (18.59 percentage points) and Runner 6 (14.37 percentage points), slight for Runner 3 (8.0 percentage points) and Runner 4 (6.14 percentage points), and negligible for Runner 1 (2.16 percentage points) and Runner 2 (0.45 percentage points). Overall, the ranking of the variables, based on their importance to the classification, differed across runners and classification methods (Table 2 and Fig 3). For example, although vertical oscillation of the pelvis was the most important variable for Runners 2 and 5 with both methods, it ranked lower for Runner 1, for whom pelvic drop was the most important variable with both methods. Similarly, pelvic rotation was the second-ranked variable with both methods for Runners 2 and 4 but was less important for the other runners. Cadence was less important for all runners, with the exception of Runner 3, for whom it was the second most important variable with Method 2.
Another notable difference was found for braking, which was the most important variable for Runner 4 with Method 1 but only the third most important variable with Method 2. A similar inconsistency was found for pelvic rotation, which was identified as the most important variable with Method 1 but was ranked fourth with Method 2. The remaining three variables (braking, ground contact time, and cadence) were not found to be important for the classification task and were consistently ranked third, fifth, and sixth, respectively, with both methods (Fig 4). Table 2 also presents the results of the statistical analyses of the individual and overall results for both weather conditions. All runners except Runner 4 demonstrated lower vertical oscillation of the pelvis in winter than in spring. Pelvic drop for two runners (Runners 2 and 3) and pelvic rotation for three runners (Runners 3, 4 and 6) were higher in winter than in spring. There was no clear difference in braking between winter and spring: three runners (Runners 1, 3 and 4) exhibited the same values in both conditions, two runners (Runners 4 and 6) had lower braking values in winter, and one runner (Runner 2) had a higher braking value in winter. Two runners (Runners 1 and 2) had lower ground contact times in winter, two runners (Runners 3 and 4) had higher ground contact times in winter, and the remaining two runners (Runners 5 and 6) had similar values in both weather conditions. Finally, with the exception of Runner 4, all runners demonstrated a higher cadence in winter. Overall, five biomechanical variables (all except cadence) had lower values during winter runs than during spring runs; however, no significant differences were found between the two weather conditions for any of the six variables (P>0.05).
Cohen's d effect sizes and 95% confidence intervals (95% CI) are presented in Table 2 and show that the effect sizes between winter and spring runs were small (d < 0.5), except for vertical oscillation of the pelvis, pelvic drop, and cadence, which were moderate (0.5 < d < 0.8).
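For reference, a minimal sketch of Cohen's d with an approximate 95% CI. It assumes the independent-groups form with a pooled standard deviation and the common large-sample standard error of d; the text does not specify which variant was used (a paired variant would differ), and the function names and example values are hypothetical. The magnitude labels follow the thresholds stated above, with |d| ≥ 0.8 labelled "large" per Cohen's convention.

```python
import math

import numpy as np


def cohens_d(winter, spring):
    """Cohen's d (spring minus winter) using the pooled standard deviation.

    Assumption: independent-groups form; a paired/repeated-measures variant
    would use a different denominator.
    """
    a = np.asarray(winter, dtype=float)
    b = np.asarray(spring, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return float((b.mean() - a.mean()) / math.sqrt(pooled_var))


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI from the large-sample standard error of d."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


def magnitude(d):
    """Thresholds as stated in the text; |d| >= 0.8 labelled 'large' (Cohen)."""
    size = abs(d)
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical values of one gait variable in each condition
winter = [1.0, 2.0, 3.0, 4.0, 5.0]
spring = [2.0, 3.0, 4.0, 5.0, 6.0]
d = cohens_d(winter, spring)
low, high = d_confidence_interval(d, len(winter), len(spring))
print(f"d = {d:.2f} [{low:.2f}, {high:.2f}] -> {magnitude(d)}")
# d = 0.63 [-0.64, 1.90] -> moderate
```

With samples this small, the CI easily spans zero even for a moderate d, which is consistent with moderate effect sizes coexisting with non-significant differences.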
The environmental weather conditions are presented in Table 3: average temperature, humidity and snow depth differed significantly between winter and spring runs, whereas precipitation did not.

The speed and overall route were similar between sessions, as presented in Table 4. In addition, speed, heart rate, altitude, latitude and longitude showed no significant differences between the two weather conditions (Table 4).

Discussion
The objective of this study was to classify changes in subject-specific running gait patterns based on environmental weather conditions (winter vs. spring) using an RF classifier. The findings support our hypotheses and demonstrate that an RF approach was a robust method for accurately classifying large datasets collected using wearable sensors in real-world settings. Interestingly, the RF evaluation identified different important predictor variables for each runner's model. This suggests that each runner exhibited different changes in overall gait biomechanics and that changes in weather conditions affected the mechanics of individual runners differently. To our knowledge, this study constitutes the first examination of changes in subject-specific gait biomechanics based on environmental weather conditions. These findings also support the efficacy of wearable technology and data science approaches for understanding the complexities of running gait patterns from data collected in out-of-laboratory environments [29,35].
Overall, the results of this investigation suggest that the presence of snow and colder temperatures is associated with runner-specific changes in biomechanical gait patterns, possibly in an effort to reduce the risk of falling on slippery surfaces [36]. This interpretation is supported by previous studies indicating that injury rates were higher in colder than in warmer weather conditions due to running on icy and slippery paths [37][38][39]. Moreover, the results of the current study suggest that changes in running biomechanical patterns between weather conditions may contribute to overuse running-related injuries [5]. For example, when pelvic drop was important for classification (e.g., Runner 1), there was greater pelvic drop in spring than in winter, but when it was not important (e.g., Runners 2 and 3), it was lower in spring than in winter. A similar pattern was observed for vertical oscillation of the pelvis: when it was important (e.g., Runners 2 and 5), there was greater oscillation in spring than in winter, but when it was less important (e.g., Runner 4), there was greater oscillation in winter than in spring. These results suggest that the runners in the current study adjusted to different weather conditions by reducing vertical or frontal plane motion, accompanied by slight increases in running cadence and shorter stride length. However, it is important to note that all participants were injury-free, so these gait changes were not adopted to mitigate symptoms of injury. On the other hand, adopting a more constrained running pattern may, over time, contribute to an overuse running injury [40]. Future prospective research is therefore necessary to understand the inter-relationship between environmental weather conditions, concomitant subject-specific changes in gait patterns, and the etiology of injury.
The RF classifier has received increasing attention within the gait research community due to its ability to yield excellent classification results and its fast computational processing speed [41,42]. In addition, this classifier provides consistent classifications using predictions derived from an ensemble of decision trees, as well as a ranking of the variables according to their ability to differentiate between the target classes [41,43]. The results of the current study are largely consistent with previous RF-based gait biomechanics studies involving wearable sensors [40,41]. However, while research has investigated how IMU systems can be used to assess running biomechanics in laboratory and clinical settings [44], very few studies have been conducted in real-world settings [45,46]. Therefore, to address this knowledge gap and open new research directions, the current study developed and evaluated subject-specific models using an RF classifier and data from a single IMU, and achieved excellent classification accuracy. Interestingly, the differences in classification accuracy between the two validation methods suggest that including information from multiple runs in the training data is beneficial for building a successful model. In addition, the current study demonstrates that the RF algorithm was able to accurately classify runs and determine the relative importance of each input variable for an individual runner [47,48].
Although a combination of multiple variables was needed to achieve high classification accuracy and to fully understand the multidimensional characteristics of subject-specific running biomechanics under different weather conditions, the current findings can be compared with previous studies that have addressed the effects of temperature on either running performance [9,49,50] or injury rates [51]. For example, our findings are consistent with previous work demonstrating the usefulness of multidimensional analyses for understanding the complex patterns and inter-relationships between multiple biomechanical variables when classifying runners based on subtle differences in gait patterns that may be indicative of performance and/or injury [52][53][54][55]. Moreover, in the current study, the overall values of all biomechanical gait variables except cadence were slightly lower during winter than during spring, although these differences were not statistically significant. These findings are consistent with previous research indicating a more economical running technique with a lower risk of overuse injury during winter (colder) weather conditions [56][57][58]. Reduced pelvic drop has also been considered a protective factor for patellofemoral pain [59,60], as well as a gait retraining strategy to reduce pain associated with this common running-related injury [61]. Future research using wearable sensors in real-world situations is therefore necessary to better elucidate these inter-relationships. To our knowledge, this is the first study to quantify subject-specific changes in real-world running gait biomechanics as a result of changes in environmental weather conditions. Moreover, the current study also represents one of the first investigations to analyze data from runners' actual training runs.
Specifically, a recent systematic review [62] suggested that future studies should involve long-term data collection across multiple running bouts in a runner's natural environment, thus enabling prospective studies and the development of subject-specific models of gait. Considering that the etiology of overuse running injuries is multifactorial and can result from the interaction of many extrinsic factors, such as environmental conditions, the results of the current study are an important contribution to a better understanding of injury etiology.

Limitations and future directions
These findings should be considered in light of several limitations. First, although the number of runners was small (n = 6), subject-specific models were used, and each was built from a large number of strides (66,370 strides in total). Regardless, further investigation with a larger sample is necessary to determine whether homogeneous sub-groups, or clusters, form as a result of consistent within-group biomechanical changes [18,58]. Second, we did not account for non-weather-related factors such as changes in runners' clothing, footwear, nutrition, sleep, or daily mood state profile. Future research should consider these factors to gain a more complete understanding of how external factors can influence running gait biomechanics. Third, because the present study examined only two weather conditions, the results may only apply to these conditions and temperatures. In addition, the temperatures in the present study (-10 °C to +6 °C) were lower than those examined by Ely et al. [9] (+5 °C to +25 °C) and Knapik et al. [51] (+15 °C to +35 °C). Lastly, only a limited number of spatiotemporal and biomechanical variables, obtained from a commercially available wearable sensor, were used. While additional or more complex variables from one or more wearable sensors could likely improve classification accuracy, we posit that the simplicity of this approach and its translatability to the current market of wearable sensors are significant advantages that should not be overlooked. Regardless, future research should include a broader range of variables, and possibly more wearable sensor devices, to gain a deeper understanding of subject-specific changes in gait patterns during out-of-laboratory data collections.

Conclusion
In summary, our RF-based subject-specific classification models achieved excellent mean classification accuracies (87.18% and 95.42%) based on a large set of running gait data from a small group of runners. These novel results support the use of a robust machine learning approach for determining subject-specific changes in running gait patterns due to differences in external weather conditions using a single IMU. We believe that our RF-based method may provide a more in-depth understanding of changes in gait biomechanics in response to extrinsic injury risk factors, and we conclude that the relationship between environmental weather conditions and gait biomechanics is subject-specific and multifactorial, involving unique interactions between intrinsic and extrinsic factors.