Prediction of gait trajectories based on the Long Short Term Memory neural networks

The forecasting of lower limb trajectories can improve the operation of assistive devices and minimise the risk of tripping and balance loss. The aim of this work was to examine four Long Short Term Memory (LSTM) neural network architectures (Vanilla, Stacked, Bidirectional and Autoencoders) in predicting the future trajectories of lower limb kinematics, i.e. Angular Velocity (AV) and Linear Acceleration (LA). Kinematics data of foot, shank and thigh (LA and AV) were collected from 13 male and 3 female participants (28 ± 4 years old, 1.72 ± 0.07 m in height, 66 ± 10 kg in mass) who walked for 10 minutes at preferred walking speed (4.34 ± 0.43 km.h-1) and at an imposed speed (5km.h-1, 15.4% ± 7.6% faster) on a 0% gradient treadmill. The sliding window technique was adopted for training and testing the LSTM models with total kinematics time-series data of 10,500 strides. Results based on leave-one-out cross validation, suggested that the LSTM autoencoders is the top predictor of the lower limb kinematics trajectories (i.e. up to 0.1s). The normalised mean squared error was evaluated on trajectory predictions at each time-step and it obtained 2.82–5.31% for the LSTM autoencoders. The ability to predict future lower limb motions may have a wide range of applications including the design and control of bionics allowing improved human-machine interface and mitigating the risk of falls and balance loss.


Introduction
Prediction of gait kinematics is a useful approach to improve the operation of assistive devices (i.e.bionics) and minimise the risk of falling [1][2][3][4][5].Prediction of the human gait however has been a challenging process due to the locomotor system's high degrees of freedom that continuously change and the asymmetrical foot contact with the ground [6].One of the most common straightforward approach for gait prediction is to combine forward dynamics with optimisation methods in which human muscle forces and limb movements are determined by minimising a cost function [7,8].The method, however, requires a long computational time, and it is highly dependent on the measured data [9,10].In order to achieve efficient computational time and to set the optimisation parameters without relying on the measured data, inverse dynamics along with optimisation methods are implemented to predict the human walking trajectories [11].Ren et al. [12] predicted all segments motion and ground reaction forces from the average gait forward velocity, double stance duration and the period of the gait cycle.Although those methods are able to capture the gait features, they idealise the human motions and were unable to generalise the gait trajectories [13,14].
The emergence of inexpensive wearable sensors such as Inertial Measurement Units (IMUs) [15,16] have posed new challenges (i.e. increase in dimensionality) and opportunities (i.e.new insights) to the analysis of the human movement [17] outside laboratory settings [4].In response to these challenges, a new set of Machine Learning (ML) algorithmic models have been widely adopted by biomechanists [17,18].The ML algorithms are a subfield of Artificial Intelligence (AI) concerned with the establishment of computer programmes that learn patterns from data [19].Computational techniques related to ML have been successful in solving several aspects of biomechanics gait research problems [20,21], such as the gait classification [22][23][24], joint angle prediction [25] and energy expenditure minimisation in lower limb exoskeletons [26].Tanghe et al. have applied the Probabilistic Principal Component Analysis (PPCA) to predict the future lower limb joint kinematics and achieved an error rate between 4.5-12.5% [27].The data however were collected at an imposed speeds (2 and 5 km.h -1 ) and the error was calculated at 3 points only in the gait cycle (10, 50 and 100%) [27].
One of the most utilised algorithms for the human movement prediction are the Artificial Neural Networks (ANNs) [17,28,29].A class of ANN known as deep learning (inspired by the structure and function of the brain) [30], were found to be insightful in human activity classification [31][32][33], gait pattern recognition [34] and the improvement of user intention detection in wearable assistive devices (i.e.bionics) [35][36][37][38].It was also applied in regression tasks such as the prediction of lower limb joint angles from the Angular Velocity (AV) and Linear Acceleration (LA) of foot and shank segments [39].Gholami et al. implemented Convolutional Neural Networks (CNN) to predict the lower limb joint angles from the foot LA data and achieved between 6.5-11.1% error rate [40].Nonetheless, the developed ANNs in the literature were predicting the gait trajectories (i.e.knee angles) from an independent variable (i.e.foot LA) and it was not implemented to predict future gait trajectories of the same independet variable.
The kinematics of the lower limbs are the means by which powered exoskeletons are controlled, falls are prevented and abnormal gaits are identified [1][2][3][4] The authors are not aware of previous work that investigated sequential ML models such as LSTM neural networks to predict future gait trajectories at preferred walking speed (PWS) and imposed walking speed.The LSTM neural networks are an ANN architecture known for modelling time-series information [41,42].The LSTM neural networks have proven wide success in modelling human movement data such as the lower limb kinematics prediction [43] neurodegenerative disease diagnosis [44], gait event detection [45,46] and falls recognition [47].The aim of this work was to develop and compare four standard LSTM architectures (Vanilla, Stacked, Bidirectional and Auto-encoders) for the prediction of future lower limb trajectories, i.e. foot AV, shank AV, thigh AV, foot LA, shank LA and thigh LA.In our previous paper [48], we found that LSTM autoencoder (i.e.LSTM architecture) is able to predict the future gait trajectories at an imposed speed.This work further investigates different LSTM architectures in predicting future gait kinematics when individuals walk at PWS and an imposed speed.

Study participants
Walking data were collected from 13 male and 3 female participants (28 ± 4 years old, height 1.72 ± 0.07 m, mass 66 ± 10 kg) who walked for 10 minutes at their PWS (4.34 ± 0.43 km�h -1 ) and at an imposed speed (5 km�h -1 , 15.4% ± 7.6% faster) on a 0% gradient treadmill.Trials were randomised and one participant's data (male) was omitted due to incomplete data.The PWS was calculated by starting the treadmill at 3 km�h -1 , then it was gradually increased until the participant says "this is comfortable".It was again increased until an uncomfortable speed (i.e fast walking) was reached.After that, speed was gradually decreased until the participant says "this is comfortable".An average value was then calculated from the two comfortable recorded speeds to represent the PWS [49].Each trial was started with a 2 minutes familiarisation session for treadmill walking.Ethics approval was granted by the Victoria University Human Research Ethics Committee (ID HRE18-230) and the Department of Defence and Veterans' Affairs Human Research Ethics Committee (Protocol 852-17).All participants signed a consent form and volunteered freely to participate.

Gait analysis
A set of 30 retroreflective markers were attached to each participant in the form of clusters [39].Each cluster comprised of a group of individual markers that represent a single body segment (e.g.shank).This include left and right foot clusters (3 markers), left shank cluster (5 markers), right shank cluster (5 markers), left thigh cluster (5 markers), right thigh cluster (5 markers) and pelvis cluster (4 markers).The 3D position of each cluster was tracked using a 14 camera motion analysis system (Vicon Bonita, Version 2.8.2) recording at 250Hz.Before capturing the dynamic trials, a static pose (1 second) was recorded where an additional 8 retroreflective markers were placed on anatomical landmarks (e.g.lateral femur medial epicondyle) identified by palpation [50][51][52].The static pose was used to calibrate the position and orientation of the lower body skeletal system [53].

Kinematic walking profiles
Recorded 3D positional data were processed using Visual 3D (C-motion, Inc, Version 6) to compute LA and AV for the thigh, shank, and foot segments of the right limb [48].LA and AV were then interpolated with a least-squares fit on a 3 rd order polynomial and filtered using a lowpass digital filter with a 15Hz cut-off frequency [54,55].A stride was defined as the interval between two successive heel strikes of the right foot [56].Outlier strides (i.e.bad strides) were labelled as bad and excluded from the final time series data.The final time series data were downsampled to 50 Hz (to accelerate LSTM computational time) [43] and normalised with zscores using Matlab (Mathworks, Inc, R2020a).The sagittal plane kinematics included the translation along the Y-axis (i.e.LA) and the rotation along the X-axis (i.e.AV), and were used for LSTM prediction, resulting in six predictor variables, (i) 1 and 2).

Datasets
Processed time series data (10,500 strides) were combined to include the two walking speeds.The total dataset comprised of 10,500 strides from 15 participants that included 6 kinematic feature variables (X 1 , X 2 , X 3 , Y 4 , Y 5 , Y 6 ).The dataset was divided into training set from 14 participants and a testing set from 1 participant.To evaluate the generalisation capability using leave-one-out cross validation, the testing set was created for each participant comprised of 75 timesteps (i.e.1.5s of the gait cycle) and the 6 kinematic feature variables (Table 1).A single timestep is equivalent to 0.02s (i.e.1/50Hz).The 75 timesteps was selected based on the average stride time and was selected from the first right foot strike following the familiarisation session.

Time series transformation to a supervised learning problem
The T � F (i.e.Timesteps � Features) data structure was transformed into S � T � F (i.e.Samples � Timesteps � Features) structure (Fig 3) [48].One sample is a one window that consists of multiple timesteps and the 6 features.A single timestep is equivalent to 0.02s (i.e.1/50Hz).The training input data was transformed from 614,083 timesteps and 6 features into 122,811 samples and inside of each sample are 25 timesteps and 6 features.While the corresponding output training data was 112,811 samples and inside of each sample are 5 timesteps and 6 features.The testing input data was converted from 75 timesteps and 6 features to 15 samples, 25 timesteps and 6 features.While the corresponding output testing data was converted to 15 samples, 5 timesteps and 6 features.The LSTM model architectures Regression problems are amongst the challenging tasks to ANN [57].Due to the parameters' initialisation (i.e.neuron weights), the bias-variance trade-off and the function approximation that may trap at a local minima [57].As such, it is necessary to experiment with different network architectures to find the optimum solution.After determining the possibility of future trajectories in our last paper [48], this work was to determine which LSTM architecture performs the best.There was no report that was found looking into the optimum LSTM neural networks model for gait forecasting prediction.There were 4 LSTM neural network variants that have been tested in this paper.This include; (1) Vanilla LSTM neural networks, (2) Stacked LSTM neural networks, (3) Bidirectional LSTM neural networks and (4) LSTM autoencoders neural networks.A detailed description for the LSTM model as well as each architecture is provided in the S1 Appendix.
The sliding windows and the LSTM models were developed using Python 3 (Libraries: Keras, Numpy, Pandas and Scikit learn) and executed in Amazon Web Services (AWS) Elastic Computing (EC2) [58,59].The networks were optimised using Stochastic Gradient Descent (SGD) optimisation algorithm [60][61][62].Proposed by Rumelhart et al,. 1986 [63], the SGD algorithm aims to obtain the minimum error (MAE in this work) at each batch using the network weights and biases.Using a sparse grid-search, for all models the SGD's learning rate was tuned to 0.07, the gradient norm was clipped to 1.0 and the momentum (for accelerating the gradient descent) was set to 0.9 [64].
The models' configuration (Table 2) was determined based on a sparse grid search for obtaining the least MAE on an unseen participant.All participants were tested using the leaveone-out cross validation technique.When a participant is tested all of their associated trials were removed from the training set (5k/h and PWS).In order to fascilitate model comparison, the models were compared based on the same tested participant.Due to the uniqueness of individual walking patters and the different PWS across participants [67,68], the testing set for all participants was kept fixed at 75 time-steps (starting from the foot strike).
To evaluate each model's performance, 5 parameters were considered to calculate how closely the predicted variable trajectories ŷj (X 1 , X 2 , X 3 , Y 4 , Y 5 , Y 6 ) were to the actual variable trajectories y j (X 1 , X 2 , X 3 , Y 4 , Y 5 , Y 6 ) across the n timesteps: 1. Mean Absolute Error (MAE) given as: 2. Mean Squared Error (MSE) given as: 3. Correlation coefficient (CC) given as: Where, std() is the standard deviation and covðy; ŷÞ is the covariance between variables y and ŷ.
4. Root Mean Square Error (RMSE) given as: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 1 n 5. Normalised RMSE (NRMSE) (27,40) given as: Where max (y) and max (n) are the maximum and minimum values of the trial's ground truth.

Results
All models were firstly evaluated at PWS and on the same participant across 75 timesteps.Trials related to the tested participant were removed from the training set.Models performance (combined output windows) for each of the independent variables that belong to the tested participant are shown in Fig 4 for all LSTM models.A single output window (prediction horizon) is a 5 time-steps window (0.1s).All models have shown good tracking of the actual trajectories for the shank and thigh AV (Fig 4c and 4e).Poorer predictions were attained for all segment predictions based on LA.Models were then evaluated at PWS and 5km.h -1 combined based on leave-one-out cross validation technique for each participant (see Tables 3 and 4).All models achieved good predicted trajectories for AV related to the shank (X 2 ) and thigh (X 3 ) (low RMSE in Fig 5c and 5e) and good vector patters for all feature vectors (High CC in Fig 5).Predicted trajectories based on the AV (MAE 0.176-0.318deg/s) were generally less erroneous than the predicted trajectories based on the LA (MAE 0.184-0.379m/s 2 ) across all the LSTM models (see Table 3).The ED LSTM is found the best predictive model for future predictions of the lower limb kinematic trajectories at PWS and 5km.h -1 .The wider the gap between the two points (CC and RMSE) in  5) and a more accurate predicted LA and AV trajectories (lower RMSE in Table 3 and Fig 5).The Vanilla LSTM obtained the poorest prediction results (7.06-11.14%)compared to the Stacked LSTM (5.74-9.88%), the Bi-LSTM (4.71-7.83%)and the ED-LSTM (2.82-7.91%)(Table 4 and Fig 6).

Discussion
The aim of this study was to investigate the capability of 4 LSTM architectures in predicting future lower limb trajectories of sagittal plane kinematic variables at an imposed speed (5k/h) and PWS.The predicted kinematic variables are the foot AV (X 1 ), shank AV (X 2 ), thigh AV (X 3 ), foot LA (Y 4 ), shank LA (Y 5 ) and thigh LA (Y 6 ).Prediction was performed using: (i) Vanilla LSTM, (ii) Stacked LSTM, (iii) Bi-LSTM and (iv) ED-LSTM.Results suggested that stacked and ED LSTM models are more reliable in predicting the future trajectory of the lower limb kinematics up to 0.1s (5 time-steps) (Fig 6).The ED-LSTM achieved the most accurate predicted kinematic trajectories among the other LSTM architectures (see Table 3).The prediction of future gait trajectory has the potential to expand the horizon of solving several problems in human movement science.In general, the LSTM models are designed to predict the future trajectories of any timeseries variable related to the human motion (e.g.knee angle).A known future gait trajectory adds a feedforward term to powered exoskeleton devices instead of predominantly relying on feedback sensors [1,27,29,69].This would improve device  performance by narrowing down the nonlinear kinematic differences between the user and the device and therefore avoid altering the user's natural gait trajectories [70,71].Prediction of future gait trajectory could substantially improve the design of prosthetics by adapting the device controlling parameters according to the user's movement [72].Additionally, a known 0.1s future gait trajectory falls in the range of slow and fast twitch motor units [73] and may allow a response time (10-120ms) for an elderly patient to adjust their gait and avoid an imminent risk of tripping or balance loss [73][74][75][76][77]. Four metrics were implemented to evaluate the LSTM prediction quality (Tables 3-5).Results have shown that LSTM predictions based on LA were worse than predictions based on AV in all models, possibly due to the double derivative generating a noisier signal (Fig 4).The foot AV (X 1 ) predictions showed greater error (MAE 0.276-0.318deg/s) throughout models compared to the shank (MAE 0.176-0.221deg/s) and thigh (MAE 0.184-0.225deg/s) AV, likely due to the grater variation of the foot trajectory throughout the gait cycle [40].The NRMSE was used to facilitate the comparison between the LSTM models performance and to simplify the understanding of error rates for cross-disciplinary research.The Vanilla LSTM attained generalisation earlier (100 epochs) compared to all other models due to its simplicity and the fewer required parameters to train [57].Albeit, it obtained higher error rates compared to the Stacked LSTM and the ED-LSTM (Fig 6 and Table 4).The Bi-LSTM have shown a competent prediction quality in this research (Table 4).
All models were good at predicting the signal patterns but were erroneous at obtaining the actual trajectories (see Fig 5).This indicates that prediction at PWS further challenges the prediction quality and it can be improved by training and testing the model on the same participant.The walking speed was found to be the most influential variable amongst sex, age and body max index on the ambulatory kinematic and kinetic profiles [78].The changes of speed are known to have substantial impacts on the spatiotemporal as well as the kinematic and kinetic patterns of the gait cycle among different age groups [79,80].Normal (comfortable or preferred) walking speed reported in the literature had averages ranging from 1.05m.s - to 1.43m.s - (cadence of 101 to 122 steps/min,) [81,82].The imposed speed 5km.h -1 was found to be the general average PWS in previous studies [83][84][85][86] and it was adopted in this work to generalise the LSTM models to populations outside the recruited participants cohort.Prediction at the imposed walking speed of 5km.h -1 was found to be good in our previous work using the ED-LSTM [48] and in the literature using the PPCA [27].The prediction at PWS however, allows the development of ML models that are better suited to individuals who might have different PWS which in return naturalise the human-machine (i.e.bionics) interface.
In our previous paper, we found that prediction of future kinematics trajectory (up to 0.06 s) was possible at an imposed speed (5km.h - ) using ED-LSTM.In this work we have expanded the prediction horizon up to 0.1s and investigated the other LSTM architectures to predict the kinematics trajectory at PWS and imposed speed (5km.h - ).The input (25 timesteps-0.5s)and output (5 timesteps-0.1s)sliding window sizes were designed as per the work by Zaroug  et al. [48].The combination of PWS and imposed speed 5km.h -1 timeseries data, widens the variability in the training data and further challenges the ML model to maintain a good prediction quality across different walking speeds.A ML model trained on PWS allows a generalised predictive model that could be fine tuned across different participants.The number of participants was increased for both genders (Male and female) to our previous work [48].The foot segment was also added as the foot is the most distal segment of the human locomotor multisegment chain and plays an important part in maintaining balance, support and locomotion.Foot's movement directly affects the lower extremities (i.e.ankle) dynamics and control [87] as well as the body's Centre of Mass (COM) movement [88,89].
In contrast to the predicted trajectories evaluation technique by Tanghe et al. [27] (i.e. 3 selected time-steps), the performance evaluations of this work were calculated relative to the mean based on each of the predicted 75 time-steps (i.e.1.5s).The ED-LSTM in this paper attained lower NRMSE range (2.83-5.31%)than the CNN implemented by Gholami et al. (6.5-11.1%)[40].These results suggest that ED LSTM neural network is a more suitable model to capture features related to sequential time-series lower limb kinematic data [42,44].The ED LSTM achieved the most accurate predictions in this paper (see Table 3).As demonstrated in the ED-LSTM architecture (see S1 Appendix) and in our previous paper [48], the internal learning process for the ED LSTM is unsupervised [90].The ED LSTM obtains predictions based on a two learning phases.The encoder maps the input data (i.e. the input window) into a hidden layer and learns a compressed feature representation of the independent variables.While the decoder reconstructs the input data from the hidden layer to obtain the target dependent variables from the compressed feature representation.The optimiser (i.e.SGD) then minimises the reconstruction error which is the difference between the input and the reconstructed output [5].This type of learning approach allowed the ED LSTM to obtain quality features from a given input of kinematic trajectories [91].The unsupervised feature learning paradigm (i.e.encoder) and the reconstruction of time series information (i.e.decoder) has made the ED LSTM a good architecture for high level deep features extraction and for generative models [90,92,93].The complexity and the power of the ED LSTM could be extended to other learning techniques such as the unsupervised greedy layer-wise pre-training referred to as pretraining [57].The pretraining involves the training of a shallow layer and sequentially adding up and refitting a new hidden layer to learn inputs from the existing previous layer (i.e.shallow layer) while keeping fixed the learned weights and biases of the previous layer [94].The pretraining structure opened up the opportunity to train very deep stacked ED LSTM with less possibility of overfitting (because the training is performed in layer-wise) and a lower generalisation error [95].
This work was limited to 15 young healthy participants.There exists a proportional relationship between the human age and their walking speeds.The walking speed was found to be slightly decreasing each year among healthy male and female populations [96].Future work is needed to test the models' reliability on predicting slower speeds than that tested and accommodate predictions related to elderly population who may walk slower.Patients with balance issues or fall history should be recruited to further understand the potential application of this work for falls prevention [74,75,97,98].Finally, a more complex model such as the hybrid models (i.e.ConvLSTM) or a different learning technique such as the greedy-layer wise pre-training [95] may help expand the prediction horizon while maintaining the prediction quality.

Conclusions
In this work we developed and compared 4 LSTM architectures for the prediction of future lower limb kinematics (i.e.foot AV, shank AV, thigh AV, foot LA, shank LA and thigh LA).
Results suggested that future lower limb kinematics while walking at PWS and at 5km.h -1 can be well predicted up to 0.1s with ED-LSTM and Stacked LSTM.These findings highlight the potential of LSTM neural networks to predict the future trajectories of the human movements.This could have application in exoskeleton control systems or falls prevention.Further work is needed to understand the model's robustness under different walking conditions and in participants with a pathological gait.Future directions would incorporate other LSTM baseline varients such as the Gated Recurrent Unit (GRU) and the LSTM with attention or selfattention.

Table 5 . Leave-one-out cross validation test evaluation based on the CC at the PWS and 5km.h -1 combined.
Each of the predicted variables (i.e.X 1 , X 1 , . .., Y 6 ) was evaluated for the 4 trained LSTM architectures.