Abstract
Innovative technologies for developing intelligent systems that learn to predict locomotion are crucial in today's world. Human locomotion involves various complex concepts that must be addressed to enable accurate prediction through learning mechanisms. Our proposed system focuses on locomotion learning through RGB vision devices, ambient sensor signals, and physiological motion signals from biosensing devices. First, data is acquired from five different scenario-based datasets. Then, we pre-process the data to mitigate noise in the biosensor signals and extract body landmarks and key points from the computer vision signals. The data is then segmented using a data windowing technique. Various features are extracted through multiple combinations of feature extraction methodologies, followed by feature reduction using optimization techniques. In contrast to existing systems, we employ both machine learning and deep learning classifiers for locomotion prediction, utilizing a modified body-specific sensor-based Hidden Markov Model and a deep Exponential Residual Neural Network, respectively. A system ontology is also presented to elucidate the relationships among the data, concepts, and objects within the system. Experimental results indicate that our proposed biosensor-based system exhibits significant potential for effective locomotion prediction learning.
Citation: Javeed M, Jalal A, AlHammadi DA, Lee B (2026) Deep locomotion prediction learning over biosensors, ambient sensors, and computer vision. PLoS One 21(2): e0342793. https://doi.org/10.1371/journal.pone.0342793
Editor: Andrea Tigrini, Polytechnic University of Marche: Universita Politecnica delle Marche, ITALY
Received: October 11, 2024; Accepted: January 28, 2026; Published: February 23, 2026
Copyright: © 2026 Javeed et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study are available from https://datadryad.org/stash/dataset/doi:10.5061/dryad.v6wwpzgsj https://github.com/wilfer9008/Annotation_Tool_LARa/blob/master/README.md https://www.cs.cmu.edu/~espriggs/cmu-mmac/annotations/ https://archive.ics.uci.edu/dataset/226/opportunity+activity+recognition https://paperswithcode.com/dataset/berkeley-mhad.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00217471). This work was supported through Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R508), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Human locomotion learning is an important aspect of artificial intelligence (AI)-based systems for human motion applications [1]. Intelligent sensors-based system processing has given a boost to the locomotion prediction learning field [2–4]. It is beneficial to utilize advanced sensory devices, analyze those signals, and enable human motion pattern recognition in healthcare systems, smart homes, lifelog routine management, and smart surveillance systems [5–8]. Biosensors such as inertial measurement units (IMU) and electromyography (EMG) sensors can acquire physiological data important for human body dynamics exploration [9]. Algorithms, including machine learning and deep learning, can process this data to predict different motion patterns, including gait, postures, and movement [10].
Ontology agents are the AI-relevant agents used to enhance the ability of a system to process and interpret information. An ontology can support representing the knowledge related to the locomotion prediction system domain. It contains the data interrelationships, concepts, and characteristics to provide a structured framework for agents to share and integrate information, which will help make more informed decisions [11,12]. Since our proposed system consists of multiple sensors-based data, ontology will facilitate understanding and incorporating different sensors to enhance the AI reasoning of the agents, as well as learning and communication abilities [13]. Ontological agents can help adapt new locomotion activities and update knowledge to make our proposed system adaptive. They can also enhance the system’s ability to predict and respond to changes in motion patterns [14].
Several systems have been proposed in this research area to predict human locomotion using sensor data. While some studies rely on single sensors [15–17], others integrate multiple sensors into multi-modal systems [18–22]. However, these approaches face challenges such as signal drift [15], data fusion problems [18], background noise in biosensor data [21], missing pre-processing steps [20], sensor calibration limitations [22], inability to distinguish different actions [22], absence of feature extraction and selection [23], limited data [16,24], irrelevant descriptors [20,23], and restricted movement recognition [17], all of which degrade locomotion prediction learning performance [23–25].
To address these limitations, we propose an intelligent system that integrates biosensors, ambient sensors, and computer vision for efficient and effective human locomotion prediction learning, using multiple sensory devices instead of a single sensor. First, pre-processing is performed for noise reduction, and a novel kinematic and static pattern recognition approach is applied to the biosensor signals, along with body-point extraction from videos, as explained in section 3.2. This component mitigates the signal drift, sensor calibration, and background noise issues present in traditional systems. Next, a data segmentation method, detailed in section 3.3, improves system performance by reducing the overall data size and dividing the monitored stream into segments for efficient processing. This also reduces the data fusion problem, which is partly caused by processing large volumes of data. In addition, data fusion is performed at the feature level to resolve the multi-sensory data integration challenges faced by previously proposed systems. Then, relevant descriptors are extracted from each sensor type to capture the distinguishing patterns of each human action, as explained in section 3.4, followed by feature reduction to handle the dimensionality issues of conventional approaches, given in section 3.5. Furthermore, our proposed system recognizes a variety of human actions captured in multiple scenario-based datasets, mitigating the limited-data challenge present in the literature. Different types of actions can be performed in different scenarios of a lifelog routine; conventional systems have focused on limited scenarios, which restricts their practical implementation, whereas our proposed system covers a wide range of movements, as explained in section 4.1.
By experimenting on data from different setups, the system learns multiple scenarios and supports human locomotion prediction learning, yielding acceptable results and enhanced performance.
This paper is organized as follows: Section 2 gives a detailed overview of our proposed system and its implementation details. Section 3 shows the experimental results and their outcomes for each sensor type and the complete system. Section 4 discusses the limitations and challenges present in the proposed method. Section 5 concludes the paper with future directions.
Methods
This section provides a comprehensive framework detail for our proposed locomotion prediction system. It provides a detailed system overview, from data acquisition to locomotion prediction both via machine learning and deep learning algorithms. Fig 1 shows the overall architecture of our proposed system.
As Fig 1 illustrates, data segmentation is applied after pre-processing, followed by feature extraction for each type of sensor data. After fusing the extracted features, the descriptors are optimized, and finally BSM-HMM and DERNN are used for locomotion classification.
Data acquisition
In the proposed method, we acquired data from five different datasets using all three types of sensors, including biosensors, ambient sensors, and vision sensors. Data was collected from the Opportunity++ [26], CMU-MMAC [27], Berkeley-MHAD [28], HWU-USP [29], and LARa [30] datasets. These five datasets were selected to gather diverse human locomotion data covering both the complex activities and the simple actions performed by humans in daily lifelog routines.
Pre-processing
It is an important step in our proposed system for locomotion prediction over multi-sensory devices and ontology agents. The noise present in the signals can cause degraded performance through incorrect pattern recognition. As a pre-processing step, a Wavelet Transform Quaternion-based filter [31] is utilized for biosensor filtration. First, the three signal readings from biosensors, including acceleration, gyroscope, and magnetometer, are retrieved from the IMU. To remove noise, we use the calibration phase to remove gravitational error from acceleration, drift error from the gyroscope, and magnetic error from magnetometer signals. Further, we use Quaternions and a gradient descent technique to normalize the data into vectors in a mapping and optimization phase. After filtration, the kinematic and static patterns [32] are detected from IMU signals to recognize the abrupt changes in signals caused by complex motion signals. The phase angle [33] is used to detect the learning phase of signals using (1) as:
where the four terms denote the angle of the w-th acceleration signal, the w-th gyroscope signal angle, the angle of the w-th magnetometer signal, and the selected w-th signal, respectively. Fig 2 shows the phase angles extracted from each acceleration, gyroscope, and magnetometer signal over a red threshold line separating the kinematic and static signals. The yellow stars above the threshold show kinematic pattern detection, and the ones below the threshold represent static patterns.
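Since the exact phase-angle formula of (1) is not reproduced above, the following Python sketch uses a simple derivative-based phase-angle approximation and a hypothetical threshold to separate kinematic windows from static ones; the window length, threshold, and test signals are all illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def phase_angle(signal):
    """Phase angle of a 1-D sensor signal, approximated via the
    signal and its discrete derivative (a stand-in for Eq. 1)."""
    deriv = np.gradient(signal)          # quadrature-like component
    return np.degrees(np.arctan2(deriv, signal))

def classify_windows(signal, win=50, threshold=15.0):
    """Label each fixed-length window 'kinematic' if its mean absolute
    phase angle exceeds the (illustrative) threshold, else 'static'."""
    labels = []
    for start in range(0, len(signal) - win + 1, win):
        ang = phase_angle(signal[start:start + win])
        labels.append("kinematic" if np.mean(np.abs(ang)) > threshold
                      else "static")
    return labels

# a still segment (gravity only) followed by vigorous motion
t = np.linspace(0, 1, 100)
still = np.ones(50) * 9.8
moving = 9.8 + 5.0 * np.sin(2 * np.pi * 20 * t[:50])
print(classify_windows(np.concatenate([still, moving])))
```

A real deployment would replace the synthetic signals with calibrated IMU streams and tune the threshold per sensor placement.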
The Butterworth filter [34] is used to pre-process the ambient sensor signals to reduce the surrounding noise. The filtration is performed using (2) as:
where T denotes the transfer function in the transform domain for the 15th-order Butterworth filter, which reduces noise more effectively than other comparable filters. Fig 3 shows the actual acceleration and filtered signals over the Opportunity++ dataset. As illustrated in Fig 3, the filter effectively reduces noise in accelerometer signals from inertial sensors employed for ambient sensing.
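As an illustration of this step, the sketch below applies a 15th-order low-pass Butterworth filter with SciPy; the sampling rate and cutoff frequency are assumptions, since the paper does not state them, and second-order sections are used for numerical stability at this high order.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def denoise_ambient(signal, fs, cutoff=3.0, order=15):
    """15th-order low-pass Butterworth filter (zero-phase, sos form).
    The 3 Hz cutoff is an illustrative choice for ambient motion data."""
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

fs = 30.0
t = np.arange(0, 4, 1 / fs)                          # 4 s at 30 Hz
clean = np.sin(2 * np.pi * 1.0 * t)                  # 1 Hz motion component
noisy = clean + 0.4 * np.sin(2 * np.pi * 12.0 * t)   # 12 Hz surrounding noise
filtered = denoise_ambient(noisy, fs=fs)
print(np.max(np.abs(filtered - clean)) < np.max(np.abs(noisy - clean)))
```

Zero-phase filtering (`sosfiltfilt`) avoids shifting event timestamps, which matters when ambient events are later aligned with video and biosensor streams.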
To pre-process the computer vision-based video sequences from RGB videos, a delta of 45 images is selected to avoid processing costs and delays in the performance of the locomotion prediction system. A background image is selected for each type of video and subtracted from all the video sequences to detect human figures. Next, a landmark detection method [35] is applied to calculate the human position in a frame p as (3):
where the boundary of frame p and the human silhouette are used. To extract the human torso landmark, (4) is computed, in which the human silhouette height and width are added to obtain the human shape pixels; the midpoint of this region gives the torso mid-point. Now, by utilizing the body shape and size of the human silhouette, the head-point and feet landmarks are detected as (5), computed over the frame sequence for each dataset. After the detection of head and feet landmarks, their midpoints are used to represent the head and feet body-points. Then, the neck, elbows, and knees are detected using (6), which yields the landmarks for the neck and elbows. After dividing by half, the midpoint of the torso and head provides the neck point, and the midpoints of elbow landmarks give the wrist or hand body-points. Elbow points are tricky and need to be extracted by mining the landmark size and considering one of the most right, most left, lowest, or highest midpoints from the elbow landmarks. The knee points are determined by finding the midpoints between the torso and feet. This process aids in constructing a 2D stick model [36] by connecting the extracted midpoints from the human silhouette. Fig 4 shows a 2D stick model after extracting landmarks and body-points. The red dots in Fig 4 give the eleven body-points extracted from each landmark. The red dots are further connected using the green and orange lines, where the green lines indicate the upper body 2D stick model and the orange lines show the lower body 2D stick model.
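The midpoint construction used for the neck and knee body-points can be sketched in a few lines; the image coordinates below are hypothetical, chosen only to show the geometry.

```python
def midpoint(p, q):
    """Midpoint of two 2-D body-points given as (x, y) tuples."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

# hypothetical body-points in image coordinates (x, y)
head, torso = (100, 40), (100, 120)
left_foot, right_foot = (80, 220), (120, 220)

neck = midpoint(head, torso)             # midpoint of head and torso
left_knee = midpoint(torso, left_foot)   # midpoints between torso and feet
right_knee = midpoint(torso, right_foot)
print(neck, left_knee, right_knee)
# → (100.0, 80.0) (90.0, 170.0) (110.0, 170.0)
```

Connecting such midpoints (head-neck-torso for the upper body, torso-knees-feet for the lower body) yields the 2D stick model of Fig 4.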
Data segmentation
Comprehensive data segmentation [36] is applied to the pre-processed data obtained from all three types of sensors. Fig 5 shows the data segmentation process performed on a data chunk by incorporating the time, events, and sequences. The red dashed lines indicate the segment separation for each ∆ time, δ event, and ō sequence. Specifically, locomotion n presents the biosensor signals, event n corresponds to the ambient sensor signal, and video n denotes the video sequence. Experiments were conducted using window sizes of 2, 3, and 5 seconds over the combined dataset. Based on empirical analysis, we found that 3-second overlapping windows yielded the most efficient and effective results.
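The 3-second overlapping windowing described above can be sketched as follows; the 50% overlap and the 30 Hz sampling rate are illustrative assumptions, since the paper reports only the 3-second window length.

```python
import numpy as np

def segment(signal, fs, win_s=3.0, overlap=0.5):
    """Split a stream into overlapping fixed-length windows.
    win_s * fs samples per window; step controlled by the overlap."""
    win = int(win_s * fs)
    step = max(1, int(win * (1 - overlap)))
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

stream = np.arange(10 * 30)          # 10 s of data sampled at 30 Hz
windows = segment(stream, fs=30)
print(len(windows), len(windows[0]))
```

Each window would then feed the descriptor extraction stage independently, keeping per-segment processing cost bounded.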
Descriptors extraction
To utilize their characteristics, we propose two novel descriptor extraction methods, one for kinematic and one for static patterns. For kinematic patterns, a spatial-temporal graph is extracted from multi-synchrosqueezing transform (MSST) [37] signals using (7), where the time-frequency spread is computed for the i-th iteration. Then, a short-time periodogram can be calculated using (8), obtained over the window length for each time and frequency. Next, six frequency-based nodes are used to construct a spatial-temporal graph. To obtain the graph, the Laplacian matrix (LM) is decomposed into its eigenvectors and eigenvalues as (9). Fig 6 compares our proposed technique with the previous study [38]. The earlier method used short-time Fourier transform (STFT) signals, while we propose using MSST signals, which are more effective for analyzing impulsive-like signals [39] and better suited for handling the complexity of kinematic energy signals, as shown in Fig 6.
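The Laplacian eigendecomposition behind (9) can be illustrated with NumPy; the adjacency pattern over the six frequency-band nodes is hypothetical, since the paper does not list the actual edges.

```python
import numpy as np

# hypothetical adjacency over six frequency-band nodes
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 1, 0, 0],
              [1, 1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1, 1],
              [0, 0, 1, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))     # degree matrix
L = D - A                      # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)   # spectrum used as graph descriptors
print(np.round(eigvals, 3))
```

For a connected graph the smallest Laplacian eigenvalue is zero, and the remaining spectrum summarizes the graph structure compactly, which is what makes it usable as a descriptor.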
As a next step, a linear prediction cepstral coefficient (LPCC) based spatial-temporal graph is extracted to obtain descriptors from static biosensor signals. Five frequencies are used to transform a short-time periodogram into a spatial-temporal graph. The cepstrum can be extracted using (10), where the LPCC is computed from the linear prediction coefficients, the number of coefficients relevant to the LPCC, and the number of iterations. Fig 7 illustrates our proposed spatial-temporal graph for static biosensor signals via LPCC.
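The standard LPC-to-cepstrum recursion commonly used to obtain LPCC can be sketched as follows; the coefficients below are illustrative, not learned from biosensor data, and the exact form of (10) is not reproduced here.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Convert linear prediction coefficients a[0..p-1] to cepstral
    coefficients via the recursion
    c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

lpcc = lpc_to_cepstrum(np.array([0.5, -0.3, 0.1]), n_ceps=5)
print(np.round(lpcc, 4))
```

Note that the recursion lets the number of cepstral coefficients exceed the LPC order, which is often done in practice.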
To extract descriptors from the pre-processed ambient sensor data, we propose an N-sensors-based graph F using a descriptors matrix d and an adjacency matrix m [40] as (11), where the descriptors matrix is built from the sensor type, the number of neighbors, and the sensor orientation over the iterations. Fig 8 shows the proposed ambient sensor descriptor extraction method in detail. For N sensors, we have the fully connected graph shown in Fig 8. For each sensor, the descriptors are based on sensor type, sensor orientation, number of neighbors, and adjacent nodes, as calculated in (12).
For the pre-processed video sequences, we utilize the eleven body-points and the 2D stick model, including head, neck, left wrist, right wrist, left knee, right knee, left elbow, right elbow, torso, left ankle, and right ankle. A Hamiltonian circuit is used to traverse each node of the graph exactly once without edge repetition, returning to the starting node. We have divided the 2D stick model into two Hamiltonian graphs [41]: the upper body and the lower body. The Pearson correlation p(i,j) [42] can be calculated for each corresponding node pair using (13) as:
where the means of the x-th and y-th node samples are computed over the corresponding node trajectories. Afterwards, the descriptors are formed into a matrix using nodes and edges, as shown in Fig 9. The red dots in both Fig 9(a) and 9(b) display the eleven body-points extracted. The green lines in Fig 9(a) represent the upper body along with the captioned black Hamiltonian path for the upper body Hamiltonian circuit generated. Fig 9(b) shows the orange lines and the captioned black Hamiltonian path for the lower body Hamiltonian circuit produced.
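The Pearson correlation of (13) can be written out term by term as below; the two knee trajectories are hypothetical five-frame samples used only for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation p(i, j) between two body-point node
    trajectories, computed from centred sums as in Eq. 13."""
    xm, ym = x.mean(), y.mean()
    num = np.sum((x - xm) * (y - ym))
    den = np.sqrt(np.sum((x - xm) ** 2) * np.sum((y - ym) ** 2))
    return num / den

# hypothetical vertical trajectories of two knee body-points over 5 frames
left_knee = np.array([170., 172., 175., 173., 171.])
right_knee = np.array([171., 173., 176., 174., 172.])
print(round(pearson(left_knee, right_knee), 4))  # → 1.0
```

Since the second trajectory is a constant offset of the first, the correlation is exactly one; in practice the values populate the node-edge descriptor matrix of Fig 9.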
To get the full body-based descriptors, the disparity between each pair of consecutive frame sequences is calculated using (14), where the coordinates and the sizes of the two sequences are used with the sum of squared differences over a landmark at each coordinate. Next, a landmarks-based disparity map is calculated using (15) over the image coordinates. Matching pixels between the two frame sequences are extracted using the sum of absolute values as (16). Finally, to calculate the landmarks-based disparity map [43], an 8 × 8 grid of 4 × 4 pixels each is mined around the center point as (17), where the center point lies within each landmark's pixel grid. A descriptor matrix can then be obtained from the full-body image sequences as in Fig 10. The extracted landmarks are shown in Fig 10(a), the landmark-based disparity map is calculated using (15) and displayed in Fig 10(b), and the grid is computed using (17) and given in Fig 10(c), where the red dot in each grid denotes the center point.
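A block-matching sketch of the sum-of-absolute-values step in (16) is shown below; the block size, search range, and synthetic images are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def sad_disparity(left, right, row, col, block=4, max_disp=8):
    """Horizontal disparity of a block around (row, col), found by
    minimising the sum of absolute differences (SAD) over candidate
    shifts, as in classic block matching."""
    h = block // 2
    ref = left[row - h:row + h, col - h:col + h]
    best, best_d = np.inf, 0
    for d in range(max_disp + 1):
        if col - h - d < 0:
            break
        cand = right[row - h:row + h, col - h - d:col + h - d]
        score = np.sum(np.abs(ref.astype(int) - cand.astype(int)))
        if score < best:
            best, best_d = score, d
    return best_d

rng = np.random.default_rng(0)
right_img = rng.integers(0, 255, (32, 32))
left_img = np.roll(right_img, 3, axis=1)   # left view shifted by 3 px
print(sad_disparity(left_img, right_img, row=16, col=16))
```

Repeating this per landmark block and stacking the results is one way to assemble a disparity map of the kind shown in Fig 10(b).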
Descriptors selection
After the multi-sensor descriptor extraction, data fusion has been applied over the time series. Fusion is performed at the feature level, so descriptors from all three types of sensors are fused together over time. Furthermore, a modified multi-layer sequential forward selection (MLSFS) [44] method is used to reduce the dimensions of the extracted descriptors. Sequential forward selection is modified in this algorithm to achieve the reduced vector R using (18), where a subset of descriptors of size d is selected from the original descriptor set, D represents the dataset containing the input values, M is the classification model used to evaluate the descriptor subset, and an evaluation function (e.g., classification accuracy or another performance metric) scores each subset given the dataset D and model M. This equation repeats the selection of descriptors until all the correlations are compared and the final descriptor vector is selected. We have experimented with different optimization and selection methodologies and found MLSFS to outperform other techniques, including linear discriminant analysis, Fisher linear discriminant analysis, and sequential forward selection.
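For reference, plain (single-layer) sequential forward selection, the greedy loop that MLSFS builds upon, can be sketched as below; the least-squares score function and the toy data are illustrative stand-ins for the model M and dataset D of (18), and the multi-layer modifications are not reproduced.

```python
import numpy as np

def sequential_forward_selection(X, y, score_fn, k):
    """Greedy SFS: repeatedly add the descriptor whose inclusion
    maximises the evaluation function, until k are selected."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in remaining:
            s = score_fn(X[:, selected + [j]], y)
            if s > best_score:
                best_j, best_score = j, s
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

def score_fn(Xs, y):
    """Toy evaluation function: negative squared error of a
    least-squares fit on the chosen columns."""
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -np.sum((Xs @ coef - y) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 3] - 1.0 * X[:, 0] + 0.01 * rng.normal(size=60)
print(sequential_forward_selection(X, y, score_fn, k=2))
```

On this toy problem the greedy loop recovers the two informative columns, picking the stronger one first.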
Sensors-based ontology
Due to the large vector size causing heterogeneity even after descriptor reduction, it becomes difficult to manage data from multi-sensory devices. Hence, domain knowledge in the form of a sensors-based ontology [45] is presented in this proposed system. This sensors-based ontology supports better locomotion prediction by explaining the sensors used, their interpretations, processes, and characteristics. We have divided the sensor domain into biosensors, ambient, and vision. The interactions with events, time, situation, and network are also presented in the form of strategies. The semantic similarity between two concepts is extracted as (19), where the depth of the sememe concept, the adjustable parameters, and the length of the path from one concept to the other are used. We define structural similarity calculation rules as:
- Two concepts are alike if their parent nodes are alike in the constructed ontology tree.
- Two or more concepts are alike if their children's nodes are alike.
- If two concept nodes are alike, then their sibling nodes are also alike.
After defining these rules, we calculate the structural similarity between two concepts, where the compared terms represent the concepts of the different ontologies and each provides the collection of nodes related to its concept [46]. Fig 11 shows the sensors-based ontology proposed for the locomotion prediction system. It contains seven ontological modules or patterns: event, time interval, situation, biosensors, ambient sensors, vision sensors, and network. Each ontology module is related to a few other concepts using the ontology property. Each ontology pattern also consists of a set of ontology classes.
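The depth- and path-based semantic similarity of (19) can be illustrated with a minimal sketch; since the exact functional form is not reproduced above, the formula below is a common depth/path variant with a single adjustable parameter alpha, and the depth and path values are hypothetical.

```python
def semantic_similarity(depth, path_len, alpha=0.9):
    """Depth/path concept similarity: deeper shared (sememe) concepts
    and shorter ontology paths give higher scores; alpha is the
    adjustable parameter. A hedged stand-in for Eq. 19."""
    return (alpha * depth) / (alpha * depth + path_len)

# e.g., two sibling sensor concepts under a deep 'biosensors' node
# versus concepts in distant 'vision' and 'ambient' branches
print(round(semantic_similarity(depth=4, path_len=2), 3))
print(round(semantic_similarity(depth=1, path_len=6), 3))
```

The score is bounded in (0, 1) and increases with depth and decreases with path length, matching the intuition behind sememe-based similarity measures.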
Locomotion prediction
Machine learning and deep learning each have unique characteristics, and both can be applied to a wide range of applications. However, when it comes to matters involving human life, it is crucial for our system to achieve the best possible results. To this end, we propose a custom machine learning algorithm, a Body-specific Sensor-based Modified Hidden Markov Model (BSM-HMM), and a deep learning model, the Deep Exponential Residual Neural Network (DERNN).
BSM-HMM is inspired by a statistical model [47] consisting of finite states over time, a set of vertices, and a transition probability matrix, forming a hidden Markov model (HMM) as defined in (21) and (22). The probability of visiting a state sequence with its events, possible events, and parameters can be extracted using (23).
An HMM was trained for each kinematic and static patterned signal in every dataset. Fig 12 shows the BSM-HMM flow diagram for different body-specific sensors. We separated the sensor-specific HMMs into five individual HMMs covering the head, mid-body, lower-body, ambient, and vision-based sensors. The active head-specific HMM consists of all the biosensors actively working at the head or neck positions. Next, the active mid-body biosensor-specific HMM represents biosensors attached to the shoulders and waist of the human body. The active lower-body-specific HMM includes all the biosensors attached from the thighs to the feet. Furthermore, the active ambient sensor-specific HMM covers all the sensors attached to the surroundings of the human, including accelerometers, RFIDs, and PIR sensors. Finally, the active vision-based sensor-specific HMM classifies the data extracted from the RGB cameras.
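The sequence probability behind (23) is conventionally evaluated with the forward algorithm; the sketch below shows this for a two-state model, where all matrices are illustrative and are not the trained BSM-HMM parameters.

```python
import numpy as np

def forward_prob(pi, A, B, obs):
    """Likelihood of an observation sequence under an HMM
    (initial distribution pi, transition matrix A, emission matrix B)
    via the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # initialise with first emission
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate and emit
    return alpha.sum()

# two hidden states (e.g., 'static', 'kinematic'), two observation symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 1, 1]
print(round(forward_prob(pi, A, B, obs), 4))
```

In a body-specific setup, one such model per sensor group (head, mid-body, lower-body, ambient, vision) would score each segmented window, and the most likely locomotion class would be selected.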
A deep learning-based model named DERNN is proposed in this study, derived from the regression convolutional neural network (RCNN). In [48], a system is proposed for the prediction of porosity in computed tomography (CT) scans using RCNN. We modified the RCNN to cater to multi-sensory data involving multiple sensors. The proposed DERNN can be defined using n descriptors for s sensors through derivatives of exponentials as (23), where p is a tunable parameter specific to the s-th sensor, s represents the total number of sensors, and n denotes the number of descriptors extracted from each sensor. For each locomotion action performed, different slope values are computed per sensor. Subsequently, each sensor-based slope is divided into 100 segments (or slices) and fed into a residual neural network (ResNet) based on the ResNet-50 framework. A matrix is extracted for each slice and processed by ResNet-50 in five stages. The first stage comprises multiple layers, including convolutional, normalization, ReLU, and max pooling layers. The 2nd to 5th stages are repetitions of convolutional layers feeding into neurons, each followed by a pooling layer to reduce the descriptors. Further, a flattening layer changes the matrix from a 2D to a 1D linear vector, which reduces computational complexity, and a fully connected layer is introduced to predict locomotion activities. Fig 13 shows the proposed DERNN and its comparison to the previous RCNN [48]. The previous study used input based on regression equations with laser power and scanning speed parameters; we instead use the sensors and their descriptors as input parameters based on derivatives of exponentials. The previous study utilized cube-based data formation, whereas we used slopes for each sensor's data for further processing. We then sliced the data into 100 slices and, compared to the previous method, used DERNN instead of RCNN for training the system. The computational complexity of DERNN follows that of RCNN in terms of Big O notation.
The proposed DERNN can be applied to classification problems involving dense data that require monitoring multiple parameters. For example, healthcare application systems, industrial control systems, and physical monitoring systems are a few of the potential real-time systems that may utilize the proposed model. Systems using DERNN can be evaluated through accuracy, precision, recall, F1-scores, and other evaluation metrics, so our work is directly relevant to the evaluation of locomotion prediction learning systems. This study can be further extended by comparing DERNN against other conventional deep learning methodologies.
Results
This section describes the main findings of this proposed study along with the multiple validation methodologies. We discuss the datasets used, followed by sensor-based assessments, and provide an overall system evaluation. Furthermore, we compare the pros and cons of BSM-HMM and DERNN. A comparative study is presented to evaluate and analyze the overall performance against existing state-of-the-art systems in the literature.
Datasets
In this subsection, a concise introduction to each dataset is presented, accompanied by a rationale for their selection in this study.
Opportunity++ Dataset.
The Opportunity++ dataset contains data from each type of sensor, including biosensors, ambient sensors, and vision sensors. It also consists of five different sequences of activities performed during the daily living routine and a drill run. The data includes a total of seven IMUs, thirteen switches, body-worn accelerometers, and an RGB video recorded at 640 × 480 resolution and 10 frames per second. Both high-level and fine-grained actions were performed. Hence, it is an ideal dataset for our proposed study and can be found at https://ieee-dataport.org/documents/opportunity-multimodal-dataset-video-and-wearable-object-and-ambient-sensors-based-human.
CMU-MMAC Dataset.
This dataset is related to kitchen and food preparation actions from daily living routines, and it was selected for its diverse applications and desirable sensor modalities. It consists of five IMUs, five microphones, a wearable watch, and camera data at 4 MP resolution and 120 Hz. A total of 55 subjects prepared brownies, sandwiches, eggs, salad, and pizza. It contains both high-level and low-level locomotor activities. It is available at http://kitchen.cs.cmu.edu/.
Berkeley-MHAD Dataset.
A total of 12 subjects performed eleven actions through six accelerometers, four microphones, and twelve RGB cameras. The actions that were performed are more related to the daily routine and exercise activities. Hence, it was applied to the proposed locomotion prediction system to include the flavor of physical exercise recognition. It is available at https://figshare.com/articles/dataset/Berkeley_Multimodal_Human_Action_Database_MHAD_/11809470.
HWU-USP Dataset.
Another dataset, HWU-USP, is included in the study to capture daily living activities, such as using a laptop, reading newspapers, using phones, etc. It contains nine such activities and a few kitchen-related actions. The data was extracted via two accelerometers, four switches, and an RGB camera of 640 × 480 at 25 frames per second rate. It is available at https://datadryad.org/stash/dataset/doi:10.5061/dryad.v6wwpzgsj.
LARa Dataset.
Finally, we selected a dataset of actions related to walking, pushing, pulling, carting, etc. LARa is collected through three IMU, thirty-eight infrared cameras, and an RGB camera. A total of eight actions were performed by fourteen versatile subjects in a recorded data of 840 minutes. This type of dataset has also helped to ensure the robustness of the proposed system in logistics-related actions. It is available at https://zenodo.org/records/8189341.
Sensors-based assessment
Each sensor is evaluated separately to ensure the overall performance has met the criteria. Each type of sensor, such as biosensors, ambient, and vision sensors, has its own benefits when considered for locomotion prediction. To be certain about the overall performance of the system, we need to conduct performance validation for each sensor type that processes data separately.
Biosensors.
A root mean square error (RMSE) analysis is performed for the proposed system over the biosensor data. It is calculated as (24):
RMSE = sqrt( (1/W) Σ (predicted_w − actual_w)² )
where the predicted and actual outcomes are compared over the total number of outcomes W. Fig 14 shows the comparison of RMSE performed over all five datasets and indicates that the RMSE declines when the partition sample data increases. In Fig 14(a), when the percentage of sampled descriptor partitions is increased, the RMSE decreases significantly. However, in Fig 14(b), it is evident that increasing the sampled data partitions is not as effective over the CMU-MMAC and LARa datasets. Therefore, the RMSE offers a contextually relevant measure, tailored to the specific environmental conditions of each dataset.
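The RMSE of (24) is a direct computation; the three-sample vectors below are illustrative.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error over W outcomes, as in Eq. 24."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Lower values indicate closer agreement between predicted and actual outcomes, which is why the declining curves in Fig 14 correspond to improving biosensor performance.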
Ambient Sensors.
The interaction accuracy rate is utilized to evaluate the performance of the ambient sensor computations. The number of interactions using the upper body, hands, legs, and mid-body in each action is calculated and compared with the ground-truth values provided with the datasets. Tables 1–5 show the performance validation over the ambient sensor data using the interaction accuracies over the Opportunity++, CMU-MMAC, Berkeley-MHAD, LARa, and HWU-USP datasets, respectively. An average interaction accuracy rate of 96.55% across all five datasets demonstrates that the proposed system performs outstandingly in computations based on ambient sensors.
Vision sensors.
We use a confidence level over the human body-points as a validation technique to evaluate the vision data performance. The confidence level (CL) for each human body-point can be calculated using (25) as:
where the geodesic distance is calculated between the current location and the ground truth over the body-point values. The CL supports validating the vision data performance and attaining a robust locomotion prediction system. Table 6 shows the details of each body-point along with its CL in the range [0,1] over all five datasets. An average CL of 0.94 across the datasets demonstrates that the proposed system achieves acceptable performance for vision sensor-based locomotion classification. However, the CL for a few body-points falls below 0.90, indicating challenges in accurately recognizing actions related to those specific body-points.
Overall system evaluation
To demonstrate the performance efficiency of the proposed method, we utilize confusion matrices, accuracy rates, precision, recall, and F1-scores. Confusion matrices are particularly useful for extracting accuracy rates for each dataset and providing a detailed analysis of each activity recognition. Tables 7–11 present the accuracy rates of the locomotion prediction system for the proposed BSM-HMM across the Opportunity++, CMU-MMAC, Berkeley-MHAD, LARa, and HWU-USP datasets, respectively. Similarly, Tables 12–16 show the accuracy rates of the locomotion prediction system using the proposed DERNN across the same datasets.
In Table 7, the proposed BSM-HMM achieves a mean accuracy rate of 79.41% over Opportunity++ using the machine learning-based classifier. In contrast, Table 12 shows a mean accuracy rate of 91.11% using the deep learning-based classifier, demonstrating superior prediction performance. For the CMU-MMAC dataset, Table 8 reports a mean accuracy of 88.89% using the proposed BSM-HMM, while Table 13 reports 91.11% using DERNN, again indicating that deep learning-based locomotion prediction is more accurate. Similarly, for the Berkeley-MHAD, LARa, and HWU-USP datasets, Tables 9–11 show mean accuracies of 81.67%, 80.00%, and 80.00% using BSM-HMM, while Tables 14–16 show higher mean accuracies of 87.50%, 83.75%, and 93.33%, respectively, using DERNN.
Based on Tables 7–16, the deep learning-based method consistently achieves higher accuracy rates than the machine learning-based method.
Furthermore, precision, recall, F1-scores, and locomotion prediction accuracy rates are used to evaluate the performance of the proposed system. Equations (26) to (29) are used to calculate the precision $P$, recall $R$, F1-score $F_1$, and accuracy rate $A$ as:

\[ P = \frac{trp}{trp + flp} \tag{26} \]

\[ R = \frac{trp}{trp + fln} \tag{27} \]

\[ F_1 = \frac{2 \cdot P \cdot R}{P + R} \tag{28} \]

\[ A = \frac{trp + trn}{trp + trn + flp + fln} \tag{29} \]
where trp is the true positives, trn the true negatives, flp the false positives, and fln the false negatives. Tables 17–21 provide a detailed analysis using precision, recall, F1-score, and prediction accuracy rates. The comparison of mean precision, mean recall, and mean F1-scores between the BSM-HMM and DERNN classifications highlights the superiority of the deep learning-based algorithm over the machine learning-based one in our proposed methodology. Precision, recall, and F1-scores above 0.90 on two of the datasets indicate that the system delivers outstanding results in such environments. The other three datasets also demonstrate acceptable outcomes, exceeding 0.80 on these performance metrics.
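Equations (26) to (29) can be computed directly from the four counts; a minimal sketch in the notation of the text (trp, trn, flp, fln) follows, with illustrative counts that are not taken from the paper's tables:

```python
# Sketch of Eqs. (26)-(29) using the text's notation (trp, trn, flp, fln);
# the counts passed below are illustrative, not the paper's results.
def metrics(trp, trn, flp, fln):
    precision = trp / (trp + flp)                    # Eq. (26)
    recall = trp / (trp + fln)                       # Eq. (27)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (28)
    accuracy = (trp + trn) / (trp + trn + flp + fln)    # Eq. (29)
    return precision, recall, f1, accuracy

p, r, f1, acc = metrics(trp=90, trn=80, flp=10, fln=20)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.9 0.82 0.86 0.85
```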
Performance comparison between BSM-HMM and DERNN
The proposed method can employ either BSM-HMM as a machine learning-based classifier or DERNN as a deep learning-based classifier. To compare the two approaches for locomotion prediction, we evaluated their accuracy rates and computational times over all five datasets. Table 22 provides a detailed performance comparison of computational times and prediction accuracies. Based on this comparison, BSM-HMM is preferable when computational time matters more than the accuracy rate, whereas the DERNN-based method is more suitable when accuracy is the critical concern.
Comparison of the proposed system with other methods
Other existing methods for locomotion prediction utilize activity recognition for daily living actions. In contrast, our proposed system achieves a mean accuracy rate of 87.61%, which is higher than that of the systems compared in the literature. This improvement stems from the customized techniques developed in this study for data filtering, descriptor extraction, descriptor selection, and classification. Table 23 presents a comprehensive comparison of the proposed system with previous studies. As shown in Table 23, the proposed method using BSM-HMM achieves slightly better performance, while significantly better results are obtained when using DERNN, as measured by the accuracy of locomotion activity predictions.
In previous studies, several approaches have been proposed for human motion recognition and locomotion prediction; however, many of these methods exhibited limitations in feature extraction, pre-processing, and classification. For instance, [12] introduced a four-module framework involving signal pre-processing, segmentation, feature extraction, and feed-forward neural network-based classification. Nonetheless, the system underperformed due to insufficient feature extraction and suboptimal classifiers. Similarly, [49] proposed an IoT-based data processing system for home surveillance, yet it suffered from irrelevant feature selection and ineffective discrimination techniques, resulting in low accuracy. In [50], a deep belief network was utilized for skeleton modeling based on multi-sensor data, but ineffective filtration allowed noise interference, impairing motion recognition performance.
Advanced neural architectures have also been explored to address these challenges. For example, [51] combined a recurrent Capsule Network (CapsNet) with a ConvLSTM to capture spatio-temporal features, while optimizing parameters using a genetic algorithm. However, the reliance on hand-crafted features and a lack of pre-processing led to limited success. Similarly, Batool et al. [52] implemented data filtration, feature extraction, optimization, and classification to monitor daily activities. Despite using a reweighted genetic algorithm and noise removal, the system struggled to detect complex actions. In [53], sophisticated cue extraction methods, such as Hilbert and Walsh-Hadamard transforms, Bone Pair Descriptors, waypoint trajectories, and random occupancy patterns, were adopted. Yet, the absence of optimal cue selection contributed to degraded accuracy.
Several other studies further illustrate similar shortcomings. In [54], Tobit Kalman filtering and convolutional autoencoders were applied for motion capture, though accuracy levels remained unsatisfactory. A multi-model learning approach in [55] utilizing AlexNet, LSTM, BiLSTM, LeNet, and ResNet achieved recognition accuracy below 84%. Additionally, the skeleton generation and matching technique in [56] failed to deliver acceptable performance due to a lack of pre-processing and feature reduction. A hybrid attribute-based deep neural network in [57] showed inefficiency in recognizing human actions owing to unfiltered sensor data. Likewise, transfer learning-based pose estimation in [58] and the inefficient filtration strategy in [59] led to subpar results.
In more recent studies, [9] leveraged data filtration, state-of-the-art descriptor extraction, and a residual neural network for classification, but insufficient descriptor optimization reduced detection accuracy. The system proposed in [60] integrated pre-processing, feature engineering, fusion, optimization, and classification for human action detection, yet the performance was limited due to ineffective feature engineering. Finally, [29] applied LSTM and CNN models for action prediction and classification, but the lack of comprehensive pre-processing and feature extraction led to unsatisfactory outcomes.
Ablation study
This study has proposed a locomotion prediction system for practical implementation, utilizing multi-sensory devices and ontological agents. The system efficiency stems from its novel approach to locomotion prediction through innovative filtration, descriptor extraction, and customized classification methodologies. While experimental outcomes demonstrate the robustness and accuracy of the proposed system, an ablation study is conducted to further clarify its competence and utility.
Table 24 shows the effectiveness of the proposed system with and without the filtration method, proposed descriptor extraction techniques, and DERNN. The comparative analysis employs accuracy rates, peak signal-to-noise ratio (PSNR), and mean squared error (MSE) across a diverse range of applications [59]. The results indicate that the performance of the proposed system is enhanced through the implementation of these novel approaches.
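The MSE and PSNR figures used in the ablation comparison follow their standard definitions; a hedged sketch is given below. The signal values and the peak value of 1.0 are assumptions for illustration, not the paper's data.

```python
import math

# Hedged sketch of the ablation metrics: mean squared error (MSE) between
# a reference signal and its reconstruction, and peak signal-to-noise
# ratio (PSNR) derived from it. The peak value is an assumption.
def mse(signal, reference):
    return sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(signal)

def psnr(signal, reference, peak=1.0):
    err = mse(signal, reference)
    return float("inf") if err == 0 else 10.0 * math.log10(peak ** 2 / err)

clean = [0.0, 0.5, 1.0, 0.5]   # illustrative reference signal
noisy = [0.1, 0.4, 0.9, 0.6]   # illustrative reconstruction
print(round(mse(noisy, clean), 3))   # 0.01
print(round(psnr(noisy, clean), 2))  # 20.0
```

Lower MSE and higher PSNR for the full system versus its ablated variants indicate that each component contributes to the reported performance gains.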
Limitations of the study
Despite the improved accuracy rates and other metric gains achieved in this study, the system has some challenges that need to be addressed in future work. The system is limited in identifying the correct body points for human skeleton modeling, because the complex human actions present in daily living routines cause pose estimation errors. For example, Fig 15a and 15b show the red-circled body points that can be confused with one another during random human motions, such as daily workout routines. The MSE and accuracy rates in Table 24 show that this limitation contributed to a higher MSE in predicting the actual activities, degrading the accuracy rates. Although accuracy was affected by this limitation, the multi-sensors-based approach still achieved better accuracies than the literature, as shown in Table 23. Table 25 gives a detailed insight into the experiment on the confused human actions by providing a focused confusion matrix for the activities that the system mixed together over the Berkeley-MHAD dataset. For example, the table shows that the clapping hands action was confused with the jumping jacks, punching, and waving one hand actions, and the jumping jacks action was confused with the punching and waving one hand actions. Such confusions degraded the performance of the proposed system. In the future, a more detailed analysis of such complex motion patterns is needed; for example, human tracking or movement pattern discrimination strategies could be applied to avoid these failure cases and increase the practical utility of the proposed system.
Conclusions
This study presents a novel adaptive system for locomotion prediction that demonstrates robustness across various environmental conditions and sensor-derived data inputs. The key contributions of this research, namely the innovative data filtration technique, advanced descriptor extractors and selectors, and sophisticated motion classifiers, have collectively enhanced the system’s performance to optimal levels. We introduced a versatile locomotion prediction system with potential applications in domains such as smart homes, healthcare, surveillance, and lifelogging. The success of the proposed approach is underpinned by the integration of machine learning, ontological agents, deep learning, graph theory, filtration mathematics, semantic relations, and a combination of sensor data.
However, the system faces challenges, particularly in accurately identifying body points for human skeleton modeling, which has resulted in a reduction in overall performance. This has been reflected in Fig 15 of the previous section. To address these limitations, future work will incorporate advanced techniques such as MediaPipe or YOLO-V8 and integrate additional enhancements to improve the capabilities of intelligent agents and optimize the system’s overall performance.
References
- 1. Zell P, Rosenhahn B. Learning inverse dynamics for human locomotion analysis. Neural Comput Applic. 2019;32(15):11729–43.
- 2. Azmat U. Human Activity Recognition via Smartphone Embedded Sensor using Multi-Class SVM. 2022 24th International Multitopic Conference (INMIC), 2022. 1–7. https://doi.org/10.1109/inmic56986.2022.9972927
- 3. Javeed M, Shorfuzzaman M, Alsufyani N, Chelloug SA, Jalal A, Park J. Physical human locomotion prediction using manifold regularization. PeerJ Comput Sci. 2022;8:e1105. pmid:36262158
- 4. Azmat U, Jalal A. Smartphone inertial sensors for human locomotion activity recognition based on template matching and codebook generation. 2021 International Conference on Communication Technologies (ComTech), 2021. 109–14. https://doi.org/10.1109/comtech52583.2021.9616681
- 5. Figueiredo J, Carvalho SP, Goncalve D, Moreno JC, Santos CP. Daily locomotion recognition and prediction: A kinematic data-based machine learning approach. IEEE Access. 2020;8:33250–62.
- 6. De D, Bharti P, Das SK, Chellappan S. Multimodal wearable sensing for fine-grained activity recognition in healthcare. IEEE Internet Comput. 2015;19(5):26–35.
- 7. Jalal A, Kim Y. Dense depth maps-based human pose tracking and recognition in dynamic scenes using ridge data. 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014. 119–24. https://doi.org/10.1109/avss.2014.6918654
- 8. Ordóñez FJ, Roggen D. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors (Basel). 2016;16(1):115. pmid:26797612
- 9. Javeed M, Abdelhaq M, Algarni A, Jalal A. Biosensor-based multimodal deep human locomotion decoding via internet of healthcare things. Micromachines (Basel). 2023;14(12):2204. pmid:38138373
- 10. Smith AA, Li R, Tse ZTH. Reshaping healthcare with wearable biosensors. Sci Rep. 2023;13(1):4998. pmid:36973262
- 11. Noor MHM, Salcic Z, Wang KI-K. Ontology-based sensor fusion activity recognition. J Ambient Intell Human Comput. 2018;11(8):3073–87.
- 12. Javeed M, Mudawi NA, Alazeb A, Alotaibi SS, Almujally NA, Jalal A. Deep ontology-based human locomotor activity recognition system via multisensory devices. IEEE Access. 2023;11:105466–78.
- 13. Liu J, Li Y, Tian X, Sangaiah AK, Wang J. Towards semantic sensor data: An ontology approach. Sensors (Basel). 2019;19(5):1193. pmid:30857211
- 14. Javeed M, Mudawi NA, Alazeb A, Aljuaid H, Alatiyyah MH, Alnowaiser K, et al. Intelligent fine-grained daily living locomotion prediction based on skeleton modeling and CNN. TS. 2024;41(5):2517–28.
- 15. Fan Y-C, Tseng Y-H, Wen C-Y. A novel deep neural network method for HAR-based team training using body-worn inertial sensors. Sensors (Basel). 2022;22(21):8507. pmid:36366202
- 16. Batool M, Javeed M. Movement Disorders Detection in Parkinson’s Patients using Hybrid Classifier. 2022 19th International Bhurban Conference on Applied Sciences and Technology (IBCAST), 2022. 213–8. https://doi.org/10.1109/ibcast54850.2022.9990423
- 17. Oguntala GA, Abd-Alhameed RA, Ali NT, Hu Y-F, Noras JM, Eya NN, et al. SmartWall: Novel RFID-enabled ambient human activity recognition using machine learning for unobtrusive health monitoring. IEEE Access. 2019;7:68022–33.
- 18. Hu M, Luo M, Huang M, Meng W, Xiong B, Yang X, et al. Towards a multimodal human activity dataset for healthcare. Multimedia Systems. 2022;29(1):1–13.
- 19. Chung S, Lim J, Noh KJ, Kim G, Jeong H. Sensor data acquisition and multimodal sensor fusion for human activity recognition using deep learning. Sensors (Basel). 2019;19(7):1716. pmid:30974845
- 20. Ihianle IK, Nwajana AO, Ebenuwa SH, Otuka RI, Owa K, Orisatoki MO. A deep learning approach for human activities recognition from multimodal sensing devices. IEEE Access. 2020;8:179028–38.
- 21. Islam MM, Iqbal T. Multi-GAT: A graphical attention-based hierarchical multimodal representation learning approach for human activity recognition. IEEE Robot Autom Lett. 2021;6(2):1729–36.
- 22. Hajjej F, Javeed M, Ksibi A, Alarfaj M, Alnowaiser K, Jalal A, et al. Deep human motion detection and multi-features analysis for smart healthcare learning tools. IEEE Access. 2022;10:116527–39.
- 23. Antonucci A, Papini GPR, Bevilacqua P, Palopoli L, Fontanelli D. Efficient prediction of human motion for real-time robotics applications with physics-inspired neural networks. IEEE Access. 2022;10:144–57.
- 24. Yang C, Yuan K, Heng S, Komura T, Li Z. Learning natural locomotion behaviors for humanoid robots using human bias. IEEE Robot Autom Lett. 2020;5(2):2610–7.
- 25. al Shloul T, Javeed M, Gochoo M, A. Alsuhibany S, Yasin Ghadi Y, Jalal A, et al. Student’s Health exercise recognition tool for e-learning education. Intelligent Automation & Soft Computing. 2023;35(1):149–61.
- 26. Ciliberto M, Rey VF, Calatroni A, Lukowicz P, Roggen D. Opportunity: A multimodal dataset for video- and wearable, object and ambient sensors-based human activity recognition. IEEE Dataport. 2021. https://doi.org/10.21227/yax2-ge53
- 27. Torre FD, Hodgins JK, Bargteil AW, Martin X, Macey J, Collado AT, et al. Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database. 2008.
- 28. Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R. Berkeley MHAD: A comprehensive Multimodal Human Action Database. 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013. 53–60. https://doi.org/10.1109/wacv.2013.6474999
- 29. Ranieri CM, MacLeod S, Dragone M, Vargas PA, Romero RAF. Activity recognition for ambient assisted living with videos, inertial units and ambient sensors. Sensors (Basel). 2021;21(3):768. pmid:33498829
- 30. Niemann F, Reining C, Moya Rueda F, Nair NR, Steffens JA, Fink GA, et al. LARa: Creating a dataset for human activity recognition in logistics using semantic attributes. Sensors (Basel). 2020;20(15):4083. pmid:32707928
- 31. Javeed M, Jalal A. Body-worn Hybrid-Sensors based Motion Patterns Detection via Bag-of-features and Fuzzy Logic Optimization. 2021 International Conference on Innovative Computing (ICIC), 2021. 1–7. https://doi.org/10.1109/icic53490.2021.9692924
- 32. Ghadi YY, Javeed M, Alarfaj M, Shloul TA, Alsuhibany SA, Jalal A, et al. MS-DLD: Multi-sensors based daily locomotion detection via kinematic-static energy and body-specific HMMs. IEEE Access. 2022;10:23964–79.
- 33. Sat-Muñoz D, Martínez-Herrera B-E, González-Rodríguez J-A, Gutiérrez-Rodríguez L-X, Trujillo-Hernández B, Quiroga-Morales L-A, et al. Phase angle, a cornerstone of outcome in head and neck cancer. Nutrients. 2022;14(15):3030. pmid:35893884
- 34. Shouran M, Elgamli E. Design and implementation of Butterworth filter. International Journal of Innovative Research in Science, Engineering and Technology. 2020;9(9):7975.
- 35. Akhter I, Hafeez S. Human Body 3D Reconstruction and Gait Analysis via Features Mining Framework. 2022 19th International Bhurban Conference on Applied Sciences and Technology (IBCAST), 2022. 189–94. https://doi.org/10.1109/ibcast54850.2022.9990213
- 36. Hoeser T, Kuenzer C. Object detection and image segmentation with deep learning on earth observation data: A Review-Part I: Evolution and Recent Trends. Remote Sensing. 2020;12(10):1667.
- 37. Yu G, Wang Z, Zhao P. Multisynchrosqueezing Transform. IEEE Trans Ind Electron. 2019;66(7):5441–55.
- 38. Yang C, Zhou K, Liu J. SuperGraph: Spatial-temporal graph-based feature extraction for rotating machinery diagnosis. IEEE Trans Ind Electron. 2022;69(4):4167–76.
- 39. Liu Q, Wang Y, Xu Y. Synchrosqueezing extracting transform and its application in bearing fault diagnosis under non-stationary conditions. Measurement. 2021;173:108569.
- 40. Dai M, Demirel MF, Liang Y, Hu J-M. Graph neural networks for an accurate and interpretable prediction of the properties of polycrystalline materials. npj Comput Mater. 2021;7(1).
- 41. Shi J, Wang W, Lou X, Zhang S, Li X. Parameterized hamiltonian learning with quantum circuit. IEEE Trans Pattern Anal Mach Intell. 2023;45(5):6086–95. pmid:36044483
- 42. Liu Y, Mu Y, Chen K, Li Y, Guo J. Daily activity feature selection in smart homes based on pearson correlation coefficient. Neural Process Lett. 2020;51(2):1771–87.
- 43. Jang M, Yoon H, Lee S, Kang J, Lee S. A Comparison and evaluation of stereo matching on active stereo images. Sensors (Basel). 2022;22(9):3332. pmid:35591022
- 44. Javeed M, Jalal A, Kim K. Wearable Sensors based Exertion Recognition using Statistical Features and Random Forest for Physical Healthcare Monitoring. 2021 International Bhurban Conference on Applied Sciences and Technologies (IBCAST), 2021. 512–7. https://doi.org/10.1109/ibcast51254.2021.9393014
- 45. Xue X, Chen J. Optimizing sensor ontology alignment through compact co-firefly algorithm. Sensors (Basel). 2020;20(7):2056. pmid:32268547
- 46. Liu J, Li Y, Tian X, Sangaiah AK, Wang J. Towards semantic sensor data: An ontology approach. Sensors (Basel). 2019;19(5):1193. pmid:30857211
- 47. Manouchehri N, Bouguila N. Human Activity Recognition with an HMM-Based Generative Model. Sensors (Basel). 2023;23(3):1390. pmid:36772428
- 48. Alamri NMH, Packianather M, Bigot S. Predicting the porosity in selective laser melting parts using hybrid regression convolutional neural network. Applied Sciences. 2022;12(24):12571.
- 49. Azmat U, Jalal A, Javeed M. Multi-sensors Fused IoT-based Home Surveillance via Bag of Visual and Motion Features. 2023 International Conference on Communication, Computing and Digital Systems (C-CODE), 2023. 1–6. https://doi.org/10.1109/c-code58145.2023.10139889
- 50. Akhter I, Javeed M, Jalal A. Deep Skeleton Modeling and Hybrid Hand-crafted Cues over Physical Exercises. 2023 International Conference on Communication, Computing and Digital Systems (C-CODE), 2023. https://doi.org/10.1109/c-code58145.2023.10139863
- 51. Lu Y, Velipasalar S. Autonomous human activity classification from wearable multi-modal sensors. IEEE Sensors J. 2019;19(23):11403–12.
- 52. Batool M, Jalal A, Kim K. Telemonitoring of daily activity using accelerometer and gyroscope in smart home environments. J Electr Eng Technol. 2020;15(6):2801–9.
- 53. Hafeez S, Yasin Ghadi Y, Alarfaj M, al Shloul T, Jalal A, Kamal S, et al. Sensors-based ambient assistant living via e-monitoring technology. Computers, Materials Continua. 2022;73(3):4935–52.
- 54. Lannan N, Zhou LE, Fan G. Human Motion Enhancement via Tobit Kalman Filter-Assisted Autoencoder. IEEE Access. 2022;10:29233–51. pmid:36090467
- 55. Tian Y, Li H, Cui H, Chen J. Construction motion data library: an integrated motion dataset for on-site activity recognition. Sci Data. 2022;9(1):726. pmid:36435886
- 56. Lannan N, Zhou L, Fan G. A Multiview Depth-based Motion Capture Benchmark Dataset for Human Motion Denoising and Enhancement Research. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022. 426–35. https://doi.org/10.1109/cvprw56347.2022.00058
- 57. Lüdtke S, Rueda FM, Ahmed W, Fink GA, Kirste T. Human activity recognition using attribute-based neural networks and context information. https://doi.org/10.48550/arXiv.2111.04564
- 58. Awasthi S, Rueda FM, Fink GA. Video-based Pose-Estimation Data as Source for Transfer Learning in Human Activity Recognition. 2022 26th International Conference on Pattern Recognition (ICPR), 2022. 4514–21. https://doi.org/10.1109/icpr56361.2022.9956405
- 59. Syed AS, Sherhan Z, Shehram M, Saddar S. Using Wearable sensors for human activity recognition in logistics: A comparison of different feature sets and machine learning algorithms. IJACSA. 2020;11(9).
- 60. Javeed M, Mudawi NA, Alabduallah BI, Jalal A, Kim W. A multimodal IoT-based locomotion classification system using features engineering and recursive neural network. Sensors (Basel). 2023;23(10):4716. pmid:37430630