Retraction
The PLOS One Editors retract this article [1] due to concerns about potential manipulation of the publication process. These concerns call into question the validity and provenance of the reported results. We regret that the issues were not identified prior to the article’s publication.
The authors did not agree with the retraction.
7 Apr 2026: The PLOS One Editors (2026) Retraction: Stacked-gait: A human gait recognition scheme based on stacked autoencoders. PLOS ONE 21(4): e0346381. https://doi.org/10.1371/journal.pone.0346381
Abstract
Human gait recognition (HGR) is a biometric mechanism widely employed to recognize individuals based on their walking traits. HGR has been prominent in recent years due to its surveillance capability. In HGR, an individual's walking attributes are utilized for identification. HGR is considered a very effective recognition technique, but several factors degrade its performance, chiefly variations in clothing, carrying conditions, and walking style. In this paper, a new hybrid method for HGR classification, called Stacked-Gait, is designed. The system is based on six major steps: first, images are resized to reduce the computational load; second, the resized images are converted to grayscale to extract better features; third, the dataset is divided into training and test sets; fourth, the autoencoders are trained and features are extracted from the training data; fifth, the two autoencoders are stacked; sixth, the stacked encoders are employed to extract features from the test data. Finally, the feature vectors are fed to various machine learning classifiers for final classification. The method is assessed on the CASIA-B dataset and achieves accuracies of 99.90, 98.10, 97.20, 97.20, 96.70, and 100 percent on the 0°, 18°, 36°, 54°, 72°, and 90° angles, respectively. The system gives promising results compared to recent schemes.
Citation: Mehmood A, Amin J, Sharif M, Kadry S, Kim J (2024) RETRACTED: Stacked-gait: A human gait recognition scheme based on stacked autoencoders. PLoS ONE 19(10): e0310887. https://doi.org/10.1371/journal.pone.0310887
Editor: Toqeer Mahmood, National Textile University, PAKISTAN
Received: March 8, 2024; Accepted: September 9, 2024; Published: October 23, 2024
Copyright: © 2024 Mehmood et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data is available on Kaggle at the following link: https://www.kaggle.com/datasets/asifkhattak/casia-b-split-dataset-six-angles?select=CASIA-B+Split+Dataset.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Gait denotes the walking style of a person. The analysis of the human walk was initiated in the 1960s [1]. Initially, HGR [2] was used to assess medical anomalies such as spinal stenosis [3], Parkinson's disease [4], and walking disorders in the elderly, and was considered efficient in diagnosing these types of disorders. Later, researchers started to use HGR as a human recognition technique. Apart from gait, many other biometrics are utilized by researchers, such as iris recognition [5], facial features [6], palm vein boundaries [7], and fingerprints [8]. The HGR method is considered advantageous because data acquisition is easy: the cooperation of the individual is not required, as data can be captured from a distance. Considerable work has been carried out globally to recognize a person using walking patterns. The implementation of HGR aims to minimize security threats in places such as embassies, banks, and airports. A person's gait attributes are easy to use, but some factors can affect the results, such as poor lighting, angle variation [9], clothing variations, and carrying conditions [10]. Variations in clothing, footwear, and carrying conditions can change the gait features, creating problems during recognition [11].
Moreover, lighting conditions, background, and camera angles also distort the gait features, making recognition a daunting task. Partial occlusion or angle changes can introduce variation in the gait information, making the recognition process very critical [12]. The consistency of the system is also challenged by variation in a person's gait due to mood, fatigue, and health conditions [13]. To process gait data, an efficient and accurate system is needed for surveillance [14].
The HGR can be divided into two classes: model-free and model-based. The model-free method uses a silhouette of a person’s whole body to calculate the attributes. Moreover, the width of the vector and Fourier descriptor are also computed. The dynamic information and details extracted from the silhouette of the human body are also used to recognize a person. This method requires low computation, but different factors can affect it, such as changes in angles and clothes, the shadow of a person’s feet, issues that occur due to poor lighting, etc. In the model-based method, information extraction uses previous knowledge of human body parts movement. A person’s behavior is assessed by using different movements of the body parts. This method is considered good for dealing with lighting, clothing, view angle [15], and shadow, but the computation cost is very high.
Recent methods utilize data captured from different sensors, such as RGB-depth cameras, inertial sensors, and infrared sensors, to obtain powerful features. Integrated features are obtained by fusing skeleton and silhouette data [16]. Further techniques, such as deep learning, attention mechanisms, and capsule networks, are also utilized for accurate feature extraction [17]. Furthermore, variability and occlusion problems are addressed by applying 3D reconstruction of the human body to represent human gait accurately [18]. A comprehensive representation of gait is obtained by combining features of multiple granularities.
The basic model of HGR is comprised of some steps such as preprocessing [19, 20], segmentation [21, 22], extraction of features, and classification [23]. Preprocessing is important in computer vision and image processing to extract good features. This step includes removing noise, normalizing the contrast of images, and background removal.
Researchers utilize many techniques for preprocessing images, such as thresholding, background removal, and watersheds. After preprocessing, the improved images are used to extract features [24, 25]. The purpose of preprocessing is to enable the extraction of relevant features and avoid degrading the system's performance. Feature extraction is done using different methods, such as geometry- [26], color- [27], and shape-based methods. During feature extraction, irrelevant features may occur, resulting in the dimensionality problem. This problem is addressed by researchers using techniques such as entropy [28–30], the Wavelet Transform [31], meta-heuristics [32], and Principal Component Analysis (PCA) [33]. The final step is classification, which feeds the extracted feature vector (FV) into classifiers such as SVM, decision trees (DT), KNN, and regression learners (RL) [34–36]. To address the problems of recognition rate, computational cost, and relevant feature extraction, a new method based on autoencoder features is used to perform HGR in this work.
- The core contribution is the design of the Stacked-Gait model for HGR, a hybrid approach based on autoencoder feature extraction and machine learning (ML) classification.
- The autoencoders are trained from scratch, and features are extracted with the encoder part of each autoencoder. The training uses hyperparameters selected through extensive experiments: two autoencoders, 100 epochs, and hidden sizes of 250 and 200.
- Both autoencoders are stacked together to build a model that can directly extract features from the input data. The decoder parts are eliminated, and features are obtained from the encoder parts. The extracted feature vectors are fed into different machine learning classifiers for final recognition.
The rest of the article elaborates the related work, the proposed methodology, the results and analysis, and the discussion. Finally, the conclusion and future work close the manuscript.
Related work
Many techniques have been employed for HGR in the past to address the different problems that occur during recognition. A Convolutional Neural Network (CNN) feature extraction method was employed in [37], in which a high-level description is computed from low-level attributes. The system was analyzed on a well-known HGR dataset called TUM GAID and attained an accuracy (ACR) of 89 percent. In [38], the authors presented an HGR technique based on a deep learning model and Bayesian optimization, called HGRBOL2, comprising both parallel and sequential steps. First, EfficientNet-B0 is fine-tuned using optical flow regions, and the obtained features are then optimized with the Bayesian technique. The system was tested on two HGR datasets and achieved ACRs of 92.04 percent on CASIA-B and 94.97 percent on CASIA-C, respectively. In another method [39], the authors employed an HGR approach based on a two-stream network. First, contrast enhancement is performed using fused information from local and global filters, after which the data is augmented to enlarge the dataset. Two fine-tuned networks, ShuffleNet and MobileNetV2, are used to extract features, which are then fused, and classification is performed with ML algorithms. The method was assessed on CASIA-B and gives promising results. In another method [40], the authors addressed the carrying condition, considered one of the factors that most affects ACR. A CNN model is used to acquire attributes from the images, and fine-tuning is employed to address the small-dataset problem. The system was tested on the CASIA-B dataset and obtained an average ACR of 90 percent.
In another HGR method [41], the authors tried to lessen the effect of view variation on system performance by introducing a new method called DeepGait. The Normalized Auto-Correlation method was used to normalize the gait cycle, after which feature extraction is performed. The system was tested on an HGR dataset called OULP and obtained an ACR of 89.3 percent. In another method [42], the authors tried to minimize the effect of covariate factors on performance. They used two CNN models, AlexNet and VGG-16, to extract features, fused both feature vectors, and optimized the features using the Fuzzy Entropy Controlled Skewness method.
The variation problem is tackled by the authors with a new HGR method [43] based on deterministic learning and information merging. The scheme was tested on three angles of CASIA-B and attained ACRs of 88, 87, and 86 percent on the 18°, 36°, and 54° angles, respectively. Another technique based on a hybrid selection method was presented to overcome the variation problem [44]. Features are extracted using a CNN model called DenseNet-201, and feature redundancy is reduced with the firefly and skewness algorithms. The scheme was tested on CASIA-B and reached ACRs of 94.30, 93.80, and 94.70 percent on the 18°, 36°, and 54° angles, respectively. The authors of [45] tried to minimize the covariate problem with a new HGR approach in which features are extracted using a CNN model. Performance was tested on two HGR datasets, CASIA-B and TUM GAID. The ACR was computed for Rank-1 and Rank-5, attaining 96.45 and 99.24 percent on CASIA-B and 72.06 and 84.07 percent on TUM GAID, respectively. In another method [46], the authors utilized CNN and Long Short-Term Memory (LSTM) methods for HGR. Features are extracted from RGB image data, and the computed feature vectors are fused. The system was tested on three gait datasets, CASIA-B, USF, and FVG, and reached ACRs of 81.8, 99.5, and 87.8 percent, respectively.
In [47], the authors presented an HGR method to address the co-variant problem. They used a classical feature extraction method called the Histogram of Oriented Gradients (HOG), and classification was carried out with an SVM classifier. The system was tested on CASIA-B and achieved an ACR of 87.9 percent for coat-wearing, 83.33 percent for all covariates, and 87.9 percent for clothing covariates. Another method [48] extracts salient features from diverse regions, aiming to obtain spatio-temporal features. The system was assessed on CASIA-B and OU-MVLP and obtained average ACRs of 93.37 and 89.70 percent, respectively.
In [49], the authors presented an HGR approach based on a generative adversarial network (GAN) to generate images of pedestrian gait. Various deep learning models, such as VGG19, Inception, VGG16, AlexNet, Xception, and ResNet, are then used to extract gait features. To address imbalanced data, they used the synthetic minority oversampling technique (SMOTE), and features are optimized with techniques such as particle swarm optimization, Chi-square, genetic models, and grey wolf optimization. They tested the technique on different angles of CASIA-A, CASIA-B, and OU-ISIR and obtained ACRs of 99.10, 99.3, and 99.09 percent, respectively. He et al. [50] presented a method based on a temporal sparse adversarial attack for gait image recognition, generating high-quality images with a GAN. The technique obtained good imperceptibility, but performance degrades on some frames. In [51], the authors presented a technique comprising LDA-PSO-LSTM, utilized a dataset of sEMG signals, and obtained an average ACR of 94.89 percent.
In another method [52], a code generation method is used for HGR, in which the codebook is encoded with a Fisher vector. Two gait datasets, CASIA-A and TUM GAID, were used to analyze the method, achieving ACRs of 100 and 97.74 percent, respectively. To address the imbalanced-data problem in HGR, a stacked deep multi-convolutional capsule network is implemented [53]. Initially, preprocessing is performed to enhance image contrast using CLAHE. Classification is performed under different variants, such as carrying and clothing conditions, using the stacked capsule network. The method was tested on the CASIA-B and OU-ISIR datasets and obtained improved results compared to recent techniques.
To learn robust features and obtain unique patterns from the skeleton, a Bimodal Fusion Network (BiFusion) is implemented [54]. Improved features are obtained by integrating silhouette information; in particular, the Multi-Scale Gait Graph (MSGG) network is used to extract skeleton features. The method was assessed on CASIA-B and OU-MVLP, obtaining 94 percent ACR on CASIA-B and a 90.43 percent mean ACR on OU-MVLP. To identify important areas of the input data, an attention-based method named AttenGait is implemented [55]. The attention mechanisms exploit various data modalities, such as optical flow, to obtain robust information. The method was tested on CASIA-B and GREW and obtained average ACRs of 95.8 and 70.7 percent, respectively. To address the redundant-features problem, an HGR technique based on the 3D human body is introduced [18], aiming to obtain compact and discriminative features. Moreover, a multi-granular feature fusion module is implemented to handle the multiple granularities. The method was assessed on two datasets, an outdoor dataset and CASIA-B, achieving average ACRs of 78 and 90.90 percent, respectively.
The proposed techniques work well on different datasets and offer some advantages but also have limitations. In [38], robust features are extracted, but computation time increases due to feature fusion. In [40], automated feature extraction is performed, but recognition degrades under co-variant conditions such as gallery view. DeepGait [41] performed well on view variations but was unable to achieve promising results overall. The VGG-16- and AlexNet-based HGR [42] obtained the best features, but computation time increases due to feature fusion. The RBF neural network-based architecture [43] performs knowledge fusion and deterministic learning very well, but the system is tested on limited data. The SVM- and HOG-based HGR technique is robust to variations but failed to achieve a promising recognition rate due to its classical features.
The STAR [48] method used both spatial and temporal features but was unable to attain a good ACR on OU-MVLP. A GAN-based HGR method obtained good features but faces the problem of training instability [49]. The two-branch multi-stage CNN network [56] addresses environments with view and clothing changes but fails to achieve a better ACR for CASIA-B despite its multi-stage design. The CNN model [57] for HGR is evaluated on a large range of datasets, but not on low-quality data. The autoencoder- and LSTM-based method [46] offers promising computational efficiency and qualitative feature disentanglement, but it is evaluated on a limited dataset (i.e., only the 90° angle of CASIA-B). Nithyakani et al. [53] implemented a stacked capsule network-based method for HGR that demonstrated improved spatial-relationship learning but faced a high computational cost. In [54], the authors used a Multi-Scale Gait Graph network to extract features; this method obtained improved results but needs accurate silhouettes to perform well. AttenGait [55] obtained good features, but its computational cost is high due to the different modalities. The multi-granularity technique [18] works well on the tested dataset but may face scalability issues due to its 3D-based design. The recent techniques are summarized in Table 1.
Proposed methodology
In this article, a novel technique called Stacked-Gait is introduced to improve the recognition rate in HGR. The system comprises six steps. In the first step, the images are resized to reduce computation, as the HGR system may be deployed in a real-time environment. In the second step, these images are converted to grayscale to extract good features. In the third step, the images are split into training and testing sets. In the fourth step, two autoencoders (AEs) are trained, and features are extracted from the training data using the encoder parts of the AEs. In the fifth step, the two trained AEs are stacked together. In the sixth step, the testing data is fed into the stacked autoencoder (SAE), and its features are extracted. Finally, the extracted feature vectors are fed to different ML classifiers for final classification. The framework is illustrated in Fig 1.
Preprocessing of input data
In the proposed work, resizing is used for preprocessing. After resizing, the images are converted to grayscale for better feature extraction and to eliminate computational overhead. The input image size is set to 28×28 and fed to the encoders for training. As this system can be implemented in a real-time environment for person recognition, the computational time should be low to obtain instant results. After preprocessing, these frames are fed into the encoders for training and feature extraction.
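A minimal sketch of this preprocessing step; nearest-neighbour resizing and the standard BT.601 luma weights are illustrative assumptions, as the paper does not state which resizing or grayscale method is used:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to grayscale using ITU-R BT.601 luma weights."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def resize_nearest(img, out_h=28, out_w=28):
    """Nearest-neighbour resize of a 2-D image to the 28 x 28 autoencoder input size."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h   # source row index for each output row
    cols = np.arange(out_w) * w // out_w   # source column index for each output column
    return img[rows][:, cols]

frame = np.random.rand(240, 352, 3)   # one synthetic 352x240 frame (H x W x 3)
gray = to_grayscale(frame)            # H x W grayscale image
small = resize_nearest(gray)          # 28 x 28, later flattened to a 784-vector
```

The resulting 28×28 frames are what the autoencoders consume.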
Features extraction based on the training of auto-encoders
In this work, two AEs are used for training, each trained with different parameters to obtain maximum efficiency. Results with three stacked AEs were also calculated, but the ACR was 5 percent lower than with two AEs, so two AEs were used in all further experiments. The number of epochs was selected after three experiments: using the frontal view of CASIA-B (subset-1), ACRs of 91.60, 96.30, and 96.70 percent were achieved at 50, 100, and 150 epochs, respectively. There is a substantial improvement in ACR from 50 to 100 epochs, whereas from 100 to 150 epochs the change is marginal, so 100 epochs were chosen as a trade-off between computational time and ACR. The architecture of the AE is illustrated in Fig 2.
The autoencoder [58] is an unsupervised learning structure based on three layers, input, hidden, and output, as illustrated in Fig 2. The AE comprises two parts, an encoder and a decoder: the hidden representation of the input data is obtained by the encoder, while the reconstruction of the input data is performed by the decoder. Suppose there is unlabeled input data $x_n \in \mathbb{R}^{m \times 1}$, let $h_n$ be the hidden vector extracted from $x_n$ by the encoder, and let $\hat{x}_n$ be the vector produced by the decoder at the output layer. The encoding can be stated as follows:
$$h_n = f(W_1 x_n + b_1) \quad (1)$$
where $f$ is the encoding function, $W_1$ the encoder weight matrix, and $b_1$ the bias vector. The process of the decoder can be defined as follows:
$$\hat{x}_n = g(W_2 h_n + b_2) \quad (2)$$
where $g$ is the decoding function, $W_2$ the decoder weight matrix, and $b_2$ the bias vector. The reconstruction error can be minimized by optimizing the autoencoder parameter sets:
$$\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{n=1}^{N} L\left(x_n, \hat{x}_n\right) \quad (3)$$
where $L$ refers to the loss function. Stacked autoencoders (SAEs) are built by stacking $n$ autoencoders into $n$ hidden layers using unsupervised learning, after which fine-tuning of the SAE is performed. The SAE method is therefore based on three steps:
- i. Training of the first autoencoder by using input data and extracting the feature vector (FV).
- ii. Feeding FV extracted from prior autoencoder to next autoencoder.
- iii. Minimizing the cost function and updating weights by backpropagation after training hidden layers. This is performed to complete the process of fine-tuning.
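The stacking in steps i–ii can be sketched as follows; the encoder weights here are random stand-ins for parameters learned during training (with the hidden sizes 250 and 200 used in this work), and the fine-tuning of step iii is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(0.0, z)

# Hypothetical pre-trained encoder weights; in the paper these come from
# training AE-1 (hidden size 250) and AE-2 (hidden size 200). Decoders are discarded.
W1, b1 = rng.normal(0, 0.1, (784, 250)), np.zeros(250)   # encoder of AE-1
W2, b2 = rng.normal(0, 0.1, (250, 200)), np.zeros(200)   # encoder of AE-2

def stacked_encode(X):
    """Step i-ii: encode with AE-1, then feed its features to AE-2's encoder."""
    return relu(relu(X @ W1 + b1) @ W2 + b2)

X_test = rng.random((10, 784))     # 10 flattened 28x28 test images
FV = stacked_encode(X_test)        # feature vectors of size N x S2 = 10 x 200
```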
Overfitting during training can be effectively reduced with the help of a dropout layer. Dropped neurons preserve their weights and can recover when new data is fed as input. Dropout is achieved by setting the output of a few neurons to 0 and excluding those neurons from further training. Many authors have tried to reduce overfitting by testing the effect of dropout even on small datasets [59]. Traditional activation functions such as the hyperbolic tangent and sigmoid suffer from vanishing gradients when the training error is propagated through the layers. ReLU is considered the most effective activation function because its gradient does not shrink as the input variable increases; a network using ReLU therefore does not face the vanishing-gradient problem. The ReLU function is defined as follows:
$$\mathrm{ReLU}(x) = \max(0, x) \quad (4)$$
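Eqs (1)–(4) can be illustrated with a minimal NumPy sketch of one autoencoder; the linear decoder, learning rate, and plain gradient descent are illustrative assumptions, not the paper's exact training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: m = 784 pixels (28x28), hidden size 250 as in AE-1, N images.
m, hidden, N = 784, 250, 32
X = rng.random((N, m))                 # N unlabeled flattened images x_n

W1 = rng.normal(0.0, 0.01, (m, hidden)); b1 = np.zeros(hidden)   # encoder parameters
W2 = rng.normal(0.0, 0.01, (hidden, m)); b2 = np.zeros(m)        # decoder parameters

def relu(z):                           # Eq. (4): ReLU(x) = max(0, x)
    return np.maximum(0.0, z)

losses = []
for _ in range(5):                     # a few plain gradient-descent steps
    H = relu(X @ W1 + b1)              # Eq. (1): h_n = f(W1 x_n + b1)
    Xhat = H @ W2 + b2                 # Eq. (2): x_hat_n = g(W2 h_n + b2), g linear here
    err = Xhat - X
    losses.append((err ** 2).mean())   # Eq. (3): mean squared reconstruction error
    dW2 = H.T @ err / N; db2 = err.mean(axis=0)
    dH = (err @ W2.T) * (H > 0)        # backpropagate through the ReLU
    dW1 = X.T @ dH / N; db1 = dH.mean(axis=0)
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.1 * g                   # learning rate is an illustrative choice
```

Over these few steps, the reconstruction loss of Eq. (3) decreases, which is the optimization the equations describe.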
Features extraction based on the training of autoencoder-1
Initially, AE-1 is trained on the given dataset with several selected parameters. A hidden size of 250 units is used in this autoencoder, determined through extensive experiments: combinations of 150/100, 200/150, 250/200, and 300/250 hidden units were tried, and 250/200 performed best. The hidden size of the first autoencoder is 250 because it is intended to handle the higher-dimensional input, while the second autoencoder's hidden size is 200, as it receives lower-dimensional data, making the layer effective for encoding and decoding. A maximum of 100 epochs and an L2 weight regularization of 0.004 are used. The sparsity regularization and sparsity proportion were 4 and 0.15, respectively, while data scaling was turned off. These parameters are also defined in Table 2.
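For reference, the AE-1 configuration above can be collected into a plain settings map; the key names here are assumptions chosen to mirror MATLAB `trainAutoencoder`-style options, which the listed values appear to correspond to:

```python
# Hypothetical key names; the values are the AE-1 settings reported in Table 2.
ae1_params = {
    "hidden_size": 250,                  # encoder hidden units
    "max_epochs": 100,                   # training epochs
    "l2_weight_regularization": 0.004,   # weight decay strength
    "sparsity_regularization": 4,        # sparsity penalty weight
    "sparsity_proportion": 0.15,         # target average activation
    "scale_data": False,                 # input rescaling disabled
}
```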
After the training of AE-1, the encoder part of this AE is used as a feature descriptor to extract feature vectors from the input images. Let N refer to the total number of images and S1 to the hidden size of AE-1.
So, AE-1 will give the output of the feature vector as follows:
$$FV_1 = N \times S_1 \quad (5)$$
where $FV_1$ is the feature vector extracted by AE-1.
Features extraction based on the training of autoencoder-2
After the training of AE-1 and feature extraction, AE-2 is trained on the output of AE-1, using different parameters from AE-1. A hidden size of 200 units is used in this autoencoder, with a maximum of 100 epochs and an L2 weight regularization of 0.002. The sparsity regularization and sparsity proportion were 4 and 0.10, respectively, while data scaling was turned off. These parameters are also defined in Table 3.
After the training of AE-2, the encoder part of this AE is used as a feature descriptor to extract features from the output of AE-1. Let N refer to the total number of images and S2 to the hidden size of AE-2.
So, AE-2 will give the output of the feature vector as follows:
$$FV_2 = N \times S_2 \quad (6)$$
where $FV_2$ is the feature vector extracted by AE-2.
Final classification of gait
As AE-based techniques are unsupervised, the AEs are used here as feature descriptors. After feature extraction, the feature vectors FV2 and FV3 are fed as input to different ML classifiers for final recognition, where the final FV3 is computed from the testing data. Its size can be computed as follows:
$$FV_T = N \times S \quad (7)$$
where $FV_T$ refers to the size of the feature vector, $N$ to the total number of images, and $S$ to the hidden size used in the AE.
The extracted FV of size N × S is given as input to various classifiers for classification. The method is evaluated using different metrics, such as ACR, precision (PR), recall (REC), and area under the curve (AUC).
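As an illustration of this final step, a minimal k-nearest-neighbour classifier applied to hypothetical N × S feature vectors; this is a simplified stand-in for the Fine KNN and other toolbox classifiers used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_predict(train_fv, train_y, test_fv, k=1):
    """k-NN on Euclidean distance: a simple stand-in for the Fine KNN classifier."""
    # Pairwise squared distances between test and training feature vectors.
    d = ((test_fv[:, None, :] - train_fv[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]          # indices of the k closest samples
    # Majority vote over the k nearest training labels.
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])

# Hypothetical N x S feature vectors (S = 200 after the stacked encoder),
# with three gait classes standing in for bg, cl, and nm.
train_fv = rng.random((30, 200))
train_y = np.repeat(np.arange(3), 10)
preds = knn_predict(train_fv, train_y, train_fv, k=1)
```

With k = 1 and the training set as its own query set, each sample's nearest neighbour is itself, so the predictions reproduce the training labels exactly.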
Dataset description
A publicly available dataset, CASIA-B [60], is used for the assessment of the scheme. This dataset was captured in an indoor environment from 124 subjects. It has a total of 11 view angles with a gap of 18°. Each video has a resolution of 352×240 pixels and is recorded at 25 fps. Each subject has a total of 10 sequences: six normal-walk sequences (nm), two sequences carrying a bag (bg), and two sequences wearing a coat (cl). There is no standard partition of the CASIA-B dataset, so two types of experiments are performed to allow comparison with the closest similar literature. The first type of experiment is the frontal-view analysis, in which the frontal-view data is divided into a gallery set and a probe set across three subsets, subset-1 to subset-3. In subset-1, the first four sequences nm-1 to nm-4 are used as the gallery view and nm-5 to nm-6 as the probe view. In subset-2, nm-1 to nm-4 are the gallery view and cl-1 to cl-2 the probe view. In subset-3, nm-1 to nm-4 are the gallery view and bg-1 to bg-2 the probe view. In the second type of experiment, six angles of CASIA-B (0°, 18°, 36°, 54°, 72°, and 90°) are used to test the method, with a 70:30 ratio to train and test the model; three classes, bg, nm, and cl, are considered.
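The gallery/probe partitions described above can be written out compactly; the sequence naming below follows the CASIA-B convention, but the exact string format is illustrative:

```python
# Per-subject CASIA-B sequences: nm-01..nm-06, bg-01..bg-02, cl-01..cl-02.
sequences = [f"nm-{i:02d}" for i in range(1, 7)] + \
            [f"bg-{i:02d}" for i in range(1, 3)] + \
            [f"cl-{i:02d}" for i in range(1, 3)]

# Gallery: the first four normal-walk sequences, shared by all three subsets.
gallery = [s for s in sequences if s in {"nm-01", "nm-02", "nm-03", "nm-04"}]

# Probe sets for the three frontal-view experiments.
probe = {
    "subset-1": ["nm-05", "nm-06"],   # normal walk
    "subset-2": ["cl-01", "cl-02"],   # wearing a coat
    "subset-3": ["bg-01", "bg-02"],   # carrying a bag
}
```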
Results and analysis
This section offers the results of the assessment of the proposed HGR framework. After the extraction of the FV, various classifiers, such as Fine KNN (FKNN), Subspace KNN (SKNN), Weighted KNN (WKNN), Bagged Tree (BTREE), Cubic SVM (CSVM), Linear Discriminant (LD), and Subspace Discriminant (SD), are used for final classification. In the final classification, K-fold cross-validation with K = 10 is used when feeding the final FV to the different classifiers. A total of nine experiments are performed, of which three are on subset-1 to subset-3 and six on different angles of CASIA-B (0°, 18°, 36°, 54°, 72°, and 90°), to evaluate the proposed method.
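The 10-fold protocol can be sketched as follows; contiguous folds are an assumption, since the paper does not specify the fold scheme:

```python
def kfold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k contiguous folds.

    Earlier folds absorb the remainder, so fold sizes differ by at most one.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(100, k=10)   # 10 folds of 10 samples each
```

Each fold serves once as the validation set while the remaining nine train the classifier.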
The hardware used for the experiments comprises an Intel Core i5 9th-generation CPU, an NVIDIA GTX 1050 Ti GPU with 4 GB of memory, and 16 GB of RAM.
Classification results of the proposed stacked-gait on frontal view analysis
This section offers the experimental results on the frontal view, analyzed through three experiments on subset-1, subset-2, and subset-3, respectively. The ACR is calculated on two classifiers, LD and SD. Subset-1 comprises two parts of data, a gallery view and a probe view: nm-1 to nm-4 of all 124 subjects are used as the gallery view and nm-5 to nm-6 as the probe view. For subset-1, ACRs of 96.30 and 93.60 percent are obtained on LD and SD, respectively. The results are also verified with the help of the confusion matrix shown in Fig 3(a) and 3(b).
Experiments are also performed on subset-2, which likewise comprises a gallery view and a probe view. In this set, the gallery view is the same as in subset-1, but the probe view is based on sequences cl-1 to cl-2. On subset-2, ACRs of 95.80 percent on LD and 91.50 percent on SD are achieved. Result verification for subset-2 is also performed using the confusion matrix shown in Fig 4(a) and 4(b).
Experiments are also performed on subset-3, which is based on a gallery view and a probe view. The gallery view is the same as in subset-1 and subset-2, but the probe view is based on sequences bg-1 to bg-2. On subset-3, LD obtained an ACR of 95.90 percent and SD an ACR of 89.70 percent. The results are also validated using the confusion matrix illustrated in Fig 5(a) and 5(b).
The comparison of the Stacked-Gait with the recent techniques for the frontal view is also performed as given in Table 4.
An HGR method named GaitSet was introduced to address the problem of covariates [61]; the aim was a method immune to frame permutation. Evaluated on the CASIA-B frontal view, it obtained ACRs of 90.80, 83.80, and 61.40 percent on nm, bg, and cl, respectively, and an average Rank-1 ACR of 87.10 percent on OU-MVLP. Considering the spatio-temporal expression of each part of the human body, a method named GaitPart was employed [62]. First, part-level spatial features based on fine-grained learning are enhanced using a new convolution layer called the Focal Convolution Layer. Second, a Micro-motion Capture Module (MCM) is designed for each body part to capture relevant features and eliminate redundant ones. The method was tested on CASIA-B and OU-MVLP, achieving 88.70 percent ACR on OU-MVLP and, on the CASIA-B frontal view, 94.10, 89.10, and 70.70 percent ACR on nm, bg, and cl, respectively.
A multi-modal feature learning and representation method based on frontal-view walking sequences was also implemented [63]. The features are represented using two characterizations: dense optical flow and the holistic silhouette. Pedestrian regions are extracted using an improved YOLOv7 called Gait YOLO, and global walking features are extracted by an encoder that enables multi-modal fusion. The method was tested on two datasets, CASIA-B and OU-MVLP. On OU-MVLP, it obtained an ACR above 80 percent; on the CASIA-B frontal view, it achieved ACRs of 96.90, 93.50, and 77.80 percent on nm, bg, and cl respectively. The proposed Stacked-Gait obtained 96.30, 95.90, and 95.80 percent on nm, bg, and cl respectively. The comparison results in Table 4 show that the proposed Stacked-Gait outperforms the recent techniques for frontal-view analysis.
Classification results of the proposed Stacked-Gait on 00° to 90°
This section gives detailed results on six angles of CASIA-B: 00°, 18°, 36°, 54°, 72°, and 90°. Each angle has three classes: bg, cl, and nm. A separate experiment is carried out on each angle to validate the method against covariate factors.
Classification results of the proposed Stacked-Gait on the 00° angle.
The experimental results on the 00° angle of CASIA-B are presented in Table 5. The highest ACR of 99.90 percent is attained on FKNN; the remaining classifiers SKNN, WKNN, BTREE, and CSVM obtain 97.20, 96.90, 96.80, and 96.70 percent respectively. The highest REC of 99.90 percent is also reached on FKNN, with SKNN, WKNN, BTREE, and CSVM at 97.23, 96.97, 96.80, and 96.77 percent respectively. The highest PR of 99.33 percent is obtained on FKNN; SKNN, WKNN, BTREE, and CSVM obtain 97.20, 96.93, 96.83, and 96.73 percent respectively. The maximum AUC of 100 percent is obtained on FKNN, while SKNN, WKNN, BTREE, and CSVM obtain 99.0, 99.67, 99.33, and 99.0 percent respectively.
The results are also verified with the help of a confusion matrix. The confusion matrix for FKNN, which obtained the best results on the 00° angle of CASIA-B, is illustrated in Fig 6. True positive rates (TPR) of 99.80, 100, and 99.90 percent are obtained for bg, cl, and nm respectively, and the corresponding false negative rates (FNR) are 0.20, 0.00, and 0.10 percent.
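The TPR and FNR values reported throughout this section are derived directly from confusion-matrix counts. A minimal sketch of that derivation (the matrix below is hypothetical, assuming 1000 test samples per class; it is chosen only to illustrate the computation, not taken from Fig 6):

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted
# class, ordered bg, cl, nm, assuming 1000 test samples per class.
cm = np.array([
    [998,    2,    0],   # bg
    [  0, 1000,    0],   # cl
    [  1,    0,  999],   # nm
])

tpr = np.diag(cm) / cm.sum(axis=1)  # per-class true positive rate (recall)
fnr = 1.0 - tpr                     # per-class false negative rate
acr = np.trace(cm) / cm.sum()       # overall accuracy (ACR)
```

For these illustrative counts, the per-class TPRs are 99.8, 100, and 99.9 percent, the FNRs are 0.2, 0.0, and 0.1 percent, and the ACR is 99.9 percent.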
Classification results of the proposed Stacked-Gait on the 18° angle.
The experimental outcomes on the 18° angle of CASIA-B are described in Table 6. The highest ACR of 98.10 percent is obtained on CSVM; FKNN, SKNN, WKNN, and BTREE achieve 92.70, 92.80, 89.0, and 91.20 percent respectively. The maximum REC of 98.10 percent is obtained on CSVM, with FKNN, SKNN, WKNN, and BTREE at 92.70, 92.83, 92.37, and 91.33 percent respectively. The highest PR of 98.10 percent is found on CSVM; FKNN, SKNN, WKNN, and BTREE obtain 92.70, 92.87, 87.27, and 91.27 percent respectively. The maximum AUC of 99.33 percent is obtained on CSVM, while FKNN, SKNN, WKNN, and BTREE obtain 94.33, 97.67, 97.67, and 98.0 percent respectively.
The results are also verified with the help of a confusion matrix. The confusion matrix for CSVM, which obtained the best results on the 18° angle of CASIA-B, is shown in Fig 7. TPRs of 97.30, 98.40, and 98.60 percent are attained for bg, cl, and nm respectively, and the corresponding FNRs are 2.70, 1.60, and 1.40 percent.
Classification results of the proposed Stacked-Gait on the 36° angle.
The experimental outcomes on the 36° angle of CASIA-B are presented in Table 7. The highest ACR of 97.20 percent is reached by employing CSVM; FKNN, SKNN, WKNN, and BTREE achieve 86.56, 85.57, 81.0, and 85.30 percent respectively. The maximum REC of 97.17 percent is reached on CSVM, with FKNN, SKNN, WKNN, and BTREE at 86.48, 85.73, 80.90, and 85.27 percent respectively. The highest PR of 97.17 percent is obtained using CSVM; FKNN, SKNN, WKNN, and BTREE obtain 86.53, 85.50, 81.43, and 85.37 percent respectively. The best AUC of 99.67 percent is achieved on CSVM, while FKNN, SKNN, WKNN, and BTREE obtain 89.67, 94.33, 94.33, and 95.33 percent respectively.
The results are also verified with the help of a confusion matrix. The confusion matrix for CSVM, which obtained the best results on the 36° angle of CASIA-B, is demonstrated in Fig 8. TPRs of 96.90, 96.30, and 98.30 percent are attained for bg, cl, and nm respectively, and the corresponding FNRs are 3.10, 3.70, and 1.70 percent.
Classification results of the proposed Stacked-Gait on the 54° angle.
The experimental results on the 54° angle of CASIA-B are presented in Table 8. The maximum ACR of 97.20 percent is reached by both FKNN and CSVM; SKNN, WKNN, and BTREE reach 91.0, 87.40, and 87.10 percent respectively. The maximum REC of 97.20 percent is likewise obtained on FKNN and CSVM, with SKNN, WKNN, and BTREE at 90.97, 87.33, and 87.03 percent respectively.
The highest PR of 97.23 percent is attained on FKNN and CSVM; the remaining classifiers SKNN, WKNN, and BTREE obtain 91.0, 87.40, and 87.07 percent respectively. The greatest AUC of 99.67 percent is found on FKNN and CSVM, while SKNN, WKNN, and BTREE obtain 97.67, 95.67, and 96.0 percent respectively.
The results are also verified using a confusion matrix. The confusion matrix for FKNN, which obtained the best results on the 54° angle of CASIA-B, is demonstrated in Fig 9. TPRs of 96.70, 96.70, and 98.20 percent are reached for bg, cl, and nm respectively, and the corresponding FNRs are 3.30, 3.30, and 1.80 percent.
Classification results of the proposed Stacked-Gait on the 72° angle.
Table 9 shows the experimental results on the 72° angle of CASIA-B. The maximum ACR of 96.70 percent is achieved by CSVM; FKNN, SKNN, WKNN, and BTREE reach 79.30, 79.0, 77.60, and 84.60 percent respectively. The highest REC of 96.70 percent is achieved on CSVM, with FKNN, SKNN, WKNN, and BTREE at 79.23, 78.90, 77.37, and 84.47 percent respectively. The highest PR of 96.73 percent is attained on CSVM; FKNN, SKNN, WKNN, and BTREE obtain 79.30, 79.10, 78.43, and 84.73 percent respectively. The highest AUC of 99.0 percent is reached on CSVM, while FKNN, SKNN, WKNN, and BTREE obtain 84.0, 91.0, 91.67, and 96.0 percent respectively.
The results are also confirmed using a confusion matrix. The confusion matrix for CSVM, which obtained the best results on the 72° angle of CASIA-B, is demonstrated in Fig 10. TPRs of 94.90, 96.30, and 98.90 percent are achieved for bg, cl, and nm respectively, and the corresponding FNRs are 5.10, 3.70, and 1.10 percent.
Classification results of the proposed Stacked-Gait on the 90° angle.
Table 10 presents the experimental results on the 90° angle of CASIA-B. The maximum ACR of 100 percent is achieved by CSVM; FKNN, SKNN, WKNN, and BTREE reach 99.70, 79.90, 81.40, and 97.70 percent respectively. The highest REC of 100 percent is obtained on CSVM, with FKNN, SKNN, WKNN, and BTREE at 99.67, 76.57, 81.17, and 97.63 percent respectively. The maximum PR of 100 percent is attained on CSVM; FKNN, SKNN, WKNN, and BTREE obtain 99.70, 79.20, 81.10, and 97.73 percent respectively. The greatest AUC of 100 percent is computed on FKNN and BTREE, while SKNN, WKNN, and CSVM obtain 91.67, 93.0, and 99.67 percent respectively.
The results are also validated using a confusion matrix. The confusion matrix for CSVM, which attained the best results on the 90° angle of CASIA-B, is demonstrated in Fig 11. A TPR of 100 percent is achieved for bg, cl, and nm alike, with an FNR of 0.0 percent for all three classes.
The proposed method is also compared with recent techniques in Table 11.
Using transfer learning, an HGR method was implemented in [64]. The technique is based on feature extraction from two pre-trained deep learning models. Feature optimization is carried out using the Harris Hawks controlled Sine-Cosine method. Classification is performed using different classifiers, obtaining ACRs of 96.70, 99.90, 98.70, 98.10, 85.50, 89.50, 93.30, 87.90, 91.30, 94.70, and 100 percent on all angles of CASIA-B from 00° to 180°.
In [65], a new model-free approach for HGR was implemented. Region estimation is performed using optical flow and cropped dynamic coordinates. A pre-trained MobileNetV2 is then fine-tuned and used for feature extraction. In the third step, the feature vectors extracted from the optical flow and cropped dynamic coordinate frames are fused; after fusion, the vector is optimized and classification is finally carried out. CASIA-A, B, and C were used for system testing, obtaining ACRs on CASIA-B of 90.70 percent on 00°, 92.47 percent on 18°, 90.40 percent on 36°, 90.67 percent on 54°, 90.90 percent on 72°, 92.60 percent on 90°, 92.60 percent on 108°, 92.87 percent on 126°, 90.20 percent on 144°, 92.67 percent on 162°, and 92.70 percent on 180°. The ACRs obtained for CASIA-A and C are 99.60 and 95.02 percent respectively.

In another method [66], a multi-modal technique for HGR based on pose and silhouette features was presented. Set-level spatial and temporal features are extracted with the help of a set-transformer model. The system was assessed on two HGR datasets, CASIA-B and GREW. On CASIA-B, it achieved ACRs of 90.47 percent on 00°, 95.33 percent on 18°, 96.16 percent on 36°, 94.70 percent on 54°, 92.67 percent on 72°, 90.30 percent on 90°, 92.20 percent on 108°, 94.20 percent on 126°, 94.33 percent on 144°, 94.93 percent on 162°, and 88 percent on 180°, while an ACR of 82.51 percent was obtained on GREW.

In [67], local and global entropy features are utilized to build a robust HGR method. Classification is carried out using an extracted feature vector of gait dynamics and a deep CNN. The method was tested on the 90° angle of CASIA-B and obtained an ACR of 86 percent. In [68], the human skeleton is used for HGR via spatial and temporal features.
Redundant features are removed to enhance system performance. CASIA-B and OUMVLP-Pose were used to check the viability of the system, obtaining ACRs of 79.7 percent on 00°, 81.40 percent on 18°, 82.80 percent on 36°, 83.0 percent on 54°, 82.10 percent on 72°, 80.80 percent on 90°, 82.40 percent on 108°, 80.80 percent on 126°, 79.70 percent on 144°, 80.0 percent on 162°, and 76.80 percent on 180°. On OUMVLP-Pose, the method obtained an ACR of 91.0 percent.
Khan et al. [69] offered a technique for HGR called IACO. It comprises four steps: video normalization, modification of two CNN models (Inceptionv3 and ResNet101), feature extraction with the trained CNN models, and final recognition using CSVM. The system was verified on CASIA-B, and ACRs of 92.0, 93.9, and 96.70 percent were obtained on the 00°, 18°, and 180° angles respectively. In [57], Shopon et al. offered a novel method to minimize the degradation that occurs due to covariate factors. Spatial and temporal features are extracted with the help of residual connections in a graph convolutional neural network. The system was tested on CASIA-B and obtained ACRs of 87.93 percent on 00°, 91.14 percent on 18°, 90.93 percent on 36°, 89.77 percent on 54°, 88.81 percent on 72°, 89.78 percent on 90°, 89.21 percent on 108°, 89.97 percent on 126°, 91.02 percent on 144°, 90.96 percent on 162°, and 89.38 percent on 180°. In the case of the proposed Stacked-Gait, ACRs of 99.90, 98.10, 97.20, 97.20, 96.70, and 100 percent are accomplished on the 00°, 18°, 36°, 54°, 72°, and 90° angles of CASIA-B. It can be seen from Tables 4 and 11 that Stacked-Gait outperforms the recent systems.
Discussion
A hybrid HGR model is designed to address the problem of HGR classification. The model is rigorously tested through different experiments. Its final design is based on tuning selected hyperparameters: the number of AEs, the number of epochs, and the number of hidden layers. Testing is performed on six angles of CASIA-B, each with three classes, and a separate experiment is conducted on each angle to generalize the results of the model. Moreover, the model is tested on the frontal view of CASIA-B through experiments on subset-1, subset-2, and subset-3. Across a total of nine experiments, it is evident that the Stacked-Gait model works well for gait analysis even when the input data differs each time. Several evaluation measures are used to assess the model, namely REC, PR, AUC, and ACR, and the results are verified with confusion matrices. Using different ML classifiers, a maximum ACR of 96.0 percent is obtained on LD for frontal-view analysis. The experimental results on the 00° angle show a maximum ACR of 99.90 percent, REC of 99.90 percent, PR of 99.33 percent, and AUC of 100 percent, all on FKNN; the maximum TPRs of 99.80, 100, and 99.90 percent are obtained for bg, cl, and nm respectively on FKNN. The outcomes on the 18° angle show a maximum ACR of 98.10 percent, REC of 98.10 percent, PR of 98.10 percent, and AUC of 99.33 percent, all on CSVM; the maximum TPRs on 18° are 97.30, 98.40, and 98.60 percent for bg, cl, and nm respectively.
The experimental outcomes on 36° show the highest ACR of 97.20 percent, REC of 97.17 percent, PR of 97.17 percent, and best AUC of 99.67 percent, all on CSVM. The confusion matrix for CSVM demonstrates maximum TPRs of 96.90, 96.30, and 98.30 percent for bg, cl, and nm respectively.
The experimental results on the 54° angle show a maximum ACR of 97.20 percent, a maximum REC of 97.20 percent, and a highest PR of 97.23 percent on FKNN and CSVM. The maximum AUC of 99.67 percent is also obtained on FKNN and CSVM. FKNN obtained maximum TPRs of 96.70, 96.70, and 98.20 percent for bg, cl, and nm respectively.
The maximum results on the 72° angle are an ACR of 96.70 percent, REC of 96.70 percent, PR of 96.73 percent, and AUC of 99.0 percent, all on CSVM. The maximum TPRs of 94.90, 96.30, and 98.90 percent are computed on CSVM for bg, cl, and nm respectively. The experiments on the 90° angle show a maximum ACR of 100 percent, highest REC of 100 percent, and maximum PR of 100 percent, all attained on CSVM, while the highest AUC of 100 percent is obtained on FKNN and BTREE. The confusion matrix for CSVM shows that a TPR of 100 percent is achieved for all three classes: bg, cl, and nm.
Analyzing the results, it is observed that the CSVM classifier works well in the majority of the experiments, while the SKNN classifier performs the worst among the classifiers. The comparison shows that Stacked-Gait achieves 6 percent better results for frontal-view analysis of CASIA-B, and on the 00° to 90° angles it obtains an average improvement of 4.85 percent over recent techniques.
While the proposed method outperforms recent techniques, it also has some limitations. Its performance is affected when image frames contain no gait information. This can be addressed by eliminating empty or low-detail image frames.
Conclusion and future work
HGR is the discipline of biometrics applied to identify an individual by exploiting the pattern of walking. In this work, a new hybrid model for HGR is implemented to overcome the challenges of viewing variations, clothing changes, and carrying conditions. In HGR, researchers face numerous issues that cause recognition problems, such as people walking normally, wearing a coat, or clutching belongings. Here, a stacked-autoencoder and ML-based hybrid model is used for HGR, in which the autoencoder serves as a feature descriptor through its encoder part. Initially, the video sequences are transformed into image frames. These images are then converted to grey-scale, and the dataset is divided into training and testing sets. Next, two autoencoders are trained, and their encoder parts are used for feature extraction. The final feature vector is fed into different ML classifiers for recognition. Six CASIA-B angles are used to evaluate the system, which achieves a recognition rate of more than 90 percent. The attained identification rate is superior to recent approaches: examining the results, it is evident that the proposed HGR features are effective, and on the CASIA-B dataset the performance of the proposed technique is superior on several angles compared to current works.
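The pipeline summarized above can be sketched in miniature. This is only an illustration under stated assumptions: a tiny single-hidden-layer autoencoder trained by plain gradient descent on synthetic stand-in data, with hypothetical layer sizes, not the architecture or hyperparameters used in the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyAutoencoder:
    """One-hidden-layer autoencoder; the encoder half doubles as a feature extractor."""

    def __init__(self, n_in, n_hidden, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.lr = lr

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def fit(self, X, epochs=300):
        n = len(X)
        for _ in range(epochs):
            H = self.encode(X)                     # hidden code
            R = H @ self.W2 + self.b2              # linear reconstruction
            err = (R - X) / n                      # gradient of 0.5*MSE w.r.t. R
            dH = err @ self.W2.T * H * (1.0 - H)   # backprop through the sigmoid
            self.W2 -= self.lr * (H.T @ err)
            self.b2 -= self.lr * err.sum(axis=0)
            self.W1 -= self.lr * (X.T @ dH)
            self.b1 -= self.lr * dH.sum(axis=0)
        return self

# Stand-in for flattened grey-scale gait frames: 120 train / 30 test samples of 64 "pixels".
rng = np.random.default_rng(1)
X_train, X_test = rng.random((120, 64)), rng.random((30, 64))

# Train the first AE on the raw images and the second AE on the first AE's codes,
# then stack the two encoders to extract the final feature vectors from test data.
ae1 = TinyAutoencoder(64, 32).fit(X_train)
ae2 = TinyAutoencoder(32, 16).fit(ae1.encode(X_train))
features = ae2.encode(ae1.encode(X_test))  # (30, 16) feature matrix
```

The `features` matrix would then be handed to a classifier, trained on features extracted from the training set in the same stacked-encoder fashion.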
In the future, this model can be applied to classification problems in other domains and to other HGR datasets such as CASIA-A and CASIA-C. Furthermore, Shapley Additive exPlanations (SHAP) and Layer-wise Relevance Propagation (LRP) can be applied to visualize and analyze different components of the input data and various aspects of the gait cycle, helping to identify the most important components of human gait.
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00277907) and by the Technology Innovation Program (No. 20022899) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
References
- 1. Murray M.P., Drought A.B., and Kory R.C., Walking patterns of normal men. 1964. 46(2): p. 335–360.
- 2. Zhang, R., C. Vogler, and D. Metaxas. Human gait recognition. in 2004 Conference on Computer Vision and Pattern Recognition Workshop. 2004. IEEE.
- 3. Katz J.N., et al., Degenerative lumbar spinal stenosis: Diagnostic value of the history and physical examination. 1995. 38(9): p. 1236–1241.
- 4. Jellinger K., et al., Neuropathology of Rett syndrome. 1988. 76(2): p. 142–158.
- 5. Wildes R.P., Iris recognition: an emerging biometric technology. Proceedings of the IEEE, 1997. 85(9): p. 1348–1363.
- 6. Sharif M., et al., Face recognition using edge information and DCT. 2015. 43(2).
- 7. Duta N., A survey of biometric technology based on hand shape. Pattern Recognition, 2009. 42(11): p. 2797–2806.
- 8. Jain, A.K., et al. Integrating faces, fingerprints, and soft biometric traits for user recognition. in International Workshop on Biometric Authentication. 2004. Springer.
- 9. Kusakunniran W., et al., Gait recognition under various viewing angles based on correlated motion regression. IEEE transactions on circuits and systems for video technology, 2012. 22(6): p. 966–980.
- 10. Deng M., et al. View-Invariant Gait Recognition Based on Deterministic Learning and Knowledge Fusion. in 2019 International Joint Conference on Neural Networks (IJCNN). 2019. IEEE.
- 11. Bashir, K., T. Xiang, and S. Gong, Gait recognition using gait entropy image. 2009.
- 12. Wang C., et al., Human identification using temporal information preserving gait template. 2011. 34(11): p. 2164–2176.
- 13. Connor P. and Ross A., Biometric recognition by gait: A survey of modalities and features. Computer Vision and Image Understanding, 2018. 167: p. 1–27.
- 14. Makihara, Y., et al. Gait recognition using a view transformation model in the frequency domain. in Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7–13, 2006, Proceedings, Part III 9. 2006. Springer.
- 15. Zeng W., Wang C., and Li Y., Model-based human gait recognition via deterministic learning. Cognitive Computation, 2014. 6(2): p. 218–229.
- 16. Verma P., Sah A., and Srivastava R., Deep learning-based multi-modal approach using RGB and skeleton sequences for human activity recognition. Multimedia Systems, 2020. 26(6): p. 671–685.
- 17. Wang J. and Peng K., A multi-view gait recognition method using deep convolutional neural network and channel attention mechanism. Computer Modeling in Engineering & Sciences, 2020. 125(1): p. 345–363.
- 18. Meng C., et al., Gait recognition based on 3D human body reconstruction and multi-granular feature fusion. 2023. 79(11): p. 12106–12125.
- 19. Li X., et al., Joint Intensity Transformer Network for Gait Recognition Robust Against Clothing and Carrying Status. 2019. 14(12): p. 3102–3115.
- 20. Li X., et al., Joint intensity transformer network for gait recognition robust against clothing and carrying status. IEEE Transactions on Information Forensics and Security, 2019. 14(12): p. 3102–3115.
- 21. Song C., et al., GaitNet: An end-to-end network for gait based human identification. 2019. 96: p. 106988.
- 22. Song C., et al., GaitNet: An end-to-end network for gait based human identification. Pattern Recognition, 2019. 96: p. 106988.
- 23. Kovač J., Štruc V., and Peer P., Frame–based classification for cross-speed gait recognition. Multimedia Tools and Applications, 2019. 78(5): p. 5621–5643.
- 24. Khan M.A., et al., Improved strategy for human action recognition; experiencing a cascaded design. 2019. 14(5): p. 818–829.
- 25. Khan, M.A., et al., Human action recognition using fusion of multiview and deep features: an application to video surveillance. 2020: p. 1–27.
- 26. Wang, X., et al., Human Gait Recognition Based on Self-adaptive Hidden Markov Model. 2019.
- 27. Gevers, T., J. Van De Weijer, and H. Stokman, Color feature detection. 2006.
- 28. Khan M.A., et al., License number plate recognition system using entropy-based features selection approach with SVM. 2017. 12(2): p. 200–209.
- 29. Rehman A., et al., Microscopic melanoma detection and classification: A framework of pixel‐based fusion and multilevel features reduction. 2020. 83(4): p. 410–423.
- 30. Saba T., et al., Region extraction and classification of skin cancer: A heterogeneous framework of deep CNN features fusion and reduction. 2019. 43(9): p. 289.
- 31. Połap D. and Woźniak M., The Use of Wavelet Transformation in Conjunction with a Heuristic Algorithm as a Tool for Feature Extraction from Signals. Information Technology and Control, 2017. 46(3): p. 372–381.
- 32. Woźniak M., et al., Graphic object feature extraction system based on cuckoo search algorithm. Expert Systems with Applications, 2016. 66: p. 20–31.
- 33. Ryu, J. and S.-i. Kamata. Front view gait recognition using spherical space model with human point clouds. in 2011 18th IEEE International Conference on Image Processing. 2011. IEEE.
- 34. Abdullah, B.A. and E.-S.M. El-Alfy. Statistical Gabor-based gait recognition using region-level analysis. in 2015 IEEE European Modelling Symposium (EMS). 2015. IEEE.
- 35. Khan M.W., et al., A new approach of cup to disk ratio based glaucoma detection using fundus images. 2016. 20(1): p. 77–94.
- 36. Nida N., et al., A framework for automatic colorization of medical imaging. 2016. 7: p. 202–209.
- 37. Castro, F.M., et al., Automatic Learning of Gait Signatures for People Identification, in Advances in Computational Intelligence, I. Rojas, G. Joya, and A. Catala, Editors. 2017, Springer International Publishing: Cham. p. 257–270.
- 38. Khan M.A., et al., HGRBOL2: human gait recognition for biometric application using Bayesian optimization and extreme learning machine. 2023. 143: p. 337–348.
- 39. Jahangir F., et al., A Fusion-Assisted Multi-Stream Deep Learning and ESO-Controlled Newton–Raphson-Based Feature Selection Approach for Human Gait Recognition. 2023. 23(5): p. 2754.
- 40. Alotaibi M. and Mahmood A., Improved gait recognition based on specialized deep convolutional neural network. Computer Vision and Image Understanding, 2017. 164: p. 103–110.
- 41. Li C., et al., DeepGait: A Learning Deep Convolutional Representation for View-Invariant Gait Recognition Using Joint Bayesian. Applied Sciences, 2017. 7(3): p. 210.
- 42. Arshad H., et al., A multilevel paradigm for deep convolutional neural network features selection with an application to human gait recognition. 2020: p. e12541.
- 43. Deng M., et al., Human gait recognition based on deterministic learning and knowledge fusion through multiple walking views. 2020. 357(4): p. 2471–2491.
- 44. Mehmood A., et al., Prosperous Human Gait Recognition: an end-to-end system based on pre-trained CNN features selection. Multimedia Tools and Applications, 2020.
- 45. Sokolova, A. and A. Konushin, Gait recognition based on convolutional neural networks. ISPRS—International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017. XLII-2/W4: p. 207–212.
- 46. Zhang, Z., et al. Gait Recognition via Disentangled Representation Learning. in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. IEEE.
- 47. Asif M., et al., Human gait recognition subject to different covariate factors in a multi-view environment. 2022. 15: p. 100556.
- 48. Huang, X., et al., STAR: Spatio-Temporal Augmented Relation Network for Gait Recognition. 2022.
- 49. Yousef R.N., et al., Proposed methodology for gait recognition using generative adversarial network with different feature selectors. 2024. 36(4): p. 1641–1663.
- 50. He Z., et al., Temporal sparse adversarial attack on sequence-based gait recognition. 2023. 133: p. 109028.
- 51. Cai S., et al., Gait phases recognition based on lower limb sEMG signals using LDA-PSO-LSTM algorithm. 2023. 80: p. 104272.
- 52. Khan, M.H., et al. Gait Recognition Using Motion Trajectory Analysis. 2018. Springer International Publishing.
- 53. Nithyakani P. and Ferni Ukrit M., Deep multi-convolutional stacked capsule network fostered human gait recognition from enhanced gait energy image. Signal, Image and Video Processing, 2024. 18(2): p. 1375–1382.
- 54. Peng Y., et al., Learning rich features for gait recognition by integrating skeletons and silhouettes. 2024. 83(3): p. 7273–7294.
- 55. Castro F.M., et al., AttenGait: Gait recognition with attention and rich modalities. 2024. 148: p. 110171.
- 56. Yao L., et al., Robust gait recognition using hybrid descriptors based on skeleton gait energy image. 2021. 150: p. 289–296.
- 57. Shopon M., Bari A., and Gavrilova M.L., Residual connection-based graph convolutional neural networks for gait recognition. The Visual Computer, 2021. 37(9): p. 2713–2724.
- 58. Liu, G., H. Bao, and B. Han, A stacked autoencoder-based deep neural network for achieving gearbox fault diagnosis. Mathematical Problems in Engineering, 2018.
- 59. Wang, S. and C. Manning. Fast dropout training. in International Conference on Machine Learning. 2013. PMLR.
- 60. Zheng, S., et al. Robust view transformation model for gait recognition. in 2011 18th IEEE International Conference on Image Processing. 2011. IEEE.
- 61. Chao, H., et al. GaitSet: Regarding gait as a set for cross-view gait recognition. in Proceedings of the AAAI Conference on Artificial Intelligence. 2019.
- 62. Fan, C., et al. GaitPart: Temporal part-based model for gait recognition. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
- 63. Deng M., et al., Human Gait Recognition Based on Frontal-View Walking Sequences Using Multi-modal Feature Representations and Learning. 2024. 56(2): p. 133.
- 64. Hanif C.A., et al., Human Gait Recognition for Biometrics Application Based on Deep Learning Fusion Assisted Framework. 2024. 78(1).
- 65. Hanif, C.A., et al., Human Gait Recognition Based on Sequential Deep Learning and Best Features Selection. Computers, Materials & Continua, 2023.
- 66. Li G., et al., TransGait: Multimodal-based gait recognition with set transformer. 2023. 53(2): p. 1535–1547.
- 67. Deng M., et al., Human gait recognition by fusing global and local image entropy features with neural networks. 2022. 31(1): p. 013034.
- 68. Gao S., et al., Gait‐D: skeleton‐based gait feature decomposition for gait recognition. 2022. 16(2): p. 111–125.
- 69. Khan, A., et al., Human gait recognition using deep learning and improved ant colony optimization. 2022.