Speed invariant gait recognition—The enhanced mutual subspace method

This paper introduces an enhanced MSM (Mutual Subspace Method) for gait recognition that provides robustness to variations in walking speed. The enhanced MSM (eMSM) expands and adapts the MSM, commonly used for face recognition, which is a static/physiological biometric, to gait recognition, which is a dynamic/behavioral biometric. To address the loss of accuracy during calculation of the covariance matrix in the PCA step of MSM, we use a 2D PCA-based mutual subspace. Furthermore, to enhance the discrimination capability, we rotate images over a number of angles, which enables us to extract richer gait features that are then fused by a boosting method. The eMSM is evaluated on existing data sets that provide variable walking speed, i.e. the CASIA-C and OU-ISIR gait databases, and it is shown to outperform state-of-the-art methods. While the enhancement to MSM discussed in this paper uses a combination of 2D PCA, rotation, and boosting, other combinations of operations may also be advantageous.


Introduction
Biometric authentication has provided a large number of opportunities for security systems and forensic applications. Gait belongs to the class of "behavioral" biometrics. Gait-based person recognition does not require interaction with the subjects and works from a distance, thus making it attractive to law enforcement agencies. For example, in the United Kingdom a gait recognition system used CCTV imagery to provide court evidence in a case of burglary [1].
Gait recognition methods which extract features from images captured by cameras have produced good results. Several techniques to extract gait features have been proposed, such as the gait energy image (GEI) [2], affine moment invariants [3], the active energy image [4], the gait flow image [5], and the frame difference frieze pattern [6]. The first two techniques can be considered gait features containing human shape information rather than motion information, while the last two extract motion information explicitly. Among them, GEI, which is the average of the silhouette images over one gait cycle, has received the most attention due to its high performance and simple implementation. However, since image-based gait recognition is sensitive to changes in appearance caused by various factors, such as clothing and walking-direction changes, the correct classification rate is low when the subject's appearance differs from that in the database. Such low performance may occur even when only the subject's walking speed differs from that in the database. A walking speed change causes variations in pitch and stride, which result in non-invariant gait features. Various approaches have been proposed to tackle this issue [7,8]; however, their performance is not sufficient. For example, on a challenging data set such as the OU-ISIR Gait Speed Transition Dataset [8], Mansur et al. reported a correct classification rate of 84 [%]. This leaves 16 [%] misclassification, which is still high. Walking speed can change during motion, and it may be consciously modified to adapt to the context, such as slowing down when approaching a red light or accelerating when there are only a few seconds left to cross an intersection. It would thus be useful to extract speed-invariant information.
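Since GEI is simply the per-pixel average of silhouettes over one gait cycle, it can be sketched in a few lines. The following is a minimal illustration in plain NumPy, with toy 2×2 "silhouettes" standing in for real data; it is not the implementation of [2].

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Compute a GEI: the pixel-wise mean of binary silhouette
    images over one gait cycle (values end up in [0, 1])."""
    stack = np.asarray(silhouettes, dtype=float)  # shape (T, H, W)
    return stack.mean(axis=0)

# Toy example: two 2x2 "silhouette" frames
frames = [np.array([[1, 0], [1, 1]]),
          np.array([[1, 0], [0, 1]])]
gei = gait_energy_image(frames)
# Pixels that are foreground in every frame get value 1.0;
# pixels foreground in half the frames get 0.5.
```

In a real pipeline the frames would be the height-normalized silhouettes of one detected gait cycle.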
In a preliminary study we proposed to use a mutual subspace method (MSM) [9,10], an image set-based matching approach used for face recognition (a static/physiological biometric), to capture speed-invariant information [11]. This method involves dividing the human body area into multiple regions and using local movement information [12]. The preliminary experimental evaluations on challenging gait data sets for speed transition (OU-ISIR Gait Speed Transition Dataset [8], OU-ISIR Treadmill Dataset A [13], and CASIA-C Dataset [14]) showed promise [11,12]; however, the performance of the MSM-based method diminishes when the speed mismatches are large.
Time-series images of each class in a gallery dataset are given as input to MSM, and a model of each class is obtained as a linear subspace. The same procedure is applied to images in an input probe dataset. To obtain the model, 2D images are first transformed into vectors, and principal component analysis (PCA) is applied to calculate subspaces from the vectors. Here, we refer to both PCA and spectral decomposition (autocorrelation-based PCA) as PCA. Similarities between gallery and input subspaces are calculated as squared cosines of canonical angles between the two subspaces, and these similarities are used for recognition. However, as Yang et al. pointed out [15], due to the large size of the vectors, the covariance matrix in the PCA step cannot be calculated accurately. This causes a performance decrease in gait recognition.
In this paper we propose an enhanced MSM to improve accuracy and robustness to speed variations. Existing approaches to handling speed variation can be divided into two categories: (i) transforming gait features to a common speed and (ii) extracting speed-invariant gait features. In the "common speed" approach, Tanawongsuwan and Bobick proposed a stride normalization procedure for double-support frames [7], and Kusakunniran et al. proposed a gait shape description, which conserves discriminative information in the gait signature [16].
In the "speed-invariant features" approach, various methods have been proposed to extract speed-invariant gait features [11, 12, 17–19]. Xu et al. proposed a method based on single-support gait images, whose appearance is not changed by speed variations [18]. Guan and Li employed a Random Subspace Method (RSM)-based method [17], which reduces generalization errors by combining a large number of weak classifiers. Khan et al. proposed a spatiotemporal motion-based method [19], which is not affected by appearance changes.
In a preliminary study, we proposed a mutual subspace method (MSM)-based approach robust to speed variations [11], with work continued in [12] by dividing the human body area into multiple regions, followed by adaptive choice of the areas with high discrimination capability. When the MSM-based method with divided areas [12] is applied to relatively high-resolution gait images (e.g. the human height is more than 100 pixels), we achieved better performance than the MSM-based method [11]. However, for low-resolution images, the performance of the MSM-based method with divided areas degrades, due to the loss of discrimination capability in each divided area.
One of the common issues in existing approaches is that the performance suffers from large speed mismatch. The enhanced MSM proposed in this paper addresses this challenge and achieves state-of-the-art performance in speed invariant gait recognition.

Deep learning-based gait recognition
Deep learning-based approaches have been applied to gait recognition and have achieved state-of-the-art performance [20,21]. Most of these methods focus on view-invariant gait recognition, since many data sets contain large view variation [22]. However, these approaches are not suitable for use with small data sets.

Mutual subspace method (MSM)
This section reviews MSM and its application to gait recognition. It explains the relation between MSM and GEI, followed by the rationale for why MSM works well in gait recognition.

The use of MSM in gait recognition
An overview of the gait recognition based on MSM is shown in Fig 1. Assume that there is a person (probe) to be recognized and that there are multiple people registered in the database as a gallery data set. To recognize the person with MSM, time-series gait silhouette images are used as input. In general each silhouette area is scaled to a uniform height. From gallery images of class c and input images of the person, similarities between the class c and the subject are calculated based on MSM. This process is repeated for all classes in the database.
In this paper we use the following definition to calculate correct classification and misclassification rates. A person is classified by choosing the class with the highest similarity among all classes. This classification process is applied to all people in the probe dataset, and the numbers of true positives (TP) and false positives (FP) are obtained. The correct classification and misclassification rates are calculated as TP/(TP + FP) and FP/(TP + FP), respectively.
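The classification rule above can be sketched as follows, assuming a precomputed matrix of MSM similarities; the similarity values here are invented for illustration.

```python
import numpy as np

def classification_rates(similarity, labels):
    """similarity: (num_probes, num_classes) matrix of MSM similarities;
    labels: true class index of each probe. A probe counts as a true
    positive when its highest-similarity class equals its label."""
    predicted = np.argmax(similarity, axis=1)
    tp = int(np.sum(predicted == labels))
    fp = len(labels) - tp
    ccr = tp / (tp + fp)          # correct classification rate
    return ccr, 1.0 - ccr         # (CCR, misclassification rate)

# Three probes, three gallery classes; the third probe is misclassified.
sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.2],
                [0.4, 0.5, 0.3]])
labels = np.array([0, 1, 2])
ccr, mcr = classification_rates(sim, labels)
```

With the toy matrix above, two of three probes are correct, so the CCR is 2/3.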

Mutual subspace method (MSM)
MSM is an image set-based matching approach. MSM models template images of each class in a gallery dataset and input images in a probe dataset as subspaces. The similarity measure in MSM is the squared cosine of a canonical angle between each of the gallery subspaces and the input subspace. Details of MSM are as follows.
Let us assume a $C$-class pattern recognition problem. 2D images $I^G_{c,n}$ ($1 \le n \le N^G_c$, where $N^G_c$ is the number of images in class $c$ in the gallery (G) dataset) are transformed into 1D vectors $x^G_{c,n}$. Eigenvalues $\lambda^G_{c,d_G}$ and eigenvectors $\phi^G_{c,d_G}$ of the class-$c$ gallery samples ($1 \le d_G \le D_G$, where $D_G$ is the dimensionality of the gallery subspace) are obtained from the class autocorrelation matrix $\Gamma^G_c = \frac{1}{N^G_c}\sum_{n=1}^{N^G_c} x^G_{c,n} {x^G_{c,n}}^{\top}$ as

$$\Gamma^G_c \,\phi^G_{c,d_G} = \lambda^G_{c,d_G}\, \phi^G_{c,d_G}. \qquad (1)$$

The same procedure is applied to the input images $I^P_n$ ($1 \le n \le N_P$, where $N_P$ is the number of images of each subject in the probe (P) dataset) to obtain eigenvectors $\psi_{d_P}$ ($1 \le d_P \le D_P$, where $D_P$ is the dimensionality of the input subspace). The similarities in MSM, which are squared cosines of canonical angles between the two subspaces, are calculated as the eigenvalues of the following matrix [23]:

$$Z_c = \Phi^{G\top}_c \Psi \Psi^{\top} \Phi^G_c, \qquad (2)$$

where $\Phi^G_c = [\phi^G_{c,1}, \ldots, \phi^G_{c,d_G}]$ and $\Psi = [\psi_1, \ldots, \psi_{d_P}]$.
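Numerically, the squared cosines of the canonical angles can equivalently be obtained as the squared singular values of the product of the two orthonormal basis matrices. The NumPy sketch below is our illustration of this computation, not the authors' code; the random data stands in for flattened gait silhouettes.

```python
import numpy as np

def subspace_basis(vectors, dim):
    """Orthonormal basis (columns) of the span of the sample vectors.
    `vectors` is (D, N): N sample vectors of dimension D."""
    u, _, _ = np.linalg.svd(vectors, full_matrices=False)
    return u[:, :dim]

def msm_similarities(gallery_vecs, probe_vecs, d_g, d_p):
    """MSM similarities: squared cosines of canonical angles between the
    gallery and probe subspaces, i.e. squared singular values of
    Phi^T Psi for orthonormal bases Phi and Psi."""
    phi = subspace_basis(gallery_vecs, d_g)
    psi = subspace_basis(probe_vecs, d_p)
    s = np.linalg.svd(phi.T @ psi, compute_uv=False)
    return s ** 2  # sorted in descending order

rng = np.random.default_rng(0)
g = rng.standard_normal((50, 10))       # 10 sample vectors of dim 50
sims = msm_similarities(g, g, 3, 3)     # identical sets -> all angles are 0
```

Comparing an image set with itself gives identical subspaces, so every similarity equals 1; dissimilar subjects yield smaller values.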
We define the similarities as $s^{(r,d_G,d_P)}_c$ ($1 \le r \le R$, where $R$ is the maximum number of canonical angles used in the recognition process and is less than or equal to the rank of $Z_c$; in existing methods $r = 1$ is commonly used). Intuitively, the canonical angles between the gallery and input subspaces are the angles which minimize the appearance difference between the gallery and input data. The corresponding eigenvectors $\omega^{(r,d_G,d_P)}_c$ are also calculated. The dimensionalities of the gallery and input subspaces, $d_G$ and $d_P$, are influential parameters. Roughly speaking, the dimensionality is proportional to the variation in the gallery and input images. However, when the dimensionality increases too much, recognition accuracy gets worse because of the increasing number of intersections among subspaces. Generally, such dimensionalities are determined by cross-validation experiments.

The principal components can also be visualized as images. These images suggest that the first principal component represents human shape, such as the head and torso, while the following principal components represent motion information. Comparing the first principal component in Fig 2 with the GEI in Fig 3, we can see that both are very similar. From a linear-algebra perspective, GEI can be regarded as a representation of gait images with a 0-dimensional affine subspace. In the definition of an affine space, a point is zero-dimensional; GEI lies in a 0-dimensional subspace, since at each pixel a value is calculated as the average of the time-series values (i.e. removing time information). The GEI-based gait recognition method can thus be interpreted as a subset of class-featuring information compression (CLAFIC) [24], one of the subspace methods, with a 0-dimensional affine subspace. It is therefore natural to assume that a d-dimensional subspace may perform better than the 0-dimensional one (i.e. GEI). MSM is categorized as a d-dimensional subspace method.

A relation between MSM and GEI
To show how the d-dimensional subspace method works, we visualize the canonical vectors corresponding to each canonical angle defined in Eq 2. From the definition of Eq 2, the $r$th canonical angle can be regarded as the angle between the gallery canonical vector $v^G_{c,r}$ of class $c$ and the input canonical vector $v^P_{c,r}$ of the subject. The input canonical vector is obtained as $v^P_{c,r} = \Psi \Psi^{\top} \Phi^G_c\, \omega^{(r,d_G,d_P)}_c$ (normalized to unit length), where the columns of $\Phi^G_c$ and $\Psi$ are the gallery and input subspace basis vectors; the gallery canonical vector is calculated in the same manner as $v^G_{c,r} = \Phi^G_c\, \omega^{(r,d_G,d_P)}_c$. The two canonical vectors of the 1st canonical angle represent the most similar vectors between the gallery and subject images.
In 2D principal component analysis (PCA) [15] a covariance matrix is calculated directly from image matrices (i.e. not from the 1D vectors used in regular PCA). Thus the covariance matrix of 2D PCA is smaller than that of PCA, which results in a more accurately estimated covariance matrix. More details are given in the following.
Given $N^G_c$ images $I^G_{c,n}$ ($n = 1, \ldots, N^G_c$), each an $A \times B$ matrix ($A$ and $B$ are the height and width of an image, respectively), in a gallery dataset, an image covariance matrix $S^G_c$ is calculated by

$$S^G_c = \frac{1}{N^G_c} \sum_{n=1}^{N^G_c} \left(I^G_{c,n} - \bar{I}^G_c\right)^{\top} \left(I^G_{c,n} - \bar{I}^G_c\right), \qquad (3)$$

where $\bar{I}^G_c$ is the mean image. Instead of using 1D vectors of images as input to MSM, the 2D-PCA MSM uses vectors projected by 2D PCA. Each image $I^G_{c,n}$ is projected with the eigenvectors $\eta^G_{c,k}$ ($1 \le k \le K$) of $S^G_c$, and a projected vector $\xi^G_{c,n,k} = I^G_{c,n}\, \eta^G_{c,k}$ is obtained, giving a set of projected vectors $\xi^G_{c,n,1}, \ldots, \xi^G_{c,n,K}$. The same process is applied to each input image $I^P_n$ to obtain $\xi^P_{c,n,k} = I^P_n\, \eta^G_{c,k}$. From the projected vectors, we define a gallery projected matrix $\Xi^G_{c,n} = [\xi^G_{c,n,1}, \ldots, \xi^G_{c,n,K}]$ and an input projected matrix $\Xi^P_{c,n} = [\xi^P_{c,n,1}, \ldots, \xi^P_{c,n,K}]$. The set of projected matrices $\Xi^G_{c,1}, \ldots, \Xi^G_{c,N^G_c}$ of class $c$ in the gallery dataset and the set $\Xi^P_{c,1}, \ldots, \Xi^P_{c,N_P}$ in the probe dataset are used as input for MSM. Here, each projected matrix is an $A \times K$ matrix. Similarities (i.e. eigenvalues) $s^{(r,d_G,d_P)}_c$ between the probe dataset and each class $c$ of the gallery dataset are calculated, followed by classification. In Fig 5, the smaller dotted rectangle shows an overview of the proposed 2D-PCA MSM process.
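The 2D PCA projection step can be sketched as follows. This is a minimal NumPy illustration with random arrays standing in for silhouette images, and it assumes mean-centering as in standard 2D PCA.

```python
import numpy as np

def image_covariance(images):
    """Image covariance matrix of 2D PCA: a B x B matrix (B = image
    width), computed directly from image matrices rather than from
    flattened 1D vectors."""
    stack = np.asarray(images, dtype=float)       # shape (N, A, B)
    centered = stack - stack.mean(axis=0)
    # Average over samples of (I - mean)^T (I - mean)
    return np.einsum('nab,nac->bc', centered, centered) / len(stack)

def project(images, cov, k):
    """Project each A x B image onto the top-k eigenvectors of the
    image covariance, yielding an A x K projected matrix per image."""
    _, vecs = np.linalg.eigh(cov)                 # eigenvalues ascending
    eta = vecs[:, ::-1][:, :k]                    # top-k eigenvectors
    return np.asarray(images, dtype=float) @ eta  # shape (N, A, K)

rng = np.random.default_rng(1)
imgs = rng.random((8, 6, 4))                      # 8 toy images, 6x4
cov = image_covariance(imgs)
proj = project(imgs, cov, 2)                      # 8 projected 6x2 matrices
```

Note the covariance matrix is only 4×4 here, versus 24×24 for flattened-vector PCA, which is exactly the size advantage the text describes.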
The projected matrices are smaller than the original images, yet they represent the gait images efficiently. Thus the use of projected matrices as input to MSM has the potential to produce higher performance than the use of the original images.
To show the effectiveness of 2D PCA compared with PCA, we performed preliminary experiments on the OU-ISIR Treadmill Dataset A. Our experiment matched gait images at 7 km/h (gallery) against those at 2 km/h (probe), as shown in Fig 2. We build a classifier $h$ as

$$h = \arg\max_c \; s^{(r,d_G,d_P)}_c. \qquad (4)$$

In the following examples, we set $r = 1$ (i.e. the maximum eigenvalue), and $d_G$ and $d_P$ are defined by cross-validation experiments. As a baseline, we obtained an 88% correct classification rate by MSM. Fig 6(a) shows results of the matching by the proposed 2D PCA-based MSM with respect to changes in the number of eigenvalues $K$ of 2D PCA; values of $K$ between 9 and 21 show improved performance over MSM. In Fig 6(b), the performance of PCA-based MSM is not good, especially for $K < 2700$. One reason for this poor performance, as shown in Fig 8, is that the images reconstructed with $K = 1$ contain blurred regions, especially around the legs and arms. This is a direct result of the less accurate covariance matrix computed by 1D PCA. On the other hand, in Fig 7 the images reconstructed with $K = 1$ by 2D PCA represent simplified information about the subject, such as the length of the legs and the location of the hands/arms. Based on the accuracy shown in Fig 6(a), the performance with $K = 1$ by 2D PCA-based MSM is better than that by 1D PCA-based MSM, suggesting that the simplified information extracted using 2D PCA is more suitable for person recognition than the blurred information produced by 1D PCA.

2D-PCA MSM with Rotated Images (2D-PCA-R MSM)
2D PCA has been proven to be a special case of block-based PCA [25], where, for example, the blocks are horizontal lines of the images. This suggests that the covariance matrix produced by 2D PCA from images differs from that of images rotated by a certain angle. Fig 9(a) shows the horizontal lines used in 2D-PCA MSM. When the lines are rotated by 20 degrees, as shown in Fig 9(b), the covariance matrix calculated from the rotated parallel lines differs from that of the horizontal lines. This results in gait features different from those obtained with horizontal lines. Moreover, horizontal lines rotated by a certain angle capture local biometric information, such as the lengths of arm and leg segments, which is not captured by the horizontal lines. Therefore, similarities from lines rotated over various angles contain richer gait/biometric features than those from the original images alone. Fusing similarities over various rotation angles is thus expected to improve the performance of gait recognition.
Because the implementation is simpler, in this paper we rotate the original image (Fig 9(a)) about the optical axis of the camera instead of rotating the horizontal lines, obtaining a rotated image as shown in Fig 9(c). This is equivalent to using rotated parallel lines on non-rotated images.
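Generating the rotated inputs can be sketched with `scipy.ndimage.rotate`; the bar-shaped test image and the angle set below are illustrative assumptions, not the paper's data.

```python
import numpy as np
from scipy.ndimage import rotate

def rotated_feature_sets(silhouette, angles):
    """Rotate a silhouette image about the image centre (equivalent to
    rotating the parallel lines of block-based 2D PCA) and return one
    image per angle; each rotated image is later fed to 2D-PCA MSM."""
    return {a: rotate(silhouette, angle=a, reshape=False, order=1)
            for a in angles}

# Toy silhouette: a vertical bar in a 32x32 frame
img = np.zeros((32, 32))
img[8:24, 14:18] = 1.0
feats = rotated_feature_sets(img, [-20, -10, 0, 10, 20])
```

`reshape=False` keeps every rotated image at the original resolution, so all of them can share one 2D PCA pipeline.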

Boosted-2D-PCA-R MSM (B-2D-PCA-R MSM)
There are various methods to fuse similarities, such as summation and voting. It has been reported that boosting efficiently fuses the similarities of multiple canonical angles [26]. As discussed in the MSM section, if the dimensionalities $d_G$ and $d_P$ change, the recognition accuracy also changes, and it may likewise vary with the rotation angle. In general, these parameters are determined by cross-validation experiments. In this paper we propose to fuse similarities from various dimensionalities, rotation angles, and canonical angles.
There are in total 4 parameters: the canonical angle $r$ ($1 \le r \le R$, where $R$ is the maximum number of canonical angles), the dimensionalities $d_G$ and $d_P$ ($1 \le d_G \le D_G$ and $1 \le d_P \le D_P$, where $D_G$ and $D_P$ are the dimensionalities of the gallery and input subspaces, respectively), and the rotation angle $\theta$ ($-\Theta \le \theta \le \Theta$, in steps of $\Delta\theta$). Each combination of parameters produces a different similarity, and the proposed method can efficiently choose the combinations of parameters which improve recognition accuracy.
To select parameters contributing to gait recognition, we use AdaBoost [27]. The main steps of the process are as follows:

Step 1. The weight of each probe dataset is initialized as $D_{t=0}(i) = \frac{1}{N_P}$, where $i$ ($1 \le i \le N_P$, with $N_P$ the number of probe datasets) indexes the probes and $t$ ($1 \le t \le T$, with $T$ the maximum number of iterations) indexes the boosting iterations.

Step 2. For each classifier $h^{(r,d_G,d_P,\theta)}$, a weighted error rate is calculated as

$$\epsilon^{(r,d_G,d_P,\theta)}_t = \sum_{i=1}^{N_P} D_{t-1}(i)\, \big[\, h^{(r,d_G,d_P,\theta)}(i) \ne y_i \,\big], \qquad (5)$$

where $y_i$ is the true class of probe $i$ and $[\cdot]$ is 1 when its argument holds and 0 otherwise. This is done for all classifiers (i.e. all combinations of parameters).

Step 3. The classifier with the minimum $\epsilon^{(r,d_G,d_P,\theta)}_t$ among all classifiers is selected as $h^{p_t}_t$ at iteration $t$, where $p_t$ is the selected combination of canonical angle, dimensionalities, and rotation angle at iteration $t$.

Step 4. A confidence $\alpha_t \in \mathbb{R}$ is calculated as

$$\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon^{p_t}_t}{\epsilon^{p_t}_t}. \qquad (6)$$

Step 5. The weights $D_t(i)$ are updated as

$$D_t(i) = \frac{D_{t-1}(i)\exp\big(-\alpha_t\, z_t(i)\big)}{Z_t}, \qquad (7)$$

where $z_t(i) = +1$ if $h^{p_t}_t$ classifies probe $i$ correctly and $-1$ otherwise, and $Z_t$ is a normalization factor so that $D_t$ sums to 1.

Step 6. Update $t$ to $t + 1$ and repeat Steps 2–5 while $t < T$.

Fig 10. Examples of rotated images (0, 10, 30, 50, 70, 90, -10, -30, -50, -70, -90 degrees).

The final classifier $H$ is defined as

$$H = \arg\max_c \sum_{t=1}^{T} \alpha_t \, \mathrm{score}_{t,c}, \qquad (8)$$

where $\mathrm{score}_{t,c} = 1$ if the selected classifier $h^{p_t}_t$ assigns the probe to class $c$, and 0 otherwise.
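The boosting steps above can be sketched as follows. This is a simplified illustration that assumes each candidate classifier's per-probe class predictions are precomputed and keyed by its parameter tuple; the tuples and data below are invented for the example.

```python
import numpy as np

def adaboost_select(classifier_outputs, labels, T):
    """Select, at each iteration, the parameter combination whose
    classifier has minimum weighted error, compute its confidence
    alpha, and re-weight the probes (AdaBoost-style).
    classifier_outputs: dict mapping a parameter tuple to an array of
    predicted class indices, one per probe."""
    n = len(labels)
    D = np.full(n, 1.0 / n)            # Step 1: uniform weights
    selected = []
    for _ in range(T):
        # Step 2: weighted error of every candidate classifier
        errors = {p: float(np.sum(D * (pred != labels)))
                  for p, pred in classifier_outputs.items()}
        p_t = min(errors, key=errors.get)          # Step 3
        eps = max(errors[p_t], 1e-12)              # avoid log(1/0)
        alpha = 0.5 * np.log((1.0 - eps) / eps)    # Step 4
        correct = classifier_outputs[p_t] == labels
        D = D * np.exp(np.where(correct, -alpha, alpha))  # Step 5
        D = D / D.sum()
        selected.append((p_t, alpha))
    return selected

labels = np.array([0, 1, 1, 0])
outputs = {('r=1', 'th=0'):  np.array([0, 1, 1, 0]),   # perfect
           ('r=1', 'th=20'): np.array([1, 1, 0, 0])}   # 50% error
chosen = adaboost_select(outputs, labels, T=2)
```

The final classifier would then sum `alpha * score` over the selected classifiers, as in the final-classifier definition above.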

Experiments
In this section we implement the 2D-PCA MSM, the 2D-PCA-R MSM, and the B-2D-PCA-R MSM, and we evaluate the performance of the proposed methods on gait databases. In our experiments we used the following 3 gait databases: (i) CASIA-C Dataset [14], (ii) OU-ISIR Treadmill Dataset A [13], and (iii) OU-ISIR Gait Speed Transition Dataset [8]. These data sets were chosen because they include speed variability. To check whether the proposed eMSM correctly distinguishes datasets in which people are walking from those in which they do not walk (they perform other activities), we also tested against the KTH action dataset [28].

Gait recognition with the CASIA-C Dataset
The CASIA-C Dataset consists of 153 subjects with 3 different walking speeds and a carrying condition. Since this paper focuses on the walking speed condition, we apply the proposed method to datasets of different walking speeds. The three walking conditions are normal walking (fn), slow walking (fs), and quick walking (fq). For each subject, there are 4 sequences of fn, 2 sequences of fs, and 2 sequences of fq. We used 3 normal walking (fn) sequences as the gallery set, and slow walking (fs) was used as the probe set. The image resolution is 320 × 240.
We performed the following two experiments. The first determines the proper ranges of the 4 parameters R, D_G, D_P, and Θ. The second confirms the effectiveness of the B-2D-PCA-R MSM. To fix parameters which are not trained (e.g., for the second experiment, the number of eigenvalues K of the 2D PCA) and to train the parameters, we used quick walking (fq) data when slow walking (fs) data is used as the probe data, and vice versa.

Train (i) the rotation angle θ while keeping the rest of the parameters fixed, (ii) the canonical angle r while fixing the rest, and (iii) the dimensionalities d_G and d_P while fixing the rest.
First, we train the rotation angle θ while keeping the rest of the parameters fixed, to determine the range of parameter Θ. We used slow walking (fs) as probe data and applied the boosting method to fuse the similarities of all rotation angles from −Θ to Θ [degrees]. Here, we set two different angle intervals: Δθ = 10 for |θ| ∈ [20, 90] (i.e. −90 ≤ θ ≤ −20 and 20 ≤ θ ≤ 90), and Δθ = 2 for −20 ≤ θ ≤ 20. We set the canonical angle R = 1 (i.e. the maximum eigenvalue). The correct classification rate of the B-2D-PCA-R MSM is 99.02%, and the trained parameters at each iteration of the boosting process are shown in Fig 12(a). At the first iteration −10 degrees is selected, and after that 20 degrees is selected continuously. None of the rotation angles from −90 to −30 and from 30 to 90 is selected. Therefore, in the following experiments we set Θ = 20 degrees.
We also evaluated the classifier h(θ) at each θ (i.e. we used the 2D-PCA-R MSM). Fig 12(b) and 12(c) show correct classification rates for |θ| ∈ [20, 90] (−90 ≤ θ ≤ −20 and 20 ≤ θ ≤ 90, Δθ = 10) and for −20 ≤ θ ≤ 20 (Δθ = 2), respectively. These results show that recognition accuracy is relatively high between −20 and 20 degrees. The highest recognition accuracy is 99.02%, which matches the result obtained by the boosting method.

Next, we train the canonical angle r while keeping the rest of the parameters fixed, to determine the range of parameter R. We set R = 20 and the rotation angle Θ = 0. The correct classification rate of the boosting method is 99.35%, and Fig 13(a) shows the trained canonical angles at each iteration of the boosting process. Based on this result, in the following experiments we set the parameter R to 18.
Lastly, to fix the parameters D_G and D_P, we varied d_G and d_P from 10 to 70 (in steps of 10). We set the rotation angle and the canonical angle to Θ = 0 and R = 1, respectively. The correct classification rate of the boosting method is 98.7%, and Fig 13(b) shows the trained dimensionalities. Based on this result, in the following experiments we set the parameters D_G and D_P to 50.
Select all parameters r, d_G, d_P, and θ. In the last experiment we fused similarities over all parameters r, d_G, d_P, and θ. We set the canonical angle r from 1 to 18, d_G and d_P from 10 to 50 in steps of 10, and the rotation angle θ from −20 to 20 degrees in 2-degree increments. The correct classification rate is 100.0% after 3 iterations, as shown in Fig 14(a), and the use of the full 4 parameters in the B-2D-PCA-R MSM produces the highest accuracy (100%), as shown in Table 1(d). This result shows that the B-2D-PCA-R MSM is robust to speed changes. Fig 14(b) and 14(c) show the selected 4 parameters. Here, robustness to variation of walking speed means that classification accuracy remains high for a diversity of walking speeds of the individuals in the gallery and probe datasets.

Comparison with state-of-the-art methods
Gait recognition with the CASIA-C Dataset. As in the previous section, we used normal walking (fn) sequences as the gallery set, and both slow walking (fs) and quick walking (fq) sequences as the probe set. Table 2 shows the correct classification rates of the B-2D-PCA-R MSM, the spatiotemporal motion-based method [19], the RSM-based gait recognition [17], the MSM-based method [12], and single-support GEI (SSGEI) [18]. These results show the effectiveness of the proposed method.

Gait recognition with the OU-ISIR Treadmill Dataset A. The OU-ISIR Treadmill Dataset A consists of gait images of 34 people, each with a gallery dataset and a probe dataset. From the dataset specification, 25 subjects are assigned for evaluation while the rest are used for parameter training; the image resolution is 128 × 88. This dataset consists of gait images with speed variations (from 2 km/h to 10 km/h, every 1 km/h) for each gallery and probe dataset. Subjects walked at speeds between 2 km/h and 7 km/h and ran at speeds between 8 km/h and 10 km/h. In our experiments, we used images with speeds between 2 km/h and 7 km/h, and we performed experimental evaluations with all combinations of gallery and probe speeds, i.e. 36 combinations (6 different speeds for each gallery dataset × 6 different speeds for each probe dataset). Table 3 lists the average classification rates of our B-2D-PCA-R MSM (99.89%), the MSM-based method with divided areas [12] (99.78%), and the SSGEI-based method [18] (99.33%). In Table 4, bold numbers and numbers in round brackets show results of cross-speed walking-person identification with our B-2D-PCA-R MSM and the MSM-based method with divided areas [12], respectively. Our method shows slightly better performance than the MSM-based method [12]. Furthermore, the equal error rate (EER) of the proposed method is greatly improved compared with the MSM-based method with divided areas [12].
In the case that the absolute speed difference between the gallery and probe data is less than 2 [km/h], the average EER is 0 for both the B-2D-PCA-R MSM and the MSM-based method with divided areas [12], as shown in Table 5. In the case that the absolute speed difference is greater than or equal to 2 [km/h], the average EER gradually increases for the MSM-based method with divided areas [12], where the maximum average EER is 3.0%. By contrast, the maximum average EER for the B-2D-PCA-R MSM is 0.05%. Fig 15 shows the ROC (Receiver Operating Characteristic) curves between 3 km/h (gallery) and 7 km/h (probe) for the MSM-based method with divided areas [12] and the B-2D-PCA-R MSM. Though this speed combination of gallery and probe data results in the same correct classification rate (100%) for both methods, the ROC curves show that the B-2D-PCA-R MSM is superior to the MSM-based method with divided areas.

Gait recognition with the OU-ISIR Gait Speed Transition Dataset. The OU-ISIR Gait Speed Transition Dataset consists of two different datasets, dataset 1 and dataset 2. In dataset 1 the probe set consists of speed-transited gait sequences recorded from 26 subjects. The gallery dataset consists of gait sequences of 179 subjects, which include the 26 probe subjects, where the subjects walked at a constant speed (4 km/h) for a few seconds. In dataset 2 the probe set consists of 25 subjects, where each subject walked twice on the treadmill under the following conditions: (i) acceleration from 1 km/h to 5 km/h and (ii) deceleration from 5 km/h to 1 km/h. For the gallery dataset, there are 154 subjects, which include the 25 probe subjects, where each subject walked at a constant speed (4 km/h) for six seconds. In addition to the two datasets explained above, the OU-ISIR Gait Speed Transition Dataset has an auxiliary training set, which includes 24 subjects at various walking speeds (2, 3, 4, and 5 km/h). The parameters of our method were learned using the auxiliary training set, which contains images of resolution 32 × 22. We perform evaluation as follows. A set of images for the duration of one gait cycle is used for the evaluation of both gallery and probe datasets. For dataset 2, we divided each gait sequence into multiple gait sequences so that each of them includes gait images for the duration of one gait cycle. This is the same evaluation setting as in [8]. Table 6 shows the EER (equal error rate) and CCR (correct classification rate) of the MSM-based method [11] and the B-2D-PCA-R MSM for dataset 1. Here, we did not apply the MSM-based method with divided areas [12] due to the low image resolution, which causes performance degradation. Fig 16 shows the corresponding ROC curves. Since the information needed to plot ROC curves is not available for [8], we cannot show their results in this paper; for more details, please refer to [8].
In [11], we reported that the MSM-based method [11] outperformed Mansur's method [8]. The results in Table 6 and Fig 16 show that the performance of the B-2D-PCA-R MSM is better than that of the MSM-based method. Table 7 shows the EER and CCR of Mansur's method [8], the MSM-based method [11], and the B-2D-PCA-R MSM for dataset 2 accelerations and decelerations. From these results, it is clear that the B-2D-PCA-R MSM outperformed the existing methods.

Experiments with KTH action dataset
In this section we used the KTH action dataset [28] to evaluate if the proposed eMSM is able to correctly detect the cases when subjects are not walking, but doing something else (we refer to this as "not-walking" data). The KTH action dataset includes data from six activities (walking, jogging, running, boxing, hand waving and hand clapping) with 25 subjects, separated into train, validation, and test data (7, 7, and 8 subjects, respectively). Each subject has 4 data sets.
To evaluate eMSM with the KTH dataset, we calculated specificities and sensitivities (defined as TN / (TN + FP) and TP / (TP + FN), respectively) while changing the threshold of the eMSM. As training data, we used the OU-ISIR Treadmill Dataset A (gallery, 2 [km/h]), and as test data we used the combination of the OU-ISIR Treadmill Dataset A (probe, 7 [km/h]) and the KTH dataset. The reason we included the OU-ISIR Treadmill Dataset A in the test data is to avoid the denominators being 0 during the calculation of specificities and sensitivities. We used the whole KTH test dataset (in total 192 data sets (= 6 × 8 × 4)) in addition to the OU-ISIR Treadmill Dataset A probe data (25 data sets). The resulting curve of the B-2D-PCA-R MSM lies very close to the top-left area, which means that the B-2D-PCA-R MSM is robust to "not-walking" data sets.
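The threshold sweep can be sketched as follows, using the standard definitions specificity = TN / (TN + FP) and sensitivity = TP / (TP + FN); the scores and labels below are synthetic stand-ins for MSM similarities.

```python
import numpy as np

def spec_sens_curve(scores, is_walking, thresholds):
    """For each similarity threshold, a probe is accepted as 'walking'
    when its best MSM similarity exceeds the threshold. Returns a list
    of (specificity, sensitivity) pairs, one per threshold."""
    curve = []
    for t in thresholds:
        accept = scores > t
        tp = np.sum(accept & is_walking)    # walking, accepted
        fn = np.sum(~accept & is_walking)   # walking, rejected
        tn = np.sum(~accept & ~is_walking)  # not walking, rejected
        fp = np.sum(accept & ~is_walking)   # not walking, accepted
        curve.append((tn / (tn + fp), tp / (tp + fn)))
    return curve

scores = np.array([0.9, 0.8, 0.2, 0.1])        # two walking, two not
walking = np.array([True, True, False, False])
curve = spec_sens_curve(scores, walking, [0.5])
```

A perfectly separating threshold yields (1.0, 1.0), i.e. the top-left area of the curve discussed above.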

Conclusion
We proposed an enhanced mutual subspace method (eMSM) for gait recognition that is robust to variations in walking speed. The proposed method applies three operations consecutively: first, 2D PCA is utilized to calculate the covariance matrix with high accuracy, which enables us to represent subspaces of the original images efficiently; second, rotated images are introduced to extract rich gait information; third, boosting is applied to adaptively fuse the multiple gait features. We carried out experiments using three public gait databases (CASIA-C Dataset, OU-ISIR Treadmill Dataset A, and OU-ISIR Gait Speed Transition Dataset), and we demonstrated that the proposed method outperforms state-of-the-art methods.
There are other operations to improve the performance, such as Random-Forest-based feature fusion and flipping images to extract rich gait features. These are left as future work.
The proposed method showed advantages in gait recognition with speed variation, which is one type of appearance change. It has the potential to be applied to other types of appearance change, such as clothing variation. In the future we will therefore try to develop a method which recognizes people under both clothing and speed variations.