Face recognition algorithm using extended vector quantization histogram features

In this paper, we propose a face recognition algorithm based on a combination of vector quantization (VQ) and Markov stationary features (MSF). The VQ algorithm has been shown to be an effective method for generating features; it extracts a codevector histogram as a facial feature representation for face recognition. Still, the VQ histogram features are unable to convey spatial structural information, which to some extent limits their usefulness in discrimination. To alleviate this limitation of VQ histograms, we utilize Markov stationary features (MSF) to extend the VQ histogram-based features so as to add spatial structural information. We demonstrate the effectiveness of our proposed algorithm by achieving recognition results superior to those of several state-of-the-art methods on publicly available face databases.


Introduction
Face recognition, a typical biometric identification technology, is now recognized as an essential technology for establishing secure control. It has attracted much attention from researchers and engineers over the past decades owing to its wide range of applications in many fields, including information security, identity authentication, law enforcement, smart cards, access control systems and so forth. The entire face recognition procedure consists primarily of two operations: feature extraction and classifier design. These two steps have a substantial influence on the effectiveness and reliability of various recognition approaches. Regarding feature extraction, various face representation approaches have been discussed and studied, and all these approaches can be roughly divided into two categories: appearance-based methods and feature-based methods.
Appearance-based models use transformations and statistical methods to project samples from a high-dimensional space into a much lower-dimensional feature subspace and extract holistic features that represent the face. The Eigenfaces (PCA) [1] and Fisherfaces (LDA) [2,3] approaches are two of the most representative subspace techniques. The Eigenfaces approach is based on the Karhunen-Loeve transform and produces an expressive subspace for facial representation and recognition. The Fisherfaces approach is a supervised subspace analysis technique that searches for the projection directions that are optimal for discrimination. More recently, extensions of PCA- and LDA-based face recognition have been studied and applied, such as Two-dimensional PCA (2DPCA) [4] and Two-dimensional LDA (2DLDA) [5]. Eigenfaces and Fisherfaces are based on one-dimensional image vectors. In contrast, these newer approaches operate directly on the two-dimensional (2D) image matrix, so no image-to-vector transformation is needed. There are also many other methods. Non-negative Matrix Factorization (NMF) [6], unlike PCA and LDA, is designed to capture the part-based structures inherent in the face image space; it obtains a data representation under non-negativity constraints. Locality Preserving Projections (LPP) [7] is an alternative to PCA: a linear subspace method that tries to optimally preserve local neighbourhood information. Structure-Preserved Projections (SPP) [8] takes the holistic context of a face into account and preserves the configural structure of each face image in the subspace.
The above subspace methods directly consider whole-face images as input patterns. Feature-based methods, in contrast, are based on the relationships between local facial features such as the eyes, mouth and nose. Several feature-based methods are commonly used. In the Local Binary Patterns (LBP) method [9], feature histograms are extracted from small regions of the facial image by comparing each pixel with its neighbouring pixels. The Histograms of Oriented Gradients (HOG) method [10,11] has been shown to be an effective descriptor for object recognition in general and is particularly effective in face recognition tasks. Elastic Bunch Graph Matching (EBGM) [12] was proposed to recognize objects or object classes in an image based on a graph representation extracted from other images. Scale Invariant Feature Transform (SIFT) [13] detects and describes local image features that are invariant to scale, translation and rotation. Other features can also be used for face recognition. The Discrete Cosine Transform (DCT) [14] has been used as a feature extraction step in various face recognition studies. Discriminative Common Vectors (DCV) [15] is a face recognition approach based on a variation of Fisher's Linear Discriminant Analysis for small sample sizes.
Aside from the above approaches, many other techniques exist for face recognition, such as:
- Sparse Representation Classification (SRC) [16]
- Linear Regression Classification (LRC) [17]
- Vector Projection Classification (VPC) [18]
- Nearest Distance Classifiers (NDC) [19]
- the Bayesian Classifier (BC) [20]
- Support Vector Machines (SVM) [15]
- Convolutional Neural Networks (CNN) [21,22]

Among these, CNNs have become one of the most popular techniques in recent years. There are numerous CNN-based methods. One is AlexNet, one of the largest CNNs used in the ILSVRC-2010 competition [23]. Another is CenterlossNet [21], an optimized CNN architecture that uses a new supervision signal called center loss; the discriminative deep features extracted from CenterlossNet have achieved excellent performance on several important face recognition benchmarks. Table 1 provides a summary of the acronyms and references of the algorithms mentioned above.
Table 1. Acronyms, advantages, limitations and references of the face recognition algorithms mentioned above.
PCA, Principal Component Analysis (Eigenfaces) [1]. Limitations: contains no class information on the input data; fails to capture high-order statistics.
LDA, Linear Discriminant Analysis (Fisherfaces) [2,3]. Limitations: suffers from the small sample size problem; fails to capture high-order statistics.
2DPCA, Two-dimensional PCA [4]. Advantages: can directly extract the matrix features of 2D images. Limitations: requires more coefficients than PCA for image representation.
2DLDA, Two-dimensional LDA [5]. Advantages: can directly extract the matrix features of 2D images; more computationally efficient and stable than LDA. Limitations: requires more coefficients than LDA for image representation.
NMF, Non-negative Matrix Factorization [6]. Advantages: can capture important local differences.
LPP, Locality Preserving Projections [7]. Advantages: can find the intrinsic low-dimensional nonlinear manifold structure hidden in the observation space.
SPP, Structure-Preserved Projections [8]. Advantages: can preserve the configural structure of facial images in the subspace; robust to variations such as head pose, lighting condition and facial expression.
LBP, Local Binary Patterns [9]. Advantages: simple calculation; good for extracting the local texture features of a face image; invariant to rotation and grey-scale. Limitations: sensitive to noise; features contain no shape information.
HOG, Histograms of Oriented Gradients [10,11]. Advantages: invariant to illumination and 2D rotation. Limitations: not robust to scale changes.
EBGM, Elastic Bunch Graph Matching [12]. Advantages: can model a face as a 2D elastic graph. Limitations: not robust to changes in expression and illumination; high computation cost.
SIFT, Scale Invariant Feature Transform [13]. Advantages: robust to rotation and scale changes. Limitations: complex calculation.
DCT, Discrete Cosine Transform [14]. Advantages: can be implemented using a fast algorithm. Limitations: applications for under-sampled data are limited.
DCV, Discriminative Common Vectors [15]. Advantages: can handle the small sample size problem. Limitations: linear technique, inadequate to describe the complexity of face images caused by facial variations.

However, most of the face representation approaches mentioned above have limitations, including computational cost, and can become quite complex. Many appearance-based face recognition techniques work well in controlled environments. In real-world applications, however, the number of available training samples is often limited. Consequently, these techniques have difficulty handling substantial facial variations, such as changes in illumination, pose, accessories and expression, and analysing samples of new classes. For example, the conventional methods PCA and LDA rely on component analysis techniques. For classification purposes, LDA is generally considered superior to PCA when sufficient training samples per individual are available [2]. However, the experimental analysis in [24] indicates that PCA outperforms LDA when the number of training samples per individual is much smaller than required. In particular, when only one training sample per individual is available, LDA cannot be readily applied because the within-class scatter cannot be estimated.

Moreover, most feature-based methods appear to be inadequate when nonrepresentative training samples are given. For example, the HOG descriptor [10] is not robust to scale changes, and the original LBP operator [9] is sensitive to noise and varies with rotation. The applications of these methods in uncontrolled environments are therefore limited.

Other face recognition techniques are complex and computationally expensive, and are therefore unsuitable for processing the large numbers of training face images that practical applications often require. For example, Wright et al. [16] proposed an innovative face recognition algorithm called SRC, in which the occlusion matrix is an orthogonal matrix and the number of atoms required is very large. A large occlusion matrix can make the sparse coding process very computationally expensive or even prohibitive.

Furthermore, most existing face representation approaches are handcrafted and usually require strong prior knowledge for manual tuning. Therefore, in this paper, we focus on a feature-based algorithm and propose a simple yet effective approach for face recognition. A reliable algorithm called the Vector Quantization (VQ) histogram method was developed in [25] to extract VQ histogram features of facial images for face recognition.
This method has shown excellent face recognition performance on the publicly available ORL face database [26]. However, VQ histogram features cannot convey spatial structural information or account for interactions between different facial sub-regions. This greatly limits their discrimination capability, especially on larger face databases. To overcome this limitation, we propose a novel recognition algorithm based on multiple image sub-regions, called MSR-MSF-VQ. The key contributions of this paper are as follows:

1. Li et al. proposed a framework in [27] called MSF and utilized the framework to extend color histogram-based features with local location structure information. Based on this concept, we make full use of this framework and combine it with the VQ algorithm to incorporate spatial structural information into the VQ histogram. The obtained MSF extended VQ histogram features can be applied to face recognition and achieve satisfactory experimental results.
2. We propose the MSR-MSF-VQ algorithm based on the original MSF-VQ algorithm. The important characteristic of this extended algorithm is that it can consider the interactions of multiple different facial image sub-regions and thus preserve the significant location structure information and the spatial relationships of the facial sub-regions in the final feature information. The extended MSR-MSF-VQ features generated by our proposed algorithm can significantly improve face recognition performance.
3. During the process of face recognition, we can obtain satisfactory recognition results by simply transferring the comparison of two facial images to the comparison of two corresponding MSR-MSF-VQ features by using distance as a dissimilarity measure. We also make use of the advantages of the SVM classifier to further optimize the recognition performance.
The rest of the paper is organized as follows. First, we review the VQ and MSF approaches and then introduce the proposed MSR-MSF-VQ algorithm in detail. Next, we present the experimental results and analysis from tests on six well-known public face databases. Finally, conclusions are drawn.

Related algorithms
This section briefly reviews the VQ and MSF algorithms.
Vector quantization (VQ) histogram method. Feature extraction plays a crucial role in the face recognition process. VQ [25] is a widely used data compression model and an effective feature extraction algorithm that can extract VQ histogram features for face recognition. Here, we present a brief description of the VQ algorithm.
As shown in Fig 1, we first preprocess the input image with a simple 2-D moving average filter. This low-pass filtering is essential because it eliminates high-frequency noise and retains the low-frequency components most useful for recognition.

Next, a block division step divides the input image into 4×4-pixel blocks with a 15/16 overlap, using a window that slides across the image one pixel at a time. The minimum intensity in each 4×4-pixel block is then found and subtracted from each pixel in the block. This compensates for brightness variations while preserving the intensity variations within the block for further feature extraction.

The resulting blocks are taken as input vectors and matched against a codebook containing 33 codevectors. We compute the Manhattan distances between each input vector and the codevectors and select the codevector with the minimum distance as the match. After VQ has been performed for all blocks in the image, a VQ histogram is generated by counting how often each codevector is matched.

In the registration procedure, we apply the VQ algorithm to all the facial images in a set and use their VQ histogram features as personal identification information to establish a face database. In the recognition procedure, the VQ histogram of an unknown facial image is compared with the registered VQ histograms, and the best match is output as the recognition result. The experimental results in [25] demonstrated the effectiveness of the VQ histogram algorithm.
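The VQ histogram extraction described above can be sketched in plain Python. This is a minimal illustration, not the authors' ANSI C implementation, and it rests on several assumptions:
- The function names are hypothetical.
- The moving-average window size is a free parameter (3 is an arbitrary default).
- Image borders are handled by shrinking the window.
- The 33-entry codebook must be supplied by the caller, because its design is not described here.

```python
from typing import List, Sequence

Image = List[List[float]]


def moving_average(img: Image, size: int = 3) -> Image:
    """2-D moving-average (low-pass) filter; the window shrinks at the borders."""
    h, w, r = len(img), len(img[0]), size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def block_vectors(img: Image, b: int = 4) -> List[List[float]]:
    """b x b blocks taken at every pixel position (stride 1), each shifted so
    that its minimum intensity becomes 0 (brightness compensation)."""
    vecs = []
    for y in range(len(img) - b + 1):
        for x in range(len(img[0]) - b + 1):
            block = [img[y + i][x + j] for i in range(b) for j in range(b)]
            m = min(block)
            vecs.append([v - m for v in block])
    return vecs


def nearest_codevector(vec: Sequence[float],
                       codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codevector with the smallest Manhattan (L1) distance."""
    dists = [sum(abs(a - c) for a, c in zip(vec, cv)) for cv in codebook]
    return dists.index(min(dists))


def vq_histogram(img: Image, codebook: Sequence[Sequence[float]],
                 filter_size: int = 3) -> List[int]:
    """Filter, divide into blocks, match each block, and count the matches."""
    hist = [0] * len(codebook)
    for vec in block_vectors(moving_average(img, filter_size)):
        hist[nearest_codevector(vec, codebook)] += 1
    return hist


flat = [[5.0] * 6 for _ in range(6)]
toy_codebook = [[0.0] * 16, [1.0] * 16]  # toy 2-entry codebook, not the real 33
print(vq_histogram(flat, toy_codebook))  # [9, 0]
```

The window slides one pixel at a time, so an H×W image yields (H−3)×(W−3) blocks, and the histogram entries always sum to that number.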
Markov stationary features (MSF). As described in [27], MSF can handle three levels of problems that histograms alone cannot distinguish, and it can therefore alleviate the limitations of histograms. We use this framework to extend the histogram-based features with spatial structural information from the facial image. The following is an overview of Markov stationary features.
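Let $p_k$ denote a pixel in image I. $C = (c_{ij})_{K \times K}$ represents the spatial co-occurrence matrix, each element of which takes the following form:

$$c_{ij} = \#\{(p_1, p_2) \mid p_1 \in c_i,\ p_2 \in c_j,\ \lvert p_1 - p_2 \rvert = d\} \qquad (1)$$

where d denotes the distance between the pixels $p_1$ and $p_2$, and $c_{ij}$ counts the number of spatial co-occurrences for bins $c_i$ and $c_j$. After obtaining the co-occurrence matrix, the corresponding transition matrix $P = (p_{ij})_{K \times K}$ is computed from $C$ using Formula (2), in which $p_{ij}$ denotes the probability of changing from state $c_i$ to $c_j$:

$$p_{ij} = \frac{c_{ij}}{\sum_{j=1}^{K} c_{ij}} \qquad (2)$$

Suppose the state distribution after n steps is $\pi(n)$ and the initial distribution is $\pi(0)$, so that

$$\pi(n) = \pi(0)\,P^{n} \qquad (3)$$

The stationary distribution is an invariant measure of the Markov chain, which can be accumulated by:

$$A_n = \frac{1}{n+1}\left(I + P + P^{2} + \cdots + P^{n}\right), \qquad \pi \approx \text{the average of the row vectors of } A_n \qquad (4)$$

Here, $\pi$ is the stationary distribution that satisfies $\pi = \pi P$. The initial distribution is obtained by normalizing the self-transitions, i.e. the diagonal of $C$:

$$\pi(0) = \frac{(c_{11}, c_{22}, \ldots, c_{KK})}{\sum_{i=1}^{K} c_{ii}} \qquad (5)$$

Finally, the complete MSF feature, which combines the initial distribution defined by Formula (5) and the stationary distribution, is obtained by Formula (6):

$$\mathbf{x} = \left[\pi(0),\ \pi\right] \qquad (6)$$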
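The co-occurrence matrix, the transition matrix of Formula (2), the stationary distribution, and the initial distribution and feature of Formulas (5) and (6) can be sketched directly on a grid of quantization labels. This is a minimal sketch; all function names are illustrative, and the following assumptions are not stated in the text:
- Labels are integers in [0, K).
- Pairs are counted symmetrically along the horizontal direction only.
- Labels that never occur keep an all-zero row in P.
- π is taken as the average of the rows of $A_n$ for labels that actually occur, which keeps π a probability vector.

```python
from typing import List

Matrix = List[List[float]]


def cooccurrence(labels: List[List[int]], K: int, d: int = 1) -> Matrix:
    """Co-occurrence matrix C: symmetric counts of label pairs at horizontal distance d."""
    C = [[0.0] * K for _ in range(K)]
    for row in labels:
        for x in range(len(row) - d):
            a, b = row[x], row[x + d]
            C[a][b] += 1  # count each pair in both orders, so C equals its transpose
            C[b][a] += 1
    return C


def transition(C: Matrix) -> Matrix:
    """Formula (2): row-normalise C; rows of labels that never occur stay zero."""
    P = []
    for row in C:
        s = sum(row)
        P.append([v / s if s else 0.0 for v in row])
    return P


def matmul(A: Matrix, B: Matrix) -> Matrix:
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in A]


def stationary(P: Matrix, n: int = 50) -> List[float]:
    """Cesaro average A_n = (P^0 + ... + P^n) / (n + 1); pi is the mean of
    the rows of A_n that belong to labels present in the image."""
    K = len(P)
    present = [i for i in range(K) if sum(P[i]) > 0]
    if not present:
        return [0.0] * K
    Pk = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
    S = [[0.0] * K for _ in range(K)]
    for _ in range(n + 1):
        S = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(S, Pk)]
        Pk = matmul(Pk, P)
    return [sum(S[i][j] for i in present) / ((n + 1) * len(present))
            for j in range(K)]


def initial(C: Matrix) -> List[float]:
    """Formula (5): normalised self-transitions, i.e. the diagonal of C."""
    diag = [C[i][i] for i in range(len(C))]
    total = sum(diag)
    return [v / total if total else 0.0 for v in diag]


def msf(labels: List[List[int]], K: int, d: int = 1, n: int = 50) -> List[float]:
    """Formula (6): the 2K-dimensional feature [pi(0), pi]."""
    C = cooccurrence(labels, K, d)
    return initial(C) + stationary(transition(C), n)


feature = msf([[0, 0, 0, 1]], K=2)
```

Consider the single label row [0, 0, 0, 1]. Here C = [[4, 1], [1, 0]] and P = [[0.8, 0.2], [1, 0]], whose exact stationary distribution is (5/6, 1/6). With n = 50 the Cesàro average lands within 0.01 of it, and π(0) = (1, 0). Larger n shrinks the O(1/n) bias at the cost of more matrix products.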

The proposed MSR-MSF-VQ algorithm
Although VQ was shown to be effective in [25], there is room for improvement because the original version does not capture any spatial structural information. To address this shortcoming, our earlier work [28] developed the MSF-VQ algorithm by combining MSF with the original VQ algorithm. This encodes spatial structural information into the VQ histogram-based features and improves the accuracy of a face recognition system. The combined MSF-VQ features are the key to the MSF-VQ algorithm and are obtained in the following five steps.
Step 1: Apply simple low-pass filtering to the input facial image to reduce high-frequency noise and extract the low-frequency components most useful for face recognition.
Step 2: Quantize the facial sub-region into 33 levels utilizing the VQ method.
Step 3: Construct a spatial co-occurrence matrix based on a defined distance d.
Step 4: Calculate the Markov transition matrix, which is derived from the spatial co-occurrence matrix.
Step 5: Normalize the self-transition as the initial distribution and combine it with the stationary distribution to obtain the complete MSF-VQ feature.
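Steps 2 and 3 implicitly turn the image into a grid of codevector indices, and the co-occurrence matrix is counted on that grid. The sketch below makes several assumptions:
- Each block's label is stored at the block's top-left position. This is a hypothetical reading that the text does not spell out.
- `vq_label_map` and `histogram` are illustrative names.
- The low-pass filtering of Step 1 has already been applied.

Counting the labels in this map reproduces the original VQ histogram.

```python
from typing import List, Sequence


def vq_label_map(img: List[List[float]],
                 codebook: Sequence[Sequence[float]], b: int = 4) -> List[List[int]]:
    """Replace every b x b block (stride 1) by the index of its nearest
    codevector, giving a grid of labels in [0, len(codebook))."""
    h, w = len(img), len(img[0])
    labels = []
    for y in range(h - b + 1):
        row = []
        for x in range(w - b + 1):
            block = [img[y + i][x + j] for i in range(b) for j in range(b)]
            m = min(block)  # brightness compensation, as in the VQ method
            vec = [v - m for v in block]
            dists = [sum(abs(a - c) for a, c in zip(vec, cv)) for cv in codebook]
            row.append(dists.index(min(dists)))  # Manhattan nearest neighbour
        labels.append(row)
    return labels


def histogram(labels: List[List[int]], K: int) -> List[int]:
    """The plain VQ histogram is just the label count of this map."""
    hist = [0] * K
    for row in labels:
        for k in row:
            hist[k] += 1
    return hist


img = [[0, 0, 0, 0, 9] for _ in range(5)]
toy_codebook = [[0] * 16, [0, 0, 0, 9] * 4]  # toy codebook, not the real 33 entries
print(vq_label_map(img, toy_codebook))  # [[0, 1], [0, 1]]
```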
After the MSF-VQ features are obtained, the face recognition procedure is similar to that of the original VQ algorithm: the comparison of two facial images is transferred to the comparison of their corresponding MSF-VQ features. Our earlier work [28] reported that the MSF-VQ algorithm achieves better recognition results on the ORL database than the original VQ algorithm, demonstrating its effectiveness. However, MSF-VQ features generated from the full facial image still contain no location information about the facial sub-regions. This lack can degrade face recognition performance, especially when the MSF-VQ algorithm is applied to a database larger than ORL, such as FERET. We therefore expected that extending the MSF-VQ algorithm could yield more powerful discrimination capability.

In this paper, we propose an extended version of the MSF-VQ algorithm based on multiple image sub-regions, called MSR-MSF-VQ. MSR-MSF-VQ retains the advantages of MSF-VQ. It also integrates the location information and spatial relationships of facial sub-regions into the features, giving a better facial representation for face recognition. Fig 2 shows the face recognition process using our proposed MSR-MSF-VQ algorithm. More specifically, after normalization, the facial image is first divided into several sub-regions. Fig 3 shows examples of segmenting a face image into several equal-sized, non-overlapping sub-regions.
Then, the MSF-VQ features of each sub-region are extracted independently through the steps and formulas described above. Corresponding sub-regions of two facial images are compared through their MSF-VQ features using a dissimilarity measure. Following [28], we choose the Manhattan distance, computed using Formula (7). The symbols $\vec{h}_r(A)$ and $\vec{h}_r(B)$, where r indexes the sub-regions, denote the MSF-VQ features of sub-region r of facial images A and B from the gallery and probe sets, respectively. The matching results of the different facial sub-regions are combined by weighted averaging to form the MSR-MSF-VQ feature, which can be used in place of the original MSF-VQ features for face recognition. The formula used during the face recognition process is given in Formula (8), in which $w_r$ is the weighting factor of the MSF-VQ feature for sub-region r.
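Formulas (7) and (8) are not reproduced in the text above. One plausible reading, consistent with the description, is a per-sub-region Manhattan distance $D_r = \sum_i \lvert h_r(A)_i - h_r(B)_i \rvert$ combined as a weighted average $\sum_r w_r D_r / \sum_r w_r$. The sketch below follows that reading with hypothetical names and makes two further assumptions:
- The weights $w_r$ are not given in this section, so the uniform weights in the usage example are an assumption.
- Leftover border pixels are dropped when the image size is not divisible by the grid.

```python
from typing import Dict, List, Sequence

Feature = Sequence[float]


def split_regions(img: List[List[float]], rows: int, cols: int) -> List[List[List[float]]]:
    """Equal-sized, non-overlapping sub-regions in row-major order; pixels left
    over when the size is not divisible by the grid are dropped."""
    rh, cw = len(img) // rows, len(img[0]) // cols
    return [[line[c * cw:(c + 1) * cw] for line in img[q * rh:(q + 1) * rh]]
            for q in range(rows) for c in range(cols)]


def manhattan(a: Feature, b: Feature) -> float:
    """Per-sub-region dissimilarity (one reading of Formula (7))."""
    return sum(abs(x - y) for x, y in zip(a, b))


def msr_distance(feats_a: Sequence[Feature], feats_b: Sequence[Feature],
                 weights: Sequence[float]) -> float:
    """Weighted average of the sub-region distances (one reading of Formula (8))."""
    total = sum(w * manhattan(fa, fb) for w, fa, fb in zip(weights, feats_a, feats_b))
    return total / sum(weights)


def identify(probe: Sequence[Feature], gallery: Dict[str, Sequence[Feature]],
             weights: Sequence[float]) -> str:
    """Return the gallery identity with the smallest combined distance."""
    return min(gallery, key=lambda pid: msr_distance(probe, gallery[pid], weights))


gallery = {"A": [[0, 0], [1, 1]], "B": [[5, 5], [5, 5]]}  # toy 2-region features
print(identify([[0, 1], [1, 1]], gallery, weights=[1.0, 1.0]))  # A
```

Non-uniform weights would let more discriminative sub-regions, such as those covering the eyes, dominate the decision. How the weights are chosen is not specified here.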
The related experimental results and comparative analyses using MSR-MSF-VQ algorithm for face recognition are presented in the next section.

Results and discussion
To evaluate the feasibility and performance of our proposed algorithm, we carried out experiments on six standard public face databases: ORL [26,29], FERET [30,31], AR [32], Yale [33], Yale-B [34,35] and CAS-PEAL-R1 [36,37]. These databases contain face images with pose, expression, illumination and occlusion variations. The ORL database was used to choose appropriate parameters for the MSF-VQ algorithm. The FERET, AR, Yale, Yale-B and CAS-PEAL-R1 databases were then used to compare our newly proposed MSR-MSF-VQ algorithm with previous face recognition approaches. The details of the corresponding experiments and results are given in the following subsections. Our proposed algorithm was implemented in ANSI C. It was run on a PC with an Intel(R) Xeon(R) E5-2620 CPU @ 2.1 GHz and 32 GB RAM, under a Linux (Fedora distribution) operating system.

Determination of the parameters
To apply the MSF-VQ algorithm, three parameters must be determined in advance: the direction of the co-occurrence matrix, d (the distance used in the co-occurrence matrix) and n (the number of transition steps). This subsection analyses how to set these parameters. We conducted experiments on the ORL face database to investigate the sensitivity of the algorithm to these three parameters. The ORL database [26,29] contains 400 facial images of 40 different people, with 10 images of each person in different poses and with different expressions. All images in this database are greyscale with a resolution of 92 × 112 pixels. (The facial images in the ORL face database are copyrighted, which limits the publication of these facial images in PLOS ONE for commercial use. Consequently, in this paper, we have removed the image samples from the ORL face database.) In our experiments, we used the rotation method: five images of each individual were used for training and the remaining five for testing. This gives $\binom{10}{5} = 252$ training-testing combinations in total. The final recognition rate is the mean of the 252 recognition results.
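The rotation protocol can be enumerated directly. Choosing 5 of each person's 10 images for training gives $\binom{10}{5} = 252$ splits, and the same index split is applied to every person. In this minimal sketch, `evaluate` is a hypothetical callback that trains on the given image indices and returns the recognition rate on the rest.

```python
from itertools import combinations
from math import comb
from typing import Callable, Iterator, List, Tuple


def rotation_splits(n_images: int = 10,
                    n_train: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield every (train, test) split of one person's image indices."""
    for train in combinations(range(n_images), n_train):
        test = [i for i in range(n_images) if i not in train]
        yield list(train), test


def mean_recognition_rate(evaluate: Callable[[List[int], List[int]], float],
                          n_images: int = 10, n_train: int = 5) -> float:
    """Average the rate reported by the (hypothetical) evaluate callback over all splits."""
    rates = [evaluate(tr, te) for tr, te in rotation_splits(n_images, n_train)]
    return sum(rates) / len(rates)


assert sum(1 for _ in rotation_splits()) == comb(10, 5) == 252
```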
Our previous work [28] showed that using different directions for the Markov stationary features can eliminate the inherent ambiguity of MSF caused by the symmetry of the co-occurrence matrix. Table 2 presents the corresponding experimental results. "MSF-VQ (0)", "MSF-VQ (90)", "MSF-VQ (45)" and "MSF-VQ (135)" denote the MSF-VQ algorithm based on the horizontal, vertical and two diagonal cases, respectively. "MSF-VQ (mix)" denotes the case that uses a co-occurrence matrix built from all four directions. "MSF-VQ (ave)" denotes the case that combines the four direction-specific MSF-VQ features by weighted averaging. Table 2 shows that MSF-VQ (ave) reaches a recognition accuracy of 96.15%, an improvement over the original VQ algorithm. This indicates that the MSF-VQ algorithm is effective when the influences of different directions are considered.
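Directional co-occurrence counting can be sketched as follows. The offset convention (row and column steps for 0°, 45°, 90° and 135°) is an assumption; papers differ on the sign of the diagonal steps. Counting ordered pairs in a single direction makes C asymmetric, which is what removes the ambiguity mentioned above. `mixed_cooccurrence` shows one reading of the "mix" variant: a single matrix summed over the four directions. The "ave" variant would instead compute one MSF-VQ feature per direction and average them with weights, which is not shown.

```python
from typing import Dict, List, Tuple

# Assumed (row, col) offsets per direction; conventions differ between papers.
OFFSETS: Dict[int, Tuple[int, int]] = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def directed_cooccurrence(labels: List[List[int]], K: int,
                          angle: int, d: int = 1) -> List[List[int]]:
    """Count ordered pairs (p1, p2) with p2 = p1 + d * offset(angle);
    unlike the symmetric count, C[i][j] and C[j][i] may differ."""
    dy, dx = OFFSETS[angle]
    h, w = len(labels), len(labels[0])
    C = [[0] * K for _ in range(K)]
    for y in range(h):
        for x in range(w):
            y2, x2 = y + d * dy, x + d * dx
            if 0 <= y2 < h and 0 <= x2 < w:
                C[labels[y][x]][labels[y2][x2]] += 1
    return C


def mixed_cooccurrence(labels: List[List[int]], K: int, d: int = 1) -> List[List[int]]:
    """One reading of 'MSF-VQ (mix)': a single matrix summed over all four directions."""
    mats = [directed_cooccurrence(labels, K, a, d) for a in OFFSETS]
    return [[sum(m[i][j] for m in mats) for j in range(K)] for i in range(K)]


print(directed_cooccurrence([[0, 1], [1, 1]], 2, 0))  # [[0, 1], [0, 1]]
```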
The parameters d and n also play essential roles in our MSF-VQ algorithm. We therefore performed two experiments using the MSF-VQ (mix) algorithm on the ORL database to select optimal values for them. In the first experiment, we fixed d to 1 and varied n from 10 to 100. In the second experiment, we fixed n to 50 and varied d from 1 to 6. We calculated the recognition rate for all variations. The corresponding average recognition rates are plotted as a function of filter size in Fig 4 and Fig 5, respectively. Both figures show the same trend: the average recognition rate first increases with the filter size and then saturates or gradually decreases. In more detail, Fig 4 shows that the best recognition performance is obtained when d is fixed to 1. Fig 5 shows that the impact of n on the recognition rate is small, especially at filter sizes of 11×11 or 13×13. These results confirm that n = 50 (the same value as used in [27]) and d = 1 are suitable choices for our algorithm.

Experiments on the FERET face database
In our next set of experiments, we tested our algorithm using the FB task of the FERET face database [30,31], which has been widely used to evaluate face recognition accuracy. Each image in the database is 256×384 pixels. The fa set contains 1,196 frontal images and the fb set contains 1,195 frontal images, with only one image per person in each set. The fa and fb images differ in facial expression; we selected fa as the gallery set and fb as the probe set. All input facial images are normalized and resized to 146×200 pixels using the two eye coordinates supplied with the FERET database. (The facial images in the FERET face database are copyrighted, which limits the publication of these facial images in PLOS ONE for commercial use. Consequently, in this paper, we have removed the image samples from the FERET face database.)
Our previous experimental results reported in [39] validated the effectiveness of the MSF-VQ algorithm on the FERET face database. However, compared with some state-of-the-art algorithms, the recognition performance of the MSF-VQ algorithm was still far from ideal. The reason is that MSF-VQ features of the whole-face image contain no location information or spatial relationships of the facial sub-regions. We therefore applied the newly proposed MSR-MSF-VQ algorithm to this face recognition task to address this problem, expecting improved recognition performance.
Before validating the effectiveness of our newly proposed MSR-MSF-VQ algorithm, we investigated the impact of three factors: image size (F1), similarity measure (F2) and direction of the co-occurrence matrix (F3). These factors play essential roles in our algorithm, so determining their optimal values facilitates the subsequent experiments. We also further investigated the sensitivity of the parameters d and n using the FERET database, which is larger than the ORL database. Our recognition results on the FERET database, shown in Fig 6, exhibit trends similar to those observed on the ORL database. We therefore conclude that these two parameters are not sensitive to the choice of database. In the subsequent experiments on other face databases, d and n are fixed to the values determined above.

Next, we varied the factors F1, F2 and F3. The experimental results, listed in Table 3, show that the face recognition accuracy of the MSF-VQ algorithm differs only slightly across these factors. We therefore continue to adopt the Manhattan distance as the similarity measure, build the facial identification feature from all four directions, and use the original image size (146×200 pixels).
Next, we conducted experiments with different division strategies to investigate the effectiveness of our proposed MSR-MSF-VQ algorithm. In practice, the resized square face images are first partitioned into several equal-sized sub-regions. We then apply the MSF-VQ algorithm to each sub-region to obtain individual recognition results, which are combined by weighted averaging in the face recognition procedure to obtain the final MSR-MSF-VQ result. Table 4 shows the experimental results for different image division strategies. The 1st column lists the division strategies, the 2nd column shows the size of the face image, and the 3rd column presents the maximum recognition rate of MSR-MSF-VQ under each division strategy.

According to Table 4, the recognition rate increases as the division strategy changes from 1×1 to 5×5 sub-regions. However, this trend does not continue for finer division strategies. The maximum recognition rate of 98.2% is reached with 5×5 sub-regions of 41×41 pixels each; beyond that, the face recognition accuracy decreases. We attribute this to finer divisions introducing additional noise, which harms recognition performance. Thus, combining the MSF-VQ features of several image sub-regions improves the recognition rate only when an appropriate division strategy is used. Compared with the previously introduced MSF-VQ algorithm, MSR-MSF-VQ yields a considerable improvement in recognition accuracy. The largest contribution of our method is that it considers the interactions of multiple different facial image sub-regions.
This approach more accurately describes the content of the facial images and preserves more significant personal identification information during feature extraction, which leads to the final excellent face recognition performance.
To clearly demonstrate the effectiveness of our proposed MSR-MSF-VQ algorithm, we compared it with various state-of-the-art approaches on the same FB task of the FERET database. The compared approaches include:
- PCA [1], with Euclidean and Mahalanobis Cosine distances
- LDA [2]
- the Bayesian algorithm with variants MAP and ML [20]
- Gabor-EBGM [12]
- HOG [10]
- HOG-EBGM [11]
- LBP [9]
- SIFT [13]

The results listed in Table 5 show that the MSR-MSF-VQ algorithm achieves state-of-the-art accuracy, which validates the effectiveness of our proposed algorithm.
Deep learning has recently achieved promising results in face recognition, in particular the convolutional neural network (CNN), which is widely used in the computer vision community. We therefore also compared our proposed algorithm with recent CNN-based works. We conducted extensive CNN-based experiments, again using the FB task of the FERET database.
Because we lacked large-scale datasets to train our own deep learning model, we used pretrained models and CNN architectures and then fine-tuned the parameters on our own dataset. We mainly chose two CNN architectures, AlexNet and CenterlossNet, for our experiments.

AlexNet [22] is one of the most representative convolutional neural networks. It can classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes. The pretrained AlexNet model used here was supplied by Krizhevsky et al. [22]. It was trained on a subset of ImageNet with approximately 1000 images in each of 1000 categories; in total, there are approximately 1.2 million training images, 50,000 validation images and 150,000 testing images.

CenterlossNet is a recently proposed CNN jointly supervised by the softmax loss and the center loss. As demonstrated in [21], CenterlossNet achieves excellent recognition performance with much less training data and a simpler network architecture than most recent deep learning works, such as FaceNet and DeepFace. The pretrained model supplied by Wen et al. [21] was trained on web-collected data including the CASIA-WebFace, CACD2000 and Celebrity+ image databases.

In our experiments, we fine-tuned both models (AlexNet and CenterlossNet) for 8000 iterations and used them to extract deeply learned features for face recognition. We then compared our proposed algorithm with them to further validate its effectiveness.
To compare our proposed algorithm with the CNN-based methods, we not only used the previously introduced Manhattan distance as the similarity measure but also adopted the Support Vector Machine (SVM) [15] to further improve recognition performance. SVM is a widely used supervised classifier with two significant properties: (1) it finds an optimal linear classifier (the maximum-margin hyperplane) in the feature space with minimum machine complexity, which keeps the expected generalization error low; and (2) it makes efficient use of extremely high-dimensional feature spaces through kernel functions. In our experiments, we applied the LIBLINEAR model supplied by [40] to obtain our recognition results (the kernel function was the radial basis function (RBF), and the penalty factor C ranged from 0 to 10). In addition, the MSR-MSF-VQ face descriptor used in these experiments was formed by concatenating the MSR-MSF-VQ features extracted from different sub-regions (we adopted a 5 × 5 sub-region division strategy, giving a 1650-dimensional MSR-MSF-VQ feature). These features were used to obtain the final recognition results with both the distance measure and the SVM classifier. Table 6 shows the corresponding experimental results, where "MSR-MSF-VQ+SVM" denotes the MSR-MSF-VQ algorithm combined with the SVM classifier. As Table 6 shows, the SVM classifier improves on the recognition results obtained with the original distance measure. The recognition rates of our MSR-MSF-VQ algorithm are higher than those of CenterlossNet and AlexNet, which indicates the effectiveness of the proposed algorithm.
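The distance-based recognition step can be sketched as follows, assuming each sub-region already yields a fixed-length MSR-MSF-VQ vector (with 25 equal-length vectors, the reported 1650 dimensions would correspond to 66 per sub-region). The helper names are hypothetical. The sketch concatenates the regional vectors and assigns a probe the label of the gallery image at the smallest Manhattan (L1) distance; it does not reproduce the SVM stage.

```python
import numpy as np

def concat_region_features(region_features):
    """Concatenate per-sub-region feature vectors into one face descriptor."""
    return np.concatenate([np.ravel(f).astype(float) for f in region_features])

def manhattan_nn(gallery, gallery_labels, probe):
    """Label of the gallery descriptor with the smallest L1 distance to the probe.

    gallery: (n_images, dim) array; probe: (dim,) array.
    """
    dists = np.abs(gallery - probe).sum(axis=1)
    return gallery_labels[int(np.argmin(dists))]

# Usage: 25 sub-regions of 66 dimensions each give a 1650-dimensional descriptor.
descriptor = concat_region_features([np.zeros(66)] * 25)
print(descriptor.shape)  # (1650,)
```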
Finally, we compared the average execution time of our proposed algorithm with those of the different approaches listed in Table 7. On the FERET database, the processing time for a single image using our proposed algorithm is 1,883 ms: 50 ms for preprocessing (including filtering and image preprocessing), 69 ms for VQ processing, 381 ms for feature extraction and 1,383 ms for face recognition (the gallery set contains a total of 1,196 face images). As Table 7 shows, the MSR-MSF-VQ algorithm is more efficient than the other methods in terms of both feature extraction time and total execution time for face recognition.

Experiments on the AR face database
The AR face database [32,41] contains more than 4,000 images of 126 different subjects (70 male and 56 female). (The facial images in the AR face database are copyrighted, which limits the publication of these facial images in PLOS ONE for commercial use. Consequently, in this paper, we have removed the image samples from the AR face database.) The images in the AR face database exhibit various facial expressions (neutral, smiling, angry) and lighting conditions (e.g., bright lighting), and some of them are partially occluded by sunglasses or scarves. Consequently, the AR database is more challenging. In these experiments, we use it to test the robustness of our proposed algorithm to partial occlusion.
For our experiments, we selected 1,300 images of 100 individuals (50 males and 50 females), with 13 different images per subject. All the images are greyscale and cropped to 90×120 pixels based on the coordinates of the two eye centres. To verify the effectiveness of our algorithm, we designed two test sets (the first comprising 300 scarf-occluded images and the second comprising 300 sunglasses-occluded images) and used the remaining seven images of each subject as the training set. Table 8 contains a detailed comparison of the original MSF-VQ with a variety of state-of-the-art approaches. Table 8 shows that the recognition results of the original MSF-VQ lag behind those of SRC [16], LRC [17] and VPC [18]; however, on the scarf-occluded images, MSF-VQ outperforms all the other methods except Fisherfaces and SRC.
Next, we conducted experiments to demonstrate the effectiveness of the proposed MSR-MSF-VQ algorithm under partial occlusion using the same training and test sets. Table 9 compares the MSR-MSF-VQ algorithm with LVPC [18] and MLRC [17] (LVPC and MLRC are extended versions of VPC and LRC, respectively, that divide the images into four sub-regions). The symbols "MSR-MSF-VQ-4", "MSR-MSF-VQ-16" and "MSR-MSF-VQ-25" denote the MSR-MSF-VQ algorithm with three different partitioning strategies. On the sunglasses-occluded images, MSR-MSF-VQ-4 achieves the highest recognition rate (89%), greatly outperforming the other methods, but on the scarf-occluded images it falls behind the other algorithms. With finer division strategies, however, the recognition rate increases significantly, exceeding all the other methods in both cases (sunglasses and scarves). This is particularly evident when we employ the SVM classifier, with which the recognition rate rises to 100% for both types of occluded images. These results clearly show that using whole-face MSF-VQ features leads to uncompetitive recognition performance, whereas deploying an appropriate division strategy and combining the resulting MSF-VQ features improves it. The satisfactory performance achieved on the AR face database confirms that our MSR-MSF-VQ algorithm is robust to partial occlusions.
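Assuming the 4-, 16- and 25-region strategies correspond to 2 × 2, 4 × 4 and 5 × 5 grids, the partitioning step can be sketched in NumPy as below. The function name is hypothetical, and spreading leftover pixels over the first blocks via np.array_split is our own assumption, since the text does not state how image sizes that are not divisible by the grid are handled.

```python
import numpy as np

def split_into_grid(image, rows, cols):
    """Split a 2-D greyscale image into rows x cols sub-regions in row-major order.

    np.array_split spreads any leftover pixels over the first blocks
    instead of dropping them (an assumption about edge handling).
    """
    blocks = []
    for band in np.array_split(image, rows, axis=0):
        blocks.extend(np.array_split(band, cols, axis=1))
    return blocks

image = np.zeros((120, 90))  # 120 rows x 90 columns
for grid in (2, 4, 5):       # 4, 16 and 25 sub-regions
    blocks = split_into_grid(image, grid, grid)
    print(len(blocks), blocks[0].shape)  # 4 (60, 45) / 16 (30, 23) / 25 (24, 18)
```

For a 90×120 AR image stored as a 120-row, 90-column array, the 5 × 5 grid yields 25 blocks of 24 × 18 pixels.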

Experiments on the Yale face database
To further explore the performance of the MSR-MSF-VQ algorithm under variations in facial expression and illumination, we applied it to the Yale database [33], which consists of 165 frontal greyscale images of 15 subjects, with 11 different images per person. All the images are normalized to 100×100 pixels. More specifically, we used this database to evaluate the recognition accuracy of our proposed MSR-MSF-VQ algorithm under varying facial expressions (normal, happy, sad, sleepy, surprised and winking), occlusion (with and without glasses) and illumination (centre, left and right lighting). (The facial images in the Yale face database are copyrighted, which limits the publication of these facial images in PLOS ONE for commercial use. Consequently, in this paper, we have removed the image samples from the Yale face database.)
For our first experiment, similar to [8], we randomly chose six images of each person as the gallery set and used the remaining five images as the probe set (hence, 90 images were used for training and 75 for testing). This random selection was repeated 10 times. Table 10 shows the average recognition results of the MSR-MSF-VQ algorithm and several subspace analysis methods; the compared algorithms are listed in Table 11. Among these, PCA, NMF, LPP and S-LPP are holistic feature extraction models, while the others are all local matching algorithms. The experimental results show that all the local matching methods achieved better recognition performance than the holistic methods. This is because the holistic methods extract only global features, so their performance is strongly affected by variations in pose, lighting and facial expression. The results also show that our proposed MSR-MSF-VQ algorithm significantly outperformed all the local matching methods, because it considers the interactions between different sub-regions, so that its recognition features include both the location information and the spatial relationships of the facial sub-regions.
In our second set of experiments on the Yale database, we varied the number of training samples. More specifically, we selected m (m = 2, 3, ..., 8) images of each person as the training set and used the remaining 11 − m images for testing. For each m, we repeated the face recognition experiment 50 times using the MSR-MSF-VQ algorithm and calculated the mean of the 50 results. Fig 7 plots the results of our proposed algorithm against those of LSHOG (locality sensitive histograms of oriented gradients) [49] and HOG [10], each combined with different dimension reduction algorithms, including PCA [1], MFA [50], NPE [51] and LPP [7], on the same Yale face database. The Y-axis denotes the recognition accuracy and the X-axis the number of training samples. The results show that LSHOG outperforms HOG regardless of which dimension reduction algorithm is used. Fig 7 also shows that the MSR-MSF-VQ algorithm performs best in most cases with the same training set. Although LSHOG improves on the original HOG (as reported in [49]), it computes a histogram of gradient orientations over the entire face at each pixel location and ignores the interactions between different sub-regions, so its recognition rate remains below that of the proposed MSR-MSF-VQ algorithm. We can therefore conclude that the recognition performance of our proposed algorithm is more robust than that of the other methods.
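The repeated random-split protocol used in both Yale experiments can be sketched as below. Here classify is a hypothetical callback standing in for the full MSR-MSF-VQ pipeline: it receives training and test indices and returns predicted labels for the test images. All function names are illustrative, not taken from the original implementation.

```python
import numpy as np

def random_split(labels, m, rng):
    """Choose m random training images per subject; the rest are test images."""
    labels = np.asarray(labels)
    train, test = [], []
    for subject in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == subject))
        train.extend(idx[:m])
        test.extend(idx[m:])
    return np.array(train), np.array(test)

def mean_accuracy(labels, m, classify, repeats, seed=0):
    """Average accuracy over `repeats` random splits (10 or 50 in the text)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(repeats):
        train_idx, test_idx = random_split(labels, m, rng)
        predictions = classify(train_idx, test_idx)  # hypothetical recognizer callback
        accuracies.append(np.mean(predictions == labels[test_idx]))
    return float(np.mean(accuracies))

# Yale: 15 subjects x 11 images; m = 6 gives 90 training and 75 test images.
yale_labels = np.repeat(np.arange(15), 11)
train_idx, test_idx = random_split(yale_labels, 6, np.random.default_rng(0))
print(len(train_idx), len(test_idx))  # 90 75
```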

Experiments on the Yale-B face database
This section provides additional experimental results on the Yale-B face database. In contrast to the Yale face database, the Yale-B face database [34,35,52] was constructed to test the performance of facial recognition algorithms under larger variations in lighting and pose, which makes it even more challenging. In total, it contains images of 38 individuals in 9 poses with 64 different illuminations per pose. (The facial images in the Yale-B face database are copyrighted, which limits the publication of these facial images in PLOS ONE for commercial use. Consequently, in this paper, we have removed the image samples from the Yale-B face database.) For our experiments, we selected only the frontal-face images of ten individuals under the 64 illumination conditions. All the images were greyscale and normalized to 168×192 pixels. We divided the 640 frontal images into five subsets based on the angle between the light source direction and the camera axis: Subset 1 (angle < 12 degrees), Subset 2 (13 < angle < 25 degrees), Subset 3 (26 < angle < 50 degrees), Subset 4 (51 < angle < 77 degrees) and Subset 5 (angle > 78 degrees). The five subsets contain 70, 120, 120, 140 and 190 images, respectively. We conducted two experiments to evaluate the validity of our proposed algorithm. Tables 12 and 13 compare the recognition performance of the MSR-MSF-VQ algorithm with that of other algorithms for Experiment 1 and Experiment 2, respectively.
1. In our first experiment, following the same approach as [53], we used Subset 1 (7 images per person, acquired under good illumination conditions) as the gallery set and randomly chose facial images from the remaining four subsets as the probe set. We compared our proposed algorithm with three existing methods: Raw image (the image without any preprocessing), HEQ (histogram equalization) and wavelet-based normalization [53].
2. In our second experiment, following [54], the images with the most neutral lighting condition ("A+00E+00") were used as the gallery set, and images randomly chosen from Subsets 1-5 were used as the probe set. We compared our proposed MSR-MSF-VQ algorithm with several state-of-the-art methods: HE [55], LTV [56], Gradientface (GradFace) [57] and RG [58]. The results on the original images without any preprocessing (ORI) are also presented.
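The angle-based assignment of Yale-B images to Subsets 1-5 can be written as a small lookup. Because the stated strict inequalities leave the boundary angles (for example, exactly 12 or 25 degrees) unassigned, this sketch assumes integer angles and the inclusive bins 0-12, 13-25, 26-50, 51-77 and 78 or more; the constant and function names are hypothetical.

```python
UPPER_BOUNDS = (12, 25, 50, 77)  # assumed inclusive upper edges of Subsets 1-4

def yale_b_subset(angle_deg):
    """Return the subset (1-5) for a light-source angle in whole degrees."""
    for subset, upper in enumerate(UPPER_BOUNDS, start=1):
        if angle_deg <= upper:
            return subset
    return 5  # angles of 78 degrees or more

print([yale_b_subset(a) for a in (0, 12, 13, 50, 77, 78, 130)])  # [1, 1, 2, 3, 4, 5, 5]
```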
The experimental results in Table 12 and Table 13 show that our proposed MSR-MSF-VQ algorithm achieved excellent recognition performance on the images with slight or moderate lighting variations, because test images taken under relatively good illumination conditions are more similar to the images in the gallery set. The worst recognition results occurred on the images with severe lighting variations. There, our algorithm was only weakly competitive with some of the other methods on the same probe set because the images were not preprocessed to normalize the illumination effects, which makes it difficult to extract appropriate illumination-invariant features from test images with intense shadows taken under poor illumination. We took this factor into account and used histogram equalization to preprocess the face images of the probe sets with severe lighting variations (Subset 4 and Subset 5). The corresponding results, obtained after preprocessing and using the SVM classifier, are shown in Table 12 and Table 13. Preprocessing improved the recognition rate of our algorithm, especially on Subset 5, where the rate improved significantly over the original results. However, even after preprocessing, our algorithm did not surpass all of the compared methods. From these results, we conclude that although the MSR-MSF-VQ algorithm is not robust to severe illumination variations, it achieves an excellent recognition rate under moderately varied illumination in small-scale face databases. To some extent, these results further validate the effectiveness of our proposed MSR-MSF-VQ algorithm.
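Histogram equalization, the preprocessing applied to the probe images with severe lighting variations, can be sketched in NumPy as follows. It is the textbook global mapping h(v) = round((cdf(v) − cdf_min) · 255 / (N − cdf_min)) for an 8-bit image with N pixels; this generic sketch is not necessarily the implementation used in the experiments, and the function name is hypothetical.

```python
import numpy as np

def equalize_histogram(image):
    """Global histogram equalization of an 8-bit greyscale image."""
    img = np.asarray(image, dtype=np.uint8)
    cdf = np.bincount(img.ravel(), minlength=256).cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest grey level present
    n = img.size
    if n == cdf_min:           # constant image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) * 255 / (n - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)  # levels below cdf_min map to 0
    return lut[img]

probe = np.array([[50, 100], [150, 200]], dtype=np.uint8)
print(equalize_histogram(probe).tolist())  # [[0, 85], [170, 255]]
```

The mapping spreads the occupied grey levels over the full 0-255 range, which brightens heavily shadowed regions but cannot remove cast shadows, consistent with the limited gains reported above.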
In summary, the extensive comparative analysis of our proposed MSR-MSF-VQ algorithm against several state-of-the-art methods on three standard face databases clearly demonstrates its feasibility and effectiveness under varying facial expressions, poses and illumination. We attribute its satisfactory recognition performance to the extended vector quantization histogram features (the MSR-MSF-VQ features), which not only contain spatial structural information but also account for location information and the interactions between different facial sub-regions.
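To make the role of the Markov stationary features concrete, the sketch below computes an MSF-style descriptor from a 2-D map of VQ codeword indices. It is a simplified illustration under stated assumptions, not the MSR-MSF-VQ implementation: co-occurrences are counted between horizontally and vertically adjacent codewords in both directions, the rows of the co-occurrence matrix C are normalized into a transition matrix P, the stationary distribution π (with πP = π) is found by power iteration on the lazy chain (P + I)/2, which shares π but cannot oscillate, and the normalized diagonal of C is prepended as the initial distribution. The function name and parameters are hypothetical.

```python
import numpy as np

def markov_stationary_feature(code_map, n_codes, iters=1000, tol=1e-12):
    """MSF-style descriptor [initial distribution, stationary distribution]."""
    code_map = np.asarray(code_map)
    C = np.zeros((n_codes, n_codes))
    # Count horizontally and vertically adjacent codeword pairs in both
    # directions, so C is symmetric (an assumption of this sketch).
    for a, b in ((code_map[:, :-1], code_map[:, 1:]),
                 (code_map[:-1, :], code_map[1:, :])):
        np.add.at(C, (a.ravel(), b.ravel()), 1)
        np.add.at(C, (b.ravel(), a.ravel()), 1)
    rows = C.sum(axis=1, keepdims=True)
    # Row-normalize into a transition matrix P; absent codewords get a self-loop.
    P = np.divide(C, rows, out=np.eye(n_codes), where=rows > 0)
    # The lazy chain (P + I) / 2 has the same stationary distribution as P
    # but is aperiodic, so power iteration cannot oscillate.
    lazy = 0.5 * (P + np.eye(n_codes))
    pi = np.bincount(code_map.ravel(), minlength=n_codes) / code_map.size
    for _ in range(iters):
        nxt = pi @ lazy
        done = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if done:
            break
    diag = np.diag(C)
    init = diag / diag.sum() if diag.sum() > 0 else np.zeros(n_codes)
    return np.concatenate([init, pi])

print(np.round(markov_stationary_feature([[0, 0, 0], [0, 0, 1]], 2), 3).tolist())
# [1.0, 0.0, 0.857, 0.143]
```

The diagonal terms record how often a codeword neighbours itself, which is the kind of spatial structure a plain VQ histogram discards.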

Experiments on the CAS-PEAL-R1 face database
Finally, we conducted additional experiments on the well-known large-scale CAS-PEAL-R1 face database [36,37]. This database contains 30,900 images of 1,040 subjects with varying accessories, expressions and lighting. The standard evaluation protocol uses all the frontal-face images. The gallery set (GS) consists of 1,040 face images: one image per subject under standard conditions. We chose three representative probe sets (images with accessories, varied expressions, and varied lighting) for our experiments. The expression set (PE) contains 1,570 face images of 377 subjects; the accessory set (PA) contains 2,285 face images of 438 subjects; and the lighting set (PL) contains 2,243 face images of 233 subjects. All the face images in the gallery and probe sets were aligned and cropped to 64×64 pixels based on the eye coordinates given in the current release of the CAS-PEAL-R1 face database (The facial images in the CAS-PEAL-R1 face database are copyrighted, which limits the publication of these facial images in PLOS ONE for commercial use. Consequently, in this paper, we have removed the image samples from the CAS-PEAL-R1 face database).
Table 14 lists the best recognition rates of the different approaches and of our proposed algorithm on the three CAS-PEAL-R1 probe sets described above. The compared approaches are as follows:
1. Total Variation based Quotient Image model (TVQI) [59], an effective method for face recognition under low-level lighting conditions.
2. The TV_L1 and TV_L2 models [59] in INM [60] (INM uses the anisotropic diffused TV_L1 model to decompose the face sample into a low-frequency part and a high-frequency part, and it uses the TV_L2 model to generate a noiseless large-scale part). The symbols "TV_L1+HE" and "TV_L2+HE" stand for the INM method implemented using the TV_L1 and TV_L2 models with histogram equalization, respectively, while "TV_L2+RHE" represents the INM implemented using the TV_L2 model with region-based histogram equalization.
As shown in Table 14, our MSR-MSF-VQ algorithm clearly achieves the highest recognition rate on the accessory and expression probe sets, but it lags behind the other methods under low-level lighting conditions, where it offers no clear advantage. This is because the TVQI model uses the low-frequency parts of the image to normalize the illumination effect in the face sample and then generates an illumination-invariant small-scale image. Furthermore, INM is an image preprocessing method that can remove illumination effects from face samples, including diffuse reflections, specular reflections, cast shadows and attached shadows; with the INM-based methods, illumination-invariant multiscale information containing adequately enhanced facial features can be extracted. Because none of the images used in this experiment were preprocessed to normalize the illumination effect, these illumination-invariant features achieved better recognition performance than our algorithm under varied lighting. We therefore took the illumination effects into account and used histogram equalization to preprocess the face images of the lighting probe set, and we also adopted the SVM classifier to optimize recognition performance. The corresponding results after preprocessing, shown in Table 14, indicate that preprocessing substantially improved the recognition rate of our algorithm, whose results then exceeded those of all the compared methods. Beyond the innate advantages of our algorithm and the optimization provided by the SVM classifier, its satisfactory performance on the other two probe sets (accessory and expression) has two further explanations. First, the TVQI model is suitable for face recognition only under varied lighting conditions in small-scale face databases, and the image information it generates is limited; therefore, on a large-scale face database it cannot discriminate between all the face samples. Second, the INM-based methods preserve numerous facial features, for example the wrinkles in a face sample, which may decrease their recognition performance on the expression probe set.
Overall, the experimental results on the CAS-PEAL-R1 face database further confirm the MSR-MSF-VQ algorithm's robustness for face recognition.

Conclusions
In this paper, an improved face recognition algorithm called MSR-MSF-VQ was proposed.
The main characteristic of the MSR-MSF-VQ model is that it captures spatial structural information to overcome the limitation of VQ histograms. Moreover, it incorporates location information and the spatial interactions between facial sub-regions into the identification features, which improves facial recognition performance. The proposed method was evaluated on five well-known face databases and compared with several state-of-the-art algorithms, and its satisfactory recognition performance demonstrates its robustness for face recognition. Although our proposed algorithm achieves excellent recognition rates using the extended VQ histogram features, more work needs to be done. Two directions are worth exploring to further optimize our algorithm. First, this paper uses the VQ histogram, which is a reliable facial feature representation for face recognition; however, other histogram-based features such as LBP [61] and HOG [10] could be combined with MSF instead of VQ histogram features, and we plan to explore the relative merits of these approaches in future research. Second, we plan to explore additional common classification algorithms, such as Nearest Distance Classifiers [19] and Neural Networks [21,22], to improve the performance of the proposed algorithm.