
Robust Face Recognition via Multi-Scale Patch-Based Matrix Regression

  • Guangwei Gao ,

    csggao@gmail.com

    Affiliation Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China

  • Jian Yang,

    Affiliation School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China

  • Xiaoyuan Jing,

    Affiliation School of Automation, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China

  • Pu Huang,

    Affiliation School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China

  • Juliang Hua,

    Affiliation School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China

  • Dong Yue

    Affiliation Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China

Abstract

In many real-world applications such as smart card solutions, law enforcement, surveillance and access control, the limited number of training samples per subject is the most fundamental problem. By making use of the low-rank structure of the reconstruction error image, the so-called nuclear norm-based matrix regression has been demonstrated to be effective for robust face recognition with contiguous occlusions. However, the recognition performance of nuclear norm-based matrix regression degrades greatly in the face of the small sample size problem. An alternative solution to this problem is to perform matrix regression on each patch and then integrate the outputs from all patches. However, it is difficult to set an optimal patch size across different databases. To fully utilize the complementary information from different patch scales for the final decision, we propose a multi-scale patch-based matrix regression scheme in which the ensemble of multi-scale outputs is achieved optimally. Extensive experiments on benchmark face databases validate the effectiveness and robustness of our method, which outperforms several state-of-the-art patch-based face recognition algorithms.

Introduction

Object classification is an active topic in the area of pattern recognition [1–8]. Owing to its non-intrusive nature and pronounced uniqueness, face recognition has been an active research topic and has been incorporated into many multimedia applications [9–14], such as surveillance, human-machine interaction, access control and photo album management in social networks. Recently, linear regression-based face recognition approaches have led to state-of-the-art performance [15–18], with representative examples being sparse representation-based classification (SRC) [15] and linear regression-based classification (LRC) [16]. In SRC, the query image is coded as a sparse linear combination of all the training images, and the classification is then made by checking which class yields the least reconstruction error. Many extensions of SRC have been developed for vision applications, e.g., super-resolution [19, 20], facial expression recognition [21] and human gait recognition [22, 23]. Alternatively, Naseem et al. [16] proposed LRC for face recognition. Based on the assumption that samples from a specific object class lie on a linear subspace, LRC represents a query image as a linear combination of the training images of each class. Yang et al. [24] provided an insight into SRC and sought reasonable explanations for its effectiveness. They viewed the L1-regularizer as having two properties, sparseness and closeness: sparseness selects a small number of nonzero representation coefficients, and closeness makes the nonzero coefficients concentrate on the training samples with the same class label as the test sample. Zhang et al. [18] discussed the working mechanism of SRC and demonstrated that it is collaborative representation rather than L1-norm sparseness that improves the classification performance. In their work, a collaborative representation-based classification (CRC) model was presented with a squared L2-regularization, which achieves competitive classification performance with significantly lower complexity than the sparse representation method.

It is worth noting that the majority of studies assume that the testing images are taken under well-controlled settings (e.g., reasonable illumination, poses and variations, without occlusion or disguise). Their performance degrades when the testing images are contaminated. By introducing an identity matrix I as a dictionary to code the outliers (e.g., corrupted or occluded pixels), SRC [15] exhibits excellent robustness and promising performance. However, SRC is not robust to contiguous occlusion such as sunglasses or scarves when the occlusion level exceeds the breakdown point of the algorithm. Yang et al. [25] modified the SRC framework for handling outliers such as occlusions in face recognition by modeling the sparse coding as a sparsity-constrained robust regression problem. He et al. [26] unified the algorithms for error correction and detection by using the additive and multiplicative forms, respectively, and established a half-quadratic framework to solve the robust sparse representation problem. From the viewpoint of dictionary learning, Yang et al. [27] constructed a feature pattern dictionary that captures structured information and prior knowledge of image features to represent the unknown feature pattern weight of a query image. Similarly, Ou et al. [28] simultaneously learned a clean dictionary and a noise dictionary, and applied the learned clean dictionary for classification. Observing the distribution of the reconstruction error image, Yang et al. [29–32] used the nuclear norm to characterize the structural information of an error image and proposed a nuclear norm-based matrix regression model that has achieved state-of-the-art performance for face recognition with occlusion and illumination changes.

In spite of the aforementioned tremendous achievements, the small sample size (SSS) problem still remains one of the most fundamental and challenging issues in the face recognition community. In many real-world applications such as smart card solutions, law enforcement, surveillance and access control, the available training samples per subject may be very limited [33]. Thus, the performance of these regression-based methods is greatly degraded because the query sample cannot be well represented by the few training samples. To tackle the SSS problem, many efforts have been made in the past few decades. Existing methods mainly fall into three categories. The first are patch-based methods, which generally consist of local patch representation, local feature extraction and the combination of classification results [34–36]. However, the patch size has a great impact on the output performance of patch-based methods [37, 38]. The second integrate local and global features for classification [39, 40], because the two provide complementary information for the final results. The third employ different feature extractors to extract multiple types of features and then utilize a decision-level fusion scheme for the final classification [41, 42]. We mainly focus on patch-based methods in the sequel.

To improve the recognition performance of matrix regression in the SSS problem while preserving its outstanding ability to deal with occlusion and illumination changes, in this paper we propose performing matrix regression on patches. The so-called patch-based matrix regression (PMR) classifies each query matrix patch and then integrates the recognition outputs of all patches for the final decision. Nevertheless, the patch size strongly affects the final performance of PMR, and the optimal patch size varies greatly across databases. If the patch size is too small, each patch carries little information and the method cannot capture the geometric structure of the image; if it is too large, only a few patches are available per image, which limits the information that can be used. To fully exploit the classification ability and appearance information of different patch sizes, we devise a multi-scale PMR (MSPMR) scheme that integrates the complementary information from different scales. MSPMR first performs PMR on each scale and then learns optimal scale weights to adaptively fuse the multi-scale outputs. To evaluate the performance of the proposed method, we use four databases that involve different recognition tasks: the Extended Yale B, AR and LFW datasets for face recognition without occlusion, the AR database for face recognition with real disguise, and the Extended Yale B dataset for face recognition with block occlusion. The experimental results demonstrate the effectiveness and robustness of the proposed method.

The remainder of the paper is organized as follows. Section 2 briefly reviews two related works. The proposed multi-scale PMR via margin distribution optimization is presented in Section 3. Section 4 conducts extensive experiments, and Section 5 concludes this paper.

Related Works

1. Nuclear norm based matrix regression

By observing the distribution of the reconstruction error image, a nuclear norm-based matrix regression (NMR) [29] model was proposed that uses the nuclear norm to characterize the whole structure of the error image. Here, we define Ni as the number of images from the i-th class and N = N1 + N2 + … + Nc as the total number of training samples from c classes. Given a set of N training image matrices A1, A2, …, AN ∈ ℜ^{row×col} and a query image matrix B ∈ ℜ^{row×col}, the NMR model can be represented as

min_x ||B − A(x)||_* + (λ/2)||x||_2^2,  (1)

where λ is the regularization parameter, and x and A(x) = x1A1 + x2A2 + … + xNAN are the representation coefficient vector and the reconstructed image, respectively. The query image is then classified into the class that yields the minimal reconstruction error, i.e.,

Identity(B) = argmin_i ||B − A(δi(x*))||_*,  (2)

where x* is the optimal solution of Eq (1) and δi(x) is a vector whose only nonzero entries are the entries in x that correspond to class i. NMR is much more robust and effective for face recognition, particularly with respect to occlusion and illumination changes.
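The class-assignment rule of Eq (2) is straightforward to sketch in code. The following is a minimal illustration, assuming the coefficient vector x* has already been obtained by solving Eq (1); the function and variable names are ours, not the authors':

```python
import numpy as np

def nmr_classify(B, A_list, labels, x_star):
    """Eq (2): assign query matrix B to the class whose
    coefficient-restricted reconstruction yields the smallest
    nuclear-norm residual."""
    labels = np.asarray(labels)
    best_class, best_err = None, np.inf
    for c in np.unique(labels):
        # delta_c(x*): keep only the coefficients of class c
        x_c = np.where(labels == c, x_star, 0.0)
        recon = sum(xc * A for xc, A in zip(x_c, A_list))
        # nuclear norm = sum of singular values of the residual
        err = np.linalg.norm(B - recon, ord='nuc')
        if err < best_err:
            best_class, best_err = c, err
    return best_class
```

Note that only the classification step is shown here; solving Eq (1) itself requires an iterative solver such as the ADMM scheme discussed later.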

2. Patch-based CRC

Suppose that we have c known pattern classes. Let Xi = [xi1, xi2, …, xiNi] be the matrix formed by the training samples of class i, where Ni is the number of training samples of class i. Let X = [X1, X2, …, Xc] ∈ ℜ^{M×N} be the dataset of all training samples, where N = N1 + N2 + … + Nc. To alleviate the performance degradation of CRC in the small sample size problem, the patch-based CRC (PCRC) [36] model was proposed. For a given query image y, it is first divided into multiple overlapped patches {y1, y2, …, yp}. Then, each patch yi is tackled by representing it as a linear combination over a local dictionary Di. Finally, one can apply a plurality or linear-weighted combination scheme to the recognition outputs for a final decision.

For each patch yi, its representation weights can be obtained by minimizing the following error:

βi = argmin_β ||yi − Di β||_2^2 + λ||β||_2^2,  (3)

where Di = [Di1, Di2, …, Dic] denotes the local dictionary located at the same position as yi, and Dik is the sub-dictionary of the k-th class. The recognition result of patch yi is Identity(yi) = argmin_k {||yi − Dik βik||_2 / ||βik||_2}, where βik is the vector of coefficients in βi associated with the k-th class.
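Since the ridge problem in Eq (3) has a closed-form solution, the PCRC patch-coding and classification step can be sketched compactly. The sketch below follows the standard CRC regularized-residual rule [18]; the function name and the small stabilizing constant are illustrative:

```python
import numpy as np

def pcrc_patch(y, D, labels, lam=0.001):
    """Code patch y over the local dictionary D (columns = atoms)
    with a squared-L2 regularizer, then classify by the
    regularized class-wise residual."""
    labels = np.asarray(labels)
    M = D.shape[1]
    # closed-form collaborative coding: (D^T D + lam*I)^{-1} D^T y
    beta = np.linalg.solve(D.T @ D + lam * np.eye(M), D.T @ y)
    best_class, best_score = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        resid = np.linalg.norm(y - D[:, mask] @ beta[mask])
        # regularized residual: small residual AND large class energy win
        score = resid / (np.linalg.norm(beta[mask]) + 1e-12)
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```

In PCRC this routine runs once per patch, and the per-patch labels are then fused for the final decision.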

For clarity, four key components (i.e., multi-scale trick, local patch strategy, structure error and pixel error characterization) of several related methods are compared in Table 1.

Multi-Scale Patch-Based Matrix Regression (MSPMR)

1. Motivation

In PCRC, each local patch matrix is first converted to a vector, and then the L2-norm is used to characterize the reconstruction error. However, the L2-norm (or L1-norm) is based on pixel values and thus ignores the structural information of the error image. Fig 1 shows an example where the error between (b) and (a) is shown in (c). By re-arranging the pixels of image (c), we can obtain image (d). The following observations can be made:

  1. The nuclear norm can better characterize the structure error than the L1 or L2 norm. For example, the L2-norm (or L1-norm) value of image (c) is the same as that of image (d), so it is difficult to distinguish between them. Fortunately, the nuclear norm values of images (c) and (d) are 47.75 and 58.14, respectively.
  2. From the distribution perspective, we can observe that the distribution of the error image does not follow a Laplacian or Gaussian distribution in Fig 1e. Fortunately, it can be seen from Fig 1f that the singular values of the error image (c) fit the Laplacian distribution well. We know that the nuclear norm is the sum of all singular values of a matrix, which can also be considered as the l1-norm of the singular value vector. Based on the above example, we believe that the nuclear norm is more suitable to describe the structural error.
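The pixel-rearrangement argument above is easy to verify numerically. The following is a minimal sketch, assuming a rank-one (hence highly structured) error image; for such a matrix the nuclear norm equals the Frobenius norm, while any unstructured rearrangement of the same pixels has a strictly larger nuclear norm:

```python
import numpy as np

rng = np.random.default_rng(0)
# a structured (rank-one) "error image"
E = np.outer(rng.random(8), rng.random(8))
# rearrange its pixels at random, destroying the low-rank structure
F = rng.permutation(E.ravel()).reshape(E.shape)

l2_E = np.linalg.norm(E)              # pixel-based (Frobenius/L2) norm
l2_F = np.linalg.norm(F)
nuc_E = np.linalg.norm(E, ord='nuc')  # sum of singular values
nuc_F = np.linalg.norm(F, ord='nuc')

# The pixel-based norm cannot tell the two images apart,
# whereas the nuclear norm is smaller for the structured one.
print(abs(l2_E - l2_F) < 1e-9, nuc_E < nuc_F)
```

This mirrors the comparison between images (c) and (d) in Fig 1: identical pixel multisets, hence identical L1/L2 norms, but different nuclear norms.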
Fig 1. (a) Original image; (b) observed image; (c) error image; (d) rearranged error image; (e) distributions of error image; and (f) distributions of singular values of error image.

https://doi.org/10.1371/journal.pone.0159945.g001

2. Patch-based matrix regression (PMR)

To make the model robust and efficient for face recognition with occlusion and illumination changes, matrix regression [29, 30, 32] was proposed using the nuclear norm to characterize the structure of the error image. In our patch-based matrix regression, all local patches are denoted in matrix form. Given a set of N local patches Xi1, Xi2, …, XiN ∈ R^{p×q} and a query patch Yi ∈ R^{p×q} located at position i, Yi can be represented linearly using Xi1, Xi2, …, XiN, i.e.,

Yi = F(αi) + Ei,  (4)

where F(αi) = αi1Xi1 + αi2Xi2 + … + αiNXiN, αi = (αi1, …, αiN)^T is the representation coefficient vector and Ei is the representation error. Generally, αi can be determined by the following regularized model:

min_{αi} ||Yi − F(αi)||_* + λ||αi||_1,  (5)

where ||·||_* denotes the nuclear norm (the sum of the singular values) on R^{p×q}.

By introducing the auxiliary variables Ei = Yi − F(αi) and zi = αi, the problem is equivalent to

min_{αi, Ei, zi} ||Ei||_* + λ||zi||_1  s.t.  Yi − F(αi) = Ei,  αi = zi.  (6)

Problem (6) can be solved by the alternating direction method of multipliers (ADMM), which minimizes the following augmented Lagrangian function:

L(αi, Ei, zi, Λ1, Λ2) = ||Ei||_* + λ||zi||_1 + ⟨Λ1, Yi − F(αi) − Ei⟩ + ⟨Λ2, αi − zi⟩ + (μ/2)(||Yi − F(αi) − Ei||_F^2 + ||αi − zi||_2^2),  (7)

where Λ1 and Λ2 are the Lagrange multipliers and μ > 0 is a penalty parameter.

That is, after completing the squares,

L(αi, Ei, zi, Λ1, Λ2) = ||Ei||_* + λ||zi||_1 + (μ/2)||Yi − F(αi) − Ei + Λ1/μ||_F^2 + (μ/2)||αi − zi + Λ2/μ||_2^2 − (1/(2μ))(||Λ1||_F^2 + ||Λ2||_2^2).  (8)

The entire algorithm is briefly summarized in Algorithm 1, which mainly consists of two steps: a soft-thresholding operator [44] and a singular value thresholding operator [45].

Based on the optimal solution αi*, we can obtain the reconstructed image of Yi as Ŷi = F(αi*). Let δk: R^N → R^N be the characteristic function that selects the coefficients associated with the k-th class. For α ∈ R^N, δk(α) is a vector whose nonzero entries are the entries in α that are associated with class k. Using the coefficients associated with the k-th class, one can obtain the reconstruction of Yi in class k as Ŷi^k = F(δk(αi*)).

Algorithm 1. Solving problem (6) via ADMM

Input: A set of N patches Xi1, Xi2, …, XiN ∈ R^{p×q} and a query patch Yi ∈ R^{p×q}, parameters λ and μ, the termination condition parameter ε.

1: Fix the others and update αi^{k+1} by solving the least squares problem min_α ||Yi − F(α) − Ei^k + Λ1^k/μ||_F^2 + ||α − zi^k + Λ2^k/μ||_2^2.

2: Fix the others and update Ei^{k+1} by the singular value thresholding operator: Ei^{k+1} = SVT_{1/μ}(Yi − F(αi^{k+1}) + Λ1^k/μ).

3: Fix the others and update zi^{k+1} by the soft-thresholding operator: zi^{k+1} = soft_{λ/μ}(αi^{k+1} + Λ2^k/μ).

4: Update the multipliers: Λ1^{k+1} = Λ1^k + μ(Yi − F(αi^{k+1}) − Ei^{k+1}), Λ2^{k+1} = Λ2^k + μ(αi^{k+1} − zi^{k+1}).

5. If the termination condition max(||Yi − F(αi^{k+1}) − Ei^{k+1}||_F, ||αi^{k+1} − zi^{k+1}||_2) < ε is satisfied, go to 6; otherwise go to 1.

6. Output: Optimal coding vector αi* = αi^{k+1}.

The corresponding class reconstruction error is defined as

e_ik(Yi) = ||Yi − F(δk(αi*))||_*.  (9)

The recognition output of the query patch Yi is then Identity(Yi) = argmin_k {e_ik(Yi)}.
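Algorithm 1 can be sketched in a few dozen lines. The following is a minimal sketch that assumes, consistent with the soft-thresholding and singular value thresholding operators mentioned above, an l1-regularized nuclear-norm model with auxiliary variables for the error matrix and the coefficients; the update formulas follow the standard ADMM recipe and are not the authors' exact implementation:

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft-thresholding operator: proximal map of the l1-norm."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: proximal map of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def pmr_admm(Y, X_list, lam=0.01, mu=1.0, n_iter=200):
    """Sketch: solve min_a ||Y - sum_j a_j X_j||_* + lam*||a||_1
    via ADMM with splitting E = Y - F(a), z = a."""
    N = len(X_list)
    p, q = Y.shape
    # stack vectorized patches as columns so F(a) = (Phi @ a).reshape(p, q)
    Phi = np.stack([X.ravel() for X in X_list], axis=1)
    G = Phi.T @ Phi + np.eye(N)          # from the two quadratic couplings
    a = np.zeros(N); z = np.zeros(N)
    E = np.zeros((p, q)); L1 = np.zeros((p, q)); L2 = np.zeros(N)
    for _ in range(n_iter):
        # a-step: least squares coupling both constraints
        t1 = (Y - E + L1 / mu).ravel()
        a = np.linalg.solve(G, Phi.T @ t1 + z - L2 / mu)
        R = (Phi @ a).reshape(p, q)
        # E-step: singular value thresholding of the residual
        E = svt(Y - R + L1 / mu, 1.0 / mu)
        # z-step: soft thresholding of the coefficients
        z = soft_threshold(a + L2 / mu, lam / mu)
        # multiplier steps
        L1 = L1 + mu * (Y - R - E)
        L2 = L2 + mu * (a - z)
    return a
```

At convergence a ≈ z, and the per-class residuals e_ik of Eq (9) can be computed from the returned coefficients exactly as in the NMR classification rule.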

Then we can combine the classification outputs of all patches by linear weighted combination [37], probabilistic model [40], kernel plurality [34] or majority voting [35]. In this paper, we use majority voting in the final decision making. Fig 2 shows the main diagram of the patch-based matrix regression for face recognition.
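The majority-voting fusion used here reduces to a one-liner; a sketch (note that `Counter.most_common` breaks ties by first encounter):

```python
from collections import Counter

def majority_vote(patch_labels):
    """Fuse per-patch identities by majority voting over all patches."""
    return Counter(patch_labels).most_common(1)[0][0]
```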

Fig 2. Diagram of patch-based matrix regression for face recognition.

https://doi.org/10.1371/journal.pone.0159945.g002

3. Multi-scale ensemble

From the previous introduction of PMR, we can see that the patch size plays an important role in the final output performance. In addition, how to set an optimal scale in advance for various databases remains unclear. Fig 3 exhibits the recognition rate curves versus different training sample sizes and patch sizes on the LFW and Extended Yale B databases, respectively. From Fig 3, the following observations can be made. First, the optimal patch size varies greatly between databases. Second, the optimal patch size also varies with the training sample size per person. To tackle these difficulties, the recognition outputs of multi-scale PMR can be fused optimally; thus, the complementary information from different scales can be fully exploited to further enhance the recognition performance. Motivated by [36], we incorporate an ensemble learning scheme into our method to integrate the multi-scale outputs.

Fig 3. Impact of patch size on PMR (1–5 denote the training sample size per subject).

https://doi.org/10.1371/journal.pone.0159945.g003

The diagram of the proposed multi-scale PMR is shown in Fig 4. In the following text, we first formulate the multi-scale ensemble problem, and then introduce a margin distribution optimization to obtain the optimal solution.

Problem formulation.

Suppose that we have two scales and two sample classes labeled +1 and -1. For any query sample, we can obtain its classification result, +1 or -1, on each scale. Thus, each sample has four possible classification results on these two scales: {-1,+1}, {+1,+1}, {-1,-1} and {+1,-1}. Given an available training data set, our goal is to learn a classification function f that classifies all the given samples correctly.

Given a sample set S = {(xi, zi)} (i = 1, 2, …, n, where zi is the label of xi) and s scales, the classification results of the samples xi on these s scales form a space H ⊆ R^{n×s}. Denote by w = [w1, w2, …, ws] the scale fusion vector, whose entries sum to one.

Definition 1 [36]: For multi-class classification tasks, the classification results of a given query sample xi ∈ S on the s different scales are denoted as Lj(xi), j = 1, 2, …, s. Then the decision matrix D = {dij}, i = 1, 2, …, n, j = 1, 2, …, s, can be defined as

dij = +1 if Lj(xi) = zi, and dij = −1 otherwise,  (10)

where zi is the label of the sample xi.

Definition 2 [36]: For a query sample xi ∈ S, Lj(xi) (j = 1, 2, …, s) are the classification results on the s different scales. Then the ensemble margin of xi is denoted as

ε(xi) = Σ_{j=1}^{s} wj dij.  (11)

The ensemble loss of xi can be denoted as [36]

loss(xi) = (1 − ε(xi))^2,  (12)

where ε(xi) is the ensemble margin of sample xi. The square loss applied in CRC [18], SRC [15] and least squares regression [16] can be used here to evaluate the ensemble loss. For a sample set S, its ensemble square loss can be formulated as

L(S) = ||e1 − Dw||_2^2,  (13)

where e1 is a column vector whose entries are all 1.
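Definitions 1 and 2 translate directly into a few lines of code. A sketch, assuming the usual ±1 correct/incorrect encoding of the decision matrix (function names are illustrative):

```python
import numpy as np

def decision_matrix(preds, z):
    """Eq (10): d_ij = +1 if the scale-j prediction for sample i
    matches its label, -1 otherwise. preds is n x s, z has length n."""
    return np.where(preds == np.asarray(z)[:, None], 1.0, -1.0)

def ensemble_square_loss(D, w):
    """Eq (13): squared loss between the all-ones margin target e1
    and the fused margins D @ w."""
    n = D.shape[0]
    return np.sum((np.ones(n) - D @ w) ** 2)
```

A perfectly fused ensemble drives every margin ε(xi) toward 1, i.e., drives this loss toward zero.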

Algorithm of MSPMR.

In order to obtain the optimal scale fusion weights, the ensemble square loss in Eq (13) should be minimized. Nevertheless, the solution of this linear system may be non-unique. Intuitively, we should impose a constraint on the objective function in Eq (13) to make the solution unique and stable. Moreover, Shawe-Taylor [46] provided a bound on the generalization error and pointed out that both the norm of w and the ensemble square loss should be optimized simultaneously to enhance the generalization ability.

As in [36], the following constrained l1-regularized least squares optimization can be used to obtain the optimal scale weights [47]:

min_w ||e1 − Dw||_2^2 + τ||w||_1  s.t.  Σ_{j=1}^{s} wj = 1,  (14)

where τ is a regularization parameter and the regularization term helps to achieve a stable solution. The constraint can be written as e2w = 1, where e2 = [1, 1, …, 1] is a row vector of length s. Then, we have

min_w ||e1 − Dw||_2^2 + τ||w||_1  s.t.  e2w = 1.  (15)

Denoting D̃ = [D; e2] and ẽ = [e1; 1] (i.e., appending the constraint row e2 to D and the entry 1 to e1), we then have [36]

min_w ||ẽ − D̃w||_2^2 + τ||w||_1.  (16)

Algorithm 2. Algorithm of multi-scale ensemble learning for PMR

1: Choose s patch scales δ = [δ1, δ2, …, δs];

2: Obtain the recognition outputs on each scale by PMR;

3: Obtain the decision matrix D via Eq (10);

4: Learn the fusion weights w via Eq (16).

Due to the fact that the decision matrix is usually very small, the scale fusion weights w can be obtained by commonly used l1-minimization solvers; in our method, l1_ls [48] is employed. Based on the above description, the proposed multi-scale PMR (MSPMR) scheme is summarized in Algorithm 2. Once the optimal scale fusion weights are obtained, the recognition output for an arbitrary sample xi is given by weighted voting over its s scale outputs with the weights w.
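The weight-learning step can be sketched as follows. This sketch solves the problem of Eq (14) with a generic SLSQP solver instead of the l1_ls package used in the paper, and additionally assumes a nonnegativity constraint on w (a common choice in such ensemble schemes, not stated explicitly above):

```python
import numpy as np
from scipy.optimize import minimize

def learn_scale_weights(D, tau=0.1):
    """Sketch of Eq (14): min_w ||e1 - D w||^2 + tau*||w||_1
    subject to sum(w) = 1 and (assumed) w >= 0."""
    n, s = D.shape
    e1 = np.ones(n)
    obj = lambda w: np.sum((e1 - D @ w) ** 2) + tau * np.sum(np.abs(w))
    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * s            # nonnegativity (assumed)
    w0 = np.full(s, 1.0 / s)             # start from uniform weights
    res = minimize(obj, w0, method='SLSQP', bounds=bounds, constraints=cons)
    return res.x
```

With the nonnegativity and sum-to-one constraints in force, the solver concentrates the weight on the scales whose decisions are most often correct on the held-out subset.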

4. Computational complexity

In this subsection, we evaluate the computational complexity of the proposed method. Since the multi-scale fusion weights can be learned off-line, we only discuss the computational complexity of the on-line recognition process. As illustrated in Algorithm 2, the proposed face recognition method spends most of its cost on the patch-based matrix regression process. Four factors affect the cost of our method: the training sample size N, the dimension of one patch m = p×q, the number of iterations k in Algorithm 1, and the number of patches M in one image.

As described in [30], the matrix regression of each patch costs O(k(m^1.5 + mN + N^2)) (in the case that p = q). For M image patches, the computational cost is O(k(m^1.5 + mN + N^2)M). In addition, the number of scales s also affects the final running time. Therefore, the computational cost of the proposed method is about O(sk(m^1.5 + mN + N^2)M). In Section 4, we further compare the proposed method with state-of-the-art approaches in terms of CPU runtime.

Experimental Results and Discussion

In this section, we conduct experiments on benchmark face databases and compare the proposed method with state-of-the-art models. For each method, we perform 20 test runs on each database, and the average recognition rates and the corresponding standard deviations are reported. As in [36], seven scales are adopted in our MSPMR, with patch sizes 4×4, 6×6, 8×8, 10×10, 12×12, 14×14 and 16×16. In the single-scale-based PNN, PSRC, PCRC and PMR, the patch size is 10×10, and the patches overlap with their neighbors by 5 pixels. For PMR and MSPMR, we choose the optimal λ∈[0.01,0.1]. The parameter τ is set to 0.1 for MSPMR. It should be mentioned that all experiments are performed on the original face images, without any feature extraction or image preprocessing step. Several face image datasets were used in this paper to verify the performance of our methods. These datasets are publicly available for face recognition research, so consent was not needed. The face images and the experimental results are reported in this paper without any commercial purpose.

As in [36], to learn the optimal scale weights, the training set is divided into subset1 (one image per person) and subset2 (the remainder of the training set). Then, PMR is used to classify the samples from subset1 using subset2 as the gallery set, and the optimal weights for the seven scales are learned. It should be noted that at least two samples per person are required to find the optimal scale fusion weights.

1. Face recognition without occlusion

In this subsection, we test the MSPMR for face recognition without occlusion on four face databases (Extended Yale B [49] and AR [50] in controlled environments together with the LFW [36] in uncontrolled environments). The baseline CRC [18], SRC [15], NSC [30], and patch-based methods including PNN [34], BlockFLD [38], Volterra [35], PCRC and MSPCRC [36] are used for comparison.

Extended Yale B database.

The first experiment was conducted on the Extended Yale B database, which includes 38 human subjects in 9 poses under 64 illumination conditions [49]. The 64 images of a person in a particular pose are acquired at a camera frame rate of 30 frames per second, so the variations in head pose and facial expression are small. All the frontal images marked with P00 are utilized in this experiment, and each is reshaped to 32×32. Some examples are shown in Fig 5. For each subject, 2~5 samples are randomly selected from the first 32 images for training, and another 5 samples are randomly chosen from the remaining 32 images for testing. Table 2 tabulates the experimental results.

Fig 5. Sample images of a person under various illumination conditions in the Extended Yale B database from different sessions.

https://doi.org/10.1371/journal.pone.0159945.g005

Table 2. Recognition rates (%) on the Extended Yale B database.

https://doi.org/10.1371/journal.pone.0159945.t002

It can easily be seen that MSPMR obtains the best recognition performance for all tests. Compared with PCRC and MSPCRC, PMR and MSPMR lead to much better results, thus verifying the effectiveness of characterizing the reconstruction error by the nuclear norm.

AR database.

The AR database [50] gathers over 4,000 color face images from 126 subjects, containing frontal facial images with different lighting conditions, facial expressions and occlusions. Pictures of 120 subjects were taken in two sessions (separated by two weeks), and each subject has 13 color images per session. As in [18], in this experiment we choose a subset with only illumination and expression changes, which includes 50 male subjects and 50 female subjects. Fourteen face images (seven from each session) of each of these 100 individuals are selected and used. For each subject, 2~5 samples from session 1 are randomly chosen for training, and another 3 samples from session 2 are randomly chosen for testing. All the images are manually cropped and then resized to 32×32 pixels. Some sample images of one person are presented in Fig 6.

The recognition results of different methods are listed in Table 3. The proposed methods always achieve better performance than the other methods. We can observe that in AR database, multi-scale ensemble learning in MSPMR leads to limited improvement over PMR. As described in [36], the reason may be that in this database, the average weight value for the scale 10×10 is approximately 0.9, indicating that patch size 10×10 is a proper choice for PMR in the AR database.

LFW database.

Labeled Faces in the Wild (LFW) [43] is a large-scale database of face photographs designed for unconstrained face recognition with variations in pose, illumination, expression, misalignment and occlusion; it contains images of 5,749 subjects. LFW-a is an extension of LFW obtained by applying commercial face alignment software. As in [36], the subjects who have more than ten samples are gathered to form a dataset with 158 subjects from LFW-a. All the images are manually cropped and then resized to 32×32 pixels. Fig 7 shows some sample images from this database. For each subject, we randomly choose 2~5 samples for training and another 2 samples for testing.

Table 4 shows the face recognition results of each method on the LFW dataset. From Table 4, we can clearly see that the performance of our PMR and MSPMR is superior to that of all the other methods. Moreover, the recognition performance is greatly improved by the multi-scale ensemble in MSPMR.

2. Face recognition with occlusion

In the following experiments, we evaluate the robustness and effectiveness of the proposed method when face images are subject to different occlusions, such as real disguise or block occlusion. In this subsection, our method is compared with CRC [18], SRC [15], NSC [30], HQ_A and HQ_M [26], PSRC [15], PCRC and MSPCRC [36].

Face recognition with real disguise.

As in [29, 32], a subset of the AR face database is applied, containing 50 males and 50 females. Each face image is manually cropped and normalized to a size of 42×30. Fig 8 shows the sample images for one person from the AR database. In our experiment, for each individual, the first four images (with various facial expressions) from session 1 and session 2 are chosen to form the training set. Two image sets with sunglasses and scarves are used for testing, each of which includes 600 images (three images per session of each individual). For each individual, 2~5 samples are randomly chosen from the training set and another 3 samples from the testing set to evaluate the performance of each method.

Fig 8. Training and testing images of a person in the AR database.

https://doi.org/10.1371/journal.pone.0159945.g008

The recognition results of each method are shown in Tables 5 and 6, from which we can see that the patch-based methods achieve better performance than the corresponding holistic ones. PMR also gives better results than PCRC and MSPCRC. MSPMR obtains the best performance among all the competing methods when testing on images with sunglasses and achieves comparable results when testing on images with scarves.

Table 5. Recognition rates (%) on the AR database testing with sunglasses.

https://doi.org/10.1371/journal.pone.0159945.t005

Table 6. Recognition rates (%) on the AR database testing with scarves.

https://doi.org/10.1371/journal.pone.0159945.t006

Face recognition with block occlusions.

In this subsection, we evaluate the robustness of our method against block occlusions. We adopt Subsets 1 and 2 of the Extended Yale B database for training and Subset 3 for testing. All the face images are normalized to 48×42 pixels. The testing images are corrupted by a randomly located square block of a “baboon” image with an occlusion level of 40%. Fig 9 shows the training and testing sample images for one person from the Extended Yale B database. For each individual, 2~5 samples are randomly chosen from the training set and another 5 samples from the testing set to evaluate the performance of each method.

Fig 9. Training (the first row) and testing (the second row) sample images of a person in the Extended Yale B database.

https://doi.org/10.1371/journal.pone.0159945.g009

The face recognition results of each method are tabulated in Table 7. We can see that by characterizing the reconstruction error with the nuclear norm, NSC overall outperforms CRC, SRC, HQ_A and HQ_M. By virtue of the patch trick, our PMR always outperforms PCRC and PSRC. By incorporating the multi-scale ensemble learning trick, the proposed MSPMR achieves the best performance among all the competing methods.

Table 7. Recognition rates (%) on the Extended Yale B database with block occlusion.

https://doi.org/10.1371/journal.pone.0159945.t007

3. Parameter discussion

In this subsection, we mainly discuss how the regularization parameter λ affects the performance of our PMR and MSPMR in different face recognition scenarios. The experimental settings are the same as in the aforementioned experiments in Sections 4.1 and 4.2 except that the number of training samples per person is fixed at 3. Fig 10 plots the recognition results of PMR and MSPMR versus the regularization parameter λ on different face image databases. We can observe that PMR and MSPMR always achieve their optimal or nearly optimal performance in the range [0.01, 0.1]. Thus, we can set the regularization parameter of the proposed method within this range for real-world scenarios.

Fig 10. Recognition rate curves of PMR and MSPMR versus the variations in the regularization parameter in different face recognition scenarios.

https://doi.org/10.1371/journal.pone.0159945.g010

4. Running time comparisons

In this subsection, the CPU runtime of the proposed method is compared with that of the state-of-the-art methods. The results on the AR face database testing with scarves are listed for demonstration. For each individual, 3 samples are randomly chosen for training and another 3 samples for testing. Table 8 tabulates the CPU time spent on all test images, measured using Matlab R2012b on a Windows PC with an Intel Core 8 CPU at 3.6 GHz and 8 GB of memory. Owing to the singular value shrinkage operator in matrix regression, the proposed method consumes much more time than the other methods. Although the patch-based methods achieve promising results, they come at the cost of expensive running time. Because the recognition process of each test image is independent, the cost can be reduced by parallel computation.

Table 8. Comparisons of CPU time on AR face database testing with scarves.

https://doi.org/10.1371/journal.pone.0159945.t008
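The singular value shrinkage operator mentioned above is the standard proximal operator of the nuclear norm [45]; each application requires a full SVD, which explains the runtime gap in Table 8. A minimal sketch (in Python rather than the Matlab used in the experiments):

```python
import numpy as np

def svt(X, tau):
    """Singular value shrinkage (soft-thresholding) operator: the proximal
    operator of the nuclear norm and the per-iteration bottleneck of
    nuclear norm-based matrix regression."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # soft-threshold each singular value
    return U @ np.diag(s_shrunk) @ Vt

# One SVD per patch per iteration, O(min(m, n) * m * n) time each.
A = np.diag([3.0, 1.0, 0.2])
print(np.linalg.svd(svt(A, 0.5), compute_uv=False))  # [2.5, 0.5, 0.0]
```

Since the operator is applied independently to each patch of each test image, the patch-level calls are embarrassingly parallel, which is the basis of the parallelization remark above.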

5. Evaluation of the experimental results

The aforementioned experimental results have shown that the proposed method always obtains better performance than several state-of-the-art methods. However, is this superiority statistically significant? In this subsection, we assess the experimental results with a null hypothesis statistical test [51]. If the evaluated p-value is below the desired significance level (here, 0.05), the performance difference between the compared approaches is deemed statistically significant. The evaluation results are summarized as follows:

  1. For face recognition without occlusion, such as on the LFW database, MSPMR significantly outperforms MSPCRC in all tests (p = 0.014, 0.013, 0.016 and 0.020). On the other databases, although MSPMR performs better than the other state-of-the-art methods, the performance discrepancies between MSPMR and the other approaches are not statistically significant.
  2. For face recognition with occlusion, MSPMR performs significantly better than the other approaches in the cases of real disguise and block occlusion (p < 0.001).
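One common instance of such a test, sketched below for illustration (the paper's exact procedure follows [51]), is a paired t-test over per-trial recognition rates; the per-trial rates here are hypothetical numbers, not results from the paper.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """t statistic for the paired difference between two methods'
    per-trial recognition rates (degrees of freedom df = n - 1).
    The p-value follows from the t(df) distribution."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

# Hypothetical per-trial rates for MSPMR versus a baseline:
mspmr = [0.91, 0.89, 0.92, 0.90, 0.93]
base  = [0.87, 0.86, 0.88, 0.87, 0.89]
t, df = paired_t(mspmr, base)
print(t, df)  # a large positive t implies a small p-value
```

A consistent per-trial advantage, even a small one, yields a large t statistic and hence a small p-value, which is why the occlusion experiments reach p < 0.001.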

Conclusions and Future Work

To improve the performance of matrix regression in the face of the small sample size problem while preserving the desired performance level in the presence of occlusion and illumination changes, in this paper we proposed a patch-based matrix regression (PMR) method. PMR first performs matrix regression on each raw patch (without matrix-to-vector conversion), and then combines the recognition outputs of all patches by majority voting. However, it is difficult to pre-define an optimal patch size across different databases. Fortunately, the complementary information across multiple patch scales can be exploited to further enhance recognition performance. To this end, we proposed the multi-scale version of PMR, MSPMR, which optimally combines the multi-scale outputs. Our extensive experimental results demonstrate that the proposed methods are more effective and robust than the state-of-the-art methods.
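The majority-voting step of PMR summarized above is simple to state in code; this is a minimal sketch of the combination rule only (the per-patch matrix regression that produces the labels is omitted).

```python
from collections import Counter

def majority_vote(patch_labels):
    """Combine per-patch predicted identities by majority voting, as PMR
    does after running matrix regression on each raw patch."""
    return Counter(patch_labels).most_common(1)[0][0]

# Identities predicted independently for 7 patches of one test face:
print(majority_vote([3, 3, 5, 3, 1, 3, 5]))  # 3
```

MSPMR replaces this unweighted vote with an optimally weighted combination of the outputs from several patch scales.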

Although our proposed method has achieved good performance, there are still issues to be addressed in the future. Generally, two main improvements can be made. (1) With advances in storage devices, large numbers of images can be collected in real-world applications, so one challenge for our method is its high computational cost. We will try to design more efficient matrix regression algorithms to reduce this cost while maintaining robustness and effectiveness. (2) In our method, several patch scales must be pre-defined in advance, whereas different databases may exhibit scale variations in real-world applications. We can borrow the idea of scale-selective local binary patterns [52] to design an adaptive scale selection strategy and further improve the flexibility of our method.

Ethics Statement

Several face image datasets were used in this paper to verify the performance of our methods. These datasets are publicly available for face recognition research, so consent was not needed. The face images and experimental results are reported in this paper without any commercial purpose.

Author Contributions

  1. Conceived and designed the experiments: GWG JY XYJ.
  2. Performed the experiments: GWG PH JLH.
  3. Analyzed the data: GWG JY XYJ DY.
  4. Wrote the paper: GWG JY XYJ DY.

References

  1. Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Information Sciences 295:395–406.
  2. Gu B, Sheng VS, Tay KY, Romano W, Li S (2015) Incremental support vector learning for ordinal regression. IEEE Transactions on Neural Networks and Learning Systems 26(7):1403–1416. pmid:25134094
  3. Juang C-F, Chiu S-H, Shiu S-J (2007) Fuzzy system learned through fuzzy clustering and support vector machine for human skin color segmentation. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 37(6):1077–1087.
  4. Gu B, Sheng VS, Wang Z, Ho D, Osman S, Li S (2015) Incremental learning for v-support vector regression. Neural Networks 67:140–150. pmid:25933108
  5. Deng Z, Choi K-S, Jiang Y, Wang S (2014) Generalized hidden-mapping ridge regression, knowledge-leveraged inductive transfer learning for neural networks, fuzzy systems and kernel methods. IEEE Transactions on Cybernetics 44(12):2585–2599. pmid:24710838
  6. Gu B, Sheng VS (2016) A robust regularization path algorithm for v-support vector classification. IEEE Transactions on Neural Networks and Learning Systems.
  7. Jiang Y, Chung F-L, Ishibuchi H, Deng Z, Wang S (2015) Multitask TSK fuzzy system modeling by mining intertask common hidden structure. IEEE Transactions on Cybernetics 45(3):534–547.
  8. Gu B, Sun X, Sheng VS (2016) Structural minimax probability machine. IEEE Transactions on Neural Networks and Learning Systems.
  9. Jain AK, Ross A, Prabhakar S (2004) An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology 14(1):4–20.
  10. Wong WK, Lai Z, Xu Y, Wen J, Ho CP (2015) Joint tensor feature analysis for visual object recognition. IEEE Transactions on Cybernetics 45(11):2425–2436. pmid:26470058
  11. Yang M, Feng Z, Shiu SC, Zhang L (2014) Fast and robust face recognition via coding residual map learning based adaptive masking. Pattern Recognition 47(2):535–543.
  12. Gao G, Yang J, Wu S, Jing X, Yue D (2015) Bayesian sample steered discriminative regression for biometric image classification. Applied Soft Computing 37:48–59.
  13. Chen B, Shu H, Coatrieux G, Chen G, Sun X, Coatrieux JL (2015) Color image analysis by quaternion-type moments. Journal of Mathematical Imaging and Vision 51(1):124–144.
  14. Yang W, Wang Z, Sun C (2015) A collaborative representation based projections method for feature extraction. Pattern Recognition 48(1):20–27.
  15. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2):210–227. pmid:19110489
  16. Naseem I, Togneri R, Bennamoun M (2010) Linear regression for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(11):2106–2112. pmid:20603520
  17. Wagner A, Wright J, Ganesh A, Zhou Z, Mobahi H, Ma Y (2012) Toward a practical face recognition system: Robust alignment and illumination by sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(2):372–386. pmid:21646680
  18. Zhang L, Yang M, Feng X (2011) Sparse representation or collaborative representation: Which helps face recognition? 2011 IEEE International Conference on Computer Vision (ICCV), IEEE, pp. 471–478.
  19. Gao G, Yang J (2014) A novel sparse representation based framework for face image super-resolution. Neurocomputing 134:92–99.
  20. Jiang J, Hu R, Wang Z, Han Z (2014) Noise robust face hallucination via locality-constrained representation. IEEE Transactions on Multimedia 16(5):1268–1281.
  21. Tawari A, Trivedi MM (2013) Face expression recognition by cross modal data association. IEEE Transactions on Multimedia 15(7):1543–1552.
  22. Lai Z, Xu Y, Jin Z, Zhang D (2014) Human gait recognition via sparse discriminant projection learning. IEEE Transactions on Circuits and Systems for Video Technology 24(10):1651–1662.
  23. Lai Z, Xu Y, Chen Q, Yang J, Zhang D (2014) Multilinear sparse principal component analysis. IEEE Transactions on Neural Networks and Learning Systems 25(10):1942–1950. pmid:25291746
  24. Yang J, Zhang L, Xu Y, Yang J-y (2012) Beyond sparsity: The role of L1-optimizer in pattern classification. Pattern Recognition 45(3):1104–1118.
  25. Yang M, Zhang L, Yang J, Zhang D (2011) Robust sparse coding for face recognition. 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 625–632.
  26. He R, Zheng W-S, Tan T, Sun Z (2014) Half-quadratic-based iterative minimization for robust sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(2):261–275. pmid:24356348
  27. Yang M, Zhu P, Liu F, Shen L (2015) Joint representation and pattern learning for robust face recognition. Neurocomputing 168:70–80.
  28. Ou W, You X, Tao D, Zhang P, Tang Y, Zhu Z (2014) Robust face recognition via occlusion dictionary learning. Pattern Recognition 47(4):1559–1572.
  29. Yang J, Luo L, Qian J, Tai Y, Zhang F, Xu Y (2016) Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes. IEEE Transactions on Pattern Analysis and Machine Intelligence, preprint.
  30. Luo L, Yang J, Qian J, Yang J (2014) Nuclear norm regularized sparse coding. 2014 22nd International Conference on Pattern Recognition (ICPR), IEEE, pp. 1834–1839.
  31. Zhang F, Yang J, Tai Y, Tang J (2015) Double nuclear norm-based matrix decomposition for occluded image recovery and background modeling. IEEE Transactions on Image Processing 24(6):1956–1966. pmid:25667350
  32. Qian J, Luo L, Yang J, Zhang F, Lin Z (2015) Robust nuclear norm regularized regression for face recognition with occlusion. Pattern Recognition 48(10):3145–3159.
  33. Tan X, Chen S, Zhou Z-H, Zhang F (2006) Face recognition from a single image per person: A survey. Pattern Recognition 39(9):1725–1745.
  34. Kumar R, Banerjee A, Vemuri BC, Pfister H (2011) Maximizing all margins: Pushing face recognition with kernel plurality. 2011 IEEE International Conference on Computer Vision (ICCV), IEEE, pp. 2375–2382.
  35. Kumar R, Banerjee A, Vemuri BC (2009) Volterrafaces: Discriminant analysis using Volterra kernels. IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 150–155.
  36. Zhu P, Zhang L, Hu Q, Shiu SC (2012) Multi-scale patch based collaborative representation for face recognition with margin distribution optimization. 2012 European Conference on Computer Vision (ECCV), Springer, pp. 822–835.
  37. Tan X, Chen S, Zhou Z-H, Zhang F (2005) Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft k-NN ensemble. IEEE Transactions on Neural Networks 16(4):875–886. pmid:16121729
  38. Chen S, Liu J, Zhou Z-H (2004) Making FLDA applicable to face recognition with one sample per person. Pattern Recognition 37(7):1553–1555.
  39. Su Y, Shan S, Chen X, Gao W (2009) Hierarchical ensemble of global and local classifiers for face recognition. IEEE Transactions on Image Processing 18(8):1885–1896. pmid:19556198
  40. Lin D, Tang X (2006) Recognize high resolution faces: From macrocosm to microcosm. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, pp. 1355–1362.
  41. Wolf L, Hassner T, Taigman Y (2011) Effective unconstrained face recognition by combining multiple descriptors and learned background statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(10):1978–1990. pmid:21173442
  42. Guillaumin M, Verbeek J, Schmid C (2009) Is that you? Metric learning approaches for face identification. 2009 IEEE International Conference on Computer Vision (ICCV), IEEE, pp. 498–505.
  43. Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07–49, University of Massachusetts, Amherst.
  44. Lin Z, Chen M, Ma Y (2010) The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055.
  45. Cai JF, Candes EJ, Shen ZW (2010) A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20(4):1956–1982.
  46. Shawe-Taylor J, Cristianini N (1999) Robust bounds on generalization from the margin distribution. 4th European Conference on Computational Learning Theory.
  47. Shen C, Li H (2010) On the dual formulation of boosting algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(12):2216–2231. pmid:20975119
  48. Kim S-J, Koh K, Lustig M, Boyd S, Gorinevsky D (2007) An interior-point method for large-scale l1-regularized least squares. IEEE Journal of Selected Topics in Signal Processing 1(4):606–617.
  49. Lee KC, Ho J, Kriegman DJ (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(5):684–698. pmid:15875791
  50. Martinez AM, Benavente R (1998) The AR face database. CVC Technical Report 24.
  51. Beveridge JR, She K, Draper B, Givens GH (2001) Parametric and nonparametric methods for the statistical evaluation of human id algorithms. 3rd Workshop on the Empirical Evaluation of Computer Vision Systems, pp. 1–14.
  52. Guo Z, Wang X, Zhou J, You J (2016) Robust texture image representation by scale selective local binary patterns. IEEE Transactions on Image Processing 25(2):687–699. pmid:26685235