Supervised Filter Learning for Representation Based Face Recognition

Representation based classification methods, such as Sparse Representation Classification (SRC) and Linear Regression Classification (LRC), have been successfully developed for face recognition. However, most of these methods use the original face images for recognition without any preprocessing. Thus, their performance may be affected by problematic factors in the face images, such as illumination and expression variations. To overcome this limitation, this paper proposes a novel supervised filter learning algorithm for representation based face recognition. The underlying idea of our algorithm is to learn a filter such that the within-class representation residuals of the faces' Local Binary Pattern (LBP) features are minimized and the between-class representation residuals of the faces' LBP features are maximized. Therefore, the LBP features of the filtered face images are more discriminative for representation based classifiers. Furthermore, we extend our algorithm to the heterogeneous face recognition problem. Extensive experiments are carried out on five databases, and the experimental results verify the efficacy of the proposed algorithm.


Introduction
Automatic face recognition has become a very active topic in computer vision and related research fields [1]. However, face recognition is still a difficult task in practice because of two kinds of problematic factors. One is appearance variations, including changes in facial expression, pose, age and illumination; the other is man-made variations, e.g. noise from the cameras. The performance of many recognition approaches degrades significantly in these cases.
Recently, representation based methods have been widely used for face recognition. In [2], Wright et al. proposed a sparse representation based classification (SRC) method for face recognition. SRC first sparsely codes a query face image over the original training images and then classifies it by checking which class yields the minimal representation residual of the query image. Later, Naseem et al. [3] proposed a linear regression based classification (LRC) method based on the assumption that patterns from the same class lie on a linear subspace, so a test image should be well represented as a linear combination of the training images from the same class. The main difference between SRC and LRC is the regularization they employ: SRC uses L1-norm regularization to make the representation coefficients sparse, while LRC adopts L2-norm regularization to ensure that the learning problem is well posed. Since the experimental results in [2] and [3] demonstrated that SRC and LRC achieve impressive face recognition performance, research on representation based face recognition has been largely boosted and many approaches have been developed [4][5][6][7][8]. However, these representation based methods all use the original face images without any preprocessing for classification. Thus, as analyzed above, their performance may be affected by the problematic factors in face images.
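The residual-based decision rule shared by SRC and LRC can be illustrated with a minimal NumPy sketch of LRC-style classification: the query is regressed onto each class's training samples by ordinary least squares, and the class with the smallest reconstruction residual wins. The function name `lrc_predict` and the toy subspace data are illustrative assumptions, not code from [2] or [3]; SRC would instead code the query once over all training samples with an L1 penalty.

```python
import numpy as np


def lrc_predict(X_train, y_train, x_query):
    """LRC-style classification by class-wise least-squares residuals.

    X_train: (D, N) array with one training sample per column.
    y_train: (N,) class labels.
    x_query: (D,) query vector.
    Returns the predicted label and a dict of per-class residual norms.
    """
    residuals = {}
    for label in np.unique(y_train):
        Xc = X_train[:, y_train == label]          # class-specific dictionary
        coef, *_ = np.linalg.lstsq(Xc, x_query, rcond=None)
        residuals[label] = float(np.linalg.norm(x_query - Xc @ coef))
    predicted = min(residuals, key=residuals.get)  # smallest residual wins
    return predicted, residuals


# Toy data: each class spans its own random 2-D subspace of R^5
rng = np.random.default_rng(0)
B0, B1 = rng.standard_normal((5, 2)), rng.standard_normal((5, 2))
X = np.hstack([B0 @ rng.standard_normal((2, 3)), B1 @ rng.standard_normal((2, 3))])
y = np.array([0, 0, 0, 1, 1, 1])
query = B1 @ np.array([0.5, -1.0])                 # lies in class 1's subspace
label, res = lrc_predict(X, y, query)
print(label, res)
```

Because the query lies exactly in the subspace spanned by class 1, its class-1 residual is numerically zero while the class-0 residual is not.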
Nowadays, various feature extraction approaches have been employed for face recognition. Among them, Principal Component Analysis (PCA) [9], Linear Discriminant Analysis (LDA) [10] and their extensions [11][12][13][14][15] have been well studied and widely used to extract low-dimensional features from high-dimensional face images. However, since some recent studies have shown that high-dimensional face images possibly reside on a nonlinear manifold, many manifold learning methods, such as Isometric Feature Mapping (ISOMAP) [16], Locally Linear Embedding (LLE) [17], Laplacian Eigenmap (LE) [18] and their extensions, have also been proposed for face recognition. Although these feature extraction algorithms work well, they all belong to the subspace based methods and can only extract holistic features of face images, which may make them unstable under local variations such as expression, occlusion and misalignment [19]. As a result, local descriptors such as the Local Binary Pattern (LBP) have attracted increasing attention for their robustness to local distortions [20,21]. The LBP operator [22] is a texture descriptor that describes the intensity changes in the neighborhood of each pixel. It has been successfully used in face recognition applications because of its robustness to illumination and expression changes in face images and its computational efficiency. Considering the advantages of LBP in face recognition [23], many LBP variants have been proposed. In LGBP [24], GVLBP [25] and HGPP [26], instead of directly using pixel intensities to compute the LBP features, multi-scale and multi-orientation Gabor filters were employed to encode the face images, and the LBP histograms were then obtained from the encoded images. Zhao et al. first extracted gradient information from face images using the Sobel operator and then applied LBP to the gradient images for feature extraction [27].
The LBP has also been adopted to extract the features for representation based classification techniques. In [28] and [29], some researchers combined LBP with SRC for face recognition. In their methods, the LBP features were first extracted from the face images. Then, the SRC was utilized for classification. Kang et al. employed LBP to extract local features of the face images so that the performance of kernel SRC could be improved [30]. In [31], Lee also used the Gabor-LBP features for face image representation in SRC.
Although the experimental results in [28,29] indicate that LBP can improve the performance of representation based face recognition techniques, a main drawback of these methods is that label information is neglected during LBP local feature extraction, which may weaken their discriminative ability. To overcome this limitation, Lei et al. proposed an Image Filter Learning (IFL) method for face recognition [19]. In IFL, an image filter that explores the discriminative information for face representation is first learned. Then, LBP features are extracted from the filtered face images for recognition. However, IFL learns the discriminative image filter based on the Fisher criterion. Thus, it may not be suitable for representation based face recognition methods, in which classification is determined by representation residuals. Furthermore, the Fisher criterion may also make IFL unsuitable for face images that are not Gaussian distributed [32].
In this paper, a new supervised filter learning (SFL) algorithm is proposed to improve the discriminative ability of LBP features for representation based face recognition. Compared with other algorithms, our algorithm has two advantages. Firstly, unlike LGBP [24], GVLBP [25], HGPP [26] and Sobel-LBP [27], in which the image filters are defined in an ad hoc way, our algorithm learns the optimal filter in a supervised, data-driven manner. Therefore, the LBP features obtained by our algorithm are more discriminative than theirs. Secondly, unlike IFL [19], which learns the filter based on the Fisher criterion, the proposed SFL is specifically designed for representation based face recognition methods. That is, the main difference between IFL and the proposed algorithm is that the filter in IFL is learned by minimizing the within-class scatter and maximizing the between-class scatter of the faces' LBP features, while the filter in our algorithm is learned by reducing the within-class representation residual and enlarging the between-class representation residual of the faces' LBP features. As a result, the experimental results on five benchmark face databases (Yale, AR, CMU-PIE, LFW and VLNHF) show that our algorithm performs better than IFL and several other algorithms for representation based face recognition.
The remainder of the paper is organized as follows. The 'Related Work' section briefly reviews LBP and IFL. The 'The Proposed Algorithm' section describes the details of our algorithm. Experimental results and analysis are provided in the 'Experiments' section, and the 'Conclusions' section concludes the paper.

Related Work
In this section, two related works including Local Binary Pattern (LBP) [33] and Image Filter Learning (IFL) [19] are briefly reviewed.

Local Binary Pattern
The Local Binary Pattern (LBP) [33] was originally proposed by Ojala et al. as a powerful technique for texture description. It efficiently describes the local texture of an image by thresholding each pixel in a 3 × 3 neighborhood with the center pixel's value and treating the result as a binary number (see Fig 1 for an illustration). A 256-bin histogram of the LBP labels computed over the image can then be used as a texture feature. To describe image textures at different scales, LBP was later extended to neighborhoods of different sizes [33,34]. In this extension, the values of d points evenly sampled on a circle within an r × r neighborhood are compared with the center pixel's value, and the comparison results are again treated as a binary number (see Fig 2 for an illustration). When the sampled points do not fall exactly at pixel centers, their values are estimated by interpolation [33,34].
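As a minimal illustration of the basic operator described above, the NumPy sketch below computes 3 × 3 LBP codes and the 256-bin histogram. The function names, the clockwise bit ordering and the `>=` thresholding convention are assumptions of this sketch (any fixed ordering gives an equivalent descriptor up to relabeling of the bins); the circular (d, r) variant with interpolation is not covered.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left pixel. The bit
# order is an arbitrary but fixed choice made for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(img):
    """Basic 3x3 LBP code for every interior pixel of a 2-D grayscale image."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Neighbour >= centre contributes a 1 at this bit position
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def lbp_histogram(img):
    """256-bin histogram of LBP codes, used as a texture feature vector."""
    return np.bincount(lbp_3x3(img).ravel(), minlength=256)


patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_3x3(patch))                 # [[241]]
# A monotonic gray-scale change (square root) leaves the codes unchanged
print(lbp_3x3(np.sqrt(patch)))        # [[241]]
```

The last line illustrates the invariance of LBP to monotonic gray-scale transformations: since the square root preserves the ordering of pixel values, every threshold comparison, and hence every code, is unchanged.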
Compared with other features, the LBP feature has the advantage of being invariant to monotonic gray-scale transformations. Thus, it is robust to illumination and expression changes of face images to some extent and has been widely employed for face recognition. However, one limitation of LBP and its extensions is that the label information of face images is ignored. Therefore, the features extracted by these methods may lack discriminative power.

Discriminant Face Descriptor
To overcome this limitation of LBP and improve the discriminative ability of LBP features, Lei et al. proposed discriminant image filter learning (IFL) for face recognition [19]. The main idea of IFL is to reduce the variations of the LBP features of face images from the same person and meanwhile enlarge the margin between the LBP features of face images from different persons. To achieve this goal, IFL uses the label information of face images to learn a filter. Then, the LBP operator is applied to the filtered face images for local feature extraction. Specifically, let $I$ denote an input face image and $f(I)$ its filtered image. Considering the sampling strategy of LBP, IFL first defines the pixel difference vector (PDV) as

$$df(I)_p = \left[f(I)_{p_1} - f(I)_p,\ f(I)_{p_2} - f(I)_p,\ \ldots,\ f(I)_{p_d} - f(I)_p\right]^T \qquad (1)$$

where $f(I)_p$ is the pixel value of the filtered face image at position $p$, $\{p_1, p_2, \ldots, p_d\} \subset \mathrm{Neighbor}(p)$ and $d$ is the number of sampled points. Then, to make the PDVs of filtered face images from the same person similar and those of filtered face images from different persons distant, the Fisher criterion is adopted:

$$\max_f J(f) = \frac{\operatorname{tr}(S_b)}{\operatorname{tr}(S_w)} \qquad (2)$$

where $S_b$ and $S_w$ are the between-class and within-class scatters, computed as

$$S_b = \sum_{i=1}^{L} C_i \left(\overline{df}(m)^i - \overline{df}(m)\right)\left(\overline{df}(m)^i - \overline{df}(m)\right)^T \qquad (3)$$

$$S_w = \sum_{i=1}^{L} \sum_{j=1}^{C_i} \left(\overline{df}(I)_{ij} - \overline{df}(m)^i\right)\left(\overline{df}(I)_{ij} - \overline{df}(m)^i\right)^T \qquad (4)$$

Here $L$ is the number of classes and $C_i$ is the number of samples in the $i$-th class. $\overline{df}(I)_{ij}$, $\overline{df}(m)^i$ and $\overline{df}(m)$ are augmented vectors formed by concatenating, over all positions $p$, the PDVs $df(I_{ij})_p$ of the $j$-th image of the $i$-th class, the class mean vectors $df(m)^i_p$ and the total mean vectors $df(m)_p$, respectively, where $df(m)^i_p$ is the mean vector of the PDVs at position $p$ of the filtered face images from the $i$-th class and $df(m)_p$ is the total mean vector of the PDVs at position $p$ over the sample set. In IFL, the image filter is represented as a vector $w$, and the value of the filtered image at position $p$ is $f(I)_p = w^T I_p$, where $I_p$ denotes the patch vector centered at position $p$ of the original face image. Therefore, the filter $w$ can be learned by

$$w^* = \arg\max_w \frac{w^T \hat{S}_b w}{w^T \hat{S}_w w} \qquad (5)$$

where $\hat{S}_b$ and $\hat{S}_w$ are the between-class and within-class scatters of the PDVs of the original input face images.
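To make the IFL objective concrete, the sketch below builds $\hat{S}_b$ and $\hat{S}_w$ from unfiltered PDV matrices and solves the Rayleigh-quotient problem as a generalized eigenproblem. It is a simplified sketch under our own assumptions, not the implementation of [19]: the neighbours are the eight pixels at distance 1, patches are flattened row-major, only positions whose patches and neighbours fit inside the image are used, and a small ridge keeps $\hat{S}_w$ invertible. Because each filtered PDV equals $(dI_p)^T w$ for the $F \times d$ matrix $dI_p$ of patch differences, $\operatorname{tr}(S_b) = w^T \hat{S}_b w$ (and likewise for $S_w$), which is why the trace-ratio Fisher criterion turns into a Rayleigh quotient in $w$.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.linalg import eigh

# Eight neighbours at distance 1: an assumed stand-in for Neighbor(p), d = 8
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def pdv_matrices(img, S):
    """Unfiltered PDV matrices dI_p of shape (F, d) for every usable position p.

    Column k holds I_{p_k} - I_p, where I_q is the row-major S*S patch vector
    centred at q, so the filtered PDV equals dI_p^T w. Returns shape (P, F, d).
    """
    patches = sliding_window_view(np.asarray(img, dtype=np.float64), (S, S))
    gh, gw = patches.shape[:2]
    patches = patches.reshape(gh, gw, S * S)
    centre = patches[1:gh - 1, 1:gw - 1]
    cols = [patches[1 + dr:gh - 1 + dr, 1 + dc:gw - 1 + dc] - centre
            for dr, dc in NEIGHBOURS]
    return np.stack(cols, axis=-1).reshape(-1, S * S, len(NEIGHBOURS))


def ifl_scatters(images, labels, S):
    """F x F scatters S_hat_b, S_hat_w with tr(S_b) = w^T S_hat_b w."""
    labels = np.asarray(labels)
    D = np.stack([pdv_matrices(im, S) for im in images])      # (N, P, F, d)
    mean_all = D.mean(axis=0)
    F = S * S
    Sb, Sw = np.zeros((F, F)), np.zeros((F, F))
    for c in np.unique(labels):
        Dc = D[labels == c]
        mean_c = Dc.mean(axis=0)
        diff = mean_c - mean_all
        Sb += len(Dc) * np.einsum('pfk,pgk->fg', diff, diff)
        dev = Dc - mean_c
        Sw += np.einsum('npfk,npgk->fg', dev, dev)
    return Sb, Sw


def fisher_filter(Sb, Sw, ridge=1e-6):
    """Filter maximising w^T Sb w / w^T Sw w (largest generalised eigenvalue)."""
    F = Sw.shape[0]
    Sw_reg = Sw + ridge * (np.trace(Sw) / F) * np.eye(F)   # keep Sw positive definite
    vals, vecs = eigh(Sb, Sw_reg)                           # ascending eigenvalues
    return vecs[:, -1], vals[-1]


# Toy data: two classes of random 12 x 12 "images", class 1 has a diagonal added
rng = np.random.default_rng(0)
images = [rng.random((12, 12)) + (i // 3) * np.eye(12) for i in range(6)]
labels = [0, 0, 0, 1, 1, 1]
Sb, Sw = ifl_scatters(images, labels, S=3)
w, ratio = fisher_filter(Sb, Sw)
print(w.reshape(3, 3), ratio)
```

The learned vector is reshaped row-major into an S × S kernel, matching the patch flattening used in `pdv_matrices`.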
For more details about the IFL, the readers can refer to [19].

Supervised Filter Learning
As shown in the previous section, IFL uses the Fisher criterion to learn an optimal filter that improves the discriminative ability of the LBP features extracted from filtered images. Therefore, like other methods based on the Fisher criterion (such as LDA), it may only be suitable when the samples of each class are approximately Gaussian distributed [32]. However, this assumption does not always hold in face recognition [35]. Furthermore, the Fisher criterion is not tailored to representation based classification methods, which have been shown to be effective for face recognition tasks. To overcome these limitations, we propose a new supervised filter learning (SFL) algorithm to improve the discriminative ability of LBP features for representation based face recognition.
Formally, let $T = [T_1, T_2, \ldots, T_N] \in \mathbb{R}^{D \times N}$ denote a set of training face images from $L$ classes, where the $i$-th class contains $C_i$ samples ($i = 1, \ldots, L$). Similar to IFL, we denote the filtered version of image $T_i$ by $f(T_i)$. Since the proposed algorithm also applies the LBP operator to the filtered images, we define the pixel difference vector (PDV) as

$$df(T_i)_p = \left[f(T_i)_{p_1} - f(T_i)_p,\ f(T_i)_{p_2} - f(T_i)_p,\ \ldots,\ f(T_i)_{p_d} - f(T_i)_p\right]^T \qquad (6)$$

where $f(T_i)_p$ is the pixel value of the filtered face image $f(T_i)$ at position $p$, $\{p_1, p_2, \ldots, p_d\} \subset \mathrm{Neighbor}(p)$ and $d$ is the number of sampled points. Different from IFL, which maximizes the ratio of the between-class scatter to the within-class scatter of the LBP features extracted from the filtered face images, the proposed algorithm aims to benefit representation based face recognition methods. That is, our algorithm learns a filter such that, after filtering, the LBP feature of a face image can be accurately represented by those of the same person but not by those of different persons. To achieve this goal, we need to reduce the within-class representation residual and enlarge the between-class representation residual of the filtered images' PDVs. Suppose that $df(T_{ij})_p$ is the $p$-th PDV of the $j$-th face image from the $i$-th class. Its within-class representation residual is

$$\varepsilon^{w}_{ij,p} = \left\| df(T_{ij})_p - df(T_{i\setminus j})_p\, a^p_{ij} \right\|_2^2 \qquad (7)$$

where $df(T_{i\setminus j})_p$ is a matrix formed by the $p$-th PDVs of the other filtered face images from the $i$-th class and $a^p_{ij}$ is the vector of within-class representation coefficients for $df(T_{ij})_p$, which can be estimated by the least-squares algorithm [28] as

$$a^p_{ij} = \left(df(T_{i\setminus j})_p^T\, df(T_{i\setminus j})_p\right)^{-1} df(T_{i\setminus j})_p^T\, df(T_{ij})_p \qquad (8)$$

Considering all the PDVs, we obtain the total within-class representation residual

$$R_w = \sum_{i=1}^{L} \sum_{j=1}^{C_i} \sum_{p} \varepsilon^{w}_{ij,p} \qquad (9)$$

Similarly, the between-class representation residual of $df(T_{ij})_p$ can be formulated as

$$\varepsilon^{b}_{ij,p} = \left\| df(T_{ij})_p - df(T_{\tilde{i}})_p\, b^p_{ij} \right\|_2^2 \qquad (10)$$

where $df(T_{\tilde{i}})_p$ is a matrix formed by the $p$-th PDVs of the filtered face images that do not belong to the $i$-th class and $b^p_{ij}$ is the vector of between-class representation coefficients for $df(T_{ij})_p$, which can also be estimated by least squares as

$$b^p_{ij} = \left(df(T_{\tilde{i}})_p^T\, df(T_{\tilde{i}})_p\right)^{-1} df(T_{\tilde{i}})_p^T\, df(T_{ij})_p \qquad (11)$$

where $df(T_{\tilde{i}})_p^T$ is the transpose of $df(T_{\tilde{i}})_p$.
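The within-class and between-class residuals defined above can be sketched for a single position $p$ as follows. The function names and the toy data are hypothetical. We use `np.linalg.lstsq` rather than the explicit normal-equation inverse: it gives the same coefficients when the dictionary has full column rank and otherwise returns the minimum-norm least-squares solution (our choice, not stated in the text).

```python
import numpy as np


def representation_residual(target, dictionary):
    """Squared least-squares residual of `target` over the columns of `dictionary`.

    np.linalg.lstsq returns the minimum-norm solution when the Gram matrix is
    singular; it is used here in place of the explicit inverse (an assumption).
    """
    coef, *_ = np.linalg.lstsq(dictionary, target, rcond=None)
    r = target - dictionary @ coef
    return float(r @ r), coef


def pdv_residuals(pdvs, labels, n):
    """Within- and between-class residuals of the PDV of sample n.

    pdvs: (N, d) array; row m is the p-th filtered PDV of image m.
    labels: (N,) class labels.
    """
    labels = np.asarray(labels)
    same = labels == labels[n]
    same[n] = False                         # leave the sample itself out
    other = labels != labels[n]
    eps_w, _ = representation_residual(pdvs[n], pdvs[same].T)
    eps_b, _ = representation_residual(pdvs[n], pdvs[other].T)
    return eps_w, eps_b


# Toy PDVs (d = 8): each class occupies its own random 2-D subspace
rng = np.random.default_rng(0)
bases = [rng.standard_normal((8, 2)) for _ in range(2)]
labels = np.repeat([0, 1], 3)
pdvs = np.stack([bases[c] @ rng.standard_normal(2) for c in labels])
eps_w, eps_b = pdv_residuals(pdvs, labels, n=2)
print(eps_w, eps_b)
```

With d = 8, any dictionary containing eight linearly independent PDVs reproduces the target exactly, so the toy example keeps each class low-rank to make the contrast between the two residuals visible.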
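Then, the total between-class representation residual can be obtained as

$$R_b = \sum_{i=1}^{L} \sum_{j=1}^{C_i} \sum_{p} \left\| df(T_{ij})_p - df(T_{\tilde{i}})_p\, b^p_{ij} \right\|_2^2 \qquad (12)$$

Now, combining Eqs (9) and (12), the objective function of the proposed filter learning algorithm is

$$\min_f \frac{R_w}{R_b} \qquad (13)$$

From the definitions of $R_w$ and $R_b$, Eq (13) incurs a heavy penalty if the within-class residual of the filtered images' PDVs is large and the between-class residual is small. Thus, minimizing Eq (13) helps ensure that the LBP features extracted from a filtered face image can be well represented only by those from the same class and not by those from different classes. In this study, we suppose that the image filter of size $S \times S$ is concatenated into a vector $\omega \in \mathbb{R}^{F \times 1}$ ($F = S \times S$). Then, the value of a filtered image $f(T_{ij})$ at position $p$ can be written as $f(T_{ij})_p = \omega^T T^p_{ij}$, where $T^p_{ij}$ is the vector concatenated from the patch centered at position $p$ of image $T_{ij}$. Analogously, the PDV at position $p$ of a filtered image can be written as $df(T_{ij})_p = (dT^p_{ij})^T \omega$, where $dT^p_{ij} = [T^{p_1}_{ij} - T^p_{ij}, \ldots, T^{p_d}_{ij} - T^p_{ij}] \in \mathbb{R}^{F \times d}$ collects the differences of the patch vectors at position $p$ of the unfiltered image $T_{ij}$. Substituting $df(T_{ij})_p$ into Eqs (9) and (12), Eq (13) can be converted to

$$\min_\omega \frac{\omega^T \hat{R}_w \omega}{\omega^T \hat{R}_b \omega} \qquad (14)$$

where

$$\hat{R}_w = \sum_{i,j,p} \Big(dT^p_{ij} - \sum_{k \neq j} a^p_{ij}(k)\, dT^p_{ik}\Big)\Big(dT^p_{ij} - \sum_{k \neq j} a^p_{ij}(k)\, dT^p_{ik}\Big)^T,$$
$$\hat{R}_b = \sum_{i,j,p} \Big(dT^p_{ij} - \sum_{(l,k):\, l \neq i} b^p_{ij}(l,k)\, dT^p_{lk}\Big)\Big(dT^p_{ij} - \sum_{(l,k):\, l \neq i} b^p_{ij}(l,k)\, dT^p_{lk}\Big)^T \qquad (15)$$

and $a^p_{ij}(k)$ and $b^p_{ij}(l,k)$ denote the entries of $a^p_{ij}$ and $b^p_{ij}$ associated with images $T_{ik}$ and $T_{lk}$, respectively. From Eq (15), both $\hat{R}_w$ and $\hat{R}_b$ are clearly symmetric and positive semidefinite. As a result, the optimal filter $\omega$ that minimizes the objective function of our algorithm can be obtained by solving the generalized eigenvalue problem $\hat{R}_w \omega = \lambda \hat{R}_b \omega$ and taking the eigenvector associated with the smallest eigenvalue.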
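The construction of $\hat{R}_w$ and $\hat{R}_b$ and the generalized eigenvalue step can be sketched as below. One detail is our assumption rather than part of the description above: since the coefficients $a^p_{ij}$ and $b^p_{ij}$ are computed from filtered PDVs and therefore depend on $\omega$, the sketch computes them once with an initial filter `w0` (here a delta filter, i.e. the unfiltered image) and keeps them fixed; re-estimating them and re-solving in an alternating loop would be another option. The ridge added to $\hat{R}_b$ is also ours, because `scipy.linalg.eigh` requires a positive definite right-hand matrix. The PDV tensor is random stand-in data.

```python
import numpy as np
from scipy.linalg import eigh


def residual_matrix(dT, labels, w0, between):
    """Accumulate sum over (n, p) of M M^T with M = dT[n, p] - sum_k c_k dT[k, p].

    dT: (N, P, F, d) unfiltered PDV matrices; w0: (F,) filter used to compute the
    least-squares coefficients c, which are then held fixed (assumption).
    between=False: dictionary = other images of the same class (R_w hat).
    between=True:  dictionary = all images of other classes (R_b hat).
    """
    labels = np.asarray(labels)
    N, P, F, _ = dT.shape
    R = np.zeros((F, F))
    for n in range(N):
        if between:
            idx = np.flatnonzero(labels != labels[n])
        else:
            idx = np.flatnonzero((labels == labels[n]) & (np.arange(N) != n))
        for p in range(P):
            dictionary = np.einsum('f,kfd->dk', w0, dT[idx, p])   # filtered PDVs
            target = dT[n, p].T @ w0
            c, *_ = np.linalg.lstsq(dictionary, target, rcond=None)
            M = dT[n, p] - np.einsum('k,kfd->fd', c, dT[idx, p])  # (F, d)
            R += M @ M.T
    return R


def sfl_filter(Rw, Rb, ridge=1e-6):
    """Eigenvector of Rw w = lambda Rb w with the smallest eigenvalue."""
    F = Rw.shape[0]
    Rb_reg = Rb + ridge * (np.trace(Rb) / F) * np.eye(F)   # keep Rb positive definite
    vals, vecs = eigh(Rw, Rb_reg)                           # ascending eigenvalues
    return vecs[:, 0], vals[0]


# Random stand-in for PDV matrices: 6 images, 4 positions, 3x3 filter, d = 8
rng = np.random.default_rng(0)
dT = rng.standard_normal((6, 4, 9, 8))
labels = [0, 0, 0, 1, 1, 1]
w0 = np.eye(9)[4]                   # delta filter: start from the unfiltered image
Rw = residual_matrix(dT, labels, w0, between=False)
Rb = residual_matrix(dT, labels, w0, between=True)
omega, lam = sfl_filter(Rw, Rb)
print(omega.reshape(3, 3), lam)
```

Taking the first column returned by `eigh` picks the smallest generalized eigenvalue, which is the minimum of the Rayleigh quotient being minimized.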
After the filter ω has been learned, we reshape it into an S × S matrix and use it to filter the training and test face images. Then, LBP features are extracted from the filtered images, and representation based classification methods (such as SRC and LRC) can be used for recognition.
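The filtering step amounts to evaluating $\omega^T T_p$ at every position $p$. The minimal NumPy sketch below assumes the row-major patch flattening used when ω was learned and keeps only the 'valid' region, since the text does not say how image borders are handled; the function name and the example filter are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def apply_learned_filter(img, omega):
    """Filter an image with a learned filter vector omega of length S*S.

    omega is reshaped row-major into an S x S kernel, and each output pixel is
    omega^T times the row-major patch vector around it: a correlation (no kernel
    flip) evaluated over the 'valid' region, so the output shrinks by S-1.
    """
    S = int(round(np.sqrt(omega.size)))
    if S * S != omega.size:
        raise ValueError("omega must have S*S entries")
    kernel = omega.reshape(S, S)
    windows = sliding_window_view(np.asarray(img, dtype=np.float64), (S, S))
    return np.einsum('ijmn,mn->ij', windows, kernel)


img = np.arange(36.0).reshape(6, 6)
omega = np.linspace(-1.0, 1.0, 9)        # hypothetical learned filter
out = apply_learned_filter(img, omega)
print(out.shape)                         # (4, 4)
```

Using a correlation rather than a convolution matters here: flipping the kernel would no longer reproduce $\omega^T T_p$ for an asymmetric filter.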

Extended SFL for Heterogeneous Face Recognition
Nowadays, heterogeneous face recognition has attracted increasing attention due to its wide applications in video surveillance and law enforcement. According to some studies [19,36], heterogeneous faces can be defined as faces captured in different environments or by different devices. For instance, face images captured by visible light and near-infrared imaging devices can be regarded as heterogeneous faces.
In this section, we extend the proposed SFL to the heterogeneous face recognition problem. Similar to SFL for homogeneous face images, the aim of extended SFL is to learn a filter that reduces the within-class representation residual of the faces' LBP features for heterogeneous images of the same person and enlarges the between-class representation residual of the faces' LBP features for heterogeneous images of different persons. Suppose $T^V$ and $T^M$ are two heterogeneous image sets (e.g. images captured by visible light and near-infrared imaging devices), $f(T^V)$ and $f(T^M)$ are their filtered images, and $df(T^V_{ij})_p$ and $df(T^M_{ij})_p$ are the $p$-th PDVs of the $j$-th faces from the $i$-th class in the two heterogeneous image sets. To make sure that the LBP features of face images can be well represented by those of the same person, we need to minimize the following within-class representation residual:

$$\tilde{R}_w = R_w^{VV} + R_w^{MM} + R_w^{VM} + R_w^{MV} \qquad (16)$$

where $R_w^{VV}$ and $R_w^{MM}$ are the homogeneous within-class representation residuals, which can be obtained by Eq (9), and $R_w^{VM}$ and $R_w^{MV}$ are the within-class representation residuals between heterogeneous images, i.e., the residuals of representing the PDVs of one modality by the PDVs of the same person in the other modality. Analogously, to ensure that the LBP features of face images cannot be represented by those of different persons, the following between-class representation residual should be maximized:

$$\tilde{R}_b = R_b^{VV} + R_b^{MM} + R_b^{VM} + R_b^{MV} \qquad (19)$$

where $R_b^{VV}$ and $R_b^{MM}$ are the homogeneous between-class representation residuals obtained by Eq (12), and $R_b^{VM}$ and $R_b^{MV}$ are the heterogeneous between-class representation residuals, i.e., the residuals of representing the PDVs of one modality by the PDVs of different persons in the other modality. Combining Eqs (16) and (19), we obtain the objective function of extended SFL for heterogeneous face recognition:

$$\min_f \frac{\tilde{R}_w}{\tilde{R}_b} \qquad (22)$$

Similar to the 'Supervised Filter Learning' section, we suppose that the image filter of size $S \times S$ is concatenated into a vector $\omega \in \mathbb{R}^{F \times 1}$ ($F = S \times S$).
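Then, we have $df(T^V_{ij})_p = (dT^{V,p}_{ij})^T \omega$ and $df(T^M_{ij})_p = (dT^{M,p}_{ij})^T \omega$, where $dT^{V,p}_{ij}$ and $dT^{M,p}_{ij}$ are the PDV matrices at position $p$ of the unfiltered images $T^V_{ij}$ and $T^M_{ij}$. Substituting them into Eqs (16) and (19), both residuals become quadratic forms in $\omega$. After a series of deductions, Eq (22) can be reduced to

$$\min_\omega \frac{\omega^T K_w \omega}{\omega^T K_b \omega} \qquad (25)$$

where $K_w$ and $K_b$ are the within-class and between-class residual matrices accumulated over the homogeneous and heterogeneous terms. Therefore, the optimal filter $\omega$ that minimizes the objective function of extended SFL in Eq (25) can be obtained by solving the generalized eigenvalue problem $K_w \omega = \lambda K_b \omega$ and taking the eigenvector associated with the smallest eigenvalue. After filter learning, $\omega$ is reshaped into its matrix form to filter the heterogeneous face images in $T^M$ and $T^V$. Then, SRC or LRC can be adopted for recognition.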

Experiments
In this section, the performance of the proposed algorithm is tested and compared with related algorithms such as LBP [33], LGBP [24], GVLBP [25], IFL-LBP [19], DSNPE [37], MNSMC [38] and UDSPP [39]. Among these algorithms, LBP, LGBP, GVLBP and IFL-LBP are LBP based methods, while DSNPE, MNSMC and UDSPP are recently proposed subspace based methods for representation based face recognition. Five benchmark face databases are employed: Yale [40], AR [41], CMU PIE [42], LFW [43] and VLNHF [44]. The proposed algorithm and the compared approaches are all implemented in Matlab and executed on a computer with an Intel Core i3-2100 CPU at 3.2 GHz and 8 GB of physical memory.

Data Description
The Yale face database [40] contains 165 grayscale images of 15 individuals. There are 11 images per subject, with variations in facial expression (normal, sad, happy, sleepy, surprised and wink), lighting condition (left-light, center-light and right-light) and with/without glasses. In our experiment, 6 images of each person are randomly selected for training and the remaining images are used for testing.
The AR face database [41] consists of more than 4000 frontal images of 126 subjects, including 70 males and 56 females. The images were taken in two sessions separated by two weeks, with expression (neutral, smile, anger and scream) and occlusion (sunglasses and scarf) variations. In this experiment, we choose a subset containing 50 males and 50 females. For each subject, 14 images with only illumination and expression changes are selected. We randomly select 7 images of each person for training, and the remaining images are used for testing.
The CMU PIE face database [42] includes 41,368 face images of 68 subjects in total, captured under 13 different poses, 43 different illumination conditions and 4 different expressions. In our experiment, 24 face images of each individual are used. We randomly select 12 images of each person to form the training set, and the remaining images are used for testing.
The LFW database [43] is a large-scale database containing 13,233 face images of 5,749 different individuals. Since all the images were taken in the real world in an unconstrained environment, the expression, pose, illumination, occlusion and alignment of the face images vary considerably in this database. In our study, a subset of 1580 face images of 158 individuals from the LFW database is employed. We randomly select 7 images of each person for training, and the remaining images are used for testing.
The Visible Light and Near-infrared Human Face (VLNHF) database [44] is a heterogeneous face image database consisting of two datasets (Lab1 and Lab2). The Lab1 dataset contains both visible light images and near-infrared images of 50 persons. Each person has 10 visible light images and 10 near-infrared images. The Lab2 dataset also contains visible light images and near-infrared images of 50 subjects. Each subject provides 20 visible light face images and the same number of near-infrared face images. These images were acquired under four different illumination conditions and also vary in facial expression and pose. In the experiment, 7 visible light images and 7 near-infrared images of each person are randomly selected for training in the Lab1 dataset, and 12 visible light images and 12 near-infrared images of each person are randomly selected for training in the Lab2 dataset. The remaining images are used for testing.
In our recognition experiments, all images are manually aligned, cropped and then resized to 66 × 66 pixels. For all databases, the random selection of training samples is repeated 10 times, and the average recognition accuracies are reported in the next subsection.

Results and Discussions
In the proposed algorithm and IFL-LBP, the image filter size S and the LBP neighborhood size r affect performance. Following [36], we empirically set S and r to the same value and tune it over {3, 5, 7}. The number of sampled points is set to d = 8 for all LBP based algorithms, so that 256-dimensional LBP features are extracted. For a fair comparison with the LBP based algorithms, the subspace dimension of DSNPE, MNSMC and UDSPP is also set to 256. Two well-known representation based classifiers, i.e., SRC [2] and LRC [3], are adopted for recognition in our study.
Homogeneous face recognition. The recognition performance of the various approaches on the homogeneous face databases is shown in Tables 1-4, from which the following observations can be made. Firstly, LBP extracts local texture features directly from the original face images, so its performance is inferior to that of the other algorithms in most cases. Secondly, LGBP and GVLBP perform better than LBP on the AR, CMU PIE and LFW databases. This is because LGBP and GVLBP extract LBP features from images filtered by multi-scale and multi-orientation Gabor filters, which can reduce the influence of illumination and expression changes in the face images to some extent. However, LBP outperforms LGBP and GVLBP on the Yale database. The reason may be that the number of individuals in Yale is much smaller than in the other three databases. Thus, the dimension of the LBP features obtained from multi-scale and multi-orientation Gabor filtered face images is much higher than the number of training instances, and this "small sample size" problem weakens the performance of the classifiers [45]. Thirdly, since IFL-LBP learns the filter in a supervised manner, its recognition results are better than those of the other LBP based algorithms. Fourthly, the subspace based algorithms (i.e., DSNPE, MNSMC and UDSPP) perform better than LBP, LGBP, GVLBP and IFL-LBP in some cases, because these three algorithms are all designed for representation based face recognition. Nevertheless, since the subspace based algorithms only extract holistic features from the face images, their recognition results are still worse than those of our algorithm. Finally, the proposed algorithm outperforms IFL-LBP and the other algorithms on all databases. This is because the filter in our algorithm is learned based on the representation residual rather than the Fisher criterion, which makes the LBP features extracted from the filtered images more suitable for representation based classifiers.

Besides the representation based classifiers, we also compare our SFL-LBP with IFL-LBP using the Nearest Neighbor classifier. The experimental results in Table 5 show that the proposed algorithm outperforms IFL-LBP in most cases. This is because the Fisher criterion used in IFL does not work well when the training samples are not Gaussian distributed. Then, the performance of our algorithm under different filter and neighborhood sizes is compared. The experimental results in Tables 1-4 show that SFL-LBP achieves better performance than IFL-LBP when their parameters are set to the same value. Moreover, the values of the parameters S and r have an important effect on the performance of both IFL-LBP and SFL-LBP. However, taking the standard deviations into account, the differences among the recognition results of our algorithm under various parameter values are smaller than those of IFL-LBP (especially on the AR and CMU PIE databases). This indicates that the proposed algorithm is less sensitive to the parameters when they are set to appropriate values.
Next, the Cumulative Match Characteristic (CMC) curve is used to further compare the performance of IFL-LBP and our algorithm. The CMC curves in Figs 3 and 4 show that our algorithm outperforms IFL-LBP at nearly all ranks, which demonstrates the advantage and robustness of our algorithm for representation based face recognition tasks.
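A CMC curve reports, for each rank r, the fraction of queries whose true identity appears among the r best-scoring candidates, so its rank-1 value equals the recognition rate. Below is a small sketch that computes it from class-wise representation residuals; the function name and the residual-matrix input format are assumptions, since the text does not say how the curves in the figures were produced, and the residual values are made up.

```python
import numpy as np


def cmc_curve(residuals, true_labels, class_labels):
    """Cumulative Match Characteristic from a query-by-class residual matrix.

    residuals[q, k]: residual of query q w.r.t. class class_labels[k]
    (smaller is better, as with SRC/LRC). Assumes every true label appears in
    class_labels. Returns cmc with cmc[r - 1] = fraction of queries whose true
    class is ranked within the top r.
    """
    residuals = np.asarray(residuals, dtype=np.float64)
    order = np.argsort(residuals, axis=1)                   # best class first
    ranked = np.asarray(class_labels)[order]                # labels in rank order
    hits = ranked == np.asarray(true_labels)[:, None]
    first_hit = hits.argmax(axis=1)                         # 0-based rank of true class
    return np.array([(first_hit < r).mean() for r in range(1, residuals.shape[1] + 1)])


residuals = [[0.1, 0.5, 0.9],
             [0.4, 0.2, 0.3],
             [0.7, 0.6, 0.1]]
cmc = cmc_curve(residuals, true_labels=[0, 2, 1], class_labels=[0, 1, 2])
print(cmc)   # rank-1 = 1/3, rank-2 = 1.0, rank-3 = 1.0
```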
Heterogeneous face recognition. In this subsection, the performance of the proposed algorithm for heterogeneous face recognition is validated and compared with that of IFL. The results are listed in Tables 6 and 7. From these tables, we find that the proposed SFL outperforms IFL, which is consistent with the experimental results in the 'Homogeneous Face Recognition' section. Furthermore, the CMC curves in Figs 5 and 6 also verify the superiority of our SFL for the heterogeneous face recognition task.

Statistical test. In this subsection, the one-tailed Wilcoxon rank sum test is used to verify whether our algorithm performs significantly better than the other algorithms. In this test, the null hypothesis is that the proposed SFL-LBP makes no difference compared with the other algorithms, and the alternative hypothesis is that SFL-LBP makes an improvement. From Table 8, the p-values obtained by all pairwise Wilcoxon rank sum tests are less than the significance level, which indicates that the null hypotheses are rejected in all pairwise tests and that the proposed algorithm significantly outperforms the other algorithms.
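A one-tailed Wilcoxon rank sum test of this kind can be run with SciPy's `scipy.stats.ranksums` (its `alternative` argument requires SciPy 1.7 or later). The sketch below compares two made-up samples of per-split accuracies; the numbers and the 0.05 level are placeholders, not values from Table 8. `scipy.stats.mannwhitneyu` implements the equivalent Mann-Whitney U test and also handles ties.

```python
from scipy.stats import ranksums

# Hypothetical accuracies over 10 random splits; NOT results from this paper.
acc_sfl = [0.91, 0.92, 0.90, 0.93, 0.92, 0.91, 0.94, 0.92, 0.93, 0.91]
acc_other = [0.86, 0.88, 0.87, 0.85, 0.89, 0.86, 0.87, 0.88, 0.86, 0.87]

# One-tailed test. H0: no difference; H1: SFL-LBP accuracies tend to be larger.
stat, p_value = ranksums(acc_sfl, acc_other, alternative='greater')
alpha = 0.05                                 # assumed significance level
print(f"z = {stat:.3f}, p = {p_value:.2e}, reject H0: {p_value < alpha}")
```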

Conclusions
This paper presents a filter learning algorithm for representation based face recognition. Because the objective function of the proposed algorithm is specifically designed to reduce the within-class representation residual and enlarge the between-class representation residual of faces' local descriptors, it is more suitable for representation based classifiers than other algorithms. In the experiments, five public face databases are used to evaluate our algorithm. Comparisons with other state-of-the-art algorithms using two well-known representation based classifiers demonstrate the effectiveness and advantage of our algorithm.