Improved Minimum Squared Error Algorithm with Applications to Face Recognition

The minimum squared error based classification (MSEC) method establishes a single classification model for all test samples. However, this model may not be optimal for each individual test sample. This paper proposes an improved MSEC (IMSEC) method that is tailored to each test sample. The proposed method first roughly identifies the possible classes of the test sample and then establishes a minimum squared error (MSE) model based on the training samples from these possible classes. We apply our method to face recognition. Experimental results on several datasets show that IMSEC outperforms MSEC and other state-of-the-art methods in terms of accuracy.


Introduction
Minimum squared error based classification (MSEC) is sound in theory and is able to achieve high accuracy [1,2]. It has been proven that, for two-class classification, MSEC is identical to linear discriminant analysis (LDA) when the number of training samples approaches infinity [1,2]. In addition, MSEC can be applied to multi-class classification by using a special class label matrix [3]. Various improvements to MSEC, such as orthogonal MSEC [4] and kernel MSEC [5][6][7][8], have been proposed. MSEC has been applied to a number of problems such as imbalanced classification [7], palm-print verification [9], low-rank representation [10,11], super-resolution learning [12], image restoration [13], and manifold learning [14].
In recent years, the representation based classification (RC) method [15][16][17][18] has attracted increasing attention in pattern recognition. The main difference between RC and MSEC is that RC uses a weighted sum of all the training samples to represent the test sample, whereas MSEC aims to map the training samples to their class labels. RC can be categorized into two types. The first type is the so-called sparse representation method (SRM), such as the methods proposed in [19,20]. The goal of SRM is to simultaneously minimize the $L_1$ norm of the weight vector and the representation error, that is, the deviation between the reconstructed sample and the test sample. The second type is the so-called non-sparse representation method, such as the methods proposed in [21][22][23][24][25][26]. The goal of the non-sparse representation method is to simultaneously minimize the $L_2$ norm of the weight vector and the representation error. The non-sparse representation method has a closed-form solution and is usually more computationally efficient than SRM [21].
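The two representation objectives described above can be written compactly. The following is only a sketch of one common formulation, not necessarily the exact form used in the cited works: the matrix $A$ (whose columns are the training samples), the weight vector $w$, and the trade-off parameter $\lambda > 0$ are notation assumed here for illustration.

```latex
% Sparse representation (L1-regularized): no closed-form solution in general
\min_{w} \; \lVert t - A w \rVert_2^2 + \lambda \lVert w \rVert_1

% Non-sparse representation (L2-regularized): closed-form solution
\min_{w} \; \lVert t - A w \rVert_2^2 + \lambda \lVert w \rVert_2^2
\quad\Longrightarrow\quad
w = (A^{T} A + \lambda I)^{-1} A^{T} t
```

Under this reading, the efficiency gap follows from the second problem reducing to a single linear solve, while the first typically requires an iterative solver.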
In this paper, we focus on the multi-class classification problem and propose an improved minimum squared error based classification (IMSEC) method. The basic idea of IMSEC is to select a subset of training samples that are similar to the test sample and then build the MSE model based on them. The advantage of IMSEC is that it seeks the optimal classifier for each test sample, whereas MSEC categorizes all the test samples with a single classifier. IMSEC is therefore expected to perform better than MSEC.

The Minimum Squared Error Based Classification for Multi-class Problems
Suppose that there are $N$ training samples from $c$ classes. Let the $p$-dimensional row vector $x_i$ denote the $i$-th training sample, where $i = 1, \ldots, N$. We use a $c$-dimensional row vector $g_i$ to represent the class label of the $i$-th training sample. If this sample is from class $k$, the $k$-th entry of $g_i$ is one and all other entries are zero.
If a mapping $Y$ can approximately transform each training sample into its class label, we have
$$XY = G, \tag{1}$$
where $X = [x_1^T, \ldots, x_N^T]^T$ and $G = [g_1^T, \ldots, g_N^T]^T$. Clearly, $X$ is an $N \times p$ matrix, $G$ is an $N \times c$ matrix, and $Y$ is a $p \times c$ matrix that is to be solved. As Eq. (1) cannot be solved directly, we convert it into the following equation:
$$X^T X Y = X^T G.$$
If $X^T X$ is non-singular, $Y$ can be solved by
$$Y = (X^T X)^{-1} X^T G.$$
In general, we use $Y = (X^T X + \gamma I)^{-1} X^T G$ to obtain a numerically stable solution, where $\gamma$ and $I$ denote a small positive constant and the identity matrix, respectively.
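The regularized solution can be read as the minimizer of a ridge-penalized least-squares objective. The text does not state this interpretation explicitly, so the derivation below is a sketch under that assumption, writing the small positive constant as $\gamma$ and using the Frobenius norm $\lVert \cdot \rVert_F$.

```latex
\begin{aligned}
J(Y) &= \lVert XY - G \rVert_F^2 + \gamma \lVert Y \rVert_F^2, \\
\frac{\partial J}{\partial Y} &= 2 X^{T}(XY - G) + 2\gamma Y = 0, \\
(X^{T}X + \gamma I)\, Y &= X^{T} G, \\
Y &= (X^{T}X + \gamma I)^{-1} X^{T} G .
\end{aligned}
```

Because $X^{T}X$ is positive semi-definite, $X^{T}X + \gamma I$ is positive definite for any $\gamma > 0$, so the inverse exists even when $X^{T}X$ itself is singular (for example, when $p > N$).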
Finally, we classify a test sample $t$ as follows: the class label of $t$ is predicted as $tY$, and then the Euclidean distance between $tY$ and the class label of each class is calculated. The class label of the $j$-th class is a row vector whose $j$-th element is one and whose other elements are all zero ($j = 1, 2, \ldots, c$). If, among the $c$ classes, $tY$ is closest to the class label of the $k$-th class, then $t$ is classified into the $k$-th class.
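The MSEC procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`one_hot`, `msec_fit`, `msec_predict`), the default value of `gamma`, and the toy data are all hypothetical, and class indices are assumed to run from 0 to c-1.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Build the N x c label matrix G: row i has a one in column labels[i]."""
    G = np.zeros((len(labels), num_classes))
    G[np.arange(len(labels)), labels] = 1.0
    return G

def msec_fit(X, labels, num_classes, gamma=1e-2):
    """Return Y = (X^T X + gamma*I)^{-1} X^T G, computed via a linear solve."""
    G = one_hot(labels, num_classes)
    A = X.T @ X + gamma * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ G)

def msec_predict(t, Y):
    """Return the index of the class whose one-hot label is nearest to tY."""
    predicted_label = t @ Y
    class_labels = np.eye(Y.shape[1])  # row k is the label of class k
    distances = np.linalg.norm(class_labels - predicted_label, axis=1)
    return int(np.argmin(distances))

# Toy data (made up for illustration): 6 samples, 3 features, 3 classes.
X_toy = np.array([
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],   # class 0
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],   # class 1
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9],   # class 2
])
labels_toy = np.array([0, 0, 1, 1, 2, 2])

Y_toy = msec_fit(X_toy, labels_toy, num_classes=3)
print(msec_predict(np.array([0.95, 0.05, 0.0]), Y_toy))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical choice made here; it yields the same $Y$ as the formula in the text.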

The Algorithm of Improved Minimum Squared Error Based Classification
Suppose the $j$-th class has $n_j$ training samples. Let $z_j^k$ be the $k$-th training sample of the $j$-th class, where $k = 1, \ldots, n_j$ and $j = 1, \ldots, c$. The IMSEC algorithm has the following three steps.
Step 1. Determine $L$ possible classes of the test sample, where $L < c$. First, the test sample $t$ is represented as a weighted sum of the training samples of each class in turn. For the $j$-th class, it is assumed that $t = \sum_{k=1}^{n_j} w_j^k z_j^k$ is approximately satisfied.
This expression can be rewritten as $t = Z_j W_j$, where $W_j = [w_j^1 \ldots w_j^{n_j}]^T$ and $Z_j = [z_j^1 \ldots z_j^{n_j}]$ (the samples are written as column vectors in this step). Then we have $\hat{W}_j = (Z_j^T Z_j + \gamma I)^{-1} Z_j^T t$. The quantity $\lVert t - Z_j \hat{W}_j \rVert$ is the representation error of the test sample with respect to the training samples of the $j$-th class. The $L$ classes with the smallest representation errors are selected and are referred to as base classes.
Step 2. Use the base classes to establish the following MSE model:
$$\hat{X}\hat{Y} = G_z,$$
where $\hat{X}$ is composed of all the training samples of the base classes, and $G_z$ is composed of the class labels of these training samples. $\hat{Y}$ is computed using $\hat{Y} = (\hat{X}^T \hat{X} + \mu I)^{-1} \hat{X}^T G_z$, where $\mu$ and $I$ are a small positive constant and the identity matrix, respectively.
Step 3. Exploit $\hat{Y}$ and $G_z$ to classify the test sample $t$. The class label of this test sample is predicted as $g_t = t\hat{Y}$. Calculate the Euclidean distance between $g_t$ and the class label of each base class. Let $dif_k$ denote the distance between $g_t$ and the class label of the $k$-th class. If $h = \arg\min_k dif_k$, then the test sample $t$ is assigned to the $h$-th class.
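The three steps above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the function names, the default values of `gamma` and `mu`, and the toy data are hypothetical. It also assumes that every class has at least one training sample and that the label vectors in Step 2 are one-hot over the $L$ base classes only, a detail the text leaves open.

```python
import numpy as np

def class_errors(X, labels, t, num_classes, gamma=1e-2):
    """Step 1: represent t with each class in turn; return the residual norms."""
    errors = np.empty(num_classes)
    for j in range(num_classes):
        Z = X[labels == j].T                      # p x n_j, columns are class-j samples
        A = Z.T @ Z + gamma * np.eye(Z.shape[1])
        W = np.linalg.solve(A, Z.T @ t)           # regularized weights for class j
        errors[j] = np.linalg.norm(t - Z @ W)     # representation error
    return errors

def imsec_predict(X, labels, t, num_classes, L, gamma=1e-2, mu=1e-2):
    """Classify a single test sample t with a model built only for t."""
    # Step 1: the L classes with the smallest representation errors (base classes).
    base = np.argsort(class_errors(X, labels, t, num_classes, gamma))[:L]

    # Step 2: MSE model on the training samples of the base classes only.
    mask = np.isin(labels, base)
    X_hat = X[mask]
    # Assumption: one-hot labels indexed by position within `base`.
    G_z = (labels[mask][:, None] == base[None, :]).astype(float)
    A = X_hat.T @ X_hat + mu * np.eye(X.shape[1])
    Y_hat = np.linalg.solve(A, X_hat.T @ G_z)

    # Step 3: nearest base-class label to the predicted label g_t.
    g_t = t @ Y_hat
    dif = np.linalg.norm(np.eye(L) - g_t, axis=1)
    return int(base[np.argmin(dif)])

# Toy data (made up for illustration): 6 samples, 3 features, 3 classes.
X_demo = np.array([
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],   # class 0
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],   # class 1
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9],   # class 2
])
labels_demo = np.array([0, 0, 1, 1, 2, 2])
t_demo = np.array([0.95, 0.05, 0.0])

print(imsec_predict(X_demo, labels_demo, t_demo, num_classes=3, L=2))
```

Note that the model in Step 2 depends on `t` through the choice of base classes, so it is rebuilt for every test sample; this is the per-sample tailoring that distinguishes the method from a single global MSE model.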

Analysis of the Proposed Method
The proposed method and MSEC differ in the following ways. MSEC attempts to obtain a single model for all the test samples, whereas the proposed method constructs a dedicated MSE model for each test sample. MSEC tries to minimize the mean squared error between the predicted class labels and the true class labels of the training samples. This means that MSEC is capable of mapping the training samples to the correct class labels. However, it does not imply that the MSEC model can accurately map a test sample to the correct class label. Since the test sample and the training samples that are "close" to it have similar MSE models, it can be expected that IMSEC performs better than MSEC in mapping the test sample to the correct class label.
The proposed method works in a coarse-to-fine manner. In detail, Step 1 of the proposed method roughly identifies the possible classes of the test sample, and Steps 2 and 3 then assign the test sample to one of these possible classes. For complicated classification problems, coarse-to-fine classification is usually more effective than one-step classification [27][28][29].
It is worth pointing out that the proposed method is different from collaborative representation based classification (CRC) [21] and linear regression based classification (LRC) [30]. The proposed method tries to establish a model that maps the training samples to their true class labels, whereas CRC uses a weighted combination of all the training samples to represent the test sample, and LRC uses the class-specific training samples to represent the test sample. Moreover, when classifying a test sample, the proposed method and LRC need to solve one and $c$ MSE models, respectively, where $c$ is the number of classes. As a result, the proposed method is more efficient than LRC.

A. Ethics Statement
Some face datasets were used in this paper to verify the performance of our method. These face datasets are publicly available for face recognition research, and the consent was not needed. The face images and the experimental results are reported in this paper without any commercial purpose.
In the ORL database, there are 40 subjects and each subject has 10 different images. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). Each face image contains $92 \times 112$ pixels, with 256 grey levels per pixel [38]. We resized each face image into a 46 by 56 matrix. Figure 1 shows the face images of one subject in the ORL database. We took the first three, four, five and six face images of each subject as training images, respectively, and treated the others as test images. In our method, $L$ was set to $0.3 \times c$.
For the AR face database, we used 3120 gray face images from 120 subjects, each providing 26 images [39]. These images were taken in two sessions and show faces with different facial expressions, in varying lighting conditions and occluded in several ways. Figure 2 shows the 26 face images of one subject in the AR database. We took the first four, five, six, seven and eight face images of each subject as training images, respectively, and treated the others as test images. In our method, $L$ was set to $0.15 \times c$.
A subset of the FERET face database was used to test our method. This subset includes 200 subjects, and each subject has 7 images. It is composed of the images whose names are marked with the two-character strings 'ba', 'bj', 'bk', 'be', 'bf', 'bd', and 'bg'. This subset involves variations in facial expression, illumination, and pose [40]. The facial portion of each original image was cropped to form a $40 \times 40$ image. Figure 3 shows some face images from the FERET database. We took the first five and six face images of each subject as training images, respectively, and treated the others as test images. In our method, $L$ was set to $0.15 \times c$.
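The experimental settings above imply concrete values for the number of base classes $L$ and for the training and test set sizes. The short sketch below works them out from the subject counts, images per subject, and ratios given in the text. The helper names and the use of `round` are assumptions made here; the text does not say how a non-integer product would be rounded, although all three products happen to be whole numbers.

```python
# Settings taken from the dataset descriptions in the text.
datasets = {
    "ORL":   {"subjects": 40,  "images": 10, "ratio": 0.30, "train_counts": [3, 4, 5, 6]},
    "AR":    {"subjects": 120, "images": 26, "ratio": 0.15, "train_counts": [4, 5, 6, 7, 8]},
    "FERET": {"subjects": 200, "images": 7,  "ratio": 0.15, "train_counts": [5, 6]},
}

def base_class_count(num_classes, ratio):
    """Number of base classes L = ratio * c (rounding is an assumption)."""
    return round(ratio * num_classes)

def split_sizes(subjects, images_per_subject, n_train):
    """Total (training, test) image counts when the first n_train images
    of every subject are used for training and the rest for testing."""
    return subjects * n_train, subjects * (images_per_subject - n_train)

for name, cfg in datasets.items():
    L = base_class_count(cfg["subjects"], cfg["ratio"])
    print(f"{name}: c = {cfg['subjects']}, L = {L}")
    for n_train in cfg["train_counts"]:
        n_tr, n_te = split_sizes(cfg["subjects"], cfg["images"], n_train)
        print(f"  first {n_train} per subject: {n_tr} training, {n_te} test images")
```

With one class per subject, this gives $L = 12$ for ORL, $L = 18$ for AR, and $L = 30$ for FERET.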
Tables 1, 2 and 3 show the classification error rates of the methods on the ORL, AR and FERET databases, respectively. We can observe that our method obtains the lowest classification error rate in every case reported in these tables.

Conclusions
The proposed method, IMSEC, establishes a dedicated MSE model for each test sample. When building the classification model, IMSEC uses only the training samples that are close to the test sample. An analysis was presented to explore the properties of IMSEC. Compared with MSEC, which classifies all the test samples based on a single model, IMSEC can perform better in classifying the test samples. We tested the proposed method on three face datasets. The experimental results showed that IMSEC outperforms MSEC and the other state-of-the-art methods that were compared.