Abstract
We propose a new model for face recognition under insufficient sampling conditions in this paper. In the proposed method, we combine the fusion dictionary with nuclear norm regularization to preserve the details of the restored images, and adopt a Laplacian-uniform mixture function to fit the error distribution. Since the proposed model is convex and separable, we employ the classic alternating direction method of multipliers to solve it by introducing auxiliary variables to transform the original problem into the saddle point problem. Theoretically, we conduct the convergence analysis of the proposed numerical algorithm. Final experimental comparisons are provided to verify the satisfactory performance of the proposed model, which outperforms other related competitive methods in both recognition rate and the robustness.
Citation: Song C, Zhou Y, Ji W (2026) Adaptive robust sparse representation for face recognition based on weighted and fusion dictionary. PLoS One 21(6): e0351984. https://doi.org/10.1371/journal.pone.0351984
Editor: Xuebo Zhang, Northwest Normal University, CHINA
Received: February 11, 2026; Accepted: June 3, 2026; Published: June 26, 2026
Copyright: © 2026 Song et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data for this study are publicly available from the Zenodo repository (https://doi.org/10.5281/zenodo.19663361).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Face recognition is characterized by straightforward data collection, distinctive features, and strong anticounterfeiting capabilities. In recent years, it has been widely used in identity verification, attendance systems, mobile payments, criminal investigation, and other fields. Traditional face recognition algorithms have achieved favorable results under ideal conditions. However, certain limitations still exist in face recognition under non-ideal conditions. Particularly when samples are insufficient, recognition accuracy is significantly compromised by factors such as noise and occlusion. This poses a challenge for face recognition, which demands higher performance than traditional methods can offer.
Among the traditional face recognition algorithms, the Nearest Neighbor [1] (NN) classifier, the Nearest Neighbor Feature Classifier [2–5] and the Linear Regression Classifier [6] (LRC) are widely used due to their simplicity. The implementation of these classifiers is based on evaluating the relationship between test and training samples. Departing from these approaches, Wright et al. proposed the Sparse Representation-Based Classification (SRC) algorithm [7]. In SRC, training images form a dictionary, and the sparse representation is utilized to classify test images. This enables more effective recognition of face images that are corrupted by noise or occlusion. Subsequently, Zhang et al. proposed a Collaborative Representation Classification algorithm with Regularized Least Squares (CRC_RLS) [8]. Compared with SRC, this method achieves better recognition accuracy while effectively reducing computational complexity. Later, Zuo et al. proposed a face recognition algorithm based on $\ell_p$ norm sparse coding (SRC-P) [9] by employing the $\ell_p$ norm to solve the sparse representation model. While the aforementioned models emphasize sparsity, they overlook the correlation information between images. To address this issue, Wang et al. proposed an Adaptive Sparse Representation Classification (ASRC) model [10], which leverages both sparsity and correlation information by considering both encoding sparsity and the nuclear norm of dictionaries. Consequently, this model adaptively benefits from the respective advantages of the $\ell_1$ norm and the $\ell_2$ norm.
However, the effectiveness of the aforementioned methods significantly degrades in face recognition tasks with insufficient sampling. To address this issue, Deng et al. proposed the Extended Sparse Representation-Based Classification (ESRC) [11], which effectively mitigates the problem of low recognition rates under limited samples. Subsequently, Deng et al. developed the Superposed Linear Representation Classifier (SLRC) [12] by constructing a dictionary from a class-center matrix and an extended within-class variation matrix, thereby enhancing the generalization capability of collaborative representation. Furthermore, Yang et al. introduced a Robust Sparse Coding (RSC) model [13] that employs the maximum likelihood estimation principle to solve the sparse coding problem, reducing the model’s sensitivity to outliers. In recent years, further investigations have been conducted on sparse representation-based face recognition [14–29]. The latest research indicates that face recognition under complex conditions, such as noise and occlusion, remains a significant and worthwhile challenge, particularly in scenarios with insufficient sampling.
To further enhance the face recognition rate under conditions of under-sampling, occlusion, and noise, this paper comprehensively considers the synergistic effects of image correlation, dictionary completeness, and algorithm robustness within a sparse representation framework. The main contributions of this work are summarized as follows:
- (1) We propose a novel adaptive robust face recognition algorithm based on a weighted and fused dictionary. This method integrates the adaptive sparse representation classification framework with a weighted matrix and the concept of superposed linear representation, effectively enhancing the recognition accuracy and stability.
- (2) The proposed algorithm is validated through extensive experiments on public face datasets. Results demonstrate that it outperforms several existing state-of-the-art methods, including NN, LRC, SRC, CRC_RLS, SLRC, ASRC, and RSC.
Related work
In this section, we briefly survey previous work on image representation, concentrating particularly on sparse coding based approaches for face recognition.
Denote the training samples of the $i$-th class as $X_i = [x_{i,1}, x_{i,2}, \dots, x_{i,n_i}]$, where the submatrix $X_i$ collects the sample images of class $i$, and $i = 1, 2, \dots, k$. Given a query sample $y$, the recognition task is to determine which class $y$ belongs to in the training sample matrix $X = [X_1, X_2, \dots, X_k]$, i.e.,
$$y = X\alpha, \tag{1}$$
where $\alpha$ is the sparse coefficient vector. Most elements of the sparse coefficient vector are zero, except for those related to class $i$. That is, the non-zero entries in $\alpha$ are only those corresponding to training samples from the same class as $y$; coefficients from other classes are zero. This property allows the recognition task to be solved by finding the sparsest representation of $y$ over $X$.
When sparse representation was first used to describe face images, the following minimization objective function was adopted, i.e.,
$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad y = X\alpha. \tag{2}$$
The above $\ell_0$-norm minimization problem obviously exhibits sparsity, but it is NP-hard. Under certain conditions, researchers usually use the $\ell_1$-norm, which is the closest convex approximation to the $\ell_0$-norm, as the sparsity constraint, i.e.,
$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad y = X\alpha. \tag{3}$$
To deal with noise and to avoid non-zero terms of the sparse coefficient vector that are unrelated to the test sample, we add an error constraint to the above model. The formulation becomes
$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|y - X\alpha\|_2 \le \varepsilon, \tag{4}$$
where $\varepsilon$ is a given tolerance. The above problem can then be written as
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \|\alpha\|_1. \tag{5}$$
The first term in the above formula is the fidelity term, which represents the reconstruction error between the test image and the reconstructed image. The second term $\|\alpha\|_1$ is the regularization term, which represents the sparsity of the coefficients. The parameter $\lambda$ is a trade-off parameter that balances the fidelity term and the regularization term; it is also known as the regularization parameter. In the ideal case, when a sample $y$ in class $i$ is selected for testing, the non-zero terms in $\alpha$ correspond to the entries in class $i$ associated with the test sample. Due to the similarity of face images and the varying degrees of error generated in face image processing, non-zero terms in $\alpha$ that are unrelated to the test sample may also appear in the $j$-th class samples ($j \neq i$).
The characteristic function of class $i$ samples is defined as $\delta_i(\alpha)$, where $\delta_i(\alpha)$ represents the coefficient vector of class $i$ samples in $\alpha$. The test sample $y$ can be approximated based on the sparse representation of the training samples, and then classified according to the estimated residual $r_i(y) = \|y - X\delta_i(\alpha)\|_2$ between the reconstructed sample $X\delta_i(\alpha)$ and the original test sample $y$. Finally, the test sample $y$ is assigned to the class with the smallest approximation residual, that is,
$$\operatorname{identity}(y) = \arg\min_{i} r_i(y). \tag{6}$$
The above SRC algorithm [2–7] relies on sufficient samples and precise alignment. Its performance deteriorates with insufficient dictionary atoms, highly correlated samples, or significant noise. Several studies have addressed these limitations. Among them, Zhang et al. proposed Collaborative Representation-based Classification (CRC), which utilizes regularized least squares with an $\ell_2$-norm constraint on the representation coefficients. They argued that the collaborative representation mechanism itself, rather than the $\ell_1$-norm-induced sparsity, is the primary contributor to the model’s effectiveness. The objective function is defined as follows:
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \|\alpha\|_2^2. \tag{7}$$
Based on this foundation, Wang et al. introduced a trace norm constraint on the representation vector and proposed Adaptive Sparse Representation-Based Classification (ASRC), which adaptively integrates both $\ell_1$ norm and $\ell_2$ norm sparsity. However, ASRC remains sensitive to insufficient sampling. To address this issue, Deng et al. developed Extended Sparse Representation-Based Classification (ESRC), which augments the dictionary with intra-class variation bases. Further advancing this line of work, Deng et al. also proposed SLRC, representing a test image via a class-centered matrix and an intra-class variation matrix. While these methods have achieved some success, room for improvement remains. Yang et al. addressed the sparse coding problem by applying the principle of maximum likelihood estimation and proposed the Robust Sparse Coding (RSC) model, which reduced the model’s sensitivity to outliers. Subsequently, Dong et al. introduced the Low-Rank Laplacian-Uniform Mixture Model (LR-LUM), also based on MLE, to improve the error modeling capability. This method demonstrates strong robustness in the presence of noise in face recognition.
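As a rough illustration of the collaborative representation idea described above, the following sketch solves the ridge-regularized least-squares problem in closed form and assigns the query to the class with the smallest reconstruction residual. It is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the function name `crc_classify`, the toy data, and the value of `lam` are hypothetical, and the plain class-wise residual is used as a simplification.

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.01):
    """Classify y by ridge-regularized least squares over all training samples.

    X      : (m, n) matrix whose columns are training samples
    labels : (n,) integer class label of each column
    y      : (m,) query sample
    lam    : regularization weight (arbitrary illustrative value)
    """
    n = X.shape[1]
    # Closed-form ridge solution: alpha = (X^T X + lam * I)^{-1} X^T y
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        # Keep only the coefficients of class c (the characteristic function)
        alpha_c = np.where(labels == c, alpha, 0.0)
        residuals.append(np.linalg.norm(y - X @ alpha_c))
    return classes[int(np.argmin(residuals))], alpha

# Toy data: two classes, three noisy copies of one prototype per class
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 2))
X = np.hstack([base[:, [0]] + 0.05 * rng.normal(size=(20, 3)),
               base[:, [1]] + 0.05 * rng.normal(size=(20, 3))])
labels = np.array([0, 0, 0, 1, 1, 1])
y = base[:, 1] + 0.05 * rng.normal(size=20)   # query drawn from class 1
pred, alpha = crc_classify(X, labels, y)
```

Because the solution is a single linear solve, no iterative sparse coding is needed, which is the source of the lower computational cost attributed to CRC.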
To address the limitations of existing methods, we propose a novel sparse representation approach termed ASRC-WFD. Building upon the adaptive sparse representation framework, our method incorporates an extended dictionary and a weight function, enhancing its robustness in scenarios characterized by insufficient sampling and diverse types of noise. The detailed formulation of ASRC-WFD is provided in the next section.
Adaptive sparse representation classification based on weighted and fusion dictionary
This section presents our adaptive sparse representation model with a weighted and fusion dictionary in detail and derives its solution. We formally refer to the resulting face recognition framework as ASRC-WFD.
Our model
In order to construct a more robust model for sparse coding of face images, we seek a maximum likelihood estimation (MLE) solution [30] for the coding coefficients. Assume that the elements of the coding residual $e = [e_1, e_2, \dots, e_m]^T$ are independently and identically distributed according to some probability density function (PDF) $f_\theta(e_i)$. Without considering the sparsity constraint on $\alpha$, the likelihood of the estimator is $L_\theta(e_1, \dots, e_m) = \prod_{i=1}^{m} f_\theta(e_i)$, and MLE aims to maximize this likelihood function or, equivalently, minimize the objective function $-\ln L_\theta = \sum_{i=1}^{m} \rho_\theta(e_i)$, where $\rho_\theta(e_i) = -\ln f_\theta(e_i)$. We can approximate
by its first-order Taylor expansion in the neighborhood of
,
where is the derivative of
. In order to better describe random errors in challenging situations, we adopt a Laplacian-uniform mixed (LUM) function to fit the error distribution, which can be expressed as
where corresponds to the scale of the Laplacian component and
corresponds to a uniform distribution,
is a distribution normalization factor. Denote by
the derivative of
, and then
. Then (8) can be rewritten as
The constant term constraint in the above equation can be omitted,
The -th element of the diagonal matrix
is denoted by
. So
term becomes a reweighted
-norm constraint. Furthermore, the objective function is expressed in the following form,
where is the regularization parameter. The regular term
is to prevent overfitting of the model.
In the fidelity term , we incorporate a weighted
norm constraint to account for potential errors in the image data. To further enhance the model’s performance, additional constraints are introduced to the regularization term. As demonstrated by Wang et al. [10], while sparsity effectively selects relevant samples, the correlation structure helps capture the underlying relationships between test and training samples. To simultaneously leverage both sparsity and correlation, we impose nuclear norm constraints on both the dictionary and the representation vectors. Consequently, the objective function (12) is reformulated as follows with the application of a nuclear norm constraint to the regularization term:
where is the correlation regularized. We make the following two extreme inferences from this model.
First, assume that the samples are unrelated and the columns of the dictionary matrix are orthogonal,
. Then equation (13) will be converted into the following problem:
In this case, it is simplified to the problem of Equation (12), which indicates that the $\ell_1$ norm sparse constraint in the regularization term accounts for the lack of correlation between the training samples.
Second, we assume that images of different subjects look similar to , and then we have
and
(
is a vector of size
,where all the elements are one). Then, equation (13) is converted to
where the $\ell_2$ norm constraint of the regularization term can be used to process highly correlated images. These two limiting cases indicate that the model effectively harnesses both $\ell_1$ and $\ell_2$ norm constraints by leveraging the inherent correlation among sample images. In practical applications, face images are neither perfectly aligned nor entirely independent. The introduced nuclear norm constraint balances this complex dependency.
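The two limiting cases can be checked numerically. The sketch below is an illustrative check, not part of the proposed algorithm: it evaluates the nuclear norm of a dictionary multiplied by a diagonal coefficient matrix, first for a dictionary with orthonormal columns and then for a dictionary whose columns are all the same unit vector. The names `trace_lasso`, `Q`, and `X_same` are hypothetical.

```python
import numpy as np

def trace_lasso(X, alpha):
    """Nuclear norm of X @ diag(alpha)."""
    return np.linalg.norm(X @ np.diag(alpha), ord="nuc")

rng = np.random.default_rng(0)
alpha = rng.normal(size=5)

# Case 1: uncorrelated samples. Q has orthonormal columns, so the singular
# values of Q @ diag(alpha) are |alpha_i| and the nuclear norm is the l1 norm.
Q, _ = np.linalg.qr(rng.normal(size=(8, 5)))
uncorrelated = trace_lasso(Q, alpha)

# Case 2: fully correlated samples. Every column is the same unit vector x,
# so X @ diag(alpha) = x alpha^T has rank one and its only non-zero singular
# value is the l2 norm of alpha.
x = rng.normal(size=8)
x /= np.linalg.norm(x)
X_same = np.outer(x, np.ones(5))
correlated = trace_lasso(X_same, alpha)

l1_norm = np.abs(alpha).sum()
l2_norm = np.linalg.norm(alpha)
```

For dictionaries between these extremes, with unit-norm columns, the value lies between the $\ell_2$ and $\ell_1$ norms of the coefficient vector, which is the adaptive behavior the model relies on.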
In addition to incorporating weight constraints for random errors and modeling sparsity and correlation, we address the coding errors arising from insufficient or unrepresentative training samples. A literature review shows that the SLRC algorithm is notably stable under such sampling conditions. This motivates us to decompose the training sample dictionary into a class-center matrix $P = [c_1, c_2, \dots, c_k]$ and an intra-class variation matrix $V = [X_1 - c_1\mathbf{1}_{n_1}^T, \dots, X_k - c_k\mathbf{1}_{n_k}^T]$, which enhances robustness against limited data, where $\mathbf{1}_{n_i}$ is the all-ones vector of length $n_i$ and $c_i$ is the class centroid of class $i$. The class-center matrix and the intra-class variation matrix are then fused into a new fusion dictionary as follows:
Based on the above analysis, we propose an adaptive robust sparse representation model using weighted and fused dictionaries.
This mixed model is motivated by the key error sources in face recognition: random post-processing noise, reconstruction inaccuracy, alignment perturbations, and insufficient sampling. Our approach integrates a weighted norm constraint to model errors, a fused dictionary to compensate for limited data, and a nuclear norm to improve correlation learning, collectively enhancing robustness and accuracy.
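The decomposition into class centers and intra-class variations can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `build_fusion_dictionary` is hypothetical, and fusing the two matrices by simple column concatenation is an assumption made for illustration, since the exact form of the fusion equation is not reproduced here.

```python
import numpy as np

def build_fusion_dictionary(X, labels):
    """Split training samples into class centers and intra-class variations.

    X      : (m, n) matrix whose columns are training samples
    labels : (n,) integer class label of each column
    Returns the fused dictionary, the class-center matrix P, and the
    intra-class variation matrix V.
    """
    centers, variations = [], []
    for c in np.unique(labels):
        X_c = X[:, labels == c]
        center = X_c.mean(axis=1, keepdims=True)   # class centroid
        centers.append(center)
        variations.append(X_c - center)            # deviations from the centroid
    P = np.hstack(centers)
    V = np.hstack(variations)
    # Assumption: fusion by column concatenation [P, V]
    return np.hstack([P, V]), P, V

# Toy example: 6 samples of dimension 4 in two classes
X = np.arange(24, dtype=float).reshape(4, 6)
labels = np.array([0, 0, 0, 1, 1, 1])
D, P, V = build_fusion_dictionary(X, labels)
```

With only a few samples per class, the variation columns of all classes are shared by every class center, which is how a dictionary of this kind can compensate for limited training data.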
Optimization
To solve problem (19), we first convert it to the following equivalent optimization problem:
where and
. Inspired by the optimization methods used in [21] and [22], we adopt the alternating direction method of multipliers (ADMM) [31] to solve problem (20). We first convert it to the following equivalent problem:
Problem (21) can be solved by solving the following augmented Lagrange multiplier problem:
where and
are Lagrange multipliers and
is a parameter. In general, obtaining the optimal solution of (22) is equivalent to finding a saddle point
of the min-max problem
Such that
This optimization problem can be minimized with respect to ,
and
, respectively, by fixing the other variables, and then updating
and
. As a result, the original problem is decomposed into several subproblems as follows:
where denotes the iteration index in the algorithm.
First, we optimize problem (25) by solving the following subproblem:
where the and
. Problem (30) is a nuclear norm minimization problem, and a closed-form solution can be obtained through the singular value thresholding (SVT) algorithm [32]. Let
be the SVD of
, where
is the singular value matrix of
,
and
are the corresponding orthogonal matrices. Then, the optimal solution can be obtained as follows:
where the SVT operator is defined as follows .
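The singular value thresholding step can be sketched in a few lines. This is a generic illustration of the standard operator, with `M` and `tau` as placeholder names for the matrix being thresholded and the threshold; the specific matrix and threshold used in subproblem (30) are not reproduced here.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink each singular value of M by tau.

    Returns the minimizer of tau * ||Z||_* + 0.5 * ||Z - M||_F^2 over Z.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # singular values below tau become zero
    return U @ np.diag(s_shrunk) @ Vt

# Toy example: singular values 3 and 1 with threshold 2 become 1 and 0
M = np.diag([3.0, 1.0])
Z = svt(M, 2.0)
```

Singular values below the threshold are set to zero, so the result typically has lower rank than the input.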
Second, we consider how to solve the problem Eq.(26). Eq.(26) is equivalent to
The above problem can be easily solved by
where .
Third, we consider how to solve the problem Eq.(27). Eq.(27) is equivalent to solving the following problem:
where . A closed-form solution to the above problem can be obtained with the soft-thresholding operator as follows:
The soft-thresholding operator is defined as follows:
where is sign function, .
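For reference, the standard soft-thresholding operator can be written as below. This is a generic sketch: `v` and `tau` are placeholder names, and allowing `tau` to be an array (one threshold per entry, as a weighted constraint would require) is an assumption made for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise soft-thresholding: sign(v) * max(|v| - tau, 0).

    tau may be a scalar or an array broadcastable to v, so that each entry
    can have its own threshold.
    """
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

v = np.array([3.0, -0.5, -2.0])
shrunk = soft_threshold(v, 1.0)                           # one shared threshold
shrunk_w = soft_threshold(v, np.array([0.5, 0.5, 3.0]))   # per-entry thresholds
```

Entries whose magnitude is below the threshold are set exactly to zero, and the remaining entries are pulled toward zero by the threshold amount.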
ADMM converges to the optimal solution as the primal residuals are driven toward zero. We therefore use the following stopping conditions: and
, where
and
are small positive scalars. Convergence is achieved when the difference in the weights between adjacent iterations is sufficiently small. Specifically, we stop the iteration if the following condition holds:
, where
is a small positive scalar.
Based on the above analysis, the model optimization process is summarized in Algorithm 1:
Algorithm 1: The iterative optimization algorithm for the proposed model
Input: A test image , the fusion dictionary
,
Output: the identity with the converged and
,
(1) Initialize ,
,
,
,
(2) Repeat
(3) Update the weight matrix via (11)
(4) repeat
(5) Update via (31)
(6) Update via (33)
(7) Update via (35)
(8) Update and
via (28) and (29)
(9) Update
(10)
(11) until convergence or maximum iterations
(12) until convergence or maximum iterations
Convergence analysis
In this subsection, we establish the convergence of Algorithm 1. To this end, we assume that the objective function is closed, proper, and convex, and that problem (23) has a saddle point. Under these assumptions, we have the following convergence analysis.
Theorem 1. Let be a saddle point of the min-max problem (23), then the sequence
generated by Algorithm 1 is convergent.
Proof. By the definition of the saddle point of the Lagrangian function (23), we can deduce that ,
. This fact, coupled with (28) and (29), leads to
We set the relative errors by ,
,
,
,
and then, subtracting (37) from (28)-(29), we obtain
Squaring both sides of (38) then yields
Furthermore, we can equivalently reformulate them as
To establish the monotonicity of the convex functions (25)-(27), we need Lemma 1 in Appendix B.
According to Lemma 1 and the saddle point of the Lagrangian function (22), we also obtain the monotonicity of the function as follows:
To solve the subproblem 26, we need to fix ,
,
and
, then obtaining
have
Taking in (42) and
in (44) respectively, we add them together and take the aforementioned relative error into full account (
,
,
,
,
), then we obtain the following formula (for the detailed derivation, see Appendix C):
Similarly, we have
Summing (45)-(47) yields
(The detailed derivation process can be found in Appendix D.)
Substituting (40) into (48), we obtain
With the simple transform of (49), we then have
For inequality (41), we have
Taking in (51) and
in (52) respectively, adding them together, and taking the aforementioned relative errors into full account (
,
,
,
,
), then we obtain the following formula (for the detailed derivation, see Appendix E):
In addition, using the relationship to (53) leads to
By the application of
inequality (54) can be rewritten as
Similarly, we obtain
Substituting (56) and (57) into (50), we have
Then we get
By summing the inequality (59) from to
, we obtain
With the simple transform of (60), we then have
which implies that
Then so there exists a convergent subsequence , i.e.,
Now we need to show that is the saddle point of the Lagrangian function. In fact, using the iteration scheme (25)-(29) we can deduce that
This implies that is the saddle point, that is to say,
Furthermore, it is easy to deduce that the sequence is convergent.
Classification
Based on the above optimization results, we obtain the optimal solution of the model and the corresponding weighted matrix
. For each class
, let
be the characteristic function that selects the coefficients associated with the
th class. For
,
is a new vector whose only nonzero entries are the entries in
that are associated with class
. Using only the coefficients associated with the
th class, one can approximate the given test sample
as
. Because random errors usually produce a larger reconstruction error than clean image content, the weight function is applied at this stage to detect large random errors in the test sample and assign them small weights, which reduces the impact of random errors on recognition. We then give the residual between
and
as follows:
Thus, based on the above residual, the test sample is assigned to the class that minimizes the reconstruction residual, that is,
. Based on the above analysis, the procedure is summarized in Algorithm 2:
Algorithm 2:
Input: the dictionary , the test image
Output: the label of test samples
(1) Compute the fusion dictionary via (18);
(2) Solve the following minimization problem according to Algorithm 1;
(3) Compute the residuals
(4) Predict the identity of :
Flow Diagram
Fig 1 shows the flow diagram of the proposed ASRC-WFD. As can be seen from Fig 1, the algorithm is divided into three main stages: fusion dictionary construction, weight function update, and sparse coefficient optimization. This study focuses on face recognition under insufficient sampling. The fusion dictionary enhances the completeness of the dictionary representation when training samples are limited. On top of the fusion dictionary, the weighted norm constraint suppresses complex error interference in face images. In addition, the nuclear norm constraint imposed on the sparse representation coefficients balances the sparsity of the representation against the correlation information in the image data. The flowchart also shows that the algorithm is iterative, mainly in the updates of the weight function and the sparse coefficients. This iterative optimization of the weight parameters and sparse coefficients enables the algorithm to converge to the optimal solution.
Results
To evaluate the efficacy of our proposed adaptive sparse representation model with weighted and fusion dictionaries, we conducted comprehensive experiments on several widely used face recognition benchmarks, including AR [33] (http://RVL.www.ecn.purdue.edu), ORL [34] (https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html), Yale [35] (http://cvc.yale.edu/projects/yalefaces/yalefaces.html), and CMU PIE [36] (http://www.ri.cmu.edu/projects/project_418.html). A comparative analysis was performed between the proposed method and a range of established face recognition algorithms, namely NN [1], LRC [6], SRC [7], CRC [8], SLRC [12], ASRC [10] and RSC [13]. For feature extraction, we employed Principal Component Analysis (PCA) on all face image datasets.
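The PCA feature extraction step can be sketched as follows. This is a minimal sketch assuming the projection basis is learned from the training images only and then applied to the test images; the function names `pca_fit` and `pca_project` and the toy dimensions are hypothetical.

```python
import numpy as np

def pca_fit(X_train, dim):
    """Learn a PCA basis from training samples stored as columns."""
    mean = X_train.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data are the principal directions
    U, _, _ = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, U[:, :dim]

def pca_project(X, mean, basis):
    """Project samples (columns of X) onto the learned basis."""
    return basis.T @ (X - mean)

# Toy data: 12 training and 4 test "images" with 50 pixels each
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 12))
X_test = rng.normal(size=(50, 4))
mean, basis = pca_fit(X_train, dim=5)
F_train = pca_project(X_train, mean, basis)
F_test = pca_project(X_test, mean, basis)
```

The reduced features `F_train` and `F_test` would then take the place of the raw pixel vectors when building the dictionary and coding a query.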
The setting of parameters and
in (4) follows the recommendations of [22]. They are estimated in each iteration using the mean
of the absolute value of coding residuals. Specifically,
and
are set to
and
, respectively. In the experiment, we set
and
,
in (17) is set to
. The numbers of iterations of the inner loop and the outer loop in Algorithm 1 are set to 10 and 30, respectively.
Face recognition with clean face
This part verifies the basic recognition performance of all methods with clean face image recognition.
AR database
The AR database comprises over 4,000 frontal face images of 126 subjects. In this experiment, a subset of 50 male and 50 female subjects was selected. Each subject provided 14 frontal images with only illumination and expression variations. All images were cropped to a size of 120 × 165 pixels. To evaluate performance under different training conditions, two experimental scenarios were constructed: one with limited data (2 images per subject for training and the rest for testing) and one with ample data (6 images per subject for training and the rest for testing). Feature dimensions were varied across experiments, and the corresponding results are summarized in Tables 1 and 2.
As shown in Table 1, the proposed ASRC-WFD consistently outperforms all compared methods across nearly all experimental settings. The performance advantage is most pronounced under sufficient training samples and higher-dimensional feature spaces, achieving a recognition rate of 87.25% with 6 training samples per subject at 500 feature dimensions. This gain is attributed to the integrated modeling of sparsity, correlation, and robustness. Even with limited training data, the fusion dictionary and nuclear norm constraints effectively capture intra-class variations.
SLRC generally surpasses SRC due to its collaborative robustness under sparse sampling. ASRC excels in sufficient-data regimes but underperforms under limited samples, as it overlooks dictionary incompleteness when leveraging sample correlations. While SRC, CRC, SLRC, ASRC, and RSC achieve competitive results, all are consistently outperformed by ASRC-WFD. NN performs the weakest due to its sensitivity to facial similarity, and LRC, though stronger, remains inferior to our method.
In summary, the proposed ASRC-WFD demonstrates superior recognition performance under noiseless conditions on the AR database.
ORL database
The ORL database contains faces of 40 subjects with diverse ages, genders, and racial backgrounds. Image variations include changes in illumination, pose, facial accessories, and expressions. To evaluate robustness under limited data, we constructed two training scenarios: 2 images per subject (insufficient) and 4 images per subject (sufficient), with the remaining images used for testing. Recognition performance across varying feature dimensions is reported in Tables 3 and 4.
As shown in Table 3, ASRC-WFD achieves the highest recognition rates across most settings, except at 50 dimensions. This advantage is attributed to the fusion dictionary, which enhances dictionary completeness, and the effective utilization of correlation information between images. Under insufficient training samples, ASRC-WFD reaches a maximum recognition rate of 88.05%, outperforming NN, LRC, SRC, CRC, SLRC, ASRC, and RSC by 10.24%, 10.24%, 1.80%, 5.24%, 0.55%, 3.05%, and 1.49%, respectively.
Results in Table 4 show that the highest recognition rates for most algorithms do not occur at the maximum feature dimension. For instance, with sufficient training samples, NN and RSC perform best at 140 dimensions, while LRC, SRC, CRC, and ASRC achieve optimal results at 120 dimensions. Similarly, the proposed ASRC-WFD attains its peak performance of 92.02% at 120 dimensions. Overall, ASRC-WFD consistently surpasses all compared methods under noiseless conditions on the ORL database.
Yale database
The Yale database comprises 15 subjects, each with 11 images that capture varying facial expressions and illumination conditions. These images were collected under different emotional states (e.g., sad, happy, surprised). In the experiments, 2 and 4 images per subject were used for training to simulate insufficient and sufficient sample scenarios, respectively, with the remaining images reserved for testing. Recognition performance under varying feature dimensions is summarized in Tables 5 and 6.
From Tables 5 and 6, we can see that ASRC-WFD obtains the best recognition rates at all levels. The performance of all the methods improves as the dimension of the feature space increases, and the proposed ASRC-WFD always remains the best. For example, when the number of training samples is 2 and the dimension of the feature space is 29, the best recognition rate for ASRC-WFD on the Yale database is 89.55%, compared to 83.70% for NN and LRC, 85.15% for SRC, 85.93% for CRC, 87.41% for SLRC, 86.67% for ASRC, and 82.22% for RSC. Since RSC has certain limitations in the case of insufficient sampling, it does not show superior performance in the experiment with 2 training samples per subject. In the experiment with 4 training samples per subject, the training samples are more sufficient, so the recognition rate of the RSC algorithm improves.
CMU PIE database
To evaluate the generalizability of our algorithm for pattern recognition tasks, we conducted experiments on the C07 subset of the CMU PIE database. This subset contains 1,629 images of 68 subjects, with variations in facial expression, pose, and illumination. All images were cropped to a resolution of 64 × 64 pixels. In our experiments, four images per subject were used for training, with the remainder constituting the test set. The corresponding results are presented in Table 7.
As shown in Table 7, the proposed ASRC-WFD consistently outperforms all compared methods across every feature dimension. Our approach demonstrates a clear advantage over conventional sparse representation-based methods such as SRC. Since each sample in the database exhibits unique facial expressions, poses, and illumination conditions, the extended dictionary in SLRC effectively captures inter-sample variations, leading to its superior performance over SRC at all dimensions. Similarly, ASRC (which leverages both sparsity and correlation) and RSC (with its weighted design) also achieve higher recognition rates than SRC across dimensional settings.
The proposed method attains the highest recognition accuracy by comprehensively integrating a fusion dictionary, correlation structure, and an adaptive weighting function to handle complex error patterns in face imagery. In summary, these experimental results confirm that ASRC-WFD constitutes an effective and robust solution for general pattern recognition tasks.
Face recognition with random noises
This section evaluates the robustness of the proposed method against random noise corruption. Test samples were contaminated with random noise ranging from 10% to 80% in intensity. All compared methods were assessed under noise conditions from 0% to 80%. Here, 10% random noise indicates that 10% of the pixels were replaced with random values at random locations.
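The corruption protocol described above can be sketched as follows. This is an illustrative sketch, not the script used in the experiments: the function name `corrupt_pixels` is hypothetical, and drawing the replacement values uniformly from the 8-bit range [0, 255] is an assumption, since the text only states that pixels are replaced with random values.

```python
import numpy as np

def corrupt_pixels(image, fraction, rng, low=0, high=256):
    """Replace a given fraction of pixels, at random locations, with random values.

    Returns the corrupted copy and the flat indices of the replaced pixels.
    """
    noisy = image.copy()
    n_corrupt = int(round(fraction * image.size))
    # Choose distinct pixel positions so exactly n_corrupt pixels are replaced
    idx = rng.choice(image.size, size=n_corrupt, replace=False)
    # Assumption: replacement values are uniform integers in [low, high)
    noisy.flat[idx] = rng.integers(low, high, size=n_corrupt)
    return noisy, idx

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(27, 20)).astype(np.uint8)   # toy 27 x 20 image
noisy, idx = corrupt_pixels(image, 0.10, rng)
```

Note that a replaced pixel can by chance receive its original value, so the number of pixels that actually change may be slightly below the nominal percentage.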
AR database
For face recognition under random noise conditions, experiments were conducted from two perspectives on the AR database. Two evaluation settings were adopted: insufficient training samples and sufficient training samples. In the insufficient sample setting, two sample images per subject exhibiting only illumination and expression variations were selected from the 14 available images to form the training set, with the remaining images used for testing. Under the sufficient sample setting, half of the images per subject were used for training and the remainder for testing. All images were resized to 27 × 20 pixels, with experimental results presented in Fig 2.
(a) The number of training samples per class was 2. (b) The number of training samples per class was 7.
As shown in Fig 2, the proposed ASRC-WFD method performs significantly better than the seven compared algorithms under random noise on the AR database.
In the insufficient-sample scenario (Fig 2(a)), as noise increases to 30%, our method declines gradually while others degrade sharply, demonstrating its stability under insufficient samples. Under sufficient samples (Fig 2(b)), our method maintains over 84% accuracy up to 30% noise, whereas NN, LRC, SRC, CRC, SLRC, ASRC, and RSC drop to 41.43%, 52.00%, 58.43%, 55.14%, 54.57%, 41.14%, and 60.71%, respectively. At 40% noise, our method achieves 76.41%, significantly exceeding RSC (44.00%), which benefits only from weighted norm constraints for outlier suppression. By integrating error modeling and correlation-guided recovery, our approach preserves performance even at 50% noise (57.71%), outperforming all others by at least 28.57%. These results validate the robustness of our method under both sampling conditions on AR.
(a) The number of training samples per class was 2. (b) The number of training samples per class was 5.
ORL database
The proposed method was evaluated on the ORL database under two protocols: one with insufficient training samples and the other with sufficient samples. In these experiments, 2 and 5 sample images per subject were randomly selected for training, respectively, with the remaining images used for testing. All images were resized to 16 × 16 pixels, and the results are presented in Fig 3.
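The per-subject random split used in this protocol can be sketched as below. The helper name `split_per_subject` and the fixed seed are hypothetical, and the 40 subjects × 10 images layout is the standard ORL configuration rather than a figure stated in this section.

```python
import random
from collections import defaultdict

def split_per_subject(labels, n_train, seed=0):
    """Return (train_idx, test_idx) with n_train random samples per subject."""
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for idx, label in enumerate(labels):
        by_subject[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_subject):
        indices = by_subject[label]
        if len(indices) <= n_train:
            raise ValueError(f"subject {label!r} needs more than {n_train} samples")
        # Randomly pick the training images; the rest go to the test set
        chosen = set(rng.sample(indices, n_train))
        train_idx.extend(i for i in indices if i in chosen)
        test_idx.extend(i for i in indices if i not in chosen)
    return train_idx, test_idx

# Assumed ORL layout: 40 subjects with 10 images each
labels = [subject for subject in range(40) for _ in range(10)]
train_2, test_2 = split_per_subject(labels, n_train=2)  # insufficient setting
train_5, test_5 = split_per_subject(labels, n_train=5)  # sufficient setting
```

Fixing the seed makes a given split reproducible; averaging over several seeds would be a natural way to reduce the variance introduced by the random selection.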
Fig 3 demonstrates the superior performance and robustness of the proposed method against random noise on the ORL database, under both insufficient and sufficient training samples. Under insufficient samples (Fig 3(a)), our method maintains the highest recognition rate even at 60% noise (53.48%), owing to the fusion dictionary and weighted norm constraint, which enhance dictionary completeness and error tolerance. With sufficient samples (Fig 3(b)), our method also shows minimal performance degradation at 30% noise and significantly outperforms all seven benchmarks at 50% noise, with a recognition rate of 72.22%. This confirms the method’s exceptional robustness across different training conditions.
Yale database
The robustness of the proposed method was evaluated on the Yale database under both insufficient and sufficient training sample conditions with random noise. In the experiments, 2 and 5 images per subject were used for training, respectively, while the rest were reserved for testing. All images were resized to a resolution of 27 × 20 pixels. The corresponding results are shown in Fig 4.
Experimental results on the Yale database under insufficient sampling (Fig 4(a)) confirm the superior robustness of the proposed method. Its recognition rate remains almost unchanged with 20% random noise, outperforming all comparison methods, which show a clear decline. This stability is conferred by the synergistic action of the weight function and image correlation information. While the recognition rate of the proposed method drops from 85.82% to 66.15% as noise increases from 20% to 50%, it maintains a significant advantage; the comparison methods fall below 40%. This is explained by the progressive loss of discriminative facial features and inter-image correlation due to severe noise. As shown in Fig 4(b), the method’s robustness is also validated under sufficient sampling. Overall, the proposed method exhibits consistent and effective performance against random noise on the Yale database.
CMU PIE database
To evaluate the proposed method’s performance under varying sampling conditions (simulated by training with 4 and 10 samples per subject) and noise levels (from 10% to 80%), we conducted experiments on the CMU PIE sub-database C07, with the results shown in Fig 5.
(a) The number of training samples per class was 2. (b) The number of training samples per class was 5.
(a) The number of training samples per class was 4; (b) The number of training samples per class was 10.
Experimental results on the CMU PIE database under insufficient sampling (Fig 5(a)) reveal that the proposed method, while initially inferior to CRC and SLRC in a noiseless setting, exhibits superior noise tolerance. A sharp performance degradation is observed in CRC, SLRC, and RSC at 30% noise, in contrast to the gradual decline of the proposed method. This gradual degradation and sustained superior performance are due to the enhanced dictionary completeness and the introduction of a weight function that mitigates the impact of errors.
Under sufficient sampling (Fig 5(b)), the proposed method maintains a recognition rate of 99.16% at 30% noise, showing no performance loss, and significantly outperforms the best comparator (SRC at 92.86%). Its advantage is further amplified at higher noise levels, which is attributed to the integrated contribution of the fusion dictionary, image correlation, and error modeling. Overall, the method shows strong robustness against random noise on the CMU PIE database.
Face recognition with random occlusions
This section assesses the method’s robustness to random occlusion – a common real-world challenge where the position and size of obstructions are unpredictable. We test on multiple databases by occluding each test image with black blocks at random, unknown locations.
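A minimal sketch of this occlusion protocol is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function name `occlude_random_block`, the `ratio` parameter (fraction of the image area covered), and the choice of a single block with the same aspect ratio as the image are all hypothetical, because the block size and shape are not specified here.

```python
import numpy as np

def occlude_random_block(image, ratio, rng=None):
    """Black out one block covering roughly `ratio` of the image area.

    The block keeps the aspect ratio of the image and is placed at a
    uniformly random position (both are assumptions for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    occluded = np.array(image, copy=True)
    h, w = occluded.shape[:2]
    scale = float(np.sqrt(ratio))
    # Block height and width, clipped to stay within the image
    bh = min(h, max(1, int(round(h * scale))))
    bw = min(w, max(1, int(round(w * scale))))
    # Random top-left corner such that the block fits inside the image
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    occluded[top:top + bh, left:left + bw] = 0
    return occluded

# Example: occlude about 25% of a 20 x 20 test image
face = np.full((20, 20), 255, dtype=np.uint8)
masked = occlude_random_block(face, 0.25, np.random.default_rng(0))
```

Because the block position is drawn independently for each test image, the classifier cannot rely on any fixed region of the face being intact.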
AR database
Under the insufficient and sufficient sampling protocols, 2 and 7 samples per subject from the AR database were used for training, respectively, while the remaining samples were occluded with random black blocks and used for testing. The quantitative results are recorded in Table 8.
The performance evaluation on the AR database under random occlusion reveals distinct behaviors among the comparison methods, which contextualizes the superiority of our approach. Sensitivity to occlusion explains the poor performance of NN and the instability of LRC. While SRC, CRC, SLRC, and ASRC achieve higher rates through strategies like sparsity or collaborative representation, they lack a dedicated mechanism for handling severe, random corruption. RSC, though robust in other error scenarios, does not excel here. In contrast, our method explicitly addresses the core challenge: it successfully characterizes random occlusions by integrating a weight constraint to suppress errors and leveraging correlation information to preserve essential features. This targeted design is why our method maintains the best stability against random occlusion.
ORL database
In this experiment on the ORL database, 2 and 5 samples per subject were used for training under the insufficient and sufficient sampling conditions, respectively. The remaining samples were used for testing, where they were occluded with black blocks at random locations. The experimental results are presented in Table 9.
As evidenced in Table 9, our method demonstrates robust performance against random occlusion under both sampling conditions, yielding the highest recognition rates. This advantage stems from the synergistic effect of the fusion dictionary and image correlation, coupled with a weighted norm constraint that effectively suppresses random errors.
Yale database
This experiment evaluates the performance of the proposed algorithm on the Yale database when face images are occluded by random black blocks. For this purpose, 2 and 5 samples per subject were used for training under insufficient and sufficient conditions, respectively, while the remaining samples were used for testing. The experimental results are presented in Table 10.
The experimental results in Table 10 confirm that the proposed method secures the highest recognition rate on the Yale database, regardless of sampling sufficiency. This performance is due to its incorporation of a weighted norm constraint, which is critical for suppressing error propagation during recognition. Random block masks produce pronounced localized errors, which are precisely the conditions under which this robustness is most convincingly demonstrated.
CMU PIE database
The algorithm’s performance under random block occlusion was further verified on the CMU PIE database using 4 training samples per subject. The test set was occluded with random black blocks, and the results are summarized in Table 11.
As shown in Table 11, the proposed method achieves the highest recognition rate on the CMU PIE database under random occlusion. Compared with NN, LRC, SRC, CRC, SLRC, ASRC, and RSC, it achieves performance improvements of 4.78, 2.57, 1.47, 1.10, 0.36, 0.73, and 0.73 percentage points, respectively. This superiority stems from the method’s integration of sparsity and correlation information, coupled with the use of a weighted norm to suppress the influence of errors on the recognition results. Therefore, the proposed method also demonstrates strong robustness for occluded face recognition on the CMU PIE database.
Ablation study
To quantitatively verify the effectiveness and necessity of each core component in the proposed algorithm, we conduct comprehensive ablation experiments on the ORL database under clean faces, random noise and random occlusion conditions. As shown in Table 12, the traditional baseline methods (NN, LRC, SRC, CRC) show limited robustness, and their recognition accuracy drops sharply under noise interference.
Based on the internal module ablation results, the following conclusions can be drawn:
- (1) The single nuclear norm constraint achieves promising noise robustness, but it lacks discriminative representation ability for clean and occluded face images.
- (2) On the basis of the nuclear norm, the additional weighted constraint only yields a marginal gain for occlusion resistance, which indicates that the weighted regularization alone has limited improvement.
- (3) When the fusion dictionary strategy is further embedded, the recognition accuracy is substantially improved in all three scenarios. This fully demonstrates that the fusion dictionary effectively compensates for insufficient dictionary completeness under insufficient sampling conditions. Combined with nuclear norm regularization and weighted constraint, it greatly enhances feature representation capability and overall anti-interference performance.
In summary, each designed module plays a complementary and indispensable role. The collaborative combination of nuclear norm, weighted constraint and fusion dictionary jointly guarantees the superior performance of the final proposed algorithm.
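For readers less familiar with the nuclear norm term examined in the ablation, the sketch below shows the standard singular value thresholding operator [32], which is the proximal operator of the nuclear norm and is the usual building block for nuclear-norm subproblems in ADMM-type solvers. It is a generic illustration, not the authors' update rule: the function name `singular_value_threshold` and the threshold `tau` are hypothetical, and how the operator enters the proposed algorithm depends on the model formulation given earlier in the paper.

```python
import numpy as np

def singular_value_threshold(matrix, tau):
    """Proximal operator of tau * ||.||_* : soft-threshold the singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Shrink each singular value by tau and clip at zero
    s_shrunk = np.maximum(s - tau, 0.0)
    # Rebuild the matrix from the shrunken spectrum
    return (u * s_shrunk) @ vt

# Example: small singular values are removed, lowering the rank
m = np.diag([3.0, 1.0])
low_rank = singular_value_threshold(m, tau=1.5)
```

Shrinking the singular values suppresses low-energy components of a matrix-shaped residual while keeping its dominant structure, which is consistent with the noise robustness reported for the nuclear norm variant in Table 12.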
Conclusion
In this work, we propose a novel adaptive robust sparse representation model for face recognition under insufficient sampling and complex interference conditions. The proposed framework integrates three pivotal components: weighted norm constraint, nuclear norm regularization, and fusion dictionary construction. These modules work synergistically to improve feature discriminability, representation stability, and anti-noise robustness. To solve the corresponding optimization problem efficiently, the alternating direction method of multipliers (ADMM) is adopted, and the convergence of the iterative optimization process is theoretically guaranteed. Extensive experiments under clean, noisy and occluded conditions demonstrate that our method outperforms conventional sparse representation and traditional comparison methods.
Furthermore, compared with mainstream deep learning-based schemes, our sparse representation method has two distinct merits. First, it provides high interpretability: sparse coefficients explicitly reflect the contribution of each dictionary atom to feature representation, which is critical for forensic and medical face recognition tasks. Second, it maintains stable performance under limited training samples. Unlike deep models that rely on large-scale datasets, the proposed method combines weighted and nuclear norm regularization to alleviate overfitting, and achieves reliable results even with only a small number of training samples per category.
Acknowledgments
The authors gratefully acknowledge the institutions and researchers that provided the publicly available face datasets: the AR Face Database, ORL (AT&T) Database, Yale Face Database, and CMU PIE Database. Their valuable contributions to the research community are sincerely appreciated. The authors also thank the editor and anonymous reviewers for their constructive comments and suggestions, which have significantly improved the quality of this manuscript.
References
- 1. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inform Theory. 1967;13(1):21–7.
- 2. Li SZ, Lu J. Face recognition using the nearest feature line method. IEEE Trans Neural Netw. 1999;10(2):439–43. pmid:18252542
- 3. Chien J-T, Wu C-C. Discriminant waveletfaces and nearest feature classifiers for face recognition. IEEE Trans Pattern Anal Machine Intell. 2002;24(12):1644–9.
- 4. Ho J, Yang M-H, Lim J, Lee K-C, Kriegman D. Clustering appearances of objects under varying illumination conditions. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings. 2003;I-11-I–18. https://doi.org/10.1109/cvpr.2003.1211332
- 5. Lee K-C, Ho J, Kriegman DJ. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell. 2005;27(5):684–98. pmid:15875791
- 6. Naseem I, Togneri R, Bennamoun M. Linear regression for face recognition. IEEE Trans Pattern Anal Mach Intell. 2010;32(11):2106–12. pmid:20603520
- 7. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y. Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell. 2009;31(2):210–27. pmid:19110489
- 8. Zhang L, Yang M, Feng X. Sparse representation or collaborative representation: which helps face recognition? In: 2011 International Conference on Computer Vision. 2011. 471–8. https://doi.org/10.1109/ICCV.2011.6126277
- 9. Zuo W, Meng D, Zhang L, Feng X, Zhang D. A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding. In: 2013 IEEE International Conference on Computer Vision. 2013. 217–24. https://doi.org/10.1109/iccv.2013.34
- 10. Wang J, Lu C, Wang M, Li P, Yan S, Hu X. Robust face recognition via adaptive sparse representation. IEEE Trans Cybern. 2014;44(12):2368–78. pmid:25415943
- 11. Deng W, Hu J, Guo J. Extended SRC: undersampled face recognition via intraclass variant dictionary. IEEE Trans Pattern Anal Mach Intell. 2012;34(9):1864–70. pmid:22813959
- 12. Deng W, Hu J, Guo J. Face Recognition via Collaborative Representation: Its Discriminant Nature and Superposed Representation. IEEE Trans Pattern Anal Mach Intell. 2018;40(10):2513–21. pmid:28976311
- 13. Yang M, Zhang L, Yang J, Zhang D. Regularized robust coding for face recognition. IEEE Trans Image Process. 2013;22(5):1753–66. pmid:23269753
- 14. Liu X, Lu L, Shen Z, Lu K. A novel face recognition algorithm via weighted kernel sparse representation. Future Generation Computer Systems. 2018;80:653–63.
- 15. Wei C-P, Chen C-F, Wang Y-CF. Robust face recognition with structurally incoherent low-rank matrix decomposition. IEEE Trans Image Process. 2014;23(8):3294–307. pmid:24951689
- 16. Qian J, Yang J, Zhang F, Lin Z. Robust Low-Rank Regularized Regression for Face Recognition with Occlusion. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014;21–6. https://doi.org/10.1109/cvprw.2014.9
- 17. Yang J, Luo L, Qian J, Tai Y, Zhang F, Xu Y. Nuclear Norm Based Matrix Regression with Applications to Face Recognition with Occlusion and Illumination Changes. IEEE Trans Pattern Anal Mach Intell. 2017;39(1):156–71. pmid:26930675
- 18. Xie J, Yang J, Qian JJ, Tai Y, Zhang HM. Robust Nuclear Norm-Based Matrix Regression With Applications to Robust Face Recognition. IEEE Trans Image Process. 2017;26(5):2286–95. pmid:28166496
- 19. Li S, Li K, Fu Y. Self-Taught Low-Rank Coding for Visual Learning. IEEE Transactions on Neural Networks and Learning Systems. 2018;29:645–56.
- 20. Zhang H, Gong C, Qian J, Zhang B, Xu C, Yang J. Efficient Recovery of Low-Rank Matrix via Double Nonconvex Nonsmooth Rank Minimization. IEEE Trans Neural Netw Learn Syst. 2019;30(10):2916–25. pmid:30892254
- 21. Iliadis M, Wang H, Molina R, Katsaggelos AK. Robust and Low-Rank Representation for Fast Face Identification With Occlusions. IEEE Trans Image Process. 2017;26(5):2203–18. pmid:28252401
- 22. Dong J, Zheng H, Lian L. Low-Rank Laplacian-Uniform Mixed Model for Robust Face Recognition. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019;11889–98. https://doi.org/10.1109/CVPR.2019.01217
- 23. Zheng H, Lin D, Lian L, Dong J, Zhang P. Laplacian-Uniform Mixture-Driven Iterative Robust Coding With Applications to Face Recognition Against Dense Errors. IEEE Trans Neural Netw Learn Syst. 2020;31(9):3620–33. pmid:31714242
- 24. Qian J, Yang J, Xu Y, Xie J, Lai Z, Zhang B. Image decomposition based matrix regression with applications to robust face recognition. Pattern Recognition. 2020;102:107204.
- 25. Li Q, He H, Lai H, Cai T, Wang Q, Gao Q. Enhanced nuclear norm based matrix regression for occluded face recognition. Pattern Recognition. 2022;126:108585.
- 26. Zhang C, Li H, Qian Y, Chen C, Zhou X. Locality-Constrained Discriminative Matrix Regression for Robust Face Identification. IEEE Trans Neural Netw Learn Syst. 2022;33(3):1254–68. pmid:33332275
- 27. Cheng D, Zhang X, Xu X. Reweighted robust and discriminative latent subspace projection for face recognition. Information Sciences. 2024;657:119947.
- 28. Melzi P, Tolosana R, Vera-Rodriguez R, Kim M, Rathgeb C, Liu X, et al. FRCSyn-onGoing: Benchmarking and comprehensive evaluation of real and synthetic data to improve face recognition systems. Information Fusion. 2024;107:102322.
- 29. Liu Y, Luo G, Weng Z, Zhu Y. Adaptive Face Recognition for Multi-Type Occlusions. IEEE Trans Circuits Syst Video Technol. 2024;34(11):11400–12.
- 30. Yang AY, Zhou Z, Balasubramanian AG, Sastry SS, Ma Y. Fast l₁-minimization algorithms for robust face recognition. IEEE Trans Image Process. 2013;22(8):3234–46. pmid:23674456
- 31. Boyd S. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning. 2010;3:1–122.
- 32. Cai JF, Candès EJ, Shen Z. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM Journal on Optimization. 2010;20:1956–82.
- 33. Benavente R. The AR face database. 1998. Available: https://www.researchgate.net/publication/243651904
- 34. Samaria FS, Harter AC. Parameterisation of a stochastic model for human face identification. In: Proceedings of 1994 IEEE Workshop on Applications of Computer Vision. 138–42. https://doi.org/10.1109/acv.1994.341300
- 35. Belhumeur PN, Hespanha JP, Kriegman DJ. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Machine Intell. 1997;19(7):711–20.
- 36. Sim T, Baker S, Bsat M. The CMU pose, illumination, and expression database. IEEE Trans Pattern Anal Machine Intell. 2003;25(12):1615–8.