
Robust sparse smooth principal component analysis for face reconstruction and recognition

  • Jing Wang ,

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Writing – original draft, Writing – review & editing

    wangjing@xynu.edu.cn

    Affiliations School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan, China, Henan Key Laboratory of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang, Henan, China

  • Xiao Xie,

    Roles Methodology, Visualization, Writing – original draft

    Affiliations School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan, China, Henan Key Laboratory of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang, Henan, China

  • Li Zhang,

    Roles Conceptualization, Formal analysis, Funding acquisition, Visualization, Writing – review & editing

    Affiliation School of Early-Childhood Education, Nanjing Xiaozhuang University, Nanjing, Jiangsu, China

  • Jian Li,

    Roles Conceptualization, Funding acquisition, Methodology, Writing – original draft, Writing – review & editing

    Affiliations School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan, China, Henan Key Laboratory of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang, Henan, China

  • Hao Cai,

    Roles Formal analysis, Methodology, Visualization, Writing – original draft

    Affiliations School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan, China, Henan Key Laboratory of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang, Henan, China

  • Yan Feng

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliations School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan, China, Henan Key Laboratory of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang, Henan, China

Abstract

Existing Robust Sparse Principal Component Analysis (RSPCA) does not incorporate the two-dimensional spatial structure information of images. To address this issue, we introduce a smooth constraint that characterizes the spatial structure information of images into conventional RSPCA, generating a novel algorithm called Robust Sparse Smooth Principal Component Analysis (RSSPCA). The proposed RSSPCA achieves three key objectives simultaneously: robustness through L1-norm optimization, sparsity for feature selection, and smoothness for preserving spatial relationships. Within the Minorization-Maximization (MM) framework, an iterative process is designed to solve the RSSPCA optimization problem, ensuring that a locally optimal solution is achieved. To evaluate the face reconstruction and recognition performance of the proposed algorithm, we conducted comprehensive experiments on six benchmark face databases. Experimental results demonstrate that incorporating robustness and smoothness improves reconstruction performance, while incorporating sparsity and smoothness improves classification performance. Consequently, the proposed RSSPCA algorithm generally outperforms existing algorithms in face reconstruction and recognition. Additionally, visualization of the generalized eigenfaces provides intuitive insights into how sparse and smooth constraints influence the feature extraction process. The data and source code from this study have been made publicly available on the GitHub repository: https://github.com/yuzhounh/RSSPCA.

1. Introduction

Principal Component Analysis (PCA) [1,2] has been widely applied in dimensionality reduction, signal reconstruction, and pattern classification [3–5]. However, traditional PCA adopts the L2-norm in the objective function, which is easily affected by noise. Applying the L1-norm to the objective function of PCA generates PCA with L1-norm (PCA-L1) [6], which is robust and can effectively reduce the influence of data noise.

Sparsity [7,8] is another important property. Sparse modelling can automatically find relevant features from training data while ignoring irrelevant features. It not only improves the generalization ability of an algorithm but also increases the interpretability of the results. Therefore, sparse modelling has been widely applied in signal processing, machine learning, pattern recognition, and many other fields [9,10]. Traditional PCA cannot extract sparse principal components. To address this issue, L1-norm is applied to the constraint function of traditional PCA, generating the Sparse PCA (SPCA) [11]. Due to the sparsity-promoting property of L1-norm, the principal components extracted by SPCA are sparse.

Inspired by PCA-L1 and SPCA, Robust SPCA (RSPCA) [12] applies L1-norm to both the objective and constraint functions of traditional PCA for simultaneously robust and sparse modelling. However, when processing facial images using PCA and its variants, these images must be reshaped into vectors before further processing, which inevitably leads to the loss of inherent two-dimensional spatial structure information. While Two-dimensional PCA (2DPCA) [13] and its improved variants [14–16] attempt to address this limitation by expressing face images as matrices, they still do not fully capture and utilize the rich spatial structure information present in image data.

Various approaches have been developed to effectively preserve and utilize spatial structure information in image processing tasks. These include texture analysis methods that capture local patterns and regularities [17], graph-based techniques that model relationships between image regions [18,19], geometric approaches that analyze shapes and spatial configurations [20], deep learning architectures specifically designed to maintain spatial correlations [21], etc. These approaches leverage the inherent continuity and gradual variations present in natural images to preserve essential spatial relationships. Among these methods, smoothness-based approaches [22] have demonstrated particular effectiveness in handling data with rich spatial and temporal structural characteristics.

The smooth constraint [23] characterizes the interactions among adjacent features and has achieved remarkable success in brain decoding applications, where preserving spatial and temporal relationships is crucial. It was initially introduced in Electroencephalogram (EEG) decoding [23], where it proved essential for capturing the continuity of brain activity patterns. Subsequently, the combination of sparseness and smoothness was fully investigated by [24]. Its effectiveness led to widespread adoption in various neuroimaging applications, including Magnetoencephalography (MEG) [25], functional Magnetic Resonance Imaging (fMRI) [26–28], and Electrocorticographic (ECoG) [29] decoding. Recently, smoothness was also applied in many other fields, including functional connectivity-based brain region parcellation [30], hyperspectral image classification [31], reconstruction of compressively sensed multichannel EEG signals [32], foreground estimation in neuronal images [33], etc. The success in these applications stems from the constraint’s ability to model and preserve the intrinsic spatial and temporal relationships within the data, which is particularly crucial when dealing with complex, structured information.

The implementation of smoothness is typically achieved through a graph Laplacian matrix [34], which effectively represents the spatial and temporal structure information of data. This approach is rooted in spectral graph theory [35,36], where Laplacian eigenmaps capture the intrinsic geometry of high-dimensional data [37,38]. While traditional dimensionality reduction techniques like Laplacian eigenmaps [39,40] have been widely used in machine learning and pattern recognition, they primarily focus on general data representation rather than explicitly preserving the spatial and temporal relationships between adjacent features.

Building upon these insights and the success of smooth constraints in brain decoding [23–33], we propose to incorporate this constraint into conventional RSPCA, resulting in a novel approach termed Robust Sparse Smooth PCA (RSSPCA). This integration addresses a major limitation of RSPCA, i.e., its inability to account for spatial structure information in images. The proposed RSSPCA achieves three key objectives simultaneously: robustness through L1-norm optimization, sparsity for feature selection, and smoothness for preserving spatial relationships. These combined properties make RSSPCA particularly well-suited for face image processing tasks, where preserving spatial structure is crucial for accurate reconstruction and recognition.

The synergy between sparsity and smoothness in RSSPCA is theoretically justified by their complementary nature [24,41]. While sparsity helps identify the most relevant features and reduces noise, smoothness ensures that the spatial coherence of these features is maintained, leading to more naturalistic and interpretable results. This combination has proven particularly effective in applications where both feature selection and structural preservation are important, such as in brain decoding and image processing tasks.

To validate the effectiveness of RSSPCA, we conducted comprehensive experiments on six benchmark face databases, comparing its performance with four competing algorithms in terms of face reconstruction and recognition accuracy. The results demonstrate that RSSPCA significantly outperforms existing methods, confirming the advantages of incorporating smooth constraints into robust sparse PCA frameworks.

The remainder of this paper is organized as follows. Section 2 reviews traditional PCA and its robust and sparse variants. Section 3 presents our proposed RSSPCA methodology, including the problem formulation, related techniques, and iterative solutions. Section 4 demonstrates the effectiveness of our approach through comprehensive experiments on face reconstruction and recognition. Section 5 discusses the limitations of our approach and suggests directions for future research. Finally, Section 6 concludes the paper.

2. Related works

In this paper, lowercase letters represent scalars, boldface lowercase letters represent column vectors, and boldface uppercase letters represent matrices; $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_F$ represent the L1-norm, L2-norm, and Frobenius norm, respectively; $\mathrm{sgn}(\cdot)$ represents the sign function; $\mathrm{diag}(\cdot)$ represents a square diagonal matrix with the elements of its vector argument on the main diagonal.

This section reviews the traditional PCA and its robust and sparse variants. For the variants of PCA, we focus on finding the first projection vector; multiple projection vectors can then be extracted by implementing a deflation scheme.

2.1 PCA

Let $\mathbf{X}=[\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n]\in\mathbb{R}^{d\times n}$ be $n$ training images, where each row is a feature and each column is an image. The images are assumed to be mean-centered, i.e., $\sum_{i=1}^{n}\mathbf{x}_i=\mathbf{0}$. PCA [1,2] finds the first principal component by solving the following optimization problem:

$\max_{\mathbf{w}} \|\mathbf{X}^{\top}\mathbf{w}\|_2^2 \quad \text{s.t.} \quad \|\mathbf{w}\|_2=1. \qquad (1)$

The projection vector $\mathbf{w}$ can be obtained by conducting an eigendecomposition of the image covariance matrix $\mathbf{X}\mathbf{X}^{\top}$ and preserving the eigenvector with the largest eigenvalue.
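As a minimal sketch of this step (in Python/NumPy for illustration only; the authors' released code is in MATLAB, and the function name here is ours), the first projection vector is the top eigenvector of the scatter matrix:

```python
import numpy as np

def pca_first_pc(X):
    """First principal component of mean-centered data X (features x samples):
    the eigenvector of the scatter matrix X X^T with the largest eigenvalue."""
    C = X @ X.T                        # d x d scatter matrix
    vals, vecs = np.linalg.eigh(C)     # symmetric solver; ascending eigenvalues
    return vecs[:, -1]                 # eigenvector of the largest eigenvalue

# toy data: 3 features, 50 samples, mean-centered
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))
X -= X.mean(axis=1, keepdims=True)
w = pca_first_pc(X)
```

Since the scatter matrix is symmetric, `numpy.linalg.eigh` is the appropriate eigensolver; its eigenvalues come back in ascending order, so the last column is the leading eigenvector.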

2.2 PCA-L1

PCA with L1-norm (PCA-L1) [6] is formulated by replacing the L2-norm in the objective function of PCA with L1-norm. That is, PCA-L1 finds the first projection vector by solving the following optimization problem:

$\max_{\mathbf{w}} \|\mathbf{X}^{\top}\mathbf{w}\|_1 = \sum_{i=1}^{n}\big|\mathbf{w}^{\top}\mathbf{x}_i\big| \quad \text{s.t.} \quad \|\mathbf{w}\|_2=1. \qquad (2)$

The projection vector can be computed by an iterative procedure. Let $t$ be the iteration number and $\mathbf{w}(t)$ be the projection vector at the $t$-th step; then $\mathbf{w}$ is updated by:

$p_i(t) = \mathrm{sgn}\big(\mathbf{w}(t)^{\top}\mathbf{x}_i\big), \quad i=1,\ldots,n, \qquad (3)$

$\mathbf{w}(t+1) = \mathbf{X}\mathbf{p}(t)\,/\,\|\mathbf{X}\mathbf{p}(t)\|_2. \qquad (4)$

By incorporating L1-norm into the objective function of PCA, the resulting PCA-L1 algorithm demonstrates enhanced robustness against the impact of data noise.
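The sign-then-normalize iteration described above can be sketched as follows (a NumPy illustration of the standard PCA-L1 procedure [6]; the initialization and stopping details are our simplifications):

```python
import numpy as np

def pca_l1_first_pc(X, n_iter=100):
    """First projection vector of PCA-L1: maximize sum_i |w^T x_i| over unit
    vectors w via the fixed-point (sign, then normalize) iteration."""
    w = X[:, 0] / np.linalg.norm(X[:, 0])   # simple initialization
    for _ in range(n_iter):
        p = np.sign(X.T @ w)                # polarity p_i = sgn(w^T x_i)
        p[p == 0] = 1                       # convention to avoid zero signs
        w_new = X @ p
        w_new /= np.linalg.norm(w_new)      # w <- X p / ||X p||_2
        if np.allclose(w_new, w):           # converged
            break
        w = w_new
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10))
X -= X.mean(axis=1, keepdims=True)          # mean-center the samples
w = pca_l1_first_pc(X)
```

Each iteration is guaranteed not to decrease the L1 objective, which is what makes the procedure converge to a local maximum.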

2.3 RSPCA

Robust Sparse PCA (RSPCA) [12] is formulated by incorporating the L1-norm into both the objective and constraint functions of PCA. That is, RSPCA finds the first projection vector by solving the following optimization problem:

$\max_{\mathbf{w}} \sum_{i=1}^{n}\big|\mathbf{w}^{\top}\mathbf{x}_i\big| \quad \text{s.t.} \quad \mathbf{w}^{\top}\mathbf{w} + \lambda\|\mathbf{w}\|_1 \le 1, \qquad (5)$

where $\lambda$ is a positive constant. RSPCA has two different iterative solutions. The first solution [12] only relaxes the objective function and then solves the relaxed optimization problem by soft-thresholding [42]. That is, the projection vector is updated by:

$p_i(t) = \mathrm{sgn}\big(\mathbf{w}(t)^{\top}\mathbf{x}_i\big), \quad i=1,\ldots,n, \qquad (6)$

$\mathbf{v}(t) = S_{\rho}\big(\mathbf{X}\mathbf{p}(t)\big), \qquad (7)$

$\mathbf{w}(t+1) = \mathbf{v}(t)/\|\mathbf{v}(t)\|_2, \qquad (8)$

where $S_{\rho}(a)=\mathrm{sgn}(a)\max(|a|-\rho,0)$ is the soft-thresholding operator and $\rho$ is the Lagrange multiplier. The $\mathrm{sgn}$ and $\max$ functions can be applied to a vector in an elementwise manner. Inspired by [14,15], RSPCA can also be solved by simultaneously relaxing the objective and constraint functions. That is, the projection vector is updated by:

$p_i(t) = \mathrm{sgn}\big(\mathbf{w}(t)^{\top}\mathbf{x}_i\big), \quad i=1,\ldots,n, \qquad (9)$

$\mathbf{z}(t) = \mathbf{X}\mathbf{p}(t), \qquad (10)$

$w_j(t+1) = \frac{v_j(t)}{\|\mathbf{v}(t)\|_2}, \quad v_j(t) = \frac{z_j(t)\,|w_j(t)|}{|w_j(t)|+\lambda}, \qquad (11)$

where $w_j$, $z_j$, and $v_j$ are the $j$-th elements of $\mathbf{w}$, $\mathbf{z}$, and $\mathbf{v}$, respectively. The parameter $\lambda$ is a nonnegative scalar that adjusts the relative weight between the sparse constraint and the L2-norm constraint. When $\lambda$ is set to zero, the L1-norm constraint becomes invalid and RSPCA reduces to PCA-L1. When $\lambda$ increases, the sparsity of the projection vector increases. When $\lambda$ is set to positive infinity, the L2-norm constraint becomes invalid; as a result, the projection vector becomes a vector with only one nonzero element that equals one [15,43]. This is why the L2-norm constraint is retained in RSPCA [12] and its 2D counterpart, 2DPCA-L1 with Sparsity (2DPCAL1-S) [14]. Due to the L1-norm in the objective and constraint functions, RSPCA achieves robustness and sparseness simultaneously.
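The soft-thresholding operator used in the first solution has the standard closed form $S_\rho(a)=\mathrm{sgn}(a)\max(|a|-\rho,0)$ [42]; a minimal NumPy version (the function name is ours):

```python
import numpy as np

def soft_threshold(a, rho):
    """Elementwise soft-thresholding: S_rho(a) = sgn(a) * max(|a| - rho, 0).
    Entries with magnitude below rho are zeroed, which promotes sparsity."""
    return np.sign(a) * np.maximum(np.abs(a) - rho, 0.0)

shrunk = soft_threshold(np.array([-3.0, -0.5, 0.0, 0.5, 3.0]), 1.0)
```

Small entries are driven exactly to zero while large entries are shrunk toward zero by `rho`, which is the mechanism that makes the resulting projection vector sparse.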

After obtaining the first $k$ projection vectors $\mathbf{w}_1,\ldots,\mathbf{w}_k$ for PCA, PCA-L1, or RSPCA, the $(k+1)$-th projection vector is calculated likewise on the deflated samples [44]

$\mathbf{X} \leftarrow \big(\mathbf{I}-\mathbf{w}_k\mathbf{w}_k^{\top}\big)\mathbf{X}. \qquad (12)$

By iteratively implementing the deflation procedure, we can extract multiple projection vectors [6,12].
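The deflation step can be sketched as projecting every sample onto the orthogonal complement of the current projection vector (a NumPy illustration; the helper name is ours):

```python
import numpy as np

def deflate(X, w):
    """Remove from every sample (column of X) its component along the unit
    vector w, so the next projection vector is sought in the complement."""
    w = w / np.linalg.norm(w)
    return X - np.outer(w, w @ X)      # X <- (I - w w^T) X

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
w = rng.standard_normal(5)
Xd = deflate(X, w)
```

After deflation, every column of the data is orthogonal to `w`, so re-running the same extraction procedure yields a new, different projection vector.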

3. Proposed methodology

This section presents our proposed Robust Sparse Smooth PCA (RSSPCA) algorithm. We first formulate the optimization problem and define the objective function. Then, we introduce the Laplacian matrix and its associated smoothness penalty term to capture spatial relationships. Next, we elaborate on the Minorization-Maximization (MM) framework that underpins our solution approach, followed by two essential inequalities that facilitate our theoretical derivation. Finally, based on the MM framework and the introduced inequalities, we develop a complete iterative solution to the RSSPCA optimization problem.

3.1 Problem formulation

Based on the robust and sparse PCA algorithms presented above, we propose RSSPCA by incorporating a smooth constraint. That is, RSSPCA finds its first projection vector by solving the following optimization problem:

$\max_{\mathbf{w}} \sum_{i=1}^{n}\big|\mathbf{w}^{\top}\mathbf{x}_i\big| \quad \text{s.t.} \quad \mathbf{w}^{\top}\mathbf{w}\le 1,\ \ \|\mathbf{w}\|_1\le c,\ \ \mathbf{w}^{\top}\mathbf{L}\mathbf{w}\le s, \qquad (13)$

where $c$ and $s$ are positive constants, and $\mathbf{L}\in\mathbb{R}^{d\times d}$ is a Laplacian matrix representing the two-dimensional spatial structure information of images.

The optimization problem of RSSPCA has four parts. The first part is the objective function. The L1-norm in the objective function makes RSSPCA robust against noise. The second part is the L2-norm constraint. It is reserved for the same reason as in RSPCA [12] and 2DPCAL1-S [14]. The third part is the sparse constraint, which makes the projection vector sparse. The fourth part is the smooth constraint, which makes the projection vector spatially smooth. Therefore, RSSPCA is modelled to be robust, sparse, and smooth at the same time. It is anticipated to offer superior performance in face reconstruction and recognition tasks when compared to PCA and its variants.

3.2 Laplacian matrix

The Laplacian Eigenmaps approach [35,36] was initially proposed to capture the intrinsic geometric structure of low-dimensional manifolds. However, in the optimization problem of RSSPCA, we employ the Laplacian matrix to represent the two-dimensional spatial structure information inherent in images and subsequently construct the smoothness constraint. The definition of the Laplacian matrix is elaborated as follows. Suppose there are two pixels $i$ and $j$ on an image, whose coordinates on the two-dimensional plane are $(a_i, b_i)$ and $(a_j, b_j)$, respectively, as illustrated in Fig 1.

Fig 1. An illustration of the coordinates of two pixels in an image.

https://doi.org/10.1371/journal.pone.0323281.g001

The Euclidean distance between the two pixels is

$d_{ij} = \sqrt{(a_i-a_j)^2 + (b_i-b_j)^2}. \qquad (14)$

Define an adjacency matrix $\mathbf{A}$ whose elements are

$A_{ij} = \begin{cases} 1, & d_{ij}=1, \\ 0, & \text{otherwise}, \end{cases} \qquad (15)$

$i,j=1,\ldots,d$. That is, the corresponding element in the adjacency matrix is one if and only if the two pixels are adjacent on the two-dimensional plane; otherwise it is zero. The degree matrix $\mathbf{D}$ is defined based on the adjacency matrix as:

$\mathbf{D} = \mathrm{diag}(\mathbf{A}\mathbf{1}_d), \qquad (16)$

where $\mathbf{1}_d$ represents a $d$-dimensional column vector with all elements equal to one. The degree matrix $\mathbf{D}$ is a diagonal matrix; each element on its main diagonal equals the number of pixels adjacent to the current pixel. The Laplacian matrix is then calculated from the adjacency matrix and the degree matrix as

$\mathbf{L} = \mathbf{D} - \mathbf{A}. \qquad (17)$

Fig 2 shows examples of the adjacency matrix, degree matrix, and Laplacian matrix constructed by reshaping a $6\times 6$ image into a vector.

Fig 2. An illustration of the adjacency matrix, degree matrix, and Laplacian matrix.

https://doi.org/10.1371/journal.pone.0323281.g002

According to the definition of the Laplacian matrix, we have

$\mathbf{w}^{\top}\mathbf{L}\mathbf{w} = \frac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{d} A_{ij}\,(w_i-w_j)^2. \qquad (18)$

For two adjacent pixels $i$ and $j$, $A_{ij}$ equals one; otherwise, $A_{ij}$ equals zero. By incorporating this constraint, the difference between $w_i$ and $w_j$ is penalized; that is, the weights corresponding to two adjacent pixels will be close. Therefore, this constraint achieves a smoothing effect on the projection vector.
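The construction of the grid Laplacian and its smoothing quadratic form can be illustrated as follows (a Python/NumPy sketch with names of our choosing; we assume 4-connectivity, i.e., only horizontal and vertical neighbours count as adjacent):

```python
import numpy as np

def grid_laplacian(h, w):
    """Graph Laplacian L = D - A for the pixels of an h-by-w image, where two
    pixels are adjacent iff they are horizontal or vertical neighbours."""
    d = h * w
    A = np.zeros((d, d))
    for i in range(h):
        for j in range(w):
            k = i * w + j                  # pixel index after reshaping to a vector
            if j + 1 < w:                  # right neighbour
                A[k, k + 1] = A[k + 1, k] = 1
            if i + 1 < h:                  # bottom neighbour
                A[k, k + w] = A[k + w, k] = 1
    D = np.diag(A.sum(axis=1))             # degree matrix
    return D - A

L = grid_laplacian(3, 3)
v = np.arange(9.0)                         # a "projection vector" on a 3x3 grid
quad = v @ L @ v                           # smoothness penalty of v
```

The quadratic form computed here equals the sum of squared differences over all adjacent pixel pairs, which is exactly the quantity the smooth constraint penalizes.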

3.3 Minorization-Maximization framework

The optimization problem of RSSPCA is challenging to solve due to the presence of the L1-norm. This paper adopts the Minorization-Maximization (MM) framework [45] to address this issue. Suppose $f(\mathbf{w})$ is the objective function to be maximized. Within the MM framework, if there exists a surrogate function $g(\mathbf{w}\,|\,\mathbf{w}(t))$ that satisfies the following two key conditions:

$g(\mathbf{w}\,|\,\mathbf{w}(t)) \le f(\mathbf{w}) \quad \text{for all } \mathbf{w}, \qquad (19)$

$g(\mathbf{w}(t)\,|\,\mathbf{w}(t)) = f(\mathbf{w}(t)), \qquad (20)$

the original objective function can be optimized by iteratively maximizing the surrogate function as follows:

$\mathbf{w}(t+1) = \arg\max_{\mathbf{w}}\, g(\mathbf{w}\,|\,\mathbf{w}(t)). \qquad (21)$

Then we have

$f(\mathbf{w}(t+1)) \ge g(\mathbf{w}(t+1)\,|\,\mathbf{w}(t)) \ge g(\mathbf{w}(t)\,|\,\mathbf{w}(t)) = f(\mathbf{w}(t)). \qquad (22)$

The first inequality holds because $f(\mathbf{w})-g(\mathbf{w}\,|\,\mathbf{w}(t))$ reaches its minimum value of zero at $\mathbf{w}(t)$ according to the two key conditions. The second inequality holds because $g(\mathbf{w}\,|\,\mathbf{w}(t))$ reaches its maximum value at $\mathbf{w}(t+1)$ according to the update rule. Therefore, the objective function monotonically increases during the iterative process and converges to a local optimum. By finding a surrogate function that is easy to handle, we can solve the optimization problem of RSSPCA within the MM framework.

3.4 Inequality

The surrogate function is typically formulated by introducing inequalities. Below are two inequalities that will be used to solve RSSPCA. Let $\mathbf{u}, \mathbf{v}, \mathbf{x} \in \mathbb{R}^{d}$; the following two inequalities

$\big|\mathbf{u}^{\top}\mathbf{x}\big| \ge \mathrm{sgn}\big(\mathbf{v}^{\top}\mathbf{x}\big)\,\mathbf{u}^{\top}\mathbf{x}, \qquad (23)$

$\|\mathbf{u}\|_1 \le \sum_{j=1}^{d}\frac{u_j^{2}}{2|v_j|} + \frac{\|\mathbf{v}\|_1}{2} \qquad (24)$

hold, and the inequalities become equalities when $\mathbf{u}=\mathbf{v}$. The proofs of the two inequalities can be found in Wang [15]. Note that Equation (24) requires that $\mathbf{v}$ must not contain zero elements.
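Both bounds are easy to check numerically. The snippet below verifies the standard forms of these MM bounds (our rendering for illustration; the exact statements and proofs are in Wang [15]): a linear minorizer of the absolute value and a quadratic majorizer of the L1-norm:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(6)     # an arbitrary point
v = rng.standard_normal(6)     # the previous iterate; assumed free of zeros
x = rng.standard_normal(6)

# Linear minorizer of the absolute value:
# |u^T x| >= sgn(v^T x) * (u^T x), with equality when u = v.
lin_lhs = np.abs(u @ x)
lin_rhs = np.sign(v @ x) * (u @ x)

# Quadratic majorizer of the L1-norm:
# ||u||_1 <= sum_j u_j^2 / (2|v_j|) + ||v||_1 / 2, with equality when u = v.
l1_lhs = np.abs(u).sum()
l1_rhs = (u**2 / (2 * np.abs(v))).sum() + np.abs(v).sum() / 2
```

At `u = v` the quadratic bound collapses to an equality, since each term $v_j^2/(2|v_j|)$ reduces to $|v_j|/2$.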

3.5 Solution

Based on the MM framework and the introduced inequalities, we can develop an iterative solution to the RSSPCA optimization problem. Maximizing the optimization problem of RSSPCA is equivalent to maximizing the following Lagrangian:

$f(\mathbf{w}) = \sum_{i=1}^{n}\big|\mathbf{w}^{\top}\mathbf{x}_i\big| - \beta_1\big(\mathbf{w}^{\top}\mathbf{w}-1\big) - \beta_2\big(\|\mathbf{w}\|_1-c\big) - \beta_3\big(\mathbf{w}^{\top}\mathbf{L}\mathbf{w}-s\big), \qquad (25)$

where $\beta_1$, $\beta_2$, and $\beta_3$ are three Lagrangian multipliers satisfying $\beta_1>0$, $\beta_2\ge 0$, and $\beta_3\ge 0$. Denote the Lagrangian as $f(\mathbf{w})$. According to Equations (23) and (24),

$f(\mathbf{w}) \ge \sum_{i=1}^{n}\mathrm{sgn}\big(\mathbf{w}(t)^{\top}\mathbf{x}_i\big)\,\mathbf{w}^{\top}\mathbf{x}_i - \beta_1\,\mathbf{w}^{\top}\mathbf{w} - \beta_2\sum_{j=1}^{d}\frac{w_j^{2}}{2|w_j(t)|} - \beta_3\,\mathbf{w}^{\top}\mathbf{L}\mathbf{w} + \mathrm{const} \qquad (26)$

holds, and the inequality becomes an equality when $\mathbf{w}=\mathbf{w}(t)$. Denote the relaxed function as $g(\mathbf{w}\,|\,\mathbf{w}(t))$. We have $g(\mathbf{w}\,|\,\mathbf{w}(t))\le f(\mathbf{w})$ for all $\mathbf{w}$ and $g(\mathbf{w}(t)\,|\,\mathbf{w}(t))=f(\mathbf{w}(t))$, satisfying the two key conditions of the MM framework. Therefore, $g(\mathbf{w}\,|\,\mathbf{w}(t))$ is a feasible surrogate function of the Lagrangian $f(\mathbf{w})$. Within the MM framework, maximizing $f(\mathbf{w})$ can be turned into iteratively maximizing the surrogate function as follows:

$\mathbf{w}(t+1) = \arg\max_{\mathbf{w}}\, g(\mathbf{w}\,|\,\mathbf{w}(t)). \qquad (27)$

This is a quadratic optimization problem with respect to $\mathbf{w}$. Its solution is

$\mathbf{w}(t+1) = \big(2\beta_1\mathbf{I} + \beta_2\mathbf{D}(t) + 2\beta_3\mathbf{L}\big)^{-1}\mathbf{z}(t), \qquad (28)$

where $\mathbf{z}(t)=\mathbf{X}\mathbf{p}(t)$ with $p_i(t)=\mathrm{sgn}\big(\mathbf{w}(t)^{\top}\mathbf{x}_i\big)$, and $\mathbf{D}(t)=\mathrm{diag}\big(1/|w_1(t)|,\ldots,1/|w_d(t)|\big)$. Considering that $\|\mathbf{w}(t+1)\|_2=1$, we have

$\mathbf{w}(t+1) = \frac{\big(\mathbf{I} + \frac{\beta_2}{2\beta_1}\mathbf{D}(t) + \frac{\beta_3}{\beta_1}\mathbf{L}\big)^{-1}\mathbf{z}(t)}{\big\|\big(\mathbf{I} + \frac{\beta_2}{2\beta_1}\mathbf{D}(t) + \frac{\beta_3}{\beta_1}\mathbf{L}\big)^{-1}\mathbf{z}(t)\big\|_2}. \qquad (29)$

Let $\lambda=\beta_2/(2\beta_1)$ and $\gamma=\beta_3/\beta_1$; then $\lambda\ge 0$ and $\gamma\ge 0$. Equation (29) can be rewritten as:

$\mathbf{w}(t+1) = \frac{\big(\mathbf{I} + \lambda\mathbf{D}(t) + \gamma\mathbf{L}\big)^{-1}\mathbf{z}(t)}{\big\|\big(\mathbf{I} + \lambda\mathbf{D}(t) + \gamma\mathbf{L}\big)^{-1}\mathbf{z}(t)\big\|_2}. \qquad (30)$

Therefore, only two parameters, $\lambda$ and $\gamma$, need to be tuned in RSSPCA. The parameter $\lambda$ tunes the relative weight between the sparse constraint and the L2-norm constraint, and the parameter $\gamma$ tunes the relative weight between the smooth constraint and the L2-norm constraint. In summary, the two parameters adjust the relative weights of the three constraints in the optimization problem of RSSPCA.

Considering that $\mathbf{w}(t)$ might be a sparse vector, calculating $\mathbf{D}(t)$ could encounter division-by-zero errors. To avoid this problem, let $\mathbf{W}(t)=\mathrm{diag}\big(|w_1(t)|,\ldots,|w_d(t)|\big)$ and rewrite Equation (30) as follows:

$\mathbf{w}(t+1) = \frac{\big(\mathbf{W}(t) + \lambda\mathbf{I} + \gamma\mathbf{W}(t)\mathbf{L}\big)^{-1}\mathbf{W}(t)\,\mathbf{z}(t)}{\big\|\big(\mathbf{W}(t) + \lambda\mathbf{I} + \gamma\mathbf{W}(t)\mathbf{L}\big)^{-1}\mathbf{W}(t)\,\mathbf{z}(t)\big\|_2}. \qquad (31)$

This update rule no longer requires that there be no zero elements in $\mathbf{w}(t)$. Equation (31) can be reformulated in the following two-step form:

$\mathbf{v}(t) = \big(\mathbf{W}(t) + \lambda\mathbf{I} + \gamma\mathbf{W}(t)\mathbf{L}\big)^{-1}\mathbf{W}(t)\,\mathbf{z}(t), \qquad (32)$

$\mathbf{w}(t+1) = \mathbf{v}(t)/\|\mathbf{v}(t)\|_2. \qquad (33)$

Update $\mathbf{w}$ iteratively until convergence; the result is the first projection vector of RSSPCA. Then, multiple projection vectors can be extracted likewise by iteratively implementing the deflation strategy [44]. The algorithm procedure of RSSPCA is listed in Algorithm 1.
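For intuition, one schematic realization of such a surrogate-maximization iteration can be sketched as follows (an illustrative NumPy sketch under our own assumptions, not the authors' exact update; the function name and the small regularization constant `eps` are ours):

```python
import numpy as np

def rsspca_step(X, w, lam, gam, L, eps=1e-12):
    """One schematic surrogate-maximization step: a sign step as in PCA-L1,
    then a linear solve trading off the L2 term, an adaptive diagonal built
    from |w| (the relaxed sparse term), and the Laplacian (the smooth term),
    followed by renormalization onto the unit sphere."""
    p = np.sign(X.T @ w)
    p[p == 0] = 1
    z = X @ p                              # robust target direction
    Wt = np.diag(np.abs(w))                # adaptive weights, avoids 1/|w_j|
    d = X.shape[0]
    M = Wt + lam * np.eye(d) + gam * Wt @ L + eps * np.eye(d)
    v = np.linalg.solve(M, Wt @ z)
    n = np.linalg.norm(v)
    return v / n if n > 0 else w

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))
X -= X.mean(axis=1, keepdims=True)
# Laplacian of a 2x2 pixel grid (4-connectivity)
L = np.array([[ 2., -1., -1.,  0.],
              [-1.,  2.,  0., -1.],
              [-1.,  0.,  2., -1.],
              [ 0., -1., -1.,  2.]])
w = X[:, 0] / np.linalg.norm(X[:, 0])
for _ in range(20):
    w = rsspca_step(X, w, lam=0.1, gam=0.1, L=L)
```

Multiplying the system through by the diagonal of $|w_j|$ is what sidesteps division by zero for sparse iterates: entries that are already zero simply stay zero.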

Algorithm 1. The algorithm procedure of RSSPCA.

Input: training samples $\mathbf{X}$, number of projection vectors $r$, parameters $\lambda$ and $\gamma$.

Output: projection vectors $\mathbf{w}_1,\ldots,\mathbf{w}_r$.

Construct the Laplacian matrix $\mathbf{L}$.

for $k=1,\ldots,r$

   Initialize the iteration counter $t=0$, the maximum iteration number $t_{\max}$, and the tolerance $\varepsilon$.

   Initialize $\mathbf{w}(0)$ by the $k$-th principal component.

   while $t<t_{\max}$ and not converged (e.g., $\|\mathbf{w}(t)-\mathbf{w}(t-1)\|_2>\varepsilon$)

      $p_i(t)=\mathrm{sgn}\big(\mathbf{w}(t)^{\top}\mathbf{x}_i\big)$, $i=1,\ldots,n$.

      $\mathbf{z}(t)=\mathbf{X}\mathbf{p}(t)$.

      $\mathbf{W}(t)=\mathrm{diag}\big(|w_1(t)|,\ldots,|w_d(t)|\big)$.

      $\mathbf{v}(t)=\big(\mathbf{W}(t)+\lambda\mathbf{I}+\gamma\mathbf{W}(t)\mathbf{L}\big)^{-1}\mathbf{W}(t)\,\mathbf{z}(t)$.

      $\mathbf{w}(t+1)=\mathbf{v}(t)/\|\mathbf{v}(t)\|_2$.

      $t=t+1$.

   end while

   $\mathbf{w}_k=\mathbf{w}(t)$.

   $\mathbf{X}=\big(\mathbf{I}-\mathbf{w}_k\mathbf{w}_k^{\top}\big)\mathbf{X}$.

end for

From the optimization problem and the update rule of RSSPCA, it can be inferred that RSSPCA is a generalization of PCA-L1 and RSPCA. When $\lambda=0$ and $\gamma=0$, RSSPCA reduces to PCA-L1. When $\gamma=0$, RSSPCA reduces to RSPCA. When $\lambda=0$, RSSPCA reduces to Robust Smooth PCA (RSMPCA).

4. Experiments

This section presents comprehensive experimental evaluations of our proposed RSSPCA algorithm. We first assessed the reconstruction and recognition performance of RSSPCA against four competing algorithms in face reconstruction and recognition tasks. Subsequently, we visualized the projection vectors computed by different algorithms to provide intuitive insights into their underlying characteristics. Finally, we validated our approach on five additional publicly available benchmark face databases to demonstrate its robustness and generalizability across diverse datasets. The flowchart of the experiments is shown in Fig 3.

Fig 3. The flowchart of the experiments.

The training of the projection matrix can be performed using the proposed RSSPCA method or any other competing algorithms.

https://doi.org/10.1371/journal.pone.0323281.g003

Our scripts were written in MATLAB and are publicly available at https://github.com/yuzhounh/RSSPCA. The experiments were conducted on four workstations, each equipped with dual 20-core 2.20 GHz Intel(R) Xeon(R) processors and 256 GB of memory. To minimize total computational time, we ran approximately 40 MATLAB sessions in parallel on each workstation. A parallel computing version of the code is available at https://github.com/yuzhounh/RSSPCA_parallel.

4.1 Face reconstruction

We first conducted a face reconstruction experiment to evaluate the reconstruction performance of RSSPCA and the four competing algorithms on the publicly available ORL Face Database [46,47], which was created and distributed by AT&T Laboratories Cambridge for research purposes. Four typical PCA-based algorithms, i.e., PCA [1,2], PCA-L1 [6], RSPCA [12], and RSMPCA, were compared with RSSPCA in the experiment.

The ORL face database contains 400 face images from 40 subjects, with 10 images per subject. The images were captured with different facial expressions, rotations, and slight scale variations. The original image size is 112 by 92 pixels; to reduce computational time, we resized the images to 56 by 46. Among the 400 images, 80 were randomly selected and occluded with a rectangular block of salt-and-pepper noise, no smaller than 10 by 10 pixels, placed at a random position.

Let $\mathbf{W}\in\mathbb{R}^{d\times r}$ be the projection matrix trained on the polluted ORL face database, which includes 320 clean images and 80 occluded images. Let $\mathbf{x}_i$, $i=1,\ldots,n_c$, be the clean images after mean-centering, with $n_c=320$. The average reconstruction error is defined as

$E = \frac{1}{n_c}\sum_{i=1}^{n_c}\big\|\mathbf{x}_i - \mathbf{W}\mathbf{W}^{\top}\mathbf{x}_i\big\|_2, \qquad (34)$

which is used to evaluate the reconstruction performance of the five algorithms.
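The error measure can be computed directly (a NumPy sketch; the function name is ours, and we assume the error is the average L2 distance between each clean image and its reconstruction, as defined above):

```python
import numpy as np

def avg_reconstruction_error(Xc, W):
    """Average L2 distance between each mean-centered clean image (a column
    of Xc) and its reconstruction from the subspace spanned by W's columns."""
    R = Xc - W @ (W.T @ Xc)            # residual x - W W^T x, per column
    return np.linalg.norm(R, axis=0).mean()

rng = np.random.default_rng(0)
Xc = rng.standard_normal((5, 20))
Xc -= Xc.mean(axis=1, keepdims=True)
Q3, _ = np.linalg.qr(rng.standard_normal((5, 3)))   # orthonormal 3-dim basis
Q5, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # full orthonormal basis
err3 = avg_reconstruction_error(Xc, Q3)
err5 = avg_reconstruction_error(Xc, Q5)
```

With a full orthonormal basis the residual vanishes, while a lower-dimensional subspace leaves a positive error; more projection vectors can only reduce the error, matching the monotone trends reported below.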

Fig 4 shows the reconstruction errors of PCA and PCA-L1 with different numbers of projection vectors. In both cases, the reconstruction error decreases monotonically as the number of projection vectors increases. When the number of projection vectors is greater than 7, the reconstruction error of PCA-L1 is lower than that of PCA. By averaging the reconstruction errors over numbers of projection vectors ranging from 1 to 30, we obtain the overall reconstruction errors of PCA and PCA-L1: 1272.33 and 1203.31, respectively. The results demonstrate that incorporating the L1-norm into the objective function of PCA, i.e., incorporating robustness, can reduce reconstruction errors, consistent with the results in [6,14,15].

Fig 4. Reconstruction errors of PCA and PCA-L1 with different numbers of projection vectors.

https://doi.org/10.1371/journal.pone.0323281.g004

RSPCA is a special case of RSSPCA when $\gamma=0$. Therefore, the projection vectors of RSPCA can be calculated by the update rule of RSSPCA. RSPCA contains only one parameter, $\lambda$, which tunes the relative weight between the sparse constraint and the L2-norm constraint. The parameter $\lambda$ was selected on a logarithmic grid: $\log_{10}\lambda$ was varied from $-3$ to $3$ with a step of 0.2. For each $\lambda$ value, we averaged the reconstruction errors over different numbers of projection vectors to obtain the overall reconstruction error, as shown in Fig 5. The overall reconstruction error generally increases with increasing $\lambda$. The lowest reconstruction error is 1205.41, obtained at the smallest tested $\lambda$, and is very close to the reconstruction error of PCA-L1, which is 1203.31. From this result and the trend in Fig 5, we can infer that the lowest reconstruction error of RSPCA is achieved as $\lambda$ approaches zero, i.e., when RSPCA reduces to PCA-L1. The results demonstrate that incorporating sparsity has little effect on reconstruction, consistent with the results in [14,15].
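The logarithmic parameter grid used throughout these experiments can be generated in one line (a NumPy illustration):

```python
import numpy as np

# parameter values from 10^-3 to 10^3, stepping the base-10 exponent by 0.2
lambdas = 10.0 ** np.arange(-3.0, 3.0 + 1e-9, 0.2)
```

The small slack added to the upper bound guards against floating-point rounding so that the endpoint $10^{3}$ is included, giving 31 candidate values.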

Fig 5. Reconstruction errors of RSPCA with different values.

https://doi.org/10.1371/journal.pone.0323281.g005

RSMPCA is a special case of RSSPCA when $\lambda=0$. Similarly, the projection vectors of RSMPCA can be calculated by the update rule of RSSPCA. RSMPCA contains one parameter, $\gamma$, which tunes the relative weight between the smooth constraint and the L2-norm constraint. The parameter $\gamma$ was selected on a logarithmic grid: $\log_{10}\gamma$ was varied from $-3$ to $3$ with a step of 0.2. For each $\gamma$ value, we averaged the reconstruction errors over different numbers of projection vectors to obtain the overall reconstruction error, as shown in Fig 6. For small $\gamma$ values, the reconstruction error was barely affected by $\gamma$; for large $\gamma$ values, the reconstruction error increased significantly with increasing $\gamma$. The lowest reconstruction error is 1152.25, which is lower than the reconstruction error of PCA-L1, indicating that incorporating smoothness improves reconstruction performance.

Fig 6. Reconstruction errors of RSMPCA with different values.

https://doi.org/10.1371/journal.pone.0323281.g006

RSSPCA contains two parameters, $\lambda$ and $\gamma$: $\lambda$ tunes the relative weight between the sparse constraint and the L2-norm constraint, and $\gamma$ tunes the relative weight between the smooth constraint and the L2-norm constraint. For RSSPCA, both $\log_{10}\lambda$ and $\log_{10}\gamma$ were varied from $-3$ to $3$ with a step of 0.2. The average reconstruction errors of RSSPCA under different parameter combinations are shown in Fig 7. The lowest reconstruction error is 1153.35.

Fig 7. Reconstruction errors of RSSPCA with different and values.

https://doi.org/10.1371/journal.pone.0323281.g007

The lowest reconstruction errors and corresponding optimal parameters of the five algorithms are shown in Table 1. Fig 8 shows the reconstruction errors of the five algorithms with different numbers of projection vectors when the optimal parameters are selected.

Table 1. The lowest reconstruction errors and optimal parameters of the five algorithms.

https://doi.org/10.1371/journal.pone.0323281.t001

Fig 8. Reconstruction errors of the five algorithms with different numbers of projection vectors.

The parameters are set to the optimal ones for each algorithm.

https://doi.org/10.1371/journal.pone.0323281.g008

Based on the above results, it is evident that the reconstruction errors of PCA are the largest. PCA-L1 incorporates L1-norm into the objective function of PCA and achieves lower reconstruction errors than PCA, indicating that incorporating robustness is beneficial for reconstruction. RSPCA incorporates the sparse constraint into PCA-L1. The reconstruction errors of PCA-L1 and RSPCA are very close, indicating that incorporating sparsity has little effect on reconstruction. These results are consistent with the results in [6,14,15]. RSMPCA incorporates the smooth constraint into PCA-L1 and achieves lower reconstruction errors than PCA-L1, indicating that incorporating smoothness is beneficial for reconstruction. RSSPCA simultaneously incorporates the sparse constraint and the smooth constraint into PCA-L1. It is not surprising that RSSPCA achieves lower reconstruction errors than PCA-L1.

In addition, RSSPCA can be generated by incorporating the smooth constraint into RSPCA. With this adaptation, RSSPCA achieves lower reconstruction errors than RSPCA, further demonstrating that incorporating smoothness is beneficial for reconstruction. From another perspective, RSSPCA can be generated by incorporating the sparse constraint into RSMPCA; however, the reconstruction errors of RSSPCA and RSMPCA are very close, further demonstrating that incorporating sparsity has little effect on reconstruction.

In summary, incorporating robustness and smoothness reduces reconstruction errors, while incorporating sparsity has little effect on reconstruction. These results demonstrate the superiority of RSSPCA over PCA, PCA-L1, and RSPCA in face reconstruction. When comparing RSSPCA and RSMPCA, they have similar reconstruction performance.

4.2 Face recognition

Next, we conducted a face recognition experiment to evaluate the classification performance of RSSPCA and the four competing algorithms. For each subject in the ORL face database, we randomly selected seven images for training and used the remaining three for testing. The training images were normalized by z-score so that each feature was centered to have a mean of zero and scaled to have a standard deviation of one; the testing images were then normalized with the same parameters. After that, we applied the five algorithms to extract the first 30 projection vectors and used them to reduce the dimensionality of the normalized data. Finally, the Nearest Neighbor (NN) classifier was applied to perform classification. The above procedure was repeated three times, and the average classification accuracy was calculated to evaluate the classification performance of the five algorithms.
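The evaluation pipeline, z-scoring with training statistics, projecting, then 1-NN classification, can be sketched as follows (a NumPy toy example with a hypothetical projection matrix `W`; all names are ours):

```python
import numpy as np

def zscore_fit(Xtr):
    """Per-feature mean and standard deviation estimated on training data
    (columns are samples); reused unchanged to normalize the test data."""
    mu = Xtr.mean(axis=1, keepdims=True)
    sd = Xtr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                  # guard constant features
    return mu, sd

def nn_classify(Ztr, ytr, Zte):
    """Label each test column by its nearest training column (Euclidean)."""
    preds = []
    for j in range(Zte.shape[1]):
        dists = np.linalg.norm(Ztr - Zte[:, [j]], axis=0)
        preds.append(ytr[np.argmin(dists)])
    return np.array(preds)

# toy data: 2 features, two well-separated classes
Xtr = np.array([[0., 0., 10., 10.],
                [0., 1., 10., 11.]])
ytr = np.array([0, 0, 1, 1])
Xte = np.array([[0.2, 9.9],
                [0.4, 10.5]])
mu, sd = zscore_fit(Xtr)
W = np.eye(2)                          # stand-in for a learned projection matrix
preds = nn_classify(W.T @ ((Xtr - mu) / sd), ytr, W.T @ ((Xte - mu) / sd))
```

The key design point is that the test data are normalized with the training statistics, never their own, so that no information leaks from the test set into the representation.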

Figs 9 and 10 show the average classification accuracies of PCA and PCA-L1 with different numbers of projection vectors. The classification accuracy generally increases with the number of projection vectors, and increases only slowly once the number exceeds 10. By averaging the classification accuracies over numbers of projection vectors ranging from 1 to 30, we obtain the overall classification accuracies of PCA and PCA-L1: 0.8556 and 0.8552, respectively. These two results are very close, indicating that incorporating the L1-norm into the objective function of PCA, i.e., incorporating robustness, has little effect on classification performance, consistent with the results in [14,15].

Fig 9. Average classification accuracies of PCA with different numbers of projection vectors.

https://doi.org/10.1371/journal.pone.0323281.g009

Fig 10. Average classification accuracies of PCA-L1 with different numbers of projection vectors.

https://doi.org/10.1371/journal.pone.0323281.g010

For RSPCA, the sparsity parameter was selected on a base-10 logarithmic grid ranging from -6 to 6 with a step of 0.1. For each candidate value, we averaged the classification accuracies over the different numbers of projection vectors to obtain the overall classification accuracy, as shown in Fig 11. The overall classification accuracy is stable when the parameter is sufficiently small or sufficiently large, and generally decreases with increasing parameter value in between. The highest classification accuracy is 0.8695, obtained at the optimal parameter value. RSPCA outperforms PCA-L1, indicating that incorporating sparsity improves classification performance, consistent with the results in [14,15].

Fig 11. Average classification accuracies of RSPCA with different parameter values.

https://doi.org/10.1371/journal.pone.0323281.g011

For RSMPCA, the smoothness parameter was selected on a base-10 logarithmic grid ranging from -2 to 6 with a step of 0.1. The overall classification accuracies of RSMPCA with different parameter values are shown in Fig 12. The overall classification accuracy is stable when the parameter is sufficiently small or sufficiently large, and generally decreases with increasing parameter value in between. The highest classification accuracy is 0.8747, obtained at the optimal parameter value. RSMPCA outperforms PCA-L1, indicating that incorporating smoothness improves classification performance.

Fig 12. Average classification accuracies of RSMPCA with different parameter values.

https://doi.org/10.1371/journal.pone.0323281.g012

For RSSPCA, the two parameters were selected according to the classification results of RSPCA and RSMPCA. Specifically, the base-10 logarithm of the sparsity parameter was selected in the range of -3 to 1 with a step of 0.2, and that of the smoothness parameter was selected in the range of -1 to 3 with a step of 0.2. The step size was set to 0.2 to reduce the number of parameter combinations and thereby shorten the total computation time. The overall classification accuracies of RSSPCA with different parameter combinations are shown in Fig 13. The highest classification accuracy is 0.8810, obtained at the optimal parameter combination.
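The two-parameter search above amounts to an exhaustive grid search on base-10 logarithmic grids; with a step of 0.2 over these ranges, there are 21 × 21 = 441 combinations. The sketch below illustrates this, with `score` as a hypothetical callback that runs one full RSSPCA evaluation and returns the overall classification accuracy for a given parameter pair.

```python
import itertools
import numpy as np

def grid_search(score, log10_l1=(-3, 1), log10_l2=(-1, 3), step=0.2):
    """Exhaustively search two regularization parameters on base-10 log grids.

    score(l1, l2) is a user-supplied function returning the overall
    classification accuracy for one parameter pair."""
    l1_grid = 10.0 ** np.arange(log10_l1[0], log10_l1[1] + 1e-9, step)
    l2_grid = 10.0 ** np.arange(log10_l2[0], log10_l2[1] + 1e-9, step)
    best = (-np.inf, None, None)
    for l1, l2 in itertools.product(l1_grid, l2_grid):
        acc = score(l1, l2)
        if acc > best[0]:
            best = (acc, l1, l2)
    return best  # (best accuracy, best first parameter, best second parameter)
```

The small `1e-9` added to the upper bound guards against floating-point error in `np.arange` dropping the final grid point.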

Fig 13. Average classification accuracies of RSSPCA with different combinations of the two parameter values.

https://doi.org/10.1371/journal.pone.0323281.g013

The highest classification accuracies and corresponding optimal parameters of the five algorithms are shown in Table 2. The classification accuracies of PCA and PCA-L1 are very close, indicating that incorporating the L1-norm into the objective function of PCA, i.e., incorporating robustness, has little effect on classification performance. RSPCA incorporates an L1-norm constraint into PCA-L1 and achieves a higher classification accuracy than PCA-L1, indicating that the sparse constraint is beneficial for classification. These results are consistent with those in [14,15]. RSMPCA incorporates the smooth constraint into PCA-L1 and achieves a higher classification accuracy than PCA-L1, indicating that the smooth constraint is also beneficial for classification. RSSPCA simultaneously incorporates the sparse and smooth constraints into PCA-L1 and achieves the highest classification accuracy among the five algorithms.

Table 2. The highest classification accuracies and optimal parameters of the five algorithms.

https://doi.org/10.1371/journal.pone.0323281.t002

In addition, RSSPCA can be generated by incorporating the sparse constraint into RSMPCA. With this adaptation, RSSPCA achieves a higher classification accuracy than RSMPCA, further demonstrating that incorporating the sparse constraint is beneficial for classification. From another viewpoint, RSSPCA can be generated by incorporating the smooth constraint into RSPCA. With this adaptation, RSSPCA achieves a higher classification accuracy than RSPCA, further demonstrating that incorporating the smooth constraint is also beneficial for classification.

In summary, incorporating robustness has little effect on classification performance, while incorporating either sparsity or smoothness improves classification performance. Therefore, RSSPCA has advantages over the other four algorithms in face recognition.

4.3 Experiments on five additional face databases

To further demonstrate the reconstruction and classification performance of the proposed algorithm, we conducted similar experiments on five additional benchmark face databases, i.e., the AR face database [48], the FEI face database [49], the FERET face database [50], the GT face database [51], and the Yale face database [52]. These datasets are publicly available online and widely distributed for research purposes. The comprehensive details of these face databases, along with their corresponding experimental results, are fully documented in S1 Text within the Supporting Information section.

Table 3 summarizes the lowest reconstruction error and highest classification accuracy of the five algorithms on these face databases. It also includes the corresponding optimal parameters and running time (in seconds) required to compute the projection matrix for each experiment. Additionally, results of the ORL face database are included to provide a comprehensive comparison.

Table 3. The reconstruction error and classification accuracy of the five algorithms on the six face databases.

https://doi.org/10.1371/journal.pone.0323281.t003

Overall, the running time of PCA and PCA-L1 is much shorter than that of RSPCA, RSMPCA, and RSSPCA. This is because the projection vector in the latter three algorithms is iteratively updated by Equations (32) and (33), which is a time-consuming process.

In the face reconstruction experiment, the results on the AR and FEI face databases differ from those on the other four face databases. On the AR face database, PCA obtains the lowest reconstruction error among the five competing algorithms. However, on the other five face databases, the lowest reconstruction error is obtained by RSMPCA or RSSPCA. This discrepancy may be attributed to the intrinsic properties of the AR face database. On the FEI face database, the reconstruction errors obtained by PCA-L1, RSMPCA, and RSSPCA are lower than those obtained by PCA and RSPCA. This result is slightly different from those obtained on the other face databases.

On the FERET, GT, and Yale face databases, RSMPCA or RSSPCA obtains the lowest reconstruction errors, while PCA obtains the highest reconstruction errors. Furthermore, the reconstruction errors of PCA-L1 and RSPCA are close, as are those of RSMPCA and RSSPCA. These findings are consistent with those obtained on the ORL face database. In summary, except for the results on the AR and FEI face databases, the results on the other four face databases demonstrate that incorporating robustness and smoothness improves reconstruction performance, while incorporating sparsity has little effect on reconstruction performance.

In the face recognition experiment, except for the results on the AR face database, the lowest classification accuracies are obtained by PCA or PCA-L1, while the highest classification accuracies are obtained by RSMPCA or RSSPCA. On the AR face database, the classification accuracy of RSMPCA is much lower than that of RSSPCA. Again, this may be attributed to the intrinsic property of the AR face database. In general, incorporating sparsity and smoothness improves classification performance, while incorporating robustness has little effect on classification performance.

In summary, the incorporation of robustness and smoothness enhances reconstruction performance, and the incorporation of sparsity and smoothness enhances classification performance. Table 4 summarizes the impact of the three factors, i.e., robustness, sparsity, and smoothness, on reconstruction and classification performance.

Table 4. The impact of the three factors on reconstruction and classification performance.

https://doi.org/10.1371/journal.pone.0323281.t004

5. Discussion

This study has three major limitations, as detailed below.

First, identifying the optimal parameters for RSSPCA remains an unresolved issue. This challenge is not unique to RSSPCA but is also present in other parameter-dependent algorithms, such as RSPCA [12,14] and RSMPCA. An exception exists when RSPCA is used for face reconstruction. As indicated in Table 3, RSPCA achieves the lowest reconstruction error across all face databases at the parameter setting under which it approximates PCA-L1, which aligns with the findings in [14,15] demonstrating that incorporating sparsity does not affect reconstruction. However, outside of this specific case, determining the optimal parameters for RSPCA, RSMPCA, and RSSPCA is challenging because their results vary across different face databases.

Second, while some general conclusions can be drawn from the experimental results, they do not fully align with our expectations. Notably, on the AR face database, PCA achieves the lowest reconstruction error, and the classification accuracy of RSMPCA is much lower than that of RSSPCA. These outcomes are not observed in the other five databases. This discrepancy suggests that the AR face database has intrinsic properties influencing the results.

Third, the running time of RSPCA, RSMPCA, and RSSPCA is much longer than that of PCA and PCA-L1. This is due to the time-consuming update rule used by the first three algorithms, as specified in Equations (32) and (33). The running time of RSPCA can be reduced by using either of the two update rules in Equations (6) and (11). However, for RSMPCA and RSSPCA, no acceleration strategies have been identified yet due to the incorporation of the smooth constraint.

Potential improvements for the current study are outlined as follows.

First, the smooth constraint in this paper only considers the relationships between spatially adjacent pixels. Extending it to a more general form [41,53] that additionally considers the relations between spatially distant pixels can make full use of the two-dimensional spatial structure information of images.
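As a concrete illustration of a smoothness penalty restricted to spatially adjacent pixels, the sketch below builds the 4-neighbor grid-graph Laplacian L = D - A, a standard construction in the Laplacian literature [34-36]. This is an illustrative example of the adjacent-pixel idea, not necessarily the exact constraint used in RSSPCA.

```python
import numpy as np

def grid_laplacian(h, w):
    """Graph Laplacian L = D - A of the h-by-w pixel grid with 4-neighbor
    adjacency. For a projection vector v (flattened row-major image),
    v @ L @ v equals the sum of squared differences between spatially
    adjacent entries, so smaller values mean a smoother vector."""
    n = h * w
    A = np.zeros((n, n))
    for i in range(h):
        for j in range(w):
            k = i * w + j
            if i + 1 < h:                        # vertical neighbor below
                A[k, k + w] = A[k + w, k] = 1
            if j + 1 < w:                        # horizontal neighbor to the right
                A[k, k + 1] = A[k + 1, k] = 1
    return np.diag(A.sum(axis=1)) - A

def smoothness_penalty(vec, h, w):
    """Quadratic non-smoothness measure of a flattened h-by-w image vector."""
    L = grid_laplacian(h, w)
    return float(vec @ L @ vec)
```

A constant vector incurs zero penalty, while sharp jumps between neighboring pixels are penalized quadratically; a more general form would add weighted edges between spatially distant pixels as well.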

Second, the robustness and sparsity in this paper are achieved by incorporating the L1-norm into the objective function and the constraint function, respectively, of the RSSPCA optimization problem. They can be further enhanced by replacing the L1-norm with a more general Lp-norm [15,43].
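Schematically, and following the generalizations in [15,43], the Lp replacement uses

```latex
\|\mathbf{w}\|_p = \Big(\sum_i |w_i|^p\Big)^{1/p}, \qquad 0 < p \le 2,
```

so that an L1 robustness term of the form $\sum_i |\mathbf{w}^{\top}\mathbf{x}_i|$ becomes $\sum_i |\mathbf{w}^{\top}\mathbf{x}_i|^p$, and the L1 sparsity constraint becomes an $L_p$ budget on $\mathbf{w}$. The symbols here are illustrative; the exact formulation would follow the notation of the RSSPCA optimization problem in Section 3.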

Third, this study only compares five algorithms, i.e., PCA, PCA-L1, RSPCA, RSMPCA, and RSSPCA. However, by incorporating sparsity or smoothness into traditional PCA, we can construct three additional algorithms: Sparse PCA [54], Smooth PCA, and Sparse Smooth PCA. These algorithms can be solved similarly within the MM framework. This paper focuses on improving RSPCA by incorporating the smooth constraint. Therefore, the three non-robust algorithms are not investigated.

As for application of RSSPCA, while this paper focuses exclusively on face reconstruction and recognition, the algorithm is versatile and can be extended to analyze various types of data. For example, RSSPCA can be employed to analyze one-dimensional time series data [55], such as stock prices, heart rate recordings, daily electricity usage, hourly traffic volume, etc. In these cases, the smooth constraint can capture the one-dimensional temporal structure information of the data. For two-dimensional images other than face images, RSSPCA is undoubtedly applicable. Additionally, for high-dimensional data like EEG, MEG, and fMRI [23,25–29], the smooth constraint in RSSPCA can capture both spatial and temporal structure information within these datasets. In conclusion, RSSPCA shows great promise for reconstructing or classifying data that contains spatial or temporal structural information.

6. Conclusion

This paper proposes a new algorithm, termed RSSPCA (Robust Sparse Smooth Principal Component Analysis), which enhances RSPCA by incorporating a smooth constraint that captures the two-dimensional spatial structure information of images. An iterative optimization procedure is designed within the Majorization-Minimization (MM) framework to solve the RSSPCA optimization problem.

We evaluate RSSPCA’s performance against four existing algorithms: PCA, PCA-L1, RSPCA, and RSMPCA. Experimental results reveal distinct effects of different constraints: incorporating sparsity has minimal impact on reconstruction performance, while robustness and smoothness contribute to improved reconstruction accuracy. Conversely, robustness shows limited influence in classification tasks, while sparsity and smoothness significantly enhance classification performance.

The proposed RSSPCA algorithm demonstrates clear advantages over existing methods in both face reconstruction and recognition by simultaneously integrating three key properties: robustness, sparsity, and smoothness. Visualization of the generalized eigenfaces clearly illustrates how these constraints influence feature extraction: sparsity enables selective feature identification, while smoothness preserves spatial relationships among facial components. Given these promising results, RSSPCA shows great potential for analyzing data with rich spatial or temporal structural information.

Supporting information

S1 Text. Experiments on five additional face databases.

https://doi.org/10.1371/journal.pone.0323281.s001

(PDF)

References

1. Jolliffe I. A 50-year personal journey through time with principal component analysis. J Multivar Anal. 2022;188:104820.
2. Bharadiya JP. A tutorial on principal component analysis for dimensionality reduction in machine learning. Int J Innov Sci Res Technol. 2023;8(5):2028–32.
3. Wu RMX, Zhang Z, Yan W, Fan J, Gou J, Liu B, et al. A comparative analysis of the principal component analysis and entropy weight methods to establish the indexing measurement. PLoS One. 2022;17(1):e0262261. pmid:35085274
4. Torell F. Evaluation of stretch reflex synergies in the upper limb using principal component analysis (PCA). PLoS One. 2023;18(10):e0292807. pmid:37824570
5. Ahmed W, Ansari S, Hanif M, Khalil A. PCA driven mixed filter pruning for efficient convNets. PLoS One. 2022;17(1):e0262386. pmid:35073373
6. Kwak N. Principal component analysis based on L1-norm maximization. IEEE Trans Pattern Anal Mach Intell. 2008;30(9):1672–80. pmid:18617723
7. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B: Stat Methodol. 1996;58(1):267–88.
8. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y. Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell. 2009;31(2):210–27. pmid:19110489
9. Zhang Z, Xu Y, Yang J, Li X, Zhang D. A survey of sparse representation: algorithms and applications. IEEE Access. 2015;3:490–530.
10. Crespo Marques E, Maciel N, Naviner L, Cai H, Yang J. A review of sparse recovery algorithms. IEEE Access. 2019;7:1300–22.
11. Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. J Comput Graph Stat. 2006;15(2):265–86.
12. Meng D, Zhao Q, Xu Z. Improve robustness of sparse PCA by L1-norm maximization. Pattern Recognit. 2012;45(1):487–97.
13. Yang J, Zhang D, Frangi AF, Yang J. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans Pattern Anal Mach Intell. 2004;26(1):131–7. pmid:15382693
14. Wang H, Wang J. 2DPCA with L1-norm for simultaneously robust and sparse modelling. Neural Netw. 2013;46:190–8. pmid:23800536
15. Wang J. Generalized 2-D principal component analysis by Lp-norm for image analysis. IEEE Trans Cybern. 2016;46(3):792–803. pmid:25898326
16. Wang J, Zhao M, Xie X, Zhang L, Zhu W. Fusion of bilateral 2DPCA information for image reconstruction and recognition. Appl Sci. 2022;12(24):12913.
17. Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst, Man, Cybern. 1973;SMC-3(6):610–21.
18. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell. 2000;22(8):888–905.
19. Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans Pattern Anal Mach Intell. 2012;34(11):2274–82. pmid:22641706
20. Chan TF, Vese LA. Active contours without edges. IEEE Trans Image Process. 2001;10(2):266–77. pmid:18249617
21. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, et al. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989;1(4):541–51.
22. Rasmussen CE. Gaussian processes in machine learning. Summer school on machine learning. Springer; 2003. p. 63–71.
23. Dyrholm M, Parra LC, editors. Smooth bilinear classification of EEG. International Conference of the IEEE Engineering in Medicine and Biology Society; 2006.
24. Hebiri M, van de Geer S. The smooth-lasso and other ℓ1+ℓ2-penalized methods. Electron J Stat. 2011;5:1184–226.
25. de Brecht M, Yamagishi N. Combining sparseness and smoothness improves classification accuracy and interpretability. Neuroimage. 2012;60(2):1550–61. pmid:22261376
26. Grosenick L, Klingenberg B, Katovich K, Knutson B, Taylor JE. Interpretable whole-brain prediction analysis with GraphNet. Neuroimage. 2013;72:304–21. pmid:23298747
27. Sun Z, Qiao Y, Lelieveldt BPF, Staring M, Alzheimer’s Disease NeuroImaging Initiative. Integrating spatial-anatomical regularization and structure sparsity into SVM: improving interpretation of Alzheimer’s disease classification. Neuroimage. 2018;178:445–60. pmid:29802968
28. Watanabe T, Kessler D, Scott C, Angstadt M, Sripada C. Disease prediction based on functional connectomes using a scalable and spatially-informed support vector machine. Neuroimage. 2014;96:183–202. pmid:24704268
29. Eliseyev A, Aksenova T. Penalized multi-way partial least squares for smooth trajectory decoding from electrocorticographic (ECoG) recording. PLoS One. 2016;11(5):e0154878. pmid:27196417
30. Ling Q, Liu A, Li Y, Fu X, Chen X, McKeown MJ, et al. A joint constrained CCA model for network-dependent brain subregion parcellation. IEEE J Biomed Health Inform. 2022;26(11):5641–52. pmid:35930507
31. Chen E, Chang R, Guo K, Miao F, Shi K, Ye A, et al. Hyperspectral image spectral-spatial classification via weighted Laplacian smoothing constraint-based sparse representation. PLoS One. 2021;16(7):e0254362. pmid:34255786
32. Tayyib M, Amir M, Javed U, Akram MW, Yousufi M, Qureshi IM, et al. Accelerated sparsity based reconstruction of compressively sensed multichannel EEG signals. PLoS One. 2020;15(1):e0225397. pmid:31910204
33. Liu S, Huang Q, Quan T, Zeng S, Li H. Foreground estimation in neuronal images with a sparse-smooth model for robust quantification. Front Neuroanat. 2021;15:716718. pmid:34764857
34. Merris R. Laplacian matrices of graphs: a survey. Linear Algebra Appl. 1994;197–198:143–76.
35. Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst. 2001;14.
36. Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003;15(6):1373–96.
37. Zhang S, Li X, Zong M, Zhu X, Wang R. Efficient kNN classification with different numbers of nearest neighbors. IEEE Trans Neural Netw Learn Syst. 2018;29(5):1774–85. pmid:28422666
38. Zhang S, Li X, Zong M, Zhu X, Cheng D. Learning k for kNN classification. ACM Trans Intell Syst Technol. 2017;8(3):1–19.
39. Cai D, He X, Hu Y, Han J, Huang T, editors. Learning a spatially smooth subspace for face recognition. IEEE Conference on Computer Vision and Pattern Recognition; 2007.
40. He X, Yan S, Hu Y, Niyogi P, Zhang H-J. Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell. 2005;27(3):328–40. pmid:15747789
41. Wang J, Xie X, Wang P, Sun J, Liu Y, Zhang L. Incorporating symmetric smooth regularizations into sparse logistic regression for classification and feature extraction. Symmetry. 2025;17(2):151.
42. Donoho DL. De-noising by soft-thresholding. IEEE Trans Inform Theory. 1995;41(3):613–27.
43. Liang Z, Xia S, Zhou Y, Zhang L, Li Y. Feature extraction based on Lp-norm generalized principal component analysis. Pattern Recognit Lett. 2013;34(9):1037–45.
44. Mackey L. Deflation methods for sparse PCA. Adv Neural Inf Process Syst. 2008;21.
45. Hunter DR, Lange K. A tutorial on MM algorithms. Am Stat. 2004;58(1):30–7.
46. Samaria FS, Harter AC, editors. Parameterisation of a stochastic model for human face identification. IEEE Workshop on Applications of Computer Vision; 1994.
47. Samaria FS. Face recognition using hidden Markov models. Cambridge, UK: University of Cambridge; 1994.
48. Martinez A, Benavente R. The AR face database. CVC Tech Rep. 1998;24.
49. Thomaz CE, Giraldi GA. A new ranking method for principal components analysis and its application to face image analysis. Image Vis Comput. 2010;28(6):902–13.
50. Phillips PJ, Martin A, Wilson CL, Przybocki M. An introduction evaluating biometric systems. Computer. 2000;33(2):56–63.
51. Nefian AV, Hayes MH, editors. Hidden Markov models for face recognition. IEEE International Conference on Acoustics, Speech and Signal Processing; 1998.
52. Georghiades AS, Belhumeur PN, Kriegman DJ. From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell. 2001;23(6):643–60.
53. Wang J, Zhang B, Xie X, Li J, Zhang L, Guo H. Whole-brain classification based on generalized sparse logistic regression. J Xinyang Norm Univ (Nat Sci Ed). 2022;35(3):488–93.
54. Jolliffe IT, Trendafilov NT, Uddin M. A modified principal component technique based on the LASSO. J Comput Graph Stat. 2003;12(3):531–47.
55. Yu H-F, Rao N, Dhillon IS. Temporal regularized matrix factorization for high-dimensional time series prediction. Adv Neural Inf Process Syst. 2016;29.