
PCA/K-L transformation facial recognition method for vending systems

Abstract

Facial recognition, as an efficient and convenient biometric technology, can be applied to automatic vending systems to support functions such as fast checkout and personalized recommendations. To improve the accuracy and processing speed of facial recognition, this study designs a facial recognition model for an automatic vending system that combines an improved principal component analysis method with the Hotelling (Karhunen-Loève) transform. The method reduces the dimensionality of facial features by introducing sample partitioning and histogram equalization into principal component analysis. The Hotelling transform is then applied to the reduced-dimensional feature images to obtain the projection values of the face images, making them easier to recognize. On the RenderMe-360 and VoxCeleb2 datasets, the recognition accuracy of the model reached 96.32% and 98.24%, respectively, both higher than the comparison methods. The model achieved an average recognition accuracy of 94.388% across facial angles and showed clear efficiency advantages in feature face construction time and recognition time, while remaining robust under high-noise conditions. The proposed model therefore improves facial recognition accuracy while offering faster processing and better noise tolerance, providing new technical value for the future intelligent development of automatic vending systems.

1. Introduction

With the rapid advancement of technology, traditional retail systems are transitioning towards Automated Vending Systems (AVS). An automated vending system draws on multiple fields, including electronic technology, computer technology, network technology, communication technology, and mechanical engineering, and can perform functions such as automatic vending, inventory management, and goods replenishment [1]. As an unmanned, 24-hour self-service device, a vending system can be widely deployed in large shopping malls, stations, airports, hospitals, schools, and other public places, while saving manpower and material resources [2]. With its innovative technology and flexible application, the vending system is gradually becoming a focus of the new retail field [3,4]. However, vending systems still face issues such as theft and inaccurate user authentication in practical applications, which affect their security and user experience [5]. How to improve the security and user experience of vending systems has therefore become an urgent issue.

There is already considerable research on automatic vending systems. C. Liu et al. proposed a multi-category product recognition method for unmanned vending systems, which used manifold learning to identify the shapes of different products, constructed multi-granularity labels to constrain the products, and then applied a hierarchical-label object detection network to capture the products' multi-granularity features. It effectively explored the potential similarities between multiple types of products and performed well in practical applications [6]. M. Grzegorowski et al. proposed a feature extraction framework to develop replenishment plans for AVS. The scheme adopted recent machine learning methods and introduced survival analysis indicators, effectively dealing with data uncertainty and the cold-start problems that occur when AVS are out of stock; in practical applications it could automatically replenish stock according to demand [7]. N. Ivanov et al. designed a security model for vending systems to protect AVS from network attacks during payment. The model introduced a multi-signature transaction token to replace the application programming interface and switched the network interface to the device channel when accessing the guidance service, improving the safety of AVS while remaining portable and scalable [8]. A. Sharif et al. designed inkjet-printed ultra-high-frequency RFID tags for fruit sales in AVS. The tag consists of dipole-shaped bands and eye-shaped rectangular nested slots that alleviate the capacitance effect on the fruit surface. This method can reduce the surface loss resistance of fruits and matches well with the product recognition model of AVS, improving the efficiency with which AVS sort and sell fruit [9]. L. Gan et al. designed an AVS with radar gesture recognition, which used exponentially weighted averaging to fit the radar background echo for real-time gesture recognition. The system recognized user gestures with an average rate of 95.9% and improved the sales rate and transaction speed of AVS [10].

With the rapid development of biometric technology, Facial Recognition (FR) has been widely studied owing to its non-contact, convenient, and intuitive characteristics. A. Atzori et al. proposed a novel low-resolution face recognition method that converted high-resolution images into low-resolution images and combined them to train a face recognition model. The facial images generated by this method were more realistic than those generated by other methods and noticeably reduced demographic bias [11]. A. M. Rodriguez et al. introduced a multi-task explainable quality network into facial recognition models and designed a quality pairing protocol to address the low efficiency of identifying suspects in surveillance footage. The model generalized well across different video datasets and improved the efficiency of identifying suspects from surveillance footage [12]. W. Gao et al. constructed a facial recognition protocol to improve the computational performance of intelligent security systems during the recognition phase. The protocol applied matrix operations to user data and used edge computing to achieve rapid responses for large-scale face recognition, effectively improving the recognition efficiency of intelligent security systems [13]. H. Li et al. proposed an unconstrained facial expression recognition model with no-reference de-elements learning to explore the information in facial expressions. The model introduced constraint conditions to overcome the lack of basic information during de-elements learning and decomposed unconstrained facial images into expression elements and neutral facial features, reducing the interference of irrelevant information. It improved the classification performance of facial expressions and significantly enhanced the robustness and generalization of recognition models [14]. Y. Gu et al. proposed a deep learning model with multi-source learning to address the problem of computers being unable to recognize human emotions. The model effectively exploited autocorrelation and demonstrated notable efficacy in this domain, corroborating the superiority of multimodal approaches over single-modal ones [15].

In summary, FR technology is widely applied across fields, and there is also extensive research on AVS. Introducing FR technology into AVS (AVS-FR) can not only improve system security but also provide personalized services for users. Although FR technology has many advantages, it still suffers from low recognition accuracy and processing speed in complex environments. This study therefore proposes an AVS face recognition model based on an Improved Principal Component Analysis/Karhunen-Loève Transform (IPCA/K-L) to enhance AVS face recognition in complex environments, adapt to diverse needs, and provide new technical support for the intelligent development of AVS.

The innovation of this research lies in the proposed improved PCA. (1) A collaborative optimization mechanism for a dynamic blocking strategy is established: block granularity is adaptively adjusted according to facial structural features, enhancing the classifier's ability to capture local key features and improving recognition accuracy. (2) Local histogram equalization is achieved: independent histogram equalization is performed within each block to eliminate lighting interference, enhancing image contrast and recognizability. (3) Global feature dimension compression is achieved: by collaboratively optimizing the covariance matrix, principal components are extracted across blocks, further improving the recognition rate under low-light conditions and reducing the false recognition rate. (4) A hierarchical cascaded processing flow is completed, enhancing noise resistance in dynamic environments, improving recognition efficiency and accuracy, and demonstrating the robustness and effectiveness of the improved PCA in complex scenarios. (5) A feature compression framework integrating the advantages of multiple techniques is constructed, combining local and global information to enhance overall algorithm performance and adapt to various application scenarios.

2. Methods and materials

This section first introduces the design of a facial feature extraction and recognition scheme based on improved PCA. Secondly, the construction process of the AVS-FR based on IPCA/K-L transformation is described.

2.1 Design of facial feature extraction and recognition scheme with improved PCA

FR technology in AVS confirms individual identity by analyzing and comparing feature information in facial images [16]. However, it still faces challenges in practical applications, such as inaccurate user authentication, limited payment methods, and a lack of personalized recommendation functions, which affect the user experience. How to raise the intelligence level of AVS while maintaining high recognition accuracy, processing speed, and data security in complex environments remains an open problem. Principal Component Analysis (PCA) can reduce the dimensionality of the original features while minimizing information loss [17]. First, the average face of the sample data is calculated and the difference between each sample and the average face is obtained; a covariance matrix is then constructed to describe the linear relationships between features. Next, eigenvalue decomposition of the covariance matrix yields the eigenvectors and corresponding eigenvalues, from which the principal components with the largest variance are selected. Finally, the original data are projected into the new feature space according to the selected principal components, reducing the data dimension while preserving as much information as possible. However, PCA alone has lower accuracy when dealing with FR tasks in complex environments. The pseudocode for PCA is as follows:

  Function PCA(InputData):
      // 1. Calculate the mean face of the samples
      MeanFace = CalculateMean(InputData)
      // 2. Calculate the difference between each sample and the mean face
      DifferenceFaces = InputData - MeanFace
      // 3. Construct the covariance matrix
      CovarianceMatrix = CalculateCovariance(DifferenceFaces)
      // 4. Compute eigenvalues and eigenvectors
      EigenValues, EigenVectors = EigenDecomposition(CovarianceMatrix)
      // 5. Sort eigenvalues and corresponding eigenvectors
      SortedEigenValues, SortedEigenVectors = SortEigenValuesAndVectors(EigenValues, EigenVectors)
      // 6. Select the top k principal components (k is the number of desired components)
      PrincipalComponents = SelectPrincipalComponents(SortedEigenVectors, k)
      // 7. Project the original data onto the principal component space
      ProjectedData = Project(InputData, PrincipalComponents)
      return ProjectedData
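For concreteness, the following is a minimal NumPy sketch of the PCA steps above; the function and variable names (pca, mean_face, and so on) are illustrative rather than taken from the paper, and the input is assumed to be a matrix with one vectorized face image per row.

  import numpy as np

  def pca(input_data, k):
      # Eigenface-style PCA; input_data holds one vectorized face image per row
      # 1-2. Mean face and difference faces
      mean_face = input_data.mean(axis=0)
      difference_faces = input_data - mean_face
      # 3-5. Eigen-decomposition of the covariance matrix via SVD
      #      (columns of vt.T are the covariance eigenvectors, already sorted by decreasing eigenvalue)
      _, s, vt = np.linalg.svd(difference_faces, full_matrices=False)
      eigenvalues = s ** 2 / (len(input_data) - 1)
      # 6. Keep the top k principal components
      principal_components = vt.T[:, :k]
      # 7. Project the centred data onto the principal component space
      projected_data = difference_faces @ principal_components
      return projected_data, principal_components, mean_face, eigenvalues

A call such as pca(train_matrix, k=50) would return the 50-dimensional projections that the later matching steps operate on.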

Based on this, this study proposes using IPCA to extract and recognize facial features, to improve the performance of FR systems in complex environments. The pseudocode for IPCA is as follows:

  Function IPCA(InputData):
      // 1. Partition the sample data
      PartitionedSamples = PartitionSamples(InputData)
      // 2. Perform histogram equalization on each sub-sample
      for each sample in PartitionedSamples:
          sample = HistogramEqualization(sample)
      // 3. Calculate the mean face for each sub-sample
      MeanFaces = CalculateMean(PartitionedSamples)
      // 4. Calculate the difference between each sub-sample and the corresponding mean face
      DifferenceFaces = PartitionedSamples - MeanFaces
      // 5. Construct the covariance matrix
      CovarianceMatrix = CalculateCovariance(DifferenceFaces)
      // 6. Compute eigenvalues and eigenvectors
      EigenValues, EigenVectors = EigenDecomposition(CovarianceMatrix)
      // 7. Sort eigenvalues and corresponding eigenvectors
      SortedEigenValues, SortedEigenVectors = SortEigenValuesAndVectors(EigenValues, EigenVectors)
      // 8. Select the top k significant principal components (k is the number of desired components)
      PrincipalComponents = SelectPrincipalComponents(SortedEigenVectors, k)
      // 9. Project the original data onto the principal component space
      ProjectedData = Project(InputData, PrincipalComponents)
      return ProjectedData
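As a hedged illustration of the two preprocessing steps that distinguish IPCA from plain PCA (sample partitioning and histogram equalization), the NumPy sketch below assumes 8-bit grayscale face images and an integer group label per image; the partitioning rule is a simple placeholder, since the exact grouping criterion (expression, angle, lighting) depends on how the samples are annotated.

  import numpy as np

  def equalize_histogram(image):
      # Classic histogram equalization for an 8-bit grayscale image
      hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
      cdf = hist.cumsum().astype(float)
      span = max(cdf.max() - cdf.min(), 1.0)
      cdf = (cdf - cdf.min()) / span * 255.0
      return cdf[image.astype(np.uint8)]

  def ipca(images, group_labels, k):
      # images: (n, h, w) grayscale faces; group_labels: partition index per image
      n = len(images)
      # 1-2. Equalize every image, then treat each labelled group as one partition
      equalized = np.stack([equalize_histogram(img) for img in images])
      flattened = equalized.reshape(n, -1)
      # 3-4. Per-partition mean faces and difference faces
      differences = np.empty_like(flattened)
      for g in np.unique(group_labels):
          idx = np.where(group_labels == g)[0]
          differences[idx] = flattened[idx] - flattened[idx].mean(axis=0)
      # 5-9. Covariance eigen-decomposition via SVD, top-k selection, projection
      _, s, vt = np.linalg.svd(differences, full_matrices=False)
      components = vt.T[:, :k]
      return differences @ components, components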

The dimensionality reduction method of IPCA is shown in Fig 1.

In Fig 1, the dimensionality reduction in IPCA transforms the original three-dimensional feature space into a two-dimensional space, simplifying the data while preserving key information. Under this framework, the number of features is reduced from 28 to 20, improving identification efficiency while minimizing information loss. By rearranging the features, the relationship between the distribution of data points and the features is reflected more clearly in the two-dimensional space. Through linear transformations of the raw data, IPCA focuses on the most representative features, optimizing subsequent recognition tasks, reducing the computational burden, and improving the processing speed and accuracy of the algorithm. This dimensionality reduction strategy not only reduces redundant information but also helps model performance in complex environments, significantly enhancing the robustness of face recognition applications. The horizontal axis PCA1 and vertical axis PCA2 in Fig 1 represent the two directions along which the data vary the most: the first component has the largest variance, followed by the second component, and so on, with all components orthogonal to each other. To group samples with similar features (expression, angle, lighting) into a matrix, this study adopts a block-based partitioning approach. After partitioning, the facial samples within each block show strong consistency in expression, facial angle, and environmental lighting. On this basis, the distribution of the facial sample data approximately follows a Gaussian distribution, which improves the efficiency and ability of the algorithm to recognize the sample data [18]. The facial data samples are segmented as follows: first, the initial training matrix is divided into several subset modules; the segmented images then undergo histogram equalization, which enhances the sensitivity and contrast of the facial images to highlight sample features [19]. The image effects before and after optimizing the facial dataset with histogram equalization are shown in Fig 2.

In Fig 2, after histogram equalization, not only is the contrast of the face data in the segmented samples improved, but the distinctiveness of each part of the face image is also enhanced [20]. After the above preprocessing, IPCA is used to extract features from the training samples and recognize them. The training set contains $M$ categories of collected facial features with $N$ samples per category. The average face and the difference faces of the sample images are calculated, and all features are decentered [21]. The formula for calculating the average face is shown in equation (1).

\bar{x} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} x_{ij}   (1)

In equation (1), $X = \{x_{ij}\}$ is the data matrix of the original samples; $\bar{x}$ is the mean vector (the average face); $x_{ij}$ is the vectorized value of the $j$-th sample image of class $i$. The calculation formula for the difference face is shown in equation (2).

d_{ij} = x_{ij} - \bar{x}   (2)

In equation (2), $d_{ij}$ is the difference face vector, representing the difference between each sample image and the average face. From the face data, a covariance matrix can then be constructed, as shown in equation (3).

C = \frac{1}{MN} A A^{T}   (3)

In equation (3), $C$ is the covariance matrix describing the linear relationships between the features in the dataset; $A$ is the difference face matrix, composed of all difference face vectors; $A^{T}$ is the transpose of the difference face matrix, with $T$ denoting the transpose. Because directly solving for the eigenvectors of $C$ is complex and involves many steps, the singular value decomposition theorem is used instead. Equation (4) shows the calculation formula [22].

u_{i} = \frac{1}{\sqrt{\lambda_{i}}} A v_{i}   (4)

In equation (4), $u_{i}$ is the eigenvector of the covariance matrix $C$ corresponding to $\lambda_{i}$; $\lambda_{i}$ is the $i$-th non-zero eigenvalue of the matrix $A^{T}A$; $v_{i}$ is the corresponding orthonormal eigenvector of $A^{T}A$. After this dimensionality reduction of the facial features, and based on the influence of the eigenvalues on the facial features, the top $k$ eigenvalues with the greatest impact and their eigenvectors are selected, as expressed in equation (5).

\frac{\sum_{i=1}^{k} \lambda_{i}}{\sum_{i=1}^{n} \lambda_{i}} \geq \alpha   (5)

In equation (5), $k$ is the number of features that have a significant impact on the facial features; $\lambda_{i}$ is the $i$-th eigenvalue; $\alpha$ is the cumulative energy threshold, usually set between 90% and 99%. Through this filtering, the eigenface space matrix $W$ is obtained, and the original $n$-dimensional image is thereby transformed into a $k$-dimensional feature representation.
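Read literally, the rule in equation (5) is a cumulative-energy test on the sorted eigenvalues. A minimal sketch (the helper name select_k is mine, and 0.95 is just one value in the 90%-99% range mentioned above):

  import numpy as np

  def select_k(eigenvalues, alpha=0.95):
      # Smallest k whose leading eigenvalues carry at least a fraction alpha of the total energy
      eigenvalues = np.sort(eigenvalues)[::-1]
      cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
      return int(np.searchsorted(cumulative, alpha)) + 1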

The image features are then classified, with the classification expression shown in equation (6).

y = W^{T} x   (6)

In equation (6), $x$ is the vectorized image; $W$ is the eigenface space matrix; $y$ is the feature vector of the original face image. The resulting feature vectors can be evaluated with a cosine classifier to determine the similarity between two vectors, as shown in equation (7).

\cos\theta = \frac{y_{test} \cdot y_{train}}{\lVert y_{test} \rVert \, \lVert y_{train} \rVert}   (7)

In equation (7), $\cos\theta$ is the cosine similarity between two feature vectors; $y_{test}$ is the feature vector of the test sample; $y_{train}$ is the feature vector of the training sample. The process of facial feature extraction and recognition based on IPCA is shown in Fig 3 and consists of several key steps. First, the original face image data are processed by sample partitioning and histogram equalization. The mean face and difference faces of the partitioned samples are then calculated and the corresponding covariance matrix is constructed. Next, the covariance matrix is decomposed by singular value decomposition (SVD) to extract the main eigenvectors and eigenvalues, and the k most representative features are selected according to the magnitude of the eigenvalues to reduce the dimension. Finally, the original data are projected into the new feature space to form the reduced-dimensional feature representation. This process retains the main information of the data while optimizing face recognition performance in complex environments, effectively improving the accuracy and processing efficiency of the system.

Fig 3. Process of facial feature extraction and recognition method based on the IPCA algorithm.
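As a small illustration of the matching step in equations (6)-(7), the sketch below projects a test image with the eigenface matrix and scores it against every training projection with cosine similarity; the names (recognize, components, train_projections) are illustrative, not from the paper.

  import numpy as np

  def cosine_similarity(a, b):
      return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

  def recognize(test_image, mean_face, components, train_projections, train_labels):
      # Equation (6): project the centred test image into the eigenface space
      y = components.T @ (test_image.ravel() - mean_face)
      # Equation (7): cosine similarity against every training projection
      scores = [cosine_similarity(y, t) for t in train_projections]
      return train_labels[int(np.argmax(scores))]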

2.2 AVS-FR with IPCA/K-L transform

The previous section used IPCA to extract features from facial image datasets. Although the improved algorithm reduces image dimensionality for FR to a certain extent, it has limitations when facing images with complex nonlinear features in AVS. AVS typically requires efficient and accurate identification of customers to ensure secure and smooth transactions, so rapid response and high recognition accuracy are particularly important. The K-L transform is a linear transformation that maps data from the original space to a new feature space [23]. It projects the data onto a new set of coordinate axes so that the correlation between the data along the axes is minimized, which reduces redundant information and extracts the main features of the data. On this basis, this study introduces the K-L transform into the IPCA algorithm and constructs an AVS-FR model based on the IPCA/K-L transform to address the limitations of traditional recognition methods in complex environments while ensuring minimal information loss during dimensionality reduction. Taking image blocks as an example, Fig 4 shows a schematic diagram of the K-L transformation.

Fig 4 shows image blocks before and after the K-L transformation. Before the transformation, the image pixels are strongly correlated in the spatial domain and the energy distribution is relatively uniform; after the orthogonal transformation, the energy of the image blocks is concentrated on a few coordinate axes and the transformation coefficients are approximately statistically independent [24]. First, the images in the test and training sets are normalized and their pixel values are concatenated into an $n$-dimensional vector, so each image can be regarded as a point in the $n$-dimensional space and described by a low-dimensional subspace through the K-L transformation. The generation matrix of the K-L transformation is then computed to obtain the eigenvalues and eigenvectors of the input face images. Assuming that the face image library contains $M$ images represented by vectors $\Gamma_{1}, \Gamma_{2}, \ldots, \Gamma_{M}$, the mean-difference images are projected onto this space to obtain the projection vector $\Omega$ of a subject, calculated as shown in equation (8).

\Omega = U^{T} \Phi   (8)

In equation (8), $\Omega$ is the projection vector, representing the projection of the image in the feature space; $U$ is the eigenvector matrix; $\Phi$ is the vector representation of the mean-difference image. For a face image to be recognized, the projection vector of its difference from the average face is calculated using equation (9).

\Omega_{i} = U^{T} (\Gamma_{i} - \Psi)   (9)

In equation (9), $\Gamma_{i}$ is the vector representation of the $i$-th image; $\Psi$ is the average facial image; $\Omega_{i}$ is the projection of the difference image. To further compress the eigenvectors of the images and reduce the computational complexity of the algorithm, this study sorts the eigenvalues and discards the eigenvectors corresponding to the smaller eigenvalues [25]. The compressed eigenvector set is shown in equation (10).

U' = \{ u_{1}, u_{2}, \ldots, u_{M'} \}   (10)

In equation (10), $u_{i}$ is an eigenvector and $U'$ is the retained eigenvector matrix; the eigenvalues are sorted in descending order and the first $M'$ eigenvectors are retained, where $M'$ is the number of training image categories. Unlike the earlier fixed discarding of some eigenvectors, this process ensures that the information contained in the remaining eigenvectors exceeds a threshold $\epsilon$, whose value is usually 0.9; the calculation formula is shown in equation (11).

\frac{\sum_{i=1}^{M'} \lambda_{i}}{\sum_{i=1}^{n} \lambda_{i}} \geq \epsilon   (11)

In equation (11), $\lambda_{i}$ is the $i$-th eigenvalue; $n$ is the vector dimension; $M'$ is the number of retained information dimensions. Selecting an appropriate distance function for AVS-FR can reduce misidentification and ensure that the AVS quickly and accurately recognizes user information for different users. Summing the absolute differences between the pixels of two facial images gives the first distance criterion, whose formula is shown in equation (12).

d_{L1}(x, y) = \sum_{i=1}^{n} \lvert x_{i} - y_{i} \rvert   (12)

In equation (12), $d_{L1}$ is the L1 distance, also known as the Manhattan distance; $x_{i}$ is the $i$-th component of one image vector and $y_{i}$ is the $i$-th component of the other. The Manhattan distance requires only additions and subtractions, making it cheaper to compute at scale and avoiding the rounding errors introduced by taking square roots [26]. A schematic diagram is shown in Fig 5.

The L2-norm distance, also known as the Euclidean distance, is obtained by summing the squared differences between corresponding pixels of the two images and taking the square root. Its calculation formula is shown in equation (13).

d_{L2}(x, y) = \sqrt{\sum_{i=1}^{n} (x_{i} - y_{i})^{2}}   (13)

In equation (13), $d_{L2}$ is the Euclidean distance. By computing the average of the training samples of each class and comparing distances to these averages, each category only needs to be compared once, which reduces the computational complexity. The distance between a sample and a class mean is shown in equation (14).

d(m_{i}, y) = \lVert y - m_{i} \rVert   (14)

In equation (14), $d(m_{i}, y)$ is the distance between the average vector of class $i$ and the current sample; $m_{i}$ is the average of all samples in class $i$; $y$ is the vector of the current sample. In the test stage of this algorithm, training and recognition are carried out in two phases. In the first phase, all information is projected into a specific subspace to obtain the reduced-dimensional projection vectors. The expression for the distance threshold is shown in equation (15).

\theta = \frac{1}{2} \max_{j,k} \lVert \Omega_{j} - \Omega_{k} \rVert   (15)

In equation (15), $\theta$ is the distance threshold; $\Omega_{j}$ and $\Omega_{k}$ are the projection vectors of different facial classes; $\lVert \cdot \rVert$ is the two-norm of a vector. The flowchart of the AVS-FR model with the IPCA/K-L transformation is shown in Fig 6. The model first normalizes the input facial image dataset, converts it into vector form, and extracts features with the IPCA algorithm. The reduced-dimensional feature images are then subjected to the K-L transformation to obtain the projection values of the facial images. The feature vectors are sorted again and the most influential features are selected for recognition. Finally, the Manhattan distance and Euclidean distance are used to measure the similarity between the test image and the training images to determine the image category, and threshold judgment ensures recognition accuracy before the result is output.
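Putting equations (12)-(15) together, the matching stage amounts to a nearest-class-mean search in the projected space with a rejection threshold. The sketch below is a minimal reading of that flow, assuming the class-mean projections have already been computed; all names are illustrative.

  import numpy as np

  def l2_distance(x, y):
      # Equation (13): Euclidean distance, shown for comparison with the L1 criterion
      return float(np.sqrt(((x - y) ** 2).sum()))

  def match_projection(omega, class_means, class_labels, threshold):
      # Equations (12) and (14): Manhattan (L1) distance from the probe projection to every class mean
      l1 = np.abs(class_means - omega).sum(axis=1)
      best = int(np.argmin(l1))
      # Equation (15): reject the probe as an unknown user if even the closest class is too far away
      if l1[best] > threshold:
          return None
      return class_labels[best]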

Although PCA and the K-L transformation are closely related mathematically, in the second stage of this work they differ clearly in scope and optimization objective. In terms of covariance estimation, PCA only uses the global covariance matrix to extract the principal component directions, ignoring the separability between categories [27]. In the feature space after dimensionality reduction, the K-L transformation is combined with an inter-class adaptive covariance matrix to optimize the feature distribution, as in equations (8)-(11), and inter-class differences are enhanced by dynamically adjusting the feature weights. To adjust the weights dynamically, the performance of each class of samples in each feature dimension is first computed, including the contribution of each feature to the discrimination between categories. Based on this information, the model assigns different weights to the features, highlighting those with a significant impact on classification while down-weighting features with little impact or redundant information [28]. In terms of feature whitening and decorrelation, PCA only decenters the data through an orthogonal transformation, so implicit correlations remain among the feature dimensions [29]. The K-L transformation introduces a feature whitening operation that forces the variance of each feature dimension to 1 and the covariances to 0, eliminating redundant correlations and making the Manhattan distance more robust [30]; the specific form is given in equations (12)-(15). Finally, regarding the information-retention mechanism under energy concentration, PCA selects principal components by maximizing global variance, which may lose locally highly discriminative features, whereas the K-L transformation retains the local statistical characteristics of different categories and, through energy redistribution, enhances adaptability to complex nonlinear changes.
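The whitening step described above (unit variance and zero covariance across feature dimensions) can be written in a few lines. This is a generic PCA-whitening sketch under the assumption that the projected training features are available as the rows of a matrix; it is not claimed to be the paper's exact implementation.

  import numpy as np

  def whiten(features, eps=1e-8):
      # Decorrelate and rescale features so their covariance becomes (numerically) the identity
      centered = features - features.mean(axis=0)
      cov = np.cov(centered, rowvar=False)
      eigvals, eigvecs = np.linalg.eigh(cov)          # covariance is symmetric
      transform = eigvecs / np.sqrt(eigvals + eps)    # scale each eigen-direction to unit variance
      return centered @ transform, transform

After this transform the feature dimensions are uncorrelated with unit variance, which is what makes the subsequent Manhattan-distance comparison less sensitive to redundant, correlated dimensions.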

3. Results

3.1 Validity verification of facial feature extraction and recognition scheme based on IPCA

To verify the effectiveness of the proposed scheme, this study used two different facial datasets: VoxCeleb2, released by Oxford University in 2018, and RenderMe-360, released by the Shanghai Artificial Intelligence Laboratory in 2023. RenderMe-360 is a large-scale multi-view high-definition facial video dataset covering diverse facial expressions, rich fine-grained hairstyles and colors, and phonetically balanced speech videos; VoxCeleb2 is a large-scale speaker recognition dataset collected automatically from open-source media that contains over 100,000 facial samples. In this study, the training network of the IPCA/K-L transformation model adopts a convolutional neural network (CNN) structure. The architecture includes multiple convolution and pooling layers to extract local image features; an IPCA module is then introduced for feature dimensionality reduction, retaining the main information and reducing data complexity. The K-L transform is applied next to further optimize the feature representation and reduce redundant information. Finally, the extracted features are mapped to the output layer through a fully connected layer and classified with the Softmax activation function to achieve efficient and accurate face recognition. The software and hardware environment and initial parameter settings used in the experiments are shown in Table 1.
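The exact layer configuration and framework are not stated here, so the following is only a hedged PyTorch sketch of the pipeline just described: a small convolutional feature extractor, an external IPCA/K-L reduction of the extracted features (for example with the NumPy routines sketched in Section 2), and a fully connected Softmax classifier. All layer sizes are assumptions.

  import torch
  import torch.nn as nn

  class ConvFeatureExtractor(nn.Module):
      # Two convolution/pooling stages that produce a flat feature vector per face image
      def __init__(self):
          super().__init__()
          self.features = nn.Sequential(
              nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
              nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
          )

      def forward(self, x):                    # x: (batch, 1, H, W) grayscale faces
          return self.features(x).flatten(1)   # (batch, 32 * H/4 * W/4)

  class SoftmaxHead(nn.Module):
      # Fully connected layer mapping the reduced (IPCA/K-L) features to class probabilities
      def __init__(self, reduced_dim, num_classes):
          super().__init__()
          self.fc = nn.Linear(reduced_dim, num_classes)

      def forward(self, reduced):              # reduced: (batch, reduced_dim) features
          return torch.softmax(self.fc(reduced), dim=1)

In this reading, the CNN output would be reduced with the PCA/whitening steps (as NumPy arrays) and then passed to SoftmaxHead; in practice, training would apply a cross-entropy loss to the pre-softmax logits.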

Table 1. Experimental software and hardware configuration and algorithm initialization parameter settings.

To verify the effect of the proposed method, an ablation experiment compared the performance of PCA alone and PCA + K-L. The two versions were compared under the same hardware environment with the feature dimension fixed at 50. The specific results are shown in Table 2.

The results in Table 2 show that, compared with the method using only PCA, model performance is significantly improved after introducing the K-L transformation. On the RenderMe-360 dataset, recognition accuracy increased from 90.8% to 96.32%, and on VoxCeleb2 from 91.0% to 98.24%, verifying that the K-L transformation sharpens the classification boundary through feature whitening and energy redistribution. Although the additional transformation steps increased feature construction time by approximately 4.2% (5 seconds) and recognition delay by 13.3% (0.2 seconds), the L1-distance misjudgment rate decreased by 57.3% (RenderMe-360) and 82.4% (VoxCeleb2), and robustness under 5 dB high noise improved by 17.5% and 16.8%. This indicates that, by eliminating redundant feature correlations and dynamically adjusting statistical weights, the K-L transformation achieves a balanced optimization of computational efficiency and recognition accuracy in complex scenarios, meeting the vending system's core requirements for high security and low misjudgment.

Fig 7 shows the accuracy and loss curves (Acc-Epoch-Loss) of IPCA on the two datasets.

In Fig 7(a), the loss value of IPCA converged to about 3% after 60 iterations, with an average accuracy of 97%; in Fig 7(b), the loss value was 23% and the accuracy 84% after 50 iterations, and the algorithm stopped iterating at an appropriate point. IPCA effectively captured feature information in the images, reduced the loss and neglect of target details, was more accurate for the portraits collected in AVS, and reduced overfitting. To verify the accuracy of IPCA for FR in AVS, different numbers of feature vectors were used to compare the recognition accuracy of IPCA, PCA, Linear Discriminant Analysis (LDA), and Locally Linear Embedding (LLE) [31,32]. Fig 8 shows the experimental results.

Fig 8. The recognition accuracy of IPCA, PCA, LDA, and LLE with different numbers of feature vectors.

In Fig 8(a), the recognition rates of IPCA, PCA, LDA, and LLE all increased with the number of features, and the overall recognition rate of IPCA was higher than that of the other three algorithms. With 40 features, the recognition rates of IPCA, PCA, LDA, and LLE were 96.32%, 84.26%, 90.34%, and 77.49%, respectively, showing that IPCA achieved higher recognition accuracy by effectively exploiting the feature information to optimize facial recognition. In Fig 8(b), the recognition accuracy of IPCA rose sharply from 60% to 81% as the number of features increased from 4 to 8, while the other algorithms grew more slowly. With 40 feature vectors, the recognition rates of IPCA, PCA, LDA, and LLE were 98.24%, 89.47%, 84.16%, and 78.11%, respectively; only IPCA exceeded 90%, outperforming PCA by 8.77 percentage points. The recognition rate of IPCA was thus significantly improved by the sample partitioning and histogram equalization processing. The feature face construction and recognition times of IPCA, PCA, LDA, and LLE on the two datasets are shown in Table 3.

Table 3. Feature face construction time and recognition time of the four algorithms on different datasets.

In Table 3, on RenderMe-360, IPCA constructed the feature faces in 120 seconds, lower than the 240, 180, and 300 seconds required by PCA, LDA, and LLE. The recognition time and accuracy of IPCA were 1.5 s and 92.4%, better than the other three algorithms. On VoxCeleb2, IPCA's feature face construction time, recognition time, and recognition accuracy were 250 s, 2 s, and 93.1%, again clearly superior to the other three algorithms. IPCA showed a particularly large time advantage on the large-scale VoxCeleb2 dataset, with a much shorter feature face construction time, while also performing well in recognition time and accuracy; it handled incremental data more flexibly and saved considerable computing resources. Fig 9 compares the accuracy and recall of IPCA and PCA on the VoxCeleb2 dataset, with the face images grouped into simple, medium, and complex difficulty levels.

In Fig 9, under different FR difficulty levels, the recall of IPCA was superior to that of PCA, and accuracy was inversely related to recall. In Fig 9(a), on the simple subset, the accuracy of IPCA's FR model only began to drop sharply once recall reached 0.85, a recall 0.08 higher than that of PCA. In Fig 9(b) and 9(c), as the difficulty of the dataset increased, the accuracy of IPCA's FR model began to drop at recalls of 0.74 and 0.56, respectively, but both remained better than PCA, indicating that the recognition accuracy of IPCA was improved.

3.2 Performance evaluation of AVS-FR model with IPCA/K-L transformation

First, the recognition accuracy of the model was validated at different angles; facial images at multiple angles were selected from the VoxCeleb2 dataset for experimental verification. The AVS-FR model based on the IPCA/K-L transformation was compared with a CNN, Eigenfaces, Fisherfaces based on the LDA algorithm, a Support Vector Machine (SVM), a Haar cascade classifier, Transformers, and FaceNet (Unified Embedding for Face Recognition and Clustering) based on deep learning. Table 4 shows the recognition accuracy of these methods at different facial angles.

Table 4. Recognition accuracy of different facial recognition models.

Table 4 shows the recognition accuracy of the face recognition models at different facial angles. The AVS-FR model based on the IPCA/K-L transformation performs particularly well, with an average recognition accuracy of 94.388%, and is strong at every angle: the recognition accuracy for 60 degrees left, 30 degrees left, frontal view, 30 degrees right, and 60 degrees right is 91.23%, 92.36%, 98.15%, 94.85%, and 95.35%, respectively. Compared with the other models, IPCA/K-L demonstrates outstanding robustness and reliability when dealing with complex scenarios.

Among the traditional models, the average recognition accuracy of the convolutional neural network is 84.12%; although it improves on frontal recognition, its overall performance is still inferior to IPCA/K-L. The average recognition accuracy of Eigenfaces and Fisherfaces did not exceed 81%, showing the limitations of these traditional methods in multi-angle face recognition. The support vector machine performs slightly better, with an average recognition accuracy of 86.02%, but still does not reach the level of IPCA/K-L.

Among the deep learning models, FaceNet performs best, with an accuracy of up to 96% for frontal recognition, but its performance at other angles is still inferior to IPCA/K-L. The Transformers model is relatively stable across angles and reaches 97.42% accuracy in frontal recognition. However, IPCA/K-L exceeds 90% at all angles, demonstrating its advantage when dealing with complex angle variations.

Further comparison of 2D-PCA, Block-PCA and DCT-PCA revealed that their recognition accuracy rates were all lower than that of IPCA/K-L. The average recognition accuracy rate of 2D-PCA is 78.67%, that of Block-PCA is 80.03%, and that of DCT-PCA is 79.86%. This indicates that although these traditional PCA variants still have certain practicality in some scenarios, their performance is significantly insufficient under complex changes in facial angles. Overall, the IPCA/K-L transformation significantly enhances the accuracy of face recognition by improving feature extraction and reducing redundancy, providing a more effective technical solution for vending systems.

Fig 10 shows iterative experiments comparing IPCA/K-L with SVM and FaceNet on the VoxCeleb2 and RenderMe-360 datasets.

Fig 10. Recognition results of three facial recognition models on different datasets.

In Fig 10(a), when the number of iterations reached 2100, IPCA/K-L had the highest recognition accuracy at 97.54%, while SVM had the lowest of the three. At 300 iterations, the accuracy of IPCA/K-L already approached its maximum, showing faster convergence. In Fig 10(b), the recognition accuracy of IPCA/K-L remained the highest across iteration counts; at 600 iterations its recognition accuracy on the face dataset was 91.23%, approaching the maximum. At 2100 iterations, the recognition accuracies of IPCA/K-L, SVM, and FaceNet on VoxCeleb2 were 98.63%, 91.18%, and 92.54%, so IPCA/K-L was 7.45% and 6.09% higher than SVM and FaceNet, respectively. The IPCA recognition model with the K-L transform outperformed the other models in recognition accuracy and convergence speed, verifying its feasibility and applicability for AVS-FR. In Fig 11, IPCA/K-L was subjected to noise resistance testing.

Fig 11. Noise tolerance data of SVM, FaceNet, IPCA/K-L transform, Haar cascade, and Eigenfaces.

In Fig 11, IPCA/K-L showed better noise tolerance than the other models across noise environments. In a high-noise environment (5 dB), the recognition accuracy of IPCA/K-L was 84.65%; in a low-noise environment (50 dB), it was 97.85%. The performance of IPCA/K-L was relatively stable across noise levels and it showed particularly good robustness under high noise. The other models had recognition accuracies below 80% in high-noise environments, and their curves varied markedly with noise level, indicating that they are easily affected by image noise, which reduces accuracy and speed when recognizing user information. This verified the superiority of the IPCA/K-L transformation model in AVS, especially its robustness in complex noise environments. The effect of face detection under different lighting conditions was then verified, with the results shown in Table 5.

Table 5. Face detection effect under different lighting conditions.

Under different illumination conditions, the recognition performance of the IPCA/K-L transformation model shows strong robustness and adaptability, reaching a high accuracy of 94.10% under medium illumination. This shows that the model can effectively capture feature information under good lighting conditions and thus identify users efficiently. Under high illumination, the recognition accuracy of the IPCA/K-L model drops to 89.45%, possibly because strong light causes facial shadows and reflections that interfere with feature extraction. Nevertheless, IPCA/K-L maintains a high accuracy, much higher than that of the PCA model under the same conditions, so the IPCA/K-L model can be considered relatively stable when dealing with lighting changes. Finally, the power consumption, running time, and processing efficiency of each method in the system were compared, with the results shown in Table 6.

Table 6. Comparison of power consumption of each method in the system.

The results in Table 6 show that the proposed method based on the IPCA/K-L transformation achieves a good balance between power consumption, running time, and processing efficiency, with a power consumption of 2.8 W, a running time of 1.8 s, and a processing efficiency of 22 FPS. This performance is superior to the SVM-based and CNN methods, whose power consumption is significantly higher at 5.0 W and 6.5 W, respectively, and whose running times are longer while their processing efficiency is lower.

The performance comparison between the proposed method and current mainstream facial recognition methods is shown in Table 7.

Table 7. Performance comparison with mainstream facial recognition methods.

The results in Table 7 show that the IPCA/K-L method has clear advantages in model lightweighting and processing efficiency. Compared with the mainstream deep models ArcFace-R100 and CosFace, its parameter count is more than 300 times smaller and its processing speed is 5-6 times higher, while its memory consumption is only 50 MB, far below the 750 MB-plus footprint of the deep methods. Although the MobileFaceNet-INT8 version is slightly faster than IPCA/K-L, its recognition accuracy in low-light environments drops to 78.9%, whereas IPCA/K-L still maintains 84.7%. Furthermore, IPCA/K-L approaches the speed of the Edge-TPU-optimized model without relying on dedicated hardware acceleration, and its overall resource consumption is more balanced.

4. Discussion and conclusion

The effectiveness of the AVS-FR model with the IPCA/K-L transformation was verified through experimental analysis. After sample partitioning and histogram equalization, IPCA achieved a significantly higher recognition rate than PCA and effectively captured feature information in the images, making it more accurate for the portraits collected by the AVS. In addition, to address the limitations of AVS-FR when handling complex nonlinear image features, the K-L transform was introduced to reduce redundant information in the facial data, effectively improving the recognition accuracy and processing speed of the AVS in complex environments.

To improve the accuracy and processing speed of AVS-FR and enhance user experience and security, an AVS-FR model with the IPCA/K-L transformation was constructed. The loss value of IPCA on RenderMe-360 converged to about 3% after 60 iterations, with an average accuracy of 97%, and the algorithm stopped iterating at an appropriate point. With 40 feature vectors, the recognition rate of IPCA was 98.24%, 8.77 percentage points higher than that of PCA; after sample partitioning and histogram equalization, the recognition rate of IPCA was thus significantly improved. On RenderMe-360, IPCA took 120 seconds to construct the feature faces, lower than the 240, 180, and 300 seconds of PCA, LDA, and LLE, and its recognition time and accuracy of 1.5 s and 92.4% were better than the other three algorithms, confirming the improved recognition performance of IPCA. The average recognition accuracy of IPCA/K-L across facial angles was 94.388%, and at 2100 iterations IPCA/K-L had the highest recognition accuracy, 97.54%, proving the effectiveness of the model. Under different lighting conditions, the recognition accuracy of the IPCA/K-L model was 85.22% under low illumination and rose to 94.10% under medium illumination, indicating that it captures feature information effectively under favorable lighting and ensures efficient user recognition. In summary, IPCA/K-L can be effectively applied to AVS and is of great significance for improving user experience and system security. A limitation of this study is that limited computing resources can constrain the algorithm's real-time performance and computational efficiency. Future research will consider combining hardware acceleration technology to enhance the model's performance on the system.

Supporting information

References

  1. Lin M-H, Sarwar MA, Daraghmi Y-A, Ik T-U. On-shelf load cell calibration for positioning and weighing assisted by activity detection: smart store scenario. IEEE Sens J. 2022;22(4):3455–63.
  2. Pereira D, Bozzato A, Dario P, Ciuti G. Towards foodservice robotics: a taxonomy of actions of foodservice workers and a critical review of supportive technology. IEEE Trans Autom Sci Eng. 2022;19(3):1820–58.
  3. Babayigit B, Abubaker M. Industrial internet of things: a review of improvements over traditional SCADA systems for industrial automation. IEEE Syst J. 2024;18(1):120–33.
  4. Liu L, Zhang H, Zhou D, Shi J. Toward fashion intelligence in the big data era: state-of-the-art and future prospects. IEEE Trans Consum Electr. 2024;70(1):36–57.
  5. Chen F, Xiao Z, Xiang T, Fan J, Truong H-L. A full lifecycle authentication scheme for large-scale smart IoT applications. IEEE Trans Depend Secure Comput. 2022;:1–1.
  6. Liu C, Da Z, Liang Y, Xue Y, Zhao G, Qian X. Product recognition for unmanned vending machines. IEEE Trans Neural Netw Learn Syst. 2024;35(2):1584–97. pmid:35767486
  7. Grzegorowski M, Litwin J, Wnuk M, Pabiś M, Marcinowski Ł. Survival-based feature extraction—Application in supply management for dispersed vending machines. IEEE Trans Ind Inf. 2023;19(3):3331–40.
  8. Ivanov N, Yan Q. AutoThing: a secure transaction framework for self-service things. IEEE Trans Serv Comput. 2023;16(2):983–95.
  9. Sharif A, Althobaiti T, Alotaibi AA, Ramzan N, Imran MA, Abbasi QH. Inkjet-printed UHF RFID sticker for traceability and spoilage sensing of fruits. IEEE Sens J. 2023;23(1):733–40.
  10. Gan L, Liu Y, Li Y, Zhang R, Huang L, Shi C. Gesture recognition system using 24 GHz FMCW radar sensor realized on real-time edge computing platform. IEEE Sens J. 2022;22(9):8904–14.
  11. Atzori A, Fenu G, Marras M. Demographic bias in low-resolution deep face recognition in the wild. IEEE J Sel Top Signal Process. 2023;17(3):599–611.
  12. Rodriguez AM, Unzueta L, Geradts Z, Worring M, Elordi U. Multi-task explainable quality networks for large-scale forensic facial recognition. IEEE J Sel Top Signal Process. 2023;17(3):612–23.
  13. Gao W, Yu J, Hao R, Kong F, Liu X. Privacy-preserving face recognition with multi-edge assistance for intelligent security systems. IEEE Internet Things J. 2023;10(12):10948–58.
  14. Li H, Wang N, Yang X, Wang X, Gao X. Unconstrained facial expression recognition with no-reference de-elements learning. IEEE Trans Affect Comput. 2024;15(1):173–85.
  15. Gu Y, Zhang X, Yan H, Huang J, Liu Z, Dong M, et al. WiFE: WiFi and vision based unobtrusive emotion recognition via gesture and facial expression. IEEE Trans Affect Comput. 2023;14(4):2567–81.
  16. Ning X, Xu S, Nan F, Zeng Q, Wang C, Cai W, et al. Face editing based on facial recognition features. IEEE Trans Cogn Dev Syst. 2023;15(2):774–83.
  17. Mannocci P, Baroni A, Melacarne E, Zambelli C, Olivo P, Perez E, et al. In-memory principal component analysis by crosspoint array of resistive switching memory: a new hardware approach for energy-efficient data analysis in edge computing. IEEE Nanotechnol Mag. 2022;16(2):4–13.
  18. Hasanvand M, Nooshyar M, Moharamkhani E, Selyari A. Machine learning methodology for identifying vehicles using image processing. AIA. 2023;1(3):154–62.
  19. Zhou J, Pang L, Zhang D, Zhang W. Underwater image enhancement method via multi-interval subhistogram perspective equalization. IEEE J Oceanic Eng. 2023;48(2):474–88.
  20. Wu H-T, Cao X, Jia R, Cheung Y-M. Reversible data hiding with brightness preserving contrast enhancement by two-dimensional histogram modification. IEEE Trans Circuits Syst Video Technol. 2022;32(11):7605–17.
  21. Lu S, Gao Z, Xu Q, Jiang C, Zhang A, Wang X. Class-imbalance privacy-preserving federated learning for decentralized fault diagnosis with biometric authentication. IEEE Trans Ind Inf. 2022;18(12):9101–11.
  22. Zhang M, Sun Z, Li H, Niu B, Li F, Zhang Z, et al. Go-sharing: a blockchain-based privacy-preserving framework for cross-social network photo sharing. IEEE Trans Depend Secure Comput. 2023;20(5):3572–87.
  23. Liu Y, Li B, Zhang C, Yao Y. Strategy for radar-embedded communication waveform design based on singular value decomposition. IEEE Trans Veh Technol. 2022;71(11):11847–60.
  24. de Castro BA, Binotto A, Ardila-Rey JA, Fraga JRCP, Smith C, Andreoli AL. New algorithm applied to transformers' failures detection based on Karhunen–Loève transform. IEEE Trans Ind Inf. 2023;19(11):10883–91.
  25. Lin S, Luo S, Ma S, Feng J, Shao Y, Drikas ZB, et al. Predicting statistical wave physics in complex enclosures: a stochastic dyadic Green's function approach. IEEE Trans Electromagn Compat. 2023;65(2):436–53.
  26. Wang Y, Mukherjee D. The discrete cosine transform and its impact on visual compression: fifty years from its invention [Perspectives]. IEEE Signal Process Mag. 2023;40(6):14–7.
  27. Preethi P, Mamatha HR. Region-based convolutional neural network for segmenting text in epigraphical images. AIA. 2022;1(2):103–11.
  28. Yang Y, Yang X, Sakamoto T, Fioranelli F, Li B, Lang Y. Unsupervised domain adaptation for disguised-gait-based person identification on micro-Doppler signatures. IEEE Trans Circuits Syst Video Technol. 2022;32(9):6448–60.
  29. Cheng M-M, Zhang J, Wang D-G, Tan W, Yang J. A localization algorithm based on improved water flow optimizer and max-similarity path for 3-D heterogeneous wireless sensor networks. IEEE Sensors J. 2023;23(12):13774–88.
  30. Xu C, Song W. Intelligent task allocation for mobile crowdsensing with graph attention network and deep reinforcement learning. IEEE Trans Netw Sci Eng. 2023;10(2):1032–48.
  31. Onoshima D, Uchida K, Iida T, Kojima T, Ikeda Y, Iwata D, et al. Single-cell detection and linear discriminant analysis of bacterial Raman spectra in glass filter microholes. Anal Methods. 2024;16(39):6746–50. pmid:39324503
  32. Mitchell-Heggs R, Prado S, Gava GP, Go MA, Schultz SR. Neural manifold analysis of brain circuit dynamics in health and disease. J Comput Neurosci. 2023;51(1):1–21. pmid:36522604
  33. Sun Z, Wang Y, Sun G. Fault diagnosis of rotating machinery based on local centroid mean local Fisher discriminant analysis. J Vib Eng Technol. 2022;11(4):1417–41.
  34. Zhang K, Wu W, Liu Y, Xie T, Zhou J, Zhu H. OCFMD: an automatic optimal clustering method of discontinuity orientation based on Fisher mixed distribution. Rock Mech Rock Eng. 2023;57(3):1735–63.
  35. Chang C-Y, Santra AS, Chang I-H, Wu S-J, Roy DS, Zhang Q. Design and implementation of a real-time face recognition system based on artificial intelligence techniques. Multimed Syst. 2024;30(2):1–19.
  36. Babu KN, Manne S. An automatic student attendance monitoring system using an integrated HAAR cascade with CNN for face recognition with mask. TS. 2023;40(2):743–9.
  37. Chen P-Y, Cheng Y-C, Pai N-S, Chiang Y-H. Applying blockchain technology and facial recognition to unmanned stores. Sens Mater. 2023;35(6):2081.
  38. Yadav RK, Daniel A, Semwal VB. Enhancing human activity detection and classification using fine tuned attention-based transformer models. SN Comput Sci. 2024;5(8):1–21.