
Adaptive mechanism-based grey wolf optimizer for feature selection in high-dimensional classification

  • Genliang Li ,

    Roles Funding acquisition, Investigation, Software, Supervision, Writing – original draft, Writing – review & editing

    humrry@foxmail.com

    ☯ These authors contributed to the work equally and should be regarded as co-first authors

    Affiliations New Engineering Industry College, Putian University, Putian, Fujian, China, Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian 351100, China, Putian Science and Technology Plan Project (Putian Electronic Information Industry Research Institute), Putian, Fujian 351100, China

  • Yaxin Cui ,

    Roles Software, Writing – original draft

    ☯ These authors contributed to the work equally and should be regarded as co-first authors

    Affiliation New Engineering Industry College, Putian University, Putian, Fujian, China

  • Jingyu Su

    Roles Software

    Affiliations New Engineering Industry College, Putian University, Putian, Fujian, China, Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian 351100, China, Putian Science and Technology Plan Project (Putian Electronic Information Industry Research Institute), Putian, Fujian 351100, China

Abstract

Feature Selection (FS) is a crucial component of machine learning and data mining. Its goal is to eliminate redundant and irrelevant features from a dataset, thereby enhancing the classifier's performance. The Grey Wolf Optimizer (GWO) is a well-known meta-heuristic algorithm rooted in swarm intelligence. It is widely used in various optimization problems due to its fast convergence and minimal parameter requirements. However, in the context of solving high-dimensional classification problems, GWO's global search capability is limited, and it is susceptible to getting trapped in local optima. To address this, we introduce an Adaptive Mechanism-based Grey Wolf Optimizer (AMGWO) for FS in high-dimensional classification. This approach encompasses a novel nonlinear parameter control strategy to balance exploration and exploitation effectively, thereby preventing the algorithm from converging prematurely. Additionally, an adaptive fitness distance balancing mechanism is proposed to prevent premature convergence and enhance search efficiency by selecting high-potential solutions. Lastly, an adaptive neighborhood mutation mechanism is designed to adjust mutation intensity adaptively during the search process, allowing AMGWO to more effectively find the global optimum. To validate the proposed AMGWO method, we assess its performance on 15 high-dimensional datasets and compare it with the original GWO and five of its variants in terms of classification accuracy, feature subset size, and execution speed, thus confirming the superiority of AMGWO.

1. Introduction

The process of Feature Selection (FS) is a critical component of data reprocessing, with the purpose of reducing data dimensionality and redundancy. This is achieved by extracting features that are meaningful for model prediction, ultimately leading to improvements in model performance and interpretability [1]. In the realm of machine learning and data mining, FS has found widespread application in solving practical problems, including extracting fetal electrocardiogram signals [2], gender detection from voice data [3], biological data analysis [4], and intelligent facial emotion recognition [5]. The implementation of effective FS significantly improves the learning efficiency and prediction accuracy of models.

The FS problem is known to be NP-hard, with the search space growing exponentially as the dimensionality increases, making exhaustive search impractical. To enhance the search efficiency of FS algorithms, numerous methods have been proposed. FS methods are typically categorized into three groups: filter methods, wrapper methods, and embedded methods [6]. Among these, the wrapper method is widely favored for its superior classification ability, and it is the focus of this work. A wrapper method primarily comprises three components: a classifier, a feature subset evaluation, and a search technique [7]. Of these, an effective search technique is vital to the performance of FS algorithms. It is noteworthy that metaheuristic (MH) methods such as Particle Swarm Optimization (PSO) [8], Differential Evolution (DE) [9], Genetic Algorithm (GA) [10], Artificial Bee Colony Algorithm (ABC) [11], Harris Hawk Optimization (HHO) [12], Whale Optimization Algorithm (WOA) [13], Moth-Flame Optimization (MFO) [14], Snake Optimizer (SO) [15], Binary Improved ChOA Algorithm (BICHOA) [16], Binary Improved White Shark Optimizer (BIWSO) [17], Opposition-based Sine Cosine Algorithm (IBSCA) [18], and Improved Binary DJaya Algorithm (IBJA) [19], among others, have been extensively utilized in the realm of FS. MH methods are also widely used in other fields, for example, structural optimization [20,21], the economic load dispatch problem [22], the patient admission scheduling problem [23], intrusion detection systems [24], and point cloud registration [25]. Further studies of notable research value include a metalearning-based alternating minimization algorithm [26], HyGloadAttack [27], a novel method for reliability optimization [28], topology optimization [29], a multi-objective robust optimization model [30], a computational intelligence-based classification system [31], near-miss prediction [32], SLNL [33], multi-character classification [34], and TMFF [35].

The MH method offers four key advantages in addressing such issues [36]: simplicity, flexibility, absence of a derivation mechanism, and the capability to evade local optima. Typically, the MH algorithm's search process comprises two stages [37]: the initial stage involves exploration, while the subsequent stage involves exploitation. During the exploration phase, the algorithm thoroughly navigates the search space to uncover diverse solutions to the problem. In the exploitation phase, the algorithm leverages local information to generate improved solutions, typically in the proximity of the current solution. Excessive exploration slows down the algorithm's convergence, while excessive exploitation raises the risk of getting trapped in local optima. Striking a balance between exploration and exploitation is a primary objective in algorithm design to achieve optimal performance.

The Grey Wolf Optimizer (GWO) is a meta-heuristic optimization algorithm inspired by the collective hunting behavior of grey wolves [38]. By emulating the social hierarchy and hunting strategies of grey wolves, GWO can effectively balance global search and local exploitation. However, when confronted with high-dimensional and intricate optimization problems, the original GWO algorithm is susceptible to getting trapped in local optima and demonstrates certain limitations in terms of convergence speed and accuracy. As a result, various GWO variants have been proposed by researchers, and they have been successfully applied across a range of real-world fields, as detailed in Table 1.

As per the No Free Lunch Theorem (NFL) [54], it is acknowledged that no single optimization algorithm excels across all problems. Hence, it is imperative to enhance and fine-tune algorithms to suit specific problems. This paper introduces a new variant of the GWO to improve its FS ability, addressing its current limitations. The key contributions of this paper are outlined below:

  (1) A new nonlinear parameter control strategy is introduced to effectively balance exploration and exploitation.
  (2) An adaptive fitness distance balance mechanism is proposed to accelerate the convergence of the algorithm.
  (3) An adaptive neighborhood mutation mechanism is designed to fully consider the information exchange between the α, β, and δ wolves and the current global optimal solution, allowing the algorithm to explore the solution space more effectively and escape local optima.
  (4) Fifteen datasets are used to test the performance of the AMGWO algorithm, verifying its effectiveness and robustness.

The paper is organized as follows: Section 2 outlines the FS problem and explains the fundamental principles of the original GWO. Section 3 provides a detailed introduction to the AMGWO algorithm. The experimental setup is described in Section 4. Section 5 presents comprehensive experiments and analysis to verify the effectiveness and performance advantages of the AMGWO algorithm in FS. Lastly, Section 6 concludes the manuscript and discusses future research directions and potential applications.

2. Preliminaries

In this section, we briefly review the feature selection problem and the original grey wolf optimizer.

2.1 Feature selection

Consider a dataset S containing D features and L samples. The FS problem aims to choose d features (where d < D) from the entire feature set in order to optimize an objective function f(·). For practical reasons, a solution to the FS problem, X, is often represented as a binary string, with each bit indicating whether a feature is chosen for model construction.

X = (x_1, x_2, …, x_D), x_i ∈ {0, 1} (1)

where x_i = 1 means that the ith feature is selected in the subset X; otherwise it is not selected.

In FS problems, f(·) often refers to the accuracy or error rate of a classification. If f(·) represents the classification error rate, the FS problem can be expressed as the minimization problem (2).

min f(X) = ErrorRate(X), subject to x_i ∈ {0, 1}, i = 1, 2, …, D (2)
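As a minimal illustration of this binary encoding, the following Python sketch (the helper name `select_features` is ours, not from the paper) applies a binary solution vector to one sample:

```python
def select_features(sample, solution):
    """Keep the feature values of one sample whose bits are set in the
    binary FS solution (bit i = 1 means feature i is selected)."""
    assert len(sample) == len(solution)
    return [value for value, bit in zip(sample, solution) if bit == 1]

# One sample with D = 5 features and a solution selecting d = 3 of them.
sample = [0.2, 1.5, 3.1, 0.7, 2.4]
solution = [1, 0, 1, 0, 1]
subset = select_features(sample, solution)  # [0.2, 3.1, 2.4]
```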

2.2 Gray Wolf Optimizer

Proposed by Mirjalili et al. [38], the Grey Wolf Optimizer is based on the social hierarchy and hunting behavior of grey wolf populations. As illustrated in Fig 1, the grey wolf population adheres to a strict hierarchy consisting of four levels: α, β, δ, and ω. Among them, α wolves, as the supreme leaders, significantly influence crucial aspects such as hunting and habitat selection. β wolves assume a secondary role, working closely with α wolves, and providing crucial support in decision-making, planning group actions, and hunting strategies. δ wolves closely follow, guarding territorial boundaries and alerting the pack when threatened, while following the lead of both α and β wolves. Although ω wolves may seem to have a lower status, they play an essential role in maintaining the pack’s internal balance under the guidance of the α, β, and δ wolves.

In the GWO mathematical model, the optimal solution can be likened to the α wolf, the sub-optimal solution to the β wolf, the third optimal solution to the δ wolf, and the remaining candidate solutions to ω wolves. By utilizing the location information of α, β, and δ wolves as search guidance, we can direct the entire search process towards the optimal solution. This simulates a cooperation mechanism among them, leading to an effective solution for the complex problem.

The behavior of the grey wolf while searching for prey is abstracted as formulas (3) and (4).

D = |C · X_p(t) − X(t)| (3)

X(t+1) = X_p(t) − A · D (4)

where D represents the distance between the prey and the wolf during the search process, and A and C represent the coefficient vectors, which are calculated by formula (5) and formula (6), respectively. X_p(t) represents the current position of the prey, X(t) represents the position of the grey wolf in the current iteration, and X(t+1) represents the new position of the grey wolf in the next iteration.

A = 2a · r_1 − a (5)

C = 2 · r_2 (6)

where the coefficient a decreases linearly from 2 to 0 over the iterations, and r_1 and r_2 are random vectors with components drawn uniformly from [0, 1].

In summary, the mathematical model of the entire hunting behavior of the grey wolf is as follows:

D_α = |C_1 · X_α − X|, D_β = |C_2 · X_β − X|, D_δ = |C_3 · X_δ − X| (7)

X_1 = X_α − A_1 · D_α, X_2 = X_β − A_2 · D_β, X_3 = X_δ − A_3 · D_δ (8)

X(t+1) = (X_1 + X_2 + X_3) / 3 (9)

It is worth emphasizing that when |A| > 1, the grey wolf moves farther away from the prey, which helps the algorithm explore the search space more widely. On the other hand, if |A| < 1, the grey wolf closely surrounds the prey, promoting rapid convergence to the global optimum. Clearly, the dynamic adjustment of the grey wolf's position is guided by the three top ranks: α, β, and δ. Fig 2 shows the flow diagram of the GWO.
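The encircling and hunting model above can be sketched in Python as follows; this is an illustrative continuous-space implementation under our own naming, not the authors' code:

```python
import random

def gwo_step(wolves, alpha, beta, delta, a):
    """One GWO position update: every wolf moves to the average of three
    positions guided by the alpha, beta, and delta wolves (Eqs. (3)-(9))."""
    dim = len(alpha)
    updated = []
    for x in wolves:
        new_x = []
        for j in range(dim):
            guided = []
            for leader in (alpha, beta, delta):
                A = 2 * a * random.random() - a      # coefficient A, Eq. (5)
                C = 2 * random.random()              # coefficient C, Eq. (6)
                D = abs(C * leader[j] - x[j])        # distance to leader, Eq. (7)
                guided.append(leader[j] - A * D)     # candidate position, Eq. (8)
            new_x.append(sum(guided) / 3.0)          # average of X1, X2, X3, Eq. (9)
        updated.append(new_x)
    return updated
```

Note that when a = 0 (the final iteration), A vanishes and every wolf lands exactly on the centroid of the three leaders.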

3. The proposed AMGWO method

In this section, we introduce a novel Grey Wolf Optimizer with adaptive mechanisms, incorporating three primary adaptive mechanisms: the adaptive parameter control mechanism (APCGWO), the adaptive fitness distance balance mechanism (AFDBGWO), and the adaptive neighborhood mutation mechanism (ADVGWO).

3.1 Adaptive parameter control mechanism (APCGWO)

Controlling the parameter a nonlinearly is an effective strategy for balancing the exploration and exploitation phases [55]. In the original GWO, the slow convergence rate or low convergence accuracy is partly due to the linear reduction of the convergence factor. To tackle this issue, this paper introduces a new nonlinear reduction strategy, presented in Eq. (10). As depicted in Fig 3, the convergence factor decreases nonlinearly from 2 to 0. The graph shows that at the start of the iteration process the curve is flat, indicating effective exploration of the entire search space by the AMGWO algorithm; in later iterations the curve decays rapidly, signifying quick convergence of the algorithm.

(10)
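Since Eq. (10) itself is not reproduced above, the snippet below only illustrates the qualitative shape described in the text, namely a factor that stays close to 2 early and decays rapidly late, using a hypothetical power-law decay of our own choosing, not the paper's actual formula:

```python
def nonlinear_a(t, t_max, k=4):
    """Hypothetical nonlinear convergence factor decaying from 2 to 0:
    flat at the start (exploration), fast decay near the end (exploitation).
    Illustrative only; NOT the paper's Eq. (10)."""
    return 2.0 * (1.0 - (t / t_max) ** k)
```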

3.2 Adaptive fitness distance balancing mechanism (AFDBGWO)

The Fitness Distance Balancing (FDB) method focuses on balancing the fitness of a solution candidate with its distance from the current optimal solution in the search space [56]. This strategy aims to mitigate the early-convergence issue of metaheuristic search algorithms and to enhance search efficiency by selecting solution candidates with high potential. In this context, the fitness F typically denotes the quality of the solution; for minimization problems, a solution with higher fitness corresponds to a lower objective function value. The distance metric D_i captures the distance between the ith candidate and the best solution in the current population. This paper employs the Euclidean distance for calculation purposes:

D_i = sqrt( Σ_{j=1}^{D} ( x_{i,j} − x_{best,j} )² ), i = 1, 2, …, N (11)

where x_{i,j} represents the position coordinates of the ith solution candidate, x_{best,j} denotes the position coordinates of the current optimal solution, and N is the population size.

The FDB method computes a score for each solution candidate that combines both the fitness and distance factors. The score can be calculated as follows:

S_i = normF_i + normD_i (12)

where normF_i and normD_i represent the normalized values of fitness and distance, respectively. The solution candidate with the highest FDB score is selected for the subsequent search operation during the selection process. This allows the algorithm to maintain a certain degree of exploration through the distance values while also leveraging known better solutions through the fitness values. The FDB method effectively enhances the performance of metaheuristic algorithms by jointly considering the fitness and diversity of solution candidates, particularly in complex problems with multiple local optima. In this paper, we introduce the FDB mechanism to redesign the GWO algorithm and incorporate an adaptive strategy. The improved grey wolf step update method is as follows:

(13)

where the improved grey wolf step update adaptively chooses, via formula (14) and according to the current iteration number, between the original update method and a step update guided by the individual with the highest FDB score. This not only maintains population diversity but also improves the convergence speed and accuracy of the algorithm.

(14)

Here, t denotes the current iteration number, and X_FDB denotes the individual with the highest FDB score.
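The FDB scoring step can be sketched as follows for a minimization problem; equal weighting of the two normalized terms is our assumption, and the function name is ours:

```python
import math

def fdb_scores(population, fitnesses):
    """FDB score of each candidate: normalized solution quality (lower
    error is better) plus normalized Euclidean distance to the current
    best solution, equal weights assumed."""
    best = population[fitnesses.index(min(fitnesses))]
    dists = [math.dist(x, best) for x in population]
    f_lo, f_hi = min(fitnesses), max(fitnesses)
    d_lo, d_hi = min(dists), max(dists)
    eps = 1e-12  # guard against a degenerate (all-equal) population
    norm_f = [(f_hi - f) / (f_hi - f_lo + eps) for f in fitnesses]
    norm_d = [(d - d_lo) / (d_hi - d_lo + eps) for d in dists]
    return [nf + nd for nf, nd in zip(norm_f, norm_d)]
```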

3.3 Adaptive neighborhood mutation mechanism (ADVGWO)

In order to enhance the global search capabilities of the GWO algorithm and prevent the occurrence of local optima in later iterations, this study introduces an adaptive neighborhood mutation mechanism. This mechanism takes into account the information exchange between the α, β, and δ wolves and the current global optimal solution. Its goal is to dynamically adjust the mutation intensity during the search process, enabling the algorithm to explore the solution space more effectively and avoid local optima. Initially, a mutation intensity factor is defined to measure the disparity between the different ranks of wolves and the global optimal solution, as shown in Eq. (15). Subsequently, utilizing the calculated mutation intensity, the mutation operation is performed in the neighborhood of each rank of wolves using Eq. (16). This step is designed to dynamically modify the search direction and step size of the wolves based on their relative positions to the global optimal solution, thereby achieving adaptive neighborhood mutation. This mechanism not only strengthens the algorithm's exploration capability but also maintains the efficiency of utilizing known information, offering a novel approach for addressing intricate optimization problems.

(15)(16)

where N(0, 1) denotes a standard normal random number. In summary, the pseudocode of AMGWO is shown in Algorithm 1.
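A minimal sketch of the idea follows; the exact Eqs. (15)-(16) are not reproduced above, so the intensity definition below is our illustrative assumption, not the authors' formula:

```python
import random

def neighborhood_mutation(leader, global_best, scale=1.0):
    """Mutate a leader wolf (alpha/beta/delta) in its neighborhood with an
    intensity that grows with its gap to the current global best, so wolves
    far from the best explore more aggressively. Illustrative sketch only."""
    mutated = []
    for lj, gj in zip(leader, global_best):
        intensity = scale * abs(lj - gj)                         # sketch of Eq. (15)
        mutated.append(lj + intensity * random.gauss(0.0, 1.0))  # sketch of Eq. (16)
    return mutated
```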

Algorithm 1 AMGWO

1:    Begin

2:    Initialize the AMGWO parameters

3:    Calculate the fitness of each grey wolf

4:    Select α, β, and δ

5:    For t=1:Tmax

6:      The adaptive parameters are updated by Eq. (10)

7:      Step size updates by Eq. (13)

8:      Grey wolf adaptive neighborhood mutation by Eq. (16)

9:      For each grey wolf

10:       Change the positions by Eq. (9)

11:      End for

12:      Calculate the fitness of each grey wolf

13:      Update α, β, δ wolf

14:     t=t+1

15:    End for

16:    Return best solution

17:    End

// Tmax denotes the maximum number of iterations

3.4 Time complexity analysis

For the same problem, we have a range of algorithmic solutions, each with its own set of advantages and disadvantages directly impacting execution efficiency and overall program performance [57]. The primary objective of algorithm analysis is to pinpoint the most suitable and optimized algorithm. In this study, we utilize Big O notation as an assessment tool to gauge the time complexity of the algorithm [58,59], and perform a comparative analysis between AMGWO and GWO.

The specific application steps of the Big O order method [60] are briefly described as follows:

Simplify constant terms: First, all additive constants in the running time of the algorithm are uniformly simplified to the constant 1, for convenience in subsequent analysis.

Keep the highest-order term: In the adjusted running-time function, only the highest-order term, which has the greatest impact on the complexity, is kept; lower-order terms are ignored.

Remove non-essential constants: If the coefficient of the highest-order term is not 1, the coefficient is removed and only the order itself is retained in the Big O representation.

Following the above steps, the time complexity of GWO is O(N × D × T), where N, D, and T represent the population size, the dimension, and the number of iterations, respectively. It is worth noting that the calculation process of AMGWO introduces no new loop structure and makes no fundamental changes to the original loop order. Therefore, its time complexity is consistent with that of GWO, which is also O(N × D × T).

4. Experimental setup

In this section, we first introduce the transformation function. Second, we elaborate on the fitness function design. Next, we introduce the datasets and evaluation metrics selected in this paper. Finally, we report the parameter settings of the chosen competitors.

4.1 Transformation functions

The traditional GWO excels at solving continuous search space problems. However, it encounters challenges when applied to problems such as FS, which involve binary optimization with solutions restricted to binary values {0,1}. To address this issue, researchers have introduced the concept of a transformation function, aimed at adapting the original capabilities of the Grey Wolf algorithm in the continuous search space to the binary domain. Among various transformation functions, the sigmoid function σ(⋅) stands out due to its unique properties. This function converts continuous input values into discrete values by mapping them to the range (0, 1) and then thresholding them to binary values (typically using 0.5 as the threshold). It is important to note that this approach has been proven effective in numerous GWO variants [41,42]. In our study, we similarly employed the sigmoid function as the transformation tool. The transformation function is defined as follows:

x_{i,j}(t+1) = 1 if σ(x_{i,j}(t)) > r, otherwise 0 (17)

where x_{i,j} represents the continuous value corresponding to the ith search agent in dimension j, and r is a random number between 0 and 1. The function σ(⋅) is a concrete instance of the s-shaped (sigmoid) functions [61]. Its mathematical expression is as follows:

σ(x) = 1 / (1 + e^(−x)) (18)
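A minimal Python sketch of this sigmoid-based binarization (function names are ours) is:

```python
import math
import random

def sigmoid(x):
    """Standard sigmoid, mapping a continuous value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng=random.random):
    """Map a continuous wolf position to a binary FS solution: bit j is
    set when the sigmoid of component j exceeds a random threshold."""
    return [1 if sigmoid(xj) > rng() else 0 for xj in position]
```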

4.2 Fitness function

In our experiments, we utilized the wrapper technique to execute each FS algorithm. The wrapper approach offers the distinct advantage of using the classifier's performance as the criterion for selecting features, resulting in an efficient and accurate FS process. To enhance the quality of the selected features, we specifically opted for a wrapper method based on the KNN classifier [62], with K set to 5 (K = 5) to achieve superior classification results. To mitigate the risk of overfitting, we employed 10-fold cross-validation on the real datasets [63]. In wrapper FS methods, the fitness function directly mirrors classification performance. The specific fitness function used in this paper is given in Eq. (19).

Fitness = (1/K) · Σ_{i=1}^{K} ErrorRate_i (19)

where ErrorRate_i represents the classification error rate of the ith fold. Its mathematical model is as follows:

ErrorRate = (number of misclassified samples) / (total number of samples) (20)
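A minimal sketch of this evaluation follows, under our reading that the fitness is the mean classification error rate over the K cross-validation folds; the names are ours, and the paper's exact fitness formula may additionally include a feature-size term:

```python
def error_rate(y_true, y_pred):
    """Classification error rate: fraction of misclassified samples."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def fitness(fold_error_rates):
    """Fitness of a feature subset: mean error rate over the K folds."""
    return sum(fold_error_rates) / len(fold_error_rates)
```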

4.3 Datasets

In order to evaluate the performance of the proposed FS algorithm, we conducted experiments using 15 benchmark datasets. Table 2 provides an overview of the key parameters of these datasets, including the number of features, samples, and categories. These datasets are characterized by high dimensionality and small sample size, which are commonly encountered in FS literature. They have been sourced from Arizona State University and Jilin University and encompass a wide range of data types, such as microarray gene expression, image (face) detection, and email text. It’s worth noting that the datasets have been pre-processed by their respective providers.

4.4 Evaluation metrics

In order to ensure the reliability of the FS results, we performed 10 independent runs to evaluate the FS method. Our focus was on assessing classification accuracy and feature subset dimension. It’s important to note that in our experiments, each FS method produced a distinct feature subset for each dataset in every run. To validate the effectiveness of our proposed method, we employed a range of quantitative indicators for evaluation.

  (1) Best: the minimum classification error rate / feature subset size over all solutions obtained in the 10 runs.
  (2) Worst: the maximum classification error rate / feature subset size over all solutions obtained in the 10 runs.
  (3) Mean [64]: the average classification error rate / feature subset size over all solutions obtained in the 10 runs.
  (4) Standard deviation (Std): computed over the set of all final solutions obtained in the 10 runs. It is an important indicator of the stability and robustness of the optimizer, with the mathematical expression:

Std = sqrt( (1/n) · Σ_{i=1}^{n} ( g_i − ḡ )² ) (21)

where n is the number of independent runs of the algorithm on the FS problem, g_i is the best solution obtained in the ith run, and ḡ is the mean of the g_i.
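These four indicators can be computed directly from the per-run results; a small sketch (function name ours), using the population (1/n) standard deviation as one common convention:

```python
import statistics

def summarize_runs(results):
    """Best / Worst / Mean / Std over the final results of repeated
    independent runs (e.g., 10 classification error rates)."""
    return {
        "Best": min(results),
        "Worst": max(results),
        "Mean": statistics.mean(results),
        "Std": statistics.pstdev(results),  # population (1/n) standard deviation
    }
```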

4.5 Competitor parameter settings

In order to evaluate the performance of the proposed AMGWO algorithm, we conducted a comparative analysis with six competing algorithms, namely the original GWO [38] and five of its variants: GNHGWO [65], BABCGWO [1], SOGWO [66], EGWO [67], and AGWO [68]. The parameter settings of these optimization algorithms are detailed in Table 3. The maximum number of iterations and the population size were standardized at 100 and 30, respectively. Each algorithm was independently executed 30 times to ensure a fair comparison. Subsequently, the standard deviation (Std), best, worst, and mean values were calculated and recorded, with the best result highlighted in bold.

5. Results and discussion

5.1 Comparison with competitor algorithms

In this section, the proposed AMGWO is compared with six other competitive algorithms in terms of classification error rate, feature subset size, and running time. This comparison aims to verify the effectiveness and efficiency of AMGWO in the context of FS for classification tasks.

5.1.1 Accuracy analysis.

The classification error rates of AMGWO and six other competing algorithms are presented in Table 4. The superior results are highlighted in bold. Across most datasets, AMGWO outperforms the other algorithms in terms of classification error rates. The smaller standard deviation of the classification error rate indicates that the algorithm is more stable and robust. Furthermore, the complexity of the problem is directly linked to its dimensionality. As the problem dimensionality increases, AMGWO exhibits the statistically smallest error rate, suggesting that its classification performance is least affected by higher dimensionality. To further illustrate the effectiveness of AMGWO in terms of classification error rate, we have included boxplots of all algorithms in Fig 4. It is evident from the boxplot positions that AMGWO outperforms the other algorithms in terms of stability and robustness.

Table 4. Comparison of classification error rates for different competitor algorithms, with the top-ranked results shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t004

Fig 4. Boxplots of classification error rates for all algorithms.

https://doi.org/10.1371/journal.pone.0318903.g004

5.1.2 Feature number analysis.

The results for the feature subset sizes of AMGWO and six other competing algorithms are detailed in Table 5. The best results are indicated in bold. In terms of feature subset size, AMGWO achieves the smallest feature subset on 11 datasets, and its average feature subset size is the best on 13 datasets. Furthermore, its standard deviation is minimal on most datasets across varying dimensions. Fig 5 displays a boxplot based on the feature subset size results. The boxplot clearly shows that AMGWO sits lower than the other algorithms, indicating its effectiveness in reducing the feature subset size.

Table 5. Comparison of feature subset sizes for different competitor algorithms, and the top-ranked results are shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t005

Fig 5. Boxplots of feature subset size for all algorithms.

https://doi.org/10.1371/journal.pone.0318903.g005

5.1.3 Convergence curve analysis.

The convergence curves of the seven algorithms on the 15 datasets are displayed in Fig 6. Each curve represents the average outcome of 10 runs for every iteration. These curves illustrate that AMGWO exhibits faster convergence and delivers higher-quality solutions compared to its counterparts across most datasets. While AMGWO may yield lower-quality results than certain competing algorithms on the Prostate, Colon, and ALLAML datasets, it demonstrates superior convergence speed on these datasets. Overall, AMGWO outperforms the other six competing algorithms in terms of both convergence speed and solution quality.

5.1.4 Runtime analysis.

When dealing with high-dimensional datasets, the computational cost of FS algorithms mainly focuses on the evaluation of individual feature subsets. As shown in Fig 7, the running time of the AMGWO algorithm is lower than that of other competitors on most datasets, benefiting from the fact that each feature subset is evaluated only once per iteration in the AMGWO algorithm. Therefore, in terms of computation time, AMGWO outperforms the other algorithms.

Fig 7. The average running time of all competing algorithms.

https://doi.org/10.1371/journal.pone.0318903.g007

5.2 Statistical tests

To further verify the performance of AMGWO, we performed the Wilcoxon rank sum test and the Friedman test on the classification error rate and feature subset size results obtained by the competitor algorithms.

5.2.1 Wilcoxon rank sum test.

We utilized the Wilcoxon rank sum test [69] to compare the classification error rate and feature subset size results achieved by AMGWO and other competing methods, as presented in Table 6 and Table 7, respectively. A P-value lower than 0.05 indicates a significant difference between AMGWO and the other algorithms. Conversely, if the P-value is higher, there is no significant difference, and these non-significant results are highlighted in bold. It is evident from the tables that AMGWO exhibits notable distinctions from the competing algorithms across most datasets. Based on the analysis of the preceding experiments, it is evident that AMGWO significantly outperforms its competitors in terms of overall performance.

Table 6. The Wilcoxon rank sum test for classification error rates, where no difference is shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t006

Table 7. Wilcoxon rank sum test for feature subset sizes, where no difference is shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t007

5.2.2 Friedman rank test.

We used the Friedman rank test [70] to compare the classification error rate and feature subset size results achieved by AMGWO with those of other competing methods. The outcomes are detailed in Table 8 and Table 9, respectively. While AMGWO may not have secured the top rank in certain datasets, it boasts the lowest Friedman mean value across 15 datasets and emerges as the overall top performer. This underscores the superior performance of our proposed AMGWO over other algorithms considered in the selected datasets.

Table 8. Friedman test for classification error rate, ranked first results are shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t008

Table 9. Friedman test for feature subset size, ranking first results are shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t009

5.3 Ablation experiment

In order to assess the effectiveness of each proposed strategy, we tested APCGWO, AFDBGWO, ADVGWO, the original GWO, and AMGWO (which fuses all three strategies) in terms of accuracy and feature-number analysis. The experimental results are detailed in Table 10 and Table 11, respectively. Table 10 indicates that ADVGWO ranks second in the accuracy results, with a higher frequency of bold entries, while AMGWO, the fusion of the three strategies, secures the top position. The last row of the table also shows that AMGWO has the smallest average Friedman rank, signifying superior performance. In Table 11, ADVGWO also ranks second in the feature-number results, again with a higher frequency of bold entries, while AMGWO maintains its top position. The average Friedman rank of AMGWO in the last row is 2.2, indicating its superior performance compared to the other competing algorithms.

Table 10. Comparison of classification error rates for ablation experiments, where the top-ranked results are shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t010

Table 11. Feature subset size comparison for ablation experiments, and the top ranked results are shown in bold.

https://doi.org/10.1371/journal.pone.0318903.t011

In summary, although a single improvement strategy can achieve good results on some datasets, AMGWO, which fuses all three strategies, delivers the best overall performance.

6. Summary and outlook

This study introduces an effective GWO-based FS method for classification. A comparative analysis against the original GWO and five advanced variants on 15 high-dimensional datasets demonstrates that the proposed AMGWO offers advantages in accuracy, convergence speed, and feature subset size. These advantages can be attributed to three key aspects:

  1. The incorporation of a nonlinear parameter control strategy that effectively balances exploration and exploitation.
  2. The introduction of an adaptive fitness distance balance mechanism that prevents premature convergence in the search process and selects high-potential solutions, thereby enhancing search efficiency.
  3. The development of an adaptive neighborhood mutation mechanism that exploits the information exchange between the α, β, and δ wolves and the current global best solution, enabling the algorithm to identify the global optimum more effectively.
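To make aspect (1) concrete, the sketch below contrasts the original GWO's linear decay of the control parameter a (from 2 to 0) with one possible nonlinear schedule. The cosine form used here is an illustrative stand-in, not necessarily AMGWO's exact formula: it keeps a larger early (favoring exploration) and shrinks it faster late (favoring exploitation).

```python
import math

def linear_a(t, t_max, a_init=2.0):
    """Original GWO: a decays linearly from a_init to 0 over t_max iterations."""
    return a_init * (1 - t / t_max)

def nonlinear_a(t, t_max, a_init=2.0):
    """Illustrative nonlinear (cosine) schedule: a stays above the linear
    curve in early iterations and drops below it near the end.
    This is a stand-in, not AMGWO's exact control formula."""
    return a_init * 0.5 * (1 + math.cos(math.pi * t / t_max))

t_max = 100
for t in (0, 25, 50, 75, 100):
    print(f"t={t:3d}  linear={linear_a(t, t_max):.3f}  "
          f"nonlinear={nonlinear_a(t, t_max):.3f}")
```

Both schedules start at 2 and end at 0, so the change only redistributes search effort over time rather than altering the algorithm's overall budget.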

While the proposed method has demonstrated its effectiveness on high-dimensional datasets, it has some limitations. For example, convergence can still be slow in some cases, and the algorithm may occasionally be trapped in local optima. In addition, population initialization and boundary control have not yet been investigated in depth. In future work, we intend to introduce a multi-objective version based on the GWO algorithm to tackle the multi-objective FS problem, with the goal of simultaneously maximizing classification performance and minimizing the number of selected features. Additionally, ensuring the algorithm's adaptability and computational efficiency across different scenarios remains a significant challenge in the field of FS.

References

  1. Zhang Y, Wang J, Li X, Huang S, Wang X. Feature selection for high-dimensional datasets through a novel artificial bee colony framework. Algorithms. 2021;14(11):324.
  2. Chai Q-W, Kong L, Pan J-S, Zheng W-M. A novel discrete artificial bee colony algorithm combined with adaptive filtering to extract fetal electrocardiogram signals. Expert Syst Appl. 2024;247:123173.
  3. Özbay FA, Özbay E. A new approach for gender detection from voice data: Feature selection with optimization methods. J Fac Eng Archit Gazi Univ. 2023;38:1179–92.
  4. Seyyedabbasi A. Binary sand cat swarm optimization algorithm for wrapper feature selection on biological data. Biomimetics. 2023;8.
  5. Zhang L, Mistry K, Neoh SC, Lim CP. Intelligent facial emotion recognition using moth-firefly optimization. Knowledge-Based Systems. 2016;111:248–67.
  6. Jiao R, Nguyen BH, Xue B, Zhang M. A survey on evolutionary multiobjective feature selection in classification: Approaches, applications, and challenges. IEEE Trans Evol Comput. 2023:1.
  7. Mafarja MM, Mirjalili S. Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing. 2017;260:302–12.
  8. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks. 1995. p. 1942–8, vol. 4.
  9. Storn R, Price K. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim. 1997;11(4):341–59.
  10. Holland JH. Genetic algorithms. Sci Am. 1992;267:66–73.
  11. Gao W, Liu S, Huang L. A global best artificial bee colony algorithm for global optimization. J Comput Appl Math. 2012;236(11):2741–53.
  12. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: Algorithm and applications. Future Gener Comput Syst Int J Sci. 2019;97:849–72.
  13. Mirjalili S, Lewis A. The whale optimization algorithm. Adv Eng Softw. 2016;95:51–67.
  14. Mirjalili S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowledge-Based Systems. 2015;89:228–49.
  15. Hashim FA, Hussien AG. Snake optimizer: A novel meta-heuristic optimization algorithm. Knowledge-Based Systems. 2022;242:108320.
  16. Al-qudah NEA, Abed-alguni BH, Barhoush M. Bi-objective feature selection in high-dimensional datasets using improved binary chimp optimization algorithm. Int J Mach Learn Cyber. 2024;15(12):6107–48.
  17. Alawad NA, Abed-alguni BH, Al-Betar MA, Jaradat A. Binary improved white shark algorithm for intrusion detection systems. Neural Comput Applic. 2023;35(26):19427–51.
  18. Abed-Alguni BH, Alawad NA, Al-Betar MA, Paul D. Opposition-based sine cosine optimizer utilizing refraction learning and variable neighborhood search for feature selection. Appl Intell (Dordr). 2023;53(11):13224–60. pmid:36247211
  19. Abed-alguni BH, AL-Jarah SH. IBJA: An improved binary DJaya algorithm for feature selection. J Comput Sci. 2024;75:102201.
  20. Mehta P, Kumar S, Tejani GG. MOBBO: A multiobjective brown bear optimization algorithm for solving constrained structural optimization problems. 2024;(2024):5546940.
  21. Mashru N, Tejani GG, Patel P, Khishe M. Optimal truss design with MOHO: A multi-objective optimization perspective. PLoS One. 2024;19(8):e0308474. pmid:39159240
  22. Alawad NA, Abed-alguni BH, El-ibini M. Hybrid snake optimizer algorithm for solving economic load dispatch problem with valve point effect. 2024:1–50.
  23. Alawad NA, Abed-alguni BH, Saleh II. Improved arithmetic optimization algorithm for patient admission scheduling problem. Soft Comput. 2023;28(7–8):5853–79.
  24. Barhoush M, Abed-alguni BH, Al-qudah NEA. Improved discrete salp swarm algorithm using exploration and exploitation techniques for feature selection in intrusion detection systems. J Supercomput. 2023;79(18):21265–309.
  25. Fu S, Ma C, Li K, Xie C, Fan Q, Huang H, et al. Modified LSHADE-SPACMA with new mutation strategy and external archive mechanism for numerical optimization and point cloud registration. Artif Intell Rev. 2025;58(3).
  26. Xia J-Y, Li S, Huang J-J, Yang Z, Jaimoukha IM, Gunduz D. Metalearning-based alternating minimization algorithm for nonconvex optimization. IEEE Trans Neural Netw Learn Syst. 2023;34(9):5366–80. pmid:35439147
  27. Liu Z, Xiong X, Li Y, Yu Y, Lu J, Zhang S, et al. HyGloadAttack: Hard-label black-box textual adversarial attacks via hybrid optimization. Neural Netw. 2024;178:106461. pmid:38906054
  28. Fan H, Wang C, Li S. Novel method for reliability optimization design based on rough set theory and hybrid surrogate model. Comput Methods Appl Mech Eng. 2024;429:117170.
  29. Wang Y, Han Z, Xu X, Luo Y. Topology optimization of active tensegrity structures. Comput Struct. 2024;305:107513.
  30. Xu X, Lin Z, Li X, Shang C, Shen Q. Multi-objective robust optimisation model for MDVRPLS in refined oil distribution. Int J Prod Res. 2021;60(22):6772–92.
  31. Zhu C. Computational intelligence-based classification system for the diagnosis of memory impairment in psychoactive substance users. J Cloud Comp. 2024;13(1).
  32. Zhou Z, Zhou X, Qi H, Li N, Mi C. Near miss prediction in commercial aviation through a combined model of grey neural network. Expert Syst Appl. 2024;255:124690.
  33. Huang H, Wu N, Liang Y, Peng X, Shu J. SLNL: A novel method for gene selection and phenotype classification. Int J Intell Syst. 2022;37(9):6283–304.
  34. Pan H, Wang Y, Li Z, Chu X, Teng B, Gao H. A complete scheme for multi-character classification using EEG signals from speech imagery. IEEE Trans Biomed Eng. 2024;71(8):2454–62. pmid:38470574
  35. Hu C, Zhao C, Shao H, Deng J, Wang Y. TMFF: Trustworthy multi-focus fusion framework for multi-label sewer defect classification in sewer inspection videos. IEEE Trans Circuits Syst Video Technol. 2024;34(12):12274–87.
  36. Fu S, Li K, Huang H, Ma C, Fan Q, Zhu Y. Red-billed blue magpie optimizer: a novel metaheuristic algorithm for 2D/3D UAV path planning and engineering design problems. Artif Intell Rev. 2024;57(6).
  37. Fu S, Huang H, Ma C, Wei J, Li Y, Fu Y. Improved dwarf mongoose optimization algorithm using novel nonlinear control and exploration strategies. Expert Syst Appl. 2023;233:120904.
  38. Mirjalili S, Mirjalili SM, Lewis A. Grey wolf optimizer. Adv Eng Softw. 2014;69:46–61.
  39. Emary E, Zawbaa HM, Hassanien AE. Binary grey wolf optimization approaches for feature selection. Neurocomputing. 2016;172:371–81.
  40. Zhao GB, Wang HY, Jia DL, Wang QB. Feature selection of grey wolf optimizer based on quantum computing and uncertain symmetry rough set. Symmetry-Basel. 2019;11.
  41. Wang J, Lin D, Zhang Y, Huang S. An adaptively balanced grey wolf optimization algorithm for feature selection on high-dimensional classification. Eng Appl Artif Intell. 2022;114:105088.
  42. Paharia N, Jadon RS, Gupta SK. Feature selection using improved multiobjective and opposition-based competitive binary gray wolf optimizer for facial expression recognition. J Electron Imag. 2022;31(03).
  43. Hong L, Wang G, Özcan E, Woodward J. Ensemble strategy using particle swarm optimisation variant and enhanced local search capability. Swarm Evol Comput. 2024;84:101452.
  44. Li H, Lv T, Shui Y, Zhang J, Zhang H, Zhao H, et al. An improved grey wolf optimizer with weighting functions and its application to unmanned aerial vehicles path planning. Comput Electr Eng. 2023;111:108893.
  45. He P, Wu W. Levy flight-improved grey wolf optimizer algorithm-based support vector regression model for dam deformation prediction. Front Earth Sci. 2023;11.
  46. Ou Y, Yin PF, Mo LP. An improved grey wolf optimizer and its application in robot path planning. Biomimetics. 2023;8.
  47. Chang D, Rao C, Xiao X, Hu F, Goh M. Multiple strategies based Grey Wolf Optimizer for feature selection in performance evaluation of open-ended funds. Swarm Evol Comput. 2024;86:101518.
  48. Premkumar M, Sinha G, Ramasamy MD, Sahu S, Subramanyam CB, Sowmya R, et al. Augmented weighted K-means grey wolf optimizer: An enhanced metaheuristic algorithm for data clustering problems. Sci Rep. 2024;14.
  49. Zhao D, Cai GR, Wang YX, Li XX. Path planning of obstacle-crossing robot based on golden sine grey wolf optimizer. Appl Sci (Basel). 2024;14.
  50. Lian Z, Shu J, Zhang Y, Sun J. Convergent grey wolf optimizer metaheuristics for scheduling crowdsourcing applications in mobile edge computing. IEEE Internet Things J. 2024;11(2):1866–79.
  51. Zhang HZ, Zhang Y, Niu YX, He K, Wang YK. A Grey wolf optimizer combined with Artificial fish swarm algorithm for engineering design problems. Ain Shams Eng J. 2024;15.
  52. Zhu Z, Sun Z, Xie X, Sun Z. Improved grey wolf optimizer based on neighborhood trust model for parameter identification of PEMFC. Int J Hydrogen Energy. 2024;60:769–79.
  53. Liang J, Du Y, Xu Y, Xie B, Li W, Lu Z, et al. Using adaptive chaotic grey wolf optimization for the daily streamflow prediction. Expert Syst Appl. 2024;237:121113.
  54. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Computat. 1997;1(1):67–82.
  55. Ozsoydan FB. Artificial search agents with cognitive intelligence for binary optimization problems. Comput Ind Eng. 2019;136:18–30.
  56. Kahraman HT, Aras S, Gedikli E. Fitness-distance balance (FDB): A new selection method for meta-heuristic search algorithms. Knowl-Based Syst. 2020;190:105169.
  57. Tallini LG, Pelusi D, Mascella R, Pezza L, Elmougy S, Bose B. Efficient non-recursive design of second-order spectral-null codes. IEEE Trans Inform Theory. 2016;62(6):3084–102.
  58. Gupta S, Deep K, Heidari AA, Moayedi H, Wang M. Opposition-based learning Harris hawks optimization with advanced transition rules: principles and analysis. Expert Syst Appl. 2020;158:113510.
  59. Li K, Huang H, Fu S, Ma C, Fan Q, Zhu Y. A multi-strategy enhanced northern goshawk optimization algorithm for global optimization and engineering design problems. Comput Methods Appl Mech Eng. 2023;415:116199.
  60. Pelusi D, Mascella R, Tallini L, Nayak J, Naik B, Abraham A. Neural network and fuzzy system for the tuning of Gravitational Search Algorithm parameters. Expert Syst Appl. 2018;102:234–44.
  61. Mirjalili S, Lewis A. S-shaped versus V-shaped transfer functions for binary particle swarm optimization. Swarm Evol Comput. 2013;9:1–14.
  62. Liao Y, Vemuri VR. Use of K-Nearest Neighbor classifier for intrusion detection. Comput Secur. 2002;21:439–48.
  63. Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell. 1997;97(1–2):273–324.
  64. Tilahun S, Hong C, Ong HC. Prey-predator algorithm: A new metaheuristic algorithm for optimization problems. Int J Inf Technol Decis Mak. 2015;14.
  65. Akbari E, Rahimnejad A, Gadsden S. A greedy non‐hierarchical grey wolf optimizer for real‐world optimization. Electron Lett. 2021.
  66. Dhargupta S, Ghosh M, Mirjalili S, Sarkar R. Selective opposition based grey wolf optimization. Expert Syst Appl. 2020;151:113389.
  67. Joshi H, Arora S. Enhanced Grey Wolf Optimization Algorithm for Global Optimization. Fundam Inform. 2017;153(3):235–64.
  68. Qais MH, Hasanien HM, Alghuwainem S. Augmented grey wolf optimizer for grid-connected PMSG-based wind energy conversion systems. Appl Soft Comput. 2018;69:504–15.
  69. Gao YS, Zhang JH, Wang YL, Wang JP, Qin L. Love evolution algorithm: A stimulus-value-role theory-inspired evolutionary algorithm for global optimization. J Supercomput. 2024.
  70. Jia H, Lu C, Xing Z. Memory backtracking strategy: An evolutionary updating mechanism for meta-heuristic algorithms. Swarm Evol Comput. 2024;84:101456.