Abstract
Feature selection is a crucial preprocessing step in machine learning, data mining and pattern recognition. In medical data analysis, the large number and complexity of features are often accompanied by redundant or irrelevant features, which not only increase the computational burden but may also lead to model overfitting, degrading generalization ability. To address this problem, this paper proposes an improved red-billed blue magpie algorithm (IRBMO), specifically optimized for the feature selection task, which significantly improves performance and efficiency on medical data by introducing multiple innovative behavioral strategies. The core mechanisms of IRBMO are: elite search behavior, which improves global optimization by guiding the search toward more promising directions; collaborative hunting behavior, which quickly identifies key features and promotes collaborative optimization among feature subsets; and memory storage behavior, which leverages historically valid information to improve search efficiency and accuracy. To adapt the algorithm to the feature selection problem, we convert the continuous optimizer to binary form via a transfer function, further broadening its applicability. To comprehensively verify the performance of IRBMO, this paper designs a series of experiments comparing it with nine mainstream binary optimization algorithms. The experiments are based on 12 medical datasets, and the results show that IRBMO achieves the best overall performance on key metrics such as fitness value, classification accuracy and specificity. In addition, compared with nine existing feature selection methods, IRBMO demonstrates significant advantages in terms of fitness value.
To further enhance performance, this paper also constructs the V2IRBMO variant by combining S-shaped and V-shaped transfer functions, strengthening the robustness and generalization ability of the algorithm. Experiments demonstrate that IRBMO exhibits high efficiency, generality and excellent generalization ability in feature selection tasks. In addition, when used in conjunction with a KNN classifier, IRBMO significantly improves classification accuracy, with an average accuracy improvement of 43.89% on the 12 medical datasets compared to the original red-billed blue magpie algorithm. These results demonstrate the potential and wide applicability of IRBMO in feature selection for medical data.
Citation: Zhu C, Wang Z, Peng Y, Xiao W (2025) An improved Red-billed blue magpie feature selection algorithm for medical data processing. PLoS One 20(5): e0324866. https://doi.org/10.1371/journal.pone.0324866
Editor: Vedik Basetti, SR University, INDIA
Received: January 14, 2025; Accepted: May 1, 2025; Published: May 22, 2025
Copyright: © 2025 Zhu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
With the arrival of the data era, data has become deeply integrated into people’s lives and an indispensable element of them. Among the many data types, medical data [1] has attracted much attention due to its rich life and health information and its important role in disease exploration, diagnosis and treatment. However, the high-dimensional complexity and diversity of healthcare data pose the serious challenge of the “curse of dimensionality” [2]. The presence of high-dimensional features not only complicates data processing but can also diminish the accuracy and reliability of the analysis outcomes. Conventional approaches often fall short when handling high-dimensional medical data, making it challenging to fully uncover its underlying potential. Therefore, optimizing the processing and analysis of high-dimensional medical data has become a key issue in the field of data science [3]. Data mining [4] techniques extract useful information and knowledge from massive data through statistics, machine learning and artificial intelligence to support decision making, predict trends and discover potential value. However, a massive number of feature subsets may exist in high-dimensional data, making it impractical to find the optimal subset directly. In this context, data dimensionality reduction techniques have emerged, which fall mainly into two categories: feature extraction [5] and feature selection [6].
Feature selection [6] significantly improves the performance, interpretability, and computational efficiency of machine learning models by identifying and retaining the most valuable features in the data, which is particularly important in areas such as medical diagnosis and fault prediction. Different application scenarios have different requirements for feature selection, which calls for customized methods that ensure high accuracy and minimize information loss based on the specific problem and data characteristics. Feature selection [7] eliminates redundant and irrelevant features through statistical analysis, model evaluation, or algorithmic mechanisms to reduce noise and improve the generalization ability of the model. Its methods fall into three main types: filter [8], embedded [9] and wrapper [10], each with its own advantages and applicable scenarios, providing diversified solutions for high-dimensional data processing. Filter feature selection [8] screens key features by measuring the statistical correlation or dependence between features and the target variable, e.g., by utilizing statistical metrics such as the correlation coefficient, variance and mutual information. The method is computationally simple, efficient in execution, and particularly good at handling large-scale datasets. However, its limitation is that evaluating each feature in isolation ignores potential interaction effects among features, which may lead to the erroneous deletion of features that perform well in particular combinations. Embedded feature selection [9], on the other hand, integrates the feature selection mechanism into the training process of the learner, and leverages the machine learning algorithm’s own properties (e.g., regularization techniques) to automatically filter the optimal subset of features. This approach performs well for features with linear relationships, but it struggles with nonlinear relationships, and its interpretability is limited.
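Filter-style screening of the kind described above can be illustrated in a few lines of Python: the snippet below ranks features by their absolute Pearson correlation with the target and keeps the top k. The toy data and the `filter_rank` helper are invented for illustration and are not part of the paper's method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_rank(features, target, k):
    """Rank features by |correlation with target| and keep the top k indices."""
    scores = [(abs(pearson(col, target)), j) for j, col in enumerate(features)]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# Columns: feature 0 tracks the label, feature 1 is noise, feature 2 is anti-correlated.
features = [
    [0.1, 0.2, 0.9, 1.0, 0.8, 0.0],   # informative (positive correlation)
    [0.5, 0.4, 0.5, 0.6, 0.4, 0.5],   # near-constant noise
    [0.9, 0.7, 0.1, 0.0, 0.2, 0.8],   # informative (negative correlation)
]
target = [0, 0, 1, 1, 1, 0]
print(filter_rank(features, target, k=2))   # → [0, 2]
```

Because each feature is scored in isolation, interacting features can be missed, which is exactly the limitation of filter methods noted above.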
In contrast, wrapper feature selection [11] stands out with its unique optimization perspective. It treats feature selection as a complex optimization problem, generates diverse feature subsets through systematic search, and uses the performance of machine learning models to accurately evaluate the effectiveness of those subsets. The process begins with the selection of an appropriate evaluation model, followed by the generation of a series of feature subsets using efficient strategies such as recursive search, and rigorous validation of their performance on an independent validation or test set. Compared with the previous two approaches, wrapper feature selection [12] shows significant advantages: it directly uses model performance as the evaluation benchmark, so the selected feature subset is more relevant to the actual application requirements; at the same time, by comprehensively exploring the feature space, the wrapper approach can discover the feature combinations that are optimal for a specific model, a great advantage for datasets with complex feature interactions and very high performance requirements [13].
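The wrapper pipeline sketched above (generate a subset, score it with a learning model) can be illustrated with a tiny 1-NN classifier and the weighted objective commonly used in the wrapper feature selection literature, fitness = α·error + (1 − α)·|S|/|D|. The toy data, the helper names, and the weight α = 0.99 are assumptions for illustration, not the paper's exact setup.

```python
# Toy dataset: each row is a sample; the last element is the class label.
DATA = [
    [5.1, 3.5, 1.4, 0.2, 0],
    [4.9, 3.0, 1.4, 0.2, 0],
    [6.2, 3.4, 5.4, 2.3, 1],
    [5.9, 3.0, 5.1, 1.8, 1],
    [5.5, 2.3, 4.0, 1.3, 1],
    [4.6, 3.1, 1.5, 0.2, 0],
]

def knn_error(mask, train, test):
    """Error rate of a 1-NN classifier restricted to the features where mask == 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:                       # an empty subset is invalid: worst possible error
        return 1.0
    errors = 0
    for row in test:
        # nearest training sample under squared Euclidean distance on selected features
        nearest = min(train, key=lambda t: sum((row[i] - t[i]) ** 2 for i in idx))
        errors += nearest[-1] != row[-1]
    return errors / len(test)

def fitness(mask, train, test, alpha=0.99):
    """Weighted wrapper objective: alpha * error + (1 - alpha) * subset ratio."""
    return alpha * knn_error(mask, train, test) + (1 - alpha) * sum(mask) / len(mask)

train, test = DATA[:4], DATA[4:]
# Exhaustively score every non-empty mask over the 4 features (feasible only at toy
# sizes); a metaheuristic would instead sample and iteratively refine masks.
masks = [[(m >> j) & 1 for j in range(4)] for m in range(1, 16)]
best = min(masks, key=lambda m: fitness(m, train, test))
print(best, round(fitness(best, train, test), 4))
```

A wrapper algorithm replaces the exhaustive loop with its search strategy; everything else (the classifier call and the fitness evaluation) stays the same, which is why the evaluation cost dominates.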
Wrapper feature selection, although effective in the field of feature optimization, still faces high computational cost and may not always find the globally optimal feature combination [13]. To cope with these challenges, metaheuristic algorithms [14] have emerged as an important adjunct to wrapper feature selection, thanks to their excellent performance on complex optimization tasks. With powerful search capabilities and flexible optimization frameworks, metaheuristic algorithms can efficiently explore large feature spaces under complex constraints; integrating them into the wrapper framework not only reduces the burden on computational resources but also improves the efficiency and accuracy of feature selection [15]. These algorithms can identify an optimal or near-optimal feature subset through smart search strategies, minimizing wasted computation. At the same time, they are highly compatible with the feature selection evaluation framework, ensuring that the chosen features substantially decrease computational overhead while preserving the model’s efficacy [16]. This combination provides an innovative approach to feature selection for large-scale datasets, with broad application prospects especially in high-dimensional, complex fields such as medical engineering [17]. It enhances the overall efficiency of feature selection while ensuring the model’s reliability and stability in real-world applications, fully demonstrating its potential and value in data science [16]. Specifically, Ghaemi et al.
[18] introduced a forest optimization algorithm for feature selection, aiming to identify the most relevant features from a dataset. Samieiyan et al. [19] developed a feature selection method based on the crow search algorithm, which enhances the exploration phase and effectively reduces dataset dimensionality. Jia et al. [20] proposed three hybrid algorithms that combine the Seagull Optimization Algorithm (SOA) with Thermal Exchange Optimization (TEO) to address the feature selection challenge. Xu et al. [21] presented various Binary Arithmetic Optimization Algorithms (BAOA), each using different strategies to optimize feature selection. Too et al. [22] put forward an improved Competitive Genetic Algorithm, along with a faster version, to enhance the efficiency of genetic algorithms in feature selection.
Metaheuristic algorithms, while showing great potential in the field of feature selection, still face significant challenges [23]. The main challenge lies in striking a balance between exploration and exploitation, and the issue of parameter tuning further limits performance. In addition, some algorithms neglect computational efficiency in the pursuit of innovation, making it difficult to cope with complex real-world problems. Therefore, when applying them to feature selection, it is necessary to optimize them for specific problems in order to improve practical performance. Based on the “no free lunch” theorem [24], i.e., no universal algorithm can be applied to all optimization problems, researchers have proposed diversified improvement strategies for different feature selection requirements. These strategies optimize the search mechanism and strike a balance between efficiency and accuracy to cope more effectively with the complexity of feature selection for high-dimensional data. For instance, Hu et al. [25] introduced an enhanced black widow optimization algorithm aimed at feature selection, while Hammouri et al. [26] presented an improved dragonfly optimization approach for the same purpose. Similarly, Peng et al. [27] developed a feature selection method based on an optimized ant colony algorithm. Additionally, Li [28] proposed a grey wolf optimization variant for feature selection in the context of data classification. Al-Betar et al. [29] proposed an enhanced version of the electric eel optimization algorithm to solve the high-dimensional feature selection problem, achieving efficient feature selection and high classification accuracy. Many other improved metaheuristic algorithms have also been used to improve feature selection, including differential evolution [30], whale optimization [31], grey wolf optimization [32], ant colony optimization [33], and so on.
Currently, many researchers are also working on medical data processing. For example, Braik et al. [34] proposed three binary white shark optimizer variants that significantly improve the classification accuracy and efficiency of feature selection in machine learning tasks by enhancing exploratory and exploitative mechanisms. Braik et al. [35] proposed three adaptive versions (ECSA, PCSA, SCSA) of the capuchin search algorithm (CSA) that significantly improve the performance of feature selection (FS) in cognitive computing. Awadallah et al. [36] proposed six binary artificial rabbit optimization (ARO) variants that significantly improve FS performance on medical diagnostic data. Medical datasets exhibit complex, intertwined interactions among features, leading to non-intuitive and unpredictable relationships; at the same time, they usually possess a large feature search space, which poses great computational challenges for traditional feature selection methods [37]. In addition, existing techniques often carry a certain degree of randomness when performing feature selection, which not only affects the stability and reproducibility of the results, but also limits the optimization effect of the feature selection method. Therefore, despite the many advances made in the field of feature selection, great optimization potential remains, especially in improving the accuracy, efficiency and stability of feature selection [38].
The current research difficulties lie in how to effectively cope with the complexity of medical datasets, as well as how to develop more robust and efficient feature selection methods to meet the urgent needs in practical applications [39]. To effectively address these challenges, the development of a comprehensive and advanced search strategy that can effectively circumvent local optimal solutions and robustly approximate the global optimal subset of features is of great significance in promoting research progress and achieving major breakthroughs in the field of feature selection for medical data.
Given this, this study deeply optimizes the red-billed blue magpie algorithm [40] to better meet the practical needs of feature selection for medical data, especially coping with the complex interactions among features. Inspired by the survival strategy of red-billed blue magpies, this population-based optimization algorithm has gained significant attention across various fields. Its appeal stems from its ability to balance global search and local refinement, outstanding search performance, strong dynamic adaptability, and broad application potential. However, directly applying the red-billed blue magpie algorithm to the feature selection problem is challenging due to the complexity of feature interactions and the vast search space, which limit its ability to fully exploit its potential. To overcome this challenge, this study systematically optimizes the red-billed blue magpie algorithm for the special characteristics of the feature selection task while retaining its original efficient search capability. By introducing innovative behavioral mechanisms, the algorithm is endowed with a finer search strategy, enabling it to capture key feature combinations more acutely in the huge feature space. Meanwhile, the study focuses on addressing the trade-off between exploration and exploitation, leveraging the three core behavioral strategies of the red-billed blue magpie algorithm to enhance both global search and local optimization. In designing the IRBMO algorithm, we also attach great importance to simplicity and practicality: IRBMO does not rely on additional parameters and shows excellent performance without the need for complex parameter tuning. The key contributions of this paper are as follows:
- (a) By simulating the natural behavior of the red-billed blue magpie, we propose a multi-behavioral improved IRBMO algorithm aimed at solving the medical data feature selection problem more effectively. The algorithm incorporates the following three key behavioral strategies:
- Ⅰ. Elite Search Behavior: Simulates the efficient guidance of elite red-billed blue magpie individuals to quickly locate optimal feature combinations, combining random exploration with a fine-grained search strategy and iteratively updating to approach the globally optimal feature solution.
- Ⅱ. Collaborative Hunting Behavior: Based on the concept of teamwork, the leader hunts for the optimal features together with elite and random individuals, flexibly adjusting the strategy to ensure that the optimal feature combinations are ultimately captured.
- Ⅲ. Memory Storage Behavior: Simulates the red-billed blue magpie’s strategy of storing food to cope with shortages; when the current feature selection is insufficient, previously explored high-quality features are used to supplement it, steadily advancing the feature selection process.
- (b) To validate the performance of the IRBMO algorithm for feature selection of medical data, experiments were conducted in conjunction with a KNN classifier on 12 medical datasets of varying complexity.
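The contributions above rely on converting continuous magpie positions into binary feature masks via transfer functions. The sketch below shows the classic S-shaped (sigmoid) and V-shaped (|tanh|) forms from the binary metaheuristic literature; the exact variants used in IRBMO and V2IRBMO are not reproduced here, so treat this as a generic illustration.

```python
import math
import random

def s_shaped(v):
    """Classic sigmoid transfer function S(v) = 1 / (1 + e^-v)."""
    return 1.0 / (1.0 + math.exp(-v))

def v_shaped(v):
    """Classic V-shaped transfer function V(v) = |tanh(v)|."""
    return abs(math.tanh(v))

def binarize_s(position, rng):
    """S-shaped rule: set each bit to 1 with probability S(v)."""
    return [1 if rng.random() < s_shaped(v) else 0 for v in position]

def binarize_v(position, bits, rng):
    """V-shaped rule: flip the current bit with probability V(v)."""
    return [1 - b if rng.random() < v_shaped(v) else b
            for v, b in zip(position, bits)]

rng = random.Random(42)
continuous = [-2.0, -0.1, 0.1, 2.0]               # one individual's continuous position
print(binarize_s(continuous, rng))                # large positive v -> bit likely 1
print(binarize_v(continuous, [0, 0, 1, 1], rng))  # large |v| -> bit likely flipped
```

In the S-shaped rule larger position values push bits toward 1, whereas the V-shaped rule treats a large magnitude of either sign as a signal to flip the current bit, which tends to preserve more exploration; combining both families is the idea behind variants such as V2IRBMO.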
The structure of the paper is as follows: Section 2 summarizes the literature review, Section 3 reviews the original RBMO algorithm, Section 4 introduces the IRBMO algorithm, Section 5 provides extensive experimental validation, and Section 6 concludes with discussions on future work.
2. Literature review
When dealing with diverse data, feature selection methods are favored by many scholars due to their high efficiency. Among them, metaheuristic algorithms have been widely used to optimize the feature selection problem, demonstrating significant advantages especially within the wrapper feature selection framework. A wide variety of metaheuristic algorithms have been applied to feature selection problems, covering swarm intelligence algorithms, evolutionary algorithms, algorithms based on physical principles, and innovative hybrid algorithms incorporating multiple strategies. The following is a carefully categorized overview of some of the metaheuristic algorithms applied to the feature selection problem.
Given the vast literature in the field of feature selection and the lack of uniform, standardized evaluation tools, different algorithmic strategies are often required for different datasets. Therefore, the discussion in this section focuses on a classification of feature selection algorithms based on metaheuristic approaches.
2.1. Feature selection algorithms based on Swarm Intelligence
Swarm-based optimization algorithms have demonstrated a wide range of application potential in the field of feature selection, successfully addressing the challenges of multi-dimensional and complex datasets. In 2021, Ma et al. [33] innovatively proposed a two-stage hybrid ant colony optimization algorithm (TSHFS-ACO), responding to the fact that the ant colony optimization algorithm, although possessing excellent search capabilities, had mainly been limited to low-dimensional datasets. The algorithm accurately determines the size of the optimal feature subset by introducing an interval strategy, and guides the search process by combining feature relevance and classification performance, thus effectively alleviating the problem of search-space explosion in high-dimensional feature selection. Experiments on several high-dimensional datasets show that TSHFS-ACO not only has excellent performance but also a short runtime. In addition, Pan et al. [32] proposed an improved Grey Wolf optimization algorithm, which significantly improves the quality of the initial population in the feature selection task for high-dimensional data by fusing the ReliefF algorithm, Copula entropy, a competitive bootstrapping strategy, and a Leader Wolf enhancement strategy based on differential evolution, and enhances the search flexibility and global search capability of the algorithm. This improvement effectively avoids the problem of local optimal solutions, and experimental results on 10 high-dimensional small-sample gene expression datasets show that the algorithm is able to achieve a significant improvement in classification accuracy with a very low feature selection ratio (less than 0.67%), which is remarkably competitive with existing state-of-the-art feature selection methods.
The comprehensive study shows that the algorithm achieves a good balance between exploration and exploitation, and its unique search strategy significantly improves the search performance of the Gray Wolf optimization algorithm. In addition, Braik et al. [34] conducted an in-depth study on the application of White Shark Optimizer (WSO) in the field of feature selection and designed three innovative binary variants - Binary Adaptive WSO (BAWSO), Binary Comprehensive Learning WSO (BCLWSO) and Binary Heterogeneous CLWSO (BHCLWSO). These variants effectively balance the exploration and exploitation mechanisms and address the problems of low solution accuracy and slow convergence speed faced by WSO in optimization tasks. Comprehensive evaluation results on 24 well-known datasets show that these variants significantly improve the performance of classical binary WSO (BWSO), especially BHCLWSO, which demonstrates superiority over other binary algorithms in terms of classification accuracy and number of feature selections. Nadimi-Shahraki et al. [41] proposed an Aquila Optimizer (AO)-based wrapper feature selection method for addressing the negative impact of redundant and irrelevant features on algorithm performance in medical datasets. By introducing the S-shaped Binary Aquila Optimizer (SBAO) and V-shaped Binary Aquila Optimizer (VBAO), the method is able to efficiently filter out the optimal feature subset and significantly improve the accuracy of the classification task. Experiments on seven benchmark medical datasets show that SBAO and VBAO outperform six state-of-the-art binary optimization algorithms in classification performance. In addition, tests on the COVID-19 real dataset further validate the superiority of SBAO in terms of the number of feature selections and classification accuracy. Nadimi-Shahraki et al. 
[42] proposed an Improved Grey Wolf Optimizer (I-GWO) aimed at solving the problems of insufficient population diversity, exploitation and exploration imbalance, and premature convergence, which are common in global optimization and engineering design problems. The algorithm introduces a dimension learning-based hunting (DLH) strategy that enhances the balance between local and global search and maintains population diversity by modeling individual wolf hunting behavior and constructing unique neighborhoods for each wolf to share information. Experiments on the CEC 2018 benchmark suite and four engineering design problems show that I-GWO outperforms six state-of-the-art meta-heuristic algorithms in terms of performance, and exhibits significant competitiveness and efficiency, especially in engineering design problems. Nadimi-Shahraki et al. [43] proposed a discrete version of the Improved Grey Wolf Optimizer (I-GWO), known as DI-GWOCD, for the problem of community detection in complex networks. The algorithm optimizes node allocation using the local search strategy of I-GWO and introduces the Binary Distance Vector (BDV) to solve the discrete community detection problem. Evaluated by experiments on several real-world network datasets, DI-GWOCD excels in metrics such as modularity and NMI. Compared with existing algorithms, DI-GWOCD is able to detect communities with higher quality, demonstrating its advantages in large-scale network community detection.
2.2. Evolutionary-based feature selection algorithms
In order to improve performance on feature selection problems while simultaneously reducing the number of features and the computational burden, evolutionary algorithms have achieved significant development and innovation in recent years. Among them, classical algorithms such as the genetic algorithm [44] and differential evolution [45] are widely used to solve various challenges in feature selection. To address the challenges of feature selection for high-dimensional data, Ma et al. [46] proposed a novel roulette wheel-based level learning evolutionary algorithm (RWLLEA). The algorithm enhances the diversity of the population by introducing a balanced population model and incorporates a dynamic search-space update strategy to effectively reduce the computational cost. Experimental validation shows that RWLLEA obtains a streamlined feature set with shorter runtime on 15 different datasets while exhibiting superior classification accuracy, significantly outperforming six other feature selection techniques. In addition, Espinosa et al. [47] innovated in surrogate-assisted multi-objective evolutionary algorithms by skillfully fusing fixed-generation evolutionary control with a direct fitness replacement strategy. By continuously updating an LSTM surrogate model through an incremental learning mechanism, the method effectively accelerates time series feature selection. In tasks such as air quality prediction, smart-building indoor temperature prediction, and power transformer oil temperature prediction, the algorithm improves prediction performance over traditional and existing surrogate-assisted feature selection methods by 23.98%, 34.61%, and 13.77%, respectively. These results not only demonstrate the strong potential of evolutionary algorithms in the field of feature selection, but also provide new ideas and methods for future research. Nadimi-Shahraki et al.
[48] proposed a Multi-Trial Vector based Differential Evolutionary Algorithm (MTDE), which significantly improves the performance of the algorithm through the introduction of an adaptive moving step and the Multi-Trial Vector Method (MTV). The MTV method combines three kinds of trial vector producers (TVPs): representative-based, locally randomized, and global best-history-based TVPs, and achieves experience sharing through a winner distribution strategy and a lifecycle archiving mechanism. Experiments on the CEC 2018 benchmark suite show that MTDE exhibits higher accuracy and performance in dealing with problems of different complexity, outperforming state-of-the-art meta-heuristic algorithms such as GWO, WOA, SSA, HHO, etc.
2.3. Physics-based feature selection algorithms
Many feature selection algorithms are inspired by physical phenomena.
Houssein et al. [49] addressed the challenge of high-dimensional feature selection in liver disease datasets by pioneering an enhanced Kepler optimization algorithm, I-KOA. This algorithm ingeniously incorporates a local escape operator derived from dyadic learning and the k-nearest-neighbor classifier. Following a series of rigorous experimental validations, I-KOA demonstrated exceptional performance on the complex liver disease dataset. It surpassed multiple optimization algorithms in various aspects, including classification accuracy, number of selected features, sensitivity, precision, and F1 scores. Furthermore, I-KOA provides a highly efficient and precise feature selection tool for medical diagnosis decision support systems, showcasing significant practical application value and profound implications. Meanwhile, Abdel-Salam et al. [50] proposed the ACRIME algorithm to address the shortcomings of the RIME optimization algorithm in the balance between exploration and exploitation, the avoidance of local optima, and convergence speed. By introducing four innovative strategies, namely chaotic-map initialization, an adaptive symbiotic organisms search (SOS) commensalism phase, a hybrid mutation strategy, and a restart strategy, ACRIME enhances population diversity, optimizes the balance between exploration and exploitation, and substantially improves local and global search capabilities. The experimental results show that ACRIME performs excellently on the CEC2005 and CEC2019 benchmark function tests, and also demonstrates strong competitiveness in feature selection applications on 14 datasets and a real-world COVID-19 classification problem, with performance significantly better than that of other classical and advanced meta-heuristic algorithms. In addition, Zhang et al.
[51] proposed a plant root growth optimization (PRGO) algorithm, drawing inspiration from the plant rhizome growth mechanism in nature. The algorithm skillfully combines global exploration and local exploitation search strategies, and demonstrates excellent performance on both the CEC2014 and CEC2017 test sets. For the high-dimensional feature selection problem, the binary variant of PRGO, BPRGO, is comprehensively compared with eight well-known methods on 16 datasets, and demonstrates stronger feature reduction ability and better overall performance across a number of performance metrics. BPRGO is capable of obtaining a very small feature subset while maintaining high accuracy, providing a novel and effective solution to the high-dimensional feature selection problem.
2.4. Human-based feature selection algorithms
With the development of data processing, many human-based feature selection methods have been proposed.
Zhuang et al. [52] proposed the Binary Arithmetic Optimization Algorithm (BAOA), which enhances exploration and exploitation by redesigning the multiplication operator and introducing four families of transfer functions. To further improve efficiency, they incorporated a parallel mechanism into BAOA, yielding the Parallel Binary AOA (PBAOA). Tested on 10 low-dimensional and 10 high-dimensional datasets from the UCI and scikit-feature libraries, the results show that BAOA and PBAOA outperform classical and recent algorithms, that the best transfer function varies with the dimensionality of the dataset, and that the parallel mechanism effectively enhances performance. Meanwhile, Cinar [53] proposed the adaptive modulo binary optimization (AMBO) algorithm for feature selection to cope effectively with this NP-hard problem and the curse of dimensionality. AMBO adopts a binary optimization strategy combined with adaptive mutation and crossover mechanisms, and innovatively introduces an intelligent local search based on binary logic gates and the modulo operation. Experiments show that AMBO performs well on 11 datasets against algorithms such as BPSO and multiple genetic algorithms, and statistical tests confirm the significance of the results. When compared to other metaheuristic algorithms on 21 datasets, AMBO is superior in terms of classification error rate, fitness, and number of selected features, demonstrating superior performance. In addition, Khosravi et al. [54] proposed a novel binary group teaching optimization algorithm, BGTOALC, which combines local search with chaotic mapping and aims at solving the high-dimensional feature selection problem.
BGTOALC enhances the exploration and exploitation of the algorithm by introducing the novel binary operators Binary Teacher Phase Good Group (BTPGG) and Binary Teacher Phase Bad Group (BTPBG). Meanwhile, the student phase uses a new binary student opposition-based learning (BSOBL) operator, which leverages an opposition strategy to enhance performance. In addition, the algorithm is designed with a Mean Binary Selection (MBS) operator that optimizes the teacher assignment phase in a binary manner to improve the convergence rate. For further comparison, the study also developed the BGTOAV and BGTOAS algorithms using S-shaped and V-shaped transfer functions. Experimental results show that BGTOALC outperforms previous methods in reducing the number of features and improving the accuracy of machine learning algorithms on 30 datasets of different dimensions. Statistical analysis further confirms that BGTOALC surpasses other binary metaheuristic algorithms in efficiency and convergence rate. Nadimi-Shahraki et al. [55] proposed an improved algorithm based on multiple trial vectors (MTV-SCA) to address the problems of the sine cosine algorithm (SCA): a propensity to fall into local optima, an exploration-exploitation imbalance, and a lack of accuracy. MTV-SCA effectively balances exploration and exploitation and avoids premature convergence by introducing the multi-trial vector (MTV) method and combining four different search strategies. Experiments on the CEC 2018 benchmark functions show that MTV-SCA outperforms the conventional SCA and other advanced algorithms (e.g., CEC 2017 winner algorithms) in terms of convergence speed, accuracy, and avoidance of local optima. Statistical tests (Friedman and Wilcoxon signed-rank tests) further validate its significant advantages.
In addition, the successful application of MTV-SCA to six non-convex constrained engineering design problems demonstrates its practical applicability in complex optimization tasks.
2.5. Hybrid filter-wrapper feature selection models
When dealing with high-dimensional datasets, feature selection is a key step to improve the effectiveness of machine learning methods. Currently, mainstream solutions to the FS problem fall into two main categories: wrapper and filter methods. To optimize performance on the FS task, a strategy that combines these two methods is often used in the literature [56]. This combination is usually divided into two phases: first, the most critical features are screened using the filter approach; then, in the second phase, the optimal subset of features is further identified from these candidates by the wrapper approach.
Wrapper and filter feature selection are widely used to address the challenges of high-dimensional data, and hybrids of the two approaches are common. Vommi et al. [57] proposed a hybrid feature selection approach for medical dataset classification, combining ReliefF and fuzzy entropy (RFE) filtering techniques with an enhanced Equilibrium Optimizer (EO) that incorporates opposition-based learning (OBL), a Cauchy mutation operator, and novel search strategies. The enhanced EO is integrated with eight time-varying S- and V-shaped transfer functions to form a binary enhanced equilibrium optimizer (BEE) for extracting essential features from the integrated feature set. Experiments on 22 benchmark datasets and four microarray datasets (including a COVID-19 case) show that the RFE-BEE method outperforms other state-of-the-art algorithms in terms of fitness, accuracy, precision, sensitivity, and F-measure. Song et al. [58] proposed a hybrid feature selection method, SFEMEO, based on an elite-guided mutation strategy and designed for high-dimensional, multi-sample, multi-class cancer gene expression data to support cancer subtype diagnosis. The method comprises two phases, initial screening and combinatorial modeling, using seven filtering methods combined with logical operations to generate feature subsets, with thresholds determined by leave-one-out cross-validation. On UCI datasets, SFEMEO improves classification accuracy by 0.56% to 20.19% over nine other algorithms, with significantly better optimal fitness. On cancer gene expression datasets, SFEMEO improves accuracy by 3.73% to 18.13% over nine intelligent optimization algorithms, with superior best-fitness performance. The Wilcoxon rank-sum test validates the effectiveness of SFEMEO and its advantages in feature selection for high-dimensional cancer gene expression data.
3. Review of RBMO
3.1. Inspiration for RBMO
The red-billed blue magpie, a distinctive Asian bird, has a wide distribution range covering China, India and Myanmar. They are known for their robust physique, dazzling blue plumage and bright red bill, adding a splash of color to nature. The red-billed blue magpie exhibits significant dietary diversity, consuming insects, small vertebrates, and various plant fruits. Their feeding strategies are flexible and include walking on the ground, jumping and weaving between branches, especially in the early morning and at dusk. Typically, red-billed blue magpies travel in small groups of two to five individuals, occasionally forming larger flocks with pronounced social behaviors. Their social interactions are especially prominent when hunting cooperatively. Once they find food, such as fruit or insects, they quickly signal to gather their companions and work together to round up the target. This tacit teamwork allows them to easily break through their prey's defenses. In addition, red-billed blue magpies also show unique storage behavior, hiding their food in hidden places such as tree holes, gaps between branches or rock crevices to avoid being snatched by other predators.
Overall, red-billed blue magpies are not only highly skilled foragers, but also excel at diversifying their food reserves and display excellent teamwork and social intelligence in hunting. It is these unique ecological characteristics that provide a rich source of inspiration for the design of the RBMO metaheuristic algorithm.
3.2. Mathematical modeling of RBMO
3.2.1. Initialization stage.
The population of red-billed blue magpies is first created randomly: assuming N red-billed blue magpies in a D-dimensional space, the position of each red-billed blue magpie is initialized as
x_ij = lb_j + rand × (ub_j - lb_j),
where i = 1,2,...,N and j = 1,2,...,D; x_ij denotes the position of the ith red-billed blue magpie in dimension j; N is the population size of red-billed blue magpies; D is the dimension of the problem; lb_j and ub_j are the lower and upper bounds of dimension j; and rand is a random number in [0, 1].
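The initialization described above can be sketched as follows; the bound vectors `lb` and `ub` and the seed are illustrative assumptions, since the text does not fix specific values:

```python
import numpy as np

def initialize_population(N, D, lb, ub, seed=0):
    """Randomly place N red-billed blue magpies in a D-dimensional space.

    Each coordinate is drawn uniformly between its lower bound lb_j and
    upper bound ub_j, the usual metaheuristic initialization.
    """
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    return lb + rng.random((N, D)) * (ub - lb)

X = initialize_population(N=30, D=10, lb=np.zeros(10), ub=np.ones(10))
print(X.shape)  # (30, 10)
```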
3.2.2. Search behavior.
Red-billed blue magpies typically forage in small groups, which facilitates a more efficient search and better population survival. They search for food in the forest or on the ground by hopping and flying. This adaptability enables red-billed blue magpies to adopt various hunting strategies based on environmental factors and resource availability, ensuring a steady food supply. When groups gather to forage, the mathematical model is as follows:
where Xi represents the i-th individual, n denotes the number of groups of red-billed blue magpies randomly chosen from the search population, ranging from 2 to 5, Xk refers to the k-th randomly selected individual, and Xr represents the search agent chosen at random during the current iteration.
where m represents the number of search agents in the cluster exploring for food, ranging from 10 to N. The number of search agents is selected randomly from the entire population.
3.2.3. Preying behavior.
Red-billed blue magpies demonstrate remarkable hunting abilities and teamwork, particularly when hunting prey. They employ a range of tactics, including rapid pecking, jumping, and aerial capture of insects. Individually or in small groups, they tend to hunt small prey and gather plants, a process that can be mathematically described by Equation (4). When grouped together, red-billed blue magpies are able to attack larger targets such as large insects and even small vertebrates in a concerted manner, and this collective hunting behavior is mathematically modeled in Equation (5). This range of predatory strategies highlights the diverse talents and high adaptability of red-billed blue magpies as predators, ensuring that they are able to forage efficiently in different environments.
where X_food represents the location of the food, CF is the conditioning factor, and rand denotes a value randomly chosen from the range [0, 1].
3.2.4. Storage behavior.
When red-billed blue magpies hunt a surplus of food, they store it in a tree hole or hidden place for the winter food shortage season. This process can be formulated as preserving the optimal solution to more effectively identify the global optimum. The corresponding mathematical model is:
where fitness_old and fitness_new denote the fitness values of the red-billed blue magpie before and after the position update, respectively.
In the RBMO algorithmic framework, the optimization process starts with the construction of a randomly generated set of candidate solutions, referred to as a “population”. The core strategy of RBMO revolves around repeated cycles of searching, preying, and storing, with the goal of finding the optimal or near-optimal solution. A distinct “food reserve” mechanism improves the algorithm's ability to explore and exploit the solution space, enabling a more efficient search. The search continues until a specified termination condition is satisfied. The pseudo-code for RBMO is provided in Table 1.
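The storage behavior amounts to a greedy, per-individual selection. A minimal sketch, assuming a minimization problem with fitness arrays of shape `(N,)`:

```python
import numpy as np

def store_if_better(X_old, fit_old, X_new, fit_new):
    """Storage behavior: keep, for each magpie, whichever position has the
    better (lower) fitness, preserving good solutions across iterations."""
    improved = fit_new < fit_old
    X_kept = np.where(improved[:, None], X_new, X_old)
    fit_kept = np.where(improved, fit_new, fit_old)
    return X_kept, fit_kept

X_kept, fit_kept = store_if_better(
    X_old=np.array([[0.0, 0.0], [1.0, 1.0]]), fit_old=np.array([5.0, 1.0]),
    X_new=np.array([[2.0, 2.0], [3.0, 3.0]]), fit_new=np.array([3.0, 2.0]))
print(fit_kept)  # [3. 1.]
```

The first magpie improved (5.0 to 3.0) and adopts its new position; the second did not and keeps its stored one.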
4. Improved version of the RBMO algorithm
4.1. Motivation
The RBMO (red-billed blue magpie optimization) algorithm is inspired by the red-billed blue magpie, an efficient predator in nature whose flexible, varied foraging strategies and highly collaborative group behaviors provide important inspiration for algorithm design. The RBMO algorithm offers distinct advantages in both exploration and exploitation of the search space. However, its application to the complex, high-dimensional task of feature selection in medical data still presents several challenges: a huge search space, complex and nonlinear relationships between high-dimensional features, and computational efficiency bottlenecks. To overcome these challenges, this paper proposes IRBMO, an enhanced version of the algorithm, based on an in-depth analysis of the characteristics of RBMO and customized improvements tailored to the actual needs of medical data feature selection.
Within the framework of an improved optimization algorithm for red-billed blue magpies, we innovatively propose an enhanced set of behavioral strategies designed to significantly improve the performance and efficiency of the algorithm in the task of feature selection for medical data. These well-designed strategies specifically include elite search behavior, collaborative hunting behavior, and memory storage behavior, each of which is targeted to address the complexity and high dimensionality challenges in medical data processing. First, elite search behavior is introduced to mimic the efficient exploration patterns of good individuals (i.e., elites) in nature. By retaining and reinforcing those solutions (feature subsets) that perform well during the algorithm iteration process, the elite search behavior is able to guide the search process to quickly converge to a global or local optimal solution, thus quickly identifying the most representative features in a huge medical dataset.
Second, collaborative hunting behavior emphasizes information sharing and cooperative work among individuals within a population. This strategy mimics the close collaboration of red-billed blue magpie groups during hunting, boosting the algorithm’s global search efficiency and solution diversity by promoting the exchange and integration of information among different solutions. In the context of feature selection for medical data, this means that the algorithm is able to explore the feature space more efficiently, discovering combinations of features that may be overlooked when considered individually but that have significant predictive value when combined. Finally, the memory storage behavior is a manifestation of a mechanism for long-term learning and knowledge accumulation. The strategy works by creating a memory store in the algorithm for storing excellent solutions and their related information encountered in historical iterations. This not only helps to avoid the algorithm from falling into the predicament of repeated searches, but also speeds up the optimization process by reusing this valuable information in subsequent iterations. For medical data feature selection, the memory storage behavior ensures that the algorithm can quickly leverage previous experience when faced with similar or related datasets, improving the accuracy and efficiency of feature selection.
The IRBMO algorithm not only retains the core advantages of RBMO in search and optimization, but also incorporates a series of innovative mechanisms by drawing on the survival wisdom of the red-billed blue magpie in its natural environment. These enhancements allow IRBMO to better address the challenges of high-dimensional feature spaces, accurately capture complex nonlinear relationships between features, and preserve computational efficiency when processing large-scale datasets. In the medical feature selection task, IRBMO demonstrates excellent performance, providing an efficient and reliable solution for the field.
4.2. Elite search behavior
The search behavior of red-billed blue magpies is a highly efficient team action. In the algorithm, we map this elite search behavior onto the feature selection optimization process to quickly lock onto the optimal features. The elite search behavior of the red-billed blue magpie is shown in Fig 1.
First, the algorithm is guided by elite individuals: the three individuals with the highest fitness (the elites) are selected, and the average of their positions steers the search toward regions containing high-quality feature solutions. The algorithm then switches between exploration based on the average position of the elite individuals and fine search based on a randomly selected elite individual combined with the principle of Brownian motion. In the fine-search stage, Brownian-motion random vectors are introduced to increase the diversity and flexibility of the search directions. Finally, by efficiently and iteratively updating the position of each search individual, the algorithm gradually approaches the optimal feature solution. Its mathematical model is as follows:
where X_best represents the current optimal individual and X_mean represents the average of the top three current optimal individuals. The position of an individual moves towards a random point between its current position and the average position of the three optimal individuals. This movement simulates the exploration behavior of moving closer to the optimal feature solution.
where RS represents the elite pool, which contains the three optimal individuals together with their average position and from which one member is drawn each time, X_best represents the current optimal individual, and rb is the Brownian-motion adjustment factor. That is, one position is randomly selected from among the three optimal individuals and their average, and the individual's position is then updated according to the principle of Brownian motion, which simulates a more random exploration process.
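As a concrete illustration, the two branches described above can be sketched as follows. The 50/50 switching probability and the exact scaling of the Brownian step are illustrative assumptions; the precise coefficients follow the paper's equations:

```python
import numpy as np

def elite_search_step(X, fitness, rng):
    """Sketch of elite search (minimization): explore toward the mean of the
    top-3 individuals, or fine-search around a random elite-pool member
    using a Brownian-motion (standard normal) random vector rb."""
    top3 = X[np.argsort(fitness)[:3]]
    elite_mean = top3.mean(axis=0)
    RS = np.vstack([top3, elite_mean])          # elite pool
    X_next = np.empty_like(X)
    for i in range(len(X)):
        if rng.random() < 0.5:
            # exploration: random point between X_i and the elite mean
            X_next[i] = X[i] + rng.random() * (elite_mean - X[i])
        else:
            # fine search: Brownian step around a random elite-pool member
            member = RS[rng.integers(len(RS))]
            rb = rng.standard_normal(X.shape[1])
            X_next[i] = member + rb * (member - X[i])
    return X_next

rng = np.random.default_rng(0)
X_next = elite_search_step(rng.random((6, 4)), rng.random(6), rng)
print(X_next.shape)  # (6, 4)
```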
4.3. Collaborative hunting behavior
When red-billed blue magpies find prey, they engage in teamwork to capture it. Based on this, we redesigned a behavioral mathematical model based on collaborative hunting. Collaborative hunting behavior is shown in Fig 2.
First, following the red-billed blue magpies' teamwork, the prey is continuously surrounded by the team leader (the optimal individual) together with the elites in the elite pool and randomly chosen individuals; the positions between the elites and the random group are then continuously adjusted through the Cauchy distribution to encircle the prey. If the prey escapes the encirclement, the leader and the elites re-encircle it, adjusting the encirclement between them through Lévy flights. The mathematical model is as follows:
where cv represents a Cauchy-distributed random value and L denotes the Lévy-flight adjustment factor.
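The Cauchy perturbation and Lévy flight used in these updates are standard heavy-tailed operators. A sketch of how such steps are typically generated, using Mantegna's algorithm for the Lévy step (the stability index `beta = 1.5` is a common illustrative choice, not necessarily the paper's setting):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def cauchy_step(size):
    """Heavy-tailed Cauchy step: mostly small moves, occasional large jumps."""
    return rng.standard_cauchy(size)

def levy_step(size, beta=1.5):
    """Levy-flight step via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.standard_normal(size) * sigma
    v = rng.standard_normal(size)
    return u / np.abs(v) ** (1 / beta)

steps = levy_step(100)
print(steps.shape)  # (100,)
```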
4.4. Memory storage behavior
To survive cold winters when food is scarce, red-billed blue magpies have evolved the behavior of storing food, thus ensuring a stable food supply in times of shortage. The memory storage behavior is shown in Fig 3.
If they find that the food they hunt cannot supply the entire population, they begin to adaptively compensate by consuming stored food to survive the winter. Their mathematical model is as follows:
where fit_old and fit_new denote the fitness value stored in memory and the current fitness value, respectively. The decision to update the memory is made by comparing the current fitness value with the fitness value in the memory.
where A is the decision factor: if food is abundant, the elite individuals continue to lead the group in hunting; if food is scarce, the group consumes its reserves to survive. X_idx1 and X_idx2 are randomly selected individuals in the group. The pseudo-code of IRBMO is shown in Table 2:
4.5. Space-time complexity
The time and space complexity of the RBMO algorithm can be summarized as follows:
Regarding time complexity, the RBMO algorithm is primarily influenced by the maximum number of iterations T, the population size N, and the dimensionality of the individuals (dim). Each iteration involves steps such as fitness evaluation, position updates, and boundary handling, whose cost is directly related to N and dim; in addition, the sorting step costs O(N log N) per iteration. When secondary factors such as the cost of the fitness function are ignored, the overall time complexity simplifies to O(T × N × dim), which shows that the algorithm's running time grows notably with the number of iterations, the population size, and the individual dimensionality.
In terms of space complexity, the main memory consumption of the RBMO algorithm stems from the population matrix X, which stores the positions of N individuals in a dim-dimensional space and therefore requires O(N × dim) space. Although the algorithm also stores auxiliary data such as fitness vectors, historical positions, and indexes, their space requirements are comparatively low and have a negligible effect on the total. The total space complexity is therefore dominated by the population matrix X, i.e., O(N × dim), which indicates that the algorithm's memory requirement is directly related to both the population size and the individual dimensionality.
4.6. The binary IRBMO
To overcome the limitations of metaheuristic algorithms in dealing with the binary nature of feature selection, we adapt the algorithm from the continuous search space to the binary domain. Taghian et al. [59] surveyed transfer-function-based binary metaheuristics for feature selection, classifying binarization techniques into four main categories: normalization, angle modulation, binary operators, and transfer functions. Normalization scales a continuous solution into the range [0, 1] and then converts it to a binary value by thresholding (e.g., at 0.5). Angle modulation uses trigonometric functions (e.g., sine and cosine) to map real-valued solutions to binary arrays. Normalization and angle modulation are the simpler binarization methods and suit specific problem scenarios. Binary operators (e.g., XOR, AND, OR, NOT) generate new binary solutions directly in binary space and are suitable for discrete optimization problems. Transfer functions map real-valued solutions to binary values by computing, for each dimension, the probability that its value changes; this is the most commonly used method, since it preserves the search properties of the continuous algorithm and suits a wide range of optimization problems.
In this research, we map continuous solutions to binary representations by designing a transfer function (TF) that allows the algorithm to operate effectively on feature selection problems. Specifically, the TF determines the status of each feature by a threshold decision on the feature's importance score: if the score is higher than a predefined threshold, the feature is labeled “1” (selected); otherwise, it is labeled “0” (excluded). This approach retains the powerful global search capability of the metaheuristic algorithm while simplifying the feature selection process, opening new possibilities for its application to discrete optimization problems. The transformation logic is briefly described as follows:
In the subsequent experimental chapters, we also further explore the transformation effects of the S- and V-shaped transfer functions to verify their effectiveness in the feature selection task.
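To make the transformation logic concrete, here is a minimal sketch of one S-shaped (sigmoid) and one V-shaped (|tanh|) transfer function with the 0.5 thresholding described above; these specific function forms are common choices in the literature rather than the paper's exact family:

```python
import numpy as np

def s_shaped(x):
    """S-shaped (sigmoid) transfer function: maps a real value into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def v_shaped(x):
    """V-shaped transfer function: |tanh(x)| is one common choice."""
    return np.abs(np.tanh(np.asarray(x, dtype=float)))

def binarize(X, tf=s_shaped, threshold=0.5):
    """Mark a feature as selected (1) when its transfer-function value
    exceeds the threshold, otherwise excluded (0)."""
    return (tf(X) > threshold).astype(int)

print(binarize([-2.0, 0.0, 3.0]))            # [0 0 1]
print(binarize([-2.0, 0.0, 3.0], v_shaped))  # [1 0 1]
```

Note how the two shapes behave differently around zero, which is precisely why combining them (as in V2IRBMO) can change exploration behavior.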
5. Experimental results and discussion
To comprehensively measure the efficacy of our proposed binary version of IRBMO in real-world applications, we planned and executed a series of extensive, systematic experimental validations. These experiments are built on 12 real medical datasets that vary in number of features, sample size, and category distribution, ensuring broad representativeness and practicality of the results. In terms of experimental design, we adopted a diverse comparison strategy, pitting the IRBMO algorithm against a variety of metaheuristic algorithms with distinct characteristics. This approach enables us to analyze the performance profile of IRBMO, including its strengths and limitations, in a nuanced manner, laying a solid foundation for subsequent algorithm iteration and performance enhancement. We also evaluated our algorithm against current mainstream feature selection algorithms, aiming not only to validate its competitiveness but also to explore its unique advantages in specific application scenarios. Further, to explore the optimal classification synergy, we combined the IRBMO algorithm with the KNN [60] classifier, aiming to further enhance the accuracy and efficiency of the classification task. We also combine IRBMO with transfer functions to construct variants with stronger generalization capabilities. This series of comprehensive experiments not only improved our overall insight into the algorithm's performance, but also provides strong support for its efficient application to real-world problems.
5.1. Description of the datasets
To evaluate and validate the effectiveness of the new algorithm comprehensively, we selected 12 datasets covering multiple fields of medicine from well-known data repositories such as UCI and KEEL. These datasets include Breast Cancer Coimbra, Wisconsin Diagnostic & Prognostic Breast Cancer, Cleveland Heart Disease, Dermatology, Diabetic Retinopathy Debrecen, Hepatitis, ILPD, Lymphography, Parkinson's Disease (both the generic and the classification datasets), and Spectfheart. The selected datasets have distinctive characteristics, with sample sizes ranging from 79 to 1151 and feature dimensions from 9 to 753, allowing us to demonstrate the adaptability and effectiveness of the algorithms on data of different sizes and complexities. Detailed dataset information is summarized in Table 3, where abbreviated dataset names are used to enhance readability, and core attributes such as data source, sample size, number of features, number of classes, and research area are fully documented. Notably, most of the selected datasets are binary classification problems, which helps to accurately assess performance on such tasks, while the few multi-class datasets provide a valuable opportunity to validate the algorithms' ability to recognize multiple categories. To ensure comprehensive and rigorous validation, we follow the standard dataset partitioning principle and allocate 75% of each dataset for training and the remaining 25% for testing. This division balances the sample size required for learning against an effective evaluation of generalization ability on the test set, ensuring accurate and reliable performance results.
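The 75/25 partitioning protocol can be sketched as follows; the seed is an illustrative choice for reproducibility, and 116 is used as an example sample size (the Breast Cancer Coimbra dataset):

```python
import numpy as np

def split_75_25(n_samples, seed=42):
    """Shuffle sample indices and allocate 75% for training, 25% for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(round(0.75 * n_samples))
    return idx[:cut], idx[cut:]

train_idx, test_idx = split_75_25(116)
print(len(train_idx), len(test_idx))  # 87 29
```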
5.2. Performance measures
In this research, the effectiveness and accuracy of the proposed algorithm are validated against other comparative algorithms using various performance metrics. These metrics are defined in the following section.
5.2.1. Evaluation metrics.
- 1. Average fitness value: This performance metric is used to measure the quality of the selected subset of features in the target solution. The formula is shown in the mathematical expression below:
where N is the number of independent runs and Fit_i is the fitness value of the best solution generated in the ith run.
- 2. Average classification accuracy: With the help of this evaluation metric, we can determine the proportion of the examined cases in the test dataset that are accurately classified from N runs, which is calculated as follows:
where Acc_i denotes the classification accuracy of the best solution generated at run i. Here TP (true positive) denotes a correctly accepted instance, TN (true negative) a correctly rejected instance, FP (false positive) a wrongly accepted instance, and FN (false negative) a wrongly rejected instance.
- 3. Average sensitivity: The purpose of this assessment method is to determine the percentage of all true positive instances in the test dataset over N experimental runs using equation 17.
where Sens_i denotes the sensitivity result of the best solution generated by the ith run.
- 4. Average specificity: This metric is used to determine the proportion of all true-negative cases in the test dataset over N runs, as shown below:
where Spec_i denotes the specificity result of the optimal solution produced by the ith run.
- 5. Average number of chosen features: This metric aims to give the average size of feature selection over N runs. It is calculated by the formula:
where M is the total number of features in the dataset under consideration and d_i is the number of features selected in the best solution (with minimum fitness value) generated by run i.
- 6. Average F-score: This metric evaluates the performance of a classification model by integrating Precision and Recall. The formula is as follows:
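The confusion-matrix quantities behind these averages reduce to the standard definitions below; the example counts are hypothetical:

```python
def classification_metrics(TP, TN, FP, FN):
    """Standard confusion-matrix metrics underlying Section 5.2.1."""
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    sensitivity = TP / (TP + FN)          # true-positive rate (recall)
    specificity = TN / (TN + FP)          # true-negative rate
    precision = TP / (TP + FP)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f_score

acc, sens, spec, f1 = classification_metrics(TP=40, TN=45, FP=5, FN=10)
print(round(acc, 3), round(sens, 3), round(spec, 3), round(f1, 3))  # 0.85 0.8 0.9 0.842
```

The per-run values of these quantities are then averaged over the N runs to obtain the reported metrics.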
5.2.2. Statistical test.
To verify the superior performance of our proposed algorithm, IRBMO (improved red-billed blue magpie optimization), we adopted the Wilcoxon rank-sum test, a statistical method aimed at assessing whether the per-run performance of IRBMO differs significantly from the other compared algorithms at the P = 5% significance level. These algorithms include: Differential Evolution (DE) [30], Genetic Algorithm (GA) [61], Goose Optimization Algorithm (GOOSE) [62], Gray Wolf Optimizer (GWO) [32], Parrot Optimizer (PO) [63], Whale Optimization Algorithm (WOA) [31], Artificial Rabbits Optimization (ARO) [36], Capuchin Search Algorithm (CAPSA) [35], Electric Eel Foraging Optimization (EEFO) [29], Elk Herd Optimizer (EHO) [64], White Shark Optimizer (WSO) [34] and Covariance Matrix Adaptation Evolution Strategy (CMAES) [65]. To ensure the fairness and credibility of the experimental results, we fine-tuned the key parameters of all compared algorithms; the specific parameter settings are shown in Table 6 and Table 13.
This testing process strictly follows standard practice in the academic literature, in which the null hypothesis H0 states that there is no significant difference between the two algorithms. Specifically, if the P-value of the test is less than 5%, we have good reason to reject the null hypothesis, signifying that there is indeed a significant performance difference between IRBMO and the comparison algorithm; conversely, if the P-value is greater than 5%, we accept the null hypothesis, i.e., the two algorithms perform similarly and their strengths and weaknesses are difficult to distinguish. A “NaN” value reflects the situation in which the performance of the two algorithms is too close to be compared effectively.
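In practice such a test would use a library routine (e.g., scipy.stats.ranksums); as a transparent sketch, the two-sided rank-sum P-value under the normal approximation can be computed as follows (average ranks for ties, no continuity correction):

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    order = sorted(range(n1 + n2), key=lambda i: pooled[i])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(order):                 # assign average ranks to ties
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    W = sum(ranks[:n1])                   # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (W - mean) / sd
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Clearly separated samples are flagged as significantly different at P = 5%
print(rank_sum_p(range(1, 11), range(11, 21)) < 0.05)  # True
```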
In Table 4, we detail the test results of IRBMO with dimension 30 on the CEC-2017 test set, together with a comprehensive comparison against other well-known algorithms in the field. To present the comparison more intuitively, we have bolded the values with P-values greater than 0.05 (i.e., not reaching the significance level). It is worth noting that there are no “NaN” values in the CEC-2017 results, which indicates that the optimization results of IRBMO usually differ measurably from those of the other algorithms. Looking further at the data in the table, algorithms such as ARO and DE do not show particularly outstanding performance in the CEC-2017 results, and their data points are rarely highlighted in bold. Overall, IRBMO exhibits significant differences from DE and the other metaheuristic algorithms, which further confirms its superiority in performance.
To evaluate the performance of the IRBMO algorithm more comprehensively, we also used the nonparametric Friedman mean rank test to rank the experimental results of IRBMO and the other algorithms on the CEC-2017 test set. The results (Table 5) show that IRBMO consistently tops the list in average rank, providing strong evidence that our proposed optimizer outperforms the other benchmark algorithms on the test set under consideration. In summary, IRBMO not only excels in individual performance metrics, but also demonstrates clear superiority in the comprehensive performance evaluation.
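The mean-rank computation behind Table 5 is simple to reproduce. A sketch for a minimization metric (lower value earns the better rank; the example values are hypothetical):

```python
def friedman_mean_ranks(results):
    """Mean Friedman ranks: rank the algorithms on each dataset (1 = best,
    i.e., lowest value), with ties receiving average ranks, then average
    the ranks over all datasets."""
    n_alg = len(results[0])
    totals = [0.0] * n_alg
    for row in results:
        order = sorted(range(n_alg), key=lambda j: row[j])
        ranks = [0.0] * n_alg
        i = 0
        while i < n_alg:                  # average ranks for tied values
            j = i
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            for k in range(i, j + 1):
                ranks[order[k]] = (i + j) / 2 + 1
            i = j + 1
        for j in range(n_alg):
            totals[j] += ranks[j]
    return [t / len(results) for t in totals]

# Hypothetical fitness values for 3 algorithms on 2 datasets
print(friedman_mean_ranks([[0.10, 0.20, 0.30], [0.10, 0.30, 0.20]]))  # [1.0, 2.5, 2.5]
```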
5.3. Experiments comparing IRBMO and metaheuristic algorithms
To comprehensively analyze the efficacy and stability of the improved IRBMO algorithm in feature selection tasks, we systematically crafted a comparative study with nine classical and cutting-edge metaheuristic algorithms: Differential Evolution (DE) [30], Genetic Algorithm (GA) [61], Goose Optimization Algorithm (GOOSE) [62], Gray Wolf Optimizer (GWO) [32], Parrot Optimizer (PO) [63], Whale Optimization Algorithm (WOA) [31], Elk Herd Optimizer (EHO) [64], White Shark Optimizer (WSO) [34], and Covariance Matrix Adaptation Evolution Strategy (CMAES) [65]. These algorithms were chosen mainly for their wide application within the optimization field, their excellent performance, and how frequently they are cited in the literature. In addition, each algorithm has its own characteristics, and together they cover the main types and strategies of current optimization algorithms, providing a comprehensive and fair evaluation environment for IRBMO.
To maintain experimental fairness and the credibility of the results, we carefully tuned the key parameters of all compared algorithms; the detailed configurations are presented in Table 6. During the performance evaluation we strictly unified the experimental conditions: every algorithm was initialized with a population of 30 individuals and run for 100 iterations. To enhance statistical reliability, each algorithm was run 30 times independently to capture performance fluctuations; Table 7 reports the mean and standard deviation of the recorded fitness values. The evaluation metrics cover fitness value, classification accuracy, sensitivity, specificity, number of selected features, and F-score, which reveal not only the average performance of the algorithms but also, through the standard deviation, their stability. To present the experimental results more intuitively, we use box plots to visualize the data distribution characteristics. This makes the performance fluctuations and extreme values of each algorithm on the different datasets visible at a glance and supports an in-depth interpretation of stability and robustness. As shown in Fig 5, the IRBMO algorithm performs particularly well across the metrics, and its fluctuation range and outlier distribution reflect excellent robustness. In addition, we plot radar charts of classification accuracy to present the effectiveness of the algorithms in the classification task more intuitively. As shown in Fig 4, the experimental results indicate that IRBMO significantly outperforms the other compared algorithms in the feature selection task.
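The fitness value driving this comparison is not spelled out in this section; a common wrapper-style formulation (used here purely as an illustrative assumption, with `alpha` a hypothetical weight) trades classification error against the size of the selected subset:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Typical wrapper fitness for feature selection: a weighted sum of
    the classification error and the fraction of selected features.
    alpha=0.99 is an assumed weight; the paper's exact setting may differ.
    Lower values are better."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total

# A subset with 10% error using 5 of 10 features:
score = fitness(0.10, 5, 10)   # 0.99*0.10 + 0.01*0.5 = 0.104
```

Under such a formulation, the low means and small standard deviations reported in Table 7 reflect subsets that are simultaneously small and accurate across the 30 independent runs.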
To validate the superiority of the IRBMO algorithm in a multi-dimensional manner, we combine numerical and visual analyses to provide a comprehensive and in-depth evaluation of its performance on multiple datasets. This comprehensive analytical approach not only clearly demonstrates the outstanding performance of the IRBMO algorithm in the feature selection task, but also further confirms its significant advantages in terms of efficacy and robustness.
Careful analysis of Table 7 shows that IRBMO performs best on the fitness-value test: it has the lowest average fitness value, achieves the best result on 9 datasets, and never performs worst on any dataset. Its overall ranking of 1.50 places it first among all compared algorithms, demonstrating excellent optimization ability and stability. Specifically, IRBMO obtains the lowest average fitness values on the BCC (0.1588), BCWD (0.0100), and BCWP (0.0207) datasets, significantly better than the comparison algorithms. Its standard deviations are also generally small, e.g., BCWD (0.0018), Dermatology (0.0033), and PDC (0.0063), indicating stable results with low volatility. In contrast, RBMO, GA, and WOA performed poorly, ranking 10th, 9th, and 11th, respectively. RBMO did not win on any dataset and recorded the highest fitness values on several, e.g., BCWP (0.0763), Cleveland (0.3895), and DRD (0.3066), suggesting limited optimization ability. WOA achieves better results on only 3 datasets and performs poorly on the remaining 9, with high fitness values such as Spectfheart (0.1434). GA records high fitness values on datasets such as BCWD, BCWP, and DRD and ranks 9th with weak overall performance. EHO and DE rank 2nd and 3rd with sub-optimal performance: EHO achieves the lowest fitness values on the Dermatology (0.00496) and Parkinsons (0.01295) datasets with good overall stability, but still has higher fitness values than IRBMO on some datasets, such as DRD (0.2911) and Spectfheart (0.0697); DE records higher fitness values on the BCWP, Parkinsons, and PDC datasets and lags behind IRBMO in average ranking.
To present the performance advantage of the IRBMO algorithm visually, we visualized the fitness values with box plots (see Fig 5). The box-plot analysis shows that IRBMO achieves the lowest or near-lowest fitness values on multiple datasets, demonstrating excellent optimization capability, while its small standard deviation and compact boxes indicate strong stability and robustness. On the BCC, BCWD, BCWP, Cleveland, ILPD, and Parkinsons datasets, IRBMO's fitness values are significantly lower than those of the other algorithms and more concentrated, further confirming the stability of its optimization. In contrast, EHO, DE, and CMAES perform better on individual datasets, for example DE achieves lower fitness values on the BCWP dataset, but their overall fluctuation is larger and their stability insufficient. GWO and GOOSE achieve good results on some datasets, but the distribution of their fitness values is wider and their performance less robust. WSO, WOA, GA, RBMO, and PO have high overall fitness values; RBMO in particular performs worst on multiple datasets with more outliers, indicating limited optimization ability and difficulty adapting to the characteristics of different datasets.
Careful analysis of Table 8 reveals that the IRBMO algorithm exhibits a notable advantage in classification accuracy, with a mean rank of 2.25 and an overall rank of 1. IRBMO achieves the highest classification accuracy on 6 of the 12 datasets and never performs worst on any dataset, while its standard deviation is generally low, further demonstrating its stability and robustness. EHO ranks second with an average rank of 4.17, performing best on 1 dataset but mid-range on most. DE and CMAES rank third and fourth with average ranks of 4.25 and 4.75, respectively; DE performs best on 3 datasets but worst on the BCWP dataset, while CMAES performs best on 1 dataset and mid-range on several others. GWO and GOOSE rank fifth and sixth with mean ranks of 4.75 and 4.83, respectively; GWO is best on 1 dataset and GOOSE on 2, but both are mid-range on most datasets. WSO, WOA, and GA perform poorly, with average ranks of 6.00, 6.42, and 8.25, respectively: WSO is best on the Hepatitis dataset, and WOA and GA are each best on 1 dataset but poor on several others. RBMO and PO perform worst, with mean ranks of 10.25 and 8.92, respectively; RBMO is worst on 6 datasets, and PO is mid-range on all datasets but with generally low accuracy.
To demonstrate the classification accuracy of each algorithm more intuitively, we plotted a radar chart of classification accuracy. Fig 4 shows clearly that IRBMO stands out: its contour lies on the outermost ring for several datasets, visually confirming its advantage across the 12 datasets. IRBMO achieves the highest classification accuracy on 6 datasets, and its overall contour is smooth with little fluctuation, further corroborating its stability and robustness. The contours of the other algorithms vary by dataset: those of EHO, DE, and CMAES are closest to IRBMO's but noticeably lower on some datasets, especially BCWP and BCC, where the performance of DE and CMAES drops. The contours of GWO and GOOSE approach IRBMO's on a few datasets but still trail on most; GOOSE performs well on the DRD and Dermatology datasets but fluctuates more overall. The contours of WSO, WOA, and GA lie markedly further inward, especially on the Hepatitis and BCC datasets, reflecting lower overall accuracy. RBMO and PO have the innermost contours, especially on the BCC and PDC datasets, with overall performance clearly below the other algorithms. The radar chart thus shows IRBMO leading in classification accuracy across the board, with contour coverage and stability superior to the others, further validating its efficiency and reliability in the classification task; EHO, DE, and CMAES perform moderately well, while WSO, WOA, GA, PO, and RBMO perform poorly, with markedly lower accuracy on multiple datasets.
Careful analysis of Table 9 shows that the CMAES algorithm performs best on the sensitivity metric, with an average rank of 3.00 and the first overall rank. It achieves the highest sensitivity on 5 datasets, such as BCWD (1.000), Dermatology (1.000), and ILPD (0.994333), with low standard deviation and high stability. IRBMO follows with a mean rank of 3.75 in second place, performing best on 4 datasets such as BCC (0.744333) and Hepatitis (1.000). GWO and EHO rank third and fourth with mean ranks of 4.42 and 4.50, respectively; GWO performs best on DRD (0.951333) and Lymphography (0.991667), and EHO on BCWD (1.000) and Spectfheart (0.954667). DE ranks fifth with a mean rank of 4.83, performing best on 3 datasets but poorly on BCWP (0.530) and PDC (0.251667). GA, GOOSE, and WOA perform moderately, with mean ranks of 5.08, 5.25, and 6.58, respectively. WSO, PO, and RBMO perform poorly, with mean ranks of 6.67, 9.25, and 9.67, respectively; PO performs worst on BCC (0.167) and Spectfheart (0.446333), and RBMO performs worst on 6 datasets. The experimental results show that CMAES significantly outperforms the other algorithms on the sensitivity metric; IRBMO, GWO, and EHO perform well; DE, GA, GOOSE, and WOA perform moderately; and WSO, PO, and RBMO perform poorly.
Careful analysis of Table 10 shows that the IRBMO algorithm performs best on the specificity metric, with a mean rank of 2.08 and the first overall rank. It achieves the highest specificity on five datasets, such as BCC (0.996), ILPD (1.000), and Spectfheart (1.000), with low standard deviation and high stability. GOOSE ranks second with a mean rank of 3.75, performing best on four datasets such as BCWP (1.000) and Spectfheart (1.000). DE ranks third with a mean rank of 3.92, performing best on 1 dataset (BCWP, 0.984333) but mid-range on most others. EHO ranks fourth with a mean rank of 4.17, performing best on 1 dataset (Spectfheart, 1.000) with a balanced performance overall. CMAES ranks fifth with a mean rank of 5.75, performing best on 3 datasets, such as BCWD (0.995) and PDC (0.906), but poorly on ILPD (0.885333). GWO, GA, and WOA perform moderately, with mean ranks of 6.83, 7.08, and 6.00, respectively: GWO is mid-range on DRD (0.252667) and Lymphography (0.660667), GA does better on BCWP (0.907) and Spectfheart (1.000), and WOA is mid-range on all datasets. WSO, PO, and RBMO perform poorly, with mean ranks of 6.42, 8.50, and 9.75, respectively; WSO is best on 1 dataset (ILPD, 0.949667), PO is worst on 3 datasets (e.g., DRD, 0.343), and RBMO is worst on 5 datasets (e.g., BCC, 0.981 and Cleveland, 0.612). The experimental results show that IRBMO significantly outperforms the other algorithms on the specificity metric; GOOSE, DE, and EHO perform well; CMAES, GWO, GA, and WOA perform moderately; and WSO, PO, and RBMO perform poorly.
Careful analysis of Table 11 shows that the GA algorithm performs best in terms of the number of selected features, with a mean rank of 2.50 and the first overall rank. It selects the fewest features on 5 datasets, such as Dermatology (1.5), Parkinsons (1.833333), and Spectfheart (2.366667), with low standard deviation and high stability. IRBMO ranks second with a mean rank of 2.67, performing best on 2 datasets such as BCWD (5.233333) and Spectfheart (2.766667). DE ranks third with a mean rank of 3.83, performing best on 3 datasets such as DRD (2.0), Hepatitis (2.233333), and Spectfheart (1.533333). GWO ranks fourth with a mean rank of 4.42, performing moderately across all datasets without being best or worst on any. CMAES and EHO tie for fifth with a mean rank of 5.17; CMAES is best on 2 datasets (e.g., ILPD, 3.733333) and EHO mid-range on all. GOOSE, WOA, and WSO perform moderately, with mean ranks of 6.92, 6.75, and 7.92, respectively; GOOSE performs worst on 1 dataset (Lymphography, 353.7), WOA on 1 dataset (Lymphography, 273.0333), and WSO is mid-range on all datasets. PO and RBMO perform poorly, with mean ranks of 9.83 and 10.25, respectively; PO is worst on 5 datasets (e.g., BCC, 13.33333 and Lymphography, 371.1), and RBMO is worst on 5 datasets (e.g., BCC, 13.03333 and Lymphography, 367.2333). The experimental results show that GA significantly outperforms the other algorithms in the number of selected features; IRBMO, DE, and GWO perform well; CMAES, EHO, GOOSE, WOA, and WSO perform moderately; and PO and RBMO perform poorly.
Careful analysis of Table 12 shows that the IRBMO algorithm has a significant advantage on the F-score metric, with a mean rank of 2.92 and the first overall rank. IRBMO achieves the highest F-scores on four of the twelve datasets, notably BCC, Dermatology, ILPD, and Spectfheart, and never performs worst on any dataset, while its low standard deviation further demonstrates its stability and robustness. GOOSE ranks second with a mean rank of 3.92, performing best on three datasets (e.g., BCWP, ILPD, and Parkinsons) with strong overall performance. GWO ranks third with a mean rank of 4.50, performing best on 2 datasets (DRD and Lymphography) but poorly on the PDC dataset. CMAES ranks fourth with a mean rank of 4.92, performing best on the Parkinsons dataset but poorly on the PDC and Spectfheart datasets. DE and EHO tie for fifth with a mean rank of 5.00: DE performs best on the BCC, Dermatology, and ILPD datasets but poorly on Hepatitis and PDC, while EHO performs best on the BCWD, ILPD, and Parkinsons datasets with a more balanced overall performance. GA, WOA, and WSO tie for seventh with a mean rank of 6.50 each; GA performs best on the BCWP and Spectfheart datasets, WOA is mid-range on all datasets, and WSO performs best on the ILPD dataset. RBMO and PO perform poorly, with mean ranks of 8.58 and 9.67: RBMO is worst on 2 datasets (BCC and PDC), and PO is worst on 5 datasets (BCC, BCWP, Hepatitis, PDC, and Spectfheart). The experimental results show that IRBMO significantly outperforms the other compared algorithms on the F-score metric, demonstrating its efficiency and stability in classification tasks, while GOOSE, GWO, and CMAES perform well; DE, EHO, GA, WOA, and WSO perform moderately; and RBMO and PO perform poorly.
In summary, when dealing with the feature selection problem, IRBMO shows excellent performance on several key evaluation metrics, including fitness value, classification accuracy, sensitivity, specificity, number of selected features, and F-score. It not only identifies the key features effectively, reducing model complexity and improving computational efficiency, but also strikes a good balance between precision and recall, ensuring strong overall model performance and demonstrating wide applicability and strong competitiveness.
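For a binary classification dataset, all of the metrics compared above can be computed from the confusion matrix. A minimal sketch (multi-class datasets such as Dermatology would additionally require per-class averaging, which is an implementation choice not specified here):

```python
def binary_metrics(tp, fp, tn, fn):
    """Evaluation metrics from a binary confusion matrix:
    tp/fp/tn/fn = true/false positives and negatives."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)            # recall / true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    precision   = tp / (tp + fp)
    f_score     = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f_score

acc, sens, spec, f1 = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Because sensitivity and specificity pull in opposite directions (Tables 9 and 10), an algorithm that leads on both, as IRBMO largely does, is balancing false negatives against false positives rather than trading one for the other.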
5.4. Experiments comparing IRBMO with existing feature selection algorithms
Although several metaheuristic algorithms have been widely applied to feature selection, there is still room to optimize their performance. To improve the effectiveness of feature selection, we systematically evaluated the newly proposed IRBMO algorithm against nine algorithms that perform strongly in current feature selection tasks: Binary Chimp Optimization (BCHIMP) [66], Hyper-Learning Binary Dragonfly Algorithm (BDA) [67], Binary Grey Wolf Optimizer (BGWO) [68], Binary Harris Hawks Optimization (BHHO) [69], Bare-Bones Particle Swarm Optimization with Mutual Information (BPSO) [70], Binary Improved African Vulture Optimization Algorithm (AVOA) [71], Artificial Rabbits Optimization (ARO) [36], Capuchin Search Algorithm (CAPSA) [35], and Electric Eel Foraging Optimization Algorithm (EEFO) [29]. These algorithms were rigorously selected, and all are strong competitors in the feature selection domain. Although IRBMO has already demonstrated excellent results against metaheuristic algorithms, a more detailed comparison with existing feature selection algorithms can further reveal its unique strengths and potential value. Such an analysis not only helps us understand IRBMO's working principles and performance characteristics more comprehensively, but also offers researchers and practitioners richer information to support more informed decisions in feature selection tasks. Therefore, in this section, the comparative study against existing feature selection algorithms is deepened to reveal more fully IRBMO's performance in the field.
To ensure the fairness of the comparison and the reliability of the results, the parameters of all algorithms were carefully tuned; the specific configurations are detailed in Table 13. The mean and standard deviation of the fitness values were computed for the experimental statistics; see Table 14 for the detailed results. The mean reveals the average performance of each algorithm, while the standard deviation reflects the range of performance fluctuation and thus its stability. In addition, we visualize the experimental results with box plots (see Fig 7), which intuitively show each algorithm's performance fluctuations, outlier distribution, and data distribution characteristics on the different datasets, providing strong support for the analysis of stability and robustness.
We also plotted convergence curves to visualize convergence behavior, and conducted ablation experiments on IRBMO to probe its internal mechanisms. To isolate the roles of elite search behavior and collaborative hunting behavior, we compared IRBMO1, which contains only elite search behavior, with IRBMO2, which contains only collaborative hunting behavior. Since memory storage behavior is regarded as a compensatory mechanism, normally used in combination with elite search and collaborative hunting, we did not examine its effect alone in this ablation study; instead we focused on the independent contributions of elite search and collaborative hunting to algorithm performance. This design lets us understand more precisely the role of each component in IRBMO and how they jointly contribute to overall performance.
To comprehensively evaluate the performance of the IRBMO algorithm, we conducted an in-depth analysis through both ablation experiments and comparison with existing algorithms. First, the effectiveness of the different improvement strategies in IRBMO is verified by ablation. IRBMO1 introduces only the elite search behavior (exploration strategy), while IRBMO2 contains only the collaborative hunting behavior (exploitation strategy). Careful analysis of Table 14 shows that IRBMO achieves the best fitness values on 7 of the 12 datasets, significantly better than IRBMO1 and IRBMO2. Specifically, IRBMO1 performs best on 1 dataset and its overall performance is no worse than RBMO's, suggesting that an exploration strategy alone can enhance global search capability, but without the support of an exploitation strategy the local search capability remains insufficient. IRBMO2 performs best on no dataset and is slightly worse than RBMO, especially on complex datasets (e.g., DRD, Lymphography); however, when its exploitation strategy is combined with the exploration strategy of IRBMO1, performance improves markedly, demonstrating the advantage of the exploitation strategy for local search and convergence. By combining the exploration and exploitation strategies, IRBMO achieves the best performance on most datasets, verifying an effective balance between its global and local search capabilities.
Second, we compared the performance of IRBMO with a variety of existing feature selection algorithms (e.g., RBMO, BCHIMP, BDA, BHHO). The experimental results show that IRBMO performs best on 7 of the 12 datasets with an average rank of 1.5 across all datasets, significantly better than the other compared algorithms. In contrast, BDA and EEFO do fairly well with an average rank of 4.50 each, but rarely achieve the best value on any dataset; BPSO, AVOA, and ARO perform moderately, with average ranks of 4.67, 5.42, and 5.58, respectively; while BHHO, CAPSA, BCHIMP, and RBMO perform poorly, showing high fitness values or large fluctuations on multiple datasets. Other existing algorithms perform well on some datasets, but none matches IRBMO in overall performance. These results fully demonstrate the strong competitiveness of IRBMO in the feature selection task.
To present the performance of the IRBMO algorithm intuitively, we adopt box plots for visual analysis (see Fig 7). The figure shows clearly that the median of IRBMO is significantly lower than those of the other algorithms, indicating an obvious performance advantage in most cases. At the same time, IRBMO's boxes are compact with a small range of fluctuation, highlighting its excellent stability. In addition, IRBMO produces very few outliers compared with the other algorithms, reflecting its strong resistance to interference across different datasets and complex experimental conditions. IRBMO maintains consistent, superior performance in the face of differences in data distribution and external factors.
Analyzing the convergence curves of IRBMO and the other existing feature selection algorithms in Fig 6, we can directly observe IRBMO's significant advantages in convergence speed and accuracy. On most datasets, IRBMO's curve drops rapidly at the beginning of the iterations, indicating that it quickly locates the search region near the optimal solution and demonstrating efficient global search capability. In contrast, the curves of other algorithms (e.g., RBMO, BCHIMP, BDA) decline more slowly, especially on complex datasets (e.g., DRD, Spectfheart), where IRBMO's convergence speed is clearly superior. In the later iterations, IRBMO's curve stabilizes at lower fitness values with little fluctuation, showing that it not only finds better solutions but also converges stably. The curves of the other algorithms tend to remain at higher fitness values and still fluctuate considerably late in the run, suggesting they are prone to local optima or unstable convergence. Specifically, RBMO's curve decreases more slowly on most datasets and converges to higher values, indicating weaker global and local search than IRBMO; BCHIMP and BDA perform well on some datasets but still fall short of IRBMO overall, e.g., BCHIMP's curve fluctuates more in the late iterations and BDA's curve declines more slowly.
In summary, IRBMO achieves faster and more stable convergence by combining the elite search (exploration) and collaborative hunting (exploitation) strategies to quickly explore the global search space at the beginning of the iteration, and then finely adjusts the quality of the solution through the exploitation strategy, which further proves its high efficiency and stability in the feature selection task.
5.5. IRBMO combined with S- and V-shaped transfer functions
Because the IRBMO algorithm operates on sequences of continuous values, solving the feature selection problem requires converting these continuous values into binary form, i.e., 0 or 1, by means of a transfer function. Various conversion methods were covered in earlier discussions. In this section, we further analyze eight transfer-function variants drawn from two major families (S-shaped and V-shaped), with a view to finding the variants with the strongest generalization capability for feature selection problems. Table 15 lists the transfer functions of both families in detail, and Fig 8 plots their respective distribution curves. This in-depth analysis is intended to help us understand, and select, the best conversion strategy for the binary representation of continuous features in IRBMO-based feature selection tasks.
In this section, we design eight binary IRBMO variants based on the different transfer functions, named S1IRBMO, S2IRBMO, S3IRBMO, S4IRBMO, V1IRBMO, V2IRBMO, V3IRBMO, and V4IRBMO. Each variant adopts a distinct transformation mechanism designed to enhance the efficacy of feature selection, optimizing the exploration and exploitation of the feature space and thus exhibiting its own advantages within the optimization framework.
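The two families differ not only in shape but in how the transfer probability is used. A minimal sketch of one representative from each family (the logistic S1 and the |tanh| form commonly labeled V2 are standard textbook choices used here as illustrative assumptions; the exact formulas of the paper's eight variants are in Table 15):

```python
import math
import random

def s_shaped(x):
    """S1-style transfer: logistic sigmoid, maps a continuous step to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """V2-style transfer: |tanh(x)|, symmetric about zero."""
    return abs(math.tanh(x))

def binarize_s(x):
    """S-family rule: set the bit to 1 with probability S(x)."""
    return 1 if random.random() < s_shaped(x) else 0

def binarize_v(x, current_bit):
    """V-family rule: complement the CURRENT bit with probability V(x),
    so small updates (x near 0) tend to preserve the existing subset."""
    return 1 - current_bit if random.random() < v_shaped(x) else current_bit
```

This difference in update rule is one plausible reason the V-shaped variants behave differently from the S-shaped ones in Table 16: V-shaped binarization leaves a feature's state unchanged when the continuous update is small, which favors exploitation around good subsets.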
To ensure accurate and highly credible experimental results, we adopted a uniform setup: the population size was fixed at 30 and the number of iterations at 100 for every algorithm, keeping the experimental conditions consistent. To minimize the impact of random error, each algorithm was run 30 times independently, and the average fitness over the 30 runs was taken as its final performance under the given conditions. For each dataset we recorded both the mean, which reflects the average performance level, and the standard deviation, which assesses stability; these two metrics complement each other and together constitute a comprehensive evaluation of algorithm performance. In presenting the results, the best-performing values are highlighted in bold so that readers can identify at a glance the best performance of each algorithm on each dataset.
As the detailed data in Table 16 show, the algorithms built on these transfer functions exhibit impressive accuracy and stability on the vast majority of datasets.
Among these variants, the V2 variant stands out as the best performer. It not only leads in average performance but also earns wide recognition for its stability, testifying to the advantages of the V-shaped transfer function and parameter configuration employed by V2 in the feature selection task, which allow it to identify and retain the features most valuable to the target task more accurately. The V1 and V3 variants follow closely; they fail to outperform V2 but occupy the next positions in the ranking, suggesting that the V1 and V3 transfer functions are somewhat complementary and can create synergy in the feature selection task, improving overall performance. Surprisingly, however, the V4 variant ranks low, in ninth position, which may indicate that its transfer function and parameter configuration are not well adapted to the current feature selection task; this re-emphasizes the strong influence of transfer-function choice and parameter configuration on algorithm performance. The original IRBMO algorithm ranks fourth, showing that the base algorithm itself is competitive; after the variants were constructed with S-shaped and V-shaped transfer functions, however, some of them (e.g., V2, V1, and V3) showed significant performance improvements. The four variants built from the S-shaped transfer functions, S4, S2, S1, and S3, occupy positions 5, 6, 7, and 8, respectively. Although they fail to outperform the V-shaped variants, they still show a degree of stability and competitiveness.
This further demonstrates the potential of transfer function diversity for algorithmic performance improvement. It is worth noting that while these algorithmic variants showed impressive accuracy and stability on most datasets, performance fluctuations were observed on three specific datasets. Such fluctuations may be related to factors such as the completeness of the dataset, noise level, and distributional characteristics. This reminds us that we need to be more careful in selecting and handling datasets in subsequent studies to ensure stable and reliable algorithm performance.
In summary, the IRBMO algorithm and its variants constructed via S-shaped and V-shaped transfer functions demonstrate rich performance and in-depth analytical value on the task of medical data feature selection. These results provide useful references and insights for the optimization and improvement of subsequent algorithms.
6. Conclusion
To deeply explore the performance of the IRBMO algorithm in the field of feature selection for medical data, we have carefully designed and implemented an experiment covering 12 medical datasets with different complexities. The experiment aims to reveal the unique strengths and potential weaknesses of each algorithm in multiple dimensions, with a special focus on the specific contribution of the IRBMO algorithm in improving classification performance. To this end, a comprehensive comparison between IRBMO and the base RBMO algorithm was conducted to accurately assess its gains in feature selection.
During the experiments, we chose the classical k-nearest neighbors (KNN) classifier as the baseline model. KNN's wide acceptance and stable performance make it an ideal experimental tool, and it directly reflects the impact of feature selection on classification results. The experimental results in Table 17 show that the IRBMO algorithm improves classification accuracy by 43.89% on average compared with the base RBMO algorithm. This significant gain not only verifies the excellent performance of IRBMO in feature selection but also demonstrates its strong potential for optimizing overall classification performance.
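A wrapper-style evaluation of this kind scores each candidate feature subset by training KNN on the selected columns only. The sketch below assumes the common fitness form α·error + (1−α)·|S|/|F| with α = 0.99; both the weighting and the value of α are assumptions for illustration, not values confirmed by the source:

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, mask, k=5):
    """Accuracy of a simple KNN classifier restricted to the selected features."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0  # an empty feature subset cannot classify anything
    Xtr, Xte = X_train[:, cols], X_test[:, cols]
    correct = 0
    for x, y in zip(Xte, y_test):
        d = np.linalg.norm(Xtr - x, axis=1)          # Euclidean distances
        nearest = y_train[np.argsort(d)[:k]]         # labels of k nearest neighbors
        correct += int(np.bincount(nearest).argmax() == y)  # majority vote
    return correct / len(y_test)

def fitness(mask, acc, alpha=0.99):
    """Assumed wrapper fitness: weighted error plus selected-feature ratio (lower is better)."""
    return alpha * (1 - acc) + (1 - alpha) * mask.sum() / mask.size
```

With this formulation, the optimizer is rewarded both for high accuracy and for discarding redundant features, which is the trade-off the fitness metric in the experiments captures.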
Overall, the IRBMO algorithm performs well in the medical data feature selection task and significantly improves classification performance. This finding lays an important foundation for subsequent algorithm optimization and practical application, and also points out the direction for further exploration and improvement of feature selection strategies, providing strong support for advancing the development of medical data analytics.
This study has several limitations. First, the datasets used have relatively few features and limited complexity, so they may not fully cover the challenges of feature selection on real-world high-dimensional, nonlinear data. Second, only the basic KNN classifier was used in the experiments; while simple and easy to use, it may not reveal the full potential of the feature selection algorithms. In addition, the algorithm design focuses mainly on balancing exploitation and exploration, the range of parameter tuning may be somewhat limited, and the hyperparameter tuning process is not described in detail, which may affect the robustness and reproducibility of the results. To deepen and broaden the study, future work may introduce more high-dimensional datasets, combine multiple advanced classifiers, and further refine the parameter tuning strategy, so as to more comprehensively validate the algorithm's performance and its scalability in practical applications.
In the wave of digital transformation, data has become a core driving force for social change and innovation, and its latent information value is immense. In the healthcare field in particular, data is not only a vital resource for life safety, but also a source of insight for disease research, diagnosis and treatment decisions, and prevention strategies. However, with the exponential growth in data size, complexity, and diversity, data processing and analysis face unprecedented challenges. The “curse of dimensionality” brought by high-dimensional data further aggravates the difficulty of analysis and obstructs data insight. In this context, feature selection, as an important part of data preprocessing, becomes ever more critical: by accurately identifying core features and reducing data dimensionality, it significantly improves analysis efficiency and accuracy.
In this study, an innovative multi-behavior-enhanced red-billed blue magpie feature selection algorithm (IRBMO) is proposed to address key challenges in medical data feature selection. The algorithm builds on the red-billed blue magpie optimization framework and incorporates several innovations, namely an elite search strategy, a collaborative hunting mechanism, and a memory storage strategy, which together achieve efficient optimization of the feature selection problem. These designs not only significantly improve the computational efficiency and selection accuracy of the algorithm, but also enhance its robustness and adaptability. In tests on medical datasets of different sizes and dimensionalities, IRBMO demonstrates excellent performance. To verify this comprehensively, we conducted comparative experiments against binary versions of other metaheuristic algorithms as well as existing feature selection algorithms, using a variety of evaluation metrics: fitness value, classification accuracy, sensitivity, specificity, number of selected features, F-score, the rank test, and Friedman's test. In addition, we constructed eight IRBMO variants with eight transfer functions to explore their potential in medical data feature selection, and from these built the better-performing V2IRBMO variant. Experimental results show that IRBMO achieves an overall performance improvement of 43.89% over the original RBMO algorithm, and holds significant advantages in accuracy, stability, and efficiency compared with mainstream optimization algorithms and existing feature selection methods.
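Friedman's test mentioned above compares the algorithms by ranking them on each dataset and aggregating the ranks. A minimal sketch of the standard statistic (the textbook formula, not code taken from the paper) is:

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic for k algorithms over N datasets.

    `scores` is a list of N rows, one per dataset, each holding k scores
    (higher is better). Ranks are assigned per dataset; tied scores
    receive the mean of their tied ranks.
    """
    N, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # best score first
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1  # extend the tie group
            mean_rank = (i + j) / 2 + 1  # average rank within the tie group
            for t in range(i, j + 1):
                ranks[order[t]] = mean_rank
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    avg_ranks = [r / N for r in rank_sums]
    # chi^2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    chi2 = 12 * N / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, avg_ranks
```

The resulting average ranks are what produce orderings such as "V2 first, V1V3 second" reported in the comparison, with the chi-square value then checked against the reference distribution for significance.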
Although this study has achieved important results in feature selection, several directions remain worth exploring. For example, although the wrapper method, as a high-performance feature selection approach, performs well on high-dimensional datasets, its computational complexity grows significantly with data size, placing high demands on computational resources and time. Future research will therefore explore deep integration of the wrapper method with filter and embedded methods, in order to exploit their respective strengths and balance high accuracy against low complexity. In addition, we plan to introduce an adaptive parameter optimization strategy so that the algorithm can dynamically adjust its parameters to the characteristics of different datasets, further enhancing its versatility and practicality. We will also expand the application areas of the IRBMO algorithm, applying it in complex data scenarios such as image processing and natural language processing, whose complex data structures and strong feature correlations provide a broad stage for IRBMO to show its potential. Meanwhile, we will explore the performance of IRBMO in prediction tasks, with a view to providing innovative solutions for more practical application scenarios.
In summary, by proposing the IRBMO algorithm, this study provides an efficient and innovative solution for feature selection. The algorithm breaks through the limitations of traditional methods, achieves significant improvements in performance, stability, and applicability, and shows broad application prospects. In the future, we will continue to deepen research on IRBMO, optimize its performance, and promote its practical application in more fields, contributing further to the innovative development of data science.
References
- 1. Yang J, Li Y, Liu Q, Li L, Feng A, Wang T, et al. Brief introduction of medical database and data mining technology in big data era. J Evid Based Med. 2020;13(1):57–69. pmid:32086994
- 2. Berisha V, Krantsevich C, Hahn PR, Hahn S, Dasarathy G, Turaga P, et al. Digital medicine and the curse of dimensionality. NPJ Digit Med. 2021;4(1):153. pmid:34711924
- 3. Jiang X, Kong X, Ge Z. Augmented industrial data-driven modeling under the curse of dimensionality. IEEE/CAA J Autom Sinica. 2023;10(6):1445–61.
- 4. Qiao H, Chen Y, Qian C, Guo Y. Clinical data mining: challenges, opportunities, and recommendations for translational applications. J Transl Med. 2024;22(1):185. pmid:38378565
- 5. Wang J, Qiao L, Ye Y, Chen Y. Fractional envelope analysis for rolling element bearing weak fault feature extraction. IEEE/CAA J Autom Sinica. 2017;4(2):353–60.
- 6. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, et al. Feature Selection. ACM Comput Surv. 2017;50(6):1–45.
- 7. Khaire UM, Dhanalakshmi R. Stability of feature selection algorithm: A review. Journal of King Saud University - Computer and Information Sciences. 2022;34(4):1060–73.
- 8. Jin Z, Teng S, Zhang J, Chen G, Cui F. Structural damage recognition based on filtered feature selection and convolutional neural network. Int J Str Stab Dyn. 2022;22(12).
- 9. Zhang Y, Wang S, Xia K, Jiang Y, Qian P. Alzheimer’s disease multiclass diagnosis via multimodal neuroimaging embedding feature selection and fusion. Information Fusion. 2021;66:170–83.
- 10. Zhou R, Zhang Y, He K. A novel hybrid binary whale optimization algorithm with chameleon hunting mechanism for wrapper feature selection in QSAR classification model: A drug-induced liver injury case study. Expert Systems with Applications. 2023;234:121015.
- 11. Liu W, Wang J. Recursive elimination–election algorithms for wrapper feature selection. Applied Soft Computing. 2021;113:107956.
- 12. Seghir F, Drif A, Selmani S, Cherifi H. Wrapper-Based Feature Selection for Medical Diagnosis: The BTLBO-KNN Algorithm. IEEE Access. 2023;11:61368–89.
- 13. Lee S-J, Xu Z, Li T, Yang Y. A novel bagging C4.5 algorithm based on wrapper feature selection for supporting wise clinical decision making. J Biomed Inform. 2018;78:144–55. pmid:29137965
- 14. Agrawal P, Abutarboush HF, Ganesh T, Mohamed AW. Metaheuristic Algorithms on Feature Selection: A Survey of One Decade of Research (2009-2019). IEEE Access. 2021;9:26766–91.
- 15. Beheshti Z. A fuzzy transfer function based on the behavior of meta-heuristic algorithm and its application for high-dimensional feature selection problems. Knowledge-Based Systems. 2024;284:111191.
- 16. Akinola OO, Ezugwu AE, Agushaka JO, Zitar RA, Abualigah L. Multiclass feature selection with metaheuristic optimization algorithms: a review. Neural Comput Appl. 2022;34(22):19751–90. pmid:36060097
- 17. Yab LY, Wahid N, Hamid RA. A Meta-Analysis Survey on the Usage of Meta-Heuristic Algorithms for Feature Selection on High-Dimensional Datasets. IEEE Access. 2022;10:122832–56.
- 18. Ghaemi M, Feizi-Derakhshi M-R. Feature selection using Forest Optimization Algorithm. Pattern Recognition. 2016;60:121–9.
- 19. Samieiyan B, MohammadiNasab P, Mollaei MA, Hajizadeh F, Kangavari M. Novel optimized crow search algorithm for feature selection. Expert Systems with Applications. 2022;204:117486.
- 20. Jia H, Xing Z, Song W. A New Hybrid Seagull Optimization Algorithm for Feature Selection. IEEE Access. 2019;7:49614–31.
- 21. Xu M, Song Q, Xi M, Zhou Z. Binary arithmetic optimization algorithm for feature selection. Soft comput. 2023;:1–35. pmid:37362265
- 22. Too J, Abdullah AR. A new and fast rival genetic algorithm for feature selection. J Supercomput. 2020;77(3):2844–74.
- 23. Fu S, Ma C, Li K, Xie C, Fan Q, Huang H, et al. Modified LSHADE-SPACMA with new mutation strategy and external archive mechanism for numerical optimization and point cloud registration. Artif Intell Rev. 2025;58(3).
- 24. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Computat. 1997;1(1):67–82.
- 25. Hu G, Du B, Wang X, Wei G. An enhanced black widow optimization algorithm for feature selection. Knowledge-Based Systems. 2022;235:107638.
- 26. Hammouri AI, Mafarja M, Al-Betar MA, Awadallah MA, Abu-Doush I. An improved Dragonfly Algorithm for feature selection. Knowledge-Based Systems. 2020;203:106131.
- 27. Peng H, Ying C, Tan S, Hu B, Sun Z. An Improved Feature Selection Algorithm Based on Ant Colony Optimization. IEEE Access. 2018;6:69203–9.
- 28. Li Z. A local opposition-learning golden-sine grey wolf optimization algorithm for feature selection in data classification. Applied Soft Computing. 2023;142:110319.
- 29. Al-Betar MA, Braik MSh, Mohamed EA, Awadallah MA, Nasor M. Augmented electric eel foraging optimization algorithm for feature selection with high-dimensional biological and medical diagnosis. Neural Comput & Applic. 2024;36(35):22171–221.
- 30. Wang P, Xue B, Liang J, Zhang M. Multiobjective Differential Evolution for Feature Selection in Classification. IEEE Trans Cybern. 2023;53(7):4579–93. pmid:34874881
- 31. Agrawal RK, Kaur B, Sharma S. Quantum based Whale Optimization Algorithm for wrapper feature selection. Applied Soft Computing. 2020;89:106092.
- 32. Pan H, Chen S, Xiong H. A high-dimensional feature selection method based on modified Gray Wolf Optimization. Applied Soft Computing. 2023;135:110031.
- 33. Ma W, Zhou X, Zhu H, Li L, Jiao L. A two-stage hybrid ant colony optimization for high-dimensional feature selection. Pattern Recognition. 2021;116:107933.
- 34. Braik MSh, Awadallah MA, Dorgham O, Al-Hiary H, Al-Betar MA. Applications of dynamic feature selection based on augmented white shark optimizer for medical diagnosis. Expert Systems with Applications. 2024;257:124973.
- 35. Braik M, Awadallah MA, Al-Betar MA, Hammouri AI, Alzubi OA. Cognitively Enhanced Versions of Capuchin Search Algorithm for Feature Selection in Medical Diagnosis: a COVID-19 Case Study. Cognit Comput. 2023;:1–38. pmid:37362196
- 36. Awadallah MA, Braik MS, Al-Betar MA, Abu Doush I. An enhanced binary artificial rabbits optimization for feature selection in medical diagnosis. Neural Comput & Applic. 2023;35(27):20013–68.
- 37. Wang S, Celebi ME, Zhang Y-D, Yu X, Lu S, Yao X, et al. Advances in Data Preprocessing for Biomedical Data Fusion: An Overview of the Methods, Challenges, and Prospects. Information Fusion. 2021;76:376–421.
- 38. Remeseiro B, Bolon-Canedo V. A review of feature selection methods in medical applications. Comput Biol Med. 2019;112:103375. pmid:31382212
- 39. Ershadi MM, Seifi A. Applications of dynamic feature selection and clustering methods to medical diagnosis. Applied Soft Computing. 2022;126:109293.
- 40. Fu S, Li K, Huang H, Ma C, Fan Q, Zhu Y. Red-billed blue magpie optimizer: a novel metaheuristic algorithm for 2D/3D UAV path planning and engineering design problems. Artif Intell Rev. 2024;57(6).
- 41. Nadimi-Shahraki MH, Taghian S, Mirjalili S, Abualigah L. Binary Aquila Optimizer for Selecting Effective Features from Medical Data: A COVID-19 Case Study. Mathematics. 2022;10:1929.
- 42. Nadimi-Shahraki MH, Taghian S, Mirjalili S. An improved grey wolf optimizer for solving engineering problems. Expert Systems with Applications. 2021;166:113917.
- 43. Nadimi-Shahraki MH, Moeini E, Taghian S, Mirjalili S. Discrete Improved Grey Wolf Optimizer for Community Detection. J Bionic Eng. 2023;20(5):2331–58.
- 44. Zhou J, Hua Z. A correlation guided genetic algorithm and its application to feature selection. Applied Soft Computing. 2022;123:108964.
- 45. Yu F, Guan J, Wu H, Wang H, Ma B. Multi-population differential evolution approach for feature selection with mutual information ranking. Expert Systems with Applications. 2025;260:125404.
- 46. Ma H, Li M, Lv S, Wang L, Deng S. Roulette wheel-based level learning evolutionary algorithm for feature selection of high-dimensional data. Applied Soft Computing. 2024;163:111948.
- 47. Espinosa R, Jiménez F, Palma J. Surrogate-assisted multi-objective evolutionary feature selection of generation-based fixed evolution control for time series forecasting with LSTM networks. Swarm and Evolutionary Computation. 2024;88:101587.
- 48. Nadimi-Shahraki MH, Taghian S, Mirjalili S, Faris H. MTDE: An effective multi-trial vector-based differential evolution algorithm and its applications for engineering design problems. Applied Soft Computing. 2020;97:106761.
- 49. Houssein EH, Abdalkarim N, Samee NA, Alabdulhafith M, Mohamed E. Improved Kepler Optimization Algorithm for enhanced feature selection in liver disease classification. Knowledge-Based Systems. 2024;297:111960.
- 50. Abdel-Salam M, Hu G, Çelik E, Gharehchopogh FS, El-Hasnony IM. Chaotic RIME optimization algorithm with adaptive mutualism for feature selection problems. Comput Biol Med. 2024;179:108803. pmid:38955125
- 51. Zhang J, Yan F, Yang J. Binary plant rhizome growth-based optimization algorithm: an efficient high-dimensional feature selection approach. J Big Data. 2025;12(1).
- 52. Zhuang Z, Pan J-S, Li J, Chu S-C. Parallel binary arithmetic optimization algorithm and its application for feature selection. Knowledge-Based Systems. 2023;275:110640.
- 53. Cinar AC. A novel adaptive memetic binary optimization algorithm for feature selection. Artif Intell Rev. 2023;56(11):13463–520.
- 54. Khosravi H, Amiri B, Yazdanjue N, Babaiyan V. An improved group teaching optimization algorithm based on local search and chaotic map for feature selection in high-dimensional data. Expert Systems with Applications. 2022;204:117493.
- 55. Nadimi-Shahraki MH, Taghian S, Javaheri D, Sadiq AS, Khodadadi N, Mirjalili S. MTV-SCA: multi-trial vector-based sine cosine algorithm. Cluster Comput. 2024;27(10):13471–515.
- 56. Ganjei MA, Boostani R. A hybrid feature selection scheme for high-dimensional data. Engineering Applications of Artificial Intelligence. 2022;113:104894.
- 57. Vommi AM, Battula TK. A hybrid filter-wrapper feature selection using Fuzzy KNN based on Bonferroni mean for medical datasets classification: A COVID-19 case study. Expert Systems with Applications. 2023;218:119612.
- 58. Song Y-W, Wang J-S, Qi Y-L, Wang Y-C, Song H-M, Shang-Guan Y-P. Serial filter-wrapper feature selection method with elite guided mutation strategy on cancer gene expression data. Artif Intell Rev. 2025;58(4).
- 59. Taghian S, Nadimi-Shahraki MH, Zamani H. Comparative Analysis of Transfer Function-based Binary Metaheuristic Algorithms for Feature Selection. In: 2018 International Conference on Artificial Intelligence and Data Processing (IDAP). IEEE; 2018. p. 1–6. https://doi.org/10.1109/idap.2018.8620828
- 60. Zhang S, Li X, Zong M, Zhu X, Cheng D. Learning k for kNN Classification. ACM Trans Intell Syst Technol. 2017;8(3):1–19.
- 61. Das AK, Das S, Ghosh A. Ensemble feature selection using bi-objective genetic algorithm. Knowledge-Based Systems. 2017;123:116–27.
- 62. Hamad RK, Rashid TA. GOOSE algorithm: a powerful optimization tool for real-world engineering challenges and beyond. Evolving Systems. 2024;15(4):1249–74.
- 63. Lian J, Hui G, Ma L, Zhu T, Wu X, Heidari AA, et al. Parrot optimizer: Algorithm and applications to medical problems. Comput Biol Med. 2024;172:108064. pmid:38452469
- 64. Al-Betar MA, Awadallah MA, Braik MS, Makhadmeh S, Doush IA. Elk herd optimizer: a novel nature-inspired metaheuristic algorithm. Artif Intell Rev. 2024;57(3).
- 65. Beyer H-G, Sendhoff B. Simplify Your Covariance Matrix Adaptation Evolution Strategy. IEEE Trans Evol Computat. 2017;21(5):746–59.
- 66. Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput & Applic. 2022;34(8):6427–51.
- 67. Too J, Mirjalili S. A Hyper Learning Binary Dragonfly Algorithm for Feature Selection: A COVID-19 Case Study. Knowledge-Based Systems. 2021;212:106553.
- 68. Sallam NM, Saleh AI, Arafat Ali H, Abdelsalam MM. An Efficient Strategy for Blood Diseases Detection Based on Grey Wolf Optimization as Feature Selection and Machine Learning Techniques. Applied Sciences. 2022;12(21):10760.
- 69. Too J, Abdullah AR, Mohd Saad N. A New Quadratic Binary Harris Hawk Optimization for Feature Selection. Electronics. 2019;8(10):1130.
- 70. Song X, Zhang Y, Gong D, Sun X. Feature selection using bare-bones particle swarm optimization with mutual information. Pattern Recognition. 2021;112:107804.
- 71. Sharifian Z, Barekatain B, Quintana AA, Beheshti Z, Safi-Esfahani F. Sin-Cos-bIAVOA: A new feature selection method based on improved African vulture optimization algorithm and a novel transfer function to DDoS attack detection. Expert Systems with Applications. 2023;228:120404.