Abstract
Recently, hybrid feature selection methods have demonstrated excellent performance on high-dimensional data, but many of these methods tend to yield relatively homogeneous feature subsets. To address this, we propose a novel hybrid feature selection algorithm called the Hybrid Multiple Filter-Wrapper algorithm. This algorithm employs a dual-module structure: Module 1 utilizes the random forest feature importance method to achieve significant dimensionality reduction of the original feature set, resulting in the candidate feature subset F1. In Module 2, we first propose a bivariate filter algorithm: the minimum Spearman-Maximum Mutual Information method. This method assesses both the correlation and redundancy of F1, and its results are then fed into the wrapper algorithm for further exploration. Furthermore, we integrate two swarm intelligence algorithms to develop the Hybrid Grey Wolf and Chaotic Dung Beetle Wrapper Algorithm. This algorithm incorporates chaos theory to enhance the position update mechanism of the Dung Beetle Algorithm, then embeds the Dung Beetle Algorithm into the Grey Wolf Algorithm, thereby balancing exploration and exploitation capabilities. Finally, a process optimization mechanism based on the theory of random laser intensity fluctuations dynamically monitors the optimization process. When the wrapper algorithm converges to a local optimum, the filter algorithm is restarted, and chaos theory is used to reset the population. This process enhances the diversity of both the candidate feature subset and the population, effectively avoiding local optima. We extensively compare our method with ten hybrid algorithms from the past three years across ten public benchmark datasets from MGE.
Experimental results show that our algorithm outperforms most other algorithms: across all datasets, it achieves an average classification accuracy at least 1.3% higher, an average feature subset length at least 8 units shorter, and a dimensionality reduced to less than 0.45% of the original. The results are statistically significant.
Citation: Shi Y, Zheng Y, Bai X (2025) A multiple filter-wrapper feature selection algorithm based on process optimization mechanism for high-dimensional omics data analysis. PLoS One 20(12): e0338051. https://doi.org/10.1371/journal.pone.0338051
Editor: Elnaz Pashaei, Indiana University School of Medicine, UNITED STATES OF AMERICA
Received: April 10, 2025; Accepted: November 16, 2025; Published: December 11, 2025
Copyright: © 2025 Shi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The datasets analyzed in this study are publicly available. Ten high-dimensional datasets were sourced from the Gene Expression Model Selector and Microarray Data repositories (https://www.kaggle.com/datasets/ahmadalmahsiri/microarray-datasets), including Colon, SRBCT, Lymphoma, DLB, Brain Tumor, MLL, Ovarian, and Breast, as well as SMK-CAN-187 and CLL-SUB-111 (https://github.com/NoRaincheck/scikit-feature/tree/master/skfeature/data).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Over the past few decades, vast amounts of data have been generated across various fields [1]. These datasets often have high dimensionality and noise, which poses a significant challenge in removing irrelevant and redundant features for data classification [2,3]. Therefore, feature selection has become a central focus for researchers aiming to reduce computational complexity, improve data analysis accuracy, and enhance operational efficiency [3].
Feature selection methods are categorized into filter [5] and wrapper approaches. Filter methods score features based on specific metrics, selecting those with the highest scores. While less computationally demanding, they often exhibit limited generalizability [6]. Some researchers have suggested filter algorithms that bypass initial screening and select features directly from the dataset, often leading to an overly broad and unsatisfactory selection [7,8].
Wrapper methods [9] combine optimization algorithms and classifiers to search the original feature set and generate candidate subsets. Classifiers then assess these subsets, typically achieving high classification accuracy (ACC). Common optimization algorithms include the Genetic Algorithm (GA) [10], Particle Swarm Optimization (PSO) [11], Antlion Optimization (ALO) [12], and Cuckoo Search (CS) [13], among other swarm intelligence optimization algorithms. In 2022, Xue et al. introduced the Dung Beetle Algorithm (DBA) [14], a novel algorithm with robust global search and parallel computing capabilities, although it lacks high solution accuracy [15]. Currently, DBA-based methods are primarily applied in engineering [16]. The superior global search and local exploration capabilities of swarm intelligence algorithms have led many researchers to apply them in feature selection [16,17]. Hybrid feature selection methods utilizing the DBA are less developed than those based on other swarm intelligence algorithms [17]. Therefore, this paper considers using the DBA for wrapper feature selection based on a hybrid swarm intelligence strategy.
In 2017, Jain et al. proposed a method combining a univariate filter and an improved PSO method for gene selection, achieving excellent classification performance [18]. However, the high dimensionality of genes and the limited number of samples complicate analysis and classification, and the binary PSO (BPSO) algorithm tends to converge to local optima early, making it difficult to reach the global best [18]. In 2021, Wang et al. and Wojdan et al. introduced a hybrid filter-wrapper algorithm [19]. The filter method selects a subset of high-scoring features using specific metrics, effectively narrowing the search scope [20]. The wrapper method employs an optimization algorithm to search filtered feature subsets and selects the optimal one based on classification accuracy. While the hybrid algorithm combines the filter’s speed with the wrapper’s accuracy [21], it still struggles with local optima [22].
In 2022, Pashaei et al.’s research combined Maximum Relevance Minimum Redundancy (mRMR) with the Binary Artificial Owl Optimization (BAO) algorithm, showing excellent performance. However, it performed poorly on some benchmark datasets, indicating the limitations of its mutation mechanism [23]. In 2023, the team used the recently compiled Turkey earthquake database to test a reliable, data-based CAV prediction model for seismic disaster analysis, structural health monitoring, geotechnical engineering investigations, and early earthquake warning analysis [24]. In 2024, Sosa et al.’s research emphasized the importance of feature collaboration, showing that features’ combined contribution provides more information about category labels than the individual features themselves [25].
In 2024, Lawrence et al. developed an efficient feature selection and classification method for high-dimensional microarray cancer data, noting that this data faces a strong “curse of dimensionality,” and the computational power required may limit the effectiveness of existing methods [26]. In the same year, Dr. José introduced the Iterative Linear Association Analysis method, which can effectively observe multicollinearity between variables and generate more accurate explanations using linear modeling techniques for result prediction [27]. In 2025, Pashaei et al. developed a multi-filter gene selection method to identify biomarkers for Alzheimer’s disease by integrating various ranking techniques and feature selection algorithms to improve accuracy [28]. In the same year, Pashaei et al. introduced two novel hybrid gene selection techniques, which improved search behavior by integrating crossover and mutation operators [29]. However, the algorithm may require longer execution times, especially when processing very large datasets. Additionally, high-dimensional, heterogeneous data poses challenges to Filter methods, limiting their effectiveness [29].
In summary, current research faces three main challenges: 1) Univariate Filter methods struggle with the theoretical limitation of feature interaction compensation, while bivariate Filter methods are challenged by both the curse of high dimensionality and limited sample sizes; 2) The literature has not fully explored the collaborative framework between the DBA and the grey wolf algorithm (GWA); 3) There is limited research on interdisciplinary regulatory methods in hybrid feature selection. Additionally, most existing hybrid algorithms lack dynamic interaction between the filter and wrapper stages, and the wrapper algorithm is susceptible to local optima due to poor initial feature quality.
To address these challenges, this study first combines univariate and proposed bivariate Filter algorithms in a stepwise manner, effectively addressing the issues of high dimensionality and limited sample sizes in omics data. Next, improvements are made to two swarm intelligence algorithms to enhance the strengths of each, filling the gap in research on hybrid swarm intelligence algorithms based on the DBA. Finally, an interdisciplinary algorithm regulation method, combining physics and computer science, is designed based on a physical optics model. The result is the development of a dual-module hybrid feature selection algorithm, named Hybrid Multiple Filter-Wrapper (HMF-W).
In this approach, the M1 module uses the Random Forest-based Feature Importance Method (RF-FIM) [39] for significant dimensionality reduction of the original feature set, resulting in a candidate feature subset. The M2 module integrates the minimum Spearman Maximum Mutual Information (mSMMI) with the Hybrid Gray Wolf and Chaotic Dung Beetle Wrapper (HGW-CDBW) algorithms, using the proposed interdisciplinary algorithm control method, the Process Optimization Mechanism (POM), to further reduce and optimize the candidate feature subset. The main contributions of the HMF-W algorithm are introduced in three aspects:
(1) This study proposes a nonlinear bivariate Filter algorithm, the mSMMI algorithm. It measures the correlation between features and labels, as well as feature redundancy in the candidate subset, using mutual information and Spearman’s rank correlation coefficient. Two parameters are designed to adjust evaluation weights for correlation and redundancy, enabling feature measurement from multiple perspectives. Additionally, RF-FIM is used for significant dimensionality reduction, followed by the mSMMI algorithm, which selects features from multiple perspectives in sequence, overcoming the bottleneck of feature interaction compensation and reducing the feature dimensions for both the mSMMI and subsequent Wrapper algorithms.
(2) We improve and combine two swarm intelligence algorithms to form the Wrapper algorithm, HGW-CDBW. This algorithm uses chaotic disturbances to control the movement direction and step size of dung beetle individuals, enhancing the first three optimal solution areas of the gray wolf algorithm. It combines the GWA’s excellent global search ability with the dung beetle algorithm’s enhanced local development capability, enabling HGW-CDBW to maintain a balance between exploration and exploitation, achieving the goal of selecting the optimal feature subset.
(3) We combine the theory of random laser intensity fluctuations with the proposed average improvement rate and extreme value control strategy to form the POM mechanism. When the Wrapper algorithm falls into local optima, it dynamically adjusts the execution frequency of the bivariate Filter and Wrapper algorithms, using the results of the bivariate Filter algorithm to initialize the population with a Logistic chaotic representation. This increases the diversity of the candidate feature subset and the population, helping the algorithm to escape local optima and find the optimal feature subset.
The rest of this paper is organized as follows: Sect 2 reviews related work. Sect 3 introduces the proposed algorithm. Sect 4 presents experimental studies and result analysis, followed by a conclusion in Sect 5.
2 Related works
In this section, we will introduce five important theories and methods, namely random laser intensity fluctuation theory, chaos theory, random forest-based feature importance methods, the GWA, and the DBA. These theories lay a solid foundation for the HMF-W algorithm.
2.1 The random lasers intensity fluctuations theory
Random Laser Intensity Fluctuations (RLIF) theory, discussed by Anderson S.L. Gomes and others, explores the fluctuations in light intensity caused by multiple reflections and interference in random lasers without fixed cavities and in random scattering media [30]. It suggests that the probability density function of laser intensity is represented by an exponentially truncated Lévy-like distribution, which depicts a wide range of intensity fluctuations [31]. In such a distribution, as the input excitation power increases, the accumulation of electromagnetic energy also increases, thereby raising the tail index of the intensity distribution [32]. This means the fluctuations in intensity are quite significant, and the intensity distribution tends toward a Gaussian distribution, due to the Central Limit Theorem [30]. Additionally, the collective effect of pseudo-modes in random lasers is considered. Due to the varying contributions from each scattering center, the interference patterns of light waves exhibit high randomness and irregularity, which are characteristic of the intensity fluctuations in random lasers [31]. Random laser intensity fluctuations can be expressed as the superposition of multiple independent scattering events, each of which can be represented by a sine wave [32]. The mathematical formula is as follows:
I(t) = I0 + Σ_i A_i · sin(ω_i · t + φ_i)

where, I0 represents the average light intensity, while A_i, ω_i, and φ_i denote the amplitude, frequency, and phase of the i-th sine wave, respectively [33].
2.2 Chaos theory
Chaos theory studies pseudo-random, unpredictable behaviors in deterministic nonlinear systems, with its core being the system’s extreme sensitivity to initial conditions (the butterfly effect) and inherent pseudo-randomness [34]. This pseudo-randomness is not true randomness, but rather a sequence generated by deterministic equations, which is ergodic and non-periodic [35]. Chaotic systems effectively represent high-dimensional, nonlinear, strongly coupled iterative processes, similar to the search trajectory of swarm intelligence algorithms [36]. When the system’s state deviates from a stable path due to parameter disturbances or boundary condition changes, the chaotic mechanism quantifies its fluctuations, revealing hidden patterns not captured by traditional linear analysis [37]. This paper uses the Logistic map to generate chaotic sequences for discrete systems, with the formula as follows:
x_{n+1} = Seed · x_n · (1 − x_n)

where, Seed is the chaotic control parameter within the range (3,4), set to 3.9 in this paper; x_n is the current value and x_{n+1} is the next value. At this setting, the system enters chaos, showing extreme sensitivity to initial values [38].
Firstly, rounding errors inevitably occur when initializing the population in a wrapper algorithm, leading to significant deviations in long-term behavior. The algorithm then searches the solution space diffusely, making it difficult to concentrate locally. However, chaotic sequences can infinitely approach any point within a finite range, offering more efficient search space coverage than uniform distribution. Therefore, this paper considers using chaos theory to improve the population initialization part. Secondly, chaotic sequences have an almost non-repetitive characteristic. They can inject continuous diversity into the algorithm, suppressing premature convergence. Thus, this paper considers guiding the positions of individual agents in swarm intelligence algorithms through chaotic sequences to enhance their development ability in smaller feature spaces.
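A minimal Python sketch of this idea: generating a Logistic-map sequence with Seed = 3.9 and using it to initialize a binary population for a wrapper algorithm. The 0.5 threshold for mapping chaotic values to selected/unselected bits is an illustrative assumption, not the paper's exact scheme.

```python
import numpy as np

def logistic_sequence(x0: float, n: int, seed_param: float = 3.9) -> np.ndarray:
    """Generate a chaotic sequence via the Logistic map x_{n+1} = Seed * x_n * (1 - x_n)."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = seed_param * x * (1.0 - x)
        xs[i] = x
    return xs

def chaotic_init(pop_size: int, dim: int, threshold: float = 0.5) -> np.ndarray:
    """Initialize a binary population: each agent's bits come from a chaotic sequence."""
    rng = np.random.default_rng(0)
    pop = np.empty((pop_size, dim), dtype=int)
    for i in range(pop_size):
        seq = logistic_sequence(rng.uniform(0.01, 0.99), dim)
        pop[i] = (seq > threshold).astype(int)
    return pop
```

Because the Logistic map is ergodic over (0, 1) at this parameter, the resulting bit patterns cover the feature space more evenly than repeated uniform draws with the same seed.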
2.3 RF-FIM
The feature importance in random forests is calculated by summing the gains from features at each node across all decision trees. Information entropy is often used to measure feature contributions in classification problems [39].
H(D) = −Σ_{j∈C} (|D_j| / |D|) · log2(|D_j| / |D|)

H(D | f) = Σ_v (|D_v| / |D|) · H(D_v)

where, H(D) is the entropy of the dataset D, C is the category set, D_j is the subset belonging to category j, H(D | f) is the conditional entropy of D given a split on feature f, |D_v| is the size of the subset taking value v of feature f, and |D| is the size of the dataset. The contribution of each decision tree to feature importance is the average reduction in information entropy of that feature across all trees [40]. Feature importance evaluation in random forests is done through the contribution of node splits in the ensemble’s decision trees, providing quantifiable screening criteria for high-dimensional features. Its advantage is that it can handle nonlinear relationships without requiring an independence assumption on the feature distribution [41].
The purpose of using the RF-FIM algorithm in the feature selection process is to select a small number of features with high information gain from the original dataset. This helps reduce the complexity of model training and improves efficiency [42]. The number of selected features is based on the total number of features in the dataset and depends on the size of the dataset and samples.
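As an illustration, this selection step can be sketched with scikit-learn's impurity-based importances. The synthetic dataset and the fixed choice of k = 30 are assumptions for demonstration; the paper determines the number of selected features from the dataset size (see Eq (21) in Sect 3.1.3), not a constant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def rf_fim_select(X, y, k, random_state=0):
    """Rank features by random-forest impurity importance and keep the top k indices."""
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    rf.fit(X, y)
    top = np.argsort(rf.feature_importances_)[::-1][:k]
    return np.sort(top)

# Small high-dimensional synthetic example: 60 samples, 200 features
X, y = make_classification(n_samples=60, n_features=200,
                           n_informative=10, random_state=0)
idx = rf_fim_select(X, y, k=30)   # indices of the candidate feature subset
```

The candidate subset `X[:, idx]` would then be passed on to the M2 module for further filtering.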
2.4 The grey wolf optimization algorithm
As a typical swarm intelligence algorithm, the Grey Wolf Algorithm (GWA) [43] simulates and optimizes the social hierarchy and group hunting behaviors of grey wolf populations. The grey wolf population has a strict social hierarchy, categorized into four types: α wolves, β wolves, δ wolves, and ω wolves, with lower ranks containing more wolves. ω grey wolf individuals X_i in the pack update their positions based on the locations of the α, β, and δ wolves, X_α, X_β, and X_δ [43], with the update formula as follows:

D = |C · X_p(t) − X(t)|

X(t + 1) = X_p(t) − A · D
Where, D represents the distance between the grey wolf and its prey, and t represents the iteration. X and X_p represent the position vectors of the grey wolf and the prey, while A and C are coefficient vectors, with their specific mathematical expressions given by formulas (8) and (9) [43]:

A = 2a · r1 − a   (8)

C = 2 · r2   (9)
Where, a is the convergence factor, r1 and r2 are random vectors taking values in [0,1]. As the number of iterations increases, the value of a gradually decreases from 2 to 0 [44].
When handling feature selection problems, the GWA needs to be converted into a binary form. However, existing binary feature selection methods usually rely on random-number comparisons, making it difficult to avoid selecting all or no features. While the GWA has excellent global search capability, it tends to get trapped in local optima in the later stages of iteration. Therefore, this paper first considers adopting a more robust approach to convert the continuous algorithm into a discrete one, and then applies a strategy that combines two swarm intelligence optimization algorithms to improve the grey wolf algorithm, forming a new algorithm that balances global search and local exploitation as part of the wrapper.
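The two steps just described can be sketched as follows: a standard continuous GWO update guided by the three best wolves, followed by a sigmoid-transfer binarization that turns positions into 0/1 feature masks. Averaging the three leader-guided moves and treating lower fitness as better are the usual GWO conventions, assumed here for illustration.

```python
import numpy as np

def gwo_step(X, fitness, a, rng):
    """One continuous GWO update: each wolf moves toward alpha, beta, delta."""
    order = np.argsort(fitness)          # lower fitness = better
    leaders = X[order[:3]]               # alpha, beta, delta positions
    new_X = np.empty_like(X)
    for i in range(X.shape[0]):
        cand = []
        for leader in leaders:
            r1, r2 = rng.random(X.shape[1]), rng.random(X.shape[1])
            A, C = 2 * a * r1 - a, 2 * r2
            D = np.abs(C * leader - X[i])
            cand.append(leader - A * D)
        new_X[i] = np.mean(cand, axis=0)  # average of the three guided moves
    return new_X

def binarize(X_cont, rng):
    """Sigmoid transfer: map continuous positions to 0/1 feature-selection masks."""
    prob = 1.0 / (1.0 + np.exp(-X_cont))
    return (rng.random(X_cont.shape) < prob).astype(int)
```

The sigmoid transfer makes bit flips probabilistic in proportion to position magnitude, which avoids the degenerate all-ones or all-zeros masks that naive thresholding against a single random number can produce.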
2.5 Dung beetle algorithm
The Dung Beetle Algorithm (DBA) [14] is a population-based swarm intelligence optimization algorithm proposed in 2022. This algorithm is mainly inspired by dung beetle behaviors such as rolling, reproduction, foraging, and stealing, with the dung beetle population divided into different roles in a 6:6:7:11 ratio [14]. The DBA has advantages in convergence, search effectiveness, and exploration capability, but the phases of behavior that develop around local optima often remain trapped there [14]. Fig 1 below shows the conceptual model of the dung beetle’s boundary selection strategy [16].
Conceptual model of Dung Beetle’s boundary selection strategy.
The foraging behavior of dung beetles selects safe areas, and the foraging area for larvae is dynamically updated [14], represented by the following equations:

Lb^b = max(Xb · (1 − R), Lb),  Ub^b = min(Xb · (1 + R), Ub),  R = 1 − t / T_max

Here, Xb is the local optimal position of the current population, and Lb^b and Ub^b define the lower and upper bounds of the optimal foraging area (Lb and Ub are the bounds of the search space, t is the current iteration, and T_max the maximum number of iterations) [16]. The position update equation for the foraging dung beetles is as follows:

x_i(t + 1) = x_i(t) + Y1 · (x_i(t) − Lb^b) + Y2 · (x_i(t) − Ub^b)

Where Y1 is a random number following a normal distribution, and Y2 is a random vector between 0 and 1 with size 1 × Dim [16].
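A small Python sketch of this foraging update, assuming the standard DBA definitions: a shrinking factor R = 1 − t/T_max around the local best, with the foraging bounds and new positions clipped to the search-space bounds.

```python
import numpy as np

def foraging_update(X, X_best, t, T_max, lb, ub, rng):
    """DBA foraging-area bounds and foraging-beetle position update."""
    R = 1.0 - t / T_max
    Lbb = np.clip(X_best * (1 - R), lb, ub)      # lower bound of foraging area
    Ubb = np.clip(X_best * (1 + R), lb, ub)      # upper bound of foraging area
    Y1 = rng.normal(size=(X.shape[0], 1))        # normally distributed scalar per beetle
    Y2 = rng.random(X.shape)                     # random vector in (0, 1), size 1 x Dim
    X_new = X + Y1 * (X - Lbb) + Y2 * (X - Ubb)
    return np.clip(X_new, lb, ub)
```

As R shrinks over the iterations, the foraging area collapses onto the local best position Xb, which is exactly why the original DBA concentrates around local optima late in the run.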
Although DBA has a unique boundary selection strategy, it struggles with exploring the solution space. Since the optimal search in each iteration is updated around the optimal individual, the position of the optimal individual is crucial. The original DBA algorithm struggles to deeply analyze the optimal individual, leading to some waste in the utilization of the optimal individual.
To improve the DBA’s ability to explore neighborhood optima and solve the premature convergence problem, this study first considers introducing the chaos theory described in Sect 2.2 to guide the positions of dung beetle individuals, hoping to further enhance the DBA’s local exploitation ability. It then adopts a unique embedded structure that combines the improved DBA with the GWA introduced in the previous section, which can minimize the premature convergence issue. This forms a novel wrapper feature selection algorithm based on a hybrid swarm intelligence strategy.
2.6 Previous hybrid feature selection methods
In the field of feature selection, previous researchers combined the advantages of filter and wrapper methods to develop many excellent hybrid algorithms, making significant contributions to the research field. Table 1 below summarizes five hybrid feature selection algorithms from 2021 to 2024.
In 2021, Got et al. attempted to merge filter and wrapper methods, using Whale Optimization Algorithm (WOA) to address the multi-objective nature of feature selection [47]. However, multi-objective optimization remained at the algorithm level, with experimental validation only designed using 2 multi-objective methods and 3 datasets [47]. This makes the results difficult to generalize. In 2022, Zhu et al. proposed an HFSIA algorithm to address the challenges of high-dimensional data. The algorithm combines filter methods and introduces a deadly mutation mechanism and Cauchy operator to enhance algorithm performance for dimensionality reduction [48]. However, the high redundancy of features led to poor classifier performance. At the same time, its fitness evaluation method was inefficient, leading to high computational complexity.
In 2023, Xu et al. discussed the method of using Markov blankets for feature selection, focusing on the relationship between features and classes [49]. They emphasized computing symmetrical uncertainty to evaluate the shared information between features and classes, helping to identify relevant features. However, the study acknowledged the limitations of the datasets used; the algorithm is prone to local optima because it neglects feature interactions; and manually set thresholds are required, making it difficult to find the global optimal solution. In the same year, Deng et al. proposed a strategy that continuously updates the feature threshold set to guide the evolution of genetic algorithms, enhancing search capability [50]. However, the unguided crossover and mutation in the genetic algorithm led to excessive randomness and the generation of poor solutions, and this randomness also slows convergence.
In 2024, Li et al. designed a heuristic search based forward feature selection algorithm to enhance the exploration of feature subsets [51]. However, the cooperative swarm intelligence feature selection algorithm has limitations in searching the entire solution space, making it easy to overlook better solutions. Secondly, the designed algorithm often sacrifices the globally optimal feature combination to meet output conditions. Lastly, the heuristic algorithm suffers from early termination, leading to repeated feature subsets and invalid features.
Previous researchers’ work demonstrated the optimization potential of hybrid feature selection but still had the following shortcomings:
1) Static thresholds or fixed output conditions are difficult to adapt to changes in data distribution.
2) Most methods focus on classification accuracy, ignoring multi-objective collaborative optimization of feature subset length and feature redundancy.
3) Meta-heuristic methods have low computational efficiency in ultra-large feature spaces, leading to a surge in computational costs.
To address these issues, the proposed HMF-W algorithm first uses an extremum control strategy in the proposed process optimization mechanism to adaptively adjust the threshold, thus accommodating real-time changes in the optimization process. Next, the proposed algorithm uses an innovative algorithmic framework, where the univariate Filter method is responsible for significant dimensionality reduction. The POM mechanism regulates the alternating execution of the bivariate Filter method and the improved wrapper method to repeatedly optimize the result feature subset of the univariate Filter method. This method not only ensures a high ACC while selecting shorter feature subsets, but also significantly reduces computational complexity when dealing with high-dimensional data.
3 Hybrid multiple filter-wrapper algorithm
In this section, we provide a detailed introduction to the POM-based Hybrid Multiple Filter-Wrapper (HMF-W) algorithm. The algorithm consists of two modules: Module 1 consists of the RF-FIM algorithm; Module 2 consists of the filter mSMMI algorithm, the wrapper HGW-CDBW algorithm, and the process optimization mechanism.
In Module M1, we use the RF-FIM algorithm mentioned in Sect 2.3, which measures the contribution of features based on the random forest method. We treat it as feature importance and use it as a criterion to select a set of important features from the original dataset for further dimensionality reduction by other algorithms in M2, such as the nonlinear bivariate Filter mSMMI algorithm. This approach first significantly reduces the dimensionality of the original feature set, effectively reducing the length of the final feature subset. It then allows feature selection from three aspects: random forest, mutual information, and Spearman’s coefficient, preventing the omission of valuable features. Below, we detail the M2 module:
3.1 mSMMI filter algorithm
Filter algorithms rank features in a dataset using specific metrics. These include univariate and multivariate methods, such as deep-learning-based feature selection and LASSO regression. LASSO regression performs feature selection using L1 regularization, which ensures sparsity in high-dimensional data; however, it relies on linear assumptions and is limited in handling nonlinear relationships.
Building on the renowned multivariate mRMR concept, this paper introduces a new filter algorithm, mSMMI, to better consider relevance and redundancy simultaneously. It uses maximum mutual information to measure the relevance between features and labels and minimum Spearman to measure redundancy among features, dynamically adjusting the ratio of mutual-information score to redundancy using two parameters, s and y.
3.1.1 Maximum Mutual Information (MMI).
Mutual Information (MI) measures the dependency between two variables, indicating how much information about one variable can be obtained from the value of another. Traditional methods of measuring relevance, such as the Pearson correlation coefficient, can only capture linear relationships. Mutual information can measure complex nonlinear relationships. This makes mutual information extremely robust when dealing with datasets that have highly complex structures.
For two random variables X(x1, x2, ..., xn) and Y(y1, y2, ..., yn), mutual information MI(X;Y) is defined as:

MI(X;Y) = Σ_x Σ_y p(x, y) · log( p(x, y) / (p(x) · p(y)) )

Where, p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
The value of mutual information can be interpreted as how much information about variable Y can be obtained if we know the value of variable X (and vice versa). The higher the value, the stronger the dependency between the two variables. If the mutual information is zero, it indicates that the two variables are independent.
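For discrete variables, the definition above can be computed directly. A minimal sketch (using the natural logarithm, so values are in nats):

```python
import numpy as np

def mutual_information(x, y):
    """Discrete MI(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y)))."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))   # joint probability estimate
            if p_xy > 0:
                p_x = np.mean(x == xv)              # marginal of X
                p_y = np.mean(y == yv)              # marginal of Y
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi
```

For two identical binary vectors the MI equals the entropy log 2 ≈ 0.693, while for independent vectors it is exactly zero, matching the interpretation given above.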
Suppose we have two vectors a and b. Using Eq (13), we calculate MI(a, b) = 1.609. This high mutual information indicates a strong nonlinear relationship between a and b, as shown in Fig 2. For two other vectors c and d, Eq (13) gives MI(c, d) = 0.673. This low mutual information indicates only a weak positive correlation between c and d, as shown in Fig 3.
Nonlinear Relationship between vectors a and b.
Weak positive correlation between vectors c and d.
In high-dimensional raw datasets there are thousands of features, which are reduced to hundreds by the M1 module to form a feature subset F_object. Here, the feature f_i with the highest relevance to the label L is found using Eq (17):

MMI(F_object, L) = MAX{ MI(L, f_i) },  f_i ∈ F_object   (17)

where, MMI(F_object, L) represents the maximum mutual information value between features and the label, F_object denotes the feature subset obtained after dimensionality reduction by module M1, L represents the label vector in the dataset, MAX is the maximum function, MI(L, f_i) is calculated by Eq (16), and f_i denotes the features in F_object, where i = 1, 2, ..., n.
3.1.2 Minimum Spearman (mS).
In feature selection, the concept of maximum relevance and minimum redundancy considers both the correlation between features and labels and the redundancy among features.
The Spearman coefficient [38] is primarily used to measure the monotonic relationship between two variables, where one variable increases or decreases as the other does. The Spearman coefficient is less sensitive to the distribution and form of data, focusing more on the ordinal relationships between variables. This allows it to be used in analyzing both linear and non-linear monotonic relationships, making it more effective in handling non-normal distributions or data with outliers.
For any two features f_i and f_j in the feature set F_object, their Spearman coefficient is calculated by Eq (18):

ρ(f_i, f_j) = 1 − 6 · Σ d_ij² / (n · (n² − 1))   (18)

where, ρ(f_i, f_j) ∈ [−1, 1] is the Spearman coefficient between features f_i and f_j, n is the length of their observation sequences, and d_ij is the rank difference between the observation sequences of f_i and f_j, i = 1, 2, ..., n.
Given two features F1(0.8, 0.8, 0.7, 0.4, 0.9, 0.6, 0.7) and F2(0.5, 0, 0, 0.1, 0.5, 0.1, 0.1), the Spearman coefficient is 0.384, indicating a certain, though relatively weak, monotonic relationship between the two features.
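The worked example above can be checked in a few lines of Python. Note that Eq (18) is the rank-difference form of Spearman's coefficient; with tied values it differs slightly from the Pearson-on-ranks definition used by `scipy.stats.spearmanr`, so Eq (18) is implemented directly here:

```python
import numpy as np
from scipy.stats import rankdata

def spearman_simple(f_i, f_j):
    """Spearman via the rank-difference form: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    r_i, r_j = rankdata(f_i), rankdata(f_j)   # average ranks for ties
    d = r_i - r_j
    n = len(r_i)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

F1 = [0.8, 0.8, 0.7, 0.4, 0.9, 0.6, 0.7]
F2 = [0.5, 0.0, 0.0, 0.1, 0.5, 0.1, 0.1]
rho = spearman_simple(F1, F2)   # ≈ 0.384, matching the worked example above
```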
Eq (18) calculates the Spearman coefficient between a known feature and each feature in the set. The known feature is denoted f_know, and the set of candidate features is denoted F_object. The feature f_k in the set with the smallest Spearman coefficient with respect to the known feature is represented by Eq (19):

mS1(f_know, F_object) = MIN{ ρ(f_know, f_k) },  f_k ∈ F_object   (19)

Where, f_know denotes a known feature, currently f_1; F_object is the target feature set, at this point F_object minus the known feature; and mS1(f_know, F_object) denotes the minimum Spearman coefficient between the known feature and the target feature set. The MIN function denotes the minimum value, and f_k is the feature in F_object whose Spearman coefficient with f_know, calculated using Eq (18), is the smallest within F_object.
The minimum Spearman (mS) between two feature sets is calculated as follows: the selected feature set is represented as P={p_1, p_2,..., p_i,..., p_m}, i=1,2,,…,m And the target set is denoted as A feature is selected from the target set to satisfy the requirement that the Spearman coefficient is the minimum value between the selected feature and the known feature subset.The Spearman coefficient between the two feature sets is represented by Eq (20).
where P is the selected feature set, Fobject is the target feature set, and mS2(P, Fobject) is the minimum Spearman coefficient between the target and selected feature sets. MIN is the minimum-value function, fmin is the feature from Fobject with the smallest Spearman coefficient, and mS1(fmin, P) is calculated by Eq (19).
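As an illustration, the two set-level operators can be sketched as follows. This is a minimal reading of Eqs (19) and (20), assuming mS1 takes the minimum pairwise Spearman coefficient between a candidate feature and the selected set, and mS2 takes the minimum of mS1 over the target set; the correlation function rho is injected so that any Spearman implementation can be plugged in.

```python
def ms1(f, selected, rho):
    # Eq (19) sketch: minimum Spearman coefficient between candidate f
    # and any feature already in the selected set
    return min(rho(f, p) for p in selected)

def ms2(selected, f_object, rho):
    # Eq (20) sketch: minimum over the target set of each candidate's mS1 value
    return min(ms1(f, selected, rho) for f in f_object)
```

In the filter loop, the candidate achieving this minimum is the least redundant feature with respect to the already selected subset.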
3.1.3 Minimum Spearman Maximum Mutual Information (mSMMI).
In the M1 module, the RF-FIM algorithm reduces the number of features, cutting thousands down to a smaller set of hundreds with high information gain. In the M2 module, the bivariate filter algorithm further refines the selection, picking dozens of features with high relevance and low redundancy. The number of features selected by the RF-FIM and mSMMI algorithms depends on the number of features and samples in the dataset, as well as the adjustment factor p and the mapping function f(x). The formula for the number of features K2 selected by the filter algorithm is:
where the random number RandnumA serves as the baseline, providing a certain number of features for the filter algorithm; rand() ∈ (1,10) and p ∈ (0,1), with p set to 0.8 in this case; f and e represent the number of features and samples in the dataset, respectively. The cosine term adapts the number of selected features to the sample size of different datasets: in larger datasets, more diverse features can be selected, avoiding an overly concentrated distribution of feature counts across datasets. The number of features K1 selected by the RF-FIM algorithm is also calculated using Eq (21), except that the range of the baseline RandnumA is expanded 10-fold; this provides more high-quality features for multiple runs of the mSMMI algorithm's feature selection, with the rest remaining unchanged.
Taking the Co and Ov datasets as examples, we evaluate Eq (21) over 100 iterations; the variation of the K1 and K2 values is shown in Fig 4 below.
The trend of K1 and K2 values in the Co and Ov datasets.
From Fig 4, it can be seen that the K1 and K2 values obtained from Eq (21) in the Ov dataset are significantly greater than those obtained in the Co dataset. This is because the Ov dataset has more features and samples than the Co dataset. Therefore, on different datasets, the filter algorithm can select an appropriate number of high quality features using Eq (21).
We combine the previously mentioned maximum mutual information score and minimum Spearman coefficient to select features. After formulating MMI and mS into a single equation, we opt for subtraction to reduce computation. Subsequently, we introduce parameters s and y via Eq (22) to adjust the influence weights of the mutual information and redundancy scores, avoiding the situation where the weights of correlation and redundancy remain constant throughout the filter algorithm. In this way, features are selected one by one from the candidate feature subset.
where s and y are generated iteratively by Eqs (23) and (24), Fobject is the candidate feature set, L is the label vector, P is the selected feature set, p1 is the first selected feature, and i denotes the number of selected features.
This study uses the mSMMI algorithm to find a strong feature combination, which is then refined by the HGW-CDBW wrapper algorithm for precise feature selection. The parameters s and y adjust the influence of relevance and redundancy. Since the magnitudes of MMI and mS differ significantly across datasets, s and y cannot both be fixed at 1: s decreases while y increases, as calculated by Eqs (23) and (24).
where i is the current iteration of the wrapper algorithm and I is the maximum number of iterations. During the wrapper algorithm's iterations, s decreases from 1 to 0 and y increases from 0 to 1, as shown in Fig 5; Fig 6 shows the flowchart of mSMMI. Below is the pseudocode:
During the maximum number of iterations, s decreases from 1 to 0, and y increases from 0 to 1.
This flowchart is an introduction to the proposed filter algorithm mSMMI.
Algorithm 1 mSMMI Algorithm Pseudocode.
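A compact sketch of the greedy loop behind Algorithm 1 is shown below, under two stated assumptions: the s and y schedules of Eqs (23)–(24) are taken to be linear (s = 1 − i/I, y = i/I), and the relevance (MMI) and redundancy (mS) scorers are passed in as functions, since their exact implementations are defined elsewhere in the paper.

```python
def msmmi_select(candidates, k, relevance, redundancy, max_iter):
    """Greedily pick k features by the score s*MMI - y*mS (Eq (22) sketch)."""
    candidates = list(candidates)
    selected = [max(candidates, key=relevance)]     # p1: the most relevant feature
    candidates.remove(selected[0])
    for i in range(1, k):
        s, y = 1 - i / max_iter, i / max_iter       # assumed linear schedules
        best = max(candidates,
                   key=lambda f: s * relevance(f) - y * redundancy(f, selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```

Early picks are thus dominated by relevance, while later picks are increasingly penalized for redundancy with the already selected subset.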
3.2 The hybrid grey wolf and chaotic dung beetle wrapper algorithm
To synergize the global search capability of the grey wolf optimization algorithm with the local exploitation advantages of the DBA, while avoiding DBA's premature convergence, this study improves the dung beetle and grey wolf optimization algorithms through a hybrid strategy. Specifically, it combines the fine-grained exploitation ability of the chaotic dung beetle algorithm (CDBA) with the hierarchical leadership structure of the binary grey wolf algorithm, and then couples them with a support vector machine classifier to form the Hybrid Grey Wolf and Chaotic Dung Beetle Wrapper (HGW-CDBW) algorithm.
3.2.1 Chaotic dung beetle algorithm.
The Dung Beetle algorithm is inspired by the behavior of dung beetles, which use their antennae to sense their surrounding environment and push spherical dung to find the best burial spot. The algorithm solves optimization problems by simulating this behavior. This paper applies it to hybrid feature selection, using average ACC as the fitness function to evaluate the performance of feature subsets and find the optimal subset.
The dung beetle primarily searches around locally optimal solutions. However, this search is overly narrow and uniform, often neglecting the exploration of nearby areas while approaching local optima, which reduces individual diversity and frequently causes the search to settle at local optima. This paper introduces chaotic sequences into the dung beetle search process to add chaotic disturbances, using Logistic mapping to generate the sequences, with the specific formula as follows:
where Seed is the chaotic control parameter with a value range of (3,4), set to 3.9 in this paper; xn is the current value and δ is the next value in the chaotic sequence; n is the number of search agents, set to 30 in this paper.
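The Logistic mapping of Eq (25) can be implemented directly. The sketch below uses the stated control value of 3.9; with the control parameter below 4, every generated value stays inside (0, 1), which is what makes the sequence usable as a bounded step-length factor.

```python
def logistic_sequence(x0, length, seed=3.9):
    # Eq (25): x_{n+1} = Seed * x_n * (1 - x_n); chaotic for Seed near 4
    xs, x = [], x0
    for _ in range(length):
        x = seed * x * (1 - x)
        xs.append(x)
    return xs
```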
One weakness of traditional DBA is its poor exploration of solution spaces, as each iteration optimally searches around the best individual. Thus the position of the best individual is crucial, but traditional DBA fails to analyze the best individual in depth, leading to some waste of the best individual’s utility.
To enhance DBA’s capability to explore local optima and address premature convergence, we employed chaotic sequences as the adjustment factor for search steps, achieving adaptability and diversity in the search process. Compared to the traditional DBA algorithm, the introduction of chaotic mapping increases the randomness and exhaustiveness of the search, allowing for more effective local exploration within the feature space while maintaining diversity and convergence in the search. In this paper, the initial position of the dung beetles is determined by the top three optimal solutions from the grey wolf algorithm. The following is the formula for the improved dung beetle position update:
where xi is the current position of the dung beetle, x′i is the new position, and i is the index of the individual dung beetle. Xbest is the known best position, and δ is the step-length factor generated by Eq (25).
Each dung beetle has an independent step-length factor. The sign function determines the direction of movement, and the log function adjusts the step length, which decreases as the distance increases, allowing a precise approach to the optimal solution.
The introduction of chaotic theory aims to increase the diversity and local exploitation ability of the algorithm, avoiding premature convergence in CDBA algorithm and exploring more potential optimization areas. Below is the pseudocode for the CDBA algorithm:
Algorithm 2 CDBA algorithm pseudocode.
3.2.2 Embedded dung beetle binary grey wolf algorithm.
In the discrete feature selection problem, each position vector dimension in the binary grey wolf algorithm corresponds to the selection status of a feature, where 1 indicates the feature is selected and 0 means it is not. Position selection is usually assisted by random numbers, with the threshold generally fixed at 0.5. Although this position update method uses randomization to determine feature selection, it may result in cases where all random numbers fall above or below 0.5, causing all features or no features to be selected. To overcome this, we sort a set of random numbers in ascending order and select the median of the sequence as the threshold for determining positions, as per the following formula:
where x(t+1)ij is the selection status of feature j by wolf i in iteration t+1, rand() is a random number drawn from a uniform distribution, and μ is the median of the ascending sequence of random numbers; features whose values exceed μ are selected, otherwise they are not.
In the feature selection stage (Fig 7), this study adopts a median-based threshold selection strategy. Under this mechanism, a feature is selected only when its corresponding value exceeds the threshold. This strategy effectively avoids the extreme cases of selecting no features or all features, ensuring that the algorithm always obtains a sufficiently large, high-quality feature subset. It also adaptively regulates the feature-space dimension, reducing the size of the final feature subset. This binarization strategy improves the algorithm's convergence and stabilizes the size of the feature subset.
This figure is an explanation of Eq (27).
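The median-threshold rule can be sketched as below. One uniform random number is drawn per feature dimension, their median becomes μ, and only dimensions whose value exceeds μ are set to 1, so the all-ones and all-zeros cases described above cannot occur; the function name is illustrative.

```python
import random
import statistics

def median_threshold_positions(n_features, rng=random):
    # One uniform random number per feature dimension (Eq (27) sketch);
    # the median of the sequence is the selection threshold mu
    values = [rng.random() for _ in range(n_features)]
    mu = statistics.median(values)
    return [1 if v > mu else 0 for v in values]
```

With distinct random values this selects roughly half of the dimensions (never none, never all), which is what keeps the subset size stable.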
To overcome the limitations of the GWA in local exploitation, we combined the precise exploitation ability of the chaotic dung beetle algorithm with the hierarchical leadership structure of the GWA. The chaotic dung beetle algorithm focuses on the top three individuals in the grey wolf population, performing local exploitation in their vicinity. It generates non-periodic search paths through chaotic sequences and adjusts the dung beetles' movement direction and step size using Eq (26), achieving precise local exploitation. This approach improves solution quality while maintaining diversity.
Specifically, we employ the chaotic dung beetle algorithm to search in the vicinity of the Alpha, Beta, and Delta wolves, respectively. After the three searches, we compare the fitness values of the three best dung beetle individuals with those of the current top three wolves, dynamically updating the positions and fitness values of the top three grey wolf individuals. The remaining wolves in the pack are then attracted toward the optimal solution, adjusting their positions based on the repositioned leaders. This preserves the diversity of the solution space. The proposed embedding strategy retains the excellent global search capability of the grey wolf algorithm while significantly improving the precision of local search, ultimately balancing exploration and exploitation. Fig 8 is a flowchart of the HGW-CDBW; below is the pseudocode:
Algorithm 3 HGW-CDBW algorithm pseudocode.
This flowchart is an introduction to the HGW-CDBW algorithm.
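The embedding step described above, running CDBA around each leader and keeping whichever individual is fitter, can be sketched as follows. Positions are represented abstractly, and `cdba_search` stands in for the chaotic local search; both the fitness function and the search routine are injected, since this illustrates only the leader-update logic rather than the full HGW-CDBW implementation.

```python
def refresh_leaders(wolves, fitness, cdba_search):
    # Pick the current Alpha, Beta, Delta by fitness (higher is better here),
    # run a CDBA local search around each, and keep the fitter of the pair.
    leaders = sorted(wolves, key=fitness, reverse=True)[:3]
    return [max(w, cdba_search(w), key=fitness) for w in leaders]
```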
3.3 The proposed algorithm
The HMF-W algorithm consists of M1 and M2 modules. In the M1 module, the RF-FIM algorithm mentioned in Sect 2.3 is used for preliminary dimensionality reduction, achieving a significant reduction in dimensions. In the M2 module, the proposed Filter mSMMI and wrapper HGW-CDBW algorithms are used. Through the proposed POM mechanism, we dynamically adjust the execution frequency of Filter and wrapper algorithms to escape local optima. When the wrapper algorithm is executed multiple times and its search effect approaches saturation, the algorithm will execute the Filter algorithm to reinitialize the population. This allows for more effective approaches to the global optimum. Let us now provide a detailed introduction:
3.3.1 Population initialization combining chaos theory.
In swarm intelligence optimization algorithms, population diversity affects the convergence speed and search efficiency. Populations are usually initialized randomly, but this method may lead to uneven population distribution, thus affecting the algorithm's global search capability and convergence speed. To address this issue, this paper proposes a population initialization method based on chaos theory, enhancing diversity and coverage by mapping chaotic sequences into the search space. By assigning an independent random seed to each individual, the diversity of the initial population is ensured. The position of each grey wolf is initialized using the following formula:
where N is the size of the population, J is the dimension of the search space, and pij is the value of the i-th individual in the j-th dimension, obtained by linearly mapping the chaotic sequence into the search space: pij = l + δij(u − l), where l and u are the lower and upper bounds of the search space, and δij is an element of the chaotic sequence calculated according to Eq (25).
Chaotic systems exhibit better uniformity, sensitivity to initial values, and pseudo-randomness, which help the population to thoroughly explore the solution space. The chaotic sequence is transformed into the problem’s solution space through linear mapping, ensuring the feasibility of the initial solutions. Algorithm 4 demonstrates the specific steps for initializing the population based on chaos theory.
Algorithm 4 Chaos based population initialization.
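Algorithm 4 can be sketched as below, assuming the linear mapping pij = l + δij(u − l) into the search bounds; each individual gets an independent random seed, and the chaotic values are produced by the logistic map of Eq (25) with the control value 3.9.

```python
import random

def chaos_init(n_pop, dim, low, high, seed_param=3.9, rng=random):
    population = []
    for _ in range(n_pop):
        x = rng.uniform(0.01, 0.99)             # independent chaotic seed per individual
        row = []
        for _ in range(dim):
            x = seed_param * x * (1 - x)        # logistic map, Eq (25)
            row.append(low + x * (high - low))  # linear map into [l, u]
        population.append(row)
    return population
```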
3.3.2 Process optimization mechanism.
In feature selection, the previously mentioned filter and wrapper methods can be combined in various ways to form a hybrid algorithm. This paper introduces the theory of random laser intensity fluctuations, combined with the proposed average improvement rate and mechanism control strategy, forming a POM mechanism that dynamically adjusts the execution frequency of the filter and wrapper algorithms. The wrapper algorithm then repeatedly optimizes the results produced by successive executions of the filter algorithm.
According to Eq (1), with A1 = 2.743, A2 = 1.372, A3 = 0.914 and f1 = 0.1, f2 = 0.2, f3 = 0.4, the random laser intensity fluctuation model produces the results shown in Fig 9.
The Figure demonstrates the variation trend of the random laser intensity fluctuation function.
The horizontal axis in the diagram represents time, showing two periods, while the vertical axis represents the intensity of the laser. We can see that there are six extremums in the intensity during each cycle. These six extremums are used to adjust the execution of the filter and wrapper algorithms.
The Average Improvement Rate (AIR), proposed in this paper, is a key performance metric that measures the degree of performance improvement from one iteration to the next. We use it to determine whether the wrapper algorithm is trapped in a local optimum. The specific formula is as follows:
where AIR is the Average Improvement Rate, fi and fi+1 are the fitness scores of the current (i-th) and next ((i+1)-th) iterations respectively, N is the total number of iterations, and i denotes the current iteration number.
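Under the natural reading of Eq (29), AIR averages the relative fitness improvement between consecutive iterations; a hedged sketch is:

```python
def average_improvement_rate(fitness_history):
    # AIR sketch: mean of (f_{i+1} - f_i) / f_i over consecutive iterations.
    # Positive values mean the wrapper search is still improving.
    n = len(fitness_history)
    return sum((fitness_history[i + 1] - fitness_history[i]) / fitness_history[i]
               for i in range(n - 1)) / (n - 1)
```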
In the early iterations of the HGW-CDBW algorithm, due to the randomness of the initial solutions and the exploration of the solution space, the search effects are more significant, as indicated by a higher AIR value. As the algorithm approaches the optimal solution or saturation point, the improvement in fitness decreases and may even decline. Consequently, the AIR value naturally decreases. The POM mechanism monitors whether the wrapper algorithm’s search results are approaching saturation or getting trapped in local optima by dynamically comparing the count of AIR decrease (CAD) and extrema.
When the CAD value exceeds the current extremum, the algorithm will call the mSMMI filter algorithm again to select a new set of candidate features to reinitialize the population for the HGW-CDBW algorithm, and the CAD value is reset to zero. This achieves the purpose of escaping the local optimum and providing a diverse set of candidate features.
The extremum control (EC) strategy proposed in this paper is used to regulate the progress of the wrapper algorithm’s search. It dynamically adjusts the current extremum used for comparison by combining the six light intensity extrema mentioned before and the AIR value calculated by Eq (29) to determine the search status of the wrapper algorithm, thus ending the search earlier or later. The specific logic is as follows:
If the AIR value continues to increase, indicating that the search results of the algorithm are stronger than expected, the current extremum is increased. Then the wrapper algorithm will perform more iterations, allowing it to search more solution spaces, making it easier to find the global optimum. If the AIR value continues to decline to zero or even below zero, it indicates that the search effect of the algorithm is weaker than expected. Depending on the extent to which it is below expectations, the current extremum is dynamically decreased to allow the wrapper algorithm to end the current search sooner and revert to the filter algorithm to reinitialize the population. This allows the wrapper algorithm to quickly escape local optima. Fig 10 below shows the changes in the values of CAD and Extremum in the Colon dataset over iterations.
The figure demonstrates the variation trends of both the performance metric CAD and the extremum set values during 100 algorithm iterations on the Colon dataset.
Fig 10 shows that in the early stages of iteration, the search for solutions is ideal, and the EC strategy gradually increases the current extremum, allowing the wrapper algorithm to iterate more times and fully explore the solution space. As the search deepens, the likelihood of finding better solutions increasingly decreases. The EC strategy gradually lowers the extremum to expedite the completion of the wrapper algorithm, which then triggers the bivariate filter algorithm to refresh the population. This achieves the goal of quickly escaping local optima and providing a more diverse set of features. The EC strategy determines when to adjust the process of the algorithm, enabling self-optimization adjustments based on the current search status and historical data, and effectively avoiding premature convergence to local optima.
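The CAD/extremum interplay can be sketched as a small monitor class. This is an illustrative simplification, not the paper's full EC strategy: CAD counts consecutive AIR decreases, a rising AIR bumps the current extremum upward, and a restart of the filter stage is signalled once CAD exceeds the current extremum.

```python
class ProcessMonitor:
    """Illustrative POM sketch: CAD counts AIR decreases; the EC strategy moves
    the current extremum, and a restart is signalled when CAD exceeds it."""

    def __init__(self, extrema):
        self.extrema = list(extrema)   # e.g. the six intensity extrema
        self.level = 0                 # index of the current extremum
        self.cad = 0                   # Count of AIR Decrease
        self.prev_air = None

    def observe(self, air_value):
        if self.prev_air is not None:
            if air_value < self.prev_air:
                self.cad += 1          # search weakening
            elif air_value > self.prev_air:
                # search stronger than expected: raise the current extremum
                self.level = min(self.level + 1, len(self.extrema) - 1)
        self.prev_air = air_value
        if self.cad > self.extrema[self.level]:
            self.cad = 0               # reset and restart the filter stage
            return True
        return False
```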
This paper, based on the theory of random laser intensity fluctuations, combines the proposed average improvement rate and extremum control strategy to form a POM mechanism. Unlike traditional multicollinearity techniques, which mainly focus on handling linear correlations between features, this mechanism can promptly call the proposed bivariate filter algorithm to more effectively help the wrapper algorithm escape local optima. It can also adaptively extend the search process of the wrapper algorithm, improving global search capability.
Below is the pseudocode of the HMF-W algorithm, and Fig 11 is its flowchart. From it, we can see that the HMF-W algorithm first uses the RF-FIM algorithm to reduce the original feature set significantly and obtain the candidate feature subset F1, as Module M1. Module M2 includes the remaining part of the algorithm, in which Filter Algorithm 1 (mSMMI) is used to perform multi-angle comprehensive consideration on F1, resulting in the candidate feature subset Fj. Then, based on Fj, Algorithm 4 is used to initialize the population, which is handed over to Algorithm 3 (HGW-CDBW) for iterative optimization. In addition, the proposed POM is responsible for real-time monitoring of the optimization process of Algorithm 3. When Algorithm 3 gets trapped in a local optimum, it switches to Algorithm 1 to generate a new candidate feature subset Fj, which is then given to Algorithm 4 to initialize the population and continue the iterative process of Algorithm 3 until the optimal feature subset is output. Fig 11 below shows the flowchart of the entire HMF-W algorithm:
Algorithm 5 HMF-W algorithm pseudocode.
This figure is an introduction to the proposed algorithm HMF-W.
4 Experiment and result analysis
In order to fully evaluate the hybrid HMF-W algorithm, this paper conducted several experiments on 10 high-dimensional datasets (DS):
The first group of experiments is an ablation experiment, where the various improved components of the proposed hybrid algorithm are separated out. A total of 7 ablation hybrid algorithms were formed. These include the HMF-W algorithm proposed in this paper, the HMF-W_CDB algorithm (removing the chaotic Dung Beetle algorithm), the HMF-W_GWA algorithm (removing the binary grey wolf algorithm), the HMF-W_M1 algorithm (removing the RF-FIM algorithm or M1 module), the HMF-W_mSMMI algorithm (removing the mSMMI algorithm), the HMF-W_Chaos algorithm (removing chaos theory), and the HMF-W_POM algorithm (removing the process optimization mechanism). Each of these algorithms was tested to validate the effectiveness of each component improvement.
The second group of experiments compares the proposed filter algorithm mSMMI with 10 other classic Filter algorithms, such as ReliefF, Chi-square test, mutual information, normalized mutual information, information gain, Fisher score, Spearman, Pearson, maximum Kendall minimum redundancy, and maximum Kendall minimum chi-square, to demonstrate the excellent performance of the proposed Filter algorithm.
The third group of experiments compares the proposed improved wrapper algorithm HGW-CDBW with other classic wrapper algorithms, such as the Ant Lion Optimizer (ALO) [12], Binary Dragonfly Algorithm (BDA) [55], Binary Bat Algorithm (BBA) [56], Genetic Algorithm (GA) [10], Dung Beetle Algorithm (DBA) [14], and Grey Wolf Algorithm (GWO) [43], to validate the effectiveness of the wrapper method improvements.
The fourth group of experiments compares the proposed hybrid algorithm with recent hybrid algorithms to verify the superiority of the proposed algorithm. Although there are many studies on wrapper and filter algorithms, finding an algorithm with exactly the same parameters as the HMF-W algorithm remains a significant challenge. To further validate the algorithm’s performance, we used hybrid algorithms from 2023-2025, such as mRMR-DBA [14], IGMPMMIAPSO [47], TMKMCRIGWO [48], and DFDW [49]. The number of iterations, population size, and the integrated SVM classifier of all the above algorithms are the same as the parameters of the proposed algorithm.
4.1 Experimental dataset and algorithm environment settings
4.1.1 Experimental datasets.
To validate the effectiveness of the HMF-W algorithm, 10 high-dimensional datasets from internationally recognized Microarray Data repositories and the Gene Expression Model Selector are used. Table 2 provides a brief description of these datasets, including the dataset name, sample size, number of features, and number of classes. The sample size ranges from 62 to 253, the number of features from 2,000 to 24,481, and the number of classes from 2 to 5. Datasets with fewer than 100 features are considered low-dimensional, those with fewer than 2,000 are medium-dimensional, and the rest are high-dimensional. Datasets with two class labels are binary, while those with more than two labels are multi-class.
4.1.2 Algorithm parameter settings.
In the next experiments, we compare the proposed HMF-W algorithm with other wrapper and hybrid feature selection algorithms. The SVM classifier uses the radial basis function as the kernel, and its kernel and penalty parameters are determined using grid search. Additionally, we used ten-fold cross-validation to compute the classification accuracy (ACC). The dataset is divided into 10 equally sized groups: one group is used as the test set, while the other 9 are combined to form the training set. The average of these classification accuracies and the length of the obtained feature subset (LEN) are used as the result of the fitness function. Based on reference [43], the best classification performance occurred with 100 iterations and 10 runs. To improve classification performance, this study set the population size to 30 for all algorithms, used 100 iterations, and averaged ACC after testing each dataset 10 times. Table 3 shows the detailed parameter values for each wrapper algorithm.
In this study, the Support Vector Machine (SVM) classifier is used as the evaluation criterion for the fitness function. For testing the ACC of the dataset, we used the 10-fold cross validation method. We divide the dataset into ten equal parts using this method, where 9 parts (90%) are merged as the training set, and the remaining part (10%) is used as the test set. This process is repeated 10 times to ensure that every sample is tested. The parameters of the SVM classifier, including the kernel function, penalty parameter, and Radial Basis Function (RBF) kernel parameters, are selected using the Grid Search method. To ensure fairness, except for Sect 4.6, which uses nested cross-validation (i.e., the parameters of the SVM classifier are set within each training fold), the remaining experimental results all use 10-fold cross validation.
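The 10-fold protocol described above can be sketched without any ML library: the index set is split into 10 near-equal groups, each group serves once as the test set, and the remaining nine are merged into the training set. Classifier training and grid search are omitted here, as they follow the standard SVM procedure described in the text.

```python
def k_fold_splits(n_samples, k=10):
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]    # k near-equal groups
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```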
4.1.3 Environmental description.
To comprehensively evaluate the performance of the proposed algorithm, we validate the HMF-W algorithm in a CPU environment. Specifically, the environment includes Windows 11, CPU: i5-9400H, 16GB RAM, and an NVMe 1024GB SSD. Through the validation process, we are able to analyze the algorithm’s performance under different conditions and conduct detailed testing and evaluation of its efficiency and ACC. The experiment design is divided into three main modules:
(1). Data Preprocessing Module: The core function of this module is to systematically normalize the raw data in the dataset to ensure that all features are on the same scale, thereby eliminating the impact of different scales on the analysis results. In addition, this module is responsible for extracting basic statistical information from the dataset, including the number of samples and the number of features.
(2). Analysis Module: After completing data preprocessing, we use the HMF-W algorithm to conduct in depth analysis on the normalized data. This algorithm focuses on the relationships between multiple variables and can effectively identify the interdependencies between features and their contributions to the target variable.
(3). Result Output: Finally, the results generated by the analysis module are visualized to verify the ACC and performance of the algorithm.
4.2 Ablation experiments
In order to verify the superiority of the dual-module hybrid algorithm HMF-W, the improved components of the algorithm were isolated on the 10 test datasets given in Table 2, forming the 7 ablation hybrid algorithms listed above (HMF-W, HMF-W_CDB, HMF-W_GWA, HMF-W_M1, HMF-W_mSMMI, HMF-W_Chaos, and HMF-W_POM). Each algorithm was run 10 times to obtain the average ACC, average LEN, and Relative Speedup (RS), as well as the Standard Deviation (SD) of all results. In this paper, the shortest average RunTime (RT) and its lowest SD for each dataset are taken as T and t, respectively; the comparison results are shown in Table 4.
From Table 4, it can be seen that the HMF-W algorithm achieves better classification accuracy on all datasets, except on the Ly dataset, where it reaches the same accuracy as the HMF-W_CDB algorithm. On most datasets, the feature subset length chosen by HMF-W is also better than the others, except on the Ly, BT, and CS datasets, where the proposed algorithm is slightly worse than HMF-W_M1 and HMF-W_CDB. Finally, on the DL dataset, the proposed algorithm's RS is significantly better than those of the other ablation variants, while on most datasets it is slightly worse than that of the HMF-W_GWA algorithm. Overall, the HMF-W algorithm achieves excellent classification accuracy while selecting a shorter feature subset, balancing classification precision and convergence. Moreover, its relative speedup surpasses at least half of the ablation hybrid algorithms. This shows the proposed algorithm has excellent classification accuracy, strong convergence, and good computational efficiency.
To more comprehensively assess the algorithm’s robustness, Table 5 presents the worst, best, and average ACC of the HMF-W algorithm on 10 datasets, as well as the corresponding feature number selected for the worst and best ACC. Additionally, we plot the convergence of the performance metrics of the 7 algorithms across all datasets, as shown in Figs 12 and 13.
This figure demonstrates the iterative convergence of classification accuracy of the proposed algorithm and six ablation algorithms on 10 datasets.
This figure demonstrates the iterative convergence of feature subset lengths of the proposed algorithm and six ablation algorithms on 10 datasets.
By combining Table 4 and Fig 12, it can be observed that the proposed algorithm has a better ACC convergence trend across all datasets compared to other ablation hybrid algorithms. Specifically, on the SR dataset, the ACC convergence trend of the proposed algorithm is similar to that of HMF-W_Chaos, HMF-W_GWA, HMF-W_M1, and HMF-W_mSMMI algorithms, all of which outperform other algorithms significantly. On the DL dataset, the ACC convergence trend of the proposed algorithm is almost identical to that of the HMF-W_GWA algorithm, both significantly outperforming other algorithms. On the Ov dataset, the ACC convergence trend of the proposed algorithm is similar to that of the HMF-W_M1 algorithm, both clearly outperforming other algorithms. Therefore, the improvements in the proposed algorithm clearly enhance its ACC.
From Fig 13, it can be seen that on the Ly, BT, and CS datasets, the LEN convergence trend of the proposed algorithm is slightly inferior to the ablation hybrid HMF-W_CDB and HMF-W_M1 algorithms, but still significantly better than other algorithms. Therefore, the improvements in the proposed algorithm effectively enhance its convergence.
Based on Table 4, Figs 12 and 13, it can be concluded that the proposed process optimization mechanism helps improve the algorithm’s ACC, demonstrating its excellent algorithmic control capability. Additionally, the improvements in the other five algorithm components have also enhanced the algorithm’s classification capability to varying degrees, which proves that the improvements in the components of the proposed HMF-W algorithm are effective.
4.3 Filter algorithm comparison experiment
To verify the performance of the proposed Filter method mSMMI, it was compared with 10 other Filter methods on the 10 test datasets given in Table 2, running each once while selecting 25, 50, 75, 100, 150, 175, 200, 250, 275, and 300 features. The average ACC, RS, and their SD were obtained. In this paper, the shortest average RT and its lowest SD for each dataset are taken as T and t, respectively; the comparison results are shown in Table 6.
From Table 6, we can see that, on the Br dataset, MkmR and mSMMI algorithms are comparable. Except for the BT and CS datasets, where mSMMI performs slightly worse than FisherScore and ReliefF, mSMMI outperforms other Filter algorithms on the remaining 8 datasets. This proves that the proposed Filter method has superior classification performance.
Because filter algorithms are computationally cheap, the efficiency gap between them is difficult to demonstrate. Accordingly, in the Wilcoxon test on the RS results, only the ChiSquare, FisherScore, ReliefF, MKMC, and mKMR algorithms show significant differences from the proposed algorithm, on at most 4 datasets. Comparing RS, we observe the following: on the CS dataset, the RS of the mSMMI algorithm is significantly better than the others, reducing the average runtime by 25%. On the Ly and Ov datasets, the proposed algorithm's RS is on par with most algorithms and better than a few. On the Co, SR, DL, and Br datasets, the RS of all algorithms is the same. On the BT, ML, and SC datasets, the proposed algorithm is on par with the other bivariate filter algorithms but slightly worse than most univariate algorithms. Overall, the proposed bivariate filter method remains competitive in computational efficiency.
To provide a more comprehensive evaluation of the algorithm’s robustness, Table 7 shows the worst, best, and average ACC of the mSMMI Filter algorithm on 10 datasets, as well as the number of selected features corresponding to the worst and best ACC.
4.4 Wrapper algorithm comparison experiment
To verify the superiority of the proposed algorithm’s wrapper part, we compared 6 other wrapper algorithms with the HGW-CDBW wrapper algorithm proposed in this paper on the 10 test datasets given in Table 2. To improve computational efficiency, each algorithm randomly selects several hundred features before performing the wrapper search. Each algorithm was run ten times to obtain the average ACC, average LEN, and RS, along with the SD of all results. In this paper, the shortest average RT and its lowest SD for each dataset are taken as T and t, respectively, and the comparison results are shown in Table 8.
From Table 8, we can see that on most datasets the improved HGW-CDBW algorithm achieves a higher average ACC than the other wrapper algorithms; it is merely on par with the GWA+SVM algorithm on the SR dataset and slightly worse than the CDBA+SVM algorithm on the ML dataset. On most datasets, the improved algorithm’s average LEN is better than that of the other algorithms, except on the Br dataset, where it is slightly worse than the GWA+SVM algorithm. Likewise, the RS of the improved method is significantly better than that of the other wrapper algorithms on most datasets, except on the Co dataset, where it is slightly worse than the GWA+SVM algorithm. On all datasets, the improved algorithm shows statistically significant differences in average ACC, LEN, and RS compared to most other algorithms. This demonstrates that the improved wrapper algorithm has superior convergence and classification performance.
To provide a more comprehensive evaluation of the algorithm’s robustness, Table 9 shows the worst, best, and average ACC of the HGW-CDBW algorithm on 10 datasets, as well as the corresponding LEN.
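The chaos-enhanced position update that distinguishes HGW-CDBW from the plain dung beetle optimizer can be sketched as follows. This is a minimal illustration assuming a logistic map as the chaotic sequence (the paper does not specify which map is used here); all names and the perturbation scale are illustrative.

```python
import numpy as np

def logistic_map_sequence(length, x0=0.7, r=4.0):
    """Generate a chaotic sequence in (0, 1); r = 4 is the fully chaotic regime."""
    seq = np.empty(length)
    x = x0
    for i in range(length):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

def chaotic_perturb(positions, lower, upper, step=0.1):
    """Perturb swarm positions with chaotic values mapped into [-step, step]
    of the search range, then clip back into the bounds."""
    chaos = logistic_map_sequence(positions.size).reshape(positions.shape)
    perturbed = positions + step * (2.0 * chaos - 1.0) * (upper - lower)
    return np.clip(perturbed, lower, upper)
```

Because the logistic map is deterministic yet non-repeating, perturbations of this kind cover the neighborhood of a position more evenly than uniform random noise, which is the usual motivation for chaos-based update mechanisms in swarm optimizers.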
4.5 Hybrid algorithm comparison experiment
To validate the superiority of the proposed HMF-W algorithm, we compare it with four recent hybrid algorithms on the 10 test datasets given in Table 2. Each algorithm was run ten times to get the average ACC, average LEN, and RS, along with the SD of all results. In this paper, the shortest average RT and its lowest SD for each dataset are taken as T and t, respectively, and the comparison results are shown in Table 10.
From Table 10, we can see that the proposed HMF-W algorithm achieves much higher accuracy on most datasets, being slightly worse than DFDW only on the Ly dataset. Its selected feature subset is shorter on most datasets, except on Co, where it is slightly worse than DFDW, showing good convergence and classification performance. Next, the proposed algorithm’s RS is significantly better than that of most other algorithms on most datasets, except on the Ov dataset, where it is slightly worse than the IGMPMMIAPSO algorithm. From the perspective of the Wilcoxon test, the proposed algorithm’s results show significant differences from most other algorithms across all datasets, with the following exceptions: its ACC does not differ significantly from at least two algorithms on the SR, Br, and DL datasets; its LEN does not differ significantly from two algorithms on the CS and Br datasets; and its RS does not differ significantly from the DFDW and IGMPMMIAPSO algorithms on the Ov dataset. Finally, the proposed algorithm shows a relatively low standard deviation on most datasets, indicating that its results are robust.
To further validate the classification performance of the proposed algorithm, Fig 14 shows the ACC values obtained by these five hybrid algorithms on ten datasets. In this figure, the vertical coordinate of the radar chart is the ACC value, and the number at the edge is the number of runs.
This figure displays the classification accuracy values of 5 hybrid algorithms over 10 executions on 10 datasets.
From Fig 14, we can see that the proposed HMF-W algorithm achieves much better ACC on most datasets, being slightly worse than DFDW only on the Ly dataset. Its selected feature subset is shorter on most datasets, except on Co, where it is slightly worse than DFDW, showing good convergence and classification performance. Next, the proposed algorithm runs faster than most others, except on Ov, where it is slightly slower than IGMPMMIAPSO. Based on the Wilcoxon test, the results show significant differences from most other algorithms on all datasets, with these exceptions: the ACC on SR, Br, and DL does not differ significantly from at least two algorithms; the subset length differs insignificantly only on CS and Br; and the running time differs insignificantly only on Ov, compared with DFDW and IGMPMMIAPSO. Finally, the proposed algorithm shows a relatively low standard deviation on most datasets, indicating that its results are robust.
To further describe the differences between HMF-W and the other 10 hybrid algorithms, the minimum and maximum ACC rates obtained by the 11 algorithms combined with the SVM classifier on the ten datasets are shown in Figs 15 and 16, where the horizontal axis represents the datasets and the vertical axis represents the ACC values. The points in the figures represent the lowest or highest ACC values.
This figure compares the minimum classification accuracy values achieved by 11 hybrid algorithms on all 10 experimental datasets.
This figure visualizes the maximum classification accuracy values attained by 11 hybrid algorithms across the 10 experimental datasets.
From Figs 15 and 16, it can be seen that the classification ability of the HMF-W algorithm is the best among all algorithms across all five summary statistics (minimum, 25th percentile, median, 75th percentile, and maximum). Therefore, based on Table 10 and Figs 14 to 16, the HMF-W algorithm has significant advantages in reducing feature dimensions and finding the optimal ACC. The standard deviation of its classification results is relatively low on most datasets, so it can be concluded that the dynamic feature selection strategy used by the filter method is robust. Moreover, with its outstanding search ability, the algorithm can effectively solve the problems faced during the feature selection process. In summary, the algorithm balances the goals of ACC and feature count.
4.6 Nested cross-validation hybrid algorithms comparison experiment
To further demonstrate the superiority of the proposed algorithm, this section presents new results by applying nested cross-validation to five hybrid algorithms. Specifically, on 10 test datasets, the proposed algorithm was compared with four recent hybrid algorithms. Each algorithm ran 10 times to obtain average ACC, average LEN, RS, and the SD of all results. In this paper, the shortest average RT and its lowest SD for each dataset are taken as T and t, respectively, and the comparison results are shown in Table 11:
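The nested cross-validation protocol used in this section can be sketched briefly: an inner loop tunes the classifier while an outer loop provides an unbiased accuracy estimate. The sketch below uses scikit-learn on a stand-in dataset; the SVM parameter grid and fold counts are illustrative assumptions, not the paper's exact settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: hyperparameter tuning; outer loop: performance estimation.
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

model = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},  # illustrative grid
    cv=inner,
)

# Each outer fold refits the tuned model on its training split only,
# so the reported scores are never contaminated by the tuning data.
scores = cross_val_score(model, X, y, cv=outer)
print(scores.mean())
```

The key point, which explains the longer running times in Table 11, is that the classifier is retrained (tuning included) once per outer fold rather than once overall.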
As seen in Table 11, all hybrid algorithms experience a significant increase in running time under nested cross-validation, but the proposed HMF-W algorithm remains the best. On most datasets, the proposed algorithm’s average ACC and RS are better than all other hybrid algorithms, except on the Co and Br datasets, where the proposed algorithm’s average ACC is slightly worse than the TMKMCRIGWO, DFDW, and IGMPMMIAPSO algorithms. Also, on the Ov dataset, the proposed algorithm’s RS is slightly worse than the DFDW and IGMPMMIAPSO algorithms. Moreover, by comparing Tables 10 and 11, the average ACC of the proposed algorithm under nested cross-validation is 0.92% higher than that under 10-fold cross-validation. Specifically, on most datasets, the proposed algorithm achieves a higher average ACC under nested cross-validation than under 10-fold cross-validation, with two more datasets achieving 100% average ACC under nested cross-validation compared to 10-fold cross-validation. However, the LEN selected by the proposed algorithm under nested cross-validation is longer, and the running time increases significantly, but it still outperforms the other hybrid algorithms.
4.7 Biological genomic explanation
From a biological genomics perspective, only a few microarray genes are relatively important for disease diagnosis [54,55]. The proposed algorithm can find the smallest gene subset with the highest ACC. It is therefore crucial to identify the selected genes and to analyze their contribution to medical diagnosis and their biological significance. Table 12 shows the optimal gene subsets obtained using the proposed algorithm on each dataset.
As shown in Table 12, the proposed algorithm, when combined with SVM, achieves high ACC and TPR on the optimal gene subsets selected from the high-dimensional microarray data. On most datasets, the proposed algorithm selects the best features at least 5 times in 10 runs, except for the CS dataset, where the best features were selected only 2 times. These results show that the proposed algorithm reliably identifies and selects optimal features. Finally, the selected gene features have biological significance. We conducted enrichment analysis on the optimal genes selected from the Co, SR, and Ov datasets. Details are as follows:
First, the Co dataset includes 30 cases of colorectal cancer polyps at various stages. It also contains variations in preparation procedures and observations using different surgical instruments during complete colonoscopy. Shieh et al. noted that polyps usually make up a small proportion in complete colonoscopy records [56]. In the 10 runs of the proposed algorithm in this study, the best 9 genes were selected 6 times, with both ACC and TPR being 95.3%. Furthermore, DAVID’s (GOTERM_CC_DIRECT) GO cellular component analysis shows that this gene list is enriched in extracellular exosome functions, with an enrichment score of P=1.95E-2 and Benjamini=9.74E-1.
The SR dataset includes 83 samples, each with 2,308 gene expression values. It covers four types of small round cell blue tumors: 29 cases of Ewing’s sarcoma, 18 of neuroblastoma, 25 of rhabdomyosarcoma, and 11 of Burkitt lymphoma. This is a well-known pediatric tumor classification dataset [59]. In the 10 runs of the proposed algorithm, the best 4 genes were selected 6 times, achieving both an ACC and TPR of 100%. Furthermore, DAVID’s (GOTERM_CC_DIRECT) GO cellular component analysis shows that this gene list is enriched in intracellular membrane bounded organelle functions, with an enrichment score of P=7.00E-3 and Benjamini=0.301.
The Ov dataset consists of surface-enhanced laser desorption/ionization time-of-flight protein mass spectrometry data, including 162 ovarian cancer samples and 91 normal tissue samples [23]. In the 10 runs of the proposed algorithm, the best 3 genes were selected 6 times, achieving both an ACC and TPR of 100%. Furthermore, DAVID’s (UP_SEQ_FEATURE) annotation enrichment shows that this gene list is enriched in disulfide bond functions, with an enrichment score of P=7.63E-2 and Benjamini=1.14E-1.
The biological interpretation for other biological datasets is as follows:
The Ly dataset includes gene expression profiles for three common adult lymphatic malignancies: 46 cases of diffuse large B-cell lymphoma, 9 of follicular lymphoma, and 11 of chronic lymphocytic leukemia. In 2014, Zhang et al. analyzed certain activities of B cells [57]. The gene subset selected by the proposed algorithm achieved an ACC of 98.6% and a TPR of 98.5%. The DL dataset represents a common type of non-Hodgkin lymphoma, accounting for a substantial share of all such cases. It can be classified into two subtypes based on molecular phenotype: germinal center B-cell-like (GCB) and activated B-cell-like (ABC). In 2017, Franceschi et al. studied various types of lymphoma; the ABC subtype has a poor prognosis, with a 5-year survival rate of only 26%, while the GCB subtype has a 62% survival rate [58]. The optimal gene subset selected by the proposed algorithm achieved an ACC and TPR of 100%. The BT dataset includes gene expression profiles for malignant gliomas, recording four types of brain cancer samples: 14 typical brain tumors, 15 lesions, 14 glioblastomas, and 7 anaplastic oligodendrogliomas [2]. The optimal gene subset selected by the proposed algorithm achieved an ACC of 98.9% and a TPR of 98.8%. The CS dataset covers different genetic subtypes of CLL, a mature B lymphocyte malignancy characterized by prolonged lymphocyte survival, immune dysfunction, and treatment resistance [61]. The optimal gene subset selected by the proposed algorithm achieved an ACC and TPR of 94.6%. The ML dataset is a gene microarray for mixed leukemia caused by chromosomal translocation in acute lymphoblastic leukemia. It records three leukemia subtypes: 24 cases of MLL chromosomal translocation leukemia, 20 of acute lymphoblastic leukemia, and 28 of acute myelogenous leukemia [62]. The optimal gene subset selected by the proposed algorithm achieved an ACC and TPR of 100%.
The SC dataset is a Phenome-Wide Association Study related to smoking and cannabis use, involving polygenic risk scores for alcohol, opioids, smoking initiation, and lifetime cannabis use disorders. The optimal gene subset selected by the proposed algorithm achieved an ACC and TPR of 87.7%. The Br dataset includes gene expression profiles from breast cancer patients: 46 cases of recurrent breast cancer and 51 of non-recurrent breast cancer. Due to RNA degradation in the samples [59], the proposed algorithm achieved an ACC of 74.9% and a TPR of 74.8%.
This demonstrates that the proposed HMF-W algorithm can accurately identify and select biologically significant genes in most genomic or proteomic datasets.
4.8 Analysis of dimensionality reduction effects
The HMF-W algorithm can obtain short feature subsets with high ACC, which is attributed to the algorithm’s good convergence and its dual-module structure design, ensuring the selected feature subsets are shorter. As shown in Figs 17 to 19, the x-axis represents all datasets, while the y-axis represents the number of features.
This figure demonstrates the three-layer dimensionality reduction effects of the HMF-W algorithm on 10 datasets.
This figure showcases the primary dimensionality reduction performance of HMF-W during its first-layer processing across 10 datasets.
This figure details the secondary dimensionality reduction results achieved by HMF-W at its second optimization layer over 10 datasets.
In Fig 17, we can see that the proposed algorithm achieves excellent dimensionality reduction on the 10 datasets through the reduction operations of Modules M1 and M2. The number of features in most datasets drops to single digits. The Ov dataset shows the most significant effect, reducing from 15,154 features to 3, i.e., to 0.02% of the original dimension. The final LEN of 7 datasets is less than 10; only the Ly, CS, and Br datasets have a final feature count slightly above 10, with dimensions reduced to at most 0.45% of the original. The algorithm therefore has a low overall time complexity, although its wrapper part adopts a unique embedded algorithm structure, which makes its theoretical complexity slightly higher than that of hybrid algorithms such as DFDW, IGMPMMIAPSO, and TMKMCRIGWO.
Figs 18 and 19 display the number of features (K1, K2) after dimensionality reduction by the RF-FIM and mSMMI algorithms, following 10 runs on each dataset. As shown in Fig 18, the average number of features obtained by Module M1 was between 183.3 and 245.3, with all maximum values above 200, and two datasets (CS, SC) exceeding 250. Fig 19 shows that after the execution of the filter algorithm, the average number of features selected by Module M2 ranges from 15.9 to 18, with the maximum values around 20; three datasets (Co, DL, Br) have fewer than 20 features. The number of selected features can be adjusted in the algorithm’s coding to avoid selection failures. Overall, the multiple K values ensure the algorithm can select diverse features, demonstrating the strong convergence of the dimensionality reduction algorithm in Module M2.
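The first-stage reduction in Module M1 can be sketched as keeping the top-K1 features ranked by random-forest importance. The snippet below is an illustrative sketch on synthetic data; the forest settings and the value of K1 are assumptions (the paper's K1 averages several hundred, per Fig 18), not the authors' exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy high-dimensional dataset standing in for a microarray matrix.
X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=10, random_state=0)

# Fit a forest and rank features by Gini importance (RF-FIM-style step).
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
K1 = 50  # illustrative; the paper retains a few hundred features here
top_idx = np.argsort(forest.feature_importances_)[::-1][:K1]
X_reduced = X[:, top_idx]
print(X_reduced.shape)  # (100, 50)
```

This single embedded-model pass is what lets Module M1 collapse thousands of features to a few hundred before the costlier mSMMI and wrapper stages run.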
4.9 Computational complexity
Computational complexity measures the resources (such as time and space) required during algorithm execution: time complexity and space complexity reflect the time efficiency and memory consumption of the algorithm, respectively. Together, these metrics evaluate the computational efficiency and performance of an algorithm and provide a basis for comparing different algorithms. The computational complexity of the 10 hybrid algorithms used for comparison with the proposed HMF-W algorithm is presented in Table 13.
Here, T represents the total number of iterations, K the number of selected features, n the number of features in the dataset, and S the time spent executing the SVM classifier. n×K represents the computational complexity of updating the algorithm’s position, and K×S the computational complexity of the SVM classifier during one iteration.
In the proposed HMF-W algorithm, the RF-FIM and mSMMI algorithms are first used for dual dimensionality reduction on the original dataset. If the original dataset has 2000 features, roughly 200 may remain after the first reduction and 20 after the second. These 20 features then serve as the initial feature subset for the wrapper algorithm, which is more conducive to selecting the most relevant and least redundant features. As the number of features decreases, computation time decreases, with a corresponding reduction in computational complexity. In Table 13, since the number of selected features is much smaller than the original number of features (K ≪ n), K×n is smaller than n². Therefore, this algorithm has relatively low computational complexity. Except for the ablation hybrid algorithm without chaos theory, whose computational complexity is nearly the same as the proposed algorithm’s, the computational complexity of the proposed algorithm is slightly higher than that of its remaining 5 ablation hybrid algorithms.
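The scale of this saving can be checked with a line of arithmetic, using the 2000 → 200 → 20 example above: with K = 20 selected features and n = 2000 original features, the per-iteration cost K×n is only 1% of the n² cost incurred when searching over the full feature set.

```python
# Worked numbers for the K*n vs n^2 comparison in Table 13.
n, K = 2000, 20
print(K * n)             # 40000
print(n ** 2)            # 4000000
print((K * n) / n ** 2)  # 0.01
```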
Because the wrapper part of the proposed algorithm uses a unique embedded structure, its computational complexity is slightly higher in theory than that of hybrid algorithms such as DFDW, IGMPMMIAPSO, and TMKMCRIGWO. However, as shown in Sect 4.5, the proposed algorithm runs faster than most other hybrid algorithms on almost all datasets. This is because we carefully optimized the wrapper process and achieved good relative speedups (RS). Specifically, the DBO itself is efficient, and we improved its local search ability using chaos theory; the resulting CDBO was then applied in the local search phase, where the grey wolf optimizer is least efficient. In this way, the strong classification performance of the grey wolf optimizer is retained while computational efficiency is greatly improved.
When dealing with most high-dimensional omics data, the HMF-W algorithm shows higher computing efficiency and faster speed than the others, owing to its lower complexity and optimized process. This means that within the same time, the proposed algorithm can handle more data or perform more iterations, improving both performance and efficiency. Moreover, its lower computational complexity indicates that the algorithm is more practical for large-scale omics data: under limited resources, it may be more suitable for most high-dimensional omics data, as it can run with less memory. However, when handling extremely large omics data, the algorithm might still face difficulties due to memory limits.
5 Conclusion
Based on the summary in the introduction, current research faces three main issues: 1) univariate Filter methods struggle to break through the theoretical bottleneck of feature interaction compensation, while bivariate Filter methods find it difficult to cope with both the curse of dimensionality and limited sample sizes; 2) existing literature has not fully explored a synergistic framework combining the dung beetle and grey wolf algorithms; 3) there is limited research on interdisciplinary regulation methods in the field of hybrid feature selection.
To address the above issues and alleviate the local-optima problem, this paper proposes a novel dual-module hybrid feature selection algorithm, HMF-W. The algorithm first uses the RF-FIM algorithm in Module M1 to perform initial dimensionality reduction on the original feature set, and then deeply integrates the mSMMI and HGW-CDBW algorithms in Module M2. Through a process optimization mechanism (POM), the wrapper method can backtrack to the Filter method when it falls into a local optimum and update the population to escape it, approaching the global optimal solution. Compared to other hybrid algorithms, this algorithm selects shorter candidate feature subsets while monitoring the optimization process in real time through the POM, helping the wrapper algorithm escape local optima while avoiding frequent calls to the Filter algorithm.
The proposed dual-module hybrid framework and improvement strategies were extensively compared with 10 hybrid algorithms from the past three years on 10 public benchmark datasets from MGE. Experimental results indicate that the algorithm outperforms the other algorithms: on all datasets, the average classification accuracy is at least 1.3% higher, the average feature subset length is at least 8 units shorter, and the dimension is reduced to less than 0.45% of the original. The results are biologically meaningful and statistically significant. Although the algorithm performs excellently on high-dimensional datasets, its encoding layer is specifically optimized for high-dimensional data, which makes it challenging to select highly relevant effective features on some low-dimensional datasets.
Future work will extend the algorithm to medium- and low-dimensional datasets, continuing to optimize it while maintaining excellent performance on high-dimensional data. We will also address experimental equipment constraints and, if conditions permit, adopt parallel computing technologies to accelerate computation and verify the algorithm’s feasibility on ultra-large-scale datasets.
References
- 1. Nicolella DP, Bredbenner TL, Havill LM, Tamez-Pena JG, Gonzalez P, Schreyer E, et al. Variation in knee shape predicts the future onset of radiographic knee osteoarthritis (RKOA) and this variation is different in males compared to females. Osteoarthritis and Cartilage. 2015;23:A208–9.
- 2. Pashaei E, Liu S, Li K, Zang Y, Yang L, Lautenschlaeger T, et al. DiCE: differential centrality-ensemble analysis based on gene expression profiles and protein-protein interaction network. Nucleic Acids Res. 2025;53(13):gkaf609. pmid:40626556
- 3. Jiao R, Nguyen BH, Xue B, Zhang M. A survey on evolutionary multiobjective feature selection in classification: approaches, applications, and challenges. IEEE Trans Evol Computat. 2024;28(4):1156–76.
- 4. Turkoglu B, Uymaz SA, Kaya E. Binary artificial algae algorithm for feature selection. Applied Soft Computing. 2022;120:108630.
- 5. Gu X, Guo J, Li C, Xiao L. A feature selection algorithm based on redundancy analysis and interaction weight. Appl Intell. 2020;51(4):2672–86.
- 6. Lambrechts G, De Geeter F, Vecoven N, Ernst D, Drion G. Warming up recurrent neural networks to maximise reachable multistability greatly improves learning. Neural Netw. 2023;166:645–69. pmid:37604075
- 7. Sun H, Cao J, Pang Y. Semantic-aware self-supervised depth estimation for stereo 3D detection. Pattern Recognition Letters. 2023;167:164–70.
- 8. Fan Y, Chen B, Huang W, Liu J, Weng W, Lan W. Multi-label feature selection based on label correlations and feature redundancy. Knowledge-Based Systems. 2022;241:108256.
- 9. Li Z. A local opposition-learning golden-sine grey wolf optimization algorithm for feature selection in data classification. Applied Soft Computing. 2023;142:110319.
- 10. Cavallaro C, Cutello V, Pavone M, Zito F. Machine learning and genetic algorithms: a case study on image reconstruction. Knowledge-Based Systems. 2024;284:111194.
- 11. Tijjani S, Ab Wahab MN, Mohd Noor MH. An enhanced particle swarm optimization with position update for optimal feature selection. Expert Systems with Applications. 2024;247:123337.
- 12. Avellaneda-Gomez LS, Grisales-Noreña LF, Cortés-Caicedo B, Montoya OD, Bolaños RI. Optimal battery operation for the optimization of power distribution networks: an application of the ant lion optimizer. Journal of Energy Storage. 2024;84:110684.
- 13. Liang M, Wang L, Wang L, Zhong Y. A hypervolume-based cuckoo search algorithm with enhanced diversity and adaptive scaling factor. Applied Soft Computing. 2024;151:111073.
- 14. Xue J, Shen B. Dung beetle optimizer: a new meta-heuristic algorithm for global optimization. J Supercomput. 2022;79(7):7305–36.
- 15. Nadimi-Shahraki MH, Zamani H, Mirjalili S. Enhanced whale optimization algorithm for medical feature selection: A COVID-19 case study. Comput Biol Med. 2022;148:105858. pmid:35868045
- 16. Wang Z, Huang L, Yang S, Li D, He D, Chan S. A quasi-oppositional learning of updating quantum state and Q-learning based on the dung beetle algorithm for global optimization. Alexandria Engineering Journal. 2023;81:469–88.
- 17. Gad AG, Sallam KM, Chakrabortty RK, Ryan MJ, Abohany AA. An improved binary sparrow search algorithm for feature selection in data classification. Neural Comput & Applic. 2022;34(18):15705–52.
- 18. Jain I, Jain VK, Jain R. Correlation feature selection based improved-binary particle swarm optimization for gene selection and cancer classification. Applied Soft Computing. 2018;62:203–15.
- 19. BinSaeedan W, Alramlawi S. CS-BPSO: hybrid feature selection based on chi-square and binary PSO algorithm for Arabic email authorship analysis. Knowledge-Based Systems. 2021;227:107224.
- 20. Vommi AM, Battula TK. A hybrid filter-wrapper feature selection using Fuzzy KNN based on Bonferroni mean for medical datasets classification: A COVID-19 case study. Expert Systems with Applications. 2023;218:119612.
- 21. Salesi S, Cosma G, Mavrovouniotis M. TAGA: Tabu Asexual Genetic Algorithm embedded in a filter/filter feature selection approach for high-dimensional data. Information Sciences. 2021;565:105–27.
- 22. Gayathri R, Suchand Sandeep CS, Vijayan C, Murukeshan VM. Lasing from micro- and nano-scale photonic disordered structures for biomedical applications. Nanomaterials (Basel). 2023;13(17):2466. pmid:37686974
- 23. Pashaei E. Mutation-based binary aquila optimizer for gene selection in cancer classification. Comput Biol Chem. 2022;101:107767. pmid:36084602
- 24. Kuran F, Tanırcan G, Pashaei E. Performance evaluation of machine learning techniques in predicting cumulative absolute velocity. Soil Dynamics and Earthquake Engineering. 2023;174:108175.
- 25. Sosa-Cabrera G, Gómez-Guerrero S, García-Torres M, Schaerer CE. Feature selection: a perspective on inter-attribute cooperation. Int J Data Sci Anal. 2023;17(2):139–51.
- 26. Lawrence MO, Jimoh RG, Yahya WB. An efficient feature selection and classification system for microarray cancer data using genetic algorithm and deep belief networks. Multimed Tools Appl. 2024;84(8):4393–434.
- 27. Pashaei E, Pashaei E, Aydin N. Biomarker identification for Alzheimer’s disease using a multi-filter gene selection approach. Int J Mol Sci. 2025;26(5):1816. pmid:40076442
- 28. Pashaei E, Pashaei E, Mirjalili S. Binary hiking optimization for gene selection: Insights from HNSCC RNA-Seq data. Expert Systems with Applications. 2025;268:126404.
- 29. Di Gaspare A, Pistore V, Riccardi E, Pogna EAA, Beere HE, Ritchie DA, et al. Self-induced mode-locking in electrically pumped far-infrared random lasers. Adv Sci (Weinh). 2023;10(9):e2206824. pmid:36707499
- 30. Okamoto T, Imamura K, Kajisa K. Inverse design of two-dimensional disordered structures for spectral optimization of random lasers. Optics Communications. 2022;508:127775.
- 31. Gomes ASL, Moura AL, de Araújo CB, Raposo EP. Recent advances and applications of random lasers and random fiber lasers. Progress in Quantum Electronics. 2021;78:100343.
- 32. Ni D, Späth M, Klämpfl F, Schmidt M, Hohmann M. Threshold behavior and tunability of a diffusive random laser. Opt Express. 2023;31(16):25747–62. pmid:37710453
- 33. Dudukcu HV, Taskiran M, Cam Taskiran ZG, Yildirim T. Temporal Convolutional Networks with RNN approach for chaotic time series prediction. Applied Soft Computing. 2023;133:109945.
- 34. Zhou S, Wang X, Zhang Y. Novel image encryption scheme based on chaotic signals with finite-precision error. Information Sciences. 2023;621:782–98.
- 35. Wang Z, Ahmadi A, Tian H, Jafari S, Chen G. Lower-dimensional simple chaotic systems with spectacular features. Chaos, Solitons & Fractals. 2023;169:113299.
- 36. Hua Z, Chen Y, Bao H, Zhou Y. Two-dimensional parametric polynomial chaotic system. IEEE Trans Syst Man Cybern, Syst. 2022;52(7):4402–14.
- 37. Ascoli A, Demirkol AS, Tetzlaff R, Chua L. Edge of chaos theory resolves smale paradox. IEEE Trans Circuits Syst I. 2022;69(3):1252–65.
- 38. Xia S, Yang Y. A model-free feature selection technique of feature screening and random forest-based recursive feature elimination. International Journal of Intelligent Systems. 2023;2023(1).
- 39. Sun Z, Wang G, Li P, Wang H, Zhang M, Liang X. An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Systems with Applications. 2024;237:121549.
- 40. Zhou J, Yang P, Li C, Du K. Hybrid random forest-based models for predicting shear strength of structural surfaces based on surface morphology parameters and metaheuristic algorithms. Construction and Building Materials. 2023;409:133911.
- 41. Zhang X, Liu H, Yang G, Wang Y, Yao H. Comprehensive insights into the application strategy of kitchen waste derived hydrochar: random forest-based modelling. Chemical Engineering Journal. 2023;469:143840.
- 42. Chantar H, Mafarja M, Alsawalqah H, Heidari AA, Aljarah I, Faris H. Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification. Neural Comput & Applic. 2019;32(16):12201–20.
- 43. Ramasamy Rajammal R, Mirjalili S, Ekambaram G, Palanisamy N. Binary grey wolf optimizer with mutation and adaptive k-nearest neighbour for feature selection in Parkinson’s Disease diagnosis. Knowledge-Based Systems. 2022;246:108701.
- 44. Gong H, Li Y, Zhang J, Zhang B, Wang X. A new filter feature selection algorithm for classification task by ensembling pearson correlation coefficient and mutual information. Engineering Applications of Artificial Intelligence. 2024;131:107865.
- 45. Li X, Ma Y, Zhou Q, Zhang X. Sparse large-scale high-order fuzzy cognitive maps guided by spearman correlation coefficient. Applied Soft Computing. 2024;167:112253.
- 46. Got A, Moussaoui A, Zouache D. Hybrid filter-wrapper feature selection using whale optimization algorithm: a multi-objective approach. Expert Systems with Applications. 2021;183:115312.
- 47. Zhu Y, Li W, Li T. A hybrid artificial immune optimization for high-dimensional feature selection. Knowledge-Based Systems. 2023;260:110111.
- 48. Xu Z, Yang F, Tang C, Wang H, Wang S, Sun J, et al. FG-HFS: a feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data. Expert Systems with Applications. 2024;245:123069.
- 49. Deng S, Li Y, Wang J, Cao R, Li M. A feature-thresholds guided genetic algorithm based on a multi-objective feature scoring method for high-dimensional feature selection. Applied Soft Computing. 2023;148:110765.
- 50. Li Z, Cai X, Huang Q, Lin Y. A novel cooperative swarm intelligence feature selection method for hybrid data based on fuzzy covering and fuzzy self-information. Information Sciences. 2024;675:120757.
- 51. Bai X, Zheng Y, Lu Y. A hybrid feature selection algorithm based on collision principle and adaptability. IEEE Access. 2024;12:106236–52.
- 52. Bai X, Zheng Y, Lu Y, Shi Y. Chain hybrid feature selection algorithm based on improved grey wolf optimization algorithm. PLoS One. 2024;19(10):e0311602. pmid:39378228
- 53. Chen H, Zheng Y. Double filter and double wrapper feature selection algorithm for high-dimensional data analysis. IEEE Access. 2025;13:86185–202.
- 54. Bai D, Ma C, Xun J, Luo H, Yang H, Lyu H, et al. MicrobiomeStatPlots: microbiome statistics plotting gallery for meta-omics and bioinformatics. Imeta. 2025;4(1):e70002. pmid:40027478
- 55. Castellano-Escuder P, Zachman DK, Han K, Hirschey MD. GAUDI: interpretable multi-omics integration with UMAP embeddings and density-based clustering. Nat Commun. 2025;16(1):5771. pmid:40593592
- 56. Shieh JC, Wilkinson MG, Millar JB. The Win1 mitotic regulator is a component of the fission yeast stress-activated Sty1 MAPK pathway. Mol Biol Cell. 1998;9(2):311–22. pmid:9450957
- 57. Zhang Y, Pizzute T, Pei M. A review of crosstalk between MAPK and Wnt signals and its impact on cartilage regeneration. Cell Tissue Res. 2014;358(3):633–49. pmid:25312291
- 58. Franceschi RT, Ge C. Control of the osteoblast lineage by mitogen-activated protein kinase signaling. Curr Mol Biol Rep. 2017;3(2):122–32. pmid:29057206
- 59. Pashaei E, Pashaei E. Gene selection using hybrid dragonfly black hole algorithm: a case study on RNA-seq COVID-19 data. Anal Biochem. 2021;627:114242. pmid:33974890
- 60. Mirjalili S, Mirjalili SM, Yang X-S. Binary bat algorithm. Neural Comput & Applic. 2013;25(3–4):663–81.
- 61. Pashaei E, Pashaei E. Hybrid binary COOT algorithm with simulated annealing for feature selection in high-dimensional microarray data. Neural Comput & Applic. 2022;35(1):353–74.
- 62. Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput & Applic. 2022;34(8):6427–51.