
Improved WOA and its application in feature selection

  • Wei Liu,

    Roles Project administration, Supervision, Validation, Visualization

    Affiliation College of Science, Liaoning Technical University, Fuxin, Liaoning, China

  • Zhiqing Guo ,

    Roles Conceptualization, Data curation, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    mathgzq@gmail.com

    Affiliation College of Science, Liaoning Technical University, Fuxin, Liaoning, China

  • Feng Jiang,

    Roles Data curation, Formal analysis

    Affiliation College of Science, Liaoning Technical University, Fuxin, Liaoning, China

  • Guangwei Liu,

    Roles Funding acquisition, Project administration

    Affiliation College of Mines, Liaoning Technical University, Fuxin, Liaoning, China

  • Dong Wang,

    Roles Project administration, Resources

    Affiliation College of Mines, Liaoning Technical University, Fuxin, Liaoning, China

  • Zishun Ni

    Roles Data curation, Formal analysis, Methodology, Resources

    Affiliation College of Science, Liaoning Technical University, Fuxin, Liaoning, China

Abstract

Feature selection (FS) can eliminate many redundant, irrelevant, and noisy features in high-dimensional data to improve the prediction, classification, and computational performance of machine learning and data mining models. We propose an improved whale optimization algorithm (IWOA) combined with an improved k-nearest neighbors (IKNN) classifier for feature selection (IWOAIKFS). Firstly, WOA is improved by using chaotic elite reverse individuals, probability selection based on the skew distribution, nonlinear adjustment of the control parameters, and a position correction strategy to enhance the algorithm's search performance on feature subsets. Secondly, a sample similarity measurement criterion and a weighted voting criterion based on the simulated annealing algorithm for solving the weight matrix M are proposed to improve the KNN classifier and enhance its evaluation performance on feature subsets. The experimental results show that IWOA not only achieves better optimization performance on benchmark functions of different dimensions, but also, when combined with IKNN for feature selection, gives IWOAIKFS better classification accuracy and robustness.

1. Introduction

With the continuous development of science and technology and their use in biomedicine, astronomy, agriculture, finance, and engineering, data of various forms have grown exponentially [1]. However, original datasets usually contain many redundant, irrelevant, and noisy features, making data mining very difficult [2]. Feature selection (FS) is a method that selects the optimal feature subset from the original features, representing the information in the original dataset as faithfully as possible with the smallest number of features [3]. FS has become a key preprocessing step in machine learning and pattern recognition [4].

According to the evaluation method, FS methods can be roughly divided into three categories: wrapper, filter, and embedded [5]. The wrapper-based model combines the FS process with a predetermined learning algorithm (e.g., a classifier) and uses the learning algorithm to evaluate each feature subset; its accuracy is high, but it is slow. The filter-based model evaluates and filters candidate subsets through the intrinsic attributes of the features without relying on any learning algorithm; it is fast, but it cannot reflect the interactions between dimensions when extracting feature subsets. The embedded model combines the wrapper and filter approaches: the FS mechanism is integrated into the training process of the learning algorithm, and features are selected automatically while training the model [6–9].

Searching for feature subsets is a key problem of FS [10]. The process can be regarded as a combinatorial optimization problem that seeks the optimal feature subset in a finite feature space, and it can be solved by exhaustive or heuristic methods. However, both need to traverse every sample in the dataset when performing FS, so the search space grows, increasing the time complexity and computational cost of the algorithm [11].

To improve the speed and efficiency of searching for feature subsets, a class of meta-heuristic algorithms inspired by natural evolution has been proposed. Due to their simple heuristic mechanisms and strong global exploration ability, these algorithms have been widely used in various fields. Researchers have therefore adopted meta-heuristic algorithms as the search strategy for feature subsets in FS, and a series of meaningful results have been achieved [12–17]. For example: Seth, J.K. et al. improved the grey wolf optimizer (GWO) [18] into a binary grey wolf optimization algorithm for intrusion detection [19]; Emary, E. et al. proposed two binary grey wolf optimization algorithms after improving GWO [20]; Al-Tashi, Q. et al. hybridized GWO and PSO into a binary grey wolf algorithm [21]; Too, J. et al. proposed an improved binary ASO algorithm based on S-shaped and V-shaped transfer functions [22] and a quadratic binary Harris hawks optimization (QBHHO) algorithm [23]; Kumar, L. et al. proposed a hybrid binary particle swarm and sine cosine algorithm (HPSOSCA) that changes particle positions with a V-shaped transfer function [24]; Alweshah, M. et al. improved MBO for feature selection problems [25]; Mafarja, M. and Mirjalili, S. improved WOA for feature selection problems [26]; and so on. In addition, the remora optimization algorithm (ROA) [27], African vultures optimization algorithm (AVOA) [28], gorilla troops optimizer (GTO) [29], wild horse optimizer (WHO) [30], binary chimp optimization algorithm (BChOA) [31], arithmetic optimization algorithm (AOA) [32], Aquila optimizer (AO) [33], and other meta-heuristic algorithms are also being explored for feature selection problems.

Among them, WOA is a meta-heuristic algorithm proposed by Mirjalili et al. [34], inspired by the social behavior of humpback whales, that simulates their hunting behavior. Because of its fast convergence, high accuracy, and few parameters on many optimization problems, WOA is widely used in engineering practice and FS. According to the improvement mechanism, existing WOA-based FS methods can be roughly divided into three categories:

The first category is the classic WOA-based FS methods using binary variants. The classic WOA is transformed into a binary WOA through a sigmoid (S-shaped) or V-shaped transfer function, and the strong global search ability of WOA is used to search for the optimal feature subset, thereby improving the classification accuracy on the dataset. Such methods have been applied to medical datasets [35, 36], breast cancer datasets [37], network intrusion detection [38, 39], spam filtering [40], dimensionality reduction of high-dimensional data [41, 42], etc.

The second category consists of improved WOA algorithms that enhance the algorithm's global exploration and local exploitation. The performance of WOA is enhanced by modifying parameters and introducing operators, improving the efficiency of FS. Such methods typically increase the diversity of the initial population (elite opposition-based learning [43] and chaos strategies [44]), nonlinearly correct the control parameters [26, 45, 46], or modify the position update rule [47–51] to improve the algorithm's search performance on feature subsets.

The third category is improvement strategies that cross-fuse WOA with other algorithms. The optimization performance of WOA is enhanced by fusing the search characteristics of different algorithms, improving the search efficiency for the optimal feature subset. For example, fusions of WOA with the salp swarm algorithm (SSA) [52], the flower pollination algorithm (FPA) [53], GWO [54], simulated annealing (SA) [16], and GA [55] effectively improve the convergence performance of the original algorithm and have achieved meaningful results in FS.

The evaluation of feature subsets is also a key issue of FS [10]. This process can be seen as a classification problem in which a classifier evaluates the candidate feature subset. After a meta-heuristic algorithm searches the feature subset, the k-nearest neighbors (KNN) classifier or the support vector machine (SVM) classifier is usually used to evaluate it. The KNN classifier is the most commonly used classifier for obtaining the best feature subset on the UCI repository, while SVM classifiers are mainly used in applications such as medical diagnosis, pattern recognition, and image analysis [8, 17]. Studies have shown that different classifiers produce significantly different FS results [56]. Improving the KNN classifier and fusing it with the improved WOA for FS is another motivation for this paper.

The above studies have done good work on the FS problem in different periods and fields. However, according to the No-Free-Lunch (NFL) theorem [57], no single optimization algorithm can solve all problems, which is the basis and motivation of this work. This paper therefore addresses the two key issues in FS: the search and the evaluation of feature subsets. The main contributions are as follows:

  • An improved whale optimization algorithm (IWOA) is proposed based on chaotic elite reverse individuals and the skew distribution. IWOA and 8 meta-heuristic algorithms (ASO, GWO, HHO, MFO, MVO, SSA, TSA, and WOA) were compared on 8 benchmark functions in two dimensions (30D and 100D), verifying that IWOA has superior performance.
  • An improved KNN classifier (IKNN) is proposed based on the weight matrix M and a weighted classification strategy. The comparison of IKNN with 5 classifiers (KNN, Naive Bayes, C4.5, SVM, and BP neural network) on 8 datasets shows that IKNN has good classification performance.
  • A FS method based on IWOA and IKNN (IWOAIKFS) is proposed. As a wrapper-based model, IWOAIKFS fuses IWOA and IKNN to search for and evaluate feature subsets.
  • IWOAIKFS was applied to 15 datasets, and the results were compared with those of 6 FS methods based on meta-heuristic algorithms (ASO, GWO, HHO, SCA, SSA, and WOA). The comparison shows that IWOAIKFS achieves higher classification accuracy and more stable performance than the 6 FS methods.

The remainder of this paper is organized as follows. Section 2 reviews the standard WOA and proposes the improved whale optimization algorithm (IWOA). Section 3 discusses the standard KNN classifier and proposes the improved KNN classification algorithm (IKNN). Section 4 proposes the FS method based on IWOA and IKNN. Section 5 discusses and analyzes the results of three experiments: Section 5.3 is the IWOA comparative experiment and discussion, Section 5.4 the IKNN comparative experiment and discussion, and Section 5.5 the IWOAIKFS comparative experiment and discussion. Section 6 summarizes the current work and possible future research.

2. Improved whale optimization algorithm based on chaotic elite reverse individual and skew distribution

2.1. WOA

The whale optimization algorithm is a meta-heuristic algorithm proposed by Mirjalili et al. [34], inspired by the social behavior of humpback whales, that simulates whales searching for prey, encircling prey, and attacking prey with a spiral bubble net. Suppose there are N whales in the population foraging in a d-dimensional space; the position of the i-th whale is X_i = (x_{i1}, x_{i2}, ⋯, x_{id}).

2.1.1. Searching for prey.

Before determining the approximate location of the prey, whales update their positions through random walks and a location-sharing mechanism. The process can be expressed as:

X(t + 1) = X_rand(t) − A · D  (1)

D = |C · X_rand(t) − X(t)|  (2)

where t is the current iteration number, X(t + 1) is the whale's position at iteration t + 1, X_rand(t) is the position of a random individual in the whale group (initial group), and D is the distance between the current individual and the random individual. A and C are coefficient vectors, which can be expressed as:

A = 2a · r − a  (3)

C = 2r  (4)

where r is a random vector in [0, 1] and a is the algorithm's convergence factor, which decreases from 2 to 0 as the iteration progresses:

a = 2 − 2t / t_max  (5)

where t is the current iteration number and t_max is the maximum number of iterations. Eqs (1), (3), and (5) show that when |A| ≥ 1 the algorithm simulates whales searching for prey and explores new locations in the solution space. As the iterations continue, |A| decreases with a; once |A| < 1, the algorithm enters the stage of encircling the prey.

2.1.2. Approaching and encircling the prey.

As the number of iterations increases, the location of the prey is determined by the best whale in the group, and the other individuals gradually approach the prey by shrinking the encirclement through position sharing. This process can be expressed as:

X(t + 1) = X*(t) − A · D,  D = |C · X*(t) − X(t)|  (6)

where X*(t) is the position vector of the best solution in the current whale population and D is the distance between the best whale individual and the other individuals.

2.1.3. Encircling and capturing prey.

When the whales approach their prey, humpback whales attack it through a spiral position update, expressed as:

X(t + 1) = D′ · e^{bl} · cos(2πl) + X*(t)  (7)

where D′ = |X*(t) − X(t)| is the current distance between the whale and its prey, b is a constant defining the shape of the logarithmic spiral, and l is a random number in [−1, 1]. Assuming that each of the two behaviors occurs with probability 50%, the encircling and hunting behavior can be expressed as:

X(t + 1) = X*(t) − A · D, if p < 0.5;  X(t + 1) = D′ · e^{bl} · cos(2πl) + X*(t), if p ≥ 0.5  (8)

where p is a random number in [0, 1].
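The standard WOA update rules above (Eqs 1–8) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the names (`woa_step`, `X_best`) are ours, and |A| is treated as the absolute value of a scalar coefficient per whale, one common reading of the vector notation.

```python
import numpy as np

def woa_step(X, X_best, t, t_max, b=1.0, rng=None):
    """One iteration of the standard WOA position update (Eqs 1-8)."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    a = 2.0 - 2.0 * t / t_max                 # Eq (5): a decays linearly from 2 to 0
    X_new = np.empty_like(X)
    for i in range(N):
        A = 2.0 * a * rng.random() - a        # Eq (3)
        C = 2.0 * rng.random()                # Eq (4)
        p = rng.random()
        if p < 0.5:
            if abs(A) < 1:                    # exploit: encircle the best, Eq (6)
                D = np.abs(C * X_best - X[i])
                X_new[i] = X_best - A * D
            else:                             # explore: follow a random whale, Eqs (1)-(2)
                X_rand = X[rng.integers(N)]
                D = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * D
        else:                                 # spiral bubble-net attack, Eq (7)
            l = rng.uniform(-1.0, 1.0)
            D_prime = np.abs(X_best - X[i])   # distance to the prey
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best
    return X_new
```

Calling `woa_step` repeatedly while re-evaluating fitness and updating `X_best` yields the basic WOA loop.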

2.2. IWOA

To improve the convergence speed and accuracy of WOA, it is improved in four aspects: population initialization, probability selection, parameter correction, and position update.

2.2.1. Population initialization based on chaotic reverse elite individuals.

In the whale optimization algorithm, the initial positions of the whale group strongly constrain how quickly the group can reach the prey: the closer the initial whales are to the prey, the faster they can capture it. The location of the initial population is therefore important for finding the optimal value. The chaos strategy [58] and elite opposition-based learning [59] are effective population initialization strategies that are widely used to generate the initial populations of various meta-heuristic algorithms. However, using either one alone ignores either the uniform distribution of the initial population in the solution space or the retention of elite individuals. This paper therefore proposes a population initialization strategy based on chaotic elite reverse individuals that combines the Gauss chaotic map with elite opposition-based learning.

The more evenly the initial population is distributed in the solution space, the greater the probability that the algorithm finds the optimal value. Compared with a random search strategy, chaotic search is widely used to generate initial populations because of its randomness, ergodicity, and non-repetition. However, different chaotic maps affect the initial population differently. Therefore, this paper analyzes and compares random (Rand) initialization with the Tent, Chebyshev, and Gauss maps to select the chaotic map best suited to the whale optimization algorithm. The original initial population and the initial populations generated by the three chaotic maps are shown in Fig 1.

Fig 1. Initial population generated by 3 kinds of maps in 1000 iterations.

(a) Rand, (b) Tent map, (c) Chebyshev map, (d) Gauss map.

https://doi.org/10.1371/journal.pone.0267041.g001

In Fig 1, panels 1(a)–1(d) show the original (randomly generated) initial whale population and the initial populations generated by the Tent, Chebyshev, and Gauss maps, respectively. From the perspective of initial population generation, the population produced by the Gauss map is the most evenly distributed in the space, which provides a better guarantee for the global optimization of the algorithm.

Gauss/mouse map:

x_{k+1} = 0, if x_k = 0;  x_{k+1} = (1 / x_k) mod 1, otherwise  (9)

where x_k is the k-th chaos number, k is the iteration index, and x ∈ (0, 1).
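As an illustration of the Gauss/mouse-map initialization, the following sketch iterates Eq (9) and scales the chaotic sequence into the search bounds. The function name `gauss_map_population` and the seed value `x0 = 0.7` are our own illustrative choices.

```python
import numpy as np

def gauss_map_population(N, d, lb, ub, x0=0.7):
    """Generate an N x d initial population from the Gauss/mouse map (Eq 9),
    then scale the chaotic sequence from (0, 1) into [lb, ub]."""
    seq = np.empty(N * d)
    x = x0
    for k in range(N * d):
        # Eq (9): x_{k+1} = frac(1/x_k); guard the fixed point at 0
        x = (1.0 / x) % 1.0 if x != 0.0 else 1e-12
        seq[k] = x
    return lb + seq.reshape(N, d) * (ub - lb)
```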

Elite Opposition-Based Learning [59, 60]:

Definition 1 (opposite solution). Suppose a feasible solution of the current population in the d-dimensional search space is X = (x_1, x_2, ⋯, x_d) with x_j ∈ [a_j, b_j]; then its opposite solution is X̄ = (x̄_1, x̄_2, ⋯, x̄_d), where x̄_j = r(a_j + b_j) − x_j and r ∈ rand[0, 1].

Definition 2 (elite opposite solution). Suppose the elite individuals of the current population determine the extreme points for the ordinary individuals, that is:

lb_j = min(x_{1,j}, ⋯, x_{N,j}),  ub_j = max(x_{1,j}, ⋯, x_{N,j})  (10)

where i = 1, 2, ⋯, N, j = 1, 2, ⋯, d, and lb_j and ub_j are the lower and upper bounds of the dynamic boundary. The elite opposite solutions can then be defined as:

x̄_{i,j} = r(lb_j + ub_j) − x_{i,j}  (11)

where r ∈ rand[0, 1]. If x̄_{i,j} exceeds the boundary, it is reset as:

x̄_{i,j} = rand(lb_j, ub_j)  (12)
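Elite opposition-based learning (Eqs 10–12) can be sketched as follows: the dynamic bounds are taken as the per-dimension minimum and maximum of the current population, each individual is reflected through them, and out-of-bounds components are re-sampled inside the boundary. The name `elite_opposition` is ours.

```python
import numpy as np

def elite_opposition(pop, rng=None):
    """Elite opposition-based learning (Eqs 10-12): reflect each individual
    through the dynamic boundary spanned by the current population."""
    rng = np.random.default_rng() if rng is None else rng
    lb = pop.min(axis=0)                      # dynamic lower bound lb_j, Eq (10)
    ub = pop.max(axis=0)                      # dynamic upper bound ub_j
    r = rng.random(pop.shape)                 # r ~ U(0, 1)
    opp = r * (lb + ub) - pop                 # Eq (11): elite opposite solution
    # Eq (12): components leaving the dynamic boundary are re-sampled inside it
    resample = lb + rng.random(pop.shape) * (ub - lb)
    out_of_bounds = (opp < lb) | (opp > ub)
    return np.where(out_of_bounds, resample, opp)
```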

Based on the Gauss/mouse map and elite opposition-based learning, the chaotic elite reverse population initialization proposed in this paper is given in Algorithm 1.

Algorithm 1. Chaotic elite reverse individual.

Input: N, d, lb = lbj,ub = ubj

1: Initialize the Positions1 with Gauss/mouse map (Eq 9)

2: Initialize the Positions2 with Elite Opposition-Based Learning

3: for1 i = 1:N

4: for2 j = 1:d

5:   fitness1(i,j) = fobj(Positions1(i,j))

6:   fitness2(i,j) = fobj(Positions2(i,j))

7:   if1 fitness2(i,j)<fitness1(i,j)

8:    Positions1(i,j) = Positions2(i,j)

9:   end if1

10:   if2 Positions1(i,j)<lb

11:    Positions1(i,j) = rand()*(ub-lb)+lb

12:   end if2

13:   if3 Positions1(i,j)>ub

14:    Positions1(i,j) = rand()*(ub-lb)+lb

15:   end if3

16: end for2

17: end for1

18: X = Positions1

Return: X

2.2.2. Probability selection strategy based on the skew distribution.

WOA assumes that the probability of a whale choosing encircling versus spiral predation is 50%, with the probability p generated in each iteration obeying a uniform distribution on [0, 1]. This is inconsistent with how predators actually hunt in nature: when a predator finds prey, the probability of encircling versus attacking changes over time and does not obey a uniform distribution.

In order to improve the global exploration ability and convergence accuracy of WOA, a new probability generation method is proposed. This method divides the WOA iteration process into three periods and then corrects the probabilities generated in each period.

  1. Early iteration. When an individual whale finds prey, the remaining whales quickly move toward the best individual through the position-sharing mechanism. At this stage, the probability of encircling the prey is greater than 0.5, so in the early iterations the probability of hunting behavior follows a negatively skewed distribution located at 0.8.
  2. Mid iteration. The whales close to the prey have already encircled it, while the distant groups keep approaching. At this stage, the probabilities of the two behaviors are equal, so the probability is assumed to obey a uniform distribution on [0, 1].
  3. Late iteration. The whale group has surrounded the prey and begun to attack. However, because the prey struggles to survive, the probability of successful predation is below 0.5, so in the late iterations the probability follows a positively skewed distribution located at 0.2.

According to the above description, the probability generation equation is:

p = rsn(n, 0.8, scale, −shape), if 0 < t ≤ t_max/3;
p = rand(0, 1), if t_max/3 < t ≤ 2t_max/3;
p = rsn(n, 0.2, scale, shape), if 2t_max/3 < t ≤ t_max  (13)

where rsn(n, location, scale, shape) is the skew-distributed random number generation method proposed by Azzalini, A. [61], and t_max is the maximum number of iterations. Fig 2 shows the probability generation of whale social behavior in the three periods.

Fig 2. Probability generation of whale social behavior.

(a) Early iteration, (b) Mid iteration, (c) Late iteration.

https://doi.org/10.1371/journal.pone.0267041.g002

Fig 2 and Eq (13) show that the generated probability is not always within [0, 1]. Therefore, the generated probability is constrained to the boundary, namely:

p = 0, if p < 0;  p = p, if 0 ≤ p ≤ 1;  p = 1, if p > 1  (14)
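A sketch of the three-phase probability generation follows. The skew-normal draw uses Azzalini's representation (|u0| combined with an independent normal); the `scale` and `shape` magnitudes below are illustrative assumptions, since only the locations (0.8 early, 0.2 late) and the sign of the skew are fixed above, and the helper name `rsn` mirrors the notation of Eq (13) without being the original implementation.

```python
import numpy as np

def rsn(loc, scale, shape, rng):
    """Skew-normal sample via Azzalini's representation:
    z = delta*|u0| + sqrt(1-delta^2)*u1 has skewness governed by `shape`."""
    delta = shape / np.sqrt(1.0 + shape ** 2)
    u0, u1 = rng.standard_normal(2)
    return loc + scale * (delta * abs(u0) + np.sqrt(1.0 - delta ** 2) * u1)

def hunt_probability(t, t_max, scale=0.1, shape=4.0, rng=None):
    """Phase-dependent probability p of Eq (13), clipped to [0, 1] per Eq (14)."""
    rng = np.random.default_rng() if rng is None else rng
    if t <= t_max / 3:                        # early: negative skew around 0.8
        p = rsn(0.8, scale, -shape, rng)
    elif t <= 2 * t_max / 3:                  # mid: uniform on [0, 1]
        p = rng.random()
    else:                                     # late: positive skew around 0.2
        p = rsn(0.2, scale, shape, rng)
    return float(min(max(p, 0.0), 1.0))       # Eq (14): boundary constraint
```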

2.2.3. Nonlinear correction strategy of a and A.

In WOA, A and C are important parameters that control whether whales explore for, encircle, or attack prey. The value of A is determined by the convergence factor a. Fig 3 shows the changes of WOA's original parameters a, A, and C over 1000 iterations. The convergence factor a shows a linear downward trend, indicating that the modeled distance between the whale and the prey also decreases linearly. C is a uniformly distributed random number between 0 and 2, indicating that the distance between the whale and the prey changes randomly; C has no obvious effect on the algorithm's global exploration and local exploitation.

Fig 3. Original parameter a-value, A-value and C-value under 1000 iterations.

https://doi.org/10.1371/journal.pone.0267041.g003

Therefore, the parameters a and A are modified to speed up the convergence of WOA. The modified convergence factor a and coefficient A are: (15) (16) where a decreases nonlinearly from 2 to 0, t is the current iteration number, and t_max is the maximum number of iterations. The curves of the updated parameters a, A, and C over 1000 iterations are shown in Fig 4.

Fig 4. Modified parameter a-value, A-value and C-value under 1000 iterations.

https://doi.org/10.1371/journal.pone.0267041.g004

2.2.4. Location update strategy.

In WOA, the whale group approaches the best whale individual and surrounds the prey when |A| < 1. To speed up the whales' movement toward the best individual and the encirclement of the prey, a nonlinear decreasing disturbance factor is introduced to enhance the local exploitation capability of the algorithm and improve its convergence accuracy: (17) where X*(t) is the position vector of the best solution in the current whale population, A and C are coefficient vectors, and ω is a nonlinear decreasing perturbation factor defined as: (18)

The revised position update equation is: (19) where D is the distance between the best whale individual and the other individuals, and p is the probability generated by Eqs (13) and (14).

2.3. Pseudo-code of the IWOA algorithm

In summary, the pseudo-code of the improved whale optimization algorithm (IWOA) is shown in Algorithm 2.

Algorithm 2. Improved whale optimization algorithm (IWOA).

Input: N = Total populations, d, tmax

1: Initialize the whales population Xi(i = 1, 2, ⋯, N) with Algorithm 1

2: Calculate the fitness of each search agent

3: X* = the best search agent

4: while (t<tmax)

5: for each search agent

6:   Update a (Eq 15), A, C (Eq 16), l

7:   Calculate p (Eqs 13 and 14)

8:   if1 (p<0.5)

9:    if2 (|A|< 1)

10:    Update the position of current search agent (Eq (17))

11:   else if2 (|A| ≥ 1)

12:    Select a random search agent (Xrand)

13:    Update the position of current search agent (Eq (1))

14:   end if2

15:   else if1 (p ≥ 0.5)

16:    Update the position of current search agent (Eq (19))

17:   end if1

18: end for

19:  Check the boundary and amend it

20:  Calculate the fitness of each search agent

21:  Update X* if there is a better solution

22:  t = t+1

23: end while

Return: X*

2.4. Time complexity analysis of IWOA

  1. Initializing the population needs O(N × d) time, where N is the population size and d is the dimension of the given test problem.
  2. Calculating a and p needs O(t_max) time, where t_max is the maximum number of iterations.
  3. Calculating the fitness of each search agent over all iterations needs O(t_max × N × d) time.

Hence, the total time complexity of the IWOA algorithm is O(t_max × N × d).

3. IKNN based on M and weighted classification strategy

3.1. KNN

KNN [62] is a supervised classification algorithm proposed by Cover and Hart. Due to its simple and intuitive idea, KNN is widely used in various fields. The basic principle of KNN classification is:

  1. Express the test sample as a feature vector consistent with the training sample set.
  2. Calculate the distance between the test sample and each training sample according to the distance function, and select the K samples with the smallest distance from the test sample as the KNN of the test sample.
  3. According to the principle of "majority voting", the class with the most occurrences among the K nearest neighbors is selected as the class of the test sample.

The k-value, the distance function, and the voting rule are important parameters of KNN classification. The k-value is the number of reference samples selected, determined by the requirements of the actual problem. The distance function is a non-negative function describing the similarity between samples and can be defined as:

d(X, Y) = sqrt((X − Y)^T I (X − Y)) = sqrt(Σ_{i=1}^{m} (x_i − y_i)^2)  (20)

where d(X, Y) is the distance between the training sample X and the test sample Y, x_i is the i-th attribute of X, y_i is the i-th attribute of Y, i = 1, ⋯, m, m is the feature dimension, and I is the distance measurement matrix (the identity matrix). Assuming there are J classes among the K nearest neighbors of the test sample y, the majority voting is:

Class = argmax_{1≤j≤J} Vot(y, C_j) = argmax_{1≤j≤J} Σ_{i=1}^{k} Pa(a_i, C_j)  (21)

where Vot(y, C_j) is the number of the K nearest neighbors of y that belong to class C_j, and Pa(a_i, C_j) indicates whether the neighbor a_i belongs to class C_j:

Pa(a_i, C_j) = 1, if a_i ∈ C_j;  Pa(a_i, C_j) = 0, otherwise  (22)

where j = 1, ⋯, J.
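The distance and voting rules of Eqs (20)–(22) can be sketched as follows. Passing `M = I` reproduces plain Euclidean KNN; the quadratic-form distance also anticipates the weight matrix M introduced in the next section. `knn_predict` is our own name for this sketch.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=5, M=None):
    """Classic KNN with the quadratic-form distance of Eq (20) and the
    majority vote of Eqs (21)-(22). M = I gives Euclidean distance."""
    m = X_train.shape[1]
    if M is None:
        M = np.eye(m)                         # identity metric -> Euclidean
    diff = X_train - x_test
    # Eq (20): d_i = sqrt((x_i - y)^T M (x_i - y)) for every training sample
    dists = np.sqrt(np.einsum('ij,jk,ik->i', diff, M, diff))
    nearest = np.argsort(dists)[:k]           # the K closest training samples
    labels, votes = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(votes)]           # Eqs (21)-(22): majority voting
```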

3.2. IKNN

3.2.1. Sample similarity measurement criterion based on M-matrix.

When classifying complex datasets, the classic KNN algorithm uses the simple Euclidean distance as the similarity measure between samples and, through the identity matrix I, assigns the same weight to every attribute. This easily exaggerates weak attributes and weakens strong ones, leading to inaccurate classification.

Based on the simulated annealing algorithm [63], a weight matrix M is constructed to replace the identity matrix I, so Eq (20) is modified as:

d_M(X, Y) = sqrt((X − Y)^T M (X − Y))  (23)

where the weight matrix M is solved by the simulated annealing algorithm. For the datasets under study, the value of the distance weight matrix M is generated adaptively as the algorithm iterates. Therefore, M is: (24) where l_i is the label of the test sample x_i and C_j is the j-th class. The objective function is set as: (25)

The simulated annealing algorithm is used to solve the weight matrix M; the pseudo-code is shown in Algorithm 3.

3.2.2. Weighted voting criteria.

KNN adopts "majority voting" to identify test samples: taking the test sample as the center, the label with the most occurrences within the k-nearest neighborhood is used as the test sample's label. Therefore, when the sample sizes are unbalanced, the classification results tend to be biased toward the large-sample classes (Fig 5).

In Fig 5, the red hexagon is the test sample, and the training samples carry two kinds of labels: square and triangle. When k = 5, the classification result within the k-neighborhood is biased toward the large-sample class (triangles), so the test sample is finally labeled as a triangle, even though it is closer to the square samples.

The original algorithm is improved with distance-based class weights to balance the bias in the classification results caused by unequal sample sizes. Assuming there are J classes in the dataset, Eq (21) is modified as:

Class = f(argmax_{1≤j≤J} (a_j · S / S_j))  (26)

where Class is the predicted class of the test sample, S is the total number of samples, S_j is the number of samples in the j-th class, a_j is the number of the selected K nearest neighbors belonging to the j-th class, and f returns the class corresponding to the solution. Predicting the test sample in Fig 5 with Eq (26) yields a square.

(27)
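One plausible reading of the weighted vote above is sketched below: each class's neighbor count a_j is rescaled by S / S_j, so small classes are not drowned out by large ones. The exact weighting in Eq (26) may differ; `weighted_vote` is our own name.

```python
import numpy as np

def weighted_vote(neighbor_labels, y_train):
    """Class-size weighted vote in the spirit of Eq (26): rescale each
    class's neighbor count a_j by S / S_j before taking the argmax."""
    S = len(y_train)                          # total number of samples
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        a_j = np.sum(neighbor_labels == c)    # neighbors from class c
        S_j = np.sum(y_train == c)            # size of class c
        scores.append(a_j * S / S_j)
    return classes[int(np.argmax(scores))]
```

On the Fig 5 example (3 neighbors from a 10-sample class vs. 2 neighbors from a 3-sample class), this weighting flips the plain majority vote in favor of the small class.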

3.3. Time complexity analysis of IKNN

The traditional KNN algorithm does not require training and can be used for testing directly, so its time complexity is O(l × n × (m + k)), where k is the number of nearest neighbors, n is the number of samples, m is the sample feature dimension, and l is the number of samples to be tested.

The improved KNN algorithm must first be trained on the original dataset to find the best metric matrix M, and is then tested. The training time complexity is O(T × n² × m) and the testing time complexity is O(l × n × (m + k)), where k is the number of nearest neighbors, T is the number of training iterations, n is the number of samples, m is the sample feature dimension, and l is the number of samples to be tested.

Algorithm 3. The solution of weight matrix M.

Input: T0: initial temperature, Te: termination temperature

1: Tk = T0

2: M0 = I

3: while (Tk > Te)

4:  for1 i = 2:n

5:   for2 j = 1:i-1

6:    Generate a candidate matrix M by perturbing M0(i, j)

7:    ΔE = Z(M) − Z(M0)

8:    p1 = min(1, exp(ΔE / Tk))

9:    if (p1 > rand())

10:    M0(i, j) = M(i, j)

11:    end if

12:   end for2

13:   end for1

14:  Tk+1 = ηTk (k ← k + 1)

15: end while

Return: M = M0

4. Improved WOA and improved KNN for FS

4.1. Fitness function

Combining the improved WOA and improved KNN for FS yields a wrapper-based FS method. IWOA searches for the smallest feature subset, while IKNN classifies based on the feature subset to obtain the best classification accuracy. The FS method based on IWOA and IKNN therefore has two goals: (1) to find the smallest possible feature subset; and (2) to achieve the highest possible classification accuracy on the found feature subset.

To balance these two conflicting goals (the smallest number of features and the largest classification accuracy), each individual whale's position is regarded as a candidate solution: a solution is better when it contains fewer features and yields higher classification accuracy [26]. The fitness function [64] is:

Fitness = α · γ_R(D) + β · |R| / |C|  (28)

where γ_R(D) represents the IKNN classification error rate, |R| represents the length of the selected feature subset, |C| represents the total number of features in the dataset, α ∈ (0, 1) represents the importance of classification quality, and β = (1 − α) represents the importance of the subset length [64].

To make the proposed algorithm suitable for FS, the continuous search space is mapped to a binary space: a feature dimension takes the value 1 when the corresponding position component is greater than 0.5, and 0 otherwise.

x_{i,j} = 1, if x_{i,j} > 0.5;  x_{i,j} = 0, otherwise  (29)
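The fitness of Eq (28) and the binarization of Eq (29) can be sketched as follows. The default α = 0.99 is a common choice in the cited wrapper-FS literature and is an assumption here, not a value stated above.

```python
import numpy as np

def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Wrapper fitness of Eq (28): alpha * gamma_R(D) + (1 - alpha) * |R|/|C|.
    alpha = 0.99 is an assumed, commonly used weighting."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total

def binarize(position):
    """Eq (29): map a continuous whale position to a feature mask,
    selecting feature j when position[j] > 0.5."""
    return (np.asarray(position) > 0.5).astype(int)
```

Lower fitness is better: it rewards both a low IKNN error rate and a short feature subset.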

4.2. Pseudo-code of the IWOAIKFS algorithm

To use IWOAIKFS for FS, a binary representation encodes the solution to the FS problem: a selected feature is represented by 1 and an unselected feature by 0. The pseudo-code of IWOAIKFS is shown in Algorithm 4.

4.3. Time complexity analysis of IWOAIKFS

Since IWOAIKFS is a feature selection method in which IWOA optimizes IKNN, its time complexity consists of three parts: the IWOA complexity, the IKNN complexity, and the complexity of IWOA optimizing IKNN. The overall time complexity of IWOAIKFS is therefore O(t_max × N × l × n × (m + k)), where k is the number of nearest neighbors, N is the population size, t_max is the number of iterations, n is the number of samples, m is the sample feature dimension, and l is the number of samples to be tested.

5. Experimental evaluation and discussion

In order to verify the effectiveness of the three methods proposed in this paper, three numerical experiments are designed as follows:

  (1) In order to verify that IWOA has better optimization and convergence performance, 8 benchmark functions are selected and simulation experiments are conducted in different dimensions (30 and 100 dimensions); the optimization results are compared with those of 8 meta-heuristic algorithms. The details are shown in Section 5.2.

Algorithm 4. IWOAIKFS.

Input: Dataset, tmax, lb, ub

1: Normalize the dataset and initialize the feature population Xi with Algorithm 1

2: Calculate the fitness of each feature vector (Eq (28))

3: X* = the best feature vector

4: while (t<tmax)

5:  for each individual

6:   Update a (Eq 15), A, C (Eq 16)

7:   Calculate p (Eqs 13 and 14)

8:   if1 (p<0.5)

9:    if2 (|A|<1)

10:     Update the position of current feature vector (Eq (17))

11:    else if2 (|A|≥1)

12:     Select a random search feature (Xrand)

13:     Update the position of current feature vector (Eq (1))

14:    end if2

15:   else if1 (p ≥ 0.5)

16:    Update the position of current feature vector (Eq (19))

17:   end if1

18:  end for

19:  Check the boundary and amend it

20:  Calculate the fitness of each feature vector

21:  Update X* if there is a better feature vector

22:  t = t+1

23: end while

24: Divide X* into training set and test set

25: Calculate the weight matrix M according to Algorithm 3

26: Calculate the distance between the test sample and the training samples according to Eq (23)

27: Sort each distance found

28: Select k points as the KNN of the test sample

29: Calculate and predict the test sample class (Eq (26))

30: Count the number of correct predictions and calculate Acc

Return: Accuracy (Acc)
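The wrapper loop of Algorithm 4 can be condensed into a minimal, self-contained sketch. Everything below is illustrative: `toy_fitness` is a hypothetical stand-in for the IKNN-based fitness of Eq (28) (it treats features 0 and 2 as the informative ones), and the position update collapses the encircling, spiral, and random-search operators of Eqs (1), (17), and (19) into a single probabilistic bit-flip biased toward the current best mask; the chaotic initialization, skew-distribution probability, and nonlinear control parameters of IWOA are omitted.

```python
import random

def toy_fitness(mask, alpha=0.99):
    # hypothetical stand-in for Eq (28): features 0 and 2 are "informative",
    # so the pretend error rate drops as they are selected (not the paper's IKNN)
    informative = {0, 2}
    selected = {i for i, b in enumerate(mask) if b}
    if not selected:
        return 1.0
    error = 1.0 - len(selected & informative) / 2
    return alpha * error + (1 - alpha) * len(selected) / len(mask)

def binary_woa_fs(n_features=6, pop=10, iters=50, seed=0):
    rng = random.Random(seed)
    # random binary population (Algorithm 4 uses chaotic elite reverse init instead)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop)]
    best = min(X, key=toy_fitness)
    for t in range(iters):
        a = 2 * (1 - t / iters)              # linearly decreasing control parameter
        for i, x in enumerate(X):
            new = []
            for j in range(n_features):
                A = a * (2 * rng.random() - 1)
                if abs(A) < 1:
                    # exploitation: bias each bit toward the current best mask
                    p = 0.9 if best[j] else 0.1
                else:
                    # exploration: unbiased random bit
                    p = 0.5
                new.append(1 if rng.random() < p else 0)
            if toy_fitness(new) < toy_fitness(x):
                X[i] = new                   # greedy replacement
        best = min(X + [best], key=toy_fitness)
    return best, toy_fitness(best)
```

Because any mask containing both informative features has zero toy error, the β term of the fitness then steers the search toward the smallest such mask.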

  (2) In order to verify that IKNN has better classification performance, 8 UCI datasets [65] are selected for simulation experiments, and the results are compared with those of 5 classifiers. The details are shown in Section 5.3.
  (3) In order to verify that IWOAIKFS performs well, 7 datasets are added (15 datasets in total) to the experiments of Section 5.3, and the results are compared with those of 6 FS methods based on meta-heuristic algorithms. The details are shown in Section 5.4.

5.1. Experimental environment and datasets

5.1.1. Experimental environment.

  1. System: 64-bit Windows 10
  2. CPU: Intel(R) Core(TM) i7-5557U
  3. Main frequency: 3.10 GHz; RAM: 8 GB
  4. Platform: Matlab 2020b and Python 3.9

5.1.2. Benchmark functions and datasets.

In order to evaluate the performance of IWOA, IKNN, and IWOAIKFS, 8 benchmark functions and 15 datasets were selected for the numerical experiments. The detailed descriptions are shown in Tables 1 and 2.

5.2. IWOA comparative experiment

5.2.1. Parameter setup for IWOA and other algorithms.

In order to ensure the objective fairness of the IWOA numerical experiments, the maximum number of iterations of all algorithms was set to 1000 and the initial population size to 50. Each group of experiments was run 30 times, and the mean and standard deviation (STD) were used as evaluation indicators. The detailed parameter settings of the algorithms are shown in Table 3.

Table 3. Parameter settings of IWOA and other selected algorithms.

https://doi.org/10.1371/journal.pone.0267041.t003

5.2.2. IWOA experimental results and analysis.

Numerical experiments were carried out on the 8 test functions in 30 and 100 dimensions, respectively. The mean and standard deviation of the optimization results of each algorithm were recorded (Table 4). Fig 6 shows the logarithmic mean convergence curves of F1, F3, F5, and F7 under the different algorithms, and Fig 7 shows a histogram of the average running time of the different algorithms on the 8 benchmark functions.

Fig 6. Logarithmic mean convergence curves of different algorithms.

(a) F1 (Sphere function), (b) F3 (Schwefel 1.2 function), (c) F5 (Rastrigin function), (d) F7 (Weierstrass function), (e) F1 (30D), (f) F3 (30D), (g) F5 (30D), (h) F7 (30D), (i) F1 (100D), (j) F3 (100D), (k) F5 (100D), (l) F7 (100D).

https://doi.org/10.1371/journal.pone.0267041.g006

Table 4. The comparison of obtained solutions for 8 benchmark functions.

https://doi.org/10.1371/journal.pone.0267041.t004

Table 4 shows that IWOA attains a smaller mean and standard deviation on most test functions and thus better optimization results. Analyzing the results in 30 and 100 dimensions, except for the F8 function, where IWOA is inferior to the HHO algorithm, IWOA's results on the 8 benchmark functions are better than those of the other comparison algorithms by several or even hundreds of orders of magnitude. These results show that IWOA has high convergence accuracy and stability. IWOA also achieves high accuracy on the unimodal functions and most multimodal functions, verifying that it has better local exploitation and global exploration capabilities. Therefore, IWOA has better stability and convergence than the other 8 meta-heuristic algorithms.

In order to better compare the convergence performance of the 9 algorithms, the log-mean fitness values of two unimodal functions (F1 and F3) and two multimodal functions (F5 and F7) were used to draw the convergence curves (Fig 6). Fig 6(a)–6(d) show the F1, F3, F5, and F7 function surfaces, while Fig 6(e)–6(h) and Fig 6(i)–6(l) show the corresponding mean convergence curves in 30 and 100 dimensions, respectively. Fig 6 shows that, in the same dimension, IWOA optimizes the unimodal functions better than the other 8 comparison algorithms, and that IWOA converges better when optimizing high-dimensional multimodal functions. As the dimensionality increases, IWOA shows better convergence performance.

Fig 7(a) and 7(b) show the average running times of the 9 algorithms in 30 and 100 dimensions, respectively. The average running time of IWOA is better only than that of the ASO algorithm, owing to the extra time IWOA spends initializing the population and generating skew random probabilities. However, comparing Fig 7(a) and 7(b) shows that the increase in IWOA's average running time from low to high dimensions is smaller than that of the other comparison algorithms with comparable convergence accuracy.

Fig 7. Average running time of different algorithms.

(a) 30D Average running time(s), (b) 100D Average running time(s).

https://doi.org/10.1371/journal.pone.0267041.g007

Compared with the other 8 comparison algorithms, IWOA has better local convergence and global optimization, and shows better reliability and robustness when solving high-dimensional multimodal functions.

Table 4 and Figs 6 and 7 show that, when IWOA optimizes the 8 benchmark functions, its smaller mean and standard deviation indicate better overall optimization stability, and the convergence curves show that IWOA obtains better results on high-dimensional multimodal functions. The average running time of IWOA is higher than that of most comparison algorithms, but IWOA achieves higher accuracy at the cost of only a small increase in time. Therefore, IWOA has better optimization performance than the 8 comparison algorithms.

5.2.3. Wilcoxon’s test and Friedman test.

The mean and standard deviation of 30 independent experiments alone cannot fully measure the superiority of the improved algorithm. Wilcoxon's rank-sum test [72], a nonparametric statistical test for evaluating algorithm performance, is often used to verify meta-heuristic algorithms. Wilcoxon's test was therefore conducted at the 5% significance level to judge whether each result of IWOA differed significantly from the best results of the other algorithms. Table 5 shows the p-values of Wilcoxon's test between IWOA and the other algorithms on the 8 benchmark functions; p < 0.05 is considered strong evidence to reject the null hypothesis.
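Such a rank-sum test can be sketched with the normal approximation. The implementation below is illustrative only (average ranks for ties, no tie-variance correction) and is not a replacement for a statistical package:

```python
import math

def rank_sum_test(a, b):
    # Wilcoxon rank-sum p-value via the normal approximation; average ranks
    # are assigned to tied values (no tie-variance correction in this sketch)
    combined = sorted((v, 0 if i < len(a) else 1)
                      for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg = (i + 1 + j) / 2            # average of ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = avg
        i = j
    W = sum(r for r, (_, grp) in zip(ranks, combined) if grp == 0)
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (W - mean) / sd
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

Two clearly separated samples yield p < 0.05, while identical samples yield p = 1.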

Table 5. p-values of the Wilcoxon test for the optimization results of IWOA and the other algorithms on the 8 benchmark functions (p ≥ 0.05 in bold).

https://doi.org/10.1371/journal.pone.0267041.t005

Table 5 shows that at 30 dimensions all p-values of IWOA are less than 0.05, indicating that its optimization performance is statistically significant and verifying that IWOA has higher convergence accuracy than the comparison algorithms. At 100 dimensions, the p-value exceeds the significance level only in the comparisons with WOA on the F4 and F5 functions, showing that IWOA still performs better than the other algorithms overall.

In order to better evaluate the effectiveness of the proposed method and to detect significant differences among two or more sets of observations, the Friedman test is used to statistically test the proposed algorithm. The Friedman test is a nonparametric two-way analysis of variance method [72]. The test process is as follows:

  1. (1) Collect the observations of each algorithm on each problem;
  2. (2) For each problem, rank the algorithms from the best result (rank 1) to the worst result (rank k); let $r_i^j$ denote the rank of the j-th algorithm on the i-th problem;
  3. (3) Compute the average rank of each algorithm over all n problems, $R_j = \frac{1}{n}\sum_{i=1}^{n} r_i^j$, and obtain the final ranking.

Under the null hypothesis, the ranks Rj of all algorithms are equal, and the Friedman statistic Ff is given in Eq (30). Based on Table 4, the Friedman test was run in Matlab 2020b for IWOA and the comparison algorithms; the results are shown in Table 6.

Table 6. Friedman test results of benchmark functions with different dimensions.

https://doi.org/10.1371/journal.pone.0267041.t006

$$F_f = \frac{12n}{k(k+1)}\left[\sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4}\right] \tag{30}$$
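The ranking procedure and the statistic of Eq (30) can be sketched directly. Ranking here assigns rank 1 to the smallest value (appropriate for error rates; negate accuracies first) and, for brevity, ignores ties:

```python
def mean_ranks(results):
    # results[i][j]: score of algorithm j on problem i; rank 1 goes to the
    # smallest value on each problem (ties are ignored in this sketch)
    n, k = len(results), len(results[0])
    ranks = [[0] * k for _ in range(n)]
    for i, row in enumerate(results):
        order = sorted(range(k), key=lambda j: row[j])
        for r, j in enumerate(order, start=1):
            ranks[i][j] = r
    return [sum(ranks[i][j] for i in range(n)) / n for j in range(k)]

def friedman_statistic(results):
    # F_f of Eq (30) from the mean ranks R_j over n problems and k algorithms
    n, k = len(results), len(results[0])
    R = mean_ranks(results)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in R) - k * (k + 1) ** 2 / 4)
```

For instance, with 3 algorithms where the first is always best over 4 problems, the mean ranks are [1, 2, 3] and the statistic evaluates to 8.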

It can be seen from Table 6 that the asymptotic p-value of the Friedman test is far less than 0.01 on both the 30D and 100D benchmark functions, so there are significant differences between IWOA and the comparison algorithms in both cases. Moreover, the rank mean of IWOA is the smallest in both 30D (1.38) and 100D (1.25), indicating the best optimization performance. Combining the Wilcoxon and Friedman test results, IWOA performs better than the comparison algorithms overall.

5.3. IKNN comparative experiment

5.3.1. Influence of 3 strategies on KNN algorithm.

The KNN variant affected only by the weight matrix M is denoted MKNN; the variant affected only by the weighted classification criterion is WKNN; and the variant combining both strategies is IKNN.
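The weighted distance of Eq (23) and the weighted voting criterion of Eq (26) can be illustrated with a minimal sketch. Both simplifications below are assumptions for illustration, not the paper's exact formulas: the weight matrix M is reduced to a diagonal (per-feature) weight vector w, and votes are weighted by inverse distance.

```python
import math
from collections import defaultdict

def weighted_distance(x, y, w):
    # feature-weighted Euclidean distance; the diagonal weight vector w stands
    # in for the paper's weight matrix M (an illustrative simplification)
    return math.sqrt(sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)))

def iknn_predict(test_x, train_X, train_y, w, k=3, eps=1e-9):
    # inverse-distance weighted vote among the k nearest training samples
    dists = sorted((weighted_distance(test_x, x, w), label)
                   for x, label in zip(train_X, train_y))
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)      # closer neighbours vote more strongly
    return max(votes, key=votes.get)
```

With uniform weights this reduces to distance-weighted KNN; a learned w emphasizes more discriminative features.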

The initial simulated annealing temperature is T0 = 100, the end temperature is Te = 1, the number of cycles per temperature is L = 100, and the cooling (learning) rate is η = 0.9, so the number of iterations required to generate M is

$$t = L \left\lceil \log_{\eta} \frac{T_e}{T_0} \right\rceil = 100 \times \left\lceil \log_{0.9} \frac{1}{100} \right\rceil = 4400 \tag{31}$$

The 8 datasets of Section 5.1 were used, and the numerical experiments were performed in the Python 3.9 environment. The experimental results under the different strategies are shown in Table 7 (Breast: Breast_cancer, Heart: Heart_disease).
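Assuming a geometric cooling schedule T ← ηT, the iteration count of Eq (31) follows directly from the parameters above:

```python
import math

def sa_iterations(T0, Te, eta, cycles_per_temp):
    # temperature steps s such that T0 * eta**s first drops to Te or below,
    # times the inner-loop cycles performed at each temperature
    steps = math.ceil(math.log(Te / T0) / math.log(eta))
    return steps * cycles_per_temp

n_iters = sa_iterations(100, 1, 0.9, 100)  # 100 * ceil(log_0.9(1/100)) = 4400
```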

5.3.2. Comparison of classification accuracy between IKNN and other classifiers.

In order to further verify the effectiveness of IKNN, we use KNN [62], Naive Bayes [73], C4.5 [74], SVM [75], and a BP neural network [76] as comparison algorithms in the numerical experiments. The number of nearest neighbors for KNN and IKNN is k = 10, SVM uses a Gaussian kernel function, and the BP neural network uses a stochastic gradient optimizer with 1 hidden layer of 13 nodes and a linear activation function. The experimental results of each algorithm on the datasets in the Python 3.9 environment are shown in Table 8.

5.3.3. IKNN results discussion.

Table 7 shows the experimental results of the KNN algorithm on the 8 datasets under the different strategies. Under the same parameter settings, the classification accuracies of MKNN, WKNN, and IKNN on the 8 datasets are greater than or equal to that of KNN, and the IKNN algorithm combining both strategies performs best. Among the single-strategy variants, MKNN performs better, so using the weight matrix M to measure sample importance classifies better than plain Euclidean distance. When the amount of data is small, the accuracy of WKNN shows no obvious change relative to KNN, but on datasets with large amounts of data WKNN classifies better. The proposed strategies therefore effectively improve the classification performance of the KNN algorithm.

Table 7. The comparison of classification accuracy of different strategies.

https://doi.org/10.1371/journal.pone.0267041.t007

Table 8 shows the classification results of the IKNN algorithm and the other 5 classifiers on the 8 datasets. In the same experimental environment, the classification accuracy of IKNN is higher than that of the comparison algorithms on most datasets. The improvement is most significant on the Glass dataset, while on the Wine and Birds datasets IKNN is inferior to Naive Bayes and SVM, which happen to suit those two datasets (their advantage is not obvious on the other datasets). Therefore, IKNN has better classification performance than the comparison algorithms.

Table 8. The comparison of classification accuracy of different classifiers.

https://doi.org/10.1371/journal.pone.0267041.t008

5.4. IWOAIKFS comparative experiment

MATLAB 2020b was used to verify that IWOAIKFS has better classification performance. In the computing environment of Section 5.1, experiments were carried out on the 15 UCI datasets of Table 2. The experiment compares IWOA with the IKNN classifier (k = 5 [8, 16, 20, 23, 26, 41, 54, 64, 77]) against 6 recent meta-heuristic algorithms with the KNN classifier (k = 5), over 30 independent runs on the 15 datasets.

The evaluation indicators are the mean classification accuracy over 30 independent runs, the standard deviation of the optimal accuracy over 30 independent runs, and the average number of features selected over 30 independent runs.

5.4.1. Parameter setup for IWOAIKFS and other optimizers.

The parameter settings of IWOAIKFS and the comparison optimizers are listed in Table 9:

Table 9. Parameter settings of IWOAIKFS and other selected algorithms.

https://doi.org/10.1371/journal.pone.0267041.t009

5.4.2. IWOAIKFS comparison of classification accuracy with other optimizers.

In order to test the effectiveness of IWOAIKFS, it was compared with 6 FS methods based on meta-heuristic algorithms (ASO, GWO, HHO, SCA, SSA, and WOA). Table 10 shows the average accuracy, standard deviation, and average number of selected features of each algorithm over 30 independent runs on the 15 datasets. The comparison is summarized as follows:

  1. In terms of mean classification accuracy, IWOAIKFS has the highest accuracy on 14 datasets, ranking first among the 7 algorithms. Its mean accuracy reaches 100% on the Breast_cancer, Chart, Iris, Wine, and Zoo datasets. IWOAIKFS ranks only 2nd on the Heart_disease dataset, slightly inferior to the HHO algorithm; however, its standard deviation there is 0, indicating stronger stability than HHO. Therefore, in terms of mean classification accuracy, IWOAIKFS classifies the 15 datasets better than the other algorithms.
  2. In terms of the standard deviation indicator, SCA has a standard deviation of 0 on all 15 datasets, ranking first among the 7 algorithms and indicating the most stable behavior. IWOAIKFS ranks 3rd, behind SCA and HHO, which is caused by the randomness introduced when IKNN computes the weight matrix M for different datasets.
  3. In terms of the number of selected features, IWOAIKFS selects the fewest features on the Breast_cancer, Car, Glass, Heart_disease, and Ionosphere datasets, ranking 2nd among the 7 algorithms, second only to the ASO algorithm (which selects the fewest features on 6 datasets).
Table 10. Comparison between IWOAIKFS with other competitor optimizers based on accuracy (k = 5 and best are in bold).

https://doi.org/10.1371/journal.pone.0267041.t010

Fig 8 shows the total mean accuracy of IWOAIKFS and the other algorithms over all datasets, and Fig 9 shows a bar chart of the mean accuracy of IWOAIKFS and the other algorithms on the 15 datasets. Figs 8 and 9 show that IWOAIKFS achieves better classification accuracy than the other algorithms: its total mean accuracy over the 15 datasets is the highest, reaching 91.96%, which is 2.05%, 2.14%, 1.84%, 2.32%, 2.91%, and 2.60% higher than that of the 6 comparison algorithms, respectively.

Fig 8. Total mean accuracy of IWOAIKFS compared to other algorithms under all datasets.

https://doi.org/10.1371/journal.pone.0267041.g008

Fig 9. The accuracy of IWOAIKFS compared to other optimizers.

https://doi.org/10.1371/journal.pone.0267041.g009

Fig 10 shows the ratio of the average number of features selected by each algorithm over 30 independent runs on the 15 datasets. IWOAIKFS is inferior to one or more algorithms on most datasets; however, viewed as a whole, most of the feature ratios selected by IWOAIKFS are below average, so it can be considered to have better FS performance.

Fig 10. The selection ratio of IWOAIKFS compared to other optimizers.

https://doi.org/10.1371/journal.pone.0267041.g010

In summary, considering the three indicators (accuracy, standard deviation, and number of selected features), IWOAIKFS classifies the 15 datasets more accurately than the other optimizers, and it also performs better than most algorithms on the standard deviation and the average number of selected features. Therefore, IWOAIKFS can be considered to have superior overall performance.

5.4.3. IWOAIKFS convergence comparison with other optimizers.

Fig 11 shows the mean fitness convergence curve of IWOAIKFS and the other 6 optimizers under 15 datasets. These algorithms are all tested and evaluated under the same population size and number of iterations.

Fig 11. Convergence curve of IWOAIKFS versus other algorithms over all datasets (k = 5).

(a) Birds, (b) Blood, (c) Breast_cancer, (d) Bupa, (e) Car, (f) Chart, (g) Digits, (h) Glass, (i) Heart_disease, (j) Indian, (k) Ionosphere, (l) Iris, (m) Planning, (n) Wine, (o) Zoo.

https://doi.org/10.1371/journal.pone.0267041.g011

Fig 11(a)–11(o) show the mean fitness convergence curves on each dataset under the 7 algorithms. Except for the Breast_cancer and Wine datasets (Fig 11(c) and 11(n)), where its convergence performance is poorer, IWOAIKFS shows better convergence on the remaining datasets. The convergence speed of the algorithm influences the final classification accuracy: on Breast_cancer and Wine the optimum is already reached within the 100 iterations of IWOAIKFS, so the mean accuracies still reach their optimal values. IWOAIKFS attains the best fitness value on the Heart_disease dataset, yet its classification accuracy there is not the best; this is because, once convergence is reached and IKNN evaluates the best subset, the constructed weight matrix M introduces random effects. Therefore, judged by the convergence curves, IWOAIKFS shows better convergence performance overall.

5.4.4. IWOAIKFS Wilcoxon’s test and Friedman test.

Table 11 lists the p-values of Wilcoxon's test based on the mean classification accuracy of the 30 independent runs. Compared with the other optimizers, IWOAIKFS is statistically significant on all datasets, since its p-values are less than 0.05. In addition, because their standard deviations are 0, ASO, GWO, HHO, SCA, and SSA show identical statistical performance on the Blood, Bupa, and Wine datasets, and ASO, GWO, HHO, SCA, SSA, and WOA show identical performance on the Iris dataset.

Table 11. p-values of the Wilcoxon test for the classification accuracy results of IWOAIKFS and the other optimizers (k = 5; p ≥ 0.05 in bold).

https://doi.org/10.1371/journal.pone.0267041.t011

Table 12 lists the p-values of the Friedman test. The asymptotic p-value is far less than 0.01 (3.66E-12), so there are significant differences between IWOAIKFS and the comparison algorithms on the 15 UCI benchmark datasets. Moreover, the rank mean of IWOAIKFS is the smallest among all algorithms (1.53), indicating better optimization performance than the comparison algorithms.

Fig 12 shows boxplots of the classification accuracy obtained by all optimizers over 30 independent runs on the 15 datasets. In Fig 12, the lower quartile (Q1) represents lower values, the upper quartile (Q3) represents higher values, and the red line in the box marks the median. Fig 12 shows that IWOAIKFS ranks first among all algorithms and performs best on the 15 datasets.

Fig 12. Boxplot of IWOAIKFS versus other algorithms over all datasets (k = 5).

(a) Birds, (b) Blood, (c) Breast_cancer, (d) Bupa, (e) Car, (f) Chart, (g) Digits, (h) Glass, (i) Heart_disease, (j) Indian, (k) Ionosphere, (l) Iris, (m) Planning, (n) Wine, (o) Zoo.

https://doi.org/10.1371/journal.pone.0267041.g012

5.4.5. IWOAIKFS results discussion.

The experimental results and analysis in Section 5.4 show that, when meta-heuristic algorithms are applied to FS, IWOAIKFS has better overall optimization performance than the other meta-heuristic methods. Considering only the mean classification accuracy over the 30 runs, IWOAIKFS has the best search performance, achieving a total mean accuracy of 91.96% over all datasets, at least 1.8% higher than the other algorithms. Considering only the accuracy standard deviation over the 30 runs, SCA has a standard deviation of 0 on all datasets and is thus the most stable. Considering only the number of selected features, ASO selects the fewest features on 6 datasets and IWOAIKFS on 5, so the proposed method is slightly inferior to ASO, but the difference is small and the two can be regarded as comparably strong at minimizing the number of features. Analyzed as a whole, however, the proposed method performs better than the other algorithms considered in this paper.

In summary, judged by single indicators, IWOAIKFS does not achieve the best performance on every indicator despite effective results in several aspects, and its longer running time imposes certain limitations. Therefore, for IWOAIKFS and other meta-heuristic FS methods, the search and evaluation of the optimal feature subset should be adapted to the dataset and the actual requirements of the specific scenario.

Conclusions

In this work, we started from two directions: exploring the optimization performance of WOA, and applying intelligent optimization algorithms to FS. Through numerical simulation experiments and theoretical analysis, we draw the following conclusions.

  1. We propose an improved whale optimization algorithm (IWOA). To address WOA's slow optimization speed and low convergence accuracy, the method first uses chaotic elite reverse individuals to improve the diversity of the initial population. It then improves the traditional WOA by simulating whales' individual preferences when hunting prey and a nonlinear weight update mechanism for the movement of the whale group. Experiments on 8 benchmark functions against 8 meta-heuristic algorithms in different dimensions show that IWOA attains smaller means, standard deviations, and average fitness values overall, and performs better in the Wilcoxon and Friedman tests, verifying that IWOA has higher convergence performance and a better ability to escape local extrema.
  2. We propose an improved K-nearest neighbors algorithm (IKNN). To address the fact that plain Euclidean distance cannot adequately measure the similarity between samples, the method constructs a similarity weight matrix M via simulated annealing and combines it with a new weighted classification criterion. Experiments against 5 classifiers on 8 UCI benchmark datasets show that IKNN achieves higher classification accuracy, indicating better classification performance than the comparison algorithms.
  3. We propose a feature selection method based on IWOA and IKNN (IWOAIKFS). To address the problems of large selected feature subsets and low classification accuracy in traditional feature selection, the method uses the optimization ability of IWOA to search for feature subsets and the classification performance of IKNN to evaluate them, while IKNN is optimized synchronously by IWOA. Simulation experiments on 15 UCI benchmark datasets against 6 optimizers show that IWOAIKFS filters out smaller feature subsets with higher classification accuracy, exhibiting better search and convergence performance; the Wilcoxon and Friedman test results further verify the statistical significance and validity of IWOAIKFS.

Although the three improved methods proposed in this paper perform better than the original algorithms, they still have shortcomings. For example, IWOA shows limited convergence performance on some high-dimensional multimodal functions, and the time complexity of IKNN and IWOAIKFS is too high. We will therefore conduct further research on these issues, as follows.

  1. In the future, we plan to build a theoretical analysis system, an evaluation system, and a community communication module for meta-heuristic algorithms. Because "metaphors" are over-used in meta-heuristic research, it is hard to judge whether a new (or improved) meta-heuristic algorithm genuinely advances the field of optimization; follow-up research will therefore try to establish such systems for the corresponding meta-heuristic algorithms.
  2. In the future, we plan to reduce the time complexity of IWOAIKFS. Since IWOAIKFS fuses IWOA with the IKNN algorithm and is dominated by the IKNN component, its time complexity is much higher than that of common feature selection methods. Follow-up research will try to integrate the training and testing processes of IKNN to reduce its time complexity, and thereby that of IWOAIKFS.
  3. In the future, we plan to build a large-dataset preprocessing system based on IWOAIKFS. After the evaluation framework for meta-heuristic algorithms has been built and the time complexity of IWOAIKFS reduced, we will try to build a preprocessing system based on IWOAIKFS that quickly processes complex datasets for faster entry into machine learning.

References

  1. Manbari Z., AkhlaghianTab F., & Salavati C. (2019). Hybrid fast unsupervised feature selection for high-dimensional data. Expert Systems with Applications, 124, 97–118.
  2. Bennasar M., Hicks Y., & Setchi R. (2015). Feature selection using joint mutual information maximisation. Expert Systems with Applications, 42(22), 8520–8532.
  3. Liu H., & Yu L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491–502.
  4. Arora S., & Anand P. (2019). Binary butterfly optimization approaches for feature selection. Expert Systems with Applications, 116, 147–160.
  5. Li J., Cheng K., Wang S., Morstatter F., Trevino R. P., Tang J., & Liu H. (2017). Feature selection: A data perspective. ACM Computing Surveys (CSUR), 50(6), 1–45.
  6. Moorthy R. S., & Pabitha P. (2018, August). A study on meta heuristic algorithms for feature selection. In International Conference on Intelligent Data Communication Technologies and Internet of Things (pp. 1291–1298). Springer, Cham.
  7. Dash M., & Liu H. (1997). Feature selection for classification. Intelligent Data Analysis, 1(1–4), 131–156.
  8. Neggaz N., Houssein E. H., & Hussain K. (2020). An efficient Henry gas solubility optimization for feature selection. Expert Systems with Applications, 152, 113364.
  9. Inza I., Larranaga P., Blanco R., & Cerrolaza A. J. (2004). Filter versus wrapper gene selection approaches in DNA microarray domains. Artificial Intelligence in Medicine, 31(2), 91–103. pmid:15219288
  10. Liu H., & Motoda H. (2012). Feature selection for knowledge discovery and data mining (Vol. 454). Springer Science & Business Media.
  11. Abdel-Basset M., El-Shahat D., El-henawy I., de Albuquerque V. H. C., & Mirjalili S. (2020). A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Systems with Applications, 139, 112824.
  12. Ahmad S. R., Bakar A. A., & Yaakub M. R. (2015, July). Metaheuristic algorithms for feature selection in sentiment analysis. In 2015 Science and Information Conference (SAI) (pp. 222–226). IEEE.
  13. Ramanujam K. S., & David K. (2019). Survey on optimization algorithms used for feature selection techniques in web page classification. Journal of Computational and Theoretical Nanoscience, 16(2), 384–388.
  14. Faris H., Aljarah I., & Al-Shboul B. (2016, September). A hybrid approach based on particle swarm optimization and random forests for e-mail spam filtering. In International Conference on Computational Collective Intelligence (pp. 498–508). Springer, Cham.
  15. Faris H., Hassonah M. A., Ala'M A. Z., Mirjalili S., & Aljarah I. (2018). A multi-verse optimizer approach for feature selection and optimizing SVM parameters based on a robust system architecture. Neural Computing and Applications, 30(8), 2355–2369.
  16. Mafarja M. M., & Mirjalili S. (2017). Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing, 260, 302–312.
  17. Agrawal P., Abutarboush H. F., Ganesh T., & Mohamed A. W. (2021). Metaheuristic algorithms on feature selection: A survey of one decade of research (2009–2019). IEEE Access, 9, 26766–26791.
  18. Mirjalili S., Mirjalili S. M., & Lewis A. (2014). Grey wolf optimizer. Advances in Engineering Software, 69, 46–61.
  19. Seth J. K., & Chandra S. (2016, March). Intrusion detection based on key feature selection using binary GWO. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 3735–3740). IEEE.
  20. Emary E., Zawbaa H. M., & Hassanien A. E. (2016). Binary grey wolf optimization approaches for feature selection. Neurocomputing, 172, 371–381.
  21. Al-Tashi Q., Kadir S. J. A., Rais H. M., Mirjalili S., & Alhussian H. (2019). Binary optimization using hybrid grey wolf optimization for feature selection. IEEE Access, 7, 39496–39508.
  22. Too J., & Rahim Abdullah A. (2020). Binary atom search optimisation approaches for feature selection. Connection Science, 32(4), 406–430.
  23. Too J., Abdullah A. R., & Mohd Saad N. (2019). A new quadratic binary Harris hawk optimization for feature selection. Electronics, 8(10), 1130.
  24. Kumar L., & Bharti K. K. (2021). A novel hybrid BPSO–SCA approach for feature selection. Natural Computing, 20(1), 39–61.
  25. Alweshah M., Al Khalaileh S., Gupta B. B., Almomani A., Hammouri A. I., & Al-Betar M. A. (2020). The monarch butterfly optimization algorithm for solving feature selection problems. Neural Computing and Applications, 1–15.
  26. Mafarja M., & Mirjalili S. (2018). Whale optimization approaches for wrapper feature selection. Applied Soft Computing, 62, 441–453.
  27. Jia H., Peng X., & Lang C. (2021). Remora optimization algorithm. Expert Systems with Applications, 185, 115665.
  28. Abdollahzadeh B., Gharehchopogh F. S., & Mirjalili S. (2021). African vultures optimization algorithm: A new nature-inspired metaheuristic algorithm for global optimization problems. Computers & Industrial Engineering, 158, 107408.
  29. 29. Abdollahzadeh B., Soleimanian Gharehchopogh F., & Mirjalili S. (2021). Artificial gorilla troops optimizer: A new nature‐inspired metaheuristic algorithm for global optimization problems. International Journal of Intelligent Systems, 36(10), 5887–5958.
  30. 30. Naruei, I., & Keynia, F. (2021). Wild horse optimizer: A new meta-heuristic algorithm for solving engineering optimization problems. Engineering with Computers, 1–32.
  31. 31. Wang J., Khishe M., Kaveh M., & Mohammadi H. (2021). Binary Chimp Optimization Algorithm (BChOA): a New Binary Meta-heuristic for Solving Optimization Problems. Cognitive Computation, 13(5), 1297–1316.
  32. 32. Abualigah L., Diabat A., Mirjalili S., Abd Elaziz M., & Gandomi A. H. (2021). The arithmetic optimization algorithm. Computer methods in applied mechanics and engineering, 376, 113609.
  33. 33. Abualigah L., Yousri D., Abd Elaziz M., Ewees A. A., Al-Qaness M. A., & Gandomi A. H. (2021). Aquila optimizer: a novel meta-heuristic optimization algorithm. Computers & Industrial Engineering, 157, 107250.
  34. 34. Mirjalili S., & Lewis A. (2016). The whale optimization algorithm. Advances in engineering software, 95, 51–67.
  35. 35. Zamani H., & Nadimi-Shahraki M. H. (2016). Feature selection based on whale optimization algorithm for diseases diagnosis. International Journal of Computer Science and Information Security, 14(9), 1243.
  36. 36. Mafarja M., Jaber I., Ahmed S., & Thaher T. (2021). Whale optimisation algorithm for high-dimensional small-instance feature selection. International Journal of Parallel, Emergent and Distributed Systems, 36(2), 80–96.
  37. 37. Sayed, G. I., Darwish, A., Hassanien, A. E., & Pan, J. S. (2016, November). Breast cancer diagnosis approach based on meta-heuristic optimization algorithm inspired by the bubble-net hunting strategy of whales. In International conference on genetic and evolutionary computing (pp. 306–313). Springer, Cham.
  38. 38. Xu, H., Fu, Y., Fang, C., Cao, Q., Su, J., & Wei, S. (2018, September). An improved binary whale optimization algorithm for feature selection of network intrusion detection. In 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS) (pp. 10–15). IEEE.
  39. 39. Mafarja M., Heidari A. A., Habib M., Faris H., Thaher T., & Aljarah I. (2020). Augmented whale feature selection for IoT attacks: Structure, analysis and applications. Future Generation Computer Systems, 112, 18–40.
  40. 40. Shuaib M., Abdulhamid S. I. M., Adebayo O. S., Osho O., Idris I., Alhassan J. K., & Rana N. (2019). Whale optimization algorithm-based email spam feature selection method using rotation forest algorithm for classification. SN Applied Sciences, 1(5), 1–17.
  41. 41. Hussien, A. G., Houssein, E. H., & Hassanien, A. E. (2017, December). A binary whale optimization algorithm with hyperbolic tangent fitness function for feature selection. In 2017 Eighth international conference on intelligent computing and information systems (ICICIS) (pp. 166–172). IEEE.
  42. 42. Hussien A. G., Hassanien A. E., Houssein E. H., Bhattacharyya S., & Amin M. (2019). S-shaped binary whale optimization algorithm for feature selection. In Recent trends in signal and image processing (pp. 79–87). Springer, Singapore.
  43. 43. Tubishat M., Abushariah M. A., Idris N., & Aljarah I. (2019). Improved whale optimization algorithm for feature selection in Arabic sentiment analysis. Applied Intelligence, 49(5), 1688–1707.
  44. 44. Guha R., Ghosh M., Mutsuddi S., Sarkar R., & Mirjalili S. (2020). Embedded chaotic whale survival algorithm for filter–wrapper feature selection. Soft Computing, 24(17), 12821–12843.
  45. 45. Eid H. F., & Abraham A. (2018). Adaptive feature selection and classification using modified whale optimization algorithm. International Journal of Computer Information Systems and Industrial Management Applications, 10, 174–182.
  46. 46. Khaire, U. M., & Dhanalakshmi, R. (2020). Stability Investigation of Improved Whale Optimization Algorithm in the Process of Feature Selection. IETE Technical Review, 1–15.
  47. 47. Saidala, R. K., & Devarakonda, N. R. (2017, April). Bubble-net hunting strategy of whales based optimized feature selection for e-mail classification. In 2017 2nd international conference for convergence in technology (I2CT) (pp. 626–631). IEEE.
  48. 48. Zheng Y., Li Y., Wang G., Chen Y., Xu Q., Fan J., & Cui X. (2018). A novel hybrid algorithm for feature selection based on whale optimization algorithm. IEEE Access, 7, 14908–14923.
  49. 49. Ghoneim, S. S., Farrag, T. A., Rashed, A. A., El-kenawy, E. S. M., & Ibrahim, A. (2021). Adaptive Dynamic Meta-heuristics for Feature Selection and Classification in Diagnostic Accuracy of Transformer Faults. IEEE Access.
  50. 50. Agrawal R. K., Kaur B., & Sharma S. (2020). Quantum based whale optimization algorithm for wrapper feature selection. Applied Soft Computing, 89, 106092.
  51. 51. Bai L., Han Z., Ren J., & Qin X. (2020). Research on feature selection for rotating machinery based on Supervision Kernel Entropy Component Analysis with Whale Optimization Algorithm. Applied Soft Computing, 92, 106245.
  52. 52. Krithiga R., & Ilavarasan E. (2020). A reliable modified whale optimization algorithm based approach for feature selection to classify twitter spam profiles. Microprocessors and Microsystems, 103451.
  53. 53. Mohammadzadeh H., & Gharehchopogh F. S. (2021). A novel hybrid whale optimization algorithm with flower pollination algorithm for feature selection: Case study Email spam detection. Computational Intelligence, 37(1), 176–209.
  54. 54. Mafarja M., Qasem A., Heidari A. A., Aljarah I., Faris H., & Mirjalili S. (2020). Efficient hybrid nature-inspired binary optimizers for feature selection. Cognitive Computation, 12(1), 150–175.
  55. 55. Vijayanand R, Devaraj D. A novel feature selection method using whale optimization algorithm and genetic operators for intrusion detection system in wireless mesh network[J]. IEEE Access, 2020, 8: 56847–56854.
  56. 56. Nadu T. (2018). Whale optimization algorithm based feature selection with improved relevance vector machine classifier for gastric cancer classification. International Journal of Pure and Applied Mathematics, 119(10), 337–348.
  57. 57. Wolpert D. H., & Macready W. G. (1997). No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1(1), 67–82.
  58. 58. Sayed G. I., Darwish A., & Hassanien A. E. (2018). A new chaotic whale optimization algorithm for features selection. Journal of classification, 35(2), 300–344.
  59. 59. Tizhoosh, H. R. (2005, November). Opposition-based learning: a new scheme for machine intelligence. In International conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06) (Vol. 1, pp. 695–701). IEEE.
  60. 60. Seif Z., & Ahmadi M. B. (2015). An opposition-based algorithm for function optimization. Engineering Applications of Artificial Intelligence, 37, 293–306.
  61. 61. Azzalini, A. (2013). The skew-normal and related families (Vol. 3). Cambridge University Press.
  62. 62. Cover T., & Hart P. (1967). Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1), 21–27.
  63. 63. Kirkpatrick S., Gelatt C. D., & Vecchi M. P. (1983). Optimization by simulated annealing. science, 220(4598), 671–680. pmid:17813860
  64. 64. Emary E., Zawbaa H. M., & Hassanien A. E. (2016). Binary ant lion approaches for feature selection. Neurocomputing, 213, 54–65.
  65. 65. Asuncion, A., & Newman, D. (2007). UCI machine learning repository. http://archive.ics.uci.edu/ml.
  66. 66. Zhao W., Wang L., & Zhang Z. (2019). Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowledge-Based Systems, 163, 283–304.
  67. 67. Heidari A. A., Mirjalili S., Faris H., Aljarah I., Mafarja M., & Chen H. (2019). Harris hawks optimization: Algorithm and applications. Future generation computer systems, 97, 849–872.
  68. 68. Mirjalili S. (2015). Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowledge-based systems, 89, 228–249.
  69. 69. Mirjalili S., Mirjalili S. M., & Hatamlou A. (2016). Multi-verse optimizer: a nature-inspired algorithm for global optimization. Neural Computing and Applications, 27(2), 495–513.
  70. 70. Mirjalili S., Gandomi A. H., Mirjalili S. Z., Saremi S., Faris H., & Mirjalili S. M. (2017). Salp Swarm Algorithm: A bio-inspired optimizer for engineering design problems. Advances in Engineering Software, 114, 163–191.
  71. 71. Kaur S., Awasthi L. K., Sangal A. L., & Dhiman G. (2020). Tunicate Swarm Algorithm: A new bio-inspired based metaheuristic paradigm for global optimization. Engineering Applications of Artificial Intelligence, 90, 103541.
  72. 72. Derrac J., García S., Molina D., & Herrera F. (2011). A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, 1(1), 3–18.
  73. 73. McCallum A., & Nigam K. (1998, July). A comparison of event models for naive bayes text classification. In AAAI-98 workshop on learning for text categorization (Vol. 752, No. 1, pp. 41–48).
  74. 74. Quinlan, J. R. (2014). C4. 5: programs for machine learning. Elsevier.
  75. 75. Hearst M. A., Dumais S. T., Osuna E., Platt J., & Scholkopf B. (1998). Support vector machines. IEEE Intelligent Systems and their applications, 13(4), 18–28.
  76. 76. Hecht-Nielsen R. (1992). Theory of the backpropagation neural network. In Neural networks for perception (pp. 65–93). Academic Press.
  77. 77. Mafarja M., Aljarah I., Heidari A. A., Hammouri A. I., Faris H., Ala’M A. Z., & Mirjalili S. (2018). Evolutionary population dynamics and grasshopper optimization approaches for feature selection problems. Knowledge-Based Systems, 145, 25–45.
  78. 78. Too J., Mafarja M., & Mirjalili S. (2021). Spatial bound whale optimization algorithm: an efficient high-dimensional feature selection approach. Neural Computing and Applications, 1–22.
  79. 79. Too J., Abdullah A. R., Mohd Saad N., Mohd Ali N., & Tee W. (2018). A new competitive binary grey wolf optimizer to solve the feature selection problem in EMG signals classification. Computers, 7(4), 58.
  80. 80. Hafez A I, Zawbaa H M, Emary E, et al. Sine cosine optimization algorithm for feature selection[C]//2016 international symposium on innovations in intelligent systems and applications (INISTA). IEEE, 2016: 1–5.
  81. 81. Faris H., Mafarja M. M., Heidari A. A., Aljarah I., Ala’M A. Z., Mirjalili S., & Fujita H. (2018). An efficient binary salp swarm algorithm with crossover scheme for feature selection problems. Knowledge-Based Systems, 154, 43–67.