An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications

This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter setting turning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. However, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm procedure to search for the optimal solution in both the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm’s performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in a SVM to perform both parameter setting turning for the SVM and feature selection to solve real-world classification problems. This method is called chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem.


Introduction
In many real-world classification tasks, reducing the dimensionality of data is an essential step before classifying the data. The general approach for reducing the dimensionality of data involves feature selection techniques that aim to select the most relevant features based on certain predefined filter criteria. Based on their dimensionality-reducing characteristics, feature selection techniques have been widely applied in many pattern recognition tasks and machine learning fields. PLOS  To solve the hybrid flow-shop rescheduling problem with flexible processing time in steelmaking casting systems [25], a hybrid method using the FOA and two decoding heuristics called HFOA was proposed and has successfully solved flow-shop rescheduling problems. To optimize continuous function problems, Wang L et al. [26] proposed an improved FOA that uses a new mutation parameter and level probability policy. In this improved algorithm, the mutation parameter and level probability policy are used to balance population stability and diversity. To optimize twin support vector machine (TWSVM), at least two parameters generally must be considered. Ding S et al. [27] used FOA for parameter setting turning of TWSVM, and the experimental results showed that using FOA to optimize TWSVM can yield better classification performance than SVM. To solve the medical data classification problem, [28] used FOA to optimize SVM by determining an optimal parameter setting of the SVM model; compared with other well-known methods, the results of the constructed experiments of this method showed that the optimized SVM model with FOA is a powerful tool for medical data classification. For community detection methods, which are generally based on one evolutionary algorithm, a novel multi-swarm FOA was proposed by Liu Q et al. [29], namely, CDMFOA, and the experimental results showed that this method can effectively solve the detection community structure. For the nonlinear and non-stationary traits of rotating machinery vibration, the SVM classifier optimized by FOA has been used [30], and the experimental results showed that using FOA in combination with SVM (FOA-SVM) can yield better performance with respect to rolling bearing diagnosis. Some control systems have certain essential parameters that require proper determination [31]. A new hybrid method using GA in combination with FOA has been proposed to perform parameter tuning of control systems. In this method, FOA is employed to perform parameter setting turning of the controller system and GA is used to select the controller structure. Si L et al. [32] used an improved FOA in combination with the least squares support vector machine (LSSVM) to solve the identification problem of Shearer Cutting Patterns and constructed experiments to compare with PSO-LSSVM, GA-LSSVM and FOA-LSSVM; these experiments indicated that the proposed improved fruit fly optimization algorithm (IFOA)-LSSVM outperformed other methods. FOA has also been applied in traffic flow forecasting. A method using FOA to optimize the LSSVM has been proposed to improve the accuracy of traffic flow forecasting [33], and the experimental results showed that this method outperformed the LSSVM model, the RBF neural network (RBFNN), and LSSVM-PSO. To overcome the disadvantages of traditional FOA, Wu L et al. [34] proposed a cloud mode based on FOA, namely, CMFOA, which uses an adaptive parameter strategy to enhance the global search ability in the first stage, and performed experiments using 33 benchmark functions, and the results revealed the superior performance of this method compared with that of other FOA variations. One of the most important drawbacks of the original FOA is that it is difficult to obtain optimal solutions in zero vicinity; thus, an improved algorithm based on the original FOA using differential evolution has been proposed [35], namely, DFOA, which modifies the representation of the smell concentration judgment value and replaces the stochastic search with a differential vector. The experimental results show the effectiveness of DFOA for finding optimal working conditions. Parameter estimation plays an important role in bidirectional inductive power transfer (BIPT) systems.
To obtain a proper parameter setting for this system, an improved algorithm using chaotic PSO to enhance the original FOA has been proposed [36], namely, CFOA, and the experimental results showed that the 11 parameters of this system were determined properly. In certain real-world problems, such as joint replenishment problems (JRPs), FOA has also been applied. Wang L et al. [37] proposed an improved and effective algorithm based on the original FOA, namely, IFOA, which is used to solve joint replenishment problems and to optimize numerical functions. To determine product specifications, the melt index (MI) is one of the most important criterion. To forecast MI, an improved FOA using an adaptive mutation, namely, AM-FOA [38], was used to determine the punishment factor "γ" and the parameters of the Gaussian RBF kernel, and the experimental results showed that AM-FOA optimizing LSSVM is a functional method in MI prediction. FOA is generally suitable for optimizing continuous variables. To solve the discrete variable optimization problem, a binary FOA has been proposed to solve set covering problems (SCPs) [39]. Pan Q K et al. [40] proposed an improved algorithm based on the original FOA, namely, IFFO, which is used to solve continuous function optimization problems. The main novelty of this algorithm is that it uses a new parameter to control the search scope, and the experimental results showed that IFFO outperforms five state-of-the-art harmony search algorithms. To solve the semiconductor final testing scheduling problem (SFTSP), a novel algorithm based on the original FOA was proposed [41] called nFOA; multiple fruit fly swarms are employed in the evolution process to improve FOA's parallel search ability. One typical discrete optimization problem is solving three-dimensional path planning. The IFOA has been proposed [42] for solving engineering problems. The constructed experiments have shown that IFOA is a powerful method that can solve discrete optimization problems with greater efficiency. As the above related works of FOA show, FOA has become a powerful tool to effectively determine proper parameter settings for machine learning algorithms and to successfully solve complex multidimensional problems. However, traditional FAO has several drawbacks, such as the searching procedure becoming easily trapped in the local optimum, premature convergence, and the difficulty of addressing the discrete variable optimization issues. Several studies have proven that FOA is an efficient tool for parameter estimation of machine learning algorithms, such as the GRNN and SVM. Moreover, according to the related work of SVM parameter optimization, most of these optimization methods using FOA perform only parameter turning for the SVM classifier without simultaneously performing feature selection. Therefore, the above facts motivate us to propose a novel, intelligent framework using the proposed CIFOA and its improved mutation strategy, aiming to enhance the generalization ability and improve the classification performance of the SVM classifier by determining a proper parameter setting with an optimal feature subset simultaneously. In the proposed CIFOA, the chaotic PSO is proposed in combination with the mutation strategy to overcome the weaknesses of the original FOA and make the algorithm procedure reflect ergodicity, randomicity, and regularity. The proposed CIFOA optimizing SVM, namely, CIFOA-SVM, is an efficient framework that can be used to solve various real-world classification problems. The main novelty of the proposed intelligent framework is using chaotic PSO to solve the discrete variable optimization problem (determining an optimal feature subset). The proposed mutation strategy uses two different searching strategies to search for the local and global optimal solutions simultaneously, whereas a mutation parameter is proposed to allocate the individual in both the local searching strategy and global searching strategy. Concurrently, this study proposes a weighted fitness function to simultaneously address the trade-off between sensitivity and specificity classification accuracy and the number of selected features to maximize the classification performance of the proposed method. The efficiency and effectiveness of the proposed method have been examined in terms of classification accuracy, sensitivity, specificity, running time, and convergence curve with respect to two real-world classification problems: the medical diagnosis problem and the credit card problem. Five real-world datasets are introduced to evaluate the classification performance of various methods in solving real-world problems, and these datasets come from the UCI machine learning database repository. The experiment's results indicate that the proposed method can determine a more appropriate SVM model parameter setting and obtain an optimal feature subset with much less running time than the GAFS and other intelligent methods. The main contributions of this study are as follows: (1) it develops an improved FOA based on chaotic particle optimization with a novel mutation strategy, and (2) it develops an improved FOA that is successfully applied to SVM to determine proper parameter settings with an optimal subset of features for real-world classification problems.
The remainder of this paper is organized as follows: in section 2, we provide the necessary background materials regarding SVM and FOA. The proposed improved FOA based on chaotic optimization techniques and mutation strategy is presented in section 3 in which we also provide several groups of tests on well-known continuous functions. The detailed experiments and in-depth comparisons of the proposed framework with other well-known methods are presented in section 4, in which we also include a discussion. Finally, the conclusion and recommendations for future research are summarized in section 5.

A brief overview of support vector machines
In this subsection, we present a brief description of SVM. SVM was originally developed by Vapnik et al. [43] and is mainly used to solve classification problems. In subsequent years, SVM was applied to the multi-classification problem [44][45]. The main objective of SVM is to determine an optimal hyperplane that separates data of different classes on either side. The optimal hyperplane is determined by maximizing the interval between the support vectors of the closest positive and negative frontiers.
For now, we consider binary classification problems, i.e., -1 or 1, to represent one of two classes of a sample. When the label of i item of the samples is -1, then that i item of the samples belongs to the "positive class"; otherwise, the i item of the samples belongs to the "negative class". Let Y i is the label of i item of the samples. To separate the instances into two categories, we use the function F(X) = W T X + b, where W is a coefficient vector that is used to normalize the hyperplane. For the linearly separable case, an optimal separating margin can be determined by solving the following equation: To solve the above equation, a dual Lagrangian equation with multipliers α i (i = 1,2,. . .n) is introduced. The detailed Lagrangian equation can be expressed as follows: To construct the optimal hyperplane, a Lagrangian equation La(α) must be maximized with a positive multiplier α i under the conditions of X n i¼1 a i Y i ¼ 0 and α i ! 0. The solution α i can be solved by addressing the parameter w introduce an optimal equation for solving this case.
From the above equation, the Lagrangian multiplier α i = 0 means that its corresponding training vector is the closest to the margin of the optimal hyperplane and is also called a support vector. The main SVM characteristic is clearly expressed, i.e., constructing the optimal hyperplane depends on only a small subset of the training dataset without all the training dataset.
To address the nonlinear case, the linear equation can also be modified to address the nonlinear data. For now, a general ideal is given here, i.e., we use a kernel function to map the original input spaces into a higher-dimensional feature space. The original input data can be linearly separated using the kernel function to calculate the inner product in the feature space.
The kernel function can be expressed as follows: By using the kernel function, the original linear generalized equation can be modified to represent the nonlinear dual Lagrangian La(α).
0 a i C; i ¼ 1; 2; . . . ::; n; To solve the above optimization model, a method that solves the Lagrangian equation in the separable case can also be used to solve this optimization model.
There are several general kernel functions, such as the radial basic function (RBF), polynomial, linear kernel function and sigmoid kernel function. Table 1 displays the detailed calculation of the four kernel functions.
Where d is the polynomial order and γ is a parameter predefined by the user, which is used to control the graph's width of the Gaussian kernel.

A brief overview of the basic fruit fly swarm optimization algorithm
FOA was developed by Pan W T et al. [46]. The main characteristics of FOA are to search for food source by visual sense and sensitive olfactory. The specific procedure a fruit fly swarm employs in searching for food is shown in Fig 1. More specifically, a fruit fly can still find food at a distance of 70 km from the food source. The main merits of FOA are that it is efficient, simple and easy to implement. FOA is generally suitable for solving continuous variables optimization problems. To solve discrete parameter optimization problems, the binary fruit fly optimization algorithm (bFOA) was proposed by Wang L et al. [47]. In bFOA, the MKP problem was represented by a binary string and three search procedures were used to perform an evolutionary search to obtain the local best solution and global best solution. Because of the nature of FOA, it works better with continuous variables optimization problems. Although Chaotic improved FOA and SVM for real-world problems FAO has better search ability than other intelligent algorithms, there are certain drawbacks of the original FOA that cannot effectively solve complex problems in the real world. Thus, some improved or modified algorithms have been proposed to overcome these drawbacks. For example, an improved algorithm using the original FOA in combination with chaotic PSO has been proposed [48], namely, CFOA, which was compared with other intelligent algorithms to solve ten well-known benchmark problems, and the results showed that CFOA outperforms the other algorithms as measured by various performance criteria. A bimodal optimization algorithm based on the original FOA and cloud model learning has been proposed [49], namely, BCMFOA, which uses an adaptive parameter update strategy and a cloud generator to adaptively adjust the search range of fruit flies. To solve the drawback of the original FOA in multidimensional complicated optimization problems, a novel FOA algorithm using the multi-swarm fruit fly concept has been proposed [50], namely, MFOA. In comparison with the original FOA, the constructed experiments of MFO showed that MFOA obtains a significant outcome for several benchmark functions.
The original FOA can be divided into several steps. In this section, we attempt to express the detailed procedure of each step in FOA; then, we present the improved FOA in section 3.1. The detailed procedure of the original FOA is described as follows: Step 1. FOA parameter initialization: to initialize the parameter setting for FOA, several necessary parameters must be considered. The parts of these parameters are the same as those of other evolutionary algorithms (EAs), such as population size and the maximum iteration number. The remaining parts of these parameters are the upper bound with the lower bound of the random flight distance range and the initial fruit fly swarm location (X axis ,Y axis ). The initial location (X axis ,Y axis ) can be initialized by solving the following equations: where rand() is used to generate a random value in the interval [0, 1].
Step 2. Initializing populations for FOA: initializing the population for FOA employs a random strategy with the obtained initial fruit fly swarm location to generate a random location (X i ,Y i ) for each fruit fly. (X i ,Y i ) represents the location of the i-th fruit fly and is obtained as follows: Step 3. Evaluating the population for FOA: to evaluate the population and determine the best fruit fly, the distance from each fruit fly location to the food location can be used to calculate the smell concentration judgment value. Distance i represents the distance of the i-th fruit fly to the food location, which can be calculated by solving Eq (11). The smell concentration judgment value S i is obtained by solving Eq (12) as follows: Step 4. Converting the concentration value to fitness value: from the above step, the smell concentration judgment value of each fruit fly has been obtained. Converting the smell concentration judgment value to the fitness value is a convenient way to evaluate the importance of each fruit fly in populations.
Step 5. Selecting the maximal smell concentration: to find the food source, the maximal smell concentration is used as a guide for the searching procedure. bestSmellIndex represents the corresponding index of the maximal smell concentration among the fruit fly swarm.
bestSmellIndex ¼ SelectionMaxðSmellÞ ð14Þ Step 6. Updating the maximal smell concentration: this step updates the maximal smell concentration value and location (X axis ,Y axis ) based on the determination of bestSmellIndex, which means that all fruit flies modify their own location to move toward the direction of the maximal smell concentration.
Step 7. Checking the termination conditions: this step compares the current maximal smell concentration value with the previous maximal smell concentration value. If the current maximal concentration is no longer superior to the previous one, then the termination conditions are satisfied and the iterative procedure is stopped. Moreover, the maximum iteration number is needed to avoid unnecessary computational costs.

The proposed methodology
In this section, we present a novel and efficient FOA based on mutation strategy and chaotic PSO. The detailed mutation strategy is described in section 3.1. In section 3.2, we provide the detailed algorithm procedure of the CIFOA and its pseudo-code.

Mutation strategy and chaos particle optimization
In a traditional intelligent algorithm, the algorithm's performance depends on its preset parameter setting and is easily trapped in a local optimum. To further avoid premature convergence and enhance the algorithm's search ability, the chaos concept has been introduced [48,51,52,53,54,55]. Moreover, many algorithms based on the chaos concept were proposed to solve the medical diagnosis problem [55], yielding excellent outcomes. Chaos is characterized as ergodic, random, and regular [56][57][58]. Numerous studies have shown that random-based optimization algorithms perform better when using non-standard distributions (i.e., Gaussian or uniform distributions). Additionally, the properties of ergodicity and non-repetition of the chaos technique can force an algorithm to perform overall searches at higher speeds. These are the main reasons to employ the chaotic technique used in the proposed algorithm.
Although traditional FOA can achieve considerable results in terms of search efficiency and running time in various fields, FOA's searching performance depends exclusively on its fruit fly swarm location, which can easily lead the procedure of the FOA to fall into the trap of local optima. Thus, to address this problem and improve FOA's searching ability, we introduce the chaotic PSO in combination with the proposed mutation strategies to be used in FOA. The mutation strategy is proposed to generate two different osphresis forging strategies for simultaneously searching for the local optimum and the global optimum. One of these strategies is the global searching stage, which replaces the random method that fruit flies employ to find food sources. Instead, new food sources are generated by Eq (17), as follows: From the above equation, X ij is a newly generated food source in the range [minX j ,maxX j ]. n is population size and d is the number of decision variables. C ij is the i-th column of the j-th row of the chaos set, which is calculated as follows: From the above equation, Normalized() is employed to transform the fruit fly swarm location in the range [0, 1]. C 0j is the j-th dimension of the initial chaos vector, which is given by NormalizedðX j axis Þ. Logistic() is a logistic chaos mapping that is defined by Eq (19). Then, using an iteration of the logistic chaos mapping with an initial chaos vector, a set of chaos vectors C 1 , C 2 ,.. . ... . .C 1 is generated.
The second osphresis forging strategy is the local searching stage, which generates new food sources around the fruit fly swarm. In this stage, the osphresis parameter μ is proposed to help the algorithm control the range of newly generated food sources. The new food source X ij is calculated as follows: Note that the osphresis parameter μ plays an important role in the local searching ability of CIFOA and should be set properly. According to our previous experimental results, the osphresis parameter μ can be determined as follows: From the above equation, the upper_bound and lower_bound are used to form a domain of the parameter.
To simultaneously perform the two stages in an iterative procedure, the mutation probability rate mr is introduced to allow many individuals to use the global searching stage, while for the remaining population uses the local searching stage. The mutation probability mr is generally set to 0.8; a specific procedure of the mutation probability rate mr used is described as follows: if randðÞ mr Then where rand() is a randomly generated value in the interval [0, 1].

The procedure of CIFOA
According to the nature of the proposed CIFOA, the full procedures of the proposed algorithm can be divided into seven steps, and each step is described in detail as follows: Step 1. Parameters and chaos particle initialization: to gain an appropriate initial fruit fly swarm location and to maintain the diverse population distribution, the chaotic optimization technique is suitable for initializing the fruit fly swarm location and generating other fruit flies' locations. Moreover, the population size, maximum number of iterations, mutation probability mr, and osphresis parameter μ must be initialized.
(1) Provide a chaos vector C 0 = {C 01 ,C 02 ,..,C 0n }, then use the random function to generate a random value in the range [0, 1] for each component C 0i of the chaos vector.
(2) Use the initial chaos vector C 0 with the iteration procedure of the logistic chaos equation to generate a set of chaos vectors, C 1 ,C 2 ,C 3 ,. . ..,C n .
(3) To avoid the i-th item-value of the chaos vector being outside the bounds of the i-th parameter's range, the data normalized method is used to transfer a chaos vector, C i , into the parameter's range; the detailed transformation is thus described as follows: where maxX i and minX i denote the upper and lower bounds of the i-th parameter's range, respectively. C 0 ij represents the value of the j-th dimension of the i-th fruit fly that has been converted into the range [minX i ,maxX i ].
Step 2. Determine the initial fruit fly swarm location: this step is mainly responsible for finding an appropriate location with a maximal smell concentration value as the initial fruit fly swarm location. First, calculate the smell concentration value of each fruit fly using a predefined evaluation function; then, compare the smell concentration value of each fruit fly to obtain an optimal fruit fly location as the initial fruit fly swarm location.
Step 3. Update the locations of fruit flies: this step employs the chaotic PSO with the initial fruit fly swarm location X axis to update the location of the fruit flies. The detailed procedures are as follows: (1) Consider the initial chaos vector C 0 = {C 01 ,C 02 ,..,C 0n }. The fruit fly swarm location X axis is transferred to a scaled location that is used as the chaos vector C 0 , where each component must be converted into the range [0, 1] using the following equation: where minX i and maxX i are the lower and upper bounds of the parameter, respectively.
(2) The scaled vector C 0 0 is used as a chaos seed to generate a set of chaos vectors {C 1 ,C 2 ,. . . . . .,C n } by iteration of the logistic chaos mapping.
(3) To update the location of the fruit fly, the proposed mutation parameter mr is employed to partition the population into two groups of fruit flies: one of the groups updates their locations through the global searching Eq (17), whereas the other groups update their locations using the local searching Eq (20).
Step 4. Calculate the smell concentration value: from the above step, the location of each fruit fly is obtained, given a parameter X i that represents the distance of the j-th fruit fly to the food sources. The smell concentration value of each fruit fly is calculated by solving the following predefined objective function: Step 5. Determine the optimal fruit fly: this step determines the best fruit fly by selecting the maximum smell concentration value among the fruit fly swarm.
bestSemllIndex ¼ MaxðSmellÞ ð26Þ Step 6. Update the fruit fly swarm location: this step will keep the smell concentration value and update the fruit fly swarm location if the current obtained best smell concentration is superior to the previous location.
Step 7. Check termination conditions and repeat algorithm iterative procedure: (1) First, compare the current iterative times and the preset maximum iterative times; if the first one reaches the second one, then the termination condition is satisfied and the algorithm's procedure stops; otherwise, go to the following step.
(2) If the smell concentration value of the current iteration is no longer superior to the smell concentration value of the previous iteration and the current iterative times have reached a predefined value, then the algorithm's procedure stops; otherwise, go to step 8.
Step 8. Using the proposed mutation strategy to generate a new fruit fly swarm location: from the above step, we have learned that the best smell concentration index with its corresponding location was not changed from the previous generation; to avoid the iterative procedure falling into the trap of a local optimum and to explore a more feasible global best optimal, the proposed mutation strategy has been introduced to address this case. The specific procedure of the mutation strategy is described as follows: In the above equation, rand() is a randomly generated value in the interval [0, 1], and t is employed as a variation gene in the range of dimension of the fruit fly. Most intelligent algorithms generally stop the procedure if the current obtained best smell concentration value is not superior to the previous best smell concentration value. However, the proposed mutation strategy continues to have opportunities to find the better solution by changing the fruit fly swarm location.

Testing in several examples
In this subsection, to evaluate and observe the search performance of the proposed CIFOA, this study uses several benchmark functions that are described in Table 1, and FOA, traditional PSO, time-varying particle swarm optimization algorithm (TVPSO), and improved fruit fly optimization algorithm (IFOA) [40] are used as competitors in this testing. To achieve the available and objective results of the proposed algorithm compared with the other algorithm in this testing, the population size and the maximal iteration number are set to 50 and 1000, respectively. For the parameter setting of traditional PSO, the inertia W is set to 1.0 and the acceleration coefficients C 1 and C 2 are set to 2.05 and 2.05, respectively [15,28]. For the parameter setting of the TVPSO, the lower inertia weight W min and the upper inertia 10. For j = 0 to d 11.
For i = 0 to d 46.
EndFor 48. EndIF 49. EndFor weight W max are set to 0.5 and 0.9, respectively. The initial C 1i ,C 1f ,C 2i ,C 2f are set to 2.5, 0.5, 0.5 and 2.5, respectively, and V max is set to 60% of the upper range of the parameter on each dimension.
In this testing, we implement the proposed CIFOA and other algorithms using C sharp (C#) language with the Visual Studio 2008 platform. To obtain an objective result, this study uses 12 well-known functions in this testing. The dimensions of these optimization functions are set to 30. The specific mathematical description is shown in Table 2, where lb and ub represent the lower and upper bounds of the solution X, respectively.
The global optimum of all test functions is equal to F(X Ã ) = 0. The lower and upper bounds for functions are set based on their known initial value ranges. For each iteration of the algorithm procedure, the range of values is applied for each parameter X ij . Moreover, we use the closeness criterion to evaluate the error between the searched solution of each algorithm and the final algorithm solution. The error is evaluated using the search space of a well-known function, which is defined as follows: In the above equation, X best is the global best solution of the algorithm obtained by each iteration of the algorithm procedure.  In this test, to evaluate the searching performance of five algorithms, the five most frequently used statistical measures, i.e., the best, worst, median, means objective function values, and standard deviation are used to measure the search ability of five algorithms. We use five algorithms to perform 50 independent runs for each optimization function and obtain the statistical results, which are averaged. Moreover, we also present the detailed convergence curves generated by the five algorithms for each optimization function. Table 3 shows that CIFOA achieves the best results in terms of most performance criteria in comparison with the other intelligent algorithms. In each independent test using the five algorithms, CIFOA almost achieved the optimal solution of each function and obtained significant results in terms of statistical measures. The traditional FOA obtained the worst results in most cases overall, and it could not find the global solution in several function tests. Comparing the CIFOA with the traditional FOA shows that traditional FOA using the proposed chaotic optimization technique in combination with the proposed mutation strategy can enhance the searching ability of both the global and local optimums.
Moreover, CIFOA showed good performance in several complex multimodal functions, while the other four algorithms achieved worse results in terms of the median and standard deviation. In addition, CIFOA was also shown to be an efficient and robust algorithm for solving continuous functions. Figs 2, 3, 4, 5, 6 and 7 shows the convergence curves generated by the five intelligent algorithms for solving different complex nonlinear continuous functions. To observe the convergence curves generated by five algorithms, we have found that the proposed algorithm always successfully reached the closest solution of the optimization function with a minimal number of iterations. In particular, when solving the more complicated mathematical equations, the proposed algorithm has better searching performance than the other intelligent algorithms. Thus, it can also be concluded that the mutation strategy and chaotic PSO can help the algorithm simultaneously search for the global and local optimums, whereas the mutation strategy also allows the algorithm procedure to jump out of the local extremum.
Furthermore, this testing compares the proposed algorithm CIFOA with other well-known algorithms. The testing results show that the proposed CIFOA is significantly better than other algorithms presented for solving the complex nonlinear continuous functions.

Experiments and applications
In this section, we applied the proposed CIFOA to optimize the SVM, aiming to enhance the classification performance of the SVM classifier for solving real-world classification problems. In this proposed method, namely, CIFOA-SVM, which simultaneously performs SVM model parameter setting turning (penalty parameter C and hyperplane parameters) and feature selection, attempts to achieve an optimal SVM model, which has better generalization ability and excellent effectiveness for real-world classification tasks. To evaluate the classification performance of the proposed methods, this study has constructed comparative experiments that were performed between the proposed CIFOA-SVM and other intelligent methods, including GA-SVM, PSO-SVM, FOA-SVM, and TVPSO-SVM. Moreover, two realworld problems have been introduced in this study: the medical diagnosis problem and the credit card problem.

Fitness function design
A well-designed fitness function can explore a more optimal global best solution and avoid falling into the trap of local optima. Although different performance criteria have been proposed to evaluate the classification performance of a SVM classifier, the most popular and Chaotic improved FOA and SVM for real-world problems frequently used of these performance criteria are sensitivity and specificity. To explain the effect of sensitivity and specificity in the performance metrics, we introduce the confusion matrix that is displayed in Table 4.  Chaotic improved FOA and SVM for real-world problems for the SVM classifier. Thus, the four performance criteria are understood to design a weighted objective function that simultaneously considers the trade-off between sensitivity and specificity, maximizing the true positive rate and minimizing (1 -the false positive rate), along with the number of selected features and support vectors. The detailed proposed fitness function is depicted as follows: Considering that any of these four components of the fitness function have different effects on the classification performance of the SVM classifier, we designed the fitness function using  Chaotic improved FOA and SVM for real-world problems multiple performance criteria to convert a single weighted criterion, i.e., the sensitivity, specificity, and the number of selected features and support vectors are converted into one by combining their weight values W sen ,W 1−spe ,W F ,W S . In the above equation, F i represents the value of the i-th feature mask, with a value of "1" meaning that it is selected as one part of the input feature space; otherwise, it is ignored during the training phase. N f refers to the number of total features.

Data representation
To implement the proposed CIFOA-SVM, the radial basis function (RBF) is employed as the kernel function of the SVM classifier because it can effectively address high-dimensional data, and only one parameter is required to be optimized. The selected feature subset and model parameter setting are generally represented by a binary string or other representation. In the GA, one of the most frequently used of these representations is the binary string, which can easily map the selection state of features to a feature mask. By contrast, in FOA, the sigmoid Chaotic improved FOA and SVM for real-world problems function and random strategy are used to convert the distance X into a feature mask ("1" means that the feature is selected as the input feature, and "0" means that the feature is ignored). The specific equation is described as follows: where rand() is a randomly generated value in the range [0, 1].

The proposed CIFOA-SVM framework
This section describes the detailed procedure of the proposed CIFOA-SVM framework. Based on the nature of the proposed framework, its procedure mainly consists of two stages. One of these stages is the algorithm computing processing layer, which is used to implement CIFOA operations, such as population initialization, fruit fly swarm location update, and maintaining the maximum smell concentration and its corresponding location. Another of these stages is the microcosmic computing layer, which is also called the control layer in this paper. This stage is mainly responsible for the calculation of the smell concentration value of each individual using SVM and the proposed evaluation function. The detailed basic procedure of the proposed framework is shown in Fig 8. To give a more detailed description of how an optimal SVM model is achieved, we provide a pseudo-code of the proposed CIFOA-SVM framework as follows:   Xaxis(t) = X min (t)+(X max (t)−X min (t))Árand(); 50. EndIf 51. EndFor

Parameter setting and datasets
In this section, we describe the parameter setting of various methods and the property of datasets in detail. To evaluate the classification performance of the proposed framework compared with that of the other methods in the classification tasks, GA, FOASVM, PSOFS, and TVPSOFS have been introduced to be used as the competitors in these experiments. GAFS means that the genetic algorithm (GA) is used to simultaneously turn the model parameter setting and select the feature subset. FOASVM means that the FOA is used to turn the parameter setting of the SVM classifier without selecting the feature subset. PSOFS means that the traditional PSO algorithm is used to adjust the model parameter setting and to select the feature subset simultaneously. TVPSOFS means that the traditional PSO uses a time-varying technique to dynamically adjust its inertia weight and coefficients based on iterative times. To achieve an objective comparison in various methods, the same population size and maximum iteration number are applied for all methods. For the parameter setting of the GAFS, the crossover probability rate and mutation probability rate are set to 0.75 and 0.15, respectively. The binary string is used to represent the individual, and the penalty parameter C and hyperparameters are represented by two binary strings, with each binary string being composed of 20 bit (2 20 ), which means that the searching precision of two parameters depends exclusively on the length of the binary string. The roulette wheel selection and elite selection strategy are used to generate new individuals that merge with the best individual into the new population of the next generation. For the parameter setting of FOA, the lower and upper bounds of the flight range are set to -10 and 10, respectively. The location of each fruit fly is limited to the range of [-10, 10]. For the parameter setting of the traditional PSO algorithm, the acceleration coefficients C 1 ,C 2 are set to 2.05 and 2.05 [15,18]. The inertia weight W is set to 0.729 [59]. The lower and upper bounds of particle velocity are set to 0 and 1, respectively, for the feature mask, and set to 60% of the range of the parameter on each dimension for model parameters.
For the parameter setting of the time-varying particle swarm optimization algorithm (TVPSO), the initial acceleration coefficients C 1i ,C 1f ,C 2i ,C 2f are set to 2.5, 0.5, 0.5, and 2.5, respectively. The initial inertia weights W min ,W max are set to 0.4 and 0.9, respectively. The lower and upper bounds of penalty parameter C are set to 0 and 1000, respectively. The lower and upper bounds of the hyperplane parameter are set to 0 and 10, respectively.
To achieve a feasible result and avoid overtraining, we introduce the k-fold crossover-validation technique in this experiment. The k-fold crossover-validation technique is the most popular and frequently used method in myriad classification experiments. The main idea behind this technique is to divide the original training dataset into several subsets; each subset is completely independent and is constituted by the same number of samples. In each run of the k-fold crossover-validation, each subset has the same chance to be used as the testing dataset, and the remaining k-1 subsets are used as the training dataset. In this study, we set k = 10 for k-fold crossover-validation, i.e., we divide the original training dataset into ten parts; and each part can be used as the testing dataset and the remaining 9 parts are used as the training dataset. Finally, the training subset, SVM model parameters, and selected feature subset are fed into the SVM to generate an SVM model; then, the testing subset is used with the SVM model to make a prediction, and the fitness value is calculated based on the obtained classification accuracy and other performance criteria. This study uses five methods to perform ten iterations of 10-fold crossover-validation on the various datasets, and the obtained results are averaged. To generate enough classification performance for comparison, this study repeatedly performs 10 iterations for the 10-fold crossover-validation procedure instead of a single repetition of the 10-fold crossover-validation procedure.

Datasets and data preprocessing
Many datasets contain several dissimilar properties. To avoid the feature value spanning over great numerical ranges and dominating the other features in smaller numerical ranges-and to address numerical other difficulties in the calculation-the data normalized method has been introduced to scale all the input variables of the datasets into the range [0, 1] or [-1, 1]. The main advantage of scaling variables is that it can help improve the classification performance of the SVM classifier and reduce the running time of training and prediction. The specific data normalized method is described as follows: In the above equation, V ij is the feature value of the j-th item of the i-th row recorded. The function min() is used to find the smallest feature value among all the features of the i-th row recorded, and the function max() is used to select the greatest feature value among all the features of the i-th row recorded. V 0 ij is the transformed feature whose value is restricted in the [0, 1] range.

The medical diagnosis problem
To investigate the classification performance of various methods in terms of solving the medical diagnosis problem, three datasets have been employed as the experimental dataset, all from the UCI machine learning database repository. These datasets include the Wisconsin Diagnostic Breast Cancer (WDBC-1995), Pima Indians Diabetes (PID), and the Parkinson's disease dataset (PDD). The WDBC-1995 contains 569 samples (212 malignant and 357 benign samples) represented by 30 continuous features [60]. This breast cancer database was established by Dr. William H. Wolberg from the University of Wisconsin Hospitals and donated in November 1995. Ten real-valued features are computed for each cell nucleus. The details of the 32 attributes of the (WBC-1995) dataset are presented in Table 5. The PID [61] consists of 758 samples (500 normal samples and 268 diabetes samples), represented by 8 attributes. The PID was constructed by collecting donative diabetes cases in which all patients were females at least 12 years old of Pima Indian heritage. The detailed attributes and categories of this dataset are shown in Table 6. The PDD [62] was created and collected by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver. This dataset contains 195 samples (147 non-disease samples and 48 disease samples) represented by 23 attributes. These attributes are various biomedical voice measurements for 23 patients with Parkinson's disease and 8 healthy individuals. The primary purpose of this dataset is to discriminate healthy people from those with Parkinson's disease. The detailed 23 attributes of the dataset are shown in Table 7.
There are several performance metrics to evaluate the classification performance of the proposed approach, including the classification accuracy, sensitivity, specificity, the number of selected features and support vectors, and the running time of the training and prediction procedure that have been employed to construct a comprehensive comparative experiment. In each run of various datasets of this experiment, we undertake the five methods and perform 10 iterations of 10-fold crossover-validations, and the obtained experimental results are averaged.
The detailed classification results of the five methods in terms of the average classification accuracy, sensitivity, specificity, and number of selected features and support vectors on the Chaotic improved FOA and SVM for real-world problems Wisconsin Diagnostic Breast Cancer (WDBC-1995) database are presented in Table 8. We first compare the classification accuracy, sensitivity, and specificity. As shown in this table, the proposed CIFOA-SVM has achieved average results of 98.21% classification accuracy, 99.51% sensitivity, and 96.11% specificity. These obtained statistic results show that the proposed CIFOA-SVM outperforms other methods in terms of classification accuracy and in the tradeoff between sensitivity and specificity. Moreover, this table also shows that not only did the proposed CIFOA-SVM achieve the best results in terms of classification accuracy but also the standard deviation produced by the CIFOA-SVM was also much smaller in terms of classification accuracy, sensitivity and specificity than that of other methods. This result indicates that the proposed CIFOA-SVM can obtain considerably more consistent and smooth prediction Chaotic improved FOA and SVM for real-world problems results than other methods for different fold runs of 10-fold crossover-validation. To compare the number of selected features of the optimal SVM model using various outcomes, the proposed CIFOA-SVM almost requires the least features to construct an optimal SVM model among the FOA-SVM, PSOFS, and TVPSO. Table 8 also shows that feature methods have better classification accuracy than FOASVM, which indicates that feature selection actively improves the accuracy of the SVM classifier for solving medical diagnosis problems. For the number of support vectors of the optimal SVM model constructed using a variety of methods, the best SVM model with the fewest support vectors was achieved by the proposed CIFOA-SVM. Moreover, the same results are shown in Tables 9 and 10. To investigate the effects of the number of selected features on the classification accuracy of the proposed framework compared with other methods in solving medical diagnosis problems, we perform one iteration of 10-fold cross-validation using the five methods on the WDBC-1995 dataset. The detailed results are displayed in Fig 10. This graph shows that the feature selection methods, including GAFS, CIFOA-SVM, PSOFS and TVPSOFS, can improve classification accuracy better than the ordinary parameter tuning FOASVM in different fold runs of 10-fold crossover-validation. The main reason for these results is that some medical data contain redundancies and useless features, including noise or irrelevant feature information, which might affect the quality and efficiency of adjusting parameters for the SVM classifier. Thus, simultaneous feature selection and parameter optimization can enhance the classification accuracy for classifying medical data. Furthermore, CIFOA-SVM almost requires fewer selected features to construct an optimal SVM model than the PSOFS and TVPSO at each fold run of 10-fold cross-validation for the (WBC-1995) dataset, which indicates that CIFOA-SVM filters out most of the irrelevant features from the SVM classifier, reducing its complexity and improving its classification performance.
To investigate the computational time of the five methods required to implement the training and prediction procedure, we recorded the five methods' running time for each fold run of 10-fold crossover-validation using the WDBC-1995 database. Fig 11 displays the computational time in seconds under the five methods to perform a one-fold run of the 10-fold crossover-validation. This figure shows that the proposed CIFOA-SVM requires an average of almost 150 s to implement the training of an optimal SVM model and to make a prediction in each fold for WDBC-1995. Compared with the proposed CIFOA-SVM and GAFS, the computational time was reduced by 4 seconds using CIFOA-SVM. Compared with the proposed CIFOA-SVM and PSOFS, the computational time in seconds was reduced by 43 seconds by CIFOA-SVM. Compared with the proposed CIFOA-SVM and TVPSO, the computational time was reduced by 9 seconds using CIFOA-SVM. Compared with the proposed FOASVM and the four other methods based on the feature selection technique, FOASVM achieves an optimal SVM model with the required minimal computational resources. Moreover, traditional PSO requires the most running time to implement an optimal SVM model. The above results show that our proposed CIFOA-SVM algorithm is an efficient framework among the feature selection methods for the medical data classification tasks. Fig 12 presents the convergence curves generated by the five methods for different numbers of iterations for the fold #3 run of the 10-fold crossover-validation using the WDBC-1995 Chaotic improved FOA and SVM for real-world problems database. These curves show that not only did the proposed CIFOA-SVM obtain better classification accuracy during the different numbers of iterations but also the convergence procedure of CIFOA-SVM determined the final optimal SVM model and reached the stopping criteria much more quickly than PSOFS, TVPSO, FOASVM, and GAFS. This result indicates that the proposed CIFOA-SVM has excellent searching ability for determining an optimal feature subset and a proper parameter setting for the SVM classifier.

Credit card problem
To evaluate the classification performance of the five methods for solving the credit card problem, we selected two datasets, both of which come from the UCI machine learning database repository. One of these datasets is the German Credit Data (GCD) [63], which contains 1000 records (300 bad credit risk records and 700 good credit records) represented by 20 numeric and non-numeric features. The information regarding these features is illustrated in Table 11.
To convert the original non-numeric features to numeric features that SVM recognizes, we adopt a real number to represent non-numeric values of the discrete feature. The GCD is used to evaluate credit card risks, which are either good or bad credit risks. The other dataset is the Australian Credit Approval (ACA) [64], which consists of 690 samples (307 positive samples  Chaotic improved FOA and SVM for real-world problems and 383 negative samples) represented by 14 attributes. The main purpose of ACA is to consider credit card applications.
To evaluate the classification performance of the proposed method in solving the credit card problem, experiments using the five methods for each fold run of the 10-fold crossovervalidation on the GCD and ACA datasets have been performed. All the results obtained, including classification accuracy, sensitivity, specificity, and number of selected features and support vectors, were averaged. Tables 12 and 13 display the detailed classification results achieved by the five methods for 10 runs of 10-fold crossover-validation. Table 13 shows that the proposed CIFOA-SVM has achieved average results of 81.70% classification accuracy, 95.03% sensitivity, 51.04% specificity, 16.1 selected features, and 504.3 support vectors. To compare the CIFOA-SVM with other methods in terms of the GCD, not only did the CIFOA-SVM obtain the best results with respect to most of the performance criteria but also the standard deviation generated by CIFOA-SVM is much smaller in terms of classification accuracy and the number of selected features than that of other methods. Table 12 presents the same phenomenon. Moreover, Fig 13 shows the classification accuracy achieved by the five methods for each fold run of the 10-fold crossover-validation on the GCD. This graph shows that the proposed CIFOA-SVM almost achieves better results than the other methods in terms of classification accuracy in each fold for the GCD datasets. The results achieved above by the five methods for the two datasets indicate that the proposed method has better classification performance in solving credit card problems.
Feature selection is one of the important effect factors for the classification accuracy of the SVM classifier. According to Table 12 and Fig 14, feature selection methods have better classification accuracy than the parameter adjustment methods, although FOA has better searching ability for the optimal SVM parameters among these parameter adjustment methods.  shows the number of selected features of the five methods for 10 runs of 10-fold crossover-validation for the GCD. As shown, the proposed CIFOA-SVM almost requires the least number of selected features to implement an optimal SVM model compared with the PSOFS and TVPSO at each fold run of 10-fold cross-validation for the GCD dataset.
To investigate the required computational overhead of the five methods to complete an optimal SVM model, we have recorded the computational time in seconds of using the five methods to implement the training and prediction procedure in the fold #4 run of the GCD dataset, and these results are presented in Fig 15. This graph shows that the proposed CIFOA-SVM requires an average of almost 800 seconds to implement the training and prediction procedure in the fold #4 run for the GCD dataset. Compared with CIFOA-SVM and GAFS on the GCD dataset, the required running time was reduced by 40 seconds by CIFOA-SVM. Compared with the PSOFS and CIFOA-SVM, the required running time was reduced by 215 seconds by CIFOA-SVM. Compared with CIFOA-SVM and TVPSOFS, the running time was reduced by 35 seconds by CIFOA-SVM. Moreover, the same phenomenon has been observed in Fig 15. PSOFS requires the most time consumption for implementing the entire optimization process.
To compare the searching capability of using various intelligent algorithms to determine the optimal parameter setting of the SVM model and to select a proper feature subset, this experiment recorded the best classification accuracy achieved by the five methods in each Chaotic improved FOA and SVM for real-world problems iteration for fold #4 of the GCD dataset. Fig 16 shows the convergence curve generated by the five methods with increasing iterative times for the GCD dataset. This curve shows that did the proposed CIFOA-SVM not only almost achieve the best classification accuracy of all the methods for different iterations but also determined the superior solution and met the termination criteria far more rapidly.

Conclusion and future work
In this work, feature selection and parameter estimation for the SVM are transformed into a complex multidimensional optimization problem. To solve this problem and obtain an optimal SVM model, this study proposed an improved FOA based on the chaotic PSO-in combination with the mutation strategy. In the proposed method, the proposed improved algorithm, CIFOA, has been successfully applied to determine the optimal parameter setting of the SVM classifier and to provide a more appropriate feature subset. To prevent the searching procedure from becoming trapped in a local optimum and to have an efficient classifier with better global searching ability, a mutation strategy was proposed to maintain population diversity. In this study, we first perform several groups of tests to evaluate the searching ability of the proposed CIFOA in solving the complex nonlinear continuous functions. The empirical results show  Chaotic improved FOA and SVM for real-world problems that CIFOA not only achieves a significant result with respect to parameter estimation of the optimization function but also has a faster convergence rate. Finally, to evaluate the effectiveness of parameter estimation of the proposed improved algorithm and the classification performance of the proposed intelligent framework, CIFOA-SVM, we performed several groups of experiments using various well-known methods for solving the credit card problem and the medical diagnosis problem. The experimental results reveal that the proposed intelligent framework is a powerful tool for parameter optimization and feature selection for SVM.
There are several notable directions for our future work. First, it would be interesting to combine the proposed intelligent framework with a different classifier-such as a Naïve Bayes and an Artificial Neural Network-to solve classification problems in wider areas. Second, we would like to extend the proposed intelligent framework to solve multi-class problems in the real world. Finally, it would be fruitful to employ heterogeneous evolution algorithms with the swarm optimization technique to construct several groups of experiments for classification tasks.