A hybrid differential evolution based on gaining-sharing knowledge algorithm and Harris hawks optimization

Differential evolution (DE) is favored by scholars for its simplicity and efficiency, but its ability to balance exploration and exploitation needs to be enhanced. In this paper, a hybrid differential evolution with the gaining-sharing knowledge algorithm (GSK) and Harris hawks optimization (HHO) is proposed, abbreviated as DEGH. Its main contributions are as follows. First, a hybrid mutation operator is constructed in DEGH, in which the two-phase strategy of GSK, the classical mutation operator "rand/1" of DE and the soft besiege rule of HHO are used and improved, forming a double-insurance mechanism for the balance between exploration and exploitation. Second, a novel crossover probability self-adaption strategy is proposed to strengthen the internal relation among the mutation, crossover and selection operations of DE. On this basis, the crossover probability and scaling factor jointly affect the evolution of each individual, enabling the proposed algorithm to better adapt to various optimization problems. In addition, DEGH is compared with eight state-of-the-art DE algorithms on 32 benchmark functions. Experimental results show that the proposed DEGH algorithm is significantly superior to the compared algorithms.

Since its inception, differential evolution (DE) has become one of the most commonly used meta-heuristic algorithms for solving optimization problems [12]. Many scholars have improved DE and applied it in diverse fields, such as clinical medicine [13], text classification [14], optics [15], energy [16] and neural networks [17]. Studies improving DE can be divided into two broad categories: 1) changes to DE components, which enhance the performance of the original DE by improving the mutation, crossover and selection operations and adjusting control parameters; 2) hybridizing DE with other meta-heuristic algorithms to improve performance by combining their respective advantages. At each generation, the evolution of individuals in differential evolution mainly goes through three stages: mutation, crossover and selection. These stages are the critical targets for the improvement of DE components, among which the mutation operation is the most important. Zhang and Sanderson [18] proposed the famous "DE/current-to-pbest/1" mutation operator in their adaptive DE algorithm (JADE), which improved the mutation by using the top 100p% individuals and an external archive containing suboptimal individuals. Wang et al. [19] presented a self-adaptive differential evolution algorithm with an improved mutation mode (IMMSADE), which ameliorates the classic mutation operator "DE/rand/1" by attaching a benchmark factor to the base vector. Zheng et al. [20] proposed a collective information-powered DE (CIPDE), whose mutation operator contains a collective individual formed as a linear combination of the m individuals with the best fitness values. Mohamed et al.
[21] proposed two enhanced DE variants (EBDE and EDE), in which three different individuals are ranked to participate in mutation, the difference being that the former's individuals were randomly selected from the top p individuals and from the entire population, while the latter's three individuals were all randomly chosen from the population. Li et al. [22] presented an improved differential evolution algorithm with dual mutation strategies collaboration (DMCDE), which applied improved DE/rand/2 and DE/best/2 operators based on an elite guidance mechanism. Ghosh et al. [23] proposed a switched-parameter DE, in which each individual randomly selects the binomial crossover operator or the BLX-α-β crossover operator. Tian et al. [24] presented a DE with improved individual-based parameter setting and selection strategy (IDEI), which developed a diversity selection strategy based on a newly defined weighted fitness value. Cheng et al. [25] proposed an improved DE with a fitness- and diversity-ranking-based mutation operator (FDDE), which judges the contribution of the individuals participating in the "DE/rand/1" mutation strategy to population diversity according to their fitness values, and rearranges the positions of the three random individuals based on the ranking information of individual diversity and fitness values.
The control parameters of DE include the population size NP, the scaling factor F and the crossover probability CR, which form the other direction of improvement of DE components. Tanabe and Fukunaga [26] proposed a success-history based parameter adaptation for DE (SHADE): by establishing the historical memories M_CR and M_F, values of CR and F that performed well in the past are preserved, and new parameter pairs are sampled from them. Shortly afterwards, Tanabe and Fukunaga [27] presented an enhanced version that added a population size reduction rule to SHADE (LSHADE); at the end of each generation, the population size of the next generation is reduced by a linear function. Poláková et al. [28] described a new mechanism of population size adaptation for DE, which evaluates the current population diversity based on Euclidean distance and adjusts NP according to the evaluation results. Meng et al. [29] put forward a DE variant with novel control parameter adaptation (PaDE), which includes a grouping strategy for adjusting F and CR and a parabolic reduction rule for changing NP. Li et al. [30] proposed an enhanced adaptive DE algorithm (EJADE), which introduced a crossover probability sorting mechanism and a dynamic population reduction strategy based on JADE. Wang et al. [31] proposed a self-adaptive ensemble-based DE (SAEDE), which sets the control parameters of each generation through self-adaptive and ensemble mechanisms, reducing the need for user settings. Xue and Chen [32] introduced an adaptive compact DE (ACDE), in which F and CR obeyed the Cauchy distribution and

Preliminaries
This section describes the basic principles of differential evolution (DE), the gaining-sharing knowledge algorithm (GSK) and Harris hawks optimization (HHO).

Differential evolution
The framework structure of DE mainly includes four stages: initialization, mutation, crossover and selection, among which the last three stages are the cyclic evolution process based on the population.
2.1.1 Initialization. For a minimization problem min f(X), the population P_g in DE can be defined as:

P_g = {X_{1,g}, X_{2,g}, ..., X_{NP,g}},  X_{i,g} = (x^1_{i,g}, ..., x^D_{i,g}),  g = 0, 1, ..., G

where g and G denote the current and the maximum generation number, NP is the population size, and D represents the dimension of the problem. x_min and x_max are the lower and upper boundaries of the solution space, respectively. The original population P_0 is generated by random initialization in the solution space, and then the following cyclic evolution process is performed.
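As an illustration, the random initialization can be sketched in a few lines of NumPy (function and parameter names are ours, not from the paper):

```python
import numpy as np

def initialize_population(pop_size, dim, x_min, x_max, seed=None):
    """Draw NP individuals uniformly inside the box [x_min, x_max]^dim."""
    rng = np.random.default_rng(seed)
    # Each row is one individual X_i,0 with D components.
    return x_min + rng.random((pop_size, dim)) * (x_max - x_min)
```

For example, `initialize_population(100, 30, -100.0, 100.0)` produces a 100-by-30 population inside [-100, 100]^30.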

Mutation. At generation g, a mutation individual V_{i,g+1} is generated for each individual X_{i,g}, commonly by the "DE/rand/1" operator:

V_{i,g+1} = X_{r1,g} + F · (X_{r2,g} − X_{r3,g})

where r1, r2 and r3 are mutually different random indices, none equal to i, and F is the scaling factor.

Crossover.
By means of binomial crossover, components are extracted from the target individual X_{i,g} and the mutation individual V_{i,g+1} to form the trial individual U_{i,g+1}:

u^j_{i,g+1} = v^j_{i,g+1} if rand_j ≤ CR or j = j_rand; otherwise x^j_{i,g}

where rand_j is a real random number in [0,1] and j_rand is a random integer in [1,D]. The crossover probability CR determines the amount of replication from the mutation individual V_{i,g+1}.

Selection.
After evaluating the fitness of the target individual and the trial individual, the winner goes on to the next generation:

X_{i,g+1} = U_{i,g+1} if f(U_{i,g+1}) ≤ f(X_{i,g}); otherwise X_{i,g}
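Putting the four stages together, classic DE can be sketched as follows (a minimal illustration with names of our choosing, not the paper's implementation):

```python
import numpy as np

def de_rand_1(f, lb, ub, pop_size=30, dim=5, F=0.5, CR=0.9, G=200, seed=0):
    """Classic DE: rand/1 mutation, binomial crossover, greedy selection."""
    rng = np.random.default_rng(seed)
    pop = lb + rng.random((pop_size, dim)) * (ub - lb)        # initialization
    fit = np.array([f(x) for x in pop])
    for _ in range(G):
        for i in range(pop_size):
            # three mutually distinct individuals, all different from i
            candidates = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])             # mutation
            j_rand = rng.integers(dim)                        # forces >= 1 mutant gene
            mask = rng.random(dim) < CR
            mask[j_rand] = True
            u = np.clip(np.where(mask, v, pop[i]), lb, ub)    # binomial crossover
            fu = f(u)
            if fu <= fit[i]:                                  # greedy selection
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

On a 5-dimensional Sphere function, for instance, `de_rand_1(lambda x: float((x ** 2).sum()), -5.0, 5.0)` converges close to the global minimum at the origin.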

Gaining-sharing knowledge algorithm
The gaining-sharing knowledge optimization algorithm (GSK) [11] is a nature-inspired algorithm that mimics the process of gaining and sharing knowledge throughout human life, including the junior gaining-sharing phase and the senior gaining-sharing phase. In GSK, D_junior dimensions are randomly selected from each individual to adopt the junior scheme, and the remaining D_senior = D − D_junior dimensions use the senior scheme. D is the dimension of the problem, and D_junior is determined by the following formula:

D_junior = D · (1 − g/G)^k

where the knowledge rate k is a constant, and g and G represent the current and the maximum generation number.
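The junior-dimension count from the GSK paper, D_junior = D · (1 − g/G)^k, shrinks from D toward 0 as the run progresses. A one-line sketch:

```python
def d_junior(dim, g, G, k=10):
    """D_junior = D * (1 - g/G)^k: all dimensions are 'junior' at g = 0,
    none by the final generation g = G."""
    return dim * (1 - g / G) ** k
```

So early generations rely mostly on the junior scheme and later generations on the senior scheme.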

Junior gaining-sharing phase.
In this phase, all individuals are arranged in ascending order according to fitness values: X_{best,g}, ..., X_{i−1,g}, X_{i,g}, X_{i+1,g}, ..., X_{worst,g}. When the knowledge ratio k_r > rand_j (a random number in [0,1]), the jth dimension of each individual remains unchanged. Otherwise, it is updated as follows:

x^j_{i,g+1} = x^j_{i,g} + k_f · (x^j_{i−1,g} − x^j_{i+1,g} + x^j_{r,g} − x^j_{i,g}) if f(X_{i,g}) > f(X_{r,g})
x^j_{i,g+1} = x^j_{i,g} + k_f · (x^j_{i−1,g} − x^j_{i+1,g} + x^j_{i,g} − x^j_{r,g}) otherwise

where the knowledge factor k_f is a real number greater than zero. x^j_{i,g} and x^j_{i,g+1} represent the jth dimension of X_i at the current and the next generation, respectively. x^j_{i−1,g}, x^j_{i+1,g} and x^j_{r,g} are the jth dimensional components of individuals X_{i−1,g}, X_{i+1,g} and X_{r,g}, respectively, and f(X_{i,g}) and f(X_{r,g}) denote the fitness values of X_{i,g} and X_{r,g}, respectively.
2.2.2 Senior gaining-sharing phase. At this stage, after sorting by fitness values, all individuals are divided into three groups: best people {X_{pb,g}}, middle people {X_{m,g}} and worst people {X_{pw,g}}, numbering 100p%, NP − 2·(100p%) and 100p% of the population, respectively. Similarly, the jth dimension of each individual remains unchanged when k_r > rand_j; otherwise it is updated as follows:

x^j_{i,g+1} = x^j_{i,g} + k_f · (x^j_{rpb,g} − x^j_{rpw,g} + x^j_{rm,g} − x^j_{i,g}) if f(X_{i,g}) > f(X_{rm,g})
x^j_{i,g+1} = x^j_{i,g} + k_f · (x^j_{rpb,g} − x^j_{rpw,g} + x^j_{i,g} − x^j_{rm,g}) otherwise

where x^j_{rpb,g}, x^j_{rpw,g} and x^j_{rm,g} represent the jth dimension of individuals X_{rpb,g}, X_{rpw,g} and X_{rm,g}, which are randomly selected from the groups {X_{pb,g}}, {X_{pw,g}} and {X_{m,g}}, respectively.
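The two update rules can be sketched as plain functions (names are ours; they work on scalars per dimension or on NumPy arrays for whole individuals, with the auxiliary individuals supplied by the caller):

```python
def gsk_junior_update(x_i, x_prev, x_next, x_r, f_i, f_r, k_f=0.5):
    """Junior phase: gain from the nearest neighbors x_prev/x_next and
    share with a random individual x_r; direction depends on who is fitter."""
    if f_i > f_r:                       # x_r is better: move toward it
        return x_i + k_f * (x_prev - x_next + x_r - x_i)
    return x_i + k_f * (x_prev - x_next + x_i - x_r)

def gsk_senior_update(x_i, x_rpb, x_rpw, x_rm, f_i, f_rm, k_f=0.5):
    """Senior phase: gain from the best/worst groups (x_rpb, x_rpw) and
    share with a middle-group individual x_rm."""
    if f_i > f_rm:                      # the middle individual is better
        return x_i + k_f * (x_rpb - x_rpw + x_rm - x_i)
    return x_i + k_f * (x_rpb - x_rpw + x_i - x_rm)
```

In both rules the difference of the two "gaining" individuals supplies the search direction, while the "sharing" term pulls toward (or pushes away from) the comparison individual.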

Harris hawks optimization
Harris hawks optimization (HHO) is a novel swarm-based algorithm proposed by Heidari et al. [10], which imitates the cooperative behavior and chase pattern of Harris hawks in the process of hunting. In HHO, there are three primary phases: exploration, transition from exploration to exploitation, and exploitation.

Exploration phase.
At this phase, the hawks use the following two strategies to find prey:

X_{i,g+1} = X_{rand,g} − r1 · |X_{rand,g} − 2 · r2 · X_{i,g}|,  q ≥ 0.5
X_{i,g+1} = (X_{rabbit,g} − X_{mean,g}) − r3 · (LB + r4 · (UB − LB)),  q < 0.5    (8)

where X_{mean,g} and X_{i,g} denote the mean and current location vectors of the Harris hawks at the current generation g, and X_{rand,g} and X_{rabbit,g} are the positions of a randomly selected hawk and the prey. X_{i,g+1} indicates the location vector of the hawk at the next generation g+1. r1, r2, r3, r4 and q are real random numbers in [0,1], and UB and LB are the upper and lower bounds, respectively.
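A sketch of Eq (8), with `rng` a NumPy random generator and all names illustrative:

```python
import numpy as np

def hho_exploration(pop, i, x_rabbit, lb, ub, rng):
    """One exploration move: perch on a random hawk (q >= 0.5) or
    relative to the rabbit and the mean position (q < 0.5)."""
    r1, r2, r3, r4, q = rng.random(5)
    if q >= 0.5:
        x_rand = pop[rng.integers(len(pop))]      # a randomly selected hawk
        return x_rand - r1 * np.abs(x_rand - 2.0 * r2 * pop[i])
    x_mean = pop.mean(axis=0)                     # average hawk position
    return (x_rabbit - x_mean) - r3 * (lb + r4 * (ub - lb))
```

Both branches scatter the hawks widely across the search space, which is what makes this the exploration phase.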

Transition from exploration to exploitation.
Through the rabbit's escaping energy E, the HHO algorithm realizes the transition from exploration to exploitation. The escaping energy E is formulated as:

E = 2 · E0 · (1 − g/G)

where g and G indicate the current and the maximum generation number, and E0 is the initial energy, a random number in (−1,1).
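The escaping energy from the original HHO, E = 2 · E0 · (1 − g/G), decays linearly in magnitude, so early generations (|E| ≥ 1) explore and later ones exploit:

```python
def escaping_energy(e0, g, G):
    """E = 2 * E0 * (1 - g/G); |E| shrinks linearly from 2*|E0| to 0."""
    return 2.0 * e0 * (1.0 - g / G)
```

With e0 drawn fresh in (−1, 1) at each generation, |E| is at most 2 at the start of the run and reaches 0 at g = G.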

Exploitation phase.
According to the escaping energy E and the successful escaping chance r of the prey, diverse exploitative behaviors are adopted: soft besiege, hard besiege, soft besiege with progressive rapid dives and hard besiege with progressive rapid dives. The successful escaping chance r is a real random number in [0,1].
• Soft besiege (r ≥ 0.5 and |E| ≥ 0.5). The prey still has enough energy, so the Harris hawks encircle it softly:

X_{i,g+1} = ΔX_{i,g} − E · |J · X_{rabbit,g} − X_{i,g}|,  ΔX_{i,g} = X_{rabbit,g} − X_{i,g}

where J = 2 · (1 − r5) indicates the random jump intensity of the prey, and r5 is a real random number in [0,1].
• Hard besiege (r ≥ 0.5 and |E| < 0.5). The Harris hawks harshly encircle the prey, and their positions are updated as follows:

X_{i,g+1} = X_{rabbit,g} − E · |ΔX_{i,g}|

where ΔX_{i,g} is the difference between the positions of the rabbit and the current hawk, which can be seen in Eq (12).
• Soft besiege with progressive rapid dives (r < 0.5 and |E| ≥ 0.5). The prey still has enough energy to escape, and the Harris hawks respond as follows:

Y = X_{rabbit,g} − E · |J · X_{rabbit,g} − X_{i,g}|
Z = Y + S × LF(D)
X_{i,g+1} = Y if f(Y) < f(X_{i,g}); Z if f(Z) < f(X_{i,g})

where f(Y) and f(Z) represent the fitness values of Y and Z, respectively, S is a random vector of size 1×D, D denotes the dimension of the problem, and LF(D) is the Lévy flight that can be obtained through the following formula.
LF = 0.01 × (u · σ) / |v|^(1/β),  σ = [Γ(1+β) · sin(πβ/2) / (Γ((1+β)/2) · β · 2^((β−1)/2))]^(1/β)

where u and v are random numbers in [0,1], and β is a constant set to 1.5.
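A sketch of the Lévy-flight step; note that the common Mantegna implementation (used here) draws u and v from normal distributions rather than [0,1], an assumption on our part:

```python
import math
import random

def levy_flight(dim, beta=1.5, rnd=random):
    """One D-dimensional Levy-flight step via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    step = []
    for _ in range(dim):
        u = rnd.gauss(0.0, sigma)   # numerator sample, scaled by sigma
        v = rnd.gauss(0.0, 1.0)     # denominator sample
        step.append(0.01 * u / abs(v) ** (1 / beta))
    return step
```

The heavy-tailed ratio yields mostly small moves with occasional long jumps, which is what makes the rapid dives effective for escaping local optima.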
• Hard besiege with progressive rapid dives (r < 0.5 and |E| < 0.5). In contrast to the previous behavior, the rabbit's escaping energy is insufficient, and the behavior of the Harris hawks is modelled as follows:

Y = X_{rabbit,g} − E · |J · X_{rabbit,g} − X_{mean,g}|
Z = Y + S × LF(D)
X_{i,g+1} = Y if f(Y) < f(X_{i,g}); Z if f(Z) < f(X_{i,g})

where X_{mean,g} is the average position calculated by Eq (9).

The proposed algorithm
This section gives a detailed introduction to the proposed algorithm, including its motivation, hybrid mutation operator and crossover probability self-adaption.

Motivations
According to the above introduction, changes to DE components and hybridization with other meta-heuristic algorithms can both improve the performance of DE. As for the GSK algorithm, its two-phase model has been shown to balance exploration and exploitation effectively [11]. On this basis, the mutation strategy "DE/rand/1" with global exploration ability and HHO's soft besiege strategy with local exploitation ability are also considered. By applying these four strategies in the mutation operation, a double-insurance mechanism for balancing exploration and exploitation is formed. Besides, for most DE variants, the operations of mutation, crossover and selection are relatively independent. In DEGH, these operations are linked together by the control parameters F, CR and a binary variable h that records the historical evolution state, making the connection within the whole DE framework tighter.

Hybrid mutation operator
In order to achieve a better balance between exploration and exploitation, DEGH adopts a double-insurance mechanism in the mutation operation, which contains four mutation strategies. First, the strategies of the junior phase (Eq (6)) and senior phase (Eq (7)) of GSK are introduced and streamlined, which helps maintain a sufficient balance between global exploration and local exploitation capabilities in the search process [44]. The two strategies are abbreviated as GSK/J-mutation and GSK/S-mutation. Second, in order to further strengthen this balance, DE's classic mutation strategy "DE/rand/1" and the soft besiege of the exploitation phase of HHO are added to the hybrid mutation operator, called "DE/rand/1-mutation" and "HHO/SB-mutation", respectively. Thus, GSK/J-mutation and GSK/S-mutation, combined with DE/rand/1-mutation and HHO/SB-mutation, form a hybrid mutation operator that acts as a double-insurance mechanism for balancing global exploration and local exploitation capabilities.
Before the mutation operation, all individuals are sorted according to fitness values to form a new population P_g = {X_{best,g}, X_{2,g}, ..., X_{NP−1,g}, X_{worst,g}}, which is grouped into best people {X_{pb,g}}, middle people {X_{m,g}} and worst people {X_{pw,g}}, as shown in Fig 1. The population sorting and grouping strategy of DEGH is the same as that of GSK. On this basis, two random numbers R1_{i,g} and R2_{i,g}, together with the control parameters F and CR_{i,g}, determine the mutation strategy adopted by each individual. Among them, R1_{i,g}, R2_{i,g} and CR_{i,g} are implemented at the individual level.

GSK/J-mutation.
When R1_{i,g} ≥ F and R2_{i,g} < CR_{i,g}, the strategy of the junior phase (Eq (6)) of GSK is improved: the scaling factor F is substituted for the knowledge factor k_f, and the mutation individual V_{i,g+1} is generated as follows:

V_{i,g+1} = X_{i,g} + F · (X_{i−1,g} − X_{i+1,g} + X_{r,g} − X_{i,g}) if f(X_{i,g}) > f(X_{r,g})
V_{i,g+1} = X_{i,g} + F · (X_{i−1,g} − X_{i+1,g} + X_{i,g} − X_{r,g}) otherwise

where X_{i−1,g} and X_{i+1,g} are the nearest better and worse individuals of the target individual X_{i,g}. If X_{i,g} is X_{best,g}, X_{i−1,g} and X_{i+1,g} are X_{2,g} and X_{3,g}; if X_{i,g} is X_{worst,g}, they are X_{NP−2,g} and X_{NP−1,g}. X_{r,g} denotes a randomly selected individual in the new population P_g.

GSK/S-mutation.
When R1_{i,g} < F and R2_{i,g} ≥ CR_{i,g}, similarly, the strategy of the senior phase of GSK (Eq (7)) is also changed, and the mutation individual V_{i,g+1} is generated in the following mode:

V_{i,g+1} = X_{i,g} + F · (X_{rpb,g} − X_{rpw,g} + X_{rm,g} − X_{i,g}) if f(X_{i,g}) > f(X_{rm,g})
V_{i,g+1} = X_{i,g} + F · (X_{rpb,g} − X_{rpw,g} + X_{i,g} − X_{rm,g}) otherwise

where X_{rpb,g}, X_{rpw,g} and X_{rm,g} are randomly chosen individuals from the best people {X_{pb,g}}, worst people {X_{pw,g}} and middle people {X_{m,g}}, respectively.

DE/rand/1-mutation.
When R1_{i,g} ≥ F and R2_{i,g} ≥ CR_{i,g}, the mutation individual V_{i,g+1} is produced by the classic mutation operator of DE in Eq (2), which is famous for its strong global search capability.

HHO/SB-mutation.
When R1_{i,g} < F and R2_{i,g} < CR_{i,g}, according to an enhanced version of the soft besiege rule of the exploitation phase in HHO, the mutation individual V_{i,g+1} is obtained as follows:

V_{i,g+1} = (X_{best,g} − X_{i,g}) − E · |J · X_{best,g} − X_{i,g}|

where, in the role of the prey of Eq (10), the best individual X_{best,g} of the current population is used.
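The four-way dispatch on R1_{i,g}, R2_{i,g}, F and CR_{i,g} described above can be summarized as a small function (the flag values 1-4 match the labelling used for the self-adaption bookkeeping):

```python
def choose_strategy(r1, r2, F, cr):
    """Return the flag of the mutation strategy DEGH applies to one individual,
    given its two random numbers r1, r2 and the parameters F and CR."""
    if r1 >= F and r2 < cr:
        return 1   # GSK/J-mutation
    if r1 < F and r2 >= cr:
        return 2   # GSK/S-mutation
    if r1 >= F and r2 >= cr:
        return 3   # DE/rand/1-mutation
    return 4       # HHO/SB-mutation (r1 < F and r2 < cr)
```

Because r1 and r2 are redrawn per individual, F and CR_{i,g} jointly control how often each strategy fires rather than selecting one strategy for the whole population.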

Crossover probability self-adaption
As shown in the mutation operation above, the crossover probability CR affects the selection of the mutation operator adopted by each individual. In order to make the internal phases of DE more closely linked, the adjustment of CR is associated with the mutation and selection operations.
At each generation of DEGH, the usage frequencies of GSK/J-mutation, GSK/S-mutation, DE/rand/1-mutation and HHO/SB-mutation are counted and represented as anum, bnum, cnum and dnum, respectively. At the same time, the mutation strategy adopted by each individual is labelled with flag: individuals with GSK/J-mutation have flag = 1; individuals with GSK/S-mutation have flag = 2; individuals with DE/rand/1-mutation have flag = 3; individuals with HHO/SB-mutation have flag = 4. Besides, in the selection operation of DEGH, a binary variable h recording the evolutionary status of the trial individual is introduced and participates in the adjustment of CR. If the trial individual fails to evolve, h_{i,g+1} is set to 0 and CR is assigned a random number in [0,1]. On the contrary, h_{i,g+1} is set to 1 and the adaptive adjustment of CR is as follows.
CR_{i,g+1} = anum/NP if flag_i = 1; bnum/NP if flag_i = 2; cnum/NP if flag_i = 3; dnum/NP if flag_i = 4

where flag_i records the mutation strategy applied by individual X_{i,g} and NP is the population size.
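This bookkeeping can be sketched as follows, under our reading of the rule above: a successful individual's CR becomes the usage frequency of its own strategy, while a failed one restarts with a random CR (names are illustrative):

```python
import random

def adapt_cr(flags, evolved, rnd=random):
    """flags[i] in {1,2,3,4} is the strategy individual i used this generation;
    evolved[i] is h_i (True if the trial individual won the selection)."""
    NP = len(flags)
    counts = {s: flags.count(s) for s in (1, 2, 3, 4)}   # anum, bnum, cnum, dnum
    new_cr = []
    for flag, h in zip(flags, evolved):
        if h:                          # h = 1: tie CR to the strategy's frequency
            new_cr.append(counts[flag] / NP)
        else:                          # h = 0: restart with a random CR in [0, 1]
            new_cr.append(rnd.random())
    return new_cr
```

In this way, strategies that are currently popular and successful raise the CR of the individuals using them, which in turn changes how the dispatch rule routes those individuals in the next generation.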

Pseudocode of the proposed algorithm
Based on the above description, pseudo-code of the proposed DEGH algorithm is reported in Fig 2, where the hybrid mutation operator is shown in lines 11-27 and the crossover probability self-adaptation strategy is used in lines 28-34.

Computational complexity
The computational complexity of DEGH depends on the following aspects: initialization, sorting, evaluation, mutation, crossover and selection. Compared with the original DE, DEGH only adds the cost of sorting. The computational complexity of the original DE is O(NP·D·G), and sorting the population adds O(NP·log NP) per generation, i.e., O(NP·log NP·G) in total; since log NP is small compared with D in typical settings, the overall computational complexity of DEGH remains O(NP·D·G), the same as the original DE.

Experimental setting
In the following experiments, to ensure a fair comparison, the common parameters of all algorithms are set the same: the maximum generation number G is set to 1000, the population size NP is set to 100, and 30 independent runs are conducted. Other parameter settings of each algorithm are shown in Table 2.

Parameter study
In this section, the sensitivity of the population size NP and the scaling factor F and the efficiency of the crossover probability are studied through relevant experiments, with results recorded in Table 3.
As can be seen from Fig 3, the performance of DEGH improves as NP increases, and DEGH performs best at NP = 250. From the data listed in Table 3 (which defines the benchmark functions, including Sphere, Exponential, Step, Katsuura and Schaffer's F6), there is no significant difference in the performance of DEGH with different NP values; that is, DEGH is not sensitive to

population size NP. Without loss of generality, the population size NP is set to 100 in the following experiments.

Sensitivity analysis of the scaling factor.
In DEGH, the scaling factor F plays a vital role in the mutation operation. By varying F within [0.1, 0.9] in steps of 0.1, a series of experiments is conducted to analyze the sensitivity of the scaling factor. Three non-parametric statistical tests are used to analyze the optimization results of the 30-dimensional problems with different F values, which are recorded in Fig 4 and Table 4, respectively.
From Fig 4, it is clear that the performance of DEGH is best at F = 0.3. From Table 4, it can be seen that DEGH is insensitive to F except at F = 0.1. Therefore, F = 0.3 can be considered a suitable value for subsequent experiments.

Efficiency analysis of the crossover probability.
In order to investigate the effectiveness of crossover probability self-adaptation strategy in DEGH, the efficiency of crossover probability is analyzed by setting CR = 0.2,0.4,0.8, rand and compared with the proposed DEGH, where rand represents a random real number inside [0,1]. The results of the non-parametric statistical tests of these DEGHs are shown in Fig 5 and Table 5.
From Fig 5, it is evident that the proposed DEGH is the best and DEGH with CR = rand is the second best. It can be concluded from Table 5 that, except for DEGH with CR = 0.2, there is no significant performance difference between DEGH and its variants. In other words, the crossover probability self-adaption is effective, but DEGH is not highly sensitive to the crossover probability.

Comparison with eight state-of-the-art DE variants
The comparison results are summarized in Table 9, where the symbol "+/−/≈" indicates that the performance of DEGH is "better than/worse than/similar to" that of the compared algorithm.
At D = 30, and likewise at D = 50 as seen from Table 7, DEGH obtains the best results on most functions. Furthermore, three non-parametric statistical tests are used to analyze these optimization results. The Friedman and Kruskal-Wallis test results drawn in Fig 6 show that DEGH is the best in all dimensions. The Wilcoxon's rank-sum test results in Table 10 show that all positive rank sums R+ obtained are far larger than the negative rank sums R−, no matter the dimension or the compared algorithm. Moreover, whether the significance level is set to 0.05 or 0.1, all p-values obtained are far smaller. In other words, Wilcoxon's rank-sum test also confirms that DEGH is significantly superior to the other compared algorithms.

Convergence properties.
The convergence properties can be summarized into the following four types, which are depicted in Fig 7.
i. The first type is shown in Fig 7(A). In this type, DEGH does not show apparent advantages at the beginning of evolution, but it is the first to converge to the global minimum.
ii. The convergence attributes of f17~f21, f23, f24, f28 and f30 form one class, as shown in Fig 7(B). In this type, DEGH shows absolute advantages from the beginning, with the steepest slope, and quickly converges to the global minimum, while the other algorithms evolve slowly or stall.
iii. In Fig 7(C), the convergence curve of f25 is plotted, which is similar to those of f14, f15, f26 and f29. On these functions, all algorithms are at varying degrees of evolutionary stagnation or slow evolution.
iv. The evolutionary trend of f13, f31 and f32 is similar, as shown in Fig 7(D). Here, some algorithms fall into evolutionary stagnation, but DEGH continues to evolve downward.

Discussion on results
The above experiments demonstrate the remarkable superiority of the proposed DEGH. The reasons for its outstanding performance are summarized as follows. (1) DEGH, based on the GSK and HHO algorithms, is an improvement of and hybridization within the DE framework. On the one hand, the GSK/J-mutation and GSK/S-mutation operators strike a good balance between global exploration and local exploitation. On the other hand, DE/rand/1-mutation and HHO/SB-mutation are another powerful guarantee of the balance between exploration and exploitation. These two aspects cooperate with each other, forming the dual-safeguard mechanism for the balance between exploration and exploitation. (2) The crossover probability self-adaption strategy of DEGH strengthens the internal connection between the mutation, crossover and selection stages, and makes the whole framework more harmonious. On this basis, the crossover probability and scaling factor dynamically adjust the evolution strategy of each individual, making the proposed algorithm more suitable for various problems.

Conclusions
This paper proposes a hybrid differential evolution algorithm based on the gaining-sharing knowledge algorithm and Harris hawks optimization (DEGH), which can achieve excellent performance even with a fixed scaling factor. Through a series of experiments, the effectiveness and sensitivity of the DEGH parameters are investigated. The performance of DEGH is evaluated by comparison with eight state-of-the-art DE variants, namely IMMSADE [19], CIPDE [20], EBDE [21], EDE [21], EJADE [30], LSHADE-SPACMA [35], DEPSO [37] and ATLDE [39], on 32 benchmark functions at D = 30 and 100. Experimental results show that: 1) DEGH is not sensitive to the population size NP or the scaling factor F; 2) the crossover probability self-adaption strategy is effective; and 3) DEGH is significantly superior to the compared algorithms.

As an extension of this research, the following aspects are future research directions: 1) a binary version of DEGH and its application in flight sequencing systems; 2) applying DEGH to the optimization of neural network parameters and, further, to flight trajectory prediction; 3) hybridizing DE with other emerging meta-heuristic algorithms.