Ranking-based hierarchical random mutation in differential evolution

In order to improve the performance of differential evolution (DE), this paper proposes a ranking-based hierarchical random mutation in differential evolution (abbreviated as RHRMDE), in which two improvements are presented. First, RHRMDE introduces a hierarchical random mutation mechanism that applies the classic “DE/rand/1” and a variant of it to the non-inferior and inferior groups, respectively, as determined by fitness value. The non-inferior group employs the traditional mutation operator “DE/rand/1” with global and random characteristics, which increases the global exploration ability and population diversity. The inferior group uses an improved variant of “DE/rand/1” with elite and random characteristics, which enhances the local exploitation ability and convergence speed. Second, the control parameter adaptation of RHRMDE considers not only the complexity differences of various problems but also the differences among individuals. The proposed RHRMDE is compared with five DE variants and five non-DE algorithms on 32 universal benchmark functions, and the results show that RHRMDE is superior to the compared algorithms.


Introduction
In the field of problem optimization, metaheuristic algorithms have attracted extensive attention and research because they rely on fast computer simulation rather than complicated mathematical derivation, and they have consequently developed rapidly. In addition to the classic Genetic Algorithm (GA) [1], Simulated Annealing (SA) [2], Particle Swarm Optimization (PSO) [3], Differential Evolution (DE) [4], Artificial Bee Colony (ABC) [5] and Firefly Algorithm (FA) [6], many new metaheuristic algorithms continue to emerge, such as the Grey Wolf Optimizer (GWO) [7], Whale Optimization Algorithm (WOA) [8], Monkey King Evolution (MKE) [9], Marine Predators Algorithm (MPA) [10], Gaining Sharing Knowledge based Algorithm (GSK) [11] and Equilibrium Optimizer (EO) [12]. Differential evolution (DE) is a population-based branch of metaheuristic optimization, which performs a parallel search and extracts information about the search space from the differences among individuals in the population. Because of its high efficiency, simplicity and ease of use, DE has been successfully applied in many fields [13][14][15][16][17][18]. However, like other metaheuristic algorithms, DE is prone to premature convergence and to falling into local optima when solving complex optimization problems [19,20]. To overcome these shortcomings and improve the convergence speed and stability of DE, many scholars have carried out extensive research, most of it based on control parameters and mutation strategies [21]. First, many researchers focused on optimizing the DE control parameters, namely the population size NP, the scaling factor F and the crossover probability CR. Liu and Lampinen [22] proposed a fuzzy adaptive differential evolution (FADE), in which the individuals that successfully entered the next generation and their fitness values were used as inputs to a fuzzy logic controller that adjusts F and CR.
Noman [23] proposed an adaptive differential evolution algorithm (aDE), which compared the fitness of the offspring individual with the average fitness value of the parent population. If the former was smaller, F and CR remained unchanged; otherwise, they were randomly selected from the intervals [0.1, 1.0] and [0.0, 1.0], respectively. Ghosh [24] proposed an improved differential evolution algorithm with fitness-based adaptation of the control parameters (FiADE), in which the adjustment of F and CR was based on the objective function values of individuals in the population. Sarker [25] proposed a DE with dynamic parameters selection algorithm (DE-DPS), in which NP, F and CR were dynamically selected from corresponding candidate sets according to their historical performance in an adaptive loop. Tanabe and Fukunaga [26] proposed success-history based adaptive differential evolution (SHADE), which saved the control parameters of successful evolutions in the historical memories M_F and M_CR to guide the selection of future control parameter values. Based on SHADE [26], Viktorin [27] adjusted F and CR according to the Euclidean distance between the trial individual and the original individual. Draa [28] proposed a sinusoidal differential evolution (SinDE), in which the adjustment of F and CR was based on a new sinusoidal framework.
Second, some scholars paid much attention to improving DE's mutation strategy. Qin et al. [29] proposed a self-adaptive DE (SaDE), which adopted four mutation strategies to generate trial vectors; the selection among these strategies was driven by their observed performance. Zhang and Sanderson [30] proposed an adaptive differential evolution with optional external archive (JADE), which introduced a new mutation strategy "DE/current-to-pbest/1". In contrast to the classic mutation strategy "DE/current-to-best/1", the best individual was replaced by an individual randomly selected from the top 100P% of the population. Gong and Cai [31] proposed an improved adaptive differential evolution with a crossover rate repairing technique and ranking-based mutation, called Rcr-IJADE, which was an enhanced version of JADE [30]. Epitropakis [32] proposed a proximity-based mutation scheme for DE/rand/1. Mallipeddi [33] proposed a differential evolution algorithm with an ensemble of parameters and mutation strategies (EPSDE), in which, during the entire evolution process, different mutation strategy pools coexisted with value pools for each control parameter and competed with each other to produce offspring. Fan and Yan [34] presented an adaptive differential evolution algorithm with discrete mutation control parameters (DMPSADE), which assigned mutation strategies to individuals by roulette wheel selection. Tang [35] proposed a differential evolution with an individual-dependent mechanism (IDE), which contained four mutation operators used in different stages. Wu [36] proposed a differential evolution with a multi-population based ensemble of mutation strategies (MPEDE), which dynamically partitioned the population into three indicator subpopulations and one reward subpopulation; each indicator subpopulation had its own mutation strategy, and the reward subpopulation was assigned to the mutation strategy that had performed best so far.
Duan [37] proposed a differential evolution algorithm with dual preferred learning mutation (DPLDE), which selected the random mutation vector based on the fitness value and the Euclidean distance between individuals, and then adjusted the mutation strategy according to the randomly selected individual, the optimal individual and the target individual. Mohamed AW and Mohamed AK [38] proposed an adaptive guided differential evolution algorithm with a novel mutation for numerical optimization (AGDE), in which individuals were randomly selected from the best 100P% and the worst 100P% of the population and their difference vector was used as the perturbation in the mutation operation; the base vector of the mutation operator was randomly selected from the remaining [NP−2(100P%)] individuals. Wei [39] proposed a random perturbation modified differential evolution algorithm (PRMDE), which contained a new "DE/M_pBest−best/1" mutation operator; the population mutation operation was implemented using either this proposed operator or the classic mutation operator "DE/best/1". Wang [40] proposed a self-adaptive mutation differential evolution algorithm based on particle swarm optimization (DEPSO), in which a selection probability determined whether an individual applied the "DE/e−rand/1" mutation strategy or the PSO mutation strategy.
In order to overcome the premature convergence problem of DE and improve its performance, a differential evolution with ranking-based hierarchical random mutation (RHRMDE) is proposed in this paper. Firstly, a hierarchical random mutation mechanism is designed. The upper layer of the model corresponds to a random global mutation strategy, which is applied to the superior (NP−NWP) individuals in the population sorted by fitness value, where NP and NWP represent the population size and the inferior group size, respectively. The lower layer of the model adopts a random elite mutation strategy, in which individuals randomly selected from the superior NWP individuals guide the evolution of the NWP inferior individuals. Secondly, in the adjustment of the control parameters, the complexity differences of diverse problems and the differences among individuals are considered, as well as the evolutionary status of individuals in the previous generation. In addition, the sensitivity of the inferior group size NWP is analyzed in this paper. Furthermore, the performance of the proposed RHRMDE is verified on 32 benchmark functions by comparison with five DE variants and five non-DE algorithms, and the experimental results demonstrate that RHRMDE outperforms the compared algorithms.
The rest of this paper is structured as follows. Section 2 briefly reviews the classic DE algorithm. Section 3 introduces the details of the proposed RHRMDE algorithm. The experimental results, compared with five DE variants and five non-DE algorithms, are reported and analyzed in Section 4. Section 5 draws the conclusions of this work and offers future research directions.

The classic differential evolution algorithm
In DE, the population consists of NP individuals, each of which has D dimensions. After initialization, the entire population generates the corresponding offspring through looped mutation, crossover and selection steps until certain termination conditions are met.

Initialization
The jth component of the ith individual in the initial population can be expressed as:

x_{i,j} = x_{i,j}^L + rand_j · (x_{i,j}^U − x_{i,j}^L)

where x_{i,j}^U and x_{i,j}^L are the upper and lower bounds of the jth dimension of the ith individual, and rand_j returns a random number uniformly distributed between 0 and 1; the subscript j indicates that a new random number is generated for each component of the individual.
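For illustration, this initialization step can be sketched in Python as follows (a minimal sketch; the bounds and population size are example values, not taken from the paper):

```python
import numpy as np

def initialize_population(pop_size, lower, upper, rng):
    """x_{i,j} = x^L_j + rand_j * (x^U_j - x^L_j), with a fresh random
    number drawn for every component of every individual."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return lower + rng.random((pop_size, lower.size)) * (upper - lower)

rng = np.random.default_rng(0)
pop = initialize_population(5, [-5.0, -5.0], [5.0, 5.0], rng)
print(pop.shape)  # (5, 2)
```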

Mutation
After initialization, for each individual X_i^G, a mutation individual V_i^{G+1} is generated through the mutation operation. The classic mutation operator "DE/rand/1" can be expressed as:

V_i^{G+1} = X_{r1}^G + F · (X_{r2}^G − X_{r3}^G)

where F is a scaling factor, and r1, r2, r3 are indices randomly chosen within [1, NP] that differ from each other and from the current index i, that is, r1 ≠ r2 ≠ r3 ≠ i.
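As a sketch, the "DE/rand/1" operator can be written as follows (function and variable names are illustrative):

```python
import numpy as np

def de_rand_1(pop, i, F, rng):
    """DE/rand/1: V = X_r1 + F * (X_r2 - X_r3), with r1, r2, r3 mutually
    distinct and different from the current index i."""
    candidates = [k for k in range(pop.shape[0]) if k != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

rng = np.random.default_rng(1)
pop = rng.random((10, 3))
v = de_rand_1(pop, i=0, F=0.5, rng=rng)
print(v.shape)  # (3,)
```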

Crossover
Next, for the target individual X_i^G and its associated mutation individual V_i^{G+1}, a trial individual U_i^{G+1} is produced by binomial crossover:

u_{i,j}^{G+1} = v_{i,j}^{G+1}, if rand_j ≤ CR or j = j_rand; otherwise u_{i,j}^{G+1} = x_{i,j}^G

where CR ∈ [0,1] determines the contribution of the mutation individual to the trial individual, and j_rand denotes a random integer in the range [1, D], which guarantees that at least one component (the j_rand-th) of the trial individual comes from the mutation individual.
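A sketch of binomial crossover (illustrative names; CR = 0.0 is used in the example only to demonstrate the j_rand guarantee):

```python
import numpy as np

def binomial_crossover(x, v, CR, rng):
    """Binomial crossover: inherit v_j when rand_j <= CR or j == j_rand,
    otherwise keep x_j; j_rand guarantees at least one mutant component."""
    D = x.size
    j_rand = rng.integers(D)
    mask = rng.random(D) <= CR
    mask[j_rand] = True
    return np.where(mask, v, x)

rng = np.random.default_rng(4)
x = np.zeros(8)
v = np.ones(8)
u = binomial_crossover(x, v, CR=0.0, rng=rng)
print(int(u.sum()))  # 1: exactly the j_rand-th component comes from v
```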

Selection
After that, the selection step is executed: between the trial individual and the target individual, the one with the smaller fitness value is accepted. In other words, if f(U_i^{G+1}) ≤ f(X_i^G), the trial individual U_i^{G+1} becomes the offspring individual X_i^{G+1}; otherwise, the target individual X_i^G is retained.
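Putting the four steps together, one generation of the classic DE on a minimization problem can be sketched as follows (the sphere function serves as an example objective; all names are illustrative):

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def de_generation(pop, fitness, F, CR, rng):
    """One generation of classic DE: mutation, crossover, greedy selection."""
    np_size, D = pop.shape
    for i in range(np_size):
        # mutation: DE/rand/1
        cands = [k for k in range(np_size) if k != i]
        r1, r2, r3 = rng.choice(cands, size=3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])
        # binomial crossover
        j_rand = rng.integers(D)
        mask = rng.random(D) <= CR
        mask[j_rand] = True
        u = np.where(mask, v, pop[i])
        # selection: keep the better of trial and target
        fu = sphere(u)
        if fu <= fitness[i]:
            pop[i], fitness[i] = u, fu
    return pop, fitness

rng = np.random.default_rng(2)
pop = rng.uniform(-5, 5, (20, 5))
fit = np.array([sphere(x) for x in pop])
best_start = fit.min()
for _ in range(100):
    pop, fit = de_generation(pop, fit, F=0.5, CR=0.9, rng=rng)
print(fit.min() <= best_start)  # True: greedy selection never loses the best
```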

3.1 Hierarchical mutation mechanism
In RHRMDE, all individuals in the current population are sorted from best to worst according to their fitness values, resulting in the ranked population P^G = {X_best^G, X_2^G, ..., X_NP^G}, where X_best^G is the individual with the best (minimum) fitness value and X_worst^G = X_NP^G is the individual with the worst (maximum) fitness value. The ranked population P^G is then divided into two groups: the non-inferior group composed of the former (NP−NWP) individuals and the inferior group composed of the latter NWP individuals. On this basis, a two-layer random mutation mechanism is adopted.
3.1.1 The random global mutation strategy. Due to its random characteristics, the classic "DE/rand/1" has a strong global search capability and is applied to the non-inferior group at the individual level:

V_i^{G+1} = X_{r1}^G + F_i · (X_{r2}^G − X_{r3}^G)

where i ∈ {1,2,...,(NP−NWP)}, and r1, r2, r3 differ from each other and are drawn from the set {1,2,...,NP}. However, precisely because of its global and random properties, the classic mutation operator does not make full use of the information contained in superior individuals, which is beneficial for population evolution and convergence [30,31,[38][39][40][41]. Therefore, an improved version of the classic mutation strategy is proposed and applied at the "elite" level.
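The ranking-and-grouping step that precedes the two mutation strategies can be sketched as follows (assuming minimization, so ascending fitness order puts the best individual first; the NWP value here is an example):

```python
import numpy as np

def rank_and_split(pop, fitness, nwp):
    """Sort best-first by fitness (minimization) and split off the worst
    NWP individuals as the inferior group."""
    order = np.argsort(fitness)            # ascending fitness = best first
    pop, fitness = pop[order], fitness[order]
    non_inferior = pop[: len(pop) - nwp]   # former NP - NWP individuals
    inferior = pop[len(pop) - nwp :]       # latter NWP individuals
    return pop, fitness, non_inferior, inferior

rng = np.random.default_rng(3)
pop = rng.random((10, 4))
fit = rng.random(10)
pop, fit, non_inf, inf_ = rank_and_split(pop, fit, nwp=1)
print(non_inf.shape[0], inf_.shape[0])  # 9 1
```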

3.1.2 The random elite mutation strategy.
In this strategy, elite individuals guide the evolution of the inferior group. Meanwhile, for the purpose of accelerating convergence, a weighting factor W_i is introduced into the mutation operator. X_{rp1}^G, X_{rp2}^G and X_{rp3}^G are elite individuals randomly selected from the former NWP individuals {X_best^G, X_2^G, ..., X_NWP^G}; that is to say, rp1, rp2, rp3 are drawn from the set {1,2,...,NWP} and rp1 ≠ rp2 ≠ rp3. F_i and W_i are discussed in more detail in Section 3.2.
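Since the exact operator equation is not reproduced above, the following is only a hedged sketch: it assumes the weighting factor W_i scales the elite difference vector in a DE/rand/1-style form, V = X_rp1 + W_i·F_i·(X_rp2 − X_rp3), which is an assumption rather than the paper's verified formula.

```python
import numpy as np

def elite_mutation(ranked_pop, nwp, F_i, W_i, rng):
    """Random elite mutation sketch: base and difference vectors come from
    the top-NWP (elite) individuals of the best-first ranked population.
    The form V = X_rp1 + W_i * F_i * (X_rp2 - X_rp3) is an assumption."""
    rp1, rp2, rp3 = rng.choice(nwp, size=3, replace=False)  # needs nwp >= 3
    return ranked_pop[rp1] + W_i * F_i * (ranked_pop[rp2] - ranked_pop[rp3])

rng = np.random.default_rng(5)
ranked_pop = rng.random((10, 4))  # assume already sorted best-first
v = elite_mutation(ranked_pop, nwp=3, F_i=0.5, W_i=0.8, rng=rng)
print(v.shape)  # (4,)
```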

3.2 Control parameter adaptation
In the classic DE, the control parameters are the same for all individuals; later, many researchers applied them at the individual level. Unlike most studies, the adaptation of the control parameters (F, CR, W) in RHRMDE takes into account both the complexity differences of various problems and the differences among individuals.

3.2.1 The scaling factor F adaptation.
Because it controls the size of the search area around the base individual, the scaling factor F is the parameter used to balance global and local search in RHRMDE. The scaling factor F is adjusted with the following purposes. Firstly, since the elite individuals in RHRMDE guide the evolution direction of the entire population, if a better individual has more global-search generations in the solution space, it is more likely to find a better solution, thereby reducing the probability of the population falling into local optima; therefore, the index i, representing individual differences, is included in the adjustment of F. Secondly, a simple optimization problem may take only a few dozen evolutionary generations to solve, whereas a complex optimization problem may still be far from the correct solution after hundreds of generations yet have already converged to a local optimum. Therefore, a factor related to the maximum generation number should be considered when calculating F, so that the adaptation curves of F under different maximum generation numbers do not differ too much. Thirdly, using the evolutionary state of the previous generation to intervene in the adjustment of F helps to escape local optima. Accordingly, if the evolution of the ith individual in the previous generation fails, the F_i of the ith individual in the current ranked population is assigned a random value.
where G and G_max denote the current and the maximum generation number, respectively. bs_i^{G−1} is a binary variable that records whether the ith individual evolved successfully in the previous generation: if the evolution was successful, bs_i^{G−1} = 1; otherwise, bs_i^{G−1} = 0. rand_i denotes a uniform random variable in the interval [0,1]. The random setting of F_i improves the exploration ability by increasing population diversity.
Without considering the evolutionary status (setting all bs_i^{G−1} = 1), the maximum generation number is set to 100 and 1000, respectively. Taking the 1st, 10th and 20th individuals as examples, the dynamic adaptation curves of the scaling factor F are shown in Fig 1. As F approaches zero, the search is conducted over a minimal area around the base vector; conversely, the larger F is, the larger the search area around the base vector. Comparing Fig 1A-1C with Fig 1D-1F in the horizontal direction, it can be concluded that, with the same maximum generation number, the better the individual (the smaller the index number), the larger the generation number at which F approaches zero; in other words, the better individual has more global search opportunities. By longitudinal comparison of Fig 1A-1C and Fig 1D-1F, it is clear that, although the maximum generation numbers differ, the dynamic adaptation curves of the scaling factors of individuals with the same index are similar; in other words, for optimization problems of different complexity, the ratios of global to local search are similar. These observations are precisely in line with our design concepts for F.
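The exact adaptation formula is given by the paper's equation, which is not reproduced above; the sketch below merely illustrates the three described properties (dependence on the rank index i, scaling by G/G_max, and random reset on failed evolution) with an assumed decay shape, not the paper's formula:

```python
import numpy as np

def scaling_factor(i, G, G_max, bs_prev, rng, F_max=0.9):
    """Illustrative F_i schedule (an assumed form, NOT the paper's exact
    formula): F decays from F_max toward 0 over the run, more slowly for
    better-ranked individuals (smaller i), and is re-randomized when the
    individual failed to evolve in the previous generation (bs_prev == 0)."""
    if bs_prev == 0:
        return rng.random()  # random restart on failed evolution
    progress = G / G_max     # same curve shape regardless of G_max
    # better individuals (small i) keep a large F longer -> more global search
    return F_max * (1.0 - progress) ** (1.0 + i / 10.0)

rng = np.random.default_rng(6)
f_good = scaling_factor(1, 800, 1000, 1, rng)
f_bad = scaling_factor(20, 800, 1000, 1, rng)
print(f_good > f_bad)  # True: better-ranked individuals keep a larger F longer
```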

3.2.2 The crossover probability CR adaptation. The crossover probability CR indicates the probability of inheriting elements from the mutation individual V_i in the process of constructing the trial individual U_i. Generally speaking, a superior individual is closer to the global optimum than an inferior one [35]. With a smaller CR, the offspring individual can obtain more useful information by inheriting more elements from a superior parent individual. Conversely, a larger CR can generate a more promising offspring individual by accepting more elements from the mutation individual when the parent individual is far away from the optimum. Based on this analysis, CR is calculated as follows: where CR_min is the minimum CR, set to 0.1 in this paper, i is the ranked index of X_i^G, NP denotes the population size, and rand_i denotes a random number uniformly distributed in the interval [0,1].
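As with F, the paper's exact CR formula is not reproduced above; the sketch below is an assumed rank-based form that matches the described behavior (CR near CR_min for superior individuals, larger for inferior ones):

```python
import numpy as np

def crossover_probability(i, NP, rng, CR_min=0.1):
    """Illustrative rank-based CR (an assumed form, not the paper's exact
    formula): superior individuals (small rank i) stay near CR_min to keep
    parent genes; inferior individuals accept more mutant components."""
    return CR_min + (i / NP) * rng.random() * (1.0 - CR_min)

rng = np.random.default_rng(7)
cr_best = crossover_probability(1, 100, rng)
cr_worst = crossover_probability(100, 100, rng)
print(0.1 <= cr_best <= 1.0 and 0.1 <= cr_worst <= 1.0)  # True
```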

3.2.3 The weighting factor W adaptation. In RHRMDE, to gain more local search ability and accelerate convergence, the random elite strategy is equipped with a weighting factor, which is calculated by: where f_min, f_max and f_i are the fitness values of the best, the worst and the ith individual in the ranked population. In particular, when f_max is equal to f_min, W_i is set to a uniformly distributed random variable within [0,1] to increase diversity. As the generation number increases, the weighting factor W_i decreases, so that the random elite strategy gains a stronger local optimization ability. The pseudo-code of RHRMDE is given in Fig 2.
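The paper's exact W formula is likewise not reproduced above; the following sketch only illustrates the described behavior (scaling by relative fitness, shrinking with generations, and randomization when f_max = f_min) with an assumed form:

```python
import numpy as np

def weighting_factor(f_i, f_min, f_max, G, G_max, rng):
    """Illustrative W_i (an assumed form, not the paper's exact formula):
    scaled by relative fitness and shrinking as generations deepen; set to
    a uniform random value when f_max == f_min, as the paper specifies."""
    if f_max == f_min:
        return rng.random()  # converged population: randomize for diversity
    relative = (f_i - f_min) / (f_max - f_min)
    return relative * (1.0 - G / G_max)

rng = np.random.default_rng(8)
early = weighting_factor(0.5, 0.0, 1.0, 100, 1000, rng)
late = weighting_factor(0.5, 0.0, 1.0, 900, 1000, rng)
print(early > late)  # True: W_i decreases as the generation count grows
```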

Benchmark test functions
To test the efficiency of RHRMDE, 32 benchmark test functions from the literature [42][43][44] are utilized. All the test functions are treated as black-box problems and tested at D = 30 and D = 100; the search domain and the global minimum f(·) of each benchmark can be seen in Table 1.

These test functions have various properties, such as unimodality, multimodality, separability, non-separability, shifted, rotated, scalable and non-scalable [21]. Among these 32 benchmarks, f_1−f_12 are unimodal functions, f_13 is the step function, f_14 is the noisy quartic function, and f_15−f_32 are multimodal functions.

Computational complexity
The computational complexity of RHRMDE is mainly composed of the following parts: initialization O(NP·D), fitness evaluation O(NP·G_max), mutation O(NP·D·G_max), crossover O(NP·D·G_max) and selection O(NP·G_max), where NP is the population size, D denotes the dimension of the specific problem, and G_max indicates the maximum generation number. In addition, the computational complexity of the fitness ranking and individual position updating before each generation of RHRMDE is O(NP·G_max)+O(NP·D·G_max). Therefore, RHRMDE has a computational complexity of O(NP·(3·D·G_max+3·G_max+D)), which is simply O(NP·D·G_max).

Sensitivity analysis to the inferior group size NWP
The impact of the inferior group size NWP on the performance of RHRMDE is investigated by setting NWP = λ·NP, with candidate values of λ of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. The Friedman test and Wilcoxon's rank-sum test [45] are conducted, and the results are reported in Fig 3 and Table 2, respectively. From Fig 3, it can be observed that the performance of RHRMDE is best at λ = 0.1. From Table 2, it can be seen that RHRMDE is not sensitive to NWP. Based on these statistical results, λ = 0.1 is considered the most appropriate value; therefore, NWP = 0.1·NP is employed in the following series of experiments.

Parameter settings and involved algorithms
The performance of RHRMDE is compared with five DE-based algorithms, namely SaDE [29], Rcr-IJADE [31], MPEDE [36], DPLDE [37] and DEPSO [40], and five non-DE-based algorithms, namely GWO [7], WOA [8], MKE_v3 [9], MPA [10] and EO [12]. For a fair comparison, the maximum generation number G_max = 1000 is used as the stopping criterion for all algorithms, the population size is set to NP = 100, and each function is run independently 30 times. The other parameters of the comparative algorithms are consistent with their original literature.

Convergence accuracy analysis.
With the same test function, the average function error value f(X_best)−f(X*) represents the convergence accuracy of the algorithm, where X_best is the best solution found by the algorithm in each run and X* is the true global optimal solution of the test function. Moreover, the Wilcoxon signed-rank test at the 0.05 significance level, the Wilcoxon's rank-sum test, the Friedman test and the Kruskal-Wallis test are conducted on the experimental results to obtain reliable statistical conclusions.
The detailed results at D = 30 and D = 100 are reported in Tables 3 and 4. According to Table 5, whether D equals 30 or 100, RHRMDE obtains higher R+ values than R− values against all ten competitors. As can be seen from the computed p-values, RHRMDE shows a significant improvement in terms of the different criteria over SaDE, Rcr-IJADE, MPEDE, DPLDE, DEPSO, GWO, WOA, MKE_v3, MPA and EO at the significance level α = 0.05. Thus, it can be concluded that RHRMDE is significantly better than its competitors.
As can be seen from Fig 5, for both D = 30 and D = 100, the Friedman test and the Kruskal-Wallis test show that RHRMDE is the best and MKE_v3 is the worst. In addition, EO performs second best under the Friedman test, while DPLDE performs second best under the Kruskal-Wallis test. In Table 5, R+ is the rank sum of the first algorithm over the second algorithm, and R− is the rank sum of the second algorithm over the first; the larger R+ is, the better the first algorithm. The sign "Yes" indicates that RHRMDE performs significantly better than its competitor.

Stability analysis
Due to the stochastic nature of these metaheuristic algorithms, the stability of the proposed algorithm has been analyzed by solution error results over 30 independent runs at D = 30. Box plots of solution error results of all compared algorithms on two unimodal and four multimodal functions are shown in Fig 6. It can be seen that RHRMDE has more concentrated solution error values and fewer outliers compared with other algorithms. Therefore, it can be concluded that RHRMDE is superior to other competitors in terms of stability.

Convergence speed analysis
In order to compare the convergence speed intuitively, the convergence curves of the mean function error values of all compared algorithms on two unimodal and four multimodal functions at D = 30 are plotted in Fig 7. The differences in performance are pronounced in these convergence curves. The steep slope of the RHRMDE curves indicates that RHRMDE converges rapidly and rarely stalls before finding the global optimal solution. Due to the introduction of the random global mutation strategy and the control parameter adaptation, population diversity is preserved in the early generations; therefore, although the convergence speed of RHRMDE is not the fastest at the beginning, it produces the best results in the end. From the perspective of both steepness and final solution quality, it can be concluded that RHRMDE converges much faster than SaDE, Rcr-IJADE, MPEDE, DPLDE, DEPSO, GWO, WOA, MKE_v3, MPA and EO.

Efficiency analysis of control parameter adaptation
An efficiency analysis of the control parameter adaptation in RHRMDE is carried out; the efficiency of the hierarchical mutation strategy has already been verified by the sensitivity analysis of the inferior group size NWP described in Section 4.3. Several variants of RHRMDE are constructed for this purpose. The performance of these RHRMDE variants is compared in terms of the Friedman test and Wilcoxon's rank-sum test, the results of which can be seen in Fig 8 and Table 6, respectively. From Fig 8, it can be observed that the proposed RHRMDE, RHRMDE-6 and RHRMDE-5 are the best, second best and third best, respectively. From Table 6, the proposed RHRMDE outperforms the RHRMDE variants with fixed F and W and with a smaller crossover probability, except for RHRMDE-5 with a larger CR and RHRMDE-6 with a random CR. These experimental comparisons prove that the control parameter adaptation of RHRMDE is effective and that its combined effect is the best. It is worth noting that the contribution of the adaptation of the scaling factor and the weighting factor is larger than that of the crossover probability.

Discussion on the comparison results
A series of experiments and comparisons has verified the superior performance of RHRMDE. The reasons are summarized as follows. (1) In the hierarchical mutation mechanism, the random global mutation strategy is conducive to extensive searching of the solution space and to maintaining population diversity, while the random elite mutation strategy makes full use of randomly selected elite individuals to guide the evolution of inferior individuals, thereby positively affecting the evolution direction of the entire population and accelerating convergence. (2) The scaling factor adaptation considers the complexity differences of various problems and the differences among individuals, and adjusts in combination with the historical evolutionary state; therefore, the proposed algorithm has different global exploration and local exploitation capabilities for each individual and problem, and can better adapt to various problems. (3) The crossover probability adaptation also takes individual differences into account: when an individual is superior, its crossover probability tends to be smaller, so there is a greater likelihood of keeping the useful information of the superior individual. (4) Under the joint influence of fitness values and generations, the weighting factor adaptation improves the local search ability of the proposed algorithm and accelerates its convergence.

Conclusions
A new DE variant, RHRMDE, is proposed in this paper. In RHRMDE, a novel hierarchical mutation mechanism based on the non-inferior group and the inferior group is designed to maintain population diversity and accelerate convergence. At the same time, the control parameter adaptation incorporates historical evolutionary status and individual difference information, considers the complexity differences of various problems, and combines with the hierarchical mutation mechanism to further improve the algorithm's balance between exploration and exploitation and to speed up convergence. In addition, the performance of RHRMDE is verified by comparison with five advanced DE variants, namely SaDE [29], Rcr-IJADE [31], MPEDE [36], DPLDE [37] and DEPSO [40], and five metaheuristic algorithms, namely GWO [7], WOA [8], MKE_v3 [9], MPA [10] and EO [12], on 32 universal benchmark functions. The experimental results show that: (1) RHRMDE is not sensitive to the inferior group size. (2) In terms of convergence accuracy, stability and convergence speed, RHRMDE performs best on all the unimodal functions and most of the multimodal functions; for the step function, the noisy quartic function and some multimodal functions, RHRMDE is not the best, which is in line with the "no free lunch" theorems for optimization [46]. (3) The adaptive adjustment of the control parameters F, CR and W of RHRMDE is effective, and their combined effect is the best. (4) RHRMDE has the best performance among all the compared algorithms.
As a continuation of this research, binary and multi-objective versions of RHRMDE can be developed. RHRMDE can also be applied to practical engineering problems, such as airspace planning and flight sequencing.