Reinforcement learning for solution updating in Artificial Bee Colony

In the Artificial Bee Colony (ABC) algorithm, the employed bee and onlooker bee phases update candidate solutions by changing the value in a single dimension, a procedure dubbed the one-dimension update process. For problems with a very high number of dimensions, the one-dimension update process can degrade both solution quality and convergence speed. This paper proposes a new algorithm, called R-ABC, which uses reinforcement learning for solution updating in the ABC algorithm. After an employed bee updates a solution, the new solution results in positive or negative reinforcement applied to the solution dimensions in the onlooker bee phase. Positive reinforcement is given when the candidate solution from the employed bee phase provides a better fitness value. The more often a dimension provides a better fitness value when changed, the larger its update becomes in the onlooker bee phase. Conversely, negative reinforcement is given when the candidate solution does not provide a better fitness value. The performance of the proposed algorithm is assessed on eight basic numerical benchmark functions in four categories with 100, 500, 700, and 900 dimensions, seven CEC2005’s shifted functions with 100, 500, 700, and 900 dimensions, and six CEC2014’s hybrid functions with 100 dimensions. The results show that the proposed algorithm provides solutions which are significantly better than all other algorithms for all tested dimensions on basic benchmark functions. On the CEC2005’s shifted functions, the number of solutions provided by the R-ABC algorithm which are significantly better than those of other algorithms increases as the number of dimensions increases. The R-ABC algorithm is at least comparable to the state-of-the-art ABC variants on the CEC2014’s hybrid functions.


Introduction
The Artificial Bee Colony (ABC) algorithm [1] is a meta-heuristic optimization algorithm based on Swarm Intelligence. A swarm system comprises simple agents which communicate with other agents and their environment. By targeting the same goal, agents complete the swarm's task without any control unit. In the Artificial Bee Colony algorithm, the agents' goal is to find the best food source.
Food sources represent a set of feasible solutions in a multidimensional search space, and each agent simulates a bee. A solution is composed of optimization parameters. The number of dimensions specifies the number of optimization parameters. For example, a 10-dimension solution is composed of 10 optimization parameters. The larger the number of dimensions, the more complicated a problem is.
In ABC, agents are categorized by their functions into three groups: employed bees, onlooker bees, and scout bees. Each employed bee is initially associated with a random food source. In each iteration, employed bees explore new food sources near the current ones. After collecting nectar, each employed bee evaluates how good the food source is, and it moves to the new food source only if the bee determines that the new source is better. The employed bees also share the information with onlooker bees. Each onlooker bee chooses a rich food source based on the information provided by the employed bees. The higher the quality of a food source, the higher the probability that the food source is selected by onlooker bees. Each onlooker bee then finds a new food source around the selected food source and moves to the new food source if it finds that the new food source is better. If a better food source cannot be found within a predetermined number of iterations, employed bees associated with a food source may turn into scout bees and explore new food sources in a new area of the search space.
There are both similarities and differences in finding new food sources by employed bees, onlooker bees, and scout bees. An employed bee finds a new food source close to the food source with which it is associated, irrespective of how good the current food source is, whereas an onlooker bee tends to find a new food source around a high quality one. An employed bee finds a new food source around the current food source by changing a value in one dimension, and so does an onlooker bee. After that, both employed bees and onlooker bees choose to stay with the better food sources, i.e. greedy selection. In contrast, scout bees will find new food sources in the search space by changing values in all dimensions and always move to the new ones, without considering their quality.
Each group of bees displays a different degree of explorative and exploitative behaviors. The explorative behavior of a search agent involves searching for a new food source in a large region of the search space to avoid a local optimum. On the other hand, the exploitative behavior involves searching for a better food source near the current food source. Comparing explorative and exploitative behaviors, the scout bees perform the highest degree of exploration while the onlooker bees perform the highest degree of exploitation. The scout bees' explorative behavior enables the widest search range, i.e. their new food sources could be any food sources in the search space, possibly in an unknown area. The search ranges of the employed bees and onlooker bees, in contrast, are only in the area around the existing food sources. The onlooker bees' exploitative behavior makes them likely to find the best quality food source because each onlooker bee tends to have a good quality food source in hand while it searches for a better one.
Inspired by a variant of Particle Swarm Optimization (PSO) called Bare Bones PSO [2], Gao, Chan, Huang, and Liu [3] introduced Bare Bones ABC (BABC) with three key modifications. First, onlooker bees use a Gaussian search equation to generate a new food source. Second, a new adaptation strategy is applied to the scaling factor of the search equation in the employed bee phase. Third, a fitness-based neighborhood mechanism is introduced. To update the current food source, an employed bee proportionally selects an individual food source based on the fitness values. The better the fitness value of a food source, the higher the probability that it is selected. The three modifications increase the exploitation of the employed bee and onlooker bee phases. Kiran, Hakli, Gunduz, and Uguz [4] proposed the ABC algorithm with a variable search strategy (ABCVSS) using five different search equations and their individual counters. Each search equation plays a different role in the search. One of the search equations is the original ABC's search equation. Two search equations are used to increase the diversity of the population. Another search equation is used to search around the global best food source, while the remaining search equation is used to search towards the mean of the population. The counter of a search equation indicates its number of successful searches. The search equation selection is based on the values of the counters. The higher the counter of a search equation, the higher the probability that it is selected. The variable search strategy allows the ABC algorithm to learn which search equation provides good solutions for each problem. Cui, Li, Zhu, Lin, Wen, Lu, et al. [5] introduced an adaptive method for the population size (AMPS). The population size adapts according to the number of successful searches, and this population control strategy is used to balance the degree of exploitation and exploration adaptively. In [6], a ranking-based adaptive selection probability is used. Rather than the fitness values, the selection probability is calculated from the ranking of each food source and the success rate of the population. If the success rate is high, exploitation is preferable to exploration and an onlooker bee tends to select a good food source. On the other hand, if the success rate is low, exploration is preferable to exploitation.
Lately, Li, Cui, Fu, Wen, Lu, and Lu [7] introduced a gene recombination operator (GRO) at the end of each generation. As an extension, the GRO is used to generate a new food source from good food sources. It accelerates the convergence speed and increases the exploitation. However, the GRO is not efficient in the case of multimodal functions, because two good food sources used to generate a new food source might be located near two different local optimum food sources.
Like other optimization algorithms, the performance of the ABC algorithm in terms of solution quality drops when the number of dimensions increases. There have been many efforts to improve the solution quality of the ABC algorithm. One of the widely-used techniques is driving a new solution towards the best quality food source. Inspired by the PSO algorithm, Zhu and Kwong [8] proposed the Gbest-guided Artificial Bee Colony (GABC). In the GABC algorithm, a non-negative control parameter C is used to steer the new food source towards the best quality food source as the global best.
To accelerate driving towards the best solution, Banharnsakun, Achalakul, and Sirinaovakul [9] proposed the Best-so-far Artificial Bee Colony (BSF-ABC), which not only drives the food source towards the best food source but also updates values in all dimensions. There are two key modifications. First, BSF-ABC updates all dimensions of the food source towards the best-so-far solution rather than updating only one dimension towards a random food source. Because all dimensions are updated towards the best-so-far food source, the convergence speed is very high. Second, a scout bee discovers a new food source within an adjustable search area. The search area is largest during the early iterations and shrinks over time.
Akay and Karaboga [10] proposed a modified ABC (MABC) which updates more than one dimension with a controllable magnitude of the perturbation. They introduced two parameters, modification rate (MR) and scaling factor (SF). The modification rate controls the diversity of the food source update and the number of dimensions changed. The higher the modification rate is, the larger the number of dimensions that are changed. The other control parameter, the scaling factor or SF, controls the magnitude of the food source update. Both control parameters are fixed and predefined.
Karaboga and Kaya [11] proposed a new version of ABC named Adaptive and Hybrid Artificial Bee Colony (aABC). The aABC algorithm drives a new food source towards the best food source using an arithmetic crossover operation and updates a food source within an adaptable magnitude of the perturbation. A new food source update in the onlooker bee phase was introduced with two new control parameters, crossover rate and adaptivity coefficient. The crossover rate determines how much the new food source is like the current food source. The adaptivity coefficient controls how quickly a bee moves towards the random food source. The crossover with the global best solution improves the quality of the generated food sources, resulting in quick convergence, while the adaptivity coefficient enables the magnitude of the food source update to be adaptable. The lower the adaptivity coefficient, the larger the magnitude of an update.
However, real-world optimization problems in many areas tend to be high-dimensional, for example, in biology [12] and visualization [13]. The growing number of dimensions, from tens to hundreds or even thousands, makes optimization problems much more difficult because the search space exponentially expands as the number of dimensions increases [14]. Therefore, the time complexity increases.
Updating a value in only one dimension at a time is not sufficient to solve a high-dimensional problem. When updating many dimensions, however, the algorithm should not apply the same magnitude of perturbation to all dimensions, because the distance between the current food source and the actual best food source might be different in each dimension. A uniform perturbation can leave the algorithm unable to converge to an optimal solution in problems like the Dixon-Price function. The algorithm should learn to adapt the degree of perturbation for each dimension separately.
In this paper, we use reinforcement learning to improve the solution quality when finding the solution in a very high dimensional problem. The ABC algorithm has strong explorative behavior and weak exploitative behavior, so we aim to improve the exploitative behavior in the onlooker bee phase. Inspired by the foraging behavior of animals, we introduce the concept of the reinforcement vector. When an animal forages for food, it is more likely to keep moving in the same direction rather than turning to a different direction. The animal will often search for more food in the same area if it can find food, the so-called win-stay lose-shift strategy [15]. In the ABC algorithm, a bee moves in a direction by changing the value of a dimension. In the proposed algorithm, when an employed bee moves in a direction and finds a candidate food source, it shares with onlooker bees not only the quality of its food source but also the direction. The employed bees then update the reinforcement vector with positive or negative reinforcement. If an employed bee finds a better food source, positive reinforcement is given to the dimension. On the other hand, if the employed bee cannot find a better food source, negative reinforcement is given to the dimension. The idea is to drive the onlooker bees' positions with a different magnitude of perturbation for each dimension according to the reinforcement vector. An onlooker bee changes the values of all dimensions, and it is likely to change the values of some dimensions more than those of others. Therefore, the algorithm will improve the exploitation performance of ABC when finding the solution in a very high dimensional problem.

Reinforcement learning for solution updating
Modifying the original ABC algorithm [1,16], we use reinforcement learning for solution updating, introducing a new algorithm called the R-ABC algorithm. In the $t$-th iteration, the population, i.e. the set of food sources, is denoted by $X^t$ and consists of $SN$ food sources. The $i$-th food source of $X^t$ is denoted by $\vec{x}_i^t$, $i \in \{1, 2, 3, \ldots, SN\}$, as shown in Eq (1).
For a $D$-dimensional problem, each food source $\vec{x}_i^t$ has $D$ optimization parameters. The $j$-th optimization parameter of $\vec{x}_i^t$ is denoted by $x_{i,j}^t$, $j \in \{1, 2, 3, \ldots, D\}$, as shown in Eq (2).

$$\vec{x}_i^t = \langle x_{i,1}^t, x_{i,2}^t, x_{i,3}^t, \ldots, x_{i,D}^t \rangle \tag{2}$$
The fitness value of $\vec{x}_i^t$ is denoted by $Fit(\vec{x}_i^t)$. The objective of the algorithm is to find the optimization parameters providing the minimum value of $Fit(\vec{x}_i^t)$, and the fitness value of $\vec{x}_i^t$ is defined as in Eq (3).
where $f_i^t$ is the objective function value of the food source $\vec{x}_i^t$. A reinforcement vector $\vec{r}^t$ is introduced for $X^t$ in the R-ABC algorithm. For a $D$-dimensional problem, the reinforcement vector $\vec{r}^t$ is defined as in Eq (4).
$$\vec{r}^t = \langle r_1^t, r_2^t, r_3^t, \ldots, r_D^t \rangle, \quad r_j^t \in [0, 1] \tag{4}$$
Fig 1 shows a flowchart of the R-ABC algorithm. The reinforcement value $r_j^t$ is used as the reinforcement for the $j$-th optimization parameter of all food sources in $X^t$. In other words, the optimization parameters $x_{1,j}^t$, $x_{2,j}^t$, $x_{3,j}^t$, ..., and $x_{SN,j}^t$ share the same reinforcement value ($r_j^t$). For initialization, food sources are randomly generated as shown in Eq (5) and then employed bees are associated with food sources. The initial reinforcement $r_j^0$ is calculated from Eq (6).
where $x_j^{min}$ is the lower limit of the $j$-th optimization parameter, and $x_j^{max}$ is the upper limit of the $j$-th optimization parameter. Apart from the initialization phase, the algorithm is divided into three phases: the employed bee phase, the onlooker bee phase, and the scout bee phase. All three phases are repeated until the termination criterion is satisfied or until the maximum number of fitness evaluations (MFE) is reached.
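The initialization step can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes that Eq (5) is the usual uniform draw between the lower and upper limits, and it assumes an equal initial share of 1/D per dimension for the reinforcement vector, which may differ from the exact form of Eq (6). The function names are hypothetical.

```python
import random

def init_food_sources(sn, lower, upper, rng=random):
    """Draw sn food sources uniformly inside the per-dimension limits.

    Assumed form of Eq (5): x_j = x_j_min + rand(0, 1) * (x_j_max - x_j_min).
    """
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(sn)]

def init_reinforcement(d):
    """Assumption: every dimension starts with the same reinforcement 1/D."""
    return [1.0 / d] * d

foods = init_food_sources(5, [-5.12, -500.0], [5.12, 500.0], random.Random(0))
r0 = init_reinforcement(2)
```

Starting from an equal share keeps the reinforcement values in [0, 1] and makes them sum to 1 before any employed bee has reported a result.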
In the employed bee phase, the neighboring food source $\vec{v}_i^t$ of the current food source $\vec{x}_i^t$ is discovered by employed bees as proposed in [16], as shown in Eq (7).
where $v_{i,j}^t$ is the $j$-th optimization parameter of $\vec{v}_i^t$. An integer $j$ is randomly selected in the interval $[1, D]$. The index of food source $k$ is randomly selected in the interval $[1, SN]$, and $k$ must not be equal to $i$ to prevent the value of $v_{i,j}^t$ from being the same as that of $x_{i,j}^t$. If the new food source $\vec{v}_i^t$ provides a better fitness value than the current food source $\vec{x}_i^t$, i.e., $Fit(\vec{v}_i^t) > Fit(\vec{x}_i^t)$, the employed bee forgets the current food source and memorizes the new one.
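The one-dimension update and the greedy comparison can be sketched in Python. The sketch assumes that Eq (7) is the standard ABC search equation, v_j = x_j + phi * (x_j - x_k,j) with phi drawn from [-1, 1], and that Eq (3) is the standard ABC fitness; both are assumptions about equations whose bodies are not reproduced here. Indices are 0-based, unlike the 1-based indices in the text, and the function names are hypothetical.

```python
import random

def abc_fitness(f_value):
    """Standard ABC fitness (assumed for Eq (3)): larger is better."""
    return 1.0 / (1.0 + f_value) if f_value >= 0 else 1.0 + abs(f_value)

def employed_bee_step(foods, i, fit, rng=random):
    """Return (candidate, d, improved) for food source i."""
    sn, dim = len(foods), len(foods[i])
    d = rng.randrange(dim)                              # random dimension j
    k = rng.choice([m for m in range(sn) if m != i])    # partner index, k != i
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(foods[i])
    candidate[d] = foods[i][d] + phi * (foods[i][d] - foods[k][d])
    improved = fit(candidate) > fit(foods[i])           # greedy selection test
    return candidate, d, improved

sphere = lambda x: sum(v * v for v in x)
fit = lambda x: abc_fitness(sphere(x))
foods = [[1.0, 2.0], [3.0, 5.0], [0.5, -1.0]]
cand, d, improved = employed_bee_step(foods, 0, fit, random.Random(1))
```

The function returns the changed dimension `d` together with the outcome, because R-ABC needs both to decide which entry of the reinforcement vector to reward or penalize.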
The reinforcement vector $\vec{r}^t$ for the next iteration is then updated by using the linear reward-penalty scheme, or Bush-Mosteller scheme [17][18], in Eqs (8) and (9). If the new food source provides a better fitness value, the employed bee not only replaces the current food source with the new food source, but also gives a larger reinforcement value as a reward to the selected dimension while the reinforcement values of other dimensions become smaller, as shown in Eq (8). On the other hand, if the new food source does not provide a better fitness value, the new food source is ignored and a penalty is given to the selected dimension by decreasing the reinforcement value while the reinforcement values of other dimensions are increased, as shown in Eq (9).
where $j, d \in \{1, 2, 3, \ldots, D\}$ and $d$ is the dimension randomly selected by an employed bee. $\alpha$ and $\beta$ are the degree of reward and the degree of penalty, respectively. The reinforcement vector is updated every time an employed bee finds a candidate food source. Therefore, if $EB$ is the number of employed bees, the reinforcement vector is updated $EB$ times per iteration. After updating, the sum of the reinforcement values is always equal to 1. The proof is shown in S1 Appendix. The values of $\alpha$ and $\beta$ are shown in Eq (10).
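A minimal sketch of the reward and penalty steps is given below. It uses the textbook linear reward-penalty update and is only assumed to match Eqs (8) and (9); the adaptive choice of the reward and penalty degrees in Eq (10) is not modelled, so `alpha` and `beta` are plain arguments here. The function names are hypothetical.

```python
def reward(r, d, alpha):
    """Positive reinforcement: move r[d] towards 1 and shrink the rest."""
    return [rj + alpha * (1.0 - rj) if j == d else (1.0 - alpha) * rj
            for j, rj in enumerate(r)]

def penalize(r, d, beta):
    """Negative reinforcement: shrink r[d] and spread the released share.

    Requires at least two dimensions.
    """
    n = len(r)
    return [(1.0 - beta) * rj if j == d
            else beta / (n - 1) + (1.0 - beta) * rj
            for j, rj in enumerate(r)]

r = [0.25, 0.25, 0.25, 0.25]
after_reward = reward(r, 2, 0.2)      # entry 2 grows, the others shrink
after_penalty = penalize(r, 2, 0.2)   # entry 2 shrinks, the others grow here
```

In both functions the amount removed from one side equals the amount added to the other, so a vector that sums to 1 still sums to 1 after the update. In this textbook form, a penalty raises another dimension's value only while that value is below 1/(D-1).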
In the onlooker bee phase, each onlooker bee selects a candidate food source provided by an employed bee depending on a probability associated with the fitness values as in Eq (3). In contrast to the original ABC algorithm, an onlooker bee discovers a new food source using Eq (11) and replaces the current food source with the new food source if the new food source is more profitable.
where $v_{i,j}^t$ is the optimization parameter of a neighboring food source $\vec{v}_i^t$ at the dimension $j \in \{1, 2, 3, \ldots, D\}$ in the iteration $t$. $k \in \{1, 2, 3, \ldots, D\}$ is a random number indicating a random dimension for all $j$. $F \in [-1, 1]$ is a random real number. $x_{BSF,k}^t$ is the optimization parameter of the global best food source found so far at a random dimension $k$.
The new search equation in Eq (11) is designed to enhance the exploitation. Rather than updating the value in a single dimension, an onlooker bee exploits around the current food source by updating all dimensions with different weights. The reinforcement vector ($r_j^t$) indicates how much an onlooker bee focuses on updating each dimension. If the value in one dimension of the reinforcement vector is equal to 1, and all other dimensions are equal to 0, an onlooker bee will update only one dimension, as in the one-dimension update process of the original ABC algorithm. Conversely, if the values in all dimensions of the reinforcement vector are similar, an onlooker bee updates all dimensions with similar weights. In this case, the onlooker bee may not be able to find a better food source.
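The weighting behavior described above can be illustrated with a hypothetical update rule. The form used here, v_j = x_j + F * r_j * (x_j - x_BSF,k), is an assumption chosen only because it reproduces the two properties stated in the text (a one-hot reinforcement vector changes a single dimension, similar values change all dimensions similarly); it should not be read as Eq (11) itself. The roulette-wheel selection is the usual fitness-proportional choice of the original ABC, also assumed. Function names are hypothetical.

```python
import random

def select_food_source(fitness_values, rng=random):
    """Pick index i with probability fit_i / sum(fit) (roulette wheel)."""
    pick = rng.random() * sum(fitness_values)
    acc = 0.0
    for i, fv in enumerate(fitness_values):
        acc += fv
        if pick < acc:
            return i
    return len(fitness_values) - 1

def onlooker_candidate(x, r, best, rng=random):
    """Perturb every dimension of x, scaled by its reinforcement value."""
    k = rng.randrange(len(x))          # one random dimension of the best-so-far
    return [xj + rng.uniform(-1.0, 1.0) * rj * (xj - best[k])
            for xj, rj in zip(x, r)]

x = [1.0, 2.0, 3.0]
best = [0.5, 0.5, 0.5]
one_hot = onlooker_candidate(x, [0.0, 1.0, 0.0], best, random.Random(3))
```

With the one-hot vector only the second coordinate can move, which mirrors the reduction to the one-dimension update process mentioned in the text.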
The degrees of reward and penalty control how much the reinforcement values are adapted. They are adjusted adaptively according to the fitness value of a candidate food source found by an employed bee. According to the win-stay lose-shift strategy in animals' foraging behavior [15], if animals find food in an area, they will keep searching for more food in that area. Otherwise, they will move to another area. In the proposed algorithm, if an employed bee can find a better food source in dimension $d$ of the candidate food source, the onlooker bees will assign more weight to dimension $d$ by Eq (8). On the other hand, if an employed bee cannot find a better food source, a smaller weight is assigned to dimension $d$ by Eq (9). If the candidate food source is better than the current food source, the reinforcement value in dimension $d$ increases. Otherwise, the reinforcement value decreases. In some cases, the candidate food source can have a lower value of the reinforcement vector ($r_j^t$) than the current food source but its fitness value is higher than those of other food sources. This means that the candidate food source is a local optimum food source and the negative reinforcement value is assigned. This process allows the onlooker bees to escape from a local optimum food source by updating the values in other dimensions with higher weights. However, if there is any optimum value in a dimension ($j$) that is far away from others, the explorative behavior of employed bees, Eq (7), is required to handle the case. Fig 2 shows an example of the reinforcement vector updating. If an employed bee updates the value in dimension 4, and the fitness value of the candidate food source is better, the value of $r_4^{t+1}$ increases. Otherwise, the value of $r_4^{t+1}$ decreases.
In the scout bee phase, if the number of unsuccessful attempts to find a better neighbor for any food source exceeds the limit, the food source is abandoned and a scout bee randomly discovers a new food source as in [16].
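The abandonment rule can be sketched as below. The sketch assumes one trial counter per food source and a uniform re-draw inside the search limits; whether the original implementation replaces every exhausted source or only one per iteration is not stated here, so replacing all of them is an assumption. The function name is hypothetical.

```python
import random

def scout_phase(foods, trials, limit, lower, upper, rng=random):
    """Replace every food source whose trial counter exceeds the limit."""
    for i, t in enumerate(trials):
        if t > limit:
            foods[i] = [lo + rng.random() * (hi - lo)
                        for lo, hi in zip(lower, upper)]
            trials[i] = 0              # the new source starts with a clean count
    return foods, trials

foods = [[9.0, 9.0], [1.0, 1.0]]
trials = [11, 3]
foods, trials = scout_phase(foods, trials, 10, [-5.0, -5.0], [5.0, 5.0],
                            random.Random(7))
```

The replacement is accepted without a fitness comparison, which matches the description of scout bees always moving to the new food source.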
The pseudo-code of the R-ABC algorithm is presented in Algorithm 1.

Algorithm 1
    [...]
        Update position of employed bee using Eq (7);
        Evaluate the new position; FEs = FEs+1;
        IF fitness value of new position is better
            Reset TRIAL;
            Replace the current food source with the new one;
            Apply positive reinforcement using Eq (8);
        ELSE
            TRIAL = TRIAL+1;
            Apply negative reinforcement using Eq (9);
        END IF
    END FOR
    Calculate probability for selecting food source in the onlooker bee phase;
    Onlooker Bee Phase:
    FOR each onlooker bee DO
        Select a food source based on probability;
        Update position of onlooker bee using Eq (11);
        Evaluate the new position; FEs = FEs+1;
        IF fitness value of new position is better
            Reset TRIAL;
    [...]

The effect of the reinforcement learning integrated into the proposed algorithm is to adjust the range of the area searched in the onlooker bee phase. There are two differences between the original ABC and the R-ABC algorithms.
The first difference relates to the information shared between employed bees and onlooker bees. In both algorithms, an employed bee updates one dimension at a time to find a new food source and then shares some information with onlooker bees. In the original ABC algorithm, the information which employed bees share with an onlooker bee is only the quality of the food source in terms of the fitness values. Then each onlooker bee uses the fitness values of food sources to select a candidate food source in Eq (7). In the R-ABC algorithm, each employed bee shares not only the quality of the food source but also which dimension should be changed to find a better food source. This additional information is shared in terms of the reinforcement. If an employed bee changes a dimension and finds a better food source, it gives a positive reinforcement to the dimension. On the other hand, if it changes a dimension and finds a worse food source, it gives a negative reinforcement to the dimension. Each onlooker bee uses the reinforcement to update its position as shown in Eq (11).
The second difference between the original ABC and the R-ABC algorithms is the number of dimensions updated by an onlooker bee. In the original ABC algorithm, only one dimension is changed by an onlooker bee. In the R-ABC algorithm, all dimensions are changed by an onlooker bee with different ranges according to the reinforcement. A dimension which is changed and gives a better food source has a wider update range. On the other hand, a dimension which is changed and produces a worse food source has a narrower update range.
For example, suppose bees fly along the x-axis, y-axis, and z-axis to find the best food source, in other words, a 3D problem. An employed bee explores a new food source by changing only one dimension at a time, i.e., it moves along only one axis. If an employed bee flies along the y-axis and finds a better food source, it tells onlooker bees that they should move along the y-axis too. Each onlooker bee moves in all three dimensions, but it tends to move along the y-axis more than other dimensions. On the other hand, if an employed bee flies along the y-axis and finds a worse food source, it tells onlooker bees that they should not move very much along this dimension.
The reinforcement varies over time depending on which dimension is selected and the quality of the food sources found by employed bees. For example, if the first employed bee finds a better food source by changing dimension $d$, it gives a positive reinforcement to dimension $d$, i.e., the reinforcement of dimension $d$ increases while those of other dimensions decrease. The higher the quality of the new food source is, the more positive reinforcement is given to dimension $d$. Later, if the second employed bee finds a better food source by changing another dimension, it gives a positive reinforcement to that dimension and the reinforcement of the other dimensions, including $d$, decreases.
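A worked numerical illustration of this sequence for the 3D case follows. It assumes the textbook linear reward step, in which the rewarded value becomes $r_d + \alpha(1 - r_d)$ and every other value becomes $(1 - \alpha)\,r_j$, an equal starting share of 1/3, and an illustrative fixed reward degree $\alpha = 0.3$; none of these numbers come from the experiments.

```latex
\begin{align*}
\text{start:} \quad & (r_x, r_y, r_z) = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) \\
\text{reward on } y: \quad & r_y = \tfrac{1}{3} + 0.3\left(1 - \tfrac{1}{3}\right) \approx 0.533,
  \qquad r_x = r_z = 0.7 \cdot \tfrac{1}{3} \approx 0.233 \\
\text{reward on } z: \quad & r_z \approx 0.233 + 0.3\,(1 - 0.233) \approx 0.463,
  \qquad r_x \approx 0.7 \cdot 0.233 \approx 0.163,
  \qquad r_y \approx 0.7 \cdot 0.533 \approx 0.373
\end{align*}
```

After the second reward the value for the y-axis has dropped from about 0.533 to about 0.373 even though the y-axis was never penalized, and the three values still sum to 1 up to rounding.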
In both the R-ABC and the BSF-ABC algorithms, the values of all dimensions are updated, but the two algorithms do so differently. In the BSF-ABC algorithm, the values in all dimensions of each food source are updated according to the best-so-far food source as the multiplier of the error correction for this solution update [9]. In the R-ABC algorithm, the values of all dimensions are modified with different magnitudes. The magnitude of each dimension varies according to the fitness values of new food sources found by employed bees. Fig 3 shows an example of a 10D reinforcement vector changing over time. At initialization, the reinforcement for all dimensions is equal. Each reinforcement is changed when an employed bee discovers a new food source. At time $t$, the employed bee finds a new food source by changing dimension $x_3$ and evaluates the new food source. At time $t+1$, the values of reinforcement are changed according to the quality of the new food source.
The difference in the position update between the original ABC algorithm and the R-ABC algorithm is shown in Fig 4. An onlooker bee in the original ABC algorithm selects a random dimension to update the position of a food source, but an onlooker bee in the R-ABC algorithm updates all dimensions within different ranges according to the positive-negative reinforcement associated with the food source discovery in the employed bee phase. The dimension that provides a better fitness value when changed has the widest possible update range.

Experimental settings
To evaluate the performance of the R-ABC algorithm, we tested the proposed technique compared with the state-of-the-art algorithms on widely-used numerical benchmark functions.
The R-ABC algorithm was evaluated by applying it to a set of numerical benchmark functions focusing on two characteristics as follows.
• Unimodal and multi-modal: Modality of a function corresponds to the number of peaks in the function surface [19]. Finding the global optimum of a multi-modal function is more difficult than that of a unimodal function because algorithms may become trapped at one of the peaks which is a local optimum solution.
• Separable and non-separable: Separability of a function is a measure of whether each parameter in a solution is independent of the other parameters [19]. Dealing with a separable benchmark function is easier than dealing with a non-separable one because a separable function can be decomposed into sub-functions and they can be optimized separately [20].
Thus, we used four categories of benchmark functions: unimodal separable functions (US), unimodal non-separable functions (UN), multimodal separable functions (MS), and multimodal non-separable functions (MN). We chose two benchmark functions from each category. The description of the selected benchmark functions [19] is shown in Table 1. In addition, to evaluate the performance of the R-ABC algorithm on more difficult functions, we employed seven shifted functions from CEC2005 [21] and six hybrid functions from CEC2014 [22]. The CEC2005's shifted functions and the CEC2014's hybrid functions are shown in Table 2 and Table 3, respectively. The objective is to find the minimum output values of the functions.
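To make the four categories concrete, the sketch below gives one commonly used definition for one function of each kind. These are the textbook forms of the Sphere, Rosenbrock, Rastrigin, and Griewank functions; the exact definitions and search ranges used in the experiments are those of Table 1, which is not reproduced here, so the code should be read as an illustration only.

```python
import math

def sphere(x):
    """Unimodal, separable: each term depends on one parameter only."""
    return sum(v * v for v in x)

def rosenbrock(x):
    """Non-separable: every term couples two neighbouring parameters."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    """Multimodal, separable: a sum of independent one-dimensional terms."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)

def griewank(x):
    """Multimodal, non-separable: the product term couples all parameters."""
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return s - p + 1.0
```

A separable function such as `sphere` or `rastrigin` can be minimized one coordinate at a time, whereas the coupling in `rosenbrock` and `griewank` means that the best value for one parameter depends on the others.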
The R-ABC algorithm was compared with seven algorithms listed below.
• Adaptive Population ABC (APABC) [5]
• Adaptive and Hybrid ABC (aABC) [11]
• Best-So-Far ABC (BSF-ABC) [9]
• ABC with a variable search strategy (ABCVSS) [4]
• ABC [16]
• Starling Particle Swarm Optimization (Starling PSO) [23]
• Adaptive Differential Evolution with Optional External Archive (JADE) [24]
Code for BSF-ABC, ABC, and Starling PSO was provided by the original authors, whereas code for APABC, aABC, ABCVSS, and JADE was rewritten based on published reports and [...] random, and the sum of the random values is equal to 1. Each experiment was run 20 times on a 2.00 GHz Intel® Xeon® CPU with 4.00 GB memory, and all memory was cleared before each run. The summary of the control parameters is shown in Table 4. The values of the adaptivity coefficient α and crossover rate γ providing the best results for each function in [11] were selected for the aABC algorithm as shown in Table 5, while the values of the adaptivity coefficient α and crossover rate γ used for the CEC2005's shifted functions and the CEC2014's hybrid functions were 0.5 and 0.5, respectively.

Basic numerical benchmark functions
The results of the algorithm for basic benchmark functions are presented in Tables 6-8. Tables 6 and 8 show the average final values of the R-ABC and the compared algorithms. However, the R-ABC and the BSF-ABC algorithms may not consume the same number of fitness evaluations to provide the results in Table 8. The "D" column is the number of dimensions. The "Mean" row is the arithmetic mean of output values. The "Min" and "Max" rows present the minimum and maximum outputs, respectively. The "SD" row is the standard deviation of the results. For the "Mean", "Min", "Max", and "SD" rows, values smaller than 1E-308 are considered as 0. The best results are marked in boldface. Note that the global minimum value of the Ackley function is computationally 8.88E-16, but is considered as 0. The "Sig" row gives the results of the Wilcoxon rank sum test, which shows whether the final results from all runs of the R-ABC and each competitor are significantly different. Final results smaller than 1E-308 are replaced by 0 before the test. The "+", "=", and "-" symbols mean that the final results of the R-ABC algorithm are better than, similar to, and worse than those of a competitor at the 0.05 significance level, respectively. Although the results of the R-ABC and BSF-ABC algorithms for the Schwefel function appear the same in Table 8, the actual values differ by a negligible amount. Table 7 shows the algorithm rankings obtained by the Friedman test using KEEL [25][26].
Table 6 shows that, with 30000 fitness evaluations, the R-ABC algorithm gives final solutions significantly better than the APABC, aABC, ABC, Starling PSO, and JADE algorithms for the unimodal separable, unimodal non-separable, multi-modal separable, and multi-modal non-separable benchmark functions for all tested dimensions. The rankings of all compared algorithms for the basic benchmark functions are shown in Table 7. The R-ABC algorithm is ranked first among all competitors for all tested dimensions. The JADE algorithm is ranked second in the majority of cases.
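How a "Sig" entry could be derived from two sets of final results is sketched below. The sketch implements the two-sided rank sum test with the normal approximation and no tie correction, and it assigns "+" or "-" from the sign of the test statistic under minimization; the software, exact test variant, and direction rule used for the tables are not stated in the text, so these choices and the function names are assumptions.

```python
import math

def rank_sum_test(a, b):
    """Return (z, two-sided p) of the rank sum test, normal approximation."""
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # tied values share the mean rank
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_a - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def sig_symbol(rabc, other, level=0.05, tiny=1e-308):
    """'+' if R-ABC is significantly better (lower), '-' if worse, else '='."""
    rabc = [0.0 if abs(v) < tiny else v for v in rabc]
    other = [0.0 if abs(v) < tiny else v for v in other]
    z, p = rank_sum_test(rabc, other)
    if p >= level:
        return "="
    return "+" if z < 0 else "-"

near_zero = [1e-320] * 20                     # treated as exact zeros
larger = [1.0 + run for run in range(20)]     # 20 runs of a weaker competitor
```

Replacing values below 1E-308 by 0 before ranking means that two algorithms which both reach the numerical floor are treated as tied rather than ordered by floating-point noise.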
Table 8 shows that, with MCN = 10000, the R-ABC algorithm gives solutions significantly better than the BSF-ABC algorithm in the majority of cases of the unimodal separable and unimodal non-separable functions. Both the R-ABC and BSF-ABC algorithms can reach the global optimum solutions for all runs of the 100D Sphere function, but only the R-ABC algorithm can reach the global optimum solutions for all runs of the 100D Sum Squares function. According to the Wilcoxon rank sum test, the solutions of the R-ABC algorithm are similar to those of the BSF-ABC algorithm for the 100D Sphere function, the 100D Sum Squares function, and the 700D and 900D Rosenbrock functions. For the multi-modal separable and multi-modal non-separable functions, the solutions of the R-ABC algorithm are similar to those of the BSF-ABC algorithm.
The results also show that the algorithms differ in their sensitivity to the number of dimensions. A growing number of dimensions affects the solution quality of the R-ABC algorithm less than that of some other algorithms. This is obvious in the cases of the Sphere and Sum Squares functions. Table 8 shows that both the R-ABC and BSF-ABC algorithms can reach the global optimum solutions for all runs of the 100D Sphere function with MCN = 10000, but the BSF-ABC algorithm is outperformed by the R-ABC algorithm for 500D, 700D, and 900D. In Table 6, for the Sphere and Sum Squares functions, with MFE = 30000, the aABC algorithm provides the third-best mean solutions for 100D but only the fifth-best mean solutions for 900D, while the R-ABC algorithm provides the best mean solutions for both 100D and 900D. This shows that the mechanism of the R-ABC algorithm is not sensitive to the number of dimensions, so it also works well with the high-dimensional basic benchmark functions.
The experiments were designed to evaluate the R-ABC algorithm on four categories of benchmark functions. We found that the different categories of benchmark functions did not affect the R-ABC algorithm's solution quality in any consistent way. Table 8 shows that the R-ABC algorithm can reach the global optimum solutions for all tested dimensions of the Rastrigin, Ackley, and Griewank functions and for the 100D Sphere and Sum Squares functions within MCN = 10000. Reaching the global optimum solutions for the Ackley and Griewank functions, which are multimodal and non-separable, shows that the difficulty of solving multimodal functions and that of solving non-separable functions do not affect the performance of the R-ABC algorithm in this case. However, Table 6 shows that the JADE and APABC algorithms performed well in some categories of benchmark functions. For all tested dimensions of the unimodal separable and unimodal non-separable functions, the JADE algorithm gives the second-best solutions. For the multi-modal separable functions, the APABC algorithm gives the second-best mean solutions in all cases, except for the 900D Rastrigin function. For the multi-modal non-separable functions, the JADE algorithm gives the second-best mean solutions in all cases, except for the 100D Ackley function.
The solution quality of the R-ABC algorithm is instead affected by the surface of the benchmark functions. The surfaces of the Rastrigin, Ackley, and Griewank functions resemble big mountains: the highest point is at the center, surrounded by smaller peaks in roughly descending order. The best food source found so far in Eq (11) guides onlooker bees to a higher step of the mountain.
However, the R-ABC algorithm is not entirely superior to the BSF-ABC algorithm for the Rosenbrock function, which is unimodal. For all tested dimensions of the Rosenbrock function, the R-ABC algorithm provides better mean solutions than the BSF-ABC algorithm, while the BSF-ABC algorithm provides better minimum solutions than the R-ABC algorithm. There is a flat valley around the optimum point on the surface of the Rosenbrock function. When the R-ABC algorithm has almost converged towards the optimum solution, the food sources, including the best food source found, are likely to be located in the flat valley. The short distance between the best food source found and each food source makes it difficult for the R-ABC algorithm to reach the optimum solution.
In addition, to validate the effectiveness of the reinforcement vectors on each benchmark function, we replaced the reinforcement vectors with random vectors. Table 8 also shows the results of the reinforcement vectors compared to those of random vectors. The R-ABC algorithm gives solutions significantly better than the algorithm with random vectors in the majority of cases. The R-ABC algorithm is beaten by the algorithm with random vectors for the 100D Dixon-Price function and the 100D Rosenbrock function. In the case of the Rosenbrock function, the R-ABC algorithm gets into difficulties when it adapts the values of the reinforcement vectors on a flat valley surface where the fitness values of the food sources are similar. In the case of the Dixon-Price function, the optimum values of some dimensions are far from those of others, and those values may be out of the range of the reinforcement vectors, which were designed to find a better solution near the current solution.
To compare the convergence speed of the algorithms, the results of all benchmark functions are displayed in Figs 5-12. Each curve shows the average of the best solution found. All Y-axes are displayed in log scale. Figs 5 and 6 show that, for the unimodal separable benchmark functions (the Sphere and Sum Squares functions), the R-ABC algorithm converges faster than the JADE algorithm and far faster than all other algorithms. Figs 7 and 8 show that, for the unimodal non-separable benchmark functions (the Dixon-Price and Rosenbrock functions), the R-ABC algorithm initially converges more quickly than the others. For the 100D unimodal non-separable functions, the Starling PSO algorithm initially converges more quickly than the APABC, ABC, and aABC algorithms, but it stagnates too quickly. Figs 9-12 show that, for the multi-modal functions, the curve of the R-ABC algorithm drops suddenly several times, indicating that it can escape from local optimum solutions.
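The averaged convergence curves described above can be computed along the following lines. This is a minimal sketch that assumes minimisation and that every run records one fitness value per evaluation over the same number of evaluations; the function names are hypothetical.

```python
def best_so_far(history):
    """Running minimum of one run's fitness history (minimisation)."""
    curve, best = [], float("inf")
    for value in history:
        best = min(best, value)
        curve.append(best)
    return curve


def mean_convergence_curve(runs):
    """Average the best-so-far curves of several runs, point by point."""
    curves = [best_so_far(run) for run in runs]
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all runs must record the same number of evaluations")
    return [sum(curve[i] for curve in curves) / len(curves) for i in range(length)]
```

When such a curve is drawn on a logarithmic axis, points that reach exactly 0 cannot be displayed, so a plotting script would have to clip or drop them.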

CEC2005's shifted functions
To evaluate the performance of the R-ABC algorithm on more difficult benchmark functions, we compared the R-ABC algorithm with the original ABC algorithm and its variants, namely the APABC, aABC, and ABCVSS algorithms, on seven CEC2005's shifted functions [21] for 100, 500, 700, and 900 dimensions. Table 9 shows the results on the CEC2005's shifted functions. Table 10 shows the results of the Wilcoxon's rank sum test for each dimension. Fig 13 shows the final results for all tested dimensions. Fig 14 shows the algorithm rankings obtained by the Friedman's test using KEEL [25][26]. Note that the tested dimensions are 100, 500, 700, and 900, so the x-axis intervals in Figs 13 and 14 are irregular between 100 and 500.
Table 9 shows that, with 30000 fitness evaluations, the R-ABC algorithm gives the best solutions on one function for 100 dimensions, two functions for 500 dimensions, three functions for 700 dimensions, and three functions for 900 dimensions. For Shifted Function 7, the R-ABC algorithm gives the best solutions for all tested dimensions.
The results of the Wilcoxon's rank sum test in Table 10 show that the number of functions on which the R-ABC algorithm gives solutions significantly better than the compared algorithms increases with the number of dimensions. For 100 dimensions, the numbers of shifted functions on which the R-ABC algorithm gives solutions significantly better than the APABC, aABC, ABCVSS, and ABC algorithms are 0, 1, 6, and 1, respectively. For 900 dimensions, however, these numbers increase to 3, 4, 6, and 5, respectively.
Fig 13 shows the final results of each algorithm for all tested dimensions to compare the trend of the solution quality. The lines of the R-ABC algorithm have flatter upward slopes than those of the other algorithms for Shifted Functions 1, 2, 3, 4, and 5. This is evidence that the R-ABC algorithm is less sensitive to the increasing number of dimensions than the other algorithms in the majority of cases.
Fig 14 shows the rankings of all compared algorithms for the CEC2005's shifted functions. The aABC algorithm is ranked first for 100 dimensions, and the APABC algorithm is ranked first for 500, 700, and 900 dimensions. Although the R-ABC algorithm is not ranked first for any tested dimension, it is obvious that, as the number of dimensions increases, the R-ABC algorithm gets a better rank while the aABC and ABC algorithms get worse ranks.
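The rankings discussed above rest on the mean rank of each algorithm across the test functions, which is the quantity the Friedman's test works with. The sketch below computes only these mean ranks (not the test statistic or the KEEL output), assumes that a smaller value is better, and uses hypothetical names and data layout.

```python
def rank_row(row):
    """Rank one function's results: 1 = smallest (best); ties share the average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mean_ranks(results):
    """results maps an algorithm name to its list of mean results, one per function."""
    names = list(results)
    n_functions = len(results[names[0]])
    totals = {name: 0.0 for name in names}
    for f in range(n_functions):
        row = [results[name][f] for name in names]
        for name, rank in zip(names, rank_row(row)):
            totals[name] += rank
    return {name: totals[name] / n_functions for name in names}
```

For example, calling `mean_ranks` on a dictionary with one list of mean results per algorithm returns one mean rank per algorithm; the algorithm with the lowest mean rank is the one ranked first.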

CEC2014's hybrid functions
To evaluate the performance of the R-ABC algorithm on more complicated functions, we also compared the proposed algorithm with the APABC, aABC, ABCVSS, ABC, Starling PSO, and JADE algorithms on CEC2014's hybrid functions. Each hybrid function is made up of different basic functions, resulting in different subcomponents.
Table 11 shows the results on the CEC2014's hybrid functions. Table 12 shows the results of the Wilcoxon's rank sum test. The R-ABC algorithm gives solutions significantly better than the Starling PSO, ABC, APABC, and ABCVSS algorithms on 1, 1, 3, and 6 functions, respectively. Overall, the R-ABC algorithm is beaten by the JADE and aABC algorithms on the hybrid functions. However, the solutions of the R-ABC algorithm are similar to those of the JADE and aABC algorithms, and significantly better than those of the ABC, APABC, ABCVSS, and Starling PSO algorithms, on Hybrid Function 4.

Analysis of the perturbation
In addition, we analysed the impact of the perturbation. The sum of the reinforcement values is always equal to 1, so the average reinforcement value is 1/D. When the number of dimensions is high, the magnitude of perturbation is therefore likely to be small. A small magnitude of perturbation probably degrades the performance of the R-ABC algorithm, especially when the optimal value of each dimension is far from those of other dimensions. Therefore, two new parameters, γ and κ, are introduced to analyse the impact of perturbation. The parameter γ is used to increase the degree of reward and penalty. The value of γ is set to 1, 2, 3, 4, 5, 10, 15, and 20. Eqs (8) and (9) are replaced by Eqs (12) and (13), respectively. The parameter κ is used to magnify the reinforcement values. The value of κ is set to 1, 2, 3, and 4. Eq (11) is replaced by Eq (14).
The analysis of perturbation was conducted on the CEC2005's shifted functions for 100 dimensions. Note that the algorithm is the same as the original R-ABC algorithm when both γ and κ are set to 1.
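The effect of the number of dimensions on the average reinforcement value can be made concrete with a short calculation. The symbol r_j for the reinforcement value of dimension j is an assumption introduced here for illustration only.

```latex
\sum_{j=1}^{D} r_j = 1
\;\Longrightarrow\;
\bar{r} = \frac{1}{D},
\qquad
\bar{r}\,\big|_{D=100} = 0.01,
\qquad
\bar{r}\,\big|_{D=900} = \frac{1}{900} \approx 0.0011
```

Going from 100 to 900 dimensions therefore shrinks the average reinforcement value by a factor of nine, which is consistent with the motivation for magnifying the reinforcement values through κ.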
Fig 16 shows the final results of the CEC2005's shifted functions with different values of γ and κ. There is no evidence that any value of γ gives solutions significantly better than other values. This means that multiplying the degree of reward and penalty by 1, 2, 3, 4, 5, 10, 15, or 20 does not significantly improve the performance of the R-ABC algorithm. Although γ has no significant impact, different values of κ give different final results. The Friedman's test is used to evaluate the results from different values of κ, as shown in Table 13. For Shifted Functions 1, 2, 5, 6, and 7, the best solutions are obtained when κ is set to 4. For Shifted Function 3, the best solution is obtained when κ is set to 2. For Shifted Function 4, the best solution is obtained when κ is set to 3. The results of the experiment show that, in all cases, the original R-ABC algorithm (γ = 1 and κ = 1) does not give the best solution. The performance of the R-ABC algorithm can

Conclusions
In this paper, we have integrated a reinforcement learning method for solution updates into the Artificial Bee Colony algorithm. We applied positive and negative reinforcement to the dimensions of candidate food sources during the onlooker bee phase. All dimensions of a solution are updated at the same time, each based on its corresponding reinforcement value. These reinforcement values indicate the range over which the value in each dimension varies. A dimension which provides better solutions after a change will vary within a wider range during the update.
There are two differences between the original ABC and the R-ABC algorithms. First, different information is shared between employed bees and onlooker bees. Each employed bee in the R-ABC algorithm shares not only the quality of its food source but also which dimension should be changed to find a better food source. Second, the R-ABC algorithm updates all dimensions in every iteration, while the original ABC algorithm updates only one dimension at a time. Generally, this improves the convergence speed of the R-ABC algorithm compared with the original ABC algorithm.
The reinforcement learning method enables an onlooker bee to update the values in all dimensions with different ranges according to the information from employed bees. This makes the range of the update in each dimension adaptable to all kinds of functions, which makes the algorithm capable of handling a wider range of real-world problems.
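The idea summarised above can be illustrated with a small sketch. It is not a reproduction of Eqs (8)-(11), which are not restated in this section: the multiplicative reward and penalty, the step size `alpha`, the `floor` value, and the update towards the best food source are all assumptions chosen only to show how per-dimension reinforcement values that sum to 1 could scale an all-dimension update.

```python
import random


def reinforce(r, j, improved, alpha=0.1, floor=1e-12):
    """Reward (improved) or penalise dimension j, then renormalise so sum(r) == 1.

    Hypothetical rule: scale r[j] by (1 + alpha) or (1 - alpha).
    """
    updated = list(r)
    factor = 1.0 + alpha if improved else 1.0 - alpha
    updated[j] = max(floor, updated[j] * factor)
    total = sum(updated)
    return [value / total for value in updated]


def onlooker_update(x, best, r, rng):
    """Move every dimension at once; the step in dimension j is scaled by r[j].

    Hypothetical rule: a random step towards or away from the best food source.
    """
    return [
        xj + rng.uniform(-1.0, 1.0) * rj * (bj - xj)
        for xj, bj, rj in zip(x, best, r)
    ]


rng = random.Random(0)
r = reinforce([0.25, 0.25, 0.25, 0.25], j=2, improved=True)
candidate = onlooker_update([0.0] * 4, [1.0] * 4, r, rng)
```

In this sketch a rewarded dimension receives a larger share of the total reinforcement, so its value may move further in the next update, while the shares of all other dimensions shrink slightly after renormalisation.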
The proposed algorithm was tested on eight basic numerical benchmark functions, seven CEC2005's shifted functions, and six CEC2014's hybrid functions. The selected basic numerical benchmark functions were chosen from four categories: unimodal separable functions, unimodal non-separable functions, multimodal separable functions, and multimodal non-separable functions. We tested the basic functions with 100, 500, 700, and 900 dimensions, the CEC2005's shifted functions with 100, 500, 700, and 900 dimensions, and the CEC2014's hybrid functions with 100 dimensions.
Compared with other algorithms, the R-ABC algorithm provides the best mean solutions for all basic benchmark functions at all tested dimension sizes. The categories of benchmark functions do not consistently affect the solution quality of the R-ABC algorithm. Compared with other ABC variants on the CEC2005's shifted functions, the results provide evidence that the R-ABC algorithm is less sensitive to the growing number of dimensions than some algorithms in the majority of cases. Compared with other algorithms on the CEC2014's hybrid functions, the R-ABC algorithm is better than or at least comparable to the ABC variants.
In short, our results suggest that using reinforcement to control the degree of variation differently across dimensions in a high-dimensional problem is an effective modification to the original ABC algorithm. This technique provides good-quality solutions to high-dimensional problems without sacrificing convergence speed. In the future, the value of the parameter κ can be studied and modified to improve the solution quality and convergence speed of the R-ABC algorithm.

Replace the current food source with the new one;

Scout Bee Phase:
FOR each food source DO
    IF TRIAL > LIMIT
        Abandon the food source and send a scout bee to discover a new food source;
        Evaluate the new position; FEs = FEs+1;
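The scout bee phase fragment above could be written in Python roughly as follows. This is a sketch under assumptions: the abandoned food source is re-initialised uniformly at random within common bounds `lower` and `upper`, its trial counter is reset to 0, and all parameter names are hypothetical. The BSF-ABC variant mentioned in the text positions scout bees differently.

```python
import random


def scout_phase(foods, fitness, trials, limit, lower, upper, objective, fes, rng):
    """Re-initialise every food source whose trial counter exceeds `limit`.

    Lists are modified in place; the updated number of fitness evaluations is returned.
    """
    for i in range(len(foods)):
        if trials[i] > limit:
            # Abandon the food source: a scout bee picks a new random position (assumed rule).
            foods[i] = [rng.uniform(lower, upper) for _ in foods[i]]
            # Evaluate the new position and count the evaluation.
            fitness[i] = objective(foods[i])
            trials[i] = 0  # assumed reset of the trial counter
            fes += 1
    return fes
```

Returning the evaluation count lets the caller compare it with the maximum number of fitness evaluations after each phase.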

Table 1. Numerical benchmark functions. Columns: function name (type), function, range, global minimum. https://doi.org/10.1371/journal.pone.0200738.t001
validated with the results of the original research, except for APABC because the published report does not provide the final results. As we are focusing on the issue of very high-dimensional problems, the experiments were conducted to solve 100D, 500D, 700D, and 900D problems, except for the CEC2014's hybrid functions, which are defined for 100D. The maximum number of fitness evaluations (MFE) is set to 30000. If the number of function evaluations reaches MFE or the function value reaches the global optimum, the run is terminated. However, the BSF-ABC requires a pre-defined maximum number of iterations (MCN) because it uses the values of the current iteration number and MCN to update scout bees' positions. Therefore, the comparison with BSF-ABC was conducted separately based on MCN = 10000. Moreover, to validate the effectiveness of the reinforcement vector, we also conducted the experiments with random vectors. The value of each element in a random vector is uniformly
