Lagrange Interpolation Learning Particle Swarm Optimization

In recent years, comprehensive learning particle swarm optimization (CLPSO) has attracted the attention of many scholars for solving multimodal problems, as it excels at preserving the particles' diversity and thus preventing premature convergence. However, CLPSO exhibits low solution accuracy. To address this issue, we propose a novel algorithm called LILPSO. First, the algorithm introduces a Lagrange interpolation method to perform a local search around the global best point (gbest). Second, to obtain a better exemplar, the gbest and two other particles' historical best points (pbest) are used to perform Lagrange interpolation, and the resulting point replaces the exemplar obtained by CLPSO's comparison method. Numerical experiments on various functions demonstrate the superiority of this algorithm, and the two techniques are shown to accelerate convergence without driving the swarm into premature convergence.


Introduction
In 1995, Kennedy and Eberhart, inspired by the foraging behaviour of birds, proposed the particle swarm optimization (PSO) algorithm [1], which has attracted attention in academic circles and has demonstrated its superiority in solving practical problems. PSO is a type of evolutionary algorithm, similar to the simulated annealing (SA) algorithm [2]. PSO starts from a random solution, searches for the optimal solution in an iterative manner, and evaluates the quality of each solution by its fitness. PSO follows the current optimal value to find the global optimum, so it is simpler than the genetic algorithm (GA) [3], requiring neither the "crossover" nor the "mutation" operation. However, PSO suffers from two problems: premature convergence and slow convergence at the late stage. In the past two decades, many researchers have focused on addressing these problems by introducing new methods and concepts. The concept of inertia weight was introduced into the velocity formula by Y. Shi and Eberhart, who decreased the weight from 0.9 to 0.4 to balance exploitation and exploration [4] [5]; on that basis, later researchers developed adaptive inertia weights and coefficients [6] [7]. To avoid premature convergence of the particle swarm, Riget J and Vesterstrøm JS proposed a diversity-guided approach.

If a particle is denoted as X_i = (x_i1, x_i2, ..., x_iD), then the best position it has experienced (the position with the best fitness value) is P_i = (p_i1, p_i2, ..., p_iD), also denoted by pbest. The best position experienced by the whole swarm, whose index is denoted by g, is P_g, or gbest.
The velocity of particle i is denoted by V_i = (v_i1, v_i2, ..., v_iD). For each generation, the iteration function of its d-th dimension is:

v_id(t+1) = ω·v_id(t) + c1·r1·(p_id(t) − x_id(t)) + c2·r2·(p_gd(t) − x_id(t))   (1)

where c1 and c2 are positive constants, called learning factors; r1 and r2 are random numbers uniformly distributed in [0, 1]; and v_id is the d-th dimension velocity of particle i. The first term on the right-hand side of Eq 1 comes from the particle's previous velocity and is called the "inertia" part. The second term is the "cognition" part, which represents the particle's own memory and influences its own next step. The third term is the "social" part, which represents information sharing and mutual cooperation, influencing the swarm's next step.
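As a concrete sketch, the update in Eq 1 can be written per particle as follows (a minimal Python illustration; the function name and parameter defaults are ours, not from the paper):

```python
import random

def pso_update(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0):
    """One PSO velocity/position update (Eq 1) for a single particle.

    x, v, pbest are length-D lists for this particle; gbest is the swarm's
    best position. Returns the new position and velocity.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()
        vd = (w * v[d]                          # "inertia" part
              + c1 * r1 * (pbest[d] - x[d])     # "cognition" part
              + c2 * r2 * (gbest[d] - x[d]))    # "social" part
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```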

CLPSO algorithm
The CLPSO iteration function differs from that of the standard PSO.
v_id(t+1) = ω·v_id(t) + c1·r1·(p'_id(t) − x_id(t))   (3)

where ω = ω_max − (t/max_gen)·(ω_max − ω_min), with ω_max = 0.9 and ω_min = 0.4; p'_id(t) is the exemplar of the i-th particle in the d-th dimension. If the i-th particle has not updated its historical optimum (pbest) for more than the refreshing gap m consecutive iterations (usually m = 7), then for each dimension a random number in [0, 1] is generated; if this random number is less than pc(i) (Eq 5), then two other particles' historical optima are compared, and the better one is chosen as the exemplar in the d-th dimension. If all the exemplar dimensions of a particle are its own pbest, then one dimension is chosen at random to learn from the corresponding dimension of another particle's pbest.
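The exemplar-selection rule just described can be sketched as follows (a hedged illustration; the names and the minimization convention are our assumptions):

```python
import random

def choose_exemplar(i, pbests, fitness, pc, swarm_size, D):
    """Sketch of CLPSO exemplar construction for particle i.

    pbests: list of pbest positions; fitness: position -> value (minimized);
    pc: learning probability pc(i) for this particle.
    """
    exemplar = []
    learned = False
    for d in range(D):
        if random.random() < pc:
            # tournament between two other particles' pbests
            a, b = random.sample([k for k in range(swarm_size) if k != i], 2)
            winner = a if fitness(pbests[a]) < fitness(pbests[b]) else b
            exemplar.append(pbests[winner][d])
            learned = True
        else:
            exemplar.append(pbests[i][d])
    if not learned:
        # all dimensions are the particle's own pbest: force one random
        # dimension to learn from another particle's pbest
        d = random.randrange(D)
        j = random.choice([k for k in range(swarm_size) if k != i])
        exemplar[d] = pbests[j][d]
    return exemplar
```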

Lagrange interpolation
The idea of Lagrange interpolation is to use a polynomial to represent the relationship among a set of data points. For example, when observing a physical quantity, if we obtain different values at different places, a polynomial through them can be constructed by the Lagrange interpolation method. The method is mainly used for data fitting in engineering experiments; it is a kind of smooth curve-fitting method. In general, if the values y_0, y_1, ..., y_n of a function y = f(x) at n+1 points x_0, x_1, ..., x_n are known, then a polynomial y = P_n(x) is sought that passes through the n+1 points and whose degree is at most n.
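For instance, the interpolating polynomial through n+1 points can be evaluated directly from the Lagrange basis (a generic sketch, not code from the paper):

```python
def lagrange_poly(xs, ys):
    """Return P(x), the Lagrange interpolating polynomial through (xs[k], ys[k])."""
    def P(x):
        total = 0.0
        for k, (xk, yk) in enumerate(zip(xs, ys)):
            # Lagrange basis polynomial: 1 at xk, 0 at every other node
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != k:
                    basis *= (x - xj) / (xk - xj)
            total += yk * basis
        return total
    return P
```

With three points this produces the parabola used by the interpolation steps later in the paper.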

Local search with Lagrange interpolation (LSLI)
The main idea of CLPSO's search method is to let each particle visit as many local optima as possible. Hence, Eq 3 of CLPSO drops the global-optimum term that appears in Eq 1 of PSO. Moreover, after learning from another particle's historical optimum, CLPSO sets a gap value m to digest this information. These procedures undoubtedly slow the swarm's convergence, but they favour the solution of multimodal functions.
To accelerate convergence, we decided to add a local search to CLPSO. According to [35], the subgradient technique can find the convergence direction easily. However, for discontinuous and nondifferentiable problems, the direction obtained will mislead the convergence; in addition, the step size is hard to decide. The technique of perturbation within a neighborhood is not influenced by the form of the function, but it has no convergence direction and usually needs many additional function evaluations (FEs). Hence, we adopt a local search based on Lagrange interpolation to mitigate these problems.
For the j-th dimension of the gbest, we select three points to generate the information and perform Lagrange interpolation. One point is the gbest itself; the other two points are perturbations near the gbest. The perturbation value is denoted by delta, as shown in Eq 7.
x_0(j) = gbest(j),  x_1(j) = gbest(j) + delta,  x_2(j) = gbest(j) − delta,

where delta = η·v(i, j); v(i, j) is the j-th dimension velocity of the particle with the best fitness in each iteration, and η is a very small coefficient. In this research, we set η = 0.5/N, where N is the particle swarm size. In the j-th dimension, the three points determine a parabola by the Lagrange interpolation method, and its minimum point is the desired candidate. Eqs 9 and 10 give the Lagrange interpolation computation.
where y_0 = fitness(x_0), y_1 = fitness(x_1), and y_2 = fitness(x_2); after the calculation, we obtain a quadratic polynomial y = a·x² + b·x + c. Fig 1 shows the different cases of the desired solution. When I ≠ 0, which means the three points are distinct, if a > 0 then two cases are possible: either the gbest lies between x_1 and x_2, or the gbest lies on one side of x_1 and x_2. In both cases we choose the minimum point −b/(2a) as the candidate solution and compare it with the gbest. If a < 0, we randomly generate a point near the best of the three by Eq 11, where x_min, x_mid and x_max are the points x_0, x_1 and x_2 sorted by their fitness. If a = 0, the three points form a line. If a = 0 and b = 0, then y_0 = y_1 = y_2, and we randomly select a position from the search area, denoted x_3 (Eq 12), where (x_max + x_min)/2 is the search center; because −0.5 < rand − 0.5 < 0.5, the random search radius ranges from 0 to half of the area. If a = 0 and b ≠ 0, the solution is chosen by Eq 11. When I = 0, which means v(i, j) = 0, the procedure terminates.
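Putting the cases together, one dimension of LSLI can be sketched as below. The parabola coefficients follow the standard three-point formulas; the exact sampling rules of Eqs 11 and 12 are paraphrased here, so the fallback branches should be read as assumptions:

```python
import random

def lsli_step(g, delta, f):
    """One-dimensional LSLI sketch: fit a parabola through the gbest and its
    two perturbations, then pick a candidate point by the cases above.
    g: current gbest coordinate; f: one-dimensional fitness (illustrative)."""
    if delta == 0:
        return g                        # I = 0 case: v(i, j) = 0, terminate
    x0, x1, x2 = g, g + delta, g - delta
    y0, y1, y2 = f(x0), f(x1), f(x2)
    # quadratic y = a*x^2 + b*x + c through the three points
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    if a > 0:
        return -b / (2 * a)             # vertex of the upward parabola
    x_min, x_mid, x_max = sorted([x0, x1, x2], key=f)  # sorted by fitness
    if a < 0 or b != 0:
        # Eq 11 (paraphrased): a random point near the best of the three
        return x_min + (random.random() - 0.5) * delta
    # a == 0 and b == 0: flat fitness, sample around the area's center (Eq 12)
    lo, hi = min(x0, x1, x2), max(x0, x1, x2)
    return (lo + hi) / 2 + (random.random() - 0.5) * (hi - lo)
```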
The flowchart of Lagrange interpolation is shown in Fig 2. Compared with other local search techniques, Lagrange interpolation has three characteristics. First, the method converges very fast, especially for uni-modal functions. For example, for the Sphere function, whose global optimum is [0, ..., 0] in 10 dimensions, suppose the gbest is [1, ..., 1] and delta is 2; after one Lagrange interpolation computation, the next gbest is [0, ..., 0]. Second, the Lagrange interpolation in each dimension costs three additional FEs, so for a D-dimensional problem the additional FEs for the gbest total 3 × D. Third, Lagrange interpolation is only a local search: it does not break the diversity of the whole particle swarm, hence it preserves CLPSO's search ability for multi-modal functions.

Lagrange interpolation learning (LIL)
In CLPSO, if the i-th particle's pbest does not update for a certain number of iterations, then two other particles' pbests are compared, and the better one becomes the exemplar. However, if the exemplar is still worse than the i-th particle's own pbest, then with a large probability this particle will remain stagnant until the next time flag(i) > m. Hence, the most direct way to improve search efficiency is to set the gbest as the exemplar. Nevertheless, the so-called gbest during the computation is not the real global optimum; rather, it may merely be the best of the many local optima the program has found so far. If we set the gbest as the exemplar directly, as in traditional PSO, premature convergence will occur. To avoid both stagnation and premature convergence, we require the exemplar's information to have two characteristics. First, the fitness of the exemplar should be at least better than the i-th particle's pbest. Second, the information from the exemplar should retain enough diversity that it cannot lead all the particles to fly prematurely into the same area. Hence, we select three points to generate this information: one point is the i-th particle's own pbest, another is the gbest, and the last is the pbest of a randomly chosen particle other than the i-th. In the d-th dimension, the three points determine a parabola by the Lagrange interpolation method, and its minimum point is the desired exemplar value. Fig 3 shows the difference between CLPSO and LIL.
This search behaviour has three strengths. First, the exemplar is the minimum of the Lagrange interpolation through three points, two pbests and one gbest, so the learning direction always points toward a theoretical point that is better than the gbest. Second, most local searches must spend additional function evaluations (FEs) to obtain key information, whereas LIL does not require any additional FEs. Third, after performing the LSLI described in Section 3.1 we obtain a better gbest, and LIL can share this information with the other particles as soon as possible.
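One dimension of the LIL exemplar construction might look like this; because the three fitness values are already stored alongside the pbests and the gbest, no extra FEs are spent (the argument names and the degenerate-case fallbacks are our assumptions):

```python
def lil_exemplar_dim(x0, y0, x1, y1, x2, y2):
    """LIL sketch for one dimension d: fit a parabola through
    (pbest_i[d], f(pbest_i)), (gbest[d], f(gbest)), (pbest_r[d], f(pbest_r))
    using the already-stored fitness values, and return its vertex.
    x1 is the gbest coordinate, used as a fallback (assumption)."""
    if len({x0, x1, x2}) < 3:
        return x1  # coincident coordinates: fall back to the gbest value
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    if a > 0:
        return -b / (2 * a)  # vertex: the theoretically better point
    return x1                # non-convex fit: fall back to the gbest value
```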

Parametric settings
For convenience of computation, we run LSLI N times in total, where N equals the particle swarm size. For a fair comparison, noting that each LSLI run costs 3 × D additional FEs and that the total FEs of all compared algorithms should be equal, we shorten the maximum iteration number (max_gen) to max_gen − 3 × D. The FEs spent on LSLI are then 3 × D × N, and the FEs spent on the remaining updates are (max_gen − 3 × D) × N; summing them gives the total FEs max_gen × N, the same as CLPSO.
When should LSLI run? We set a gap g = floor(maxDT/N), where maxDT = max_gen − 3 × D. Hence, every time the particle swarm has been updated g times by Eqs 3 and 4, LSLI runs once.
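The FE bookkeeping above amounts to the following arithmetic (a small sketch; the function name is ours):

```python
def lsli_schedule(max_gen, D, N):
    """Return the LSLI gap g and the total FEs under the shortened budget."""
    maxDT = max_gen - 3 * D      # iterations kept for normal swarm updates
    g = maxDT // N               # run LSLI once every g swarm updates
    fes_lsli = 3 * D * N         # N LSLI runs, 3*D extra FEs each
    fes_updates = maxDT * N      # one FE per particle per iteration
    return g, fes_lsli + fes_updates
```

For example, with max_gen = 2000, D = 10 and N = 50, the total is 1,500 + 98,500 = 100,000 FEs, i.e. max_gen × N, matching CLPSO's budget.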
In CLPSO, the learning probability pc(i) is set as in Eq 5. In this research, we chose a linear probability form from Ref. [28] to increase the learning chance.
To compare with OLPSO [36], we set both c1 and c2 to 2, ω_max = 0.9 and ω_min = 0.4. The settings of the search-space bounds and of each particle's velocity bounds affect the search procedure; hence, we chose the same velocity boundary setting as that used in most of the algorithms.

Numerical experiments
Seventeen functions are collected from [17] [19] [37] [38] [39], as presented in Table 1, where F1, F3, F4 and F5 are uni-modal functions; Rosenbrock (F2) is a multi-modal function with a narrow valley in which the global optimum is hard to reach; F5 is a noisy function that also poses a discrete problem; F6–F11 are unrotated multi-modal functions; F12 and F13 are rotated multi-modal functions; and F14–F16 are shifted rotated multi-modal functions. The orthogonal matrix M is generated according to [38].
Since other algorithms such as PSO-cf-local [36], UPSO [40], FIPS [41] and DMSPSO [42] were shown in [19] to be inferior to CLPSO, in this research we only compare LILPSO with CLPSO, ECLPSO, OLPSO and DNLPSO, whose iterative forms are presented in Table 2. The parametric setting of ECLPSO is the same as that of ECLPSO-4 in [22]. Because we do not know the exact orthogonal matrix for OLPSO, we take its results directly from [19]. For DNLPSO, we do not know the exact topology data, hence we take its results directly from [26]. To verify the performance of LSLI and LIL separately, we test two algorithms, LILPSO1 and LILPSO2, where LILPSO1 adopts only the LSLI technique, and LILPSO2 adopts both the LSLI and LIL techniques.
For 10-D problems, the following parameters are used: particle number = 50, iteration number = 2000, and FEs = 100,000; for 30-D problems: particle number = 40, iteration number = 5000, and FEs = 200,000; for 50-D problems: particle number = 100, iteration number = 5000, and FEs = 500,000. For each function, each algorithm runs 25 times, and the solutions are analysed using the two-tailed t-test with a confidence level of 0.05; '+', '-' and '=' denote that LILPSO is statistically better than, worse than and equal to the other algorithm, respectively.
Tables 3-8 list the results for 10D, 30D and 50D, respectively, and Fig 5(a)-5(f) shows the convergence curves of these algorithms on several functions.
For the uni-modal, low-dimensional problems, DNLPSO performs relatively better. For the noisy function F5, LILPSO performs better. For the uni-modal function F3, DNLPSO and LILPSO obtain statistically equal solutions. Because DNLPSO's iterative form contains the gbest part, it naturally converges fast on uni-modal functions; besides, its neighbourhood topology effectively reduces the search space. However, for the high-dimensional problems, i.e. 50D, LILPSO obtains all the best solutions, which illustrates that the Lagrange interpolation technique converges fast on complex high-dimensional problems. For the multi-modal problems, DNLPSO is inferior to LILPSO on almost all the functions. Comparing OLPSO with LILPSO on the 30D problems: for F2 and F8, LILPSO obtains better solutions; for F6, F7, F9, F10 and F11, LILPSO obtains solutions statistically equal to OLPSO's, which illustrates that the Lagrange interpolation technique can help accelerate the pbests' convergence when performing a local search. For F12, F13 and F14, OLPSO is superior to LILPSO, which illustrates that LSLI is limited on rotated problems, although it is still better than CLPSO.
Comparing LILPSO1 with LILPSO2: for most problems, LILPSO2 is superior to LILPSO1, which illustrates that LIL helps share LSLI's information to accelerate convergence. Meanwhile, unlike the gbest part of DNLPSO, LIL neither breaks the diversity of the swarm nor leads the particles to premature convergence. However, for rotated functions such as F13, F14, F16 and F17, the solutions of LILPSO1 are better than those of LILPSO2, which illustrates that LIL is not well suited to rotated problems either.
Comparing CLPSO with LILPSO, LILPSO obtains better solutions than CLPSO on all functions, which illustrates that Lagrange interpolation is a stable and efficient local search technique.
Comparing OLPSO with LILPSO, both inherit CLPSO's advantage in solving multi-modal problems; nevertheless, LILPSO is clearly superior to OLPSO in solving uni-modal problems. Moreover, OLPSO-L needs O(2^⌈log2(D+1)⌉ · D + N·D) memory space to store its algorithm-related data structures, which implies a longer runtime for complex real-world problems.

Application for PID control
As an example, we take the fan speed system controlled by oil in an air turbofan launcher. The transfer function model of the system is:

G(s) = (1.192s + 6.273) / (s² + 7.167s + 12.84)

The control signal is computed by the discrete PID control equation. The objective function is a weighted combination of the error |e(t)|, the rise time t_u, and the controller output u(t), with weights ω1, ω2 and ω3. To suppress system overshoot, a penalty function is used: once overshoot occurs, a heavily weighted overshoot term is added to the objective function. The parameters are set as: swarm size N = 30, function evaluations FEs = 3000. The results are shown in Table 9, which illustrates that the results optimized by LILPSO2 have the smallest objective function value and the smallest mean error. Hence, the LILPSO2 algorithm is more efficient.
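Since the exact equations are not reproduced here, the following is one plausible reading of such a weighted objective, with the integral discretized as a sum; the combination of terms and all names are our assumptions, not the paper's exact formula:

```python
def pid_objective(errors, outputs, dt, t_rise, w1, w2, w3):
    """Sketch of a weighted PID tuning objective: discretized integral of
    w1*|e(t)| + w2*u(t)^2, plus w3 times the rise time t_u (all assumed)."""
    integral = sum((w1 * abs(e) + w2 * u * u) * dt
                   for e, u in zip(errors, outputs))
    return integral + w3 * t_rise
```

On overshoot, a penalty can be realised by adding a heavily weighted overshoot term to the returned value, as the text describes.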

Conclusions
In this study, we proposed a novel method, LILPSO, that improves upon the state-of-the-art CLPSO method. First, the Lagrange interpolation approach is introduced to perform a local search near the gbest and help accelerate convergence. Second, the same technique replaces the simple comparison used in CLPSO to obtain a better exemplar. Numerical experiments show that LILPSO is superior to CLPSO, ECLPSO, DNLPSO and OLPSO on most of the test functions considered, and that Lagrange interpolation is an efficient local search approach except on rotated problems. Future work will apply this method to other fields.
Supporting Information S1