Testing Nelder-Mead Based Repulsion Algorithms for Multiple Roots of Nonlinear Systems via a Two-Level Factorial Design of Experiments

This paper addresses the challenging task of computing multiple roots of a system of nonlinear equations. A repulsion algorithm that invokes the Nelder-Mead (N-M) local search method and uses a penalty-type merit function based on the error function, known as ‘erf’, is presented. In the N-M algorithm context, different strategies are proposed to enhance the quality of the solutions and improve the overall efficiency. The main goal of this paper is to use a two-level factorial design of experiments to analyze the statistical significance of the observed differences in selected performance criteria produced when testing different strategies in the N-M based repulsion algorithm.


Introduction
In this paper, we aim to investigate the performance of a repulsion algorithm that is based on a penalty function and the Nelder-Mead (N-M) [1] local search procedure to compute multiple roots of a system of nonlinear equations of the form (1). The system is reformulated as a global optimization problem, in the sense that the two problems have the same solutions. Thus, a global minimizer, and not just a local one, of the function M(x), known as merit function (2), in the set Ω, is required. Although finding a single root of a system of nonlinear equations is a trivial task, finding all roots is one of the most demanding problems. Multistart methods are stochastic techniques that have been used to compute multiple solutions to problems [2,3]. In a multistart strategy, a local search procedure is applied to a set of randomly generated points of the search space to converge sequentially, along the iterations, to the multiple solutions of the problem in a single run. However, the same solutions may be located over and over again along the iterations, resulting in a very expensive process. Other recent approaches combine metaheuristics with techniques that modify the merit function (2) as solutions are being found [4][5][6][7][8]. Mostly, the techniques rely on the assignment of a penalty term to each previously computed root, so that a repulsion area around each previously computed root is created. The repulsion areas force the algorithm to move to other areas of the search space and look for other roots, avoiding repeated convergence to already located solutions. The main idea of this type of method, designated by repulsion method, is to solve a sequence of global optimization problems aiming to minimize the modified merit function M̄ that creates a repulsion area around each computed solution ξ_i, for i = 1, . . ., N_r, where N_r represents the number of previously computed minimizers, as follows:

min_{x ∈ Ω ⊆ R^n} M̄(x) ≡ M(x) + P(x; ξ_i, · · ·).   (3)

The function P(x; ξ_i, · · ·) is the penalty term that depends on ξ_i and one or more parameters [6][7][8].
The goal of these parameters is to adjust the penalty for already located minimizers. They may be used to reduce the radius of the repulsion area or to highly penalize the proximity to located solutions. In this study, we further explore this penalty-type approach to create repulsion areas around previously detected roots and propose a repulsion algorithm that is capable of computing multiple roots of a system of nonlinear equations invoking the N-M local procedure with modified merit functions. The herein proposed repulsion algorithm is of a stochastic nature in the sense that a sequence of points is randomly generated inside the search space Ω, aiming to increase the exploration ability of the algorithm. The exploitation ability of the algorithm is carried out by implementing the N-M local search starting from each of the generated points. The N-M method is a derivative-free local search procedure that is capable of converging to a minimizer of a merit function, provided that a good initial approximation is given, with a reduced computational effort when compared with most metaheuristics available in the literature.
Design of experiments (DoE) is a powerful statistical tool that is capable of developing an experimentation strategy to learn and identify crucial factors that influence experimental data, using a minimum of resources [9,10]. Our main challenge here is to analyze the effect of imposing slightly different strategies on the classical N-M algorithm, namely:
• randomly generating points for the initial simplex when they fall outside the search space Ω;
• dynamically setting the simplex parameters used to generate the expansion, contraction and shrinkage vertices;
• generating renewal positions for randomly selected vertices of the simplex to overcome simplex degeneracy;
• generating vertices around the best vertex according to the Lévy distribution, to replace the classical shrinkage of the simplex.
We aim to conclude whether the observed differences in the performance, measured by different criteria, are statistically significant, or whether they are just explained by normal variation, such as noise. The usual performance criteria measure robustness and efficiency. For robustness, we consider the percentage of runs that converge to all roots of the system of equations, and for efficiency, we use the number of function evaluations and the time required to compute each root. The paper is organized as follows. First, we address the proposed repulsion algorithm based on an 'erf' penalty merit function; second, a summary of the main steps of the classical N-M algorithm, as well as the description of different strategies for testing and performance assessment, are presented. Afterwards, a two-level factorial DoE is introduced and the corresponding statistical analysis is carried out. Finally, a performance comparison is reported and the conclusions are presented.

Repulsion algorithm
This section aims to briefly describe a penalty-type approach to create repulsion areas around previously computed roots of a system of nonlinear equations, so that convergence to already located solutions is avoided. Algorithms based on repulsion merit functions have been used for locating multiple roots of systems of equations [6,8]. The merits of a repulsion algorithm are that multiple roots can be located with a reduced computational effort in a single run of the algorithm. The pseudo code of the repulsion algorithm is presented in Algorithm 1 in Table 1.
The algorithm solves a sequence of global optimization problems by invoking a local search procedure that is capable of locating a local minimizer of a merit function. This function is modified as the roots are located (see the corresponding steps of Algorithm 1 in Table 1). The first call to the local search considers the original merit function (2). Thereafter, the merit function is modified to avoid locating previously computed minimizers.
In the algorithm, 'k_R' is the iteration counter, 'X' is the set where the roots are saved, 'N_r' contains the number of located roots so far, 'r_1, . . ., r_{N_r}' are variables that contain the number of times each root is relocated, and 'k_uns' represents the number of consecutive unsuccessful iterations, i.e., consecutive iterations that are not able to find any new root. To identify a root of the system, the algorithm requires that its merit function value is under 10^−10. A located root ξ is considered to be different from the previously computed roots ξ_i, i = 1, . . ., N_r (saved in X) if ‖ξ − ξ_i‖ > ε, for all i = 1, . . ., N_r, where ε is a small positive error tolerance.
The repulsion algorithm terminates when a maximum number of iterations, k_Rmax, is exceeded or when a pre-specified number of consecutive iterations (15, in this algorithm) is not able to locate any new root. Thus, if k_R > k_Rmax or k_uns > 15, the algorithm stops.
The algorithm explores the search space by randomly generating points in Ω in a sequential manner (see the sampling steps of Algorithm 1 in Table 1). Starting from the sampled point y, the algorithm exploits the region in order to locate a minimizer of the merit/modified merit function by invoking a local search procedure (see the local search steps of Algorithm 1 in Table 1). Since the goal is to produce a derivative-free effective algorithm for locating multiple roots, our proposal for computing a minimizer of the merit function, given a sampled point y, is the N-M local procedure. This is the subject of the next section.
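To make the explore/exploit cycle concrete, the following is a minimal sketch of the outer loop of Algorithm 1 on a one-dimensional toy system f(x) = x² − 1 = 0 (roots at ±1). It is not the paper's implementation: `local_search` is a crude pattern search standing in for the N-M procedure, the penalty is an erf-type term with the limiting behavior described in the text, and all names and constants are illustrative assumptions.

```python
import math
import random

random.seed(1)

def merit(x):
    # merit function for the toy system f(x) = x^2 - 1 = 0 (roots at -1 and 1)
    return (x * x - 1.0) ** 2

def penalty(x, roots, delta=100.0, rho=0.1):
    # erf-type repulsion: ~delta at a located root, decaying to 0 away from it
    return sum(delta * (1.0 - math.erf(abs(x - xi) / rho)) for xi in roots)

def local_search(f, x, step=0.5, iters=200):
    # crude derivative-free descent, standing in for the N-M local procedure
    for _ in range(iters):
        best = min((x - step, x, x + step), key=f)
        if best == x:
            step *= 0.5          # stuck: refine the step
        x = best
    return x

def repulsion(lo=-2.0, hi=2.0, k_max=50, k_uns_max=15, eps=0.05):
    roots, k_uns = [], 0
    for _ in range(k_max):
        y = random.uniform(lo, hi)                                    # explore
        xi = local_search(lambda x: merit(x) + penalty(x, roots), y)  # exploit
        if merit(xi) < 1e-10 and all(abs(xi - r) > eps for r in roots):
            roots.append(xi)     # a new root was located
            k_uns = 0
        else:
            k_uns += 1
            if k_uns > k_uns_max:
                break            # too many consecutive unsuccessful iterations
    return sorted(roots)

print(repulsion())
```

Note how the modified merit passed to the local search grows as roots accumulate, which is exactly the mechanism that steers later searches away from already located solutions.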
The herein implemented penalty-type merit function is based on the error function, known as 'erf', a mathematical function defined by the integral

erf(x) = (2/√π) ∫₀ˣ e^{−t²} dt,

which satisfies the properties erf(0) = 0, erf(−∞) = −1, erf(+∞) = 1 and erf(−x) = −erf(x). In a penalty function context, aiming to prevent convergence to a located root ξ_i, thus defining a repulsion area around it, the newly developed 'erf' penalty term is

P(x; ξ_i, δ, ρ) = δ (1 − erf(‖x − ξ_i‖ / ρ)),   (4)

which depends on the parameter δ > 0, to scale the penalty for approaching the already computed solution ξ_i, and on the parameter ρ, to adjust the radius of the repulsion area. To better understand the effect of the 'erf' penalty around 0, we plot in Fig 1 the function (4) for different values of the parameter δ (1, 10 and 100). We note that the penalty term tends to δ when x approaches a root ξ_i, meaning that δ should be made large enough so that ξ_i is no longer a minimizer of the modified penalty merit function. Further, as x moves away from ξ_i, the penalty P(x; ξ_i, δ, ρ) → 0, meaning that the modified merit function M̄(x) = M(x) + P(x; ξ_i, δ, ρ) is not affected when x is far from a previously located root.
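The limiting behavior of the penalty can be checked numerically with Python's `math.erf`. The sketch below uses one penalty with the stated properties (the exact expression of Eq (4) may differ from this assumption) and reproduces, qualitatively, the Fig 1 experiment for δ in {1, 10, 100}:

```python
import math

def erf_penalty(x, xi, delta, rho):
    # a penalty with the limiting behavior described in the text (the exact
    # expression of Eq (4) may differ): equals delta at x = xi and decays
    # to 0 as x moves away; rho scales the radius of the repulsion area
    return delta * (1.0 - math.erf(abs(x - xi) / rho))

# penalty around a root at 0: value at the root and far from it
for delta in (1.0, 10.0, 100.0):
    at_root = erf_penalty(0.0, 0.0, delta, 0.1)
    far_away = erf_penalty(5.0, 0.0, delta, 0.1)
    print(delta, at_root, round(far_away, 9))
```

Since erf(0) = 0, the penalty is exactly δ at a located root, and since erf(·) → 1, it vanishes a few radii away, leaving the original merit function untouched there.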

N-M Algorithm Variants
The N-M algorithm, also known as the simplex search algorithm, originally published in 1965 [1], is probably the best known algorithm for multidimensional derivative-free optimization, and is an improvement over the method of Spendley et al. [11].
In R^n, the N-M algorithm uses a set of n+1 vertices, denoted by x_1, x_2, . . ., x_n, x_{n+1}, to define a simplex, which is an n-dimensional polytope, the convex hull of its n+1 vertices. The algorithm moves at least one vertex per iteration, defining a new simplex for the next iteration. The main idea is to maintain at each iteration a nondegenerate simplex, i.e., a geometric figure in R^n of nonzero volume. In [11], either a reflection or a shrink step is performed, so that the shape of the simplex does not change along the iterative process. In general, a reflection step is performed when the current simplex is far from the solution, while a shrink step is performed when the simplex is close to the solution. In terms of computational effort, a reflection step requires only one or two function evaluations, while a shrink step requires n function evaluations.
The initial simplex can be computed by performing small perturbations around an initial guess point. Thus, given x_1, an approximation to the minimizer of M (or, possibly, a randomly generated point in Ω), the other n vertices are usually generated along the unit coordinate vectors e_j ∈ R^n with a fixed step size ε̄ ∈ (0,1] (at an equal distance relative to x_1), as follows:

x_{j+1} = x_1 + ε̄ e_j, for j = 1, . . ., n.

It has been observed that using short edges in the initial simplex is a good strategy for small dimensional problems [12]. Although this is the most popular rule to generate the initial simplex, when the problem contains bound constraints, some (or all) components of the other vertices x_j (j = 2, . . ., n+1) may fall outside the search space Ω, even if x_1 is inside. Two possible ways of restoring feasibility are: (i) project the component of the point onto the boundary of Ω, as shown in Algorithm 2 in Table 2; or (ii) randomly generate the component of the point inside Ω, as described in Algorithm 3 in Table 3, where τ is a uniformly distributed random number in (0,1). To define a set of candidate points to be a vertex of the simplex of the next iteration, the simplex is ordered so that M(z_1) ≤ M(z_2) ≤ · · · ≤ M(z_n) ≤ M(z_{n+1}), where z_1 is termed the best vertex, z_{n+1} the worst vertex and z_n the second-worst (or next-worst) vertex of the simplex. Note that the simplex ordering relabels the vertices according to merit function value, and they are now represented by z_1, z_2, . . ., z_n, z_{n+1}. Let the centroid of the n best vertices be defined by x̄ = (1/n) Σ_{i=1}^{n} z_i. The candidate points are obtained, componentwise, as

x_j = x̄_j + γ (x̄_j − z_{n+1,j}),   (5)

for j = 1, . . ., n, where x_r, x_e, x_oc and x_ic, usually referred to as reflection vertex, expansion vertex, outer contraction vertex and inner contraction vertex, are obtained by setting γ to γ_r, γ_e, γ_oc and γ_ic, respectively. We remark that the algorithm checks if each component of the vector in (5) stays inside Ω; otherwise, a projection onto the boundary is carried out.
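The two feasibility-restoring rules can be sketched as follows. This is an illustrative stand-in for Algorithms 2 and 3 (the helper name, box-bound representation and step size are assumptions, not the paper's code):

```python
import random

random.seed(0)

def initial_simplex(x1, lo, hi, step=0.1, mode="project"):
    """Build n+1 vertices: x1 plus x1 + step*e_j along each unit coordinate
    vector e_j. Feasibility is restored either by projecting the offending
    component onto the box boundary (Algorithm 2-style) or by redrawing it
    uniformly inside the bounds (Algorithm 3-style)."""
    n = len(x1)
    simplex = [list(x1)]
    for j in range(n):
        v = list(x1)
        v[j] += step                       # perturb the j-th coordinate only
        if not (lo[j] <= v[j] <= hi[j]):   # vertex component left the box
            if mode == "project":
                v[j] = min(max(v[j], lo[j]), hi[j])               # project
            else:
                v[j] = lo[j] + random.random() * (hi[j] - lo[j])  # redraw
        simplex.append(v)
    return simplex

# x1 close to the upper bound of the first coordinate, so x2 is repaired
print(initial_simplex([0.95, 0.5], [0.0, 0.0], [1.0, 1.0], mode="project"))
```

With `mode="project"` the repaired vertex sits exactly on the boundary; with the redraw mode the simplex keeps a random, strictly interior component instead.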
The rules that are used to accept one of the above referred candidate points are directly related to their objective function values relative to the function values of the best, second-worst and worst vertices. These rules are shown in Algorithm 4 in Table 4 and follow [13]. There might be some differences in the literature, for instance in [1]. When one of these vertices is accepted, the worst vertex is discarded and the new point takes its position as a vertex of the simplex for the next iteration.
In classical versions of the N-M method [1,[13][14][15][16][17], when none of these vertices is accepted, according to the rules described in Algorithm 4 in Table 4, the simplex is shrunk towards the best vertex by using

z_{ij} = z_{1j} + γ_s (z_{ij} − z_{1j}),   (6)

for i = 2, . . ., n+1 and j = 1, . . ., n. According to [1], the simplex γ parameters that give the step size to generate the points in (5) and (6) must satisfy 0 < γ_r < γ_e, γ_e > 1, −1 < γ_ic < 0 < γ_oc < 1 and 0 < γ_s < 1. The most commonly used values for the parameters are:

γ_r = 1, γ_e = 2, γ_oc = 0.5, γ_ic = −0.5, γ_s = 0.5.   (7)

The examples reported in [15], showing that N-M may converge to a non-critical point of a strictly convex objective function, are well-known in the community. N-M is one of the most popular direct search methods and has been shown to behave rather well when solving small dimensional problems. Although it has been used in many scientific and engineering applications, there are just a few works on the convergence properties of the algorithm [12,13,15,16]. The performance of N-M gets worse as n increases. The authors in [12] prove that the efficiency of the expansion and contraction steps diminishes as n increases when N-M relies on the standard parameter values (see (7)) to solve uniformly convex objective functions. Thus, values for the γ parameters for expansion, contraction and shrinking, dependent on n, are proposed in [12] to improve convergence. The golden ratio, represented by the Greek letter Φ, has been used in many practical situations to bring balance and harmony. It is a real number, Φ = (1+√5)/2, and is the unique positive solution of the quadratic equation Φ² − Φ − 1 = 0. Here, we propose to define the N-M γ parameters using the golden value Φ and the dimension n, in order to have balanced expanded and reduced simplices; the resulting values are given in (8). Note that we propose to use γ_r = 1, in order to keep the isometric reflection, since this is crucial for good performance [12,16].
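One full N-M iteration with the candidate points (5), the shrink step (6) and the classical parameter values (7) can be sketched as below. The acceptance rules follow the standard ones of [13]; the function and simplex are toy inputs, and the dimension-dependent γ values of Eq (8) could be passed in place of the defaults:

```python
def nm_iteration(simplex, f, g_r=1.0, g_e=2.0, g_oc=0.5, g_ic=-0.5, g_s=0.5):
    """One Nelder-Mead iteration (unconstrained): candidates are
    x(gamma) = c + gamma*(c - worst), accepted by the standard rules of [13]."""
    simplex = sorted(simplex, key=f)     # z1 best ... z_{n+1} worst
    n = len(simplex) - 1
    c = [sum(v[j] for v in simplex[:n]) / n for j in range(n)]   # centroid
    cand = lambda g: [c[j] + g * (c[j] - simplex[-1][j]) for j in range(n)]
    shrink = lambda: [simplex[0]] + [
        [simplex[0][j] + g_s * (v[j] - simplex[0][j]) for j in range(n)]
        for v in simplex[1:]]
    xr = cand(g_r)
    if f(simplex[0]) <= f(xr) < f(simplex[-2]):   # accept reflection
        simplex[-1] = xr
    elif f(xr) < f(simplex[0]):                   # try expansion
        xe = cand(g_e)
        simplex[-1] = xe if f(xe) < f(xr) else xr
    elif f(xr) < f(simplex[-1]):                  # outer contraction
        xoc = cand(g_oc)
        if f(xoc) <= f(xr):
            simplex[-1] = xoc
        else:
            simplex = shrink()                    # fall back to (6)
    else:                                         # inner contraction
        xic = cand(g_ic)
        if f(xic) < f(simplex[-1]):
            simplex[-1] = xic
        else:
            simplex = shrink()
    return sorted(simplex, key=f)

f = lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2   # toy quadratic, min at (1, -2)
s = [[0.0, 0.0], [1.2, 0.0], [0.0, 0.8]]
for _ in range(60):
    s = nm_iteration(s, f)
print([round(x, 3) for x in s[0]])
```

For the constrained setting of the paper, each candidate component would additionally be projected onto the boundary of Ω before evaluation, as remarked in the text.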
We also remark that the expansion, contraction and shrinkage vertices are obtained from (5) and (6), using (8) to define the associated parameters, dependent on the problem's dimension. The goal is to reduce the simplex in a smoother way. The major difficulty with the N-M method is that a sequence of simplices can come arbitrarily close to degeneracy. Let V be the n×n matrix with the n-vector columns v_1, v_2, . . ., v_n, where

v_i = z_{i+1} − z_1, i = 1, . . ., n,   (9)

and let det(V) denote the determinant of V. To keep the interior angles of the simplex bounded away from 0, the condition

|det(V)| ≥ ν   (10)

has to be maintained along the process, where ν is a small positive constant. When (10) fails to be satisfied, the simplex has collapsed and needs to be reshaped. The simplest option is to 'break' the iterative process and return to the main repulsion algorithm with no solution. Here, we are interested in testing a renewal strategy that keeps the best vertex and has a 75% chance of changing each of the remaining n vertices (i = 2, . . ., n+1), componentwise, as defined in (11), using again the golden ratio to balance the step size of the move around z_1. If the generated components fall outside Ω, a projection onto the boundary is carried out. When none of the x_r, x_e, x_oc and x_ic vertices is accepted, a procedure based on the random generation of points according to a Lévy distribution is also proposed. This is a stable and heavy-tailed distribution that is able to provide occasionally long movements. Similarly to the classical shrinkage procedure (6), the best vertex is maintained, and the other n vertices are generated around z_1 according to

z_{ij} = z_{1j} + σ_{ij} L(α),   (12)

for i = 2, . . ., n+1 and j = 1, . . ., n, where L(α) is a random number from the standard Lévy distribution, with location parameter equal to 0, a unit scale parameter, and shape parameter α = 0.5.
Again, if the components fall outside Ω, a projection onto the boundary is carried out. The vector σ_i in (12) represents the variation around z_1.
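The degeneracy test (10) and the Lévy-based regeneration can be sketched as follows. The determinant routine, the choice of σ and the clipping of long jumps are illustrative assumptions, not the paper's settings; sampling uses the fact that 1/Z² with Z ~ N(0,1) follows the standard Lévy (α = 0.5) distribution:

```python
import random

def det(M):
    # determinant via Gaussian elimination with partial pivoting
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            return 0.0
        if p != k:
            M[k], M[p] = M[p], M[k]
            d = -d
        d *= M[k][k]
        for i in range(k + 1, n):
            r = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= r * M[k][j]
    return d

def is_degenerate(simplex, nu=1e-8):
    # columns v_i = z_{i+1} - z_1: the simplex edge vectors from the best vertex
    z1, n = simplex[0], len(simplex[0])
    V = [[simplex[i + 1][j] - z1[j] for i in range(n)] for j in range(n)]
    return abs(det(V)) < nu            # condition (10) violated

def levy_regenerate(simplex, sigma=0.01):
    # keep z1, redraw every component of the other vertices with a standard
    # Levy (alpha = 0.5) step; sigma and the clipping are illustrative choices
    z1 = simplex[0]
    new = [list(z1)]
    for _ in simplex[1:]:
        new.append([z + sigma * min(1.0 / random.gauss(0.0, 1.0) ** 2, 100.0)
                    for z in z1])
    return new

flat = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # collinear vertices: det(V) = 0
print(is_degenerate(flat))
```

A collinear simplex fails the volume test, and the regenerated one keeps the best vertex while scattering the rest, occasionally far away thanks to the heavy Lévy tail.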

Two-Level Factorial DoE
In this section, we aim to use a two-level factorial DoE to analyze the effect of imposing the above referred strategies on the N-M algorithm, as well as to determine whether the observed differences in selected performance criteria are statistically significant or just due to random variation. Some notation and terminology related to factorial DoE will be introduced. In the previous section, we considered different strategies that can be implemented and manipulated within the N-M algorithm in order to improve its performance. We aim to analyze the effect of those strategies, hereafter denoted by 'factor levels' or 'treatments', on the performance of the algorithm. For the design of an experiment, the following main steps must be carried out [9,10]:
• formulation of the statistical hypotheses;
• definition of the factors and their levels (independent variables) and the measurements (response or dependent variables) to be recorded;
• specification of the number of experimental units;
• specification of the randomization process to assign the experimental units to the treatments;
• determination of the statistical analysis.
The simplest DoE involves randomly assigning experimental units to a factor with two or more levels. However, when more than one factor needs to be analyzed, a factorial DoE is preferable. The simplest one involves two factors, each at two levels, denoted by a 2×2 design. To analyze the two factor effects, named 'A' and 'B', a set of four experimental conditions should be analyzed. Let us denote the two levels of each factor by 'low' (or '−1') and 'high' (or '+1'). Then, the first experiment combines the low level of A with the low level of B, the second combines the high level of A with the low level of B, the third combines the low level of A with the high level of B, and the fourth combines the high levels of both A and B. Although factorial experiments can involve factors with different numbers of levels, we are interested in a design where all factors have two levels. We plan to use a factorial DoE with four factors, each at two levels, and overall 2^4 = 16 experimental conditions are to be conducted. In the statistical field, each experimental condition represents a different 'treatment' protocol.
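Enumerating the full 2^4 design is a one-liner; the sketch below lists all 16 treatment combinations of the factors A, B, C and D (labels only, the mapping to the algorithmic strategies comes later in the text):

```python
from itertools import product

# all 2^4 = 16 treatment combinations of factors A, B, C and D, each at
# levels -1 (low) and +1 (high), from the all-low to the all-high run
factors = "ABCD"
design = [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=4)]

print(len(design))
print(design[0])     # all factors at the low level
print(design[-1])    # all factors at the high level
```

The first and last entries match the standard layout described in the text: the sequence starts with all factors at the low level and ends with all factors at the high level.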
We have chosen sixteen problems with different characteristics, dimensions and different number of roots to test the 2 4 'treatments'. The problems and the number of roots are listed in Table 5 and are available in the literature [3,5,6,8,18,19]. Due to the stochastic nature of the repulsion algorithm, from which N-M is invoked, we run each 'treatment' 30 times with each problem. The results are averaged over the 30 runs for each problem.
Three response variables are considered to analyze the performance of the algorithm variants:
• the proportion of successful runs, ranging from 0 to 1, hereafter denoted by Y_p;
• the number of function evaluations to locate each root, ranging from 0 to 1500, denoted by Y_e;
• the time (in seconds) to locate each root, ranging from 0 to 1, denoted by Y_t.
To compute Y p , a run is considered to be successful if all the roots are located at that run. To take into consideration the diversity of problems, the values of the above response variables for the statistical analysis are averaged over the sixteen problems. At the end, for each 'treatment' an average value of a total of 480 results is considered in this experiment.

Parameters setting
All the experiments were carried out on a PC with an Intel Core 2 Duo E7500 processor at 2.9 GHz and 4 GB of memory. The algorithms were coded in Matlab Version 8.0.0.783 (R2012b). The parameters have been set after an empirical study. For Algorithm 1, we consider: k_Rmax = 50, ε = 0.005, δ = 100 and ρ = min{ρ̄, min_i ‖x − ξ_i‖}, where ρ̄ is set to 1 for the Geometry problem, to 0.1 for the Trans, Parsopoulos and NonDif problems, to 0.001 for NonD2, Floudas and Robot, and to 0.01 for the remaining problems. In the N-M algorithm, we use ε̄ = 0.1 and ν = 10^−8, and k_max, which depends on the problem's dimension, is set to 100n².
Table 5. Problems set.

Factor and interaction effects
The four factors, denoted for simplicity by A, B, C and D, that are manipulated in the N-M algorithm in order to analyze the effect on the performance of the repulsion algorithm are the following. Factor A is directly related with the initialization of the simplex. This is a factor of categorical type. Either Algorithm 2 in Table 2 or Algorithm 3 in Table 3 is analyzed. The low level (-1) of factor A represents the Algorithm 2 and the high level (+1) represents Algorithm 3.
Factor B is related with the values set to the simplex γ parameters and is a numerical type factor. The standard values in (7) define the low level and the values herein proposed in (8) define the high level of the factor.
Factor C is related with the strategy to be used to overcome the simplex degeneracy. This is also a categorical type factor. The low level represents the 'break' strategy and the high level represents the generation of new vertices using (11) with probability 0.75.
Factor D has to do with the adopted strategy when none of the reflection, expansion and contraction vertices is accepted. The low level represents the usual shrinkage strategy, as shown in (6), while the high level aims to represent the generation of new vertices by the Lévy distribution using (12).
The results obtained from the combination of the four factors, each at two levels, are shown in Table 6. The layout is a standard one, starting with all factors at low levels and ending with all factors at high levels. Interactions occur when the effect of one factor depends on the level of another. They cannot be detected by a one-factor-at-a-time experimentation. This factorial design also allows the estimation of six 2-factor interactions (AB, AC, AD, BC, BD, CD), four 3-factor interactions (ABC, ABD, ACD, BCD) and one 4-factor interaction (ABCD), a total of 15 effects. (This is the most we can estimate from a 'four factors at two levels' DoE, because 1 degree of freedom is used to estimate the overall mean.) The ± signs for interaction effects are computed by multiplying the signs of the factors involved in the interaction. After conducting the experiments, the observed averaged values of the response variables are reported in the last three columns of Table 6.
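The sign columns of all 15 estimable effects can be built exactly as described, by multiplying the sign columns of the factors involved. A short sketch (variable names are ours, not the paper's):

```python
from itertools import combinations, product

factors = "ABCD"
runs = list(product((-1, 1), repeat=4))       # the 16 treatments

# the sign column of any interaction is the elementwise product of the
# sign columns of its factors; 4 main effects + 11 interactions = 15 effects
columns = {}
for r in range(1, 5):
    for combo in combinations(range(4), r):
        name = "".join(factors[i] for i in combo)
        col = []
        for run in runs:
            s = 1
            for i in combo:
                s *= run[i]                   # multiply the factor signs
            col.append(s)
        columns[name] = col

print(len(columns))
print(columns["AB"][:4])
```

Every column is balanced (eight '+1' and eight '−1' entries), which is what makes each of the 15 effects independently estimable from the 16 runs.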
Replication is an important principle in experimentation and is concerned with using more than one experimental unit under the same conditions. When replication is used, error effects can be estimated. Although we have multiple response observations taken at the same factor levels, they cannot be considered replications, since replications would have to be recorded in a random order. Furthermore, the stochastic nature of the repulsion algorithm dictates that the experimental conditions would not be at all the same over the replications.
Let Ȳ_i represent the average value of all the obtained results when treatment i is conducted. To estimate the effect of each factor or interaction on each response variable (Y_p, Y_e and Y_t), we analyze the difference, or contrast, between the average of the highs and the average of the lows. Mathematically, the effect can be estimated using

E_e = (Σ_{i=1}^{2^{N_f}} ± Ȳ_i) / 2^{N_f − 1},   (13)

where N_f represents the number of factors in this two-level analysis, each ± sign corresponds to the sign of the respective factor/interaction column in Table 6, and Ȳ_1, . . ., Ȳ_{2^{N_f}} are the averaged values obtained for each associated response variable during the 2^{N_f} conducted 'treatments'. The estimated effects computed by (13) are shown in the last three rows of Table 6. Factor C has a high impact on the response variable Y_p (which aims to measure algorithm robustness), with E_e = 0.158, and factor B has a moderate effect on Y_p (E_e = 0.024). A simple analysis allows us to conclude that factor C produces the highest effect on the response variables. It is noticed that the results obtained when factor C is at the low level are consistently lower than those obtained when C is at the high level. This means that, when the simplex collapses, the strategy that generates new vertices using (11) with probability 0.75, when compared with the strategy that 'breaks' the cycle: (i) has a positive impact on Y_p (with E_e = 0.158), which is good since the percentage of successful runs increased; (ii) has a positive impact on Y_e (with E_e = 110.260), increasing the number of function evaluations, which means that the strategy is computationally more demanding; and (iii) also has a positive impact on Y_t (with E_e = 0.012), meaning that the time has increased, worsening efficiency.
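The contrast formula (13) can be checked on synthetic responses. In the sketch below the response values are fabricated for illustration (they are not the paper's Table 6 data): a +0.08 effect is planted on factor C and the estimator recovers it, while an inactive factor estimates to zero:

```python
from itertools import product

runs = list(product((-1, 1), repeat=4))      # 2^4 treatments for (A, B, C, D)

# synthetic averaged responses: 0.5 plus a true +0.08 shift when factor C
# (index 2) is at its high level -- illustrative numbers only
Y = [0.5 + 0.08 * (run[2] == 1) for run in runs]

def effect(signs, Y, n_factors=4):
    # Eq (13): signed sum of the treatment averages divided by 2^(N_f - 1),
    # i.e. mean of the 'high' responses minus mean of the 'low' responses
    return sum(s * y for s, y in zip(signs, Y)) / 2 ** (n_factors - 1)

print(round(effect([run[2] for run in runs], Y), 6))   # factor C estimate
print(round(effect([run[0] for run in runs], Y), 6))   # factor A estimate
```

Dividing the signed sum by 2^{N_f − 1} = 8 is exactly what turns the contrast into a difference of two means, since each level collects 2^{N_f − 1} of the 2^{N_f} runs.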
A similar reasoning can be applied to the factor B related with the choice of γ parameters reported in (8), when compared with (7), although with a more moderate impact.
As far as the impact of factor A is concerned, we may conclude that Algorithm 3 in Table 3 for the initialization of a feasible simplex, when compared with Algorithm 2 in Table 2: (i) has slightly increased the percentage of successful runs (with E e = 0.010); (ii) has a small negative impact on Y e (with E e = −2.628), meaning that the number of function evaluations has slightly decreased; and (iii) also has a small negative impact on time (with E e = −0.001), i.e., the time has been reduced.
Finally, factor D has little impact on the response variables. When using the Lévy distribution to generate the n worst vertices of the simplex, instead of the standard shrinkage strategy, the percentage of successful runs has slightly increased (with E e = 0.010), the number of function evaluations and the time have also suffered small increases (E e = 12.767 and E e = 0.001 respectively).
The effects of interactions between factors are estimated using formula (13) as well. Relative to the variable Y_p, the estimated (interaction) effects, E_e, in absolute value, range from 0.005 to 0.014. This seems to show that the differences are normal variations just due to chance. There are some small interaction effects for AD, BC, BD and BCD, but they are in general smaller than the effects of each factor separately. The same is true for the interaction effects on both Y_e and Y_t.

Normal variation
To better analyze the variation of the effects, i.e., to check whether they vary normally or whether some of them stand significantly apart from the others, we plot the effects, using their absolute values, in a half-normal probability plot. The horizontal axis of the half-normal plot displays the absolute value of the estimated effects and the vertical axis displays the cumulative probability of obtaining a value less than or equal to a certain target. The values for the plotting are displayed in Table 7.
From the half-normal probability plot, we aim to identify the important effects, i.e., the factors that significantly affect the values of the response variables in a statistical sense. Unimportant effects tend to have a normal distribution centered near zero, while important effects tend to have a normal distribution centered at their large effect values. Fig 2 contains the half-normal probability plot of the effects for variable Y_p. It is observed that factor B, and in particular factor C, fall off to the right of the straight line that emanates from the origin and fits the near-zero values. Note that the majority of the effects/points are near zero and fall along the line, meaning that their variations are due to normal causes, like noise, and are considered unimportant. The effects that are likely to be important fall off and are far from the line. To confirm these conclusions in a statistical sense, an analysis of variance is carried out in the next subsection. Fig 4 shows that factors B and C seem to be important, and it looks like interaction CD could also be important. This matter is investigated in the next subsection.
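The plotting positions behind a half-normal plot such as Fig 2 can be computed with the standard library. The |effect| values below are illustrative (they are not Table 7's data), with two magnitudes standing out the way factors B and C do for Y_p; the (i − 0.5)/m plotting-position rule is one common convention, assumed here:

```python
from statistics import NormalDist

# illustrative |effect| magnitudes for 15 effects: most near zero, plus
# 0.024 and 0.158 mimicking factors B and C (not the paper's Table 7)
abs_effects = sorted([0.004, 0.005, 0.005, 0.006, 0.006, 0.007, 0.007,
                      0.008, 0.009, 0.010, 0.011, 0.012, 0.014,
                      0.024, 0.158])

nd, m = NormalDist(), len(abs_effects)
points = []
for i, e in enumerate(abs_effects, start=1):
    p = (i - 0.5) / m                 # cumulative probability position
    q = nd.inv_cdf(0.5 + p / 2.0)     # half-normal quantile for the y-axis
    points.append((e, round(p, 3), round(q, 3)))

for e, p, q in points:
    print(f"{e:.3f}  {p:.3f}  {q:.3f}")
```

Plotting q against |effect|, the thirteen near-zero effects line up through the origin while the two large ones break away to the right, which is exactly the visual test described above.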

Analysis of variance
The statistical analysis of DoE relies on the analysis of variance (ANOVA), a collection of statistical models that partition the observed variance of the response variable (the variation relative to the mean value) into components according to the factors that the experiment is testing. The statistical analysis of the four factor effects, as well as their interaction effects, on the response variables is carried out using ANOVA. Statistical hypothesis tests are needed to determine if the factor/interaction effects are significant. The sums of squares (SoS) for each factor/interaction must be computed, and are related to the estimated effects in (13) as follows:

SoS = 2^{N_f − 2} (E_e)².

For each response variable, each effect that is likely to be significant is tested with the following statistical hypotheses:
H_0: The variable is not significantly affected by the variation in the factor/interaction, i.e., the effect produced on the variable is not statistically significant.
H_1: The variable is significantly affected by the variation in the factor/interaction (the effect produced on the variable is statistically significant).
In ANOVA, the total variation relative to the mean value, designated by 'Total SoS', is partitioned into two components: one due to the model and the other due to the residual. The effects that are likely to be significant (and are to be tested for statistical significance) are incorporated in the model, and the remaining ones, with near-zero values, are used to estimate the error, or residual. The two largest effects (B and C) are the ones that fall off the straight line in the half-normal plot for the response variable Y_p and, since we also aim to check the effect of the interaction BC, the SoS components are:

Total SoS = SoS_B + SoS_C + SoS_BC + SoS_residual.

The values of the ANOVA for Y_p are shown in Table 8. The SoS for the components of the model and for the residual are shown in the second column of the table. The third column contains the degrees of freedom (df) associated with each SoS (one df for each factor/interaction, since only two levels are present), the fourth column shows the mean square (MS) (the SoS divided by the df), and the ratio of the MS of an effect over the MS of the residual, giving the 'statistic' F value (Fval), is in the fifth column. Finally, the last column of the table shows the probability of getting an Fval as high as that computed due to chance alone, known in the literature as 'Prob. value > F' (P. val). The Fval is compared to the reference F-distribution with the same df. We decided to use a 1% risk (the risk of erroneously rejecting the null hypothesis H_0 when it is true), also known as the level of significance, in these hypothesis tests. This means that if the computed 'Fval' exceeds the critical value of the reference F-distribution for the 1% risk, H_0 should be rejected, and we are 99% confident that the response variable is significantly affected by the effect of that factor/interaction in the model. Based on the values of 'P. val' in Table 8, we conclude that only the 'Fval' of factor C exceeds the critical value 9.33 of F_{1,12} (1 df for the SoS in the numerator and 12 df for the SoS in the denominator) that corresponds to a 1% risk. (It even exceeds the critical value of F_{1,12} for a 0.1% risk.) Thus, the two strategies identified as the low and high levels of factor C are able to produce statistically significant differences in the percentage of successful runs. Next, we show the ANOVA carried out for the response variable Y_e. The SoS components for the model include factors B and C and, to be able to check the importance of the effect due to the interaction BCD, we had to incorporate factor D as well in the analysis. For a more complete checking, interaction CD is also included:

Total SoS = SoS_B + SoS_C + SoS_D + SoS_CD + SoS_BCD + SoS_residual.

The values of the ANOVA for Y_e are shown in Table 9. Based on the 'P. val' reported in the table, we may conclude that, from the tested factors (B, C and D) and interactions (CD and BCD), only the interaction CD effect is not significant, since the associated 'Fval' does not exceed the critical value 10.04 of F_{1,10} (1 df for the SoS in the numerator and 10 df for the SoS in the denominator) that corresponds to the 1% risk. This means that, although the levels of the factors C and D affect variable Y_e significantly, the effect caused by one of the factors on Y_e does not depend on the level of the other. We note here that, for both factors B and C, we are 99.9% confident that the number of function evaluations to locate each root is significantly affected, since their 'Fval' highly exceeds the critical value of F_{1,10} for the 0.1% risk. However, when the interaction BCD is considered, the effect of one of the factors B, C or D is significantly affected, in a statistical sense, by the levels of the other two factors, as far as the average number of function evaluations is concerned. The 'Fval' associated with BCD exceeds the reference critical value, as reported in Table 9. We may observe how factors B, C and D interact.
We notice that when B and C are at the low level and D varies from the low to the high level, Y e decreases from 208.6 (the average of 209.2 and 207.9) to 202.1 (the average of 201.0 and 203.1), while when B is at the high level and C remains at the low level, the variation of D from the low to the high level makes Y e increase from 285.1 (the average of 286.6 and 283.6) to 297.9 (the average of 302.7 and 293.1).
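The cell averages quoted above can be reproduced directly, and the sign change of D's effect between the two levels of B (with C at its low level throughout) is exactly what the BCD interaction captures. A minimal check, using only the values reported in the text:

```python
# Replicate values of Y_e quoted in the text, keyed by the signs (B, D),
# with C fixed at its low level.
cells = {
    (-1, -1): (209.2, 207.9),
    (-1, +1): (201.0, 203.1),
    (+1, -1): (286.6, 283.6),
    (+1, +1): (302.7, 293.1),
}
mean = {k: sum(v) / len(v) for k, v in cells.items()}

# Effect of moving D from low to high, at each level of B.
d_effect_at_B_low  = mean[(-1, +1)] - mean[(-1, -1)]   # negative: Y_e drops
d_effect_at_B_high = mean[(+1, +1)] - mean[(+1, -1)]   # positive: Y_e rises
```

The opposite signs of the two computed differences show that the effect of D on Y e depends on the level of B, which is the interaction being tested.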
The ANOVA for the response variable Y t is carried out in the same manner. To be able to analyze the effect of CD, we have to include factor D in the model; the interaction BCD is also integrated in the model. Thus, the SoS for the model comprises the components of B, C, D, CD and BCD, and the remaining effects form the SoS for the residual. Table 10 contains the values of the ANOVA for Y t. Note that the computed values of 'Fval' for factor D and for the interactions CD and BCD do not exceed the critical value 10.04 of F(1,10) for a 1% risk. Thus, the associated H0 could not be rejected. Only the H0 associated with factors B and C, for Y t, should be rejected, even at a 0.1% risk. The main conclusions are the following: (i) the two strategies related to the setting of the simplex parameters significantly affect the response variable (the time to compute a root); and (ii) the two tested strategies related to solving the problem when the simplex collapses also significantly affect the time to locate each root.
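The hypothesis-testing step common to Tables 8-10 reduces to a ratio of mean squares compared against a critical value of the F-distribution. A minimal sketch with invented SoS values (the critical value 9.33 of F(1,12) at the 1% risk is the one quoted in the text; the SoS numbers are illustrative, not taken from the tables):

```python
# One effect with 1 df (two-level design), residual with 12 df.
ss_effect   = 5000.0      # hypothetical SoS of the tested factor
ss_residual = 1800.0      # hypothetical pooled residual SoS
df_effect, df_residual = 1, 12

ms_effect   = ss_effect / df_effect        # MS = SoS / df; here MS == SoS
ms_residual = ss_residual / df_residual
f_value     = ms_effect / ms_residual      # the 'Fval' column

F_CRIT_1PC = 9.33   # critical value of F(1,12) at the 1% risk (from the text)
reject_H0 = f_value > F_CRIT_1PC           # True: the effect is significant
```

If `reject_H0` is true, the factor/interaction effect is declared statistically significant at the 99% confidence level.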

Comparison of results
This section aims to show some performance comparisons between the proposed repulsion algorithm based on a Nelder-Mead type local search and some other available techniques for locating multiple roots of a system of nonlinear equations. First, we compare our results with those reported in [7]. The results are summarized in Table 11, which shows, for twelve problems of the previous set:
• 'SR' (%), the percentage of runs (out of 30) in which all the roots were located;
• 'N r', the average number (over 30 runs) of located roots per run;
• 'NFE r', the average number of function evaluations required to locate a root;
• 'T r', the average time required to locate a root (in seconds).
From the proposed variants of the N-M algorithm, and according to the experimental testing of the previous section, we choose the one that combines the high levels of factors A, B, C and D. We note that the results reported in [7] were obtained using a repulsion algorithm that uses the metaheuristic known as harmony search as the local procedure. The differences between the three tested algorithms lie in the penalty term used to modify the merit function and to define the repulsion area as roots are computed. The penalty terms involve an inverse 'erf' function [7], an 'exp' function [8] and the 'coth' function [6]. These results are shown in the last three sets of four columns of Table 11. The first set of four columns of the table contains the results obtained in our study. We may conclude that our method clearly wins in efficiency and is comparable to the others in terms of the percentage of successful runs. We aim to further compare the efficiency of our algorithm with the results reported in [3] and [8], regarding five problems of the previous set. In [8], the authors propose a biased random-key genetic algorithm to minimize the merit function, coupled with a penalty approach to modify the merit function as roots are found.
The penalty term is an 'exp' type function and is implemented with two parameters: one aims to create an area of repulsion around previously found roots and the other aims to penalize points inside the repulsion area. In [3], a multistart approach that uses a gradient-based BFGS variant as the local search procedure is implemented. The statistics reported by the authors correspond to three multistart implementations that differ in the stopping rule adopted by the algorithm; here, we report only the second best value. The results are shown in Table 12. Besides the statistics used in the previous table, we also show the average number of N-M local search calls, L cal, for our algorithm. In the table, '-' means that the data is not available in the referenced paper. In these comparisons we increase the parameters k Rmax and the maximum of k uns of the repulsion algorithm to 100 and 25, respectively, since the computational budget is not an important issue. Although our algorithm did not find all roots in all runs when solving the problems Reactor and Robot, it requires fewer function evaluations than the other two in comparison. While the average time needed to find each root is significantly lower than that of the heuristic proposed in [8], it exceeds the average time reported in [3] (by a factor ranging from one to eight) on the problems Floudas, Merlet and Reactor (R = 0.960).
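The repulsion mechanism shared by all these penalty variants can be sketched in a few lines. The erf-based form below is an illustrative assumption (the exact penalty definitions are those of (4) and of [6-8]): the penalty is large inside a ball of radius rho around each located root and vanishes far from it.

```python
import math

def merit(x, residuals):
    """Merit function M(x): sum of squared residuals F_i(x)."""
    return sum(f(x)**2 for f in residuals)

def repulsion_merit(x, residuals, roots, rho=0.01, delta=100.0):
    """Repulsion-modified merit: M(x) plus a penalty around each found
    root.  The erfc form is an illustrative choice (an assumption), not
    the paper's exact definition (4): the added term is ~delta when x is
    close to a root and ~0 once the distance exceeds a few rho."""
    m = merit(x, residuals)
    for xi in roots:
        d = math.dist(x, xi)
        m += delta * math.erfc(d / rho)
    return m

# One-equation example: F(x) = x^2 - 1 has the two roots -1 and +1.
res = [lambda x: x[0]**2 - 1]
plain    = merit([1.0], res)                          # 0 at the root
repelled = repulsion_merit([1.0], res, roots=[[1.0]]) # penalized at a found root
```

Re-minimizing the penalized merit therefore pushes the search away from the already-located root and toward the remaining one.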

Performance on standard benchmarks
Since our proposals for the N-M variant seem promising when compared with those of the classical N-M, we decided to extend them to solving more complex and real application systems [18,[20][21][22]. The benchmark set is a difficult set of problems. For these experiments, k Rmax and the maximum of k uns (to stop the Repulsion Algorithm) are set to 75 and 15 respectively, the parameters ρ and δ in the definition of the penalty term in (4) are fixed at 0.01 and 100 respectively, and the N-M algorithm is allowed to run for a maximum of 1000n iterations. We compare our results with two deterministic global search techniques: a traditional interval method that uses range testing and branching, designated as 'HRB', and a branch and prune approach denoted by 'Newton' in [20]. We also use, for comparison, the results produced by a continuation method designated by 'CONT' in [20] and those of a multiobjective evolutionary algorithm approach presented in [18]. It is mentioned that the latter algorithm has produced a constant number (about 200) of nondominated solutions in each run. We now describe each benchmark and the obtained results in some detail.
Economics modeling application. This is an economic modeling problem that can be tested for a variety of dimensions n:
(x_i + Σ_{j=1}^{n−i−1} x_j x_{j+i}) x_n − c_i = 0, i = 1, . . ., n−1,
together with the normalization equation Σ_{l=1}^{n−1} x_l + 1 = 0. We test for n = 4,5,6,7,8,9 and compare with the results available in [20], considering the set O = [−100,100]^n. The results produced by the 'Newton' algorithm correspond to the interval widths reported in [20]; see Table 13. Our algorithm is able to identify several roots. We note here that a located root ξ is considered to be different from any other computed root if they differ in norm by more than 0.05. We observe that, as n increases, the number of located roots decreases. This means that the difficulty in converging to a root is higher when n is larger, and this is related to the performance of the N-M search procedure, which deteriorates as n increases.
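For reference, the economics system can be written as a residual function. The form coded below assumes the standard statement of this benchmark (the 0-based indexing is ours); with all c_i = 0, the point (−1, 0, …, 0) satisfies every equation, which gives a cheap sanity check:

```python
def economics_residuals(x, c):
    """Residuals of the economics modeling benchmark, assuming its
    standard form: (x_i + sum_{j=1}^{n-i-1} x_j x_{j+i}) x_n = c_i for
    i = 1..n-1, plus the normalization sum_{l=1}^{n-1} x_l + 1 = 0."""
    n = len(x)
    r = []
    for i in range(1, n):                                  # i = 1..n-1
        s = x[i - 1] + sum(x[j - 1] * x[j + i - 1] for j in range(1, n - i))
        r.append(s * x[-1] - c[i - 1])
    r.append(sum(x[:-1]) + 1.0)                            # normalization
    return r

# With c = 0, x_n = 0 zeroes the first n-1 equations and the
# normalization forces the first n-1 components to sum to -1.
n = 4
x = [-1.0] + [0.0] * (n - 1)
res = economics_residuals(x, [0.0] * (n - 1))
```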
While the computational effort (function evaluations and time) in 'Newton' greatly increases with n, the number of function evaluations in our algorithm grows at most by a factor of two if the number of local search calls is the same (see, for example, the instances with n = 4,5,6,7). However, the total time essentially depends on the time required by the N-M procedure to locate each root.
Neurophysiology application. This is an application from neurophysiology with n = 6, defined by a system of six equations whose right-hand-side constants c_i can be chosen at random. We test this application with c_i = 0, i = 1, . . ., 4, and consider three intervals, [−10,10]^6, [−100,100]^6 and [−1000,1000]^6, as proposed in [20]. See Table 14. In [18] (with 300 initial points and after 200 generations), the multiobjective algorithm produces a set of about 200 nondominated solutions after 28.90 seconds. (The time reported in [18] is based on a 2.4-GHz Intel Duo Core CPU with 2-GB RAM.) The results show that the larger the interval, the smaller the number of located roots. Once again we observe that the larger the number of located roots, the larger the time required to converge to a root (see the aT r values). This is an important characteristic of the Repulsion Algorithm, since the minimization of the modified penalty merit function gets harder as the number of penalty terms increases.
Interval arithmetic benchmarks. The following two sets of functions are benchmarks from the interval arithmetic area, designated as instances i1 and i5 [20]. The set O for instance i1 is [−2,2]^10 and for i5 is [−1,1]^10. Both have n = 10 and one root in O. Our results, as well as those available in [20], are reported in Table 15. We note that in [18] a set of nondominated solutions is obtained after 300 generations and 39.08 seconds when solving the benchmark i1 (using an initial population of 500 points), and after 366.40 seconds when solving i5.
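The neurophysiology application described above can be coded as a residual function, assuming its commonly stated six-equation form (an assumption on our part; see [20] for the source). With c_i = 0, as tested in the text, the point (1, 1, 0, 0, 0, 0) is one of the roots:

```python
def neuro_residuals(x, c):
    """Residuals of the neurophysiology application in its commonly
    stated form (an assumption):
      x1^2 + x3^2 = 1,               x2^2 + x4^2 = 1,
      x5 x3^3 + x6 x4^3 = c1,        x5 x1^3 + x6 x2^3 = c2,
      x5 x1 x3^2 + x6 x2 x4^2 = c3,  x5 x1^2 x3 + x6 x2^2 x4 = c4."""
    x1, x2, x3, x4, x5, x6 = x
    return [x1**2 + x3**2 - 1,
            x2**2 + x4**2 - 1,
            x5 * x3**3 + x6 * x4**3 - c[0],
            x5 * x1**3 + x6 * x2**3 - c[1],
            x5 * x1 * x3**2 + x6 * x2 * x4**2 - c[2],
            x5 * x1**2 * x3 + x6 * x2**2 * x4 - c[3]]

# With c_i = 0 the point (1, 1, 0, 0, 0, 0) zeroes all six residuals.
r = neuro_residuals([1.0, 1.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 4)
```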
We are able to accelerate our iterative process by setting the maximum of k uns to 5, since the root is computed during the first iterations of the Repulsion Algorithm. From the results shown in Table 15 we may conclude that the number of function evaluations and the time required to converge to a root are comparable to those of 'Newton'. Since the repulsion algorithm's paradigm requires restarting again and again, exploring the search space and looking for other roots, the corresponding total values may become large, depending on the number of restarts. Nevertheless, those totals are comparable to the totals of 'HRB'.
Chemical equilibrium application. This problem addresses the equilibrium of the products of a hydrocarbon combustion process which, reformulated in the variable space, is described in [21]. The results are shown in Table 16, where we also report the results available in [20,21]. In [18], 500 generations and an initial population of 500 points are considered to obtain a set of about 200 nondominated solutions in 32.71 seconds. Since we look for just one root, we first set the maximum of k uns to 10. Although the root was located in only 10% of the runs when the set O = [10^−4,10^8]^5 is used, our iterative process takes on average 2,482 function evaluations (and 0.19 seconds) to converge to the solution. While the branch and prune technique 'Newton' requires a total of 52,236 function evaluations (and 6.32 seconds) to converge to the required solution, our iterative process computes 179,951 function values when the limit of unsuccessful iterations is set to 10, but 55,913 function values when the limit is set to 5.
Bratu system. This is a system in x 1, . . ., x n that arises from the discretization (with mesh size h) of the differential equation model for nonlinear diffusion phenomena taking place in combustion and semiconductors, a two-point boundary value problem [3,23]. In the set [0,5]^n, there are two roots for all n. We tested this problem with n = 5.
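A residual function for the discretized Bratu system can be sketched as follows. The central-difference form and the value of the parameter lam are assumptions on our part (the exponential term is the usual Bratu nonlinearity), not the paper's exact statement; at the origin, every residual equals h^2 * lam, which serves as a quick structural check:

```python
import math

def bratu_residuals(x, lam=1.0):
    """Residuals of a discretized Bratu two-point boundary value problem,
    assuming the usual central-difference form (an assumption; see [3,23]):
        x_{i-1} - 2 x_i + x_{i+1} + h^2 * lam * exp(x_i) = 0,
    with boundary values x_0 = x_{n+1} = 0 and mesh size h = 1/(n+1)."""
    n = len(x)
    h = 1.0 / (n + 1)
    padded = [0.0] + list(x) + [0.0]       # append the boundary values
    return [padded[i - 1] - 2 * padded[i] + padded[i + 1]
            + h * h * lam * math.exp(padded[i]) for i in range(1, n + 1)]

# At the origin the difference part vanishes, so each residual is h^2*lam.
r = bratu_residuals([0.0] * 5)
```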
Brown almost-linear system. This is a system with n variables which has two real roots if n is even and three if n is odd. It is ill-conditioned and difficult to solve with some standard methods [21,22]. In the interval [−1,1]^n there is one root when n = 10.
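The Brown almost-linear system has a simple closed form, which makes it easy to verify the well-known root at (1, …, 1):

```python
def brown_residuals(x):
    """Residuals of the Brown almost-linear system:
        f_i = x_i + sum_j x_j - (n + 1),  i = 1..n-1,
        f_n = prod_j x_j - 1."""
    n = len(x)
    total = sum(x)
    prod = 1.0
    for v in x:
        prod *= v
    return [x[i] + total - (n + 1) for i in range(n - 1)] + [prod - 1.0]

# The point (1, ..., 1) is a root for any n: each linear residual is
# 1 + n - (n + 1) = 0 and the product residual is 1 - 1 = 0.
r = brown_residuals([1.0] * 10)
```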
Resistive circuit problem. This system of n nonlinear equations describes a nonlinear resistive circuit containing n tunnel diodes [25]. The set O is defined by [−10,10]^n and we test this circuit problem with n = 5 and n = 7.
All the experiments were run 30 times with each problem and the reported results are average values over the 30 runs. First, we aim to show that the herein proposed parameter values, defined in (8), make the N-M algorithm work better. Overall, the number of located roots has in general increased, while the number of function evaluations and the time have not increased, except when solving the problem Brown. See the first two sets of three columns (identified with A+1B+1C+1D+1 and A+1B−1C+1D+1) in Table 17. We note that the problems now used are in general of larger dimension than those used during the DoE analysis. The last set of three columns of the table shows the results obtained when the shrinking strategy is selected, A+1B−1C+1D−1 (as opposed to the strategy defined in (12)). Improvements are obtained only on the problems Brown and Broyden, when compared to A+1B+1C+1D+1. Here, we consider an improvement when the number of located roots has increased, or when the number of function evaluations has decreased provided that a tie in the number of roots is reported.
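To make the simplex-parameter factor concrete: the classical N-M coefficients are fixed constants, while dimension-dependent choices scale the expansion, contraction and shrink coefficients with n. The adaptive formulas below follow Gao and Han's well-known proposal as a stand-in; the paper's golden-ratio-based values in (8) are analogous in spirit, but their exact formulas are not reproduced here (an assumption):

```python
def nm_coefficients(n, adaptive=False):
    """Classical Nelder-Mead coefficients vs a dimension-dependent set.
    The adaptive formulas are Gao & Han's choice, used here only as an
    illustration of dimension-dependent parameters; they are NOT the
    paper's golden-ratio-based values defined in (8)."""
    if not adaptive:
        return {"reflect": 1.0, "expand": 2.0, "contract": 0.5, "shrink": 0.5}
    return {"reflect": 1.0, "expand": 1 + 2 / n,
            "contract": 0.75 - 1 / (2 * n), "shrink": 1 - 1 / n}

classic = nm_coefficients(10)
adapt   = nm_coefficients(10, adaptive=True)
```

For large n the adaptive set expands less aggressively and shrinks more gently, which is the kind of behavior a dimension-aware parameterization is meant to provide.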
Second, we show that the strategy defined in (12) is, in the context of our proposals, able to locate in general more roots than the classic shrinking methodology, using now this new set of problems. These two variants are identified by A+1B+1C+1D+1 and A+1B+1C+1D−1, respectively, in Table 18. We also include the results of another variant that uses a uniformly distributed random number instead of a number drawn from the Lévy distribution. We observe that the variant A+1B+1C+1D+1 wins on four problems: it is able to locate more roots than the other two in comparison on the problems Broyden, Circuit_5 and Circuit_7, and is more efficient than the other two on the problem Interval (for the same number of located roots). The variant A+1B+1C+1D−1 wins on the problem Neurophysiology and is more efficient on the problems Brown and Yamamura. The variant A+1B+1C+1D+1(Uniform) wins only on the problem Bratu. The results reported in Table 19 are used to show that the variant A+1B+1C+1D+1, which uses the golden ratio Φ to define the step size when generating points around the best vertex of the simplex (see (11)), outperforms the other two variants in comparison, A+1B+1C+1(random)D+1 and A+1B+1C−1D+1. C+1(random) means that in Eq (11) the step size is a multiple of a random number uniformly distributed in [0, 1], and C−1 represents the strategy that breaks the N-M iterative process when the simplex has collapsed.
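The strategy at the high level of factor D, redrawing the n vertices around the best one with Lévy-distributed steps rather than shrinking the simplex, can be sketched as follows. The Mantegna generator and the step scale are illustrative assumptions, not the paper's exact scheme:

```python
import math, random

def levy_step(beta=1.5):
    """Levy-distributed step via Mantegna's algorithm (an illustrative
    generator; the paper's exact Levy sampler is not specified here)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta
                * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return random.gauss(0, sigma) / abs(random.gauss(0, 1)) ** (1 / beta)

def regenerate(simplex, use_levy=True, step=0.1):
    """On simplex degeneracy, keep the best vertex (assumed first) and
    redraw the n remaining vertices around it -- with Levy steps (factor
    D at the high level) or uniform steps -- instead of the classical
    shrink toward the best vertex."""
    best, n = simplex[0], len(simplex) - 1
    draw = levy_step if use_levy else (lambda: random.uniform(-1, 1))
    return [best] + [[b + step * draw() for b in best] for _ in range(n)]

random.seed(1)
new_simplex = regenerate([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

The heavy tail of the Lévy distribution makes occasional large redraw steps likely, which helps the restarted simplex escape the region where it collapsed.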
Finally, we aim to show that the maximum target value for the radius ρ of the repulsion area (see the definition in (4) and in the Subsection 'Parameters Setting') only slightly affects the performance of the algorithm. Until now in this section we have used the fixed value ρ = 0.01 when solving the eight problems. We test two other values: 0.1 and 0.001. Table 20 contains the results produced by the variant A+1B+1C+1D+1. We may conclude that the values 0.01 and 0.1 give better results than 0.001. Between the two best values, 50% of the results are better with ρ = 0.01 and the remaining are better with ρ = 0.1.

Conclusions
A repulsion algorithm is presented for locating multiple roots of a system of nonlinear equations. The proposed algorithm relies on a penalty merit function that depends on two parameters: one aims to scale the penalty and the other adjusts the radius of the repulsion area, so that convergence to previously located minimizers of the merit function is avoided. For each randomly generated point in the search space, the algorithm invokes the N-M algorithm to exploit the region for a new minimizer. In the N-M local search context, several alternative strategies have been incorporated in the algorithm, aiming to enhance the quality of the solutions and improve efficiency. To analyze the effect of these strategies on the overall performance of the repulsion algorithm, measured by three criteria (the percentage of successful runs, the number of function evaluations and the time required to compute each root), a two-level factorial design of experiments is carried out. Four factors at two levels each are manipulated and tested to analyze their statistical significance. During these computational experiments, a set of sixteen benchmark problems is used. The values of the response variables used in the factorial design of experiments correspond to the values averaged over 30 independent runs produced by solving the sixteen problems.
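The overall flow just described can be sketched end to end. This is a schematic of the repulsion idea under our own simplifications (an erf-type penalty of assumed form, and a crude coordinate search standing in for the N-M procedure), not the paper's exact algorithm:

```python
import math, random

def repulsion_search(merit, bounds, n_starts=200, rho=0.05, delta=100.0,
                     tol=1e-6):
    """Schematic repulsion multistart: each local search minimizes the
    merit function plus an erf-type penalty around every root found so
    far, so the search is pushed away from known roots.  The paper uses
    Nelder-Mead as the local search; a crude coordinate search stands in
    for it here (an assumption for brevity)."""
    roots = []

    def penalized(x):
        return merit(x) + sum(delta * math.erfc(math.dist(x, r) / rho)
                              for r in roots)

    def local_search(x, steps=200, h=0.1):
        for _ in range(steps):
            for i in range(len(x)):
                for d in (h, -h):
                    y = list(x); y[i] += d
                    if penalized(y) < penalized(x):
                        x = y
            h *= 0.95                     # slowly refine the step size
        return x

    for _ in range(n_starts):
        x0 = [random.uniform(lo, hi) for lo, hi in bounds]
        xs = local_search(x0)
        # Accept only genuine new roots: near-zero merit, outside the
        # repulsion radius of every previously located root.
        if merit(xs) < tol and all(math.dist(xs, r) > rho for r in roots):
            roots.append(xs)
    return roots

random.seed(0)
# F(x) = x^2 - 1: the merit (x^2 - 1)^2 has the two roots -1 and +1.
found = repulsion_search(lambda x: (x[0]**2 - 1)**2, [(-2.0, 2.0)])
```

On this toy problem both roots are recovered in a single run, which is the behavior the repulsion mechanism is designed to produce.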
From the statistical analysis, we may conclude with 99% confidence that:
• the number of function evaluations to locate each root and the time to locate each root have significantly increased when the simplex parameters are changed from the classical values to those based on the golden ratio and the dimension;
• the percentage of successful runs, the number of function evaluations to locate each root and the time to locate each root have been significantly affected when the two approaches to overcome the simplex degeneracy are tested; in particular, they have increased when the new strategy of generating n vertices around the best vertex with probability 0.75 is used, instead of just 'breaking' the cycle;
• the number of function evaluations to locate each root has significantly increased when the Lévy distribution is used to generate n vertices around the best vertex instead of using the classical shrinking procedure;
• the number of function evaluations to locate each root has significantly decreased by the interaction that occurs when the simplex parameters change from the classical values to those based on the golden ratio and n, when the generation of n vertices around the best vertex with probability 0.75 is used instead of the 'break' strategy, and when the Lévy distribution is used to generate n vertices around the best one, as opposed to the shrinking strategy.
We have also tested our method on other, more complex and realistic problems. The results show that our method is capable of converging to the multiple solutions of those problems with a moderate computational effort. To test a wider variety of modifications to the N-M algorithm using a limited number of experimental units, a fractional factorial experimental design, as opposed to the full factorial design operated herein, will be used. The techniques for investigating the variation in experiments brought by the Taguchi methods will also be explored.