A modified filter nonmonotone adaptive retrospective trust region method

In this paper, aiming at the unconstrained optimization problem, a new nonmonotone adaptive retrospective trust region line search method is presented, which takes advantage of a multidimensional filter technique to increase the acceptance probability of the trial step. A new nonmonotone trust region ratio is presented, which is based on the convex combination of the nonmonotone trust region ratio and the retrospective ratio. The global convergence and the superlinear convergence of the algorithm are established under suitable conditions. Comparative numerical experiments show better effectiveness and robustness.


Introduction
Consider the following unconstrained optimization problem
$$\min_{x\in\mathbb{R}^n} f(x), \qquad (1)$$
where $f:\mathbb{R}^n\to\mathbb{R}$ is a twice continuously differentiable function.
Trust region methods and line search methods are two effective approaches to solving unconstrained optimization problems. At present, both numerical methods mentioned above for solving nonlinear programming are widely used in many applications, such as engineering design, automation, transportation, economic analysis, pattern recognition, artificial intelligence, network design, and many other areas of modern high-tech research and development.
Compared with the line search method, the trust region method has a novel idea, strong convergence, and stable numerical performance; see [1][2][3][4]. It can not only solve well-conditioned problems quickly, but also solve ill-conditioned optimization problems effectively. The basic idea of the trust region method is as follows: at the iteration point $x_k$, the trial step $d_k$ is obtained by solving the subproblem
$$\min_{d\in\mathbb{R}^n} m_k(d) = g_k^T d + \tfrac{1}{2} d^T B_k d, \quad \text{s.t. } \|d\| \le \Delta_k, \qquad (2)$$
where $g_k$ denotes $\nabla f(x_k)$, $B_k$ is a symmetric approximation of $\nabla^2 f(x_k)$, $\Delta_k$ stands for the trust region radius, and $\|\cdot\|$ denotes any vector norm, usually the Euclidean norm. To evaluate the consistency between the quadratic model and the objective function, the classical ratio, denoted by $r_k^B$, is defined as
$$r_k^B = \frac{f(x_k) - f(x_k + d_k)}{m_k(0) - m_k(d_k)}. \qquad (3)$$
The trial step $d_k$ is accepted whenever $r_k^B$ is close to 1; that is, $x_{k+1} = x_k + d_k$, and $\Delta_k$ is updated suitably. Otherwise, when $r_k^B$ is negative, or positive but not close to 1, $\Delta_k$ is decreased and the subproblem is solved again.
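As a concrete illustration of the scheme above, the following Python sketch solves the subproblem approximately along the steepest-descent direction (the Cauchy point) and evaluates the classical ratio; the helper names, the quadratic test function, and the choice $\Delta = 0.5$ are illustrative assumptions, not part of the paper.

```python
import numpy as np

def cauchy_step(g, B, delta):
    # Approximately solve min m(d) = g^T d + 0.5 d^T B d, s.t. ||d|| <= delta,
    # by minimizing the model along -g (the Cauchy point).
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    if gBg <= 0:
        tau = 1.0                      # model decreases along -g: go to the boundary
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g

def ratio(f, x, d, g, B):
    # Classical agreement ratio r^B = actual reduction / predicted reduction.
    pred = -(g @ d + 0.5 * d @ B @ d)  # m(0) - m(d)
    ared = f(x) - f(x + d)
    return ared / pred

# Tiny quadratic example: f(x) = 0.5 x^T A x; the model is exact, so r^B = 1.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
x = np.array([1.0, 1.0])
g, B = A @ x, A
d = cauchy_step(g, B, delta=0.5)
print(round(ratio(f, x, d, g, B), 6))  # 1.0 for an exact quadratic model
```

Since the model here matches the objective exactly, the actual and predicted reductions coincide and the step would always be accepted.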
It is well known that monotone techniques may not only slow down the rate of convergence, especially in the presence of a narrow curved valley, but also require the objective function to decrease at each iteration. Considering this fact, it is meaningful to study nonmonotone techniques for improving the algorithm. The first nonmonotone technique was the so-called watchdog technique proposed by Chamberlain et al. [5], which was designed to overcome the Maratos effect. Nonmonotone techniques in [6][7][8] have also attracted extensive attention from scholars; for example, Deng et al. [9] proposed a nonmonotone trust region method by replacing $f(x_k)$ in (3) with $f_{l(k)}$ given by
$$f_{l(k)} = \max_{0\le j\le m(k)} f(x_{k-j}),$$
where $m(0)=0$, $0\le m(k)\le \min\{m(k-1)+1, N\}$, and $N\ge 0$ is an integer constant. Deng et al. [9] also proposed another nonmonotone trust region method with the ratio
$$r_k = \frac{f_{l(k)} - f(x_k + d_k)}{m_k(0) - m_k(d_k)}.$$
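The nonmonotone reference value $f_{l(k)}$ above can be sketched as a running maximum over a sliding window of recent function values; the helper name and the sample values below are hypothetical.

```python
from collections import deque

def nonmonotone_reference(history, N):
    # f_{l(k)}: maximum of the most recent min(k, N) + 1 function values,
    # as in the Grippo-style nonmonotone rule.
    window = list(history)[-(N + 1):]
    return max(window)

# Function values along a run that dips and oscillates
fs = [10.0, 7.0, 8.0, 5.0, 6.0]
hist = deque(maxlen=6)
refs = []
for fval in fs:
    hist.append(fval)
    refs.append(nonmonotone_reference(hist, N=2))
print(refs)  # [10.0, 10.0, 10.0, 8.0, 8.0]
```

Note how the reference stays at 10.0 until the early high value leaves the window, which is exactly what allows temporary increases of $f$ to be tolerated.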
Motivated by this, Ahookhosh et al. [6] proposed a nonmonotone trust region method with the ratio
$$r_k = \frac{R_k - f(x_k + d_k)}{m_k(0) - m_k(d_k)}, \quad \text{where} \quad R_k = \eta_k f_{l(k)} + (1-\eta_k) f_k,$$
in which $\eta_k\in[\eta_{\min},\eta_{\max}]$ with $\eta_{\min}\in[0,1)$ and $\eta_{\max}\in[\eta_{\min},1]$. Grippo et al. [10] proposed a nonmonotone line search technique for Newton's method, that is, for a given $\sigma\in(0,1)$, the step size $\alpha_k$ is chosen so that
$$f(x_k + \alpha_k d_k) \le f_{l(k)} + \sigma \alpha_k g_k^T d_k.$$
However, Grippo's nonmonotone technique has a drawback in that its numerical performance is highly dependent on the choice of $N$. As is well known, an appropriate updating strategy for the trust region radius plays a valuable role, in that it may prominently affect computational efficiency. Motivated by this, variants of the trust region radius have attracted considerable attention from many scholars [11][12][13]. In order to avoid gradient or Hessian information not being employed precisely in the standard trust region, Zhang et al. [14] proposed a new scheme, which uses the trust region radius
$$\Delta_k = c^p \|g_k\| \|\hat{B}_k^{-1}\|,$$
where $c\in(0,1)$ is a constant, $p$ is a nonnegative integer adjustment parameter, and $\hat{B}_k$ is a positive definite modification of $B_k$. Despite the effectiveness of Zhang's method, calculating an estimate of the inverse of the Hessian at each iteration incurs additional computational cost. Qu et al. [15] referred to another adaptive strategy for updating the trust region radius. Here, the filter technique introduced by Fletcher and Leyffer [16] avoids the difficulty of updating the penalty parameter in penalty function methods. The filter is able to reject poor trial iterates and enforce global convergence from arbitrary starting points. In this context, it is worth mentioning that Gould et al. [17] proposed an algorithm using the filter technique for unconstrained optimization problems, whose main idea is to accept the new iteration point as often as possible.
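A minimal sketch of the nonmonotone term $R_k$ and a Grippo-style backtracking line search might look as follows; the function names, parameter values, and the one-dimensional test case are illustrative assumptions, not the paper's algorithm.

```python
def R_term(f_lk, f_k, eta):
    # R_k = eta * f_{l(k)} + (1 - eta) * f_k: interpolates between the
    # nonmonotone reference value and the current function value.
    return eta * f_lk + (1.0 - eta) * f_k

def nonmonotone_armijo(f, x, d, g_dot_d, ref, sigma=1e-4, beta=0.5, max_iter=50):
    # Backtrack until f(x + a*d) <= ref + sigma * a * g^T d.
    # 'ref' is f_{l(k)} in Grippo's rule; using R_k instead gives a
    # relaxed variant (hypothetical helper, for illustration only).
    a = 1.0
    for _ in range(max_iter):
        if f(x + a * d) <= ref + sigma * a * g_dot_d:
            return a
        a *= beta
    return a

# 1-D example: f(x) = x^2 at x = 2, descent direction d = -g
f = lambda x: x * x
x, g = 2.0, 4.0
a = nonmonotone_armijo(f, x, -g, -g * g, ref=R_term(5.0, f(x), 0.5))
print(a)  # the full step a = 1.0 is accepted under the relaxed reference
```

With the plain monotone rule (`ref = f(x) = 4.0`) the full step to $f(-2)=4$ would be rejected; the nonmonotone reference 4.5 accepts it, illustrating the increased acceptance probability.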
The filter is primarily composed of the gradients of a series of iteration points; a trial step is accepted by the filter precisely when the corresponding gradient is accepted by the filter. Set $\nabla f(x_k) = g(x_k) = g_k = (g_{k,1}, g_{k,2}, \ldots, g_{k,n})$, where $g_{k,i}$ ($i = 1,2,\ldots,n$) is the $i$-th component of $g_k$. We say that an iterate $x_1$ dominates $x_2$ whenever $|g_{1,i}| \le |g_{2,i}|$ for all $i\in\{1,2,\ldots,n\}$. A multidimensional filter $\mathcal{F}$ is a list of $n$-tuples of the form $(g_{k,1}, g_{k,2}, \ldots, g_{k,n})$ such that, whenever $g_k\in\mathcal{F}$ and $g_l\in\mathcal{F}$, there exists $j\in\{1,2,\ldots,n\}$ with $|g_{k,j}| \le |g_{l,j}|$. Subsequently, compared with [17], in order to maximize the possibility of acceptance of the trial point, we introduce an improved multidimensional filter $\mathcal{F}$ as follows. Set $\bar{g} = \min\{\|g_l\| \mid (g_{l,1}, g_{l,2}, \ldots, g_{l,n}) \in \mathcal{F}\}$; a new trial point $x_k$ is acceptable if, for each $g_l\in\mathcal{F}$, there exists $j\in\{1,2,\ldots,n\}$ such that
$$|g_{k,j}| \le \max_{1\le i\le n} |g_{l,i}| - \gamma_g \bar{g}.$$
When an iteration point $x_k$ is accepted, we add $g_k$ to the filter and remove from the filter the points that are dominated by $x_k$.
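The filter bookkeeping described above can be sketched roughly as follows; the exact margin in the acceptance test follows our reading of the relaxed criterion, and all names and tolerances are illustrative assumptions.

```python
import numpy as np

def acceptable(g_new, filter_set, gamma_g=1e-3):
    # Relaxed multidimensional filter test: g_new is acceptable if, against
    # every filter entry g_l, some component improves on the largest component
    # of g_l by a margin proportional to the smallest gradient norm in the filter.
    if not filter_set:
        return True
    margin = gamma_g * min(np.linalg.norm(g) for g in filter_set)
    return all(any(abs(g_new[j]) <= max(abs(g_l)) - margin
                   for j in range(len(g_new)))
               for g_l in filter_set)

def add_to_filter(g_new, filter_set):
    # Insert g_new and drop entries it dominates (componentwise no larger).
    kept = [g for g in filter_set if not np.all(np.abs(g_new) <= np.abs(g))]
    kept.append(g_new)
    return kept

F = [np.array([2.0, 3.0])]
g1 = np.array([1.0, 1.0])           # dominates the existing entry
print(acceptable(g1, F))            # True
F = add_to_filter(g1, F)
print(len(F))                       # 1: the dominated entry was removed
```

Comparing against the largest component of each entry, rather than component by component, is what makes this variant accept more trial points than the filter of [17].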
The rest of the paper is organized as follows. Section 2 is devoted to describing the new filter nonmonotone adaptive retrospective trust region method in detail. The global convergence and superlinear convergence are established in Section 3. Some preliminary numerical results are presented in Section 4. Finally, some conclusions are summarized in Section 5.

Materials and methods
In this section, we propose a new filter nonmonotone adaptive retrospective trust region algorithm for solving unconstrained optimization problems. In order to reduce the computational cost, at the iteration point $x_k$, a new nonmonotone ratio and a retrospective ratio are introduced based on [13] as follows:
$$r_k^{NB} = \frac{R_k - f(x_k + d_k)}{m_k(0) - m_k(d_k)}, \qquad (12)$$
$$r_{k+1}^{NR} = \frac{\tilde{R}_k - f(x_{k+1})}{m_{k+1}(0) - m_{k+1}(d_k)}, \quad \tilde{R}_k = \varepsilon_k f_{l(k)} + (1-\varepsilon_k) f_k, \qquad (13)$$
where $\varepsilon_k\in[0,\eta_k]$. As can be seen from the motivation of the nonmonotone term $R_k$, a better convergence result can be obtained by freely selecting the parameters $\eta_k$ and $\varepsilon_k$. As a result of the above discussion, a new nonmonotone ratio is introduced to improve computational efficiency through the convex combination of $r_k^{NB}$ and $r_{k+1}^{NR}$, i.e.,
$$r_{k+1}^{NC} = \gamma\, r_k^{NB} + (1-\gamma)\, r_{k+1}^{NR}, \qquad (14)$$
where $\gamma\in[\gamma_{\min},\gamma_{\max}]\subseteq[0,1]$. More precisely, $r_k^{NB}$ is used to determine whether the trial step is acceptable, while $r_{k+1}^{NC}$ is employed for updating the trust region radius. In the criteria presented earlier, the Hessian matrix is time-consuming to compute, and the convergence rate of the adaptive trust region method becomes less efficient. In order to reduce the workload and computational time, simpler information from known iteration points can be used to reconstruct the updating formula for the trust region radius.
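The convex combination above is a one-line computation; this sketch merely records it and checks that the result stays between the two ratios (the function name is hypothetical).

```python
def combined_ratio(r_nb, r_nr, gamma=0.5):
    # r^{NC} = gamma * r^{NB} + (1 - gamma) * r^{NR}:
    # a convex combination of the nonmonotone and retrospective ratios.
    assert 0.0 <= gamma <= 1.0
    return gamma * r_nb + (1.0 - gamma) * r_nr

# The combination always lies between the two ratios:
print(combined_ratio(0.9, 0.3, gamma=0.25))  # approximately 0.45
```

Choosing $\gamma$ close to 1 emphasizes agreement of the current model, while $\gamma$ close to 0 emphasizes the retrospective view of the previous step.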
In this way, the trust region radius adjustment formula takes into account the gradient information of the function and the solution of the trust region subproblem, which ensures that the computational accuracy of the algorithm is not reduced. The improved adaptive trust region radius is updated by means of an adjustment parameter $\tau_k$, which is computed by formula (16). More formally, the new algorithm is described as follows.
Step 2. Compute $\|g_k\|$. If $\|g_k\|\le\varepsilon$, then stop.
Step 3. Solve the subproblem (2) to find the trial step $d_k$.
Step 4. Compute $R_k$ and $r_k^{NB}$, respectively.
Step 5. Otherwise, compute $r_{k+1}^{NR}$ and $r_{k+1}^{NC}$ by (13) and (14), respectively, and update $\tau_k$ by (16).
Step 6. Update the symmetric matrix $B_k$ by (30). Set $k = k+1$, and go to Step 1.
In this way, it is not necessary to solve the subproblem precisely in Algorithm 2.1; instead, an approximation of $d_k$ suffices, satisfying
$$m_k(0) - m_k(d_k) \ge \vartheta \|g_k\| \min\left\{\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right\}, \qquad (17)$$
where $\vartheta\in(0,1)$.
Assumption 2.1. (i) The level set $L(x_0) = \{x\in\mathbb{R}^n : f(x)\le f(x_0)\}$ is bounded; (ii) there exists a constant $M>0$ such that $\|B_k\|\le M$ for all $k$.
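Putting the pieces together, a heavily simplified trust region loop of this general shape might read as follows; the Cauchy step stands in for the inexact subproblem solve, the exact Hessian stands in for $B_k$, and the filter test, retrospective ratio, and quasi-Newton update are omitted, so this is a sketch of the general scheme rather than Algorithm 2.1 itself.

```python
import numpy as np

def cauchy_step(g, B, delta):
    # Cauchy point: minimize the model along -g within the trust region.
    # It already satisfies a sufficient-decrease condition of the form
    # m_k(0) - m_k(d) >= c * ||g|| * min(delta, ||g|| / ||B||).
    gBg = g @ B @ g
    gn = np.linalg.norm(g)
    tau = 1.0 if gBg <= 0 else min(gn**3 / (delta * gBg), 1.0)
    return -tau * (delta / gn) * g

def tr_sketch(f, grad, hess, x, delta=1.0, eta=0.5, mu1=0.25,
              tol=1e-8, max_iter=200):
    # Minimal nonmonotone trust region loop (illustrative only).
    hist = [f(x)]                                  # recent f-values for f_{l(k)}
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:               # Step 2: stopping test
            break
        B = hess(x)                                # stand-in for B_k
        d = cauchy_step(g, B, delta)               # Step 3: (inexact) trial step
        R = eta * max(hist[-6:]) + (1 - eta) * f(x)
        pred = -(g @ d + 0.5 * d @ B @ d)          # m_k(0) - m_k(d)
        r_nb = (R - f(x + d)) / pred               # Step 4: nonmonotone ratio
        if r_nb >= mu1:                            # successful: accept, enlarge
            x = x + d
            hist.append(f(x))
            delta = min(2.0 * delta, 100.0)
        else:                                      # unsuccessful: shrink
            delta *= 0.25
    return x

sol = tr_sketch(lambda x: x @ x, lambda x: 2 * x,
                lambda x: 2 * np.eye(len(x)), np.array([3.0, -4.0]))
print(np.linalg.norm(sol) < 1e-6)  # True
```

On this convex quadratic the model is exact, every step is successful, and the iterates reach the minimizer after a few boundary steps.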

Convergence analysis
For ease of presentation, the following index sets are defined: let $S$ denote the set of successful iterations and $A$ the set of iterations at which the trial point is accepted by the filter. When $k\notin S$, we have $x_{k+1} = x_k + \alpha_k d_k$.
Lemma 3.1. For all $k$, it holds that
$$|f(x_k + d_k) - f(x_k) - m_k(d_k)| \le c\,\|d_k\|^2$$
for some constant $c>0$.
Proof. By Taylor's expansion, $f(x_k+d_k) - f(x_k) = g_k^T d_k + \frac{1}{2} d_k^T \nabla^2 f(\xi_k) d_k$ for some $\xi_k$ on the segment between $x_k$ and $x_k+d_k$; combining this with Assumption 2.1 yields the bound. This completes the proof of Lemma 3.1.
Lemma 3.2. Suppose that Assumption 2.1 holds and the sequence $\{x_k\}$ is generated by Algorithm 2.1. Moreover, assume that there exists a constant $0<\varepsilon<1$ such that $\|g_k\|>\varepsilon$ for all $k$. Then, for any $k$, there exists a nonnegative integer $p$ such that $x_{k+p+1}$ is a successful iteration point, i.e., $r^{NB}_{k+p}\ge\mu_1$.

Proof.
Assume, on the contrary, that there is an iteration $k$ at which $x_{k+p+1}$ is unsuccessful for every nonnegative integer $p$. This clearly implies that the trust region radius is reduced at each of these iterations. Thus, according to $0<\beta_1<1$ and Eq (21), we have $\Delta_{k+p}\to 0$ as $p\to\infty$. Now, according to Lemma 3.1 and Eq (17), we get that $r^{NB}_{k+p}\to 1$ as $p\to\infty$. Following the definition of $R_k$, we get $R_k \ge \eta_k f_k + (1-\eta_k)f_k = f_k$. Thus, for sufficiently large $p$, we have $r^{NB}_{k+p}\ge\mu_1$, which contradicts (20). This completes the proof of Lemma 3.2.
Lemma 3.3. Suppose that the infinite sequence $\{x_k\}$ is generated by Algorithm 2.1 and the number of successful iterations is infinite, that is, $|S| = +\infty$. Then $\{x_k\}\subseteq L(x_0)$; meanwhile, the sequence $\{f_{l(k)}\}$ is monotonically nonincreasing and convergent.
Proof. The proof follows the lines of Lemma 3 and Lemma 4 in [15].
Proof. The proof is divided into the following two cases.
Case 1. The number of successful iterations is infinite, that is, $|S| = +\infty$; meanwhile, there are infinitely many filter iterations, i.e., $|A| = +\infty$.
We prove by contradiction. Assuming that Eq (24) is not true, there exists a positive constant $\varepsilon$ such that $\|g_k\|>\varepsilon$ for all $k$. On account of Assumption 2.1, $\{\|g_k\|\}$ is bounded. Denote the indices in the set $A$ by the sequence $\{k_i\}$. Therefore, there exists a convergent subsequence $\{k_t\}\subseteq\{k_i\}$. This fact, along with the definition of $k_t$, implies that $g(x_{k_t})$ is accepted by the filter. Then there exists $j\in\{1,2,\ldots,n\}$ such that, for all $t>1$,
$$|g_{k_t,j}| \le \max_{1\le i\le n} |g_{k_{t-1},i}| - \gamma_g \varepsilon.$$
By the fact that $0 \le |g_{k_t,j}| - \max_{1\le i\le n} |g_{k_{t-1},i}| \le -\gamma_g\varepsilon < 0$ as $t$ is sufficiently large, we obtain a contradiction.
Case 2. The number of successful iterations is infinite, that is, $|S| = +\infty$; meanwhile, there are only finitely many filter iterations, i.e., $|A|<+\infty$.
To the contrary, suppose that Eq (24) is not true. Then there exists a positive constant $0<\varepsilon<1$ such that $\|g_k\|>\varepsilon$ for all $k$.
As a consequence of $|A|<+\infty$, for sufficiently large $k\in S$ we have $r_k^{NB}\ge\mu_1$. Obviously, when $p$ is fixed and $k\to\infty$, a contradiction follows, which completes the proof.
Then the sequence $\{x_k\}$ converges to $x^*$ superlinearly.
Proof. The proof follows the same path as that of Theorem 4.1 in [19].

Preliminary numerical experiments
In this section, our purpose is to investigate the computational performance of Algorithm 2.1 on some medium- to large-scale test problems from Andrei [20]. All algorithms are implemented in MATLAB (R2018a) on a PC with an Intel(R) Core(TM) i7-4558U CPU @ 2.80 GHz and 4.00 GB of RAM, in double precision format. The following notations represent the different algorithms. ANTRL: the method proposed by Ahookhosh et al. in [6]; NFTR: the method proposed in [17]; WAFTR: the method proposed by Qu et al. in [15]; NAFRTR-1: Algorithm 2.1 with $\Delta_0 = 1$; NAFRTR-2: Algorithm 2.1 with $\Delta_0 = 10$; NAFRTR-3: Algorithm 2.1 with $\Delta_0 = 100$. As is well known, the BFGS correction is one of the most important quasi-Newton methods. Several improved BFGS methods are given in [21,22], and their convergence theory has been well established. More specifically, $B_{k+1}$ is revised following [23] with
$$y_k^{m*} = y_k + \frac{\rho_k}{\|d_k\|^2}\, d_k, \qquad \rho_k = 2\big(f(x_k)-f(x_{k+1})\big) + \big(g(x_{k+1})+g(x_k)\big)^T d_k,$$
so that the formula carries not only gradient information but also function value information. The parameters of these algorithms are chosen identically as follows: $\mu_1 = 0.25$, $\mu_2 = 0.75$, $N = 5$, $\beta_1 = 0.25$, $\beta_2 = 1.5$, $\Delta_{\max} = 100$. It is worth mentioning that the stopping criterion is either $\|g_k\|\le 10^{-6}$ or the number of iterations exceeding 10,000. As mentioned in [6], we set $\eta_0 = 0.25$; the updating rule for $\eta_k$ follows the formula given there. For simplicity, we draw efficiency comparisons in terms of the number of function evaluations ($n_f$), the number of gradient evaluations ($n_i$), and the running time (CPU), using the Dolan-Moré performance profile; the details can be found in [24]. To account for this, we choose a performance index as a comparison metric between the above algorithms. For every $\tau\ge 1$, the proportion $\rho(\tau)$ of the test problems is given by the performance profile.
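The modified difference vector $y_k^{m*}$ above can be dropped into a standard rank-two BFGS update; the following sketch assumes the usual BFGS form for (30), which the text does not spell out, and skips the update when curvature is lost, a common safeguard rather than something stated in the paper.

```python
import numpy as np

def modified_bfgs(B, d, f_old, f_new, g_old, g_new):
    # BFGS update with the modified difference vector
    #   y* = y + (rho / ||d||^2) d,
    #   rho = 2 (f_old - f_new) + (g_new + g_old)^T d,
    # so both gradient and function-value information enter the update.
    y = g_new - g_old
    rho = 2.0 * (f_old - f_new) + (g_new + g_old) @ d
    ys = y + (rho / (d @ d)) * d
    if ys @ d <= 1e-12:                 # skip to preserve positive definiteness
        return B
    Bd = B @ d
    return B - np.outer(Bd, Bd) / (d @ Bd) + np.outer(ys, ys) / (ys @ d)

# On an exact quadratic f = 0.5 x^T A x, rho vanishes and this reduces
# to plain BFGS, so the secant condition B_{k+1} d = y* = A d holds.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
x0, x1 = np.array([1.0, 1.0]), np.array([0.5, 0.25])
d = x1 - x0
B1 = modified_bfgs(np.eye(2), d, 0.5 * x0 @ A @ x0, 0.5 * x1 @ A @ x1,
                   A @ x0, A @ x1)
print(np.allclose(B1 @ d, A @ d))       # True: secant condition holds
```

For non-quadratic functions $\rho_k$ is generally nonzero, and $y_k^{m*}$ then folds the function-value discrepancy into the curvature estimate.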
That is, for each solver, $\rho(\tau)$ is the fraction of test problems on which its performance index was within a factor $\tau$ of the best solver; in particular, $\rho(1)$ gives the fraction of problems on which that solver performed best.
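A Dolan-Moré performance profile can be computed in a few lines; the solver costs below are made-up numbers for illustration only.

```python
import numpy as np

def performance_profile(T, taus):
    # T[s, p] is the cost (e.g. CPU time or n_f) of solver s on problem p,
    # with np.inf marking failures. rho_s(tau) is the fraction of problems
    # solved within a factor tau of the best solver on that problem.
    best = T.min(axis=0)                       # best cost per problem
    ratios = T / best                          # performance ratios r_{s,p}
    return np.array([[np.mean(ratios[s] <= tau) for tau in taus]
                     for s in range(T.shape[0])])

# Two hypothetical solvers on four problems (costs in seconds):
T = np.array([[1.0, 2.0, 4.0, np.inf],        # solver A fails on problem 4
              [2.0, 2.0, 1.0, 3.0]])
rho = performance_profile(T, taus=[1.0, 2.0, 4.0])
print(rho)
```

Here solver A is fastest on half the problems (`rho[0][0] == 0.5`) but never solves problem 4, while solver B solves everything within a factor 2 of the best, which is exactly the kind of trade-off the profile curves make visible.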
It has to be noted that the selection of the initial trust region radius has a great influence on the efficiency of the algorithm. Tables 1 and 2 and Figs 1-3 imply that NAFRTR-2 solves more than 95% of the problems with the minimum number of failures compared with the other two initial radii. Accordingly, we choose the initial trust region radius $\Delta_0 = 10$ as a given parameter for the subsequent comparisons. Moreover, as shown in Figs 4-6 and Table 3, NAFRTR-2 is the best solver in terms of CPU time, $n_f$, and $n_i$ on about 98% of the problems. It is clear that NAFRTR-2 is effective and obtains better performance profiles than ANTRL, NFTR, and WAFTR. Based on these main observations, the modified trust region method turns out to be fairly effective for unconstrained optimization.

Conclusions
In this paper, making proper use of the filter technique, a new trust region method has been proposed in which the trust region radius takes into account the gradient information of the function and the solution of the trust region subproblem. To some extent, it is more reasonable to adopt the convex combination of the nonmonotone trust region ratio and the retrospective ratio. In addition, the approximation of the Hessian matrix is updated by an improved quasi-Newton formula.