A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations

This paper proposes a modified BFGS formula using a trust region model for solving nonsmooth convex minimizations. The method is based on the Moreau-Yosida regularization (smoothing) approach and a new secant equation with a BFGS update formula. Our algorithm uses both function value information and gradient value information to approximate the Hessian. The Hessian approximation is updated by the BFGS formula rather than computed from second-order information of the function, which decreases the computational workload and time. Under suitable conditions, the algorithm converges globally to an optimal solution. Numerical results show that this algorithm can successfully solve nonsmooth unconstrained convex problems.

Let $F:\mathbb{R}^n\to\mathbb{R}$ be the so-called Moreau-Yosida regularization of f, which is defined by

$$F(x) = \min_{z\in\mathbb{R}^n}\left\{ f(z) + \frac{1}{2\lambda}\|z - x\|^2 \right\}, \qquad (2)$$

where $\lambda$ is a positive parameter and $\|\cdot\|$ denotes the Euclidean norm. Problem Eq (1) is equivalent to the problem

$$\min_{x\in\mathbb{R}^n} F(x). \qquad (3)$$

It is well known that problems Eqs (1) and (3) have the same solution set. One of the most effective methods for problem Eq (3) is the trust region method, which plays an important role in nonlinear optimization and has proven to be very efficient. Levenberg [19] and Marquardt [20] first applied this method to nonlinear least-squares problems, and Powell [21] established a convergence result for this method for unconstrained problems. Fletcher [22] first proposed a trust region method for composite nondifferentiable optimization problems. Over the past decades, many authors have studied trust region algorithms for minimizing nonsmooth objective functions. For example, Sampaio, Yuan and Sun [23] used a trust region algorithm for nonsmooth optimization problems; Sun, Sampaio and Yuan [24] proposed a quasi-Newton trust region algorithm for nonsmooth least-squares problems; Zhang [25] proposed a new trust region algorithm for nonsmooth convex minimization; and Yuan, Wei and Wang [26] proposed a gradient trust region algorithm with a limited memory BFGS update for nonsmooth convex minimization problems. For other references on trust region methods, see [27][28][29][30][31][32][33][34][35], among others. In particular, for the problem addressed in this study, the trust region method could be very efficient if the exact Hessian were available. However, computing the Hessian at every iteration is difficult and increases the computational workload and time.
The purpose of this paper is to present an efficient trust region algorithm for solving Eq (3). Using the Moreau-Yosida regularization (smoothing) and the new quasi-Newton equation, the given method has the following good properties: (i) the Hessian approximation uses not only gradient values but also function values; and (ii) the subproblem of the proposed method has the form of the standard trust region subproblem for unconstrained optimization, so it can be solved by existing methods.
The remainder of this paper is organized as follows. In the next section, we briefly review some basic results in convex analysis and nonsmooth analysis and state a new quasi-Newton secant equation. In Section 3, we present a new algorithm for solving problem Eq (3). In Section 4, we prove the global convergence of the proposed method. In Section 5, we report numerical results and compare the proposed method with existing methods for problem Eq (1). We conclude the paper in Section 6.
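Throughout this paper, unless otherwise specified, $\|\cdot\|$ denotes the Euclidean norm of vectors or matrices. Let

$$\theta(z,x) = f(z) + \frac{1}{2\lambda}\|z - x\|^2, \qquad p(x) = \operatorname*{argmin}_{z\in\mathbb{R}^n}\theta(z,x).$$

Since $\theta(\cdot,x)$ is strongly convex, $p(x)$ is well defined and unique. By Eq (2), F can be rewritten as

$$F(x) = f(p(x)) + \frac{1}{2\lambda}\|p(x) - x\|^2.$$

In the following, we denote $g(x) = \nabla F(x)$. Some important properties of F are given as follows:
1. F is finite-valued, convex and everywhere differentiable with $g(x) = \nabla F(x) = \dfrac{x - p(x)}{\lambda}$.
2. The gradient mapping $g:\mathbb{R}^n\to\mathbb{R}^n$ is globally Lipschitz continuous with modulus $1/\lambda$, i.e., $\|g(x) - g(y)\| \le \frac{1}{\lambda}\|x - y\|$ for all $x, y \in \mathbb{R}^n$.
3. x is an optimal solution of Eq (1) if and only if $g(x) = 0$, i.e., $p(x) = x$.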
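As an illustration of Eq (2) and properties 1 to 3, the following Python sketch (ours, not from the paper) takes the one-dimensional function f(z) = |z|, an assumption chosen because p(x) then has the closed form of soft-thresholding. The function names are hypothetical. In this example the envelope F is the Huber function, g(x) = (x − p(x))/λ is Lipschitz with constant 1/λ, and x = 0 is the only point with p(x) = x.

```python
from typing import Tuple


def prox_abs(x: float, lam: float) -> float:
    """p(x) = argmin_z |z| + (z - x)**2 / (2*lam), i.e. soft-thresholding of x by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_yosida_abs(x: float, lam: float) -> Tuple[float, float]:
    """Return (F(x), g(x)) for f(z) = |z|: F = f(p) + (p - x)^2 / (2*lam), g = (x - p) / lam."""
    p = prox_abs(x, lam)
    F = abs(p) + (p - x) ** 2 / (2.0 * lam)
    g = (x - p) / lam
    return F, g


if __name__ == "__main__":
    for x in (3.0, 0.5, 0.0):
        print(x, moreau_yosida_abs(x, lam=1.0))
```

For |x| ≤ λ the envelope equals x²/(2λ), and for |x| > λ it equals |x| − λ/2, so F is differentiable everywhere even though f is not.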
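It is obvious that F(x) and g(x) can be obtained through the optimal solution of $\min_{z\in\mathbb{R}^n}\theta(z,x)$. However, the minimizer $p(x)$ of $\theta(\cdot,x)$ is difficult or even impossible to compute exactly. Thus, we cannot compute the exact values of F(x) and g(x). Fortunately, for each $x\in\mathbb{R}^n$ and any $\varepsilon > 0$, there exists a vector $p^\alpha(x,\varepsilon)\in\mathbb{R}^n$ such that

$$f(p^\alpha(x,\varepsilon)) + \frac{1}{2\lambda}\|p^\alpha(x,\varepsilon) - x\|^2 \le F(x) + \varepsilon. \qquad (6)$$

Thus, when $\varepsilon$ is small, we can use $p^\alpha(x,\varepsilon)$ to define the respective approximations of F(x) and g(x) as follows:

$$F^\alpha(x,\varepsilon) = f(p^\alpha(x,\varepsilon)) + \frac{1}{2\lambda}\|p^\alpha(x,\varepsilon) - x\|^2, \qquad (7)$$

$$g^\alpha(x,\varepsilon) = \frac{x - p^\alpha(x,\varepsilon)}{\lambda}. \qquad (8)$$

The papers [36,37] describe some algorithms for calculating $p^\alpha(x,\varepsilon)$. The following remarkable feature of $F^\alpha(x,\varepsilon)$ and $g^\alpha(x,\varepsilon)$ is obtained from [38].

Proposition 2.1 Let $p^\alpha(x,\varepsilon)$ be a vector satisfying Eq (6), and let $F^\alpha(x,\varepsilon)$ and $g^\alpha(x,\varepsilon)$ be defined by Eqs (7) and (8), respectively. Then, we obtain

$$F(x) \le F^\alpha(x,\varepsilon) \le F(x) + \varepsilon, \qquad (9)$$

$$\|p^\alpha(x,\varepsilon) - p(x)\| \le \sqrt{2\lambda\varepsilon}, \qquad (10)$$

$$\|g^\alpha(x,\varepsilon) - g(x)\| \le \sqrt{\frac{2\varepsilon}{\lambda}}. \qquad (11)$$

The relations Eqs (9), (10) and (11) imply that $F^\alpha(x,\varepsilon)$ and $g^\alpha(x,\varepsilon)$ can be made arbitrarily close to F(x) and g(x), respectively, by choosing the parameter $\varepsilon$ small enough.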
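Proposition 2.1 can be observed numerically. The sketch below is a hedged illustration under stated assumptions: f(z) = |z| again (so the exact p(x), F(x) and g(x) are known), and a fixed number of golden-section steps on θ(·, x) over the bracket [x − 10, x + 10] stands in for the inner algorithms of [36,37]. The bracket, the solver and all helper names are our own choices. The accuracy ε in Eq (6) is measured after the fact as F^α(x, ε) − F(x), and the printed slacks compare |p^α − p| and |g^α − g| with the right-hand sides of Eqs (10) and (11).

```python
import math

LAM = 1.0  # Moreau-Yosida parameter lambda (assumed value for the illustration)


def theta(z: float, x: float) -> float:
    """theta(z, x) = f(z) + (z - x)^2 / (2*lam) for the illustrative choice f(z) = |z|."""
    return abs(z) + (z - x) ** 2 / (2.0 * LAM)


def golden_section(fun, a: float, b: float, iters: int) -> float:
    """Approximately minimize a unimodal function on [a, b] with a fixed number of steps."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = fun(c), fun(d)
    for _ in range(iters):
        if fc <= fd:  # a minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = fun(c)
        else:  # a minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = fun(d)
    return 0.5 * (a + b)


def approx_F_g(x: float, iters: int):
    """p^alpha(x, eps) from an inexact inner solve, then F^alpha by Eq (7) and g^alpha by Eq (8)."""
    p_a = golden_section(lambda z: theta(z, x), x - 10.0, x + 10.0, iters)
    return p_a, theta(p_a, x), (x - p_a) / LAM


def exact_F_g(x: float):
    """Exact p(x), F(x) and g(x) for f = |.|, where p(x) is soft-thresholding."""
    p = math.copysign(max(abs(x) - LAM, 0.0), x)
    return p, theta(p, x), (x - p) / LAM


if __name__ == "__main__":
    x = 0.5
    p, F, g = exact_F_g(x)
    for iters in (5, 10, 20):
        p_a, F_a, g_a = approx_F_g(x, iters)
        eps = max(F_a - F, 0.0)  # accuracy actually achieved in Eq (6)
        print(iters, eps,
              math.sqrt(2 * LAM * eps) - abs(p_a - p),  # slack in Eq (10), expected >= 0
              math.sqrt(2 * eps / LAM) - abs(g_a - g))  # slack in Eq (11), expected >= 0
```

For this f and x = 3, the bound Eq (10) is attained with equality, because θ(·, 3) is exactly quadratic with curvature 1/λ around p(3) = 2; the demo therefore uses x = 0.5, where p(x) = 0 sits at the kink of f and the bounds hold with room to spare.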
Next, recall the quasi-Newton method for problem Eq (1) when f is smooth. The iterate $x_{k+1}$ satisfies $\nabla f_k + B_k(x_{k+1} - x_k) = 0$, where $\nabla f_k = \nabla f(x_k)$, $B_k$ is an approximation of the Hessian of f at $x_k$, and the sequence of matrices $\{B_k\}$ satisfies the secant equation

$$B_{k+1}s_k = y_k, \qquad (12)$$

where $y_k = \nabla f_{k+1} - \nabla f_k$ and $s_k = x_{k+1} - x_k$. However, Eq (12) does not exploit function values: the method uses only gradient information. Motivated by this observation, we hope to develop a method that uses both gradient information and function information. This problem has been studied by several authors. In particular, Wei, Li and Qi [39] proposed an important modified secant equation that uses not only the gradient values but also the function values:

$$B_{k+1}s_k = y_k^* = y_k + \frac{\vartheta_k}{\|s_k\|^2}s_k, \qquad (13)$$

where $\vartheta_k = 2[f(x_k) - f(x_{k+1})] + [\nabla f(x_{k+1}) + \nabla f(x_k)]^T s_k$. When f is three times continuously differentiable and $B_{k+1}$ is updated by the BFGS formula [40][41][42][43], with $B_0 = I$ the identity matrix, the secant Eq (13) possesses the following remarkable property:

$$s_k^T G_{k+1}s_k - s_k^T y_k^* = \frac{1}{3}s_k^T(T_{k+1}s_k)s_k + O(\|s_k\|^4),$$

whereas

$$s_k^T G_{k+1}s_k - s_k^T y_k = \frac{1}{2}s_k^T(T_{k+1}s_k)s_k + O(\|s_k\|^4),$$

where $G_{k+1}$ and $T_{k+1}$ denote the Hessian and the tensor of third derivatives of f at $x_{k+1}$. This property holds for all k. Based on Theorem 2.1 of [39], Eq (13) therefore approximates the curvature of f along $s_k$ more accurately than Eq (12).
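The modified vector of Eq (13) is cheap to form from two function values and two gradients. In this sketch, the formula y_k^* = y_k + (ϑ_k/‖s_k‖²)s_k with ϑ_k = 2[f(x_k) − f(x_{k+1})] + [∇f(x_{k+1}) + ∇f(x_k)]^T s_k is our reading of the Wei-Li-Qi equation [39], and the function name is hypothetical. For a quadratic f, ϑ_k = 0 and y_k^* reduces to y_k; for the cubic f(x) = x³, the example below reproduces the 1/3 versus 1/2 third-order error terms exactly, because the fourth derivative vanishes.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def modified_secant_vector(f_k: float, f_k1: float, grad_k: Vector, grad_k1: Vector,
                           s_k: Vector) -> Vector:
    """y_k^* = y_k + (theta_k / ||s_k||^2) s_k, where
    theta_k = 2 [f(x_k) - f(x_{k+1})] + [grad f(x_{k+1}) + grad f(x_k)]^T s_k."""
    y_k = [b - a for a, b in zip(grad_k, grad_k1)]
    theta_k = 2.0 * (f_k - f_k1) + dot([a + b for a, b in zip(grad_k, grad_k1)], s_k)
    scale = theta_k / dot(s_k, s_k)
    return [yi + scale * si for yi, si in zip(y_k, s_k)]


if __name__ == "__main__":
    # f(x) = x^3 from x_k = 1 to x_{k+1} = 2: f'' = 6x, f''' = 6, s_k = 1.
    y_star = modified_secant_vector(1.0, 8.0, [3.0], [12.0], [1.0])
    print(y_star)  # [10.0]: s G s - s y* = 12 - 10 = 2 = (1/3)*6, versus 12 - 9 = 3 = (1/2)*6 for y_k
```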

The new model
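In this section, we present a modified BFGS formula using a trust region model for solving Eq (1), motivated by the Moreau-Yosida regularization (smoothing), the general trust region method and the new secant Eq (13). First, we describe the trust region method. In each iteration, a trial step $d_k$ is generated by solving the trust region subproblem

$$\min_{d\in\mathbb{R}^n} q_k(d) = g^\alpha(x_k,\varepsilon_k)^T d + \frac{1}{2}d^T B_k d \quad \text{s.t.} \quad \|d\| \le \Delta_k, \qquad (14)$$

which uses the approximate gradient $g^\alpha(x_k,\varepsilon_k)$ of F at $x_k$ and a matrix $B_k$ updated with the help of Eq (13). Here, $\varepsilon_k > 0$ is a scalar and $\Delta_k$ is the trust region radius. Let $d_k$ be the optimal solution of Eq (14). The actual reduction is defined by

$$Are(d_k) = F^\alpha(x_k,\varepsilon_k) - F^\alpha(x_k + d_k,\varepsilon_{k+1}), \qquad (15)$$

and the predicted reduction by

$$Pre(d_k) = -q_k(d_k). \qquad (16)$$

Then, $r_k$ is defined as the ratio between $Are(d_k)$ and $Pre(d_k)$:

$$r_k = \frac{Are(d_k)}{Pre(d_k)}. \qquad (17)$$

Based on the new secant Eq (13), with $B_{k+1}$ updated by the BFGS formula, we propose the modified BFGS formula

$$B_{k+1} = \begin{cases} B_k - \dfrac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \dfrac{n_k n_k^T}{s_k^T n_k}, & \text{if } s_k^T n_k > 0,\\[2mm] B_k, & \text{otherwise}, \end{cases} \qquad (18)$$

where $s_k = x_{k+1} - x_k$, $n_k$ is the counterpart of $y_k^*$ in Eq (13) computed from $F^\alpha$ and $g^\alpha$, and $B_0 = I$ is the identity matrix. We now list the steps of the modified trust region algorithm as follows.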
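The quantities in Eqs (14) to (18) translate directly into a few helper functions. The sketch below assumes q_k(d) = g^T d + ½ d^T B d, Pre(d_k) = −q_k(d_k) and Are(d_k) = F_old − F_new, as in the definitions above; n_k is taken as an input (for instance, the modified vector of Eq (13)), and all function names are hypothetical. Matrices are plain nested lists so that the example needs only the standard library.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(B: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in B]


def model_value(g: Vector, B: Matrix, d: Vector) -> float:
    """q_k(d) = g^T d + 0.5 d^T B d, the objective of subproblem Eq (14)."""
    return dot(g, d) + 0.5 * dot(d, matvec(B, d))


def ratio(F_old: float, F_new: float, g: Vector, B: Matrix, d: Vector) -> float:
    """r_k = Are(d_k) / Pre(d_k) with Are = F_old - F_new and Pre = -q_k(d_k)."""
    return (F_old - F_new) / (-model_value(g, B, d))


def modified_bfgs_update(B: Matrix, s: Vector, n: Vector) -> Matrix:
    """Eq (18): BFGS update with n_k in place of y_k; keep B_k when s_k^T n_k <= 0."""
    sn = dot(s, n)
    if sn <= 0.0:
        return [row[:] for row in B]
    Bs = matvec(B, s)  # B is symmetric, so (B s s^T B)_{ij} = (Bs)_i (Bs)_j
    sBs = dot(s, Bs)
    m = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + n[i] * n[j] / sn for j in range(m)]
            for i in range(m)]
```

With B_k = I, s_k = (1, 0) and n_k = (2, 1), the update returns [[2, 1], [1, 1.5]], which satisfies the secant condition B_{k+1}s_k = n_k; with n_k = (−1, 1) the update is skipped because s_k^T n_k < 0.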
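Algorithm 1
Step 0. Choose an initial point $x_0\in\mathbb{R}^n$ and constants $0 < \sigma_1 < \sigma_2 < 1$, $0 < \eta_1 < 1 < \eta_2$, $\lambda > 0$ and $0 < \Delta_0 \le \Delta_{\max}$, where $\Delta_{\max}$ is the maximum trust region radius. Let $B_0 = I$, where I is the identity matrix, and let $k := 0$.
Step 1. Compute $g^\alpha(x_k,\varepsilon_k)$ by Eq (8). If $\|g^\alpha(x_k,\varepsilon_k)\|$ is below a prescribed tolerance, stop.
Step 2. Solve the subproblem Eq (14) to obtain the trial step $d_k$.
Step 3. Compute $Are(d_k)$, $Pre(d_k)$ and $r_k$ by Eqs (15), (16) and (17).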
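Step 4. Regulate the trust region radius. Let

$$\Delta_{k+1} = \begin{cases} \eta_1\Delta_k, & \text{if } r_k < \sigma_1,\\ \min\{\eta_2\Delta_k, \Delta_{\max}\}, & \text{if } r_k \ge \sigma_2,\\ \Delta_k, & \text{otherwise}. \end{cases}$$

Step 5. If $r_k \ge \sigma_1$, let $x_{k+1} = x_k + d_k$, update $B_{k+1}$ by Eq (18), let $k := k + 1$ and go back to Step 1. Otherwise, let $x_{k+1} := x_k$ and $k := k + 1$, and return to Step 2.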
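Putting the pieces together, the loop below is a minimal, hedged sketch of Algorithm 1, not the authors' MATLAB implementation. It rests on several assumptions: the hypothetical test function f(x) = ‖x‖₁, whose proximal point is exact (so F^α = F and ε_k plays no role); the Cauchy point as an approximate solution of Eq (14) instead of an exact subproblem solver; a standard ratio-based radius rule (shrink by η₁ when r_k < σ₁, enlarge to min{η₂Δ_k, Δ_max} when r_k ≥ σ₂); and an extra curvature threshold CURV_TOL that skips updates with s_k^T n_k ≤ 10⁻⁸‖s_k‖². The threshold matters in the piecewise-linear part of F, where ϑ_k and y_k vanish up to rounding, so s_k^T n_k can be positive only by rounding error and an unguarded update would make B_{k+1} nearly singular. The parameter values are those of the Results section.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Parameter values from the Results section (LAM is lambda).
SIGMA1, SIGMA2, ETA1, ETA2 = 0.45, 0.75, 0.5, 4.0
LAM, DELTA0, DELTA_MAX, TOL = 1.0, 0.5, 100.0, 1e-6
CURV_TOL = 1e-8  # assumption: safeguard on s^T n, stricter than the plain "> 0" test of Eq (18)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(B: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in B]


def F_and_g(x: Vector) -> Tuple[float, Vector]:
    """Exact Moreau-Yosida value and gradient for the hypothetical test function f(x) = ||x||_1."""
    p = [math.copysign(max(abs(xi) - LAM, 0.0), xi) for xi in x]  # soft-thresholding gives p(x)
    F = sum(abs(pi) for pi in p) + sum((pi - xi) ** 2 for pi, xi in zip(p, x)) / (2.0 * LAM)
    return F, [(xi - pi) / LAM for xi, pi in zip(x, p)]


def cauchy_step(g: Vector, B: Matrix, delta: float) -> Vector:
    """Cauchy point of subproblem Eq (14), used as a cheap approximate solution."""
    gnorm = math.sqrt(dot(g, g))
    gBg = dot(g, matvec(B, g))
    tau = 1.0 if gBg <= 0.0 else min(1.0, gnorm ** 3 / (delta * gBg))
    return [-tau * delta / gnorm * gi for gi in g]


def algorithm1_sketch(x: Vector, max_iter: int = 500) -> Tuple[Vector, int]:
    m = len(x)
    B = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # B_0 = I
    delta = DELTA0
    F, g = F_and_g(x)
    k = 0
    while k < max_iter and math.sqrt(dot(g, g)) > TOL:  # stopping test on ||g||
        d = cauchy_step(g, B, delta)
        pred = -(dot(g, d) + 0.5 * dot(d, matvec(B, d)))  # Pre(d_k) = -q_k(d_k)
        x_new = [xi + di for xi, di in zip(x, d)]
        F_new, g_new = F_and_g(x_new)
        r = (F - F_new) / pred  # r_k = Are(d_k) / Pre(d_k)
        if r < SIGMA1:  # radius update
            delta *= ETA1
        elif r >= SIGMA2:
            delta = min(ETA2 * delta, DELTA_MAX)
        if r >= SIGMA1:  # accept x_k + d_k and update B_k with the modified vector n_k
            theta = 2.0 * (F - F_new) + dot([a + b for a, b in zip(g, g_new)], d)
            n = [b - a + theta / dot(d, d) * di for a, b, di in zip(g, g_new, d)]
            sn = dot(d, n)
            if sn > CURV_TOL * dot(d, d):
                Bs = matvec(B, d)
                sBs = dot(d, Bs)
                B = [[B[i][j] - Bs[i] * Bs[j] / sBs + n[i] * n[j] / sn for j in range(m)]
                     for i in range(m)]
            x, F, g = x_new, F_new, g_new
        k += 1
    return x, k


if __name__ == "__main__":
    x_star, iters = algorithm1_sketch([3.0, -2.0])
    print(x_star, iters)
```

Starting from x₀ = (3, −2), the sketch should reach ‖g(x)‖ ≤ 10⁻⁶ near the minimizer x* = 0 of ‖x‖₁, where p(x*) = x*.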
Similar to Dennis and Moré [44] or Yuan and Sun [45], we have the following result.
Lemma 1 Let $B_k$ be symmetric and positive definite. Then the matrix $B_{k+1}$ given by the BFGS formula in Eq (18) inherits the positive definiteness of $B_k$ if and only if $s_k^T n_k > 0$.
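Proof "⇒" If $B_{k+1}$ is symmetric and positive definite, then, since the BFGS formula gives $B_{k+1}s_k = n_k$, we have $s_k^T n_k = s_k^T B_{k+1}s_k > 0$.
"⇐" Conversely, suppose that $s_k^T n_k > 0$ and that $B_k$ is symmetric and positive definite. We shall prove that $x^T B_{k+1}x > 0$ holds for arbitrary $0 \ne x \in \mathbb{R}^n$; since $B_0 = I$ is symmetric and positive definite, the result then holds for all k by induction. From Eq (18), we have

$$x^T B_{k+1}x = x^T B_k x - \frac{(x^T B_k s_k)^2}{s_k^T B_k s_k} + \frac{(x^T n_k)^2}{s_k^T n_k}. \qquad (19)$$

Because $B_k$ is symmetric and positive definite, there exists a symmetric and positive definite matrix $B_k^{1/2}$ such that $B_k = B_k^{1/2}B_k^{1/2}$. Applying the Cauchy-Schwarz inequality to $B_k^{1/2}x$ and $B_k^{1/2}s_k$ gives

$$(x^T B_k s_k)^2 \le (x^T B_k x)(s_k^T B_k s_k), \qquad (20)$$

and equality holds if and only if $x = t s_k$ for some real $t \ne 0$. If Eq (20) holds strictly, then, since $s_k^T n_k > 0$, the first two terms of Eq (19) sum to a positive number and the last term is nonnegative, so $x^T B_{k+1}x > 0$. If equality holds in Eq (20), then $x = t s_k$, and Eq (19) gives $x^T B_{k+1}x = t^2 s_k^T n_k > 0$. Therefore, $x^T B_{k+1}x > 0$ for each $0 \ne x \in \mathbb{R}^n$. This completes the proof. Lemma 1 states that, as long as $s_k^T n_k > 0$, the matrices $\{B_k\}$ updated by the BFGS formula of Eq (18) remain symmetric and positive definite.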
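Lemma 1 can be observed on a 2 × 2 example. The sketch below applies the BFGS branch of Eq (18) without its safeguard and tests positive definiteness by attempting a Cholesky factorization; the helper names and the data are hypothetical. With B_k = I and s_k = (1, 0), the choice n_k = (2, 1), for which s_k^T n_k = 2, yields a positive definite matrix, while n_k = (−1, 1), for which s_k^T n_k = −1, does not, since then s_k^T B_{k+1}s_k = s_k^T n_k < 0.

```python
import math
from typing import List

Matrix = List[List[float]]


def bfgs_formula(B: Matrix, s: List[float], n: List[float]) -> Matrix:
    """The BFGS branch of Eq (18), applied without the s^T n > 0 safeguard."""
    m = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(m)) for i in range(m)]
    sBs = sum(si * bsi for si, bsi in zip(s, Bs))
    sn = sum(si * ni for si, ni in zip(s, n))
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + n[i] * n[j] / sn for j in range(m)]
            for i in range(m)]


def is_positive_definite(A: Matrix) -> bool:
    """A Cholesky factorization of a symmetric A succeeds iff A is positive definite."""
    m = len(A)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True


if __name__ == "__main__":
    I2 = [[1.0, 0.0], [0.0, 1.0]]
    s = [1.0, 0.0]
    print(is_positive_definite(bfgs_formula(I2, s, [2.0, 1.0])))   # s^T n = 2 > 0: True
    print(is_positive_definite(bfgs_formula(I2, s, [-1.0, 1.0])))  # s^T n = -1 < 0: False
```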

Convergence analysis
In this section, we establish the global convergence of Algorithm 1 under the following assumption.
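Assumption A.
Lemma 2 Let Assumption A hold and let $d_k$ be the solution of Eq (14). Then

$$Pre(d_k) \ge \frac{1}{2}\|g^\alpha(x_k,\varepsilon_k)\|\min\left\{\Delta_k, \frac{\|g^\alpha(x_k,\varepsilon_k)\|}{\|B_k\|}\right\}. \qquad (21)$$

Proof The proof is similar to that of Lemma 7(6.2) in Ma [46]. For brevity, write $g_k^\alpha = g^\alpha(x_k,\varepsilon_k)$. Since the matrix sequence $\{B_k\}$ is symmetric and positive definite, we define the Cauchy point $d_k^c$ at the iteration point $x_k$ by

$$d_k^c = -\tau_k\frac{\Delta_k}{\|g_k^\alpha\|}g_k^\alpha, \qquad \tau_k = \min\left\{1, \frac{\|g_k^\alpha\|^3}{\Delta_k\,(g_k^\alpha)^T B_k g_k^\alpha}\right\}.$$

It is easy to verify that the Cauchy point is feasible, i.e., $\|d_k^c\| \le \Delta_k$. If $\tau_k = 1$, then $(g_k^\alpha)^T B_k g_k^\alpha \le \|g_k^\alpha\|^3/\Delta_k$, and thus

$$q_k(d_k^c) = -\Delta_k\|g_k^\alpha\| + \frac{\Delta_k^2}{2\|g_k^\alpha\|^2}(g_k^\alpha)^T B_k g_k^\alpha \le -\frac{1}{2}\Delta_k\|g_k^\alpha\|.$$

Otherwise, $d_k^c = -\dfrac{\|g_k^\alpha\|^2}{(g_k^\alpha)^T B_k g_k^\alpha}g_k^\alpha$, and thus

$$q_k(d_k^c) = -\frac{\|g_k^\alpha\|^4}{2(g_k^\alpha)^T B_k g_k^\alpha} \le -\frac{\|g_k^\alpha\|^2}{2\|B_k\|}.$$

Let $d_k$ be the solution of Eq (14). Because $q_k(d_k^c) \ge q_k(d_k)$, we have $Pre(d_k) = -q_k(d_k) \ge -q_k(d_k^c)$, and Eq (21) follows. This completes the proof.
Lemma 3 Let Assumption A hold and let the sequence $\{x_k\}$ be generated by Algorithm 1. If $d_k$ is the solution of Eq (14), then

$$|Are(d_k) - Pre(d_k)| = O(\|d_k\|^2). \qquad (22)$$

Proof Let $d_k$ be the solution of Eq (14). Expanding $F^\alpha(x_k + d_k,\varepsilon_{k+1})$ by Taylor expansion (Eq (23)) and using the definitions of $Are(d_k)$ and $Pre(d_k)$, we obtain Eq (22). The proof is complete.
Lemma 4 Let Assumption A hold. Then Algorithm 1 does not cycle infinitely in the inner loop between Steps 2 and 5.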
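The Cauchy point used in the proof of Lemma 2 is also a practical way to obtain an approximate solution of Eq (14). The sketch below computes it and checks the Cauchy decrease bound Pre ≥ ½‖g‖ min{Δ_k, ‖g‖/‖B_k‖} on a few hand-picked cases. As a simplifying assumption, it uses the Frobenius norm, which is never smaller than the spectral norm ‖B_k‖, so the bound it checks is slightly weaker than the one in the lemma. Names and data are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(B: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in B]


def cauchy_point(g: Vector, B: Matrix, delta: float) -> Vector:
    """d^c = -tau (delta / ||g||) g with tau = min{1, ||g||^3 / (delta g^T B g)}, tau = 1 if g^T B g <= 0."""
    gnorm = math.sqrt(dot(g, g))
    gBg = dot(g, matvec(B, g))
    tau = 1.0 if gBg <= 0.0 else min(1.0, gnorm ** 3 / (delta * gBg))
    return [-tau * delta / gnorm * gi for gi in g]


def cauchy_decrease_bound(g: Vector, B: Matrix, delta: float) -> float:
    """0.5 ||g|| min{delta, ||g|| / ||B||_F}; since ||B||_F >= ||B||_2 this is a valid, weaker bound."""
    gnorm = math.sqrt(dot(g, g))
    b_norm = math.sqrt(sum(bij * bij for row in B for bij in row))
    return 0.5 * gnorm * min(delta, gnorm / b_norm)


if __name__ == "__main__":
    g = [1.0, -2.0]
    B = [[2.0, 0.5], [0.5, 1.0]]
    for delta in (0.1, 1.0, 10.0):
        d = cauchy_point(g, B, delta)
        pred = -(dot(g, d) + 0.5 * dot(d, matvec(B, d)))  # Pre = -q_k(d^c)
        print(delta, pred, cauchy_decrease_bound(g, B, delta))
```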
Proof Suppose, by contradiction, that Algorithm 1 cycles infinitely between Steps 2 and 5 at the iteration point $x_k$, i.e., $r_k < \sigma_1$ at every inner iteration, and that there exists a scalar $\rho > 0$ such that $\|g^\alpha(x_k,\varepsilon_k)\| \ge \rho$. Since $0 < \eta_1 < 1$ and the radius is multiplied by $\eta_1$ at each inner iteration, we have $\Delta_k \to 0$. By Eq (22) of Lemma 3, Eq (21) of Lemma 2 and the definition of $r_k$, we obtain

$$|r_k - 1| = \frac{|Are(d_k) - Pre(d_k)|}{Pre(d_k)} \le \frac{O(\Delta_k^2)}{\frac{1}{2}\rho\min\{\Delta_k, \rho/\|B_k\|\}} \to 0,$$

which means that $r_k \ge \sigma_1$ must eventually hold; this contradicts the assumption that $r_k < \sigma_1$, and the proof is complete. Based on the above lemmas, we can now establish the global convergence of Algorithm 1 under suitable conditions.
Theorem 1 (Global Convergence) Suppose that Assumption A holds and that the sequence $\{x_k\}$ is generated by Algorithm 1. Let $d_k$ be the solution of Eq (14). Then $\liminf_{k\to\infty}\|g_k\| = 0$, where $g_k = g(x_k)$, and any accumulation point of $\{x_k\}$ is an optimal solution of Eq (1).
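Proof We first prove that

$$\liminf_{k\to\infty}\|g^\alpha(x_k,\varepsilon_k)\| = 0. \qquad (24)$$

Suppose that $g^\alpha(x_k,\varepsilon_k) \ne 0$. By the definition of $r_k$, we have

$$|r_k - 1| = \frac{|Are(d_k) - Pre(d_k)|}{Pre(d_k)}. \qquad (25)$$

Using Taylor expansion as in Lemma 3, we obtain, when $\Delta_k > 0$ is small enough,

$$|Are(d_k) - Pre(d_k)| = O(\Delta_k^2). \qquad (26)$$

Suppose, by contradiction, that there exists $\omega_0 > 0$ such that $\|g^\alpha(x_k,\varepsilon_k)\| \ge \omega_0$ for all k. Using Eqs (25) and (26) and Lemma 2, we have

$$|r_k - 1| \le \frac{O(\Delta_k^2)}{\frac{1}{2}\omega_0\min\{\Delta_k, \omega_0/\|B_k\|\}}, \qquad (27)$$

which means that there exists a sufficiently small $\bar\Delta > 0$ such that, whenever $\Delta_k \le \bar\Delta$, we have $|r_k - 1| < 1 - \sigma_2$, i.e., $r_k > \sigma_2$. Then, according to Algorithm 1, $\Delta_{k+1} \ge \Delta_k$. Thus, there exist a positive integer $k_0$ and a constant $\rho_0 > 0$ such that

$$\Delta_k \ge \rho_0 \quad \text{for all } k \ge k_0. \qquad (28)$$

On the other hand, F is bounded from below. If $r_k \ge \sigma_1$ for infinitely many k, then, by the definition of $r_k$ and Lemma 2, for each such $k \ge k_0$,

$$F^\alpha(x_k,\varepsilon_k) - F^\alpha(x_{k+1},\varepsilon_{k+1}) \ge \sigma_1 Pre(d_k) \ge \frac{\sigma_1}{2}\omega_0\min\left\{\Delta_k, \frac{\omega_0}{\|B_k\|}\right\},$$

which means that $\Delta_k \to 0$ as $k \to \infty$; this contradicts Eq (28). Otherwise, $r_k < \sigma_1$ for all sufficiently large k; then $\Delta_{k+1} = \eta_1\Delta_k$ for all such k, so $\Delta_k \to 0$ as $k \to \infty$, which again contradicts Eq (28). This contradiction shows that Eq (24) holds.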
We now show that $\liminf_{k\to\infty}\|g_k\| = 0$. By Eq (11), we have

$$\|g^\alpha(x_k,\varepsilon_k) - g(x_k)\| \le \sqrt{\frac{2\varepsilon_k}{\lambda}}.$$

Together with Eq (24) and Assumption A(iv), this implies that

$$\liminf_{k\to\infty}\|g(x_k)\| = 0. \qquad (29)$$

Finally, let $x^*$ be an accumulation point of $\{x_k\}$. Then, without loss of generality, there exists a subsequence $\{x_k\}_K$ satisfying

$$\lim_{k\to\infty,\,k\in K} x_k = x^*. \qquad (30)$$

From the properties of F, we have $g(x_k) = (x_k - p(x_k))/\lambda$. Thus, by using Eqs (29) and (30), we have $x^* = p(x^*)$. Therefore, $x^*$ is an optimal solution of Eq (1). The proof is complete. Similar to Theorem 3.7 in [25], it can be shown that the rate of convergence of Algorithm 1 is Q-superlinear. We omit the proof here; the proof of Q-superlinear convergence can be found in [25].
Theorem 2 (Q-superlinear Convergence) [25] Suppose that Assumption A(ii) holds, that the sequence $\{x_k\}$ generated by Algorithm 1 has a limit point $x^*$, and that g is BD-regular and semismooth at $x^*$. Furthermore, suppose that $\varepsilon_k = o(\|g(x_k)\|^2)$. Then:
1. $x^*$ is the unique solution of Eq (1);
2. the entire sequence $\{x_k\}$ converges to $x^*$ Q-superlinearly, i.e.,

$$\lim_{k\to\infty}\frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0.$$

Results
In this section, we test our modified BFGS formula using a trust region model on nonsmooth problems. The nonsmooth test problems listed in Table 1 can be found in [47][48][49][50][51][52][53]. Table 1 lists the problem dimensions and optimal function values, where "No." is the number of the test problem, "Dim" is its dimension, "Problem" is its name, "$x_0$" is the initial point, and "$f_{ops}(x)$" is the optimal function value. The modified algorithm was implemented in MATLAB 7.0.4, and all numerical experiments were run on a PC with an Intel Core 2 Duo T6600 CPU at 2.20 GHz, 2.00 GB of RAM and the Windows 7 operating system. To test the performance of the given algorithm on the problems listed in Table 1, we compared our method with the method based on the trust region concept (BT) of [15], the proximal bundle method (PBL) of [17] and the gradient trust region algorithm with limited memory BFGS update (LGTR) described in [26]. The parameters were chosen as follows: $\sigma_1 = 0.45$, $\sigma_2 = 0.75$, $\eta_1 = 0.5$, $\eta_2 = 4$, $\lambda = 1$, $\Delta_0 = 0.5 < \Delta_{\max} = 100$ and $\varepsilon_k = \frac{1}{(2+k)^2}$, where k is the iteration number. We stopped the algorithm when the condition $\|g^\alpha(x,\varepsilon)\| \le 10^{-6}$ was satisfied. Following the idea of [26], we used the MATLAB function fminsearch to solve $\min_z \theta(z,x)$, which gives the approximate solution $p^\alpha(x,\varepsilon)$; $g^\alpha(x,\varepsilon)$ was then computed using Eq (8). The results of PBL, LGTR, BT and our modified algorithm are listed in Table 2. The numerical results of PBL and BT can be found in [17], and those of LGTR can be found in [26]. The following notation is used in Table 2: "NI" is the number of iterations; "NF" is the number of function evaluations; "f(x)" is the function value at the final iteration; "--" indicates that the algorithm failed to solve the problem; and "Total" denotes the sums of NI and of NF over all problems.
The numerical results show that our algorithm outperforms the other methods in Table 2: the totals of NI and NF for our algorithm are smaller than those of the other three algorithms. The paper [54] provides a tool for analyzing the efficiency of these four algorithms. Figs 1 and 2 show the performance of these four methods with respect to NI and NF in Table 2, respectively. These two figures show that Algorithm 1 performs well on all the test problems compared with PBL, LGTR and BT. In sum, the preliminary numerical results indicate that the modified method is efficient for solving nonsmooth convex minimization problems.
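Assuming that the tool of [54] is a performance profile in the style of Dolan and Moré, where each solver's cost on a problem is divided by the best cost among all solvers and ρ_s(τ) is the fraction of problems whose ratio is at most τ, curves like those in Figs 1 and 2 can be computed from the NI or NF columns of Table 2 along the following lines. The function name is hypothetical, failures ("--") are encoded as None, costs are assumed positive, and the cost matrix in the test is made up rather than taken from Table 2.

```python
import math
from typing import List, Optional, Sequence


def performance_profile(costs: List[List[Optional[float]]],
                        taus: Sequence[float]) -> List[List[float]]:
    """costs[s][p] = positive cost (e.g. NI or NF) of solver s on problem p, or None on failure.
    Returns rho[s][i] = fraction of problems with performance ratio r_{p,s} <= taus[i]."""
    n_solvers, n_problems = len(costs), len(costs[0])
    ratios = [[math.inf] * n_problems for _ in range(n_solvers)]
    for p in range(n_problems):
        best = min((costs[s][p] for s in range(n_solvers) if costs[s][p] is not None),
                   default=None)
        if best is None:  # every solver failed on problem p: all ratios stay infinite
            continue
        for s in range(n_solvers):
            if costs[s][p] is not None:
                ratios[s][p] = costs[s][p] / best
    return [[sum(r <= tau for r in ratios[s]) / n_problems for tau in taus]
            for s in range(n_solvers)]
```

A solver whose curve is highest at τ = 1 wins on the largest share of problems, and the value at large τ measures robustness, that is, the share of problems it solves at all.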

Conclusion
The trust region method is one of the most efficient optimization methods. In this paper, by using the Moreau-Yosida regularization (smoothing) and a new secant equation with the BFGS formula, we present a modified BFGS formula using a trust region model for solving nonsmooth convex minimizations. Our algorithm does not compute the Hessian of the objective function at every iteration, which decreases the computational workload and time, and it uses both function information and gradient information. Under suitable conditions, global convergence is established, and, as in [25], the rate of convergence of our algorithm is Q-superlinear. Numerical results show that this algorithm is efficient. We believe that this algorithm can be used in future applications to solve nonsmooth convex minimizations.