An active-set algorithm for solving large-scale nonsmooth optimization models with box constraints

It is well known that active-set algorithms are very effective for smooth box-constrained optimization, and many results have been obtained in this field. We extend the active-set method to nonsmooth box-constrained optimization problems, using the Moreau-Yosida regularization technique to smooth the objective function. A limited memory BFGS (L-BFGS) update is introduced to reduce the computational workload. The presented algorithm has the following properties: (1) all iterates are feasible and the sequence of objective function values is decreasing; (2) rapid changes in the active set are allowed; (3) the subproblem is a lower dimensional system of linear equations. Global convergence of the new method is established under suitable conditions, and numerical results show that the method is effective for large-scale nonsmooth problems (up to 5,000 variables).


Introduction
Consider
$$\min_{x \in K} f(x), \qquad (1)$$
where $f: \mathbb{R}^n \to \mathbb{R}$ is a possibly nonsmooth convex function, $K = \{x \mid l \le x \le u\}$, the vectors $l$ and $u$ represent lower and upper bounds on the variables, and $n$ is the number of variables. Similar problems are discussed by Fukushima [1,2], where equality constraints are considered and a penalty strategy is used. Problem (1) can be viewed as an extension of the linearly constrained convex nonsmooth problems considered in, e.g., [3,4] from linear to possibly nonlinear. In fact, problems from many fields, including finance, engineering, management, biology, and medicine, can be converted to optimization models of the form (1) (see [5][6][7][8][9] for details).
Generally, nonsmooth problems are very difficult to solve even when they are unconstrained. Derivative-free methods, like Powell's method [10] or genetic algorithms [11], may be unreliable and become inefficient as the dimension of the problem increases. The direct application of smooth gradient-based methods to nonsmooth problems may lead to failures in optimality conditions, in convergence, or in gradient approximation [12]. Wolfe [13] and Lemaréchal [14] initiated a giant stride forward in nonsmooth optimization with the bundle concept. Kiwiel [15] proposed a bundle variant that is close to the bundle trust iteration method [16]. Some good results on the bundle technique can be found in [17][18][19], etc. At the moment, various versions of bundle methods are regarded as the most effective and reliable methods for nonsmooth optimization. However, bundle methods are efficient only for small- and medium-scale problems, since they need relatively large bundles to solve problems efficiently [17]. Therefore, special tools for large-scale nonsmooth optimization are needed. Haarala et al. (see [20,21] etc.) introduced limited memory bundle methods for large-scale nonsmooth unconstrained and constrained minimization, which are a hybrid of the variable metric bundle methods and the limited memory variable metric methods, and obtained some good results; their test problems can have thousands of decision variables. More related literature can be found in [22][23][24][25][26]. Yuan et al. [27][28][29][30][31] carried out studies in which nonsmooth problems with dimension up to 100,000 were solved in the unconstrained case [28]. The active-set method can be generalized easily when the objective function is nonsmooth.
PLOS ONE | https://doi.org/10.1371/journal.pone.0189290 January 2, 2018
For example, Sreedharan [32] extends the method developed in [33] to solve nonsmooth problems with a special objective function and inequality constraint. Also, it is quite easy to generalize the ε-active set method to the nondifferentiable case (see, e.g., [34]). In this paper we use the active-set method to solve (1) when the objective function f is convex but not necessarily differentiable. Convexity, which is not essential for our study, is assumed only for simplicity. For the objective function, we first use the Moreau-Yosida regularization technique to make it smooth. Then the active-set limited memory BFGS (L-BFGS) technique is proposed to solve it. Global convergence is established under suitable conditions. The main features of the proposed method are as follows.
1. The iterates are feasible; large changes are allowed in the active set; the subproblem has lower dimension; and the objective function sequence $\{f_{MY}(x_k, \varepsilon_k)\}$ is decreasing.
2. The L-BFGS method uses function and gradient values.
3. Global convergence is established under suitable conditions.
4. Numerical results show that the method is effective for large-scale problems (up to 5,000 variables).
The paper is organized as follows. In the next section, we briefly review some nonsmooth analysis, a BFGS method and the L-BFGS method for unconstrained optimization, and the motivation for using these techniques. In Section 3, we describe the active-set algorithm with L-BFGS update for (1). In Section 4, global convergence is established under suitable conditions. Numerical results are reported in Section 5, and conclusions are given in the last section.
Moreau-Yosida regularization

The Moreau-Yosida regularization of $f$ is defined by
$$f_{MY}(x) = \min_{z \in \mathbb{R}^n} \left\{ f(z) + \frac{1}{2\lambda} \|z - x\|^2 \right\}, \qquad (2)$$
where $\|\cdot\|$ denotes the Euclidean norm of vectors and $\lambda$ is a positive parameter. It is not difficult to see that problem (1) is equivalent to the problem
$$\min_{x \in K} f_{MY}(x). \qquad (3)$$
The function $f_{MY}$ is a differentiable convex function and has a Lipschitz continuous gradient even when $f$ is nondifferentiable; moreover, using the properties of $f_{MY}(x)$ below, and since $\nabla f_{MY}(x)$ is globally Lipschitz continuous, the gradient $\nabla f_{MY}(x)$ is semismooth under reasonable conditions (see [35,36] etc.). By these properties, many algorithms have been given for solving (3) (see [37] etc.) when $K = \mathbb{R}^n$. Some features of $f_{MY}(x)$ can be found in [38][39][40] et al. Set
$$\theta(z) = f(z) + \frac{1}{2\lambda} \|z - x\|^2$$
and denote $p(x) = \arg\min_z \theta(z)$. Since $\theta(z)$ is strongly convex, it is easy to deduce that $p(x)$ is well defined and unique. Then $f_{MY}(x)$ in (2) can be rewritten as
$$f_{MY}(x) = f(p(x)) + \frac{1}{2\lambda} \|p(x) - x\|^2.$$
The generalized Jacobian of $f_{MY}(x)$ and the property of BD-regularity can be found in [41,42], respectively. Here some properties are listed without proof.
(i) The function $f_{MY}$ is finite-valued, convex, and everywhere differentiable. If $g(x) = \nabla f_{MY}(x)$, then $g: \mathbb{R}^n \to \mathbb{R}^n$ is globally Lipschitz continuous,
$$\|g(x) - g(y)\| \le \frac{1}{\lambda} \|x - y\|, \quad \forall x, y \in \mathbb{R}^n, \qquad \text{where} \quad g(x) = \frac{x - p(x)}{\lambda}.$$
(ii) $g$ is BD-regular at $x$ means that all matrices $V \in \partial_B g(x)$ are nonsingular. Then there exist constants $\mu_1 > 0$, $\mu_2 > 0$ and a neighborhood $O$ of $x$ satisfying
$$d^T V d \ge \mu_1 \|d\|^2, \quad \|V^{-1}\| \le \mu_2, \quad \forall d \in \mathbb{R}^n, \; V \in \partial_B g(x).$$
In general, the minimizer $p(x)$ of $\theta(z)$ is difficult or even impossible to compute exactly. Fortunately, for each $x \in \mathbb{R}^n$ and any $\varepsilon > 0$, there exists a vector $p(x, \varepsilon) \in \mathbb{R}^n$ satisfying
$$f(p(x, \varepsilon)) + \frac{1}{2\lambda} \|p(x, \varepsilon) - x\|^2 \le f_{MY}(x) + \varepsilon, \qquad (6)$$
and the corresponding approximations of $f_{MY}(x)$ and $g(x)$ are defined by
$$f_{MY}(x, \varepsilon) = f(p(x, \varepsilon)) + \frac{1}{2\lambda} \|p(x, \varepsilon) - x\|^2 \qquad (7)$$
and
$$g(x, \varepsilon) = \frac{x - p(x, \varepsilon)}{\lambda}, \qquad (8)$$
respectively. Some implementable algorithms for finding such a $p(x, \varepsilon)$ for a nondifferentiable convex function are introduced in [43]. A remarkable feature of $f_{MY}(x, \varepsilon)$ and $g(x, \varepsilon)$, given in [35], is that by choosing the parameter $\varepsilon$ small enough we can compute approximations $f_{MY}(x, \varepsilon)$ and $g(x, \varepsilon)$ that are close to $f_{MY}(x)$ and $g(x)$, respectively.
Proposition 1. Suppose that $f_{MY}(x, \varepsilon)$ and $g(x, \varepsilon)$ are defined by (7) and (8), respectively, and let $p(x, \varepsilon)$ be a vector satisfying (6). Then
$$f_{MY}(x) \le f_{MY}(x, \varepsilon) \le f_{MY}(x) + \varepsilon \quad \text{and} \quad \|g(x, \varepsilon) - g(x)\| \le \sqrt{\frac{2\varepsilon}{\lambda}}.$$
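As a concrete illustration (ours, not from the paper), for $f(z) = \|z\|_1$ the minimizer $p(x)$ of $\theta(z)$ has the closed-form soft-thresholding solution, so $f_{MY}(x)$ and $g(x) = (x - p(x))/\lambda$ can be evaluated exactly. A minimal Python sketch, with all function names our own:

```python
import numpy as np

def prox_l1(x, lam):
    # Soft-thresholding: closed-form minimizer p(x) of
    # theta(z) = ||z||_1 + (1/(2*lam)) * ||z - x||^2.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_yosida_l1(x, lam):
    # Returns f_MY(x) from (2) and its gradient g(x) = (x - p(x))/lam
    # for the nonsmooth function f(z) = ||z||_1.
    p = prox_l1(x, lam)
    f_my = np.sum(np.abs(p)) + np.dot(p - x, p - x) / (2.0 * lam)
    g = (x - p) / lam
    return f_my, g

x = np.array([2.0, -0.5, 0.0])
f_my, g = moreau_yosida_l1(x, lam=1.0)
```

For this choice of $f$ the regularization $f_{MY}$ coincides with the Huber function, and $g$ is simply $x$ clipped componentwise to $[-1/\lambda \cdot \lambda, 1]$-type bounds, which illustrates the global Lipschitz property in (i).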

A modified BFGS formula and the L-BFGS formula
The BFGS method is one of the most effective quasi-Newton methods for the unconstrained optimization problem (UNP) $\min_{x \in \mathbb{R}^n} h(x)$, where $h: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. The famous BFGS quasi-Newton formula is
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{s_k^T y_k}, \qquad (12)$$
where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$ with $g_k = \nabla h(x_k)$, and it is easy to see that the quasi-Newton equation $B_{k+1} s_k = y_k$ holds. If $H_k$ is the inverse of $B_k$, we get the inverse update formula of (12),
$$H_{k+1} = \left(I - \frac{s_k y_k^T}{s_k^T y_k}\right) H_k \left(I - \frac{y_k s_k^T}{s_k^T y_k}\right) + \frac{s_k s_k^T}{s_k^T y_k}, \qquad (14)$$
which is the dual form of the DFP update formula in the sense that $H_k \leftrightarrow B_k$, $H_{k+1} \leftrightarrow B_{k+1}$, and $s_k \leftrightarrow y_k$. The L-BFGS method is an adaptation of the BFGS method to large-scale problems (see [44][45][46] for details). Instead of storing the matrices $H_k$, at every iterate $x_k$ the method stores a small number, say $m$, of correction pairs, which provide a fast rate of linear convergence and require minimal storage. From the BFGS formula (14) and the L-BFGS update (15), it is not difficult to see that both formulas contain only gradient information about the objective function, while the available function values are neglected. Some modified quasi-Newton formulas using both gradient and function information have been presented (e.g. [47,48]). Wei et al. [49] also gave a new quasi-Newton equation $B_{k+1} s_k = y_k^*$, and the corresponding BFGS update formula is defined by
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k^* (y_k^*)^T}{s_k^T y_k^*}, \qquad (16)$$
where $y_k^* = y_k + \frac{\vartheta_k}{\|s_k\|^2} s_k$ and $\vartheta_k = 2(h_k - h_{k+1}) + (g_{k+1} + g_k)^T s_k$. The quasi-Newton formula (16) contains both gradient and function information; moreover, the modified BFGS update formula possesses a higher-order approximation of $\nabla^2 h(x)$ than the standard BFGS update (see [47,49] for details).
Global and superlinear convergence of the quasi-Newton method with (16) have been established for uniformly convex functions [47,49], but may fail for general convex functions. One of the main reasons lies in the condition $s_k^T y_k^* > 0$, which may not hold for general convex functions. To overcome this weakness, Yuan and Wei [50] presented a modified $y_k^*$, namely
$$y_k^{1*} = y_k + \max\{0, A_k\} s_k, \qquad A_k = \frac{2(h_k - h_{k+1}) + (g_{k+1} + g_k)^T s_k}{\|s_k\|^2}.$$
The idea of [50] is based on the following two cases. Case i: if $A_k \ge 0$, then $y_k^{1*} = y_k^*$ and the properties of the update (16) are retained. Case ii: if $A_k < 0$, it is easy to get
$$(g_{k+1} + g_k)^T s_k < 2(h_{k+1} - h_k) \le 2\, g_{k+1}^T s_k,$$
where the second inequality follows from the definition of the convexity of $h(x)$; rearranging gives $s_k^T y_k = (g_{k+1} - g_k)^T s_k > 0$. This modified BFGS formula with $y_k^{1*}$ possesses global and superlinear convergence for general convex functions. However, its application to L-BFGS methods and nonsmooth optimization has not been widely studied; this article attempts to do so. For (3), the modified L-BFGS formula takes the form
$$H_{k+1} = \left(I - \frac{s_k (y_k^{2*})^T}{s_k^T y_k^{2*}}\right) H_k \left(I - \frac{y_k^{2*} s_k^T}{s_k^T y_k^{2*}}\right) + \frac{s_k s_k^T}{s_k^T y_k^{2*}}, \qquad (19)$$
where $y_k^{2*} = y_k + \max\{0, A_k^{2*}\} s_k$ and $A_k^{2*}$ is defined as $A_k$ with $h$ and $\nabla h$ replaced by $f_{MY}(\cdot, \varepsilon)$ and $g(\cdot, \varepsilon)$. It is clear that the modified L-BFGS formula (19) contains both function and gradient information at the current and previous step whenever $A_k^{2*} > 0$. In the following, the matrix $H_k$ is generated by (19). Storing and updating $H_k$ as a full matrix, and reducing it on the free subspace, would be very costly for even moderately large nonsmooth problems with box constraints; therefore the limited memory update is used, which stores only the $m$ most recent correction pairs, while the set of active constraints changes only during the first finite number of steps.
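To make the mechanics concrete, the following Python sketch (our illustration, not the authors' code) builds the safeguarded pair $y^{1*} = y + \max\{0, A_k\}s$ and applies the standard L-BFGS two-loop recursion to the stored pairs. The explicit formula for $A_k$ is our reading of [50], and all function names are ours:

```python
import numpy as np

def modified_pair(s, y, h_old, h_new, g_old, g_new):
    # Safeguarded correction in the spirit of [50] (exact A_k is an
    # assumption here): y* = y + max(0, A_k) * s uses function values
    # as well as gradients, aiming to keep s^T y* positive.
    A = (2.0 * (h_old - h_new) + (g_new + g_old) @ s) / (s @ s)
    return y + max(0.0, A) * s

def lbfgs_direction(g, pairs):
    # Standard two-loop recursion: applies the implicit inverse Hessian
    # built from the m most recent (s, y*) pairs to -g.
    q = g.copy()
    alphas = []
    for s, y in reversed(pairs):          # newest pair first
        rho = 1.0 / (s @ y)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if pairs:
        s, y = pairs[-1]
        q *= (s @ y) / (y @ y)            # initial scaling H_k^0 = gamma * I
    for (s, y), a in zip(pairs, reversed(alphas)):  # oldest pair first
        rho = 1.0 / (s @ y)
        b = rho * (y @ q)
        q += (a - b) * s
    return -q
```

With an empty pair list the recursion reduces to steepest descent; for a quadratic with identity Hessian the safeguard term vanishes ($A_k = 0$) and one stored pair already reproduces the Newton direction $-g$.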
Inspired by the Moreau-Yosida regularization and the modified method of [50], we combine them with the limited memory technique and use them to solve box-constrained optimization problems with nonsmooth objective functions. This paper can be regarded as an improvement of the method in [51], with an extension to nonsmooth objective functions. Compared with [51], at each step of our method a lower-dimensional system of nonlinear equations with a nonsmooth objective function needs to be solved. The method is also similar to the algorithm in [44], but at each iteration we use an identification technique and solve nonsmooth optimization problems.

L-BFGS active-set algorithm
The following assumptions are needed to obtain convergence.
Assumption A. The function $f_{MY}(x, \varepsilon)$ is bounded from below, and the sequence $\{\varepsilon_k\}$ converges to zero.
We first solve (3) and adapt its solution to problem (1). With the feasible region $K = \{x \in \mathbb{R}^n : l_i \le x_i \le u_i, \; i = 1, \ldots, n\}$, a vector $x \in K$ is said to be a stationary point for problem (3) if the relations
$$g_i(x) \ge 0 \text{ if } x_i = l_i, \qquad g_i(x) \le 0 \text{ if } x_i = u_i, \qquad g_i(x) = 0 \text{ if } l_i < x_i < u_i \qquad (20)$$
hold. Consider the corresponding relations for the approximate gradient,
$$g_i(x, \varepsilon) \ge 0 \text{ if } x_i = l_i, \qquad g_i(x, \varepsilon) \le 0 \text{ if } x_i = u_i, \qquad g_i(x, \varepsilon) = 0 \text{ if } l_i < x_i < u_i, \qquad (21)$$
where the scalar $\varepsilon \in \mathbb{R}$ tends to zero. By the definition of $g(x)$ and $g(x, \varepsilon)$ and by (10), if $\varepsilon \to 0$ we get
$$g(x, \varepsilon) = g(x) + o(1).$$
Thus, if (21) holds, it is easy to deduce that (20) holds. In the following, without special note, we concentrate on relation (21) and regard it as the stationarity condition. We also always suppose that the point $x_k$ is associated with $\varepsilon_k$ and $x$ with $\varepsilon$ unless otherwise remarked.
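As a sanity check for the stationarity relations (20), the following hedged Python sketch (ours, not the authors') computes the componentwise residual whose vanishing characterizes a box-constrained stationary point:

```python
import numpy as np

def kkt_residual(x, g, l, u, tol=1e-8):
    # Componentwise stationarity residual for min f over {l <= x <= u}:
    # at a lower bound the gradient must be >= 0, at an upper bound <= 0,
    # and zero at free components; the maximum violation is returned.
    r = np.where(np.abs(x - l) < tol, np.minimum(g, 0.0),
        np.where(np.abs(u - x) < tol, np.maximum(g, 0.0), g))
    return float(np.max(np.abs(r)))
```

A point satisfies (20) exactly when this residual is zero; replacing `g` by the approximation $g(x, \varepsilon)$ gives the analogous check for (21).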
Similar to standard numerical optimization methods, the iteration formula is
$$x_{k+1} = x_k + \alpha_k d_k,$$
where $\{x_k\} \subset K = \{x \in \mathbb{R}^n : l_i \le x_i \le u_i, \; i = 1, \ldots, n\}$, $x_i$ is the $i$th element of $x$, $d_k$ is a descent direction of $f_{MY}$ at $x_k$, and $\alpha_k$ is a steplength determined by the Armijo line search technique, where $0 < \sigma < \frac{1}{2}$, $\alpha_k = 2^{-i}$ with the smallest integer $i = 0, 1, 2, \ldots$ satisfying the Armijo condition, and the sequence $\{\varepsilon_k\}$ satisfies $\varepsilon_k > \varepsilon_{k+1} > 0$. Before defining the direction, we introduce the procedure that estimates the active bounds. Suppose that $\bar{x} \in \mathbb{R}^n$ is a stationary point of problem (1), and let the associated active-set estimates at a point $x$ be
$$\Upsilon(x) = \{i : x_i \le l_i + a_i(x) g_i(x, \varepsilon)\}, \quad \Lambda(x) = \{i : x_i \ge u_i + b_i(x) g_i(x, \varepsilon)\}, \quad \Gamma(x) = \{1, \ldots, n\} \setminus (\Upsilon(x) \cup \Lambda(x)), \qquad (26)$$
where $a_i(x)$ and $b_i(x)$ are positive continuous functions; denote by $\bar{\Upsilon}$, $\bar{\Gamma}$, and $\bar{\Lambda}$ the corresponding sets at $\bar{x}$.
Theorem 1. For any feasible $x$, $\Upsilon(x) \cap \Lambda(x) = \emptyset$; moreover, for all $x$ sufficiently close to $\bar{x}$, $\Upsilon(x) = \bar{\Upsilon}$, $\Gamma(x) = \bar{\Gamma}$, and $\Lambda(x) = \bar{\Lambda}$.
Proof. For any feasible $x$, if $k \in \Upsilon(x)$, it is obvious that $g_k(x, \varepsilon) \ge 0$ holds. Suppose that also $k \in \Lambda(x)$; then we have $u_k \ge x_k \ge u_k + b_k(x) g_k(x, \varepsilon) \ge u_k$. This implies that $x_k = u_k$ and $g_k(x, \varepsilon) = 0$, and then $k \in \Upsilon(x)$ gives $x_k \le l_k$, so $l_k = x_k = u_k$, which is a contradiction. Thus $\Upsilon(x) \cap \Lambda(x) = \emptyset$. Now we prove the second conclusion. If $i \in \bar{\Upsilon}$, then by the definition of $\bar{\Upsilon}$, $g_i(\bar{x}, \bar{\varepsilon}) \ge 0$. Since $a_i$ is nonnegative, $\bar{x}_i \le l_i + a_i(\bar{x}) g_i(\bar{x}, \bar{\varepsilon})$. Since both $a_i$ and $g_i$ are continuous in $x$, we deduce that $i \in \Upsilon(x)$ for all $x$ near $\bar{x}$. Thus $\bar{\Upsilon} \subseteq \Upsilon(x)$.
Conversely, if $i \in \Upsilon(x)$, then by the definition of $\Upsilon(x)$, $a_i(x) g_i(x, \varepsilon) \ge x_i - l_i \ge 0$. Since $a_i$ is nonnegative, $g_i(x, \varepsilon) \ge 0$. Since $g_i$ is continuous in $x$, we deduce that $i \in \bar{\Upsilon}$. Thus $\Upsilon(x) \subseteq \bar{\Upsilon}$, and therefore $\Upsilon(x) = \bar{\Upsilon}$. Analogously, we conclude that $\Gamma(x) = \bar{\Gamma}$ and $\Lambda(x) = \bar{\Lambda}$. □
This theorem shows that $\Upsilon(x)$, $\Gamma(x)$, and $\Lambda(x)$ are "good" estimates of $\bar{\Upsilon}$, $\bar{\Gamma}$, and $\bar{\Lambda}$. The proof can also be found in [52].
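The identification step above can be sketched in a few lines of Python (our illustration; the constant multiplier functions $a_i(x) = b_i(x) = 10^{-5}$ match the choice made later in the experiments, and the function name is ours):

```python
import numpy as np

def estimate_active_sets(x, g_eps, l, u, a=1e-5, b=1e-5):
    # Active-set estimates in the style of (26) with constant positive
    # multiplier functions a_i(x) = a, b_i(x) = b:
    # 'lower' ~ Upsilon(x), 'upper' ~ Lambda(x), 'free' ~ Gamma(x).
    lower = x <= l + a * g_eps          # variables estimated at lower bound
    upper = x >= u + b * g_eps          # variables estimated at upper bound
    free = ~(lower | upper)             # remaining free variables
    return lower, upper, free
```

Note that near a lower-active component $g_i(x, \varepsilon) \ge 0$, so the threshold $l_i + a_i g_i$ sits slightly above $l_i$, which is what allows the estimate to capture nearly active variables.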
The matrix $\hat{H}_k = Z^T H_k Z$ is an approximation to the reduced inverse Hessian matrix, $H_k$ is an approximation of the full-space inverse Hessian matrix, $Z$ is the matrix whose columns are $\{e_i \mid i \in \Gamma_k\}$, and $e_i$ is the $i$th column of the identity matrix in $\mathbb{R}^{n \times n}$. If the strict complementarity condition holds, $x_k^{\Gamma} + d_k^{\Gamma}$ is a strict interior point of $\{x^{\Gamma} \mid l^{\Gamma} \le x^{\Gamma} \le u^{\Gamma}\}$, and $\alpha_k^*$ is always positive (see [53] for details). Based on the above discussion, we state our algorithm as follows.
Step 5: Let x k+1 = x k + α k d k and update H k by (19).
Step 6: Set k ≔ k + 1 and go to Step 1.

Global convergence
In order to prove global convergence of Algorithm 1, the following further assumption is needed.
Assumption C. There exist positive scalars $B_1$, $B_2$ such that the matrices $H_k$, $k = 1, 2, \ldots$, satisfy
$$B_1 \|z\|^2 \le z^T H_k z \le B_2 \|z\|^2, \quad \forall z \in \mathbb{R}^n.$$
The following lemma shows that any $d_k \ne 0$ determined by (28) satisfies a sufficient descent property.
Lemma 1. Suppose that $d_k \ne 0$ is determined by (28) and $x_k \in C$. Then the inequality
$$g(x_k, \varepsilon_k)^T d_k \le -\omega \|d_k\|^2$$
holds with a constant $\omega > 0$.
Proof. We prove this result by considering three cases.
Case i: $i \in \Upsilon_k$. The claim follows from $x_k \in C$ and the definition of $d_k^i$.
Case ii: $i \in \Lambda_k$. The claim follows similarly, where $B_i$ is a lower bound of $b_i(x)$ on $C$.
Case iii: $i \in \Gamma_k$. By (29), (30), and the symmetric positive definiteness of $H_k$, we obtain the desired inequality. □
The following lemma is similar to one in [52], so we state it without proof.
By the Armijo line search and Lemma 1,
$$f_{MY}(x_{k+1}, \varepsilon_{k+1}) \le f_{MY}(x_k, \varepsilon_k) + \sigma \alpha_k g(x_k, \varepsilon_k)^T d_k \le f_{MY}(x_k, \varepsilon_k),$$
which shows that the sequence $\{f_{MY}(x_k, \varepsilon_k)\}$ is decreasing. Hence $\{x_k\}$ has at least one limit point. Suppose that $\bar{x}$ is a limit point of $\{x_k\}$; it suffices to prove that $\bar{x}$ is a stationary point for problem (3). By Lemma 2, we need only show that $\{d_k\} \to 0$. Without loss of generality, suppose that $\{d_k\} \to \bar{d}$ and $\{\varepsilon_k\} \to \bar{\varepsilon}$. By properties of limits, it is clear that $\bar{x}$ is feasible. By the feasibility of $\bar{x}$, Theorem 1, and the positivity of the functions $a_i(x)$ and $b_i(x)$ for any admissible choice, it follows from (27) that $g(\bar{x}, \bar{\varepsilon})^{\bar{\Gamma}} = 0$. By (30), we get $\bar{d}^{\bar{\Gamma}} = 0$. Thus $d_k \to \bar{d} = 0$. By Lemma 2, we deduce that $\bar{x}$ is a stationary point for problem (3). □
Remark. If the condition $g(\bar{x}, \bar{\varepsilon}) = 0$ holds, then by (11) it is not difficult to deduce that $g(\bar{x}) = 0$ as $\bar{\varepsilon} \to 0$. By the convexity of $f_{MY}(x)$, the point $\bar{x}$ is then an optimal solution.

Numerical results
In this section, we test the numerical behavior of Algorithm 1. All codes were written in MATLAB 7.6.0 and run on a PC with an Intel Core 2 Duo CPU E7500 @ 2.93 GHz, 2 GB of memory, and the Windows XP operating system.

Initialization
Our experiments are performed on a set of nonlinear box-constrained nonsmooth problems from Karmitsa [54], which come with given initial points. We choose $\sigma = 0.1$, $a_i(x) = b_i(x) = 10^{-5}$ in (26), $\theta = 1$ and the identity matrix $I$ as the "basic matrix" in the limited memory BFGS method, $m = 5$, and $\varepsilon_k = 1/(NF + 1)^2$, where $NF$ is the number of function evaluations. For subproblem (2), we use the PRP conjugate gradient algorithm, whose iteration and function-evaluation counts are added to those of the main program. Since the line search cannot always ensure the descent condition $d_k^T g_k < 0$, an uphill search direction may occur in the numerical experiments, in which case the line search rule may fail. To avoid this, the stepsize $\alpha_k$ is accepted once the number of line-search trials exceeds six. The Himmelblau stopping rule is used: the program terminates when the relative change (or, when $|f(x_k)|$ is small, the absolute change) of the function value falls below a prescribed tolerance. We also stop the program if the iteration number exceeds 5000, in which case the method is considered to have failed.
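The line-search safeguard described above can be sketched as follows (our illustration, not the authors' code; the trial cap of six and $\sigma = 0.1$ follow the text, everything else is an assumption):

```python
import numpy as np

def armijo_with_cap(f, x, d, g, sigma=0.1, max_trials=6):
    # Armijo backtracking with alpha = 2^{-i}; as described in the text,
    # once the number of trials exceeds max_trials the current stepsize
    # is accepted anyway, guarding against failure on (near-)uphill
    # directions where d^T g < 0 cannot be guaranteed.
    fx = f(x)
    gd = g @ d
    alpha = 1.0
    for _ in range(max_trials):
        if f(x + alpha * d) <= fx + sigma * alpha * gd:
            return alpha
        alpha *= 0.5
    return alpha  # safeguard: accept after max_trials halvings
```

On a well-scaled descent direction the full step $\alpha = 1$ is typically accepted immediately; the cap only activates when sufficient decrease cannot be achieved.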

Results
In this section, the test results of our algorithm on some box-constrained nonsmooth problems are reported. The columns of Table 1 have the following meaning: Dim: the dimension of the problem; NI: the total number of iterations; NF: the total number of function evaluations; cpu: the cpu time in seconds; $f(\bar{x})$: the function value at the final point $\bar{x}$ when the program stops. The numerical results indicate that our algorithm is effective for these box-constrained nonsmooth problems. The iteration and function-evaluation counts do not change markedly as the dimension increases. Problems Chained CB3 I and Chained CB3 II, and Chained Crescent I and Chained Crescent II, have many similar properties and share the same optimal values. From Table 1, we see that the final function values are close to the optimal values; in particular, the final values for Chained CB3 I and Chained CB3 II, and for Chained Crescent I and Chained Crescent II, agree with each other, which indicates that the presented method is stable. The cpu time is acceptable, although the iteration number is large for some problems. In the experiments, we find that different stopping rules influence the iteration and function-evaluation counts, but not the final function values.
To show the behavior of the function values, we give line charts (Figs 1, 2, 3 and 4) for the problems Generalization of MAXQ (Fig 1), Chained LQ (Fig 2), Generalization of Brown function 2 (Fig 3), and Chained Crescent I (Fig 4) with 5,000 variables. We see that the function values are decreasing. The descent in the first two steps is very pronounced, and these two steps bring the function value close to the optimal value; the descent in the remaining steps is less obvious. In our opinion, the reason is that the stopping rules are not ideal. Overall, the numerical performance of the proposed algorithm is reasonable for these large-scale nonsmooth problems. We conclude that the method provides a valid approach for solving large-scale box-constrained nonsmooth problems.

Conclusion
In this paper, a modified L-BFGS method was presented for solving box-constrained nonsmooth optimization problems. The method uses both gradient information and function values in the L-BFGS update formula, and the proposed algorithm possesses global convergence.
(i) It is well known that nonsmooth problems are difficult to solve even in the unconstrained case, especially at large scale. To overcome this difficulty, the Moreau-Yosida regularization technique is used to smooth the objective function. Moreover, the L-BFGS method is introduced to reduce the computational cost and make the active-set algorithm suitable for solving large-scale nonsmooth problems.
(ii) The bundle method is one of the most effective methods for nonsmooth problems, but its efficiency is limited to small- and medium-scale problems. To obtain more effective methods for large-scale nonsmooth problems, bundle L-BFGS algorithms have been presented by many scholars, handling problems with up to 1,000 variables. In this paper, the given algorithm successfully solves nonsmooth problems with bound constraints and 1,000-5,000 variables.
(iii) In the experiments, we find that different stopping rules influence the iteration and function-evaluation counts but not the final function values. Moreover, from Figs 1-4, we see that the first two iteration steps are the most effective, which shows that the proposed algorithm is effective for large-scale nonsmooth box-constrained problems; the limited progress in later steps, in our opinion, lies in the stopping criteria, and better rules should be sought.
(iv) Considering the above discussion, we think there are at least four issues that could lead to improvements. The first is the choice of the parameters in the active-set identification technique; the parameters used here are not the only choice. Another important point that should be further investigated is the adoption of the gradient projection technique. The third is the adjustment of the constant $m$ in the L-BFGS update formula. The last, and most important in view of the numerical experiments, is whether there are other optimality and convergence conditions for nonsmooth problems. We will study these aspects in future work.
Although the proposed method does not achieve all the improvements we had hoped for, we feel that its performance is noteworthy.