The Optimal Solution of a Non-Convex State-Dependent LQR Problem and Its Applications

This paper studies a Non-convex State-dependent Linear Quadratic Regulator (NSLQR) problem, in which the control penalty weighting matrix in the performance index is state-dependent. A necessary and sufficient condition for the optimal solution is established with a rigorous proof based on the Euler-Lagrange Equation. It is found that the optimal solution of the NSLQR problem can be obtained by solving a Pseudo-Differential-Riccati-Equation (PDRE) simultaneously with the closed-loop system equation. A Comparison Theorem for the PDRE is given to facilitate solution methods for the PDRE. A linear time-variant system is employed as a simulation example to verify the proposed optimal solution. As a non-trivial application, a goal pursuit process in psychology is modeled as an NSLQR problem, and two typical goal pursuit behaviors found in humans and animals are reproduced using different control weightings $R(x)$. It is found that these two behaviors save control energy and cause less stress than the Conventional Control Behavior, typified by LQR control with a constant control weighting $R$, in situations where only the goal discrepancy at the terminal time is of concern, such as in marathon races and target-hitting missions.


Problem Definition
In this paper, we seek an optimal control law $u = k(x,t)$ for which the performance index
$$J(x(t),u(t)) = \int_{t_0}^{t_f} L(x(t),u(t),t)\,dt + w(x(t_f),t_f), \quad (1)$$
with $L = \frac{1}{2}\left(x^T Q(t) x + u^T R(x) u\right)$ and $w = \frac{1}{2} x^T(t_f) S(t_f) x(t_f)$, is minimized along the associated closed-loop trajectory of the Linear Time-variant (LTV) system
$$\dot{x}(t) = A(t)x(t) + B(t)u(t), \qquad x(t_0) = x_0, \quad (2)$$
where $u(t) \in \mathbb{R}^m$ is the control input, $x(t) \in \mathbb{R}^n$ is the system state, $t_0$ is the starting time, $t_f$ is the terminal time and $x_0$ is the initial value of $x(t)$ at time $t_0$. The coefficients satisfy $A(t), Q(t), S(t_f) \in \mathbb{R}^{n\times n}$, $B(t) \in \mathbb{R}^{n\times m}$, $R(x(t)) \in \mathbb{R}^{m\times m}$. To simplify notation, the dependence of variables on $t$ is omitted in the rest of the paper when no confusion will be introduced. It is assumed that $A$, $B$, $Q$ are continuous in $t$, $R(x)$ is differentiable with respect to $x$, and $\partial R/\partial x$ is bounded.
The coefficients $Q$ and $S$ are positive semi-definite symmetric matrices for all $t \in [t_0, t_f]$, and $R(x)$ is a positive definite symmetric matrix for all $x(t)$, $t \in [t_0, t_f]$. Additional conditions on $R(x)$ will be imposed in order to obtain sufficiency for optimality. Note that when the state-dependent matrix $R(x)$ in Eq (1) is replaced by a time-dependent matrix $R(t)$, the performance index $J$ is quadratic and convex in both $x$ and $u$, and Eq (1) and (2) constitute the standard Linear Quadratic Regulator (LQR) problem. The classical LQR theory provides a mature way to find an optimal control law for such a convex quadratic performance index. However, the state-dependent coefficient $R(x)$ in Eq (1) renders the performance index no longer convex in $(x,u)$, which makes the LQR theory inapplicable here, although its formalism remains useful. We therefore call the problem defined above a Non-convex State-dependent LQR (NSLQR) problem, and name the associated Riccati Equation the Pseudo-Differential-Riccati-Equation (PDRE). In this paper, a necessary and sufficient condition for the optimal solution of the NSLQR problem is presented under an additional condition on $R(x)$, and the optimality of the solution is proven with the Euler-Lagrange Equation. The PDRE is also studied to obtain the optimal solution, and a theorem is given to estimate the solution of the PDRE.
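To make the objects above concrete, the performance index can be evaluated numerically along any candidate trajectory. The Python sketch below is an illustration we add; the grid, the candidate trajectory, and the particular state-dependent weighting $R(x) = (1 + x^T x)I$ are assumptions for demonstration, not data from the paper:

```python
import numpy as np

def performance_index(ts, xs, us, Q, R_of_x, S_f):
    """Approximate J = 0.5*int(x'Qx + u'R(x)u)dt + 0.5*x(tf)'S(tf)x(tf)
    with a trapezoidal quadrature over the time grid ts."""
    running = np.array([x @ Q @ x + u @ R_of_x(x) @ u for x, u in zip(xs, us)])
    dt = np.diff(ts)
    integral = np.sum(0.5 * (running[:-1] + running[1:]) * dt)
    return 0.5 * integral + 0.5 * xs[-1] @ S_f @ xs[-1]

# Illustrative data (not from the paper): a decaying trajectory under
# proportional control, with a state-dependent penalty R(x) = (1 + x'x) I
ts = np.linspace(0.0, 1.0, 1001)
xs = np.exp(-ts)[:, None]
us = -xs
Q = np.eye(1)
S_f = np.eye(1)
R_of_x = lambda x: (1.0 + x @ x) * np.eye(1)
J = performance_index(ts, xs, us, Q, R_of_x, S_f)
```

For this toy trajectory the integrand is $2e^{-2t} + e^{-4t}$, so the value can be checked in closed form against the quadrature.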

Related Work
A similar problem has been studied in the context of the State-Dependent Riccati Equation (SDRE) control strategy since the mid-1990s. The strategy, proposed by Pearson [1] and expanded by Wernli and Cook [2], was independently studied by Mracek and Cloutier [3] in 1998. Friedland [4], Salamci and Gokbilen [5], and Cimen et al. [6,7] also contributed to the existence of solutions as well as the properties of optimality and stability. In the SDRE strategy, a nonlinear system is "factored" into the product of the state vector and a state-dependent matrix-valued function in the form
$$\dot{x}(t) = f(x(t)) + g(x(t))u(t) = A^*(x(t))x(t) + B^*(x(t))u(t),$$
which is a linear structure with state-dependent coefficients.
Borrowing the LQR theory, the SDRE strategy postulates an approximately optimal feedback control law
$$u = -R^{*-1}(x)\,B^{*T}(x)\,P^*(x)\,x$$
for a quadratic performance index $J^*$, where $P^*(x)$ is the solution of the algebraic Riccati Equation (RE)
$$A^{*T}(x)P^*(x) + P^*(x)A^*(x) - P^*(x)B^*(x)R^{*-1}(x)B^{*T}(x)P^*(x) + Q^*(x) = 0.$$
This strategy has been applied in a wide variety of nonlinear control applications, such as autopilot design [8,9] and integrated guidance and control design [10]. However, only the necessary condition for optimality has been studied in the SDRE control strategy, and it cannot always be established, so the optimality of the SDRE control law cannot be guaranteed. Since a simplified algebraic RE is employed to obtain $P^*$ instead of a differential RE, the application of the SDRE control strategy is limited to slowly time-varying and weakly state-dependent systems. Moreover, even though the SDRE strategy has been used in many applications, in most cases the coefficients $R^*$ and $Q^*$ in the performance index $J^*$ are constant rather than state-dependent, as shown in its formulation [7].
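For context, a single step of the SDRE strategy can be sketched numerically: freeze the state, solve the algebraic RE for the frozen coefficients, and form an LQR-style gain. The factorization below is an invented toy system, and the eigenvector-based ARE solver is a standard textbook method, not code from the cited works:

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Solve A'P + PA - P B R^{-1} B' P + Q = 0 via the stable invariant
    subspace of the Hamiltonian matrix (a standard eigenvector method)."""
    n = A.shape[0]
    BRB = B @ np.linalg.solve(R, B.T)
    H = np.block([[A, -BRB], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    Vs = V[:, w.real < 0]                    # the n stable eigenvectors
    P = np.real(Vs[n:, :] @ np.linalg.inv(Vs[:n, :]))
    return 0.5 * (P + P.T)                   # symmetrize against round-off

def sdre_gain(x, A_of_x, B_of_x, Q, R):
    """One SDRE step: freeze the state x, solve the algebraic RE for the
    frozen coefficients A*(x), B*(x), and return K(x) = R^{-1} B*' P*."""
    A, B = A_of_x(x), B_of_x(x)
    P = solve_care(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Invented toy factorization xdot = A*(x) x + B*(x) u
A_of_x = lambda x: np.array([[0.0, 1.0], [-1.0 - x[0] ** 2, -0.5]])
B_of_x = lambda x: np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

x = np.array([0.5, 0.0])
K = sdre_gain(x, A_of_x, B_of_x, Q, R)   # recomputed pointwise as x evolves
u = -K @ x                               # state feedback u = -K(x) x
```

The gain is only pointwise optimal for the frozen coefficients, which is exactly the approximation the paper contrasts with its exact NSLQR solution.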
The NSLQR problem defined in this paper focuses on the state-dependent $R(x)$ and the time-dependent $Q(t)$ in the performance index, and starts with LTV systems. The optimality of the solution is validated by a rigorous proof with the Euler-Lagrange Equation. The solution can be obtained by solving a PDRE associated with the problem. The work is a special case of the SDRE control strategy, but with rigorous mathematical proof, and could be considered a theoretical support for the SDRE control strategy.
On another aspect, the solution of the optimal LTV problem is usually obtained through numerical approximation approaches, which can be roughly classified into offline and online methods. The offline methods usually pre-compute solutions and store them for fast online look-up [11,12]. Since the computation grows exponentially with the size of the control problem, offline methods are normally used in small- and medium-size applications. The most prominent online methods are the active set method [13] and the interior point method [14]. The active set method performs well in large-size cases even though its convergence rate is unknown. For the interior point method, the reported lower bound on the iteration number is larger than the number observed in practice. In Ref. [15] and [16], a fast gradient method is introduced to help calculate the lower iteration bound for a quadratic LTV optimal problem with input constraints. Though the work listed above mainly concerns the optimal problem with a time-dependent $R$, the formalism is still applicable when developing a numerical solution for the defined NSLQR problem.

Application Background
The NSLQR problem discussed in this paper can be applied to model a psychological goal pursuit process, as a non-trivial example.
Psychologists observe two different behaviors when intelligent creatures pursue a goal. One is the Goal-Gradient Behavior (GGB) [17-20], in which the control effort to reach a goal increases monotonically with proximity to the desired end state, as in predator stalking and deadline beating. Fig. 1 (a) and (b) give the normalized goal discrepancy and control effort of the GGB. As shown, with the control effort increasing monotonically as the goal is approached, the discrepancy reaches zero faster at the end of the process than at the beginning. The other is the Stuck-in-the-Middle Behavior (SMB) [21], in which the control effort is high at the beginning of the goal pursuit and when the desired end state is in sight, but is maintained at a low level in between, as in the pacing of marathon runners. Parts (c) and (d) of Fig. 1 show a typical SMB, where the goal discrepancy decreases faster at the two ends than in the middle of the goal pursuit process and the control effort is maintained at a low level in the middle.
Both the GGB and the SMB differ from the Conventional Control Behavior (CCB) found in engineering control systems, as shown in parts (e) and (f) of Fig. 1. For the CCB, the control effort is proportional to the goal discrepancy, so the effort decreases with proximity to the desired end. The purpose of this paper is to study which of these three behaviors is the best. Some computational models of the GGB have been proposed based on psychological interpretation [22,23]. In this paper, a single-task goal pursuit process is modeled as an NSLQR problem and the three behaviors are reproduced for comparison, facilitating "a deeper understanding of mathematical characterizations of principles of adaptive intelligence" [24] rather than psychological interpretation.
In the sequel, Section 2 presents the necessary and sufficient condition for the optimality of the solution to the NSLQR problem. Section 3 analyzes the solution of the PDRE involved in the NSLQR problem and presents a Comparison Theorem. Section 4 verifies the feasibility of the NSLQR theory with an LTV system and applies the NSLQR to model a goal pursuit process; the numerical simulation results demonstrate that the GGB and SMB save control energy and cause less stress than the CCB in some applications. Conclusion and Future Work are presented in Section 5.

Analysis of the Optimality of the Solution
In this section, the main result of this paper is presented: the necessary and sufficient condition for the optimality of the solution to the NSLQR problem defined in Eq (1) and (2). Before that, an Optimality Lemma is introduced, which concerns a more general optimal control problem than the NSLQR problem. To distinguish it from the performance index $J$ in the NSLQR, we denote the performance index in the Lemma as $J_0$, with an associated general $L_0(x(t),u(t),t)$. The associated augmented performance index is defined as $\tilde{J}_0$ in Eq (10), to distinguish it from the $\tilde{J}$ in the NSLQR problem. In this paper, the tilde indicates the associated augmented performance index, as presented on Page 379 of Ref [25].
Lemma 1 Consider the problem of finding an optimal control law $u(t)$ for which the performance index $J_0$ in (7) is minimized along the associated closed-loop system trajectory $x(t)$ of the LTV system (2), with a fixed starting time $t_0$ and terminal time $t_f$. To simplify notation, define the augmented performance index $\tilde{J}_0$ as in (10), where $\lambda_0(t)$ is the Euler-Lagrange multiplier, with the boundary condition (12). Then a point $z^o = [(x^o)^T, (u^o)^T]^T$ that satisfies the Euler-Lagrange Equation (11) together with (12), and at which $\tilde{J}_0$ is strictly convex in each argument uniformly in the other, minimizes $J_0$.

For the point $z^o$ to be the optimal solution, the variation $\delta \tilde{J}_0(x,u)$ must vanish at $z^o$, which leads to Eq (11). However, a point $z^o$ satisfying Eq (11) and (12) can be either an extreme point or a saddle point of $\tilde{J}_0$. The equality holds if and only if $x_1 = x^o$ and $u_1 = u^o$, which contradicts the definition of $z_1$; the inequality contradicts Eq (14). Thus the point $z^o$ cannot be a saddle point of $\tilde{J}_0(x,u)$, and must therefore be a minimum point of $\tilde{J}_0(x,u)$, so the solution $(x^o,u^o)$ minimizes the performance index $J_0$ in (7). From the proof above, the classical LQR can be regarded as a special case of Lemma 1: in classical LQR theory, the sufficiency of the optimality of the solution obtained from the Euler-Lagrange Equation is guaranteed by the convexity of the augmented performance index $\tilde{J}$ in its arguments $(x,u)$ [25]. In the NSLQR, however, $\tilde{J}$ is no longer convex in $(x,u)$ because of the state-dependent $R(x)$. The theorem below shows that the solution of the Euler-Lagrange Equation is still optimal for the NSLQR problem under an additional constraint on $R(x)$.
Theorem 1 Under the convexity constraint that the function $l(x,u) = u^T R(x) u$ is strictly convex in $x$, uniformly in $u$, the state feedback control law (19) for the NSLQR problem defined in Eq (1) and (2), together with the associated closed-loop system (20), minimizes the performance index (1) if and only if the $n \times n$ matrix $P(t)$ satisfies the PDRE (21) with the terminal condition (22), where the column vector in (21) is defined from the derivative of $R(x)$ with respect to $x$. Here, the explicit $x(t)$ on both sides of the equation can be eliminated for some sharpened $R(x)$; one example is discussed in Section 3.
This theorem provides an optimal solution for the NSLQR problem, similar in form to that of the classical LQR theory. However, the PDRE (21) contains an additional term $M(x^o,u^o)$ compared with the standard Riccati Equation of the LQR theory. This term comes from the derivative of the state-dependent $R(x)$ with respect to $x$ in the Euler-Lagrange Equation, as detailed in Proof 2. Theorem 1 can be proven with Lemma 1 as follows.

Proof 2 To simplify notation, define the augmented performance index $\tilde{J}$ as in Eq (27), where $\lambda(t)$ is the Euler-Lagrange multiplier. The admissible directions of $z(t)$ are denoted $(\xi(t), \mu(t))$, with $\xi(t_0) = 0$ and $\xi(t_f)$, $\mu(t_0)$, $\mu(t_f)$ free, since the initial value of $x(t)$ is fixed. As before, $\tilde{J}(x,u)$ is equivalent to $J(x,u)$ in that they have the same minimizing function, if it exists [25]. We now prove that (19) and (20) minimize the augmented performance index (27) if and only if $P(t)$ satisfies the PDRE (21) and $P(t_f) = S(t_f)$.
We start with the necessary condition. From the Euler-Lagrange Equation, for a point $z^o = [(x^o)^T, (u^o)^T]^T$ to be an optimal solution, the necessary condition is Eq (28), together with a boundary condition discussed below. For a state-dependent $R(x)$, the derivative of the running cost with respect to $x$ involves $\partial R/\partial x$; to simplify notation, the corresponding column vector (with its factor $\frac{1}{2}$) is introduced. Substituting the two resulting equations into Eq (28), we obtain the control law (33), with $\lambda(t)$ satisfying (34). The variation of $\tilde{J}(x,u)$ at the point $z^o = [(x^o)^T, (u^o)^T]^T$ can then be written out explicitly. To minimize $\tilde{J}(x(t),u(t))$, we need $\delta \tilde{J}((x,u); (\xi,\mu)) = 0$ for all admissible directions. Since $\xi(t_0) = 0$ and $\xi(t_f)$ is free, the terminal value of $\lambda$ needs to satisfy Eq (36). We choose $\lambda(t)$ to be linearly related to $x(t)$ through $\lambda(t) = P(t)x(t)$; the boundary condition Eq (36) then becomes $P(t_f) = S(t_f)$. Substituting the assumption $\lambda(t) = P(t)x(t)$ into Eq (33) and (34), we obtain the control law (38) with the gain (39), where $P(t)$ satisfies the PDRE and $P(t_f) = S(t_f)$. We now prove that the solution $(x^o,u^o)$ from the necessary condition also satisfies the sufficient condition, using Lemma 1.
First, it is easy to verify that the solution $(x^o,u^o)$ in Eq (38) and (39), with $P(t_f) = S(t_f)$, satisfies Eq (11) and (12) in Lemma 1. We now prove that $\tilde{J}(x,u)$ in Theorem 1 is strictly convex in $u$, uniformly in $x$. Considering an arbitrary fixed $x$, we set $\xi(t) = 0$; the variation of $\tilde{J}(x,u)$ in $u$ is then given by (41). In terms of Eq (41), we obtain the inequality (42). Since $R(x) > 0$, the equality holds if and only if $\mu(t) = 0$, so $\tilde{J}(x,u)$ is strictly convex in $u$, uniformly in $x$. We now prove that $\tilde{J}(x,u)$ is strictly convex in $x$, uniformly in $u$. Setting $\mu(t) = 0$, the variation $\delta \tilde{J}((x,u); (\xi,0))$ of $\tilde{J}(x,u)$ in $x$ is written as in (43), where $M(x,u)$ is the term defined in Theorem 1. In terms of Eq (43), and since the function $l(x,u) = u^T R(x) u$ is strictly convex in $x$, uniformly in $u$, as stated in Theorem 1, we have $\tilde{J}(x+\xi,u) - \tilde{J}(x,u) \geq \delta \tilde{J}((x,u); (\xi,0))$, with equality if and only if $\xi(t) = 0$. So $\tilde{J}(x,u)$ is strictly convex in $x$, uniformly in $u$.
From the analysis above, the $u^o(t)$ and $x^o(t)$ defined in (19) and (20) with the boundary condition (22) satisfy (11) and (12) in Lemma 1. In addition, the augmented performance index $\tilde{J}(x,u)$ is strictly convex in $x$ and in $u$ separately, uniformly in the other. We conclude that $x^o$ and $u^o$ minimize the performance index $J$ in Eq (1) for the NSLQR problem, so $u^o(t)$ defined in (19) is the optimal control law for the NSLQR problem.
From the proof of Theorem 1 and Lemma 1, we obtain a corollary for the NSLQR problem with a general $R(x)$, as follows.
Corollary 1 For the optimal control problem defined in (1) and (2) with $R(x) > 0$ and free boundary values of $x(t_0)$ and $x(t_f)$, the $u^o(t)$ and $x^o(t)$ defined in (19) and (20), with $P(t)$ satisfying (21) and (22), yield either a minimum or a saddle point of the performance index $J$ in (1).
It follows from Eq (42) that $\tilde{J}$ is strictly convex in $u$, uniformly in $x$, if $R(x) > 0$. It then follows from the proof of Lemma 1 that $\tilde{J}(x^o,u^o)$ cannot be a maximum. The detailed proof is omitted.
Remark: The corollary signifies that the optimal control law (19) gives the minimum cost for a particular $x(t_0)$ among all possible control laws for a general $R(x) > 0$, since $u^o$ is the minimum point of $J(x,u)$ with respect to $u$. However, the cost may be lowered if a different trajectory $x(t)$ with a different $x(t_0)$ is chosen, since $(x^o,u^o)$ can be a saddle point of $J$. So for a given $x(t_0)$, the control law (19) gives the optimal solution.
From the analysis, it can be said that, for the NSLQR problem with a general $R(x) > 0$, the optimal solution $(x^o,u^o)$ needs to be evaluated for every specific $x(t_0)$, whereas in the classical LQR problem or the SDRE problem, the optimal solution can be written explicitly as a function of $t$ or $x$, uniformly for any $x(t_0)$. For the NSLQR to have such uniform solutions, $R(x)$ has to satisfy the additional constraint that the function $l(x,u) = u^T R(x) u$ is strictly convex in $x$, uniformly in $u$.

A Sharpened R(x)
The PDRE (21) of the NSLQR problem differs from the RE in the SDRE control strategy literature in three respects:
1. There is an additional term $M(x,u)$ in the PDRE (21), derived from the derivative of the state-dependent $R(x)$ with respect to $x$;
2. It is a differential RE instead of an algebraic RE;
3. The system state $x(t)$ appears on both sides of the equation.
The solution method for the SDRE is therefore not applicable to the PDRE. To obtain the optimal solution of the NSLQR, it is necessary to investigate the solution of the PDRE (21). In this section, a sharpened $R(x)$ is studied as an example.
As stated in Theorem 1, the function $l(x,u)$ needs to be strictly convex in $x$, so a quadratic function of $x$ is a reasonable choice for the matrix $R$. Consider
$$R(x) = (x^T R_0 x)\, I_{m\times m} + R_1,$$
where $R_0$ is an $n \times n$ symmetric, positive semi-definite constant matrix, $R_1$ is an $m \times m$ symmetric, positive definite constant matrix, and $I_{m\times m}$ is the $m \times m$ identity matrix. The term $R_1$ guarantees that $R(x)$ is invertible. With this choice, the PDRE (21) becomes Eq (50), in which the coefficient matrices depend on $x^o(t)$, and $x^o(t)$ satisfies the closed-loop system (20).
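A minimal numerical sketch of this quadratic weighting, taking $R(x) = (x^T R_0 x) I_m + R_1$ as the construction (the particular $R_0$, $R_1$ values below are illustrative, not from the paper):

```python
import numpy as np

def sharpened_R(x, R0, R1):
    """Quadratic control weighting R(x) = (x' R0 x) I_m + R1:
    R0 >= 0 makes l(x,u) = u'R(x)u convex in x, and the positive
    definite R1 keeps R(x) invertible even at x = 0."""
    m = R1.shape[0]
    return float(x @ R0 @ x) * np.eye(m) + R1

R0 = np.diag([1.0, 2.0])          # n x n, symmetric positive semi-definite
R1 = np.array([[0.5]])            # m x m, symmetric positive definite
R_at_x = sharpened_R(np.array([1.0, 1.0]), R0, R1)   # (1*1 + 2*1) + 0.5 = 3.5
R_at_0 = sharpened_R(np.zeros(2), R0, R1)            # reduces to R1
```

Away from the origin the penalty grows quadratically with the state, which is exactly what makes $l(x,u)$ strictly convex in $x$ for nonzero $u$.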
Remark: The PDRE (50) is coupled with the closed-loop system (20). In classical LQR theory, the system state $x(t)$ and the Riccati Equation solution $P(t)$ can be obtained through a $2n$-dimensional Hamiltonian matrix [25], or by decoupling the system plant from the Riccati Equation. In the NSLQR problem, however, the $2n$-dimensional Hamiltonian system is no longer linear, and the decoupling is not applicable. The PDRE (50) has to be solved together with the closed-loop system (20).
To generalize the results, the PDRE (50) can be rewritten into a general form as:

$$\dot{P}(t) + A^T(t)P(t) + P(t)A(t) + G(x,t) - P(t)V(x,t)P(t) = 0 \quad (51)$$
with a given terminal value $P(t_f)$, where $x(t)$ is a continuous single-valued function of $t$. The matrices $G(x,t)$ and $V(x,t)$ are positive semi-definite, symmetric, and continuous in both arguments. For convenience of reference, the PDRE (51) is denoted as $\Re(P) = 0$ in the sequel. The dependence of the variables on $t$ is abbreviated; for example, $x(t_m)$ is denoted as $x_m$. The time arguments of matrices and vectors are omitted when no misunderstanding is introduced; for instance, $G_m$ denotes the value of the matrix $G(x_m,t_m)$.
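Assuming the trajectory $x(t)$ is available (for example, from a previous iterate of a shooting scheme), the PDRE (51) can be integrated backward from its terminal value. The sketch below is an illustration we add; the constant placeholder coefficients at the bottom reduce (51) to a standard differential RE, which is enough to exercise the integrator:

```python
import numpy as np

def pdre_backward(P_f, ts, A_of_t, G_of_xt, V_of_xt, x_of_t):
    """Integrate Pdot = -(A'P + PA + G - P V P) backward in time from
    P(tf) = P_f over the grid ts, with a fixed-step RK4 scheme.
    The iterate is symmetrized at every step."""
    def rhs(P, t):
        A, x = A_of_t(t), x_of_t(t)
        return -(A.T @ P + P @ A + G_of_xt(x, t) - P @ V_of_xt(x, t) @ P)
    Ps = [P_f]
    rev = ts[::-1]
    for t_cur, t_prev in zip(rev[:-1], rev[1:]):
        h = t_prev - t_cur                    # negative step size
        P = Ps[-1]
        k1 = rhs(P, t_cur)
        k2 = rhs(P + 0.5 * h * k1, t_cur + 0.5 * h)
        k3 = rhs(P + 0.5 * h * k2, t_cur + 0.5 * h)
        k4 = rhs(P + h * k3, t_cur + h)
        P_new = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        Ps.append(0.5 * (P_new + P_new.T))    # enforce symmetry
    return Ps[::-1]                           # Ps[0] corresponds to ts[0]

# Placeholder scalar data: constant A, G, V with terminal value zero
ts = np.linspace(0.0, 1.0, 201)
sol = pdre_backward(np.zeros((1, 1)), ts,
                    A_of_t=lambda t: np.array([[-1.0]]),
                    G_of_xt=lambda x, t: np.array([[1.0]]),
                    V_of_xt=lambda x, t: np.array([[1.0]]),
                    x_of_t=lambda t: np.zeros(1))
P0 = sol[0][0, 0]   # P at t0, the largest value along the backward sweep
```

For these scalar coefficients the equation has a closed-form solution in terms of a hyperbolic tangent, so the integrator is easy to validate.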

A Comparison Theorem for the PDRE
The propositions and theorem introduced below are derived from Propositions 7 and 8 in [26], where similar results are developed for time-dependent Riccati Equations with initial values. Here, a Comparison Theorem for the PDRE (51) with a terminal value is given. Before presenting the theorem, four propositions are established.
Proof 3 Since the matrices and vectors in the PDRE (51) are continuous in their arguments, $P(t)$, $U(t)$ and $W(t)$ are continuous in $t$. We start with part (1) of the proposition.
Suppose that $U(t) < P(t)$ does not hold on $D$. Then, by the Mean Value Theorem, there must be a time $t_m \in [t_0, t_f)$ such that equality holds at $t_m$ while the strict inequality holds for any $t \in (t_m, t_f]$, as shown in Fig. 2. Let $w(t)$ be defined as in the display, where $r$ is a non-zero constant vector. Then the displayed relation holds, with equality if and only if $t = t_m$. Therefore $w(t) < 0$ in a right neighborhood of $t_m$, which contradicts Eq (55). So $U(t) < P(t)$ holds on $D$.
Part (2) can be proven similarly.

Proposition 2 Let $P(t)$ be a symmetric solution of the PDRE (51) on $D$. (1) If $U(t)$ is a symmetric solution of the inequality $\Re(U) > 0$ on $D$ such that $U(t_f) \leq P(t_f)$, then $U(t) \leq P(t)$ on $D$.
Proof 4 The strict-inequality part of the proposition has been proven in Proposition 1. What follows is the proof for the case in which $U(t_f) = P(t_f)$. We start with part (1).
As discussed in Proof 3, $P(t)$, $U(t)$ and $W(t)$ are continuous in $t$. Assume that $U(t) \leq P(t)$ does not hold on $D$ when $U(t_f) = P(t_f)$. Then there must be an interval $[t_m, t_n] \subseteq [t_0, t_f]$ on which the reverse strict inequality holds for any $t \in [t_m, t_n)$, with equality at $t_n$. Let $w(t)$ be defined as before, where $r$ is a non-zero constant vector. Then the displayed relation holds, and the equality holds if and only if $t = t_n$.
Therefore, $w(t) > 0$ in a left neighborhood of $t_n$, which contradicts Eq (60). So $U(t) \leq P(t)$ holds on $D$.
Part (2) can be proven similarly. (1) If $U(t)$ is a symmetric solution of the inequality $\Re(U) \geq 0$ on $D$ such that $U(t_f) \leq P(t_f)$, then $U(t) \leq P(t)$ on $D$.

Proposition 3 Let $P(t)$ be a symmetric solution of the PDRE (51) on $D$.
The proofs of these two propositions are similar to those of Propositions 1 and 2.
These four propositions give bounds for the solution of the PDRE. Based on them, a Comparison Theorem is introduced for the PDRE (51).

Application of Comparison Theorem
For the PDRE (50), it is readily verified that the matrices $G(x,t)$ and $V(x,t)$ can be bounded below and above by state-independent matrices. With Theorem 2 in Section 3.2, it follows that the solution $P(t)$ of the PDRE (50) satisfies $P_1(t) \leq P(t) \leq P_2(t)$, where $P_1(t)$ is the solution of the differential RE formed with the lower bounds and $P_2(t)$ is the solution of the differential RE formed with the upper bounds. These two equations are differential REs, similar to the algebraic RE in the SDRE control strategy. The analysis shows that the solution of a PDRE can be estimated by the solutions of two differential REs, so methods for solving a differential RE can be borrowed to facilitate the solution of a PDRE, such as determining the initial value for $P(t)$.
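The sandwich estimate can be checked numerically in the scalar case: integrating three differential REs backward with ordered constant terms $G_1 \leq G \leq G_2$ (and common $A$, $V$, terminal value) produces ordered solutions, as the Comparison Theorem predicts. The coefficients below are illustrative placeholders:

```python
import numpy as np

def scalar_re_back(P_f, ts, A, G, V):
    """Scalar differential RE Pdot + 2*A*P + G - V*P**2 = 0, integrated
    backward in time from P(tf) = P_f with explicit Euler steps."""
    P = P_f
    for t_cur, t_prev in zip(ts[::-1][:-1], ts[::-1][1:]):
        h = t_cur - t_prev                 # positive step, taken backward
        P = P + h * (2.0 * A * P + G - V * P * P)
    return P                               # value at ts[0]

# Ordered constant terms G1 <= G <= G2 stand in for the state-dependent
# bounds; the middle equation plays the role of the PDRE
ts = np.linspace(0.0, 1.0, 4001)
P_lo = scalar_re_back(0.0, ts, A=-0.5, G=0.5, V=1.0)
P_mid = scalar_re_back(0.0, ts, A=-0.5, G=1.0, V=1.0)
P_hi = scalar_re_back(0.0, ts, A=-0.5, G=2.0, V=1.0)
```

The ordered initial values $P_{lo} \leq P_{mid} \leq P_{hi}$ also illustrate how the two bounding REs can bracket the unknown initial value of $P(t)$ for the shooting method.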

Simulation
In this section, the NSLQR problem studied above is applied to two simulation cases: first to verify the optimality numerically, and then to model the goal pursuit process introduced in the Introduction, so that the three goal pursuit behaviors, GGB, SMB and CCB, are reproduced for further study.
The NSLQR problem is, technically, a Two-Point Boundary Value (TPBV) problem, since the initial value $x_0$ of the system plant (20) and the terminal value $P_f$ of the PDRE (21) are known. A shooting method with secant iteration is employed in the simulation to solve this TPBV problem [27]. The convergence of the method is slightly slower than second-order with the chosen $R(x)$, as discussed in [28] and [29]. The "ode4 Runge-Kutta" solver is chosen in the simulation with a fixed step of 0.01 second. The simulation error threshold is set to $10^{-6}$.
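The shooting scheme with secant iteration can be sketched as follows. The toy data at the bottom are ours, not the paper's: with $A = 0$ and $B = Q = R = 1$ the Riccati equation decouples from $x$ and has the exact solution $P(t) = \tanh(t_f - t)$, which makes the result easy to check:

```python
import numpy as np

def shoot(P0, x0, ts, f):
    """Integrate the coupled (x, P) system forward with Euler steps from a
    guessed initial value P(t0) = P0; return the terminal value P(tf)."""
    x, P = x0, P0
    for t_cur, t_nxt in zip(ts[:-1], ts[1:]):
        dx, dP = f(x, P, t_cur)
        h = t_nxt - t_cur
        x, P = x + h * dx, P + h * dP
    return P

def secant_shooting(x0, P_f, ts, f, a=0.0, b=1.0, tol=1e-8, it=50):
    """Secant iteration on the unknown P(t0) so that the shot hits P(tf)=P_f."""
    g = lambda P0: shoot(P0, x0, ts, f) - P_f
    ga, gb = g(a), g(b)
    for _ in range(it):
        if abs(gb) < tol or gb == ga:
            break
        a, b, ga = b, b - gb * (b - a) / (gb - ga), gb
        gb = g(b)
    return b

# Scalar toy LQR data: RE is Pdot = P^2 - 1, closed loop xdot = -(P B^2/R) x;
# x rides along as in the coupled NSLQR setting even though here
# P does not depend on x
ts = np.linspace(0.0, 1.0, 2001)
f = lambda x, P, t: (-P * x, P * P - 1.0)
P0 = secant_shooting(x0=1.0, P_f=0.0, ts=ts, f=f)
```

In the actual NSLQR case the right-hand side of $P$ also depends on $x$ through $R(x)$ and $M(x,u)$, which is why the two equations must be shot together rather than decoupled.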

4.1 Numerical Verification of the NSLQR Optimality
4.1.1 Simulation Model. In this section, we consider a specific optimal control problem of seeking a control law $u(t)$ to minimize the performance index
$$J_1 = \frac{1}{2}\int_{t_0}^{t_f} \left( x^2(t) + 4\,x^2(t)\,u^2(t) \right) dt + \frac{1}{2} S(t_f)\, x^2(t_f)$$
along the first-order LTV system
$$\dot{x}(t) = -\frac{1}{t+1}\,x(t) + u(t),$$
with $A = -\frac{1}{t+1}$, $B = 1$, $Q = 1$, $S(t_f) = 0$ and $R = 4x^2(t)$.
Three forms of control law are considered. The first is the optimal control law $u^o(t)$ defined in Eq (19), with $P(t)$ from the PDRE (21). The second is the same $u^o(t)$ in Eq (19), but with $P(t)$ from a standard Differential Riccati Equation without the additional term $M(x,u)$. The third is the same $u^o(t)$ in Eq (19) with a perturbation of 0.01. To distinguish them, the three control laws are named "Optimal Solution", "Riccati Perturbation" and "Control Perturbation", respectively. Table 1 summarizes the parameters used in this simulation case as well as the resulting values of the performance index $J_1$. Fig. 3 gives the system behaviors under the three control laws.
4.1.2 Discussion of Results. From Fig. 3, all three control laws, with the same function $R(x)$, give the system state a stable behavior. However, the optimal control law $u^o(t)$ yields the minimal performance index $J_1$, as shown in Table 1. Although full generality cannot be claimed from finitely many numerical examples, the comparison shows that the classical LQR solution (the "Riccati Perturbation" case) yields a greater value of $J_1$ than the optimal control law $u^o(t)$ does, so classical LQR theory is no longer applicable to the NSLQR problem. Moreover, the NSLQR theory does provide the minimal value of $J_1$ among the three control laws, which verifies the optimality of the NSLQR solution to some degree.

4.2 Modeling Goal Pursuit Behaviors
From a psychological perspective, the system state $x(t)$ in the NSLQR problem represents the goal discrepancy. The parameter $A(t)$ in system model (2) represents the goal attraction. For a constant $A$: if all eigenvalues have negative real parts (asymptotic stability), the goal is attractive; if the eigenvalues have nonpositive real parts and those with zero real parts are simple (marginal stability), the goal is neutral; otherwise (instability), the goal is repulsive. Similar interpretations apply to a time-varying $A(t)$, where asymptotic stability, marginal stability and instability are interpreted as attractive, neutral and repulsive goals, respectively. The input $u(t)$ represents the level of control effort, while the parameter $B(t)$ is treated as the control effectiveness. In the performance index $J$, the weighting coefficient $Q(t)$ functions as the goal discrepancy penalty: a greater value of $\|Q\|$ results in less discrepancy. The weighting coefficient $S(t_f)$ is the terminal penalty: a greater value of $\|S\|$ leads to a smaller terminal goal discrepancy. The weighting coefficient $R(x)$ is the control energy penalty, which depends on the goal discrepancy: a greater value of $\|R\|$ means less control energy expenditure is allowed.
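The eigenvalue-based classification above can be made executable. The helper below is our illustration; for the imaginary-axis eigenvalues it checks semisimplicity (the precise requirement behind "simple" in the marginal-stability condition) via geometric multiplicity:

```python
import numpy as np

def goal_type(A, tol=1e-9):
    """Classify a constant goal-attraction matrix A as in the text:
    'attractive' - all eigenvalues have negative real parts,
    'neutral'    - nonpositive real parts, imaginary-axis eigenvalues semisimple,
    'repulsive'  - otherwise (unstable)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    n = A.shape[0]
    eigvals = np.linalg.eigvals(A)
    re = eigvals.real
    if np.all(re < -tol):
        return "attractive"
    if np.any(re > tol):
        return "repulsive"
    # imaginary-axis eigenvalues: geometric multiplicity (n - rank(A - lam I))
    # must match algebraic multiplicity, else the mode grows polynomially
    for lam in np.unique(np.round(eigvals[np.abs(re) <= tol], 8)):
        alg = int(np.sum(np.abs(eigvals - lam) <= 1e-6))
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n))
        if geo < alg:
            return "repulsive"
    return "neutral"
```

For example, a scalar $A = 0$ (as in an integrator-like goal process) is classified as neutral, while a defective double-zero eigenvalue is repulsive.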

4.2.1 Simulation Model.
As an initial study, we consider a first-order linear goal attainment process with a neutrally attractive goal. The parameter $R(x)$ is the focus of this simulation case study. Based on the analysis above, we hypothesize that the GGB can be produced by an $R(x)$ that is a monotonically increasing function of $\|x\|$; the SMB by an $R(x)$ that is a hump-shaped function of $\|x\|$; and the CCB by a constant $R$. Table 2 gives the parameter values used in the simulation, and the simulation results are presented in Table 3 and Fig. 4. From Table 2, the choice of $R(x)$ for the GGB satisfies the convexity constraint on the function $l(x,u)$, so the solution is optimal for the GGB. Even though the convexity constraint is not satisfied for the SMB, by Corollary 1 the solution is still optimal since $x(t_0)$ is fixed. For a meaningful comparison, the parameters are adjusted so that the three behaviors achieve roughly equal terminal values, as shown by $x_{t_f}$ in Table 3. From Fig. 4 Part d), the NSLQR method has successfully modeled the goal pursuit process with the three behaviors. Parts a) and b) show the control energy penalty $R$ versus time $t$ and goal discrepancy $x$, respectively. The hypothesis is validated: when $R$ is a monotonically increasing function of $\|x\|$, the goal pursuit process exhibits the GGB; when $R$ is a hump-shaped function of $\|x\|$, it exhibits the SMB; and when $R$ is constant, it exhibits the CCB. Parts e) and f) present the control effort $u$ versus time $t$ and goal discrepancy $x$, respectively. For the CCB, the control effort decreases as the goal is approached; for the GGB, the control effort decreases with goal discrepancy, i.e., increases as the goal is approached; and for the SMB, the control effort is higher at the two ends than in the middle.
Table 3 lists selected norms of the goal discrepancy $x(t)$ and the control effort $u(t)$. It shows that, with the same initial value $x_0$ and terminal value $x_f$, the CCB features the least accumulated error $\|x\|_1$, but consumes the most control energy $\|u\|_2$ and suffers the highest stress level $\|u\|_\infty$. The SMB consumes the least control energy and suffers the lowest stress level, at the price of a higher accumulated error. The GGB lies in between.
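The norms reported in Table 3 are straightforward to compute from sampled trajectories. The sketch below uses an illustrative CCB-like exponential decay, not the paper's data:

```python
import numpy as np

def behavior_norms(ts, x, u):
    """Metrics as in Table 3: accumulated error ||x||_1 = int |x| dt,
    control energy ||u||_2 = sqrt(int u^2 dt), stress ||u||_inf = max |u|,
    using trapezoidal quadrature on the grid ts."""
    dt = np.diff(ts)
    trap = lambda f: float(np.sum(0.5 * (f[:-1] + f[1:]) * dt))
    return trap(np.abs(x)), np.sqrt(trap(u ** 2)), float(np.max(np.abs(u)))

# Illustrative CCB-like trajectory: exponential decay with control effort
# proportional to the discrepancy
ts = np.linspace(0.0, 1.0, 2001)
x = np.exp(-3.0 * ts)
u = 3.0 * x
acc_err, energy, stress = behavior_norms(ts, x, u)
```

Comparing these three numbers across the GGB, SMB and CCB trajectories is exactly the comparison summarized in Table 3.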
4.2.3 Discussion of Results. Based on the simulation results above, it is concluded that in pursuing a goal with a finite terminal time (a deadline), the GGB and the SMB may save control energy and reduce the stress level relative to the CCB, whereas the CCB has the least accumulated error. So the GGB and SMB may be beneficial in applications where only the level of goal attainment at the terminal time is of concern, such as a deadline-beating process. However, the GGB or SMB would not be a preferred choice when the goal needs to be maintained over a long time or approached smoothly.

Conclusion and Future Work
In this paper, a necessary and sufficient Optimality Theorem, with rigorous mathematical proof, is presented for the NSLQR problem under a convexity constraint on $R(x)$. It is also argued that for a given $x(t_0)$, the NSLQR gives an optimal solution. A Comparison Theorem for the solution of the PDRE in general form is presented as well. In the simulation, the NSLQR is first applied to a first-order LTV system to verify the proposed theory. The NSLQR is then used to model two goal pursuit behaviors (GGB and SMB) identified in psychology, along with the typical behavior of engineering control systems (CCB), by employing different control energy weightings $R(x)$. The simulation results show that the NSLQR modeling method can reproduce the three goal pursuit behaviors, and that the psychological goal pursuit behaviors can be more beneficial than the CCB in terms of energy saving and stress reduction in applications where only the goal discrepancy at the terminal time is of concern, such as marathon races, animal stalking, beating a deadline or hitting a target.
In this paper, only scalar cases of the goal pursuit process are studied; the multi-variable cases are the next step of our work. In the current study, the parameter $R(x)$ is selected to reproduce the goal pursuit behaviors. Similar results should be achievable with a state-dependent goal discrepancy weighting $Q(x)$, which would be more akin to an intuitive psychological tendency to employ the GGB and SMB strategies in terminal goal pursuit processes, whereas the control weighting modeling is more akin to a conscious choice of the GGB and SMB for their energy saving and stress reduction benefits. Since for the NSLQR problem the PDRE has to be solved simultaneously with the closed-loop system, it is a TPBV problem. An inherent difficulty in this TPBV problem is how to determine the initial value of $P(t)$. An Approaching-Horizon algorithm based on a shooting method has been developed to address this problem and will be presented in detail in a separate paper.