Evidence for Composite Cost Functions in Arm Movement Planning: An Inverse Optimal Control Approach

An important issue in motor control is understanding the basic principles underlying the production of natural movements. According to optimal control theory, the problem can be stated as follows: what cost function do we optimize to coordinate the degrees of freedom, more numerous than necessary, that are recruited to fulfill a specific motor goal? This question has not yet received a definitive answer, since what is optimized partly depends on the requirements of the task. Many cost functions have been proposed in the past, and most of them were found to be in agreement with experimental data. Therefore, the actual principles on which the brain relies to achieve a certain motor behavior are still unclear. Existing results suggest that movements may result from the minimization not of a single but of a composite cost function. To clarify this point, we consider an innovative experimental paradigm characterized by arm reaching with target redundancy. Within this framework, we use an inverse optimal control technique to automatically infer the combination of optimality criteria that best fits the experimental data. Results show that the subjects exhibited consistent behavior in each experimental condition, even though the target point was not prescribed in advance. Inverse and direct optimal control together reveal that the average arm trajectories were best replicated when optimizing a combination of two cost functions, namely a mix between the absolute work of torques and the integrated squared joint acceleration. Our results thus support the cost-combination hypothesis and demonstrate that the recorded movements were closely linked to the combination of two complementary functions related to mechanical energy expenditure and joint-level smoothness.


General settings and framework
Let us recall that the dynamical system under consideration corresponds to a two-joint, planar arm; in particular, the gravitational torque at the shoulder joint reads g{m_1 l_c1 cos θ_1 + m_2 (l_c2 cos(θ_1 + θ_2) + l_1 cos θ_1)}. The numerical values for the quantities involved can be found in Table S1, which comes from Winter's documented tables [1].

notation | description | value and unit
M_s  | total mass of the subject | measured, kg
L_s  | height of the subject | measured, m
m_1  | mass of the upper arm | ≈ M_s × 0.028 kg
m_2  | mass of the forearm (+hand) | ≈ M_s × 0.022 kg
l_1  | length of the arm | ≈ 0.186 × L_s m
l_2  | length of the forearm | ≈ (0.146 + 0.108) × L_s m
l_c1 | length from shoulder to center of mass of the arm | ≈ l_1 × 0.436 m
l_c2 | length from elbow to center of mass of the forearm | ≈ l_2 × 0.682 m
I_1  | inertia of the arm w.r.t. its center of mass | ≈ m_1 × (l_1 × 0.322)^2 kg.m^2
I_2  | inertia of the forearm w.r.t. its center of mass | ≈ m_2 × (l_2 × 0.468)^2 kg.m^2
g    | gravity acceleration | ≈ 9.81 m.s^-2

Table S1: Anthropometric parameters taken from [1] and gravity acceleration.
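As a worked example, the parameters of Table S1 can be computed for a given subject. The Python sketch below does this; the subject's mass and height used in the usage line are illustrative values, not data from the study.

```python
# Sketch: compute the Table S1 anthropometric parameters for a given subject
# (Winter's tables).  Mass/height inputs below are illustrative, not study data.

def anthropometry(Ms, Ls):
    """Two-link arm parameters from subject mass Ms (kg) and height Ls (m)."""
    m1 = 0.028 * Ms                 # mass of the upper arm (kg)
    m2 = 0.022 * Ms                 # mass of the forearm + hand (kg)
    l1 = 0.186 * Ls                 # upper-arm length (m)
    l2 = (0.146 + 0.108) * Ls       # forearm + hand length (m)
    lc1 = 0.436 * l1                # shoulder -> upper-arm COM (m)
    lc2 = 0.682 * l2                # elbow -> forearm COM (m)
    I1 = m1 * (0.322 * l1) ** 2     # upper-arm inertia about its COM (kg.m^2)
    I2 = m2 * (0.468 * l2) ** 2     # forearm inertia about its COM (kg.m^2)
    return dict(m1=m1, m2=m2, l1=l1, l2=l2, lc1=lc1, lc2=lc2, I1=I1, I2=I2)

params = anthropometry(Ms=70.0, Ls=1.75)  # hypothetical 70 kg, 1.75 m subject
```
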
We also impose constraints to keep state and control values within biological bounds. For a deterministic OCP, Pontryagin's maximum principle (PMP) provides necessary conditions for optimality, and the under-determination of the final point is resolved by means of the transversality conditions.
When relevant and possible, the PMP was used to verify the validity of the solutions found by the numerical technique we employed (GPOPS) during inverse optimal control. For certain models/costs (trajectory optimization, geodesic...), we also compared those solutions with the ones obtained from the original method, as described by their respective authors. Nevertheless, the general and unified framework provided here allowed us to compare several existing models and to apply the inverse optimal control algorithm described in the main text.
Solutions to the optimal control problems

Preliminary: necessary conditions for optimality
A general necessary condition for optimality is provided by the maximum principle ([2,3]). We will first neglect the state constraints and simply check them a posteriori.

Clarke's version of PMP
Consider a control system q̇ = f(q, u) with a cost of the form C(u) = ∫_0^T g(q, u) dt, where f is smooth and g is Lipschitz continuous. The goal is to connect an equilibrium source point q_s to an equilibrium point of the target manifold B = {q ∈ R^8 such that m(q) = 0}.
If the control u ∈ U is optimal, then there exists a non-trivial absolutely continuous mapping (p(·), λ): [0, T] → R^n × R^+, called the adjoint vector, where λ is a non-negative constant, such that the trajectory associated with the control u satisfies the Hamiltonian system together with the minimization of H, where H(q, p, λ, u) = ⟨p, f(q, u)⟩ + λ g(q, u) is the Hamiltonian of the system. In addition, the transversality condition must hold at the terminal point, i.e., p(T) ⊥ T_{q(T)} B.
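For completeness, the conditions referred to above take the standard form below (a reconstruction of the classical statement; the minimization convention matches the one used later in the text):

```latex
% Hamiltonian system (a.e. on [0, T]):
\dot{q}(t) = \partial_p H\big(q(t), p(t), \lambda, u(t)\big), \qquad
\dot{p}(t) \in -\partial_q H\big(q(t), p(t), \lambda, u(t)\big),
% minimization condition:
H\big(q(t), p(t), \lambda, u(t)\big) = \min_{v \in U} H\big(q(t), p(t), \lambda, v\big).
```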
In the above statement, ∂_q H(q, p, u) denotes Clarke's generalized gradient for locally Lipschitz functions, i.e., the convex hull of all possible limits of gradients of H at points q_n ∈ R^8 with q_n → q [3]. Note that if all the quantities involved are smooth, the generalized gradient reduces to the classical gradient.
In our experiment, the target bar was characterized by the following manifold: The first row of the mapping m defines the geometric constraint of the target bar, while the remaining rows account for the fact that the terminal point is an equilibrium point of the arm dynamics. In particular, the joint torques have to counteract the gravitational torques, so that, indeed, we obtain a boundary value problem.

Computation of the optimal solutions for the different models/costs
The numerical inverse optimal control method required formulating different models within a single class of optimal control problems. To this aim, the original models were sometimes slightly modified to fit in a unified framework. We checked, as described below, that the arm trajectories provided by these reformulated problems were nevertheless consistent and close enough to the original solutions.

Minimum hand jerk, angle jerk, and angle acceleration models
All these problems, involving kinematic costs, are independent of the system dynamics and were originally formulated as trajectory optimization problems. For instance, the minimum Cartesian hand jerk problem was solved by [4] by applying the Euler-Lagrange necessary conditions for optimality.
Consider a minimum jerk problem. Let v be the variable whose jerk has to be minimized, i.e., we seek to minimize ∫_0^T ‖v⃛(t)‖^2 dt under the constraint of connecting an equilibrium source point v_S to an equilibrium terminal point v_T in time T. Then it is well known that the optimal path is given by the 5th-order polynomial v(s) = v_S + (v_T − v_S)(10 s^3 − 15 s^4 + 6 s^5), with s = t/T. In a subsequent step, inverse dynamics computations are required if one wants to obtain the corresponding control action or torques.
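The 5th-order minimum-jerk polynomial is straightforward to implement; the sketch below evaluates it on a time grid (the endpoint values and duration in the usage line are illustrative):

```python
import numpy as np

def min_jerk(v_s, v_t, T, n=101):
    """Minimum-jerk trajectory between rest points v_s and v_t in time T.

    Implements the well-known 5th-order polynomial
    v(s) = v_s + (v_t - v_s) * (10 s^3 - 15 s^4 + 6 s^5),  s = t/T,
    which has zero velocity and acceleration at both endpoints.
    """
    t = np.linspace(0.0, T, n)
    s = t / T
    shape = 10 * s**3 - 15 * s**4 + 6 * s**5
    return t, v_s + (v_t - v_s) * shape

t, v = min_jerk(v_s=0.0, v_t=0.3, T=0.8)  # illustrative 30 cm reach in 0.8 s
```
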
Moreover, in this study, the terminal state is not a point but a manifold (the bar, denoted by B). We used a straightforward approach to solve this problem, namely seeking the final point on the bar that yields the minimum cost. To this aim, we solved a non-linear constrained optimization problem in which the cost is viewed as a function of the candidate terminal point, under the constraint that the terminal point on the bar lies within the reachable region of the arm. In fact, it is seen that these models lead to the shortest possible path to the manifold, in the space in which the variable v lies.
It is worth noting that the above optimal trajectory v is not necessarily admissible for the dynamical system (Σ). In fact, within the framework provided by the system (Σ), these models could also be treated as classical OCPs. Therefore, we also solved these OCPs using numerical techniques for optimal control (such as GPOPS, [5]). The two methods yielded identical solutions as far as the finger paths were concerned.

Geodesic model
Following the original formulation of [6], we treated the problem by separating its spatial and temporal aspects. Firstly, geodesics are found by solving a boundary value problem; here, geodesics are meant with respect to the kinetic energy metric of the arm. Secondly, the time course of the finger is derived from a minimum jerk velocity profile. Then, to solve our particular reaching-to-a-bar problem, we discretized the bar every 0.5 cm and solved the problem for all points, similarly to what was done for 3-D arm movements in [6] (in which the problem of a free final point also occurred). We selected the best terminal point, i.e. the one associated with the minimal geodesic length.
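This terminal-point selection can be sketched as a brute-force search over the discretized bar. In the sketch below, the bar geometry and the scalar cost are illustrative placeholders, not the study's actual values:

```python
import numpy as np

# Sketch of the terminal-point selection: discretize the bar every 0.5 cm,
# evaluate the cost of reaching each candidate point, and keep the minimizer.
# Bar geometry and the cost function below are illustrative placeholders.

def best_terminal_point(cost, bar_start, bar_end, step=0.005):
    """Evaluate `cost` at points spaced `step` (m) along the bar; return the best."""
    bar_start, bar_end = np.asarray(bar_start), np.asarray(bar_end)
    length = np.linalg.norm(bar_end - bar_start)
    n = int(length / step) + 1
    candidates = [bar_start + s * (bar_end - bar_start)
                  for s in np.linspace(0.0, 1.0, n)]
    costs = [cost(p) for p in candidates]
    i = int(np.argmin(costs))
    return candidates[i], costs[i]

# Illustrative cost: Euclidean distance from a hypothetical initial finger position.
p0 = np.array([0.0, 0.30])
point, c = best_terminal_point(lambda p: np.linalg.norm(p - p0),
                               bar_start=[-0.20, 0.55], bar_end=[0.20, 0.55])
```

For the distance cost, the best point is simply the orthogonal projection of the start onto the bar; for the geodesic model, `cost` would instead return the geodesic length of the fixed-endpoint solution.
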
As pointed out previously, the optimal solution obtained using this method may not be admissible for the system (Σ). Therefore, we also formulated this problem as an OCP and solved it numerically using GPOPS.
This led to the same geometric paths (although not the same time courses).

Smooth OCPs: the minimum torque change example
In this subsection, we deal with smooth OCPs which can be solved by applying the PMP. Here and in the next subsection, a sketch of the procedure for establishing the necessary conditions for optimality using the PMP is given.
Before going further, let us rename the variables as follows: The control system can thus be rewritten as: For any x_2, M is always invertible. We set d = det M. The explicit expressions of the elements of M^{-1} then follow; in particular, (M^{-1})_{11} = (m_2 l_c2^2 + I_2)/d. Following the current notations, the integral cost for the minimum torque change is: The Hamiltonian function is: As usual in the resolution of OCPs, singular extremals (λ = 0) and regular extremals have to be considered.
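As an illustration, the standard planar two-link inertia matrix and its inverse can be written down directly; the sketch below is consistent with the element (M^{-1})_{11} = (m_2 l_c2^2 + I_2)/d appearing in the text (with d = det M), and its parameter values are illustrative:

```python
import numpy as np

# Standard planar two-link inertia matrix M(theta2) and its 2x2 inverse.
# Consistent with (M^{-1})_{11} = (m2*lc2^2 + I2)/d, d = det M, from the text.
# Parameter values below are illustrative.

def inertia_matrix(theta2, m1, m2, l1, lc1, lc2, I1, I2):
    a = m1 * lc1**2 + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * np.cos(theta2)) + I1 + I2
    b = m2 * (lc2**2 + l1 * lc2 * np.cos(theta2)) + I2
    c = m2 * lc2**2 + I2
    return np.array([[a, b], [b, c]])

pars = dict(m1=1.96, m2=1.54, l1=0.326, lc1=0.142, lc2=0.303, I1=0.0216, I2=0.0666)
M = inertia_matrix(0.5, **pars)
d = np.linalg.det(M)
Minv = np.array([[M[1, 1], -M[0, 1]],
                 [-M[0, 1], M[0, 0]]]) / d   # closed-form 2x2 inverse
```
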
Let us recall that a bang extremal is an extremal such that, for almost all time, one of the control components is saturated at its bounds. In that case, the adjoint equations are: A necessary condition for optimality is the minimization of H along a trajectory. The optimal control has to satisfy: (u_1, u_2) = argmin_{u_1,u_2} H(x_1, ..., w_2, p_1, ..., s_2, u_1, u_2). Note that, for autonomous systems and fixed final time, the Hamiltonian remains constant along the optimal trajectory.
When a component of the commutation function is nonzero, the control is bang, i.e., saturated at one of its bounds. However, it may happen that s_i ≡ 0 on a non-trivial time interval. In such a case, we would get ṡ_i ≡ 0 and thus, from Equation 15, w_i ≡ −r_i/2. But u_i = ẇ_i from Equation 9, and therefore u_i ≡ −ṙ_i/2. Consequently, regular extremals may switch from bang pieces of trajectory to non-bang ones at times t where (s_i(t), ṡ_i(t)) = (0, 0). In general, a switching time t results from the condition s_i(t) = 0. Here, the control vector (u_1, u_2) may have discontinuities. Notice that, in contrast, the state and adjoint vectors are continuous, as classical solutions of differential equations, provided the bounds on the state variables are not active (which is checked a posteriori).
In practice, we solved this problem with high precision by using the following procedure:
• Initialize the shooting method with a good guess of the structure of the optimal solution (adjoint vector, commutation times, control sequence, etc.). Note that, in practice, a shooting problem can prove computationally challenging because its radius of convergence may be quite small. The standard approach is to initialize the shooting method with a guess obtained from a numerical optimal control technique.
• Use a Gauss-Newton method to solve the resulting two-point boundary value problem, in which the unknowns are the final adjoint vector, the commutation times, and the terminal position on the bar.
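As a toy illustration of this shooting procedure (not the arm model itself), consider minimizing the integral of u^2 for a double integrator steered from (0, 0) to (1, 0) in unit time. The PMP reduces the problem to finding the initial adjoint vector, which a hand-rolled Gauss-Newton iteration on the terminal residuals recovers:

```python
import numpy as np

# Toy shooting problem with a known solution: minimize int_0^1 u^2 dt for
# x1' = x2, x2' = u, from (0, 0) to (1, 0).  The PMP gives u = -p2/2,
# p1' = 0, p2' = -p1, so the unknowns are the initial adjoints (p1, p2(0)).
# (The arm problem additionally has the commutation times and the terminal
# position on the bar as unknowns.)

def shoot(p0, n=400):
    """Integrate the Hamiltonian system by RK4; return terminal residuals."""
    p1, p2 = p0
    s = np.array([0.0, 0.0, p2])          # state (x1, x2, p2); p1 is constant
    f = lambda s: np.array([s[1], -s[2] / 2.0, -p1])
    dt = 1.0 / n
    for _ in range(n):
        k1 = f(s); k2 = f(s + dt/2*k1); k3 = f(s + dt/2*k2); k4 = f(s + dt*k3)
        s = s + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
    return np.array([s[0] - 1.0, s[1]])   # x1(1) - 1 and x2(1)

def gauss_newton(res, x, iters=20, eps=1e-6):
    """Gauss-Newton on the residuals, with finite-difference Jacobian."""
    for _ in range(iters):
        r = res(x)
        J = np.column_stack([(res(x + eps * e) - r) / eps for e in np.eye(len(x))])
        x = x - np.linalg.solve(J.T @ J, J.T @ r)
    return x

p_star = gauss_newton(shoot, np.array([1.0, 1.0]))
```

Since the residuals are affine in the unknown adjoints here, a single Gauss-Newton step already lands on the solution (p1, p2(0)) = (-24, -12), matching the analytic optimum u(t) = 6 - 12t.
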
The optimal solution for the starting posture P3 is depicted in Figure S1, showing that the necessary conditions are met with very good precision. Since we found only one extremal meeting these conditions, this extremal was expected to be the globally optimal solution. Numerical simulations using GPOPS confirmed this result and showed that the numerical method was accurate enough to recover the arm trajectories to the bar.
Similar analytical calculations can be performed for the minimum torque and effort models. However, for costs including the absolute work, i.e. a nonsmooth cost, the scenario is generally more complex, as explained below. One can also check that the Hamiltonian is constant and that the control strategy is of the "bang-regular-bang" type (with very short bang pieces of trajectory).

Nonsmooth OCPs: costs including the minimum absolute work
Considering nonsmooth (i.e., non-differentiable) costs substantially complicates the resolution of an optimal control problem. However, non-differentiability arises naturally when computing the mechanical energy expenditure of a movement, which involves a quantity such as the absolute work of torques (used in the energy and hybrid models). A comprehensive mathematical study of costs involving the absolute work of torques was provided by [7]. Such costs require a nonsmooth version of the PMP [3].
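For concreteness, the absolute work of torques can be evaluated numerically as the time integral of the summed absolute joint powers. The sketch below assumes sampled torque and joint-velocity signals; the single-joint trajectory used in the usage lines is illustrative:

```python
import numpy as np

# Sketch: numerical evaluation of the absolute work of torques,
#   A_w = int_0^T sum_i |tau_i(t) * qdot_i(t)| dt,
# the nonsmooth quantity entering the energy and hybrid costs.

def absolute_work(t, tau, qdot):
    """Trapezoidal approximation of int sum_i |tau_i * qdot_i| dt.

    t: (n,) time samples; tau, qdot: (n, k) torque and joint-velocity samples.
    """
    power = np.abs(tau * qdot).sum(axis=1)          # summed absolute joint power
    return float(np.sum((power[1:] + power[:-1]) / 2.0 * np.diff(t)))

# Illustrative single joint: torque and velocity both cos(2*pi*t) over 1 s,
# so the instantaneous power is cos^2 and A_w should be close to 0.5.
t = np.linspace(0.0, 1.0, 1001)
tau = np.column_stack([np.cos(2 * np.pi * t)])
qdot = np.column_stack([np.cos(2 * np.pi * t)])
Aw = absolute_work(t, tau, qdot)
```
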
Using the same notations as above, the hybrid cost can be written as: The corresponding Hamiltonian function is: Again, it can be shown that singular extremals are normal. Thus there are only two kinds of extremals in this OCP: singular bang extremals and regular extremals. Next, we consider regular extremals. Let us set λ = 1, as above.
In that case the adjoint equations of Clarke's extension of the PMP are: with I = [−1, 1], coming from Clarke's generalized gradient.
Here, the nonsmooth maximum principle leads to differential inclusions for the adjoint vector at points where H is not continuously differentiable, namely the points at which y_i z_i vanishes (i.e., when the power of the torques becomes zero). The optimal scenario can be quite complex because there may exist pieces of trajectory such that y_i ≡ 0 and/or z_i ≡ 0, along which the maximum principle only provides inclusions. In such cases, higher-order derivatives of y_i and z_i must be considered to obtain the expression of the control.
However, based on the exact solutions derived for the torque-control case in [8] and on numerical simulations, it appears that the optimal strategy along such pieces of trajectory is generally similar to that of the minimum torque change. The adjoint inclusion becomes an equality whenever z_i ≠ 0, and the control u_i can then be determined by differentiating the latter expression. Otherwise, z_i ≡ 0 and a necessary condition for optimality is that zero belongs to a certain time-varying interval. This was referred to as an "inactivation" in [8]. In that case, we necessarily have w_i ≡ 0 and, in turn, u_i ≡ 0.
Remark: when the optimal control is discontinuous, one would rather avoid collocation methods, since they rely on global polynomial interpolation. An alternative is a direct transcription of the OCP (with an Euler or trapezoidal discretization scheme, for instance), in which the control is approximated by a piecewise-constant function. Nevertheless, by comparing with the exact solutions derived from the PMP, we noted that collocation methods yielded quite precise finger trajectories. The error only increased significantly for higher-order variables such as the torque change or the control vector, but our analysis focused on hand paths and velocities. It was still possible to guess the optimal scenario, allowing in turn the use of the shooting method described above. Finally, in order to handle nonsmooth costs with numerical techniques, the absolute value function was replaced by the smooth function z → tanh(10z)z, since NLP methods are generally gradient-based.
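This smoothing trick is easy to check numerically: z tanh(10z) is smooth and even, and its worst-case deviation from |z| is small and confined to a neighborhood of the origin:

```python
import numpy as np

# Sketch: the smooth surrogate z * tanh(10 z) used in place of |z| so that
# gradient-based NLP solvers can handle the nonsmooth cost terms.

def smooth_abs(z, k=10.0):
    """Smooth, even approximation of |z|; approaches |z| as k grows."""
    return z * np.tanh(k * z)

z = np.linspace(-2.0, 2.0, 4001)
max_err = np.max(np.abs(np.abs(z) - smooth_abs(z)))  # worst-case gap, near z = 0
```

The worst-case gap is below 0.03 for k = 10 and occurs near z ≈ 0.065; away from the origin the two functions are numerically indistinguishable.
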