When Optimal Feedback Control Is Not Enough: Feedforward Strategies Are Required for Optimal Control with Active Sensing

Movement planning is thought to be primarily determined by motor costs such as inaccuracy and effort. Solving for the optimal plan that minimizes these costs typically leads to a time-varying feedback controller that both generates the movement and optimally corrects for errors that arise within it. However, the quality of sensory feedback during a movement can depend substantially on the movement that is generated. We show that when such state-dependent sensory feedback is incorporated, the optimal solution incorporates active sensing and is no longer a pure feedback process but includes a significant feedforward component. To examine whether people take such state-dependency of sensory feedback into account, we asked them to make movements in which we controlled the reliability of sensory feedback. We made the visibility of the hand state-dependent, such that visibility was proportional to the component of hand velocity in a particular direction. Subjects gradually adapted to this sensory perturbation by making curved hand movements. In particular, they appeared to control the late visibility of the movement, matching the predictions of the optimal controller with state-dependent sensory noise. Our results show that trajectory planning is not only sensitive to motor costs but also takes sensory costs into account, and they argue for optimal control of movement in which feedforward commands can play a significant role.


Optimal Controller
We start from the control-observation model with signal-dependent and state-dependent noise (see Main text for details of this formulation). The state of the system $x$ is estimated as $\hat{x}$ by a Kalman filter,
$$\hat{x}_{t+1} = A\hat{x}_t + Bu_t + K_t(y_t - H\hat{x}_t) + \eta_t,$$
where the initial mean and covariance of the estimated state, $\hat{x}_1$ and $\Sigma_1$, are given. Now, we assume that the cost-to-go function $v_t(x_t, \hat{x}_t)$ can be represented by the following "affine-quadratic" form,
$$v_t(x_t, \hat{x}_t) = x_t^\top S^x_t x_t + e_t^\top S^e_t e_t + 2\,s^{x\top}_t x_t + s_t,$$
where $e = x - \hat{x}$ and the final conditions are given as $S^x_n = Q_n$, $S^e_n = 0$, $s^x_n = s_n = 0$. Here we use the term "affine-quadratic" to distinguish this form from the conventional quadratic form, which has no linear term $s^{x\top}_t x_t$. Now, assume that the system is optimally controlled for $t' = t+1, \cdots, n$ and an optimal feedback policy $u_t = u_t(\hat{x}_t)$ is given. Then the corresponding cost-to-go function $v_t(x_t, \hat{x}_t)$ must satisfy the Bellman equation. To represent the expectation term in the Bellman equation with variables at time $t$, we use the dynamics of $x$, $\hat{x}$ and $e$, together with their conditional means and covariances. Given that $BF_t$ consists of mutually independent random variables $[\varepsilon^1_t, \cdots, \varepsilon^c_t]$, each of which follows a unit normal distribution, the same decomposition can be applied to $g_t(a + \kappa d\, x_t)$; note that $\kappa$ is then included in $D$. Using these, the covariances, and hence the expected cost-to-go at time $t+1$, can be represented by variables at time $t$ using Equations (3)-(6). Putting these together yields the cost-to-go function at time $t$, from which the optimal policy follows. Since the policy should depend on the observed state $\hat{x}$, not $x$, we take the conditional mean of the policy using $E[x_t \mid \hat{x}_t] = \hat{x}_t$, which results in $u_t = -L_t\hat{x}_t - l_t = -L_t(x_t - e_t) - l_t$. Applying this to Equation (7), together with $v_n = x_n^\top Q_n x_n$, shows that the cost-to-go function at time $t$ is also affine-quadratic, and therefore completes the proof by induction that the cost-to-go function is always affine-quadratic. The corresponding update rules have final conditions $S^x_n = Q$, $S^e_n = 0$, $s^x_n = 0$, $s_n = 0$.
Assuming $\hat{x}_1$ and $E[e_1 e_1^\top] = \Sigma_1$ are known and $E[e_1] = 0$, the total expected cost is
$$E[v_1] = \hat{x}_1^\top S^x_1 \hat{x}_1 + 2\,s^{x\top}_1 \hat{x}_1 + \mathrm{tr}\!\left((S^x_1 + S^e_1)\Sigma_1\right) + s_1.$$
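The affine-quadratic structure of the backward recursion can be illustrated with a deliberately simplified sketch: a deterministic, fully observed LQR backward pass augmented with a linear cost term $q$ standing in for the sensory-cost contribution that produces $s^x_t$ in the full derivation. The function name and the simplifications are ours; this is not the paper's full update rule, but it shows how a linear term in the cost-to-go generates the feedforward component $l_t$ alongside the feedback gain $L_t$.

```python
import numpy as np

def affine_lqr(A, B, Q, R, q, n):
    """Backward pass for cost sum_t (x'Qx + 2 q'x + u'Ru), deterministic sketch.

    Returns feedback gains L[t] and feedforward terms l[t] such that
    u_t = -L[t] @ x_t - l[t].  The linear cost term q is what creates the
    feedforward component, mirroring how state-dependent sensory noise
    creates l_t in the full stochastic derivation.
    """
    S, s = Q, q                      # final conditions: S_n = Q, s_n = q
    L, l = [], []
    for _ in range(n - 1):
        G = R + B.T @ S @ B
        Lt = np.linalg.solve(G, B.T @ S @ A)   # feedback gain
        lt = np.linalg.solve(G, B.T @ s)       # feedforward term
        S = Q + A.T @ S @ (A - B @ Lt)         # quadratic recursion
        s = q + (A - B @ Lt).T @ s             # linear (affine) recursion
        L.append(Lt); l.append(lt)
    return L[::-1], l[::-1]
```

With $q = 0$ the recursion collapses to the conventional quadratic form and every $l_t$ vanishes, which is the sense in which the affine term is the novel ingredient here.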

Optimal Estimator
Now we determine the Kalman gain $K_t$ that minimizes the expected cost-to-go function at each time step. As the cost-to-go function is also affine-quadratic with respect to $K_t$, $\partial E[v_t]/\partial K_t = 0$ is a necessary and sufficient condition for the minimum.
We use the following matrix lemmas, where the $\Sigma$'s are uncentered, unconditional covariances (i.e. of the form $E[x_t x_t^\top]$) and $\bar{x}_t$ is the unconditional mean of $x_t$. Note that $\bar{\hat{x}}_t$ represents the nominal trajectory, i.e. the trajectory that would be produced by the controller if there were no noise.
Given the dynamics of $e$, $x$ and $\hat{x}$, the update rules for their unconditional means $\bar{e}$, $\bar{x}$ and $\bar{\hat{x}}$ follow, with initial conditions $\bar{e}_1 = 0$, $\bar{x}_1 = \bar{\hat{x}}_1 = \hat{x}_1$. The update rules for the covariances $\Sigma^e_t$ and $\Sigma^{\hat{x}}_t$ follow similarly, with initial conditions $\Sigma^e_1 = \Sigma_1$, $\Sigma^{\hat{x}}_1 = \hat{x}_1 \hat{x}_1^\top$ and $\Sigma^{\hat{x}e}_1 = 0$.
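As a concrete, much simplified illustration of this forward pass, the sketch below propagates the estimation-error covariance for a plain linear-Gaussian system and computes the standard error-minimizing Kalman gain at each step. The signal- and state-dependent noise terms of the full model are omitted, and all symbol names are ours.

```python
import numpy as np

def kalman_forward(A, H, Om_xi, Om_w, Sigma1, n):
    """Propagate the error covariance Sigma_e forward in time, choosing the
    Kalman gain K_t that minimizes E[e_{t+1} e_{t+1}'].

    Model (plain LQG sketch):  x_{t+1} = A x_t + xi_t,  y_t = H x_t + w_t,
    so the error obeys  e_{t+1} = (A - K_t H) e_t + xi_t - K_t w_t.
    """
    Sigma_e = Sigma1
    Ks, Sigmas = [], [Sigma1]
    for _ in range(n - 1):
        # gain minimizing the next-step error covariance
        K = A @ Sigma_e @ H.T @ np.linalg.inv(H @ Sigma_e @ H.T + Om_w)
        # covariance update for e_{t+1}
        Sigma_e = ((A - K @ H) @ Sigma_e @ (A - K @ H).T
                   + Om_xi + K @ Om_w @ K.T)
        Ks.append(K); Sigmas.append(Sigma_e)
    return Ks, Sigmas
```

In the full model the gain additionally depends on the controller through the signal- and state-dependent terms, which is why controller and estimator must be solved for jointly by iteration.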

Switching between affine and constant form
One problem with the suggested affine state-dependency is that the noise will increase again when $a_i + D_i \hat{x}_t$ becomes negative. To prevent this, for each iteration we simply set the affine term to zero ($a_i = 0$ and $D_i = 0$) whenever $a_i + D_i \hat{x}_t$ becomes negative. We will show later (Figure S1) that the algorithm still converges stably to the solution.
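The switching rule above is a one-line clamp; a minimal sketch (treating the scale of the $i$-th noise term as the scalar $a_i + D_i \hat{x}_t$, with $D_i$ written as a row vector; names are ours):

```python
import numpy as np

def effective_noise_terms(a_i, D_i, xhat):
    """Switching rule sketch: if the affine noise scale a_i + D_i @ xhat
    turns negative, drop the affine term (a_i = 0, D_i = 0) for this
    iteration, so the noise magnitude cannot grow again as the scale
    crosses below zero.
    """
    if float(a_i + D_i @ xhat) < 0.0:
        return 0.0 * a_i, 0.0 * D_i
    return a_i, D_i
```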

Iterative Solver
As originally suggested by Todorov [1], the optimal solutions for both the controller ($L_t$ and $l_t$) and the estimator ($K_t$) can be obtained by iterative updates that guarantee convergence. This guarantee no longer holds in our case because of the switching behavior, but we will show empirically that the algorithm converges. The overall procedure is sketched in Algorithm S1.
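The structure of Algorithm S1 is an alternating minimization: re-derive the controller for a fixed estimator (backward pass), then the estimator for the fixed controller (forward pass), monitoring the total expected cost. A generic driver of this shape, with the two update steps passed in as callables (all names are ours; this is the loop skeleton, not the paper's exact algorithm):

```python
def iterative_solver(update_controller, update_estimator, total_cost,
                     K_init, max_iter=50, tol=1e-9):
    """Alternating-update skeleton in the style of Algorithm S1.

    update_controller(K) -> L   : backward pass for a fixed estimator
    update_estimator(L)  -> K   : forward pass for a fixed controller
    total_cost(L, K)     -> float
    Stops when the cost change falls below tol.
    """
    K = K_init
    costs = []
    for _ in range(max_iter):
        L = update_controller(K)
        K = update_estimator(L)
        costs.append(total_cost(L, K))
        if len(costs) > 1 and abs(costs[-2] - costs[-1]) < tol:
            break
    return L, K, costs
```

On a toy coordinate-descent problem this loop converges monotonically; in the full model the switching rule breaks the monotonicity guarantee, which is why convergence is checked empirically (Figure S1).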

A system with de-coupled dynamics
To understand the characteristics of the suggested affine controller, we consider the typical case in which the deterministic dynamics of the system ($x_{t+1} = Ax_t + Bu_t$) and the cost matrices ($Q$ and $R$) are decoupled into $k$ mutually independent sub-systems, i.e. when $A$, $B$, $Q$ and $R$ are all block-diagonal matrices with $k$ blocks of the same structure.
Let $S_k$ be the group of all $k$-block-diagonal matrices (i.e. $A, B, Q, R \in S_k$). Note that this group is closed under multiplication (i.e. if $X \in S_k$ and $Y \in S_k$, then $XY \in S_k$). As will be shown in the next section, our two-dimensional reaching experiment can be considered an instance of this case ($k = 2$). Now, let $U_{\{j_1, j_2, \cdots\}}$ be the set of column vectors (in either state, control or observation space) that can have non-zero elements only in the entries that correspond to the blocks $j_1, j_2, \cdots$. In addition, let $e_m$ be a standard basis vector (in either state, control or observation space) that has 1 at the $m$-th entry and 0 elsewhere, with which the set of all matrices that have at most a single non-zero element is represented as $\mathcal{B} = \mathrm{span}(e_q e_r^\top)\ \forall q, r$. Based on these notations, we prove the following two theorems featuring important characteristics of the derived optimal controller.
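The closure of $S_k$ under multiplication, on which both proofs rest, is easy to verify numerically (helper and variable names are ours):

```python
import numpy as np

def is_block_diag(X, block):
    """True if X is block-diagonal with square diagonal blocks of size `block`."""
    n = X.shape[0]
    mask = np.zeros_like(X, dtype=bool)
    for i in range(0, n, block):
        mask[i:i + block, i:i + block] = True
    return np.allclose(X[~mask], 0.0)

rng = np.random.default_rng(0)
block, k = 2, 3
# two random members of S_k (k diagonal blocks of size `block`)
X = np.zeros((block * k, block * k))
Y = np.zeros((block * k, block * k))
for i in range(0, block * k, block):
    X[i:i + block, i:i + block] = rng.standard_normal((block, block))
    Y[i:i + block, i:i + block] = rng.standard_normal((block, block))
assert is_block_diag(X @ Y, block)   # S_k is closed under multiplication
```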
First, it can easily be shown that multiplication by a member of $S_k$ preserves the block support of a vector, and this leads to the following theorem.

Theorem 1. $L_t \in S_k$ and $S^x_t \in S_k$ for all $t$.

Proof. Recall the update rule for $L_t$: all of its components except $S^x_{t+1}$ belong to $S_k$ by definition. If we assume that $S^x_{t+1} \in S_k$, then $L_t \in S_k$, and also $S^x_t \in S_k$ by the update rule for $S^x_t$, since by the lemma all of its components, including $L_t$, belong to $S_k$. As $S^x_n = Q \in S_k$, this completes the proof by induction.
The above result shows that the resultant feedback controller L t x t is always decoupled into k independent controllers, regardless of the couplings caused by signal-dependent or state-dependent noises.

Similarly, it can be shown that multiplying a vector in $U_{\{j_1,\cdots\}}$ by a member of $S_k$ leaves it in $U_{\{j_1,\cdots\}}$, and this results in the following theorem.

Theorem 2. If $D_i \in \mathrm{span}(e_q e_r^\top)$ with $e_r \in U_{\{j_i\}}$ for all $i = 1, \cdots, d$, then $l_t \in U_{\{j_1,\cdots,j_d\}}$ for all $t = 1, \cdots, n-1$.

Proof. Consider the update rule for $l_t$ (Equation (10)). From the above proposition, we have proved that the whole matrix multiplying $s^x_{t+1}$ belongs to $S_k$. Then, using the lemma above, we only need to show that $s^x_{t+1} \in U_{\{j_1,\cdots,j_d\}}$. Recall the update rule for $s^x_t$, whose final condition is $s^x_n = 0 \in U_{\{j_1,\cdots,j_d\}}$. If $s^x_{t+1} \in U_{\{j_1,\cdots,j_d\}}$, then $(A - BL_t)^\top s^x_{t+1} \in U_{\{j_1,\cdots,j_d\}}$ by the lemma above, and the remaining term for each $i$ lies in $U_{\{j_i\}}$. Therefore $s^x_t \in U_{\{j_1,\cdots,j_d\}}$ too, and this completes the proof by induction.
This result shows that the feed-forward controller l t only controls sub-system dynamics that affect the observation noise.
Taken together, if the given system dynamics are decoupled, the resulting optimal controller exhibits two important characteristics. First, the feedback part of the controller is decoupled, with each sub-controller independently driving the corresponding sub-state to the goal state. Second, the feed-forward part of the controller only generates control signals that affect (and possibly reduce) the observation noise. Therefore, the feed-forward part can be considered an offline, pre-planned motor program that collects sensory information, while the feedback part can be considered an online motor program that tries to achieve the given task goal based on the sensory information collected online.

Model of two dimensional reaching
The reaching movement was modelled as linear two-dimensional dynamics. The model assumes a point-mass system in which the control signal is smoothed by a muscle-like second-order low-pass filter. The deterministic dynamics and observer of the one-dimensional model are defined in terms of $p^i_t$, $v^i_t$ and $f^i_t$, the position, velocity and force of the hand in the $i$-th dimension at time $t$, where $p^{*i}$ is the goal position, $\Delta$ is the time step (set to 0.01 s), $m$ is the effective mass of the hand (set to 1 kg), and $\tau$ is the time constant of the low-pass filter [1]. The corresponding cost matrices are defined in terms of $O_{m \times n}$, an $m \times n$ zero matrix, a regularization factor $r$, and relative weights $w_v$ and $w_f$ of the penalties on non-zero final velocity and force, with respect to the positional penalty.
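The one-dimensional deterministic dynamics described above can be written out as a discrete-time state-space model. The sketch below assumes a per-dimension state $[p, v, f, g, p^*]$ in which the second-order muscle filter is implemented as two cascaded first-order filters sharing the time constant $\tau$; this ordering and the cascaded-filter form are our assumptions, standing in for the supplement's omitted matrices.

```python
import numpy as np

def one_dim_reach(dt=0.01, m=1.0, tau=0.04):
    """Euler-discretized one-dimensional reaching dynamics (a sketch).

    State x = [p, v, f, g, p*]: position, velocity, force, intermediate
    filter state, and goal position.  The control u passes through two
    first-order filters (u -> g -> f), approximating a muscle-like
    second-order low-pass filter; the goal entry p* is constant.
    """
    A = np.array([
        [1.0, dt,  0.0,           0.0,           0.0],  # p += dt * v
        [0.0, 1.0, dt / m,        0.0,           0.0],  # v += dt * f / m
        [0.0, 0.0, 1.0 - dt/tau,  dt / tau,      0.0],  # f filtered from g
        [0.0, 0.0, 0.0,           1.0 - dt/tau,  0.0],  # g filtered from u
        [0.0, 0.0, 0.0,           0.0,           1.0],  # p* constant
    ])
    B = np.array([[0.0], [0.0], [0.0], [dt / tau], [0.0]])
    return A, B
```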
Now we build a two-dimensional system by stacking the above system; both dimensions use the same $\Delta$ and $\tau$'s, and the cost matrices are stacked in the same way. Now we add the noise elements to the system. For convenience, the signal- and state-independent noise terms $\xi_t$ and $\omega_t$ are both set to zero, and the covariance of the internal noise $\Omega^\eta$ is set to $\Omega^\eta = \omega^2_\eta\,\mathrm{diag}([1\ 0\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 0])$, which means that the internal noise only affects the position update. The effect of assigning non-zero values to these zero terms will be tested in the sensitivity analysis.
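The stacking itself is a block-diagonal embedding of two copies of the one-dimensional matrices (helper name is ours):

```python
import numpy as np

def stack2d(A1, B1):
    """Build the decoupled two-dimensional system by placing two copies of
    the one-dimensional blocks on the diagonal (both dimensions share the
    same time step and filter time constants)."""
    n, c = A1.shape[0], B1.shape[1]
    A = np.zeros((2 * n, 2 * n))
    B = np.zeros((2 * n, 2 * c))
    A[:n, :n] = A1; A[n:, n:] = A1
    B[:n, :c] = B1; B[n:, c:] = B1
    return A, B
```

By construction the stacked $A$ and $B$ are members of $S_2$, which is exactly the precondition of the decoupling theorems above.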
As the start and target positions are known to subjects almost deterministically, we set the initial state covariance $\Sigma_1$ to 0. The effect of non-zero initial positional uncertainty will be tested in the sensitivity analysis. The matrix for the signal-dependent noise $F_t$ is modelled with a constant noise-amplification factor $\alpha$, where the $\varepsilon^i_t$'s are random variables, each of which follows a unit normal distribution. In theory, each element should have its own amplification factor, but we used a single number for simplicity. As in the case of the signal-independent noises, the effect of having individual factors will be tested in the sensitivity analysis. $BF_t$ is later decomposed into $\sum_{i=1}^{4} \varepsilon^i_t C_i$, where each $C_i$ has only one non-zero element, $\alpha$. Note that, since $F_t$ has off-diagonal terms, the signal-dependent noise is coupled.
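The decomposition of $BF_t$ into a sum $\sum_i \varepsilon^i_t C_i$ relies on the fact that a matrix applied to an i.i.d. unit-normal vector can be split into per-component terms without changing the resulting covariance. A numeric check of that identity (shown column-wise for a generic matrix; the paper's $C_i$ each carry a single non-zero element, but the covariance identity is the same; values and names below are illustrative, not the experiment's):

```python
import numpy as np

M = np.array([[0.30, 0.10],
              [0.05, 0.40]])       # stands in for B F_t (values arbitrary)

# Column-wise decomposition: each C_i keeps only the i-th column of M, so
# M @ eps has the same distribution as sum_i eps_i * (i-th column of M).
Cs = [np.zeros_like(M) for _ in range(M.shape[1])]
for i, C in enumerate(Cs):
    C[:, i] = M[:, i]

# The noise covariance is identical under the decomposition:
cov_direct = M @ M.T
cov_decomp = sum(C @ C.T for C in Cs)
assert np.allclose(cov_direct, cov_decomp)
```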
The state-dependent observation noise terms $g_t$, $a$ and $d$ are modelled to capture our experimental situation, in which the component of velocity in the visibility-modulation direction ($\theta$) affects the positional sensing. Note that $a$ specifies the threshold above which the cursor becomes fully detectable. This state-dependent term is later decomposed into $\sum_{i=1}^{2} \epsilon^i_t (a_i + D_i x_t)$, where $a_1 = \beta a e_1$ and $a_2 = \beta a e_4$ (recall that we defined $e_m$ to be a standard basis vector with 1 at the $m$-th entry and 0 elsewhere). $D_1$ is a $6 \times 10$ matrix with $D_{1,2} = -\kappa\beta\cos(\theta)$, $D_{1,7} = -\kappa\beta\sin(\theta)$ and zeros elsewhere. $D_2$ is a $6 \times 10$ matrix with $D_{4,2} = -\kappa\beta\cos(\theta)$, $D_{4,7} = -\kappa\beta\sin(\theta)$ and zeros elsewhere. Note that, except when $\theta = 0, \pi/2$, each $D_i$ has two non-zero elements and therefore does not fit the assumption in Section 5, but it can be transformed into a matrix with a single non-zero element if the coordinate frame is rotated so that one of the coordinate axes aligns with $\theta$. Again, the effect of additional non-zero terms in $g_t$, such as state-dependent noise in the velocity sensing, will be tested in the sensitivity analysis.

Figure S1: Cost per iteration. Total of 700,000 simulations for all reaching directions in the experiment.

Convergence analysis
Figure S1 shows a summary of the cost per iteration for all 700,000 simulation runs (100,000 × 7 directions from 0 to 180 degrees) during the sensitivity analysis described in the main manuscript.
We calculated the average, minimum and maximum costs for each iteration (up to the fifth) in order to examine whether the suggested method can robustly reduce the total cost through the iterations. The result suggests that the iterative method stably reduces the total cost over the iterations.