Continuous evaluation of cost-to-go for flexible reaching control and online decisions

doi:10.1371/journal.pcbi.1011493

Fig 1.

Model architecture.

The feedback-based controller from the optimal feedback control framework (rightmost loop, black) is augmented by an additional loop (leftmost loop, gray) that selects the controller as a function of task parameters and state estimation. This additional loop is responsible for controller adjustments during movement.


Fig 2.

Simulation results: narrow and wide targets.

A Mean reaching traces in the presence of a rightward mechanical perturbation (represented by the black arrow) for trials initially directed towards a narrow (red) or wide (blue) target. The full and dashed lines represent the trials with and without a switch in target structure, respectively. B Mean and standard deviation of the x-position for the different target conditions. Time is aligned with movement initiation, and the vertical dotted, full, and dashed lines respectively represent the target switch onset, the force onset, and the time at which the controller was updated. C Simulated end-point variances along the x-axis in the absence of mechanical disturbances for trials without (left bar plot) and with (right bar plot) a change in target structure. The left panel represents the switch from square to rectangle and the right panel the switch from rectangle to square. D Mean traces of the transverse velocity in the different target conditions. E Mean traces of the x-motor command for the different target conditions. For comparison with experimental data, panels A, B, and C correspond to Fig 3 in [17] and panels D and E correspond to Fig 4 in [17].


Fig 3.

Simulation results: Continuous change in target width.

A Mean reaching traces in the presence of a rightward mechanical perturbation (represented by the black arrow) for the different target conditions in the absence (black) or presence of a target change (green: slow continuous change, blue: fast continuous change, and magenta: instantaneous change). B Mean and standard deviation of the x-position for the different target conditions. Time is aligned with movement initiation, and the vertical dotted, dashed, and full lines respectively represent the target switch onset, the time at which the controller was updated, and the force onset. C Simulated end-point variances along the x-axis in the absence of mechanical disturbances for the different target conditions. D Mean traces of the transverse velocity for the different target conditions. E Mean traces of the x-motor command for the different target conditions. For comparison with experimental data, panels A, B, and C correspond to Fig 2 in [18] and panels D and E correspond to Fig 3 in [18].


Fig 4.

Simulation results: Online motor decisions.

A Individual simulated hand traces in the presence of multiple alternative targets (green, magenta, and blue traces represent trials reaching the left, center, and right target, respectively). Each row corresponds to a different reward condition (top and bottom for same and different rewards, respectively) and each column represents a different force level for the mechanical disturbance (-10 N, -5 N, 0 N, 5 N, and 10 N from left to right). B Proportion of trials that reached the lateral targets (green and blue lines for left and right targets, respectively) as a function of the intensity of the mechanical perturbation. The full line represents the same-reward condition and the dotted line represents the different-rewards condition. C Proportion of trials that reached the right target in the presence of a rightward disturbance (5 N) as a function of the difference in reward between the central and the right targets. Positive difference values favor the central target, which was more rewarding than the lateral ones. D Hand traces for the illustrated example of the relationship between cost-to-go function and behavior, corresponding to the fourth condition of the second row of panel A (different rewards, slight rightward perturbation). The full magenta line represents a trial where the hand reached the central target (corresponding to the graph of panel E) and the dotted blue line represents a trial where it reached the right target (corresponding to the graph of panel F). E-F Representation of the cost-to-go values associated with each target (green = left, magenta = center, blue = right) for trials in the different-rewards condition with a rightward mechanical perturbation. Panel E represents the cost-to-go values for the trial that ended at the central target and panel F represents those for the trial that ended at the right target. The red arrow marks the time at which the right target became the goal target for the trial that ended at the right target, and the rectangular insets at the bottom of the panels represent the target associated with the lowest cost-to-go at each time. The time axis is aligned on movement onset. For comparison with experimental data, see Fig 4 in [21].


Fig 5.

Online motor decisions for rectangular targets.

A Individual simulated hand traces for the symmetric (Case 1) and asymmetric (left bias and right bias for Cases 2 and 3, respectively) reward distributions. The reward distributions along the x-axis are represented above the targets. B Individual simulated hand traces for the three different reward distributions in the presence of a rightward mechanical perturbation. C Individual simulated hand traces for the biased distributions (left and right bias represented by Cases 1 and 2, respectively). D Individual simulated hand traces in the presence of a switch in the reward distribution (from the full line to the dotted line) for both initial reward distributions. For comparison with experimental data, see Fig 2 in [20] for panels A and B and Fig 3 in [23] for panels C and D.


Fig 6.

Behavioral evidence of the modulation of decision time.

A Model prediction of the modulation of the time of change of mind as a function of the reward difference between the central and the lateral targets. With a larger reward difference (when the central target has a larger reward than the other two), the model predicted a longer decision time. B Summary of the experimentally measured decision time for constant force and various reward distributions (left) and for constant reward distribution and different force levels (right). The full and dashed lines represent trials with a leftward and a rightward perturbation, respectively. C Group mean and SEM hand traces along the x-axis for reaches performed in the presence of three alternative targets and leftward perturbations. All the targets had the same reward; the magenta trace captures trials that reached the center target while the green trace captures those that reached the left one. The vertical dashed line represents the onset of the mechanical perturbation and the black arrow marks the onset of noticeable differences between the two targets. D Same as C in the condition where the central target was more rewarding than the lateral ones. E Same as C. F Same as E in the presence of small leftward perturbations instead of large ones.


Fig 7.

Model implementation.

A Schematic representation of the computation of online changes in task demands. At each time step, the time-varying task parameters Θ_t are used to derive the optimal control policy, which is used to compute the motor command u_t (black arrow). The motor command depends on the dynamical state estimate (dotted arrow) computed through dynamical Bayesian integration (state estimation). B Schematic representation of the implementation of online motor decisions in a three-target paradigm (see panel C). Each line schematizes the time-varying cost-to-go function associated with one option. These functions are compared such that the option associated with the lowest value (see panel D) is selected (represented by the filled rectangle), and the corresponding motor command (full black arrow) is applied to the system for that time step. C Representation of the different targets and an exemplar simulated hand trajectory (exaggerated case for illustration); the dots correspond to the times at which the decision processes were considered. D Cost-to-go functions associated with the three targets evaluated at three different time points. The filled dots represent the minimum values that instructed the decision process; whenever a filled dot falls on a new color, it corresponds to an online change in target.
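
The selection step described for panels B and D can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a quadratic cost-to-go of the form e^T S e minus a reward term, where e is the estimated hand position relative to a target, and a linear feedback command u = -L e. The names `select_target`, `motor_command`, `S`, `L`, and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def select_target(x_hat, targets, rewards, S):
    """Return the index of the target with the lowest cost-to-go and all values.

    Assumed (hypothetical) cost-to-go for target i: e_i^T S e_i - reward_i,
    with e_i = x_hat - targets[i].
    """
    errors = x_hat - targets                       # shape (n_targets, dim)
    costs = np.einsum("ij,jk,ik->i", errors, S, errors) - rewards
    return int(np.argmin(costs)), costs

def motor_command(x_hat, target, L):
    """Linear feedback command toward the selected target (assumed gain L)."""
    return -L @ (x_hat - target)

# Three targets (left, center, right), in meters; values are illustrative only.
targets = np.array([[-0.1, 0.2], [0.0, 0.2], [0.1, 0.2]])
S = np.eye(2)          # assumed cost-to-go weighting
L = 5.0 * np.eye(2)    # assumed feedback gain

# Estimated hand position after a rightward deviation.
x_hat = np.array([0.06, 0.1])

# Same rewards: the right target has the lowest cost-to-go.
idx_same, costs_same = select_target(x_hat, targets, np.zeros(3), S)

# A larger reward on the central target keeps the central target selected.
idx_diff, _ = select_target(x_hat, targets, np.array([0.0, 0.005, 0.0]), S)

u = motor_command(x_hat, targets[idx_same], L)
```

In this sketch, repeating the comparison at every time step with the current state estimate is what allows the selected target, and hence the applied command, to change during the movement.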
