Interference and Shaping in Sensorimotor Adaptations with Rewards

When a perturbation is applied in a sensorimotor transformation task, subjects can adapt and maintain performance by relying either on sensory feedback or, in the absence of such feedback, on information provided by rewards. For example, in a classical rotation task, where movements must be rotated relative to a fixed target, human subjects can successfully adapt their reaching movements solely on the basis of binary rewards, although this proves much more difficult than with visual feedback. Here, we investigate such a reward-driven sensorimotor adaptation process in a minimal computational model of the task. The key assumption of the model is that synaptic plasticity is gated by the reward. We study how the learning dynamics depend on the target size, the movement variability, the rotation angle and the number of targets. We show that when the movement is perturbed for multiple targets, the adaptation processes for the different targets can interfere destructively or constructively depending on the similarities between the sensory stimuli (the targets) and the overlap in their neuronal representations. Destructive interferences can result in a drastic slowdown of the adaptation. As a result of interference, the time to adapt varies non-linearly with the number of targets. Our analysis shows that these interferences are weaker if the reward varies smoothly with the subject's performance instead of being binary. We demonstrate how shaping the reward or shaping the task can accelerate the adaptation dramatically by reducing the destructive interferences. We argue that experimentally investigating the dynamics of reward-driven sensorimotor adaptation for more than one sensory stimulus can shed light on the underlying learning rules.


Text S1: Delayed learning for a gradient ascent over a sigmoidal reward function
The delayed learning effect described in the main text depends mainly on the width of the tuning curves and on the shape of the reward, which changes abruptly from zero to one. This was shown in the context of a learning rule derived from the REINFORCE family. In fact, delayed learning is not limited to this type of algorithm. It also occurs if learning is driven by an on-line gradient ascent on a sigmoidal reward function according to the rule: where E and E_ξ are defined in Eq. (6) in the main text. Note that this learning rule is deterministic. Figure S1 displays the results for the sigmoidal reward function: with a constant c and a smoothing parameter T (see Results). Here, as in Figure 10B in the main text, delayed learning is reduced when the reward function is smoothed.
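Since the exact rule and reward function (Eq. (6) of the main text) are not reproduced here, the following minimal sketch uses illustrative stand-ins: Gaussian tuning curves, a two-row readout matrix W whose output angle is the reach direction, and a sigmoid of the squared angular error with threshold c and smoothing T. All constants and functional forms are assumptions, not the paper's exact choices; the gradient is taken numerically for simplicity.

```python
import numpy as np

N = 30                       # input-layer size (assumed)
sigma_tc = 0.5               # tuning-curve width in rad (assumed)
eta = 0.5                    # learning rate (assumed)
c, T = 0.05, 0.2             # reward threshold and smoothing (assumed)
gamma = np.deg2rad(30.0)     # imposed rotation

phis = np.linspace(-np.pi, np.pi, N, endpoint=False)   # preferred directions

def tuning(theta):
    d = np.angle(np.exp(1j * (theta - phis)))          # wrapped angular distance
    return np.exp(-d ** 2 / (2 * sigma_tc ** 2))

def reach_angle(W, theta):
    out = W @ tuning(theta)                            # 2-d output vector
    return np.arctan2(out[1], out[0]) + gamma          # rotation added to decode

def reward(W, theta):
    err = np.angle(np.exp(1j * (reach_angle(W, theta) - theta)))
    return 1.0 / (1.0 + np.exp((err ** 2 - c) / T))    # sigmoidal reward (assumed form)

def gradient_step(W, theta, eps=1e-5):
    # deterministic on-line gradient ascent on the reward;
    # the gradient is computed numerically for this sketch
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (reward(Wp, theta) - reward(Wm, theta)) / (2 * eps)
    return W + eta * g

theta_target = 0.0
W = np.vstack([np.cos(phis), np.sin(phis)])            # initially reads out the target direction
for t in range(500):
    W = gradient_step(W, theta_target)
# after training, the decoded reach direction compensates the rotation for this target
```

With a large smoothing parameter T, the sigmoid has appreciable slope even at the initial error, so the deterministic ascent makes progress from the first trial; shrinking T toward a step-like reward flattens the gradient away from the rewarded zone and reproduces the delay.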

Delayed learning for a network with an intermediate layer and a different decoder
We consider here a network with three layers. It consists of an N-dimensional sensory input layer and a two-dimensional output layer, as in the model investigated in the main text. In addition, it has an N-dimensional hidden layer.
The activity of the neurons in the hidden layer is: where ξ ∼ N(0, σ²I) is a Gaussian noise. The direction of the reach movement is then computed as the angle of the vector: The 1/√N normalization factor guarantees that the variability of r does not depend on N. When a rotation is introduced, an angle γ is added to the decoded direction, denoted by θ_r. The error is then calculated according to: where θ is the location of the target. We stick with the notation E_ξ to highlight the fact that this measure depends on the noise through θ_r. This quantity is used to measure the error with which the network performs the reaching task. Upon presentation of a target in a direction θ at trial t, the network performs the task and a reward R is delivered according to the outcome: The matrix W is then updated in two steps: The second step prevents a drift of the weight matrix in directions that are irrelevant to the decoder of the angle.
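The three-layer variant can be sketched as follows. The exact equations (hidden-layer activity, reward criterion and two-step update) are not reproduced above, so every form below is an assumption: linear hidden units with additive Gaussian noise, a binary reward for errors within a window δ, a reward-gated consolidation of the noise as step 1, and, as one possible choice for step 2, a projection that keeps only the decoder-plane part of the weight change. The identity initial matrix stands in for the pre-trained, pre-rotation network.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                         # layer size (assumed)
sigma = 0.3                    # hidden-layer noise std (assumed)
eta = 0.02                     # learning rate (assumed)
delta = np.deg2rad(10.0)       # rewarded window half-width (assumed)
gamma = np.deg2rad(30.0)       # imposed rotation

phis = np.linspace(-np.pi, np.pi, N, endpoint=False)
D = np.vstack([np.cos(phis), np.sin(phis)])            # angle decoder
P = D.T @ np.linalg.inv(D @ D.T) @ D                   # projector onto decoder plane

def sensory(theta):
    d = np.angle(np.exp(1j * (theta - phis)))
    return np.exp(-d ** 2 / (2 * 0.5 ** 2))            # Gaussian tuning (assumed)

def decoded_angle(h):
    r = (1.0 / np.sqrt(N)) * (D @ h)                   # 1/sqrt(N) normalization
    return np.arctan2(r[1], r[0]) + gamma              # rotation added to the decode

theta_t = 0.0
W0 = np.eye(N)             # stands in for the pre-trained initial condition
W = W0.copy()
for t in range(1500):
    s = sensory(theta_t)
    xi = sigma * rng.standard_normal(N)                # hidden-layer noise
    h = W @ s + xi                                     # noisy hidden activity
    err = np.angle(np.exp(1j * (decoded_angle(h) - theta_t)))
    R = 1.0 if abs(err) < delta else 0.0               # binary reward
    # step 1: reward-gated consolidation of the noise that produced the reward
    W = W + eta * R * np.outer(xi, s)
    # step 2 (one possible choice): keep only the decoder-plane part of the
    # change, removing drift in directions irrelevant to the decoded angle
    W = W0 + P @ (W - W0)
```

Only the fluctuations that happen to land inside the rewarded window are consolidated, so the decoded direction drifts toward the rotated solution; with a binary reward this drift stalls whenever the rewarded window is far from the current reach distribution, which is the delayed-learning regime discussed above.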
We first train the network, without a rotation, to perform the reaching task on 15 targets, with a small error for all targets. The adaptation to the rotation is then performed from this initial condition. Figure S2 plots the reach angle against the trial number in a network adapting to a rotation of 30° for two targets. Although the network adapts quickly to the rotation for the target at θ = 0°, adaptation for the target in the opposite direction (θ = 180°) is delayed. Similar results are obtained with threshold-linear neurons in the intermediate layer (unpublished data).

Generalization error for gradual adaptation
Here we compare the behavior of our model for gradual adaptation to one target with the corresponding experimental and modeling results reported in [1]. As shown in Figure S3, our model accounts for the experimental data to the same extent as the model studied in that paper (compare with Figures 1C and 2B in [1]). We also found that in our model, for a large rotation (e.g., 30 degrees), there is a bias toward the reinforced location, but this bias is negligible for a small rotation angle (e.g., 8 degrees, as in [1]).

Learning the rotation for multiple targets with minimization of a quadratic error
The increase in learning duration for wide tuning curves in our model, as well as the decrease in learning duration when multiple targets are learned, stem from the shape of the reward. These effects do not occur if adaptation is driven by an on-line gradient descent on a quadratic error function, as for instance in the model explored in [2]. This is shown in Figure S4.
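The contrast can be illustrated with a minimal sketch (not the model of [2] itself; tuning curves, targets and learning rate are illustrative assumptions). With a quadratic error on the movement endpoint, the gradient is proportional to the error, so it never vanishes away from the solution: the error shrinks from the very first trial for every target, with no plateau, regardless of tuning width or number of targets.

```python
import numpy as np

N = 30
phis = np.linspace(-np.pi, np.pi, N, endpoint=False)
gamma = np.deg2rad(30.0)
Rot = np.array([[np.cos(gamma), -np.sin(gamma)],
                [np.sin(gamma),  np.cos(gamma)]])      # imposed rotation
eta = 0.1                                              # learning rate (assumed)

def f(theta):
    d = np.angle(np.exp(1j * (theta - phis)))
    return np.exp(-d ** 2 / (2 * 0.5 ** 2))            # tuning curves (assumed)

targets = np.linspace(-np.pi, np.pi, 8, endpoint=False)
W = np.vstack([np.cos(phis), np.sin(phis)])
W /= np.linalg.norm(W @ f(0.0))                        # endpoints on the unit circle

def total_error(W):
    # summed quadratic endpoint error over all targets
    return sum(np.sum((Rot @ W @ f(th) -
                       np.array([np.cos(th), np.sin(th)])) ** 2)
               for th in targets)

initial_error = total_error(W)
for epoch in range(500):
    for th in targets:                                 # on-line gradient descent
        e = Rot @ W @ f(th) - np.array([np.cos(th), np.sin(th)])
        W -= eta * Rot.T @ np.outer(e, f(th))          # exact gradient of the quadratic error
```

Interleaving the targets causes no destructive interference here: the quadratic losses for all targets share a common exact solution, and the gradient steps converge to it smoothly.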