A Biologically Plausible Computational Theory for Value Integration and Action Selection in Decisions with Competing Alternatives

doi:10.1371/journal.pcbi.1004104

Fig 1.

Model architecture.

The core component of the model is the motor plan formation field that dynamically integrates information from disparate sources. It receives excitatory inputs (green lines) from: i) the spatial sensory input field that encodes the angular representation of the alternative goals, ii) the goods-value field that encodes the expected benefits for moving towards a particular direction and iii) the context cue field that represents information related to the contextual requirements of the task. The motor plan formation field also receives inhibitory inputs (red line) from the action cost field that encodes the action cost (e.g., effort) required to move in a particular direction. All this information is integrated by the motor plan formation field into an evolving assessment of the “desirability” of the alternative options. Each neuron in the motor plan formation field is linked with a motor control schema that generates a direction-specific policy π_j to move in the preferred direction of that neuron. The output activity of the motor plan formation field weights the influence of each individual policy on the final action-plan (see “Model architecture” in the results section for more details).
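
The way the field combines its inputs can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the inputs combine additively with unit weights, that preferred directions span -90 to +90 degrees with one neuron per degree, and that each input is a Gaussian bump. The names `gaussian_bump` and `net_input` and all numeric values are hypothetical; only the field size of 181 neurons comes from the Fig 9 caption.

```python
import math

N_NEURONS = 181  # field size as stated in the Fig 9 caption
# Assumption: preferred directions span -90..+90 degrees, one neuron per degree.
PREFERRED_DEG = [i - 90 for i in range(N_NEURONS)]


def gaussian_bump(center_deg, amplitude, width_deg=10.0):
    """Hypothetical input profile: a Gaussian bump centred on one direction."""
    return [
        amplitude * math.exp(-0.5 * ((d - center_deg) / width_deg) ** 2)
        for d in PREFERRED_DEG
    ]


def net_input(sensory, goods_value, context_cue, action_cost):
    """Excitatory inputs add; the action cost input is subtracted (inhibitory).

    Assumption: purely additive combination with unit weights.
    """
    return [
        s + g + c - a
        for s, g, c, a in zip(sensory, goods_value, context_cue, action_cost)
    ]


# Two targets at -30 and +30 degrees. The right target carries extra expected
# reward and the left target extra effort cost (all values are made up).
sensory = [a + b for a, b in zip(gaussian_bump(-30, 1.0), gaussian_bump(30, 1.0))]
goods_value = gaussian_bump(30, 0.5)
context_cue = [0.2] * N_NEURONS
action_cost = gaussian_bump(-30, 0.3)

drive = net_input(sensory, goods_value, context_cue, action_cost)
left = drive[PREFERRED_DEG.index(-30)]
right = drive[PREFERRED_DEG.index(30)]
print(f"net input at -30 deg: {left:.2f}, at +30 deg: {right:.2f}")
```

The sketch covers only the summation of inputs at one instant; the temporal dynamics of the field, which turn this drive into an evolving desirability estimate, are not modelled here.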

Fig 2.

A simplified version of the model architecture for reaching decision tasks in the presence of two competing targets.

The motor plan formation field encodes the direction of the intended arm movement in a spatial reference frame centered on the hand. The goods-based decision values and the spatial sensory inputs are therefore transformed from allocentric to egocentric representations centered on the hand before being passed to the motor plan formation field. The motor plan formation field integrates information about the spatial location of the targets, the expected reward attached to each target and the action cost required to pursue the targets into a single variable named “relative desirability”. The relative desirability encodes the “attractiveness” of each of the M individual reach policies at a given time and state and is used to weigh the influence of these policies on the final policy. Note that M is the number of neurons with activation level above a threshold γ. Once the final policy is determined, the framework implements that policy at the given time and state, resulting in an action-plan (i.e., a sequence of actions) that drives the hand closer to the target (see Results and Methods sections for more details).
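
The weighting of policies by relative desirability can be sketched as follows. This is an illustrative assumption rather than the published formulation: relative desirability is taken to be the supra-threshold activity normalized to sum to one, and each direction-specific policy is reduced to a unit vector along its neuron's preferred direction (0 degrees meaning straight ahead). The function name `mix_policies` and the example values are hypothetical.

```python
import math


def mix_policies(activity, preferred_deg, gamma):
    """Blend direction-specific policies using normalized field activity.

    Returns (mixed_direction_deg, M), where M is the number of neurons whose
    activity exceeds the threshold gamma, or None if no neuron is active.
    Assumption: weights are supra-threshold activity divided by its sum.
    """
    active = [(u, d) for u, d in zip(activity, preferred_deg) if u > gamma]
    if not active:
        return None
    total = sum(u for u, _ in active)
    # Weighted vector average of unit vectors (x: lateral, y: forward).
    x = sum(u / total * math.sin(math.radians(d)) for u, d in active)
    y = sum(u / total * math.cos(math.radians(d)) for u, d in active)
    return math.degrees(math.atan2(x, y)), len(active)


def bump(center_deg, preferred_deg, width_deg=10.0):
    """Hypothetical Gaussian activity profile centred on one direction."""
    return [
        math.exp(-0.5 * ((d - center_deg) / width_deg) ** 2)
        for d in preferred_deg
    ]


preferred = list(range(-90, 91))
one_target = bump(30, preferred)
two_targets = [a + b for a, b in zip(bump(-30, preferred), bump(30, preferred))]

dir_one, m_one = mix_policies(one_target, preferred, gamma=0.5)
dir_two, m_two = mix_policies(two_targets, preferred, gamma=0.5)
print(f"one target:  direction {dir_one:.1f} deg, M = {m_one}")
print(f"two targets: direction {dir_two:.1f} deg, M = {m_two}")
```

Under these assumptions, two equally active populations at symmetric directions pull the mixed policy towards the intermediate direction, while a single active population yields a movement aimed at its own target.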

Fig 3.

Characteristic examples of the simulated model activity in single- and two- target trials with the corresponding reaching movements, for a “rapid reaching choice” experiment.

Each trial started with either a single black target or two gray potential targets presented simultaneously in both visual fields. In the single-target trials, the actual target location was known prior to reach onset (upper row). In the two-target trials, reaching movements were initiated without knowing the actual location of the target (bottom row). Each of the two potential targets had an equal probability of turning black (i.e., of becoming the cued target). When the hand crossed a pre-defined threshold (black discontinuous line in the bottom right panel), the actual target was cued for action. The left and the middle columns depict the activity in the spatial sensory input field and the motor plan formation field for the single- (top row) and two- (bottom row) target trials. Notice that when both potential targets were presented, the activity in the motor plan formation field was weaker than the activity generated in single-target trials, where the actual location of the target was known in advance. The populations of neurons related to the two potential targets remained active and competed against each other until the actual target was cued, at which point the activity of the non-cued target vanished. On trials in which only a single target was presented, the reaches were made directly to the target. However, when two potential targets were presented, the model generated spatial averaging reaching movements, i.e., an initial reaching movement towards an intermediate position between the two targets followed by a corrective movement to the cued target. “Movement time” (or response time) indicates the time at which the framework initiated the reaching movement.

Fig 4.

Characteristic example of the simulated model activity during an effector choice task.

A single target, which can be acquired with either a hand or an eye movement, is presented about 50 time-steps after trial onset. The activity of the neurons tuned to this target increases in both DNFs that plan hand and eye movements, since the framework does not yet know whether it is a “reach” or a “saccade” trial. Once the “green” cue is presented, about 20 time-steps after target onset (the hypothetical (x,y) location of the green cue is represented by the green dot in the Cartesian plot in the right panel), the neuronal activity in the reaching DNF (bottom right field) becomes sufficiently strong, due to the excitatory inputs from the context cue neurons, to inhibit the saccade DNF (upper right field). The competition is resolved shortly afterwards and the framework generates a direct reaching movement (green trace) to the target.

Fig 5.

Characteristic example of the simulated model activity during an effector choice task with two targets.

Left: Neuronal activity of the DNFs that plan saccade (upper row) and reaching (bottom row) movements in a free-choice task with two competing targets. Two targets, located in the left and the right hemifield at equal distance from the “hand” and “eye” origin, are presented at 50 time-steps, followed 50 time-steps later by a “free-choice” cue signal (red and green cues presented simultaneously), which indicates that the framework is free to choose either effector to acquire either of the two targets. Since there is no effector preference to bias the effector competition, it takes longer for the model to decide whether to use the “hand” or the “eye” to acquire the target. As a result, the competition between the targets is usually resolved before movement onset, frequently resulting in direct movements to the selected target (the green trace is a characteristic example of a reaching movement in a free-choice trial). Right: Similar to the left panel but for a “cued-reaching” trial (green cue). The effector competition is resolved shortly after the cue is presented, and the movement starts sooner than in the free-choice trial due to the excitatory inputs from the context cue neurons. Thus, the competition between the targets is usually not resolved before movement onset, resulting in curved trajectories (the green trace is a characteristic example of a reaching movement in a “cued-reaching” trial).

Fig 6.

Simulated neural activity, movement time and approach direction of reaching trajectories, in free-choice and cued-reaching trials in an effector choice task.

A: Time course of the average activity (20 trials) of the two populations of neurons tuned to the selected (solid black line) and the non-selected (discontinuous black line) targets prior to movement onset, from the DNFs that plan reaching (green color) and saccade (red color) movements in the “free-choice” sessions. Data are shown only for trials in which reaches were selected. Notice that the framework first selects which effector to use to perform the task and then chooses the target. The average activity from the saccade DNF for the selected and non-selected targets overlaps. B: Similar to panel A, but for the “cued-reaching” sessions. The competition between the effectors is resolved almost immediately after the cue onset. C: Mean movement (i.e., response) time from 20 reaching trajectories in a free-choice task (i.e., the model is free to choose a hand or an eye movement to acquire either of the targets) and a cued task, in which the model was instructed to perform reaches. The error bars are ± standard error. The movement time in the free-choice trials was significantly longer than the movement time in the cued-reaching trials (two-sample t-test, p < 10⁻⁷), consistent with the slower effector selection in free-choice trials. D: Mean approach direction of 20 reaching movements for the first 50 time-steps in a free-choice task and a cued-reaching task. The error bars are ± standard error. An approach direction of 0 deg. indicates that initial reaching movements were made towards the intermediate location between the two targets. Notice that free-choice trials are characterized by straight reaching movements to the selected target, whereas the cued-reaching trials are dominated mostly by curved reaching movements to the selected target (two-sample t-test, p < 10⁻⁴).
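
For readers who want to reproduce this kind of summary, the error bars and test statistic can be computed as in the minimal sketch below. The caption does not say whether the two-sample t-test assumed equal variances; the pooled-variance (Student) form is used here as an assumption. The function names and the sample values are hypothetical placeholders, not data from the simulations.

```python
import math
import statistics


def standard_error(xs):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance.

    Assumption: equal variances in both groups (Student's t-test); the
    caption does not specify which variant was used.
    Degrees of freedom for this statistic are len(a) + len(b) - 2.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    mean_diff = statistics.mean(a) - statistics.mean(b)
    return mean_diff / math.sqrt(pooled_var * (1 / na + 1 / nb))


# Placeholder samples, for illustration only (not simulation output).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

print("SE(group_a) =", round(standard_error(group_a), 4))
print("t =", round(pooled_t_statistic(group_a, group_b), 4))
```

Converting the statistic to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.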

Fig 7.

Characteristic example of the simulated model activity during an effector choice task with three targets.

Neuronal activity of the DNFs that plan saccade (upper row) and reaching (bottom row) movements during a “cued-saccade” trial (note the red cue), in which the context cue is presented prior to target onset. The competition between the effectors is resolved shortly after the context cue is presented. By the time the locations of the targets are shown, the framework has already selected the effector (i.e., the eye in this trial), so the competition between the targets is resolved quickly, resulting in a direct saccadic movement to the selected target (right panel).

Fig 8.

Characteristic example of the simulated model activity during training with a reach cue presented first, followed by a single target.

Stimulus input activity (left column) and motor plan formation DNF activity (middle column) for the eye (top row) and hand (bottom row) networks. The model incorrectly performed a saccade in response to the reach cue (right column).

Fig 9.

History of training on effector cues.

Plots A-D show the connection weights from neurons representing each cue (i.e., red and green) to the saccade (A-B) and reach (C-D) motor plan formation DNFs. There are 50 neurons selective for each cue and each motor plan formation field has 181 neurons, yielding four 50×181 connection weight matrices. Each matrix has been averaged over the cue-selective neurons at each trial to show the mean connection weight to each motor plan formation field as training progresses. A: Mean connection weights from neurons representing the red cue (cue 1) to neurons in the saccade motor plan formation DNF from trials 1 to 500. B: Mean connection weights from green cue (cue 2) neurons to the saccade motor plan formation DNF. C: Mean connection weights from red cue neurons to the reach motor plan formation DNF. D: Mean connection weights from green cue neurons to the reach motor plan formation DNF. E: Success of each trial during training (0 = unsuccessful, 1 = successful).
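
The averaging step described in this caption can be sketched as below. The sketch assumes each weight matrix is stored as 50 rows (cue-selective neurons) by 181 columns (motor plan formation neurons), so that averaging over the cue neurons leaves one mean weight per field neuron for each trial. The function name and the toy matrix are hypothetical.

```python
N_CUE = 50     # cue-selective neurons per cue (from the caption)
N_FIELD = 181  # neurons per motor plan formation field (from the caption)


def mean_over_cue_neurons(weights):
    """Average a (cue neurons x field neurons) matrix over its rows.

    Assumption: rows index cue-selective neurons and columns index field
    neurons. Returns one mean weight per field neuron.
    """
    n_rows = len(weights)
    return [sum(column) / n_rows for column in zip(*weights)]


# Toy matrix for one trial: weight from cue neuron i to field neuron j is
# i + j (made-up values that make the expected mean easy to check).
toy_weights = [[float(i + j) for j in range(N_FIELD)] for i in range(N_CUE)]

profile = mean_over_cue_neurons(toy_weights)
print(len(profile), profile[0], profile[-1])
```

Repeating this for each of the 500 training trials and stacking the resulting profiles would give a trial-by-neuron array of the kind that panels A-D appear to display.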

Fig 10.

Expected reward biases the competition between alternative actions.

A: Proportion of left and right choices in the “equal-reward” and “unequal-reward” conditions. Expected reward influences choice preferences by shifting the choice bias towards the higher valued target. B: Movement time distribution for “equal-reward” (gray bars) and “unequal-reward” (black bars) choices. The movement time is approximately normally distributed for equal-reward choices, and it becomes skewed to the right for unequal-reward choices. C: Average movement time for reaches to the left and right target in the equal-reward and unequal-reward conditions. The error bars are ± standard error. Notice that reach responses are faster when choosing the more preferred option than when selecting the less preferred option (two-sample t-test, p < 10⁻⁶). D: Time course of the average activity of the two populations of neurons tuned to the targets from the DNF that plans the reaches in the “equal-reward” condition. The target onset is indicated by a green arrow. The temporal evolution and the strength of the neural activity are about the same for both populations, since the expected reward for both choices is the same. E: Similar to panel D, but for the “unequal-reward” condition. In this case, the expected reward modulates the neural activity in the reach DNF: the activity of neurons tuned to the higher valued target increases significantly compared to the neuronal activity associated with the lower valued target.

Fig 11.

History of training on reward contingency.

A: Expected reward for target directions in an egocentric reference frame from trials 1 to 500. The model was presented with two targets on each trial, initialized with equal expected reward. Reward was received for reaching or making a saccade to the left target. B: Success of each trial during training (0 = unsuccessful, 1 = successful).
