Learning Reward Uncertainty in the Basal Ganglia

doi:10.1371/journal.pcbi.1005062

Fig 1.

Simplified anatomy of the basal ganglia.

The arrows and lines ending with circles denote the excitatory and inhibitory connections respectively. The following abbreviations are used: GPe—external globus pallidus, SNr—substantia nigra pars reticulata, GPi—internal globus pallidus, SNc—substantia nigra pars compacta, VTA—ventral tegmental area.

Fig 2.

Changes in G and N for the AU model as a function of trial number.

Different rows correspond to different mean reward μ_r (indicated left of each row), and different columns correspond to different standard deviations of reward σ_r (indicated above each column). The rewards were sampled from a Gaussian distribution. Here, both G and N were initialized at 0, and we set α = 0.1. We have selected in order to make the figure easier to interpret, because then , and hence in the middle row G and N approach σ_r. For each of the panels, the simulation was run 50 times, for 100 trials each.
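The simulation protocol in this caption (Gaussian rewards, G and N initialized at 0, α = 0.1, 50 runs of 100 trials) can be sketched as follows. The caption does not state the AU update rule, so the rule used here is an assumption: a prediction error δ = r − (G − N), with positive errors strengthening G, negative errors strengthening N, and both weights decaying at rate β. The function name `simulate_au` and the value β = 0.1 are hypothetical placeholders, not the value selected for the figure.

```python
import random


def simulate_au(mu_r, sigma_r, alpha=0.1, beta=0.1, n_trials=100, seed=None):
    """Run one simulation and return the G and N trajectories.

    ASSUMPTION: the update rule below is an illustrative guess at an
    actor-uncertainty style rule; it is not quoted from the caption.
    beta=0.1 is a placeholder value.
    """
    rng = random.Random(seed)
    G, N = 0.0, 0.0  # both weights start at 0, as stated in the caption
    G_hist, N_hist = [], []
    for _ in range(n_trials):
        r = rng.gauss(mu_r, sigma_r)  # reward sampled from a Gaussian
        delta = r - (G - N)           # assumed prediction error
        # Assumed updates: rectified error drives each weight, beta decays it.
        G, N = (G + alpha * max(delta, 0) - beta * G,
                N + alpha * max(-delta, 0) - beta * N)
        G_hist.append(G)
        N_hist.append(N)
    return G_hist, N_hist


# One panel of the figure: 50 independent runs of 100 trials each.
runs = [simulate_au(mu_r=1.0, sigma_r=1.0, seed=s) for s in range(50)]
final_sums = [g[-1] + n[-1] for g, n in runs]
print(sum(final_sums) / len(final_sums))  # mean of G + N across runs
```

Under this assumed rule, G − N tracks the mean reward and G + N grows with reward spread, which is the qualitative pattern the figure describes.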

Fig 3.

Comparison of the sum of weights in the Go and NoGo pathways in the AU model (vertical axis) with the standard deviation of rewards (horizontal axis) for different values of parameter β.

In all simulations in this figure, μ_r = 1 (so σ_r is equal to the coefficient of variation) and α = 0.1. For each value of σ_r the model was simulated 10 times for 300 iterations. For each simulation, the sum of G and N at the end of the simulation is displayed as a point on the figure.

Fig 4.

Effects of dopamine (DA) on action selection between safe and risky options.

Circles denote different neural populations: the black circle corresponds to the neural population in cortex selective for the current state, the green circle corresponds to dopaminergic neurons, and the blue and orange circles correspond to the striatal neurons selective for two different actions. The circles receiving inputs via connections G_i and N_i correspond to D1 and D2 neurons respectively. Arrows and lines ending with circles denote excitatory and inhibitory connections respectively. The top panel illustrates a high tonic level of DA, where the D2 neurons are inhibited (indicated by pale color), while the bottom panel corresponds to low DA, where the D1 neurons are inhibited.

Fig 5.

Effect of dopaminergic receptor manipulations on risky choices.

In each panel the percentage of risky choices is plotted against the probability of obtaining the large reward by choosing the risky lever. Open circles show the data from animals in the control state, while filled squares show the data obtained after treatment with a drug. Each panel corresponds to a different drug indicated in the key. The data in each panel were read from one figure in [32] and averaged over different drug concentrations. In particular, the data in the four panels come from Figs 3c, 4c, 3a and 4a in [32]. Solid and dashed curves show the fractions of risky choices made by the model, simulated for parameters corresponding to control and drug conditions. During each simulation the model made 10,000 choices in each of four probability conditions (thus the standard error of mean fraction of risky choices made by the model was < 1%). This large number of simulated trials allowed the model to produce stable behavior, which was necessary for the search for parameters resulting in a match with animal behavior.
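The bound on the standard error quoted above can be checked with a short calculation. This is a sketch that assumes the 10,000 simulated choices in a condition are independent binary outcomes, so the fraction of risky choices has a binomial standard error; the function name `se_of_fraction` is illustrative.

```python
import math


def se_of_fraction(p, n):
    """Standard error of an observed fraction from n independent binary choices."""
    return math.sqrt(p * (1.0 - p) / n)


n_choices = 10_000  # choices per probability condition, from the caption

# p * (1 - p) is largest at p = 0.5, so this is the worst case.
worst_case = se_of_fraction(0.5, n_choices)
print(f"worst-case SE: {worst_case:.4f}")  # 0.0050, i.e. 0.5%
```

Because the worst case is 0.5%, the standard error stays below 1% whatever the true fraction of risky choices is.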

Fig 6.

Schematic illustration of the sensitivity of D1 and D2 receptors to the changes in dopamine (DA) level.

Black and grey curves show how the probability of D1 and D2 receptor occupancy depends on DA concentration in the biophysical model of [36]. They assumed that receptor occupancy depends on DA concentration C as C / (C + EC₅₀), where EC₅₀ is the receptor affinity, which was taken as 1 μM and 10 nM for D1 and D2 receptors respectively, based on [38]. The dashed line indicates the baseline DA concentration C = 60 nM suggested by simulations in [37]. Vertical arrows indicate how much binding probability changes due to changes in DA concentration, shown by horizontal arrows.
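To make the difference in sensitivity concrete, the sketch below evaluates receptor occupancy at the baseline concentration and at two nearby concentrations. It assumes the standard single-site saturation form C / (C + EC₅₀) for occupancy; the EC₅₀ values and the 60 nM baseline come from the caption, while the 30 nM and 120 nM test concentrations and all names are hypothetical choices for illustration.

```python
def occupancy(c_nm, ec50_nm):
    """Occupancy probability under an assumed single-site saturation curve."""
    return c_nm / (c_nm + ec50_nm)


EC50_NM = {"D1": 1000.0, "D2": 10.0}  # 1 uM and 10 nM, from the caption
BASELINE_NM = 60.0                    # baseline DA concentration, from the caption

for receptor, ec50 in EC50_NM.items():
    base = occupancy(BASELINE_NM, ec50)
    # Hypothetical halving and doubling of DA around baseline.
    drop = occupancy(30.0, ec50) - base
    rise = occupancy(120.0, ec50) - base
    print(f"{receptor}: baseline={base:.3f}, halved DA: {drop:+.3f}, doubled DA: {rise:+.3f}")
```

Under this assumption, baseline occupancy is about 0.06 for D1 and about 0.86 for D2. D2 receptors are close to saturation, so they respond more to a fall in DA than to a rise, whereas D1 occupancy changes more when DA rises than when it falls.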

Fig 7.

Comparison of the sum of weights in the Go and NoGo pathways (vertical axis) with the standard deviation of rewards (horizontal axis) in the original (black dots) and generalized (grey dots) versions of the AU (left panel) and ACU (right panel) models.

In all simulations in this figure, μ_r = 1 and α = 0.1. The rewards were sampled from a Gaussian distribution. For each value of σ_r the model was simulated 10 times for 300 trials. For each simulation, the sum of G and N at the end of the simulation is displayed as a point on the figure. At the first trial of each simulation, the weights were initialized to G = N = 0.

Fig 8.

Changes in the variables of the ACU model simulated in a two-alternative choice task as a function of trial number.

The rewards were sampled from a Gaussian distribution. Different rows correspond to simulations with different mean rewards μ_i (indicated above the panels), and different columns show: synaptic weights describing the tendency to select G_i and inhibit N_i for the two actions and the value of the state V. Standard deviations of reward σ_i associated with the two actions are indicated above the corresponding panels. Here, both G and N were initialized at 0, and we set α = 0.1 and the parameters of the choice rule to a = b = 1. For each of the panels, the simulation was run 50 times, for 100 trials each.

Fig 9.

Changes in the variables of the OpAL model simulated in a two-alternative choice task as a function of trial number.

The rewards were sampled from a Gaussian distribution. Different rows correspond to simulations with different mean rewards μ_i (indicated above the panels), and different columns show: synaptic weights describing the tendency to select G_i and inhibit N_i for the two actions and the value of the state V. Standard deviations of reward σ_i associated with the two actions are indicated above the corresponding panels. Here, both G and N were initialized at 0.1, and we set α = 0.1 and the parameters of the choice rule to a = b = 1. For each of the panels, the simulation was run 50 times, for 300 trials each.

Fig 10.

Comparison of the behavior of different models (labelled above columns of panels) in the Japanese letter learning task.

The top panels show the weights at the end of the simulation, and the bottom panels show the probability of choosing A and avoiding B (computed from Eq 6). At the start of each simulation V, G and N were initialized at 0.1, and we set α = β = 0.1. The parameters of the choice rule were set to a = b = 2 during training, while during test they were set to a = 4, b = 0 in the simulated on-medication condition, and to a = 0, b = 4 in the simulated off-medication condition. For each of the models, 100 simulations were run, with 100 learning trials each, and error bars show the standard error across simulations.
