Uncertainty–guided learning with scaled prediction errors in the basal ganglia

doi:10.1371/journal.pcbi.1009816

Fig 1.

The simplified basal ganglia circuit.

Selected nuclei and connections are shown as circles and arrows. Green connections correspond to the direct pathway; red connections correspond to the indirect pathway. Dopamine projections are shown in blue.

Fig 2.

Reward prediction performance of the RW, SPE and Kalman filter models.

A The first 200 trials of reward prediction for the RW learner (upper row, orange) and the SPE learner (lower row, blue). The true value (grey line), the observed rewards (grey dots) and the learner’s estimate (colored line) are shown as a function of trial number. Columns correspond to selected levels of observation noise (σ = 1, 5, 15). B Learning performance averaged over trials. We show the logarithm of the mean squared difference between the mean of the reward distribution and the learner’s prediction of that mean, as a function of the observation noise σ. Orange lines correspond to RW learners, the blue line corresponds to an SPE learner parametrized with α_m = 1 and α_s = 0.01, and the green line corresponds to a Kalman filter parametrized with the true underlying process and observation noise parameters. The different shades of orange correspond to different learning rates, as indicated by the color bar.
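
The performance measure in panel B can be illustrated with a minimal Python sketch. It assumes the standard Rescorla-Wagner update m ← m + α(r − m) for the RW learner and a natural logarithm, neither of which is spelled out in the caption. The function names `rw_predictions` and `log_mse` are hypothetical, and the SPE learner and the Kalman filter are not reproduced here.

```python
import math

def rw_predictions(rewards, alpha, m0=0.0):
    """Return the RW learner's estimate held before each reward is observed."""
    m = m0
    predictions = []
    for r in rewards:
        predictions.append(m)
        m += alpha * (r - m)  # standard Rescorla-Wagner update (assumed form)
    return predictions

def log_mse(true_means, predictions):
    """Natural log of the mean squared difference between truth and prediction."""
    squared = [(t - p) ** 2 for t, p in zip(true_means, predictions)]
    return math.log(sum(squared) / len(squared))

# Toy input: a constant true mean of 1 observed without noise (illustrative only).
preds = rw_predictions([1.0, 1.0, 1.0], alpha=0.5)
score = log_mse([1.0, 1.0, 1.0], preds)
```

Sweeping `alpha` and the observation noise over simulated rewards would give curves of the kind described for the RW learners, under the stated assumptions.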

Fig 3.

Dopamine responses to unpredictable rewards—experimental data and simulations.

A The reward distributions used by Tobler, Fiorillo [10]. Each distribution corresponds to an experimental condition. B Dopamine responses to rewards sampled from the distributions in A are shown as a function of reward magnitude, for the three different conditions. The representation of data is similar to that in figure 4C of Tobler, Fiorillo [10]. We show experimental data, extracted from figure 4C (animal A) of Tobler, Fiorillo [10], and simulated data, using a standard RW model and the SPE model. The colors relate the dopamine responses in B to the reward distributions in A. C The reward distributions used by Rothenhoefer, Hong [21]. The panel shows the probabilities plotted by Rothenhoefer, Hong [21] in figure 1A. D Dopamine responses to rewards sampled from the distributions in C. We show the empirical values plotted by Rothenhoefer, Hong [21] in figure 2E, the responses according to the RW model computed analytically as δ = r−μ, and the responses according to the SPE model computed as δ = (r−μ)/σ, where μ and σ are the mean and standard deviation of the corresponding reward distributions in C. Purple lines correspond to the uniform reward distribution, green lines correspond to the normal reward distribution.
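
As a rough illustration of the analytic responses in panel D, the sketch below computes the unscaled error r − μ and the error scaled by the standard deviation, (r − μ)/σ, for a discrete reward distribution. The three-outcome distribution and the function name `prediction_errors` are hypothetical and are not the distributions used in the cited experiments.

```python
import math

def prediction_errors(rewards, probs):
    """Return (mu, sigma, rw_errors, spe_errors) for a discrete reward distribution."""
    mu = sum(p * r for r, p in zip(rewards, probs))
    variance = sum(p * (r - mu) ** 2 for r, p in zip(rewards, probs))
    sigma = math.sqrt(variance)
    rw_errors = [r - mu for r in rewards]             # unscaled prediction error
    spe_errors = [(r - mu) / sigma for r in rewards]  # error scaled by sigma
    return mu, sigma, rw_errors, spe_errors

# Hypothetical distribution, chosen only to make the arithmetic easy to follow.
mu, sigma, rw, spe = prediction_errors([0.0, 1.0, 2.0], [0.25, 0.5, 0.25])
```

For a fixed reward magnitude, widening the distribution leaves the unscaled error unchanged but shrinks the scaled one, which is the contrast the panel is drawing between the two models.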

Fig 4.

Plasticity and computations in the basal ganglia circuit.

A The nonlinear transformation of dopaminergic prediction errors in the SPE model. The transformation in the direct pathway (i) and the transformation in the indirect pathway (ii) are mirror images of each other. B We plot the proportion of occupied receptors in the striatum as a function of dopamine concentration. The curves are based on the results of Dreyer, Herrik [26]. The blue vertical lines indicate the baseline dopamine concentration in the ventral striatum, based on the results of Dodson, Dreyer [27]. The green curve corresponds to the occupancy of D1 receptors, the red curve corresponds to the occupancy of D2 receptors. Panel B is adapted from figure 3D of Möller and Bogacz [23]. C The connectivity underlying a dynamical model of the simplified basal ganglia circuit. Circles correspond to neural populations; arrows between them indicate connections. D The computation of a scaled prediction error in continuous time, according to a dynamical model of the basal ganglia. We show how the relevant variables, T and δ, evolve as a function of time, assuming a step–function activation for the input nodes G, N and r. The black line in the lowest panel indicates the level of dopamine required for exact SPE learning.

Fig 5.

Variables of the Kalman filter and its approximations.

A Posterior variance in the Kalman filter (solid, Eq 19) and the steady–state Kalman filter (dotted, Eq 20), as a function of the number of observations. Different colours correspond to different levels of process noise, and the values are plotted for an observation noise standard deviation of σ = 1. B The learning rate of the steady–state Kalman filter k_∞ (blue, Eq 21) and the approximation (orange), which corresponds to the effective learning rate in the SPE model. We show the learning rates as a function of the observation noise σ for process noise of ν = 1.
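
The quantities in this figure can be sketched with the textbook scalar Kalman filter for a random-walk signal. This is an assumption: Eqs 19 to 21 are not reproduced here and may be parametrized differently. The sketch treats ν and σ as the standard deviations of the process and observation noise, and the function names are hypothetical.

```python
import math

def posterior_variances(nu, sigma, p0, n_obs):
    """Posterior variance after each of n_obs observations (scalar Kalman filter)."""
    p = p0
    history = []
    for _ in range(n_obs):
        prior = p + nu ** 2                # variance grows by the process noise
        k = prior / (prior + sigma ** 2)   # Kalman gain (learning rate)
        p = (1.0 - k) * prior              # variance shrinks after observing
        history.append(p)
    return history

def steady_state(nu, sigma):
    """Fixed point of the recursion above: solves p^2 + nu^2 p - nu^2 sigma^2 = 0."""
    p_inf = (-nu ** 2 + math.sqrt(nu ** 4 + 4.0 * nu ** 2 * sigma ** 2)) / 2.0
    k_inf = p_inf / sigma ** 2             # steady-state gain
    return p_inf, k_inf

variances = posterior_variances(nu=1.0, sigma=1.0, p0=10.0, n_obs=50)
p_inf, k_inf = steady_state(nu=1.0, sigma=1.0)
```

With these definitions the iterated posterior variance approaches `p_inf` within a few observations, which matches the qualitative picture of the solid curves converging to the dotted steady-state values.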
