Learning Reward Uncertainty in the Basal Ganglia
Fig 9
Changes in the variables of the OpAL model simulated in a two-alternative choice task as a function of trial number.
Rewards were sampled from Gaussian distributions. Rows correspond to simulations with different mean rewards $\mu_i$ (indicated above the panels). Columns show the synaptic weights $G_i$ and $N_i$, which describe the tendencies to select and to inhibit, respectively, each of the two actions, and the state value $V$. The standard deviations $\sigma_i$ of the rewards associated with the two actions are indicated above the corresponding panels. Both $G$ and $N$ were initialized at 0.1, the learning rate was $\alpha = 0.1$, and the parameters of the choice rule were $a = b = 1$. For each panel, the simulation was run 50 times, with 300 trials per run.
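The following minimal sketch shows how a simulation of this kind could be set up. It assumes Collins & Frank-style OpAL update rules, which this caption does not state. Under these assumptions, the prediction error is $\delta = r - V$. The state value is updated as $V \leftarrow V + \alpha\delta$, and the weights as $G_i \leftarrow G_i + \alpha G_i \delta$ and $N_i \leftarrow N_i - \alpha N_i \delta$. Actions are chosen by a softmax over $aG - bN$. The initial value $V = 0$, the averaging over runs, and the function names `simulate_opal` and `run_many` are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_opal(mu, sigma, n_trials=300, alpha=0.1, a=1.0, b=1.0,
                  g0=0.1, n0=0.1, v0=0.0, rng=None):
    """One run of an OpAL-style learner on a Gaussian two-armed bandit.

    Assumed rules (not taken from the caption):
      delta = r - V,  V <- V + alpha*delta
      G_i <- G_i + alpha*G_i*delta,  N_i <- N_i - alpha*N_i*delta
      P(choose i) = softmax(a*G - b*N)_i
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    k = mu.size
    G = np.full(k, g0)                       # "Go" weights (tendency to select)
    N = np.full(k, n0)                       # "NoGo" weights (tendency to inhibit)
    V = float(v0)                            # state value (initial value assumed)
    hist_G = np.empty((n_trials, k))
    hist_N = np.empty((n_trials, k))
    hist_V = np.empty(n_trials)
    for t in range(n_trials):
        act = a * G - b * N                  # net drive of each action
        p = np.exp(act - act.max())          # numerically stable softmax
        p /= p.sum()
        i = rng.choice(k, p=p)               # sample an action
        r = rng.normal(mu[i], sigma[i])      # Gaussian reward for that action
        delta = r - V                        # prediction error w.r.t. state value
        V += alpha * delta
        G[i] += alpha * G[i] * delta         # Go weight grows when delta > 0
        N[i] -= alpha * N[i] * delta         # NoGo weight grows when delta < 0
        hist_G[t], hist_N[t], hist_V[t] = G, N, V
    return hist_G, hist_N, hist_V


def run_many(mu, sigma, n_runs=50, n_trials=300, seed=0, **kwargs):
    """Average trajectories over independent runs (averaging is an assumption)."""
    rng = np.random.default_rng(seed)
    runs = [simulate_opal(mu, sigma, n_trials=n_trials, rng=rng, **kwargs)
            for _ in range(n_runs)]
    G, N, V = (np.mean(np.stack(x), axis=0) for x in zip(*runs))
    return G, N, V


if __name__ == "__main__":
    # Hypothetical reward parameters; the actual values appear in the figure panels.
    G, N, V = run_many(mu=[1.0, 0.5], sigma=[1.0, 1.0])
    print(G.shape, N.shape, V.shape)  # (300, 2) (300, 2) (300,)
```

Because $G_i$ and $N_i$ are scaled by their own current values in each update, this multiplicative form lets reward variability act on the two pathways asymmetrically. A sketch like this one can be used to probe that effect by varying $\sigma_i$ while holding $\mu_i$ fixed.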