
Fig 1.

A parabolic relationship between prediction errors and learning rates obtained by a cubic learning algorithm.

The proposed algorithm approximates this relationship under the control of a single free parameter (thick green line, κ = 0.25). The free parameter κ also determines the focal point of the parabolic curve (i.e., the black dot at (0, 0.25) on the xy plane is the focal point of the thick green parabola). Increasing values of κ flatten the relationship between prediction errors and learning rates, leading to lower learning rates for a given magnitude of prediction error. A few examples are illustrated by light green lines at κ = 0.5, 1.25, and 2.5. Parameter values smaller than 0.25 could allow prediction errors to be converted to learning rates even more steeply.
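The focal-point description above implies a standard parabola: a curve of the form α = PE²/(4κ) has its focus at (0, κ), so κ = 0.25 places the focus at (0, 0.25), and larger κ flattens the curve. A minimal sketch under that assumption (the exact equation and the cap at 1 are my inferences from the caption, not quoted from the paper):

```python
def parabolic_learning_rate(pe, kappa=0.25):
    """Map a prediction error to a learning rate via a parabola
    whose focal point sits at (0, kappa); larger kappa flattens
    the curve, yielding lower learning rates for a given |PE|.
    The form alpha = pe^2 / (4*kappa) is an assumption consistent
    with the focal point described in the caption."""
    alpha = (pe ** 2) / (4.0 * kappa)
    return min(alpha, 1.0)  # assumption: learning rates capped at 1

# Larger kappa -> flatter curve -> smaller learning rate for the same PE.
for kappa in (0.25, 0.5, 1.25, 2.5):
    print(kappa, parabolic_learning_rate(0.5, kappa))
```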

Fig 2.

Nonlinearity in trial-wise learning rates as a function of prediction errors and environmental noise.

One viable function, perhaps one of many possible solutions, accounts for the relationship between κ and σ. This would suggest that the parameter κ is a function of environmental noise; one example of this exponential relationship is shown in Fig A in S1 Text. The figure demonstrates that the σ parameter can be estimated reliably between .01 and .4, covering the whole learning-rate space. Note that, for outcome magnitudes within the normalised space (i.e., [0, 1]), values of σ higher than .3 would generate outcomes outside these boundaries. Here, γ denotes the Euler-Mascheroni constant.

Fig 3.

Nonlinearity in trial-wise learning rates as a function of prediction errors under an exponential-logarithmic assumption.

(A) The parameters δ and λ from Eq 7 are coarsely sampled for illustration purposes. When the parameter combinations produce horizontal lines over the PEs (i.e., the x-axis), the model fully reduces to the Rescorla-Wagner rule with a constant learning rate (thick red lines, λ = 0.005, with the variable δ values shown in the figure). (B-C) Change in learning rate versus absolute prediction error (PE) trajectories when one of the model's parameters is fixed to 1. Although these parameters interact to set the trajectory of the learning-rate/absolute-PE relationship, the higher the λ value, the more sigmoidal this relationship becomes (as opposed to a constant learning rate), whereas the higher the δ value, the more parabolic the relationship between prediction errors and learning rates becomes.

Fig 4.

The behaviour of different reinforcement-learning models in a task environment in which unexpected and expected uncertainties were independently manipulated.

All models converge reasonably well to the actual mean of the variable rewards. The learning rate for the Rescorla-Wagner model (η, Eq 1) is 0.32. For the hybrid Pearce-Hall model, ω (Eq 2) is 0.48 and λ (Eq 3) is 1.56. For the cubic model, κ is 0.11 (Eqs 4-5). For the exponential-logarithmic model, the parameters δ and λ are 0.83 and 1.45, respectively (Eq 7). Because the models perform very comparably, their differences are illustrated in Fig B in S1 Text, which shows the average prediction-error values relative to the simulated outcomes in the task environment. Note that the simulation environment shown was generated only once, covering many possibilities of environmental volatility and noise and their interaction, whereas the models were fitted iteratively until the parameters minimising the average magnitude of the prediction error relative to the actual outcome sequence were identified.
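The two baseline models referenced above can be sketched as follows. The update equations are assumptions based on the standard Rescorla-Wagner and hybrid Pearce-Hall formulations (not quoted from the paper's Eqs 1-3), using the parameter values given in the caption:

```python
def rescorla_wagner(outcomes, eta=0.32, v0=0.5):
    """Rescorla-Wagner: value updated by a constant fraction eta of
    each prediction error (assumed standard form V <- V + eta * PE)."""
    v, values = v0, []
    for x in outcomes:
        pe = x - v
        v += eta * pe
        values.append(v)
    return values

def hybrid_pearce_hall(outcomes, omega=0.48, lam=1.56, v0=0.5, a0=1.0):
    """Hybrid Pearce-Hall: an associability term tracks recent |PE|
    (assumed form alpha <- omega*|PE| + (1 - omega)*alpha), and the
    effective learning rate is lam * alpha (capped at 1 here, my
    assumption)."""
    v, a, values = v0, a0, []
    for x in outcomes:
        pe = x - v
        v += min(lam * a, 1.0) * pe
        a = omega * abs(pe) + (1 - omega) * a
        values.append(v)
    return values

# Noisy rewards around a mean of 0.8: both trackers settle near the mean.
rewards = [0.8, 0.7, 0.9, 0.75, 0.85] * 20
print(rescorla_wagner(rewards)[-1], hybrid_pearce_hall(rewards)[-1])
```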

Fig 5.

The learning rates estimated under each reinforcement-learning model.

(A) The hybrid Pearce-Hall model estimates higher learning rates relative to the Rescorla-Wagner model, whereas the cubic and exponential-logarithmic models estimate overall lower learning rates. (B) The relationship between PEs and learning rates estimated by the exponential-logarithmic model reveals a nonlinear trajectory very similar to the one proposed by the cubic model. The volatility of different task blocks did not seem to influence learning-rate trajectories in terms of how the absolute values of the PEs influenced learning rates; this relationship was similarly exponential in form across both stable and volatile task blocks.

Fig 6.

Learning rate results from the reanalysis of the bucket task.

(A) Overview of the bucket task. (B) Analysis of the data with the novel reinforcement-learning models demonstrated that, among the models that can estimate trialwise learning rates, the exponential-logarithmic model exhibits greater flexibility in accounting for participant behaviour (i.e., it minimises the deviation from participants' bucket placements) relative to the cubic model, which assumes a strictly parabolic relationship. The values are normalised from 360° to [0, 1]. ***p < .001. (C) The model estimates steep learning-rate increases over the prediction-error space, and the results mostly align with those reported by Vaghi et al. (2017). The average model-free learning rates reported by Vaghi and colleagues are indicated by circular markers; the curve with grey shading denotes the trajectory estimated by the exponential-logarithmic model. The population average of the learning rates estimated by the RW model is 0.968 ± 0.034 (mean ± SD; the mean is shown by the blue horizontal dashed line). Consequently, the exponential-logarithmic model offers greater flexibility at lower PE values, covering the range where most of the actual PEs in this task fell, whereas the learning rates from these competing models somewhat converge at higher PE values (i.e., top right of panel C).
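In bucket-style predictive-inference tasks, model-free learning rates are conventionally computed as the fraction of each prediction error absorbed by the participant's update, with angular bucket positions normalised from 360° to [0, 1]. A sketch under that convention (the function names and the exact normalisation are my assumptions, not taken from the paper):

```python
def normalise_angle(deg):
    """Map a bucket position in degrees to the [0, 1] space."""
    return (deg % 360) / 360.0

def model_free_learning_rate(bucket, next_bucket, outcome):
    """Trialwise model-free learning rate: the fraction of the
    prediction error (outcome - bucket) absorbed by the update
    (next_bucket - bucket). This update/PE ratio is the usual
    convention in predictive-inference tasks; whether it matches
    the paper's exact computation is an assumption."""
    pe = outcome - bucket
    if pe == 0:
        return 0.0  # no error, so no defined update fraction
    return (next_bucket - bucket) / pe

# A bucket at 90 deg, an outcome at 180 deg, and a partial update to 153 deg.
b, x = normalise_angle(90.0), normalise_angle(180.0)
print(model_free_learning_rate(b, normalise_angle(153.0), x))
```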

Fig 7.

The timeline of the reward magnitude learning task.

See Fig E in S1 Text for the task environment; more details about the task are available in the Methods section.

Fig 8.

The learning rates and their pupillary correlates.

(A) The relationship between the absolute value of prediction errors and learning rates. Each participant is represented by a different colour. The variability in marker size along each single-subject trajectory is scaled to the average normalised pupil size during the outcome-delivery period at that intersection of prediction errors and learning rates. Four thick continuous lines designate the participants whose pupil data were corrupted, showing only the behavioural relationship. (B) The time evolution of regression coefficients during the outcome-delivery period: unsigned prediction errors (grey); learning rates (orange); chosen outcome magnitude (green). Error shading designates ±1 SEM. *p < .05 designates 1000 ms bins in which the average pupil dilation is significantly different relative to baseline.
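The time-binned regression in panel B can be sketched as an ordinary least-squares fit per time bin, regressing pupil size on unsigned PE, learning rate, and chosen outcome magnitude. This is a plain NumPy sketch; the paper's actual preprocessing, baselining, and statistical model are not specified in the caption:

```python
import numpy as np

def binwise_coefficients(pupil, regressors):
    """Fit pupil size in each time bin on trialwise regressors
    (|PE|, learning rate, outcome magnitude) via OLS with an
    intercept; returns one coefficient row per time bin.

    pupil:      (n_trials, n_bins) array of pupil sizes
    regressors: (n_trials, n_regressors) design matrix
    """
    n_trials, n_bins = pupil.shape
    X = np.column_stack([np.ones(n_trials), regressors])
    betas = []
    for b in range(n_bins):
        coef, *_ = np.linalg.lstsq(X, pupil[:, b], rcond=None)
        betas.append(coef[1:])  # drop the intercept term
    return np.array(betas)      # shape (n_bins, n_regressors)

# Synthetic check: pupil depends only on the first regressor (|PE|).
rng = np.random.default_rng(0)
reg = rng.uniform(size=(200, 3))               # |PE|, learning rate, magnitude
pup = 0.5 * reg[:, [0]] + rng.normal(scale=0.1, size=(200, 4))
print(binwise_coefficients(pup, reg).shape)    # one row per time bin
```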
