
Fig 1.

Behavioural task variants and computational model.

(A) Behavioural task variants. In Experiment 1 (leftmost panel) participants were shown only the outcome of the chosen option. In Experiment 2 (rightmost panel) participants were shown the outcome of both the chosen and the unchosen options. (B) Computational models. The schematic summarises the value update stage of our computational model. The model contains two computational modules, a factual learning module (in red) to learn from chosen outcomes (RC) and a counterfactual learning module (in blue) to learn from unchosen outcomes (RU) (note that the counterfactual learning module does not apply to Experiment 1). Chosen (QC) and unchosen (QU) option values are updated with delta rules that use different learning rates for positive and negative factual (PEC) and counterfactual prediction errors (PEU).
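The value-update stage described above can be sketched as a single trial update. This is a minimal illustration, not the authors' code; the function and dictionary-key names are ours, and only the structure (delta rules with separate learning rates for positive vs. negative factual and counterfactual prediction errors) comes from the caption:

```python
def update_values(q_c, q_u, r_c, r_u, alphas):
    """One trial of the value-update stage.

    q_c, q_u : current values of the chosen (QC) and unchosen (QU) options
    r_c, r_u : obtained (RC) and forgone (RU) outcomes
    alphas   : maps ('factual'|'counterfactual', 'pos'|'neg') to a
               learning rate (illustrative naming, not the paper's)
    """
    pe_c = r_c - q_c  # factual prediction error (PEC)
    pe_u = r_u - q_u  # counterfactual prediction error (PEU)
    a_c = alphas[('factual', 'pos' if pe_c > 0 else 'neg')]
    a_u = alphas[('counterfactual', 'pos' if pe_u > 0 else 'neg')]
    q_c += a_c * pe_c  # delta-rule update of the chosen option value
    q_u += a_u * pe_u  # delta-rule update of the unchosen option value
    return q_c, q_u
```

In Experiment 1 only the factual branch applies, since the unchosen outcome is never shown.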


Fig 2.

Factual and counterfactual learning biases.

(A) Predicted results. Based on previous studies, we expected that in Experiment 1 factual learning would display a “positivity” bias, i.e. a relatively higher learning rate for chosen positive outcomes than for chosen negative outcomes (note that in Experiment 1 the “positivity” and the “confirmation” biases are not discernible). In Experiment 2, one possibility was that this “positivity” bias would extend to counterfactual learning, whereby positive outcomes would be over-weighted regardless of whether the outcome was chosen or unchosen (a “valence” bias). Another possibility was that counterfactual learning would present the opposite bias, whereby the learning rate for unchosen negative outcomes would be higher than that for unchosen positive outcomes (a “confirmation” bias). (B) Actual results. Learning rate analysis of the Experiment 1 data replicated previous findings, demonstrating that factual learning presents a “positivity” bias. Learning rate analysis of Experiment 2 indicated that counterfactual learning was also biased, in a direction consistent with a “confirmation” bias. ***P < 0.001 and *P < 0.05, two-tailed paired t-test.


Fig 3.

Dimensionality reduction with model comparison.

(A) Model space. The figure represents how the number of parameters (learning rates) is reduced moving from the ‘Full’ model to simpler ones. (B) Model comparison. The panel represents the posterior probability (PP) of the models, calculated from the BIC, which penalises model complexity. The dashed line represents chance-level posterior probability (0.25). (C) Model parameters. The panel represents the learning rates of the best-fitting model (i.e., the ‘Confirmation’ model). αCON: learning rate for positive obtained and negative forgone outcomes; αDIS: learning rate for negative obtained and positive forgone outcomes. ***P < 0.001, two-tailed paired t-test.
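A standard way to turn BIC scores into approximate model posterior probabilities, assuming a flat prior over models, is PP_i ∝ exp(−BIC_i / 2). The sketch below follows that convention; it is an illustration of the general approach, not a reproduction of the authors' analysis pipeline:

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    """BIC = k * ln(n) - 2 * logL; more parameters incur a larger penalty."""
    return n_params * np.log(n_obs) - 2.0 * log_lik

def posterior_probabilities(bics):
    """Approximate model posteriors from BIC scores under a flat model
    prior: PP_i proportional to exp(-BIC_i / 2)."""
    b = np.asarray(bics, dtype=float)
    w = np.exp(-(b - b.min()) / 2.0)  # subtract min for numerical stability
    return w / w.sum()
```

With four models and identical BICs, every posterior is 0.25, which is the chance level marked by the dashed line in panel B.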


Table 1.

Model comparison.

The “winning” model is the “Confirmation” model, whose learning rates are displayed in Fig 3C. The second-best model is the Full model, whose learning rates are displayed in Fig 2B.


Fig 4.

Learning curves and model estimates.

(A) Task conditions. (B) and (C) Learning curves as a function of the task conditions in Experiment 1 and Experiment 2, respectively. Each panel displays the result of the corresponding condition presented in (A). The black dots and error bars represent the actual data ± s.e.m. The green lines represent the model estimates of the biased models (Experiment 1: ; Experiment 2: αCON ≠ αDIS), and the grey lines represent the model estimates of the unbiased models (Experiment 1: ; Experiment 2: αCON = αDIS).


Fig 5.

Parameter correlation and recovery.

(A) Correlation matrix of the free parameters for Experiment 1 (left) and Experiment 2 (right). Dark blue or dark red values of R indicate a strong correlation and therefore a problem in parameter identifiability. (B) Correlation matrix of the free parameters used to generate the simulated data (‘Fitted on real data’) and obtained by applying the parameter estimation procedure on the simulated data (‘Fitted on simulated data’). Dark red values of R indicate a strong correlation between the true and the retrieved parameter value and therefore a good parameter recovery.
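The parameter-recovery matrix in panel B can be computed by correlating each generating (“true”) parameter with each parameter recovered from fitting the simulated data, across simulated subjects. The helper below is a generic sketch of that check, with illustrative names, not the authors' code:

```python
import numpy as np

def recovery_matrix(true_params, recovered_params):
    """Correlation matrix between generating and recovered parameters.

    true_params, recovered_params : arrays of shape (n_subjects, n_params)
    Returns an (n_params, n_params) matrix; entry [i, j] is Pearson's R
    between true parameter i and recovered parameter j. A diagonal of
    R values near 1 indicates good recovery; strong off-diagonal
    correlations signal identifiability problems.
    """
    t = np.asarray(true_params, dtype=float)
    r = np.asarray(recovered_params, dtype=float)
    n = t.shape[1]
    return np.array([[np.corrcoef(t[:, i], r[:, j])[0, 1]
                      for j in range(n)] for i in range(n)])
```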


Fig 6.

Behavioural signatures distinguishing “low” and “high” bias participants.

(A) Task conditions. The ‘Symmetric’ condition was characterised by a stable reward contingency and no correct option, because the two options had equal reward probabilities. The ‘Asymmetric conditions’ were also characterised by a stable reward contingency but had a correct option, since one option had a higher reward probability than the other. The ‘Reversal’ condition was characterised by an unstable reward contingency: after 12 trials the reward probability reversed across symbols, so that the former correct option became the incorrect one, and vice versa. Note that the number of trials refers to one session and participants performed two sessions, each involving new pairs of stimuli (192 trials in total). (B) and (C) Behavioural results as a function of the task conditions in Experiment 1 and Experiment 2, respectively. Each column presents the result of the corresponding condition presented in (A). In the Symmetric condition, where there was no correct option, we calculated the “preferred choice rate”, which was the choice rate of the most frequently chosen option (by definition, this was always greater than 0.5). In the Asymmetric and the Reversal conditions we calculated the correct choice rate. In the Reversal condition the correct choice rate was split between the two learning phases. ***P<0.001 and *P<0.05, two-tailed paired t-test.
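The “preferred choice rate” used in the Symmetric condition reduces, for two options coded 0/1, to the choice rate of whichever option was chosen more often, which is at least 0.5 by construction. A minimal sketch (our naming, not the paper's):

```python
def preferred_choice_rate(choices):
    """Choice rate of the most frequently chosen of two options
    (choices coded 0/1); >= 0.5 by definition."""
    p = sum(choices) / len(choices)  # choice rate of option 1
    return max(p, 1.0 - p)
```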
