Contextual influence on confidence judgments in human reinforcement learning

doi:10.1371/journal.pcbi.1006973

Fig 1.

Experiment 1 task schematic, learning and confidence results. (A) Behavioral task. Successive screens displayed in one trial are shown from left to right, with durations in ms. After a fixation cross, participants viewed a pair of abstract symbols displayed on either side of a computer screen and had to choose between them. They were then asked to report their confidence in their choice on a numerical scale (graded from 0 to 10). Finally, the outcome associated with the chosen symbol was revealed. (B) Task design and contingencies. (C) Performance. Left and middle: trial-by-trial percentage of correct responses in the partial (left) and the complete (middle) information conditions. Filled colored areas represent mean ± sem. Right: individual averaged performance in the different conditions. Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem. (D) Confidence. Left and middle: trial-by-trial confidence ratings in the partial (left) and the complete (middle) information conditions. Filled colored areas represent mean ± sem. Right: individual averaged confidence ratings in the different conditions. Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem.

Fig 2.

Experiment 2 task schematic, learning and confidence results. (A) Behavioral task. Successive screens displayed in one trial are shown from left to right, with durations in ms. After a fixation cross, participants viewed a pair of abstract symbols displayed on either side of a computer screen and had to choose between them. They were then asked to report their confidence in their choice on a numerical scale (graded from 50 to 100%). Finally, the outcome associated with the chosen symbol was revealed. (B) Task design and contingencies. (C) Performance. Left and middle: trial-by-trial percentage of correct responses in the partial (left) and the complete (middle) information conditions. Filled colored areas represent mean ± sem. Right: individual averaged performance in the different conditions. Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem. (D) Confidence. Left and middle: trial-by-trial confidence ratings in the partial (left) and the complete (middle) information conditions. Filled colored areas represent mean ± sem. Right: individual averaged confidence ratings in the different conditions. Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem.

Fig 3.

Incentive mechanism and overconfidence. (A) Incentive mechanism. In Experiment 2, for the payout-relevant trials, a lottery L is randomly drawn in the 50–100% interval and compared to the confidence rating C. If L > C, the lottery is implemented: a wheel of fortune with an L% chance of winning is displayed and played out, and feedback then informs participants whether the lottery resulted in a win or a loss. If C > L, a clock is displayed together with the message “Please wait”, followed by feedback that depends on the correctness of the initial choice. With this mechanism, participants can maximize their earnings by reporting their confidence accurately and truthfully. (B) Overconfidence. Individual averaged calibration as a function of the experimental conditions in Experiment 2 (using a color code similar to that of Figs 1 and 2). Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem.
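
The claim that truthful reporting maximizes earnings can be checked numerically. The sketch below is an illustration under stated assumptions, not the study's code: the lottery L is taken to be uniform on [0.5, 1], the lottery is assumed to pay out with probability L, the choice-based bet is assumed to pay out with the true probability of a correct choice, and ties are resolved in favor of the choice-based bet. The function and variable names are hypothetical.

```python
import numpy as np

def expected_win_probability(reported_c, p_correct, n_grid=100_001):
    """Expected chance of winning for a given confidence report.

    Assumptions (illustrative, not taken from the paper):
    L ~ Uniform(0.5, 1); the lottery wins with probability L;
    the choice-based bet wins with probability p_correct.
    """
    lotteries = np.linspace(0.5, 1.0, n_grid)
    # L > C: the lottery is played. Otherwise the payoff depends on
    # whether the initial choice was correct.
    win_prob = np.where(lotteries > reported_c, lotteries, p_correct)
    return win_prob.mean()

true_p = 0.8                               # hypothetical true P(correct)
reports = np.linspace(0.5, 1.0, 51)        # candidate confidence reports
payoffs = [expected_win_probability(c, true_p) for c in reports]
best_report = reports[int(np.argmax(payoffs))]
print(f"best report = {best_report:.2f} for true P(correct) = {true_p:.2f}")
```

Under these assumptions the expected payoff is proportional to p(C − 0.5) + (1 − C²)/2, whose derivative with respect to C is p − C, so the maximum sits at C = p. Both over-reporting and under-reporting lower the expected payoff.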

Fig 4.

Modelling results: Fits.

Behavioral results and model fits in Experiments 1 (A) and 2 (B). Top: Learning performance (i.e. percent correct). Middle: Choice rate in the transfer test. Symbols are ranked by expected value (L₇₅: symbol associated with 75% probability of losing 1€; L₂₅: symbol associated with 25% probability of losing 1€; G₂₅: symbol associated with 25% probability of winning 1€; G₇₅: symbol associated with 75% probability of winning 1€). Bottom: Confidence ratings. In all panels, colored dots and error bars represent the actual data (mean ± sem), and filled areas represent the model fits (mean ± sem). Model fits were obtained with the RELATIVE reinforcement-learning model for the learning performance (top) and the choice rate in the transfer test (middle), and with the FULL generalized linear mixed-effect model (glme) for the confidence ratings (bottom). Dark grey diamonds in the Preference panels (middle) indicate the fit from the ABSOLUTE model.

Table 1.

Reinforcement-learning.

Model comparison. AIC, Akaike Information Criterion (computed with nLL_max); BIC, Bayesian Information Criterion (computed with nLL_max); DF, degrees of freedom; nLL_max, negative log likelihood; nLPP_max, negative log of posterior probability; EF, expected frequency of the model given the data; XP, exceedance probability (computed using the Laplace approximation of the model evidence ME). The table summarizes the fitting performance of each model.
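
For reference, the two information criteria can be computed from nLL_max and DF with their textbook definitions. This is a minimal sketch: the argument `n_obs` (the number of observations entering the likelihood, for example the number of trials) and the example values are assumptions for illustration, not numbers from the table.

```python
import math

def aic(nll_max, df):
    """Akaike Information Criterion from the negative log likelihood."""
    return 2.0 * df + 2.0 * nll_max

def bic(nll_max, df, n_obs):
    """Bayesian Information Criterion; n_obs is the number of observations."""
    return df * math.log(n_obs) + 2.0 * nll_max

# Hypothetical values, chosen only to show the call pattern.
example_nll, example_df, example_n = 100.0, 3, 192
print(aic(example_nll, example_df))
print(bic(example_nll, example_df, example_n))
```

Lower values indicate a better trade-off between fit and complexity for both criteria. BIC penalizes each extra parameter more heavily than AIC whenever `n_obs` exceeds e² (about 7.4).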

Table 2.

Reinforcement-learning.

Free parameters. ABSOLUTE, absolute value learning model; RELATIVE, relative value learning model (best-fitting model); LL optimization, parameters obtained when minimizing the negative log likelihood; LPP optimization, parameters obtained when minimizing the negative log of the posterior probability. For each model, the table summarizes the likelihood-maximizing (best) parameters, averaged across subjects. Data are expressed as mean ± sem. The values retrieved from the LPP optimization procedure are those used to generate the variables used in the confidence glme models.

Table 3.

Modelling confidence ratings.

Estimated fixed-effect coefficients from generalized linear mixed-effect models.

Table 4.

Modelling performance and reaction times.

Estimated fixed-effect coefficients from generalized linear mixed-effect models (performance: logistic regression; reaction times: linear regression).

Fig 5.

Modelling results: Lesioning approach.

Three nested models are compared in their ability to reproduce the pattern of interest observed in averaged confidence ratings, in Experiment 1 (A) and Experiment 2 (B). In the FULL model, confidence is modelled as a function of three factors: the absolute difference between option values, the confidence observed in the previous trial, and the context value. In the REDUCED model 1, confidence is modelled as a function of only two factors: the absolute difference between option values and the confidence observed in the previous trial. Hence, the REDUCED model 1 omits the context value as a predictor of confidence. In the REDUCED model 2, confidence is modelled as a function of only two factors: the absolute difference between option values and the context value. Hence, the REDUCED model 2 omits the confidence observed in the previous trial as a predictor of confidence. Left: pattern of confidence ratings observed in the behavioral data. Middle-left: pattern of confidence ratings estimated from the FULL model. Middle-right: pattern of confidence ratings estimated from the REDUCED model 1. Right: pattern of confidence ratings estimated from the REDUCED model 2. Statistics from a repeated-measures ANOVA are reported in red where the alternative model fails to reproduce important statistical properties of confidence observed in the data. Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem.
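
Written as linear predictors, the three nested models differ only in which term is dropped. The block below is a schematic under assumptions: η_t stands for the predictor of the confidence rating on trial t, the β are fixed-effect coefficients, c_{t-1} is the previous-trial confidence, and A and B label the two options in context s. This notation is chosen here for illustration, and random effects and the link function are omitted.

```latex
\begin{align*}
\text{FULL:} \quad
  & \eta_t = \beta_0 + \beta_1 \,\lvert Q_t(s,A) - Q_t(s,B) \rvert
             + \beta_2 \, c_{t-1} + \beta_3 \, V_t(s) \\
\text{REDUCED 1:} \quad
  & \eta_t = \beta_0 + \beta_1 \,\lvert Q_t(s,A) - Q_t(s,B) \rvert
             + \beta_2 \, c_{t-1} \\
\text{REDUCED 2:} \quad
  & \eta_t = \beta_0 + \beta_1 \,\lvert Q_t(s,A) - Q_t(s,B) \rvert
             + \beta_3 \, V_t(s)
\end{align*}
```

Each reduced model is the FULL model with one coefficient fixed at zero (β₃ for REDUCED model 1, β₂ for REDUCED model 2), which is what makes the comparison a lesioning approach.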

Fig 6.

Summary of the modelling results.

The schematic illustrates the computational architecture that best accounts for the choice and confidence data. In each context (or state) ‘s’, the agent tracks option values (Q(s,:)), which are used to decide among alternative courses of action, together with the value of the context (V(s)), which quantifies the average expected value of the decision context. In all contexts, the agent receives an outcome associated with the chosen option (R_c), which is used to update the chosen option value (Q(s,c)) via a prediction error (δ_c) weighted by a learning rate (α_c). In the complete feedback condition, the agent also receives information about the outcome of the unselected option (R_u), which is used to update the unselected option value (Q(s,u)) via a prediction error (δ_u) weighted by a learning rate (α_u). The available feedback information (R_c and R_u in the complete feedback contexts, and Q(s,u) in the partial feedback contexts) is also used to update the value of the context (V(s)), via a prediction error (δ_V) weighted by a specific learning rate (α_V). Option and context values jointly contribute to the generation of confidence judgments.
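
The update scheme described in this caption can be sketched as a small delta-rule learner. This is an illustration under assumptions, not the authors' implementation: the context target is taken to be the average of R_c and R_u in complete feedback contexts and the average of R_c and Q(s,u) in partial feedback contexts, all values start at zero, and the learning rates are arbitrary. The caption does not say whether outcomes are centered on V(s) before the option prediction errors are computed, so that step is left out. The class and parameter names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ContextLearner:
    alpha_c: float = 0.3   # learning rate, chosen option (arbitrary value)
    alpha_u: float = 0.2   # learning rate, unchosen option (arbitrary value)
    alpha_v: float = 0.1   # learning rate, context value (arbitrary value)
    q: dict = field(default_factory=dict)  # state -> [Q(s,0), Q(s,1)]
    v: dict = field(default_factory=dict)  # state -> V(s)

    def update(self, state, chosen, r_c, r_u=None):
        """Apply one trial of feedback; r_u is None under partial feedback."""
        q = self.q.setdefault(state, [0.0, 0.0])
        v = self.v.setdefault(state, 0.0)
        unchosen = 1 - chosen

        # Context update (assumed target): mean of the two outcomes, with
        # Q(s,u) standing in for the missing R_u under partial feedback.
        other = r_u if r_u is not None else q[unchosen]
        delta_v = (r_c + other) / 2.0 - v
        self.v[state] = v + self.alpha_v * delta_v

        # Chosen option update via its prediction error.
        delta_c = r_c - q[chosen]
        q[chosen] += self.alpha_c * delta_c

        # Unchosen option update, complete feedback only.
        if r_u is not None:
            delta_u = r_u - q[unchosen]
            q[unchosen] += self.alpha_u * delta_u
        return delta_c, delta_v

learner = ContextLearner()
learner.update("gain_complete", chosen=0, r_c=1.0, r_u=0.0)
learner.update("loss_partial", chosen=1, r_c=-1.0)
print(learner.q, learner.v)
```

Keeping Q(s,:) and V(s) in separate stores mirrors the schematic: option values feed the choice, and both feed the confidence judgment.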

Table 5.

Assessing the specific role of context values on confidence.

Estimated fixed-effect coefficients from generalized linear mixed-effect models.

Fig 7.

Experiment 3 task schematic, reversal learning and confidence results. (A) Task design and contingencies. (B) Performance. Left and middle-left: trial-by-trial percentage of correct responses in the partial (left) and the complete (middle-left) information conditions. Filled colored areas represent mean ± sem. Middle-right and right: individual averaged performance in the different conditions, before (middle-right) and after (right) the reversal. The orange shaded area highlights the post-reversal behavior. Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem. (C) Confidence. Left and middle-left: trial-by-trial confidence ratings in the partial (left) and the complete (middle-left) information conditions. Filled colored areas represent mean ± sem. Middle-right and right: individual averaged confidence ratings in the different conditions, before (middle-right) and after (right) the reversal. The orange shaded area highlights the post-reversal behavior. Connected dots represent individual data points in the within-subject design. The error bars displayed beside the scatter plots indicate the sample mean ± sem. G_Sta: Gain Stable; L_Sta: Loss Stable; G_Rev: Gain Reversal; L_Rev: Loss Reversal.
