Confidence resets reveal hierarchical adaptive learning in humans

doi:10.1371/journal.pcbi.1006972

Fig 1.

Apparent learning rate modulations in previous designs are not a hallmark of hierarchical processing.

This simulation is inspired by a previous study by Behrens et al. [2], in which the reward probability was not fixed but changed abruptly; the authors used different volatility levels (i.e. different numbers of change points). Similarly, we generated sequences with low volatility (7 change points, see vertical plain black lines) and high volatility (see additional change points, vertical dashed lines). The sequences were binary (absence or presence of reward) and the reward probability was resampled randomly after each change point. We consider two learning models: a hierarchical model, which estimates the reward rate, taking into account the possibility of change points; and a flat model that computes the reward rate near-optimally based on a fixed leaky count of observations and a prior count of 1 for either outcome (see Methods). Each model has a single free parameter (respectively, a priori volatility and leak factor), which we fit to return the best estimate of the actual generative reward probabilities in both the low and high volatility conditions together. Keeping those best-fitting parameters equal across both conditions, we measured the dynamics of the apparent learning rates of the models, defined as the ratio between the current update of the reward estimate (θ_{t+1} - θ_t) and the prediction error leading to this update (y_{t+1} - θ_t). The hierarchical model shows a transient increase in its apparent learning rate whenever a change point occurs, reflecting that it gives more weight to the observations that follow a change point. Such a dynamic adjustment of the apparent learning rate was reported in humans [5]. The flat model shows a qualitatively similar effect, despite the leakiness of its count being fixed. Note that since there are more change points in the higher volatility condition (dashed lines), the average learning rates of both models also increase overall with volatility, as previously reported in humans [2].
The lines show mean values across 1000 simulations; s.e.m. was about the line thickness and therefore omitted.
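
The flat model and the apparent learning rate defined in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the exact form of the leak (here, both outcome counts are multiplied by a constant factor before each new observation, while the prior count of 1 does not decay) is an assumption, since the caption defers those details to the Methods.

```python
import numpy as np

def flat_leaky_estimates(y, leak=0.9, prior_count=1.0):
    """Reward-rate estimates from a leaky count of binary outcomes.

    Assumption: counts decay by `leak` before each observation is added;
    the prior count is added at read-out and does not decay.
    Returns theta with len(y) + 1 entries; theta[t] is the estimate
    after t observations (theta[0] is the prior mean, 0.5).
    """
    n1 = n0 = 0.0                      # leaky counts of outcomes 1 and 0
    theta = [0.5]
    for obs in y:
        n1 = leak * n1 + obs
        n0 = leak * n0 + (1.0 - obs)
        theta.append((n1 + prior_count) / (n1 + n0 + 2.0 * prior_count))
    return np.array(theta)

def apparent_learning_rate(theta, y):
    """Ratio of the update (theta[t+1] - theta[t]) to the prediction
    error (y[t] - theta[t]) that led to it, for each observation."""
    y = np.asarray(y, dtype=float)
    update = theta[1:] - theta[:-1]
    prediction_error = y - theta[:-1]  # never zero: theta stays strictly in (0, 1)
    return update / prediction_error

rng = np.random.default_rng(0)
y = (rng.random(200) < 0.7).astype(float)   # toy sequence, reward probability 0.7
theta = flat_leaky_estimates(y, leak=0.9)
alpha = apparent_learning_rate(theta, y)
```

With no leak (`leak=1.0`), the ratio in this sketch reduces to 1/(t + 3) after t observations, so any trial-to-trial modulation of the apparent learning rate in the leaky case comes from the interaction between the fixed leak and the observed sequence rather than from a tracked change point.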

Fig 2.

Correlation between the hierarchical and flat models in a classic probability learning task is higher for probability estimates than for confidence levels.

We simulated a classic probability learning task, similar to the one by Behrens et al. (2007). In this task, the binary observation made on each trial (e.g. presence or absence of reward) is governed by a probability that changes discontinuously at so-called change points. For the sake of generality, we varied the volatility (probability of a change point) and the step size of those changes (minimum fold change, in odds ratio, affecting the generative probability). For each combination of volatility and step size, we simulated 100 sequences to achieve stable results and we fit the single free parameter of each model (respectively, a priori volatility and leak factor) onto the actual generative probabilities of the observed stimuli in the sequences. The resulting parameterized models therefore return their best possible estimate of the hidden regularities in each volatility-step size condition. We then simulated new sequences (again, 100 per condition) to measure (A) the correlation between the two models' estimated stimulus probabilities, and (B) the correlation (Pearson’s rho) between the confidence (log-precision) that those models entertained in those estimates. The correlations indicate that probability estimates are nearly indistinguishable between the two models, whereas their confidence levels differ more.
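
To make the distinction between a probability estimate and its confidence concrete, the following Python sketch computes both for a Beta posterior. It assumes, for illustration only, that the posterior over the hidden probability is a Beta distribution and that confidence is the negative log of the posterior variance (log-precision, as named in the caption); the function name is hypothetical.

```python
import math

def beta_mean_and_confidence(a, b):
    """Posterior mean and log-precision of a Beta(a, b) distribution.

    Confidence is taken here as -log(variance), i.e. log-precision.
    """
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, -math.log(variance)

# Two posteriors with the same mean but different amounts of evidence.
mean_few, conf_few = beta_mean_and_confidence(2, 2)      # little evidence
mean_many, conf_many = beta_mean_and_confidence(20, 20)  # more evidence
```

Both posteriors give the same estimate (0.5), but the second has a higher log-precision. Two models can therefore agree closely on their probability estimates while differing in confidence, which is the pattern the two panels contrast.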

Fig 3.

Behavioral task: Learning of dynamic transition probabilities with confidence reports.

(A) Probability learning task. Human subjects (N = 23) were presented with random sequences of two stimuli, A and B. The stimuli were, in distinct blocks, either auditory or visual, and they were perceived without ambiguity. At each trial, one of the two stimuli was sampled according to a probability that depended on the identity of the previous stimulus: p(A_t|A_{t-1}) and p(B_t|B_{t-1}). These transition probabilities underwent occasional, abrupt changes (change points). A change point could occur at any trial with a probability that was fixed throughout the experiment. Subjects were instructed about this generative process and had to continuously estimate the (changing) transition probabilities given the observations received. Occasionally (see black dots in A), we probed their inferences by asking them, first, to report the probability of the next stimulus (i.e. report their estimate of the relevant transition probability) and, second, to rate their confidence in this probability estimate. (B, C) Subjects’ responses were compared to the optimal Bayesian inference for this task. Numeric values of confidence differ between subjects and models since they are on different scales (from 0 to 1 in the former, in log-precision units in the latter). For illustration, the optimal values were binned; the dashed line (B) denotes the identity, the plain line (C) is a linear fit, and data points correspond to subjects’ mean ± s.e.m.
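
The generative process described in panel A can be sketched as follows. This is a minimal Python illustration, not the authors' stimulus code: the function name is hypothetical, the change-point probability used in the example is an arbitrary value, and two details are assumptions: new transition probabilities are drawn uniformly from [0, 1], and both transition probabilities are resampled together at each change point.

```python
import numpy as np

def generate_sequence(n_trials, p_change, rng):
    """Generate a binary sequence (0 = A, 1 = B) whose repetition
    probabilities p(A|A) and p(B|B) jump at random change points.

    Assumptions (not from the caption): uniform resampling, and both
    transition probabilities change at the same change point.
    """
    p_repeat = rng.uniform(size=2)       # [p(A|A), p(B|B)]
    sequence = np.empty(n_trials, dtype=int)
    change_points = []
    previous = int(rng.integers(2))      # arbitrary stimulus preceding trial 0
    for t in range(n_trials):
        # A change point can occur at any trial with fixed probability.
        if t > 0 and rng.random() < p_change:
            p_repeat = rng.uniform(size=2)
            change_points.append(t)
        # Repeat the previous stimulus with its own repetition probability.
        repeat = rng.random() < p_repeat[previous]
        sequence[t] = previous if repeat else 1 - previous
        previous = int(sequence[t])
    return sequence, change_points

rng = np.random.default_rng(1)
sequence, change_points = generate_sequence(300, p_change=0.01, rng=rng)
```

Because each stimulus depends only on the previous one, an observer has to track two quantities, p(A|A) and p(B|B), and each observation informs only the transition that starts from the preceding stimulus.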

Fig 4.

A qualitative signature of hierarchical learning in confidence reports.

(A) Divergent predictions of hierarchical versus flat learning models. Two fragments of sequences are shown in which one stimulus (‘A’) is consecutively repeated 10 times. In the upper fragment, this streak of repetitions is highly unlikely (or ‘suspicious’) given the context, and may indicate that the underlying statistics changed. By contrast, in the lower fragment, the same streak is not unlikely, and does not suggest a change point. The heat maps show the posterior probability distribution of P(B|B), i.e. the probability of a repetition of the other stimulus (B), estimated by the hierarchical and flat models. In a hierarchical model, unlikely streaks arouse the suspicion of a global change in statistics, causing the model to become uncertain about its estimates of both transition probabilities, despite having acquired no direct evidence on P(B|B). In a flat model, by contrast, a suspicious streak of As will not similarly decrease the confidence in P(B|B), because a flat model does not track global change points. To test for this effect, pre/post questions (indicated by a star) were placed immediately before and after selected streaks, to obtain subjective estimates of the transition probability corresponding to the stimulus not observed during the streak. Streaks were categorized as suspicious if they aroused the suspicion of a change point from the hierarchical, Bayes-optimal viewpoint. Note that the flat model also shows a decrease in confidence, because it progressively forgets its estimates of P(B|B) during a streak of As, but it shows no difference between suspicious and non-suspicious streaks. (B) For the sequences presented to subjects, the change in confidence (post-streak minus pre-streak) was significantly modulated by streak type in the hierarchical model, but not in a flat model. (C) Subjects’ confidence showed an effect of streak type predicted by the optimal hierarchical model.
As in Fig 3C, confidence values in subjects and models are on different scales. Error bars correspond to the inter-subject quartiles; distributions show subjects' data; significance levels correspond to paired t-tests with p < 0.005 (**) and p < 10^-12 (***).

Fig 5.

Control experiment: Subjects take into account the higher-order structure of the dynamics.

In the control experiment, change points were uncoupled between the two transition probabilities, thereby abolishing the possibility of inferring a change in one transition probability by observing only the other transition type. (A) Theoretical predictions for changes in confidence around the target streaks. The optimal hierarchical model for the main task assumes that change points are coupled (“hierarchical model, coupled changes”), which is no longer optimal when change points are uncoupled. This model was nevertheless used to identify suspicious and non-suspicious streaks, and indeed it showed an effect of streak type on the change in confidence in the control task, as in the main task (Fig 4B). The optimal hierarchical Bayesian model for this control experiment is similar to this first model; the only difference is that it assumes that change points are uncoupled (“hierarchical model, uncoupled changes”). As expected, this model showed no effect of streak type on the change in confidence. The flat model, by definition, ignores change points and therefore whether they are coupled or uncoupled; as a result, it shows no effect of streak type (as in the main experiment). (B) Subjects showed no difference between streak types, like the hierarchical model for uncoupled changes. The results of the main task are reproduced from Fig 4C to facilitate visual comparison. (C) Subjects overall performed well in the control task, showing tight agreement with the optimal hierarchical model for uncoupled changes (the optimal model for this task) for both predictions (left) and confidence (right). In panels A and B, the error bars correspond to the inter-subject quartiles and distributions show subjects' data. In panel C, data points are mean ± s.e.m. across subjects. In all panels, significance levels correspond to p = 0.048 (*), p < 0.01 (**), and p < 0.001 (***) in a two-tailed t-test.
