Locus Coeruleus tracking of prediction errors optimises cognitive flexibility: An Active Inference model
Fig 8
Reversal learning during the go/no-go task.
(a)–(c) show the performance of an agent whose model decay value is set by state-action prediction error during a reversal of cues in the go/no-go task. The agent begins with a well-trained understanding (via 750 trials of training) that cue 2 indicates that a reward is available. At trial 35 (t = 70) the cue/context relationship is reversed, and the agent must now learn that cue 1 indicates the ‘Go’ context. This initially causes numerous unsuccessful trials, which violate the learnt model and produce high prediction errors (a). Note that prediction errors are initially elevated at both timepoints in each trial because both the previously rare cue and the subsequent lack of reward are unexpected. These prediction errors lower the parameter decay factor (b), which in turn flattens the agent’s priors, causing more variability in behaviour. Eventually the agent learns the new contingencies and the model stabilises, with the re-emergence of phasic bursts of LC activity on ‘Go’ trials (a, c). From trial 125 onwards, the peak of phasic activity begins to shift towards the presentation of the cue rather than the reward. Plot (d) is a graphical representation of behaviour during the task at times t = 2 and t = 3 for each trial, in which the position of the coloured block describes the agent’s location and the colour shows the agent’s observation after moving. (e) shows performance over 50 repeats of the reversal learning task shown in (a), for agents with a fixed or flexible value of α. All agents begin with a near-optimal d’ value (measured over bins of 20 trials). However, only the agent with α determined by the state-action prediction error is able to return to optimal levels of performance within the 300 trials shown.
(f) and (g) show characteristics of the mean prediction error response to ‘go’ and ‘no-cue’ cues during the static (non-reversed) task as reward and probability parameters are varied, for agents with a flexible value of α. (f) ***P<0.0001; one-way ANOVA between different c values, followed by Tukey post hoc test. (g) **P<0.001, ***P<0.0001; two-tailed Student’s t-test between go/no-go contexts for fixed cue probabilities, and one-way ANOVA followed by Tukey post hoc test for ‘go’ peaks with different cue probabilities.
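
For readers unfamiliar with the d’ measure reported in panel (e), the following is a minimal sketch of how a sensitivity index could be computed over bins of 20 trials. It uses the standard signal-detection definition, d’ = Z(hit rate) − Z(false-alarm rate). The function names, the trial encoding as `(is_go_context, agent_went)` pairs, and the log-linear correction for rates of 0 or 1 are assumptions made for illustration; the caption does not specify how the original analysis handled these details.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = Z(hit rate) - Z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count, 1 to each
    total) keeps the rates away from 0 and 1, where Z would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


def binned_d_prime(trials, bin_size=20):
    """Compute d' over consecutive, non-overlapping bins of trials.

    `trials` is a hypothetical encoding: a list of
    (is_go_context, agent_went) boolean pairs, one per trial.
    Any incomplete final bin is dropped.
    """
    values = []
    for start in range(0, len(trials) - bin_size + 1, bin_size):
        window = trials[start:start + bin_size]
        hits = sum(1 for go, went in window if go and went)
        misses = sum(1 for go, went in window if go and not went)
        fas = sum(1 for go, went in window if not go and went)
        crs = sum(1 for go, went in window if not go and not went)
        values.append(d_prime(hits, misses, fas, crs))
    return values
```

Under this sketch, an agent that responds at chance yields d’ near 0 in a bin, while an agent that always goes in the ‘Go’ context and never otherwise yields the maximum value the correction allows for that bin size.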