Meta-Reinforcement Learning reconciles surprise, value, and control in the anterior cingulate cortex
Fig 3
Speeded decision-making task simulation.
a: Task layout. The RML was presented with two possible choices (fractal images). Each image was assigned a hidden reward value ranging from 2 to 7 (bottom fractals list). The agent’s task was to choose the image with the higher reward as quickly as possible. We tested the RML on 36 different combinations of reward values for the two images. After the RML made its choice, it received the reward associated with the selected image. b: fMRI results from Vassena et al. [32] (adapted). The black line shows dACC activity; the grey dashed line shows the best-fitting quartic function for these data. c: dACC activity as simulated by the RML (black line) and the best-fitting quartic function (blue dashed line). This activity is the sum of the value component (panel d) and the cognitive control signal (panel e). d: Value component of the RML activity. e: Cognitive control signal (RML boost) as a function of the value difference between the two stimuli. f: RML surprise-related activity. g: Activation clusters within the MPFC (based on data extracted from the figures in Vassena et al. [32]). Blue: dACC activation as a mixture of value and cognitive control (RML prediction in panel c). Red: vMPFC value-based activation (RML prediction in panel d). Green: mid-cingulate activation relative to average surprise (RML prediction in panel f).
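The task design in panel a and the quartic fits in panels b and c can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' simulation code. It assumes that the 36 combinations are the ordered pairings of the six reward values (6 × 6), which is consistent with the caption but not stated in it. The names `REWARD_VALUES`, `pairs`, `value_difference`, and `fit_quartic` are hypothetical, and no RML or fMRI data are reproduced here.

```python
from itertools import product

import numpy as np

# Hidden reward values 2..7, as stated in the caption.
REWARD_VALUES = range(2, 8)

# Assumption: the 36 combinations are all ordered (left, right) pairings, 6 x 6.
pairs = list(product(REWARD_VALUES, repeat=2))


def value_difference(pair):
    """Absolute difference between the two stimulus values (x-axis of panel e)."""
    left, right = pair
    return abs(left - right)


def fit_quartic(x, y):
    """Least-squares fit of a degree-4 polynomial.

    Returns the five coefficients, highest power first.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.polyfit(x, y, deg=4)
```

Given a per-condition activity vector from a simulation, `fit_quartic` would return the coefficients of the dashed curve, and `np.polyval` can then evaluate that curve for plotting.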