Meta-Reinforcement Learning reconciles surprise, value, and control in the anterior cingulate cortex

doi:10.1371/journal.pcbi.1013025

Fig 4

Verbal WM task simulation.

a. Setup of the verbal WM task. On each trial, the RML was presented with 1, 4, 6, or 8 words, yielding four difficulty levels. After a 10 s delay, the model was shown a target word that matched one of the memorized words in 50% of trials. The model’s goal was to indicate whether the target word matched one of the previously presented words. b. fMRI results from Engstrom et al. [31] showing dACC activity (red cluster in subpanel) as a function of WM load (difficulty levels 1-4); the dashed line shows the linear fit. c. Boost signal from the RML as a function of WM load; the dashed line shows the linear fit. d-e. RML surprise and expected value (average of MPFC_Act and MPFC_Boost) as a function of WM load. Dashed lines show linear fits.
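The trial structure described in panel a can be sketched as a small schedule generator. This is a minimal illustration, not the authors' simulation code: the names `SET_SIZES`, `VOCABULARY`, `make_trial`, and `make_block` are hypothetical, the word list is a placeholder, and the 10 s delay and the RML itself are not modeled. The sketch assumes that the 50% match rate is balanced within each difficulty level.

```python
import random

# Words per trial, taken from the caption (four difficulty levels).
SET_SIZES = (1, 4, 6, 8)

# Placeholder vocabulary (assumption); must exceed the largest set size.
VOCABULARY = [f"word{i}" for i in range(20)]


def make_trial(set_size, is_match, rng):
    """Build one trial: a memory set, a target word, and the correct answer."""
    memory_set = rng.sample(VOCABULARY, set_size)
    if is_match:
        # Target is drawn from the memorized words.
        target = rng.choice(memory_set)
    else:
        # Target is drawn from words that were not presented.
        unseen = [w for w in VOCABULARY if w not in memory_set]
        target = rng.choice(unseen)
    return memory_set, target, is_match


def make_block(n_per_level, seed=0):
    """Build a shuffled block with exactly 50% match trials per level."""
    if n_per_level % 2 != 0:
        raise ValueError("n_per_level must be even to balance match trials")
    rng = random.Random(seed)
    trials = []
    for set_size in SET_SIZES:
        labels = [True] * (n_per_level // 2) + [False] * (n_per_level // 2)
        for is_match in labels:
            trials.append(make_trial(set_size, is_match, rng))
    rng.shuffle(trials)
    return trials
```

Calling `make_block(10)` would yield 40 trials, 10 per difficulty level, each a tuple of the memorized words, the target word, and whether the target is a match.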

doi: https://doi.org/10.1371/journal.pcbi.1013025.g004