Meta-Reinforcement Learning reconciles surprise, value, and control in the anterior cingulate cortex
Fig 2
Interactions between the RML and external modules in different tasks.
a: RML-DDM interaction during the speeded decision-making task and the foraging task. The RML received input from the environment (rewards and environmental states) and controlled a task-specific module (the DDM) that supported task execution. δv: difference in expected value between the two options (two different fractals in the speeded decision-making task, “forage” or “engage” in the foraging task). The absolute value of δv determined the drift rate, and its sign determined the drift direction (up or down). The LC output modulated the decision boundaries, thereby influencing decision time. In the foraging task only, the LC output additionally influenced the bias of the DDM towards the “engage” option (reflecting the human propensity for “engaging” [17,35]). This bias was set to 0 for all trials in the speeded decision-making task. b: RML-cRNN interaction during the verbal WM task. The LC output modulated the gain of the neural units in the articulatory process layer, improving word retention in WM.
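The RML-DDM coupling in panel a can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `simulate_ddm`, the parameters `base_boundary`, `boundary_gain`, `dt`, and `noise_sd`, the direction in which the LC output changes the boundary, and the treatment of the bias as a shift of the starting point are all assumptions made for illustration. Only the roles of δv (magnitude sets the drift rate, sign sets the direction), of the LC output (modulates the boundaries), and of the bias (0 in the speeded decision-making task) come from the caption.

```python
import math
import random


def simulate_ddm(delta_v, lc_output, bias=0.0, base_boundary=1.0,
                 boundary_gain=0.5, dt=0.01, noise_sd=1.0,
                 max_steps=10_000, rng=None):
    """Simulate one trial of a simple drift-diffusion model.

    delta_v   : difference in expected value between the two options;
                |delta_v| is the drift rate, its sign the drift direction.
    lc_output : scalar control signal that modulates the decision boundary.
    bias      : starting-point shift as a fraction of the boundary
                (assumption; 0 means no bias).
    Returns (choice, decision_time), where choice is "upper", "lower",
    or "none" if no boundary was reached within max_steps.
    """
    rng = rng or random.Random()

    # Assumption: a larger LC output narrows the boundaries, which
    # shortens decisions. The caption only states that the LC output
    # modulates the boundaries, not in which direction.
    boundary = base_boundary / (1.0 + boundary_gain * lc_output)

    # Assumption: the bias shifts the starting point towards the upper
    # boundary (standing in for the "engage" option).
    x = bias * boundary

    for step in range(1, max_steps + 1):
        # Euler-Maruyama update: deterministic drift plus Gaussian noise.
        x += delta_v * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x >= boundary:
            return "upper", step * dt
        if x <= -boundary:
            return "lower", step * dt

    return "none", max_steps * dt


if __name__ == "__main__":
    rng = random.Random(0)
    # Speeded decision-making: bias fixed at 0.
    print(simulate_ddm(delta_v=0.8, lc_output=1.0, bias=0.0, rng=rng))
    # Foraging: a non-zero bias favours the upper ("engage") boundary.
    print(simulate_ddm(delta_v=0.8, lc_output=1.0, bias=0.3, rng=rng))
```

Setting `noise_sd=0` makes the trajectory deterministic, which is convenient for checking that the sign of `delta_v` selects the boundary and that a larger `lc_output` or a positive `bias` shortens the decision time under these assumptions.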