Surprise-minimization as a solution to the structural credit assignment problem

doi:10.1371/journal.pcbi.1012175

Fig 1

A. Schematic decision-outcome representations in two variants of the bandit task. In the single bandit task, one decision (d1) is followed by one outcome (o1). In the multiple-bandits task, two decisions (d1, d2) are followed by two outcomes (o1, o2). White circles represent the different states of the task. Gray and colored boxes indicate the true causal structure, called the decision-outcome mapping. Colored arrows indicate the correct or incorrect policy, where correctness refers to the match between the causal structure and an agent’s representation. Outcomes are considered relevant if they belong to the correct representation and irrelevant if they belong to the incorrect representation. B. Graphical representation of the multiple-bandits task, as implemented in the study. P and Q define the outcome value associated with each action. Over the course of a block, these values are subject to independent Gaussian random walks, as depicted in the boxes below. C and D. Trial representation of the multiple-bandits task in Experiments 1 and 2. E and F. Trial representation of the transfer task in Experiments 1 and 2. For the transfer task, stimuli from the different decisions were mixed and participants were instructed to always choose the stimuli associated with a specific color (e.g., blue was associated with the star and the circle).

doi: https://doi.org/10.1371/journal.pcbi.1012175.g001
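
The drifting outcome values described in panel B can be illustrated with a minimal sketch. It assumes that each value starts at a fixed level and changes by zero-mean Gaussian noise on every trial. The function names, the number of trials, the starting values, and the step standard deviation are all hypothetical choices for illustration; the caption does not specify them, and this is not the study's implementation.

```python
import random


def gaussian_random_walk(n_trials, start, step_sd, rng):
    """Return n_trials values; each step adds zero-mean Gaussian noise."""
    values = [start]
    for _ in range(n_trials - 1):
        values.append(values[-1] + rng.gauss(0.0, step_sd))
    return values


def simulate_block(n_trials=50, start_p=50.0, start_q=50.0, step_sd=2.0, seed=0):
    """Simulate one block with two independently drifting outcome values.

    All default parameter values are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    # P and Q receive separate noise draws, so their increments are independent.
    p = gaussian_random_walk(n_trials, start_p, step_sd, rng)
    q = gaussian_random_walk(n_trials, start_q, step_sd, rng)
    return p, q


if __name__ == "__main__":
    p_values, q_values = simulate_block()
    for trial, (p, q) in enumerate(zip(p_values[:5], q_values[:5]), start=1):
        print(f"trial {trial}: P = {p:.2f}, Q = {q:.2f}")
```

Because the two walks receive separate noise draws, knowing the trajectory of P gives no information about Q, which is the sense in which the walks are independent in this sketch.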