Surprise-minimization as a solution to the structural credit assignment problem

doi:10.1371/journal.pcbi.1012175

Fig 3

Simulated data for the validation of the surprise minimization model.

A. Choice behavior when action selection is fully driven by the first-level policy representing the correct mapping between decisions and outcomes. B. Choice behavior when action selection is fully driven by the lower-level loop representing the incorrect mapping. C. Choice behavior when action selection is arbitrated between policies by the inference process of the surprise minimization model. D. Distribution of surprise signals calculated as the absolute prediction errors for both the correct and the incorrect policy. E. Illustration of the evidence accumulation process. Surprise is calculated for both the correct (green) and incorrect (red) mapping (top panel). The evidence signal is calculated as the difference between these two surprise signals (middle panel). Accumulation of evidence and development of the arbitration weight (logit^-1(ω)) over the course of a block (bottom panel). Starting in a state of uncertainty (0.5), the inference process gradually and robustly establishes the correct arbitration weight, leading to increasingly optimal credit assignment.
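The accumulation process described for panel E can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `gain` parameter, the sign convention (evidence is positive when the incorrect mapping is the more surprising one), and the simulated surprise values are all hypothetical. Only the overall structure (absolute prediction errors as surprise, their difference as evidence, a running sum ω passed through the inverse logit, and a starting weight of 0.5) comes from the caption.

```python
import math
import random


def inverse_logit(x):
    """Logistic function: maps the accumulated evidence omega to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def accumulate_evidence(surprise_correct, surprise_incorrect, gain=1.0):
    """Return the arbitration weight before the first trial and after each trial.

    surprise_correct / surprise_incorrect: per-trial absolute prediction errors
    under the correct and the incorrect mapping.
    gain: hypothetical scaling of the evidence signal (an assumption).
    """
    omega = 0.0  # omega = 0 corresponds to a weight of 0.5 (uncertainty)
    weights = [inverse_logit(omega)]
    for s_correct, s_incorrect in zip(surprise_correct, surprise_incorrect):
        # Assumed sign convention: evidence favors the correct mapping
        # when the incorrect mapping produces the larger surprise.
        evidence = s_incorrect - s_correct
        omega += gain * evidence
        weights.append(inverse_logit(omega))
    return weights


# Simulated surprise signals for one block (illustrative values only).
rng = random.Random(0)
n_trials = 40
surprise_correct = [abs(rng.gauss(0.0, 0.3)) for _ in range(n_trials)]
surprise_incorrect = [abs(rng.gauss(0.6, 0.3)) for _ in range(n_trials)]

weights = accumulate_evidence(surprise_correct, surprise_incorrect)
print(f"start weight: {weights[0]:.2f}, final weight: {weights[-1]:.2f}")
```

Because single-trial evidence can be negative, the weight may dip on individual trials; it is the running sum that lets a consistent difference in surprise dominate trial-level noise over the block.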

doi: https://doi.org/10.1371/journal.pcbi.1012175.g003