Information uncertainty influences learning strategy from sequentially delayed rewards

doi:10.1371/journal.pcbi.1013879

Fig 2

Graphical representation of task conditions, learning models, and value functions.

A. Differences in feedback presentation based on the participant’s condition and the outcome used to generate the prediction error for immediate feedback. B. Example sequence of two-alternative forced choice trials and the participant’s selection (darker arrow) of object 2 (Obj 2) in the current trial (red), the previous trial (T-1), and trial-minus-two (T-2). The colors correspond to the tabular model, which updates the immediate choice (+1) and the trial-minus-two choice (+4), each generating its own prediction error to update the value function. C. Temporal sequence of credit assignment (shown as a blue heatmap). The tabular model skips the previous state (S2) and assigns it no credit. The ellipsis signifies that credit assignment can extend beyond the three depicted states. Note that the extent to which past states are assigned credit in each model depends on the free parameter lambda: for the eligibility trace model, higher lambda values mean that credit extends further back in time (less decay), whereas for the tabular model, higher lambda values mean less discounting of the trial-minus-two state. D-E. Value functions for the tabular model (D), which involves separate, independent updates for the immediate and delayed chosen options, and for the eligibility trace model (E), which uses a single prediction error for all updates. S: State, Obj: Object, I: Immediate, D: Delay.
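The contrast between panels D and E can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the learning rate `alpha`, the simple delta-rule form, and the way `lam` scales the delayed update are all assumptions chosen to match the caption's qualitative description (two independent prediction errors for the tabular model, one shared prediction error weighted by decaying traces for the eligibility trace model).

```python
def tabular_update(v_immediate, v_delayed, chosen_now, chosen_t2,
                   r_immediate, r_delayed, alpha, lam):
    """Hypothetical tabular rule: two separate, independent updates."""
    # Prediction error for the option chosen on the current trial.
    delta_i = r_immediate - v_immediate[chosen_now]
    v_immediate[chosen_now] += alpha * delta_i
    # Separate prediction error for the option chosen two trials back.
    # Assumption: lam scales this update, so higher lam = less discounting.
    delta_d = r_delayed - v_delayed[chosen_t2]
    v_delayed[chosen_t2] += alpha * lam * delta_d
    return delta_i, delta_d


def eligibility_update(values, traces, chosen, reward, alpha, lam):
    """Hypothetical eligibility-trace rule: one shared prediction error."""
    # Decay every existing trace, then mark the current choice fully eligible.
    for key in traces:
        traces[key] *= lam
    traces[chosen] = 1.0
    # A single prediction error is spread over all states by their traces.
    delta = reward - values[chosen]
    for key, trace in traces.items():
        values[key] += alpha * trace * delta
    return delta
```

Under these assumptions, calling `eligibility_update` on states S1, S2, S3 in sequence gives the most recent state the largest share of credit, with earlier states receiving shares that shrink by a factor of `lam` per step, whereas `tabular_update` changes only the current choice and the trial-minus-two choice and leaves the intermediate state untouched.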

doi: https://doi.org/10.1371/journal.pcbi.1013879.g002