Emergence of belief-like representations through reinforcement learning
Fig 2
Observations, model representations, value estimates, and reward prediction errors (RPEs) during Task 2.
A. State transitions and observation probabilities in Task 2. Each macro-state (ISI or ITI) is composed of micro-states denoting elapsed time; this allows for probabilistic reward times and minimum dwell times in the ISI and ITI, respectively. B. Observations emitted by Task 2 during two example trials. Note that omission trials are indicated only implicitly as the absence of a reward observation. C. Example representations (b_t, z_t) and value estimates of two models (Belief model, left; Value RNN, right) for estimating value in partially observable environments, after training. D. After training, both models exhibit similar RPEs.
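As a rough illustration of panels A, B, and D, the sketch below generates observations from a task with this kind of structure and computes temporal-difference RPEs from a given sequence of value estimates. It is not the authors' implementation: the function names (`simulate_trial`, `td_rpes`) and every parameter value (ISI durations, minimum ITI dwell, ITI exit hazard, omission probability, discount factor) are assumptions chosen only for illustration.

```python
import random


def simulate_trial(rng, isi_choices=(3, 4, 5), iti_min=4,
                   iti_hazard=0.25, p_omission=0.1):
    """Return one trial as a list of (cue, reward) observations.

    All default values are hypothetical, not taken from the paper.
    """
    obs = [(1, 0)]  # cue onset marks entry into the ISI
    rewarded = rng.random() >= p_omission
    isi = rng.choice(isi_choices)  # probabilistic reward time
    # ISI micro-states: each step encodes elapsed time since the cue.
    obs += [(0, 0)] * (isi - 1)
    # An omission trial emits no reward observation (implicit omission).
    obs.append((0, 1 if rewarded else 0))
    # ITI micro-states: a fixed minimum dwell, then a constant exit hazard.
    obs += [(0, 0)] * iti_min
    while rng.random() >= iti_hazard:
        obs.append((0, 0))
    return obs


def td_rpes(rewards, values, gamma=0.9):
    """Temporal-difference errors for aligned reward and value sequences.

    Convention assumed here: delta[t] = r[t+1] + gamma * V[t+1] - V[t].
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    return [rewards[t + 1] + gamma * values[t + 1] - values[t]
            for t in range(len(values) - 1)]
```

In this sketch, either model's value estimates could be passed as `values`; the RPE depends on the representation only through those estimates, which is why two models with similar value estimates would yield similar RPEs.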