Emergence of belief-like representations through reinforcement learning

doi:10.1371/journal.pcbi.1011067

Fig 2

Observations, model representations, value estimates, and reward prediction errors (RPEs) during Task 2.

A. State transitions and observation probabilities in Task 2. Each macro-state (ISI or ITI) is composed of micro-states denoting elapsed time; this allows for probabilistic reward times and minimum dwell times in the ISI and ITI, respectively. B. Observations emitted by Task 2 during two example trials. Note that omission trials are indicated only implicitly as the absence of a reward observation. C. Example representations (b_t, z_t) and value estimates of two models (Belief model, left; Value RNN, right) for estimating value in partially observable environments, after training. D. After training, both models exhibit similar RPEs.

doi: https://doi.org/10.1371/journal.pcbi.1011067.g002
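
The RPEs in panel D can be illustrated with a standard temporal-difference error computed from a sequence of value estimates. The sketch below is not the authors' code: the function name `td_rpes`, the discount factor `gamma`, the convention that the value after the final time step is zero, and the example numbers are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def td_rpes(rewards: Sequence[float], values: Sequence[float],
            gamma: float = 0.9) -> List[float]:
    """Temporal-difference RPEs: delta_t = r_t + gamma * V_{t+1} - V_t.

    Assumptions (illustrative, not taken from the paper): rewards[t] is the
    reward observed at step t, values[t] is a model's value estimate at
    step t, and the value after the last step is treated as 0.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    rpes = []
    for t in range(len(values)):
        # Bootstrap from the next value estimate; use 0 past the end.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        rpes.append(rewards[t] + gamma * next_value - values[t])
    return rpes


# Hypothetical three-step trial with rising value estimates.
example_values = [0.0, 0.5, 1.0]
rewarded = td_rpes([0.0, 0.0, 1.0], example_values, gamma=0.5)
omitted = td_rpes([0.0, 0.0, 0.0], example_values, gamma=0.5)
```

In the omitted case no reward observation arrives, so the final RPE is negative, which matches the caption's point that omissions are signalled only by the absence of reward.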