A flexible and generalizable model of online latent-state learning
Fig 1
A) A learning agent’s world view, in which rewards are generated according to cues, a latent state, and a latent error. To predict rewards, the agent must infer which latent state is active, the relationship between cues and rewards under each latent state, and the expected uncertainty in rewards due to the latent error. B) The proposed model of how a learning agent inverts this world view. The agent first observes cues and generates expectations, or predictions, for rewards based on L estimates of associative strengths, one per latent state. Upon observing rewards, the agent uses its prediction errors to update the associative strengths, measures of uncertainty, and beliefs about which state is active. The degree to which each set of associative strengths can be updated depends on both the agent’s belief in the corresponding latent state and the corresponding effort matrix, which keeps track of how cues covary.
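The update cycle described in panel B can be sketched in code. This is a minimal illustration, not the paper’s actual model: the delta-rule form, the Gaussian likelihood used for the belief update, and the learning rate `eta` are all assumptions introduced here, and the effort matrix is omitted for brevity.

```python
import numpy as np

def latent_state_step(c, r, W, beliefs, sigma2, eta=0.1):
    """One hedged sketch of a latent-state learning update.

    c       : cue vector, shape (d,)
    r       : observed reward (scalar)
    W       : associative strengths, shape (L, d), one row per latent state
    beliefs : belief that each latent state is active, shape (L,), sums to 1
    sigma2  : expected reward variance under each state, shape (L,)
    eta     : learning rate (illustrative choice, not from the source)
    """
    preds = W @ c                 # reward prediction under each latent state
    errors = r - preds            # prediction error per latent state

    # Belief update: states whose predictions fit the observed reward gain
    # belief, via the Gaussian likelihood of the error under each state's
    # expected uncertainty (an assumed functional form).
    lik = np.exp(-errors**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    beliefs = beliefs * lik
    beliefs = beliefs / beliefs.sum()

    # Associative strengths: a delta-rule update scaled by the belief in
    # each state, so only plausible states learn strongly.
    W = W + eta * beliefs[:, None] * errors[:, None] * c[None, :]

    # Uncertainty: a belief-weighted running estimate of the squared error.
    sigma2 = sigma2 + eta * beliefs * (errors**2 - sigma2)

    return W, beliefs, sigma2
```

Iterating this step over a trial sequence lets the belief vector shift toward whichever latent state best explains recent rewards, while that state’s associative strengths absorb most of the learning.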