Accounting for sensitivity of latent learning to behavioral statistics with successor representations
Fig 11
Latent learning observed with different RL exploration strategies.
Evaluation of the softmax policy for exploration in the A) gridworld and B) Tolman maze (average over 30 simulations, error bars are S.E.M.). Latent learning agents consistently learn faster during the learning phase than direct learning agents. The subtle performance differences between targeted, continuous, and mistargeted pre-exposures persist even when the exploration policy changes. C) Restricting the agents' movement in the Tolman maze, by introducing doors that are passable in only one direction during the pre-exposure and learning phases, significantly influences the performance of the DSR (left) and Dyna-DQN (right) agents (average over 30 simulations, with a maximum of 500 steps per trial, error bars are S.E.M.). Performance improves when doors are active for the DSR agents, and even more so for the Dyna-DQN agent. Legend labels for agents with pre-exposure indicate the door conditions during the pre-exposure and learning phases, e.g., “No doors/Doors” indicates that doors were not active during pre-exposure but were active during learning.