Accounting for sensitivity of latent learning to behavioral statistics with successor representations

doi:10.1371/journal.pcbi.1014131

Fig 1

Simulation environments.

A) Spatial setup of the environments. The gridworld is an arena used with both one-hot encoding (top-left) and image-based (bottom-left) representations. The Tolman maze [3] is either a 72-state environment with one-hot encoding representations (top-right) or a 30-state environment with image representations (bottom-right). In the Tolman maze, virtual doors (dotted lines), if activated, restrict the agent’s movement to a single direction (along the arrow). The starting and goal states are indicated by s and g, respectively. The states marked with i are the additional starting locations used during continuous pre-exposure, while m is the incorrect goal location used in mistargeted pre-exposure. These states occupy the same locations in the image-based environments as shown for the one-hot encoding setting. B) In image-based learning, a pre-trained VGG16 network [21] encodes an RGB image into a state feature, which then serves as the input to the deep SR networks. C) Cosine similarity between each pair of state features in the gridworld. Features of nearby locations are highly similar, whereas features of distant locations diverge. In this and comparable plots, the states are stacked into a 100-dimensional vector to facilitate plotting the similarity between all pairs of states.

doi: https://doi.org/10.1371/journal.pcbi.1014131.g001
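
The pairwise comparison plotted in panel C can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the Gaussian "bump" features are an assumed toy stand-in for the VGG16 state features, chosen only because they vary smoothly with position.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_matrix(features):
    """Cosine similarity between every pair of state features."""
    return [[cosine_similarity(u, v) for v in features] for u in features]


def one_hot(index, n_states):
    """One-hot feature for a state: 1 at its own index, 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(n_states)]


def bump(center, n_positions, width=1.0):
    """Assumed toy feature: a Gaussian bump centred on the state's position."""
    return [
        math.exp(-((i - center) ** 2) / (2.0 * width ** 2))
        for i in range(n_positions)
    ]


# One-hot features: every state is orthogonal to every other state.
one_hot_sim = similarity_matrix([one_hot(i, 4) for i in range(4)])

# Smooth toy features on a 1D track: similarity falls off with distance.
smooth_sim = similarity_matrix([bump(c, 6) for c in range(6)])
```

With one-hot features the matrix is the identity, so no state resembles any other. With the smooth toy features, neighbouring states score higher than distant ones, which is the qualitative pattern the caption describes for the image-based features.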