Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning
Figure 2
Qualitative comparison of learned value functions using tabular, Euclidean grid cell, and Euclidean place cell bases.
In each of panels A–C, the column titles indicate the representation used to learn the value functions for a given gridworld configuration, and each row corresponds to an environment. White lines are walls, discrete squares indicate states, and the grayscale runs from dark (low value) to light (high value). To ease comparison between spatial representations within a given gridworld, image brightness was normalized with respect to the optimal value function. (A) Snapshot of the value representation after 15 learning trials. (B) Snapshot of the value representation after 25 learning trials. (C) Snapshot of the value representation after 50 learning trials. Notice that for both grid cells and place cells the value representation bleeds across walls: red arrows mark locations where the estimated value is too low (relative to ground truth) on the side of a wall nearer a reward, or too high on the far side.
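The bleed arises because Euclidean place-cell (and grid-cell) features are functions of straight-line distance, so states on opposite sides of a wall share basis functions even when the walking path between them is long. Below is a minimal sketch of this effect, not the authors' implementation: it uses a hypothetical 7×7 gridworld with a single wall and gap, Gaussian place-cell features, and linear TD(0) under a random-walk policy, all chosen for illustration rather than to reproduce the paper's setup.

```python
# Sketch (hypothetical layout and parameters): Euclidean place-cell features let
# learned value leak across a wall, because the features ignore the wall entirely.
import numpy as np

rng = np.random.default_rng(0)

H, W = 7, 7              # small gridworld (hypothetical)
wall_col = 3             # vertical wall between columns 2 and 3
gap_row = H - 1          # single opening in the bottom row
goal = (0, W - 1)        # rewarded goal in the top-right corner, beyond the wall

def neighbors(s):
    """Grid moves that respect the wall (crossing allowed only through the gap)."""
    r, c = s
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if not (0 <= nr < H and 0 <= nc < W):
            continue
        crosses_wall = {c, nc} == {wall_col - 1, wall_col}
        if crosses_wall and r != gap_row:
            continue
        yield (nr, nc)

states = [(r, c) for r in range(H) for c in range(W)]
coords = np.array(states, dtype=float)
sigma = 1.5              # place-field width (hypothetical)

def features(s):
    """Gaussian place-cell activations based on Euclidean distance, ignoring walls."""
    d2 = ((coords - np.array(s, dtype=float)) ** 2).sum(axis=1)
    phi = np.exp(-d2 / (2 * sigma ** 2))
    return phi / phi.sum()

# Linear TD(0) evaluation of a random-walk policy with an absorbing, rewarded goal.
w = np.zeros(len(states))
gamma, alpha = 0.95, 0.1
for trial in range(200):
    s = (H - 1, 0)                        # start in the bottom-left corner
    for _ in range(400):
        phi = features(s)
        nbrs = list(neighbors(s))
        s2 = nbrs[rng.integers(len(nbrs))]
        r = 1.0 if s2 == goal else 0.0
        target = r if s2 == goal else r + gamma * (w @ features(s2))
        w += alpha * (target - w @ phi) * phi
        s = s2
        if s == goal:
            break

V = np.array([w @ features(s) for s in states]).reshape(H, W)
print(np.round(V, 2))
# Near the goal row, states just left of the wall pick up value from states just
# right of it (and vice versa), even though the true path between them runs all
# the way down to the gap -- the bleed marked by the red arrows in Figure 2.
```

A tabular basis has no such sharing between states, which is why the tabular column in the figure shows no bleed; the trade-off is that it also generalizes nothing between neighboring states and so learns more slowly.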