Reward-predictive representations generalize across tasks in reinforcement learning

doi:10.1371/journal.pcbi.1008317

Reward-predictive representations generalize across tasks in reinforcement learning

Fig 6

Maze curriculum.

Maze A and Maze B are augmented with an irrelevant state variable to construct a five-task curriculum. In each maze, the agent starts at the blue grid cell and can move up, down, left, or right to collect a reward at the green goal cell. The black lines indicate barriers the agent cannot pass. Once the green goal cell is reached, the episode finishes and another episode is started. (These rewarding goal cells are absorbing states.) Transitions are probabilistic and succeed in the desired direction with probability 0.95; otherwise the agent remains at its current grid cell and cannot transition off the grid map or through a barrier. A five-task curriculum is constructed by augmenting the state space either with a “light” or “dark” colour bit (first, third, and fourth task), or the right half of the maze is augmented with the colour red, green, or blue (second and fifth task).

doi: https://doi.org/10.1371/journal.pcbi.1008317.g006