Goal-directed navigation in humans and deep reinforcement learning agents relies on an adaptive mix of vector-based and transition-based strategies
Fig 1
Task design and experimental set-up.
A: underlying structure of the 8 × 8 grid, unseen by participants. Every state is represented by an image of an object, and these objects and their positions change on every trial. B: schematic diagram of the ‘map reading’ phase of each trial. Participants see a top–down view of the grid with objects obscured and successively click on blue squares to reveal the ‘landmark’ object at each location. After 16 clicks have been completed, a yellow square appears. Clicking on the yellow square reveals the ‘goal’ object for the trial. C: schematic diagram of the navigation phase of each trial. Participants start in a random, previously unobserved location and are tasked with navigating to the ‘goal’ object they have just learnt about (displayed at the top). They can navigate in two ways. First, they can choose a direction to travel in by clicking on the corresponding arrow (highlighted yellow). This is analogous to using a ‘vector-based’ strategy. Alternatively, they can choose an adjacent state to travel to by clicking on one of the associated images (displayed in a random order; highlighted blue). This corresponds to using a ‘transition-based’ navigation strategy. The two response methods are equivalent in that both only allow participants to move to the four adjacent states, but setting up the responses in this way allowed us to determine whether participants were focusing more on the direction they were travelling in or on the identity of the next state they would be transitioning to.
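The trial structure described in this caption can be illustrated with a minimal Python sketch. It is not the authors' task code. All names (`make_trial`, `neighbours`, `step_vector`, `step_transition`) are hypothetical, and several details are assumptions the caption does not confirm: landmark and goal locations are sampled uniformly at random, the goal occupies a different cell from the 16 landmarks, and the grid wraps around at its edges so that every state has exactly four adjacent states.

```python
import random

GRID_SIZE = 8      # 8 x 8 grid, as in panel A
N_LANDMARKS = 16   # landmark objects revealed during map reading (panel B)

# Assumed (row, column) offsets for the four arrow responses.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def make_trial(rng):
    """Build one trial: a fresh object layout, landmarks, goal and start."""
    cells = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)]
    # Each state is shown as a unique object; the layout is reshuffled per trial.
    object_ids = list(range(len(cells)))
    rng.shuffle(object_ids)
    grid = dict(zip(cells, object_ids))
    # Assumption: landmarks and goal are sampled uniformly and are distinct cells.
    revealed = rng.sample(cells, N_LANDMARKS + 1)
    landmarks, goal = revealed[:-1], revealed[-1]
    # The start is a location that was not observed during map reading.
    start = rng.choice([cell for cell in cells if cell not in revealed])
    return {"grid": grid, "landmarks": landmarks, "goal": goal, "start": start}


def neighbours(cell):
    """Map each direction to the adjacent cell (assumption: edges wrap around)."""
    r, c = cell
    return {
        name: ((r + dr) % GRID_SIZE, (c + dc) % GRID_SIZE)
        for name, (dr, dc) in DIRECTIONS.items()
    }


def step_vector(cell, direction):
    """'Vector-based' response: choose an arrow, i.e. a direction of travel."""
    return neighbours(cell)[direction]


def step_transition(trial, cell, chosen_object):
    """'Transition-based' response: choose the image of an adjacent state."""
    for next_cell in neighbours(cell).values():
        if trial["grid"][next_cell] == chosen_object:
            return next_cell
    raise ValueError("chosen object is not adjacent to the current state")
```

Under these assumptions, calling `step_vector(cell, "up")` and calling `step_transition` with the object shown at the cell above both return the same next state. This reflects the caption's point that the two response methods differ only in what the participant attends to, not in which moves are available.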