Goal-directed navigation in humans and deep reinforcement learning agents relies on an adaptive mix of vector-based and transition-based strategies

doi:10.1371/journal.pbio.3003296


Fig 2

Human participants benefit from freely arbitrating between vector- and transition-based strategies.

A: Performance for each participant across the conditions of Experiment 1. The y-axis shows the number of steps taken to reach a goal on a logarithmic scale, and the x-axis shows the conditions. Each dot represents an individual participant’s performance in one condition, and the dashes represent the mean performance across all participants, with error bars representing the 95% CI. B: Relationship between the proportion of steps made using direction responses (x-axis) and the number of steps taken to reach a goal (y-axis; logarithmic scale) for each participant. Colors represent the different types of environments, and the lines represent the best-fitting quadratic curves. C: Selected sample participant trajectories on the task. Trajectories progress from the darker squares to the lighter squares. A red circle indicates the location of a landmark, and a red square indicates an obstacle in the cluttered condition. A yellow square indicates the goal for the trial. A cross indicates that the participant used a state response to reach that state. In these trajectories, participants use direction responses most of the time but use state responses to reach a landmark or goal. D: Participants’ use of direction responses (y-axis) as a function of destination type (i.e., goal, landmark, or non-landmark; x-axis) and whether the state had been visited before (color of bar). Error bars represent the 95% CI. E: Performance for each participant across the different numbers of landmarks in Experiment 2. The y-axis shows the number of steps taken to reach a goal on a logarithmic scale, and the x-axis shows the number of landmarks. Each dot represents an individual participant’s performance in one condition, and the dashes represent the mean performance across all participants, with error bars representing the 95% CI.
Data and code underlying this figure are available at https://osf.io/w39d5/ and https://github.com/denis-lan/navigation-strategies, respectively.
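
As an illustration of the kind of curve shown in panel B, the following minimal sketch fits a quadratic to log-transformed step counts as a function of the proportion of direction responses. It is not taken from the authors' repository: the function names, the choice to fit on a log10 scale, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def fit_quadratic_log_steps(prop_direction, steps):
    """Least-squares fit of log10(steps) = a*p**2 + b*p + c.

    Hypothetical helper: fitting on the log10 scale is an assumption
    that mirrors the logarithmic y-axis described in the caption.
    """
    p = np.asarray(prop_direction, dtype=float)
    log_steps = np.log10(np.asarray(steps, dtype=float))
    a, b, c = np.polyfit(p, log_steps, deg=2)  # highest power first
    return a, b, c

def vertex(a, b):
    """Proportion of direction responses at the extremum of the fitted curve."""
    return -b / (2.0 * a)

# Synthetic data (made up for illustration, not the study's data):
# a U-shaped relationship with the fewest steps at p = 0.7.
p = np.linspace(0.0, 1.0, 11)
steps = 10 ** (2.0 * (p - 0.7) ** 2 + 1.0)

a, b, c = fit_quadratic_log_steps(p, steps)
best_mix = vertex(a, b)
```

With a positive leading coefficient `a`, the vertex gives the mix of direction and state responses at which the fitted curve predicts the fewest steps. In practice, one such fit would be computed per environment type, matching the separate colored curves described for panel B.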

doi: https://doi.org/10.1371/journal.pbio.3003296.g002