Goal-directed navigation in humans and deep reinforcement learning agents relies on an adaptive mix of vector-based and transition-based strategies

doi:10.1371/journal.pbio.3003296


Fig 5

Participants successfully choose landmarks that are beneficial for few-shot navigation.

A: Performance on the second day in the free-sampling and forced-sampling groups, as measured by the number of steps taken to reach a goal (y-axis, represented on a logarithmic scale). Each dot represents an individual participant’s mean performance across the task and the dash represents mean performance across all participants, with the error bars representing the 95% CI. B: Average distance from all states to their nearest landmarks for participants in the free-sampling group. The dotted line represents the mean distance expected by chance. C: Average distance from landmarks to the center for participants in the free-sampling group. The dotted line represents the mean distance expected by chance. D: Mean error on probe trials for participants in the free-sampling and forced-sampling conditions. E: Results from a principal component analysis (PCA) conducted on each participant’s number of samples at each location in the 8 × 8 grid. The loadings for each of the locations in the 8 × 8 grid on the three sampling-related principal components are shown in the first row. The second row shows the relationship between each component (x-axes) and navigation performance, as measured by the number of steps taken to reach a goal (y-axes, represented on a logarithmic scale). Each dot represents an individual participant, and the line represents the best-fitting line. The third row shows the same for the deep RL model, with each dot representing the model’s performance when tested on the landmarks sampled by an individual human participant. F: Model performance, as measured by steps taken to reach the goal (y-axis, represented on a logarithmic scale), when tested on the freely selected landmarks chosen by the free-sampling group or the randomly chosen landmarks that the forced-sampling group was exposed to. Each dot represents the model’s performance when tested on the landmarks a single human participant sampled or was exposed to. G: Correlation between participants’ performance (as measured by the number of steps taken to reach the goal, represented on a logarithmic scale) and the model’s performance when tested on the landmarks chosen by each participant. Participants’ performance in the navigation phase was significantly associated with the model’s performance when it was tested on the landmarks they chose. Data and code underlying this figure are available at https://osf.io/w39d5/ and https://github.com/denis-lan/navigation-strategies, respectively.
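
The landmark metrics in panels B and C, and their chance levels, can be illustrated with a minimal sketch. This is not the authors' analysis code (see the repository linked above for that). It assumes Manhattan distance on the 8 × 8 grid, a grid center at the midpoint of the coordinates, and a chance level estimated by drawing landmark sets uniformly at random. The function names and the number of landmarks in the usage note are hypothetical.

```python
import random
from itertools import product

GRID = 8  # side length of the 8 x 8 grid described in the caption


def mean_nearest_landmark_distance(landmarks, grid=GRID):
    """Mean distance from every state to its nearest landmark (cf. panel B).

    Assumption: Manhattan distance; the caption does not name the metric.
    """
    states = list(product(range(grid), repeat=2))
    total = 0
    for r, c in states:
        total += min(abs(r - lr) + abs(c - lc) for lr, lc in landmarks)
    return total / len(states)


def mean_distance_to_center(landmarks, grid=GRID):
    """Mean distance from each landmark to the grid center (cf. panel C).

    Assumption: the center is the coordinate midpoint, (3.5, 3.5) for 8 x 8.
    """
    center = (grid - 1) / 2
    dists = [abs(r - center) + abs(c - center) for r, c in landmarks]
    return sum(dists) / len(dists)


def chance_level(metric, n_landmarks, n_draws=2000, grid=GRID, seed=0):
    """Estimate the metric's expected value for uniformly random landmark sets.

    Assumption: "chance" means sampling distinct grid locations at random.
    """
    rng = random.Random(seed)
    states = list(product(range(grid), repeat=2))
    draws = [metric(rng.sample(states, n_landmarks), grid) for _ in range(n_draws)]
    return sum(draws) / len(draws)
```

For example, `chance_level(mean_nearest_landmark_distance, n_landmarks=16)` gives a baseline against which a participant's chosen landmark set could be compared, where 16 is a placeholder rather than the number of landmarks used in the study. Lower values of the panel B metric than chance would indicate landmarks spread so that most states lie close to one.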

doi: https://doi.org/10.1371/journal.pbio.3003296.g005