Goal-directed navigation in humans and deep reinforcement learning agents relies on an adaptive mix of vector-based and transition-based strategies
Fig 3. Deep RL model meta-trained for few-shot navigation recapitulates key features of human behavior.
A: Architecture of the deep reinforcement learning network. The network consisted of an LSTM with separate policy and value heads. B: Model performance across the different conditions. The y-axis shows the number of steps taken to reach the goal on a logarithmic scale; the x-axis shows the different conditions. Each dot represents an individual model's performance in each condition, and the dashes represent the mean performance across all models, with error bars representing the 95% CI. C: Scatterplot showing the correspondence between model performance (x-axis; number of steps to goal, on a logarithmic scale) and human performance (y-axis) for each condition. Colors of the scatter points denote the action conditions; shapes denote the type of environment. Error bars represent the 95% CI. D: Model performance (y-axis; number of steps to goal, on a logarithmic scale) as a function of the proportion of steps made using vector-based responses (x-axis), with models (colored dots) superimposed on a scatterplot of human performance (translucent dots) and the best-fit quadratic curve relating the proportion of vector-based steps to performance in humans. E: Models' use of vector-based responses (y-axis) as a function of destination type (i.e., goal, landmark, or non-landmark; x-axis) and whether the state had been visited before (bar color). Each dot represents the behavior of an individual model, with error bars representing the 95% CI. Data and code underlying this figure are available at https://osf.io/w39d5/ and https://github.com/denis-lan/navigation-strategies, respectively.
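To make the panel A description concrete, the following is a minimal sketch of a recurrent actor-critic of the kind described (a shared LSTM core feeding separate policy and value heads), assuming PyTorch. The class name, layer sizes, and observation/action dimensions are illustrative assumptions, not taken from the paper or the linked repository.

```python
# Minimal sketch of an LSTM actor-critic (panel A), assuming PyTorch.
# All dimensions below are hypothetical placeholders.
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    """Shared LSTM core with separate policy (action logits) and
    value (scalar baseline) heads, as described in panel A."""

    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, n_actions)  # action logits
        self.value_head = nn.Linear(hidden_dim, 1)           # state-value estimate

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim). The recurrent state carried in
        # `hidden` is what lets a meta-trained agent adapt within an episode
        # (few-shot navigation) without weight updates.
        out, hidden = self.lstm(obs_seq, hidden)
        logits = self.policy_head(out)             # (batch, time, n_actions)
        value = self.value_head(out).squeeze(-1)   # (batch, time)
        return logits, value, hidden

# Usage with placeholder sizes: 32-dim observations, 5 actions.
model = LSTMActorCritic(obs_dim=32, hidden_dim=128, n_actions=5)
logits, value, h = model(torch.randn(1, 10, 32))
```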
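The best-fit quadratic curve in panel D corresponds to an ordinary least-squares polynomial fit of (log-scaled) performance on the proportion of vector-based steps. A sketch of such a fit is below, assuming NumPy; the arrays `vector_prop` and `log_steps` are hypothetical placeholder values for illustration, not the authors' data.

```python
# Hedged sketch of the quadratic fit in panel D, assuming NumPy.
# Data values below are illustrative placeholders only.
import numpy as np

vector_prop = np.array([0.1, 0.3, 0.5, 0.7, 0.9])     # proportion of vector-based steps
log_steps = np.log10(np.array([40, 25, 18, 15, 20]))  # steps to goal (log10 scale)

# Fit log10(steps) = a * p**2 + b * p + c by least squares,
# then evaluate the fitted curve on a fine grid for plotting.
a, b, c = np.polyfit(vector_prop, log_steps, deg=2)
grid = np.linspace(0.0, 1.0, 100)
fitted = a * grid**2 + b * grid + c
```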