
Goal-directed navigation in humans and deep reinforcement learning agents relies on an adaptive mix of vector-based and transition-based strategies

Fig 4

Models spontaneously develop separate modules for ‘vector’- and ‘transition’-based strategies.

A: Example heatmaps for a unit with a ‘landmark’, ‘spatial’, or ‘conjunctive’ response pattern in two different environments. Red circles on the heatmaps denote the presence of a landmark in that location. ‘Spatial’ units respond stably to certain regions of space across environments regardless of landmark configuration, ‘landmark’ units respond to all landmarks across environments, while ‘conjunctive’ units respond to landmarks differently across different environments. B: Stacked bar plots showing the proportion of response pattern types across all units or in the ‘vector’, ‘transition’, or ‘unspecialized’ clusters. ‘Vector’ units are more likely to have spatial responses, while ‘transition’ units are more likely to have conjunctive or landmark responses. C: Scatter plot showing the R² value for the correlation between the activation of each LSTM unit’s cell state and the output values of either the ‘direction’ or ‘state’ actions in the policy network. Values are normalized to be between 0 and 1. The 20 units that explained the most variance in ‘direction’ or ‘state’ actions were designated as the ‘vector’ and ‘transition’ units, respectively, while the 20 units that explained the least variance in either type of action were designated as ‘unspecialized’ units. D: Performance deficit of lesioned models (as measured by excess number of steps taken to get to the goal compared to an intact model) on the both, directions-only, and states-only conditions. Each dot represents one of the 20 trained models, the line represents the mean, and the error bar represents the 95% confidence interval. Lesioning ‘vector’ units leads to deficits in the directions-only condition, while lesioning ‘transition’ units leads to deficits in the states-only condition. Lesioning either ‘vector’ or ‘transition’ units leads to deficits in the both condition. E: Change in use of ‘direction’ actions after lesions to the ‘vector’, ‘transition’, and ‘unspecialized’ units, compared to the unlesioned models. Each dot represents one of the 20 trained models, the line represents the mean, and the error bar represents the 95% confidence interval. Lesioning ‘vector’ units leads to a decrease in use of ‘direction’ actions, and lesioning ‘transition’ units leads to a decrease in use of ‘state’ actions. F: Decoding error on held-out time steps for current and goal locations (as measured by Euclidean distance) for the ‘vector’, ‘transition’, and ‘unspecialized’ units. Current and goal locations are both best decodable from ‘vector’ units. G: Decoding error on held-out time steps for whether the agent is currently adjacent to a landmark or a goal. Goal and landmark adjacency are both best decodable from ‘transition’ units. H: First three principal components for the PCA on the cell state activations of ‘vector’ units. Each dot represents the centroid of the PCs for each location in the grid. Red and blue dots represent the PCs before and after a landmark is encountered, respectively. The representations of ‘vector’ units faithfully reflect spatial structure after a landmark is encountered. PCA results are shown for one representative model. I: First three principal components for the PCA on the cell state activations of ‘transition’ units. Each dot represents the centroid of the PCs for each location in the grid. Purple and green dots represent the PCs for non-landmarks and landmarks, respectively. The representations of ‘transition’ units seem to separate landmarks and non-landmarks without apparent spatial structure. Data and code underlying this figure are available at https://osf.io/w39d5/ and https://github.com/denis-lan/navigation-strategies, respectively.
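
The unit-selection procedure described in panel C can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The function names are hypothetical, and several details are assumptions that the caption does not specify: that each action type's policy output is summarized as one scalar per time step, that R² is the squared Pearson correlation, that normalization is min-max scaling, and that "least variance in either type of action" means the smallest value of the larger of the two normalized scores.

```python
import numpy as np


def r2_per_unit(cell_states, action_values):
    """Squared Pearson correlation between each unit and an action output.

    cell_states:   array of shape (T, n_units), LSTM cell state per time step.
    action_values: array of shape (T,), one scalar per time step
                   (assumed summary of the 'direction' or 'state' outputs).
    """
    x = cell_states - cell_states.mean(axis=0)
    y = action_values - action_values.mean()
    cov = x.T @ y
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    # Units with zero variance get a correlation of 0 instead of NaN.
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r ** 2


def min_max(values):
    """Rescale values to the range [0, 1] (assumed normalization)."""
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def assign_clusters(r2_direction, r2_state, k=20):
    """Return index arrays for 'vector', 'transition', 'unspecialized' units."""
    d = min_max(r2_direction)
    s = min_max(r2_state)
    vector = np.argsort(d)[-k:]       # k units with the highest 'direction' R²
    transition = np.argsort(s)[-k:]   # k units with the highest 'state' R²
    # Assumption: a unit is 'unspecialized' if even its better score is low.
    unspecialized = np.argsort(np.maximum(d, s))[:k]
    return vector, transition, unspecialized
```

With recorded activations in an array `cells` and per-step action summaries `direction` and `state` (hypothetical names), the clusters would be obtained as `assign_clusters(r2_per_unit(cells, direction), r2_per_unit(cells, state))`. The sketch does not handle a unit that ranks in the top 20 for both action types; how such overlap was resolved is not stated in the caption.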

doi: https://doi.org/10.1371/journal.pbio.3003296.g004