Impact of symmetry in local learning rules on predictive neural representations and generalization in spatial navigation

doi:10.1371/journal.pcbi.1013056

Fig 1.

Model and learning rule.

(A) Cartoon depiction of the model used in the main text. A recurrently connected population of neurons p₁, putatively CA3, receives external input, putatively from the dentate gyrus or entorhinal cortex (EC). It projects to another population p₂, which also receives input; p₂ could be CA1, again receiving input from EC. Note that there are no recurrent connections in the second layer and no backward connections. (B) Quantities relevant for the update of synapse W_ij: the pre- and postsynaptic activities, as well as the total input to the postsynaptic neuron through the synapses W. (C), (D) Invariance of learning rules with respect to temporal order. We plot the weight change of a single synapse in a setup with a single presynaptic and a single postsynaptic neuron. The right column has the same pre- and postsynaptic activities as the left column, only in reverse order. In (C) and (D), the learning rule is used with two different parameter settings. Only in (D) are the synaptic weight changes preserved (in reverse order); in (C), postsynaptic activity preceding presynaptic activity leads to a net weight decrease. Note that in this illustrative example W is fixed; in reality, network dynamics and weights would influence each other and lead to more complex changes.
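
The order (in)variance in (C) and (D) can be illustrated with a small Python sketch. The rule below is hypothetical and is not the paper's exact rule: a causal term (pre at t−1, post at t) weighted by `alpha`, an acausal term (post at t−1, pre at t) weighted by `beta`, and an Oja-style decay `lam * w * post(t)**2`, which we assume in place of the total-input term from (B). As in the figure, W is held fixed. Reversing time swaps the causal and acausal terms and leaves the same-time decay unchanged, so the total change is order-invariant whenever `alpha == beta`.

```python
import numpy as np

def total_weight_change(pre, post, w, alpha, beta, lam=1.0, eta=1.0):
    """Summed change of one synapse under a hypothetical pairwise rule.

    alpha weights pre(t-1)*post(t)  (pre before post),
    beta  weights post(t-1)*pre(t)  (post before pre),
    and an assumed Oja-style decay lam*w*post(t)^2 is subtracted.
    The weight w is held fixed, as in the illustrative figure.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    causal = np.sum(post[1:] * pre[:-1])   # presynaptic activity precedes postsynaptic
    acausal = np.sum(post[:-1] * pre[1:])  # postsynaptic activity precedes presynaptic
    decay = lam * w * np.sum(post ** 2)    # same-time term, unaffected by time reversal
    return eta * (alpha * causal + beta * acausal - decay)

pre = [0, 1, 0, 0]   # presynaptic activity at t = 1
post = [0, 0, 1, 0]  # postsynaptic activity at t = 2
for alpha, beta in [(1.0, 0.0), (1.0, 1.0)]:
    forward = total_weight_change(pre, post, w=0.5, alpha=alpha, beta=beta)
    backward = total_weight_change(pre[::-1], post[::-1], w=0.5, alpha=alpha, beta=beta)
    print(f"alpha={alpha}, beta={beta}: forward {forward:+.2f}, reversed {backward:+.2f}")
```

With (alpha, beta) = (1, 0) the forward order gives +0.50 and the reversed order −0.50, a net decrease as in (C); with (1, 1) both orders give +0.50, as in (D).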

Fig 2.

Successor representations learned in circular random walks.

We construct a circular state space with three possible actions: stay, move clockwise, and move anti-clockwise. We simulate three random walks: one where actions are selected uniformly (first row), one where clockwise actions are preferentially selected (second row), and one where anti-clockwise actions are preferentially selected (third row). The first column shows an example trajectory of the respective walk. The second and third columns show the successor representations learned by the first and second layer of our model, using a symmetric and an asymmetric learning rule, respectively. Note that the successor representation learned with the symmetric rule does not distinguish between the policies. Here, the inputs to the cells are one-hot vectors encoding the respective states, and the plotted successor representations are obtained by averaging the population activity in each state.
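
A minimal Python sketch of why a symmetric rule could fail to separate the biased walks, under an assumption made here purely for illustration: that the classical rule converges to the SR of the transition matrix T, M = (I − γT)⁻¹, while a symmetric rule converges to the SR of the symmetrized matrix (T + Tᵀ)/2. On a ring, the anti-clockwise-biased matrix is the transpose of the clockwise-biased one, so the two symmetrized matrices, and hence their SRs, coincide. The helper names, ring size, discount `gamma` and action probabilities are all hypothetical choices.

```python
import numpy as np

def ring_transition(n, p_stay, p_cw, p_ccw):
    """Transition matrix for a random walk on a ring of n states."""
    T = np.zeros((n, n))
    for s in range(n):
        T[s, s] += p_stay
        T[s, (s + 1) % n] += p_cw   # clockwise step
        T[s, (s - 1) % n] += p_ccw  # anti-clockwise step
    return T

def successor_representation(T, gamma=0.9):
    """Closed-form SR: M = sum_k gamma^k T^k = (I - gamma T)^-1."""
    return np.linalg.inv(np.eye(T.shape[0]) - gamma * T)

def symmetrize(T):
    """Assumed effective transition structure seen by a symmetric rule."""
    return 0.5 * (T + T.T)

n = 10
T_cw = ring_transition(n, p_stay=0.2, p_cw=0.6, p_ccw=0.2)
T_ccw = ring_transition(n, p_stay=0.2, p_cw=0.2, p_ccw=0.6)

M_cw = successor_representation(T_cw)
M_ccw = successor_representation(T_ccw)
S_cw = successor_representation(symmetrize(T_cw))
S_ccw = successor_representation(symmetrize(T_ccw))

print("classical SRs differ:", not np.allclose(M_cw, M_ccw))
print("symmetrized SRs identical:", np.allclose(S_cw, S_ccw))
```

The classical SRs are mirror images of each other (M_ccw equals the transpose of M_cw), whereas the symmetrized versions carry no information about the direction of bias.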

Fig 3.

Successor representations are learned for a variety of inputs, dynamics, and parameters.

Top: Convergence of the recurrent (red) and feedforward (blue) weight matrices to their theoretical limits with random features in circular (left) and arbitrary (right) random walks. Bottom: Convergence of the recurrent (left) and feedforward (right) weights for different values of the learning-rule parameters. The other set of parameters is fixed to (1,0) and in these experiments, respectively. In all graphs, convergence is measured by the loss term explained in the Methods section. In the bottom row, we compute the ratio of the final loss to the initial loss and display it on a log scale, so negative values indicate convergence towards the target. Note that the values on the antidiagonal are approximately 0.
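
The bottom-row metric, the logarithm of the final loss over the initial loss, can be sketched in Python. As a stand-in for the paper's local learning rules, this example learns an SR on a uniform circular random walk with the standard tabular TD(0) update, and it uses the squared Frobenius distance to the closed-form SR as the loss; both are our assumptions, not the loss defined in Methods. All parameter values are arbitrary.

```python
import numpy as np

def log_loss_ratio(initial_loss, final_loss):
    """log10(final / initial); negative values indicate convergence."""
    return np.log10(final_loss / initial_loss)

rng = np.random.default_rng(0)
n, gamma, lr, n_steps = 8, 0.8, 0.05, 20_000

# Uniform walk on a ring: stay, clockwise or anti-clockwise with equal probability.
T = np.zeros((n, n))
for s in range(n):
    for nxt in (s, (s + 1) % n, (s - 1) % n):
        T[s, nxt] += 1 / 3
M_true = np.linalg.inv(np.eye(n) - gamma * T)   # theoretical limit

def loss(M):
    """Assumed loss: squared Frobenius distance to the closed-form SR."""
    return np.sum((M - M_true) ** 2)

M = np.zeros((n, n))
initial = loss(M)
s = 0
for _ in range(n_steps):
    s_next = (s + rng.integers(-1, 2)) % n      # step of -1, 0 or +1
    onehot = np.eye(n)[s]
    # Tabular TD(0) update of the SR row for the current state.
    M[s] += lr * (onehot + gamma * M[s_next] - M[s])
    s = s_next

print(f"log10(final/initial loss) = {log_loss_ratio(initial, loss(M)):.2f}")
```

A value below zero means the learned matrix ended closer to its theoretical limit than it started, matching the reading of the bottom-row colour scale.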

Fig 4.

Symmetric learning rule leads to more stable place fields on a linear track.

We simulated an experiment in which a rat repeatedly runs along a linear track, similar to [37]. A two-layer SR network was used in which the recurrent weights were learned with either a symmetric (A, middle row) or an asymmetric (A, top row) learning rule. In the symmetric case, place fields in the modelled CA3 population (red) shift their centre of mass less than those in the CA1 population (blue); this is not the case in the asymmetric version. Histograms show the distribution of shifts between the first five and the last five laps, while the rightmost plot shows the shift relative to the 12th lap. The results in the symmetric case are qualitatively similar to data (A, bottom row) from Ca²⁺ recordings of hippocampal neurons in a similar experiment (figure adapted from Dong, C., Madar, A. D., & Sheffield, M. E. (2021) [37]). (B) Firing rates of an example cell from CA3 and one from CA1, with the symmetric learning rule used for CA3. Firing rates at each position are averaged over the first five and the last five laps and plotted relative to the centre of mass in the first five laps. With experience, the CA1 place field shifts backwards (arrow indicates direction of travel), whereas the CA3 place field does not.
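
The centre-of-mass shift summarized in the histograms can be sketched in Python. The function names are hypothetical, the rate map is synthetic (made-up Gaussian fields, not data from the paper or from [37]), and we assume the animal runs towards increasing positions, so a negative shift means a backward shift.

```python
import numpy as np

def centre_of_mass(rates, positions):
    """Rate-weighted mean position of a place field."""
    return np.sum(positions * rates) / np.sum(rates)

def place_field_shift(rates_by_lap, positions, n_laps=5):
    """COM of the last n_laps minus COM of the first n_laps.

    rates_by_lap has shape (laps, positions); negative values mean the
    field moved backwards, assuming travel towards increasing positions.
    """
    first = rates_by_lap[:n_laps].mean(axis=0)
    last = rates_by_lap[-n_laps:].mean(axis=0)
    return centre_of_mass(last, positions) - centre_of_mass(first, positions)

# Synthetic example: a field centred at 50 cm for 10 laps, then at 45 cm.
positions = np.arange(100.0)                    # track bins in cm
centres = np.repeat([50.0, 45.0], 10)           # one field centre per lap
sigma = 5.0
rates = np.exp(-(positions[None, :] - centres[:, None]) ** 2 / (2 * sigma ** 2))
print(f"shift: {place_field_shift(rates, positions):+.2f} cm")
```

Applying the same function to every simulated cell would give a distribution of shifts like those in the histograms.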

Fig 5.

Symmetric successor representation agent affords better generalization in simple navigation tasks.

Left: Agents started at random locations in the environments and had to learn to navigate to fixed targets. After 400 episodes, the reward was moved to a new random location, and agents could relearn only the reward prediction vector, not the SR. Performance, including generalization to the new target, is measured as the total number of steps taken per episode, for an agent using the classical rule (blue) and an agent using the symmetric rule (red). The dashed line indicates the change of target location. We show performance averaged over environments, since it is qualitatively similar across them; see S3 Fig for plots of individual environments. Right: Similar to the left plot, but instead of switching the target after a fixed number of episodes, the target was switched once the previous target was found with a fixed accuracy. Violin plots show the distribution of suboptimality (steps minus optimal number of steps) over all environments; see S4 Fig for individual environments. For an outline of the environments, see S2 Fig.
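
The core idea, reusing a fixed SR and relearning only the reward vector, can be sketched in Python under simplifying assumptions that are ours, not the paper's: the SR is the closed-form SR of a random walk on an open 5 × 5 grid, the reward prediction vector is simply a one-hot vector at the target, and the agent steps greedily to the neighbour with the highest value V = M r. Switching the target changes only r. On an open grid the optimal number of steps is the Manhattan distance. All names are hypothetical.

```python
import numpy as np

H, W = 5, 5  # grid height and width

def neighbours(s):
    """Four-connected neighbours of state s on the open grid."""
    r, c = divmod(s, W)
    cand = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return [rr * W + cc for rr, cc in cand if 0 <= rr < H and 0 <= cc < W]

def random_walk_sr(gamma=0.95):
    """Closed-form SR of an unbiased random walk on the grid."""
    n = H * W
    T = np.zeros((n, n))
    for s in range(n):
        nb = neighbours(s)
        T[s, nb] = 1.0 / len(nb)
    return np.linalg.inv(np.eye(n) - gamma * T)

def steps_to_target(M, target, start, max_steps=400):
    r = np.zeros(M.shape[0])
    r[target] = 1.0          # reward prediction vector (the only part relearned)
    V = M @ r                # state values under the fixed SR
    s, steps = start, 0
    while s != target and steps < max_steps:
        s = max(neighbours(s), key=lambda x: V[x])  # greedy step
        steps += 1
    return steps

def manhattan(a, b):
    (ra, ca), (rb, cb) = divmod(a, W), divmod(b, W)
    return abs(ra - rb) + abs(ca - cb)

M = random_walk_sr()         # learned once, reused for every target
start = 0
for target in (24, 4):       # the second target plays the role of the switched reward
    steps = steps_to_target(M, target, start)
    print(f"target {target}: {steps} steps, suboptimality {steps - manhattan(start, target)}")
```

Because the SR is never updated after the switch, any suboptimality on the second target reflects how well the fixed SR generalizes.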

Fig 6.

Comparison of policy entropy and sensitivity to learning rate among agents with symmetric and asymmetric learning rule.

Left: The policy entropies of the two agents follow different trajectories in the generalization experiment. We calculated the entropy of each agent's policy, averaged over all states, at the end of each episode. While learning to navigate to the first target, the symmetric agent has higher policy entropy; this ordering reverses once the new target has to be reached. Right: The symmetric agent is more sensitive to the learning rate at low learning rates. We repeatedly trained the agents until a fixed navigation accuracy to the target was reached and recorded the number of episodes needed to meet that criterion. Curves show the median and interquartile range of this number for the two agents.
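
The state-averaged policy entropy can be sketched in Python. We assume, for illustration only, a softmax policy over action values with a hypothetical inverse temperature `beta`; the paper's exact policy parametrization may differ. Entropy is computed per state in nats and then averaged over states.

```python
import numpy as np

def softmax_policy(Q, beta=1.0):
    """Row-wise softmax over actions; Q has shape (n_states, n_actions)."""
    z = beta * (Q - Q.max(axis=1, keepdims=True))  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def mean_policy_entropy(Q, beta=1.0):
    """Entropy (in nats) of the policy in each state, averaged over states."""
    pi = softmax_policy(Q, beta)
    per_state = -np.sum(pi * np.log(pi), axis=1)
    return per_state.mean()

Q_flat = np.zeros((25, 4))       # indifferent agent: uniform policy everywhere
Q_sharp = np.zeros((25, 4))
Q_sharp[:, 0] = 10.0             # strongly prefers action 0 in every state

print(mean_policy_entropy(Q_flat), np.log(4))
print(mean_policy_entropy(Q_sharp))
```

A uniform policy over four actions reaches the maximum entropy log 4, while a nearly deterministic policy approaches zero, which is the range over which the curves in the left panel move.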

Fig 7.

Symmetric learning rule provides no advantage in generalization experiment on a directed graph.

We conducted the same kind of experiment as in Fig 5 on a directed graph. Left: The state space is tree-like, except that from the leaf nodes at the last level one travels back to the central node (orange dashed line). Right: In this scenario, an SR agent with the classical learning rule (blue) generalizes better than one with the symmetric learning rule (red).
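
One way to see why symmetry might not help on such a graph can be sketched in Python, under assumptions that are ours rather than the paper's analysis: a depth-2 binary tree, and the hypothesis that a symmetric rule effectively sees the symmetrized matrix 0.5 (T + Tᵀ). In the directed graph, moves go only from parent to child and from leaves back to the root, whereas the symmetrized matrix also puts weight on reverse moves, such as leaf to parent, that the agent can never take.

```python
import numpy as np

# Hypothetical depth-2 binary tree: root 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5, 6};
# leaves 3..6 return to the root (the dashed edge in the figure).
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
leaves = [3, 4, 5, 6]
n = 7

T = np.zeros((n, n))
for parent, kids in children.items():
    T[parent, kids] = 1.0 / len(kids)   # uniform choice of child
for leaf in leaves:
    T[leaf, 0] = 1.0                    # forced return to the root

T_sym = 0.5 * (T + T.T)                 # assumed view of a symmetric rule

print("directed graph is asymmetric:", not np.allclose(T, T.T))
print("leaf 3 -> parent 1 possible in T:", T[3, 1] > 0)
print("leaf 3 -> parent 1 weighted in T_sym:", T_sym[3, 1] > 0)
```

On an undirected grid, by contrast, every move can be reversed, so symmetrizing does not introduce impossible transitions.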

Fig 8.

Variations of symmetry in the learning rule.

The experiment for all plots is the same as in Fig 5. Top: Generalization for different settings of the learning-rule parameters. Violin plots show the distribution of suboptimality (steps minus optimal number of steps) when evaluated on new targets. With increased symmetry, the distributions broaden towards the optimal value of 0. Middle: Generalization with parameters randomly initialized for each pair of states. Bottom: Generalization with noise added to , at each timestep.

Fig 9.

Generalization performance in a maze task with blocked paths.

Top: Grid-world mazes used for the generalization task. The leftmost maze was used for training, and the other three for testing generalization. Bottom: Violin plots show the distribution of suboptimality (steps minus optimal number of steps) of the agents when using a successor representation trained on one target and tested on another. Training and test targets are drawn at random from all possible states in the respective environments. The distribution for the symmetric agent is broader around 0, which indicates optimal generalization, and less pronounced at 400, the maximum number of steps allowed.
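
The suboptimality measure can be sketched in Python under our own assumptions: mazes are given as strings with '#' marking blocked cells, agents move in four directions, and the optimal number of steps is the breadth-first-search shortest path. Episodes are assumed to be cut off at 400 steps, as in the caption. The function names and the example maze are hypothetical.

```python
from collections import deque

def shortest_path_length(maze, start, goal):
    """BFS on a grid maze ('#' = wall); returns None if goal is unreachable."""
    rows, cols = len(maze), len(maze[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != '#' and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

def suboptimality(steps_taken, optimal, max_steps=400):
    """Steps minus optimal steps; episodes are cut off at max_steps."""
    return min(steps_taken, max_steps) - optimal

maze = ["....",
        "###.",
        "...."]
opt = shortest_path_length(maze, (0, 0), (2, 0))   # wall forces a detour
print(opt, suboptimality(12, opt), suboptimality(1000, opt))
```

Blocking a path changes the BFS distance, so the same agent trajectory can be optimal in the training maze and suboptimal in a test maze.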
