Reservoir computing model of prefrontal cortex creates novel combinations of previous navigation sequences from hippocampal place-cell replay with spatial reward propagation

doi:10.1371/journal.pcbi.1006624

Fig 1.

Panel A shows an optimal trajectory between feeders A, B, C, D and E (ABCDE). Panels B, C and D show non-optimal trajectories that each contain a sub-trajectory of the ABCDE trajectory. The sub-trajectory shared with the ABCDE trajectory is drawn in red and the non-optimal parts in blue. Panel B shows the ABCED trajectory, panel C the EBCDA trajectory and panel D the BACDE trajectory.

Fig 2.

Place-cell and snippet coding.

Panel A represents the place-cell activations that correspond to a single point. Place-cell centers are represented by red points, and the mean firing rate of each place-cell by a red circle of fixed radius centered on the place-cell center. The transparency of the circle represents the magnitude of the mean firing rate. Panel B depicts the ABCED trajectory and a snippet randomly drawn along the trajectory. The snippet length is s = 5. Panel C represents the snippet replay likelihood as learnt by the hippocampal replay model through propagation of reward from the rewarded locations A, B, C, E and D. Panel D represents the raster of place-cell activation along the ABCED trajectory. The time indices at which feeders A, B, C, D and E are encountered during the ABCED trajectory are tagged above the raster and marked by thin white vertical lines. The snippet shown in panel B is highlighted by a blue rectangle in panel D. Panel E represents the spatial extent of the snippet replay likelihood. Panel F illustrates part of a typical random replay episode, in which multiple snippets from remote locations are replayed.
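
The place-cell and snippet coding described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian tuning curve, its width `sigma`, the 4x4 grid of centers, the straight-line trajectory and the function names are all assumptions made for the example, and the snippet start is drawn uniformly rather than from the learnt replay likelihood of panel C.

```python
import math
import random

def place_cell_rates(pos, centers, sigma=0.1):
    """Mean firing rate of each place cell at position `pos`.
    Assumption: isotropic Gaussian tuning of width `sigma`."""
    x, y = pos
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in centers]

def draw_snippet(trajectory, s=5, rng=random):
    """Return `s` consecutive points starting at a uniformly random index."""
    start = rng.randrange(len(trajectory) - s + 1)
    return trajectory[start:start + s]

# Hypothetical 4x4 grid of place-cell centers on the unit square.
centers = [(i / 3, j / 3) for i in range(4) for j in range(4)]
# Hypothetical straight-line trajectory of 20 points.
trajectory = [(t / 19, 0.5) for t in range(20)]

snippet = draw_snippet(trajectory, s=5, rng=random.Random(0))
# One row of 16 place-cell activations per snippet point,
# i.e. a small slice of a raster like the one in panel D.
raster = [place_cell_rates(p, centers) for p in snippet]
```

Each row of `raster` plays the role of one column of the raster in panel D, restricted to the five time steps of the snippet.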

Fig 3.

Reservoir computing model.

The Temporal Recurrent Network (TRN) is a model of the prefrontal cortex (PFC) that accounts for cortico-cortical loops by defining a fixed recurrent adjacency matrix for the leaky-integrator neurons that model PFC neurons. The inputs to the TRN are modeled hippocampal (HIPP) place-cells. During the training phase, place-cell activations are provided by the algorithmic model of SWR replay (red pathway), and the striatum model learns to predict the next place-cell activation from the PFC model states by modifying, according to the delta learning rule, the synaptic weights that project the PFC model onto the striatum model. During the generation phase, the model no longer learns, and the place-cell activation patterns result from the new position of the agent, which is reconstructed with a Bayesian algorithm from the modeled striatum's prediction of the next place-cell activation (blue pathway).
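
The training phase described in this caption can be sketched in plain Python. This is a hedged illustration under stated assumptions, not the published implementation: the weight distributions, the `tanh` nonlinearity, the leak rate, the learning rate, the state reset at the start of each snippet and all function names are hypothetical choices. Only the overall structure (fixed recurrent weights, leaky-integrator units, a readout trained with the delta rule to predict the next place-cell activation) comes from the caption.

```python
import math
import random

def make_reservoir(n_in, n_res, rng, scale=0.5):
    """Fixed random input and recurrent weights (never trained)."""
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    w_rec = [[rng.gauss(0, scale / math.sqrt(n_res)) for _ in range(n_res)]
             for _ in range(n_res)]
    return w_in, w_rec

def step(x, u, w_in, w_rec, leak=0.3):
    """Leaky-integrator update:
    x <- (1 - leak) * x + leak * tanh(W_in u + W_rec x)."""
    new_x = []
    for i in range(len(x)):
        drive = sum(w * uj for w, uj in zip(w_in[i], u))
        drive += sum(w * xj for w, xj in zip(w_rec[i], x))
        new_x.append((1 - leak) * x[i] + leak * math.tanh(drive))
    return new_x

def readout(w_out, x):
    """Linear readout standing in for the striatum model."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in w_out]

def delta_update(w_out, x, target, lr=0.05):
    """Delta rule, in place: w_out += lr * (target - prediction) * x."""
    pred = readout(w_out, x)
    for k, row in enumerate(w_out):
        err = target[k] - pred[k]
        for j in range(len(row)):
            row[j] += lr * err * x[j]

def train_on_snippet(snippet, w_in, w_rec, w_out, lr=0.05):
    """Present one snippet and learn to predict each next activation."""
    x = [0.0] * len(w_rec)  # assumption: state is reset for every snippet
    for u, nxt in zip(snippet, snippet[1:]):
        x = step(x, u, w_in, w_rec)
        delta_update(w_out, x, nxt, lr)
```

Only `w_out` changes during training; `w_in` and `w_rec` stay fixed, which is the defining property of reservoir computing. In generation mode, the readout prediction would be converted back to a position and fed in as the next input; that Bayesian reconstruction step is not sketched here.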

Fig 4.

Sequence learning.

Panel A illustrates a long, convoluted trajectory taken by a rat in configuration 38. Panel B illustrates the probability map of trajectories generated by the trained model in autonomous sequence generation mode. The 2D trajectory histogram is generated by superposing the trajectories generated when 10 batches of 100 reservoirs each were trained and each model instance was evaluated 10 times with noise. Note that there are two locations where the trajectory crosses itself. At each crossing point, there are two possible paths that preserve path continuity. The system retains a memory of the context in which it reached the intersection, and can therefore continue along the correct trajectory. This illustrates that the model can learn such complex sequences.
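
The superposition of generated trajectories into a 2D histogram can be sketched as below. The bin count, the assumption that coordinates lie in `[0, extent]` on both axes, and the function name are illustrative choices rather than details taken from the paper.

```python
def trajectory_histogram(trajectories, bins=10, extent=1.0):
    """Superpose trajectories into a normalized bins x bins occupancy map.
    Assumption: all coordinates lie in [0, extent] on both axes."""
    grid = [[0] * bins for _ in range(bins)]
    for traj in trajectories:
        for x, y in traj:
            # Points exactly on the upper edge fall into the last bin.
            col = min(int(x / extent * bins), bins - 1)
            row = min(int(y / extent * bins), bins - 1)
            grid[row][col] += 1
    total = sum(map(sum, grid))
    return [[count / total for count in row] for row in grid]
```

Passing every trajectory produced by every trained model instance to `trajectory_histogram` yields a map whose cells sum to 1, so that frequently visited locations stand out as in panel B.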

Fig 5.

Illustration of snippet integration in reservoir state space.

Here we visualize the high-dimensional reservoir space in a low-dimensional (2D) PCA space, in order to see how pieces (snippets) of the overall sequence are consolidated. In this experiment, the sequence ABCDE is broken into snippets, which are then used to train the model. The challenge is that only local structure is presented to the model, which must consolidate the global structure. Panels A-C represent the state trajectory of reservoir activation after 100, 1000 and 10000 snippets. While each snippet represents part of the actual trajectory, each is taken out of its overall spatial context in the sequence. Panel D represents the trajectory of the reservoir state during the complete presentation of the intact sequence. Panel C reproduces this trajectory, but in addition we see “ghost” trajectories leading to the ABCDE trajectory. These ghost elements represent the reservoir state transitions from an initial random state, as the first few elements of each snippet take the reservoir from the initial undefined state onto the component of the ABCDE trajectory coded by that snippet.

Fig 6.

Longer paths are rejected (left), and stronger rewards are favored (right). Panels A and B illustrate snippet counts for the T-maze trajectories pictured in panels C and D. In panel C, sequences begin at location A, and rewards are given at locations C and D. Because of reward proximity and propagation, snippets are more likely to be selected along path AC than along path AD. This is revealed in panel A, a histogram of snippets for the sequences ABC (in blue) and ABD (in orange). Panels B and D illustrate how distance and reward intensity interact. Increasing the strength of the reward makes a longer trajectory virtually shorter and more favored, because it increases the probability that snippets will be selected from this trajectory, as revealed in panel B. Panels C and D show the 2D trajectory histograms generated by superposing the trajectories generated when 10 batches of 100 reservoirs each were trained and each model instance was evaluated 10 times with noise. Panels E and F confirm a robust tendency to autonomously generate sequences significantly similar to the ABC and ABD sequences, respectively (p-value = 0).
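
The interaction between path length and reward strength can be illustrated with a toy calculation. This is not the hippocampal replay model itself: it assumes, purely for illustration, that reward propagated back from the goal decays geometrically with the number of steps, and it compares the value that reaches a shared starting point. The decay factor, the step counts and the function names are hypothetical.

```python
def value_at_start(steps_to_reward, reward, decay=0.9):
    """Reward value that reaches the start of a path.
    Assumption: geometric decay per step of backward propagation."""
    return reward * decay ** steps_to_reward

def path_preference(paths, decay=0.9):
    """Normalize propagated values into a relative preference per path.
    `paths` maps a path name to (steps_to_reward, reward)."""
    values = {name: value_at_start(steps, reward, decay)
              for name, (steps, reward) in paths.items()}
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}

# Equal rewards: the shorter arm is preferred.
equal = path_preference({"short": (10, 1.0), "long": (20, 1.0)})
# Quadrupling the reward on the long arm reverses the preference.
boosted = path_preference({"short": (10, 1.0), "long": (20, 4.0)})
```

Under these assumptions a sufficiently strong reward offsets the extra decay accumulated along the longer arm, which is the qualitative effect shown in panels B and D.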

Fig 7.

Efficient sequence synthesis.

A. Distribution of snippets drawn from the sequences illustrated in Fig 1B, 1C and 1D. Overall, snippet selection favors snippets from the beginning of sequence ABCED (blue), the middle of EBCDA (yellow), and the end of sequence BACDE (pink), which correspond exactly to the efficient sub-sequences (ABC, BCD, and CDE) of these three sequences. This distribution of snippets is used to train the model. The results of the training are illustrated in panel B, a 2D histogram of sequences generated by the model in the ABCDE recombination experiment. The 2D trajectory histogram is generated by superposing the trajectories generated when 10 batches of 100 reservoirs each were trained and each model instance was evaluated 10 times with noise. Panel C displays the Fréchet distance between the autonomously generated sequence and the four reference sequences. A Kruskal-Wallis comparison confirms that the autonomously generated trajectories are significantly more similar to the target sequence ABCDE than to the experienced non-efficient sequences (p < 0.0001).
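
For readers unfamiliar with the measure used in panel C, the following sketch computes a Fréchet distance between two sampled trajectories. It implements the discrete Fréchet distance by dynamic programming; whether the authors used the discrete or the continuous variant is not stated in the caption, so the choice of variant is an assumption, as is the function name.

```python
import math

def discrete_frechet(p, q):
    """Discrete Frechet distance between two polylines,
    each given as a non-empty list of (x, y) points."""
    n, m = len(p), len(q)
    # ca[i][j] is the coupling distance between p[:i + 1] and q[:j + 1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]
```

Unlike a pointwise mean error, this measure respects the order in which points are visited, so a trajectory that covers the same locations in a different order receives a large distance.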

Fig 8.

Reverse replay facilitates efficient sequence discovery.

Using the same sequences illustrated in Fig 1, we reversed the direction of sequence EBCDA, and then tested the model’s ability to synthesize the ABCDE sequence from ABCED, ACDBE and BACDE. A. Error reduction with reverse replay. B. Effects of reverse replay in generation and learning.

Fig 9.

Reverse replay allows novel shortcut path generation.

Panels A and B illustrate the left and right lap trajectories, based on Gupta et al. After training on these two trajectories, we test the ability to generate a shortcut that makes the complete outer loop in one direction. Panel C: without reverse replay, significant spatial errors appear when the system attempts to complete the counter-clockwise loop on the right side of the maze. Panel D illustrates the beneficial effects of reverse replay during trajectory learning. Panel E illustrates the effect of training the model with 100% reverse replay. The result is similar to that with 0% reverse replay, but the errors appear on the left lap part of the trajectory. Panel F: when reverse replay is introduced, this error is attenuated.
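
The reverse replay rates mentioned in this caption (0%, 100% and intermediate values) can be sketched as a probability of reversing each replayed snippet. The function name, the uniform choice of the snippet start and the `reverse_rate` parameter are assumptions made for illustration.

```python
import random

def draw_replay_snippet(trajectory, s, reverse_rate, rng):
    """Draw `s` consecutive points and, with probability `reverse_rate`,
    return them in reverse temporal order (reverse replay)."""
    start = rng.randrange(len(trajectory) - s + 1)
    snippet = trajectory[start:start + s]
    if rng.random() < reverse_rate:
        snippet = snippet[::-1]
    return snippet

rng = random.Random(0)
lap = list(range(10))  # hypothetical trajectory, one label per time step
forward_only = [draw_replay_snippet(lap, 5, 0.0, rng) for _ in range(20)]
reverse_only = [draw_replay_snippet(lap, 5, 1.0, rng) for _ in range(20)]
```

With `reverse_rate=0.0` every snippet runs in the experienced direction, and with `reverse_rate=1.0` every snippet runs backward, matching the two extreme conditions of panels C and E; intermediate values mix the two directions in the training set.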

Fig 10.

Consolidation and reverse replay applied to behavioral data.

The measured variable is the Fréchet distance between the generated and desired sequences. Data from the rat TSP configurations are used for training and testing the model. A. Effects of consolidation: as successive trials are added to the replay repertoire, the trajectory reconstruction error is significantly reduced. B. Effects of reverse replay: as reverse replay is introduced in snippet formation for training the PFC model, reconstruction error is significantly reduced.
