
The authors have declared that no competing interests exist.

Place cells in the mammalian hippocampus signal self-location with sparse, spatially stable firing fields. From observations of place cell activity it is possible to accurately decode an animal’s location. The precision of this decoding sets a lower bound on the amount of information that the hippocampal population conveys about the location of the animal. In this work we use a novel recurrent neural network (RNN) decoder to infer the location of freely moving rats from single unit hippocampal recordings. RNNs are biologically plausible models of neural circuits that learn to incorporate relevant temporal context without the need for complicated assumptions about how prior information should be used to predict the current state. When decoding animal position from spike counts in 1D and 2D environments, we show that the RNN consistently outperforms a standard Bayesian approach with either flat priors or with memory. In addition, we conducted a set of sensitivity analyses on the RNN decoder to determine which neurons and which sections of their firing fields were most influential. We found that applying RNNs to neural data allows flexible integration of temporal context, yields improved accuracy relative to the more commonly used Bayesian approaches, and opens new avenues for exploration of the neural code.

Being able to accurately self-localize is critical for most motile organisms. In mammals, place cells in the hippocampus appear to be a central component of the brain network responsible for this ability. In this work we recorded the activity of a population of hippocampal neurons from freely moving rodents and carried out neural decoding to determine the animals’ locations. We found that a machine learning approach using

Place cells, pyramidal neurons found in CA1 and CA3 of the mammalian hippocampus [

However, although place cell activity is strongly modulated by self-location this relationship is non-trivial and not exclusive. For example, during rest and brief pauses, but also during motion, the place code can decouple from an animal’s current location and recapitulate trajectories through the enclosure [

A common approach used to interrogate neural representations, such as that of place cells, is decoding; the accuracy with which a variable, such as self-location, can be decoded from the brain, places a useful lower limit on the amount of information present [

To better understand these constraints, we trained a deep recurrent neural network (RNN) [

In the following Results section we summarize the decoding performance with three decoders. The

The third decoder,

In this work we used a type of recurrent neural network called long short-term memory (LSTM, [

To test the RNN’s ability to decode rodent location based on hippocampal activity we first characterized the decoding error for a single animal foraging in a 2D arena (1 m x 1 m square). Single unit recordings were made using tetrodes from region CA1 of five rats. In all animals fewer than 10% of the recorded neurons were interneurons, characterized by narrow waveforms and high firing rates. Rat R2192 yielded the greatest number of simultaneously recorded hippocampal neurons (n = 63). Since the number of recorded neurons is expected to correlate with decoding accuracy, we first focused on this particular animal.

Neural data was processed to extract action potentials and these were assigned to individual neurons using the amplitude difference between tetrode channels [

As the RNN training process is stochastic, the 10-fold cross-validation (CV) procedure was run multiple times for each window size. For each of these runs we trained 10 models (one per CV fold) and extracted the mean and median results across the folds. Black dots on

Location decoding errors based on CA1 neural data recorded from a 1 m square open field environment, as a function of time window size. (a) shows the mean error and (b) the median error. Blue lines represent errors from the RNN decoder and red lines from the Bayesian approaches. Results for the RNN approach are averaged over independent realizations of the training algorithm. Black dots depict the mean/median error of each individual model. Results shown are for animal R2192.

For both the mean (

The RNN has the ability to flexibly use information from all 100 input vectors and thus integrates contextual information over time. This results in lower mean and median errors compared to the two baseline Bayesian approaches. The naive Bayesian method with flat priors has no access to information about past activity, resulting in the lowest accuracy. The Bayesian decoder with memory does incorporate past activity to form an informed prior, but it does so in a predefined manner, making it less flexible than the RNN. Notice also that the RNN approach achieves its best results for shorter time windows than the Bayesian approaches (see also

Number of data points, number of recorded neurons, and the optimal time window for the three decoders for each of the 5 analyzed animals and for both decoding tasks.

Task | Rat ID | Length of recording | Neurons recorded | Optimal window for RNN | Optimal window for simple Bayes | Optimal window for Bayes with memory
---|---|---|---|---|---|---
2D | R2192 | 1081 s | 63 | 1400 ms | 2800 ms | 2000 ms
2D | R2198 | 1281 s | 33 | 2000 ms | 1800 ms | 2000 ms
2D | R2336 | 1234 s | 48 | 1800 ms | 2800 ms | 2800 ms
2D | R2337 | 1456 s | 43 | 1800 ms | 2800 ms | 2600 ms
2D | R2217 | 1500 s | 26 | 1600 ms | 3400 ms | 3400 ms
1D | R2192 | 1394 s | 72 | 1400 ms | 2800 ms | 2200 ms
1D | R2198 | 1934 s | 49 | 2200 ms | 3400 ms | 1800 ms
1D | R2336 | 1900 s | 71 | 1400 ms | 4400 ms | 3600 ms
1D | R2337 | 2778 s | 71 | 1600 ms | 2000 ms | 400 ms
1D | R2217 | 1595 s | 40 | 1400 ms | 5400 ms | 3800 ms

Beyond the global descriptors of mean and median error, we also inspected the distribution of decoding error sizes (

(a) Histogram of error sizes, generated in each case with the best performing time window (1400 ms for RNN, 2800 ms for flat Bayesian and 2000 ms for Bayesian with memory). Both types of Bayesian decoder make more very large errors than the RNN (0.02% of RNN errors vs 2.7% of Bayesian errors > 50 cm). Errors are grouped into 2 cm bins; the last bin shows all errors above 50 cm. (b-c) Downsampling analysis demonstrates that the RNN decoder is more robust to small dataset sizes. Data from R2192 was downsampled and all three decoders were trained with a random subset of the available neurons. For each sample size, 10 random sets of neurons were selected and independent models trained as before using 10-fold cross-validation. Dots represent the (b) mean and (c) median error for each downsampled dataset. Lines indicate the (b) mean of means and (c) mean of medians over sets of the same size.

The RNN error distribution followed a unimodal curve with most predictions deviating from the rat’s true position by 6-8 cm; few errors were larger than 35 cm (1.7% of errors > 35 cm, see

In many cases single unit recordings yield fewer than the 63 neurons identified from R2192. We hypothesised that the RNN’s ability to use contextual information would be increasingly important in scenarios where neural data is more scarce. To test this prediction we randomly downsampled the dataset available from R2192, repeating the training and decoding procedure for populations of neurons varying in size from 5 to 55 in increments of 5. As expected, decoding accuracy fell as the size of the dataset was reduced. However, the RNN was considerably more robust to small sample sizes, decoding with an error of 30.9 cm with only 5 neurons vs. a 46.0 cm error for the Bayesian decoder (

In total we analyzed recordings from five animals as they foraged in a 2D open field environment (1m x 1m square). For each of these 5 datasets, we determined the best performing time window size (similarly to

In the 2D decoding task, for different animals, the mean error (mean across cross validation folds) ranged between 12.5-16.3 cm and median between 10.3-13.1 cm (

(a-b) Decoding results in a 1 m square environment. The RNN consistently outperforms the two Bayesian approaches in all 5 data sets. Mean and median errors across cross-validation folds, respectively. (c-d) Decoding errors from a 600 cm long Z-shaped track. The RNN consistently yields lower decoding errors than the Bayesian approaches; the difference is more marked when mean (c) as opposed to median (d) errors are considered.

We also performed decoding on 1D datasets recorded while the same 5 animals shuttled back and forth on a 600 cm long Z-shaped track for rewards placed at the corners and ends (

Next, to understand how behavioural and neural variability influenced decoding accuracy, we focused on the results obtained from rat R2192 in the 1 m square—the animal with the greatest number of neurons and the lowest decoding error.

First we examined the decoding error as a function of the rat’s location. It is important to note that the animals’ behaviour is non-uniform—the rats visit some parts of the arena more often than others (p_{val} ≪ 0.001,

(a) The trajectory of the rat during the entire trial. Not all parts of the arena are visited with equal frequency. (b) The average size of errors made in different regions of space. The color of each hexagon depicts the average Euclidean error of data points falling into that hexagon. More frequently visited areas (as seen from (a)) tend to have lower mean error. (c) Summed neural activity in different regions of space. For each data point we sum the spike counts of all 63 neurons in a 1400 ms period centered on the moment the location was recorded. The color of the hexagon corresponds to the average over all data points falling into the hexagon. Areas where summed neural activity is high have lower prediction error. (d) The prediction error of a coordinate decreases when the animal is closer to the wall perpendicular to that coordinate.

Another important factor influencing decoding accuracy is the distribution of neural activity across the 2D enclosure. In particular, the place fields of the recorded hippocampal cells do not cover the enclosure uniformly. Clearly it would be difficult for the algorithm to differentiate between locations where no cell is active. As such, it is likely that areas where more neurons are active are decoded with higher precision. Our results confirm that the sum of spike counts across neurons at a given location is strongly anti-correlated with the prediction error made at that location (p_{val} ≪ 0.001,

We also inspected the

Furthermore, an additional factor that influenced prediction accuracy was the animal’s movement speed. Predictions were more reliable when the rat was moving than when it was stationary. The mean prediction error for speeds below 0.5 cm/s was 16.5 cm, higher than the 12.1 cm average error for all speeds above 0.5 cm/s (two-sided Welch’s t-test,

The accuracy of any neural decoder represents a useful lower bound on the information about the decoded state contained by the recorded neurons. Thus, a biologically relevant question is how such information is distributed among the neurons, across space and time. In short we asked which features of the neuronal activity are the most informative at predicting the animal’s position. To this end we conducted two different types of sensitivity analyses to measure robustness to different types of perturbations. For a visualization of the representations learned by the RNN, see the dimensionality reduction analysis (using t-SNE) in

A simple way to estimate the relevance of a specific input in a predictive model is to remove it (to knock it out) and measure how the model’s performance changes.

Here we used knockout without retraining. The RNN was applied, as before, to predict locations from a validation dataset in which the activity of a single neuron was set to zero. The knockout procedure was repeated for each input neuron separately and the mean prediction error calculated. Thus we were able to rank neurons by sensitivity—the greater the error increase caused by knocking out a neuron, the more crucial it was for the model.

The most influential neuron (neuron #55) was visually identified as an inhibitory neuron based on the lack of clear firing fields and high firing rate (

The firing rate maps of the most (a-e) and least (f) influential neurons according to the knockout analysis. Colour bar to the right of each plot indicates the firing rate in Hz. With the complete dataset the mean error was 12.50±0.28 cm. When knocking out neurons 55, 26, 41, 17, 23 (the five most influential) and 9 (the least influential), the mean error increased to 14.72 cm, 13.80 cm, 13.66 cm, 13.50 cm, 13.49 cm and 12.58 cm respectively.

For more than half of the neurons, knocking them out decreased the prediction accuracy only slightly (by less than the standard deviation of accuracy, calculated over 10 realizations of the complete model). Among these less influential neurons we found both putative inhibitory interneurons and pyramidal cells with no clear place fields and a lower than average firing rate. For example, the rate map of the least influential neuron (#9) was characterized by a low firing rate (p_{val} < 0.001,

A different way to investigate which neurons most strongly influence decoding accuracy is gradient analysis. Here we calculate the derivatives of the RNN’s loss function (mean squared prediction error) with respect to its inputs (the spike counts of neurons) at different time points. By definition these derivatives show how much a small change in a spike count influences the error. This type of sensitivity analysis is quite different from the knockout analysis—knockout sensitivity measures the impact of silencing a neuron, whereas gradient sensitivity measures the impact of a neuron’s activity deviating from its expected value.

For each predicted location we asked how sensitive the model was to each of the input spike counts. Since our RNN input is a set of 100 spike count vectors (the length of the time series), each of length 63 (the number of neurons), this amounts to 63 × 100 gradients per sample. Considering that the whole data set contains around 4400 samples, we obtain a 4400 × 63 × 100 array of gradient values. To reveal different aspects of the model’s sensitivity, we can average this array of gradients across any of its three dimensions—samples, neurons, or position in the input sequence.
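For instance, the averaging described above might be sketched as follows (a minimal numpy illustration with random placeholder gradients; the array names and the use of absolute values are our own choices, not taken from the paper's code):

```python
import numpy as np

# Hypothetical gradient array: (n_samples, n_neurons, n_timesteps),
# i.e. roughly 4400 samples x 63 neurons x 100 positions in the
# input sequence, as described in the text.
rng = np.random.default_rng(0)
grads = rng.normal(size=(4400, 63, 100))

# Average absolute gradient per neuron: collapse samples and timesteps.
per_neuron = np.abs(grads).mean(axis=(0, 2))    # shape (63,)

# Average per position in the input sequence: collapse samples and neurons.
per_timestep = np.abs(grads).mean(axis=(0, 1))  # shape (100,)

# Rank neurons by gradient sensitivity, most sensitive first.
ranking = np.argsort(per_neuron)[::-1]
```

Each choice of averaging dimension answers a different question: per-neuron averages rank cells by overall influence, while per-timestep averages show how far back in the input sequence the model still draws information.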

Averaging gradients across all samples and all time windows provided one average gradient value per neuron. Similarly to the knockout analysis, this indicates how relevant each neuron is for the prediction. The two sensitivity measures (knockout and gradient) were strongly correlated (Spearman’s rank order,

In a second step, we investigated how sensitivity with respect to a neuron’s spike count depends on whether the animal is within its place field. Place fields are of variable shape and size and, moreover, a small proportion of the recorded cells have no distinct place fields. The firing rates and gradient strengths also vary greatly between neurons. We therefore used firing rate as a proxy for the proximity of the animal to a given neuron’s place field—firing rate is maximal when the animal is near the centre of a place field and diminishes the further it moves away from that point. Hence, after normalizing both the firing rates and the strength of gradients, we averaged over all recorded cells (see the Sensitivity measures subsection in Methods). We saw that sensitivity decreases when the firing rate increases (
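One way to sketch this normalize-then-average step (the function name, per-cell min-max normalization, and the binning scheme are our own simplifications of the Methods description):

```python
import numpy as np

def sensitivity_vs_rate(rates, grads, n_bins=10):
    """Relate gradient sensitivity to firing rate across cells.

    rates, grads : (n_cells, n_samples) arrays of firing rates and
        gradient strengths. Both are normalized per cell to [0, 1]
        (rate acting as a proxy for proximity to the place field
        center); gradients are then averaged within rate bins over
        all cells, yielding one sensitivity value per rate bin.
    """
    def norm(a):
        a = a - a.min(axis=1, keepdims=True)
        span = a.max(axis=1, keepdims=True)
        return a / np.where(span == 0, 1, span)

    r, g = norm(rates), norm(np.abs(grads))
    bins = np.minimum((r * n_bins).astype(int), n_bins - 1)
    return np.array([g[bins == b].mean() for b in range(n_bins)])
```

With synthetic data in which gradients shrink as rates grow, the returned curve decreases across rate bins, mirroring the paper's observation that sensitivity falls as firing rate rises.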

(a) Place field of an example neuron. (b)

We have shown that the sequential processing afforded by an artificial recurrent neural network (RNN) provides a flexible methodology able to efficiently decode information from a population of neurons. Moreover, since a RNN decoder is a neural network, it represents a biologically relevant model of how neural information is processed. Specifically, when applied to hippocampal neural data from freely moving rats [

Bayesian methods are known to be optimal decoders when using appropriate priors [

Making use of past neural activity as contextual information, the RNN appears more robust to noise than the two Bayesian classifiers. In particular, with shorter time windows the spike counts become noisier and the Bayesian models’ prediction accuracy degrades rapidly. In contrast, the RNN decoder was more resistant to the variability of spike counts, likely due to its ability to combine information over the complete sequence of past inputs. Similarly, in situations where fewer neurons were available and hence the total amount of information was reduced, the RNN exhibited a pronounced advantage over the Bayesian decoders. Equally, in the 1D task the benefit of the RNN was most evident for animal R2217, which had the fewest recorded neurons. Nevertheless, note that fewer recorded neurons does not necessarily mean lower accuracy. As described in Section 2.3.1, the error depends strongly on the amount of training data available (length of recording) and the quality of the cells (amount and location of firing). Taken together, these results suggest that RNN decoding of neural data may prove particularly useful in situations where large populations of neurons are not available or are difficult to maintain stably.

Beyond the quality and amount of data available, the size of the RNN decoder’s errors was also seen to depend on the animal’s distance from the walls and its instantaneous speed. At higher speeds (above 10.5 cm/s) decoding accuracy does not decrease, but when the animal is immobile (below 0.5 cm/s) the error is significantly higher than when it is in motion. We hypothesize that while stationary, hippocampal activity may reflect non-local activity associated with sharp-wave ripple states [

Beyond providing more accurate decoding, the neural network approach also provides a new means of conducting sensitivity analyses. While knockout-type sensitivity analyses can be applied to both Bayesian and RNN decoders, the latter also supports gradient analyses. The two types of sensitivity—knockout and gradient—are correlated, but not identical. By design, knockout analysis asks how the system behaves if an input is completely removed, while gradient analysis investigates how the system responds to small perturbations of that input. Having access to the gradients with respect to each spike count makes it possible to pose new questions about the dynamic variability of the information content of individual neurons.

All procedures were approved by the UK Home Office, subject to the restrictions and provisions contained in the Animals Scientific Procedures Act of 1986.

Five male Lister Hooded rats were used in this study. All rats (330 − 400

Screening was performed post-surgically after a 1-week recovery period. An Axona recording system (Axona Ltd., St Albans, UK) was used to acquire the single-unit and positional data (for details of the recording system and basic recording protocol see Barry et al. (2007)). The position and head direction of the animals were inferred using an overhead video camera to record the location of two light-emitting diodes (LEDs) mounted on the animals’ head-stages (50 Hz). Tetrodes were gradually advanced in 62.5

The experiment was run during the animals’ light period. First, animals ran on a Z-shaped track, elevated 75

Following the track session, the same animals completed a 20 min random foraging session in a square (1 m x 1 m) enclosure. Coverage of the enclosure was encouraged by rewarding the animals with sweetened rice. These recordings constitute the dataset we refer to as the 2D decoding task. Not all animals’ recordings were used.

Deep learning is a class of algorithms that learn a hierarchy of representations or transformations of the data that make the problem of classification or regression easier [

Whereas feed-forward neural networks learn to predict an output based on a single input, recurrent neural networks (RNNs) can deal with series of inputs and/or outputs [

A RNN can be made to predict (i) a series of outputs based on a series of inputs, (ii) a series of outputs given only one input, and (iii) one output given a series of inputs. For our location prediction task we are interested in the last of these—given hippocampal activity (spike counts) over an extended period of time, we aim to predict a single set of spatial coordinates: the animal’s location.

The architecture, illustrated in

(a) Extracting a sequence of spike count vectors from spiking data. Each subsequent input originates from an overlapping time period, shifted 200 ms forward in time. (b) The input data used by the RNN decoder is a sequence of spike count vectors from these time windows. (c) The network used for decoding consists of an input layer (whose size equals the number of recorded neurons), two hidden layers containing 512 long short-term memory (LSTM) units and an output layer of size 2 (

The features of neural data used for decoding are the spike counts of all

Based on this series of 100 spike count vectors the recurrent network was trained to predict the rat’s location in the center of the last (100th) time window. Thus, each sequence of 100 vectors plus the correct location of the rat at the center of the last time window forms one data point for training the RNN.
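The construction of these training pairs can be sketched as follows (a minimal numpy sketch; the function name and array layout are our own, not taken from the paper's repository):

```python
import numpy as np

def make_sequences(spike_counts, positions, seq_len=100):
    """Build RNN training pairs from per-window spike counts.

    spike_counts : (n_windows, n_neurons) array, one row per
        overlapping time window (windows shifted by 200 ms).
    positions    : (n_windows, 2) array, rat position at the
        center of each window.
    Returns X of shape (n_samples, seq_len, n_neurons) and y of
    shape (n_samples, 2): each sample is a series of seq_len spike
    count vectors, labelled with the position at the center of the
    last window in the series.
    """
    X, y = [], []
    for end in range(seq_len - 1, len(spike_counts)):
        X.append(spike_counts[end - seq_len + 1 : end + 1])
        y.append(positions[end])
    return np.asarray(X), np.asarray(y)

# Toy illustration: 120 windows, 63 neurons.
counts = np.zeros((120, 63))
pos = np.zeros((120, 2))
X, y = make_sequences(counts, pos)
```

Note that the first seq_len − 1 windows cannot anchor a full sequence, so 120 windows yield 21 samples here; this is the same edge effect that later forces samples to be discarded at cross-validation borders.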

During the training procedure the network aims to minimize an objective function, in our case the mean squared error of the coordinates. Learning is done for 50 epochs (full cycles through the training data) using a constant learning rate of 0.001 and the RMSprop optimizer (a variant of stochastic gradient descent, see [

The code for training the RNN models can be found in

When searching for the best recurrent neural network model we scanned a number of architectures. Initially, we experimented with “many-to-many” models that predict a location per timestep, and with bidirectional RNN models [

We tried the Adam optimizer [, different learning rates (from 10^{−2} to 10^{−4}) and batch sizes (from 5 to 100). The learning rate was multiplied every {1, 5 or 10} epochs by a coefficient of {0.1, 0.5 or 1.0}. We trained models for 10, 50 and 100 epochs. We did not use early stopping because we report cross-validation performance, and using a validation set at any stage of the training process might inflate the results.

As the number of possible parameter combinations is very large, we did not run a complete grid search; instead we removed values from consideration if they decreased validation performance or increased computational cost without improving the model.

Spatial decoding was also implemented using a Bayesian framework [

Next, with the remaining 10% of the data, using temporal windows (200 ms to 4000 ms), each of which overlapped its neighbours by half, we calculated the probability of the animal’s presence in each spatial bin given the observed spikes—the posterior probability matrix [

Specifically, during a time window (T) the spikes generated by the N recorded neurons form a spike count vector K = (k_{1}, …, k_{i}, …, k_{N}), where k_{i} was the number of spikes fired by the i-th neuron with firing rate map f_{i}(

In the case of the simple Bayes decoder, to compute the probability of the animal’s position given the observed spikes we applied Bayes’ rule, assuming a flat prior for position (
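Under the usual assumption of independent Poisson-firing neurons, this flat-prior posterior maximization can be sketched in a few lines (illustrative only; the function name, bin layout and the numerical floor on rates are our own choices):

```python
import numpy as np

def bayes_decode_flat(rate_maps, k, T):
    """Flat-prior Bayesian decoding with independent Poisson neurons.

    rate_maps : (n_neurons, n_bins) firing rates f_i(x) in Hz.
    k         : (n_neurons,) spike counts observed in the window.
    T         : window length in seconds.
    With a flat prior the posterior is proportional to the Poisson
    likelihood; working in log space avoids numerical underflow.
    Returns the index of the spatial bin maximizing the posterior.
    """
    f = np.maximum(rate_maps, 1e-9)  # floor rates to avoid log(0)
    log_post = k @ np.log(f * T) - T * f.sum(axis=0)
    return int(np.argmax(log_post))

# Toy example: 2 neurons, 3 spatial bins.
maps = np.array([[10.0, 1.0, 1.0],
                 [1.0, 1.0, 10.0]])
# Many spikes from neuron 0 and none from neuron 1 point to bin 0.
best = bayes_decode_flat(maps, k=np.array([8, 0]), T=1.0)
```

The dropped terms (the factorials of the spike counts) are constant across spatial bins, so they do not affect the argmax.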

Finally for the Bayes decoder with memory we made two further changes. First for each animal

Second, following [, the flat prior was replaced with a prior that depends on the position decoded in the previous time step, i.e. we computed P(x_{t} | K_{t}, x_{t−1}).

The implementations of these two approaches can be found in the Bayesian folder of the GitHub repository

The reported errors for both the Bayesian and RNN approaches are measured using a 10-fold cross-validation method that divides the

Instead, in our analysis the first fold of cross-validation simply corresponds to leaving out the first 10% of the recording time and training the model on the remaining 90% of the data. The second fold, accordingly, assigns the second tenth of the recording to the validation set, and so on. For RNNs we additionally need to discard 99 samples at each border between the training and validation sets. Recall that the input to the RNN is a series of 100 spike count vectors—to avoid any overlap between training and test data we remove validation data points that share at least one spike count vector with any training data point.
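A minimal sketch of this blocked cross-validation split, assuming samples are indexed in recording order (the helper name and the exact handling of the borders are illustrative, not the authors' code):

```python
import numpy as np

def blocked_cv_folds(n_samples, n_folds=10, seq_len=100):
    """Contiguous-block cross-validation indices for sequence inputs.

    Each fold's validation set is one contiguous tenth of the
    recording. Because every sample spans seq_len overlapping
    windows, the (seq_len - 1) training samples adjacent to each
    border would share spike count vectors with validation samples,
    so they are dropped from the training set.
    """
    borders = np.linspace(0, n_samples, n_folds + 1, dtype=int)
    for i in range(n_folds):
        lo, hi = borders[i], borders[i + 1]
        val = np.arange(lo, hi)
        train = np.concatenate([
            np.arange(0, max(lo - (seq_len - 1), 0)),
            np.arange(min(hi + (seq_len - 1), n_samples), n_samples),
        ])
        yield train, val
```

For example, with 1000 samples, fold 0 holds out indices 0-99 for validation and the training set only starts at index 199, so no training sample shares a window with a validation sample.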

For each fold we train a model on the training set and calculate the error on the validation set. All reported errors are validation errors—errors that the models make on the one tenth of the data that was left out of the training procedure. To increase the reliability of the results, we perform the 10-fold cross-validation procedure multiple times and report the mean and median of the errors. This is done only for the RNN decoder; the Bayesian decoders are deterministic, so repeating the cross-validation procedure is not necessary.

When decoding rat locations in the 2D arena, prediction errors were quantified by the mean Euclidean distance (MED) between the predicted and true positions:
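The formula itself did not survive extraction; for n validation samples with predicted coordinates (x̂_t, ŷ_t) and true coordinates (x_t, y_t), the mean Euclidean distance presumably takes the standard form:

```latex
\mathrm{MED} = \frac{1}{n}\sum_{t=1}^{n}\sqrt{\left(\hat{x}_t - x_t\right)^2 + \left(\hat{y}_t - y_t\right)^2}
```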

The training procedure for recurrent neural networks is stochastic, potentially resulting in different solutions with the same starting conditions. We repeat the 10-fold cross-validation 10 times, giving us 10 independent predictions for each data point. We report the average of errors over these 10 realizations (and not the error of the averaged prediction).

For evaluating the

In an additional experiment, we also decoded the rats’ locations on a 600 cm long Z-shaped track. The position of the rat along the track is treated as a 1D coordinate ranging from 0 at one end of the track to 600 at the other end of the Z shape. To obtain these 1D coordinates, the actual locations extracted from camera images are projected onto the nearest point of an idealized Z-shaped trajectory. The prediction error of the model is quantified by the absolute distance between the predicted and true positions along this 1D coordinate.
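A possible implementation of this projection is sketched below (the corner coordinates of the example track are hypothetical, since the real track geometry is not given in the text; the function name is our own):

```python
import numpy as np

def project_to_track(p, segments):
    """Project a 2D point onto a polyline track and return the 1D
    coordinate: the distance along the track to the nearest point.

    segments : ordered list of (start, end) 2D points defining the
        idealized trajectory.
    """
    best_d, best_s = np.inf, 0.0
    run = 0.0  # track length accumulated before the current segment
    for a, b in segments:
        a, b = np.asarray(a, float), np.asarray(b, float)
        ab = b - a
        L = np.linalg.norm(ab)
        # fraction along the segment of the perpendicular foot, clipped
        # to the segment's endpoints
        t = np.clip(np.dot(p - a, ab) / (L * L), 0.0, 1.0)
        foot = a + t * ab
        d = np.linalg.norm(p - foot)
        if d < best_d:
            best_d, best_s = d, run + t * L
        run += L
    return best_s

# Hypothetical Z-shaped track: two horizontal arms and a diagonal.
z = [((0, 0), (200, 0)), ((200, 0), (0, 200)), ((0, 200), (200, 200))]
```

A camera position of (100, 1), lying just off the first arm, maps to a 1D coordinate of 100 cm along this example track.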

For the knockout analyses we set the activity of the neuron under consideration to zero in all validation data points and then calculated the validation errors. The activity is not annulled during training of the model, so the system cannot learn to compensate or adapt. We repeated this knockout procedure for each neuron in turn.
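The procedure can be sketched as follows (a hypothetical helper that works with any predict function mapping input sequences to coordinates; this is not the authors' implementation):

```python
import numpy as np

def knockout_errors(predict_fn, X_val, y_val):
    """Knockout sensitivity without retraining.

    predict_fn : maps inputs (n_samples, seq_len, n_neurons) to
                 predicted coordinates (n_samples, 2).
    For each neuron, its spike counts are zeroed in the validation
    inputs only, and the mean Euclidean error is recomputed; the
    larger the increase over the baseline, the more influential
    that neuron is for the model.
    """
    def mean_err(X):
        pred = predict_fn(X)
        return np.linalg.norm(pred - y_val, axis=1).mean()

    baseline = mean_err(X_val)
    errors = np.empty(X_val.shape[2])
    for n in range(X_val.shape[2]):
        Xk = X_val.copy()
        Xk[:, :, n] = 0.0  # silence neuron n in validation data only
        errors[n] = mean_err(Xk)
    return baseline, errors
```

Ranking neurons by `errors - baseline` then reproduces the sensitivity ordering described in the Results.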

The gradient of the loss function with respect to inputs was calculated using back-propagation through time [

When investigating the relationship between sensitivity and location on the place field (on

Movement speed is based on the distance covered in 200 ms. The first bar is the average over errors for speeds in the range [0, 0.5] cm/s, the second for (0.5, 1.5] cm/s, etc. The error is highest when the rat is not moving or moving very slowly. Notice that speeds in the range of 1-2 cm/s can also be the result of head movements. At higher speeds the exact velocity does not seem to influence accuracy. Note that the bars do not contain the same number of data points; apparent changes in mean error at higher velocities can be attributed to noise, as we have fewer data points there.

(TIF)

Top row: colouring reflects whether the data point’s true location is near a wall. Bottom row: points are colored by the X-coordinate of the data point’s true location and sized according to the Y-coordinate. The left column illustrates how the coloring schemes look on the true XY-coordinates. The right column shows the schemes applied to the activations of nodes in the second layer of the RNN model at t = 100, reduced to 2D by t-SNE. Notice that data points with similar true locations are also nearby in activity space.

(TIF)

Top row: data points are colored according to instantaneous speed. Bottom row: data points are colored according to the instantaneous direction of movement. The left column shows the correspondence between true location and speed or direction. In the right column the colouring schemes are applied to the activations of nodes in the second layer of the RNN model at t = 100, reduced to 2D by t-SNE. There is no visible correspondence between the RNN activations and either speed or direction.

(TIF)

a-e) Temporal profiles of relative importance for 5 of the highest-contributing neurons according to the gradient analysis. Notice that the profiles peak at different time steps. f) Temporal profile of the least important neuron according to the gradient analysis.

(TIF)

(PDF)

(PDF)

We thank Zurab Bzhalava, Sander Tanni and Jaan Aru for early work and useful discussions. We also thank Jack Kelly for useful discussions.