Efficient and robust coding in heterogeneous recurrent networks

Cortical networks show a large heterogeneity of neuronal properties, yet traditional coding models have focused on homogeneous populations of excitatory and inhibitory neurons. Here, we analytically derive a class of recurrent networks of spiking neurons that near-optimally track a continuously varying input online, based on two assumptions: 1) every spike is decoded linearly, and 2) the network aims to minimise the mean-squared error between the input and the estimate. From this we derive a class of predictive coding networks that unifies encoding and decoding, and in which we can investigate the difference between homogeneous networks and heterogeneous networks, in which each neuron represents different features and has different spike-generating properties. We find that in this framework, ‘type 1’ and ‘type 2’ neurons arise naturally, and that networks consisting of a heterogeneous population of different neuron types are both more efficient and more robust against correlated noise. We make two experimental predictions: 1) integrators show strong correlations with other integrators and resonators with other resonators, whereas correlations between neurons with different coding properties are much weaker, and 2) ‘type 2’ neurons are more coherent with the overall network activity than ‘type 1’ neurons.


•
The argument rests on a model of neurons that produce spikes that perfectly minimise error under linear decoding. I think this is an interesting approach, but it might be nice to acknowledge the limitations of that approach and discuss to what extent it can be implemented more practically.
o We have addressed this in the Discussion.

• How robust are the results to the choice of efficiency measure? I can see the logic in dividing by the number of spikes to get a measure of some sort of accuracy per spike, but 1/MSE as a measure of accuracy doesn't seem totally obvious to me. For example, you might equally argue that there is a cost for each inaccuracy and a cost for each spike, and that you want to minimise the sum of those costs, i.e. minimise a*num_spikes + b*MSE for some a, b. It would be nice to see that the conclusions about efficiency are robust to the measure chosen here, in the absence of any obvious standard measure.

o Indeed, we agree with the reviewer, and we have discussed this. It was actually hidden in a footnote: at very low amplitudes the error is high because the filters are larger than the stimulus, so Gamma is high. The result depends somewhat on the efficiency measure used (see the supplemental figures), but there does indeed appear to be an optimal Gamma.

o We are sorry, but does the reviewer mean that, by this reasoning, the maximal trial-to-trial variability is expected at a stimulus amplitude of # neurons/2 (given a filter amplitude of 1, since then there would be the optimal number of choices of which neuron can spike)? We suppose that is true, and following this reasoning, that would be around an amplitude of about 20 (for 50 'positive' neurons with filters slightly higher than 1). However, this does not take into account the temporal aspects of the filters, which complicate things: there is 'leftover' stimulus estimate from previous spikes, and a filter has both positive and negative parts. So we agree that this argument serves more to build intuition than as a complete account, and we have made a comment about that in the text.
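The reviewer's alternative cost a*num_spikes + b*MSE can be checked against the accuracy-per-spike measure on toy numbers. The sketch below is only an illustration (the network names, spike counts, MSE values, and weights a, b are all made up, not taken from the paper); the point is simply that one can test whether both measures produce the same ranking of networks:

```python
import numpy as np

# Hypothetical outcomes for three networks: (number of spikes, MSE).
# These numbers are invented for illustration only.
networks = {
    "homogeneous":   (1200, 0.040),
    "heterogeneous": (950, 0.035),
    "random":        (1500, 0.060),
}

def efficiency_per_spike(n_spikes, mse):
    # accuracy per spike: 1 / (MSE * number of spikes); higher is better
    return 1.0 / (mse * n_spikes)

def total_cost(n_spikes, mse, a=1e-3, b=10.0):
    # reviewer's alternative: weighted sum of spike cost and error cost;
    # lower is better (a and b are arbitrary example weights)
    return a * n_spikes + b * mse

for name, (n, mse) in networks.items():
    print(f"{name}: eff={efficiency_per_spike(n, mse):.4f}, "
          f"cost={total_cost(n, mse):.3f}")
```

In this toy example both measures favour the same network, but that agreement is a property of the chosen numbers and weights, which is exactly why robustness to the measure is worth reporting.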

•
The kernel is added to the threshold of neuron m every time neuron m fires a spike, as given in equation (7).
o The index m had been dropped from this equation; we hope it is clearer now. Adding a neuron-specific temporal increase to the threshold, instead of just an overall one, has the advantage that the activity gets 'spread' over multiple neurons; we have added this to the text.
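To illustrate the spreading effect of a neuron-specific threshold kernel, here is a minimal toy sketch (our own construction, not the paper's derived network; the kernel shape, time constant, and one-spike-per-step rule are all assumptions). Each time neuron m fires, a decaying kernel is added to neuron m's own future threshold, so under identical drive the spikes rotate over the population instead of being emitted by a single cell:

```python
import numpy as np

n_neurons, n_steps = 5, 200
tau = 20.0                                   # assumed kernel decay constant
eta = 0.5 * np.exp(-np.arange(50) / tau)     # assumed kernel shape

drive = np.full(n_neurons, 1.0)              # identical constant drive
threshold = np.ones((n_neurons, n_steps + len(eta)))
spikes = np.zeros((n_neurons, n_steps), dtype=int)

for t in range(n_steps):
    for m in range(n_neurons):
        if drive[m] >= threshold[m, t]:
            spikes[m, t] = 1
            # add the kernel to neuron m's own future threshold (cf. eq. 7)
            threshold[m, t + 1 : t + 1 + len(eta)] += eta
            break  # at most one spike per time step in this toy scheme

print(spikes.sum(axis=1))  # spike count per neuron
```

Because the kernel raises only the spiking neuron's own threshold, each spike hands the next spiking opportunity to another neuron, and the total activity ends up distributed evenly across the population.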
• We included figures 5 and 6, as these are measurements (signal and noise correlations) that are often done in experiments. Our main argument is exactly that it is difficult to draw definitive conclusions from such measurements, especially on the timescales on which they are measured experimentally. We agree that this is not a conclusive prediction (because such measurements could be in agreement with multiple frameworks or networks). We have introduced the topic differently to make this clearer.
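For readers less familiar with these measurements, a small sketch of the two standard definitions (assumed here; the data are synthetic and not from the paper): signal correlation is the correlation of trial-averaged responses across stimuli, while noise correlation is the correlation of trial-to-trial fluctuations around those averages:

```python
import numpy as np

rng = np.random.default_rng(2)
n_stimuli, n_trials = 40, 25

# assumed toy tuning curves for two neurons with partly shared selectivity
tuning_a = rng.standard_normal(n_stimuli)
tuning_b = 0.8 * tuning_a + 0.2 * rng.standard_normal(n_stimuli)

# trial-by-trial responses with a shared noise source
shared = rng.standard_normal((n_stimuli, n_trials))
resp_a = tuning_a[:, None] + 0.7 * shared + 0.3 * rng.standard_normal((n_stimuli, n_trials))
resp_b = tuning_b[:, None] + 0.7 * shared + 0.3 * rng.standard_normal((n_stimuli, n_trials))

# signal correlation: correlate the trial-averaged tuning of the two neurons
signal_corr = np.corrcoef(resp_a.mean(axis=1), resp_b.mean(axis=1))[0, 1]

# noise correlation: correlate the residual fluctuations around each
# stimulus's mean response
noise_corr = np.corrcoef(
    (resp_a - resp_a.mean(axis=1, keepdims=True)).ravel(),
    (resp_b - resp_b.mean(axis=1, keepdims=True)).ravel(),
)[0, 1]

print(f"signal corr: {signal_corr:.2f}, noise corr: {noise_corr:.2f}")
```

In this toy example both correlations are high by construction; the point above is precisely that many different underlying networks can produce similar values of these two summary statistics.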
### Minor issues

1. The authors note that in deep learning, causal filters are needed because of the layered structure and the direction of information flow. In my view, causal filters are also needed for a recurrent network. Since the authors are proposing a mechanistic model of neural coding, and not a statistical description, causality seems important for recurrent networks.
• Indeed, our network mechanism is purely causal, i.e. each spike influences only the future. However, in post-hoc data analysis, acausal filters are often used as descriptors of neurons (e.g. a spike-triggered average). The text has been adapted to also include recurrent networks.
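To make the causal/acausal distinction concrete, a minimal sketch (our own toy spike rule, not the paper's model): a spike-triggered average computed over a window both before and after each spike is an acausal *descriptor*, even though the mechanism generating the spikes only ever acts forward in time:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10_000
stimulus = rng.standard_normal(T)

# toy spike rule (an assumption): spike whenever the centered 5-sample
# running average of the stimulus exceeds a threshold
running = np.convolve(stimulus, np.ones(5) / 5, mode="same")
spike_times = np.where(running > 1.0)[0]

half = 20  # window extends 20 samples before AND after each spike
sta = np.zeros(2 * half + 1)
count = 0
for t in spike_times:
    if half <= t < T - half:
        sta += stimulus[t - half : t + half + 1]
        count += 1
sta /= count

# sta[:half] describes the stimulus *before* spikes (causal part);
# sta[half + 1:] describes the stimulus *after* spikes, the acausal part
# of the descriptor, which no mechanistic spike could have caused.
```

Here the STA peaks around the spike time, and any structure at positive lags reflects the statistics of the stimulus and the analysis window, not a backward-acting mechanism.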
2. I find some of the definitions slightly confusing: 1) First, the authors define reliability as the similarity between spike trains. Naturally, one expects reliability to depend on the readout. In that case, a network that can produce the same readout with low error from very different spike trains is more reliable. Counterintuitively, the 'reliability' of the network as defined here is very prone to noise. I feel that 'consistency' is a more appropriate term. If the authors choose to keep the term 'reliability', they may want to emphasize that difference.
• We understand the confusion; we now call it 'spike reliability'.
2) The authors do not normalize the efficiency by the signal amplitude, so high-amplitude signals with high activity yield low efficiency (which they note in the Results). This runs counter to the intuitive meaning of efficiency, and I think it should be pointed out.