
Conceived and designed the experiments: MB SD. Performed the experiments: MB. Analyzed the data: MB. Wrote the paper: MB SD.

The authors have declared that no competing interests exist.

Compelling behavioral evidence suggests that humans can make optimal decisions despite the uncertainty inherent in perceptual or motor tasks. A key question in neuroscience is how populations of spiking neurons can implement such probabilistic computations. In this article, we develop a comprehensive framework for optimal, spike-based sensory integration and working memory in a dynamic environment. We propose that probability distributions are inferred spike-per-spike in recurrently connected networks of integrate-and-fire neurons. As a result, these networks can combine sensory cues optimally, track the state of a time-varying stimulus and memorize accumulated evidence over periods much longer than the time constant of single neurons. Importantly, we propose that population responses and persistent working memory states represent entire probability distributions and not only single stimulus values. These memories are reflected by sustained, asynchronous patterns of activity which make relevant information available to downstream neurons within their short time window of integration. Model neurons act as predictive encoders, firing only those spikes that account for new information which has not yet been signaled. Thus, spike times deterministically signal a prediction error, in contrast to rate codes, in which spike times are considered random samples of an underlying firing rate. As a consequence of this coding scheme, a multitude of spike patterns can reliably encode the same information. This results in weakly correlated, Poisson-like spike trains that are sensitive to initial conditions but robust to even high levels of external neural noise. This spike-train variability reproduces that observed in cortical sensory spike trains but cannot be equated with noise. On the contrary, it is a consequence of optimal spike-based inference. In contrast, we show that rate-based models perform poorly when implemented with stochastically spiking neurons.

Most of our daily actions are subject to uncertainty. Behavioral studies have confirmed that humans handle this uncertainty in a statistically optimal manner. A key question then is what neural mechanisms underlie this optimality, i.e. how can neurons represent and compute with probability distributions. Previous approaches have proposed that probabilities are encoded in the firing rates of neural populations. However, such rate codes appear poorly suited to understand perception in a constantly changing environment. In particular, it is unclear how probabilistic computations could be implemented by biologically plausible spiking neurons. Here, we propose a network of spiking neurons that can optimally combine uncertain information from different sensory modalities and keep this information available for a long time. This implies that neural memories not only represent the most likely value of a stimulus but rather a whole probability distribution over it. Furthermore, our model suggests that each spike conveys new, essential information. Consequently, the observed variability of neural responses cannot simply be understood as noise but rather as a necessary consequence of optimal sensory integration. Our results therefore question strongly held beliefs about the nature of neural “signal” and “noise”.

Our senses furnish us with information about the external world that is ambiguous and corrupted by noise. Taking this uncertainty into account is crucial for a successful interaction with our environment. Psychophysical studies have shown that animals and humans can behave as optimal Bayesian observers, i.e. they integrate noisy sensory cues, their own predictions and prior beliefs in order to maximize the expected outcome of their actions

Several theoretical investigations have explored the neural mechanisms that could underlie such probabilistic computations

However, most of these studies neglect a crucial dimension of perception: time. Most sensory stimuli vary dynamically in a natural environment, which requires sensory representations to be constructed, integrated and combined on-line

The problem is even more acute if the decision is delayed relative to the presentation of sensory information. Sensory variables such as the direction of motion of a stimulus can be retained in “working memory” for significant periods of time even in the absence of sensory input. Neural correlates of this working memory appear as persistent neural activity in parietal and frontal brain areas and exhibit firing statistics similar to those found for sensory responses

Here, we approach these issues by using a new interpretation of population coding in the context of temporal sensory integration. We consider spikes, rather than rates, as the basic unit of probabilistic representation. We show how recurrent networks of leaky integrate-and-fire neurons can construct, combine and memorize probability distributions of dynamic sensory variables. Spike generation in these neurons results from a competition between an integration of evidence from feed-forward sensory inputs and a prediction from lateral connections. A neuron therefore acts as a “predictive encoder”, only spiking if its input cannot be predicted by its own or its neighbors' past activity.

We demonstrate that such networks integrate and combine sensory inputs optimally, i.e. without losing information, and track the stimulus dynamics spike-per-spike even in the absence of sensory input, over timescales much longer than the neural time constants. This framework thus provides a first comprehensive theory for optimal

Similar to cortical sensory neurons, model neurons respond with sustained, asynchronous spiking activity. Spike times are variable and uncorrelated, despite the deterministic spike generation rule. However, in contrast to rate codes, each spike “counts”. The trial-to-trial variability of spike trains does not reflect an intrinsic source of noise that requires averaging, but is a consequence of predictive coding. While spike times are unpredictable at the level of a single neuron, they deterministically represent a probability distribution at the level of the population. This leads us to reinterpret the notions of signal and noise in cortical neural responses.

In order to clarify the presentation, we will concentrate on the following general task. Imagine a cat chasing a mouse in your garden. The cat integrates auditory and visual information to locate the mouse. It will combine these cues according to their reliability. If for instance the mouse is partially covered by a bush, i.e. there is a high uncertainty associated with the visual cue, the cat will give a higher weight to its auditory information. If the mouse suddenly disappears behind a tree and cannot be heard or seen anymore, the cat should estimate the likely trajectory of the mouse in the absence of any relevant sensory input, in order to anticipate where the mouse is going to reappear. Finally, this information will need to be extracted when the cat eventually decides to catch the mouse.

The cat's task can thus be divided into three parts (

(A) Illustration of the network task. An auditory and a visual cue (cue 1 and 2) about a dynamic stimulus (e.g. the position of a mouse) are integrated and combined during the integration period. During the memory period, this information is kept available such that it can be read out over a timescale of order

We assume that the dynamic stimulus

Visual and auditory inputs are provided by two independent populations of neurons on two input layers, a “visual” layer and an “auditory” layer. Input neurons respond to position
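As a hedged illustration of such an input stage, the sketch below draws independent Poisson spikes from bell-shaped tuning curves on a ring of preferred positions. The peak rate, tuning width and population size are assumed values, not the paper's exact parameters; the gain factor `g` stands in for a multiplicative scaling of cue reliability.

```python
import numpy as np

rng = np.random.default_rng(0)

def input_spikes(stim, n_neurons=50, g=1.0, dt=1e-3, r_max=50.0, width=0.3):
    """One time step of Poisson spiking for a ring of tuned input neurons.

    stim      : stimulus angle in radians
    n_neurons : size of the input layer (assumed value)
    g         : gain factor scaling cue reliability (hypothetical)
    dt        : time step in seconds
    r_max     : peak firing rate in Hz (assumed value)
    width     : tuning width parameter (assumed value)
    """
    phi = np.linspace(0, 2 * np.pi, n_neurons, endpoint=False)  # preferred stimuli
    rates = g * r_max * np.exp((np.cos(stim - phi) - 1) / width)  # bell-shaped tuning
    return rng.random(n_neurons) < rates * dt  # Bernoulli approximation of Poisson firing
```

Neurons whose preferred stimulus matches the presented position fire at the peak rate, while distant neurons remain nearly silent, which is all the downstream inference machinery requires of the input code.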

The two sensory input layers converge onto a recurrently connected output layer (

If this equation holds, the output neurons are said to encode the stimulus “optimally”.

This decoder defines how the posterior probability is represented on-line (i.e. within time constant

The coding strategy for the output layer is chosen for self-consistency, i.e. it ensures that

We now derive the dynamics of the output neurons that will ensure that equation (4) holds approximately.

In a first step, we need to know what an ideal observer, i.e. an observer that performs optimal inference on the input spikes, would know about the stimulus. We denote it as

With the assumptions made in the previous section, we can derive an expression for the ideal observer of the stimulus in the limit of small

The ideal observer performs a linear integration of the input spikes weighted by the kernels of their likelihoods. The term

Output spike trains shall be generated such that the output read-out,

We propose a spike generation criterion that minimizes the mean squared distance between

This criterion ensures that neurons only fire spikes to account for new information about the stimulus that has not previously been reported by their own or their neighbors' activity. Avoiding spike redundancies minimizes the metabolic cost of the code and increases the independence among output spikes.
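The criterion can be sketched as a greedy rule: a neuron fires only if adding its kernel to the current read-out decreases the squared distance to the ideal-observer trace. The arrays below (target trace, read-out, kernels) are hypothetical placeholders for the quantities defined in the text.

```python
import numpy as np

def greedy_spike(target, readout, kernels):
    """Predictive-coding spike rule (sketch): a neuron fires only if adding its
    kernel to the current read-out reduces the squared distance to the target.

    target  : ideal-observer trace to be represented (array over stimulus space)
    readout : current decoded trace
    kernels : (n_neurons, n_space) array; kernels[i] is added when neuron i spikes
    Returns the index of the spiking neuron, or None if no spike reduces the error.
    """
    err = np.sum((target - readout) ** 2)
    # squared error after a candidate spike from each neuron
    errs = np.sum((target - readout - kernels) ** 2, axis=1)
    best = int(np.argmin(errs))
    return best if errs[best] < err else None
```

Expanding the squares in this condition turns it into a threshold test on the projection of the residual onto each kernel, which is what makes the rule implementable by local integrate-and-fire dynamics, as developed in the text.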

In contrast to other error measures such as the Kullback-Leibler divergence, the squared distance results in a local integrate-and-fire spike generation rule. Indeed, let us now define the “membrane potential”

Output neurons integrate input spikes with feed-forward weights

The slow currents

The weights

An output neuron receives inputs through fast feed-forward connections (

Averaged currents to a neuron with a preferred stimulus of 180 deg as a function of the presented stimulus location. (A) Currents during the integration period. Feed-forward input currents (blue) are excitatory for stimuli similar to the preferred stimulus of the neuron and inhibitory otherwise. The sum of fast and slow recurrent currents (red-green dashed line) follows an inverted profile of similar magnitude that counteracts the effect of the feed-forward input. The leak current (magenta) is small in magnitude compared to the synaptic currents. (B) Currents during the memory period. Feed-forward inputs are equal to zero. The individual lateral currents are enhanced with respect to the integration period. However, their total sum (red-green dashed line) is balanced and close to zero (see also the black dashed line in C). (C) Total currents (including leak) during the integration period (solid line) and during the memory period (dashed line). In both cases, the contributions of individual currents balance each other out such that the total current is small, slightly excitatory among neurons whose preferred stimuli are similar to the presented stimulus and inhibitory otherwise. The two maxima of the current during the memory period are due to the non-linear component of the slow recurrent currents (

Example contributions of the different currents are depicted in

Slow recurrent connections, on the other hand, have two distinct roles. First, they “reintroduce” information that has leaked out, hence making past information available within the time window of integration of the decoder. It is this short-range slow excitation and long-range slow inhibition, mediated by the recurrent connections

Altogether, spike generation in our model is deterministic and results from a competition between an integration of evidence from feed-forward and slow lateral inputs,

Finally, we assumed for the sake of simplicity that the same output neuron can both excite and inhibit different target neurons, which is clearly not realistic. A more realistic model can be constructed by using one purely excitatory neuron and another purely inhibitory neuron for each output kernel.

The free parameters of our model are the leak

The kernel

These output kernels do not necessarily need to be known in advance by the decoder, or any other neural structure extracting information about

This relationship holds if the spiking likelihood of the output neurons lies in the exponential family with linear sufficient statistics

Similarly, the leak

Let us briefly go back to our example of the cat and the mouse and say that the cat is looking around to find a mouse to chase. Even in the absence of the mouse, the cat's belief about where the mouse is likely to appear is not uniform. The cat might for instance know that there is a family of mice living in a specific bush. It will then base its search mainly on the area around that bush. In other words, the cat has a strong prior belief about where mice are likely to appear.

The prior belief corresponds to the initial value of the log posterior,

If the stimulus includes a diffusive component, the slow current

(A) Input and output spike trains on a single trial. A stimulus with constant drift and diffusion is presented for 500 ms (gray area). (B) Time evolution of the stimulus posterior for the ideal observer (blue) and the network read-out (red). Thick lines show the mean of the posterior and narrow lines the corresponding width. The stimulus trajectory is shown in black. The dashed black line indicates the predictable (drift) part of the stimulus that the network is tracking during the memory period. (C) Snapshots of the posteriors, from left to right: after 500 ms (end of the integration period), after 2000 ms and after 5000 ms. (D) Coding performance measured as the standard deviation of the stimulus estimate

We illustrate the network dynamics and model predictions using the general task outlined in

The response of the decoder closely matches the performance of an ideal observer (

Slow currents

The network implements Bayesian inference and therefore combines visual and auditory cues optimally, weighting each sensory cue according to its accuracy. To illustrate this point, we plot the performance of the network in a bimodal case in which both input cues encode the stimulus with equal accuracy and two “unimodal” cases in which one of the inputs represents the stimulus much more accurately than the other. The accuracy of the sensory input was changed by multiplying the corresponding input tuning curves by a constant

(A) Estimation accuracy for different reliabilities of the input cues: both input cues are equally reliable (bimodal) or one cue is more reliable than the other (cue 1 and cue 2). In each subgroup, bars depict from left to right the encoding accuracy of: cue 1, cue 2, the ideal observer, the network at the end of the integration period and the network after one second in the memory period. (B) Biasing effect of the prior measured as the difference between the real and the estimated stimulus,

For the same reason, the network takes prior information into account accurately.

The presentation of a stimulus

(A) Post-stimulus time histogram (PSTH) of the output activity in response to a stimulus with constant diffusion. Color indicates firing rates in Hz. The stimulus (magenta line) is presented during the first 500 ms. (B) Tuning curves of a sample neuron. Spikes are counted in 10ms bins centered at 50 ms (black), 200 ms (blue) and 500 ms (red) during the integration period and at 550 ms (green) and 2500 ms (magenta) during the memory period. (C) Traces of the average firing rate of a neuron whose preferred stimulus lies around the peak of the bump of activity. Different curves depict different levels of Fisher information in the input population codes: reference information,

The integration of sensory evidence and its maintenance in working memory is reflected by the instantaneous firing rates of the output neurons.

Firing rates during the memory period have a lower baseline activity but similar tuning as during the integration period. Over time, tuning curves and population activity decrease, broaden and eventually disappear (

However, the behavior of the network is different in the absence of diffusion. The network is then able to maintain information about the stimulus over very long timescales, reflected by a neutrally stable bump of activity (

This implies that our working memory model differs from previous models that are based on line attractor dynamics

In particular, our network can maintain multi-modal posterior distributions reflected in multi-modal patterns of activity.

Two static stimuli (red lines) are consecutively presented to the network for 350 ms each. They are separated by a delay time interval of one second. Their spatial distance is (A) 180 deg, and (B) 45 deg. Top row: Spike trains on a single trial. Bottom row: Time evolution of the unnormalized log posterior (gray scale representation). The simulated network contains 200 instead of 50 neurons for better visual clarity.

The resulting output spike trains are asynchronous and spike times are not reproducible from trial to trial. They exhibit properties very similar to Poisson processes. Thus, the interspike interval (ISI) distributions of the output spike trains are quasi-exponential in both the integration and memory periods (

The sensory stage in our model is noisy, reflected by the Poisson firing of the input neurons. In contrast, output neurons generate spikes deterministically. Despite this fact, their spike trains resemble independent Poisson processes. This is true even during the memory period, when the network activity is self-sustained and no noise is introduced by the external inputs. This eliminates the possibility that the output statistics are directly inherited from the Poisson-distributed feed-forward inputs and raises the question of where this variability comes from. In particular, can the responses of the network be considered to obey the predictions of a rate model?
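Poisson-likeness of a recorded spike train can be quantified with two standard statistics: the coefficient of variation (CV) of the interspike intervals and the spike-count Fano factor, both of which equal 1 for a Poisson process. A minimal analysis sketch (the counting window is an arbitrary choice):

```python
import numpy as np

def isi_statistics(spike_times, window=0.1):
    """Quantify Poisson-likeness of a spike train (sketch).

    Returns (cv, fano): CV of the interspike intervals and the Fano factor
    of spike counts in windows of the given duration (seconds). Both are 1
    for a homogeneous Poisson process and 0 for a perfectly regular train.
    """
    spike_times = np.sort(np.asarray(spike_times))
    isis = np.diff(spike_times)
    cv = isis.std() / isis.mean()
    edges = np.arange(spike_times[0], spike_times[-1], window)
    counts, _ = np.histogram(spike_times, edges)
    fano = counts.var() / counts.mean()
    return cv, fano
```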

We hence investigate the origin and role of this variability using two approaches: a perturbation approach, to study the dependency of output spike trains on initial conditions; and a decoding approach, in which we study how well the spike train of an output neuron can be predicted from the activity of the other neurons in the population.

(A) Output spike trains for two runs (blue and red) of activity starting with the same initial conditions. The red run is perturbed by the injection of one extra spike (shown by the red arrow). (B) Time course of the posterior of the two runs. (C) PSTH of the control (blue) and the perturbed (red) runs. The extra spike is injected at

The encoding properties of the output neurons are thereby not affected. The decoded posterior still matches the ideal observer closely (

We observed the same characteristics if a single output spike fails to be fired. Spike patterns are again completely reshuffled while coding performance is unaffected. Moreover, our model is robust to even frequent spike generation failures. The reason lies in the error correcting property of the code. If spike generation fails, it is compensated for by a spike from another neuron that adds a similar kernel to the posterior, as illustrated in

Let us first assume that we record from the entire population of output neurons. We want to know how well the spike times of a single neuron

We can predict the spike times of neuron

Let us now suppose that we record (more realistically) from a subpopulation of

As shown in

We have previously seen that our network is robust to small perturbations and spike generation failure. We are now going to show that it is also robust to synaptic noise. Synaptic background noise is a prominent source of neural noise

(A) Coding performance of the network in the presence of synaptic background noise. The vertical axis plots the percentage excess of the standard deviation of the stimulus estimator above its optimal value. Results are reported for percentage decreases in the signal-to-noise ratio, SNR = mean(input)/std(input), of 0% (black), 20% (blue), 50% (red) and 100% (green). A static stimulus is presented during the first 500 ms (gray area). (B) Coding performance of a stochastic network for different output gains:

This robustness to even large levels of synaptic noise is another consequence of the error-correcting property of the code. Synaptic noise will lead neurons to reach their firing threshold even if their kernel does not decrease the mean squared distance between

For a similar reason, our network is robust to changes in the connection strengths between neurons. Scaling all recurrent synapses by

Despite its deterministic nature, our model exhibits firing statistics comparable to a rate model with independent Poisson noise, for which spike times do not carry information. The question thus arises whether we could implement the same computations equally efficiently with stochastically generated spikes. In particular, if we consider biological networks with thousands of neurons, averaging responses from large populations of neurons might render the contribution of each spike unimportant. In this case, spike-based and rate-based approaches might become equivalent. In the following, we show that this is not the case. A deterministic spike generation rule is crucial for efficient information transfer even in very large networks.

To show this, we started by implementing a version of the probabilistic population code of Ma et al.

We now examine the consequence of firing spikes stochastically with rate

A neural system clearly cannot afford to spend 15 times more resources at each processing stage. Moreover, this cost of stochastic spike generation does not decrease with the size of the input and output neural populations. In the limit of large numbers of neurons/spikes, the variance of the stochastic network estimate approaches

In this article, we have revisited population coding with spiking neurons in the context of dynamic stimuli. Starting from first principles, we have demonstrated that networks of laterally coupled integrate-and-fire neurons can integrate and combine sensory information about a dynamic stimulus in close approximation to an ideal observer. In the absence of sensory input, these networks either represent the stimulus prior probability in their spontaneous activity before stimulus onset or they represent a working memory of the inferred stimulus posterior in their sustained activity after integration. These memories thereby keep tracking the underlying stimulus dynamics.

An important innovation of our model is that it encodes working memories representing an entire stimulus distribution rather than only a single stimulus value. It thereby distinguishes itself from other working memory models in the literature. Most working memory models are bi-stable attractor models

We propose that cortical neurons are primarily predictive encoders rather than stochastic spike generators. Integrate-and-fire dynamics, together with a competition between neurons, allow only the generation of spikes that contain new information about the stimulus, i.e. information that has not yet been signaled by the neural population. Each spike therefore carries a precise meaning. As a consequence of the above-mentioned properties, small networks of only tens of neurons can encode stable memories. Persistent, asynchronous memory states are notoriously difficult to achieve with small networks of integrate-and-fire neurons. Our model, on the other hand, is largely free of laborious fine-tuning. It provides a functional interpretation of parameters such as lateral connections and synaptic dynamics, and could be used as a guideline to find optimal parameters in biophysically plausible networks. For instance, the slow currents

In our framework, prior beliefs correspond to setting the network into an initial state

Another important aspect of our approach concerns its interpretation of neural variability. Traditional population coding approaches clearly separate “signal”, encoded in rate modulation, and “noise”, encoded in the spike count variance. Rate models, such as linear-nonlinear Poisson (LNP) neurons

Our approach provides an alternative account for the origin of neural variability observed in cortical networks. Stochastic firing is not a good description of noise in single neurons

It might appear paradoxical to assume input neurons corrupted by Poisson noise while using perfectly deterministic output neurons. However, input noise in our model is meant to represent unavoidable sources of sensory noise, such as the stochasticity of our sensors in the first signal transduction stages (e.g. thermodynamic/quantum-mechanical noise in the photoreceptors). This initial noise sets a bound on how much information is available for further processing stages. We used population codes with independent Poisson noise as inputs for the sake of convenience and because such variability is expected as a consequence of predictive coding. However, the same networks can process any noisy inputs whose log-likelihoods can be computed on-line. Indeed, our preliminary findings suggest that our model can construct population codes with Poisson-like firing statistics for almost any type of noisy sensory input, including input that is not Poisson, not spiking or not a population code. Consequently, Poisson distributed input in our model does not represent noise in the input neurons but the outcome of previous optimal neural processing of the sensory input.

Our hypothesis can be tested experimentally in cases where one is able to record simultaneously from a significant portion of the population. Since our model assumes a strong level of inter-connectivity and shared input, a population could correspond to a local, relatively small network such as a micro-column, rather than a large and diffuse network containing millions of neurons. Our model predicts that the larger the simultaneously recorded population, the better one can predict individual spike times, using methods described in section “Output spike train statistics”. On the behavioral level, our model predicts that humans should be able to memorize entire probability distributions. This could be tested by a simple cue combination experiment, in which two cues about a stimulus (e.g. a visual and an auditory cue about the location of an object) are presented with a temporal delay. If subjects keep track of the uncertainty associated with the first cue, they should still behave like optimal Bayesian observers when combining information from the two cues after the delay period.

We are not the first authors to propose a spiking network for optimal cue combination and sensory integration. Ma et al.

Other authors have considered log probability codes

Our approach is similar to the “spiking Boltzmann machine” proposed by Hinton and Brown

We assumed that output neurons “know” the parameters of the input noise and stimulus dynamics. Sensory noise, stimulus drift and diffusion are hard-wired in the weights of feed-forward and lateral connections. For the sake of simplicity, we considered simple stimulus dynamics with a constant drift

Here we derive an expression for the ideal observer of the log posterior

The total response

We can use Bayes' rule to write the conditional probability of the stimulus given the past history of activity patterns,

This equation expresses the current posterior stimulus probability as a spatially averaged version of the past stimulus probability, weighted by the current response probabilities and properly normalized by
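This recursion is the prediction-update step of a grid-based Bayesian filter. The sketch below runs it on a discretized circular stimulus space, with drift implemented as a shift, diffusion as a wrapped Gaussian blur, and the response likelihood entering through the log tuning curves; the discretization and the Poisson-likelihood form are assumptions made for illustration.

```python
import numpy as np

def bayes_filter_step(log_post, spikes, log_tuning, drift, diffusion, dt, dx):
    """One prediction-update step of a grid-based ideal observer (sketch).

    log_post   : current (unnormalized) log posterior over the stimulus grid
    spikes     : binary vector of input spikes observed in this time bin
    log_tuning : (n_neurons, n_grid) log tuning curves, i.e. the log-likelihood
                 kernels of the (assumed Poisson) input neurons
    drift, diffusion : parameters of the assumed stimulus dynamics
    """
    post = np.exp(log_post - log_post.max())
    # prediction step: deterministic drift as a circular shift ...
    post = np.roll(post, int(round(drift * dt / dx)))
    # ... and diffusion as convolution with a wrapped Gaussian
    sigma = np.sqrt(2.0 * diffusion * dt)
    if sigma > 0:
        x = np.arange(len(post)) * dx
        d = np.minimum(x, len(post) * dx - x)          # circular distances
        kern = np.exp(-d ** 2 / (2.0 * sigma ** 2))
        post = np.real(np.fft.ifft(np.fft.fft(post) * np.fft.fft(kern / kern.sum())))
    # update step: multiply by the likelihood of the observed spikes
    new_log_post = np.log(np.maximum(post, 1e-300)) + spikes @ log_tuning
    return new_log_post - new_log_post.max()           # unnormalized log posterior
```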

The normalization term,

The response likelihood

Let us now move to the term

Here we derive an approximation to the ideal observer that is implemented by the leaky integrate-and-fire neurons in the output population described in equations (7) and (8) of the main text.

We first introduce a discretization of the stimulus space given by

When inferring the input log posterior,

We now define

A similar equation is found for the second spatial derivative of

Similarly we define

Finally, we can write our approximation to the ideal observer as

For this approximation to work, it is crucial that

We can develop the squares in equation (26) to rewrite the spiking criterion as

We define the left hand side of this equation as the membrane potential

The dynamics of the slow currents

Decoding in our model reduces to a simple leaky integration of output spikes according to equation (3) of the main text. We can either assume that kernel

The two methods give virtually identical results. All results reported in this paper use learned kernels.
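For concreteness, such a read-out by leaky integration can be sketched as follows; the time constant, step size and array shapes are illustrative assumptions rather than the paper's exact values.

```python
import numpy as np

def decode(spike_trains, kernels, tau=0.02, dt=1e-3):
    """Leaky-integration decoder (sketch).

    spike_trains : (n_steps, n_neurons) binary array of output spikes
    kernels      : (n_neurons, n_space) output kernels
    tau          : decoder time constant in seconds (assumed 20 ms)
    Returns the time course of the decoded (unnormalized) log posterior.
    """
    n_steps, n_neurons = spike_trains.shape
    r = np.zeros(n_neurons)                 # leaky traces of the output spike trains
    log_post = np.empty((n_steps, kernels.shape[1]))
    for t in range(n_steps):
        r += -r * dt / tau + spike_trains[t]
        log_post[t] = r @ kernels           # sum of kernels weighted by the traces
    return log_post
```

Each output spike instantaneously adds its neuron's kernel to the decoded trace, which then decays with the decoder's time constant unless refreshed by further spikes.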

On every trial, we measure the mean and variance of the posterior that we decode from the output spike patterns. The estimator of the stimulus mean,

We measure coding accuracy over many trials as the variance,

We will use an indirect measure to assess the predictability of the response of a neuron

The predicted membrane potential depicts the total external “driving force” that neuron

Here we derive an expression for the accuracy with which the stochastic network of section “Comparison to a rate model” can encode the underlying stimulus. The encoding accuracy of this network is limited by two factors: the initial accuracy with which the stimulus is encoded in the input populations and the additional uncertainty that stochastic spike generation adds on top of it.

The input accuracy is determined by the Cramér-Rao bound,

The output neurons in the stochastic network fire Poisson spikes from a rate,

This corresponds to a mean rate

The noise in input spike generation is independent of the noise in output spike generation. The variances of the input and output estimators therefore add up, and we find that the accuracy of an optimal observer of the stochastic output spike trains is given as
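A minimal numerical check of this additivity, using Gaussian surrogates for the two independent noise sources (the noise levels are hypothetical and stand in for the input Cramér-Rao bound and the stochastic spike-generation noise, respectively):

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 200_000
stim = 1.0
sigma_in, sigma_out = 0.1, 0.2                 # hypothetical noise levels

# estimate limited by input noise alone
est_in = stim + sigma_in * rng.standard_normal(n_trials)
# additional, independent noise from stochastic output spike generation
est_out = est_in + sigma_out * rng.standard_normal(n_trials)

# the variances of the two independent noise sources add
assert abs(est_out.var() - (sigma_in ** 2 + sigma_out ** 2)) < 1e-3
```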

The network structure is outlined in

The input kernels are given by the log tuning curves:

The output kernel

Parameters for the stimulus dynamics are

In order to change the reliability of the input cues (for the simulation in

To test the robustness of our network to noise, we add a Gaussian white noise term to the membrane potential:

The differential equations of the membrane potentials are integrated using an Euler method with time step
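A single step of this kind can be sketched as follows; the time constant, step size and noise amplitude are assumed values. The sqrt(dt) scaling of the noise increment (the Euler-Maruyama convention) is what makes the added term behave as Gaussian white noise whose strength does not depend on the integration step.

```python
import numpy as np

def euler_step(V, I_syn, tau=0.02, sigma=0.0, dt=1e-4, rng=None):
    """One Euler(-Maruyama) step of membrane-potential dynamics (sketch).

    Integrates dV = (-V/tau + I_syn) dt + sigma * sqrt(dt) * xi, with
    xi ~ N(0, 1), matching the Gaussian white noise term added to the
    membrane potential in the robustness simulations.
    """
    if rng is None:
        rng = np.random.default_rng()
    noise = sigma * np.sqrt(dt) * rng.standard_normal(np.shape(V))
    return V + (-V / tau + I_syn) * dt + noise
```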

We thank Brian Fischer for his constructive comments.