The authors have declared that no competing interests exist.
Conceived and designed the experiments: AB DBC. Performed the experiments: AB. Analyzed the data: AB. Wrote the paper: AB DBC.
Current Address: Simons Center for Data Analysis, Simons Foundation, New York, New York, United States of America
Neurons must faithfully encode signals that can vary over many orders of magnitude, despite having only limited dynamic ranges. For a correlated signal, this dynamic range constraint can be relieved by subtracting away components of the signal that can be predicted from the past, a strategy known as predictive coding, which relies on learning the input statistics. However, the statistics of natural input signals can also vary over very short time scales, e.g., following saccades across a visual scene. To maintain a reduced transmission cost for signals with rapidly varying statistics, neuronal circuits implementing predictive coding must also rapidly adapt their properties. Experimentally, sensory neurons in different modalities have shown such adaptations within 100 ms of an input change. Here, we show first that linear neurons connected in a feedback inhibitory circuit can implement predictive coding. We then show that adding a rectification nonlinearity to such a feedback inhibitory circuit allows it to automatically adapt and approximate the performance of an optimal linear predictive coding network, over a wide range of inputs, while keeping its underlying temporal and synaptic properties unchanged. We demonstrate that the resulting changes to the linearized temporal filters of this nonlinear network match the fast adaptations observed experimentally in different sensory modalities, in different vertebrate species. Therefore, the nonlinear feedback inhibitory network can provide automatic adaptation to fast-varying signals, maintaining the dynamic range necessary for accurate neuronal transmission of natural inputs.
An animal exploring a natural scene receives sensory inputs that vary rapidly over many orders of magnitude. Neurons must transmit these inputs faithfully despite both their limited dynamic range and their relatively slow adaptation time scales. One well-accepted strategy for transmitting signals through limited-dynamic-range channels–predictive coding–transmits only components of the signal that cannot be predicted from the past. Predictive coding algorithms respond maximally to unexpected inputs, making them appealing for describing sensory transmission. However, recent experimental evidence has shown that neuronal circuits adapt quickly, responding optimally following rapid input changes. Here, we reconcile the predictive coding algorithm with this automatic adaptation by introducing a fixed nonlinearity into a predictive coding circuit. The resulting network automatically “adapts” its linearized response to different inputs. Indeed, it approximates the performance of an optimal linear circuit implementing predictive coding, without having to vary its internal parameters. Further, adding this nonlinearity to the predictive coding circuit still allows the input to be compressed losslessly, allowing for additional downstream manipulations. Finally, we demonstrate that the nonlinear circuit dynamics match responses in both auditory and visual neurons. Therefore, we believe that this nonlinear circuit may be a general circuit motif, applicable in different neural circuits whenever it is necessary to provide an automatic improvement in the quality of the transmitted signal for a fast-varying input distribution.
Early sensory processing faces the challenge of communicating sensory inputs with large dynamic range to the rest of the brain using neurons with limited dynamic range [
A predictive coding circuit attempts to reduce the dynamic range of an input by subtracting a prediction of the current input value–based on past input values–from the actual current input value, and then transmitting only the difference, i.e. the prediction error (
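As a concrete illustration of this operation, the following minimal Python sketch (the function names, weights, and input parameters are illustrative choices of ours, not taken from the model below) transmits only the linear prediction error of an exponentially correlated input:

```python
import numpy as np

def predictive_code(x, w):
    """Transmit only the prediction error; w holds prediction weights over the past."""
    k = len(w)
    errors = np.zeros_like(x)
    for t in range(len(x)):
        past = x[max(0, t - k):t][::-1]           # most recent sample first
        prediction = np.dot(w[:len(past)], past)  # linear prediction from the past
        errors[t] = x[t] - prediction             # the transmitted residual
    return errors

rng = np.random.default_rng(0)
x = np.zeros(10000)
for t in range(1, len(x)):                        # exponentially correlated input
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
err = predictive_code(x, np.array([0.9]))
print(np.var(err) / np.var(x))                    # transmitted power is strongly reduced
```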
Feedforward (a) and feedback (b) predictive coding circuits (as in Eq (
This adaptation of the response filters of a neuron to changing input properties has been explored in the literature [
However, the adaptation of early sensory processing circuits must also be very fast, since the input statistics often vary rapidly, e.g. across a saccade [
How can a linear filter change at a high rate in response to changes in its input statistics? The addition of a time-invariant nonlinearity can allow the construction of a circuit that “instantaneously adapts” its linearized responses to input changes [
Here, we demonstrate that a network of leaky integrator neurons, with a threshold nonlinearity, is, indeed, able to achieve automatic adaptation to changes in the ratio of the predictable component of an input to its unpredictable component. Since noise is, by definition, unpredictable, one specific case of this would be sudden changes to the input SNR. We first show that, for certain stimulus ensembles, linear leaky integrator neurons can implement linear predictive coding through both a feedback and a feedforward inhibitory circuit. We find the parameters that allow these networks to minimize the output dynamic range. Comparing these implementations, we find that the structure of the adaptation in the feedback inhibitory circuit lends itself to the construction of an automatically adapting filter. Specifically, the addition of a biologically-plausible threshold nonlinearity to the feedback inhibitory circuit allows it to approximate the performance of the optimal linear filter over a range of input SNRs.
We compare the responses of a nonlinear predictive coding circuit to available experimental results. The instantaneous changes to the linearized filter of the nonlinear circuit match the fast changes measured to the linear filters of neurons in different sensory modalities, in response to rapid input changes. Hence, our results support the nonlinear feedback inhibitory circuit as a circuit implementation of predictive coding that models the response of early processing in various sensory modalities, facilitating the transmission of rapidly varying, high dynamic range inputs, through slow, low dynamic range neurons.
In the field of adaptive signal processing, predictive coding algorithms have commonly been used for signal compression [
In a general predictive coding algorithm, acting on an input time series {
A crucial property of predictive coding algorithms is that they transmit information losslessly. Specifically, their function is to transmit all the input that they receive, including both signal and noise. This is unlike many other algorithms commonly used in neuroscience, which separate signal from noise. Indeed, as structured in Eq (
To formulate this optimization problem, we define:
1. A class of allowable predictive coding algorithms, within which to identify an optimal algorithm.
2. An input ensemble over which the algorithm is optimized.
3. An optimization metric to measure the algorithm’s performance.
We start by defining the class of
We have written this in the feedforward implementation. However, since the equation is linear, this results in no loss of generality; it can be rewritten recursively to obtain the feedback case.
Eq (
Ideally, one would like to find the optimal filter over the space of natural images. However, given the complexity of this space, we chose to use a subset of such inputs. Natural image amplitudes follow, over time, a power-law distribution over temporal frequencies [
Therefore, we chose an input composed of one such exponentially correlated signal (with a single time constant,
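A minimal sketch of how such an input ensemble can be generated follows (the AR(1) discretization and the parameter values are illustrative assumptions of ours):

```python
import numpy as np

def make_input(n, tau=10.0, snr=4.0, dt=1.0, seed=0):
    """Exponentially correlated signal (time constant tau) plus white noise."""
    rng = np.random.default_rng(seed)
    lam = np.exp(-dt / tau)                       # per-step correlation
    signal = np.zeros(n)
    for t in range(1, n):
        signal[t] = lam * signal[t - 1] + np.sqrt(1.0 - lam**2) * rng.standard_normal()
    noise = rng.standard_normal(n)
    return np.sqrt(snr) * signal + noise          # signal power / noise power = snr
```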
This input provides the ensemble over which we can identify an optimal linear predictive coding filter. We believe that this subset of inputs is naturalistic, since it should be possible to combine several input subsets (constructed with different time constants) to generalize back to the space of natural images.
The final part of the formulation of the optimization problem is a performance metric against which to optimize the filter. Since the goal of applying predictive coding is to reduce the dynamic range required to transmit a signal, a natural measure of performance would be the degree of reduction in the power of the transmitted signal, relative to the input power. We term this the network gain, defined as:
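Schematically, in our notation (with $x_t$ the input to the network and $y_t$ its transmitted output), this metric is the ratio of output power to input power, so that smaller values indicate stronger compression:

$$\text{network gain} \;=\; \frac{P_{\text{output}}}{P_{\text{input}}} \;=\; \frac{\langle y_t^2 \rangle}{\langle x_t^2 \rangle}.$$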
Ideally, any metric of performance, for a compression algorithm, would include both the degree of compression, and a measure of the information lost due to the compression, e.g. reconstruction error. However, as introduced above, predictive coding algorithms encode inputs losslessly. Hence, any reconstruction error is necessarily zero (
Finding the linear filter that minimizes the network gain is a specific example of a common optimization in the adaptive signal processing literature [
Briefly, we compute the power of the filter by transforming Eq (
It is important to note that:
Λ* depends on both the signal and the SNR
Plotting each of these variables, for some values of the input parameters, provides their qualitative structure. First,
(a)
Substituting Eq (
An interesting property of the optimal linear predictive coding algorithm is its structure in the high noise, low signal regime (i.e. low SNR in
Further intuition about the values of the parameters is most useful when applied to specific implementations of this algorithm. Therefore, we first show that it is possible to implement Eq (
A simple model of a biological neuron is a leaky integrator (
Comparing Eq (
This structure can be implemented with different circuits, and we explore both feedforward and feedback two-neuron circuit implementations of the predictive coding algorithm.
The feedforward and feedback implementations of predictive coding (
Networks’ parameters are: (a) Feedforward:
The recursive dynamics of the feedforward circuit (
Comparing Eq (
For the feedback inhibitory network (
Summarizing the dependence of the optimal network parameters on the input statistics (from Eqs (
| Circuit | Interneuron discounting factor | Gain |
| --- | --- | --- |
| Feedforward |  |  |
| Feedback | Γ( |  |
Although the resulting linear prediction-error filter changes in the same way for both circuits, the mechanistic difference between the circuits places different demands on the interneurons. For example, consider the changes in response to increasing input SNR (
Focusing on the interneuron discounting factor: in the feedforward case, the interneuron gets progressively faster as noise decreases, as if the interneuron reduces the time over which it averages the signal to obtain a prediction, given cleaner inputs (with less noise). However, in the feedback case, the interneuron averages over the same time scale, perhaps to provide a matched filter that selects the specific correlated signal. This emphasizes the different roles of the interneuron within each circuit.
Further, this difference suggests that the feedback network may lend itself more readily to the construction of an automatically adapting nonlinear network–to respond to rapidly varying input SNR. One can imagine changing the output from one component of a circuit using a nonlinearity, as is necessary for adaptation of the feedback network. However, it would seem to be quite difficult to vary the time constant of a neuron, a cellular property, using a nonlinearity, as is necessary for adaptation of the feedforward network. Therefore, we now explore the construction of such a nonlinear feedback circuit.
As introduced earlier, a nonlinearity can allow an invariant circuit to automatically change its linearized response to varying inputs [
Our analysis of the optimal linear feedback circuit shows that as
Since inputs of different SNR are integrated differentially by the interneuron, we can define the shape required of the static nonlinearity. Integrating uncorrelated noise is equivalent to a random walk. In contrast, integrating a correlated signal is equivalent to a biased random walk. Hence, on average, the output of a leaky integrator neuron will be larger in amplitude for an input with a greater correlated component (i.e. higher SNR). Therefore, any automatically adapting nonlinearity–applied to the output of the feedback interneuron–must push the gain towards 0 for small output amplitudes, and pull the gain towards 1 for large output amplitudes.
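This intuition can be checked numerically. In the sketch below (the discounting factor and input statistics are illustrative choices), the same leaky integrator produces a larger typical output amplitude for a correlated input than for uncorrelated noise of equal power:

```python
import numpy as np

def leaky_integrate(x, lam=0.9):
    """Discrete leaky integrator with discounting factor lam."""
    v = np.zeros_like(x)
    for t in range(1, len(x)):
        v[t] = lam * v[t - 1] + x[t]
    return v

rng = np.random.default_rng(1)
n = 100000
noise = rng.standard_normal(n)                    # uncorrelated input (unit power)
signal = np.zeros(n)                              # correlated input (unit power)
for t in range(1, n):
    signal[t] = 0.9 * signal[t - 1] + np.sqrt(1.0 - 0.9**2) * rng.standard_normal()

print(np.std(leaky_integrate(noise)))             # smaller typical output amplitude
print(np.std(leaky_integrate(signal)))            # larger typical output amplitude
```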
One simple piecewise linear nonlinearity satisfies this requirement: the threshold or rectilinear nonlinearity, which increases linearly, from a fixed threshold (
(a) Rectification nonlinearity (black), with threshold at v = 1. Linearized responses in color (cyan: min; magenta: max). (b) Nonlinear feedback inhibitory network. The nonlinearity (inset) is applied to the interneuron’s output. Nonlinearities with increasing thresholds are colored (green: min; red: max) (Methods). (c) Network gain at different input frequencies (note: the frequency space is in Z-space, i.e. defined with respect to the fixed time step of the network). The three colored curves are the nonlinear network response curves (computed using describing function analysis, colored as in (b)) (Methods). The dotted lines provide extreme parameter values for the linear network: Γ = 0,1.
Also termed a dead-zone nonlinearity by engineers [
Therefore, a feedback inhibitory circuit with a rectification nonlinearity applied to the feedback interneuron (
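A minimal simulation of this nonlinear feedback circuit might read as follows (the update order, parameter values, and variable names are our assumptions; the dead-zone form matches the description above):

```python
import numpy as np

def dead_zone(v, theta):
    """Rectilinear (dead-zone) nonlinearity: zero within +/-theta, linear beyond."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def nonlinear_feedback(x, lam=0.9, gamma=1.0, theta=1.0):
    """Principal neuron transmits the input minus rectified interneuron feedback."""
    y = np.zeros_like(x)
    v = 0.0                                       # interneuron membrane state
    for t in range(len(x)):
        y[t] = x[t] - gamma * dead_zone(v, theta) # subtract the (rectified) prediction
        v = lam * v + y[t]                        # interneuron integrates the output
    return y
```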
To understand the operation of the nonlinear feedback circuit (
We observe that, without changing any parameters, each nonlinear network (with a specific threshold) shows network gains that approach those of the Γ = 1 linear network for low frequency inputs, and those of the Γ = 0 linear network for high frequency inputs (
We now demonstrate that this qualitative understanding is also supported quantitatively for fast-varying input statistics, by comparing the performance of the nonlinear feedback network against that of optimal linear networks.
To compare the quantitative performance of the nonlinear and the linear circuits in the regime where input properties change too fast to allow for parameter adaptation, we define a class of non-stationary inputs. Each such input–termed a mixture–is composed of two components with different SNRs, mixed in time: the first component is present for a fixed amount of time, followed by the second component for the same amount of time. In this way, we model the response of the circuit to an input with a rapid change from one SNR to another, as opposed to a single input with a fixed SNR.
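As a sketch (the segment length, time constant, and amplitudes are illustrative choices of ours), such a mixture can be constructed by concatenating a purely correlated segment with a purely unpredictable one:

```python
import numpy as np

def make_mixture(n_per_component, tau=10.0, noise_amp=1.0, seed=0):
    """1:1 mixture in time: pure correlated signal, then pure unpredictable noise."""
    rng = np.random.default_rng(seed)
    lam = np.exp(-1.0 / tau)
    sig = np.zeros(n_per_component)
    for t in range(1, n_per_component):
        sig[t] = lam * sig[t - 1] + np.sqrt(1.0 - lam**2) * rng.standard_normal()
    noise = noise_amp * rng.standard_normal(n_per_component)
    return np.concatenate([sig, noise])           # abrupt SNR transition at the midpoint
```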
Within this input regime, we compare the nonlinear network to two different types of linear networks:
Type 1: The linear network that obtains the minimal network gain over the specific mixture of two SNR inputs, i.e. the minimal network gain for a non-adapting linear predictive coding network.
Type 2: The linear network that has sufficient time to adapt separately to optimally transmit each component of the mixture, i.e. this network has the minimal network gain for any linear predictive coding network (over this specific input mixture).
We demonstrate (
(a) Two input mixtures, modeling rapid transition from predictable to unpredictable input components. (b) Description of simulations. Inputs constructed as in (a). At each time point, inputs are either pure predictable signal, or pure unpredictable noise, with an instantaneous transition from one type to the other, in the middle of the simulation period. (c-f) Simulation outputs (inputs shown in inset). The amplitude of the unpredictable component of the mixture varies along the x axis. Error bars are 1 std. dev. (c,e) Network gain of the linear network of type 1, optimized to the mixture (blue, non-adapted linear response), is significantly higher than that of the nonlinear network (red). In contrast, the nonlinear network gain is close to the response of the optimal linear network of type 2, which is allowed to adapt to each component of the mixture (dotted black, adapted linear response). (e) Green shading indicates region where the nonlinear response is more than one std. dev. lower than the non-adapted linear response. Diagonal hashing indicates region where the nonlinear response is within one std. dev. of the adapted linear response. (d,f) % improvement of the performance of the nonlinear network over the type 1 linear network at different amplitudes of the unpredictable component. Data taken from (c) and (e) respectively. (f) Green box indicates region where the improvement is more than one std. dev. different from 0.
In more detail, the response of both the linear and nonlinear networks to a mixture of two input components (
To robustly test the performance of the nonlinear circuit, we simulated its response to a mixture composed of components that are as distinct as possible. Therefore, we chose the first component of the mixture to be pure, predictable, correlated signal, and the second component to be unpredictable. As defined earlier, the correlated component is exponentially correlated with a fixed time constant. For the unpredictable component of the mixture, we utilize one of two inputs: (1) input at the Nyquist frequency, or (2) Gaussian white noise. Both these inputs are–for the purposes of a nonlinear predictive coding circuit with a non-zero time constant in the feedback neuron–unpredictable. Since the input mixture transitions from one extreme SNR to another, it should provide a strong test of the ability of a fixed nonlinear circuit to respond to a range of input SNRs.
We first show that the best linear network of type 1 is outperformed by the nonlinear network (
It is important to note that the linear network of type 1 against which we compare the nonlinear network’s performance has the minimal network gain of any such network. We could have used a linear network adapted to the first component of the mixture, and then measured its performance over both components. This would be a natural model for the case where a network was adapted to some input statistic, which changed rapidly, and the network had had insufficient time to adapt to the new statistic. However, the type 1 linear network outperforms any such linear network. Therefore, it provides a strong baseline against which to compare the performance of the nonlinear network.
Our results show that the nonlinear network’s improvement over the type 1 linear network persists even if (a) the unpredictable component has larger average amplitude than the predictable (correlated) component (
Continuing beyond the improvement over type 1 networks, we next observe that the performance of the nonlinear network approximates the performance of the type 2 linear network (
Given this, we explore the potential role of nonlinear feedback inhibitory networks in real sensory systems.
Classical experiments, such as the seminal work of J. D. Victor in cat retinal ganglion cells [
To test specifically for the presence of automatic adaptation, experiments must constrain the speed of the change of the neuronal response function: an automatically adapting circuit will respond to a change in the input structure with an adaptive change, on the timescales of neuronal dynamics. Experimental evidence demonstrating these fast changes to neuronal responses has been found, recently, both in the salamander visual system [
(a,b) Adapted from [
(a-c) Adapted from [
Baccus and Meister [
To match the qualitative structure of the observed temporal filter, with a smooth increase from zero to the first peak, we make two biologically reasonable changes to the nonlinear model (Methods,
First, we introduce an additional neuron with a non-zero time constant, prior to the principal neuron (
Second, we add a time constant to the principal neuron (
This modified network produces a smoothly varying temporal filter (with zero weight at t = 0) that can be compared to experiment (Methods). These two (biologically reasonable) changes are in fact necessary; subsets of this model, with fewer neurons or fewer time constants, do not result in a smooth temporal filter (Methods).
Given this model, we demonstrate that the resulting response filter for the nonlinear feedback network shifts in the same direction as that measured by Baccus and Meister [
Mante et al. [
This independence from the precise input also allows the automatic adaptation of the nonlinear feedback inhibitory network to generalize to non-visual sensory modalities. Nagel and Doupe [
In response to increasing input amplitude, the first peak of the temporal filter decreased in amplitude, and the first valley increased in amplitude. Therefore, there was a decrease in the ratio of the total positive response of the filter to the total negative response when the input amplitude changed from high to low (points below the diagonal in
The shift in the location of the peaks can be characterized by the change in the peak frequency response of the filter (best mean frequency, BMF). Therefore, comparing high to low amplitude inputs, the authors identified an increase in the BMF (points above the diagonal in
We demonstrate, through simulating the responses of the nonlinear feedback circuit (
The positive/negative ratio of the simulated filter of the nonlinear network decreases as the input amplitude increases (
The BMF of the simulated temporal filters of the nonlinear network increases for increasing input amplitudes (
This suggests that a nonlinear feedback circuit could underlie the observed fast adaptation in the zebra finch auditory forebrain.
Importantly, the changes to the response filters of the nonlinear predictive coding network are a general property of the network, and not a function of the specific parameters chosen. Indeed, it is possible to demonstrate analytically that a nonlinear model of neurons with just two time constants, assuming only that the time constant of the interneuron is longer than that of the principal neuron, already shows a shift of its single extremum towards the more recent past (as input amplitudes increase) (
Neuronal circuits must transmit input signals which vary, rapidly, by multiple orders of magnitude [
Our analysis distinguishing the feedforward and feedback implementations of the algorithm demonstrates the importance of the neural implementation in developing intuition about an algorithm. For example, in the feedforward inhibitory circuit, predictive coding could be implemented by a neuron with a short time constant, for large-SNR inputs. However, in the feedback circuit, the predictive neuron is matched to the properties of the signal component, independent of the SNR. Therefore, the two implementations have differing properties, each of which may prove useful in different contexts.
This work also demonstrates the necessity of studying inputs with rapidly varying statistics, to understand the different constraints they place on circuit implementations of an algorithm. Both the feedforward and feedback circuits can implement optimal linear predictive coding–when the input is statistically stationary. However, by analyzing the responses of the two different implementations to non-stationary inputs, we found that the feedback implementation provides a natural way to approximate an optimal response, through the addition of a circuit nonlinearity. In contrast, the alternative, feedforward, implementation requires adaptation of its underlying cellular properties, which would be difficult to vary through a circuit nonlinearity.
In this report, we introduced intuition on the shape of a nonlinearity necessary to perform automatic adaptation. Given its mathematical convenience, and biological plausibility, we focused on the rectilinear nonlinearity. However, it is important to note that the rectification nonlinearity is not the only nonlinearity that can satisfy the necessary structure. Indeed, any nonlinearity with the necessary inflection point should also be able to perform an automatic adaptation. This provides an avenue for further analysis.
Another avenue for further exploration is how to generalize our results on nonlinear predictive coding networks to more complex stimuli. We found the optimal linear predictive coding algorithm for an exponentially correlated signal with a single time constant. However, naturalistic stimuli can be modeled as a combination of several exponentially correlated signals, with different time constants. This suggests that to respond to a naturalistic stimulus, there should be several predictive coding circuits, each adapted to one of the correlations within the signal. However, how could these different predictive coding circuits be combined to respond optimally overall? One possible solution may be the addition of mutually inhibitory connections between the parallel predictive coding circuits. This circuit design should allow each neuron to respond maximally to the input component that it was adapted for, while simultaneously removing that input component from the remaining neurons. Hence, it might allow the net response of the larger circuit to remain close to optimal. A similar network design has been shown to implement predictive coding across a spatial scene (for a non time-varying stimulus) [
Another direction to explore is the computational function of the nonlinear circuit, beyond its linearized responses. In this work, we demonstrated that the network gain of the nonlinear feedback network approximates that of the optimal linear network. However, this does not imply the two networks have identical responses to stimuli. For example, the linear algorithm amplifies high frequency inputs (flattening the output frequency distribution, for an exponentially correlated signal with noise) (
In general, adaptation of network dynamics, causing them to respond faster when inputs are more salient, has been observed in different experiments [
Finally, one long standing goal of computational neuroscience has been to develop circuit motifs, in a manner similar to electrical engineering. We believe that the nonlinear feedback inhibitory network could be one such neuronal circuit motif. It performs a specific computational function without losing information, and is stable with respect to internal disturbances (
The input used in optimizing the linear predictive coding circuit (as in Eq (
Our derivation of the optimal linear predictive coding filter did not require any constraint on the distributions (for either the signal or noise components of the input). Hence, for maximal generality, we left them unconstrained.
The predictive coding circuits were constructed with linear leaky integrator neurons (
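A standard continuous-time form, consistent with the conductance-based description below (with τ the membrane time constant), is:

$$\tau \frac{dv(t)}{dt} \;=\; -\,v(t) \;+\; g_s\, v_{\text{input}}(t).$$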
where the synaptic conductance, g_s, is measured as a fraction of the cell's membrane conductance. Discretizing Eq (
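Concretely, an Euler discretization with time step Δt (the step size and parameter values below are illustrative) yields a per-step update of the form:

```python
import numpy as np

def leaky_integrator(v_input, tau=10.0, g_s=1.0, dt=1.0):
    """Euler-discretized leaky integrator; tau, g_s, and dt are illustrative values."""
    v = np.zeros_like(v_input)
    for t in range(1, len(v_input)):
        v[t] = (1.0 - dt / tau) * v[t - 1] + (dt / tau) * g_s * v_input[t]
    return v
```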
The steps to derive the recursive equation governing the dynamics of the feedback circuit Eq (
The discretized feedback circuit is described by a pair of linear, recursive equations. As introduced in the text, we derive this pair of equations by computing the input to each cell in the circuit (
This process gives us:
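In schematic form (our notation; the precise coefficients follow from the synaptic conductances defined above), with $p_t$ the principal neuron's output, $n_t$ the interneuron's state, Γ the feedback gain, and λ the interneuron discounting factor, such a pair can be written as:

$$p_t \;=\; x_t \;-\; \Gamma\, n_{t-1}, \qquad n_t \;=\; \lambda\, n_{t-1} \;+\; p_t.$$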
Substituting Eq (
Letting
Repeating this process, we have:
Finally, substituting back into Eq (
The equation for the rectilinear nonlinearity (also known as a dead zone nonlinearity in the engineering literature) is as follows:
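One standard symmetric form, with threshold θ and unit slope beyond it, is:

$$f(v) \;=\; \begin{cases} v - \theta, & v > \theta \\ 0, & |v| \le \theta \\ v + \theta, & v < -\theta \end{cases}$$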
In
In this analysis, we assume that the response of the network to a single-frequency input (at a single amplitude) is linear for each such input. This linear model is allowed to vary with frequency (and amplitude). We compute a look-up table for the effect of the nonlinearity by balancing the input across the nonlinear loop, for each frequency (and input amplitude). For every single-frequency input to the network, however, it is necessary to assume that the network produces output at only a single frequency (the primary component of the Fourier transform). This means that describing function analysis automatically discards any spread of the input frequency into higher Fourier harmonics. The resulting Bode plots are, therefore, not quantitatively correct. However, it is well established that describing function analysis provides a reasonable, qualitatively correct result.
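Numerically, the describing function of the dead-zone can be estimated by driving it with a sinusoid and projecting the output onto its fundamental Fourier component (the grid size and amplitudes below are illustrative):

```python
import numpy as np

def dead_zone(v, theta=1.0):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def describing_function(amplitude, theta=1.0, n=4096):
    """Equivalent gain of the dead-zone for a sinusoid of a given amplitude."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    y = dead_zone(amplitude * np.sin(t), theta)
    b1 = 2.0 * np.mean(y * np.sin(t))             # fundamental Fourier component
    return b1 / amplitude                         # -> 0 below threshold, -> 1 for large inputs

for a in (0.5, 1.0, 2.0, 10.0):
    print(a, describing_function(a))
```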
Response of optimal linear and nonlinear networks to varying mixtures of stimuli was simulated. To construct these plots, a 1:1 mixture of two input components was used. The first component was pure exponentially correlated signal, and the second was pure noise. Further, the amplitude of the noise component was varied (values on the x-axis of
The response of three different networks, two linear and one nonlinear, was simulated for each input mixture, and the network gains computed. For all three networks, the discounting factor was matched to the time constant of the input within the signal component of the mixture. Parameter variation was then used to find the optimal value of Γ (for the linear networks) and both Γ and the threshold
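This parameter search can be sketched as a simple grid scan minimizing the network gain; the snippet below reuses the illustrative make_mixture and nonlinear_feedback helpers defined earlier, and the grid ranges are our choices:

```python
import numpy as np
# Uses make_mixture and nonlinear_feedback from the sketches above.

def network_gain(x, y):
    return np.mean(y**2) / np.mean(x**2)          # output power / input power

x = make_mixture(5000, tau=10.0, noise_amp=1.0)
lam = np.exp(-1.0 / 10.0)                         # discounting factor matched to the input
best = min(
    ((g, th, network_gain(x, nonlinear_feedback(x, lam=lam, gamma=g, theta=th)))
     for g in np.linspace(0.0, 1.0, 21)
     for th in np.linspace(0.0, 3.0, 31)),
    key=lambda triple: triple[2],
)
print("Gamma, theta, gain:", best)
```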
It was necessary to modify the analytically-derived nonlinear feedback circuit to obtain simulations that can be directly compared to the experimentally measured response filters of different sensory neurons. The experimentally measured filters place a low weight on inputs at t = 0, with the weight increasing to a peak, followed by a reducing oscillation between peaks and troughs (Figs
Since real neurons must have a non-zero time constant, the first change that we considered was to add a non-zero time constant to the principal neuron. However, the linearized response filter of this modified model still has maximal weight at t = 0. An alternative change is to add a time constant to a neuron providing input to the two-neuron network (
The response of the nonlinear network, with three neurons with non-zero time constants (
Electrically, the neuron is an RC circuit, with inputs arriving as current (g_s v_input).
(TIF)
Both the linear and nonlinear networks are only allowed to adapt to the mixture (and not to each individual component).
(TIF)
(a) Firing rate of X-type retinal ganglion cells in response to a stimulus pulse of increasing contrast; dashed lines denote peak (cyan) and steady-state (red) responses. (b) Ratio of steady-state amplitude to peak amplitude for experimental (squares) and simulated model responses (diamonds). The reduction in the ratio, as measured experimentally, is qualitatively the same as the simulation. (c) Bode plots of responses of retinal ganglion cells, for sinusoidal stimuli with increasing contrast (figure adapted from [
(TIF)
(a) Nonlinear predictive coding circuit (as in
(TIF)
(a-e) Input power (blue) and the power within the optimal transfer function of the network (red) at different frequencies. SNR decreases from (a) to (e). (f-j) Output power at each frequency (obtained by multiplying the two functions from the left column). Notice the flat output power, termed whitening. Also, notice the reduction in total transmitted power. This reduction in power gets progressively smaller as the fraction of predictable signal within the input decreases (i.e. as the SNR decreases). In the extreme case, in the final row, with pure noise, the output has the same power as the input, with no reduction in gain.
(TIF)
(PDF)
AB thanks Simon Laughlin, his PhD co-supervisor, for introducing him to predictive coding, and for helpful discussions during AB’s time in Cambridge. Also, we thank Tao Hu and other members of the Chklovskii lab, for helpful discussions on adaptive signal processing.