Sensory noise predicts divisive reshaping of receptive fields

In order to respond reliably to specific features of their environment, sensory neurons must integrate multiple incoming noisy signals. Crucially, they must also compete for the interpretation of those signals with other neurons representing similar features. The form that this competition should take depends critically on the noise corrupting these signals. In this study we show that for the type of noise commonly observed in sensory systems, whose variance scales with the mean signal, sensory neurons should selectively divide their input signals by their predictions, suppressing ambiguous cues while amplifying others. Any change in the stimulus context alters which inputs are suppressed, leading to a deep dynamic reshaping of neural receptive fields that goes far beyond simple surround suppression. Paradoxically, these highly variable receptive fields accompany, and are in fact required for, an invariant representation of external sensory features. In addition to offering a normative account of context-dependent changes in sensory responses, perceptual inference in the presence of signal-dependent noise accounts for ubiquitous features of sensory neurons such as divisive normalization, gain control and contrast-dependent temporal dynamics.

We can find $\hat{\mathbf{x}}_{ML}$ using a simple gradient-descent algorithm, so that each estimate is updated in time according to:

$$\frac{d\hat{x}_i}{dt} = \eta\, \frac{\partial \log p\left(\mathbf{s}|\hat{\mathbf{x}}\right)}{\partial \hat{x}_i}, \qquad (2)$$

where $\eta$ is a free parameter that determines how quickly estimates are updated. If $p(\mathbf{s}|\mathbf{x}) \propto \prod_j p(s_j|\mathbf{x})$ this expression can be written:

$$\frac{d\hat{x}_i}{dt} = \eta \sum_j \frac{\partial \log p\left(s_j|\hat{\mathbf{x}}\right)}{\partial \hat{x}_i}. \qquad (3)$$

The dynamics of this algorithm depend on the generative model, $p(s_j|\mathbf{x})$, describing how stimulus features in the world activate the sensory receptors. Here, we consider the case where the variance of each sensory input, $\sigma_j(\mathbf{x})^2$, is proportional to the mean, $\mu_j(\mathbf{x})$. The ratio between the variance and the mean, $F = \sigma^2/\mu$, is called the Fano factor. By the chain rule, equation 3 can be written:

$$\frac{d\hat{x}_i}{dt} = \eta \sum_j \frac{\partial \log p\left(s_j|\hat{\mathbf{x}}\right)}{\partial \mu_j} \frac{\partial \mu_j\left(\hat{\mathbf{x}}\right)}{\partial \hat{x}_i}. \qquad (4)$$

In the following sections we show that, for a large class of distributions with constant Fano factor, internal estimates are updated in time according to a function of the ratio between the received input, $s_j$, and the expected input, $\mu_j(\mathbf{x})$.
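This gradient-based scheme is easy to sketch numerically. Below is a minimal Python illustration for the Poisson case treated in the next section, with linear means $\mu_j(\mathbf{x}) = \sum_i w_{ji} x_i$; the weight matrix, true features, step size and iteration count are all invented for the example. With noise-free inputs the estimates converge to the true features:

```python
import numpy as np

# Toy one-layer setup (W, x_true and eta are arbitrary illustrative choices)
W = np.array([[1.0, 0.2], [0.3, 1.1], [0.7, 0.5], [0.2, 0.9]])
x_true = np.array([1.0, 2.0])
s = W @ x_true                     # noise-free input, so the ML estimate is x_true

x_hat = np.array([0.5, 0.5])       # initial estimate
eta = 0.02
for _ in range(50000):
    mu = W @ x_hat                 # predicted input mu_j(x_hat)
    # Poisson log-likelihood gradient: sum_j (s_j / mu_j - 1) * dmu_j / dx_i
    x_hat += eta * ((s / mu - 1.0) @ W)
    x_hat = np.maximum(x_hat, 1e-6)   # keep estimates (and hence means) positive
```

Because the Poisson log-likelihood is concave in $\mathbf{x}$ for linear means, the updates settle at the maximum-likelihood estimate.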

Poisson distribution
In the main text, we consider Poisson noise, described by the probability distribution:

$$p\left(s|\mu\right) = \frac{\mu^{s} e^{-\mu}}{s!}. \qquad (5)$$

Note that for simplicity, we use the shorthand $\mu \equiv \mu_j(\mathbf{x})$ and $s \equiv s_j$.
For a Poisson distribution, the gradient of the log-likelihood can be written:

$$\frac{\partial \log p\left(s|\mu\right)}{\partial \mu} = \frac{s}{\mu} - 1 = g\left(\frac{s}{\mu}\right), \qquad (6)$$

where $g(x) = x - 1$. In other words, the gradient is equal to the ratio between the received input, $s$, and the predicted input, $\mu$, minus 1.
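This identity is simple to verify numerically. A minimal check (the values of $s$ and $\mu$ are arbitrary) compares $g(s/\mu)$ against a finite difference of the Poisson log-likelihood:

```python
from math import lgamma, log

def poisson_loglik(s, mu):
    # log of the Poisson pmf: s*log(mu) - mu - log(s!)
    return s * log(mu) - mu - lgamma(s + 1)

mu, s, h = 5.0, 8.0, 1e-6
analytic = s / mu - 1.0            # g(s/mu) with g(x) = x - 1
numeric = (poisson_loglik(s, mu + h) - poisson_loglik(s, mu - h)) / (2 * h)
```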

Negative binomial distribution
A generalization of the Poisson distribution, called the negative binomial distribution, allows for Fano factors greater than 1. With mean $\mu$ and Fano factor $F$, the negative binomial distribution can be written in the form:

$$p\left(s|\mu\right) = \frac{\Gamma\left(s + \frac{\mu}{F-1}\right)}{s!\ \Gamma\left(\frac{\mu}{F-1}\right)} \left(\frac{1}{F}\right)^{\frac{\mu}{F-1}} \left(\frac{F-1}{F}\right)^{s}. \qquad (7)$$

When $F \to 1$, this distribution converges to a Poisson distribution. If $F - 1 \ll \mu$, the gradient of the log-likelihood is closely approximated by:

$$\frac{\partial \log p\left(s|\mu\right)}{\partial \mu} \approx g\left(\frac{s}{\mu}\right) - g\left(1\right), \qquad (8)$$

where $g(x)$ is the monotonically increasing function:

$$g(x) = \frac{1}{F-1} \log\left(1 + (F-1)\,x\right).$$

As $F \to 1$, $g(x) \to x$, and the gradient again reduces to $s/\mu - 1$, as for the Poisson distribution.
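One standard parametrization of the negative binomial with mean $\mu$ and Fano factor $F$ uses shape $\mu/(F-1)$ and success probability $(F-1)/F$. A short numerical check (the values of $\mu$, $F$ and $s$ are arbitrary) confirms the mean and variance, and that the gradient of the log-likelihood is well approximated by a function of the ratio $s/\mu$ when $F - 1 \ll \mu$:

```python
from math import lgamma, log, exp

def nb_logpmf(s, mu, F):
    # negative binomial with mean mu and variance F * mu (requires F > 1)
    r = mu / (F - 1.0)                      # shape parameter
    q = (F - 1.0) / F                       # success probability
    return (lgamma(s + r) - lgamma(r) - lgamma(s + 1)
            + r * log(1.0 - q) + s * log(q))

mu, F = 50.0, 2.0
probs = [exp(nb_logpmf(s, mu, F)) for s in range(3000)]
mean = sum(s * p for s, p in enumerate(probs))
var = sum((s - mean) ** 2 * p for s, p in enumerate(probs))

# gradient of the log-likelihood vs. a function of the ratio s / mu
s, h = 60.0, 1e-4
numeric = (nb_logpmf(s, mu + h, F) - nb_logpmf(s, mu - h, F)) / (2 * h)
approx = (log(1.0 + (F - 1.0) * s / mu) - log(F)) / (F - 1.0)
```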

Gaussian distribution with constant Fano-factor
We now consider a Gaussian distribution, with variance proportional to the mean ($\sigma^2 = F\mu$):

$$p\left(s|\mu\right) = \frac{1}{\sqrt{2\pi F \mu}} \exp\left(-\frac{\left(s-\mu\right)^2}{2 F \mu}\right). \qquad (9)$$

The gradient of the log-likelihood is:

$$\frac{\partial \log p\left(s|\mu\right)}{\partial \mu} = g\left(\frac{s}{\mu}\right) - \frac{1}{2\mu}, \qquad (10)$$

where $g(x) = \frac{1}{2F}\left(x^2 - 1\right)$. When $\mu$ is large, $F \approx 1$ and $x$ is close to 1, equation 10 becomes equivalent to equation 6, obtained with the Poisson distribution.
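For a Gaussian whose variance is $F$ times its mean, the gradient can again be checked by finite differences (the values of $\mu$, $s$ and $F$ are arbitrary); the small $-1/(2\mu)$ contribution from the normalizing constant vanishes as $\mu$ grows:

```python
from math import log, pi

def gauss_loglik(s, mu, F):
    # Gaussian with mean mu and variance F * mu
    return -0.5 * log(2 * pi * F * mu) - (s - mu) ** 2 / (2 * F * mu)

F = 1.5
g = lambda x: (x ** 2 - 1.0) / (2 * F)   # g(x) = (x^2 - 1) / (2F)

mu, s, h = 100.0, 110.0, 1e-4
numeric = (gauss_loglik(s, mu + h, F) - gauss_loglik(s, mu - h, F)) / (2 * h)
exact = g(s / mu) - 1.0 / (2.0 * mu)     # -1/(2 mu) comes from the log-determinant
```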

Gamma distribution
Finally, we consider a gamma distribution, with variance proportional to the mean (shape $\mu/F$ and scale $F$):

$$p\left(s|\mu\right) = \frac{s^{\mu/F - 1}\, e^{-s/F}}{F^{\mu/F}\ \Gamma\left(\mu/F\right)}. \qquad (11)$$

The gradient of the log-likelihood is:

$$\frac{\partial \log p\left(s|\mu\right)}{\partial \mu} = \frac{1}{F} \log s - \frac{1}{F}\psi\left(\frac{\mu}{F}\right) - C \approx g\left(\frac{s}{\mu}\right), \qquad (12)$$

where $g(x) = \frac{1}{F} \log x$, $C = \frac{1}{F} \log F$, $\psi$ is the digamma function, and the approximation follows from $\psi(\mu/F) \approx \log(\mu/F)$, valid for large $\mu$. As before, when $\mu$ is large, $F \approx 1$ and $x$ is close to 1, equation 12 becomes equivalent to equation 6, obtained with the Poisson distribution.
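Taking a gamma distribution with shape $\mu/F$ and scale $F$ (so that the mean is $\mu$ and the variance $F\mu$), a finite-difference check (the values of $\mu$, $s$ and $F$ are arbitrary) shows the gradient approaching $g(s/\mu) = \frac{1}{F}\log(s/\mu)$ for large $\mu$:

```python
from math import lgamma, log

def gamma_loglik(s, mu, F):
    # gamma with shape mu/F and scale F: mean mu, variance F * mu
    k = mu / F
    return (k - 1.0) * log(s) - s / F - lgamma(k) - k * log(F)

F = 2.0
mu, s, h = 1000.0, 1100.0, 1e-3
numeric = (gamma_loglik(s, mu + h, F) - gamma_loglik(s, mu - h, F)) / (2 * h)
approx = log(s / mu) / F                 # g(s/mu) with g(x) = (1/F) log x
```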

Correlated input noise
In the main text, we assumed that input noise correlations were negligible. To see how input noise correlations could influence our results, we now consider the case where the feed-forward input to the network is described by a Gaussian distribution with signal-dependent covariance matrix $\Sigma(\boldsymbol{\mu})$:

$$p\left(\mathbf{s}|\mathbf{x}\right) = \frac{1}{\sqrt{\left|2\pi\,\Sigma\left(\boldsymbol{\mu}\right)\right|}} \exp\left(-\frac{1}{2}\left(\mathbf{s}-\boldsymbol{\mu}\right)^{T} \Sigma\left(\boldsymbol{\mu}\right)^{-1} \left(\mathbf{s}-\boldsymbol{\mu}\right)\right). \qquad (13)$$

We consider a covariance matrix that comprises a signal-dependent part, $D(\boldsymbol{\mu})$, and a signal-independent part, $Q$: $\Sigma(\boldsymbol{\mu}) = D(\boldsymbol{\mu}) + \alpha Q$. The signal-dependent covariance, $D(\boldsymbol{\mu})$, is a diagonal matrix, with diagonal elements $d_i = 1/\mu_i$. If noise correlations are small, the log-likelihood can be approximated by a first-order Taylor expansion around $\alpha = 0$: where $\mathbf{q}$ is a vector consisting of the diagonal elements of $Q$. The derivative is thus: where $\mathbf{z}$ is a vector denoting the divisively normalized input ($z_i = s_i/\mu_i$), $\mathbf{q}_i$ is the $i$th column of $Q$, and $q_{ii}$ is the $i$th diagonal element. Note that the first line in equation 15 is identical to the case where there are no noise correlations (equation 10). The next line consists of a 'correction' term due to the noise correlations. In common with the first term, this term also depends on the ratio between the received and predicted feed-forward inputs, $s_i/\mu_i$.
Thus, while input noise correlations result in quantitative changes to the steady state solution, our main result, that each feed-forward input needs to be divisively normalized by its top-down prediction, is unchanged.

Two-layer neural network
Here, we consider a hierarchical model in which high-level features, $\mathbf{y} = \left(y_1, y_2, \ldots, y_{n_y}\right)$, generate low-level features, $\mathbf{x} = \left(x_1, x_2, \ldots, x_{n_x}\right)$, which in turn generate a sensory input, $\mathbf{s} = \left(s_1, s_2, \ldots, s_{n_s}\right)$. For simplicity, we assume that stimulus features combine linearly, so that the mean predicted values of $x_i$ and $s_j$ are given by:

$$\bar{x}_i\left(\mathbf{y}\right) = \sum_k v_{ik}\, y_k, \qquad (16)$$

$$\bar{s}_j\left(\mathbf{x}\right) = \sum_i w_{ji}\, x_i, \qquad (17)$$

where $v_{ik}$ and $w_{ji}$ are feed-forward weights. We assume that the goal of the neural network is to find maximum likelihood estimates $\hat{\mathbf{x}}_{ML}$ and $\hat{\mathbf{y}}_{ML}$:

$$\left(\hat{\mathbf{x}}_{ML}, \hat{\mathbf{y}}_{ML}\right) = \underset{\mathbf{x},\,\mathbf{y}}{\arg\max}\ \left[\log p\left(\mathbf{s}|\mathbf{x}\right) + \log p\left(\mathbf{x}|\mathbf{y}\right)\right]. \qquad (18)$$

If $p(\mathbf{x}|\mathbf{y}) \propto \prod_j p(x_j|\mathbf{y})$, and $p(\mathbf{s}|\mathbf{x}) \propto \prod_j p(s_j|\mathbf{x})$, then $\hat{\mathbf{x}}_{ML}$ and $\hat{\mathbf{y}}_{ML}$ can be obtained numerically, by applying the following updates¹:

$$\frac{d\hat{x}_i}{dt} = \eta \left( \sum_j \frac{\partial \log p\left(s_j|\hat{\mathbf{x}}\right)}{\partial \hat{x}_i} + \frac{\partial \log p\left(\hat{x}_i|\hat{\mathbf{y}}\right)}{\partial \hat{x}_i} \right), \qquad (19)$$

$$\frac{d\hat{y}_k}{dt} = \eta \sum_j \frac{\partial \log p\left(\hat{x}_j|\hat{\mathbf{y}}\right)}{\partial \hat{y}_k}. \qquad (20)$$

If features, $\mathbf{x}$, are generated via a Poisson process (i.e. $p(x_j|\mathbf{y}) \propto \bar{x}_j(\mathbf{y})^{x_j} e^{-\bar{x}_j(\mathbf{y})}$), then equation 20 can be written:

$$\frac{d\hat{y}_k}{dt} = \eta \sum_j \left( \frac{\hat{x}_j}{\bar{x}_j\left(\hat{\mathbf{y}}\right)} - 1 \right) v_{jk}, \qquad (21)$$

which is identical to the gradient descent algorithm that we obtained with the one-layer network.
If sensory inputs, $\mathbf{s}$, are generated via a Poisson process (i.e. $p(s_j|\mathbf{x}) \propto \bar{s}_j(\mathbf{x})^{s_j} e^{-\bar{s}_j(\mathbf{x})}$), then equation 19 can be written:

$$\frac{d\hat{x}_i}{dt} = \eta \left( \sum_j \left( \frac{s_j}{\bar{s}_j\left(\hat{\mathbf{x}}\right)} - 1 \right) w_{ji} + \log\frac{\bar{x}_i\left(\hat{\mathbf{y}}\right)}{\hat{x}_i} \right). \qquad (22)$$

The first term in this expression is the same as in the one-layer network. The second term, $\log\left(\bar{x}_i(\hat{\mathbf{y}})/\hat{x}_i\right) \approx \bar{x}_i(\hat{\mathbf{y}})/\hat{x}_i - 1$, is the fractional prediction error between the current estimate, $\hat{x}_i$, and the top-down prediction, $\bar{x}_i(\hat{\mathbf{y}})$.
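The alternating updates of equations 21 & 22 can be sketched numerically. In the following toy Python simulation all sizes, weights and the learning rate are invented for the example; the prediction-error terms use the ratio and log-ratio forms described above. With noise-free input, the joint updates recover the true high-level features:

```python
import numpy as np

# Hypothetical two-layer setup: y -> x -> s with linear means (V, W are invented)
V = np.array([[1.0, 0.3], [0.4, 1.0], [0.6, 0.6]])    # maps y to the mean of x
W = np.array([[1.0, 0.2, 0.1], [0.1, 1.0, 0.3],
              [0.3, 0.1, 1.0], [0.5, 0.5, 0.5]])      # maps x to the mean of s
y_true = np.array([2.0, 3.0])
s = W @ (V @ y_true)              # noise-free input, for a clean convergence check

x_hat = np.ones(3)
y_hat = np.ones(2)
eta = 0.02
for _ in range(100000):
    s_pred = W @ x_hat            # bottom-up prediction s_bar(x_hat)
    x_pred = V @ y_hat            # top-down prediction x_bar(y_hat)
    # eq. 22 (sketch): divisively normalized input plus fractional prediction error
    x_hat += eta * ((s / s_pred - 1.0) @ W + np.log(x_pred / x_hat))
    # eq. 21 (sketch): same form as the one-layer update
    y_hat += eta * ((x_hat / x_pred - 1.0) @ V)
    x_hat = np.maximum(x_hat, 1e-6)
    y_hat = np.maximum(y_hat, 1e-6)
```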

Neural implementation
The single-layer network described in the main text can easily be extended to a two-layer network, to implement equations 21 & 22. As with the single-layer network, each layer consists of two populations of neurons: excitatory neurons that encode the ratio between the received and predicted inputs ($s_j/\bar{s}_j(\hat{\mathbf{x}})$ and $\hat{x}_j/\bar{x}_j(\hat{\mathbf{y}})$), and inhibitory neurons that encode the estimated stimulus features ($\hat{x}_i$ and $\hat{y}_i$).
¹ Note that, strictly speaking, $x_j$ is discrete, and thus the gradient $\partial \log p(x_j|\hat{\mathbf{y}})/\partial \hat{y}_i$ is undefined. However, when $x_j$ is large, it can be treated as a continuous variable, with a negligible effect on inference.
The responses of neurons in the second layer are described by the following differential equations: where $I_j$ represents the input to the $j$th excitatory neuron in the second layer, which evolves in time according to equation 26 (so that $I_j = r^{inh}_{j1}$). The response of each neuron is labelled with two indices: for example, $r^{exc}_{jk}$ corresponds to the response of the $j$th excitatory neuron in the $k$th layer of the network.
The responses of neurons in the first layer of the network are described by the following differential equations: These equations are very similar to those of the one-layer network described in the main text (main text, equations 5 & 6), with the addition of a second term in equation 26, which represents feedback from downstream neurons.

Analytical expression for neural firing rates
In this section, we show that the canonical normalization model of Heeger et al. emerges as a special case of our model.
In the steady state, the response of the $j$th excitatory neuron is given by: Substituting this into equation 6 in the main text, and setting $\partial r^{inh}_i/\partial t = 0$, we obtain: In the general case, where there are many excitatory and inhibitory neurons, equation 28 cannot be solved exactly. However, in the special case where, for all $k \neq i$, either $\sum_j w_{ji} w_{jk} s_j = 0$ or $r^{inh}_k = 0$, equation 28 simplifies to: If $w_0 \ll w_{ji} r^{inh}_l$, the above equation can be solved, to give: Substituting this expression for $r^{inh}_i$ back into equation 27, we obtain a closed-form expression for the excitatory neuron response: where $\tilde{w}_{jl} = w_{jl}/\sum_j w_{jl}$, $|x| = \max(x, w_0)$, and $i = \arg\max_k \sum_j s_j \log w_{jk}$.

Equation 31 is of a very similar form to the canonical normalization model proposed by Heeger et al., in which neural responses are described by the equation:

$$r_j = \frac{\gamma\, I_j^{\,n}}{\sigma^n + \sum_k I_k^{\,n}}, \qquad (32)$$

where $I_j$ is the $j$th input to the network, and $\gamma$, $n$ and $\sigma$ are free parameters that determine the shape of the contrast response curve.

One can show that, in order for neural responses to be well approximated by equation 31, the stimulus should not drive the inputs to more than one competing interneuron too strongly. Mathematically, we require that, for all $k \neq i$, $\sum_j \tilde{w}_{jk}\tilde{w}_{ji}\tilde{s}_j < 1$ or $\sum_j \tilde{w}_{jk}\tilde{w}_{ji}\tilde{s}_j = 0$, where $\tilde{s}_j = s_j/\sum_j s_j$ and $\tilde{w}_{ji} = w_{ji}/\sum_j w_{ji}$.

Supplementary figure 1 shows the response of a single excitatory neuron versus the strength of input to the network, with the excitatory response computed numerically (solid lines) or using equation 31 (see figure legend for simulation details). As can be seen in this figure, equation 31 provides a good description of the neuron's response.

In the main text, we also compared our model to a linear-nonlinear (LN) model, consisting of linear integration of inputs followed by non-linear rescaling of responses. This LN model is a simple generalisation of a subtractive inhibition model (predicted by optimal estimation with Gaussian noise), which predicts linear responses.
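The canonical normalization model of Heeger et al., $r_j = \gamma I_j^n / (\sigma^n + \sum_k I_k^n)$, can be sketched in a few lines. The parameter values and inputs below are arbitrary illustrative choices; the sketch shows the model's two signature behaviours, a saturating contrast response and divisive suppression by a second ("mask") input:

```python
import numpy as np

# Heeger-style canonical normalization; gamma, n and sigma are free parameters
# (the values here are arbitrary, not fitted to any data).
def normalization_response(I, j, gamma=1.0, n=2.0, sigma=0.5):
    I = np.asarray(I, dtype=float)
    return gamma * I[j] ** n / (sigma ** n + np.sum(I ** n))

# response of unit 0 as its own input grows: a saturating contrast response curve
contrasts = [0.1, 0.5, 1.0, 5.0, 50.0]
r = [normalization_response([c, 0.2], 0) for c in contrasts]

# a second ("mask") input divisively suppresses the response to the first
r_alone = normalization_response([1.0, 0.2], 0)
r_masked = normalization_response([1.0, 2.0], 0)
```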
It is straightforward to show that such an LN model is incapable of producing contextual tuning curve shifts, regardless of the form of the non-linearity. In addition, we compared our model to a global divisive model, in which responses were described by the following equation:

$$r_j = \frac{\left|\sum_i U_{ji}\, s_i\right|}{\left|\sum_i V_{ji}\, s_i\right|},$$

where $U_{ji}$ and $V_{ji}$ are positive input and divisive filters, and $|\cdot|$ is a rectifying non-linearity. As with the LN model shown in the main text, the linear filters, $U_{ji}$ and $V_{ji}$, were learned in each case so as to minimise the mean squared difference between the responses obtained with the input-targeted inhibition model in the main text and the above global divisive inhibition model. Supplementary figures 2a & b show the responses obtained with global divisive inhibition, with parameters trained to reproduce the tuning curves shown in figure 2c-d in the main text. As can be seen, global divisive inhibition was unable to produce contextual shifts in tuning curves.

Temporal dynamics of neural responses
In figure 7 we plotted the temporal response profiles of excitatory and inhibitory neurons in our model, in response to a constant input. To investigate how our results depended on the relative timescale of excitation and inhibition, we ran additional simulations with the timescale of excitation sped up by a factor of 40 relative to the timescale of inhibition (i.e., as determined by the ratio between $a$ and $b$ in equations 5-6 in the main text; Supp. fig. 3). Increasing the relative speed of excitation resulted in more pronounced input-dependent variations in the time that excitatory responses took to reach their peak (Supp. fig. 3a). Inhibitory responses were also quicker to arrive at steady state, and in some cases showed a transient overshoot (Supp. fig. 3b). Overall, however, we observed qualitatively similar results to figure 7a-b in the main text.
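The qualitative point, that the ratio of excitatory to inhibitory rate constants shapes the transient response but not the steady state, can be illustrated with a toy divisive E-I pair. These are not the main-text equations: the input $s$, weights $w$, $w_0$ and rate constants $a$, $b$ below are all invented for the sketch:

```python
# Toy divisive excitatory-inhibitory pair (a hypothetical stand-in, NOT the
# main-text model):  dr_e/dt = a * (s / max(w * r_i, w0) - r_e)
#                    dr_i/dt = b * (w * r_e - r_i)
def simulate(a, b, s=2.0, w=1.0, w0=0.1, dt=1e-3, T=30.0):
    r_e, r_i = 0.0, 0.0
    for _ in range(int(T / dt)):
        r_e += dt * a * (s / max(w * r_i, w0) - r_e)
        r_i += dt * b * (w * r_e - r_i)
    return r_e, r_i

fast = simulate(a=40.0, b=1.0)    # excitation 40x faster than inhibition
slow = simulate(a=1.0, b=1.0)

# steady state solves w^2 * r_e^2 = s (once w * r_i > w0), whatever the timescales
r_ss = 2.0 ** 0.5
```

With fast excitation the excitatory rate transiently overshoots before inhibition builds up, but both settings relax to the same divisive fixed point.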