Stochastic neural field model of stimulus-dependent variability in cortical neurons

We use stochastic neural field theory to analyze the stimulus-dependent tuning of neural variability in ring attractor networks. We apply perturbation methods to show how the neural field equations can be reduced to a pair of stochastic nonlinear phase equations describing the stochastic wandering of spontaneously formed tuning curves or bump solutions. These equations are analyzed using a modified version of the bivariate von Mises distribution, which is well-known in the theory of circular statistics. We first consider a single ring network and derive a simple mathematical expression that accounts for the experimentally observed bimodal (or M-shaped) tuning of neural variability. We then explore the effects of inter-network coupling on stimulus-dependent variability in a pair of ring networks. These could represent populations of cells in two different layers of a cortical hypercolumn linked via vertical synaptic connections, or two different cortical hypercolumns linked by horizontal patchy connections within the same layer. We find that neural variability can be suppressed or facilitated, depending on whether the inter-network coupling is excitatory or inhibitory, and on the relative strengths and biases of the external stimuli to the two networks. These results are consistent with the general observation that increasing the mean firing rate via external stimuli or modulating drives tends to reduce neural variability.


Author summary
A topic of considerable current interest concerns the neural mechanisms underlying the suppression of cortical variability following the onset of a stimulus. Since trial-by-trial variability and noise correlations are known to affect the information capacity of neurons, such suppression could improve the accuracy of population codes. One of the main candidate mechanisms is the suppression of noise-induced transitions between multiple attractors, as exemplified by ring attractor networks. The latter have been used to model experimentally measured stochastic tuning curves of directionally selective middle temporal (MT) neurons. In this paper we show how the stimulus-dependent tuning of neural variability in ring attractor networks can be analyzed in terms of the stochastic wandering of spontaneously formed tuning curves or bumps in a continuum neural field model. The advantage of neural fields is that one can derive explicit mathematical expressions for the second-order statistics of neural activity, and explore how this depends on important a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 Introduction synaptic connections, or two different cortical hypercolumns linked by horizontal patchy connections within the same layer. We will refer to these two distinct architectures as model A and model B, respectively. (See also Fig 1) In this paper, we use model A to show how vertical excitatory connections between two stochastic ring networks can reduce neural variability, consistent with a previous analysis of spatial working memory [22]. We also show that the degree of noise suppression can differ between layers, as previously found in an experimental study of orientation selective cells in V1 [24]. An experimental "center-surround" study of stimulus-dependent variability in V1 indicates that correlations in spontaneous activity at the center can be suppressed by stimulating outside the classical receptive field of the recorded neurons [25], that is, by evoking activity in the surround. In this paper, we show that the effect of a surround stimulus depends on at least two factors: (i) whether or not the horizontal connections effectively excite or inhibit the neurons in the center, and (ii) the relative directional bias of the surround stimulus. In particular, we find that at high contrasts (inhibitory regime), noise is facilitated in the center when the center and surround stimuli have the same directional bias, whereas it is suppressed when the center and surround stimuli have opposite directional biases. The converse holds at low contrasts (excitatory regime). These results are consistent with the general observation that increasing the mean firing rate via external stimuli or modulating drives tends to reduce neural variability.
In the remainder of the Introduction we introduce our stochastic neural field model of coupled ring networks and describe in more detail the structure of models A and B. In Materials and Methods we use perturbation theory to show how the neural field equations can be reduced to a pair of stochastic phase equations describing the stochastic wandering of bump solutions. These equations are analyzed in the Results, using a modified version of the bivariate von Mises distribution, which is well-known in the theory of circular statistics. This then allows us to determine the second-order statistics of a single ring network, providing a mathematical underpinning for the experimental and computational studies of Ponce-Alvarez et al [10], and to explore the effects of inter-network coupling on neural variability in models A and B.
One final point is in order. There are many different measures of neural variability in the literature, both in terms of the type of statistical quantity (variance, Fano factor, auto and cross correlations) and the relevant observables (single cell or population firing rates/binned spikes, voltages for in vivo patch recordings, and individual spikes). In this paper, we mainly focus on the mean and variance of population activity, which could be interpreted as the extracellular voltage or current associated with a population of cells. Hence, our results speak most directly to the experimental studies of Ref. [10]. The possible relationship to other measures of neural variability are considered in the Discussion.

Coupled ring model
Consider a pair of mutually coupled ring networks labeled j = 1, 2. Let u j (θ, t) denote the activity at time t of a local population of cells with stimulus preference θ 2 [−π, π) in network j.
Here θ could represent the direction preference of neurons in area-middle temporal cortex (MT) [10], the orientation preference of V1 neurons, after rescaling θ ! θ/2 [26,27], or a coordinate in spatial working memory [22,28,29]. For concreteness, we will refer to θ as a direction preference. The variables u j evolve according to the neural field equations [21,22,30,31] where ffi ffi � p is a constant scale factor (see below), J j (θ − θ 0 ) is the distribution of intra-network connections between cells with stimulus preferences θ 0 and θ in network j, K j (θ − θ 0 ) is the corresponding distribution of inter-network connections to network j, and h j (θ) is an external stimulus. The firing rate function is assumed to be a sigmoid with maximal firing rate f 0 , gain γ and threshold η. The final term on the right-hand side of each equation represents external additive noise, with W j (θ, t) a θ-dependent Wiener process. In particular, where δ(t) is the Dirac delta function and δ i,j is the Kronecker delta function. For concreteness, we will take C(θ) = aδ(θ) + b cos(θ) for constants a, b. For b 6 ¼ 0, the noise is colored in θ (which is necessary for the solution to be spatially continuous) and white in time. (One could also take the noise to be colored in time by introducing an additional Ornstein-Uhlenbeck process. For simplicity, we assume that the noise processes in the two networks are uncorrelated, which would be the case if the noise were predominantly intrinsic. Correlations would arise if some of the noise arose from shared fluctuating inputs. For a discussion of the effects of correlated noise in coupled ring networks see [22].) The external stimuli are taken to be weakly biased inputs of the form ffi ffi � p h j with where � y j is the location of the peak of the input (stimulus bias) and � h j is the contrast. Finally, the time-scale is fixed by setting the time constant τ = 10 msec. The maximal firing rate f 0 is taken to be 100 spikes/sec. The weight distributions are 2π-periodic and even functions of θ and thus have cosine series expansions. Following [21], we take the intra-network recurrent connections to be which means that cells with similar stimulus preferences excite each other, whereas those with sufficiently different stimulus preferences inhibit each other. It remains to specify the nature of the inter-network connections. As we have already mentioned, we consider two different network configurations (see Fig 1): (A) a vertically connected two layer or laminar model and (B) a horizontally connected single layer model. In model A, the inter-network weight distribution is taken to have the form which represents vertical coupling between the layers. We also assume that both layers receive the same stimulus bias, that is, � (4). In model B, the inter-network weight distribution represents patchy horizontal connections, which tend to link cells with similar stimulus preferences [32][33][34][35]. This is implemented by taking Now the two networks can be driven by stimuli with different biases so that � y 1 6 ¼ � y 2 . Note that in order to develop the analytical methods of this paper, we scale the internetwork coupling, the noise terms and the external stimuli in Eq (1) by the constant factor ffi ffi � p . Taking 0 < � � 1 (weak noise, weak inputs and weak inter-network coupling) will allow us to use perturbation methods to derive explicit parameter-dependent expressions for neural variability. We do not claim that cortical networks necessarily operate in these regimes, but use the weakness assumption to obtain analytical insights and make predictions about the qualitative behavior of neural variability. In the case of weak inter-network connections, the validity of the assumption is likely to depend on the source of these connections. For example, in model B, they arise from patchy horizontal connections within superficial or deep layers of cortex, which are known to play a modulatory role [36]. On the other hand, vertical connections between layers are likely to be stronger than assumed in our modeling analysis, at least in the feedforward direction [37]. Finally, the weak stimulus assumption depends on a particular view of how cortical neurons are tuned to stimuli, which is based on the theory of ring attractor networks, see the Discussion.

Results
We present various analytical and numerical results concerning stimulus-dependent neural variability, under the assumption that the neural field Eq (1) support stable stationary bump solutions u j (θ, t) = U j (θ) = A j cos(θ), j = 1, 2, in the absence of noise, external stimuli, and inter-network coupling (� = 0). The amplitudes A j are determined self-consistently from the equations (see Material and methods) One of the important properties of the uncoupled homogeneous neural field equations is that they are marginally stable with respect to uniform translations around the ring. That is, the location of the peak of the bump is arbitrary, which reflects the fact that the homogeneous neural field is symmetric with respect to uniform translations. Marginal stability has a number of important consequences. First, the presence of a weakly biased external stimulus ffi ffi � p h j can lock the bump to the stimulus. The output activity is said to amplify the input bias and provides a network-based encoding of the stimulus, which can be processed by upstream networks. (Since the bump may persist if the stimulus is removed, marginally stable neural fields have been proposed as one mechanism for implementing a form of spatial working memory [22,28,29,38,39]).
A second consequence of operating in a marginally stable regime is that the bump is not robust to the effects of external noise, which can illicit a stochastic wandering of the bump [20][21][22][39][40][41]. One way to investigate the stochastic wandering of bumps in a neural field model is to use perturbation theory. The latter was originally applied to the analysis of traveling waves in one-dimensional neural fields [30,31], and was subsequently extended to the case of wandering bumps in single-layer and multi-layer neural fields [21,22,42]. The basic idea is to to treat longitudinal and transverse fluctuations of a bump (or traveling wave) separately in the presence of noise, in order to take proper account of marginal stability. This is implemented by decomposing the stochastic neural field into a deterministic bump profile, whose spatial location or phase has a slowly diffusing component, and a small error term. (There is always a non-zero probability of large deviations from the bump solution, but these are assumed to be negligible up to some exponentially long time.) Perturbation theory can then be used to derive an explicit stochastic differential equation (SDE) for the diffusive-like wandering of the bump in the weak noise regime. (A more rigorous mathematical treatment that provides bounds on the size of transverse fluctuations has also been developed [43,44]).
Motivated by previous studies of wandering bumps in stochastic neural fields, we introduce the amplitude phase decomposition [22,30] u j ðy; tÞ ¼ U j ðy þ b j ðtÞÞ þ ffi ffi � p v j ðy; tÞ; j ¼ 1; 2: ð9Þ (As it stands, this decomposition is non-unique, unless an additional mathematical constraint is imposed that can define β j and v j uniquely. Within the context of formal perturbation methods, this is achieved by imposing a solvability condition that ensures that the error term can be identified with fast transverse fluctuations, which converge to zero exponentially in the absence of noise, see Materials and methods.) Substituting Eq (9) into the full stochastic neural field Eq (1) and using perturbation theory along the lines of [21,22,30,42], one can derive the following SDEs for the phases β j (t), see Materials and methods: where L j ¼ � h j =A j , K j ðbÞ are 2π-periodic functions that depend on the form of the inter-network connections, and w j (t) are independent Wiener processes: The functions K j ðbÞ and the diffusion coefficients D 1 , D 2 are calculated in Materials and Methods, see Eqs (73), (74) and (77). where E½dWðy; tÞ� ¼ 0; E½dWðy; tÞdWðy 0 ; t 0 Þ� ¼ Cðy À y 0 Þdðt À t 0 Þdt dt 0 ; ð13Þ

Wandering bumps in a single stochastic ring network
with JðyÞ ¼ � J cos y; hðyÞ ¼ � h cos y; CðyÞ ¼ adðyÞ þ b cos y: A clear demonstration of the suppressive effects of an external stimulus can be seen from direct numerical simulations of Eq (12), see Fig 2. In the absence of an external stimulus, the centerof-mass (phase) of the bump diffuses on the ring, whereas it exhibits localized fluctuations when a weakly-biased stimulus is present. Clearly, the main source of neural variation is due to the wandering of the bump, which motivates the amplitude phase decomposition given by Eq (9).
Applying the perturbation analysis of Materials and Methods yields a one-network version of the phase Eq (10), which takes the form where A is the amplitude of the bump for � = 0. Eq (14) is known as a von Mises process, which can be regarded as a circular analog of the Ornstein-Uhlenbeck process on a line, and generates distributions that frequently arise in circular or directional statistics [45]. The von Mises process has been used to model the trajectories of swimming organisms [46,47], oscillators in physics [48], bioinformatics [49], and the data fitting of neural population tuning curves [50]. (Nonlinear stochastic phase equations analogous to (14) also arise in models of ring attractor networks with synaptic heterogeneities, which have applications to spatial working memory [23,51,52]).
Introduce the probability density This satisfies the forward Fokker-Planck equation (dropping the explicit dependence on initial conditions) for β 2 [−π, π] with periodic boundary conditions p(−π, t) = p(π, t). It is straightforward to show that the steady-state solution of Eq (15) is the von Mises distribution with Here I 0 (κ) is the modified Bessel function of the first kind and zeroth order (n = 0), where expðk cos yÞ cosðnyÞdy: Sample plots of the von Mises distribution are shown in Fig 3. One finds that M(β; β � , κ) ! 1/ 2π as κ ! 0; since k � � h this implies that in the absence of an external stimulus one recovers the uniform distribution of pure Brownian motion on the circle. On the other hand, the von Mises distribution becomes sharply peaked as κ ! 1. More specifically, for large positive κ, We thus have an explicit example of the noise suppression of fluctuations by an external stimulus, since s 2 / 1= � h. (We are assuming that the time for the distribution of the stochastic phase variable to reach steady-state is much shorter than the time for the amplitude-phase decomposition (9) to break down. This can be proven rigorously using variational methods for sufficiently small �, since the time for a large transverse fluctuation becomes exponentially large [44]).
Moments of the von Mises distribution are usually calculated in terms of the circular moments of the complex exponential x = e iβ = cos β + i sin β. The nth circular moment is defined according to In particular, We can use these moments to explore stimulus-dependent variability in terms of the stochastic wandering of the bump or tuning curve. That is, consider the leading order approximation u(θ, t) � A cos(θ + β(t)), with β(t) evolving according to the von Mises SDE (14). Trial-to-trial variability can be captured by averaging the solution with respect to the stationary von Mises density (16). First, from Eq (20). Hence, the mean amplitude A(κ) is given by the first circular moment of the von Mises distribution, see inset of Fig 3. When κ = 0 (zero external stimulus), the amplitude vanishes due to the fact that the random position of the bump is uniformly distributed around the ring. As the stimulus contrast � h increases the wandering of the bump is more restricted and A(κ) monotonically increases. Second, It follows that the variance is In Fig 4(a), we show example plots of the normalized variance var(U)/A 2 as a function of the parameter κ, which is a proxy for the input amplitude � h, since k / � h. It can be seen that our theoretical analysis reproduces the various trends observed in [10]: (i) a global suppression of neural variability that increases with the stimulus contrast; (ii) a directional tuning of the variability that is bimodal; (iii) a peak in the suppression of cells at the preferred directional selectivity. One difference between our theoretical results and those of [10] is that, in the latter case, the directional tuning of the variance is not purely sinusoidal. Part of this can be accounted for by noting that we consider the variance of the activity variable u rather than the firing rate f(u). Moreover, for analytical convenience, we take the synaptic weight functions etc. to be firstorder harmonics. In Fig 4(b) we show numerical plots of the variance in the firing rate, which exhibits the type of bimodal behavior found in [10] when the ring network operates in the marginal regime.

Effects of inter-laminar coupling (model A)
We now turn to a pair of coupled ring networks that represent vertically connected layers as shown in Fig 1(a) (model A), with inter-network weight distribution (6). For analytical tractability, we impose the symmetry conditions However, we allow the contrasts of the external stimuli to differ, � (10) then reduce to the form, see Materials and methods with Given our various simplifications, we can rewrite Eq (23) in the more compact form where F is the potential function Introduce the joint probability density This satisfies the two-dimensional forward Fokker-Planck equation (dropping the explicit dependence on initial conditions) for β j 2 [−π, π] and periodic boundary conditions p(−π, β 2 , t) = p(π, β 2 , t), p(β 1 , −π, t) = p(β 1 , π, t). The existence of a potential function means that we can solve the time-independent FP equation. Setting time derivatives to zero, we have where J j is a probability current. In the stationary state the probability currents are constant, but generally non-zero. However, in the special case D 1 = D 2 = D, then there exists a steadystate solution in which the currents vanish. This can be seen by rewriting the vanishing current conditions as This yields the steady-state probability density, which is a generalization of the von Mises distribution, where and N is the normalization factor The distribution M 2 (β 1 , β 2 ;κ 1 , κ 2 , χ) is an example of a bivariate von Mises distribution known as the cosine model [49]. The normalization factor can be calculated explicitly to give N ðk 1 ; k 2 ; wÞ ¼ ð2pÞ The corresponding marginal distribution for β 1 is An analogous result holds for the marginal density p(β 2 ). We now summarize a few important properties of the cosine bivariate von Mises distribution [49]: and is bimodal if 2. When κ 1 and κ 2 are large, the random variables (β 1 , β 2 ) are approximately bivariate normal distributed, that is, (β 1 , β 2 ) � N 2 (0, S) with We will assume that the vertical connections are maximal between neurons with the same stimulus preference so that � K � 0 and χ � 0. It then follows that p(β 1 , β 2 ) is unimodal. Moreover, from Eq (32) we have For zero inter-network coupling (χ = 0), we obtain the diagonal matrix S ¼ diagðk À 1 1 ; k À 1 2 Þ and we recover the variance of the single ring networks, that is, varðb j Þ ¼ k À 1 j ; there are no interlaminar correlations. On the other hand, for χ > 0 we find two major effects of the interlaminar connections. First, the vertical coupling reduces fluctuations in the phase variables within a layer. This is most easily seen by considering the symmetric case κ 1 = κ 2 = κ for which Clearly, (This result is consistent with a previous study of the effects of inter-network connections on neural variability, which focused on the case of zero stimuli and treated the bump positions as effectively evolving on the real line rather than a circle [22]. In this case, inter-network connections can reduce the variance in bump position, which evolves linearly with respect to the time t.) The second consequence of interlaminar connections is that they induce correlations between the phase β 1 (t) and β 2 (t).
Having characterized the fluctuations in the phases β 1 (t) and β 2 (t), analogous statistical trends will apply to the trial-to-trial variability in the tuning curves. This follows from making the leading-order approximation u j (x, t) � A cos(θ + β j (t)), and then averaging the β j with respect to the bivariate von Mises density M 2 (β 1 , β 2 ; κ 1 , κ 2 , χ). In the large κ j regime, this could be further simplified by averaging with respect to the bivariate normal distribution under the approximations cos(β) � 1 − β 2 /2 and sin β � β. Both the mean and variance of the tuning curves are similar to the single ring network, see Eqs (21) and (22): and Their dependence on the coupling strength χ and input parameter κ 1 = κ 2 = κ is illustrated in Fig 5. Finally, so that inter-network covariance take the form hU 1 ðyÞU 2 ðy 0 Þi À hU 1 ðyÞihU 2 ðy 0 Þi ¼ A 2 cos y cos y 0 ½hcos b 1 cos b 2 i À hcos b 1 ihcos b 2 i� þ A 2 sin y sin y 0 hsin b 1 sin b 2 i: In particular, for θ = θ 0 we have The resulting correlation tuning curve behaves in a similar fashion to the variance, see Fig  5( (Note that our definition of the cross-correlation function differs from that used, for example, by Churchland et al [9]. These authors consider the covariance matrix of simultaneous recordings of spike counts obtained using a 96-electrode array. The matrix is then decomposed into a network covariance matrix and a diagonal matrix of private single neuron noise. Our definition involves pairwise correlations between the activity of two distinct populations. Nevertheless, consistent with the findings of Churchland et al. [9], we find that the cross-correlations decrease in the presence of a stimulus).
The above qualitative analysis can be confirmed by numerical simulations of the full neural field Eq (1), as illustrated in Fig 6( In the absence of interlaminar connections, the phase of network 2 fluctuates much more than the phase of network 1. When interlaminar connections are included, fluctuations are reduced, but network 2 still exhibits greater variability than network 1. This latter result is consistent with an experimental study of neural variability in V1 [24], which found that neural correlations were more prominent in superficial and deep layers of cortex, but close to zero in input layer 4. One suggested explanation for these differences is that layer 4 receives direct feedforward input from the LGN. Thus we could interpret network 1 in model A as being located in layer 4, whereas network 2 is located in a superficial layer, say.

Effects of intra-laminar coupling (model B)
Our final example concerns a pair of coupled ring networks that represent horizontally connected hypercolumns within the same superficial layer, say, as shown in Fig 1(b) (model B), with inter-network weight distribution (7). Again, for analytical tractability, we impose the symmetry conditions A 1 = A 2 = A and � K 1 ¼ � K 2 ¼ � K . However, unlike model A, we take the contrasts to be the same, � h 1 ¼ � h 2 ¼ � h, but allow the biases of the two inputs to differ, � y 1 6 ¼ � y 2 . Eq (10) become, see Materials and methods with w j (t) given by Eq (24) and We can rewrite KðbÞ in the form ; �ðbÞ ¼ f ðA cosðy À bÞÞf ðA cosðyÞÞ: Note that ϕ(−β) = ϕ(β) and thus ϕ 0 (−β) = −ϕ 0 (β). A sample plot of the potential ϕ(β) is shown in Fig 7(a), together with an approximate curve fitting based on a von Mises distribution. For the given firing rate parameters η = 0.5 and γ = 4, the unperturbed bump amplitude is A � 1.85. As in the case of model A, we can rewrite Eq (40) in the more compact form where C is the potential function and we have absorbed the factor 2/(A|Γ|) into the constant � K . The corresponding two- dimensional forward Fokker-Planck equation is for β j 2 [−π, π] and periodic boundary conditions p(−π, β 2 , t) = p(π, β 2 , t), p(β 1 , −π, t) = p(β 1 , π, t). Following the analysis of model A, if D 1 = D 2 = D then the stationary density takes the form where and M is a normalization factor. Long-range horizontal connections within superficial layers of cortex are mediated by the axons of excitatory pyramidal neurons. However, they innervate both pyramidal neurons and feedforward interneurons so that they can have a net excitatory or inhibitory effect, depending on stimulus conditions [36,53,54], More specifically, they tend to be excitatory at low contrasts and inhibitory at high contrasts. Suppose that ring network 1 represents a hypercolumn driven by a stimulus � h cos y and network 2 represents a hypercolumn driven by a stimulus � h cosðy À � yÞ, see Fig 1(b). In Fig 7(b) and 7(c) we plot how the normalized maximal mean and variance of network 1 (at θ = ±π/2) varies with the directional bias � y of the input to network 2. We also show the baseline mean and variance in the absence of horizontal connections (χ = 0). It can be seen that the mean and variance covary in opposite directions. In particular, for inhibitory horizontal connections (χ < 0) the variance is facilitated relative to baseline when the two stimuli have similar biases ( � y � 0) and is suppressed when they are sufficiently different ( � y � �p). The converse holds for excitatory horizontal connections (χ > 0). In the Discussion, these results will be explored within the context of surround modulation.

Discussion
In this paper we used stochastic neural field theory to analyze stimulus-dependent neural variability in ring attractor networks. In addition to providing a mathematical underpinning of previous experimental observations regarding the bimodal tuning of variability in directionally specific MT neurons, we also made a number of predictions regarding the effects of inter-network connections on noise suppression: 1. Excitatory vertical connections between cortical layers can suppress neural variability; different cortical layers can exhibit different degrees of variability according to the strength of afferents into the layers.
2. At low stimulus contrasts, surround stimuli tend to suppress (facilitate) neural variability in the center when the center and surround stimuli have similar (different) biases.
3. At high stimulus contrasts, surround stimuli tend to facilitate (suppress) neural variability in the center when the center and surround stimuli have similar (different) biases.
It is important to emphasize that previous related studies of variability in marginally stable ring networks have been based on computer simulations of spatially discrete models [10,20]. That is, integrals with respect to the orientation or direction variable θ are replaced by discrete sums, so that the model dynamics is described by stochastic differential equations rather than stochastic neural fields. As we have demonstrated in this paper, the advantage of neural field theory is that it provides an analytical framework for studying neural variability in marginally stable ring attractor networks, see also [21]. In Ref. [20], the behavior of a marginally stable ring network is compared to a stabilized supralinear ring network. The latter operates in a completely different dynamical regime, consisting of a single stimulus-tuned attractor. This means that there does not exist a bump solution in the absence of a stimulus. One of the consequences of this is that weak (spontaneous) inputs increase variability, which is subsequently quenched by stronger inputs. The basic mechanism involves stimulus-dependent changes in the balance of two opposing effects [20]: feedforward interactions and recurrent excitation, which amplify variability and dominate for weak stimuli, and stabilizing inhibitory feedback, which suppresses variability and dominates in the case of stronger inputs. The authors also show that the orientation tuning of neural variability tends to be Ushaped, rather than M-shaped as found in [10], with a minimum at the preferred stimulus orientation. The stabilized supralinear ring network model was found to be more consistent with single neuron recordings from the V1 of awake primates, when compared to the marginally stable ring model.
However, certain caution should be taken when interpreting the results of Ref. [20]. First, the precise mechanism underlying the role of feedforward and recurrent inputs in generating orientation tuning in V1 is still controversial, see below. Second, the marginally stable ring model can also produce U-shaped tuning of neural variability using an appropriate Fourier decomposition of the weights-the M-shape was a direct consequence of using the first harmonic cosθ. Third, it is far from clear that the same operating regime holds for all V1 neurons, and may also vary according to the specific stimulus feature, cortical layer and cortical area. (The latter might account for differences between MT direction selective cells and V1 neurons.) Finally, there could be differences between the trial-averaged statistics of single neuron recordings and the statistics of local neural populations, as represented by neural field variables. As a further comparison of the two model paradigms, it would be interesting to explore the effects of intralaminar and interlaminar coupling on noise variability in stabilized supralinear ring networks.

Weak stimulus assumption
In order to utilize perturbation methods, we assumed that the ring networks were driven by weakly biased stimuli. This assumption depends on a particular view of how cortical neurons are tuned to stimuli. Consider the most studied example, which involves orientation tuning of cells in V1. The degree to which recurrent processes contribute to the receptive field properties of V1 neurons has been quite controversial over the years [55][56][57][58]. The classical model of Hubel and Wiesel [59] proposed that the orientation preference and selectivity of a cortical neuron in input layer 4 arises primarily from the geometric alignment of the receptive fields of thalamic neurons in the lateral geniculate nucleus (LGN) projecting to it. (Orientation selectivity is then carried to other cortical layers through vertical projections). This has been confirmed by a number of experiments [60][61][62][63][64]. However, there is also significant experimental evidence suggesting the importance of recurrent cortical interactions in orientation tuning [65][66][67][68][69][70][71]. One issue that is not disputed is that some form of inhibition is required to explain features such as contrast-invariant tuning curves and cross-orientation suppression [58]. The uncertainty in the degree to which intracortical connections contribute to orientation tuning of V1 neurons is also reflected in the variety of models. In ring attractor models [26,27,72,73], the width of orientation tuning of V1 cells is determined by the lateral extent of intracortical connections. Recurrent excitatory connections amplify weakly biased feedforward inputs in a way that is sculpted by lateral inhibitory connections. Hence, the tuning width and other aspects of cortical responses are primarily determined by intracortical rather than thalamocortical interconnections. On the other hand, in push-pull models, cross-orientation inhibition arises from feedforward inhibition from interneurons [62,74]. Finally, in normalization models, a large pool of orientation-selective cortical interneurons generates shunting inhibition proportional in strength to the stimulus contrast at all orientations [75]. In the end, it is quite possible that are multiple circuit mechanisms for generating tuned cortical responses to stimuli, which could depend on the particular stimulus feature, location within a feature preference map, and cortical layer [58].

Surround modulation of neural variability
Surround modulation (SM) refers to the phenomenon in which stimuli in the surround of a neuron's receptive field (RF) modulate the neuron's response to stimuli simultaneously presented inside the RF. SM is a fundamental property of sensory neurons in many species and sensory modalities, and is thought to play an important role in contextual image processing. As with mechanisms of orientation tuning, there is considerable debate over whether feedforward or intracortical circuits generate SM, and whether this results from increased inhibition or reduced excitation [19,36,53,54,[76][77][78][79][80][81][82]. SM has been characterized in many species, commonly using circular grating patches of increasing radius or grating patches confined to the RF surrounded by annular gratings, and varying systematically the grating parameters. Modulatory effects are typically quantified in terms of changes in the mean firing rates of single neurons recorded from the center. Some of the main features of SM in V1 are as follows (see [36] and references therein): (i) SM is spatially extensive. For example, in primates, modulatory effects from the surround (both facilitatory and suppressive) can be evoked at least 12.5 degrees away from a neuron's RF center. (ii) SM is tuned to specific stimulus parameters. The strongest suppression is induced by stimuli in the RF and surround of the same orientation, spatial frequency, drift direction, and speed, and weaker suppression or facilitation is induced by stimuli of orthogonal parameters (e.g., orthogonally oriented stimuli or stimuli drifting in opposite directions). (iii) SM is contrast dependent. Surround stimulation evokes suppression when the center and surround stimuli are of high contrast, but can be facilitatory when they are of low contrast.
One way to interpret the results of model B is to treat networks 1 and 2 as hypercolumns driven by center and surround stimuli, respectively. SM is then mediated by the horizontal connections that can have a net excitatory or inhibitory effect, depending on stimulus conditions. Here, for simplicity, we impose the sign of the horizontal connections by hand. However, one could develop a more detailed model that implements the switch between excitation and inhibition using, for example, high threshold interneurons [54]. The major prediction of our analysis is that whenever the surround modulation suppresses (facilitates) the center firing rate, the corresponding variance is facilitated (suppressed).

Extensions of the neural field model
One of the main simplifications of our neural field model is that we do not explicitly distinguish between excitatory and inhibitory populations. This is a common approach to the analysis of neural fields, in which the combined effects of excitation and inhibition are incorporated using, for example, Mexican hat functions [83][84][85]. In the case of the ring network, the spontaneous formation of population orientation tuning curves or bumps is implemented using a cosine function, which represents short-range excitation and longer-range inhibition around the ring. We note, however, that the methods and results presented in this paper could be extended to the case of separate excitatory and inhibitory populations, as well as different classes of interneuron, as has been demonstrated elsewhere for deterministic neural fields [27,54]. One major difference between scalar and E-I neural fields is that the latter can also exhibit time-periodic solutions, which would add an additional phase variable associated with shifts around the resulting limit cycle. The effects of noise on limit cycle oscillators can be analyzed in an analogous fashion to wandering bumps [86,87]. We also note that neural variability in a two-population (E-I) stabilized supralinear network has been analyzed extensively using linear algebra [20].
Another possible extension of our work would be to consider higher-dimensional neural fields. For example, one could replace the ring attractor on S 1 by a spherical attractor on S 2 . In the latter case, marginally stable modes would correspond to rotations of the sphere. (Mathematically speaking, this corresponds to the action of the Lie group SO(3) rather than SO(2) for the circle.) One could generalize the Fourier analysis of the ring network by using spherical harmonics, as previously shown for deterministic neural field models of orientation and spatial frequency tuning in V1 [88,89]. One could also consider a planar neural field with Euclideansymmetric weights, for which marginally stable modes would be generated by the Euclidean group of rigid body transformations of the plane (translations, rotations and reflections.) However, this example is more difficult since the marginally stable manifold is non-compact, and one cannot carry out a low-dimensional harmonic reduction. In order to obtain analytical results, one has to use Heaviside rate functions [30,90].
A third possible extension would be to develop a more detailed model of the laminar structure of cortex. Roughly speaking, cortical layers can be grouped into input layer 4, superficial layers 2/3 and deep layers 5/6 [37,[91][92][93]. They can be distinguished by the source of afferents into the layer and the targets of efferents leaving the layer, the nature and extent of intralaminar connections, the identity of interneurons within and between layers, and the degree of stimulus specificity of pyramidal cells. In previous work, we explored the role of cortical layers in the propagation of waves of orientation selectivity across V1 [94], under the assumption that deep layers are less tuned to orientation. This suggests considering coupled ring networks that differ in their tuning properties. Another modification would be to consider asymmetric coupling between layers, both in terms of the range of coupling and its strength. Interestingly, the properties of SM also differ across cortical layers, suggesting different circuits and mechanisms generating SM in different layers. More specifically, surround fields in input layer 4 are smaller than in other layers, and SM is weaker and untuned for orientation. Moreover, SM is stronger and more sharply orientation-tuned in superficial layers compared to deep layers [36]. Therefore, it would be interesting to consider coupled ring networks that combine models A and B.

Spiking versus rate-based models
One final comment is in order. Neural variability in experiments is typically specified in terms of the statistics of spike counts over some fixed time interval, and compared to an underlying inhomogeneous Poisson process. Often Fano factors greater than one are observed. In this paper, we worked with stochastic firing rate models rather than spiking models, so that there is some implicit population averaging involved. In particular, we focused on the statistics of the variables u j (x, t), which represent the activity of local populations of cells rather than of individual neurons, with f(u j ) the corresponding population firing rate [30]. This allowed us to develop an analytically tractable framework for investigating how neural variability depends on stimulus conditions within the attractor model paradigm. In order to fit a neural field model to single-neuron data, one could generate spike statistics by taking f(u j ) to be the rate of an inhomogeneous Poisson process. Since f(u j ) is itself stochastic, this would result in a doubly stochastic Poisson process, which is known to produce Fano factors greater than unity [95]. Moreover, the various phenomena identified in this paper regarding stimulus-dependent variability would carry over to a spiking model, at least qualitatively. However, one should not expect a mean-field reduction to capture everything in a spiking model. For example, multivariate doubly stochastic Poisson processes can have correlations between their spike times in addition to the correlations induced by shared rate fluctuations. Spiking network models typically do produce these spike timing correlations that are not captured by most mean-field reductions, even those that account for correlated firing rate fluctuations [13,15,[96][97][98]. These correlations could, in turn, affect auto-correlation and firing rates in the network.

Materials and methods
We present the details of the derivation of the stochastic phase Eq (10).

Stationary bumps in a single uncoupled ring
First, suppose that there are no external inputs, no inter-network coupling (J 12 = J 21 = 0), and no noise (� = 0). Each network can then be described by a homogeneous ring model of the form @uðy; tÞ @t ¼ À uðy; tÞ þ Substituting the cosine series expansion cosðy À y 0 Þ ¼ cosðyÞ cosðy 0 Þ þ sinðyÞ sinðy 0 Þ ð49Þ into the integral equation yields the even solution U(θ) = A cos θ with the amplitude A satisfying the self-consistency condition The amplitude Eq (50) can be solved explicitly in the large gain limit γ ! 1, for which f(u) ! H(u − κ), where H is the Heaviside function [21]. That is, A ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 þ k p � ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 À k p , corresponding to a marginally stable large amplitude wide bump and an unstable small amplitude narrow bump, consistent with the original analysis of Amari [90]. On the other hand, at intermediate gains, there exists a single stable bump rather than an unstable/stable pair of bumps, see Fig 8. Linear stability of the stationary solution can be determined by considering weakly perturbed solutions of the form u(θ, t) = U(θ) + ψ(θ)e λt for |ψ(θ)| � 1. Substituting this expression into Eq (47), Taylor expanding to first order in ψ, and imposing the stationary condition (48) yields the infinite-dimensional eigenvalue problem [27] ðl þ 1ÞcðyÞ ¼ This can be reduced to a finite-dimensional eigenvalue problem by applying the expansion (49): where Substituting Eqs (52) into (53) then gives the matrix equation [21] ðl þ 1Þ A I½cos y sin y� where for any periodic function v(θ) Integrating Eq (50) by parts shows that for A 6 ¼ 0 Hence, exploiting the fact that I is a linear functional of v, Finally, integration by parts establishes that which yields the pair of solutions The zero eigenvalue is a consequence of the fact that the bump solution is marginally stable with respect to uniform shifts around the ring; the generator of such shifts is the odd function sinθ. The other eigenvalue λ e is associated with the generator, cosθ, of expanding or contracting perturbations of the bump. Thus linear stability of the bump reduces to the condition λ e < 0. This can be used to determine the stability of the pair of bump solutions in the high-gain limit [21]. (Note that there also exist infinitely many eigenvalues that are equal to −1, which form the essential spectrum. However, since they lie in the left-half complex λ-plane, they do not affect stability).
A variety of previous studies have shown how breaking the underlying translation invariance of a homogeneous neural field by introducing a nonzero external input stabilizes wave and bump solutions to translating perturbations [21,[99][100][101][102]. For the sake of illustration, suppose that hðyÞ ¼ � h cosðyÞ in the deterministic version of Eq (1). This represents a weak θdependent input with a peak at θ = 0. Extending the previous analysis, one finds a stationary bump solution UðyÞ ¼ Acos y þ ffi ffi � p � h cos y, with A satisfying the implicit equation where L j b are the following linear operators It can be shown that the operator L j 0 has a 1D null space spanned by U 0 j ðyÞ. The fact that U 0 j ðyÞ belongs to the null space follows immediately from differentiating Eq (48) with respect to θ. Moreover, U 0 j ðyÞ is the generator of uniform translations around the ring, so that the 1D null space reflects the marginal stability of the bump solution. (Marginal stability of the bump means that the linear operator L j 0 has a simple zero eigenvalue while the remainder of the discrete spectrum lies in the left-half complex plane. The spectrum is discrete since S 1 is a compact domain.) This then implies a pair of solvability conditions for the existence of bounded solutions of Eq (58a), namely, that dv j is orthogonal to all elements of the null space of the adjoint operator L jy b j . The corresponding adjoint operator is L jy b vðy; tÞ ¼ À vðy; tÞ þ f 0 ðU j ðy þ bÞÞ Z p À p J j ðy À y 0 Þvðy 0 ; tÞdy 0 : ð61Þ Let V j ðyÞ span the 1D adjoint null space of L y 0 . Now taking the inner product of both sides of Eq (58a) with respect to V j ðy þ b j Þ and using translational invariance then yields the following SDEs to leading order: where for H j (β + 2π) = H j (β), and Here w j (t) are scalar independent Wiener processes, E½dw j ðtÞ� ¼ 0; E½dw j ðtÞdw k ðt 0 Þ� ¼ d j;k D j dðt À t 0 Þdt 0 dt; V j ðyÞV j ðy 0 ÞC j ðy À y 0 Þdy 0 dy: ð66Þ Note that stochastic phase equations similar to (62) were previously derived in [21,22], except that the functions H j (β) and K j ðbÞ were linearized, resulting in a system of coupled Ornstein-Uhlenbeck (OU) processes: ffi ffi ffi ffi ffi 2� p dw 1 ðtÞ; ð67aÞ for constant coefficients ν 1 , ν 2 , r 1 , r 2 . Properties of one-dimensional OU processes were then used to explore how the variance in the position of bump solutions depended on inter-network connections and statistical noise correlations. However, it should be noted that the variables β j (t) are phases on a circle (rather than positions on the real line), so that the right-hand side of Eq (67) should involve 2π-period functions. Therefore, the linear approximation only remains accurate on sufficiently short times scales for which the probability of either of the phases winding around the circle is negligible. In order to illustrate this point, consider an uncoupled OU process evolving according to ffi ffi ffi ffi ffi 2� p dw j ðtÞ: A standard analysis shows that [103] hb j ðtÞi ¼ b 0 e À n j t ; hb j ðtÞ 2 i À hb j ðtÞi 2 ¼ �D j n j 1 À e À 2n j t ½ �: In particular, the variance approaches a constant �D/2ν j in the large t limit. The corresponding density is given by the Gaussian rðb; tjb 0 ; 0Þ ¼ 1 ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 2p�D j ½1 À e À 2n j t �=n j q exp À ðb À b 0 e À kt Þ 2 2�D j ½1 À e À 2n j t �=n j ! : Although the linear approximation is sufficient if one is interested in estimating the diffusivity D j , which was the focus of [21,22], it does not yield the correct steady-state distribution on the ring in the limit t ! 1. Indeed, for v j ! 0, the density of the OU process converges pointwise to zero, whereas ρ(β, t) ! 1/2π on the ring. In our paper, we are interested in the full steady-state densities rather than just the diffusivities D j .

Evaluation of functions H j and K j
In order to determine the functions H j and K j we need to obtain explicit expressions for the null vectors V j . We will take h j ðyÞ ¼ � h j cosðy À � y j Þ. Applying the expansion (49) to the adjoint equation L jy 0 V j ¼ 0 with L jy 0 defined by Eq (61), we can write [21] V j ðyÞ ¼ f 0 ðU j ðyÞÞ½C j cos y þ S j sin y�; with sin y V j ðyÞdy: Substituting the expression for V j ðyÞ into the expressions for C j and S j then leads to a matrix equation of the form (56) with λ = 0. Since I½1� 6 ¼ 1, it follows that C j = 0 so that, up to scalar multiplications, V j ðyÞ ¼ f 0 ðU j ðyÞÞ sin y; UðyÞ ¼ A j cos y: Now substituting VðyÞ into Eq (63), we have f 0 ðU j ðyÞÞ sin y cosðy À � y j À bÞdy f 0 ðU j ðyÞÞ sin y½cos y cosðb þ � y j Þ þ sin y sinðb þ � y j Þ�dy with We have used the fact that f@(U j (θ)) is an even function of θ, so that the coefficient for cosðb þ � y j Þ is zero. The constant Γ j can be calculated from Eq (65): It follows that cos y 0 f ðA 2 cosðy 0 ÞÞdy 0