Slow diffusive dynamics in a chaotic balanced neural network

It has been proposed that neural noise in the cortex arises from chaotic dynamics in the balanced state: in this model of cortical dynamics, the excitatory and inhibitory inputs to each neuron approximately cancel, and activity is driven by fluctuations of the synaptic inputs around their mean. It remains unclear whether neural networks in the balanced state can perform tasks that are highly sensitive to noise, such as storage of continuous parameters in working memory, while also accounting for the irregular behavior of single neurons. Here we show that continuous parameter working memory can be maintained in the balanced state, in a neural circuit with a simple network architecture. We show analytically that in the limit of an infinite network, the dynamics generated by this architecture are characterized by a continuous set of steady balanced states, allowing for the indefinite storage of a continuous parameter. In finite networks, we show that the chaotic noise drives diffusive motion along the approximate attractor, which gradually degrades the stored memory. We analyze the dynamics and show that the slow diffusive motion induces slowly decaying temporal cross correlations in the activity, which differ substantially from those previously described in the balanced state. We calculate the diffusivity, and show that it is inversely proportional to the system size. For large enough (but realistic) neural population sizes, and with suitable tuning of the network connections, the proposed balanced network can sustain continuous parameter values in memory over time scales larger by several orders of magnitude than the single neuron time scale.


Introduction
The consequences of irregular activity in the brain, and the mechanisms responsible for its emergence, are topics of fundamental interest in the study of brain function and dynamics. In theoretical models of brain activity, the irregular dynamics observed in neuronal activity are often modeled as arising from noisy inputs or from intrinsic noise in the dynamics of single neurons. However, theoretical and experimental works have suggested that explanations based on sources of noise in intrinsic neural dynamics are insufficient to account for the stochastic nature of activity in the cortex [1][2][3][4]. An alternative proposal is that noise in the cortex arises primarily from chaotic dynamics at the network level [3][4][5][6]. A central result in the field is that simple neural circuits with recurrent random connectivity can settle, under a broad range of conditions, into a fixed point called the balanced state [3,[7][8][9]: in this state the mean excitatory drive to each neuron nearly balances the mean inhibitory drive, and neural activity is driven by fluctuations in the excitatory and inhibitory inputs. The overall dynamics are chaotic, resulting in an apparent stochasticity in the activity of single neurons, which can exist even in the absence of any sources of random noise intrinsic to the dynamics of single neurons and synapses.
It remains unclear which computational functions in the brain are compatible with the architecture of the balanced network model, since this model assumes random, unstructured connectivity in its rudimentary form. The possibility that functional circuits in the brain are in a balanced state raises another important question: does the apparent stochasticity of single neurons in this state have similar consequences on brain function as would arise from stochasticity which is truly intrinsic to the neural and synaptic dynamics?
Here we explore the effects of chaotic noise on continuous parameter working memory. This task is particularly sensitive to noise, yet neurons in cortical areas involved in the maintenance of continuous parameter working memory have been shown to fire irregularly during task performance [10,11]. Attractor dynamics are often put forward as a mechanism for the persistent neural activity underlying this task. Continuous attractor networks are dynamically characterized by a continuous manifold of semi-stable steady states, which make it possible to memorize parameters with a continuous range of values, such as an angle or a position [12][13][14][15][16][17][18]. In such networks, noise in neural or synaptic activity can cause diffusion along the manifold of steady states, leading to gradual degradation of the stored memory [19][20][21]. However, irregular activity in the balanced state does not arise from mechanisms intrinsic to neurons or synapses, but rather from chaotic dynamics, and its consequences for continuous parameter working memory are largely unexplored. For this reason we addressed two questions. First, we asked whether a neural network can possess a continuum of balanced stable states. Second, we investigated how, in this scenario, chaotic noise would affect information maintenance.

Persistence in balanced networks
The question of whether balanced networks can produce persistent activity has attracted considerable interest in recent years. Several works explored architectures which give rise to slow dynamics in balanced networks, characterized by the coexistence of multiple discrete balanced states [22]. In several recent works multi-stability resulted from the existence of clustered connectivity, and slow transitions were observed between the discrete semi-stable states [23][24][25]. Other works [8,26] demonstrated that a discrete set of semi-stable states can be embedded in a balanced neural network, using a similar construction as employed in the classical Hopfield model of associative memory [27].
A few works have addressed the possibility that balanced neural networks may generate slow persistent activity over a continuous manifold. Such dynamics were demonstrated in simulations of neural networks that included short-term synaptic plasticity [28], or a derivativefeedback mechanism [29,30]. Previous works have not demonstrated the existence of a continuum of steady states in a balanced neural network analytically, and it has remained unclear whether such a continuum can be obtained without evoking additional mechanisms (such as short-term synaptic plasticity, or derivative-feedback). In addition, the influence of the chaotic dynamics on the persistence of stored memory has not been analyzed. These questions are addressed in the present work.
Below, we identify an architecture in which slow dynamics are attainable in a simple form of a balanced neural network. We prove analytically the existence of a continuous attractor in our model in the large population limit. In finite networks, we show that the chaotic noise drives diffusive motion along the attractor-leading, among other consequences, to slowly decaying spike cross-correlations. We show that the diffusivity scales inversely with the system size, as predicted previously for continuous attractor networks with intrinsic sources of neuronal stochasticity. With a reasonable number of neurons and suitable tuning, our model network exhibits slow dynamics over a continuous manifold of semi-stable states, while exhibiting single neural dynamics which appear stochastic, as observed in cortical circuits.

Reciprocal inhibition between two balanced networks
Our neural network model is based on the classical balanced network model presented in Refs. [3,7,8]. This model consists of two distinct populations of binary neurons, one inhibitory and the other excitatory. The recurrent connectivity is random and sparse, with a probability K/N for a connection, where N is the population size (assumed for simplicity to be the same in both populations), K is the average number of connections per neuron from each population, and the connection strength is $ 1= ffiffiffi ffi K p . For 1 ( K ( N and over a wide range of parameters, the mean population activity settles to a fixed point (the balanced state) where on average the total excitation received by each neuron is approximately canceled by the total inhibition (to leading order in 1= ffiffiffi ffi K p ), and the neural dynamics are driven by the fluctuations in the input. The single neuron activity appears noisy, neither of the populations is fully activated or deactivated, and the overall network state is chaotic.
Despite the nonlinearities involved in the dynamics of each neuron, the population averaged activities in the balanced state are linear functions of the external input [3,7]. We exploit this linearity to build a simple system of two balanced networks projecting to each other. The intuition comes from a simple model of a continuous attractor neural network consisting of linear neurons arranged in two populations that mutually inhibit each other, Fig 1A. The linear rate dynamics of this system are given by: For J = 1 the system has a vanishing eigenvalue, and the fixed points form a continuous line: The simple neural architecture of Fig 1A was used as a basis for modeling the dynamics of neural circuits responsible for memory and decision making in the prefrontal cortex [31][32][33][34][35]. In our model, a balanced subnetwork replaces each of the populations of Fig 1A, and the inhibitory population in each subnetwork projects to the excitatory population of the other subnetwork, Fig 1B and Methods. Thus, the model consists of two reciprocally inhibiting balanced neural populations.
We consider first a scenario in which the inhibitory connectivity between the two sub-networks is all-to-all. Therefore, the network includes a combination of strong, random synapses within each sub-network and highly structured, weak synapses between the two sub-networks. This scenario lends itself to analytical treatment of finite N effects (see below). Later on, we present results also for an alternative scenario, in which the connections between the twosubnetworks are sparse, random, and strong (Additional randomness in connectivity and inputs).

Continuum of balanced states
We first examine whether the two-subnetwork architecture can give rise to a continuum of balanced states. The parameters of the network connectivity in our model are summarized in Fig 1 and in Methods. The mutual inhibition between the subnetworks is assumed to be all to all, and the interaction strength is scaled such that the total inhibitory input to each neuron, coming from the opposing subnetwork scales in proportion to ffiffiffi ffi K p . p =N from each inhibitory population to the excitatory population of the other subnetwork. As in [7], connections within each subnetwork are random with a connection probability K/N, 1 ( K ( N. Connection strengths are: J EE = ffiffiffi ffi K Similar to the case of a single balanced network [7], the mean field dynamics of the population averaged activities for N ! 1 and K ) 1 are given by: where m i ðtÞ ¼ 1=N P N k¼1 s k i ðtÞ [i = 1 (2) for the excitatory (inhibitory) population of the first subnetwork, and similarly i = 3, 4 in the second subnetwork], s k i ðtÞ is the state of neuron k in population i at time t, H(x) is the complementary error function, and u i (α i ) is the mean (variance) of the input to a neuron in population i, averaged over the population and over the random connectivity (Methods). This equation is an approximation which becomes exact in the limit K ! 1.
To check whether there exist parameters for which the system has a continuum of balanced states, it is convenient to write the steady state equations of the above dynamics, while making use of the assumption that K is large. In the limit K ! 1 these equations become linear (Methods): By choosing the interaction strength between the two subnetworks to beJ ¼ J E À J I , this system becomes singular, and has a continuum of solutions arranged on a line in the mean activities space, which represent a continuum of stable balanced states. Finite K. When K is large and finite, but still in the idealized limit of infinite N, the population dynamics remain deterministic, and are approximately described by Eq 3, but the nullclines corresponding to the steady state equations (Eq 13) are now nonlinear. Therefore, a precise continuum of steady states cannot be established. However, if the nonlinear nullclines are close to each other, slow dynamics are attainable in a specific direction of the mean activity space. To make this statement more precise, we first note that the steady state equations always have a symmetric solution in which m 1 = m 3 and m 2 = m 4 . If in addition, at this symmetric point, the slopes of the nullclines are identical (@m 3 /@m 1 = −1), there is a vanishing eigenvalue of the linearized mean field dynamics (Eq 3) around this point (Methods).
Under these conditions, the smallest eigenvalue of the linearized dynamics is expected to be small also in the vicinity of the symmetric point. In fact, even for moderately large values of K (K = 1000), the two nullclines nearly overlap over a large range of m 1 and m 3 , Fig 2A and 2B. Fig 2C demonstrates that in this case there is an eigenvalue close to zero within a wide range of locations along the approximate attractor, and therefore the dynamics are slow at any position along this range. Below, we denote by λ the eigenvalue closest to zero of the linearized dynamics, evaluated at the symmetric fixed point.
As observed in other continuous attractor neural network models, the slow dynamics are sensitive to the tuning of the recurrent connectivity [12,15,29,33] (see also Discussion). This sensitivity is quantified by the dependence of λ on the coupling strength between the two subnetworks. Fig 2D (top panel) shows how λ depends on the mutual inhibition strengthJ and on K: λ is linear inJ , and proportional to ffiffiffi ffi K p . For K = 1000,J must be tuned to a precision of *0.1% to achieve a time scale λ −1 of several seconds, when the intrinsic time scale τ is 10 ms.
The real part of the next eigenvalue is negative, proportional to ffiffiffi ffi K p , and is weakly dependent onJ ( Fig 2D, bottom panel).
We find that the approximate line attractor is stable to small perturbations over a wide range of parameters. This is verified by observing that when linearizing the population dynamics (Eq 3) around positions along the approximate attractor, the real parts of the four The same values of J E , J I , E 0 , τ E , and τ I are used throughout the manuscript. B Same as A, except that herẽ J ' 1:7, tuned to achieve a singular Jacobian at the symmetric point. C Real part of the four eigenvalues of the Jacobian evaluated at different points along the approximate attractor (here parametrized by the value of m 1 ). Note that there are two complex conjugate eigenvalues, and their real parts overlap (red curve). D Top: the eigenvalue closest to zero as a function of the mutual inhibition strength. Different colors correspond to different values of K: 5000 (black), 1000 (green), 500 (red) and 100 (blue). All other parameters are as in A. Bottom: real part of the eigenvalue next to closest to zero as a function of the mutual inhibition strength. E Integration of Eq 3 with injected uncorrelated Gaussian noise with s ¼ 10 À 2 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 10msec p (Eqs 15 and 16), J ¼ 1:5 (black),J ' 1:7 (blue). F Dynamics of the projection along the special direction (parameters and colors as in E).
https://doi.org/10.1371/journal.pcbi.1005505.g002 eigenvalues are negative, and one of them is close to zero, reflecting the slow dynamics along the approximate attractor-as demonstrated in Fig 2C. As an illustration of the existence of a direction in mean activity space, along which the dynamics are slow, we numerically solved the mean field differential equations in the limit of infinite N and K = 1000, with injected white noise. Fig 2(E) and 2(F) shows that the resulting mean activities trace a line in the mean activities space (E) and the dynamics along the line are slow (F).

Diffusive dynamics in finite size networks
Next, we consider the realistic situation in which N is finite in the two-subnetwork model, while still requiring that N ) K ) 1. Instead of adding noise to the dynamics of each neuron, we ask whether the chaotic dynamics are sufficient to drive diffusive motion along the approximate attractor. This question is motivated by the fact that diffusive dynamics are observed in model neural networks of intrinsically noisy neurons, with a finite number of neurons [19]. In addition, this question is motivated by evidence of diffusive dynamics underlying continuous parameter working tasks-as observed both in the behavioral data and in its neural correlates in the prefrontal cortex [10,11,36].
Since the population dynamics are no longer given by Eq 3, we performed large scale numerical simulations of networks with N ranging between 10 4 to 15 × 10 4 (additional details on the simulations can be found in Methods). To simplify the analysis and the presentation, we chose the random weights within each subnetwork such that they precisely mirrored each other, which ensured that the fixed point would be symmetric (m 1 = m 3 and m 2 = m 4 ). If, alternatively, the connections in each subnetwork are chosen independently, the fixed point deviates slightly from this symmetry plane (this deviation approaches zero for infinite networks). However, all the results described below remain qualitatively valid (see below, Additional randomness in connectivity and inputs).
The neural activity observed in our simulations is irregular and individual neurons approximately exhibit exponential ISI distributions similar to those observed in the two population case, although their dynamics are deterministic (Fig 3). To test whether the network can perform short term memory tasks, we initiated the population activities such that the network state was close to some point along the approximate line attractor. Fig 4A shows the resulting dynamics of the four populations: the activities persisted for a few seconds before decaying towards the symmetric fixed point. Fig 4B shows the projection along the slow direction, X(t) (defined in Eq 17), again revealing the slow decay of the initial state. Fig 4C shows statistics of trajectories that start from two initial positions along the approximate attractor, whenJ is tuned to achieve λ −1 ' 9 s. The state of the network enables discrimination between the two conditions over a time scale of several seconds. The ability to do so with high confidence is influenced both by λ and the stochasticity of the motion, which we characterize in the following section (see also Discussion). S1 Fig shows the mean square displacement (MSD) from the starting point for the same dataset, averaged over all trials.
Long after initialization, the population activities fluctuate around the symmetric fixed point, along a line corresponding to the approximate attractor: a projection on the m 1 − m 3 plane is shown in Fig 4D. Fig 4E demonstrates that X(t) exhibits slow diffusive dynamics. To demonstrate that the dynamics are effectively one dimensional, a projection on a perpendicular direction is shown as well.
Statistics of the diffusive motion. From here on we focus on the dynamics of X(t), the projected position along the approximate attractor. The dynamics of X can be characterized by the two moments: GðX; DtÞ h½Xðt þ DtÞ À XðtÞ The first moment F characterizes the systematic component of drift along the approximate attractor, and the second moment G characterizes the random, diffusive component of the motion. Both moments may depend, in general, on the position X along the approximate attractor.  For small Δt and near the fixed point, we expect F(X, Δt) ' − λXΔt with constant λ, where in the limit of large N and K, λ becomes equal to the smallest eigenvalue of the deterministic linearized dynamics at the symmetric fixed point. In fact, this relation holds to a very good approximation over a wide range of positions along the approximate attractor ( Fig 5A).
The moment G(X, Δt) (Eq 6) characterizes the diffusion along the approximate line attractor, and is the focus of our analysis in the rest of this section, since it quantifies the random aspect of the dynamics, driven by the chaotic noise. Short time scales. Our main interest lies in the diffusive motion over long time scales compared to τ. However, we consider first the diffusive motion over short time scales, quantified by G(X, Δt) for Δt ≲ τ, since in this case the behavior of G can be expressed exactly in terms of the averaged autocorrelation function, q j ðDtÞ 1=N P N i¼1 hs j i ðt þ DtÞs j i ðtÞi (Methods). Using the mean field theory, it is possible to derive a differential equation for q(t) [7], which can be solved numerically. Using the numerical solution, we obtained a prediction for G(X, Δt) which is in excellent agreement with measurements from numerical simulations, Fig 5D. Note that there are no fitting parameters in this calculation.
This analysis leads to two conclusions, which are important for the analysis that follows below: first, G is proportional to Δt for small Δt. Second, G is inversely proportional to N, the  Slow diffusive dynamics in a chaotic balanced neural network Slow diffusive dynamics in a chaotic balanced neural network size of the neural populations. A similar derivation can be applied also to the single balanced network discussed in [7], for Δt ( τ (upper inset in Fig 5E).
In addition, we note that G(X, Δt ! 0)/Δt is approximately constant along the approximate attractor, as seen in Fig 5B. Therefore, in most of the numerical results below we focus on G near the symmetric fixed point (X = 0). Diffusion over arbitrary time scales. On time scales larger than τ, the behavior of the two coupled balanced subnetworks differs dramatically from that of the single balanced network: in the single balanced network G saturates for Δt ≳ τ (Fig 5E, upper inset), whereas in the two coupled balanced subnetworks G continues to increase as a function of Δt, up to Δt of order λ −1 (Fig 5E, main plot). Thus, the diffusive motion generates correlated activity over time scales much longer than τ. Because the chaotic noise itself is uncorrelated on time scales longer than τ (as shown more precisely below), and since λ is approximately constant along the approximate attractor, we may expect the motion to approximately follow the statistics of an Ornstein-Uhlenbeck (OU) process. This approximation provides a good fit to the dynamics, Fig 5C, as expected. This made it possible to extract a diffusion coefficient D from the simulations which characterizes the random motion on time scales τ ≲ Δt ≲ λ -1 . Furthermore, since D and λ are approximately constant over a wide range of positions along the approximate attractor (Fig 5A and 5B), the approximation as an OU process provides a precise and compact description of the trajectory statistics, from which the performance of the network in retention of memory can be deduced (see Discussion and Fig 5C, S1, S2D, S3E and S4B and S4C Figs).
According to Eq 28 (Methods), fluctuations in the mean activity scale as 1/N, but this equation is valid only for time scales smaller than τ, whereas the diffusion coefficient D characterizes fluctuations on longer time scales. Fig 5F demonstrates that the 1/N scaling holds also for the diffusive motion over long time scales: the diffusion coefficient, extracted from a fit to the statistics of an OU process, is inversely proportional to N. The same scaling with N has been observed in continuous attractor networks with intrinsic neural stochasticity [19]. Another important implication of the 1/N scaling is that sufficiently large networks can reliably store a continuous variable in short-term memory (see Discussion).
To understand this result in more detail, we start by considering the time dependent correlation functions of m i in a single balanced network: where i, j 2 {Ex,Inh}. An analytical expression for these correlation functions is not available (see [9] for further discussion). Therefore, we measured them numerically in simulations of activity in the single balanced network architecture. These measurements indicate that the time-dependent correlation functions decay over time scales of order τ, and that they scale as 1/N, Fig 6. Next, we show that the statistics of diffusion in the coupled system can be expressed precisely in terms of the correlation functions of the single, uncoupled balanced networks (Methods). Thus, the correlation structure of the chaotic noise in the single balanced network determines the statistics of the slow diffusive motion along the approximate attractor in the coupled two-population network.
Using the noise cross correlations measured in simulations of a single balanced network, it is possible to obtain a semi analytical approximation for G in the system of two coupled subnetworks, which does not involve any fitting parameters. The measurements of G from simulations are in excellent agreement with this analytical prediction, Fig 5E. The above analysis indicates that the 1/N scaling of the diffusion coefficient ( Fig 5F) is a consequence of the decay with N of cross correlations in activity of different neurons in the single balanced network. In this sense, for large N the network behaves as a collection of neurons with independent random noise, although the source of this apparent noise is the chaotic activity generated by the recurrent connectivity.

Spike correlation functions
The diffusion along the approximate attractor implies that the population activities are correlated over long time scales, up to order λ −1 : Fig 7A shows  Spike trains generated by single neuron pairs are correlated over long time scales as well, since all neurons in the network are coupled to the collective diffusion along the approximate attractor. However, a reliable observation of the slowly decaying correlation in a single pair might require an unrealistically long recording time. This difficulty can be overcome potentially by considering the simultaneous activity of multiple neurons: for example, we find in our simulations of a network with N = 10 5 that for 15 minutes of simulated time, a simultaneous recording from *50 or more neurons from each population would be sufficient to reliably observe the slow temporal decay of the correlations, Fig 7C, whereas a simultaneous recording from ten neurons over 15 minutes may be insufficient. As demonstrated in Fig 7(

Additional randomness in connectivity and inputs
In addition to the results described above, we investigate several scenarios in which we introduce additional randomness, either frozen or dynamic. First, we relax the assumption that connections in the two sub-networks precisely mirror each other. This assumption was made above for convenience: the precise identity of the synaptic connections simplifies the numerical analysis since it ensures a precise symmetry of the dynamics around the hyperplane X = 0.  The main new feature that arises when the synaptic connections are drawn independently in the two sub-networks, is that the relaxation point of the dynamics along the approximate attractor deviates from the hyperplane X = 0. The characteristic magnitude of this deviation decays monotonically to zero with increase of the system size N. We note that the mean field description of the dynamics for finite K ) 1 and in the limit N ! 1, is identical to the mean field dynamics associated with the perfectly symmetric scenario.
Next, we consider a scenario in which the inhibitory connections between the two subnetworks are random, and follow the same basic architecture as the connections within each sub-network. Therefore, instead of assuming weak all-to-all connections of order ffiffiffi ffi K p =N, we include random connections of order 1= ffiffiffi ffi K p , with a probability K/N for a synaptic connection. In addition, we relax the assumption of mirrored connections in the two sub-networks. In this case it is straightforward to show that the mean-field equations remain identical to those associated with the case of all-to-all connections, in the limit N ! 1, K ! 1. Therefore, a continuous set of balanced states can be achieved (Methods).
When K is large and finite, the mean-field equations are slightly different in the two scenarios (Methods). The main outcome of this difference is a shift of the unstable fixed points of the dynamics from the planes m 1 = m 2 = 0 and m 3 = m 4 = 0. Consequently, there is a certain degree of activity in both sub-networks even in the unstable fixed points of the dynamics. This is shown in S3 Fig More significantly, S3 Fig demonstrates that in the case of random and sparse connections between the two sub-networks, the dynamics exhibit the same characteristics as in the case of all-to-all connectivity, and can be accurately approximated as an OU process over time scales longer than τ. The coefficient of diffusion D scales linearly with 1/N, with a prefactor which is close to the one observed in S2 Fig. Finally, we explore the effects of stochasticity in the input E 0 to the network (see Fig 1B). S4 Fig demonstrates that even when the inputs include a large degree of temporal variability, and the noise injected to all the neurons in the network is highly correlated, the network exhibits slow dynamics along an approximate attractor, with statistics that are qualitatively similar to what we present above.

Discussion
In summary, we demonstrated that a simple balanced network can exhibit slow dynamics along a continuous line in the population mean activity space. In finite networks, the chaotic dynamics drive diffusive motion along the approximate line attractor. We calculated the diffusivity in the system, based on the correlation structure observed in a single balanced network, and showed that the diffusion coefficient along the approximate attractor is inversely proportional to the network size. This is similar to the effect of noise that arises from intrinsic neural or synaptic mechanisms [19].
The slow diffusive motion along a one-dimensional trajectory induces correlations within the populations, and in single neuron pairs, that persist up to a long time scale set by the decay time λ −1 . This property characterizes the dynamics of the system even when it is at the resting state (X ' 0), i.e., when the network is not engaged in a memory task. Hence, this observation generates a prediction for spontaneous activity in brain areas such as the prefrontal cortex, in which continuous attractor dynamics, based on mutual inhibition between two populations, have been postulated [33].
A slowly decaying cross-correlation function characterizes the spikes produced by any pair of neurons in the network, but due to the high irregularity in the single neuron activity it may be necessary to average over multiple simultaneously recorded neuron pairs in order to obtain a clear measurement over a realistic time scale for a single experiment (in Fig 7C, 50 neuron pairs and a *15 minute measurement). Furthermore, it is important to label the neurons based on their functional properties: averaged over all populations, the cross correlations seen in Fig 7A cancel. In brain areas involved in short term memory tasks, this labeling can potentially be achieved by first measuring the tuning curves of neurons as a function of the stored variable.
Linear vs. nonlinear neural response in decision making circuits Several models of decision making circuits in the prefrontal cortex were based on the simple neural architecture of Fig 1A [32]. This network architecture can precisely generate a continuous attractor if the activity of single units is a linear function of their input. While the linear dynamics provide a simple intuition for the principles underlying continuous attractor dynamics in recurrent neural networks [31], it is more difficult to obtain a continuum of steady states using the above network architecture when single neural responses are nonlinear. Therefore, specifically tuned forms of nonlinearity [33,35], or more elaborate network architectures-still based on mutual inhibition between two or more neural populations have been proposed [33,34]. In this context the linear input-output relationship, characterizing the single balanced network of Ref. [3], is a useful computational feature that facilitates the construction of a continuous attractor network based on the simple architecture of Fig 1A. However, the main motivation for considering the balanced state in this work lies in its ability to account for the irregular spiking of single neurons in cortical circuits.

Time scales of persistence and retention of information
Continuous attractor networks are an important model for maintenance of short-term memory in the brain. The memory is represented by the position along the attractor, and therefore the stochastic motion along the attractor determines the fidelity of memory retention. Since the dynamics of our proposed network are well characterized as an OU process over time scales longer than τ and over a large range of positions along the approximate attractor, it is straightforward to assess how the position along the approximate attractor evolves in time. All aspects of the trajectory can be easily inferred based on the initial state along the approximate attractor, the time interval, and the two parameters which characterize the OU process: λ and D. Similar considerations can be applied also for noisy continuous attractors in which the stochasticity arises from mechanisms other than the chaotic dynamics studied here [19,20]. We next discuss how the decay time and the diffusivity depend on the parameters of our model.
The decay time λ −1 can be calculated exactly in the limit of N ! 1, K ) 1 (Fig 2). It is interesting to note that there are competing influences of K on the tuning of the attractor: with increase of K, λ −1 becomes more sensitive toJ . However, when K is reduced, the nullclines (Fig 2) become less linear, causing deviations from the ideal behavior far from the symmetry point. We showed that for K * 10 3 and τ = 10ms it is possible to achieve persistence over several seconds, while the decay time and diffusion coefficient are approximately constant along a wide range of positions. This requires to tuneJ to a relative precision of order 0.1%.
Since all times scale linearly with the intrinsic time constant, longer persistence times (or a weaker tuning requirement) can be achieved if the intrinsic time constants of individual units is longer that the value of 10 ms assumed in our examples. Intrinsic neural persistence or slow synapses could potentially contribute to this goal under more realistic biophysical descriptions of the neural dynamics.
Finally, we note that the requirement for precise tuning of the connectivity is a characteristic feature of all continuous attractor models. Several works have proposed ways to achieve tuning through plasticity mechanisms [37,38], or ways to stabilize the dynamics by additional mechanisms such as synaptic adaptation [28,39,40] or negative derivative feedback [29,30], in order to increase the persistence time.

Diffusive motion
The diffusive motion along the approximate attractor, which is the main focus of this work, poses an additional limitation on the persistence of short term memory. While appropriate readout mechanisms may be able to take into account the systematic drift caused by the decay towards to the symmetry point, random diffusion inherently degrades the information stored in the position along the attractor.
With 10 5 neurons per population, random diffusion over an interval of one second causes a deflection in X with a standard deviation of *10 −2 . This quantity should be compared with the possible range of X, which is approximately [-0.2, 0.2] in our parametrization of the position along the approximate attractor (we verified that the dynamics are accurately approximated as an OU process over the range [-0.1, 0.1]). Therefore, with the tuning chosen in Fig 4C, where λ −1 * 10s, and with N = 10 5 , the limiting factor for discrimination between nearby stimuli after a delay period of order 1 s is the diffusive dynamics along the approximate attractor.
Random diffusion in continuous attractor networks of Poisson neurons is also often very significant [19,20]. The diffusivity can be suppressed by increasing the number of neurons, increasing the intrinsic time constant of individual neurons and synapses, or by assuming that the firing of individual neurons is sub-Poisson [19,41]. Our proposed model for a line of persistent balanced states similarly predicts a significant degree of random diffusion, highlighting the need to better understand how noise influences the retention of continuous parameter memory in cortical circuits.
There are several ways in which the random, diffusive component of the motion can potentially be reduced: First, by increasing the number of neurons. We presented results for networks containing (altogether) up to 6 × 10 5 neurons, and it is straightforward to extrapolate our estimates for D to larger networks based on the 1/N dependence of the diffusion coefficient. Second, the diffusion coefficient is expected to decrease significantly if slow synapses participate in the dynamics [19], or if the intrinsic time constant of the neurons is increased. Third, additional mechanisms such as synaptic adaptation [28,40] or derivative feedback [29] may perhaps contribute to a reduction in the diffusivity. Finally, an intriguing possibility is that highly structured and tuned connectivity can yield improved robustness to noise in a balanced state, as hinted by recent results on predictive coding in spiking neural networks [42].

Model
In our model, two balanced neural subnetworks inhibit each other reciprocally: the inhibitory population in each subnetwork projects to the excitatory population of the other subnetwork, Fig 1B. As in Refs. [3,7], the neurons are binary and are updated asynchronously, at update times that follow Poisson statistics. The mean time interval between updates is τ E (τ I ) for neurons in the excitatory (inhibitory) populations. In each update of a neuron k from population i, the new state of the neuron s k i is determined based on the total weighted input to the neuron, where Θ is the Heaviside step function, and u k i is the total input to the unit at that time, Here, T k is the threshold and E 0 is an external input. We chose the external input to be zero for the inhibitory populations and to be positive (and constant) for the excitatory populations. . An excitatory feed-forward input ffiffiffi ffi K p E 0 is fed into both excitatory populations. We denote by u i the mean of u k i over all the neurons k within the population i and over realizations of the connectivity: Here m i are the population averaged activities. The variance of u k i over all the neurons k within the population i and over realizations of the connectivity is given (to leading order in K/N) by: These expressions are obtained in similarity to the derivation of the variances in Ref. [7]. Note that the all-to-all inhibitory connections between the subnetworks contribute only terms of higher order in K/N. In the scenario where the connections between subnetworks are randomly drawn (S3 Fig), the variance of the input to the excitatory neurons includes an additional term, due to the variability of inhibitory synapses from the opposing sub-network. In this scenario The conditions J E − J I > 0, J I > 1, and 0 < J I E 0 /(J E − J I ) < 1 ensure that for 0 < x < J I E 0 /(J E − J I ) the mean activities are positive and none of them is equal to 0 or 1. In Fig 2(E) and 2(F), we artificially add to the dynamics (Eq 3) white Gaussian noise as follows: where hξ i (t)i = 0 and

Simulations and statistics of the diffusive dynamics
Our results for networks with finite N are based on large scale numerical simulations. In each simulation the connections were chosen randomly as described in the text, and an asynchronous update schedule was generated by a Poisson process. Parameter values are specified in the legend of Fig 2 in the main text. Averaged population activities were calculated online. The projection along the approximate attractor was defined at each time point as where m(t) is the measured 4 dimensional averaged population activity, m 0 is the vector of mean population activities at the symmetric fixed point, and v 0 is the left eigenvector of the linearized dynamics with an eigenvalue close to zero. We chose the following normalization for the corresponding right eigenvector (see Eq 14): and the normalization of v 0 was chosen such that the dot product of the left eigenvector and the right eigenvector equals unity. Measurements of G(X, Δt) (Eq 6) were done in the following way: for each value of X we found all the time points for which |X(t i ) − X| < δ, using a small δ ' 10 −3 . Then, for each such t i we calculated [X(t i + Δt) − X(t i )] 2 , and averaged all these values to get G(X, Δt). Subsequently, we averaged over multiple simulations with different quenched noise and update schedules. In the manuscript we present results for G(0, Δt), but in similarity to F(X, Δt)/X, G(X, Δt) was fairly uniform along the approximate attractor. A similar calculation was performed to measure the drift F(X, Δt). In Here t a = aΔt. Now, we averaged over all the measured pairs: For the calculation of the entire population averaged cross covariance we used the measured mean activities: where m l ðtÞ ¼ 1=N P N i¼1 s i l ðtÞ. Note that the sum in Eq 21 includes the auto-covariances, while the expression in Eq 20 does not. However, the contribution of the auto-covariances is negligible in the entire population average, since its contribution, relative to the contribution of cross-covariances scales as 1/N.

Diffusion over short time scales
To analytically evaluate G(X, Δt) (Eq 6) over short time scales, we start by writing the change in the state of the k-th neuron in population i in a short time interval Δt as: where Y k i ðtÞ is the outcome of an update if it occurs, and c k i ðtÞ is a random variable equal to 1 if the i-th neuron was updated between t and t + Δt and to 0 otherwise. The updates occur each τ ms on average, so that whereas for i 6 ¼ j and/or k 6 ¼ l, Now the mean squared displacement, G(X, Δt), can be written as: where v 0 i is the i'th component of the left eigenvector of the Jacobian with eigenvalue close to zero. From Eqs 23 and 24 we see that for Δt ( τ the contribution of elements with i = j, k = l dominates the sum. To leading order in Δt we have Slow diffusive dynamics in a chaotic balanced neural network Defining we obtain: Diffusion over arbitrary time scales To derive an expression for the diffusive dynamics over arbitrary time scales, we start by representing the stochastic linearized dynamics of a single balanced network, near the symmetric fixed point, as a two dimensional stochastic process: where δm is the deviation of the mean activities from the fixed point and δE is the deviation of the input from the constant input E 0 . Here B 1 is a 2 × 2 matrix representing the response to perturbations in m, and B 2 is a 2 × 2 matrix representing the response to perturbations in the feedforward input. Both are obtained analytically from a linearization of the mean field dynamics. Finally, ξ is a random process with vanishing mean, whose covariance functions C ξ (Δt) are stationary and are yet unspecified: Using Eq 29, it is straightforward to relate C ξ (t) to the covariance of the activities (while assuming constant feedforward input, d " E ¼ 0): Using the measurements of C m from simulations, we can thus obtain C ξ numerically, using the above equation. In similarity to C m , C ξ decays to zero over a time scale of order τ. Altogether, Eq 29 describes the stochastic dynamics of a single balanced network close to the balanced state, in response to small fluctuations δE in the feedforward inputs. In the two-subnetwork architecture, each subnetwork is coupled only to the mean activity of the other subnetwork, because of the all-to-all connectivity. More specifically, the mean activity of each subnetwork linearly modulates the external input to the excitatory population of the other subnetwork. Therefore, we can approximate the state of the 4-population network as a stochastic process with the following dynamics: where δm is now a 4 dimensional vector, whose first (last) two entries represent the state of the first (second) subnetwork, and A is the Jacobian of the full 4 dimensional dynamics around the fixed point (Eq 38), related in a simple manner also to the matrices B 1,2 defined above. The correlation matrix of the 4 dimensional noise vector ξ is given bỹ where C ξ is the 2 × 2 noise correlation matrix Eq (30) evalulated for a single balanced network receiving fixed excitatory input, equal to the mean input to each subnetwork at the symmetric fixed point. Finally, we use this description of the dynamics to predict the statistics of diffusion along the line. We multiply Eq 32 from the left by v 0 , the eigenvector with the eigenvalue close to zero, which we denote by λ (note below that λ < 0): Here, X ¼ v T 0 Á δm. Thus, we obtain using the Wiener-Khintchine theorem the time dependent correlation function of X, Finally, the diffusion over an arbitrary time interval Δt is given by: Proof: @m 1 /@m 3 = −1$ vanishing eigenvalue Here we show that when @m 1 /@m 3 = −1 at the symmetric point (m 1 = m 3 , m 2 = m 4 ), the Jacobian matrix has a vanishing eigenvalue, leading to slow dynamics near the fixed point. We denote: In terms of these quantities, the Jacobian matrix can be written as Results from simulations, for a network in which dynamical noise is added to the input E 0 . The noise has a correlation time of 3τ, which ensures that the correlation across neurons is not averaged out due to the asynchronous updating. The noise is described by an OU process: t noise _ x ¼ À x þ s noise ZðtÞ, where τ noise = 30 ms, and η(t) is a gaussian white noise, and the value of σ noise was varied to control the noise amplitude. A Mean activities of the four populations for σ = E 0 /3: blue for one subnetwork and red for the other. The higher activities are those of the excitatory populations (m 1 and m 3 ). B G(X = 0, Δt) for σ noise = 0 (black), σ noise = E 0 /30 (green) and σ noise = E 0 /3 (blue). Error bars represent the standard deviation of the mean. Red: fits to an OU process. C Mean square displacement (MSD) of the location along the line for initial location X(0) = 0.05. Colors are the same as in B. In this figure K = 500, N = 1.5 × 10 5 ,J ' 1:77. (TIF)