Input correlations impede suppression of chaos and learning in balanced firing-rate networks

Neural circuits exhibit complex activity patterns, both spontaneously and evoked by external stimuli. Information encoding and learning in neural circuits depend on how well time-varying stimuli can control spontaneous network activity. We show that in firing-rate networks in the balanced state, external control of recurrent dynamics, i.e., the suppression of internally-generated chaotic variability, strongly depends on correlations in the input. A distinctive feature of balanced networks is that, because common external input is dynamically canceled by recurrent feedback, it is far more difficult to suppress chaos with common input into each neuron than through independent input. To study this phenomenon, we develop a non-stationary dynamic mean-field theory for driven networks. The theory explains how the activity statistics and the largest Lyapunov exponent depend on the frequency and amplitude of the input, recurrent coupling strength, and network size, for both common and independent input. We further show that uncorrelated inputs facilitate learning in balanced networks.

Introduction

Neural circuits are highly interconnected, which generates complex activity dynamics both spontaneously and in response to incoming stimuli. Identifying mechanisms by which time-varying stimuli can control circuit dynamics is important for understanding information transmission, learning reliable input-output functions, and designing optogenetic stimulation protocols.
Recurrent neural networks provide a framework for understanding the interaction between external input and internally-generated dynamics. These networks can exhibit rich chaotic dynamics in the absence of external input [1]. External input can suppress chaotic dynamics, thus controlling the internal state of the network [2][3][4]. Such control of the recurrent dynamics appears necessary for reliable task learning [5][6][7][8].
Fundamental features of biological neural network dynamics include operation in continuous time, nonnegative firing rates and segregation of excitation and inhibition. Here we address input-driven network dynamics that adhere to these biological constraints. Excitation and inhibition in most biological circuits are conveyed by separate sets of neurons with a predominance of recurrent inhibitory feedback, a property known as 'inhibition dominance' [9][10][11][12]. Moreover, neurons in local populations receive time-dependent input that is correlated across neurons and can trigger a time-dependent population response. It is important to investigate how such biological features shape network dynamics, response to external inputs, and learning.
A class of recurrent network models originally proposed to explain the origins of asynchronous irregular activity is termed 'balanced' [13,14]. In these networks, large excitatory inputs are dynamically canceled by strong recurrent inhibitory feedback. Firing-rate networks in the balanced state can exhibit chaotic activity fluctuations [15,16], giving rise to complex activity patterns. How the dynamic cancellation described in balanced networks of binary neurons [13,14] extends to firing-rate models and how it affects the suppression of chaotic dynamics has not yet been addressed. Previous dynamic mean-field theory (DMFT) approaches to input-driven rate networks assumed that the mean of the external input across neurons does not depend on time, which facilitates DMFT [2][3][4].
It remains unclear how external input should be structured to suppress chaos and control the network state effectively in rate networks satisfying biological constraints. To address this gap, we study stimulus-induced suppression of chaos in balanced rate networks with two types of time-dependent external input. Specifically, we study time-dependent input that is either identical across network neurons (referred to as common input) or that varies independently between the neurons (referred to as independent input).
We show that much larger input modulations are necessary to suppress chaos in networks that are driven by common input, because common input is canceled by strong recurrent inhibition in balanced networks. Conventional DMFT methods [1,4,15,16] are not adequate to fully capture the effects of time-varying common input. Therefore, we developed a DMFT that is non-stationary, meaning that the order parameters can explicitly depend on time. This novel technique accurately captures the time-dependent mean and variance, the two-time autocorrelation function and the largest Lyapunov exponent of input-driven network dynamics. Specifically, we calculate the smallest input modulation amplitude required to suppress chaos, referred to as the critical input amplitude. Using both theory and simulations, we examine differences in the effect of common and independent input across a wide range of frequencies of the sinusoidal input modulation, weight heterogeneity and network sizes. We also provide approximations at low and high input frequencies. All the theoretical results match those from network simulations, provided the networks are sufficiently large.
Our findings have important implications for learning in balanced models and for fitting rate networks that obey biological constraints to neural data. We quantify how successful learning performance requires chaos suppression. As a result of residual chaotic fluctuations, common input that is used to suppress chaos during learning in a number of schemes [5][6][7][8] meets with limited success in balanced networks unless it has a very large amplitude. We show how the use of independent input resolves this problem.

Chaos suppression with common vs independent input
We study how suppression of chaos depends on input correlations in balanced rate networks with time-dependent external input. For simplicity, we begin our analysis by studying a single inhibition-dominated population, where the recurrent inhibitory feedback dynamically balances a positive external input rather than recurrent excitation. The excitatory-inhibitory case is considered in a later section. Thus, we study a network of N nonlinear rate units ('neurons') with dimensionless synaptic currents h_i and firing rates ϕ(h_i) that obey

τ dh_i/dt = −h_i + Σ_j J_ij ϕ(h_j) + √N I_0 + δI_i(t),    (Eq 1)

with i.i.d. Gaussian-distributed random couplings J_ij ~ N(−J_0/√N, g²/N), where the gain parameter g controls the weight heterogeneity of the network. The transfer function ϕ is set to a threshold-linear function ϕ(x) = max(x, 0). The 1/√N scaling of the negative mean coupling results in strongly negative recurrent feedback that dynamically cancels the constant input term √N I_0. In addition to this constant positive term, the external input contains a time-dependent component δI_i(t).
Throughout, we distinguish between two types of time-dependent inputs, 'common' vs 'independent'. In both cases, the time dependence is sinusoidal. For common input, δI_i(t) = δI(t) = I_1 sin(2πft), which is identical across network neurons (Fig 1A). For independent input, δI_i(t) = I_1 sin(2πft + θ_i) has an independent random phase for each neuron (Fig 1B), with θ_i drawn independently from a uniform distribution between 0 and 2π. We assume that N is large enough, or the phases are appropriate, so that we can take the average of δI_i(t) across the population to be zero in the independent case. The amplitude of δI_i(t) is denoted by I_1, and f is the input frequency. We will investigate in the following how large I_1 has to be, and how it has to scale with network size, in order to control the dynamics of recurrent networks and suppress chaotic fluctuations for the two input types. Therefore, we do not assume a priori any particular scaling of I_1 with network size in Eq 1.
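As an illustration of the two drive types, the following Python sketch integrates Eq 1 with the Euler method. This is a minimal sketch: the parameter values are illustrative choices for demonstration, not those used in the figures of this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(N=500, g=2.0, J0=1.0, I0=1.0, I1=1.0, f=0.2,
             common=True, tau=1.0, dt=0.05, T=100.0):
    """Euler integration of the balanced rate network of Eq 1.
    common=True gives identical sinusoidal drive to every neuron;
    common=False gives each neuron an independent random phase."""
    # couplings J_ij ~ N(-J0/sqrt(N), g^2/N)
    J = rng.normal(-J0 / np.sqrt(N), g / np.sqrt(N), size=(N, N))
    phases = np.zeros(N) if common else rng.uniform(0.0, 2.0 * np.pi, N)
    h = rng.normal(0.0, 1.0, N)
    ts = np.arange(0.0, T, dt)
    nu = np.empty(len(ts))                   # population rate nu(t)
    for k, t in enumerate(ts):
        r = np.maximum(h, 0.0)               # phi(x) = max(x, 0)
        dI = I1 * np.sin(2.0 * np.pi * f * t + phases)
        h += dt / tau * (-h + J @ r + np.sqrt(N) * I0 + dI)
        nu[k] = r.mean()
    return ts, nu, h

ts, nu, h = simulate(common=True)
# the balance equation predicts a mean population rate near I0/J0 = 1
print(nu[len(ts) // 2:].mean())
```

The printed time-averaged population rate should sit near the balance-equation prediction I_0/J_0, up to finite-size and chaotic fluctuations.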
For firing-rate networks in the balanced state, suppression of chaos strongly depends on the correlations of the input (Fig 1). One might expect that driving all neurons with a common input would be an effective way to suppress chaos, but input that is shared across neurons recruits strong recurrent inhibitory feedback that is anti-correlated with the common input (Fig 1C). This means that the time-varying external input is dynamically canceled by recurrent feedback, leaving behind only a small fraction of the time-dependent common input for chaos suppression. In contrast, for independent input, which is randomly phase-offset across neurons, no such cancellation occurs (Fig 1D), and thus weaker external input is required to suppress chaotic fluctuations in the network.
To understand how this discrepancy arises in the model, it is useful to rewrite Eq 1 by decomposing h_i = m + h̃_i into a mean component m and residual fluctuations h̃_i. We decompose the couplings as J_ij = −J_0/√N + J̃_ij, where the entries of J̃_ij are Gaussian with variance g²/N and mean zero. For common input, this results in

τ dm/dt = −m − √N J_0 ν(t) + √N I_0 + δI(t),    (Eq 2a)
τ dh̃_i/dt = −h̃_i + Σ_j J̃_ij ϕ(h_j),    (Eq 2b)

with mean population firing rate ν(t) = (1/N) Σ_j ϕ(h_j(t)). Here δI(t) directly enters the expression for m, because it is identical across all neurons. It thus directly impacts ν(t) and recruits, through the negative recurrent mean coupling −J_0/√N, strong recurrent feedback −√N J_0 ν that is anti-correlated with the input and cancels most of both the positive static input and the time-dependent common component of the total external input. Solving Eq 2a for the population firing rate ν(t) yields, to leading order in 1/√N,

ν(t) ≈ (I_0 + δI(t)/√N) / J_0.    (Eq 3)

In the absence of time-dependent input, this equation is commonly referred to as the 'balance equation' [14][15][16]. Note that the impact of δI(t) on the population firing rate is reduced by a factor of 1/√N. With independent input, Eq 1 can be written as

τ dm/dt = −m − √N J_0 ν(t) + √N I_0,    (Eq 4a)
τ dh̃_i/dt = −h̃_i + Σ_j J̃_ij ϕ(h_j) + δI_i(t).    (Eq 4b)

In this case, δI_i(t) enters the equation for the fluctuations h̃_i. Thus, the strong recurrent feedback only cancels the positive static input term, √N I_0. Chaos, in this case, is suppressed through the influence of δI_i(t) on the fluctuations h̃_i, similar to what happens in the case of random non-balanced networks [2][3][4].
We quantify chaos in the network dynamics by the largest Lyapunov exponent λ_1. This quantity measures the average exponential rate of divergence or convergence of nearby network states [17] and is positive if the network dynamics is chaotic. We computed λ_1 analytically using non-stationary DMFT (Materials and Methods) and confirmed the results by simulations of the full network dynamics. For both common and independent input, λ_1 is a decreasing function of the input amplitude I_1 and crosses zero at a critical input amplitude I_1^crit (Fig 2). With common input, a much larger value of I_1 is required for λ_1 to become negative and thus for chaos suppression.
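In simulations, λ_1 can be estimated with the standard two-trajectory (Benettin) renormalization method, sketched below for a generic one-step map; `step` is a placeholder for one Euler step of the network dynamics, and the demo applies it to pure leak dynamics, where the exact exponent of the Euler map is known.

```python
import numpy as np

def lyapunov(step, h0, n_steps=4000, dt=0.05, d0=1e-8, seed=1):
    """Benettin-style estimate of the largest Lyapunov exponent: evolve a
    reference and a perturbed trajectory, rescaling their separation back
    to d0 after every step and accumulating the log growth rates."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=h0.shape)
    h = h0.copy()
    hp = h0 + d0 * v / np.linalg.norm(v)       # perturbed copy at distance d0
    log_growth = 0.0
    for k in range(n_steps):
        t = k * dt
        h, hp = step(h, t), step(hp, t)
        d = np.linalg.norm(hp - h)
        log_growth += np.log(d / d0)
        hp = h + (d0 / d) * (hp - h)           # renormalize the separation
    return log_growth / (n_steps * dt)

# sanity check on pure leak dynamics tau*dh/dt = -h (tau = 1): the Euler
# map contracts by (1 - dt) per step, so lambda = log(1 - dt)/dt ~ -1
leak = lambda h, t: h + 0.05 * (-h)
print(lyapunov(leak, np.ones(3)))
```

Passing a `step` that performs one Euler update of Eq 1 yields the simulation estimate of λ_1 for the driven network.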

Dependence on network parameters
Next, we explore how I_1^crit varies between networks driven by common and independent input. As suggested by Eqs 2 and 4, the discrepancy between common and independent input grows with network size N. For common input, I_1^crit is proportional to √N for large N, while it saturates as a function of N for independent input (Fig 3A). Thus, an ever-increasing I_1 is required to suppress chaotic activity in larger networks that are driven by common input. Note that the agreement between theory and simulations is good for large N (Fig 3A).
In balanced networks, the network size N acts as a scale factor for the mean of the coupling weights and the magnitude of the constant external input (Eq 1). Mean-field theory describes the limit in which the number of neurons goes to infinity, but it still contains N as a parameter multiplying these terms (Materials and Methods). To separate these two different aspects, we introduce a 'tightness of balance' parameter K by scaling both J_0 and I_0 with a factor √(K/N). This removes the N-dependence in the DMFT equations. K mimics the effect of changing the number of synapses per neuron on the mean current m in a random sparse network [13,15,16]. This allows us to vary the 'tightness of balance' [18,19], while still studying networks with large enough N so that mean-field theory applies (Fig 3B).
We observe that for sufficiently large K, the dependence on K matches that on N in the unscaled model (Fig 3A): for common input, I_1^crit is proportional to √K, and for independent input, I_1^crit is independent of K. However, the qualitative difference between independent and common input vanishes for small values of K because the network is no longer in the regime of the balanced state. In the balanced regime, we expect the qualitative difference to be largely independent of the transfer function ϕ. For non-balanced networks with zero mean couplings and tanh transfer function, we also found no qualitative difference between common and independent input (Materials and Methods).

Fig 3. A) Dependence of I_1^crit on network size N. For common input, I_1^crit grows as √N for large N, but is constant for independent input. Error bars indicate interquartile range around the median. B) Dependence of I_1^crit on 'tightness of balance' parameter K, which scales both I_0 and J_0. Results for large K are the same as in A, but for small K the network is no longer in the balanced regime, and results for common and independent input become similar. Error bars indicate ±2 std. C) Dependence of I_1^crit on gain parameter g for low input frequency f. Close to g_crit, an arbitrarily small independent input can suppress chaos; this is not the case with common input.
In balanced networks, the difference in I_1^crit for common and independent input increases for decreasing gain parameter g. With independent input, I_1^crit becomes arbitrarily small as g approaches g_crit = √2 (Fig 3C). At this critical gain parameter, the network with constant external input transitions from a fixed point to chaos [16]. At low frequency, I_1^crit remains of order √N even near g_crit for common input (Fig 3C). We note that if N is fixed but g is increased to large values, I_1^crit for independent input becomes larger than I_1^crit for common input. The reason is that the variance of the synaptic currents h_i(t) grows faster for independent input than for common input as the network approaches a global instability where the dynamics diverges.

Mechanism of chaos suppression for slowly varying common input
An intuitive picture of chaos suppression by common sinusoidal input can be provided in the limit of low frequency, where the input varies more slowly than the intrinsic network fluctuations. We call this limit the quasi-static approximation. In this limit, when I_1 exceeds the static external input √N I_0, recurrent activity is periodically silenced (Fig 4A and 4B).

Fig 4. A) External input √N I_0 + δI_i(t) (dashed) and recurrent input I_i^rec(t) = Σ_j J_ij ϕ(h_j(t)) (solid) for three example neurons. B) Synaptic currents h_i for four example neurons. C) Local Lyapunov exponent from network simulation, which reflects the local exponential growth rates between nearby trajectories (solid), and Lyapunov exponent from stationary DMFT (dashed) used in the quasi-static approximation. When I_1 > √N I_0, the external input periodically becomes negative and silences the recurrent activity (gray bars). During these silent episodes, the network is no longer chaotic and λ_1^local < 0.

During these silent episodes, all neurons intermittently approach the locally stable silent dynamics. On the other hand, when g > g_crit, any positive external input results in chaos [16]. Thus, in a quasi-static approximation, λ_1 is given by averaging the local Lyapunov exponent λ_1^local across the silent and chaotic episodes, weighted by their respective durations (Fig 4C; Materials and Methods; λ_1^local is approximated using DMFT). During the silent episodes, λ_1^local = −1/τ. In the chaotic episodes, λ_1^local > 0 depends on how far the network is from the transition to chaos, i.e., on the gain parameter g. As a result, I_1^crit is determined by the duration of the silent episodes that is required to compensate for the remaining transiently chaotic dynamics, and it grows monotonically with g (Fig 3C) because longer silent episodes are necessary to compensate for the stronger chaotic activity.
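The quasi-static average can be written down in a few lines. In this sketch, the chaotic-episode exponent is treated as a free parameter (the paper obtains it from stationary DMFT), and the silent fraction is computed numerically over one input period.

```python
import numpy as np

def quasi_static_lambda(I1, sqrtN_I0, lam_chaotic, tau=1.0):
    """Quasi-static estimate of lambda_1: weight the local exponents of
    the silent episodes (-1/tau) and the chaotic episodes (lam_chaotic)
    by the fraction of the input period each occupies."""
    phase = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
    silent = sqrtN_I0 + I1 * np.sin(phase) < 0.0   # total input negative
    frac = silent.mean()                           # duty cycle of silence
    return frac * (-1.0 / tau) + (1.0 - frac) * lam_chaotic

# with I1 <= sqrt(N)*I0 there are no silent episodes, so the chaotic
# exponent survives unchanged
print(quasi_static_lambda(1.0, 2.0, 0.2))
```

Bisecting this function in I_1 gives the quasi-static estimate of I_1^crit: the amplitude at which the silence duty cycle exactly compensates the chaotic growth.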
This quasi-static perspective might suggest that chaos can only be suppressed by common input when I_1 > √N I_0, so that there exist 'silent episodes' where the total input into neurons, √N I_0 + δI(t), is intermittently negative and firing rates are zero. That finding, however, is only correct in the limit f → 0. For finite input frequency, chaos can be suppressed by common input even when the external input is always positive.

Frequency-dependent chaos suppression
We next explore the effects of the frequency of the sinusoidal input on I_1^crit. For both common and independent input, we observe a 'resonant frequency' at which the input is most effective at suppressing chaos (Fig 5A). For common input, at low frequency, I_1^crit is insensitive to the frequency and is thus well approximated by the quasi-static approximation described above. For increasing frequencies, however, I_1^crit exhibits a minimum in f, which can only be captured by non-stationary DMFT (Materials and Methods). For both common and independent input, when the frequency is high, low-pass filtering originating from the leak term in Eq 1 attenuates the effective input modulation amplitude by a factor of 1/√(1 + (2πfτ)²). As a result, a stronger input modulation amplitude is required to compensate for this attenuation, and I_1^crit increases linearly with f (Fig 5A). We find that also for independent input, I_1^crit exhibits a minimum in f, an effect previously reported for networks that have zero mean couplings, J_0 = 0, and a sigmoidal transfer function [3]. The resonant frequency originates from a superposition of two distinct effects: for increasing frequency, the input decorrelates subsequent Jacobians, which makes the network less chaotic and thus leads to smaller I_1^crit [4,17]. For ever-increasing frequencies, however, the input is increasingly attenuated by the filtering effect of the leak term, which overcompensates the decorrelation effect for large f.
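The attenuation factor quoted above is the gain of the first-order low-pass filter defined by the leak term τ dh/dt = −h + I(t); a minimal numerical check:

```python
import numpy as np

def attenuation(f, tau=1.0):
    """Gain of the leak low-pass filter tau*dh/dt = -h + I(t) for a
    sinusoidal input of frequency f (in units of 1/tau)."""
    return 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f * tau) ** 2)

print(attenuation(0.0))   # static input passes unattenuated
```

For f well above 1/(2πτ), the gain falls off as 1/(2πfτ), which is why I_1^crit grows linearly with f at high frequency.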
We also examined the effect of the coupling gain g on the critical input amplitude I_1^crit. For low input frequencies, a finite value of I_1^crit occurred near the onset of chaos at g = g_crit (Fig 3C). At a higher frequency, f = 0.2/τ, this is no longer the case (Fig 5B), which is captured by the non-stationary DMFT. Close to g_crit, the critical input amplitude is small for both common and independent input.
Collectively, these results demonstrate that a larger input amplitude is necessary to suppress chaotic dynamics when balanced networks are driven by common, as opposed to independent input, and that non-stationary DMFT successfully captures this effect in large networks.

Chaos suppression in balanced two-population E-I network
The effect that we report for a fully-connected random network of neurons with negative mean coupling extends to a sparsely-connected, two-population excitatory-inhibitory network in the balanced state. We calculate the largest Lyapunov exponent λ_1 as a function of input amplitude I_1 in network simulations and find that, consistent with our earlier observations, a much stronger input is required in the common-input case to reduce λ_1 to zero and consequently suppress the chaotic activity (Fig 6).
Because of the additional parameters, two-population excitatory-inhibitory networks can exhibit more complex behaviors [13,15,16,20]. Here we consider a one-dimensional parametrization by the excitatory efficacy α, a parameter that multiplies all excitatory couplings as described in [16]. We observe numerically that increasing the excitatory efficacy α increases λ_1 for both common and independent input (Fig 6). We leave a detailed theoretical analysis of the two-population excitatory-inhibitory network, including the effect of different time constants, which is known to affect chaos [15], for future work.

Training balanced networks with common vs independent input
Our results on the impact of common versus independent input have important implications for learning in recurrent networks. To address this issue, we considered a target-based approach to task learning, called full-FORCE [6,8]. In this learning procedure, a 'student network' (S) learns a task by matching its recurrent inputs to those of a 'teacher network' (Fig 7A). The teacher network is randomly connected and driven by the desired output to generate the target currents. The synaptic coupling matrix of the student network is then trained by an online learning algorithm to autonomously generate the desired output (Materials and Methods).
We consider a case in which the task of the student network is to autonomously generate the target output F_out(t) = sin(2πft). In the standard student-teacher setup [6,8], an input proportional to this desired output, δI_i(t) = I_1 sin(2πft), would be injected into each unit of the teacher network. However, in a balanced network, as we have shown, this is not an efficient way to suppress chaos within the teacher network; an input of the form I_1 sin(2πft + θ_i) with varying phases is far more effective.
We examine learning using teacher networks set up according to Eq 1, with each neuron i driven by δI_i(t) = I_1 sin(2πft + θ_i). We systematically studied the influence of common input (same θ_i across the teacher network) and independent input (random θ_i across the teacher network) on learning performance in the student network. In both cases, the test error drops when chaos is suppressed in the teacher network, as signaled by the zero-crossing of λ_1 (Fig 7B), but a much larger value of I_1 is required to obtain the same test error with common input than with independent input.
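A batch least-squares caricature of this target-based scheme can make the setup concrete: run the driven teacher (Eq 1), record the recurrent-plus-drive currents the student must reproduce, then fit student weights J_S so that J_S ϕ(h(t)) matches them. The paper trains online; this batch stand-in and all parameter values below are assumptions made for the sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# miniature teacher network with independent sinusoidal drive
N, T, dt, tau = 200, 2000, 0.05, 1.0
J0, I0, g, I1, f = 1.0, 1.0, 2.0, 5.0, 0.2
J_T = rng.normal(-J0 / np.sqrt(N), g / np.sqrt(N), (N, N))
theta = rng.uniform(0.0, 2.0 * np.pi, N)      # independent input phases

h = rng.normal(0.0, 1.0, N)
R = np.empty((T, N))                          # teacher rates phi(h)
target = np.empty((T, N))                     # currents to be matched
for k in range(T):
    r = np.maximum(h, 0.0)
    dI = I1 * np.sin(2.0 * np.pi * f * k * dt + theta)
    rec = J_T @ r
    R[k], target[k] = r, rec + dI             # recurrent input + drive
    h += dt / tau * (-h + rec + np.sqrt(N) * I0 + dI)

# batch ridge regression: find J_S with J_S phi(h(t)) ~ target(t)
lam = 1e-3
J_S = np.linalg.solve(R.T @ R + lam * np.eye(N), R.T @ target).T
err = np.linalg.norm(R @ J_S.T - target) / np.linalg.norm(target)
print(err)
```

Because the targets are generated by the teacher alone, the fit is a plain regression problem; this is the sense in which target-based learning decouples weight changes from the (possibly chaotic) student dynamics.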
In Fig 7D, we show examples of firing rates in both the teacher and student network for two different values of input amplitude I_1 during testing. The readout z(t) = Σ_i w_i ϕ(h_i(t)) is also shown. When the teacher network dynamics is chaotic, the readout z(t) quickly deviates from the target output. Crucially, chaos suppression in the teacher network is not induced by intermittent silencing of the whole teacher network: a reliable readout z can be produced when the external input √N I_0 + δI(t) > 0 at all times. The impact of chaos on task performance is more striking when the test error is plotted against λ_1 for individual network realizations (Fig 7E), demonstrating that trained networks with small test error correspond to ones where the time-varying input suppresses chaos in the teacher network. Interestingly, in some cases, the student network can learn to approximately reproduce the prescribed dynamics even when the teacher network is slightly in the chaotic regime (small but positive λ_1).

Fig 6. Difference in chaos suppression in a sparsely-connected E-I network. λ_1 as a function of I_1 for common and independent inputs, showing a monotonic decrease with I_1 and a larger zero-crossing for common input. This result is qualitatively similar to that obtained in the single-population network with negative mean coupling (Fig 2). Error bars indicate ±2 std; lines are a guide for the eye. Increasing the excitatory efficacy α increases λ_1 for both common and independent input (α ∈ {0, 0.5, 0.7}). Model parameters (defined as in [16] for constant input; W_E1 and W_I1 are the modulation amplitudes of the input to the excitatory and inhibitory populations): N_E = N_I = 3500, K = 700, g = 1.

Firing rates and autocorrelations of balanced networks with common and independent input
The non-stationary DMFT also accurately describes the mean population rate ν(t) and the two-time autocorrelation function of the residual fluctuations h̃_i in the case of common input. In Fig 8, we compare non-stationary DMFT and numerical network simulations with common input in both the chaotic (cyclostationary) and stable (periodic) regimes. We consider the population-averaged autocorrelation function of the fluctuating single-neuron rates ϕ(h_i), temporally averaged over t₀, making C_ϕϕ(t − s) a function of the time difference t − s. Here the angular brackets represent either a population average or the stochastic average according to DMFT. Code for the non-stationary DMFT is made available here.

Fig 7. Common input impedes learning in balanced networks. A) Schematic of the training setup. A 'student network' (S) is trained to autonomously generate the output F_out(t) = sin(2πft) by matching its recurrent inputs to those of a driven 'teacher network', whose weights are not changed during training. B) λ_1 in the teacher network as a function of I_1. C) Test error in the student network as a function of I_1. Critical input amplitude I_1^crit is indicated by vertical dashed lines. Consistent with the difference in I_1^crit, the teacher networks driven with common input require a larger I_1 to achieve small test errors in the student network. Error bars indicate interquartile range around the median. D) Top: Target output F_out (green) and actual output z (dashed orange) for two input amplitudes I_1 ∈ {5, 15}. Bottom: Firing rate ϕ(h_i) for two example neurons in the teacher network with common input (green full line) and the student network (orange dotted line) for the two input amplitudes. E) Scatter plot of test error as a function of λ_1 for each network realization in A and B, with both common and independent input. When chaos in the teacher network is not suppressed (λ_1 > 0), test error is high. Training is successful (small test error) when targets are strong enough to suppress chaos in the teacher network. Training is terminated when the error falls below 10⁻². Model parameters: N = 500, g = 2, I_0 = J_0 = 1, ϕ(x) = max(x, 0) in both teacher and student networks; f = 0.2/τ in the teacher network inputs and target F_out.
https://doi.org/10.1371/journal.pcbi.1010590.g007
The excellent agreement also holds for independent input (Fig 9), extending previous results of stationary DMFT for driven networks [3] to balanced networks.
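For reference, the population- and time-averaged rate autocorrelation used in this comparison can be estimated directly from simulated rates. The snippet below is sanity-checked on i.i.d. samples rather than network data, since the expected values are then known exactly.

```python
import numpy as np

def rate_autocorr(phi, max_lag):
    """Population-averaged autocorrelation of single-neuron rates,
    additionally averaged over the reference time t0:
    C(lag) = < phi_i(t0) phi_i(t0 + lag) >_{i, t0}.
    phi has shape (timesteps, neurons)."""
    T = phi.shape[0]
    return np.array([(phi[:T - lag] * phi[lag:]).mean()
                     for lag in range(max_lag)])

# sanity check on i.i.d. samples with mean 1 and variance 1:
# C(0) -> second moment = 2, C(lag > 0) -> squared mean = 1
x = np.random.default_rng(3).normal(1.0, 1.0, (50_000, 4))
print(np.round(rate_autocorr(x, 3), 1))
```

Applied to the rates ϕ(h_i(t)) from a network simulation, this yields the empirical C_ϕϕ(t − s) that the DMFT prediction is compared against.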

Discussion
We investigated how correlations in the external input influence suppression of chaos and learning in balanced networks. Stronger input modulations are required to suppress chaos when the inputs are correlated across neurons. The discrepancy between common and independent input increases for large network size, deep in the balanced regime, and in the vicinity of the transition to chaos of the autonomous dynamics. We developed a non-stationary dynamic mean-field theory to explain the dynamical effects of time-varying input (Materials and Methods). Furthermore, we demonstrated that this discrepancy affects task learning in balanced networks.
Our study is relevant in light of recent advances in optogenetics that allow time-dependent stimulation of a selected population of neurons. Theoretical models that distinguish between different network dynamic regimes are of interest for this purpose [10,19,21]. Our work addresses this question through the spatiotemporal structure of the feedforward input. One experimental prediction of our work is that, if cortical circuits are in the balanced state, time-varying stimulation that is common across neurons will not suppress firing-rate variability as effectively as independently time-varying stimulation.
For multi-layered networks, common feedforward input from one balanced neural population with asynchronous activity into another population results in a time-averaged current that is proportional to √N [14], but its temporal fluctuations would be small and, according to our theory, unsuitable to suppress rate chaos. In contrast, if the driving population has synchronous rate fluctuations, the mean of the current fluctuations would be proportional to √N and thus suitable to suppress rate chaos. We conclude that common input in biological neural circuits cannot control the state of a downstream population unless the driving population is in a synchronous state.
Previous studies on suppression of chaos in rate networks were limited to independent inputs in the form of stochastic [2,4] and sinusoidal [3] time-dependent drive, but the networks were not balanced, and their connectivity had zero mean coupling. In these previous studies, the distribution of the inputs across neurons in the population is time-independent [2][3][4], and stationary DMFT was sufficient to describe the results. In contrast, the treatment of common input is only possible with the non-stationary dynamic mean-field approach introduced here.
The dynamic cancellation of time-varying input through recurrent inhibitory feedback has been previously studied in balanced networks with binary [14,22,23] and spiking neurons [24,25]. Chaos in balanced firing-rate networks was studied previously [4,15,16,20], but the dynamic cancellation of correlated input and its implications on chaos suppression in rate networks were not investigated, nor were the implications for learning. It would be interesting to investigate the influence of input correlations on chaos in alternative models of the balanced state [10,21] and recurrent networks with low-rank structure [26][27][28][29].
The different underlying mechanisms of chaos suppression for common and independent input we report here are not specific to periodic input modulations and a threshold-linear transfer function, which we merely chose for the sake of simplicity and analytical tractability. Balanced rate networks driven by stochastic inputs, such as an Ornstein-Uhlenbeck (OU) process, exhibit a qualitatively similar discrepancy between common and independent inputs (Materials and Methods). In that case, common input corresponds to a situation where all neurons receive the same realization of the OU process, with the intensity of the noise serving as the input amplitude. Moreover, a similar qualitative difference between independent and common input is expected in spiking balanced networks with sufficiently slow synaptic dynamics [15].
The ability to control the dynamics of recurrent networks is closely linked to the problem of learning. A target-based approach to supervised learning in recurrent networks provides a convenient framework for studying the link between chaos and trainability. This is because in this approach, as opposed to backpropagation through time for example, learning-induced changes in connectivity are uncoupled from the dynamics: whether chaos is suppressed in the teacher network does not depend on synaptic changes in the student network. We found that the reduced impact of common input on chaos suppression is reflected in learning performance: when the targets fail to suppress chaos in the teacher network, target trajectories cannot be learned reliably and, as a result, the student network fails to learn the task. This is not only relevant for computational studies of training recurrent neural networks in the balanced state [6][7][8], but also for fitting recurrent network models to neural data [30,31] when imposing biological constraints such as inhibition-dominance, non-negative firing rates and correlated external inputs.
Based on our analysis, we propose two strategies to overcome this problem. One strategy is to time-offset the time-varying input into the teacher network across neurons (as in independent input explored in this study) so that their population average is approximately zero. An alternative approach is to project the target through input weights with significant variability across neurons in the population. Both solutions avoid a large time-varying mean component in the external input that would otherwise be dynamically canceled by recurrent feedback. In sum, the uncovered discrepancy can help to harness the computational capabilities of balanced networks for learning stable trajectories.

Materials and methods
We analyze the dynamics of Eq 1 with time-varying common or independent external input. For common input, we develop a novel non-stationary dynamic mean-field theory (DMFT) yielding time-dependent population firing rates, two-time autocorrelation functions of the activity fluctuations and the largest Lyapunov exponents. For independent input, we calculate autocorrelation functions and Lyapunov exponents using stationary DMFT [1,4,15,16], extending previous work [3].
We consider a single population of neurons with negative mean coupling, whose dynamics are given by Eq 1, which we repeat here for convenience:

$\tau \frac{dh_i}{dt} = -h_i(t) + \sum_{j=1}^{N} J_{ij}\, \phi(h_j(t)) + \sqrt{N} I_0 + \delta I_i(t)$. (1)

As mentioned in the main text, we decompose $h_i = m + \tilde h_i$ and $J_{ij} = -J_0/\sqrt{N} + \tilde J_{ij}$, where the entries of $\tilde J_{ij}$ are i.i.d. Gaussian with mean zero and variance $g^2/N$. For convenience, we include here the decompositions of Eq 1 for common input, $\delta I_i(t) = \delta I(t)$,

$\tau \frac{dm}{dt} = -m(t) + \sqrt{N}\big(I_0 - J_0\, \nu(t)\big) + \delta I(t)$, (2a)

$\tau \frac{d\tilde h_i}{dt} = -\tilde h_i(t) + \sum_j \tilde J_{ij}\, \phi(h_j(t))$, (2b)

and for independent input,

$\tau \frac{dm}{dt} = -m(t) + \sqrt{N}\big(I_0 - J_0\, \nu(t)\big)$,

$\tau \frac{d\tilde h_i}{dt} = -\tilde h_i(t) + \sum_j \tilde J_{ij}\, \phi(h_j(t)) + \delta I_i(t)$,

with the network-averaged (population) firing rate $\nu(t) = \frac{1}{N}\sum_{j=1}^{N} \phi(h_j(t))$.
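A direct Euler simulation of these dynamics can be sketched as follows; this is a minimal Python sketch (the paper's reference code is in Julia), with illustrative parameter names and the threshold-linear transfer function ϕ(x) = max(x, 0):

```python
import numpy as np

def simulate(N=200, T=50.0, dt=0.05, tau=1.0, g=2.0, J0=1.0, I0=1.0,
             I1=0.0, f=0.1, common=True, seed=0):
    """Euler-integrate tau*dh/dt = -h + J phi(h) + sqrt(N) I0 + dI(t)."""
    rng = np.random.default_rng(seed)
    # balanced coupling: J_ij = -J0/sqrt(N) + Jtilde_ij, Jtilde_ij ~ N(0, g^2/N)
    J = -J0 / np.sqrt(N) + g / np.sqrt(N) * rng.standard_normal((N, N))
    # common input: identical phases; independent input: random phase offsets
    theta = np.zeros(N) if common else rng.uniform(0.0, 2.0 * np.pi, N)
    h = rng.standard_normal(N)
    n_steps = int(T / dt)
    rates = np.empty(n_steps)
    for k in range(n_steps):
        dI = I1 * np.sin(2.0 * np.pi * f * k * dt + theta)
        inp = J @ np.maximum(h, 0.0) + np.sqrt(N) * I0 + dI
        h += dt / tau * (-h + inp)
        rates[k] = np.maximum(h, 0.0).mean()   # population rate nu(t)
    return h, rates
```

Setting `common=False` draws random phase offsets θ_i, which makes the population average of the drive approximately vanish, as in the independent-input condition of the main text.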

Common input
After developing a non-stationary DMFT for the dynamics given above with common input, we analyze the small- and large-frequency limits. Non-stationary dynamic mean-field theory. In this section, we derive a non-stationary DMFT for common input, starting from Eq 2. With time-dependent common input to all units, the mean component m(t) and the autocorrelations of the residual fluctuations $\tilde h_i(t)$ change over time. Therefore, the statistics of $h_i$ are not stationary, in contrast to what is assumed in conventional DMFT approaches [1,3,4,15,16,18,32,33].
The basic idea of DMFT is that, for large N, the recurrent inputs to different neurons become Gaussian and pairwise uncorrelated, by the central limit theorem. We therefore characterize the distribution of the residual fluctuations $\tilde h_i(t)$ by the (linear) stochastic dynamics

$\tau \frac{d\tilde h(t)}{dt} = -\tilde h(t) + \eta(t)$, (9)

where η(t) is a Gaussian process with mean ⟨η(t)⟩ = 0 and autocorrelation

$q(t,s) = \langle \eta(t)\eta(s)\rangle = g^2 \langle \phi(m(t)+\tilde h(t))\,\phi(m(s)+\tilde h(s))\rangle$. (10)

Here and in the following, angular brackets denote expectation values over the distribution of the stochastic process $\tilde h(t)$, which approximates population averages in the full network.
The mean-field estimate for the mean component m(t) of $h_i$ therefore evolves according to Eq 2 with $\nu(t) = \langle \phi(m(t) + \tilde h(t))\rangle$, the mean-field estimate of the mean population firing rate.
We derive coupled equations for the time evolution of the two-time autocorrelation function $c(t,s) = \langle \tilde h(t)\tilde h(s)\rangle$, which explicitly depends on the two times t and s. Taking the derivative of c(t, s) with respect to s and using Eq 9, we obtain

$\tau \frac{d}{ds} c(t,s) = -c(t,s) + r(t,s)$, (11)

where $r(t,s) = \langle \tilde h(t)\,\eta(s)\rangle$, which we take as an auxiliary function. Taking the derivative of r(t, s) with respect to t, we arrive at an expression for the time evolution of r(t, s),

$\tau \frac{d}{dt} r(t,s) = -r(t,s) + q(t,s)$, (12)

where $q(t,s) = g^2 \langle \phi(m(t)+\tilde h(t))\,\phi(m(s)+\tilde h(s))\rangle$ (see Eq 10). The idea of introducing an auxiliary function r was proposed previously for a discrete-time model [34]. Together, the dynamic mean-field equations for m(t), c(t, s) and r(t, s) form a closed system of self-consistent dynamic equations and can be solved forward in time, in both s and t, by integrating them on a two-dimensional grid from some initial condition for m, c and r. The integration requires q(t, s), which can be calculated by evaluating a Gaussian double integral that depends on c(t, s), c(t, t), c(s, s), m(t) and m(s). For the threshold-linear transfer function ϕ(x) = max(x, 0), one integral can be evaluated analytically, which allows for an efficient numerical implementation using adaptive Gauss-Kronrod integration [35][36][37]. The non-stationary DMFT accurately captures the time-dependent mean population rate ν(t) and the two-time autocorrelation function (Fig 8), both in the (cyclostationary) chaotic and in the (periodic) driven stable regime.
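The self-consistent closure requires evaluating q(t, s), a Gaussian double integral over the correlated pair (m(t) + h̃(t), m(s) + h̃(s)). The paper evaluates one integral analytically and uses adaptive Gauss-Kronrod quadrature; as a simpler illustration of the same quantity, the Python sketch below (illustrative names) estimates it by Monte Carlo over a correlated Gaussian pair:

```python
import numpy as np

def q_mc(m_t, m_s, c_tt, c_ss, c_ts, g=1.5, n=200_000, seed=0):
    """Monte Carlo estimate of q(t,s) = g^2 <phi(m(t)+x) phi(m(s)+y)>,
    where (x, y) is zero-mean Gaussian with Var x = c(t,t), Var y = c(s,s),
    Cov(x, y) = c(t,s), and phi(u) = max(u, 0)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[c_tt, c_ts], [c_ts, c_ss]])
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(2))   # guard against round-off
    x, y = L @ rng.standard_normal((2, n))
    phi = lambda u: np.maximum(u, 0.0)                # threshold-linear
    return g**2 * np.mean(phi(m_t + x) * phi(m_s + y))
```

For m far above threshold, the threshold-linear nonlinearity is effectively linear and q(t, s) reduces to g²(m(t) m(s) + c(t, s)), which provides a quick sanity check of the estimator.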
To quantify chaos, we calculate the largest Lyapunov exponent using DMFT by considering the distance between the states of two replicas of the system with identical realizations of the network couplings $J_{ij}$ and identical external input $\delta I_i(t)$, but different initial conditions [4,14,38]. The squared distance between the two systems can be expressed in terms of their two-time autocorrelation functions $c^{11}$, $c^{22}$ and the cross-correlation $c^{12}$, with $c^{21}(t,s) = c^{12}(s,t)$:

$d(t) = c^{11}(t,t) + c^{22}(t,t) - c^{12}(t,t) - c^{21}(t,t)$. (13)

We next linearize the dynamics of the cross-correlation function, and thereby of the squared distance, around the solution that is perfectly correlated between the two replicas: $c^{12}(t,s) = c(t,s) + \epsilon\, k(t,s)$, $\epsilon \ll 1$. This yields a linear partial differential equation for the temporal evolution of the squared distance d(t) between infinitesimal perturbations [4]:

$(\tau \partial_t + 1)(\tau \partial_s + 1)\, k(t,s) = q_{\phi'\phi'}(t,s)\, k(t,s)$, (14)

with $d(t) = -2\epsilon\, k(t,t)$ and $q_{\phi'\phi'}(t,s) = g^2 \langle \phi'(m(t)+\tilde h(t))\,\phi'(m(s)+\tilde h(s))\rangle$.
In contrast to earlier approaches [1,4,16], in which the statistics were stationary, for common input the two-time autocorrelation function is required to evaluate Eq 14, which makes $q_{\phi'\phi'}(t,s)$ depend explicitly on t and s, and not only on the time difference t − s. Eq 14 can be solved by integrating forward on a two-dimensional grid, similarly to the solution for the two-time autocorrelation function.
Specifically, similar to the case of the equations for c and r, we solve Eq 14 by rewriting it as two differential equations for k and an auxiliary variable l:

$\tau \frac{d}{ds} k(t,s) = -k(t,s) + l(t,s)$,

$\tau \frac{d}{dt} l(t,s) = -l(t,s) + q_{\phi'\phi'}(t,s)\, k(t,s)$.

The function $q_{\phi'\phi'}(t,s)$ can be calculated by evaluating a Gaussian double integral that depends on c(t, s), c(t, t), c(s, s), m(t) and m(s), which we obtained above (Eqs 10-12).
The largest Lyapunov exponent is given by the average exponential growth rate of k(t, t), discarding an initial transient:

$\lambda_1 = \lim_{t\to\infty} \frac{1}{2t} \ln \frac{k(t,t)}{k(0,0)}$.

Example code in Julia 1.8 for solving the non-stationary DMFT and calculating autocorrelations and the largest Lyapunov exponent is available at github.com/RainerEngelken/NonstationaryDynamicMeanFieldTheory.
Quasi-static approximation for low-frequency input. We consider very slow common input modulations, with $\tau f \ll 1$. In this case, the network can be approximately described by stationary DMFT which, for $g > g_{\rm crit} = \sqrt{2}$, yields chaotic dynamics for any constant positive external input [16]. However, when $\delta I(t) < -\sqrt{N} I_0$, neurons are driven by negative input and the network becomes silent. During these silent episodes, because of the dissipation coming from the leak of the individual neurons, the dynamics is transiently very stable: for silenced networks, the largest local Lyapunov exponent is $\lambda_1^{\rm local} = -1/\tau$, as the Jacobian matrix of the dynamics is $-\frac{1}{\tau}\delta_{ij}$. The critical input amplitude $I_1^{\rm crit}$ is reached when these silent episodes on average compensate the transiently chaotic episodes. Since the Lyapunov exponent of the chaotic episodes is small for g close to $g_{\rm crit}$, very short silent episodes suffice to suppress chaos. Therefore, in the limit of small $g - g_{\rm crit}$, the critical input amplitude in the quasi-static approximation is expected to be $I_1^{\rm crit} \approx \sqrt{N} I_0$. For increasing g, the positive-input episodes become locally more chaotic, which increases $I_1^{\rm crit}$. Thus, in the quasi-static approximation, the largest Lyapunov exponent depends on g and on the distribution of the time-varying input,

$\lambda_1(g) = \int dI\, P(I)\, \lambda_1^{\rm const}(g, I) = \lambda_1^{\rm const}(g)\, \Pr(I > 0) - \frac{1}{\tau}\, \Pr(I < 0)$, (20)

where $\lambda_1^{\rm const}(g)$ is the largest Lyapunov exponent for constant input [16], and I is integrated over the probability distribution P(I) of $\delta I(t) + \sqrt{N} I_0$. In the second equality, we used the fact that, for constant positive external input, the Lyapunov exponent is independent of the input value I for the threshold-linear transfer function, due to its positive homogeneity. For $\delta I(t) = I_1 \sin(2\pi f t)$, Eq 20 becomes

$\lambda_1(g, I_1) = (1 - \rho)\, \lambda_1^{\rm const}(g) - \frac{\rho}{\tau}$, with $\rho = \frac{1}{\pi}\arccos\!\big(\sqrt{N} I_0 / I_1\big)$ for $I_1 > \sqrt{N} I_0$, (21)

where ρ is the fraction of the input period spent in the silent state. Solving $\lambda_1(g, I_1) = 0$ for $I_1$ yields

$I_1^{\rm crit} = \sqrt{N} I_0 \Big/ \cos\!\Big(\pi\, \frac{\tau \lambda_1^{\rm const}}{1 + \tau \lambda_1^{\rm const}}\Big)$, (22)

where $\lambda_1^{\rm const}$ is calculated analytically using stationary DMFT [15,16]. This is the quasi-static approximation plotted as dotted lines in Figs 3C and 5.
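The quasi-static averaging argument can be evaluated directly. The Python sketch below assumes that the silent fraction of a sinusoidal drive with amplitude I₁ > √N I₀ is arccos(√N I₀/I₁)/π, averages the local Lyapunov exponents accordingly, and solves λ₁ = 0 in closed form under that assumption:

```python
import numpy as np

def silent_fraction(I1, sqrtN_I0):
    """Fraction of a period for which sqrt(N)*I0 + I1*sin(2*pi*f*t) < 0."""
    if I1 <= sqrtN_I0:
        return 0.0
    return np.arccos(sqrtN_I0 / I1) / np.pi

def quasi_static_lyap(I1, sqrtN_I0, lam_const, tau=1.0):
    """Quasi-static average: lambda_1 = (1 - rho)*lam_const - rho/tau."""
    rho = silent_fraction(I1, sqrtN_I0)
    return (1.0 - rho) * lam_const - rho / tau

def I1_crit_quasi_static(sqrtN_I0, lam_const, tau=1.0):
    """Closed-form root of lambda_1 = 0; only finite for tau*lam_const < 1."""
    x = tau * lam_const / (1.0 + tau * lam_const)
    return sqrtN_I0 / np.cos(np.pi * x)
```

Note that `I1_crit_quasi_static` diverges as τ λ₁^const approaches 1, i.e., when the chaotic episodes become too unstable for silent episodes of any duration to compensate.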
Note that $I_1^{\rm crit}$ diverges when g is so large that $\lambda_1^{\rm const} = 1/\tau$. For larger g, arbitrarily strong slow inputs cannot suppress chaos. For g close to the autonomous transition $g_{\rm crit}$, we can use the analytical approximation $\lambda_1^{\rm const}(g) = c\,(g - g_{\rm crit})$ [16], where c is a constant of order 1. Thus,

$I_1^{\rm crit} \approx \sqrt{N} I_0 \Big/ \cos\!\Big(\pi\, \frac{\tau c\,(g - g_{\rm crit})}{1 + \tau c\,(g - g_{\rm crit})}\Big)$.

In the case of time-varying common input modulations generated by an OU process with stationary standard deviation $\sigma_{\rm OU}$, the fraction of silent episodes is $\Phi(-\sqrt{N} I_0/\sigma_{\rm OU})$, with Φ the standard normal cumulative distribution function, and setting $\lambda_1 = 0$ yields the critical noise intensity

$\sigma_{\rm OU}^{\rm crit} = \sqrt{N} I_0 \Big/ \Phi^{-1}\!\Big(\frac{1}{1 + \tau \lambda_1^{\rm const}}\Big)$.

Again, when g is sufficiently large that the largest Lyapunov exponent during the chaotic episodes reaches $\lambda_1^{\rm const} = 1/\tau$, OU input of any amplitude cannot suppress chaos.

High-frequency limit. For high input frequencies, the leak term in the network dynamics acts as a low-pass filter on the external input, attenuating it by a factor of $1/\sqrt{1 + (2\pi f \tau)^2}$; in particular, common input is low-pass filtered in Eq 2a. Analyzing the attenuation in Eq 2a, we find a linear dependence for high input frequencies, $I_1^{\rm crit}(f) \propto 2\pi f \tau$ for $2\pi f \tau \gg 1$. The expected high-frequency scaling is visible in Fig 5A. The crossover to the linear scaling regime of $I_1^{\rm crit}$ occurs at $f_c \propto \sqrt{N} I_0/\tau$. We observed such a behavior of the crossover also in numerical simulations.

Numerical calculation of the largest Lyapunov exponent. The largest Lyapunov exponent $\lambda_1$ quantifies the average rate of exponential divergence or convergence of nearby initial conditions,

$\lambda_1 = \lim_{t\to\infty} \lim_{\|\delta h(0)\|\to 0} \frac{1}{t} \ln \frac{\|\delta h(t)\|}{\|\delta h(0)\|}$.

For dynamics on an ergodic attractor, $\lambda_1$ does not depend on the initial condition $h_0$. We calculated the largest Lyapunov exponent of the network dynamics in two different ways: based on analytical expressions for the Jacobian of the dynamics [17,39], and with direct numerical simulations tracking the distance between two nearby trajectories. Based on the Lyapunov exponent, we determine the critical input amplitude $I_1^{\rm crit}$ using a bisection method with a relative precision of one percent.
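The direct numerical route, tracking the distance between two nearby trajectories, corresponds to a standard Benettin-type procedure with repeated renormalization of the separation. A Python sketch (the `step` argument is a placeholder for one integration step of the network dynamics; names are illustrative):

```python
import numpy as np

def lyap_benettin(step, h0, d0=1e-8, n_steps=20_000, dt=0.05, seed=0):
    """Estimate the largest Lyapunov exponent from two nearby trajectories.

    `step(h)` must advance the state by one time step of length dt."""
    rng = np.random.default_rng(seed)
    h1 = np.array(h0, dtype=float)
    pert = rng.standard_normal(h1.shape)
    h2 = h1 + d0 * pert / np.linalg.norm(pert)   # initial separation of size d0
    log_growth = 0.0
    for _ in range(n_steps):
        h1, h2 = step(h1), step(h2)
        d = np.linalg.norm(h2 - h1)
        log_growth += np.log(d / d0)
        h2 = h1 + (d0 / d) * (h2 - h1)           # renormalize the separation
    return log_growth / (n_steps * dt)
```

A bisection on the input amplitude I₁, using the sign of this estimate, then brackets the critical amplitude at which λ₁ crosses zero.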
Target-based learning. We employ a recently developed target-based learning algorithm called full-FORCE [6,8]. The learning procedure is the following: a student network (S) learns a task by matching its total incoming currents $Z^S_i(t) = \sum_j J^S_{ij}\, \phi(h^S_j(t)) + \sqrt{N} I_0$ to those of a randomly coupled teacher network (T) that is driven by the desired output signal, i.e., $Z^T_i(t) = \sum_j J^T_{ij}\, \phi(h^T_j(t)) + \sqrt{N} I_0 + \delta I_i(t)$; in the special case of a sinusoidal signal $F_{\rm out}(t) = \sin(2\pi f t)$, with $\delta I_i(t) = I_1 \sin(2\pi f t + \theta_i)$ (as in the main text). The synaptic matrix $J^S_{ij}$ is trained using an online procedure so that the student network can generate the target output autonomously, $z(t) = \sum_i w_i\, \phi(h^S_i(t))$, where z(t) is a linear readout of the student network's firing rates. Both the recurrent couplings $J^S_{ij}$ and the readout weights $w_i$ are trained to produce the prescribed output signal, i.e., $z(t) \approx F_{\rm out}(t)$.

Fig 10. No qualitative difference in chaos suppression by common vs independent input in canonical random networks. A) $I_1^{\rm crit}$ as a function of input frequency f ($g = \sqrt{2}$ light color, g = 2 dark color). $I_1^{\rm crit}$ has a minimum for both common and independent input. The independent input case is identical to the scenario studied in [3]. At high f, the low-pass filter effect of the leak term attenuates the external input in both cases, resulting in a linearly increasing $I_1^{\rm crit}$.

The incoming currents in the teacher and student network are matched via an online minimization of the following error function for each neuron $i \in 1, \ldots, N$,

$\mathrm{Err}_i = \sum_t \big[ Z^S_i(t) - Z^T_i(t) \big]^2$. (31)

Following [5,6], recursive least squares (RLS) is used to minimize the training error (Eq 31) and to concurrently learn the readout weight vector w. We used a balanced initialization for both the teacher and the student network: $J^T_{ij}$ and $J^S_{ij}$ are independently initialized as i.i.d. Gaussian matrices with mean $-J_0/\sqrt{N}$ and variance $g^2/N$. Both networks receive a constant external input $\sqrt{N} I_0$. Euler integration was used with a time step of Δt = 0.01. The regularization parameter for RLS was β = 1.
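A single regularized RLS step can be sketched as follows (Python; here `x` stands for the vector of presynaptic rates ϕ(h^S(t)), `err` for the current mismatch Z^S_i(t) − Z^T_i(t) of one neuron, and P is initialized to the identity for β = 1 — this is the generic textbook RLS update, not the authors' exact implementation):

```python
import numpy as np

def rls_update(P, w, x, err):
    """One recursive-least-squares step; returns the updated (P, w).

    P approximates the inverse regularized correlation matrix of x;
    w are the weights (incoming or readout) being trained."""
    Px = P @ x
    k = Px / (1.0 + x @ Px)            # gain vector (Sherman-Morrison update)
    return P - np.outer(k, Px), w - err * k
```

Iterating this update with `err = w @ x - target` drives w toward the ridge-regularized least-squares solution, which is what makes RLS converge in a small number of passes compared with plain gradient descent.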
Test error is computed over a testing period $T_{\rm test} = 50\, T_{\rm osc}$, i.e., 50 periods of the desired output signal, $T_{\rm test} = 50/f$, as the mean squared deviation between the readout z(t) and the target $F_{\rm out}(t)$. For a periodic target $F_{\rm out}$, testing is interleaved with training, so that the network state $h^S_i(t)$ is usually close to the target trajectory. In this case, a sufficiently low test error usually implies the presence of a stable limit cycle, and the periodic output is reproduced, up to a phase shift, starting from any initial condition.