Computational Aspects of Feedback in Neural Circuits

It has previously been shown that generic cortical microcircuit models can perform complex real-time computations on continuous input streams, provided that these computations can be carried out with a rapidly fading memory. We investigate the computational capability of such circuits in the more realistic case where not only readout neurons, but in addition a few neurons within the circuit, have been trained for specific tasks. This is essentially equivalent to the case where the output of trained readout neurons is fed back into the circuit. We show that this new model overcomes the limitation of a rapidly fading memory. In fact, we prove that in the idealized case without noise it can carry out any conceivable digital or analog computation on time-varying inputs. But even with noise, the resulting computational model can perform a large class of biologically relevant real-time computations that require a nonfading memory. We demonstrate these computational implications of feedback both theoretically, and through computer simulations of detailed cortical microcircuit models that are subject to noise and have complex inherent dynamics. We show that the application of simple learning procedures (such as linear regression or perceptron learning) to a few neurons enables such circuits to represent time over behaviorally relevant long time spans, to integrate evidence from incoming spike trains over longer periods of time, and to process new information contained in such spike trains in diverse ways according to the current internal state of the circuit. In particular we show that such generic cortical microcircuits with feedback provide a new model for working memory that is consistent with a large set of biological constraints. Although this article examines primarily the computational role of feedback in circuits of neurons, the mathematical principles on which its analysis is based apply to a variety of dynamical systems. 
Hence they may also throw new light on the computational role of feedback in other complex biological dynamical systems, such as, for example, genetic regulatory networks.


Introduction
The neocortex performs a large variety of complex computations in real time. It is conjectured that these computations are carried out by a network of cortical microcircuits, where each microcircuit is a rather stereotypical circuit of neurons within a cortical column. A characteristic property of these circuits and networks is an abundance of feedback connections. But the computational function of these feedback connections is largely unknown. Two lines of research have been engaged to solve this problem. In one approach, which one might call the constructive approach, one builds hypothetical circuits of neurons and shows that (under some conditions on the response behavior of its neurons and synapses) such circuits can perform specific computations. In another research strategy, which one might call the analytical approach, one starts with data-based models for actual cortical microcircuits, and analyses which computational operations such ''given'' circuits can perform under the assumption that a learning process assigns suitable values to some of their parameters (e.g., synaptic efficacies of readout neurons). An underlying assumption of the analytical approach is that complex recurrent circuits, such as cortical microcircuits, cannot be fully understood in terms of the usually considered properties of their components. Rather, system-level approaches that directly address the dynamics of the resulting recurrent neural circuits are needed to complement the bottom-up analysis. This line of research started with the identification and investigation of so-called canonical microcircuits [1]. Several issues related to cortical microcircuits have also been addressed in the work of Grossberg; see [2] and the references therein. Subsequently it was shown that quite complex real-time computations on spike trains can be carried out by such ''given'' models for cortical microcircuits ( [3][4][5][6], see [7] for a review). 
A fundamental limitation of this approach was that only those computations could be modeled that can be carried out with a fading memory, more precisely only those computations that require integration of information over a timespan of 200 ms to 300 ms (its maximal length depends on the amount of noise in the circuit and the complexity of the input spike trains [8]). In particular, computational tasks that require a representation of elapsed time between salient sensory events or motor actions [9], or an internal representation of expected rewards [10][11][12], working memory [13], accumulation of sensory evidence for decision making [14], the updating and holding of analog variables such as for example the desired eye position [15], and differential processing of sensory input streams according to attentional or other internal states of the neural system [16] could not be modeled in this way. Previous work on concrete examples of artificial neural networks [17] and cortical microcircuit models [18] had already indicated that these shortcomings of the model might arise only if one assumes that learning affects exclusively the synapses of readout neurons that project the results of computations to other circuits or areas, without giving feedback into the circuit from which they extract information. This scenario is in fact rather unrealistic from a biological perspective, since pyramidal neurons in the cortex typically have in addition to their long projecting axon a large number of axon collaterals that provide feedback to the local circuit [19]. Abundant feedback connections also exist on the network level between different brain areas [20]. We show in this article that if one takes feedback connections from readout neurons (that are trained for specific tasks) into account, generic cortical microcircuit models can solve all of the previously listed computational tasks. 
In fact, one can demonstrate this also for circuits whose underlying noise levels and models for neurons and synapses are substantially more realistic than those which had previously been considered in models for working memory and related tasks.
We show in the Theoretical Analysis section that the significance of feedback for the computational power of neural circuits and other dynamical systems can be explained on the basis of general principles. Theorem 1 implies that a large class of dynamical systems, in particular systems of differential equations that are commonly used to describe the dynamics of firing activity in neural circuits, gain universal computational capabilities for digital and analog computation as soon as one considers them in combination with feedback. A further mathematical result (Theorem 2) implies that the capability to process online input streams in the light of nonfading (or slowly fading) internal states is preserved in the presence of fairly large levels of internal noise. On the basis of this theoretical foundation, one can explain why the computer models of generic cortical microcircuits, which are considered in the section Applications to Generic Cortical Microcircuit Models, are able to solve the previously mentioned benchmark tasks. These results suggest a new computational model for cortical microcircuits, which includes the capability to process online input streams in diverse ways according to different ''instructions'' that are implemented through high-dimensional attractors of the underlying dynamical system. The high dimensionality of these attractors results from the fact that only a small fraction of synapses need to be modified for their creation. In comparison with the commonly considered low-dimensional attractors, such high-dimensional attractors have additional attractive properties such as compositionality (the intersection of several of them is in general nonempty) and compatibility with real-time computing on online input streams within the same circuit.
The presentation of theoretical results for abstract circuit models in the Theoretical Analysis section is complemented by mathematical details in the Methods section, under the heading Mathematical Definitions, Details to the Proof of Theorem 1, and Examples, and the heading Mathematical Definitions and Details to the Proof of Theorem 2. Details of the computer simulations of more detailed cortical microcircuit models are discussed in Applications to Generic Cortical Microcircuit Models in the Methods section. A discussion of the results of this paper is given in the Discussion section.

Results
We consider two types of models for neural circuits. The first model type is mean field models, such as those defined by Equation 6, which models the dynamics of firing rates of neurons in neural circuits. These models have the advantage that they are theoretically tractable, but they have the disadvantage that they do not reflect many known details of cortical microcircuits. However we show that the theoretical results that are proven in the section Theoretical Analysis hold for fairly large classes of dynamical systems. Hence, they potentially also hold for some more detailed models of neural circuits.
The second model type involves quite detailed models of cortical microcircuits consisting of spiking neurons (see the description in Applications to Generic Cortical Microcircuit Models and in Details of the Cortical Microcircuit Models). At present these models cannot be analyzed directly by theoretical methods, hence we can only present statistical data from computer simulations. Our simulation results show that feedback has in these more detailed models a variety of computational consequences that we have derived analytically for the simpler models in Theoretical Analysis. This is not totally surprising insofar as the computations that we consider in the more detailed models can be approximately described in terms of time-varying firing rates for individual neurons.
In both types of models we focus on computations that transform time-varying input streams into time-varying output streams. The input streams are modeled in Theoretical Analysis by time-varying analog functions u(t) (that might, for example, represent time-varying firing rates of neurons that provide afferent inputs) and in Applications to Generic Cortical Microcircuit Models by spike trains generated by Poisson processes with time-varying rates. Output streams are analogously modeled by time-varying firing rates, or directly by spike trains. We believe that such online computations, which transform time-varying inputs into time-varying outputs, provide a better framework for modeling cortical processing of information than computations that transform a static vector of numbers (i.e., a batch input) into a static output. Mappings from time-varying inputs to time-varying outputs are referred to as filters (or operators) in mathematics and engineering. A frequently discussed reference class of linear and nonlinear filters includes those that can be described by Volterra or Wiener series (see, e.g., [21]). These filters can equivalently be characterized as those filters that are time-invariant (i.e., they are input-driven and have no "internal clock") and have a fading memory (see [5]). Fading memory (which is formally defined in Fading-Memory Filters) means intuitively that the influence of any specific segment of the input stream on later parts of the output stream becomes negligible when the length of the intervening time interval is sufficiently large. We show in the next two subsections that feedback endows a circuit, which by itself can only carry out computations with fading memory, with flexible ways of combining fading-memory computations on time-varying inputs with computational operations on selected pieces of information in a nonfading memory.

Author Summary
Circuits of neurons in the brain have an abundance of feedback connections, both on the level of local microcircuits and on the level of synaptic connections between brain areas. But the functional role of these feedback connections is largely unknown. We present a computational theory that characterizes the gain in computational power that feedback can provide in such circuits. It shows that feedback endows standard models for neural circuits with the capability to emulate arbitrary Turing machines. In fact, with suitable feedback they can simulate any dynamical system, in particular any conceivable analog computer. Under realistic noise conditions, the computational power of these circuits is necessarily reduced. But we demonstrate through computer simulations that feedback also provides a significant gain in computational power for quite detailed models of cortical microcircuits with in vivo-like high levels of noise. In particular it enables generic cortical microcircuits to carry out computations that combine information from working memory and persistent internal states in real time with new information from online input streams.

Theoretical Analysis
The dynamics of firing rates in recurrent circuits of neurons is commonly modeled by systems of nonlinear differential equations of the form of Equation 1 or Equation 2 [22][23][24][25]. Here each $x_i$, $i = 1, \dots, n$, is a real-valued variable that represents the current firing rate of the $i$th neuron or population of neurons in a recurrent neural circuit, and $v(t)$ is an external input stream. The coefficients $a_{ij}$, $b_i$ denote the strengths of synaptic connections, and $\lambda_i > 0$ denotes time constants. The function $\sigma$ is some sigmoidal activation function (nondecreasing, with bounded range). In most models of neural circuits, the parameters are chosen so that the resulting dynamical system has a fading memory for preceding inputs. If one makes the synaptic connection strengths $a_{ij}$ in Equation 1 or Equation 2 so large that recurrent activity does not dissipate, the neural circuit tends to exhibit persistent memory. But it is usually quite difficult to control the content of this persistent memory, since it tends to be swamped with minor details of external inputs (or initial conditions) from the distant past. Hence this chaotic regime of recurrent neural circuits (see [62] for a review) is apparently also not suitable for biologically realistic online computations that combine new information from the current input with selected (e.g., behaviorally relevant) aspects of external or internal inputs from the past.
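The fading-memory regime described above can be illustrated with a small numerical sketch. It assumes one commonly used rate-model form, x' = -lam * x + tanh(A x + b v(t)); this specific form, the weight scaling, the input signal, and all parameter values are illustrative assumptions and are not taken from the article. With recurrent weights scaled so that their spectral norm stays below the leak rate, two copies of the circuit that start from different initial states and receive the same input end up in practically the same state, that is, the initial condition is forgotten.

```python
import numpy as np

def simulate(x0, inputs, A, b, lam, dt=0.01):
    """Euler integration of x' = -lam*x + tanh(A @ x + b*v(t)) (assumed form)."""
    x = np.array(x0, dtype=float)
    for v in inputs:
        x = x + dt * (-lam * x + np.tanh(A @ x + b * v))
    return x

rng = np.random.default_rng(0)
n = 20
A = rng.normal(size=(n, n))
A *= 0.5 / np.linalg.norm(A, 2)    # spectral norm 0.5 < lam: contracting dynamics
b = rng.normal(size=n)
lam = 1.0

t = np.arange(0, 30, 0.01)
inputs = np.sin(0.7 * t)           # the same input stream v(t) for both runs

# Two runs that differ only in their (random) initial state.
x_a = simulate(rng.normal(size=n), inputs, A, b, lam)
x_b = simulate(rng.normal(size=n), inputs, A, b, lam)
gap = np.linalg.norm(x_a - x_b)
print(f"distance between final states: {gap:.2e}")
```

Scaling `A` up instead (so that recurrent activity no longer dissipates) would remove this contraction guarantee, which corresponds to the persistent but hard-to-control regime mentioned in the text.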
Recurrent circuits of neurons (e.g., those described by Equations 1 or 2) are from a mathematical perspective special cases of dynamical systems. The subsequent mathematical results show that a large variety of dynamical systems, in particular also fading-memory systems of type Equation 1 or Equation 2, can overcome in the presence of feedback the computational limitations of a fading memory without necessarily falling into the chaotic regime. In fact, feedback endows them with universal capabilities for analog computing, in a sense that can be made precise in the following way (see Figure 1A-1C for an illustration):
Theorem 1. A large class $S_n$ of systems of differential equations of the form
$$x_i'(t) = f_i(x_1(t), \dots, x_n(t)) + g_i(x_1(t), \dots, x_n(t)) \cdot v(t), \quad i = 1, \dots, n \tag{3}$$
are in the following sense universal for analog computing: This system (3) can respond to an external input $u(t)$ with the dynamics of any $n$th order differential equation of the form
$$z^{(n)}(t) = G(z(t), z'(t), z''(t), \dots, z^{(n-1)}(t)) + u(t) \tag{4}$$
(for arbitrary smooth functions $G: \mathbb{R}^n \to \mathbb{R}$) if the input term $v(t)$ is replaced in Equation 3 by a suitable memoryless feedback function $K(x_1(t), \dots, x_n(t), u(t))$, and if a suitable memoryless readout function $h(x(t))$ is applied to its internal state $x(t) = \langle x_1(t), \dots, x_n(t) \rangle$: one can then achieve that $h(x(t)) = z(t)$ for any solution $z(t)$ of Equation 4. Also the dynamic responses of all systems consisting of several higher order differential equations of the form Equation 4 can be simulated by fixed systems of the form Equation 3 with a corresponding number of feedbacks.
This result says more precisely that for any $n$th order differential equation (Equation 4) there exists a (memory-free) feedback function $K: \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ and a memory-free readout function $h: \mathbb{R}^n \to \mathbb{R}$ (which can both be chosen to be smooth, in particular continuous) so that, for every external input $u(t)$, $t \ge 0$, and each solution $z(t)$ of the forced system (Equation 4), there is an input $u_0(t)$ with $u_0(t) \equiv 0$ for all $t \ge 1$, so that the solution $x(t) = \langle x_1(t), \dots, x_n(t) \rangle$ of the fixed system (Equation 3)
$$x'(t) = f(x(t)) + g(x(t)) \, K(x(t), u(t) + u_0(t)), \quad x(0) = 0 \tag{5}$$
(for $f: \mathbb{R}^n \to \mathbb{R}^n$ consisting of $\langle f_1, \dots, f_n \rangle$ and $g: \mathbb{R}^n \to \mathbb{R}^n$ consisting of $\langle g_1, \dots, g_n \rangle$) is such that $h(x(t)) = z(t)$ for all $t \ge 1$.
Note that the function $u_0(t)$, which is added to the input for $t < 1$ (whereas $u_0(t) = 0$ for $t \ge 1$), allows the system (Equation 3) (and Equation 5) to simulate, with a standardized initial condition $x(0) = 0$, any solution of Equation 4 with arbitrary initial conditions. Theorem 1 implies that even if some fixed dynamical system (Equation 3) from the class $S_n$ has fading memory, a suitable feedback $K$ and readout function $h$ will enable it to carry out specific computations with persistent memory. In fact, it can carry out any computation with persistent memory that could possibly be carried out by any dynamical system (Equation 4). To get a clear understanding of this universality property, one should note that the feedback function $K$ and the readout function $h$ depend only on the function $G$ that characterizes the simulated system (Equation 4), but not on the external input $u(t)$ or the particular solution $z(t)$ of Equation 4 that it simulates. Hence, Theorem 1 implies in particular that any system (Equation 3) that belongs to the class $S_n$ has in conjunction with several feedbacks the computational power of a universal Turing machine (see [26] or [27] for relevant concepts from computation theory).
This follows from the fact that every Turing machine (hence any conceivable digital computation, most of which require a persistent memory) can be simulated by systems of equations of the form Equation 4 (this was shown in [28] for the case with continuous time, and in [29,30] for recurrent neural networks with discrete time; see [31] for a review). But possibly more relevant for applications to biological systems is the fact that any fixed system (Equation 3) that belongs to the class $S_n$ is able to emulate any conceivable continuous dynamic response to an input stream $u(t)$ if it receives a suitable feedback $K(x(t), u(t))$, where $K$ can always be chosen to be continuous. Hence one may argue that these systems (Equation 3) are also universal for analog computing on time-varying inputs.
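The statement of Theorem 1 can be made concrete with a toy example. The sketch below is not taken from the article: the two-variable leaky system `f`, the input direction `g`, the target dynamics `G` (an undamped oscillator), and the input pulse `u` are all hypothetical choices made for illustration. Without feedback the system forgets the pulse; with the memoryless feedback `K` and the readout h(x) = x1, the same fixed system reproduces the persistent oscillation z'' = G(z, z') + u.

```python
import numpy as np

def G(z, dz):
    # Target dynamics z'' = G(z, z') + u: an undamped oscillator (illustrative choice).
    return -z

def u(t):
    # External input: a brief pulse, zero afterwards.
    return 1.0 if t < 1.0 else 0.0

def f(x):
    # Fixed fading-memory system without input: both variables leak to zero.
    return np.array([-x[0] + x[1], -x[1]])

g = np.array([0.0, 1.0])  # the input term v(t) enters the second equation only

def K(x, u_t):
    # Memoryless feedback. With z = x1 and z' = x2 - x1 one gets
    # z'' = x1 - 2*x2 + v, so this choice of v yields z'' = G(z, z') + u.
    return G(x[0], x[1] - x[0]) + u_t - x[0] + 2.0 * x[1]

def simulate(feedback, T=50.0, dt=0.005):
    """Classical RK4 integration from x(0) = 0; returns times and the readout x1."""
    def rhs(t, x):
        v = K(x, u(t)) if feedback else u(t)
        return f(x) + g * v
    steps = int(round(T / dt))
    ts = np.arange(steps + 1) * dt
    x = np.zeros(2)
    readout = np.empty(steps + 1)
    readout[0] = x[0]
    for k in range(steps):
        t = ts[k]
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt / 2 * k1)
        k3 = rhs(t + dt / 2, x + dt / 2 * k2)
        k4 = rhs(t + dt, x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        readout[k + 1] = x[0]
    return ts, readout

ts, z_fb = simulate(feedback=True)
_, z_open = simulate(feedback=False)
late = ts >= 40.0
print("late amplitude with feedback:   ", np.abs(z_fb[late]).max())
print("late amplitude without feedback:", np.abs(z_open[late]).max())
```

Note that `K` depends only on `G` and on the fixed system, not on the particular input or solution, which mirrors the universality remark in the text. Replacing `G` by another smooth function changes the simulated dynamics without touching `f` or `g`.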
The class $S_n$ of dynamical systems that become through feedback universal for analog computing subsumes systems of the form [...]

Figure 1 (caption, panels C-E). (C) If the input $v(t)$ to circuit $C$ is replaced by a suitable feedback $K(x(t), u(t))$, then this fixed circuit $C$ can simulate the dynamic response $z(t)$ of the arbitrarily given system shown in B, for any input stream $u(t)$. (D) Arbitrary given FSM $A$ with $l$ states. (E) A noisy fading-memory system with feedback can reliably reproduce the current state $A(t)$ of the given FSM $A$, except for timepoints $t$ shortly after $A$ has switched its state. doi:10.1371/journal.pcbi.0020165.g001

[...] Equation 3 will also stay within a bounded range. More precisely, one has that: For each constant $c > 0$ there is a constant $C > 0$ such that: for every external input $u(t)$, $t \ge 0$, and each solution $z(t)$ of the forced system (Equation 4) such that
$$|u(t)| \le c \quad \text{and} \quad |z^{(i)}(t)| \le c \quad \text{for all } i = 0, \dots, n-1, \text{ for all } t \ge 0,$$
the input $u_0$ can be picked so that the feedback $v(t) = K(x(t), u(t) + u_0(t))$, $t \ge 0$, to Equation 1 or 2 satisfies $|v(t)| \le C$ for all $t \ge 0$. Thus, if we know a priori that we will only deal with solutions of the differential Equation 4 that are bounded by $c$, and inputs are similarly bounded, we could also consider instead of Equation 3 a system such as $x'(t) = f(x(t)) + g(x(t)) \, \sigma(v(t))$ with $f, g: \mathbb{R}^n \to \mathbb{R}^n$, where some bounded activation function $\sigma: \mathbb{R} \to \mathbb{R}$ (e.g., $\rho \cdot \tanh(v)$, for a suitable constant $\rho$) is applied to the term $v(t)$ (as in Equation 2). The resulting feedback term $\sigma(K(x(t), u(t) + u_0(t)))$ is then of a mathematical form which is adequate for modeling feedback in neural circuits.
The proof of Theorem 1 builds on results from control theory. One important technique in nonlinear control is feedback linearization [32,33]. With this technique, a large class of nonlinear dynamical systems can be transformed through suitable feedback into a linear system (which is then much easier to control). It should be pointed out that this feedback linearization is not a standard linearization method that only yields approximation results, but a method that yields an exact transformation. More generally, one can show in various cases that two dynamical systems, $D_1$ and $D_2$, are feedback equivalent. The notion of "feedback equivalence" (see Definition of Feedback Equivalence), which is in fact an equivalence relation, expresses that two systems of differential equations can be transformed into each other through application of a suitable feedback and a change of basis in the state space. Such change of basis can be achieved through readout functions $h(x(t))$ as considered in the claim of Theorem 1. Thus, to show that a fixed system $D_1$ has the universality property that is specified in the claim of Theorem 1, it suffices to show that $D_1$ is feedback equivalent to all systems of the form Equation 4. Known results about feedback linearization (see [33], Lemma 5.3.5) imply that the linear system of Equation 7 is an example of a system $D_1$ (consisting of $n$ differential equations) which has this universality property. It is in fact very easy to see that any system (Equation 4) can be transformed into the system of Equation 7 with the help of feedback: set $x_1(t) = z(t)$, $x_{i+1}(t) = z^{(i)}(t)$ for $i = 1, \dots, n-1$, and use the feedback $v(t) = G(x(t)) + u(t)$ in Equation 7. To prove that many other dynamical systems have the same universality property as this system (Equation 7), it suffices to observe that feedback equivalence preserves this universality property.
We define the class $S_n$ in the claim of Theorem 1 as the class of feedback linearizable systems, that is, the class of dynamical systems (Equation 3) that are feedback equivalent to some generic linear system. It can be proved (see Lemma in the section Definition of the Class $S_n$) that every feedback linearizable system (Equation 3) is also feedback equivalent to Equation 7, and hence has the same universality property as Equation 7.
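The substitution used in this argument can be written out explicitly. Equation 7 is not displayed in this text, so the form below is an assumption: it is the standard chain of integrators, which is the linear system that the substitution $x_1 = z$, $x_{i+1} = z^{(i)}$ naturally leads to.

```latex
% Assumed form of Equation 7 (chain of integrators driven by the input v):
\begin{align*}
x_i'(t) &= x_{i+1}(t), \qquad i = 1, \dots, n-1, \\
x_n'(t) &= v(t).
\end{align*}
% With x_1 = z and x_{i+1} = z^{(i)}, the first n-1 equations hold by definition,
% and the last one becomes, using Equation 4,
\begin{align*}
x_n'(t) = z^{(n)}(t)
        = G\bigl(z(t), z'(t), \dots, z^{(n-1)}(t)\bigr) + u(t)
        = G(x(t)) + u(t).
\end{align*}
% Hence the memoryless feedback v(t) = K(x(t), u(t)) = G(x(t)) + u(t)
% together with the readout h(x) = x_1 yields h(x(t)) = z(t).
```

Under this assumption the feedback $K$ depends on $G$ alone, in line with the remark that neither $K$ nor $h$ depends on the input $u(t)$ or on the particular solution $z(t)$.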
We give in Definition of Class $S_n$ a precise definition of the class $S_n$ in terms of feedback equivalence (which is formally defined in Definition of Feedback Equivalence). We present in Details of the Proof of Theorem 1 a formal proof of the simulation result that is claimed in Theorem 1 (taking also initial conditions into account). In addition we formulate in the section A Characterization of $S_n$ via Lie Brackets an equivalent criterion for a system (Equation 3) to belong to the class $S_n$, which can be more easily tested for concrete cases of dynamical systems. This criterion makes use of the Lie bracket formalism that is briefly reviewed in Lie Brackets. Applications of this criterion to neural network equations are discussed in Applications to Neural Network Equations. In particular, we use this criterion to show that some dynamical systems (Equation 6) that are defined by standard equations for recurrent neural circuits belong to the class $S_n$. We also show in Applications to Neural Network Equations that not all systems of the form (Equation 6) belong to the class $S_n$; rather, it depends on the particular choice of parameters $a_{ij}$ and $b_i$ in Equation 6.
Theorem 1 implies that a generic neural circuit may become through feedback a universal computational device, which can not only simulate any Turing machine, but also any conceivable model for analog computing with bounded dynamic responses. The "program" of such an arbitrary simulated computing machine gets encapsulated in the static functions $K$ that characterize the memoryless computational operations that are required from feedback units, and the static readout functions $h$. Since these functions are static, i.e., time-invariant, and continuous, they provide suitable targets for learning. More precisely, to train a generic neural circuit to simulate the dynamic response of an arbitrary dynamical system, it suffices to train (apart from readout neurons) a few neurons within the circuit (or within some external loop) to transform the vector $x(t)$, which represents the current firing activity of its neurons, and the current external input $u(t)$ into a suitable feedback value $K(x(t), u(t))$. This could, for example, be carried out by training a suitable feedforward neural network within the larger circuit, which can approximate any continuous feedback function $K$ [34]. Furthermore, we will show in Applications to Generic Cortical Microcircuit Models that these feedback functions $K$ can in many biologically relevant cases be chosen to be linear, so that it would in principle suffice to train a single neuron to compute $K$.
It is known that the memory capacity of such a circuit is reduced to some finite number of bits if these feedback functions $K$ are not learnt perfectly, or if there are other sources of noise in the system. More generally, no analog circuit with noise can simulate arbitrary Turing machines [35]. But the subsequent Theorem 2 shows that fading-memory systems with noise and imperfect feedback can still achieve the maximal possible computational power within this a priori limitation: they can simulate any given finite state machine (FSM). Note that any Turing machine with tapes of finite length is a special case of an FSM. Furthermore, any existing digital computer is an FSM, hence the computational capability of FSMs is actually quite large.
To avoid the cumbersome mathematical difficulties that arise when one analyses differential equations with noise, we formulate and prove Theorem 2 on a more abstract level, resorting to the notion of fading-memory filters with noise (see Mathematical Definitions and Details to the Proof of Theorem 2). We assume here that the input-output behavior of those dynamical systems with noise, for which we want to determine the computational impact of (imprecise) state feedback, can be modeled by fading-memory filters with additive noise on their output. The assumption that the amplitude of this noise is bounded is a necessary assumption according to [36]. We refer to [4,5,37] for further discussions of the relationship between models for neural circuits and fading-memory filters. In particular it was shown in [37] that every time-invariant fading-memory filter can be approximated by models for neural circuits, provided that these models reflect the empirically found diversity of time constants of neurons and synapses.
Theorem 2. Feedback allows linear and nonlinear fading-memory systems, even in the presence of additive noise with bounded amplitude, to employ for real-time processing of time-varying inputs the computational capability and nonfading states of any given FSM (see Figure 1D-1E).
A precise formalization of this result is formulated as Theorem 5 in Precise Statement of Theorem 2, and a formal proof of Theorem 5 is given in Proof of the Precise Statement of Theorem 2. The external input $u(t)$ can in this case be injected directly into the fading-memory system, so that the feedback $K(x(t))$ depends only on the internal state $x(t)$ (see Figure 1E). One essential ingredient of the proof is a method for making sure that noise does not get amplified through feedback: the functions $K$ that provide feedback values $K(x(t))$ can be chosen in such a way that they cancel the impact of imprecision in the values $K(x(s))$ for immediately preceding time steps $s < t$.
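The idea behind Theorem 2 can be illustrated with the smallest nontrivial FSM, a two-state set/reset memory. The following is a toy sketch and not the construction used in the proof: the discrete-time leaky unit, the threshold feedback `K`, the pulse amplitudes, and the noise level are hypothetical choices. The feedback re-quantizes the noisy state at every step, so bounded noise is not amplified, and the sign of the state holds the FSM state until the next input pulse. The same unit without feedback forgets the pulse.

```python
import numpy as np

def K(x):
    # Memoryless threshold feedback; it re-quantizes the noisy state at every step.
    return 1.0 if x > 0 else -1.0

def run(inputs, feedback, a=0.9, noise_amp=0.05, seed=0):
    """Leaky (fading-memory) unit with bounded additive noise and optional feedback."""
    rng = np.random.default_rng(seed)
    x = -1.0                                   # start in FSM state "0"
    trace = np.empty(len(inputs))
    for t, u_t in enumerate(inputs):
        fb = K(x) if feedback else 0.0
        noise = rng.uniform(-noise_amp, noise_amp)
        x = a * x + (1 - a) * (fb + u_t) + noise
        trace[t] = x
    return trace

T = 500
u = np.zeros(T)
u[100:110] = 3.0     # "set" pulse: switch to state "1"
u[300:310] = -3.0    # "reset" pulse: switch back to state "0"

with_fb = run(u, feedback=True)
without_fb = run(u, feedback=False)

decoded = with_fb > 0                          # readout of the current FSM state
print("state before set, after set, after reset:",
      decoded[90], decoded[290], decoded[499])
```

With these parameters the noise can move the state by at most 0.5 around the values +1 and -1, so it cannot cross the threshold by itself; only the input pulses switch the state.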

Applications to Generic Cortical Microcircuit Models
We examine in this section computational aspects of feedback in recurrent circuits of spiking neurons that are based on data from cortical microcircuits. The dynamics of these circuits is substantially more complex than the dynamics of circuits described by Equation 6, since it is based on action potentials (spikes) rather than on firing rates. Hence one can expect at best that the temporal dynamics of firing rates in these circuits of spiking neurons is qualitatively similar to that of circuits described by Equation 6.
The preceding theoretical results imply that it is possible for dynamical systems to carry out computations with persistent memory without acquiring all the computational disadvantages of the chaotic regime, where the memory capacity of the system is dominated by noise. Feedback units can create selective "loopholes" into the fading-memory dynamics of a dissipative system that can only be activated by specific patterns in the input or circuit dynamics. In this way the potential content of persistent memory can be controlled by feedback units that have been trained to recognize such patterns. This feedback may arise from a few neurons within the circuit, or from neurons within a larger feedback loop. The task of approximating a suitable feedback function $K$ is less difficult than it may appear at first sight, since it suffices in many cases to approximate a linear feedback function. The reason is that sufficiently large generic cortical microcircuit models have an inherent kernel property [8], in the sense of machine learning [38]. This means that a large reservoir of diverse nonlinear responses to current and recent input patterns is automatically produced within the recurrent circuit. In particular, nonlinear combinations of variables $a, b, c, \dots$ (that may result from the circuit input or internal activity) are automatically computed at internal nodes of the circuit. Consequently, numerous low-degree polynomials in these variables $a, b, c, \dots$ can be approximated by linear combinations of outputs of neurons from the recurrent circuit. An example of this effect is demonstrated in Figure 2G, where it is shown that the product of the firing rates $r_3(t)$ and $r_4(t)$ of two independently varying afferent spike train inputs can be approximated quite well by a linear readout neuron.
The kernel property of biologically realistic cortical microcircuit models is apparently supported by the fact that these circuits have many nonlinearities in addition to those that appear in Equations 1, 2, and 6.
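The kernel property can be illustrated in a strongly simplified setting. The sketch below replaces the recurrent circuit by a bank of fixed random tanh units (a stand-in chosen for brevity, not the microcircuit model of the article; the unit count, weight distributions, and sample sizes are assumptions). A linear readout fitted by least squares on these fixed nonlinear responses approximates the product of two input variables, whereas a linear readout applied to the raw variables cannot.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(inputs, W, bias):
    # Fixed random nonlinear responses; a stand-in for the circuit's internal nodes.
    return np.tanh(inputs @ W.T + bias)

n_units = 200
W = rng.normal(size=(n_units, 2))
bias = rng.normal(size=n_units)

train = rng.uniform(-1, 1, size=(2000, 2))    # samples of the variables (a, b)
test = rng.uniform(-1, 1, size=(500, 2))

def target(ab):
    return ab[:, 0] * ab[:, 1]                # the product a * b

def fit_and_score(phi_train, phi_test):
    """Fit a linear readout (with offset) by least squares; return test MSE."""
    def add_one(p):
        return np.hstack([p, np.ones((len(p), 1))])
    w, *_ = np.linalg.lstsq(add_one(phi_train), target(train), rcond=None)
    pred = add_one(phi_test) @ w
    return np.mean((pred - target(test)) ** 2)

mse_kernel = fit_and_score(features(train, W, bias), features(test, W, bias))
mse_raw = fit_and_score(train, test)
print(f"test MSE, linear readout on nonlinear units: {mse_kernel:.2e}")
print(f"test MSE, linear readout on raw inputs:      {mse_raw:.2e}")
```

Only the readout weights are fitted; the nonlinear units stay fixed, which corresponds to the situation in which learning is restricted to a few neurons.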
One formal difference between neurons in the mean field model (Equation 6) and more realistic models for spiking neurons is that the input to a neuron of the latter type consists of postsynaptic potentials, rather than of firing rates. Hence the time-varying input $x(t)$ to a readout neuron is in this section not a vector of time-varying firing rates, but a smoothed version of the spike trains of all presynaptic neurons. This smoothing is achieved through application of a linear filter with an exponentially decaying kernel, whose time constant of 30 ms models time constants of receptors and postsynaptic membrane of a readout neuron in a qualitative fashion. Thus, if $w$ is a vector of synaptic weights, then $w \cdot x(t)$ models the impact of the firing activity of presynaptic neurons on the membrane potential of a readout neuron.
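The smoothing step can be sketched as follows. Only the exponentially decaying kernel and its 30 ms time constant come from the text; the 1 ms time step, the unnormalized kernel amplitude, the random spike trains, and the weight vector `w` are assumptions made for illustration.

```python
import numpy as np

def smooth_spike_trains(spikes, dt=1e-3, tau=30e-3):
    """Filter binary spike trains (time x neurons) with an exponentially decaying kernel."""
    decay = np.exp(-dt / tau)
    x = np.zeros(spikes.shape, dtype=float)
    state = np.zeros(spikes.shape[1])
    for t in range(spikes.shape[0]):
        state = decay * state + spikes[t]     # each spike adds 1, then decays with tau
        x[t] = state
    return x

rng = np.random.default_rng(2)
# Five presynaptic neurons, 1 s at 1 ms resolution, spike probability 0.02 per step.
spikes = (rng.random((1000, 5)) < 0.02).astype(float)
x = smooth_spike_trains(spikes)

w = rng.normal(size=5)          # hypothetical synaptic weights of a readout neuron
readout_input = x @ w           # w . x(t) for every time step
print(readout_input[:5])
```

The recursive update is equivalent to convolving each spike train with the kernel exp(-t / tau) sampled at the simulation time step.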
We refer in the following to those neurons where the weights of synaptic connections from neurons within the circuit are adapted for a specific computational task (rather than chosen randomly from distributions that are based on biological data, as for all other synapses in the circuit) as readout neurons. The output of a readout neuron was modeled in most of our simulations simply by a weighted sum $w \cdot x(t)$ of the previously described vector $x(t)$. Such output can be interpreted as the time-varying firing rate of a readout neuron. However, we show in Figure 2 that these readout neurons can (with a moderate loss in performance) also be modeled by spiking neurons, exactly like the other neurons in the simulated circuit. This demonstrates that not only those circuits that receive feedback from external readout neurons, but also generic recurrent circuits in which a few neurons have been trained for a specific task, acquire computational capabilities for real-time processing that are not restricted to computations with fading memory.
Theorem 2 suggests that the training of a few of its neurons enables generic neural circuits to employ persistent internal states for state-dependent processing of online input streams. Previous models for nonfading memory in neural circuits [13,39,40,41] proposed that it is implemented through low-dimensional attractors in the circuit dynamics. These attractors tend to freeze or to entrain the whole state of the circuit, and thereby shut it off from the online input stream (although independent local attractors could emerge in local subcircuits under some conditions [40]). In contrast, the generation of nonfading memory through a few trained neurons does not entail that the dynamics of the circuit be dominated by their persistent memory states. For example, when a readout neuron gives during some time interval a constant feedback K(x(t)) = c, this only constrains the circuit state x(t) to remain in the sub-manifold {x: K(x) = c} of its high-dimensional state space. This sub-manifold is in general high-dimensional. In particular, if K(x) is a linear function w · x, which often suffices as we will show, the dimensionality of the sub-manifold {x: K(x) = c} differs from the dimension of the full state space only by 1. Hence several such sub-manifolds have in general a high-dimensional intersection, and their intersection still leaves sufficiently many degrees of freedom for the circuit state x(t) to also absorb continuously new information from online input streams. These sub-manifolds are in general not attractors in a strict mathematical sense. Rather, their effective attraction property (or noise-robustness) results from the subsequently described training process ("teacher forcing").
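The dimension count behind this argument can be written out for the linear case. The statement below is an illustrative sketch; the symbols W, k, and c_j are introduced here for the illustration and do not appear in the original text.

```latex
% k linear readouts with constant feedback values c_1, ..., c_k
\[
  M = \{\, x \in \mathbb{R}^n : w_j \cdot x = c_j,\ j = 1,\dots,k \,\}
    = \{\, x \in \mathbb{R}^n : W x = c \,\},
  \qquad W = (w_1,\dots,w_k)^{\top} \in \mathbb{R}^{k \times n}.
\]
% If the weight vectors w_1, ..., w_k are linearly independent,
% M is a nonempty affine subspace with
\[
  \dim M = n - \operatorname{rank} W = n - k .
\]
```

For k = 1 this is the statement that the constraint removes a single dimension. For a small number k of trained readouts and a state space with hundreds of dimensions, n - k remains close to n.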
This training process produces weights w which have the property that the resulting feedback w · x(t) moves a trajectory of circuit states that goes through states x(t) in the neighborhood of the sub-manifold {x: K(x) = c} closer to this sub-manifold.
Figure 2 (caption, partial). (D) Target activation times of the high-dimensional attractor (blue shading), spike trains of two of the eight I&F neurons that were trained to create the high-dimensional attractor by sending their output spike trains back into the circuit, and average firing rate of all eight neurons (lower trace). (E,F) Performance of linear readouts that were trained to switch their real-time computation task depending on the current state of the high-dimensional attractor. (G) Performance of linear readout that was trained to output $r_3(t) \cdot r_4(t)$, showing that another linear readout from the same circuit can simultaneously carry out nonlinear computations that are invariant to the current state of the high-dimensional attractor. doi:10.1371/journal.pcbi.0020165.g002
We simulated generic cortical microcircuit models consisting of 600 integrate-and-fire (I&F) neurons (for Figures 2 and 3), and circuits consisting of 600 Hodgkin-Huxley (HH) neurons (for Figure 4), in either case with a rather high level of noise that reflects experimental data on the high conductance state in vivo [42]. These circuits were not constructed for any particular computational task. In particular, sparse synaptic connectivity between neurons was generated (with a biologically realistic bias towards short connections) by a probabilistic rule. Synaptic parameters were chosen randomly from distributions that depend on the type of pre- and postsynaptic neurons (in accordance with empirical data from [43,44]). More precisely, we used biologically realistic models for dynamic synapses whose individual mixture of paired-pulse depression and facilitation (depending on the type of pre- and postsynaptic neuron) was based on these data. It has previously been shown in [6,8] that the presence of such dynamic synapses extends the timespan of the inherent fading memory of the circuit. However, the computational tasks that are considered in this paper require, apart from a nonfading memory, only a fading memory with a rather short timespan (to make the estimation of the current firing rate of input spike trains feasible). Therefore, the biologically more realistic dynamic synapses could be replaced in this model by simple static synapses, without a change in the performance of the circuit for the subsequently considered tasks. All details of the simulated microcircuit models can be found in Details of the Cortical Microcircuit Models. Details of the subsequently discussed computer experiments are given in the sections Technical Details of Figure 5, Technical Details of Figure 2, and Technical Details of Figure 3.
Figure 3 (caption, partial). […] Figure 2, which has here two coexisting high-dimensional attractors. The autonomously generated periodic bursts with a frequency of about 8 Hz are not related to the task, and readouts were trained to become invariant to them. (C,D) Feedback from two linear readouts that were simultaneously trained to create and control two high-dimensional attractors. One of them was trained to decay in 400 ms (C), and the other in 600 ms (D) (scale in nA is the average current injected by feedback into a randomly chosen subset of neurons in the circuit). (E) Response of the same neurons as in (B), for the same circuit input, but with feedback from a different linear readout that was trained to create a high-dimensional attractor that increases its activity and reaches a plateau of 600 ms after the occurrence of the cue in the input stream.
Figure 4 (caption, partial). (D) Performance of a neural integrator, generated by feedback from a linear readout that was trained to output at any time t an approximation CA(t) of the integral $\int_0^t (r_1(s) - r_2(s))\,ds$ over the difference of both input rates. Feedback values were injected as input currents into a randomly chosen subset of neurons in the circuit. Scale in nA shows average strength of feedback currents, also in (H). (E) Performance of linear readout that was trained to output 0 as long as CA(t) stayed below 0.83 nA, and to output $r_2(t)$ once CA(t) had crossed this threshold, as long as CA(t) stayed above 0.66 nA (i.e., in this test run during the shaded time periods).
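A probabilistic connectivity rule with a bias towards short connections can be sketched as follows. The text does not state the rule itself, so the Gaussian distance dependence, the parameters c and lam, the grid layout, and all function names are assumptions made purely for illustration.

```python
import math
import random

def connection_probability(distance, c=0.3, lam=2.0):
    """Assumed rule: the probability decays with the squared distance."""
    return c * math.exp(-(distance / lam) ** 2)

def sample_connections(positions, c=0.3, lam=2.0, seed=0):
    """Draw a directed connection i -> j independently for every ordered pair."""
    rng = random.Random(seed)
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i == j:
                continue  # no self-connections in this sketch
            if rng.random() < connection_probability(math.dist(p, q), c, lam):
                edges.append((i, j))
    return edges

# Hypothetical 3 x 3 x 3 grid of neuron positions.
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
edges = sample_connections(grid)
```

In this sketch lam controls how quickly the connection probability falls off with distance, and c scales the overall connection density.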
We tested three different types of computational tasks for generic neural circuits with feedback. The same neural circuit can be used for each task; only the organization of input and output streams needs to be chosen individually (see Figure 5). The following procedure was applied to train readout neurons, i.e., to adjust the weights of synaptic connections from neurons in the circuit to readout neurons for specific computational tasks (while leaving all other parameters of the generic microcircuit model unchanged): 1) first those readout neurons were trained that provide feedback, then the other readout neurons; 2) during the training of readout neurons that provide feedback, their actual feedback was replaced by a noisy version of their target output ("teacher forcing"); 3) each readout neuron was trained by linear regression to output at any time t a particular target value f(t). Linear regression was applied to a set of datapoints of the form ⟨x(t), f(t)⟩ for many timepoints t, where x(t) is a smoothed version of the spike trains of presynaptic neurons (as defined before).
Note that teacher forcing, with noisy versions of target feedback values, trains these readouts to correct errors resulting from imprecision in their preceding feedback (rather than amplifying errors). This training procedure is responsible for the robustness of the dynamics of the resulting closed-loop circuits, in particular for the "attractor" properties of the effectively resulting high-dimensional attractors.
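The training procedure (teacher forcing with noisy targets, followed by linear regression) can be illustrated on a toy system. The sketch below is not the simulation used for the experiments: it replaces the spiking microcircuit by a small discrete-time tanh network, and the network size, weight scales, noise level, ridge term, task, and all function names are hypothetical choices.

```python
import math
import random

def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][k] * z[k] for k in range(i + 1, n))) / m[i][i]
    return z

def fit_readout(states, targets, ridge=1e-6):
    """Ridge-regularized least squares; the last weight acts as a bias."""
    rows = [s + [1.0] for s in states]
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * f for r, f in zip(rows, targets)) for i in range(d)]
    return solve_linear(gram, rhs)

def run_teacher_forced(inputs, targets, w_rec, w_in, w_fb, noise, rng):
    """Collect states while a noisy copy of the target replaces the feedback."""
    n = len(w_in)
    x, prev_target, states = [0.0] * n, 0.0, []
    for u, f in zip(inputs, targets):
        fb = prev_target + rng.gauss(0.0, noise)  # noisy teacher signal
        x = [math.tanh(sum(w_rec[i][j] * x[j] for j in range(n))
                       + w_in[i] * u + w_fb[i] * fb) for i in range(n)]
        states.append(x)
        prev_target = f
    return states

rng = random.Random(1)
n = 8  # toy circuit size (hypothetical; the simulated circuits had 600 neurons)
w_rec = [[rng.gauss(0.0, 0.3) for _ in range(n)] for _ in range(n)]
w_in = [rng.gauss(0.0, 1.0) for _ in range(n)]
w_fb = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical task: hold the sign of the most recent input pulse.
inputs, targets, memory = [], [], 0.0
for t in range(400):
    u = rng.choice([-1.0, 1.0]) if t % 40 == 0 else 0.0
    memory = u if u != 0.0 else memory
    inputs.append(u)
    targets.append(memory)

states = run_teacher_forced(inputs, targets, w_rec, w_in, w_fb, 0.05, rng)
w = fit_readout(states, targets)
predictions = [sum(wi * si for wi, si in zip(w, s + [1.0])) for s in states]
mse = sum((p - f) ** 2 for p, f in zip(predictions, targets)) / len(targets)
```

Because the feedback seen during training is perturbed by noise, the regression is fitted on states that lie slightly off the ideal trajectory, so the fitted readout is pushed towards outputs that pull the state back to the target. After training, the closed loop would be formed by feeding the readout output itself back in place of the teacher signal.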
In our first computer experiment, readout neurons were trained to turn a high-dimensional attractor on or off (Figure 2D), in response to bursts in two of the four independent input spike trains. More precisely, eight neurons were trained to represent in their firing activity at any time the information in which of the input streams, 1 or 2, a burst had most recently occurred. If it had occurred most recently in stream 1, they were trained to fire at 40 Hz, and if a burst had occurred most recently in input stream 2, they were trained not to fire. Hence these neurons were required to represent the nonfading state of a simple FSM, demonstrating in an example the computational capabilities predicted by Theorem 2. Figure 2G demonstrates that the circuit retains its kernel property in spite of the feedback injected into the circuit by these readouts. But beyond the emulation of a simple FSM, the resulting generic cortical microcircuit is able to combine information stored in the current state of the FSM with new information from the online circuit input. For example, Figure 2E shows that other readouts from the same circuit can be trained to amplify their response to specific inputs if the high-dimensional attractor is in the "on" state. Readouts can also be trained to change the function that they compute if the high-dimensional attractor is in the on state (Figure 2F). This provides an example for an online reconfigurable circuit. The readout neurons that provide feedback had been modeled in this computer simulation like the other neurons in the circuit: by I&F neurons with in vivo-like background noise. Hence they can be viewed equivalently as neurons within an otherwise generic circuit.
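The target signal for this task is the state of a two-state machine driven by burst events. A minimal sketch of how such a target f(t) could be generated is given below; the function name, the event format, and the choice of an "off" state before the first burst are assumptions, while the 40 Hz on-rate is taken from the text.

```python
def latch_target(burst_events, t_grid_ms, on_rate_hz=40.0):
    """Target rate f(t): on_rate_hz if the latest burst was in stream 1, else 0."""
    events = sorted(burst_events)  # (time_ms, stream) pairs
    target, state, k = [], 0.0, 0  # assumed to start in the off state
    for t in t_grid_ms:            # t_grid_ms must be increasing
        # Consume every burst that has occurred up to time t.
        while k < len(events) and events[k][0] <= t:
            state = on_rate_hz if events[k][1] == 1 else 0.0
            k += 1
        target.append(state)
    return target

# Hypothetical burst times (ms) in input streams 1 and 2.
bursts = [(100.0, 1), (250.0, 2), (400.0, 1)]
grid = [0.0, 50.0, 100.0, 200.0, 300.0, 450.0]
f = latch_target(bursts, grid)
```

The target depends on the entire input history through the most recent burst only, which is why it cannot be produced by a filter with fading memory when bursts are arbitrarily far apart.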
Another difficult problem in computational neuroscience is to explain how neural circuits can implement a parametric memory, i.e., how they can hold and update an analog value that may represent, for example, an intended eye position that a neural integrator computes from a sequence of eye-movement commands [45], an estimate of elapsed time [9], or accumulated sensory evidence [14]. Various designs have been proposed for parametric memory in recurrent circuits, where continuous attractors (also referred to as line attractors) hold and update an analog value. But these approaches are inherently brittle [41], and have problems in dealing with high noise or online circuit inputs. On the other hand, Figure 3 shows that dedicated circuit constructions are not necessary, since feedback from readout neurons in generic cortical microcircuit models can also create high-dimensional attractors that hold and update an analog value for behaviorally relevant timespans. In fact, due to the high-dimensional character of the resulting attractors, two such analog values can be stored and updated independently (Figure 3C and 3D), even within a fairly small circuit. In this example, the readouts that provide feedback were simply trained to increase or reduce their feedback at each timepoint. Note that the resulting circuit activity is qualitatively consistent with recordings from neurons in cortex and striatum during reward expectation [10,11,12]. A similar ramp-like rise and fall of activity as shown in Figure 3C, 3D, and 3F has also been recorded in neurons of posterior parietal cortex of the macaque in experiments where the monkey had been trained to classify the duration of elapsed time [9].
The high dimensionality of the continuous attractors in this model makes it feasible to constrain the circuit state to stay simultaneously in more than one continuous attractor, thereby making it in principle possible to encode complex movement plans that require specific temporal relationships between individual motor commands.
Our model for parametric memory in cortical circuits is consistent with high noise: Figure 4G shows the typical trial-to-trial variability of a neuron in our simulated circuit of HH neurons with in vivo-like background noise. It qualitatively matches the "wide diversity of neural firing drift patterns in individual fish at all states of tuning" that was observed in the horizontal oculomotor neural integrator in goldfish [15], and the large trial-to-trial variability of neurons in prefrontal cortex of monkeys reported in [10]. In addition, this model is consistent with the surprising plasticity that has been observed even in quite specialized neural integrators [15], since continuous attractors can be created or modified in this model by changing just a few synaptic weights of neurons that are immediately involved. It does not require the presence of long-lasting postsynaptic potentials, NMDA receptors, or other specialized details of biological neurons or synapses, although their inclusion in the model is likely to provide additional temporal stability [13]. Rather it points to complementary organizational mechanisms on the circuit level, which are likely to enhance the controllability and robustness of continuous attractors in neural circuits. The robustness of this learning-based model can be traced back to the fact that readout neurons can be trained to correct undesired circuit responses resulting from errors in their previous feedback. Furthermore, such error correction is not restricted to linear computational operations, since the previously demonstrated kernel property of these generic circuits allows even linear neurons to implement complex nonlinear control strategies through their feedback.
As an example, we demonstrate in Figure 4 that even under biologically realistic high-noise conditions a linear readout can be trained to update a continuous attractor (Figure 4D), to filter out input activity during certain time intervals independent of the current state of the continuous attractor (Figure 4E), or to combine the time-varying analog variable encoded by the current state CA(t) of the continuous attractor with a time-varying variable $r_1(t)$ that is delivered by an online spike input. Hence, intention-based information processing [16] and other tasks that involve a merging of external inputs and internal state information can be implemented in this way. Figure 4C shows that a high-dimensional attractor need not entrain the firing activity of neurons in a drastic way, since it just restricts the high-dimensional circuit dynamics x(t) to a slightly lower dimensional manifold of circuit states x(t) that satisfy w · x(t) = f(t) for the current target output f(t) of the corresponding linear readout. On the other hand, Figure 4E shows that the activity level CA(t) of the high-dimensional attractor can nevertheless be detected by other linear readouts, and can simultaneously be combined in a nonlinear manner with a time-varying variable $r_2(t)$ from one afferent circuit input stream, while remaining invariant to the other afferent input stream.
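The two target functions involved here, the running integral of the rate difference and a readout that is gated by the attractor level with hysteresis, can be sketched as follows. The thresholds 0.83 and 0.66 follow the caption of Figure 4E; the function names, the discretization, and the example values are assumptions, and the scaling from the integral to a feedback current in nA is left out.

```python
def integrate_difference(r1, r2, dt):
    """Cumulative Riemann sum approximating the integral of r1(s) - r2(s)."""
    total, trace = 0.0, []
    for a, b in zip(r1, r2):
        total += (a - b) * dt
        trace.append(total)
    return trace

def gated_output(ca, r2, on_threshold=0.83, off_threshold=0.66):
    """Output r2(t) while the gate is open, 0 otherwise (hysteresis on CA(t))."""
    gate_open, out = False, []
    for level, rate in zip(ca, r2):
        if not gate_open and level > on_threshold:
            gate_open = True    # CA(t) crossed the upper threshold
        elif gate_open and level < off_threshold:
            gate_open = False   # CA(t) fell below the lower threshold
        out.append(rate if gate_open else 0.0)
    return out

# Hypothetical values of CA(t) and of the second input rate.
ca_trace = [0.2, 0.7, 0.9, 0.7, 0.6, 0.7]
target = gated_output(ca_trace, [10.0] * 6)
integral = integrate_difference([3.0, 3.0, 1.0], [1.0, 1.0, 2.0], 0.5)
```

The gated target is a nonlinear function of CA(t) and the input rate, so a linear readout can only approximate it because the circuit state x(t) already contains nonlinear combinations of both.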
Finally, the same generic circuit also provides a model for the integration of evidence for decision making that is compatible with in vivo-like high noise conditions. Figure 4H depicts the timecourse of the same neural integrator as in Figure 4D, but here for the case where the rates $r_1, r_2$ of the two input streams assume in eight trials eight different constant values after the first 100 ms (while assuming a common value of 65 Hz during the first 100 ms). The resulting timecourse of the continuous attractor is qualitatively similar to the meandering path towards a decision threshold that has been recorded from neurons in area LIP where firing rates represent temporally integrated evidence concerning the dominating direction of random dot movements (see Figure 4A in [14]).

Discussion
We have presented a theoretically founded model for real-time computations on complex input streams with persistent internal states in generic cortical microcircuits. This model does not require a handcrafted circuit structure or biologically unrealistic assumptions such as symmetric weight distributions, static synapses that do not exhibit paired-pulse depression or facilitation, or neuron models with low levels of noise that are not consistent with data on in vivo conditions. Our model only requires the assumption that adaptive procedures (synaptic plasticity) in generic neural circuits can approximate linear regression. Furthermore, in contrast to classical learning paradigms for attractor neural networks, it is here not required that a large fraction of synaptic parameters in the circuit are changed when a new computational task is introduced or a new item is stored in working memory. Rather, it suffices if those neurons that provide the circuit output and a few neurons that provide feedback are subject to synaptic plasticity. Such minimal circuit modifications have the advantage that the attractors of the circuit dynamics created in this way are high-dimensional. We have shown that the circuit state can simultaneously be in several such high-dimensional attractors, and still retain sufficiently many degrees of freedom to absorb and process new information from online input streams. In particular, we have shown in Figures 2 and 4 how bottom-up processing can be reconfigured dependent on discrete internal states (implemented through high-dimensional attractors) by turning certain input channels on or off, and by changing the computational operations that are applied to input variables. Furthermore we have shown in Figure 4 that analog variables, which are extracted from an online input stream, can be combined in real-time computations with analog variables that are stored in high-dimensional continuous attractors.
This provides in particular a model for the implementation of intention-based information processing [16] in cortical microcircuits.
It remains open how learning signals can induce neurons in a biological organism to compute specific linear feedback functions. But at least we have reduced this problem to the feasibility of perceptron-like learning (or more abstractly: to linear regression) for single neurons. Subsequent research will have to determine whether these learning requirements (which can be partially reduced to spike-timing dependent plasticity [46]) can be justified on the basis of results on unsupervised learning and reinforcement learning [47] in biological organisms.
Whereas it was previously already known that one can construct specific circuits that have universal computational capabilities for real-time computing on analog input streams, Theorems 1 and 2 of this article imply that a large variety of dynamical systems (in particular generic cortical microcircuits) can acquire through feedback such universal capabilities for computations that map time-varying inputs to time-varying outputs. It should be noted that these universal computational capabilities differ from the well-known but much weaker universal approximation property of feedforward neural networks (see [34]), since not only the static output of an arbitrary continuous static function is approximated, but also the dynamic response of arbitrary differential equations of higher order to time-varying inputs.
The theoretical results of this article also provide an explanation for the astounding computational capability and flexibility of echo state networks [17]. In addition they can be used to analyze computational aspects of feedback in other biological dynamical systems besides neural circuits. Several such systems, for example, genetic regulatory networks, are known to implement complex maps from time-varying input streams (e.g., external signals) onto time-varying outputs (e.g., transcription rates). But little is known about the way in which these maps are implemented. Whereas feedback in biological dynamical systems is usually only analyzed and modeled from the perspective of control, we propose that an analysis of its computational aspects is likely to yield a better understanding of the computational capabilities of such systems.

Materials and Methods
Mathematical definitions, details to the proof of Theorem 1, and examples. Definition of feedback equivalence. We recall that a smooth mapping is one for which derivatives of all orders exist (infinite differentiability), and that a diffeomorphism $T: \mathbb{R}^n \to \mathbb{R}^n$ is a smooth mapping for which there exists a well-defined smooth inverse $T^{-1}: \mathbb{R}^n \to \mathbb{R}^n$.
Definition (see [33], Definition 5.3.1). Two n-dimensional systems $x' = f(x) + g(x)v$ and $x' = \tilde f(x) + \tilde g(x)v$ (with smooth vector fields $f = \langle f_1,\dots,f_n\rangle$, $g = \langle g_1,\dots,g_n\rangle$, $\tilde f = \langle \tilde f_1,\dots,\tilde f_n\rangle$, $\tilde g = \langle \tilde g_1,\dots,\tilde g_n\rangle$) are called feedback equivalent (over the state space $\mathbb{R}^n$) if there exist 1) a diffeomorphism $T: \mathbb{R}^n \to \mathbb{R}^n$, and 2) smooth maps $\alpha,\beta: \mathbb{R}^n \to \mathbb{R}$ with $\beta(x) \neq 0$ for all $x \in \mathbb{R}^n$, such that, for each $x \in \mathbb{R}^n$,
$$T_*(x)\,\bigl(f(x) + g(x)\alpha(x)\bigr) = \tilde f(T(x)) \quad\text{and}\quad \beta(x)\,T_*(x)\,g(x) = \tilde g(T(x))$$
(where $T_*$ denotes the Jacobian of $T$).
Definition of the class $S_n$. Recall that a linear system $x' = Ax + bu$ is controllable if it is possible to drive any state $x_0$ to any other state $x_1$ using an input (see [33], Definition 3.1.6). Controllability is a generic property of systems, and amounts to the requirement that the matrix $(b, Ab, \dots, A^{n-1}b)$ has full rank, where n is the dimension of the system (see [33], Theorem 2). Note that the linear system (Equation 7) satisfies this requirement, and hence is controllable.
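The rank condition for controllability is easy to check mechanically. The sketch below uses exact rational arithmetic; the function names and the two example systems (a chain of three integrators and an uncontrollable diagonal system) are illustrative assumptions and are not claimed to coincide with Equation 7.

```python
from fractions import Fraction

def mat_vec(a, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def rank(rows):
    """Rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def is_controllable(a, b):
    """Rank test: the vectors b, Ab, ..., A^(n-1) b must span n dimensions."""
    n = len(b)
    vectors, v = [], list(b)
    for _ in range(n):
        vectors.append(v)
        v = mat_vec(a, v)
    # The vectors are stored as rows; row rank equals column rank.
    return rank(vectors) == n

# Hypothetical examples.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # chain of three integrators
print(is_controllable(chain, [0, 0, 1]))
print(is_controllable([[1, 0], [0, 1]], [1, 0]))
```

For the chain of integrators the vectors b, Ab, A²b are the three unit vectors, so the test succeeds; for the identity matrix every A^k b equals b, so the rank is 1 and the test fails.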
We take $S_n$ to be the class of n-dimensional systems (Equation 3) that are (globally) feedback linearizable, that is to say, the systems (Equation 3) for which there exists some linear controllable system that is feedback equivalent to Equation 3 (see [33], Definition 5.3.2).
An n-dimensional system is feedback linearizable if and only if it is feedback-equivalent to the system (Equation 7) (see [33], Lemma 5.3.5). Therefore, we have the following:
Lemma: A system (Equation 3), with smooth vector fields $f = \langle f_1,\dots,f_n\rangle$ and $g = \langle g_1,\dots,g_n\rangle$, belongs to $S_n$ if and only if there exist a diffeomorphism $T: \mathbb{R}^n \to \mathbb{R}^n$ and two smooth maps $\alpha,\beta: \mathbb{R}^n \to \mathbb{R}$, with $\beta(x) \neq 0$ for all $x \in \mathbb{R}^n$, such that, for each $x \in \mathbb{R}^n$:
$$T_*(x)\,f(x) = A_n T(x) - \frac{\alpha(x)}{\beta(x)}\,b_n \qquad (8)$$
$$T_*(x)\,g(x) = \frac{1}{\beta(x)}\,b_n$$
where $T_*$ denotes the Jacobian of $T$ and […]
An interpretation of the property given in the above Lemma, that will be used in the proof of Theorem 1 in the section Details to the Proof of Theorem 1, is as follows (see [33], Chapter 5, for more discussion): For each input $\mu(t)$ and each solution $z(t)$ of […]
$$Z(t) = (z(t), z'(t), z''(t), \dots, z^{(n-1)}(t))$$
[…]
Details to the proof of Theorem 1. In this section, we prove the simulation result that is claimed in Theorem 1.
Take any system (Equation 3) in $S_n$ and any system (Equation 4) to be simulated. Using $T, \alpha, \beta$ as in the Lemma in section Definition of the Class $S_n$ that characterizes the class $S_n$, we define:
$$K(x, w) := \alpha(x) + \beta(x)\,[G(T(x)) + w]$$
and we let $h(x)$ be the first coordinate of $T(x)$. In the special case where Equation 3 describes the dynamics of a circuit according to Equation 6, $\alpha$ is a linear function, $\beta$ is a constant, and $T$ is an invertible linear map from $\mathbb{R}^n$ to $\mathbb{R}^n$.
Next, pick an external input $u(t)$, $t \geq 0$, and a solution $z(t)$ of the forced system (Equation 4).
This almost proves the simulation result, except for the fact that there is no reason for the initial value $x(0) = T^{-1}(Z(0))$ to be zero, since $z(t)$ is an arbitrary trajectory. This is where the input $u_0$ plays a role. Let $\xi := T(0)$. We will show that, given any solution $z(t)$ and any input $u(t)$, there is some input $u_0(t)$, with $u_0(t) \equiv 0$ for all $t \geq 1$, so that the solution of
$$y^{(n)}(t) = G(y(t), y'(t), y''(t), \dots, y^{(n-1)}(t)) + u(t) + u_0(t) \qquad (10)$$
with $y(0) = \xi$ has the property that $y(t) = z(t)$ for all $t \geq 1$. (Where $z(t)$ is the desired trajectory to be simulated, with $u_0 \equiv 0$.) Then letting […]
Consider now an arbitrary solution $z(t)$ of Equation 4 and let $\zeta$ be the vector with entries
$$\zeta_{i+1} := z^{(i)}(1), \quad i = 0, \dots, n-1 .$$
We next pick a scalar differentiable function $\varphi$ such that $\varphi^{(i)}(0) = \xi_{i+1}$ and $\varphi^{(i)}(1) = \zeta_{i+1}$ for $i = 0,\dots,n-1$. (It is easy to see that such functions exist. For example, one may simply consider the linear system $p' = A_n p + b_n q$ with states $p$ and input $q$. This is a completely controllable linear system (cf. [33] Chapter 3), so we just pick an input $q(t)$ that steers $\xi$ into $\zeta$, and finally let $\varphi(t)$ be the first coordinate of $p(t)$.) Now we let
$$u_0(t) := \varphi^{(n)}(t) - G(\varphi(t), \dots, \varphi^{(n-1)}(t)) - u(t)$$
for $t < 1$, and $u_0(t) \equiv 0$ for $t \geq 1$, and claim that the solution of Equation 10 with $y(0) = \xi$ has the property that $y(t) = z(t)$ for all $t \geq 1$. Since $u(t) + u_0(t) = u(t)$ for all $t \geq 1$, we only need to show that $y^{(i)}(1) = z^{(i)}(1)$ for every $i = 0,\dots,n-1$. To see this, in turn, and using uniqueness of solutions of differential equations, it is enough to show that $y(t) := \varphi(t)$ satisfies
$$\varphi^{(n)}(t) = G(\varphi(t), \varphi'(t), \varphi''(t), \dots, \varphi^{(n-1)}(t)) + u(t) + u_0(t)$$
on the interval [0,1] and has derivatives at $t = 0$ as specified by the vector $\xi$. But this is indeed true by construction.
Finally, we remark that if $|u(t)| \leq c$ and $|z^{(i)}(t)| \leq c$ for all $t \geq 0$, then $x(t) = T^{-1}(Z(t))$ is bounded in norm by a constant that only depends on $c$ (since $T^{-1}$ is continuous, by definition of diffeomorphism), and the numbers $b_i := z^{(i)}(1)$ are also bounded by a constant that depends only on $c$, so $K(x(t), u(t) + u_0(t))$ also is.
Corollary 3. Analogous results can be shown for the simulation of systems consisting of any number k of higher order differential equations as in Equation 4. In this case fixed systems of first-order differential equations of a form as in Equation 3, but with k memoryless feedback functions $K_1,\dots,K_k$ that depend on the simulated higher-order system, can be shown to be able to simulate the dynamic response of arbitrary higher-order systems of differential equations.
Lie brackets. The study of controllability and other properties of nonlinear systems is based upon the use of Lie bracket formalism and theory ([33], Chapter 4). We need this formalism to show in the section Application to Neural Network Equations that the class $S_n$ includes some neural networks of the form Equation 6. For any two vector fields $f$ and $g$,
$$[f,g] := g_* f - f_* g$$
(with $f_*$ and $g_*$ the Jacobians of $f$ and $g$) denotes the Lie bracket of $f$ and $g$. Recall that the Lie bracket of two vector fields is a vector field that characterizes the effective direction of movement obtained by performing this "commutator" motion: follow the vector field $f$ for $t$ time steps, then $g$ for $t$ time steps, then $f$ backward in time for $t$ time steps, and finally $g$ backward in time for $t$ time steps, for small $t > 0$. To be more precise, denote formally by $e^{tf}$ the flow associated to $f$, and similarly for $g$. Consider the following curve, for any initial state $x_0$:
$$\gamma(t) := e^{-\sqrt{t}\,g}\, e^{-\sqrt{t}\,f}\, e^{\sqrt{t}\,g}\, e^{\sqrt{t}\,f}\, x_0 .$$
Applying repeatedly the second-order Taylor expansion of the flow $e^{tf}$ (and similarly for $g$), we obtain that
$$e^{-\sqrt{t}\,g}\, e^{-\sqrt{t}\,f}\, e^{\sqrt{t}\,g}\, e^{\sqrt{t}\,f}\, x_0 = e^{t[f,g]} x_0 + o(t)$$
as $t \to 0$, from which it follows that $\gamma'(0) = [f,g](x_0)$, which means that the direction of $[f,g]$ is followed when performing the commutator motions. Using the possible noncommutativity of the vector fields, one generates in this manner genuinely new directions of movement in addition to those provided by the linear combinations of $f$ and $g$. Well-known examples are provided by the Lie bracket of two rotations around orthogonal axes, which is a rotation around the remaining axis (see for example [33], page 150), or the motions involved in parking an automobile (see for example [33], Example 4.3.13).
Iterations of Lie brackets play a key role. Let us introduce, for any given vector field $f$, the operator $\mathrm{ad}_f$, which maps vector fields into vector fields by means of the formula $\mathrm{ad}_f(g) := [f,g]$. Iterations of the operator $\mathrm{ad}_f$ are defined in the obvious way: $\mathrm{ad}_f^0(g) = g$ and $\mathrm{ad}_f^{k+1}(g) = \mathrm{ad}_f(\mathrm{ad}_f^k(g))$. It is also useful to consider an operator $L_f$ that acts on scalar functions. We use the notation $L_f\varphi$, for any (smooth) vector field $f$ and (smooth) function $\varphi$, to denote the Lie derivative of $\varphi$ along $f$, that is, $\nabla\varphi \cdot f$. The function $L_f\varphi$, which is again smooth, is nothing more than the directional derivative of the function $\varphi$ in the direction of the vector field $f$, in the sense of elementary calculus. One can also consider iterated applications of the operator $L_f$.
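For a linear vector field and a constant vector field these operators reduce to matrix products. The short computation below is an illustrative sketch that assumes the sign convention $[f,g] = g_* f - f_* g$ (with $f_*$ the Jacobian of $f$); the matrix $A$ and the vectors $b$, $c$ are generic placeholders.

```latex
% Linear drift f(x) = A x and constant input field g(x) = b
\[
  f_*(x) = A, \qquad g_*(x) = 0
  \quad\Longrightarrow\quad
  \mathrm{ad}_f g = [f,g] = g_* f - f_* g = -A b .
\]
% Iterating, every bracket is again a constant vector field:
\[
  \mathrm{ad}_f^{k} g = (-1)^{k} A^{k} b , \qquad k = 0, 1, 2, \dots
\]
% Lie derivatives of a linear function \varphi(x) = c^{\top} x along f:
\[
  L_f \varphi(x) = \nabla\varphi(x) \cdot f(x) = c^{\top} A x ,
  \qquad
  L_f^{k} \varphi(x) = c^{\top} A^{k} x .
\]
```

Up to signs, the iterated brackets $g, \mathrm{ad}_f g, \dots, \mathrm{ad}_f^{n-1} g$ are therefore the columns $b, Ab, \dots, A^{n-1}b$ of the controllability matrix of the linear system $x' = Ax + bv$.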
A characterization of $S_n$ via Lie brackets. With these notations, we are ready to present a Lie geometric characterization of the class $S_n$. The next theorem follows by combining the proofs of Proposition 5.3.9 and of Theorem 15 in [33]. […]
Observe that the conditions amount to the existence of a well-behaved solution $\varphi$ of a set of first-order linear partial differential equations. Existence of a solution of this form is not trivial to verify. To study solvability, in control theory one considers the following conditions:
(LI) The set of vector fields $\{g(x), \mathrm{ad}_f g(x), \dots, \mathrm{ad}_f^{n-1} g(x)\}$ is linearly independent.
(INV) The distribution generated by $\{g, \mathrm{ad}_f g, \dots, \mathrm{ad}_f^{n-2} g\}$ is involutive.
This last condition means that the Lie bracket of any two of the vector fields $\mathrm{ad}_f^i g$, for $i \in \{0,\dots,n-2\}$, should be, for each $x$, a linear combination of these same $n-1$ vectors.
One then has the following result (see Theorem 15 in [33]), which is a consequence of Frobenius' Theorem in partial differential equation theory: A system satisfies both conditions (LI) and (INV) at a state x if and only if it is feedback linearizable in some open set containing x. This provides a useful and complete characterization of local feedback linearizability, and in particular a necessary condition for global feedback linearizability. In examples, these conditions often lead one to a globally defined solution (see, e.g., Example 5.3.10 in [33]).
Application to neural network equations. Let us now show with the help of Theorem 4 that the class $S_n$ includes some fading-memory systems of the form Equation 6. Indeed, consider any system as follows:
$$x'(t) = -\mathrm{diag}(\lambda_1,\dots,\lambda_n)\,x(t) + b\,v(t) \qquad (11)$$
where the $\lambda_i \neq \lambda_j$ for each $i \neq j$ are all positive, $\mathrm{diag}(\lambda_1,\dots,\lambda_n)$ is the resulting diagonal matrix, and the column vector $b = \mathrm{col}(b_1,\dots,b_n)$ has nonzero entries: $b_i \neq 0$ for all $i$. (Such a system, which has the form Equation 6 with $\varphi(Ax) \equiv 0$, consists of n first-order linear differential equations in parallel, and is obviously fading-memory.) It is easy to see that, up to signs $(-1)^i$, we have
$$\mathrm{ad}_f^i g(x) = \mathrm{col}(\lambda_1^i b_1, \dots, \lambda_n^i b_n)$$
for $i > 0$, and the linear independence of $g(x), \mathrm{ad}_f g(x), \dots, \mathrm{ad}_f^{n-1} g(x)$ follows from the fact that these constant vectors form a Vandermonde matrix. Then we can pick $\varphi(x)$ as a linear map $x \to ax$, where $a$ is any vector in $\mathbb{R}^n$ that is orthogonal to all of the vectors
$$\mathrm{col}(\lambda_1^i b_1, \dots, \lambda_n^i b_n), \quad i = 0, 1, \dots, n-2 .$$
The map $x \mapsto (\varphi(x), L_f\varphi(x), \dots, L_f^{n-1}\varphi(x))$ is represented then also by a Vandermonde matrix, so it is a bijection. Hence, conditions 1)-3) of Theorem 4 are satisfied, which implies that the system (Equation 11) belongs to the class $S_n$.
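The Vandermonde argument can be made explicit. In the sketch below $\lambda_1,\dots,\lambda_n$ denote the distinct positive decay constants and $b_1,\dots,b_n$ the nonzero input weights of the system above; the matrix name $M$ is introduced here for illustration only.

```latex
% Rows indexed by the state component j, columns by the bracket order i = 0, ..., n-1
\[
  M = \bigl(\lambda_j^{\,i}\, b_j\bigr)_{j = 1,\dots,n;\ i = 0,\dots,n-1}
    = \operatorname{diag}(b_1,\dots,b_n)
      \begin{pmatrix}
        1 & \lambda_1 & \cdots & \lambda_1^{\,n-1} \\
        \vdots & \vdots & & \vdots \\
        1 & \lambda_n & \cdots & \lambda_n^{\,n-1}
      \end{pmatrix},
\]
\[
  \det M = \prod_{j=1}^{n} b_j \;\prod_{1 \le j < l \le n} (\lambda_l - \lambda_j) .
\]
% Example n = 2:
\[
  \det \begin{pmatrix} b_1 & \lambda_1 b_1 \\ b_2 & \lambda_2 b_2 \end{pmatrix}
  = b_1 b_2 (\lambda_2 - \lambda_1) .
\]
```

The determinant is nonzero exactly when all $b_j$ are nonzero and all $\lambda_j$ are pairwise distinct, which are the two assumptions made on the system. Sign changes of individual columns do not affect linear independence.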
As a further example, we now consider the following system, which also has the general form of the neural network Equation 6:
$$x_1'(t) = -\lambda_1 x_1(t) + \varphi(x_2(t) + a\,x_3(t))$$
$$x_2'(t) = -\lambda_2 x_2(t) + \varphi(x_2(t) + x_3(t))$$
$$x_3'(t) = -\lambda_3 x_3(t) + v(t)$$
where $\varphi$ is a scalar function, smooth but otherwise arbitrary for now, and $a$ as well as the $\lambda_i$ are constants, also arbitrary for now. We will analyze this example using the Lie formalism described in the section Lie Brackets. The system has the form $x' = f(x) + g(x)v$, with $n = 3$, and $f$ and $g$ are the following vector fields:
$$f(x) = \begin{pmatrix} -\lambda_1 x_1 + \varphi(x_2 + a x_3) \\ -\lambda_2 x_2 + \varphi(x_2 + x_3) \\ -\lambda_3 x_3 \end{pmatrix}, \qquad g(x) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} .$$
Note that the Jacobian $g_*$ of $g$ is identically zero, which simplifies the computation of Lie brackets. We calculate $\mathrm{ad}_f g$ and $[g, \mathrm{ad}_f g] = (\mathrm{ad}_f g)_*(x)\,g(x)$ as follows: […]
The involutivity condition says that the set of vector fields $\{g, \mathrm{ad}_f g\}$ should be involutive, which means that $[g, \mathrm{ad}_f g](x)$ should be in the span of $g(x)$ and $\mathrm{ad}_f g(x)$ for all $x$. Let us evaluate $g$, $\mathrm{ad}_f g$, $[g, \mathrm{ad}_f g]$ at the particular points for which $x_3 = 0$, so that we obtain, respectively, three vectors $v_1, v_2, v_3$ that depend on $x_2$ only: […]
If $[g, \mathrm{ad}_f g](x)$ is in the span of $g(x)$ and $\mathrm{ad}_f g(x)$ for all vectors $x$, then, in particular, $v_3(x_2)$ must belong to the span of $v_1(x_2)$ and $v_2(x_2)$ for all $x_2$. This means that there is, for each $x_2$, a scalar $r(x_2)$ such that […] for all $x_2$. So, if also $a \neq 1$, we conclude that $\varphi''(x_2)$ must vanish for all $x_2$. Thus, the system in our example (assuming $a \notin \{0,1\}$) is feedback linearizable only if $\varphi$ is a linear function.
On the other hand, consider now the cases $a = 0$ or $a = 1$. Then, the involutivity condition becomes the requirement that there should exist a scalar function $r$ such that

\[ \varphi''(x_2 + x_3) = r(x)\,\varphi'(x_2 + x_3), \]

which can be achieved provided only that the function $\varphi'$ is everywhere nonzero (which is true if $\varphi$ is, for example, a standard sigmoidal function), simply by taking $r(x) = \varphi''(x_2 + x_3)/\varphi'(x_2 + x_3)$. The linear independence condition amounts to showing that the set of vectors $\{g, \mathrm{ad}_f g, \mathrm{ad}_f^2 g\}$ is linearly independent. Computing the determinant of the matrix that has these vectors as columns, when $a = 0$ we obtain $-\varphi'(x_2)[\varphi'(x_2 + x_3)]^2$, which is everywhere nonzero, provided that we again assume that $\varphi$ has an everywhere nonzero derivative. Thus, the Lie-theoretic conditions for feedback linearization are satisfied, for any choice of $\lambda_i$, when $a = 0$. In the case $a = 1$, the same computation gives a determinant of $(\lambda_1 - \lambda_2)[\varphi'(x_2 + x_3)]^2$, so the Lie-theoretic conditions for feedback linearization are satisfied, for any choice of $\lambda_i$ such that $\lambda_1 \neq \lambda_2$.
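The two determinant values quoted above can be cross-checked numerically. This is a hedged sketch under assumptions that are not in the original text: tanh as the nonlinearity, hypothetical decay rates, the bracket convention [X, Y] = Y_* X - X_* Y, and finite-difference Jacobians; all names are introduced here. Because the overall sign of the determinant depends on the bracket convention and column order, only absolute values are compared.

```python
import math

def make_f(a, lam, phi=math.tanh):
    """Drift vector field of the three-variable example."""
    def f(x):
        x1, x2, x3 = x
        return [-lam[0] * x1 + phi(x2 + a * x3),
                -lam[1] * x2 + phi(x2 + x3),
                -lam[2] * x3]
    return f

def g(x):
    return [0.0, 0.0, 1.0]

def jacobian(F, x, h=1e-4):
    """Central finite-difference Jacobian; J[i][j] = dF_i / dx_j."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def bracket(X, Y):
    """Lie bracket [X, Y](x) = Y_*(x) X(x) - X_*(x) Y(x) (assumed convention)."""
    def Z(x):
        JX, JY, Xx, Yx = jacobian(X, x), jacobian(Y, x), X(x), Y(x)
        n = len(x)
        return [sum(JY[i][j] * Xx[j] - JX[i][j] * Yx[j] for j in range(n))
                for i in range(n)]
    return Z

def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix with columns c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c2[2] * c3[1])
            - c2[0] * (c1[1] * c3[2] - c1[2] * c3[1])
            + c3[0] * (c1[1] * c2[2] - c1[2] * c2[1]))

def independence_det(a, x, lam=(1.0, 2.0, 3.0)):
    """det[g, ad_f g, ad_f^2 g] at the point x."""
    f = make_f(a, lam)
    ad1 = bracket(f, g)
    ad2 = bracket(f, ad1)
    return det3(g(x), ad1(x), ad2(x))

def dphi(s):
    """Derivative of tanh."""
    return 1.0 - math.tanh(s) ** 2

x = [0.3, 0.5, -0.2]
print(abs(independence_det(0.0, x)), dphi(0.5) * dphi(0.3) ** 2)
print(abs(independence_det(1.0, x)), abs(1.0 - 2.0) * dphi(0.3) ** 2)
```

Setting the first two decay rates equal with a = 1 drives the determinant to zero, which illustrates why the condition that the first two rates differ is needed in that case.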
Mathematical definitions and details to the proof of Theorem 2. Fading-memory filters. A map (or filter) $F$ from input to output streams is defined to have fading memory if its current output at time $t$ depends (up to some precision $\varepsilon$) only on values of the input $\mathbf{u}$ during some finite time interval $[t - T, t]$. (We use in this section boldface letters to denote input streams, because they typically have a dimension larger than 1.) In formulas: $F$ has fading memory if there exists for every $\varepsilon > 0$ some $\delta > 0$ and $T > 0$ so that $|(F\mathbf{u})(t) - (F\tilde{\mathbf{u}})(t)| < \varepsilon$ for any $t \in \mathbb{R}$ and any input functions $\mathbf{u}, \tilde{\mathbf{u}}$ with $\|\mathbf{u}(s) - \tilde{\mathbf{u}}(s)\| < \delta$ for all $s \in [t - T, t]$. This is a characteristic property of all filters that can be approximated by an integral over the input stream $\mathbf{u}$, or more generally by Volterra or Wiener series. Note that nontrivial Turing machines and FSMs cannot be approximated by filters with fading memory, since they require a persistent memory.
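The definition can be made concrete with the simplest filter of this kind. The sketch below is an illustration chosen here rather than an example from the article: a discrete-time leaky integrator with hypothetical decay factor 0.9 and inputs bounded by 1. For a requested precision it computes a window length such that two inputs which stay within a small tolerance of each other on that window, but are unrelated before it, produce current outputs that differ by less than the precision.

```python
import random

def leaky_integrator(u, alpha=0.9):
    """y[n] = alpha * y[n-1] + (1 - alpha) * u[n], starting from y = 0."""
    y, out = 0.0, []
    for value in u:
        y = alpha * y + (1.0 - alpha) * value
        out.append(y)
    return out

def window_for_precision(eps, alpha=0.9, bound=1.0):
    """Smallest window W (in samples) with 2 * bound * alpha**W < eps / 2.

    Inputs older than W samples can then change the output by less than eps / 2.
    """
    W = 0
    while 2.0 * bound * alpha ** W >= eps / 2.0:
        W += 1
    return W

rng = random.Random(0)
eps = 0.01            # requested output precision
delta = eps / 2.0     # allowed input mismatch inside the window
W = window_for_precision(eps)

u = [rng.uniform(-1.0, 1.0) for _ in range(500)]
# u_tilde is unrelated to u in the distant past and within delta of u
# on the most recent W samples.
u_tilde = ([rng.uniform(-1.0, 1.0) for _ in range(500 - W)]
           + [value + 0.99 * rng.uniform(-delta, delta) for value in u[-W:]])

gap = abs(leaky_integrator(u)[-1] - leaky_integrator(u_tilde)[-1])
print(W, gap)
```

A state that must persist indefinitely, such as the current state of a finite state machine, cannot be obtained this way, because any fixed window eventually forgets the event that set the state.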
Finite state machines. The deterministic finite state machine (FSM), also referred to as deterministic finite automaton, is a standard model for a digital computer, or more generally for any realistic computational device that operates in discrete time with a discrete set of inputs and internal states [26]. One assumes that an FSM is at any time in one of some finite number $l$ of states, and that it receives at any (discrete) time step one input symbol from some alphabet $\{s_1,\ldots,s_k\}$ that may consist of any finite number $k$ of symbols. Its "program" may consist of any transition function $TR: \{s_1,\ldots,s_k\} \times \{1,\ldots,l\} \to \{1,\ldots,l\}$, where $TR(s_i, j') = j$ denotes the new internal state $j$ which the FSM assumes at the next time step after processing input symbol $s_i$ in state $j'$.
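As a concrete illustration of this definition, a transition function can be stored as a lookup table keyed by (input symbol, current state). The following is a minimal sketch with a hypothetical two-state, two-symbol machine that behaves like a set/reset switch; the names `run_fsm` and `TR` are introduced here for illustration only.

```python
def run_fsm(transition, start_state, symbols):
    """Return the sequence of states visited, starting from start_state."""
    state = start_state
    trace = [state]
    for symbol in symbols:
        # TR(s_i, j') = j: the next state depends on the symbol and current state
        state = transition[(symbol, state)]
        trace.append(state)
    return trace

# Hypothetical machine with l = 2 states and k = 2 symbols:
# "s1" switches to state 2, "s2" switches back to state 1.
TR = {
    ("s1", 1): 2, ("s1", 2): 2,
    ("s2", 1): 1, ("s2", 2): 1,
}

trace = run_fsm(TR, 1, ["s1", "s1", "s2", "s1"])
print(trace)  # [1, 2, 2, 1, 2]
```

The state reached after the first "s1" persists for as long as no "s2" arrives, which is the kind of nonfading memory that a fading-memory filter alone cannot provide.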
Precise statement of Theorem 2. We consider here a slight variation of the FSM model, which is more adequate for systems that operate in continuous time and receive analog inputs (for example, trains of spikes in continuous time). We assume that the raw input is some arbitrary $n$-dimensional input stream $\mathbf{u}$ (i.e., $\mathbf{u}(t) \in \mathbb{R}^n$ for every $t \in \mathbb{R}$). Furthermore we assume that there exist pattern detectors $F_1,\ldots,F_k$ that report the occurrence of spatio-temporal patterns in the input stream $\mathbf{u}$ from $k$ different classes $C_1,\ldots,C_k$. In the case where the input $\mathbf{u}$ consists of spike trains, these classes could consist, for example, of particular patterns of firing rates, of particular spike patterns, or of particular correlation patterns among some of the input spike trains. It was shown in [5] that readouts from generic neural microcircuit models can easily be trained to approximate the role of such pattern detectors $F_1,\ldots,F_k$. We assume that the detection of a pattern from class $C_i$ by pattern detector $F_i$ affects the state of the FSM according to its transition function $TR$ in a way that corresponds to the presentation of input symbol $s_i$ in the discrete-time version: if $j'$ was its preceding state, then it changes now within some finite switching time to state $j = TR(s_i, j')$.
To make an implementation of such an FSM by a noisy system feasible, we assume that the pattern detectors $(F_1\mathbf{u})(t),\ldots,(F_k\mathbf{u})(t)$ always assume the value 0, except during a switching episode. During a switching episode, exactly one of the pattern detectors $(F_i\mathbf{u})(t)$ assumes values $> 0$. We assume that this $(F_i\mathbf{u})(t)$ reaches values $\geq 1$ during this switching episode. We also assume that the length of each switching episode (i.e., the time during which some $(F_i\mathbf{u})(t)$ assumes values $> 0$) is bounded from above by some constant $\delta$, and that the temporal distance between the beginnings of any two different switching episodes is at least $\Delta + 3\delta$ (where $\Delta$ is the assumed temporal delay of the feedback in the circuit). To avoid basing the subsequent construction on unrealistic assumptions, we allow that each pattern detector $F_i$ is replaced by some arbitrary filter $\tilde{F}_i$ so that $(\tilde{F}_i\mathbf{u})(t)$ is a continuous function of time (with values in some arbitrary bounded range $[-B, B]$) with $|(\tilde{F}_i\mathbf{u})(t) - (F_i\mathbf{u})(t)| \leq 1/4$ for any input stream $\mathbf{u}$ that is considered.
The informal statement of Theorem 2 is made precise by the subsequent Theorem 5 (see Figure 6 for an illustration). It exhibits a simple construction method whereby fading-memory filters with additive noise of bounded amplitude can be composed into a closed loop system $C$ that emulates an arbitrary given FSM in a noise-robust manner. The resulting system $C$ can be embedded into any other fading-memory system, which receives the outputs $\mathrm{CL}\text{-}\hat{H}_j(t)$ of $C$ as additional inputs. In this way, any given fading-memory system can integrate the computational capability and nonfading states of the FSM that is emulated by $C$ into its own real-time computation on time-varying input streams $\mathbf{u}$.
An essential aspect of the proof of Theorem 5 is that suitable fading-memory filters $H_j$ can prevent in the closed loop the accumulation of errors through feedback, even if the ideal fading-memory filters $H_j$ are subsequently replaced by imperfect approximations $\hat{H}_j$. One just has to construct the ideal fading-memory filters $H_j$ in such a way that they take into account that their previous outputs, which have been fed back into the system $C$, may have been corrupted by additive noise. As long as this additive noise of bounded amplitude has not been amplified in the closed loop, the filters $H_j$ can still recover which of the finitely many states of the emulated FSM $A$ was represented by that noise-corrupted feedback.
From the perspective of neural circuit models, it is of interest to note that the construction of the system $C$ can be replaced by an adaptive procedure, whereby readouts from generic cortical microcircuit models are trained to approximate the target filters $H_j$. General approximation results [4,5,37] imply that if the neural circuit is sufficiently large and contains sufficiently diverse components (for example, dynamic synapses with slightly different parameter values), then the actual outputs $\hat{H}_j$ of these readouts can approximate the target filters $H_j$ uniformly up to any given maximal error $\varepsilon > 0$. Theorem 5 guarantees that the resulting neural circuit model with these (imperfectly) trained readouts can in the closed loop emulate the given FSM $A$ in a reliable manner, provided that the neural circuit model is sufficiently large and diverse so that its readout can achieve an approximation error $\varepsilon$ not larger than 1/4.

Theorem 5. One can construct for any given FSM $A$ some time-invariant fading-memory filters $H_1,\ldots,H_l$ with the property that any approximating filters $\hat{H}_1,\ldots,\hat{H}_l$ with $|H_j - \hat{H}_j| \leq 1/4$ provide in the closed loop with delay $\Delta$ (see Figure 6) outputs $\mathrm{CL}\text{-}\hat{H}_1,\ldots,\mathrm{CL}\text{-}\hat{H}_l$ that simulate the FSM $A$ in the following sense: If $[t_1, t_2]$ is some arbitrary time interval between switching episodes of the FSM $A$ with noise-free pattern detectors $(F_1\mathbf{u})(t),\ldots,(F_k\mathbf{u})(t)$ during which $A$ is in state $j$, then the outputs $\mathrm{CL}\text{-}\hat{H}_i(t)$ of the approximating filters $\hat{H}_i$ in the closed loop with noisy pattern detectors $(\tilde{F}_1\mathbf{u})(t),\ldots,(\tilde{F}_k\mathbf{u})(t)$ satisfy $\mathrm{CL}\text{-}\hat{H}_j(t) \geq 3/4$ and $\mathrm{CL}\text{-}\hat{H}_{j^*}(t) \leq 1/4$ for all $j^* \neq j$ and all $t \in [t_1, t_2]$.
Proof of the precise statement of Theorem 2. We present here a proof of Theorem 5 (see Precise Statement of Theorem 2 section above), which provides a formally precise version of Theorem 2.
To prove that the given FSM $A$ can be implemented in a noise-robust fashion, we construct suitable time-invariant fading-memory filters $H_1,\ldots,H_l$. They receive as inputs the time-varying functions $(\tilde{F}_1\mathbf{u})(t),\ldots,(\tilde{F}_k\mathbf{u})(t)$. In addition, they receive in the open loop inputs $v_1(t),\ldots,v_l(t)$, where each $v_j(t)$ will be replaced by a delayed version of the output of $H_j$ (or $\hat{H}_j$) in the closed loop (see Figure 6). The filters $H_j$ will be defined in such a way that $H_j(t) \geq 1$ signals in the closed loop that the FSM $A$ is at time $t$ in state $j$. To make this implementation noise-robust, we make sure that even if one replaces the filters $H_j$ by noisy approximations $\hat{H}_j$, which satisfy in the open loop $|H_j(t) - \hat{H}_j(t)| \leq 1/4$ (for all $t \in \mathbb{R}$ and any time-varying inputs $(\tilde{F}_1\mathbf{u})(t),\ldots,(\tilde{F}_k\mathbf{u})(t)$ and $v_1(t),\ldots,v_l(t)$), then the closed-loop version of such imperfect approximations $\hat{H}_j$ simulates the FSM $A$ in such a way that $\hat{H}_j(t) \geq 3/4$ implies that $A$ is in state $j$ at time $t$.

Let $\Delta$ be the time delay in the feedback for the closed loop. We now define the target outputs $H_1(t),\ldots,H_l(t)$ (for the open-loop version, where the $H_j$ receive in addition to $(\tilde{F}_1\mathbf{u})(t),\ldots,(\tilde{F}_k\mathbf{u})(t)$ some arbitrary time-varying variables $v_1(t),\ldots,v_l(t)$ with values in $[-1, 2]$ as inputs). We define the target outputs of $H_1,\ldots,H_l$ as a stationary transformation of the time-varying inputs $v_j(t)$ and of the outputs of the following two other types of time-invariant fading-memory filters: (i) $f_i(t) := \max\{(\tilde{F}_i\mathbf{u})(s) : t - \Delta - \delta \leq s \leq t\}$ for $i = 1,\ldots,k$; (ii) $v_j(t - 2\delta)$ for $j = 1,\ldots,l$. We will show below in Lemma 6 and Lemma 7 that both of these functions of time can be viewed as outputs of time-invariant fading-memory filters that receive as inputs the time-varying functions $(\tilde{F}_i\mathbf{u})(t)$ (for some arbitrary input stream $\mathbf{u}$) and $v_j(t)$.
On the basis of these two Lemmata, it is clear that the $H_j$ are time-invariant fading-memory filters if one can define $H_1(t),\ldots,H_l(t)$ as (static) continuous functions of the variables $v_j(t)$ and the outputs of the filters (i) and (ii). In the following we sometimes refer to $H_1,\ldots,H_l$ as static functions of input vectors $(f_1(t),\ldots,f_k(t), v_1(t),\ldots,v_l(t), v_1(t - 2\delta),\ldots,v_l(t - 2\delta))$ from $\mathbb{R}^{k+2l}$, and sometimes as filters with time-varying inputs $\tilde{F}_i\mathbf{u}$ and $v_j$ (if we view the filters (i) and (ii) as being part of the computation of $H_j$). To define such functions $H_j(t)$, we first define for each $j \in \{1,\ldots,l\}$ two disjoint closed and bounded sets $S_{j,0}, S_{j,1} \subseteq \mathbb{R}^{k+2l}$, and we set $H_j(x) = 0$ for $x \in S_{j,0}$ and $H_j(x) = 1$ for $x \in S_{j,1}$. Since the sets $S_{j,0}$ and $S_{j,1}$ will have positive distance (i.e., $\inf\{\|x - y\| : x \in S_{j,0} \text{ and } y \in S_{j,1}\} > 0$), it follows from standard arguments of analysis that the definition of $H_j$ can be continued outside of $S_{j,0}, S_{j,1}$ to yield a continuous function from $\mathbb{R}^{k+2l}$ into $\mathbb{R}$.
It follows immediately from the definition of the sets $S_{j,0}$ and $S_{j,1}$ that they are closed and bounded. One can also verify immediately that for any $j, j' \in \{1,\ldots,l\}$ the two conditions $A_j$ and $B_{j'}$ can never be simultaneously satisfied (for any values of the variables $f_i(t)$, $v_j(t)$, $v_j(t - 2\delta)$). In addition the conditions $A_j$ and $A_{j'}$ ($B_j$ and $B_{j'}$) can never be simultaneously satisfied for any $j \neq j'$. This implies that the sets $S_{j,0}$ and $S_{j,1}$ are disjoint for each $j \in \{1,\ldots,l\}$.

[...] where $\mathrm{dist}(x, S) := \inf\{\|x - y\| : y \in S\}$ for any set $S \subseteq \mathbb{R}^{k+2l}$. It is then obvious that $H_j$ is a continuous function from $\mathbb{R}^{k+2l}$ into $[0, 1]$ with $H_j(x) = 0$ for all $x \in S_{j,0}$ and $H_j(x) = 1$ for all $x \in S_{j,1}$. These functions $H_j$ will prevent the amplification of noise in the closed loop, since they assume outputs 1 or 0 in all relevant situations, even if their inputs deviate by up to 1/4 from their "ideal" values. We consider some arbitrary imprecise and/or noisy versions $\hat{H}_j$ of these filters $H_j$ (with inputs $(\tilde{F}_1\mathbf{u})(t),\ldots,(\tilde{F}_k\mathbf{u})(t)$ and additional inputs $v_1(t),\ldots,v_l(t)$) whose output differs at any time $t$ by at most 1/4 from that of $H_j$ (of course in the closed loop these deviations could be accumulated and amplified to values $> 1/4$). We want to show that for any such $\hat{H}_1,\ldots,\hat{H}_l$ the closed loop version of the circuit implements the given FSM $A$. As initial condition we assume that the given FSM $A$ is in state 1 for $t \leq 0$, and consequently also that $\hat{H}_1(t) \geq 3/4$ and $\hat{H}_j(t) \leq 1/4$ for $j = 2,\ldots,l$, as well as $f_i(t) \leq 1/4$ for all $t \leq 0$ and $i = 1,\ldots,k$.
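The noise-suppressing effect of a function that equals 0 on one set and 1 on another, with a continuous transition in between, can be illustrated in one dimension. The sketch below is an assumption-laden toy and not the construction used in the proof: it replaces the sets of the proof by two hypothetical intervals on the real line, and it uses one standard distance-based interpolation (distance to the first set divided by the sum of both distances), which may differ from the formula in the original article. It then feeds the output back with bounded noise of amplitude at most 1/4 and shows that the stored value is restored exactly at every step.

```python
import random

S0 = (-1.0, 0.25)   # hypothetical stand-in for the set on which H equals 0
S1 = (0.75, 2.0)    # hypothetical stand-in for the set on which H equals 1

def dist_to_interval(v, lo, hi):
    """Distance from the number v to the closed interval [lo, hi]."""
    if v < lo:
        return lo - v
    if v > hi:
        return v - hi
    return 0.0

def H(v):
    """Continuous map into [0, 1]: 0 on S0, 1 on S1, interpolated elsewhere."""
    d0 = dist_to_interval(v, *S0)
    d1 = dist_to_interval(v, *S1)
    return d0 / (d0 + d1)   # denominator is positive because S0 and S1 are disjoint

def closed_loop(initial_output, steps, rng, noise=0.25):
    """Feed the output back with additive noise bounded by `noise`."""
    outputs, out = [], initial_output
    for _ in range(steps):
        fed_back = out + rng.uniform(-noise, noise)  # noisy delayed feedback
        out = H(fed_back)                            # H removes the noise again
        outputs.append(out)
    return outputs

high = closed_loop(1.0, 1000, random.Random(0))
low = closed_loop(0.0, 1000, random.Random(1))
print(set(high), set(low))
```

Because every noisy feedback value of a stored 1 stays inside the upper interval, and every noisy value of a stored 0 stays inside the lower one, errors cannot accumulate from one pass through the loop to the next.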
We will now prove the claim of Theorem 5 for arbitrary time intervals $[t_1, t_2]$ outside of switching episodes. We assume without loss of generality that $t_2$ marks the beginning of the next switching episode $[t_2, t_3]$ for some $t_3 > t_2$ with $|t_3 - t_2| \leq \delta$. Furthermore we assume that either $t_1 = 0$ (Case 1), or $t_1$ is the endpoint of the preceding switching episode $[t_0, t_1]$ with $|t_1 - t_0| \leq \delta$ (Case 2). The formal proof is carried out by induction on the number of preceding switching episodes (and Case 2 represents the induction step). In both cases one just needs to analyze the outputs of the previously defined filters $\hat{H}_j(t)$ in the case where some of their inputs are delayed feedbacks of their previous outputs.

Case 1: $t_1 = 0$. We prove by a nested induction on $m \in \mathbb{N}$ that $\mathrm{CL}\text{-}\hat{H}_1(t) \geq 3/4$ and $\mathrm{CL}\text{-}\hat{H}_j(t) \leq 1/4$ for all $j > 1$ holds for all $t \in [m \cdot \Delta, (m + 1) \cdot \Delta) \cap [t_1, t_2]$. Since by assumption no switching episode occurs during $[t_1, t_2]$, one has $f_i(t) \leq 1/4$ for $i = 1,\ldots,k$ and for all $t \in [t_1, t_2]$. Furthermore, by our assumption on the initial condition of the FSM $A$ (for $m = 0$), or by the induction hypothesis of the nested induction (for $m > 0$), we can assume that the variables $v_j(t)$ of the open loop have now been assigned in the closed loop the values $\mathrm{CL}\text{-}\hat{H}_j(t - \Delta)$; therefore, they are $\geq 3/4$ for $j = 1$ and $\leq 1/4$ for all $j > 1$. Hence condition $B_1$ in the definition of the sets $S_{j,0}, S_{j,1}$ applies, and the current circuit input is therefore in $S_{1,1}$. Thus $H_1 = 1$ and $H_j = 0$ for $j > 1$, which implies $\hat{H}_1 \geq 3/4$ and $\hat{H}_j \leq 1/4$ for $j > 1$ in the open loop, hence $\mathrm{CL}\text{-}\hat{H}_1(t) \geq 3/4$ and $\mathrm{CL}\text{-}\hat{H}_j(t) \leq 1/4$ for $j > 1$ in the closed loop (since $v_j(t) = \mathrm{CL}\text{-}\hat{H}_j(t - \Delta)$ in the closed loop).

Case 2: [...] $\geq 3/4$ and $f_{i^*}(t) \leq 1/4$ for all $i^* \neq i$ and for all $t \in [t', t' + \Delta + \delta]$ (by the definition of the filters $f_i(t)$). Furthermore, one has by the induction hypothesis that for the state $j'$ in which the FSM $A$ was before the switching episode $[t_0, t_1]$, $\mathrm{CL}\text{-}\hat{H}_{j'}(t - \Delta - 2\delta) \geq 3/4$ and $\mathrm{CL}\text{-}\hat{H}_{j^*}(t - \Delta - 2\delta) \leq 1/4$ for all $j^* \neq j'$ and all $t \in [t', t' + \Delta + 2\delta]$. We exploit here that $t_0 \leq t' \leq t_1 \leq t_0 + \delta$ [...]. Furthermore, we have assumed that the minimal distance between the beginnings of switching episodes is $\Delta + 3\delta$. Therefore, the considered range $[t_0 - \Delta - 2\delta, t_0]$ for $t - \Delta - 2\delta$ is contained in the preceding time interval before the switching episode $[t_0, t_1]$ to which the induction hypothesis applies.

Figure 6. Emulation of an FSM by a Noisy Fading-Memory System with Feedback According to Theorem 5. (A) Underlying open-loop system with noisy pattern detectors $F_1,\ldots,F_k$ and suitable fading-memory readouts $\hat{H}_1,\ldots,\hat{H}_l$ (which may also be subject to noise). (B) Resulting noise-robust emulation of an arbitrary given FSM by adding feedback to the system in (A). The same readouts as in (A) (denoted $\mathrm{CL}\text{-}\hat{H}_j(t)$ in the closed loop) now encode the current state of the simulated FSM. doi:10.1371/journal.pcbi.0020165.g006
The previously listed conclusions imply that for $t \in [t', t' + \Delta + \delta]$ the current input to the open loop lies in the set $S_{j,1}$ for $j = TR(s_i, j')$, hence $H_j = 1$ and $\hat{H}_j \geq 3/4$ [...] One can then prove by a nested induction on $m \in \mathbb{N}$ like in Case 1 that the outputs $\mathrm{CL}\text{-}\hat{H}_{j^*}(t)$ for $j^* = 1,\ldots,l$ have the desired values for $t$ [...] The preceding argument provides the verification of the claim for the initial step $m = 0$ of this nested induction.
To complete the proof of Theorem 5, it only remains to verify the following two simple facts about time-invariant fading-memory filters.

Lemma 6. Assume that $\tilde{F}_i$ is some arbitrary time-invariant fading-memory filter, and $\Delta, \delta$ are arbitrary positive constants. Then the map that assigns to an input stream $\mathbf{u}$ the function $f_i(t) := \max\{(\tilde{F}_i\mathbf{u})(s) : t - \Delta - \delta \leq s \leq t\}$ is also a time-invariant fading-memory filter.
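Proof of Lemma 6: Assume some $\varepsilon > 0$ is given. Fix $\delta'$ and $T > 0$ so that $|(\tilde{F}_i\mathbf{u})(s) - (\tilde{F}_i\mathbf{v})(s)| < \varepsilon$ for all $s \in [t - \Delta - \delta, t]$ and all $\mathbf{u}, \mathbf{v}$ with $\|\mathbf{u}(s) - \mathbf{v}(s)\| < \delta'$ for all $s \in [t - \Delta - \delta - T, t]$. Since two continuous functions that differ by less than $\varepsilon$ at every point of a closed bounded interval have maxima over that interval that differ by less than $\varepsilon$, the values $f_i(t)$ obtained for $\mathbf{u}$ and for $\mathbf{v}$ differ by less than $\varepsilon$. Hence the map has fading memory with window length $\Delta + \delta + T$, and it inherits time invariance from $\tilde{F}_i$.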
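In discrete time, the filter of Lemma 6 is a sliding-window maximum. The sketch below is an illustration added here, with a window measured in samples rather than in continuous time and with helper names that are not from the article. It computes the windowed maximum with a monotonic queue and checks, on random data, the property used in the proof: if two signals differ by at most some amount at every sample, their windowed maxima differ by at most the same amount.

```python
import random
from collections import deque

def sliding_max(samples, window):
    """out[n] = max(samples[max(0, n - window + 1) : n + 1])."""
    out = []
    candidates = deque()  # indices whose sample values are strictly decreasing
    for n, value in enumerate(samples):
        # Drop candidates that can never be the maximum again.
        while candidates and samples[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(n)
        # Drop the front candidate once it has left the window.
        if candidates[0] <= n - window:
            candidates.popleft()
        out.append(samples[candidates[0]])
    return out

print(sliding_max([0, 1, 0, 0, 0, 0], 3))  # [0, 1, 1, 1, 0, 0]

rng = random.Random(0)
a = [rng.uniform(0.0, 1.0) for _ in range(200)]
b = [x + rng.uniform(-0.05, 0.05) for x in a]   # pointwise within 0.05 of a
worst_input_gap = max(abs(x - y) for x, y in zip(a, b))
worst_output_gap = max(abs(x - y)
                       for x, y in zip(sliding_max(a, 20), sliding_max(b, 20)))
print(worst_output_gap <= worst_input_gap)
```

The first call also shows why such a filter is useful in the construction: a brief pulse from a pattern detector is held at its peak value for the full window length and is forgotten afterwards.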
Lemma 7. The filter that maps for some arbitrary fixed $\delta > 0$ the function $u(t)$ onto the function $u(t - 2\delta)$ is time-invariant and has fading memory.
Proof of Lemma 7 follows immediately from the definitions (choose $T \geq 2\delta$ in the condition for fading memory).
This completes the proof of Theorem 5, which shows that any given FSM can be reliably implemented by fading-memory filters with feedback even in the presence of noise.
Remark. In the application of this theory to cortical microcircuit models, we train readouts from such circuits to simultaneously assume the role of the pattern detectors $\tilde{F}_1,\ldots,\tilde{F}_k$, which become active if some pattern occurs in the input stream that may trigger a state change of the simulated FSM $A$, and the role of the fading-memory filters $\hat{H}_1,\ldots,\hat{H}_l$, which create high-dimensional attractors of the circuit dynamics that represent the current state of the FSM $A$.
Details of the cortical microcircuit models. We complement in this section the general description of the simulated cortical microcircuit models from the section Applications to Generic Cortical Microcircuit Models, providing in particular all missing data that are needed to reproduce our simulation results. The original code that was used for these simulations is available online at http://www.lsm.tugraz.at/research/index.html.
Each circuit consisted of 600 neurons, which were placed on the integer grid points of a 5 × 5 × 24 grid. Twenty percent of these neurons were randomly chosen to be inhibitory. The probability of a synaptic connection from neuron $a$ to neuron $b$ (as well as that of a synaptic connection from neuron $b$ to neuron $a$) was defined as $C \cdot \exp(-D^2(a,b)/\lambda^2)$, where $D(a,b)$ is the Euclidean distance between neurons $a$ and $b$, and $\lambda$ is a parameter that controls both the average number of connections and the average distance between neurons that are synaptically connected (we set $\lambda = 3$). Depending on whether the pre- or postsynaptic neuron was excitatory (E) or inhibitory (I), the value of $C$ was set according to [44] to 0.3 (EE), 0.2 (EI), 0.4 (IE), 0.1 (II), yielding an average of 10,900 synapses for the chosen circuit size. External inputs and feedbacks from readouts were connected to populations of neurons in the circuit with randomly chosen connection strengths.

I&F neurons. A standard leaky I&F neuron model was used, where the membrane potential $V_m$ of a neuron is given by [...], where $\tau_m$ is the membrane time constant (30 ms), which subsumes the time constants of synaptic receptors as well as the time constant of the neuron membrane.
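The distance-dependent connection rule for the circuit can be sketched in a few lines. The following is an illustrative reading of that description and not the original simulation code: it assumes that the two-letter labels are ordered as (presynaptic, postsynaptic), that exactly 20% of the neurons are inhibitory, and that self-connections are excluded. These choices, and all function names, are assumptions made here.

```python
import itertools
import math
import random

GRID = (5, 5, 24)   # integer grid holding 5 * 5 * 24 = 600 neurons
LAMBDA = 3.0        # spatial scale of the connectivity
# Base connection probabilities, keyed as (presynaptic, postsynaptic) type.
# The ordering of the two letters is an assumption.
C = {("E", "E"): 0.3, ("E", "I"): 0.2, ("I", "E"): 0.4, ("I", "I"): 0.1}

def connection_probability(pos_a, pos_b, type_a, type_b):
    """C * exp(-D(a, b)^2 / lambda^2) with Euclidean distance D."""
    d_squared = sum((p - q) ** 2 for p, q in zip(pos_a, pos_b))
    return C[(type_a, type_b)] * math.exp(-d_squared / LAMBDA ** 2)

def build_circuit(seed=0):
    """Draw neuron types and a random set of directed synapses (a, b)."""
    rng = random.Random(seed)
    positions = list(itertools.product(*(range(n) for n in GRID)))
    n = len(positions)
    inhibitory = set(rng.sample(range(n), n // 5))   # exactly 20% (assumed)
    types = ["I" if i in inhibitory else "E" for i in range(n)]
    synapses = [(a, b)
                for a in range(n)
                for b in range(n)
                if a != b   # no self-connections (assumed)
                and rng.random() < connection_probability(
                    positions[a], positions[b], types[a], types[b])]
    return positions, types, synapses

positions, types, synapses = build_circuit()
print(len(positions), types.count("I"), len(synapses))
```

The number of synapses printed by one such draw can be compared with the average reported in the text; agreement is not guaranteed, since the assumptions listed above may differ from the original implementation.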
Other parameters are: absolute refractory period 3 ms (excitatory neurons), 2 ms (inhibitory neurons); threshold 15 mV (for a resting membrane potential $V_{resting}$, assumed to be 0); reset voltage drawn uniformly from the interval [13.8 mV, 14.5 mV] for each neuron; input resistance $R_m$ of 1 MΩ; constant nonspecific background current $I_{inject}$ uniformly drawn from the interval [13.5 nA, 14.5 nA] for each neuron; additional time-varying noise input current $I_{noise}$ drawn every 5 ms from a Gaussian distribution with mean 0 and SD chosen randomly for each neuron from the uniform distribution over the interval [4.0 nA, 5.0 nA]. For each simulation, the initial condition of each I&F neuron, i.e., its membrane voltage at time $t = 0$, was drawn randomly (uniform distribution) from the interval [13.5 mV, 14.9 mV]. Finally, $I_{syn}(t)$ is the sum of input currents supplied by the explicitly modeled synapses.

HH neurons. We used single-compartment HH neuron models with passive and active properties modeled according to [48,49]. The membrane potential was modeled by

\[ C_m \frac{dV}{dt} = -g_L (V - E_L) - I_{Na} - I_{Kd} - I_M - \frac{1}{a} I_{noise} - I_{syn}, \]

where $C_m = 1\ \mu\mathrm{F/cm^2}$ is the specific membrane capacitance, $g_L = 0.045\ \mathrm{mS/cm^2}$ is the leak conductance density, $E_L = -80$ mV is the leak reversal potential, and $I_{syn}(t)$ is the input current supplied by explicitly modeled synapses (see the definition below). The membrane area $a$ of the neuron was set to be 34,636 μm² as in [48]. The term $I_{noise}(t)$ (see the precise definition below) models smaller background input currents from a large number of more distal neurons, causing a depolarization of the membrane potential and a lower input resistance commonly referred to as "high conductance state" (for a review see [42]).
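For the leaky I&F neurons described earlier in this section, the listed parameters are enough for a small single-neuron simulation once a membrane equation is fixed. The sketch below assumes the standard leaky-integrator form, in which the membrane time constant times dV/dt equals the negative deviation from rest plus input resistance times total input current; this form is an assumption, since the equation itself is not reproduced here. It simulates one isolated excitatory neuron (no synaptic input) with forward Euler and a hypothetical step size of 0.1 ms.

```python
import random

def simulate_lif(duration_ms=1000.0, dt=0.1, seed=0):
    """Spike times (ms) of one isolated excitatory leaky I&F neuron."""
    rng = random.Random(seed)
    tau_m, r_m = 30.0, 1.0                 # ms, MOhm (so MOhm * nA = mV)
    v_rest, v_thresh = 0.0, 15.0           # mV
    t_refract = 3.0                        # ms, excitatory neuron
    v_reset = rng.uniform(13.8, 14.5)      # mV, drawn once per neuron
    i_inject = rng.uniform(13.5, 14.5)     # nA, constant background current
    noise_sd = rng.uniform(4.0, 5.0)       # nA, drawn once per neuron
    v = rng.uniform(13.5, 14.9)            # mV, initial membrane voltage

    i_noise, refractory_left, spikes = 0.0, 0.0, []
    steps_per_noise = round(5.0 / dt)      # noise is redrawn every 5 ms
    for step in range(round(duration_ms / dt)):
        if step % steps_per_noise == 0:
            i_noise = rng.gauss(0.0, noise_sd)
        if refractory_left > 0.0:
            refractory_left -= dt          # voltage is held during refractoriness
            continue
        i_total = i_inject + i_noise       # synaptic current omitted in this sketch
        # Assumed dynamics: tau_m * dV/dt = -(V - V_rest) + R_m * I_total
        v += dt / tau_m * (-(v - v_rest) + r_m * i_total)
        if v >= v_thresh:
            spikes.append(step * dt)
            v = v_reset
            refractory_left = t_refract
    return spikes

spikes = simulate_lif()
print(len(spikes))
```

With the background current alone the steady-state voltage stays just below threshold, so in this sketch any spiking is driven by the noise current; in the full circuit the synaptic current adds to both terms.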
In accordance with experimental data on neocortical and hippocampal pyramidal neurons [50-53], the active currents in the HH neuron model comprise a voltage-dependent Na$^+$ current $I_{Na}$ [54] and a delayed rectifier K$^+$ current $I_{Kd}$ [54]. For excitatory neurons, a noninactivating K$^+$ current $I_M$ [55] responsible for spike frequency adaptation was included in the model.
The voltage-dependent Na$^+$ current was modeled by [...], where $V_T = -63$ mV, and the inactivation was shifted by 10 mV toward hyperpolarized values ($V_S = 10$ mV) to reflect the voltage dependence of Na$^+$ currents in neocortical pyramidal cells [56]. The peak conductance density for the $I_{Na}$ current was chosen to be 500 pS/μm².
The delayed rectifier K$^+$ current was modeled by:

\[ I_{Kd} = g_{Kd}\, n^4 (V - E_K) \]
\[ \frac{dn}{dt} = \alpha_n(V)(1 - n) - \beta_n(V)\, n \]
\[ \alpha_n = \frac{-0.032\,(V - V_T - 15)}{\exp[-(V - V_T - 15)/5] - 1} \]

[...] firing behavior, although they only received input from a circuit with fading memory, because they were actually trained to acquire the following behavior: fire whenever the rate in input stream 1 becomes higher than 30 Hz, or if one can detect in the current state $x(t)$ of the circuit traces of recent high feedback values, provided the rate of input stream 2 stayed below 30 Hz. Obviously this definition of the learning target for readout neurons only requires a fading memory of the circuit.
In 50 tests on new inputs of length 1 s (generated from the same distribution as the training inputs, see the preceding description), the readouts for the other three tasks showed the following average performance: task of panel E, mean correlation: 0.85; task of panel F, mean correlation: 0.63; task of panel G, mean correlation: 0.86.
Technical details of Figure 2. The same circuit as for Figure 5 was used. First, two linear readouts with feedback were simultaneously trained to become highly active after the occurrence of the cue in the spike input, and then to linearly reduce their activity, but each within a different timespan (400 ms versus 600 ms). Their feedback into the circuit consisted of two time-varying analog values (representing time-varying firing rates of two populations of neurons), which were both injected (with randomly chosen amplitudes) into the same subset of 350 neurons in the circuit. Their weights w were trained by linear regression for a total training time of 120 s (of simulated biological time), consisting of 120 runs of length 1 s with randomly generated input cues (a burst at 200 Hz for 50 ms) and noise inputs (five spike trains at 10 Hz).
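Training a linear readout by linear regression amounts to solving a least-squares problem for the weight vector. The sketch below is a generic illustration and not the original training code: the "circuit states" are hypothetical random vectors, the target is a known linear function of them, and the normal equations are solved with plain Gaussian elimination so that the example needs no external libraries. A bias term is added as an assumption.

```python
import random

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):                     # back substitution
        tail = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - tail) / M[r][r]
    return w

def fit_linear_readout(states, targets):
    """Least-squares weights; the last entry is the bias."""
    X = [list(s) + [1.0] for s in states]            # append constant input
    d = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    Xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(d)]
    return solve(XtX, Xty)

def readout(weights, state):
    """Output of the linear readout for one state vector."""
    return sum(w * s for w, s in zip(weights, list(state) + [1.0]))

rng = random.Random(0)
states = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
true_w = [0.5, -2.0, 0.25, 1.0]                      # hypothetical target weights
targets = [readout(true_w, s) for s in states]
w = fit_linear_readout(states, targets)
print([round(v, 6) for v in w])
```

In the setting of the article the state vectors would be derived from the circuit activity at each time step and the targets from the desired readout output; with noisy, high-dimensional states a regularized variant of the same computation is a common choice.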
Technical details of Figure 3. Time-varying firing rates for the two input streams (each consisting of eight Poisson spike trains) were drawn randomly from values between 10 Hz and 90 Hz. The 16 spike trains from the two input streams, as well as feedback from trained readouts were injected into randomly chosen subsets of neurons. In contrast to the experiment for Figure 3, these circuit inputs were not injected into spatially concentrated clusters of neurons, but to a sparsely distributed subset of neurons scattered throughout the three-dimensional circuit. As a consequence, the firing activity CA(t) of the high-dimensional attractor (see Figure 3D) cannot be readily detected from the spike raster in Figure 3C. Both the linear readout that sends feedback, and subsequently the other two linear readouts (whose output for a test input to the circuit is shown in Figure 3E and 3F), were trained by linear regression during 140 s of simulated biological time.
Average performance of linear readouts on 100 new test inputs of length 700 ms (that had been generated from the same distribution as the training inputs) was as follows: task of panel D, mean correlation: 0.82; task of panel E, mean correlation: 0.71; task of panel F, mean correlation: 0.79.
Control experiments (see Figure 7) show that the feedback is essential for the performance of the circuit for these computational tasks.