Learning Universal Computations with Spikes

Providing the neurobiological basis of information processing in higher animals, spiking neural networks must be able to learn a variety of complicated computations, including the generation of appropriate, possibly delayed reactions to inputs and the self-sustained generation of complex activity patterns, e.g. for locomotion. Many such computations require previous building of intrinsic world models. Here we show how spiking neural networks may solve these different tasks. Firstly, we derive constraints under which classes of spiking neural networks lend themselves to substrates of powerful general purpose computing. The networks contain dendritic or synaptic nonlinearities and have a constrained connectivity. We then combine such networks with learning rules for outputs or recurrent connections. We show that this allows to learn even difficult benchmark tasks such as the self-sustained generation of desired low-dimensional chaotic dynamics or memory-dependent computations. Furthermore, we show how spiking networks can build models of external world systems and use the acquired knowledge to control them.


Introduction
The understanding of neural network dynamics on the mesoscopic level of hundreds and thousands of neurons and their ability to learn highly complicated computations is a fundamental open challenge in neuroscience.For biological systems, such an understanding will allow to connect the microscopic level of single neurons and the macroscopic level of cognition and behavior.In artificial computing, it may allow to propose new, possibly more efficient computing schemes.
Randomly connected mesoscopic networks can be a suitable substrate for computations [1,4,3,5,5], as they reflect the input in a complicated, nonlinear way and at the same time maintain, like a computational "reservoir", fading memory of past inputs as well as of transformations and combinations of them.This includes the results of computations on current and past inputs.Simple readout neurons may then learn to extract the desired result; the computations are executed in real time, i.e. without the need to wait for convergence to an attractor ("reservoir computing") [1,4].Non-random and adaptive network connectivity can change performance [6,7,8].
Networks with higher computational power, in particular with the additional ability to learn selfsustained patterns of activity and persistent memory, require an output feedback or equivalent learning of their recurrent connections [4,3].However, network modeling approaches achieving such universal (i.e.general purpose) computational capabilities so far concentrated on networks of continuous rate units [4,5], which do not take into account the characteristics that neurons in biological neural networks communicate via spikes.Indeed, the dynamics of spiking neural networks are discontinuous, usually highly chaotic, variable, and noisy.Readouts of such spiking networks show low signal-to-noise ratios.This hinders computations following the described principle in particular in presence of feedback or equivalent plastic recurrent connections, and has questioned it as model for computations in biological neural systems [9,10,11].
Here we first introduce a class of recurrent spiking neural networks that are suited as a substrate to learn universal computations.They are based on standard, established neuron models, take into account synaptic or dendritic nonlinearities and are required to respect some structural constraints regarding the connectivity of the network.To derive them we employ a precise spike coding scheme similar to ref. [3], which was introduced to approximate linear continuous dynamics.
Thereafter we endow the introduced spiking networks with learning rules for either the output or the recurrent connection weights and show that this enables them to learn equally complicated, memory dependent computations as non-spiking continuous rate networks.The spiking networks we are using have only medium sizes, between tens and a few thousands of neurons, like networks of rate neurons employed for similar tasks.We demonstrate the capabilities of our networks by applying them to challenging learning problems which are of importance in biological contexts.In particular, we show how spiking neural networks can learn the self-sustained generation of complicated dynamical patterns, and how they can build world models, which allow to compute optimal actions to appropriately influence an environment.

Continuous signal coding spiking neural networks (CSNs) Network architecture
For our study, we use leaky integrate-and-fire neurons.These incorporate crucial features of biological neurons, such as operation in continuous time, spike generation and reset, while also maintaining some degree of analytical tractability.A network consists of N neurons.The state of a neuron n is given by its membrane potential V n (t).The membrane potential performs a leaky integration of the input and a spike is generated when V n (t) reaches a threshold, resulting in a spiketrain with spike times t n and the Dirac delta-distribution δ.After a spike, the neuron is reset to the reset potential, which lies θ below the threshold.The spike train generates a train of exponentially decaying normalized synaptic currents where τ s = λ −1 s is the time constant of the synaptic decay and Θ( .) is the Heaviside theta-function.Throughout the article we consider two closely related types of neurons, neurons with saturating synapses and neurons with nonlinear dendrites (cf.Fig. 1).In the model with saturating synapses (Fig. 1a), the membrane potential V n (t) of neuron n obeys A nm tanh (γr m (t)) + V r λ s r n (t) − θs n (t) + I e,n (t), with membrane time constant τ m = λ −1 V .The saturation of synapses, e.g.due to receptor saturation or finite reversal potentials, acts as a nonlinear transfer function [13,14], which we model as a tanh-nonlinearity (since r m (t) ≥ 0 only the positive part of the tanh becomes effective).We note that this may also be interpreted as a simple implementation of synaptic depression: A spike generated by neuron m at t m leads to an increase of r m (t m ) by 1.As long as the synapse connecting neuron m to neuron n is far from saturation (linear part of the tanh-function) this leads to the consumption of a fraction γ of the synaptic "resources" and the effect of the spike on the neuron is approximately the effect of a current A nm γe −λs(t−tm) Θ(t − t m ).When a larger number of such spikes arrive in short time such that the consumed resources accumulate to 1 and beyond, the synapse saturates at its maximum strength A nm and the effect of individual inputs is much smaller than before.The recovery from depression is here comparably fast, it takes place on a timescale of λ −1 s (compare, e.g., [15]).The reset of the neuron is incorporated by the term −θs n (t).The voltage lost due to this reset is partially recovered by a slow recovery current (afterdepolarization) V r λ s r n (t); its temporally integrated size is given by the parameter V r .This is a feature of many neurons e.g. in the neocortex, (a,b): A neuron with saturating synapses (a) that directly codes for a continuous signal (b).Panel (a) displays the neuron with an axon (red) and dendrites (dark blue) that receive inputs from the axons of other neurons (axons at the bottom) via saturating synapses (symbolized by sigmoids at the synaptic contacts).The currents entering the soma are weighted sums of input spike trains that are synaptically filtered (generating scaled normalized synaptic currents γrn(t), synaptic time scale τs) and thereafter subject to a saturating synaptic nonlinearity.External inputs (axons at the top) are received without saturation.The continuous signal x(t) (panel b left hand side, green) is the sum of the neuron's membrane potential V (t) (red) and its scaled normalized synaptic current θr(t) (dark blue).r(t) is a low-pass filtered version of the neuron's spike train s(t) (light blue in red box).If x(t) > 0, the time scale of x(t) should be large against the synaptic time scale τs and x(t) should predominantly be large against the neuron's threshold, θ /2 (panel b right hand side, assumptions [1,2] in the main text).x(t) is then already well approximated by θr(t), while V (t) is oscillating between ± θ /2.If x(t) ≤ 0, we have V (t) ≤ 0, no spikes are generated and r(t) quickly decays to zero, such that we predominantly have r(t) ≈ 0 and x(t) is well approximated by V (t) (cf.Equation (9)).(c,d): Two neurons with nonlinear dendrites (c) from a larger network that distributedly codes for a continuous signal (d).(c): Each neuron has an axon (red) and different types of dendrites (cyan, light blue and dark blue) that receive inputs from the axons of other neurons (axons at the bottom) via fast or slow conventional synapses (highlighted by circles and squares).Linear dendrites with slow synapses (cyan with circle contacts) generate somatic currents that are weighted linear sums of low-pass filtered presynaptic spike trains (weighted sums of the rn(t)).Linear dendrites with fast synapses (light blue with square contacts) generate somatic currents with negligible filtering (weighted sums of the spike trains sn(t)).Spikes arriving at a nonlinear dendrite (dark blue) are also filtered (circular contact).The resulting rn(t) are weighted, summed up linearly in the dendrite and subjected to a saturating dendritic nonlinearity (symbolized by sigmoids at dendrites), before entering the soma.We assume that the neurons have nonlinear dendrites that are located in similar tissue areas, such that they connect to the same sets of axons and receive similar inputs.(d): All neurons in the network together encode J continuous signals x(t) (one displayed in green) by a weighted sum of their membrane potentials V(t) (two traces of different neurons displayed in red) and their normalized PSCs r(t) (two traces displayed in cyan).The Γr(t) alone already approximate x(t) well.The neurons' output spike trains s(t) (light blue in red box) generate slow and fast inputs to other neurons.(Note that spikes can be generated due to suprathreshold excitation by fast inputs.Since we plot V(t) after fast inputs and possible resets, the corresponding threshold crossings do not appear.) in the hippocampus and in the cerebellum [16], and may be caused by different types of somatic or dendritic currents, such as persistent and resurgent sodium and calcium currents, or by excitatory autapses [17,18].It provides a simple mechanism to sustain (fast) spiking and generate bursts, e.g. in response to pulses.I e,n (t) is an external input, its constant part may be interpreted as sampling slow inputs specifying the resting potential that the neuron asymptotically assumes for long times without any recurrent network input.We assume that the resting potential is halfway between the reset potential V res and the threshold V res + θ.We set it to zero such that the neuron spikes when the membrane potential reaches θ /2 and resets to −θ /2.To test the robustness of the dynamics we sometimes add a white noise input η n (t) satisfying η n (t)η m (t ) = σ 2 η δ nm δ(t − t ) with the Kronecker delta δ nm .
For simplicity, we take the parameters λ V , θ, V r and γ, λ s identical for all neurons and synapses, respectively.We take the membrane potential V n and the parameters V r and θ dimensionless, they can be fit to the voltage scale of biological neurons by rescaling with an additive and a multiplicative dimensionful constant.Time is measured in seconds.
We find that networks of the form Equation (3) generate dynamics suitable for universal computation similar to continuous rate networks [4,5], if 0 < λ x λ s , where λ x = λ s 1 − Vr θ , A nm sufficiently large and γ small.The conditions result from requiring the network to approximate a nonlinear continuous dynamical system (see next section).
An alternative interpretation of the introduced nonlinearity is that the neurons have nonlinear dendrites, where each nonlinear compartment is small such that it receives at most one (conventional, nonsaturating) synapse.A nm is then the strength of the coupling from a dendritic compartment to the soma.This interpretation suggests an extension of the neuron model allowing for several dendrites per neuron, where the inputs are linearly summed up and then subjected to a saturating dendritic nonlinearity [19,20,21].Like the previous model, we find that such a model has to satisfy additional constraints to be suitable for universal computation: Neurons with nonlinear dendrites need additional slow and fast synaptic contacts which arrive near the soma and are summed linearly there (Fig. 1c).Such structuring has been found in biological neural networks [22].We gather the different components into a dynamical equation for V n as D nj is the coupling from the jth dendrite of neuron n to its soma.The total number of dendrites and neurons is referred to as J and N respectively.W njm is the coupling strength from neuron m to the jth nonlinear dendrite of neuron n. η n (t).To increase the richness of the recurrent dynamics and the computational power of the network (cf.[23] for disconnected units without output feedback) we added inhomogeneity, e.g. through the external input current in some tasks.In the control/mental exploration task, we added a constant bias term b j as argument of the tanh to introduce inhomogeneity.
We find that the network couplings D, W, U and Ũ (4) (we use bold letters for vectors and matrices) should satisfy certain interrelations.As motivated in the subsequent section and derived in the supporting material, their components may be expressed in terms of the components of a J × N matrix Γ, and a J × J matrix A as , where a = λ s − λ x and µ ≥ 0 is small (see also Table 1 for an overview).The thresholds are chosen identical, θ n = θ, see Methods.
Again, the conditions result from requiring the network to approximate a nonlinear continuous dynamical system.This system, Equation (11), is characterized by the J × J coupling matrix A and a J-dimensional input c(t) whose components are identical to the J independent components of the external input current I e in equation (4); the matrix Γ is a decoding matrix that fixes the relation between spiking and continuous dynamics (see next section).We note that the matrices Γ and A are largely unconstrained, such that the coupling strengths maintain a large degree of arbitrariness.Ideally, W njm is independent of n, therefore neurons have dendrites that are similar in their input characteristics to dendrites in some other neurons (note that D may have zero entries, so dendrites can be absent).We interpret these as dendrites that are located in a similar tissue area and therefore connect to the same axons and receive similar inputs (cf.Fig. 1c for an illustration).The interrelations between the coupling matrices might be realized by spike-timing dependent synaptic or structural plasticity.Indeed, for a simpler model and task, appropriate biologically plausible learning rules have been recently highlighted [2,25].We tested robustness of our schemes against structural perturbations (see Figs. C, D in S1 Supporting Information), in particular for deviations from the n-independence of W njm (Fig. C in S1 Supporting Information).The networks Equation (3) with saturating synapses have a largely unconstrained topology, in particular they can satisfy the rule that neurons usually act only excitatorily or inhibitorily.For the networks Equation ( 4) with nonlinear dendrites, it is less obvious how to reconcile the rule with the constraints on the network connectivity.Solutions for this have been suggested in simpler systems and are subject to current research [3].
The key property of the introduced neural architecture is that the spike trains generated by the neurons encode with high signal-to-noise ratio a continuous signal that can be understood in terms of ordinary differential equations.In the following section we show how this signal is decoded from the spike trains.Thereafter, we may conclude that the spiking dynamics are sufficiently "tamed" such that standard learning rules can be applied to learn complicated computations.

Direct encoding of continuous dynamics
The dynamics of a neural network with N integrate-and-fire neurons consist of two components, the sub-threshold dynamics V(t) = (V 1 (t), ..., V N (t)) T of the membrane potentials and the spike trains s(t) = (s 1 (t), ..., s N (t)) T (Equation 1), which are temporal sequences of δ-distributions.In the model with saturating synapses, all synaptic interactions are assumed to be significantly temporally filtered, such that the V n (t) are continuous except at reset times after spiking (Equation ( 3)).We posit that the V(t) and the s(t) should together form some N -dimensional continuous dynamics x(t) = (x 1 (t), ..., x N (t)) T .The simplest approach is to setup x(t) as a linear combination of the two components V(t) and s(t).To avoid infinities in x n (t), we need to eliminate the occurring δdistributions by employing a smoothed version of s n (t).This should have a finite discontinuity at spike times such that the discontinuity in V n (t) can be balanced.A straightforward choice is to use θr n (t) (Equation ( 2)) and to set (cf.Fig. 1b).When the abovementioned conditions on λ x , λ s , A and γ are satisfied (cf.end of the section introducing networks with saturating synapses), the continuous signal x(t) follows a system of first order nonlinear ordinary differential equations similar to those describing standard non-spiking continuous rate networks used for computations (cf.[4,5,6] and Equation ( 11) below), with the rectifications [x n (t)] + = max (x n (t), 0) , [x n (t)] − = min (x n (t), 0).We call spiking networks where this is the case continuous signal coding spiking neural networks (CSNs).
Except for the rectifications, Equation ( 6) has a standard form for non-spiking continuous rate networks, used for computations [4,5,6].A salient choice for λ x is λ x = λ V , i.e.V r = 1 − λ V λs θ, such that the rectifications outside the tanh-nonlinearity vanish.Equation (6) generates dynamics that are different from the standard ones in the respect that the trajectories of individual neurons are, e.g. for random Gaussian matrices A, not centered at zero.However, they can satisfy the conditions for universal computation (enslaveability/echo state property and high dimensional nonlinear dynamics) and generate longer-term fading memory for appropriate scaling of A. Also the corresponding spiking networks are then suitable for fading memory-dependent computations.Like for the standard networks [7,28], we can derive sufficient conditions to guarantee that the dynamics Equation ( 6) are enslaveable by external signals (echo state property).A < min(λ V , λ x ), where A is the largest singular value of the matrix A, provides such a condition (see Supplementary material for the proof).
The condition is rather strict, our applications indicate that the CSNs are also suited as computational reservoirs when it is violated.This is similar to the situation in standard rate network models [7].We note that if the system is enslaved by an external signal, the time scale of x n (t) is largely determined by this signal and not anymore by the intrinsic scales of the dynamical system.
We will now show that spiking neural networks Equation (3) can encode continuous dynamics Equa-tion (6).For this we derive the dynamical equation of the membrane potential (3) from the dynamics of x(t) using the coding rule Equation ( 5), the dynamical equation (2) for r n (t) and the rule that a spike is generated whenever V n (t) reaches threshold θ /2: We first differentiate Equation (5) to eliminate ẋn (t) from Equation ( 6) and employ Equation (2) to eliminate ṙn (t).The resulting expression for For this, we make two assumptions (cf.Fig. 1b) on the x n (t) if they are positive: [1] The dynamics of x n (t) are slow against the synaptic timescale τ s , [2] the x n (t) assume predominantly values x n (t) θ /2.
First we consider the case x n (t) > 0. Since V n (t) is reset when it reaches its threshold value θ /2, V n (t) is always smaller than θ /2.Thus, given V n (t) > 0 assumption [2] implies that we can approximate is negative and its absolute value is not large against θ /2.Furthermore, assumption [1] implies that smaller negative V n (t) cannot co-occur with positive x n (t): r n (t) is positive and in the absence of spikes it decays to zero on the synaptic time scale τ s (Equation ( 2)).When V n (t) < 0, neuron n is not spiking anymore.Thus when V n (t) is shrinking towards small negative values and r n (t) is decaying on a timescale of τ s , x n (t) is also decaying on a time-scale τ s .This contradicts assumption [1].Thus when x n (t) > 0, the absolute magnitude of V n (t) is on the order of θ /2.With assumption [2] we can thus set x n (t) ≈ θr n (t), whenever x n (t) > 0, neglecting contributions of size θ /2.Now we consider x n (t) ≤ 0. This implies V n (t) ≤ 0 (since always r n (t) ≥ 0) as well as a quick decay of r n (t) to zero.When x n (t) assumes values significantly below zero, assumption [1] implies that we have x n (t) ≈ V n (t) and r n (t) ≈ 0, otherwise x n (t) must have changed from larger positive (assumption [2]) to larger negative values on a timescale of τ s .
The approximate expressions may be gathered in the replacements Using these in Equation ( 7) yields together with Note that our replacements allowed to eliminate the biologically implausible V-dependencies in the interaction term.
To simplify the remaining V n (t)-dependence, we additionally assume that [2'] x n (t) assumes predominantly values x n (t) λ V θ/(2λ x ), if x n (t) is positive.This can be stricter than [2] depending on the values of λ x and λ V .For positive x n (t), where λ V [x n (t)] − in Equation ( 7) is zero, λ V V n (t) has an absolute magnitude on the order of λ V θ/2 (see the arguments above).[2'] implies that this is negligible against −λ x [x n (t)] + .For negative x n (t), we still have x n (t) ≈ V n (t).This means that we may replace in Equation (7).Taken together, under the assumptions [1,2,2'] we may use the replacements in Equation (7), which directly yield Equation (3).Note that this also implies r n (t) θ /2 if the neuron is spiking, so during active periods inter-spike-intervals need to be considerably smaller than the synaptic time scale.Equation (6) implies that the assumptions are justified for suitable parameters: For fixed parameters s and θ of the r-dynamics, we can choose sufficiently small λ x , large A nm and small γ to ensure assumptions [1,2,2'] (cf. the conditions highlighted in the section "Network architecture").On the other hand, for given dynamics Equation ( 6), we can always find a spiking system which generates the dynamics via Equations ( 3), ( 2) and ( 5), and satisfies the assumptions: We only need to choose τ s sufficiently small such that [1] is satisfied and the spike threshold sufficiently small such that [2,2'] are satisfied.For the latter, γ needs to be scaled like θ to maintain the dynamics of x n and V r needs to be computed from the expression for λ x .Interestingly, we find that also outside the range where the assumptions are satisfied, our approaches can still generate good results.
The recovery current in our model has the same time constant as the slow synaptic current.Indeed, experiments indicate that they possess the same characteristic timescales: Timescales for NMDA [29] and slow GABA A [30,31] receptor mediated currents are several tens of milliseconds.Afterdepolarizations have timescales of several tens of milliseconds as well [32,33,16,34,35].Another prominent class of slow inhibitory currents is mediated by GABA B receptors and has time scales of one hundred to a few hundreds of milliseconds [36].We remark that in our model the time constants of the afterdepolarization and the synaptic input currents may also be different without changing the dynamics: Assume that the synaptic time constant is different from that of the recovery current, but still satisfies the conditions that it is large against the inter-spike-intervals when the neuron is spiking and small against the timescale of [x n (t)] + .The synaptic current generated by the spike train of neuron n will then be approximately continuous and the filtering does not seriously affect its overall shape beyond smoothing out the spikes.As a consequence, the synaptic and the recovery currents are approximately proportional up to a constant factor that results from the different integrated contribution of individual spikes to them.Rescaling γ by this factor thus yields dynamics equivalent to the one with identical time constants.

Distributed encoding of continuous dynamics
In the above-described simple CSNs (CSNs with saturating synapses), each spiking neuron gives rise to one nonlinear continuous variable.The resulting condition that the inter-spike-intervals are small against the synaptic time constants if the neuron is spiking may in biological neural networks be satisfied for bursting or fast spiking neurons with slow synaptic currents.It will be invalid for different neurons and synaptic currents.The condition becomes unnecessary when the spiking neurons encode continuous variables collectively, i.e. if we partially replace the temporal averaging in r n (t) by an ensemble averaging.This can be realized by an extension of the above model, where only a lower, say J−, dimensional combination x(t) of the N −dimensional vectors V(t) and r(t) is continuous, where L and Γ are J × N matrices (note that Equation ( 5) is a special case with N = J and diagonal matrices L and Γ).We find that spiking networks with nonlinear dendrites Equation ( 4) can encode such a lower dimensional variable x(t).The x(t) satisfy J-dimensional standard equations describing non-spiking continuous rate networks used for reservoir computing [4,5,6], We denote the resulting spiking networks as CSNs with nonlinear dendrites.
The derivation (see Supplementary material for details) generalizes the ideas introduced in refs.[1,2,3] to the approximation of nonlinear dynamical systems: We assume an approximate decoding equation (cf.also Equation ( 9)), where Γ is a J × N decoding matrix and employ an optimization scheme that minimizes the decoding error resulting from Equation ( 12) at each time point.This yields the condition that a spike should be generated when a linear combination of x(t) and r(t) exceeds some constant value.We interpret this linear combination as membrane potential V(t).Solving for x(t) gives L and Γ in terms of Γ in Equation (S1).Taking the temporal derivative yields V(t), first in terms of ẋ(t) and ṙ(t) and after replacing them via Equations ( 2), (11), in terms of x(t), r(t) and s(t).We then eliminate x(t) using ( 12) and add a membrane potential leak term for biological realism and increased stability of numerical simulations.This yields Equation (4) together with the optimal values of the parameters given in Table 1.We note that the difference to the derivation in ref. [3] is the use of a nonlinear equation when replacing ẋ(t).We further note that the spiking approximation of the continuous dynamics becomes exact, if in the last step x(t) is eliminated using Equation (S1) and the leak term is omitted as it does not arise from the formalism in contrast to the case of CSNs with saturating synapses.
Like in CSNs with saturating synapses, using the approximated decoding Equation ( 12) eliminates the biologically implausible V-dependencies in the interaction terms.For an illustration of this coding see Fig. 1d.

Learning universal computations
Recurrent continuous rate networks are a powerful means for learning of various kinds of computations, like steering of movements and processing of sequences [4,5].For this, an input and/or an output feedback signal needs to be able to "enslave" the network's high-dimensional dynamics [7,28].This means that at any point in time the network's state is a deterministic function of the recent history of input and feedback signals.The function needs to be high dimensional, nonlinear, and possess fading memory.A standard model generating suitable dynamics are continuous rate networks of the form Equation (11).Due to the typically assumed random recurrent connectivity, each neuron acts as a randomly chosen, nonlinear function with fading memory.Linearly combining them like basis functions by a linear readout can approximate arbitrary, nonlinear functions with fading memory (time-scales are limited by the memory of the network), and in this sense universal computations on the input and the feedback.The feedback can prolong the fading memory and allow to generate self-contained dynamical systems and output sequences [4,3,5,38].The feedback can be incorporated into the network by directly training the recurrent synaptic weights [5,38].
Our understanding of the complex spiking dynamics of CSNs in terms of nonlinear first order differential equations enables us to apply the above theory to spiking neural networks: In the first step, we were able to conclude that our CSNs can generate enslaveable and thus computationally useful dynamics as they can be decoded to continuous dynamics that possess this property.In the second step, we have to ask which and how output signals should be learned to match a desired signal: In a biological setting, the appropriate signals are the sums of synaptic or dendritic input currents that spike trains generate, since these affect the somata of postsynaptic neurons as well as effectors such as muscles [39].To perform, e.g., a desired continuous movement, they have to prescribe the appropriate muscle contraction strengths.For both CSNs with saturating synapses and with nonlinear dendrites, we choose the outputs to have the same form as the recurrent inputs that a soma of a neuron within the CSN receives.Accordingly, in our CSNs with saturating synapses, we interpret sums of the postsynaptic currents as output signals, where the index k distinguishes K out different outputs, and w o km are the learnable synaptic output weights.For networks with nonlinear dendrites the outputs are a linear combination of inputs preprocessed by nonlinear dendrites where the strengths w o kj of the dendro-somatic coupling are learned [40].The networks can now learn the output weights such that z k (t) imitates a target signal F k (t), using standard learning rules for linear readouts (see Fig. 2a for an illustration).We employ the recursive least squares method [41].
To increase the computational and learning abilities, the output signals should be fed back to the network as an (additional) input (Fig. 2b) where each neuron receives a linear combination of the output signals z k (t) with static feedback connection strengths w f βk .Here and in the following Greek letter indices such as β, ρ range over all saturating synapses (β, ρ = 1, ..., N ; rβ (t) = tanh (γr β (t))) in CSNs with saturating synapses, or over all nonlinear dendrites (β, ρ = 1, ..., J; rβ (t) = tanh N m=1 Γ βm r m (t) ) in CSNs with nonlinear dendrites.
It often seems biologically more plausible not to assume a strong feedback loop that enslaves the recurrent network, but rather to train recurrent weights.Our CSNs allow for this (Fig. 2c (a) A static continuous signal coding spiking neural network ( CSN, gray shaded) serves as a spiking computational reservoir with high signal-to-noise ratio.The results of computations on current and past external inputs Ie can be extracted by simple neuronlike readouts.These linearly combine somatic inputs generated by saturating synapses or nonlinear dendrites, r (red), to output signals z (Equations (13,14)).The output weights w o are learned such that z approximates the desired continuous target signals.(b) Plastic continuous signal coding spiking neural networks (PCSNs) possess a loop that feeds the outputs z back via static connections as an additional input I f e (blue, Equation 15).Such networks have increased computational capabilities allowing them to, e.g., generate desired self-sustained activity.(c) The feedback loop can be incorporated into the recurrent network via plastic recurrent connections (red in gray shaded area).
dendrites (CSNs with nonlinear dendrites) and the soma [40] (we learn A nm , see Methods for details of the implementation).We note that approximating different dynamical systems, e.g.ones equivalent to Equation ( 11) but with the coupling matrix inside the nonlinearity [42], may also in CSNs with nonlinear dendrites allow to learn synaptic weights in similar manner.We call CSNs with learning of outputs in presence of feedback, or with learning of recurrent connections plastic continuous signal coding spiking neural networks (PCSNs).
To learn feedback and recurrent connections, we use the FORCE imitation learning rule, which has recently been suggested for networks of continuous rate neurons [5,38]: We use fast online learning based on the recursive least squares rule of the output weights in order to ensure that the output of the network is similar to the desired output at all times.Since during training the output is ensured to be close to the desired one, it can be used as feedback to the network at all times.The remaining deviations from the desired output are expected to be particularly suited as training noise as they reflect the system's inherent noise.As mentioned before, the feedback loop may be incorporated in the recurrent network connectivity.During training, the reservoir connections are then learned in a similar manner as the readout.
In the following, we show that our approach allows spiking neural networks to perform a broad variety of tasks.In particular, we show learning of desired self-sustained dynamics at a degree of difficulty that has, to our knowledge, previously only been accessible with continuous rate networks.

Applications Self-sustained pattern generation
Animals including humans can learn a great variety of movements, from periodic patterns like gait or swimming, to much more complex ones like producing speech, generating chaotic locomotion [43,44] or playing the piano.Moreover when an animal learns to use an object (Fig. 3a), it has to learn the dynamical properties of the object as well as how its body behaves when interacting with it.Especially for complex, non-periodic dynamics, a dynamical system has to be learned with high precision.
How are spiking neural networks able to learn dynamical systems, store them and replay their activity?We find that PCSNs may solve the problem.They are able to learn periodic patterns of different degree of complexity as well as chaotic dynamical systems by imitation learning.Fig. 3 illustrates this for PCSNs with nonlinear synapses (Fig. 3d,e) and with nonlinear dendrites (Fig. 3b,e,f).
The figure displays the recall of three periodic movements after learning: a sine wave, a more complicated non-differentiable saw tooth pattern and a "camel's hump" superposition of sine and cosine.Also for long simulation times, we find no deviation from the displayed dynamics except for an inevitable phase shift (Fig. Ga  This indicates that the network has learned a stable periodic orbit to generate the desired dynamics, the orbit is sufficiently stable to withstand the intrinsic noise of the system.Fig. 3 furthermore illustrates learning of a chaotic dynamical system.Here, the network learns to generate the time varying dynamics of all three components of the Lorenz system and produces the characteristic attractor pattern after learning (Fig. 3f).Due to the encoding of the dynamics in spike trains, the signal maintains a small deterministic error which emerges from the encoding of a continuous signal by discrete spikes (Fig. 3g).The individual training and recall trajectories quickly depart from each other after the end of learning since they are chaotic.However, also for long simulation times, we observe qualitatively the same dynamics, indicating that the correct dynamical system was learned (Fig. Gc in S1 Supporting Information).Occasionally, errors occur, cf. the larger loop in Fig. 3f.This is to be expected due to the relatively short training period, during which only a part of the phase space covered by the attractor is visited.Importantly, we observe that after errors the dynamics return to the desired ones indicating that the general stability property of the attractor is captured by the learned system.To further test these observations, we considered a not explicitly trained long-term feature of the Lorenz-dynamics, namely the tent-map which relates the height z n−1 of the (n−1)th local maximum in the z−coordinate, to the height z n of the subsequent local maximum.The spiking network indeed generates the map (Fig. 3h), with two outlier points corresponding to each error.
In networks with saturating synapses, the spike trains are characterized by possibly intermittent periods of rather high-frequency spiking.In networks with nonlinear dendrites, the spike trains can have low frequencies and they are highly irregular (Figs.3c, F in S1 Supporting Information).In agreement with experimental observations (e.g.[45]), the neurons can have preferred parts of the encoded signal in which they spike with increased rates.
The dynamics of the PCSNs and the generation of the desired signal are robust against dynamic and structural perturbations.They sustain noise inputs which would accumulate to several ten percent of the level of the threshold within the membrane time constant, for a neuron without further input (Fig. B in S1 Supporting Information).For larger deviations of W njm from their optimal values, PCSNs with nonlinear dendrites can keep their learning capabilities, if µ is tuned to a specific range.Outside this range, the capabilities break down at small deviations (Fig. C in S1 Supporting Information).
However, a slightly modified version of the models, where the reset is always to −θ (even if there was fast excitation that drove the neuron to spike by a suprathreshold input), has a high degree of robustness against such structural perturbations.We also checked that the fast connections are important, albeit substantial weakening can be tolerated (Fig. D in S1 Supporting Information).
The deterministic spike code of our PCSNs encodes the output signal much more precisely than neurons generating a simple Poisson code, which facilitates learning.We have quantified this using a comparison between PCSNs with saturating synapses and networks of Poisson neurons of equal size, both learning the saw tooth pattern in the same manner.Since both codes become more precise with increasing spike rate of individual neurons, we compared the testing error between networks with equal spike rates.Due to their higher signal-to-noise ratio, firing rates required by the PCSNs to achieve the same pattern generation quality are more than one order of magnitude lower (Fig. A in S1 Supporting Information).

Delayed reaction/time interval estimation
For many tasks, e.g.computations focusing on recent external input and generation of self-sustained patterns, it is essential that the memory of the involved recurrent networks is fading: If past states cannot be forgotten, they lead to different states in response to similar recent inputs.A readout that learns to extract computations on recent input will then quickly reach its capacity limit.In neural networks, fading memory originates on the one hand from the dynamics of single neurons, e.g.due to their finite synaptic and membrane time constants; on the other hand it is a consequence of the neurons' connection to a network [46,47,48].In standard spiking neural network models, the overall fading memory is short, of the order of hundreds of milliseconds [49,9,10,11].It is a matter of current debate how this can be extended by suitable single neuron properties and topology [1,3,50,51].
Many biological computations, e.g. the simple understanding of a sentence, require longer memory, on the order of seconds.
We find that CSNs without learning of recurrent connectivity or feedback access such time scales.
We illustrate this by means of a delayed reaction/time estimation task: In the beginning of a trial, the network receives a short input pulse.By imitation learning, the network output learns to generate a desired delayed reaction.For this, it needs to specifically amplify the input's dynamical trace in the recurrent spiking activity, at a certain time interval.The desired response is a Gaussian curve, representative for any type of delayed reaction.The reaction can be generated several seconds after the input (Fig. 4a-c).
The quality of the reaction pattern depends on the connection strengths within the network, specified by the spectral radius g of the coupling matrix divided by the leak of a single corresponding continuous unit λ x .Memory is kept best in an intermediate regime (Fig. 4b), where the CSN stays active over long periods of time without overwriting information.This has also been observed for continuous rate networks [52].For too weak connections (Fig. 4a), the CSN returns to the inactive state after short time, rendering it impossible to retrieve input information later.If the connections are too strong, (Fig. 4c), the CSN generates self-sustained, either irregular asynchronous or oscillating activity, partly overwriting information and hindering its retrieval.We observe that already the memory in disconnected CSNs with synaptic saturation can last for times beyond hundreds of milliseconds (cf.Fig. E in S1 Supporting Information).This is a consequence of the recovery current: If a neuron has spiked several times in succession, the accumulated recovery current leads to further spiking (and further recovery current), and thus dampens the decay of a strong activation of the neuron [53].
Experiments show that during time estimation tasks, neurons are particularly active at two times: When the stimulus is received and when the estimated time has passed [54,55].Often the neuron populations that show activity at these points are disjoint.Our model reproduces this behavior for networks with good memory performance.In particular, at the time of the initial input the recurrently connected neurons become highly active (gray traces in Fig. 4b, upper sub-panel) while at the estimated reaction time, readout neurons would show increased activity (red trace).

Persistent memory and context dependent switching
Tasks often also require to store memories persistently, e.g. to remember instructions [56].Such memories may be maintained in learned attractor states (e.g.[57,58,59,60]).In the framework of our computing scheme, this requires the presence of output feedback [3].Here, we illustrate the ability of PCSNs to learn and maintain persistent memories as attractor states as well as the ability to change behavior according to them.For this, we use a task that requires memorizing computational instructions (Fig. 4d) [3].The network has two types of inputs: After pulses in the instruction channels, it needs to switch persistently between different rules for computation on the current values of operand channels.To store persistent memory, the recurrent connections are trained such that an appropriate output can indicate the instruction channel that has sent the last pulse: The network learns to largely ignore the signal when a pulse arrives from the already remembered instruction channel, and to switch states otherwise.Due to the high signal-to-noise ratio of our deterministic spike code, the PCSNs are able to keep a very accurate representation of the currently valid instruction in their recurrent dynamics.Fig. 4d, middle sub-panel, shows this by displaying the output of the linear readout trained to extract this instruction from the network dynamics.A similarly high precision can be observed for the output of the computational task, cf.Fig. 4d, lower sub-panel.

Building of world models, and control
In order to control its environment, an animal has to learn the laws that govern the environment's dynamics, and to develop a control strategy.Since environments are partly unpredictable and strategies are subject to evolutionary pressure, we expect that they may be described by stochastic optimal control theory.A particularly promising candidate framework is path integral control, since it computes the optimal control by simulating possible future scenarios under different random exploratory controls, and the optimal control is a simple weighted average of them [61].For this, an animal needs an internal model of the system or tool it wants to act on.It can then mentally simulate different ways to deal with the system and compute an optimal one.Recent experiments indicate that animals indeed conduct thought experiments exploring and evaluating possible future actions and movement trajectories before performing one [62,63].
Here we show that by imitation learning, spiking neural networks, more precisely PCSNs with a feedback loop, can acquire an internal model of a dynamical system and that this can be used to compute optimal controls and actions.As a specific, representative task, we choose to learn and control a stochastic pendulum (Fig. 5a,b).The pendulum's dynamics are given by with the angular displacement φ relative to the direction of gravitational acceleration, the undamped angular frequency for small amplitudes ω 0 , the damping ratio c, a random (white noise) angular force ξ(t) and the deterministic control angular force u(t), both applied to the pivot axis.The PCSN needs to learn the pendulum's dynamics under largely arbitrary external control forces; this goes beyond the tasks of the previous sections.It is achieved during an initial learning phase characterized by motor babbling as observed in infants [64] and similarly in bird song learning [65]: During this phase, there is no deterministic control, u = 0, and the pendulum is driven by a random exploratory force ξ only.
Also the PCSN receives ξ as input and learns to imitate the resulting pendulum's dynamics with its output.
During the subsequent control phase starting at t = 0, the aim is to swing the pendulum up and hold it in the inverted position (Fig. 5c).For this, the PCSN simulates at time t a set of M future trajectories of the pendulum, for different random exploratory forces ξ i ("mental exploration" with u = 0, cf.Fig. 5a,b), starting with the current state of the pendulum.In a biological system, the initialization may be achieved through sensory input taking advantage of the fact that an appropriately initialized output enslaves the network through the feedback.Experiments indicate that explored trajectories are evaluated, by brain regions separate from the ones storing the world model [66,67,68].We thus assign to the simulated trajectories a reward R i measuring the agreement of the predicted states with the desired ones.The optimal control u(t + s) (cf.Equation ( 16)) for a subsequent, not too large time interval s ∈ [0, δ] is then approximately given by a temporal average over the initial phase of the assumed random forces, weighted by the exponentiated total expected reward, where ξi (t) = 1 δ t+δ t ξ i ( t)d t and λ c is a weighting factor.We have chosen R i (t) = t+Tr t y i ( t)d t, i.e. the expected reward increases linearly with the heights y i ( t) = − cos(φ i ( t)) predicted for the pendulum for input trajectory ξ i ; it becomes maximal for a trajectory at the inversion point.T r is the duration of a simulated trajectory.The optimal control is applied to the pendulum until t + ∆, with ∆ < δ.Then, at t + ∆, the PCSN simulates a new set of trajectories starting with the pendulum's updated state and a new optimal control is computed.This is valid and applied to the pendulum between t + ∆ and t + 2∆, and so on.We find that controlling the pendulum by this principle leads to the desired upswing and stabilization in the inversion point, even though we assume that the perturbing noise force ξ (Equation ( 16)) acting on the pendulum in addition to the deterministic control u, remains as strong as it was during the exploration/learning phase (cf.Fig. 5a,b).
We find that for controlling the pendulum, the learned internal model of the system has to be very accurate.This implies that particular realizations of the PCSN can be unsuited to learn the model (we observed this for about half of the realizations), a phenomenon that has also been reported for small continuous rate networks before.However, we checked that continuous rate networks as encoded by our spiking ones reliably learn the task.Since the encoding quality increases with the number of spiking neurons, we expect that sufficiently large PCSNs reliably learn the task as well.

Discussion
The characteristic means of communication between neurons in the nervous system are spikes.It is widely accepted that sequences of spikes form the basis of neural computations in higher animals.
How computations are performed and learned is, however, largely unclear.Here we have derived continuous signal coding spiking neural networks (CSNs), a class of mesoscopic spiking neural networks that are a suitable substrate for computation.Together with plasticity rules for their output or recurrent connections, they are able to learn general, complicated computations by imitation learning (plastic CSNs, PCSNs).Learning can be highly reliable and accurate already for comparably small networks of hundreds of neurons.The underlying principle is that the networks reflect the input in a complicated nonlinear way, generate nonlinear transformations of it and use fading memory such that the inputs and their pasts interfere with each other.This requires an overall nonlinear relaxation dynamics suitable for computations [4].Such dynamics are different from standard spiking neural network dynamics, which are characterized by a high level of noise and short intrinsic memory [9,10,69,11].
To find spiking networks that generate appropriate dynamics, we use a linear decoding scheme for continuous signals encoded in the network dynamics as combinations of membrane potentials and synaptic currents.A specific coding scheme like this was introduced in refs.[1,3] to derive spiking networks encoding linear dynamics in an optimal way.We introduce spiking networks where the encoded signals have dynamics desirable for computation, i.e. a nonlinear, high-dimensional, lownoise, relaxational character as well as significant fading memory.We conclude that, since we use simple linear decoding, already the dynamics of the spiking networks must possess these properties.
Using this approach, we study two types of CSNs: Networks with saturating synapses and networks During model building, random exploratory control drives the dynamical system (here: a swinging pendulum).The spiking neural network is provided with the same control as input and learns to mimic the behavior of the pendulum as its output.(b): After learning, the spiking network can simulate the system's response to control signals.The panel displays the height of the real pendulum in the past (solid black line) and future heights under different exploratory controls (dashed lines).For the same controls, the spiking neural network predicts very similar future positions (colored lines) as the imitated system.It can therefore be used for mental exploration and computation of optimal control to reach an aim, here: to invert the pendulum.(c): During mental exploration, the network simulates in regular time intervals a set of possible future trajectories for different controls, starting from the actual state of the pendulum.From this, the optimal control until the next exploration can be computed and applied to the pendulum.The control reaches its aim: The pendulum is swung up and held in inverted position, despite a high level of noise added during testing (uncontrolled dynamics as in panel (a)).
with nonlinear dendrites.The CSNs with saturating synapses use a direct signal encoding; each neuron codes for one continuous variable.It requires spiking dynamics characterized by possibly intermittent phases of high rate spiking, or bursting, with inter-spike-intervals smaller than the synaptic time constants, which leads to a temporal averaging over spikes.Dynamics that appear externally similar to such dynamics were recently highlighted as a 'second type of balanced state' in networks of pulse-coupled, intrinsically oscillating model neurons [51].Very recently [70,71] showed that networks whose spiking dynamics are temporally averaged due to slow synapses possess a phase transition from a fixed point to chaotic dynamics in the firing rates, like the corresponding rate models that they directly encode.In the analytical computations the spike coding was not specified [70] or assumed to be Poissonian [71].Numerical simulations of leaky integrate-and-fire neurons in the chaotic rate regime can generate intermittent phases of rather regular high-rate spiking [70].The networks might provide a suitable substrate for learning computations as well.However, since the chaotic rate dynamics have correlations on the time scale of the slow synapses its applicability is limited to learning tasks where only a short fading memory of the reservoir is needed.For example delayed reaction tasks as illustrated in Fig. 4a-c would not be possible.Interestingly, in our scheme a standard leaky integrate-and-fire neuron with saturating synapses appears as a special case with recovery current of amplitude zero.According to our analysis it can act as a leaky integrator with a leak of the same time constant as the synapses, λ x = λ s .In contrast, in presence of a recovery current, our networks with saturating synapses can encode slower dynamics on the order of seconds.
After training the network, the time scales can be further extended.
In the CSNs with nonlinear dendrites the entire neural population codes for a usually smaller number of continuous variables, avoiding high firing rates in sufficiently large networks.The networks generate irregular, low frequency spiking and simultaneously a noise-reduced encoding of nonlinear dynamics, the temporal averaging over spikes in the direct coding case is partially replaced by a spatial averaging over spike trains from many neurons.The population coding scheme and our derivations of CSNs with nonlinear dendrites generalize the predictive coding proposed in ref. [3] to nonlinear dynamics.The roles of our slow and fast connections are similar to those used there: In particular, redundancies in the spiking are eliminated by fast recurrent connections without synaptic filtering.We expect that these couplings can be replaced by fast connections that have small finite synaptic time constants, as shown for the networks of ref. [3] in ref. [72].In contrast to previous work, in the CSNs with nonlinear dendrites we have linear and nonlinear slow couplings.The former contribute to coding precision and implement linear parts of the encoded dynamics, the latter implement the nonlinearities in the encoded dynamics.Further, in contrast to previous work, the spike coding networks provide only the substrate for learning of general dynamical systems by adapting their recurrent connections.Importantly, this implies (i) that the neurons do not have to adapt their nonlinearities to each nonlinear dynamical system that is to be learned (which would not seem biologically plausible) and (ii) that the CSNs do not have to provide a faithful approximation of the nonlinear dynamics Equations ( 6), (11), since a rough dynamical character (i.e.slow dynamics and the echo state property) is sufficient for serving as substrates.We note that refs.[73,74] suggested to use the differential equations that characterize dynamical systems to engineer spiking neural networks that encode the dynamics.The approach suggests an alternative derivation of spiking networks that may be suitable as substrate for learning computations.Their rate coding scheme, however, allows for redundancy and thus higher noise levels, and it generates high frequency spiking.In a future publication, B. DePasquale, M. Churchland, and L.F. Abbott will present an approach to train rate coding spiking neural networks, with continuous rate networks providing the target signals [75].We will discuss the relation between our and this approach in a joint review [76].
A characteristic feature of our neuron models is that they take into account nonlinearities in the synapses or in the dendrites.On the one hand this is biologically plausible [13,19,20,21], on the other hand it is important for generating nonlinear computations.Our nonlinearities are such that the decoded continuous dynamics match those for typical networks of continuous rate neurons and provide a simple model for dendritic and synaptic saturation.However, the precise form of the neuron model and its nonlinearity is not important for our approaches: As long as the encoded dynamical system is suitable as a computational reservoir, the spiking system is a CSN and our learning schemes will work.As an example, a dendritic tree with multiple interacting compartments may be directly implemented in both the networks with saturating synapses and in the networks with nonlinear dendrites.A future task is to explore the computational capabilities of CSNs incorporating different and biologically more detailed features that lead to nonlinearities, e.g.neural refractory periods, dendritic trees with calcium and NMDA voltage dependent channels and/or standard types of short term synaptic plasticity.
Inspired by animals' needs to generate and predict continuous dynamics such as their own body and external world movements, we let our networks learn to approximate desired continuous dynamics.
Since effector organs such as muscles and post-synaptic neurons react to weighted, possibly dendritically processed sums of post-synaptic currents, we interpret these sums as the relevant, continuous signal-approximating outputs of the network [39].Importantly, this is not the same as Poissonian rate coding of a continuous signal: As a simple example, consider a single spiking neuron.In our scheme it will spike with constant inter-spike-intervals to encode a constant output.In Poissonian rate coding, the inter-spike-intervals will be random, exponentially distributed and many more spikes need to be sampled to decode the constant output (cf.The outputs and recurrent connections of CSNs can be learned by standard learning rules [41,5].
The weight changes depend on the product of the error and the synaptic or dendritic currents and may be interpreted as delta-rules with synapse-and time-dependent learning rates.PCSNs, with learning of recurrent weights or output feedback, show how spiking neural networks may learn internal models of complicated, self-sustained environmental dynamics.In our applications, we demonstrate that they can learn to generate and predict the dynamics in different depths, ranging from the learning of single stable patterns over the learning of chaotic dynamics to the learning of dynamics incorporating their reactions to external influences.
The spiking networks we use have medium size, like networks with continuous neurons used in the literature [4,5].CSNs with saturating synapses have, by construction, the same size as their non-spiking counterparts.In CSNs with nonlinear dendrites the spike load necessary to encode the continuous signals is distributed over the entire network.This leads to a trade-off between lower spiking frequency per neuron and larger network size (cf.Previous work using spiking neurons as a reservoir to generate a high dimensional, nonlinear projection of a signal for computation, concentrated on networks without output feedback or equivalent taskspecific learning of recurrent connectivity [1,77,50].Such networks are commonly called "liquid state machines" [78].By construction, they are unable to solve tasks like the generation of self-sustained activity and persistent memorizing of instructions; these require an effective output feedback, since the current output determines the desired future one: To compute the latter, the former must be made available to the network as an input.The implementation of spiking reservoir computers with feedback was hindered by the high level of noise in the relevant signals: The computations depend on the spike rate, the spike trains provide a too noisy approximation of this average signal and the noise is amplified in the feedback loop.While analytically considering feedback in networks of continuous rate neurons, ref. [3] showed examples of input-output tasks solved by spiking networks with a feedback circuit, the output signals are affected by a high level of noise.This concerns even output signals just keeping a constant value.We implemented similar tasks (Fig. 4d), and find that our networks solve them very accurately due to their more efficient coding and the resulting comparably high signal-tonoise ratio.In contrast to previous work, our derivations systematically delineate spiking networks which are suitable for the computational principle with feedback or recurrent learning; the networks can accurately learn universal, complicated memory dependent computations as well as dynamical systems approximation, in particular the generation of self-sustained dynamics.
In the control task, we show how a spiking neural network can learn an internal model of a dynamical system, which subsequently allows to control the system.We use a path integral approach, which has already previously been suggested as a theory for motor control in biological systems [79,80].
We apply it to learned world models, and to neural networks.Path integral control assumes that noise and control act in a similar way on the system [61].This assumption is comparably weak and the path integral control method has been successfully applied in many robotics applications [81,82,83], where it was found to be superior to reinforcement learning and adaptive control methods.
Continuous rate networks using recurrence, readouts, and feedback or equivalent recurrent learning, are versatile, powerful devices for nonlinear computations.This has inspired their use in manifold applications in science and engineering, such as control, forecasting and pattern recognition [6].
Our study has demonstrated that it is possible to obtain similar performance using spiking neural networks.Therewith, our study makes spiking neural networks available for similarly diverse, complex computations and supports the feasibility of the considered computational principle as a principle for information processing in the brain.

Network simulation
We use a time grid based simulation scheme (step size dt).If not mentioned otherwise, between time points, we compute the membrane potentials using a Runge-Kutta integration scheme for dynamics without noise and an Euler-Maruyama integration scheme for dynamics with noise.Since CSNs with nonlinear dendrites have fast connections without conduction delays and synaptic filtering, we process spikings at a time point as follows: We test whether the neuron with the highest membrane potential is above threshold.If the outcome is positive, the neuron is reset and the impact of the spike on postsynaptic neurons is evaluated.Thereafter, we compute the neuron with the highest, possibly updated, membrane potential and repeat the procedure.If all neurons have subthreshold membrane potential, we proceed to the next time point.The described consecutive updating of neurons in a single time step increases in networks with nonlinear dendrites the robustness of the simulations against larger time steps, as the neurons maintain an order of spiking and responding like in a simulation with smaller time steps and a small but finite conduction delay and/or slight filtering of fast inputs.As an example, the scheme avoids that neurons that code for similar features and thus possess fast mutual inhibition, spike together within one step and generate an overshoot in the readout, as it would be the case in a parallel membrane potential updating scheme.The different tasks use either networks with saturating synapses or networks with nonlinear dendrites.In both cases, A is a sparse matrix with a fraction p of non-zero values.These are drawn independently from a Gaussian distribution with zero mean and variance g 2 pN (CSNs with saturating synapses) or pJ (CSNs with nonlinear dendrites), which sets the spectral radius of A approximately to g.For networks with nonlinear dendrites, the elements of Γ are drawn from a standard normal distribution.
To keep the approach simple, we allow for positive and negative dendro-somatic couplings.In order to achieve a uniform distribution of spiking over the neurons in the network, we normalize the columns of Γ to have the same norm, which we control with the parameter γ s .This implies that the thresholds are identical.

Training phase
The networks are trained for a period of length T t such that the readouts z k imitate target signals , where the index β runs from 1 to N (CSNs with saturating synapses) or from 1 to J (CSNs with nonlinear dendrites).The weights w βk are fixed and drawn from a uniform distribution in the range [− wi , wi ].If present, the feedback weights w f βk (cf.Equation ( 15)) are likewise chosen randomly from a uniform distribution in the range [− w f , w f ] with a global feedback parameter w f .For the delayed reaction/time estimation task (Figs.4a-c, E in S1 Supporting Information), we applied the RLS (recursive least squares) algorithm [41] to learn the linear outputs.For the pattern generation, instruction switching and control tasks, we applied the FORCE (first-order reduced and controlled error) algorithm [5] (Figs. 3, 4d, 5, A-D, F and G in S1 Supporting Information) to learn the recurrent connections and linear outputs.

Learning rules
The output weights w o km are trained using the standard recursive least squares method [41].They are initialized with 0, we use weight update intervals of ∆t.The weight update uses the current training error e k (t) = z k (t) − F k (t), where z k (t) is the output that should imitate the target signal F k (t), it further uses an estimate P βρ (t) of the inverse correlation matrix of the unweighted neural synaptic or dendritic inputs rβ (t), as well as these inputs, The indices β, ρ range over all saturating synapses (β, ρ = 1, ..., N ; rβ (t) = tanh (γr β (t))) or all nonlinear dendrites (β, ρ = 1, ..., J; rβ (t) = tanh N m=1 Γ βm r m (t) ) of the output neuron.The square matrix P is a running filter estimate of the inverse correlation matrix of the activity of the saturated synapses (CSNs with saturating synapses) or non-linear dendrites (CSNs with nonlinear dendrites).
The matrix is updated via where the indices β, γ, ρ, σ run from 1 to N (CSNs with saturating synapses) or from 1 to J (CSNs with nonlinear dendrites).P is initialized as P(0) = α −1 1 with α −1 acting as a learning rate.
For the update of output weights in presence of feedback and of recurrent weights we adopt the FORCE algorithm [5].In presence of feedback, this means that recursive least squares learning of output is fast against the temporal evolution of the network, and already during training the output is fed back into the network.Thus, each neuron gets a feedback input The feedback weights w f βk are static, the output weights are learned according to Equation (18).Since the outputs are linear combinations of synaptic or dendritic currents, which also the neurons within the network linearly combine, the feedback loop can be implemented by modifying the recurrent connectivity, by adding a term K out k=1 w f ρk w o kβ to the matrix A ρβ .Learning then affects the output weights as well as the recurrent connections, separate feedback connections are not present.This learning and learning of output weights with a feedback loop are just two different interpretations of the same learning rule.For networks with saturating synapses the update is where the w f nk are now acting as learning rates.For networks with nonlinear dendrites, the update is

Control task
The task is achieved in two phases, the learning and the control phase.
1. Learning: The PCSN learns a world model of the noisy pendulum, i.e. it learns the dynamical system and how it reacts to input.The pendulum follows the differential Equation ( 16) with cω 0 = 0.1s −1 and ω 2 0 = 10s −2 , ξ(t) is a white noise force with ξ(t)ξ(t ) = s −3 δ(t − t ), x(t) = sin(φ(t)) and y(t) = − cos(φ(t)) are Cartesian coordinates of the point mass.The neural network has one input and three outputs which are fed back into the network; it learns to output the x-and the y-coordinate, as well as the angular velocity of the pendulum when it receives as input the strength of the angular force (noise plus control) ξ(t) + u(t) applied to the pivot axis of the pendulum.The learning is here interpreted as learning in a network with feedback, cf.Equation (20).
We created a training trajectory of length T t = 1000s by simulating the pendulum with the given parameters and by driving it with white noise ξ(t) as an exploratory control (u(t) = 0).Through its input, the PCSN receives the same white noise realization ξ(t).During training the PCSN learns to imitate the reaction of the pendulum to this control, more precisely its outputs learn to approximate the trajectories of x, y and ω.As feedback to the reservoir during training we choose a convex combination of the reservoir output and the target (feedback = 0.9 • output + 0.1 • target).We find that such a combination improves performance: If the output at the beginning of the training is very erroneous, those errors are accumulated through the feedback-loop, which prevents the algorithm from working.On the other hand, if one feeds back only the target signal, the algorithm does not learn how to correct for feedback transmitted readout errors.In our task, the convex combination alleviates both problems.
2. Control: In the second phase, the learned world model of the pendulum is used to compute stochastic optimal control that swings the pendulum up and keeps it in the inverted position.The PCSN does not learn its weights in this phase anymore.It receives the different realizations of exploratory (white noise) control and predicts the resulting motion ("mental exploration").From this, the optimal control may be computed using the path integral framework [61].In this framework a stochastic dynamical system (which is possibly multivariate) with arbitrary nonlinearity f (x(t)) and white noise ξ(t), is controlled by the feedback controller u(x(t), t) to optimize an integral C(t) over a state cost U (x( t)) and a moving horizon quadratic control cost, The reward is related to the cost by R = −C.Path integral control theory shows that the control at time t can be computed by generating samples from the dynamical system under the uncontrolled dynamics The control is then given by the success weighted average of the noise realizations ξ i where C i (t) = t+Tr t U (x i ( t))d t is the cost observed in the ith realization of the uncontrolled dynamics, which is driven by noise realization ξ i and u = 0. Equation ( 17) is a discrete approximation to Equation (25).In our task, Equation (24) becomes and U (x(t)) = −y(t) = cos(φ(t)).

Figure details
The parameters of the different simulations are given in Table 2 for simulations using saturating synapses and in Table 3 for simulations using nonlinear dendrites.Further parameters and details about the figures and simulations are given in the following paragraphs.
If not mentioned otherwise, for all simulations we use g  Nonlin.dendr.0.01s, γ = θ and σ η = 0 1 √ s .We note that for simulations with saturating synapses, we model the slow synaptic currents to possess synaptic time constants of 100ms (cf., e.g., [50,60]).We usually use the same value for the slow synapses in networks with nonlinear dendrites.Upon rescaling time, these networks can be interpreted as networks with faster time constants, which learn faster target dynamics.Since the spike rates scale likewise, we have to consider larger networks to generate rates in the biologically plausible range (cf. with parameters σ = 10, ρ = 28, β = 8/3; we set the dimensionless temporal unit to 0.2s and scale the dynamical variables by a factor of 0.1.Panels (f,g) show a recall phase of 400s, panel (h) shows points from a simulation of 4000s.Panel (f) only shows every 10th data point, panel (g) shows every data point.The mean spike rate is 432Hz.

Figure 4
Figure 4a-c: We quantified the memory capacity of a CSN with saturating synapses.The network has a sparse connectivity matrix A without autapses.We applied white noise with σ η = 0.001 1 √ s .The input is a Gaussian bell curve with σ = 0.2s and integral 10s (height normalized to one in the figure).The target is a Gaussian bell curve with σ = 1s and integral 1s (height normalized to one in the figure).The target is presented several seconds after the input.Trials consisting of inputs and subsequent desired outputs are generated at random times with exponential inter-trial-interval distribution with time constant 10s and a refractory time of 100s.Training time is T t = 800s, i.e. the network is trained with about 6 to 8 trials.Testing has the same duration with a similar number of trials.There is no feedback introduced by initialization or by learning, so the memory effect is purely inherent to the random network.We compute the quality of the desired output generation as the root mean squared (RMS) error between the generated and the desired response, normalized by the number of test trials.As reference, we set the error of the "extinguished" network, which does not generate any reaction to the input, to 1. Lower panels of Fig. 4a-c   The pulsed instruction input is created by the convolution of a Poisson spike train with an exponential kernel e −t 1 s .The rate of the delta pulses during training is 0.04 1 s .During testing we choose a slower rate of 0.01 1 s for a clearer presentation.In the rare case when two pulses overlap such that the pulsed signal exceeds an absolute value of 1.01 times the maximal pulse height of one, we shift the pulse by the minimal required amount of time to achieve a sum of the pulses below or equal to 1.01.We use weights w i,p = 100 1 s for the pulsed inputs, w i,c = 250 1 s for the continuous inputs and w f = 250 1 s for the feedback; g = 75 1 s .The recurrent weights of the network are trained with respect to the memory target F m (t).This target is +1 if the last instruction pulse came from f p 1 and it is −1 if the last pulse came from f p 2 .During switching the target follows the integral of the input pulse.The corresponding readout is z m .The second readout z c is trained to output the absolute value of the difference of the two continuous inputs, if the last instruction pulse came from f p 1 , and to output their sum, if the last instruction pulse came from f p 2 .The specific analytical form of this target is The mean spike rate is 5.53Hz.

Figure 5
Since we have white noise as input we use the Euler-Maruyama scheme in all differential equations.
The PCSN has nonlinear dendrites.Non-plastic coupling strengths are w f,y = 100 1 s for the feedback of the y-coordinate, w f,x = 100 1 s for the feedback of the x-coordinate, w f,ω = 20 1 s for the feedback of the angular velocity and w i = 2 7 1 s for the input.We introduce an additional random constant bias term into the nonlinearity to increase inhomogeneity between the neurons: The nonlinearity is tanh b j + N m=1 W njm r m (t) where b j is drawn from a Gaussian distribution with standard deviation 0.01.The integration time δ is 0.1s.During the control/testing phase, every ∆ = 0.01s, M = 200 samples of length T r = 1s are created, the cost function is weighted with λ c = 0.01 1 s .The mean spike rate is 146Hz.

S1 Supporting information
1 Supporting text and figures 1.1 Networks with nonlinear dendrites stated in the main text, we can generalize Equation (3) by introducing fast connections that generate discontinuities in postsynaptic neurons when a neuron spikes.We may then require that only a lower, say J−, dimensional combination x(t) of the N −dimensional vectors V(t) and r(t) is continuous, where L and Γ are J × N matrices.(For clarity of presentation we will use vector/matrix notation instead of components throughout the present section.)The benefit of this approach is that the spike trains of a larger population of neurons contribute to each x n , such that a modified analogue to Equation ( 9), with a J × N matrix Γ can hold even if the spike rates of individual neurons are low, i.e. if we make use of population/distributed coding.The matrix L is fixed (except for degenerate cases) as soon as the matrix Γ and the fast changes in V are fixed.However, it is not a priori clear how to choose the latter two; we need to employ some optimization scheme to ensure both a good approximation Equation (S2) and a low firing rate.
For this, we start anew, and in contrast to the previous section with the dynamics for x(t).From these we will derive spiking dynamics approximating the x(t).We begin with a general J-dimensional nonlinear dynamical system yielding x(t), where f (x) and c(t) are column vectors of functions f j (x 1 , ..., x N ) and external inputs c j (t), respectively.We will generalize an approach introduced in refs.[1,2,3] to nonlinear systems and derive spiking dynamics that optimally (see below) approximate x(t) satisfying Equation (S3).The approach will yield Equation (S1) with a specific L as by-product.We will find that the dynamics of individual neurons depend on the f j and we will specify these functions such that the neural dynamics are biologically plausible and suitable for universal computation.
We choose the momentary error or cost function Systems of the form Equation (S17) are known to be suitable for universal computation [4,5,6], in particular for appropriate A they can maintain longer-term fading memory.Since the x are dynamical quantities linearily derived from the underlying spiking network, already the underlying spiking network is suitable for computations.
Furthermore, the structure of Equation (S16) allows for a straightforward interpretation in biological terms: The response of neuron n's soma to slow input to its J dendrites is modeled by the term ).The inputs have non-negligible synaptic time constant (cf.r(t)), they are linearly summed and thereafter subjected to a dendritic sublinearity (tanh).The coupling strength of a synaptic connection from neuron m to the jth dendrite of neuron n is given by Γ jm , the coupling strength from the jth dendrite of neuron n to its soma is (Γ T A) nj .Further fast and slow inputs arriving near the soma (and thus not subject to a dendritic non-linearity) are incorporated by the terms − Γ T Γ + µ1 s(t) and aΓ T Γ + µλ s 1 r(t).Their impact is characterized by the product Γ T Γ of the decoding matrix with itself and the comparably small weight µ of the spike frequency penalty term, the positive diagonal terms incorporate the reset of the neurons after a spike and a slower recovery.

A sufficient condition for the echo state property of the dynamics Equation (6)
When does (Equation ( 6) of the main text) possess the echo state property?Dynamics have this property if after sufficiently long time any initial conditions are washed out and the state of the system is completely determined by the input.This is definitely the case if the distance between trajectories decays at least exponentially with a rate independent of the input [7].We will prove the latter for our dynamics Equation (S18).For this, we will consider the difference ∆(t) = x 1 (t) − x 2 (t) and the Euclidean distance ∆(t) of two trajectories that satisfy Equation (S18) and have different initial conditions x 1 (0),x 2 (0) but the same input I(t).We will show that under the condition A < min(λ V , λ x ), with A being the spectral norm (the largest singular value) of A, an inequality ˙ ∆(t) ≤ − ∆(t) holds for some > 0 (as usual the dot denotes the temporal derivative of the entire expression below, here ∆(t) ).
We start with the expression 1 2 ˙ ∆(t) 2 = ∆(t) ∆(t) and replace the right hand side using ∆(t) = x 1 (t) − x 2 (t) and Equation (S18), which leads to To proceed we use the three inequalities xy ≤ x y , Ax ≤ A x and tanh(x) − tanh(y) ≤ x − y , which allow to estimate We now simplify the right hand side of the inequality further.For this use that for every pair of real valued vectors x and y we have The result can be used to bound the right hand side of Equation (S20) by a simpler expression, Now we assume A < min(λ V , λ x ) such that we can write λ V = V + A and λ x = x + A with V > 0 and x > 0. Using this in Equation (S19) yields and together with Equation (S22) Both terms on the right hand side are smaller or equal to zero, We can therefore set = min( V , x ) > 0 and simplify The noise level is given in terms of the standard deviation generated by a purely noise-driven subthreshold membrane potential (Ornstein-Uhlenbeck process) with membrane time constant λV , in multiples of the threshold.The insets display the testing phase for two examples using the setup with nonlinear dendrites (crosses in the main plot denote the corresponding noise and error levels).
given in terms of the standard deviation generated by a purely noise-driven subthreshold membrane potential (Ornstein-Uhlenbeck process) with membrane time constant λ V , in multiples of the threshold (which equals θ/2 in the case of saturating synapses and θ in the case of non-linear dendrites), i.e. noise-level := threshold .The error is determined as RMS error between the desired signal and the signal generated during testing.

Robustness of PCSNs with nonlinear dendrites against structural perturbations
In CSNs with nonlinear dendrites, the optimal strengths of the couplings from other neurons to the nonlinear dendrites and the fast couplings are independent of the parameters of the encoded nonlinear dynamical system Equation (S17).Since deviations from the optimal values in these couplings in general do not imply a simple change in the encoded dynamics but impair the coding scheme, we here investigate robustness of PCSN-learning against them.
The optimal coupling strength from neuron m to the jth nonlinear dendrite of neuron n is W njm = Γ jm .
Here we test the robustness of the learning scheme against deviations from the optimal couplings.We find that PCSN learning is robust, if we modify the neuron model to have a "precise reset", i.e. the reset is always to the fixed value −θ (−2θ below threshold θ), even if fast excitation to a suprathreshold potential caused the spike (Fig. C).In the native model the reset has fixed size −2θ such that the membrane potential would be reset to a value larger than −θ after a suprathreshold excitation.We note that the reset to a fixed value may also be biologically more plausible than a reset of fixed size.3d, A).The W njm are perturbed proportionally to the strength of their optimal values, the perturbed couplings are given by W njm = Γ jm 1 + W pert Ξ n jm , where the Ξ n jm are independently drawn from a Gaussian distribution with mean zero and variance one.We adopt the interpretation of the PCSNs as networks with plastic recurrent connections: The outputs, the output weight updates, the inverse correlation matrices and the updates of the dendrite-to-soma weights D nj are computed using Equations ( 14), ( 18), ( 19) and (22) with unperturbed readouts rj (t) = tanh N m=1 Γ jm r m (t) .We note that updating the D nj is not equivalent to a static feedback of the updated overall network readout anymore, since the latter does not contain the perturbed W njm .
Our findings raise the question why the networks with precise reset are much more stable to perturbations in the dendrites.We find that the instability in the simulations of the conventional model is due to an explosion of the spike rates, such that every neuron spikes once at every simulated time step.This is due to the "ping pong effect" already described in ref. [3]: Neurons that spike due to excitation from fast connections generate further suprathreshold excitation and in the end all neurons have spiked within a single step.In [3] this problem is solved using a higher value of µ.We find for our simulations with perturbed W njm that a fine tuning of µ is required to prevent a breakdown of the learning.In contrast, the precise reset solves this problem robustly.This is a consequence of the fact that on the one hand the precise reset yields a membrane potential that is further away from the threshold and thus reduces the chance of re-excitation of a neuron that has recently spiked (µ has an in principle similar effect).On the other hand, the difference from the theory arises only after suprathreshold excitation, the change in the precise spiking dynamics is small and the spike coding scheme is preserved.
The optimal strength of a fast coupling from neuron m to neuron n is U nm = J j=1 Γ jn Γ jm + µδ nm (which is independent of N for fixed neuron threshold).In "balanced state" irregular spiking networks, such recurrent connections may be expected to be much smaller, since they scale with 1/ √ N for fixed neuron thresholds [9].We therefore test the dependence of the PCSNs on these connections by a multiplicative weakening by a factor 1 − s pert , U pert nm = (1 − s pert )U nm for n = m (resets are kept U pert nn = U nn ).Fig. D shows that PCSNs are robust against this: The learning capabilities are conserved, if s pert < 10%.The figure also indicates that the fast connections are important for PCSN functionality, since the learning abilities are quickly lost as s pert increases beyond this range.

Comparison of network sizes and the spike rate-network size trade-off
We illustrate that for PCSNs, network sizes comparable to those of continuous rate networks solving the same task with FORCE learning can be sufficient.Further, we show that there is a trade-off between network size and spike rate of individual neurons.As example we use the "camel's hump" task (cf.Fig. 3e).Fig. Fa shows that continuous networks Equation ( 6) can learn the signal well for about N > 40, we use N = 50 as a reference.We compare with PCSNs with nonlinear dendrites that encode a system Equation ( 6) with the same parameters (in particular J = 50), and are trained to solve the same task.We compare the RMS error and spiking frequencies for different PCSN sizes N and γ s , which regulates the threshold of the neurons (cf.Methods and Equation (S10)).small γ s also smaller networks learn the task well, but all networks generate a higher spike frequency.

Error evolution for longer times
We do not observe lasting changes of the error in long term simulations.For periodic signals, there is an inevitable phase shift.It originates from the small error between the period of the desired and the learned signal; this error accumulates over time.Apart from that, tested features such as the deviation from the desired dynamics (RMS error) and the spike frequency are remarkably constant.the good agreement with the tent-map generated by the teacher dynamics also for long times.In the displayed simulation, the Lorenz dynamics deviate three times from the desired dynamics, which leads to the six outliers in Fig. 3h.However, the dynamics return every time to the desired dynamics such that the errors just generate single outlier pairs in the tent map and the qualitative dynamics agree with those of the Lorenz system still after 4000s. 2 Supporting methods

Details on the supporting figures
The parameters of the different simulations are given in Table A for simulations using saturating synapses and in Table B for simulations using nonlinear dendrites.Further parameters and details about the figures and simulations are given in the following paragraphs.

Figure A
The signal has period 2s and amplitude 10.The parameters of the Poisson networks are N = 50, α = 0.01, g = 1.5 1 s , T t = 10.5s,λ −1 s = 100ms, λ −1 V = 1s, dt = 0.1/s 0 , the sparse matrix A has a fraction p = 0.1 of nonzero entries, which are drawn from a Gaussian distribution with zero mean and variance g 2 pN .s 0 is swept between 50

Figure C
The signal has period 2s and amplitude 10 (normalized to one in the figure).We take medians and quartiles over 50 trials.We use the Euler method to integrate the differential Equations.
Plotted are the median of the RMS error (shaded: intervals between first and third quartile) between the signal and the target in the first 30s after training.The error is computed using the normalized version of the signal, where the amplitude is set to 1.The sweep covers 80 equidistant values of W pert from 0 to 0.2.
For simulations that showed pathological spiking (more than 200 spikes per time step in the numerical simulation) we assigned an infinite error.

Figure D
The signal has period 2s and amplitude 10 (normalized to one in the figure).We take medians and quartiles over 50 trials.We use the Euler method to integrate the differential Equations.
Plotted are the median of the RMS error (shaded: intervals between first and third quartile) between the signal and the target in the first 30s after training.The error is computed using the normalized version of the signal, where the amplitude is set to 1.The sweep covers 80 equidistant values of s pert from 0 to 0.3 (displayed is the range up to 0.2).
For simulations that showed pathological spiking (more than 200 spikes per time step in the numerical simulation) we assigned an infinite error.

Figure E
The parameters are as in Fig. 4a-c

Figure 1 :
Figure 1: Coding of continuous signals in neurons with saturating synapses (a,b) and nonlinear dendrites (c,d).(a,b):A neuron with saturating synapses (a) that directly codes for a continuous signal (b).Panel (a) displays the neuron with an axon (red) and dendrites (dark blue) that receive inputs from the axons of other neurons (axons at the bottom) via saturating synapses (symbolized by sigmoids at the synaptic contacts).The currents entering the soma are weighted sums of input spike trains that are synaptically filtered (generating scaled normalized synaptic currents γrn(t), synaptic time scale τs) and thereafter subject to a saturating synaptic nonlinearity.External inputs (axons at the top) are received without saturation.The continuous signal x(t) (panel b left hand side, green) is the sum of the neuron's membrane potential V (t) (red) and its scaled normalized synaptic current θr(t) (dark blue).r(t) is a low-pass filtered version of the neuron's spike train s(t) (light blue in red box).If x(t) > 0, the time scale of x(t) should be large against the synaptic time scale τs and x(t) should predominantly be large against the neuron's threshold, θ /2 (panel b right hand side, assumptions[1,2]  in the main text).x(t) is then already well approximated by θr(t), while V (t) is oscillating between ± θ /2.If x(t) ≤ 0, we have V (t) ≤ 0, no spikes are generated and r(t) quickly decays to zero, such that we predominantly have r(t) ≈ 0 and x(t) is well approximated by V (t) (cf.Equation (9)).(c,d): Two neurons with nonlinear dendrites (c) from a larger network that distributedly codes for a continuous signal (d).(c): Each neuron has an axon (red) and different types of dendrites (cyan, light blue and dark blue) that receive inputs from the axons of other neurons (axons at the bottom) via fast or slow conventional synapses (highlighted by circles and squares).Linear dendrites with slow synapses (cyan with circle contacts) generate somatic currents that are weighted linear sums of low-pass filtered presynaptic spike trains (weighted sums of the rn(t)).Linear dendrites with fast synapses (light blue with square contacts) generate somatic currents with negligible filtering (weighted sums of the spike trains sn(t)).Spikes arriving at a nonlinear dendrite (dark blue) are also filtered (circular contact).The resulting rn(t) are weighted, summed up linearly in the dendrite and subjected to a saturating dendritic nonlinearity (symbolized by sigmoids at dendrites), before entering the soma.We assume that the neurons have nonlinear dendrites that are located in similar tissue areas, such that they connect to the same sets of axons and receive similar inputs.(d): All neurons in the network together encode J continuous signals x(t) (one displayed in green) by a weighted sum of their membrane potentials V(t) (two traces of different neurons displayed in red) and their normalized PSCs r(t) (two traces displayed in cyan).The Γr(t) alone already approximate x(t) well.The neurons' output spike trains s(t) (light blue in red box) generate slow and fast inputs to other neurons.(Note that spikes can be generated due to suprathreshold excitation by fast inputs.Since we plot V(t) after fast inputs and possible resets, the corresponding threshold crossings do not appear.)

2 :
): We can transform the learning of output weights in networks with feedback into mathematically equivalent learning of recurrent connection strengths, between synapses (CSNs with saturating synapses) or Setups used to learn versatile nonlinear computations with spiking neural networks.
in S1 Supporting Information).It results from accumulation of small differences between the learned and desired periods.Apart from this, the error between the recalled and the desired signals is approximately constant over time (Fig. Gb in S1 Supporting Information).

Figure 3 :
Figure 3: Learning dynamics with spiking neural networks.(a): Schematic hunting scene, illustrating the need for complicated dynamical systems learning and control.The hominid has to predict the motion of its prey, and to predict and control the movements of its body and the projectile.(b-h): Learning of self-sustained dynamical patterns by spiking neural networks.(b): A sine wave generated by summed, synaptically and dendritically filtered output spike trains of a PCSN with nonlinear dendrites.(c): A sample of the network's spike trains generating the sine in (b).(d): A saw tooth pattern generated by a PCSN with saturating synapses.(e): A more complicated smooth pattern generated by both architectures (blue: nonlinear dendrites, red: saturating synapses).(f-h): Learning of chaotic dynamics (Lorenz system), with a PCSN with nonlinear dendrites.(f): The spiking network imitates an example trajectory of the Lorenz system during training (blue); it continues generating the dynamics during testing (red).(g): Detailed view of (f) highlighting how the example trajectory (yellow) is imitated during training and continued during testing.(h):The spiking network approximates not explicitly trained quantitative dynamical features, like the tent map between subsequent maxima of the z-coordinate.The ideal tent map (yellow) is closely approximated by the tent map generated by the PCSN (red).The spiking network sporadically generates errors, cf. the larger loop in (f) and the outlier points in (h).Panel (h) shows a ten times longer time series than (f), with three errors.

Figure 4 :
Figure 4: Learning of longer-term memory dependent computations with spiking neural networks.(a-c): Delayed reaction and time interval estimation: The synaptic output of a CSN learns to generate a generic reaction several seconds after a short input.Upper panels show typical examples of input, desired and actual reactions (green, yellow and red traces).In the three panels, the desired reaction delay is the same (9sec), the networks (CSNs with saturating synapses) have different levels of recurrent connection strengths ((a), (b), (c): low, intermediate, high level).The generation of the reaction is best for the network with intermediate level of connection strength.The CSNs with lower or higher levels have not maintained sufficient memory due to their extinguished or noisy and likely chaotic dynamics (gray background lines: spike rates of individual neurons).The median errors of responses measured for different delays in ensembles of networks (levels of connection strength as in the upper panels), are given in the lower panels.The shaded regions represent the area between the first and third quartile of the response errors.Dashed lines highlight delay and error size of the examples in the upper panels.(d): Persistent retaining of instructions and switching between computations: The network receives (i) two random continuous operand inputs (upper sub-panel, yellow and purple traces), and (ii) two pulsed instruction inputs (middle sub-panel, blue and green; memory of last instruction pulse: red).The network has learned to perform different computations on the operand inputs, depending on the last instruction (lower subpanel): if it was +1 (triggered by instruction channel 1), the network performs a nonlinear computation, it outputs the absolute value of the difference of the operands (red trace (network output) agrees with blue); if it was -1 (triggered by channel 2), the values of the operands are added (red trace agrees with green trace).

Figure 5 :
Figure5: Model building and mental exploration to compute optimal control.(a): Learning of an internal world model with spiking neural networks.During model building, random exploratory control drives the dynamical system (here: a swinging pendulum).The spiking neural network is provided with the same control as input and learns to mimic the behavior of the pendulum as its output.(b): After learning, the spiking network can simulate the system's response to control signals.The panel displays the height of the real pendulum in the past (solid black line) and future heights under different exploratory controls (dashed lines).For the same controls, the spiking neural network predicts very similar future positions (colored lines) as the imitated system.It can therefore be used for mental exploration and computation of optimal control to reach an aim, here: to invert the pendulum.(c): During mental exploration, the network simulates in regular time intervals a set of possible future trajectories for different controls, starting from the actual state of the pendulum.From this, the optimal control until the next exploration can be computed and applied to the pendulum.The control reaches its aim: The pendulum is swung up and held in inverted position, despite a high level of noise added during testing (uncontrolled dynamics as in panel (a)).
Fig. A in S1 Supporting Information).
Fig. F in S1 Supporting Information): The faster the neurons can spike the smaller the network may be to solve a given task.

Figure 3 Figure
Figure 3

Figure 3d :
Figure3d: The PCSN has saturating synapses.The target signal is a saw tooth pattern with period 2s and amplitude 10 (normalized to one in the figure).We used an Euler scheme here.The mean spike rate is 226Hz.

Figure 3e :
Figure 3e:The task is performed by a PCSN with non-linear dendrites and by a PCSN with saturating synapses.The target signal is sin(t 0.5 s ) + cos(t 1 s ).The mean spike rate is 77.8Hz for saturating synapses and 21.3Hz for non-linear dendrites.

Figure
Figure 3f-h: The PCSN has nonlinear dendrites.As teacher we use the standard Lorenz system display medians and quartiles taken over 50 task repetitions.The sweep was done for time-delays 2 − 20s in steps of 0.5 s.

Figure 4d :
Figure 4d: The PCSN has nonlinear dendrites.For this task a constant input of I const e = b is added to the network with the elements of the vector b chosen uniformly from [0 1 s , 250 1 s ] to introduce inhomogeneity.Four different inputs are fed into the network, two continuous f c 1/2 and two pulsed input channels f p 1/2 .The continuous inputs are created by convolving white noise twice with an exponential kernel e −t 1 s (equivalent to convolving once with an alpha function) during training and e −t 1 10s during testing.The continuous input signals are normalized to have mean 0 and standard deviation 0.5.

)Figure B :
Figure B: Robustness of PCSNs against noise.The figure shows the median RMS error between the output of PCSNs and the saw tooth target pattern during testing, versus the noise level (shaded: intervals between first and third quartile).The noise level is given in terms of the standard deviation generated by a purely noise-driven subthreshold membrane potential (Ornstein-Uhlenbeck process) with membrane time constant λV , in multiples of the threshold.The insets display the testing phase for two examples using the setup with nonlinear dendrites (crosses in the main plot denote the corresponding noise and error levels).

Figure C :
Figure C: Robustness of PCSNs against structural perturbations of dendritic coupling.The figure shows performance of native PCSNs (networks without precise reset, green) and for PCSNs where neurons are reset to a fixed membrane potential after spike generation (networks with precise reset, blue).Displayed is the median RMS error between the output of PCSNs and the saw tooth target pattern during testing versus the standard deviation Wpert of a multiplicative Gaussian perturbation in the connectivity (shaded: intervals between first and third quartile).The insets display the testing phase for two examples using the setup with precise reset (crosses in the main plot denote the corresponding Wpert and error levels).While the native model does not show successful learning for perturbations larger than 2% of the connection weights, networks with precise reset generate a recognizable sawtooth pattern as learned output even for perturbations of 20% (inset at lower right).The error generated by networks with precise reset increases gradually with perturbation strength, while there is an abrupt change for the native networks.

Figure D :
Figure D: Robustness of PCSNs against reduction of fast couplings.The panel shows the median RMS error between the output of PCSNs and the saw tooth target pattern during testing versus the size spert of the multiplicative reduction of the fast connections (shaded: intervals between first and third quartile).For spert < 10% the error increases only slightly with increasing spert.A further increase of spert leads to a strong, rapid increase in the error and the PCSN is soon not able to learn the pattern anymore.

Fig. C shows
Fig. C shows this by example of the learning of the saw tooth pattern (cf.Figs.3d, A).The W njm are

FigFigure E : 2 ,Figure F :
Fig. Fb,c shows the trade-off between the number of neurons in the PCSNs and their individual spiking frequency: For γ s ≈ 0.1 − 0.15 only the networks with N = 200 and N = 400 learn the task reliably (panel (b)), the neurons adopt a mean spike rate of about or smaller 100Hz.For sufficiently

Fig.
Fig. Ga,b illustrates and quantifies this for the camel's hump task (Fig. 3e).Fig. Ga displays the continued desired dynamics and the occurring phase shift for longer recall durations.Fig. Gb shows that the RMS error is stationary, approximately constant over time, if one corrects for the shift.For the Lorenz attractor (Fig. 3f-h), the teacher and student trajectories quickly depart from each other after the end of learning.The spiking network nevertheless continues to generate dynamics that after decoding agree with the dynamics of a Lorenz system.Fig. Gc shows this for longer times.As a quantitative check, the tent map Fig. 3h relating subsequent local maxima in the z-coordinate shows

Figure G :
Figure G: Long term evolution.(a): PCSN generated camels's hump signal and target (Figs.3e, F) after long times: the signal is phase-shifted compared to the target but otherwise not noticeably changed.Panel (b) quantifies this observation by plotting the RMS error, corrected for a possible phase shift to the target, against time.Displayed are several learning trials with different random initial connectivity.The error is stationary, constant except for fluctuations.(c): Lorenz attractor signal of Fig. 3f-h, dynamics of x, y, z vs. time.Also after long times, the PCSN (red trace) in general generates qualitatively the same dynamics as the target Lorenz system (yellow trace).Sat.syn.N α dt T t λ −1 s Parameters used in the different figures for simulations of networks with nonlinear dendrites.The parameter a = λs − λx is given in terms of λs and λx θ is swept between 0.5 and 0.01 (specific values of the sweep: θ =0.5, 0.4, 0.3, 0.2, 0.1, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01).We compute the RMS error between the signal and the target in the first 10.5s after training.The error is computed using a normalized version of the signal, where the amplitude is set to 1.

Figure B √( 1 −e − 2 ) 1 √ s to 0.01 1 √s 1 √ s to 0.2 1 √s
Figure BThe signal has period 2s and amplitude 15 (normalized to one in the figure).We take medians and quartiles over 50 trials.The noise level is given in terms of the standard deviation generated by a purely noise-driven subthreshold membrane potential (Ornstein-Uhlenbeck process) with membrane time constant λ V , in multiples of the threshold (which equals θ/2 in the case of saturating synapses and θ in the case of non-linear dendrites), i.e. noise-level := ση √ 2λ V , lower panels, see "Figure details" in the main text.

Figure F Fig
Figure F

Fig.
Fig. Fb,c: We use PCSNs with nonlinear dendrites, which encode continuous dynamics as generated by the rate networks in panel (a).γ s is swept over 0.4, 0.25, 0.15, 0.1, 0.09, 0.075, 0.06, 0.05, 0.04, 0.025, 0.015, 0.01 and networks of size N are plotted for N = 70, 100, 200, 400.Fig. Fb shows the median of the RMS error (shaded: interval between first and third quartile) between the signal and the target in the first 10s after training.Fig. Fc shows the median of the mean spike rate per neuron (shaded: intervals between first and third quartile).The statistics are based on 20 trials per parameter combination.Fig. Fd: The plot shows example patterns generated by continuous networks as used in panel (a) (blue trace) and PCSNs as used in panels (b,c) (red trace).For the continuous network N = 50, for the PCSNs γ s = 0.01, N = 70, J = 50.The PCSN has a mean spike rate of 441 Hz

Fig. Gb:
Fig. Gb: We use the same network parameters as in Fig. Fd and Fig. Ga, displayed are 10 learning trials with different randomly chosen initial connectivity.We compute a phase-shift corrected version of the RMS error in a sliding window of size 100s, for different starting points τ of the sliding window.τ is in the range from 1s to 10000s with step size 1s.The phase-shift corrected version of the RMS error is computed by 1 100s τ +100s τ signal( t) − target( t + ∆) 2 d t with the phase-shift ∆ of the target signal chosen such that the integral is minimal.

Fig. Gc:
Fig. Gc: The parameters are the same as in Fig. 3h.

Table 1 :
nj coupling from the jth dendriteD nj = J i=1 Γ in A ij of neuron nto its soma W njmcoupling strength from neuron m to W njm = Γ jm the jth nonlinear dendrite of neuron n Ũnm slow coupling from neuron m to neuron n; Ũnm = a J j=1 Γ jn Γ jm + µλ s δ nm Parameters of a network of neurons with nonlinear dendrites (cf.Equation (4)) and their optimal values.
The slow, significantly temporally filtered inputs from neuron m to the soma of neuron n, Ũnm r m (t), have connection strengths Ũnm .The fast ones, U nm s m (t), have negligible synaptic filtering (i.e.negligible synaptic rise and decay times) as well as negligible conduction delays.The resets and recoveries are incorporated as diagonal elements of the matrices U nm and Ũnm .To test the robustness of the dynamics, also here we sometimes add a white noise input explanation optimal value D ), i.e. such that the time average of the square of the errors e k (t) = z k (t) − F k (t) is minimized.At T t , training stops and the weights are not updated anymore in the subsequent testing.If present, the external input to the neurons is a weighted sum of K in continuous input signals f k (t), I e,β (t) =

Table 2 :
Parameters used in the different figures for simulations of networks with saturating synapses.

Table 3 :
Parameters used in the different figures for simulations of networks with nonlinear dendrites.The parameter a = λs − λx is given in terms of λs and λx since every element of [x] + is larger/equal zero while every element of [x] − is smaller/equal zero, and elements which are nonzero in [x] + are zero in [x] − and vice versa.With this we get The figure shows performance of native PCSNs (networks without precise reset, green) and for PCSNs where neurons are reset to a fixed membrane potential after spike generation (networks with precise reset, blue).Displayed is the median RMS error between the output of PCSNs and the saw tooth target pattern during testing versus the standard deviation Wpert of a multiplicative Gaussian perturbation in the connectivity (shaded: intervals between first and third quartile).The insets display the testing phase for two examples using the setup with precise reset (crosses in the main plot denote the corresponding Wpert and error levels).While the native model does not show successful learning for perturbations larger than 2% of the connection weights, networks with precise reset generate a recognizable sawtooth pattern as learned output even for perturbations of 20% (inset at lower right).The error generated by networks with precise reset increases gradually with perturbation strength, while there is an abrupt change for the native networks.
1 s and 21544 1 s (values in the different sweeps: s 0 = 50 1