Achieving stable dynamics in neural circuits

The brain consists of many interconnected networks with time-varying, partially autonomous activity. There are multiple sources of noise and variation, yet activity must eventually converge to a stable, reproducible state (or sequence of states) for its computations to make sense. We approached this problem from a control-theory perspective by applying contraction analysis to recurrent neural networks. This allowed us to find mechanisms for achieving stability in multiple connected networks with biologically realistic dynamics, including synaptic plasticity and time-varying inputs. These mechanisms included inhibitory Hebbian plasticity, excitatory anti-Hebbian plasticity, synaptic sparsity and excitatory-inhibitory balance. Our findings shed light on how stable computations might be achieved despite biological complexity. Crucially, our analysis is not limited to the stability of fixed geometric objects in state space (e.g. points, lines, planes); it establishes the stability of state trajectories, which may be complex and time-varying.


Matrix Notation
We will denote vectors and matrices with bold font. For example, $\mathbf{x} \in \mathbb{R}^N$ is an $N$-dimensional column vector and $\mathbf{A} \in \mathbb{R}^{N \times M}$ is an $N \times M$ matrix. We use the notation $\mathbf{P} \succ 0$ to denote a positive definite matrix, which is a square matrix such that, for all $\mathbf{x} \in \mathbb{R}^N$ with $\mathbf{x} \neq \mathbf{0}$, the quadratic form
$$\mathbf{x}^T\mathbf{P}\mathbf{x} > 0 \tag{1.1.1}$$
is strictly positive. When the quadratic form is non-negative (i.e., $\mathbf{x}^T\mathbf{P}\mathbf{x} \geq 0$) we say that $\mathbf{P}$ is positive semi-definite and write $\mathbf{P} \succeq 0$. Negative definiteness and semi-definiteness are defined the same way, with the sign of the above inequalities reversed.
Every real, square matrix $\mathbf{Q}$ can be decomposed into a symmetric part and a skew-symmetric part. We denote the symmetric part of $\mathbf{Q}$ as $\mathbf{Q}_s$ and define it as
$$\mathbf{Q}_s \equiv \frac{\mathbf{Q} + \mathbf{Q}^T}{2} \tag{1.1.2}$$
and the antisymmetric (skew-symmetric) part as
$$\mathbf{Q}_a \equiv \frac{\mathbf{Q} - \mathbf{Q}^T}{2}. \tag{1.1.3}$$
Note that $\mathbf{Q} = \mathbf{Q}_s + \mathbf{Q}_a$. Also note that, without ambiguity, when we say that a non-symmetric $\mathbf{Q}$ is positive definite, we mean that $\mathbf{Q}_s \succ 0$. This is because
$$\mathbf{x}^T\mathbf{Q}\mathbf{x} = \mathbf{x}^T\mathbf{Q}_s\mathbf{x},$$
since the quadratic form of the skew-symmetric part vanishes. Substituting this equality back into (1.1.1), we see that a matrix is positive definite if and only if its symmetric part is positive definite. For more information on positive definite matrices, see section 7.1 of [3] or section 3.5.1 of [5], for example.
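As a quick numerical sanity check, the following NumPy sketch (with an arbitrary random matrix standing in for $\mathbf{Q}$) verifies the decomposition and the fact that definiteness is decided by the symmetric part alone:

```python
import numpy as np

# Minimal sketch: symmetric/antisymmetric decomposition of an arbitrary matrix,
# and the fact that x^T Q x = x^T Q_s x for any x.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 4))          # arbitrary real square matrix
Qs = (Q + Q.T) / 2                   # symmetric part, Eq. (1.1.2)
Qa = (Q - Q.T) / 2                   # antisymmetric part, Eq. (1.1.3)
assert np.allclose(Q, Qs + Qa)

# The skew-symmetric part contributes zero to any quadratic form.
x = rng.normal(size=4)
assert np.isclose(x @ Q @ x, x @ Qs @ x)

# Definiteness of Q (in the sense used here) = sign of the eigenvalues of Q_s.
print("eigenvalues of Q_s:", np.linalg.eigvalsh(Qs))
```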

Matrix Measures
We will also need the fact that a given vector norm $|\cdot|$ induces a matrix norm $\|\cdot\|$,
$$\|\mathbf{Q}\| \equiv \sup_{\mathbf{x} \neq \mathbf{0}} \frac{|\mathbf{Q}\mathbf{x}|}{|\mathbf{x}|},$$
as well as a matrix measure $\mu(\cdot)$,
$$\mu(\mathbf{Q}) \equiv \lim_{h \to 0^+} \frac{\|\mathbf{I} + h\mathbf{Q}\| - 1}{h}.$$
For more information about matrix measures the reader is referred to section 2.2.2 of [7] or section 2 of [1]. In this paper we will rely primarily on the matrix measures induced by the infinity-norm $|\cdot|_\infty$ and the 2-norm $|\cdot|_2$. These norms induce the matrix measures
$$\mu_\infty(\mathbf{Q}) = \max_i \Big( q_{ii} + \sum_{j \neq i} |q_{ij}| \Big), \qquad \mu_2(\mathbf{Q}) = \lambda_{\max}(\mathbf{Q}_s).$$
Finally, consider a (potentially time-varying), non-singular matrix $\boldsymbol{\Theta}$ and the associated weighted 2-norm and weighted infinity-norm:
$$|\mathbf{x}|_{\Theta,2} = |\boldsymbol{\Theta}\mathbf{x}|_2, \qquad |\mathbf{x}|_{\Theta,\infty} = |\boldsymbol{\Theta}\mathbf{x}|_\infty.$$
The matrix measures induced by these norms are
$$\mu_{\Theta,2}(\mathbf{Q}) = \mu_2\big(\boldsymbol{\Theta}\mathbf{Q}\boldsymbol{\Theta}^{-1}\big), \qquad \mu_{\Theta,\infty}(\mathbf{Q}) = \mu_\infty\big(\boldsymbol{\Theta}\mathbf{Q}\boldsymbol{\Theta}^{-1}\big).$$
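The sketch below implements these two measures directly (the helper names `mu_inf`, `mu_2`, `mu_weighted` and the test matrix are illustrative choices, not from the original text), and checks $\mu_2$ against the limit definition above:

```python
import numpy as np

def mu_inf(Q):
    # mu_inf(Q) = max_i ( q_ii + sum_{j != i} |q_ij| )
    Q = np.asarray(Q)
    off = np.abs(Q).sum(axis=1) - np.abs(np.diag(Q))
    return np.max(np.diag(Q) + off)

def mu_2(Q):
    # mu_2(Q) = largest eigenvalue of the symmetric part
    return np.linalg.eigvalsh((Q + Q.T) / 2).max()

def mu_weighted(Q, Theta, mu):
    # measure induced by the weighted norm |x| = |Theta x|
    return mu(Theta @ Q @ np.linalg.inv(Theta))

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 5)) - 3 * np.eye(5)

# Check mu_2 against the limit definition (||.|| = spectral norm here).
h = 1e-7
approx = (np.linalg.norm(np.eye(5) + h * Q, 2) - 1) / h
assert np.isclose(approx, mu_2(Q), atol=1e-4)
print(mu_inf(Q), mu_2(Q))
```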

Matrix Vectorization
Finally, consider an arbitrary $N \times N$ matrix $\mathbf{A}$. The vectorization of $\mathbf{A}$, which we denote by $\mathbf{v}_A$, is the $N^2 \times 1$ column vector obtained by stacking the columns of $\mathbf{A}$ on top of one another, starting from the leftmost column and ending at the rightmost column. Consider the example of
$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad \mathbf{v}_A = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{pmatrix}.$$
We also define the $N^2 \times N^2$ matrix $\mathbf{D}_A$ as the diagonal matrix with the elements of $\mathbf{A}$ along its diagonal, starting with the leftmost column of $\mathbf{A}$ and ending with the rightmost column. For the $\mathbf{A}$ matrix defined above, we have
$$\mathbf{D}_A = \mathrm{diag}\left(a_{11},\, a_{21},\, a_{12},\, a_{22}\right).$$
These two operators have a special relationship to the Hadamard product $\circ$ (element-wise matrix multiplication). Consider a matrix $\mathbf{Z}$ defined as
$$\mathbf{Z} = \mathbf{X} \circ \mathbf{Y},$$
where $\mathbf{X}$ and $\mathbf{Y}$ are of the same size. Then we have the following relationship:
$$\mathbf{v}_Z = \mathbf{D}_X\mathbf{v}_Y = \mathbf{D}_Y\mathbf{v}_X,$$
which can be verified through direct inspection.
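A short sketch (the function names `vec` and `D` are ours, chosen for illustration) that checks the identity $\mathbf{v}_Z = \mathbf{D}_X\mathbf{v}_Y = \mathbf{D}_Y\mathbf{v}_X$ numerically:

```python
import numpy as np

def vec(A):
    # stack columns, leftmost first (column-major / Fortran order)
    return A.flatten(order="F")

def D(A):
    # diagonal matrix with the column-stacked entries of A on its diagonal
    return np.diag(vec(A))

rng = np.random.default_rng(2)
X, Y = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
Z = X * Y                                  # Hadamard product
assert np.allclose(vec(Z), D(X) @ vec(Y))
assert np.allclose(vec(Z), D(Y) @ vec(X))
print("Hadamard/vectorization identity verified")
```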

Basic Contraction Definition
In this section we define what a contracting system is, and state several properties of these systems that will be useful for the remainder of the supplementary.
The reader is referred to [4] and [6] for more details. Consider a deterministic system of the form
$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, t) \tag{1.2.1}$$
where $\mathbf{f}$ is an $N \times 1$ nonlinear vector function and $\mathbf{x}$ is the $N \times 1$ state vector.
We assume that all quantities are real and smooth, by which we mean that any required derivative or partial derivatives exist and are continuous. Denote the $N \times N$ time-varying Jacobian of this system as
$$\mathbf{J}(\mathbf{x}, t) = \mathbf{J} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}}.$$
If there exists a matrix measure $\mu$ and a constant $\lambda > 0$ such that
$$\mu\big(\mathbf{J}(\mathbf{x}, t)\big) \leq -\lambda \quad \text{for all } \mathbf{x} \text{ and all } t \geq 0,$$
then all trajectories of (1.2.1) converge exponentially towards one another with rate $\lambda$ and the system is said to be contracting. In particular, let $R(t) = |\mathbf{x}(t) - \mathbf{x}'(t)|$ denote the distance between any two trajectories of system (1.2.1) and assume the system is contracting. Then there exists a constant $k \geq R(0)$ such that
$$R(t) \leq k\,e^{-\lambda t}.$$
To avoid ambiguity in language, we now define some terms associated with the measures $\mu_1$, $\mu_2$ and $\mu_\infty$. As in [4], consider a metric $\mathbf{M} = \boldsymbol{\Theta}^T\boldsymbol{\Theta} \succ 0$ and the following associated quantity, called the generalized Jacobian:
$$\mathbf{F} = \big(\dot{\boldsymbol{\Theta}} + \boldsymbol{\Theta}\mathbf{J}\big)\boldsymbol{\Theta}^{-1}.$$
If the matrix measure induced by the 1-, 2-, or infinity-norm is uniformly negative when applied to $\mathbf{F}$, i.e.
$$\mu_{1,2,\infty}(\mathbf{F}) \leq -\lambda < 0,$$
then we say that the system is contracting in metric $\boldsymbol{\Theta}$ with rate $\lambda$. Calling $\boldsymbol{\Theta}$ a metric is a slight abuse of language, because the real metric is $\mathbf{M} = \boldsymbol{\Theta}^T\boldsymbol{\Theta}$, but since we will always use the symbol $\boldsymbol{\Theta}$ in the generalized Jacobian, this won't cause any confusion.
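As an illustration, here is a minimal simulation of a hypothetical two-dimensional system whose Jacobian has $\mu_2(\mathbf{J}) = -2$ everywhere; despite the time-varying input, any two trajectories approach each other at least as fast as $e^{-2t}$:

```python
import numpy as np

def f(x, t):
    # J = df/dx = [[-2, 1], [-1, -2]]; symmetric part is -2*I, so mu_2(J) = -2.
    J = np.array([[-2.0, 1.0], [-1.0, -2.0]])
    return J @ x + np.array([np.sin(t), np.cos(2 * t)])  # time-varying input

dt, T = 1e-3, 5.0
x, y = np.array([5.0, -3.0]), np.array([-1.0, 4.0])      # two initial conditions
for n in range(int(T / dt)):
    t = n * dt
    x = x + dt * f(x, t)   # forward Euler
    y = y + dt * f(y, t)

R0 = np.linalg.norm(np.array([5.0, -3.0]) - np.array([-1.0, 4.0]))
print("final distance:      ", np.linalg.norm(x - y))
print("bound R(0)*exp(-2T): ", R0 * np.exp(-2 * T))
```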

Contracting Dynamics of A Leaky Neuron
In this section we apply contraction analysis to the dynamics of a single, leaky, input-driven neuron. Analyzing the case of one neuron will turn out to be helpful when thinking about the case of many neurons.

A Motivating Example
Consider the activation of a pure integrator neuron that is receiving a constant input $I$:
$$\dot{x} = I \tag{2.1.1}$$
This differential equation can be solved analytically, and has the solution
$$x(t) = x(0) + It. \tag{2.1.2}$$
Since the solution does not forget $x(0)$, its initial condition, this system is not contracting. However, if we were to add a 'leak' term, for example $-x$, to (2.1.1), we would have
$$\dot{x} = -x + I \tag{2.1.3}$$
which yields the solution
$$x(t) = I + \chi e^{-t} \tag{2.1.4}$$
with $\chi = x(0) - I$. Since this system forgets $x(0)$ exponentially quickly, we may conclude that it is contracting (of course, we also could have computed the Jacobian, which in the case of (2.1.3) is $-1$). In the case of nonlinear dynamics and time-varying input, we will not be able to solve for the dynamics of $x$ explicitly to determine contraction properties; instead, we will have to analyze the Jacobian.
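A side-by-side Euler simulation makes the difference concrete (time step, horizon, and input value are arbitrary illustrative choices):

```python
import numpy as np

# Toy comparison: a pure integrator remembers its initial condition forever,
# while a leaky integrator forgets it exponentially (and is thus contracting).
dt, T, I = 1e-3, 10.0, 0.5
x_int = np.array([0.0, 2.0])    # two initial conditions, pure integrator
x_leak = np.array([0.0, 2.0])   # two initial conditions, leaky integrator
for _ in range(int(T / dt)):
    x_int = x_int + dt * I                    # x' = I
    x_leak = x_leak + dt * (-x_leak + I)      # x' = -x + I
print("integrator gap:", x_int[1] - x_int[0])     # stays 2.0
print("leaky gap:     ", x_leak[1] - x_leak[0])   # ~ 2*exp(-10)
```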

Leaky Neuron with No Autapse
Let $x$ denote the activation of a single, isolated, leaky-integrator neuron. Assume that this neuron has no self-connection (i.e. no autapse) and obeys the equation
$$\tau\dot{x} = -\alpha x + I(t) \tag{2.2.1}$$
where $\tau > 0$ is the time-constant of integration, $\alpha > 0$ is the time-constant of the 'leak', and $I(t)$ is an external input into the neuron. If $\alpha = 0$ this would be a pure integrator neuron (see section 2.1). Dividing through by $\tau$, we may rewrite (2.2.1) as
$$\dot{x} = -\frac{\alpha}{\tau}x + \frac{I(t)}{\tau}. \tag{2.2.2}$$
The Jacobian of this scalar system is
$$J = -\frac{\alpha}{\tau}. \tag{2.2.3}$$
Since $\alpha$ and $\tau$ are positive, $\alpha/\tau$ is also positive, which means that $-\alpha/\tau$ is negative. Since the Jacobian is negative, we can conclude that the isolated leaky neuron is contracting for all inputs. In the paper we will consider several classes of leaky neurons, including neurons that obey the equation
$$\dot{x} = h(x) + I(t), \qquad \frac{\partial h}{\partial x} \leq -\beta < 0. \tag{2.2.4}$$
By the same arguments as above, leaky neurons of this form are also contracting in isolation. Of course, the standard leak term of $-x$ is a special case of $h(x)$: for $h(x) = -x$ we find that $\beta = 1$.
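For a concrete nonlinear instance, take the hypothetical leak $h(x) = -x - x^3$, for which $\partial h/\partial x = -1 - 3x^2 \leq -1$, so $\beta = 1$. The sketch below checks that two trajectories driven by the same arbitrary input contract at least at rate 1:

```python
import numpy as np

dt, T = 1e-3, 6.0
x, y = 3.0, -2.0                      # two initial conditions
I = lambda t: np.sin(3 * t)           # arbitrary time-varying input
for n in range(int(T / dt)):
    t = n * dt
    x = x + dt * (-x - x**3 + I(t))   # h(x) = -x - x^3, beta = 1
    y = y + dt * (-y - y**3 + I(t))
print("distance:", abs(x - y), " bound R(0)*exp(-T):", 5.0 * np.exp(-T))
```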

Synaptic Dynamics that Favor or Preserve Contraction
The assumption that synaptic dynamics are local introduces strong structure into the Jacobian of the overall network. In particular, it implies that the 'synaptic' diagonal block of the Jacobian will be a diagonal matrix. We'll now establish some simple facts about matrices with this structure that will be useful later on.
Theorem 1. Consider the following matrix:
$$\mathbf{F} = \begin{pmatrix} \mathbf{X} & \mathbf{G}\mathbf{B} \\ -\mathbf{A}\mathbf{G}^T & \mathbf{D} \end{pmatrix} \tag{3.0.1}$$
where $\mathbf{X}$ is an arbitrary negative definite matrix, $\mathbf{D}$ is diagonal and negative definite, $\mathbf{A}$ and $\mathbf{B}$ are diagonal and positive definite, and $\mathbf{G}$ is arbitrary. Then there exists a diagonal metric $\boldsymbol{\Theta}$ such that $\mu_{2,\Theta}(\mathbf{F}) < 0$.

Proof. Consider a metric of the form
$$\boldsymbol{\Theta} = \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{L} \end{pmatrix}$$
where $\mathbf{L}$ is some nonsingular diagonal matrix, to be determined. Applying this to the matrix $\mathbf{F}$, we find that
$$\boldsymbol{\Theta}\mathbf{F}\boldsymbol{\Theta}^{-1} = \begin{pmatrix} \mathbf{X} & \mathbf{G}\mathbf{B}\mathbf{L}^{-1} \\ -\mathbf{L}\mathbf{A}\mathbf{G}^T & \mathbf{L}\mathbf{D}\mathbf{L}^{-1} \end{pmatrix}.$$
We would like the off-diagonal terms of this matrix to cancel out when taking the symmetric part. To make this happen, we need
$$\mathbf{G}\mathbf{B}\mathbf{L}^{-1} = \big(\mathbf{L}\mathbf{A}\mathbf{G}^T\big)^T = \mathbf{G}\mathbf{A}\mathbf{L},$$
which can be easily manipulated into
$$\mathbf{L} = \mathbf{A}^{-1/2}\mathbf{B}^{1/2}.$$
The square root is guaranteed to exist because $\mathbf{A}$ and $\mathbf{B}$ are both positive definite and diagonal by assumption. Taking the symmetric part of (3.0.1) in this metric, we find that
$$\big(\boldsymbol{\Theta}\mathbf{F}\boldsymbol{\Theta}^{-1}\big)_s = \begin{pmatrix} \mathbf{X}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{D} \end{pmatrix},$$
which is negative definite since the diagonal blocks are negative definite by assumption (note that $\mathbf{L}\mathbf{D}\mathbf{L}^{-1} = \mathbf{D}$ because diagonal matrices commute). This proves the result.
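The following sketch spot-checks the theorem on a random instance of this block structure (all dimensions and distributions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6
Xr = rng.normal(size=(n, n))
X = -(Xr @ Xr.T) - 0.1 * np.eye(n)                       # negative definite
D = -np.diag(rng.uniform(0.5, 2.0, size=m))              # diagonal, negative definite
A = np.diag(rng.uniform(0.5, 2.0, size=m))               # diagonal, positive definite
B = np.diag(rng.uniform(0.5, 2.0, size=m))               # diagonal, positive definite
G = rng.normal(size=(n, m))                              # arbitrary

F = np.block([[X, G @ B], [-A @ G.T, D]])
L = np.diag(np.diag(A) ** -0.5 * np.diag(B) ** 0.5)      # L = A^{-1/2} B^{1/2}
Theta = np.block([[np.eye(n), np.zeros((n, m))],
                  [np.zeros((m, n)), L]])

Fg = Theta @ F @ np.linalg.inv(Theta)                    # generalized Jacobian
Fs = (Fg + Fg.T) / 2
print("off-diagonal block max |entry|:", np.abs(Fs[:n, n:]).max())   # ~ 0
print("lambda_max of symmetric part:  ", np.linalg.eigvalsh(Fs).max())  # < 0
```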
We will use this result in the next section to show that anti-Hebbian plasticity gives rise to contracting dynamics.

Anti-Hebbian Dynamics Produce Contraction
Consider RNNs of the following form:
$$\dot{\mathbf{x}} = \mathbf{h}(\mathbf{x}) + \mathbf{W}\mathbf{x} + \mathbf{u}(t),$$
where $\mathbf{h}$ is a leak term satisfying (2.2.4), $\mathbf{W}$ is the weight matrix of the network, and $\mathbf{u}(t)$ is a time-varying input. And consider the plasticity described in the main text, given by
$$\dot{w}_{ij} = -k_{ij}\,x_i x_j - w_{ij},$$
where the matrix of $k_{ij}$ terms, $\mathbf{K}$, must be positive semi-definite for reasons to be discussed. To prove that this plasticity leads to overall contracting dynamics, we need to show that the overall Jacobian of the system is contracting in some metric. We will do this in several steps. To give an overview of the strategy, first write the synaptic dynamics as an $N^2 \times 1$ vector:
$$\dot{\mathbf{v}}_W = \mathbf{g}(\mathbf{v}_W, \mathbf{x}),$$
where $\mathbf{v}_W$ denotes the vectorization of $\mathbf{W}$. The Jacobian of the overall neural-synaptic system is then:
$$\mathbf{J} = \begin{pmatrix} \partial\dot{\mathbf{x}}/\partial\mathbf{x} & \partial\dot{\mathbf{x}}/\partial\mathbf{v}_W \\ \partial\dot{\mathbf{v}}_W/\partial\mathbf{x} & \partial\dot{\mathbf{v}}_W/\partial\mathbf{v}_W \end{pmatrix} \tag{3.1.1}$$
The goal will be to show that the off-diagonal blocks of this overall Jacobian cancel out in some metric, while the diagonal blocks are negative definite.

Off-Diagonal Blocks of the Jacobian
To begin, first consider the following observation. For weight dynamics given by
$$\dot{w}_{ij} = \gamma(t)\left(-x_i x_j - w_{ij}\right) \tag{3.1.2}$$
with $\gamma(t) > 0$, we have that the off-diagonal blocks of the overall Jacobian cancel out.
First we will show that $w_{ij}$ converges to $w_{ji}$ after exponential transients under (3.1.2). Consider a simple distance function between $w_{ij}$ and $w_{ji}$:
$$V = \frac{1}{2}\left(w_{ij} - w_{ji}\right)^2.$$
Evaluating the time-derivative of $V$ and substituting in (3.1.2), we see that
$$\dot{V} = (w_{ij} - w_{ji})(\dot{w}_{ij} - \dot{w}_{ji}) = -\gamma(t)(w_{ij} - w_{ji})^2 = -2\gamma(t)V$$
(the Hebbian terms cancel because $x_i x_j = x_j x_i$), which implies that the distance between $w_{ij}$ and $w_{ji}$ shrinks exponentially with at least rate $\min_t(\gamma(t))$, independently of the initial condition or the particular trajectory of $\mathbf{x}$. For the rest of the analysis we will therefore assume that $w_{ij} \approx w_{ji}$, i.e., that $\mathbf{W}$ is symmetric.

Now define a 'Hebbian' term as $\mathbf{H} \equiv \mathbf{x}\mathbf{x}^T$, so that $\mathbf{v}_H = \mathbf{x} \otimes \mathbf{x}$. The synaptic dynamics defined by (3.1.2) can be written in matrix notation as
$$\dot{\mathbf{W}} = \gamma(t)\left(-\mathbf{H} - \mathbf{W}\right)$$
and in vectorized notation as
$$\dot{\mathbf{v}}_W = \gamma(t)\left(-\mathbf{v}_H - \mathbf{v}_W\right).$$
Since $\mathbf{W}\mathbf{x} = (\mathbf{x}^T \otimes \mathbf{I})\mathbf{v}_W$, the upper-right block of the overall Jacobian is
$$\frac{\partial\dot{\mathbf{x}}}{\partial\mathbf{v}_W} = \mathbf{x}^T \otimes \mathbf{I},$$
while differentiating $\mathbf{v}_H$ with respect to $\mathbf{x}$ gives
$$\frac{\partial\mathbf{v}_H}{\partial\mathbf{x}} = \mathbf{x} \otimes \mathbf{I} + \mathbf{I} \otimes \mathbf{x}.$$
When $\mathbf{W}$ is symmetric, the two terms on the right-hand side make identical contributions to the symmetric part of the overall Jacobian, so that effectively
$$\frac{\partial\mathbf{v}_H}{\partial\mathbf{x}} = 2\left(\mathbf{x} \otimes \mathbf{I}\right) = 2\left(\frac{\partial\dot{\mathbf{x}}}{\partial\mathbf{v}_W}\right)^T.$$
Note that this expression is only true when $\mathbf{W}$ is symmetric. In other words, the lower-left block of the Jacobian is, up to a positive diagonal factor, the negative transpose of the upper-right block, which is exactly the structure required by Theorem 1.

Now consider the synaptic dynamics in the main text:
$$\dot{\mathbf{W}} = -\mathbf{K} \circ \mathbf{H} - \mathbf{W},$$
where $\mathbf{K}$ is symmetric, positive semi-definite and has positive entries. Vectorizing in the same manner as above, and using the Hadamard identity from the preliminaries, we find that
$$\dot{\mathbf{v}}_W = -\mathbf{D}_K\mathbf{v}_H - \mathbf{v}_W, \qquad \frac{\partial\dot{\mathbf{v}}_W}{\partial\mathbf{x}} = -2\,\mathbf{D}_K\left(\frac{\partial\dot{\mathbf{x}}}{\partial\mathbf{v}_W}\right)^T.$$
Since $\mathbf{K}$ has all positive entries, $\mathbf{D}_K$ is diagonal and positive definite, and thus Theorem 1 applies: this is a feedback combination that automatically preserves contraction. In other words, a metric can be found in terms of $\mathbf{D}_K$ which leaves the diagonal blocks of the Jacobian untouched and forces the off-diagonal blocks to cancel out when computing the symmetric part of the generalized Jacobian.
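A small simulation of (3.1.2)-type dynamics illustrates the symmetrization argument: the antisymmetric part of $\mathbf{W}$ decays at rate at least $\min_t \gamma(t)$, regardless of the activity trajectory (the activity signal and $\gamma(t)$ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
N, dt, T = 5, 1e-3, 8.0
W = rng.normal(size=(N, N))                  # asymmetric initial weights
for n in range(int(T / dt)):
    t = n * dt
    x = np.sin(t + np.arange(N))             # arbitrary bounded "activity"
    gamma = 1.0 + 0.5 * np.sin(5 * t)        # time-varying gain, gamma(t) >= 0.5
    # w_ij' = gamma(t) * (-x_i x_j - w_ij); the outer product is symmetric,
    # so the antisymmetric part of W obeys (W - W^T)' = -gamma * (W - W^T).
    W = W + dt * gamma * (-np.outer(x, x) - W)
print("||W - W^T|| after T:", np.linalg.norm(W - W.T))   # <= initial * exp(-0.5*T)
```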

Diagonal Blocks of the Jacobian
The upper-left block of the Jacobian is thus
$$\frac{\partial\dot{\mathbf{x}}}{\partial\mathbf{x}} = \mathbf{L} + \mathbf{W},$$
where $\mathbf{L}$ is the diagonal matrix of leak derivatives, $L_{ii} = \partial h_i/\partial x_i \leq -\beta$. For this matrix to be negative definite we need
$$\mu_2(\mathbf{W}) < \beta.$$
To see that this will be the case, consider that the synaptic dynamics defined in the main text can be written in matrix notation as
$$\dot{\mathbf{W}} = -\mathbf{K} \circ \left(\mathbf{x}\mathbf{x}^T\right) - \mathbf{W}.$$
Now discretize the differential equation for $\mathbf{W}$:
$$\mathbf{W}_{t+1} = \mathbf{W}_t + \Delta t\left(-\mathbf{K} \circ \left(\mathbf{x}_t\mathbf{x}_t^T\right) - \mathbf{W}_t\right).$$
Rearranging this to collect all the $\mathbf{W}_t$ terms, we have
$$\mathbf{W}_{t+1} = (1 - \Delta t)\,\mathbf{W}_t - \Delta t\,\mathbf{K} \circ \left(\mathbf{x}_t\mathbf{x}_t^T\right).$$
To make the derivations less tedious, let's define $\epsilon \equiv \Delta t > 0$ and $\alpha \equiv (1 - \epsilon)$.
Note that since, ideally, $\Delta t$ is small, we can say that $0 < \alpha < 1$. Now we have the discrete dynamical system for $\mathbf{W}$:
$$\mathbf{W}_{t+1} = \alpha\,\mathbf{W}_t - \epsilon\,\mathbf{K} \circ \left(\mathbf{x}_t\mathbf{x}_t^T\right).$$
We now need to notice that $-\mathbf{K} \circ \mathbf{x}_t\mathbf{x}_t^T$ is symmetric (since $\mathbf{K}$ and $\mathbf{x}_t\mathbf{x}_t^T$ are symmetric). It is also negative semi-definite, by the Schur Product Theorem.

Let's denote this expression by
$$\mathbf{S}_t \equiv -\mathbf{K} \circ \left(\mathbf{x}_t\mathbf{x}_t^T\right) \preceq 0.$$
Taking the symmetric part of both sides, we find that
$$\left(\mathbf{W}_{t+1}\right)_s = \alpha\left(\mathbf{W}_t\right)_s + \epsilon\,\mathbf{S}_t.$$
We can now use Weyl's inequality for the sum of Hermitian matrices (see preliminaries) to conclude that
$$\lambda_{\max}\big((\mathbf{W}_{t+1})_s\big) \leq \alpha\,\lambda_{\max}\big((\mathbf{W}_t)_s\big) + \epsilon\,\lambda_{\max}(\mathbf{S}_t) \leq \alpha\,\lambda_{\max}\big((\mathbf{W}_t)_s\big)$$
and therefore that
$$\mu_2(\mathbf{W}_t) \leq \alpha^t \mu_2(\mathbf{W}_0),$$
which, since $0 < \alpha < 1$, means that the largest eigenvalue of the symmetric part of $\mathbf{W}$ decays at least exponentially. Recall that for contraction we needed $\mu_2(\mathbf{W}) < \beta$. Assuming that $\mu_2(\mathbf{W}_0) > \beta$, an upper bound for the number of steps this will take is
$$t^* = \left\lceil \frac{\ln\big(\beta/\mu_2(\mathbf{W}_0)\big)}{\ln \alpha} \right\rceil.$$
To summarize, we have shown that the synaptic dynamics described by
$$\dot{w}_{ij} = -k_{ij}\,x_i x_j - w_{ij},$$
where $\mathbf{K}$ is a symmetric, positive semi-definite matrix with positive entries, lead to contracting neural and synaptic dynamics in a wide class of RNNs.
We did this by showing that the overall Jacobian (3.1.1) has off-diagonal blocks that cancel out and diagonal blocks that are negative definite, implying negative definiteness of the symmetric part of the whole (generalized) Jacobian and thus contraction of the combined neural and synaptic system. Crucially, to show this we had to use a metric other than the identity metric.
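The following end-to-end sketch simulates the combined neural and synaptic system under the main-text plasticity, starting two runs from different initial conditions for both $\mathbf{x}$ and $\mathbf{W}$; under the analysis above, both runs should converge to the same trajectory (network size, input, and the choice of $\mathbf{K}$ are illustrative):

```python
import numpy as np

# x' = -x + W x + u(t),   W' = -K o (x x^T) - W   (o = Hadamard product)
rng = np.random.default_rng(5)
N, dt, T = 6, 1e-3, 20.0
K = np.ones((N, N)) + np.eye(N)       # symmetric, PSD, all entries positive

def step(x, W, t):
    u = np.sin(t + np.arange(N))      # shared time-varying input
    x_new = x + dt * (-x + W @ x + u)
    W_new = W + dt * (-K * np.outer(x, x) - W)
    return x_new, W_new

x1, W1 = rng.normal(size=N), 0.5 * rng.normal(size=(N, N))
x2, W2 = rng.normal(size=N), 0.5 * rng.normal(size=(N, N))
for n in range(int(T / dt)):
    t = n * dt
    x1, W1 = step(x1, W1, t)
    x2, W2 = step(x2, W2, t)
print("neural distance:  ", np.linalg.norm(x1 - x2))   # ~ 0
print("synaptic distance:", np.linalg.norm(W1 - W2))   # ~ 0
```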

Sparsity Ensures Contraction
In this section we show that it is possible to ensure contraction, without any knowledge of the specific synaptic dynamics, by keeping the network sufficiently sparse. This result is a straightforward application of the $\mu_\infty$ matrix measure to the Jacobian of the overall neural and synaptic system. Consider neural networks of the form
$$\dot{x}_i = h_i(x_i) + \sum_j w_{ij}\,r_j(x_j) + u_i(t) \tag{4.0.1}$$
where $h_i$ is a leak term and $r_i$ is a bounded activation function that satisfies the following two properties:
$$|r_i(x)| \leq 1, \qquad 0 \leq \frac{\partial r_i}{\partial x} \leq g. \tag{4.0.2}$$
We consider synaptic dynamics of the form
$$\dot{w}_{ij} = g_{ij}(w_{ij}, x_i, x_j). \tag{4.0.3}$$
That is to say, we only consider synaptic dynamics that are local, in the sense that the change in a weight is only a function of the weight itself and the pre- and post-synaptic neurons.
It will be useful to define, for neuron $i$, the total number of afferent synapses and the total number of afferent dynamic synapses, which we denote $p_i$ and $d_i$, respectively. Note that $d_i$ is a fraction of $p_i$ and can therefore be written as $d_i = \alpha_i p_i$ for some $0 \leq \alpha_i \leq 1$. It will also be useful to give a name to one more global system parameter, the maximum amplitude (absolute value) of all the weights in the network: $w^* \equiv \max_{ij}(|w_{ij}|)$. For the neural rows of the Jacobian, the $\mu_\infty$ measure condition requires
$$\frac{\partial h_i}{\partial x_i} + \sum_j \left|w_{ij}\,\frac{\partial r_j}{\partial x_j}\right| + \sum_{j \,\in\, \mathrm{dyn}(i)} |r_j(x_j)| < 0, \tag{4.0.4}$$
where $\mathrm{dyn}(i)$ denotes the set of dynamic afferent synapses of neuron $i$. By using the two relations $|r_j| \leq 1$ and $|\partial r_j/\partial x_j| \leq g$, together with $\partial h_i/\partial x_i \leq -\beta$ and $|w_{ij}| \leq w^*$, we can see that (4.0.4) can be assured if
$$\beta > p_i\,g\,w^* + d_i.$$
Substituting the relationship $d_i = \alpha_i p_i$ into the above, we get
$$\beta > p_i\left(g\,w^* + \alpha_i\right). \tag{4.0.5}$$
We still have the 'synaptic' rows of the Jacobian to worry about, but these are much simpler. The locality assumption implies that there are at most three elements per row: the derivatives of $g_{ij}$ with respect to $x_i$ and $x_j$, and the derivative of $g_{ij}$ with respect to $w_{ij}$. Thus, if
$$\frac{\partial g_{ij}}{\partial w_{ij}} + \left|\frac{\partial g_{ij}}{\partial x_i}\right| + \left|\frac{\partial g_{ij}}{\partial x_j}\right| < 0, \tag{4.0.6}$$
the bottom two Jacobian blocks satisfy the matrix measure condition. Intuitively, this inequality tells us that the 'self-stabilization' of the synapses must be stronger than the change in plasticity due to neural activity. For example, if one imagines that $g_{ij}$ contains a constant leak term of the form $-\gamma w_{ij}$ where $\gamma > 0$, then the inequality above is a sufficient condition on how large $\gamma$ should be.
In summary, we have shown that if the synapses are contracting (i.e., (4.0.6) is satisfied), then (4.0.5) tells us that it is possible to assure contraction of the overall neural and synaptic network by making the network sufficiently sparse.
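The sketch below works out the admissible in-degree implied by (4.0.5) for one illustrative choice of parameters, and then spot-checks the neural row sums for a random sparse network drawn at that in-degree:

```python
import numpy as np

# Bound (4.0.5): beta > p_i * (g * wstar + alpha_i), with |r| <= 1, slope <= g.
beta, g, wstar = 10.0, 1.0, 0.25
alpha = 1.0                     # every afferent synapse is plastic (worst case)
p_max = int(np.floor(beta / (g * wstar + alpha)))   # largest admissible in-degree
print("admissible in-degree per neuron:", p_max)    # here: 8

# Numerical spot-check of the row sums bounded by mu_inf, for one random draw.
rng = np.random.default_rng(6)
N, p = 50, p_max
W = np.zeros((N, N))
for i in range(N):
    cols = rng.choice(N, size=p, replace=False)
    W[i, cols] = rng.uniform(-wstar, wstar, size=p)
x = rng.normal(size=N)
rprime = 1.0 / np.cosh(x) ** 2                      # tanh slope, <= g = 1
row = -beta + np.abs(W * rprime).sum(axis=1) + alpha * p  # |r_j| <= 1 term
print("max neural row sum (should be < 0):", row.max())
```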

Contraction in RNNs with Static Weights
In this section we'll analyze RNNs of the general form
$$\dot{x}_i = h_i(x_i) + g_i\,\phi\Big(\sum_j w_{ij}x_j + u_i(t)\Big) \tag{5.0.1}$$
where $h_i$ is a leak term (see (2.2.4) for the definition), $\phi$ is a nonlinear function with a linear 'regime' (such as $\tanh(x)$), $g_i$ is a neuron-specific gain term that satisfies
$$0 < g_{\min} \leq g_i \leq g_{\max},$$
and $u_i(t)$ is a time-varying input into neuron $i$. For simplicity, we focus here on the regime where
$$\phi'\Big(\sum_j w_{ij}x_j + u_i(t)\Big) \approx 1.$$
In this case we may rewrite (5.0.1) in matrix form as
$$\dot{\mathbf{x}} = \mathbf{h}(\mathbf{x}) + \mathbf{G}\left(\mathbf{W}\mathbf{x} + \mathbf{u}(t)\right) \tag{5.0.2}$$
where $\mathbf{G}$ is a constant, diagonal, positive definite matrix such that $G_{ii} = g_i$.
The Jacobian of (5.0.2) is a state-dependent matrix:
$$\mathbf{J} = \mathbf{L} + \mathbf{G}\mathbf{W} \tag{5.0.3}$$
where $\mathbf{L}$ is a diagonal matrix such that $L_{ii} = \partial h_i/\partial x_i \leq -\beta$. Consider the following metric transformation to yield a generalized Jacobian $\mathbf{F}$:
$$\mathbf{F} = \boldsymbol{\Theta}\mathbf{J}\boldsymbol{\Theta}^{-1}, \qquad \boldsymbol{\Theta} = \mathbf{G}^{-1/2}\mathbf{D}, \tag{5.0.4}$$
where $\mathbf{D}$ is an arbitrary, non-singular diagonal matrix. Applying this metric to (5.0.3), and using the fact that diagonal matrices commute, we find that
$$\mathbf{F} = \mathbf{L} + \mathbf{G}^{1/2}\left(\mathbf{D}\mathbf{W}\mathbf{D}^{-1}\right)\mathbf{G}^{1/2}, \tag{5.0.5}$$
which implies that a sufficient condition for contraction in the RNN described by (5.0.2) is that
$$\mu_2\left(\mathbf{D}\mathbf{W}\mathbf{D}^{-1}\right) \leq -c < 0 \tag{5.0.6}$$
for some constant $c > 0$. If (5.0.6) is true, then the network described by (5.0.2) is contracting with rate at least $\lambda_F = c\,g_{\min}$.
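The following sketch evaluates (5.0.6) and the resulting generalized Jacobian for one random instance (the weight matrix, gains, and diagonal $\mathbf{D}$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 8
W = -3.0 * np.eye(N) + 0.2 * rng.normal(size=(N, N))   # example weights
gains = rng.uniform(0.5, 1.5, size=N)                  # heterogeneous gains g_i
d = rng.uniform(0.8, 1.2, size=N)                      # some diagonal metric D

M = np.diag(d) @ W @ np.diag(1.0 / d)
c = -np.linalg.eigvalsh((M + M.T) / 2).max()           # mu_2(D W D^{-1}) = -c
print("c =", c, " guaranteed rate c*g_min =", c * gains.min())

# Direct check on F = L + G^{1/2} (D W D^{-1}) G^{1/2}, with leak h(x) = -x.
L = -np.eye(N)                                         # beta = 1
G_half = np.diag(np.sqrt(gains))
F = L + G_half @ M @ G_half
print("mu_2(F) =", np.linalg.eigvalsh((F + F.T) / 2).max())  # <= -(beta + c*g_min)
```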

Contraction in Circuits with Perfect E-I Balance
One practical implication of (5.0.6) is that we can now exploit feedback structure present in W in a way that would have been missed by using the spectral norm.
As an example, consider a network with weight matrix
$$\mathbf{W} = \begin{pmatrix} \mathbf{W}_{EE} & \mathbf{W}_{EI} \\ \mathbf{W}_{IE} & \mathbf{W}_{II} \end{pmatrix} \tag{5.0.7}$$
Assume that $\mathbf{W}_{IE}$ is a matrix containing all positive entries and $\mathbf{W}_{EI}$ is a matrix containing all negative entries. Further assume that
$$\mathbf{W}_{EI} = -k\,\mathbf{W}_{IE}^T \tag{5.0.8}$$
where $k > 0$. In other words, (5.0.8) asserts that this circuit has perfect E-I balance, up to a constant scalar factor. In this case, the weight matrix becomes
$$\mathbf{W} = \begin{pmatrix} \mathbf{W}_{EE} & -k\,\mathbf{W}_{IE}^T \\ \mathbf{W}_{IE} & \mathbf{W}_{II} \end{pmatrix} \tag{5.0.9}$$
This corresponds to two interacting neural populations, one inhibitory and one excitatory. The excitatory population excites the inhibitory one through the $\mathbf{W}_{IE}$ submatrix and the inhibitory population inhibits the excitatory population through the $-k\mathbf{W}_{IE}^T$ submatrix. Denote the (unweighted) matrix measures of the EE and II subnetworks by $\mu_{EE} \equiv \mu_2(\mathbf{W}_{EE})$ and $\mu_{II} \equiv \mu_2(\mathbf{W}_{II})$, respectively. It's straightforward to show that by picking $\mathbf{D}$ as
$$\mathbf{D} = \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \sqrt{k}\,\mathbf{I} \end{pmatrix} \tag{5.0.10}$$
the off-diagonal blocks of $\mathbf{D}\mathbf{W}\mathbf{D}^{-1}$ cancel in the symmetric part, which implies that
$$\mu_2\left(\mathbf{D}\mathbf{W}\mathbf{D}^{-1}\right) = \max\left(\mu_{EE}, \mu_{II}\right).$$
This shows that by appropriately choosing $\mathbf{D}$ and requiring $\mu_{EE}$ and $\mu_{II}$ to satisfy (5.0.6), the RNN can be made contracting independent of the strength of individual elements in the off-diagonal (E-I and I-E) blocks.
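A numerical spot-check of this construction (dimensions, $k$, and weight statistics are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
nE = nI = 10
k = 3.7
W_IE = rng.uniform(0.0, 5.0, size=(nI, nE))            # strong E -> I weights
W_EE = -np.eye(nE) + 0.1 * rng.normal(size=(nE, nE))
W_II = -np.eye(nI) + 0.1 * rng.normal(size=(nI, nI))
W = np.block([[W_EE, -k * W_IE.T], [W_IE, W_II]])      # perfectly balanced, Eq. (5.0.9)

D = np.diag(np.r_[np.ones(nE), np.sqrt(k) * np.ones(nI)])   # Eq. (5.0.10)
M = D @ W @ np.linalg.inv(D)
mu2 = lambda Q: np.linalg.eigvalsh((Q + Q.T) / 2).max()
print("mu_2(D W D^{-1})      :", mu2(M))
print("max(mu_EE, mu_II)     :", max(mu2(W_EE), mu2(W_II)))
print("mu_2(W) without metric:", mu2(W))               # typically large and positive
```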

Contraction in Circuits With Statistical E-I Balance
In a biological system, perfect E-I balance (i.e (5.0.8)) cannot reasonably be expected. However, some basic results from matrix algebra and random matrix theory can tell us that contraction can be assured even in the case where E-I balance occurs in an average sense.
In particular, we'll need the following result from matrix algebra. Consider a symmetric block matrix of the form
$$\mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^T & \mathbf{C} \end{pmatrix} \tag{5.0.11}$$
where $\mathbf{A}$ and $\mathbf{C}$ are negative definite and symmetric, with $\lambda_{\max}(\mathbf{A}) = -\lambda_A < 0$ and $\lambda_{\max}(\mathbf{C}) = -\lambda_C < 0$. Then a standard result from matrix algebra [3] says that a sufficient condition for negative-definiteness of (5.0.11) is that
$$\sigma_{\max}(\mathbf{B})^2 < \lambda_A\,\lambda_C. \tag{5.0.12}$$
We'll also need the following result from random matrix theory. Consider an $N \times N$ symmetric matrix $\mathbf{R}$ whose elements above and including the diagonal are independently distributed according to a Gaussian distribution:
$$r_{ij} \sim \mathcal{N}\left(0, \sigma^2\right), \qquad i \leq j.$$
Then for large $N$, we have from [2] that, in expectation,
$$\lambda_{\max}(\mathbf{R}) = 2\sigma\sqrt{N}. \tag{5.0.13}$$
We can use (5.0.12) together with (5.0.13) to estimate how much deviation from perfect E-I balance a circuit of size $N$ can tolerate while preserving contraction.
In particular, consider again the matrix defined in (5.0.7) and assume that (5.0.6) is satisfied for each diagonal block. That is, assume
$$\mu_2(\mathbf{W}_{EE}) = -\lambda_E < 0, \qquad \mu_2(\mathbf{W}_{II}) = -\lambda_I < 0,$$
and assume $N_E = N_I = \frac{N}{2}$ for simplicity. We define $\mathbf{R}$ as the part of the symmetric part of $\mathbf{W}$ that survives imperfect E-I balance:
$$\mathbf{R} \equiv \begin{pmatrix} \mathbf{0} & \frac{1}{2}\left(\mathbf{W}_{EI} + \mathbf{W}_{IE}^T\right) \\ \frac{1}{2}\left(\mathbf{W}_{IE} + \mathbf{W}_{EI}^T\right) & \mathbf{0} \end{pmatrix},$$
so that $\mathbf{W}_s$ is the sum of $\mathbf{R}$ and the block-diagonal matrix built from $(\mathbf{W}_{EE})_s$ and $(\mathbf{W}_{II})_s$. From (5.0.12), we can assure contraction in regions where
$$\lambda_{\max}(\mathbf{R})^2 < \lambda_E\,\lambda_I.$$
To find this region, we will upper bound the spectral norm of $\mathbf{R}$ as follows. Instead of the case where E-I and I-E connections cancel out exactly, consider the case where they are each drawn from a distribution whose means cancel out exactly. Namely:
$$(\mathbf{W}_{IE})_{ij} \sim \mathcal{N}\left(m, \sigma_{IE}^2\right), \qquad (\mathbf{W}_{EI})_{ji} \sim \mathcal{N}\left(-m, \sigma_{EI}^2\right).$$
This implies that the entries of $\mathbf{R}$ have zero mean and variance
$$\frac{\sigma_{IE}^2 + \sigma_{EI}^2}{4}.$$
Thus, the matrix $\mathbf{R}$ is (approximately) a symmetric, Gaussian matrix with zero mean and variance $\frac{\sigma_{IE}^2 + \sigma_{EI}^2}{4}$. By plugging this expression into (5.0.13), we can conclude contraction when the following relationship holds between the connection statistics and the eigenvalues of the E-E and I-I connection matrices:
$$N\left(\sigma_{IE}^2 + \sigma_{EI}^2\right) < \lambda_E\,\lambda_I.$$
Of course, in the limit as the spreads go to zero, we arrive back at the case of perfect E-I balance with $k = 1$, which always yields contraction.
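A quick Monte Carlo check of the scaling in (5.0.13) (size, variance, and trial count are illustrative choices):

```python
import numpy as np

# For a symmetric Gaussian matrix with entry variance sigma^2,
# lambda_max concentrates near 2 * sigma * sqrt(N).
rng = np.random.default_rng(9)
N, sigma, trials = 400, 0.05, 20
lams = []
for _ in range(trials):
    A = rng.normal(0.0, sigma, size=(N, N))
    R = np.triu(A) + np.triu(A, 1).T          # symmetric, Gaussian entries
    lams.append(np.linalg.eigvalsh(R).max())
print("mean lambda_max:           ", np.mean(lams))
print("prediction 2*sigma*sqrt(N):", 2 * sigma * np.sqrt(N))
```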
Another way to get a sense of the tolerance for imperfect E-I balance is to use the Frobenius norm of $\mathbf{R}$, which upper-bounds the spectral norm. Assume that for each connection between excitatory neuron $i$ and inhibitory neuron $j$, we have
$$\left|(\mathbf{W}_{EI})_{ji} + (\mathbf{W}_{IE})_{ij}\right| \leq \kappa.$$
This straightforwardly implies that the Frobenius norm of $\mathbf{R}$ is upper bounded by
$$\|\mathbf{R}\|_F \leq \frac{\kappa}{2}\sqrt{2N_E N_I} = \frac{\kappa N}{2\sqrt{2}},$$
which means that it is possible to ensure contraction by scaling $\kappa$ with $N$, since
$$\lambda_{\max}(\mathbf{R}) \leq \|\mathbf{R}\|_F.$$
So by scaling $\kappa$, which measures how imperfectly each E-I synapse is balanced by the corresponding I-E synapse, we can guarantee contraction. In the next section we relate contraction to the echo state property (ESP), and derive a slight generalization of a known sufficient condition for the ESP in networks where the activation function is more flexible than $\tanh(x)$.

Relationship to Echo State Networks
We will need the following result. Consider a discrete-time dynamical system:
$$\mathbf{x}_{n+1} = \mathbf{T}(\mathbf{x}_n, \mathbf{u}_n).$$
It was shown in [4] that this system will be contracting if there exists some non-singular matrix $\boldsymbol{\Theta}_n$ such that
$$\sigma_{\max}\left(\boldsymbol{\Theta}_{n+1}\,\frac{\partial\mathbf{T}}{\partial\mathbf{x}_n}\,\boldsymbol{\Theta}_n^{-1}\right) < 1.$$

Theorem 2. Consider the discrete-time RNN
$$\mathbf{x}_{n+1} = \mathbf{W}\mathbf{r}(\mathbf{x}_n + \mathbf{u}_n) \tag{5.1.1}$$
where $r_i = r_j = r$ is a nonlinearity (e.g. $\tanh(x)$) that satisfies
$$|r'(x)| \leq g.$$
If the matrix $\tilde{\mathbf{W}} \equiv g\mathbf{W}$ is Schur diagonally stable, then the RNN is contracting, and therefore has the ESP. That is, if there exists a positive definite, diagonal matrix $\mathbf{P}$ such that
$$\tilde{\mathbf{W}}^T\mathbf{P}\tilde{\mathbf{W}} - \mathbf{P} \prec 0,$$
the RNN is contracting.
Proof. For the RNN defined in the theorem, we have
$$\frac{\partial\mathbf{T}}{\partial\mathbf{x}_n} = \mathbf{W}\boldsymbol{\Phi}_n, \qquad \boldsymbol{\Phi}_n \equiv \mathrm{diag}\big(r'(\mathbf{x}_n + \mathbf{u}_n)\big), \qquad \sigma_{\max}(\boldsymbol{\Phi}_n) \leq g,$$
so the contraction condition above reads
$$\sigma_{\max}\left(\boldsymbol{\Theta}_{n+1}\mathbf{W}\boldsymbol{\Phi}_n\boldsymbol{\Theta}_n^{-1}\right) < 1.$$
Let $\boldsymbol{\Theta}_n = \mathbf{P}^{1/2}$, where $\mathbf{P}$ is diagonal and positive definite. Using the submultiplicativity of the spectral norm (and the fact that the diagonal matrices $\boldsymbol{\Phi}_n$ and $\mathbf{P}^{1/2}$ commute), we find that a sufficient condition for contraction is:
$$\sigma_{\max}\left(\mathbf{P}^{1/2}\mathbf{W}\mathbf{P}^{-1/2}\right) < g^{-1}.$$
We can easily rewrite this condition as
$$\mathbf{W}^T\mathbf{P}\mathbf{W} - g^{-2}\mathbf{P} \prec 0.$$
Multiplying both sides by $g^2$, we find that the RNN is contracting if
$$(g\mathbf{W})^T\mathbf{P}(g\mathbf{W}) - \mathbf{P} \prec 0,$$
which can be readily recognized as the statement that the matrix $\tilde{\mathbf{W}} = g\mathbf{W}$ is Schur diagonally stable.
Remark 1. The above stability criterion immediately applies to RNNs of the form
$$\mathbf{x}_{n+1} = \mathbf{r}(\mathbf{W}\mathbf{x}_n + \mathbf{u}_n)$$
for nonsingular $\mathbf{W}$. This can be seen by applying the coordinate transform $\mathbf{y}_n = \mathbf{W}^{-1}\mathbf{x}_n$ to (5.1.1):
$$\mathbf{y}_{n+1} = \mathbf{W}^{-1}\left[\mathbf{W}\mathbf{r}(\mathbf{x}_n + \mathbf{u}_n)\right] = \mathbf{r}(\mathbf{W}\mathbf{y}_n + \mathbf{u}_n).$$
Since $\delta\mathbf{y} = \mathbf{W}^{-1}\delta\mathbf{x}$, we have $\delta\mathbf{x} \to 0 \iff \delta\mathbf{y} \to 0$, so contraction of one system implies contraction of the other.

Remark 2. For the specific case of $r(x) = \tanh(x)$, and therefore $g = 1$, this result already appears in [8].
Remark 3. A common RNN uses the non-smooth activation function
$$r(x) = \max(0, x)$$
(the ReLU). This does not pose a problem for the above stability condition, for the following reason. Define a diagonal, 'switching' matrix $\mathbf{S}_n$ in the following way:
$$(\mathbf{S}_n)_{ii} = \begin{cases} 1 & \text{if } (\mathbf{x}_n + \mathbf{u}_n)_i > 0 \\ 0 & \text{otherwise} \end{cases}$$
and note that the dynamics of the ReLU network can be written
$$\mathbf{x}_{n+1} = \mathbf{W}\mathbf{S}_n(\mathbf{x}_n + \mathbf{u}_n).$$
The Jacobian of this system is
$$\frac{\partial\mathbf{x}_{n+1}}{\partial\mathbf{x}_n} = \mathbf{W}\mathbf{S}_n.$$
Since $\mathbf{S}_n$ is diagonal and $\|\mathbf{S}_n\| \leq 1$, we can use the same diagonal metric transformation as above to find that a sufficient condition for a ReLU RNN to be contracting is that $\mathbf{W}$ is Schur diagonally stable; in other words, that there exists some positive definite, diagonal $\mathbf{P}$ such that
$$\mathbf{W}^T\mathbf{P}\mathbf{W} - \mathbf{P} \prec 0.$$
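To make Theorem 2 concrete, the sketch below takes $r = \tanh$ (so $g = 1$), certifies Schur diagonal stability with the simplest choice $\mathbf{P} = \mathbf{I}$ by rescaling $\mathbf{W}$ to have spectral norm below one, and then checks that two runs with different initial states but identical inputs converge (all sizes and inputs are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(10)
N = 20
W = rng.normal(size=(N, N))
W *= 0.9 / np.linalg.norm(W, 2)               # enforce sigma_max(W) = 0.9 < 1

# Schur diagonal stability check with P = I: W^T P W - P < 0.
assert np.linalg.eigvalsh(W.T @ W - np.eye(N)).max() < 0

x1, x2 = rng.normal(size=N), rng.normal(size=N)
for n in range(200):
    u = np.sin(0.1 * n) * np.ones(N)          # shared input sequence
    x1 = W @ np.tanh(x1 + u)
    x2 = W @ np.tanh(x2 + u)
print("distance after 200 steps:", np.linalg.norm(x1 - x2))  # ~ 0
```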