The impact of sparsity in low-rank recurrent neural networks

Neural population dynamics are often highly coordinated, allowing task-related computations to be understood as neural trajectories through low-dimensional subspaces. How the network connectivity and input structure give rise to such activity can be investigated with the aid of low-rank recurrent neural networks, a recently-developed class of computational models which offer a rich theoretical framework linking the underlying connectivity structure to emergent low-dimensional dynamics. This framework has so far relied on the assumption of all-to-all connectivity, yet cortical networks are known to be highly sparse. Here we investigate the dynamics of low-rank recurrent networks in which the connections are randomly sparsified, which makes the network connectivity formally full-rank. We first analyse the impact of sparsity on the eigenvalue spectrum of low-rank connectivity matrices, and use this to examine the implications for the dynamics. We find that in the presence of sparsity, the eigenspectra in the complex plane consist of a continuous bulk and isolated outliers, a form analogous to the eigenspectra of connectivity matrices composed of a low-rank and a full-rank random component. This analogy allows us to characterise distinct dynamical regimes of the sparsified low-rank network as a function of key network parameters. Altogether, we find that the low-dimensional dynamics induced by low-rank connectivity structure are preserved even at high levels of sparsity, and can therefore support rich and robust computations even in networks sparsified to a biologically-realistic extent.


Introduction
Neural recordings in animals performing cognitive tasks have revealed that individual neurons ubiquitously display a high degree of coordination. When viewed in the activity state space, in which each each axis represents the firing rate of one unit, the trajectories of neural activity are typically confined to low-dimensional subspaces [1][2][3][4][5][6]. The resulting latent dynamics have been proposed to underpin complex cortical computations [7]. However, how the inputs to the network and the connectivity between the individual units shape such low-dimensional activity remains a prominent question.
A recently developed class of models, recurrent networks with low-rank connectivity, provide a tractable theoretical framework for addressing this question and unravelling the relationship between connectivity structure, low-dimensional dynamics and the resulting computations [8][9][10][11][12][13][14][15][16][17]. One important limitation is, however, that these models often assume a dense connectivity structure in which every neuron shares synapses with every other. In contrast, cortical networks exhibit a high degree of sparsity in their connectivity, meaning that each neuron receives inputs from only a fraction of its neighbors [18][19][20][21]. Since sparse matrices are typically full-rank, an important question is whether and how the results obtained in the study of low-rank recurrent networks apply to sparse connectivity structures.
Here we investigate how the dynamics of low-rank recurrent networks are impacted by increasing degrees of sparsity. We start by analysing the eigenvalue spectra of low-rank connectivity matrices in which sparsity is imposed by removing a random fraction of entries. Such matrices are full-rank, but we find that the corresponding eigenspectra consist of a continuous bulk and isolated outliers, and are therefore analogous to low-rank matrices superposed with a random, full-rank component [8,22,23]. We show that both the radius of the eigenvalue bulk and the outliers can be estimated analytically. We then use these results to compare the dynamics of sparsified low-rank networks to those of densely connected low-rank networks with a full-rank random component. Altogether we found that the low-dimensional dynamics generated by a low-rank connectivity structure are highly resistant with respect to sparsity and therefore provide a robust substrate for implementing computations in networks with biologically realistic connectivity.
where x i describes the total input current to each unit, τ is the time constant of the dynamics, J ij is the synaptic weights from unit j to unit i and ϕ(�) is a non-linear transfer function that we take to be the hyperbolic tangent. Each unit can also receive a time-dependent input current of magnitude u(t) via a feedforward weight vector I = {I i } i = 1. . .N . The set of synaptic weights J ij are stored in a connectivity matrix J, for which we consider two forms. We begin by considering full-rank Gaussian connectivity, introducing sparsity into the synaptic weights and establishing the ways in which sparse Gaussian networks differ from their fully-connected counterparts. We then constrain the connectivity to be low-rank and sparsify as before, examining how the impact of sparsity in such networks both parallels and contrasts with that of the Gaussian case.
The full-rank Gaussian networks are defined by a matrix J with entries independently distributed as where g controls the variance of the matrix entries and thus the strength of the coupling. For the low-rank networks, we consider the simplest case of a rank-one matrix P, constructed as in previous work [8] as a rescaled outer product of two N-dimensional random connectivity vectors m = {m i } i = 1. . .N and n = {n i } i = 1. . .N such that This ensures that all columns of P are linearly dependent and proportional to m. The individual entries m i and n i of the connectivity vectors are drawn independently for each i from a joint Gaussian distribution with mean 0 and covariance matrix S: where σ 2 is the variance of both connectivity vectors which controls the overall strength of the coupling, and σ mn is the covariance between them (see Methods). In the large N limit, this covariance becomes equivalent to the degree of overlap between m and n, given by the normalised scalar product: The covariance σ mn plays a critical role in the stability of the dynamics of the rank-one network due to its influence on the matrix eigenvalues, as will be seen in the following sections. The variance σ 2 likewise gains a critical influence as soon as the matrix becomes sparse. These two key parameters controlling the connectivity, the variance and covariance of the connectivity vectors, will therefore become paramount in the later analysis of the dynamics.

Eigenvalues of connectivity matrices
The dynamics of recurrent networks are strongly influenced by the eigenspectrum of their connectivity matrix. Regardless of network structure, the dynamics always possess a trivial fixed point at zero, since we take the transfer function ϕ(�) to be the hyperbolic tangent, and tanh(0) = 0. The stability of this zero fixed point is determined by the magnitude of the eigenvalue with largest real part, λ max . Since ϕ 0 (0) = 1, the stability matrix at zero reduces to S ij = J ij − δ ij , so the fixed point at zero becomes unstable when the largest eigenvalue of the connectivity matrix J surpasses unity. As soon as this occurs, non-trivial dynamics can emerge. To understand the impact of sparsity on network dynamics we will therefore analyse the changes in the network eigenspectra, and will place particular focus on λ max , the eigenvalue with maximum real part.
Eigenvalues of sparsified full-rank networks. We first consider the impact of sparsity on the eigenspectra of the full-rank, random connectivity matrices (Eq 2). Work in random matrix theory has demonstrated that the eigenspectra of such matrices are described by Girko's circular law [25]: for a matrix with entries independently distributed with mean zero and variance Var, the eigenvalues converge in the limit of N ! 1 to a uniform distribution within a disk of radius ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Var � N p centred at the origin. This result is universal in the sense that it holds for any distribution with finite variance. For the Gaussian networks J introduced in (Eq 2) with Var = g 2 /N, the eigenvalues are therefore uniformly distributed on a circular disk of radius approximated by g. Since the distribution is circular, the radius of this disk is equivalent in the large N limit to λ max , the eigenvalue with maximum real part, so we refer to both by the spectral radius R.
To sparsify the matrix, we simply choose a fraction s 2 [0, 1] of connections to set to zero at random. This is achieved by multiplying the original matrix J elementwise with a binary matrix X, where X ij are drawn independently from a Bernoulli distribution Bð1; 1 À sÞ, forming a sparse matrixJ ¼ J � X characterised by a degree of sparsity s (Fig 1A). Due to the presence of zeros, the entries ofJ have a lower variance than those of J, but remain independently distributed. Because of this property of independence, we expect the universality result of the circular law for iid matrices to hold [25]. Indeed, we find that the eigenvalues of the sparse matrixJ continue to distribute uniformly on a disk for which the spectral radius is described by ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where Var is now the variance of the sparse matrix elementsJ ij . This variance can be calculated (see Methods) as (1 − s)g 2 /N, giving a spectral radius of: Influence of sparsity on the eigenspectra of full-rank networks. A: Illustration of how sparsity is imposed in the connectivity matrix, where the degree of sparsity is s = 0.5. B: Complex eigenspectra of full-rank, Gaussian connectivity matrices of finite size (N = 300, g = 1) in the dense case (left) and with a sparsity of 0.5 (right). The dashed line plots the unit circle. C, D, E: Reduction of spectral radius R as a function of sparsity in a fullrank matrix J constructed as in (Eq 2) with connection strength g = 1. In C, sparsity is imposed as a fraction of total connections removed (N = 1000). In D and E sparsity is imposed by fixing the number of outgoing connections to C = 200 and increasing N. Dots: mean empirical spectral radius, measured as the largest absolute value of all eigenvalues, over 50 instances. Solid lines: theoretical prediction (Eq (6)). https://doi.org/10.1371/journal.pcbi.1010426.g001 Increasing the degree of sparsity s in a full-rank Gaussian network thus monotonically reduces the radius of the disk on which the eigenvalues distribute. In Fig 1C we demonstrate the correspondence of the prediction in (Eq 6) to the empirical spectral radius, measured as the largest eigenvalue in the spectrum of a finite-sized Gaussian matrix sparsified in the manner described above.
An alternative means of sparsifying the matrix is to set to zero a fixed number of outgoing connections C per unit, while increasing the total number of units N, where C � N. This is often a regime of interest in neuroscientific work [26]. The network is now defined by a degree of sparsity s = 1 − C/N, where C/N is the fraction of non-zero connections per unit. The impact of sparsifying the matrix in this manner is equivalent to the previous case, where the theoretical spectral radius is now given by g ffi ffi ffi ffi ffi ffi ffi ffi ffiffi C=N p ( Fig 1D) and taking the degree of sparsity to one now corresponds to taking the number of units N to infinity (Fig 1E).
Eigenvalues of sparsified rank-one networks. We now turn to the impact of sparsity on the eigenspectra of rank-one matrices. Fully-connected rank-one matrices have only one potentially-nonzero eigenvalue, formed from the scalar product of the corresponding left and right eigenvectors. For the matrix P defined in (Eq 3), the right and left eigenvectors are the connectivity vectors m and n, and the corresponding eigenvalue is located at m T n/N on the real axis, which is equivalent in the large N limit to the overlap σ mn between the connectivity vectors. All the remaining eigenvalues lie at zero.
When sparsity is introduced to the rank-one structure, forming a sparse matrixP, this matrix is now formally full-rank and possesses N potentially-nonzero eigenvalues. However, we observed empirically that the eigenspectrum splits into two distinct components. The eigenvalue associated with the rank-one structure remains distinct in the spectrum, since such structure persists as a backbone to the connectivity. We refer to this structural eigenvalue as the outlier. At the same time, the full-rank perturbation introduced by the sparsity induces additional eigenvalues with nonzero real and imaginary parts which distribute about the origin on a disk with non-uniform density (Fig 2A). We refer to this set of additional non-zero eigenvalues induced by sparsity as the bulk. For the sparsified rank-one matrixP, it is now these two components of the spectrum, the bulk and the outlier, that together contribute to the Influence of sparsity on the eigenspectra of rank-one networks. A: Eigenspectra of rank-one connectivity matrices of finite size, in the dense case (left) and under a sparsity of 0.5 (right). The matrix P is constructed as in (Eq 3), with parameters σ 2 = 16, σ mn = 1.44 and N = 300. Under sparsity, the outlier (gold) is reduced and the bulk distribution (brown) emerges. The dashed line plots the unit circle. B, C: Impact of sparsity on two key features of the eigenspectrum of finite-size rank-one networks: B, the outlier λ 1 , and C, the spectral radius of the bulk distribution. The outlier is eventually reduced below the instability boundary of λ 1 = 1, dashed line. Sparsity is imposed as a fraction of total connections removed; σ 2 = 16, σ mn = 4 and N = 1000. D: Same as in C but for sparsity imposed by fixing C = 200 non-zero connections and increasing N; bulk radius is plotted as a function of N. Dots: empirical measurements of outlier and bulk radius. Solid lines: theoretical prediction (Eq (9)).
https://doi.org/10.1371/journal.pcbi.1010426.g002 dynamics. Understanding the impact of sparsity on the dynamics therefore reduces to understanding how each component is modified by sparsity. We now address each in turn.
Firstly, the outlier λ 1 is reduced monotonically by sparsity. It can be shown that in the large N limit, the right connectivity vector m remains a right-eigenvector ofP, and yet a fraction s of matrix entries are now zero; this means that the factor by which the outlier is reduced is 1 − s (see Methods). The outlier therefore lies at (1 − s)m T n/N on the real axis, and is drawn in towards the origin as the degree of sparsity approaches one (Fig 2B). In the large N limit, this value is equivalent to Secondly, we wish to characterise the radius of the bulk distribution. Although the distribution of eigenvalues in the bulk is non-uniform (Fig 3A), it continues to be circular, and we thus hypothesise that the universality result for the radius [25] still holds. This would allow us to characterise the bulk radius directly using the variance of matrix elements, in a similar manner to the Gaussian matrix. To test this, we derive the variance of the elements not of the entire sparse matrixP, but of a new matrixP � which possesses solely the eigenvalues in the bulk distribution. We remove the eigenvalue outlier from the spectrum as in [22,23,27], constructing a new matrixP � ¼P À ð1 À sÞP as a linear combination of the dense matrix P and the original sparse matrixP. The connectivity vector m is also an eigenvector ofP � , but now with a zero eigenvalue. The distribution of the remaining eigenvalues ofP � is identical to those inP, but with the outlier λ 1 removed. By deriving the variance of the elements ofP � (see Methods), we obtain: The theoretical bulk radius given by the circular law is therefore: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi sð1 À sÞ N r : ð9Þ We find that this expression accurately describes the radius of the bulk distribution measured empirically in finite-size networks ( Fig 2C). When sparsity is imposed by setting a fixed number of connections C and increasing N, these quantities can simply be redefined in terms of C and N by substituting s = 1 − C/N. We thus obtain the outlier as: and the radius of the bulk as: In contrast to the previous case of Gaussian networks, the radius of the bulk distribution which emerges in rank-one networks therefore scales non-monotonically with sparsity, first increasing to its maximum extent at s = 0.5, then reducing once more into the origin as the degree of sparsity approaches one (Fig 2C and 2D).
High sparsity limit. With the rank-one connectivity defined with a 1/N scaling as in Eq 3, when C is fixed and N is taken to 1, both the bulk radius R and the outlier λ 1 are reduced to zero. In order to have a non-vanishing eigenspectrum and ensure that the bulk radius and the outlier remain finite in the limit of N ! 1, we turn to a rescaled version of the connectivity matrix P ij (Fig 3). By removing the weight scaling by N and considering simply the matrix the outlier is now constant with respect to sparsity: and the radius of the bulk becomes: which approaches ffi ffi ffiffi C p s 2 in the limit of high sparsity as N is taken to infinity at finite C ( Fig 3B). Thus in this rescaled network, for a given number of connections per neuron C and for a high level of sparsity C � N, the radius of the bulk distribution depends only on the variance σ 2 of the connectivity vectors (Fig 3C), and the location of the structural eigenvalue outlier depends only on their covariance σ mn (Fig 3D). This decoupling from N allows us to understand the dynamics of networks situated in the high-sparsity regime solely in terms of the two variables characterising the rank-one connectivity, the variance σ 2 and the covariance σ mn of the connectivity vectors.

Dynamics of sparsified rank-one networks
Having characterised the eigenspectrum of the sparsified rank-one matrix, we now turn to the insights we can extract about the dynamics. We have shown that the eigenspectrum of the sparse rank-one matrix is comprised of two distinct components, the outlier and the bulk distribution, which are under the independent control of two key parameters defining the network connectivity. Moreover, we have shown that in the large N limit, the spectral radius of the bulk distribution can be characterised by Girko's circular law, in the same manner as the

PLOS COMPUTATIONAL BIOLOGY
The impact of sparsity in low-rank recurrent neural networks circular disk of eigenvalues characteristic of a full-rank Gaussian matrix (Fig 1). This leads us to hypothesize an equivalence between the dynamics of a sparsified rank-one network and those of a dense rank-one network with an added full-rank, Gaussian component, which also give rise to an outlier and an eigenvalue disk in the spectrum [8]. In what follows, we explore the extent to which the dynamics of sparsified rank-one networks resemble dense rank-one networks with additional random connectivity, and highlight the aspects in which they are unique.
To preface our interpretation of the dynamics, we briefly summarise the behaviour of lowrank recurrent networks in the dense case [8,12,13]. In general, a network of rank R gives rise to dynamics embedded in an R-dimensional subspace spanned by the right connectivity vectors, with an additional dimension introduced by each addition of external input along a given vector I. Recent work has demonstrated that the trajectories within this subspace can be reduced to a mean-field description of a small number of interacting latent variables [12,13]. For the rank-one networks that we consider here, the activation of each unit x i can be described by: where I is the vector along which the network receives an external input. The latent variables κ r (t) and κ I (t) define the projection of the population activity x onto the vectors m and I respectively. The population activity therefore spans the plane formed by the vectors m and I, and reduces to a one-dimensional trajectory along the vector m in the absence of input. Whether or not activity is generated along m is determined by the total recurrent input κ rec , given by (see Methods): which represents the overlap of the network activity ϕ(x) with the left connectivity vector n. A non-zero value of κ rec -and thus non-trivial equilibrium dynamics structured along m-can only arise if the connectivity vector n has a non-zero overlap σ nI with the input vector I (inputdriven dynamics) or a non-zero overlap σ nm with the connectivity vector m (autonomous dynamics). In the latter case, the system can possess a bistability and evolve in the absence of input towards one of two stable fixed points. Adding an external input which overlaps with the vector n increases the activity along m but also eventually suppresses one of these bistable states [8].
Here, we address the impact of sparsity on the degree of structure present in the inputdriven and autonomous network dynamics. We focus on rank-one networks in the high-sparsity limit (Eq 12) where the number of non-zero connections C is fixed independently of N, in which case the magnitude of the outlier and bulk distribution become independent of N and remain finite at high sparsities. We first fix both the outlier and the bulk below unity, and consider the network response to an external feedforward input (Fig 4). We then turn to the autonomous dynamics that arise when both the outlier and bulk are above the instability, and study the dynamical landscape formed from the interaction between the two components ( Fig 5).
Input-driven dynamics. For dense rank-one connectivity, when both the eigenvalue outlier and the radius of the bulk are fixed below unity, the fixed point at zero is stable, and the network can display only transient dynamics invoked by an external input u(t) along a feedforward input pattern I (Fig 4A). If the input pattern is orthogonal to the left connectivity vector n, the network activity simply propagates the feedforward input pattern along the one-dimensional axis of I. However, two-dimensional trajectories can emerge if the vector I is given a non-zero overlap with n; in this case, the component of the input-driven activity ϕ(x) along n results in a non-zero κ r , which allows the trajectory to evolve into the m dimension during the course of the input current [8]. This behaviour can be seen by projecting the activity into the m-I plane (Fig 4B). The input-driven trajectory is confined to the m-I plane, revealing the underlying rank-one structure in the connectivity.
To understand the manner in which sparsity interferes with these input-driven trajectories, we first consider what happens when we simply add a random, Gaussian component of variance g 2 /N to an otherwise dense rank-one matrix, which introduces to the eigenspectrum an eigenvalue disk similar in nature to the bulk distribution that arises under sparsity. With such an addition, the input-driven trajectories are only subtly modified as the strength g of the random component (Eq 2) is increased from 0 to 1, increasing the radius of the corresponding eigenvalue disk from 0 to 1 (Fig 4B and 4C). The projection of the population activity in the m-I plane remains unaffected (Fig 4B, left), but the random perturbation to the recurrent inputs causes the population activity to gain additional dimensions (Fig 4B, right). The

PLOS COMPUTATIONAL BIOLOGY
The impact of sparsity in low-rank recurrent neural networks dominant dimensions of the activity remain aligned with the axes of I and m (Fig 4B, right), and the degree of activation along the m dimension, as quantified by κ r , is not reduced (Fig  4C, lower). However, because the radius of the eigenvalue disk induced by the random component is equal to g (Fig 4C, lower), the dimensionality of the network activity increases non-linearly with g (Fig 4C, upper).
The impact of sparsity is markedly different (Fig 4D-4F). As the degree of sparsity is increased, the increased presence of zeros in the connectivity reduces the overlap of the inputdriven activity with the n dimension, reducing κ r and leading to a progressive loss of structure along m (Fig 4E and 4F). Since the feedforward connections are left untouched, the degree to which the activity spans the I dimension is not affected. The input-driven trajectories of the sparse network are therefore flattened towards the I axis as sparsity is increased (Fig 4E), and the degree of activity along m is progressively decreased to zero (Fig 4F, lower). Moreover, since the radius of the eigenvalue bulk varies non-monotonically as C is decreased to increase the degree of sparsity (Fig 4F, lower), the dimensionality of the network activity also first increases before decreasing again to zero as the recurrent inputs are stripped away (Fig 4F, upper, blue curve). For comparison, we also show the dimensionality of an equivalent Gaussian network defined by a low-rank plus a random component with strength g equal to the corresponding radius of the bulk distribution at each degree of sparsity C (red curve). At intermediate sparsity levels, the sparse low-rank network gives rise to higher-dimensional dynamics than the equivalent Gaussian network.
We therefore highlight a key difference between the input-driven dynamics of sparsified rank-one networks and those of a dense rank-one network with an added random component. Sparsity interferes with the structure of input-induced activity in a way that a random component does not: it reduces the component of the dynamics along m otherwise revealed by an appropriate geometric configuration of inputs, and ultimately reduces the overall dimensionality of the activity by stripping away the influence of recurrent inputs. At intermediate sparsities, this dimensionality is increased above that expected in a dense low-rank network with an analogous eigenvalue disk introduced by a superposed Gaussian component. In this latter network type, the overall dimensionality of the dynamics monotonically increases with the strength of the random component while the the dynamics along m are preserved.
Autonomous dynamics. When the outlier and the bulk radius in the eigenspectrum are increased above unity, the fixed point at zero loses stability and non-trivial autonomous dynamics emerge. It is here that the equivalence of the sparse regime to the addition of a random component manifests itself. In previous work [8] investigating the autonomous dynamics that emerge in networks comprised of a rank-one component P plus a full-rank random part, distinct dynamical regimes were identified on the basis of the dominance of each component in the eigenspectrum. The dynamics were described as decaying, if both the outlier of P and the eigenvalue disk of J lie below unity; structured stationary, if the outlier crosses the instability and the disk radius, inducing a non-trivial fixed point along the axis of m; or chaotic, if the radius of the disk belonging to J crosses the instability and the outlier, introducing the higherdimensional fluctuations classically associated with random networks [24].
Due to the similarities in the eigenspectra, our analyses reveal that the autonomous dynamical regimes of sparsified rank-one networks can be mapped directly onto those described above (Fig 5). The bulk distribution plays a role analogous to the eigenvalue disk of the random part of the connectivity, while the eigenvalue outlier takes the role of the outlier of P. In the rank-one network in the high-sparsity regime (Eq 12), the location of the bulk and the outlier are independent of N and thus remain present in the eigenspectrum even at high sparsities, at magnitudes described respectively by ffi ffi ffiffi C p s 2 and Cσ mn (Fig 3). As in densely connected networks, the instability can either be lead by the outlier, bringing the network into a heterogenous stationary regime aligned with m ( Fig 5A, top left), or by the bulk, inducing chaotic dynamics (Fig 5A, bottom right). Since the magnitude of the bulk radius and the outlier are controlled respectively by the variance σ 2 and covariance σ mn of the connectivity vectors, the regime in which the network is situated is dictated solely by the relative configuration of these key connectivity parameters. The phase diagram of Fig 5A summarises the dynamical landscape that arises; we note that this diagram is equivalent to that in [8], where the variance σ 2 takes the place of the coupling strength g of the random component.
Since the precise location of the outlier and the bulk are a function of C, the form of the phase diagram is modulated by C. Fixing N and modulating C shifts the boundaries of the phase diagram (Fig 5B), which can alter the dynamics displayed by a network for given values of the variance σ 2 and covariance σ mn . For example, in a network fixed at a given parameter location in the phase diagram (white square), progressively decreasing C to increase the degree of sparsity causes the network activity to lose structure along m (Fig 5C), since the outlier Cσ mn decreases as a function of C.
In contrast, when fixing C and increasing N to sparsify the connections, the degree of structure along m remains unaffected (Fig 5D). This is because the outlier is independent of N, and the bulk radius quickly becomes so as the sparsity 1 − C/N is increased. Thus, the rank-one network constructed as in Eq 12, appropriately parameterised, can preserve its rank-one outlier and sustain a high degree of sparsity while still displaying the structured, one-dimensional dynamics that are the hallmark of its underlying connectivity.
In summary, sparsified low-rank networks display a wider range of dynamics than their dense counterparts, since the full-rank perturbation to the connectivity introduced by sparsity acts in a manner analogous to the addition of a random component. When the dynamics are purely input-driven, these two cases are not directly equivalent; increasing the degree of sparsity ultimately reduces the dimensionality of the dynamics and erodes the structured component, while imposing a random component expands the dimensionality of the dynamics. However, when the dynamics are autonomous, the dynamical regimes accessible to sparsified low-rank networks can be equated directly to those of a rank-one network with an added Gaussian term.

Computations in sparsified networks
Low-rank recurrent networks lead to a simple, transparent relationship between the connectivity and resulting dynamics which can be harnessed to implement a rich repertoire of inputoutput computations [8,[11][12][13]. The fact that this relationship is preserved even at high levels of sparsity indicates that such computations can be performed even in the highly-sparse regime. By way of example, here we demonstrate how a non-linear input-integration task can be implemented in a sparse low-rank network by designing the connectivity according to geometric principles originally developed in the context of dense low-rank networks (Fig 6).
We consider a simple evidence-accumulation task in the form of a common behavioural paradigm: a stimulus is assumed to vary continuously along a particular stimulus feature, such as the coherence of a random-dot kinetogram [28], and the task is to report whether the mean magnitude � c of this stimulus feature is greater than a certain threshold. The actual magnitude c (t) is assumed to be subject to noise fluctuations, so the input must be integrated over time. To model the task, we follow [8] and use the network structure of Fig 6A. The network is provided with a fluctuating stimulus c(t)I + η(t), where the input pattern I represents the stimulus feature of interest, c(t) is its fluctuating magnitude, and η(t) is a vector of independent noise inputs. A readout unit sums the network activity through a set of readout weights w to generate an output z(t). We consider the task as a Go-Nogo paradigm, in which the network must produce a positive readout if the average stimulus magnitude, � c, is above a certain threshold θ (Go condition), and a negative readout if the magnitude is below this threshold (Nogo condition).
In dense networks, the task can be implemented easily with unit rank connectivity, using a solution which relies solely on an appropriate geometric structure of input and readout weights. The details of the solution are discussed in detail in [8]; here we simply give an overview of the requirements. As described in the previous section on dynamics, the network response to an input pattern I overlapping with n is a two-dimensional trajectory which spans the m − I plane. A non-trivial readout can therefore be obtained by using a readout vector w which overlaps either with m or I. Here we use the simplest connectivity solution: we set I = n and w = m. This ensures that the network generates a non-zero output in response to the given input pattern I. Moreover, to obtain the nonlinear behaviour required for the task and ensure that the output is positive only when the input strength is above a certain threshold, we additionally choose m and n with an overlap σ mn greater than unity. This allows us to exploit the bistability of the resulting fixed point: by initialising the network in the lower fixed point, the readout is initially negative, but if the integrated input is strong enough, the network activity To summarise, the task is implemented with the following connectivity solution: we first generate randomly the input and readout vectors I and w; we then set n = I to allow the input to be picked up by the recurrent dynamics and m = w for the recurrent dynamics to themselves influence the readout; and finally, we add a common component to m and n so that they overlap in a shared dimension orthogonal to the input and readout, in order to generate the nonlinear switch in readout upon integrating the stimulus (Fig 6A, centre). To implement the task in a sparse rank-one network, we generate connectivity vectors m and n which obey these requirements and construct a rank-one matrix P as P ij = m i n j . We then keep only C non-zero inputs per neuron to form the sparse connectivity matrixP (Fig 6A, right).
Network simulations confirm that the task can be performed accurately at high degrees of sparsity (Fig 6B), generating a positive readout only for high average stimulus magnitudes, � c. The requirement for success is that the rank-one outlier remains greater than the bulk of the eigenvalue distribution, to keep the network in the structured regime in which the bistability can be maintained. Since the boundaries between regimes shift as C is modified (Fig 5B), the connectivity vector overlap σ mn needs to be modulated to ensure that the outlier continues to dominate as the network is sparsified. If σ mn is not modulated simultaneously, the psychometric curve for the task simply shifts with C ( Fig 6C): as C is lowered, the network displays increased sensitivity and lower (sub-threshold) stimulus strengths begin to generate positive readouts, decreasing the accuracy on the task. In contrast, if σ mn is modulated in order to fix the outlier at a constant value while sparsity is modified, the psychometric curve does not change (Fig 6D). This is because it is the overlap σ mn , and thus the outlier λ 1 , that determines the timescale over which the input is integrated to form the estimate � c, and thus the threshold value for which the readout switches from negative to positive.
For comparison, we also test the performance of dense low-rank networks with an added Gaussian component, parameterised to be in an equivalent regime to the sparse networks. Fig 6E shows the task implementation in a low-rank-plus-Gaussian network constructed to have the equivalent eigenspectrum to the sparse network in Fig 6B, with the same outlier λ 1 and spectral radius R. Under this parameterisation, the task is performed accurately and gives rise to a similar psychometric curve, indicating that inputs are integrated similarly in the structured regime of both sparse and Gaussian networks. We find that the psychometric curve is unaffected by changes in random strength g from 0 and 1 (Fig 6F), and begins to be affected only at very high random strengths (g > 3). This is because the high-dimensional fluctuations induced by random connectivity are averaged out by the readout, leaving the task performance unaffected. This stresses the fact that task performance depends critically on the value of the outlier and not on the bulk radius, provided the bulk radius is still low enough to maintain the network in the structured regime. The result is that task performance is robust to connectivity perturbations-such as sparsity or addition of a random component-that merely modify the eigenspectrum in such a way that does not fundamentally disrupt the dominance of low-rank dynamics.
The basic principle we have demonstrated here is that the structured dynamical regime induced by low-rank connectivity can be preserved even at high sparsities, which means that computations designed to exploit this structure can be implemented effectively even in networks which are highly sparse. The implication is that the full repertoire of dynamical computations accessible to a low-rank network can be likewise performed at high sparsities, provided the network is appropriately parameterised.

Discussion
In this study, we investigated the dynamics of recurrent neural networks in which the connectivity matrices are sparse but possess an underlying low-rank structure. We showed that the resulting full-rank connectivity matrices have eigenspectra which consist of two distinct components, a continuous bulk distribution and isolated outliers. Such eigenspectra are directly analogous to those of a fully-connected unit rank structure plus a full-rank random component [8,22,23]. Analytically estimating the magnitude of the outlier and the radius of the continuous bulk in the large N limit allowed us to predict the dynamics of the sparsified networks. In particular, the relative magnitude of the two major eigenspectrum components delineates boundaries between decaying, structured and chaotic dynamical regimes. The similarity in the eigenspectra implies that the regimes of autonomous dynamics in sparsified unit-rank networks are analogous to those of dense unit-rank networks with a random connectivity component [8]. Notable differences however appear when the dynamics are purely input-driven. Altogether, we found that computations designed to exploit key dynamical properties of lowrank networks are highly robust with respect to sparsity. This identifies sparsified low-rank networks as a biologically-plausible network structure through which to implement computation through low-dimensional population dynamics [7].
The sparse networks examined here were generated by directly removing connections in a fully-connected low-rank structure. The resulting connectivity matrices are directly analogous to those obtained by learning a single pattern through Hebbian plasticity on a sparse subset of connections [9,29], and therefore are of potential biological relevance. Our analyses can be directly extended to connectivity which consists of a sparsified low-rank structure superposed with a random sparse component with independent entries. As with randomly-connected networks, the results depend on assumptions regarding how the synaptic weights scale with the number of connections [10,26,[30][31][32][33]. Here we considered two cases, and ultimately focused on the situation in which both the number C of non-zero connections per neuron and the strength of the connections are fixed as the total number of neurons N is increased [26]. Under these assumptions, the radius of continuous bulk of the eigenspectrum remains finite for large N [27], as does the value of the outlier. For alternative choices of scaling, our analyses suggest that the expected behaviour of the eigenvalue bulk ultimately depends on the scaling of the variance of the connectivity matrix adjusted by removing the mean low-rank component.
Given its extreme ubiquity in the brain, a question of interest is whether sparsity confers any direct benefit to cortical networks aside from the evident reduction in metabolic and wiring costs. From our analysis, it is not directly clear whether sparsified low-rank networks possess direct computational advantages over their dense counterparts. Insights into the potential computational benefit of sparsity are however rife in the related field of deep learning. Research indicates that the performance of deep networks is remarkably robust to sparsity, and that a large majority of parameters can be pruned without significant loss in accuracy [34,35]. This makes sparsity a natural regulariser, often employed in combatting over-parameterisation and overfitting [36]. Computational advantages are observed as a consequence, including an improvement in the ability of the trained network to generalise [37,38] and an increased robustness to adversarial attacks [39,40], on top of significant savings in memory storage, training time and energy efficiency [35,41]. Nonetheless, such benefits are most often the result of a highly selective, rule-based pruning process, as opposed to the random weight selection employed in this study. An important avenue of future work will be to explore different forms of structure in the sparsity imposed, and its relation to the training process and the dynamic rules under which the connectivity evolves.

Spectral radius of sparsified full-rank matrix
The sparsified full-rank matrixJ is generated as the elementwise product of the original matrix J with an independent random binary matrix X whose elements are 0 with probability s and 1 with probability 1 − s. In other words: The entries of J have a variance of g 2 /N and a mean of 0, while the entries of X have a variance of (1 − s). The variance of the entries ofJ can therefore be derived as: The spectral radius is then given by the circular law: R ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Var ðJÞ � N q ¼ g ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi When sparsity is imposed by setting the number of connections per unit C, the radius is given by substituting s = 1 − C/N as: Eigenvalue spectrum of sparsified rank-one matrix The elements of the sparsified rank-one matrixP are generated in an equivalent manner as P ij ¼ P ij X ij . Its eigenspectrum is comprised of a continuous bulk and an isolated outlier. We here derive the location of the outlier λ 1 and the radius R of the bulk distribution individually. Outlier. We proceed by showing that, for N ! 1, the right connectivity vector m is an eigenvector v ofP, and derive the corresponding eigenvalue λ. By writing the individual matrix elements ofP as:P ij ¼ the i th element ofP applied to m is: As N ! 1, the sum over j in the right-hand side converges to an expectation due to the central limit theorem, and we may write: It therefore holds that:P m ¼ s mn ð1 À sÞ m ð27Þ so that m is a right eigenvector ofP with eigenvalue Bulk. To determine the radius of the bulk distribution, we derive the variance of the elements of the matrix with the outlier eigenvalue removed,P � ¼P À ð1 À sÞP. We first rewrite X as 1 − B, where B is a Bernoulli matrix with entries B ij � Bð1; sÞ, in order to rewrite the individual matrix entriesP � ij as follows: P � ij ¼P ij À ð1 À sÞ P ij ¼ P ij � X ij À ð1 À sÞ P ij ¼ P ij � ð1 À B ij Þ À ð1 À sÞP ij We can then derive the variance of the entriesP � ij as given that P ij ¼ m i n j N , and m i and n i each have variance σ 2 . As before, we may substitute s = 1 − C/N when sparsity is imposed by setting the number of connections per unit, to give: The bulk radius is then obtained via the circular law R ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Var � N p as in the Gaussian case.

Latent dynamics in low-rank networks
Here we summarize the description of low-dimensional dynamics in low-rank networks [12,13]. The low-rank network of Eq 3, with connectivity matrix P ij ¼ m i n j N , is governed by the dynamics introduced in the main text (Eq 1): At the level of the population, the collective trajectory x(t) is embedded in a low-dimensional linear subspace [12,13]. The total dimensionality of this subspace is the sum of the rank of the connectivity matrix P plus the dimensionality of external inputs. For a rank-one network with one external input vector, the dynamics are constrained to the two-dimensional plane spanned by the left connectivity vector m and the input vector I. The dynamics can then be represented in a new basis by projecting the activity trajectory x(t) onto these two axes. The individual unit activations thus read as: where κ r (t) and κ I (t) are projections of the activity x(t) onto the m and I axes respectively: k I ðtÞ ¼ 1 k I k 2 I T xðtÞ: The projection onto the m axis, κ r (t), is then governed by its own dynamical equation: where: At equilibrium, we therefore have κ r = κ rec .