Skip to main content
  • Loading metrics

The impact of sparsity in low-rank recurrent neural networks


Neural population dynamics are often highly coordinated, allowing task-related computations to be understood as neural trajectories through low-dimensional subspaces. How the network connectivity and input structure give rise to such activity can be investigated with the aid of low-rank recurrent neural networks, a recently-developed class of computational models which offer a rich theoretical framework linking the underlying connectivity structure to emergent low-dimensional dynamics. This framework has so far relied on the assumption of all-to-all connectivity, yet cortical networks are known to be highly sparse. Here we investigate the dynamics of low-rank recurrent networks in which the connections are randomly sparsified, which makes the network connectivity formally full-rank. We first analyse the impact of sparsity on the eigenvalue spectrum of low-rank connectivity matrices, and use this to examine the implications for the dynamics. We find that in the presence of sparsity, the eigenspectra in the complex plane consist of a continuous bulk and isolated outliers, a form analogous to the eigenspectra of connectivity matrices composed of a low-rank and a full-rank random component. This analogy allows us to characterise distinct dynamical regimes of the sparsified low-rank network as a function of key network parameters. Altogether, we find that the low-dimensional dynamics induced by low-rank connectivity structure are preserved even at high levels of sparsity, and can therefore support rich and robust computations even in networks sparsified to a biologically-realistic extent.

Author summary

In large networks of neurons, the activity displayed by the population depends on the strength of the connections between each neuron. In cortical regions engaged in cognitive tasks, this population activity is often seen to be highly coordinated and low-dimensional. A recent line of theoretical work explores how such coordinated activity can arise in a network of neurons in which the matrix defining the connections is constrained to be mathematically low-rank. Until now, this connectivity structure has only been explored in fully-connected networks, in which every neuron is connected to every other. However, in the brain, network connections are often highly sparse, in the sense that most neurons do not share direct connections. Here, we test the robustness of the theoretical framework of low-rank networks to the reality of sparsity present in biological networks. By mathematically analysing the impact of removing connections, we find that the low-dimensional dynamics previously found in dense low-rank networks can in fact persist even at very high levels of sparsity. This has promising implications for the proposal that complex cortical computations which appear to rely on low-dimensional dynamics may be underpinned by a network which has a fundamentally low-rank structure, albeit with only a small fraction of possible connections present.


Neural recordings in animals performing cognitive tasks have revealed that individual neurons ubiquitously display a high degree of coordination. When viewed in the activity state space, in which each each axis represents the firing rate of one unit, the trajectories of neural activity are typically confined to low-dimensional subspaces [16]. The resulting latent dynamics have been proposed to underpin complex cortical computations [7]. However, how the inputs to the network and the connectivity between the individual units shape such low-dimensional activity remains a prominent question.

A recently developed class of models, recurrent networks with low-rank connectivity, provide a tractable theoretical framework for addressing this question and unravelling the relationship between connectivity structure, low-dimensional dynamics and the resulting computations [817]. One important limitation is, however, that these models often assume a dense connectivity structure in which every neuron shares synapses with every other. In contrast, cortical networks exhibit a high degree of sparsity in their connectivity, meaning that each neuron receives inputs from only a fraction of its neighbors [1821]. Since sparse matrices are typically full-rank, an important question is whether and how the results obtained in the study of low-rank recurrent networks apply to sparse connectivity structures.

Here we investigate how the dynamics of low-rank recurrent networks are impacted by increasing degrees of sparsity. We start by analysing the eigenvalue spectra of low-rank connectivity matrices in which sparsity is imposed by removing a random fraction of entries. Such matrices are full-rank, but we find that the corresponding eigenspectra consist of a continuous bulk and isolated outliers, and are therefore analogous to low-rank matrices superposed with a random, full-rank component [8, 22, 23]. We show that both the radius of the eigenvalue bulk and the outliers can be estimated analytically. We then use these results to compare the dynamics of sparsified low-rank networks to those of densely connected low-rank networks with a full-rank random component. Altogether we found that the low-dimensional dynamics generated by a low-rank connectivity structure are highly resistant with respect to sparsity and therefore provide a robust substrate for implementing computations in networks with biologically realistic connectivity.


Network connectivity

We study recurrent networks of N firing rate units, following the classical formalism in [24]. The dynamics of each individual unit i evolve as: (1) where xi describes the total input current to each unit, τ is the time constant of the dynamics, Jij is the synaptic weights from unit j to unit i and ϕ(⋅) is a non-linear transfer function that we take to be the hyperbolic tangent. Each unit can also receive a time-dependent input current of magnitude u(t) via a feedforward weight vector I = {Ii}i = 1…N. The set of synaptic weights Jij are stored in a connectivity matrix J, for which we consider two forms. We begin by considering full-rank Gaussian connectivity, introducing sparsity into the synaptic weights and establishing the ways in which sparse Gaussian networks differ from their fully-connected counterparts. We then constrain the connectivity to be low-rank and sparsify as before, examining how the impact of sparsity in such networks both parallels and contrasts with that of the Gaussian case.

The full-rank Gaussian networks are defined by a matrix J with entries independently distributed as (2) where g controls the variance of the matrix entries and thus the strength of the coupling.

For the low-rank networks, we consider the simplest case of a rank-one matrix P, constructed as in previous work [8] as a rescaled outer product of two N-dimensional random connectivity vectors m = {mi}i = 1…N and n = {ni}i = 1…N such that (3)

This ensures that all columns of P are linearly dependent and proportional to m. The individual entries mi and ni of the connectivity vectors are drawn independently for each i from a joint Gaussian distribution with mean 0 and covariance matrix Σ: (4) where σ2 is the variance of both connectivity vectors which controls the overall strength of the coupling, and σmn is the covariance between them (see Methods). In the large N limit, this covariance becomes equivalent to the degree of overlap between m and n, given by the normalised scalar product: (5)

The covariance σmn plays a critical role in the stability of the dynamics of the rank-one network due to its influence on the matrix eigenvalues, as will be seen in the following sections. The variance σ2 likewise gains a critical influence as soon as the matrix becomes sparse. These two key parameters controlling the connectivity, the variance and covariance of the connectivity vectors, will therefore become paramount in the later analysis of the dynamics.

Eigenvalues of connectivity matrices

The dynamics of recurrent networks are strongly influenced by the eigenspectrum of their connectivity matrix. Regardless of network structure, the dynamics always possess a trivial fixed point at zero, since we take the transfer function ϕ(⋅) to be the hyperbolic tangent, and tanh(0) = 0. The stability of this zero fixed point is determined by the magnitude of the eigenvalue with largest real part, λmax. Since ϕ′(0) = 1, the stability matrix at zero reduces to Sij = Jijδij, so the fixed point at zero becomes unstable when the largest eigenvalue of the connectivity matrix J surpasses unity. As soon as this occurs, non-trivial dynamics can emerge. To understand the impact of sparsity on network dynamics we will therefore analyse the changes in the network eigenspectra, and will place particular focus on λmax, the eigenvalue with maximum real part.

Eigenvalues of sparsified full-rank networks.

We first consider the impact of sparsity on the eigenspectra of the full-rank, random connectivity matrices (Eq 2). Work in random matrix theory has demonstrated that the eigenspectra of such matrices are described by Girko’s circular law [25]: for a matrix with entries independently distributed with mean zero and variance Var, the eigenvalues converge in the limit of N → ∞ to a uniform distribution within a disk of radius centred at the origin. This result is universal in the sense that it holds for any distribution with finite variance. For the Gaussian networks J introduced in (Eq 2) with Var = g2/N, the eigenvalues are therefore uniformly distributed on a circular disk of radius approximated by g. Since the distribution is circular, the radius of this disk is equivalent in the large N limit to λmax, the eigenvalue with maximum real part, so we refer to both by the spectral radius R.

To sparsify the matrix, we simply choose a fraction s ∈ [0, 1] of connections to set to zero at random. This is achieved by multiplying the original matrix J elementwise with a binary matrix X, where Xij are drawn independently from a Bernoulli distribution , forming a sparse matrix characterised by a degree of sparsity s (Fig 1A). Due to the presence of zeros, the entries of have a lower variance than those of J, but remain independently distributed. Because of this property of independence, we expect the universality result of the circular law for iid matrices to hold [25]. Indeed, we find that the eigenvalues of the sparse matrix continue to distribute uniformly on a disk for which the spectral radius is described by , where Var is now the variance of the sparse matrix elements . This variance can be calculated (see Methods) as (1 − s)g2/N, giving a spectral radius of: (6)

Fig 1. Influence of sparsity on the eigenspectra of full-rank networks.

A: Illustration of how sparsity is imposed in the connectivity matrix, where the degree of sparsity is s = 0.5. B: Complex eigenspectra of full-rank, Gaussian connectivity matrices of finite size (N = 300, g = 1) in the dense case (left) and with a sparsity of 0.5 (right). The dashed line plots the unit circle. C, D, E: Reduction of spectral radius R as a function of sparsity in a full-rank matrix J constructed as in (Eq 2) with connection strength g = 1. In C, sparsity is imposed as a fraction of total connections removed (N = 1000). In D and E sparsity is imposed by fixing the number of outgoing connections to C = 200 and increasing N. Dots: mean empirical spectral radius, measured as the largest absolute value of all eigenvalues, over 50 instances. Solid lines: theoretical prediction (Eq (6)).

Increasing the degree of sparsity s in a full-rank Gaussian network thus monotonically reduces the radius of the disk on which the eigenvalues distribute. In Fig 1C we demonstrate the correspondence of the prediction in (Eq 6) to the empirical spectral radius, measured as the largest eigenvalue in the spectrum of a finite-sized Gaussian matrix sparsified in the manner described above.

An alternative means of sparsifying the matrix is to set to zero a fixed number of outgoing connections C per unit, while increasing the total number of units N, where CN. This is often a regime of interest in neuroscientific work [26]. The network is now defined by a degree of sparsity s = 1 − C/N, where C/N is the fraction of non-zero connections per unit. The impact of sparsifying the matrix in this manner is equivalent to the previous case, where the theoretical spectral radius is now given by (Fig 1D) and taking the degree of sparsity to one now corresponds to taking the number of units N to infinity (Fig 1E).

Eigenvalues of sparsified rank-one networks.

We now turn to the impact of sparsity on the eigenspectra of rank-one matrices. Fully-connected rank-one matrices have only one potentially-nonzero eigenvalue, formed from the scalar product of the corresponding left and right eigenvectors. For the matrix P defined in (Eq 3), the right and left eigenvectors are the connectivity vectors m and n, and the corresponding eigenvalue is located at mTn/N on the real axis, which is equivalent in the large N limit to the overlap σmn between the connectivity vectors. All the remaining eigenvalues lie at zero.

When sparsity is introduced to the rank-one structure, forming a sparse matrix , this matrix is now formally full-rank and possesses N potentially-nonzero eigenvalues. However, we observed empirically that the eigenspectrum splits into two distinct components. The eigenvalue associated with the rank-one structure remains distinct in the spectrum, since such structure persists as a backbone to the connectivity. We refer to this structural eigenvalue as the outlier. At the same time, the full-rank perturbation introduced by the sparsity induces additional eigenvalues with nonzero real and imaginary parts which distribute about the origin on a disk with non-uniform density (Fig 2A). We refer to this set of additional non-zero eigenvalues induced by sparsity as the bulk. For the sparsified rank-one matrix , it is now these two components of the spectrum, the bulk and the outlier, that together contribute to the dynamics. Understanding the impact of sparsity on the dynamics therefore reduces to understanding how each component is modified by sparsity. We now address each in turn.

Fig 2. Influence of sparsity on the eigenspectra of rank-one networks.

A: Eigenspectra of rank-one connectivity matrices of finite size, in the dense case (left) and under a sparsity of 0.5 (right). The matrix P is constructed as in (Eq 3), with parameters σ2 = 16, σmn = 1.44 and N = 300. Under sparsity, the outlier (gold) is reduced and the bulk distribution (brown) emerges. The dashed line plots the unit circle. B, C: Impact of sparsity on two key features of the eigenspectrum of finite-size rank-one networks: B, the outlier λ1, and C, the spectral radius of the bulk distribution. The outlier is eventually reduced below the instability boundary of λ1 = 1, dashed line. Sparsity is imposed as a fraction of total connections removed; σ2 = 16, σmn = 4 and N = 1000. D: Same as in C but for sparsity imposed by fixing C = 200 non-zero connections and increasing N; bulk radius is plotted as a function of N. Dots: empirical measurements of outlier and bulk radius. Solid lines: theoretical prediction (Eq (9)).

Firstly, the outlier λ1 is reduced monotonically by sparsity. It can be shown that in the large N limit, the right connectivity vector m remains a right-eigenvector of , and yet a fraction s of matrix entries are now zero; this means that the factor by which the outlier is reduced is 1 − s (see Methods). The outlier therefore lies at (1 − s)mTn/N on the real axis, and is drawn in towards the origin as the degree of sparsity approaches one (Fig 2B). In the large N limit, this value is equivalent to (7)

Secondly, we wish to characterise the radius of the bulk distribution. Although the distribution of eigenvalues in the bulk is non-uniform (Fig 3A), it continues to be circular, and we thus hypothesise that the universality result for the radius [25] still holds. This would allow us to characterise the bulk radius directly using the variance of matrix elements, in a similar manner to the Gaussian matrix. To test this, we derive the variance of the elements not of the entire sparse matrix , but of a new matrix which possesses solely the eigenvalues in the bulk distribution. We remove the eigenvalue outlier from the spectrum as in [22, 23, 27], constructing a new matrix as a linear combination of the dense matrix P and the original sparse matrix . The connectivity vector m is also an eigenvector of , but now with a zero eigenvalue. The distribution of the remaining eigenvalues of is identical to those in , but with the outlier λ1 removed. By deriving the variance of the elements of (see Methods), we obtain: (8)

Fig 3. Key features of rank-one eigenspectrum become independent of N in the high sparsity limit.

A: Illustration of the spectral radius R of the bulk distribution induced by sparsity, and the outlier λ1 inherited from the rank-one structure. Dots: eigenvalues of matrix. Dashed lines: theoretical predictions, with C = 200, N = 2000, σ2 = 0.09, and σmn = 0.008. B: Bulk radius (Eq 14) as a function of sparsity imposed by fixing C = 200 and increasing N, for the rescaled matrix Pij = minj with σ2 = 0.09 and σmn = 0.008. The outlier mn is now independent of N. The bulk radius converges towards (dashed line) as sparsity increases. C: Outlier and bulk radius as a function of the variance of the connectivity vectors, while the covariance is fixed (σmn = 0.01). D: Outlier and bulk radius as a function of the covariance σmn, while the variance is fixed (σ2 = 0.09). Empirical values are displayed as mean (dots) and standard deviation (bars, 10 repeats) of the eigenvalue with largest absolute magnitude (bulk) and real part (outlier), while the outlier is still distinguished from the bulk. When the outlier is smaller than the bulk, its location cannot be measured empirically. Lines: theoretical predictions at empirically measurable (solid) and unmeasurable (dashed) locations. Parameters: C = 200, N = 1200, resulting in s = 0.8.

The theoretical bulk radius given by the circular law is therefore: (9)

We find that this expression accurately describes the radius of the bulk distribution measured empirically in finite-size networks (Fig 2C). When sparsity is imposed by setting a fixed number of connections C and increasing N, these quantities can simply be redefined in terms of C and N by substituting s = 1 − C/N. We thus obtain the outlier as: (10) and the radius of the bulk as: (11)

In contrast to the previous case of Gaussian networks, the radius of the bulk distribution which emerges in rank-one networks therefore scales non-monotonically with sparsity, first increasing to its maximum extent at s = 0.5, then reducing once more into the origin as the degree of sparsity approaches one (Fig 2C and 2D).

High sparsity limit.

With the rank-one connectivity defined with a 1/N scaling as in Eq 3, when C is fixed and N is taken to ∞, both the bulk radius R and the outlier λ1 are reduced to zero. In order to have a non-vanishing eigenspectrum and ensure that the bulk radius and the outlier remain finite in the limit of N → ∞, we turn to a rescaled version of the connectivity matrix Pij (Fig 3). By removing the weight scaling by N and considering simply the matrix (12) the outlier is now constant with respect to sparsity: (13) and the radius of the bulk becomes: (14) which approaches in the limit of high sparsity as N is taken to infinity at finite C (Fig 3B). Thus in this rescaled network, for a given number of connections per neuron C and for a high level of sparsity CN, the radius of the bulk distribution depends only on the variance σ2 of the connectivity vectors (Fig 3C), and the location of the structural eigenvalue outlier depends only on their covariance σmn (Fig 3D). This decoupling from N allows us to understand the dynamics of networks situated in the high-sparsity regime solely in terms of the two variables characterising the rank-one connectivity, the variance σ2 and the covariance σmn of the connectivity vectors.

Dynamics of sparsified rank-one networks

Having characterised the eigenspectrum of the sparsified rank-one matrix, we now turn to the insights we can extract about the dynamics. We have shown that the eigenspectrum of the sparse rank-one matrix is comprised of two distinct components, the outlier and the bulk distribution, which are under the independent control of two key parameters defining the network connectivity. Moreover, we have shown that in the large N limit, the spectral radius of the bulk distribution can be characterised by Girko’s circular law, in the same manner as the circular disk of eigenvalues characteristic of a full-rank Gaussian matrix (Fig 1). This leads us to hypothesize an equivalence between the dynamics of a sparsified rank-one network and those of a dense rank-one network with an added full-rank, Gaussian component, which also give rise to an outlier and an eigenvalue disk in the spectrum [8]. In what follows, we explore the extent to which the dynamics of sparsified rank-one networks resemble dense rank-one networks with additional random connectivity, and highlight the aspects in which they are unique.

To preface our interpretation of the dynamics, we briefly summarise the behaviour of low-rank recurrent networks in the dense case [8, 12, 13]. In general, a network of rank R gives rise to dynamics embedded in an R-dimensional subspace spanned by the right connectivity vectors, with an additional dimension introduced by each addition of external input along a given vector I. Recent work has demonstrated that the trajectories within this subspace can be reduced to a mean-field description of a small number of interacting latent variables [12, 13]. For the rank-one networks that we consider here, the activation of each unit xi can be described by: (15) where I is the vector along which the network receives an external input. The latent variables κr(t) and κI(t) define the projection of the population activity x onto the vectors m and I respectively. The population activity therefore spans the plane formed by the vectors m and I, and reduces to a one-dimensional trajectory along the vector m in the absence of input. Whether or not activity is generated along m is determined by the total recurrent input κrec, given by (see Methods): (16) which represents the overlap of the network activity ϕ(x) with the left connectivity vector n. A non-zero value of κrec—and thus non-trivial equilibrium dynamics structured along m—can only arise if the connectivity vector n has a non-zero overlap σnI with the input vector I (input-driven dynamics) or a non-zero overlap σnm with the connectivity vector m (autonomous dynamics). In the latter case, the system can possess a bistability and evolve in the absence of input towards one of two stable fixed points. Adding an external input which overlaps with the vector n increases the activity along m but also eventually suppresses one of these bistable states [8].

Here, we address the impact of sparsity on the degree of structure present in the input-driven and autonomous network dynamics. We focus on rank-one networks in the high-sparsity limit (Eq 12) where the number of non-zero connections C is fixed independently of N, in which case the magnitude of the outlier and bulk distribution become independent of N and remain finite at high sparsities. We first fix both the outlier and the bulk below unity, and consider the network response to an external feedforward input (Fig 4). We then turn to the autonomous dynamics that arise when both the outlier and bulk are above the instability, and study the dynamical landscape formed from the interaction between the two components (Fig 5).

Fig 4. Impact of sparsity on input-driven dynamics.

Network responses to a step input current along a random vector I. A-C: network consists of a dense rank-one component and a full-rank, Gaussian component of variance g2/N; the random strength g is progressively increased from zero to one, in order to increase the radius of the eigenvalue disk from zero to one. D-F: network consists only of rank-one component Pij = minj; the sparsity is progressively increased by decreasing the number of non-zero connections C. Parameters are chosen such that the radius of the eigenvalue bulk also spans the range [0, 1) as sparsity is modulated (σ2 = 0.043). The outlier is fixed at zero (σmn = 0). The input vector I partially overlaps with n (σnI = 0.2). A, D: Temporal dynamics of the network during step input (A: g = 0.8; D: s = 0.8). Top: samples of input timeseries u(t)Ii. Bottom: samples of network activations xi(t). B, E: Left: input-driven population trajectories projected onto the plane defined by the right connectivity vector m and input vector I, as random strength (resp. sparsity) is progressively increased. Right: principal component analysis (PCA) of each trajectory, showing the fraction of variance explained by the first three components (upper panels) and the correlation between first three principal components and the vectors I and m (lower panels). Examples are shown for both low and high random strength (resp. sparsity). C, F: Top: Dimensionality of network trajectories quantified by the participation ratio , where λi are the eigenvalues of the covariance matrix of activations. Bottom: Projection of network activation x onto the right connectivity vector m (the latent variable κr). The analytical radius of the corresponding eigenvalue disk is also shown in grey. For comparison, we also plot the dimensionality of Gaussian network trajectories corresponding to values of g equal to the radius of the sparse networks (red curve). The mean value for both dimensionality and projections is taken over 50 simulations for each value of g and C. Parameters for all graphs: N = 2000, σ2 = 0.043, σmn = 0.

Fig 5. Dynamical regimes of autonomous network activity at high sparsity.

A: Dynamics of a sparsified rank-one network in the high-sparsity regime where Pij = minj and the number of non-zero connections C is fixed. The variance σ2 and covariance σmn of the connectivity vectors respectively control the bulk radius and outlier of the eigenvalue distribution. Centre: Phase diagram of dynamical regimes in the variance-covariance plane, for C = 200 and N = 1000. The transition from structured to chaotic activity occurs when the bulk radius surpasses the location of the outlier. Side panels: samples of autonomous dynamics of simulated networks situated in different dynamical regimes (coloured squares). Eigenspectra of each network accompany each panel, showing bulk distribution (small dots) and outlier (large dot) with respect to the instability limit at unity (dashed line). B: Modification of the phase diagram when N is fixed (N = 1000) and C is reduced to increase the degree of sparsity. C: Projection of activity along m for a network with fixed variance σ2 and covariance σmn (situated at the white square in phase diagrams in B) while N is fixed (N = 1000) and C is decreased. The network activity progressively loses structure along m since the eigenvalue outlier is reduced. D: Same as C, but with C fixed (C = 600) and N increased. The outlier is independent of N, so structured dynamics can be maintained.

Input-driven dynamics.

For dense rank-one connectivity, when both the eigenvalue outlier and the radius of the bulk are fixed below unity, the fixed point at zero is stable, and the network can display only transient dynamics invoked by an external input u(t) along a feedforward input pattern I (Fig 4A). If the input pattern is orthogonal to the left connectivity vector n, the network activity simply propagates the feedforward input pattern along the one-dimensional axis of I. However, two-dimensional trajectories can emerge if the vector I is given a non-zero overlap with n; in this case, the component of the input-driven activity ϕ(x) along n results in a non-zero κr, which allows the trajectory to evolve into the m dimension during the course of the input current [8]. This behaviour can be seen by projecting the activity into the m-I plane (Fig 4B). The input-driven trajectory is confined to the m-I plane, revealing the underlying rank-one structure in the connectivity.

To understand the manner in which sparsity interferes with these input-driven trajectories, we first consider what happens when we simply add a random, Gaussian component of variance g2/N to an otherwise dense rank-one matrix, which introduces to the eigenspectrum an eigenvalue disk similar in nature to the bulk distribution that arises under sparsity. With such an addition, the input-driven trajectories are only subtly modified as the strength g of the random component (Eq 2) is increased from 0 to 1, increasing the radius of the corresponding eigenvalue disk from 0 to 1 (Fig 4B and 4C). The projection of the population activity in the m-I plane remains unaffected (Fig 4B, left), but the random perturbation to the recurrent inputs causes the population activity to gain additional dimensions (Fig 4B, right). The dominant dimensions of the activity remain aligned with the axes of I and m (Fig 4B, right), and the degree of activation along the m dimension, as quantified by κr, is not reduced (Fig 4C, lower). However, because the radius of the eigenvalue disk induced by the random component is equal to g (Fig 4C, lower), the dimensionality of the network activity increases non-linearly with g (Fig 4C, upper).

The impact of sparsity is markedly different (Fig 4D–4F). As the degree of sparsity is increased, the increased presence of zeros in the connectivity reduces the overlap of the input-driven activity with the n dimension, reducing κr and leading to a progressive loss of structure along m (Fig 4E and 4F). Since the feedforward connections are left untouched, the degree to which the activity spans the I dimension is not affected. The input-driven trajectories of the sparse network are therefore flattened towards the I axis as sparsity is increased (Fig 4E), and the degree of activity along m is progressively decreased to zero (Fig 4F, lower). Moreover, since the radius of the eigenvalue bulk varies non-monotonically as C is decreased to increase the degree of sparsity (Fig 4F, lower), the dimensionality of the network activity also first increases before decreasing again to zero as the recurrent inputs are stripped away (Fig 4F, upper, blue curve). For comparison, we also show the dimensionality of an equivalent Gaussian network defined by a low-rank plus a random component with strength g equal to the corresponding radius of the bulk distribution at each degree of sparsity C (red curve). At intermediate sparsity levels, the sparse low-rank network gives rise to higher-dimensional dynamics than the equivalent Gaussian network.

We therefore highlight a key difference between the input-driven dynamics of sparsified rank-one networks and those of a dense rank-one network with an added random component. Sparsity interferes with the structure of input-induced activity in a way that a random component does not: it reduces the component of the dynamics along m otherwise revealed by an appropriate geometric configuration of inputs, and ultimately reduces the overall dimensionality of the activity by stripping away the influence of recurrent inputs. At intermediate sparsities, this dimensionality is increased above that expected in a dense low-rank network with an analogous eigenvalue disk introduced by a superposed Gaussian component. In this latter network type, the overall dimensionality of the dynamics monotonically increases with the strength of the random component while the the dynamics along m are preserved.

Autonomous dynamics.

When the outlier and the bulk radius in the eigenspectrum are increased above unity, the fixed point at zero loses stability and non-trivial autonomous dynamics emerge. It is here that the equivalence of the sparse regime to the addition of a random component manifests itself. In previous work [8] investigating the autonomous dynamics that emerge in networks comprised of a rank-one component P plus a full-rank random part, distinct dynamical regimes were identified on the basis of the dominance of each component in the eigenspectrum. The dynamics were described as decaying, if both the outlier of P and the eigenvalue disk of J lie below unity; structured stationary, if the outlier crosses the instability and the disk radius, inducing a non-trivial fixed point along the axis of m; or chaotic, if the radius of the disk belonging to J crosses the instability and the outlier, introducing the higher-dimensional fluctuations classically associated with random networks [24].

Due to the similarities in the eigenspectra, our analyses reveal that the autonomous dynamical regimes of sparsified rank-one networks can be mapped directly onto those described above (Fig 5). The bulk distribution plays a role analogous to the eigenvalue disk of the random part of the connectivity, while the eigenvalue outlier takes the role of the outlier of P. In the rank-one network in the high-sparsity regime (Eq 12), the location of the bulk and the outlier are independent of N and thus remain present in the eigenspectrum even at high sparsities, at magnitudes described respectively by and mn (Fig 3). As in densely connected networks, the instability can either be lead by the outlier, bringing the network into a heterogenous stationary regime aligned with m (Fig 5A, top left), or by the bulk, inducing chaotic dynamics (Fig 5A, bottom right). Since the magnitude of the bulk radius and the outlier are controlled respectively by the variance σ2 and covariance σmn of the connectivity vectors, the regime in which the network is situated is dictated solely by the relative configuration of these key connectivity parameters. The phase diagram of Fig 5A summarises the dynamical landscape that arises; we note that this diagram is equivalent to that in [8], where the variance σ2 takes the place of the coupling strength g of the random component.

Since the precise location of the outlier and the bulk are a function of C, the form of the phase diagram is modulated by C. Fixing N and modulating C shifts the boundaries of the phase diagram (Fig 5B), which can alter the dynamics displayed by a network for given values of the variance σ2 and covariance σmn. For example, in a network fixed at a given parameter location in the phase diagram (white square), progressively decreasing C to increase the degree of sparsity causes the network activity to lose structure along m (Fig 5C), since the outlier mn decreases as a function of C.

In contrast, when fixing C and increasing N to sparsify the connections, the degree of structure along m remains unaffected (Fig 5D). This is because the outlier is independent of N, and the bulk radius quickly becomes so as the sparsity 1 − C/N is increased. Thus, the rank-one network constructed as in Eq 12, appropriately parameterised, can preserve its rank-one outlier and sustain a high degree of sparsity while still displaying the structured, one-dimensional dynamics that are the hallmark of its underlying connectivity.

In summary, sparsified low-rank networks display a wider range of dynamics than their dense counterparts, since the full-rank perturbation to the connectivity introduced by sparsity acts in a manner analogous to the addition of a random component. When the dynamics are purely input-driven, these two cases are not directly equivalent; increasing the degree of sparsity ultimately reduces the dimensionality of the dynamics and erodes the structured component, while imposing a random component expands the dimensionality of the dynamics. However, when the dynamics are autonomous, the dynamical regimes accessible to sparsified low-rank networks can be equated directly to those of a rank-one network with an added Gaussian term.

Computations in sparsified networks

Low-rank recurrent networks lead to a simple, transparent relationship between the connectivity and resulting dynamics which can be harnessed to implement a rich repertoire of input-output computations [8, 1113]. The fact that this relationship is preserved even at high levels of sparsity indicates that such computations can be performed even in the highly-sparse regime. By way of example, here we demonstrate how a non-linear input-integration task can be implemented in a sparse low-rank network by designing the connectivity according to geometric principles originally developed in the context of dense low-rank networks (Fig 6).

Fig 6. Implementation of input integration task in rank-one networks at high sparsity.

A: Illustration of recurrent network structure (left), geometrical configuration of input, readout and connectivity vectors (centre), and construction of sparse rank-one connectivity matrix (right). B: Implementation of the task in a sparse network (s = 0.9, C = 200 and N = 2000). Top: sample of fluctuating inputs, with magnitude set by strength parameter . Centre: examples of network activations xi(t), for high and low stimulus strengths (dark and light blue). Bottom: corresponding readout z(t) for high and low stimulus strengths. Network is parameterised by variance σ2 = 0.1 and covariance σmn = 0.04, located at the white square on the phase diagram in the inset. C: Psychometric curve for different sparsity levels C, indicating the proportion of positive readouts produced as the stimulus strength is increased. The dashed line indicates the threshold stimulus strength for which the readout should switch from positive to negative. The proportion is defined as the fraction of times, over 50 repeats, that the mean readout (taken over the final 50ms of stimulus presentation) is positive. All other parameters are held fixed (σ2 = 0.1, σmn = 0.06 and N = 2000) D: Psychometric curve for different sparsity levels C, with the outlier held fixed at a constant value of λ1 = 8. The different values in C result in bulk distributions with spectral radii of 0.76, 0.87 and 0.97 respectively. E: Readout dynamics for task implemented in a network composed of a dense low-rank component plus a Gaussian component, parameterised to ensure the equivalent outlier and spectral radius to the sparse network in B (network defined by Jij + minj, with Jij as in Eq 12, σ2 = 9, σmn = 2.3, g = 1.3). F: Psychometric curve of low-rank-plus-Gaussian network for different random strengths, g, with the outlier fixed to the same value as the sparse network in D.

We consider a simple evidence-accumulation task in the form of a common behavioural paradigm: a stimulus is assumed to vary continuously along a particular stimulus feature, such as the coherence of a random-dot kinetogram [28], and the task is to report whether the mean magnitude of this stimulus feature is greater than a certain threshold. The actual magnitude c(t) is assumed to be subject to noise fluctuations, so the input must be integrated over time. To model the task, we follow [8] and use the network structure of Fig 6A. The network is provided with a fluctuating stimulus c(t)I + η(t), where the input pattern I represents the stimulus feature of interest, c(t) is its fluctuating magnitude, and η(t) is a vector of independent noise inputs. A readout unit sums the network activity through a set of readout weights w to generate an output z(t). We consider the task as a Go-Nogo paradigm, in which the network must produce a positive readout if the average stimulus magnitude, , is above a certain threshold θ (Go condition), and a negative readout if the magnitude is below this threshold (Nogo condition).

In dense networks, the task can be implemented easily with unit rank connectivity, using a solution which relies solely on an appropriate geometric structure of input and readout weights. The details of the solution are discussed in detail in [8]; here we simply give an overview of the requirements. As described in the previous section on dynamics, the network response to an input pattern I overlapping with n is a two-dimensional trajectory which spans the mI plane. A non-trivial readout can therefore be obtained by using a readout vector w which overlaps either with m or I. Here we use the simplest connectivity solution: we set I = n and w = m. This ensures that the network generates a non-zero output in response to the given input pattern I. Moreover, to obtain the nonlinear behaviour required for the task and ensure that the output is positive only when the input strength is above a certain threshold, we additionally choose m and n with an overlap σmn greater than unity. This allows us to exploit the bistability of the resulting fixed point: by initialising the network in the lower fixed point, the readout is initially negative, but if the integrated input is strong enough, the network activity jumps to the upper fixed point and the readout becomes positive, symbolising stimulus detection.

To summarise, the task is implemented with the following connectivity solution: we first generate randomly the input and readout vectors I and w; we then set n = I to allow the input to be picked up by the recurrent dynamics and m = w for the recurrent dynamics to themselves influence the readout; and finally, we add a common component to m and n so that they overlap in a shared dimension orthogonal to the input and readout, in order to generate the non-linear switch in readout upon integrating the stimulus (Fig 6A, centre). To implement the task in a sparse rank-one network, we generate connectivity vectors m and n which obey these requirements and construct a rank-one matrix P as Pij = minj. We then keep only C non-zero inputs per neuron to form the sparse connectivity matrix (Fig 6A, right).

Network simulations confirm that the task can be performed accurately at high degrees of sparsity (Fig 6B), generating a positive readout only for high average stimulus magnitudes, . The requirement for success is that the rank-one outlier remains greater than the bulk of the eigenvalue distribution, to keep the network in the structured regime in which the bistability can be maintained. Since the boundaries between regimes shift as C is modified (Fig 5B), the connectivity vector overlap σmn needs to be modulated to ensure that the outlier continues to dominate as the network is sparsified. If σmn is not modulated simultaneously, the psychometric curve for the task simply shifts with C (Fig 6C): as C is lowered, the network displays increased sensitivity and lower (sub-threshold) stimulus strengths begin to generate positive readouts, decreasing the accuracy on the task. In contrast, if σmn is modulated in order to fix the outlier at a constant value while sparsity is modified, the psychometric curve does not change (Fig 6D). This is because it is the overlap σmn, and thus the outlier λ1, that determines the timescale over which the input is integrated to form the estimate , and thus the threshold value for which the readout switches from negative to positive.

For comparison, we also test the performance of dense low-rank networks with an added Gaussian component, parameterised to be in an equivalent regime to the sparse networks. Fig 6E shows the task implementation in a low-rank-plus-Gaussian network constructed to have the equivalent eigenspectrum to the sparse network in Fig 6B, with the same outlier λ1 and spectral radius R. Under this parameterisation, the task is performed accurately and gives rise to a similar psychometric curve, indicating that inputs are integrated similarly in the structured regime of both sparse and Gaussian networks. We find that the psychometric curve is unaffected by changes in random strength g from 0 and 1 (Fig 6F), and begins to be affected only at very high random strengths (g > 3). This is because the high-dimensional fluctuations induced by random connectivity are averaged out by the readout, leaving the task performance unaffected. This stresses the fact that task performance depends critically on the value of the outlier and not on the bulk radius, provided the bulk radius is still low enough to maintain the network in the structured regime. The result is that task performance is robust to connectivity perturbations—such as sparsity or addition of a random component—that merely modify the eigenspectrum in such a way that does not fundamentally disrupt the dominance of low-rank dynamics.

The basic principle we have demonstrated here is that the structured dynamical regime induced by low-rank connectivity can be preserved even at high sparsities, which means that computations designed to exploit this structure can be implemented effectively even in networks which are highly sparse. The implication is that the full repertoire of dynamical computations accessible to a low-rank network can be likewise performed at high sparsities, provided the network is appropriately parameterised.


In this study, we investigated the dynamics of recurrent neural networks in which the connectivity matrices are sparse but possess an underlying low-rank structure. We showed that the resulting full-rank connectivity matrices have eigenspectra which consist of two distinct components, a continuous bulk distribution and isolated outliers. Such eigenspectra are directly analogous to those of a fully-connected unit rank structure plus a full-rank random component [8, 22, 23]. Analytically estimating the magnitude of the outlier and the radius of the continuous bulk in the large N limit allowed us to predict the dynamics of the sparsified networks. In particular, the relative magnitude of the two major eigenspectrum components delineates boundaries between decaying, structured and chaotic dynamical regimes. The similarity in the eigenspectra implies that the regimes of autonomous dynamics in sparsified unit-rank networks are analogous to those of dense unit-rank networks with a random connectivity component [8]. Notable differences however appear when the dynamics are purely input-driven. Altogether, we found that computations designed to exploit key dynamical properties of low-rank networks are highly robust with respect to sparsity. This identifies sparsified low-rank networks as a biologically-plausible network structure through which to implement computation through low-dimensional population dynamics [7].

The sparse networks examined here were generated by directly removing connections in a fully-connected low-rank structure. The resulting connectivity matrices are directly analogous to those obtained by learning a single pattern through Hebbian plasticity on a sparse subset of connections [9, 29], and therefore are of potential biological relevance. Our analyses can be directly extended to connectivity which consists of a sparsified low-rank structure superposed with a random sparse component with independent entries. As with randomly-connected networks, the results depend on assumptions regarding how the synaptic weights scale with the number of connections [10, 26, 3033]. Here we considered two cases, and ultimately focused on the situation in which both the number C of non-zero connections per neuron and the strength of the connections are fixed as the total number of neurons N is increased [26]. Under these assumptions, the radius of continuous bulk of the eigenspectrum remains finite for large N [27], as does the value of the outlier. For alternative choices of scaling, our analyses suggest that the expected behaviour of the eigenvalue bulk ultimately depends on the scaling of the variance of the connectivity matrix adjusted by removing the mean low-rank component.

Given its extreme ubiquity in the brain, a question of interest is whether sparsity confers any direct benefit to cortical networks aside from the evident reduction in metabolic and wiring costs. From our analysis, it is not directly clear whether sparsified low-rank networks possess direct computational advantages over their dense counterparts. Insights into the potential computational benefit of sparsity are however rife in the related field of deep learning. Research indicates that the performance of deep networks is remarkably robust to sparsity, and that a large majority of parameters can be pruned without significant loss in accuracy [34, 35]. This makes sparsity a natural regulariser, often employed in combatting over-parameterisation and overfitting [36]. Computational advantages are observed as a consequence, including an improvement in the ability of the trained network to generalise [37, 38] and an increased robustness to adversarial attacks [39, 40], on top of significant savings in memory storage, training time and energy efficiency [35, 41]. Nonetheless, such benefits are most often the result of a highly selective, rule-based pruning process, as opposed to the random weight selection employed in this study. An important avenue of future work will be to explore different forms of structure in the sparsity imposed, and its relation to the training process and the dynamic rules under which the connectivity evolves.


Connectivity vectors

The right and left connectivity vectors m and n are constructed from three independent normal random vectors x, y and z in the following manner: (17) (18) where the components of x, y and z are generated independently from . The elements of m and n are therefore Gaussian-distributed as and their degree of overlap onto the z direction is controlled by scaling in the interval [0, σ].

Spectral radius of sparsified full-rank matrix

The sparsified full-rank matrix is generated as the elementwise product of the original matrix J with an independent random binary matrix X whose elements are 0 with probability s and 1 with probability 1 − s. In other words: (19)

The entries of J have a variance of g2/N and a mean of 0, while the entries of X have a variance of (1 − s). The variance of the entries of can therefore be derived as: (20) (21)

The spectral radius is then given by the circular law: (22)

When sparsity is imposed by setting the number of connections per unit C, the radius is given by substituting s = 1 − C/N as: (23)

Eigenvalue spectrum of sparsified rank-one matrix

The elements of the sparsified rank-one matrix are generated in an equivalent manner as . Its eigenspectrum is comprised of a continuous bulk and an isolated outlier. We here derive the location of the outlier λ1 and the radius R of the bulk distribution individually.


We proceed by showing that, for N → ∞, the right connectivity vector m is an eigenvector v of , and derive the corresponding eigenvalue λ. By writing the individual matrix elements of as: (24) the ith element of applied to m is: (25)

As N → ∞, the sum over j in the right-hand side converges to an expectation due to the central limit theorem, and we may write: (26)

It therefore holds that: (27) so that m is a right eigenvector of with eigenvalue (28)


To determine the radius of the bulk distribution, we derive the variance of the elements of the matrix with the outlier eigenvalue removed, . We first rewrite X as 1 − B, where B is a Bernoulli matrix with entries , in order to rewrite the individual matrix entries as follows: (29)

We can then derive the variance of the entries as (30) given that , and mi and ni each have variance σ2. As before, we may substitute s = 1 − C/N when sparsity is imposed by setting the number of connections per unit, to give: (31)

The bulk radius is then obtained via the circular law as in the Gaussian case.

Latent dynamics in low-rank networks

Here we summarize the description of low-dimensional dynamics in low-rank networks [12, 13].

The low-rank network of Eq 3, with connectivity matrix , is governed by the dynamics introduced in the main text (Eq 1): (32)

At the level of the population, the collective trajectory x(t) is embedded in a low-dimensional linear subspace [12, 13]. The total dimensionality of this subspace is the sum of the rank of the connectivity matrix P plus the dimensionality of external inputs. For a rank-one network with one external input vector, the dynamics are constrained to the two-dimensional plane spanned by the left connectivity vector m and the input vector I. The dynamics can then be represented in a new basis by projecting the activity trajectory x(t) onto these two axes. The individual unit activations thus read as: (33) where κr(t) and κI(t) are projections of the activity x(t) onto the m and I axes respectively: (34) (35)

The projection onto the m axis, κr(t), is then governed by its own dynamical equation: (36) where: (37)

At equilibrium, we therefore have κr = κrec.


  1. 1. Gao P, Ganguli S. On simplicity and complexity in the brave new world of large-scale neuroscience. Current opinion in neurobiology. 2015;32:148–155. pmid:25932978
  2. 2. Gallego JA, Perich MG, Miller LE, Solla SA. Neural manifolds for the control of movement. Neuron. 2017;94(5):978–984. pmid:28595054
  3. 3. Saxena S, Cunningham JP. Towards the neural population doctrine. Current opinion in neurobiology. 2019;55:103–111. pmid:30877963
  4. 4. Jazayeri M, Ostojic S. Interpreting neural computations by examining intrinsic and embedding dimensionality of neural activity. Current opinion in neurobiology. 2021;70:113–120. pmid:34537579
  5. 5. Urai AE, Doiron B, Leifer AM, Churchland AK. Large-scale neural recordings call for new insights to link brain and behavior. Nature neuroscience. 2022; p. 1–9. pmid:34980926
  6. 6. Chung S, Abbott L. Neural population geometry: An approach for understanding biological and artificial neural networks. Current opinion in neurobiology. 2021;70:137–144. pmid:34801787
  7. 7. Vyas S, Golub MD, Sussillo D, Shenoy KV. Computation through neural population dynamics. Annual Review of Neuroscience. 2020;43:249–275. pmid:32640928
  8. 8. Mastrogiuseppe F, Ostojic S. Linking connectivity, dynamics, and computations in low-rank recurrent neural networks. Neuron. 2018;99(3):609–623. pmid:30057201
  9. 9. Pereira U, Brunel N. Attractor dynamics in networks with learning rules inferred from in vivo data. Neuron. 2018;99(1):227–238. pmid:29909997
  10. 10. Landau ID, Sompolinsky H. Coherent chaos in a recurrent neural network with structured connectivity. PLoS computational biology. 2018;14(12):e1006309. pmid:30543634
  11. 11. Schuessler F, Dubreuil A, Mastrogiuseppe F, Ostojic S, Barak O. Dynamics of random recurrent networks with correlated low-rank structure. Physical Review Research. 2020;2(1):013111.
  12. 12. Beiran M, Dubreuil A, Valente A, Mastrogiuseppe F, Ostojic S. Shaping dynamics with multiple populations in low-rank recurrent networks. Neural Computation. 2021;33(6):1572–1615. pmid:34496384
  13. 13. Dubreuil A, Valente A, Beiran M, Mastrogiuseppe F, Ostojic S. The role of population structure in computations through neural dynamics. bioRxiv. 2020; p. 2020–07.
  14. 14. Beiran M, Meirhaeghe N, Sohn H, Jazayeri M, Ostojic S. Parametric control of flexible timing through low-dimensional neural manifolds. biorXiv. 2021;.
  15. 15. Landau ID, Sompolinsky H. Macroscopic fluctuations emerge in balanced networks with incomplete recurrent alignment. Physical Review Research. 2021;3(2):023171.
  16. 16. Kadmon J, Timcheck J, Ganguli S. Predictive coding in balanced neural networks with noise, chaos and delays. Advances in Neural Information Processing Systems. 2020;33.
  17. 17. Logiaco L, Abbott L, Escola S. Thalamic control of cortical dynamics in a model of flexible motor sequencing. Cell reports. 2021;35(9):109090. pmid:34077721
  18. 18. Barbour B, Brunel N, Hakim V, Nadal JP. What can we learn from synaptic weight distributions? TRENDS in Neurosciences. 2007;30(12):622–629. pmid:17983670
  19. 19. Holmgren C, Harkany T, Svennenfors B, Zilberter Y. Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. The Journal of physiology. 2003;551(1):139–153. pmid:12813147
  20. 20. Seeman SC, Campagnola L, Davoudian PA, Hoggarth A, Hage TA, Bosma-Moody A, et al. Sparse recurrent excitatory connectivity in the microcircuit of the adult mouse and human cortex. elife. 2018;7:e37349. pmid:30256194
  21. 21. Lefort S, Tomm C, Sarria JCF, Petersen CC. The excitatory neuronal network of the C2 barrel column in mouse primary somatosensory cortex. Neuron. 2009;61(2):301–316. pmid:19186171
  22. 22. Rajan K, Abbott LF. Eigenvalue spectra of random matrices for neural networks. Physical review letters. 2006;97(18):188104. pmid:17155583
  23. 23. Tao T. Outliers in the spectrum of iid matrices with bounded rank perturbations. Probability Theory and Related Fields. 2013;155(1):231–263.
  24. 24. Sompolinsky H, Crisanti A, Sommers HJ. Chaos in random neural networks. Physical review letters. 1988;61(3):259. pmid:10039285
  25. 25. Tao T, Vu V. Random matrices: the circular law. Communications in Contemporary Mathematics. 2008;10(02):261–307.
  26. 26. Brunel N. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of computational neuroscience. 2000;8(3):183–208. pmid:10809012
  27. 27. Ostojic S. Two types of asynchronous activity in networks of excitatory and inhibitory spiking neurons. Nature neuroscience. 2014;17(4):594–600. pmid:24561997
  28. 28. Gold JI, Shadlen MN. The neural basis of decision making. Annu Rev Neurosci. 2007;30:535–574. pmid:17600525
  29. 29. Pereira-Obilinovic U, Aljadeff J, Brunel N. Forgetting leads to chaos in attractor networks. arXiv preprint arXiv:211200119. 2021;.
  30. 30. Amit DJ, Brunel N. Model of global spontaneous activity and local structured activity during delay periods in the cerebral cortex. Cerebral cortex (New York, NY: 1991). 1997;7(3):237–252. pmid:9143444
  31. 31. van Vreeswijk C, Sompolinsky H. Chaotic balanced state in a model of cortical circuits. Neural computation. 1998;10(6):1321–1371. pmid:9698348
  32. 32. Renart A, De La Rocha J, Bartho P, Hollender L, Parga N, Reyes A, et al. The asynchronous state in cortical circuits. science. 2010;327(5965):587–590. pmid:20110507
  33. 33. Barral J, Reyes AD. Synaptic scaling rule preserves excitatory–inhibitory balance and salient neuronal network dynamics. Nature neuroscience. 2016;19(12):1690–1696. pmid:27749827
  34. 34. Denil M, Shakibi B, Dinh L, Ranzato M, De Freitas N. Predicting parameters in deep learning. Advances in neural information processing systems. 2013;26.
  35. 35. Gale T, Elsen E, Hooker S. The state of sparsity in deep neural networks. arXiv preprint arXiv:190209574. 2019;.
  36. 36. Gerum RC, Erpenbeck A, Krauss P, Schilling A. Sparsity through evolutionary pruning prevents neuronal networks from overfitting. Neural Networks. 2020;128:305–312. pmid:32454374
  37. 37. Mozer MC, Smolensky P. Skeletonization: A technique for trimming the fat from a network via relevance assessment. Advances in neural information processing systems. 1988;1.
  38. 38. Bartoldson B, Morcos A, Barbu A, Erlebacher G. The generalization-stability tradeoff in neural network pruning. Advances in Neural Information Processing Systems. 2020;33:20852–20864.
  39. 39. Sehwag V, Wang S, Mittal P, Jana S. Hydra: Pruning adversarially robust neural networks. Advances in Neural Information Processing Systems. 2020;33:19655–19666.
  40. 40. Özdenizci O, Legenstein R. Training adversarially robust sparse networks via Bayesian connectivity sampling. In: International Conference on Machine Learning. PMLR; 2021. p. 8314–8324.
  41. 41. Hoefler T, Alistarh D, Ben-Nun T, Dryden N, Peste A. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research. 2021;22(241):1–124.