
Architecture of the brain’s visual system enhances network stability and performance through layers, delays, and feedback

  • Osvaldo Matias Velarde ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    ovelarde@ccny.cuny.edu

    Affiliation Biomedical Engineering Department, The City College of New York, New York, New York, United States of America

  • Hernán A. Makse,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Levich Institute and Physics Department, The City College of New York, New York, New York, United States of America

  • Lucas C. Parra

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Biomedical Engineering Department, The City College of New York, New York, New York, United States of America

Abstract

In the visual system of primates, image information propagates across successive cortical areas, and there is also local feedback within an area and long-range feedback across areas. Recent findings suggest that the resulting temporal dynamics of neural activity are crucial in several vision tasks. In contrast, artificial neural network models of vision are typically feedforward and do not capitalize on the benefits of temporal dynamics, partly due to concerns about stability and computational costs.

In this study, we focus on recurrent networks with feedback connections for visual tasks with static input corresponding to a single fixation. We demonstrate mathematically that a network’s dynamics can be stabilized by four key features of biological networks: layer-ordered structure, temporal delays between layers, longer distance feedback across layers, and nonlinear neuronal responses. Conversely, when feedback has a fixed distance, one can omit delays in feedforward connections to achieve more efficient artificial implementations.

We also evaluated the effect of feedback connections on object detection and classification performance using standard benchmarks, specifically the COCO and CIFAR10 datasets. Our findings indicate that feedback connections improved the detection of small objects, and classification performance became more robust to noise. We found that performance increased over the course of the temporal dynamics, not unlike what is observed in the core vision of primates.

These results suggest that delays and layered organization are crucial features for stability and performance in both biological and artificial recurrent neural networks.

Author summary

The visual cortex is a part of the brain that receives, integrates, and processes visual information. It is made up of many interconnected areas that work together to help us see. Studies have shown that lateral and feedback connections between these areas are essential for us to be able to see and understand the world around us. However, most computer vision models only consider feedforward connections.

In this study, we looked at the stability of networks with feedback. We used mathematical tools to discover that layered networks with long-range feedback favor stability, as do biologically realistic implementations with temporal delays in the feedforward connections. We also demonstrated the performance advantages of adding feedback connections to convolutional networks in image classification and detection tasks.

These results suggest that the organization of the visual system favors stability. This implies that biologically more realistic implementations of computational vision networks may be easier to train.

1 Introduction

The visual system receives information arriving from the retinas through the lateral geniculate nucleus and processes it from there through a sequence of cortical areas (see Fig 1a) [1, 2]. Each subsequent cortical area captures a hierarchy of image information, transforming low-level visual features, such as edges in V1, to mid-level shapes in V2-V4, to high-level semantics in IT (Inferior Temporal cortex) [1, 3–5]. This processing takes a few hundred milliseconds and typically happens with a nearly-static input while the animal maintains fixation [3].

Fig 1. Visual processing.

(a) Schematic of visual processing across several areas defined by anatomical structure and function of the area such as V1, V2, V4, and Inferior Temporal cortex (IT). The major input to the visual cortex is from the lateral geniculate nucleus (LGN) whose principal neurons receive input from the retina [26]. (b) Models of the visual cortex. In feedforward models, information is only processed in one direction. In models with feedback connections, information from distant layers loops back to earlier layers (e.g. from IT to V1) across multiple distances [27]. Also, local activity feeds back to the same layer. (c) Feedforward models map each point in the input space to a point in the output space. In a model with feedback, a static point in the input space can generate a temporal dynamic (or trajectory) in the output space. The properties of the trajectory strongly depend on the feedback connections and the input. This image is inspired by Fig 2 in [28] describing “core vision”, the sub-second processing in primates during a single fixation.

https://doi.org/10.1371/journal.pcbi.1011078.g001

Computer vision has taken inspiration from this hierarchical organization to design image-processing networks that solve a variety of vision tasks [6–8]. These artificial neural networks are organized in successive layers often with identical processing within a layer that can be implemented as a set of convolutions (Fig 1b). The term “layer” in neuroscience refers to distinct layers of neuronal structures on the cortical sheet within a single cortical area, whereas in computational networks it refers to a sequence of similar processing stages comparable to the sequence of processing across cortical areas. Here we will use the meaning of layer used in computational networks. Such convolutional neural networks (CNN) have achieved remarkable performance when the network is many layers deep. In such deep networks, “activations” at various depths have been associated with neural activity observed in different areas of the visual processing hierarchy [9, 10]. An early finding was that neural activity in IT contains enough information to identify the class of the image comparable to deeper layers in CNNs [3, 11].

Conventional CNNs consist of convolutions followed by nonlinear functions applied sequentially without feedback connections (Fig 1b-left). However, both anatomical and functional evidence in the primate visual system show the presence of a large number of lateral connections within brain areas, and feedback connections from later to earlier areas [12, 13]. It is now well established that these distant feedback connections contribute to visual processing [14–18] (Fig 1b-right). Importantly, feedback connections lead to a temporal dynamic as the activity in a given area will change once it receives feedback from later (higher) areas. Thus, even in the presence of a static input, the neural activity follows trajectories (Fig 1c-right) in the space of activations.

The structure of the network determines the temporal dynamic including the time-scale of changes in activations [19], the location of fixed points or limit cycles and their stability [20]. Feedforward networks are by definition stable. In contrast, a network with feedback is not guaranteed to have stable temporal dynamics and the question of stability becomes important. The most dramatic example of instability in a biological network is runaway excitation during epileptic seizures. For biological neural networks to be effective, it is critical that their dynamics reach a consistent state (or sequence of states) [21, 22]. Meanwhile, for artificial neural networks [23], stability plays an important role in the learning process [24, 25].

In visual processing, the dynamics of the network define how the input (image) will be processed to obtain the output (representations) (Fig 1c). For feedforward processing, each point in the input space is associated with a point in the output space. For a network with feedback connections, however, each input image results in a trajectory of activations, even when the input is constant. In visual tasks where only high-level representations matter, the output should be robust to perturbations of the input, such as partial occlusion, orientation, and contrast. For example, in Fig 1c, assume that points 2 and 3 of the input space represent different perturbations of input point 1, and their corresponding outputs are close. Note that point 3 is a stronger disturbance than point 2 (comparable to the distance of an arbitrary point 4); however, the input-output relationship correctly replicates the desired proximity relations.

In this work, we use analytic tools of dynamical system theory to study the stability of networks with local and long-range feedback connections. We found that the stability of the network depends on the temporal delay of the feedforward connections. We also show that feedback over longer distances and a layered structure favor the stability of the dynamics. Finally, we use these results to add stable feedback in recurrent convolutional networks for visual tasks such as object detection and classification. There, feedback connections improved the detection of small objects and robustness against noise.

2 Methods

A neural network consists of connected units, with connections characterized by their sign, strength, and time delay. The structure of connections determines the resulting dynamics of activity in the units of the network. When connections are organized in a sequence of layers, it’s possible to distinguish between feedforward and feedback connections (a mathematical definition is provided in Section A.1 of S1 Text). To analyze the stability of the dynamic we will consider a simplified network, first with single units per layer, and later extend this to multiple units per layer. To analyze the effects of feedback on performance in visual tasks we then rely on a state-of-the-art convolutional network with a more complex structure. Here we introduce both structures.

2.1 Reduced neural network

In an artificial neural network that has only feedforward connections, there is no need to consider time delays, and connections are often treated as instantaneous. Once feedback is included, one has to decide on the exact order of operations, and whether the forward pass is implemented instantaneously, or if each transmission from one unit to the next takes time, as it does in a biological system. To make these notions concrete, consider the following dynamics of the variable $h_{l,t}$ that represents the activation in layer l ∈ {1, 2, …, N} (with a single unit) at time t:

$$h_{l,t} = \alpha_{l-1,l}\, h_{l-1,t-\Delta} + \sum_{j=l}^{N} \alpha_{j,l}\, h_{j,t-1} + x\,\delta_{l,1} \tag{1}$$

where x is a constant input in the first layer (l = 1), αji and Δ indicate the weight and time delay of the connection between layers j → i, respectively. We use the symbol δ to represent the Kronecker delta.

When feedforward connections are instantaneous (Δ = 0), we say that the transmission of information is Artificial; while for Δ = 1 we say that it is Biological (Fig 2).

Fig 2. Two implementations of feedforward transmission.

In the artificial case, feedforward connections (black arrows) transmit information instantaneously (Δ = 0). In the biological case, feedforward transmission requires time and thus introduces a time delay (Δ = 1). In both cases, feedback connections require a delay (Δ = 1) irrespective of distance (0, 1, or 2 in green, blue, and red). Note that in the artificial case, the units along lines ri only depend on the information from units along line ri−1; while in the biological case units along line di depend on the information from units along several lines di−1, di−2, …. The lines ri and di represent how the state of the network evolves as a function of the layers and time.

https://doi.org/10.1371/journal.pcbi.1011078.g002

The matrix version of Eq (1) is:

$$\mathbf{h}_t = M_{FF}\,\mathbf{h}_{t-\Delta} + M_{FB}\,\mathbf{h}_{t-1} + \mathbf{x} \tag{2}$$

where $\mathbf{h}_t = (h_{1,t}, \dots, h_{N,t})^\top$ and $\mathbf{x} = (x, 0, \dots, 0)^\top$. The N × N matrices are defined by (MFF)ij = αji δj,i−1 and (MFB)ij = αji Θ(j − i). Here, Θ indicates the step function.

Regardless of the value of Δ, the fixed point of the system is

$$\mathbf{h}^* = (\mathrm{Id} - M_{FF} - M_{FB})^{-1}\,\mathbf{x} \tag{3}$$

However, the stability of $\mathbf{h}^*$ depends on Δ. Let's rewrite Eq (2) for the two cases:

  • Biological (Δ = 1): $\mathbf{h}_t = M\,\mathbf{h}_{t-1} + \mathbf{x}$, where M = MFF + MFB
  • Artificial (Δ = 0): $\mathbf{h}_t = M\,\mathbf{h}_{t-1} + (\mathrm{Id} - M_{FF})^{-1}\mathbf{x}$, where M = (Id − MFF)−1MFB

Both of the above equations define a discrete-time linear dynamical system. The fixed point h* and its stability depend on the entries of the matrix M. From bifurcation theory, we know that the eigenvalues of the matrix M define the stability of the system [29]. By definition, λ is an eigenvalue of the matrix M if and only if it is a root of the characteristic polynomial pM(λ) = det(M − λId). When all the eigenvalues satisfy |λ| < 1, the iteration above will converge to a fixed point. However, when there is at least one eigenvalue with |λ| > 1, the iteration will diverge. The bifurcation boundary between stable and unstable dynamics can thus be parameterized by the equation pM(e^{iθ}) = 0, since |e^{iθ}| = 1 with θ ∈ [0, 2π).

Note that, for now, each layer has only a single unit. Later we will treat the case where each layer contains several units. From now on, we will use the notation αj,i = αji.
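To make the two cases concrete, the following minimal NumPy sketch builds MFF and MFB for a three-layer network (the weights are arbitrary illustrative values, not taken from this paper) and compares the spectral radius of M for the biological and artificial implementations:

```python
import numpy as np

N = 3
MFF = np.diag([0.8] * (N - 1), k=-1)        # feedforward weights alpha_{l-1,l} (illustrative)
MFB = np.array([[-0.2, 0.5, 0.3],           # (M_FB)_{ij} = alpha_{j,i} for j >= i
                [ 0.0, -0.2, 0.5],
                [ 0.0,  0.0, -0.2]])

M_bio = MFF + MFB                                    # Delta = 1 (biological)
M_art = np.linalg.solve(np.eye(N) - MFF, MFB)        # Delta = 0 (artificial)

x = np.array([1.0, 0.0, 0.0])
h_star = np.linalg.solve(np.eye(N) - MFF - MFB, x)   # fixed point of Eq (3), same for both

for name, M in [("biological", M_bio), ("artificial", M_art)]:
    rho = np.abs(np.linalg.eigvals(M)).max()
    print(name, round(rho, 3), "stable" if rho < 1 else "unstable")
```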

2.2 A recurrent convolutional neural network

For the effective processing of images, we will need a more complex network structure. We will rely on a specific recurrent CNN based on [30]. This network has N layers with the activity of layer l ∈ {1, 2, …, N} at time t ∈ {1, 2, …, T} given by the recurrent map R:

$$h_{l,t} = R\left(h_{l,t-1},\, I_{l,t}\right) \tag{4}$$

where Il,t is the input to this layer originating in other layers (Fig 3). Each layer has units arranged as an image array of height Hl and width Wl, with separate units for Cl “channels”, so that the activity of the layer is $h_{l,t} \in \mathbb{R}^{C_l \times H_l \times W_l}$. The temporal dynamic is initialized with hl,0 = 0, while the external input at the lowest layer of the network is kept constant, h0,t = x.

Fig 3. A recurrent CNN.

(a) The CNN is composed of N layers (gray boxes) and bottom-up and top-down connections (black and colored arrows). Each bottom-up connection involves spatial downsampling or local pooling. The network input is a static image (i.e. h0,t = x) and the output is the activity of the last layer after T time steps (i.e. hN,T). (b) Each layer l is composed of a mechanism for aggregating information from other layers (i.e. hl−1,t, hl+1,t−1, …, hN,t−1) and a recurrent map R that updates the activity of the layer hl,t—see Eq (4)—and transmits this to other layers. (c) The aggregation consists of upsampling feedback from higher layers with the function S to match spatial dimensions of the current layer and concatenating along the channel axis (⊕). Then a linear map Tl combines these channels resulting in the same number of channels as hl−1,t, to which this feedback input is then added—see Eq (6). Finally, Fl is an additional nonlinear mapping that implements downsampling.

https://doi.org/10.1371/journal.pcbi.1011078.g003

There are several ways to define the map R. Here, we analyze the Time Decay dynamic governed by the equation

$$h_{l,t} = \left(1 - \frac{1}{\tau_l}\right) h_{l,t-1} + \frac{1}{\tau_l}\, I_{l,t}$$

where τl is the time scale of layer l.

For any function R, the variable Il,t represents the information coming from other layers, i.e.

$$I_{l,t} = \phi_l\left(h_{l-1,t-\Delta},\; h_{l+1,t-1},\; \dots,\; h_{N,t-1}\right) \tag{5}$$

There are several ways to define the information integration mechanism (i.e. the function ϕl). An example is shown in Fig 3c, which is represented by the equation:

$$I_{l,t} = F_l\!\left(h_{l-1,t-\Delta} + T_l\left[\,S(h_{l+1,t-1}) \oplus \cdots \oplus S(h_{N,t-1})\,\right]\right) \tag{6}$$

where Fl is a “ResNet stage” which involves downsampling (see Section A.6 of S1 Text), S is an upsampling operation to match the spatial dimensions of activity from later layers hk,t, ⊕ indicates concatenation along the channel axis, and Tl is a linear map combining the concatenated channels and reducing the channel dimension to match hl−1,t (see Fig 3c).

Note that Eq (1) is a simplified case of the model proposed in this section. In particular, it is the result of reducing the number of units and channels per layer to one and setting Fl as a linear function. Note that the CNN presented here uses a time delay for the feedforward connections between layers (Δ = 1)—see Eq (5).
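As an illustration of Eqs (4)–(6), the following PyTorch sketch implements one such recurrent CNN with biological delays (Δ = 1). It is a simplified stand-in, not the authors' released implementation: all layers share C channels, Fl is a single conv–BN–ReLU stage, and self-feedback (distance 0) is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

class Layer(nn.Module):
    def __init__(self, C, n_fb, tau=2.0):
        super().__init__()
        self.tau = tau
        # F_l: stand-in for a ResNet stage with downsampling
        self.F = nn.Sequential(nn.Conv2d(C, C, 3, stride=2, padding=1),
                               nn.BatchNorm2d(C), nn.ReLU())
        # T_l: 1x1 conv reducing the concatenated feedback channels
        self.T = nn.Conv2d(n_fb * C, C, 1) if n_fb > 0 else None

    def forward(self, below, fbs, h_prev):
        if self.T is not None:
            # S: upsample each feedback map to the spatial size of `below`, then concatenate
            fb = torch.cat([Fn.interpolate(f, size=below.shape[-2:]) for f in fbs], dim=1)
            below = below + self.T(fb)                        # Eq (6)
        I = self.F(below)                                     # input I_{l,t}, Eq (5)
        return (1 - 1 / self.tau) * h_prev + I / self.tau     # time-decay map R, Eq (4)

N, C, T = 3, 8, 5
layers = nn.ModuleList([Layer(C, N - 1 - l) for l in range(N)])
x = torch.randn(1, C, 32, 32)                                 # static input, h_{0,t} = x
h = [torch.zeros(1, C, 32 >> (l + 1), 32 >> (l + 1)) for l in range(N)]  # h_{l,0} = 0
for t in range(T):
    # biological delay (Delta = 1): every layer reads activities from step t - 1
    h = [layers[l](x if l == 0 else h[l - 1], h[l + 1:], h[l]) for l in range(N)]
print([tuple(v.shape) for v in h])
```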

2.3 Network training

To train these deep networks we use conventional gradient descent. In the training of recurrent neural networks, two levels of dynamics coexist:

  1. The dynamics of network activity during inference: $h_{l,t} = R(h_{l,t-1}, I_{l,t})$ with t = 1, …, T (see Eq (4)).
  2. The dynamics of parameters during training: $w_{m+1} = w_m - \eta\,\nabla_w \mathcal{L}(w_m)$, where m indexes the gradient updates and η is the learning rate,

where $\mathcal{L}$ is the cost function comparing the training labels y to the network prediction $\hat{y}$ for the input x (i.e. $\hat{y} = h_{N,T}$). The activity dynamics during inference (1) take place in the space of activities h, while the training dynamics (2) occur in the parameter space w. Computing gradients in (2) requires bounded activity dynamics (1) during inference. From an analytical perspective, smoothness of the function $\mathcal{L}(w)$ is sufficient for gradient calculations; however, for computational purposes, it is necessary for the function and its derivative to be finite. The schematic of Fig 4a shows a training dynamic ω1 that remains for the entire duration in the domain of stable activation dynamics, exemplified by activity dynamic Ω1 in Fig 4b. In contrast, learning trajectory ω2 approaches the boundary of stability in the activity space, at which point learning can no longer progress smoothly as the activity diverges, exemplified by Ω2 in Fig 4b. When the dynamics of the activity converge to stable fixed points, it can be demonstrated that the fixed point's dependence on the parameters is smooth (see Section 10.2 of [29]). This ensures the smoothness of $\mathcal{L}$, thereby guaranteeing that the trajectory wm is, at the very least, continuous. On the other hand, bifurcations within the activity dynamics can disrupt the smoothness of the gradient and the continuity of trajectories within the parameter space.
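A minimal sketch of how these two dynamics interleave in code is given below; the method names init_state and step are hypothetical stand-ins for a recurrent model such as the one sketched in Section 2.2:

```python
import torch

def train_step(model, readout, x, y, T, opt):
    h = model.init_state(x)                   # h_{l,0} = 0
    for t in range(T):                        # dynamics (1): activity during inference
        h = model.step(x, h)                  # static input x, one update of Eq (4)
    loss = torch.nn.functional.cross_entropy(readout(h[-1]), y)
    opt.zero_grad()
    loss.backward()       # backpropagation through time: requires bounded activities
    opt.step()            # dynamics (2): one gradient update of the parameters w
    return float(loss)
```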

Fig 4. Training and inference dynamic.

(a) Learning trajectories in parameter space. There exists a region in parameter space such that the dynamics of the activity are stable for those parameter values (inside the bifurcation boundary—blue curve). The trajectories ω1, ω2 start with stable dynamic. After some gradient updates the trajectory may remain in the stable domain (ω1) or may move beyond the stable domain (ω2). (b) Space of activity showing activity dynamic that converges (Ω1: orange curve corresponding to the endpoint of learning ω1) and dynamic that diverges (Ω2: green curve, corresponding to the endpoint of learning ω2).

https://doi.org/10.1371/journal.pcbi.1011078.g004

One of the main theoretical results will be that the biological implementation of delays promotes stability. Indeed, initial simulations with the artificial implementation (no feedforward delays) often led to unbounded activity, making learning impossible in practice. Thus, in all scenarios presented here, networks with biological time delays were used. The training was initialized with all feedback weights set to zero, ensuring that training starts with stable activity dynamics. We kept the conventional ReLU nonlinearity of these networks unchanged, which supports stability as it sets the gain to zero for all negative inputs (see Section 3.6). Additionally, we retained the batch normalization used in these networks and included it for the feedback connections, thus potentially further contributing to stability (Section 3.6). Beyond that, no additional measures were necessary to keep the networks in the space of stable activation dynamics; e.g. adding an explicitly “contractive” term as in [31] to promote stability during inference was not needed.

Finally, note that the notion of convergence of the training process (2) is not practically relevant for modern deep networks, where we typically use early stopping to prevent over-training. None of the networks trained here were trained to convergence, and the parameters stayed within the stability region without any additional constraints on learning.

3 Results

We will now demonstrate mathematically—using the reduced model of Section 2.1—that recurrent networks with biologically realistic delays are more stable than artificial networks with no delays in the forward connections (Section 3.1). We also show that longer distance feedback increases stability (Section 3.3), including when it is added to a network with shorter feedback connections (Section 3.5). For the special case that a network only has feedback with a fixed distance, one can gain computational efficiency with an equivalent artificial implementation, without affecting stability (Section 3.2). These basic results on stability still hold when the reduced model is extended to include layers with multiple units (Section A.2 of S1 Text). Indeed, when networks are organized in layers, as they are in biological networks, they gain stability as the relative strength of longer feedback tends to increase (Section 3.4). The results presented next are derived for linear networks. However, we also show that for networks using typical nonlinear activation functions, stability can only improve over the linear case when there are fixed points (Section 3.6).

In Sections 3.7 and 3.8, we report on the performance benefits obtained for visual tasks when biological feedback is added to the recurrent CNN of Section 2.2.

3.1 A biological implementation with feedforward delay is more stable

As mentioned above, the bifurcation boundary is determined by the eigenvalues with absolute value equal to 1 (i.e. λ = e^{iθ}, θ ∈ [0, 2π)). As we show in Section A.3 of S1 Text, λ = e^{iθ} is a root of $p_M(\lambda) = \sum_{n=0}^{N} c_n \lambda^n$ if and only if the real and the imaginary part of $p_M(e^{i\theta})$ both vanish:

$$\sum_{n=0}^{N} c_n \left[\, U_n(\cos\theta) - \cos\theta\; U_{n-1}(\cos\theta) \,\right] = 0 \tag{7}$$

and

$$\sin\theta \sum_{n=1}^{N} c_n\; U_{n-1}(\cos\theta) = 0 \tag{8}$$

(with the convention $U_{-1} \equiv 0$).

In the above equations, Un’s are Chebyshev polynomials of the second kind: U0(z) = 1, U1(z) = 2z, U2(z) = 4z2 − 1, … [32]. Note that the coefficients cn of p depend on the values of the associated matrix (see examples in Table 1). We are interested in comparing the results for the matrices MB = MFF + MFB and MA = (IdMFF)−1MFB reflecting biologically realistic and artificial feedforward connections (see Section 2.1):

Table 1. Coefficients of characteristic polynomials.

For matrices of size N, the coefficients of the characteristic polynomials are indicated as a function of the values of the matrix. For each size, the results are shown for the cases of biological and artificial feedforward connections. We denote κ1 = α11 + α22 + α33, κ2 = α12α21 + α23α32, κ3 = α12α23α31, κ4 = α11α22 + α11α33 + α22α33, κ5 = α11α23α32 + α33α12α21, κ6 = α11α22α33.

https://doi.org/10.1371/journal.pcbi.1011078.t001

For example, the explicit forms of Eqs (7) and (8) for N = 2 and N = 3 are obtained by substituting the corresponding coefficients cn of Table 1.

In Fig 5, the regions of the parameter space with stable dynamics are shown for a few different network structures. The structure in Fig 5c, in particular, is motivated by the simplified circuit diagram proposed for the ventral visual pathway, as shown schematically in Fig 1b, right. The regions of stability were determined analytically using Eqs (7) and (8) and the calculation of the coefficients cn as functions of the weights αj,i (e.g. Table 1). The results are shown for a 2D subspace of the parameter space for better visualization. However, in all cases, the following is true: the stability region for networks with the biological transmission is greater than or equal to the stability region for the artificial case. In the next section, we will identify the special cases where the stability regions of biological and artificial transmission are equal.
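The boundaries in Fig 5a can also be checked numerically. The sketch below scans the (wi, wp) plane for the two-layer network and counts the grid points where each implementation is stable; since the eigenvalues depend on α12 and α21 only through their product, α12 is fixed to 1 without loss of generality:

```python
import numpy as np

def rho(M):
    return np.abs(np.linalg.eigvals(M)).max()

n, lim = 201, 2.0
stable_B = np.zeros((n, n), dtype=bool)
stable_A = np.zeros((n, n), dtype=bool)
MFF = np.array([[0.0, 0.0], [1.0, 0.0]])              # feedforward 1 -> 2, alpha_12 = 1
for a, wi in enumerate(np.linspace(-lim, lim, n)):
    for b, wp in enumerate(np.linspace(-lim, lim, n)):
        MFB = np.array([[wi, wp], [0.0, wi]])         # self-loops w_i, feedback w_p
        stable_B[a, b] = rho(MFF + MFB) < 1           # biological, Delta = 1
        stable_A[a, b] = rho(np.linalg.solve(np.eye(2) - MFF, MFB)) < 1  # artificial
print(stable_B.sum(), stable_A.sum())   # grid points inside each region (cf. Fig 5a)
```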

Fig 5. Bifurcation boundaries for 2, 3, and 5 layers.

The region of the parameter space where the network is stable depends on the number of layers N, the type of feedforward transmission (B: biological with delay, A: artificial without delay), and the distance of the feedback (see Eqs (7) and (8)). For the networks presented in the lower panels (a) N = 2, (b) N = 3, (c) N = 5, we see that the stability region for the biological case is larger than or equal to that of the artificial case. This behavior repeats for different N. Notation: (a) wi = α11 = α22, wp = α12α21. (b) wi = α11 = α22 = α33, wp = α12α21 = α23α32 and α31 = 0. (c) wshort = α23α32 = α34α43 = α45α54, wlong = α23α34α45α52. Note that in all cases, the red and blue curves intersect at the axes (wi = 0, wp = 0, wshort = 0 or wlong = 0). This is a consequence of the discussion presented in Section 3.2.

https://doi.org/10.1371/journal.pcbi.1011078.g005

3.2 Cases where biological and artificial transmission are equivalent

We define feedback of distance q as connections from layer l to layer lq, therefore q ∈ {0, 1, 2, …, N − 1}. In Fig 2, the green, blue, and red arrows are feedbacks of distance 0, 1, and 2, respectively.

For a fixed distance q, the weights of the connections αl,lq form one of the diagonals in the matrix MFB. For example, the weights α11, …, αNN (i.e. q = 0) are on the main diagonal. On the other hand, α21, …, αN,N−1 (i.e. q = 1) are on the first off-diagonal.

Suppose that in a network there are only feedforward connections (black arrows in Fig 2) and feedback connections of a fixed distance q. For example, let’s take q = 1 (blue arrows in Fig 2). Note that the black arrows form straight lines. In the artificial case, we denote them r1, r2, … and they are vertical lines; whereas in the biological case, we denote them d1, d2, … and they are diagonals. These lines are parallel to each other and only interact if there are feedback connections (blue arrows). The ordering of all the arrows indicates how information is transmitted across layers and over time. In the biological case, the blue arrows transmit information from the line d1 to d3, from d3 to d5, etc (it is possible to ignore d2, d4, …). In the artificial case, something similar happens as the blue arrows connect r1 with r2, r2 with r3, etc. There is a geometric transformation that maps the ordering of arrows in the biological case to the ordering of the artificial case (see Fig 6). This intuitively shows that the dynamics of both cases will be the same.

Fig 6. Equivalence between biological and artificial implementation.

When the network consists of feedforward connections and single-distance feedback connections, the biological and artificial implementations have the same bifurcation boundaries. For a network with feedback connections of distance q = 2, the biological implementation (top panel) is represented with a pattern of arrows (bold lines) that repeats (q+1)-times. The complete information about the dynamics is in the pattern (center panel). When applying a temporal contraction, the pattern is equivalent to the information flow of the artificial implementation (bottom panel). The presented theorem (see main text) shows this equivalence through the transformation of the characteristic polynomial pB to pA.

https://doi.org/10.1371/journal.pcbi.1011078.g006

Theorem (Proof in Section A.4 of S1 Text). Let $M_{FF}$ (feedforward weights) and $M_{FB}^{(q)}$ (feedback connections of distance q) be matrices in $\mathbb{R}^{N \times N}$. If $M_B = M_{FF} + M_{FB}^{(q)}$ and $M_A = (\mathrm{Id} - M_{FF})^{-1} M_{FB}^{(q)}$, then the characteristic polynomials of MB and MA can be expressed as:

$$p_B(\lambda) = \lambda^{k_1}\, g\!\left(\lambda^{k_3}\right), \qquad p_A(\lambda) = \lambda^{k_2}\, g(\lambda), \qquad k_3 = q + 1 \tag{9}$$

where g is a polynomial with coefficients that are functions of the matrices $M_{FF}$ and $M_{FB}^{(q)}$. In Table A in S1 Text, some examples of the polynomials pA and pB are shown. Note that the order of g and the integers k1, k2, k3 only depend on N and q.

An immediate consequence of the above theorem is that if MB has an eigenvalue with absolute value equal to 1, then MA has an eigenvalue with absolute value equal to 1. To see this, choose an eigenvalue z of MB (i.e. pB(z) = 0) with |z| = 1. Then w = z^{q+1} will also satisfy pA(w) = 0 and |w| = 1. For this reason, for a network with feedforward connections and feedback connections of a single distance q, the bifurcation boundaries of the artificial and biological implementations coincide. Therefore, the dynamics of networks with artificial transmission (Δ = 0) are equivalent to those with biological transmission (Δ = 1). This means that both have the same fixed points with the same stability region. This property allows replacing the biological implementation with the artificial implementation, which is q + 1 times less computationally expensive in terms of time and the number of operations.

An example of this result is shown in Fig 5b where N = 3 and κ3 = 0 (there are no feedback connections of distance 2). When wi = α11 = α22 = α33 = 0, there is only feedback of distance q = 1; whereas when wp = κ2 = 0, there is only feedback of distance q = 0 (see Table 1). In both cases, the stability region is the same for the biological and artificial implementations. In this particular example, the regions of stability coincide even when there are two types of feedback simultaneously (q = 0 and q = 1). But in general, the equivalence in stability between biological and artificial connections only holds if there is a single feedback distance q in the network. As we will discuss below, there are cases where considering mixed feedback favors the stability of the dynamics, and therefore the biological implementation is preferred in terms of stability.
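The following sketch verifies this equivalence numerically for an illustrative network with a single feedback distance: the distinct nonzero values of |λ|^{q+1} over the eigenvalues of MB coincide with the distinct nonzero |λ| of MA, so in particular both spectral radii cross 1 together:

```python
import numpy as np

N, q, beta, f = 6, 2, 0.7, 0.9                 # illustrative weights
MFF = np.diag([beta] * (N - 1), k=-1)
MFB = np.diag([f] * (N - q), k=q)              # feedback of the single distance q
MB = MFF + MFB
MA = np.linalg.solve(np.eye(N) - MFF, MFB)
mod_B = np.abs(np.linalg.eigvals(MB)) ** (q + 1)
mod_A = np.abs(np.linalg.eigvals(MA))
print(np.unique(np.round(mod_B, 6)))           # distinct |z|^(q+1) for z in eig(M_B)
print(np.unique(np.round(mod_A, 6)))           # the same nonzero values, plus zeros
print(mod_B.max(), mod_A.max())                # rho(M_A) = rho(M_B)^(q+1)
```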

3.3 Longer loops are more stable

To see the advantage of distant feedback, consider a simplified network in which all feedforward connections have the same weight (i.e. $(M_{FF})_{ij} = \beta\,\delta_{j,i-1}$) and there are feedback connections of distance q only, all with the same weight (i.e. $(M_{FB})_{ij} = f^{q}\,\delta_{j,i+q}$, with |f| < 1). As seen in Section 3.2, the stability based on MA is equivalent to the stability based on MB. For this simplified network, $(M_A)_{ij} = f^{q}\beta^{\,i-j+q}\,\Theta(q+1 \le j \le i+q)$. That is,

$$M_A = \begin{pmatrix} 0 & B \\ 0 & L \end{pmatrix} \tag{10}$$

where the null blocks have q columns, and the (N − q) × (N − q) block L has entries

$$L_{ab} = f^{q}\,\beta^{\,q+a-b}\;\Theta(b \le a+q) \tag{11}$$

A more compact way of writing L is

$$L_{ab} = f^{q}\beta^{q}\; u_a\, v_b\;\Theta(b \le a+q) \tag{12}$$

where $u_a = \beta^{a}$ and $v_b = \beta^{-b}$.

From the form of the matrix MA, it follows that there are at least q independent eigenvectors associated with the eigenvalue λ = 0. The other Nq eigenvalues correspond to those of the matrix L.

On the other hand, from the block-triangular form of MA, one can show that

$$p_{M_A}(\lambda) = \lambda^{q}\; p_L(\lambda) \tag{13}$$

There are two cases. The first case is N − 2q < 2 (i.e. q > (N − 2)/2). In this case, L is a rank-one matrix: the equation Lv = 0 has N − q − 1 independent solutions (i.e. L has N − q − 1 eigenvectors associated with the eigenvalue λ = 0). In addition, we find that $\mathbf{u} = (\beta, \beta^2, \dots, \beta^{N-q})^\top$ is an eigenvector of L associated with the eigenvalue λ = f^q β^q (N − q). In this case, the stability of the network depends only on the factor f^q β^q (N − q). The term f^q β^q corresponds to the effective gain of one of the loops of distance q, while N − q is the number of loops of distance q in the network (see Fig 7). So, the stability condition is that the absolute value of the effective gain of the loops is less than 1/(N − q). Note that if the loops are longer (i.e. q increases), the number of loops N − q and the effective gain f^q β^q decrease; then, the stability threshold increases. Even when the number of loops is predetermined (i.e. it does not depend on q), the term f^q continues to decrease as a function of q and modifies the threshold. This tells us that networks with longer loops are more stable than networks with shorter loops.

Fig 7. The dominant eigenvalue for a network with feedforward connections and feedback connections of a single distance q.

The eigenvalue in this network is proportional to the number of loops and the effective weight of the loops (f^q β^q). When all feedback connections of distance q are considered, there are a total of N − q loops.

https://doi.org/10.1371/journal.pcbi.1011078.g007

For the second case, 2 ≤ N − 2q (i.e. q ≤ (N − 2)/2), the same result is obtained using another argument. Note that in this case, the matrix L has N − 2q independent rows (i.e. the rank of L is N − 2q). Therefore, the dimension of the null space of L is q (i.e. the eigenvalue 0 has at least multiplicity q). This implies that $p_L(\lambda) = \lambda^{q}\, g(\lambda)$, where g is a polynomial of degree N − 2q whose roots satisfy

$$\sum_i \lambda_i = \mathrm{tr}(L) = f^{q}\beta^{q}\,(N - q) \tag{14}$$

If λi > 0 for all i, then $\max_i(\lambda_i) \le f^{q}\beta^{q}(N - q)$ defines the stability as in the previous case.
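A short numerical check of this result (with illustrative values β = 0.9 and f = 0.5): the spectral radius drops when the same form of feedback is applied at a larger distance, and in the rank-one regime (N − 2q < 2) it matches the closed-form dominant eigenvalue:

```python
import numpy as np

def rho_single_distance(N, beta, q, w):
    """Spectral radius of M_B with uniform feedforward weight beta and
    feedback of a single distance q with weight w."""
    M = np.diag([beta] * (N - 1), k=-1) + np.diag([w] * (N - q), k=q)
    return np.abs(np.linalg.eigvals(M)).max()

N, beta, f = 5, 0.9, 0.5
for q in (1, 2):
    print(q, round(rho_single_distance(N, beta, q, f**q), 3))  # longer loops: smaller radius
# rank-one regime for q = 2 (N - 2q < 2): the dominant eigenvalue of M_A is
# (N - q) f^q beta^q, and rho(M_B) = rho(M_A)^(1/(q+1)) by Section 3.2
print(round(((N - 2) * f**2 * beta**2) ** (1 / 3), 3))         # matches the q = 2 value
```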

3.4 Fully connected networks are less stable

To see the advantages of a network with layers, consider a simple counter-example, a fully connected network with the connectivity matrix

$$M_B = w_i\,\mathrm{Id} + w_e\,\left(\mathbf{1}\mathbf{1}^{\top} - \mathrm{Id}\right)$$

where $\mathbf{1}$ is the vector of ones.

The weight of connections between units is we and the self-interaction weight is wi (usually, wi < 0). The eigenvalues of MB are wi − we (with multiplicity N − 1) and wi + (N − 1)we (multiplicity 1) (see Fig 8a). In this case, the stability condition (i.e. eigenvalues with absolute value less than 1) is equivalent to |wi − we| < 1 and |wi + (N − 1)we| < 1. In the limit of N → ∞, this region of stability converges to −1 < wi < 0 and we = 0. Note that the region of stability decreases for larger networks and does not depend on the distance of the feedback connections. Furthermore, the threshold of |we|, which scales as 1/(N − 1), is less than the threshold of |we| in a layered network with feedback connections of distance q, for all q (see Fig 8b). This would indicate that networks ordered in layers are more stable than a fully connected network. This is a consequence of the fact that in a fully connected network, all distances of the feedback appear, including the short distances, which are the least stable.
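The eigenvalue structure is easy to verify numerically (wi and we below are arbitrary illustrative values):

```python
import numpy as np

N, wi, we = 6, -0.2, 0.1
M = wi * np.eye(N) + we * (np.ones((N, N)) - np.eye(N))   # fully connected network
eig = np.linalg.eigvals(M)
print(np.round(np.sort(eig.real), 3))   # w_i - w_e (N-1 times) and w_i + (N-1) w_e
# stability requires |w_i - w_e| < 1 and |w_i + (N-1) w_e| < 1; the second
# condition forces |w_e| -> 0 as N grows, so large fully connected networks
# tolerate only vanishing coupling
```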

Fig 8. Fully connected and layered networks.

a) Decomposition into eigenvalues and eigenstates of a fully connected network. Nodes of the same color are in-phase synchronized, the nodes with opposite colors (yellow-red) are anti-phase synchronized, and the black nodes are deactivated. b) Dominant eigenvalue in a layered network.

https://doi.org/10.1371/journal.pcbi.1011078.g008

3.5 Mixed-feedback

In the previous sections, we obtained that: (1) the area of stability increases as the feedback distance q increases, and (2) fully connected networks tend to be more unstable. An intermediate case is a network with feedback of two distances, say q1 < q2. We can calculate pB(λ) according to the proof of the theorem in Section A.4 of S1 Text; the relevant eigenvalue combines the effective gains of the loops of the two distances.

The region of stability can then be expressed in terms of these two effective loop gains (Eq (15)).

For finite networks, the term due to the feedback with the longer distance q2 helps to stabilize the dynamic. However, this effect is lost for very deep networks (very large N), where the stability region is determined by the shorter-distance loops alone. Thus, adding a longer feedback connection favors the stability of the dynamics.
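A small numerical experiment along these lines (all weights illustrative): fix a short loop of distance q1 and scan the gain w2 of an added longer loop of distance q2, reporting how the spectral radius of MB changes relative to w2 = 0:

```python
import numpy as np

def rho_two_loops(N, beta, q1, w1, q2, w2):
    M = np.diag([beta] * (N - 1), k=-1)        # feedforward
    M = M + np.diag([w1] * (N - q1), k=q1)     # short loops, distance q1
    M = M + np.diag([w2] * (N - q2), k=q2)     # long loops, distance q2
    return np.abs(np.linalg.eigvals(M)).max()

N, beta, q1, w1, q2 = 8, 0.9, 1, 0.3, 5
w2_grid = np.linspace(-0.5, 0.5, 101)
rhos = [rho_two_loops(N, beta, q1, w1, q2, w2) for w2 in w2_grid]
print(f"rho(w2 = 0) = {rhos[50]:.3f}")
print(f"min rho = {min(rhos):.3f} at w2 = {w2_grid[int(np.argmin(rhos))]:.2f}")
```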

3.6 Nonlinear dynamics

Thus far we presented an analysis of the linear dynamics of a neural network, focusing on the eigenvalues of the matrices MB = MFF + MFB and MA = (Id − MFF)−1MFB for the biological and artificial cases, respectively. In the presence of a nonlinear activation F in the system, the stability analysis around the fixed point relies on calculating the eigenvalues of the matrices

$$\tilde{M}_B = \mathrm{Diag}\!\left[F'(I^*)\right] M_B, \qquad \tilde{M}_A = \mathrm{Diag}\!\left[F'(I^*)\right] M_A \tag{16}$$

where $I^* = M_B\,\mathbf{h}^* + \mathbf{x}$ and $\mathbf{h}^*$ is the fixed point that satisfies $\mathbf{h}^* = F(I^*)$.

Table 2 provides an overview of the properties of several popular activation functions. Notably, while certain activation functions like the sigmoid have bounded ranges, others like the softplus have unbounded ranges. However, all these functions share the common feature of having bounded derivatives F′. Moreover, except for the GELU and Sigmoid Linear Unit functions, their derivatives are bounded within the interval [−1, 1]. Consequently, we can assert that ‖Diag[F′(I*)]‖ ≤ 1, and the eigenvalues of $\tilde{M}_B$ and $\tilde{M}_A$ are guaranteed to possess smaller absolute values compared to those of MB and MA, respectively. As a result, the use of these nonlinear functions inherently promotes stability around an existing fixed point, making them favorable choices in neural network applications.
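For ReLU, F′ takes only the values 0 and 1, so the Jacobian in Eq (16) is the original matrix with a subset of its rows zeroed out. A small random example (an illustration, not a proof) shows the typical shrinkage of the spectral radius:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(scale=0.5, size=(6, 6))               # stands in for M_B or M_A
D = np.diag((rng.normal(size=6) > 0).astype(float))  # Diag[F'(I*)] for ReLU: entries 0 or 1
print(np.abs(np.linalg.eigvals(M)).max())            # spectral radius of the linear system
print(np.abs(np.linalg.eigvals(D @ M)).max())        # radius of the Jacobian around h*
```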

An additional feature of some modern deep networks is the normalization of the input to the activation functions, such as batch normalization [33]. When batch normalization is included as part of a feedback loop, it contributes to keeping the gain of that link in the feedback loop constant. Therefore, if the network starts with a stable configuration, normalization likely contributes to maintaining the overall feedback gain constant across training. It's important to note that while there are arguments related to the transformation properties of batch normalization (e.g. linearity) that support this conjecture, a formal proof is still lacking.

3.7 Feedback connections improve detection of small objects

We will now use the recurrent CNN described in Section 2.2 to test the effects of feedback on the performance of object detection using a state-of-the-art architecture. Specifically, we implemented the Faster Region-CNN architecture (Faster R-CNN, described in Section A.5 of S1 Text) with our recurrent CNNs as the backbone, and tested performance on the COCO dataset [34]. In this architecture, the backbone is trained to extract image features that serve the detection and classification of objects. We used various configurations of the recurrent CNN to test a range of layers and types of feedback in the backbone.

Our recurrent CNNs use the same stages (or parts of them) as the ResNet-50 (see Section A.6 of S1 Text). Thus, we were able to initialize our networks with the corresponding weights from the pretrained ResNet-50 [30], which we then fine-tuned on the COCO dataset. The implementation code and the configuration files for the networks used here are available on GitHub. We use 118k images for training and 5k images for testing. In both sets, there are 80 categories of objects to be detected.

We monitor the loss on the validation set during the training process for the four different backbones we tested (see Fig 9e). Two of the backbones are purely feedforward CNNs with 3 and 5 layers (see a), c) in Fig 9, respectively). The validation loss does not improve much during training, and there is minimal benefit to increasing the number of layers from 3 to 5. We also tested the same networks, but now including feedback connections of distance 0 and 1 (see b), d) in Fig 9, respectively). In these latter cases, we use a time delay Δ = 1 for the feedforward connections between layers (see Section 2.2). Artificial implementations (Δ = 0) tended to become unstable during learning and were not explored further. For both networks, adding feedback reduced the validation loss. Adding feedback connections is better than adding layers with feedforward connections.

Fig 9. Performance on object detection during the training process.

The results here are on the validation set. Recurrent CNNs (a–d) were used as backbones in Faster R-CNN. (a, c) Feedforward networks with 3 and 5 layers, respectively. (b, d) Feedback connections of distance 0 (green arrows) and 1 (blue arrows) were added to networks a) and c). (e) During the training stage, the validation loss of the feedforward networks evolves similarly, regardless of depth (lines a), c)). Adding feedback connections reduces the validation loss (lines b), d)). (f) Average precision and recall for detection of objects of different sizes in images of the validation set. The initial value of each metric (epoch 1) tends to be higher as the number of layers in the network increases. However, the evolution of each metric depends on the size of the objects and whether feedback connections are included. The gray line indicates the performance of the Feature Pyramid Network (FPN) pretrained on this data [35].

https://doi.org/10.1371/journal.pcbi.1011078.g009

We also evaluated standard performance measures in this task, namely, the Average Recall (AR) and Average Precision (AP) on small, medium, and large objects, as defined in Section A.7 of S1 Text. The Faster R-CNN architecture proposes regions of interest based on the backbone output and then classifies or dismisses them. The results presented in Figs 9 and 10 are calculated using a maximum of Nprop = 100 region proposals per image and threshold values of intersection-over-union (IoU) in the range 0.5 : 0.95 (for more details, see Section A.7 of S1 Text).

Fig 10. Examples of object detection and classification results.

Predictions for five images (rows) of the evaluation set using Faster R-CNN. The backbone for Faster R-CNN is one of four of our recurrent CNNs or the Feature Pyramid Network—FPN (columns).

https://doi.org/10.1371/journal.pcbi.1011078.g010

Fig 9f presents the evolution of these metrics during training. Each column corresponds to a size of objects (small, medium, large). These results show that the initial and final values of the metrics, and their temporal evolution, depend on the depth of the network, the feedback connections, and the size of the objects. More precisely, we observed that the initial performance (epoch 1), for both AP and AR, is higher in deeper networks. This result is independent of the size of the objects (see Fig 9f) and is due to the fact that networks c) and d) are more similar to the pretrained ResNet-50. In addition, when the network has five layers, the initial value of the metrics for large objects is not changed if feedback connections are added; however, for medium and small objects, these connections help to improve performance.

Note that networks with feedback connections perform better than feedforward networks of the same depth (compare black vs orange lines in Fig 9). This result is the same for all sizes of objects. Furthermore, for small objects, the network with three layers and feedback connections has better performance than the five-layer network without feedback connections (orange solid lines vs black dashed lines).

For comparison, Fig 9 also shows the performance of the Feature Pyramid Network, which is a current benchmark for this object detection task [35]. The FPN architecture consists of a bottom-up pathway, a top-down pathway, and lateral connections. As in our architectures, the bottom-up pathway is the feedforward computation of the backbone (i.e. Fl in Eq (6)). More precisely, [35] uses a ResNet-50. The main difference between FPN and our architectures is the implementation of the feedback (i.e. the recurrent map R in Eq (4) and the integration mechanism ϕl in Eq (5)). The FPN can be thought of as having feedback from all layers (hence the name “feature pyramid”), with recurrence iterated for a single time step. It is likely this hierarchical feedback that provides the performance boost to the FPN.

In Fig 10, we show some examples of predictions with Faster R-CNN using different backbones. In the first four columns (a–d), we use the recurrent CNNs implemented here as the backbone (i.e. Fig 9(a)–9(d)); in the last column, we use the FPN. The examples in Fig 10 show that for networks with feedback connections, detection improved for small and medium objects (see b) vs a) and d) vs c)). Furthermore, the predictions of network d) and the FPN coincide in most cases.

3.8 Feedback connections improve robustness against noise

In this section, we discuss the effect of feedback on image classification performance. For this, we implement a neural network that consists of a feature extractor, a pooling operation, and a classifier (i.e. a perceptron). We use the three networks presented in Fig 11(a)–11(c) as feature extractors. Networks a) and c) are feedforward architectures with two and three layers, respectively. In addition, network b) is obtained by adding feedback connections of a single distance q = 0 to network a) (see Fig 11b). As in the previous section, for the feature extractors (a–c), we use the architecture described in Section 2.2, but with ResNet-18 stages.

Fig 11. Accuracy in classification task.

Recurrent CNNs (a–c) were used as feature extractors in the classification task. (a, c) Feedforward networks with 2 and 3 layers, respectively. (b) A feedback connection of distance 0 (green arrow) was added to network a). During the training of the networks (a–c), the accuracy calculated over the training set and the test set increases. The performance of the networks is reduced when evaluating on images of the test set with added Gaussian noise.

https://doi.org/10.1371/journal.pcbi.1011078.g011

We use the CIFAR-10 dataset, which consists of 60000 color images in 10 classes (0: airplane, 1: automobile, 2: bird, 3: cat, 4: deer, 5: dog, 6: frog, 7: horse, 8: ship, 9: truck), with 6000 images per class. There are 50000 training images and 10000 test images [36]. The output of the classifier is a 10-dimensional vector indicating the probability that the input image belongs to each class, and the final prediction of the network is the class with the highest probability. The evaluation of the networks is expressed in terms of accuracy and its confidence intervals (95%), which were estimated using a bootstrap procedure.
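The bootstrap procedure can be sketched as follows (a generic resampling of per-image correctness; the function and its defaults are illustrative, not the exact evaluation code used here):

```python
import numpy as np

def bootstrap_accuracy_ci(correct, n_boot=1000, alpha=0.05, seed=0):
    """95% bootstrap confidence interval for accuracy from 0/1 per-image outcomes."""
    correct = np.asarray(correct, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(correct), size=(n_boot, len(correct)))
    accs = correct[idx].mean(axis=1)          # accuracy of each resampled test set
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return correct.mean(), lo, hi

# e.g. 10000 simulated per-image outcomes at ~90% accuracy (illustrative only)
print(bootstrap_accuracy_ci(np.random.default_rng(1).random(10000) < 0.9))
```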

In Fig 11, we show the accuracy of the three networks calculated for different groups of images. In the left and center panels, accuracy as a function of training epochs was calculated using the training set and the test set, respectively. After training, the deepest network (c) performs better than the other two (test accuracy = 90.2 ± 0.3%). Note that while network (b) fits the training data better than (a) (94.1 ± 0.3% vs 87.2 ± 0.4%), network (a) performs better on the test set (81.3 ± 0.4% vs 78.2 ± 0.4%). In the right panel, the performance of the trained networks on noisy images is shown. Gaussian noise was added to each image in the test set. The noise has a mean value of 0, and its standard deviation is proportional to the standard deviation of the dataset; this proportionality factor is called the “noise level”. Note that the performance of network (b) is higher than that of network (a) when the noise level is greater than 0.1. Furthermore, it is also greater than the performance of network (c) when the noise level is greater than 0.35. That is, beyond a certain noise level, the performance of the recurrent network (b) is more robust against noise than that of the purely feedforward networks (a, c).

3.9 Activity dynamic reduces entropy, improving classification performance over time

Due to the dynamic activity of the network, the output of the classifier (i.e. the vector of probabilities p) changes over time during the inference stage. Furthermore, each vector is associated with an entropy value (i.e. $-\sum_{c} p_c \log_2 p_c$). In cases where entropy is high (∼ log2 10 ≈ 3.32), all probabilities are close to 1/10, indicating that the network is less certain about a class selection. Conversely, in cases of low entropy (∼ 0), there is a class with a maximum probability close to 1. Consequently, in high-entropy cases, the final prediction is more susceptible to errors.
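For reference, a minimal sketch of this entropy computation (NumPy; the array shapes are illustrative):

```python
import numpy as np

def entropy_bits(p, eps=1e-12):
    """Shannon entropy in bits along the last axis; at most log2(10) ~ 3.32 for 10 classes."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log2(p)).sum(axis=-1)

print(entropy_bits(np.full(10, 0.1)))   # uniform output: ~3.32 bits, maximally uncertain
print(entropy_bits(np.eye(10)[3]))      # one-hot output: ~0 bits, fully confident
# given probs of shape (T, n_images, 10), entropy_bits(probs).mean(axis=1)
# traces the decrease of uncertainty over the T time steps (cf. Fig 12b)
```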

For the trained network with the feature extractor shown in Fig 11b, we computed the classifier's output for all images in the test set and for different time steps of the activity dynamic (t = 1, …, 6). We applied the t-SNE visualization method [37] to show the temporal evolution of the output vector in the activity space for some input images (Fig 12a). Notably, these trajectories converge to a fixed point, replicating the schematic representation of Fig 4b. When monitoring the entropy of the outputs, we find that the dynamic decreases the uncertainty about the class identity over time, i.e. the network gains in “confidence” over time (Fig 12b).

Fig 12. Temporal dynamic of the classification network.

This simulation uses the trained network of Fig 11b. a) Examples of t-SNE projection of the trajectories of the activity space of the last layer. The activity space has dimension 10; while the projection is two-dimensional. Please note that for feedforward networks without temporal dynamics, the trajectory is a point that remains constant over time. Therefore, the stability of the dynamics during inference is assured. b) Distribution of entropy of the output as a function of the time steps in the inference stage. c) Performance of the network as a function of the time steps in the inference stage for the low (green line) and high (red line) entropy images. d) Performance of the network as a function of the time steps in the inference stage. We evaluated the network on the complete test (solid black line) and by classes (dashed lines). Based on the accuracy by classes, the easy and hard classes to classify by the network were identified (green and red lines, respectively).

https://doi.org/10.1371/journal.pcbi.1011078.g012

We categorized images with high entropy (resp. low entropy) as those whose outputs have entropy greater (resp. lower) than the mean entropy at t = 1 (see t = 1 in Fig 12b). In Fig 12c, we show the network's performance as a function of time steps in the inference stage for both groups (red curve: high entropy, green curve: low entropy). The network performs better in cases of low entropy, converging to 88%, while for high entropy it converges to 74% (see Fig 12c). The time needed for performance convergence differs between the two groups. Specifically, at t = 3, the low entropy group's performance is at 96% of its final value, whereas for the high entropy group, it is at 86%. The high entropy group requires an additional time step (t = 4) to reach 96% of its final value. This result is reminiscent of the finding in the inferior temporal (IT) cortex of primates, whereby neurons reach more “confident” decisions later in time (∼ 30 ms) for more challenging images (see Fig 2 in [18]). We see a similar result (Fig 12d) when separating performance for classes that are more challenging to identify (class 2: bird, class 3: cat, class 4: deer, and class 6: frog).

4 Discussion

In this work, we studied the dynamics of recurrent networks with static inputs. We observed that the stability region for networks with biologically realistic feedforward delays is larger than for artificial networks without feedforward delays. Furthermore, we showed that in networks with feedback connections of a fixed distance, the stability of both implementations (biological and artificial) is equivalent. This is a consequence of the presence of a single time scale when only a single feedback distance is present. Using this last result, we found that the effective gain of longer loops dominates the dynamic and improves overall stability. In fact, adding longer distance loops can improve the stability of a recurrent network. Note that, implicitly, deeper networks can accommodate longer loops. Furthermore, layered networks tend to be more stable than fully connected networks, as they tend to increase the loop distance compared to fully connected networks. While some of these mathematical results were derived with “layers” consisting of individual units, we showed that the results generalize to layers with multiple uniform units, which is common in both artificial and biological recurrent networks. Finally, we demonstrated that typical nonlinear activation functions only contribute to increasing stability. In total, we found that basic organizational principles of biological networks favor stability, namely, feedforward delays, a layered organization with similar units in each layer, long-range feedback, and nonlinear activations.

The computational power of deep networks has now been widely demonstrated, with state-of-the-art performance using up to a hundred or more layers. However, such very deep networks are not biologically realistic, and the argument has been made that recurrent processing can add processing steps in a reduced architecture [38, 39]. Therefore, the important question is whether adding feedback benefits performance at a limited depth. We implemented and evaluated recurrent CNNs for object detection and image classification on the COCO and CIFAR10 datasets, respectively. We used biological feedback to ensure stability during learning. The feedback connections helped to improve the detection of small objects and to obtain robust performance against noise in the classification task. This is consistent with previous work [16, 40, 41] showing that recurrent dynamics improve recognition performance in the challenging scenario of partial occlusion (e.g., multiple targets occluding each other) or degraded images. Importantly, the temporal dynamics of these recurrent networks were reminiscent of the activity dynamic in biological vision [18], as discussed in more detail below.

The analytical results we derived here assumed a simplified linear recurrent network. For nonlinear networks, the same analysis can be carried out by linearizing around fixed points. As we showed in Section 3.6, the typical nonlinear activation functions used in current network models can only improve stability around existing fixed points. In this sense, we performed a worst-case analysis here. In nonlinear networks with bounded activations (as in biological systems), even unstable fixed points are likely to result in oscillations with stable limit cycles. The analysis of such limit cycles is more complex and beyond the scope of this work. Another limitation of this work is that many of the analytic results were obtained for special cases with simplified connection weights that capture the essence of the phenomena. We conjecture that similar results hold on average under random connection strengths. Similarly, the results were derived for uniform time delays. However, in biological networks, time delays are not uniform across the network. An outlook on how to treat the case of non-uniform time delays is provided in Section A.1 of S1 Text.

From a mathematical perspective, Eqs (1) and (4) represent the temporal evolution of the activity in neural networks and are examples of discrete-time dynamic systems [29]. Some of the main results of this work are a consequence of applying the bifurcation theory of dynamic systems to these cases. The temporal evolution depends on the weights of the network connections. The set of weights and the input x (i.e. image) define the possible trajectories that exist in phase space (see Fig 1c). That is, the structure of the phase space (e.g. fixed points, periodic orbits, invariant tori) will also depend on the parameters of the network. For fixed points of the dynamic, we studied the behavior as a function of the connectivity and identified the bifurcation point where local stability changes. This type of stability analysis is one of the first steps in the general study of phase space. The next step is the analysis of attractors or limit cycles [29]. However, in our work, this step is sufficient as we have focused on vision tasks associated with static images, such as classification and object detection. These tasks are considered core vision, which is completed in primates within a few hundred milliseconds [3, 18, 27, 28, 42], i.e. within a single fixation. The importance of feedback in core vision has been demonstrated, for instance, in the classification of images in background clutter [18]. The last “layer” of this system of core vision is the inferior temporal (IT) cortex, where one can linearly decode the class identity of images from neural activity. As time progresses after the image presentation, the decoding performance increases, reaching a peak at 100–200 ms [18]. Importantly, challenging images take longer to “decode” by about 30 ms, which corresponds to approximately two additional processing steps. Here we found that in object classification with top-down feedback, performance increases over the time of the activity dynamic, with challenging images taking longer to achieve maximum performance (Fig 12). A limitation of the present work is that we have only analyzed the case of a static input. Yet, primate vision is marked by static input during fixations, interrupted by changes of fixation in a sequence of saccades, often attracted by salient and moving objects. It would be interesting to determine the role of feedback in those dynamical contexts [43–45], where information across fixations is integrated.

In the context of time-sequence processing, a dynamic that converges to a fixed point may be quite restrictive, and a more diverse dynamic, perhaps with limit cycles, could be more expressive [46]. However, in the context of static inputs, we note that purely feedforward nonlinear networks can be highly expressive, despite being “stable”. Empirically, we found that adding loops to pretrained deep networks can enhance performance. The search began in the proximity of an expressive network and led to improvements. This leads us to conclude that there are situations where adding stable feedback can contribute to the expressiveness of nonlinear networks. In addition to fixed points and stable limit cycles, neural networks can exhibit chaotic behavior. Chaotic dynamics can be leveraged to enhance information processing capacity, long-term memory [47], and adaptability in practical applications [48]. However, chaotic dynamics can also introduce challenges in prediction, control, and training due to their extreme sensitivity to initial conditions [49]. This sensitivity must be carefully considered in the system design and the tuning of learning parameters to ensure stability and proper functionality.

Here, we argued that the structured organization of connections in a network contributes to stability. At first glance, however, the visual system seems to exhibit densely interconnected recurrent pathways [50]. It is important to recognize, though, that brain networks are far from fully connected [51]. In particular, the ventral visual pathway shows clear sequential processing across the processing hierarchy with top-down feedback [12]. There are also connections of the visual hierarchy with subcortical brain nuclei and other cortices, but these are not necessarily reciprocal [12]. The simplified model structure we analyzed here (e.g. Fig 5c) is motivated by the specific wiring diagrams (Fig 1b) that have been proposed for core vision, e.g. [28]. Future work may use the formalism proposed here to analyze the stability of other network motifs.
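A minimal numerical illustration of why routing feedback over a longer layered chain can favor stability (a linear toy model, far simpler than the networks studied here): in a single loop whose weights multiply to a fixed total gain, the eigenvalues are the roots of λ^L = gain, so distributing a strong loop over more layers compresses the spectral radius toward 1:

```python
import numpy as np

def loop_spectral_radius(gain, length):
    """Spectral radius of a cycle of `length` units whose weights multiply
    to `gain` around the loop; the eigenvalues solve lambda**length = gain,
    so the radius is |gain| ** (1 / length)."""
    A = np.zeros((length, length))
    for i in range(length - 1):
        A[i + 1, i] = 1.0                 # feedforward steps along the chain
    A[0, length - 1] = gain               # long-range feedback closing the loop
    return max(abs(np.linalg.eigvals(A)))

# A total gain of 2 is unstable as a direct self-loop, but routing the same
# gain through more layers pulls the spectral radius toward 1.
for length in (1, 2, 4, 8):
    print(f"loop length {length}: spectral radius = "
          f"{loop_spectral_radius(2.0, length):.3f}")
```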

Here we emphasized stability when discussing network organization. There is a multitude of theoretical and experimental studies on other principles of network organization in the mammalian brain. At the largest scale of the whole brain, this includes the observation of a small-world structure with densely connected hubs and sparse long-range connections [51]. The overall structure of the human brain appears to form a set of segregated networks that exhibit correlations within each network [52], such as the default mode network, the ventral and dorsal attention networks, the visual network, etc. Brain organization also appears to exhibit gradients in microstructure, such as in inhibitory and excitatory strength and connectivity [53], as well as functional gradients, such as in the intrinsic timescale [54], which has been linked to cortical microstructure [55]. Here we have narrowly focused on the effect of delays on stability and on how different connectivity motifs may aid stability, and contrasted this with how sequential processing in artificial neural networks incorporates delays. We found that layers and long-range feedback contribute to stability, but we do not mean to imply that the only purpose of the layered organization is stability. Stability is also facilitated, for instance, by a balance between excitatory and inhibitory feedback, e.g. [56]. A caveat of the present study is that we have not analyzed this important principle of stability in biological networks [57].

As mentioned above, the fixed points do not depend on the number of time steps used. When a trajectory converges to a fixed point, that point can be interpreted as condensing all the information of the trajectory. Therefore, each network input x and initial condition h0 will be associated with a fixed point h*. This interpretation resembles the simple input-output relationship of a feedforward network. The main difference, however, is that each fixed point of a dynamical system defines a basin of attraction: small perturbations of x and h0 (e.g. noise Δx) do not modify the fixed point h*. We believe that this is the basis for the robustness to noise that we observed in the dynamics. On the other hand, stable fixed points are a particular case of bounded dynamics (i.e. trajectories with ||h_t|| < ∞ for all t). Clearly, dynamical systems with unbounded trajectories pose a serious problem both for computing the output and for training the network (gradient calculation).
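The sketch below illustrates this basin-of-attraction robustness numerically (a contractive toy system with illustrative weights, so the fixed point is unique): perturbing the input or the initial condition barely moves the attracting fixed point:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 4
W = 0.1 * rng.standard_normal((n, n))     # weak weights: the map is a contraction,
U = rng.standard_normal((n, m))           # so the fixed point is unique

def fixed_point(x, h0=None, steps=1000):
    """Iterate h <- tanh(W h + U x) until numerical convergence."""
    h = np.zeros(n) if h0 is None else h0
    for _ in range(steps):
        h = np.tanh(W @ h + U @ x)
    return h

x = rng.standard_normal(m)
h_star = fixed_point(x)

# Input noise dx shifts the fixed point only slightly; a different initial
# condition inside the basin lands on the same fixed point.
dx = 1e-3 * rng.standard_normal(m)
print("shift from input noise:      ",
      np.linalg.norm(fixed_point(x + dx) - h_star))
print("shift from initial condition:",
      np.linalg.norm(fixed_point(x, h0=rng.standard_normal(n)) - h_star))
```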

Although recurrent networks appear to be crucial for visual processing, a bottleneck for computational models is the cost of the standard training algorithm, back-propagation through time (BPTT) [24], which has to propagate errors backward in time at every learning step. In recent years, efforts have focused on efficient approximations to BPTT [25, 58]. One example is recurrent backpropagation (RBP), which assumes that the dynamical system converges to a task-optimized fixed point; under this assumption, memory complexity remains constant regardless of the number of recurrent processing steps. Linsley et al. [31] show that stable dynamics improve performance in large-scale computer vision challenges. In general, however, this assumption is strong, since stability depends on the network parameters, which change during training. One way to ensure that recurrent models remain stable is to apply penalties (e.g. Contractor-RBP) [31]. In this work, we have shown that both the architecture (i.e. the type of connections and the order of loops) and the type of feedforward transmission (biological vs artificial) play an important role in the stability of the dynamics. The results presented here indicate that some architectures are more favorable for the application of RBP and may not require additional constraints to ensure stability. Specifically, one can show that RBP becomes an exact algorithm if the dynamics have a stable fixed point (in preparation). In short, with the proper choice of feedback, deep learning models may become easily trainable, biologically inspired networks.
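To illustrate the idea behind RBP (a minimal NumPy sketch of implicit differentiation at a fixed point, not the Contractor-RBP of [31]; the toy dynamics, loss, and weights are illustrative): once the dynamics have converged to h*, the gradient follows from a single linear solve at the fixed point, with memory cost independent of the number of recurrent steps:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 4
W = 0.1 * rng.standard_normal((n, n))
U = rng.standard_normal((n, m))
x = rng.standard_normal(m)
target = rng.standard_normal(n)

# Forward pass: run the dynamics to the fixed point h* = tanh(W h* + U x).
h = np.zeros(n)
for _ in range(1000):
    h = np.tanh(W @ h + U @ x)

# RBP step: instead of storing the whole trajectory as BPTT would, solve one
# linear system at the fixed point. With Jacobian J = diag(1 - h*^2) @ W, the
# adjoint satisfies lam = J^T lam + dL/dh*.
J = (1 - h ** 2)[:, None] * W
dL_dh = h - target                        # gradient of L = 0.5 * ||h* - target||^2
lam = np.linalg.solve(np.eye(n) - J.T, dL_dh)

# Parameter gradient via the implicit function theorem, e.g. for W.
dL_dW = ((1 - h ** 2) * lam)[:, None] * h[None, :]
print("adjoint norm:", np.linalg.norm(lam),
      " gradient norm:", np.linalg.norm(dL_dW))
```

The single linear solve replaces the unrolled backward pass of BPTT, which is where the constant memory footprint comes from; the solve is well posed exactly when the fixed point is stable.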

A possible future direction of this research is to analyze the role of feedback connections at different levels. The most popular networks for visual processing tasks consist of 1) a backbone that reduces the dimensionality of the input using convolutions and returns a set of features, and 2) a predictor that computes the output as a function of those features. In the architectures presented in this work, we added feedback connections only to the backbone (low/mid-level), while the predictor was not modified. Some works use architectures where only the predictor is a recurrent network [59, 60]. The next step is to combine both levels of feedback connections. Such feedback may represent top-down signals across fixations, help integrate information across a larger image (e.g. interactions between objects, action recognition), or serve to integrate information across time in a dynamic visual input (e.g. video processing).
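As a sketch of this backbone/predictor split (in PyTorch; the module names stage1, stage2, and feedback are hypothetical stand-ins for pretrained convolutional stages and a learned top-down connection, not the architectures evaluated in this paper), a backbone with one feedback loop can be unrolled for a fixed number of steps before handing features to an unmodified predictor:

```python
import torch
import torch.nn as nn

class RecurrentBackbone(nn.Module):
    """Toy backbone with one top-down feedback loop, unrolled for `steps`."""

    def __init__(self, channels=64, steps=4):
        super().__init__()
        self.stage1 = nn.Conv2d(3, channels, 3, padding=1)
        self.stage2 = nn.Conv2d(channels, channels, 3, padding=1)
        # Feedback from the deep stage back to the input of the shallow one.
        self.feedback = nn.Conv2d(channels, channels, 1)
        self.steps = steps

    def forward(self, x):
        h1 = torch.relu(self.stage1(x))
        h2 = torch.relu(self.stage2(h1))
        for _ in range(self.steps - 1):           # recurrent unrolling
            h1 = torch.relu(self.stage1(x) + self.feedback(h2))
            h2 = torch.relu(self.stage2(h1))
        return h2                                 # features for the predictor

backbone = RecurrentBackbone()
features = backbone(torch.randn(1, 3, 32, 32))    # e.g. a CIFAR10-sized input
print(features.shape)                             # torch.Size([1, 64, 32, 32])
```

In this sketch only the backbone recurs; a predictor head (e.g. a linear classifier) would consume the returned features unchanged, mirroring the split described above.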

Supporting information

S1 Text. Mathematical proofs and implementation details.

https://doi.org/10.1371/journal.pcbi.1011078.s001

(PDF)

References

1. Huff T, Mahabadi N, Tadi P. Neuroanatomy, Visual Cortex. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2023 Jan.
2. Grill-Spector K, Weiner KS. The functional architecture of the ventral temporal cortex and its role in categorization. Nature Reviews Neuroscience. 2014;15(8):536–48. pmid:24962370
3. DiCarlo JJ, Zoccolan D, Rust NC. How does the brain solve visual object recognition? Neuron. 2012;73(3):415–34. pmid:22325196
4. Peirce JW. Understanding mid-level representations in visual processing. Journal of Vision. 2015;15(7):5. pmid:26053241
5. Xu Y. A Tale of Two Visual Systems: Invariant and Adaptive Visual Information Representations in the Primate Brain. Annual Review of Vision Science. 2018;4(1):311–36. pmid:29949722
6. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, et al. A deep learning framework for neuroscience. Nature Neuroscience. 2019;22(11):1761–70. pmid:31659335
7. Kriegeskorte N. Deep Neural Networks: A New Framework for Modeling Biological Vision and Brain Information Processing. Annual Review of Vision Science. 2015;1(1):417–46. pmid:28532370
8. Yamins DLK, DiCarlo JJ. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience. 2016;19(3):356–65. pmid:26906502
9. Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America. 2014;111(23):8619–24. pmid:24812127
10. Khaligh-Razavi S-M, Kriegeskorte N. Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation. PLOS Computational Biology. 2014;10(11):e1003915. pmid:25375136
11. Groen IIA, Silson EH, Baker CI. Contributions of low- and high-level properties to neural processing of visual scenes in the human brain. Philosophical Transactions of the Royal Society B: Biological Sciences. 2017;372(1714):20160102. pmid:28044013
12. Kravitz DJ, Saleem KS, Baker CI, Ungerleider LG, Mishkin M. The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends in Cognitive Sciences. 2013;17(1):26–49. pmid:23265839
13. Markov NT, Vezoli J, Chameau P, Falchier A, Quilodran R, Huissoud C, et al. Anatomy of hierarchy: Feedforward and feedback pathways in macaque visual cortex. Journal of Comparative Neurology. 2014;522(1):225–59. pmid:23983048
14. Hupé JM, James AC, Girard P, Lomber SG, Payne BR, Bullier J. Feedback connections act on the early part of the responses in monkey visual cortex. Journal of Neurophysiology. 2001;85(1):134–45. pmid:11152714
15. Wyatte D, Jilk DJ, O’Reilly RC. Early recurrent feedback facilitates visual object recognition under challenging conditions. Frontiers in Psychology. 2014;5. pmid:25071647
16. Tang H, Schrimpf M, Lotter W, Moerman C, Paredes A, Ortega Caro J, et al. Recurrent computations for visual pattern completion. Proceedings of the National Academy of Sciences. 2018;115(35):8835–40. pmid:30104363
17. Kietzmann TC, Spoerer CJ, Sörensen LKA, Cichy RM, Hauk O, Kriegeskorte N. Recurrence is required to capture the representational dynamics of the human visual system. Proceedings of the National Academy of Sciences. 2019;116(43):21854–63. pmid:31591217
18. Kar K, Kubilius J, Schmidt K, Issa EB, DiCarlo JJ. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nature Neuroscience. 2019;22(6):974–83. pmid:31036945
19. Golesorkhi M, Gomez-Pilar J, Zilio F, Berberian N, Wolff A, Yagoub MCE, et al. The brain and its time: intrinsic neural timescales are key for input processing. Communications Biology. 2021;4(1):1–16. pmid:34400800
20. Izhikevich EM. Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting. Cambridge, MA: MIT Press; 2007.
21. Li L, Lu B, Yan C-G. Stability of dynamic functional architecture differs between brain networks and states. bioRxiv; 2019. pmid:31577959
22. Kozachkov L, Lundqvist M, Slotine J-J, Miller EK. Achieving stable dynamics in neural circuits. PLOS Computational Biology. 2020;16(8):e1007659. pmid:32764745
23. Haber E, Ruthotto L. Stable architectures for deep neural networks. Inverse Problems. 2017.
24. Werbos PJ. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE. 1990;78(10):1550–60.
25. Liao R, Xiong Y, Fetaya E, Zhang L, Yoon K, Pitkow X, et al. Reviving and Improving Recurrent Back-Propagation. arXiv; 2018.
26. Covington BP, Al Khalili Y. Neuroanatomy, Nucleus Lateral Geniculate. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2023 Jan.
27. Nayebi A, Sagastuy-Brena J, Bear DM, Kar K, Kubilius J, Ganguli S, et al. Recurrent Connections in the Primate Ventral Visual Stream Mediate a Trade-Off Between Task Performance and Network Size During Core Object Recognition. Neural Computation. 2022;34(8):1652–75. pmid:35798321
28. DiCarlo JJ, Cox DD. Untangling invariant object recognition. Trends in Cognitive Sciences. 2007;11(8):333–41. pmid:17631409
29. Kuznetsov YA. Elements of Applied Bifurcation Theory. New York, NY: Springer; 2004.
30. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv; 2015.
31. Linsley D, Ashok AK, Govindarajan LN, Liu R, Serre T. Stable and expressive recurrent vision models. arXiv; 2020.
32. Arfken GB, Weber HJ. Mathematical Methods for Physicists. 6th ed. Amsterdam, Heidelberg: Academic Press; 2005.
33. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML); 2015. p. 448–456.
34. Lin T-Y, Maire M, Belongie S, Bourdev L, Girshick R, Hays J, et al. Microsoft COCO: Common Objects in Context. arXiv; 2015.
35. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature Pyramid Networks for Object Detection. arXiv; 2017.
36. Krizhevsky A. Learning multiple layers of features from tiny images. Technical Report TR-2009, University of Toronto; 2009.
37. van der Maaten L, Hinton G. Visualizing Data using t-SNE. Journal of Machine Learning Research. 2008;9(86):2579–605.
38. Stelzer F, Röhm A, Vicente R, Fischer I, Yanchuk S. Deep neural networks using a single neuron: folded-in-time architecture using feedback-modulated delay loops. Nature Communications. 2021;12(1):5164. pmid:34453053
39. Liao Q, Poggio T. Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex. arXiv; 2016.
40. Spoerer CJ, McClure P, Kriegeskorte N. Recurrent Convolutional Neural Networks: A Better Model of Biological Object Recognition. Frontiers in Psychology. 2017;8. pmid:28955272
41. Lindsay GW, Mrsic-Flogel TD, Sahani M. Bio-inspired neural networks implement different recurrent visual processing strategies than task-trained ones do. bioRxiv; 2022.
42. Cadieu CF, Hong H, Yamins DLK, Pinto N, Ardila D, Solomon EA, et al. Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition. PLOS Computational Biology. 2014;10(12):e1003963. pmid:25521294
43. Edwards G, Vetter P, McGruer F, Petro LS, Muckli L. Predictive feedback to V1 dynamically updates with sensory input. Scientific Reports. 2017;7(1):16538. pmid:29184060
44. Spratling MW. Predictive coding as a model of cognition. Cognitive Processing. 2016;17(3):279–305. pmid:27118562
45. Vetter P, Edwards G, Muckli L. Transfer of Predictive Signals Across Saccades. Frontiers in Psychology. 2012;3. pmid:22701107
46. Zegers P, Sundareshan MK. Trajectory generation and modulation using dynamic neural networks. IEEE Transactions on Neural Networks. 2003;14(3):520–33. pmid:18238036
47. Aram Z, Jafari S, Ma J, Sprott JC, Zendehrouh S, Pham V-T. Using chaotic artificial neural networks to model memory in the brain. Communications in Nonlinear Science and Numerical Simulation. 2017;44:449–59.
48. Ryeu JK, Chung HS. Chaotic recurrent neural networks and their application to speech recognition. Neurocomputing. 1996;13(2):281–94.
49. Mikhaeil JM, Monfared Z, Durstewitz D. On the difficulty of learning chaotic dynamics with RNNs. arXiv; 2022.
50. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex. 1991;1(1):1–47.
51. Bullmore E, Sporns O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience. 2009;10:186–198. pmid:19190637
52. Yeo BT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. Journal of Neurophysiology. 2011;106(3):1125–65. pmid:21653723
53. Wang XJ. Macroscopic gradients of synaptic excitation and inhibition in the neocortex. Nature Reviews Neuroscience. 2020;21:169–178. pmid:32029928
54. Murray J, Bernacchia A, Freedman D, Romo R, Wallis JD, Cai X, et al. A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience. 2014;17:1661–1663. pmid:25383900
55. Gao R, van den Brink RL, Pfeffer T, Voytek B. Neuronal timescales are functionally dynamic and shaped by cortical microarchitecture. eLife. 2020. pmid:33226336
56. Wang XJ. Probabilistic decision making by slow reverberation in cortical circuits. Neuron. 2002. pmid:12467598
57. Lim S, Goldman MS. Balanced cortical microcircuitry for maintaining information in working memory. Nature Neuroscience. 2013;16(9):1306–14. pmid:23955560
58. Gruslys A, Munos R, Danihelka I, Lanctot M, Graves A. Memory-Efficient Backpropagation Through Time. In: Advances in Neural Information Processing Systems; 2016.
59. McIntosh L, Maheswaranathan N, Sussillo D, Shlens J. Recurrent Segmentation for Variable Computational Budgets. arXiv; 2018.
60. Shi J, Wen H, Zhang Y, Han K, Liu Z. Deep recurrent neural network reveals a hierarchy of process memory during dynamic natural vision. Human Brain Mapping. 2018;39(5):2269–82. pmid:29436055