
Energy optimization induces predictive-coding properties in a multi-compartment spiking neural network model

  • Mingfang Zhang,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Current address: ENS-PSL, Ecole normale supérieure, Paris, France

    Affiliation Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

  • Raluca Chitic,

    Roles Conceptualization, Formal analysis, Methodology, Software, Writing – review & editing

    Affiliation University of Groningen, Groningen, The Netherlands

  • Sander M. Bohté

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Supervision, Validation, Visualization, Writing – review & editing

    S.M.Bohte@cwi.nl

    Affiliations Centrum Wiskunde & Informatica, Amsterdam, The Netherlands, Swammerdam Institute for Life Sciences (SILS), University of Amsterdam, Amsterdam, The Netherlands

Abstract

Predictive coding is a prominent theoretical framework for understanding hierarchical sensory processing in the brain, yet how it could be implemented in networks of cortical neurons is still unclear. While most existing studies have taken a hand-wiring approach to creating microcircuits that match experimental results, recent work in rate-based artificial neural networks revealed that suitable cortical connectivity might result from self-organisation given some fundamental computational principle, such as energy efficiency. As no corresponding approach has studied this in more plausible networks of spiking neurons, we here investigate whether predictive coding properties in a multi-compartment spiking neural network can emerge from energy optimisation. We find that a model trained with an energy objective in addition to a task-relevant objective is able to reconstruct internal representations given top-down expectation signals alone. Additionally, neurons in the energy-optimised model show differential responses to expected versus unexpected stimuli, qualitatively similar to experimental evidence for predictive coding. These findings indicate that predictive-coding-like behaviour might be an emergent property of energy optimisation, providing a new perspective on how predictive coding could be achieved in the cortex.

Author summary

Predictive coding is an elegant and influential theoretical framework for understanding learning and processing in the brain, with several experimental findings seemingly in support. Yet, current predictive coding frameworks require specific connectivity motifs to be implemented whose emergence so far has remained unexplained – instantiated with spiking neurons, such motifs become even more intricate and more difficult to explain. An alternative point of view assumes that the brain is capable of efficient deep learning in some manner; for example, energy optimization in rate-based RNNs can result in network behavior reminiscent of predictive coding. However, real biological networks differ from RNNs in important ways: first, they operate in continuous time rather than sequential steps, and second, real biological neurons emit binary spikes, which for instance makes it difficult to communicate an error that could be positive or negative. Defining an internal energy measure for multi-compartment spiking neurons, we demonstrate how the resulting recurrent networks can exhibit several predictive-coding-like properties when optimizing for both task and energy efficiency. The energy-optimized network then demonstrates lower overall activity, generative behavior, and differential responses to expected vs unexpected stimuli. Energy minimization in multi-compartment spiking neurons can thus bring tangible benefits and explain predictive-coding-like empirical findings.

1. Introduction

Predictive coding is a prominent theory of sensory processing in the brain, postulating that the brain learns a generative model of the world capable of predicting sensory inputs through hierarchically organized brain areas [1,2]. Although indirect experimental evidence and computational models built with predictive coding principles have successfully explained various experimental phenomena, the precise neural implementation of predictive coding remains a subject of debate [3,4]. Proposed algorithms disagree in terms of the neuronal types and connectivities, with disparate views for what cortical microcircuits are involved in the implementation of predictive coding [3,5–8].

Computational models of microcircuits are instrumental in uncovering computational mechanisms and generating novel hypotheses for understanding predictive coding in the cortex. However, existing proposals of predictive coding are subject to two main constraints that limit their accuracy in capturing the corresponding cortical mechanisms. First, most predictive coding algorithms involve specific wiring between distinct neuronal types with particular abstractions that might be ill-founded. For instance, the classical formulations of predictive coding implement separate error and prediction neurons within each layer or area (Fig 1A), though limited evidence supports functionally distinct sub-populations [3,9,10]. Some studies apply a highly constrained one-to-one correspondence between error and prediction neurons within individual cortical regions (Fig 1A), which is a biologically problematic assumption about the cortex [11,12]. Another example is the specific wiring between excitatory and inhibitory neurons to create three functionally different groups for encoding representations, positive, and negative errors in the canonical circuit proposed by [3] (Fig 1B). While these models are built to implement predictive coding principles, it is difficult to argue that these specifically hardwired microcircuits are precisely present in the cortex. The second major limitation is that most models are non-spiking networks which lack biological realism [9,13–15]. This has been mostly due to the lack of a straightforward way to transfer classical rate-based predictive coding to a spiking implementation, i.e., spiking neurons cannot signal negative errors without specific wiring. The difficulties in training spiking neural networks have also hindered efforts in this direction [4]. Additional trade-offs occur between biological fidelity and scalability, which make it difficult to study more complex phenomena in a biological network [9,11,14,15]. The few studies implementing predictive coding in spiking neural networks, like [11], leave a gap in the literature for more biologically realistic network models without specific architectural biases.

Fig 1. Example microcircuits of predictive coding.

(A) Classical predictive coding from [12] with separate error (E) and prediction (P) neurons in each layer. A one-to-one connection is imposed between error and prediction populations. (B) Schematic proposal from [3] with specific wiring between excitatory and inhibitory neurons to encode positive errors (E yellow background), negative errors (E blue background), and representations/predictions (P). Connectivity types between neuron populations were uncategorized. (C) Our architecture with multicompartment neurons in each layer. The rectangle denotes the apical tuft compartment and the triangle denotes the somatic compartment. Fully connected feedforward signals are integrated at the soma (triangle) and feedback signals at the apical tuft (rectangle). In contrast to the hard-wiring approach in (A) and (B), our proposal does not assume the presence of specialised neuron types or circuits.

https://doi.org/10.1371/journal.pcbi.1013112.g001

To build a model without these limitations, we take inspiration from several approaches adopting a gradient optimisation approach to investigate the relationship between more fundamental computational principles and structural-functional properties [16–18]. We argue that if the network exhibits cortical properties after being optimised for a particular objective, it suggests that this objective could also be optimised in the brain and be a driving force in the learning of cortical connectivities. This approach allows us to hypothesize and test the more fundamental computational goals that give rise to neural properties. Ali et al. [16] demonstrated the potential of this approach by showing that energy optimisation in a rate-based recurrent artificial neural network led to the prediction of input at the next time step via inhibition, aligning with the classical formulation of predictive coding. The authors argued that predictive coding can result from self-organisation as the cortex optimises for energy efficiency, extending the connection between predictive coding and energy efficiency [19–23]. Moreover, their findings suggest that predictive coding microcircuits do not have to be hard-wired in the cortex, but can instead be an emergent attribute of a system with some fundamental architectural components in place. Although the conclusions from [16] have limited generalizability due to the use of a single-layer non-spiking network, they provide a new perspective on predictive coding implementation in the cortex based on gradient optimisation.

In this work, we apply the optimisation approach in more detailed and biologically plausible spiking neural networks. Inspired by recent progress on predictive coding in spiking and multi-compartment neurons [8,11,24], in particular the somato-dendritic error mismatch scheme proposed by [25], we create a multi-layer multi-compartment spiking neural network that can be trained in a supervised fashion using gradient optimisation. The question we ask is whether energy optimisation would induce predictive-coding-like behaviour. Following the core notion in [25], we define the energy loss as a function of the voltages in the separate compartments of each spiking neuron in the model. We hypothesise that within a multi-layer network with basic feedforward and feedback connections between areas, an additional ‘internal’ energy loss optimised alongside a task loss will be enough for predictive-coding-like behaviour to emerge. After training, we evaluate two unique properties supporting predictive coding: the models’ capabilities of reconstructing internal representations with top-down expectation signals and their differential responses to expected versus unexpected stimuli. We find that the energy-optimised network is capable of holding internal representations of expected stimuli in the absence of actual input, similar to what was found in the human brain [26,27]. We also qualitatively replicate the empirical results showing differential responses in both apical tuft and somatic voltage of neurons when perceiving expected versus unexpected stimuli [28]. The unique presence of these properties in the energy-optimised model demonstrates that when optimizing for an energy minimization objective, predictive-coding-like behaviour can be learned without pre-specified connectivity. Additional analyses find that network training results in stable internal connectivity despite the possibility of spiking saturation due to positive feedback loops. 
Overall, this work demonstrates that using an optimisation approach in spiking neural networks can inform the underlying computational principles driving the emergence of predictive coding circuits and produce models that match experimental results.

2. Methods

2.1. Neuron and network model

Taking inspiration from previous approaches [8,29–31], we construct a simple multi-compartment spiking neuron model that mimics a pyramidal cell in the cortex (Fig 1C). Each neuron has two compartments: a dendritic compartment representing the apical tuft of a neuron and a somatic compartment. The apical tuft integrates inputs from higher areas in the hierarchically organised network, while the soma directly integrates feed-forward information [32–35]. Voltage in the apical tuft unidirectionally affects the soma potential. As we focus on object classification in visual hierarchical processing, which involves mainly the inter-layer interactions, we omitted the details of basal dendritic sites to arrive at a simple neuronal model where bottom-up inputs are directly integrated into the soma [8,24]. This setup captures some key aspects of the current understanding of cortical connectivity patterns between areas [36].

The neuron model’s spiking mechanism is modelled as in the Adaptive Leaky-Integrate-and-Fire (ALIF) model, a LIF neuron augmented with an adaptive firing threshold [37]. The spiking of a neuron i is a function of the somatic membrane potential and the spiking threshold $b_i(t)$: spikes $\{t_j\}$ from a neuron j are modeled as Dirac delta-functions, $s_j(t) = \sum_{t_j} \delta(t - t_j)$, and a neuron i emits a spike at time $t_i$ when the somatic membrane potential crosses the threshold from below, $v^{s}_{i}(t_i) \geq b_i(t_i)$. Three factors affect the spiking dynamics of each neuron (Fig 2A): the voltage at the apical tuft ($v^{a}_{i}(t)$), the somatic membrane potential ($v^{s}_{i}(t)$), and the adaptive threshold ($b_i(t)$). Adopting the idiom of the deep learning field, we refer to the calculation of neural activity patterns in the network given inputs as “inference”. At each time step of inference, each neuron simultaneously traces the top-down and bottom-up signals in the apical and somatic compartments, respectively. The apical dendritic compartment receives top-down spike-trains from the next layer and its voltage evolves according to:

Fig 2. The energy model optimises for energy compared to control.

(A) Schematic illustration of network architecture with multi-compartment spiking neurons. Only two layers are shown here. Feedforward connections project to the somatic compartment of neurons in the next layer while feedback connections project to the apical tuft dendrite compartment of the previous layer. The voltage at the apical tuft uni-directionally affects the somatic membrane potential. The output neurons are non-spiking membrane potential integrators that determine the predicted class. (B) Test error per epoch for both models. Results from ten models, initialised with different random seeds, for each condition are assessed. (C) Energy per neuron averaged over all samples as the mean absolute voltage difference between soma and apical tuft compartments. All layers in the energy model show lower energy than the control model. (D) Mean spike rate per layer in the energy and control models. Neurons in the energy model spike less across all layers. (E) Absolute values of feedforward and feedback weights. The left panel plots each set of weights separately. The right panel plots all weights and shows that the energy optimisation also results in smaller weights in the energy model. Error bars in all sub-figures denote confidence intervals.

https://doi.org/10.1371/journal.pcbi.1013112.g002

(1) $\tau^{a}_{i}\,\frac{dv^{a,l}_{i}(t)}{dt} = -v^{a,l}_{i}(t) + \sum_{j} W^{\mathrm{fb},l}_{ij}\, s^{l+1}_{j}(t)$

where $v^{a,l}_{i}$ is the apical voltage of the ith neuron in layer l, $\tau^{a}_{i}$ is the time constant for the apical site, and $W^{\mathrm{fb},l}$ are the feedback weights from layer l + 1 to l. The membrane potential at the somatic compartment evolves following

(2) $\tau^{s}_{i}\,\frac{dv^{s,l}_{i}(t)}{dt} = -v^{s,l}_{i}(t) + \sum_{j} W^{\mathrm{ff},l}_{ij}\, s^{l-1}_{j}(t) + f\!\big(v^{a,l}_{i}(t)\big) - b_{i}(t)\, s^{l}_{i}(t)$

where $v^{s,l}_{i}$ is the somatic membrane potential of the ith neuron in layer l, $\tau^{s}_{i}$ is the corresponding time constant, and $b_i(t)$ is the adapted spiking threshold at time t. The somatic compartment voltage is directly influenced by the feed-forward signal, in the form of spikes from the previous layer weighted by the feed-forward weights ($W^{\mathrm{ff},l}$) from layer l–1 to l, and by the voltage at the apical tuft. We let the strength at which the voltage from the apical tuft ($v^{a,l}_{i}$) drives the soma be determined by a shifted sigmoid function f(x), defined as:

(3) $f(x) = \frac{1}{1 + e^{-x}} - \frac{1}{2}$

Inspired by [17], this function bounds the influence from the apical tuft at each time step for both positive and negative voltage ranges. The unidirectional influence from the apical tuft to the soma means that over time only the somatic compartment integrates both sources of input, from the lower and higher hierarchical areas.

Whether an ALIF neuron spikes at a given time step is additionally dependent on the adaptive spiking threshold bi(t), which is determined by:

(4) $b_{i}(t) = b_{0} + \beta\, \hat{b}_{i}(t)$

where $b_0$ is the baseline threshold, $\hat{b}_i(t)$ is the adaptive contribution term, and $\beta$ is a constant (default value 1.8) that determines the size of the threshold adaptation. The adaptive contribution to the spiking threshold of each neuron evolves following:

(5) $\tau^{b}_{i}\,\frac{d\hat{b}_{i}(t)}{dt} = -\hat{b}_{i}(t) + s_{i}(t)$

where $\tau^{b}_{i}$ is the time constant that determines the decay rate of $\hat{b}_i(t)$. Whenever a neuron receives sufficient bottom-up and top-down input such that a spike is emitted, the increase in $\hat{b}_i$ raises the spiking threshold, making the neuron less likely to spike again at the next time step. After spiking, the somatic potential undergoes a soft reset through a spike-triggered refractory response of size $b_i(t)$ (Eq 2), retaining the amount of potential that exceeds the threshold within the discrete time step. Overall, the spiking dynamics of each neuron in the network are determined by a combination of feed-forward and feedback inputs, an adaptive spiking threshold, and the time constants that control the decay rates of each dynamical variable of the neuron.

The studied network architecture is composed of three layers of multi-compartment spiking neuron models (L1, L2, L3) (Fig 2A). In each layer, the neurons receive spiking input from both the lower and higher layers via fully connected weights; a bias is implemented for each neuron as a constant current injection through a trainable weight, $W^{b}_{i}$, as in [42]. The output layer comprises non-spiking leaky neurons that integrate inputs through membrane potentials following

(6) $\tau_{\mathrm{mem}}\,\frac{dv^{\mathrm{out}}_{k}(t)}{dt} = -v^{\mathrm{out}}_{k}(t) + \sum_{j} W^{\mathrm{out}}_{kj}\, s^{L3}_{j}(t)$

where $v^{\mathrm{out}}_{k}$ is the membrane potential of one output neuron, $\tau_{\mathrm{mem}}$ is the time constant, and $s^{L3}(t)$ is the spike-train from L3. Due to the non-spiking nature of these output neurons, we first L2-normalise their membrane potentials before passing them as directly injected currents through the feedback weights from the output layer to layer 3 (Fig 2A). Overall, the network can be seen as a fully connected network with feedforward and feedback connections, with internal recurrence within the dynamics of each neuron. Inputs are injected at each timestep as a constant spike-train proportional to the intensity of the input value, as in [38]. Training adjusts all weights W as well as all decay constants $\tau$, the latter specific to each individual neuron. Eqs (1), (2), (5), and (6) are integrated using the Euler forward method; see S1 Text.
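The per-neuron dynamics above can be condensed into a single Euler-forward update step. The sketch below is an illustrative NumPy reimplementation rather than the paper's PyTorch code; the exact form of the shifted sigmoid f and the placement of the spike increment in Eq (5) are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def alif_step(v_s, v_a, b_hat, s_ff, s_fb, W_ff, W_fb,
              tau_s=4.0, tau_a=4.0, tau_b=20.0,
              b0=0.1, beta=1.8, dt=1.0):
    """One Euler-forward step of the two-compartment ALIF dynamics (Eqs 1-5).

    v_s, v_a, b_hat: per-neuron state vectors (soma, apical tuft, adaptation);
    s_ff, s_fb: binary spike vectors from the layers below and above.
    """
    # Eq 1: apical tuft integrates top-down spikes from layer l+1
    v_a = v_a + (dt / tau_a) * (-v_a + s_fb @ W_fb)
    # Eq 3: bounded apical drive onto the soma (shifted sigmoid, f(0) = 0)
    f_a = sigmoid(v_a) - 0.5
    # Eq 4: adaptive spiking threshold
    b = b0 + beta * b_hat
    # Eq 2: soma integrates bottom-up spikes plus the apical drive
    v_s = v_s + (dt / tau_s) * (-v_s + s_ff @ W_ff + f_a)
    # spike when the somatic potential reaches the threshold
    spikes = (v_s >= b).astype(float)
    # soft reset: subtract the threshold, keep the excess voltage
    v_s = v_s - b * spikes
    # Eq 5: threshold adaptation decays and increments on spikes
    b_hat = b_hat + (dt / tau_b) * (-b_hat) + spikes
    return v_s, v_a, b_hat, spikes
```

Running this step repeatedly over T timesteps, with the input layer receiving the constant input spike-train, reproduces the inference loop described above.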

2.2. Training and task

We implement supervised training of the networks, both with and without an energy-loss term, to investigate whether predictive coding properties can arise from energy optimisation. The training process combines the online learning algorithm Forward Propagation Through Time (FPTT) with surrogate gradients, which enables end-to-end optimisation using gradient descent within the PyTorch auto-differentiation framework [38–40,42]. The FPTT algorithm [40], which enables training of complex spiking neural networks on classification tasks [38], allows updates of parameters at each timestep or every K timesteps (K-step updates) during the sequence. We apply K-step = 10 updates during training, as we found this empirically yielded the best results. Unlike the more standard Backpropagation Through Time (BPTT) algorithm, where parameters are updated once at the end of each sequence, FPTT achieves online learning through immediate updates to network parameters by optimizing a dynamic regularizer in addition to the task-relevant loss [40]. As we show in our results, FPTT resulted in better learning of feedback weights in the energy models than classical BPTT. For the surrogate gradient, we apply the Multi-Gaussian surrogate gradient introduced in [42], which was shown to consistently outperform other surrogate gradients.
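A minimal sketch of the K-step update scheme, using a scalar toy loss in place of the network's global loss; the proximal regularizer and running-average anchor follow the spirit of the FPTT formulation in [40] but simplify its exact form:

```python
import numpy as np

def fptt_train(w0, grad_loss, n_steps, K=10, lr=0.1, alpha=0.5):
    """Online K-step training in the spirit of FPTT: every K timesteps,
    take a gradient step on the instantaneous loss plus a proximal
    regularizer (alpha/2)*||w - w_bar||^2 anchored at a running
    parameter average, then update the anchor."""
    w, w_bar = w0, w0
    for t in range(1, n_steps + 1):
        if t % K == 0:  # K-step update during the sequence
            # gradient of loss + dynamic proximal regularizer
            g = grad_loss(w) + alpha * (w - w_bar)
            w = w - lr * g
            # running-average anchor, nudged along the new gradient
            w_bar = 0.5 * (w_bar + w) - (lr / 2.0) * grad_loss(w)
    return w

# Toy usage: minimise (w - 3)^2 online with K-step = 10 updates.
w_star = fptt_train(0.0, lambda w: 2.0 * (w - 3.0), n_steps=300, K=10)
```

In the actual networks, `grad_loss` is replaced by autograd over the loss of Eq (8) accumulated over the last K timesteps.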

At each update step during training, the parameters are optimised with respect to a global loss, which contains a task-relevant loss and the dynamic FPTT regularizer. In the energy optimization condition, an energy term is added to the global loss function as an additional regularizer to be optimised. Within our multi-compartment neuronal model, we define an energy term using a function g of the apical tuft and soma compartment voltages at the time of update:

(7) $\mathcal{L}_{\mathrm{energy}} = \frac{1}{N}\sum_{i=1}^{N} g\big(v^{s}_{i}, v^{a}_{i}\big), \qquad g\big(v^{s}_{i}, v^{a}_{i}\big) = \big|v^{s}_{i} - v^{a}_{i}\big|$

where g(·) computes the absolute difference between the two compartment voltages and $\mathcal{L}_{\mathrm{energy}}$ is the average of all outputs of g in the network (N: total number of neurons). Here, the voltages from the different compartments are the same quantities used to compute the membrane potential that determines the spiking dynamics. In [25], such a difference signal is computed via a dedicated inter-neuron projecting the somatic output to the apical tuft; our version is thus a roughly equivalent, more efficient implementation. Alternatively, biological neurons may compute this separate signal g(·) via some biochemical pathway in the neuron diffusing from soma to apical tuft. The signal g(·) can be interpreted either as the electric potential energy local to each neuron, or alternatively as a comparison between the integrated feedforward and feedback signals within each neuron. The overall loss optimised during training at each learning step follows:

(8) $\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \alpha\, \mathcal{R}_{\mathrm{FPTT}} + \lambda\, \mathcal{L}_{\mathrm{energy}}$

where $\mathcal{L}_{\mathrm{task}}$ is the task-related classification loss (Negative Log-Likelihood), $\mathcal{R}_{\mathrm{FPTT}}$ is the dynamic FPTT regularizer, and $\alpha$, $\lambda$ are constant scalars weighting the respective regularizers. An energy-optimised model was trained with $\lambda > 0$ and the control model with $\lambda = 0$. We use the AdamX optimiser [43] and apply dropout as well as weight decay during training to reduce overfitting.
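The energy term of Eq (7) and the global loss of Eq (8) can be sketched as follows; `alpha` and `lam` are placeholder values, not the paper's settings, and the control model corresponds to `lam = 0`:

```python
import numpy as np

def energy_loss(v_soma, v_apical):
    """Eq 7: g(.) is the absolute somato-apical voltage difference;
    the energy term is its average over all N neurons in the network
    (per-layer voltages given as lists of arrays)."""
    diffs = [np.abs(vs - va).ravel() for vs, va in zip(v_soma, v_apical)]
    return np.concatenate(diffs).mean()

def total_loss(task_nll, fptt_reg, v_soma, v_apical, alpha=1.0, lam=0.05):
    """Eq 8: task loss + weighted FPTT regularizer + weighted energy term."""
    return task_nll + alpha * fptt_reg + lam * energy_loss(v_soma, v_apical)
```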

We train the network to perform MNIST handwritten digit classification. The MNIST dataset consists of 60,000 training and 10,000 test samples, which were normalised during preprocessing. The network runs inference for T time steps on each image and is reinitialised between samples. The log-softmax values of the output membrane potentials determine the predicted class. At the beginning of inference for each batch of samples, spiking neurons are initialised with somatic membrane potentials uniformly distributed between 0 and 1. All apical voltages $v^{a}$ and adaptive contributions $\hat{b}$ are set to 0 and $b_0$ to 0.1 at the beginning of each inference (see Tables 1 and 2 for all hyperparameter settings). Network weights were initialised with Xavier initialisation [44] and all bias terms were initialised to 0 prior to training, i.e. $W^{b}_{i} = 0$. Hyperparameters are determined with reference to [42].

Table 2. Initialisation values for hyperparameters of each neuron. All time constants were initialised to have normal distributions centred around the values presented in the table with a standard deviation of 0.1. All output neurons had $\tau_{\mathrm{mem}}$ initialised to be the same constant.

https://doi.org/10.1371/journal.pcbi.1013112.t002

3. Results

3.1. The energy model shows lower inter-compartmental and spiking energy than the control

We initialise ten models for each condition with different random seeds to assess model performance. After training, models of both conditions achieve good performance on the MNIST classification test set, with the energy models reaching 97.59% accuracy (Fig 2B). We find that the decrease in accuracy for the energy networks is gradual as a function of the strength of the energy regularization; that is, larger $\lambda$ settings result in fewer spikes and lower accuracy. This finding is consistent with the literature [41], and we selected a value for $\lambda$ that still resulted in high accuracy.

One energy-optimised model and one control model trained with FPTT are randomly selected for the subsequent analyses, where the control model was studied at equal accuracy as the energy model by selecting an accuracy-matching earlier check-point – all findings also held up when using the fully trained control model.

We first validate that the energy model indeed consumes less energy than the control model (Fig 2). We assess this using two key metrics: the energy $\mathcal{L}_{\mathrm{energy}}$ computed per neuron across samples, and the average spike rate of each layer per sample. During inference on the test set, which we run for T time steps, the mean energy for each layer in the energy model is lower than its counterpart in the control model and consistently stabilises at a value below the initial level, indicating that the additional energy loss successfully induced energy optimisation in the network (Fig 2C). We then compute the mean spike rate per layer in response to each sample in both models: we find that the energy-optimized model emits fewer spikes than the control model (Fig 2D). This can be attributed to the significantly lower mean absolute weights of the feedforward connections, akin to synaptic transmissions, which result in smaller contributions to the overall energy consumption of the energy model (Fig 2E). We also see here that overall the energy model has smaller weights than the control model. Having adaptive thresholds was critical: removing adaptation by setting $\beta = 0$ in Eq (4) resulted in poor accuracy and significantly increased firing rates, both for the control model and in particular for the energy model (Supporting Information S4 Fig). The trained time constants of the neurons do not contribute to the differences in energy consumption, as the distributions are similar across both models (Supporting Information, S1 Fig). These findings demonstrate that by minimizing the inter-compartmental voltage difference, a measure of the electrical potential energy within each neuron, we concurrently achieve reduced spiking and synaptic transmission, the two main sources of neuronal energy consumption [45]. In particular, this also establishes the voltage difference as a valid proxy for energy consumption in these models.
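Both metrics can be computed directly from the recorded spike trains and compartment voltages; a minimal sketch, where the (T, n_neurons) array layout is an illustrative choice:

```python
import numpy as np

def layer_metrics(spikes, v_s, v_a):
    """Per-layer summary metrics compared in Fig 2C-D.

    spikes: (T, n_neurons) binary spike raster for one layer and sample;
    v_s, v_a: (T, n_neurons) somatic and apical voltages.
    Returns (mean spike rate per neuron per timestep,
             mean inter-compartment energy |v_s - v_a|)."""
    mean_rate = spikes.mean()
    mean_energy = np.abs(v_s - v_a).mean()
    return mean_rate, mean_energy
```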

3.2. Only the energy model can reconstruct internal representations given top-down signals

We next ask whether the energy-trained model can generate internal representations with occluded or no inputs. Not only is the brain able to imagine visual objects, experiments have also shown that the retinotopic areas where visual input is occluded within a larger image contain information about the image, which could be explained by the activation of those areas due to top-down projections carrying predictions or context given the non-occluded parts of the visual stimuli [26]. We conduct a similar experiment on the trained networks to see whether we could replicate this result. To decode from the spiking representations, we first train a linear decoder to reconstruct the test sample from the spiking pattern (vector containing the average spikes per neuron across inference time) in a particular layer (Fig 3A). The decoder is trained to minimise MSE loss between the projected image and the actual test sample via gradient descent (using the Adam optimiser) over 20 epochs. The error curves of decoder training for both models (Fig 3B) demonstrate that the linear decoder successfully converged when fitting to the training data. One decoder is trained for each layer from each model and used to decode what information the internal representations of the networks contain.
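A self-contained sketch of the decoder fit; the paper trains with the Adam optimiser over 20 epochs, while plain gradient descent on the same MSE objective is used here to keep the example minimal:

```python
import numpy as np

def train_decoder(rates, images, lr=0.5, epochs=200):
    """Fit a linear map from per-neuron average spike rates to pixel
    intensities by gradient descent on the MSE loss.

    rates: (n_samples, n_neurons) average spikes per neuron over inference;
    images: (n_samples, n_pixels) flattened target images."""
    n_samples, n_neurons = rates.shape
    W = np.zeros((n_neurons, images.shape[1]))
    b = np.zeros(images.shape[1])
    for _ in range(epochs):
        err = rates @ W + b - images          # prediction error
        W -= lr * rates.T @ err / n_samples   # MSE gradient w.r.t. W
        b -= lr * err.mean(axis=0)            # MSE gradient w.r.t. b
    return W, b
```

One such decoder is fitted per layer and per model, then applied to the spike-rate patterns recorded under clamping to visualise the internal representations.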

Fig 3. Reconstructive capacity of the energy model.

(A) Illustration of the decoding setup. A linear decoder is trained to reconstruct the original image from the spike rate representation of one layer along T steps. An example heat map of the spike rate over T time steps is shown on the left. On the right is an example projected image from the spiking representation. (B) MSE loss during training of decoders for L2. Both decoders were able to fit the training set well. (C) Comparison of network inference with correct class clamping. The energy model can fully reconstruct the digit while the control model does not perform meaningful reconstruction of internal representations. (D) Decoded internal representations with class clamping absent input. Below each decoded digit, the top 3 classification as obtained from the respective models given these digits as input. Only the energy model reconstructs class-specific internal representations. Clamped representations from the control model are indistinguishable between classes. (E) Pair-wise representational similarity of clamped vs normal representations in the energy and control models. A clear class-specific representational structure is present in the energy model while absent in the control model.

https://doi.org/10.1371/journal.pcbi.1013112.g003

We first test the networks with a half-occluded image randomly sampled from a class (e.g. the digit ‘3’ in Fig 3C), with the correct class clamped in the output layer to mimic top-down predictive projections from processing areas downstream of the visual cortex. During clamping, the membrane potentials of the output layer are fixed to the same vector throughout inference on one sample, where the membrane potential of the output neuron for the intended class is set to 1 and the others to -1, modelling the perception of a partially occluded image with an internal expectation of the image class. In both the occluded and no-input conditions, models are given 5T steps for inference to compensate for the reduced inputs and to leave sufficient time for top-down projections to take effect. As shown in Fig 3C, with the correct class clamping, the energy model’s internal representation from L2 is able to fill in the occluded parts while the control model does not perform meaningful reconstruction. Presenting the correct class clamping induces the energy-optimised network to reconstruct the intended image ‘3’. Notably, if a uniformly distributed noise vector is used to clamp the output neurons, the energy model reconstructs different digits in the internal representations with repeated sampling of the noise (Fig 4). This demonstrates that internal representations in the energy model differ depending on the prior when the input is ambiguous.
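The clamping vector can be sketched as follows; the function name is illustrative, and replacing the one-hot target with a uniform-noise vector gives the ambiguous-prior condition of Fig 4:

```python
import numpy as np

def clamp_outputs(target_class, n_classes=10):
    """Builds the vector held fixed at the output layer during inference:
    +1 for the expected class, -1 for all others. During clamped inference
    this vector replaces the output membrane potentials at every timestep."""
    v = -np.ones(n_classes)
    v[target_class] = 1.0
    return v
```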

Fig 4. Decoded clamped representation with ambiguous occluded input and noise vector clamping.

The same occluded sample ‘3’ from Fig 3C and the same decoding scheme were used to decode representations from L3 of the energy model. A vector of uniform noise was used to clamp the output neurons during inference. The figure shows digits generated from 20 samples of random noise. The decoded images show that, depending on the noisy prior, the energy model internally represents different digit classes (e.g. ‘0’, ‘3’, ‘8’; the inset number denotes the digit each sample is classified as when presented as input).

https://doi.org/10.1371/journal.pcbi.1013112.g004

We next test the models’ capability to reconstruct without any input (pixel values equal to 0) and with only clamping. The same clamping and decoding methods as described above are used on internal representations from the models over 5T time steps of inference with class clamping. We find that only the energy model’s spiking representations can be decoded into digits, while those in the control model are indistinguishable between classes (Fig 3D). This is further verified by a Representational Similarity Analysis (RSA) [46] of the per-class representations of the networks in the normal inference condition (with input) and the clamped condition (no input) (Fig 3E). We compute the normal representations for each class by averaging the spike rate patterns of each layer over all samples of that class. The clamped representations are taken as the spike rate pattern per layer given a clamped class. The pair-wise similarities are computed as 1 minus the cosine distance between normal and clamped representations per class. As shown in Fig 3E, the clamped representations in the energy model show a clear class-specific structure, where each clamped representation is most similar to the normal representation of the corresponding class; this pattern is not observed in the control model. After grouping pair-wise similarities into same-class and different-class similarities across layers, the results further confirm that the clamped representations in the control model do not contain any class-specific information (Supporting Information S2 Fig). All these results indicate that only the energy model is capable of reconstructing, and thus predicting, the inputs when top-down signals are provided as a prior for disambiguating or imagining the inputs. The energy regularizer induced effective learning of feedback weights such that representations in a higher layer could spatially predict the bottom-up signals received by the lower layer.
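The RSA computation reduces to a matrix of cosine similarities between clamped and normal per-class spike-rate representations; a minimal sketch:

```python
import numpy as np

def rsa_similarity(normal_reps, clamped_reps):
    """Pair-wise similarity (1 minus cosine distance) between clamped and
    normal per-class representations, as in Fig 3E. Rows index the clamped
    class, columns the normal class; a class-specific structure shows up
    as a dominant diagonal."""
    def cos_sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return np.array([[cos_sim(c, m) for m in normal_reps]
                     for c in clamped_reps])
```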

3.3. Neurons in the energy model respond differentially to expected vs unexpected stimuli

One neural phenomenon at the foundation of predictive coding is that neurons respond differentially to expected versus unexpected stimuli [28,47]. We thus ask whether the energy model exhibits such properties. To evaluate this, we designed a match/mismatch experiment that simulates scenarios of expected versus unexpected stimuli for the trained networks, thereby probing whether the neuronal responses within the energy model vary across these conditions (see Fig 5A and Fig 5B). In this experiment, the models initially receive no stimuli. Upon stimulus onset, an image sample is introduced as normal, accompanied by clamping at the output neurons. This clamping corresponds to either the actual class of the image (representing the match or expected condition) or an incorrect class (representing the mismatch or unexpected condition) (Fig 5A). The membrane potential of the output neuron linked to the clamped class is set to 1, while those of all other output neurons are set to -1: by clamping the top-down information in the network, we create an environment wherein the information relayed from higher hierarchical areas of the brain either corroborates or contradicts the bottom-up input. The presentation of the stimulus and the associated clamping is followed by an additional phase of zero input, marking the conclusion of the inference process (Fig 5B). Both the class of the presented stimulus and the clamped class are selected at random.

thumbnail
Fig 5. The energy model responds more differentially to unexpected stimuli than the control model.

(A, B) Illustration of the match/mismatch experiment. Either the correct or a wrong class of output neuron is clamped at the output layer. The models are given T time steps for inference, with no inputs on either end of stimulus presentation (“stim pres”). (C, D, E) Distributions of apical tuft compartment voltage difference, soma voltage difference, and spike rate difference between conditions. All differences are computed over the time steps during stimulus presentation. Across all three metrics, the energy model shows a more drastic response difference between expected and unexpected stimuli than the control model (Supporting Information S1 Table). (F) Difference in spike rate between experimental conditions across layers. (G) Examples of voltage trajectories in the apical tuft and soma compartments from single neurons in layer 2 of the energy model with different MSD. (H) MSD responses decrease more strongly for the energy model than for the control model as a function of decreased top-down clamping, for both somatic and tuft responses.

https://doi.org/10.1371/journal.pcbi.1013112.g005

To compare the extent of differential response to expected and unexpected stimuli in the energy and control models, we compute a Mean Signed Difference (MSD) in the voltage signals between the match and mismatch conditions in each compartment (apical tuft and soma) for each neuron within one layer during stimulus presentation:

$$\mathrm{MSD}_i = \frac{1}{T_{\mathrm{stim}}} \sum_{t \in \mathrm{stim}} \left( V_i^{\mathrm{ma}}(t) - V_i^{\mathrm{mm}}(t) \right), \qquad i = 1, \dots, n_l \qquad (9)$$

where $n_l$ denotes the size of layer $l$, $T_{\mathrm{stim}}$ the number of time steps during stimulus presentation, and $V^{\mathrm{ma}}$ and $V^{\mathrm{mm}}$ denote the potential (either apical or somatic) in the match and mismatch conditions, respectively.
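Assuming the voltage traces are stored as (time, neuron) arrays, the per-neuron MSD amounts to a time average of the match-minus-mismatch voltage difference over the stimulus window; a minimal sketch (array layout and names are our assumptions, not from the paper's code):

```python
import numpy as np

def mean_signed_difference(v_match, v_mismatch, stim_slice):
    """Per-neuron Mean Signed Difference between match and mismatch
    voltage traces over the stimulus-presentation window.

    v_match, v_mismatch: arrays of shape (n_timesteps, n_neurons)
    stim_slice: slice selecting the stimulus-presentation time steps
    """
    diff = v_match[stim_slice] - v_mismatch[stim_slice]
    return diff.mean(axis=0)  # average over time; one MSD per neuron
```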

Comparing the distributions of these MSDs in each compartment of the energy and control models, we observe that significantly more neurons in L2 of the energy model have a large MSD between conditions in their voltage traces than in the control model, indicating that the energy model exhibits a greater differential response to unexpected stimuli (Fig 5C and Fig 5D). Example voltage traces of neurons with different MSD values are presented in Fig 5G, where we observe diverging voltage values during stimulus presentation in a subset of neurons. This is also reflected in the per-neuron differences in spike rate between conditions during stimulus presentation in the two models (Fig 5E). Kruskal-Wallis tests on the distributions of MSD in voltage traces and on the spike-rate differences all yielded significant differences (Supporting Information S1 Table). We proceed to compute the per-neuron spike-rate difference across all three layers (Fig 5F). This reveals that L3 in the energy model, the highest in the processing hierarchy, displays a markedly more pronounced divergence in spiking responses between conditions relative to the lower layers. This could be attributed to the hierarchical nature of our model, wherein the upper layers are primarily driven by top-down signals, while lower layers are chiefly influenced by inputs [33]. We remark that this suggests a novel prediction that can be validated through neural recordings from different cortical areas in match/mismatch experimental paradigms.
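The statistical comparison of the two MSD distributions can be sketched with `scipy.stats.kruskal`; the wrapper and variable names are illustrative:

```python
from scipy.stats import kruskal

def compare_msd_distributions(msd_energy, msd_control, alpha=0.05):
    """Kruskal-Wallis test on per-neuron MSD distributions from the
    energy and control models. Returns the H statistic, the p-value,
    and whether the difference is significant at level alpha."""
    h, p = kruskal(msd_energy, msd_control)
    return h, p, p < alpha
```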

The MSD measure also enables us to predict the effect of a purported increase in classification uncertainty: we decrease the amount of top-down clamping of the target class during stimulus presentation, while distributing this decrease equally as increased activation of the other classes. As shown in Fig 5H, this manipulation results in a smooth and substantial decrease of the mismatch difference in the energy model, whereas the control model barely changes.
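A hypothetical sketch of this uncertainty manipulation: the clamp on the target class is reduced and the deficit is spread equally over the other classes. The +1/-1 baseline follows the clamping scheme described above; the exact redistribution rule is our illustrative reading:

```python
import numpy as np

def clamp_vector(n_classes, target, strength=1.0):
    """Top-down clamp over the output neurons: the target class is
    clamped to +strength, and the deficit (1 - strength) is spread
    equally over the other classes, which otherwise sit at -1."""
    v = np.full(n_classes, -1.0)
    v[target] = strength
    deficit = (1.0 - strength) / (n_classes - 1)
    v[np.arange(n_classes) != target] += deficit
    return v
```

With `strength=1.0` this reproduces the original +1/-1 clamping; lowering `strength` keeps the total clamp mass constant while making the top-down signal less certain.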

In all, these findings show that the energy model successfully replicated the experimental outcomes delineated in [28] while making specific predictions for novel manipulations like classification uncertainty. At the same time, such properties were notably absent in the control model. We thus infer that the observed predictive coding properties of the network, the distinct response to expected and unexpected stimuli, can be attributed to the energy optimization in the energy model.

3.4. The internal connectivity is stable in both models

We next ask whether the trained networks are stable. Given that the energy model is optimised for matching bottom-up and top-down projections, a neuron might end up receiving both excitatory feed-forward and excitatory feedback inputs in a potentially positive feedback loop, leading to over-excitation that would result in unstable firing (e.g., saturation of spiking, Fig 6A). To confirm the stability of the models, we vary the amount of current input into the networks by adding or subtracting pixel values from the preprocessed images. Subtracting pixel values increases the currents into the neurons due to the negative weights associated with the negative pixel values (Supporting Information S3 Fig). We find that overall the spike rates of neurons in both models respond roughly linearly to variations in the average current input into the neurons: the larger the positive currents, the higher the resulting firing rates (Fig 6B). This is not due to saturation of spiking, as most neurons respond in a graded fashion with increased spike rates. In Fig 6C, we plot, for 20 randomly selected neurons in L1, the slope of the change in firing rate as a function of the modulation of the input. We applied 5 different levels of input modulation and measured the change in firing rate at each level. We used linear regression to fit this change to a linear response model, obtaining excellent fits for all neurons; Fig 6C plots the slope for each of the 20 neurons. The energy model's response also varies less than the control model's for the same amount of input manipulation, due to smaller input weights in the network, which could also explain its smaller deterioration in test accuracy with different input intensities (Supporting Information S3 Fig). Overall, these results demonstrate that the trained models are indeed stable and reflect the intensity of inputs through their spike rates.
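The slope estimate dS/dA can be sketched as an ordinary least-squares fit of spike rate against input modulation; the function, data layout, and goodness-of-fit reporting are illustrative, not the paper's code:

```python
import numpy as np

def rate_slope(modulations, rates):
    """Fit rate = slope * modulation + intercept by least squares and
    return the slope dS/dA together with the R^2 of the fit.

    modulations: input-modulation levels (e.g. 5 values)
    rates: measured spike rates of one neuron at each level
    """
    modulations = np.asarray(modulations, dtype=float)
    rates = np.asarray(rates, dtype=float)
    slope, intercept = np.polyfit(modulations, rates, 1)
    pred = slope * modulations + intercept
    ss_res = np.sum((rates - pred) ** 2)
    ss_tot = np.sum((rates - rates.mean()) ** 2)
    return slope, 1.0 - ss_res / ss_tot
```

A neuron with a graded response yields a clearly positive slope and an R^2 close to 1, matching the stable regime illustrated in Fig 6C.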

thumbnail
Fig 6. Model stability.

(A) Illustration of stable and unstable networks. The spiking neurons in the network emit spikes at respective spike rates given inputs at baseline intensity. In a stable network, most neurons respond with a higher spike rate with stronger input, resulting in a higher overall spike rate in the network. In an unstable network, the few neurons that are linked in positive feedback loops via feedforward and feedback weights either barely spike or show saturated spiking at every time step with stronger input. The colour of neurons in the illustration indicates their spike rate. Neurons with the darkest blue have saturated spiking. (B) The average spike rate of all neurons in the model per sample in relation to normalised average input into each neuron in L1. (C) Slope for the spike rate as a function of input modulation (dS/dA) for 20 individually sampled neurons in L1 of the energy model. Most sampled neurons show a graded response in spike rate to input pixel manipulation.

https://doi.org/10.1371/journal.pcbi.1013112.g006

3.5. FPTT results in more effective learning of feedback weights in the energy model than BPTT

Finally, we investigate whether the distinct temporal credit assignment mechanisms of FPTT and BPTT lead to any substantial differences in the properties of trained networks. We train an additional energy model using BPTT and contrast its reconstructive capabilities with the FPTT-trained energy model. To quantify the reconstructive quality, we computed the cosine distance between the decoded images from spiking representations and the mean pixel values of images from each class. We find that the BPTT-trained energy model can internally represent different digit classes with no input and only top-down clamping, yet the quality of reconstruction from each layer is consistently lower than that from the FPTT-trained energy model (Fig 7B and Fig 7C). The reconstruction quality is particularly worse in L1 of the BPTT-trained energy model, indicating degradation of temporal credit assignment in BPTT towards lower areas in the network processing hierarchy. The class structure in the clamped representations of the BPTT-trained energy model is also less pronounced (Fig 7D) than in the FPTT-trained energy model. Overall, given that reconstruction relies mainly on feedback projections between adjacent layers, these results demonstrate the less effective learning of feedback weights in the BPTT-trained energy model. This suggests that temporal locality might be crucial for the credit assignment of feedback weights in a hierarchically organised network.
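A minimal sketch of the reconstruction-quality metric used here: the cosine distance between decoded images and per-class mean images (the array shapes and names are our assumptions):

```python
import numpy as np

def reconstruction_distance(decoded, class_means):
    """Cosine distance between decoded images and per-class mean images;
    lower values indicate a more faithful reconstruction.

    decoded, class_means: arrays of shape (n_classes, n_pixels)
    """
    d = decoded / np.linalg.norm(decoded, axis=1, keepdims=True)
    m = class_means / np.linalg.norm(class_means, axis=1, keepdims=True)
    return 1.0 - np.sum(d * m, axis=1)  # one distance per class
```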

thumbnail
Fig 7. Reconstructive capacities of BPTT- vs FPTT-trained energy models.

(A) Test errors from BPTT-trained and FPTT-trained energy models. The error rate is 2.19% for the BPTT model and 2.49% for the FPTT model. (B) Quality of reconstructed internal representations in BPTT- vs FPTT-trained energy models. The cosine distances are computed between the mean pixel values of each class and the images decoded from the internal clamped spiking representations of the respective models. Across layers, the clamped class representations in the BPTT-trained energy model are more different from the class mean image. (C) Decoded images from internal spiking representations in L2 of BPTT-trained and FPTT-trained energy models. The quality of decoded images from the BPTT-trained energy model is lower than those of the FPTT-trained energy model (FPTT decoded images repeated from Fig 3D for comparison). (D) Pair-wise representational similarity of clamped vs normal representations in BPTT-trained and FPTT-trained energy models. The class-specific representational structure in the BPTT-trained energy model is less pronounced than that in the FPTT-trained energy model.

https://doi.org/10.1371/journal.pcbi.1013112.g007

4. Discussion

We demonstrate that energy optimisation, formulated as a function of voltages in different compartments within each neuron in a spiking neural network, is a potential computational principle for driving cortical learning to produce predictive-coding properties. Our energy loss, which can be interpreted as the electrical potential energy within each neuron, proves to be an adequate proxy for energy consumption as it encourages reduced spiking and synaptic weights in the energy model. Given a main classification objective, the spiking network trained with an additional energy loss is able to learn feedback weights that predict the bottom-up inputs. The resulting energy model also replicates experimental findings supporting the theory of predictive coding, such as the reconstruction of inputs with only top-down feedback [26] and differential responses to expected versus unexpected stimuli [28], where our model additionally predicts that increasing classification uncertainty is associated with decreased differential responses. These predictive coding properties are observed only in the energy model, thus providing support for the hypothesis of energy efficiency underlying the emergence of predictive-coding-like behaviour in the cortex. The weights are successfully trained to produce stable internal connectivity in the network despite the potential for positive feedback loops in the network. Additionally, our results indicate that feedback weights in the FPTT-trained energy model are learned more effectively than those in a corresponding BPTT-trained energy model, demonstrating the benefit of choosing a learning algorithm with temporal locality in its credit assignment.

Our results yield two notable implications. First, our network communicates predictions between layers, as opposed to errors. Since the matching of bottom-up and top-down signals is carried out implicitly within individual neurons, this dispenses with the need for discrete error and prediction neurons within each area. Second, the feedback weights essentially serve to reverse the feature extraction operations performed by the feed-forward weights. This aligns with the hierarchical predictive coding scheme, where abstract representations from higher cortical areas must be translated into more sample-specific representations within earlier cortical areas, acting as a form of prediction. In our network configuration, this translation is accomplished directly by the feedback weights linking prediction neurons between layers, rather than linking prediction to error neurons. While our network model does not mirror the specific laminar organisation of the cortex, it offers an alternative perspective on the transfer of information between cortical areas. This direct mapping from higher to lower hierarchical representations may elucidate how non-stimulated cortical areas retain information about the visual context and mental imagery [26,27,48].

Our formulation of energy loss is closely related to other studies on energy and neural computation, such as [25]. We design the energy loss to be the absolute difference between the somatic and apical tuft compartment voltages, which essentially represents the difference between the bottom-up and top-down signals received by a single neuron. By minimising this term, which captures the electric potential energy within each neuron, the network learns to match representations across layers and also optimises energy consumption both in terms of spiking and synaptic transmission. While this differs from some other approaches (e.g., the Free Energy Principle [2], which centres around thermodynamic energy), many types of energy are involved and interchangeable during the metabolic processes of a neuron. Including energy minimization between top-down and bottom-up representations also implies that the network optimizes the somewhat conflicting objectives of learning top-down class-generic representations and bottom-up sample-specific representations. This conflict is the likely cause of accuracy decreasing as a function of the strength of the energy loss. Our empirical results on the overall reduction in spiking and synaptic transmission in the energy model call for a more in-depth mathematical analysis of how our formulation of energy relates to others.

Our results diverge from the findings in [16], which asked a similar question of whether energy optimisation gives rise to predictive coding using a different setup and classical rate-based artificial neurons. Ali et al. [16] optimised the pre-activation of ReLU units as energy in a one-layer recurrent network performing inference on predictable sequences. As a result, they found that units in the recurrent layer self-organised into separate prediction and error neurons and that prediction occurred as within-layer inhibition countering the excitatory inputs. This differs from our results, which showed that top-down predictions were present as excitatory signals, with feedback weights creating a direct mapping of predictions from higher to lower layers.

The disparities between these findings and conclusions predominantly stem from differences in the conceptual frameworks, the setups of the network model and tasks, as well as the chosen definitions of energy loss. First, [16] studied unsupervised temporal prediction in a discrete-time rate-coded recurrent network, a problem that becomes fundamentally different in spiking neural networks operating over continuous time. While the rate-coded network simply switches to a new prediction at every discrete time step, the spiking neural network would have to hold the current prediction over a time interval until it is time to switch, turning temporal sequence prediction into a nontrivial decision-making problem (maintain vs switch) at every moment in continuous time (see e.g. [49] for modelling of continuous decision-making in the basal ganglia for action selection). This fundamental difference between the networks makes it difficult to draw a direct, meaningful comparison between the findings of the two studies. Combining learning with decision-making could, however, be a way to study dynamic datasets in future work. Second, we are interested in using multiple processing layers to model visual hierarchical processing, whereas [16] focused on self-organisation within one recurrent layer in a temporal prediction problem. The distinct energy definitions are also more meaningful in their respective network contexts. In these separate setups, the findings regarding whether the prediction signals should be excitatory or inhibitory are optimal for each system: within-layer inhibitory recurrent drive minimises preactivation as energy, and top-down excitatory projection that matches bottom-up input minimises intercompartment voltage difference as energy. It is possible that these processes coexist in the cortex, just as [8] argued that dedicated error neurons could exist together with the dendritic implementation of predictive coding.
While the visual ventral pathway could employ the mechanisms in our model to link abstract and sample-specific representations along the visual areas, cortical areas responsible for sequence learning could have inhibitory temporal predictions as shown in [16], implemented with additional mechanisms to solve the problem in continuous time. Prospective investigations might consider using our multi-compartment, multi-layer spiking network configuration for a temporal prediction task analogous to that in [16] to ascertain whether congruent or different outcomes are achieved. Empirical evidence regarding the existence of specialized error neurons within specific sensory processing pathways would ultimately help determine which model offers a better mechanistic explanation for various types of sensory processing in the cortex.

Our work was inspired by the recent studies of dendritic predictive coding, yet differs in one subtle way. Existing proposals of dendritic error computation implement algorithms such that the error value is explicitly encoded by the voltage of the apical dendritic compartment and used to guide local voltage-dependent plasticity rules [8,24,29]. In [24], this was achieved by wiring up specific interneurons to dendritic sites of pyramidal neurons. There is some experimental evidence supporting the involvement of inhibitory interneurons in producing prediction errors and even gating the plasticity of feedforward synapses [50–52]. In our model, the error value is represented implicitly, computed as the difference in voltage between compartments. A natural question is thus how neurons can utilise this internal value for learning at the synapse. The empirical literature that directly examines this phenomenon is relatively sparse, though [53] presented some experimental evidence supporting the presence of implicit error information within each neuron. One possibility is that, since the membrane potential that determines neuronal spiking is a non-linear summation of the voltages of the two compartments, biological neurons could compute another signal from these voltages as a representation of their internal electric potential energy to drive synaptic plasticity. Our energy loss, which calculates the absolute difference in voltages between distinct compartments in spiking neurons, models this computation, which could potentially be carried out by specific biochemical pathways. This energy loss, unlike a simple spike-counting loss, may be critical for the presence of predictive coding features in the energy model, because its implicit information about the mismatch between feedforward and feedback signals could be the driving force behind the learning of top-down weights for predictions.
Our model thus offers a new perspective on the potential relevance of this internal energy term in synaptic plasticity. Alternatively, as noted earlier, we can consider our multi-compartment neuron as an abstraction of a small circuit where the somatic output is signalled to the apical tuft using a dedicated interneuron, similar to the proposal in [25].

This current work, which involved a simple classification task using an internal energy loss, can be extended in several ways to test the generalizability of its framework and conclusions. To start with, we chose supervised learning to model top-down projections from multimodal-associative areas downstream of the visual ventral stream as a form of supervision signals in the brain [54–56]. However, several recent studies have shown that networks trained unsupervised or self-supervised have representations that better correlate with brain representations and better predict human perception and behaviour than supervised networks [57–59]. Therefore, it would be interesting to explore the optimisation of energy in an unsupervised or self-supervised training scheme. A larger dataset with more naturalistic images could also be used to test more complex network properties. Another possibility is implementing different architectures for different tasks. For instance, we have not included within-layer lateral recurrent connections, which are important for visual recognition [60–62]. We also omitted local lateral inhibition, which has been shown to play a role in plasticity for memory and learning [63]. Future work could thus extend energy optimisation to in-layer recurrent neural networks for a temporal task. To study spatial organization, for instance in thalamocortical circuits [64], and to compare with known biological observations, the fully connected architecture could be replaced by more natural connectivity patterns mimicking receptive fields. In terms of the learning algorithm, we chose FPTT, which is a temporally local but spatially global algorithm. It would be interesting to explore whether other online algorithms could replicate these results.
Similarly, the surrogate gradient used in the backward pass is principally sensitive to spike rates; learning rules that are more sensitive to spike times could be investigated to study the potential for spike-time-based top-down prediction cancelling predictable bottom-up inputs [65]. Future work could also incorporate more complex dendritic computations or implement Dale’s law in the network to examine the resulting self-organisation due to energy efficiency [66,67].

Our present study demonstrates that predictive coding properties in a multi-compartment spiking neural network may arise from the optimization of each neuron’s internal energy. Empirically, we have connected this energy loss to a decrease in synaptic transmission and spiking, proposing an optimization technique that produced models capable of replicating experimental findings. This approach paves the way for further exploration of the link between energy optimization and predictive coding in spiking neural networks.

Supporting information

S1 Text. Euler approximations to dynamical systems.

https://doi.org/10.1371/journal.pcbi.1013112.s001

(PDF)

S1 Table. Kruskal-Wallis test statistics for distributions in Fig 2C, Fig 2D and Fig 2E.

https://doi.org/10.1371/journal.pcbi.1013112.s002

(PDF)

S1 Fig. Densities of trained time constants in both models.

The energy and control models have similar time constants after training.

https://doi.org/10.1371/journal.pcbi.1013112.s003

(TIF)

S2 Fig. Same vs different class representation similarity between clamped and normal representations.

We aim to evaluate whether clamped representations exhibit greater similarity to normal representations from the corresponding class as opposed to those from different classes. To achieve this, we group pairwise representation similarities into two categories: same-class similarities are representation similarities between normal and clamped representations from the same class, while different-class similarities are those between different classes. In the energy model, the same-class similarities are significantly higher than the different-class similarities. In the control model, there is no significant difference between similarity types, indicating a lack of class information in the clamped representations of the control model.

https://doi.org/10.1371/journal.pcbi.1013112.s004

(TIF)

S3 Fig. Model Reaction to Modifications in Pixel Values.

Left: Accuracy of models tested on manipulated images. The x-axis shows the changes in pixel values introduced to the preprocessed test set images. The control model’s test accuracy shows a steeper decline as pixel values stray from the standard range. This effect may be attributed to more significant alterations in the input currents to L1 neurons at each level of pixel manipulation in the control model (Right). The control model’s comparatively larger input weights potentially account for this observed trend (refer to Fig 2E).

https://doi.org/10.1371/journal.pcbi.1013112.s005

(TIF)

S4 Fig. Non-adapting spiking neurons.

Removing adaptation from the spiking neuron model decreases accuracy for both models (Left), in particular for the energy model, and increases the average firing rates in all layers (Right).

https://doi.org/10.1371/journal.pcbi.1013112.s006

(TIF)

Acknowledgments

The authors express their appreciation to Bojian Yin for his insightful recommendations concerning SNN training.

References

  1. 1. Mumford D. On the computational architecture of the neocortex. II. The role of cortico-cortical loops. Biol Cybern. 1992;66(3):241–51. pmid:1540675
  2. 2. Friston K. A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci. 2005;360(1456):815–36. pmid:15937014
  3. 3. Keller GB, Mrsic-Flogel TD. Predictive processing: a canonical cortical computation. Neuron. 2018;100(2):424–35. pmid:30359606
  4. 4. Millidge B, Seth A, Buckley CL. Predictive coding: a theoretical and experimental review. 2021. https://arxiv.org/abs/2107.12979
  5. 5. Rao RP, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. 1999;2(1):79–87. pmid:10195184
  6. 6. Spratling MW. A review of predictive coding algorithms. Brain Cogn. 2017;112:92–7. pmid:26809759
  7. 7. Bastos AM, Usrey WM, Adams RA, Mangun GR, Fries P, Friston KJ. Canonical microcircuits for predictive coding. Neuron. 2012;76(4):695–711. pmid:23177956
  8. 8. Mikulasch FA, Rudelt L, Wibral M, Priesemann V. Where is the error? Hierarchical predictive coding through dendritic error computation. Trends Neurosci. 2023; 46(1):45–59. pmid:36577388
  9. 9. Dora S, Bohte SM, Pennartz CMA. Deep gated hebbian predictive coding accounts for emergence of complex neural response properties along the visual cortical hierarchy. Front Comput Neurosci. 2021;15:666131. pmid:34393744
  10. 10. Walsh KS, McGovern DP, Clark A, O’Connell RG. Evaluating the neurophysiological evidence for predictive processing as a model of perception. Ann N Y Acad Sci. 2020;1464(1):242–68. pmid:32147856
  11. 11. Lee K, Dora S, Mejias JF, Bohte SM, Pennartz CMA. Predictive coding with spiking neurons and feedforward gist signalling. Cold Spring Harbor Laboratory. 2023. https://doi.org/10.1101/2023.04.03.535317
  12. 12. Whittington JCR, Bogacz R. Theories of error back-propagation in the brain. Trends Cogn Sci. 2019;23(3):235–50. pmid:30704969
  13. 13. Lotter W, Kreiman G, Cox D. Deep predictive coding networks for video prediction and unsupervised learning. 2017. https://arxiv.org/abs/1605.08104
  14. 14. Choksi B, Mozafari M, O’May CB, Ador B, Alamia A, VanRullen R. Predify: augmenting deep neural networks with brain-inspired predictive coding dynamics. 2021. http://arxiv.org/abs/2106.02749
  15. 15. Han K, Wen H, Zhang Y, Fu D, Culurciello E, Liu Z. Deep predictive coding network with local recurrent processing for object recognition. In: Advances in Neural Information Processing Systems, 2018. https://proceedings.neurips.cc/paper/2018/hash/1c63926ebcabda26b5cdb31b5cc91efb-Abstract.html
  16. 16. Ali A, Ahmad N, de Groot E, van Gerven MAJ, Kietzmann TC. Predictive coding is a consequence of energy efficiency in recurrent neural networks. Patterns. 2022;3(12).
  17. 17. Keijser J, Sprekeler H. Optimizing interneuron circuits for compartment-specific feedback inhibition. PLoS Comput Biol. 2022;18(4):e1009933. pmid:35482670
  18. Perez-Nieves N, Leung VCH, Dragotti PL, Goodman DFM. Neural heterogeneity promotes robust learning. Nat Commun. 2021;12(1):5791. pmid:34608134
  19. Dauwels J. On variational message passing on factor graphs. In: 2007 IEEE International Symposium on Information Theory. 2007. p. 2546–50.
  20. Still S, Sivak DA, Bell AJ, Crooks GE. Thermodynamics of prediction. Phys Rev Lett. 2012;109(12):120604. pmid:23005932
  21. Candadai M, Izquierdo EJ. Sources of predictive information in dynamical neural networks. Sci Rep. 2020;10(1):16901. pmid:33037274
  22. Da Costa L, Parr T, Sengupta B, Friston K. Neural dynamics under active inference: plausibility and efficiency of information processing. Entropy (Basel). 2021;23(4):454. pmid:33921298
  23. Chalk M, Marre O, Tkačik G. Toward a unified theory of efficient, predictive, and sparse coding. Proc Natl Acad Sci U S A. 2018;115(1):186–91. pmid:29259111
  24. Sacramento J, Ponte Costa R, Bengio Y, Senn W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In: Advances in Neural Information Processing Systems. 2018. https://proceedings.neurips.cc/paper/2018/hash/1dc3a89d0d440ba31729b0ba74b93a33-Abstract.html
  25. Senn W, Dold D, Kungl AF, Ellenberger B, Jordan J, Bengio Y. A neuronal least-action principle for real-time learning in cortical circuits. 2023. https://www.biorxiv.org/content/10.1101/2023.03.25.534198v2
  26. Smith FW, Muckli L. Nonstimulated early visual areas carry information about surrounding context. Proc Natl Acad Sci U S A. 2010;107(46):20099–103. pmid:21041652
  27. Shatek SM, Grootswagers T, Robinson AK, Carlson TA. Decoding images in the mind’s eye: the temporal dynamics of visual imagery. Vision (Basel). 2019;3(4):53. pmid:31735854
  28. Gillon CJ, Pina JE, Lecoq JA, Ahmed R, Billeh YN, Caldejon S, et al. Learning from unexpected events in the neocortical microcircuit. 2021. https://www.biorxiv.org/content/10.1101/2021.01.15.426915v2
  29. Urbanczik R, Senn W. Learning by the dendritic prediction of somatic spiking. Neuron. 2014;81(3):521–8. pmid:24507189
  30. Guerguiev J, Lillicrap TP, Richards BA. Towards deep learning with segregated dendrites. Elife. 2017;6:e22901. pmid:29205151
  31. Körding KP, König P. Learning with two sites of synaptic integration. Netw: Comput Neural Syst. 2000;11(1):25–39.
  32. Spratling MW. Cortical region interactions and the functional role of apical dendrites. Behav Cogn Neurosci Rev. 2002;1(3):219–28. pmid:17715594
  33. Budd JM. Extrastriate feedback to primary visual cortex in primates: a quantitative analysis of connectivity. Proc Biol Sci. 1998;265(1400):1037–44. pmid:9675911
  34. Bernander O, Koch C, Douglas RJ. Amplification and linearization of distal synaptic input to cortical pyramidal cells. J Neurophysiol. 1994;72(6):2743–53. pmid:7897486
  35. Larkum ME, Zhu JJ, Sakmann B. A new cellular mechanism for coupling inputs arriving at different cortical layers. Nature. 1999;398(6725):338–41. pmid:10192334
  36. Spruston N. Pyramidal neurons: dendritic structure and synaptic integration. Nat Rev Neurosci. 2008;9(3):206–21. pmid:18270515
  37. Bellec G, Scherr F, Subramoney A, Hajek E, Salaj D, Legenstein R, et al. A solution to the learning dilemma for recurrent networks of spiking neurons. Nat Commun. 2020;11(1):3625. pmid:32681001
  38. Yin B, Corradi F, Bohté SM. Accurate online training of dynamical spiking neural networks through forward propagation through time. Nat Mach Intell. 2023;5(5):518–27.
  39. Neftci EO, Mostafa H, Zenke F. Surrogate gradient learning in spiking neural networks. arXiv preprint. 2019. http://arxiv.org/abs/1901.09948
  40. Kag A, Saligrama V. Training recurrent neural networks via forward propagation through time. In: Proceedings of the 38th International Conference on Machine Learning. 2021. p. 5189–200. https://proceedings.mlr.press/v139/kag21a.html
  41. Boutin V, Franciosini A, Ruffier F, Perrinet L. Effect of top-down connections in hierarchical sparse coding. Neural Comput. 2020;32(11):2279–309. pmid:32946716
  42. Yin B, Corradi F, Bohté SM. Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks. Nat Mach Intell. 2021;3(10):905–13.
  43. Tran PT, Phong LT. On the convergence proof of AMSGrad and a new version. IEEE Access. 2019;7:61706–16.
  44. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 2010. p. 249–56. https://proceedings.mlr.press/v9/glorot10a.html
  45. Attwell D, Laughlin SB. An energy budget for signaling in the grey matter of the brain. J Cereb Blood Flow Metab. 2001;21(10):1133–45. pmid:11598490
  46. Kriegeskorte N, Mur M, Bandettini P. Representational similarity analysis - connecting the branches of systems neuroscience. Front Syst Neurosci. 2008;2. pmid:19104670
  47. Jordan R, Keller GB. Opposing influence of top-down and bottom-up input on excitatory layer 2/3 neurons in mouse primary visual cortex. Neuron. 2020;108(6):1194-1206.e5. pmid:33091338
  48. Reddy L, Tsuchiya N, Serre T. Reading the mind’s eye: decoding category information during mental imagery. Neuroimage. 2010;50(2):818–25. pmid:20004247
  49. Gurney K, Prescott TJ, Redgrave P. A computational model of action selection in the basal ganglia. II. Analysis and simulation of behaviour. Biol Cybern. 2001;84(6):411–23. pmid:11417053
  50. Poirazi P, Papoutsi A. Illuminating dendritic function with computational models. Nat Rev Neurosci. 2020;21(6):303–21. pmid:32393820
  51. Attinger A, Wang B, Keller GB. Visuomotor coupling shapes the functional development of mouse visual cortex. Cell. 2017;169(7):1291-1302.e14. pmid:28602353
  52. Williams LE, Holtmaat A. Higher-order thalamocortical inputs gate synaptic long-term potentiation via disinhibition. Neuron. 2019;101(1):91-102.e4. pmid:30472077
  53. Francioni V, Tang VD, Brown NJ, Toloza EHS, Harnett M. Vectorized instructive signals in cortical dendrites during a brain-computer interface task. 2023. https://www.biorxiv.org/content/10.1101/2023.11.03.565534v1
  54. Kveraga K, Ghuman AS, Bar M. Top-down predictions in the cognitive brain. Brain Cogn. 2007;65(2):145–68. pmid:17923222
  55. Barbas H. Connections underlying the synthesis of cognition, memory, and emotion in primate prefrontal cortices. Brain Res Bull. 2000;52(5):319–30. pmid:10922509
  56. Kringelbach ML, Rolls ET. The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Prog Neurobiol. 2004;72(5):341–72. pmid:15157726
  57. Nayebi A, Kong NCL, Zhuang C, Gardner JL, Norcia AM, Yamins DLK. Shallow unsupervised models best predict neural responses in mouse visual cortex. 2021. https://www.biorxiv.org/content/10.1101/2021.06.16.448730v2
  58. Conwell C, Mayo D, Barbu A, Buice M, Alvarez G, Katz B. Neural regression, representational similarity, model zoology & neural taskonomy at scale in rodent visual cortex. In: Advances in Neural Information Processing Systems. 2021. p. 5590–607. https://proceedings.neurips.cc//paper/2021/hash/2c29d89cc56cdb191c60db2f0bae796b-Abstract.html
  59. Storrs KR, Anderson BL, Fleming RW. Unsupervised learning predicts human perception and misperception of gloss. Nat Hum Behav. 2021;5(10):1402–17. pmid:33958744
  60. Kar K, Kubilius J, Schmidt K, Issa EB, DiCarlo JJ. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nat Neurosci. 2019;22(6):974–83. pmid:31036945
  61. van Bergen RS, Kriegeskorte N. Going in circles is the way forward: the role of recurrence in visual inference. Curr Opin Neurobiol. 2020;65:176–93. pmid:33279795
  62. Kietzmann TC, Spoerer CJ, Sörensen LKA, Cichy RM, Hauk O, Kriegeskorte N. Recurrence is required to capture the representational dynamics of the human visual system. Proc Natl Acad Sci U S A. 2019;116(43):21854–63. pmid:31591217
  63. Herstel LJ, Wierenga CJ. Network control through coordinated inhibition. Curr Opin Neurobiol. 2021;67:34–41. pmid:32853970
  64. Bruno RM, Sakmann B. Cortex is driven by weak but synchronously active thalamocortical synapses. Science. 2006;312(5780):1622–7. pmid:16778049
  65. Kim J, Kim K, Kim JJ. Unifying activation- and timing-based learning rules for spiking neural networks. Adv Neural Inf Process Syst. 2020;33:19534–44.
  66. Payeur A, Béïque J-C, Naud R. Classes of dendritic information processing. Curr Opin Neurobiol. 2019;58:78–85. pmid:31419712
  67. Barranca VJ, Bhuiyan A, Sundgren M, Xing F. Functional implications of Dale’s law in balanced neuronal network dynamics and decision making. Front Neurosci. 2022;16:801847. pmid:35295091