Fig 1.
(A) Schematic of our simple pyramidal cell model (green), consisting of an apical and a basal compartment with their respective activations. The excitability of the apical trunk is variable and indicated in dark green. The neuron output is projected to an output layer (yellow). Prediction errors generate learning signals, which are fed back via randomly initialized feedback weights. (B) Network view. Input neurons (blue) project to the apical and basal compartments of a population of pyramidal cells (green). The pyramidal output is projected to a layer of linear output neurons (yellow), producing the network output.
Fig 2.
Illustration of eligibility trace dynamics and synaptic plasticity at the basal compartment in our model.
Simulation of a single neuron with two apical and two basal synapses, each having unit weight. The first apical and basal synapses are initially activated for one time step, followed by activation of the second apical and basal synapses for three consecutive time steps (panel A). These co-activations lead to increases in the branch strength (panel B) and to changes in the eligibility traces (panel C; see text). A recall is then performed at time step 19, where both apical synapses are activated, and a learning signal is received (panel D). The changes of the basal weights are given by the product of the eligibility traces with the learning signal and the apical activation (panel E; see text).
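The basal weight update described in this caption can be sketched in a few lines of Python. The decay constant, the learning rate, and the exact form of the trace dynamics below are illustrative assumptions for the sketch, not the model's actual equations:

```python
# Hypothetical sketch of the plasticity scheme of Fig 2: each basal synapse
# carries an eligibility trace that grows when its apical partner is
# co-active and decays over time. Constants are assumptions for illustration.

def step_trace(trace, apical, basal, decay=0.9):
    """Decay the trace, then increment it where apical and basal inputs co-occur."""
    return decay * trace + apical * basal

def basal_weight_update(trace, learning_signal, apical, lr=0.1):
    """Weight change = eligibility trace x learning signal x apical activation."""
    return lr * learning_signal * trace * apical
```

With these assumptions, a synapse whose apical and basal inputs never co-occur keeps a zero trace and therefore receives no weight change at recall, matching the gating role of the apical activation described above.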
Fig 3.
Learning associations with local plasticity.
(A) An agent observes a sequence of stimulus pairs. After being cued with one of the observed stimuli, it has to indicate the associated one. (B) Number of training episodes needed until the network achieved an accuracy of 80%, as a function of the number of association pairs to be remembered (mean and SD over 16 training trials).
Fig 4.
Learning of a delayed match-to-sample task with local plasticity.
(A) Task schema. The agent observes a stimulus, followed by 8 white-noise inputs, and then another stimulus. The agent should choose the left action when the initial stimulus matches the query stimulus. (B) Learning progress in terms of choice accuracy. Green: only one character instantiation per class for training and testing. Blue: network is tested on a character not seen during training. Brown: LSTM in the fixed setting (16 trials; shading indicates standard deviation).
Fig 5.
Context-dependent reward associations.
(A) Schema of the radial maze task. In each trial, one arm pair is accessible to the agent (yellow in the example) and the context cue is presented (Omniglot character). The agent then has to choose the correct arm (left or right) to obtain the reward. (B) Fraction of rewarded actions over learning episodes in the basic radial maze task (blue) and in the same task where the rewarding arm is switched after each visit (green; mean and SD over 16 runs). Red: maximum achievable performance. Orange: LSTM in the basic radial maze task.
Table 1.
Comparison of mean errors of a network trained with BPTT vs. our local synaptic plasticity on the bAbI 10k tasks (mean and SD over 5 trials). Error rates for tasks solved using our local synaptic plasticity rules are printed in bold face. BPTT: backpropagation through time; LSP: local synaptic plasticity; LSP joint: joint training, where a single network was trained to perform all tasks concurrently.
Fig 6.
Network analysis for the Single Supporting Fact task.
(A) Projection of keys, values, and recall keys based on a non-negative matrix factorization of memory traces after network training. Keys are shown for specific persons, with representations averaged over locations and verbs. The key for John clearly activates 6 components corresponding to possible locations for John. Value representations are shown for specific locations, with representations averaged over persons and verbs. (B) Story sample along with its respective key (top, outer ring), value (top, inner ring), and memory state after memorization (bottom). Each key and value pair predominantly overlaps in a single component, which is then memorized. Additionally, the change in John’s location in the last fact is accurately updated from component 14 to 1, causing component 14 to be deactivated due to the negative term in our Oja-type Hebbian rule.
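The overwrite behaviour in panel B, where storing John's new location deactivates the component for his old one, can be sketched with a minimal Oja-type Hebbian update on a key-value memory matrix. The matrix `M`, the functions `write`/`read`, and the learning rate `eta` are assumptions made for this illustration, not the paper's exact formulation:

```python
import numpy as np

# Illustrative Oja-type key-value memory sketch: the Hebbian term stores the
# new (key, value) association, while the negative (recalled) term erases the
# value previously stored under the same key, as in Fig 6B.

def write(M, key, value, eta=1.0):
    """Store `value` under `key`; the subtracted recall overwrites the old value."""
    recalled = M @ key
    return M + eta * np.outer(value - recalled, key)

def read(M, key):
    """Recall the value associated with `key`."""
    return M @ key
```

Under these assumptions, writing a second value under the same key replaces the first one rather than superimposing on it, which is the behaviour attributed to the negative term of the Oja-type rule in the caption.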
Table 2.
Hyperparameters: Supervised learning.
Table 3.
Hyperparameters: Reinforcement learning.
Table 4.
Hyperparameters: LSTM.