Skip to main content
  • Loading metrics

Hebbian plasticity in parallel synaptic pathways: A circuit mechanism for systems memory consolidation

  • Michiel W. H. Remme ,

    Contributed equally to this work with: Michiel W. H. Remme, Urs Bergmann

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

    Current address: INAIT, Lausanne, Switzerland

    Affiliation Department of Biology, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany

  • Urs Bergmann ,

    Contributed equally to this work with: Michiel W. H. Remme, Urs Bergmann

    Roles Conceptualization, Formal analysis, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

    Current address: Google Research, Berlin, Germany

    Affiliation Department of Biology, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany

  • Denis Alevi,

    Roles Data curation, Formal analysis, Investigation, Software, Validation, Visualization, Writing – review & editing

    Affiliations Department for Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany, Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany

  • Susanne Schreiber,

    Roles Conceptualization, Funding acquisition, Investigation, Resources, Supervision, Visualization, Writing – review & editing

    Affiliations Department of Biology, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany, Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany, Einstein Center for Neurosciences Berlin, Berlin, Germany

  • Henning Sprekeler ,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Visualization, Writing – original draft, Writing – review & editing

    ‡ HS and RK also contributed equally to this work.

    Affiliations Department for Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany, Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany, Einstein Center for Neurosciences Berlin, Berlin, Germany, Excellence Cluster Science of Intelligence, Berlin, Germany

  • Richard Kempter

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Visualization, Writing – review & editing

    ‡ HS and RK also contributed equally to this work.

    Affiliations Department of Biology, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany, Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany, Einstein Center for Neurosciences Berlin, Berlin, Germany


Systems memory consolidation involves the transfer of memories across brain regions and the transformation of memory content. For example, declarative memories that transiently depend on the hippocampal formation are transformed into long-term memory traces in neocortical networks, and procedural memories are transformed within cortico-striatal networks. These consolidation processes are thought to rely on replay and repetition of recently acquired memories, but the cellular and network mechanisms that mediate the changes of memories are poorly understood. Here, we suggest that systems memory consolidation could arise from Hebbian plasticity in networks with parallel synaptic pathways—two ubiquitous features of neural circuits in the brain. We explore this hypothesis in the context of hippocampus-dependent memories. Using computational models and mathematical analyses, we illustrate how memories are transferred across circuits and discuss why their representations could change. The analyses suggest that Hebbian plasticity mediates consolidation by transferring a linear approximation of a previously acquired memory into a parallel pathway. Our modelling results are further in quantitative agreement with lesion studies in rodents. Moreover, a hierarchical iteration of the mechanism yields power-law forgetting—as observed in psychophysical studies in humans. The predicted circuit mechanism thus bridges spatial scales from single cells to cortical areas and time scales from milliseconds to years.

Author summary

After new memories are acquired, they can be transferred over time into other brain areas—a process called systems memory consolidation. For example, new declarative memories, which refer to the conscious memory of facts and events, depend on the hippocampus. Older declarative memories, however, also rely on neocortical networks. The cellular mechanisms underlying such a transfer are poorly understood. In this work, we show that a simple and in the brain ubiquitous connectivity pattern, combined with a standard learning rule, leads to gradual memory transfer. We illustrate our proposed mechanism in numerical simulations and mathematical analyses. At the neurophysiological level, our theory explains experimental findings on memory storage in the hippocampal formation when specific pathways between neural populations are disrupted. At the psychophysical level, we can account for the power-law forgetting curves typically found in humans. A consequence of the proposed model is that consolidated memories can yield faster responses because they are stored in increasingly shorter synaptic pathways between sensory and motor areas. By giving a mechanistic explanation of the consolidation process, we contribute to the understanding of the transfer of memories and the reorganization of memories over time.


Clinical and lesion studies suggest that declarative memories initially depend on the hippocampus, but are later transferred to other brain areas [13]. Some forms of memory eventually become independent of the hippocampus and depend only on a stable representation in the neocortex [13]. Similarly, procedural memories are consolidated within cortico-striatal networks [1, 4, 5]. This process of memory transformation—termed systems memory consolidation—is thought to prevent newly acquired memories from overwriting old ones, thereby extending memory retention times (“plasticity-stability dilemma”; [610]), and to enable a simultaneous acquisition of episodic memories and semantic knowledge of the world [11, 12]. While specific neuronal activity patterns, including for example an accelerated replay of recent experiences [13, 14], are involved in the transfer of memories from hippocampus to neocortex [15], the mechanisms underlying systems memory consolidation are not well understood. Specifically, it is unclear how this consolidation-related transfer is shaped by the anatomical structure and the plasticity of the underlying neural circuits. This poses a substantial obstacle for understanding into which regions memories are consolidated; why some memories are consolidated more rapidly than others [1618]; why some memories stay hippocampus dependent, and why and how the character of memories changes over time [1]; and whether the consolidation of declarative and non-declarative memories [1, 4, 5] are two sides of the same coin. These questions are hard to approach within phenomenological theories of systems consolidation such as the standard consolidation theory [11, 19], the multiple trace theory [16], and the trace transformation theory [20, 21]. Here, we propose a novel mechanistic foundation of the consolidation process that accounts for several experimental observations and that could contribute to understanding the transfer of memories and the reorganisation of memories over time on a neuronal level.

Our focus lies on simple forms of memory that can be phrased as cue-response associations. We assume that such associations are stored in synaptic pathways between an input area—neurally representing the cue—and an output area—neurally representing the response. Thus, our work relates to feedforward, hetero-associative memory (and is therefore applicable to both declarative and non-declarative memories) rather than recurrent, auto-associative memory (see, e.g., [2224]). Our central hypothesis—the parallel pathway theory (PPT)—is that systems memory consolidation arises naturally from the interplay of two abundantly found neuronal features: parallel synaptic pathways and Hebbian plasticity [25, 26]. First, we illustrate this theory in a simple hippocampal circuit motif and show that Hebbian plasticity can consolidate previously stored associations into parallel pathways. Next, we outline the PPT in a mathematical framework for the simplest possible (linear) case. Then we show in simulations that the proposed mechanism is robust to various neuronal nonlinearities; further, the mechanism reproduces the results of a hippocampal lesion study in rodents [27]; iterated in a cascade, it can achieve a full consolidation into neocortex and result in power-law forgetting of memories as is observed in psychophysical studies in humans [28].


A mechanistic basis for systems memory consolidation

The suggested parallel pathway theory (PPT) relies on a parallel structure of feedforward connections onto the same output area: a direct, monosynaptic and an indirect, multisynaptic pathway. We propose that memories are initially stored in the indirect pathway and are subsequently transferred to the direct pathway via Hebbian plasticity. Because the indirect pathway is multisynaptic, it transmits signals with a longer time delay than the direct pathway (Fig 1A). A timing-dependent plasticity rule allows the indirect pathway to act as a teacher for the direct pathway.

Fig 1. A mechanistic basis for systems memory consolidation.

(A) Circuit motif for the parallel pathway theory. Cue-response associations are initially stored in an indirect synaptic pathway (blue) and consolidated into a parallel direct pathway (red). (B) Hippocampal connectivity. The entorhinal cortex projects to CA1 through an indirect pathway via DG-CA3 and the Schaffer collaterals (SC, blue arrow), and through the direct perforant path (PPCA1, red arrow). (C) Model of consolidation through STDP. Left: before consolidation, a strong SC input (middle, blue vertical bar) causes a large EPSP and triggers a spike in CA1 (bottom, black vertical bar). A weak PPCA1 input (top, red) that precedes the SC input is potentiated by STDP. Right: after consolidation through STDP, the PPCA1 input (top) can trigger a spike in CA1 by itself (bottom). (D-E) Consolidation in a single integrate-and-fire CA1 cell receiving 1000 PPCA1 and 1000 SC excitatory inputs. (D) PPCA1 activity consists of independent poisson spike trains; the SC activity is an exact copy of the PPCA1 activity, delayed by 5 ms. (E) Consolidation of a synaptic weight pattern from non-plastic SC synapses to plastic PPCA1 synapses. Left and middle: normalized synaptic weights before and after consolidation. Right: time course of correlation between SC and PPCA1 weight vectors during consolidation (mean ± SEM for 10 trials). (F) Failure of consolidation of a synaptic weight pattern from non-plastic PPCA1 to plastic SC synapses; panels as in E.

The proposed mechanism can be exemplified in the hippocampal formation, by considering direct and indirect pathways to area CA1. CA1 receives a direct, monosynaptic pathway from the entorhinal cortex (EC), which is called perforant path (PPCA1, Fig 1B, red; [29]). In addition, EC input is relayed to CA1 via the classical trisynaptic pathway via dentate gyrus (DG) and CA3, reaching CA1 through the Schaffer collaterals (SC; Fig 1B, blue; [29]).

As in earlier theories, we assume that the indirect pathway via CA3 is involved in the original storage of memories [30, 31], an assumption that is supported by experiments, e.g. [3234]. We neglect, for simplicity, any encoding-related change in the direct pathway, even though in animals this pathway might also show some, putative much lower, plasticity during memory acquisition. This simplification does not affect our proposed mechanism on the consolidation-related transfer of memories.

We assume encoding in such a way that a memory can be recalled by a specific neural activity pattern in EC—a cue—that triggers spikes in a subset of CA1 cells through this indirect pathway via the SC, representing the associated response. The same cue reaches CA1 also through the direct pathway via the PPCA1. We assume that this direct input from EC initially fails to trigger spikes because the synaptic weight pattern in the PPCA1 does not match the cue. However, PPCA1 inputs that are activated by the cue precede the spikes in CA1 pyramidal cells that are triggered by the indirect pathway by 5–15 ms [35] due to transmission delays. Presynaptic spikes preceding postsynaptic spikes with a short delay favor selective long-term potentiation by spike timing-dependent plasticity (STDP, Fig 1C) [3638]. Consequently, cue-driven PPCA1 synapses onto activated CA1 cells are strengthened until the memory that was initially stored in the indirect pathway can be recalled via the direct pathway alone. The indirect pathway thus acts as a teacher for the direct pathway.

To illustrate this mechanism, we used a simple integrate-and-fire neuron model (for details, see Methods) of a CA1 cell that receives inputs through the SC and the PPCA1. We also considered the two pathways to contain the same number of synapses and transmit identical spike patterns apart from a 5-ms delay in the SC (Fig 1D). Consolidation then corresponds to copying the synaptic weight pattern of the SC to the PPCA1. In line with our hypothesis, such a consolidation was indeed achieved by STDP in the PPCA1 synapses (Fig 1E). A consolidation in the opposite direction, i.e., from the PPCA1to the SC cannot be achieved by STDP because the temporal order of spiking activity is reversed and hence does not favour synaptic potentiation (Fig 1F). Note that in this simple example, the EC-to-DG/CA3 synapses don’t store any memory, but only introduce the transmission delay. In the following, we will show that all synapses of the indirect pathway can be involved in the original storage of memories.

To understand the conditions under which the suggested PPT can achieve a consolidation of associative memories, we performed a mathematical analysis, which shows that consolidation is robust to differences in the neural representation in the two pathways and illustrates its dependence on the temporal input statistics in the two pathways. Readers who are less interested in the mathematical details are welcome to jump to section “Consolidation of spatial representations”, where we show in simulations that the mechanism is robust to neuronal complexities; in subsequent sections, we also show that the mechanism accounts for lesion studies in rodents, and that it can be hierarchically iterated.

Theory of spike timing-dependent plasticity (STDP) for parallel input pathways.

In the following mathematical analysis, we consider a single cell that receives inputs through two pathways, as in Fig 1A. The cell could be located, for example, in CA1, as in Fig 1B. We assume that memories, i.e., cue-response associations, are stored in the synaptic weight vector V of the indirect path, and that consolidation occurs by transferring this information into the weights W of the direct path. In the simulation in Fig 1, the weight vector V represents the SC pathway, and the vector W the PPCA1 pathway. For simplicity, we consider the case of a single rate-based neuron, which represents one of the output neurons in the simulated network. Very similar theoretical results can be obtained for the spiking case of linear Poisson neurons, apart from additional contributions from spike-spike correlations, which can be neglected for a large number of synapses [39].

The output y of the rate-based neuron is assumed to be given by a linear function of the input (1) where the vectors x and x′ denote the input arising from the direct and indirect pathways, respectively, and denotes the transpose of a vector (or matrix). We assume that the inputs x and x′ are both representations of the cue and therefore are related by some kind of (potentially nonlinear) statistical dependency. Moreover, we assume that x′ arises from an indirect pathway and is therefore delayed by a time interval D > 0. The notation is chosen such that the case where the two inputs to the two pathways are the same (apart from the delay) reduces to the condition x(t) = x′(t), which is the case, e.g., in Fig 1D.

We now consider the learning dynamics of a simple additive (STDP) rule that would result from a rate picture (neglecting spike-spike correlations; cf. [39]), (2) where L(τ) is the learning window (example in section Effects of temporal input statistics on systems memory consolidation), which determines how much a pair of pre- and postsynaptic activity pulses (i.e., spikes) with a time difference τ changes the synaptic weight, and η is a learning rate that scales the size of these changes. We adopt the convention that the time difference τ is positive when a presynaptic spike occurs before a postsynaptic spike.

The notation indicates averaging over an interval of length T. We assume that the integration time T can be chosen such that the weights do not change significantly during the integration time (i.e., a small learning rate η), but that the statistics of the input are sufficiently well sampled so that boundary effects in the temporal integration are negligible. We also assume that the statistics of the inputs x and x′ are stationary, i.e., they do not change over time. Under these assumptions, we can insert the output firing rate y from Eq (1) into the learning rule in Eq (2) and get (3) where 〈⋯〉t denotes the average over all times. Eq (3) describes the dynamics of the weights W in the direct pathway, which are driven by an interplay of the correlation structures within the direct pathway (through ) and between the two pathways (through ); the dynamics of W depends also on the shape of the learning window L and the weights V in the indirect pathway.

It is important to emphasize that in this analysis of the learning dynamics we consider the input arising during consolidation, e.g., during sleep, and this input may be statistically different from the input during memory storage or recall. If the correlation structure between the two pathways is different during consolidation and during storage/recall, the consolidation process leads to a distortion of the memory in the sense that a different cue would be required to retrieve the memory. Here, we consider only the case where the correlation structure during consolidation is the same as during storage and recall.

Let us now study under which conditions this weight update generates a consolidation of the input-output associations stored initially in the weights V of the indirect pathway into the weights W of the direct pathway.

Learning dynamics implement memory consolidation as a linear regression.

In general, the learning dynamics is hard to analyze if the covariance matrices and are arbitrary objects. A case that can be studied analytically is that of separable statistics in which each of the two correlation matrices can be written as a product of scalar functions f and g of the delay τ and the covariance matrices for zero delay, and : (4) (5) For simplicity, we omitted the lower index t in and . Note that this separability assumption is consistent with all simulations shown, except the one in the section “Consolidation of spatial representations” of the Results; see there for details.

For separable input statistics, the learning dynamics in Eq (3) can be simplified to (6) If the scalar constant A is negative (see below for conditions when this is the case), the learning dynamics is stable and converges to a unique fixed point that is given by (7) Note that apart from the factor , this fixed point has the same structure as the closed-form solution of a linear regression. In fact, it is straightforward to show that the learning dynamics in Eq (6) performs a gradient descent on the error function (8) If A is negative and B is positive (and thus β is positive), the learning dynamics in the direct path converges to a weight configuration for which the input from the direct path is an optimal linear approximation of the input from the indirect path, in the sense of minimal mean squared error E. If β > 1, the direct pathway would contribute more to a potential recall than the original memory trace in the indirect pathway. A sign reversal of β (i.e. β < 0) implies a sign reversal of V. Then, however, a constraint on the sign change of weights (see below for details) would prohibit consolidation; memories in the indirect path could even be actively deleted from the direct path. In summary, memory consolidation in the PPT is supported by A < 0 (stable dynamics) and B > 0 (consolidation possible), which implies .

Note that we assumed storage of original memories only in the weight vector V representing the SC pathway. But since learning in the direct pathway is driven by the input from the entire indirect pathway, these results also hold if original memories are stored in any other plastic synapses of the indirect pathway (e.g. EC to DG/CA3 in Fig 1B).

Let us relate the theoretical results obtained so far to the simulations shown in Fig 1; although the simulations are performed for integrate-and-fire neurons, our theory on rate-based neurons accounts for the main findings: Because two inputs x and x′ are the same apart from the delay, the fixed point condition Eq (7) reduces to , in line with the result that the weights are copied into the direct pathway. Because the learning window is dominated by depression we have A < 0 while the delay in combination with the shorter autocorrelation time of the Poisson processes in the input ensures B > 0. A consolidation from the direct to the indirect pathway is not possible because this inverts the delay and pushes the cross-correlation between the two pathways into the depression component of STDP. As a result, the factor B is negative and consolidation fails.

In terms of systems memory consolidation in general, the weights V of the indirect path change as new memories are acquired, so the fixed point in Eq (7) for the weights W of the direct path is usually never reached. If it were, the direct pathway would merely represent a copy of the memories that are currently stored in the indirect path rather than retaining older memories, as intended. The time scale of the learning dynamics of the direct path [determined by η in Eq (6)] should therefore be longer than the memory retention time in the indirect path, which is determined, e.g., by the rate at which new memories are stored. In case of a small enough η, the transient dynamics of the system is more important for the consolidation process than the fixed point.

Another important aspect to emphasize is that the consolidation is influenced by the correlation structure between the two pathways that is encountered during the consolidation period. Intuitively and according to Eq (8), consolidation is achieved by matching the input that is caused by “cues” x′ in the indirect path with the input caused by the associated “cues” x in the direct path. In order for the consolidated memories to be accessible during recall, the relation between the “cues” in the two pathways (i.e., the correlation between the two pathways) should be the same during recall as during consolidation.

The objective function argument in Eq (8) only holds when the constant A is negative. For positive A, the learning dynamics in Eq (7) suffers from the common Hebbian instability and thus has to be complemented by a weight-limiting mechanism. The choice of this weight limitation (e.g., subtractive or divisive normalization, weight bounds) will then have an impact on the dynamics and the fixed point of the learning process [40, 41]. For the simulations, the parameters were therefore always chosen such that the learning dynamics were stable (A < 0). Although this suggests that no weight limiting mechanism was required in principle, upper and lower bounds for the weights were nevertheless used in simulations, with no qualitative impact on the results.

Effects of temporal input statistics on systems memory consolidation.

The constants A and B, which were defined in Eq (6) as A ≔ ∫dτ L(τ)f(τ) and B ≔ ∫dτ L(τ)g(τD), play an important role for the learning dynamics. As already elaborated, the sign of A determines stability while B should be positive to obtain consolidation. Sign and magnitude depend on the interplay between the learning window L and the temporal input statistics, characterized by the correlation functions f and g defined in Eqs (4) and (5), respectively. For the assumed separable statistics, f is fully determined by the autocorrelation of the input in the direct path, and f(τ) is therefore symmetric in time τ.

A first interesting observation is that for the special case of an antisymmetric learning window L, we obtain A = 0 for symmetry reasons. Mathematically, this implies that the first term of the learning dynamics in Eq (6)—the dependence of the change of the weights W in the direct path on their actual value—vanishes. Intuitively, the balance of potentiation and depression in an antisymmetric learning window implies that the direct path, although able to drive the postsynaptic neuron, is causing equal amounts of potentiation and depression in all of its synapses. On average, synaptic changes are caused only by the indirect pathway with weights V, which therefore acts as a supervisor for the learning dynamics of W in the direct path. A thorough analysis under which conditions STDP can be used for supervised learning has been provided elsewhere [42, 43], and the results of this analysis are applicable in the present case. Functionally, the depressing part of an STDP learning window serves to neutralize the impact of the direct pathway on its own learning dynamics, effectively creating a supervised learning scenario.

Another interesting observation relates to the magnitude of the terms A and B, which is determined by the time scale on which the inputs change (reflected, e.g., in the time constants of the decay of the correlation functions f and g). Let us assume that both correlation functions f(τ) and g(τ) are maximal for τ = 0 and that they decay to 0 for large |τ|; such conditions are reasonable for most correlation structures. We also assume that the learning window has the typical structure of potentiation for causal timing, L(τ) > 0 for τ > 0, and depression for acausal timing, L(τ) < 0 for τ < 0 [36, 37, 44]. Then the delay D > 0 in the indirect path shifts the maximum of the cross-correlation g(τD) into the potentiating part of the learning window (Fig 2B) while the maximum of f(τ) remains in the transition region of potentiation and depression (Fig 2A). The following three observations can be made concerning the constant B as defined by the integral in Eq (6):

  1. (1). If the cross-correlation g has a narrow enough peak (i.e., narrower than the time scale of the learning window and the delay D), B is positive, suggesting that consolidation can occur (Fig 2B). The sharp localization of g corresponds to rapidly changing input signals.
  2. (2). If the decay time constant of the cross-correlation g is large compared to that of the learning window, the depressing component of the learning window has more impact and reduces the constant B and thus the efficiency of consolidation (Fig 2C). In the case where the learning window is dominated by depression, B can even get negative for large time constants of g, abolishing consolidation altogether.
  3. (3). If the delay D along the indirect path is much longer than the decay time constant of the learning window, we obtain B ≈ 0, meaning that consolidation is abolished (Fig 2D). In other words, the delayed correlations between the two pathways are too large to be exploited by STDP. This will limit the ability to consolidate from too long indirect paths into shortcuts.

Fig 2. Interaction of temporal correlations and the STDP learning window.

The weight dynamics of the direct path [Eq (6)] is driven by inputs from the direct and indirect paths: weight changes are determined by the integrated products of the STDP learning window L with the autocorrelation f [Eq (4)] and the cross-correlation g [Eq (5)], respectively. (A) Examples of a learning window L(τ) and an autocorrelation f(τ), both plotted as a function of the “relative timing” τ. For separable statistics, f is symmetric. If the learning window L has a stronger negative part for τ < 0 and a weaker positive part for τ > 0, the coefficient A ≔ ∫dτ L(τ)f(τ) is typically negative. (B)–(D) Learning window L as in (A) and three example cross-correlations g. (B) The indirect path primarily induces potentiation in the direct path if B ≔ ∫dτ L(τ)g(τD) > 0. This is the case if (i) the delay D between the paths is positive, (ii) the learning window is positive for positive delays, and (iii) the time scale of the decay of cross-correlations g is shorter than the delay D and the width of the learning window L. These three conditions favor consolidation. (C) If the cross-correlation g decays on a time scale that is much longer than the width of the learning window and the delay D, the indirect path can drive both potentiation and depression, and consolidation is weaker (i.e., the coefficient B is smaller) than for shorter correlations. (D) If the delay D between the direct and the indirect paths is longer than the width of the learning window L, the indirect path cannot induce systematic changes in the weights of the direct path (coefficient B ≈ 0), and consolidation is ineffective.

Consolidation of spatial representations

The mathematical analysis of the PPT makes two key predictions. First, it suggests that STDP in a parallel direct pathway achieves consolidation by performing a linear regression between inputs in the direct and the indirect pathways [Eqs (7) and (8)]. Therefore, the proposed mechanism should generalize to situations in which the cue representations in the direct and indirect pathways differ. Second, the theory suggests that consolidation is most effective when the correlation time constants of the input during consolidation is matched to the coincidence time scale of STDP (Fig 2B). In the following, we will show in simulations that those predictions hold and, moreover, that the mechanism is robust to neuronal nonlinearities.

To begin with, we show that the mechanism is robust to differing cue representations in the two pathways and to weaker correlations among them [45]. To this end, we used place cell representations [46] for the SC input from CA3 and grid cell representations [47, 48] for the PPCA1input from EC (Fig 3A). Moreover, we show that the suggested mechanism is compatible with the biophysical properties of CA1 neurons, which receive inputs in different subcellular compartments. To this end, we simulated a multicompartmental CA1 pyramidal cell (Fig 3B) that was endowed with active ion channels supporting backpropagating action potentials and dendritic calcium spikes (Fig 3C, Methods).

Fig 3. Consolidation of spatial representations.

(A) Replay of PPCA1 and SC activity during sleep. 500 PPCA1 inputs and 2500 SC inputs are spatially tuned on a linear track with periodic grid fields (top, red) and place fields (bottom, blue). Spiking activities are independent Poisson processes (10 spikes/s) inside place/grid fields, otherwise silent. SC activity is delayed by 5 ms. (B) Multi-compartmental model of a reconstructed CA1 pyramidal neuron (see Methods). PPCA1 and SC inputs project to distal apical tuft dendrites (red dots) and proximal apical and basal dendrites (blue dots). (C) Active neuron properties. Top: somatic sodium spike (black) propagates to the distal tuft and initiates a dendritic calcium spike (red) and further sodium spikes. Bottom: dendritic calcium spike leads to bursts of somatic spikes. (D) Spatial tuning before consolidation. SC provides place field-tuned input to the CA1 cell (left, blue), which yields spatially tuned spiking activity (right, blue); PPCA1 input is not spatially tuned (left, red), and (alone) triggers low and untuned spiking activity (right, red). (E) Somatic and dendritic activity during consolidation. During replay, SC input generates backpropagating sodium spikes (black vertical lines) that generate dendritic calcium spikes (red). (F) After consolidation. Spatial tuning is consolidated from the indirect SC pathway into the direct PPCA1 pathway. Left: spatial tuning of total PPCA1 input (red) approaches theoretically derived PPCA1 input tuning (magenta; see Methods). Right: CA1 output is place field-tuned through either SC or PPCA1 input alone. (G) Evolution of correlation between actual and optimal PPCA1 input tuning (see F) for replay speeds corresponding to hippocampal replay events (black) and real-time physical motion (grey). Position in D, E, and F normalized to [0, 1].

The use of spatial representations in the input pathways allows us to consider simple forms of memories in a navigational context in which a given location on a linear track is associated with the activity of a given CA1 cell. Effectively, such an association generates a CA1 place cell. In line with the PPT, we assumed that the spatial selectivity of this CA1 place cell is initially determined solely by the indirect pathway via the SC, i.e., by place cell input from CA3. The goal of systems memory consolidation is then to transfer this spatial association to the direct input, which reaches the CA1 cell via the PPCA1 derived from grid cells in EC. In other words, place-cell input should supervise grid-cell input to develop a place-cell tuning. Note that we use the spatial setup primarily as an illustration of the theory. We do not make claims regarding the temporal development of CA1 place cells in vivo, which is not fully understood [4951].

SC place field inputs were modelled by synapses that were active only in a small region of the track, whereas individual PPCA1 grid cell inputs were active in multiple, evenly spaced regions along the track (Fig 3A). In terms of the theory, the cue representation in the two pathways is now different, but correlated, because the same location is encoded. The SC and PPCA1 inputs projected to proximal and distal dendrites, respectively (Fig 3B, [52]). Synapses were initialized such that the SC input conductances were spatially tuned and resulted in place field-like activity in the CA1 cell while the PPCA1 input had no spatial tuning (Fig 3D).

During consolidation, SC and PPCA1 input to the CA1 cell consisted of replays of previously encountered sequences of locations [13, 14], with a replay speed 20 times faster than physical motion [13]. During replay, the SC input led to somatic spikes, which in turn triggered backpropagating action potentials that caused calcium spikes in the distal dendrites where the PPCA1 synapses arrive (Fig 3C and 3E, [53]). Through synaptic plasticity, PPCA1 synapses active in the place field of the neuron were potentiated. Over time, the PPCA1 input adopted the spatial tuning of the SC input (Fig 3F, left) and reproduced the original SC-induced place field output (Fig 3F, right) with high correlation (Fig 3G). The fact that the spatial tuning of two inputs is not perfectly matched does not contradict with theoretical results, which merely state that the direct input should attain the best possible linear approximation of the indirect pathway. In the present setting, this approximation is bounded by the finite range of frequencies of the entorhinal inputs (in analogy to reconstructing a high-frequency signal, e.g. a narrow peak, with a finite set of Fourier components), which causes the ringing next to the target peak in Fig 3F (left). In summary, the PPT mechanism therefore consolidated associations even though the spatial representations in the two pathways differed and although the two pathways targeted different neuronal compartments with different numbers of synapses in the CA1 neuron with complex morphology.

The theory also predicts that consolidation is most effective when the correlation time in the input is matched to the time scale of STDP (Fig 2B). In line with this prediction, consolidation failed when replay speed was reduced to that of physical motion (Fig 3G) because the time scale of rate changes in place and grid cell activity is then much longer than the delay between the two pathways and the time scale of STDP (Fig 2C). Accelerated replay during sleep [13] hence supports systems memory consolidation within the PPT by aligning the time scales of neural activity and synaptic plasticity [54], and this alignment is similar to the effect of phase precession during memory acquisition [55].

Finally, we note that the theoretical analysis relied on a separability assumption for the statistics in the two pathways; cf. Eqs (4) and (5). This condition is not fulfilled for sequence replay during consolidation because the time-delayed covariance of different place cells depends on the relative spatial location of their place fields; such correlations are non-separable even for slower replay or during memory acquisition with real-time physical motion. The observation that consolidation was successful nevertheless illustrates that the separability assumption does not need to be fulfilled for the PPT to achieve a successful consolidation.

Consolidation of place-object associations in multiple hippocampal stages

Ultimately, to consolidate memories into neocortex, they have to move beyond the PPCA1. Notably, the PPCA1 is itself part of an indirect pathway from EC to the subiculum (SUB) that is shortcut by a direct connection from EC to SUB (referenced as PPSUB; Fig 4A, left; [29]). This suggests that the PPT can be reiterated to further consolidate memories from the PPCA1 to the PPSUB and beyond.

Fig 4. Consolidation of place-object associations in multiple hippocampal stages.

(A) Structure of the extended model. PPSUB: perforant path to the subiculum. Each area (EC, DG-CA3, CA1, SUB) contains object-coding and place-coding populations. Open arrows: all-to-all connections between these areas. (B) Decoding of consolidated associations. Top: The location of a platform in a circular environment is stored as an object-place association in the SC (thick diagonal arrows in A, right). Middle: Platform position probability maps given the platform object cue, inferred from the CA1 output resulting from SC or PPCA1 alone, at different times during consolidation (see section “Consolidation of place-object associations in multiple hippocampal stages” in Methods). Bottom: Platform-in-quadrant probabilities (±SEM) given PPCA1 input alone during consolidation. Quadrant with correct platform position (target quadrant) in orange. (C) Consolidation from SC to PPCA1 and to PPSUB over four weeks. Each day, a new association is first stored in SC and then partially consolidated. An association on day 0 is monitored in SC, PPCA1, and PPSUB. Panels as in B. (D) Effects of PPCA1 lesions on memory consolidation, model and experiment (data with permission from [27]). Histograms of time (±SEM) spent in quadrants at different delays after memory acquisition (“probe”). Dashed lines at 25% are chance levels. T: target quadrant; Left, Right: adjacent quadrants; O: opposite quadrant. Top: Control without lesion. Middle: Lesion before memory acquisition. Bottom: Lesion 21 days after memory acquisition.

To illustrate this idea, we considered a standard paradigm for memory research in rodents: the Morris water maze [56]. In the water maze, the rodent needs to find a submerged platform (object), i.e., it must store an object-place association. Thus this paradigm requires neural representations of objects (such as the submerged platform) and places. We hence constructed a model in which subregions of the hippocampal formation included neurons that encode places and neurons that encoded the identity of objects (Fig 4A, right).

For simplicity and computational efficiency we switched to a rate-based neuron model (Methods). An object was chosen from a set of 128 different objects and placed in a circular open field environment (Fig 4B, top). As motivated by experiments [3234], we implemented object-to-place associations in our model by enhancing, as before, synaptic connections in the SC, but now between object-encoding neurons in CA3 and place-specific neurons in CA1 (Fig 4A, right). Here, we did not consider place-to-object associations. These are less relevant for the water maze task, where the task is to recall the location of a given object—the platform—rather than to recall which object was encountered at a given location. We tested object-to-place associations stored in the SC by activating the object representation in EC—as a memory cue—and determining the activities in CA1, triggered by the SC alone. From these activities we inferred a spatial probability map of the recalled object location (Fig 4B; Methods).

We first stored a single object-place association in the SC. During a subsequent consolidation cycle—representing one night—place and object representations in EC were then randomly and independently activated. Consistent with our previous results, the object-place association was gradually consolidated from the SC to the PPCA1: after one night of consolidation, the correct spatial probability map of an object location was inferrable from CA1 activity triggered by the PPCA1 alone (Fig 4B).

To track the consolidation process over longer times, we assumed that a new random object-place association is stored in the SC every day. This caused a decay of previous SC memory traces due to interference with newly stored associations (Fig 4C, [57, 58]). During the night following each day, associations in the SC were partially consolidated into the PPCA1, such that the consolidated association could be decoded from the PPCA1 after a single night, but previously consolidated associations were not entirely overwritten. As a result, object-place associations were maintained in the PPCA1 for longer periods than in the SC, thus extending their memory lifetime (Fig 4C). Eventually, a given PPCA1 memory trace would also degrade as new interfering memories from the SC are consolidated. However, as noted above, the PPCA1 itself is part of an indirect pathway from EC to the SUB, for which there is in turn a parallel, direct perforant pathway PPSUB. The association in the PPCA1 (and SC) could therefore, in turn, be partially consolidated into the PPSUB, further extending memory lifetime (Fig 4C). Note that the extension of memory lifetime is supported in the model by a reduced plasticity (i.e. halved learning rate in Eq (33)) in PPSUB compared to PPCA1.

The model suggests that the PPCA1 serves as a transient memory buffer that mediates a further consolidation into additional shortcut pathways downstream. This hypothesis is supported by navigation studies in rats. Using PPCA1 lesions, Remondes and Schuman [27] have shown that the PPCA1 is not required for the original acquisition of spatial memories, but that it is critically involved in their long-term maintenance. However, lesioning the PPCA1 21 days after acquiring a memory did not disrupt spatial memories, suggesting that the PPCA1 is not the final storage site (Fig 4D) and further supporting the idea that the PPCA1 is important to enable a transition from short-term to long-term memories.

To test whether our model could reproduce these experimental results, we simulated PPCA1 lesions either before the acquisition of an object-place association or 21 days later. Assuming that the rat’s spatial exploration is determined by the probability map of the object location [59], the model provided predictions for the time spent in different quadrants of the environment, which were in quantitative agreement with the data for all experimental conditions (Fig 4D). Our model thus suggests that a hierarchical reiteration of parallel shortcuts—the central circuit motif of the PPT—could explain these experiments.

Similar to lesioning the PPCA1, we predict that lesioning PPSUB also has an impact on memory consolidation: PPSUB should act as a transient memory buffer but on a longer timescale than PPCA1. In general, lesioning a pathway with a set of synapses that cover a specific range of time scales such that there is a “gap” should result in an impairment of consolidation if the lesion is done before the memory has “moved on”. To illustrate this idea in more detail, we study in the next section a model with many stages in a hierarchy.

Consolidation from hippocampus into neocortex by a hierarchical nesting of consolidation circuits

Given that shortcut connections are widespread throughout the brain [25, 60, 61], we next hypothesized that a reiteration of the PPT can also achieve systems consolidation from hippocampus into neocortex. To test this hypothesis, we studied a network model (Fig 5A), in which the hippocampus (now simplified to a single area) receives input from a hierarchy of cortical areas, representing, e.g., a sensory system. It provides output to a different hierarchy of areas, representing, e.g., the motor system or another sensory system.

Fig 5. Consolidation from hippocampus into neocortex by hierarchical nesting of consolidation circuits.

(A) Schematic of the hierarchical model. The hippocampal formation (HPC) is connected to cortical input circuit 1 and output circuit 1. Increasing numbers indicate circuits further from the HPC and closer to the sensory/motor periphery. Each direct connection at one level (e.g., dark blue arrow between input 1 and output 1) is part of the indirect pathway of the next level (e.g., for pathways from input 2 to output 2). Learning rates of the direct connections decrease exponentially with increasing level (i.e., from blue to red). (B) Memories gradually propagate to circuits more distant from the HPC. The correlation of the initial HPC weights with the direct pathways is shown as a function of time and reveals a memory wave from HPC into neocortex. The maximum of the output circuits follows approximately a power-law (black curve). Noise level indicates chance-level correlations between pathways. (C) Consolidated memories yield faster responses (from sensory periphery, e.g., Input 8, to system output) because these memories are stored in increasingly shorter synaptic pathways.

The network also contained shortcut connections that bypassed the hippocampus. As in the previous section, new memories were stored in the hippocampus but not in any other indirect connection in the hierarchy. The repeated storage of new memories every day leads to a decay of previously stored hippocampal memories. But memories are also consolidated by Hebbian plasticity in parallel pathways; for details, see Methods.

Tracing a specific memory over time revealed a gradual consolidation into the cortical shortcut connections, forming a “memory wave” [10] that travels from hippocampus into neocortex (Fig 5B). By exponentially decreasing the shortcut learning rate with distance from the hippocampus, a power-law decay of memories can be observed in the union of all shortcuts, e.g., by reading out the shortcut with the strongest memory trace at any moment in time (Fig 5B). This observation is in line with a rich history of psychological studies on the mathematical shape of forgetting curves [28]. Note that for the readout we tried to make as few assumptions as possible by letting all pathways contribute on an equal footing. Taking the maximum over the pathways (as well as the mean) generates a power law. Notably, we achieved memory retention times of years through only a small number (∼5) of iterations of the PPT. Finally, we found that memory retrieval accelerates during consolidation (Fig 5C), in line with consolidation studies for motor skills [62]. In our consolidation model, the time to recall decreases because the path from peripheral input to output becomes shorter through the use of more direct (peripheral) shortcut connections (Fig 5A and 5B).

The predicted consolidation-mediated decrease of the time to recall critically depends on the utilized plasticity rule (STDP), which uses timing of input and output of neurons, and on our assumption that memories are initially acquired in an indirect pathway with a longer delay than direct pathways. While this assumption is reasonable for declarative memories that are initially stored in the hippocampus and then consolidated in sensory or motor areas towards neocortex, the underlying computational reasons for such a strategy are unknown. The strategy of the initial storage of memories in a pathway with a longer transmission delay could be related to the Complementary Learning Systems Theory (CLST) [11, 63] if the initial storage needs some preprocessing, e.g., to achieve representations that are suited for one-trial learning, e.g. population-sparse representations [9]. In general, our results do not imply that the reduction of delay is a central goal of systems memory consolidation or that it is even necessary. Reduction of delay may, however, be a nice side effect of systems memory consolidation with timing-based plasticity rules [64]. And such a reduction of delay does not need to be restricted to declarative memories but also could apply to, e.g., motor skill learning or habit formation.


We proposed the parallel pathway theory (PPT) as a mechanistic basis for systems memory consolidation. This theory relies on two abundant features in the nervous system: parallel shortcut connections between brain areas and Hebbian plasticity. A mathematical analysis suggests that STDP in a direct pathway achieves consolidation by implementing a linear regression that approximates the input-output mapping of an indirect pathway by that of the direct pathway. We applied the PPT to hippocampus-dependent memories and showed that the proposed mechanism can transfer memory associations across parallel synaptic pathways. This transfer is robust to different representations in those pathways and requires only weak correlations. Our results are in quantitative agreement with lesion studies of the perforant path in rodents [27] and are able to reproduce forgetting curves that follow a power-law as observed in humans [28].

Theory requirements and predictions

In addition to the anatomical motif of shortcut connections and Hebbian synaptic plasticity, the parallel pathway theory relies on four further requirements during the consolidation phase, which can also be considered as model predictions.

  1. (1). Temporal correlations between the inputs from the two input pathways are necessary during consolidation, and these correlations should be similar to the ones during storage and recall. For example, a consolidation from hippocampus into neocortex would require correlations between cortical and hippocampal activity, as reported in [65]. Similarly, a consolidation of spatial memories within the hippocampal formation (including the medial entorhinal cortex, MEC) during replay would require correlations between activity in MEC and hippocampus; in particular, the same locations should be replayed, but represented by grid cells in MEC and by place cells in CA3 and CA1, as in Fig 3. A significant but weak correlation between the superficial layers of MEC (which provides input to the hippocampus) and CA1 was indeed observed [45]. Furthermore, pyramidal cells in the superficial layer III (projecting to CA1, “direct path”) and stellate cells in the superficial layer II (projecting to DG, which projects to CA3, “indirect path”) are expected to be correlated due to a strong excitatory feedforward projection from pyramids to stellates [66]; reviewed in [51]. Coordinated grid and place cell replay was also observed in [67] but there CA1 and deep layers of MEC (which receives the hippocampal output) were studied.
  2. (2). The direct pathway should be plastic during consolidation, while the stored associations in the indirect path remain sufficiently stable (in contrast to the model in [24]). In practice, this requires the degree of plasticity to differ between periods of storage and consolidation (e.g., due to neuromodulation [68, 69]), in a potentially pathway-dependent manner. In other words, the requirement is that the content of a memory should not be altered much while creating a backup.
  3. (3). Plasticity in the shortcut pathway should be driven by a teaching signal from the indirect pathway. This can be achieved by STDP in combination with longer transmission delays in the indirect pathway, as suggested here, but other neural implementations of supervised learning may be equally suitable [42, 43, 70].
  4. (4). Within the present framework, a systematic decrease in learning rates within the consolidation hierarchy (Fig 5) is needed to achieve memory lifetimes on the order of years. That is, synapses involved in later stages of consolidation should be less plastic during consolidation periods such as sleep, as also suggested by [10] and [24]. Furthermore, Roxin and Fusi elegantly showed in [10] that a multistage memory system confers an advantage (in terms of memory lifetime, memory capacity, and initial signal-to-noise ratio) compared to a homogeneous memory system with the same number of synapses, which provides a fundamental computational reason for the existence of a memory consolidation processes at the systems level. However, to be able to exploit this advantage, an efficient mechanism to transfer memories across stages is necessary. The proposed PPT explains how memories can be transferred in a biologically plausible way in a multistage memory system.
    Conceptually related to models of systems-memory consolidation with a systematic decrease in learning rates across a hierarchy of networks are models of synaptic memory consolidation with complex synapses that can assume many different states and a decrease of plasticity across a hierarchy of states. In such models of synaptic memory consolidation, also a power-law forgetting has been achieved [8, 71]. Synaptic and systems memory consolidation models are different but not mutually exclusive.

What limits systems memory consolidation?

Our account of systems memory consolidation explains how memories are re-organized and transferred across brain regions. However, certain forms of episodic memory remain hippocampus-dependent throughout life [21].

In the context of the present model, this restriction could result from different factors. The PPT simplifies memory engrams by replacing multisynaptic by monosynaptic connections whenever possible. However, a shortcut pathway may not be present anatomically, or it may not host an appropriate representation for a given cue-response association in question. For example, it may be difficult to consolidate a complex visual object detection task into a shortcut from primary visual cortex (V1) to a decision area because the low-level representation of the visual cue in V1 may not allow it [72, 73]. The same applies to tasks that require a mixed selectivity of neural responses [74]. Such tasks cannot be fully consolidated into shortcuts with simpler representations of cues and/or responses that do not allow a linear separation of the associations. On the basis of similar arguments, early work suggested that the hippocampus could be critical for learning tasks that are not linearly separable [75].

Within the present framework, the consolidated memory is in essence a linear approximation of the original cue-response association, as indicated in the theoretical analysis around Eqs (7) and (8). The resulting simplification of the memory content could underlie the commonly observed semantization of memories and the loss of episodic detail [20, 21]. Such a semantization could already occur in the earliest shortcut connections [76], but could also gradually progress in a multi-stage consolidation process.

Relation to phenomenological models of systems consolidation

The basic mechanism of our framework explains memory transfer between brain regions, which is in line with the Standard Consolidation Theory (SCT) [11, 19]. Our theoretical framework is closely related to the Complementary Learning Systems Theory (CLST) [11, 63], which posits that slow and interleaved cortical learning is necessary to avoid catastrophic interference of new memory items with older memories [77]. In our model, later—presumably neocortical—shortcut connections have lower learning rates to achieve longer memory retention times. Interleaved learning could be achieved by interleaved replay [7880] during consolidation. Thereby, the results of CLST can be directly applied to learning in shortcuts in our model, such as the rapid neocortical consolidation of new memories that are in line with a previously learned schema [17, 63, 81].

Limitations of memory transfer between brain regions—as discussed above—can impair the consolidation process, resulting in memories that remain hippocampus-dependent throughout life. Hence, our theoretical framework is also in agreement with the Multiple Trace Theory (MTT) [16] and the Trace Transformation Theory (TTT) [20, 21]. The MTT postulates that memories are re-encoded in the hippocampus during retrieval, generating multiple traces for the same memory. Our model maintains multiple memory traces in different shortcut pathways, even without a retrieval-based re-encoding. The consolidation mechanism of the PPT, however, could also transfer a specific memory multiple times if it is re-encoded during retrieval. If neocortex extracts statistical regularities from a collection of memories [11], the consolidation of such a repeatedly re-encoded memory could then lead to a gist-like, more semantic version of that memory in neocortex [16, 21, 82], as emphasized by the TTT.

The premise of our model is that memories are actively transferred between brain regions. This premise has recently been subject to debate [8385], following the suggestion of the Contextual Binding (CB) theory. The CB theory argues that amnesia in lesion studies and replay-like activity can be explained by simultaneous learning in hippocampus and neocortex, together with interference of contextually similar episodic memories [83]. Note, however, that our framework does not exclude a simultaneous encoding in neocortex and hippocampus, which can be combined with active consolidation [1, 86].

Hence, our mechanistic approach is in agreement with and may allow for a unification of several phenomenological theories of systems consolidation.

Consolidation of non-declarative memories

Given that shortcut connections are widespread throughout the central nervous system [25, 60], the suggested mechanism may also be applicable to the consolidation of non-declarative memories, e.g., of perceptual [4] and motor skills [5], fear memory [87] or to the transition of goal-directed to habitual behaviour [88].

Several studies have suggested two-pathway models in the context of motor learning [8992]. In particular, Murray and Escola [92] recently used a two-pathway model to investigate how repeated practice affects future performance and leads to habitual behaviour. While their model does not incorporate an active consolidation mechanism or multiple learning stages, the basic mechanism is the same: A fast learning pathway from cortex to sensorimotor striatum first learns a motor skill and then teaches a slowly learning pathway from thalamus to striatum during subsequent repetition.

Limitations of the model and future directions

The present work focuses on feedforward networks and local learning rules. Hence, the model cannot address how systems memory consolidation affects the representation of sensory stimuli and forms schemata that facilitate future learning [17, 81] because representation learning typically requires a means of backpropagating information through the system, e.g., by feedback connections [93]. The interaction of synaptic plasticity with recurrent feedback connections generates a high level of dynamical complexity, which is beyond the scope of the present study. Our framework also does not explain reconsolidation, that is, how previously consolidated memories become labile and hippocampus-dependent again through their reactivation [94, 95].

On the mechanistic level, the PPT predicts temporally specific deficits in memory consolidation when relevant shortcut connections are lesioned, that is, a tight link between the anatomical organisation of synaptic pathways and their function for memory. These predictions may be most easily tested in non-mammalian systems, where connectomic data are available [96].

The PPT could provide an inroad to a mechanistic understanding of the transformation of episodic memories into more semantic representations. This could be modelled, e.g, by encoding a collection of episodic memories that share statistical regularities and studying the dynamics of statistical learning and semantisation in the shortcut connections during consolidation. Such future work may allow us to ultimately bridge the gap between memory consolidation on the mechanistic level of synaptic computations and the behavioural level of cognitive function.


Consolidation in a single integrate-and-fire neuron

For the results shown in Fig 1E and 1F we used a single integrate-and-fire model neuron that received excitatory synaptic input. The membrane potential V(t) evolved according to (9) with membrane time constant τm = 20 ms, resting potential Vrest = −70 mV, and synaptic reversal potential Esyn = 0 mV. When the membrane potential reached the threshold Vthresh = −54 mV, the cell produced a spike and the voltage was reset to −60 mV during an absolute refractory period of 1.75 ms.

The total synaptic conductance gsyn(t) in Eq (9) is denoted in units of the leak conductance and thus dimensionless (parameters are taken from [97]). The total synaptic conductance was determined by the sum of 1000 Schaffer collateral (SC) inputs and 1000 perforant path (PPCA1) inputs. Activation of input i (where i denotes synapse number) leads to a jump gi > 0 in the synaptic conductance: (10) All synaptic conductances decay exponentially, (11) with synaptic time constant τsyn = 5 ms. The PPCA1 inputs were activated by mutually independent Poisson processes with a mean rate of 10 spikes/s. The activity patterns of the SC fibers were identical to those of the PPCA1 fibers but were delayed by 5 ms.

The synaptic peak conductances or weights, gi, were either set to a fixed value or were determined by additive STDP [98]. A single pair of a presynaptic spike (at time tpre) and a postsynaptic spike (at time tpost) with time difference Δttpretpost induced a modification of the synaptic weight Δgi according to (12) with τSTDP = 20 ms. Lt) is the learning window of STDP [98]. Hard upper and lower bounds were imposed on the synaptic weights, such that for all i, where the dimensionless maximum synaptic weight was . Parameters and A = 1.05 ⋅ A+ with η = 0.005 determine the maximum amounts of LTP and LTD, respectively.

Synaptic weights were initialized to form a bimodal distribution, such that it agrees with the steady state weight distribution resulting from additive STDP, when presynaptic input consists of uncorrelated Poisson spike trains [98]. Specifically, half the weights were sampled from an exponential distribution with mean , the other half as minus that same exponential distribution.

The dynamics were integrated numerically using the forward Euler method, with an integration time step of 0.1 ms.

Consolidation of spatial representations in a multi-compartment neuron model

The results presented in Fig 3C–3G relied on numerical simulations of a conductance-based compartmental model of a reconstructed CA1 pyramidal cell (cell n128 from [99]). Passive cell properties were defined by the membrane resistance Rm = 30 kΩ cm2 with reversal potential EL = −70 mV, intracellular resistivity Ri = 150 Ωcm, and membrane capacitance Cm = 0.75μF/cm2. Dendrites were discretized into compartments with length smaller than 0.1 times the frequency-dependent passive space constant at 100 Hz. Three types of voltage-dependent currents and one calcium-dependent current, all from [100], were distributed over the soma and dendrites. Gating dynamics of the currents evolved according to standard first-order ordinary differential equations. The steady state (in)activation functions x and voltage-dependent time constants τ for each gating variable (i.e., x = m, h, n; see below) were calculated from a first-order reaction scheme with forward rate αx and backward rate βx according to x(V) = αx(V)/(αx(V) + βx(V)) and τx(V) = 1/(αx(V) + βx(V)) where V was the membrane potential. All used current densities and time constants were selected for a temperature of 37°C (see [100]).

A fast sodium current, INa, was distributed throughout the soma ( pS/μm2) and dendrites ( pS/μm2), except from the distal apical dendritic tuft, (13) with reversal potential ENa = 60 mV. The dynamics of activation gating variable m and inactivation gating variable h were characterized by (14) Here and in the following, we dropped units for simplicity, assuming that the membrane potential V is given in units of mV.

The steady-state inactivation function was defined directly as (15)

A fast potassium current, IKv, was present in the soma ( pS/μm2) and throughout the dendrites ( pS/μm2), (16) with reversal potential EK = −90 mV and with activation gating variable n characterized by (17)

A high-voltage activated calcium current, ICa, was distributed throughout the apical dendrites ( pS/μm2) with an increased density ( pS/μm2) for dendrites distal from the main apical dendrite’s bifurcation, (18) with reversal potential ECa = 140 mV and with activation gating variable m and inactivation gating variable h characterized by (19)

A calcium-dependent potassium current, , was similarly distributed throughout the apical dendrites ( pS/μm2) with an increased density ( pS/μm2) beyond the main bifurcation of the apical dendrite, (20) with activation gating variable n characterized by (21) with [Ca2+] in μM.

Internal calcium concentration in a shell below the membrane surface was computed using entry via ICa and removal by a first-order pump, (22) with Faraday constant F, depth of shell d = 0.1 μm and with [Ca2+] = 0.1μM, and τR = 80 ms. To account for dendritic spines, the membrane capacitance and current densities were doubled throughout the dendrites. An axon was lacking in the cell reconstruction and was added as in [100].

Excitatory synaptic inputs were distributed over the membrane surface. Upon activation of a synapse, the conductance with a reversal potential of 0 mV increased instantaneously and subsequently decayed exponentially with a time constant of 3 ms. The PPCA1 provided 500 inputs that were distributed with uniform surface density throughout the distal apical tuft dendrites; the SC provided 2500 inputs, distributed uniformly over basal dendrites and proximal apical dendrites [52].

All inputs were spatially tuned on a 2.5 m long linear track over which the simulated rat walked. The PPCA1 inputs showed periodic, grid field-like spatial tuning with periodicity ranging from 2 to 6 grid fields along the entire track with random phase: , where is the Heaviside step function, r is the mean firing rate within the grid field, k is the spatial frequency, and ξi is the random spatial phase offset for neuron i (for i = 1, …, 500). The 2500 SC inputs showed place field-like tuning, having single, 25 cm long place fields distributed uniformly random along the track. When the virtual rat was within the place or grid field of an SC or PPCA1 fiber, respectively, the input was activated as an independent Poisson process with a mean rate of r = 10 spikes/s. Outside of the place/grid fields the fibers were quiescent. Simulations of the consolidation phase considered replay of the rat walking back and forth along the linear track, with running speeds increased, compared to realistic speeds, by a factor 20 (5 m/s; [13]). SC input activity to the CA1 cell was delayed by 5 ms with respect to the PPCA1 input [101], accounting for the extra processing stages involved for information reaching CA1 from the entorhinal cortex through DG and CA3, compared to the direct entorhinal PPCA1 input.

The PPCA1 and/or SC inputs showed additive STDP, operating in the same manner as defined around Eq (12). Post-synaptic spikes were defined as local voltage crossings of a threshold at −30 mV. The maximum synaptic weight for the SC inputs was 400 pS and 140 pS for the PPCA1 inputs.

The reference tuning curve shown in Fig 3F (PPCA1 inputs theory) was computed by adding up all grid field tuning functions that had an active field in the SC-encoded spatial position (i.e., halfway along the linear track).

Simulations were carried out with a fixed time step of 25 μs using the NEURON simulation software [102].

Consolidation of place-object associations in multiple hippocampal stages

The results related to Fig 4 show the acquisition and consolidation of place-object associations in a hippocampal network model. Every day a virtual animal learns the position of one of many possible objects in a circular open field environment. The simulations show that during a subsequent sleep phase, replay of the hippocampal activity that is associated with runs through this environment allows for the consolidation of the place-object association. We call the imprinting of a new memory and the subsequent memory consolidation phase a consolidation cycle. In the simulations, a place-object association learned at time t = 0 is tracked for Ncycle consolidation cycles, i.e., nights after memory acquisition. Between consolidation cycles, the memory in the system is assessed as described below.

Model architecture.

The model consists of four neuronal layers: entorhinal cortex (EC), dentate gyrus/CA3 (DG-CA3; note that the dentate gyrus is not explicitly included as a separate area), CA1, and the subiculum (SUB). Each layer consists of a population of place-coding cells and a population of object-coding cells. The connectivity is depicted in Fig 4A: EC projects to DG-CA3, which connects to CA1 (through the SC pathway), which in turn connects to the SUB. EC provides also shortcut connections to CA1 (PPCA1 pathway) and the SUB (PPSUB pathway).

The SC, PPCA1, and PPSUB pathways consist of four different connection types among populations of neurons that represent either place or object: (i) from object (populations) to object (populations), (ii) from place to place, (iii) from object to place, and (iv) from place to object. For simplicity, the pathway from CA1 to the SUB consists only of place-to-place and object-to-object connections, because we never store object-place or place-object associations in this pathway. The pathway from EC to DG-CA3 was not explicitly modelled. Instead, we assumed that the same location (of the virtual animal) is represented in both areas, but with a grid cell code and a place cell code, respectively. We assumed that all connections have the same transmission delay, which is equal to one time step D = ΔT = 5 ms in the simulation (see Table 1 for parameter values). In practice, this meant that the activities in the SC pathway and the connection from CA1 to the SUB each had a transmission delay D relative to the activities in the connections from EC to DG/CA1 and from EC to SUB.

Activities of neurons in each layer were described as firing rates and were determined by a linear model, (23) (24) where xEC(t) and xCA3(t) are the activities in the input layers EC and DG-CA3, respectively, and yCA1(t) and ySUB(t) represent the activities in the output layers CA1 and SUB, respectively. Time is denoted by t. The symbols WPP-CA1 and WPP-SUB denote the weight matrices of the pathways from EC to CA1 and from EC to SUB, respectively. The matrices VSC and VCA1-SUB summarise the weights from DG-CA3 to CA1 and from CA1 to SUB, respectively, which mediate the transmission delay D. Eqs (23) and (24) are identical in structure to Eq (1) except that now the output is a vector (and not a scalar) and the synaptic weights are a matrix (and not a vector).

As already mentioned above, each neuron in a layer is assumed to primarily encode either place or object information (see Fig 4A). To simplify the mathematical analysis, we turn to a notation where we write a layer’s activity vector z (where z = xEC, xCA3, yCA1, or ySUB) as a concatenation of place and object vectors: (25) where the number of place- and object-coding cells is identical, dim(zplace) = dim(zobject) = N, hence dim(z) = 2N. Correspondingly, the weight matrices M (where M = WPP-CA1, WPP-SUB, VSC, or VCA1-SUB) are composed of four submatrices, connecting the corresponding feature encoding sub-vectors (place-place, place-object, object-place, and object-object): (26) Associations between objects and places were initially stored in VSC as described below. To achieve a consistency in the code for places and objects, the weights in VSC and VCA1-SUB that connect neurons coding for the same feature (i.e., place-place or object-object) were set proportional to identity matrices I, (27) (28) The scaling factors and ensure that these pathways had similar impact as the other pathways projecting to CA1 cells and SUB cells, respectively, and is twice as large as to account for the fact that only in the CA1-SUB pathway the object-place and place-object connections were set to zero. The matrices WPP-CA1 and WPP-SUB, which represent shortcuts, were plastic during a consolidation cycle and evolved according to the learning rule described below. Their initial values were chosen as a random permutation of an equilibrium state, taken from a long running previous simulation.

Place- and object-coding cells.

Place-coding cells in EC and DG-CA3 were assumed to respond deterministically, given a two-dimensional position variable p(t) ∈ [0, 1]2, which evolves in time.

Place-coding cells in entorhinal cortex show grid field spatial tuning [48], which we modelled as a superposition of 3 plane waves with relative angles of : (29) where the spacing is chosen so that a total range of 2 to 6 periods fit into the circular environment. The orientation of the plane waves is determined by the vector where θi are uniformly chosen random angles, and pi ∈ [0, 1]2 are uniformly sampled random phases of the grid field [49]. Each cell’s output rate varies between 0 to rmax spikes per second.

Place-coding cells in DG-CA3 show place-field tuning and were assumed to have a 2D Gaussian activity profile (30) where rmax is the maximum rate, σ the field size, and ci the centre of field i. The centres ci were chosen to lie on a regular grid.

The object-coding cells in EC and DG-CA3 respond with fixed deterministic responses and to each of Nobject objects. Given that they are located in the same brain region, we assumed that the firing-rate statistics of the object-coding cells and the place-coding cells were similar, both in EC and CA1. This was ensured by calculating the rates of the object-coding cells in two steps. First, we used the same equations as for the place-coding cells (i.e., Eq (29) for EC cells and Eq (30) for DG-CA3 cells) with a randomly selected “object position” oi, i ∈ {1, ‥, Nobject} for each of the Nobject objects. Subsequently the rates of the neurons within the population were randomly permuted for each object, to avoid an artificial constraint of the population activity onto a 2-dimensional manifold.

Imprinting of place-object associations in the SC pathway.

The virtual animal learned a single new object-to-place association each day. Storing more memories per day would not qualitatively change the results, but would merely alter the time scale at which a given memory is overwritten in the SC pathway. Memories were imprinted in VSC by first determining the activities of the object-coding DG-CA3 cells and place-coding CA1 cells given a random object and a random position where the object was encountered (see previous section). The weights in VSC that connect object cells to place cells were then updated according to (31) where 0 < λSC < 1 (numerical values of parameters are summarized in Table 1) denotes the strength of the new memory and controls the rate of forgetting. The symbol [M]norm denotes the normalized version of the matrix M; the normalisation ensures that the biggest sum along the columns of [M]norm was 1 by rescaling all entries of M with the same factor. The specific choice of the normalisation does not alter the results. The inner norm in Eq (31) ensures the same relative influence of different memories, irrespective of the associated activity levels. This ensures an approximately constant rate of overwriting/forgetting. The outer norm guarantees that the weights VSC stay bounded and hence induces forgetting. As a consequence of this updating scheme, the memories are lost over time. Note that before we imprint a new memory to VSC (other than on day 0 on which the place-object association is learned that is tracked during the simulation), the place-coding cells in DG-CA3 are remapped, i.e., they are assigned to new random positions. This corresponds to learning the new object in a new environment/room, and effectively reduces the amount in interference between memories. Before starting a simulation, we imprinted Nmem place-object associations to VSC to ensure an equilibrium state.

The weights from place-to-object coding cells could be updated analogously. This would allow to decode the identity of a stored object given a location. We did not test this direction of the object-place association, because this is not relevant for the water maze task.

Learning rule operating on PPCA1 and PPSUB pathways.

The plastic weight matrices WPP-CA1 and WPP-SUB changed according to a timing-based learning rule [41]: (32) where W is either WPP-CA1 or WPP-SUB, and y correspondingly yCA1 or ySUB. The learning window L(τ) defined in Eq (12) determines the learning dynamics.

Eq (32) differs from the corresponding Eq (2) in several ways. First, on the left-hand side there is now a derivative, in contrast to the earlier version with a differential quotient; and on the right-hand side we omit the angular brackets that indicated a temporal average. Therefore, Eq (32) represents the instantaneous change of weights for a particular input, which is numerically more straightforward to implement in an online-learning paradigm. The resulting weight change for long times and many inputs approximates well Eq (2) if consolidation is slow enough. Second, we now omit the learning rate parameter η, which is absorbed in the definition of the parameters A+ and A of the learning window L. Third, there are now two addends in the integral and the integration limits are from 0 to ∞. This is equivalent to the earlier definition, but more convenient for a numerical implementation. All this allows to simplify the description of the learning dynamics, as will be outlined in what follows.

We integrated the learning dynamics using the Euler method, with time steps ΔT equal to the inverse pattern presentation rate. In practice, we used the standard method of calculating pre- and postsynaptic traces and to integrate the equation (33) where A+ and A again determine the maximum amount of potentiation and depression of the synaptic weights, respectively. Note that these parameters effectively control the learning rate and are chosen twice as large in the PPCA1 than in the PPSUB (Table 1), to increase memory lifetime in the latter shortcut. Again, we used an exponential window function L(τ), so that exponentially filtered activities and can be calculated as in [98]: (34) where τSTDP determines the width of the learning window.

Weight values are constrained to the interval [0, wmax]. The weights of WPP-CA1 and WPP-SUB were initialized to small random values from a uniform distribution in .

For each iteration in a consolidation cycle of duration Tc, i.e., every ΔT = 5ms, we chose a random input position and a random object to calculate the activities in all layers. These activities were then used to update the weights as given in Eq (33).

Assessing the strength of memories in SC, PPCA1, and PPSUB.

To assess the memory strength encoded in a pathway, we determine the activity yplace of place-coding cells (in either CA1 or SUB) in response to an object o ∈ {1, …, Nobjects} along the object-to-place pathway under consideration (e.g., for PPCA1 it would be from object-coding cells in EC to place-coding cells in CA1). From this response we decode the memorized place of the object using Bayesian inference. However, the response is usually corrupted due to various factors such as imperfect imprinting, consolidation, or interference with other memories. Assuming that these imperfections result from a superposition of many statistically independent factors, we use a Gaussian likelihood: (35) where is the multivariate Gaussian probability density function, σnoise is the standard deviation of the noise, i.e., the imperfections. I is the identity matrix, i.e., we assumed uncorrelated noise in the responses.

The expected activity μ(p) depends on the location p and is given by the activity that would result from the activation of place-coding cells in EC or DG-CA3, i.e., by Eqs (30), (23) and (24). Because the connections between place-coding cells in DG-CA3, CA1, and SUB are scaled identity matrices, the expected activity μ(p) is essentially a place-cell code: (36) To avoid a dependence on overall activity levels, μ(p) and yplace are normalized to zero mean and unit variance.

Using Bayes’ theorem we can now calculate the posterior probabilities of the places that coded for the given response yplace: (37) (38) (39) where for Eq (38) we used a flat prior, because the environment was uniformly sampled in the simulations. To avoid the explicit evaluation of the sum in the denominator, we normalise the evaluated place probabilities to sum to one. We make use of the linear relationship of the place response given an object (see Eqs (23) to (26)): (40) where the matrix Mobject,place is either , , or , depending on the pathway for which the strength of the memory is assessed. This allows to compute the posterior probability of the place given an object (Fig 4B and 4C): (41)

Memory consolidation over many days.

To simulate a single consolidation cycle (i.e., a storage of a new memory followed by a single consolidation phase), we alternated the imprinting of a new place-object association (Eq (31)) with a consolidation phase of length Tc. Before starting the experiments, we equilibrated the weights WPP-CA1 and WPP-SUB by simulating Nequi consolidation cycles. At day 0 we imprinted the object : the memory which was tracked. After each following consolidation phase the place probabilities along the different pathways were calculated for object according to Eq (41) (see Fig 4C).

Lesion experiments.

Remondes and Schuman [27] lesioned the perforant path (temporoammonic pathway) during a Morris water maze consolidation experiment. Their finding evidenced a role of the perforant path in memory consolidation by showing that the precise time-point of the lesion after memory acquisition determined whether the memory persisted (see Fig 4D).

In our simulations we implemented a lesion by setting all PPCA1 weights to 0 (WPP-Ca1 = 0) and by disabling their plasticity. Like in the experimental setup of [27], we lesioned either right before or 21 days after presentation of object . For each day and lesioning protocol, the place probabilities, Eq (41), along the pathways can then be calculated. The pathway with the highest inferred object position probability was then selected, and the summed probabilities per quadrant were calculated for this pathway. To account for exploration versus exploitation (see, e.g., [103]) of the rats, the inferred probabilities were linearly mixed with a uniform distribution over the quadrants. We used 70% explore versus 30% exploit for the plots in Fig 4D. Note that we assumed that the probabilities per quadrant correspond to the time spent in each quadrant.

Consolidation in a hierarchical rate-based network

Fig 5 demonstrates the consolidation of memories in a hierarchy of connected neural populations. In the model, signals flow along distinct neocortical neural populations to the hippocampal formation (HPC) and back into neocortex (black arrows in Fig 5A). Shortcut connections exist between the neocortical populations (colored arrows in Fig 5A). All connections carry the same transmission delay D.

Every day new memories are imprinted into the weight matrix representing the HPC. The model describes the transfer of the memories into neocortex during Ncycle consolidation phases, of which there is one per night (for all model parameters and values, see Table 2). In contrast to the model for Fig 4, we do not consider object-place associations, but directly analyse correlations between a stored memory weight matrix and the weight matrices that describe the neocortical shortcut connections.

Model details.

We consider a hierarchy of 2L neocortical populations with L = 8 shortcut connections. Activities of the populations that project towards the HPC are given by vectors xi(t) and the activities of the populations leading away from the HPC by vectors yi(t) (i ∈ {1, …, L}). At each iteration, the activities xL(t) (i.e., the neocortical population most distal from the HPC) are sampled from a Gaussian distribution with a mean input rate r and a standard deviation r/2. The sampled activities are rectified to be non-negative (r ← max(r, 0)), hence yielding a rectified Gaussian distribution. The activities on all other layers are then determined by their respective connections. For simplicity, we assume that weight matrices connecting subsequent populations in the hierarchy (black arrows in Fig 5A) are identity matrices that are scaled such that activity levels remain comparable along the hierarchy (see below). The results do not depend on this simplifying assumption. The population activities along the HPC directed path are then given as (42) In Fig 5, we modelled the HPC as a single neural population, with activities given by (43) Here, VHPC is the hippocampal-formation weight matrix into which new memories are imprinted (see below).

The first outward-directed neocortical population receives input from the HPC and through a shortcut connection from the activities x1, (44) Using Eq (43), we obtain (45) Note that Eq (45) is slightly different from Eq (1) because we have included the delay D now also in the direct pathway, for consistency; this does not influence the learning dynamics or the applicability of the theoretical analyses because the same delay is included in the learning rule in Eq (48). Subsequent activities yi of populations projecting away from HPC are calculated as (46) where Wi are the direct shortcut connections from the populations xi to the populations yi.

Memory imprinting to the HPC weight matrix VHPC is analogous to the imprinting used in Fig 4 (compare Eq (31)). Before each consolidation phase, new memories were sampled from a binomial distribution B(1, 0.5). The HPC weights were then updated as (47) where denotes the L1 normalization of each row of the matrix M and 0 < λ < 1 is the strength of a new memory.

All shortcut connections Wi showed plasticity similar to Eqs (33) and (34), i.e. (48) and (49) with parameters τSTDP, , and specified in Table 2. Weights were constrained to the interval [0, wmax] with and N being the number of neurons per layer. Initial weights were drawn from a uniform distribution in this interval. To increase memory lifetime in the system, learning rates were decreased along the hierarchy such that the learning rate in layer i is smaller than that in layer 1 by a factor qi−1. Hence, layers closer to the HPC are more plastic than more remote layers.

Before starting the main simulation of Ncycle consolidation cycles, we equilibrated the weight matrices by simulating Nequi consolidation cycles.

Assessing the strength of memories in neocortical weight matrices.

To assess the decay of memory in the system, a reference memory Vref, i.e. a specific realization from a row-normalized binomial distribution B(1, 0.5), was imprinted according to Eq (47) to VHPC at time t = 0. The memory pathway correlation, i.e., the Pearson correlation of this reference memory with all shortcut weight matrices Wi was then calculated.

In analogy to the Methods on Fig 4, the maximum correlation (across layers) was taken as the overall memory signal of the system. This yields the power law in Fig 5B. The noise level indicated in Fig 5B is the standard deviation of the correlation between two random matrices drawn from a binomial distribution B(1, 0.5) and then row-normalized, both having sample size N2. Considering the central limit theorem, the noise level will be approximately 1/N.

Theoretical analysis of hierarchical consolidation

As outlined in the Results and illustrated in Fig 5, the suggested consolidation mechanism can be hierarchically iterated and leads to power law forgetting when the learning rates in the various pathways are suitably chosen. To get a theoretical understanding of this behaviour, let us consider the architecture shown in the Fig 6A, which is a generalized version of Fig 5A. The network consists of a hierarchy of N + 1 input layers and N + 1 output layers. For mathematical simplicity, the network is assumed to be linear (in contrast to the model described in Fig 5A, which was nonlinear due to biologically motivated weight constraints), and the representation in the input layers is assumed to be the same, i.e., the weight matrices between the input layers (indicated in black in Fig 6A) are all simply the identity matrix (in contrast to the model described in Fig 5A where the identity matrices were also scaled). Similarly, we also assume that all weight matrices between the output layers are also the identity matrix. The mathematical derivations presented in the following can be generalized to arbitrary weight matrices both in the input and the output pathways, but we prefer to treat the simple case to avoid cluttered equations and to make the theoretical approach more accessible.

Fig 6. Mathematical analysis of the hierarchical consolidation network.

(A) The mathematical analysis is performed for a network consisting of N + 1 input and N + 1 output layers. All output layers (except output layer 0) weight the input from the previous layer with a factor α and the input via the shortcut pathway with a factor 1 − α, to ensure that activity does not rise as increasingly many pathways converge onto the output layers. Input layer i is hence connected to output layer i through a shortcut connection with weight matrix (1 − α)Wi (except for the bottom-most layers i = 0, for which no factor 1 − α is required). All connections between input layers are set to the identity matrix I, and all connections between output layers are set to αI, for notational simplicity in the derivations. The math can be generalized to arbitrary connection matrices, as long as the network is linear. Each connection introduces a synaptic delay of D. The multi-synaptic pathway from input layer i to output layer i via shortcut connection ji has a total delay of (2(ij) + 1) ⋅ D, so the difference in delays between the pathway through shortcut i and shortcut j is Dij = 2(ij) ⋅ D. (B) The similarity Oi of the weight matrix W0 (in which memory traces are initially stored) and the shortcut connection Wi as a function of the time elapsed after storage (colored lines), and their maximum (black line). Simulations shown for D = 2 ms, α = 0.8, ηi = 2i and STDP time constant τSTDP = 40 ms.

We assume that due to newly acquired memories during the day, the weight matrix W0(t) (earlier called VHPC) that represents the memory trace in the hippocampus is varying in time, with an exponentially decaying autocorrelation function with time constant , where tr denotes the trace of a square matrix.

All other pathways that project from an input layer to an output layer are plastic according to STDP. To derive the learning dynamics for these pathways, we first have to calculate the activity yi in the i-th output layer, (50) where xj denotes the activity in input layer j and cij denote weighting factors that determine the impact of the jth pathway, i.e. the indirect pathway via Wj, on output layer i. These weighting factors are needed, because we would like to keep the weight matrices on a similar scale, but avoid that the activity increases from one output region to the next, because more synaptic pathways converge onto “later” output layers. The symbol Dij = 2D(ij) (defined only for ij) denotes the total additional delay that is accumulated on the connection from the i-th input layer to the i-th output layer that traverses the j-th direct “shortcut” pathway, relative to the direct shortcut from input layer i to output layer i. For simplicity, we assumed that all connections have the same delay D. In a very similar way as in Eq (3), the learning dynamics of the weight matrix Wi in the direct path can be written as (51) where ηi denotes the learning rate for the i-th pathway. For simplicity, we will assume that the different components of the input signal vector xi(t) are uncorrelated amongst each other, and have identical temporal autocorrelations that are also independent of the layer index: , where I is the identity matrix. The learning dynamics then simplify to (52) with A(D) ≔ ∫L(τ)f(τD) dτ.

To measure the degree to which a memory trace that is stored in the weight matrix W0 at time t = 0 is still present in the j-th shortcut pathway at a later time t, we compare the weight matrix Wj(t) at time t to the weight matrix W0(0) at time t = 0. We quantify the correlation of these two matrices by calculating the summed overlap of the column vectors: (53)

Note that the overlaps Oi(t) are real numbers, and that their temporal dynamics for the shortcut connections (i.e., for all i > 0) are dictated by the dynamics of the weight matrices in the network: (54) (55) (56) To capture the exponential decay of the initially stored memories in the “hippocampal” weight matrix W0 due to the storage of new memories, the set of dynamical equations is completed by (57) Note that the dynamics of the overlaps Oi form a linear dynamical system.

To show that this mathematical description shows a power-law behavior akin to the simulated system in Fig 5, we simulated the equations with the following parameter choices. Consistent with the exponential decay of the learning rates in the simulations, we chose the learning rates as ηi = 2i. The weighting factors cij were chosen based on the assumption that output layer i (for i > 0) receives a fraction α of its input from the output layer i − 1 below, and a fraction 1 − α via its direct shortcut connection (associated with the weight matrix Wi). Taking into account that the signal reaching layer i through shortcut connection j traverses several of these weighting stages (Fig 6A), this choice yields cij = αij for j = 0 and cij = αij(1 − α) for j > 0. Note that , so the activity level in different output layers should be similar. Finally, we assume that each synaptic transmission generates a fixed delay D and that the autocorrelation function f(τ) decays much more quickly than the STDP learning window. In this case, we can approximate .

For the simulations illustrated in Fig 6, we chose τSTDP = 40 ms as the time constant of an exponentially decaying STDP learning window for positive delays τ > 0, and we set A+ = 1 in Eq (12). Furthermore, we used D = 2 ms. As shown in the Fig 6B, the maximum of the overlaps Oj indeed approximates a power law decay.


We would like to thank Naomi Auer, Tiziano D’Albis, and Robert Gütig for discussions and feedback on the manuscript.


  1. 1. Dudai Y, Karni A, Born J. The consolidation and transformation of memory. Neuron. 2015;88(1):20–32. pmid:26447570
  2. 2. Squire LR, Genzel L, Wixted JT, Morris RG. Memory consolidation. Cold Spring Harbor Perspectives in Biology. 2015;7(8):a021766. pmid:26238360
  3. 3. Sekeres MJ, Moscovitch M, Winocur G. Mechanisms of Memory Consolidation and Transformation. In: Axmacher N, Rasch B, editors. Cognitive Neuroscience of Memory Consolidation. Switzerland: Springer International Publishing; 2017. p. 17–44.
  4. 4. Karni A, Tanne D, Rubenstein BS, Askenasy JJ, Sagi D. Dependence on REM sleep of overnight improvement of a perceptual skill. Science. 1994;265(5172):679–682. pmid:8036518
  5. 5. Brashers-Krug T, Shadmehr R, Bizzi E. Consolidation in human motor memory. Nature. 1996;382(6588):252–255. pmid:8717039
  6. 6. Grossberg S. The Adaptive Brain I. Amsterdam: Elsevier Science; 1987.
  7. 7. Abraham WC, Robins A. Memory retention—the synaptic stability versus plasticity dilemma. Trends Neurosci. 2005;28(2):73–78. pmid:15667929
  8. 8. Fusi S, Drew PJ, Abbott LF. Cascade models of synaptically stored memories. Neuron. 2005;45(4):599–611. pmid:15721245
  9. 9. Leibold C, Kempter R. Sparseness constrains the prolongation of memory lifetime via synaptic metaplasticity. Cereb Cortex. 2008;18(1):67–77. pmid:17490993
  10. 10. Roxin A, Fusi S. Efficient partitioning of memory systems and its importance for memory consolidation. PLoS Computat Biol. 2013;9(7):e1003146. pmid:23935470
  11. 11. McClelland JL, O’Reilly BL, McNaughton RC. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychol Rev. 1995;102(3):419–457. pmid:7624455
  12. 12. Kumaran D, Hassabis D, McClelland JL. What learning systems do intelligent agents need? Complementary Learning Systems Theory updated. Trends Cogn Sci. 2016;20(7):512–534. pmid:27315762
  13. 13. Lee AK, Wilson MA. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron. 2002;36(6):1183–1194. pmid:12495631
  14. 14. Skaggs WE, McNaughton BL. Replay of neuronal firing sequences in rat hippocampus during sleep following spatial experience. Science. 1996;271(5257):1870–1873. pmid:8596957
  15. 15. Diekelmann S, Born J. The memory function of sleep. Nat Rev Neurosci. 2010;11(2):114–126. pmid:20046194
  16. 16. Nadel L, Moscovitch M. Memory consolidation, retrograde amnesia and the hippocampal complex. Curr Opin Neurobiol. 1997;7(2):217–227. pmid:9142752
  17. 17. Tse D, Langston RF, Kakeyama M, Bethus I, Spooner PA, Wood ER, et al. Schemas and memory consolidation. Science. 2007;316(5821):76–82. pmid:17412951
  18. 18. Brodt S, Gais S, Beck J, Erb M, Scheffler K, Schönauer M. Fast track to the neocortex: A memory engram in the posterior parietal cortex. Science. 2018;362(6418):1045–1048. pmid:30498125
  19. 19. Squire LR, Alvarez P. Retrograde amnesia and memory consolidation: a neurobiological perspective. Curr Opin Neurobiol. 1995;5(2):169–177. pmid:7620304
  20. 20. Winocur G, Moscovitch M, Bontempi B. Memory formation and long-term retention in humans and animals: Convergence towards a transformation account of hippocampal–neocortical interactions. Neuropsychologia. 2010;48(8):2339–2356. pmid:20430044
  21. 21. Winocur G, Moscovitch M. Memory transformation and systems consolidation. J Int Neuropsychol Soc. 2011;17(5):766–780. pmid:21729403
  22. 22. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79:2554–2558.
  23. 23. Zenke F, Agnes EJ, Gerstner W. Diverse synaptic plasticity mechanisms orchestrated to form and retrieve memories in spiking neural networks. Nat Commun. 2015;6 (6922). pmid:25897632
  24. 24. Tomé DF, Sadeh S, Clopath C. Coordinated hippocampal-thalamic-cortical communication crucial for engram dynamics underneath systems consolidation. bioRxiv. 2020;.
  25. 25. Van Essen DC, Anderson CH, Felleman DJ. Information processing in the primate visual system: an integrated systems perspective. Science. 1992;255(5043):419–423. pmid:1734518
  26. 26. Malenka RC, Bear MF. LTP and LTD: An Embarassment of Riches. Neuron. 2004;44(1):5–21. pmid:15450156
  27. 27. Remondes M, Schuman EM. Role for a cortical input to hippocampal area CA1 in the consolidation of a long-term memory. Nature. 2004;431(7009):699–703. pmid:15470431
  28. 28. Wixted JT. The psychology and neuroscience of forgetting. Annu Rev Psychol. 2004;55:235–269. pmid:14744216
  29. 29. Amaral DG. Emerging principles of intrinsic hippocampal organization. Curr Opin Neurobiol. 1993;3(2):225–229. pmid:8390320
  30. 30. Marr D. A theory of cerebellar cortex. J Physiol. 1969;202(2):437–470. pmid:5784296
  31. 31. Treves A, Rolls ET. A computational analysis of the role of the hippocampus in learning and memory. Hippocampus. 1994;4(3):373–391.
  32. 32. Brun VH, Otnass MK, Molden S, Steffenach HA, Witter MP, Moser MB, et al. Place cells and place recognition maintained by direct entorhinal-hippocampal circuitry. Science. 2002;296(5576):2243–2246. pmid:12077421
  33. 33. Nakazawa K, Sun LD, Quirk MC, Rondi-Reig L, Wilson MA, Tonegawa S. Hippocampal CA3 NMDA receptors are crucial for memory acquisition of one-time experience. Neuron. 2003;38(2):305–315. pmid:12718863
  34. 34. Nakashiba T, Young JZ, McHugh TJ, Buhl DL, Tonegawa S. Transgenic inhibition of synaptic transmission reveals role of CA3 output in hippocampal learning. Science. 2008;319(5867):1260–1264. pmid:18218862
  35. 35. Yeckel MF, Berger TW. Feedforward excitation of the hippocampus by afferents from the entorhinal cortex: redefinition of the role of the trisynaptic pathway. Proc Natl Acad Sci USA. 1990;87(15):5832–5836. pmid:2377621
  36. 36. Bi Gq, Poo Mm. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci. 1998;18:10464–10472. pmid:9852584
  37. 37. Markram H, Lübke J, Frotscher M, Sakmann B. Regulation of synaptic efficacy by coincidence of postysnaptic APs and EPSPs. Science. 1997;275(5297):213–215. pmid:8985014
  38. 38. Gerstner W, Kempter R, van Hemmen JL, Wagner H. A neuronal learning rule for sub-millisecond temporal coding. Nature. 1996;383(6595):76–78. pmid:8779718
  39. 39. Kempter R, Gerstner W, van Hemmen JL. Hebbian learning and spiking neurons. Phys Rev E. 1999;59(4):4498–4514.
  40. 40. Miller KD, MacKay DJC. The role of constraints in Hebbian learning. Neural Comput. 1994;6(1):100–126.
  41. 41. Dayan P, Abbott LF. Theoretical Neuroscience. Cambridge: MIT Press; 2001.
  42. 42. Legenstein R, Naeger C, Maass W. What can a neuron learn with spike-timing dependent plasticity. Neural Comput. 2005;17(11):2337–2382. pmid:16156932
  43. 43. Pfister JP, Toyoizumi T, Barber D, Gerstner W. Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning. Neural Comput. 2006;18(6):1318–1348. pmid:16764506
  44. 44. Sjöström PJ, Rancz EA, Roth A, Häusser M. Dendritic excitability and synaptic plasticity. Physiol Rev. 2008;88(2):769–840. pmid:18391179
  45. 45. O’Neill J, Boccara C, Stella F, Schoenenberger P, Csicsvari J. Superficial layers of the medial entorhinal cortex replay independently of the hippocampus. Science. 2017;355(6321):184–188. pmid:28082591
  46. 46. O’Keefe J, Dostrovsky J. The hippocampus as a spatial map: Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 1971;34(1):171–175. pmid:5124915
  47. 47. Hafting T, Fyhn M, Molden S, Moser M, Moser EI. Microstructure of a spatial map in the entorhinal cortex. Nature. 2005;436(7052):801–806. pmid:15965463
  48. 48. Moser EI, Kropff E, Moser MB. Place cells, grid cells, and the brain’s spatial representation system. Annu Rev Neurosci. 2008;31:69–89. pmid:18284371
  49. 49. Solstad T, Moser EI, Einevoll GT. From grid cells to place cells: a mathematical model. Hippocampus. 2006;16(12):1026–1031. pmid:17094145
  50. 50. O’Keefe J, Krupic J. Do hippocampal pyramidal cells respond to nonspatial stimuli? Physiol Rev. 2021;101:1427–1456.
  51. 51. Tukker JJ, Beed P, Brecht M, Kempter R, Moser EI, Schmitz D. Microcircuits for spatial coding in the medial entorhinal cortex. Physiol Rev. 2021;. pmid:34254836
  52. 52. Stuart G, Spruston N, Häusser M. Dendrites. Oxford: Oxford University Press; 2007.
  53. 53. Larkum ME, Zhu JJ, Sakmann B. A new cellular mechanism for coupling inputs arriving at different cortical layers. Nature. 1999;398(6725):338–341. pmid:10192334
  54. 54. D’Albis T, Jaramillo J, Sprekeler H, Kempter R. Inheritance of hippocampal place fields through Hebbian learning: effects of theta modulation and phase precession on structure formation. Neural Comput. 2015;27(8):1624–1672. pmid:26079752
  55. 55. Reifenstein ET, Bin Khalid I, Kempter R. Synaptic learning rules for sequence learning. eLife. 2021;10:e67171. pmid:33860763
  56. 56. Morris RGM, Garrud P, Rawlins JNP, O’Keefe J. Place navigation impaired in rats with hippocampal lesions. Nature. 1982;297(5868):681–683. pmid:7088155
  57. 57. Lux V, Atucha E, Kitsukawa T, Sauvage MM. Imaging a memory trace over half a life-time in the medial temporal lobe reveals a time-limited role of CA3 neurons in retrieval. eLife. 2016;5:e11862. pmid:26880561
  58. 58. Fusi S, Abbott L. Limits on the memory storage capacity of bounded synapses. Nat Neurosci. 2007;10(4):485–493. pmid:17351638
  59. 59. Herrnstein RJ. Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav. 1961;4(3):267–272. pmid:13713775
  60. 60. Morgenstern NA, Bourg J, Petreanu L. Multilaminar networks of cortical neurons integrate common inputs from sensory thalamus. Nat Neurosci. 2016;19(8):1034–1040. pmid:27376765
  61. 61. Constantinople CM, Bruno RM. Deep cortical layers are activated directly by thalamus. Science. 2013;340(6140):1591–1594. pmid:23812718
  62. 62. Walker MP, Brakefield T, Morgan A, Hobson JA, Stickgold R. Practice with Sleep Makes Perfect: Sleep-Dependent Motor Skill Learning. Neuron. 2002;35(1):205–211. pmid:12123620
  63. 63. McClelland JL. Incorporating rapid neocortical learning of new schema-consistent information into complementary learning systems theory. J Exp Psychol Gen. 2013;142(4):1190–1210. pmid:23978185
  64. 64. Song S, Miller KD, Abbott LF. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat Neurosci. 2000;3(9):919–926. pmid:10966623
  65. 65. Ji D, Wilson MA. Coordinated memory replay in the visual cortex and hippocampus during sleep. Nat Neurosci. 2007;10(1):100–107. pmid:17173043
  66. 66. Winterer J, Maier N, Wozny C, Beed P, Breustedt J, Evangelista R, et al. Excitatory microcircuits within superficial layers of the medial entorhinal cortex. Cell Rep. 2017;19(6):1110–1116. pmid:28494861
  67. 67. Ólafsdóttir HF, Carpenter F, Barry C. Coordinated grid and place cell replay during rest. Nat Neurosci. 2016;19(6):792–794. pmid:27089021
  68. 68. Hasselmo ME. Neuromodulation: acetylcholine and memory consolidation. Trends Cogn Sci. 1999;3(9):351–359. pmid:10461198
  69. 69. Papouin T, Dunphy JM, Tolman M, Dineley KT, Haydon PG. Septal Cholinergic Neuromodulation Tunes the Astrocyte-Dependent Gating of Hippocampal NMDA Receptors to Wakefulness. Neuron. 2017;94(4):840–854. pmid:28479102
  70. 70. Urbanczik R, Senn W. Learning by the dendritic prediction of somatic spiking. Neuron. 2014;81(3):521–528. pmid:24507189
  71. 71. Benna MK, Fusi S. Computational principles of synaptic memory consolidation. Nat Neurosci. 2016;19(12):1697–1706. pmid:27694992
  72. 72. DiCarlo JJ, Cox DD. Untangling invariant object recognition. Trends Cogn Sci. 2007;11(8):333–341. pmid:17631409
  73. 73. Majaj NJ, Hong H, Solomon EA, DiCarlo JJ. Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. J Neurosci. 2015;35(39):13402–13418. pmid:26424887
  74. 74. Rigotti M, Barak O, Warden MR, Wang XJ, Daw ND, Miller EK, et al. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497(7451):585–590. pmid:23685452
  75. 75. Sutherland RJ, Rudy JW. Configural association theory: The role of the hippocampal formation in learning, memory, and amnesia. Psychobiology. 1989;17(2):129–144.
  76. 76. Schapiro AC, Turk-Browne NB, Botvinick MM, Norman KA. Complementary learning systems within the hippocampus: A neural network modelling approach to reconciling episodic memory with statistical learning. Philos Trans R Soc Lond B Biol Sci. 2017;372(1711):20160049. pmid:27872368
  77. 77. McCloskey M, Cohen NJ. Catastrophic interference in connectionist networks: The sequential learning problem. In: Bower GH, editor. Psychology of Learning and Motivation. vol. 24. Academic Press; 1989. p. 109–165.
  78. 78. Foster DJ. Replay comes of age. Annu Rev Neurosci. 2017;40(1):581–602. pmid:28772098
  79. 79. Schuck NW, Niv Y. Sequential replay of nonspatial task states in the human hippocampus. Science. 2019;364(6447):eaaw5181. pmid:31249030
  80. 80. Liu Y, Dolan RJ, Kurth-Nelson Z, Behrens TEJ. Human replay spontaneously reorganizes experience. Cell. 2019;178(3):640–652. pmid:31280961
  81. 81. Tse D, Takeuchi T, Kakeyama M, Kajii Y, Okuno H, Tohyama C, et al. Schema-dependent gene activation and memory encoding in neocortex. Science. 2011;333(6044):891–5. pmid:21737703
  82. 82. Levine B, Svoboda E, Hay JF, Winocur G, Moscovitch M. Aging and autobiographical memory: dissociating episodic from semantic retrieval. Psychol Aging. 2002;17(4):677–89. pmid:12507363
  83. 83. Yonelinas AP, Ranganath C, Ekstrom AD, Wiltgen BJ. A contextual binding theory of episodic memory: systems consolidation reconsidered. Nat Rev Neurosci. 2019;2(6):364–375. pmid:30872808
  84. 84. Antony JW, Schapiro AC. Active and effective replay: systems consolidation reconsidered again. Nat Rev Neurosci. 2019;20(8):506–507. pmid:31160728
  85. 85. Yonelinas AP, Ranganath C, Ekstrom AD, Wiltgen BJ. Reply to ‘Active and effective replay: systems consolidation reconsidered again’. Nat Rev Neurosci. 2019;20(8):507–508. pmid:31160729
  86. 86. Pöhlchen D, Schönauer M. Sleep-dependent memory consolidation in the light of rapid neocortical plasticity. Curr Opin Behav Sci. 2020;33:118–125.
  87. 87. Kitamura T, Ogawa SK, Roy DS, Okuyama T, Morrissey MD, Smith LM, et al. Engrams and circuits crucial for systems consolidation of a memory. Science. 2017;356(6333):73–78. pmid:28386011
  88. 88. Aarts H, Dijksterhuis A. Habits as knowledge structures: Automaticity in goal-directed behavior. J Pers Soc Psychol. 2000;78(1):53–63. pmid:10653505
  89. 89. Makino H, Hwang EJ, Hedrick NG, Komiyama T. Circuit mechanisms of sensorimotor learning. Neuron. 2016;92(4):705–721. pmid:27883902
  90. 90. Pyle R, Rosenbaum R. A reservoir computing model of reward-modulated motor learning and automaticity. Neural Comput. 2019;31(7):1430–1461. pmid:31113300
  91. 91. Teşileanu T, Ölveczky B, Balasubramanian V. Rules and mechanisms for efficient two-stage learning in neural circuits. eLife. 2017;6:e20944. pmid:28374674
  92. 92. Murray JM, Escola GS. Remembrance of things practiced: Fast and slow learning in cortical and subcortical pathways. bioRxiv. 2020; p. 797548. pmid:33361766
  93. 93. Lillicrap TP, Santoro A, Marris L, Akerman CJ, Hinton G. Backpropagation and the brain. Nat Rev Neurosci. 2020;21(6):335–346. pmid:32303713
  94. 94. Debiec J, LeDoux JE, Nader K. Cellular and systems reconsolidation in the hippocampus. Neuron. 2002;36(3):527–538. pmid:12408854
  95. 95. Dudai Y. The restless engram: consolidations never end. Annu Rev Neurosci. 2012;35:227–247. pmid:22443508
  96. 96. Xu CS, Januszewski M, Lu Z, Takemura Sy, Hayworth K, Huang G, et al. A connectome of the adult Drosophila central brain. bioRxiv. 2020; p. 911859.
  97. 97. Troyer TW, Miller KD. Physiological gain leads to high ISI variability in a simple model of a cortical regular spiking cell. Neural Comput. 1997;9(5):971–983. pmid:9188190
  98. 98. Song S, Miller KD, Abbott LF. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat Neurosci. 2000;3(9):919–926. pmid:10966623
  99. 99. Cannon RC, Turner DA, Pyapali GK, Wheal HV. An on-line archive of reconstructed hippocampal neurons. J Neurosci Methods. 1998;84(1-2):49–54. pmid:9821633
  100. 100. Mainen ZF, Sejnowski TJ. Influence of dendritic structure on firing pattern in model neocortical neurons. Nature. 1996;382(6589):363–366. pmid:8684467
  101. 101. Yeckel MF, Berger TW. Spatial distribution of potentiated synapses in hippocampus: dependence on cellular mechanisms and network properties. J Neurosci. 1998;18(1):438–450. pmid:9412520
  102. 102. Hines ML, Carnevale NT. The NEURON simulation environment. Neural Comput. 1997;9(6):1179–1209. pmid:9248061
  103. 103. Sutton RS, Barto AG. Introduction to Reinforcement Learning. Cambridge: MIT Press; 1998.