Hebbian plasticity in parallel synaptic pathways: A circuit mechanism for systems memory consolidation

Systems memory consolidation involves the transfer of memories across brain regions and the transformation of memory content. For example, declarative memories that transiently depend on the hippocampal formation are transformed into long-term memory traces in neocortical networks, and procedural memories are transformed within cortico-striatal networks. These consolidation processes are thought to rely on replay and repetition of recently acquired memories, but the cellular and network mechanisms that mediate the changes of memories are poorly understood. Here, we suggest that systems memory consolidation could arise from Hebbian plasticity in networks with parallel synaptic pathways—two ubiquitous features of neural circuits in the brain. We explore this hypothesis in the context of hippocampus-dependent memories. Using computational models and mathematical analyses, we illustrate how memories are transferred across circuits and discuss why their representations could change. The analyses suggest that Hebbian plasticity mediates consolidation by transferring a linear approximation of a previously acquired memory into a parallel pathway. Our modelling results are further in quantitative agreement with lesion studies in rodents. Moreover, a hierarchical iteration of the mechanism yields power-law forgetting—as observed in psychophysical studies in humans. The predicted circuit mechanism thus bridges spatial scales from single cells to cortical areas and time scales from milliseconds to years.


Introduction
Clinical and lesion studies suggest that declarative memories initially depend on the hippocampus, but are later transferred to other brain areas [1][2][3]. Some forms of memory eventually become independent of the hippocampus and depend only on a stable representation in the neocortex [1][2][3]. Similarly, procedural memories are consolidated within cortico-striatal networks [1,4,5]. This process of memory transformation-termed systems memory consolidation-is thought to prevent newly acquired memories from overwriting old ones, thereby extending memory retention times ("plasticity-stability dilemma"; [6][7][8][9][10]), and to enable a simultaneous acquisition of episodic memories and semantic knowledge of the world [11,12]. While specific neuronal activity patterns, including for example an accelerated replay of recent experiences [13,14], are involved in the transfer of memories from hippocampus to neocortex [15], the mechanisms underlying systems memory consolidation are not well understood. Specifically, it is unclear how this consolidation-related transfer is shaped by the anatomical structure and the plasticity of the underlying neural circuits. This poses a substantial obstacle for understanding into which regions memories are consolidated; why some memories are consolidated more rapidly than others [16][17][18]; why some memories stay hippocampus dependent, and why and how the character of memories changes over time [1]; and whether the consolidation of declarative and non-declarative memories [1,4,5] are two sides of the same coin. These questions are hard to approach within phenomenological theories of systems consolidation such as the standard consolidation theory [11,19], the multiple trace theory [16], and the trace transformation theory [20,21]. Here, we propose a novel mechanistic foundation of the consolidation process that accounts for several experimental observations and that could contribute to understanding the transfer of memories and the reorganisation of memories over time on a neuronal level.
Our focus lies on simple forms of memory that can be phrased as cue-response associations. We assume that such associations are stored in synaptic pathways between an input areaneurally representing the cue-and an output area-neurally representing the response. Thus, our work relates to feedforward, hetero-associative memory (and is therefore applicable to both declarative and non-declarative memories) rather than recurrent, auto-associative memory (see, e.g., [22][23][24]). Our central hypothesis-the parallel pathway theory (PPT)-is that systems memory consolidation arises naturally from the interplay of two abundantly found neuronal features: parallel synaptic pathways and Hebbian plasticity [25,26]. First, we illustrate this theory in a simple hippocampal circuit motif and show that Hebbian plasticity can consolidate previously stored associations into parallel pathways. Next, we outline the PPT in a mathematical framework for the simplest possible (linear) case. Then we show in simulations that the proposed mechanism is robust to various neuronal nonlinearities; further, the mechanism reproduces the results of a hippocampal lesion study in rodents [27]; iterated in a cascade, it can achieve a full consolidation into neocortex and result in power-law forgetting of memories as is observed in psychophysical studies in humans [28].

A mechanistic basis for systems memory consolidation
The suggested parallel pathway theory (PPT) relies on a parallel structure of feedforward connections onto the same output area: a direct, monosynaptic and an indirect, multisynaptic pathway. We propose that memories are initially stored in the indirect pathway and are subsequently transferred to the direct pathway via Hebbian plasticity. Because the indirect pathway is multisynaptic, it transmits signals with a longer time delay than the direct pathway ( Fig 1A). A timing-dependent plasticity rule allows the indirect pathway to act as a teacher for the direct pathway.
The proposed mechanism can be exemplified in the hippocampal formation, by considering direct and indirect pathways to area CA1. CA1 receives a direct, monosynaptic pathway from the entorhinal cortex (EC), which is called perforant path (PP CA1 , Fig 1B, red; [29]). In addition, EC input is relayed to CA1 via the classical trisynaptic pathway via dentate gyrus (DG) and CA3, reaching CA1 through the Schaffer collaterals (SC; Fig 1B, blue; [29]).
As in earlier theories, we assume that the indirect pathway via CA3 is involved in the original storage of memories [30,31], an assumption that is supported by experiments, e.g. [32][33][34]. We neglect, for simplicity, any encoding-related change in the direct pathway, even though in animals this pathway might also show some, putative much lower, plasticity during memory acquisition. This simplification does not affect our proposed mechanism on the consolidation-related transfer of memories.
We assume encoding in such a way that a memory can be recalled by a specific neural activity pattern in EC-a cue-that triggers spikes in a subset of CA1 cells through this indirect pathway via the SC, representing the associated response. The same cue reaches CA1 also through the direct pathway via the PP CA1 . We assume that this direct input from EC initially fails to trigger spikes because the synaptic weight pattern in the PP CA1 does not match the cue. However, PP CA1 inputs that are activated by the cue precede the spikes in CA1 pyramidal cells that are triggered by the indirect pathway by 5-15 ms [35] due to transmission delays. Presynaptic spikes preceding postsynaptic spikes with a short delay favor selective long-term potentiation by spike timing-dependent plasticity (STDP, Fig 1C) [36][37][38]. Consequently, cue-driven PP CA1 synapses onto activated CA1 cells are strengthened until the memory that was initially stored in the indirect pathway can be recalled via the direct pathway alone. The indirect pathway thus acts as a teacher for the direct pathway.
To illustrate this mechanism, we used a simple integrate-and-fire neuron model (for details, see Methods) of a CA1 cell that receives inputs through the SC and the PP CA1 . We also considered the two pathways to contain the same number of synapses and transmit identical spike patterns apart from a 5-ms delay in the SC (Fig 1D). Consolidation then corresponds to copying the synaptic weight pattern of the SC to the PP CA1 . In line with our hypothesis, such a consolidation was indeed achieved by STDP in the PP CA1 synapses (Fig 1E). A consolidation in the opposite direction, i.e., from the PP CA1 to the SC cannot be achieved by STDP because the temporal order of spiking activity is reversed and hence does not favour synaptic potentiation ( Fig 1F). Note that in this simple example, the EC-to-DG/CA3 synapses don't store any memory, but only introduce the transmission delay. In the following, we will show that all synapses of the indirect pathway can be involved in the original storage of memories.
To understand the conditions under which the suggested PPT can achieve a consolidation of associative memories, we performed a mathematical analysis, which shows that consolidation is robust to differences in the neural representation in the two pathways and illustrates its dependence on the temporal input statistics in the two pathways. Readers who are less interested in the mathematical details are welcome to jump to section "Consolidation of spatial representations", where we show in simulations that the mechanism is robust to neuronal complexities; in subsequent sections, we also show that the mechanism accounts for lesion studies in rodents, and that it can be hierarchically iterated.
Theory of spike timing-dependent plasticity (STDP) for parallel input pathways. In the following mathematical analysis, we consider a single cell that receives inputs through two pathways, as in Fig 1A. The cell could be located, for example, in CA1, as in Fig 1B. We assume Cue-response associations are initially stored in an indirect synaptic pathway (blue) and consolidated into a parallel direct pathway (red). (B) Hippocampal connectivity. The entorhinal cortex projects to CA1 through an indirect pathway via DG-CA3 and the Schaffer collaterals (SC, blue arrow), and through the direct perforant path (PP CA1 , red arrow). (C) Model of consolidation through STDP. Left: before consolidation, a strong SC input (middle, blue vertical bar) causes a large EPSP and triggers a spike in CA1 (bottom, black vertical bar). A weak PP CA1 input (top, red) that precedes the SC input is potentiated by STDP. Right: after consolidation through STDP, the PP CA1 input (top) can trigger a spike in CA1 by itself (bottom). (D-E) Consolidation in a single integrate-and-fire CA1 cell receiving 1000 PP CA1 and 1000 SC excitatory inputs. (D) PP CA1 activity consists of independent poisson spike trains; the SC activity is an exact copy of the PP CA1 activity, delayed by 5 ms. (E) Consolidation of a synaptic weight pattern from non-plastic SC synapses to plastic PP CA1 synapses. Left and middle: normalized synaptic weights before and after consolidation. Right: time course of correlation between SC and PP CA1 weight vectors during consolidation (mean ± SEM for 10 trials). (F) Failure of consolidation of a synaptic weight pattern from non-plastic PP CA1 to plastic SC synapses; panels as in E.
https://doi.org/10.1371/journal.pcbi.1009681.g001 that memories, i.e., cue-response associations, are stored in the synaptic weight vector V of the indirect path, and that consolidation occurs by transferring this information into the weights W of the direct path. In the simulation in Fig 1, the weight vector V represents the SC pathway, and the vector W the PP CA1 pathway. For simplicity, we consider the case of a single ratebased neuron, which represents one of the output neurons in the simulated network. Very similar theoretical results can be obtained for the spiking case of linear Poisson neurons, apart from additional contributions from spike-spike correlations, which can be neglected for a large number of synapses [39].
The output y of the rate-based neuron is assumed to be given by a linear function of the input where the vectors x and x 0 denote the input arising from the direct and indirect pathways, respectively, and T denotes the transpose of a vector (or matrix). We assume that the inputs x and x 0 are both representations of the cue and therefore are related by some kind of (potentially nonlinear) statistical dependency. Moreover, we assume that x 0 arises from an indirect pathway and is therefore delayed by a time interval D > 0. The notation is chosen such that the case where the two inputs to the two pathways are the same (apart from the delay) reduces to the condition x(t) = x 0 (t), which is the case, e.g., in Fig 1D. We now consider the learning dynamics of a simple additive (STDP) rule that would result from a rate picture (neglecting spike-spike correlations; cf. [39]), where L(τ) is the learning window (example in section Effects of temporal input statistics on systems memory consolidation), which determines how much a pair of pre-and postsynaptic activity pulses (i.e., spikes) with a time difference τ changes the synaptic weight, and η is a learning rate that scales the size of these changes. We adopt the convention that the time difference τ is positive when a presynaptic spike occurs before a postsynaptic spike.
The notation h� � �i T ¼ 1 T R T 0 � � � dt indicates averaging over an interval of length T. We assume that the integration time T can be chosen such that the weights do not change significantly during the integration time (i.e., a small learning rate η), but that the statistics of the input are sufficiently well sampled so that boundary effects in the temporal integration are negligible. We also assume that the statistics of the inputs x and x 0 are stationary, i.e., they do not change over time. Under these assumptions, we can insert the output firing rate y from Eq (1) into the learning rule in Eq (2) and get where h� � �i t denotes the average over all times. Eq (3) describes the dynamics of the weights W in the direct pathway, which are driven by an interplay of the correlation structures within the direct pathway (through hxðtÞxðt þ tÞ T i t ) and between the two pathways (through hxðtÞx 0 ðt þ t À DÞ T i t ); the dynamics of W depends also on the shape of the learning window L and the weights V in the indirect pathway. It is important to emphasize that in this analysis of the learning dynamics we consider the input arising during consolidation, e.g., during sleep, and this input may be statistically different from the input during memory storage or recall. If the correlation structure between the two pathways is different during consolidation and during storage/recall, the consolidation process leads to a distortion of the memory in the sense that a different cue would be required to retrieve the memory. Here, we consider only the case where the correlation structure during consolidation is the same as during storage and recall.
Let us now study under which conditions this weight update generates a consolidation of the input-output associations stored initially in the weights V of the indirect pathway into the weights W of the direct pathway.
Learning dynamics implement memory consolidation as a linear regression. In general, the learning dynamics is hard to analyze if the covariance matrices hxðtÞxðt þ tÞ T i t and hxðtÞx 0 ðt þ t À DÞ T i t are arbitrary objects. A case that can be studied analytically is that of separable statistics in which each of the two correlation matrices can be written as a product of scalar functions f and g of the delay τ and the covariance matrices for zero delay, hxx T i and hxx 0T i: For simplicity, we omitted the lower index t in hxx T i and hxx 0T i. Note that this separability assumption is consistent with all simulations shown, except the one in the section "Consolidation of spatial representations" of the Results; see there for details. For separable input statistics, the learning dynamics in Eq (3) If the scalar constant A is negative (see below for conditions when this is the case), the learning dynamics is stable and converges to a unique fixed point that is given by Note that apart from the factor À B A ≕ b, this fixed point has the same structure as the closedform solution of a linear regression. In fact, it is straightforward to show that the learning dynamics in Eq (6) performs a gradient descent on the error function If A is negative and B is positive (and thus β is positive), the learning dynamics in the direct path converges to a weight configuration for which the input W T x from the direct path is an optimal linear approximation of the input V T x 0 from the indirect path, in the sense of minimal mean squared error E. If β > 1, the direct pathway would contribute more to a potential recall than the original memory trace in the indirect pathway. A sign reversal of β (i.e. β < 0) implies a sign reversal of V. Then, however, a constraint on the sign change of weights (see below for details) would prohibit consolidation; memories in the indirect path could even be actively deleted from the direct path. In summary, memory consolidation in the PPT is supported by A < 0 (stable dynamics) and B > 0 (consolidation possible), which implies b ≔ À B A > 0. Note that we assumed storage of original memories only in the weight vector V representing the SC pathway. But since learning in the direct pathway is driven by the input from the entire indirect pathway, these results also hold if original memories are stored in any other plastic synapses of the indirect pathway (e.g. EC to DG/CA3 in Fig 1B).
Let us relate the theoretical results obtained so far to the simulations shown in Fig 1; although the simulations are performed for integrate-and-fire neurons, our theory on ratebased neurons accounts for the main findings: Because two inputs x and x 0 are the same apart from the delay, the fixed point condition Eq (7) reduces to W ¼ À B A V, in line with the result that the weights are copied into the direct pathway. Because the learning window is dominated by depression we have A < 0 while the delay in combination with the shorter autocorrelation time of the Poisson processes in the input ensures B > 0. A consolidation from the direct to the indirect pathway is not possible because this inverts the delay and pushes the cross-correlation between the two pathways into the depression component of STDP. As a result, the factor B is negative and consolidation fails.
In terms of systems memory consolidation in general, the weights V of the indirect path change as new memories are acquired, so the fixed point in Eq (7) for the weights W of the direct path is usually never reached. If it were, the direct pathway would merely represent a copy of the memories that are currently stored in the indirect path rather than retaining older memories, as intended. The time scale of the learning dynamics of the direct path [determined by η in Eq (6)] should therefore be longer than the memory retention time in the indirect path, which is determined, e.g., by the rate at which new memories are stored. In case of a small enough η, the transient dynamics of the system is more important for the consolidation process than the fixed point.
Another important aspect to emphasize is that the consolidation is influenced by the correlation structure hxx 0T i between the two pathways that is encountered during the consolidation period. Intuitively and according to Eq (8), consolidation is achieved by matching the input V T x 0 that is caused by "cues" x 0 in the indirect path with the input W T x caused by the associated "cues" x in the direct path. In order for the consolidated memories to be accessible during recall, the relation between the "cues" in the two pathways (i.e., the correlation hxx 0T i between the two pathways) should be the same during recall as during consolidation.
The objective function argument in Eq (8) only holds when the constant A is negative. For positive A, the learning dynamics in Eq (7) suffers from the common Hebbian instability and thus has to be complemented by a weight-limiting mechanism. The choice of this weight limitation (e.g., subtractive or divisive normalization, weight bounds) will then have an impact on the dynamics and the fixed point of the learning process [40,41]. For the simulations, the parameters were therefore always chosen such that the learning dynamics were stable (A < 0). Although this suggests that no weight limiting mechanism was required in principle, upper and lower bounds for the weights were nevertheless used in simulations, with no qualitative impact on the results.
Effects of temporal input statistics on systems memory consolidation. The constants A and B, which were defined in Eq (6) , play an important role for the learning dynamics. As already elaborated, the sign of A determines stability while B should be positive to obtain consolidation. Sign and magnitude depend on the interplay between the learning window L and the temporal input statistics, characterized by the correlation functions f and g defined in Eqs (4) and (5), respectively. For the assumed separable statistics, f is fully determined by the autocorrelation of the input in the direct path, and f(τ) is therefore symmetric in time τ.
A first interesting observation is that for the special case of an antisymmetric learning window L, we obtain A = 0 for symmetry reasons. Mathematically, this implies that the first term of the learning dynamics in Eq (6)-the dependence of the change of the weights W in the direct path on their actual value-vanishes. Intuitively, the balance of potentiation and depression in an antisymmetric learning window implies that the direct path, although able to drive the postsynaptic neuron, is causing equal amounts of potentiation and depression in all of its synapses. On average, synaptic changes are caused only by the indirect pathway with weights V, which therefore acts as a supervisor for the learning dynamics of W in the direct path. A thorough analysis under which conditions STDP can be used for supervised learning has been provided elsewhere [42,43], and the results of this analysis are applicable in the present case. Functionally, the depressing part of an STDP learning window serves to neutralize the impact of the direct pathway on its own learning dynamics, effectively creating a supervised learning scenario.
Another interesting observation relates to the magnitude of the terms A and B, which is determined by the time scale on which the inputs change (reflected, e.g., in the time constants of the decay of the correlation functions f and g). Let us assume that both correlation functions f(τ) and g(τ) are maximal for τ = 0 and that they decay to 0 for large |τ|; such conditions are reasonable for most correlation structures. We also assume that the learning window has the typical structure of potentiation for causal timing, L(τ) > 0 for τ > 0, and depression for acausal timing, L(τ) < 0 for τ < 0 [36,37,44]. Then the delay D > 0 in the indirect path shifts the maximum of the cross-correlation g(τ − D) into the potentiating part of the learning window ( Fig 2B) while the maximum of f(τ) remains in the transition region of potentiation and depression (Fig 2A). The following three observations can be made concerning the constant B as defined by the integral in Eq (6): (1). If the cross-correlation g has a narrow enough peak (i.e., narrower than the time scale of the learning window and the delay D), B is positive, suggesting that consolidation can occur ( Fig 2B). The sharp localization of g corresponds to rapidly changing input signals.
(2). If the decay time constant of the cross-correlation g is large compared to that of the learning window, the depressing component of the learning window has more impact and reduces the constant B and thus the efficiency of consolidation ( Fig 2C). In the case where the learning window is dominated by depression, B can even get negative for large time constants of g, abolishing consolidation altogether.
(3). If the delay D along the indirect path is much longer than the decay time constant of the learning window, we obtain B � 0, meaning that consolidation is abolished (Fig 2D).
In other words, the delayed correlations between the two pathways are too large to be exploited by STDP. This will limit the ability to consolidate from too long indirect paths into shortcuts.

Consolidation of spatial representations
The mathematical analysis of the PPT makes two key predictions. First, it suggests that STDP in a parallel direct pathway achieves consolidation by performing a linear regression between inputs in the direct and the indirect pathways [Eqs (7) and (8)]. Therefore, the proposed mechanism should generalize to situations in which the cue representations in the direct and indirect pathways differ. Second, the theory suggests that consolidation is most effective when the correlation time constants of the input during consolidation is matched to the coincidence time scale of STDP ( Fig 2B). In the following, we will show in simulations that those predictions hold and, moreover, that the mechanism is robust to neuronal nonlinearities.
To begin with, we show that the mechanism is robust to differing cue representations in the two pathways and to weaker correlations among them [45]. To this end, we used place cell representations [46] for the SC input from CA3 and grid cell representations [47,48] for the PP CA1 input from EC ( Fig 3A). Moreover, we show that the suggested mechanism is compatible with the biophysical properties of CA1 neurons, which receive inputs in different subcellular compartments. To this end, we simulated a multicompartmental CA1 pyramidal cell . This is the case if (i) the delay D between the paths is positive, (ii) the learning window is positive for positive delays, and (iii) the time scale of the decay of cross-correlations g is shorter than the delay D and the width of the learning window L. These three conditions favor consolidation. (C) If the crosscorrelation g decays on a time scale that is much longer than the width of the learning window and the delay D, the indirect path can drive both potentiation and depression, and consolidation is weaker (i.e., the coefficient B is smaller) than for shorter correlations. (D) If the delay D between the direct and the indirect paths is longer than the width of the learning window L, the indirect path cannot induce systematic changes in the weights of the direct path (coefficient B � 0), and consolidation is ineffective. https://doi.org/10.1371/journal.pcbi.1009681.g002 ( Fig 3B) that was endowed with active ion channels supporting backpropagating action potentials and dendritic calcium spikes (Fig 3C, Methods).
The use of spatial representations in the input pathways allows us to consider simple forms of memories in a navigational context in which a given location on a linear track is associated . PP CA1 and SC inputs project to distal apical tuft dendrites (red dots) and proximal apical and basal dendrites (blue dots). (C) Active neuron properties. Top: somatic sodium spike (black) propagates to the distal tuft and initiates a dendritic calcium spike (red) and further sodium spikes. Bottom: dendritic calcium spike leads to bursts of somatic spikes. (D) Spatial tuning before consolidation. SC provides place field-tuned input to the CA1 cell (left, blue), which yields spatially tuned spiking activity (right, blue); PP CA1 input is not spatially tuned (left, red), and (alone) triggers low and untuned spiking activity (right, red). (E) Somatic and dendritic activity during consolidation. During replay, SC input generates backpropagating sodium spikes (black vertical lines) that generate dendritic calcium spikes (red). (F) After consolidation. Spatial tuning is consolidated from the indirect SC pathway into the direct PP CA1 pathway. Left: spatial tuning of total PP CA1 input (red) approaches theoretically derived PP CA1 input tuning (magenta; see Methods). Right: CA1 output is place field-tuned through either SC or PP CA1 input alone. (G) Evolution of correlation between actual and optimal PP CA1 input tuning (see F) for replay speeds corresponding to hippocampal replay events (black) and real-time physical motion (grey). Position in D, E, and F normalized to [0, 1]. https://doi.org/10.1371/journal.pcbi.1009681.g003 with the activity of a given CA1 cell. Effectively, such an association generates a CA1 place cell. In line with the PPT, we assumed that the spatial selectivity of this CA1 place cell is initially determined solely by the indirect pathway via the SC, i.e., by place cell input from CA3. The goal of systems memory consolidation is then to transfer this spatial association to the direct input, which reaches the CA1 cell via the PP CA1 derived from grid cells in EC. In other words, place-cell input should supervise grid-cell input to develop a place-cell tuning. Note that we use the spatial setup primarily as an illustration of the theory. We do not make claims regarding the temporal development of CA1 place cells in vivo, which is not fully understood [49][50][51].
SC place field inputs were modelled by synapses that were active only in a small region of the track, whereas individual PP CA1 grid cell inputs were active in multiple, evenly spaced regions along the track (Fig 3A). In terms of the theory, the cue representation in the two pathways is now different, but correlated, because the same location is encoded. The SC and PP CA1 inputs projected to proximal and distal dendrites, respectively ( Fig 3B, [52]). Synapses were initialized such that the SC input conductances were spatially tuned and resulted in place fieldlike activity in the CA1 cell while the PP CA1 input had no spatial tuning ( Fig 3D).
During consolidation, SC and PP CA1 input to the CA1 cell consisted of replays of previously encountered sequences of locations [13,14], with a replay speed 20 times faster than physical motion [13]. During replay, the SC input led to somatic spikes, which in turn triggered backpropagating action potentials that caused calcium spikes in the distal dendrites where the PP CA1 synapses arrive (Fig 3C and 3E, [53]). Through synaptic plasticity, PP CA1 synapses active in the place field of the neuron were potentiated. Over time, the PP CA1 input adopted the spatial tuning of the SC input (Fig 3F, left) and reproduced the original SC-induced place field output ( Fig 3F, right) with high correlation (Fig 3G). The fact that the spatial tuning of two inputs is not perfectly matched does not contradict with theoretical results, which merely state that the direct input should attain the best possible linear approximation of the indirect pathway. In the present setting, this approximation is bounded by the finite range of frequencies of the entorhinal inputs (in analogy to reconstructing a high-frequency signal, e.g. a narrow peak, with a finite set of Fourier components), which causes the ringing next to the target peak in Fig 3F (left). In summary, the PPT mechanism therefore consolidated associations even though the spatial representations in the two pathways differed and although the two pathways targeted different neuronal compartments with different numbers of synapses in the CA1 neuron with complex morphology.
The theory also predicts that consolidation is most effective when the correlation time in the input is matched to the time scale of STDP ( Fig 2B). In line with this prediction, consolidation failed when replay speed was reduced to that of physical motion (Fig 3G) because the time scale of rate changes in place and grid cell activity is then much longer than the delay between the two pathways and the time scale of STDP ( Fig 2C). Accelerated replay during sleep [13] hence supports systems memory consolidation within the PPT by aligning the time scales of neural activity and synaptic plasticity [54], and this alignment is similar to the effect of phase precession during memory acquisition [55].
Finally, we note that the theoretical analysis relied on a separability assumption for the statistics in the two pathways; cf. Eqs (4) and (5). This condition is not fulfilled for sequence replay during consolidation because the time-delayed covariance of different place cells depends on the relative spatial location of their place fields; such correlations are non-separable even for slower replay or during memory acquisition with real-time physical motion. The observation that consolidation was successful nevertheless illustrates that the separability assumption does not need to be fulfilled for the PPT to achieve a successful consolidation.

Consolidation of place-object associations in multiple hippocampal stages
Ultimately, to consolidate memories into neocortex, they have to move beyond the PP CA1 . Notably, the PP CA1 is itself part of an indirect pathway from EC to the subiculum (SUB) that is shortcut by a direct connection from EC to SUB (referenced as PP SUB ; Fig 4A, left; [29]). This suggests that the PPT can be reiterated to further consolidate memories from the PP CA1 to the PP SUB and beyond.
To illustrate this idea, we considered a standard paradigm for memory research in rodents: the Morris water maze [56]. In the water maze, the rodent needs to find a submerged platform (object), i.e., it must store an object-place association. Thus this paradigm requires neural representations of objects (such as the submerged platform) and places. We hence constructed a model in which subregions of the hippocampal formation included neurons that encode places and neurons that encoded the identity of objects (Fig 4A, right).
For simplicity and computational efficiency we switched to a rate-based neuron model (Methods). An object was chosen from a set of 128 different objects and placed in a circular open field environment (Fig 4B, top). As motivated by experiments [32][33][34], we implemented object-to-place associations in our model by enhancing, as before, synaptic connections in the SC, but now between object-encoding neurons in CA3 and place-specific neurons in CA1 ( Fig  4A, right). Here, we did not consider place-to-object associations. These are less relevant for the water maze task, where the task is to recall the location of a given object-the platformrather than to recall which object was encountered at a given location. We tested object-toplace associations stored in the SC by activating the object representation in EC-as a memory cue-and determining the activities in CA1, triggered by the SC alone. From these activities we inferred a spatial probability map of the recalled object location (Fig 4B; Methods).
We first stored a single object-place association in the SC. During a subsequent consolidation cycle-representing one night-place and object representations in EC were then randomly and independently activated. Consistent with our previous results, the object-place association was gradually consolidated from the SC to the PP CA1 : after one night of consolidation, the correct spatial probability map of an object location was inferrable from CA1 activity triggered by the PP CA1 alone (Fig 4B).
To track the consolidation process over longer times, we assumed that a new random object-place association is stored in the SC every day. This caused a decay of previous SC memory traces due to interference with newly stored associations ( Fig 4C, [57,58]). During the night following each day, associations in the SC were partially consolidated into the PP CA1 , such that the consolidated association could be decoded from the PP CA1 after a single night, but previously consolidated associations were not entirely overwritten. As a result, objectplace associations were maintained in the PP CA1 for longer periods than in the SC, thus extending their memory lifetime ( Fig 4C). Eventually, a given PP CA1 memory trace would also degrade as new interfering memories from the SC are consolidated. However, as noted above, the PP CA1 itself is part of an indirect pathway from EC to the SUB, for which there is in turn a parallel, direct perforant pathway PP SUB . The association in the PP CA1 (and SC) could therefore, in turn, be partially consolidated into the PP SUB , further extending memory lifetime ( Fig  4C). Note that the extension of memory lifetime is supported in the model by a reduced plasticity (i.e. halved learning rate in Eq (33)) in PP SUB compared to PP CA1 .
The model suggests that the PP CA1 serves as a transient memory buffer that mediates a further consolidation into additional shortcut pathways downstream. This hypothesis is supported by navigation studies in rats. Using PP CA1 lesions, Remondes and Schuman [27] have shown that the PP CA1 is not required for the original acquisition of spatial memories, but that it is critically involved in their long-term maintenance. However, lesioning the PP CA1 21 days after acquiring a memory did not disrupt spatial memories, suggesting that the PP CA1 is not the final storage site ( Fig 4D) and further supporting the idea that the PP CA1 is important to enable a transition from short-term to long-term memories.
To test whether our model could reproduce these experimental results, we simulated PP CA1 lesions either before the acquisition of an object-place association or 21 days later. Assuming that the rat's spatial exploration is determined by the probability map of the object location [59], the model provided predictions for the time spent in different quadrants of the environment, which were in quantitative agreement with the data for all experimental conditions ( Fig  4D). Our model thus suggests that a hierarchical reiteration of parallel shortcuts-the central circuit motif of the PPT-could explain these experiments.
Similar to lesioning the PP CA1 , we predict that lesioning PP SUB also has an impact on memory consolidation: PP SUB should act as a transient memory buffer but on a longer timescale than PP CA1 . In general, lesioning a pathway with a set of synapses that cover a specific range of time scales such that there is a "gap" should result in an impairment of consolidation if the lesion is done before the memory has "moved on". To illustrate this idea in more detail, we study in the next section a model with many stages in a hierarchy.

Consolidation from hippocampus into neocortex by a hierarchical nesting of consolidation circuits
Given that shortcut connections are widespread throughout the brain [25,60,61], we next hypothesized that a reiteration of the PPT can also achieve systems consolidation from hippocampus into neocortex. To test this hypothesis, we studied a network model (Fig 5A), in which the hippocampus (now simplified to a single area) receives input from a hierarchy of cortical areas, representing, e.g., a sensory system. It provides output to a different hierarchy of areas, representing, e.g., the motor system or another sensory system.
The network also contained shortcut connections that bypassed the hippocampus. As in the previous section, new memories were stored in the hippocampus but not in any other indirect connection in the hierarchy. The repeated storage of new memories every day leads to a decay of previously stored hippocampal memories. But memories are also consolidated by Hebbian plasticity in parallel pathways; for details, see Methods.
Tracing a specific memory over time revealed a gradual consolidation into the cortical shortcut connections, forming a "memory wave" [10] that travels from hippocampus into neocortex ( Fig 5B). By exponentially decreasing the shortcut learning rate with distance from the hippocampus, a power-law decay of memories can be observed in the union of all shortcuts, e.g., by reading out the shortcut with the strongest memory trace at any moment in time ( Fig  5B). This observation is in line with a rich history of psychological studies on the mathematical Schematic of the hierarchical model. The hippocampal formation (HPC) is connected to cortical input circuit 1 and output circuit 1. Increasing numbers indicate circuits further from the HPC and closer to the sensory/motor periphery. Each direct connection at one level (e.g., dark blue arrow between input 1 and output 1) is part of the indirect pathway of the next level (e.g., for pathways from input 2 to output 2). Learning rates of the direct connections decrease exponentially with increasing level (i.e., from blue to red). (B) Memories gradually propagate to circuits more distant from the HPC. The correlation of the initial HPC weights with the direct pathways is shown as a function of time and reveals a memory wave from HPC into neocortex. The maximum of the output circuits follows approximately a power-law (black curve). Noise level indicates chance-level correlations between pathways. (C) Consolidated memories yield faster responses (from sensory periphery, e.g., Input 8, to system output) because these memories are stored in increasingly shorter synaptic pathways. https://doi.org/10.1371/journal.pcbi.1009681.g005 shape of forgetting curves [28]. Note that for the readout we tried to make as few assumptions as possible by letting all pathways contribute on an equal footing. Taking the maximum over the pathways (as well as the mean) generates a power law. Notably, we achieved memory retention times of years through only a small number (�5) of iterations of the PPT. Finally, we found that memory retrieval accelerates during consolidation (Fig 5C), in line with consolidation studies for motor skills [62]. In our consolidation model, the time to recall decreases because the path from peripheral input to output becomes shorter through the use of more direct (peripheral) shortcut connections (Fig 5A and 5B).
The predicted consolidation-mediated decrease of the time to recall critically depends on the utilized plasticity rule (STDP), which uses timing of input and output of neurons, and on our assumption that memories are initially acquired in an indirect pathway with a longer delay than direct pathways. While this assumption is reasonable for declarative memories that are initially stored in the hippocampus and then consolidated in sensory or motor areas towards neocortex, the underlying computational reasons for such a strategy are unknown. The strategy of the initial storage of memories in a pathway with a longer transmission delay could be related to the Complementary Learning Systems Theory (CLST) [11,63] if the initial storage needs some preprocessing, e.g., to achieve representations that are suited for one-trial learning, e.g. population-sparse representations [9]. In general, our results do not imply that the reduction of delay is a central goal of systems memory consolidation or that it is even necessary. Reduction of delay may, however, be a nice side effect of systems memory consolidation with timing-based plasticity rules [64]. And such a reduction of delay does not need to be restricted to declarative memories but also could apply to, e.g., motor skill learning or habit formation.

Discussion
We proposed the parallel pathway theory (PPT) as a mechanistic basis for systems memory consolidation. This theory relies on two abundant features in the nervous system: parallel shortcut connections between brain areas and Hebbian plasticity. A mathematical analysis suggests that STDP in a direct pathway achieves consolidation by implementing a linear regression that approximates the input-output mapping of an indirect pathway by that of the direct pathway. We applied the PPT to hippocampus-dependent memories and showed that the proposed mechanism can transfer memory associations across parallel synaptic pathways. This transfer is robust to different representations in those pathways and requires only weak correlations. Our results are in quantitative agreement with lesion studies of the perforant path in rodents [27] and are able to reproduce forgetting curves that follow a power-law as observed in humans [28].

Theory requirements and predictions
In addition to the anatomical motif of shortcut connections and Hebbian synaptic plasticity, the parallel pathway theory relies on four further requirements during the consolidation phase, which can also be considered as model predictions.
(1). Temporal correlations between the inputs from the two input pathways are necessary during consolidation, and these correlations should be similar to the ones during storage and recall. For example, a consolidation from hippocampus into neocortex would require correlations between cortical and hippocampal activity, as reported in [65]. Similarly, a consolidation of spatial memories within the hippocampal formation (including the medial entorhinal cortex, MEC) during replay would require correlations between activity in MEC and hippocampus; in particular, the same locations should be replayed, but represented by grid cells in MEC and by place cells in CA3 and CA1, as in Fig 3. A significant but weak correlation between the superficial layers of MEC (which provides input to the hippocampus) and CA1 was indeed observed [45]. Furthermore, pyramidal cells in the superficial layer III (projecting to CA1, "direct path") and stellate cells in the superficial layer II (projecting to DG, which projects to CA3, "indirect path") are expected to be correlated due to a strong excitatory feedforward projection from pyramids to stellates [66]; reviewed in [51]. Coordinated grid and place cell replay was also observed in [67] but there CA1 and deep layers of MEC (which receives the hippocampal output) were studied.
(2). The direct pathway should be plastic during consolidation, while the stored associations in the indirect path remain sufficiently stable (in contrast to the model in [24]). In practice, this requires the degree of plasticity to differ between periods of storage and consolidation (e.g., due to neuromodulation [68,69]), in a potentially pathway-dependent manner. In other words, the requirement is that the content of a memory should not be altered much while creating a backup.
(3). Plasticity in the shortcut pathway should be driven by a teaching signal from the indirect pathway. This can be achieved by STDP in combination with longer transmission delays in the indirect pathway, as suggested here, but other neural implementations of supervised learning may be equally suitable [42,43,70].
(4). Within the present framework, a systematic decrease in learning rates within the consolidation hierarchy (Fig 5) is needed to achieve memory lifetimes on the order of years. That is, synapses involved in later stages of consolidation should be less plastic during consolidation periods such as sleep, as also suggested by [10] and [24]. Furthermore, Roxin and Fusi elegantly showed in [10] that a multistage memory system confers an advantage (in terms of memory lifetime, memory capacity, and initial signal-to-noise ratio) compared to a homogeneous memory system with the same number of synapses, which provides a fundamental computational reason for the existence of a memory consolidation processes at the systems level. However, to be able to exploit this advantage, an efficient mechanism to transfer memories across stages is necessary. The proposed PPT explains how memories can be transferred in a biologically plausible way in a multistage memory system. Conceptually related to models of systems-memory consolidation with a systematic decrease in learning rates across a hierarchy of networks are models of synaptic memory consolidation with complex synapses that can assume many different states and a decrease of plasticity across a hierarchy of states. In such models of synaptic memory consolidation, also a power-law forgetting has been achieved [8,71]. Synaptic and systems memory consolidation models are different but not mutually exclusive.

What limits systems memory consolidation?
Our account of systems memory consolidation explains how memories are re-organized and transferred across brain regions. However, certain forms of episodic memory remain hippocampus-dependent throughout life [21].
In the context of the present model, this restriction could result from different factors. The PPT simplifies memory engrams by replacing multisynaptic by monosynaptic connections whenever possible. However, a shortcut pathway may not be present anatomically, or it may not host an appropriate representation for a given cue-response association in question. For example, it may be difficult to consolidate a complex visual object detection task into a shortcut from primary visual cortex (V1) to a decision area because the low-level representation of the visual cue in V1 may not allow it [72,73]. The same applies to tasks that require a mixed selectivity of neural responses [74]. Such tasks cannot be fully consolidated into shortcuts with simpler representations of cues and/or responses that do not allow a linear separation of the associations. On the basis of similar arguments, early work suggested that the hippocampus could be critical for learning tasks that are not linearly separable [75].
Within the present framework, the consolidated memory is in essence a linear approximation of the original cue-response association, as indicated in the theoretical analysis around Eqs (7) and (8). The resulting simplification of the memory content could underlie the commonly observed semantization of memories and the loss of episodic detail [20,21]. Such a semantization could already occur in the earliest shortcut connections [76], but could also gradually progress in a multi-stage consolidation process.

Relation to phenomenological models of systems consolidation
The basic mechanism of our framework explains memory transfer between brain regions, which is in line with the Standard Consolidation Theory (SCT) [11,19]. Our theoretical framework is closely related to the Complementary Learning Systems Theory (CLST) [11,63], which posits that slow and interleaved cortical learning is necessary to avoid catastrophic interference of new memory items with older memories [77]. In our model, later-presumably neocortical-shortcut connections have lower learning rates to achieve longer memory retention times. Interleaved learning could be achieved by interleaved replay [78][79][80] during consolidation. Thereby, the results of CLST can be directly applied to learning in shortcuts in our model, such as the rapid neocortical consolidation of new memories that are in line with a previously learned schema [17,63,81].
Limitations of memory transfer between brain regions-as discussed above-can impair the consolidation process, resulting in memories that remain hippocampus-dependent throughout life. Hence, our theoretical framework is also in agreement with the Multiple Trace Theory (MTT) [16] and the Trace Transformation Theory (TTT) [20,21]. The MTT postulates that memories are re-encoded in the hippocampus during retrieval, generating multiple traces for the same memory. Our model maintains multiple memory traces in different shortcut pathways, even without a retrieval-based re-encoding. The consolidation mechanism of the PPT, however, could also transfer a specific memory multiple times if it is re-encoded during retrieval. If neocortex extracts statistical regularities from a collection of memories [11], the consolidation of such a repeatedly re-encoded memory could then lead to a gist-like, more semantic version of that memory in neocortex [16,21,82], as emphasized by the TTT.
The premise of our model is that memories are actively transferred between brain regions. This premise has recently been subject to debate [83][84][85], following the suggestion of the Contextual Binding (CB) theory. The CB theory argues that amnesia in lesion studies and replaylike activity can be explained by simultaneous learning in hippocampus and neocortex, together with interference of contextually similar episodic memories [83]. Note, however, that our framework does not exclude a simultaneous encoding in neocortex and hippocampus, which can be combined with active consolidation [1,86].
Hence, our mechanistic approach is in agreement with and may allow for a unification of several phenomenological theories of systems consolidation.

Consolidation of non-declarative memories
Given that shortcut connections are widespread throughout the central nervous system [25,60], the suggested mechanism may also be applicable to the consolidation of non-declarative memories, e.g., of perceptual [4] and motor skills [5], fear memory [87] or to the transition of goal-directed to habitual behaviour [88].
Several studies have suggested two-pathway models in the context of motor learning [89][90][91][92]. In particular, Murray and Escola [92] recently used a two-pathway model to investigate how repeated practice affects future performance and leads to habitual behaviour. While their model does not incorporate an active consolidation mechanism or multiple learning stages, the basic mechanism is the same: A fast learning pathway from cortex to sensorimotor striatum first learns a motor skill and then teaches a slowly learning pathway from thalamus to striatum during subsequent repetition.

Limitations of the model and future directions
The present work focuses on feedforward networks and local learning rules. Hence, the model cannot address how systems memory consolidation affects the representation of sensory stimuli and forms schemata that facilitate future learning [17,81] because representation learning typically requires a means of backpropagating information through the system, e.g., by feedback connections [93]. The interaction of synaptic plasticity with recurrent feedback connections generates a high level of dynamical complexity, which is beyond the scope of the present study. Our framework also does not explain reconsolidation, that is, how previously consolidated memories become labile and hippocampus-dependent again through their reactivation [94,95].
On the mechanistic level, the PPT predicts temporally specific deficits in memory consolidation when relevant shortcut connections are lesioned, that is, a tight link between the anatomical organisation of synaptic pathways and their function for memory. These predictions may be most easily tested in non-mammalian systems, where connectomic data are available [96].
The PPT could provide an inroad to a mechanistic understanding of the transformation of episodic memories into more semantic representations. This could be modelled, e.g, by encoding a collection of episodic memories that share statistical regularities and studying the dynamics of statistical learning and semantisation in the shortcut connections during consolidation. Such future work may allow us to ultimately bridge the gap between memory consolidation on the mechanistic level of synaptic computations and the behavioural level of cognitive function.

Consolidation in a single integrate-and-fire neuron
For the results shown in Fig 1E and 1F we used a single integrate-and-fire model neuron that received excitatory synaptic input. The membrane potential V(t) evolved according to with membrane time constant τ m = 20 ms, resting potential V rest = −70 mV, and synaptic reversal potential E syn = 0 mV. When the membrane potential reached the threshold V thresh = −54 mV, the cell produced a spike and the voltage was reset to −60 mV during an absolute refractory period of 1.75 ms.
The total synaptic conductance g syn (t) in Eq (9) is denoted in units of the leak conductance and thus dimensionless (parameters are taken from [97]). The total synaptic conductance was determined by the sum of 1000 Schaffer collateral (SC) inputs and 1000 perforant path (PP CA1 ) inputs. Activation of input i (where i denotes synapse number) leads to a jump g i > 0 in the synaptic conductance: g syn ðtÞ ! g syn ðtÞ þ g i : ð10Þ All synaptic conductances decay exponentially, with synaptic time constant τ syn = 5 ms. The PP CA1 inputs were activated by mutually independent Poisson processes with a mean rate of 10 spikes/s. The activity patterns of the SC fibers were identical to those of the PP CA1 fibers but were delayed by 5 ms. The synaptic peak conductances or weights, g i , were either set to a fixed value or were determined by additive STDP [98]. A single pair of a presynaptic spike (at time t pre ) and a postsynaptic spike (at time t post ) with time difference Δt � t pre − t post induced a modification of the synaptic weight Δg i according to with τ STDP = 20 ms. L(Δt) is the learning window of STDP [98]. Hard upper and lower bounds were imposed on the synaptic weights, such that 0 � g i � � g max for all i, where the dimensionless maximum synaptic weight was � g max ¼ 0:006. Parameters A þ ¼ Z � � g max and A − = 1.05 � A + with η = 0.005 determine the maximum amounts of LTP and LTD, respectively.
Synaptic weights were initialized to form a bimodal distribution, such that it agrees with the steady state weight distribution resulting from additive STDP, when presynaptic input consists of uncorrelated Poisson spike trains [98]. Specifically, half the weights were sampled from an exponential distribution with mean 0:05 � � g max , the other half as � g max minus that same exponential distribution.
The dynamics were integrated numerically using the forward Euler method, with an integration time step of 0.1 ms.

Consolidation of spatial representations in a multi-compartment neuron model
The results presented in Fig 3C-3G relied on numerical simulations of a conductance-based compartmental model of a reconstructed CA1 pyramidal cell (cell n128 from [99]). Passive cell properties were defined by the membrane resistance R m = 30 kO cm 2 with reversal potential E L = −70 mV, intracellular resistivity R i = 150 Ocm, and membrane capacitance C m = 0.75μF/cm 2 . Dendrites were discretized into compartments with length smaller than 0.1 times the frequency-dependent passive space constant at 100 Hz. Three types of voltage-dependent currents and one calcium-dependent current, all from [100], were distributed over the soma and dendrites. Gating dynamics of the currents evolved according to standard first-order ordinary differential equations. The steady state (in)activation functions x 1 and voltage-dependent time constants τ 1 for each gating variable (i.e., x = m, h, n; see below) were calculated from a first-order reaction scheme with forward rate α x and backward rate β x according to x 1 (V) = α x (V)/(α x (V) + β x (V)) and τ x (V) = 1/(α x (V) + β x (V)) where V was the membrane potential. All used current densities and time constants were selected for a temperature of 37˚C (see [100]).
A fast sodium current, I Na , was distributed throughout the soma (� g Na ¼ 130 pS/μm 2 ) and dendrites (� g Na ¼ 260 pS/μm 2 ), except from the distal apical dendritic tuft, with reversal potential E Na = 60 mV. The dynamics of activation gating variable m and inactivation gating variable h were characterized by Here and in the following, we dropped units for simplicity, assuming that the membrane potential V is given in units of mV. The steady-state inactivation function was defined directly as A fast potassium current, I Kv , was present in the soma (� g Kv ¼ 95 pS/μm 2 ) and throughout the dendrites (� g Kv ¼ 190 pS/μm 2 ), with reversal potential E K = −90 mV and with activation gating variable n characterized by a n ¼ À 0:064 A high-voltage activated calcium current, I Ca , was distributed throughout the apical dendrites (� g Ca ¼ 30 pS/μm 2 ) with an increased density (� g Ca ¼ 35 pS/μm 2 ) for dendrites distal from the main apical dendrite's bifurcation, with reversal potential E Ca = 140 mV and with activation gating variable m and inactivation gating variable h characterized by A calcium-dependent potassium current, I KCa , was similarly distributed throughout the apical dendrites (� g KCa ¼ 30 pS/μm 2 ) with an increased density (� g KCa ¼ 35 pS/μm 2 ) beyond the main bifurcation of the apical dendrite, with activation gating variable n characterized by with [Ca 2+ ] in μM.
Internal calcium concentration in a shell below the membrane surface was computed using entry via I Ca and removal by a first-order pump, with Faraday constant F, depth of shell d = 0.1 μm and with [Ca 2+ ] 1 = 0.1μM, and τ R = 80 ms.
To account for dendritic spines, the membrane capacitance and current densities were doubled throughout the dendrites. An axon was lacking in the cell reconstruction and was added as in [100]. Excitatory synaptic inputs were distributed over the membrane surface. Upon activation of a synapse, the conductance with a reversal potential of 0 mV increased instantaneously and subsequently decayed exponentially with a time constant of 3 ms. The PP CA1 provided 500 inputs that were distributed with uniform surface density throughout the distal apical tuft dendrites; the SC provided 2500 inputs, distributed uniformly over basal dendrites and proximal apical dendrites [52].
All inputs were spatially tuned on a 2.5 m long linear track over which the simulated rat walked. The PP CA1 inputs showed periodic, grid field-like spatial tuning with periodicity ranging from 2 to 6 grid fields along the entire track with random phase: where H is the Heaviside step function, r is the mean firing rate within the grid field, k is the spatial frequency, and ξ i is the random spatial phase offset for neuron i (for i = 1, . . ., 500). The 2500 SC inputs showed place field-like tuning, having single, 25 cm long place fields distributed uniformly random along the track. When the virtual rat was within the place or grid field of an SC or PP CA1 fiber, respectively, the input was activated as an independent Poisson process with a mean rate of r = 10 spikes/s. Outside of the place/grid fields the fibers were quiescent. Simulations of the consolidation phase considered replay of the rat walking back and forth along the linear track, with running speeds increased, compared to realistic speeds, by a factor 20 (5 m/s; [13]). SC input activity to the CA1 cell was delayed by 5 ms with respect to the PP CA1 input [101], accounting for the extra processing stages involved for information reaching CA1 from the entorhinal cortex through DG and CA3, compared to the direct entorhinal PP CA1 input.
The PP CA1 and/or SC inputs showed additive STDP, operating in the same manner as defined around Eq (12). Post-synaptic spikes were defined as local voltage crossings of a threshold at −30 mV. The maximum synaptic weight for the SC inputs was 400 pS and 140 pS for the PP CA1 inputs.
The reference tuning curve shown in Fig 3F (PP CA1 inputs theory) was computed by adding up all grid field tuning functions that had an active field in the SC-encoded spatial position (i.e., halfway along the linear track).
Simulations were carried out with a fixed time step of 25 μs using the NEURON simulation software [102].

Consolidation of place-object associations in multiple hippocampal stages
The results related to Fig 4 show the acquisition and consolidation of place-object associations in a hippocampal network model. Every day a virtual animal learns the position of one of many possible objects in a circular open field environment. The simulations show that during a subsequent sleep phase, replay of the hippocampal activity that is associated with runs through this environment allows for the consolidation of the place-object association. We call the imprinting of a new memory and the subsequent memory consolidation phase a consolidation cycle. In the simulations, a place-object association learned at time t = 0 is tracked for N cycle consolidation cycles, i.e., nights after memory acquisition. Between consolidation cycles, the memory in the system is assessed as described below.
Model architecture. The model consists of four neuronal layers: entorhinal cortex (EC), dentate gyrus/CA3 (DG-CA3; note that the dentate gyrus is not explicitly included as a separate area), CA1, and the subiculum (SUB). Each layer consists of a population of place-coding cells and a population of object-coding cells. The connectivity is depicted in Fig 4A: EC projects to DG-CA3, which connects to CA1 (through the SC pathway), which in turn connects to the SUB. EC provides also shortcut connections to CA1 (PP CA1 pathway) and the SUB (PP SUB pathway).
The SC, PP CA1 , and PP SUB pathways consist of four different connection types among populations of neurons that represent either place or object: (i) from object (populations) to object (populations), (ii) from place to place, (iii) from object to place, and (iv) from place to object. For simplicity, the pathway from CA1 to the SUB consists only of place-to-place and object-toobject connections, because we never store object-place or place-object associations in this pathway. The pathway from EC to DG-CA3 was not explicitly modelled. Instead, we assumed that the same location (of the virtual animal) is represented in both areas, but with a grid cell code and a place cell code, respectively. We assumed that all connections have the same transmission delay, which is equal to one time step D = ΔT = 5 ms in the simulation (see Table 1 for parameter values). In practice, this meant that the activities in the SC pathway and the connection from CA1 to the SUB each had a transmission delay D relative to the activities in the connections from EC to DG/CA1 and from EC to SUB.
Activities of neurons in each layer were described as firing rates and were determined by a linear model, where x EC (t) and x CA3 (t) are the activities in the input layers EC and DG-CA3, respectively, and y CA1 (t) and y SUB (t) represent the activities in the output layers CA1 and SUB, respectively. Time is denoted by t. The symbols W PP-CA1 and W PP-SUB denote the weight matrices of the pathways from EC to CA1 and from EC to SUB, respectively. The matrices V SC and V CA1-SUB summarise the weights from DG-CA3 to CA1 and from CA1 to SUB, respectively, which mediate the transmission delay D. Eqs (23) and (24) are identical in structure to Eq (1) except that now the output is a vector (and not a scalar) and the synaptic weights are a matrix (and not a vector).
As already mentioned above, each neuron in a layer is assumed to primarily encode either place or object information (see Fig 4A). To simplify the mathematical analysis, we turn to a notation where we write a layer's activity vector z (where z = x EC , x CA3 , y CA1 , or y SUB ) as a concatenation of place and object vectors: where the number of place-and object-coding cells is identical, dim(z place ) = dim(z object ) = N, hence dim(z) = 2N. Correspondingly, the weight matrices M (where M = W PP-CA1 , W PP-SUB , V SC , or V CA1-SUB ) are composed of four submatrices, connecting the corresponding feature encoding sub-vectors (place-place, place-object, object-place, and object-object): Associations between objects and places were initially stored in V SC as described below. To achieve a consistency in the code for places and objects, the weights in V SC and V CA1-SUB that connect neurons coding for the same feature (i.e., place-place or object-object) were set proportional to identity matrices I, The scaling factors w id SC ¼ 1 4 and w id CA1-SUB ¼ 1 2 ensure that these pathways had similar impact as the other pathways projecting to CA1 cells and SUB cells, respectively, and w id CA1-SUB is twice as large as w id SC to account for the fact that only in the CA1-SUB pathway the object-place and place-object connections were set to zero. The matrices W PP-CA1 and W PP-SUB , which represent shortcuts, were plastic during a consolidation cycle and evolved according to the learning rule described below. Their initial values were chosen as a random permutation of an equilibrium state, taken from a long running previous simulation.
Place-and object-coding cells. Place-coding cells in EC and DG-CA3 were assumed to respond deterministically, given a two-dimensional position variable p(t) 2 [0, 1] 2 , which evolves in time.
Place-coding cells in entorhinal cortex show grid field spatial tuning [48], which we modelled as a superposition of 3 plane waves with relative angles of p 3 : x place EC;i t ð Þ ¼ r max 2 9 where the spacing m i ¼ 2p 2 þ 4i N À � ; 8i 2 ½1; N�; is chosen so that a total range of 2 to 6 periods fit into the circular environment. The orientation of the plane waves is determined by the vector where θ i are uniformly chosen random angles, and p i 2 [0, 1] 2 are uniformly sampled random phases of the grid field [49]. Each cell's output rate varies between 0 to r max spikes per second.
Place-coding cells in DG-CA3 show place-field tuning and were assumed to have a 2D Gaussian activity profile where r max is the maximum rate, σ the field size, and c i the centre of field i. The centres c i were chosen to lie on a regular grid. The object-coding cells in EC and DG-CA3 respond with fixed deterministic responses x object EC and x object CA3 to each of N object objects. Given that they are located in the same brain region, we assumed that the firing-rate statistics of the object-coding cells and the place-coding cells were similar, both in EC and CA1. This was ensured by calculating the rates of the object-coding cells in two steps. First, we used the same equations as for the place-coding cells (i.e., Eq (29) for EC cells and Eq (30) for DG-CA3 cells) with a randomly selected "object position" o i , i 2 {1, ‥, N object } for each of the N object objects. Subsequently the rates of the neurons within the population were randomly permuted for each object, to avoid an artificial constraint of the population activity onto a 2-dimensional manifold.
Imprinting of place-object associations in the SC pathway. The virtual animal learned a single new object-to-place association each day. Storing more memories per day would not qualitatively change the results, but would merely alter the time scale at which a given memory is overwritten in the SC pathway. Memories were imprinted in V SC by first determining the activities of the object-coding DG-CA3 cells and place-coding CA1 cells given a random object and a random position where the object was encountered (see previous section). The weights in V SC that connect object cells to place cells were then updated according to where 0 < λ SC < 1 (numerical values of parameters are summarized in  (31) ensures the same relative influence of different memories, irrespective of the associated activity levels. This ensures an approximately constant rate of overwriting/forgetting. The outer norm guarantees that the weights V SC stay bounded and hence induces forgetting. As a consequence of this updating scheme, the memories are lost over time. Note that before we imprint a new memory to V SC (other than on day 0 on which the place-object association is learned that is tracked during the simulation), the place-coding cells in DG-CA3 are remapped, i.e., they are assigned to new random positions. This corresponds to learning the new object in a new environment/ room, and effectively reduces the amount in interference between memories. Before starting a simulation, we imprinted N mem place-object associations to V SC to ensure an equilibrium state.
The weights from place-to-object coding cells could be updated analogously. This would allow to decode the identity of a stored object given a location. We did not test this direction of the object-place association, because this is not relevant for the water maze task.
Learning rule operating on PP CA1 and PP SUB pathways. The plastic weight matrices W PP-CA1 and W PP-SUB changed according to a timing-based learning rule [41]: where W is either W PP-CA1 or W PP-SUB , and y correspondingly y CA1 or y SUB . The learning window L(τ) defined in Eq (12) determines the learning dynamics. Eq (32) differs from the corresponding Eq (2) in several ways. First, on the left-hand side there is now a derivative, in contrast to the earlier version with a differential quotient; and on the right-hand side we omit the angular brackets that indicated a temporal average. Therefore, Eq (32) represents the instantaneous change of weights for a particular input, which is numerically more straightforward to implement in an online-learning paradigm. The resulting weight change for long times and many inputs approximates well Eq (2) if consolidation is slow enough. Second, we now omit the learning rate parameter η, which is absorbed in the definition of the parameters A + and A − of the learning window L. Third, there are now two addends in the integral and the integration limits are from 0 to 1. This is equivalent to the earlier definition, but more convenient for a numerical implementation. All this allows to simplify the description of the learning dynamics, as will be outlined in what follows.
We integrated the learning dynamics using the Euler method, with time steps ΔT equal to the inverse pattern presentation rate. In practice, we used the standard method of calculating pre-and postsynaptic tracesx andŷ to integrate the equation where A + and A − again determine the maximum amount of potentiation and depression of the synaptic weights, respectively. Note that these parameters effectively control the learning rate and are chosen twice as large in the PP CA1 than in the PP SUB (Table 1), to increase memory lifetime in the latter shortcut. Again, we used an exponential window function L(τ), so that exponentially filtered activitiesx andŷ can be calculated as in [98]: where τ STDP determines the width of the learning window. Weight values are constrained to the interval [0, w max ]. The weights of W PP-CA1 and W PP-SUB were initialized to small random values from a uniform distribution in ½0; w max init �. For each iteration in a consolidation cycle of duration T c , i.e., every ΔT = 5ms, we chose a random input position and a random object to calculate the activities in all layers. These activities were then used to update the weights as given in Eq (33).
Assessing the strength of memories in SC, PP CA1 , and PP SUB . To assess the memory strength encoded in a pathway, we determine the activity y place of place-coding cells (in either CA1 or SUB) in response to an object o 2 {1, . . ., N objects } along the object-to-place pathway under consideration (e.g., for PP CA1 it would be from object-coding cells in EC to place-coding cells in CA1). From this response we decode the memorized place of the object using Bayesian inference. However, the response is usually corrupted due to various factors such as imperfect imprinting, consolidation, or interference with other memories. Assuming that these imperfections result from a superposition of many statistically independent factors, we use a Gaussian likelihood: where N is the multivariate Gaussian probability density function, σ noise is the standard deviation of the noise, i.e., the imperfections. I is the identity matrix, i.e., we assumed uncorrelated noise in the responses. The expected activity μ(p) depends on the location p and is given by the activity that would result from the activation of place-coding cells in EC or DG-CA3, i.e., by Eqs (30), (23) and (24). Because the connections between place-coding cells in DG-CA3, CA1, and SUB are scaled identity matrices, the expected activity μ(p) is essentially a place-cell code: To avoid a dependence on overall activity levels, μ(p) and y place are normalized to zero mean and unit variance. Using Bayes' theorem we can now calculate the posterior probabilities of the places that coded for the given response y place : where for Eq (38) we used a flat prior, because the environment was uniformly sampled in the simulations. To avoid the explicit evaluation of the sum in the denominator, we normalise the evaluated place probabilities to sum to one. We make use of the linear relationship of the place response given an object (see Eqs (23) to (26) , depending on the pathway for which the strength of the memory is assessed. This allows to compute the posterior probability of the place given an object (Fig 4B and 4C): Memory consolidation over many days. To simulate a single consolidation cycle (i.e., a storage of a new memory followed by a single consolidation phase), we alternated the imprinting of a new place-object association (Eq (31)) with a consolidation phase of length T c . Before starting the experiments, we equilibrated the weights W PP-CA1 and W PP-SUB by simulating N equi consolidation cycles. At day 0 we imprinted the objectô: the memory which was tracked. After each following consolidation phase the place probabilities along the different pathways were calculated for objectô according to Eq (41) (see Fig 4C).
Lesion experiments. Remondes and Schuman [27] lesioned the perforant path (temporoammonic pathway) during a Morris water maze consolidation experiment. Their finding evidenced a role of the perforant path in memory consolidation by showing that the precise timepoint of the lesion after memory acquisition determined whether the memory persisted (see Fig 4D).
In our simulations we implemented a lesion by setting all PP CA1 weights to 0 (W PP-Ca1 = 0) and by disabling their plasticity. Like in the experimental setup of [27], we lesioned either right before or 21 days after presentation of objectô. For each day and lesioning protocol, the place probabilities, Eq (41), along the pathways can then be calculated. The pathway with the highest inferred object position probability was then selected, and the summed probabilities per quadrant were calculated for this pathway. To account for exploration versus exploitation (see, e.g., [103]) of the rats, the inferred probabilities were linearly mixed with a uniform distribution over the quadrants. We used 70% explore versus 30% exploit for the plots in Fig 4D. Note that we assumed that the probabilities per quadrant correspond to the time spent in each quadrant.  Fig 5A). Shortcut connections exist between the neocortical populations (colored arrows in Fig 5A). All connections carry the same transmission delay D.

Consolidation in a hierarchical rate-based network
Every day new memories are imprinted into the weight matrix representing the HPC. The model describes the transfer of the memories into neocortex during N cycle consolidation phases, of which there is one per night (for all model parameters and values, see Table 2). In contrast to the model for Fig 4, we do not consider object-place associations, but directly analyse correlations between a stored memory weight matrix and the weight matrices that describe the neocortical shortcut connections.
Model details. We consider a hierarchy of 2L neocortical populations with L = 8 shortcut connections. Activities of the populations that project towards the HPC are given by vectors x i (t) and the activities of the populations leading away from the HPC by vectors y i (t) (i 2 {1, . . ., L}). At each iteration, the activities x L (t) (i.e., the neocortical population most distal from the HPC) are sampled from a Gaussian distribution with a mean input rate r and a standard deviation r/2. The sampled activities are rectified to be non-negative (r max(r, 0)), hence yielding a rectified Gaussian distribution. The activities on all other layers are then determined by their respective connections. For simplicity, we assume that weight matrices connecting subsequent populations in the hierarchy (black arrows in Fig 5A) are identity matrices that are scaled such that activity levels remain comparable along the hierarchy (see below). The results do not depend on this simplifying assumption. The population activities along the HPC directed path are then given as 8i 2 f1; ::; L À 1g: ð42Þ In Fig 5, we modelled the HPC as a single neural population, with activities given by Here, V HPC is the hippocampal-formation weight matrix into which new memories are imprinted (see below). The first outward-directed neocortical population receives input from the HPC and through a shortcut connection from the activities x 1 , Using Eq (43), we obtain Note that Eq (45) is slightly different from Eq (1) because we have included the delay D now also in the direct pathway, for consistency; this does not influence the learning dynamics or the applicability of the theoretical analyses because the same delay is included in the learning rule in Eq (48). Subsequent activities y i of populations projecting away from HPC are calculated as where W i are the direct shortcut connections from the populations x i to the populations y i . Memory imprinting to the HPC weight matrix V HPC is analogous to the imprinting used in Fig 4 (compare Eq (31)). Before each consolidation phase, new memories were sampled from a binomial distribution B(1, 0.5). The HPC weights were then updated as where ½M� norm 1 denotes the L1 normalization of each row of the matrix M and 0 < λ < 1 is the strength of a new memory.
All shortcut connections W i showed plasticity similar to Eqs (33) and (34), i.e. with parameters τ STDP , A þ i , and A À i specified in Table 2. Weights were constrained to the interval [0, w max ] with w max ¼ 2 N and N being the number of neurons per layer. Initial weights were drawn from a uniform distribution in this interval. To increase memory lifetime in the system, learning rates were decreased along the hierarchy such that the learning rate in layer i is smaller than that in layer 1 by a factor q i−1 . Hence, layers closer to the HPC are more plastic than more remote layers.
Before starting the main simulation of N cycle consolidation cycles, we equilibrated the weight matrices by simulating N equi consolidation cycles.
Assessing the strength of memories in neocortical weight matrices. To assess the decay of memory in the system, a reference memory V ref , i.e. a specific realization from a row-normalized binomial distribution B(1, 0.5), was imprinted according to Eq (47) to V HPC at time t = 0. The memory pathway correlation, i.e., the Pearson correlation of this reference memory with all shortcut weight matrices W i was then calculated.
In analogy to the Methods on Fig 4, the maximum correlation (across layers) was taken as the overall memory signal of the system. This yields the power law in Fig 5B. The noise level indicated in Fig 5B is the standard deviation of the correlation between two random matrices drawn from a binomial distribution B(1, 0.5) and then row-normalized, both having sample size N 2 . Considering the central limit theorem, the noise level will be approximately 1/N.

Theoretical analysis of hierarchical consolidation
As outlined in the Results and illustrated in Fig 5, the suggested consolidation mechanism can be hierarchically iterated and leads to power law forgetting when the learning rates in the various pathways are suitably chosen. To get a theoretical understanding of this behaviour, let us consider the architecture shown in the Fig 6A, which is a generalized version of Fig 5A. The network consists of a hierarchy of N + 1 input layers and N + 1 output layers. For mathematical simplicity, the network is assumed to be linear (in contrast to the model described in Fig 5A, which was nonlinear due to biologically motivated weight constraints), and the representation in the input layers is assumed to be the same, i.e., the weight matrices between the input layers (indicated in black in Fig 6A) are all simply the identity matrix (in contrast to the model described in Fig 5A where the identity matrices were also scaled). Similarly, we also assume that all weight matrices between the output layers are also the identity matrix. The mathematical derivations presented in the following can be generalized to arbitrary weight matrices both in the input and the output pathways, but we prefer to treat the simple case to avoid cluttered equations and to make the theoretical approach more accessible.
We assume that due to newly acquired memories during the day, the weight matrix W 0 (t) (earlier called V HPC ) that represents the memory trace in the hippocampus is varying in time, with an exponentially decaying autocorrelation function with time constant t overwrite : htrðW 0 ð0Þ T W 0 ðtÞÞi t / expðÀ t=t overwrite Þ, where tr denotes the trace of a square matrix.
All other pathways that project from an input layer to an output layer are plastic according to STDP. To derive the learning dynamics for these pathways, we first have to calculate the activity y i in the i-th output layer, where x j denotes the activity in input layer j and c ij denote weighting factors that determine the impact of the jth pathway, i.e. the indirect pathway via W j , on output layer i. These weighting factors are needed, because we would like to keep the weight matrices on a similar scale, but avoid that the activity increases from one output region to the next, because more synaptic pathways converge onto "later" output layers. The symbol D ij = 2D(i − j) (defined only for i � j) denotes the total additional delay that is accumulated on the connection from the i-th input layer to the i-th output layer that traverses the j-th direct "shortcut" pathway, relative to the direct shortcut from input layer i to output layer i. For simplicity, we assumed that all connections have the same delay D. In a very similar way as in Eq (3), the learning dynamics of the The mathematical analysis is performed for a network consisting of N + 1 input and N + 1 output layers. All output layers (except output layer 0) weight the input from the previous layer with a factor α and the input via the shortcut pathway with a factor 1 − α, to ensure that activity does not rise as increasingly many pathways converge onto the output layers. Input layer i is hence connected to output layer i through a shortcut connection with weight matrix (1 − α)W i (except for the bottom-most layers i = 0, for which no factor 1 − α is required). All connections between input layers are set to the identity matrix I, and all connections between output layers are set to αI, for notational simplicity in the derivations. The math can be generalized to arbitrary connection matrices, as long as the network is linear. Each connection introduces a synaptic delay of D. The multi-synaptic pathway from input layer i to output layer i via shortcut connection j 6 ¼ i has a total delay of (2(i − j) + 1) � D, so the difference in delays between the pathway through shortcut i and shortcut j is weight matrix W i in the direct path can be written as where η i denotes the learning rate for the i-th pathway. For simplicity, we will assume that the different components of the input signal vector x i (t) are uncorrelated amongst each other, and have identical temporal autocorrelations that are also independent of the layer index: hx i ðtÞx T i ðt þ tÞi t ¼ If ðtÞ, where I is the identity matrix. The learning dynamics then simplify to with A(D) ≔ R L(τ)f(τ − D) dτ. To measure the degree to which a memory trace that is stored in the weight matrix W 0 at time t = 0 is still present in the j-th shortcut pathway at a later time t, we compare the weight matrix W j (t) at time t to the weight matrix W 0 (0) at time t = 0. We quantify the correlation of these two matrices by calculating the summed overlap of the column vectors: Note that the overlaps O i (t) are real numbers, and that their temporal dynamics for the shortcut connections (i.e., for all i > 0) are dictated by the dynamics of the weight matrices in the network: To capture the exponential decay of the initially stored memories in the "hippocampal" weight matrix W 0 due to the storage of new memories, the set of dynamical equations is completed by Note that the dynamics of the overlaps O i form a linear dynamical system. To show that this mathematical description shows a power-law behavior akin to the simulated system in Fig 5, we simulated the equations with the following parameter choices. Consistent with the exponential decay of the learning rates in the simulations, we chose the learning rates as η i = 2 −i . The weighting factors c ij were chosen based on the assumption that output layer i (for i > 0) receives a fraction α of its input from the output layer i − 1 below, and a fraction 1 − α via its direct shortcut connection (associated with the weight matrix W i ). Taking into account that the signal reaching layer i through shortcut connection j traverses several of these weighting stages (Fig 6A), this choice yields c ij = α i−j for j = 0 and c ij = α i−j (1 − α) for j > 0. Note that P i j¼0 c ij ¼ 1, so the activity level in different output layers should be similar. Finally, we assume that each synaptic transmission generates a fixed delay D and that the autocorrelation function f(τ) decays much more quickly than the STDP learning window. In this case, we can approximate AðD ij Þ ¼ exp À 2D ðiÀ jÞ For the simulations illustrated in Fig 6, we chose τ STDP = 40 ms as the time constant of an exponentially decaying STDP learning window for positive delays τ > 0, and we set A + = 1 in Eq (12). Furthermore, we used D = 2 ms. As shown in the Fig 6B, the maximum of the overlaps O j indeed approximates a power law decay.