Spike-Based Bayesian-Hebbian Learning of Temporal Sequences

Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods of firing in pools of excitatory neurons, together with asymmetrical associations between these distinct network states, can be acquired through plasticity. The model's feasibility is demonstrated using simulations of adaptive exponential integrate-and-fire model neurons (AdEx). We show that the learning and speed of sequence replay depend on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times in the sequence, suggesting that spiking attractor networks of this type can support an efficient combinatorial code. The model provides a principled approach towards understanding how multiple interacting plasticity mechanisms can coordinate hetero-associative learning in unison.


Introduction
Converging threads of evidence suggest that neural ensemble dynamics consist of intermittent population bursts with abrupt sequential transitions occurring on the order of hundreds of milliseconds. Indicating conserved underlying mechanisms, this hallmark of activity occurs in a diverse array of behaviors including motor [1][2][3], sensory [4,5], memory [6][7][8][9], and decision making [10,11] tasks. Similar population activity regimes are observed in cortical reactivations during sleep following motor performance or sensory exposure while the animal is awake [12][13][14]. These patterns routinely arise in the form of transient neuronal coalitions that experience concomitant shifts in their firing rates and manifest as stereotypical trajectories taken through the state space of neocortical networks.
Such discrete states of increased activity within which firing rates are approximately stationary allude to the attractor memory paradigm, which has been elaborated theoretically [15,16] and biophysically [17] from Hebb's early notion of cell assemblies [18]. Empirical evidence for attractor-like dynamics notwithstanding [19,20], reconciling attractor network models with the irregularity of cortical spiking observed in vivo has been problematic [21]. A proposed model circumvents this issue by introducing modularity in the form of soft winner-take-all (WTA) hypercolumns, which bestow irregular firing regimes and allow for switching between a low-rate background state and elevated attractor activations [22]. However, spike-based procedures for learning the connectivity required to support attractor dynamics in this model have not yet been considered, and temporal associations between different attractor states remain absent.
More generally, it remains an open question how stable sequential dynamics could self-organize within recurrent networks in a biologically meaningful way [23,24]. Despite issues regarding its ubiquity [25] and fidelity [26] as a generic learning rule, spike-timing dependent plasticity (STDP) [27] has often been exploited for sequence learning. Temporal causality is presumably beneficial for feed-forward structural development, but complementary mechanisms are typically required due to the inherent instability of phenomenological STDP rules [28].
For reliable temporal sequence production, STDP has been coupled with a slew of biologically-inspired phenomena including network-wide neuromodulatory interactions [29], synaptic weight normalization [30][31][32] and intrinsic excitability [33,34]. Although such modeling studies may employ as few mechanisms as possible for the sake of computational simplicity, they implicitly ignore how these mechanisms combine with potentially many other interacting dynamical processes.
How can one begin to understand these complicated synergies in a principled way? We propose that an attractor memory network could learn temporal sequences using the spike-based BCPNN learning rule [35]. Although the approach encompasses functionally diverse mechanisms including Hebbian, homeostatic, neuromodulated, and intrinsic plasticity, it can be straightforwardly understood since it is neatly encapsulated within the framework of probabilistic inference [36,37].
To enable the collective cortical transitions of the sort aforementioned, synaptic plasticity occurs on separate short and long time scales. Fast AMPA channels stabilize within-attractor activity, whereas temporal sequence development hinges upon slower NMDA channels whose delayed activations govern transitions between attractors [38,39]. The learning rule interprets these parameters as establishing a window of temporal integration for synaptic plasticity.
Through AdEx neuron simulations, the model can account for several noteworthy aspects of temporal sequence development including dependence on trained temporal interval [40], performance of prospective or retrospective replay regardless of the temporal training direction [41,42], cue-triggered recall [43], and co-temporal neural representation of serial order [1]. It provides a basis to explore mechanisms by which sequential cortical activity is compressed in time [12,13,43], and demonstrates that relative replay speed is determined by a combination of intrinsic network parameters as well as the speed of conditioning stimuli. Lastly, we illustrate the flexibility of the learning rule in the network using experimentally motivated stimulus configurations, and find that it is able to complete and disambiguate overlapping patterns [44] through successful retrieval of sequential attractor trajectories.

A Spike-Based BCPNN Learning Model
We implemented a computational model of a cortical microcircuit inspired by previous work [45] that was topographically structured according to the columnar organization of neocortex [46,47]. The network consisted of 2700 pyramidal cells and 270 basket cells divided into nine hypercolumns ($N_{HC} = 9$), each consisting of 300 pyramidal cells and 30 basket cells. Pyramidal cells had spike-frequency adaptation according to the AdEx model (see Eq 6 in Methods).
The pyramidal cells in each hypercolumn were further divided into ten minicolumns ($N_{MC} = 10$), each consisting of 30 pyramidal cells [48]. The pyramidal cells in each minicolumn projected laterally to and received feedback from the 30 local inhibitory basket cells within their own hypercolumn to form competitive WTA subnetworks [49]. These connections were static and consisted of both excitatory AMPA-mediated pyramidal-to-basket and inhibitory GABAergic basket-to-pyramidal connections, with connection probabilities set to 0.7. In addition to the static local (i.e., within-hypercolumn) feedback connections, there were plastic BCPNN synapses (see below), which randomly formed AMPA- and NMDA-mediated connections among all pyramidal cells across the network with distance-dependent axonal delays (see Eq 5 in Methods) and a connection probability of 0.25 (Fig 1A). These connections also exhibited short-term depression (see Eq 8 in Methods).
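As an illustration of this layout, the following Python sketch builds the cell-to-module bookkeeping and random connection masks under the stated probabilities; the names and the choice of seed are ours, and no synaptic dynamics are included.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed for reproducibility

N_HC, N_MC = 9, 10                    # hypercolumns, minicolumns per hypercolumn
PYR_PER_MC, BAS_PER_HC = 30, 30
N_PYR = N_HC * N_MC * PYR_PER_MC      # 2700 pyramidal cells
N_BAS = N_HC * BAS_PER_HC             # 270 basket cells

# Hypercolumn and minicolumn membership of each cell
hc_of_pyr = np.repeat(np.arange(N_HC), N_MC * PYR_PER_MC)
mc_of_pyr = np.repeat(np.arange(N_HC * N_MC), PYR_PER_MC)
hc_of_bas = np.repeat(np.arange(N_HC), BAS_PER_HC)

# Static local WTA loop: pyramidal <-> basket within the same hypercolumn, p = 0.7
same_hc = hc_of_pyr[:, None] == hc_of_bas[None, :]
pyr_to_bas = same_hc & (rng.random((N_PYR, N_BAS)) < 0.7)    # AMPA, excitatory
bas_to_pyr = same_hc.T & (rng.random((N_BAS, N_PYR)) < 0.7)  # GABA, inhibitory

# Plastic BCPNN synapses: random pyramidal-to-pyramidal connectivity, p = 0.25
bcpnn_mask = rng.random((N_PYR, N_PYR)) < 0.25
np.fill_diagonal(bcpnn_mask, False)   # no autapses (an assumption of this sketch)

print(pyr_to_bas.sum(), bas_to_pyr.sum(), bcpnn_mask.sum())
```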
The strengths of AMPA and NMDA synapses between pyramidal cells, along with the intrinsic excitabilities of all pyramidal cells, underwent plasticity according to the spike-based BCPNN learning rule. BCPNN makes use of local traces that estimate the firing rates of pre- and postsynaptic neurons along with their coactivations, respectively referred to below as $P_i$, $P_j$, and $P_{ij}$. Statistics of neural input and output activity are used to infer a synaptic weight $w_{ij}$ between pre- and postsynaptic neurons $i$ and $j$, and an intrinsic excitability $\beta_j$ of postsynaptic neuron $j$. For simplicity, the separate synaptic traces in Eqs 1-3 are indexed by pre- ($i$), postsynaptic ($j$), or both ($ij$) neural variables, depending on their relative source(s) of activity [35].
Presynaptic $S_i$ and postsynaptic $S_j$ spike trains are formally described by summed Dirac delta pulses $\delta(\cdot)$ with respective spike times $t_i^{sp}$ and $t_j^{sp}$:

$$S_i(t) = \sum_{sp} \delta(t - t_i^{sp}), \qquad S_j(t) = \sum_{sp} \delta(t - t_j^{sp}). \tag{1}$$

Smoothing was performed using a system of two coupled exponentially weighted moving averages. The first level of traces, the Z traces, lowpass filtered $S_i$ and $S_j$:

$$\tau_{z_i}^{syn} \frac{dZ_i}{dt} = \frac{S_i}{f_{max}\,\Delta t} - Z_i + \varepsilon, \qquad \tau_{z_j}^{syn} \frac{dZ_j}{dt} = \frac{S_j}{f_{max}\,\Delta t} - Z_j + \varepsilon. \tag{2}$$

Setting $f_{max} = 20$ Hz established a normalizing linear transformation between neuronal spike rate $\in [f_{min}, f_{max}]$ and probability space $\in [\varepsilon, 1]$ (see Methods) such that the average value of the Z traces over time was approximately 1 when a neuron fired at $f_{max}$; here $\varepsilon = 0.01$ represented the lowest attainable probability estimate, $f_{min} = \varepsilon f_{max} = 0.2$ Hz the corresponding lowest firing rate, $\Delta t = 1$ ms the spike duration, and $syn$ denoted AMPA or NMDA synapses. Presynaptic Z trace time constants were set to $\tau_{z_i}^{AMPA} = 5$ ms and $\tau_{z_i}^{NMDA} = 150$ ms, the latter reflecting the slow closing dynamics of NMDA receptor gated channels. Postsynaptic Z trace time constants $\tau_{z_j}^{AMPA} = \tau_{z_j}^{NMDA} = 5$ ms reflected the fast closing dynamics of AMPA receptor gated channels and membrane depolarization due to spike back-propagation, respectively [50].
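For concreteness, a minimal Python sketch of one Euler step of these Z traces follows, assuming a 1 ms simulation step equal to the spike duration; the function name and example values are illustrative, not part of the published implementation.

```python
import numpy as np

f_max, eps, dt = 20.0, 0.01, 1.0   # Hz, lowest probability estimate, ms

def update_z(z, spiked, tau_z):
    """One forward-Euler step of a Z trace (Eq 2).

    `spiked` is a boolean array marking which neurons fired in this 1 ms bin;
    a spike is treated as S = 1 held for the spike duration, so the drive
    1/(f_max*dt) makes the trace average ~1 at sustained f_max firing.
    """
    s = spiked / (f_max * 1e-3 * dt)        # the S/(f_max*dt) term of Eq 2
    return z + dt * (s - z + eps) / tau_z

# Example: presynaptic AMPA and NMDA traces driven by the same spike train
z_ampa = np.full(5, eps)
z_nmda = np.full(5, eps)
spikes = np.array([True, False, False, True, False])
z_ampa = update_z(z_ampa, spikes, tau_z=5.0)     # tau_zi^AMPA
z_nmda = update_z(z_nmda, spikes, tau_z=150.0)   # tau_zi^NMDA
```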
Activation and coactivation probabilities were estimated by filtering the Z traces:

$$\tau_p \frac{dP_i}{dt} = \kappa (Z_i - P_i), \qquad \tau_p \frac{dP_j}{dt} = \kappa (Z_j - P_j), \qquad \tau_p \frac{dP_{ij}}{dt} = \kappa (Z_i Z_j - P_{ij}). \tag{3}$$

AMPA channel phosphorylation and mobilization from reserve pools, gene expression, and protein synthesis are potential candidates for mediating Hebbian synaptic plasticity and memory formation in a manner prescribed by the P traces [51,52]. These traces constitute the memory itself, which decays in palimpsest fashion.
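Continuing the sketch above, the P traces of Eq 3 can be stepped the same way; `update_p` is an illustrative name, and the κ argument anticipates the 'print-now' gating described below.

```python
import numpy as np

def update_p(p_i, p_j, p_ij, z_i, z_j, kappa, tau_p=5000.0, dt=1.0):
    """One forward-Euler step of the P traces (Eq 3); kappa gates learning.

    p_i and p_j are per-neuron vectors; p_ij is a pre-by-post matrix whose
    target is the product of the corresponding Z traces.
    """
    p_i  = p_i  + dt * kappa * (z_i - p_i) / tau_p
    p_j  = p_j  + dt * kappa * (z_j - p_j) / tau_p
    p_ij = p_ij + dt * kappa * (np.outer(z_i, z_j) - p_ij) / tau_p
    return p_i, p_j, p_ij
```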
The time constant for the P traces, $\tau_p$, reflected the extended time course of the downstream processes. We simply imposed $\tau_{z_i}^{syn}, \tau_{z_j}^{syn} < \tau_p$, whereas in reality the timescale of $\tau_p$ can exhibit high variability, ranging from seconds to months [53]. Since the time it took to learn increased proportionally to $\tau_p$, it was set to 5000 ms to speed up simulations while preserving the dynamics studied. P traces had the same time constants because they reflected estimated spike rates whose updates incrementally converged to approximate probabilities (see Eqs 1-3 in S1 Appendix).

[Fig 1. (A) The WTA hypercolumns (shaded gray areas) consisted of excitatory neurons (red circles) belonging to minicolumns (red ovals) that innervated (red arrows) local populations (blue ovals) of basket cells (blue circles), which provided inhibitory feedback (blue arrows). Nonspecific BCPNN synapses (black arrow) were recurrently connected amongst all excitatory neurons (black dotted box), with learned weights (scale) determined by pre- and postsynaptic spike activity. Semicircular arrowheads indicate the postsynaptic target. (B) Schematic of a polysynaptic subcircuit that elicited a net excitatory effect on the postsynaptic neuron when the BCPNN weight was positive (+ pathway), while a negative BCPNN weight can be thought of as being converted to inhibition via excitation of a local inhibitory interneuron (− pathway).]
P traces were modulated by a globally applied 'print-now' factor $\kappa \in [0, 1]$, which signaled the relevance of recent synaptic events in the spirit of what would be expected from a neuromodulator like dopamine [54]. By assuming that neurons were minimally active, the P traces were always positive, allowing their logarithmic transformation (Eq 4, see below) to remain defined.
The full derivation starting from Bayes' rule is delineated elsewhere [35] and summarized in S1 Appendix. Probabilistic computations on the network level can be approximated by individual neuronal biases $I_{b_j}$ and synaptic weights $w_{ij}^{syn}$:

$$\beta_j = \log(P_j), \qquad w_{ij}^{syn} = \log\!\left(\frac{P_{ij}}{P_i P_j}\right). \tag{4}$$

The dimensionless $\beta_j$ component quantified a general level of excitability and spiking of the postsynaptic neuron, as it was solely dependent upon that neuron's relative activity [55]. $\beta_j$ was scaled by $\beta_{gain} = 50$ pA to give $I_{b_j}$, an activity-dependent intrinsic membrane current of an AdEx model neuron (see Eq 6 in Methods), like the A-type K⁺ channel [56] or TRP channel [57]. The strength of a BCPNN synapse was $w_{ij}^{syn}$. The peak amplitude of the conductance transient was determined by the logarithmic BCPNN weight multiplied with $w_{gain}^{syn}$, using values $w_{gain}^{AMPA} = 6.02$ nS and $w_{gain}^{NMDA} = 1.22$ nS. Their ratio was the AMPA/NMDA ratio, which was tested at different levels due to variations in reported values [58].
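A minimal sketch of Eq 4, assuming the P traces computed above; `bias_and_weights` is an illustrative name, while the gain values are those quoted in the text.

```python
import numpy as np

beta_gain = 50.0                          # pA
w_gain = {"AMPA": 6.02, "NMDA": 1.22}     # nS

def bias_and_weights(p_i, p_j, p_ij, syn="AMPA"):
    """Eq 4: intrinsic bias current and synaptic weights from the P traces.

    The eps floor on the traces keeps all probabilities positive, so the
    logarithms stay defined; negative weights are expressed through an
    inhibitory reversal potential downstream (Fig 1B).
    """
    I_b = beta_gain * np.log(p_j)                         # pA, one per neuron
    w = w_gain[syn] * np.log(p_ij / np.outer(p_i, p_j))   # nS, one per synapse
    return I_b, w
```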
It was assumed that stimulus arrival coincided with an increase in κ, the third factor of the learning rule, for all BCPNN synapses and pyramidal cells. Mimicking dopamine activation responses as learning progresses for novel stimuli [59,60], κ temporarily permitted changes in all $w_{ij}^{syn}$ and $I_{b_j}$ through global modulation of $\tau_p$ (Eq 3). During interleaved quiescent periods when no stimulus was deemed relevant, κ was clamped to 0. In practice, κ = 0 disabled learning by freezing $w_{ij}^{syn}$ and $I_{b_j}$, allowing for post-training simulation in off-line mode [61]. During training periods, κ = 1 while both $\beta_{gain}$ and $w_{gain}^{syn}$ were set to 0; during recall periods, κ = 0 while both $\beta_{gain}$ and $w_{gain}^{syn}$ were positive. The logarithmic transformation in Eq 4 was motivated by the Bayesian underpinnings of the learning rule, but this caveat meant that $w_{ij}^{syn}$ multiplexed both positive and negative signals. Learning of excitatory or inhibitory weights was interpreted in the context of a monosynaptic excitatory connection with conductance set by the positive component of $w_{ij}^{syn}$, alongside a disynaptic inhibitory connection [62][63][64][65] set by the negative component (Fig 1B). In practice, when the sign of the BCPNN weight turned negative, an inhibitory reversal potential was used instead of an excitatory one (see Eq 7 in Methods). The rationale for the development of negative weights is expounded upon in the Discussion.
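The training/recall clamping can be summarized as a simple parameter schedule. This sketch is illustrative only; `w_gain_scale` is a hypothetical multiplier applied to the $w_{gain}^{syn}$ values above rather than a quantity defined in the paper.

```python
def phase_parameters(phase):
    """Clamp schedule used in the protocol: learn without expressing weights
    during training, express weights without learning during recall."""
    if phase == "training":
        return {"kappa": 1.0, "beta_gain": 0.0, "w_gain_scale": 0.0}
    if phase == "recall":
        return {"kappa": 0.0, "beta_gain": 50.0, "w_gain_scale": 1.0}
    raise ValueError(f"unknown phase: {phase}")
```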
Following previous theoretical studies [66][67][68], the shape of the learning kernel (Fig 2A) was linked to the shape of the EPSP measured by the change in synaptic conductance $g_{ij}^{syn}$ (see Eq 9 in Methods), which was determined by synaptic time constants $\tau_{AMPA} = 5$ ms and $\tau_{NMDA} = 150$ ms [69] (Fig 2B). We set $\tau_{z_i}^{NMDA} = \tau_{NMDA}$ and $\tau_{z_i}^{AMPA} = \tau_{AMPA}$, entailing that the NMDA component was subject to plasticity [70] and resulting in a temporally shifted window function [71].

Learning Sequential Attractors Using Spike-Based BCPNN
The network was initially trained using 10 orthogonal component patterns. Each pattern consisted of one active minicolumn per hypercolumn without overlap (Fig 3A), and was presented for $t_{stim} = 100$ ms using an inter-pulse interval (IPI) of 2 seconds (Fig 3B), during which κ was clamped to 0 (Fig 3C). Over the course of training, plastic changes transpired for intrinsic currents $I_{b_j}$ (Fig 3D) and weights $g_{ij}^{syn}$ of both AMPA (Fig 3E) and NMDA (Fig 3F) synapses.
Neurons sharing patterns developed positive connections among themselves, but those belonging to different patterns developed strong negative ones. Minicolumn neurons belonging to the same hypercolumn had similar average connection strength profiles, so Fig 3E and 3F would look the same for a single minicolumn within a hypercolumn as they do here for neurons belonging to a single pattern across all hypercolumns. Learned quantities did not converge precisely but rather fluctuated within an interval, with the amplitude of these fluctuations determined by $\tau_p$ (S1 Fig). Similar firing rate statistics during training, together with a vanishingly small $t_{stim}$ in the limit of $\tau_p$, ensured approximately equal values for positive $g_{ij}^{syn}$ within patterns, for negative $g_{ij}^{syn}$ between patterns, and for all $I_{b_j}$. During memory recall, the network spontaneously recalled stored temporal sequences in the form of temporally coactive patterns (Fig 3G). Sorting the rastergram by pattern revealed clear ~500 ms periods of elevated firing rates shared by groups of neurons from different hypercolumns (Fig 3H). We refer to these periods as attractor states (see Methods). Network dynamics wandered from attractor to attractor, alternating in a seemingly random fashion, while neurons outside the currently active attractor were silenced due to lateral inhibition (Fig 3I). A non-coding ground state spontaneously emerged, in which individual active states were transiently interrupted by lower rate (~2 Hz) volatile periods during which neurons competed via WTA, thwarting any obvious selection of a winner. Attractor demise was precipitated by a combination of inhibition from the network together with neural and synaptic fatigue, which consisted of an enhanced spike-triggered adaptation current $I_w$ (Fig 3J, see Eq 6 in Methods) and short-term depression that depleted available synaptic resources (Fig 3K, Eq 8) amid active attractor states. [Fig 3J-3K caption: evolution of the adaptation current $I_w$ for a single neuron whose activity builds up and attenuates as it enters and exits attractor states, and evolution of the same neuron's dynamic AMPA strength due to short-term depression, whose postsynaptic target resides within the same attractor.]
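For illustration, minimal Euler-step sketches of the two fatigue mechanisms follow; the magnitudes and time constants shown are placeholders rather than the paper's fitted values.

```python
import numpy as np

def step_adaptation(I_w, spiked, b=60.0, tau_Iw=500.0, dt=1.0):
    """Spike-triggered adaptation current of the AdEx neuron (cf. Eq 6).

    b (pA) and tau_Iw (ms) are placeholder values: I_w decays exponentially
    between spikes and is incremented by b at each spike.
    """
    I_w = I_w * np.exp(-dt / tau_Iw)
    return I_w + b * spiked

def step_depression(x, spiked_pre, U=0.25, tau_rec=500.0, dt=1.0):
    """Tsodyks-Markram-style depletion of synaptic resources (cf. Eq 8).

    U and tau_rec are placeholders: a fraction U of the remaining resources
    x is consumed per presynaptic spike and recovers with time constant tau_rec.
    """
    x = x + dt * (1.0 - x) / tau_rec
    return x - U * x * spiked_pre
```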
A longer IPI (2 s) relative to the NMDA plasticity window width $\tau_{z_i}^{NMDA}$ (150 ms) meant that associations between different patterns did not develop. To demonstrate how BCPNN learning could memorize sequences, we altered the training protocol by decreasing IPIs and then reexamined network recall dynamics. We began by reducing the IPI to 0 ms (Fig 4A). Salient stimuli were always being encountered by the network in this case, so κ = 1 throughout. This enabled an evolution of $I_{b_j}$ as in Fig 3D. The otherwise similar learning outcomes can be appreciated by the fact that neurons were exposed to identical stimuli while κ = 1. Stimulus gaps previously encountered while κ = 0 (Fig 3A), which would have led to $g_{ij}^{syn}$ and $I_{b_j}$ decay had κ been nonzero, were essentially never learned.
However, the lower IPI did lead to a dispersion of higher average NMDA connection strengths that deviated markedly from Fig 3F (Fig 4B). Filtering by the BCPNN traces reflected relative stimulus presentation times, which created a discrepancy in the amount of overlap in the underlying running averages of the coactivity traces $P_{ij}$ (Eq 3, S2B Fig) when the IPI was reduced. Reflecting the temporal disparity between $t_{stim}$ and $\tau_{z_i}^{NMDA}$, the modified training procedure allowed the network to learn stronger feedforward associations than recurrent ones and produced an asymmetrical terminal $g_{ij}^{NMDA}$ structure (Fig 4C). NMDA synapses connecting excitatory cells in subsequently presented patterns formed progressively decreasing forward associations. Taken from the other perspective, trailing inhibition was learned in the form of negative-valued $g_{ij}^{NMDA}$ towards antecedent pattern cells. The resulting weight configuration reorganized the network dynamics, generating reliable sequential transitioning between attractor states (Fig 4D-4F). Sequential state traversal stably and perpetually repeated, though including longer temporal gaps between entire epochs rather than individual patterns could alleviate such cyclic behavior. In networks trained using longer IPIs, the probability of transitioning from one attractor to another seemed uniform (Fig 3G-3I). However, attractor transitions became more deterministic as IPIs were decreased to 0 ms (Fig 4D-4F); thus there were differential effects of the training interval on neural recall dynamics [40]. Since the difference in recall dynamics between the two setups presented here could mostly be ascribed to the makeup of $g_{ij}^{NMDA}$ between neurons from different patterns (rather than, e.g., $g_{ij}^{AMPA}$ or $I_{b_j}$), we characterized the model networks trained at different temporal intervals by their terminal $g_{ij}^{NMDA}$ profiles (Fig 5A). To capture the transformation from random to nonrandom temporal sequences depending on these resulting weight profiles, we measured the distribution of attractor transitions for different IPIs using CRP curves (see Methods). Shortening the IPI led to narrower CRP curves that peaked at lag 1 (Fig 5B), indicating that this manipulation decreased recall heterogeneity and made forward transitions more likely. Networks trained with longer IPIs instead displayed CRP curves that hovered around chance levels.
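A simplified reading of the CRP measurement, assuming replay is reduced to a sequence of detected attractor indices and lags are taken circularly against the trained order:

```python
import numpy as np

def crp_curve(recalled, n_patterns=10):
    """Distribution of transition lags from a recalled attractor sequence.

    Lag is the circular index distance between consecutively recalled
    attractors relative to the trained order, so purely sequential replay
    concentrates all probability mass at lag 1.
    """
    lags = [(b - a) % n_patterns for a, b in zip(recalled[:-1], recalled[1:])]
    counts = np.bincount(lags, minlength=n_patterns).astype(float)
    return counts / counts.sum()

print(crp_curve([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]))   # all mass at lag 1
```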
A side effect of the learning procedure that led to varying degrees of asymmetry as a function of the IPI was the difference in total learned NMDA synaptic strength averaged over each entire network (Fig 5C). In networks trained with shorter IPIs, surplus aggregate $g_{ij}^{NMDA}$, specifically $g_{ij}^{NMDA}$ projecting towards subsequent patterns, influenced not only the order of recall but also the resulting average dwell times $\mu_{dwell}$ (Fig 5D, see Methods) and firing rates (Fig 5E). A more pronounced effect of NMDA in low-IPI trained networks meant higher within-attractor firing rates, which translated into stronger activation of adaptation and short-term depression, and therefore quicker initiation and cessation of individual attractors. Reduced NMDA in high-IPI trained networks meant that AMPA receptors could prolong attractor states further before succumbing to the pressures of neural and synaptic fatigue.
Lag 1 transitions (Fig 5B) conveyed the temporal extent of sequence replay. To ascertain whether the transformation in recall order dynamics from random to sequential was gradual or abrupt, we ran the same analysis but considered only the lag 1 transitions of the CRP curves (S3 Fig). We found that the temporal extent of sequence recall was limited, with CRP curves reaching chance levels around IPI = 1500 ms [12]. The rather gradual decline in the CRP, from the point at which forward transitions were inevitable to the point at which their probability was indistinguishable from all other possible transitions in the network, demonstrates the capability of NMDA to influence the trajectory taken through state space depending upon the trained temporal interval.

Characterizing the Dynamics Underlying Temporal Sequence Recall
As the network with IPI = 0 ms proceeded through sequential states, individual neurons not actively participating in attractors were mostly quiescent, being suppressed through lateral inhibition. In this regime, previously referred to as the background state [22], single cells displayed characteristically hyperpolarized $V_m$ (Fig 6A, see Eq 6 in Methods) until recruited into an attractor, where they then received an excitatory push through delayed activation of the NMDA receptor (Fig 6B, Eq 7). Companion neurons then followed, and there was a period of mutual excitation that stabilized attractor activity via the AMPA receptor (Fig 6C, Eq 7). Once fully established, the attractor immediately began to perish due to ongoing cellular adaptation and short-term depression (Fig 3J and 3K, Eqs 9 and 11), which ensured its neurons would eventually lose their competition against neurons of opposing attractors due to GABA receptor activation (Fig 6D, Eq 7).
On the network level, systematically eliminating one dynamical mechanism at a time provided insights into each mechanism's overall contribution to the circuit (S4 Fig). Eliminating each plastic mechanism in turn degraded sequence replay capabilities, with the exception of intrinsic plasticity. Most mechanisms also degraded sequence replay at higher parameter settings, the exception being adaptation, whose detrimental effects were mitigated by its relatively short time constant, which was held constant for these experiments (see Eq 6 in Methods).
Introducing a variable P(switch), which determined the probability that individual sequence elements within a training pattern order would differ from the original one, led to a disruption of structure in the terminal $g_{ij}^{NMDA}$ profiles (S5A Fig; a sketch of this manipulation appears at the end of this passage). The chance that a sequence element was switched was applied indiscriminately to all sequence elements in all epochs. This alteration gradually degraded the capability of the network to perform sequential recall, similar to the effect of increasing the IPI (S5B Fig, compare to Fig 5B). On the spiking level, a bimodal distribution of pyramidal cell inter-spike intervals (ISIs) arose, representing either ISIs within attractor states or ISIs between attractor states (S6A Fig). The variability of single cell spiking was predominantly high for both pyramidal and basket cell spike trains, as reflected by their average local coefficient of variation of ISIs, which clustered close to unity (S6B Fig).

Up to this point, the network has been presented in a regime where background excitation was high enough to spontaneously rebroadcast stored patterns. To instead demonstrate cue-triggered recall of temporal sequences [43], the network input and training protocol were slightly altered by decreasing background excitation and adding temporal gaps between the last and first patterns of the training sequences. When briefly presented with a cue (see Methods) at the first pattern, the network could then perform sequential recall in the order that previously experienced conditioning stimuli were encountered (Fig 7A). Until presented with a cue and after recall, the network was generally disengaged and fired at low rates, exhibiting similar characteristics to the ground state that arose in Fig 3 (Fig 7B).
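Returning to the P(switch) manipulation above, one hypothetical implementation swaps each sequence element with a random position with probability P(switch); the exact swap semantics here are an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed for reproducibility

def perturb_order(order, p_switch):
    """Hypothetical reading of P(switch): each element of the trained pattern
    order is, with probability p_switch, swapped with a random position."""
    order = list(order)
    for i in range(len(order)):
        if rng.random() < p_switch:
            j = int(rng.integers(len(order)))
            order[i], order[j] = order[j], order[i]
    return order

# One training epoch's presentation order at P(switch) = 0.2:
print(perturb_order(range(10), p_switch=0.2))
```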
A notable aspect of Fig 6B was the elevated level of $I_{NMDA}$ just before the cell entered an attractor state, suggesting that a neuron's subthreshold dynamics could anticipate its future participation in the state despite the strong lateral inhibition that prevented it from spiking. As switching between states was not necessarily visible from the perspective of the single cell, $V_m$ was averaged over all cells in the network for the cue-triggered simulation, from which it became clear that groups of cells exhibited consecutively postponed upward $V_m$ transients corresponding to their anchoring events (Fig 7C).
Prior to cue presentation, averaged attractor membrane potentials spontaneously fluctuated yet roughly kept a reverse ordering of the original trained sequence. This was a relic of small leftover differences in $I_{b_j}$ after training, because the last pattern was the one most recently presented and the first was the least recently presented to the network (Fig 3D; although referring to the non-sequentially trained network, the same principle holds). Once the cue was presented, however, and as similarly shown in Averbeck et al. (2002), the network dynamics were overwritten. The rank-ordered attractors reorganized to correspond to the serial order of the trained stimuli, the subsequent attractor began to rise at roughly the middle of the current attractor, and the previous and subsequent attractors began to cross at approximately the point where the execution of the subsequent attractor began.
These relationships were most prominent for the first four attractors occurring directly after the cue (attractor index 0). The rank-ordered $V_m$ of attractors with indices 1-4 at this time corresponded to their trained serial order, but this ordering principle was less well-defined at the same point in time for the more distant attractors with indices 5-9. This was due to the limited temporal extent of $\tau_{z_i}^{NMDA}$, which was able to reach ahead only by a dwell time of about four attractors, after which its impact went unfelt. Relative rank ordering was not unique to the cue time, but was largely conserved even for attractors that occurred later in the sequence. For example, during the activation of attractor 5, the next highest ranks represented were attractors 6, then 7, then 8, and finally 9. Alterations of the moving average window duration did not qualitatively distort these properties.
But beginning around attractor index 5, the tendency of the subsequent attractor to rise at the middle of the current attractor, and the tendency of the previous and subsequent attractors to cross where the subsequent attractor began, broke down. For example, attractors 8 and 9 already began to rise while attractor 6 was active. This was due to neural and synaptic fatigue, whose effects became more pronounced towards the end of the cue-triggered sequence, making it easier for neurons of attractors occurring later in the sequence to prevail in their competition with exhausted ones that were active earlier.
Since the NMDA current is subject to temporal summation, later attractors also tended to rise earlier due to NMDA build-up from previously active inputs. Such considerations may inform future experiments evoking sequential memories whose temporal extent stretches beyond the time course of NMDA receptor decay.

Factors Determining Temporal Sequence Recall Speed
In Fig 7A, the slow increase of average subthreshold $V_m$ predicted the neuron's participation within the approaching assembly, raising the question as to what cellular and network factors initiated this $V_m$ build-up, determined the rising angle of its positive slope, and therefore increased or decreased the speed of ensuing attractor activation. At the network level, differential attractor dwell times would manifest as changes in the speed of sequential transitions.

[Fig 7 caption: In this example, both before and after the cued sequence, patterns spontaneously but weakly activated, which could occur randomly due to the sensitivity of the system. Black arrows represent time periods occurring after the midpoint of the cue initially and after the midpoint of each attractor thereafter, during which the average $V_m$ of the upcoming attractors is ranked according to relative serial order within the sequence.]
To analyze which factors played a role in determining the relative lifetimes of sequential attractors, we began by taking the perspective that the speed at which sequences were learned was a product of the speed at which conditioning stimuli were presented to the network. Given its functional importance for sequence development in this model (see Fig 5), the strength of terminal NMDA weights is shown for different stimulus durations (Fig 8A). As $t_{stim}$ was decreased, the shape of the terminal NMDA weight distribution flattened out, leading to an increased level of overall NMDA excitation and therefore faster network recall speeds (Fig 8B, see Eq 11 in Methods). We also measured the strength of terminal NMDA weights while varying the number of training epochs with $t_{stim} = 100$ ms (S7 Fig). Terminal weights evolved to a relatively stable distribution of values after only 10 epochs, lowering the possibility that the observed speed effects were localized to the 50 epoch condition.
The speed with which attractors were recalled ranged from 3 to 7 attractors per second due to changes in $t_{stim}$. But recall speeds were mostly dilated, displaying a compression factor < 1 (see Eq 12). This meant that changes in $t_{stim}$ (i.e., trained speed changes) alone did not suffice to keep an even pace with the recall speed, let alone account for compression factors as high as those from experimental measurements [12,13,43].
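A minimal sketch of the compression factor of Eq 12, assuming it is the ratio of recall speed to trained speed as described:

```python
def compression_factor(recall_speed, trained_speed):
    """Cf. Eq 12: ratio of recall speed to trained speed (attractors/s vs.
    patterns/s); values < 1 indicate dilated replay, > 1 compressed replay."""
    return recall_speed / trained_speed

# t_stim = 100 ms corresponds to a trained speed of 10 patterns/s; a recall
# speed of 5 attractors/s then gives a compression factor of 0.5 (dilation).
print(compression_factor(5.0, 10.0))
```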
An alternate hypothesis is that speed is not constrained as much by peripheral events as it is by intrinsic cortical mechanisms. When the network recalled sequences, it was apparent that the $V_m$ of each attractor displayed weak subthreshold oscillations (see Figs 6A and 7C). This suggests that elevating $V_m$ should lead to faster replay. To explore this possibility, we modified several different intrinsic parameters of the network after training and measured their effects on recall speed. The most straightforward modification we considered was to increase the amount of background noise $r_{ex}$ in the network; as expected, delivering higher input levels to the excitatory cells accelerated recall speeds (Fig 9A).
Next, we considered the effect of synaptic currents on recall speed. We did not alter the absolute AMPA or NMDA synaptic efficacies alone, but rather altered their relative contributions via $w_{gain}^{syn}$ (Eq 4). Decreasing the AMPA/NMDA ratio led to a noticeable speed-up (Fig 9B), highlighting again the importance of NMDA in fueling the ascent of forthcoming attractor states.
Because the gradual erosion of attractor states was determined by a combination of neural adaptation and short-term synaptic depression (see Fig 3J and 3K), we next varied these dampening mechanisms in isolation to see how they affected sequence recall speed. We varied two types of parameters in the models of adaptation and short-term depression (see Eqs 9 and 11): their respective magnitudes, $b$ and $U$, and time constants, $\tau_{I_w}$ and $\tau_{rec}$. Increasing the magnitude of both neural (Fig 9C) and synaptic (Fig 9D) fatigue enhanced recall speeds. Increasing the time constants of both neural (Fig 9E) and synaptic (Fig 9F) fatigue also enhanced recall speeds, but to a lesser extent. Overall, these results not only illustrated the relationship between intrinsic network parameters and recall speed, but also demonstrated a level of robustness of the model to these parameter changes, since the sequential function of the network remained intact.
Based on all of the parameter variations considered, it became clear that temporal compression in this network was more easily attainable for longer stimulus durations (Figs 8B and 9). Since experiments demonstrating temporal compression similarly elicited firing activity at longer durations [12,13], we analyzed the extent to which the network could be maximally compressed and dilated for a pattern duration of $t_{stim} = 500$ ms (i.e., a trained speed of 2 patterns/second). Compression factors for each calibrated parameter at this trained speed are summarized in Fig 10A. Although all of the parameters tested at this speed were able to alter the network speed to some extent, some more effectively than others, it would still be impossible to separate out which specific factor gave rise to compression if sequential activity were already compressed, for example in an experimental setting. To further characterize how these mechanisms could inform the network dynamics, we considered their effects on firing rates.
Linear regression analysis was performed on the relationship between firing rates and recall speeds using all of the output data procured from the different networks in Figs 8 and 9, and a weak relationship between these two variables was found (Fig 10B). Since there was minimal interaction between the mechanisms contributing to these variables, the same type of analysis was then performed for each individual network component (Fig 10C). We found that training speed and $b$ were the most strongly correlated with average firing rate, $U$ and $\tau_{I_w}$ were weakly correlated, and $r_{ex}$, $\tau_{rec}$, and the AMPA/NMDA ratio were uncorrelated.
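The regression itself is standard; the sketch below uses invented placeholder numbers purely to show the form of the analysis, not data from the paper.

```python
import numpy as np
from scipy.stats import linregress

# Invented placeholder observations solely to illustrate the analysis:
# (mean firing rate in Hz, recall speed in attractors/s) per network variant.
rates  = np.array([8.1, 9.4, 10.2, 11.0, 12.3, 13.5])
speeds = np.array([3.2, 3.9,  4.1,  5.0,  5.6,  6.8])

fit = linregress(rates, speeds)
print(f"slope = {fit.slope:.2f}, r = {fit.rvalue:.2f}, p = {fit.pvalue:.3f}")
```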

Completion and Disambiguation of Overlapping Sequences
To further illustrate its robustness and functional flexibility, the network was trained using sequentially overlapping temporal stimulus patterns. Two separate training protocols were designed, motivated by experimental paradigms [72,73] and adhering to generic sequential memory recall tasks [44]. For the following protocols, we reverted to a training configuration involving interposed delays between trials; in other words, training blocks with IPI = 0 ms were separated by 2000 ms periods of quiescence, as in Fig 7. Each protocol consisted of two unique subsequences of the complete sequence pattern, alternatingly presented 50 times each.
In the first test, alternating stimulus patterns consisted of two subsequences in which the last pattern of the first subsequence matched the first pattern of the second subsequence (Fig 11A). The network should be able to bind the transitively associated discontiguous subsequences together and perform cue-triggered recall in its entirety despite the different degrees of temporal proximity between subsequences [73]. After training, the resulting NMDA weight matrix exhibited pronounced asymmetry (Fig 11B), as in Fig 4C. But nonspecific negative connections developed between all of the non-overlapping patterns, such that the network could rely only upon the excitatory connections projecting to and from a single pattern in order to bridge the two subsequences together. Nevertheless, the network could overcome these limitations and successfully recall the entire sequence as in Fig 7 (Fig 11C).
In the second test, stimuli consisted of two alternating subsequences whose middle two patterns were shared between both subsequences (Fig 11D). The first two unique patterns of each subsequence were followed by an intersection where the next two patterns were shared by both subsequences; this was followed by a 'fork' that divided the subsequences again into two separate branches comprising their last two unique patterns [72].
The resulting NMDA weight matrix after training showed that the connections projecting from the overlapping patterns towards the patterns belonging to each fork were identical (Fig 11E). But the unique subsequence patterns occurring before the overlapping patterns selectively developed connectivity preferences towards the last two unique patterns of their associated fork branch, and formed negative projections towards patterns belonging to the opposing fork branch. These preferences, enabled by the temporal extent of $\tau_{z_i}^{NMDA}$, allowed the network to successfully recall each individual subsequence when presented with its respective cue (Fig 11F).
Sequence disambiguation could be performed using a single overlapping pattern shared by both subsequences (S8 Fig), but not using more than two. Spontaneous attractors arose in between cue presentations during these experiments, but they did not hinder the ability of the network to faithfully perform recall. Due to the success of the task, the resulting CRP curve for this experiment was similar to the sequential one presented in Fig 5B, albeit with more noise outside of the lag 1 transitions due to these more unreliable spontaneous attractors (S9 Fig).

Neurobiological Underpinnings
Spike-based Bayesian-Hebbian learning enabled imprinting of sequential memory patterns onto the neocortical microcircuit model. This convenient setup allowed us to scrutinize the cellular, synaptic, and network mechanisms underlying sequence formation. Although some of the generic mechanisms presented for learning temporal sequences here may be employed throughout other well-studied brain regions [74][75][76], the proposed model is identified with a neocortical microcircuit since the BCPNN derivation entails columnar processing [46,47].
Hypercolumns were locally governed by soft WTA dynamics, which is a generic neocortical organizational principle [49] that has been emphasized elsewhere in the context of sequential neural activity [77][78][79] and sparse distributed coding [45,80]. They were modularly organized into laterally inhibiting minicolumns (Fig 2) [37,81,82], which can be thought of as a more economical way for circuits to abstractly achieve a denser and more symmetric connectivity as originally postulated by the Hopfield model, given that connectivity is viewed at the minicolumn rather than cellular level [17]. Modularity is also present in the entorhinal cortex [83], suggesting some of the generic computational features of this model could be relevant in the context of grid cells, whose $V_m$ modulations have been shown to closely match predictions from similar types of attractor models (Fig 7C) [84].
On the level of neurons and synapses, the learning rule incorporated several experimentally reported dynamical processes. We chose the AdEx model to demonstrate our findings because of its broad usage and availability in the simulation literature, its mapping between parameters and experiments [85], and its compatibility with recently developed neuromorphic hardware systems [86]. We would not expect employing other adapting neural models [87] to influence our results, since network dynamics are relatively robust to parameter changes and individual spikes. In fact, a reduced version of this model was recently run with generalized adaptive leaky integrate-and-fire neurons on the digital neuromorphic SpiNNaker architecture [88].
The intrinsic neuronal bias $I_{b_j}$, reflecting the prior activation level, was viewed as a model for the activity-dependent hyperpolarizing activation of A-type K⁺ ion channels mediated by Ca²⁺-regulated cationic currents [56] or TRP channels [57]. It shifted the input-output curve of the postsynaptic neuron such that persistently weakly active neurons received more hyperpolarizing current and highly active neurons received less, in functional agreement with long-term potentiation of intrinsic excitability (LTP-IE) [55]. The proposed network does not require intrinsic plasticity in order to replay temporal sequences (see S4A Fig), but it is included here to comply with a previously introduced spike-based probabilistic framework [35]. The role of $I_{b_j}$ within the circuit depends on the distribution of training patterns. When the training patterns are presented equally often and at evenly spaced intervals, $I_{b_j}$ converges to similar values for all of the neurons in the network (Figs 3 and 4). But more complex training with patterns occurring at different frequencies (e.g., pattern 5 of Fig 11A and 11D) means that $I_{b_j}$ would differ for neurons belonging to different attractors. This opens up the possibility that rather than providing a generic downward pressure on activity, learned $I_{b_j}$ could more directly determine the sequence trajectory itself in some circumstances, although this was not observed in the simulations presented here.
Plastic changes in synaptic weights were determined by the relative window width set by $\tau_{z_i}^{syn}$ and $\tau_{z_j}^{syn}$ (see Fig 2B and S2 Fig). We set the postsynaptic $Z_j$ trace time constants to $\tau_{z_j}^{AMPA} = \tau_{z_j}^{NMDA} = 5$ ms, reflecting the fast closing dynamics of AMPA channels and the membrane depolarization due to spike back-propagation, respectively [50]. The latter helps lift the Mg²⁺ block on NMDA channels and allows Ca²⁺ to flow into the postsynaptic neuron and induce synaptic potentiation upon activation of the NMDA receptor by glutamate [89]. Presynaptic $Z_i$ trace time constants were set to $\tau_{z_i}^{AMPA} = 5$ ms and $\tau_{z_i}^{NMDA} = 150$ ms, the latter reflecting the slow closing dynamics of NMDA receptor gated channels.
AMPA synapses had symmetrical learning time constants $\tau_{z_i}^{AMPA} = \tau_{z_j}^{AMPA} = 5$ ms, which were matched to the AMPA synaptic time constant $\tau_{AMPA} = 5$ ms. Learned auto-associative AMPA connectivity was responsible for sharpening and stabilizing activity within attractor states. But NMDA synapses had asymmetrical learning time constants, which enabled functionally asymmetric inter-pattern hetero-association. Asymmetric excitation (Fig 4C) can be achieved through synaptic learning [71,90], which may explain the development of asymmetric hippocampal place [91] and visual receptive [92] fields. We matched the shape of the learning kernel to the shape of the EPSP (Fig 2) following previous studies [66][67][68], and acknowledge that this parameterization is sufficient but not necessary for temporal sequence recall in our model. In principle, recall would also be possible using different degrees of NMDA window shape asymmetry, and even using a symmetrical NMDA kernel (e.g., by setting $\tau_{z_i}^{NMDA} = \tau_{z_j}^{NMDA} = 150$ ms). In a cue-triggered setup with trained symmetrical NMDA, the sequence could be recalled either forwards or backwards by presenting a cue to the first or last attractor, respectively. Instead of being directed by NMDA asymmetries, the preferred sequence trajectory would evolve according to adaptation and short-term depression. After shutting down an active attractor, the network would activate the immediately neighboring attractor, since it would receive the next highest amount of learned potentiation.
However, if this were the case experimentally, one would expect cortical networks to be able to be trained on a specific stimulus order and then nonselectively perform sequential pattern completion regardless of whether the cue was presented at the head or the tail of the original sequence. Since it is rather the case that there is a high specificity for trained sequence order [43,93], we emphasize the functional importance of an asymmetrical NMDA kernel in this model. It should be noted that such temporal specificity is also compatible with what would be expected from STDP [92].
As apparent from Figs 3D-3F, 4B and S1 Fig, $g_{ij}^{syn}$ and $I_{b_j}$ did not continually grow nor shrink but ultimately settled into quasi-stationary values that reflected accrued knowledge of the stimuli. Such behavior is functionally reminiscent of synaptic scaling [94] and anti-homeostatic plasticity of intrinsic excitability [55], which can similarly stabilize Hebbian synaptic plasticity and LTP-IE over longer time scales, respectively.
Furthermore, as the $P_j$ trace was involved in the computation of both $I_{b_j}$ and $g_{ij}^{syn}$, the learning model can be viewed in terms of co-expressed synaptic and nonsynaptic plasticity arising from shared biochemical signaling cascades [95]. The computational role of an eligibility trace (i.e., E trace) as an intermediate exponential filter between the Z and P traces was previously explored [35], but it was excluded here because delayed reward learning was not required for learning temporal sequences.
The globally applied 'print-now' factor κ controlled both synaptic and intrinsic plasticity. In contrast to learning scenarios in which the dynamics of the network disturb the training regime, our protocol activated κ and deactivated both $\beta_{gain}$ and $w_{gain}^{syn}$ during pattern presentation, and vice versa during recall (see Figs 3 and 4). Clamping is unconventional in spiking models, but it is more common in classical recurrent associative networks, in which recalling old patterns actively interferes with learning new patterns. This dilemma has been addressed in spiking models by introducing multiple time scales with different synaptic states [96,97], but we assume here that neurophysiological mechanisms clamp activity to the patterns to be learned through selective suppression [61].
We consider this third-factor κ as representing the influence of a non-local neuromodulatory signal like acetylcholine or dopamine, which are thought to play important roles in the emergence of temporal sequential activity [93,98]. They can have pronounced postsynaptic effects on cortical neurons like increasing spiking activity from synaptic stimulation. These effects enhance the response of cortical neurons to afferent input during the suppression of synaptic transmission within intrinsic connections, enabling neurons to clamp to afferent input patterns [61].
Though beyond the scope of this work, activating $\beta_{gain}$ and $w_{gain}^{syn}$ during pattern presentation should not drastically diminish the network's learning capabilities under most circumstances. A suppression of competing minicolumn activity would largely prevent neurons outside of a presented pattern from being recruited into that pattern, as long as patterns remained relatively orthogonal. Introducing more overlapping patterns would likely re-establish the need to modulate κ, $\beta_{gain}$, and $w_{gain}^{syn}$. Furthermore, we applied a simplified binary model of κ, but acknowledge that introducing more realistic neuromodulatory dynamics [99] could be a productive direction for future study.
Both positive and negative weights can be learned in the BCPNN formulation due to its Bayesian derivation. Negative weights are neurobiologically interpreted as a form of disynaptic inhibition (see Fig 1B) [62][63][64][65]. In the interest of more biologically detailed setups, networks with negative synaptic weights have been shown to be functionally equivalent to ones with both excitatory and inhibitory neurons with only positive weights [100]. Introducing explicit fast-spiking linear interneurons may influence the network dynamics due to their added synaptic delays, but these effects would be minimal since our model is not critically dependent upon precise spike timing.

Temporal Sequence Dynamics
Supporting the notion that learned spatiotemporal sequences depend on stimulus timing (see Fig 8), evoked responses are sensitive to the duration of experienced temporal patterns in vitro [40], are spontaneously replayed at the same speed they are trained on in vivo [101], and are decreased for identically ordered but re-timed stimuli down to a difference of just 50 ms [93]. But there is a growing consensus that the involvement of peripheral or environmental signals cannot fully account for the speed of sequential activity [102]. Supporting the notion that internally generated signals play an important role (Fig 9), sequence speeds may be altered depending on the behavioral state of the animal [43] and in the absence of any perceivable input [5]. Furthermore, they are not time-locked to the onset of the stimulus [4] or task [9], and they are compressed during sleep in visual [13,43] and prefrontal [12] cortices.
Although some studies of cortical activity report variable sequence element durations [4,8], they have previously been shown to last about 100 ms and correlate with the animal's reaction time [3]. They can also last up to hundreds of milliseconds and dynamically vary as a function of the behavioral relevance of spatial information [5] or moment within task execution [1]. Temporal compression is particularly important for memory networks since patterns with shorter durations are more amenable to potentiation through associative processes (Fig 8A). Indeed, learned stimulus sequences replay up to an order of magnitude faster than trained time in prefrontal cortex [12] and primary visual cortex [41,43]. Conceptually, our model provides a way to explore how spike-based processes could influence sequence replay speeds (Fig 9). Several aspects including the amount of short-term depression [103], differential contributions of AMPA and NMDA currents [104], and strength of adaptation currents [105], are all considered to be under the influence of different neuromodulators. Their demonstrated effects on replay speed in our model imply that sequential circuit activity could be dynamically modified through neuromodulation [93,98]. The wide diversity of published values associated with these mechanisms [58,106], along with the functional heterogeneity inherent within neocortical microcircuits [3,107], lent some leeway to our choice of biophysical parameter ranges while measuring replay speeds. We expect that the addition of physiological detail to the model in the form of more diverse interneuron subtypes would influence speeds as well [108,109].
The generic neural mechanisms elucidated by this model may also be suitable for describing temporal interactions in the hippocampus, where it is well known that place cell activity can be compressed [41] or dilated [110] in time. A functionally powerful aspect of our model is that it can recall sequences in reverse by switching $\tau_{z_i}^{NMDA}$ and $\tau_{z_j}^{NMDA}$ (Fig 4G). Although bidirectional replay is typically associated with hippocampal place cell activity [41,42], alterations in plasticity window shape between groups of reciprocally connected neurons may also constitute a more generic learning mechanism for evaluating past experiences.
While orthogonality can reduce interference between patterns and lead to pristine synaptic weight matrices (Figs 3 and 4), natural stimuli often consist of common repeating patterns, which would in turn shape how sequences are memorized by the network [111]. That neural networks must solve the problems of overlapping sequence completion and disambiguation was proposed on theoretical grounds [44] and confirmed later on by experiments in neocortical [72,73] and hippocampal [112,113] preparations. The essence of the problem boils down to ambiguous information: is the circuit able to distinguish separate events that share common repeating patterns and remember earlier patterns in order to complete sequence patterns despite overlapping interference?
We have demonstrated examples of overlapping sequence recall inspired by these experiments (Fig 11) that may also be relevant for working memory tasks in which divergent sequential trajectories unfold when the animal's travel routes share common spatial locations during training (e.g., the center arm of a T-maze) [9,11]. In this way, our simulations support a functional architecture in which overlapping neural assemblies form coherent network states that can be selectively deployed depending on the cognitive requirements of the task at hand [10]. Indeed, due to the extended time course of NMDA receptor dynamics, the model can disambiguate up to two overlapping patterns (Fig 11D-11F, S9 Fig). Given the many other possible combinations of overlapping and nonoverlapping sequence configurations that could be simulated using this network model, future work could address scaling the network up in order to accommodate longer and even more complex hetero-associative patterns.

Related Theoretical Models
Presumably in an effort to avert analytical intractability, models of sequence generation typically rely only upon one or few underlying computational mechanisms. But large-scale, recurrently connected cortical circuits exhibit complex dynamical interactions; they play host to many plastic mechanisms that intricately sculpt and are sculpted by ongoing activity. A key question that arises then is how to balance the presence of such generic and widespread phenomena while reducing unnecessary model complexity.
Early models enabled Hebbian learning of attractor sequences using a fully connected network of binary neurons [114] and a slow synaptic refresh mechanism in more realistic rate-based neurons [115]. In conjunction with STDP, more recent approaches have proposed a global astrocytic factor [67], homeostatic intrinsic plasticity [34], homeostatic synaptic plasticity [116], presynaptic homeostatic synaptic plasticity [117], homeostatic synaptic and non-Hebbian heterosynaptic plasticity [118], homeostatic synaptic and intrinsic plasticity [78], and homeostatic synaptic and homeostatic intrinsic plasticity [119]. In the same vein, our model combines synaptic and nonsynaptic plasticity mechanisms to hypothesize a means of producing sequences.
The excitatory-to-inhibitory and inhibitory-to-excitatory WTA connections (see bottom of Fig 1A) are predefined in this model. WTA dynamics can theoretically be stably learned in rate-based networks of randomly connected neurons using both excitatory and inhibitory plasticity [120]. We speculate that introducing Hebbian excitatory-to-inhibitory and inhibitory-to-excitatory WTA synapses, and subsequently nonspecifically stimulating all inhibitory cells during the presentation of each element of the sequence, may permit similar dynamics. In line with several newly proposed spiking models that enable plasticity for all synaptic connections by relying upon inhibitory plasticity to learn and retrieve memories [118,121,122], learning WTA synapses in our modular model is a promising direction of future study.
Our model avoids unnecessary complexity by framing the accompanying dynamical changes in the context of Bayes' rule [35]. Spike-based BCPNN computes synaptic weights and intrinsic biases based upon probabilities that are estimated by cascades of interacting synaptic and nonsynaptic memory traces. Supporting the view that cortical networks can represent probability distributions that are interrelated via Bayes' rule [123][124][125][126], BCPNN statistically interprets glutamate receptor kinetics as establishing a window of temporal integration for Bayesian-Hebbian learning.
Bayesian inference is a viable neurobiological encoding strategy that reasonably captures numerous psychophysical [127] and even electrophysiological [124,128] data. Furthermore, it is attractive from a computational point of view due to its ability to represent and compute using uncertainty in a principled way [123]. Uncertainty is inherent in situations where there are several plausible interpretations of an incoming signal, for example on a macroscopic level given noisy or incomplete sensory stimuli, or on a microscopic level where stochastic biochemical processes abound [129,130].
By the same token, the intuition that probabilistic computation should occur on the timescale of a single ISI in order to match the temporal scale of perception and sensorimotor integration [125] has propelled theoreticians to devise biophysical substrates for the neural implementation of Bayes' rule. Like the aforementioned sequence studies, such models tend to consolidate multiple generic learning processes into a single cohesive probabilistic framework [35,68,131,132].
Likewise, and by virtue of the postsynaptic traces (i.e. Z_j and P_j) and third factor κ, our approach highlights how synaptic, intrinsic, and neuromodulated plasticity mechanisms [133,134] spanning several time scales [96,135] could interact to memorize sequential patterns of activity. Moreover, the incorporation of cellular adaptation [85] and synaptic depression [103] for terminating attractor states provides an explicit functional role for these dynamic mechanisms in the proposed network's overarching computation. This conforms to previous work asserting an essential role for NMDA [136], adaptation [137], and short-term plasticity [9,138,139,140] in shaping sequential network dynamics over time.
Generation of sequential states in temporally asymmetric attractor networks was proposed early on [38,39]. The innovation in our work lies in learning temporally asymmetric NMDA conductances and temporally symmetric AMPA conductances between spiking neurons. Distinct propagatory roles for AMPA and NMDA receptors in circuit activity have been suspected [141], and it is not surprising that the long decay time constant of NMDA would contribute to sequential neuronal activity [40,43,98], given its neural ubiquity along with its postulated role in long-term memory [142], working memory [143], predictive coding [144,145], and storing hetero-associative sequence information [146]. Its prolonged activation acts as a slow memory trace that dynamically bridges different neuronal assemblies [18,147], allowing representational content to pass from one group of neurons to the next in sequence and propelling the network along a trajectory through state space.
Sequential activity due to asymmetrical connections has alternatively been described in the context of reservoir computing [148], where the trajectory through state space via transient dynamics emphasizes a departure from classical attractor networks involving the inevitable convergence to a single state [149]. However, active states in the model presented here should be distinguished from fixed-point attractors since they exhibit finite dwell times before being repulsed by adaptation and short-term depression. In our model, the circuit undergoes dynamic meta-stable patterns of activity whose voyage through state space is dictated by the extent of previously learned temporal associations, offering a plausible way in which time may be implicitly encoded within an attractor network.

Neuronal and Synaptic Models
Hypercolumns were spatially laid out on a 3x3 square grid with coordinates (m, n) ∈ {0, 1, 2}². The axonal delay t_ij for each BCPNN connection between presynaptic neuron i and postsynaptic neuron j was set according to the hypercolumn membership of the neurons, using the Euclidean distance between the corresponding hypercolumn grid coordinates:

$$t_{ij} = \frac{d_{norm}\sqrt{(m_i - m_j)^2 + (n_i - n_j)^2}}{V} + 1\ \mathrm{ms} \quad (5)$$

Delay values were randomly generated from a normal distribution with a standard deviation set to 10% of their mean t_ij in order to account for individual arborization differences. The conduction velocity was V = 0.2 mm/ms, and we assumed the 3x3 piece of cortex occupied a 1.5x1.5 mm area, so setting d_norm = 0.75 mm scaled relative distances accordingly. The additional offset of 1 ms was included so that the shortest delays in the network, which occurred when neurons i and j belonged to the same hypercolumn (i.e. m_i = m_j and n_i = n_j), were centered on 1 ms.
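For concreteness, a minimal Python sketch of this delay rule follows. The helper names are ours; only the constants given above (V = 0.2 mm/ms, d_norm = 0.75 mm, the 1 ms offset, and the 10% relative jitter) are taken from the text.

```python
import numpy as np

rng = np.random.default_rng()

V = 0.2        # conduction velocity (mm/ms)
D_NORM = 0.75  # grid-to-cortex distance scaling (mm)

def axonal_delay(hc_i, hc_j):
    """Mean BCPNN delay (ms) between neurons in hypercolumns at grid
    coordinates hc_i = (m_i, n_i) and hc_j = (m_j, n_j), per Eq 5."""
    dist = np.hypot(hc_i[0] - hc_j[0], hc_i[1] - hc_j[1])
    return D_NORM * dist / V + 1.0  # 1 ms offset centers same-column delays

def sampled_delay(hc_i, hc_j):
    """Draw an individual delay with 10% relative standard deviation."""
    mean = axonal_delay(hc_i, hc_j)
    return max(rng.normal(mean, 0.1 * mean), 0.1)  # keep delays positive

# e.g. adjacent hypercolumns: 0.75 mm / 0.2 mm/ms + 1 ms offset
print(axonal_delay((0, 0), (0, 1)))  # -> 4.75 ms
```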
All simulations were conducted using the NEST simulator [150] with the AdEx neuron model [85]. The AdEx model was modified here in order to account for the intrinsic excitability term β_j, and we simplified the model further by eliminating subthreshold adaptation dynamics. For each neuron j, the temporal evolution of the membrane potential V_m and adaptation current I_w obeyed:

$$C_m \frac{dV_{m_j}}{dt} = -g_L\,(V_{m_j} - E_L) + g_L\,\Delta_T\, e^{(V_{m_j} - V_t)/\Delta_T} - I_{w_j}(t) + I_{b_j} + I_{ext} + I^{tot}_j(t), \qquad \tau_{I_w} \frac{dI_{w_j}}{dt} = -I_{w_j}(t) \quad (6)$$

The model theoretically describes spike emission at the time when V_m diverges towards infinity, but in practice, when the threshold V_t = -55 mV was approached (V_m → V_t), V_m was reset to V_r = -70 mV. Here, V_t is not a strict threshold per se, but rather the center of an exponential nonlinearity modeling each action potential upswing. At the same time as this discrete voltage reset, the adaptation current was increased according to I_w ← I_w + b with b = 150 pA, and it subsequently decayed with time constant τ_Iw = 150 ms. Unlike pyramidal cells, local basket cells included neither spike-triggered adaptation (b = 0) nor intrinsic excitability (β_gain = 0). V_m was initialized to random values between V_t and V_r for each neuron. Otherwise, all neurons had identical parameters.
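The following is a toy forward-Euler sketch of the simplified AdEx dynamics and reset rule described above. It is illustrative only, as the actual simulations used NEST's AdEx implementation; the integration step, input lumping, and function name are our own.

```python
import numpy as np

# AdEx parameters from the text (pyramidal cells)
C_M, G_L, E_L = 280.0, 14.0, -70.0      # pF, nS, mV
DELTA_T, V_T, V_R = 3.0, -55.0, -70.0   # mV
B, TAU_IW = 150.0, 150.0                # pA, ms

def adex_step(v, i_w, i_in, dt=0.1):
    """One forward-Euler step of the simplified AdEx model (Eq 6).
    i_in lumps the bias, external, and synaptic currents (pA)."""
    dv = (-G_L * (v - E_L)
          + G_L * DELTA_T * np.exp((v - V_T) / DELTA_T)
          - i_w + i_in) / C_M
    di_w = -i_w / TAU_IW
    v, i_w = v + dt * dv, i_w + dt * di_w
    spiked = v >= V_T          # practical threshold crossing
    if spiked:
        v = V_R                # discrete voltage reset
        i_w += B               # spike-triggered adaptation increment
    return v, i_w, spiked
```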
The total current flow across the membrane was determined by the membrane capacitance C_m = 280 pF, leak reversal potential E_L = -70 mV, leak conductance g_L = 14 nS, spike upstroke slope factor Δ_T = 3 mV, I_b_j defined by Eq 4, external input I_ext set by Poisson noise (see next subsection), and the total synaptic input current to postsynaptic neuron j originating from other presynaptic neurons in the network:

$$I^{tot}_j(t) = -\sum_i g^{syn}_{ij}(t)\,\big(V_{m_j} - E^{syn}_{ij}\big) = I^{AMPA}_j(t) + I^{NMDA}_j(t) + I^{GABA}_j(t) \quad (7)$$

The definitions of I_tot_j(t), V_m_j and E_syn_ij carry over from Eq 6, but now assume the perspective of pre- and postsynaptic neurons i and j, with the synaptic conductance changes g_syn_ij(t) defined below (see Eq 9). In addition to the long-term BCPNN plasticity mechanisms, AMPA and NMDA synapses between pyramidal cells (i.e. w_syn_ij > 0, connections with positive learning outcomes) were subject to short-term depression in a manner prescribed by the Tsodyks-Markram formalism [103]. In this model, a finite fraction of the usable synaptic resources out of the total available is spent due to vesicle depletion with each incoming presynaptic spike at time t_i^sp:

$$\frac{dx^{dep}_{ij}}{dt} = \frac{1 - x^{dep}_{ij}}{\tau_{rec}} - U\,x^{dep}_{ij}\,\delta(t - t^{sp}_i - t_{ij}) \quad (8)$$

The parameter U = 0.25 controlled the portion of resources utilized in response to a spike, 0 ≤ x_dep_ij ≤ 1 was the fraction of resources available and was exponentially replenished towards its baseline level (i.e. 1) between spikes with time constant τ_rec = 800 ms, and t_ij was the distance-dependent transmission delay (Eq 5).
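A small event-driven sketch of the depression variable may help. It assumes updates are needed only at presynaptic spike arrivals, which suffices because x_dep evolves deterministically between spikes; the function name is hypothetical.

```python
import numpy as np

U, TAU_REC = 0.25, 800.0  # utilization fraction, recovery time constant (ms)

def depression_update(x_dep, dt_since_last_spike):
    """Event-driven update of the depression variable (Eq 8): exponential
    recovery towards 1 between spikes, then depletion by a fraction U
    of the remaining resources when the next presynaptic spike arrives."""
    # recover between spikes: x(t) = 1 - (1 - x0) * exp(-t / tau_rec)
    x_dep = 1.0 - (1.0 - x_dep) * np.exp(-dt_since_last_spike / TAU_REC)
    # deplete on spike arrival
    x_used = U * x_dep
    return x_dep - x_used, x_used
```

At 20 Hz presynaptic firing the 800 ms recovery is slow relative to the inter-spike interval, so repeated calls drive x_dep well below 1, which is the fatigue effect used to terminate attractor states.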
A presynaptic input spike at time t_i originating from either recurrent or external sources evoked a conductance increase of x_dep_ij(t) w_syn_ij nS followed by an exponential decay:

$$g^{syn}_{ij}(t) = x^{dep}_{ij}(t)\,w^{syn}_{ij}\,e^{-(t - t_i - t_{ij})/\tau_{syn}}\,H(t - t_i - t_{ij}) \quad (9)$$

Here, H(·) represents the Heaviside step function. Weight values for the local static excitatory and inhibitory feedback connections within a single hypercolumn were drawn from a normal distribution with a standard deviation of 10% relative to their mean strengths, which were set to w_AMPA_ij ~ N(6.65, 0.665) nS and w_GABA_ij ~ N(33.3, 3.33) nS for AMPA and GABA synapses, respectively. The synaptic time constant for GABA synapses was set to τ_GABA = 5 ms. Parameters used in all network simulations can be found in S1 Appendix.
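As a usage illustration, the conductance kernel of Eq 9 can be evaluated directly. The function signature below is our own; the parameter values in the example are the GABA values quoted above.

```python
import numpy as np

def g_syn(t, t_spike, delay, w_syn, x_dep, tau_syn):
    """Post-spike synaptic conductance (Eq 9): a step increase of
    x_dep * w_syn nS at arrival time t_spike + delay, decaying with tau_syn.
    The np.where acts as the Heaviside step function H."""
    dt = t - t_spike - delay
    return np.where(dt >= 0.0, x_dep * w_syn * np.exp(-dt / tau_syn), 0.0)

# e.g. a GABA synapse (tau = 5 ms) at full resources, mean weight 33.3 nS
t = np.linspace(0.0, 30.0, 301)
trace = g_syn(t, t_spike=0.0, delay=1.0, w_syn=33.3, x_dep=1.0, tau_syn=5.0)
```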

Stimulation Paradigm for Attractor Learning
Spiking attractor memory networks can store static memory patterns and have been linked to cortical memory architecture and dynamics, including the performance of perceptual operations such as pattern retrieval, rivalry, and completion [17]. Two types of attractor states coexist in such networks: a ground state displaying unspecific, non-selective low-rate firing, and several possible active states in which one population displays elevated foreground activity while the rest display low-rate background activity. Lateral inhibition (Fig 1A) is primarily responsible for silencing neurons outside the active attractor, and attractor activity is terminated through neural and synaptic fatigue (Eqs 9 and 11) [45]. External stimulation or high amounts of background activity can transiently switch the network into one of the coding attractors. Throughout this work, we use the term attractor to refer to the state displaying elevated coding foreground activity.
The network was trained by selectively stimulating the neurons belonging to each pattern in succession for a duration t_stim = 100 ms (though note that t_stim is varied in Fig 8) through external presynaptic background sources (Eq 7), such that neurons in active patterns fired at f_max = 20 Hz. An epoch was defined as the period during which each pattern was stimulated once, and training consisted of 50 epochs. To simulate upstream network input, a constant amount of Poisson noise at rate r_ex = 350 Hz was applied to each pyramidal cell during recall through synapses of strength w_AMPA_ij = 5 nS. These parameters determined the amount of external input I_ext (Eq 6). During cue-triggered recall experiments (see Figs 7 and 10), r_ex was reduced to 150 Hz, and instead a stimulus consisting of a Poisson spike train with an average firing rate of 200 Hz was delivered through input synapses of strength w_AMPA_ij = 5 nS to each neuron of the given pattern for 100 ms. The realization of the Poisson noise differed across trials, but its rate modulation was repeated across trials.
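A minimal numpy sketch of this external drive follows, approximating each Poisson source as a Bernoulli draw per time bin. The seed, bin width, and function name are our choices; the rates and durations are those quoted above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def poisson_train(rate_hz, duration_ms, dt=0.1):
    """Spike times (ms) of a homogeneous Poisson process, approximated
    as an independent Bernoulli draw per dt bin."""
    p = rate_hz * dt * 1e-3  # expected spikes per bin
    return np.flatnonzero(rng.random(int(duration_ms / dt)) < p) * dt

# Free recall: each pyramidal cell receives constant 350 Hz background noise.
background = poisson_train(rate_hz=350.0, duration_ms=5000.0)

# Cue-triggered recall: noise is reduced to 150 Hz, and the cued pattern's
# neurons additionally receive a 200 Hz Poisson cue for 100 ms.
reduced_noise = poisson_train(rate_hz=150.0, duration_ms=5000.0)
cue = poisson_train(rate_hz=200.0, duration_ms=100.0)
```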

Spike Train Analysis
After learning, the analysis of certain network properties was contingent upon whether or not neurons participated in an attractor state. We adopted previously used criteria for detecting active attractor states in these types of networks [151]:

$$r_a(t) > c\,\sigma(t) > r_k(t)$$

That is, the instantaneous firing rate of the most active attractor with index a, r_a(t), must exceed the instantaneous standard deviation σ(t) of the average firing rates of all attractors, which must itself exceed the instantaneous firing rate of the second most active attractor with index k, r_k(t). In order to be counted as an active attractor, we additionally required that the index a did not change for at least 25 ms, filtering out spurious activations. We found that reasonable variations of this cutoff value, along with the numerical constant c = 1 that scales σ(t), did not qualitatively impact our results. Detecting attractors in this way allowed at most one attractor to be active at a time, reflecting the propensity of the network either to be dominated by the activity of a single attractor or to be in a ground state featuring competition amongst several slightly active populations. Furthermore, it was advantageous because r_a(t) is a temporally local measure, meaning that large, noisy activations arising from complex feedback interactions did not bias detection at other points in time [151].
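A sketch of this detection procedure is given below, assuming instantaneous attractor firing rates are available as a 2-D array. The helper is ours and is meant only to make the criterion and the 25 ms dwell filter concrete.

```python
import numpy as np

C, MIN_DWELL_MS = 1.0, 25.0

def detect_attractors(rates, dt_ms):
    """rates: (n_attractors, n_timesteps) instantaneous firing rates.
    Returns the active attractor index per time step, or -1 for the
    ground state, using r_a(t) > c*sigma(t) > r_k(t) plus a minimum
    dwell-time filter of 25 ms."""
    cols = np.arange(rates.shape[1])
    order = np.argsort(rates, axis=0)
    a, k = order[-1], order[-2]           # most and second most active
    sigma = rates.std(axis=0)
    r_a, r_k = rates[a, cols], rates[k, cols]
    active = np.where((r_a > C * sigma) & (C * sigma > r_k), a, -1)
    # discard runs shorter than the minimum dwell time
    min_len = int(MIN_DWELL_MS / dt_ms)
    out, start = active.copy(), 0
    for t in range(1, len(active) + 1):
        if t == len(active) or active[t] != active[start]:
            if active[start] >= 0 and t - start < min_len:
                out[start:t] = -1
            start = t
    return out
```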
This information was in turn used to assess mesoscopic aspects of network function during recall, such as attractor dwell times, sequence speeds, distributions of attractor transitions, and recall performance. The period of time during which the most active attractor with index a remained stable was recorded as the dwell time of that active attractor, t_dwell. Sequence speeds, in attractors per second, could then be calculated by inverting the average t_dwell, referred to as μ_dwell, over all attractors replayed during recall:

$$\text{recalled speed} = \frac{1}{\mu_{dwell}}$$

Trained speed refers to the number of attractors per second that were sequentially stimulated at controlled durations during training, whereas recalled speed is the number of attractors per second replayed as a result of the learning that occurred during training. They were used to compute the compression factor according to:

$$\text{compression factor} = \frac{\text{recalled speed}}{\text{trained speed}}$$

The conditional response probability (CRP) [152,153] curve was examined in order to evaluate the distribution of attractor transitions during recall. It represents the fraction of times a recalled attractor was followed by another attractor at a certain lag, where the index of the recalled attractor is taken relative to the index of its predecessor during training. Positive lags denote forward transitions and negative lags denote backward transitions. Time lags in the range -4…5 were adopted for full wrap-around coverage of attractor position in the temporal domain, although self-transitions (time lag 0) were effectively unattainable due to residual adaptation and short-term depression. CRP values approaching 1 imply greater certainty in a transition. Since there were 10 patterns in this network, a CRP of 0.1 represents chance level for a transition.
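Assuming the per-timestep attractor trace from the previous sketch, dwell times, recalled speed, and the compression factor could be computed as follows. This is our illustrative reading of the definitions above, not the authors' analysis code.

```python
import numpy as np

def recall_statistics(active, dt_ms, trained_speed):
    """Compute dwell times, recalled speed (attractors/s), and the
    compression factor from a per-timestep attractor index trace
    (output of detect_attractors); -1 marks the ground state."""
    dwell_times, start = [], None
    for t in range(len(active) + 1):
        cur = active[t] if t < len(active) else -1
        if start is None and cur >= 0:
            start = t                         # an attractor becomes active
        elif start is not None and (t == len(active) or cur != active[start]):
            dwell_times.append((t - start) * dt_ms)
            start = t if cur >= 0 else None   # direct a->b transitions allowed
    if not dwell_times:
        return float("nan"), float("nan")
    mu_dwell_s = np.mean(dwell_times) / 1000.0
    recalled_speed = 1.0 / mu_dwell_s
    return recalled_speed, recalled_speed / trained_speed
```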
Sequence recall fidelity was quantified by comparing the index order of detected attractors during recall with that of the trained stimulus template using the Levenshtein distance (i.e. edit distance) [154]. The difference between string representations of the sequentially trained attractor order T and the recalled attractor order R was calculated as the minimum number of single-character edits needed to convert between T and R. The method penalized each required insertion, deletion, and substitution:

$$D_{RT}(u, v) = \min \begin{cases} D_{RT}(u-1, v) + 1 \\ D_{RT}(u, v-1) + 1 \\ D_{RT}(u-1, v-1) + \big[R_u \neq T_v\big] \end{cases}, \qquad u > 0,\ v > 0,$$

with base cases D_RT(u, 0) = u and D_RT(0, v) = v. Here, u and v denote matrix indices, D_RT(·,·) the matrix element at the given indices, and [·] the indicator function that returns 0 when R_u = T_v and 1 otherwise. The reported edit distance D_L was the resulting lower-right element of the matrix D_RT. Thus D_L = 0 only when the recalled sequence episode was an exact match to the trained stimulus template.
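The recurrence translates directly into the familiar dynamic-programming implementation; a self-contained version (ours, not the paper's code) is shown below.

```python
import numpy as np

def levenshtein(recalled, trained):
    """Edit distance D_L between the recalled attractor order R and the
    trained template T, counting insertions, deletions, substitutions."""
    n_r, n_t = len(recalled), len(trained)
    d = np.zeros((n_r + 1, n_t + 1), dtype=int)
    d[:, 0] = np.arange(n_r + 1)   # base case: deletions only
    d[0, :] = np.arange(n_t + 1)   # base case: insertions only
    for u in range(1, n_r + 1):
        for v in range(1, n_t + 1):
            sub = 0 if recalled[u - 1] == trained[v - 1] else 1
            d[u, v] = min(d[u - 1, v] + 1,        # deletion
                          d[u, v - 1] + 1,        # insertion
                          d[u - 1, v - 1] + sub)  # substitution
    return d[n_r, n_t]  # lower-right element of D_RT

assert levenshtein([0, 1, 2, 3], [0, 1, 2, 3]) == 0
assert levenshtein([0, 2, 3], [0, 1, 2, 3]) == 1  # one element skipped
```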
The trained stimulus template T comprised only a single sequence that was repeated with different Poisson noise added each time. But in cases where attractors were cyclically activated by background activity and detected over the entire recall period, sequence replay success could be mixed, leading to varying individual recall patterns R. Sometimes the sequence was able to fully complete, but other times recall was truncated. Thus a single D L value could not accurately summarize this heterogeneity.
For specific situations in which sequences degraded at the margins of compatible parameter sets, we observed that the first and last patterns tended to replay even when the patterns in the middle did not. These serial position effects, i.e. the tendency for primacy and recency during sequence recall [1,153], meant that we could split the entirety of all concatenated yet separate recalled sequences based on delimiters corresponding to the first or last element indices. Using this criterion, we calculated the mean edit distance D̄_L by averaging D_L between the stimulus template and each individual recalled sequence. The maximum acceptable tolerance was set such that sequences with edit distances D̄_L > 5 were considered too degenerate to count as successfully recalled. The expanded parameter sets of Figs 8-10 were guided by this criterion. By eliminating this criterion, it was possible to demonstrate that sequence recall quality degraded for fast training speeds and, in some cases, for slow training speeds as well (S10 Fig).

Supporting Information