Learning of Chunking Sequences in Cognition and Behavior

We often learn and recall long sequences in smaller segments, such as a phone number, 858 534 22 30, memorized as four segments. Behavioral experiments suggest that humans and some animals employ this strategy of breaking down cognitive or behavioral sequences into chunks in a wide variety of tasks, but the dynamical principles of how this is achieved remain unknown. Here, we study the temporal dynamics of chunking for learning cognitive sequences in a chunking representation using a dynamical model of competing modes arranged to evoke hierarchical Winnerless Competition (WLC) dynamics. Sequential memory is represented as trajectories along a chain of metastable fixed points at each level of the hierarchy, and bistable Hebbian dynamics enables the learning of such trajectories in an unsupervised fashion. Using computer simulations, we demonstrate the learning of a chunking representation of sequences and their robust recall. During learning, the dynamics associates a set of modes to each information-carrying item in the sequence and encodes their relative order. During recall, hierarchical WLC guarantees the robustness of the sequence order when the sequence is not too long. The resulting patterns of activity share several features observed in behavioral experiments, such as the pauses at chunk boundaries, the size of chunks, and their duration. Failures in learning chunking sequences provide new insights into the dynamical causes of neurological disorders such as Parkinson's disease and schizophrenia.
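As a minimal illustration of the dynamics referred to above, the sketch below simulates a small generalized Lotka-Volterra network whose inhibition matrix is chosen so that activity visits a chain of metastable states in a fixed order (winnerless competition). The network size, rates, and connection strengths are illustrative assumptions and are not the parameters of the model studied here.

# Minimal sketch (illustrative parameters, not the paper's model): a generalized
# Lotka-Volterra network with asymmetric inhibition produces winnerless
# competition, i.e., activity passes through a chain of metastable states.
import numpy as np

N = 5                        # number of competing modes (assumed)
sigma = np.ones(N)           # growth rates
rho = np.full((N, N), 1.5)   # strong mutual inhibition by default
np.fill_diagonal(rho, 1.0)   # self-inhibition
for i in range(N):           # weak inhibition along the desired sequence i -> i+1
    rho[(i + 1) % N, i] = 0.5

dt, T, floor = 0.01, 200.0, 1e-4
x = np.full(N, floor)
x[0] = 0.1                   # start near the first mode
dominant = []
for step in range(int(T / dt)):
    dx = x * (sigma - rho @ x) + floor   # small constant input keeps suppressed modes revivable
    x = np.clip(x + dt * dx, 1e-9, None)
    dominant.append(int(np.argmax(x)))

# The dominant mode switches in the order 0 -> 1 -> 2 -> ..., each mode staying
# active for an extended (metastable) episode before the next takes over.
print(dominant[::1000])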

where $A_+ > 0$ and $A_- > 0$ define the magnitude of the weight update.

For simplicity, we choose an exponential temporal window $K_A(\Delta) = \exp(-|\Delta|/\tau_{STDP})$ with decay constant $\tau_{STDP} \ll T$. This rule is consistent with the requirement that $V_{ij}$ depotentiates when a transition from $x_i$ to $x_j$ occurs. As long as potentiation and depression are matched, the result does not depend critically on the choice of window, as we demonstrate below.

The argument rests on three conditions. First, potentiation and depression are matched (Eq. (2)). Second, the sign of $K_A(\Delta)$ is fixed on each side of the $\Delta = 0$ axis (Eq. (3)). Third, the state transitions sharply, such that $x_i(t)$ and $x_j(t)$ are monotonic around the transition times (Eq. (4)). Under these assumptions, we show that during a transition from $x_i$ to $x_j$, $V_{ij}$ depotentiates and $V_{ji}$ potentiates. Starting from the expression for the weight change, a change of sign in the second integral, followed by adding two terms that sum to zero under the integral, rewrites the weight change in a form in which the matching of potentiation and depression in Eq. (2) guarantees that the middle terms vanish. Regrouping the remaining terms under the integral then makes it clear that, under the assumptions of Eqs. (2)-(4), the integrand is positive or zero, which gives the stated signs of the weight changes. Fig. 1 illustrates how the asymmetric learning window causes the weights to change when a transition between two units takes place.
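To make the claim concrete, the sketch below numerically evaluates a rate-based, STDP-like weight change for two units undergoing a sharp transition. The update rule, the matched antisymmetric exponential window, and the sigmoidal activity traces are illustrative assumptions, not the paper's Eqs. (1)-(4). Reading $V_{ji}$ as the weight from presynaptic unit $i$ to postsynaptic unit $j$, the pairing in the direction of the transition potentiates while the reverse pairing depresses, in line with the statement above.

# Illustrative check (assumed rule, not the paper's exact equations): with a
# matched, antisymmetric exponential STDP window, a sharp transition x_i -> x_j
# potentiates the weight in one direction and depresses it in the other.
import numpy as np

tau_stdp = 0.1          # window decay constant (arbitrary units, assumed)
width = 0.05            # sharpness of the transition (assumed)
t0 = 0.0                # transition time
A = 1.0                 # matched potentiation/depression amplitude

def x_i(t):             # unit that is active before the transition
    return 1.0 / (1.0 + np.exp((t - t0) / width))

def x_j(t):             # unit that becomes active after the transition
    return 1.0 / (1.0 + np.exp(-(t - t0) / width))

def window(delta):      # pre-before-post potentiates, post-before-pre depresses
    return np.where(delta > 0, A * np.exp(-delta / tau_stdp),
                    -A * np.exp(delta / tau_stdp))

def weight_change(pre, post):
    # Rate-based STDP: dW = integral over t and delta of K(delta) pre(t) post(t + delta)
    t = np.linspace(-2.0, 2.0, 2001)
    d = np.linspace(-1.0, 1.0, 1001)
    T, D = np.meshgrid(t, d, indexing="ij")
    integrand = window(D) * pre(T) * post(T + D)
    return np.trapz(np.trapz(integrand, d, axis=1), t)

print("pre = x_i, post = x_j:", weight_change(x_i, x_j))   # > 0, potentiation
print("pre = x_j, post = x_i:", weight_change(x_j, x_i))   # < 0, depression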

Network Dynamics Influence Chunking Rate
The chunking rate is defined as the number of transitions in the chunking layer while a pattern of the sequence is presented during the learning phase. This rate can be modulated, for example, by biasing the chunking layer or its auxiliary variables $z_k$. To illustrate this, we added a global, step-wise varying input to the auxiliary variables $z_k$ and proceeded with the learning protocol as in the experiments of the main text (100 epochs). The results show that a larger number of chunk transitions occur around the steps, and that the magnitude of the input drastically alters the chunking rate.
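As a rough illustration of how such a rate can be computed, the sketch below counts transitions of the most active chunking-layer unit within each stimulus presentation window. The array shapes, window length, and winner-take-all readout are assumptions made for the example; the paper's own readout of the chunking layer may differ.

# Hypothetical helper: estimate the chunking rate as the number of transitions
# of the dominant chunking-layer mode per presented sequence element.
import numpy as np

def chunking_rate(chunk_activity, steps_per_element):
    # chunk_activity: array (timesteps, n_chunk_units) of chunking-layer activity.
    # steps_per_element: number of simulation steps per presented sequence element.
    dominant = np.argmax(chunk_activity, axis=1)              # winner at each time step
    transitions = np.flatnonzero(np.diff(dominant) != 0) + 1  # times of mode switches
    n_elements = chunk_activity.shape[0] // steps_per_element
    edges = np.arange(0, (n_elements + 1) * steps_per_element, steps_per_element)
    counts = np.histogram(transitions, bins=edges)[0]         # transitions per element
    return counts.mean(), counts

# Example with synthetic activity: 10 elements of 100 steps each, 4 chunk units.
rng = np.random.default_rng(0)
fake_activity = rng.random((1000, 4))
rate, per_element = chunking_rate(fake_activity, steps_per_element=100)
print(rate, per_element)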

Learning with Noisy Stimuli
Noisy patterns were obtained by adding a noise term $\eta^k$ to each pattern $S^k$ of the sequence, where $S^k$ are the original patterns consisting of horizontal bars. The noise term changes from one presentation of the sequence to the next, but it remains constant during a presentation. In the main text, we report the amplitude of the noise as a ratio based on $\langle \max(0, \eta^k[i]) \rangle_i$, where $\langle \cdot \rangle_i$ is the expectation over realizations of $\max(0, \eta^k[i])$. Fig. 3 shows examples of the stimuli with noise amplitudes matching those used in the main text.

Figure caption (modulation of the chunking rate by the input $b_z$): (Top) We added a global, step-wise varying input $b_z$ to the auxiliary variables $z_k$. (Middle) Chunking rate, computed as the number of transitions in the chunking layer during the presentation of each sequence element, averaged over 60 different runs of the training and over epochs 50 to 100. The average chunking rate was 0.071 from time 35 to 60 and 1.24 from time 95 to 120. Very few transitions occurred during the phase where $b_z$ was strongly positive, compared to a chunking rate of 0.314 when $b_z$ was zero. For strongly negative $b_z$, chunking is nearly absent, as the chunking layer transitions almost once for every presentation of a sequence element (the chunking rate is close to 1). Furthermore, the chunking rate is high at the points where $b_z$ changes, which illustrates the tendency of chunk boundaries to synchronize with changes in $b_z$. (Bottom) Illustration of the activity in the chunking layer at trial 50 for all 60 runs. The boundaries of the chunks are clearly located at the time points where $b_z$ changed.
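A minimal sketch of how such noisy stimuli could be generated is given below. The pattern size, the Gaussian noise model, the rectification of the resulting input, and the ratio used to quantify the noise amplitude are all assumptions for illustration; the paper may define these quantities differently.

# Illustrative generation of noisy stimuli (assumed noise model, not necessarily
# the paper's): fixed patterns of horizontal bars plus per-presentation noise.
import numpy as np

rng = np.random.default_rng(1)
n_patterns, side = 4, 8

# Original patterns: images of horizontal bars, one bar per pattern (assumption).
patterns = np.zeros((n_patterns, side, side))
for k in range(n_patterns):
    patterns[k, 2 * k, :] = 1.0
patterns = patterns.reshape(n_patterns, side * side)

def noisy_presentation(patterns, noise_std):
    # Draw one noise realization per pattern; the noise is redrawn for each
    # presentation of the sequence but held constant within a presentation.
    eta = rng.normal(0.0, noise_std, size=patterns.shape)
    stimuli = np.clip(patterns + eta, 0.0, None)   # keep inputs non-negative
    # Assumed amplitude measure: mean rectified noise relative to mean pattern value.
    amplitude = np.maximum(eta, 0.0).mean() / patterns.mean()
    return stimuli, amplitude

stim, amp = noisy_presentation(patterns, noise_std=0.2)
print("noise amplitude (assumed definition):", round(amp, 3))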

Parameters of the learning model
In Tab. 1, we detail all the parameters and values of the learning model so that the dynamics can be reproduced.