Learning of Chunking Sequences in Cognition and Behavior

doi:10.1371/journal.pcbi.1004592

Fig 1.

Two-layer network for learning chunking dynamics.

In this example, the input sequence (a, b, c, d, e) is presented repeatedly. Initially, all the synaptic connections within a matrix are similar, with small random variations. Through learning, distinct elementary modes become associated with each of the five patterns through the weights of the projection matrix P_ki. In the elementary layer, the weights V_ii′ in the directions a to b, b to c, and d to e are weakened (arrow thickness denotes coupling strength), while the weights in the opposite direction are strengthened. The weights W_jj′ follow a similar learning rule, giving rise to three chunks: ab, c and de. Chunking, i.e. the information specifying the association between chunking modes (CM) and elementary modes (EM), is learned in the coupling matrices Q_ij and R_ji. The input in the perceptual layer is represented as non-overlapping binary patterns. For example, element a is the binary pattern s^a = [11000100], element b is the binary pattern s^b = [00100010], etc. Black circles represent inhibitory couplings, while arrowheads represent excitatory couplings. The number of elementary modes should be greater than or equal to the number of patterns in a sequence. Note that there must be at least three units in each layer for a stable heteroclinic cycle to exist. It is not necessary that N_Y < N_X, and any value such that N_Y > 3, N_X > 3 can be used. i = 1, …, N_X; j = 1, …, N_Y; k = 1, …, M; N_X ≥ M > 3.
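The non-overlap property of the input patterns can be checked directly. The following is a minimal sketch using only the two patterns quoted in the caption; the helper names `parse_pattern` and `overlap` are hypothetical and introduced here for illustration.

```python
def parse_pattern(bits):
    """Turn a string such as "11000100" into a list of 0/1 integers."""
    return [int(ch) for ch in bits]


def overlap(p, q):
    """Count the positions where both binary patterns are active."""
    return sum(a & b for a, b in zip(p, q))


s_a = parse_pattern("11000100")  # pattern for element a (from the caption)
s_b = parse_pattern("00100010")  # pattern for element b (from the caption)

# Two patterns are non-overlapping when they share no active position.
print("overlap(a, b) =", overlap(s_a, s_b))
print("active units in a =", overlap(s_a, s_a))
```

An overlap of zero between every pair of patterns means that each input unit belongs to at most one pattern.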


Fig 2.

Projection of the phase portrait of the two-layer chunking hierarchical dynamics in the space of three auxiliary variables.

This example illustrates the dynamics of a system with N_X = 24, N_Y = 3 before (left) and after (right) learning a sequence consisting of 24 patterns of M = 144 pixels. For visualization purposes, the variable space was projected according to , where the superscript refers to the associated chunk. The plot is colored red when any of the chunks is active (y_i > 0.9, ∀i). The traces were obtained from 12 runs starting from random initial conditions in the vicinity of the origin of the transformed space. Before learning, the network reaches stable fixed points. After learning, the network produces a closed chunking sequence (black) that consists of several heteroclinic cycles that represent the chunks (red). Each of the three chunks consists of EM, as the system visits the eight states in each chunk. Note, however, that the projection used here effectively reduces these to 9 (three states per chunk) for visualization purposes.


Fig 3.

Input and network activities during learning and recall.

s_k, x_i, y_j, z_k during learning (after 5 presentations) (a) and during sequence recall (after 120 presentations) (b). Within each layer, different colors represent different modes (variables). The sensory input (presented only during learning) consisted of 24 different patterns presented sequentially. The patterns were composed of 144 binary pixels (shown in black and white). During learning, the input drives the system dynamics. During recall, the elementary modes and the chunking modes activate in the same order as in learning. Each CM represents about 8 consecutively active elementary modes. The onset of each chunk is delayed by the inhibition from the chunking layer. This delay is consistent with the pauses before loading chunks observed in behavioral studies (highlighted by the dashed line). (c) Duration for which each EM remains active, with the same color coding as in (b). The three modes associated with the transitions between chunks remain active for longer than the others. Such pauses can be identified with pauses observed in behavioral experiments involving chunking [17].
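The durations in panel (c) can be read off an activity trace by measuring how long each mode stays the most active one. The sketch below assumes a trace sampled at fixed time steps; the function `dwell_times` and the toy trace are hypothetical and are not taken from the article.

```python
def dwell_times(activity, dt=1.0):
    """Return (mode_index, duration) for each consecutive run of the most active mode.

    `activity` is a list of time samples, each a list with one value per mode.
    """
    runs = []
    for row in activity:
        winner = max(range(len(row)), key=row.__getitem__)
        if runs and runs[-1][0] == winner:
            runs[-1][1] += dt          # same mode still active: extend the run
        else:
            runs.append([winner, dt])  # a new mode took over: start a new run
    return [(mode, duration) for mode, duration in runs]


# Toy trace with three modes (illustrative values only).
toy_trace = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.2, 0.8],
    [0.0, 0.1, 0.9],
]

for mode, duration in dwell_times(toy_trace):
    print(f"mode {mode} active for {duration} time units")
```

In this toy trace the middle mode stays active longest, which is the kind of pattern the caption attributes to modes at chunk transitions.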


Fig 4.

Synaptic weights before and after learning.

(a, b) Initially (t_ini), the recurrent weight matrices implement all-to-all symmetric inhibition, leading to winner-take-all (WTA) dynamics. After learning (t_fin), the matrices acquire an asymmetric component, leading to winnerless competition (WLC). Superimposed white arrows in (b) indicate the resulting order of the recalled states. (c, d) The weights in the matrices Q_ij and R_ji learn which EM belongs to which chunk. The last three columns correspond to the elements that activate during chunk transitions.
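The difference between symmetric and asymmetric inhibition can be illustrated with a three-unit toy system. The sketch below assumes a generalized Lotka-Volterra form, dx_i/dt = x_i (1 − Σ_j ρ_ij x_j) + ε, which is not stated in this caption; the matrices, the small constant input ε, the initial state and the integration settings are all illustrative choices, not the article's parameters.

```python
def simulate(rho, x0, n_steps=20000, dt=0.01, eps=1e-6):
    """Euler-integrate dx_i/dt = x_i * (1 - sum_j rho[i][j] * x_j) + eps.

    Assumed model form (generalized Lotka-Volterra); eps is a small constant
    input that keeps inactive units from decaying to exactly zero.
    Returns the final state and the index of the most active unit at each step.
    """
    x = list(x0)
    n = len(x)
    winners = []
    for _ in range(n_steps):
        inhibition = [sum(rho[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * (x[i] * (1.0 - inhibition[i]) + eps) for i in range(n)]
        winners.append(max(range(n), key=x.__getitem__))
    return x, winners


def count_switches(winners):
    """Number of times the most active unit changes."""
    return sum(1 for a, b in zip(winners, winners[1:]) if a != b)


x0 = [0.3, 0.2, 0.1]

# Symmetric all-to-all inhibition: one unit wins and stays active.
rho_sym = [[1.0, 2.0, 2.0],
           [2.0, 1.0, 2.0],
           [2.0, 2.0, 1.0]]

# Asymmetric inhibition: each unit is weakly inhibited by one neighbor and
# strongly by the other, so activity passes from unit to unit.
rho_asym = [[1.0, 0.5, 2.0],
            [2.0, 1.0, 0.5],
            [0.5, 2.0, 1.0]]

x_sym, winners_sym = simulate(rho_sym, x0)
x_asym, winners_asym = simulate(rho_asym, x0)

print("symmetric: switches =", count_switches(winners_sym))
print("asymmetric: switches =", count_switches(winners_asym))
```

With the symmetric matrix the initially strongest unit suppresses the others, whereas the asymmetric matrix makes every single-winner state unstable in one direction, so the leading unit keeps changing.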


Fig 5.

Input weights P_ki at the elementary modes.

(Left) Before and (right) after training. At the beginning (t_ini), the weights are random. Learning associates each of the 24 patterns with one EM.


Fig 6.

The dynamics of chunking.

The model was run 60 times for 120 trials (N_Y = 30) at different levels of noise. Each trial consisted of the presentation of one sequence, followed by a recall phase. (Top-Left) Sequence recall accuracy D averaged over all the runs. The sequence was determined by the identity of the most active mode in the elementary layer. D was computed using the Levenshtein distance (equal to the number of additions and subtractions between two sequences). In the noiseless and low-noise cases, the distance between the presented sequence and the reproduced sequence reached about 0.05 (horizontal line), roughly corresponding to 1 addition/subtraction per sequence recall. The network was robust to noise, and sequence recall accuracy degraded gracefully as the amplitude of the noise was increased. (Bottom-Left) Estimates of the chunking rate CR, a measure for monitoring chunking, in the noiseless case (blue curves). CR is defined as the number of transitions taking place in the chunking layer during the presentation of a pattern in the sequence. During an initial transient, CR decreases as learning proceeds, indicating the formation of the chunks. (Right) Activity in the chunking layer for two representative runs, one with no noise and the other with no chunks, in which learning of Q_ij and R_ji was turned off. The identity of the chunks is color-coded. Interestingly, the boundaries of the chunks can change during training, and the chunks can undergo substantial reconfigurations at the beginning of the training phase. In the absence of learning in Q_ij and R_ji, the chunking rate did not diminish over the course of learning, indicating the absence of chunks. S4 Fig displays the evolution of the individual weights for the run shown in the top-right panel (No Noise).
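Both measures in this caption can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the standard Levenshtein distance (which also counts substitutions, whereas the caption mentions only additions and subtractions), it assumes D is normalized by the length of the presented sequence (1/24 ≈ 0.04 would then match the reported value of about 0.05), and it reads CR as transitions of the most active chunking mode per presented pattern. All function names and the toy data are hypothetical.

```python
def levenshtein(a, b):
    """Standard edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(presented, recalled):
    """Edit distance divided by the presented length (assumed normalization)."""
    return levenshtein(presented, recalled) / len(presented)


def chunking_rate(chunk_winners, steps_per_pattern):
    """Transitions of the most active chunking mode per presented pattern."""
    transitions = sum(1 for p, q in zip(chunk_winners, chunk_winners[1:]) if p != q)
    n_patterns = len(chunk_winners) / steps_per_pattern
    return transitions / n_patterns


# Toy recall of a 24-element sequence with one element skipped.
presented = list(range(24))
recalled = presented[:5] + presented[6:]
d = normalized_distance(presented, recalled)

# Toy chunking-layer trace: 6 patterns of 2 time steps each, 3 chunks.
cr = chunking_rate([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], steps_per_pattern=2)

print("D =", round(d, 3))
print("CR =", round(cr, 3))
```

Under this reading, a CR close to 1 means the chunking layer switches with nearly every pattern (no chunks), while lower values indicate that several patterns share one chunking mode.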


Fig 7.

Chunk size, i.e. the number of EM in each chunk, (left) as a function of the potentiation scaling factor in Q and (right) as a function of the time constant of the synaptic dynamics, τ_z.

The number of information-carrying items contained in the chunks depends on the system dynamics, suggesting that these dynamics affect the total capacity of the memory. The random initial conditions lead the system to different structures after learning (number and size of chunks). The case τ_z = 0 corresponds to completely removing the synaptic dynamics. Although chunking is present in the absence of z_j, the characteristic time scale of z_j, τ_z, has a powerful effect on chunk size. Each point was evaluated 100 times, and the mean and standard deviation are presented, suggesting a monotonically increasing relationship between chunk size and either the potentiation scaling factor or τ_z. In total, 98.6% of the runs exhibited sequential activity in the chunking layer. Total number of available chunking modes, N_Y = 30; total number of elementary modes, N_X = 30.


Fig 8.

(A) Stable heteroclinic chain with two connected metastable states. (B) Stable heteroclinic channel (SHC): a robust sequence of metastable states. Adapted from [82]. (C) Transformation of the phase volume along trajectories in the neighborhood of an unstable separatrix in the case when both coupled saddles are characterized by saddle values larger than one.
