Sequence learning recodes cortical representations instead of strengthening initial ones

doi:10.1371/journal.pcbi.1008969

Fig 1.

Sequence learning.

(A) Four Gabor patches (the items used in this study) associated with four sequence positions, and a matrix representation of the sequence. (B) Item-position associations in monkey prefrontal cortex as observed by Berdyyeva et al. [38]. Each subplot displays spiking activity for a particular neuron: the first responds most to items at the beginning of a three-item sequence, the second to items in the middle, and the last to items at the end of the sequence. Numbers on the x-axis mark the onset of the stimulus events. (C) Visual representation of three sequences as position-item associations and the resulting frequency of associations. The frequency of associations can be learned as a model of the environment. (D) Dissociating between learning mechanisms in terms of similarity between novel and learned sequences: with associative learning (left), learned sequences share the same item codes with novel ones, and learning reduces noise in learned sequence representations. Recoding (right) changes item representations so that novel and learned stimuli do not share representations.
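The matrix representation in panel A and the association frequencies in panel C can be illustrated with a minimal sketch. It assumes items are labelled 0 to 3 and positions 0 to 3; the labels, function names, and example sequences are illustrative and not taken from the study.

```python
from typing import List, Sequence

N_ITEMS = 4      # four Gabor patches, labelled 0..3 here (illustrative labels)
N_POSITIONS = 4  # four sequence positions


def sequence_to_matrix(seq: Sequence[int]) -> List[List[int]]:
    """Binary item-by-position matrix: cell [item][pos] is 1 if `item` occurs at `pos`."""
    matrix = [[0] * N_POSITIONS for _ in range(N_ITEMS)]
    for pos, item in enumerate(seq):
        matrix[item][pos] = 1
    return matrix


def association_frequencies(seqs: Sequence[Sequence[int]]) -> List[List[int]]:
    """Count how often each item-position association occurs across sequences."""
    counts = [[0] * N_POSITIONS for _ in range(N_ITEMS)]
    for seq in seqs:
        for pos, item in enumerate(seq):
            counts[item][pos] += 1
    return counts


# Three made-up sequences, analogous to the three sequences in panel C.
example_sequences = [(0, 1, 2, 3), (0, 1, 3, 2), (1, 0, 2, 3)]
frequencies = association_frequencies(example_sequences)
```

Each row of `frequencies` belongs to one item and each column to one position, so a large count marks an association that recurs across the observed sequences.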

Fig 2.

Task.

(A) Single trial: participants had to recall a sequence of four Gabor patches in the order they were presented, after a 4.8 s delay period, using a button box. The size of the stimuli within the display area is exaggerated for illustrative purposes. (B) Trial types and progression: 2/3 of the trials were repetitions of the same two individual sequences (repeating sequences), while 1/3 of the trials were novel, previously unseen orderings of the items (novel sequences). The identity and order of repeating and novel sequences were pseudo-randomised across participants.

Fig 3.

Testing the predictions of learning models using RSA.

Left: model prediction expressed as a representational dissimilarity matrix (RDM) of pairwise between-stimulus distances (the cells in the matrix display the associative learner predictions in terms of item-position associations, as quantified by the Hamming distance). The small matrices at the top refer to the representations of individual sequences in matrix form (as shown in Fig 1). For example, the second cell in the first row is the predicted Hamming distance between the sequences presented on trials 1 and 2. Right: RDM of measured voxel activity patterns elicited by the stimuli. The small matrices are illustrative representations of voxel patterns from an arbitrary brain region. The correlation between these two RDMs reflects the evidence for the predictive model. The significance of the correlation can be evaluated by permuting the labels of the matrices and thus deriving the null distribution. See Representational similarity analysis (RSA) in Methods for details.
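The comparison described in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the Hamming distance is counted as the number of positions holding different items, the rank correlation is a plain Spearman implementation, the permutation test shuffles stimulus labels of the measured RDM, and all function names and the default of 1000 permutations are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences hold different items."""
    return sum(x != y for x, y in zip(a, b))


def model_rdm(seqs):
    """Predicted RDM: pairwise Hamming distances between sequences."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


def upper_triangle(rdm):
    """Off-diagonal upper-triangle cells, one per unordered stimulus pair."""
    return [rdm[i][j] for i, j in combinations(range(len(rdm)), 2)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_test(model, measured, n_perm=1000, seed=0):
    """Observed correlation and one-sided p-value from label permutations."""
    rng = random.Random(seed)
    n = len(model)
    model_cells = upper_triangle(model)
    observed = spearman(model_cells, upper_triangle(measured))
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel stimuli: permute rows and columns of the measured RDM together.
        shuffled = [[measured[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if spearman(model_cells, upper_triangle(shuffled)) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)
```

Permuting rows and columns together keeps the measured RDM symmetric, so each permuted matrix is a valid RDM under a different assignment of stimulus labels.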

Fig 4.

Sequence representation in associative and recoding models.

The figure illustrates the difference in sequence representations for four individual sequences: associative representations are displayed in the top row and chunk codes in the bottom row. Differently coloured letters and boxes refer to individual item codes. For the chunk recoding model (bottom), item codes reflect the optimal chunking structure for the sequences presented in our experiment. Note that the representation of the novel sequence (Trial 2) contains the same number of item codes at the same positions for both models.

Fig 5.

Evidence for the recoding model.

The recoding model predicted the distance between pairs of voxel activity patterns corresponding to novel and repeating sequences in three brain regions. Models are shown on the x-axis (‘R’: recoding; ‘A-IP’ and ‘A-II’: associative item-position and item-item models, respectively). The y-axis displays the model fit in terms of participants’ average Spearman’s rank-order correlation (r). Dots represent individual participants’ values and error bars the standard error of the mean (SEM). Coloured dashed lines represent the lower and upper bounds of the noise ceiling for the recoding model. In all displayed plots the lower noise ceilings were significantly greater than zero across participants. The anatomical contours of the regions are superimposed on the MNI152 glass-brain template (left).

Table 1.

Representation of novel sequences.

Anatomical region suffixes indicate gyrus (G) or sulcus (S). Asterisks (*) represent significant evidence for the item-position sequence representation model reaching the lower bound of the noise ceiling in any of the three task phases: presentation, delay, and response. The lower noise ceilings were significantly greater than zero for all regions displayed in the table (df = 21, p < 10⁻³; see Noise ceiling estimation in Methods for details).

Fig 6.

Univariate effects of learning.

Statistical map of t-values (magenta-cyan) of the univariate BOLD difference for learned stimuli (repeating/learned < novel sequences) superimposed on the MNI152 glass-brain template. Regions which encode both novel and repeating sequences as predicted by the recoding model are projected on top of the statistical image with solid lines (red: the parietal inferior-supramarginal gyrus; green: the postcentral sulcus; blue: the occipital superior transversal sulcus).

Fig 7.

Interference in sequence learning.

(A) Visual representation of two sequences as position-item associations (top) and the resulting frequency of associations (bottom) as defined by the associative sequence learning model. (B) Associative learning of the two sequences in panel A would boost the representations of four individual sequences, despite the statistical regularities being extracted from only two. See S4 Text for a worked example. (C) Histogram of the expected number of shared codes (item-position associations, x-axis) for a single 4-item sequence with all other possible 4-item sequences (n = 256, allowing repeats), measured as the proportion of sequences sharing the same number of codes (y-axis). (D) Histogram of the shared codes for a two-item (bi-gram) chunk representation. (E) Interference between sequence representations in the item-position model. The x-axis displays how many sequences have been learned, and the lines display the proportion of other sequences affected by learning as a function of the number of codes shared: the lines correspond to the columns in panel C. The red line shows the proportion of sequences unaffected by learning. (F) Interference between sequence representations in the chunk model.
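The histograms in panels C and D can be approximated by brute-force enumeration. The sketch below makes several assumptions: items are labelled 0 to 3, the enumeration covers all 4^4 = 256 sequences including the reference sequence itself, and the bi-gram representation splits each sequence into two fixed chunks (positions 1-2 and 3-4). The chunk definition in particular is a simplifying assumption and may differ from the one used for the figure.

```python
from collections import Counter
from itertools import product

ITEMS = range(4)  # four items, labelled 0..3 (illustrative labels)


def shared_item_position_codes(a, b):
    """Number of positions at which both sequences hold the same item."""
    return sum(x == y for x, y in zip(a, b))


def shared_bigram_codes(a, b):
    """Number of shared bi-gram chunks.

    ASSUMPTION: each 4-item sequence is coded as two fixed chunks,
    (item1, item2) and (item3, item4); a chunk is shared only if both
    of its items match at the same chunk position.
    """
    return int(a[:2] == b[:2]) + int(a[2:] == b[2:])


def shared_code_histogram(reference, shared_fn):
    """Proportion of all possible sequences sharing k codes with `reference`."""
    reference = tuple(reference)
    counts = Counter(
        shared_fn(reference, other)
        for other in product(ITEMS, repeat=len(reference))
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


item_position_hist = shared_code_histogram((0, 1, 2, 3), shared_item_position_codes)
bigram_hist = shared_code_histogram((0, 1, 2, 3), shared_bigram_codes)
```

Under these assumptions, 81 of the 256 sequences share no item-position code with the reference, while 225 of 256 share no bi-gram chunk, which illustrates why a chunk code leaves more sequences untouched by learning.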

Fig 8.

Optimal chunking model.

(A) Evidence for three alternative chunking models and their components at the beginning of the scanning experiment, when participants had seen the two repeating sequences 12 times each during the practice session. Each of the three models uses only a single type of n-gram: the 1-gram model encodes sequences using four single-item n-grams, the 2-gram model using two bi-grams, and the 4-gram model using a single four-gram. The left panel shows the probability of the set of n-grams (code) each model specifies, in terms of negative log values. The centre panel shows the probability of their mappings (encoding), and the right panel the combination of the two into model evidence. The blue and red parts of the model evidence bar represent the model code (n-grams) and encoding (mappings) probabilities in terms of their negative logs, and the total length of the bar displays the model evidence as their sum. This allows intuitive visualisation of the code-encoding trade-off calculated by the Bayesian model comparison. The 4-gram model is the optimal model at the start of the experiment. (B) Model evidence across trials. The x-axis shows the trial number and the y-axis the log of model evidence. The optimal model is inferred at every trial; the 1-gram model encodes sequences only with four uni-grams, and the 4-gram model uses only four-grams. Note that at the beginning of the experiment the 4-gram model is equivalent to the optimal model; however, as new sequences are presented, the optimal model encodes new data with shorter chunks (uni-grams) while the 4-gram model encodes new unique sequences with four-grams. Note also that as new data are observed, the evidence for any particular model decreases, because the set of data becomes larger and the space of possible models increases exponentially. Finally, the log scale transforms the change of evidence over trials into linear form.
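The bar construction in panel A amounts to adding two negative log probabilities. A minimal sketch of that arithmetic follows; the function name is hypothetical and the probabilities are placeholders chosen only to show the calculation, not values from the study.

```python
import math


def negative_log_evidence(p_code: float, p_encoding: float):
    """Return (code cost, encoding cost, total) as negative natural logs.

    Multiplying the two probabilities corresponds to adding their negative
    logs, so the total is the full length of the stacked bar.
    """
    code_cost = -math.log(p_code)
    encoding_cost = -math.log(p_encoding)
    return code_cost, encoding_cost, code_cost + encoding_cost


# PLACEHOLDER probabilities, not taken from the paper.
code_cost, encoding_cost, total = negative_log_evidence(p_code=0.5, p_encoding=0.25)
```

A shorter total bar corresponds to higher model evidence, so a model can trade a more costly code for a cheaper encoding, or the reverse.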

Fig 9.

Change in pattern distance across trials for a single participant and region.

The y-axis displays the distance value and the x-axis the trial number.
