PPM-Decay: A computational model of auditory prediction with memory decay

doi:10.1371/journal.pcbi.1008304

Fig 1.

A simple decay kernel.

The kernel is defined by an initial weight of w₀ = 1, an exponential decay with half-life t_0.5 = 1 s, and an asymptotic weight of w_∞ = 0.2.
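
As a minimal sketch of such a kernel, the following assumes the standard form of an exponential decay towards an asymptote, w(t) = w_∞ + (w₀ − w_∞) · 2^(−t / t_0.5). The function name `decay_kernel` and its argument names are illustrative and are not taken from the paper's software.

```python
def decay_kernel(t, w0=1.0, half_life=1.0, w_inf=0.2):
    """Weight of an observation t seconds after it was made.

    Assumed form: exponential decay from w0 towards w_inf, with the gap
    between the two halving every `half_life` seconds.
    """
    if t < 0:
        raise ValueError("elapsed time must be non-negative")
    return w_inf + (w0 - w_inf) * 0.5 ** (t / half_life)


if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 10.0):
        print(f"t = {t:4.1f} s  weight = {decay_kernel(t):.3f}")
```

With the parameters in the caption, the weight starts at 1, falls to 0.6 after one half-life, and approaches 0.2 for long delays.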

Fig 2.

Illustrative plots for Experiment 1.

A) Example sequence-generation models as randomly generated in Experiment 1. The bar plots describe 0th-order symbol distributions, whereas the matrices describe 1st-order transition probabilities. B) Repeated-measures plot indicating how predictive accuracy for individual sequences (N = 500, hollow circles) increases after the introduction of an exponential-decay kernel. C) Absolute changes in predictive accuracy for individual sequences, as summarised by a kernel density estimator. The median accuracy change is marked with a solid vertical line.

Fig 3.

Sample chord sequences analyzed in Experiment 2.

A) represents the popular music corpus (‘Night Moves’, by Bob Seger), B) represents the jazz corpus (‘Thanks for the Memory’, by Leo Robin), and C) represents the Bach chorale harmonization corpus (‘Mit Fried und Freud ich fahr dahin’, by J. S. Bach). Each chord is labeled by its integer encoding within the chord alphabet for the respective corpus. Each chord sequence corresponds to the first eight chords of the first composition in the downsampled corpus. Each chord is defined by a combination of a bass pitch class (lower stave) and a collection of non-bass pitch classes (upper stave). For visualization purposes, bass pitch classes are assigned to the octave below middle C, and non-bass pitch classes to the octave above middle C.

Fig 4.

Predictive performances for different decay kernels in Experiment 2.

Each composition contributed one cross-entropy value for each decay kernel; these cross-entropy values are expressed relative to the cross-entropy values of the original PPM model, and then summarised using kernel density estimators. Median performance improvements are marked with solid vertical lines.

Fig 5.

Example analysis of a single trial in Experiment 3.

The three panels plot each tone’s frequency, change-point statistic, and information content respectively. ‘Nominal transition’ denotes the point at which the pattern changes from random tones to a repeating pattern of length 10. This repetition starts to become discernible after 10 tones (‘Effective transition’), at which point the sequence becomes fully deterministic. Correspondingly, information content (or ‘surprise’) drops, and triggers change-point detection at ‘Detection of transition’.

Fig 6.

Behavioral results for Experiment 3.

A) Participant d-prime scores by condition, as summarized by violin plots and Tukey box plots. B) Participant mean response times by condition, as summarized by violin plots and Tukey box plots. C) As B, except benchmarking response times against the 25 ms conditions.

Fig 7.

Decay kernels employed in Experiment 3.

The temporal duration of the buffer corresponds to the buffer’s informational capacity (15 tones) multiplied by the tone duration.

Table 1.

Optimized model parameters for Experiment 3.

Fig 8.

Modeling participant data in Experiment 3.

Participant data (mean response times) are plotted as white circles, whereas different model configurations (mean simulated response times) are plotted as solid bars. Error bars denote 95% confidence intervals computed using the central limit theorem. A) Progressively adding exponential weight decay and retrieval noise to the original PPM model. B) Progressively adding longer buffers to the PPM-Decay model.

Fig 9.

Schematic figure of accumulating observations within a memory buffer.

Weights for the n-gram “AB” are displayed as a function of time, assuming an itemwise buffer capacity (n_b) of 5, a buffer weight (w₀) of 1.5, an initial post-buffer weight (w₁) of 1, a half-life (t_0.5) of 1 second, and an asymptotic post-buffer weight (w_∞) of 0.

Table 2.

n-grams learned from training on the sequence a, b, a.
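
The table body is not reproduced in this listing, so the following is only a sketch of how n-grams can be collected from such a sequence: as each symbol arrives, every n-gram ending at that symbol is recorded. The function name `count_ngrams` and the `max_length` parameter are illustrative assumptions, and the sketch stores plain counts with no decay.

```python
from collections import Counter


def count_ngrams(sequence, max_length):
    """Count every contiguous n-gram of length 1..max_length in `sequence`."""
    counts = Counter()
    for end in range(1, len(sequence) + 1):
        # Record all n-grams that end at position `end`.
        for n in range(1, min(max_length, end) + 1):
            counts[tuple(sequence[end - n:end])] += 1
    return counts


if __name__ == "__main__":
    for ngram, count in sorted(count_ngrams(["a", "b", "a"], max_length=3).items()):
        print(" ".join(ngram), count)
```

For the sequence a, b, a with n-grams of up to three symbols, this yields five distinct n-grams: "a" (seen twice), and "b", "a b", "b a", and "a b a" (seen once each).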

Fig 10.

Illustrative weight decay profile.

This figure plots the weight of an n-gram of length one as a function of relative observer position, assuming that new symbols continue to be presented every 0.05 seconds. Model parameters are set to t_b = 2, n_b = 15, w₀ = 1.0, t_0.5 = 3.5, w₁ = 0.6, and w_∞ = 0, as optimized in Experiment 3.
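
A rough sketch of a profile of this shape is given below. It rests on several assumptions not stated in the caption: that t_b and t_0.5 are in seconds, that an observation keeps weight w₀ while it is in the buffer, that it leaves the buffer as soon as either t_b seconds have elapsed or n_b further symbols have arrived, and that it then decays exponentially from w₁ towards w_∞. The function names are illustrative.

```python
def buffer_exit_time(t_b=2.0, n_b=15, symbol_interval=0.05):
    """Time (s) at which an observation leaves the buffer (assumed rule:
    whichever of the time limit or the item limit is reached first)."""
    return min(t_b, n_b * symbol_interval)


def weight(t, t_b=2.0, n_b=15, w0=1.0, half_life=3.5, w1=0.6, w_inf=0.0,
           symbol_interval=0.05):
    """Weight of a single observation t seconds after it was made."""
    exit_time = buffer_exit_time(t_b, n_b, symbol_interval)
    if t < exit_time:
        return w0  # still held in the buffer at full weight
    # After leaving the buffer: exponential decay from w1 towards w_inf.
    return w_inf + (w1 - w_inf) * 0.5 ** ((t - exit_time) / half_life)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 4.0, 10.0):
        print(f"t = {t:5.2f} s  weight = {weight(t):.3f}")
```

Under these assumptions, with one symbol every 0.05 seconds the item limit (15 symbols, about 0.75 seconds) is reached before the 2 second time limit, so the weight drops from 1.0 to 0.6 at that point and then halves every 3.5 seconds.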

Fig 11.

Illustration of the interpolated smoothing mechanism.

This smoothing mechanism blends together maximum-likelihood n-gram models of different orders. Here the Markov order bound is two, the predictive context is “abracadabra”, and the task is to predict the next symbol. Columns are identified by Markov order; rows are organized into weight distributions, maximum-likelihood distributions, and interpolated distributions. Maximum-likelihood distributions are created by normalizing the corresponding weight distributions. Interpolated distributions are created by recursively combining the current maximum-likelihood distribution with the next-lowest-order interpolated distribution. The labelled arrows give the weight of each distribution, as computed using escape method “A”. The “Order = −1” column identifies the termination of the interpolated smoothing, and does not literally mean a Markov order of −1.
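
The recursion described in this caption can be sketched as follows. This is a simplified illustration rather than the paper's implementation: it uses raw counts in place of decayed weights, takes the alphabet to be the five symbols that occur in “abracadabra”, and assumes escape method “A” assigns weight T / (T + 1) to the maximum-likelihood distribution and 1 / (T + 1) to the lower-order distribution, where T is the total count observed in the current context. The function name is hypothetical.

```python
from collections import Counter


def interpolated_distribution(sequence, alphabet, order_bound):
    """Predict the next symbol by blending n-gram models of increasing order."""
    # "Order = -1": a uniform distribution that terminates the recursion.
    dist = {s: 1.0 / len(alphabet) for s in alphabet}
    for order in range(0, min(order_bound, len(sequence)) + 1):
        context = sequence[len(sequence) - order:]
        # Count the symbols that followed earlier occurrences of the context.
        counts = Counter(
            sequence[i + order]
            for i in range(len(sequence) - order)
            if sequence[i:i + order] == context
        )
        total = sum(counts.values())
        if total == 0:
            continue  # context never continued: keep the lower-order result
        lam = total / (total + 1)  # escape method "A" (assumed form)
        dist = {
            s: lam * counts[s] / total + (1 - lam) * dist[s]
            for s in alphabet
        }
    return dist


if __name__ == "__main__":
    text = "abracadabra"
    result = interpolated_distribution(text, sorted(set(text)), order_bound=2)
    for symbol, p in result.items():
        print(symbol, round(p, 4))
```

In this sketch the order-2 context “ra” has been followed only by “c”, so “c” receives the highest probability, while the lower-order models and the uniform base distribution ensure that every symbol in the alphabet keeps a non-zero probability.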

Table 3.

Summary of PPM-Decay hyperparameters.

Table 4.

The dictionary of chord templates used in constructing the Bach chorale corpus.
