Human Inferences about Sequences: A Minimal Transition Probability Model

doi:10.1371/journal.pcbi.1005260

Fig 1.

Three different hypothesis spaces

(A) Sequences can be characterized by a hierarchy of statistics. We consider here binary sequences with only two items: X and Y. The simplest statistic considers stimuli in isolation, based on the frequency of items, p(X) and p(Y). A second level considers pairs of items irrespective of their order, distinguishing pairs of identical versus different items (XX and YY vs. XY and YX). The relevant statistic is the frequency of alternations, or conversely, the frequency of repetitions: p(alt.) = 1 – p(rep.). A third level considers ordered pairs, distinguishing X₁Y₂ from Y₁X₂. The relevant statistics are the two transition probabilities between consecutive items: p(Y₂|X₁) and p(X₂|Y₁). For brevity, we generally omit the subscripts. For binary sequences, the space of transition probabilities is 2-dimensional. In this space, the diagonals are special cases where transition probabilities coincide with the frequency of items and the frequency of alternations. Off the diagonals, there is no linear mapping between transition probabilities and the frequency of items (shown in red/blue and iso-contours) or the frequency of alternations (shown with transparency and iso-contours). (B) Example sequences generated from distinct statistics. From top to bottom: Sequences (1) and (2) differ in their frequency of X but not in their frequency of alternations. To generate such sequences, one can select the next stimulus by flipping a biased coin. Sequences (3) and (4) differ in their frequency of alternations, but not in their frequency of X. To generate such a sequence, one can start the sequence arbitrarily with X or Y, and then decide whether to repeat the same item or not by flipping a biased coin. Sequence (5) is biased both in its frequency of alternations and in its frequency of items. It cannot be generated with a single biased coin; instead, two biased coins are required, one to decide which item should follow an X and the other to decide which item should follow a Y. Sequence (6) is a purely random sequence, with no bias in either transition probability and hence no bias in item or alternation frequencies. It can be generated by flipping a fair coin.
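The two-coin procedure described in (B) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `generate_sequence` and `stationary_stats` are hypothetical, and the long-run formulas for p(X) and p(alt.) are the standard stationary-distribution results for a two-state Markov chain, which the caption does not state explicitly.

```python
import random


def generate_sequence(p_y_given_x, p_x_given_y, n, seed=0):
    """Generate a binary X/Y sequence of length n with two biased coins.

    One coin decides what follows an X, the other what follows a Y.
    The first item is chosen arbitrarily (here: at random).
    """
    rng = random.Random(seed)
    seq = [rng.choice("XY")]
    for _ in range(n - 1):
        if seq[-1] == "X":
            nxt = "Y" if rng.random() < p_y_given_x else "X"
        else:
            nxt = "X" if rng.random() < p_x_given_y else "Y"
        seq.append(nxt)
    return "".join(seq)


def stationary_stats(p_y_given_x, p_x_given_y):
    """Long-run item frequency p(X) and alternation frequency p(alt.).

    Assumption: standard two-state Markov chain result, valid when the
    two transition probabilities are not both zero.
    """
    total = p_y_given_x + p_x_given_y
    if total == 0:
        raise ValueError("p(X) is undefined when both probabilities are 0")
    p_x = p_x_given_y / total
    p_alt = 2 * p_y_given_x * p_x_given_y / total
    return p_x, p_alt
```

Under these formulas, setting p(Y|X) = p(X|Y) gives p(X) = 0.5 with p(alt.) equal to the shared transition probability, while setting p(Y|X) = 1 – p(X|Y) gives p(X) = p(X|Y), which matches the two diagonals described in (A).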


Fig 2.

Three different inference styles

Panel A shows an example of a sequence in which the statistics change abruptly: the first half, from 1 to 150, was generated with p(X|Y) = 1 – p(Y|X) = 2/3, and the second half with p(X|Y) = 1 – p(Y|X) = 1/3. In this paper, we consider different hypotheses regarding the inference algorithm used by the brain to cope with such abrupt changes (panel B). Some models assume that a single statistic generates all the observations received (“fixed belief”) while others assume volatility, i.e. that the generative statistic may change from one observation to the next with fixed probability p_c (“dynamic belief”). Models with fixed belief may estimate the underlying statistic either by weighting all observations equally (“perfect integration”), or by considering all observations within a fixed recent window of N stimuli (“windowed integration”, not shown in the figure), or by forgetting about previous observations with an exponential decay ω (“leaky integration”). The heat maps show the posterior distributions of transition probabilities generating the sequence in (A) as estimated by each model. The white dashed line indicates the true generative value. The insets show the estimated 2-dimensional space of transition probabilities at distinct moments in the sequence. White circles indicate the true generative values.
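The three fixed-belief integration schemes differ only in how they weight past observations, which the following Python sketch tries to make concrete. It is a simplified illustration under stated assumptions, not the paper's implementation: it returns point estimates rather than full posterior distributions, it assumes a uniform prior (hence the `+ 1` and `+ 2` terms), it assumes the leak takes the form exp(-k/ω) for a transition observed k steps ago, and all function names are hypothetical. The dynamic-belief model is not covered.

```python
import math


def transition_estimates(seq, weight):
    """Estimate (p(Y|X), p(X|Y)) from a string of 'X'/'Y' items.

    weight(k) is the weight given to a transition observed k steps
    before the most recent one (k = 0 for the latest transition).
    Assumption: uniform prior, so each estimate is (count + 1) / (total + 2).
    """
    counts = {("X", "X"): 0.0, ("X", "Y"): 0.0,
              ("Y", "X"): 0.0, ("Y", "Y"): 0.0}
    n_trans = len(seq) - 1
    for i in range(n_trans):
        k = n_trans - 1 - i  # age of this transition
        counts[(seq[i], seq[i + 1])] += weight(k)
    p_y_given_x = (counts[("X", "Y")] + 1) / (
        counts[("X", "X")] + counts[("X", "Y")] + 2)
    p_x_given_y = (counts[("Y", "X")] + 1) / (
        counts[("Y", "Y")] + counts[("Y", "X")] + 2)
    return p_y_given_x, p_x_given_y


def perfect(k):
    """Perfect integration: every observation counts equally."""
    return 1.0


def windowed(n):
    """Windowed integration: only the n most recent transitions count."""
    return lambda k: 1.0 if k < n else 0.0


def leaky(omega):
    """Leaky integration: exponential forgetting (assumed form exp(-k/omega))."""
    return lambda k: math.exp(-k / omega)
```

For example, `transition_estimates(seq, leaky(10))` discounts old transitions, so after an abrupt change like the one in panel A its estimates move toward the new statistics faster than `transition_estimates(seq, perfect)`.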


Fig 3.

The electrophysiological P300 response reflects the tracking of statistical regularities.

(A) Data redrawn from Squires et al. (1976). Subjects passively listened to binary streams of auditory stimuli (denoted X and Y). Stimuli were generated randomly with global frequency p(X) = 0.5 (no bias), p(X) = 0.7 or p(X) = 0.3 (biased frequencies) in separate sessions. The P300 amplitude was averaged at the end of all possible patterns of at most 5 stimuli, and plotted as a “tree” whose branches show the possible extensions for each pattern. (B-C) Average theoretical levels of surprise for all possible patterns. For each model (i.e. each set of three trees), the theoretical surprise levels were adjusted for offset and scaling to fit the data. For local models with leaky integration (B), we show the trees corresponding to the best-fitting value of the leak parameter ω. The insets show a direct comparison between data and best-fitting theoretical surprise levels, with the regression R².
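The adjustment "for offset and scaling" amounts to an ordinary least-squares fit of the data against the theoretical surprise, and the reported R² is the share of variance that fit explains. The sketch below illustrates this in Python. It assumes the usual Shannon definition of surprise, -log2 of the probability the model assigned to the observed item, which this caption does not spell out, and the function names are hypothetical.

```python
import math


def surprise(p_observed):
    """Shannon surprise in bits (assumed definition: -log2 p)."""
    return -math.log2(p_observed)


def fit_offset_and_scale(theory, data):
    """Least-squares fit of data ~ offset + scale * theory.

    Returns (offset, scale, r_squared). Requires that `theory` is not
    constant and that `data` has non-zero variance.
    """
    n = len(theory)
    mean_t = sum(theory) / n
    mean_d = sum(data) / n
    var_t = sum((t - mean_t) ** 2 for t in theory)
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(theory, data))
    scale = cov / var_t
    offset = mean_d - scale * mean_t
    ss_res = sum((d - (offset + scale * t)) ** 2
                 for t, d in zip(theory, data))
    ss_tot = sum((d - mean_d) ** 2 for d in data)
    return offset, scale, 1 - ss_res / ss_tot
```

Here `theory` would hold one average surprise value per pattern and `data` the corresponding measured amplitude, in the same order.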


Table 1.

Model comparison


Fig 4.

Tracking of statistical regularities and reaction times.

(A-B) Experimental data redrawn from Huettel et al. (2002) [2]. Subjects were presented with a purely random stream of two items. They had to press a key corresponding to the presented item as fast as possible. Reaction times are sorted depending on whether the preceding items formed a local streak of repetitions or alternations, and whether the last item continued or violated that streak. For instance, in XXXXY, the last item violates a previous streak of four repeated items. (C) Experimental data redrawn from Cho et al. (2002) [20]. The task was similar to Huettel et al. (2002) but reaction times are now sorted based on all possible patterns of repetition (R) or alternation (A) across the five past stimuli. For instance, the pattern AAAR denotes that the current item is a repetition of the previous item, and that the four preceding stimuli all formed alternations (e.g. XYXYY). (D-L) Theoretical surprise levels estimated in purely random sequences by three different local models. These local models differ only in the statistic they estimate. Their single free parameter, the leak of integration, was fitted to each dataset. We report the regression R² for these best parameters. Note that regressions include the data for both repetitions and alternations in the case of Huettel et al. Note that only the learning of transition probabilities predicts several aspects of the experimental data.
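The repetition/alternation coding used in (C) can be illustrated with a short Python sketch. The function names `ra_code` and `mean_by_pattern` are hypothetical, and the grouping function is only one plausible way to average a per-item quantity (a reaction time or a model's surprise) by the pattern that ends on that item.

```python
from collections import defaultdict


def ra_code(items):
    """Code consecutive pairs as R (repetition) or A (alternation)."""
    return "".join("R" if a == b else "A"
                   for a, b in zip(items, items[1:]))


def mean_by_pattern(seq, values, n_items=5):
    """Average values[i], grouped by the R/A pattern of the n_items
    stimuli ending at position i (the current item included)."""
    groups = defaultdict(list)
    for i in range(n_items - 1, len(seq)):
        window = seq[i - n_items + 1: i + 1]
        groups[ra_code(window)].append(values[i])
    return {pattern: sum(v) / len(v) for pattern, v in groups.items()}
```

With this coding, `ra_code("XYXYY")` gives `"AAAR"` and `ra_code("XXXXY")` gives `"RRRA"`, matching the examples in the caption.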


Fig 5.

Asymmetric perception of randomness.

(A) Data redrawn from Falk (1975) and reported in [25]. Subjects were presented with various binary sequences of 21 stimuli. They were asked to rate the apparent randomness of each sequence. The range of perceived randomness was normalized between 0 and 1. Ratings were sorted based on the alternation frequency in the sequences. (B-D) Theoretical levels of entropy estimated by distinct local models. The entropy characterizes the unpredictability of a sequence. For each model, we generated random sequences differing in their alternation frequencies. For each sequence and model, we computed the estimated probability of the next stimulus of the sequence, given the preceding stimuli. We then converted these predictions into entropy levels and plotted the average for different values of the leak parameter of the model. Note that only the learning of transition probabilities predicts a slight asymmetry of perceived randomness.
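Converting a prediction into an entropy level can be sketched as follows. This assumes the standard binary Shannon entropy in bits, which the caption does not write out, and the function names are hypothetical.

```python
import math


def binary_entropy(p_next_is_x):
    """Entropy (in bits) of a prediction about the next binary item.

    Assumption: standard Shannon entropy,
    H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0.
    """
    h = 0.0
    for p in (p_next_is_x, 1.0 - p_next_is_x):
        if p > 0:
            h -= p * math.log2(p)
    return h


def mean_entropy(predictions):
    """Average entropy over a list of successive predictions p(next = X)."""
    return sum(binary_entropy(p) for p in predictions) / len(predictions)
```

The entropy is maximal (1 bit) when the model predicts both items with probability 0.5 and falls to 0 as the prediction becomes certain, so a sequence that a model finds more predictable yields a lower average entropy.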
