Tracking the contribution of inductive bias to individualised internal models

doi:10.1371/journal.pcbi.1010182

Fig 1.

Experimental paradigm and Cognitive Tomography (CT).

A Top: Behavioural responses: participants respond with key presses on a keyboard on which each stimulus identity (shown as a differently coloured square) is associated with a unique key. Middle: An example deterministic pattern sequence, which recurs in the stimulus sequence of a particular participant. Different participants are presented with different permutations of this four-element sequence. Bottom: In the actual stimulus sequence presented to participants, the deterministic pattern sequence is interleaved with random items (small squares). Random items can be any of the four stimuli, each occurring with equal probability (the size of a square is proportional to the probability of that stimulus). The grey line indicates one particular realization of the stochastic sequence. B The probabilistic generative model underlying Cognitive Tomography. The generative model describes how a stimulus sequence (top grey box) results in a behavioural response. A participant is assumed to use the internal model (top blue box) to make a prediction for the upcoming stimulus. The internal model assumes dynamics over the latent states: the current latent state is determined jointly by earlier states and the current observation. Based on the current latent state, a prediction can be made about the probability of each possible upcoming stimulus. The predicted probability (the size of squares corresponds to the predicted probability) is related to behaviour through a behavioural model (bottom blue box). The behavioural model depends on the task being performed and therefore on the type of response being predicted. Here, the logarithm of the predictive probability is mapped to a mean response time, and actual response times are assumed to be noisy versions of this mean. The response times shown (bottom grey box) are 400 trials from an example participant.
Cognitive tomography uses the stimulus sequence and the sequence of behavioural responses (grey boxes) to infer the two components of CT: the internal model and the behavioural model (blue boxes).
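
The stimulus sequence of panel A and the behavioural model of panel B can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: stimuli are coded 0 to 3, the sequence is assumed to start with a pattern element and to alternate strictly between pattern and random items, and the mapping from log predictive probability to mean response time is assumed to be linear with Gaussian noise. The names `alternating_sequence`, `sample_response_times`, `tau0`, `tau1` and `sigma` are hypothetical.

```python
import math
import random


def alternating_sequence(pattern, n_trials, rng):
    """Interleave a repeating pattern with uniformly random items.

    Assumption: even trials (0, 2, 4, ...) cycle through `pattern`;
    odd trials draw any of the four stimuli with equal probability.
    """
    seq = []
    for t in range(n_trials):
        if t % 2 == 0:
            seq.append(pattern[(t // 2) % len(pattern)])
        else:
            seq.append(rng.randrange(4))
    return seq


def sample_response_times(pred_probs, tau0, tau1, sigma, rng):
    """Noisy response times whose mean is tau0 - tau1 * log(p).

    Higher predicted probability -> smaller -log p -> faster mean response.
    The linear-Gaussian form is an assumption made for this sketch.
    """
    return [rng.gauss(tau0 - tau1 * math.log(p), sigma) for p in pred_probs]


rng = random.Random(0)
pattern = (2, 0, 3, 1)  # hypothetical participant-specific permutation
seq = alternating_sequence(pattern, 400, rng)
# Noise-free check: a certain stimulus (p = 1) gives tau0; p = 0.25 is slower.
rts = sample_response_times([1.0, 0.25], tau0=300.0, tau1=20.0, sigma=0.0, rng=rng)
print(seq[:8], rts)
```

Feeding the predictive probabilities of an internal model into `sample_response_times`, instead of the fixed values used here, would produce synthetic response-time data of the kind CT is fitted to.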

Fig 2.

Inference and predictions using the internal model.

A We formulate the internal model as an iHMM (infinite hidden Markov model), where the number of latent states (grey circles), the transitions between the states (arrows), and the distribution of possible stimuli for any given state (coloured squares) need to be inferred by the experimenter. The width of arrows is proportional to transition probability, and arrows are pruned if the transition probability is below a threshold; the size of dots indicates the probability of self-transition. The size of stimuli is proportional to their appearance probability in the given state. The result of inference is a distribution over possible model structures; the figure shows a single sample from this distribution. B Evolving the internal model from trial t to trial t + 1. At time t, participants use the internal model components to update their beliefs about the current latent state (Ba; the sizes of dark purple discs represent the posterior belief over latent states given the current observation, blue square). Then, participants play the model forward into the future (open purple circles). Finally, they generate predictions for the upcoming stimulus (Bb, squares in grey boxes) by summing over the possible future states (open purple circles in grey boxes). At trial t + 1, participants use their previous state beliefs and the new stimulus to update their latent state beliefs. In this particular example, only one of the possible states can generate the observation at trial t + 1, hence there is only one dark purple disc. Again, they play the dynamics forward and predict the next stimulus. C Predicted against actual response times for individual trials (dots) of an example participant. After training our inference algorithm on a training dataset of 10 blocks, we predict the response times of another 10 blocks recorded on the same day. Performance is measured as the trial-by-trial coefficient of determination between measured and predicted response times (R², coloured label).
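
The update in panel B is standard hidden Markov model filtering followed by a one-step-ahead prediction. The sketch below assumes a finite transition matrix `trans` and emission matrix `emit` (for instance, a single posterior sample of the iHMM restricted to its occupied states); the function name and the toy model are hypothetical. The usage mirrors trial t + 1 in panel B, where only one state can generate the observation.

```python
def filter_and_predict(prior, obs, trans, emit):
    """One trial of HMM filtering followed by a one-step-ahead prediction.

    prior[s]    : belief that the current latent state is s, before seeing obs
    trans[i][j] : P(next state = j | current state = i)
    emit[s][o]  : P(stimulus = o | state = s)
    Returns (posterior over current state, belief over next state,
             predictive distribution over the next stimulus).
    """
    n_states, n_stim = len(trans), len(emit[0])
    # (Ba) condition on the observed stimulus: posterior is prior x likelihood
    post = [prior[s] * emit[s][obs] for s in range(n_states)]
    z = sum(post)
    post = [p / z for p in post]
    # Play the latent dynamics forward by one trial
    nxt = [sum(post[i] * trans[i][j] for i in range(n_states))
           for j in range(n_states)]
    # (Bb) sum over possible future states to predict the next stimulus
    pred = [sum(nxt[j] * emit[j][o] for j in range(n_states))
            for o in range(n_stim)]
    return post, nxt, pred


# Toy two-state model: state 0 always emits stimulus 0, state 1 is uniform,
# and the two states alternate deterministically.
trans = [[0.0, 1.0], [1.0, 0.0]]
emit = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
post, nxt, pred = filter_and_predict([0.5, 0.5], obs=3, trans=trans, emit=emit)
print(post, pred)
```

Observing stimulus 3 rules out state 0, so the posterior collapses onto state 1 and the prediction for the next trial is stimulus 0 with certainty. A practical implementation would guard against `z == 0` and usually work in log space over long sequences.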

Fig 3.

Alternative models.

A Table of models and the maximum-likelihood parameter sets for the stimuli in our experiment. The ideal observer model (the true generative model of the stimuli) can be formalized as an 8-state HMM with states Pattern1, Random1, Pattern2, Random2, Pattern3, Random3, Pattern4, Random4, where the pattern states produce the corresponding sequence element with probability 1 and the random states each produce any of the four observations independently with equal probability. The Markov model (where predictions are produced by conditioning only on the previous observation) fits the observations best when it predicts all observations with equal probability: because every other trial is random, the marginal probability of any one stimulus is the same regardless of what the previous observation was. The trigram model produces a “high triplet” prediction, where the next stimulus is the successor, in the pattern sequence, of the stimulus two trials ago. The upcoming observation is either a pattern or a random element, each with 50% probability, and in these two cases the successor occurs with probability 100% or 25%, respectively, so the high-triplet prediction has probability 0.5 × 1 + 0.5 × 0.25 = 0.625. All alternatives have equal probability of 0.125. Note that the exact probabilities are not relevant in this case, since the trials are categorized into two groups (high and low), and therefore the parameters of the response time model and these probabilities are underspecified. The CT model produces a prediction for the next stimulus via filtering: a latent state of the sequence is estimated from previous observations using a hidden Markov model. This flexible model space includes both the ideal observer model and the Markov model as special cases. B Structure of the ideal observer model (top panel) and of the Markov model (bottom panel). For a description of the graphical elements, see Fig 2A.
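
The probabilities quoted for the Markov and trigram models can be checked exactly by enumerating the 8-state ideal observer HMM described in panel A. The sketch below assumes stimuli coded 0 to 3 and a hypothetical pattern permutation; it computes exact n-gram conditionals under the stationary (uniform) distribution over the eight cyclic states, using `fractions.Fraction` to avoid rounding. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def ideal_observer(pattern):
    """8-state HMM: Pattern1, Random1, ..., Pattern4, Random4 in a fixed cycle."""
    n = 2 * len(pattern)
    next_state = [(s + 1) % n for s in range(n)]  # deterministic transitions
    emit = []
    for s in range(n):
        if s % 2 == 0:  # pattern state: its sequence element with probability 1
            row = [Fraction(0)] * 4
            row[pattern[s // 2]] = Fraction(1)
        else:  # random state: any of the four stimuli with probability 1/4
            row = [Fraction(1, 4)] * 4
        emit.append(row)
    return next_state, emit


def ngram_conditionals(pattern, order):
    """Exact P(x_t | x_{t-order+1}, ..., x_{t-1}) under the ideal observer."""
    next_state, emit = ideal_observer(pattern)
    n = len(emit)
    joint = {obs: Fraction(0) for obs in product(range(4), repeat=order)}
    for s in range(n):  # the uniform distribution is stationary for a cycle
        states = [s]
        for _ in range(order - 1):
            states.append(next_state[states[-1]])
        for obs in joint:
            p = Fraction(1, n)
            for st, o in zip(states, obs):
                p *= emit[st][o]
            joint[obs] += p
    cond = {}
    for obs, p in joint.items():
        z = sum(joint[obs[:-1] + (c,)] for c in range(4))
        cond[obs] = p / z if z else Fraction(0)
    return cond


pattern = (2, 0, 3, 1)  # hypothetical participant-specific permutation
successor = {pattern[k]: pattern[(k + 1) % 4] for k in range(4)}
bigram = ngram_conditionals(pattern, 2)
trigram = ngram_conditionals(pattern, 3)
print(bigram[(2, 0)], trigram[(2, 1, successor[2])], trigram[(2, 1, 3)])
```

Every bigram conditional equals 1/4, which is why the best-fitting Markov model is uniform, and every high-triplet conditional equals 5/8 = 0.625, with each alternative at 1/8 = 0.125, matching the values given in panel A.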

Fig 4.

Contrasting the ideal observer and CT performance in predicting trial-by-trial response times.

A, Performance of the two models in predicting response times on each of the eight days of exposure to stimulus sequences governed by the same statistics. Performance is measured as the amount of variance in response times (R²) explained by the particular model. Dots represent mean performance; boxes represent the 25th and 75th percentiles of performance across the population of 25 participants. B, Violin plot of the distribution of model performances across participants on the eighth day of exposure. Grey dots indicate individual participants; lines connect model performances for the same participant. All data in the figure are cross-validated: models are fitted on a set of blocks late in the session and tested on non-overlapping, earlier blocks.
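
The performance measure and the cross-validation split described here can be sketched as follows. The caption does not spell out the estimator, so the sketch assumes the common 1 − SS_res/SS_tot form of R² (a squared Pearson correlation would be an alternative) and assumes blocks are listed in chronological order. The function names and the response times are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def late_train_early_test(blocks, n_train):
    """Fit on the last n_train blocks, test on the earlier, non-overlapping ones."""
    if not 0 < n_train < len(blocks):
        raise ValueError("n_train must leave at least one test block")
    return blocks[-n_train:], blocks[:-n_train]


train, test = late_train_early_test(list(range(1, 21)), n_train=10)
print(train, test)
# Hypothetical measured vs predicted response times (ms) for three trials
print(r_squared([400.0, 420.0, 380.0], [405.0, 415.0, 380.0]))
```

Because this form of R² compares predictions directly with held-out data, a model that predicts worse than the mean response time obtains a negative score rather than zero.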

Fig 5.

Validation of the inferred internal model by selectively changing the task and the stimulus statistics.

A-D Choice predictions by CT (red) and the ideal observer model (green). Models are trained on response times for correct key presses on Day 8 and tested on both correct and error trials from the same day. A, Proportion of trials where the model ranked the upcoming stimulus first. For correct trials, both models show a preference for the upcoming stimulus. For incorrect trials, the ideal observer model falsely predicts the stimulus more than a quarter of the time. B, Proportion of trials where the model ranked the button pressed by the participant first. For incorrect responses, both models display a preference for the key actually pressed over the alternatives. C, ROC curves for two example participants based on the subjective probabilities of upcoming stimuli (held-out dataset). The area under the ROC curve characterizes the performance of a particular model in predicting error trials. D, Area under the ROC curve. Grey dots show individuals, bars show means. E, Investigating new internal models that emerge when new stimulus sequences are presented. Participant-averaged performance of predicting response times on Days 8–10 using CT-inferred models that were trained on Day 8 (filled red symbols) and Day 9 (open red symbols) on stimulus sequences governed by Day 8 or Day 9 statistics. On Day 9 a new stimulus sequence was introduced; therefore, across-day prediction of response times corresponded to across-sequence prediction. Models were trained on 10 blocks of trials starting from the 11th block, and prediction was performed on the last five blocks of trials (the indices of the blocks used in testing are indicated in brackets). On Day 10, the stimulus sequence was switched in 5-block segments between the sequences used on Day 8 and Day 9 (purple and grey bars indicate the identity of the stimulus sequence, with colours matching the bars used for Day 8 and Day 9). Error bars show 2 s.e.m. over participants. Stars denote differences significant at p < 0.05.
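
The area under the ROC curve in panels C and D can be computed without tracing the curve, using its rank interpretation: the probability that a randomly chosen error trial receives a higher error score than a randomly chosen correct trial. The sketch assumes that the error score is the negative subjective probability of the upcoming stimulus (lower predicted probability signalling a likely error); the probabilities and the function name are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via its rank interpretation.

    Equals the probability that a randomly chosen positive (error) trial scores
    higher than a randomly chosen negative (correct) trial; ties count 1/2.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical subjective probabilities of the upcoming stimulus on each trial
p_error_trials = [0.2, 0.65]
p_correct_trials = [0.9, 0.6, 0.7]
# Lower predicted probability should signal an error, so the score is -p
auc = roc_auc([-p for p in p_error_trials], [-p for p in p_correct_trials])
print(auc)
```

The pairwise loop costs O(n_errors × n_correct), which is fine for a sketch; a sort-based rank-sum computation gives the same value more efficiently for large datasets.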

Fig 6.

Evolution of the internal model with increasing training.

A Mean explained variance (dots, averaged over participants) in held-out response times in sessions recorded on successive days for the CT (red), Markov (blue), ideal observer (green) and trigram (yellow) models. Error bars denote 2 s.e.m. of the group mean. B Colour coding of the response buttons used in this figure. C Colour coding of the sequence shown to participants. D–F Learning in individual participants (left, middle, and right panels correspond to participants 102, 110, and 119, respectively). E Learning curves of the CT, ideal observer, Markov, and trigram models. The internal models shown in panels D and F (corresponding to the days indicated by red discs in panel E) are samples from the posterior over possible internal models inferred by CT. CT predictive performance is calculated by averaging the predictive performances of 60 samples. Participant 102 finds a partially accurate model by Day 2 (D) and a model close to the true model by Day 8 (F). Participant 110 retains a Markov model throughout the eight days of exposure: prediction of their behaviour by the Markov model gradually improves, while the predictive performance of the ideal observer model stays at floor, indicating that no higher-order statistical structure was learned. G & H Mismatch between the subjective probabilities of upcoming stimuli derived from CT and those of the alternative models: the ideal observer model (generative probabilities, horizontal axis) and the Markov model (vertical axis). KL divergences of the predictive probabilities are shown for individual participants (dots) on Day 2 (G) and Day 8 (H). The KL divergence is zero at a perfect match and grows with increasing mismatch.
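
The mismatch measure in panels G and H can be sketched as a per-trial KL divergence between two predictive distributions over the four stimuli, averaged over trials. The caption does not state the direction of the divergence or how trials are pooled, so both are assumptions here (CT predictions as the first argument, a plain mean over trials); the function names and example distributions are hypothetical. The second distribution must be non-zero wherever the first is positive.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two distributions over the same four stimuli."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_trialwise_kl(preds_p, preds_q):
    """Average per-trial KL divergence between two sequences of predictions."""
    total = sum(kl_divergence(p, q) for p, q in zip(preds_p, preds_q))
    return total / len(preds_p)


uniform = [0.25] * 4  # e.g. the uniform prediction of the best-fitting Markov model
ct_like = [[0.625, 0.125, 0.125, 0.125], [0.25] * 4]  # hypothetical CT predictions
print(mean_trialwise_kl(ct_like, [uniform, uniform]))
```

Identical predictions contribute exactly zero, and any disagreement contributes a positive amount, which matches the caption's reading of zero as a perfect match.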

Fig 7.

The internal model captured by CT can be reliably broken down into the independent contribution of an inductive bias and the ideal observer model.

A Day-by-day comparison of the number of participants for whom the predictive performance of the Markov (blue) or the ideal observer (green) model was higher. B Subject-by-subject comparison (dots represent individual subjects) of ideal observer model performance and normalized CT performance (the margin by which CT outperforms the Markov model) on Day 8. Dots close to the identity line (grey line) indicate cases where CT performance can be reliably accounted for by contributions from the two simpler models. Normalized CT performance closely follows the performance of the ideal observer model, and deviations tend to indicate slightly better normalized CT performance. C Performance of a linear model predicting CT model predictions on a trial-by-trial basis from the predictions of the Markov and ideal observer models on different days of training. The thick middle line indicates the R² of the trial-by-trial fit of the linear combination to CT predictions, averaged across participants. Boxes show the 25th and 75th percentiles of the distribution. The upper whisker shows the largest value within 1.5 times the interquartile range above the 75th percentile, and the lower whisker is defined analogously. Dots are data points outside the whiskers. D Histogram of the advantage of normalized CT performance over the ideal observer model. The red line marks the mean of the histogram. E Higher-order statistical learning in CT (left panels) and the ideal observer model (right panels) on Day 2 (top panels) and Day 8 (bottom panels) of the experiment. Dots show individual participants. Orange dots represent participants whose higher-order learning score deviates significantly from zero. CT can capture both negative deviations (Day 2) and positive deviations (Day 8) in this test and displays significant correlations across participants on both days between the predicted and measured higher-order statistical learning, indicating that subtle and nontrivial statistics of the internal model are represented in CT.
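
Panel C asks how well a linear combination of Markov and ideal observer predictions reproduces the CT predictions trial by trial, and panel B uses normalized CT performance, the margin by which CT outperforms the Markov model. A minimal ordinary-least-squares sketch with an intercept is below; it assumes the per-trial predictions are predicted response times, and the function names and toy numbers are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """OLS fit of y ~ a + b1*x1 + b2*x2; returns (a, b1, b2, R^2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero if the two predictors are collinear
    # Solve the centred 2x2 normal equations for the two slopes
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    ss_res = sum((yi - (a + b1 * u + b2 * v)) ** 2 for yi, u, v in zip(y, x1, x2))
    ss_tot = sum(w * w for w in dy)
    return a, b1, b2, 1.0 - ss_res / ss_tot


def normalized_ct(r2_ct, r2_markov):
    """Margin by which CT outperforms the Markov model (panel B)."""
    return r2_ct - r2_markov


# Toy check: CT-like targets built exactly as 1 + 2*markov + 3*ideal
markov = [0.0, 1.0, 2.0, 3.0, 4.0]
ideal = [1.0, 0.0, 2.0, 1.0, 3.0]
ct = [1.0 + 2.0 * m + 3.0 * i for m, i in zip(markov, ideal)]
print(fit_two_predictors(markov, ideal, ct))
```

When the targets are an exact linear combination, the fit recovers the coefficients and an R² of 1; on real data, the R² of this fit is the quantity summarised by the boxes in panel C.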

Table 1.

Summary of when model parameters are allowed to change.

Table 2.

Parameter priors.

Values of the hierarchical prior over state transitions are taken from [33].
