Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task

doi:10.1371/journal.pcbi.1004648

Fig 1.

Original and reduced versions of the two-step task.

(A, B) Diagram of task structure for the original (A) and reduced (B) two-step tasks. (C, D) Example reward probability trajectories for the second-step actions in each task. (E-H) Stay probability plots for Q(1) (E, G) and model-based (F, H) agents on the two task versions. Plots show the fraction of trials on which the agent repeated its choice following rewarded and non-rewarded trials with common and rare transitions (SEM error bars shown in red). (I, J) Performance (fraction of trials rewarded) achieved by Q(1) and model-based agents, and by an agent which chooses randomly at the first step. Agent parameters in (I, J) have been optimised to maximise the fraction of rewarded trials.
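
The stay probability described in this caption can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the 0/1 coding of choices and the boolean coding of transitions and outcomes are all assumptions made for the example.

```python
import numpy as np

def stay_probabilities(choices, transitions, outcomes):
    """Fraction of trials on which the first-step choice was repeated,
    split by the previous trial's transition and outcome.

    choices     : 0/1 first-step choice on each trial (assumed coding)
    transitions : True where the transition was common (assumed coding)
    outcomes    : True where the trial was rewarded (assumed coding)
    """
    choices = np.asarray(choices)
    # Events on trial t are used to predict the choice on trial t + 1,
    # so the final trial's events are dropped.
    common = np.asarray(transitions, dtype=bool)[:-1]
    rewarded = np.asarray(outcomes, dtype=bool)[:-1]
    stay = choices[1:] == choices[:-1]

    conditions = [
        ("rewarded, common", rewarded & common),
        ("rewarded, rare", rewarded & ~common),
        ("non-rewarded, common", ~rewarded & common),
        ("non-rewarded, rare", ~rewarded & ~common),
    ]
    # NaN marks a condition that never occurred in the session.
    return {
        label: stay[mask].mean() if mask.any() else float("nan")
        for label, mask in conditions
    }
```

Computing these four values per session and then taking the mean and SEM across sessions would give bars of the kind shown in panels E-H.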

Fig 2.

Stay probability transition-outcome interaction for Q(1) agent due to trial start action values.

(A) Predictor loadings for a logistic regression model predicting whether the Q(1) agent will repeat the same choice as a function of 4 predictors: Stay, a tendency to repeat the same choice irrespective of trial events; Outcome, a tendency to repeat the same choice following a rewarded trial; Transition, a tendency to repeat the same choice following common transitions; Transition x outcome interaction, a tendency to repeat the same choice dependent on the interaction between transition (common/rare) and outcome (rewarded/not). (B) Action values at the start of the trial for the chosen and not-chosen actions, shown separately for trials with different transitions (common or rare) and outcomes (rewarded or not). Yellow error bars show SEM across sessions. (C) Predictor loadings for a logistic regression model with an additional predictor capturing the tendency to repeat correct choices, i.e. choices whose common transition leads to the state which currently has high reward probability. (D) Across-trial correlation between predictors in the logistic regression analysis shown in (C).
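
The four predictors in (A) can be laid out as a design matrix along the following lines. This is a hedged sketch: the caption does not give the numerical coding, so the ±0.5 coding, the function name and the input format are assumptions made for illustration.

```python
import numpy as np

def stay_regression_design(choices, transitions, outcomes):
    """Build predictors for a logistic regression on stay/switch.

    Returns X with columns [stay, outcome, transition, interaction]
    and y = 1 where the agent repeated its previous choice.
    The +/-0.5 coding is an assumption, not taken from the caption.
    """
    choices = np.asarray(choices)
    common = np.where(np.asarray(transitions, dtype=bool)[:-1], 0.5, -0.5)
    rewarded = np.where(np.asarray(outcomes, dtype=bool)[:-1], 0.5, -0.5)

    X = np.column_stack([
        np.ones(len(common)),     # Stay: constant bias towards repeating
        rewarded,                 # Outcome: rewarded vs not
        common,                   # Transition: common vs rare
        2.0 * rewarded * common,  # Interaction, rescaled to +/-0.5
    ])
    y = (choices[1:] == choices[:-1]).astype(int)
    return X, y
```

X and y can then be passed to any standard logistic regression routine with no additional intercept, since the first column already plays that role.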

Fig 3.

Comparison of agents’ behaviour–reduced task.

Comparison of the behaviour of all agent types discussed in the paper on the reduced task. Far left panels: stay probability plots. Centre left panels: predictor loadings for a logistic regression model predicting whether the agent will repeat the same choice as a function of 4 predictors: Stay, a tendency to repeat the same choice irrespective of trial events; Outcome, a tendency to repeat the same choice following a rewarded trial; Transition, a tendency to repeat the same choice following common transitions; Transition x outcome interaction, a tendency to repeat the same choice dependent on the interaction between transition (common/rare) and outcome (rewarded/not). Centre right panels: predictor loadings for the logistic regression analysis with an additional ‘correct’ predictor which captures a tendency to repeat correct choices. Right panels: predictor loadings for a lagged logistic regression model. The model uses a set of 4 predictors at each lag, each of which captures how a given combination of transition (common/rare) and outcome (rewarded/not) predicts whether the agent will repeat the choice a given number of trials in the future, e.g., the ‘rewarded, rare’ predictor at lag -2 captures the extent to which receiving a reward following a rare transition predicts that the agent will choose the same action two trials later. The legend for the right panels is at the bottom of the figure. Error bars in all plots show SEM across sessions. Agent types: (A-D) Q(1), (E-H) Model-based, (I-L) Q(0), (M-P) Reward-as-cue, (Q-T) Latent-state.
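
One possible construction of the lagged predictors is sketched below. The caption does not specify the coding, so everything here is an assumption: the regression is framed as predicting the current choice (1 vs 0), and each predictor is +1 or -1 according to the choice made on the earlier trial when that trial had the given transition-outcome combination, and 0 otherwise. The function and argument names are hypothetical.

```python
import numpy as np

def lagged_design(choices, transitions, outcomes, n_lags=3):
    """Design matrix with 4 predictors per lag (assumed coding).

    Columns are ordered lag 1 first; within each lag the order is
    [rewarded common, rewarded rare, non-rewarded common,
     non-rewarded rare].
    """
    choices = np.asarray(choices)
    common = np.asarray(transitions, dtype=bool)
    rewarded = np.asarray(outcomes, dtype=bool)
    # +1 for choice 1, -1 for choice 0, so a positive loading means
    # "repeat the earlier choice".
    signed_choice = np.where(choices == 1, 1.0, -1.0)

    combos = [
        rewarded & common,
        rewarded & ~common,
        ~rewarded & common,
        ~rewarded & ~common,
    ]
    n_trials = len(choices)
    X = np.zeros((n_trials - n_lags, 4 * n_lags))
    for lag in range(1, n_lags + 1):
        for j, mask in enumerate(combos):
            # Row i describes trial n_lags + i; look back `lag` trials.
            past = (signed_choice * mask)[n_lags - lag : n_trials - lag]
            X[:, 4 * (lag - 1) + j] = past
    y = (choices[n_lags:] == 1).astype(int)
    return X, y
```

Under this coding, a positive loading on the ‘rewarded, rare’ column at lag 2 would mean that reward after a rare transition makes the agent more likely to choose the same action two trials later.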

Fig 4.

Comparison of agents’ performance.

Performance achieved by different agent types in the original (A) and reduced (B) tasks, with parameter values optimised to maximise the fraction of trials rewarded. For the reward-as-cue agent, performance is shown for a fixed strategy of choosing action A (B) following reward in state a (b) and action B (A) following non-reward in state a (b). SEM error bars shown in red. Significant differences are indicated by * P < 0.05, ** P < 10⁻⁵.

Fig 5.

Likelihood comparison.

Data likelihood for maximum likelihood fits of different agent types (indicated by x-axis labels; MB: Model-based, RC: Reward-as-cue, LS: Latent-state) to data simulated from each agent type (indicated by labels above axes) on the reduced (A-E) and original (F-J) tasks. All differences in data likelihood between different agents fit to the same data are significant at P < 10⁻⁴, except for that between the fits of the reward-as-cue and latent-state agents to data simulated from the reward-as-cue agent, which is significant at P = 0.027.
