Contextual inference through flexible integration of environmental features and behavioural outcomes
Fig 3
Feature and outcome inference learn distinct strategies to solve a cued T-maze.
(A) Setup of the cued T-maze, showing distinct cues and reward locations for distinct trial types. (B) Training setup for the paradigm, showing trial identity for 1000 trials during block switches followed by 500 random trials. (C) Performance of an example feature inference agent on the last 10 block switches (left) and the last 100 random trials (right), the trials used to quantify performance. (D) Performance of feature inference and outcome inference compared with other RL agents on block switches (left) and random trials (right). (E) Trial-type-specific successor representations (SRs) learnt by the feature inference algorithm, showing distinct predicted future occupancy when the agent is in the starting state, averaged over all agents on the last 100 random trials (log scale), compared with trial-type-specific convolved reward maps learnt by the outcome inference algorithm, showing distinct expected behavioural outcomes, averaged over all agents on the last 500 trials during blocks of trials. Predicted future occupancy of locations is indicated within the T-maze. Predicted future occupancy of cue 1 (associated with trial type L) is indicated to the left of the location where the cue occurs, and that of cue 2 (associated with trial type R) is indicated to the right of the location where it occurs. * indicates p < 0.05; statistical results are detailed in S1 Table.