Fig 1.

Kinds of imperfect information used to investigate feature- and outcome-based algorithms.

Schematic of the cued T-maze used to investigate the impact of decreasing contextual signal-to-noise ratio (SNR) on the algorithms. Contextual SNR decreases either because information becomes increasingly incomplete, through greater spatial separation between the cue and the choice point or through dependence on a sequence of multiple cues, or because information becomes increasingly unreliable, through the addition of distractor cues around the contextual cue.

Fig 2.

Design of feature and outcome inference models.

(A) Setup of the cued T-maze task, showing the different location-based and cue-based features, (B) Algorithm underlying feature inference, showing how observed feature transitions o_t are compared with learnt successor feature maps M using Bayesian inference to determine which context is currently most likely, after which the corresponding temporal difference map TD is used to choose the current best action a_t, (C) Algorithm underlying outcome inference, showing how observed outcomes c^r_t are compared with learnt convolved reward maps C using Bayesian inference to determine which context is currently most likely, (D) Schematic of how the feature and outcome inference models react to a change in trial type, showing selection of the relevant map following the cue for feature inference and following the absence of reward for outcome inference, and how the inferred context allows action selection using separate temporal difference maps.
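The feature inference loop in panel (B) can be sketched as a Bayesian update over contexts followed by greedy action selection. This is a minimal sketch under stated assumptions, not the authors' implementation: the likelihood of a transition under each context (the normalised row of that context's successor feature map), the reading of each TD map as a state-by-action value table, and all function names are hypothetical.

```python
import numpy as np

def feature_likelihoods(sf_maps, f_prev, f_next, eps=1e-6):
    # Hypothetical likelihood: how strongly each context's successor feature map
    # predicts f_next from f_prev, normalised over all features.
    likes = []
    for M in sf_maps:
        row = M[f_prev] + eps  # small floor so no context is ruled out entirely
        likes.append(row[f_next] / row.sum())
    return np.array(likes)

def bayes_update(prior, likelihoods):
    # Posterior over contexts is proportional to prior times likelihood.
    post = np.asarray(prior, dtype=float) * likelihoods
    return post / post.sum()

def select_action(posterior, td_maps, state):
    # Assumed reading of "TD map": a state x action value table per context.
    # Act greedily with respect to the map of the most likely context.
    k = int(np.argmax(posterior))
    return int(np.argmax(td_maps[k][state]))

# Toy example: 3 features, 2 contexts (trial types L and R), 2 actions.
M_L = np.array([[0.0, 1.0, 0.0],   # from feature 0, context L predicts feature 1 (cue 1)
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0]])
M_R = np.array([[0.0, 0.0, 1.0],   # from feature 0, context R predicts feature 2 (cue 2)
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0]])
td_L = np.array([[1.0, 0.0]] * 3)  # action 0 (left) is valued in context L
td_R = np.array([[0.0, 1.0]] * 3)  # action 1 (right) is valued in context R

prior = np.array([0.5, 0.5])
post = bayes_update(prior, feature_likelihoods([M_L, M_R], f_prev=0, f_next=1))
action = select_action(post, [td_L, td_R], state=1)
```

Observing the transition into cue 1 shifts almost all posterior mass onto context L, so the agent then acts with the L-specific TD map. Outcome inference in panel (C) would follow the same update pattern with convolved reward maps C and observed outcomes in place of feature maps and transitions.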

Table 1.

Parameters across all models.

Table 2.

Parameters for feature inference.

Table 3.

Parameters for outcome inference.

Table 4.

Parameters for feature inference in biconditional discrimination.

Table 5.

Parameters for outcome inference in biconditional discrimination.

Fig 3.

Feature and outcome inference learn distinct strategies to solve a cued T-maze.

(A) Setup of the cued T-maze, showing distinct cues and reward locations for distinct trial types, (B) Training setup for the paradigm, showing trial identity for 1000 trials during block switches followed by 500 random trials, (C) Performance of an example feature inference agent on the last 10 block switches (left) and the last 100 random trials (right), which are used to quantify performance, (D) Performance of feature inference and outcome inference compared with other RL agents on block switches (left) and random trials (right), (E) Trial-type-specific successor representations (SRs) learnt by the feature inference algorithm, showing distinct predicted future occupancy when the agent is in the starting state, averaged over all agents on the last 100 random trials (log scale), compared with trial-type-specific convolved reward maps learnt by the outcome inference algorithm, showing distinct expected behavioural outcomes, averaged over all agents on the last 500 trials during blocks of trials. Predicted future occupancy of locations is shown within the T-maze; predicted future occupancy of cue 1 (associated with trial type L) is shown to the left of the location where it occurs, and that of cue 2 (associated with trial type R) to the right of the location where it occurs. * indicates p < 0.05; statistical results are detailed in S1 Table.
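Why trial-type-specific SRs differ can be seen on a toy maze. The agents learn their SRs from experience; the sketch below instead uses the standard closed form M = (I - γT)^-1 for a fixed policy with transition matrix T. The five-state maze without cue features and the discount γ = 0.9 are illustrative assumptions, not the paper's environment or parameters.

```python
import numpy as np

def successor_representation(T, gamma=0.9):
    # Closed-form SR for a fixed policy: M = (I - gamma T)^-1.
    # Row s gives the discounted expected future occupancy of every state from s.
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

def tmaze_transitions(go_left):
    # Toy states: 0 start, 1 cue, 2 choice point, 3 left arm, 4 right arm.
    # Arms are terminal (all-zero rows), so the inverse above exists.
    T = np.zeros((5, 5))
    T[0, 1] = 1.0
    T[1, 2] = 1.0
    T[2, 3 if go_left else 4] = 1.0
    return T

M_L = successor_representation(tmaze_transitions(go_left=True))   # trial type L policy
M_R = successor_representation(tmaze_transitions(go_left=False))  # trial type R policy
start_L, start_R = M_L[0], M_R[0]  # predicted future occupancy from the start state
```

From the start state both maps predict the shared central states equally (1, 0.9, 0.81), but only M_L predicts the left arm (0.729) and only M_R the right arm, which is the kind of divergence panel (E) visualises on a log scale.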

Fig 4.

Increasing overlap between contexts reduces performance of feature inference, but not outcome inference.

(A) Decreasing the contextual SNR by adding multiple overlapping features around the cue in each context, or (B) by extending the distance between the cue and the choice point, (C) Performance of feature and outcome inference as contextual overlap increases on block switches and (D) on random trials. * indicates p < 0.05; statistical results are detailed in S1 Table.

Fig 5.

Supporting feature inference with outcome inference during learning rescues performance with increasing contextual overlap.

(A) Schematic of joint algorithm showing that during learning outcome inference is used to generate a joint context estimate, (B) Algorithmic implementation showing how the joint estimate is determined during learning, (C) Performance of joint inference on random trials as contextual overlap increases via distractor features around the cue or increasing cue-choice distance, (D) Performance of joint inference on block switches as contextual overlap increases via distractor features around the cue or increasing cue-choice distance. * indicates p < 0.05, statistical results are detailed in S1 Table.
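One way to picture the joint context estimate in panels (A) and (B) is to combine the two posteriors during learning and fall back to feature inference afterwards. The combination rule below (multiplying the feature-based and outcome-based posteriors as independent evidence under a uniform prior, then renormalising) is a hypothetical choice for illustration; the actual rule is the one shown in panel (B).

```python
import numpy as np

def joint_context_estimate(p_feature, p_outcome, learning=True):
    # Hypothetical combination rule: during learning, treat the feature-based and
    # outcome-based posteriors as independent evidence (uniform prior assumed)
    # and multiply them; after learning, rely on feature inference alone.
    p_feature = np.asarray(p_feature, dtype=float)
    if not learning:
        return p_feature / p_feature.sum()
    joint = p_feature * np.asarray(p_outcome, dtype=float)
    return joint / joint.sum()

# Early in learning the feature maps are ambiguous, but an unrewarded choice
# gives strong outcome evidence for the other context.
joint = joint_context_estimate([0.6, 0.4], [0.1, 0.9])
after = joint_context_estimate([0.6, 0.4], [0.1, 0.9], learning=False)
```

Here the outcome evidence overrides a weak, possibly wrong feature-based estimate, so the context map that gets updated is more likely to be the correct one; once learning ends, the outcome term is ignored.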

Fig 6.

Supporting learning with outcome inference improves initial formation of context-dependent representations leading to long-lasting improvement in performance.

(A) Number of incorrect updates of the context map during the first trial type switch, (B) Evolution of the average context map’s predicted future occupancy when the agent is in the starting state (log scale), at the end of the first block and the end of the second block of trials for joint and feature inference agents, (C) Underrepresentation of the correct cue (left) and overrepresentation of the incorrect cue (right) on the average context maps used on random trials, (D) Confidence in inferred context identity following the cue on random trials. * indicates p < 0.05, statistical results are detailed in S1 Table.

Fig 7.

Removing support from outcome inference during learning mimics experimentally observed splitter cell loss.

(A) Simulated firing of a cell representing the predicted future occupancy of location 4 (indicated with an arrow) along the overlapping central arm, on correct random trials across all agents, for cue-choice distance 20, (B) Quantification of simulated cell firing along the central arm in its preferred and non-preferred contexts, (C) Evolution of the difference in splitter probabilities along the central arm, (D) Schematic showing how simulated firing rates are calculated: the successor representation map associated with each context is weighted by its posterior probability on the cell's preferred and non-preferred trial types, giving the predicted future occupancy of a specific location from every other location in the environment, (E) Impact of increasing cue-choice distance on the difference in splitter probabilities in the location following the cue (left), and correlation between performance on random trials and the difference in splitter probabilities in the location following the cue (right), (F) Same as (E) but for distractor features. * indicates p < 0.05; statistical results are detailed in S1 Table.
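The rate computation in panel (D) amounts to a posterior-weighted sum of SR entries. The sketch below assumes a three-state toy layout (choice point, left arm, right arm), hand-written SR maps, and made-up posterior values; none of these come from the paper.

```python
import numpy as np

def simulated_rate(posterior, sr_maps, state, target):
    # Firing of a cell coding for `target` while the agent is at `state`:
    # each context's predicted future occupancy of `target` from `state`,
    # weighted by that context's posterior probability.
    return sum(p * M[state, target] for p, M in zip(posterior, sr_maps))

# Toy states: 0 choice point, 1 left arm, 2 right arm (values as for gamma = 0.9).
M_L = np.array([[1.0, 0.9, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
M_R = np.array([[1.0, 0.0, 0.9],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

# Cell coding for the left arm (state 1), recorded at the choice point (state 0).
preferred = simulated_rate([0.9, 0.1], [M_L, M_R], state=0, target=1)      # confident L trial
non_preferred = simulated_rate([0.1, 0.9], [M_L, M_R], state=0, target=1)  # confident R trial
```

The cell fires more on its preferred trial type only to the extent that the context maps differ and the posterior is confident, which is why weaker context inference would be expected to reduce splitting.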

Fig 8.

Supporting feature inference with outcome inference during learning improves performance on cue-discrimination tasks.

(A) Schematic of the non-match to sample task, where trials with the same cue repeated twice require a different response than trials where two differing cues are presented (left) and training protocol showing that outcome inference is used to support feature inference during the training phase (right), (B) Performance of feature inference and joint inference on the last 10 block switches, (C) Performance of feature inference and joint inference on random trials, (D) Schematic of the biconditional discrimination task, where the meaning of the second cue depends on the identity of the first cue, (E) Performance of feature inference and joint inference on the last 10 block switches, (F) Performance of feature inference and joint inference on random trials. * indicates p < 0.05, statistical results are detailed in S1 Table.
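Both cue-discrimination tasks can be written as exclusive-or rules, in which neither cue alone determines the correct response. The sketch below checks this by enumeration; the cue labels and which side counts as correct are hypothetical assignments chosen for illustration.

```python
from itertools import product

def non_match_to_sample(cue1, cue2):
    # Repeated cue -> "L", differing cues -> "R" (side assignment is hypothetical).
    return "L" if cue1 == cue2 else "R"

def biconditional(first, second):
    # The meaning of the second cue flips with the identity of the first cue.
    # Hypothetical assignment: A then X, or B then Y -> "L"; otherwise "R".
    return "L" if (first, second) in {("A", "X"), ("B", "Y")} else "R"

def single_cue_predictive(task, first_cues, second_cues):
    # True if either cue on its own fully determines the correct response.
    trials = {(c1, c2): task(c1, c2) for c1, c2 in product(first_cues, second_cues)}
    for pos, values in ((0, first_cues), (1, second_cues)):
        for v in values:
            responses = {r for cues, r in trials.items() if cues[pos] == v}
            if len(responses) == 1:
                return True
    return False

nmts_single = single_cue_predictive(non_match_to_sample, [1, 2], [1, 2])
bicond_single = single_cue_predictive(biconditional, ["A", "B"], ["X", "Y"])
```

Both checks return False, so an agent must represent the conjunction of cues as its context, which is the setting in which the caption reports that outcome support during training helps feature inference.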
