Environmental uncertainty and the advantage of impulsive choice strategies

doi:10.1371/journal.pcbi.1010873

Fig 2

Temporal Discounting task and performance of impulsive and non-impulsive agents in different task environments.

A) Task schematic for the Temporal Discounting task. Participants or agents are given a series of questions, each with two offers: a small immediate reward and a larger, delayed reward. B) The state space tree for one pair of options in the task. The agent starts on the far left with a choice between the immediate reward and the delayed reward. If the immediate reward is chosen, the agent proceeds along the upper branch to the immediate reward state (s_IR) and always collects the immediate reward. If the agent chooses the delayed reward, it proceeds along the lower branch towards the delayed reward state (s_DR). Along this branch is a sequence of intermediate transition states (s_b), through which the agent progresses with probability δ. At each transition state, the agent may instead proceed to a terminal, non-rewarding state (s_a) with probability 1-δ. The number of transition states is defined by the delay to the larger reward. C) Average reward collected and choice behavior across simulated trials in certain and uncertain task environments for impulsive and non-impulsive agents in the Temporal Discounting task. “High certainty” is when δ_env > δ_agent and “low certainty” is when δ_env < δ_agent. The non-impulsive agent (black) has a discount factor of γ = 0.99 and the impulsive agent (red) has a discount factor of γ = 0.6. Left: Average reward collected for the two agents. Right: Average proportion of trials in which an agent selected the larger, delayed option. Error bars are s.e.m. across 10 iterations of 100 trials using variable reward sizes and delays. *** indicates p < 0.0001, paired t-test. D) Difference in average reward across a range of δ_env and δ_agent values. The heatmap shows domains where the non-impulsive agent performs better (more blue), the impulsive agent performs better (more red) or there are marginal differences between the two agents (white). The value shown in each box on the heatmap is the difference in average reward for the two agents. The white boxes indicate the task regimes shown in Fig 2C. See S1 Fig for other discount factors. Image credit: Openclipart.org (coins image, money image).
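
The delayed branch described in panel B can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one probabilistic step per transition state, so the chance of reaching s_DR after n transition states is δ^n, and it assumes the agent values the delayed option as (γ · δ_agent)^n times the reward. The function names, the valuation rule and the example reward sizes are all hypothetical.

```python
import random

def p_reach_delayed(delta, n_steps):
    """Probability of passing all n_steps transition states (assumed: delta ** n_steps)."""
    return delta ** n_steps

def simulate_delayed_branch(delta, n_steps, reward, rng):
    """Walk the lower branch once; return the reward collected (0.0 if the agent lands in s_a)."""
    for _ in range(n_steps):
        # With probability 1 - delta the agent drops into the terminal, non-rewarding state.
        if rng.random() >= delta:
            return 0.0
    return reward

def delayed_value(gamma, delta_agent, n_steps, reward):
    """Assumed subjective value: discounting and believed survival compound at every step."""
    return (gamma * delta_agent) ** n_steps * reward

def chooses_delayed(gamma, delta_agent, n_steps, small, large):
    """True if the agent prefers the larger, delayed offer under the assumed valuation."""
    return delayed_value(gamma, delta_agent, n_steps, large) > small

if __name__ == "__main__":
    rng = random.Random(0)
    delta_env, delta_agent, n_steps = 0.9, 0.9, 3
    small, large = 2.0, 5.0  # hypothetical reward sizes
    for gamma in (0.99, 0.6):  # non-impulsive and impulsive discount factors from panel C
        picks_delayed = chooses_delayed(gamma, delta_agent, n_steps, small, large)
        if picks_delayed:
            total = sum(simulate_delayed_branch(delta_env, n_steps, large, rng)
                        for _ in range(1000))
        else:
            total = small * 1000
        print(gamma, picks_delayed, total / 1000)
```

Under these assumptions, lowering γ shrinks the subjective value of the delayed offer in the same way that lowering δ_agent does, so an agent with a small γ tends to take the immediate reward even when it believes the environment is fairly reliable.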

doi: https://doi.org/10.1371/journal.pcbi.1010873.g002