Cost-benefit trade-offs in decision-making and learning

Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Moreover, engaging control to proactively suppress irrelevant information that could conflict with task-relevant information would presumably also be cognitively costly. Yet, it remains unclear whether the cognitive control demands involved in preventing and resolving conflict also constitute costs in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their free choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of investing cognitive control to suppress an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that free choices were more biased when participants were less sure about which action was more rewarding. This supports the hypothesis that the costs linked to conflict management were traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. 
Our results show that the subjective cognitive control costs linked to conflict factor into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.


S3 Text. Distractor bias parameter vs. behaviour correlations
To investigate the relation between conflict avoidance and conflict adaptation effects, we assessed the relation between the estimated distractor bias parameter (as an index of conflict avoidance) and conflict adaptation on RTs. Conflict adaptation effects were calculated as the difference between conflict effects (incongruent minus congruent, I minus C) for previously congruent minus previously incongruent trials. Thus, larger conflict adaptation reflects a greater reduction in conflict effects following incongruent trials. Since similar conflict adaptation was observed for free and instructed trials, we averaged over choice conditions. This analysis revealed a significant positive correlation between the distractor bias parameter, i.e. conflict avoidance, and conflict adaptation effects on RTs (see Fig C.i, Pearson's correlation: r = 0.54, t(18) = 2.72, p = .014). That is, participants who were better able to adapt their behaviour to reduce conflict costs on RTs were also more likely to avoid conflict when unnecessary (i.e. in the absence of strong value differences).
These results should be interpreted with care, given our relatively small sample size. Nonetheless, they suggest that participants' sensitivity to conflict may be reflected in these two types of adaptive behaviours, rather than being a trade-off between them. It could have been hypothesised instead that participants who were worse at minimising RT costs would benefit most from avoiding conflict. Yet, this correlation implies that a common process of conflict monitoring and adaptation may underlie both types of behavioural responses. In fact, previous work has suggested that conflict signals can trigger both adjustments in cognitive control and conflict avoidance ([1-3] but see [4]).
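As an illustration of the conflict adaptation index defined above, the following minimal sketch computes it for a single participant. The trial representation, the function name and the RT values are hypothetical and serve only to make the arithmetic concrete; they are not taken from our data or analysis scripts.

```python
from statistics import mean

def conflict_adaptation(trials):
    """Conflict adaptation index for one participant.

    trials: list of (prev_congruent, congruent, rt) tuples, where the first
    two entries are booleans and rt is a response time (hypothetical format).
    Returns the conflict effect (I minus C) after congruent trials minus the
    conflict effect (I minus C) after incongruent trials.
    """
    def conflict_effect(prev_congruent):
        incongruent = [rt for p, c, rt in trials if p == prev_congruent and not c]
        congruent = [rt for p, c, rt in trials if p == prev_congruent and c]
        return mean(incongruent) - mean(congruent)

    return conflict_effect(True) - conflict_effect(False)

# Made-up RTs (ms), one trial per cell, for illustration only
example_trials = [
    (True, True, 500.0),    # previous congruent, current congruent
    (True, False, 580.0),   # previous congruent, current incongruent
    (False, True, 520.0),   # previous incongruent, current congruent
    (False, False, 560.0),  # previous incongruent, current incongruent
]
adaptation = conflict_adaptation(example_trials)
```

In this toy example the conflict effect is 80 ms after congruent trials and 40 ms after incongruent trials, so the index is positive, i.e. conflict effects are reduced following incongruent trials.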

Fig C. Relation between distractor bias parameter and participants' behaviour.
Correlations between the estimated distractor bias parameter (φ) and conflict adaptation effects on RTs (i), and average performance (ii).
Finally, it could have been hypothesised that having a larger choice bias might impair performance in the task, as participants' choices might be driven too strongly by the distractors rather than by action values. Importantly, since the probability of left and right distractors was equated within each learning episode (i.e. between reversals), following the distractors' suggestion would be equally likely to be helpful vs. unhelpful to task performance (i.e. 50/50 chance). Nevertheless, we tested this hypothesis by assessing the correlation between the estimated distractor bias parameter and average task performance, which showed no significant correlation (Fig C.ii, Pearson's correlation: r = -0.36, t(18) = -1.63, p = .12).
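For reference, the t statistics accompanying the two Pearson correlations can be recovered from r and the sample size using the standard conversion t = r·sqrt(n − 2)/sqrt(1 − r²). The short sketch below assumes n = 20, which is inferred from the 18 degrees of freedom; the function name is illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

n_participants = 20  # assumption: inferred from df = 18

t_adaptation = t_from_r(0.54, n_participants)    # bias vs. conflict adaptation
t_performance = t_from_r(-0.36, n_participants)  # bias vs. average performance
```

Because the inputs are the rounded r values, the results should agree with the reported t statistics only up to rounding.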
The independence of distractor bias effects from average performance was further corroborated through model simulations. Virtual datasets (N = 100) were simulated across a range of distractor bias (φ) values (from -2 to 2, at intervals of 0.1), with the remaining parameters held constant (β = 2, α = 0.6). The simulated virtual choices were then used to calculate the average performance (i.e. average proportion of high reward choices, Fig D.i), as well as the percentage of distractor congruent choices (Fig D.ii), i.e. the consequence of the simulated distractor bias effect. These findings show that, across this broad range of φ values (φ estimated on participants' data varied less than 1 unit), changes in average performance were minimal, whereas the same φ values were associated with very large differences in the effect of distractor bias on free choices (i.e. proportion of distractor congruent vs. incongruent choices). This confirms that neither our task nor our model implies that the distractor bias would result in a significant impairment in task performance.
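The general logic of such a parameter sweep can be sketched as follows. This is a simplified illustration under stated assumptions, not the model code used for Fig D. The additive form of the bias in the softmax, the reward probabilities (0.8 vs. 0.2), the reversal schedule (every 20 trials) and the number of trials are all assumptions made for the example. Only the φ range, β = 2, α = 0.6 and N = 100 are taken from the text. The sketch is therefore not expected to reproduce the reported magnitudes.

```python
import numpy as np

def simulate(phi, n_datasets=100, beta=2.0, alpha=0.6, n_blocks=10,
             block_len=20, p_high=0.8, p_low=0.2, seed=0):
    """Simulate free choices of Q-learners with a distractor bias phi.

    Returns (mean proportion of high-reward choices,
             mean proportion of distractor-congruent choices).
    Task settings (blocks, reward probabilities) are assumed, not sourced.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_datasets, 2))   # action values: column 0 = left, 1 = right
    rows = np.arange(n_datasets)
    best = 0                        # currently high-reward action
    n_correct = np.zeros(n_datasets)
    n_congruent = np.zeros(n_datasets)
    balanced = np.repeat([-1, 1], block_len // 2)  # -1 = left, +1 = right
    for _ in range(n_blocks):
        # Equal numbers of left/right distractors per block, shuffled per agent
        d_block = rng.permuted(np.tile(balanced, (n_datasets, 1)), axis=1)
        for t in range(block_len):
            d = d_block[:, t]
            # Assumed decision rule: value difference plus additive bias
            logit = beta * (q[:, 1] - q[:, 0]) + phi * d
            p_right = 1.0 / (1.0 + np.exp(-logit))
            choice = (rng.random(n_datasets) < p_right).astype(int)
            p_reward = np.where(choice == best, p_high, p_low)
            reward = (rng.random(n_datasets) < p_reward).astype(float)
            # Delta-rule update of the chosen action's value
            q[rows, choice] += alpha * (reward - q[rows, choice])
            n_correct += (choice == best)
            n_congruent += (choice == (d > 0).astype(int))
        best = 1 - best             # reversal at the end of each block
    n_trials = n_blocks * block_len
    return (n_correct / n_trials).mean(), (n_congruent / n_trials).mean()

# Sweep phi from -2 to 2 in steps of 0.1 (41 values)
phis = np.round(np.arange(-20, 21) / 10, 1)
sweep = {float(phi): simulate(phi) for phi in phis}
```

Plotting the first element of each entry in `sweep` against φ gives the analogue of Fig D.i, and the second element gives the analogue of Fig D.ii. Because distractor direction is balanced within each block, a bias of φ = 0 yields distractor-congruent choices at chance level.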