Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts
Fig 9. Hysteresis represented by the previous trial.
The learners were next reclassified according to whether the hysteretic bias was an alternation bias (β1 < 0) (violet bars) or a repetition bias (β1 > 0) (orange bars). With some learners adhering to a more typical profile of first-order perseveration, the repetition-bias group did retain a substantial effect on the probability of repeating an action independent of state (p < 0.05). However, in keeping with second-order perseveration, the alternation-bias group actually outnumbered the repetition-bias group and showed a larger effect size (p < 0.05). That is, extra alternation could follow from the design feature whereby optimal behavior would more frequently result in alternating actions. In contrast to optimal alternation when appropriate for a given state, this perseverative alternation was action-specific and so did not actually improve reward-maximizing accuracy for the alternation-bias group (p > 0.05). The models with at least one parameter for hysteretic bias could replicate these 1-back effects (p < 0.05). Although the 2C model with constant bias could partially mimic action repetition with a nonsignificant trend, the models without any hysteresis parameters (2 and 2C) could not properly match the empirical 1-back effect (p > 0.05).
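As a rough illustration of the 1-back measure described above, the following minimal sketch computes the probability of repeating the previous action regardless of state and assigns a learner to a bias group by the sign of β1. The function names (`repeat_probability`, `classify_bias`) and the integer coding of actions are assumptions made for illustration only; they are not taken from the study's analysis code, and the statistical tests reported in the caption are not reproduced here.

```python
from typing import Sequence


def repeat_probability(actions: Sequence[int]) -> float:
    """Fraction of trials (from the second onward) on which the chosen
    action equals the action chosen on the immediately preceding trial,
    ignoring which state was presented (hypothetical helper)."""
    if len(actions) < 2:
        raise ValueError("need at least two trials")
    repeats = sum(1 for prev, cur in zip(actions, actions[1:]) if prev == cur)
    return repeats / (len(actions) - 1)


def classify_bias(beta1: float) -> str:
    """Group a learner by the sign of the fitted hysteresis parameter:
    beta1 > 0 is a repetition bias, beta1 < 0 an alternation bias
    (hypothetical helper; beta1 is assumed to be already fitted)."""
    if beta1 > 0:
        return "repetition"
    if beta1 < 0:
        return "alternation"
    return "none"


# Example: a short action sequence coded as 0/1 and an assumed beta1 value.
actions = [0, 0, 1, 1, 1]
p_repeat = repeat_probability(actions)
group = classify_bias(-0.3)
```

Under this sketch, a repeat probability reliably above the level expected from optimal, state-appropriate responding would point toward repetition bias, and one below it toward alternation bias; the appropriate baseline depends on the task design and is not specified here.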