Fig 1.
Action bias and hysteresis for the “generalized reinforcement learning” (GRL) model.
(a) Each trial of the structured reward-learning task began with an image cue symbolizing the state of the environment (e.g., “A” or “B”), and the optimal action given the state was a button press with either the left (“L”) or right (“R”) hand. In contrast to the expert control of GRL, which maps state-action pairs to rewards, the nonexpert forces of action bias and hysteresis were modeled as a leftward or rightward constant bias and as a repetition or alternation bias, respectively. These action-specific effects manifest independently of the external state and reward history. (b) What matters for present purposes is that, while GRL adds complexity to basic RL, still more complexity must be accommodated for action bias and hysteresis. The agent’s mixture policy $\pi_t(s_t, a^*)$ is a probability distribution over the available actions $a^*$ in state $s_t$. Action selection under this mixture policy is determined not only by the learned values of state-action pairs $Q_t(s_t, a^*)$ but also by a constant bias $B(a^*)$ and a dynamic hysteretic bias $H_t(a^*)$ with an exponentially decaying hysteresis trace. The outcome of the chosen action $a_t$ is a reward $r_{t+1}$ that updates $Q_t(s_t, a_t)$ via the reward-prediction error (RPE) $\delta_{t+1}$ weighted by a learning rate α. For GRL specifically, this RPE signal is generalized to the representations of other state-action pairs according to extra parameters for action generalization (gA) and state generalization (gS). See Figs 8 and 13 for plots of individual differences in constant lateral bias (left versus right) and in the exponential hysteresis trace (repeat versus alternate). See also the original report of this study for additional details about the paradigm and GRL per se [12].
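The mixture policy and update described in panel (b) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a softmax over Q/τ + B(a) + H(a), a rightward bias B(R) = βR with B(L) = 0, a hysteresis trace that decays by a factor λH on every trial and adds 1 for the chosen action, and a hypothetical GRL rule that applies the RPE scaled by gA to the unchosen action in the same state and scaled by gS to the chosen action in the other state. The class and attribute names are illustrative only.

```python
import math

ACTIONS = ("L", "R")
STATES = ("A", "B")


class MixturePolicyAgent:
    """Hypothetical sketch: GRL values plus constant bias and exponential hysteresis."""

    def __init__(self, alpha, tau, beta_r, beta_1, lambda_h, g_a=0.0, g_s=0.0):
        self.alpha = alpha        # learning rate
        self.tau = tau            # softmax temperature applied to Q
        self.beta_r = beta_r      # constant rightward bias (negative means leftward)
        self.beta_1 = beta_1      # hysteresis magnitude (> 0 repeat, < 0 alternate)
        self.lambda_h = lambda_h  # per-trial decay factor of the hysteresis trace
        self.g_a = g_a            # action generalization (assumed update rule)
        self.g_s = g_s            # state generalization (assumed update rule)
        self.q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
        self.trace = {a: 0.0 for a in ACTIONS}

    def policy(self, state):
        """Return a softmax distribution over actions in the given state."""
        logits = {
            a: self.q[(state, a)] / self.tau
            + (self.beta_r if a == "R" else 0.0)
            + self.beta_1 * self.trace[a]
            for a in ACTIONS
        }
        top = max(logits.values())  # subtract the max for numerical stability
        weights = {a: math.exp(v - top) for a, v in logits.items()}
        total = sum(weights.values())
        return {a: w / total for a, w in weights.items()}

    def update(self, state, action, reward):
        """Apply the RPE to Q, generalize it, and advance the hysteresis trace."""
        delta = reward - self.q[(state, action)]
        other_action = ACTIONS[1 - ACTIONS.index(action)]
        other_state = STATES[1 - STATES.index(state)]
        self.q[(state, action)] += self.alpha * delta
        self.q[(state, other_action)] += self.alpha * self.g_a * delta
        self.q[(other_state, action)] += self.alpha * self.g_s * delta
        # The trace ignores state and reward: decay every action, then mark the choice.
        for a in ACTIONS:
            self.trace[a] *= self.lambda_h
        self.trace[action] += 1.0
        return delta
```

For example, with βR > 0 and all values and traces still at zero, `policy("A")` assigns the right-hand action a probability above 0.5 before any learning has occurred, which is the signature of a constant lateral bias.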
Table 1.
Variables for basic forms of RL, bias, and hysteresis.
Even for basic RL, the candidate variables for a more comprehensive behavioral model can be classified according to their dependence on (or independence of) states, actions, previous actions, and reward outcomes. In principle, action value is outcome-dependent, whereas action hysteresis is outcome-independent. However, when modeling actual behavior, this conceptual independence does not guarantee statistical independence, because finite sequences of action choices contain incidental correlations. For the present study, the primary model comparison focuses on the three variables (marked with an asterisk) that are the most fundamental and typically the most dissociable: constant bias B(a), state-independent action hysteresis H(a), and state-dependent action value Q(s,a). The extended model comparison also incorporates state-dependent action hysteresis H(s,a) and state-independent action value Q(a). Note that state value V(s) is generally relevant in RL but is not considered here. The abbreviations “PrevAction”, “dep.”, and “indep.” correspond to “previous action”, “dependent”, and “independent”, respectively.
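The dependence structure in Table 1 can be made concrete by computing each variable from the same trial history and noting which parts of the history each one reads. This is a hypothetical sketch, not the study's model: hysteresis is simplified to a 1-back indicator and action value to the mean past reward (unvisited pairs default to 0.0), and the function name is illustrative.

```python
from collections import defaultdict


def table1_variables(history, current_state, actions=("L", "R")):
    """Hypothetical sketch: Table 1 variables for the next trial.

    history is a list of (state, action, reward) tuples. Constant bias B(a)
    depends on no history at all, so it is not computed here.
    """
    last_action = history[-1][1] if history else None
    last_action_in_state = next(
        (a for s, a, _ in reversed(history) if s == current_state), None
    )
    rewards_by_action = defaultdict(list)
    rewards_by_pair = defaultdict(list)
    for s, a, r in history:
        rewards_by_action[a].append(r)     # pools rewards across states
        rewards_by_pair[(s, a)].append(r)  # keeps rewards separate per state

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        # state-indep., reward-indep.
        "H(a)": {a: float(a == last_action) for a in actions},
        # state-dep., reward-indep.
        "H(s,a)": {a: float(a == last_action_in_state) for a in actions},
        # state-indep., reward-dep.
        "Q(a)": {a: mean(rewards_by_action[a]) for a in actions},
        # state-dep., reward-dep.
        "Q(s,a)": {a: mean(rewards_by_pair[(current_state, a)]) for a in actions},
    }
```

With the history A-L-rewarded, B-R-unrewarded, B-R-rewarded and the next trial in state A, H(a) points to R while H(s,a) points to L, and Q(a) credits R with 0.5 while Q(s,a) gives R in state A no credit. Even in this tiny example, the variables diverge only because they read different slices of the same sequence.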
Table 2.
Free parameters are listed for the 72 behavioral models in ascending order of complexity within and across classes. The first character of each model label refers to one of four possibilities: an absence of learning (“X”), reinforcement learning (RL) without generalization (“0”), generalized reinforcement learning (GRL) with one shared generalization parameter g1 (“1”), or GRL with two separate generalization parameters g1 and g2 (“2”). RL itself required free parameters for the learning rate α and the softmax temperature τ. Models with “C” as the second letter included a constant lateral bias, arbitrarily designated as a rightward bias βR (where βR < 0 indicates a leftward bias). The list is condensed with bracket notation to represent the range of n-back horizons for successive models within a hysteresis category (e.g., “2CE[1–3]” for models 2CE1, 2CE2, and 2CE3). Models labeled with “N” and ending with a positive integer (from the range in brackets) included n-back hysteresis with free parameters βn for repetition (βn > 0) or alternation (βn < 0) of each previous action represented, up to 4 trials back (β4) with learning and up to 8 trials back (β8) without learning. Models labeled with “E” and ending with a positive integer N (from the range in brackets) included exponential hysteresis with inverse decay rate λH taking effect N+1 trials back. Exponential models could thus also combine parametric and nonparametric forms, with N free parameters βn for initial n-back hysteresis up to 3 trials back (β3), where the final βN is the initial magnitude of the exponential component. “df” stands for degrees of freedom. See also Table A in S1 Text for the unrolled version of the list. This ordering of the models corresponds to the ordering in Figs 2 and 3.
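The labeling scheme can be checked mechanically. The sketch below parses a label into its free parameters and enumerates the model set; it assumes that nonlearning (“X”) models carry neither α nor τ, which is consistent with the 7 parameters the Fig 2 caption gives for 2CE1 and yields exactly 72 labels. The parameter names and helper functions are hypothetical, and the enumeration order is not the complexity ordering used in Table 2.

```python
import re

# Assumed mapping from the first character of a label to its learning parameters.
LEARNING_PARAMS = {
    "X": [],                            # no learning
    "0": ["alpha", "tau"],              # RL without generalization
    "1": ["alpha", "tau", "g1"],        # GRL, one shared generalization parameter
    "2": ["alpha", "tau", "g1", "g2"],  # GRL, two generalization parameters
}

LABEL = re.compile(r"([X012])(C?)(?:([NE])(\d+))?")


def free_parameters(label):
    """Return the free parameters implied by a model label such as '2CE1'."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized model label: {label!r}")
    learning, constant, kind, n = match.groups()
    params = list(LEARNING_PARAMS[learning])
    if constant:
        params.append("beta_R")  # constant lateral bias
    if kind:
        params += [f"beta_{i}" for i in range(1, int(n) + 1)]  # n-back weights
        if kind == "E":
            params.append("lambda_H")  # decay of the exponential component
    return params


def all_model_labels():
    """Enumerate labels: N up to 8 without learning and 4 with it; E up to 3."""
    labels = []
    for learning in "X012":
        max_n = 8 if learning == "X" else 4
        hysteresis = [""] + [f"N{i}" for i in range(1, max_n + 1)]
        hysteresis += [f"E{i}" for i in range(1, 4)]
        for constant in ("", "C"):
            labels += [learning + constant + h for h in hysteresis]
    return labels
```

Under these assumptions, `free_parameters("XCE2")` gives βR, β1, β2, and λH (df = 4), and `free_parameters("1CE3")` gives 8 parameters.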
Fig 2.
Model comparison: 3-T Face/House version.
The ordering of the models here corresponds to the ordering in Table 2 and Table A in S1 Text. As before, each model label begins with “X-”, “0-”, “1-”, or “2-” for no learning, basic RL, 1-parameter GRL, or 2-parameter GRL, respectively. A subsequent “C” denotes constant bias, and “N” or “E” denotes n-back or exponential hysteresis, respectively, with each successive model within a hysteresis category adding one step back to the n-back horizon (e.g., the rightmost models 2CE1, 2CE2, and 2CE3). (a) Shown for each model is the average goodness of fit relative to the null chance model (“X”) with (light bars) and without (light and dark bars combined) a penalty for model complexity according to the corrected Akaike information criterion (AICc). A more positive residual corresponds to a superior fit. With the addition of action bias and hysteresis parameters alongside GRL, Poor learners (blue bars) and Nonlearners (red bars) showed the greatest gains in model performance, but Good learners (green bars) benefited significantly as well. The best-performing models (written above each plot) featured not only GRL for the actual learners but also constant bias and exponential hysteresis for all groups (FH-G: 2CE1, FH-P: 1CE3, FH-N: XCE2; see Fig 3 for CM-G: 2CE1, CM-P: 1CE2). For the Good-learner group, which is the most essential here, the originally preferred 2CE1 model was validated as preferable to both simpler and more complex specifications of bias and hysteresis, including their absence. (b) Counts of the participants best fitted by each model according to the AICc are plotted separately for Good learners, Poor learners, and Nonlearners. At the individual level, 87% of participants across both data sets exhibited significant effects of some kind of action bias or hysteresis.
The 7-parameter 2CE1 model, which complements 2-parameter GRL with constant bias and 2-parameter exponential hysteresis, accommodates heterogeneity in both learning and action-specific effects across individuals: 64% of participants were best fit by 2CE1 or one of its nested models rather than by other n-back or n-back-plus-exponential models.
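The comparison in panels (a) and (b) rests on the AICc, −2 ln L + 2k + 2k(k+1)/(n−k−1) for k free parameters and n trials. The sketch below computes it, an unpenalized and a penalized improvement over the null chance model X (assumed here to choose each of two actions with probability 0.5, so k = 0), and the best model for one participant. How the published bars were scaled is not stated, so the pairing with the light and dark bars is only indicative; the function names and any log-likelihood values passed in are hypothetical.

```python
import math


def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion for k free parameters and n trials."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def chance_log_likelihood(n, n_actions=2):
    """Log-likelihood of the null model X, which picks each action equally often."""
    return n * math.log(1.0 / n_actions)


def fit_relative_to_chance(log_likelihood, k, n):
    """Return (unpenalized, penalized) improvement over X; larger is better."""
    null_ll = chance_log_likelihood(n)
    unpenalized = 2.0 * (log_likelihood - null_ll)                # deviance reduction
    penalized = aicc(null_ll, 0, n) - aicc(log_likelihood, k, n)  # AICc difference
    return unpenalized, penalized


def best_model(fits, n):
    """fits maps label -> (log_likelihood, k); return the label with minimum AICc."""
    return min(fits, key=lambda label: aicc(*fits[label], n))
```

Counting, across participants, which label `best_model` returns gives the kind of tally plotted in panel (b). Because the penalty grows with k, a richer model such as 2CE1 wins only when its log-likelihood gain outweighs its extra parameters.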
Fig 3.
Model comparison: 7-T Color/Motion version.
Compare to Fig 2. Results were replicated in the 7-T Color/Motion version of the experiment with a nearly identical experimental design.
Fig 4.
Reduced model comparison: 3-T Face/House version.
Compare to Fig 2. The next round of comparisons focused on subsets of eight models building up to constant bias and exponential hysteresis (“-CE1”). The baseline models were 2-parameter GRL (“2”) for Good and Poor learners or a random policy (“X”) for Nonlearners. The evidence for best fit with the 2CE1 model is more salient here (FH-G: 2CE1, FH-P: 2CE1, FH-N: XCE1; see Fig 5 for CM-G: 2CE1, CM-P: 2CN2).
Fig 5.
Table 3.
Fitted parameters for the preferred 2CE1 model are listed for each participant group based on learning performance. To characterize the dimensions of each participant's behavioral profile, the signs of individual fits are categorized as “discriminative” (-1 ≤ gA < 0) or “none” (gA = 0) for action generalization; “discriminative” (-1 ≤ gS < 0), “none” (gS = 0), or “associative” (0 < gS ≤ 1) for state generalization; “leftward” (βR < 0) or “rightward” (βR > 0) for constant bias; and “alternation” (β1 < 0) or “repetition” (β1 > 0) for hysteretic bias. Also listed are metrics for absolute constant bias |βR|, absolute hysteretic bias |β1|, and overall bias |βR|+|β1|, which is inversely related to the probability of a correct response (p < 0.05). The residual deviance Ddf (with degrees of freedom in the subscript) corresponds to the 2CE1 model’s improvement in fit relative to either the XC model with only constant bias or the complete nonlearning model XCE1, which adds exponential hysteresis. Standard deviations are listed in parentheses below the corresponding means.
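The categorization rules and the residual deviance in Table 3 can be sketched as below. The sign rules follow the caption; treating an exactly zero βR or β1 as “none” is an assumption, since the caption does not categorize that case. The deviance is taken as 2(ln L_full − ln L_reduced) with df equal to the difference in parameter counts (6 against XC, 4 against XCE1), and the p-value uses the closed-form chi-square survival function for even df. Reporting a p-value this way is an illustrative assumption, not necessarily what the study did, and all names are hypothetical.

```python
import math


def sign_label(x, negative, positive):
    """Map a fitted value to a category by its sign (zero maps to 'none')."""
    if x < 0:
        return negative
    if x > 0:
        return positive
    return "none"


def profile_2ce1(g_a, g_s, beta_r, beta_1):
    """Hypothetical helper: categorize one participant's 2CE1 fit as in Table 3."""
    if not -1.0 <= g_a <= 0.0:
        raise ValueError("Table 3 lists gA only within [-1, 0]")
    if not -1.0 <= g_s <= 1.0:
        raise ValueError("gS is expected within [-1, 1]")
    return {
        "action generalization": "discriminative" if g_a < 0 else "none",
        "state generalization": sign_label(g_s, "discriminative", "associative"),
        # An exactly zero bias is not categorized in Table 3; 'none' is assumed.
        "constant bias": sign_label(beta_r, "leftward", "rightward"),
        "hysteretic bias": sign_label(beta_1, "alternation", "repetition"),
        "abs constant bias": abs(beta_r),
        "abs hysteretic bias": abs(beta_1),
        "overall bias": abs(beta_r) + abs(beta_1),
    }


def chi2_sf_even_df(x, df):
    """Chi-square survival function; closed form valid for positive even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def residual_deviance(ll_full, k_full, ll_reduced, k_reduced):
    """Return (D, df, p) for a nested comparison such as 2CE1 versus XC."""
    d = 2.0 * (ll_full - ll_reduced)
    df = k_full - k_reduced
    return d, df, chi2_sf_even_df(d, df)
```

For instance, with hypothetical log-likelihoods of −50 for 2CE1 (k = 7) and −60 for XC (k = 1), `residual_deviance` returns D6 = 20.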
Fig 6.
Action bias and hysteresis versus learning performance: 3-T Face/House version.
To compare the pure GRL model (“2”) with the final 2CE1 model, which adds three parameters for constant bias and exponential hysteresis, simulated data sets from each model were yoked to their respective empirical data sets. Posterior predictive checks examined the probability of a correct action, the probability of a right-hand action, and the probability of a repeated action independent of state. (a) When examining only accuracy in terms of correct choices for maximizing reward, the shortcomings of the reduced model without bias are not immediately apparent. (b) Upon considering action bias, these right-handed participants mostly tended to select the right-hand action (p < 0.05). Whereas the 2CE1 model could account for this effect with a constant lateral bias (p < 0.05), the reduced model could not (p > 0.05). (c) Regarding the probability of repetition versus alternation, note that 100% accuracy would produce 66.7% alternation for the present experimental design, whereas 100% alternation would still produce only 50% accuracy. The Good-learner group exhibited a tendency to alternate in the aggregate as expected (p < 0.05), whereas the Poor-learner and Nonlearner groups did not (p > 0.05). Only the 2CE1 model featuring exponential hysteresis could match this pattern with quantitative precision. (d-f) Independent of direction, absolute differences from the chance level of 50% reveal the full extent of the action-specific components of variance, which are as substantial as the effects of reward typically emphasized in active learning. For the probability of a right-hand action or a repeated action, the margin of roughly 2% predicted by pure GRL was negligible in comparison. Error bars indicate standard errors of the means.
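The three posterior predictive statistics, and their direction-free deviations from 50% shown in panels (d-f), can be computed from any state and action sequence, empirical or simulated. The sketch below assumes actions coded as “L”/“R” and a known state-to-correct-action mapping; generating the yoked simulated sequences from fitted models is not shown, and the function names are hypothetical.

```python
def choice_statistics(states, actions, correct_action):
    """P(correct), P(right-hand), P(repeat regardless of state), and |P - 0.5| for each."""
    n = len(actions)
    if n < 2 or len(states) != n:
        raise ValueError("need matching state/action sequences of length >= 2")
    stats = {
        "correct": sum(a == correct_action[s] for s, a in zip(states, actions)) / n,
        "right": sum(a == "R" for a in actions) / n,
        "repeat": sum(actions[t] == actions[t - 1] for t in range(1, n)) / (n - 1),
    }
    # Direction-free deviation from the 50% chance level.
    stats.update({f"abs_{key}": abs(value - 0.5) for key, value in list(stats.items())})
    return stats


def predictive_gap(empirical, simulated):
    """Empirical minus simulated statistic (positive: the model undershoots)."""
    return {key: empirical[key] - simulated[key] for key in empirical}
```

Note that "repeat" compares each action with the immediately preceding action regardless of state, so a participant can score well on "correct" while the repeat rate still reveals an action-specific tendency.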
Fig 7.
Action bias and hysteresis versus learning performance: 7-T Color/Motion version.
Compare to Fig 6. Results were replicated in the 7-T Color/Motion version of the experiment.
Fig 8.
(a) Based on individual fits of the 2CE1 model, Good and Poor learners were combined and then reclassified according to whether the constant lateral bias was a leftward bias (βR < 0) (magenta bars) or a rightward bias (βR > 0) (cyan bars). The model comparison extended this posterior predictive check and others to another six intermediate models: four models nested within the 2CE1 model featuring exponential hysteresis (2N1, 2E1, 2C, 2CN1) and two models substituting 2-back hysteresis (2N2, 2CN2) but matched for degrees of freedom. For the probabilities of left or right actions, some of these right-handed participants actually exhibited a contrary leftward bias, and those who did showed a smaller absolute magnitude of bias than the rightward-bias group (p < 0.05). The models with a parameter for constant bias (2C through 2CE1) could replicate these effects (p < 0.05), whereas the models lacking this parameter could not replicate them at all (p > 0.05) and were thereby falsified. (b) Results were replicated in the 7-T Color/Motion version of the experiment.
Fig 9.
Hysteresis represented by the previous trial.
The learners were next reclassified according to whether the hysteretic bias was an alternation bias (β1 < 0) (violet bars) or a repetition bias (β1 > 0) (orange bars). The repetition-bias group, with some participants adhering to the more typical profile of first-order perseveration, did retain a substantial effect on the probability of repeating an action independent of state (p < 0.05). However, in keeping with second-order perseveration, the alternation-bias group actually outnumbered the repetition-bias group and showed a larger effect size (p < 0.05). That is, extra alternation could follow from the design feature whereby optimal behavior would more frequently result in alternating actions. Unlike optimal alternation that is appropriate for a given state, this perseverative alternation was action-specific and so did not actually improve reward-maximizing accuracy for the alternation-bias group (p > 0.05). The models with at least one parameter for hysteretic bias could replicate these 1-back effects (p < 0.05). Although the 2C model with constant bias could partially mimic action repetition with a nonsignificant trend, the models without any hysteresis parameters (2 and 2C) could not properly match the empirical 1-back effect (p > 0.05).
Fig 10.
Psychometric modeling of constant bias.
The probability of an action increased with the difference between action values $Q_t(s_t, a)$ derived from the GRL component of the 2CE1 model as fitted to empirical behavior (p < 0.05). Constant bias, estimated with a logistic model, was in the appropriate direction for both the leftward-bias and rightward-bias groups (p < 0.05). The models featuring constant bias could replicate these effects with quantitative precision as well (p < 0.05), whereas models without the parameter could not (p > 0.05). The nine plots per row each have an identical x-axis despite omission of tick labels from every other plot for readability. Error bars indicate standard errors of the means.
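A psychometric function of this kind can be sketched as a two-parameter logistic regression, P(R) = 1 / (1 + exp(−(w·x + b))), where x is assumed to be Q(s,R) − Q(s,L) and the intercept b plays the role of the constant lateral bias (b > 0 rightward, b < 0 leftward). The exact psychometric specification used in the study is not given here; this is a generic maximum-likelihood fit by Newton-Raphson, chosen only to keep the example self-contained, and the function name is hypothetical.

```python
import math


def fit_logistic(x, y, iterations=50):
    """Fit P(y = 1) = 1 / (1 + exp(-(w * x + b))) by Newton-Raphson; return (w, b)."""
    w, b = 0.0, 0.0
    for _ in range(iterations):
        g_w = g_b = 0.0             # gradient of the log-likelihood
        h_ww = h_wb = h_bb = 0.0    # negative Hessian (Fisher information)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
            g_w += (yi - p) * xi
            g_b += yi - p
            v = p * (1.0 - p)
            h_ww += v * xi * xi
            h_wb += v * xi
            h_bb += v
        det = h_ww * h_bb - h_wb * h_wb
        # Newton step: parameters += inverse(negative Hessian) @ gradient.
        w += (h_bb * g_w - h_wb * g_b) / det
        b += (h_ww * g_b - h_wb * g_w) / det
    return w, b
```

With symmetric choice proportions (25%, 50%, 75% rightward at x = −1, 0, 1), the fitted intercept is zero and the slope is ln 3; shifting every proportion rightward yields a positive intercept, which is the pattern a rightward constant bias produces. The fit assumes the data are not perfectly separable, since otherwise the maximum-likelihood estimate does not exist.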
Fig 11.
Psychometric modeling of hysteresis represented by the previous trial.
For the probabilities of alternated or repeated actions instead, hysteretic bias was likewise estimated with a GRL-based logistic model and was in the appropriate direction for both the alternation-bias and repetition-bias groups (p < 0.05). The models featuring at least one parameter for hysteretic bias could replicate these 1-back effects with comparable psychometric functions (p < 0.05). Models without this parameter could not (p > 0.05), although the 2C model could again deceptively mimic repetition with a nonsignificant trend.
Fig 12.
Hysteresis represented across multiple trials.
Here the scope of hysteresis was extended to previous actions up to eight trials back. For the repetition-bias group, the probability of repeating a previous action remained elevated above chance at lags beyond 1-back (p < 0.05). For the alternation-bias group, this probability instead returned from a 1-back alternation effect (p < 0.05) to chance at longer lags (p > 0.05). Only the models with exponential hysteresis could properly match the shapes of the action-history curves, and the addition of constant bias made the correspondence even more precise. With regard to mimicry, an upward shift in the curve from constant bias in the 2C model superficially resembles the autocorrelational signature of repetition across multiple trials with exponential hysteresis. The nine plots per row each have an identical x-axis despite omission of tick labels from every other plot for readability. Error bars indicate standard errors of the means.
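An action-history curve of this kind can be computed as the probability that the current action matches the action n trials back, pooled over trials and ignoring state. The sketch also makes the mimicry argument concrete: if choices were independent with a fixed P(right) = p (a simplifying assumption that ignores state and learning), every lag would show a repeat probability of p² + (1 − p)², which is at least 0.5. A constant bias therefore lifts the whole curve uniformly rather than producing a decaying profile. The function names are hypothetical.

```python
def action_history_curve(actions, max_lag=8):
    """P(a_t == a_{t-n}) for n = 1..max_lag, pooled over trials and ignoring state."""
    if len(actions) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    curve = []
    for lag in range(1, max_lag + 1):
        matches = [actions[t] == actions[t - lag] for t in range(lag, len(actions))]
        curve.append(sum(matches) / len(matches))
    return curve


def repeat_probability_from_bias(p_right):
    """Lag-independent repeat rate for independent choices with a fixed P(right)."""
    return p_right ** 2 + (1.0 - p_right) ** 2
```

For example, P(right) = 0.7 alone implies a flat repeat rate of 0.58 at every lag, whereas genuine exponential hysteresis predicts a curve that departs most from chance at 1-back and decays toward it.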
Fig 13.
Hysteresis parameters with exponential or nonparametric models.
The fitted parameters of the GRL model with either exponential or 4-back hysteresis are plotted as repetition weights (or alternation weights if negative): simply βn for n-back models or the corresponding weights $\beta_1 \lambda_H^{n-1}$ of the exponential function. Action-specific effects are better illuminated here by explicitly factoring out the effects of RL and GRL within the comprehensive model. There is close correspondence between these parametric (2E1 and 2CE1) and nonparametric (2N4 and 2CN4) implementations of hysteresis for at least the first two trials back. A scope extending beyond 1-back demands more than one free parameter, and a proper hysteresis trace with exponential decay yields an even better fit than a 2-back scope because of subtle effects from 3-back and beyond. As further evidence of interactions among parameters, omission of constant bias (2E1 or 2N4) consistently inflated the modeled repetition weights, which were forced to mimic the necessary third parameter for constant bias. Altogether, the CE1 adjunct is essential. Error bars indicate standard errors of the means.
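The exponential weights can be related to a running hysteresis trace. In the sketch below, a trace that decays by λH on each trial and adds 1 for the chosen action reproduces exactly the sum obtained by applying explicit lag weights `beta_1 * lambda_h ** (n - 1)` to the action history. This is why the exponential form needs only two parameters however far back its scope extends, whereas a nonparametric n-back model supplies one free weight per lag. The indicator coding of actions (1 if the candidate action was chosen, else 0) is an assumption, and the function names are hypothetical.

```python
def exponential_weights(beta_1, lambda_h, max_lag):
    """Repetition weights beta_1 * lambda_h**(n - 1) for n = 1..max_lag."""
    return [beta_1 * lambda_h ** (n - 1) for n in range(1, max_lag + 1)]


def hysteresis_from_trace(actions, beta_1, lambda_h, candidate):
    """Recursive form: decay the trace each trial, add 1 when the candidate was chosen."""
    trace = 0.0
    for a in actions:
        trace = lambda_h * trace + (1.0 if a == candidate else 0.0)
    return beta_1 * trace


def hysteresis_from_weights(actions, weights, candidate):
    """Explicit form: sum the lag-n weight wherever the action n trials back matches."""
    total = 0.0
    for n, w in enumerate(weights, start=1):
        if n <= len(actions) and actions[-n] == candidate:
            total += w
    return total
```

Passing a list of independently fitted weights (as in 2N4) to `hysteresis_from_weights` instead of the exponential list gives the nonparametric counterpart, which is how the two families can be compared lag by lag.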
Fig 14.
Alternatives to state-independent action hysteresis.
Compare to Fig 12. To falsify alternative hypotheses concerning the origins of the apparent effects of state-independent action hysteresis $H_t(a)$ (“2CE1”), the model comparison was first extended to test substitution of state-dependent action hysteresis $H_t(s_t, a)$ (“sE1+2C”), state-independent action value $Q_t(a)$ (“Qa+2C”), confirmation bias in learning with the constraint αN < αP (“cLR+2C”), or asymmetric learning rates αN ≠ αP with no constraint on their ordering (“LR+2C”). As expected, none of these alternatives could generate the original action-history curves that only state-independent action hysteresis produced.
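The two learning-rate alternatives can be sketched as a delta rule with separate rates for positive and negative RPEs. With `confirmatory=True`, the sketch enforces the cLR constraint αN < αP; otherwise the two rates are unconstrained, as in the LR variant. The function and argument names are hypothetical. The example shows one reason these models are plausible competitors: when αP > αN, a chosen action rewarded on exactly half of the trials settles at a value above its expected reward of 0.5, which could favor repeating it.

```python
def asymmetric_update(q, reward, alpha_pos, alpha_neg, confirmatory=False):
    """Delta-rule update with separate rates for positive and negative RPEs.

    confirmatory=True enforces the cLR constraint alpha_neg < alpha_pos;
    otherwise the two rates are unconstrained (the LR variant).
    """
    if confirmatory and not alpha_neg < alpha_pos:
        raise ValueError("confirmation-bias variant requires alpha_neg < alpha_pos")
    delta = reward - q
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta
```

For example, alternating rewards of 1 and 0 with αP = 0.4 and αN = 0.1 drive the value after each unrewarded trial toward 0.36/0.46 ≈ 0.78 rather than 0.5.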
Table 4.
Additional models were constructed by substituting or adding alternative features that might be expected to interact with effects of state-independent action hysteresis. Each alternative was fixed within a new subset of eight models building up to constant bias and exponential state-independent hysteresis (“-CE1”). Variations on the substitution of state-dependent hysteresis in particular were also tested with up to two parameters. Listed for each participant group are the best-fitting models (per AICc score) within each subset of eight models as well as within the full set of 44 models. Although there is some quantitative evidence suggesting state-dependent hysteresis in addition to state-independent hysteresis, the lack of qualitative validation through falsification leaves this quantitative result inconclusive. Hence the 2CE1 model remains the preferred final model. “df” stands for degrees of freedom. See also Figs S-W and Tables Q-U in S1 Text.