Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts

doi:10.1371/journal.pcbi.1011950


Fig 2

Model comparison: 3-T Face/House version.

The ordering of the models here corresponds to the ordering in Table 2 and Table A in S1 Text. As before, each model name begins with “X-”, “0-”, “1-”, or “2-” for no learning, basic RL, 1-parameter GRL, or 2-parameter GRL, respectively. A subsequent “C” denotes constant bias, and “N” or “E” denotes n-back or exponential hysteresis, respectively. Within each hysteresis category, each successive model adds one step back to the n-back horizon (e.g., the rightmost models 2CE1, 2CE2, and 2CE3). (a) Shown for each model is the average goodness of fit relative to the null chance model (“X”) with (light bars) and without (light and dark bars combined) a penalty for model complexity according to the corrected Akaike information criterion (AICc). A more positive residual corresponds to a superior fit. With the addition of action bias and hysteresis parameters alongside GRL, Poor learners (blue bars) and Nonlearners (red bars) showed the greatest gains in model performance, but Good learners (green bars) benefited significantly as well. The best-performing models (written above each plot) featured not only GRL for the actual learners but also constant bias and exponential hysteresis for all groups (FH-G: 2CE1, FH-P: 1CE3, FH-N: XCE2; see Fig 3 for CM-G: 2CE1, CM-P: 1CE2). For the most essential group, the Good learners, the originally preferred 2CE1 model was validated as preferable to both simpler and more complex alternatives in its specification of bias and hysteresis. (b) Counts of the participants best fitted by each model according to the AICc are plotted separately for Good learners, Poor learners, and Nonlearners. At the individual level, 87% of participants across both data sets exhibited significant effects of some kind of action bias or hysteresis.
The 7-parameter 2CE1 model, which complements 2-parameter GRL with constant bias and 2-parameter exponential hysteresis, accommodates heterogeneity across individuals in both learning and action-specific effects: 64% of participants were best fit by 2CE1 or one of its nested models rather than by other n-back or n-back-plus-exponential models.
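As an illustration of the complexity penalty named in panel (a), the following minimal Python sketch computes the standard AICc from a maximized log-likelihood and compares a fitted model against a parameter-free chance model. The function names, the example log-likelihood, the trial count, and the number of available actions are hypothetical values chosen for illustration and are not taken from the paper. The exact scale of the plotted residual is also not specified in this caption, so the difference in AICc shown here is only one plausible reading.

```python
import math


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion (lower is better).

    log_likelihood: maximized log-likelihood of the model
    k: number of free parameters
    n: number of observations (here, choice trials)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def chance_log_likelihood(n_trials: int, n_actions: int) -> float:
    """Log-likelihood of a null model that picks every action uniformly."""
    return n_trials * math.log(1.0 / n_actions)


# Hypothetical numbers for illustration only (not data from the paper).
n_trials = 200        # assumed number of trials for one participant
n_actions = 2         # assumed number of available actions per trial
model_ll = -100.0     # assumed maximized log-likelihood of a fitted model
model_k = 7           # free parameters, e.g. the 7-parameter 2CE1 model

null_score = aicc(chance_log_likelihood(n_trials, n_actions), 0, n_trials)
model_score = aicc(model_ll, model_k, n_trials)

# Positive values mean the fitted model beats chance after the penalty.
improvement = null_score - model_score
print(f"AICc improvement over chance: {improvement:.2f}")
```

Because the correction term grows with the number of parameters relative to the number of trials, a model such as 2CE3 has to improve the likelihood enough to offset its extra hysteresis terms before it can outrank 2CE1.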

doi: https://doi.org/10.1371/journal.pcbi.1011950.g002