Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

doi:10.1371/journal.pcbi.1000131


Figure 6

Predictions of the context-sensitive model in choice-schedule tasks.

(A) Description of the choice-schedule task with 2-trial schedules. At decision node N (of value V_N) the agent can choose either the immediate-reward schedule A (which gives a larger reward, R, sooner and a smaller reward, r, later) or the delayed-reward schedule B (smaller reward sooner and larger reward later). The same value σV_N is carried over to whichever schedule is chosen, but the subsequent trials in each schedule modify the value of A or B differently (curved grey arrows, shown for schedule A only; see the text for details). (B) Mean frequency of choosing the immediate-reward schedule (schedule A) in the task of panel 6A, predicted by the model as a function of σ. Shown are means (dots) and standard deviations (error bars) over 20 simulations with β = 3, γ = 0.55, R = 1 and r = 0.5. Dashed line: theoretical prediction according to the equation with V_sch.A−V_sch.B = (1−γσ)⁻¹(1−γ)(R−r). A positive value of σ enhances the existing preference for the immediate-reward schedule. (C) Choice-schedule task between two 3-trial schedules, a generalization of the task in panel 6A. (D) Mean frequency of choosing the immediate-reward schedule (schedule A) in the task of panel 6C, predicted by the model as a function of σ. Shown are means (dots) and standard deviations (error bars) over 20 simulations with the same parameters as in 6B. Dashed line: theoretical prediction according to the equation with V_sch.A−V_sch.B = (1−2γσ)⁻¹(1−γ−γ²−γσ)(R−r). Dotted line: indifference point P_sel(sch.A) = 0.5, i.e., the situation where the agent has no preference for either schedule. For σ larger than ≈0.268 (the value at which the factor 1−γ−γ²−γσ changes sign for γ = 0.55), choice preference is reversed and the delayed-reward schedule is chosen more often.

doi: https://doi.org/10.1371/journal.pcbi.1000131.g006
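
The two value differences quoted in the Figure 6 caption can be evaluated numerically with a minimal sketch such as the one below. The function names are hypothetical and chosen here for illustration only. The caption does not reproduce the equation that maps V_sch.A−V_sch.B onto a choice probability, so the logistic (softmax) rule with inverse temperature β used in `p_select_a` is an assumption, not necessarily the paper's formula. The indifference point follows directly from setting the numerator factor 1−γ−γ²−γσ to zero.

```python
import math


def delta_v_two_trial(sigma, gamma=0.55, R=1.0, r=0.5):
    """V_sch.A - V_sch.B for the 2-trial task (caption, panel B)."""
    return (1 - gamma) * (R - r) / (1 - gamma * sigma)


def delta_v_three_trial(sigma, gamma=0.55, R=1.0, r=0.5):
    """V_sch.A - V_sch.B for the 3-trial task (caption, panel D)."""
    return (1 - gamma - gamma**2 - gamma * sigma) * (R - r) / (1 - 2 * gamma * sigma)


def p_select_a(delta_v, beta=3.0):
    """Probability of choosing schedule A.

    ASSUMPTION: a logistic (softmax) choice rule with inverse
    temperature beta; the caption only refers to "the equation".
    """
    return 1.0 / (1.0 + math.exp(-beta * delta_v))


def indifference_sigma(gamma=0.55):
    """Sigma at which the 3-trial value difference is zero.

    Solves 1 - gamma - gamma**2 - gamma * sigma = 0 for sigma.
    """
    return (1 - gamma - gamma**2) / gamma


if __name__ == "__main__":
    # Tabulate both theoretical curves over a small range of sigma.
    for sigma in (0.0, 0.1, 0.2, 0.3, 0.4):
        p2 = p_select_a(delta_v_two_trial(sigma))
        p3 = p_select_a(delta_v_three_trial(sigma))
        print(f"sigma={sigma:.1f}  P_A(2-trial)={p2:.3f}  P_A(3-trial)={p3:.3f}")
    print(f"indifference sigma (3-trial): {indifference_sigma():.3f}")
```

With γ = 0.55 the indifference point evaluates to about 0.268, matching the threshold stated in the caption. Note that this threshold depends only on the sign of the value difference, so it holds for any choice rule that gives P_sel(sch.A) = 0.5 at zero difference, not just the logistic rule assumed here.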