Joint modeling of choices and reaction times based on Bayesian contextual behavioral control

doi:10.1371/journal.pcbi.1012228


Fig 2

Choice and RT generation in the BCC and the RL+DDM models.

(A) The BCC model takes experienced states and rewards as inputs. Through learning, these update the underlying MDP as well as the model's parameters and priors. To generate a choice, a policy is sampled from the prior over policies P = p(π), and the expected rewards under this policy are evaluated through the likelihood L = p(R|π). This process is repeated until a stopping criterion is met, that is, until the agent is sufficiently certain that the samples approximate the true posterior q(π) ≈ p(π|R). The action prescribed by the most recently sampled policy is then selected and executed. Variability in RTs and choices is a direct consequence of sampling policies to compute the predicted rewards, and is therefore an intrinsic property of the planning process in the BCC model.

(B) A typical RL+DDM model of value-based decision making and RTs passes states and rewards to an RL model. Through learning, these update the underlying model and yield reward-based action values (Q-values), on which action selection is based. The action values are fed into a DDM, or more generally an evidence accumulation model, where they set DDM parameters such as the drift rate. At each sampling step, Gaussian white noise is added, which introduces variability into the sampled choices and RTs. When the accumulated evidence reaches a threshold, the corresponding choice is selected and executed.

doi: https://doi.org/10.1371/journal.pcbi.1012228.g002
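The two panels differ mainly in where choice and RT variability comes from. The Python sketch below illustrates both generative processes under stated assumptions; it is not the authors' implementation. For panel A, it assumes that policies proposed from the prior p(π) are accepted with probability min(1, L(π′)/L(π)), i.e. independence Metropolis-Hastings, which is one standard way to draw samples approximating p(π|R). The stopping rule (the empirical sample distribution changing by less than `tol` over the last `window` samples) is a hypothetical stand-in for the paper's criterion, and the number of samples serves as an RT proxy. For panel B, it assumes a delta-rule Q-value update, a drift rate proportional to the Q-value difference between two actions, and an Euler-Maruyama simulation of the DDM. All function names and parameter values (`bcc_choice`, `q_update`, `ddm_choice`, `v_scale`, `threshold`, `t_nd`, and so on) are hypothetical.

```python
import math
import random


# Panel A (hypothetical sketch): BCC-style planning by sampling policies.
def bcc_choice(prior, likelihood, tol=0.02, window=20, max_samples=2000,
               rng=random):
    """Sample policies until the empirical posterior estimate stabilizes.

    prior[i] = p(pi_i) and likelihood[i] = p(R | pi_i) (assumed > 0).
    Proposals are drawn from the prior and accepted with probability
    min(1, L(new) / L(current)), i.e. independence Metropolis-Hastings,
    so the chain targets p(pi | R), proportional to p(pi) * p(R | pi).
    ASSUMPTION: stop once the sample histogram changes by less than `tol`
    (L1 distance) over the last `window` samples.
    Returns (index of the most recently sampled policy, number of samples).
    """
    policies = list(range(len(prior)))
    current = rng.choices(policies, weights=prior)[0]
    counts = [0] * len(prior)
    history = []  # empirical estimate of q(pi) after each sample
    n = 0
    for n in range(1, max_samples + 1):
        proposal = rng.choices(policies, weights=prior)[0]
        if rng.random() < min(1.0, likelihood[proposal] / likelihood[current]):
            current = proposal
        counts[current] += 1
        history.append([c / n for c in counts])
        if n > window:
            old = history[-window - 1]
            if sum(abs(a - b) for a, b in zip(history[-1], old)) < tol:
                break
    return current, n


# Panel B (hypothetical sketch): RL model feeding a drift diffusion model.
def q_update(q, action, reward, alpha=0.1):
    """Delta-rule update of the chosen action's value (learning step)."""
    q = list(q)
    q[action] += alpha * (reward - q[action])
    return q


def ddm_choice(q, v_scale=2.0, threshold=1.0, sigma=1.0, dt=0.001,
               t_nd=0.3, max_t=10.0, rng=random):
    """Two-action DDM whose drift rate is set by the Q-value difference.

    Gaussian white noise is added at every Euler-Maruyama step. Returns
    (action, RT in seconds): action 0 for the upper bound, 1 for the lower;
    if max_t is reached first, the sign of the evidence decides.
    """
    drift = v_scale * (q[0] - q[1])
    x, t = 0.0, 0.0
    while abs(x) < threshold and t < max_t:
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    action = 0 if x > 0 else 1
    return action, t + t_nd  # add non-decision time


if __name__ == "__main__":
    rng = random.Random(0)
    prior = [0.25] * 4  # uniform prior over four policies
    print("BCC, peaked likelihood:", bcc_choice(prior, [0.9, 0.1, 0.1, 0.1], rng=rng))
    print("BCC, flat likelihood:  ", bcc_choice(prior, [0.5] * 4, rng=rng))
    q = [0.0, 0.0]
    for _ in range(50):  # action 0 is always rewarded in this toy example
        q = q_update(q, 0, 1.0)
    print("RL+DDM:", q, ddm_choice(q, rng=rng))
```

In this sketch, a flat likelihood (all policies predicting similar rewards) tends to need more samples before the estimate stabilizes than a peaked one, so RT variability emerges from the planning process itself. In the RL+DDM sketch, by contrast, variability comes only from the noise term added at each accumulation step.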