When Does Model-Based Control Pay Off?
Fig 10
The influence of the type of reward distribution (points vs probabilities) on choice accuracy.
(A) We ran simulations of RL agents on two different two-armed bandit tasks. In one task, the reward distributions specify the probability of reward associated with each action. The other task does not include binomial noise; instead, each action pays off a reward directly proportional to its value in the reward distribution. (B) Agents show greater accuracy in choosing the highest-value action on the task where the two-armed bandit pays off points instead of affording a probability of winning a reward, especially when both the inverse temperature and the learning rate are high. (C) The Q-values of each action show stronger correlations with their objective reward values in the task where the two-armed bandit pays off points instead of affording a probability of winning a reward.
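As a rough illustration of the simulation setup described in panel (A), the following Python sketch runs a simple Q-learning agent with a softmax (inverse-temperature) choice rule on both task variants. The arm values, learning rate, inverse temperature, and trial counts here are illustrative assumptions, not the parameters used in the reported simulations.

```python
import numpy as np

def run_bandit(task, n_trials=200, alpha=0.3, beta=5.0, rng=None):
    """Simulate one Q-learning agent on a two-armed bandit.

    task: 'probability' -> binary reward drawn with p = arm value
          'points'      -> reward equals the arm value directly
    alpha: learning rate; beta: softmax inverse temperature.
    Returns choice accuracy, final Q-values, and the true arm values.
    """
    rng = rng or np.random.default_rng()
    values = rng.uniform(0.2, 0.8, size=2)   # latent arm values (assumed range)
    q = np.zeros(2)
    correct = 0
    for _ in range(n_trials):
        # Softmax choice with inverse temperature beta
        p = np.exp(beta * q - np.max(beta * q))
        p /= p.sum()
        choice = rng.choice(2, p=p)
        correct += choice == np.argmax(values)
        # Reward generation differs between the two task variants
        if task == 'probability':
            reward = float(rng.random() < values[choice])  # binomial noise
        else:
            reward = values[choice]                         # direct points
        # Delta-rule (Q-learning) update
        q[choice] += alpha * (reward - q[choice])
    return correct / n_trials, q, values

# Example: compare choice accuracy across the two task variants
rng = np.random.default_rng(0)
for task in ('probability', 'points'):
    acc = np.mean([run_bandit(task, rng=rng)[0] for _ in range(500)])
    print(task, round(acc, 3))
```

Because the points task removes binomial noise from the feedback, the agent's Q-values track the true arm values more faithfully, which is the intuition behind the higher accuracy and stronger Q-value correlations reported in panels (B) and (C).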