Reward Optimization in the Primate Brain: A Probabilistic Model of Decision Making under Uncertainty
(a) Optimal value as a joint function of and the number of POMDP steps . (b) Optimal policy as a function of and the number of POMDP steps . The boundaries and divide the belief space into three areas: (red), (green), and (blue), each of which represents belief states whose optimal actions are and respectively. Model parameters: , , and . (c) Left: The rightward decision boundary for different values of . Right: The half time of for different values of , where .
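The qualitative structure this caption describes (a value function and a policy defined over belief and step count, with boundaries that split the belief space into three action regions) can be reproduced in a toy setting. The following is a minimal sketch and not the model used in the paper. It assumes a binary hidden state, a scalar belief, binary observations, and a hard deadline. All names and parameter values (`solve_belief_dp`, `rightward_boundary`, `q`, `sample_cost`, the reward values) are hypothetical and chosen only for illustration.

```python
import numpy as np

SAMPLE, RIGHT, LEFT = 0, 1, 2  # hypothetical action labels


def solve_belief_dp(horizon=30, n_grid=201, q=0.6,
                    r_correct=1.0, r_wrong=-1.0, sample_cost=0.02):
    """Backward induction on a discretized belief b = P(hidden state is 'right').

    Assumed toy model (not from the paper): each SAMPLE action costs
    `sample_cost` and yields a binary observation that matches the hidden
    state with probability q. At step `horizon` the agent must commit.
    """
    b = np.linspace(0.0, 1.0, n_grid)

    # Expected immediate reward of committing to each direction.
    q_right = b * r_correct + (1.0 - b) * r_wrong
    q_left = (1.0 - b) * r_correct + b * r_wrong
    stop_value = np.maximum(q_right, q_left)

    # Observation probabilities and Bayes-updated beliefs after sampling.
    p1 = b * q + (1.0 - b) * (1.0 - q)   # P(observation = 1 | belief)
    p0 = 1.0 - p1
    b_after_1 = b * q / p1
    b_after_0 = b * (1.0 - q) / p0

    values = np.empty((horizon + 1, n_grid))
    policy = np.empty((horizon + 1, n_grid), dtype=int)

    # At the deadline only the two commit actions are available.
    values[horizon] = stop_value
    policy[horizon] = np.where(q_right >= q_left, RIGHT, LEFT)

    for t in range(horizon - 1, -1, -1):
        nxt = values[t + 1]
        # Value of sampling: pay the cost, then continue from the updated
        # belief (linear interpolation on the belief grid).
        q_sample = (-sample_cost
                    + p1 * np.interp(b_after_1, b, nxt)
                    + p0 * np.interp(b_after_0, b, nxt))
        q_all = np.stack([q_sample, q_right, q_left])  # rows follow the labels
        values[t] = q_all.max(axis=0)
        policy[t] = q_all.argmax(axis=0)

    return b, values, policy


def rightward_boundary(b, policy):
    """Smallest belief at which committing to RIGHT is optimal, per step."""
    return np.array([b[row == RIGHT].min() for row in policy])


if __name__ == "__main__":
    b, values, policy = solve_belief_dp()
    boundary = rightward_boundary(b, policy)
    print("rightward boundary at first and last step:", boundary[0], boundary[-1])
```

Displaying `policy` as an image, with belief on one axis and step on the other, gives three regions analogous to those in panel (b). In this sketch the rightward boundary moves toward the indifference point as the deadline approaches, because the remaining opportunity to gather evidence shrinks.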