Reward Optimization in the Primate Brain: A Probabilistic Model of Decision Making under Uncertainty
The input to the model is a random-dot motion sequence. Neurons in MT, each with its own tuning curve, emit spikes at every time step; these spikes constitute the observation in the POMDP model. The animal maintains the belief state by computing a posterior over the hidden state from these observations (the belief state can be parameterized by two quantities; see text). The optimal policy is implemented by selecting a rightward eye movement when the belief state satisfies the rightward decision criterion, which can be written in two equivalent forms, and likewise for a leftward eye movement.
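The pipeline described in this caption (spikes as observations, a belief state updated from them, and a policy that reads out the belief) can be illustrated with a deliberately simplified sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the hidden state is reduced to a binary motion direction, spikes are Poisson, the tuning curves have a von Mises-like shape, the decision criterion is a fixed threshold on the belief, and all names and numbers (`tuning_curve`, `update_log_odds`, `policy`, `run_trial`, the gains, the 0.95 threshold) are hypothetical.

```python
import math
import random

# Illustrative assumptions only; none of these values come from the paper.
DIRECTIONS = {"right": 0.0, "left": math.pi}          # assumed binary hidden state
PREFERRED = [2 * math.pi * i / 8 for i in range(8)]   # assumed preferred directions


def tuning_curve(preferred, direction, base=0.5, gain=2.0, kappa=2.0):
    """Expected spike count per time step (assumed von Mises-like shape)."""
    return base + gain * math.exp(kappa * (math.cos(direction - preferred) - 1.0))


def poisson_sample(lam, rng):
    """Draw a Poisson count with Knuth's method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def update_log_odds(log_odds, spikes):
    """Add one time step's Poisson log-likelihood ratio (right vs. left)."""
    for pref, r in zip(PREFERRED, spikes):
        f_right = tuning_curve(pref, DIRECTIONS["right"])
        f_left = tuning_curve(pref, DIRECTIONS["left"])
        log_odds += r * math.log(f_right / f_left) - (f_right - f_left)
    return log_odds


def policy(belief_right, threshold=0.95):
    """Assumed fixed-threshold rule on the belief; otherwise keep observing."""
    if belief_right >= threshold:
        return "right"
    if belief_right <= 1.0 - threshold:
        return "left"
    return "wait"


def run_trial(true_direction="right", max_steps=200, seed=0):
    """Simulate spikes, update the belief each step, and stop at a decision."""
    rng = random.Random(seed)
    log_odds = 0.0  # flat prior over the two directions
    belief_right = 0.5
    for t in range(1, max_steps + 1):
        spikes = [poisson_sample(tuning_curve(p, DIRECTIONS[true_direction]), rng)
                  for p in PREFERRED]
        log_odds = update_log_odds(log_odds, spikes)
        belief_right = 1.0 / (1.0 + math.exp(-log_odds))
        action = policy(belief_right)
        if action != "wait":
            return action, t, belief_right
    return "wait", max_steps, belief_right
```

Calling `run_trial("left", seed=1)` returns the chosen action, the number of time steps observed, and the final belief that motion is rightward. A fixed threshold is used only to keep the sketch short; the decision criterion in the model itself is the one defined in the text.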