Reward Optimization in the Primate Brain: A Probabilistic Model of Decision Making under Uncertainty
Left: The graphical model representing the probabilistic relationships among the model's random variables. In the POMDP model, the hidden state corresponds to coherence and direction jointly, and the observation corresponds to the MT response. The relations between these variables are summarized in Table 1. Right: To solve a POMDP problem, the animal maintains a belief, which is a posterior probability distribution over the hidden states of the world given the observations. At the current belief state, an action is selected according to the learned policy, which maps belief states to actions.
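The belief-and-policy loop described in the caption can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the discretized coherence levels, the Gaussian observation model centered on signed coherence, the noise level, and the fixed-threshold rule that stands in for the learned policy. The names `update_belief` and `policy` are hypothetical.

```python
import numpy as np

# Hypothetical discretization: each hidden state is a (coherence, direction) pair.
COHERENCES = np.array([0.0, 0.1, 0.3, 0.5])
DIRECTIONS = np.array([-1, 1])  # -1 = left, +1 = right
STATES = [(c, d) for c in COHERENCES for d in DIRECTIONS]
SIGNED = np.array([c * d for c, d in STATES])  # assumed mean of the observation
DIRS = np.array([d for _, d in STATES])


def update_belief(belief, obs, noise_sd=1.0):
    """One Bayesian update: posterior is proportional to likelihood times prior.

    Assumes the hidden state is fixed within a trial and that the
    observation is Gaussian around the signed coherence (an assumption).
    """
    likelihood = np.exp(-0.5 * ((obs - SIGNED) / noise_sd) ** 2)
    posterior = belief * likelihood
    return posterior / posterior.sum()


def policy(belief, threshold=0.9):
    """Map a belief state to an action.

    A fixed-threshold rule used as a stand-in for a learned policy.
    """
    p_right = belief[DIRS == 1].sum()
    if p_right >= threshold:
        return "right"
    if 1.0 - p_right >= threshold:
        return "left"
    return "sample"  # keep observing


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    belief = np.full(len(STATES), 1.0 / len(STATES))  # uniform prior
    true_signed = 0.3  # hypothetical trial: coherence 0.3, rightward motion
    action, steps = policy(belief), 0
    while action == "sample" and steps < 1000:
        obs = true_signed + rng.normal(0.0, 1.0)
        belief = update_belief(belief, obs)
        action, steps = policy(belief), steps + 1
    print(action, steps)
```

In this sketch the belief is a vector over all coherence and direction pairs, so the probability of rightward motion is obtained by marginalizing over coherence. A learned policy would replace the fixed threshold with a mapping optimized for expected reward.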