Minimal cross-trial generalization in learning the representation of an odor-guided choice task

Fig 2

State representations of RL models.

(A) The four-state model: free-choice trials and correct forced-choice trials share the same “Left” and “Right” states; “Right-NoRwd” and “Left-NoRwd” are the corresponding states for incorrect forced-choice trials. This is the true structure of the task as designed by the experimenters, since the same reward was available in forced-choice and free-choice trials whenever a correct choice was made. (B) The six-state model: each of the three odors leads to one of two states for left and right choices, with no generalization across odors. (C) The hybrid-value model: this model uses both the four-state and six-state representations (10 states in total), with state values combined using weights w4 and (1 − w4) (illustrated as vertical boxes). (D) The hybrid-learning model: the same state representation and learning rule (green arrows, with learning rate η) as in the six-state model, with additional generalization (orange arrows, with generalization rate ηg) between states representing valid forced choices and free choices. For simplicity, only half of the learning and generalization updates are shown here (those triggered when reward is delivered in the Left-Forced and Right-Free states), each representing generalization from forced-choice states to free-choice states or vice versa; the same rules apply to the Right-Forced and Left-Free states. Boxes in white and gray represent rewarded and unrewarded states, respectively.
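The hybrid-learning rule in panel (D) can be sketched as a standard delta-rule value update at the visited state (rate η) plus a weaker update propagated to the linked state (rate ηg). The function name, state labels, and exact form of the generalization update below are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of the hybrid-learning update in panel (D).
# State names and the update form are assumptions for illustration.

def hybrid_learning_update(values, state, reward, eta=0.1, eta_g=0.05,
                           generalization_map=None):
    """Update the visited state's value with learning rate eta, then
    propagate a smaller update (rate eta_g) to its linked states."""
    generalization_map = generalization_map or {}
    values[state] += eta * (reward - values[state])   # standard delta-rule step
    for linked in generalization_map.get(state, []):  # e.g. forced -> free choice
        values[linked] += eta_g * (reward - values[linked])
    return values

# Example: reward in "Left-Forced" also nudges the linked "Left-Free" state.
V = {"Left-Forced": 0.0, "Left-Free": 0.0, "Right-Forced": 0.0, "Right-Free": 0.0}
links = {"Left-Forced": ["Left-Free"], "Right-Free": ["Right-Forced"]}
V = hybrid_learning_update(V, "Left-Forced", reward=1.0,
                           generalization_map=links)
# V["Left-Forced"] -> 0.1, V["Left-Free"] -> 0.05
```

With ηg = 0 this reduces to the six-state model (no cross-state generalization), which is why the caption describes the hybrid-learning model as the six-state model plus the orange generalization arrows.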

doi: https://doi.org/10.1371/journal.pcbi.1009897.g002