Minimal cross-trial generalization in learning the representation of an odor-guided choice task

Fig 2

State representations of RL models.

(A) The four-state model: free-choice trials and correct forced-choice trials share the same “Left” and “Right” states; “Right-NoRwd” and “Left-NoRwd” are the corresponding states for incorrect forced-choice trials. This is the true structure of the task as designed by the experimenters, since the same reward was available in forced-choice and free-choice trials whenever a correct choice was made. (B) The six-state model: each of the three odors leads to one of two states for left and right choices, with no generalization across odors. (C) The hybrid-value model: this model uses both the four-state and six-state representations (10 states in total), with state values combined using weights w4 and (1 − w4) (illustrated as vertical boxes). (D) The hybrid-learning model: the same state representation and learning rule (green arrows, with learning rate η) as in the six-state model, with additional generalization (orange arrows, with generalization rate ηg) between states representing valid forced choices and free choices. For simplicity, only half of the learning and generalization updates are shown here (those triggered when reward is delivered in the Left-Forced and Right-Free states), each representing generalization from forced-choice states to free-choice states or vice versa; the same rules apply to the Right-Forced and Left-Free states. Boxes in white and gray represent rewarded and unrewarded states, respectively.
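The hybrid-learning rule in panel (D) can be sketched as a standard delta-rule value update at the visited state (rate η) plus a weaker update propagated to the linked state (rate ηg). The function name, state labels, and exact form of the generalization update below are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of the hybrid-learning update in panel (D).
# State names and the update form are assumptions for illustration.

def hybrid_learning_update(values, state, reward, eta=0.1, eta_g=0.05,
                           generalization_map=None):
    """Update the visited state's value with learning rate eta, then
    propagate a smaller update (rate eta_g) to its linked states."""
    generalization_map = generalization_map or {}
    values[state] += eta * (reward - values[state])   # standard delta-rule step
    for linked in generalization_map.get(state, []):  # e.g. forced -> free choice
        values[linked] += eta_g * (reward - values[linked])
    return values

# Example: reward in "Left-Forced" also nudges the linked "Left-Free" state.
V = {"Left-Forced": 0.0, "Left-Free": 0.0, "Right-Forced": 0.0, "Right-Free": 0.0}
links = {"Left-Forced": ["Left-Free"], "Right-Free": ["Right-Forced"]}
V = hybrid_learning_update(V, "Left-Forced", reward=1.0,
                           generalization_map=links)
# V["Left-Forced"] -> 0.1, V["Left-Free"] -> 0.05
```

With ηg = 0 this reduces to the six-state model (no cross-state generalization), which is why the caption describes the hybrid-learning model as the six-state model plus the orange generalization arrows.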

doi: https://doi.org/10.1371/journal.pcbi.1009897.g002