Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

doi:10.1371/journal.pcbi.1000131

Figure 1

Behavioral paradigm used in the reward schedule task.

(A) Color discrimination task. Each trial begins when the monkey touches a bar. A visual cue (a horizontal black bar) appears immediately. Four hundred milliseconds later, a red dot (WAIT signal) appears in the center of the cue. After a random interval of 500–1500 ms, the dot turns green (GO signal). The monkey must release the touch-bar between 200 and 800 ms after the green dot appears; the dot then turns blue (OK signal), and a drop of water is delivered 250–350 ms later. If the monkey fails to release the bar within the 200–800 ms window after the GO signal, an error is registered and no water is delivered. An anticipatory bar release (<200 ms) is also counted as an error. (Red, green, and blue dots are enlarged for illustration.)

(B) 2-trial schedule. Each trial is a color discrimination task as in panel A, with cues of different brightness for different trials (see Materials and Methods for details). In the 2-trial schedule, completion of the first trial is not rewarded and is followed by the second trial after an inter-trial interval (ITI) of 1–2 s. An error at any point during a trial aborts the trial, which is then restarted after the ITI. The same rule applies to schedules of any length, and schedules of different lengths are randomly interleaved. Note that after an error, the schedule resumes from the current trial, not from the first trial of the schedule.
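The trial and schedule rules in the Figure 1 caption can be sketched as a small simulation. This is a minimal illustration, not the authors' task-control code: the function names (`classify_release`, `sample_timeline`, `run_schedule`) are hypothetical, the 200 and 800 ms window bounds are assumed inclusive, the random WAIT interval and reward delay are assumed to be uniformly distributed, and a missing bar release is represented as `None`.

```python
import random

# Timing parameters quoted in the Figure 1 caption (all in ms).
CUE_TO_WAIT_MS = 400
WAIT_MIN_MS, WAIT_MAX_MS = 500, 1500          # random WAIT -> GO interval
RELEASE_MIN_MS, RELEASE_MAX_MS = 200, 800     # valid release window after GO
REWARD_DELAY_MIN_MS, REWARD_DELAY_MAX_MS = 250, 350  # OK signal -> water


def classify_release(rt_ms):
    """Classify a bar release by its latency (ms) after the GO signal.

    Assumption: both window bounds are inclusive; None means no release.
    """
    if rt_ms is None:
        return "error_no_release"
    if rt_ms < RELEASE_MIN_MS:
        return "error_anticipatory"
    if rt_ms > RELEASE_MAX_MS:
        return "error_late"
    return "correct"


def sample_timeline(rng):
    """Sample stimulus onset times (ms after bar touch) for one trial.

    Assumption: the WAIT -> GO interval and the reward delay are uniform.
    The reward delay is measured from the OK signal, not from bar touch.
    """
    wait_onset = CUE_TO_WAIT_MS  # the cue itself appears at t = 0
    go_onset = wait_onset + rng.uniform(WAIT_MIN_MS, WAIT_MAX_MS)
    reward_delay = rng.uniform(REWARD_DELAY_MIN_MS, REWARD_DELAY_MAX_MS)
    return {"cue": 0, "wait": wait_onset, "go": go_onset,
            "reward_delay": reward_delay}


def run_schedule(length, reaction_times):
    """Step through one schedule of `length` trials.

    `reaction_times` is an iterator yielding one release latency per
    attempted trial. Returns (trial_number, outcome, rewarded) per attempt.
    """
    trial = 1
    attempts = []
    while trial <= length:
        outcome = classify_release(next(reaction_times))
        # Only correct completion of the final trial is rewarded.
        rewarded = outcome == "correct" and trial == length
        attempts.append((trial, outcome, rewarded))
        if outcome == "correct":
            trial += 1
        # On error the same trial is repeated after the ITI; the schedule
        # does not restart from trial 1.
    return attempts


if __name__ == "__main__":
    print(run_schedule(2, iter([300, 150, 450])))
    print(sample_timeline(random.Random(0)))
```

For example, `run_schedule(2, iter([300, 150, 450]))` yields an unrewarded correct first trial, an anticipatory error on trial 2, and a rewarded retry of trial 2, which illustrates that an error repeats the current trial rather than restarting the schedule.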

doi: https://doi.org/10.1371/journal.pcbi.1000131.g001