Dual Reward Prediction Components Yield Pavlovian Sign- and Goal-Tracking

doi:10.1371/journal.pone.0108142

Figure 1.

Schematic of the Pavlovian conditioned approach (PCA) task.

(A) The apparatus. (B) Temporal order of events in a trial. (C) Illustrations of sign- and goal-tracking responses in the task (top) and the respective DA responses (middle, bottom). (A) One of the two cues (illuminated levers) was randomly assigned to each animal and then used consistently in all sessions. (B) In each trial, the cue was presented for 8 s and immediately followed by reward (food pellets in the food tray). (C) (Top) Two types of conditioned-approach behavior are observed during cue presentation: one group of rats approaches the cue and stays there until it is retracted, at which time they approach the food tray (sign-tracking); the other group approaches the food tray and waits for the reward (goal-tracking). (Middle) Phasic dopamine (DA) release recorded in the core of the nucleus accumbens using fast-scan cyclic voltammetry during cue and reward presentation in the final conditioning session [25]. (Bottom) Illustration of the phasic DA activity assumed in this study, based on the DA release recorded in the nucleus accumbens.

Figure 2.

Schematic diagram of reward prediction and Pavlovian responses.

Reward predictions (V(t)) are learned as the sum of the sign component (V_s(t)) and the goal component (V_g(t)), which are based on the neural correlates of the cue and of the cue-evoked recall of reward, respectively. Upon cue presentation, the correlate of cue-evoked reward is generated in addition to the standard cue correlate (indicated by the dashed arrow from the cue correlate to the cue-evoked reward correlate). The TD error is used for learning both V_s(t) and V_g(t) (“TD Error”). The correlate of cue-evoked reward, in contrast, is learned independently of the TD error (“reward rep”) and further influences the learning of the goal component (“TD error + reward rep”). The reward prediction components V_s(t) and V_g(t) support specific Pavlovian responses directed towards the cue and the food tray, respectively.
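The learning scheme in this caption can be illustrated with a minimal sketch. The caption does not give the update equations, so the rules below are assumptions: a single cue state per trial, a multiplicative influence of the reward representation on goal-component learning, and hypothetical names and values for `alpha`, `beta`, `gamma` and `reward`. It is not the authors' model.

```python
# Hypothetical sketch of a two-component TD learner. The update rules and
# all parameter values are illustrative assumptions, not the paper's model.

def run_trial(v_s, v_g, reward_rep, alpha=0.1, beta=0.1, gamma=1.0, reward=1.0):
    """Run one cue -> reward trial and return the updated quantities."""
    # The total reward prediction at cue time is the sum of both components.
    v_cue = v_s + v_g
    # TD error at cue onset; the pre-cue state is assumed to predict nothing.
    delta_cue = gamma * v_cue - 0.0
    # TD error at reward delivery; the trial ends, so no future value remains.
    delta_reward = reward - v_cue
    # Sign component: driven by the TD error alone.
    v_s_new = v_s + alpha * delta_reward
    # Goal component: TD error scaled by the reward representation (assumed form).
    v_g_new = v_g + alpha * reward_rep * delta_reward
    # Reward representation: learned without reference to the TD error.
    reward_rep_new = reward_rep + beta * (reward - reward_rep)
    return v_s_new, v_g_new, reward_rep_new, delta_cue, delta_reward


def simulate(n_trials):
    """Repeat trials from zero initial values; return final values and last TD errors."""
    v_s, v_g, reward_rep = 0.0, 0.0, 0.0
    delta_cue, delta_reward = 0.0, 0.0
    for _ in range(n_trials):
        v_s, v_g, reward_rep, delta_cue, delta_reward = run_trial(v_s, v_g, reward_rep)
    return v_s, v_g, reward_rep, delta_cue, delta_reward


if __name__ == "__main__":
    print(simulate(200))
```

Under these assumptions, the summed prediction approaches the reward magnitude over trials, so the TD error moves from the time of reward to the time of the cue. How the prediction is split between V_s and V_g depends entirely on the assumed rules and parameters.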

Figure 3.

Pavlovian approach responses in the PCA task.

Results showing the probability of approach responses, mean + s.e.m., of (A) the simulated models and (B) the animal experiments [25]. ST: sign-tracking, GT: goal-tracking. The plots show the probability of sign-tracking responses relative to the probability of goal-tracking responses in the range [−1, 1].
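One way to obtain a relative measure bounded in [−1, 1] is to subtract the goal-tracking probability from the sign-tracking probability. The caption does not state the formula, so the sketch below is an assumption, and `approach_index` is a hypothetical name.

```python
# Hypothetical index: the difference of the two approach probabilities.
# The exact formula behind the plotted measure is an assumption.

def approach_index(p_sign, p_goal):
    """Return p_sign - p_goal, which lies in [-1, 1] for valid probabilities."""
    if not (0.0 <= p_sign <= 1.0 and 0.0 <= p_goal <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return p_sign - p_goal
```

With this definition, positive values indicate a bias towards sign-tracking and negative values a bias towards goal-tracking.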

Figure 4.

Responses of intermediate group (IG) in the PCA task.

Results showing the probability of approach responses, mean + s.e.m., of (A) the simulated models and (B) the animal experiments [14]. ST: sign-tracking, GT: goal-tracking, IG: intermediate-group. The plot shows the probability of sign-tracking responses relative to the probability of goal-tracking responses in the range [−1, 1].

Figure 5.

Secondary reinforcer task.

(A) Task apparatus. In this task, a nose poke into the active port led to insertion of the cue into the chamber for 2 s, whereas a nose poke into the inactive port had no consequence. The active and inactive ports were randomly pre-assigned for each animal. (B, C) Simulation results: number of choices (mean + s.e.m.) of the active and inactive ports in sign-tracking (B) and goal-tracking (C) models with either paired (black bars) or random (white bars) cue-reward presentations in the PCA task. (D, E) Results of experiments [25]: number of nose pokes into the active and inactive ports in sign-tracking (D) and goal-tracking (E) rats that received either paired (black bars) or random (white bars) cue-reward presentations in the PCA task. Significantly different responses to the active port in the paired vs. random condition (P<0.01; t-test) are indicated with ‘*’.

Figure 6.

Temporal difference (TD) errors and phasic dopamine (DA) responses in the PCA task.

(A, B) Simulations: temporal difference errors (a putative indicator of phasic DA activity), mean + s.e.m., at the time of cue and reward (white) in the sign-tracking (A) and goal-tracking (B) models. (C, D) Experiments [25]: peak DA concentration (mean + s.e.m.) recorded in the core of the nucleus accumbens using fast-scan cyclic voltammetry in sign-tracking (C) and goal-tracking (D) rats, measured as the change in peak DA concentration during the 5 s after cue or reward presentation and averaged over 25 trials in each session. Significantly different responses (P<0.01) between the cue and the reward are shown with ‘*’ (paired t-test).
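The experimental measure described here (change in peak DA concentration in the 5 s after an event, averaged over trials) could be computed roughly as follows. This is a sketch under stated assumptions: the baseline is taken as the sample at event onset, traces are plain lists of concentration samples, and `mean_peak_change` with its parameters is a hypothetical name, not the analysis code used in [25].

```python
# Hypothetical computation of the trial-averaged peak change after an event.
# The choice of baseline (the sample at event onset) is an assumption.

def mean_peak_change(trials, event_index, sample_rate_hz, window_s=5.0):
    """Average, over trials, of (peak within the window) minus (value at event onset)."""
    if not trials:
        raise ValueError("at least one trial is required")
    n_samples = int(window_s * sample_rate_hz)
    changes = []
    for trace in trials:
        baseline = trace[event_index]
        # The window spans the event sample and the following n_samples samples.
        window = trace[event_index:event_index + n_samples + 1]
        changes.append(max(window) - baseline)
    return sum(changes) / len(changes)
```

Calling the function once with the cue index and once with the reward index on the same set of trials gives the two values compared in each session.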

Figure 7.

Components of reward prediction at the time of cue presentation in ST and GT.

Reward predictions (V(t)) are learned as the sum of the sign component (V_s(t)) and the goal component (V_g(t)), which are based on the correlates of the cue and of cue-evoked reward, respectively. Reward prediction components evoked by cue presentation are shown for both ST and GT groups across sessions of the PCA task in the paired (control) (A, B) and DA-blockade (C, D) conditions.

Figure 8.

Recovery of TD error and DA led to rapid emergence of goal-tracking, but not of sign-tracking.

(A, B) Simulations: (Left) probability (mean + s.e.m.) of sign-tracking (A) and goal-tracking (B) in the models when reward prediction learning was present (white) or absent (black) in the first 7 sessions, and (Right) the corresponding probabilities (mean + s.e.m.) in the 8th session, in which reward prediction learning was present. (C, D) Experiments [25]: (Left) probability (mean + s.e.m.) of approaching the CS-lever in presumed sign-tracking rats (C) and the food tray in presumed goal-tracking rats (D) after injection of flupenthixol (black) or saline (white) in the first 7 sessions, and (Right) the corresponding probabilities (mean + s.e.m.) in the 8th session, in which flupenthixol was not administered. Significantly different probabilities in the 8th session are indicated by ‘*’ (unpaired t-test).
