Stimulus-Dependent Adjustment of Reward Prediction Error in the Midbrain

doi:10.1371/journal.pone.0028337

Figure 1.

Gabor patch with random dot noise.

(a) Stimulus with maximal contrast (99% Michelson contrast) used during conditioning sessions. (b, c) Typical examples of stimuli at the contrasts corresponding to (b) 90% and (c) 60% correctness of orientation discrimination.
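
For readers unfamiliar with the contrast measure named in the caption, Michelson contrast is conventionally defined as (Lmax − Lmin) / (Lmax + Lmin), where Lmax and Lmin are the highest and lowest luminances of the pattern. The following minimal Python sketch illustrates that definition only; the function names and the mean luminance of 50 (arbitrary units) are illustrative assumptions, not values from the study.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of a pattern given its extreme luminances."""
    if l_max + l_min <= 0:
        raise ValueError("luminances must sum to a positive value")
    return (l_max - l_min) / (l_max + l_min)


def luminance_range(mean_lum, contrast):
    """Extreme luminances of a grating with the given mean and contrast.

    Hypothetical helper: assumes the grating modulates symmetrically
    around mean_lum, so L = mean_lum * (1 +/- contrast).
    """
    return mean_lum * (1.0 + contrast), mean_lum * (1.0 - contrast)


# Illustrative numbers only: a 99% contrast grating around a mean of 50.
l_max, l_min = luminance_range(50.0, 0.99)
print(l_max, l_min, michelson_contrast(l_max, l_min))
```

Under this definition a 99% contrast stimulus has a darkest point close to zero luminance, which is why it serves as the maximal-contrast condition.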

Figure 2.

Sequence of one trial.

A Gabor patch stimulus was presented to subjects for 500 ms; juice or tasteless saliva was delivered after a 4000 ms delay. Subjects were asked to judge the orientation of the stimulus as quickly as possible after stimulus onset. After delivery of the juice or saliva, subjects were allowed to swallow the liquid during the blue fixation period.

Figure 3.

Reaction time results.

Reaction time data for each stimulus discriminability, orientation, and session (reward, stimulus conditioned with reward; non-reward, stimulus conditioned with tasteless saliva; tr, average of conditioning sessions; 60%, low-contrast stimuli; 90%, high-contrast stimuli; 100%, maximal contrast stimuli used in conditioning session). Reaction times for stimuli conditioned with reward were significantly shorter than those for stimuli conditioned with non-reward from the conditioning session through experimental session 3. In session 4, this difference in reaction time disappeared. Error bars represent ±1 s.e.m.

Table 1.

Regions with BOLD responses correlated with reward prediction error values δ(t) at the time of juice/saliva delivery (α = 0.05).

Table 2.

Regions with BOLD responses correlated with predicted reward values V(t) at the time of presentation of Gabor patch stimuli (α = 0.05).

Figure 4.

Preference ratings.

Rating scores for preference of juice and tasteless saliva in each session (tr: average of conditioning sessions). Error bars represent ±1 s.e.m.

Figure 5.

Model simulation of predicted reward values for Gabor patch stimuli.

(a) Changes in Vr(t) and (b) changes in the percentage of trials in which Vr(t) was higher than Vn(t). Results obtained using three representative values of the learning rate α (0.01, 0.05, and 0.1) are depicted. Vr(t) decreased faster in the WITHOUT model than in the WITH model. Error bars represent ±1 s.e.m.

Figure 6.

Model simulation of reward prediction error.

Changes in δ(t) for trials in which an unpredicted reward was delivered (positive prediction error trials: subjects judged the orientation to be the one conditioned with tasteless saliva, but juice was delivered). The vertical axis represents the average of δ(t) for each condition in each session. Results for α = 0.01, 0.05, and 0.1 are depicted. Solid black lines represent the WITH model, whereas dotted gray lines represent the WITHOUT model. Black squares represent high-contrast trials (90% correctness), whereas open circles represent low-contrast trials (60% correctness). The δ(t) values for high- and low-contrast stimuli were almost identical in the WITHOUT model, but differed in the WITH model for all learning rates.
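
The captions do not give the model equations, so the following Python sketch is only a generic delta-rule illustration of why a discriminability-dependent prediction could separate high- and low-contrast trials. Everything in it is an assumption made for illustration: the mixture form used for the WITH-style prediction, the function names, and the values v_r = 0.8 and v_n = 0.1. It should not be read as the authors' WITH or WITHOUT model.

```python
def predicted_value(v_judged, v_other, p_correct=None):
    """Predicted reward V(t) for the orientation the subject reported.

    p_correct=None: no adjustment (WITHOUT-style), V(t) is simply the
    learned value of the judged orientation.
    Otherwise (hypothetical WITH-style form, not from the source): V(t)
    mixes both learned values, weighted by the chance the judgment was right.
    """
    if p_correct is None:
        return v_judged
    return p_correct * v_judged + (1.0 - p_correct) * v_other


def prediction_error(reward, v):
    """Standard delta-rule prediction error: delta = r - V."""
    return reward - v


def update(v, delta, alpha):
    """Delta-rule value update with learning rate alpha."""
    return v + alpha * delta


# Illustrative learned values (assumed, not taken from the paper).
v_r, v_n = 0.8, 0.1  # reward- and non-reward-conditioned orientations

# Positive prediction error trial: judged "non-reward", but juice arrives.
for label, p in [("no adjustment", None), ("90% correct", 0.9), ("60% correct", 0.6)]:
    delta = prediction_error(1.0, predicted_value(v_n, v_r, p))
    print(label, round(delta, 3), round(update(v_n, delta, alpha=0.05), 3))
```

In this toy version the unadjusted error is the same regardless of contrast, whereas the adjusted error shrinks as discrimination becomes less reliable, which mirrors the qualitative pattern the caption describes.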

Figure 7.

Brain regions showing significant correlation between fMRI signals and reward prediction error values at the time of reward delivery (n = 23).

A white arrow indicates the midbrain region showing a significant correlation (yellow: P<.05, corrected for false discovery rate; red: P<.001, uncorrected for multiple comparisons) with the variation of the prediction error δ(t) at the time of reward delivery, calculated using the WITH model with α = 0.05.
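
The caption does not say which false discovery rate procedure was applied. As background, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used way of controlling the false discovery rate; its use here is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test survives FDR level q.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= q * k / m, then reject every hypothesis ranked <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Made-up voxel-wise p-values, for illustration only.
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.6], q=0.05))
```

Note that the second p-value (0.03) misses its own threshold but is still rejected because a larger p-value further down the ranking meets its threshold; this step-up behaviour is what distinguishes the procedure from a fixed uncorrected cut-off such as P<.001.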

Figure 8.

Effect size at the peak midbrain voxel at each learning rate.

The effect sizes at the peak midbrain voxel across the eight learning rates are shown for each model (filled squares and thick lines for the WITH model, open circles and dotted lines for the WITHOUT model). The vertical axis represents the effect size (parameter estimates for the regressor of the reward prediction error δ(t)) at the peak voxel, averaged across 23 subjects and based on the data up to session 3. The effect size was significantly greater for the WITH model than for the WITHOUT model at most learning rates (two-tailed paired t-test: *, p<.05; **, p<.01). Error bars represent ±1 s.e.m.
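
The two statistics named in this caption, the standard error of the mean and the paired t-test, can be sketched with the Python standard library as follows. The input numbers are invented placeholders rather than data from the study, and the function names are illustrative. The sketch returns only the t statistic and degrees of freedom; converting them to a two-tailed p-value requires the t distribution, which the standard library does not provide.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    t = statistics.mean(diffs) / sem(diffs)
    return t, len(diffs) - 1


# Placeholder per-subject effect sizes (not real data).
with_model = [1.2, 0.9, 1.5, 1.1]
without_model = [1.0, 0.8, 1.1, 1.0]

t_stat, dof = paired_t(with_model, without_model)
print(round(sem(with_model), 3), round(t_stat, 3), dof)
```

Because the test is paired, it operates on each subject's WITH minus WITHOUT difference, so between-subject variability in overall effect size does not inflate the error term.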

Figure 9.

Differences in effect size between the WITH model and WITHOUT model.

Positive values represent a greater effect size for the WITH model (α = 0.05). An asterisk denotes a significant difference between the two models (two-tailed t-test: *, p<.05). Error bars represent ±1 s.e.m.

Figure 10.

Brain regions showing significant correlation between fMRI signals and predicted reward values at the time of Gabor patch presentation (n = 23).

White arrows indicate significant correlations with the trial-by-trial variation of the predicted reward values V(t), calculated using the WITH model with α = 0.05, in the left anterior cingulate cortex (ACC; yellow areas: P<.05, corrected for false discovery rate) and the left putamen/globus pallidus (Put/GPi; red areas: P<.001, uncorrected for multiple comparisons). Significant correlations (uncorrected P<.001) in several other cortical areas and in the cerebellum are also depicted.
