Beyond negative valence: 2-week administration of a serotonergic antidepressant enhances both reward and effort learning signals

doi:10.1371/journal.pbio.2000756

Fig 1.

Task description.

In the decision phase (A), participants were shown two options (i.e., choices) overlaid with a cue (percentage number) informing them of the probability of receiving a real (rather than hypothetical) reward for each choice. They could only decide after an initial monitoring phase (1.4–4.5 s). The chosen option was then highlighted for 2.9–8 s. In the following outcome phase (B), participants saw the outcome for the chosen option first (1.9–2.1 s). The reward magnitude was shown as a purple bar (top of the screen); the effort magnitude was indicated through the position of a dial on a circle. Whether they received a reward (i.e., the trial’s reward type) was indicated by a green tick mark (real reward, top display) or a red crossed-out sign over the reward magnitude (hypothetical reward, bottom display). If a reward was real, the reward was also added to a status bar at the bottom of the screen, which tracked rewards over the course of the experiment. A reminder of which option had been chosen was shown at the top of the screen. Then, the reward and effort magnitudes were shown for the unchosen option (1.9–6.9 s). Finally, participants performed the effort phase (C), in which they needed to exert a sustained effort by selecting circles that appeared on the screen using a trackball mouse. The number of targets was equivalent to the chosen effort outcome. Participants had to perform the effort phase on every trial independently of whether the reward was real or hypothetical. Participants successfully completed the effort phase on almost every trial. Participants performed a fixed number of 120 trials per session (thus, selecting options with less effort did not have a lower opportunity cost, i.e., it did not allow participants to perform more trials for more overall reward). An example schedule is shown in (D), with both the reward and effort magnitude values of the two options. Figure is adapted from Figure 1 [27].


Fig 2.

Task validation and model comparison.

(A) The choices of participants in both groups (i, ii), between option one and option two, were guided by the learnt reward and effort differences between the options (estimated from a Bayesian model). They were more likely to choose the option with higher reward and lower effort magnitudes. (B) Regression analysis (bGLM1) predicting whether participants selected the same option again as on the last trial (“stay”) or selected the alternative option (“switch”). Participants took all relevant features of the task into account: they were more likely to choose options that had a higher displayed probability, higher learnt reward, and lower effort magnitudes (all p < 10⁻⁸; no group differences, all p > 0.2; omnibus ANOVA including regression weights for probability, learnt reward and effort also revealed no group difference: F(1,27) = 2.3, p = 0.14). Participants were also more likely to choose an option again if they had received a real reward on the last trial (t(28) = 3.04, p = 0.005). There was no difference between the groups in the overall amount of money earned. (C) Model comparison using summed Bayesian Information Criterion (BIC) values revealed that models in which choice utility was computed as a linear sum (i.e., reward + probability − effort, “Add”) provided a far better fit to the data than models computing choice utility multiplicatively (i.e., reward × probability − effort, “Mult”). Of these models, a Bayesian model (no free parameters for learning rate, reward/effort predictions are instead derived using Bayes’ rule) provided the best fit to the data (“Bayesian – Add”: BIC = 4375), closely followed by a model in which there was one free and shared parameter for the reward and effort learning rate (“Shared learning rate – Add”: BIC = 4378). The regressors for learnt reward and effort magnitudes used in the behavioral and neural analyses derived from “Bayesian – Add” were highly correlated with regressors derived from “Shared learning rate – Add” (r > 0.99). Error bars are standard error of the mean. Data for individual participants can be found in S1 Data.
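To make the distinction in panel (C) concrete, the following is a minimal sketch of the two utility forms named in the caption (“Add” and “Mult”) together with the standard BIC formula. It is not the authors' fitting code: the function names, the unit weights `w_r`, `w_p`, `w_e`, and the logistic choice rule with `inv_temp` are illustrative assumptions, and the caption does not specify how utilities were mapped to choices.

```python
import math

# Hypothetical weights: the caption gives only the functional form,
# not the fitted parameters, so unit weights are assumed here.
def utility_add(reward, prob, effort, w_r=1.0, w_p=1.0, w_e=1.0):
    """Additive utility: reward + probability - effort ("Add")."""
    return w_r * reward + w_p * prob - w_e * effort

def utility_mult(reward, prob, effort, w_e=1.0):
    """Multiplicative utility: reward x probability - effort ("Mult")."""
    return reward * prob - w_e * effort

def p_choose_first(u1, u2, inv_temp=1.0):
    """Assumed logistic choice rule on the utility difference."""
    return 1.0 / (1.0 + math.exp(-inv_temp * (u1 - u2)))

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: k * ln(n) - 2 * ln(L); lower values mean better fit."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Example: the same option scored under both utility forms.
u_add = utility_add(reward=0.6, prob=0.5, effort=0.2)
u_mult = utility_mult(reward=0.6, prob=0.5, effort=0.2)
```

Summed BIC, as used in the caption, would then be the sum of `bic(...)` over participants for a given model; the model with the lowest sum is preferred.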


Fig 3.

Citalopram leads to a widespread increase in reward and effort learning signals.

(A) We identified ROIs that were sensitive to reward type (i.e., whether reward was really received or only hypothetical, analysis fGLM1) at the whole-brain level (p < 0.05). Abbreviations: ventral striatum (striatum), midcingulate cortex (mCC), ventromedial prefrontal cortex (vmPFC), parietal cortex (Parietal, IPL_E [38]). (B) Shows the time course of the regression coefficients in these ROIs for the RPE of the chosen option on the neural BOLD signal for the placebo (light green) and the citalopram (dark green) groups (analysis fGLM2). Citalopram increased the RPE signal across all ROIs (ANOVA, group difference across all areas, i.e., difference in the mean value across all four areas: F(1,27) = 9.48, p = 0.005). This effect was driven by the citalopram group showing an activation across all four areas (ANOVA including all four areas, citalopram group only, activation across all areas: F(1,14) = 7.66, p = 0.015), while the placebo group did not show any change in BOLD (ANOVA including all four areas, placebo group only, no activation or deactivation across all areas: F(1,13) = 2.20, p = 0.16). Next, we performed similar analyses for the effort dimension. (C) We first identified dorsal anterior cingulate cortex (dACC) and other areas (S2 Table and S2 Fig) as being sensitive to the relative effort outcome. (D) Shows the regression coefficient for the EPE of the chosen option on the neural BOLD signal in dACC for the placebo (light red) and the citalopram (dark red) groups. Again, citalopram significantly enhanced the EPE signal (t(27) = 3.01, p = 0.006; significance threshold for Bonferroni correction for six brain areas is p < 0.008), making it more negative like the EPE signal in other brain areas (S2 Fig). The pattern of group differences for RPEs and EPEs across the whole brain (at a reduced statistical threshold) is shown in S3 Fig. Brain maps and data for individual participants can be found in S2 and S3 Data.
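The Bonferroni threshold quoted in panel (D) can be reproduced with a one-line calculation. The sketch below assumes a family-wise alpha of 0.05, which the caption does not state explicitly but which is consistent with the reported threshold of p < 0.008 for six brain areas; the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Assumed family-wise alpha of 0.05 across the six brain areas.
threshold = bonferroni_threshold(alpha=0.05, n_tests=6)

# The reported dACC group difference (p = 0.006) falls below the threshold.
significant = 0.006 < threshold
```

Under this assumption the threshold is 0.05 / 6, roughly 0.0083, so the reported p = 0.006 survives the correction.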


Fig 4.

Citalopram does not increase neural signals for reward or effort outcomes.

(A) Shows the time course of the regression coefficients for the relative (chosen minus unchosen option) reward magnitude outcomes on brain activity (analysis fGLM2) for the placebo (light green) and the citalopram (dark green) groups. We found that citalopram did not increase the relative reward magnitude outcome signal (ANOVA, testing for a main effect of group across all areas: F(1,27) = 1.19, p = 0.29). On the contrary, a more lenient time point-by-time point t test analysis of the time courses revealed that in striatum and vmPFC, citalopram, in fact, decreased the relative reward magnitude outcome signal late in the outcome phase (*p < 0.05 for time point-by-time point two-sided t tests). (B) Similarly, for dACC, citalopram did not increase the coding of the relative effort outcome signal (t(27) = 0.65, p = 0.52). Abbreviations: ventral striatum (striatum), midcingulate cortex (mCC), ventromedial prefrontal cortex (vmPFC), parietal cortex (Parietal, IPL_E [38]), dorsal anterior cingulate cortex (dACC). Data for individual participants can be found in S4 Data.


Fig 5.

Citalopram protects reward learning from interference.

(A) A regression analysis (bGLM2) assessed interference in reward learning (impact of relative RPEs) by effort learning (relative EPEs) and reward type (receiving a real or only a hypothetical reward). Larger regression weights indicate larger interference. While the placebo group’s learning was affected by interfering factors (ANOVA, testing the average size of the two interference factors against zero in the placebo group: F(1,13) = 5.39, p = 0.033), this was remedied by citalopram (ANOVA, testing the average size of the two interference factors against zero in the citalopram group: F(1,14) = 1.04, p = 0.32; ANOVA, testing whether the two groups differed in the average size of the two interference factors: F(1,27) = 7.00, p = 0.013). This effect can be illustrated more directly by comparing how much participants could take RPEs into account for making decisions on the next trial when there was interference or when there was not (analyses bGLM3a and b). (B) When reward was real, the two groups did not differ in how well they could use RPEs (t(27) = −0.47, p = 0.64). However, when reward was only hypothetical, the citalopram group was better at using RPEs (t(27) = −2.21, p = 0.036). (C) Similarly, when EPEs were favorable, the two groups did not differ in how well they could use RPEs (t(27) = −0.32, p = 0.75), but when EPEs were unfavorable, the citalopram group was better at using RPEs (t(27) = −2.69, p = 0.012). Error bars show standard error of the mean, *p < 0.05. Data for individual participants can be found in S5 Data.
