Reduced motivation is an important symptom of major depression, thought to impair recovery by reducing opportunities for rewarding experiences. We characterized motivation for monetary outcomes in depressed outpatients (N = 39, 22 female) and controls (N = 22, 11 female) in terms of their effectiveness in seeking rewards and avoiding losses. We assessed motivational function both during learning of associations between stimuli and actions and after learning was complete. We also compared activity in the neural circuits underpinning these behaviors between depressed patients and controls.
We used a Go/No-Go task that assessed participants’ ability to learn to emit or withhold actions to obtain monetary rewards or avoid losses. We derived motivation-relevant parameters of behavior (learning rate, Pavlovian bias, and motivational influence of gains and losses). After learning, participants performed the task during functional magnetic resonance imaging (fMRI). We compared neural activation during anticipation of action emission vs. action inhibition, and for actions performed to obtain rewards compared to actions that avoid losses.
Depressed patients showed a Pavlovian bias similar to that of controls and were equally able to withhold action to gain rewards and to emit action to avoid losses, behaviors that conflict with well-described Pavlovian tendencies to approach rewards and avoid losses. Patients were not impaired in overall performance or learning and showed no abnormal neural responses, for example in bilateral midbrain or striatum. We conclude that basic mechanisms subserving motivated learning are intact in moderate depression.
Therapeutically, the intact mechanisms identified here suggest that learning-based interventions may be particularly effective in encouraging recovery. Etiologically, our results suggest that the severe motivational deficits clinically observed in depression are likely to have complex origins, possibly related to an impairment in the representation of future states necessary for long-term planning.
Citation: Moutoussis M, Rutledge RB, Prabhu G, Hrynkiewicz L, Lam J, Ousdal O-T, et al. (2018) Neural activity and fundamental learning, motivated by monetary loss and reward, are intact in mild to moderate major depressive disorder. PLoS ONE 13(8): e0201451. https://doi.org/10.1371/journal.pone.0201451
Editor: Jean Daunizeau, Brain and Spine Institute (ICM), FRANCE
Received: October 12, 2017; Accepted: July 16, 2018; Published: August 2, 2018
Copyright: © 2018 Moutoussis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper, its Supporting Information files and 'Neurovault' repository. Note that the ethical permission does not allow us to publish potentially identifiable MRI images. Therefore MRI data underlying all findings in the manuscript are provided at the group (second) level. Detailed spreadsheets with all questionnaire and model parameters are included in the submission; however, the scan data are uploaded to the 'Neurovault' public directory and detailed instructions of how to access it are provided in the submitted Supplementary Materials file with the key and notes to data.
Funding: Ray Dolan is supported by a Wellcome Trust Senior Investigator Award (ref 098362/Z/12/Z) (https://wellcome.ac.uk/). Michael Moutoussis is funded by the Neuroscience in Psychiatry Network (NSPN), via a Wellcome Trust (ref 095844/7/11/Z) Strategic Award where Ray Dolan is a Principal Investigator (https://wellcome.ac.uk/); The Max Planck UCL Centre for Computational Psychiatry and Ageing Research is a joint initiative of the Max Planck Society and UCL. M. Moutoussis is also supported by the UCLH Biomedical Research Centre (www.uclhospitals.brc.nihr.ac.uk). Peter Fonagy is in receipt of a National Institute for Health Research (NIHR) Senior Investigator Award (NF-SI-0514-10157). P. Fonagy was in part supported by the NIHR Collaboration for Leadership in Applied Health Research and Care (CLAHRC) North Thames at Barts Health NHS Trust (https://www.nihr.ac.uk/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
One in 12 people suffer from major depressive disorder (MDD) at some point in their lives. Yet ‘depressive disorder’ is unlikely to be a well-circumscribed entity and may be no more specific than was the ascription ‘fever disorder’, where the latter can reflect the presence of different pathologies. Lack of motivation is a central feature in MDD diagnostic criteria, and patients report this deficit as a source of ongoing personal suffering. Health professionals, patients and relatives often blame lack of motivation for the impoverished efforts of patients to improve their condition. Studying central features of MDD such as amotivation may enable a better dissection of both the underlying neurobiology and the information-processing and computational processes that it subserves.
It is unclear how disturbances in motivation contribute to the genesis of depression. If outcomes that make most people happy have reduced motivational power for some people, they are likely to be pursued less. This in turn may provide fewer opportunities for rewarding experiences. Conceivably, if depressed mood reduces motivation below a certain threshold, this could set up a vicious cycle of reduced rewards, lower mood and reduced goal pursuit, which could ultimately lead to the functional decompensation we call clinical depression. Both biological and psychological theories of the maintenance of depression invoke such vicious cycles. Indeed, interrupting the vicious cycle of avoiding potentially rewarding activities and subsequent low mood is an important theoretical basis for cognitive-behavioral and especially behavioral activation therapies [4,5]. These therapies are largely effective, but their mechanisms of action remain unclear, and detailed mechanistic testing is required.
One core concept informing the vicious cycle hypothesis is anhedonia, a symptom that looms large in clinical depression. Anhedonia, subjectively reported as a lack of pleasure in response to ordinarily rewarding experiences, is hypothesized to reflect an endophenotype of reduced responsiveness to reward that predisposes to depression. This has been termed decisional anhedonia. Early studies showed that depression, and anhedonia specifically, is associated with reduced reward sensitivity more than with other aspects of reinforcement learning, consistent with the hypothesis of decisional anhedonia. However, for this to lead to depression through a ‘vicious cycle’, a reduced average reward rate should lead to lower mood. This link has been challenged at the population level, albeit controversially. For example, variation in the gross domestic product of developed countries bears an unexpectedly weak relation to the mood of their populations, as reflected by surveys of subjective well-being. It has also been challenged with respect to non-clinical individuals over short timescales. Moreover, also challenging clinical stereotypes, we recently showed that receiving surprising monetary rewards (and indeed losses) has a comparable neural and emotional impact in moderately depressed patients to that seen in healthy control participants. This suggests that depressed patients may behave maladaptively because of processes located further downstream in the processing of rewards and losses, making it imperative to study how patients learn and motivate action on the basis of experienced rewards and losses.
Learning-dependent motivation can be powerfully characterized by assessing the propensity to pursue rewards and avoid losses with tasks that tap into neuro-computational processes. Healthy, motivated adults tend to emit action in the presence of potential rewards and to withhold action in the face of potential losses. Over and above this fundamental propensity, they quickly learn whether action or inaction leads to desired outcomes. This motivational structure has been well characterized by the orthogonalized Go/No-Go task, both during learning and after performance plateaus. Withholding action (‘No-Go’) is difficult to learn in a context of potential reward, while emitting action (‘Go’) is difficult to learn in a context of potential loss. In a natural environment, these Pavlovian biases usefully guide decision-making, acting as baseline, or prior, beliefs about the right decision given an expectation of reward or punishment respectively: ‘expecting reward’ should increase the prior belief that ‘engaging with the stimulus’ is the right decision, whereas ‘expecting loss’ should favor ‘holding back’. We thus hypothesized that depression would be associated with (a) disturbed Pavlovian guidance of action selection and (b) a reduced overall impact of anticipated rewards or losses on motivated action.
If a disturbance in Pavlovian guidance of action selection contributes to the etiology of depression, we would expect blunted Pavlovian guidance of action (i.e., less bias) in this condition. Pavlovian amotivation could reduce the effective belief that ‘engaging’ (‘Go’) would bring gains in contexts where rewards are common, which could then feed the vicious cycle described above. The situation with regard to avoiding punishment is more complex [14,15]. In mild and moderate MDD, where anxiety is prominent, the propensity towards ‘Go’ (active avoidance) in a context of potential losses might be preserved or enhanced.
‘Reward sensitivity’, the motivational impact of rewards on behavior, is therefore likely to be blunted in depression. Here, we tested whether moderately depressed patients show reduced sensitivity to appetitive outcomes, against an ‘intact sensitivity’ null hypothesis. We tested this in the context of learning, through well-characterized computational modeling of behavior. This parallels and complements our group’s findings of unchanged reward prediction error processing in a non-learning context. At this moderate level of illness, and in the absence of explicit threats, we expected the processing of aversive events to be preserved or even enhanced, consistent with the high levels of defensive avoidance reported in depression. At the neural level, we hypothesized that brain areas whose activation mirrors the value of emitting actions (substantia nigra/ventral tegmental area (SN/VTA), ventral striatum (VS), medial prefrontal cortex (mPFC)) would be less active in depression in response to action-associated cues, reflecting blunted reward sensitivity. By contrast, regions activated in response to cues associated with withholding actions (hippocampus, inferior frontal gyrus (IFG)) would not differ from controls. Finally, we sought to corroborate the findings regarding preserved reward prediction errors (RPEs) in depression that we recently reported in the same participants.
Methods and materials
We recruited working-age adults from primary medical and psychological care services in North London between November 2012 and July 2014. Depressed participants had major depressive disorder, or its ICD-10 counterpart (Table 1). To maximize ecological validity and generalizability, we required health impairment to be severe enough to warrant diagnosis and active treatment by a qualified doctor or psychologist (in the UK, milder presentations are usually managed by less qualified personnel, for example through supported self-help). All participants gave written informed consent. Exclusion criteria were other major psychiatric or medical diagnoses, neurological impairment or trauma, moderate or severe learning disability, claustrophobia and left-handedness. The study was approved by the City and East London Research Ethics Committee (11/LO/0250).
We matched healthy and depressed groups by age, gender, socioeconomic status and years of education. Data for this study were provided by 39 depressed and 22 healthy participants. MRI safety screening allowed us to scan 33/39 depressed and 20/22 healthy participants. Participants received a flat fee per hour of the study, travel reimbursement, plus their monetary winnings in the cognitive tasks. All participants in this study also participated in a previous study.
Just before the scanning session, participants performed a Go/No-Go learning task. The task was to discover, by trial and error, whether each of four abstract stimuli required pressing a button (Go) or withholding the press (No-Go; see Fig 1). Participants were told that during this phase of the task the correct action would be followed by the ‘best outcome’ 80% of the time and the incorrect action by the ‘worst outcome’ 80% of the time. For each stimulus, participants had to find out which action was correct and what the ‘best’ and ‘worst’ outcomes were. For some stimuli the best outcome was ‘win’ and the worst was ‘nothing’ (null), while for others the best was ‘nothing’ and the worst was ‘lose’. A stimulus could hence belong to one of four conditions: Go-to-Win (GtW), Go-to-Avoid-Loss (GtAL), NoGo-to-Win (NGtW) and NoGo-to-Avoid-Loss (NGtAL). We stressed to the participants the probabilistic nature of the task and instructed them to use a trial-and-error strategy, especially if the correct responses were not clear.
One of four abstract stimuli was presented, followed by a waiting period (+). This waiting period, as well as the inter-trial interval, was jittered as shown to aid fMRI analysis. The participant’s decision, either to ‘Go’ or to ‘not Go’, was implemented when the target (o) appeared. The best action was followed by the best outcome 80% of the time. For example, if the second stimulus down was a ‘Go to avoid loss’ stimulus, then quickly pressing the button when the circle appeared would result in a null outcome (yellow horizontal line) 80% of the time and a loss outcome (downward arrow) 20% of the time. The stimuli were randomized across participants with respect to their best action and outcome. Before scanning, in the ‘discovery’ version of the task, the suboptimal action led to the best outcome 20% of the time, but during scanning the suboptimal action never led to the best outcome. No deception was involved at any point.
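The outcome schedule of the four conditions can be summarized in a short simulation. This is a hypothetical sketch, not the task code: it encodes each condition as (correct action, best outcome, worst outcome) and assumes that during scanning the correct action still yields the best outcome 80% of the time, while the incorrect action never does. The names `CONDITIONS` and `sample_outcome` are illustrative.

```python
import random

# Hypothetical encoding: condition -> (correct action, best outcome, worst outcome).
# Outcomes are coded +1 (win), 0 (null) and -1 (loss).
CONDITIONS = {
    "GtW": ("Go", +1, 0),
    "GtAL": ("Go", 0, -1),
    "NGtW": ("NoGo", +1, 0),
    "NGtAL": ("NoGo", 0, -1),
}


def sample_outcome(condition, action, phase="discovery", p_valid=0.8, rng=random):
    """Draw the outcome of one trial.

    The correct action yields the best outcome with probability p_valid.
    The incorrect action yields it with probability 1 - p_valid in the
    'discovery' phase and never in the 'scan' phase (assumed).
    """
    correct, best, worst = CONDITIONS[condition]
    if action == correct:
        p_best = p_valid
    elif phase == "discovery":
        p_best = 1.0 - p_valid
    else:
        p_best = 0.0
    return best if rng.random() < p_best else worst


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_outcome("GtAL", "Go", rng=rng) for _ in range(10_000)]
    print("P(null | correct Go, GtAL):", draws.count(0) / len(draws))  # close to 0.8
```

The orthogonal design is visible in the table: action (Go/NoGo) and valence (win/avoid loss) are crossed, so Pavlovian tendencies help in GtW and NGtAL and hinder in NGtW and GtAL.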
Each trial started with presentation of a fractal image, followed by a fixation cross and then a target. Only then did participants implement their decision (Go or No-Go). ‘Go’ actions always involved the same button and had to be emitted within 700 ms of target appearance. Once participants had completed the ‘discovery’ task (144 trials), they were explicitly trained on the correct responses: they were told the correct answers and practiced responding until they exceeded 90% correct in the exact task that they subsequently performed in the scanner (see S1 File). Despite being older than participants in earlier studies with this task, and despite a potential for psychomotor retardation, depressed participants achieved in-scanner success rates equivalent to those of healthy controls. This was important in order to avoid performance-related confounds. We emphasized that exactly the same responses would be correct in the scanning part of the experiment, where apparent failures should be attributed to chance and not to changes in action-outcome contingencies. Participants then performed the ‘trained’ version of the task in the scanner, following the above description and the published paradigm. The task consisted of two 12-min blocks. Each ‘win’ was worth £0.40, making average performance-related fees about £15 overall.
Behavior in the ‘discovery’ phase was fitted with computational models based on published work [12,13,20]. The core model was the one shown to perform best in the literature. All models comprised two parts. First, on each trial a Rescorla-Wagner or Q-learning-like rule updated the values of the presented stimuli (V) and of the actions taken (Q) according to a constant learning rate λ and a prediction error. The prediction error estimated the extent to which expectations about reward were violated (Eqs 1 and 2 below). In this first part, action values were calculated in an unbiased form. The core model contained two different return sensitivities ρv (also known as motivational exchange rates, or inverse temperatures), depending on whether a positive or negative return r was received. In our case, r could take the values +1 (win), -1 (loss) or 0 (null outcome): Eq 1
Only Q values pertaining to the realized stimulus and action were updated; all others were carried forward from the previous trial. All models also tracked the state value of each stimulus using the same parameters: Eq 2
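A minimal Python sketch of this learning stage follows. Because Eqs 1 and 2 are not reproduced in this text, it assumes the standard Rescorla-Wagner form in which the prediction error is ρv·r minus the current value, with ρv chosen by the sign of r; the helper names `sensitivity`, `rw_update` and `update_trial` are hypothetical.

```python
def sensitivity(r, rho_app, rho_av):
    """Return sensitivity for outcome r: rho_app for wins, rho_av otherwise.
    For a null outcome (r == 0) the choice does not matter, since rho * 0 == 0."""
    return rho_app if r > 0 else rho_av


def rw_update(value, r, lam, rho_app, rho_av):
    """One Rescorla-Wagner step (assumed form): value + lam * (rho_v * r - value)."""
    prediction_error = sensitivity(r, rho_app, rho_av) * r - value
    return value + lam * prediction_error


def update_trial(Q, V, s, a, r, lam, rho_app, rho_av):
    """Update only the realized stimulus s and action a; all other entries carry over.

    Q maps (stimulus, action) -> value and V maps stimulus -> value; unseen
    entries start at 0. Copies are returned so the inputs are left unchanged.
    """
    Q, V = dict(Q), dict(V)
    Q[(s, a)] = rw_update(Q.get((s, a), 0.0), r, lam, rho_app, rho_av)
    V[s] = rw_update(V.get(s, 0.0), r, lam, rho_app, rho_av)
    return Q, V


if __name__ == "__main__":
    Q, V = update_trial({}, {}, "s1", "Go", +1, lam=0.2, rho_app=5.0, rho_av=3.0)
    print(Q, V)  # both values move a fraction lam of the way towards rho_app * r
```

Under this form, repeated wins drive a value towards ρapp rather than towards 1, which is why the sensitivities double as motivational exchange rates.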
Crucially, in the second part of the models the Q values were biased by up to two terms. Both were included in the core model, and represented an overall tendency towards action (‘Go bias’) and a Pavlovian bias that depended on valence (state value): Eq 3
Both the bias coefficients b assume zero values unless Vt(st)>0 (for both) and at = Go ≔ 1 (for bpav). This means that the NoGo action and aversive context are taken as reference. Finally, the policy probability for choosing an action in all models was given by the softmax function, albeit squashed by a lapse rate parameter ξ: Eq 4
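The decision stage can be sketched as follows. Because Eqs 3 and 4 are not reproduced here, the form below is an assumption based on the surrounding prose: the Go weight is Q(Go) plus a constant Go bias plus a Pavlovian term b_pav·V(s), so that Go is boosted for appetitive and penalized for aversive stimuli, No-Go is the reference action, and the inverse temperature is absorbed into the return sensitivities. The function names are hypothetical.

```python
import math


def logistic(d):
    """Numerically stable 1 / (1 + exp(-d))."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def action_weights(q_go, q_nogo, v, b_go, b_pav):
    """Biased weights (W_go, W_nogo): only Go receives the Go bias and the
    Pavlovian term b_pav * v (assumed form); No-Go is the reference action."""
    return q_go + b_go + b_pav * v, q_nogo


def p_go(q_go, q_nogo, v, b_go, b_pav, xi):
    """Probability of Go: two-option softmax of the weights, squashed by lapse rate xi."""
    w_go, w_nogo = action_weights(q_go, q_nogo, v, b_go, b_pav)
    return (1.0 - xi) * logistic(w_go - w_nogo) + xi / 2.0


if __name__ == "__main__":
    # Same learned action values, appetitive vs aversive stimulus value.
    print(p_go(0.0, 0.0, v=+1.0, b_go=0.3, b_pav=1.0, xi=0.05))  # above 0.5
    print(p_go(0.0, 0.0, v=-1.0, b_go=0.3, b_pav=1.0, xi=0.05))  # below 0.5
```

With two actions the softmax reduces to a logistic function of the weight difference, and the lapse term keeps every choice probability within [ξ/2, 1 − ξ/2].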
The core model thus had six parameters: appetitive and aversive sensitivity, learning rate, action bias, Pavlovian bias and lapse rate. We compared it against alternatives that lacked the biases and/or the separate motivational exchange rates.
Within the core model, the motivational exchange parameters and learning rate operationalized our hypothesis that depressed participants would show reduced sensitivity to reward (a reduced motivational exchange rate for rewards). The second part of the model was concerned not with learning but with the decision to emit or withhold action. It boosted the action value for ‘Go’ by a constant Go-bias parameter and by a Pavlovian term proportional to the value of the stimulus in question. Thus ‘Go’ was boosted for appetitive stimuli and penalized for aversive stimuli (Eqs 3 and 4). The coefficient of the Pavlovian term allowed us to operationalize the hypothesis that Pavlovian bias would be aberrant in depression.
To test for differences between the two groups, we fitted behavior using maximum-a-posteriori (MAP) estimation with minimal regularization of the fitted parameters, because using empirical priors during Expectation-Maximization (EM) fitting [20,22] risks suppressing true extreme values and thus biasing group comparisons. The MAP approach also compares groups on an equal footing with respect to the parameters quantifying our hypotheses, since the Pavlovian bias and motivational exchange rate estimates are available for both groups by construction. We also performed EM analyses to test whether patients and controls may have used different cognitive models, as has been found with other patient groups.
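To make the MAP procedure concrete, the sketch below simulates one agent on the four-condition discovery task and recovers its six core-model parameters by minimizing the negative log posterior. Several choices are assumptions rather than the authors’ settings: the Rescorla-Wagner and biased-softmax forms sketched above, the parameter transforms (exponential for sensitivities, logistic for learning and lapse rates), the weak Gaussian prior (SD 3 on the transformed scale), the bounded L-BFGS-B optimizer, and the number of restarts. It uses NumPy and SciPy; all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

GO, NOGO = 1, 0
# Stimuli 0-3 are GtW, GtAL, NGtW, NGtAL: correct action, best and worst outcome.
CORRECT, BEST, WORST = [GO, GO, NOGO, NOGO], [1, 0, 1, 0], [0, -1, 0, -1]


def unpack(x):
    """Map unconstrained x to (rho_app, rho_av, lam, b_go, b_pav, xi) (assumed transforms)."""
    return np.exp(x[0]), np.exp(x[1]), expit(x[2]), x[3], x[4], expit(x[5])


def prob_go(Q, V, s, b_go, b_pav, xi):
    """Lapse-squashed softmax over biased weights (Go bias + Pavlovian term on Go)."""
    d = Q[s, GO] + b_go + b_pav * V[s] - Q[s, NOGO]
    return (1 - xi) * expit(d) + xi / 2


def learn(Q, V, s, a, r, rho_app, rho_av, lam):
    """In-place Rescorla-Wagner updates of Q[s, a] and V[s]."""
    rho = rho_app if r > 0 else rho_av
    Q[s, a] += lam * (rho * r - Q[s, a])
    V[s] += lam * (rho * r - V[s])


def simulate(params, n_trials=144, seed=1):
    """Simulate one agent on the discovery task (80/20 feedback)."""
    rho_app, rho_av, lam, b_go, b_pav, xi = params
    rng = np.random.default_rng(seed)
    Q, V = np.zeros((4, 2)), np.zeros(4)
    stim = rng.permutation(np.repeat(np.arange(4), n_trials // 4))
    act, rew = np.zeros(len(stim), dtype=int), np.zeros(len(stim), dtype=int)
    for t, s in enumerate(stim):
        a = GO if rng.random() < prob_go(Q, V, s, b_go, b_pav, xi) else NOGO
        p_best = 0.8 if a == CORRECT[s] else 0.2
        r = BEST[s] if rng.random() < p_best else WORST[s]
        learn(Q, V, s, a, r, rho_app, rho_av, lam)
        act[t], rew[t] = a, r
    return stim, act, rew


def neg_log_post(x, stim, act, rew, prior_sd):
    """Negative log likelihood of the choices plus a weak Gaussian prior on x."""
    rho_app, rho_av, lam, b_go, b_pav, xi = unpack(x)
    Q, V = np.zeros((4, 2)), np.zeros(4)
    nll = 0.0
    for s, a, r in zip(stim, act, rew):
        p = np.clip(prob_go(Q, V, s, b_go, b_pav, xi), 1e-12, 1 - 1e-12)
        nll -= np.log(p if a == GO else 1 - p)
        learn(Q, V, s, a, r, rho_app, rho_av, lam)
    return nll + 0.5 * np.sum((np.asarray(x) / prior_sd) ** 2)


def fit_map(stim, act, rew, prior_sd=3.0, n_starts=3, seed=0):
    """MAP estimate: best of several bounded L-BFGS-B runs (first start at zero)."""
    rng = np.random.default_rng(seed)
    starts = [np.zeros(6)] + [np.clip(rng.normal(0, 1, 6), -7, 7) for _ in range(n_starts - 1)]
    best = None
    for x0 in starts:
        res = minimize(neg_log_post, x0, args=(stim, act, rew, prior_sd),
                       method="L-BFGS-B", bounds=[(-8, 8)] * 6)
        if best is None or res.fun < best.fun:
            best = res
    return unpack(best.x), best


if __name__ == "__main__":
    stim, act, rew = simulate((5.0, 5.0, 0.2, 0.5, 0.5, 0.1))
    params, res = fit_map(stim, act, rew)
    print(np.round(params, 3), round(float(res.fun), 2))
```

Running `fit_map` once per participant yields one parameter vector each, which can then be compared between groups; in this sketch a larger `prior_sd` corresponds to weaker regularization.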
Hamilton Depression Rating Scale (HDRS-17) scores were our primary measure of depression severity and were used in correlational analyses. We also administered the Patient Health Questionnaire-9 (PHQ-9), the Beck Depression Inventory-II (BDI-II) and the Snaith-Hamilton Pleasure Scale (SHAPS). All participants provided HDRS-17 and PHQ-9 data; 14 healthy and 24 clinical participants provided BDI-II and SHAPS data.
We used a 3-Tesla Trio scanner (Siemens, Erlangen, Germany) with a 32-channel head coil. Whole-brain T2*-weighted echo-planar imaging (EPI) data were acquired using a sequence designed to minimize dropout in the striatum, frontal cortex, and amygdala. Each volume contained 43 slices of 3-mm isotropic data (echo time = 30 ms, repetition time = 3.01 s, slice tilt of -30 degrees, Z-shim of -0.4 mT/m·ms, ascending slice acquisition order). To account for T1 saturation effects, the first 5 volumes of each scan were discarded. Field maps were acquired to allow subsequent geometric distortion correction, and T1-weighted images were acquired for structural alignment. Physiological monitoring included measurements of pulse and breathing using the Spike2 data acquisition system (Cambridge Electronic Design Limited, Cambridge, UK). Preprocessing of the EPI data followed standard procedures, i.e. EPI unwarping using field maps, slice-time correction to the first volume, motion correction, spatial transformation to the Montreal Neurological Institute (MNI) template, and spatial smoothing with a Gaussian kernel of 8-mm full-width at half-maximum. Analyses used the Statistical Parametric Mapping package (SPM; www.fil.ion.ucl.ac.uk/spm).
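As a quick arithmetic check on the acquisition and preprocessing parameters above, the sketch below turns them into derived quantities. It assumes each block lasted exactly 12 min and that the 43 slices were spread evenly across the TR; both are simplifying assumptions, and the variable names are illustrative.

```python
import math

# Values taken from the text; BLOCK_MIN assumes each block lasted exactly 12 min.
TR_S = 3.01
N_SLICES = 43
BLOCK_MIN = 12
N_DISCARDED = 5
FWHM_MM = 8.0
VOXEL_MM = 3.0

vols_per_block = math.floor(BLOCK_MIN * 60 / TR_S)     # volumes acquired per block
vols_analysed = vols_per_block - N_DISCARDED           # after discarding T1-saturation volumes
slice_interval_ms = TR_S * 1000 / N_SLICES             # assumes slices spread evenly over the TR
sigma_mm = FWHM_MM / (2 * math.sqrt(2 * math.log(2)))  # Gaussian FWHM -> standard deviation
sigma_vox = sigma_mm / VOXEL_MM

print(vols_per_block, vols_analysed, round(slice_interval_ms, 1),
      round(sigma_mm, 2), round(sigma_vox, 2))
```

Under these assumptions each block yields about 239 volumes (234 after discarding), slices are about 70 ms apart, and the 8-mm FWHM kernel has a standard deviation of about 3.4 mm, roughly 1.13 voxels.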
Functional analyses were also performed in SPM (version 12b; www.fil.ion.ucl.ac.uk/spm). To assess brain activity during decision-making, i.e. in anticipation of action, we first formed regressors for each participant by convolving a canonical hemodynamic response function with stick functions at the onset of the fractal images for each of our four conditions of interest (Go-to-Win, Go-to-Avoid-Loss, etc.). The analyses included a regressor for decisions that were followed by a motor response, so as to partial out variance associated with actual motor execution. All models also included three regressors describing rigid-body translation and three describing rotation of the head, obtained from the realignment step during preprocessing, to reduce movement artifacts.
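The regressor construction described above can be sketched in Python. The double-gamma HRF below approximates SPM’s canonical HRF (gamma shapes 6 and 16, undershoot ratio 1/6, 32-s kernel); sampling at the start of each scan, the onsets in the test, and the helper names are simplifying assumptions, since SPM handles these details internally. A motor-response regressor would simply be one more onset list.

```python
import numpy as np
from scipy.stats import gamma


def double_gamma_hrf(dt, length_s=32.0):
    """Double-gamma HRF sampled every dt seconds (SPM-like defaults assumed), sums to 1."""
    t = np.arange(0.0, length_s, dt)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()


def stick_regressor(onsets_s, n_scans, tr, dt=0.1):
    """Convolve unit sticks at event onsets with the HRF on a fine grid, then sample per scan."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = np.zeros(n_fine)
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if idx < n_fine:
            sticks[idx] += 1.0
    conv = np.convolve(sticks, double_gamma_hrf(dt))[:n_fine]
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    return conv[scan_idx]


def design_matrix(onsets_by_condition, motion, n_scans, tr):
    """One HRF-convolved column per condition, then six motion columns, then a constant."""
    cols = [stick_regressor(onsets, n_scans, tr) for onsets in onsets_by_condition.values()]
    return np.column_stack(cols + [motion, np.ones(n_scans)])
```

Each condition column peaks about 5 s after its cue onsets, so cue-evoked activity for the four conditions can be estimated separately while the motion columns absorb head-movement variance.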
We created regressors of interest for outcome onsets, separately for win, loss and null outcomes, including only correct-response trials. These were complemented by separate regressors of no interest modelling trials in which the stimulus was not followed by a correct response. Incorrect responses were few (<5%) and likely to reflect processes, such as attentional lapses, underpinned by mechanisms unrelated to the study goals. We then tested hypotheses about differences in contrasts of interest with respect to psychopathology in three ways. First, we used functional regions of interest (ROIs) based on areas showing task-related activation or de-activation in all participants combined. We performed a 2x2 ANOVA with factors Go/No-Go and win/lose and looked across the entire sample for clusters significant at 5% FWE for the contrasts in question, but not for between-group differences, thereby avoiding ‘double-dipping’. For example, the ‘greater activation for action emission’ contrast was (Go-to-Win + Go-to-Avoid-Loss) − (No-Go-to-Win + No-Go-to-Avoid-Loss). These significant clusters defined the functional ROIs. Second, we used predefined ROIs, i.e. clusters of significant contrast within specific anatomical areas (e.g. SN/VTA) where activity was found to vary with choice in previous work. In both cases we formed contrasts using averaged ROI activations and compared them between groups. Third, we formed regressors corresponding to contrasts of interest at the first (individual) level and performed whole-brain, between-group t-tests at the group level (also see S1 File). For exploratory analyses, we used more liberally thresholded but similarly formed ROIs. Finally, we explored brain responses to reward prediction errors as in our previous analyses, forming 2x2 ANOVAs with factors Go/No-Go and better-than-expected/worse-than-expected and applying the functional ROI-based approach above.
All exploratory analyses were performed both with 5%-FWE and with 0.001-uncorrected thresholds (also see S1 File).
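The ROI comparison can be sketched as follows: contrast weights over the four conditions (ordered Go-to-Win, Go-to-Avoid-Loss, No-Go-to-Win, No-Go-to-Avoid-Loss) are applied to each participant’s ROI-averaged betas, and the per-participant values are compared between groups. The ROI itself would be defined on the combined-sample contrast, not on the group difference, which is what avoids double-dipping. The use of a standard two-sample t-test and all names below are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Column order of the ROI-averaged betas: GtW, GtAL, NGtW, NGtAL.
CONTRASTS = {
    "Go > NoGo": np.array([1, 1, -1, -1]),
    "NoGo > Go": np.array([-1, -1, 1, 1]),
    "Win > AvoidLoss": np.array([1, -1, 1, -1]),
    "Action x Valence": np.array([1, -1, -1, 1]),
}


def roi_contrast(betas, weights):
    """Per-participant contrast value from an (n_participants, 4) array of ROI betas."""
    return np.asarray(betas, dtype=float) @ weights


def compare_groups(betas_dep, betas_hc, weights):
    """Two-sample t-test (assumed) of the ROI contrast between depressed and control groups."""
    return stats.ttest_ind(roi_contrast(betas_dep, weights), roi_contrast(betas_hc, weights))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dep, hc = rng.normal(size=(33, 4)), rng.normal(size=(20, 4))  # synthetic betas
    for name, w in CONTRASTS.items():
        print(name, compare_groups(dep, hc, w).pvalue)
```

All four weight vectors sum to zero, so a constant offset in an ROI’s betas does not affect any contrast.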
All healthy controls scored 0–4 on the HDRS-17. Of the 39 depressed participants, 27 scored above the conventional cutoff of 14 for moderate depression; using the classification of Zimmerman and co-workers, 21 scored in the ‘mild’ range (8–16), 17 in the ‘moderate’ range (17–23) and one in the ‘severe’ range (over 23). We confirmed that the depression sample experienced substantial problems with the subjective experience of pleasure and interest in activities, using the SHAPS total score and a priori clinically relevant questions from the PHQ-9 and BDI-II. Every measure clearly separated the depression sample from the healthy sample, with virtually no overlap. Fig 2 illustrates psychometric measures specifically relevant to motivation.
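The severity bands quoted above can be expressed as a small lookup. This is an illustrative sketch; the label for scores below 8 and the helper names are assumptions, not part of the study’s analysis code.

```python
from collections import Counter


def hdrs_band(score):
    """HDRS-17 severity band: mild 8-16, moderate 17-23, severe above 23.
    Labelling scores below 8 as 'none' is an assumption for completeness."""
    if score > 23:
        return "severe"
    if score >= 17:
        return "moderate"
    if score >= 8:
        return "mild"
    return "none"


def above_cutoff(score, cutoff=14):
    """Conventional cutoff used in the text: scores above 14."""
    return score > cutoff


if __name__ == "__main__":
    example_scores = [9, 15, 18, 25]  # hypothetical scores, not study data
    print(Counter(hdrs_band(s) for s in example_scores))
    print(sum(above_cutoff(s) for s in example_scores))
```

Note that the two criteria overlap: a score of 15 or 16 counts as ‘mild’ in the banded scheme yet lies above the conventional cutoff of 14, which is how 27 participants can exceed the cutoff while only 18 fall in the moderate or severe bands.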
Shown are summed items ‘loss of interest’, ‘change in appetite’ and ‘loss of interest in sex’ from the Beck Depression Inventory (BDI), ‘interest’ and ‘appetite’ items from the Patient Health Questionnaire-9 (PHQ-9) and the total score of the Snaith-Hamilton Pleasure Scale (SHAPS). For each pairwise comparison, Wilcoxon p < 1e-06.
The ‘discovery’ phase of the experiment replicated previous findings (Fig 3A and 3B). Both healthy and depressed participants performed worse when required to withhold action, compared to emit action, in order to win. This, together with the opposite effect seen in the avoid-loss conditions, is the signature of Pavlovian bias.
a. Here, depressed and healthy groups are combined. Shaded bars: interquartile ranges; Notch: standard error of the median. During early trials participants perform much better in Go-to-Win than No-Go-to-Win, where they do worse than chance. b. Late trials, but otherwise as in a. All conditions show increased performance; each median has improved. However, differences in participants’ ability to learn the optimal behavior result in a greater spread of performance. This is particularly striking for the No-Go-to-Win condition. c. Performance in each quartile of trials for the Healthy Control group, showing learning across trials. Bars are +/- 1 standard error of the mean. d. As in c. but for the Depression group. Group differences between c. and d. are not significant.
We then fitted behavior with the ‘core model’ using uninformative priors and MAP fitting. Contrary to our hypotheses, we found no significant differences between the groups. Table 2 shows that for the key parameters of interest (Pavlovian bias, learning rate and especially motivational exchange rate), there is evidence that the two groups do not differ.
We then performed model comparisons (Table 3). We used the integrated Bayesian Information Criterion (iBIC) derived from EM fitting to compare models and to estimate the strength of evidence for or against using two separate sets of group-level, empirical prior distributions to estimate individual parameters. When the behavior of the two groups was fitted separately by a variety of different models, these performed worse than the ‘core’ model in terms of iBIC, with one important exception: for the healthy group, the ‘core’ model with the six parameters of Table 2 came second (by 17.7 BIC units) to a model that excluded the Pavlovian bias. However, when the comparison of common empirical priors (last column) with separate priors is taken into account, the evidence overall strongly favors the core model with common priors (by 60.83 units). Fig 4 shows how similar the parameter posterior distributions are for the two groups under this winning 6-parameter model.
Key: ‘b’ return sensitivity, so that ‘bb’ refers to a model with separate appetitive and aversive sensitivities; ‘a’ learning rate; ‘p’ Pavlovian bias; ‘l’ lapse rate; ‘g’ Go bias (favoring action over inaction). Separate iBIC values for the two groups, and for the total sample, are shown in columns. Using a single set of parameters to describe the group distributions for the healthy and depressed groups achieves a better score, by about 61 BIC units, than summing the best of each separate fit. Fitting the two groups separately would therefore risk over-fitting, here over-emphasizing differences between the groups.
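For readers unfamiliar with the integrated BIC, the sketch below shows one common way of computing it: each participant’s marginal likelihood under the group-level prior is estimated by Monte Carlo sampling from that prior, and the summed log evidence is penalized by the number of group-level hyperparameters times the log of the total number of choices. Lower values are better. The toy Bernoulli likelihood and all function names are hypothetical, and the sketch assumes the prior hyperparameters have already been estimated (for example by EM).

```python
import numpy as np
from scipy.special import logsumexp


def log_marginal(loglik, data, prior_mean, prior_sd, n_samples, rng):
    """Monte Carlo estimate of log p(data | prior): average likelihood over prior samples."""
    thetas = rng.normal(prior_mean, prior_sd, size=(n_samples, len(prior_mean)))
    ll = np.array([loglik(theta, data) for theta in thetas])
    return logsumexp(ll) - np.log(n_samples)


def ibic(loglik, subjects, prior_mean, prior_sd, n_samples=2000, seed=0):
    """Integrated BIC (lower is better): -2 * summed log evidence + n_hyper * log(n_obs)."""
    rng = np.random.default_rng(seed)
    prior_mean = np.atleast_1d(np.asarray(prior_mean, dtype=float))
    prior_sd = np.atleast_1d(np.asarray(prior_sd, dtype=float))
    log_evidence = sum(log_marginal(loglik, d, prior_mean, prior_sd, n_samples, rng)
                       for d in subjects)
    n_hyper = 2 * len(prior_mean)          # one mean and one SD per parameter
    n_obs = sum(len(d) for d in subjects)  # total number of choices
    return -2.0 * log_evidence + n_hyper * np.log(n_obs)


def bernoulli_loglik(theta, choices):
    """Hypothetical toy model: each choice is 1 with probability sigmoid(theta[0])."""
    x = float(theta[0])
    c = np.asarray(choices, dtype=float)
    return float(np.sum(-c * np.logaddexp(0.0, -x) - (1.0 - c) * np.logaddexp(0.0, x)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    subjects = [rng.random(100) < 0.62 for _ in range(10)]  # synthetic choices
    print(ibic(bernoulli_loglik, subjects, [0.5], [1.0]))
    print(ibic(bernoulli_loglik, subjects, [5.0], [0.1]))  # poorly matched prior scores worse
```

Comparing common versus separate priors then amounts to comparing one iBIC computed over all participants with the sum of two iBICs computed per group; the separate-prior option pays for twice as many hyperparameters.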
The ‘trained’ version of the task used in the scanner was designed to induce equivalent performance in the two groups, guarding against differences that might be engendered by differential performance or learning. Indeed, there were no performance differences: the median fraction of correct responses was 0.94 for both the healthy and depression groups. The effect of trial valence on reaction time (RT) was significant, with Go-to-win trials being faster (overall sample Wilcoxon p = 0.0009), but this Pavlovian effect was the same in healthy (mean RT difference = 14.8 ms) and depressed participants (RT difference = 14.4 ms; p = 0.95 between groups). RT is a less valid index of Pavlovian bias during learning, because RT then also reflects deliberation about which choice is optimal. Still, we checked for differences in bias-related speeding between the groups in the discovery phase and found none.
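The RT analysis can be sketched as follows, assuming (this is not stated explicitly above) that Pavlovian speeding is computed per participant as mean Go-to-Avoid-Loss RT minus mean Go-to-Win RT, tested against zero with a Wilcoxon signed-rank test in the whole sample, and compared between groups with a Mann-Whitney (Wilcoxon rank-sum) test. The function names and synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def pavlovian_speeding(rt_gtal, rt_gtw):
    """Per-participant mean RT difference (ms): Go-to-Avoid-Loss minus Go-to-Win."""
    return np.array([np.mean(al) - np.mean(w) for al, w in zip(rt_gtal, rt_gtw)])


def summarize_speeding(diff_hc, diff_dep):
    """Signed-rank test of speeding in the whole sample; rank-sum test between groups."""
    both = np.concatenate([diff_hc, diff_dep])
    p_within = stats.wilcoxon(both).pvalue
    p_between = stats.mannwhitneyu(diff_hc, diff_dep, alternative="two-sided").pvalue
    return {"p_within": p_within, "p_between": p_between}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    diff_hc, diff_dep = rng.normal(15, 20, 20), rng.normal(15, 20, 33)  # synthetic, in ms
    print(summarize_speeding(diff_hc, diff_dep))
```

Rank-based tests suit RT differences, whose distributions are often skewed.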
We replicated the main task findings from the literature in the combined sample. Fig 5 shows key clusters responding to action emission (5a) or its withholding (5b). There was no significant cluster responding to valence or to the action x valence interaction that might index Pavlovian bias. We also found significant responses both for the main effect of reward prediction error (RPE) and for action x RPE interactions, with evidence for stronger RPE signals for Go actions (Figures A and B in S1 File). Better-minus-worse-than-expected outcome (bte-wte) contrasts showed prominent clusters in left inferior medial prefrontal cortex. Worse-minus-better-than-expected outcome (wte-bte) contrasts showed prominent clusters in bilateral anterior insula. These regions also showed clusters significant at 5% FWE for the respective interactions with action.
a. Go > No-Go contrast showed significant bilateral midbrain (crosshairs), caudate, insula, primary motor and supplementary motor clusters (FWE p<0.001). b. No-Go > Go contrast, i.e. areas activated when action was to be withheld, showed highly significant bilateral inferior frontal, medial prefrontal, posterior cingulate and hippocampal clusters, as well as a prominent left parieto-occipital cluster (all FWE p<0.001).
None of the contrasts that we tested, either at action anticipation or at outcome, showed evidence of dependence on depression in the hypothesis-driven analyses described above (Fig 6). As an important example, we tested for group differences with respect to the simple contrasts win-loss and loss-win, using functional regions of interest (see S1 File for more details about regressors and illustration of the functional ROIs, Figure A in S1 File). These contrasts index positive and negative reward prediction errors, but they are not balanced for expected value in our task. Their validity as a measure of RPE therefore relies on expected value having no influence on brain activation in our paradigm. Several studies [12,13,20], including the present one, have found no such main effect based on the cue-onset regressors, justifying the use of this contrast. Using both 5% FWE and uncorrected p < 0.001 cluster thresholds, we found no significant differences between the groups.
Means and standard errors are shown for each of the four trial types. No area showed differences between the depression (blue) and control (black) groups. a. Right caudate region of interest, defined as the cluster significant at 5% FWE for the main effect of action over the entire sample. The ‘Go’ conditions have greater activation by construction, but no other differences are seen. b. As an example, this ROI is an anatomically predefined mask for substantia nigra/ventral tegmental area. It is a subset of the cluster indicated by cross-hairs in Fig 5A. The win versus avoid-loss difference is not statistically significant. c. Inferior frontal ROI defined by functional criteria (as per a.) of greater activation for holding back (‘No-Go’). d. As per c. but for the amygdala/hippocampal cluster.
Using HDRS-17 depression scores as a continuous measure did not yield any significant correlations with either neural activations or behavioral measures. In the exploratory analyses based on functional ROIs, an activation cluster comprising the bilateral supplementary motor areas (SMA) showed somewhat greater activation in the depression group (p = 0.034 uncorrected). Failure to reject the null hypothesis of no difference in most ROIs does not by itself constitute positive evidence for this null hypothesis. We therefore calculated Bayes factors (scaled Jeffreys-Zellner-Siow priors) and found that each ROI except the bilateral SMA provided moderate evidence in favor of the null hypothesis. By this Bayesian criterion, the bilateral SMA provided weak evidence for increased activation in the depression group. This is described further in ‘Exploratory findings in Supplementary Motor Area’ in S1 File.
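The Bayes factors above use scaled Jeffreys-Zellner-Siow (JZS) priors. As an illustration only, the following Python sketch computes a JZS Bayes factor for an independent two-sample t statistic following the formulation of Rouder et al. [30], integrating numerically over the prior variance g. The prior scale r = √2/2, the use of a two-sample t-test per ROI, the verbal cut-offs, and the function names `jzs_bf10` and `evidence_label` are assumptions made for this sketch, not necessarily the exact settings of the reported analysis; the group sizes 39 and 22 are taken from the sample.

```python
import math


def jzs_bf10(t, n1, n2, r=math.sqrt(2) / 2):
    """JZS Bayes factor BF10 for an independent two-sample t statistic.

    Prior on standardized effect size: delta ~ Cauchy(0, r), written as
    delta | g ~ Normal(0, g) with g ~ InverseGamma(1/2, r**2 / 2).
    """
    n_eff = n1 * n2 / (n1 + n2)   # effective sample size for two groups
    nu = n1 + n2 - 2              # degrees of freedom

    # Likelihood of t under H0 (delta = 0), up to a constant shared with H1
    m0 = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u):
        # Change of variable g = exp(u), so dg = g du
        g = math.exp(u)
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        lik = (1 + n_eff * g) ** -0.5 * (
            1 + t * t / ((1 + n_eff * g) * nu)) ** (-(nu + 1) / 2)
        return lik * prior * g

    # Composite Simpson's rule over a wide range of log g
    lo, hi, n = -20.0, 30.0, 20000   # n must be even
    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    m1 = total * h / 3
    return m1 / m0


def evidence_label(bf10):
    """Verbal label on a commonly used scale (1-3 weak, 3-10 moderate, >10 strong)."""
    favored = "H1" if bf10 >= 1 else "H0"
    k = bf10 if bf10 >= 1 else 1 / bf10
    if k < 3:
        strength = "weak"
    elif k < 10:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} evidence for {favored}"
```

For a group difference of exactly zero (t = 0) with these group sizes, `jzs_bf10(0.0, 39, 22)` returns about 0.27 (BF01 ≈ 3.7), which `evidence_label` classes as moderate evidence for H0. This shows why a sample of this size can, at best, yield moderate rather than strong support for the null.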
We used a well-characterized paradigm and a carefully selected and matched sample of outpatients to investigate motivational abnormalities in mild to moderate, but clinically important, MDD. Our tasks first assessed learning, just before scanning, and then neural activation during action anticipation and in response to outcomes. A considerable body of research [8,31,32] led us to expect that depressed participants would show blunted behavioral and neural responses to potential reward, but preserved responses in the context of potential losses. Instead, we found no such behavioral blunting and no differences in neural responses across the multiple brain areas in which we hypothesized that action values would be represented. In exploratory analyses, we found some evidence that the bilateral supplementary motor area was slightly more activated during anticipation of action in people with depression.
This evidence argues against the hypothesis that a disturbed Pavlovian bias plays an important role in depression. Our depressed participants were carefully chosen to be representative of patients seeking and receiving active, usually pharmacologic, treatment for this condition. We can therefore say with some confidence that disturbed Pavlovian bias is unlikely to impair recovery in such patients. The situation may be quite different in severely depressed patients, in whom disturbed Pavlovian and learning mechanisms may play more important roles. It also remains to be seen whether the treatment that our participants received had itself normalized their decision-making parameters. However, as brief treatment is unlikely to have modified relevant traits, our results suggest that disturbances in such parameters are not important trait-like risk factors for outpatient depression.
We did not replicate the widely reported blunted response to reward in depression [33–37]. Instead, our results parallel findings of intact prediction-error processing and emotional responsivity in the same sample of depressed patients, obtained with a different paradigm that tested for the full expression of an outcome prediction error. Importantly, here we distinguish reward-dependent learning from reward-dependent performance, and we limited our sample to moderate MDD. Crucially, participants were thoroughly trained by the time of neuroimaging, so performance was equal between groups, obviating performance confounds. This feature might explain the intact reward responses we observed in depression. It would also imply that effects seen in previous studies may be related to decrements in performance and associated processes. Thorough training may modify how feedback is subjectively perceived, how attention in depressed patients is allocated with respect to outcome valence, whether outcomes give rise to rumination, and so on. These processes may also impair the effective motivation of patients in ecological, non-laboratory contexts. Tentatively, we may associate such processes with reduced confidence, but further studies are needed to disentangle them. With a total N = 61, the absence of even trend-level effects for the negative findings reported here is striking. Heterogeneity of the mechanisms behind motivational disturbance in MDD may also contribute to non-replication, although it is unclear what form of heterogeneity could account for the pattern of results we observe. Furthermore, our study addressed in detail the motivating power of different types of gain and loss, rather than the evaluation of different outcomes.
Our study suggests that the basic neurobiological machinery of motivated action emission and inhibition is largely preserved in moderate MDD, consistent with studies that have found other aspects of reinforcement learning to be, somewhat unexpectedly, preserved or even improved in this disorder [19,38,39]. The paradigm that we used powerfully differentiated neural activation patterns during different types of decision-making (Figs 5 and 6), so if these patterns are disturbed in depression, the effect is very small. Bayesian metrics (Tables 2 and 4) showed that our study provided moderately strong support for the null hypothesis, going beyond the mere absence of evidence for a difference afforded by p-value testing. Consistent with the results we present here, our group also found intact RPEs in a non-learning context in the same patient sample.
The only exception was the bilateral supplementary motor area, where there was evidence for greater activation in the depression group upon action emission as contrasted with action withholding.
Both the task data and the neural data stand in contrast with the clinical data, including the subjective state of anhedonia, which is arguably the self-reported counterpart of decisional anhedonia (Fig 1). This may be due to the specific learning and decision-making demands of tasks involving simple associations and small monetary gains and losses. Such tasks can be performed well on the basis of associative operant learning and habit-based decision-making, just as described by our mathematical model. These basic mechanisms could be largely intact, suggesting that successful psychological therapies may recruit intact learning mechanisms to destabilize separate pathogenic mechanisms, rather than targeting generalized deficits in reinforcement learning. Intriguingly, a recent study of depressed patients, somewhat younger than our sample, found that those with stronger Pavlovian biases had a better prognosis. Such findings point to a possible translational potential not only in pinpointing disturbances that can be targeted by treatments, but also in identifying the intact resources that can be used in rehabilitative treatments.
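The associative mechanism referred to above can be illustrated with a sketch of the Go/No-Go learning model of Guitart-Masip et al. [12]. In that model, the propensity to emit an action combines a learned instrumental action value, a fixed go bias and a Pavlovian term proportional to the learned value of the stimulus. The Python below is a minimal sketch under stated assumptions: it uses a single reward-sensitivity parameter `rho` for both gains and losses, an 80/20 probabilistic feedback schedule and illustrative parameter values, so it is not necessarily the ‘winning’ model of Table 3. All function names, condition labels and parameter values are hypothetical.

```python
import math
import random

# Hypothetical condition labels: (correct action, valence); +1 = win trials, -1 = avoid-loss trials
CONDITIONS = {
    "go_to_win": ("go", +1),
    "nogo_to_win": ("nogo", +1),
    "go_to_avoid": ("go", -1),
    "nogo_to_avoid": ("nogo", -1),
}


def go_probability(q_go, q_nogo, v, bias, pav, lapse):
    """Probability of emitting 'Go' given current values (assumed model form)."""
    w_go = q_go + bias + pav * v   # instrumental value + go bias + Pavlovian term
    w_nogo = q_nogo                # withholding carries only its instrumental value
    p = 1.0 / (1.0 + math.exp(w_nogo - w_go))  # softmax over the two options
    return p * (1 - lapse) + lapse / 2         # mix in undirected lapses


def simulate(condition, n_trials=60, lr=0.2, rho=5.0, bias=0.5, pav=0.8,
             lapse=0.05, p_good_if_correct=0.8, seed=0):
    """Return the fraction of correct responses for one simulated stimulus."""
    correct_action, valence = CONDITIONS[condition]
    rng = random.Random(seed)
    q = {"go": 0.0, "nogo": 0.0}   # instrumental action values
    v = 0.0                        # Pavlovian (stimulus) value
    n_correct = 0
    for _ in range(n_trials):
        p_go = go_probability(q["go"], q["nogo"], v, bias, pav, lapse)
        action = "go" if rng.random() < p_go else "nogo"
        is_correct = action == correct_action
        n_correct += int(is_correct)
        # Probabilistic feedback: correct actions usually earn the better outcome
        p_good = p_good_if_correct if is_correct else 1 - p_good_if_correct
        good = rng.random() < p_good
        if valence > 0:
            reward = 1.0 if good else 0.0    # win trials: win or nothing
        else:
            reward = 0.0 if good else -1.0   # avoid-loss trials: nothing or loss
        # Rescorla-Wagner updates of the chosen action value and of the stimulus value
        q[action] += lr * (rho * reward - q[action])
        v += lr * (rho * reward - v)
    return n_correct / n_trials
```

With these illustrative settings, averaging `simulate` over many seeds typically gives higher accuracy for "go_to_win" than for "nogo_to_win", which is the conflict a Pavlovian bias produces when withholding action is required to obtain reward. Setting `pav=0` removes the Pavlovian contribution, leaving only the instrumental values and the go bias.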
One alternative, phenomenologically plausible hypothesis regarding the role of motivation in depression would be that the clinically observed deficits concern the emotional component of explicitly represented, planfully achieved future states. This would argue for a distortion of affective forecasting in depression. In our task, the decisional consequences of distorted affective forecasting may be masked both by the dominance of associative learning and by reliance on avoiding the worse of the two outcomes for each stimulus. Another alternative is that motivation in depression assumes pathogenic importance in domains ecologically relevant to the aberrant real-life beliefs about negative outcomes that depressed patients harbor.
Future research should target specific ways in which modestly dysfunctional reinforcement learning mechanisms may operate as part of networks of mechanistic factors, such as model-based reasoning applied to personal contingencies and the computational role of emotion, including affective forecasting. Most importantly, rather than addressing ‘depression’ (like fever!) in general, studies of learning-based motivation should specifically investigate the minority of patients who fail to benefit from learning-based interventions, especially cognitive-behavioral therapies, which tacitly assume that patients have adequate neurobiological resources at their disposal to learn new, adaptive behaviors in therapy. Our intriguing exploratory finding (somewhat increased supplementary motor activation during anticipation of action) awaits further research. Given that it was not accompanied by changes in reaction time or in neural activation during performance of the action and receipt of the outcome (Figure C in S1 File), if confirmed it may be related to increased subjective perception of effort in this condition.
The key limitation of the current study is its modest sample size, which means that small differences between groups may have been missed. The Bayesian analysis and our companion study render this less likely.
S1 File. Supplement.
Key Supplementary Material file, including additional methodological details, details of the group-comparison methodology, details of the analysis of brain responses to reward prediction errors, and exploratory findings in the Supplementary Motor Area. It also includes Figures A, B and C and references for the Supplementary Material. Figure A in S1 File. Outcome contrast maps for correct trials at uncorrected threshold p = 0.001. a. Win-lose contrast; cross-hairs at left ventral striatum. This cluster does not survive correction at 5% FWE. b. Lose-win contrast; cross-hairs at the level of the pretectum. Figure B in S1 File. Go responses (left side of each panel) elicit stronger contrasts than No-Go responses both in the mPFC area significantly sensitive to better-than-expected RPEs (a.) and in the insular area more sensitive to worse-than-expected RPEs (b.). Non-overlapping notches denote significance using a conservative non-parametric (Wilcoxon) test, for illustration. Figure C in S1 File. Supplementary motor area differentially activated in the Depression vs. Healthy control group. (a.) The cluster showing significantly greater activation during action anticipation, pFWE < 0.01. (b.) Activations for the four conditions, demonstrating lesser responsiveness in the healthy group. (c.) Contrasts during anticipation (according to action, i.e. anticipating Go > No-Go; according to valence, i.e. anticipating Win > Avoid Loss; and the interaction of the two, i.e. a measure of Pavlovian bias), at the onset of the action itself (the key press) and at receipt of outcome (Win > Lose only, null outcomes excluded).
S2 File. Key to data and notes.
This is a document detailing the variable names in the processed and raw data provided, and also how to access the imaging data uploaded into Neurovault. This document also explains the contents of the zip files HealthyControlBehavioralData.zip and DepressionGroupBehavioralData.zip which include the raw ‘discovery task’ data.
S3 File. Dataset—ModelParameters.
A comma-separated values (CSV) spreadsheet containing the detailed model parameters derived from the ‘winning’ model of Table 3 in the main text.
S4 File. Dataset—QuestionnaireData.
Raw data for the psychometric scales used in this work.
S5 File. Dataset–healthy control behavioral task.
Behavioral data from the ‘discovery’ task for the Healthy Control group.
We would like to thank Will Penny for statistical advice; the administrative staff of the REDIT trial; Dr. Quentin Huys for making analysis software available; Peter Dayan, Tobias Hauser and Alexandra Hopkins for advice and discussions; and many others. Ray Dolan is supported by a Wellcome Trust Senior Investigator Award (ref 098362/Z/12/Z). Michael Moutoussis is funded by the Neuroscience in Psychiatry Network (NSPN), via a Wellcome Trust (ref 095844/7/11/Z) Strategic Award on which Ray Dolan is a Principal Investigator. The Max Planck UCL Centre for Computational Psychiatry and Ageing Research is a joint initiative of the Max Planck Society and UCL. M. Moutoussis is also supported by the UCLH Biomedical Research Council. Peter Fonagy is in receipt of a National Institute for Health Research (NIHR) Senior Investigator Award (NF-SI-0514-10157). P. Fonagy was in part supported by the NIHR Collaboration for Leadership in Applied Health Research and Care (CLAHRC) North Thames at Barts Health NHS Trust. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. Cognitive experiments were realized using Cogent 2000, developed by the Cogent 2000 team at the FIL and the ICN, and Cogent Graphics, developed by John Romaya at the ION at the Wellcome Department of Imaging Neuroscience.
- 1. Bourdon KH, Rae DS, Locke BZ, Narrow WE, Regier DA. Estimating the prevalence of mental disorders in US adults from the Epidemiologic Catchment Area Survey. Public health reports. Association of Schools of Public Health; 1992;107: 663. pmid:1454978
- 2. Sajadi MM, Bonabi R, Sajadi M-RM, Mackowiak PA. Akhawayni and the First Fever Curve. Clinical infectious diseases. Oxford University Press; 2012;55: 976–980. pmid:22820543
- 3. APA. DSM-5 Diagnostic and statistical manual of mental disorders. 5th ed. American Psychiatric Publishing. Arlington: American Psychiatric Association; 2013.
- 4. Beck JS. Cognitive behavior therapy: Basics and beyond. Guilford Press; 2011.
- 5. Veale D. Behavioural activation for depression. Advances in Psychiatric Treatment. RCP; 2008;14: 29–36.
- 6. Moutoussis M, Shahar N, Hauser T, Dolan RJ. Computation in psychotherapy, or how computational psychiatry can aid learning-based psychological therapies. Computational Psychiatry.
- 7. Pizzagalli DA. Depression, stress, and anhedonia: toward a synthesis and integrated model. Annual review of clinical psychology. NIH Public Access; 2013;10: 393–423.
- 8. Huys QJ, Pizzagalli DA, Bogdan R, Dayan P. Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis. Biol Mood Anxiety Disord. 2013;3: 12. pmid:23782813
- 9. Easterly W. The happiness wars. The Lancet. Elsevier; 2011;377: 1483–1484.
- 10. Rutledge RB, Skandali N, Dayan P, Dolan RJ. A computational and neural model of momentary subjective well-being. Proceedings of the National Academy of Sciences. National Acad Sciences; 2014;111: 12252–12257.
- 11. Rutledge RB, Moutoussis M, Smittenaar P, Zeidman P, Taylor T, Hrynkiewicz L, et al. Association of neural and emotional impacts of reward prediction errors with major depression. Jama Psychiatry. American Medical Association; 2017;74: 790–797. pmid:28678984
- 12. Guitart-Masip M, Huys QJM, Fuentemilla L, Dayan P, Duzel E, Dolan RJ. Go and no-go learning in reward and punishment: Interactions between affect and effect. NeuroImage. 2012;62: 154–166. pmid:22548809
- 13. Guitart-Masip M, Duzel E, Dolan R, Dayan P. Action versus valence in decision making. Trends in cognitive sciences. Elsevier; 2014;18: 194–202. pmid:24581556
- 14. Johnson JD, Li W, Li J, Klopf AH. A computational model of learned avoidance behavior in a one-way avoidance experiment. Adaptive Behavior. 2002;9: 91–104.
- 15. Lieder F, Goodman ND, Huys QJM. Learned helplessness and generalization. Cognitive Science Conference http://wwwstanfordedu/ngoodman/papers/LiederGoodmanHuys2013pdf. 2013.
- 16. Mkrtchian A, Aylward J, Dayan P, Roiser JP, Robinson OJ. Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning. Biological Psychiatry. Elsevier; 2017;
- 17. Moutoussis M, El-Deredy W, Bentall RP. An Empirical Study of Defensive Avoidance in Paranoia. Behavioural and Cognitive Psychotherapy. 2013;FirstView: 1–8. pmid:24073930
- 18. Koster R, Guitart-Masip M, Dolan RJ, Düzel E. Basal ganglia activity mirrors a benefit of action and reward on long-lasting event memory. Cerebral Cortex. Oxford Univ Press; 2015;25: 4908–4917. pmid:26420783
- 19. Chase HW, Michael A, Bullmore ET, Sahakian BJ, Robbins TW. Paradoxical enhancement of choice reaction time performance in patients with major depression. Journal of Psychopharmacology. Sage Publications Sage UK: London, England; 2010;24: 471–479. pmid:19406853
- 20. Guitart-Masip M, Fuentemilla L, Bach DR, Huys QJM, Dayan P, Dolan RJ, et al. Action dominates valence in anticipatory representations in the human striatum and dopaminergic midbrain. The Journal of Neuroscience. Soc Neuroscience; 2011;31: 7867–7875.
- 21. Watkins CJCH, Dayan P. Q-learning. Machine learning. Springer; 1992;8: 279–292.
- 22. Moutoussis M, Bentall RP, El-Deredy W, Dayan P. Bayesian modeling of Jumping-to-Conclusions Bias in delusional patients. Cognitive Neuropsychiatry. 2011;
- 23. Efron B. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press; 2012.
- 24. Hamilton M. A rating scale for depression. Journal of neurology, neurosurgery, and psychiatry. BMJ Group; 1960;23: 56. pmid:14399272
- 25. Weiskopf N, Hutton C, Josephs O, Deichmann R. Optimal EPI parameters for reduction of susceptibility-induced BOLD sensitivity losses: a whole-brain analysis at 3 T and 1.5 T. Neuroimage. Elsevier; 2006;33: 493–504. pmid:16959495
- 26. Rutledge RB, Dean M, Caplin A, Glimcher PW. Testing the reward prediction error hypothesis with an axiomatic model. The Journal of Neuroscience. Soc Neuroscience; 2010;30: 13525–13536.
- 27. Guitart-Masip M, Economides M, Huys QJM, Frank MJ, Chowdhury R, Duzel E, et al. Differential, but not opponent, effects of L-DOPA and citalopram on action learning with reward and punishment. Psychopharmacology. Springer; 2014;231: 955–966. pmid:24232442
- 28. Kurniawan IT, Guitart-Masip M, Dayan P, Dolan RJ. Effort and valuation in the brain: the effects of anticipation and execution. The Journal of Neuroscience. Soc Neuroscience; 2013;33: 6160–6169. pmid:23554497
- 29. Zimmerman M, Martinez JH, Young D, Chelminski I, Dalrymple K. Severity classification on the Hamilton depression rating scale. Journal of affective disorders. Elsevier; 2013;150: 384–388. pmid:23759278
- 30. Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G. Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic bulletin & review. Springer; 2009;16: 225–237.
- 31. Chen C, Takahashi T, Nakagawa S, Inoue T, Kusumi I. Reinforcement learning in depression: A review of computational research. Neuroscience & Biobehavioral Reviews. Elsevier; 2015;55: 247–267.
- 32. Hagele C, Schlagenhauf F, Rapp M, Sterzer P, Beck A, Bermpohl F, et al. Dimensional psychiatry: reward dysfunction and depressive mood across psychiatric disorders. Psychopharmacology. Springer; 2015;232: 331–341. pmid:24973896
- 33. Arrondo G, Segarra N, Metastasio A, Ziauddeen H, Spencer J, Reinders NR, et al. Reduction in ventral striatal activity when anticipating a reward in depression and schizophrenia: a replicated cross-diagnostic finding. Frontiers in psychology. Frontiers Media SA; 2015;6.
- 34. Pizzagalli DA. Depression, stress, and anhedonia: toward a synthesis and integrated model. Annual review of clinical psychology. NIH Public Access; 2014;10: 393. pmid:24471371
- 35. Segarra N, Metastasio A, Ziauddeen H, Spencer J, Reinders NR, Dudas RB, et al. Abnormal Frontostriatal Activity During Unexpected Reward Receipt in Depression and Schizophrenia: Relationship to Anhedonia. Neuropsychopharmacology. Nature Publishing Group; 2015;
- 36. Weinberg A, Liu H, Hajcak G, Shankman SA. Blunted neural response to rewards as a vulnerability factor for depression: Results from a family study. American Psychological Association; 2015;
- 37. Weinberg A, Shankman SA. Blunted Reward Processing in Remitted Melancholic Depression. Clinical Psychological Science. SAGE Publications; 2016; 2167702616633158.
- 38. Knutson B, Bhanji JP, Cooney RE, Atlas LY, Gotlib IH. Neural responses to monetary incentives in major depression. Biological psychiatry. Elsevier; 2008;63: 686–692. pmid:17916330
- 39. Rothkirch M, Tonn J, Kohler S, Sterzer P. Neural mechanisms of reinforcement learning in unmedicated patients with major depressive disorder. Brain. Oxford University Press; 2017;140: 1147–1157. pmid:28334960
- 40. Lewis G, Lewis G. No evidence that CBT is less effective than antidepressants in moderate to severe depression. Evidence Based Mental Health. BMJ Publishing Group Ltd, Royal College of Psychiatrists and British Psychological Society; 2016; ebmental–2016.
- 41. Huys QJM, Gölzer M, Friedel E, Heinz A, Cools R, Dayan P, et al. The specificity of Pavlovian regulation is associated with recovery from depression. Psychological medicine. Cambridge Univ Press; 2016;46: 1027–1035. pmid:26841896
- 42. Charpentier CJ, De Neve J, Li X, Roiser JP, Sharot T. Models of Affective Decision Making: How Do Feelings Predict Choice? Psychological science. SAGE Publications; 2016;27: 763–775. pmid:27071751
- 43. Clery-Melin M-L, Schmidt L, Lafargue G, Baup N, Fossati P, Pessiglione M. Why don’t you try harder? An investigation of effort production in major depression. PloS one. Public Library of Science; 2011;6: e23178. pmid:21853083