This is an uncorrected proof.
Abstract
Humans and other animals learn the value of candidate actions by interacting with their environment, which invariably requires the exertion of effort. Dopamine has been implicated in both effort and reward learning, but little is known about how these processes interact. In this double-blind study, healthy young adults (N = 46) were randomized to receive either high-dose sulpiride (a post-synaptic D2-receptor antagonist) or placebo. Participants then completed a novel two-armed bandit task, in which they weighed the effort costs associated with each option against their expected rewards. Overall, learning accuracy was lower on sulpiride compared to placebo. Computational modeling revealed that this was driven by the capacity of effort to significantly modulate learning rates on placebo but, critically, not on sulpiride. Simulations showed that the capacity of effort to modulate learning rates plays an adaptive role by improving performance in agents whose learning would otherwise be compromised by low motivation. Together, these data provide causal evidence that dopamine supports the relationship between effort and learning, and reveal a novel role for dopamine in shaping how humans learn from the consequences of their actions.
Citation: Jarvis H, Obawede O, Huynh AQ, Coxon JP, Bellgrove MA, Chong TT-J (2026) Dopamine D2-receptor blockade in humans disrupts the effect of effort on learning. PLoS Biol 24(4): e3003765. https://doi.org/10.1371/journal.pbio.3003765
Academic Editor: Matthew F. S. Rushworth, Oxford University, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: July 31, 2025; Accepted: April 3, 2026; Published: April 16, 2026
Copyright: © 2026 Jarvis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data and original modeling code used for the main analyses in this study are publicly available from the corresponding author’s Github repository (https://github.com/huwjarv/sulpiride-effort-learning) and at the following link: https://doi.org/10.5281/zenodo.19245259.
Funding: M.A.B. is supported by grants from the National Health and Medical Research Council of Australia (App 2025415; App 2010899; https://www.nhmrc.gov.au). T.C. is supported by grants from the Australian Research Council (DE180100389; DP180102383; https://www.arc.gov.au). J.C. is supported by an Australian Research Council Future Fellowship (FT230100656; https://www.arc.gov.au). H.J. was supported by a Research Training Program (RTP) scholarship funded by the Australian Government Department of Education (CHESSN 3337449987; https://www.education.gov.au). The funders played no role in study design, data collection and analysis, decision to publish, or the preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: AIC, Akaike Information Criterion; AW, Akaike weights; BBI, Bang Blinding Index; MVC, maximal voluntary contraction; PD, Parkinson’s disease; RPEs, reward prediction errors.
Introduction
The exertion of effort and the capacity to learn from our mistakes are both fundamental to daily life. Separate literatures have emphasized the importance of dopaminergic neurotransmission in supporting effort-based decisions [1–5], and reinforcement learning [6–13]. Despite often being studied independently, effort and learning are functionally and behaviorally related, and a topical question has been how to reconcile the roles of dopamine in both processes [14]. Data suggest that dopamine may support an interaction between effort and learning [15–17], but this proposal has not been directly tested in humans.
Dopamine is known to play a causal role in decisions about whether to engage in effortful actions [18–20], and in the exertion of effort itself [1,2,5]. Humans and other animals generally consider effort to be aversive [18,21–23], such that the prospect of exerting effort reduces (or ‘discounts’) the subjective value of rewards on offer [24–26]. Traditional models of basal ganglia function stipulate that D2 receptors are primarily involved in movement inhibition [27,28], but more recent work has challenged this view. For example, D2 signaling is also linked to increased motor activation [29,30], and blockade of D2-receptors can heighten the aversiveness of effort [31].
A separate body of literature has implicated dopamine in the capacity to learn from reward [32,33]. Most notably, the firing rates of dopaminergic neurons convey the reward prediction errors (RPEs) that drive reinforcement learning [34–36]. Canonical models of reinforcement learning propose that individuals update their expectations by scaling the magnitude of the RPE as a function of a learning rate parameter [32], and previous work has shown that individuals tend to learn more quickly from positive relative to negative outcomes [37–40]. Disrupting dopaminergic signaling has been shown to impede reward-based learning [41–44], with D2-receptor blockade in humans appearing to modulate the relative efficiency with which individuals learn from rewards and punishments [43,45].
Together, these previous studies demonstrate an important role for dopamine in both the execution of effortful actions and the capacity to learn from reward. However, these two functions are typically studied in isolation, and an important question is how to reconcile these seemingly disparate functions of dopamine within a single framework [14,15,33]. There is growing evidence of a close behavioral and neurophysiological relationship between effort and learning. For example, exerting greater effort in a reward learning task augments the effect of positive RPEs, and attenuates the effect of negative RPEs [40]. Furthermore, studies in nonhuman animals suggest that effort can amplify dopaminergic activity associated with positive outcomes [46], and attenuate activity associated with negative outcomes [47]. These findings are consistent with the notion that the neural mechanisms underpinning reward-guided effort and learning are closely related, but a causal role for dopamine is yet to be established. An outstanding question, therefore, is whether intact dopamine signaling is necessary to support the effect of effort on learning in humans—and, if so, why.
In this study, healthy human volunteers completed a novel two-armed bandit paradigm, in which they had to exert high or low levels of physical force to register their responses [40]. We compared performance in participants on a dopamine D2-receptor antagonist (sulpiride, 800 mg; n = 23) against a separate group of participants on placebo (n = 19). First, in the placebo group, we aimed to replicate previous findings that effort modulates asymmetries in the efficiency of learning following rewarded and nonrewarded choices [40]. Next, we tested whether dopamine plays a causal role in maintaining this interaction by asking whether the effect of effort on learning was also present in the sulpiride group. Finally, we examined whether dopamine plays an adaptive role in maintaining task performance by simulating the effects of preserved and disrupted D2 signaling on the interaction between effort and learning.
Methods
Ethics
The experimental procedures in this study were approved by the Human Research Ethics Committee of Monash University in Melbourne, Australia (Project ID: 26350). The study was conducted in accordance with the principles of the Declaration of Helsinki. All participants provided written informed consent before commencing the study.
Participants
We recruited healthy adults aged between 18 and 40 years. Exclusion criteria were: females not on hormonal contraception; recent use of psychotropic medication or recreational drugs; personal history of neurological disease or psychiatric illness; and contraindication to the study drug. Four participants were excluded due to inconsistent behavior during calibration of the dynamometers, such that they exerted forces exceeding their nominal maximal voluntary contraction (MVC) by more than 10% during the experimental blocks. The final sample included 42 participants: 19 in the placebo group and 23 in the sulpiride group. Groups did not differ in terms of age, gender, or body mass index (all p > .1; Table 1).
Study design
We used a double-blind, randomized design to compare a single 800 mg dose of sulpiride to placebo. At an 800 mg dose, sulpiride effectively blocks post-synaptic D2-receptors [48,49]. This study was planned and conducted as a within-subjects cross-over design, in which participants were administered either sulpiride or placebo (microcrystalline cellulose) in identical-looking capsules across two counterbalanced sessions separated by one week to ensure drug washout (T1/2 = 8 hours). However, post hoc analyses after unblinding at the conclusion of the study revealed strategic changes in learning following the first session, which resulted in significant differences in behavior between the two sessions (S1 Text; S1 Fig). Consequently, in this paper, we analyze the effect of drug versus placebo in the first session only using a between-subjects design [e.g., 13].
Participants were randomized according to a pre-determined schedule prepared by an independent researcher. After ingesting the corresponding capsule in each session, participants were seated for 2 hours to allow for sulpiride to reach peak plasma concentrations before commencing the practice blocks [50–52]. We took baseline and hourly measures of heart rate and blood pressure, and reports of subjective state using a digital version of the Bond and Lader Visual Analogue Scales [53].
Drug reactions and blinding
Sulpiride did not alter heart rate or blood pressure relative to placebo (S1 Text; S2 Fig). There were no effects of sulpiride versus placebo on subjective reports of feeling strong/feeble, well-coordinated/clumsy, or lethargic/energetic, nor on aggregated factors of alertness, contentedness, and calmness. Small differences in feeling alert/drowsy did not survive correction for multiple comparisons (S1 Text; S3 Fig). No participant reported any adverse drug reactions during the testing session.
At the conclusion of testing, participants were asked whether they believed they had ingested the placebo, the active drug, or if they were unsure. These data were then used to calculate the Bang Blinding Index (BBI) for each drug group (1 = complete lack of blinding; 0 = perfect blinding; −1 = complete attribution of assignment to alternate group; [54]). Importantly, the BBI in both groups was close to zero, which reflects effective blinding in our study (placebo, BBI = 0 ± 0.17 (mean ± SEM), 95% CI = [−0.33, 0.33]; sulpiride, BBI = 0.26 ± 0.15, 95% CI = [−0.04, 0.56]; S1 Text). In addition, a supplementary analysis demonstrated that behavior was not significantly related to whether participants were able to correctly guess their assigned drug group (S1 Text).
Behavioral paradigm
To examine the effects of effort on choice and learning, we used a reward learning paradigm involving the exertion of effort [Experiment 2 in 40]. Specifically, participants in both drug groups completed a two-armed bandit task in which they were required to register their responses by applying pre-specified levels of physical force (‘low’ or ‘high’) to a pair of hand-held dynamometers (SS25LA, BIOPAC Systems, USA). Target force levels were standardized for each participant as proportions of their maximum voluntary contraction (MVC; low = 5%; high = 44%). MVCs for each hand were defined at the beginning of the experiment as the maximum force generated from three ballistic contractions with the corresponding dynamometer.
The experimental task consisted of two blocks of 150 trials. On each trial, participants were presented with a pair of abstract stimuli (fractals) on the left and right of the screen, and were required to select which was more rewarding based on probabilistic feedback received on previous trials (Fig 1A). The left/right location of each stimulus was randomized on every trial. On any given trial, both stimuli could have a high probability of being rewarded (P = 0.7); both could have a low probability (P = 0.3); or one could be superior to the other (P = 0.7 versus P = 0.3). Stimulus-reward contingencies changed after every 12 or 24 trials according to a pseudorandomized sequence, and contingency changes were not signaled to participants (Fig 1B). On rewarded trials, a ‘smiley face’ was presented for 0.5 s accompanied by a positively-valenced auditory tone (‘cash register’ sound effect). On nonrewarded trials, a ‘sad face’ was presented for 0.5 s with a negatively-valenced tone (‘wrong answer buzzer’ sound effect). Participants had a maximum of 2 s to register a response on each trial, otherwise a ‘Too slow!’ message was displayed for 0.5 s and then the next trial began. Participants were incentivized by the opportunity to increase their remuneration based on their performance.
(A) Participants made a series of choices between two abstract stimuli by applying physical force to a pair of hand-held dynamometers. One stimulus required a negligible amount of force to select (low effort stimulus; >5% MVC) and the other a greater amount of force (high effort stimulus; >44% MVC). Participants, therefore, had to balance an aversion to the high effort stimulus against their desire to maximize reward. (B) Stimulus-reward contingencies across a single block of 150 trials. The probability of reward upon selecting a given stimulus was either P = 0.7 or P = 0.3. These contingencies reversed every 12 or 24 trials, and reversals were not signaled to participants. Fractal images in panel (A) are by Optoskept, reproduced from https://commons.wikimedia.org under a Creative Commons Attribution 4.0 International license.
A critical feature of this task was that one of these two stimuli was always designated the ‘low effort’ stimulus, and could be selected by exerting only a negligible amount of force (>5% MVC). The other was designated the ‘high effort’ stimulus, and required a higher amount of force to be selected (>44% MVC). These stimulus-effort mappings remained constant for the duration of the experiment, and participants were explicitly informed about the identity of the low and high effort stimuli at the start of the testing session. To reinforce these stimulus-effort mappings, participants performed a preliminary block of 50 trials in which they were cued to generate the force corresponding to either the low or high effort stimulus (randomly determined). Participants then received binary feedback (correct versus incorrect) about whether they had generated the correct amount of force (5%–44% of MVC for the low effort stimulus, or >44% MVC for the high effort stimulus). In the subsequent experimental blocks, participants thus had to incorporate a consideration of both the effort required to select each stimulus, which was known in advance, as well as the expected reward, which had to be learned during the task.
The experiment was run in Psychtoolbox [55] implemented in Matlab version 9.4 (2018) [56], and presented on a monitor at a viewing distance of ~60 cm.
Data analysis
Statistical analysis.
Choice accuracy was defined as the proportion of trials on which the participant chose the stimulus associated with the higher probability of reward (when stimulus-reward contingencies were P = 0.3 versus P = 0.7). We also examined choice behavior with respect to win-stay and lose-switch strategies. Win-stay behavior was defined as the proportion of trials on which the participant selected the same choice stimulus again following a positive reward outcome. Lose-switch behavior was defined as the proportion of trials on which the participant switched to the alternative choice stimulus following a negative reward outcome. ANOVAs were fit with Type III sums of squares, and violations of sphericity were corrected using the Greenhouse-Geisser method [57]. All t tests were two-tailed, and p-values were corrected for multiple comparisons using the Bonferroni method [58]. Missed choices were assessed using a two-sample rank-sum test due to nonnormality of these data. Statistical analysis was performed in Jamovi version 2.4 (2023) [59]. Plots were created in Matlab version 9.4 (2018) [56].
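The win-stay and lose-switch definitions above can be sketched as follows. This is an illustrative Python implementation (the original analyses were run in Jamovi and Matlab); the function name and the example `choices`/`rewards` arrays are hypothetical.

```python
def win_stay_lose_switch(choices, rewards):
    """Compute win-stay and lose-switch proportions.

    choices: sequence of chosen stimulus IDs, one per trial.
    rewards: sequence of 0/1 reward outcomes, one per trial.
    Returns (win-stay proportion, lose-switch proportion).
    """
    win_stay, wins, lose_switch, losses = 0, 0, 0, 0
    for t in range(1, len(choices)):
        if rewards[t - 1] == 1:                       # previous trial rewarded
            wins += 1
            win_stay += choices[t] == choices[t - 1]  # repeated same stimulus
        else:                                         # previous trial nonrewarded
            losses += 1
            lose_switch += choices[t] != choices[t - 1]
    return (win_stay / wins if wins else float("nan"),
            lose_switch / losses if losses else float("nan"))

# Hypothetical trial sequence: two stimuli "A" and "B"
choices = ["A", "A", "B", "B", "A"]
rewards = [1, 1, 0, 1, 0]
ws, ls = win_stay_lose_switch(choices, rewards)
```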
Computational models of learning.
We investigated interactions between effort and learning on a trial-by-trial basis by considering a family of three computational models (M1-3) adapted from previous work [40]. All models shared a common structure that combined core features of the classical Rescorla-Wagner model of reinforcement learning [60] with canonical models of effort discounting [25]. The core learning model stipulates that, on every trial (t), the value (V) of the chosen stimulus (s) is updated according to an RPE (δ) (Eq. 1), which is the difference between the reward obtained (r = 0 or 1) and the reward expected based on the current stimulus value (V) (Eq. 2). The extent to which V is updated by δ is determined by the learning rate (α), which takes a value between 0 and 1.
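In standard Rescorla-Wagner notation, the update described above (Eqs. 1 and 2) can be written as:

```latex
% Eq. 1: value update for the chosen stimulus s
V_{t+1}(s) = V_t(s) + \alpha \, \delta_t
% Eq. 2: reward prediction error
\delta_t = r_t - V_t(s), \qquad r_t \in \{0, 1\}
```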
To model the effect of effort on decision-making, we also incorporated an effort discounting function [25], which reduces the reward value associated with a stimulus by the amount of effort required to select it. We thus distinguish reward value (V), which participants can learn using trial-by-trial reward feedback, from action value (V’), which also accounts for the effort required to select the stimulus in question (Fig 2B). This effort cost enters the model as Ec = 0.05 for the low effort stimulus and Ec = 0.44 for the high effort stimulus, scaled by a subject-specific effort discounting parameter (k) to capture each individual’s aversion to effort (Eq. 3).
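Based on the description above, the discounting rule (Eq. 3) takes the following form. The linear subtraction is an inference from the text; the precise functional form follows [25]:

```latex
% Eq. 3: action value = reward value discounted by scaled effort cost
V'_t(s) = V_t(s) - k \, E_c(s), \qquad E_c(s) \in \{0.05, 0.44\}
```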
(A) Schematic diagrams of candidate models (M1-3) depicting learning rates (α) (top row), and their predicted effects on RPEs (bottom row). M1 includes a single learning rate, M2 fits separate learning rates for positive and negative RPEs, and M3 stipulates that learning rates are sensitive to trial-by-trial effort exertion (Ex). (B) The action value (V’) associated with a given stimulus is modeled as its expected reward value (V) discounted by the effort cost required to select it (Ec). Plot depicts V = 0.75, k = 0.5. (C and D) Candidate models were simulated 100 times, yielding model recovery accuracy ≥ 0.84 for all models (C), and parameter reliability ≥ 0.7 for all parameters (D).
Finally, action values are converted into choice probabilities using a softmax function [61], in which the probability (P) of choosing a given stimulus (s1) depends on its associated action value relative to that of the nonchosen stimulus (s2; Eq. 4). An inverse temperature parameter (β) accounts for individual differences in choice stochasticity.
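For two options, this softmax rule (Eq. 4) reduces to a logistic function of the difference in action values:

```latex
% Eq. 4: probability of choosing stimulus s1 over s2
P(s_1) = \frac{1}{1 + \exp\!\left(-\beta \, [\, V'(s_1) - V'(s_2) \,]\right)}
```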
All three candidate models shared this common structure (Eq. 1–4), and differed only in how they estimated the learning rate (α). To allow for the possibility that learning rate varies at the level of single trials, we first defined α as a sigmoidal function of a signal gain term (G) [40].
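Assuming the standard logistic form of the sigmoid, the trial-wise learning rate is:

```latex
% Learning rate as a sigmoidal function of the signal gain term G
\alpha_t = \frac{1}{1 + \exp(-G_t)}
```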
We then let the specific contents of G be the defining feature of each candidate model (Fig 2A).
Baseline model (M1).
Model M1 reduces to a standard Rescorla–Wagner model, in which individuals learn equally well from positive and negative reward outcomes. In this model, G is estimated directly via a subject-specific signal gain parameter (γ).
Dual Learning Rates model (M2).
Model M2 was similar to M1, but accounted for the possibility that learning rates differ following positive and negative reward outcomes. For example, a common finding in reward learning paradigms is that learning rates tend to be higher for positive relative to negative RPEs [8,37–39]. Model M2 accounts for this possibility by estimating the difference between positive and negative learning rates with an additional free parameter (ω). This model allows for learning to be more efficient from positive than negative RPEs (ω > 0), vice versa (ω < 0), or equally efficient from both (ω = 0). Note that, like M1, this model assumes that effort has no effect on learning rates.
Effort Reinforcement model (M3).
Finally, based on our previous work, we included a model that accounts for the possibility that within-subject variations in learning rate are sensitive to the amount of effort invested in each choice [40]. This model is similar to M2 in assuming that learning rates are asymmetrical (i.e., differ following positive and negative reward outcomes). In M3, however, the ω parameter scales the effect of effort exerted (Ex) on the current trial, with Ex defined as the peak amplitude of force exerted as a proportion of the participant’s MVC. This model allows for the possibility that effort boosts learning from positive compared to negative RPEs (ω > 0), vice versa (ω < 0), or that effort has no effect on learning rates (ω = 0).
Thus, in both M2 and M3, the ω parameter captures the degree of learning rate asymmetry. The critical distinction is that, in M2, this parameter is not dependent on effort (ω). In contrast, in M3, this parameter captures the degree to which effort modulates learning by scaling the effect of effort on learning rate asymmetry (ω · Ex).
Computational model fitting and comparison
Candidate models were fit to the observed choice data using maximum likelihood estimation. The best-fitting parameter values were estimated for each participant separately, using flat priors for all parameters. Model fits were compared based on the Akaike Information Criterion (AIC), which prevents overfitting by penalizing models that have a greater number of free parameters [62]. AIC scores were summed across participants to calculate overall model fits in each drug group. We also calculated Akaike weights (AW) to quantify the relative likelihood that the winning model best accounted for the observed data compared to others in the model space (Eq. 9):
w_i = exp(−Δ_i / 2) / Σ_{j=1}^{K} exp(−Δ_j / 2)

where w_i is the Akaike weight of model i; Δ_i is the difference in AIC between model i and the winning model; and K is the number of models in the space.
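As a concrete illustration, Akaike weights for a set of AIC scores can be computed as follows (a Python sketch; the original analyses used Matlab, and the example AIC values are hypothetical):

```python
import math

def akaike_weights(aics):
    """Convert a list of AIC scores into Akaike weights."""
    best = min(aics)
    deltas = [a - best for a in aics]            # difference from winning model
    rel = [math.exp(-d / 2.0) for d in deltas]   # relative likelihoods
    total = sum(rel)
    return [r / total for r in rel]

# Example: three models; the lowest-AIC model takes nearly all the weight
weights = akaike_weights([120.0, 136.6, 154.4])
```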
We confirmed that each candidate model was uniquely identifiable by conducting a model recovery analysis. We ran 100 simulations per model, each of which generated synthetic data from 20 learning agents completing the experiment. We then repeated our model fitting procedure on these synthetic data and calculated the proportion of simulations on which the true generative model was successfully recovered as the winning model (Fig 2C). We also quantified the reliability of the parameter estimates from each model as the median rank-order correlation (Spearman’s ρ) between the true generative values and the recovered values across all 100 simulations (Fig 2D). Computational modeling was performed in Matlab version 9.4 (2018) [56].
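Parameter reliability as described above can be summarized with a rank-order correlation between generative and recovered values. Below is a minimal pure-Python sketch of Spearman's ρ (assuming no ties, for simplicity); the study used Matlab, and the example values are hypothetical:

```python
def spearman_rho(x, y):
    """Spearman rank correlation between two sequences (no ties assumed)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Recovered parameters that preserve the rank order of the true generative
# values yield rho = 1, even if their absolute scale is distorted
true_vals = [0.1, 0.4, 0.2, 0.9, 0.6]
recovered = [0.15, 0.55, 0.3, 1.2, 0.8]
rho = spearman_rho(true_vals, recovered)
```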
Results
Sulpiride did not alter motor performance or effort preference relative to placebo
To test whether sulpiride affected motor capacity, we analyzed participants’ force generation. First, we found no significant difference in MVCs recorded during the calibration phase between the sulpiride and placebo groups (p = .54; Fig 3A). We also confirmed that participants were able to successfully meet the required effort thresholds when registering responses in the experimental blocks (<0.7% of trials missed per participant on average), and found that this did not differ significantly between drug groups (z = −1.46, p = .15). Next, we used a two-way ANOVA to examine the average peak force exerted per response in the experimental blocks (defined as a proportion of each participant’s MVC). Average effort exertion was compared between Drug groups (placebo, sulpiride) as a function of the Stimulus type (low effort, high effort). As expected, the main effect of Stimulus was significant, such that participants exerted significantly more effort when selecting the high versus low effort stimulus (ΔMVC = 0.41 ± 0.01 (mean ± SEM), t(40) = −36.8, p < .001). However, neither the main effect of Drug nor the two-way interaction was significant (both p ≥ .33; Fig 3B). We then ran an equivalent ANOVA testing differences in the standard deviation of peak force across responses (Effort SD). Importantly, the main effect of Drug group was not significant (ΔEffort SD = 0.01 ± 0.01 (mean ± SEM), F(1,40) = 3.36, p = .074; Fig 3C). In this case, neither the main effect of Stimulus nor the interaction was significant (both p ≥ .22; Fig 3C). Finally, we compared the overall effort bias of each group based on the tendency of participants to select the less effortful option. We found no significant difference in the proportion of trials on which the low versus high effort stimulus was chosen between the two groups (p = .33; Fig 3D).
(A-C) Motor performance did not differ significantly between drug groups with respect to (A) MVCs, (B) mean effort exerted during the experimental blocks, or (C) standard deviation of effort during the experimental blocks. Dotted lines in (B) indicate minimum effort thresholds for the low and high effort stimulus, respectively. (D) Sulpiride did not affect preferences for low vs. high effort actions compared to placebo. (E–G) Learning performance differed between drug groups. Relative to placebo, participants on sulpiride exhibited (E) significantly lower choice accuracy (p = .042), and (F) were less likely to repeat rewarded choices (p = .029), but (G) were no different in their tendency to switch stimulus following non-rewarded choices (p = .4). In addition, across both drug groups, participants were less likely to repeat high effort choices than low effort choices, consistent with effort aversion (p ≤ .009; F and G). Accuracy in panel E is based on trials with reward contingencies of P = 0.7 vs. 0.3. Error bars depict the standard error of the mean. Statistical significance marks are shown for differences between drug groups. *p < .05; **p < .01; PL, placebo (n = 19); SP, sulpiride (n = 23). Data underlying this Figure can be found in S1 Data.
In sum, we found no strong evidence that sulpiride affected either the motor capacity or the willingness of participants to produce forceful responses compared to placebo.
Sulpiride impaired reward-based learning relative to placebo
Having established that sulpiride did not affect motor performance relative to placebo, we next tested whether sulpiride affected performance in the reward learning task. To do so, we first compared drug groups on choice accuracy, defined as the proportion of trials on which participants chose the more highly rewarded stimulus (on trials with unequal stimulus-reward contingencies). We used a two-way ANOVA to compare Accuracy between Drug groups (placebo, sulpiride) as a function of the Stimulus type (low effort, high effort). This revealed that, across both stimulus types, sulpiride reduced overall choice accuracy compared to placebo (Drug, ∆Accuracy = 0.04 ± 0.02, t(40) = 2.1, p = .042; Stimulus main effect and interaction, p ≥ .15; Fig 3E).
We next ran analogous ANOVAs to investigate whether sulpiride altered the tendency to make Win-stay and Lose-switch choices, as more specific measures of reward-guided behavior. We found a significant main effect of Drug in the Win-stay analysis, indicating that the sulpiride group was less likely than the placebo group to repeat rewarded choices (∆Win-stay = 0.05 ± 0.02, t(40) = 2.27, p = .029; Fig 3F). This analysis also revealed a main effect of Stimulus type, which reflected that, across both groups, participants were less likely to repeat rewarded high effort choices than rewarded low effort choices (∆Win-stay = −0.02 ± 0.01, t(40) = −2.75, p = .009; Fig 3F). Notably, the Drug × Stimulus interaction was not significant (p = .89).
In the Lose-switch analysis, we found no significant difference between Drug groups (p = .4; Fig 3G). However, we again found a significant main effect of Stimulus type, which in this case indicated that participants were more likely to switch stimulus following nonrewarded high effort choices than nonrewarded low effort choices (∆Lose-switch = 0.03 ± 0.01, t(40) = 2.9, p = .006; Fig 3G). Together with the Win-stay analysis, these effects of Stimulus type are indicative of effort aversion, such that participants were in general less likely to repeat high effort choices than low effort choices. Again, the Drug × Stimulus interaction effect was not significant, providing further evidence that sulpiride did not alter effort aversion in this study (p = .75).
In sum, these analyses reveal that overall reward-related performance was lower on sulpiride than placebo, and that this was associated with changes in win-stay, but not necessarily lose-switch, behavior.
Effort modulated learning rates on placebo, but not sulpiride
We next turned to our central question of how effort modulates learning, and the role of the dopamine D2-receptor in maintaining this relationship. We compared three candidate reinforcement learning models (M1-3) in each drug group (see Methods section for details): a Baseline model (M1) with a single learning rate; a Dual Learning Rates model (M2) capturing different learning rates for positive and negative reward outcomes; and an Effort Reinforcement model (M3) in which learning rates are sensitive to trial-by-trial effort exertion [40].
In the placebo group, we found that choice behavior was best captured by the Effort Reinforcement model (M3). This model yielded a superior fit compared to alternative models (∆AIC ≥ 16.59; Akaike weight > 0.99; Fig 4A–4C). Analysis of the key parameters in this model indicated that, as expected, the k parameter was significantly greater than zero, indicating that the prospect of exerting effort discounted the value of candidate actions (k = 0.24 ± 0.06, t(18) = 3.97, p < .001; Fig 5A). In addition, the ω parameter was significantly greater than zero, indicating that the overall effect of effort in this group was to increase learning rate asymmetry, boosting learning rates following positive reward outcomes, and blunting learning rates following negative reward outcomes (ω = 0.95 ± 0.37, t(18) = 2.59, p = .019; Fig 5A). To confirm that this model was not merely approximating dual learning rates independent of effort, we ran a permutation test in which we randomly shuffled the effort exerted across trials for each participant in the placebo group and re-fit the model. The empirical data provided a better fit than the permuted data on each of 1,000 permutations, confirming that learning rates in this group were sensitive to effort at the level of single trials (permuted p < .001).
(A–C) In the placebo group (top row, blue, n = 19), the empirical data were best explained by the Effort Reinforcement model (M3), which stipulates that learning rates are modulated by the exertion of effort. This model provided a superior fit to models M1 and M2 based on AIC scores (A) and Akaike weights (B). (C) M3 also provided an accurate account of observed choices on a trial-by-trial basis. The black line shows the proportion of participants on each trial selecting the low effort stimulus (with standard error in gray), and the blue line shows the mean choice probabilities derived from M3. (D–F) In the sulpiride group (bottom row, red, n = 23), the empirical data were best explained by the Dual Learning Rates model (M2), in which effort has no effect on learning rates. This model was superior to the alternative models in this group on the basis of AIC scores (D) and AIC weights (E), and provided an accurate account of observed behavior on sulpiride (F). Data underlying this Figure can be found in S1 Data.
(A) Parameter estimates from the Effort Reinforcement model (M3) in the placebo group (n = 19), including signal gain (γ), inverse temperature (β), effort discounting (k), and effort-sensitive learning rate asymmetry (ω). The k and ω parameters were both significantly greater than zero, confirming that the prospect of effort was aversive (p < .001), and that the exertion of effort increased positive and decreased negative learning rates (p = .019). (B) k and ω parameters were positively correlated (p < .001), indicating that effort modulated learning rates to a greater extent in those more averse to exerting it. (C) Parameter estimates from the Dual Learning Rates model (M2) in the sulpiride group (n = 23). The k parameter was significantly greater than zero (p = .021), consistent with effort aversion. Effort-insensitive learning rate asymmetry (ω) was also positive overall (p < .001), indicating higher positive than negative learning rates. (D) k and ω parameters were not significantly correlated in the sulpiride group (p = .33). Data underlying this Figure can be found in S1 Data.
We then repeated the same model comparison in the sulpiride group. Critically, choice behavior in these participants was best captured by the Dual Learning Rates model (M2), in which learning rates are entirely independent of effort (∆ AIC ≥ 34.41; Akaike weight > 0.99; Fig 4D–4F). As in the placebo group, k values were significantly greater than zero, indicating that individuals found effort to be aversive (k = 0.12 ± 0.05, t(22) = 2.48, p = .021; Fig 5C). Consistent with prior work, the φ parameter was also positive, indicating that learning rates were higher following positive outcomes than negative outcomes (φ = 2.06 ± 0.46, t(22) = 4.52, p < .001; Fig 5C).
Together, these results demonstrate that, across both the placebo and sulpiride groups, learning rates were higher for positive than negative reward outcomes. In the absence of dopaminergic blockade, effort reinforced learning by increasing learning rates for positive RPEs, and decreasing learning rates for negative RPEs [40]. Critically, however, this effect of effort on learning was absent in the sulpiride group, thus highlighting the importance of the dopamine D2-receptor in maintaining the effect of effort on reward learning.
On placebo, effort-sensitive learning rates attenuate the detrimental effect of effort aversion on accuracy
These results raise the question of why the exertion of effort should modulate learning at all, and whether dopamine plays an adaptive role in maintaining this relationship. To address these questions, we first considered the placebo group, and asked whether the degree of effort discounting (k) was related to the degree to which effort modulated learning rate asymmetry (φ) in the winning model (M3). Importantly, we found that these parameters were positively correlated, such that those with higher k values also had higher φ values (r = 0.77, p < .001; Fig 5B). This suggests that individuals whose learning was most sensitive to effort exertion were also those least willing to choose high effort actions in the first place. We used a permutation test to verify that this significant correlation was not merely due to these parameters trading off during model fitting [63]. On each permutation, we generated synthetic data from M3 using randomly sampled, uncorrelated k and φ values for 20 participants. Across 1,000 permutations, the recovered correlation between these parameters was as or more extreme than our empirical result on just three occasions, confirming that the relationship between k and φ in the placebo group was most likely driven by a true correlation between effort discounting and effort reinforcement (permuted p = .003).
One possible interpretation of this result is that effort plays an adaptive role by facilitating learning in those who are averse to exerting it in the first place. To test this possibility, we first fit a linear regression model predicting mean accuracy in the placebo group as a function of effort aversion (k), learning rate asymmetry (φ), and their interaction. Although none of these parameters was significant (p ≥ .21), the direction of the interaction effect would be consistent with an adaptive role for effort in participants most averse to investing it (S1 Text; S5 Fig). To investigate this more thoroughly, we ran a simulation analysis testing the predictions of model M3 in four groups of n = 100 simulated agents (Fig 6A, 6B). Each group varied according to their degree of effort aversion (‘low’, 0 ≤ k ≤ 0.15; ‘high’, 0.45 ≤ k ≤ 0.6), and the degree to which effort modulated their learning rate asymmetry (‘low’, 0 ≤ φ ≤ 1; ‘high’, 3 ≤ φ ≤ 4). Signal gain and temperature parameters were fixed at plausible values close to the observed group means (γ = 2, β = 6).
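The 2 × 2 agent simulation can be sketched as below. Only the parameter ranges and the fixed β = 6 (and γ = 2, omitted here along with the signal-gain transformation) are given in the text; the bandit environment (reward probabilities, effort levels, trial count) and the asymmetric learning-rate rule are illustrative assumptions rather than the authors' exact M3 equations.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_agent(k, phi, alpha=0.5, beta=6.0, n_trials=200):
    """Simulate one agent on a two-armed bandit in which the high-effort arm
    pays off more often; returns the proportion of correct (arm 1) choices."""
    p_reward = np.array([0.3, 0.7])     # assumed reward contingencies
    efforts = np.array([0.2, 0.8])      # low- vs. high-effort arm
    q = np.zeros(2)
    correct = 0
    for _ in range(n_trials):
        v = q - k * efforts             # effort-discounted values
        pr = np.exp(beta * (v - v.max()))
        pr /= pr.sum()
        c = rng.choice(2, p=pr)
        r = float(rng.random() < p_reward[c])
        delta = r - q[c]
        # Effort raises the rate for positive RPEs, lowers it for negative RPEs
        lr = alpha * (1 + phi * efforts[c]) if delta >= 0 else alpha * (1 - phi * efforts[c])
        q[c] += np.clip(lr, 0, 1) * delta
        correct += (c == 1)             # arm 1 has the higher reward rate
    return correct / n_trials

def simulate_group(k_range, phi_range, n_agents=100):
    """Mean accuracy over agents with k and phi drawn uniformly from the given ranges."""
    return np.mean([run_agent(rng.uniform(*k_range), rng.uniform(*phi_range))
                    for _ in range(n_agents)])

# Four groups crossing low/high effort aversion with low/high effort sensitivity
acc = {(ek, ephi): simulate_group(kr, pr)
       for ek, kr in [('low_k', (0.0, 0.15)), ('high_k', (0.45, 0.6))]
       for ephi, pr in [('low_phi', (0.0, 1.0)), ('high_phi', (3.0, 4.0))]}
```

Group-level accuracies computed this way can then be compared in the two-way ANOVA described next; in this toy environment the exact accuracies will differ from the published simulations, so the sketch is meant only to show the structure of the analysis.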
(A) Learning performance simulated using model M3 at positive values of learning rate asymmetry (φ; x-axis) and effort aversion (k; color shades). Each data point depicts mean choice accuracy (y-axis) based on 1,000 learning agents (total N = 25,000). (B) Choice accuracy (mean ± SEM; y-axis) of M3-simulated learning agents with low vs. high effort aversion (k; x-axis), further divided into those with low vs. high learning rate asymmetry (φ; white vs. black markers). Each group includes 100 simulated agents (total N = 400). The model predicts that, with preserved D2-receptor function, having more effort-sensitive learning rates improves the learning performance of individuals more averse to investing effort (p < .001). (C) Simulation of model M2 (N = 25,000) at positive values of learning rate asymmetry (φ; x-axis) and effort aversion (k; color shades). (D) Choice accuracy (mean ± SEM; y-axis) of M2-simulated individuals with low vs. high effort aversion (k; x-axis), further divided into those with low vs. high learning rate asymmetry (φ; white vs. black markers). Each group includes 100 simulated agents (total N = 400). The model predicts that, following D2-receptor blockade, effort aversion is detrimental to performance irrespective of learning rate asymmetry (p < .001). Data underlying this Figure can be found in S1 Data.
We formally tested the interaction between k and φ in a two-way, between-subjects ANOVA comparing overall choice accuracy. This revealed a significant k × φ interaction (F(1,396) = 10.1, p = .002), such that greater effort aversion (high versus low k) led to lower choice accuracy in the low φ group (∆ Accuracy = −0.042 ± 0.009, t(396) = −4.99, p < .001), but not the high φ group (p > .99). Accordingly, in simulated agents who were most averse to investing effort (high k), having learning rates that were more sensitive to effort (high versus low φ) significantly improved learning performance (∆ Accuracy = 0.038 ± 0.009, t(396) = 4.49, p < .001; low k group, p > .99).
This result indicates that agents who were less motivated to invest effort (high k) had lower accuracy relative to those who were more motivated (low k). Importantly, however, this effect was only seen in those whose learning rates were less sensitive to effort (low φ), and not in those whose learning rates were more sensitive to effort (high φ). This suggests that effort-sensitive learning rates may confer a learning advantage by significantly improving performance in those who are more averse to effort, while at least maintaining performance in those who are less averse to effort.
On sulpiride, greater effort aversion impairs accuracy regardless of learning rate
To investigate whether dopamine plays a role in maintaining this adaptive relationship between effort and learning rates, we performed the corresponding simulation on the sulpiride group using the M2 model. We began by testing whether effort discounting (k) in this group was significantly related to the asymmetry of learning rates (φ). Recall that, in this model, φ is not sensitive to effort, and therefore should not be expected to correlate with effort discounting. Accordingly, this correlation was not significant (p = .33; Fig 5D), confirming that effort discounting did not influence learning rate asymmetry once effort exertion was decoupled from learning.
We examined the predictions of model M2 by simulating learning performance in four groups of n = 100 agents using the same range of parameter values as in the previous analysis. Specifically, groups had fixed signal gain (γ = 2) and inverse temperature parameters (β = 6), but varied according to their degree of effort aversion (‘low’, 0 ≤ k ≤ 0.15; ‘high’, 0.45 ≤ k ≤ 0.6), and learning rate asymmetry (‘low’, 0 ≤ φ ≤ 1; ‘high’, 3 ≤ φ ≤ 4; Fig 6C, 6D). A two-way ANOVA showed significant main effects of both k and φ. In particular, accuracy was significantly lower in agents who were more versus less averse to effort (∆ Accuracy = −0.036 ± 0.006, F(1,396) = 34.03, p < .001). In addition, accuracy was significantly lower in those exhibiting large versus small learning rate asymmetries (∆ Accuracy = −0.045 ± 0.006, F(1,396) = 51.72, p < .001). Importantly, the k × φ interaction was not significant (p = .27), indicating that learning performance was impaired in agents who were less motivated to invest effort, and that this detrimental effect of low motivation was seen regardless of the degree of learning rate asymmetry.
Together with the preceding simulations, these results reveal that D2-receptor blockade removes a mechanism that can compensate for decrements in performance that otherwise result from being less motivated to exert effort.
Discussion
This study demonstrates a causal role for dopamine in supporting the interaction between effort and learning. In the placebo group, more forceful motor responses resulted in more efficient learning from positive outcomes, and less efficient learning from negative outcomes. Critically, blocking dopaminergic transmission with sulpiride disrupted the effect of effort on learning rates, and resulted in poorer learning accuracy relative to placebo. Model simulations revealed that effort-sensitive learning rates may play an adaptive role in maintaining learning performance. Under placebo, the capacity of effort to modulate learning rates served to, at the very least, maintain performance regardless of an agent’s level of motivation. In contrast, under sulpiride, agents who were less motivated to exert effort had poorer overall accuracy, and learning rates (that were no longer sensitive to effort) could no longer offset this deficit. Together, these data demonstrate an adaptive role for effort in modulating learning rates, and reveal a novel function of dopamine in maintaining this critical relationship.
The roles of dopamine in effort and learning are often studied in isolation. Here, we examined the interaction between effort and learning using a novel reward learning task which required individuals to integrate the learned value of a prospective reward with the effort required to obtain it. We then fit computational models that captured the effect of effort on learning at the level of single trials. Across both the placebo and sulpiride groups, learning rates were higher following positive relative to negative outcomes, which is consistent with previous reinforcement learning studies [37–40]. Importantly, in the placebo group, the exertion of effort modulated this learning rate asymmetry by further enhancing learning from positive outcomes, and attenuating learning from negative outcomes. This represents an important replication of previous work [40], and demonstrates the robustness of the finding that, in the presence of preserved dopaminergic signaling, effort reinforces learning.
Critically, we found that sulpiride disrupted the relationship between effort exertion and reward learning, which indicates that dopamine plays a central role in coupling these two processes. This finding broadly contrasts with traditional models of dopaminergic signaling, which draw a clear distinction between the role of dopamine in generating effortful actions versus learning from reward outcomes. For example, the tonic activity that guides engagement in effortful behavior [64,65] is typically distinguished from the phasic activity that drives reward-based learning [34,35]. However, more recent data suggest that such a dichotomy may oversimplify a more complex relationship [15–17]. For example, effort has also been associated with transient spikes in dopaminergic activity [46,66–68], and this phasic activity appears to play a role in reward valuation [15,46,69]. These recent findings predict that dopamine may serve to mediate an effect of effort exertion on reward processing. Here, we provide direct causal evidence in favor of this interpretation in humans, by showing that dopaminergic blockade dissolves the link between effort and reward learning. This result presents the possibility that transient, effort-induced dopamine signals may act on post-synaptic D2-receptors [70] to modulate phasic learning rates, and that this effect may be disrupted by D2 antagonism. However, the precise neurophysiological correlate of our effect will need to be determined in future studies.
Our findings raise the teleological question of why learning rates should be sensitive to effort at all. Data from the placebo group provide some insight into this question. On placebo, effort had a greater effect on learning (φ) in those who were more averse to exerting it (k; Fig 5B). These data parallel the psychological concept of ‘effort justification’, which describes the tendency of individuals who are more averse to investing effort to overvalue the rewards of any such investment [71]. This tendency is often attributed to the cognitive dissonance that arises from the aversiveness of effort [72]. However, our simulations suggest that the effect of effort on learned reward values may confer a behavioral advantage. In particular, agents who were more averse to the prospect of effort achieved higher choice accuracy if their learning rates were more versus less sensitive to effort exertion itself (Fig 6B). This suggests that the coupling of effort and learning may represent an ecological mechanism [73] that mitigates potential performance decrements associated with low motivation.
The consequences of dopaminergic blockade in our study highlight the functional importance of a mechanistic link between effort and learning. Dopamine has long been implicated in the regulation of learning rates [6–8,74,75]. Thus, our finding that overall accuracy was lower on sulpiride versus placebo was in itself unsurprising. However, recent work has also shown that dopamine encodes reward information in the context of effortful behavior [46,47]. Our model simulations shed light on the behavioral consequences of decoupling effort and reward through dopaminergic blockade. In the sulpiride group, accuracy was poorer in learning agents who were less motivated to exert effort relative to those who were more motivated (Fig 6D). Importantly, unlike in the placebo group, this detrimental effect of being poorly motivated could not be mitigated by higher learning rate asymmetries—which, in this group, were no longer sensitive to effort. Our simulations thus point to an adaptive role for dopamine in maintaining learning outcomes regardless of one’s motivation to exert effort.
Together, these findings provide broad insights into the functions of dopamine in learning and motivation. For example, our data build on established frameworks of reinforcement learning [35,76] by suggesting that dopamine controls learning in a specific, directional manner as a function of the effort requirements of candidate actions. The current results also inform an influential model of motivation, which stipulates that striatal dopamine encodes the average reward rate of the environment, and that the vigour of an action should increase when more rewards are available [64,65,77]. Here, we describe a mechanism that might facilitate this process—namely, that the exertion of an effortful action enables individuals to adapt to, and learn more quickly about, any increases in the prevailing reward rate.
An intriguing question is whether the observed effects of effort should hold in contexts other than reward-based learning. For example, recent work has proposed that choice behavior is shaped not only by rewards, but by the formation of habits via direct strengthening of recently taken actions [78–80]. It seems plausible that the dopaminergic mechanisms underpinning habit formation might also be sensitive to the amount of effort with which these actions are executed [81–83], but to our knowledge this has not been directly tested. It is also unclear whether our results should generalize to learning from punishment—that is, outcomes that are not only worse than expected, but wholly detrimental to the decision-maker. Given the complex role of dopamine (and other systems) in punishment learning [84–86], it is difficult to predict the specific modulatory effects of effort in this domain. Our results thus present interesting directions for future work on the role of effort outside the context of reward learning.
Finally, these results have important implications for clinical disorders of dopaminergic dysfunction, in which motivational deficits are often accompanied by learning deficits, such as in Parkinson’s disease (PD) [6], ADHD [87], and schizophrenia [88]. For example, individuals with PD often exhibit altered effort-based decision-making [20,25,89–91], as well as reinforcement learning [6–13]. Our data suggest that, in addition to impairments in motivation and learning, the behavior of these patients may be further compromised because effort is no longer able to ‘rescue’ the impaired learning that results from greater effort aversion. Future work will be needed to test this prediction in clinical groups with dopaminergic dysfunction.
In summary, we demonstrate a novel function of dopamine in supporting an adaptive link between effort and reinforcement learning. Our data show that learning rates are sensitive to effort exertion, and that the dopamine D2-receptor plays a causal role in maintaining this effect. By modeling the effects of effort and reward-based learning within a common computational framework, this work advances earlier attempts to reconcile the role of dopamine in effort and learning across species [15,92–94], and invites further consideration of how effort-learning interactions drive motivated behavior in health and disease.
Supporting information
S1 Text. Supplementary analyses.
Full details of additional statistical and computational analyses, including: session effects on choice behavior; drug effects on heart rate and blood pressure; drug effects on subjective feelings; assessment of participant blinding; controlling for effects of drowsiness; controlling for effects of effort variability; testing differences between experimental blocks; testing the effect of sulpiride on choice exploration; exploring the interaction between k and φ; computational models with declining learning rates; computational models fit across both drug groups.
https://doi.org/10.1371/journal.pbio.3003765.s001
(DOCX)
S1 Data. Processed data underlying main analyses.
Excel workbook containing processed data underlying the main statistical analyses in the paper, including data depicted in Figs 3A–3G, 4A–4E, 5A, 5C, 6B, 6D, S1A–S1F, S2, S3A, S3B, and S4.
https://doi.org/10.1371/journal.pbio.3003765.s002
(XLSX)
S1 Fig. Session effects on behavior.
PL, placebo; SP, sulpiride. Total N = 42. Error bars depict the standard error of the mean. Numbers inside markers in panels E and F denote first and second sessions. *p < .05, **p < .01. Underlying data can be found in S1 Data.
https://doi.org/10.1371/journal.pbio.3003765.s003
(TIF)
S2 Fig. Drug effects on heart rate and blood pressure.
Effects are plotted as a function of time post-ingestion of the sulpiride (red, n = 23) or placebo (blue, n = 19) capsule. Error bars depict the standard error of the mean. bpm, beats per minute; sBP, systolic blood pressure; dBP, diastolic blood pressure. Underlying data can be found in S1 Data.
https://doi.org/10.1371/journal.pbio.3003765.s004
(TIF)
S3 Fig. Drug effects on subjective feelings.
Effects are plotted as a function of time post-ingestion of the sulpiride (red, n = 23) or placebo (blue, n = 19) capsule. (A) Aggregated factor scores. (B) Selected individual scales. Error bars depict the standard error of the mean. Underlying data can be found in S1 Data.
https://doi.org/10.1371/journal.pbio.3003765.s005
(TIF)
S4 Fig. Learning curves averaged over contingency blocks.
Accuracy (mean ± SEM; y-axis) in the sulpiride (red, n = 23) and placebo (blue, n = 19) groups on each trial since the most recent change in stimulus-reward contingencies (x-axis). Hollow markers depict accuracy during learning, solid markers depict accuracy after contingencies have been fully learned. Underlying data can be found in S1 Data.
https://doi.org/10.1371/journal.pbio.3003765.s006
(TIF)
S5 Fig. Interaction between k and φ.
Exploratory simple slopes analysis showing predicted choice accuracy (y-axis) as a function of learning rate asymmetry (φ; x-axis) and effort aversion (k; colors) derived from M3 in the placebo group.
https://doi.org/10.1371/journal.pbio.3003765.s007
(TIF)
Acknowledgments
The authors thank Dylan Curtin, Mindaugas Jurgelis, Julia Koutoulogenis, Bridgitt Shea, and Eleanor Taylor for assisting with participant recruitment and screening; Patrick Cooper and Ziarih Hawi for overseeing participant randomization; and all those who volunteered as participants in this study.
References
- 1. Panigrahi B, Martin KA, Li Y, Graves AR, Vollmer A, Olson L, et al. Dopamine is required for the neural representation and control of movement vigor. Cell. 2015;162(6):1418–30. pmid:26359992
- 2. Le Bouc R, Rigoux L, Schmidt L, Degos B, Welter M-L, Vidailhet M, et al. Computational dissection of dopamine motor and motivational functions in humans. J Neurosci. 2016;36(25):6623–33. pmid:27335396
- 3. Salamone JD, Pardo M, Yohn SE, López-Cruz L, SanMiguel N, Correa M. Mesolimbic dopamine and the regulation of motivated behavior. Curr Top Behav Neurosci. 2016;27:231–57. pmid:26323245
- 4. Chong TT-J. Updating the role of dopamine in human motivation and apathy. Curr Opin Behav Sci. 2018;22:35–41.
- 5. Michely J, Viswanathan S, Hauser TU, Delker L, Dolan RJ, Grefkes C. The role of dopamine in dynamic effort-reward integration. Neuropsychopharmacology. 2020;45(9):1448–53. pmid:32268344
- 6. Frank MJ, Seeberger LC, O’reilly RC. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 2004;306(5703):1940–3. pmid:15528409
- 7. Cools R, Altamirano L, D’Esposito M. Reversal learning in Parkinson’s disease depends on medication status and outcome valence. Neuropsychologia. 2006;44(10):1663–73. pmid:16730032
- 8. Frank MJ, Moustafa AA, Haughey HM, Curran T, Hutchison KE. Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proc Natl Acad Sci U S A. 2007;104(41):16311–6. pmid:17913879
- 9. Moustafa AA, Cohen MX, Sherman SJ, Frank MJ. A role for dopamine in temporal decision making and reward maximization in parkinsonism. J Neurosci. 2008;28(47):12294–304. pmid:19020023
- 10. Bódi N, Kéri S, Nagy H, Moustafa A, Myers CE, Daw N, et al. Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson’s patients. Brain. 2009;132(Pt 9):2385–95. pmid:19416950
- 11. Palminteri S, Lebreton M, Worbe Y, Grabli D, Hartmann A, Pessiglione M. Pharmacological modulation of subliminal learning in Parkinson’s and Tourette’s syndromes. Proc Natl Acad Sci U S A. 2009;106(45):19179–84. pmid:19850878
- 12. McCoy B, Jahfari S, Engels G, Knapen T, Theeuwes J. Dopaminergic medication reduces striatal sensitivity to negative outcomes in Parkinson’s disease. Brain. 2019;142(11):3605–20. pmid:31603493
- 13. van Nuland AJ, Helmich RC, Dirkx MF, Zach H, Toni I, Cools R, et al. Effects of dopamine on reinforcement learning in Parkinson’s disease depend on motor phenotype. Brain. 2020;143(11):3422–34. pmid:33147621
- 14. Walton ME, Bouret S. What is the relationship between dopamine and effort? Trends Neurosci. 2019;42(2):79–91. pmid:30391016
- 15. Berke JD. What does dopamine mean? Nat Neurosci. 2018;21(6):787–93.
- 16. Tanaka S, O’Doherty JP, Sakagami M. The cost of obtaining rewards enhances the reward prediction error signal of midbrain dopamine neurons. Nat Commun. 2019;10(1):3674. pmid:31417077
- 17. Mohebi A, Pettibone JR, Hamid AA, Wong J-MT, Vinson LT, Patriarchi T, et al. Dissociable dopamine dynamics for learning and motivation. Nature. 2019;570(7759):65–70. pmid:31118513
- 18. Aberman JE, Salamone JD. Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience. 1999;92(2):545–52. pmid:10408603
- 19. Mai B, Sommer S, Hauber W. Motivational states influence effort-based decision making in rats: the role of dopamine in the nucleus accumbens. Cogn Affect Behav Neurosci. 2012;12(1):74–84. pmid:22012275
- 20. Chong TT-J, Bonnelle V, Manohar S, Veromann K-R, Muhammed K, Tofaris GK, et al. Dopamine enhances willingness to exert effort for reward in Parkinson’s disease. Cortex. 2015;69:40–6. pmid:25967086
- 21. Tsai LS. The laws of minimum effort and maximum satisfaction in animal behavior. National Research Institute of Psychology; 1932.
- 22. Hull CL. Principles of behavior. New York: Appleton-Century-Crofts; 1943.
- 23. Kurzban R. The sense of effort. Curr Opin Psychol. 2016;7:67–70.
- 24. Kurniawan IT, Seymour B, Talmi D, Yoshida W, Chater N, Dolan RJ. Choosing to make an effort: the role of striatum in signaling physical effort of a chosen action. J Neurophysiol. 2010;104(1):313–21. pmid:20463204
- 25. Chong TT-J, Bonnelle V, Husain M. Quantifying motivation with effort-based decision-making paradigms in health and disease. Prog Brain Res. 2016;229:71–100. pmid:27926453
- 26. Chong TT-J, Apps M, Giehl K, Sillence A, Grima LL, Husain M. Neurocomputational mechanisms underlying subjective valuation of effort costs. PLoS Biol. 2017;15(2):e1002598. pmid:28234892
- 27. Alexander GE, Crutcher MD. Functional architecture of basal ganglia circuits: neural substrates of parallel processing. Trends Neurosci. 1990;13(7):266–71. pmid:1695401
- 28. DeLong MR. Primate models of movement disorders of basal ganglia origin. Trends Neurosci. 1990;13(7):281–5. pmid:1695404
- 29. Calabresi P, Picconi B, Tozzi A, Ghiglieri V, Di Filippo M. Direct and indirect pathways of basal ganglia: a critical reappraisal. Nat Neurosci. 2014;17(8):1022–30. pmid:25065439
- 30. Soares-Cunha C, Coimbra B, David-Pereira A, Borges S, Pinto L, Costa P, et al. Activation of D2 dopamine receptor-expressing neurons in the nucleus accumbens increases motivation. Nat Commun. 2016;7:11829. pmid:27337658
- 31. Erfanian Abdoust M, Froböse MI, Schnitzler A, Schreivogel E, Jocham G. Dopamine and acetylcholine have distinct roles in delay- and effort-based decision-making in humans. PLoS Biol. 2024;22(7):e3002714. pmid:38995982
- 32. Niv Y. Reinforcement learning in the brain. J Math Psychol. 2009;53(3):139–54.
- 33. Webber HE, Lopez-Gamundi P, Stamatovich SN, de Wit H, Wardle MC. Using pharmacological manipulations to study the role of dopamine in human reward functioning: a review of studies in healthy adults. Neurosci Biobehav Rev. 2021;120:123–58. pmid:33202256
- 34. Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci. 1996;16(5):1936–47. pmid:8774460
- 35. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275(5306):1593–9. pmid:9054347
- 36. Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature. 2006;442(7106):1042–5. pmid:16929307
- 37. den Ouden HEM, Daw ND, Fernandez G, Elshout JA, Rijpkema M, Hoogman M, et al. Dissociable effects of dopamine and serotonin on reversal learning. Neuron. 2013;80(4):1090–100. pmid:24267657
- 38. Lefebvre G, Lebreton M, Meyniel F, Bourgeois-Gironde S, Palminteri S. Behavioural and neural characterization of optimistic reinforcement learning. Nat Hum Behav. 2017;1(4).
- 39. Palminteri S, Lefebvre G, Kilford EJ, Blakemore S-J. Confirmation bias in human reinforcement learning: evidence from counterfactual feedback processing. PLoS Comput Biol. 2017;13(8):e1005684. pmid:28800597
- 40. Jarvis H, Stevenson I, Huynh AQ, Babbage E, Coxon J, Chong TT-J. Effort reinforces learning. J Neurosci. 2022;42(40):7648–58. pmid:36096671
- 41. Swainson R, Rogers RD, Sahakian BJ, Summers BA, Polkey CE, Robbins TW. Probabilistic learning and reversal deficits in patients with Parkinson’s disease or frontal or temporal lobe lesions: possible adverse effects of dopaminergic medication. Neuropsychologia. 2000;38(5):596–612. pmid:10689037
- 42. Jocham G, Klein TA, Ullsperger M. Differential modulation of reinforcement learning by D2 dopamine and NMDA glutamate receptor antagonism. J Neurosci. 2014;34(39):13151–62. pmid:25253860
- 43. Janssen LK, Sescousse G, Hashemi MM, Timmer MHM, ter Huurne NP, Geurts DEM, et al. Abnormal modulation of reward versus punishment learning by a dopamine D2-receptor antagonist in pathological gamblers. Psychopharmacology (Berl). 2015;232(18):3345–53. pmid:26092311
- 44. Linden J, James AS, McDaniel C, Jentsch JD. Dopamine D2 receptors in dopaminergic neurons modulate performance in a reversal learning task in mice. eNeuro. 2018;5(1).
- 45. van der Schaaf ME, van Schouwenburg MR, Geurts DEM, Schellekens AFA, Buitelaar JK, Verkes RJ, et al. Establishing the dopamine dependency of human striatal signals during reward and punishment reversal learning. Cereb Cortex. 2014;24(3):633–42. pmid:23183711
- 46. Syed ECJ, Grima LL, Magill PJ, Bogacz R, Brown P, Walton ME. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat Neurosci. 2016;19(1):34–6. pmid:26642087
- 47. Stelly CE, Haug GC, Fonzi KM, Garcia MA, Tritley SC, Magnon AP, et al. Pattern of dopamine signaling during aversive events predicts active avoidance learning. Proc Natl Acad Sci U S A. 2019;116(27):13641–50. pmid:31209016
- 48. Takano A, Suhara T, Yasuno F, Suzuki K, Takahashi H, Morimoto T, et al. The antipsychotic sultopride is overdosed–a PET study of drug-induced receptor occupancy in comparison with sulpiride. Int J Neuropsychopharmacol. 2006;9(5):539–45. pmid:16288681
- 49. Beaulieu J-M, Gainetdinov RR. The physiology, signaling, and pharmacology of dopamine receptors. Pharmacol Rev. 2011;63(1):182–217. pmid:21303898
- 50. Mehta MA, McGowan SW, Lawrence AD, Aitken MR, Montgomery AJ, Grasby PM. Systemic sulpiride modulates striatal blood flow: relationships to spatial working memory and planning. Neuroimage. 2003;20(4):1982–94.
- 51. Mehta MA, Manes FF, Magnolfi G, Sahakian BJ, Robbins TW. Impaired set-shifting and dissociable effects on tests of spatial working memory following the dopamine D2 receptor antagonist sulpiride in human volunteers. Psychopharmacology (Berl). 2004;176(3–4):331–42. pmid:15114435
- 52. Helmy SA. Therapeutic drug monitoring and pharmacokinetic compartmental analysis of sulpiride double-peak absorption profile after oral administration to human volunteers. Biopharm Drug Dispos. 2013;34(5):288–301. pmid:23585286
- 53. Bond A, Lader M. The use of analogue scales in rating subjective feelings. Br J Med Psychol. 1974;47(3):211–8.
- 54. Bang H, Ni L, Davis CE. Assessment of blinding in clinical trials. Control Clin Trials. 2004;25(2):143–56. pmid:15020033
- 55. Brainard DH. The psychophysics toolbox. Spat Vis. 1997;10(4):433–6.
- 56. Matlab. Natick, Massachusetts: The MathWorks Inc.; 2018.
- 57. Greenhouse SW, Geisser S. On methods in the analysis of profile data. Psychometrika. 1959;24(2):95–112.
- 58. Dunn OJ. Multiple comparisons among means. J Am Stat Assoc. 1961;56(293):52–64.
- 59. Jamovi [Computer Software]. Sydney, Australia: The Jamovi Project; 2023.
- 60. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In: Black AH, Prokasy WF, editors. Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts; 1972. p. 64–99.
- 61. Luce RD. Individual choice behavior. Oxford, England: John Wiley; 1959.
- 62. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19(6):716–23.
- 63. Wilson RC, Collins AG. Ten simple rules for the computational modeling of behavioral data. Elife. 2019;8:e49547. pmid:31769410
- 64. Niv Y, Daw ND, Joel D, Dayan P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology (Berl). 2007;191(3):507–20. pmid:17031711
- 65. Beierholm U, Guitart-Masip M, Economides M, Chowdhury R, Düzel E, Dolan R, et al. Dopamine modulates reward-related vigor. Neuropsychopharmacology. 2013;38(8):1495–503. pmid:23419875
- 66. Howe MW, Dombeck DA. Rapid signalling in distinct dopaminergic axons during locomotion and reward. Nature. 2016;535(7613):505–10. pmid:27398617
- 67. da Silva JA, Tecuapetla F, Paixão V, Costa RM. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature. 2018;554(7691):244–8. pmid:29420469
- 68. Hughes RN, Bakhurin KI, Petter EA, Watson GDR, Kim N, Friedman AD, et al. Ventral tegmental dopamine neurons control the impulse vector during motivated behavior. Curr Biol. 2020;30(14):2681–2694.e5. pmid:32470362
- 69. Hamid AA, Pettibone JR, Mabrouk OS, Hetrick VL, Schmidt R, Vander Weele CM, et al. Mesolimbic dopamine signals the value of work. Nat Neurosci. 2016;19(1):117–26. pmid:26595651
- 70. Marcott PF, Mamaligas AA, Ford CP. Phasic dopamine release drives rapid activation of striatal D2-receptors. Neuron. 2014;84(1):164–76. pmid:25242218
- 71. Festinger L, Carlsmith JM. Cognitive consequences of forced compliance. J Abnorm Psychol. 1959;58(2):203–10. pmid:13640824
- 72. Festinger L. A theory of cognitive dissonance. Stanford, California: Stanford University Press; 1957.
- 73. McNamara JM, Trimmer PC, Houston AI. The ecological rationality of state-dependent valuation. Psychol Rev. 2012;119(1):114–9. pmid:22022832
- 74. Cools R, Frank MJ, Gibbs SE, Miyakawa A, Jagust W, D’Esposito M. Striatal dopamine predicts outcome-specific reversal learning and its sensitivity to dopaminergic drug administration. J Neurosci. 2009;29(5):1538–43. pmid:19193900
- 75. Coddington LT, Lindo SE, Dudman JT. Mesolimbic dopamine adapts the rate of learning from action. Nature. 2023;614(7947):294–302. pmid:36653450
- 76. Wise RA. Dopamine, learning and motivation. Nat Rev Neurosci. 2004;5(6):483–94. pmid:15152198
- 77. Guitart-Masip M, Beierholm UR, Dolan R, Duzel E, Dayan P. Vigor in the face of fluctuating rates of reward: an experimental examination. J Cogn Neurosci. 2011;23(12):3933–8. pmid:21736459
- 78. Miller KJ, Shenhav A, Ludvig EA. Habits without values. Psychol Rev. 2019;126(2):292–311. pmid:30676040
- 79. Bennett D, Niv Y, Langdon AJ. Value-free reinforcement learning: policy optimization as a minimal model of operant behavior. Curr Opin Behav Sci. 2021;41:114–21. pmid:36341023
- 80. Collins AGE. A habit and working memory model as an alternative account of human reward-based learning. Nat Hum Behav. 2026;10(2):357–69. pmid:41249816
- 81. Calabresi P, Picconi B, Tozzi A, Di Filippo M. Dopamine-mediated regulation of corticostriatal synaptic plasticity. Trends Neurosci. 2007;30(5):211–9. pmid:17367873
- 82. Matsuda W, Furuta T, Nakamura KC, Hioki H, Fujiyama F, Arai R, et al. Single nigrostriatal dopaminergic neurons form widely spread and highly dense axonal arborizations in the neostriatum. J Neurosci. 2009;29(2):444–53. pmid:19144844
- 83. Greenstreet F, Vergara HM, Johansson Y, Pati S, Schwarz L, Lenzi SC, et al. Dopaminergic action prediction errors serve as a value-free teaching signal. Nature. 2025;643(8074):1333–42. pmid:40369067
- 84. Oleson EB, Gentry RN, Chioma VC, Cheer JF. Subsecond dopamine release in the nucleus accumbens predicts conditioned punishment and its successful avoidance. J Neurosci. 2012;32(42):14804–8. pmid:23077064
- 85. de Jong JW, Afjei SA, Pollak Dorocic I, Peck JR, Liu C, Kim CK, et al. A neural circuit mechanism for encoding aversive stimuli in the mesolimbic dopamine system. Neuron. 2019;101(1):133–151.e7. pmid:30503173
- 86. Lopez GC, Lerner TN. How dopamine enables learning from aversion. Curr Opin Behav Sci. 2025;61:101476. pmid:39719969
- 87. Seidman LJ, Biederman J, Monuteaux MC, Doyle AE, Faraone SV. Learning disabilities and executive dysfunction in boys with attention-deficit/hyperactivity disorder. Neuropsychology. 2001;15(4):544–56. pmid:11761044
- 88. Waltz JA, Frank MJ, Robinson BM, Gold JM. Selective reinforcement learning deficits in schizophrenia support predictions from computational models of striatal-cortical dysfunction. Biol Psychiatry. 2007;62(7):756–64. pmid:17300757
- 89. Czernecki V, Pillon B, Houeto JL, Pochon JB, Levy R, Dubois B. Motivation, reward, and Parkinson’s disease: influence of dopatherapy. Neuropsychologia. 2002;40(13):2257–67.
- 90. McGuigan S, Zhou S-H, Brosnan MB, Thyagarajan D, Bellgrove MA, Chong TT-J. Dopamine restores cognitive motivation in Parkinson’s disease. Brain. 2019;142(3):719–32. pmid:30689734
- 91. Scott BM, Eisinger RS, Mara R, Rana A-N, Bhatia A, Thompson S, et al. Motivational disturbances and cognitive effort-based decision-making in Parkinson’s disease. Parkinsonism Relat Disord. 2025;134:107355. pmid:40120211
- 92. Beeler JA, Frazier CRM, Zhuang X. Putting desire on a budget: dopamine and energy expenditure, reconciling reward and resources. Front Integr Neurosci. 2012;6:49. pmid:22833718
- 93. Collins AGE, Frank MJ. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol Rev. 2014;121(3):337–66. pmid:25090423
- 94. Coddington LT, Dudman JT. Learning from action: reconsidering movement signaling in midbrain dopamine neuron activity. Neuron. 2019;104(1):63–77. pmid:31600516
- 95. Chamberlain SR, Müller U, Blackwell AD, Clark L, Robbins TW, Sahakian BJ. Neurochemical modulation of response inhibition and probabilistic learning in humans. Science. 2006;311(5762):861–3. pmid:16469930
- 96. Chong TT-J, Husain M. The role of dopamine in the pathophysiology and treatment of apathy. Prog Brain Res. 2016;229:389–426. pmid:27926449
- 97. Eisenegger C, Naef M, Linssen A, Clark L, Gandamaneni PK, Müller U, et al. Role of dopamine D2 receptors in human reinforcement learning. Neuropsychopharmacology. 2014;39(10):2366–75. pmid:24713613
- 98. Shiner T, Seymour B, Wunderlich K, Hill C, Bhatia KP, Dayan P, et al. Dopamine and performance in a reinforcement learning task: evidence from Parkinson’s disease. Brain. 2012;135(Pt 6):1871–83. pmid:22508958
- 99. Sommer WH, Costa RM, Hansson AC. Dopamine systems adaptation during acquisition and consolidation of a skill. Front Integr Neurosci. 2014;8:87. pmid:25414648