Depressive symptoms are associated with blunted reward learning in social contexts

Depression is characterized by a marked decrease in social interactions and blunted sensitivity to rewards. Surprisingly, despite the importance of social deficits in depression, non-social aspects have received disproportionate attention. As a consequence, the cognitive mechanisms underlying atypical decision-making in social contexts in depression are poorly understood. In the present study, we investigate whether deficits in reward processing interact with the social context and how this interaction is affected by self-reported depression and anxiety symptoms in the general population. Two cohorts of subjects (discovery and replication samples: N = 50 each) took part in an experiment involving reward learning in contexts with different levels of social information (absent, partial and complete). Behavioral analyses revealed a specific detrimental effect of depressive symptoms, but not anxiety, on behavioral performance in the presence of social information, i.e. when participants were informed about the choices of another player. Model-based analyses further characterized the computational nature of this deficit as a negative audience effect, rather than a deficit in the way others' choices and rewards are integrated in decision-making. To conclude, our results shed light on the cognitive and computational mechanisms underlying the interaction between social cognition, reward learning and decision-making in depressive disorders.


Introduction
One of the core clinical symptoms of depression is anhedonia, which refers to a reduced motivation to engage in daily life activities (motivational anhedonia) and a reduced enjoyment of usually enjoyable activities (consummatory anhedonia) [1,2]. In principle, this clinical manifestation could be explained by reduced reward sensitivity, both in terms of incentive motivation and in terms of reinforcement processes [3][4][5]. A direct prediction of this hypothesis is that depressive symptoms should be associated with reduced reward sensitivity in learning contexts, at both the behavioral and the neural level. However, while some studies do find evidence that depressive symptoms in the general population and in clinical depression are associated with blunted reward learning and reward-related signals in the brain [6,7], others report no effects [8,9] or mixed effects [5]. As a consequence, there is no strong consensus about which components of reward processing are most predictive of depressive symptoms, in either the general population or clinical depression [5].
Another striking clinical manifestation of depressive symptoms is a marked decrease in social interactions. Depression is indeed associated with social risk factors, social impairments and poor social functioning [10]. Surprisingly, despite the importance of the socio-cognitive impairments that are often associated with elevated depressive symptoms, non-social aspects have received disproportionate attention. Furthermore, when social aspects are investigated, the focus is often on emotional processing and theory of mind rather than on how social information is integrated to produce efficient goal-directed behavior [11]. In the present study, our goal was to investigate whether the reward learning deficit that is often associated with elevated depressive symptoms interacts with the social context [12].
According to social learning theory, a sizable proportion of decisions are not directly shaped by people's personal history of rewards and punishments, but are instead acquired through social observation [13]. More specifically, this framework posits that human learning occurs mostly in social contexts, where subjects can be influenced by social cues (i.e. others' choices and outcomes) [13,14]. To test how depressive symptoms affect the integration of social cues during reinforcement learning, we administered a variant of a previously validated observational learning task to two independent samples of participants [14,15]. Subjects also completed psychometric questionnaires assessing symptoms of depression and of anxiety (a frequently co-morbid condition). The task included a 'Private' learning condition, in which participants only had access to the outcome of their own choice, and two social conditions: the 'Social-Choice' condition, in which participants had access to the demonstrator's choices, and the 'Social-Choice+Outcome' condition, in which participants had access to the demonstrator's choices and their outcomes (Fig 1A and 1B).
Our design allowed us to test several hypotheses concerning the relation between depressive symptoms and learning performance in private and social contexts. First, we could test whether or not depressive symptoms degrade reward learning per se, as assumed by the standard account of depression as a reward sensitivity deficit. Second, by comparing the 'Private' and the 'Social' learning contexts, we could assess whether or not depressive symptoms are associated with a learning deficit in 'Social' contexts, as predicted by evidence of socio-cognitive impairments in depressed patients. Finally, computational analyses allowed us to characterize the learning deficit in the 'Social' contexts either as a primary social learning deficit (i.e. impaired imitation) or as a secondary social learning deficit (i.e. a negative audience effect).

Experimental protocol and quality checks
An online experiment was particularly suited to test our hypothesis because, compared to laboratory-based experiments, it provides a more diversified pool of subjects in terms of psychiatric traits and cognitive performance [16][17][18][19]. Specifically, we tested 50 participants from the general population and then ran a direct replication of the experiment on a second, independent sample of 50 participants. In the main text, we report meta-analytical p-values computed using a mixed-effects meta-analysis. In the tables, we present the results separately for each experiment and highlight the replication criteria proposed by the Open Science Framework [20].
Fig 1. (A) Participants first performed a training session before choosing their avatar for the task. They were then paired with another (simulated) player represented by an avatar that was neutral in trustworthiness and dominance. Participants then performed the behavioral task, which was organized in randomized blocks. Each block corresponded to a learning condition ('Private', 'Social-Choice' or 'Social-Choice+Outcome'), presented once with stable contingencies and once with unstable contingencies (reversal condition). After the task, participants completed the HAD questionnaire and performed the social evaluations as a post-test. (B) Behavioral task. In each condition, participants played in turn with a virtual demonstrator. In each of their own trials, participants received a reward or a punishment after their choice. In the Private blocks, participants did not see the choice or the outcome of the demonstrator. In the Social-Choice blocks, the choice of the demonstrator was displayed at each trial. In the Social-Choice+Outcome blocks, both the choice and the outcome of the demonstrator were displayed. (C) Learning behavior of the virtual demonstrator and the participants. The behavior of the virtual partner (top) was simulated using a reinforcement learning model (whose parameters were correctly recovered by our model optimization procedure: black dotted line). Participants accurately learned which option was the most rewarded across trials. In both the real and simulated tasks, a reversal of the contingencies occurred at the 10th (±1) trial (grey shaded area). https://doi.org/10.1371/journal.pcbi.1007224.g001
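This section does not spell out how the per-experiment results were pooled into the meta-analytical values. To make the idea concrete, here is a minimal sketch that combines the two per-experiment Pearson correlations with a fixed-effect, inverse-variance meta-analysis on Fisher z-transformed values. This is a swapped-in, simpler technique than the mixed-effects meta-analysis reported here, and the function name `meta_correlation` and the example correlations are hypothetical.

```python
import math


def meta_correlation(studies):
    """Fixed-effect meta-analysis of Pearson correlations.

    Assumption: fixed-effect pooling is a simplification; the paper reports
    a mixed-effects meta-analysis.
    studies: list of (r, n) pairs, one per experiment.
    Returns (r_meta, z_meta, p_meta).
    """
    # Fisher r-to-z transform; the sampling variance of z is 1 / (n - 3)
    zs = [math.atanh(r) for r, _ in studies]
    weights = [n - 3 for _, n in studies]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    z_stat = z_bar / se
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    # Back-transform the pooled Fisher z to the correlation scale
    return math.tanh(z_bar), z_stat, p_value


# Hypothetical correlations from a discovery (E0) and a replication (E1) sample
r_meta, z_meta, p_meta = meta_correlation([(0.25, 50), (0.15, 50)])
```

A random-effects model would add a between-study variance term to each weight. With only two experiments, that term is poorly estimated, which is one reason to treat this fixed-effect version as an illustration only.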
Levels of depressive and anxiety symptoms spanned a wide range (Table 1) [21], with good internal consistency (Hospital Anxiety and Depression Scale, depression subscale: Cronbach's α = .85; anxiety subscale: Cronbach's α = .84). Participants were paired with a virtual demonstrator and performed a probabilistic reinforcement learning task in three contexts: a 'Private' condition, in which participants performed the task individually with no access to the demonstrator's choices and outcomes, and two social conditions: the 'Social-Choice' condition, in which participants had access to the demonstrator's choices, and the 'Social-Choice+Outcome' condition, in which participants had access to the demonstrator's choices and their outcomes. Overall, participants displayed robust instrumental learning and chose the most rewarded symbol above chance in all conditions (meta-analysis 'Private': M_META = 0.65 ± 0.
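The internal consistency values above are Cronbach's alpha, α = k/(k − 1) · (1 − Σ var(item) / var(total)). Here is a minimal sketch of that standard formula, assuming a participants × items matrix of item scores for one subscale (each HADS subscale has 7 items scored 0 to 3). The function name and toy data are illustrative only.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a participants x items score matrix.

    responses: list of rows, one per participant, each with k item scores.
    Population variances are used throughout; the ratio is unchanged if
    sample variances are used consistently instead.
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    sum_item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Toy data (hypothetical): 4 participants x 3 items, scores 0-3
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]
print(cronbach_alpha(toy))
```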

Assessing observational learning
Unlike previous studies [14,15], we used an online adaptive learning algorithm to determine the demonstrator's behavior (Q-learning with learning rate = 0.5 and choice temperature = 10). As a consequence, the virtual demonstrators displayed realistic learning curves with some variability in performance (Fig 1C). We predicted that observational learning would result in a correlation between the participants' and the demonstrator's correct choice rates in a given learning session. As predicted, a higher correct choice rate for the demonstrator was associated with a higher correct choice rate for participants in both social conditions ('Social-Choice' condition: r_META = .20 ± 0.07, z_META = 2.89, p = .004; 'Social-Choice+Outcome' condition: r_META = .20 ± 0.07, z_META = 2.87, p = .004) but not in the private condition (r_META = -.01 ± 0.11, z_META = -0.05, p > .250; Fig 2A; see Table 2 for the results on the two samples separately).
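The demonstrator is described only by its algorithm and two parameter values. Here is a minimal sketch of such a Q-learning demonstrator on one 20-trial block. It assumes +1/−1 outcomes, Q-values initialized at 0, a two-option softmax in which the "temperature" of 10 multiplies the value difference (i.e. acts as an inverse temperature), and an optional reversal. The function name and these details are assumptions, not the authors' code.

```python
import math
import random


def simulate_demonstrator(p_good=0.7, n_trials=20, alpha=0.5, beta=10.0,
                          reversal_trial=None, seed=None):
    """Simulate a Q-learning demonstrator on a two-armed bandit.

    Assumptions: +1/-1 outcomes, Q-values start at 0, beta multiplies the
    value difference. Option 0 is initially the better cue; if reversal_trial
    is given, option 1 becomes the better cue from that trial index on.
    Returns a list of booleans: True when the better cue was chosen.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]
    correct = []
    for t in range(n_trials):
        best = 0 if reversal_trial is None or t < reversal_trial else 1
        # Two-option softmax = logistic function of the value difference
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p0 else 1
        p_reward = p_good if choice == best else 1.0 - p_good
        reward = 1.0 if rng.random() < p_reward else -1.0
        q[choice] += alpha * (reward - q[choice])  # delta-rule update
        correct.append(choice == best)
    return correct


# Example: correct choice rate of one simulated demonstrator in a reversal block
block = simulate_demonstrator(p_good=0.8, reversal_trial=10, seed=42)
rate = sum(block) / len(block)
```

Because each block draws its own random outcomes, repeated calls yield the block-to-block variability in demonstrator performance that the correlation analysis relies on.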
In order to confirm that participants actually integrated the virtual demonstrator as a social partner, we measured the influence of participants' ratings of the trustworthiness of the demonstrator's face on social learning. An effect of perceived trustworthiness was found, such that participants who perceived the demonstrator's avatar as more trustworthy had higher correct choice rates in the 'Social-Choice' (r_META = .32 ± 0.13, z_META = 2.54, p = .011) and in the 'Social-Choice+Outcome' conditions (r_META = .29 ± 0.10, z_META = 2.96, p = .003).

Correlation between depressive symptoms and performance
A significant effect of depressive symptoms was found, such that the higher the depressive symptoms, the lower the rate of correct choices in the 'Social-Choice' condition only (r_META = -.33 ± 0.
Table 2. Main statistical effects obtained by correlations on performance in the 'Private', 'Social-Choice' and 'Social-Choice+Outcome' conditions, with three replication criteria. For each correlation, we report the result (Pearson's correlation coefficient, p-value and t-value; ± corresponds to s.e.m.) in the first (E0) and the second (E1) experiment, as well as the meta-analytical p-value (E_META). For the results with a significant meta-analytical p-value, to better visualize replicability, we also explicitly report replication parameters ('+' = yes; '-' = no): (i) whether or not the E1 effect is within the 95% confidence interval of the E0 effect; (ii) whether or not the effect was significant in both experiments; (iii) whether or not E_META was significant. n.a.: not applicable.
linear logistic regression that included depressive and anxiety scores, taken as continuous between-subject variables (the regression also included a range of controls listed in Table 3).
The analysis revealed a significant effect of depression scores, such that the higher the depression scores, the lower the rate of correct choices in the 'Social-Choice' condition compared to the 'Private' condition (z_META = -2.85, p = .004; no other significant effect of depression or anxiety scores was found: all ps > .250; Fig 3A). Importantly, the negative effect of depressive symptoms in the 'Social-Choice' condition was particularly robust, because it was found in both the discovery and the replication sample and in the blocks with stable and reversal contingencies (within-subject
Finally, we tested whether the correct choice rates in the 'Social-Choice' condition distinguished participants with difficulties linked to depressive symptoms (i.e. scoring ≥ 8 on the HAD depression subscale [21]) from participants in whom these difficulties are absent. The classification analysis revealed that performance in the 'Social-Choice' condition identified participants with depressive symptoms with good accuracy (73 ± 1%) and good sensitivity, or true positive rate (82 ± 2%), but low specificity, or true negative rate (53 ± 3%) (Fig 4A).
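The classifier behind these figures is not specified in this section. To show how the three reported metrics relate to the HAD depression cutoff of 8, here is a minimal sketch. It uses a hypothetical rule that flags participants whose 'Social-Choice' correct choice rate falls below a threshold. The rule, the threshold and the function names are illustrative assumptions. A real analysis would typically derive the rule on some participants and evaluate it on others, a step omitted here.

```python
def flag_low_performers(correct_rates, threshold):
    """Hypothetical rule: flag participants whose correct choice rate is below threshold."""
    return [rate < threshold for rate in correct_rates]


def screening_metrics(had_depression_scores, flagged, cutoff=8):
    """Accuracy, sensitivity and specificity against the HADS-D >= cutoff criterion."""
    truth = [score >= cutoff for score in had_depression_scores]
    pairs = list(zip(truth, flagged))
    tp = sum(t and f for t, f in pairs)
    tn = sum((not t) and (not f) for t, f in pairs)
    fp = sum((not t) and f for t, f in pairs)
    fn = sum(t and (not f) for t, f in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true positive rate; needs at least one positive case
    specificity = tn / (tn + fp)  # true negative rate; needs at least one negative case
    return accuracy, sensitivity, specificity


# Toy example with four hypothetical participants
flags = flag_low_performers([0.50, 0.80, 0.60, 0.90], threshold=0.7)
acc, sens, spec = screening_metrics([10, 9, 3, 2], flags)
```

A rule tuned to catch most participants above the cutoff will tend to raise sensitivity at the cost of specificity, which is the trade-off visible in the reported 82% versus 53%.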

Computational model-based analyses
Although model-free analyses reveal a robust negative effect of depressive symptoms on learning in the 'Social-Choice' condition, they do not elucidate the cognitive mechanisms underlying this effect. Indeed, the effect of depressive symptoms could be due either to differences in the processing of social information, such as the demonstrator's choices and outcomes (i.e. a primary social learning deficit), or to differences in the weighting of the information generated by participants' own choices when social information is also available (i.e. a secondary social learning deficit or audience effect). These two hypotheses are hard to tease apart based on raw behavioral analyses, because both predict a reduced correct choice rate in the 'Social' conditions. Thus, to arbitrate between these two possibilities, we fitted a previously validated social reinforcement learning model [14,24]. This model allows the participant's choice to be biased by the demonstrator's choice in the 'Social-Choice' condition (i.e. imitation) and the value attributed to each symbol to be updated according to the demonstrator's outcome in the 'Social-Choice+Outcome' condition (i.e. vicarious trial-and-error). To directly assess the 'socially induced individual learning deficit' hypothesis [14], we allowed participants to have different individual learning parameters in the 'Private' condition (learning rate: α_P, temperature parameter: β_P) and in the 'Social' conditions (α_S, β_S).
More precisely, decision-making and individual learning were modeled with classical softmax (Eq 1) and delta-rule (Eq 2) functions, respectively, governed by choice randomness (or temperature) and learning rate parameters (α = α_P and β = β_P in the 'Private' condition; α = α_S and β = β_S in the 'Social' conditions). Here c and u denote the chosen and unchosen options on trial t, and Q_t their values:
$$P_t(c) = \frac{1}{1 + e^{\beta\,(Q_t(u) - Q_t(c))}} \qquad (1)$$
$$Q_{t+1}(c) = Q_t(c) + \alpha \cdot \mathrm{RPE}_t \qquad (2)$$
where RPE_t is the reward prediction error, calculated from the participant's outcome R_t as follows (Eq 3):
$$\mathrm{RPE}_t = R_t - Q_t(c) \qquad (3)$$
During the 'Social-Choice' condition, the model assumes that the demonstrator's choice d induces an 'action' prediction error (APE_t; Eq 4), which measures how surprising the demonstrator's choice is, given the subject's current estimate of the probability of selecting this option:
$$\mathrm{APE}_t = 1 - P_t(d) \qquad (4)$$
The APE_t is then used to bias the choice probability (Eq 5) in the subsequent trial, and the effect is scaled by a parameter κ ∈ [0, 1]:
$$P_{t+1}(d) \leftarrow P_{t+1}(d) + \kappa \cdot \mathrm{APE}_t \qquad (5)$$
Finally, in the 'Social-Choice+Outcome' trials, the model assumes that the demonstrator's outcome R^O_t induces an 'observational' reward prediction error (ORPE_t; Eq 6), which is scaled by the observational learning rate α_O ∈ [0, 1] (Eq 7):
$$\mathrm{ORPE}_t = R^{O}_t - Q_t(d) \qquad (6)$$
$$Q_{t+1}(d) = Q_t(d) + \alpha_O \cdot \mathrm{ORPE}_t \qquad (7)$$
To sum up, this computational model allowed us to address both primary social learning deficits (i.e. learning deficits captured by the parameters κ and α_O, which are specific to social information) and secondary social learning deficits (i.e. learning deficits captured by the parameters β_S and α_S, which are specific to individual learning in contexts where social information is available).
Interestingly, high depression scores were not only associated with decreased learning rates in the 'Social' conditions, but also with decreased learning rates in the 'Social' conditions when controlling for the learning rates in the 'Private' condition (z_META = -3.08, p = .002), which indicates that the presence of social information decreased the learning rate of the most depressed participants. To assess the complementary utility of computational measures, we tested whether the learning rate in the 'Social' conditions could identify participants with symptoms of depression (i.e. HAD depression subscale score equal to or above 8 [21]). The difference in learning rates detected participants with depressive symptoms (score ≥ 8) with good accuracy (64 ± 1%), good sensitivity (64 ± 2%) and good specificity (65 ± 3%).
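The model's components (softmax choice, delta-rule update, imitation bias and observational update, Eqs 1 to 7) can be written as trial-level functions. Here is a minimal sketch. It assumes two options indexed 0 and 1 and outcomes coded +1/−1, and that the caller passes α_P and β_P in 'Private' blocks and α_S and β_S in 'Social' blocks. ORPE_t here denotes the observational reward prediction error. The function names are hypothetical. Renormalizing the unchosen option after the imitation bias, so that the two probabilities still sum to 1, is also an assumption.

```python
import math


def softmax_choice_prob(q_c, q_u, beta):
    """Eq 1: probability of choosing option c over option u."""
    return 1.0 / (1.0 + math.exp(beta * (q_u - q_c)))


def private_update(q, choice, reward, alpha):
    """Eqs 2-3: delta-rule update of the chosen option; returns RPE_t."""
    rpe = reward - q[choice]
    q[choice] += alpha * rpe
    return rpe


def imitation_bias(probs, demo_choice, kappa):
    """Eqs 4-5: shift choice probabilities toward the demonstrator's choice.

    Returns (biased_probs, APE_t). Renormalizing the other option is an assumption.
    """
    ape = 1.0 - probs[demo_choice]
    biased = list(probs)
    biased[demo_choice] = probs[demo_choice] + kappa * ape
    biased[1 - demo_choice] = 1.0 - biased[demo_choice]  # keep the sum at 1
    return biased, ape


def observational_update(q, demo_choice, demo_reward, alpha_o):
    """Eqs 6-7: vicarious update of the demonstrator's chosen option; returns ORPE_t."""
    orpe = demo_reward - q[demo_choice]
    q[demo_choice] += alpha_o * orpe
    return orpe


# One illustrative 'Social-Choice' step with hypothetical parameter values
q = [0.0, 0.0]
p0 = softmax_choice_prob(q[0], q[1], beta=5.0)            # beta_S in a social block
probs, ape = imitation_bias([p0, 1.0 - p0], demo_choice=1, kappa=0.4)
```

Because κ ≤ 1 and APE_t = 1 − P(d), the biased probability P(d) + κ·APE_t never exceeds 1, so no clipping is needed.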

Model simulations analyses
Model-based analyses indicated that the severity of depressive symptoms was specifically associated with a reduced learning rate in the 'Social' conditions (α_S), a parameter that is used both in the 'Social-Choice' and in the 'Social-Choice+Outcome' conditions. By contrast, model-free behavioral analyses showed that the learning deficit associated with depressive symptoms was specific to the 'Social-Choice' condition. To ascertain that this computational result was compatible with our model-free observations, we ran the same statistical analysis on simulated data [25]. Crucially, data simulated using the fitted parameters accurately recovered the decrease in performance associated with depression scores in the 'Social-Choice' condition compared to the 'Private' condition, using the same mixed linear regression as on behavioral data (z_META = -2.72, p = .007), as well as the blunted effect of depression scores in the 'Social-Choice+Outcome' condition compared to the 'Private' condition (z_META = -1.74, p = .082). Therefore, although depressive symptoms are associated with decreased learning rates in both social conditions, their detrimental effect is manifest only in the 'Social-Choice' condition. This is probably because the demonstrator's outcomes are shown in the 'Social-Choice+Outcome' condition: this additional outcome information may compensate for the decreased learning rates associated with depressive symptoms. Confirming this intuition, our simulation analyses accurately recovered the absence of a significant effect of depressive symptoms in the 'Private' condition (z_META = -0.29, p > .250; S6 Fig). Thus, the simulations captured the specificity of the behavioral effect of depression scores and illustrate that our model provides an accurate description of the data.
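To illustrate the generative side of this analysis (not the regression itself), here is a minimal sketch that simulates 'Social-Choice' blocks. A Q-learning demonstrator (α = 0.5, β = 10, as stated in the text) chooses first, and the participant's choice is then biased toward it (Eqs 4 and 5). The turn order, the +1/−1 outcomes, the 30/70 contingencies, the default β_S and κ values and all function names are assumptions. Varying `alpha_s` lets one inspect how the simulated correct choice rate depends on the social-context learning rate.

```python
import math
import random


def p_option0(q, beta):
    """Two-option softmax: probability of choosing option 0."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def draw_outcome(choice, p_good, rng):
    """+1 with prob p_good for option 0 (the better cue), 1 - p_good for option 1; else -1."""
    p = p_good if choice == 0 else 1.0 - p_good
    return 1.0 if rng.random() < p else -1.0


def simulate_social_choice_block(alpha_s, beta_s, kappa, n_trials=20,
                                 p_good=0.7, rng=None):
    """Correct choice rate of one simulated participant in a 'Social-Choice' block."""
    rng = rng or random.Random()
    q_demo, q_subj = [0.0, 0.0], [0.0, 0.0]
    n_correct = 0
    for _ in range(n_trials):
        # Demonstrator's turn (assumed to come first): Q-learning, alpha=0.5, beta=10
        d = 0 if rng.random() < p_option0(q_demo, 10.0) else 1
        q_demo[d] += 0.5 * (draw_outcome(d, p_good, rng) - q_demo[d])
        # Participant's turn: softmax (Eq 1) biased toward the demonstrator's choice (Eqs 4-5)
        p0 = p_option0(q_subj, beta_s)
        probs = [p0, 1.0 - p0]
        ape = 1.0 - probs[d]
        probs[d] += kappa * ape
        probs[1 - d] = 1.0 - probs[d]
        c = 0 if rng.random() < probs[0] else 1
        # Private delta-rule update with the social-context learning rate (Eqs 2-3)
        q_subj[c] += alpha_s * (draw_outcome(c, p_good, rng) - q_subj[c])
        n_correct += int(c == 0)
    return n_correct / n_trials


def mean_correct(alpha_s, beta_s=5.0, kappa=0.3, n_sims=2000, seed=0):
    """Average correct choice rate over many simulated blocks (arbitrary defaults)."""
    rng = random.Random(seed)
    total = sum(simulate_social_choice_block(alpha_s, beta_s, kappa, rng=rng)
                for _ in range(n_sims))
    return total / n_sims


for a in (0.1, 0.3, 0.5):
    print(a, round(mean_correct(a), 3))
```

This sketch leaves out the 'Private' and 'Social-Choice+Outcome' conditions. Extending it would mean adding the observational update of Eqs 6 and 7 for the latter.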

Checking parameter recovery
As we were interested in the modulation of specific parameters by depression scores, we tested whether our task allowed us to successfully retrieve a correlation between a given parameter and an external variable in simulated datasets, an important quality check often referred to as 'parameter recovery' [25]. To do so, we ran 100 sets of simulations for each parameter, each simulating 100 participants, with the parameter of interest correlating with an arbitrary variable (standing in for the depression scores) and the other parameters being randomly set for each participant within the range obtained by optimization on the total sample. The simulated data were then fitted using our social reinforcement learning model. Overall, parameter recovery was very good, especially for the parameters of the social conditions, with significant correlations found in 100% of the simulated datasets (average correlation coefficient of the parameters: r = 0.73 ± 0.01). Importantly, the recovery of the correlations was specific to the manipulated parameter, with false alarms detected in less than 10% of the cases, except for the learning rate and choice temperature in the 'Private' condition (which was not our condition of interest) (Fig 5B). This result indicates that it is very unlikely that a correlation between one of our parameters and participants' HAD depression scores is due to an effect of depression scores on another parameter.
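Here is a minimal sketch of the core of this check for a single parameter, much reduced from the procedure described above. It simulates 'Private' learners whose learning rate depends on an arbitrary score, refits that learning rate by maximum likelihood, and correlates the recovered values with the score. It assumes +1/−1 outcomes, a 70/30 contingency, 40 trials per simulated participant, a logistic link between score and α, β held fixed at its true value, and a coarse grid search in place of the authors' optimizer. All of these choices and all function names are hypothetical. The full procedure also fits every other parameter and counts false alarms.

```python
import math
import random


def p_option0(q, beta):
    """Two-option softmax: probability of choosing option 0 (the better cue)."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def simulate_private(alpha, beta, n_trials, p_good, rng):
    """Simulate one Q-learner; returns (choices, rewards) with +1/-1 outcomes."""
    q, choices, rewards = [0.0, 0.0], [], []
    for _ in range(n_trials):
        c = 0 if rng.random() < p_option0(q, beta) else 1
        p_win = p_good if c == 0 else 1.0 - p_good
        r = 1.0 if rng.random() < p_win else -1.0
        q[c] += alpha * (r - q[c])
        choices.append(c)
        rewards.append(r)
    return choices, rewards


def neg_log_lik(alpha, beta, choices, rewards):
    """Negative log-likelihood of the observed choices under the same model."""
    q, nll = [0.0, 0.0], 0.0
    for c, r in zip(choices, rewards):
        p0 = p_option0(q, beta)
        nll -= math.log(max(p0 if c == 0 else 1.0 - p0, 1e-12))
        q[c] += alpha * (r - q[c])
    return nll


def fit_alpha(choices, rewards, beta):
    """Grid-search maximum-likelihood alpha, with beta held fixed (a simplification)."""
    grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return min(grid, key=lambda a: neg_log_lik(a, beta, choices, rewards))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def recovery_run(n_participants=100, n_trials=40, beta=5.0, seed=0):
    """One simulated dataset; returns (r(score, true alpha), r(score, recovered alpha))."""
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_participants)]
    # Assumed link: higher score -> lower alpha (logistic, stays inside (0, 1))
    true_alpha = [1.0 / (1.0 + math.exp(0.8 * s)) for s in scores]
    recovered = []
    for a in true_alpha:
        choices, rewards = simulate_private(a, beta, n_trials, 0.7, rng)
        recovered.append(fit_alpha(choices, rewards, beta))
    return pearson(scores, true_alpha), pearson(scores, recovered)


r_true, r_recovered = recovery_run()
```

Good recovery shows up as `r_recovered` having the same sign as `r_true` across many seeds. Repeating `recovery_run` with different seeds gives the proportion of datasets in which the correlation is recovered.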

Discussion
In the present study, we assessed reinforcement learning with a behavioral paradigm involving both private and social contexts, while concomitantly assessing depressive and anxiety symptoms in the general population. First, we replicated previous findings showing that participants integrate the demonstrator's choices and outcomes, which is consistent with the idea that social learning processes (both imitation and vicarious trial-and-error) play a role in human reinforcement learning [14,15,26,27,28]. Second, we showed that the severity of depressive symptoms is associated with a learning impairment that is specific to the learning context in which participants are informed about the demonstrator's choices (social context). This negative effect was robust across experiments and outcome contingencies. Finally, computational analyses allowed us to characterize the effect of depressive symptoms as a secondary social learning deficit, i.e. a reduction of the learning rate in social contexts. We found that depressive symptoms specifically affected performance in the 'Social-Choice' (imitation) condition. Crucially, the effect was robust to the inclusion of anxiety, which did not modulate performance in our task. That anxiety had no effect may come as a surprise, given that previous studies have found that anxiety is associated with deficits in social and non-social reinforcement learning [29]. One possible explanation is that anxiety might be more strongly linked to classical fear conditioning than to reward-based instrumental learning [30]. Depressive symptoms might thus undermine social reinforcement learning in instrumental and reward-maximization contexts, while anxiety might affect the same processes when outcomes are independent of the participants' choices (i.e. Pavlovian learning) and when outcomes have a negative valence (aversive contexts).
Model-free analyses per se do not allow us to pinpoint the psychological mechanisms underlying the negative effect of depression scores on correct choice rates in the 'Social-Choice' context. The absence of an interaction between the demonstrator's performance and depressive symptoms suggests that depressive symptoms did not lead participants to disproportionately follow 'bad examples' or to be insensitive to 'good examples'. However, interpretations based on negative results are, at best, tentative. To formally characterize the psychological mechanisms of the detrimental effects of depressive symptoms, we therefore turned to model-based analyses.
We fitted subjects' choices with a slightly modified version of a previously validated social reinforcement learning model [14]. As in standard algorithms, the model assumes that subjects learn option values via the calculation of a reward prediction error, that value updates are scaled by a learning rate (α_P) and that choices are generated via a softmax process whose stochasticity is governed by a temperature (β_P) [31]. In addition to this 'private' learning module, the model is also sensitive to social information: in the 'Social-Choice' condition, the demonstrator's choice biases the subject's subsequent choice (the magnitude of this effect is governed by an imitation rate κ), and in the 'Social-Choice+Outcome' condition, the demonstrator's outcome is integrated into the subject's value function with a vicarious learning rate (α_O). Finally, we also allowed for different private learning rates and temperatures in the 'Social' contexts (α_S and β_S). This parameterization allowed us to disentangle two different hypotheses concerning the drop in performance associated with depressive symptoms in the 'Social-Choice' condition. A correlation between depression scores and imitation rates and/or vicarious learning rates would imply what we define as a 'primary' social learning impairment (i.e. an impairment of the social learning processes per se). By contrast, a correlation with the 'Social' context-specific learning rate and/or temperature would imply a 'secondary' social learning impairment (i.e. an impairment of the private learning processes in the presence of social information).
We found that depression scores negatively correlated with the private learning rate in the social contexts (α_S), indicating that the effect was consistent with a secondary impairment and was specific to the learning (as opposed to the decision) process. In other words, our computational results suggest that one possible way in which depressive symptoms affect learning in social contexts is conceptually similar to a negative audience effect [32,33], where the presence of social signals (the demonstrator's choices) reduces subjects' instrumental performance.
From a methodological point of view, our study exemplifies how computational approaches can provide new insights into the way cognitive processes vary with clinical symptoms. Indeed, computational modeling demonstrated that the effect of depressive symptoms was selective to the way individual information was processed [34,35]. It is worth noting that these conclusions could only be drawn after carefully testing the ability of our task to identify precisely which model parameter was influenced by depressive symptoms [25]. The exact cognitive and psychological mechanisms that mediate the negative effect of social signals on instrumental performance remain to be characterized. One possibility, given that depressive symptoms are associated with lower cognitive functioning in general [36], is that the mere presence of others exacerbates these difficulties by capturing already scarce attentional resources. Alternatively, negative self-perception and negative comparison to others are core features of depression [37]. Therefore, it is possible that the most depressed participants perceived their demonstrator's behavior as more reliable, thus underweighting the information they acquired through their own experience.
Our results provide new evidence that depression-related reward learning deficits are highly context-dependent [3][4][5], and suggest that the difference in learning rates associated with depressive symptoms may only arise in social contexts [5,9]. Crucially, our results suggest that supposedly neutral aspects of the experimental setup (such as whether the task is performed in the presence or absence of an experimenter) may affect the results and explain inconsistent findings [38]. In line with recent proposals, our results also suggest that a deeper investigation of the socio-cognitive impairments associated with depressive symptoms may provide important new insights [10,11]. Following this idea, it would be particularly interesting to contrast the effect of depressive symptoms on learning when information is provided socially (as in the current study) versus asocially. Finally, we suggest that developing tools that assess reward learning outside and inside social contexts (characterized either by the presence of another player or by the social nature of the outcomes [39]) may, in the long term, help improve diagnosis and personalize treatment of depressive syndromes.
An obvious limitation of our study is that we did not control for participants' actual diagnosis and treatment, which may be problematic since medication interacts with decision-making in depression [40]. Therefore, our results would benefit from replication in carefully characterized populations, while controlling for medication status and medical history. Such a replication would also allow us to further measure the diagnostic value of our behavioral task and the associated computational model-based analyses. Indeed, in the present study, we only tested its ability to detect participants with depressive symptoms as identified by a self-rated scale [21]. It would be particularly interesting to test whether our behavioral and computational measures improve on existing self-assessments in detecting clinically diagnosed cases of depression [41]. Finally, longitudinal designs will be required to assess whether or not our behavioral and computational measures have good test-retest reliability, whether they reflect states or traits, and whether they predict the progression of depressive symptoms to a clinical diagnosis.
Our results have implications beyond their clinical relevance. Consistent with social learning theory, participants imitated the demonstrators' choices ('Social-Choice' condition) and learned from their outcomes ('Social-Choice+Outcome' condition) [13,14]. At the behavioral level, these two psychological processes were manifest in the fact that participants' performance was modulated by the demonstrators' performance. In particular, we found that participants observing a demonstrator performing 'well' performed better in the social than in the private learning context. Importantly, the opposite was also true: participants observing low-performing demonstrators displayed lower performance in the social than in the private context. This latter result is in apparent contrast with the normative view that imitation should be biased toward successful individuals in order to be evolutionarily adaptive [42][43][44]. It also contrasts with recent empirical evidence obtained with a very similar paradigm, showing that imitation rate is modulated by the actual performance of the demonstrator, so that demonstrators making random (i.e. non-reward-maximizing) decisions are imitated less [15]. Two differences between the previous design and ours may explain this discrepancy. First, the previous study involved mild electric shocks (a primary reinforcer), while our study involved abstract points converted into money (a secondary reinforcer). Perhaps more importantly, the previous study used a between-subjects design with two groups of participants paired either with a consistently good or with a consistently bad demonstrator, while in our experiments the performance of the demonstrator was allowed to fluctuate within subjects around optimal behavior. Therefore, it could also be argued that our experiment is not well suited to measuring the effects of demonstrators' performance on participants' imitation behavior, as such effects require a relatively long and stable reputation-building process [45,46].
The question remains whether social learning in our task (imitation and vicarious trial-and-error) engaged domain-specific social cognitive modules or domain-general information processing modules. In the absence of additional data (such as neuroimaging), we cannot provide a definitive answer. However, evidence from post-learning face ratings provides some clues [47]. We found a positive correlation between performance in the social contexts and the perceived trustworthiness of the demonstrator. Even if we cannot infer a causal link or its direction from the post-learning face evaluation, these results suggest that a specific socio-cognitive module (face evaluation) correlated with instrumental performance, thus pointing to the engagement of social-information-specific processing in our reinforcement learning task.

Participants
Two independent cohorts of American participants (N = 50 each, 100 in total), similar in terms of reported age (mean reported age across the two cohorts: 33.39 ± 2.03) and of reported male/female ratio (mean reported male/female ratio across the two cohorts: 35%; see Table 1), were recruited via Amazon Mechanical Turk to participate in this online study. Each participant received a fixed amount of $4 for completing the 40-minute task, to which a bonus earned during the experiment was then added (average bonus: $0.49). Participants received a description of the study and signed an informed consent form before starting the experiment. The study was approved by the local Ethical Committee (Conseil d'évaluation éthique pour les recherches en santé, CERES n2 01659) and is in accordance with the Declaration of Helsinki (World Medical Association, 2008). The first cohort corresponded to a 'discovery experiment', in which we explored the relation between instrumental performance and clinical scores; the second cohort corresponded to a 'replication experiment', in which we tested the robustness and replicability of the effect identified in the first experiment.

Experimental design
Participants performed the probabilistic instrumental learning task described in the Results section (Fig 1A and 1B). The task was programmed on Qualtrics and was composed of six learning blocks of 20 trials each. In each block, participants had to choose between two cues. Cues were characters of the Agathodaimon font; they were always presented in pairs, and each pair appeared in only one block per subject. The cue-to-condition attribution was randomized across subjects. Participants made their choice by pressing the E or P key to choose the leftmost or rightmost symbol, respectively. Participants were given no explicit information on reward probabilities, which they had to learn through trial and error. In addition, they were encouraged to accumulate as many points as possible, with their final number of points being converted into bonus money at the end of the experiment (conversion rate: 40 points = $1 bonus). In each pair, cues were associated with reciprocal reward probabilities (20/80% or 30/70%). For instance, in a 30/70% pair, the most rewarded cue provided a positive outcome (+1 point) 70% of the time and a negative outcome (-1 point) 30% of the time, while the less rewarded cue provided a negative outcome 70% of the time and a positive outcome 30% of the time. Participants had unlimited time to make their choice (mean reaction time: 2.47 ± 0.88 s; no significant effect of depressive symptoms was found on reaction times, all ps > .250).
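The reward structure above can be sketched as a small schedule generator. This is a minimal illustration, not the Qualtrics implementation. It encodes the two contingency types (30/70% stable; 20/80% with a reversal after about 10 trials, ±1 as shown in Fig 1C), the +1/−1 point outcomes and the 40-points-per-dollar conversion. The uniform ±1 jitter, the cue indices and the function names are assumptions. The section also does not say how a negative point total was handled, so the conversion is applied as-is.

```python
import random


def block_schedule(contingency, n_trials=20, rng=None):
    """Return (p_best, better_cue_per_trial) for one learning block.

    'Stable'  : 30/70% cues, no change during the block.
    'Reversal': 20/80% cues, swapped after 10 +/- 1 trials (uniform jitter is an assumption).
    Cue indices 0/1 are arbitrary labels for the two symbols.
    """
    rng = rng or random.Random()
    if contingency == "Stable":
        return 0.7, [0] * n_trials
    if contingency == "Reversal":
        n_before = 10 + rng.choice([-1, 0, 1])
        return 0.8, [0 if t < n_before else 1 for t in range(n_trials)]
    raise ValueError(f"unknown contingency: {contingency!r}")


def draw_outcome(chose_better_cue, p_best, rng):
    """+1 point with probability p_best for the better cue (1 - p_best otherwise), else -1."""
    p_win = p_best if chose_better_cue else 1.0 - p_best
    return 1 if rng.random() < p_win else -1


def bonus_dollars(total_points, points_per_dollar=40):
    """Convert points to bonus money (40 points = $1); negative totals are not special-cased."""
    return total_points / points_per_dollar


# Example: points earned by a random chooser over one reversal block
rng = random.Random(7)
p_best, better = block_schedule("Reversal", rng=rng)
points = sum(draw_outcome(rng.random() < 0.5 and True or False, p_best, rng)
             if False else draw_outcome(rng.choice([0, 1]) == b, p_best, rng)
             for b in better)
```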
At the beginning of the experiment, participants were told that they had been paired with another player with whom they would take turns on each trial. They were also told that there was no competition between them and the other player and that each player played for herself or himself. As in previous studies [48], the demonstrator's behavior was generated by a reinforcement learning algorithm (Q-learning) with a plausible set of free parameters (α = 0.5, β = 10; see below for a description of the Q-learning model and its parameters). To avoid social perceptual biases, the other player was represented by a neutral avatar, selected because it is generally perceived as neither dominant nor submissive and neither trustworthy nor untrustworthy [49]. At the beginning of the task, participants chose their own avatar from a set of 16 other identities (8 female, 8 male). Participants performed the task in three contexts differing in the amount of social information available: a 'Private' condition, in which they had no access to the demonstrator's behavior; a 'Social-Choice' condition, in which they could see the demonstrator's choices but not the associated outcomes; and a 'Social-Choice+Outcome' condition, in which they could observe both the demonstrator's choices and outcomes. Importantly, each condition ('Private', 'Social-Choice' and 'Social-Choice+Outcome') was performed in separate blocks, and each condition was presented twice, once with each type of contingency. In the 'Stable' contingency, outcome probabilities were set at 30/70% and did not change during the block. In the 'Reversal' contingency, outcome probabilities were set at 20/80% and were swapped between the two cues after 10 trials on average. Finally, at the end of the experiment, participants rated the demonstrator's avatar on three personality traits (trustworthiness, dominance and competence) and completed the Hospital Anxiety and Depression Scale [21] as well as the Peters et al. Delusions Inventory. The latter was included in the exploratory analysis of the Discovery sample and then discarded, because it showed no significant effect and its inclusion did not alter the effect of depression. The whole procedure lasted approximately 45 minutes.
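The demonstrator's behavior can be illustrated with a minimal Q-learning simulation using the parameters stated above (α = 0.5, β = 10). This is a sketch under assumptions that the text does not specify: initial option values of 0, a two-option softmax choice rule, and, in the usage example, a reversal placed exactly after the 10th trial. The function names are hypothetical.

```python
import math
import random


def softmax_p0(q, beta):
    """Probability of choosing option 0 under a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def simulate_demonstrator(n_trials=20, alpha=0.5, beta=10.0, p_better=0.7,
                          reversal_trial=None, seed=0):
    """Simulate a Q-learning demonstrator in one learning block.

    Option 0 is the better cue at the start of the block; if reversal_trial is
    given, the two cues swap probabilities from that (0-based) trial onward.
    Returns choices (0/1), outcomes (+1/-1) and per-trial correctness.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # initial option values (assumed to start at 0)
    choices, outcomes, correct = [], [], []
    for t in range(n_trials):
        better = 0 if reversal_trial is None or t < reversal_trial else 1
        choice = 0 if rng.random() < softmax_p0(q, beta) else 1
        p_win = p_better if choice == better else 1.0 - p_better
        outcome = 1 if rng.random() < p_win else -1
        # Q-learning (Rescorla-Wagner) update of the chosen option only
        q[choice] += alpha * (outcome - q[choice])
        choices.append(choice)
        outcomes.append(outcome)
        correct.append(choice == better)
    return choices, outcomes, correct


# Stable block (30/70%) and reversal block (20/80%, probabilities swap after 10 trials)
_, _, stable_correct = simulate_demonstrator(p_better=0.7, seed=1)
_, _, reversal_correct = simulate_demonstrator(p_better=0.8, reversal_trial=10, seed=2)
print(sum(stable_correct) / 20, sum(reversal_correct) / 20)
```

With β = 10, a value difference of 0.5 already makes the preferred option about 99% likely, so this demonstrator behaves fairly deterministically while still switching after losses.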

Statistical analyses
The analyses were performed on all participants and trials; no exclusion criteria were applied.
Percentage of correct choices. The percentage of correct choices was computed for each block and was either correlated with clinical scores or used as a continuous dependent variable.
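A minimal sketch of this model-free measure and of its correlation with a clinical score follows. The function names and data are hypothetical; in reversal blocks, 'correct' is assumed here to mean choosing the cue that is currently the more rewarded one. A Pearson coefficient is used, as in Fig 3.

```python
import math


def correct_choice_rate(choices, better_options):
    """Percentage of trials on which the currently better cue was chosen."""
    hits = sum(c == b for c, b in zip(choices, better_options))
    return 100.0 * hits / len(choices)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-participant data: correct choice rates in one condition and HAD-D scores
rates = [85.0, 70.0, 65.0, 55.0, 90.0]
had_depression = [2, 6, 9, 12, 1]
print(round(pearson_r(rates, had_depression), 3))
```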
Meta-analysis. Meta-analyses were run using a mixed-effects model, a conservative method for computing meta-analytic effects across studies. More precisely, this method weights each study according to its variability, allows for non-random differences in effect sizes between samples, and estimates the mean of the distribution of effect sizes. These analyses were performed using the R package metafor [50].
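To illustrate random-effects pooling, the sketch below combines per-sample correlation coefficients via Fisher's z transform. It uses the DerSimonian-Laird estimator of the between-sample variance as a simple, self-contained substitute: metafor's rma() defaults to REML, and the exact estimator used in the study is not stated here. The function name and the input values in the usage line are hypothetical.

```python
import math


def random_effects_meta(rs, ns):
    """Pool correlation coefficients across samples with a random-effects model.

    Uses Fisher's z transform (sampling variance 1 / (n - 3)) and the
    DerSimonian-Laird estimate of the between-sample variance tau^2.
    Returns (pooled r, tau^2, z statistic, two-sided p value).
    """
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]
    w = [1.0 / v for v in vs]
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    # Cochran's Q: weighted dispersion of the study effects around the fixed-effect mean
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each sampling variance
    w_re = [1.0 / (v + tau2) for v in vs]
    z_pooled = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z_stat = z_pooled / se
    p = math.erfc(abs(z_stat) / math.sqrt(2.0))  # two-sided normal p value
    return math.tanh(z_pooled), tau2, z_stat, p


# Hypothetical correlations from a discovery and a replication sample (N = 50 each)
print(random_effects_meta([-0.35, -0.30], [50, 50]))
```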
Regression analyses. A linear mixed-effects regression with random intercepts and random slopes was conducted on correct choice rates, with participant ID as a random factor, condition ('Private', 'Social-Choice' and 'Social-Choice+Outcome') as a within-subject variable, and depression and anxiety scores as well as the demonstrator's performance and trustworthiness judgment as continuous between-subject variables (Table 3).
Classification analyses. Out-of-sample tests were used to assess whether our task could distinguish participants scoring at or above the 'depressive symptoms absent' threshold of the depression scale from those scoring below it. Fifty participants were randomly drawn from the entire sample and used to optimize a classifier of depressive symptoms (HAD depression subscale score greater than or equal to 8 [21]) based either on the correct choice rates in the 'Social-Choice' condition (model-free measure) or on the learning rates in the social information conditions (α_S, model-based measure; see below). The optimal cut-off was defined as the one jointly maximizing the specificity (true negative rate) and the sensitivity (true positive rate) of the classifier on the training sample. The classifier and its optimal cut-off were then tested on the remaining 50 participants. This procedure was repeated 100 times in order to estimate the average accuracy, sensitivity and specificity of the classifiers.
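The out-of-sample procedure can be sketched as follows. Two assumptions are labeled explicitly: 'jointly maximizing specificity and sensitivity' is implemented here as maximizing Youden's J (sensitivity + specificity - 1), and by default a low score is treated as indicating symptoms (as would apply to correct choice rates; the direction for α_S would have to be set from the data). Splits in which either half lacks one of the two classes are skipped, which is another assumption. Function names and toy data are hypothetical.

```python
import random


def sensitivity_specificity(predicted, actual):
    """True positive rate and true negative rate for boolean predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    return tp / n_pos, tn / n_neg


def predict(score, cutoff, higher_is_positive):
    """Classify one participant as symptomatic (True) or not."""
    return score >= cutoff if higher_is_positive else score <= cutoff


def youden_cutoff(scores, labels, higher_is_positive=False):
    """Cut-off maximizing sensitivity + specificity - 1 on the training data."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        preds = [predict(s, cut, higher_is_positive) for s in scores]
        sens, spec = sensitivity_specificity(preds, labels)
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def repeated_split(scores, labels, n_train=50, n_repeats=100,
                   higher_is_positive=False, seed=0):
    """Average test accuracy, sensitivity and specificity over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    acc, sens, spec = [], [], []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        y_train = [labels[i] for i in train]
        y_test = [labels[i] for i in test]
        if len(set(y_train)) < 2 or len(set(y_test)) < 2:
            continue  # both classes are needed in each half
        cut = youden_cutoff([scores[i] for i in train], y_train, higher_is_positive)
        preds = [predict(scores[i], cut, higher_is_positive) for i in test]
        se, sp = sensitivity_specificity(preds, y_test)
        acc.append(sum(p == y for p, y in zip(preds, y_test)) / len(y_test))
        sens.append(se)
        spec.append(sp)
    n = len(acc)
    return sum(acc) / n, sum(sens) / n, sum(spec) / n
```

Usage, with hypothetical inputs: `labels = [d >= 8 for d in had_d]` marks participants at or above the HAD-D threshold, and `repeated_split(social_choice_rates, labels)` returns the three averaged metrics.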

Computational analyses
Model fitting. Computational analyses were performed only after the replication sample had been collected. Therefore, to assess the robustness of our computational model, we present the computational results as a meta-analysis across the discovery and replication samples (S2 Table).
We optimized the model parameters by minimizing the negative log of the posterior probability (LPP) of the parameters given the data, which underlies the Laplace approximation to the model evidence (Eq 8):
$$\mathrm{LPP} = \log P(\theta_{1 \ldots n} \mid D) = \log P(D \mid \theta_{1 \ldots n}) + \sum_{k=1}^{n} \log P(\theta_k) + \text{const.} \qquad (8)$$
where D represents the data, θ_{1…n} the set of model parameters, and θ_k one of the n parameters of the computational model. The LPP represents a trade-off between the model's accuracy and its complexity: it increases with the likelihood of the data given the model (a measure of fit) and decreases with the number of parameters. By including priors over the parameters, this method avoids degenerate parameter estimates. In our analysis, the priors were defined as a gamma distribution (gampdf(1.2, 5)) for the temperature parameters (range: 0 < β < ∞) and as a beta distribution (betapdf(1.1, 1.1)) for the learning and imitation rates (ranges: 0 < α < 1, 0 < κ < 1), as described in [51] (see Table 4 for the estimated parameters). Importantly, the LPP analysis indicated that the social reinforcement learning model fit the data better than a simple Q-learning model without social influence, even after accounting for its additional complexity (social reinforcement learning model: posterior probability: 90 ± 3%; exceedance probability: 100%). As a control analysis, to ensure that our model comparison criterion was not prone to overfitting, we fitted the behavior of the virtual demonstrators, which had been generated with a Q-learning model. This model recovery analysis [25] correctly indicated that the simple Q-learning model explained the demonstrators' data better (Q-learning model: posterior probability: 100 ± 0%; exceedance probability: 100%) (see the supplementary figures and table for additional information on the parameter recovery analysis).
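The sketch below shows how Eq 8 combines the log-likelihood with the gamma and beta priors quoted above. It is illustrative only: it fits the plain Q-learning model (the 'Private' part of the model) rather than the full social model, assumes initial option values of 0, and uses a coarse grid search in place of a continuous optimizer. Function names and data are hypothetical.

```python
import math


def log_gamma_prior(beta, shape=1.2, scale=5.0):
    """Log density of a gamma prior in shape-scale form, as in gampdf(x, 1.2, 5)."""
    return ((shape - 1.0) * math.log(beta) - beta / scale
            - math.lgamma(shape) - shape * math.log(scale))


def log_beta_prior(x, a=1.1, b=1.1):
    """Log density of a beta prior, as in betapdf(x, 1.1, 1.1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x) - log_norm


def q_learning_loglik(alpha, beta, choices, outcomes):
    """Log-likelihood of binary choices (0/1) under softmax Q-learning."""
    q = [0.0, 0.0]
    ll = 0.0
    for c, r in zip(choices, outcomes):
        # log P(c) = -log(1 + exp(-beta * (Q_c - Q_other)))
        ll -= math.log1p(math.exp(-beta * (q[c] - q[1 - c])))
        q[c] += alpha * (r - q[c])
    return ll


def lpp(alpha, beta, choices, outcomes):
    """Log posterior up to a constant: log-likelihood plus log priors (Eq 8)."""
    return (q_learning_loglik(alpha, beta, choices, outcomes)
            + log_beta_prior(alpha) + log_gamma_prior(beta))


def fit_map(choices, outcomes):
    """Grid search for the (alpha, beta) pair maximizing the LPP."""
    alphas = [i / 20 for i in range(1, 20)]   # 0.05 ... 0.95
    betas = [0.5 * i for i in range(1, 61)]   # 0.5 ... 30
    return max((lpp(a, b, choices, outcomes), a, b) for a in alphas for b in betas)


choices = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
outcomes = [1, -1, -1, 1, 1, 1, -1, -1, 1, 1]
print(fit_map(choices, outcomes))  # (LPP, alpha, beta)
```

Maximizing this quantity is equivalent to minimizing its negative, which is how such objectives are usually passed to numerical minimizers.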
Because the model parameters were correlated with each other (maximal correlation: r = 0.53; S4 Table), we used structural equation modeling in addition to correlation analyses to assess the influence of depression scores on the model parameters. This technique allowed us to test the influence of depression scores on each parameter while simultaneously accounting for the inter-correlations among the dependent variables (the model's free parameters) and their relation to the independent variable (the depression score).
Model simulation analyses. Finally, we assessed the ability of the model to reproduce the observed behavioral effect of depressive symptoms using model simulations [25]. For each participant, we simulated behavioral data in each condition using their best-fitting parameters. Importantly, a simulated demonstrator was also generated, so that the simulated data were completely independent of the contingencies actually experienced by the participants. This procedure was repeated 100 times to average out the influence of any particular history of participant and demonstrator choices and outcomes. The recovered percentage of correct choices, averaged across the 100 simulations, was analyzed using a linear mixed-effects regression with exactly the same predictors as the mixed model used to analyze participants' percentage of correct choices.
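A minimal simulation sketch of this procedure follows. The text specifies only that choice probability is updated from the demonstrator's action in 'Social-Choice' and that option values are updated from the demonstrator's outcome in 'Social-Choice+Outcome'. The exact update equations below, the hypothetical parameter names (alpha_private, alpha_social, beta, kappa), the demonstrator moving first on each trial, and imitation also operating in 'Social-Choice+Outcome' blocks are all assumptions for illustration, not the authors' model. The demonstrator uses α = 0.5 and β = 10, and the reversal falls around the 10th trial (± 1), as in Fig 1C.

```python
import math
import random

CONDITIONS = ("Private", "Social-Choice", "Social-Choice+Outcome")


def softmax_p0(q, beta):
    """Probability of choosing option 0 under a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def simulate_block(condition, params, p_better, reversal_trial, rng,
                   n_trials=20, demo_alpha=0.5, demo_beta=10.0):
    """Return the simulated participant's correct choice rate (%) in one block."""
    q_self, q_demo = [0.0, 0.0], [0.0, 0.0]
    alpha = params["alpha_private"] if condition == "Private" else params["alpha_social"]
    n_correct = 0
    for t in range(n_trials):
        better = 0 if reversal_trial is None or t < reversal_trial else 1
        # Simulated Q-learning demonstrator, assumed to move first on each trial
        d = 0 if rng.random() < softmax_p0(q_demo, demo_beta) else 1
        r_d = 1 if rng.random() < (p_better if d == better else 1 - p_better) else -1
        q_demo[d] += demo_alpha * (r_d - q_demo[d])
        if condition == "Social-Choice+Outcome":
            # Hypothetical counterfactual update from the demonstrator's outcome
            q_self[d] += alpha * (r_d - q_self[d])
        p0 = softmax_p0(q_self, params["beta"])
        if condition != "Private":
            # Hypothetical imitation: shift choice probability toward the demonstrator's choice
            p_d = p0 if d == 0 else 1.0 - p0
            p_d += params["kappa"] * (1.0 - p_d)
            p0 = p_d if d == 0 else 1.0 - p_d
        c = 0 if rng.random() < p0 else 1
        r = 1 if rng.random() < (p_better if c == better else 1 - p_better) else -1
        q_self[c] += alpha * (r - q_self[c])
        n_correct += int(c == better)
    return 100.0 * n_correct / n_trials


def simulate_participant(params, n_sims=100, seed=0):
    """Average correct choice rate per condition over n_sims simulated sessions."""
    rng = random.Random(seed)
    totals = dict.fromkeys(CONDITIONS, 0.0)
    for _ in range(n_sims):
        for cond in CONDITIONS:
            stable = simulate_block(cond, params, 0.7, None, rng)
            reversal_at = 10 + rng.choice((-1, 0, 1))  # reversal around the 10th trial
            reversal = simulate_block(cond, params, 0.8, reversal_at, rng)
            totals[cond] += (stable + reversal) / 2.0
    return {cond: total / n_sims for cond, total in totals.items()}


print(simulate_participant({"alpha_private": 0.4, "alpha_social": 0.3,
                            "beta": 5.0, "kappa": 0.2}))
```

Each call yields one simulated correct choice rate per condition for a given parameter set, averaged over 100 sessions; repeating this with each participant's fitted parameters gives the data entered into the mixed model.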

Fig 1. Learning task and learning behavior. (A) Experimental procedure. Participants first performed a training session before choosing their avatar for the task. They were then paired with another (simulated) player represented by an avatar neutral in trustworthiness and dominance. Participants then performed the behavioral task, which was organized in randomized blocks. Each block corresponded to a learning condition ('Private', 'Social-Choice' or 'Social-Choice+Outcome'), presented once with stable contingencies and once with unstable contingencies (reversal condition). After the task, participants completed the HAD questionnaire and performed the social evaluations as a post-test. (B) Behavioral task. In each condition, participants took turns with a virtual demonstrator. On each trial, participants received a reward or a punishment after their choice. In the Private blocks, participants saw neither the choice nor the outcome of the demonstrator. In the Social-Choice blocks, the demonstrator's choice was displayed on each trial. In the Social-Choice+Outcome blocks, both the choice and the outcome of the demonstrator were displayed. (C) Learning behavior of the virtual demonstrator and the participants. The behavior of the virtual partner (top) was simulated using a reinforcement learning model (whose parameters were correctly recovered by our model optimization procedure: black dotted line). Participants accurately learned which option was the most rewarded across trials. In both the real and simulated tasks, a reversal of the contingencies occurred at the 10th (± 1) trial (grey shaded area).

Fig 3. Effect of depression scores on reinforcement learning. (A) Effect of depression scores on learning. Scatter plots showing the correlation between the correct choice rate and the self-reported depression score in the three learning contexts (from left to right: 'Private', 'Social-Choice', 'Social-Choice+Outcome'). (B) Effect of anxiety scores on learning. Scatter plots showing the correlation between the correct choice rate and the self-reported anxiety score in the three learning contexts. r = Pearson's correlation coefficient; ° p < 0.10, * p < 0.05 (Pearson's correlation). https://doi.org/10.1371/journal.pcbi.1007224.g003

Fig 4 .Fig 5 .
Fig 4. Social reinforcement learning model. (A) Computational model. A social reinforcement learning model was fitted to participants' behavior. In the 'Private' condition ('Private context'), the model corresponds to a classical Q-learning (or Rescorla-Wagner) model. In the 'Social context' ('Social-Choice' and 'Social-Choice+Outcome' conditions), the model assumes that social information is integrated into the learning and decision process. Following Burke et al. [14], choice probability was updated based on the demonstrator's action (imitation) in the 'Social-Choice' condition, and the option value was updated when the demonstrator's outcome was presented (counterfactual learning) in the 'Social-Choice+Outcome' condition. The proposed model also allows for different ). A comparison between a classifier based on the model parameters and a classifier based on correct choice rates revealed that the model-based classifier was more specific in detecting participants with higher depressive symptoms (t(198) = 5.86, p < .001) but less sensitive (t(198) = -12.03, p < .001; Fig 4C) than the classifier based on correct choice rates.

Table 1. Descriptive statistics for age, gender, depression and anxiety scores. For each sample, the mean of each demographic variable is presented with its 95% confidence interval.

Fig 2B). This effect of the social evaluation of the demonstrator's avatar confirms that participants processed the information in a social context.