Authors FG and TS have no conflicts of interest. DC declares having been employed by Google LLC, Mountain View, at the time the article was prepared for publication.
To make good judgments people gather information. An important problem an agent needs to solve is when to continue sampling data and when to stop gathering evidence. We examine whether and how the desire to hold a certain belief influences the amount of information participants require to form that belief. Participants completed a sequential sampling task in which they were incentivized to accurately judge whether they were in a desirable state, which was associated with greater rewards than losses, or an undesirable state, which was associated with greater losses than rewards. While one state was better than the other, participants had no control over which they were in, and to maximize rewards they had to maximize accuracy. Results show that participants’ judgments were biased towards believing they were in the desirable state. They required a smaller proportion of supporting evidence to reach that conclusion and stopped gathering samples earlier when doing so. The findings were replicated in an additional sample of participants. To examine how this behavior was generated we modeled the data using a drift-diffusion model. This enabled us to assess two potential mechanisms that could underlie the behavior: (i) a valence-dependent response bias and/or (ii) a valence-dependent process bias. We found that a valence-dependent model, with both a response bias and a process bias, fit the data better than a range of alternatives, including valence-independent models and models with only a response or process bias. Moreover, the valence-dependent model provided better out-of-sample prediction accuracy than the valence-independent model. Our results provide an account of how the motivation to hold a certain belief decreases the need for supporting evidence. The findings also highlight the advantage of incorporating valence into evidence accumulation models to better explain and predict behavior.
People tend to gather information before making judgments. As information is often unlimited, a decision has to be made as to when the data are sufficient to reach a conclusion. Here, we show that the decision to stop gathering data is influenced by whether the data point towards the desired conclusion. Importantly, we characterize the factors that generate this behavior using a valence-dependent evidence accumulation model. In a sequential sampling task participants sampled less evidence before reaching a desirable than an undesirable conclusion. Despite being incentivized for accuracy, participants’ judgments were biased towards believing they were in a desirable state. Fitting an evidence accumulation model to the data revealed that this behavior was due to both the starting point and the rate of evidence accumulation being biased towards desirable beliefs. Our results show that evidence accumulation is altered by what people want to believe and provide an account of how this modulation is generated.
Judgments are formed over time as information is accumulated [
It seems probable, however, that the decision to stop gathering evidence would also be influenced by the desire to hold one belief over another [
Here, we set out to empirically examine in a controlled laboratory setting whether and how the desire to hold a belief influences the amount of information required to reach it, when all else is held equal. At present, we have a limited understanding of whether and how motivation alters evidence accumulation, despite the potential for such effects to dramatically impact people’s decisions in domains ranging from finance to politics and health [
Specifically, we hypothesized that the desire to hold one judgment over another could alter information accumulation in at least two ways. First, people may be predisposed towards desired judgments before observing any evidence at all (for example, one may believe it will be a nice day before checking the weather or glancing outside) [
To dissociate these mechanisms, we take a computational approach, adopting a sequential sampling model of noisy evidence accumulation towards either of two decision thresholds [
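To make the distinction concrete, the following minimal sketch (not the authors’ implementation; all parameter values are illustrative) simulates a drift-diffusion process and shows how a starting point shifted towards one boundary (a response bias) and an increment added to the drift rate (a process bias) both inflate the proportion of responses at that boundary:

```python
import numpy as np

def simulate_ddm(v, z, a=2.0, dt=0.01, sigma=1.0, max_t=10.0, rng=None):
    """Simulate one drift-diffusion trial.

    v : drift rate (a process bias adds a constant to it)
    z : starting point as a fraction of the boundary separation
        (a response bias shifts it away from 0.5)
    a : boundary separation; evidence accumulates between 0 and a
    Returns (choice, rt); choice is 1 at the upper boundary, 0 at the lower.
    """
    rng = rng or np.random.default_rng()
    x, t = z * a, 0.0
    while 0.0 < x < a and t < max_t:
        x += v * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x >= a else 0), t

rng = np.random.default_rng(0)
unbiased = [simulate_ddm(v=0.5, z=0.5, rng=rng) for _ in range(500)]
biased = [simulate_ddm(v=0.5 + 0.3, z=0.6, rng=rng) for _ in range(500)]
print(np.mean([c for c, _ in unbiased]), np.mean([c for c, _ in biased]))
```

Although both biases push choices in the same direction, they leave distinct signatures in the joint distribution of choices and response times, which is what allows model fitting to tease them apart.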
In our task participants witnessed various events that were contingent upon which one of two hidden states they were in. One state was associated with greater rewards than losses (the desirable state) and the other with greater losses than rewards (the undesirable state). Participants had no control over which state they were in; their task was simply to judge the state, gaining additional rewards for accurate judgments and losing rewards for inaccurate ones. Thus, it was in participants’ best interest to be as accurate as possible, and they were allowed to accumulate as much evidence as they wished before making a judgment. We examine whether and how the accumulation process is sensitive to participants’ motivation to believe that they are in one state and not the other.
We tested 84 participants on “The Factory Game” (
On each trial participants saw TVs and phones moving across the screen and had to guess whether they were in a TV factory (which sometimes produces telephones) or a phone factory (which sometimes produces TVs). They were incentivized for accuracy and could enter their judgment whenever they liked. Each participant was “invested” in one factory. On trials where they happened to be in that (desirable) factory they gained points; on trials in which they happened to be in the other (undesirable) factory they lost points.
Additionally, participants were told that they had “invested” in either a telephone or television factory. In the context of the game, this meant that they received a bonus payment when they happened to be visiting the type of factory they had invested in (
We also ran a replication and extension study (N = 92), which is described in
The proportion of factories participants judged as desirable was significantly greater than the proportion they actually encountered (mean = 53.7%, t(83) = 3.42, p < 0.0001). They gathered fewer samples before concluding they were in a desirable than an undesirable factory (t(83) = -3.10, p < 0.01) and required a smaller proportion of samples to be consistent with their judgment when reaching that conclusion. The latter point is shown by fitting a psychometric function to the data, which relates the percentage of TVs observed on a trial to participants’ judgment of whether they are visiting a TV or telephone factory. This was done separately for participants for whom the TV factory was desirable and for whom it was undesirable. As expected, both functions show that the greater the proportion of TVs on a trial the more likely participants are to judge the factory as a TV factory (TV factory desirable: β1 = 25.24, 95% CI [21.20, 29.28], TV factory undesirable: β1 = 24.34, 95% CI [20.81, 27.88]). Crucially, as can be observed in
Fitted psychometric function on (A) participants’ data reveals that the probability of judging a factory as a TV factory increases with proportion of TVs observed. Importantly, a smaller proportion of TVs is needed to judge a factory as a TV factory when the factory is desirable than when it is undesirable. (B) The same pattern is observed when plotting simulated data generated from winning model 4 (see
As participants concluded they were in a desirable factory more often than undesirable factory, they were more likely to falsely believe they were in a desirable factory when in an undesirable factory (30.96% of undesirable factories wrongly categorized) than to falsely believe they were in an undesirable factory when in a desirable factory (only 24.78% of desirable factories wrongly categorized),
In sum, the results show that participants were more likely to believe they were in a desirable factory. They gathered fewer samples before making these judgments and required a smaller proportion of the samples to be consistent with that belief. We next sought to understand how this behavior was generated by characterizing the underlying computations that give rise to it. In particular, the bias we observed may have emerged if valence was modulating (i) the starting point of the accumulation process; (ii) the rate of evidence accumulation; or (iii) both. To tease apart these possible mechanisms we modeled the data as a drift-diffusion process.
Responses were modeled as a drift-diffusion process [
The Deviance Information Criterion (DIC), a generalization of the Akaike Information Criterion for hierarchical models, was calculated for each model. The DIC scores indicated that Model 4, which included a valence-dependent starting point and drift rate, outperformed all other models (
(
| Number | Model | Starting point (z) | Drift Rate (v) | DIC | BPIC |
|---|---|---|---|---|---|
| 1 | Valence-independent | z = 0.5 | symmetric (v, -v) | 28695 | 28937 |
| 2 | Valence-dependent drift rate | z = 0.5 | includes valence bias term | 28521 | 28821 |
| 3 | Valence-dependent starting point | 0 < z < 1 | symmetric (v, -v) | 28534 | 28828 |
| 4 | Valence-dependent starting point and drift rate | 0 < z < 1 | includes valence bias term | | |
| 5 | | z = 0.5 | | 28670 | 28926 |
| 6 | | 0 < z < 1 | | 28522 | 28831 |
Our replication study returned an identical pattern of results—a DDM in which drift rate and starting point were valence-dependent provided the best fit to the data (
To evaluate whether the above model specifications would benefit from including collapsing boundaries rather than a fixed decision threshold, we also fitted a model where the decision threshold was expressed as a Weibull cumulative distribution function (fit individually to each participant; see
To test predictive accuracy, we fitted both the winning model (which includes a valence-dependent drift rate and starting point) and the valence-independent model to data from even trials and evaluated how well the models predicted responses on odd trials, using mean absolute error (MAE) as a measure of fit (
We simulated data on odd trials, based on parameter estimates obtained from fitting the data on even trials, separately for the winning valence-dependent model and the valence-independent model. For each trial we calculated (
The findings show that motivation has a profound effect on the process by which evidence is accumulated. On trials in which participants indicated they believed the state was desirable, they ceased gathering data earlier and required a smaller proportion of samples to be consistent with that conclusion. We used a computational model to characterize the underlying factors that may generate this behavior. The model revealed two factors. First, participants began the process of evidence accumulation with a starting point biased towards the desired belief; thus, they required less evidence to reach that boundary. Second, the drift rate–the rate of information accumulation [
Most learning models [
Our findings are in accord with previous suggestions that people hold positively biased priors [
In sum, the current study describes how the motivation to hold one belief over another can decrease the need for supporting evidence. The implication is that people may be quick to respond to signs of prosperity (such as rising financial markets)–forming desirable beliefs even when evidence is relatively weak–but slow to respond to indicators of decline (such as political instability)–forming undesirable beliefs only when negative evidence can no longer be discarded. Indeed, in our study participants were more likely to hold positive false beliefs (falsely believing they were in the desirable factory when in fact they were in the undesirable factory) than negative false beliefs (falsely believing they were in the undesirable factory when in fact they were in the desirable factory). While both positive and negative false beliefs resulted in a material cost, we speculate that positive false beliefs may have non-monetary benefits. In particular, it has been hypothesized that beliefs, just like material goods and services, have utility in and of themselves [
We recruited 100 participants (
Participants played 80 trials of the “Factory Game”. They began each trial by pressing the space bar, after which they witnessed an animated sequence of televisions and telephones passing along a conveyor belt. Each object would take 400 ms to traverse the belt with a 150 ms lag between stimuli.
There were two types of trials: Telephone Factory trials and Television Factory trials. In Telephone Factory trials the probability of each item in the animated sequence being a telephone was 0.6 and of being a television 0.4. In Television Factory trials the proportions were reversed. The trial type was determined randomly and independently on every trial, with equal probability for each type.
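A minimal sketch of this trial-generation scheme (function and variable names are our own, for illustration only):

```python
import numpy as np

P_MATCH = 0.6   # probability an item matches the factory type (from the text)
ITEM_MS = 400   # ms for an item to traverse the belt
LAG_MS = 150    # ms lag between consecutive items

def generate_trial(rng, max_items=100):
    """Draw a trial type with equal probability, then a stimulus stream
    coded as 1 = telephone, 0 = television."""
    is_phone_factory = rng.random() < 0.5   # determined anew on every trial
    p_phone = P_MATCH if is_phone_factory else 1.0 - P_MATCH
    stream = (rng.random(max_items) < p_phone).astype(int)
    return is_phone_factory, stream
```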
Participants were tasked with judging whether they were in a Telephone Factory trial or a Television Factory trial. Since the trial type was not directly observable, they had to infer it from the sequence of objects they were seeing. Participants were free to respond as soon as they wished after initiating the trial, and the sequence continued until they made their choice.
Participants began the game with an endowment of 5000 points. Each 100 points was worth 1 cent. One of the two factory types was randomly assigned per participant to be the desirable factory type and the other to be an undesirable type. Participants were informed that each time they visited the desirable factory, they would win an unspecified number of points, and each time they visited the undesirable factory, they would lose an unspecified number of points.
We dropped trials where the participant made their judgment before seeing a second item. In cases where a participant did this in over half their trials, we assumed that participant was not appropriately engaging with the task and eliminated the entirety of their trials. We dropped 10 participants for this reason, as well as a further 123 responses made before seeing a second item. We additionally excluded 3 participants whose average accuracy in the task was two standard deviations below the mean of the sample (i.e., whose accuracy was below 53.28%; mean accuracy of the sample was 71.24%), assuming that these participants were guessing rather than basing their answers on the presented evidence. Finally, 3 participants were excluded as possible bots. These were "participants" who showed at least two of the following indicators: nonsense answers to open-ended questions, IPs originating outside the region targeted by MTurk, reaction times at regular intervals (i.e., button presses at exactly the same millisecond after the start of the trial) in more than 10% of trials, or chance-level performance on comprehension questions. After the above exclusions, we performed the analysis on 84 participants and a total of 6597 trials. The same exclusion criteria were applied in the replication and control studies.
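The trial-level exclusion rules can be summarized in a short sketch; the column names (subject, n_samples, correct) are assumptions, and the bot checks are omitted as they rely on metadata outside the trial data:

```python
import pandas as pd

def apply_exclusions(trials: pd.DataFrame) -> pd.DataFrame:
    """Drop early responses, inattentive participants, and likely guessers."""
    early = trials["n_samples"] < 2
    # Participants who responded before a second item on >50% of trials.
    early_rate = early.groupby(trials["subject"]).transform("mean")
    trials = trials[(~early) & (early_rate <= 0.5)]
    # Participants whose accuracy falls 2 SD below the sample mean.
    acc = trials.groupby("subject")["correct"].mean()
    keep = acc[acc >= acc.mean() - 2 * acc.std()].index
    return trials[trials["subject"].isin(keep)]
```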
Participants received extensive instructions prior to playing the game and were required to answer multiple-choice comprehension questions on the key points of the task, with each question repeated until they either chose correctly or made three attempts, at which point the correct answer was displayed to them. The comprehension questions addressed the following key points of how the game worked: that telephone factories mostly produced telephones but sometimes produced televisions; that the investment bonus was independent of the judgments they made; which factory was their desirable factory; and that trial types were randomly determined, so it was not guaranteed that they would see exactly the same number of each type of factory.
Participants then played a practice session of 20 trials, where the trial type was visibly displayed to them, so they could have prior experience of the outcome contingencies and the trial type distribution.
To relate participants’ judgments to the strength of evidence they observed, we fitted a psychometric function using a generalized mixed-effects model (the mixed-effects equivalent of a logistic regression), with fixed and random effects for all independent variables. We fitted these functions separately for participants for whom the TV factory was desirable and for whom it was undesirable.
Where
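As a rough stand-in for this analysis, a fixed-effects logistic fit can be sketched as follows (the full model adds per-participant random effects; the column names judged_tv and prop_tv are assumptions):

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_psychometric(df: pd.DataFrame):
    """Logistic psychometric curve: P(judge TV) as a function of the
    proportion of TVs observed on the trial."""
    model = smf.logit("judged_tv ~ prop_tv", data=df).fit(disp=0)
    return model  # model.params["prop_tv"] corresponds to the slope beta_1
```

A leftward shift of this curve for participants for whom the TV factory was desirable corresponds to the smaller proportion of TVs needed to judge a factory as a TV factory.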
As stimuli were presented at a steady pace, the number of samples drawn was highly correlated with reaction times (R = 0.99, p < 0.00001) and thus these two measures can be thought of as interchangeable. As the number of samples drawn before making a judgment was non-normally distributed and had a heavy positive skew, we log-transformed this variable [
To examine speed-accuracy trade-off we divided the trials into fast and slow, based on median reaction time of the participant, and then calculated the average accuracy of desirable and undesirable responses within these categories. We performed a 2x2 ANOVA, with average accuracy as a dependent variable, and response (desirable/undesirable) and speed (fast/slow) as independent factors.
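A sketch of this analysis as a repeated-measures ANOVA (assuming a long-format table with hypothetical columns subject, accuracy, response, and speed, holding one aggregated accuracy value per subject-by-response-by-speed cell):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def speed_accuracy_anova(cells: pd.DataFrame):
    """2x2 within-subject ANOVA: accuracy ~ response (desirable/undesirable)
    x speed (fast/slow)."""
    return AnovaRM(cells, depvar="accuracy", subject="subject",
                   within=["response", "speed"]).fit()
```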
Our aim in modeling our task using the drift-diffusion framework was to assess the contribution of both the starting point and drift rate to the desirability bias we saw in our data. To that end, we implemented and compared six different specifications of a drift-diffusion model (DDM; see
| Number | Model | Starting point (z) | Drift Rate (v) |
|---|---|---|---|
| 1 | Valence-independent | z = 0.5 | symmetric (v, -v) |
| 2 | Valence-dependent drift rate | z = 0.5 | includes valence bias term |
| 3 | Valence-dependent starting point | 0 < z < 1 | symmetric (v, -v) |
| 4 | Valence-dependent starting point and drift rate | 0 < z < 1 | includes valence bias term |
| 5 | | z = 0.5 | |
| 6 | | 0 < z < 1 | |
In particular, in models with a valence-independent starting point its value was fixed at 0.5. In models with a valence-dependent starting point, its value could vary between 0 and 1. In models with an unbiased drift rate the parameter was symmetric for desirable and undesirable factories (v and -v). In models with a biased drift rate the model additionally included a term reflecting the difference between drift rates for desirable and undesirable factories (
We used the HDDM software toolbox [
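For illustration, a valence-dependent specification along the lines of model 4 could be set up in HDDM roughly as follows (the file name and the valence column are assumptions, the sampler settings are illustrative, and the paper’s exact parameterization of the drift bias may differ):

```python
import hddm

data = hddm.load_csv("factory_game.csv")   # hypothetical file with
                                           # 'rt', 'response', 'valence'
model = hddm.HDDM(data,
                  include=["z", "sz"],            # free starting point and
                                                  # its inter-trial variability
                  depends_on={"v": "valence"})    # valence-dependent drift
model.find_starting_values()
model.sample(5000, burn=1000)
print(model.dic)   # the score used for model comparison
```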
In fitting the models, we used priors that assigned equal probability to all possible values of the parameters. Also, since our “error” RT distribution included relatively fast errors, we included an inter-trial starting point variability parameter (
In addition, model fits were compared using the Deviance Information Criterion (DIC), which is a generalization of the Akaike Information Criterion (AIC) for hierarchical models. The DIC is commonly used when the posterior distributions of the
To further validate the models and check their predictive accuracy, we refitted the valence-dependent and valence-independent models using data from only even trials. We then used the parameter estimates to predict log RTs, judgments, and judgment accuracy for odd trials for each participant. The simulation was repeated 1000 times with normally distributed random noise added to the drift rate, and predicted responses were averaged for each trial. We then calculated the mean absolute error between predicted and observed responses (RTs, judgments, and judgment accuracy) and compared the average mean absolute errors between the models using a paired t-test. We also fitted a psychometric function to the simulated data.
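A sketch of this cross-validation loop; simulate_trial is a hypothetical stand-in for the fitted DDM simulator, passed in as an argument, and predictions are averaged over noisy simulations before scoring:

```python
import numpy as np

def mean_absolute_error(pred, obs):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(obs))))

def predict_odd_trials(simulate_trial, params, odd_trials,
                       n_sims=1000, rng=None):
    """Average n_sims noisy simulations of each held-out trial.
    simulate_trial(params, trial, drift_noise) should return the
    predicted (log RT, judgment, accuracy) for one simulation."""
    rng = rng or np.random.default_rng()
    preds = []
    for trial in odd_trials:
        sims = [simulate_trial(params, trial,
                               drift_noise=rng.standard_normal())
                for _ in range(n_sims)]
        preds.append(np.mean(sims, axis=0))
    return np.asarray(preds)
```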
Decision boundaries may collapse over time rather than remain fixed, reflecting increasing impatience or urgency of decisions [
Where
A judgment was made when the accumulated difference between the number of samples supporting one factory type over the other exceeded one of two symmetric boundaries,
Where
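One common Weibull-CDF parameterization of a collapsing bound looks as follows (a sketch only; the parameter names, and whether this matches the exact form used here, are assumptions):

```python
import numpy as np

def boundary(t, b0, b_inf, lam, k):
    """Collapsing decision boundary: starts at b0 and decays towards b_inf
    following a Weibull CDF with scale lam and shape k."""
    return b0 - (1.0 - np.exp(-((t / lam) ** k))) * (b0 - b_inf)

# A judgment is triggered once the accumulated evidence difference
# crosses +boundary(t) or -boundary(t).
```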
Model parameters were fitted to each participant’s data, for desirable and undesirable responses separately, using maximum likelihood estimation. For each trial, we simulated the model 1000 times for a given set of proposal parameters and calculated the proportion of simulations in which the model RT matched the empirical data. Denoting this proportion by
To find the best set of proposal parameters we first used an adaptive grid search algorithm and then used the five best sets of proposal parameters as starting points for a Simplex minimization routine [
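The likelihood and refinement steps might be sketched as follows; simulate_rts and rt_matches are hypothetical stand-ins for the task model and the RT-matching rule, passed in as arguments:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, trials, simulate_rts, rt_matches,
                       n_sims=1000, eps=1e-6):
    """Simulation-based likelihood: for each trial, p is the proportion of
    simulated RTs that match the empirical one."""
    nll = 0.0
    for trial in trials:
        sims = simulate_rts(theta, trial, n_sims)
        p = np.mean([rt_matches(rt, trial) for rt in sims])
        nll -= np.log(max(p, eps))   # guard against log(0)
    return nll

def refine(theta0, trials, simulate_rts, rt_matches):
    """Polish a grid-search candidate with a Simplex (Nelder-Mead) run."""
    return minimize(neg_log_likelihood, theta0,
                    args=(trials, simulate_rts, rt_matches),
                    method="Nelder-Mead")
```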
We thank members of the Affective Brain Lab for comments on previous versions of this manuscript; Amiti Shenhav and Brad Love for helpful discussions; and Marius Usher and Moshe Glickman for providing analysis scripts for the DDM with collapsing boundaries.