
Disentangling choice value and choice conflict in sequential decisions under risk

  • Laura Fontanesi ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    laura.fontanesi@unibas.ch

    Affiliation Department of Psychology, University of Basel, Basel, Switzerland

  • Amitai Shenhav ,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

    ‡ These authors are joint senior authors on this work.

    Affiliation Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, Rhode Island, United States of America

  • Sebastian Gluth

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    ‡ These authors are joint senior authors on this work.

    Affiliation Department of Psychology, University of Hamburg, Hamburg, Germany

Abstract

Recent years have witnessed a surge of interest in understanding the neural and cognitive dynamics that drive sequential decision making in general and foraging behavior in particular. Due to the intrinsic properties of most sequential decision-making paradigms, however, previous research in this area has suffered from the difficulty of disentangling two properties of the decision: (a) the value of switching to a new patch, which increases monotonically, and (b) the conflict experienced between choosing to stay or leave, which first increases but then decreases after reaching the point of indifference between staying and switching. Here, we show how the same problems arise in studies of sequential decision-making under risk, and how they can be overcome, taking as a specific example recent research on the ‘pig’ dice game. In each round of the ‘pig’ dice game, people roll a die and accumulate rewards until they either decide to proceed to the next round or lose all rewards. By combining simulation-based dissections of the task structure with two experiments, we show how an extension of the standard paradigm, together with cognitive modeling of decision-making processes, makes it possible to disentangle properties related to either switch value or choice conflict. Our study elucidates the cognitive mechanisms of sequential decision making and underscores the importance of avoiding potential pitfalls of paradigms that are commonly used in this research area.

Author summary

A large body of work has investigated how people make sequential decisions under risk: for instance, how people decide whether to continue gambling for potentially greater rewards or to cash in to avoid losing everything. Here, we identify a critical confound in this line of research, between (a) the value of switching and (b) the amount of conflict between choosing to stay or switch. Using a previously proposed paradigm (i.e., the pig dice game) as an example, we replicated behavior from a recent study and showed that switch value is highly correlated with choice conflict. By simulating behavior across hypothetical contexts, we then identified and tested novel variants of this task that allow researchers to deconfound switch value and conflict. However, only by means of sequential sampling modeling could we conclude that it is conflict rather than switch value that drives response times in this task. Sequential sampling modeling also shows how switch value influences other cognitive components in this task.

1 Introduction

Sequential decision making refers to situations in which we continue to take a series of similar actions until we either decide to stop or are required to do so. A very prominent case of sequential decision making is foraging. A foraging animal collects food from its current patch of land (e.g., a grazing area), where resources are gradually depleting, until it decides to leave this patch for a new one, or until no resources are left. Humans face similar dilemmas on a regular basis. A characteristic example is a housing bubble. The longer we wait to sell a house, the higher the potential gain as long as the prices keep increasing. But if the housing market collapses, we might have to sell the house at a price that is even lower than the acquisition costs. Critically, both the grazing animal and the house seller need to trade off the expected benefits and risks of staying in the current situation against those of making a switch [1]. An inherent property of sequential decision-making problems is that this trade-off between staying and switching becomes increasingly difficult the longer one stays in a given environment [2]. Thus, when the animal starts to exploit a new and rich patch, it is obvious that the animal should keep harvesting from the patch for a while. However, as the patch gets more and more depleted, the expected benefits from staying and switching become increasingly similar. The marginal value theorem predicts that a reward-maximizing animal will leave the old patch as soon as the expected benefit of switching is equal to or higher than the expected benefit of staying [1]. As a consequence, the final decision is usually made at the indifference point (IP) between staying and switching [2], which reflects the point of maximal choice conflict.
Choice conflict in this context is a latent variable that cannot be observed directly but must be inferred from modeling the choices of a decision maker, taking into account their own idiosyncratic preferences (e.g., risk tendencies).

This feature of sequential decision making has been the center of a recent controversy on the neural basis of foraging [3, 4]. In particular, it has been debated whether the dorsal anterior cingulate cortex (dACC) encodes the value of switching to the next patch (henceforth: ‘switch value’) [5] or the conflict of choosing whether to stay or to switch [2, 6]. The methodological challenge behind this debate is that both the switch value and the conflict rise until the IP, and thus can be confounded when used as predictors for neural activity (Fig 1). A distinction is only possible after reaching the IP, when switch value keeps rising while choice conflict declines (as the decision to switch becomes more and more obvious; see shaded grey areas in Fig 1). Thus, the two can be dissociated by modifying standard foraging tasks so that more decisions are made after the IP. For example, one could create situations of heavily depleted patches, such that the switch value is very high while choice conflict is relatively low, as it is quite obvious to decide to move away from the depleted patch.

Fig 1.

Shaded grey areas represent decision settings after the IP (i.e., the point at which leaving the patch has the same expected utility as staying in the patch). Decisions made after the IP have lower choice conflict (black) than those made at the IP, and are thus crucial to discriminate conflict from the monotonically increasing switch value (blue). In Patch 1, for example, switch value and conflict are perfectly correlated. The decisions shown here were taken from 3 trials of one participant in Experiment 1, but they help to illustrate a general pattern in foraging and sequential decision-making tasks.

https://doi.org/10.1371/journal.pcbi.1010478.g001

Notably, the same methodological challenge can also be found in sequential decision making under risk and uncertainty. A prominent example of this type of task is the Balloon Analogue Risk Task (BART), which involves inflating a virtual balloon with sequential pumps. Participants accumulate more money the larger the balloon gets, but lose that accumulated money if the balloon pops [7]. Thus, each decision to pump the current balloon rather than cash in and move on to the next balloon comes with potential risk and reward. Consistent with the foraging literature discussed above, neural signals that increase with every pump have been found in dACC [8], but it remains unclear whether these signals represent the value of cashing in (i.e., switch value), choice conflict, escalating risks, or simply the number of executed pumps per round.

A recent study [9] attempted to tackle this question using a similar task, based on the dice game ‘pig’. In this game, participants repeatedly roll a die in a series of rounds. Every time they roll a die and a number between 2 and 6 faces up, that number is added to their accumulated reward for that round. If they roll a 1, however, the round ends and they lose the rewards accumulated in that round. After every roll of a die that does not result in a 1, participants can choose whether to terminate the current round or to continue rolling the die (Fig 2A). If they decide to terminate, the cumulative sum of rewards is cashed in. Their ultimate payoff depends on the average cumulative sum of rewards per round, so the potential costs of rolling a 1 (and thus losing everything) must be traded off against the disadvantage of cashing in too early (and thus collecting too little for that round). In fact, the expected value (EV) maximizing policy for a risk-neutral player is to keep rolling until the expected loss amount is higher than the EV of one more roll [9]. Thus, the ‘pig’ game offers a simpler and less ambiguous definition of the optimal policy compared to the BART, and thereby a more transparent quantification of risk and expected value (in the BART, participants can only obtain an accurate belief about the loss probability after observing many explosions of the balloon). Still, it shares the BART’s sequential decision-making structure (and that of foraging tasks), and with it the difficulty of separating choice conflict from switch value. Meder and colleagues [9] reported that neural activity in dACC scaled linearly with the cumulative sum of rewards in each round (which effectively represents the switch value in this task), and found a significant correlation between the cumulative sum of rewards and response times (RTs). They concluded that their data provided evidence in support of the switch-value hypothesis and against the conflict hypothesis [5].
Generally speaking, contrasting the two hypotheses of choice conflict versus switch value is important for understanding the computational and neural basis of these sequential decision-making tasks and others. With respect to the dACC, for instance, it is critical to understand whether this region encodes information related to or correlated with choice conflict—such as task difficulty, error likelihood, mental effort costs, or surprise [10]—or whether it encodes information specifically relevant to adaptive foraging.

Fig 2.

A: Example of two rounds of the ‘pig’ dice game, the top one ending with a decision to stop and a reward of 130, the bottom one ending because of a rolled 1 and with no reward. B: Probability of stopping (shaded areas) as a function of the cumulative sum within a round for a simulated risk-neutral player. The IPs (dashed lines) for the 1/6, 2/6, and 3/6 conditions are, respectively: 40, 85, and 200. C: Percentage of decisions before and after the IP for a simulated group of players with different risk preferences (risk neutral on average). The number of decisions made after reaching the IP increases with the probability of losing, making the task more balanced. D: Percentage of decisions made after the IP as a function of the IP itself, for the same group of simulated participants as in C. Participants with higher IPs (more risk seeking) experience a more imbalanced task in the 2/6 and 1/6 conditions.

https://doi.org/10.1371/journal.pcbi.1010478.g002

The purpose of the current study was to stress the need for careful consideration of a task’s structure as well as to demonstrate the benefits of employing computational modeling of behavior when studying sequential decision making. To this end, we used the ‘pig’ dice game as an exemplary sequential decision-making paradigm and subjected it to careful theoretical and empirical examination. In the following, we will first show that the standard version of the ‘pig’ dice game cannot dissociate variability in switch value (or in the number of choices to continue within a given round) from variability in choice conflict. As a consequence, it would be very difficult to disentangle the relative contributions of these variables to neural activity measured while performing the task. Second, we will introduce a simple extension of the paradigm to mitigate this shortcoming. Third, we will present results from two behavioral experiments, one a replication of the original and the other a novel extension of the task that was designed to better discriminate switch value and conflict. Finally, we will use modeling of choice and RT data via the diffusion decision model (DDM, [11, 12]) to map different task dynamics onto parameters of a well-established computational model of decision making.

2 Results

2.1 Structure of the ‘pig’ dice game and modifications

In the version of the ‘pig’ dice game used by Meder and colleagues [9], the dice numbers were multiplied by 10 to specify the rewards. For example, cashing in after rolling a die three times with numbers 3, 5, 5 would lead to a cumulative sum of rewards of 30+50+50 = 130 (top example in Fig 2A). On the other hand, if a 1 is rolled after 3, 3, 4 are rolled, the reward is not 110 but 0 (bottom example in Fig 2A). The time that participants spent on the entire task was fixed, and the monetary payoff was based on the average sum of rewards collected per round. This combination of task design and incentive structure implies a fairly simple definition of the optimal strategy for a risk-neutral player who seeks to maximize EV. The player should continue rolling the die until a cumulative sum of rewards of 200 has been reached and then terminate the current round by cashing in (i.e., IP = 200). The rationale behind this EV-maximizing strategy is that the expected loss of rolling a 1 on the next try exceeds the expected gain of rolling one of the other numbers as soon as more than 200 points have been accumulated.
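This risk-neutral threshold follows from a few lines of arithmetic. The sketch below (our own illustration, not the authors' analysis code) computes the cumulative sum at which the expected loss of one more roll matches its expected gain:

```python
# Risk-neutral stopping threshold for the standard 'pig' dice game.
# Continuing one more roll is worthwhile as long as the expected gain
# (mean reward over all six faces, where losing faces pay nothing)
# exceeds the expected loss (probability of losing times the sum at stake).

def expected_gain(win_faces):
    """Mean reward added per roll: each winning face pays 10x its value, p = 1/6 each."""
    return sum(10 * f for f in win_faces) / 6

def indifference_point(win_faces, p_lose):
    """Cumulative sum S at which p_lose * S equals the expected gain of one roll."""
    return expected_gain(win_faces) / p_lose

# Standard game: lose on a 1, keep rewards for faces 2-6.
ip = indifference_point([2, 3, 4, 5, 6], 1 / 6)
print(ip)  # 200.0: keep rolling below 200 points, cash in at or above
```

The same function reproduces the lower thresholds of the modified conditions discussed below when the winning faces and loss probability are changed accordingly.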

In this standard version of the game, it takes 4–10 consecutive die rolls to accumulate 200 points. Once that number is reached or crossed, points accumulate at a similarly gradual pace. Given that the latent variable of choice conflict for a risk-neutral player is by definition maximal at 200 points (i.e., at the player’s IP), it is likely that this player will need to make several choices to reach their IP. At the same time, it is unlikely that the player will continue to try accumulating rewards past their IP for long. In other words, the design of this task creates a strong imbalance in the number of decisions prior to and past the IP. This presents a significant challenge for disentangling correlates of reward accumulation and choice conflict, because it means that having greater accumulated reward will also generally mean being nearer to one’s IP (i.e., in a state of greater choice conflict about whether to continue or stop).

To demonstrate this, and to quantify the relationship between reward accumulation and choice conflict under different task conditions, we simulated a participant playing this game who chooses to continue or to stop in a probabilistic manner depending on the current distance of the cumulative rewards from 200 (the risk-neutral IP; details provided in Methods, Fig 2C, yellow curve). On average, this participant made only 9.7% of their decisions to stop the round past this IP. We then varied these risk preferences (IPs) across 100 simulated players (risk neutral on average, and with different sensitivity to value differences, details provided in Methods) and found that, across this population, players made only between 6% and 13% (10% on average) of their decisions past their IP (Fig 2C, where P losing = 1/6). This means that, most of the time they make a decision, the cumulative sum is lower than their IP. This low proportion of choices past the IP was higher for risk-averse participants (i.e., IP < 200) than for risk-seeking participants (i.e., IP > 200) (see Fig 2D, yellow dots).

To mitigate the imbalance of decisions before and after the IP, one can simply increase the probability of losing the cumulative sum of rewards at each roll of the die. In the standard version, participants have a 1/6 chance of losing with each roll (i.e., if they roll a 1). If this probability is increased to 2/6 (e.g., lose with a roll of 1 or 3) or to 3/6 (e.g., lose with 1, 3, or 5), then the optimal strategy is to shift one’s IP substantially downward. For a risk-neutral participant, this IP would be 85 points for the 2/6 case or 40 for the 3/6 case (Fig 2B, pink and blue curves). As a consequence, participants are expected to make fewer decisions prior to reaching their IP (because these lower IPs will be reached with fewer die rolls) and thus to have more balanced numbers of decisions before and after reaching their IP. Indeed, our simulations predict that participants will make 30% of their decisions past their IP when the probability of losing is 2/6, and that this number increases to 61% when the probability of losing is 3/6 (Fig 2C and 2D; see also Fig A in S1 Text for a group of simulated, on average risk-averse participants).
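The qualitative pattern of these simulations can be sketched in a few lines (our own toy construction, not the simulation code used for Fig 2; the logistic stopping policy and its sensitivity parameter k are assumed values for illustration):

```python
import math
import random

def frac_decisions_past_ip(lose_faces, ip, k=0.05, n_rounds=5000, seed=0):
    """Fraction of stay/stop decisions made after the cumulative sum crossed the IP."""
    rng = random.Random(seed)
    past = total = 0
    for _ in range(n_rounds):
        s = 0
        while True:
            roll = rng.randint(1, 6)
            if roll in lose_faces:
                break                                   # bust: round ends, no decision made
            s += 10 * roll
            total += 1
            past += s > ip
            p_stop = 1 / (1 + math.exp(-k * (s - ip)))  # logistic stopping policy
            if rng.random() < p_stop:
                break                                   # player cashes in
    return past / total

# Risk-neutral IPs per condition: 200 (1/6), 85 (2/6), 40 (3/6)
for lose, ip in [({1}, 200), ({1, 3}, 85), ({1, 3, 5}, 40)]:
    print(f"{len(lose)}/6:", round(frac_decisions_past_ip(lose, ip), 2))
```

As in our simulations, the fraction of decisions made past the IP increases monotonically with the probability of losing, although the exact values depend on the assumed policy.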

We tested these predictions empirically across two experiments. The purpose of Experiment 1 was to test whether we can replicate the central results of Meder and colleagues [9]. We also sought to substantiate our critique of the standard version of the task with empirical data. In Experiment 2, we tested the standard version together with the two proposed changes of loss probabilities as described above in a within-subject design.

2.2 Experiment 1: Replication of the standard ‘pig’ dice game

Participants (N = 30) performed the standard version of the ‘pig’ dice game with a constant 1/6 probability of losing with each roll. To estimate each participant’s IP, we regressed the decision to stop or to continue onto the cumulative sum of rewards for a given round, using a hierarchical Bayesian logistic regression model (details provided in the Methods section). The cumulative sum of rewards was a good predictor for the probability of stopping, as indicated by a well-above-zero regression coefficient (the 95% highest posterior density interval, or HDI, was between 0.03 and 0.05; Fig 3A and Table A in S2 Text). The odds ratio, calculated on the mean of the estimated posterior distribution of the logistic regression coefficient, was 1.04. The Bayesian R squared [13] of the logistic regression had a median of 0.62 (HDI = [0.61, 0.63]) (see Fig A in S3 Text). In line with Meder and colleagues [9], we found that the average estimated IP was below the EV-maximizing IP of 200 (HDI = [92.73, 163.06], Fig 3B), indicating risk aversion. Consistent with the results of our simulations, our participants made only very few decisions after the IP (M = 17.1%, SD = 10.7%, Fig 3C). We also calculated the Spearman’s rank correlation between the estimated trial conflict (based on the logistic regression results) and the cumulative sum of rewards. This gives us an idea of how feasible it is to dissociate switch value from conflict in this task. The Spearman correlation coefficient was on average .78 (SD = .26) across participants (with 96% of the p values being < 0.01, and 51% of the participants having a coefficient higher than .9).
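To illustrate how these quantities relate, the snippet below converts a logistic regression coefficient into an odds ratio and an IP (the slope β1 = 0.04 matches the reported group-level mean; the intercept β0 is a made-up example value, not an estimate from the data):

```python
import math

beta1 = 0.04            # effect of cumulative sum on the log-odds of stopping (reported mean)
beta0 = -5.0            # hypothetical intercept, chosen only for illustration

odds_ratio = math.exp(beta1)    # multiplicative change in stopping odds per point
ip = -beta0 / beta1             # cumulative sum at which p(stop) = 0.5

def p_stop(s):
    """Probability of stopping at cumulative sum s under the logistic model."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * s)))

print(round(odds_ratio, 2))   # 1.04, as reported
print(ip)                     # 125.0 with these example values
print(round(p_stop(ip), 2))   # 0.5 at the indifference point
```

With these example values the implied IP of 125 happens to fall inside the reported group-level HDI, but that is a property of the chosen intercept, not a result.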

Fig 3. Regression analyses Experiment 1.

The left column shows the results of the logistic model fit on choice data, while the right column shows the results of the linear model fit on RTs. A: Posterior distribution of the cumulative sum of rewards coefficient at the group level (the shaded area is the 95% HDI), when we predict the probability of stopping within a round. B: Estimated logistic curve (colored shaded area) and the IP (grey shaded area) at the group level. C: Distribution of trials before and after the estimated IP. D: Posterior distributions of the round number, draw number within a round, and cumulative sum of rewards coefficients at the group level (the shaded area is the 95% HDI), when we predict RTs. E: Posterior predictives of mean RT data as the probability of stopping increases within a round, against the mean RT data. F: Comparison of the same posterior predictives, selectively at the points of maximum conflict and at the points of maximum probability of stopping. Here the vertical lines represent the data while the shaded bars are the predictions.

https://doi.org/10.1371/journal.pcbi.1010478.g003

The low number of decisions made after reaching the IP confirms our simulation results reported above and calls into question whether the standard version of the task is suitable to isolate choice conflict from other factors that may drive behavior, such as the expected utility of stopping. Nevertheless, we tried to identify such potential effects by analyzing the RT data of Experiment 1. More specifically, we fitted a hierarchical Bayesian linear regression model (details provided in the Methods section) that regressed log(RT) onto the predictor variables round number, within-round draw number (i.e., the number of times a die was rolled within a round of the pig dice game), and cumulative sum of rewards. By doing so, we found influences on RT for all predictor variables except for the cumulative sum of rewards (see Fig 3D and Table B in S2 Text). The Bayesian R squared of the linear regression had a median of 0.62 (HDI = [0.61, 0.63]) (see Fig A in S3 Text). A careful look at the development of RT over the course of a round indicates that it is difficult to decipher whether RT decreased beyond the IP (Fig 3E), consistent with the fact that there were relatively few trials after the IP. More specifically, when comparing the predicted mean RT data at the points of maximum and minimum conflict (Fig 3F), we would expect to see a clear difference (with RT data at the point of maximum conflict being slower), but the 95% HDIs of the respective posterior predictives mostly overlap.
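A simplified, non-hierarchical sketch of this regression setup (ordinary least squares on synthetic data of our own construction, with assumed coefficient values; the paper fits a hierarchical Bayesian model) looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
round_num = rng.integers(1, 60, n).astype(float)   # round within the session
draw_num = rng.integers(1, 8, n).astype(float)     # die roll within the round
cum_sum = draw_num * 40 + rng.normal(0, 20, n)     # sum grows with draws (the confound)

true_b = np.array([-0.002, 0.03, 0.0005])          # assumed effects on log(RT)
log_rt = (0.1 + round_num * true_b[0] + draw_num * true_b[1]
          + cum_sum * true_b[2] + rng.normal(0, 0.2, n))

# Design matrix: intercept plus the three predictors used in the paper's model
X = np.column_stack([np.ones(n), round_num, draw_num, cum_sum])
coef, *_ = np.linalg.lstsq(X, log_rt, rcond=None)
print(np.round(coef, 4))
```

Because draw number and cumulative sum are strongly correlated by construction, their coefficients are recovered with much more uncertainty than the round-number effect, which mirrors the collinearity problem discussed throughout this paper.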

To summarize, in Experiment 1 we were able to replicate key behavioral patterns observed in [9] when participants perform a version of the pig dice game with a 1/6 probability of loss. In particular, we show that stopping choices are highly sensitive to the cumulative sum of rewards within a round, that participants are on average risk averse, and that RTs demonstrate a trend towards increasing with the cumulative sum of rewards. At the same time, we were also able to show that under this task design participants made the vast majority (> 80%) of their choices before hitting their IPs. As a result, there were many more trials in which the obvious choice was to continue than trials in which the obvious choice was to stop, and thus the value of switching tended to be highly correlated with choice conflict. We next examined whether these variables could be decorrelated by modifying the task to better balance the number of choices on either side of one’s IP.

2.3 Experiment 2: Extension of the ‘pig’ dice game

Participants (N = 50) performed a variant of the task in which the probability of losing at every die roll varied across conditions between 1/6, 2/6, and 3/6. As expected, the logistic regression analysis of choices made in this experiment (details about how these analyses were adapted for Experiment 2 are provided in the Methods section) revealed that (1) the cumulative sum of rewards was a good predictor for the probability of stopping in all conditions, as indicated by well-above-zero regression coefficients (HDI = [0.03, 0.05], [0.07, 0.09], [0.09, 0.14] for the 1/6, 2/6, and 3/6 conditions, respectively; Fig 4A and Table A in S2 Text), and (2) the average IP was substantially lower for the two added conditions (standard 1/6 condition: HDI = [108.75, 189.74]; 2/6 condition: HDI = [56.83, 92.89]; 3/6 condition: HDI = [26.56, 49.21]; Fig 4B). Note that in these analyses we tested the effect of the cumulative sum of rewards separately per condition (for an alternative analysis using the condition as a predictor and testing the interaction between condition and the cumulative sum of rewards, see S4 Text). The odds ratio, calculated on the mean of the estimated posterior distribution of the logistic regression coefficients, was 1.04, 1.08, and 1.12 for the 1/6, 2/6, and 3/6 conditions, respectively. The Bayesian R squared of the logistic regression had a median of 0.80 in the 1/6 condition (HDI = [0.788, 0.802]), 0.86 in the 2/6 condition (HDI = [0.856, 0.866]), and 0.905 in the 3/6 condition (HDI = [0.901, 0.909]) (see Fig B in S3 Text). The group-level IPs were all lower than the ‘risk-neutral’ IPs in their respective conditions, meaning that participants were on average risk-averse in all three conditions.
Most importantly, the proportion of decisions made after the IP was substantially greater in the 2/6 condition than in the 1/6 condition (mean difference = .13, padjusted = .018, DOF = 114, Tukey’s test), and in the 3/6 condition than in the 2/6 condition (mean difference = .31, padjusted = .001, DOF = 114, Tukey’s test) (Fig 4C). As for Experiment 1, we also calculated the Spearman’s rank correlation coefficient between the estimated trial conflict (based on the logistic regression results) and the cumulative sum of rewards. Across participants, the Spearman’s coefficient was on average .85 (SD = .28, with 97% of the p values being < 0.01 and 72% of the participants having a coefficient higher than .9) in the 1/6 condition, .42 (SD = .57, with 87% of the p values being < 0.01 and 25% of the participants having a coefficient higher than .9) in the 2/6 condition, and -.42 (SD = .62, with 87% of the p values being < 0.01, 2% of the participants having a coefficient higher than .9, and 32% having a coefficient smaller than -.9) in the 3/6 condition.
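The sign flip of this correlation can be illustrated with a toy computation (our own sketch, not the analysis code: conflict is proxied here by p(1 − p), which peaks at the IP, and the logistic sensitivity k is an assumed value):

```python
import numpy as np

def p_stop(s, ip, k=0.05):
    """Logistic probability of stopping at cumulative sum s."""
    return 1 / (1 + np.exp(-k * (s - ip)))

def spearman(x, y):
    """Spearman rank correlation via Pearson correlation of the ranks (no ties expected)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

ip = 85                                 # risk-neutral IP in the 2/6 condition
before = np.arange(10, ip, 5)           # decisions made only before the IP
spanning = np.arange(10, 2 * ip, 5)     # decisions on both sides of the IP

for sums in (before, spanning):
    p = p_stop(sums, ip)
    conflict = p * (1 - p)              # maximal where p(stop) is near .5
    print(round(spearman(sums, conflict), 2))
```

When all decisions lie before the IP, conflict and cumulative sum rank identically; once decisions extend past the IP, conflict becomes non-monotonic in the sum and the rank correlation collapses.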

Fig 4. Regression analyses Experiment 2.

The left column shows the results of the logistic model fit on choices, while the right column shows the results of the linear model fit on RTs. A: Posterior distribution of the cumulative sum of rewards coefficients (separate per condition) at the group level (the shaded area is the 95% HDI), when we predict the probability of stopping within a round. B: Estimated logistic curves (colored shaded area) and the IP (grey shaded area) at the group level. C: Distribution of trials before and after the estimated IP, separately by condition. D: Posterior distributions of the round number, draw number within a round, and cumulative sum of rewards coefficients at the group level (the shaded area is the 95% HDI), when we predict RTs. E: Posterior predictives of mean RT as the probability of stopping increases within a round. F: Comparison of the posterior predictives, selectively at the points of maximum conflict and at the points of maximum probability of stopping. Here the vertical lines represent the data while the shaded bars are the predictions.

https://doi.org/10.1371/journal.pcbi.1010478.g004

As with Experiment 1, we tested the influence of round number, decision number, and cumulative sum of rewards on RT data (Table B in S2 Text). Similar to Experiment 1, there was a negative effect of round number on RT in the 1/6 condition (implying faster decisions later in the experiment), but no systematic effects in the other two conditions. The coefficient for the decision number was above 0 for the 1/6 and 2/6 conditions (implying slower decisions later in a trial). Most importantly, the cumulative sum of rewards coefficient was lower than 0 in the 3/6 condition, around 0 in the 2/6 condition, and positive in the 1/6 condition. This shows that the relationship between cumulative sum of rewards and RT critically depends on the task settings: As the proportion of decisions after the IP increased, the effect of reward sum on RT turned from positive to negative. The Bayesian R squared of the linear regression had a median of 0.59 in the 1/6 condition (HDI = [0.58, 0.60]) (see Fig B in S3 Text), 0.63 in the 2/6 condition (HDI = [0.62, 0.64]), and 0.62 in the 3/6 condition (HDI = [0.61, 0.64]).

We also looked at the posterior predictive distributions of RT as a function of the predicted probability to stop (Fig 4E and 4F). Here, we were specifically interested in comparing the predicted RT for a stop probability close to .5 (i.e., when being at the IP) against the predicted RT for a stop probability close to 1 (i.e., when being well beyond the IP). Even though on average the RT was lower at a stop probability close to 1 across all conditions, this analysis did not provide conclusive evidence for a decline of RT after the IP. To confirm this point, we also fitted linear regression models to predict RTs from the cumulative sum of rewards after the IP, separately by participant and condition. On average, the cumulative sum of rewards coefficient was -0.0001 in the 1/6 condition (SD = 0.003, 5% with p < .01), -0.001 in the 2/6 condition (SD = 0.003, 7% with p < .01), and -0.002 in the 3/6 condition (SD = 0.003, 8% with p < .01). Therefore, we proceeded with simultaneously fitting choice and RT data using a sequential sampling modeling approach. The WAIC of the logistic regression model on choices was 10515, while that of the linear regression model on RTs was 6013.

2.4 Sequential sampling modeling

The previous sections provide evidence that switch value and choice conflict are confounded in standard versions of the ‘pig’ dice game (and any games that share an analogous structure). While, under the appropriate experimental conditions (i.e., higher probabilities of loss), it is possible to increase the number of trials after the estimated IPs, it is still unclear whether this can help identify a (non-monotonic) decline of RT after the IP that would be consistent with the choice conflict hypothesis. This might be because we have thus far relied on separate regression analyses for choice and RT data. As a result, we did not take advantage of the joint information provided by both choice and RT data (and by the full shape of the RT distribution). At the same time, we also did not account for potential asymmetries in the speed with which one chooses to continue versus stop the round, and for ways in which different conditions may affect how cautious participants are in making their decisions. To address these gaps, we present and compare different variants of a process model for decisions made in this game, based on the sequential sampling modeling framework.

Sequential sampling models allow us to fit choice and RT data jointly in order to map them onto meaningful model variables and parameters. In essence, sequential sampling models assume that a decision emerges from a noisy process of evidence accumulation that is terminated as soon as a targeted level of evidence in favor of a particular choice option has been reached [14–17]. We used the core version of the DDM with its four parameters drift rate, boundary separation, starting point, and non-decision time, but no across-trial variability parameters. The drift rate dt was linked to the utility difference between choosing to continue (vs. to stop) by specifying it at every decision point t as:

dt = δ0 − δ1 Σt (1)

where Σt refers to the cumulative sum of rewards, and δ0 and δ1 are free parameters. A positive drift rate favors the decision to continue, and a negative drift rate favors the decision to stop. It follows that this specification of the drift rate implicitly models choice conflict: As long as δ0 > δ1 Σt, there is more evidence for continuing, but as soon as δ0 < δ1 Σt, there is more evidence for stopping. Critically, at the IP, we have δ0 = δ1 Σt, which implies that the probability to choose either option is .5 and the expected RT is longest, reflecting maximum choice conflict. Therefore, we can use the DDM to infer participants’ IPs on the basis of their joint choice and RT data and to model risk preferences similarly to how it is done in the logistic regression model.
In addition to modeling the influence of accumulated reward on the drift rate, we also generated variants of the DDM in which the drift rate was modulated by the decision number (i.e., by substituting the decision number within rounds for Σt in Eq 1), and variants that allowed for the possibility that either the cumulative sum of rewards or the decision number might affect the threshold for responding (e.g., participants could become increasingly cautious after each 'continue' decision) or bias the starting point (e.g., the evidence required for making a 'stop' decision could be lowered after each 'continue' response). Since the threshold parameter must be positive, the equation for the threshold modulations was a softplus transformation of a linear predictor analogous to Eq 1: (2) At = log(1 + exp(α0 + α1Σt)), where At is the threshold at decision t and α0 and α1 are the threshold-specific analogues of δ0 and δ1. Since the relative starting point can vary from 0 to 1, we used the Phi transformation instead (i.e., the cumulative distribution function of the standard Gaussian): (3) zt = Φ(ζ0 + ζ1Σt), where zt is the relative starting point at decision t. We also fitted a baseline model in which none of the parameters was modulated by task variables.
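The two link functions can be sketched in a few lines; the intercept and slope values below are made up for illustration and are not fitted coefficients:

```python
import math

def softplus(x):
    """Maps any real number to a positive value: log(1 + exp(x)).
    Used so that the threshold stays positive (Eq 2)."""
    return math.log1p(math.exp(x))

def phi(x):
    """Standard-normal CDF, mapping any real number into (0, 1).
    Used so that the relative starting point stays in (0, 1) (Eq 3)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative (non-fitted) threshold and starting-point coefficients
a0, a1 = 0.5, 0.002
z0, z1 = 0.2, -0.003
thresholds = [softplus(a0 + a1 * s) for s in (0, 100, 300)]
start_points = [phi(z0 + z1 * s) for s in (0, 100, 300)]
```

Whatever linear trend the cumulative reward induces, the transformed threshold remains positive and the transformed starting point remains a valid relative position between the two boundaries.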

The DDM variants were fitted to the choice and RT data of Experiment 2 using a hierarchical Bayesian modeling approach (details are provided in the Methods section). Importantly, all models assumed different sets of parameters across the different conditions of the 'pig' dice game. In total, we compared 19 different DDM variants against each other, including a baseline model that assumed a fixed drift rate, threshold, and starting-point bias for every decision. As can be seen in Fig 5, the DDM variant that provided the most parsimonious account of the data assumed that the cumulative sum of rewards impacted all of the considered DDM parameters (i.e., drift rate, threshold, and starting-point bias). In general, all models in which the drift rate was modulated by the decision number fitted the data worse than the models in which the drift rate was modulated by the cumulative sum of rewards (see S5 Text for the full model comparison including all 19 models). Closer inspection of the posterior distributions of the parameter coefficients (see Table C in S2 Text and Fig 6A) indicated that the cumulative sum of rewards had the expected influence on the drift rate, with parameter δ1 being well above 0 in all three conditions.

Fig 5. Model comparison of diffusion decision models (DDMs) based on WAIC (data: Experiment 2).

All models included here have their drift rate modulated by the cumulative sum of rewards, while their thresholds and starting points could be either fixed (not modulated) or modulated by either the decision number or the cumulative sum of rewards. Lower WAICs indicate better fits to the data after accounting for model complexity. The error bars represent the estimated WAIC standard errors (SE). **Best model. *These models do not fit credibly worse than the best model.

https://doi.org/10.1371/journal.pcbi.1010478.g005

Fig 6. DDM analyses of Experiment 2.

A: Posterior distribution of the cumulative sum of rewards coefficients (separate per condition) at the group level (the shaded area is the 95% HDI) for three of the DDM parameters. B: Posterior predictives of mean RT data as the probability of stopping increases within a round. C: Distribution of trials after the estimated IP, separately by condition. D: Comparison of the posterior predictives, selectively at the points of maximum conflict and at the points of maximum probability of stopping. Here, the vertical lines represent the data while the shaded bars are the predictions.

https://doi.org/10.1371/journal.pcbi.1010478.g006

In the 3/6 condition but not the other two conditions, increases in the cumulative sum of rewards were also associated with higher thresholds, indicating greater overall caution when a lot of rewards had been collected. Across all three conditions, we also found that greater cumulative rewards were associated with a shift in one’s starting point towards the boundary for continuing (vs. stopping). While this may seem counter-intuitive at first (given that participants will also be increasingly likely to stop as cumulative reward increases), we speculate that it may reflect a bias towards repeating the previous response(s) (i.e., perseveration; [1820]).

Analogous to the logistic regression analysis, we estimated the IP based on the group-level drift-rate coefficients and, based on it, the number of trials before and after the IP: (4) IP = −δ0/δ1
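Given posterior draws of the group-level drift-rate coefficients, the IP and its credible interval follow directly. A sketch with hypothetical draws (real ones would come from the model fit; an equal-tailed central interval is used here as a stand-in for the HDI):

```python
import numpy as np

# Hypothetical posterior draws of the group-level drift coefficients
# (delta0 < 0, delta1 > 0, as in Eq 1); values are illustrative only.
rng = np.random.default_rng(0)
delta0 = rng.normal(-2.0, 0.1, size=4000)
delta1 = rng.normal(0.012, 0.001, size=4000)

ip_draws = -delta0 / delta1                      # Eq 4, applied per draw
interval = np.percentile(ip_draws, [2.5, 97.5])  # central 95% credible interval
```

Propagating the draws through Eq 4 (rather than dividing the posterior means) keeps the full posterior uncertainty of the IP.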

The estimated IPs were similar to the ones obtained from the logistic regression: HDI = [115.00 190.98] (standard 1/6 condition); HDI = [61.67 89.19] (2/6 condition); and HDI = [27.50 45.91] (3/6 condition), confirming that participants were on average risk-averse in all three conditions. Accordingly, we also obtained a similar percentage of decisions made after the IP (Fig 6C): again, it was substantially higher in the 2/6 condition compared to the 1/6 condition (mean difference = .14, adjusted p = .018, df = 116, Tukey's test), and in the 3/6 condition compared to the 2/6 condition (mean difference = .32, adjusted p = .001, Tukey's test). For a direct comparison of the logistic and DDM coefficients, see S6 Text.

By estimating the posterior predictive distributions of mean RT for different probabilities of stopping (Fig 6B), we could confirm one of our central hypotheses, namely that RTs decrease after the IP and thus reflect choice conflict rather than switch value. More specifically, when comparing the predicted RT for a stop probability close to .5 (i.e., at the IP) against the predicted RT for a stop probability close to 1 (i.e., well beyond the IP), the difference between mean RTs is larger in our proposed conditions (i.e., conditions 2/6 and 3/6) than in the original one (i.e., condition 1/6) (Fig 6D).

Taken together, the sequential sampling modeling analyses confirm that both choice and RT data in the ‘pig’ dice game primarily reflect choice conflict, and that RT data exhibit a non-monotonic relationship with the cumulative sum of rewards (i.e., switch value), insofar as they first increase up to the IP but then decrease after it. In addition, we found that switch value exerted multiple influences on behavior by affecting the drift rate, the starting-point bias, and (in the 3/6 condition) the boundary separation.

3 Discussion

In this study, we sought to illustrate difficulties inherent in studies of sequential decision making under risk when it comes to teasing apart how people process the value of switching (i.e., stopping a given round and cashing out) from the conflict they experience over whether to stay or switch. We further demonstrated how these challenges can be overcome through a combination of cognitive modeling, model-driven experimental modifications, and advanced statistical analyses. To this end, we took the 'pig' dice game as an exemplary paradigm of sequential decision making. In this task, a player can pile up rewards over a sequence of repeated die rolls while facing a constant threat of losing all of the collected rewards. By analyzing the expected behavior of a rational (i.e., risk-neutral) agent, we demonstrated that the standard version of the task may not be able to disambiguate choice conflict and switch value (here conceptualized as the sum of collected rewards within a round) because of the limited number of decisions made after the IP. Thus, we extended the 'pig' dice game by adding two new conditions, in which we increased the probability of losing the cumulative sum of rewards. This modification led to a more balanced design of the task. Ultimately, the simultaneous modeling of choice and RT data with the DDM allowed us to identify the influences of both choice conflict and switch value, and to map those influences onto different DDM parameters.

Our findings have broad implications for research into sequential decision-making in general and foraging-like settings in particular. With respect to research on foraging, our results confirm that the pig dice task, as originally devised, faces significant challenges in disentangling contributions of switch value (cf. foraging value) from those of choice conflict when investigating underlying mechanisms, as [9] attempted to do. While we did not collect neural data in this study, we did replicate their behavioral findings with an identical task design (Experiment 1) and showed that proper estimates of choice conflict for such a design are highly collinear with estimates of switch value. Our results are silent on which of these variables better accounts for dACC activity during such a task, but our normative and empirical findings from Experiment 2 suggest a clear path towards resolving this question. Previous studies by Shenhav and colleagues applied this same approach to deconfound foraging value and choice conflict in a non-sequential foraging choice task (originating in [5]) and demonstrated that choice conflict consistently accounted for dACC activity during these choices better than foraging value [2, 6, 21]. While questions remain regarding whether foraging value signals might emerge by altering other properties of this experimental design (e.g., the reward-related cues), this approach was successful at unambiguously disentangling conflict and foraging value by enabling the authors to test for qualitatively distinct patterns of neural activity (i.e., ones that vary monotonically vs. non-monotonically). It remains to be determined whether a deconfounded version of the ‘pig’ dice task (e.g., our Experiment 2) would continue to demonstrate that dACC tracks switch value (e.g., cumulative sum of rewards) or whether it would determine that activity previously attributed to switch value is in fact better accounted for by choice conflict. 
Importantly, in extending these theoretical and empirical findings to the domain of sequential choices, our current work provides an even richer account of relevant real-world choice settings, including how trial history effects within such settings help shape choice behavior over the course of a patch/round (see also [22]).

There is a plethora of work showing that people can adapt their choice strategies to different environments [2326]. Thus, it is possible that participants used different strategies in the three different conditions. Take the 3/6 condition as an example, in which the loss probability at every decision point is 50%. Effectively, this eliminates the sequential element of the task, which is usually characterized by a sequence of multiple consecutive actions that are terminated by a final choice (see Introduction). Therefore, participants may have employed a choice strategy in this condition that is typical of and appropriate for standard 2-alternative-forced-choice tasks and that is well-captured by the DDM. In the other two conditions, however, they may have used a different strategy that could have resulted in different dynamics. Although we cannot rule out this possibility, our modeling results speak against the notion of different strategies, as the DDM provided a good account of the behavioral data across all conditions, and the choice dynamics remained stable in most aspects. Furthermore, we suppose that the within-subject design of Experiment 2, in which each participant was required to perform all three conditions, should have discouraged the adoption of different choice strategies from one condition to the next.

As stated above, the present work has strong connections to both the foraging literature and sequential decision making under risk and uncertainty. The latter does not only include the BART and the 'pig' dice game, but also the Angling Risk Task [27], the Columbia Card Task [28], and the stock-buying task used by [29]. Such tasks have been used to study neural and physiological signals underlying evaluations of risky choice (see also [30, 31]). Similarly, our work is related to research on the optimal stopping problem, such as deciding when to give up on a project that requires a lot of resources to become a success [32] or when to stop checking websites for the cheapest flight and buy a ticket [33]. In both cases, the value of breaking off the investment or search process rises monotonically while the decision to 'call it quits' becomes more and more obvious. As such, the current findings have very broad implications for research across sequential decision-making tasks in general. First, the basic structure of sequential decision-making paradigms makes it difficult to investigate different dynamics, because most task features (e.g., cumulative rewards, escalating risks) and internal states (e.g., choice conflict) develop in a monotonic manner up to the terminal action. Therefore, it is necessary to understand this limitation and to consider variations of tasks that allow the researcher to investigate what happens after the point of maximum conflict has been reached. To exemplify how to apply our insights to other paradigms, take the BART: This task could be extended with a variant in which a single pump occasionally increases the balloon's volume massively. If this happens, the IP has most likely been exceeded to a large extent, and the decision to switch to the next round is fairly easy and made with comparatively little choice conflict.

Our results also demonstrate the utility of combining state-of-the-art statistical analyses of behavior (e.g., hierarchical Bayesian techniques) with similarly sophisticated cognitive modeling approaches in order to maximize the potential of dissociating different dynamics. First of all, while the extension of the task paradigm in itself revealed that—contrary to previous claims—the relationship between switch value and RT is not necessarily positive (Fig 4D), it provided only relatively weak evidence for a decline of RT after the IP (Fig 4E and 4F). The joint modeling of choice and RT by means of the DDM strengthened this evidence substantially (Fig 6B and 6C). Furthermore, applying cognitive modeling yielded a more fine-grained picture of the various choice dynamics in the 'pig' dice game. For instance, our DDM results suggest that switch value not only influences the rate of evidence accumulation within a given trial but also the 'initial settings' of the decision process in terms of boundary separation and starting-point bias. In particular, our modeling results suggest the presence of a perseveration (or stickiness) effect across trials within a given round, which leads to fast errors (i.e., fast 'continue' choices late in the round, in the presence of high evidence for 'stopping'). This was made evident by a modulation of the starting-point bias by the cumulative rewards in each round. This effect would not have been evident with simple regression analyses, which do not take the whole RT distribution into account. Taken together, we believe that the combination of extending the experimental paradigm and applying cognitive modeling was pivotal in clarifying the complex dynamics in the present sequential decision-making task. It will be important for future research to try to generalize these results to other sequential scenarios and to identify the (potentially diverging) neural correlates of the different cognitive mechanisms.

4 Methods

4.1 Ethics statement

The study was approved by the Department of Psychology Ethics Committee at the University of Basel. Participants provided written informed consent before the start of the experiment.

4.2 Simulations and task extension

We performed the first simulation to show how a risk-neutral, EV-maximizing participant would behave in our extended version of the 'pig' dice game. Because we also simulated players with non-neutral risk preferences (see further below), we will use the more general term expected utility (EU) in the following. The EU of the two options (i.e., stop vs. continue) were calculated as: EUstop = Σt^α and EUcontinue = pwin(Σt + 10x̄)^α, where Σt is the cumulative sum of rewards, pwin is the probability of a winning roll, and x̄ is the average winning number in a specific condition. In the 1/6 condition, we have x̄ = (2 + 3 + 4 + 5 + 6)/5 = 4; in the 2/6 condition, we have x̄ = (2 + 4 + 5 + 6)/4 = 4.25; in the 3/6 condition, we have x̄ = (2 + 4 + 6)/3 = 4. Parameter α was set to 1 for the risk-neutral player, so that the EU was equal to the EV.

We simulated choices by drawing random samples from a Bernoulli distribution, where the probability of stopping was obtained using the softmax function: (5) p(stop)t = 1/(1 + exp(−θ(EUstop − EUcontinue))), where θ is a sensitivity parameter, which was set to 0.2. We simulated 120 minutes of game play per condition.
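A sketch of such a simulated agent is given below. The winning die faces and parameter values follow the text; the code itself is illustrative rather than the original simulation script.

```python
import numpy as np

def p_stop(sigma, p_win, xbar, alpha=1.0, theta=0.2):
    """Eq 5: softmax (logistic) probability of stopping at cumulative sum sigma."""
    eu_stop = sigma ** alpha
    eu_continue = p_win * (sigma + 10 * xbar) ** alpha
    return 1.0 / (1.0 + np.exp(-theta * (eu_stop - eu_continue)))

def simulate_round(winning, rng, alpha=1.0, theta=0.2):
    """One round of the 'pig' dice game for a softmax EU agent.
    `winning` lists the non-losing die faces of the condition."""
    p_win = len(winning) / 6.0
    xbar = sum(winning) / len(winning)
    sigma = 0
    while True:
        if rng.random() > p_win:                # losing roll: round forfeited
            return 0
        sigma += 10 * int(rng.choice(winning))  # collect ten times the face
        if rng.random() < p_stop(sigma, p_win, xbar, alpha, theta):
            return sigma                        # cash in and end the round

rng = np.random.default_rng(2)
payoffs = [simulate_round([2, 3, 4, 5, 6], rng) for _ in range(500)]  # 1/6 condition
```

For the risk-neutral player in the 1/6 condition, the stopping probability passes through .5 exactly at the IP of 200 and increases with the cumulative sum beyond it.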

The IPs of a risk-neutral player for each condition can be obtained by equating EUcontinue and EUstop, that is, by solving pwin(Σt + 10x̄) = Σt for Σt (with α = 1):

Therefore, when the probability of losing is 1/6: (6) (5/6)(Σt + 40) = Σt, which yields IP = 200.

Likewise, when the probability of losing is 2/6: (7) (4/6)(Σt + 42.5) = Σt, which yields IP = 85.

Finally, when the probability of losing is 3/6: (8) (3/6)(Σt + 40) = Σt, which yields IP = 40.
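These three risk-neutral indifference points can be verified in a few lines of exact rational arithmetic; the winning faces per condition follow the task description above:

```python
from fractions import Fraction

def risk_neutral_ip(winning):
    """Solve p_win * (IP + 10 * xbar) = IP exactly for the indifference point."""
    p_win = Fraction(len(winning), 6)
    xbar = Fraction(sum(winning), len(winning))
    return p_win * 10 * xbar / (1 - p_win)

ips = {name: risk_neutral_ip(faces) for name, faces in
       [("1/6", [2, 3, 4, 5, 6]), ("2/6", [2, 4, 5, 6]), ("3/6", [2, 4, 6])]}
print(ips)  # IPs of 200, 85, and 40
```

Using `Fraction` avoids floating-point rounding, so the three IPs come out as exact integers.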

We performed the second simulation to show how a group of participants with different risk preferences and sensitivities would behave in the standard and extended versions of the 'pig' dice game. Instead of fixing parameter α at 1, we randomly drew 100 values of α from a transformed normal distribution, α ∼ log(1 + exp(N(M = 0.5, SD = 0.3))), so that the resulting α had a mean of approximately 1. This distribution ensured that participants were on average risk neutral, with an IP of around 200 when the probability of losing was 1/6. To implement variability in the sensitivity parameter θ, we randomly drew 100 values of θ from a transformed normal distribution, θ ∼ log(1 + exp(N(M = −1.5, SD = 0.3))), so that the resulting θ had a mean of approximately .2, matching the fixed value used in the first simulation (Fig C in S1 Text). To make the simulations more similar to the original task, we simulated 20 minutes of game play per participant and condition.
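The softplus-transformed sampling of α and θ can be sketched, and its calibration checked, as follows. Many more draws than the 100 used in the simulation are taken here, purely to verify that the two transformed normals have means near 1 and .2:

```python
import numpy as np

rng = np.random.default_rng(42)
softplus = lambda x: np.log1p(np.exp(x))  # log(1 + exp(x)), keeps values positive

# Transformed-normal draws as described in the text
alpha = softplus(rng.normal(0.5, 0.3, size=100_000))   # risk preference, mean ~1
theta = softplus(rng.normal(-1.5, 0.3, size=100_000))  # sensitivity, mean ~.2
print(alpha.mean(), theta.mean())
```

The softplus transform guarantees strictly positive parameter values while leaving the means close to the intended targets (softplus is mildly convex, so the means land slightly above softplus of the underlying normal means).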

Finally, we performed a third simulation to show how a group of participants with different risk preferences, but risk averse on average (mean α = .8), would behave (Fig B in S1 Text).

4.3 Experiment 1

4.3.1 Participants.

In Experiment 1, we tested 30 participants (22 females, age: 18–48, M = 26.1, SD = 6.9). Four participants were excluded because they did not understand the task (e.g., they made fewer than 5% of either 'continue' or 'stop' decisions, or they made more 'stop' than 'continue' responses at the beginning of a round). The sample size was selected based on previous work with comparable sample sizes, but was not determined by a formal power analysis. Notably, the study that we sought to replicate [9] tested only 20 participants and analyzed the data of only 18 participants.

4.3.2 Experimental procedures.

After reading and signing the consent form, participants were instructed on the 'pig' dice game. This task was played in multiple rounds for a predetermined amount of time. Following Meder and colleagues [9], participants played the task for 25 min. Each round consisted of multiple sequential decisions to continue rolling a regular six-sided die (and thereby accumulating rewards) or to cash in the cumulative sum of rewards by stopping the current round and proceeding to a new round. The assignment of the two buttons to 'continue' and to 'stop' was counterbalanced across participants. As long as any number between 2 and 6 was rolled, the rolled number was multiplied by 10, and the reward was added to the current round's cumulative sum of rewards. As soon as a 1 was rolled, the cumulative sum of rewards was lost, and the player automatically proceeded to a new round. Participants were informed that they would receive a monetary bonus payment on top of their show-up fee (of 5 Swiss Francs or 1/2 course credits every 15 minutes), and that the amount of this bonus was a function of the average cumulative sum of rewards per round, including the rounds in which all rewards were lost. More specifically, participants were paid a tenth of this average reward in Swiss Francs. On average, participants received 6.3 Swiss Francs (SD = .94, min = 5, max = 8.5) as a bonus on top of their show-up fee. This incentive structure implies a simple and straightforward definition of the optimal policy for an EV-maximizing, risk-neutral player, which is to attempt to accumulate a reward of 200 and then cash in (see Methods above).

Each round began with a first die roll that was played out automatically. A regular white die with black dots was presented on a green, circular piece of carpet surrounded by a wooden background (Fig 2A). The roll of the die was animated by switching its orientation (tilted by 9 degrees to the right and to the left of the vertical axis) and randomly changing its displayed dots every 150 ms for a variable amount of time, drawn from a uniform distribution between 1.5 and 3.75 s and discretized in steps of 150 ms. Afterwards, the outcome of the current roll was displayed for 2 s by presenting the die in regular orientation. At the same time, the cumulative sum of rewards of the current round was presented on top of the die. In case of rolling a number from 2 to 6, participants had to make their decision to continue or to stop within this 2 s time window. Otherwise, they would be shown the losing die with a 0 reward on top for 2 seconds. In case the participant decided to stop, the cumulative sum of rewards of the current round was shown in the middle of the die for 2.5 s, and a new round started. If the participant decided to continue, the next die roll was animated as described above. After 25 min, the experiment ended (even if a round was not completed). Participants were then shown their average cumulative sum of rewards per round together with the respective bonus payments, received this payment together with their show-up fee, were debriefed, and left the experiment. The task was programmed using Psychopy [34].
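The discretized animation timing described above can be sketched as follows. This reproduces only the timing logic (a uniform draw between 1.5 and 3.75 s in 150-ms steps), not the PsychoPy implementation itself:

```python
import numpy as np

FRAME = 0.150  # duration of one animation frame in seconds (150 ms)

# Roll-animation durations: uniform between 1.5 s (10 frames) and
# 3.75 s (25 frames), discretized in 150-ms steps as described above.
rng = np.random.default_rng(7)
n_frames = rng.integers(10, 26, size=1000)  # inclusive low, exclusive high
durations = n_frames * FRAME
```

Drawing an integer number of frames and multiplying by the frame duration guarantees that every animation length is an exact multiple of the 150-ms update interval.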

4.3.3 Statistical analyses.

To estimate the IP for each participant, we performed a hierarchical Bayesian logistic regression analysis that regressed the decision to continue (vs. to stop) onto the cumulative sum of rewards (Σt) at every decision t. The regression model has the form: (9) p(continue)t = 1/(1 + exp(−(β0 + β1Σt))), with β0 and β1 representing the intercept and slope coefficients, respectively, and Σt the cumulative sum of rewards. The IP, which is defined as the cumulative sum of rewards at which p(continue)t = .5, can then be derived as: (10) IP = −β0/β1

We adopted a hierarchical Bayesian approach to estimate the regression coefficients and the IP. Individual coefficients were drawn from group-level Normal distributions, whose means and standard deviations were themselves drawn from hyper-prior distributions.

To investigate how the RTs of stop and continue decisions were affected by different dynamics, including choice conflict and switch value, we performed a hierarchical Bayesian linear regression predicting log(RT) from the round number (r), the within-round decision number, the cumulative sum of rewards (Σt), and choice conflict (cct). Choice conflict was defined as the derivative of the logistic curve taken from the aforementioned analysis of the choice data (note that the slope/derivative of this curve is maximal at p(stop)t = .5 and falls off symmetrically for both smaller and greater values of p(stop)t). Thus, the linear regression of RT had the form: (11) log(RT)t = β0 + β1r + β2t + β3Σt + β4cct
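The conflict regressor can be computed directly from a fitted logistic curve; a sketch with made-up coefficients (β1 < 0, so that p(continue) falls as rewards accumulate, implying an IP of −β0/β1 = 200):

```python
import numpy as np

# Hypothetical logistic coefficients (not fitted values)
beta0, beta1 = 4.0, -0.02
sigma = np.linspace(0, 400, 401)  # grid of cumulative sums, step 1
p_continue = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * sigma)))

# Derivative of the logistic curve w.r.t. sigma (in absolute value):
# it peaks where p = .5 and falls off symmetrically on both sides.
conflict = np.abs(beta1) * p_continue * (1.0 - p_continue)
peak = sigma[np.argmax(conflict)]
print(peak)  # 200.0, i.e., the IP
```

Because the logistic derivative is p(1 − p) times the slope, the regressor is maximal exactly at the IP, which is what makes it a suitable operationalization of choice conflict.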

Individual regression coefficients were drawn from group-level normal distributions, with separate hyper-priors for the slope coefficients (i > 0) and for the intercept (i = 0).

All Bayesian hierarchical models were estimated using PyStan, a Python interface to Stan [35]. For the logistic and linear regressions, we ran 2 chains with 2,000 iterations each, the first half of which were discarded as warm-up. For the sequential sampling models, we ran 4 chains with 8,000 iterations each, the first 7,000 of which were discarded. At the end of the model-fitting procedure, we performed the following sanity checks: we verified convergence by checking that the R-hat measure was lower than 1.05, that no more than 1% of the iterations were divergent, and that no more than 1% of the iterations reached the maximum tree depth of 10.
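For reference, the R-hat convergence criterion can be sketched as a minimal split-R-hat implementation on synthetic chains; the actual analyses relied on Stan's built-in diagnostics rather than this code:

```python
import numpy as np

def split_rhat(chains):
    """Split-R-hat diagnostic (Gelman et al.) for an (n_chains, n_draws) array:
    each chain is split in half, then the between-half-chain variance is
    compared with the average within-chain variance."""
    n = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    w = halves.var(axis=1, ddof=1).mean()    # within-chain variance
    b = n * halves.mean(axis=1).var(ddof=1)  # between-chain variance
    var_plus = (n - 1) / n * w + b / n       # pooled posterior-variance estimate
    return float(np.sqrt(var_plus / w))

rng = np.random.default_rng(3)
good = rng.normal(size=(4, 2000))  # four well-mixed chains: R-hat near 1
bad = good.copy()
bad[0] += 3.0                      # one chain stuck elsewhere: R-hat inflated
```

Well-mixed chains give values close to 1 (hence the 1.05 criterion), while a chain exploring a different region inflates the between-chain variance and pushes R-hat well above it.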

4.4 Experiment 2

4.4.1 Participants.

In Experiment 2, we tested 50 participants (35 females, age: 19–36, M = 22.8, SD = 3.2). Nine participants were excluded using the same criteria as in Experiment 1. A larger sample size than in Experiment 1 was chosen because of the two additional task conditions and to counteract the low number of decisions after the IP per participant. However, the sample size was not determined by a formal power analysis.

4.4.2 Experimental procedures.

The experimental procedures were largely identical to those of Experiment 1. The only exception was that the task included three different conditions, which varied the probability of losing the cumulative sum of rewards at every decision. Thus, in addition to the standard (1/6) condition, in which all rewards in a certain round were lost when rolling a 1, there was also a 2/6 condition, in which all rewards were lost in a round when rolling a 1 or a 3, and a 3/6 condition, in which all rewards in a round were lost when rolling any odd number (i.e., 1, 3, 5). Moreover, participants played 18 minutes of each condition, so that the entire experiment would not be too long. The order of conditions was counterbalanced across participants. In particular, we had three possible orders: high to low probability (p = 3/6, p = 2/6, p = 1/6), low to high probability (p = 1/6, p = 2/6, p = 3/6), and middle then high then low probability (p = 2/6, p = 3/6, p = 1/6). We also counterbalanced across participants whether the ‘stop’ button was on the right or on the left of the ‘continue’ button. Participants were instructed about the presence of different conditions prior to the task. On average, participants received 3.4 Swiss Francs (SD = .33, min = 3, max = 4) as a bonus on top of their show-up fee.

4.4.3 Statistical analyses.

Statistical analyses of Experiment 2 were largely identical to those of Experiment 1, with performing hierarchical Bayesian logistic and linear regressions for the choice and RT data, respectively. Note that separate regressors were estimated for each task condition.

4.4.4 Sequential sampling modeling.

The DDM is arguably the most prominent representative of sequential sampling models. Without assuming across-trial variability in its parameters [12], the DDM maps choice and RT data onto the four parameters drift rate dt, boundary separation a, non-decision time Ter, and (relative) starting point b on the basis of the Wiener distribution [36]: (12) (choicet, RTt) ∼ Wiener(a, Ter, b, dt), with the drift rate specified as in Eq 1: (13) dt = δ0 + δ1Σt

DDM parameters were estimated using hierarchical Bayesian modeling procedures (similar to the logistic and linear regression models described above). Individual coefficients for the drift rate, threshold, and starting-point bias were drawn from group-level Normal distributions. The means and standard deviations of these distributions for the drift rate and threshold were themselves drawn from hyper-prior distributions, whereas the hyper-priors of the starting-point bias were set differently. Since the non-decision time did not have coefficients, its prior and hyper-priors were the same as those of the δ0 parameter of the drift-rate and threshold parameters.

The DDM was applied to the data of Experiment 2, assuming different parameter sets for each condition.

Supporting information

S1 Text.

Fig A: Top panel: Percentage of decisions before and after the IP for a simulated group of 100 players with different risk preferences (risk averse on average, with α = .8). The decisions before the IP significantly decrease with the probability of losing, making the task more balanced. Lower panel: Percentage of decisions after the IP as a function of the IP, for the same group of simulated participants as in C. Participants with higher IPs (more risk seeking) experience a more imbalanced task in the 2/6 and 1/6 conditions. Fig B: Joint parameter distribution used for the simulation of 100 participants risk averse on average (mean α = .8). These parameters were used for the simulation reported in S1 Text. Fig C: Joint parameter distribution used for the simulation of 100 participants risk neutral on average (mean α = 1). These parameters were used for the simulation reported in the main text.

https://doi.org/10.1371/journal.pcbi.1010478.s001

(PDF)

S2 Text.

Table A: Logistic regression coefficients summary. Note. Summary of the intercept and cumulative sum coefficients at the group level. Table B: Linear regression coefficients summary. Note. Summary of the intercept, cumulative sum of rewards, decision number, and round number coefficients at the group level. Table C: Diffusion decision model coefficients summary. Note. The IP was calculated based on the drift-rate coefficients alone (IP = −δ0/δ1). The μ HDIs are reported for all the diffusion decision model coefficients that describe the effect of the cumulative sum on the main parameters: the drift rate δ, the threshold A, and the relative starting point z. The model was only fit to the data of Experiment 2.

https://doi.org/10.1371/journal.pcbi.1010478.s002

(PDF)

S3 Text.

Fig A: Distribution of R squared based on 4000 posterior samples for the logistic and linear regressions in Experiment 1. Fig B: Distribution of R squared based on 4000 posterior samples for the logistic and linear regressions in Experiment 2.

https://doi.org/10.1371/journal.pcbi.1010478.s003

(PDF)

S4 Text.

Additional regression models. Additional analyses to check whether a logistic regression model in which the condition was treated as a continuous predictor, together with the cumulative sum of rewards and their interaction, could explain the data better.

https://doi.org/10.1371/journal.pcbi.1010478.s004

(PDF)

S5 Text.

Table A: Model comparison of diffusion decision models (DDM) based on WAIC (data: Experiment 2). Note. If a DDM parameter was not modulated by any variable, a single value was estimated for it. If it was modulated, an intercept and a coefficient variable were estimated per variable. Lower WAICs indicate better fits to data after accounting for model complexity. **Best model. *These models do not fit credibly worse than the best model (because of the relatively high WAICse).

https://doi.org/10.1371/journal.pcbi.1010478.s005

(PDF)

S6 Text.

Fig A: Posterior distribution of the group-level IP according to the logistic model and to the best fitting diffusion decision model in the second Experiment, separately for the different conditions. Note that the IP in the diffusion model is calculated based on the drift rate coefficients only.

https://doi.org/10.1371/journal.pcbi.1010478.s006

(PDF)

References

  1. Charnov E.L. Optimal foraging, the marginal value theorem. Theoretical Population Biology. 1976;9:129–136. pmid:1273796
  2. Shenhav A., Straccia M.A., Cohen J.D., Botvinick M.M. Anterior cingulate engagement in a foraging context reflects choice difficulty, not foraging value. Nature Neuroscience. 2014;17(9):1249. pmid:25064851
  3. Shenhav A., Cohen J.D., Botvinick M.M. Dorsal anterior cingulate cortex and the value of control. Nature Neuroscience. 2016;19(10):1286–1291. pmid:27669989
  4. Kolling N., Wittmann M.K., Behrens T.E.J., Boorman E.D., Mars R.B., Rushworth M.F.S. Value, search, persistence and model updating in anterior cingulate cortex. Nature Neuroscience. 2016;19. pmid:27669988
  5. Kolling N., Behrens T.E., Mars R.B., Rushworth M.F. Neural mechanisms of foraging. Science. 2012;336(6077):95–98. pmid:22491854
  6. Shenhav A., Straccia M.A., Botvinick M.M., Cohen J.D. Dorsal anterior cingulate and ventromedial prefrontal cortex have inverse roles in both foraging and economic choice. Cognitive, Affective, & Behavioral Neuroscience. 2016;16(6):1127–1139. pmid:27580609
  7. Lejuez C.W., Read J.P., Kahler C.W., Richards J.B., Ramsey S.E., Stuart G.L., et al. Evaluation of a behavioral measure of risk taking: The Balloon Analogue Risk Task (BART). Journal of Experimental Psychology: Applied. 2002;8:75–84. pmid:12075692
  8. Schonberg T., Fox C.R., Mumford J.A., Congdon E., Trepel C., Poldrack R.A. Decreasing ventromedial prefrontal cortex activity during sequential risk-taking: An fMRI investigation of the balloon analog risk task. Frontiers in Neuroscience. 2012;6. pmid:22675289
  9. Meder D., Haagensen B.N., Hulme O., Morville T., Gelskov S., Herz D.M., et al. Tuning the brake while raising the stake: network dynamics during sequential decision-making. Journal of Neuroscience. 2016;36(19):5417–5426. pmid:27170137
  10. Shenhav A., Botvinick M.M., Cohen J.D. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron. 2013;79:217–240. pmid:23889930
  11. Ratcliff R. A theory of memory retrieval. Psychological Review. 1978;85(2):59–108.
  12. Ratcliff R., Rouder J.N. Modeling response times for two-choice decisions. Psychological Science. 1998;9(5):347–356.
  13. Gelman A., Goodrich B., Gabry J., Vehtari A. R-squared for Bayesian regression models. The American Statistician. 2019.
  14. Busemeyer J.R., Gluth S., Rieskamp J., Turner B.M. Cognitive and neural bases of multi-attribute, multi-alternative, value-based decisions. Trends in Cognitive Sciences. 2019;23:251–263. pmid:30630672
  15. Gold J.I., Shadlen M.N. The neural basis of decision making. Annual Review of Neuroscience. 2007;30:535–574. pmid:17600525
  16. Heekeren H.R., Marrett S., Ungerleider L.G. The neural systems that mediate human perceptual decision making. Nature Reviews Neuroscience. 2008;9:467–479. pmid:18464792
  17. Ratcliff R., Smith P.L., Brown S.D., McKoon G. Diffusion decision model: current issues and history. Trends in Cognitive Sciences. 2016;20:260–281. pmid:26952739
  18. Kool W., Cushman F.A., Gershman S.J. When does model-based control pay off? PLoS Computational Biology. 2016;12(8):e1005090. pmid:27564094
  19. Miller K.J., Shenhav A., Ludvig E.A. Habits without values. Psychological Review. 2019;126(2):292. pmid:30676040
  20. 20. Miller K.J., Ludvig E.A., Pezzulo G., Shenhav A. Realigning models of habitual and goal-directed decision-making. In Goal-directed decision making (pp. 407–428); 2018. Elsevier.
  21. 21. Zacharopoulos G., Shenhav A., Constantino S., Maio G.R., Linden D.E. The effect of self-focus on personal and social foraging behaviour. Social cognitive and affective neuroscience. 2021;13(9):967–975.
  22. 22. Kane G.A., James M.H., Shenhav A., Daw N.D., Cohen J.D., Aston-Jones G. Rat anterior cingulate cortex continuously signals decision variables in a patch foraging task. Journal of Neuroscience. 2022;JN-RM-1940-21. pmid:35688627
  23. 23. Gluth S., Rieskamp J., Büchel C. Neural evidence for adaptive strategy selection in value-based decision-making. Cerebral Cortex. 2014;24:2009–2021. pmid:23476024
  24. 24. Payne J.W., Bettman J.R., Johnson E.J. Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1988;14:534–552.
  25. 25. Rieskamp J., Otto P.E. SSL: a theory of how people learn to select strategies. Journal of Experimental Psychology: General. 2006;135:207–236. pmid:16719651
  26. 26. Todd P.M., Gigerenzer G. Environments that make us smart: ecological rationality. Current Directions in Psychological Science. 2007;16:167–171.
  27. 27. Pleskac T.J. Decision making and learning while taking sequential risks. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2008;34:167–185. pmid:18194061
  28. 28. Figner B., Mackinlay R.J., Wilkening F., Weber E.U. Affective and deliberative processes in risky choice: age differences in risk taking in the Columbia Card Task. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2009;35:709–730. pmid:19379045
  29. 29. Gluth S., Rieskamp J., Büchel C. Deciding when to decide: time-variant sequential sampling models explain the emergence of value-based decisions in the human brain. Journal of Neuroscience. 2012;32:10686–10698. pmid:22855817
  30. 30. Gluth S., Rieskamp J., Büchel C. Classic EEG motor potentials track the emergence of value-based decisions. NeuroImage. 2013;79:394–403. pmid:23664943
  31. 31. van Duijvenvoorde A.C.K., Huizenga H.M., Somerville L.H., Delgado M.R., Powers A., Weeda W.D., et al. Neural correlates of expected risks and returns in risky choice across development. Journal of Neuroscience. 2015;35:1549–1560. pmid:25632132
  32. 32. Analytis P.P., Wu C.M., Gelastopoulos A. Make-or-break: Chasing risky goals or settling for safe rewards? Cognitive science. 2019;43(7):e12743. pmid:31310027
  33. 33. Baumann C., Singmann H., Gershman S.J., von Helversen B. A linear threshold model for optimal stopping behavior. Proceedings of the National Academy of Sciences. 2020;117(23):12750–12755. pmid:32461363
  34. 34. Peirce J.W. Psychopy—psychophysics software in python. Journal of neuroscience methods. 2007;162(1–2):8–13. pmid:17254636
  35. 35. Carpenter B., Gelman A., Hoffman M.D., Lee D., Goodrich B., Betancourt M., et al. Stan: A probabilistic programming language. Journal of Statistical Software. 2017;76(1):1–32.
  36. 36. Navarro D.J., Fuss I.G. Fast and accurate calculations for first-passage times in Wiener diffusion models. Journal of Mathematical Psychology. 2009;53:222–230.