Intertemporal Choice as Discounted Value Accumulation

Two separate cognitive processes are involved in choosing between rewards available at different points in time. The first is temporal discounting, which consists of combining information about the size and delay of prospective rewards to represent subjective values. The second involves a comparison of available rewards to enable an eventual choice on the basis of these subjective values. While several mathematical models of temporal discounting have been developed, the reward selection process has been largely unexplored. To address this limitation, we evaluated the applicability of the Linear Ballistic Accumulator (LBA) model as a theory of the selection process in intertemporal choice. The LBA model formalizes the selection process as a sequential sampling algorithm in which information about different choice options is integrated until a decision criterion is reached. We compared several versions of the LBA model to demonstrate that choice outcomes and response times in intertemporal choice are well captured by the LBA process. The relationship between choice outcomes and response times that derives from the LBA model cannot be explained by temporal discounting alone. Moreover, the drift rates that drive evidence accumulation in the best-fitting LBA model are related to independently estimated subjective values derived from various temporal discounting models. These findings provide a quantitative framework for predicting dynamics of choice-related activity during the reward selection process in intertemporal choice and link intertemporal choice to other classes of decisions in which the LBA model has been applied.


Introduction
In order to choose between rewards available at different points in time it is often necessary to evaluate the tradeoff between the size of potential rewards and the corresponding delays until their receipt. For example, deciding whether to save or spend a certain amount of money requires determining whether ensuring greater future wealth is worth delaying the pleasure of spending and consuming now. When engaged in this form of decision making, a class of decisions known as intertemporal choice, humans and other species discount the value of rewards in proportion to the delay at which they are available. Moreover, the behavior observed in intertemporal choice experiments reveals preferences consistent with a steep reduction in the value of rewards delayed from the present moment but more modest discounting of rewards delayed from future time points [1]. This property is particularly evident as a greater reluctance to forego immediate for delayed rewards compared with when both outcomes are delayed, a tendency that manifests itself in impulsivity and a predilection for procrastination. Several mathematical models have been shown to account for this pattern of delay discounting [2]. However, subjective valuation is only one of the cognitive processes involved in intertemporal choice behavior [3,4].
In addition to representing the value of delayed rewards, intertemporal choices require comparing alternatives and selecting among them. One proposal for how delayed rewards might be compared and selected is through a process of sequential sampling of discounted values [3]. Similar processes are commonly assumed to underlie perceptual judgments based on sensory evidence [5]. This hypothesis suggests that there exists a direct connection between choices made on the basis of discounted values and other choices which have been argued to derive from sequential sampling processes. However, the hypothesis that a sequential sampling process underlies intertemporal decision-making has not been empirically tested. Therefore, our primary goal is to determine whether intertemporal choice behavior can be explained by a sequential sampling process based on discounted value.
There are several computational models that employ sequential sampling mechanisms to explain choice behavior (cf. [6][7][8]). A major accomplishment of all of these models is their ability to provide a process-level account of how experimental manipulations such as time pressure and stimulus ambiguity simultaneously affect response times (RT) and error rates. While many of these models might be able to explain intertemporal choice behavior, we used the Linear Ballistic Accumulator (LBA) model [8] in our analyses. The LBA model incorporates the fundamental features of all sequential sampling models, including trial-to-trial variability in the rate of evidence accumulation, a decision criterion, and constants to account for perception and motor execution times. The major advantage of the LBA model is its analytical tractability, which facilitates testing several versions of the model to determine which combination of parameters best accounts for intertemporal choice behavior. We show that the LBA model provides an excellent description of the relationship between choice outcomes and RT and that best-fitting model parameters can be directly related to subjective values.

Subjects
Fifty healthy adults participated in this study (28 females, ages 19-46 years, mean 24.36 years). All subjects gave written informed consent. Stanford University's Institutional Review Board approved the study. One subject was excluded because the behavior did not allow us to estimate reliable temporal discounting parameters. Another three subjects were excluded because of data collection problems. Data from a total of forty-six subjects were analyzed (28 females, ages 19-46 years, mean 24.26 years).

Temporal discounting model and task design
The experiments were conducted over two sessions. The purpose of the first session was to estimate each individual's discount rate using a hyperbolic discounting model. For half of our subjects (n = 23) the second session consisted of an electroencephalography (EEG) experiment. For the other half the second session consisted of a functional magnetic resonance imaging (fMRI) experiment. The analyses reported below were obtained from the behavior observed during these EEG and fMRI sessions.
We assumed that the subjective value of delayed rewards was discounted according to

\[ V = \frac{r}{1 + kt} \tag{1} \]

where \(r\) is the magnitude of a reward offered at delay \(t\). The individually determined parameter \(k\) is the discount rate [9]. While subjects completed the first session, we used a stair-stepping procedure to approximate \(k\). All choices required participants to select between a delayed reward (of amount \(r\) available at delay \(t\)) and a fixed immediate reward of $10. For any choice, indifference between the immediate and delayed options implies a discount rate of \(k = (r - 10)/(10t)\). We refer to this implied equivalence point as \(k_{eq}\); our procedure amounted to varying \(k_{eq}\) systematically until indifference was reached. Specifically, we began with \(k_{eq} = 0.02\). If the delayed offer was chosen, \(k_{eq}\) was decreased by a step size of \(a = 0.01\) for the next trial. Otherwise, \(k_{eq}\) was increased by the same amount. At every second choice reversal, occurring within five consecutive trials, the step size was reduced by 5%. A total of 60 trials were completed. We placed no limits on the time subjects could take to respond, and presented both offers on the screen, as "$10 now" on the left side, and "$r in t days" on the right. Critically, our use of the hyperbolic discounting model to summarize behavior in this first experimental session had no bearing on the modeling results that follow. We used the hyperbolic model because it provided a good fit to behavior with a single parameter (\(k\)) summarizing preferences. Fits of this model were used solely to generate choices for the second experimental session. Alternative delay discounting functions that may or may not provide better fits to behavior would have a subtle impact on the choice set (dollar amounts of choice options) for the second study, but no impact on the model fitting that is the primary aim of the current study.
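The staircase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the simulated subject is noiseless (always picks the option with greater hyperbolic value), the delay offered on each trial is drawn from an arbitrary range that the text does not specify, and the step-size rule is simplified to "shrink by 5% at every second reversal" without the five-trial window.

```python
import random

IMMEDIATE = 10.0  # fixed immediate reward in dollars (from the text)

def hyperbolic_value(r, t, k):
    """Subjective value of amount r delayed by t days at discount rate k."""
    return r / (1.0 + k * t)

def amount_for_k_eq(k_eq, t):
    """Delayed amount that a subject with rate k_eq values at exactly $10."""
    return IMMEDIATE * (1.0 + k_eq * t)

def run_staircase(true_k, n_trials=60, k_eq=0.02, step=0.01, seed=0):
    """Run the staircase on a hypothetical noiseless subject; return final k_eq."""
    rng = random.Random(seed)
    last_choice, reversals = None, 0
    for _ in range(n_trials):
        t = rng.randint(7, 60)  # assumed delay range, not taken from the source
        r = amount_for_k_eq(k_eq, t)
        chose_delayed = hyperbolic_value(r, t, true_k) > IMMEDIATE
        if last_choice is not None and chose_delayed != last_choice:
            reversals += 1
            if reversals % 2 == 0:
                step *= 0.95  # simplified rule: shrink at every second reversal
        last_choice = chose_delayed
        # Delayed chosen: the offer was generous, so lower k_eq; otherwise raise it.
        k_eq = max(k_eq - step, 0.0) if chose_delayed else k_eq + step
    return k_eq
```

For a noiseless subject the returned value stays within one initial step size of the true discount rate; real choices are noisy, which is why a separate choice-consistency fit is still needed.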
After completing the first session, we fit a softmax decision function to participants' choices. Intuitively, this procedure allowed us to determine how consistently participants selected the option with greater subjective value. Practically, we fit the softmax to better equate choices during the second session across participants. In particular, our aim was to equate the relative impact of delayed rewards, across subjects, with respect to actual choice outcomes (i.e., the likelihood of selecting the delayed option). Best-fitting softmax functions were estimated by maximizing the likelihood of observed choices. We assumed that the likelihood of selecting a delayed reward (\(P_D\)) was given by

\[ P_D = \frac{1}{1 + e^{-m (V_D - V_I)}} \tag{2} \]

where \(V_D\) is given by Equation 1, \(V_I = \$10\) (i.e., the fixed value of the immediate reward, also given by the right side of Equation 1) and \(m\) describes a subject's sensitivity to changes in \(V_D\). We used individually determined values of \(k\) and \(m\) to generate choices for the second session. At every trial, \(t\) was randomly selected from a range of 30-45 days. We then calculated and offered an amount \(r\) that would give \(P_D\) of 0.1, 0.3, 0.5, 0.7, or 0.9 (Figure 1a-b). The EEG group completed 30 trials at every \(P_D\) level, except at \(P_D = 0.5\), for which they completed 60 trials. The fMRI group completed 40 trials at every \(P_D\) level, except at \(P_D = 0.5\), for which they completed 80 trials. Non-uniform trial distributions as a function of \(P_D\) were introduced to allow us to study the effects of choice difficulty on EEG and fMRI measures, with equal numbers of trials at each difficulty level. We report the results of these analyses elsewhere. Trial types were randomized and counterbalanced over two blocks for the EEG group and over four blocks for the fMRI group. We also counterbalanced the mapping between choices and button presses for every subject.
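To make the offer-generation step concrete, the sketch below inverts a logistic choice rule to find the delayed amount that yields a target choice probability. It assumes the standard logistic (softmax) form for the choice rule and hyperbolic discounting for value; the function names and the example parameter values are hypothetical.

```python
import math

def p_delayed(v_d, v_i, m):
    """Logistic probability of choosing the delayed reward (assumed form)."""
    return 1.0 / (1.0 + math.exp(-m * (v_d - v_i)))

def amount_for_target(p_target, t, k, m, v_i=10.0):
    """Delayed amount r (at delay t) whose discounted value gives P_D = p_target."""
    # Invert the logistic: V_D - V_I = logit(p) / m
    v_d = v_i + math.log(p_target / (1.0 - p_target)) / m
    # Invert hyperbolic discounting: r = V_D * (1 + k t)
    return v_d * (1.0 + k * t)

# Example with hypothetical subject parameters k = 0.02, m = 0.8 and a 40-day delay.
offers = {p: amount_for_target(p, t=40, k=0.02, m=0.8) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
```

Because the inversion is exact, feeding each offered amount back through the discounting and choice functions recovers the intended probability level.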
During the first half of the second session, approximately half of subjects (13 in EEG, 11 in fMRI) indicated choices of the delayed reward by pressing a button with their left index finger and immediate choices by pressing a different button with their right index finger. The other subjects indicated their choices by the inverse left-right mapping. All subjects switched the initial response mapping during the second half of the session.
To ensure reliable neural measures, we used a sequential presentation of delay and amount during the second session (Figure 1c). During pilot studies we found that a simultaneous presentation of delay and amount caused participants to fixate each piece of information in turn, producing excessive EEG artifacts. Presenting the information sequentially allowed subjects to maintain central fixation during the task, avoiding these artifacts. As we show below, this sequential presentation of delayed reward information had no adverse effects on behavior. We maintained the same sequential presentation during the fMRI study for the purpose of facilitating direct comparisons and pooling of behavioral data. We report RT as measured from the onset of the decision period, 1000 ms into the trial. The duration of the decision period was fixed at 4000 ms. When subjects made choices in less than 4000 ms the amount information disappeared and the screen remained blank until 4000 ms elapsed. Trial length was thus fixed at 5000 ms. We discarded any trial in which a response was made in less than 200 ms or fell outside of the decision period. To optimize experimental time and separability of neural signals across trials for both groups, we introduced a long inter-trial interval for the fMRI group (4-10 s), whereas the inter-trial interval was shorter for the EEG group (100-350 ms). In exchange for participation subjects received $10 cash and an additional amount, determined by their choice in a randomly selected trial, taken from either the first or second session.

Model specification and fitting
The choice set is \(\{I, D\}\), where \(I\) and \(D\) are the immediate and delayed rewards, respectively. The model assumes that evidence for \(I\) and \(D\) is accumulated independently in separate accumulators. Both accumulators begin with some choice bias, which is provided as independent amounts of starting point evidence \(\{a_I, a_D\}\), sampled from a common uniform distribution \(U[0, A]\). Evidence then increases through time at rates \(\{d_I, d_D\}\), which are sampled from independent normal distributions with means \(\{\mu_I, \mu_D\}\). Mean accumulation rates vary across value conditions \(v\), but the standard deviation \(\sigma\) is the same for \(I\) and \(D\). Therefore, \(d_I \sim N(\mu_{v,I}, \sigma)\) and \(d_D \sim N(\mu_{v,D}, \sigma)\). The accumulators gather evidence until one of them reaches a response threshold \(b\). The observed RT is the sum of the decision time and some extra time \(\tau\), which accounts for processes other than comparison and selection, such as temporal discounting and motor execution. Letting \(\{a_I, a_D\} = a\) and \(\{d_I, d_D\} = d\), the RT in any given trial is given by

\[ RT = \min\left( \frac{b - a}{d} \right) + \tau \tag{3} \]

where the minimum is taken over the two accumulators. The model provides a closed-form and joint account of RT and choice probability across value conditions by specifying "defective" probability density functions (PDF) for \(I\) and \(D\) in terms of the parameters just described. These defective PDFs give the probabilities of each accumulator reaching the bound at time t.
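The race just described is easy to simulate, which can help build intuition for how drift means shape both choices and response times. The following is a minimal sketch, not the authors' code: the function name and parameter values are hypothetical, the immediate-option drift mean is set to one minus the delayed-option mean (the scaling constraint used later in the text), and trials in which neither sampled drift is positive are simply resampled, which is an assumption of this sketch.

```python
import random

def simulate_lba_trial(mu_d, sigma, A, b, tau, rng):
    """Simulate one two-accumulator LBA trial; return (choice, rt)."""
    means = {"I": 1.0 - mu_d, "D": mu_d}
    while True:
        finish = {}
        for option, mu in means.items():
            start = rng.uniform(0.0, A)   # starting-point evidence
            drift = rng.gauss(mu, sigma)  # trial-specific accumulation rate
            if drift > 0:
                finish[option] = (b - start) / drift  # time to reach threshold b
        if finish:  # at least one accumulator will terminate
            choice = min(finish, key=finish.get)
            return choice, finish[choice] + tau  # add non-decision time

rng = random.Random(1)
trials = [simulate_lba_trial(0.7, 0.3, 0.5, 1.0, 0.3, rng) for _ in range(2000)]
prop_delayed = sum(choice == "D" for choice, _ in trials) / len(trials)
```

With a delayed-option drift mean above 0.5, the delayed reward wins most races, and every simulated RT exceeds the non-decision time by construction.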
For our best-fitting model, the full PDFs are given by

\[ \mathrm{PDF}_I(RT) = f_I(RT - \tau)\,[1 - F_D(RT - \tau)], \qquad \mathrm{PDF}_D(RT) = f_D(RT - \tau)\,[1 - F_I(RT - \tau)] \tag{4} \]

where \(f(RT - \tau)\) and \(F(RT - \tau)\) are the PDF and cumulative distribution functions of each accumulator (see [8] for details). We estimated LBA model parameters using a hierarchical Bayesian procedure. This procedure offers two advantages over conventional maximum likelihood methods, providing measures of uncertainty for every parameter estimate and allowing the sharing of information across subjects (e.g., [10,11]), which improves fitting accuracy [12][13][14]. We assume that the data for each subject are characterized by an individual set of LBA model parameters \(\theta\), and that these subject-specific parameters are constrained by a set of group-level parameters \(\phi\), which characterize the central tendency and dispersion of \(\theta\) across subjects. The procedure first samples the posterior distributions for every subject's \(\theta\) and uses these estimates to derive the posterior distribution of \(\phi\). On every subsequent iteration, the posterior estimates of \(\phi\) are used to constrain the sampling of possible values of \(\theta\) for every subject. We specified mildly informative priors for \(\theta\), based on empirical evidence from previous fits of the LBA model using the hierarchical Bayesian procedure [15]. For \(\phi\), we specified a conjugate relationship between prior and posterior (see, e.g., [16]). Assuming a conjugate relationship at the group level allowed us to derive exact conditional posterior distributions, so that we could perform the estimation of all of the parameters simultaneously, based on a single sample of subject-level parameters. The joint posterior distribution estimated by this procedure is given by:

\[ p(\theta, \phi \mid \mathrm{Data}) \propto L(\theta \mid \mathrm{Data})\, p(\theta \mid \phi)\, p(\phi) \tag{5} \]

where \(p(\phi)\) is the prior distribution for \(\phi\), \(p(\theta \mid \phi)\) is the prior distribution for \(\theta\) given \(\phi\), and \(L(\theta \mid \mathrm{Data})\) is the likelihood function of the data under the LBA model (given by Equation 4).
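For readers who want to evaluate the likelihood numerically, the single-accumulator density and distribution functions of the LBA have a closed form (derived in [8]). The sketch below writes them out with the Python standard library and combines them into a defective density for the winning accumulator. Function names and the example parameter values are hypothetical, and the code uses one non-decision time and no renormalization for trials in which no accumulator finishes, so it illustrates the structure of the likelihood rather than the exact fitted model.

```python
import math

def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lba_cdf(t, mu, sigma, A, b):
    """Probability that one accumulator has reached b by decision time t > 0."""
    z1 = (b - A - t * mu) / (t * sigma)
    z2 = (b - t * mu) / (t * sigma)
    return (1.0
            + (b - A - t * mu) / A * _Phi(z1)
            - (b - t * mu) / A * _Phi(z2)
            + t * sigma / A * (_phi(z1) - _phi(z2)))

def lba_pdf(t, mu, sigma, A, b):
    """Density of the time at which one accumulator reaches b (t > 0)."""
    z1 = (b - A - t * mu) / (t * sigma)
    z2 = (b - t * mu) / (t * sigma)
    return (-mu * _Phi(z1) + sigma * _phi(z1)
            + mu * _Phi(z2) - sigma * _phi(z2)) / A

def defective_pdf(rt, tau, mu_win, mu_lose, sigma, A, b):
    """Density of the winner finishing at rt while the loser has not yet finished."""
    t = rt - tau
    if t <= 0:
        return 0.0
    return lba_pdf(t, mu_win, sigma, A, b) * (1.0 - lba_cdf(t, mu_lose, sigma, A, b))
```

Summing the two defective densities (one per response) and integrating over RT gives a total slightly below one, because with normally distributed drifts there is a small probability that neither accumulator ever reaches threshold.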
To satisfy scaling conditions, we imposed a constraint such that the drift rates sum to one (i.e., \(\mu_{v,I} + \mu_{v,D} = 1\)). Consequently, it is sufficient to estimate only the drift rate for the delayed reward. For the subject-specific parameters, we first transformed the parameters so that they had continuous, infinite support (i.e., could take on any real value). Thus, for parameters bounded by zero, we applied a log transformation, whereas for the drift rates, which were bounded by zero and one, we used a logit transformation. Following these transformations, we specified the following priors for \(\theta\): [prior specifications not recoverable from the source]. To obtain the desired conjugate relationship between prior and posterior at the level of \(\phi\), we specified the following priors for the group-level means: [prior specifications not recoverable from the source], where \(\Gamma^{-1}(a, b)\) denotes the inverse gamma distribution with shape parameter \(a\) and scale parameter \(b\). This particular choice of \(a\) and \(b\) for the priors produces a skewed distribution with an approximate 95% credible set of (1.14, 9.05), and an expected value of 3.32. These choices reflect our a priori beliefs: we did not expect the between-subject variability to be less than 1, and felt that larger values would become increasingly less likely to account for these data. While our prior selections were informed by other similar modeling applications (see, e.g., [15]), we remained conservative in our choices to avoid undue parameter constraint, because our experimental task was considerably different from prior research using the hierarchical version of the LBA model.
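The reparameterization described above (log for positive parameters, logit for the delayed-option drift mean, with the immediate-option drift fixed by the sum-to-one constraint) can be sketched as a pair of inverse mappings. The dictionary keys and function names are hypothetical and chosen only for this illustration.

```python
import math

POSITIVE = ("A", "b", "sigma", "tau")  # parameters bounded below by zero

def to_unbounded(params):
    """Map LBA parameters onto the real line (log and logit transforms)."""
    raw = {name: math.log(params[name]) for name in POSITIVE}
    raw["mu_d"] = math.log(params["mu_d"] / (1.0 - params["mu_d"]))  # logit
    return raw

def from_unbounded(raw):
    """Inverse mapping; also recovers the immediate drift from the constraint."""
    params = {name: math.exp(raw[name]) for name in POSITIVE}
    params["mu_d"] = 1.0 / (1.0 + math.exp(-raw["mu_d"]))  # inverse logit
    params["mu_i"] = 1.0 - params["mu_d"]  # drift rates sum to one
    return params

example = {"A": 0.5, "b": 1.0, "sigma": 0.3, "tau": 0.3, "mu_d": 0.7}
round_trip = from_unbounded(to_unbounded(example))
```

Sampling on the unbounded scale means any proposed value maps back to a legal parameter, so the sampler never needs to reject proposals for violating bounds.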
We used Gibbs sampling to estimate parameters at the group level [16], and differential evolution with Markov chain Monte Carlo to estimate parameters at the subject level (DE-MCMC; [15,17]). For the subject-level estimates, we used 24 chains and obtained 5,000 samples after a burn-in period of 5,000 samples. We then thinned the chains to reduce autocorrelation by retaining every fourth sample. Thus, our estimates of the joint posterior distribution of LBA model parameters are based on 30,000 samples. The burn-in period allowed us to converge quickly to the high-density regions of the posterior distribution, while the rest of the samples allowed us to improve the reliability of the estimates.
To find the optimal number of parameters needed to account for intertemporal choice behavior, we tested a variety of model variants where different sets of parameters were assumed to vary across value conditions. We fit a total of eight variants, following a model building approach based on the Bayesian predictive information criterion (BPIC; [18]). Table 1 shows the model variants we fit (left column) with the particular constraints that were imposed (right column) along with the resulting BPIC values obtained (middle column). We started with the simplest possible model and added parameters only if they improved model fits on the basis of BPIC. The most basic model (M1) only allowed the mean drift rates \(\{\mu_I, \mu_D\}\) to vary across value conditions. Another four models freed each of the remaining parameters (\(\tau\), \(A\), \(b\) and \(\sigma\)), independently, across value conditions. Because the model that freed \(\tau\) (M2) was superior to M1, we considered three additional models that freed \(\{\mu_I, \mu_D\}\) and \(\tau\), together with each of the remaining parameters independently. None of these three models improved fits, indicating that no additional parameter combinations needed to be tested. We did not consider any models that freed parameters other than \(\mu\) between \(I\) and \(D\) because we found no a priori justification for them. Table 1 shows BPIC results for all the models tested. The best overall model, albeit by a small margin, was M2, which allowed mean drift rates (\(\mu\)) and non-decision times (\(\tau\)) to vary across experimental conditions. Figure 3 shows the quality of the fits obtained with this model. The match between the data and the model predictions is clear in each of the defective PDFs and histograms shown on the top row. These fits speak to the LBA model's ability to simultaneously account for observed RT distributions and choice probabilities during intertemporal choice.

Model fits
The bottom row of Figure 3 shows the model fits with the RT distributions for both accumulators on the same axis to better illustrate the relationship between choice probability and RT. As net value (i.e., \(|V_D - V_I|\)) increases, choices for the reward of less subjective value are slower relative to choices for the reward of greater value. This finding is illustrated by the increased separation of RT medians as the probability of choosing the delayed reward deviates further from \(P_D = 0.5\) (Figure 3). We confirmed the reliability of this pattern in the data by analyzing RT medians for choices that were consistent versus inconsistent with estimated subjective values. Specifically, we performed a rank test on RT medians for consistent and inconsistent choices for all value conditions for which \(P_D \neq 0.5\) and confirmed that inconsistent responses were slower than consistent responses in all of these conditions (\(p = 5.883 \times 10^{-12}\)). A similar relationship between RT and choice probability is commonly observed during perceptual decision making under stressed accuracy conditions. As choice probabilities deviate from \(P_D = 0.5\), the means of the drift rate distributions (\(\{\mu_I, \mu_D\}\)) grow further apart (cf. [19,20]). Recall that \(\mu_I = 1 - \mu_D\). However, subjects maintain an elevated accumulation bound (\(b\)) relative to the starting points (\(\{a_I, a_D\}\)). As a result, choices for the reward of less subjective value only occur in the improbable trials where the drift rate for the highest valued reward is unusually low, the drift rate for the lowest valued reward is unusually high, and subjects require more accumulated information before a decision can be made. If the starting points were large relative to the decision bound we would observe the opposite interdependence of RT and choice probabilities.
Inconsistent choices would be faster than consistent choices, because fast errors occur when the initial choice bias drives the accumulation close to the decision bound before much evidence influences the decision. This value accumulation mechanism can explain why our model fitting results indicated that variability in b or A was not required to provide a good fit for these data (i.e., M1 and M2 performed better than M3, M4, M6, and M7).

Non-decision time
The best-fitting model, M2, specifies a total of 13 subject-specific parameters, four more than the next best and simplest model, M1. The four additional parameters modeled differences in non-decision time (\(\tau\)) by value condition (\(P_D\)). To evaluate whether there was indeed systematic variance in non-decision time, we first inspected group-level estimates of \(\tau\), shown in the left panel of Figure 4. These parameter estimates showed a positive quadratic pattern centered at \(P_D = 0.5\). To test the quadratic relationship between \(\tau\) and value, we performed a mixed-effects regression analysis with the nlme package in R (Pinheiro et al., 2013), specifying subjects as random effects, and the regressor \((P_D - 0.5)^2\) as a predictor of subject-specific maximum a posteriori (MAP) estimates of \(\tau\). The results corroborated a positive quadratic relationship between \(\tau\) estimates and value (\(t(183) = 3.506\), \(p = 6 \times 10^{-4}\)), suggesting that there is an increase in valuation and/or motor-execution times as net value increases. In the LBA model, \(\tau\) functions as an offset term that captures differences in condition-wise RT that are not captured by the other parameters. The obvious empirical statistics related to average RT differences are condition-wise median and minimum RT. We therefore next tested (1) whether \(\tau\) estimates were related to either median or minimum RT, and (2) whether minimum and/or median RT differed by value condition as suggested by the positive quadratic relationship between \(\tau\) estimates and value.
The middle and right panels of Figure 4 plot subject-specific MAP estimates of \(\tau\) against minimum and median RT, respectively. We conducted two mixed-effects regressions (using subjects as random effects) to determine whether \(\tau\) estimates were related to minimum or median RT at each value condition. As hypothesized, \(\tau\) estimates showed a significant linear relationship with minimum RT (\(\beta = 0.137\), \(t(183) = 9.716\), \(p < 1 \times 10^{-16}\)) and also a significant linear relationship with median RT (\(\beta = 0.029\), \(t(183) = 4.599\), \(p < 1 \times 10^{-16}\)).
Given these results, we next sought to determine whether RT differed across value conditions in the same manner as did estimates of \(\tau\). To test this hypothesis, we ran two additional mixed-effects regressions using the quadratic regressor \((P_D - 0.5)^2\) as a predictor of minimum and median RT (with subjects again as random effects). Recall that \(\tau\) estimates showed a positive quadratic relationship with value. This relationship with value was not evident in analyses of minimum or median RT. Specifically, minimum RT did not show a significant quadratic relationship (\(t(183) = -0.403\), \(p = 0.688\)), and median RT showed a significant negative relationship with value (\(t(183) = -6.169\), \(p < 1 \times 10^{-16}\)). We conclude from these results that neither minimum nor median RT alone can explain the positive quadratic relationship between \(\tau\) and value. Taken together, our results suggest that the additional degrees of freedom in M2 allowed the model to capture within-subject changes in minimum RT and residual variance of median RT across value conditions.

Drift rates and value
To obtain a more precise characterization of M2 as a mechanistic theory of discounted value accumulation, we examined the relationship between independently estimated accumulation rates and discounted values. We first tested whether there were systematic differences in group-level estimates of \(\mu_D\) as a function of \(P_D\). Group-level means of \(\mu_D\) increased as a function of \(P_D\). Specifically, we ran a mixed-effects regression of subject-specific MAP estimates of \(\mu_D\) on \(P_D\) (using subjects as random effects). This test revealed a significant positive linear relationship (\(\beta = 0.124\), \(t(183) = 30.587\), \(p < 1 \times 10^{-16}\); Figure 5, left plot).
Next, we tested for a relationship between observed choice probabilities for the delayed reward and MAP estimates of \(\mu_D\) and \(\mu_I\) at the level of individual subjects. Specifically, we hypothesized that drift rates (\(\mu\)) should be related to subjective value through a linear transform, with a slope parameter to account for differences in scale (i.e., \(\mu_D\) and \(\mu_I\) are restricted to be between 0 and 1 but \(V_D\) and \(V_I\) are in dollars with a mean of $10) and an offset parameter to account for differences in drift rate and value means. We further reasoned that if drift rates were directly related to discounted subjective value then drift rates ought to be related to choice probability in the same way that differences in value are related to choice probability. Based on fits of the hyperbolic temporal discounting model (Equation 1) to choice outcomes, we already knew that a sigmoidal relationship (Equation 2) existed between subjective value (i.e., \(\Delta V = V_D - V_I\)) and choice probabilities (i.e., \(P_D\)). If modeled drift rates had the same relationship then we would expect a similar relationship between \(P_D\), \(\mu_D\), and \(\mu_I\). However, \(\mu_D\) and \(\mu_I\) were not independent in our model specification. They were restricted such that \(\mu_D + \mu_I = 1\). Thus, the difference in drift rates, \(\Delta\mu = \mu_D - \mu_I\), reduces to a linear transformation of \(\mu_D\): \(\Delta\mu = 2\mu_D - 1\). We therefore tested whether a sigmoidal relationship exists between subject- and condition-specific \(P_D\) and a linear transform of \(\mu_D\):

\[ P_D = \frac{1}{1 + e^{-(\beta_1 + \beta_2 \mu_D)}} \tag{6} \]

where \(\beta_1\) and \(\beta_2\) are subject-specific parameters.
We tested for evidence to support Equation 6 in two ways. First, we performed a mixed-effects logistic regression using \(\mu_D\) to predict \(P_D\), with subjects as random effects. This analysis revealed a significant fit (\(\beta_1 = -4.782\), \(\beta_2 = 9.556\), \(z = 47.03\), \(p < 1 \times 10^{-16}\)). The sigmoidal relationship is also clearly evident in the center plot of Figure 5, which plots \(P_D\) against \(\mu_D\). Next, we tested whether the relationship between \(P_D\) and \(\Delta V\) (i.e., Equation 2) was directly related to the relationship between \(P_D\) and \(\Delta\mu\) (i.e., Equation 6). If so, then the logistic function in both analyses should be equivalent and the following relationship should hold:

\[ \beta_1 + \beta_2 \mu_D = m (V_D - V_I) \tag{7} \]

We estimated all of the parameters in Equation 7 from separate logistic regression analyses. Namely, \(\beta_1\) and \(\beta_2\) were obtained from fitting Equation 6, \(m\) was derived from fitting Equation 2, and \(V_D - V_I\) was obtained from best fits of Equation 1, all independently for every subject. In a group-level analysis, we used a mixed-effects regression with subjects as random effects and the right side of Equation 7 as the predictor. This analysis revealed a highly significant slope near unity (\(\beta = 0.881\), \(t(183) = 51.326\), \(p < 1 \times 10^{-16}\)). Together, these analyses indicated that there was a strong and direct relationship between drift rates and discounted value. Parameter estimates derived from fitting the LBA model to behavior therefore provided an independent means of estimating subjective values. Moreover, subjective values estimated from the LBA model corresponded closely with values estimated using a hyperbolic discounting model.
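As a quick numerical check on the drift-rate analysis, the sketch below evaluates a logistic function of the delayed-option drift mean using the group-level coefficients reported above (-4.782 and 9.556), and re-expresses the same linear predictor in terms of the drift difference. The logistic sign convention and the function names are assumptions made for this illustration.

```python
import math

BETA_1, BETA_2 = -4.782, 9.556  # group-level estimates reported in the text

def delta_mu(mu_d):
    """Drift difference under the constraint mu_D + mu_I = 1."""
    return 2.0 * mu_d - 1.0

def p_delayed_from_drift(mu_d, b1=BETA_1, b2=BETA_2):
    """Assumed logistic link between the delayed drift mean and P_D."""
    return 1.0 / (1.0 + math.exp(-(b1 + b2 * mu_d)))

def linear_predictor_in_delta(mu_d, b1=BETA_1, b2=BETA_2):
    """Same predictor rewritten as (b1 + b2/2) + (b2/2) * delta_mu."""
    return (b1 + b2 / 2.0) + (b2 / 2.0) * delta_mu(mu_d)
```

With these coefficients the intercept in terms of the drift difference is close to zero, so equal drift rates (a delayed drift mean of 0.5) correspond to a choice probability very near 0.5, as expected if drift differences play the role of value differences.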

Generalizability of the relationship between drift rates and value
The previous analysis showed that a relationship existed between drift rates derived from LBA model fits and subjective value calculated based on a hyperbolic discount function. Of course, subjective value may actually be determined in a manner that differs in functional form from the hyperbolic equation (cf. [2]). Indeed, numerous functions have been proposed to account for delay discounting. In this final section, we aimed to show that drift rates derived from the LBA model are related to subjective value more generally; that is, that the relationship between drift rates and subjective value does not strictly depend on capturing subjective value using the hyperbolic discount function. To do so, we first fitted two additional discounting models to individual subjects' choices, substituting the right side of Equation 1 with exponential and "quasi-hyperbolic" value functions. For the exponential discounting function, we assumed \(V_D\) to be given by:

\[ V_D^{e} = r e^{-a t} \tag{8} \]

where \(r\) is the delayed reward amount, \(a\) is the discount rate, and \(t\) is the delay. Similarly, for the quasi-hyperbolic discounting function, we assumed \(V_D\) to be given by:

\[ V_D^{\beta\delta} = \beta \delta^{t} r \tag{9} \]

where \(r\) is again the delayed reward amount, \(\beta\) is 1 when there is no delay or some fixed value between 0 and 1 when there is a delay, \(\delta\) is between 0 and 1, and \(t\) is the delay (always greater than zero). We then obtained estimates of \(V_D - V_I\) using Equation 8 and Equation 9, as well as two independent estimates of \(m\), one for each discounting function, from Equation 2, for every subject. Next, we ran mixed-effects regression analyses with subjects as random effects and the right side of Equation 7 as predictors of subject-specific drift rate estimates. The analysis using \(V_D^{e}\) revealed a significant slope near unity (\(\beta = 0.938\), \(t(183) = 25.662\), \(p < 1 \times 10^{-16}\)) and the analysis using \(V_D^{\beta\delta}\) also revealed a significant positive slope (\(\beta = 0.47\), \(t(183) = 15.58\), \(p < 1 \times 10^{-16}\)).
We therefore conclude that drift rates are related to subjective value independent of the specific functional form assumed for delay discounting.
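The three value functions compared in this section can be placed side by side in code. The sketch below uses the textbook forms of hyperbolic, exponential, and quasi-hyperbolic discounting; the function names and the example parameter values are hypothetical and are not the fitted estimates.

```python
import math

def hyperbolic_value(r, t, k):
    """Hyperbolic discounting: value falls as 1 / (1 + k t)."""
    return r / (1.0 + k * t)

def exponential_value(r, t, a):
    """Exponential discounting with constant discount rate a."""
    return r * math.exp(-a * t)

def quasi_hyperbolic_value(r, t, beta, delta):
    """Beta-delta discounting: beta applies only when the reward is delayed."""
    present_bias = beta if t > 0 else 1.0
    return present_bias * (delta ** t) * r

# Value of a hypothetical $20 reward at several delays (in days) under each function.
delays = (0, 30, 45)
table = {
    "hyperbolic": [hyperbolic_value(20.0, t, 0.02) for t in delays],
    "exponential": [exponential_value(20.0, t, 0.015) for t in delays],
    "quasi_hyperbolic": [quasi_hyperbolic_value(20.0, t, 0.8, 0.995) for t in delays],
}
```

All three functions return the undiscounted amount at zero delay and decline with delay, but only the quasi-hyperbolic form drops discontinuously as soon as any delay is introduced.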

Discussion
We have shown that intertemporal choice behavior is consistent with a process of discounted value accumulation instantiated by the LBA model. Our findings support the broader hypothesis that selecting among delayed rewards can be explained by a sequential sampling process that corresponds closely with mechanisms known to predict other types of choices (cf. [3]). Thus, perceptual and value-based decision making may depend on similar comparison and selection processes. It is interesting to speculate on whether this similarity reflects a direct correspondence between the cognitive and neural processes that support selection across diverse domains or whether there is simply a common motif for action selection used in separate choice domains.
The LBA model we employed here has been used to explain neural activity during perceptual decision making (cf. [20][21][22]). Furthermore, sequential sampling processes such as that implemented by the LBA model provide a direct link between neural dynamics and decision making behavior. For example, evidence about visual motion is believed to be integrated in the lateral intraparietal (LIP) area, resulting in a progressive increase in LIP neuron firing rates that reflect the accumulation of sensory evidence and predict choice outcomes and response times [23,24]. Our results represent a first step in extending such findings from perceptual decision making tasks to generate quantitative predictions about discounted value accumulation in intertemporal choice. Moreover, our hierarchical LBA model fitting method might be particularly advantageous for studying the neural mechanisms of value accumulation when used in combination with the ''joint modeling framework'', which was designed to simultaneously explain neuroimaging and choice data [25,26]. Using this framework, [25] have shown that it is possible to link neural and behavioral measures in a way that maps the mechanisms assumed by cognitive models directly to neural function. This approach allows for the specification of a priori predictions for how neural mechanisms should influence the modeled cognitive processes that presumably best explain behavior, providing a basis for hypothesis tests that are simultaneously informed by neural data, model parameters, and behavior.
Our results revealed a relationship between response time and choice probability, such that low probability choices are associated with increased response time. Similar results have been observed in previous studies using accumulation models to account for behavior in risk preference [27][28][29] and simple choice tasks [30][31][32][33][34]. Our observation that the LBA model can accommodate the relationship between response times and choice probability during intertemporal choice is thus consistent with previous findings and suggests that the LBA model might also be useful in accounting for behavior in other value-based decision domains.
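The dependence between response time and choice probability can be explored with a minimal simulation of a two-accumulator LBA race. The sketch below assumes the standard LBA formulation (uniform start points on [0, A], normally distributed drift rates, a common threshold b, and an additive non-decision time t0). The function name and all parameter values are hypothetical and chosen only for illustration; they are not the estimates obtained from our fits.

```python
import numpy as np

def simulate_lba(v, A=0.5, b=1.0, s=0.3, t0=0.2, n_trials=20000, seed=0):
    """Simulate choices and RTs from a linear ballistic accumulator race.

    v  : mean drift rate of each accumulator (one per choice option)
    A  : upper bound of the uniform start-point distribution
    b  : response threshold (b >= A)
    s  : between-trial standard deviation of the drift rates
    t0 : non-decision time added to every RT
    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    start = rng.uniform(0.0, A, size=(n_trials, v.size))
    drift = rng.normal(v, s, size=(n_trials, v.size))

    # Time for each accumulator to travel linearly from start to threshold.
    # Accumulators with non-positive drift never finish.
    finish = np.full(drift.shape, np.inf)
    positive = drift > 0
    finish[positive] = (b - start[positive]) / drift[positive]

    # Discard the rare trials on which no accumulator reaches threshold.
    valid = np.isfinite(finish.min(axis=1))
    choice = finish.argmin(axis=1)[valid]
    rt = t0 + finish.min(axis=1)[valid]
    return choice, rt

# An "easy" condition (unequal drift rates) and a "hard" one (equal rates).
choice_easy, rt_easy = simulate_lba([1.5, 0.5])
choice_hard, rt_hard = simulate_lba([1.0, 1.0])

for label, choice, rt in [("easy", choice_easy, rt_easy),
                          ("hard", choice_hard, rt_hard)]:
    for option in (0, 1):
        chosen = choice == option
        print(label, "option", option,
              "P(choice) =", round(chosen.mean(), 3),
              "mean RT =", round(rt[chosen].mean(), 3))
```

Comparing the printed choice proportions with the corresponding mean RTs shows how a single set of drift rates constrains both measures at once. Whether the less likely option is also the slower one within a condition depends on the relative sizes of A and s, so the output should be inspected rather than assumed.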
Our best-fitting model included variability in drift rates and non-decision times across value conditions. This result violated our a priori expectation that drift rate variability across value conditions would be sufficient to account for our behavioral manipulation. However, the model containing non-decision time variability performed only slightly better than the simplest model, which was consistent with our theoretical expectation. Thus, from a purely theoretical standpoint, we favor the simplest model. For methodological consistency and empirical validity, however, we reported and analyzed the fits obtained from the best-fitting model. The BPIC statistic provides a measure of model quality that penalizes models for the total number of parameters they contain [18]. Relying on the BPIC statistic, we corroborated our prediction that very few parameters needed to vary across conditions, but also found that the best model was not the simplest one. Future studies using the LBA could test whether the simplest model in fact generalizes better than the model with variability in non-decision time.
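For readers unfamiliar with the statistic, the following sketch illustrates how a BPIC value can be computed from posterior samples. It assumes the commonly used form BPIC = D_bar + 2 p_D, where D_bar is the posterior mean deviance and p_D = D_bar - D(theta_bar) is the effective number of parameters. The function name and the toy numbers are hypothetical, and the exact computation in [18] may differ in detail.

```python
import numpy as np

def bpic(deviance_samples, deviance_at_posterior_mean):
    """Bayesian predictive information criterion (assumed form).

    deviance_samples           : deviance (-2 * log-likelihood) evaluated
                                 at each posterior sample
    deviance_at_posterior_mean : deviance evaluated at the posterior mean
                                 of the parameters
    Lower values indicate a better trade-off between fit and complexity.
    """
    mean_deviance = float(np.mean(deviance_samples))
    # Effective number of parameters (complexity penalty term).
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + 2.0 * p_d

# Toy illustration with made-up deviance values.
print(bpic([10.0, 12.0, 14.0], 11.0))
```

Under this form the complexity term is weighted twice as heavily as in the DIC, so an added parameter must improve the fit by more before the richer model is preferred.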
We showed that drift rates estimated with the model are directly related to discounted subjective values independently derived from behavioral models of intertemporal choice. The drift rate parameters of the LBA model therefore have a direct psychological interpretation and suggest a powerful means to estimate subjective values without assuming and fitting a specific form for temporal discounting (e.g., the hyperbolic model in Equation 1). In contrast, we are uncertain about how to interpret the variability in non-decision times across value conditions. On average, non-decision times decreased with increased difficulty. Moreover, although median RT showed a modest relationship with non-decision times, median RT increased with choice difficulty, reflecting a dependence on accumulation rates. Non-decision times also correlated strongly with minimum RT, which did not vary systematically across value conditions but was highly variable across subjects. This suggests that the non-decision time parameters in our best-fitting model reflect variation in minimum RT that is considerable across subjects rather than systematic across value conditions. It is unclear what to conclude from these findings. Our belief is that non-decision times capture idiosyncratic differences in choice strategies and valuation processes across subjects, and that incorporating a parameter to absorb these trends improves model fits overall and the interpretability of drift rates more specifically.
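As a point of comparison for the discounting-based approach, the sketch below computes subjective values under the standard hyperbolic form V = r / (1 + kD), which we assume here corresponds to Equation 1. The function name, the discount rate k, and the reward amounts and delays are hypothetical values chosen for illustration only.

```python
def hyperbolic_value(amount, delay, k):
    """Subjective value of a reward under hyperbolic discounting.

    amount : reward magnitude
    delay  : delay until receipt (same time unit as 1 / k)
    k      : discount rate; larger k means steeper discounting
    """
    return amount / (1.0 + k * delay)

# Hypothetical choice: a smaller immediate reward versus a larger delayed one.
k = 0.01  # assumed discount rate per day, for illustration only
v_sooner = hyperbolic_value(20.0, 0.0, k)
v_later = hyperbolic_value(30.0, 60.0, k)
print("sooner:", v_sooner, "later:", v_later)
```

Estimating subjective values in this way requires committing to a functional form and fitting k, whereas the drift rates of the LBA model are estimated from choices and response times without that commitment.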
In summary, we have demonstrated that an LBA model provides an excellent description of the choice process in intertemporal decision making. The model fits RT distributions, provides an explanation for interdependence between RT and choice probability, and can be interpreted in terms of value accumulation. These results validate the LBA model as a complementary tool to temporal discounting models for studying the cognitive and neural mechanisms of intertemporal choice. Because the LBA has been applied to a wide range of perceptual decision making tasks, our findings not only demonstrate that a general mechanism of evidence accumulation drives decision making but also support a common and analytically tractable framework for explaining it.