The importance of urgency in decision making based on dynamic information

A standard view in the literature is that decisions result from a process that accumulates evidence in favor of each alternative until the accumulation reaches a threshold and a decision is made. This view has recently been questioned by an alternative proposal suggesting that, instead of being accumulated, evidence is combined with an urgency signal. Both theories have been mathematically formalized and supported by a variety of decision-making tasks with constant information. Recently, however, tasks with changing information have been shown to be more effective for studying the dynamics of decision making. Research using one such task, the tokens task, has shown that decisions are better described by an urgency mechanism than by an accumulation one. However, that result could depend on a task design in which all fundamental information was noiseless and always present, favoring a non-integration mechanism such as the urgency one. Here, we addressed whether the same conclusions hold in an experimental paradigm in which sensory evidence is removed shortly after it is provided, making working memory necessary to perform the task properly. We show that, under this condition, participants' behavior could be explained by an urgency-gating mechanism that low-pass filters the mnemonic information and combines it with an urgency signal that grows with time, but not by an accumulation process that integrates the same mnemonic information. Thus, our study supports the idea that, in certain situations with dynamic sensory information, decisions are better explained by an urgency-gating mechanism than by an accumulation one.


Introduction
When making decisions, one needs to predict which option will lead to the best outcome. To do that, information is gathered from all available sources and weighted according to its reliability. In laboratory studies, this has been investigated using perceptual decision-making tasks that require sensory evidence discrimination to correctly select between two options [1,2]. Neurons of the frontoparietal network exhibit ramping activity that seems to mirror the deliberative process of decision making [2-6]. The general agreement is that, during this deliberative process, information is sequentially sampled until a bound is reached and a decision is made. However, how such samples are incorporated into the decision-making process is still open to debate. Here, we address this issue by studying how accurately two widely accepted alternative models, the Evidence Accumulation Model and the Urgency Gating Model, describe experimental data collected from a decision-making task with information that varied over time.
In recent decades, two alternative theories have been proposed to explain decision making. The standard view proposes that decisions are the result of accumulating evidence until a threshold is reached. This view has led to the development of the Evidence Accumulation Model (EAM), which has accounted for a variety of behavioral and neuronal data [7-13]. Recently, however, this view has been questioned by an alternative theory proposing that, rather than being accumulated, sensory evidence is weighted by an urgency signal that grows with time. This can also explain the ramping activity of neurons in the decision-making network, as well as behavior in different decision-making paradigms [14-17]. Within this view, the Urgency Gating Model (UGM) proposes that evidence is low-pass filtered and multiplied by a temporally increasing signal [15]. Thus, previous research supports both kinds of models. However, in most tasks, decisions relied on constant information and, although such tasks have proven valid for discriminating between the models in some cases [18], in others they have proven inadequate [15,19].
In recent years, new perceptual decision-making paradigms involving changes of information over the course of a trial have been proposed [15,16,20-24]. In such tasks, perceptual evidence is sequentially presented in favor of one of two options, and humans or animals have to decide which option is the more favored one. Although these tasks have provided a significant advance toward the description of a general mechanism for perceptual decision making, the question of whether the neuronal dynamics during such decisions follow the mechanism proposed by the EAM or by the UGM remains unanswered. To distinguish between the models, Cisek et al. [15] used a perceptual decision-making task, called the tokens task, in which visual stimuli sequentially jumped towards a right or left target and stayed there until subjects committed to a choice. In that task, the EAM failed to explain the experimental data; instead, the UGM provided a reliable mechanism by which decisions might be made. The UGM proposes that, rather than being integrated, sensory evidence is modulated by an urgency signal that increases over time, reflecting the growing need to make a decision as time passes [15,16]. Subsequently, using the same task, Thura et al. [5] showed that the activity of neurons in the premotor and primary motor cortex reflected the combination of sensory evidence with an urgency signal. However, the fact that novel sensory evidence remained available until the decision was made might have biased their results, since there was no implicit need for integration. In other words, one could assess the situation just before the decision is made and obtain the same information that one would get by observing for the entire period.
Thus, there is still no real consensus on whether the ramping activity of the neurons in the frontoparietal network reflects the integration of sensory evidence or is instead the result of a combination of sensory evidence with an urgency signal that increases over time.
Here, we further contrast the predictions of the two computational models by using a modified version of the tokens task [5,15], introducing an additional condition in which novel sensory evidence is removed from the screen soon after it is provided. In each trial, fifteen tokens, presented in a central circle, sequentially jumped towards a left or right target, each indicated by a circle on the screen. Subjects had to guess which target would contain more tokens by the end of the trial. They could make their choice at any time. The trials were divided into blocks that contained only "all-stay" or "all-away" trials (Fig 1A). In all-stay trials the tokens stayed visible during the entire trial, whereas in the all-away condition they disappeared soon after they jumped into a target (see Materials and Methods). With this new task design, we can test both models under conditions that might favor evidence integration.

Results
Overall, the subjects performed the task with above-chance accuracy in both the all-stay and all-away conditions, although accuracy was significantly higher in the all-stay trials (73 ± 2%) than in the all-away trials (68 ± 2%; paired-samples t-test, p = 0.04, t = 2.25). In addition, their decision times (DTs) were slower in the all-stay than in the all-away condition, at 1.465 ± 0.074 s and 1.207 ± 0.121 s, respectively (paired-samples t-test, p = 0.04, t = 2.25). The subjects' mean (± standard error of the mean [SEM]) baseline reaction time (RT) used to calculate the DTs was 0.347 ± 0.009 s.
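The paired-samples comparisons reported here can be reproduced with a short routine. The following is a minimal sketch using only the standard library; the per-subject values are made up for illustration and are not the study's data:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic: the mean of the per-subject differences
    divided by their standard error. x and y are matched measurements
    (e.g., each subject's accuracy in all-stay vs. all-away blocks)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Illustrative (not the study's) per-subject accuracies
all_stay = [0.75, 0.71, 0.78, 0.70, 0.74]
all_away = [0.70, 0.69, 0.71, 0.66, 0.67]
t_stat = paired_t(all_stay, all_away)
```

The resulting t statistic would then be compared against the t distribution with n − 1 degrees of freedom to obtain the p value.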

Behavior is modulated by context
We used the easy and ambiguous trials to investigate whether the subjects' performance was influenced by the type of trial. DTs and success probabilities (SPs) in these two trial types indicated that the subjects behaved differently in easy and ambiguous trials in both the all-stay and all-away conditions (Fig 2A). Specifically, DTs were faster and SPs higher for easy than for ambiguous trials in both conditions (mean DT and SP ± SEM for all-stay: 1.587 ± 0.088 s and 57 ± 1% for ambiguous trials, 1.228 ± 0.058 s and 93 ± 1% for easy trials; mean DT and SP ± SEM for all-away: 1.331 ± 0.146 s and 55 ± 1% for ambiguous trials, 1.049 ± 0.095 s and 89 ± 2% for easy trials). Fig 2B shows the behavior of one representative subject during ambiguous and easy trials in the all-stay and all-away conditions. Consistent with the group's behavior, this subject responded significantly faster in easy trials than in ambiguous trials in both experimental conditions (all-stay: 1.578 ± 0.080 s for ambiguous trials and 1.251 ± 0.044 s for easy trials; all-away: 1.440 ± 0.082 s for ambiguous trials and 1.069 ± 0.034 s for easy trials), and their SPs were significantly higher for easy trials than for ambiguous trials (all-stay: 61 ± 2% for ambiguous trials and 95 ± 1% for easy trials; all-away: 56 ± 2% for ambiguous trials and 92 ± 1% for easy trials).
Next, we investigated the subjects' performance during bias-against and bias-for trials. These two trial profiles are the most interesting of the study because, for non-leaky sensory evidence (all-stay condition), the decision-making models make different predictions about DTs for these trial types [15]. While the UGM predicts no differences between them, the EAM predicts that DTs in bias-against trials will be longer than those in bias-for trials. Importantly, using only the all-stay condition, Cisek et al. [15] showed that the subjects' DTs did not differ between bias-against and bias-for trials, providing strong evidence in support of the UGM. However, one possible explanation for their results was that accumulation or integration of evidence was not required by the task, because the information was available on the screen during the entire trial, favoring urgency dynamics. Our all-away condition was designed to control for that possibility. By using a condition in which each token disappeared from the screen after jumping, we could test whether the previous findings in favor of the UGM over the EAM were merely a consequence of the limitations of the original experimental design.

[Fig 1. (A) In the all-stay trials the tokens remain visible after jumping, while in the all-away trials they disappear soon after they have jumped. The participant has to guess which of the two target circles will contain more tokens at the end of the trial. (B) Success probability profiles for specific trial types. Top panel: success probability for easy (black) and ambiguous (gray) trials. Bottom panel: success probability for bias-against (black) and bias-for (gray) trials. https://doi.org/10.1371/journal.pcbi.1009455.g001]

We first examined the results of the all-stay condition. Consistent with Cisek et al.
[15,23], we observed no differences in the subjects' mean DT or SP (± SEM) between bias-for and bias-against trials (bias-against: 1.810 ± 0.036 s and 85 ± 1%; bias-for: 1.845 ± 0.043 s and 86 ± 2%; Fig 3A). We then examined the results of the newly introduced all-away condition. Interestingly, the subjects' mean DT and SP differed significantly between the two trial types (bias-against: 1.650 ± 0.056 s and 80 ± 2%; bias-for: 1.826 ± 0.053 s and 86 ± 1%; Fig 3A). Notably, contrary to the prediction of the EAM for non-leaky sensory evidence (see above), the mean DT was shorter in bias-against trials than in bias-for trials. Indeed, the behavioral difference between the two trial types stems from the shortened DTs observed in bias-against trials compared with the same trials in the all-stay condition. The shortened DTs reduce accuracy, but not significantly (paired-samples t-test, p = 0.25; all-away: 60 ± 6%; all-stay: 68 ± 6%). Fig 3B shows the DTs and SPs of the same subject represented in Fig 2B. This subject showed no difference in DTs or SPs between bias-against and bias-for trials in the all-stay condition (bias-against: 1.883 ± 0.048 s and 87 ± 1%; bias-for: 1.8883 ± 0.050 s and 85 ± 2%), but did show a significant difference between trial types in the all-away condition (bias-against: 1.546 ± 0.030 s and 78 ± 1%; bias-for: 1.841 ± 0.063 s and 85 ± 2%), with longer DTs and higher SPs in bias-for than in bias-against trials.
Consistent with previous research [15,16], subjects were biased towards an urgency-like strategy in the all-stay condition. Next, we wondered whether subjects' behavior in the all-away condition might be influenced by the order in which they performed the two conditions (see Materials and Methods). In other words, did subjects' behavior in the all-away condition depend on whether they encountered the all-stay condition before the all-away condition, or vice versa? To answer this question, we analyzed the DTs and SPs of the subjects sorted by whether they belonged to the first or the second group. Differences in mean DT and SP (± SEM) between bias-for and bias-against trials approached significance when subjects performed the all-away condition after the all-stay condition (n = 8, paired-samples t-test, p = 0.051 for DTs and p = 0.164 for SPs; bias-for: 1.73 ± 0.06 s and 83 ± 2%; bias-against: 1.56 ± 0.05 s and 78 ± 4%) and were significant when they performed the all-away condition first (n = 7, paired-samples t-test, p = 0.0152, t = 3.36 for DTs and p = 0.0171, t = 3.26 for SPs; bias-for: 1.94 ± 0.07 s and 89 ± 2%; bias-against: 1.75 ± 0.09 s and 83 ± 3%). Nevertheless, in both cases, mean DT and mean SP were, respectively, longer and higher in the bias-for than in the bias-against trials, indicating no influence of block order on behavior.

The urgency-gating model correctly predicts behavior
We investigated whether the behavioral results could be better explained by the EAM or by the UGM. To do that, we devised a computational framework in which sensory evidence fed the decision-making model, implemented as the EAM or the UGM, either directly or through a working memory module (Fig 4A), simulating the all-stay and all-away conditions, respectively. The working memory was thus responsible for monitoring and remembering the sensory evidence that had disappeared from the screen. To fit the models to the data, we used a differential evolution algorithm with the experimental DTs recorded in correct and error easy, ambiguous, bias-for, and bias-against trials (see Materials and Methods). Table 1 shows the mean and SEM of the best-fitting parameters obtained for each model and condition. In the all-stay condition, DTs in correct and error trials were better fitted by the UGM than by the EAM (Fig 4B), as indicated by a smaller difference between mean experimental and simulated DTs for both kinds of trials (EAM: 135 ms for correct trials, 279 ms for error trials; UGM: 51 ms for correct trials, 41 ms for error trials). Moreover, the better performance of the UGM over the EAM was also observed when the experimental data were fitted in the all-away condition (Fig 4C), with smaller differences in mean DTs that held for sensory evidence both without and with leakage (EAM without sensory leak: 79 ms for correct trials, 134 ms for error trials; EAM with sensory leak: 72 ms for correct trials, 115 ms for error trials; UGM without sensory leak: 14 ms for correct trials, 12 ms for error trials; UGM with sensory leak: <1 ms for correct trials, 26 ms for error trials).
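The differential evolution fit can be sketched with a minimal, self-contained optimizer. The following is an illustrative DE/rand/1/bin implementation, not the study's actual fitting code; in the real fit, the loss would measure the discrepancy between experimental and simulated DT distributions for each trial type:

```python
import random

def diff_evolution(loss, bounds, pop_size=20, f=0.8, cr=0.9,
                   generations=100, seed=0):
    """Minimal differential evolution (DE/rand/1/bin) minimizer.
    loss: function mapping a parameter list to a scalar cost.
    bounds: list of (low, high) limits, one pair per parameter."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [loss(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutate using three distinct individuals other than the current one
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            trial = list(pop[i])
            for d in range(dim):
                if rng.random() < cr:
                    lo, hi = bounds[d]
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial[d] = min(max(v, lo), hi)   # clip to bounds
            tc = loss(trial)
            if tc <= costs[i]:                       # greedy selection
                pop[i], costs[i] = trial, tc
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]
```

As a design note, differential evolution is well suited here because the simulated-DT loss is stochastic and non-differentiable, so gradient-based optimizers would struggle.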
Indeed, in both experimental conditions, the shapes of the DT distributions were better captured by the UGM in all cases, with the EAM tending to predict shorter DTs, resulting in positively skewed distributions.
We then analyzed the models' output for each trial type separately, to assess whether the models showed the same effects as observed in the experimental data. As previously shown [15], in the all-stay condition the EAM correctly produced shorter DTs and higher SPs in easy than in ambiguous trials, but predicted longer DTs and higher SPs in bias-against trials compared with bias-for trials (Fig 5A), although no such difference was observed in the experimental data. In contrast, the UGM correctly replicated the experimental effects in DTs and SPs for all four kinds of trials, with distributions that were very similar to those from the real data (Fig 5A, green square). Next, we looked at the all-away condition in two cases: when the sensory evidence had no leak (L_e = 0) and when it leaked away with time (L_e > 0) (see Materials and Methods). In the simulations without sensory leak, both models failed to reproduce the experimental data and showed the same results as those obtained for the all-stay condition. In other words, both models correctly reproduced the differences in DTs and SPs between easy and ambiguous trials but failed to produce shorter DTs and lower SPs in bias-against than in bias-for trials, with an opposite result in the EAM and no difference in the UGM (Fig 5B). However, a leak in the sensory evidence (L_e > 0) was sufficient for the UGM to explain the experimental data in all four types of trials, but not for the EAM, which predicted no significant difference in DTs between bias-for and bias-against trials (Fig 5B). In addition, the shapes of the distributions of DTs and SPs in the real and UGM data were comparable. Thus, the UGM is capable of explaining the data in both experimental conditions of our task. The fitted parameters of the UGM seem to indicate a difference in the strategy used in the two experimental conditions: a higher mean drift rate and lower boundaries were estimated in the all-away condition compared with the all-stay condition (Table 1). This result could be related to an increased need to make a decision when sensory evidence disappears, driven by the gradual "forgetting" of information. Nevertheless, both experimental conditions support the existence of an urgency signal that modulates the decision-making process, strengthening the idea that decisions are the result of sensory evidence (visual or mnemonic) combined with an internal urge to decide.

Discussion
In this work we have advanced the understanding of the general mechanism of decision making. To accomplish this, we used an experimental task in which the available sensory information varied over time. Changing information over time appears to be a critical element of task design, since it allows a more efficient discrimination between computational models of decision making, and therefore between decision-making mechanisms, than tasks with constant information [19]. In our experimental task, fifteen tokens, presented in the center of the screen, successively jumped into a circular target either to their right or to their left. Subjects were required to decide which of the two targets would contain more tokens at the end of the trial. They could make their choice at any time, but to discourage random guessing, they were required to reach a specific number of correct trials to finish the experiment. This task is a modification of a previously described tokens task [15,16], to which we added a new condition that required the use of working memory and could favor the integration of sensory evidence. This addition allowed us to compare decision making under conditions where each novel piece of sensory information was visually available until the end of the decision-making process (and thus did not explicitly require the integration of information) with conditions where novel sensory evidence was removed shortly after it had been presented (thus requiring the active maintenance of information in the brain). With this comparison, we have advanced the description of the general mechanism by which decisions are made under different circumstances. We have shown that the experimental results observed in both contexts can be explained by a decision-making model that low-pass filters sensory evidence, provided by a visual-input or a working-memory module, and multiplies it by an urgency signal.
In this model, the decision is made when the result of the multiplication reaches a decision threshold.
We have focused our study on the two main models in the literature on the mechanism governing decision making: the EAM and the UGM. In our behavioral trials, sensory evidence was represented by the number of tokens that had already jumped to each target. This approach differs from the one previously used [15], in which the success probability for one target was calculated after each token's jump and used as the sensory evidence. The reasons we used this alternative approach are twofold. First, estimating the number of tokens in each target is easier for subjects than estimating the probability of success for each target. Indeed, our approach allows for a simpler and more realistic way of representing and updating sensory evidence, especially when it needs to be maintained in memory, since it does not rely on relatively complex probability calculations. Second, the predictions of each model are equivalent to those obtained when using success probability.
One possible explanation for the fact that our results support the UGM over the EAM might be that the EAM is fundamental for noisy sensory evidence, such as the stimuli provided in a motion-direction discrimination task, but not for non-noisy sensory evidence [14,16,17,25-30]. However, using a paradigm similar to the tokens task but with noisy stimuli, Thura et al. [16] showed that, even in such cases, when the available information changes over time the EAM cannot account for the behavioral observations while the UGM can. Thus, the decision-making mechanism for noisy and non-noisy sensory evidence appears to be the same.
The EAM has been the standard view in the decision-making literature for many years, but recently it has been challenged by models of urgency and by accumulation models with time-varying boundaries [15,16,31-33]. Here we implemented the UGM, which relies on an urgency signal that grows with time and can resolve situations of high time pressure. An alternative is offered by time-varying boundary models, which propose that decision boundaries decay with time. In both cases, less sensory or accumulated evidence, respectively, is needed to make a decision as time passes. We have shown that the UGM explains our experimental data better than the EAM. The same result would be expected when comparing the UGM with a time-varying boundary model. The reason is that these models differ from the EAM only in the decision boundaries, which are fixed in the latter case and time-varying in the former. Thus, the accumulated evidence in bias-for and bias-against trials will reach the decision boundary at similar time points, independently of whether the boundary remains fixed or decays with time. Further additions to such a model would be necessary to account for our data, significantly increasing its complexity.
Using a random motion discrimination task [25-29], Winkel et al. [24] showed that early evidence influenced subsequent decisions. This was considered evidence in favor of the EAM and against the UGM. However, the authors omitted the low-pass filter in their implementation of the UGM; indeed, when such a filter, with a short time constant (250 ms), was added to the model, the UGM could correctly fit the data [17,34]. Using a similar task, Evans et al. [23] showed that participants were faster and more accurate in their responses when early pulses of motion were consistent with the subsequent direction of motion than when they were inconsistent. Moreover, their behavioral data were better explained by the EAM than by the UGM. In our all-away condition, we found the opposite result: decision times were longer when an early pulse was consistent with the subsequent evidence (bias-for) than when it was not (bias-against), and the results were better explained by the UGM than by the EAM. In an earlier study, we showed that the effect of pulses on participants' behavior depended on the decision policy employed, which could be explained by different slopes of the urgency signal [34]. This is, indeed, consistent with the lack of effect observed in Evans et al. [23] for responses longer than 2 s and a possible explanation for the behavioral differences between their study and ours. Moreover, a possible reason for the poor performance of the UGM in fitting the data from Evans et al. [23] might be the use of a non-optimal value for the slope of the urgency signal, which was a fixed parameter during data fitting. Indeed, similar behavioral data were properly fitted by the UGM in previous studies [17,34].
Our results show that information in memory necessarily leaks away. Such leakage could be related to the arrival of new information or to the passage of time [35]. A recent study showed that subjects' accuracy is unaffected by changes in the interval between two pulses of evidence, pointing to the arrival of new information as the main cause of memory leakage in a perceptual decision-making task [36]. Following this result, we assumed here that memory leakage occurs at the onset of each token's jump and that the memory trace remains frozen between jumps. In future research, this could be formally investigated by varying the time between tokens' jumps and testing whether the difference in performance between bias-for and bias-against trials remains unaffected.
Perceptual decisions are influenced by factors that are irrelevant to the task and that can sometimes lead to a decrease in behavioral accuracy [37-40]. Yet two questions remain: (1) How are these factors integrated into the decision-making process? (2) Do such factors influence decisions made with constant and non-constant information in a similar way? Follow-up studies addressing these questions are needed to shed further light on the general mechanism of decision making by distinguishing between decision-making models.
Novel experimental tasks that help distinguish between decision-making models are essential for uncovering the mechanisms underlying decision making. Moreover, the two main models of how decisions are formed, the EAM and the UGM, imply the involvement of different brain structures. Neurons in the lateral intraparietal cortex exhibit ramping activity that has been associated with the accumulation of sensory evidence while a decision is being made [2,14,41]. However, this interpretation has recently been questioned on the basis of new experimental data showing that the ramping activity might be an artifact of averaging neural activity associated with instantaneous jumps occurring at different times in different trials [42]. In other studies, neurons in the dorsal premotor cortex and primary motor cortex have been shown to combine sensory evidence with an urgency signal, with no sign of sequential evidence accumulation [5]. Furthermore, the basal ganglia have been identified as one part of the brain that controls the urgency of commitment [43]. Future studies should be designed to move the study of decision making from its general process toward the specific contributions of each brain area.

Ethics statement
All experimental procedures were in accordance with the ethical standards of the university research committee and with the Code of Ethics of the World Medical Association (Declaration of Helsinki, 1964) and its later amendments. The experimental protocol was approved by the Ethics Committee of the Physiology and Pharmacology Department at Sapienza University of Rome. All subjects provided written consent before participating in the experiment.

Experimental task
Fifteen subjects (aged 23-56, eight female, all right-handed) performed the tokens task. A 19.5" BenQ GL2023A LED monitor was used to display the visual stimuli and a USB mouse served as the interface between the participants and the computer. The participants sat in front of the screen at a distance of approximately 60 cm. Our experimental protocol was based on the one proposed by Cisek et al. [15]. The main difference between the two was the addition of the "all-away" condition, described in detail below.
We used two trial conditions, which featured a similar sequence of events. At the beginning of each trial, three circles with white outlines, each 2.5 cm in diameter, appeared on the screen: one (central circle) was displayed at the center of the screen and the other two (targets) were placed 5 cm on either side of the central circle (Fig 1A). The central circle contained 15 dots (tokens) randomly distributed within its circumference. Subjects were required to place the cursor inside the central circle to start a trial. One of the tokens jumped from the central circle to one of the two targets every 200 ms. The subjects were required to select, by moving the mouse and placing the cursor inside one of the two peripheral circles, the target that they guessed would contain the majority of the tokens at the end of the trial. They could make their choice at any time before the last token jumped. Immediately after they had made their choice, the outline of the selected target changed from white to green, if the response was correct, or red, if it was incorrect. After the feedback, the remaining tokens jumped to one of the two targets every 20 ms, so a considerable amount of time could be saved by making decisions earlier. This motivated the subjects not to wait until the end of the trial to make their choice. An interval of 500 ms separated the end of one trial from the beginning of the next one.
We called the two trial conditions "all-stay" and "all-away". In the all-stay condition, the tokens remained visible after jumping to one of the two targets [15]. Thus, the sensory evidence was always available to the subjects. Conversely, in the all-away condition, the tokens disappeared 200 ms after they jumped to one of the targets, so that sensory evidence was not available to the subjects for the remainder of the trial. The task was divided into twelve blocks, half of which comprised only all-stay trials and the other half only all-away trials. The subjects underwent all the blocks of one condition, followed by all the blocks of the other condition, and the order was alternated between subjects. To avoid random guessing, each block was complete only when the subject had achieved 70 correct answers.
The success probability (SP) of selecting target one over target two was calculated as [15]:

P(C \mid N_1, N_2, N_r) = \frac{N_r!}{2^{N_r}} \sum_{k=0}^{\min(N_r,\, 7 - N_2)} \frac{1}{k!\,(N_r - k)!}

where P(C|N_1,N_2,N_r) was the probability of being correct (C) when selecting target one when there were N_1 tokens in target one, N_2 tokens in target two, and N_r tokens still remaining in the central circle.
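Equivalently, the success probability can be computed by counting the outcomes of the remaining jumps. The following is a minimal sketch (the function name and structure are ours); it assumes 15 tokens in total, so eight tokens in a target win the trial:

```python
from math import comb

def success_probability(n1: int, n2: int, nr: int) -> float:
    """Probability that target one ends with the majority of the 15 tokens,
    given n1 tokens already in target one, n2 in target two, and nr still
    in the central circle. Each remaining token is equally likely to jump
    to either target."""
    favorable = 0
    for k in range(nr + 1):            # k remaining tokens jump to target one
        if n1 + k > n2 + (nr - k):     # target one ends with the majority
            favorable += comb(nr, k)
    return favorable / 2 ** nr
```

For example, at the start of a trial success_probability(0, 0, 15) is 0.5, and once a target holds eight tokens its success probability is 1.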
Within each of the 12 blocks, in both all-away and all-stay conditions, the direction in which the tokens jumped was randomly determined in 50% of the trials. In the other half of the trials, one of four predefined trial types was used: "easy" (15% of trials), "ambiguous" (15%), "bias-for" (10%), and "bias-against" (10%) [15]. In the easy trials, most of the tokens jumped to the correct target, whereas in the ambiguous trials the tokens were evenly distributed between the two targets until just before the end (top panel of Fig 1B). The bias-for and bias-against trials were the most powerful for distinguishing between the decision-making models. The two cases differed only in the direction of the first six jumps. In the bias-for trials, the first three tokens jumped to the correct target and the next three went to the incorrect target. The opposite occurred in the bias-against trials. In both cases, most of the remaining tokens jumped to the correct target (bottom panel of Fig 1B). The random trials were included to prevent the subjects from predicting the pattern of the trials.
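As an illustration, the bias profiles can be generated as jump sequences. This is a hypothetical sketch: the exact bias of the last nine jumps is not specified above, so the p_correct value is an assumption of ours:

```python
import random

def bias_trial(kind: str, p_correct: float = 0.8, rng=random) -> list[int]:
    """Return a 15-element jump sequence (1 = correct target, 0 = incorrect)
    for a bias-for or bias-against trial. The first six jumps are fixed by
    the trial type; the remaining nine favor the correct target."""
    if kind == "bias-for":
        first_six = [1, 1, 1, 0, 0, 0]   # three correct, then three incorrect
    elif kind == "bias-against":
        first_six = [0, 0, 0, 1, 1, 1]   # three incorrect, then three correct
    else:
        raise ValueError(f"unknown trial type: {kind}")
    rest = [1 if rng.random() < p_correct else 0 for _ in range(9)]
    return first_six + rest
```

Note that the two profiles deliver identical net evidence after six jumps, which is what makes them diagnostic: only the order of the early evidence differs.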
To estimate decision times (DTs), the subjects performed 40 additional trials in which only one token jumped to one of the two targets, randomly selected. The time from the arrival of the token at the target to the moment the mouse cursor left the central circle and reached the chosen circle provided the subject's baseline reaction time (RT). The DT of each main-task trial was then obtained by subtracting this baseline RT from the trial's RT, which was computed as the difference between the time at which the mouse cursor left the central circle and the start of the trial. The SP was then computed at the estimated DT. Unless otherwise specified, all analyses were conducted using only correct trials.
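The DT estimation above amounts to a simple subtraction; a sketch in milliseconds, with illustrative values:

```python
def decision_time(rt_main_ms: int, baseline_rt_ms: int) -> int:
    """DT = main-task RT (cursor leaving the central circle minus trial
    start) minus the subject's baseline RT from the one-token trials."""
    return rt_main_ms - baseline_rt_ms

# e.g. a response 2400 ms after trial start with a 450 ms baseline
print(decision_time(2400, 450))  # 1950 ms
```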
Computational models

Sensory evidence. The computational models used the same input (sensory evidence) to make a decision. The variation in the sensory evidence at the n-th token jump was estimated as:

Δe_n = δ_n^R − δ_n^L,

where the delta functions indicate the token's jump to either the right (δ_n^R = 1) or the left (δ_n^L = 1) target. When the working-memory module was added, the sensory evidence stored in it and used as input to the decision-making process was estimated as:

e_n^leak = L_e · e_(n−1)^leak + Δe_n,

where L_e is the leak term of the working memory. The value of e^leak was updated only after each token's jump and kept that value until the next token's jump occurred [36].

Decision-making models. We implemented two models of decision making that have been widely used to analyze behavioral and neural data in previous research [7,41,44-47]: the EAM and the UGM.
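A minimal sketch of the working-memory update, assuming a multiplicative leak applied once per token jump (the text specifies only that a leak term L_e is applied at each jump, so this form is an assumption):

```python
def update_memory(e_leak, delta_e, leak):
    """One working-memory update, applied only when a token jumps:
    the stored evidence decays by the leak factor, then the new jump
    (+1 = right, -1 = left) is added."""
    return leak * e_leak + delta_e

# Example: three rightward jumps followed by three leftward ones
e = 0.0
for jump in [1, 1, 1, -1, -1, -1]:
    e = update_memory(e, jump, leak=0.9)   # leak value is illustrative
```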
The EAM was implemented with the following dynamics:

x_(t+Δt) = x_t + ν e_t Δt + s ξ √Δt,

where x describes the state of the process at times t and t+Δt, ν is the drift rate, e_t is the sensory evidence at time t (provided by either the working-memory module or the visual input), Δt is the step size of the simulation, s is a scaling factor, and ξ is an independent and identically distributed random sample taken from a standard normal distribution. In each simulated trial, the initial value of x was set to θ/2, where θ reflects the difference between the decision boundaries, and the drift rate was sampled from a normal distribution with mean ν and standard deviation η. In our simulations, the step size Δt was set to 10 ms and s was arbitrarily set to 0.1. In the EAM, a decision was considered made when x > θ or x < 0, and the corresponding decision time was estimated as t − Δt/2.

The dynamics of the UGM differed from those of the EAM in a leaky term and an urgency signal:

x_(t+Δt) = x_t + (ν e_t − x_t) Δt/τ + s ξ √Δt,

where ν, η, and Δt are defined as in the EAM. The leaky term implements a low-pass filter with time constant τ (100 ms). The instantaneous value of the filtered evidence was multiplied by an urgency signal, defined as the time elapsed since the start of the decision-making process (u_t = t). In the UGM, a decision was considered made when x_(t+Δt) u_(t+Δt) > θ or x_(t+Δt) u_(t+Δt) < −θ, and the corresponding decision time was again estimated as t − Δt/2. In both models, ν, η, and θ were free parameters estimated with a fitting procedure that optimized the goodness of fit.
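Both models can be sketched in a few lines. The Euler discretization (noise scaled by √Δt) and all parameter values below are illustrative assumptions, not the fitted values:

```python
import random

def simulate_trial(evidence, model="UGM", nu=1.0, theta=1.0, s=0.1,
                   dt=0.01, tau=0.1, rng=None):
    """Minimal sketch of the EAM and the UGM as described above.
    evidence: function mapping time (s) to the momentary evidence e_t.
    Returns (choice, decision_time_s); choice 0 means no decision."""
    rng = rng or random.Random()
    x = theta / 2 if model == "EAM" else 0.0   # EAM bounds: 0 and theta
    t = 0.0
    while t < 10.0:                            # hard cap on trial length
        t += dt
        e, xi = evidence(t), rng.gauss(0.0, 1.0)
        if model == "EAM":                     # pure accumulation
            x += nu * e * dt + s * xi * dt ** 0.5
            if x > theta or x < 0.0:
                return (1 if x > theta else -1), t - dt / 2
        else:                                  # UGM: low-pass filter...
            x += (nu * e - x) * dt / tau + s * xi * dt ** 0.5
            if abs(x * t) > theta:             # ...gated by urgency u_t = t
                return (1 if x * t > theta else -1), t - dt / 2
    return 0, t
```

With strong constant evidence toward the correct target, both models commit quickly; the UGM's urgency signal additionally forces a decision even when the filtered evidence alone would never reach the bound.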

Estimation of model parameters
The model parameters (ν, η, and θ) were estimated independently for each model to fit each participant's data. To do so, we used quantile maximum products estimation (QMPE) [48]. The decision times of the experimental data were sorted into quantiles, separately for correct and error responses. QMPE estimates the similarity between experimental and simulated data by comparing the proportion of data falling within each quantile. The search for parameters optimizing the goodness of fit was performed with the differential evolution algorithm [49,50]. We defined wide boundaries for each parameter and used 100 particles for 500 search iterations. This estimation procedure was repeated 5 times to avoid local maxima. Model predictions were evaluated using Monte Carlo simulations with 10,000 replicates per experimental condition. At search termination, data were simulated with the parameter set that yielded the highest goodness of fit and with a number of trials matching that of the experimental data.
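As an illustration of the QMPE objective, a hypothetical sketch that bins simulated DTs by the observed quantile edges and sums the count-weighted log proportions; the number of quantiles and the floor on proportions are our choices, not those of [48]:

```python
import math

def qmpe_log_product(observed, simulated, n_quantiles=5):
    """Hypothetical sketch of a QMPE-style objective: split the observed
    decision times into quantile bins, estimate the proportion of
    simulated DTs falling in each bin, and sum the observed-count-
    weighted log proportions (higher = better fit)."""
    obs = sorted(observed)
    edges = [obs[int(q * (len(obs) - 1) / n_quantiles)]
             for q in range(1, n_quantiles)]
    bounds = [-math.inf] + edges + [math.inf]
    log_prod = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        n_obs = sum(lo < d <= hi for d in observed)
        p_sim = sum(lo < d <= hi for d in simulated) / len(simulated)
        log_prod += n_obs * math.log(max(p_sim, 1e-10))  # avoid log(0)
    return log_prod
```

A well-fitting simulation places the same proportion of DTs in each observed quantile bin, which maximizes this quantity.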
Data fitting was performed using all (correct and error) easy, ambiguous, bias-for, and bias-against trials.

Statistical tests
The normality of the datasets was tested using the Shapiro-Wilk test.
Bayes Factors [51] were calculated using a Cauchy prior distribution with a scale factor of 3 for DTs and of 0.707 for SPs, since large and small differences were expected, respectively. A Bayes Factor greater than 3 or smaller than 1/3 indicated evidence for the alternative or the null hypothesis, respectively [52].