Paradoxical Evidence Integration in Rapid Decision Processes

Decisions about noisy stimuli require evidence integration over time. Traditionally, evidence integration and decision making are described as a one-stage process: a decision is made when evidence for the presence of a stimulus crosses a threshold. Here, we show that one-stage models cannot explain psychophysical experiments on feature fusion, where two visual stimuli are presented in rapid succession. Paradoxically, the second stimulus biases decisions more strongly than the first one, contrary to predictions of one-stage models and intuition. We present a two-stage model where sensory information is integrated and buffered before it is fed into a drift diffusion process. The model is tested in a series of psychophysical experiments and explains both accuracy and reaction time distributions.

at time t is caused by a stimulus I. At stimulus onset (t = 0) the observer has no information about its identity. All stimuli are equally likely, i.e. $p(I; t = 0) = p_\mathrm{flat}(I)$ is flat. As time progresses, the observer gradually acquires knowledge of the stimulus identity by integrating the evidence in the signals. The posterior probability $p(I; t)$ of the stimulus identity given the signal $S_t$ at time t is

$$p(I; t) = \frac{p(S_t|I)\,\tilde{p}(I; t)}{\sum_{I'} p(S_t|I')\,\tilde{p}(I'; t)}, \qquad (1)$$

where $\tilde{p}(I; t)$ is a prior probability and $p(S_t|I)$ is the likelihood of the signal $S_t$ given stimulus identity I. The likelihood $p(S_t|I)$ represents the model of the environment that the observer has acquired through previous experiences.
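If we used in Eq. (1) the posterior $p(I; t-\Delta t)$ at the last time step as the prior for calculating the posterior at time t (i.e. $\tilde{p}(I; t) = p(I; t-\Delta t)$, where $\tilde{p}$ denotes the prior in Eq. (1)), the observer would perform a lossless integration of information over time. Such a full temporal integration, however, makes the implicit assumption that the stimulus identity remains constant at all times. In the face of changing stimuli, the observer would interpret different stimuli as one and thus come to erroneous conclusions. To allow for changing stimuli, the prior mixes the previous posterior with the flat prior, weighted by the probability $p(\mathrm{new})$ that the current signal stems from a new stimulus:

$$\tilde{p}(I; t) = \bigl(1 - p(\mathrm{new})\bigr)\,p(I; t-\Delta t) + p(\mathrm{new})\,p_\mathrm{flat}(I). \qquad (2)$$

Before we specify how $p(\mathrm{new})$ is calculated, let us first show that this model introduces an information leak, similar to the low-pass properties found in the integration stage of the two-stage model.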
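The contrast between lossless integration and a mixed prior can be illustrated with a minimal sketch of the discrete update. It assumes that the mixed prior combines the previous posterior and the flat prior with weight p(new); the likelihood table `lik`, the signal sequence and the value `p_new = 0.2` are hypothetical and chosen only for illustration.

```python
import numpy as np

def bayes_step(prior, likelihood):
    """Eq. (1): posterior over identities from a prior and the likelihoods p(S_t|I)."""
    unnorm = likelihood * prior
    return unnorm / unnorm.sum()

def mixed_prior(prev_posterior, p_new, p_flat):
    """Assumed mixed prior: previous posterior blended with the flat prior."""
    return (1.0 - p_new) * prev_posterior + p_new * p_flat

# Hypothetical likelihoods p(S|I): rows are signals ('a', 'b'), columns identities (A, B).
lik = np.array([[0.6, 0.4],
                [0.4, 0.6]])
p_flat = np.array([0.5, 0.5])

def run(signals, p_new):
    """Integrate a sequence of signal indices, starting from the flat prior."""
    p = p_flat.copy()
    for s in signals:
        p = bayes_step(mixed_prior(p, p_new, p_flat), lik[s])
    return p

signals = [0] * 20 + [1] * 5          # 20 signals 'a', then the stimulus changes to 'b'
lossless = run(signals, p_new=0.0)    # prior = previous posterior
leaky = run(signals, p_new=0.2)       # mixed prior forgets old evidence
print("lossless:", lossless.round(3))
print("leaky:   ", leaky.round(3))
```

With `p_new = 0` the observer keeps favoring identity A after the change, because the 20 early signals outweigh the 5 recent ones. With the mixed prior, the belief follows the more recent signals.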
The leaky evidence integrator.
Until now, we assumed that signals arrive in the visual system at discrete moments in time. Let us now consider the limit in which stimuli are presented in continuous time: $\Delta t \to 0$. To keep the amount of information in the signals finite as $\Delta t \to 0$, the amount of information per time bin $\Delta t$ has to go to zero, i.e. the observer's model $p(S_t|I)$ has to become progressively less informative. The limit can be taken by using

$$p(S_t|I) = \frac{1}{N}\bigl(1 + W(S_t|I)\,\Delta t\bigr),$$

where N is the number of possible signals and $W(S_t|I)$ denotes an evidence rate, which is constant as $\Delta t \to 0$. The continuous dynamics of the posterior $p(I; t)$ can be derived by expanding Eqs. (1) and (2) in orders of $\Delta t$ and disregarding all terms of order $(\Delta t)^2$; the result is Eq. (3). Note that if it were not for the signal, the posterior $p(I; t)$ would relax towards the flat prior $p_\mathrm{flat}$ with a time constant $\tau(t) = 1/n(t)$. We denote the relaxation rate $n(t)$ as the novelty, because it is related to the probability $p(\mathrm{new})$ of the signal being new: $n(t) = \lim_{\Delta t \to 0} p(\mathrm{new})/\Delta t$. Since the signals become less informative as $\Delta t \to 0$, the probability $p(\mathrm{new})$ decreases with $\Delta t$ such that the novelty $n(t)$ is well defined.
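The continuous-time limit can be checked numerically. The sketch below assumes a likelihood of the form p(S_t|I) = (1 + W(S_t|I) Δt)/N and a mixed prior with p(new) = n Δt, and compares the finite-difference rate of one discrete update with the right-hand side of Eq. (3). The belief `p`, the evidence rates `w` and the novelty `n` are hypothetical values.

```python
import numpy as np

def discrete_update(p_prev, w, p_new, p_flat, dt, n_signals):
    """One discrete step: mixed prior followed by the Bayesian update of Eq. (1)."""
    prior = (1.0 - p_new) * p_prev + p_new * p_flat
    lik = (1.0 + w * dt) / n_signals      # assumed scaling of the likelihood
    post = lik * prior
    return post / post.sum()

def continuous_rate(p, w, n, p_flat):
    """Right-hand side of Eq. (3): leak towards the flat prior plus evidence terms."""
    return -n * (p - p_flat) + w * p - p * np.dot(w, p)

p = np.array([0.6, 0.3, 0.1])             # hypothetical current belief
p_flat = np.full(3, 1.0 / 3.0)
w = np.array([0.06, 0.02, -0.04])         # hypothetical evidence rates for one signal (1/ms)
n = 1.0 / 50.0                            # hypothetical novelty (1/ms)

errors = []
for dt in (1.0, 0.1, 0.01):
    finite_diff = (discrete_update(p, w, n * dt, p_flat, dt, 3) - p) / dt
    errors.append(np.max(np.abs(finite_diff - continuous_rate(p, w, n, p_flat))))
    print(f"dt = {dt:5.2f} ms, max deviation = {errors[-1]:.2e}")
```

The deviation shrinks in proportion to Δt, as expected when only terms of order (Δt)^2 are dropped from the discrete update.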
There are different approaches to calculating $p(\mathrm{new})$. In the absence of information on the stimuli, we can assume that there is a characteristic time $\tau$ after which the observer typically expects stimulus identity to change. The probability of a change within a short time window $\Delta t$ is then given by $p(\mathrm{new}) = \Delta t/\tau$ and the novelty $n(t) = 1/\tau$ is constant. This leads to a leaky evidence integration with a constant leak time constant $\tau$, quite similar to the evidence integration model in the main article (see stage 1 in the two-stage model in Figure 2F). Note that for constant $p(\mathrm{new})$, our model is equivalent to a hidden Markov model.
Novelty.
A more elaborate approach to calculating $p(\mathrm{new})$ is to use the likelihood to check whether the current signal agrees with the observer's current belief, via the Bayesian posterior $p(\mathrm{new}|S_t)$. To fully specify this model, we have to choose a prior $p(\mathrm{new})$. Similar to the arguments in Eqs. (2) and (3), we allow the observer to accumulate evidence on the novelty of the signal over a given time interval $\tau_\mathrm{new}$. To this end, we again use a mixed prior $p(\mathrm{new})$ containing the old posterior $p(\mathrm{new}|S_{t-\Delta t})$ and a constant prior $p_0(\mathrm{new}) = \Delta t/\tau$. By taking the limit of continuous time, we obtain a differential equation for the novelty $n(t)$ at time t, expressed in terms of the evidence rates $W(S_t|\mathrm{new})$ and $W(S_t|\mathrm{old})$. While belief and signal agree, the leak term in Eq. (3) is small and information is integrated over a long time scale. When stimulus identity changes, there is a brief period in which the signals disagree with the current belief: $W(S_t|\mathrm{new}) > W(S_t|\mathrm{old})$. As a result, the novelty $n(t)$ increases and $p(I; t)$ relaxes more quickly towards the flat prior: previous evidence is forgotten.
Simulations.
We simulated the case of 3 different stimulus identities (stimulus A, stimulus B and blank) and 3 different signals (`A', `B' and `blank'). The fact that A and B are similar is modeled by a relatively small difference in the evidence rates $W(S_t|I)$ for stimuli A and B. In contrast, the evidence rate $W(S_t|I = \mathrm{blank})$ for blank vs. stimulus A or B is relatively high.
The evidence rates $W(S_t|I)$ are given in Table 1. The time constants for novelty detection and expected stimulus identity change are $\tau_\mathrm{new} = 1$ ms and $\tau = 50$ ms. We used a time discretization of $\Delta t = 0.01$ ms, which is sufficiently small to ensure that the discretization has no influence on the results.
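A simulation of this kind can be sketched as follows. This is a simplified illustration, not the simulation code behind the reported results: it Euler-integrates Eq. (3) with a constant novelty n = 1/τ (so τ_new is not used), and the evidence rates in `W` are hypothetical placeholders rather than the values of Table 1. Only τ = 50 ms and Δt = 0.01 ms are taken from the text. The 40 ms presentation durations are also assumptions.

```python
import numpy as np

IDENTITIES = ("A", "B", "blank")
SIGNALS = ("A", "B", "blank")

# Hypothetical evidence rates W(S_t|I) in 1/ms (rows: signals, columns: identities).
# NOT the values of Table 1. Each column sums to zero so that likelihoods stay normalized.
W = np.array([
    [ 0.06,  0.02, -0.04],   # signal 'A'
    [ 0.02,  0.06, -0.04],   # signal 'B'
    [-0.08, -0.08,  0.08],   # signal 'blank'
])

TAU = 50.0   # expected time scale of stimulus identity change (ms)
DT = 0.01    # time discretization (ms)

def simulate(schedule, tau=TAU, dt=DT):
    """Euler-integrate Eq. (3) with constant novelty n = 1/tau.

    schedule: list of (signal_name, duration_ms) pairs.
    Returns the posterior over IDENTITIES at every time step.
    """
    p_flat = np.full(len(IDENTITIES), 1.0 / len(IDENTITIES))
    p = p_flat.copy()
    n = 1.0 / tau
    trace = [p.copy()]
    for name, duration in schedule:
        w = W[SIGNALS.index(name)]
        for _ in range(int(round(duration / dt))):
            p = p + dt * (-n * (p - p_flat) + w * p - p * np.dot(w, p))
            trace.append(p.copy())
    return np.array(trace)

# Signal 'A' for 40 ms followed by signal 'B' for 40 ms (assumed durations).
trace = simulate([("A", 40.0), ("B", 40.0)])
after_a = trace[4000]    # posterior at the end of the 'A' phase
after_b = trace[-1]      # posterior at the end of the 'B' phase
print("after A:", dict(zip(IDENTITIES, after_a.round(3))))
print("after B:", dict(zip(IDENTITIES, after_b.round(3))))
```

With these placeholder rates, the belief favors A at the end of the first phase and B at the end of the second, since the leak discounts the earlier evidence.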
The model can reproduce the central features of evidence integration in the psychophysical experiments (Figure 2G) and generates novelty signals needed for the transfer to the buffer of the two-stage model (see Supporting Figure S2).

Two-stage model
For long stimulus durations, the two-stage model smoothly converts into a one-stage model. To illustrate this, we studied the behavior of the model for long stimulus durations, where a one-stage model would predict a dominance of the first stimulus or, for the intermediate duration of 160 ms in our experiments, a weaker dominance of the second.
The model was fit to the data of experiment one. Fitting was done in two separate steps.
First, we fit the leaky integrator of stage one and the drift-diffusion process of the extended two-stage model to the reaction time distributions using the standard procedure, restricted to those conditions in which we found the model to work reasonably well (i.e. 20-80 ms total duration, but not 160 ms). Second, we kept the parameters found in the first step fixed and optimized $T_\mathrm{start}$ to fit the dominance across all conditions (i.e. 20-160 ms total duration). The fitting of $T_\mathrm{start}$ was done for each observer individually by minimizing the sum of the mean square error of the dominance across conditions (range of tested $T_\mathrm{start}$: $0 \le T_\mathrm{start} - T_\mathrm{pre} \le 120$ ms) using 10,000 repetitions.
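The second fitting step amounts to a one-dimensional search over T_start − T_pre. The sketch below shows one way such a search could look, using a plain grid search and a summed squared error. The function `toy_dominance` is a hypothetical, deterministic stand-in for the two-stage model (which would estimate dominance from repeated simulated trials). The list of durations and the 5 ms grid spacing are also assumptions.

```python
import numpy as np

def fit_t_start(model_dominance, observed, durations_ms, offsets_ms):
    """Grid search over the offset T_start - T_pre.

    Returns the offset with the smallest summed squared error between
    predicted and observed dominance, and the error for every tested offset.
    """
    errors = np.array([
        np.sum((np.array([model_dominance(off, d) for d in durations_ms]) - observed) ** 2)
        for off in offsets_ms
    ])
    return offsets_ms[int(np.argmin(errors))], errors

def toy_dominance(offset_ms, duration_ms):
    """Hypothetical stand-in for the model's predicted dominance (NOT the two-stage model)."""
    return np.tanh((offset_ms - 0.5 * duration_ms) / 40.0)

durations_ms = np.array([20.0, 40.0, 80.0, 160.0])   # assumed set of total durations
offsets_ms = np.arange(0.0, 121.0, 5.0)              # tested range 0..120 ms
# Synthetic "observed" data generated with a known offset of 60 ms.
observed = np.array([toy_dominance(60.0, d) for d in durations_ms])

best_offset, errors = fit_t_start(toy_dominance, observed, durations_ms, offsets_ms)
print("best T_start - T_pre:", best_offset, "ms")
```

Because the first-step parameters stay fixed, the search is cheap even when each model evaluation is itself a Monte Carlo estimate. In that case the error surface is noisy and a coarser grid or averaging over repetitions may be needed.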
The model indeed shows a weaker dominance of the second stimulus for large stimulus durations (Supporting Figure S3C). It bears similarities to one-stage models using sensory pre-processing [1]. However, there are two marked differences: First, one-stage models using sensory pre-processing do not comprise a buffer that allows informed decision making to continue after the disappearance of the stimulus. Second, they start the drift-diffusion with stimulus onset (i.e. $T_\mathrm{start} = T_\mathrm{pre}$). However, without these features the decision variable moves towards the decision bound for stimulus `A' before dropping back to chance level. The decision variable does not continue towards the decision bound for stimulus `B'. Therefore, these models also show a dominance of the first stimulus.
Taking the limit $\Delta t \to 0$ yields a nonlinear differential equation for the posterior:

$$\frac{d}{dt}\,p(I; t) = -n(t)\,\bigl(p(I; t) - p_\mathrm{flat}(I)\bigr) + W(S_t|I)\,p(I; t) - p(I; t)\sum_{I'} W(S_t|I')\,p(I'; t). \qquad (3)$$
To calculate $p(\mathrm{new})$, the likelihood is used to compute the probability that the current signal is in agreement with the current belief $p(I; t-\Delta t)$ of the observer. To this end, we again use the Bayesian approach and calculate the posterior $p(\mathrm{new}|S_t) = p(S_t|\mathrm{new})\,p(\mathrm{new})/p(S_t)$, where $p(\mathrm{new})$ is a prior probability that the signal is new and $p(S_t|\mathrm{new})$ and $p(S_t)$ are calculated using the probabilistic model:

$$\begin{aligned}
p(S_t|\mathrm{new}) &= \sum_I p(S_t|I)\,p(I|\mathrm{new}) = \sum_I p(S_t|I)\,p_\mathrm{flat}(I), \\
p(S_t|\mathrm{old}) &= \sum_I p(S_t|I)\,p(I|\mathrm{old}) = \sum_I p(S_t|I)\,p(I; t-\Delta t), \\
p(S_t) &= p(S_t|\mathrm{new})\,p(\mathrm{new}) + p(S_t|\mathrm{old})\,\bigl(1 - p(\mathrm{new})\bigr).
\end{aligned}$$
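The posterior p(new|S_t) can be written out as a short function. The sketch below follows the three expressions for p(S_t|new), p(S_t|old) and p(S_t). The likelihood vector, the two beliefs and the prior of 0.1 are hypothetical numbers chosen for illustration.

```python
import numpy as np

def novelty_posterior(lik_s, belief, p_flat, prior_new):
    """Posterior probability that the current signal stems from a new stimulus.

    lik_s:     likelihoods p(S_t|I) of the observed signal for every identity I
    belief:    current belief p(I; t - dt)
    p_flat:    flat prior over identities
    prior_new: prior probability p(new)
    """
    p_s_new = np.dot(lik_s, p_flat)                            # p(S_t|new)
    p_s_old = np.dot(lik_s, belief)                            # p(S_t|old)
    p_s = p_s_new * prior_new + p_s_old * (1.0 - prior_new)    # p(S_t)
    return p_s_new * prior_new / p_s

p_flat = np.full(3, 1.0 / 3.0)
lik_b = np.array([0.2, 0.7, 0.1])    # hypothetical p('B'|I) for I = A, B, blank
prior_new = 0.1                      # hypothetical prior p(new)

surprised = novelty_posterior(lik_b, np.array([0.9, 0.05, 0.05]), p_flat, prior_new)
expected = novelty_posterior(lik_b, np.array([0.05, 0.9, 0.05]), p_flat, prior_new)
print("belief A, signal 'B':", round(float(surprised), 3))
print("belief B, signal 'B':", round(float(expected), 3))
```

When the signal contradicts the belief, the posterior rises above the prior. When the signal matches the belief, it falls below the prior.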
The evidence rates $W(S_t|\mathrm{new})$ and $W(S_t|\mathrm{old})$ that enter the novelty dynamics are defined by

$$W(S_t|\mathrm{new}) := \sum_I W(S_t|I)\,p_\mathrm{flat}(I) \quad\text{and}\quad W(S_t|\mathrm{old}) := \sum_I W(S_t|I)\,p(I; t).$$

In essence, this model tracks the novelty by comparing the stimuli within a time window of duration $\tau_\mathrm{new}$ with the current beliefs $p(I; t)$ for the different stimuli I. As long as belief and signal are in agreement, $W(S_t|\mathrm{new}) < W(S_t|\mathrm{old})$, so that the novelty $n(t)$ remains smaller than $1/\tau$. Consequently, the leak term in Eq. (3) is small.
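The behavior of the novelty can be explored with a discrete-time sketch. It rests on several assumptions that are not taken from the text: the mixed prior weights the old posterior by (1 − Δt/τ_new) and the constant prior Δt/τ by Δt/τ_new, the likelihood has the form (1 + W Δt)/N, the belief is held fixed while the novelty settles, and the evidence rates are hypothetical. Only τ = 50 ms, τ_new = 1 ms and Δt = 0.01 ms follow the simulation parameters.

```python
import numpy as np

def novelty_step(p_new_prev, lik_s, belief, p_flat, dt, tau, tau_new):
    """One discrete novelty update.

    ASSUMPTION: the mixed prior weights the old posterior by (1 - dt/tau_new)
    and the constant prior p0(new) = dt/tau by dt/tau_new.
    """
    prior_new = (1.0 - dt / tau_new) * p_new_prev + (dt / tau_new) * (dt / tau)
    p_s_new = np.dot(lik_s, p_flat)      # p(S_t|new)
    p_s_old = np.dot(lik_s, belief)      # p(S_t|old)
    evidence = p_s_new * prior_new + p_s_old * (1.0 - prior_new)
    return p_s_new * prior_new / evidence

def steady_novelty(w_s, belief, dt=0.01, tau=50.0, tau_new=1.0, steps=2000):
    """Iterate the update with a frozen belief and return n = p(new)/dt."""
    n_signals = 3                                   # 'A', 'B' and 'blank'
    p_flat = np.full(len(belief), 1.0 / len(belief))
    lik_s = (1.0 + w_s * dt) / n_signals            # assumed likelihood scaling
    p_new = dt / tau
    for _ in range(steps):
        p_new = novelty_step(p_new, lik_s, belief, p_flat, dt, tau, tau_new)
    return p_new / dt

w_a = np.array([0.06, 0.02, -0.04])   # hypothetical rates W('A'|I) for I = A, B, blank (1/ms)
n_agree = steady_novelty(w_a, np.array([0.8, 0.1, 0.1]))      # belief favors A
n_disagree = steady_novelty(w_a, np.array([0.1, 0.1, 0.8]))   # belief favors blank
print("1/tau:", 1.0 / 50.0)
print("novelty, belief agrees with signal:   ", n_agree)
print("novelty, belief disagrees with signal:", n_disagree)
```

Under these assumptions the novelty settles below 1/τ when the signal agrees with the belief and above 1/τ when it disagrees, in line with the qualitative description above. The size of the effect depends on the placeholder rates.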