Fixation patterns in simple choice reflect optimal information sampling

Simple choices (e.g., eating an apple vs. an orange) are made by integrating noisy evidence that is sampled over time and influenced by visual attention; as a result, fluctuations in visual attention can affect choices. But what determines what is fixated and when? To address this question, we model the decision process for simple choice as an information sampling problem, and approximate the optimal sampling policy. We find that it is optimal to sample from options whose value estimates are both high and uncertain. Furthermore, the optimal policy provides a reasonable account of fixations and choices in binary and trinary simple choice, as well as the differences between the two cases. Overall, the results show that the fixation process during simple choice is influenced dynamically by the value estimates computed during the decision process, in a manner consistent with optimal information sampling.


Response to R1
Thank you for the continued very useful comments, and apologies for misunderstanding some of your earlier comments. We are hopeful that we understand these nuanced points better now and that we have addressed them adequately in the text and below.
1. I would like to clarify the issue with the term "sequential". What I meant is that both the DDM and aDDM are sequential sampling models in which evidence is sampled (sequentially) in a relative fashion, so it is always evidence for A relative to B that it sampled (as in the Sequential probability ratio test). This is what distinguishes sequential models in which evidence is accumulated "in a single sum" (such as DDM, aDDM, DFT, OU models) from sequential sampling models in which evidence is accumulated in separate accumulators (such as race models, LCA, etc). Therefore, I would like to stress that when the authors were using the term "sequential" to refer to the fact that evidence was accumulated sequentially for different options, this does not correspond to neither what the DDM or the aDDM "do". Therefore, I suggest to change the term "sequential" with perhaps "time-varying accumulation rates" or something similar. I think it should be clear that the authors are not suggesting that when looking at A you are not accumulating evidence for B (again, the accumulated evidence is relative for A vs. B, not absolute for A in both the DDM and aDDM). However, the crucial addition of the aDDM vs. the DDM is that it allows the accumulation rate to vary within the trial based on attentional shifts.
This is a valid point. We certainly did not intend to suggest "that when looking at A you are not accumulating evidence for B" and we agree that it is critical that we prevent the possibility that the reader infers such a suggestion. By "sequential" we only mean that multiple value samples are taken over a period of time. We have ensured that we never use the term "sequence" or "sequential" in a context where it could be interpreted in this way. This involved removing one occurrence (l. 82):
In contrast, the optimal algorithm when the decision-maker must sample information sequentially and selectively is unknown.
Moreover, since the DDM is related to the SPRT, I am not sure I get why the author say that "Adding inhibition would necessarily violate the rational norm of Bayesian inference, and thus does not seem appropriate for our model given our emphasis on optimality". For what I understand, mutual inhibition is necessary for optimality in sequential sampling models. Perhaps the authors can elaborate/better explain their point.
We are sorry that the comments regarding inhibition in our previous response letter were not as clear as they should have been. Fortunately, the confusion does not seem to have leaked into the paper.
Here are some thoughts on this.

First, we actually had the Bogacz paper in mind when writing our previous response. That paper beautifully shows how various accumulator models with inhibition are analogous to the DDM, under appropriate combinations of parameters.
Second, one can also devise multi-unit accumulators that are equivalent to the DDM, without inhibition. This is what we meant by "the role of inhibition depends on the exact neural network used to implement or approximate the algorithm".
In particular, some previous specifications of the DDM do not involve any explicit inhibition. For example, in the binary DDM, there is only one accumulator and thus there is no place for inhibition. We can also express the DDM with two accumulators and a relative decision value defined as the difference between the values of the two accumulators.
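As a sketch of this two-accumulator form, written in notation assumed here for illustration (accumulator states x^A_t and x^B_t, drifts μ_A and μ_B, independent Gaussian noise terms with a common variance σ², and relative decision value d_t), the reduction can be expressed as:

```latex
\begin{align}
x^A_{t+1} &= x^A_t + \mu_A + \varepsilon^A_t, & \varepsilon^A_t &\sim \mathcal{N}(0, \sigma^2) \\
x^B_{t+1} &= x^B_t + \mu_B + \varepsilon^B_t, & \varepsilon^B_t &\sim \mathcal{N}(0, \sigma^2) \\
d_t &= x^A_t - x^B_t \\
d_{t+1} &= d_t + (\mu_A - \mu_B) + \varepsilon_t, & \varepsilon_t &= \varepsilon^A_t - \varepsilon^B_t \sim \mathcal{N}(0, 2\sigma^2)
\end{align}
```

The last line has the form of a single-accumulator DDM with drift μ_A − μ_B. It relies on the assumption that the two individual noise terms are independent, so that their variances add.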
Note that the joint noise term ε_t has twice the variance of the individual noise terms. We see that the two-accumulator model reduces to the DDM without introducing any inhibition between the accumulators. However, one could very reasonably argue that the subtraction in defining the relative value signal is a form of inhibition. In that sense, our model also has inhibition because the decision to stop sampling is based on differences in estimated values. We were not considering this comparison to fall under "inhibition" in our original response.
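The variance relationship can also be checked numerically. The following is a minimal sketch, assuming two accumulators driven by independent Gaussian noise of equal variance; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_difference_increments(mu_a, mu_b, sigma, n_steps, seed=0):
    """Per-step increments of the relative value (accumulator A minus B).

    Assumes independent Gaussian noise with standard deviation `sigma`
    on each accumulator (an illustrative assumption, not a fitted model).
    """
    rng = random.Random(seed)
    increments = []
    for _ in range(n_steps):
        step_a = mu_a + rng.gauss(0.0, sigma)  # increment of accumulator A
        step_b = mu_b + rng.gauss(0.0, sigma)  # increment of accumulator B
        increments.append(step_a - step_b)     # increment of the difference
    return increments


increments = simulate_difference_increments(
    mu_a=0.3, mu_b=0.1, sigma=1.0, n_steps=200_000
)
drift_estimate = statistics.fmean(increments)        # expected near mu_a - mu_b
variance_estimate = statistics.variance(increments)  # expected near 2 * sigma**2
print(f"drift ~ {drift_estimate:.3f}, noise variance ~ {variance_estimate:.3f}")
```

With these illustrative values, the estimated drift should lie close to 0.2 and the estimated noise variance close to 2, that is, twice the variance of each individual noise term, even though the simulation contains no interaction between the two accumulators.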
Third, we view inhibition as a process-level mechanism in which neural populations directly inhibit each other. Such a mechanism may be critical in a biologically plausible implementation/approximation of our model; however, our more abstract model is silent about these issues.
Finally, although these issues were related to our previous response letter, and not the paper, inspired by the reviewer's comment we have added the following new final sentences of the discussion (l. 608-613):
Finally, in contrast to many sequential sampling models, our model is not intended as a biologically plausible process model of how the brain actually makes decisions. Exploring how the brain might approximate the optimal sampling policy presented here, and also how optimal sampling might change under accumulation mechanisms such as decay and inhibition is another priority for future work.