Foraging as an evidence accumulation process

A canonical foraging task is the patch-leaving problem, in which a forager must decide when to leave a current resource patch in search of another. Theoretical work has derived optimal strategies for when to leave a patch, and experiments have tested the conditions under which animals do or do not follow an optimal strategy. Nevertheless, models of patch-leaving decisions do not consider the imperfect and noisy sampling process through which an animal gathers information, or how this process is constrained by neurobiological mechanisms. In this theoretical study, we formulate an evidence accumulation model of patch-leaving decisions in which the animal averages over noisy measurements to estimate the state of the current patch and the overall environment. Evidence accumulation models belong to the class of drift-diffusion processes and have been used to model decision making in many contexts. We solve the model for conditions under which foraging decisions are optimal and equivalent to the marginal value theorem, and perform simulations to analyze deviations from optimality when these conditions are not met. By adjusting the drift rate and decision threshold, the model can represent different strategies, for example an increment-decrement or counting strategy. These strategies yield identical decisions in the limiting case but differ in how patch residence times adapt when the foraging environment is uncertain. To account for sub-optimal decisions, we introduce an energy-dependent utility function that predicts longer-than-optimal patch residence times when food is plentiful. Our model provides a quantitative connection between ecological models of foraging behavior and evidence accumulation models of decision making. Moreover, it provides a theoretical framework for potential experiments that seek to identify the neural circuits underlying patch-leaving decisions.


INTRODUCTION
In systems and cognitive neuroscience, decision-making has been extensively studied using the concept of evidence accumulation [1,2]. Evidence accumulation has been implicated, for example, in social decisions [3], sensory decisions [4,10,11], economic decisions [5], gambling decisions [6], memory decisions [7], visual search decisions [8], and value decisions [9]. Such evidence accumulation processes can be well described quantitatively with a drift-diffusion model, providing a moment-by-moment estimate of the animal's internal accumulation process. This quantitative modeling has given experimenters the opportunity to investigate a myriad of neuronal mechanisms underlying these processes, for example the contributions of different cortical areas that differentially accumulate evidence over time [11][12][13][14][15][16][17][18][19]. Although this work has revealed a detailed account of the neural mechanisms associated with decision-making, an outstanding question remains as to how these mechanisms have been shaped by selection forces in the animal's environment [20,21].
Foraging is one of the most ubiquitous behaviors that animals exhibit, as the search for food is essential for survival [22,23]. From a cognitive perspective, foraging comprises aspects of learning, statistical inference, self-control, and decision-making, thus providing the opportunity to understand how these processes have been shaped by natural selection to optimize returns in the face of environmental and physiological constraints and costs [21]. There is increasing interest in studying foraging behavior within a neuroscience context and linking neural signals to relevant foraging parameters [24][25][26][27][28][29]. For example, during a visual foraging task with non-human primates (Macaca mulatta), activity in the dorsal anterior cingulate cortex (dACC) was found to increase as a patch depletes, up to a threshold at which the animal switches patches [24]. Other work has found that neurons in primate posterior cingulate cortex (PCC) signal decision salience during visual foraging, and thus relate to disengagement from the current patch [30].
A canonical foraging task is the patch-leaving problem, where an animal must decide when to leave a resource to search for another. Ecological models, such as the well-known marginal value theorem (MVT) [31], describe patch-leaving decision rules that an animal should use to optimize its food intake. Deviations from optimal decisions may be due to internal state-dependence or environmental characteristics [32]. Studies that link cognitive biases to environmental structure highlight the importance of studying decision-makers in their natural environment, framing decision making in terms of "ecological rationality" (as opposed to "economic rationality") [33][34][35][36]. The MVT and related work provide a quantitative basis for understanding patch decisions, but do not give a mechanistic account of the animal's internal decision process and how it uses its experience to reach patch-leaving decisions.
In this work we formulate a mechanistic model of patch-leaving decisions by linking ecological models of the patch-leaving problem with models of evidence accumulation used in systems neuroscience. We call this model the foraging drift-diffusion model (FDDM). This model builds on previous mechanistic models of patch-leaving decisions [37][38][39][40][41][42]. In our model, patch-leaving decisions are described by a drift-diffusion process [43,44], which represents the noisy process through which an animal accumulates evidence (by finding food) and uses its experience to decide when to leave the patch. Evidence accumulation and decisions within a patch are coupled to a moving-average process that keeps track of the average rate of energy intake available from the environment. We solve for conditions where the model yields optimal foraging decisions according to the MVT, and perform simulations to analyze deviations from optimality when these conditions are not met. We then consider different decision "strategies", which are adaptive to different environmental conditions, and introduce a utility function into the model in order to account for sub-optimal foraging. Importantly, our model generates testable predictions about the different decision strategies an animal may employ in an uncertain environment. The model provides a quantitative connection between foraging behavior and experiments that seek to understand the neural basis of patch-leaving decisions. For example, the model may be used to investigate how the neural activity of a particular brain area is tuned to the decision variable, or how different brain areas process other aspects of the decision-making process detailed by the FDDM.

RESULTS
To present the model, we first define the governing equations, then describe the dynamics of patch depletion by introducing parameters to represent the patch characteristics in the foraging environment. We then solve a simplified form of the model to establish conditions where the foraging drift-diffusion model yields identical decisions to the marginal value theorem. We show that optimal decisions can be represented in the model using different decision "strategies", including an increment-decrement mechanism, where receiving food reward makes the forager more likely to stay in the patch, and a counting mechanism, where receiving food reward makes the forager more likely to leave. Following this, we perform simulations with the general form of the model to show how noise in the patch decision process and discrete food rewards affect energy intake and patch residence times. We then consider different configurations of the foraging environment, and present approximate solutions along with simulation results to show how the different decision strategies can be adaptive, depending on the uncertain versus known information about the foraging environment. Finally, we introduce a utility function into the model, and show how this can quantitatively account for the salient experimental observation that patch residence times tend to be longer than optimal.

The model, which we term the foraging drift-diffusion model (FDDM), is a drift-diffusion process that includes two coupled variables. The first calculates the energy intake available from the environment (E) by taking a moving average over a timescale τ_E. The second uses a drift-diffusion process for patch-leaving decisions, which we represent by a patch decision variable x. Upon entering a patch, x = 0, and changes in x occur through evidence accumulation from a constant drift α and a time-dependent reward function r(t). The forager decides to leave the patch when the threshold x = η is reached.
There is a constant cost of s, so that the net rate of energy gain while in a patch is r(t) − s, and while traveling between patches it is −s. The two equations are defined as:

1. Net energy intake rate:

τ_E dE/dt = −E + r(t) − s    (1)

2. Decision to leave a patch:

dx = (α − r(t)) dt + σ dW,  with x = 0 upon patch entry and departure when x = η    (2)

where σ sets the level of noise and dW is a standard Wiener increment. Fig 1 shows a schematic of the model, an example of the probability density of x when in a patch, and example traces of E and x across multiple patches (see Section S1 for details on the numerical simulation of the probability density of the patch decision variable). Table I lists the quantities defined in the governing equations.
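As a concrete illustration, the coupled dynamics can be simulated with an Euler-Maruyama scheme. This is a minimal sketch, not the authors' code: it assumes Eq. 1 is an exponential moving average of net intake, that the decision variable integrates drift minus reward with additive Gaussian noise, and it uses the mean depletion r(t) = ρ0 exp(−t/A) in place of discrete rewards; all parameter values are examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_patch(E, alpha, eta, rho0=9.439, A=5.0, s=1.0,
                   sigma=0.3, tau_E=50.0, dt=0.01, t_max=200.0):
    """Simulate one patch visit: integrate the energy estimate E (Eq. 1)
    and the decision variable x (Eq. 2) until x crosses the threshold eta.
    Returns the patch residence time and the updated energy estimate."""
    x, t = 0.0, 0.0
    while t < t_max:
        r = rho0 * np.exp(-t / A)            # mean in-patch reward rate (Eq. 6)
        E += dt * (-E + r - s) / tau_E       # moving-average energy estimate
        x += dt * (alpha - r) + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
        # leave when x reaches eta (from below if eta > 0, from above if eta < 0)
        if (eta >= 0 and x >= eta) or (eta < 0 and x <= eta):
            break
    return t, E
```

With σ = 0 and a slowly varying E, the residence time recovers the deterministic threshold-crossing time; with σ > 0, repeated calls give the distribution of patch residence times.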

Patch characteristics
We formulate equations to represent patch depletion, and incorporate a parameter that interpolates between continuous and discrete food rewards. The function r(t) describes the rate of food reward that the animal receives while in a patch, and ρ(t) is the density of food in the current patch. The initial density of food in the patch is ρ0, and when a forager finds and eats a piece of food, the total amount of food remaining in the patch decreases.

Fig 1. (A) A forager estimates the average rate of reward from the environment, and the decision to leave a patch occurs when the internal decision variable reaches a threshold. Travel time between patches is T_tr, and patches are described by the parameters ρ0, A, and c (see Table II). (B) Evolution of the probability density of the patch decision variable (x) while in a single patch, along with the time-dependent probability that the decision to leave the patch has been made. Blue arrows denote the receipt of food rewards. (C) Energy estimate coupled with the patch decision variable over multiple patches. (D) Patch depletion with discrete rewards, showing examples of the food reward received and the time-dependent in-patch food density for different values of the food chunk size (c).
To formalize this, we consider that patches have an area of a and that food is uniformly scattered within a patch in chunk sizes of c. If the forager searches at a rate of v, the probability of finding k chunks of food in a time interval ∆t is given by a Poisson distribution with event rate λ = ρ(t)v∆t/c:

P(k) = λ^k e^(−λ) / k!    (3)

When food is found, the total amount of food remaining is reduced by an amount kc. On average, the total amount of food, aρ(t), changes according to

⟨∆(a ρ(t))⟩ = −⟨k⟩ c,    (4)

where ⟨·⟩ denotes an ensemble average. Using Eq. 3, the ensemble average for the number of pieces of food found in one time step is ⟨k⟩ = ρ(t)v∆t/c, and using this, the average change in density follows a linear differential equation [45]:

d⟨ρ(t)⟩/dt = −⟨ρ(t)⟩/A,    (5)

where A = a/v is the effective time constant of the patch. Without loss of generality, we set v = 1, i.e. the forager explores one unit area per unit time, and therefore refer to A as the "patch size", with units of time. Average patch depletion (as well as the average rate of reward received) follows a simple exponential decay:

⟨ρ(t)⟩ = ρ0 exp(−t/A),    (6)

where t is the time spent in the current patch. Fig 1D shows example time traces of patch density and food received for different values of the chunk size c. With larger chunk size there is larger variability in the food rewards found per unit time. In the limit of zero chunk size, food reward is continuous, and the food reward rate and patch density are equal to the average density from Eq. 6: lim_{c→0} r(t) = lim_{c→0} ρ(t) = ⟨ρ(t)⟩.

Table II. Patch variables and parameters
ρ(t) — Time-dependent food density in the current patch
ρ0 — Initial food density
A — Patch size
c — Food chunk size
T_tr — Travel time between patches
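The chunked-reward process above can be checked with a small simulation. As a sketch (parameter values are examples, with v = 1): in each time step the forager finds a Poisson-distributed number of chunks with mean ρ∆t/c, and the patch total Aρ drops by kc, so the ensemble-averaged density should follow the exponential decay of Eq. 6.

```python
import numpy as np

rng = np.random.default_rng(1)

def deplete_patch(rho0=9.439, A=5.0, c=1.0, dt=0.01, t_max=5.0):
    """Discrete-chunk patch depletion: k ~ Poisson(rho * dt / c) chunks are
    found per step (v = 1), and the patch total A * rho decreases by k * c."""
    n_steps = int(round(t_max / dt))
    rho = np.empty(n_steps)
    rho[0] = rho0
    for i in range(1, n_steps):
        k = rng.poisson(rho[i - 1] * dt / c)       # chunks found this step (Eq. 3)
        rho[i] = max(rho[i - 1] - k * c / A, 0.0)  # density drop per chunk is c/A
    return np.arange(n_steps) * dt, rho
```

Averaging many runs reproduces ⟨ρ(t)⟩ = ρ0 exp(−t/A); single runs with larger c show the larger reward variability visible in Fig 1D.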
Optimal foraging decisions and model equivalence to the marginal value theorem

We solve the model to establish conditions on the drift rate α and the decision threshold η which lead to optimal patch residence times. To do this, we consider E = ⟨E⟩ = const. (the estimated value of energy is constant and equal to the actual average), σ = 0 (no noise on the patch decision variable), and c = 0 (food reward is received continuously). We establish an equivalence between the FDDM and the marginal value theorem for this case, then relax these assumptions and use simulations to show that the derived rules lead to approximately optimal patch decisions over a wide range of parameter values and configurations of the foraging environment.
First, we rewrite the marginal value calculation for patch residence time using the above notation. If there is a travel time between patches of T_tr, then the average rate of energy intake is given by a weighted sum of intake during time spent in patches and traveling between patches:

⟨E⟩ = ( ∫_0^T r(t′) dt′ − s(T + T_tr) ) / (T + T_tr)

Taking the derivative of the average energy intake rate with respect to T, setting it to zero, and re-arranging yields the well-known condition for the optimal time T_opt to stay in a patch:

r(T_opt) = (1/(T_opt + T_tr)) ∫_0^{T_opt} r(t′) dt′    (7)

Eq. 7 can be written more compactly in the form r(T) − s = ⟨E⟩, where ⟨E⟩ is the average energy rate from the environment. Using the average patch depletion dynamics (Eq. 6) in Eq. 7, we can solve for the optimal time to remain in a patch:

T_opt = A ln( ρ0 / (⟨E⟩ + s) )    (8)

Integrating the patch decision variable (Eq. 2) to the threshold and inserting Eq. 8 for the optimal patch residence time yields a relationship between the threshold, drift rate, energy, and patch parameters:

η = α T_opt − A(ρ0 − ⟨E⟩ − s)    (9)

If Eq. 9 is satisfied, optimal decisions can be obtained with different values of the drift rate α. To determine a valid range for α values, in Section S3 we solve for conditions on the drift rate such that there is only a single threshold crossing up to the time T_opt. In addition to this, we omit the small range where α and η have opposite signs. With these conditions, the valid range for the drift rate is

α ≤ 0  or  α ≥ α_S = A(ρ0 − ⟨E⟩ − s)/T_opt    (10)

Using this range, and also substituting E, the energy estimate, for ⟨E⟩, we highlight the following different "strategies":

Density-adaptive: α_D = ρ0
Size-adaptive: η_S = 0, i.e. α_S = A(ρ0 − E − s)/T_opt
Counting: α_C = 0
Robust counting: α_RC < 0    (11)
For the density-adaptive, counting, and robust counting strategies, η is defined by Eq. 9 with the corresponding value of α, substituting the energy estimate E for ⟨E⟩.
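For concreteness, these choices can be collected in a small helper. This is a sketch under the assumptions used above: T_opt = A ln(ρ0/(E + s)) (Eq. 8), a threshold given by η = αT_opt − A(ρ0 − E − s) (the deterministic integral of the decision variable to T_opt), and an example value for the robust-counting drift.

```python
import numpy as np

def strategy_params(strategy, rho0=9.439, E=2.0, s=1.0, A=5.0):
    """Return (alpha, eta) for each named strategy, using
    T_opt = A*ln(rho0/(E+s)) and eta = alpha*T_opt - A*(rho0 - E - s)."""
    T_opt = A * np.log(rho0 / (E + s))
    food = A * (rho0 - E - s)               # total food eaten per patch at T_opt
    alpha = {"density-adaptive": rho0,      # increment-decrement, alpha_D = rho0
             "size-adaptive": food / T_opt, # chosen so that eta = 0
             "counting": 0.0,               # leave after a fixed amount of food
             "robust-counting": -0.2 * rho0,  # example negative drift
             }[strategy]
    return alpha, alpha * T_opt - food
```

The signs of the returned pairs reflect the text: (α > 0, η > 0) for density-adaptive, η = 0 for size-adaptive, and (α ≤ 0, η < 0) for the counting variants.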
These strategies are illustrated in Fig 2 using zero noise and discrete rewards. The density-adaptive strategy (Fig 2A) uses an increment-decrement mechanism [39,46], which in previous work has been suggested as adaptive for the case when the forager does not initially know the number of expected reward items in the patch [47]. In the FDDM framework, this is implemented using η > 0 and α > 0, so that finding food makes the forager more likely to stay in the patch, but otherwise the drift brings x towards the threshold. We show in Section S4 that using α_D is optimal for adapting patch residence times (PRTs) to uncertain food density within each patch. The size-adaptive strategy also uses an increment-decrement mechanism, but with different values of α and η that are optimal for adapting PRTs with respect to uncertainty in the size of each patch (Section S4). This strategy uses a threshold of zero, such that x first decreases below zero and then rises back to the threshold. Because x both starts and ends at zero, the size-adaptive strategy is sensitive to noise and randomness in the timing of rewards received. We therefore illustrate this strategy in Fig 2B by choosing a value α > α_S, which yields η > 0. The counting strategy has zero drift rate and a negative value of the threshold. With this strategy, finding food makes the forager more likely to leave the patch, and the forager leaves only after a set amount of food reward has been received (Fig 2C). Since the choice of α = 0 in the counting strategy can cause PRTs to become infinite if patches do not contain as much food as expected, we define an additional strategy termed 'robust counting', which has a nonzero drift α < 0. With this, there is still drift towards the threshold in the absence of food reward; thus, in contrast to the density-adaptive or size-adaptive strategies, both drift and receiving food reward bring x closer to the threshold (Fig 2D).
The size-adaptive and counting strategies represent limiting cases of η = 0 and α = 0, respectively, and this makes these choices sensitive to noise. We therefore focus our analysis on the density-adaptive and robust counting strategies, which have (α > 0, η > 0) and (α < 0, η < 0), respectively. Patch decisions using these strategies are exactly equivalent to the marginal value theorem for the case of E = ⟨E⟩, σ = 0, and c = 0. In the next section we investigate model behavior and compare simulated patch decisions to optimal ones for a range of parameter values in the general case of E ≠ ⟨E⟩, σ > 0, and c > 0.
Parameter dependence: noisy decisions and discrete food rewards

In the general case, individual patch decisions will be noisy, food may come in discrete chunks, the estimate of available energy in the environment will vary as the forager explores and obtains food rewards, and patches may vary in quality and distribution. We investigate a range of environmental configurations and patch parameters as well as different patch decision strategies.

Fig 2. Patch decision strategies. (A) The choice α = ρ0 is optimal for uncertainty in patch food density; this represents an "increment-decrement" mechanism for patch decisions. (B) A threshold of zero is optimal for uncertainty in patch size. Since η = 0 is sensitive to noise, we choose a small value η > 0 to illustrate. (C) The counting strategy uses zero drift, so that the forager leaves after a set amount of food reward. (D) The robust counting strategy uses α < 0 so that there is still drift towards the threshold. Each plot shows the patch decision variable along with the time-dependent patch decision threshold that changes with receipt of food reward due to updates of the energy estimate.
To simplify model analysis, we use τ as the unit of time and s as the unit of energy, and set τ_E = 50τ to represent that the energy estimate occurs at a longer time scale than individual patch decisions. We illustrate dominant trends by choosing an intermediate range for characteristics of the foraging environment: ⟨E⟩ = 2s, A = 5τ, and T_tr = 5τ. Simulation results for a range of different configurations defined in Section S2 are shown in Figs S1 and S2. Fig 3A shows average energy intake and patch residence time as the noise level σ increases; with more noise, the variance of patch residence time increases. Even with zero noise, the mean simulated PRTs are slightly lower than optimal; this is due to the finite time scale for the moving average estimate of E. Because E increases within a patch and then decreases outside of a patch, E tends to be slightly higher than the actual average energy when the agent leaves the patch (see Fig 1C). The increase in E causes the threshold to decrease in magnitude before the forager leaves the patch (see Fig 2), which is why the average simulated PRTs are slightly less than optimal.
With higher values of σ, the simulated average energy intake decreases, and the effect is larger for the robust-counting (RC) strategy compared to the density-adaptive (DA) strategy. With the DA strategy, the variance of patch residence time increases with noise, but the average stays nearly the same. With the RC strategy, the variance increases more strongly with noise, and for large values of σ, the average patch residence time is longer than optimal.
Fig 3B shows average energy intake and patch residence time when the food chunk size (c) increases. For both strategies, larger chunk sizes increase the variance of PRTs without much effect on the mean. However, the two strategies show opposite trends for average energy intake: with the DA strategy, average energy decreases for large chunk size, but with the RC strategy, average energy increases for large c, to values that are higher than the optimum determined by the marginal value theorem. This is because with a counting strategy, food reward makes the forager more likely to leave the patch, and therefore patch leaving decisions tend to occur immediately after receipt of a food reward, instead of after a certain amount of time in the patch (Fig 2).
With large chunk sizes, the number of food chunks per patch will be small, and therefore instantaneous food intake and leaving decisions are not well described by a 'rate', as expressed by the MVT. The optimal number of food chunks obtained per patch is

N_opt = A(ρ0 − ⟨E⟩ − s)/c    (12)

For example, using parameter values from Fig 3, a chunk size of c = 8 leads to N_opt = 4.02.
In this case it is difficult to assess the current food density, which is why average energy intake with the DA strategy is less than optimal. For extreme cases where N_opt < 1, which occur for example with small patch sizes, short inter-patch travel times, and low available energy in the environment, the DA strategy performs poorly, while the RC strategy yields average energy intake rates that are higher than MVT-optimal (Fig S2).
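The worked example above (c = 8, with the parameter values used in Fig 3) can be reproduced directly. As a sketch, assuming the total food consumed per patch at T_opt is Aρ0(1 − e^(−T_opt/A)) = A(ρ0 − ⟨E⟩ − s):

```python
# Parameter values as used in Fig 3: A = 5 tau, rho0 = 9.439, <E> = 2s, s = 1
A, rho0, E, s, c = 5.0, 9.439, 2.0, 1.0, 8.0
N_opt = A * (rho0 - E - s) / c   # optimal food per patch, in chunks of size c
print(round(N_opt, 2))           # → 4.02
```

This recovers the quoted value, and also shows how N_opt drops below 1 when A is small or the chunk size c is large relative to the food available per patch.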

Patch uncertainty and adaptive decisions
To this point we have considered cases where patch quality and inter-patch travel times are the same for all patches; we now ask how the different strategies perform when aspects of the foraging environment are uncertain and may vary from patch to patch. The MVT predicts that foragers should stay longer in high-quality patches and leave low-quality patches sooner. However, this assumes that as it enters a patch, the forager recognizes the 'type' of the patch and therefore adjusts its expectation of food rewards. We instead consider that the forager only knows the average patch quality in the environment, and must use this, along with the estimate of E and its current experience of food rewards, to determine when to leave a patch.
We first consider the case where patch quality is uncertain, by varying the initial food density of each patch. This is simulated by drawing the initial density (ρ0) from a Gaussian distribution with mean ρ̄0 and standard deviation ∆ρ0. If the density of each patch is known, the MVT predicts that the forager should adjust its PRT according to Eq. 8. For the FDDM, we show in Section S4 that changes in patch residence time (T) in response to a small change in patch food density, ρ0 = ρ̄0 + δρ0, are approximated by

δT ≈ [ A(ρ̄0 − ⟨E⟩ − s) / ( ρ̄0 (α − ⟨E⟩ − s) ) ] δρ0    (13)

With the DA strategy, foragers stay longer in higher-quality patches (i.e. patches with higher ρ0) and shorter in lower-quality patches (i.e. lower ρ0), and changes from patch to patch asymptotically follow optimal adjustments. In contrast, the RC strategy yields the opposite trend: patch residence time decreases with patch quality (Fig 4A). Simulations with added patch decision noise agree well with Eq. 13 for small changes in ρ0 about ρ̄0, but for the RC strategy there are deviations from the linear trend for large changes. This demonstrates that for an environment where patch food density varies, the DA strategy yields an average energy intake and PRT close to optimal, while using the RC strategy yields an energy intake lower than optimal due to PRTs that are higher than optimal (Fig 4A).

Fig 4. Different foraging environments with associated patch decision strategies. Shown are simulation results with the density-adaptive and robust-counting strategies in two different foraging environments. The left column illustrates the foraging environment for a given case, the middle column shows average energy and patch residence time when a particular strategy is used in that environment, and the right column shows simulation results compared to optimal strategies in each environment. All simulations use a noise level of σ = 0.3ρ0 and a patch size of A = 5, and the robust counting strategy is implemented by setting α = −0.2ρ0. (A) Uncertainty in patch food density. Patches have a Gaussian distribution for initial food density with mean ρ̄0 = 9.439 and a standard deviation of ∆ρ0 = 0.3ρ̄0, and rewards are received continuously (c = 0). Travel time between patches is constant at T_tr = 5. (B) Scattered patches with discrete rewards. Food reward is received in discrete chunks (c = 8) and each patch has the same initial food density of ρ0 = 9.439. Travel time between patches is drawn from an exponential distribution with mean T̄_tr = 5.
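The linear sensitivity for the density-adaptive strategy can be checked numerically. This sketch assumes the deterministic (σ = 0) decision trajectory x(T) = αT − Aρ0(1 − e^(−T/A)), with the threshold tuned to the mean density ρ̄0; for the DA strategy (α = ρ̄0) the predicted slope is dT/dρ0 = A/ρ̄0, which matches the MVT-optimal adjustment from Eq. 8.

```python
import numpy as np

A, rho0_bar, E, s = 5.0, 9.439, 2.0, 1.0
alpha = rho0_bar                                  # density-adaptive drift
T_opt = A * np.log(rho0_bar / (E + s))
eta = alpha * T_opt - A * (rho0_bar - E - s)      # threshold tuned to mean density

def residence_time(rho0):
    """First time x(T) = alpha*T - A*rho0*(1 - exp(-T/A)) reaches eta (bisection)."""
    f = lambda T: alpha * T - A * rho0 * (1.0 - np.exp(-T / A)) - eta
    lo, hi = 1e-9, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

# numerical slope dT/drho0 near the mean density
d = 0.01 * rho0_bar
slope = (residence_time(rho0_bar + d) - residence_time(rho0_bar - d)) / (2 * d)
```

The numerical slope agrees with A/ρ̄0 ≈ 0.53; repeating with a negative α (robust counting) gives a negative slope, the opposite trend described in the text.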
We next consider a different configuration of the foraging environment: food is received in discrete chunks and patches are randomly distributed across the landscape, but the quality of each patch is the same. Because each patch contains the same amount of food, an optimal strategy is to leave a patch after a certain amount of food reward has been received, and thus a 'counting' strategy is expected to be optimal. We represent randomly scattered patches by drawing inter-patch travel times from an exponential distribution with mean T̄_tr. Simulations with noise show that in this environment, the RC strategy leads to a higher average energy intake than the DA strategy (Fig 4B). This is because the distribution of the number of food items per patch is sharply peaked near the optimal value for the RC strategy, while the distribution is broader, with the peak skewed from optimal, for the DA strategy (Fig 4B). Similar to Fig 3B, Fig 4B shows that the RC strategy leads to mean energy intakes that are higher than the optimum predicted by the MVT, because patch-leaving decisions tend to occur immediately following the receipt of food reward.
Another type of patch uncertainty can come from patches that vary in size. The size-adaptive strategy defined in Eq. 11 yields adjustments to PRTs based on the size of each patch that follow, in the limiting case of zero noise, the optimal times given by Eq. 8. However, because the size-adaptive strategy has a threshold of zero, it is very sensitive to noise. In simulations with added noise, using a strategy close to the size-adaptive strategy, with a small but nonzero threshold, yields similar or slightly lower average energy intakes compared to the density-adaptive strategy when patch size is uncertain (Fig S3). This suggests that while a forager with an appropriate strategy can nearly optimally adapt individual patch residence times to uncertainty in patch food density, it is more difficult to use a noisy sampling process to adapt individual patch residence times to uncertainty in patch size.

Sub-optimal behavior: satisficing
In the previous section we showed that the FDDM can represent different decision strategies that are appropriate to optimize the response to uncertainty in different environments. However, with the exception of the RC strategy in an environment where patch quality is uncertain, simulations yield average PRTs that are near or slightly lower than optimal. Many studies have examined patch residence times in comparison to MVT predictions; the most common trend is that animals tend to stay longer in patches than predicted by the MVT [32]. In this section we introduce a change to the model to account for this observation.
An animal's perception of a reward, and its subsequent foraging decisions, depend on its internal state. One way to capture this is a utility function approach, borrowed from behavioral economics [48][49][50]. Conceptually this is also related to 'satisficing' [51][52][53][54][55][56], defined as the process by which animals do not seek to maximize food intake, but instead seek to maintain food intake above a threshold. If food is plentiful, then the utility of increasing intake is small; in this case, an animal will likely be more concerned with, for example, avoiding threats than with leaving the current patch in search of higher returns. Conversely, if food is scarce, then survival depends on maximizing the rate of food rewards. The utility concept can represent these scenarios.
We model this by introducing a utility function u(E), which depends on the available energy in the environment. The utility function modifies the patch decision dynamics by changing the drift rate and the impact of receiving food (Eq. 14): it decreases the rate of drift towards the threshold, and either increases or decreases the change in x with food reward, depending on whether the threshold is positive or negative. To set a form for u(E), first recall that the animal must obtain energy E > 0 in order to survive. In the limit E → 0, we therefore expect that an animal will adopt a foraging strategy that maximizes energy intake; this is set by u(0) = 1.
For high values of E, we expect that the animal cares less about maximizing its food intake rate, and therefore u should decrease. We consider two functions to represent this: an exponential form,

u(E) = exp(−βE),    (15)

and a threshold-linear form,

u(E) = max(0, 1 − βE),    (16)

where β > 0 is a parameter that determines how fast the utility changes with energy. We choose these forms for u(E) to investigate the model response, and note that other functional forms could be used.
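A minimal encoding of the two candidate utility forms (as a sketch, assuming an exponential u(E) = e^(−βE) and a threshold-linear u(E) = max(0, 1 − βE), both normalized so that u(0) = 1; β = 0.5 is an example value):

```python
import numpy as np

beta = 0.5  # example value; sets how fast utility falls with available energy

def u_exponential(E):
    """Exponential utility: u(0) = 1, decays smoothly toward zero."""
    return np.exp(-beta * E)

def u_threshold_linear(E):
    """Threshold-linear utility: u(0) = 1, reaches zero at E = 1/beta."""
    return np.maximum(0.0, 1.0 - beta * E)
```

Both satisfy the boundary condition u(0) = 1 and decrease with E, so either can be dropped into the modified patch decision dynamics.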
Integrating Eq. 14 using either Eq. 15 or Eq. 16, setting σ = 0 and E = ⟨E⟩, and combining with Eqs. 7-8 yields an approximate correction for how the utility function affects both average energy intake and patch residence time. We compare this approximate solution with simulation results in Fig 5. The simulations use the density-adaptive strategy in an environment where patch food density is uncertain (i.e. the same configuration as Fig 4A). The results show that using either form of the utility function leads to patch decisions that approach optimal when energy is low, but deviate from optimality when energy is high. Patch residence times are longer than optimal when the available energy is high, in particular for the larger value of β shown in Fig 5. Although both forms of the utility function demonstrate longer-than-optimal patch residence times, the change of PRTs with energy levels depends on whether the exponential or threshold-linear form is used. In both cases, simulation results agree reasonably well with the approximate solution. Analogous results for the robust counting strategy, and for an environment with discrete rewards and uncertain travel times, are shown in Fig S4.

DISCUSSION
In this study, we developed a foraging drift-diffusion model (FDDM) to describe how an animal accumulates evidence over time in the form of food rewards and uses this experience to decide when to leave a foraging patch. We solved for conditions where the FDDM yields identical decisions to the marginal value theorem, and performed simulations to show how deviations from optimality are affected by noisy patch decisions and discrete versus continuous food rewards. By adjusting the drift rate and the threshold for patch decisions, the model can represent different decision strategies, including an increment-decrement mechanism, where finding food makes the animal more likely to stay in the patch, and a counting mechanism, where finding food makes the animal more likely to leave the patch. We obtained approximate solutions in addition to model simulations to demonstrate how these different strategies are adaptive, depending on the known and unknown aspects of the foraging environment. We then showed that incorporating a utility function into the model can quantitatively account for the common experimental observation that patch residence times tend to be longer than optimal. Our model links ecological models of patch foraging with drift-diffusion models of decision making, and by representing both MVT-optimal behaviors and sub-optimal deviations, it provides a mechanistic description which yields predictions for future experimental work.
The FDDM builds on a body of previous work that has considered statistics of patch depletion [45], averaging mechanisms to estimate available energy [57,58], and "patch leaving potentials" or other mechanistic descriptions of when to leave a patch [37][38][39][40][41][42]. Rewards that arrive stochastically in discrete chunks have been treated in [45,59], including the case of Poisson-distributed rewards we considered as well as more general scenarios [45]. The moving average of energy in the environment that we use is similar to a recent experience-driven model, in that the timescale for updates of energy is finite [58]. Other models have also considered that the forager maintains a moving average of the average profitability of the environment [57]. Previous mechanistic models of patch-leaving decisions have proposed that a forager has a patch potential, which declines in the absence of food and increases when food is found; the forager leaves when the potential crosses zero [37,38]. This model has been used to analyze the behavior of parasitoid wasps seeking hosts, where it is referred to as the incremental-decremental model [39]. In this model, finding a successful host increases the potential, and thus the probability of staying, and in the absence of a successful host the potential continues to move towards the threshold. The incremental-decremental model of parasitoid host decisions has been extended to consider different types of encounters, i.e. with parasitized or unparasitized individuals [40]. The increment-decrement mechanism has been modeled in a similar way by considering a leaving potential, which reflects how likely the forager is to leave [39,41]. Another related model, based on the concept of increment-decrement, has been used to analyze bumblebee foraging [42]. A different class of models has considered patch-leaving rules where the forager "counts", for example leaving after a single food item has been found [60].
By combining these different concepts into a single model with a tractable analytical form, we provide a framework for future experiments that seek to understand different decision strategies that may depend on environmental characteristics, neural dynamics, and state-dependence of the animal.
The utility-function approach represents foraging decisions which lead to sub-optimal energy intake and longer than optimal patch residence times. This formulation relates to the mechanisms of satisficing and temporal discounting. Satisficing refers to when animals do not seek to maximize food intake, but instead to maintain food intake above a threshold [51][52][53][54][55][56]. Temporal discounting refers to when the animal values current rewards more than expected future rewards [61]. Related terms include impulsivity, failure to delay gratification, and delay discounting [62,63], and studies have examined this phenomenon in the context of inter-temporal choice tasks [64][65][66]. Various models of cognitive biases have been formulated to represent staying in a patch longer than optimal (also referred to as "over-harvesting" the patch). One approach is to define a subjective cost that approximates the aversion to leaving the patch [65,67]. Another is discounting of future rewards, so that, for example, an expected large reward in a new patch is discounted because of the time delay until it is available [68]. An alternative interpretation uses a decreasing marginal utility function, such that an expected large reward in a new patch is not viewed as proportionally better than the current low rate of reward in an almost-depleted patch, for example due to costs associated with switching patches [69]. In our model, the utility function describes an explore-exploit tradeoff; if food is plentiful, then the relative utility of leaving the current patch to find a new patch with possibly higher rewards is small, and therefore the animal stays longer in the current patch. The reason for this could be that the animal is satisfied with its current rate of food intake, or that due to other factors (e.g. risks involved with continued search), it values receiving smaller, certain rewards in the present moment instead of leaving to obtain uncertain but possibly larger rewards. We investigated two examples for the form of the utility function in Fig 5, and note that an interesting area for future work is to ask how an animal's perception of the value or utility of a reward depends on internal state and external environmental conditions. Foraging decisions differ from common models of economic choice in a key aspect: decisions are sequential, rather than between discrete alternatives [73]. Experiments with the self-control preparation, where an animal must choose between two alternatives, and the patch preparation, which is a sequential foraging preparation, have revealed behavioral differences even though from an economic standpoint the setups are equivalent [62]. Additionally, when rats are required to physically move to perform foraging, the observed behavior differs from tasks that simulate foraging by presenting sequential choices or that consider visual search [65]. These studies highlight the importance of state-dependence and context for understanding decision-making processes. In future work it will be interesting to understand the neural basis for why these treatments differ, and how this contributes to state- and environment-dependent decision biases.
In this study we modeled a single forager acting independently. Often, a more realistic situation involves other agents who simultaneously exist in the environment, leading to competitive and/or collective foraging. If foragers are competing for resources, ideal free distribution theory describes an optimal way to distribute multiple agents at different food sources in relation to the quality of the food sources and the density of competition [74]. In other cases a group may forage together collectively, leading to individual decisions that incorporate both non-social and social information (e.g. [75]). Patch-leaving decisions will then depend on the group reaching consensus. The drift-diffusion modeling framework has been extended to represent coupled decision-makers who share information to collectively reach a decision [76], and this approach could be used to extend the FDDM to multiple agents who make decisions as a group.
We considered that the forager knows the average patch food density (ρ̄_0) and the average patch size (Ā), and uses these to set an optimal decision "strategy" by choosing values of the drift rate (α) and threshold (η). Other models have considered the process of learning about the environment during foraging using reinforcement learning [77]. Reinforcement learning (RL) is a framework to represent how an agent that receives information about the state of the world, along with a scalar-valued reward signal, learns to select actions that maximize the long-run accrued reward. Kolling and Akam [77] reframed the MVT rule as an average-reward RL algorithm, which estimates relative values of staying and leaving using a particular assumption about the patch's reward rate dynamics. To incorporate RL into the FDDM, one possibility is that the agent has to learn the patch characteristics (ρ̄_0, Ā), and then uses these learned values to set α and η. Another possibility is that the agent could adjust α and η directly, based on feedback from the amount of reward received.
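A minimal sketch of the first possibility is given below; the delta-rule update, learning rate, and noisy density samples are illustrative assumptions, not components of the FDDM:

```python
# Sketch of one possibility raised above: learning the average patch density
# from experience, then setting the drift rate from the learned value.
# The delta-rule update, learning rate, and density samples are all
# illustrative assumptions, not part of the FDDM.
def update_estimate(estimate, observation, lr=0.1):
    """Delta-rule (running average) update toward each new observation."""
    return estimate + lr * (observation - estimate)

rho0_hat = 1.0                                   # poor initial estimate
samples = [9.1, 9.8, 9.4, 9.6, 9.3, 9.5] * 10    # densities of visited patches
for obs in samples:
    rho0_hat = update_estimate(rho0_hat, obs)

alpha = rho0_hat   # density-adaptive choice alpha_D = rho0_hat (cf. Sec. S4)
```

After repeated patch visits the estimate converges near the true average density, and the drift rate can be set from it.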
Bayesian foraging theories have considered how patch foraging decisions should be based on a prior estimate of the distribution of patches and expected reward in the environment [78][79][80][81]. For example, a forager that knows the variability of the environment should adjust its strategy accordingly. If patches are known to contain a set number of reward items, then finding a prey item should decrease the probability of staying at the patch (i.e. a counting strategy should be chosen). Conversely, if patches are known to vary in their quality, finding a food item should increase the probability of staying in the patch (i.e. the density-adaptive strategy is a good choice). Experimental work has shown that bumblebees make exactly this adjustment to their patch-leaving strategies [82], but bluegill fish do not [83]. Other studies have considered the effect of reward uncertainty (e.g. [84,85]), suggesting that foragers may not follow optimal rules when patch quality is uncertain [86]. From our simulation results, one possible explanation for sub-optimal decisions when the foraging environment is uncertain is adopting the "wrong strategy" (Fig 4).
In the FDDM, the forager has memory of its previous foraging experience through the estimate of available energy. A related question is how foraging decisions are affected when the environment changes over time, which for example can lead to biases from contrast effects [85]. The speed of environmental fluctuations affects which strategy is optimal [87], and the relative importance of taking different adaptive strategies depends on the dynamics and predictability of the environment [59]. Spatio-temporal autocorrelation is a common feature of natural environments, and this may have driven certain observed decision biases [88]. Related to this, work has shown that patch time allocation is influenced by recent experiences of travel time [89][90][91], and patch quality [92][93][94].
In summary, in this work we developed a mechanistic model of a natural behavior (foraging), with a mathematical form inspired by models used in systems neuroscience. The model considers an agent that calculates a moving average of the available energy in the environment, and makes noisy patch decisions according to the receipt of food rewards and a decision "strategy", which can be adapted to optimize for the characteristics of the foraging environment. This work provides a step towards establishing a unifying framework tying together concepts from systems neuroscience, ecology and behavioral economics to study naturalistic decision making. With the advent of functional imaging [95][96][97] and wireless electrophysiological techniques in freely moving animals [98][99][100][101][102], one can monitor different brain areas simultaneously along with the detailed movement and postural dynamics of the animal [103][104][105][106][107][108], with the aim of mapping the involvement of both neurobiological and biomechanical mechanisms that relate to certain aspects of behavior. Additionally, recent advancements in closed-loop techniques allow precise perturbations of neural systems that depend on the state and current behavior of the animal [109][110][111]. The proposed model provides a moment-by-moment estimate of the evolution of the decision process, which enables future work to map brain activity to quantitative behavioral variables using neural recordings and targeted perturbations.

FIG. S1. Full simulation results with added patch decision noise. Shown are the average and standard deviation of the energy intake (left grid) and patch residence times (right grid), for the density-adaptive strategy (top) and the robust counting strategy (bottom), when the noise on the patch decision variable (σ) is increased. The robust counting strategy is implemented by setting α = −0.2ρ_0 for each case.
Each grid of 9 plots contains simulation results with different values of the travel time and the optimal available energy in the environment: columns correspond to values of T_tr = (1, 5, 10)τ, and rows correspond to values of E_opt = (0.5, 2, 5)s. For each plot, the filled blue curve uses a patch size of A = 1.5τ, the filled red curve uses a patch size of A = 5τ, and the solid line shows the optimal energy or patch time.

(A) Results for the robust counting strategy using the utility function shown in Fig 4A. (B) Results using the linear threshold utility function (see Fig 4B).

S1. Fokker-Planck formulation and numerical solution for probability density
Consider the master equation for the patch decision variable, rewritten here for clarity:

τ dx = (α − r(t)) dt + σ dW(t).   (S1)

We will formulate this as a Fokker-Planck equation and solve for the probability density via the finite element method. To do this, we first define a normalized patch decision variable,

y(t) = x(t)/η(t),   (S2)

and take the differential:

dy = dx/η(t) − (x/η(t)²) dη ≈ dx/η(t),   (S3)

where the approximation is used because the threshold η(t) changes slowly compared to the patch decision variable. Now we can write a new master equation with this change of variables:

τ dy = (α_y(t) − r_y(t)) dt + σ_y(t) dW(t),   (S4)

where α_y(t) ≡ α/η(t), r_y(t) ≡ r(t)/η(t), and σ_y(t) ≡ σ/η(t), and the decision threshold occurs at y = 1. Note that since we consider strategies where α and η are either zero or have the same sign, α_y(t) will always be either zero or positive, setting a drift towards the threshold. For food rewards, if η > 0 then r_y > 0, and from Eq. S4 food reward will decrease y, i.e. lowering it away from the threshold of y = 1. If η < 0, then r_y < 0, and food will increase y towards the threshold. Thus, the normalized formulation with the threshold at y = 1 can represent the different decision strategies without any further modifications. The Fokker-Planck equation corresponding to Eq. S4 is

τ ∂G(y, t)/∂t = −(α_y(t) − r_y(t)) ∂G/∂y + (σ_y(t)²/2τ) ∂²G/∂y²,   (S5)

where G(y, t) is the time-dependent probability density for the normalized decision variable y. We keep the terms α_y(t) and r_y(t) separate, because the former is a continuous function while the latter is defined by discrete inputs via a Poisson process when food rewards are received in chunks.
To solve this using the finite element method, first let G = N_i(y) g_i(t), where N_i(y) are the shape functions and g_i(t) are the nodal variables. Summation notation applies over the indices i and j. After writing the weak form of the equation and setting the integral of the residual to zero, we obtain the finite element matrix equation:

τ M_ij dg_j/dt = (σ_y²/2τ) A_ij g_j − (α_y − r_y) B_ij g_j,   (S6)

where M_ij is the mass matrix, A_ij is a second-derivative matrix operator, and B_ij is a first-derivative matrix operator. We consider the solution over a domain of [−L, 1], and choose the lower value of the domain as sufficiently low to encompass the full range of the probability distribution of y. The upper boundary of y = 1 is absorbing, and therefore has the condition G(1, t) = 0. We define the lower boundary as reflecting: ∂G(−L, t)/∂y = 0. The mass matrix is defined by integrating the shape functions:

M_ij = ∫_{−L}^{1} N_i N_j dy.   (S7)

To define A_ij, which is the second derivative matrix operator, we will use integration by parts so that only a first derivative remains (and thus we will only need to use linear shape functions). Writing out the integral, and then integrating by parts, we have

A_ij = ∫_{−L}^{1} N_i (∂²N_j/∂y²) dy = [N_i ∂N_j/∂y]_{y=1} − ∫_{−L}^{1} (∂N_i/∂y)(∂N_j/∂y) dy,   (S8)

where the last equality uses the zero-flux reflecting boundary condition at y = −L. For all elements the 2nd term in Eq. S8 yields (1/dy)((−1, 1), (1, −1)), where dy is the size of each element. The absorbing boundary at y = 1 leads to a nonzero flux, and therefore must be included in the global matrix calculation. To do this, consider the last element in the mesh. Evaluating the boundary term yields an element matrix of (1/dy)((0, 0), (1, −1)), which must also be included in the calculation of A_ij to enforce the boundary condition. The first derivative matrix operator, B_ij, is also defined by integrating by parts:

B_ij = ∫_{−L}^{1} N_i (∂N_j/∂y) dy = [N_i N_j]_{y=−L}^{y=1} − ∫_{−L}^{1} (∂N_i/∂y) N_j dy = −∫_{−L}^{1} (∂N_i/∂y) N_j dy,   (S9)

where the last equality applies the absorbing boundary condition of G(1, t) = 0.
The reflecting boundary condition at y = −L adds a contribution of ((1, 0), (0, 0)) to the first element of the mesh, which must also be included in the calculation of B_ij.
To solve these equations numerically, the discrete food rewards are treated separately from the drift and diffusion of the probability density. Therefore, in the code, we solve the drift-diffusion part of the equation,

τ M_ij dg_j/dt = (σ_y²/2τ) A_ij g_j − α_y B_ij g_j,   (S10)

and add an extra statement to shift the probability distribution when discrete food rewards r_y(t) are received. We use the simulation to determine the flux through the upper boundary and the time-dependent probability P(t) that a decision to leave the patch has been made. Flux through the upper boundary can occur from either drift, diffusion, or the receipt of food reward. We calculate P(t) by integrating over the probability density:

P(t) = 1 − ∫_{−L}^{1} G(y, t) dy.   (S11)

For the simulations shown in Fig 1C, we coupled patch decisions with the estimate of the energy in the environment by using the expectation value of the decision time:

⟨T⟩ = ∫_0^{t_max} t (dP(t)/dt) dt,   (S12)

where t_max is a sufficiently large time value.
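The scheme above can be sketched numerically as follows. This minimal illustration treats drift and diffusion only (no reward shifts), imposes the absorbing condition by replacing the last row with a Dirichlet constraint rather than the boundary-term bookkeeping described above, and uses illustrative parameter values throughout:

```python
import numpy as np

# Minimal linear finite-element sketch of the drift-diffusion solve
# (no discrete rewards). All parameter values are illustrative assumptions.
tau, alpha_y, sigma_y = 1.0, 0.5, 0.3
L, N, dt, T = 4.0, 200, 2e-3, 3.0

y = np.linspace(-L, 1.0, N + 1)
h = y[1] - y[0]

# Assemble mass matrix M, stiffness A (from -int N_i' N_j' dy), and
# advection matrix B (from int N_i N_j' dy; equivalent to the text's B_ij
# up to boundary terms), element by element with linear shape functions.
M = np.zeros((N + 1, N + 1)); A = np.zeros_like(M); B = np.zeros_like(M)
Me = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
Ae = (1.0 / h) * np.array([[-1.0, 1.0], [1.0, -1.0]])
Be = 0.5 * np.array([[-1.0, 1.0], [-1.0, 1.0]])
for e in range(N):
    idx = np.ix_([e, e + 1], [e, e + 1])
    M[idx] += Me; A[idx] += Ae; B[idx] += Be

D = sigma_y**2 / (2.0 * tau**2)   # diffusion coefficient
v = alpha_y / tau                 # drift velocity toward y = 1
K = D * A - v * B                 # weak-form spatial operator

g = np.exp(-y**2 / (2 * 0.1**2))              # narrow Gaussian near y = 0
w = np.full(N + 1, h); w[0] = w[-1] = h / 2.0  # trapezoid quadrature weights
g /= w @ g                                     # normalize total probability

Mb = M.copy(); Kb = K.copy()
Mb[-1, :] = 0.0; Mb[-1, -1] = 1.0; Kb[-1, :] = 0.0  # Dirichlet G(1,t) = 0
g[-1] = 0.0

step = np.linalg.inv(Mb - dt * Kb)   # implicit Euler propagator
for _ in range(int(T / dt)):
    g = step @ (Mb @ g)

P = 1.0 - w @ g   # probability that the patch has been left (cf. Eq. S11)
```

As probability drifts and diffuses through the absorbing boundary, the survival deficit P grows from 0 toward 1, giving the cumulative leaving probability.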

S2. Simulations for full parameter range
In the main text we focused on the case A = 5τ, T_tr = 5τ, and E = 2s. To investigate the full parameter dependence of the model, we consider scenarios that represent different configurations of the environment:

S3. Range for drift rate values
Here we determine the values of the drift rate α that lead to valid model behavior, i.e. where there is only a single threshold crossing during the time 0 < t < T_opt. Consider the value α = α_S in Eq. 11, which yields a threshold of η = 0. For this case, the patch decision variable will start at x = 0, decrease, and then increase again to reach the threshold at zero. However, when α < α_S, which yields η < 0, the patch decision variable will start at zero and at first decrease, crossing the threshold at an early time t < T_opt, then stay below the threshold before reaching it again at time T_opt. Therefore, for some range of values α_crit < α < α_S, there will be two threshold crossings, one at t < T_opt and one at t = T_opt, while outside of this range there is only a single threshold crossing at t = T_opt.
We solve for the critical value of the drift rate, α_crit, by considering the derivative of the patch decision variable at t = T_opt. The critical value is when the derivative changes sign from positive to negative, i.e.

τ dx/dt |_{t=T_opt} = α − ρ_0 e^{−T_opt/A} = α − (E + s) = 0,   (S13)
which yields α_crit = E + s. For drift values in the range α_crit < α < α_S, there will be two threshold crossings, and therefore a simulation would need an extra rule to "ignore" the first crossing in order to obtain optimal decisions. We therefore restrict drift values to be outside of this range. In our analysis, we make a further restriction to simplify results by additionally neglecting the range 0 < α < α_crit, because in this range α and η have opposite signs. Note that when α is near the boundaries of this range, we can expect patch decisions to be very sensitive to the addition of noise on the patch decision variable, uncertainty in patch characteristics, and/or rewards that come in discrete chunks.
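The two-crossing argument can be checked numerically. The sketch below uses the illustrative values ρ_0 = 9.439, A = 5, and E + s = 3 (in units of τ = 1), for which α_crit = 3 and α_S = (ρ_0 − E − s)/ln(ρ_0/(E + s)) ≈ 5.62, and counts sign changes of x(t) − η on a fine grid:

```python
import math

# Numerical check of the crossing-count argument, with illustrative values
# rho0 = 9.439, A = 5, E + s = 3 (so alpha_crit = 3, alpha_S ~ 5.62).
rho0, A, Es = 9.439, 5.0, 3.0            # Es denotes E + s
T_opt = A * math.log(rho0 / Es)          # optimal residence time

def crossings(alpha, n=20000):
    """Count threshold crossings of the noiseless decision variable x(t)."""
    # Threshold set so that x(T_opt) = eta (cf. Eq. S18 evaluated at T_opt).
    eta = alpha * T_opt + rho0 * A * (math.exp(-T_opt / A) - 1.0)
    count, prev = 0, -eta                # x(0) - eta, with x(0) = 0
    for k in range(1, n + 1):
        t = 1.05 * T_opt * k / n         # scan slightly past T_opt
        x = alpha * t + rho0 * A * (math.exp(-t / A) - 1.0)
        if (x - eta) * prev < 0.0:       # sign change = threshold crossing
            count += 1
        prev = x - eta
    return count
```

For a drift inside the forbidden range, e.g. α = 4.3, the scan finds two crossings; for α = 6.0 > α_S it finds a single crossing at T_opt.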

S4. Drift and threshold choices for optimal patch residence times with patch uncertainty
The values of the drift rate, α, and the threshold, η, are defined using the average values of the patch characteristics in the environment: the average initial patch density ρ̄_0, and the average patch size Ā. In this section we derive expressions for α and η to consider two possible cases: to optimally adjust patch residence times for uncertainty in patch density, or to optimally adjust patch residence times for uncertainty in patch size.
Eq. 8 is the optimal form for patch residence time as a function of patch density and patch size; we repeat it here for clarity, using E = E_opt:

T_opt = A ln( ρ_0 / (E + s) ).   (S14)

We consider changes of patch residence time of the form

T_opt = T̄_opt + δT_opt,   (S15)

where T̄_opt ≡ Ā ln( ρ̄_0 / (E + s) ). First consider a small change in patch density about an average value by using the expansion ρ_0 = ρ̄_0 + δρ_0. Plugging this into Eq. S14, expanding to first order terms, and comparing with Eq. S15 yields the optimal first order changes in patch residence time as a function of changes in individual patch density:

δT_opt = (Ā/ρ̄_0) δρ_0.   (S16)

Similarly, considering a change in patch size of the form A = Ā + δA yields an optimal first order change in patch residence time with changes in patch size:

δT_opt = ln( ρ̄_0 / (E + s) ) δA.   (S17)

We derive values for the drift rate and threshold so that either Eq. S16 or Eq. S17 is satisfied; these represent two different strategies that an animal may use to adapt to uncertainty in an environment. In doing so, we demonstrate that Eqs. S16 and S17 cannot both be satisfied; the strategies represented by these cases represent a tradeoff between optimally adapting to uncertainty in patch density versus optimally adapting to uncertainty in patch size.
We start with the integral of the patch decision variable equation (Eq. 2) with zero noise, using the average patch depletion function from Eq. 6. Integrating up to a time T when the threshold is reached yields

η = αT + ρ_0 A ( e^{−T/A} − 1 ).   (S18)

Applying the condition that the threshold is reached at the optimal patch residence time in Eq. S14 yields a relationship between the threshold and the drift rate:

η = α Ā ln( ρ̄_0 / (E + s) ) + Ā ( E + s − ρ̄_0 ),   (S19)

where we note that this is the same form as Eq. 9, except that here the average patch parameters Ā and ρ̄_0 are used. We now combine Eqs. S18 and S19, plug in expansions for T = T̄_opt + δT and ρ_0 = ρ̄_0 + δρ_0, expand to first order in δT, and solve for the first-order changes in patch residence times:

δT = δρ_0 Ā ( E + s − ρ̄_0 ) / [ ρ̄_0 ( E + s − α ) + δρ_0 ( E + s ) ] ≈ [ Ā ( E + s − ρ̄_0 ) / ( ρ̄_0 ( E + s − α ) ) ] δρ_0,   (S20)

where the approximation uses a series expansion in δρ_0 to first order terms. Comparing this with Eq. S16 leads to a value of α which satisfies optimal adaptation to randomness in patch density, which is simply α_D = ρ̄_0.
We use an analogous process to calculate values of the drift rate and threshold for optimal adaptation to uncertainty in patch size. Again we combine Eqs. S18 and S19, then plug in expansions for T = T̄_opt + δT and A = Ā + δA, expand to first order in δT, and solve for the first-order changes in patch residence times:

δT ≈ { [ ρ̄_0 − ( E + s )( 1 + ln( ρ̄_0 / (E + s) ) ) ] / ( α − ( E + s ) ) } δA,   (S21)

where the approximation uses a series expansion in δA to first order terms. Comparing this with Eq. S17 and solving for α yields the drift rate that satisfies optimal adaptation to randomness in patch size:

α_A = ( ρ̄_0 − ( E + s ) ) / ln( ρ̄_0 / (E + s) ).   (S22)

Using this in Eq. S19 yields the threshold value of

η_A = 0.   (S23)

Thus, for optimal adaptation to patch size, the decision variable will start at zero, decrease to negative values as the animal finds food, and then increase back to zero for a decision to leave the patch.
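These first-order results can be verified by implicit differentiation with a computer algebra system. The following SymPy sketch encodes the threshold-crossing condition (Eq. S18) with the threshold set from the average patch (Eq. S19), and spot-checks the density-adaptive and patch-size-adaptive drift rates at illustrative numerical values:

```python
import sympy as sp

# SymPy verification sketch of the adaptive-drift results. Symbols with a
# "b" suffix denote environment averages (rhob = rho-bar, Ab = A-bar).
rho0, A, rhob, Ab, E, s, alpha, T = sp.symbols(
    'rho0 A rhob Ab E s alpha T', positive=True)

Tb = Ab * sp.log(rhob / (E + s))                        # average-patch T_opt
eta = alpha * Tb + rhob * Ab * (sp.exp(-Tb / Ab) - 1)   # threshold (Eq. S19)

# Threshold-crossing condition for an individual patch (Eq. S18): F = 0
F = alpha * T + rho0 * A * (sp.exp(-T / A) - 1) - eta

# Implicit differentiation of F = 0, evaluated at the average patch
pt = {rho0: rhob, A: Ab, T: Tb}
dT_drho = (-sp.diff(F, rho0) / sp.diff(F, T)).subs(pt)
dT_dA = (-sp.diff(F, A) / sp.diff(F, T)).subs(pt)

alphaA = (rhob - E - s) / sp.log(rhob / (E + s))  # candidate drift (Eq. S22)
vals = {rhob: 9.439, Ab: 5.0, E: 2.0, s: 1.0}     # illustrative numbers

# alpha_D = rhob reproduces dT/drho0 = Ab/rhob (Eq. S16)
assert abs(float((dT_drho.subs(alpha, rhob) - Ab / rhob).subs(vals))) < 1e-8
# alpha_A reproduces dT/dA = ln(rhob/(E+s)) (Eq. S17), with threshold eta = 0
assert abs(float((dT_dA.subs(alpha, alphaA)
                  - sp.log(rhob / (E + s))).subs(vals))) < 1e-8
assert abs(float(eta.subs(alpha, alphaA).subs(vals))) < 1e-8
```

The numerical values used in the spot-checks are arbitrary; the residuals vanish for any admissible parameter choice.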

S5. Optimal energy when patches vary in quality
To investigate the model dependence on parameters and environmental characteristics, we perform simulations by choosing a value of ρ_0 such that the optimal energy return of the environment has a certain value. However, when patches vary in quality, we must consider the distribution to calculate the optimal energy. In the simulation we consider patches where Gaussian noise is added to the initial patch density (ρ_0) and/or the patch size (A). These distributions are defined by mean parameters ρ̄_0 and Ā and standard deviation parameters Δρ_0 and ΔA. In this section we calculate a correction to the optimal energy that depends on the standard deviation of initial patch food density, Δρ_0.
First, we write the average energy in the environment, following Eq. 7:

⟨R⟩ = ⟨ ρ_0 A ( 1 − e^{−T_opt/A} ) ⟩,   (S24)

E = ⟨R⟩ / ( ⟨T_opt⟩ + T_tr ) − s,   (S25)

where the average over the environment, denoted ⟨·⟩, must be evaluated over the distribution of patches. Using Gaussian probability distributions for these, we have

P(ρ_0) = C_{ρ_0} exp[ −( ρ_0 − ρ̄_0 )² / ( 2 Δρ_0² ) ],   (S26)

P(A) = C_A exp[ −( A − Ā )² / ( 2 ΔA² ) ],   (S27)

where C_{ρ_0} and C_A are normalization factors. The average of some quantity z over these probability distributions is

⟨z⟩ = ∫ dρ_0 ∫ dA  z P(ρ_0) P(A),   (S28)

where the lower ends of the integrals should be restricted to zero, because patches cannot have negative density or size. We first use this to evaluate the average return from patches by using the optimal time from Eq. 8:

⟨R⟩ = ⟨ A ( ρ_0 − E − s ) ⟩ ≈ Ā ( ρ̄_0 − E − s ),   (S29)

where the approximation uses an evaluation of the Gaussian probability distribution over a full range, instead of restricting to positive values as expressed in Eq. S28. This approximation holds well for Δρ_0/ρ̄_0 ≪ 1 and ΔA/Ā ≪ 1. To evaluate the average optimal patch residence time, we use the same approximation for the distribution of A, but evaluate the distribution of ρ_0 over the restricted range due to the nonlinear form:

⟨T_opt⟩ = Ā ∫_0^∞ dρ_0 ln( ρ_0 / (E + s) ) P(ρ_0).   (S30)

The solution to this integral can be expressed in closed form using special functions; we performed this calculation using Mathematica. We use this to calculate a correction to the optimal energy, which was used to plot results in Figs 4, 5, and S4. For example, using ρ̄_0 = 9.439 and Δρ_0 = 0 leads to E_opt = 2, while using ρ̄_0 = 9.439 and Δρ_0 = 0.3ρ̄_0 (which was used in Fig 4) leads to E_opt = 2.077. To apply this to the cases shown in Figs 5 and S4, we found that the solution for the correction to the optimal energy, using Eqs. S25 and S30, can be approximated as a linear function of the "uncorrected" energy, E_0, using E_opt ≈ 0.0256307 + 1.02563 E_0.
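The zero-variance value quoted above can be reproduced numerically by solving the self-consistency condition E = Ā(ρ̄_0 − E − s)/(Ā ln(ρ̄_0/(E + s)) + T_tr) − s, which combines the average return with the optimal residence time. The bisection sketch below assumes units where τ = 1 and s = 1, with Ā = 5τ and T_tr = 5τ as in Sec. S2:

```python
import math

# Numerical check of the zero-variance case: the optimal energy solves the
# self-consistency condition combining the average patch return with the
# optimal residence time T = A*ln(rho0/(E + s)). Units assume tau = 1 and
# s = 1, with A = 5 and T_tr = 5 as in Sec. S2.
def optimal_energy(rho0=9.439, A=5.0, T_tr=5.0, s=1.0):
    def residual(E):
        T = A * math.log(rho0 / (E + s))        # optimal residence time
        return A * (rho0 - E - s) / (T + T_tr) - s - E
    lo, hi = 0.01, rho0 - s - 0.01              # bracket the root
    for _ in range(100):                        # plain bisection
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

With the quoted ρ̄_0 = 9.439, this returns E_opt ≈ 2, matching the Δρ_0 = 0 case above.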