Evolution of innate behavioral strategies through competitive population dynamics

Many organism behaviors are innate or instinctual and have been “hard-coded” through evolution. Current approaches to understanding these behaviors model evolution as an optimization problem in which the traits of organisms are assumed to optimize an objective function representing evolutionary fitness. Here, we use a mechanistic birth-death dynamics approach to study the evolution of innate behavioral strategies in a simulated population of organisms. In particular, we performed agent-based stochastic simulations and mean-field analyses of organisms exploring random environments and competing with each other to find locations with plentiful resources. We find that when organism density is low, the mean-field model allows us to derive an effective objective function, predicting how the most competitive phenotypes depend on the exploration-exploitation trade-off between the scarcity of high-resource sites and the increase in birth rate those sites offer organisms. However, increasing organism density alters the most competitive behavioral strategies and precludes the derivation of a well-defined objective function. Moreover, there exists a range of densities for which the coexistence of many phenotypes persists for evolutionarily long times.


Author summary
The innate, or instinctual, behavioral strategies that populations of organisms employ to navigate their environments and survive are shaped over epochs of evolutionary selection, in contrast to learned behaviors that can change within an individual's lifetime based on experience and sensory input. Understanding how the interplay between organisms and their environment shapes which behavioral strategies emerge as the most successful for a population's survival is a major problem in mathematical biology. Often, evolution is modeled as an optimization process that selects for behaviors that optimize the "fitness" of organisms in their environment. However, the fundamental evolutionary events are stochastic birth and death events, and the most successful organisms that emerge under these dynamics are not always those predicted by fitness-based approaches.
In this work, we use agent-based stochastic simulations and mean-field approximations of a mechanistic population dynamics model to investigate the evolution of a population's innate foraging strategies. In particular, we investigate when an emergent fitness function can be derived from the underlying birth-death dynamics, and when such a function fails to predict the most competitive phenotypes.

Introduction
Foraging for resources (shelter, food, etc.) is one of the fundamental problems an organism must solve in order to survive and reproduce, ensuring the continuity of its species [1]. Many of the strategies organisms employ to solve such problems appear to be instinctual or innate; for example, birds and fish know where to migrate even if they have never visited their migration grounds before [2], and many complex species of fish, reptiles, birds, and insects have been found to employ search strategies similar to those of the much simpler C. elegans [3]. The fact that many disparate species have converged on similar strategies suggests there may be unifying principles that can be captured by simple quantitative models to help explain the emergence of these behavioral patterns. Perhaps the best-known candidate principle is "survival of the fittest": species with adaptations that give them an edge in outcompeting other species in their environments are those that go on to proliferate. On its own this principle is somewhat tautological, as the organisms deemed fittest for their environment are those observed to survive and proliferate [4]. To render this problem well-posed for quantitative modeling, a common approach is to define a "fitness function," a function of a population's phenotypic parameters whose output is a proxy for that phenotypic configuration's ability to survive and reproduce in its environment. Often the fitness function is modeled as the fitness of a single representative agent, rather than an entire population. This idea has given rise to modeling evolution as an optimization process that maximizes an organism's fitness [4], i.e., selects for the phenotypic parameters that maximize the fitness function.
Fitness functions range from relatively straightforward and concrete quantities, such as an organism's rate of producing offspring or the time it takes to find shelter or food [5][6][7][8], to relatively abstract ones, such as the amount of information an organism's sensory organs convey to its nervous system [9][10][11][12], the idea being that the more efficiently an organism's nervous system can process information it acquires about its environment, the more successful it will be at finding food, shelter, mates, etc.
Modeling evolution as an optimization process has had some notable successes in predicting features of biological organisms [12][13][14][15][16][17]; however, some caveats and cautions are in order. Firstly, one can always find an objective function and set of constraints for which an observed solution is optimal [18]. Secondly, there are examples in which evolution is demonstrably not optimal [19], exhibits significant deviations from optimality [16], or can even be repeatedly reversible [20,21]. These and other issues with modeling evolution as optimization have been discussed extensively by Doebeli et al. [4]. Doebeli et al. emphasize that the fundamental events in evolution are stochastic birth-death processes, which can give rise to complex evolutionary dynamics that cannot always be captured by optimizing fitness. For instance, an approach that focuses on birth-death events of entire populations can yield the distribution of successful phenotypes, including phenotypes that would be considered "sub-optimal" by optimization approaches yet are able to coexist with phenotypes predicted to be optimal.
In this paper we take the mechanistic approach, simulating the dynamics of a population of organisms with different innate behavioral strategies for exploring their environment in search of locations with favorable conditions for reproduction. In particular, non-consumable resources are distributed randomly throughout the environment, and each organism's behavioral strategy consists of the set of rates at which the organism will leave locations of each resource level in search of other locations with potentially greater amounts of the resource. Each location in the environment can only support a single individual, resulting in competitive population dynamics. The most competitive phenotypes in our model will be the set of relocation rates that enable organisms to proliferate, allowing these phenotypes to persist in the population over long time scales. Importantly, the birth rates of organisms in our model depend only on an individual's location within the environment, not that individual's phenotype (i.e., locations with plentiful resources confer larger birth rates, regardless of the phenotype of the organism). Thus, the reproductive competitiveness of a phenotype is only determined by how that phenotype enables our model organisms to outcompete other phenotypes in discovering high-resource locations. There is no explicit fitness function that the dynamics are assumed or enforced to optimize, but we show that in the limit of low interaction density we can derive a growth rate that is a function of the relocation rate parameters and serves as an emergent fitness function. However, we also show that in general this function does not describe the most successful phenotypes when interactions between individuals are frequent, or when demographic fluctuations are significant.
This paper is organized as follows: in the Results we first define our stochastic model and its formulations as an agent-based simulation and a corresponding analytic description in terms of a stochastic master equation (Mechanistic model of evolutionary dynamics). We then show that the environmental structure and the population dynamics of the simulated model indeed select for particular phenotypes, even though birth rates depend only on spatial location and not on phenotype. Importantly, we find that while some components of the phenotype experience selection pressures strong enough to select for a very narrow range of values, others experience relatively weak selection pressure, resulting in distributions of nearly equally competitive phenotypes (Distributions of competitive phenotypes are selected by birth-death dynamics). We show that we can understand the origins of weak selection pressures using a minimal mean-field model, in which, when organism density is low, we can derive the growth rate as a function of phenotype, which acts as an effective fitness function (A minimal mean-field model explains variability in competitive phenotypes). In this limit we also study how the exploration-exploitation trade-off depends on the scarcity of high-resource sites and their corresponding birth rate (Distribution of resources shapes exploration-exploitation trade-offs). At higher density the interactions between individual conspecifics cannot be neglected and alter the dominant phenotypes: at very high densities there is a reversal in the most competitive phenotypes, and there is a range of intermediate densities at which many phenotypes coexist for evolutionarily long times (Ranges of population density allow for long-lasting phenotype coexistence).
We conclude in the Discussion by addressing the interpretation of our results in the context of mechanistic and normative approaches to the evolution of biological structure (Summary and interpretation of results) and comparing our results to those of related studies (Comparisons to related work), as well as discussing the limitations of our approach (Limitations of the study) and future directions for extensions of this work (Future directions).

Mechanistic model of evolutionary dynamics
To investigate the evolution of foraging behavioral strategies, we focus on modeling a population of organisms that must navigate their environment in search of locations favorable for reproduction. For simplicity, we will assume the favorability of each location is determined by the amount of a non-depleting "resource" at that location. Additionally, each location has a fixed carrying capacity, allowing only a single individual to occupy the location at any given moment. Organisms are able to detect the resource level only at their current locations, and must decide whether to remain where they are or explore in search of regions with more resources (see Fig 1A); this is referred to as the "exploration-exploitation trade-off" [22]. In this model the organisms' movements are not deterministic; instead, each individual attempts to relocate stochastically at rates determined by the amount of resource at the individual's current location, and the values of these rates are determined by the individual's phenotype. The resource level at each location is predetermined and sampled from a discrete distribution; see Fig 1B, depicting a discrete uniform resource distribution. The set of relocation rates (used interchangeably with the term "hopping rates") for each resource level comprises an organism's phenotype in this model; see Fig 1C.

Fig 1. A. The model consists of a population of agents foraging in an environment with a random distribution of resources, in which each agent can either asexually produce an offspring (orange arrow) or relocate to one of the vertical or horizontal neighboring locations (dashed blue arrow). Different colors distinguish agents with different phenotypes. B. The environment is a two-dimensional discrete lattice on which each location has a predetermined resource level S_k, k ∈ {1, 2, 3, 4, 5}, chosen from a distribution (a discrete uniform distribution is shown). A 20 × 20 subset of the environment is shown, with each site colored according to its resource level S_k. Agents at locations with resource level S_k reproduce at birth rate b_k. C. Each agent has a phenotype vector with 5 components, corresponding to the relocation or hopping rates of the agent at the 5 resource levels. These components are initially chosen randomly for each agent and are shaped over the course of evolution. In the example shown, agent 2 has a relocation rate γ_3 ≈ 4 when it is at a site with resource level S_3. In the agent-based stochastic model, an offspring inherits a noisy copy of its parent's phenotype, as shown for the phenotypes of agent 1 and its offspring. https://doi.org/10.1371/journal.pcbi.1009934.g001
In addition to determining an organism's relocation rate, the resource level at each site directly determines the organism's birth rate at that location. Importantly, the birth rates do not depend on the organism's phenotype: the only way the phenotype influences the organism's reproductive success is by improving the organism's odds of encountering high-resource sites and staying there long enough to produce sufficiently large numbers of offspring. That is, the selective advantage of a phenotype is due only to its dynamical success in the environment, not any intrinsic advantage. For simplicity, we model all reproduction as asexual. In the agent-based model to be defined shortly, the offspring's phenotypes are taken to be noisy copies of the parents' phenotypes (see "Agent 1" and "Agent 1 offspring" in Fig 1C). In the mean-field model discussed in the next section, the offspring's phenotypes will be direct copies of the parents'.
The final ingredient the model needs is a means of selection. Typically, one would allow for stochastic death of individuals, such that phenotypes that do not confer a sufficient competitive advantage eventually die out, leaving the more competitive phenotypes to dominate the population. However, a technical downside of stochastic death is that extinction becomes not only possible but inevitable, potentially complicating investigation of long-term phenotype distributions. To avoid these technical complications, we instead induce selection by subjecting the population to a periodically occurring "catastrophe" that randomly culls individuals until the population is reduced to its initial size. In doing so, the population is never at risk of extinction, but selection still occurs because the phenotypes that are able to produce large numbers of offspring have better odds of surviving each catastrophe.
To formalize this schematic picture, we first build an agent-based stochastic model. In this model the population consists of individual organisms, or "agents," each of which is assigned a location within a 2d environment, modeled as a discrete lattice, and the values of its hopping rates γ at each resource level. In the simulations we investigate here, we assume the resource level of the site at location x, labeled s_x, is drawn from 5 distinct resource levels, s_x ∈ {S_1, S_2, S_3, S_4, S_5}. Thus each agent has 5 corresponding hopping rates γ = (γ_1, γ_2, γ_3, γ_4, γ_5), one for each of the five resource levels. The value of each hopping rate is initially chosen randomly from the range [γ_min, γ_max], where γ_min is the minimum rate at which an organism relocates and γ_max is the fastest rate at which an organism can relocate; we interpret this upper bound on relocation rates as being set by metabolic constraints that we do not model or allow to change during the evolution of the population.
The state of the population of agents is updated in continuous time. At any given moment, an agent can either relocate to, or asexually place an offspring into, one of the vertical or horizontal neighboring locations, as schematically shown in Fig 1A (because only one agent is permitted per site, an agent cannot give birth to a child on its current site). The target site is chosen uniformly at random, with probability 1/4 in a 2d environment. The per-capita birth rate of an agent, b(s_x), is a function of the local resource level s_x at location x. Because the resources are not depleted in this model, the actual numerical values of the s_x are unimportant; the meaningful values are the birth rates b(s_x). However, we assume that the birth rates are ordered according to resource level, such that a site x with resource level s_x = S_1 has the lowest birth rate and a site with resource level s_x = S_5 has the highest birth rate. Again, we stress that the birth rates are entirely independent of an agent's phenotype γ.
Agents with very different phenotypes will navigate the environment in different ways. An agent with a small value of γ_1, for example, will tend to remain in sites of resource level S_1 for long periods of time, while agents with large values of γ_1 will tend to leave sites of this resource level rather quickly. Because the birth rate in sites of resource level S_1 is low, we might expect that agents with small γ_1 will be less successful at proliferating than agents with larger γ_1, and will therefore be less likely to survive the periodic catastrophes, creating a selection pressure toward larger values of γ_1. Similarly, agents with small (large) values of γ_5 will tend to remain in these high-resource sites for long (short) periods of time. Agents with small γ_5 may therefore produce more offspring because b(S_5) is the highest, increasing their odds of surviving the catastrophes. This line of thinking makes it seem intuitively "obvious" that the most competitive phenotype ought to have large hopping rates out of low-resource sites and minimal hopping out of the highest-resource sites. Moreover, we might expect to be able to identify an objective function that yields this phenotype as its optimum. In the next sections we present the results of our stochastic simulations as well as the numerical solutions of the mean-field model, showing that this intuition is largely correct when the population density is low (in which case we can indeed derive an effective objective function) and when the resource levels are drawn from a discrete uniform distribution, but that this picture is not always correct when high-resource sites are rare or population density is high.
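To make the dynamics above concrete, the event loop of such a model can be sketched as follows. This is a minimal illustration, not the authors' code: the lattice size, birth rates, mutation scale, iteration duration, and periodic boundaries are all hypothetical placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(1)

L = 32                  # lattice side (the full simulations use 128)
N0 = 50                 # initial population size (the full simulations use 500)
GMIN, GMAX = 0.0, 10.0  # metabolic bounds on the hopping rates
BIRTH = np.array([0.1, 0.2, 0.4, 0.8, 1.6])  # hypothetical birth rates b_1..b_5

# Predetermined resource level (0..4, standing for S_1..S_5) at every site.
resource = rng.integers(0, 5, size=(L, L))

# One agent per site: map site -> phenotype vector gamma = (g_1, ..., g_5).
agents = {}
while len(agents) < N0:
    site = (int(rng.integers(L)), int(rng.integers(L)))
    agents.setdefault(site, rng.uniform(GMIN, GMAX, size=5))

def gillespie_step(agents):
    """One event of the continuous-time dynamics: each agent carries total
    rate b(s_x) + gamma(s_x); an agent and an action are sampled accordingly."""
    sites = list(agents)
    rates = np.array([BIRTH[resource[s]] + agents[s][resource[s]] for s in sites])
    dt = rng.exponential(1.0 / rates.sum())          # waiting time to next event
    s = sites[rng.choice(len(sites), p=rates / rates.sum())]
    b, h = BIRTH[resource[s]], agents[s][resource[s]]
    dx, dy = ((1, 0), (-1, 0), (0, 1), (0, -1))[rng.integers(4)]
    target = ((s[0] + dx) % L, (s[1] + dy) % L)      # periodic boundaries (an assumption)
    if target in agents:
        return dt    # the target site is occupied: the birth or hop fails
    if rng.random() < b / (b + h):
        # birth: the offspring inherits a noisy copy of the parent phenotype
        agents[target] = np.clip(agents[s] + rng.normal(0.0, 0.1, 5), GMIN, GMAX)
    else:
        agents[target] = agents.pop(s)               # relocation
    return dt

def catastrophe(agents, n0=N0):
    """Randomly cull agents until the population is back to its initial size."""
    while len(agents) > n0:
        del agents[list(agents)[rng.integers(len(agents))]]

t = 0.0
while t < 2.0:           # one iteration of duration T = 2 (placeholder value)
    t += gillespie_step(agents)
catastrophe(agents)
```

Because deaths occur only at catastrophes and a catastrophe restores the population to exactly its initial size, selection acts purely through differential reproduction between catastrophes.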

Distributions of competitive phenotypes are selected by birth-death dynamics
For the results presented in this section, we simulate our agent-based model using the Gillespie algorithm [23], starting with 500 agents on a 128 × 128 2d lattice, corresponding to an initial population density of 3.1%. Each lattice site is assigned one of the 5 resource levels {S_1, S_2, S_3, S_4, S_5}, drawn from a discrete uniform distribution. Initially, each agent is assigned a spatial location x = (x, y) and a phenotype vector γ = (γ_1, γ_2, γ_3, γ_4, γ_5) uniformly at random. Catastrophes occur after a simulation runtime T, which defines an iteration of the simulation. While the catastrophe randomly picks which agents to cull, it always reduces the total population back to 500 agents. In Distribution of resources shapes exploration-exploitation trade-offs we consider non-uniform distributions, and in Ranges of population density allow for long-lasting phenotype coexistence we investigate different lattice sizes as a means of changing the initial population density. The details of the simulation setups are presented in the Methods. Fig 2A-2D displays the evolution of the distribution of phenotypes over several thousand catastrophes, with snapshots at (A) 0, (B) 50, (C) 500, and (D) 20000 iterations; these distributions represent averages over 20 trials of the dynamics starting from random initial conditions generated from the same distribution. The initially uniform distributions of each of the five hopping rates (phenotype components) are gradually reshaped by births and deaths. At long times, the distributions are approximately stable; we do not expect the distributions to converge to a single value for each γ_k due to the reproduction noise that accompanies each birth.
See S1A-S1C Fig. The long-time distribution appears to favor high hopping rates in sites with resource levels S_1 to S_3 and low hopping rates in sites with the highest resource level S_5, reflecting our intuition that the most competitive phenotypes would be those that result in agents leaving their location as fast as possible when at low-resource sites and staying put as long as possible when at high-resource sites. Notably, however, the distribution of the hopping rates γ_4 at the second-highest resource level S_4 is not narrow, but spans much of the entire possible range of hopping rate values. This indicates that the selection pressure on this phenotype component is relatively weak compared to the selection pressure on the other components. While the average value of γ_4 is closer to γ_max than γ_min, the large variance demonstrates that the most competitive phenotype does not conform tightly to our intuitive predictions, and the situation is more complex than might be naively expected. To gain insight into these results and investigate how selection pressure depends on our environments' properties, namely the distribution of resources and the values of the birth rates tied to each resource level, we develop a deterministic minimal mean-field model that enables us to derive an effective objective function in a low-density limit. This fitness function predicts the optimal phenotypes and explains the lack of selection pressure on γ_4 observed in our stochastic simulations.

Fig 2. E. The stochastic birth-death dynamics select out competitive phenotypes whose first three phenotype components γ_1,2,3 saturate at a hopping rate of around 9, close to the maximum possible hopping rate of γ_max = 10, and whose last phenotype component γ_5 saturates at a hopping rate around 0.5, close to the minimum possible hopping rate of γ_min = 0. In contrast, the fourth component γ_4 approaches a value around 8 with relatively large variability throughout the simulation. F. Predictions of the mean relocation rate of each phenotype component and their standard deviations using the coupled 5-state mean-field model, discussed in A minimal mean-field model explains variability in competitive phenotypes. Note that the mean-field model uses a slightly different birth rate b_5 = 5, while the stochastic simulations use b_5 = 4, and the 5-state mean-field model converges on a much shorter timescale (700 iterations vs. 20000 iterations). These quantitative differences between the full agent-based model and the simplified 5-state model are discussed in detail in the Discussion and in S5 and S6 Figs. https://doi.org/10.1371/journal.pcbi.1009934.g002

A minimal mean-field model explains variability in competitive phenotypes
Agent-based models are often difficult to formulate directly as mathematical models, owing to the combination of individuals carrying private properties (here, their phenotypes) and the fact that agents may be born or culled at any time. However, a formalism that can approximate the agent-based model outlined here is a master equation that keeps track of the probability of configurations of the number of agents with each phenotype at each spatial location. From such a formalism one can then derive a system of differential equations for the expected number of agents with each phenotype at each location. Such a model serves as the starting point for the reduced model we investigate in this section. We focus only on the reduced mean-field model here; see the Methods for details on the full master equation model and its approximate reduction to the minimal mean-field model described below.
When agent density is low, as in the simulations of the previous section (initial density of 3.1%), we may assume that events in which an agent is blocked from relocating to another site by an occupying agent are rare and may be neglected. In this approximation the growth dynamics of the different phenotypes decouple, and we can describe the dynamics of the population as a set of linear differential equations for the expected number of agents at each site. To simplify the mean-field model further, we use the fact that each site belongs to one of the five resource levels and only count the number of agents at each level, eliminating the spatial dependence of the environment. This reduces the model to a 5-state model whose solution is the time course of the expected number of agents of each phenotype at each resource level; see the Methods for a detailed derivation of this reduction. The resulting system of differential equations reads

dn_kγ/dt = w_k Σ_i b_i n_iγ + w_k Σ_i γ_i n_iγ − γ_k n_kγ,   (1)

where n_kγ is the expected number of agents at a site with resource level k and phenotype γ. The parameter b_k is the per-capita birth rate at resource level k, a constant of the environment, and γ_k is the hopping rate out of state k, determined by the phenotype. Again, we note that in this approximation the dynamics of different phenotypes are decoupled, such that there is a system of differential equations of the form Eq (1) for each phenotype γ. For practical calculations we must discretize the phenotype space, giving a finite system of differential equations. Finally, the weights w_k correspond to the probability that a site in the agent-based model has resource level S_k. For resource levels drawn from a discrete uniform distribution, which we investigate first, w_k = 1/5 for all k = 1 to 5. The first term in Eq (1), w_k Σ_i b_i n_iγ, is the number of new agents born into state k by agents in all other states. The second term, w_k Σ_i γ_i n_iγ, is the number of agents hopping into resource level k from all other resource levels, formally including resource level k itself to account for the possibility of an agent hopping from a site of resource level k to a neighboring site that also has resource level k. The last term, −γ_k n_kγ, is the number of agents hopping out of resource level k.
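The linear 5-state model of Eq (1) is simple enough to build and propagate directly. The sketch below does so with a matrix exponential; the birth rates, hopping rates, and horizon T are hypothetical illustration values, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.linalg import expm

def meanfield_matrix(gamma, b, w):
    """Rate matrix of Eq (1): dn_k/dt = w_k * sum_i (b_i + gamma_i) n_i - gamma_k n_k.
    Row k collects births and hops landing in resource level k, minus hops out of k."""
    gamma, b, w = (np.asarray(a, dtype=float) for a in (gamma, b, w))
    return np.outer(w, b + gamma) - np.diag(gamma)

b = np.array([0.1, 0.2, 0.4, 0.8, 1.6])  # hypothetical birth rates b_1..b_5
w = np.full(5, 0.2)                      # discrete uniform resource distribution
gamma = np.array([10.0, 10.0, 10.0, 8.0, 0.5])

M = meanfield_matrix(gamma, b, w)
n0 = np.ones(5)                          # expected agents per level at t = 0
nT = expm(3.0 * M) @ n0                  # expected occupancies after time T = 3
```

A quick consistency check on the structure: with all birth rates set to zero the equations only shuffle agents between levels, so the total Σ_k n_k is conserved; with positive birth rates the total grows, which is exactly what the periodic catastrophes counteract.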
The system of differential equations (1) can be written in matrix form as dn_γ(t)/dt = M_γ n_γ(t), which has a solution in terms of the matrix exponential, n_γ(t) = exp(t M_γ) n_γ(0). The between-catastrophe dynamics of each phenotype are therefore dominated by the eigenvectors and eigenvalues of M_γ. Although the dynamics of the phenotypes are independent, they are coupled through the periodic catastrophes, which reduce the population down to its original size after a time T. Let Λ_γ be the maximum eigenvalue of M_γ, which we interpret as a function of the phenotype parameters γ = (γ_1, γ_2, γ_3, γ_4, γ_5). In the Methods we show that the matrix M_γ has purely real eigenvalues and that only the maximum eigenvalue Λ_γ is positive. Therefore, the phenotype γ with the largest maximum eigenvalue Λ_γ overall will be the only phenotype to survive after a large number of catastrophes, with coexistence possible only if different phenotypes have the same value of Λ_γ. Because the hopping rates γ are just parameters in this formulation of the model, the eigenvalues of M_γ are simply functions of the phenotype parameters (as well as environmental parameters), and therefore the maximal eigenvalue serves as an effective objective function of the system in the low-density limit.
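In this low-density limit, comparing phenotypes therefore reduces to comparing maximal eigenvalues. A minimal sketch, again with hypothetical birth rates and two caricature phenotypes of our own choosing:

```python
import numpy as np

def growth_rate(gamma, b, w):
    """Effective fitness of a phenotype in the low-density limit: the maximal
    eigenvalue Lambda_gamma of M_gamma (the eigenvalues are real for this model)."""
    gamma, b, w = (np.asarray(a, dtype=float) for a in (gamma, b, w))
    M = np.outer(w, b + gamma) - np.diag(gamma)
    return np.linalg.eigvals(M).real.max()

b = np.array([0.1, 0.2, 0.4, 0.8, 1.6])   # hypothetical birth rates b_1..b_5
w = np.full(5, 0.2)                       # discrete uniform resources

explorer = [10.0, 10.0, 10.0, 10.0, 0.0]  # leave poor sites quickly, stay at S_5
homebody = [0.0, 0.0, 0.0, 0.0, 0.0]      # never relocate
print(growth_rate(explorer, b, w) > growth_rate(homebody, b, w))  # True
```

A phenotype that never relocates reproduces at the environment-averaged rate Σ_k w_k b_k, whereas the explorer concentrates in high-birth-rate sites and achieves a larger Λ_γ.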
If the maximal eigenvalue of this minimal model acts like an effective objective function, then we would expect the lack of selection pressure on the hopping rate γ_4 to be reflected in an insensitivity of this eigenvalue to varying γ_4, i.e., there would be a range of values of γ_4 for which the maximum eigenvalue approximately takes on its largest value. This is indeed what we find, but only when the birth rate in the highest-resource sites, b_5, is close to a particular critical value B_5c. As shown in S2 Fig, when b_5 > B_5c we find that the competitive population dynamics drive γ_4 → γ_max, and the eigenvalue Λ_γ is insensitive to variations in the value of γ_4 around this point. When b_5 < B_5c and is close to b_4, the birth rate in sites of resource level S_4, we instead find that selection drives γ_4 → γ_min, again with little variation in the maximal eigenvalue. This makes sense: if the birth rates b_4 and b_5 are close, there is little advantage in leaving a site with resource level S_4 to search for a site of level S_5. That is, the maximal eigenvalue predicts a transition from γ_4 → γ_min to γ_4 → γ_max as b_5 increases from b_4 and crosses B_5c. Near the transition at b_5 ≈ B_5c the maximum of the eigenvalue is less sensitive to γ_4; equivalently, fluctuations in γ_4 are large, as is often observed near phase transitions and bifurcations [24], such that selection is weak, resulting in coexistence of a range of γ_4 values near this transition, at least in a finite population. See S3A-S3C Fig for the distributions of phenotypes around this transition point. Fig 3A, upper panel, demonstrates that the predicted transition occurs in both our minimal mean-field model and our full agent-based simulations. The critical value B_5c differs between the full stochastic simulation and the mean-field model, but this is to be expected due to the simplifications of the mean-field model as well as its lack of demographic noise.
As seen in Fig 3A, upper panel, the distributions of γ_4 are indeed narrower when b_5 is far from the critical value B_5c, reflecting that the selection pressure indeed increases when there is a clear advantage to exploring (b_5 > B_5c) versus exploiting (b_5 < B_5c). In principle, we might expect to see similar transitions in the other hopping rates γ_1, γ_2, and γ_3 if we adjust the values of b_4, b_3, etc. Instead of varying these parameters to investigate whether such transitions exist, we shift focus to another ecologically important scenario in which these transitions will occur: the effect of the distribution of resources throughout the environment.

Distribution of resources shapes exploration-exploitation trade-offs
The distribution of resources throughout an environment will have an impact on the most successful phenotypes. For instance, when high-resource sites are scarce, it may be advantageous to settle for lower-resource locations. To explore this effect, we vary the distribution of resource levels within the random environment, in particular by selecting distributions in which high-resource sites are much rarer than lower-resource sites. In the previous sections, the probability that any site has resource level S_k, denoted w_k, was chosen to be w_k = 1/5, i.e., a discrete uniform distribution over the 5 resource levels. A simple choice of distribution in which high-resource sites become rarer than low-resource sites is an exponential distribution w_k = A e^{−ck}, where A^{−1} = Σ_{k=1}^{5} e^{−ck} is the normalization constant and c is a parameter that determines the steepness of the distribution. The factor w_k is the same factor that appears in Eq (1). Note that c = 0 recovers w_k = 1/5, while a positive c yields a distribution in which resource-level rarity increases with k; that is, the high resource levels become rarer than the low resource levels.
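These weights are straightforward to compute; the sketch below normalizes w_k = A e^{-ck} over the five levels.

```python
import numpy as np

def resource_weights(c, K=5):
    """Exponential resource distribution w_k = A * exp(-c k), k = 1..K.
    c = 0 recovers the discrete uniform case w_k = 1/K; c > 0 makes
    high-resource levels progressively rarer."""
    w = np.exp(-c * np.arange(1, K + 1))
    return w / w.sum()

print(resource_weights(0.0))  # uniform: all entries 0.2
print(resource_weights(0.5))  # steep: S_5 is e^2 (about 7.4) times rarer than S_1
```

The ratio w_1/w_5 = e^{4c} gives a quick feel for the steepness parameter: at c = 0.5 the highest resource level is already an order of magnitude rarer than the lowest.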
As shown in Fig 3A, lower panel, our agent-based simulations confirm that the transition from an average value of ⟨γ_4⟩ = γ_min = 0 to γ_max = 10 still exists when c is, e.g., 0.5, but occurs at a much larger value of b_5 ≈ 11. Indeed, as seen in Fig 3B, there exists a transition line in the (b_5, c) plane that separates ⟨γ_4⟩ ≈ γ_min from ⟨γ_4⟩ ≈ γ_max = 10. The orange dashed curve is the boundary predicted by evaluating the maximal eigenvalue of the low-density model, Λ_γ, as c varies from 0 to 1 and solving for the value of b_5 that yields γ_4 = (γ_max + γ_min)/2 = 5 as the most competitive phenotype. This curve represents the line along which the population achieves a balance of the exploration-exploitation costs. See S1D-S1F Fig. Intuitively, agents at low resource levels need sufficient incentive to leave their current locations and start to explore new sites, at the risk of missing out on reproductive opportunities while exploring. These results demonstrate that rarity of high-resource sites can drive selection of phenotypes that settle for sites with lower resource levels.
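The shift of the transition with c can be probed directly with the low-density eigenvalue: scan b_5 and ask whether Λ_γ favors γ_4 = γ_max ("explore") or γ_4 = γ_min ("exploit"). The sketch below uses hypothetical birth rates b_1..b_4 = 1..4 and c = 0.5, so the numerical crossing point is illustrative only and not the paper's b_5 ≈ 11.

```python
import numpy as np

def growth_rate(gamma, b, w):
    """Maximal eigenvalue of the low-density rate matrix M_gamma (Eq (1))."""
    M = np.outer(w, np.asarray(b) + np.asarray(gamma)) - np.diag(gamma)
    return np.linalg.eigvals(M).real.max()

def resource_weights(c, K=5):
    """Exponential distribution w_k proportional to exp(-c k)."""
    w = np.exp(-c * np.arange(1, K + 1))
    return w / w.sum()

w = resource_weights(0.5)                          # high-resource sites are rare
explore = np.array([10.0, 10.0, 10.0, 10.0, 0.0])  # gamma_4 = gamma_max
exploit = np.array([10.0, 10.0, 10.0, 0.0, 0.0])   # gamma_4 = gamma_min
for b5 in (4.0, 8.0, 16.0, 32.0):
    b = np.array([1.0, 2.0, 3.0, 4.0, b5])
    gap = growth_rate(explore, b, w) - growth_rate(exploit, b, w)
    print(f"b5 = {b5:5.1f}: {'explore' if gap > 0 else 'exploit'} wins")
```

At b_5 = b_4 the exploiting phenotype wins (there is nothing better to search for), while at sufficiently large b_5 the exploring phenotype wins despite the rarity of S_5 sites; the crossing between these regimes is the transition the figure traces in the (b_5, c) plane.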
Importantly, here the rarity of sites is only due to the intrinsic environmental distribution of resources. However, high resource sites could also become dynamically rare and unavailable as they become occupied, preventing multiple individuals from exploiting the same sites. So far we have been working within a low-density regime, such that it is relatively rare for organisms' movements to be blocked by other individuals, and such that a sufficient fraction of high resource sites remain available for discovery and occupation. We now turn to investigating how the competitive phenotypes change with organism density.

Ranges of population density allow for long-lasting phenotype coexistence
To investigate how population density alters the most competitive phenotype, we return to our agent-based stochastic simulations with resource levels drawn from a discrete uniform distribution. The simulations presented previously had 500 agents on a 128 × 128 2d lattice, resulting in an initial occupancy level of 3.1%. To study the effect of population density we keep the initial number of agents fixed and decrease the lattice size L × L to L = 96, 80, 64, 48, 40, 35, 32, 28, 26, 25, 24, and 23, giving initial occupancies of 5.4%, 7.8%, 12.2%, 21.7%, 31.2%, 40.8%, 48.8%, 63.8%, 74.0%, 80.0%, 86.8%, and 94.5%, respectively. One could of course also vary density by increasing the initial agent number while keeping the lattice size at 128 × 128, but increasing the number of agents dramatically increases the simulation load.
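The quoted occupancies follow directly from the fixed initial population and the lattice areas; a two-line check:

```python
# Initial occupancy N0 / L^2 for a fixed population of 500 agents
# on shrinking L x L lattices (values quoted in the text).
N0 = 500
for L in [128, 96, 80, 64, 48, 40, 35, 32, 28, 26, 25, 24, 23]:
    print(f"L = {L:3d}: {100 * N0 / L**2:.1f}% occupancy")
```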
For each density we ran 20 independent trials of the agent-based stochastic simulation, the results of which are summarized in Fig 4. We found that as density, and hence the frequency of interactions among agents, increases, the most competitive phenotypes also change. Notably, these changes qualitatively mimic the effects of increasing the rarity of high-resource sites: the mean hopping rate ⟨γ_4⟩ saturates at a value γ_min < ⟨γ_4⟩ ≈ 5 < γ_max with a large variance at 12.2% initial density, and drops to ⟨γ_4⟩ ≈ γ_min with small variance at higher densities. The results observed in our stochastic simulations are replicated by a 5-state mean-field model that incorporates the effects of site occupation (Eq (2)), in which n_{kγ} is again the number of agents at sites of resource level k and M = L² is the total number of sites, with w_k M the expected number of sites of resource level k, which sets a maximum occupancy for each state. Note that when Σ_{γ'} n_{kγ'}/(w_k M) ≪ 1 this model reduces to Eq (1). Importantly, Eq (2) is nonlinear, and unlike in its low-density approximation, Eq (1), all phenotypes are dynamically coupled. Not only does this preclude writing down an analytical solution, but the dynamical coupling between phenotypes also suggests there may be no effective objective function of a single set of phenotype parameters γ = (γ_1, ..., γ_5) that predicts the most competitive phenotype simply by maximizing it with respect to these values. As in the low-density model, we solve Eq (2) for a discretized set of phenotypes. Solving the high-density model is comparatively computationally demanding due to the necessity of solving for all phenotypes simultaneously, but is still feasible for the set of discretized phenotypes chosen in this study.

Fig 4. […] Fig 2E at 3.1% density. The initial density is varied by fixing the initial population size at 500 and decreasing the environment size L × L with L = 128, 64, 48, 32, and 24. We observe a systematic shift of the most competitive phenotypes; in particular, the fourth phenotype component γ_4 decreases as the initial density increases. E. Mean values of γ_4 after 20000 iterations in the agent-based model as a function of the initial density, showing the change in the most competitive phenotype. At ultrahigh densities, e.g. 94%, there is no space for agents to move or give birth, resulting in little change in the initially uniform phenotype distribution. F. Mean-field model predictions of the most competitive phenotypes as a function of initial density after 700 iterations. A birth rate of b_5 = 5 is used in the mean-field model so that ⟨γ_4(t)⟩ converges to 10 at low densities. https://doi.org/10.1371/journal.pcbi.1009934.g004

As shown in Fig 4F, the mean-field predictions are consistent with our full stochastic simulations, and we indeed observe changes in the most competitive phenotype as a function of initial density N_0/M. At low densities the value of ⟨γ_4⟩ is close to γ_max (after 700 catastrophes), and as N_0/M increases we see a coexistence region in which multiple phenotypes are present at an observation time T_obs, before ⟨γ_4⟩ drops to γ_min at densities N_0/M ≳ 20%. As N_0/M increases towards 1, we observe an increase in ⟨γ_4⟩. This extreme high-density limit represents the scenario in which the population is so dense that there is little room for agents to give birth or move, leading to an extremely slow change in the composition of the phenotypes. In the extreme limit N_0/M = 1 we have ⟨γ_4(t)⟩ = (γ_max + γ_min)/2 for all times t, because the phenotypes are initially present in equal proportions and there is no room for any agent to move or give birth.
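The occupancy-limited growth that drives these density effects can be sketched numerically. The snippet below integrates a toy two-phenotype analogue of the carrying-capacity factor in Eq (2) at a single class of sites; the reduced one-level form and all parameter values are illustrative assumptions, not the paper's actual equations or rates.

```python
import numpy as np

# Toy single-site-class analogue of the occupancy factor in Eq (2):
#   dn_gamma/dt = b_gamma * n_gamma * (1 - sum_gamma' n_gamma' / (w M)),
# so all phenotypes are coupled through the shared free-site fraction.
# Illustrative parameters, not the paper's values.
b = np.array([1.0, 1.2])    # effective growth rates of two phenotypes
wM = 200.0                  # expected number of sites of this resource level
n = np.array([10.0, 10.0])  # initial populations

dt = 0.01
for _ in range(5000):
    free = 1.0 - n.sum() / wM   # shared occupancy factor
    n = n + dt * b * n * free   # forward-Euler step

# The phenotypes saturate jointly: total occupancy approaches wM,
# unlike the low-density model where each phenotype grows independently.
print(n.sum())
```

Even this stripped-down version shows the qualitative point: the shared free-site fraction couples every phenotype's trajectory to all the others, so no phenotype can be solved for in isolation as in the low-density model.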
The mean-field model is useful for dissecting the dynamics of the phenotype changes. Initially, we hypothesized that the dominance of γ_4 = γ_min phenotypes at high densities is due to the S_5 resource sites filling up, leaving any γ_4 = γ_max agents that did not find an S_5 site to search fruitlessly for resource-level-5 sites instead of producing offspring, allowing the γ_4 = γ_min phenotypes to gain the reproductive advantage. However, as shown in Fig 5, it turns out that early in the evolutionary dynamics the γ_4 = γ_max agents lag behind the other phenotypes, as they leave S_4 sites more frequently to explore neighboring locations, while agents with lower γ_4 gain an advantage by tending to stay at S_4 sites longer and producing offspring at the relatively high birth rate b_4. At low densities the γ_4 = γ_max agents eventually discover S_5 sites, and their increased reproduction rate allows them to overcome their slow reproductive start. However, at higher densities the γ_4 = γ_max agents are increasingly less able to overcome the other phenotypes' head start, and are eventually outcompeted completely by the γ_4 = γ_min phenotypes at sufficiently high densities.

Fig 5. […] mean-field model; insets at 3.1% and 21.7% initial densities show a zoom-in of the initial transient dynamics from iteration 0 to 50. Initially in the evolution, γ_4 = γ_max = 10 agents show a reproductive disadvantage, intuitively because agents with larger γ_4 tend to leave S_4 resource levels (with a relatively high birth rate) more often and produce fewer offspring; later in the evolutionary process this disadvantage is overcome by γ_4 = γ_max agents at low densities, as they eventually find enough available S_5 sites. However, as density increases, other agents quickly fill up S_5 sites and leave few opportunities for γ_4 = γ_max agents to overcome that initial disadvantage. Note that b_5 = 5 is used in these numerical solutions so that ⟨γ_4(t)⟩ converges to 10 at low densities. https://doi.org/10.1371/journal.pcbi.1009934.g005
Unlike the low-density model, we cannot tractably derive an effective objective function similar to Λ_γ in the high-density regime; in particular, it is not clear whether there exists a function C(γ) of a single phenotype vector γ whose optima predict the most competitive phenotypes. We discuss this issue further in Evolution as an optimization or learning process. Even if no such cost function for the phenotypes exists, one might wonder whether the dynamics of the (expected) population sizes can be written as a dynamical system that follows the gradient of some cost function C(n), i.e., dn_{kγ}/dt = −∂C(n)/∂n_{kγ}. In evolutionary models, the negative of the cost function, −C(n), is often interpreted as fitness. However, the vector field defined by the right-hand side of Eq (2) is not conservative, and therefore cannot be written as the gradient of an objective function C(n) (see The mean-field dynamics are not gradient-based in Methods). The evolution of the population sizes therefore cannot be interpreted as an optimization problem, at least not via gradient dynamics.
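The non-conservative character of such dynamics can be checked numerically on a toy model: a vector field that is the gradient of a cost function must have a symmetric Jacobian, and a shared-occupancy coupling generically violates this. The two-phenotype logistic competition below is an illustrative stand-in for Eq (2), not the paper's actual equations; parameters are arbitrary.

```python
import numpy as np

# Toy 2-phenotype competition with a shared carrying capacity K,
# mimicking the occupancy factor (1 - sum_gamma' n_kgamma' / (w_k M)) in Eq (2).
b = np.array([1.0, 1.5])   # birth rates of the two phenotypes (illustrative)
K = 100.0                  # shared carrying capacity

def F(n):
    """Right-hand side dn/dt = b_i * n_i * (1 - (n_1 + n_2)/K)."""
    return b * n * (1.0 - n.sum() / K)

def jacobian(f, n, eps=1e-6):
    """Central-difference Jacobian J_ij = dF_i/dn_j."""
    J = np.zeros((n.size, n.size))
    for j in range(n.size):
        dn = np.zeros_like(n); dn[j] = eps
        J[:, j] = (f(n + dn) - f(n - dn)) / (2 * eps)
    return J

n0 = np.array([10.0, 30.0])
J = jacobian(F, n0)
# A gradient flow dn/dt = -grad C(n) requires a symmetric Jacobian;
# the shared competition term makes J asymmetric here.
print(np.allclose(J, J.T))  # False
```

Here ∂F_1/∂n_2 = −b_1 n_1/K while ∂F_2/∂n_1 = −b_2 n_2/K, which coincide only at fine-tuned states, so the field is not a gradient.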

Summary and interpretation of results
In this work we have used a population dynamics approach to explore the evolution of the innate foraging behaviors of a population of organisms exploring an environment with randomly distributed non-consumable resources. These organisms have limited sensory capabilities and no memory, and must take actions to either continue exploiting the resources at their current location or leave in search of locations with more resources (i.e., more favorable to reproduction). The phenotype of the organisms comprises the rates at which the agents attempt to relocate. An "intuitive" prediction, that the agents should leave low-resource sites as quickly as possible and never move from the highest-resource sites, is indeed correct when the population density is low and high-resource sites are readily available or have high enough birth rates to compensate for their scarcity. This intuitive prediction becomes incorrect, however, when the difficulty of finding high-resource sites increases, whether due to intrinsic scarcity or to occupation by other organisms in high-density populations. In particular, as the initial population density varies, we not only observe a change in the dominant phenotype at long times, but we also observe long-lived periods of phenotype coexistence.
We have used a combination of stochastic agent-based simulations and a simplified mean-field model to gain insight into the basic phenomenology of the evolutionary dynamics. In the low-density limit we derived an effective fitness function that predicts the dominant phenotype at long times, but could not identify such a function that predicts the dominant phenotypes in higher-density environments. Similarly, the resulting dynamics of the average population sizes are not interpretable as gradient-based learning dynamics, as might be obtained from a normative approach in which the organism phenotypes are predicted by optimizing a hypothetical objective function. Thus, there is no guarantee that one can in general derive a function of the phenotype parameters that predicts the optimal phenotype, or distribution of phenotypes, without running the dynamics of the model.
We discuss some of these points in detail in the following subsections, along with remarks on Comparisons to related work and Limitations of the study, and we close with a brief discussion of Future directions.
Comparison of the agent-based and mean-field models. Overall, the outcomes of the full stochastic agent-based model and the minimal 5-state mean-field model we used to gain further insight were in qualitative agreement. This is often the case in statistical physics models; for example, mean-field approximations of Ising models of magnetism correctly predict the qualitative features of the phase diagram of the model, but overestimate the critical temperature at which a transition from zero to non-zero net magnetization of the material occurs [24], similar to the overestimate of the critical birth rate for the transition shown in Fig 3. This discrepancy may also be due to the well-mixed assumption made during the reduction of the full 2d model to the 5-state model, discussed in the next paragraph. Despite some quantitative discrepancies in predicting the values of the critical birth rates B_5c(c), the low-density mean-field model accurately predicts the transition boundary from ⟨γ_4⟩ ≈ 10 to ⟨γ_4⟩ ≈ 0 as a function of the resource scarcity c, as shown in Fig 3. The primary sources of quantitative discrepancies between stochastic models and mean-field models are the neglect of fluctuations and the necessity of choosing a carrying capacity that takes continuous inputs (which gives rise to the factor 1 − Σ_{γ'} n_{kγ'}/(w_k M) in Eq (2)). In many mean-field approximations of master equation models, upon which our 5-state model is based, the variance of these fluctuations eventually invalidates the mean-field approximation at long times or when the mean population sizes are small. As discussed in Limitations of the study, we avoid these causes of mean-field breakdown by using a periodic catastrophe to reduce the population size to a fixed level, rather than stochastic death.
Additional discrepancies are caused by the reduction from the mean-field approximation for the full 2d environment to the 5-state model (see Numerical solution and analysis of the 5-state mean-field model for details). Some of the discrepancy in Fig 3 may also reflect the effects of population density, particularly in our simulations with scarce resources: the transition occurs at relatively large values of b_5 (compared to the discrete uniform distribution of resource levels), which could lead to the population growing large enough before each catastrophe that interactions between individuals are not negligible, reducing the validity of the low-density model used to predict the transition boundary in Fig 3B. The largest qualitative discrepancy between the stochastic simulations and the mean-field model is the timescale of the evolutionary dynamics. Saturation of the hopping rates in the stochastic model appears to take ~30 times longer than in the mean-field model, despite the birth rates, hopping rates, and catastrophe period parameters being nearly the same across the models. We believe this discrepancy is due to the well-mixed assumption of the mean-field model, which posits that the spatial variability of agents throughout the environment is negligible, allowing our reduction to the 5-state model; we quantify the magnitude of this approximation in S5 Fig. This conjecture is supported by direct simulations of a 5-state agent-based model, which show a convergence time similar to the mean-field model predictions (see S6 Fig). We therefore expect that a mean-field model that takes into account spatial variation of agents and resources would yield convergence times more similar to the full 2d agent-based simulations.
Metastability of phenotype coexistence. In both our stochastic simulations and mean-field model we observed long-lived coexistence of different phenotypes (Figs 4 and 5). In the low-density limit of the mean-field model we could show analytically that this coexistence is almost surely transient: only the phenotypes with the largest eigenvalues Λ_γ will survive in the limit of infinitely many catastrophes, ℓT → ∞. The exceptions in which coexistence is possible in this long-time limit are fine-tuned conditions under which multiple phenotypes have exactly the same values of Λ_γ. Therefore, outside of this special fine-tuned condition, at low density coexistence is metastable in the sense that phenotypes may coexist for long durations of time but the ultimate fixed point is survival of only a single phenotype (we remind the reader that extinction is not possible in our setup; in practice, the fate of any model with a finite number of agents and stochastic death is extinction [25]). In the stochastic agent-based model the reproduction noise prevents any one phenotype from being the only survivor, but the distribution of phenotypes will be narrow around a dominant phenotype if it exists. In contrast, long-lived coexistence is represented by a wide distribution of phenotypes that persists across a large number of catastrophes. In both the mean-field model and the agent-based simulations the timescale of this phenotype coexistence depends strongly on the value of b_5: when b_5 ≈ B_5c the distribution of γ_4 remains wide for a long time, while the distribution quickly converges when b_5 ≪ B_5c or b_5 ≫ B_5c. This long-lived phenotype coexistence is again somewhat fine-tuned, requiring b_5 to be close to B_5c.
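The divergence of the coexistence timescale near b_5 ≈ B_5c can be made concrete with a back-of-the-envelope calculation. If two phenotypes grow exponentially at rates differing by ΔΛ, and catastrophes merely rescale the total population, the abundance ratio evolves as e^{ΔΛ t}, so the time for the fitter phenotype to dominate scales as 1/ΔΛ. A sketch with illustrative numbers (not fitted to the paper's simulations):

```python
import numpy as np

# Two phenotypes with growth rates Lambda and Lambda + dLam; catastrophes
# rescale the total population, leaving the abundance ratio r(t) = exp(dLam * t).
# Starting from an equal split, the fitter phenotype reaches 99% of the
# population when r(T) = 0.99/0.01, i.e. T = ln(99) / dLam.
for dLam in [1.0, 0.1, 0.01]:
    T = np.log(99.0) / dLam
    print(f"growth-rate gap {dLam:5.2f} -> takeover time ~ {T:8.1f}")
```

As the eigenvalue gap closes (b_5 approaching B_5c), the takeover time grows without bound, which is the sense in which the wide phenotype distribution is metastable rather than stable.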
In the high-density regime (Eq (2)), on the other hand, we observed a range of densities for which multiple phenotypes coexist for long times and do not show clear signs of a trend towards a single dominant phenotype. This form of phenotype coexistence appears more robust, requiring less fine-tuning of the initial density, than the coexistence observed in the low-density regime when the birth rate b_5 is close to the critical value B_5c. However, it is difficult, if not analytically intractable, to determine whether the phenotype distribution would ultimately converge to a single dominant phenotype if run for a much longer time, or whether there is a broader range of parameter space for which coexistence occurs in the infinite-time limit. Inspection of the average hopping rate dynamics ⟨γ_k(t)⟩ in the mean-field model (Fig 4F) suggests there will be fine-tuned values of the initial density N_0/M at which two or more phenotypes may coexist in the long-time limit, but that deviations from these values result in a single surviving phenotype, rendering coexistence of phenotypes in the high-density regime similar to low-density coexistence near b_5 ≈ B_5c. However, because the mean-field model does not include reproduction noise, it is possible that reproduction noise could stabilize coexistence states, preventing any one phenotype from dominating at long enough times in the full stochastic model; the trends of ⟨γ_4(t)⟩ in Fig 4A and 4B could reflect such behavior.
Evolution as an optimization or learning process. In this work we investigated the evolution of innate foraging behaviors using evolutionary population dynamics [4], in contrast to the more common normative approaches in which evolution is assumed to act like an optimization process that extremizes some objective function subject to constraints [6,26,27]. One of our original motivations for doing so was to better understand conditions under which we might expect such normative approaches to be mimicked naturally by the population dynamics, either through the population dynamics being interpretable as gradient dynamics or by deriving an effective "fitness function" that predicts the dominant phenotype, or conditions under which typical normative approaches might need to consider "multi-agent" interactions to properly predict emergent phenotypes, a growing trend in deep reinforcement learning [28]. While normative methods have proven valuable for developing insight into biological structure in many areas of application [12,17], conditions under which evolutionary dynamics can be expected to optimize abstract quantities such as "information" have not been extensively explored, and our goal was to gain insight into such limitations with a simple population dynamics model.
We found that at low initial population densities we could effectively capture the evolution of the agent phenotypes with a mean-field model that neglects interactions between different agents, and hence different phenotypes. Effectively, agents of different phenotypes only interact through the periodic catastrophe we use in our model to drive phenotype selection. In this regime it turns out that the overall growth rates of each phenotype are independent, and the catastrophes ultimately select for the phenotype with the largest growth rate. Thus, in this regime the maximum eigenvalue of the linear dynamics serves as an effective fitness function for the population that predicts the dominant phenotype at long times. This result is non-trivial, because a priori it was not obvious that the catastrophes, which nonlinearly couple the phenotypes, would not change the predictions of the linear eigenvalues. We might therefore hypothesize that organisms that evolved under low-density conditions in which individuals were not often competing for resources or space may be more amenable to normative approaches that aim to predict the optimal phenotype through the use of objective functions of only a single representative agent. This said, in a more realistic stochastic model with asynchronous death rather than a synchronous selecting catastrophe, there is no mechanism (other than extinction or limited resources) for keeping the population size from eventually growing large enough to invalidate the low-density approximation. In an environment with a very large carrying capacity the low-density approximation may still be valid on the relevant evolutionary timescales, and the eigenvalues (growth rates) of such a mean-field model could still predict the phenotype that comprises the largest fraction of the population.
When the population density is sufficiently high that interactions between agents cannot be neglected, interactions between individuals can alter which phenotypes dominate the population at long times, or lead to long-lived coexistence, and there does not appear to be an obvious single-agent objective function that can be derived to predict the phenotype distributions observed in our simulations or mean-field model. In light of this, one might wonder whether the dynamics of the expected numbers of agents of each phenotype might follow gradient dynamics, but this is not the case for the model studied here. This suggests that a predictive objective function of a single representative phenotype's parameters may simply not exist in the interacting regime.
While the conditions under which evolution acts as an optimization process are not generally clear, and may not be valid in some circumstances [20,[29][30][31], evolution can perhaps still be interpreted as a kind of learning process. While the individual agents in our model do not possess any ability to learn, the population effectively "learns" the structure of the environment over the course of its evolution: the phenotype structure of the population after a long time depends on the structure of the environment and the dynamics of the agents. Even though there is not necessarily any well-defined fitness or objective function being optimized, the adaptation of the population to the environment displays some of the hallmarks of learning.

Comparisons to related work
Several lines of work have used population dynamics approaches based on master equation models to study the evolution of species under various ecological situations [25,32], including competition for resources [33-36], prey and predation [37-40], evolutionary arms races [41,42], and more. Our model follows this methodology, and may be viewed as a generalization of logistic population growth to a spatial environment with multiple possible phenotypes competing for the same limited resources.
Because our model considers only organisms of the same trophic level (i.e., similar organisms compete for a common pool of resources, as opposed to, e.g., predator and prey dynamics), it is similar to the context of the neutral theory of biodiversity [35,43-45]. The neutral theory of biodiversity assumes that the relative abundances of different species occupying the same environment are not necessarily due to relative evolutionary advantages or disadvantages, but potentially simply due to stochasticity of birth-death dynamics, or "luck." This idea has been a subject of debate in ecology [35], but neutral models have been successful in explaining some observed trends in experimental data, although a more complete picture of ecological dynamics is thought to require non-neutral effects [33,43]. While our model population has no intrinsic differences in terms of raw birth rates, and hence may superficially be considered neutral, different phenotypes do have emergent advantages over one another, because some phenotypes are certainly more successful than others at finding high-resource sites and producing more offspring; the model is therefore effectively not neutral.
Beyond master-equation based models, there are many other studies that take normative approaches to understanding the behaviors and structures of organisms performing similar foraging tasks. Work in this vein may take several forms of abstraction, ranging from models in which the quantities to be optimized are directly relevant quantities, such as expected numbers of offspring [5] or the expected time it takes organisms to find a stockpile of food [6,7], to more abstract "rewards" such as the information an organism has extracted from its environment [11,17,27,46,47]. In some cases these normative models consider just a single representative agent and its associated parameters that determine its dynamics over the course of a specified task, and hence its phenotype. More recent work has considered the dynamics of multiple agents performing tasks through the lens of evolutionary game theory [17,48-51], sometimes taking the limit of a continuous population density, effectively reducing the individual agent-based dynamics to a coarse-grained statistical description [27]. Many similar approaches are being developed in machine learning applications [28]. We note that the dynamics in these normative approaches are typically a single instantiation of a task, such as a single catastrophe period in our model, not a full evolutionary dynamics, though some studies do use evolution-inspired "genetic algorithms" [52], in which the parameters of a population are remixed and inherited by the next generation after a certain amount of simulation time (i.e., there are typically no mechanistic birth-death events as in the model considered in this work). Relatedly, multiple independent lines of work have begun formulating Bayesian approaches to bridge mechanistic and normative approaches to understanding biological structure [6,9,10].
As investigation of biological evolution from this multitude of approaches continues to develop, it would be interesting for future studies to directly investigate the alignment of the outcomes of agent-based birth-death dynamics and normative approaches in such multi-agent settings, especially as increases in computational efficiency allow for simulation and investigation of increasingly complex agents.

Limitations of the study
As with any model, there are many aspects of the phenomena we seek to study that cannot be fully captured by our approach or assumptions, and warrant discussion. We focus on discussing how some of the important model simplifications could potentially be relaxed, and consider possible implications for empirical data at the end of this section.
Disparity between evolutionary timescales and individual agent lifetimes. One of the most challenging aspects of modeling the evolutionary process is that the actions of individual organisms take place on timescales orders of magnitude shorter than the timescales over which significant phenotypic change occurs. This discrepancy is even more extreme for timescales over which accumulated mutations and reproduction result in radiations of new species. This separation of timescales presents an obstacle to modeling the evolution of complex behaviors and structures, motivating our choice of modeling agents with a very simple repertoire of available actions. Even relatively "simple" organisms like C. elegans are capable of much more complicated behaviors than simply relocating based on the conditions at the organism's current location [22]. While it is possible to consider models of more complex agents, for example with internal states that process environmental information, accounting for both the fast internal timescales and long evolutionary timescales in a single model is a significant challenge. Other modeling strategies for bridging these timescales will be necessary for the foreseeable future, such as the timescale separation approximation of Mann [27].
Selection by catastrophes versus stochastic death. We have implemented selection in our model through an unorthodox choice (for population dynamics models): a periodically occurring catastrophe that reduces the population down to a fixed size, rather than the more common and realistic scenario of stochastic death, in which agents randomly perish according to an asynchronous death rate (or set of death rates). That said, this method of removing agents is similar to approaches used in evolutionary game theoretic models [17,48-51]. The reasons for this modeling choice in our population dynamics approach are largely technical: in any stochastic model with a finite population size and stochastic death, the ultimate fate of the population is always extinction. While this may not be a problem in practice for simulations, as the extinction time can be quite large [25] and only a few simulations might terminate in extinction, it complicates analytic analyses. In particular, the mean-field approximation in a model with stochastic death will always predict that the extinction fixed point is unstable and the non-zero population size is stable, which is qualitatively invalidated at long times: the variance of the stochastic process grows with time, owing to the fact that eventually a large enough fluctuation will occur to drive the system to the extinction fixed point, from which it cannot escape. By only allowing agents to perish via a periodically occurring catastrophe that resets the population size to the initial number of agents, we allow our model to remain close to the mean-field regime throughout the entirety of the simulation.
One can imagine a hybrid scenario in which the catastrophes do not occur periodically, but with stochastic timing (e.g., with Poisson-distributed arrival times), which would retain the technical advantages of the periodic catastrophes while also re-introducing stochasticity to the selection; this could be interesting to investigate in future iterations of this model.

Sexual versus asexual reproduction. We considered only asexual reproduction in our model for simplicity, in which the phenotypes inherited by offspring are noisy copies of the parent's phenotype (or direct copies in the mean-field model). One could introduce sexual reproduction into the model by allowing agents at adjacent sites to mate and produce an offspring onto one of the unoccupied sites adjacent to the parent that gives birth, with the child's phenotype determined by some combination of the parents' phenotypes.
The most significant impact that adding sexual reproduction would have on our results is the complication of the low-density approximation that we used to derive the effective cost function Λ_γ. Because sexual reproduction requires interactions between individuals, a mean-field model that includes sexual reproduction is necessarily nonlinear. As a consequence, deriving an effective cost function like Λ_γ in this case is not straightforward, similar to the high-density regime studied in this work. This could suggest that single-agent normative models may be most applicable not simply to organisms that evolved in low-density conditions, but also to organisms that reproduce primarily through asexual reproduction, or perhaps organisms that reproduce via external fertilization, such as many aquatic animals [53-56]. However, a more thorough analysis of phenotype dynamics in populations with sexual selection is required to test these possibilities.
Complexity of agents and environment in population dynamics models. The organisms we have considered in this work are rather simple in terms of their ability to process sensory information and take actions: an organism can only detect the resource level at its current location and move to a neighboring location at a fixed attempt rate. The agents have no memory, nor can they change their behavioral strategies, capabilities that even "simple" organisms like C. elegans have been found to display after short periods of local exploration [22].
Even within this restricted repertoire of sensory processing and behavioral actions, we consider only relatively low-dimensional phenotype spaces, which in turn imposes restrictions on the environmental structure: keeping the phenotype space constrained to 5 components requires that we have only 5 resource levels. We could in principle have more resource levels while retaining a low-dimensional phenotype space by defining a function γ(s) parametrized by a small set of abstract parameters, but the choice of this functional form imposes additional structure, which could result in different competitive behavioral strategies for different choices.
Implementing even more complex agents will thus require even larger phenotype spaces; while this is feasible in single-agent normative approaches to evolutionary modeling, having a large number of dynamic complex organisms is extremely computationally expensive. We therefore opted for the more parsimonious choice of a simpler environmental structure with a 1-to-1 correspondence with the (directly interpretable) hopping rate parameters.
Depleting versus non-depleting resources. In this study we have considered only non-depleting resources (also called "non-destructive resources"), such that the resource levels S_k are not modified by the dynamics of the organisms in the environment. This restricts the interpretation of these resource levels to "conditions favorable for reproduction," such as temperature or shelter, or to consumables so plentiful that consumption by agents does not appreciably change the resource level of a site. Consumption of resources with replacement (e.g., continued growth of food sources or rainfall) would allow for a feedback loop between organisms and the environment, as the actions of the organisms change the environment, which in turn can alter which behavioral phenotypes are most successful at finding sites with plentiful resources. In such a scenario the dominant phenotype could be constantly changing, unless the statistics of resource consumption and agent dynamics achieve an equilibrium. Such a scenario could give rise to evolutionary dynamics similar to those observed in populations of turtle ants, in which the head shapes of soldier ants have repeatedly reappeared through evolution [20], though the specific case of soldier ants likely involves additional environmental selective pressures that change with time, not just changing resource distributions. Modeling such systems from a normative perspective would require defining a fitness landscape that changes with time. Investigating these possibilities within frameworks like ours will be an interesting avenue for future work studying the evolution of organism behaviors and sensory capabilities.
Metabolic costs of reproduction and relocation. In our model we have not explicitly implemented any metabolic penalty for giving birth or hopping to a new location. The phenomenological effects of such metabolic costs are implicit in the resource-level dependence of the birth rates and in the imposition of a maximum hopping rate γ_max. A more complex agent-based model could give each agent its own metabolic budget. Agents would acquire energy from the environment (e.g., high-resource sites would provide more energy per unit time than low-resource sites, possibly through consumption of more plentiful resources), and agents would then spend that energy to give birth (incurring a large metabolic cost) or to relocate, with the metabolic cost of moving depending on the speed (hopping rate) at which the agent attempts to move. In such a formulation the hopping rate would not need an intrinsic maximum per se, though there would be an effective maximum determined by the ability of the agents to acquire enough food to support the high metabolic costs of moving quickly. A metabolic budget could be incorporated into a master equation or mean-field model by treating the energy as an additional variable on par with the agent position and phenotype, i.e., by counting the number of agents at a particular location with a particular phenotype configuration and a specified amount of energy.
We expect that in an environment with non-consumable resources that provide agents with a fixed amount of energy per unit time, with higher rates conferred by higher resource levels, one could choose the energy rates and metabolic costs of reproduction and relocation in such a way that the results of our current study are qualitatively reproduced.
Comparison to empirical data. Owing to many of the simplifications of our model discussed above, it is difficult to directly compare our results to empirical data. Studies investigating the evolution of innate behaviors have focused on the interplay between innate behavior evolution and learning within an individual's lifetime [57-59], a relationship that is difficult to study in this sort of population-dynamics model due to the disparate timescales of evolution and within-life learning.
The most promising means of comparing our model to empirical data would be to design and perform an experiment that closely mimics the set-up of the model. For instance, one could imagine an experiment with a population of organisms that a) are capable of self-propelled motion, b) have some means of detecting the local nutrient level, temperature, etc., c) can be genetically modified to create mutants representing an initial distribution of behavioral phenotypes, and d) have relatively short reproduction times. These organisms would be placed into a controlled environment in which nutrients or reproductively favorable conditions are randomly distributed; consumable resources would need to be replenished at rates comparable to the rate of uptake by the organisms. The organisms would be given time to explore their environment and discover locations of high nutrient content, after which they would be culled back to the initial population size, mimicking the catastrophe in our model. By using tracking software to monitor the organisms' relocation rates out of each nutrient level, one could observe, over a large number of cullings, whether the distribution of behavioral phenotypes changes over the course of the experiment.
That said, implementing such an experiment may not be practical for the foreseeable future, and it would likely be limited to single-celled organisms. A more promising avenue might be to compare the qualitative predictions of future extensions of this work to experiments on more complex organisms. For example, experimental studies have observed selection-induced genetic drift in a mixed population of wild-type mice and mice with a mutation that disrupts their circadian rhythm [60]. One could model such an experiment by incorporating some of the more complex features discussed above into a formalism like the one presented in this work, with additional modifications to introduce a day/night cycle that interacts with agents' internal circadian rhythms. It would be of great interest to model similar scenarios that investigate the relative reproductive success of organisms with different perturbations to their sensory modalities, to see if the outcomes can be assessed using the various normative predictions commonly used in neuroscience.

Future directions
In this work we have focused on agents with relatively simple sensory capabilities and behaviors in order to compare agent-based simulations with mean-field approximations and to calculate an effective fitness function in a tractable limit of the mean-field model. Two main avenues for future investigation that we have not attempted to elucidate here are i) performing more advanced mathematical analyses of master equation models, for example by investigating in more detail the effect of stochastic fluctuations on the stability of phenotype coexistence, and ii) pushing the boundaries of agent and environment complexity, along with implementing other, more realistic features of evolutionary dynamics.
Investigating the effects of stochasticity requires tools that go beyond the mean-field models introduced here. Several such tools exist in the mathematical and statistical physics literature, such as the Wentzel-Kramers-Brillouin (WKB) approximation [25,32] or field-theoretic formulations of the master equation model [37-39,61]. Because these approaches explicitly deal with beyond-mean-field effects, it would be more natural to use stochastic death in this setting, rather than the periodic catastrophe introduced in this work to control the efficacy of the mean-field approximation. For example, the WKB approximation could allow for a direct calculation of the large deviation functions determining the extinction transition. Such a large deviation function may serve as an effective cost function of the population, potentially allowing for the derivation of such a function even in the high-density regime [61].
In terms of increasing organism complexity, a long-term goal of this line of work would be to endow agents with "brains": small neural networks that take in sensory information from the environment and determine how the agent will attempt to move. In such a context evolution would act on the structure of the brain, similar to the case of Braitenberg vehicles [62,63] (as opposed to determining the synaptic weights in neural network models such as in [64]). In practice, this will be a challenging direction due to the large phenotype space and the separation of timescales involved in sensory processing and evolution. A more tractable approach would be to endow the organisms with a simple "receptive field" that detects sensory information in the immediate environment, or even a modest modification of the current model that allows agents to detect resource levels at neighboring sites in addition to their current location. Straightforward increases in environmental complexity include adding spatial correlations between resource levels at different locations, as well as introducing sensory signals distributed through the environment that are correlated with resource levels the agents cannot detect directly. Modeling the impact of consumable resources is also of ecological importance, and could be implemented in our models, e.g., by introducing a non-motile "species" at each site that is consumed by agents and replenished at a fixed rate. In this way one could also introduce more realistic metabolic limitations: actions such as giving birth or relocating could incur energy costs that must be replenished by consuming food in the environment. This is one way we could formally remove the artificial restriction of a maximum hopping rate γ_max, allowing the organisms to move as fast as possible between sites if they can acquire the necessary energy to maintain such a strategy.
Finally, one of the goals of this work was to attempt to derive functions that could serve as effective cost functions in normative approaches to modeling the biological structures and behavioral strategies that emerge from the more fundamental evolutionary dynamics. It would be interesting to perform a more thorough comparison of the predictions of birth-death dynamics and normative approaches to identify regimes where the predictions diverge, especially in approaches in which the objective functions are abstract quantities such as information-theoretic quantities related to random variables in an organism's environment [9][10][11]. Doing so will provide valuable insight into the relationships between fundamental evolutionary processes and the more parsimonious normative approaches useful for modeling complex organisms.

Details of the agent-based stochastic simulations
In the agent-based stochastic simulation, we use an L × L lattice with periodic boundary conditions, with a standard value of L = 128. The resource level at each location is drawn from a discrete uniform distribution with 5 possible levels, labeled S_1, S_2, S_3, S_4, and S_5. Each agent has a phenotype vector γ = (γ_1, γ_2, γ_3, γ_4, γ_5), where the i-th component is the hopping, or relocation, rate when the agent is at a site with resource level S_i. The birth rate of agents at spatial coordinate x within the environment is a function of the resource level at that location, b(s_x) = b(S_i) ≡ b_i if s_x = S_i. Initially, 500 agents are introduced to the system, each with a random initial location and a random initial phenotype vector whose components are drawn uniformly between γ_min = 0 and γ_max = 10. These bounds reflect the facts that the relocation rate cannot be negative and that metabolic costs constrain agents from relocating too frequently. An offspring inherits its parent's phenotype vector with Gaussian noise added to each component; the noise follows 𝒩(0, σ²I_5), with σ = 0.05 (arbitrary units) in all the stochastic simulations throughout this study. If the noise would push a relocation-rate component outside the range [γ_min, γ_max], the component is set to the closest boundary value. Each location can hold at most 1 agent at any given time, so in practice reproduction must place the offspring at one of the vertically or horizontally neighboring locations; if the agent attempts to give birth into an occupied site, the reproduction attempt is considered unsuccessful. Similarly, hopping may only occur from an agent's position to one of the four adjacent sites, with the hop being unsuccessful if the target site is occupied.
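The inheritance step described above (Gaussian mutation of the parent's phenotype with clipping back into [γ_min, γ_max]) can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' code; the function name `mutate` and the seeded generator are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

GAMMA_MIN, GAMMA_MAX = 0.0, 10.0  # bounds on each relocation-rate component
SIGMA = 0.05                      # mutation noise scale used in the text

def mutate(parent_phenotype):
    """Offspring phenotype: parent's vector plus N(0, sigma^2 I_5) noise,
    with each component clipped back into [GAMMA_MIN, GAMMA_MAX]."""
    noise = rng.normal(0.0, SIGMA, size=parent_phenotype.shape)
    return np.clip(parent_phenotype + noise, GAMMA_MIN, GAMMA_MAX)

parent = rng.uniform(GAMMA_MIN, GAMMA_MAX, size=5)  # random initial phenotype
child = mutate(parent)
```

Because clipping only ever moves a component back toward the allowed range, the offspring never deviates from its parent by more than the raw noise amplitude.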
Importantly, a system-wide random culling is applied periodically, after each duration of T = 0.3 (arbitrary time units), which reduces the number of agents back to its initial value. This periodic catastrophe removes agents at random, without regard to phenotype or location, and functions as the selection pressure in the evolutionary dynamics. We expect that the most competitive phenotypes will be selected after a series of growth and catastrophe events. A full iteration of the simulation thus consists of stochastic growth and relocation, followed by a random culling at the end. The next iteration starts with the surviving agents from the previous iteration as the new initial condition.
As the carrying capacity of each location is set to 1, the 128 × 128 two-dimensional lattice can hold at most 16384 agents. For a typical run, in 0.3 units of simulation time the number of agents grows from 500 to around 850 before the random culling brings the number of agents back to 500. The initial density of agents is defined as the initial number of agents divided by the total possible number of agents in the system, which is 500/16384 ≈ 3.1% for this standard parameter set. When investigating the high-density regime, we kept the initial number of agents at 500 and reduced the lattice size from 128 × 128 to smaller sizes to achieve higher initial densities.
To reduce the variability inherent to stochastic simulations, we ran 20 independent simulations with the same set of model parameters and resource background for all the agent-based stochastic simulations in this study. The phenotype components averaged across all trials and their standard deviations are plotted as the solid lines and shaded areas in Figs 2E and 4A-4D.
The stochastic agent-based model was simulated in the Python programming language using a direct-method Gillespie algorithm [23]. The Gillespie algorithm computes the total rate of the possible events that may take place in the system to determine when the next "action" occurs and what it is; here "action" refers to either birth or relocation of an agent. The total rate is the sum of the birth and relocation rates of all the agents in the system; the algorithm first determines the time of the next action and then probabilistically chooses one of the available actions based on the relative rate of each action compared to the total rate. That is, at each step, the time of the next birth or relocation event is drawn from an exponential distribution with mean 1/(total rate). The probability that the next event is agent A giving birth or relocating is A's birth rate divided by the total rate, or A's relocation rate divided by the total rate, respectively. The target location of the birth or relocation event is then randomly chosen from one of the four neighboring locations of the agent's current position. If the chosen target location is already occupied, the birth or relocation attempt is considered unsuccessful. Note that in this implementation of the algorithm, hopping or birth from site x to site y is a two-step process: the rates entering the total rate are just b(s_x) and γ_x, and if one of these events is chosen to occur the target site is drawn with probability 1/2d = 1/4 in our two-dimensional environment. Alternatively, we could have implemented these dynamics via a one-step process in which the birth and hopping rates from site x to each possible target site y are included as separate rates in the total rate (all of which happen to have the same numerical values). In this second implementation the birth and hopping rates must be scaled down by the number of target sites (2d = 4) in order to match the results of the two-step implementation.
This technical detail is important when developing the mean-field model, as the one-step implementation corresponds more directly to the master equation formalism we introduce in Master equation model of the stochastic dynamics and derivation of the mean-field approximation, and scaling the birth and hopping rates is required to make the derived mean-field model match the agent-based simulations.
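The two-step direct-method update described above can be sketched as follows. This is a minimal illustration rather than the authors' code; the function names and array layout are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def gillespie_step(birth_rates, hop_rates):
    """Step one of the two-step implementation: draw the waiting time from
    an exponential with mean 1/(total rate), then pick which agent acts and
    whether it gives birth or hops, in proportion to its rate.

    birth_rates, hop_rates: per-agent rate arrays (one entry per agent).
    Returns (dt, agent_index, event) with event either 'birth' or 'hop'."""
    rates = np.concatenate([birth_rates, hop_rates])
    total = rates.sum()
    dt = rng.exponential(1.0 / total)              # time to next event
    idx = rng.choice(rates.size, p=rates / total)  # which event fires
    n = birth_rates.size
    return (dt, idx, "birth") if idx < n else (dt, idx - n, "hop")

def choose_target(x, y, L):
    """Step two: pick one of the 2d = 4 lattice neighbors uniformly
    (periodic boundary conditions). If the site is occupied, the caller
    treats the attempt as unsuccessful."""
    dx, dy = [(1, 0), (-1, 0), (0, 1), (0, -1)][rng.integers(4)]
    return (x + dx) % L, (y + dy) % L

dt, agent, event = gillespie_step(np.array([1.0, 2.0]), np.array([3.0]))
target = choose_target(0, 0, 128)
```

The one-step variant would instead append four copies of each rate (one per target) to `rates`, scaled down by 1/4, and skip the separate target draw.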
Multiple Python packages were used for the stochastic simulations, mean-field model analyses, and data visualization in this study, most notably NumPy [65], SciPy [66], and Matplotlib [67]. In the stochastic simulations, each stochastic trajectory was run in series on a single CPU core. A computing server with two Intel Xeon E5-2690 v4 CPUs (14 cores each) was used to generate 20 stochastic trajectories for the same set of parameters in parallel with the aid of the Python multiprocessing module. Typically, each run takes approximately 35 minutes for a system starting with 500 agents and b_5 = 4 in Fig 2. Higher b_5 leads to more agents being produced, resulting in a higher simulation load, so simulations with higher b_5 values take longer to run. For example, the simulations with b_5 = 16 depicted in Fig 3 took over an hour to run.

Master equation model of the stochastic dynamics and derivation of the mean-field approximation
Full model in a d-dimensional environment with random resources. Formalizing agent-based stochastic simulations as mathematical models amenable to tractable analysis is often difficult [68]. Instead of keeping track of every agent individually, an equivalent description that lends itself to a standard class of mathematical models is one in which we only keep track of the number of agents with each phenotype at each spatial location in the environment. In order to write down such a model we need to make a few simplifications: 1. We discretize the phenotype space γ so that there are a finite number of discrete phenotypes and the number of agents with each phenotype can be enumerated. In principle, we could work with a continuous phenotype space and define a density of agents with each phenotype, but for practical numerical calculations using the mean-field model it is necessary to discretize the phenotypes.
2. We omit reproduction noise. In the mean-field model we will formally include all possible phenotypes in the initial conditions, so we do not need reproduction noise to introduce phenotypes that were not initially present, as is the case in the agent-based simulations. Now, let n_{x,γ} be the number of agents at location x with a particular phenotype γ (i.e., a particular choice of values for each of the relocation-rate components) and denote by n the collection of all of these variables as a single vector. The probability that the state of the system has a particular configuration n at time t, p(t, n), is the solution to the system of differential equations

  ∂p(t, n)/∂t = Σ_x Σ_{y∈V(x)} Σ_γ [ b(s_y) n_{y,γ} f_x(n − ê_{x,γ}) p(t, n − ê_{x,γ}) + γ_y (n_{y,γ} + 1) f_x(n − ê_{x,γ} + ê_{y,γ}) p(t, n − ê_{x,γ} + ê_{y,γ}) − (b(s_y) + γ_y) n_{y,γ} f_x(n) p(t, n) ].    (3)

We have suppressed the conditional dependence of p(t, n) on the configuration of resource levels s_x. The notation ê_{x,γ} represents a vector with a 1 at the element labeled by (x, γ) and 0 elsewhere. The bare birth rate per capita at a site x is b(s_x), explicitly denoting that the birth rate depends only on the resource level of the current location. We assume that each site has a maximum occupancy of a single agent, so all births occur via an agent at some site y birthing a new agent into an adjacent site x. The bare hopping rate out of a site y is denoted γ_y; its value is equal to the hopping agent's relocation-rate component at resource level s_y. Both birth rates and hopping rates into a site x are modulated by a factor f_x(n), which is 0 if site x is occupied and 1 if unoccupied. The possible target sites for hopping or birth by an agent at site x are identified by the notation y ∈ V(x), meaning that site y is in the "vicinity" of x. By assumption, x ∉ V(x). For our two-dimensional simulations, V(x) consists of the nearest-neighbor sites y = x ± (1, 0)^T and y = x ± (0, 1)^T.
The initial condition p(0, n) represents the initial configuration of the system. In many applications the initial condition is deterministic, but one can choose an initial distribution as well. In particular, we can ensure every phenotype is represented by choosing an initial distribution of N_0 agents and assigning each agent's phenotype and location by drawing from the possible phenotypes and sites with uniform probability. This eliminates the need for reproduction noise in this model.
The dynamics of Eq (3) run from t = 0 up to t = T, at which point the catastrophe occurs. N_0 agents are randomly selected from the agents alive at time t = T and retain their spatial positions and phenotypes as we eliminate the remaining agents from the environment. The distribution of the selected N_0 agents forms the initial distribution for the next iteration. This procedure repeats indefinitely, but we expect the distributions of the total number of agents of each phenotype to converge after a large number of catastrophes, though it may not be the case that the full spatial distributions p(0, n) and p(T, n) converge.
By summing Eq (3) over all possible configurations n, changing dummy indices, and using the fact that p(t, n) = 0 if any n x,γ < 0, one can show that the right hand side vanishes, as it must in order for probability to be conserved.
In practice, solving probability flow equations such as Eq (3) is intractable except for very simple cases. Moreover, implementing the catastrophe at this level of description is challenging. A more common approach that lends itself to analytical and numerical analyses is to instead derive a system of equations for the moments of the distribution and make a mean-field approximation to close the system of equations (as moments are normally coupled to higher-order moments). For example, the "mean-field approximation" corresponds to deriving a system of equations for the first moments of the distribution and assuming all higher-order cumulants vanish, such that averages over functions of n can be replaced by those functions evaluated at the average ⟨n⟩. For Eq (3) the mean-field system of equations is

  d⟨n_{x,γ}⟩/dt = Σ_{y∈V(x)} [ f_x(⟨n⟩) (b(s_y) + γ_y) ⟨n_{y,γ}⟩ − f_y(⟨n⟩) γ_x ⟨n_{x,γ}⟩ ],    (4)

where we employed the mean-field approximation to replace products such as ⟨f_x(n) n_{x,γ}⟩ with the factored moments f_x(⟨n⟩)⟨n_{x,γ}⟩. Note that in order for this approximation to work with our modulation function f_x(n) we must specify the values that f_x takes for continuous-valued inputs, as ⟨n⟩ will not be an integer. In this work we make the choice

  f_x(⟨n⟩) = 1 − Σ_{γ′} ⟨n_{x,γ′}⟩,

which satisfies the desired boundary conditions of f_x = 1 when the site is unoccupied and 0 when there is an agent at the site. This choice amounts to the assumption that ⟨f_x(n) n_{x,γ}⟩/⟨n_{x,γ}⟩ varies approximately linearly with Σ_{γ′} ⟨n_{x,γ′}⟩. Although it appears that we should also impose a threshold to prevent this choice of f_x from becoming negative, this is unnecessary in practice as long as the initial conditions respect the occupancy limits of the system, as once f_x reaches 0 it blocks any further increases in occupancy that would drive it negative. This is true in both the full master equation model and the mean-field model (Eq 4). These mean-field equations are still challenging to analyze due to the spatial variability of the system.
We simplify our analysis further by deriving an approximate system for the total number of agents at sites of each resource level, detailed in the next section.
Reduction to a 5-state mean-field model. Rather than keeping track of the expected number of agents of each phenotype at every location in the environment, we will instead focus only on the number of agents of each phenotype at sites of each resource level S_k. The total expected number of agents of phenotype γ at resource level k is

  ⟨n_{k,γ}⟩ = Σ_x δ(s_x, S_k) ⟨n_{x,γ}⟩,

where δ(s_x, S_k) is an indicator function that is 1 if s_x = S_k and 0 otherwise. We will denote spatial sites by bold x and y indices, while states (resource levels) will be denoted by italic k and i indices.
We can derive the system of equations for ⟨n_{k,γ}⟩ by multiplying Eq (4) by δ(s_x, S_k) and summing over x. The restricted sum over y ∈ V(x) poses a complication for this approach and will require an approximation, to be discussed below. We detail the calculation for each term of Eq (4) in turn.
The left-hand side of Eq (4) is trivial, and simply gives d⟨n_{k,γ}⟩/dt, so we first consider the birth-rate term Σ_x Σ_{y∈V(x)} δ(s_x, S_k) f_x(⟨n⟩) b(s_y) ⟨n_{y,γ}⟩. Note that we may write b(s_y) = Σ_i δ(s_y, S_i) b_i, which essentially just says that b(s_y) = b_i if s_y = S_i. The sum over y depends on the current value of x, and so cannot be easily evaluated. One way to circumvent this problem would be to make the extreme approximation of allowing the target sites to include not just nearest neighbors but all sites in the environment (a usage sometimes used synonymously with "mean-field approximation," owing to the fact that the mean-field approximation becomes exact on all-to-all graphs in many models). A slightly less drastic approximation is to assume that the environment is "well-mixed," such that the sum over a local set of sites is just a scaled-down version of the sum over all sites.
Specifically, we will assume

  Σ_{y∈V(x)} A_y ≈ (2d/M) Σ_y A_y,

where the sum on the right-hand side is over all sites y and A_y is a general observable. The ratio 2d/M is the number of target sites (nearest neighbors) in d dimensions divided by the total number of sites in the environment. For A_y = δ(s_y, S_i)⟨n_{y,γ}⟩ we therefore obtain

  Σ_{y∈V(x)} δ(s_y, S_i) ⟨n_{y,γ}⟩ ≈ (2d/M) Σ_y δ(s_y, S_i) ⟨n_{y,γ}⟩ = (2d/M) ⟨n_{i,γ}⟩.

The remaining sum over x involves only f_x(⟨n⟩), and for our choice of f it gives Σ_x δ(s_x, S_k) f_x(⟨n⟩) = w_k M − Σ_{γ′}⟨n_{k,γ′}⟩, where Σ_x δ(s_x, S_k) = w_k M is the expected number of sites of resource level k, with M the total number of sites and w_k the frequency of resource level k. We assume the number of sites is large enough that the actual number of sites of each level is equal to its expected value. Thus, the birth-rate term evaluates to

  Σ_x Σ_{y∈V(x)} δ(s_x, S_k) f_x(⟨n⟩) b(s_y) ⟨n_{y,γ}⟩ ≈ 2d w_k (1 − Σ_{γ′}⟨n_{k,γ′}⟩/(w_k M)) Σ_i b_i ⟨n_{i,γ}⟩.

Next we consider the first hopping-rate term. For this term it will be useful to use the identity γ_y = Σ_i δ(s_y, S_i) γ_i, which, like the similar identity for the birth rates, just says that γ_y = γ_i if s_y = S_i. Then

  Σ_x Σ_{y∈V(x)} δ(s_x, S_k) f_x(⟨n⟩) γ_y ⟨n_{y,γ}⟩ ≈ 2d w_k (1 − Σ_{γ′}⟨n_{k,γ′}⟩/(w_k M)) Σ_i γ_i ⟨n_{i,γ}⟩,

where we used the same approximations for the sums over y and x as in the approximation of the birth-rate term.
For the final term we have

  Σ_x Σ_{y∈V(x)} δ(s_x, S_k) f_y(⟨n⟩) γ_x ⟨n_{x,γ}⟩ = γ_k Σ_x δ(s_x, S_k) ⟨n_{x,γ}⟩ Σ_{y∈V(x)} f_y(⟨n⟩) ≈ (2d/M) γ_k ⟨n_{k,γ}⟩ Σ_y f_y(⟨n⟩),

where the equality in the first line follows from the fact that δ(s_x, S_k) γ_x = δ(s_x, S_k) γ_k, and we used our approximation of the sum over nearest-neighbor sites as a fraction of the sum over all sites in going to the second line. For our choice of f the sum over y is

  Σ_y f_y(⟨n⟩) = M − Σ_y Σ_{γ′} ⟨n_{y,γ′}⟩ = M (1 − Σ_{i,γ′} ⟨n_{i,γ′}⟩/M),

where we used the fact that Σ_x ⟨n_{x,γ}⟩ = Σ_i ⟨n_{i,γ}⟩ (the total number of agents obtained by summing over all sites must equal that obtained by summing over all states) and that the frequencies satisfy Σ_i w_i = 1. The final term of the mean-field equation is thus

  2d γ_k ⟨n_{k,γ}⟩ (1 − Σ_{i,γ′} ⟨n_{i,γ′}⟩/M).

Putting everything together, the system of differential equations for our 5-state model is

  d⟨n_{k,γ}⟩/dt = 2d w_k (1 − Σ_{γ′}⟨n_{k,γ′}⟩/(w_k M)) Σ_i (b_i + γ_i) ⟨n_{i,γ}⟩ − 2d γ_k ⟨n_{k,γ}⟩ (1 − Σ_{i,γ′}⟨n_{i,γ′}⟩/M).    (7)

This is nearly our final result for the mean-field model. One additional correction is necessary to align it with our agent-based simulations.
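A 5-state system of this structure can be integrated numerically. Below is a minimal sketch using SciPy's `solve_ivp` for a handful of competing phenotypes; all parameter values are hypothetical choices for illustration, and the overall factor of 2d is kept explicit:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters: 5 resource levels, two competing phenotypes.
TWO_D = 4                      # number of target sites, 2d in d = 2
M = 128 * 128                  # total number of lattice sites
w = np.full(5, 0.2)            # resource-level frequencies w_k (uniform)
b = np.array([1.0, 1.0, 1.0, 1.0, 4.0])        # birth rates b_i per level
gammas = np.array([[5.0, 5.0, 5.0, 5.0, 0.1],  # "search, then sit on S5"
                   [10.0, 10.0, 10.0, 10.0, 10.0]])  # "always move fast"

def rhs(t, n_flat):
    """d<n_{k,gamma}>/dt for a 5-state mean-field model of this form."""
    n = n_flat.reshape(len(gammas), 5)   # n[g, k]: phenotype g at level k
    n_k = n.sum(axis=0)                  # expected occupancy per level
    N = n.sum()                          # total expected population
    dn = np.empty_like(n)
    for g, gam in enumerate(gammas):
        influx = w * (1.0 - n_k / (w * M)) * ((b + gam) @ n[g])
        outflux = gam * n[g] * (1.0 - N / M)
        dn[g] = TWO_D * (influx - outflux)
    return dn.ravel()

n0 = np.full((len(gammas), 5), 50.0).ravel()   # 500 agents, split evenly
sol = solve_ivp(rhs, (0.0, 0.3), n0, rtol=1e-8)
final_population = sol.y[:, -1].sum()
```

Since the model contains no death terms, the total population can only grow between catastrophes while the occupancies remain below the carrying capacities.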
Conditional-event correction and the low-density approximation. The 5-state model Eq (7) describes the mean-field approximation to Eq (3), which would correspond to a direct Gillespie simulation of the master Eq (3) that treats all of the possible events at any moment on equal footing when calculating the total rate to determine the next event. For example, birth from site x into x ± (0, 1)^T and birth from x into x ± (1, 0)^T are all treated as separate events, even though they have the same rate (when the target sites are all unoccupied). The same is true for the different hopping directions.
However, as detailed in Details of the agent-based stochastic simulations, our implementation of the agent dynamics uses the fact that the birth and hopping rates only depend on an agent's current location, not its target location, to calculate the total Gillespie rate using only a single instance of the birth or hopping rates. Then, conditional on a birth or hopping event being determined to occur, the target site is chosen randomly from one of the possible targets with equal probability.
These two implementations of the Gillespie dynamics are not equivalent, but they are simply related: the conditional birth/hopping implementation effectively reduces the birth and hopping rates by a factor of the number of target sites. On a d-dimensional lattice this means we should rescale the rates b_i → b_i/2d and γ_i → γ_i/2d in Eq (7), yielding the mean-field dynamics that correspond to our implementation of the stochastic simulations,

  d⟨n_{k,γ}⟩/dt = w_k (1 − Σ_{γ′}⟨n_{k,γ′}⟩/(w_k M)) Σ_i (b_i + γ_i) ⟨n_{i,γ}⟩ − γ_k ⟨n_{k,γ}⟩ (1 − Σ_{i,γ′}⟨n_{i,γ′}⟩/M).    (8)

The mean-field equations given in the main text follow from this equation. In particular, the low-density approximation follows from this system of equations by assuming that Σ_{γ′}⟨n_{k,γ′}⟩ ≪ w_k M for each k, which yields the linear system

  d⟨n_{k,γ}⟩/dt = w_k Σ_i (b_i + γ_i) ⟨n_{i,γ}⟩ − γ_k ⟨n_{k,γ}⟩    (9)

for each phenotype γ. As this is a linear equation, we may also write it in matrix form,

  d⟨n_γ⟩/dt = M_γ ⟨n_γ⟩,

where ⟨n_γ⟩ is a 5 × 1 vector of the expected number of agents of phenotype γ at each of the 5 resource levels and M_γ is the transition matrix involving the birth and hopping rates, (M_γ)_{ki} = w_k (b_i + γ_i) − γ_k δ_{ki}. This equation can be formally solved in terms of the matrix exponential: ⟨n_γ(t)⟩ = exp(t M_γ) ⟨n_γ(0)⟩, for each γ. Note that although the phenotypes have become dynamically independent in this low-density approximation, the catastrophe still couples them together, as the fraction by which the population size is reduced depends on all of the phenotypes. We give our mathematical analysis of the effect of catastrophes in the next section.
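In the low-density limit, the growth rate Λ_γ can be computed directly from the matrix M_γ. The sketch below assumes the transition matrix takes the form (M_γ)_{ki} = w_k (b_i + γ_i) − γ_k δ_{ki}, consistent with the linearized dynamics described above; the parameter values are hypothetical:

```python
import numpy as np
from scipy.linalg import expm

w = np.full(5, 0.2)                        # resource-level frequencies
b = np.array([1.0, 1.0, 1.0, 1.0, 4.0])    # birth rates per level

def transition_matrix(gamma):
    """(M_gamma)_{ki} = w_k (b_i + gamma_i) - gamma_k delta_{ki}."""
    return np.outer(w, b + gamma) - np.diag(gamma)

def growth_rate(gamma):
    """Lambda_gamma: the largest (real part of an) eigenvalue of M_gamma."""
    return float(np.max(np.linalg.eigvals(transition_matrix(gamma)).real))

gamma_sit = np.array([5.0, 5.0, 5.0, 5.0, 0.0])  # search, then sit on S5
Lam = growth_rate(gamma_sit)

# Formal solution <n_gamma(t)> = exp(t M_gamma) <n_gamma(0)>:
n0 = np.ones(5)
nT = expm(0.3 * transition_matrix(gamma_sit)) @ n0
```

Since M_γ has non-negative off-diagonal entries, its largest eigenvalue is real and its eigenvector can be taken component-wise non-negative, as required for a population.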
Catastrophes in the mean-field model. We introduce catastrophes into the dynamics of Eq (8) by running the mean-field dynamics up to time t = T, at which point the catastrophe occurs and we reset the population to its initial size N_0 by uniformly reducing the size of each phenotype:

  n_γ^{(ℓ+1)}(0) = N_0 n_γ^{(ℓ)}(T) / Σ_{i,γ′} n_{i,γ′}^{(ℓ)}(T),    (10)

where ℓ indexes the number of catastrophes that have occurred and we have dropped the angled brackets for convenience. Though this equation holds at any density, in the low-density limit we can formally solve Eq (9) and use this iterative map to derive the effective objective function Λ_γ, the growth rate of the phenotypes, as we show below. At low density the most competitive phenotype has the largest maximum eigenvalue. We now prove that the most competitive phenotype in the reduced mean-field model is that whose hopping rates maximize the largest eigenvalue of M_γ, denoted Λ_γ.
Our proof begins by using the explicit solution of Eq (9) in terms of the matrix exponential, together with the reset condition Eq (10), to relate the population sizes just before successive catastrophes by a system of iterative maps:

  n_γ^{(ℓ+1)}(T) = exp(T M_γ) N_0 n_γ^{(ℓ)}(T) / Σ_{i,γ′} n_{i,γ′}^{(ℓ)}(T),

where n_γ^{(ℓ)}(T) is the vector of population sizes at the end of the ℓ-th iteration (i.e., just prior to the ℓ-th catastrophe). As ℓ → ∞ this iterative map may converge to a fixed point, n*_γ(T), which is the stable distribution of the population in this dynamical system. The fixed-point equation may be rewritten for each phenotype γ as

  [ (Σ_{i,γ′} n*_{i,γ′}(T) / N_0) I − exp(T M_γ) ] n*_γ(T) = 0,

where I is the identity matrix. We expect the trivial solution n*_γ(T) = 0 to be obtained for phenotypes that are outcompeted by other phenotypes, such that these phenotypes comprise smaller and smaller fractions of the total population, eventually vanishing as ℓ → ∞.
Because extinction is not possible in this model, at least one phenotype must achieve a nonzero population size at the fixed point. For a non-trivial fixed point to exist, the determinant of (∑_{i,γ′} n*_{iγ′}(T)/N_0) I − exp(TM_γ) must vanish for some phenotypes γ. This implies that ∑_{i,γ′} n*_{iγ′}(T)/N_0 is an eigenvalue of the matrix exp(TM_γ) and n*_γ(T) is the corresponding eigenvector. The eigenvalues and eigenvectors of exp(TM_γ) are related to the eigenvalues λ_γ and eigenvectors of the much simpler M_γ, which can be efficiently computed [69]. As empirically observed in Fig 6, only one of the eigenvalues of M_γ is positive; the rest are negative. Because the self-consistency condition requires that ∑_{k,γ} n*_{kγ}(T) = N_0 e^{λ_γ T}, the only admissible eigenvector is the one corresponding to the largest, positive eigenvalue, which we denote Λ_γ; otherwise ∑_{k,γ} n*_{kγ}(T) would be less than N_0, which is not possible in this model. (Moreover, the eigenvectors with negative eigenvalues often have negative components, which does not make sense for a population count.)
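In practice, this criterion reduces to comparing the leading eigenvalues of the candidate phenotypes' matrices. A sketch, using the same kind of illustrative M_γ construction (an assumption, not the paper's exact matrices):

```python
import numpy as np

def make_M(b, g, p):
    # Illustrative transition matrix: net growth b_i - g_i on the diagonal,
    # hops out of level j redistributed to level i with probability p_i.
    return np.diag(b - g) + np.outer(p, g)

def leading_eigenvalue(M):
    return np.linalg.eigvals(M).real.max()  # Lambda_gamma

p = np.full(5, 0.2)                         # assumed landing probabilities
b = np.array([0.1, 0.5, 1.0, 2.0, 4.0])     # assumed birth rates
candidates = {
    "stay": np.array([10.0, 10.0, 10.0, 0.0, 0.0]),   # gamma_4 -> gamma_min
    "go":   np.array([10.0, 10.0, 10.0, 10.0, 0.0]),  # gamma_4 -> gamma_max
}
Lambda = {k: leading_eigenvalue(make_M(b, g, p)) for k, g in candidates.items()}
best = max(Lambda, key=Lambda.get)  # predicted low-density winner
```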
Because we assume that all phenotypes are initially present in equal proportions in the mean-field model, it follows that the phenotype with the largest maximum eigenvalue Λ_γ will be the sole survivor in the ℓ → ∞ limit. The fact that ∑_{i,γ′} n*_{iγ′}(T)/N_0, a single scalar, must be an eigenvalue of exp(TM_γ) for every γ implies that the only way multiple phenotypes may coexist in the low-density limit is if their largest eigenvalues Λ_γ have the same value. In such a case the relative proportions of those phenotypes will depend on the overlap of n(0) with the eigenvector corresponding to Λ_γ, which can differ between phenotypes even when their eigenvalues are equal, leading to different relative proportions in the long-time limit.

The mean-field dynamics are not gradient-based
Here, we briefly prove the claim that the mean-field model cannot be written as a gradient-based dynamics dn/dt = −∂C(n)/∂n for some effective fitness function −C(n). The proof uses the fact that the right-hand side of Eq (2), which we denote Q_{kγ}, is not conservative, and hence cannot be written as such a gradient. If this vector field were conservative, then it must be the case that ∂Q_{kγ}/∂n_{jα} − ∂Q_{jα}/∂n_{kγ} = 0, since if Q_{kγ} = ∂C(n)/∂n_{kγ} this quantity would vanish by equality of mixed partial derivatives. We demonstrate this is not the case: computing the mixed partial derivatives of Q_{kγ} explicitly, their difference is non-zero in general, and hence the vector field Q_{kγ} is not conservative and cannot be written as a gradient of a cost function that predicts the steady-state configuration of the population. This holds true even in the low-density limit (M ≫ ∑_{γ′} n_{kγ′}), for which the difference remains non-zero.
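The non-gradient claim can be checked numerically: if Q were a gradient, its Jacobian would be symmetric. The toy two-state vector field below only mimics the capacity-limited birth-and-hopping structure of Eq (2); the rates are illustrative, not the paper's:

```python
import numpy as np

def Q(n, b=np.array([1.0, 2.0]), g=np.array([3.0, 1.0]), M_cap=10.0):
    # Caricature of Eq (2): births and hops both limited by free capacity.
    free = 1.0 - n.sum() / M_cap
    hop = np.array([g[1] * n[1] - g[0] * n[0],
                    g[0] * n[0] - g[1] * n[1]]) * free
    return b * n * free + hop

def jacobian(f, n, eps=1e-6):
    # Central finite differences (exact to rounding for this quadratic Q).
    J = np.zeros((n.size, n.size))
    for j in range(n.size):
        dn = np.zeros(n.size); dn[j] = eps
        J[:, j] = (f(n + dn) - f(n - dn)) / (2 * eps)
    return J

J = jacobian(Q, np.array([1.0, 4.0]))
asym = np.abs(J - J.T).max()   # nonzero: Q is not a gradient field
```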

Numerical solution and analysis of the 5-state mean-field model
With the simplified 5-state mean-field model we can evaluate the evolution of agents by numerically solving the system of differential equations in Eq (8). To do so, we discretize the continuous phenotype space γ = (γ_1, γ_2, γ_3, γ_4, γ_5) with γ_i ∈ [0, 10] into a space where each component can take discrete values from the set {0, 2, 4, 6, 8, 10}. This discretized phenotype space contains 6^5 = 7776 possible phenotypes. We numerically solve Eq (9) with the ODE solver from the SciPy package [66], using an explicit Runge-Kutta method of order 5(4). In practice, we solve the full mean-field model Eq (2) for the fully coupled phenotype system, even in the low-density limit. The number of agents is reset following Eq (10). The mean phenotype and its standard deviation can be calculated from the fraction of agents with each phenotype, f_γ(t) = N_γ(t)/∑_{γ′} N_{γ′}(t), where N_γ(t) = ∑_i n_{iγ}(t) is the total number of agents with phenotype γ. Because there are no deaths in our model the numerator never vanishes, and f_γ is therefore bounded between 0 and 1, so we can treat it as a probability and calculate the mean phenotype by ⟨γ(t)⟩ = ∑_γ γ f_γ(t) for each component of the vector γ. We similarly estimate the standard deviation by σ_γ(t) = √(∑_γ γ² f_γ(t) − ⟨γ(t)⟩²).

Supporting information

S1 Fig. Different realizations of the resource distributions give consistent phenotype distributions. In evaluating the phenotype distributions shown in the main text, we average over 20 trials of the evolutionary dynamics, but we keep the random realization of the resource distribution the same across these 20 trials.
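The mean-phenotype and standard-deviation bookkeeping described in the numerical-solution section amounts to a few lines of NumPy. Here a single phenotype component is shown on the discrete grid {0, 2, …, 10}; the agent counts N_gamma are illustrative:

```python
import numpy as np

gammas = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])      # discrete phenotype values
N_gamma = np.array([5.0, 10.0, 40.0, 25.0, 15.0, 5.0])  # illustrative agent counts

f = N_gamma / N_gamma.sum()           # f_gamma: fraction of agents per phenotype
mean_gamma = np.dot(gammas, f)        # <gamma(t)> = sum_gamma gamma * f_gamma
std_gamma = np.sqrt(np.dot(gammas**2, f) - mean_gamma**2)
```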
To demonstrate that the specific realization of resources throughout the environment does not strongly shape the ultimate phenotype distribution, we run the agent-based stochastic simulations for additional unique random realizations of the resources throughout the environment.

The maximum eigenvalue Λ_γ of the low-density coefficient matrix M_γ predicts the most competitive foraging strategies γ_opt. Typically, the optimal values are γ_{1,2,3} = 10 and γ_5 = 0, but the value of γ_4 depends on b_5, the birth rate at resource level S_5. When b_5 is above the critical value B_{5c}, the phenotype with γ_4 → γ_max has the largest maximum eigenvalue and is thus predicted to be optimal; when b_5 is below B_{5c}, the phenotype with γ_4 → γ_min is predicted to dominate the population instead. The same transition in the full agent-based stochastic simulation is shown in Fig 3. B. The derivatives of the eigenvalue predict that a range of γ_4 values yield similar maximum eigenvalues when b_5 is close to B_{5c}. The effective objective function is then insensitive to changes in γ_4, resulting in the large variability of γ_4 seen in the simulation results of Fig 2E and 2F. When the derivative is positive, larger γ_4 leads to larger λ_γ, and agents are better off leaving their current location to explore ("go"). When the derivative is negative, smaller γ_4 leads to larger λ_γ, and agents are better off exploiting their current location ("stay").

Here we present violin plots of the relocation-rate distributions of the full agent-based stochastic simulations at these different initial densities, showing the coexistence of phenotypes across a range of initial densities, indicated by a large width of the distribution. At lower densities, coexistence occurs when b_5 ≈ B_{5c}, while some degree of coexistence is observed at several initial densities without tuning b_5.
Note that the large degree of coexistence at 94% density is mostly due to the environment being so dense that opportunities for relocation and birth are extremely rare. (TIF)

S5 Fig. Spatial covariance and the well-mixed assumption error at different initial density levels.
In deriving the 5-state mean-field model from the 2d master-equation model, we made a "well-mixed approximation" (Eq (6)). The error introduced by this assumption (the difference between the left- and right-hand sides of Eq (6)) has zero mean when averaged across sites; here we show how the standard deviation of this error changes across different initial densities.
To assess the spatial variation neglected by our 5-state approximation, we evaluated the spatial cross-covariance of the agent numbers across different sites, averaged across trials. This cross-covariance is localized around the nearest-neighbor sites at low-to-medium densities, but weakens at higher densities. The average covariance of the 4 nearest neighbors exhibits qualitatively similar behavior to the standard deviation of the well-mixed assumption error as the initial density changes. A. The spatial covariance of the number of agents at location (x, y) with the number of agents at sites (x + i, y + j), averaged across all sites with periodic boundary conditions and then averaged over 20 trials of independent simulations. The center of the covariance map (the variance) is suppressed for visibility. B. Green line and circle markers show the average spatial covariance of the 4 nearest neighbors in panel A (i.e., the mean of the covariances at (1, 0), (−1, 0), (0, 1), and (0, −1)) at different initial density levels. Blue line and triangle markers show the standard deviation of the well-mixed assumption error (the difference between the left- and right-hand sides of Eq (6)) across 20 trials at different initial density levels.
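The periodic spatial cross-covariance described in panel A can be sketched as follows; the toy occupancy field here is illustrative, not simulation output:

```python
import numpy as np

def spatial_covariance(n, i, j):
    # Covariance of counts at (x, y) with counts at (x + i, y + j),
    # averaged over all sites with periodic boundary conditions.
    shifted = np.roll(np.roll(n, -i, axis=0), -j, axis=1)
    return np.mean(n * shifted) - n.mean() * shifted.mean()

rng = np.random.default_rng(0)
n = rng.poisson(2.0, size=(64, 64)).astype(float)  # toy occupancy field

# Mean covariance over the 4 nearest neighbours (the green curve in panel B).
nn = [(1, 0), (-1, 0), (0, 1), (0, -1)]
nn_cov = np.mean([spatial_covariance(n, i, j) for i, j in nn])
```

At zero displacement this reduces to the variance of the field, which is why the center of the covariance map in panel A is suppressed for visibility.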
Both quantities are small for very low or high densities, with peaks in a density range around 31% to 48%. (TIF)

S6 Fig. The well-mixed assumption introduces a mismatch in timescales.
The full agent-based model has a population of agents navigating a 2d environment. For analytical tractability, this full model is reduced to a 5-state model that eliminates spatial dependencies. However, as observed in Fig 2E and 2F, the full model and the reduced model have a qualitative mismatch in the order of magnitude of the convergence speed, as well as a quantitative mismatch in the value of the transition point in b_5. Here we compare the two models on a shorter timescale (700 iterations) with different b_5 values (4 and 5). The results suggest that the well-mixed assumption introduces the discrepancy in timescales, as simulations of a stochastic 5-state model agree with the mean-field 5-state model. A. The mean relocation rates of the full 2d stochastic simulation with b_5 = 4 have not converged by 700 catastrophes. B. Agent-based simulations of a 5-state model with b_5 = 4 show a faster convergence of γ_5 to 0 and γ_{1,2,3} to high values near γ_max = 10, within the same 700 catastrophes shown in A (note the different scales of the horizontal axes in Fig 2E and 2F). C. The 5-state mean-field model with b_5 = 5 (which predicts γ_4 → 10 at long times) converges within 700 catastrophes, though the fate of γ_4 differs from the stochastic 5-state model with b_5 = 4. D. The stochastic agent-based 5-state model with b_5 = 5 largely converges within 700 iterations and agrees slightly better with the mean-field model's prediction for γ_4. (TIF)