Stochasticity and non-additivity expose hidden evolutionary pathways to cooperation

Cooperation is widespread across the tree of life, with examples ranging from vertebrates to lichens to multispecies biofilms. The initial evolution of such cooperation is likely to involve interactions that produce non-additive fitness effects among small groups of individuals in local populations. However, most models for the evolution of cooperation have focused on genealogically related individuals, assume that the factors influencing individual fitness are deterministic, that populations are very large, and that the benefits of cooperation increase linearly with the number of cooperative interactions. Here we show that stochasticity and non-additive interactions can facilitate the evolution of cooperation in small local groups. We derive a generalized model for the evolution of cooperation and show that if cooperation reduces the variance in individual fitness (separate from its effect on average fitness), this can aid in the evolution of cooperation through directional stochastic effects. In addition, we show that the potential for the evolution of cooperation is influenced by non-additivity in benefits with cooperation being more likely to evolve when the marginal benefit of a cooperative act increases with the number of such acts. Our model compliments traditional cooperation models (kin selection, reciprocal cooperation, green beard effect, etc.) and applies to a broad range of cooperative interactions seen in nature.


Introduction
Cooperation in which individuals interact and generate fitness benefits either at the individual or group level is found in an immense array of biological systems. Of particular interest is the evolution of costly cooperation. Throughout this discussion, we define a costly cooperative trait as one that reduces the total expected relative fitness of the individual that expresses it, all else held equal, while increasing the total expected relative fitness of another. In this analysis, we will illustrate some ways in which stochastic fitness and non-additive benefits facilitate the evolution of cooperation.
In natural populations, there is a sizeable stochastic component to individual fitness [23,[39][40][41][42][43][44][45][46][47][48][49][50][51][52][53][54]. This means that we cannot say with certainty the number of descendants any given individual will leave. We thus need to treat fitness as a random variable-each individual having a probability distribution of possible fitness values (Fig 1B). These probability distributions must be distinguished from the frequency distribution of variation within a population. We thus need to distinguish two different kinds of statistical operations in our models (Fig 1). We will use a bar (e.g. � w) to denote the frequency mean across a population, and hat (e.g.ŵ) to denote the expected value of a random variable that has a probability distribution (this follows the notation in Rice and Papadopoulos 2009; note that a hat denotes an expected value, not an estimator, of a random variable [55]). Fig 1A shows the joint distribution of expected average offspring phenotype (� o i ) and expected absolute fitness (ŵ i ) for a hypothetical population. There is variation between individuals across the population, but zero variation within individuals (intra-individual variation). In this case, there is a positive frequency covariance (denoted ⟦� o i ;ŵ i ⟧) between these variables (Table 1). We use the term 'frequency covariance' because it is calculated across a frequency distribution of individuals.
In reality, neither offspring phenotype nor fitness can be predicted with certainty prior to reproduction. Fig 1B shows a hypothetical plot of these variables by themselves-not their expected values. Each individual in the current generation now has a distribution of � o i and w i . In the example shown, there is a negative probability covariance (denoted hh� o i ; w i ii) for each individual ( Table 2). This means that if an individual happens to leave more offspring than expected (its actual fitness, w i , turn out to be higher than its expected fitness,ŵ i ), then its offspring will likely be smaller than they would be if there were fewer of them. We call these 'probability covariances' because they are calculated across a probability distribution of possible future outcomes. In general, there is no reason to expect the frequency covariance across a population to be the same as the probability covariance for any one individual.
To create a generalized model of the evolution of cooperation, we assume populations are finite, that individuals can interact in any size groups, and that the components of individual fitness-including the costs and benefits of cooperation-are random variables, here captured with probability distributions (See Table 2 for definitions). Since we are adding stochasticity to models that already involve operations like covariances and regressions, we have to be clear about our notation. The term 'random variable' is sometimes used to describe any value for  which we can calculate a mean, variance, or a covariance. We are using the term in this way. We will use random variable to refer to a variable that has a probability distribution of possible states, due to the fact that it is influenced by one or more random processes-such as mutation, recombination, or unpredictable environmental variation. Under this definition, terms in the deterministic Price equation are not treated as random variables [56] (Table 1): The covariance in Eq 1 is a frequency covariance, calculated over all individuals in a population. If we are treating the change in mean phenotype (D � �) as a determinate value, then we are tacitly assuming that each individual has a single value of � o i and O i . This means that in Eq 1, � o i and O i are not strictly random variables-since they are not the result of any random process. Of course, if we were calculating the covariance of values in a random sample from a larger population (which is what statisticians are most often concerned with), then these terms would be random variables; inheriting their randomness from the sampling process. The Price equation, however, is concerned with the entire population, not a sample from it. We are thus, (in the case of Eq 1) dealing with a covariance between deterministic values.
This distinction becomes important when we want to introduce true stochasticity into our analysis. Now, � o i and O i for a particular individual become true random variables-meaning that there can be a probability covariance between them. When � o i and w i become random variables, the change in mean phenotype (D � �) and mean absolute population fitness ( � w) also become random variables, having a distribution of possible values. If we calculate the expected value of change in mean phenotype, we get the stochastic Price equation [57]: The first term on the right side of Eq 2 (⟦� o i ;Ô i ⟧) is the same as the covariance in Eq 1, except that it explicitly involves the expected values of � o i and O i . This is analogous to the covariance between the points in Fig 1A. The second term in Eq 2 (⟪� o i ; O i ⟫ i ) is the frequency average (i.e. the average across all individuals in the population) of the probability covariance between � o i and O i for each individual. Each individual has a joint distribution of possible mean offspring phenotypes and relative fitness values that have not been realized. This is like the average of the covariances for the different probability distributions in Fig 1B. Eq 2 shows that introducing stochasticity can alter the expected change in mean phenotype. Below, we will show that the combination of stochasticity and non-additivity, reveals new mechanisms that can facilitate or impede the evolution of cooperation between genealogically unrelated individuals.

The non-additive, deterministic case
Costly cooperation entails a fitness cost to the cooperator, C, and confers a fitness benefit, B, on recipients (whether they are cooperators or non-cooperators). For the rest of this discussion, we impose the cost as an inherent deduction from the maximal baseline fitness of cooperators, whether they interact with another individual or not. We are thus focusing on traits like overproduction of a metabolic compound, which incurs a cost to the cooperator regardless of whether there is another individual nearby to benefit from it. This is likely a common case in systems such as biofilms or slime molds. To capture any potential group structure within the population, we denote the amount of cooperation experienced by a cooperator as ρ A and the amount experienced by a non-cooperator as ρ S [9,26,29,58] (Table 1). ρ i will be a function of the distribution of individuals within a population (See Methods -Simulation 1 and 2).
We will be focusing specifically on costly cooperation; meaning that we need to distinguish costs and benefits. If we were instead modeling mutualism, then the fitness components of interest may change (e.g. the cost term may be unnecessary) or the benefits of cooperation may entail details that are not necessarily captured. Using the notation in Table 1, we can now write the deterministic absolute fitness equations for each phenotype, cooperators and noncooperators respectively: For the moment, we are treating w 0 , B, ρ i , and C as deterministic (we relax this assumption in the next section). In the majority of cooperation models, the benefits of cooperation are assumed to be additive, meaning that the total fitness benefit from multiple cooperative interactions is the sum of the effects of each independent interaction. In fact, as with many components of fitness, the benefits of cooperation are likely to be non-additive [13,19,23,39,[42][43][44][47][48][49][50][51][52][53]58,59]. Non-additivity means that the benefits an individual receives will change (may increase or decrease) as they interact with more cooperators. Non-additivity is often represented in terms of 'synergistic' benefits, though this 'deviation from additivity' produces the same benefit value each cooperator/cooperator interaction (e.g. Table 2 in [15]) [15][16][17][18]22]. By contrast, we will allow the benefits of cooperation to be a non-additive function of the number of cooperative interactions experienced, regardless of whether the recipient is an cooperator or not. This means that as the number of cooperative interactions increase for an individual over time, the benefit per interaction is greater or smaller than the benefit gained in the previous interaction.
To capture any possible non-additive benefit, we write the total benefit, B, as a function of the amount of cooperation experienced, ρ i and a parameter 'β' (which determines the curvature of the function):  [35]. In Eq 5 and Fig 2, setting β = 0 yields the additive case, in which the fitness benefit increases additively with experienced cooperation. Setting β > 0 means that the marginal benefit of a cooperative act increases with the number of such acts experienced. Conversely, β < 0 corresponds to cases in which there are diminishing returns from increasing experienced cooperation.
The cost of cooperation (C) could also be expanded in this way; if there is variation in how much cost is paid as the number of cooperative interactions increases. As noted above, however, we are considering cases in which the cost is an inherent property of being a cooperator. We thus will treat C as fixed. The standard condition of when to expect cooperation to increase under a deterministic model is when the average absolute fitness of cooperators is greater than the average absolute fitness of non-cooperators [5]: By combining Eqs 3-6 and assuming B 0 and β are constant across the population (all individuals have the same value; therefore, the average across them is also the same), we find the condition for cooperation to increase as a function of the amount of cooperation experienced by cooperators and non-cooperators (see Methods): The deterministic 'P' terms in Eq 7 are functions of the (frequency) distributions of ρ A and ρ S . They are defined as follows: • P 1 � ð� r A À � r S Þ ! The average degree to which cooperation is directed towards cooperators minus the degree to which cooperation is directed at non-cooperators. The subscript 1 refers to a 'first order genealogical relatedness-like' term. This, like the following 'P' terms, may well change from generation to generation.
The difference in frequency variance in cooperation experienced by cooperators and cooperation experienced by non-cooperators. The subscript v refers to a 'variance capturing' term. These frequency variances capture the variation across the population in cooperation experienced across cooperators (⟦ 2 ρ A ⟧) and non-cooperators (⟦ 2 ρ S ⟧).
• P 2 � ð � r A 2 À � r S 2 Þ ! The average degree to which cooperation is directed towards cooperators squared minus the degree to which cooperators is directed at non-cooperators squared. The subscript 2 refers to a 'second order genealogical relatedness-like' term.
Eq 7 shows that P v and P 2 become important when the benefits of cooperation are nonadditive (β 6 ¼ 0). P 1 is similar to the "genealogical relatedness" term in kin selection models (though it allows interactions between non-relatives) [9,26,29,58]. Note that we are separating the degree of cooperation that one experiences (which goes into the ρ i terms) from the fitness benefits that accrue from that cooperation (which do not influence the ρ i terms). We are thus distinguishing between the act (such as the degree to which an individual bacterial cell overproduces a metabolic compound that might benefit those around it) and the consequences of that act for others (e.g. how much other cells actually benefit from that compound). Some authors define relatedness in such a way that the fitness effects of an cooperative act are combined into the degree of cooperation expressed by an individual (or the genic value of an individual) [3,7,8,17,18,22]. In such models, any non-linearities in the benefits of cooperation (i.e. if β 6 ¼ 0) are folded into relatedness. In our model, we distinguish the act from its fitness consequences both because they are biologically distinct (for example, an environmental change might influence one but not the other) and because they can potentially vary independently of one another. Note that under this definition, P 1 may be zero (i.e. cooperators no more likely to interact with other cooperators than with non-cooperators), even when @r@ 6 ¼ 0 in models under which fitness effects are folded into the definitions of genic values [3,7,8,17,18,22]. Furthermore, when we make the model stochastic, we will show that individual behavior, fitness effects and clustering of individuals can have very different stochastic behavior. Studying interactions between these effects will reveal hidden processes influencing the evolution of cooperation.
Any behavior that increases P 1 , all else held equal, will increase the potential for cooperation to spread. Since the 'P' terms simply measure the degree in individual cooperative interaction, this model accommodates different biological processes influencing why individuals encounter who they encounter. If all interactions are between genealogically unrelated individuals, P 1 may be non-zero.
A number of proposed biological processes can increase P 1 , though they are usually treated in separate models. These include but are not limited to: • Genealogical relatedness: One way to boost P 1 in favor of cooperators is to direct their helping towards genealogical relatives. This case essentially captures kin selection, though note that we are using absolute fitness ('neighbor modulated fitness' in [5]) instead of inclusive fitness. Inclusive fitness models frame 'relatedness' in terms of the actor's genetic value and the recipient's genetic value. What matters in this model is that cooperative interactions are preferentially directed towards individuals that share alleles identical by descent [5]. P 1 captures the relationship between the focal individual's phenotype and how much cooperation they experience. It is thus not defined in terms of genealogical relatedness, but will tend to be larger if cooperators direct their helping preferentially towards relatives. In a very large population in which cooperators are very rare, P 1 can often be approximated by genealogical relatedness. P 1 likewise encompasses more recent "statistical" definitions of relatedness (though not those that fold the fitness effects of cooperation into the measure of cooperation experienced). These define 'relatedness' as the linear regression of group composition (or cooperation experienced) on individual genotype or phenotype [8,11,[14][15][16][17]23,30,31]. As Fig 3 shows, for a binary case with two states (cooperator and non-cooperator), (� r A À � r S ) is equal to the regression of cooperation experienced on cooperation expressed. If P 1 < 0, then on average cooperators interact with other cooperators less often than with non-cooperators. In terms of the regression on actor and recipient genetic value, we are explicitly separating genetic value from the phenotype cooperation experienced, though, the non-additive benefits of cooperation will influence the difference in the variance of cooperation experienced (P v ) and the 'second order relatedness-like' term (P 2 ) [36,60].
• Past behavior as an indicator: Another way to increase P 1 is for cooperators to direct help preferentially towards others who have themselves acted cooperative in the past (rather than due to genealogical relatedness). This phenomenon is generally folded into models of reciprocal cooperation, but the process we are highlighting here is different from the expectation of a future benefit [8,10,14,15,[22][23][24]33,61]. In fact, the expectation of receiving a returned benefit is not necessarily non-zero.
• Recognizable cooperative trait: Here, cooperators direct cooperation towards others who express some indicator trait (e.g. 'green beard') that correlates with the recipient's potential cooperative behavior. Theoretical models and empirical experimentation have shown that such indicator traits can facilitate the evolution of cooperation, even among microbial colonies [2,8,[62][63][64][65].
These are three well-known ways to increase the likelihood of an cooperative interaction, and thus increase P 1 . Though, we cannot say that simply because P 1 is positive that cooperation will increase. For example, if cooperators have lower variance in cooperation experienced than non-cooperators (i.e. most cooperators experience similar amounts of cooperation, while some non-cooperators have many cooperative experiences, most do not), then P v will be negative. If, in addition, β > 0, then this negative P v will reduce the potential for cooperation to increase. Decreasing benefits (β < 0) will facilitate the increase in cooperation if P v is also negative.
As noted above, our model distinguishes the amount of cooperation experienced from the fitness benefits that result from it. How the magnitude of the benefits of cooperation changes plays a big role in influencing the evolutionary outcome. If β = 0 in Eq 5 (meaning that fitness contributions are additive), then Eq 6 is just a simple version of Hamilton's rule, equivalent to results in many other studies [1,4,5,6,[12][13][14][15][16][17][20][21][22]26,[28][29][30][31]. When the effects of cooperation are explicitly non-additive, interestingly two new terms appear in Eq 7, β(( is always of the same sign as the first-order relatedness-like term, ð� r A À � r S Þ, but is multiplied by β, which measures nonadditivity of benefits (Fig 2). Thus, compounding fitness effects (β > 0) increase the potential for cooperation to spread when ð� r A À � r S Þ > 0. Conversely, diminishing returns from cooperative acts (β < 0) reduces the potential for cooperation to spread relative to the additive case.
The term β(⟦ 2 ρ A ⟧ − ⟦ 2 ρ S ⟧) is particularly interesting since it may be of the opposite sign to P 1 and P 2 . ⟦ 2 ρ A ⟧ is the frequency variance, across the population, in the amount of cooperation experienced by each cooperator, while ⟦ 2 ρ S ⟧ is the corresponding variance in cooperation experienced by each non-cooperators. (⟦ 2 ρ A ⟧ − ⟦ 2 ρ S ⟧) > 0 when the variance in cooperation directed at cooperators is greater than that of cooperation directed at non-cooperators. Note that (⟦ 2 ρ A ⟧ − ⟦ 2 ρ S ⟧) is not necessarily zero, even when the "relatedness"-like term is zero or if both are negative, then cooperation can increase in frequency even if ð� r A À � r S Þ ¼ 0 or slightly negative. Put another way: If the fitness effects of cooperation are compounding (β > 0), then cooperation can increase if the amount of cooperation directed at cooperators is more variable than that directed at non-cooperators (even if the mean values are the same). Note, we are separating the effects of fitness from the phenotype simply because we think the biological components contributing to the evolution of cooperation written in this way is more informative, not necessarily more intuitive. When β > 0, individuals gain an increasing benefit from the cooperation they receive. The cooperation an individual experiences and the benefits gained from those cooperative interactions are two biologically distinct factors. Similarly, cooperation can increase if β < 0 (diminishing returns from cooperation) and if the amount of cooperation experienced by cooperators is more predictable than that experienced by non-cooperators ( Non-additivity of benefits is likely to be relevant in cases in which cooperators are relatively rare, and occasionally grouped (dense groups are more common than expected by chance, but are still rare). In such cases, P v will tend to be relatively large (P 2 may be as well), making the value of β important. As an example, Fig 4 shows a hypothetical case in which in which a few cooperators are grouped, while most are dispersed among non-cooperators. In this case, P 1 is positive (because of the grouped cooperators), but P v is even larger. Non-additivity of benefits has a substantial effect in cases like this. In the additive case (β = 0), if the cost of cooperation is set at C = 1, cooperation could increase in frequency if the benefit of a single cooperative act (the benefit of interacting with one and only one cooperator) is greater than 1.875. As β increases, this threshold comes down. For β > 0.7, cooperation can increase in this case even if the benefit of a single cooperative interaction is less than the cost to the cooperator. Note though the average relative fitness benefits of a cooperative interaction is greater than the costs. The reason for this is that the rare group of all cooperators substantially increases the variance in ρ A .
If P v and β are both positive, then there is the potential for cooperation to increase even if P 1 and P 2 are both negative. To see this, note that when P v � 0, some cooperators experience a lot of cooperation, while others experience very little. If β > 0, then the fitness benefits experienced by those with high ρ A (experience much cooperation) will outweigh the reduced fitness of those who experience little cooperation (low ρ A ). This is just a manifestation of Jensen's inequality-resulting from the non-linearity of fitness effects [36,60].
Cooperation is also favored if P v and β are both negative. Here, the individuals who experience the most cooperation are non-cooperators, but they gain relatively little due to the diminishing returns of cooperation (β < 0 line in Fig 2). In this case, the individuals experiencing the least cooperation are likely to also be non-cooperators (since their variance is relatively large), and these individuals do particularly poorly because of the abrupt decline in B at low values of ρ i .
Note that, so far, this is a deterministic model (each individual has a known fitness value), so the only variation in cooperation is captured by the variance across the population. The terms in Eq 7 are not yet random variables. In the next section, we allow fitness to be a random variable (each individual has a distribution of possible fitness values). In the stochastic case, another kind of fitness variance-probability variance-will arise that can have a very different effect on evolution.

The non-additive, stochastic case
Eqs 3-7 treat fitness as a predictable determinate value; this assumes we can predict precisely how many descendants/offspring an individual will leave. In general, this is not possible. Introducing stochasticity means treating fitness as a random variable, which has a probability distribution of possible values. If individual fitness (w i ) is a random variable, then so is mean population fitness ( � w) ( Table 2). In this case, modeling evolution requires that we consider each individual's relative fitness: 57,60,66,67]. The condition for cooperation to increase is now that the average expected relative fitness of cooperators is higher than that of non-cooperators. Using^("hat") for the expected value of a random variable, our condition can be written:Ô Eq 8 is the condition for a trait to be expected to increase under stochastic selection with constant heritabilities. The critical difference between using absolute fitness in Eq 6 and relative fitness in Eq 8 is thatÔ i is the expected value of a ratio of random variables [67]. This means thatÔ i is not just a function of the expected values of w i and � w but also of the entire distributions of possible w i and � w values. In particular, the probability variance in individual fitness (� 2 w i �) becomes an important factor in evolution [7,38,57]. This makes it possible to satisfy Eq 8 even when the mean absolute fitness of cooperators is lower than that of non-cooperators, meaning that Eq 6 is not satisfied. Directional evolutionary processes that result from � w being a random variable are called directional stochastic effects [57]. These effects can drive the evolution of cooperation in ways that are invisible to deterministic models.
We can rewrite Eq 8 by expandingÔ i in a Taylor series [57]. This yields the following condition for cooperation to increase (see Methods): Introducing stochasticity does not just 'add noise' to our result; instead, Eq 9 exposes new processes that influence the evolution of cooperation generally. On the left side of Eq 9, the difference between the expected absolute fitness of cooperators and that of non-cooperators is divided by the harmonic mean of � w. Note, the right side of Eq 9 is not zero, as it would be in the deterministic case, but instead is a series containing the joint central moments of w i and � w, the same expansion, from Rice (2008), is also used in Kennedy et al. (2018).
As in the deterministic case, we can write our absolute fitness in terms of the benefits and cost of cooperation and the amount of cooperation experienced. However, B, C, and ρ i are now random variables-meaning that they can covary with one another within an individual and across the population. Considering only the first order terms in the summation on the right hand side of Eq 9 and denoting the frequency of cooperators and non-cooperators by f A and f S respectively, the condition for cooperation to be expected to increase is now: (Here, to illustrate the interaction of stochasticity and non-additivity, we only expand Eq 9 to the first central moment. This does not mean that higher order terms will not influence the evolution of cooperation. Discussion on expanding Eq 9 to higher order terms is found in the Methods section). P 1 , P v and P 2 have the same basic interpretations that they had in the deterministic case, but they must be modified slightly now that absolute fitness and the amount of cooperation experienced are random variables: The difference between the average degree to which cooperation is experienced by cooperators and the average degree to which it is experienced by noncooperators.
The difference between the total variance in cooperation experienced by cooperators (including the frequency variance in cooperation experienced across cooperators and the frequency average of the probability variance in cooperation experienced for each cooperator) and the total variance in cooperation experienced by non-cooperators. ⟪ 2 r i ⟫ is simply the average across all individual's distributions of � 2 ρ i �. Now the total variance in cooperation experienced across and within for cooperators must be higher than for non-cooperators if this term is to be positive and aid the evolution of cooperation. In one scenario, this term could be positive even if on average cooperators have lower individual variance in cooperation experienced (⟪ 2 r A ⟫ < ⟪ 2 r S ⟫); as long as, the variance in cooperation experienced across cooperators is greater (⟦ 2 ρ A ⟧ > ⟦ 2 ρ S ⟧) and makes up the difference.
Group size does not show up as an explicit term, in Eq 10, but its effects are seen in P v . If group size is quite large, then each group is a representative sample of the population, meaning there is little variation across groups in cooperation experienced. Small groups matter because each 'sample' can vary greatly in cooperation experienced. Small groups thus increase that the likelihood some individuals experience cooperation but most do not. When those individuals are also cooperators, those groups do particularly well.
If individuals are randomly assorted, meaning that an individual's distribution of cooperation experienced is binomial, then this probability variance increases as the group size decreases. Even if, on average, cooperators do not group together (⟦ 2 ρ A ⟧ < ⟦ 2 ρ S ⟧), P v may still be positive if on average cooperators have the higher probability variance in cooperation experienced (⟪ 2 r A ⟫ > ⟪ 2 r S ⟫). With small group sizes when cooperation is rare, few cooperators will experience cooperation, most will not.
The difference between the square of the average degree to which cooperation is directed towards cooperators and the analogous value for non-cooperators. This term will be influenced by non-additive benefits just as its deterministic counterpart.
In a completely deterministic case, these values collapse to the values given in Eq 7. Note, though, that the deterministic 'P' functions are not simply the expected values of their stochastic counterparts.
We will discuss each of the terms in Eq 10 from left to right and discuss the equation as a whole in the Discussion. Note, all terms on the left side are divided by Hð � wÞ (the harmonic mean of mean absolute population fitness). This is important because, as the variance in mean absolute population fitness increases, holding the expected absolute fitness (ŵ) constant, the harmonic mean decreases, increasing the magnitude of the entire left hand side of Eq 10.

Hð � wÞ
: This term includes the product of the expected additive benefits (B 0 ), and the average expected degree to which cooperators interact with other cooperators verses noncooperators (P 1 ) plus the probability covariance between the distributions of B 0 and P 1 (�B 0 , P 1 �). Note that probability operations capture variation in fitness within each individual and frequency operations capture variation across the population. �B 0 , P 1 � is the covariance between the distribution of possible additive benefits and the distribution of possible numbers of cooperative interactions. These terms are random variables because we cannot know with certainty how many benefits or interactions an individual will experience over its lifetime. In Eq 7, we assumed that there was zero variation in individual's absolute fitness and so this covariance did not exist. IfB 0P 1 > 0, this does not mean that �B 0 , P 1 � > 0, although this will make the evolution of cooperation more likely all else held equal.

Hð � wÞ
: This term combines the degree of non-additivity in fitness effects (β) with the difference in the total variance in experienced cooperation. This term is the most unlike its deterministic counterpart. When we change deterministic terms β and ρ i into distributions, we end up with the probability covariance between them and the expectations of those variables. The big difference is in the composition of P v . Now this includes the total variance in cooperation experienced (individual and population level) for both cooperators and non-cooperators. �β 0 , P v � is a probability covariance between two random variables; each individual has a joint distribution of β and P v . If �β, P v � >0, then the non-additive benefit effects are either accelerating or diminishing at the same time cooperators have higher total variance in cooperation experienced than non-cooperators (ð⟦ 2 r A ⟧ þ ⟪ 2 r A ⟫Þ À ð⟦ 2 r S ⟧ þ ⟪ 2 r S ⟫Þ > 0). If �β, P v � < 0, this will make it harder for cooperation to increase, but if the expected values (bP v ) are positive and large the left side may still outweigh the right side of Eq 10.

Hð � wÞ
: This term is subtracted from the left hand side of Eq 10, meaning all else held equal a positive value of (bP 2 þ ⟪b; P 2 ⟫) will reduce the potential for cooperation to evolve. This negative term may be surprising since all terms in the deterministic version (Eq 7), aside from the cost of helping, are positive. The probability covariance between the non-additive effects of cooperation and the degree in squared experienced cooperation directed towards cooperators (�β, P 2 �) appears because both β and P 2 are random variables, without stochasticity �β, P 2 � = 0. ÀĈ Hð � wÞ : This term captures the cost of cooperation. In reality, there is no reason to assume the cost to cooperators be deterministic or additive. For simplicity of this discussion, C is a random variable but will we are not relaxing the assumption of non-additivity.
As noted above, all terms on the left hand side of Eq 10 are divided by the harmonic mean of mean absolute population fitness (Hð � wÞ). Consequently, the absolute value of the left hand side of Eq 10 will tend to increase if � w becomes more unpredictable-as would be the case in a very small population or in a capricious environment. Whether this increases or decreases the potential for the evolution of cooperation depends on whether the sum of terms are positive or negative. If the sum is positive, then uncertainty in � w will increase the potential for the evolution of cooperation.
On the right hand side of Eq 9, the first order term contains a probability covariance between the difference in absolute fitness distributions of cooperators and non-cooperators and the first central moment of average absolute population fitness (⟪w A À w S ; � w⟫). This term captures the relationship between the net benefits to cooperators and mean absolute population fitness ( � w). This covariance shows up in the derivation ofÔ i because cooperative interactions between individuals cause their fitness distributions to be stochastically dependent (see Methods). We can break up the first order term on the right hand side of Eq 9 into three different terms, as shown in Eq 10: : This term is the product of the current frequency of cooperators in the population (f A ) and the probability variance in mean absolute fitness for cooperators (⟪ 2 � w A ⟫). Note that ⟪ 2 � w A ⟫ is the variance in the distribution of possible mean absolute fitness values for cooperators; it is thus a property of an individual (unlike ⟦ 2ŵ A ⟧À the frequency variance across the population in the expected absolute fitness of cooperators). If � w A is not a random variable, this whole term will be zero (f A ⟪ 2 � w A ⟫ ¼ 0). In this case, there is no variation in cooperators' absolute fitness. Therefore, this is no variation in mean absolute fitness (⟪ 2 � w A ⟫ ¼ 0). This term can only be positive or zero; therefore, all else held equal, when cooperators are rare (f A � 0) or when the variance in mean absolute fitness for cooperators is small (⟪ 2 � w A ⟫ � 1), the more likely the left hand side of Eq 10 is greater than the right, increasing the potential for cooperation to increase in frequency.
: This term is the product of the frequency of non-cooperators in the population (f S ) and the probability variance in mean absolute fitness for non-cooperators (⟪ 2 � w S ⟫). Because this term has a minus sign and f S and ⟪ 2 � w S ⟫ are always non-negative, this entire term can only be negative or zero. Consequently, when f S ⟪ 2 � w S ⟫ 6 ¼ 0, this increases the potential for cooperation to increase in frequency, all else held equal.
: This term involves the difference in the frequency of non-cooperators and cooperators, multiplied by the probability covariance between their mean absolute fitness distributions. If cooperators are rare (f S − f A � 0), then a highly negative probability covariance (⟪ � w A ; � w S ⟫ � 0) will facilitate the evolution of cooperation, all else held equal.

In silico experiments
To test some of the hypotheses generated from Eq 10, we set up two individual-based in silico experiments using Python. In order to illustrate the basic processes discussed above, we kept these simulations intentionally simple. This means we will make simplifying assumptions about which distributions to use and how individuals interact over one generation. Population size is relevant in terms of the directional stochastic effects and group size is relevant in terms of the variance in cooperation experienced. The simulations thus focus on relatively small populations of relatively small groups. We only aim to illustrate that the uncertainty in the level of cooperation matters and plays a key role in facilitating the evolution of cooperation when the population is structured into small groups. Eq 10 can account for cooperative models that are considered weak or strong cooperation regimes. To elaborate this point, Simulation 1 is set up so that cooperators produce a diffuse product that is essentially a public good and thus benefits cooperators indirectly (but with that benefit being less than the cost, i.e. weak cooperation) [3,9,13,19,28,33,61]. Simulation 2 is set up similarly, but cooperators interact directly and do not benefit from their own behavior (i.e. strong cooperation) [8,9,12,13,28]. Table 3 is a quick reference of our individual-based simulation results. For more details on simulation design, see Methods section.
Simulation 1: To illustrate the consequences of non-additive benefits, we set the benefit function so the expected benefits of an cooperative act increase non-additively as the number of cooperators an individual interacts with increases (b ¼ 0:2) (e.g. Fig 2 where β > 0). The cost is set so that an individual cooperator among all non-cooperators does not gain a benefit that outweights the personal cost of the cooperative phenotype (Ĉ ¼ 0:25). The ratio of cooperators to non-cooperators in the population is set to 2:14 respectively. This simulation will Table 3. Summarization of simulation results.

Cooperators
Non-Cooperators� P ¼ À 0:01  show that cooperation can evolve even when cooperators are guaranteed to interact with noncooperators each generation. From the population pool, four groups of four individuals each assemble at random. This is not a sample; this includes the entire population. The baseline absolute fitness distribution variance, mean and maximum for cooperators is reduced based on the inherent cost associated with the cooperative phenotype (Fig 5). Individuals are randomly assigned an expected baseline absolute fitness (ŵ 0 ) from their phenotype-associated distribution before interactions occur. After group members interact and accrue potential benefits, final individual absolute fitness values are drawn from a Poisson distribution, the mean and variance of which is defined by the benefits and costs experienced by each individual.
In this simulation,P is different each generation for each group arrangement (Fig 6). On average,� P ¼ À 0:01, meaning that a non-cooperator is expected to experience more cooperative interactions than is a cooperator. Cooperators also experience lower expected absolute fitness than do non-cooperators (ŵ A ¼ 2:39,ŵ S ¼ 15:28).
Because of non-additivity in benefits, though, cooperators experience higher expected benefits (B A ¼ 0:32,B S ¼ 0:09) and increase in frequency after one generation (200,000 runs; c DA ¼ 0:01; parental: f A = 0.13 and offspring: f A = 0.14), even though the total average expected cooperative effect is negative (� B T� P T À� C T ¼ À 0:25).
Simulation 2: Here, we show again that the expected absolute fitness alone is not a good predictor of the evolution of cooperation and highlight the importance of the variance in The evolution of cooperation absolute fitness within groups. Initially, the population is arranged into four interaction groups consisting of four individuals each with assigned phenotypes (two groups consist of all noncooperators, one group consists of all cooperators and one group is split with half non-cooperators and half cooperators). We set the expected cost of cooperation relatively high and the expected benefit so that it accrues in an additive fashion (Ĉ ¼ 0:25,B 0 ¼ 0:065,b ¼ 0). We are treating the benefits of cooperation as a single random variable, meaning that each individual experiences the same distribution of B values. This assumption is accurate only if the benefits of cooperation are additive, meaning that experiencing z cooperative acts leads to a total benefit of zB. With non-additivity, as in Simulation 1, the expected benefit (B) per individual is a function of the number of cooperative acts experienced. Again, individual's expected baseline absolute fitness and total expected number of descendants is drawn from a uniform ( Fig  5) and Poisson distribution respectively.
Since we held group structure constant, we can look more closely at how the frequency variance in expected mean absolute fitness for cooperators and non-cooperators changes as cooperators join the group. This frequency variance decreases as the number of cooperators increases per group (Table 3). We can also look at the frequency variance in expected mean absolute fitness across cooperators and non-cooperators independently. The group containing two of each phenotype reveal that cooperators within a group of non-cooperator have lower variance in absolute fitness before (⟦ 2� w A ⟧ mixed group ¼ 0:205, ⟦ 2� w S ⟧ mixed group ¼ 0:244) and after (⟦ 2� w A ⟧ mixed group ¼ 0:937, ⟦ 2� w S ⟧ mixed group ¼ 1:075) interactions take place within the group. As in Simulation 1, the frequency variance in expected mean absolute fitness increases across the population over the generation (before interactions: ⟦ 2� w ⟧ pop ¼ 0:06, after interactions: The evolution of cooperation In this simulation, due to the configuration of groups� P ¼ 0:64, cooperators are more likely than a non-cooperator to be the recipient of cooperation. Despite this, the total fitness effect is negative for cooperation because the cost heavily outweighs the benefits of helping (� B T� P T À� C T ¼ À 0:21). Due to this high cost, non-cooperators end up with the higher expected absolute fitness (ŵ S ¼ 10:26,ŵ A ¼ 6:16). We would expect, with this information, cooperation to decline under a deterministic model (Eq 4). In fact, because cooperators have a lower average variance in absolute fitness and higher relative fitness in this case, cooperation increases on average (200,000 runs; c DA ¼ 0:001; parental: f A = 0.375 and offspring: f A = 0.376). Higher order moments are often not used due to the assumption their effects on the overall outcome are negligible as the population size increases. We only went out to the first order term because it results in the simplest form containing the components of interest: stochasticity, non-additivity, variances and covariances. Higher order moments (including skewness and kurtosis) can be included by using more of the terms on the right side of Eq 9 (see Methods). The effects of population size appear in the numerator and denominator of both sides of Eq 10 (also see Methods for Eqs 12 and 13). It is often assumed that these will have little effect in populations that are not tiny, but this is not necessarily true. The effects of higher moments drop off with increasing population size only when individual fitness values are stochastically independent. When individual fitness variation is correlated, as when much individual variation is due to variation in a shared environment, then higher moments may be important even in large populations [57].
Note that focusing on an inequality limits the result to yes or no, cooperation will increase or not. Our simulations were designed to test this binary outcome, and identify processes through which cooperation can increase when rare in a small population. We are also not evaluating the equilibrium level of cooperation attained, only whether it can increase when rare. Note, however, that because we are using relatively small populations, in all our simulations cooperation is maintained at least at a moderate frequency (>13%). If the left side of Eq 10 is negative and the right side is infinitesimally more negative, cooperation is expected to increase. In fact, if we increase the population size in Simulation 1 from 16 to 200 (with groups of 4 or 8) or 4000 (with groups of 4), the average expected change in cooperation is positive in all three cases respectively ( c DA ¼ 0:00082295; 0:0078839; 3:7055E À 5 ). We also saw similar results for Simulation 1. We should expect to see this drop off in the expected average change in cooperation because we are only including the effects of demographic stochasticity. We are assuming each individual experiences a different environment, which essentially gets incorporated into individual stochasticity in which individuals are stochastically independent. Therefore, we would expect a diminishing magnitude of the higher order terms as the population size increases (see Methods). If the environmental conditions affect individuals across the population, then this expectation does not necessarily hold as N gets large [57].

Discussion
Cooperation can clearly evolve under various circumstances. The model presented here encompasses the well-established mechanisms of kin selection, reciprocal cooperation, and green beard effects, while also identifying other mechanisms involving non-additive fitness effects and stochastic selection.

Non-additive benefits of cooperation
It is well supported that non-additive fitness benefits influence the evolution of cooperation [13,19,23,39,[42][43][44][47][48][49][50][51][52][53]58]. Allowing variation in how the benefits of helping accrue as a function of group structure can dramatically change the probability of cooperation increasing. We will describe two possible non-additive cases that help to clarify the different conditions under which cooperation can arise, and argue that different conditions may often facilitate the evolution of cooperation in microorganisms relative to behaviorally complex animals.
If we ignore stochasticity and focus solely on non-additivity in benefits, Eq 7 is determined by the second (βP v ) and third (βP 2 ) term on the left side. This non-additivity creates new opportunities for the evolution of cooperation, relative to a deterministic, additive model where these terms would be zero. Eq 7 shows that an accelerating fitness function (β > 0) could facilitate the evolution of cooperation among non-relatives when cooperators are clumped by chance. In particular, Eq 7 shows that even if cooperators interact more often on average with non-cooperators than with other cooperators (P 1 < 0), cooperation can still increase as long as cooperators have higher variance in their degree to which they encounter other cooperators (⟦ 2 ρ A ⟧ > ⟦ 2 ρ S ⟧) (Fig 2B). This means most cooperators do not interact with another cooperator but some do, and in those instances they do really well relative to other groups because of the non-additive fitness benefits. We might expect this sort of benefit function for a growing biofilm because the diffuse benefits of cooperation are not realized until a sufficient number of individual cells are producing beneficial compounds [6,21,45,64,[68][69][70]. As the biofilm matures, the dynamics of Eq 7 will change. As cells produce offspring, they will likely increase their cooperation experienced (P 1 > 0), but as the biofilm deteriorates the benefits of helping may diminish (β < 0; e.g. overcrowding). Cooperation can be facilitated nonetheless through the difference in their variance in cooperation experienced (⟦ 2 ρ A ⟧ < ⟦ 2 ρ S ⟧). The dynamics of the biological system change over time and what may contribute to the increase of cooperation may also change.
For the evolution of cooperation, decelerating benefit cases seem unlikely and even if they do happen, it must only be temporary. In fact, diminishing benefit cases (e.g. β < 0 in Fig 2) are quite common in examples like 'helpers at the nest', where one or two cooperators greatly help, but adding more cooperators correlates with diminishing returns. This has been demonstrated in a number of studies on species such as Philetairus socius [42,54], Buteo galapagoens [43], Marmota flaviventri [39], Lamprotornis superbus [52] and Picoides borealis [71]. Cooperation still increases in these cases because interactions are limited to close relatives or individuals that can remember past interactions, which both increase P 1 and P 2 . The only way cooperation persists in these examples under a deterministic, non-additive model is due to (⟦ 2 ρ A ⟧ < ⟦ 2 ρ S ⟧).
In Simulation 1, individuals were randomly assorted (2 cooperators and 14 non-cooperators) into four groups. There are only three types of groups that can be produced: all noncooperators, one cooperator or two cooperators. There are only two population combinations of these three groups within one generation (one possible run of the simulation): one cooperator in two groups (pop1) or two cooperators in one group (pop2). If we consider the parameters of Simulation 1 but set the initial population structure similar to pop1, then Eq 7 would not predict pop1 to produce more cooperators in the next generation (β > 0 and P 2 and P v < 0). If we consider pop2 in the same regard, Eq 7 would predict the evolution of cooperation (β > 0, P 2 > 0 and P v < 0). This is due to average positive assortment across the population, not the variance. Simulation 1 was ran 200,000 times and each generation could look like pop1 or pop2, but on average P 1 and P 2 < 0. With P v < 0 in either case (pop1 and pop2), the addition of non-additivity in benefits could not solely predict the parameters of Simulation 1 would facilitate the evolution of cooperation.
The processes for the evolution of cooperation shown in Eq 7 require that cooperators sometimes encounter one another, meaning that there must be a few cooperators to begin with. In the next section, we will encounter a mechanism that can cause cooperators to increase in frequency even if they are initially very rare.

Directional consequences of stochasticity
Expanding fitness in Eq 7 with stochasticity adds several new terms to the left side and completely transforms the right hand side from zero to a probability covariance between the difference in mean absolute fitness of cooperators and non-cooperators and mean absolute population fitness ( ⟪ � ). Anything that reduces the value of this term will, all else held equal, increase the potential for cooperation to evolve. If the left hand side of Eq 9 is negative, there is the potential for cooperation to increase as long as this term is even more negative.
Denoting the frequencies of cooperators and non-cooperators as f A and f S , respectively, we . Eqs 9 and 10 are equivalent (though Eq 10, as written, includes only the first order terms from Eq 9); they illustrate how different biological processes can facilitate the evolution of cooperation. Independent conditions under which the right side of Eqs 9 and 10 can be negative include: 1) the case in which cooperators have a lower variance in absolute fitness than do non-cooperators (⟪ 2 � w A ⟫ < ⟪ 2 � w S ⟫), 2) the case in which cooperation does relatively well when the population is declining (⟪ � w A À � w S ; � w⟫ < 0), and 3) the case in which cooperators and non-cooperators have (partially) independent fitness distributions when cooperation is rare (ðf S À f A Þ⟪ � w A ; � w S ⟫ < 0). All three of these processes can be classified as directional stochastic effects [57]. These are evolutionary forces that yield directional change (unlike drift), but that exist only when there is stochastic variation in individual absolute fitness (w i ) and in mean absolute population fitness ( � w). We discuss these cases in order. Cooperators have lower variance in fitness. This effect was apparent in both Simulations 1 and 2, where� P < 0 andĈ >>B 0 , respectively. There, the cost of cooperation manifests as a reduction in the maximum absolute baseline fitness (exaggerated in Simulation 2). This reduces both the expected absolute fitness of cooperators and their variance (Fig 5). Possession of the cooperative trait inherently reduces the variance in the distribution of an cooperator's absolute fitness. In certain contexts, this could facilitate the evolution of cooperation and in others it may inhibit it.
In Simulation 1 and 2, the benefits cooperators gain through their interactions with other cooperators increases their expected absolute fitness but also their variance in mean absolute fitness, although both were still lower than for non-cooperators. As the frequency of cooperators increases, the less flexibility cooperators have in their mean absolute fitness variance and remain smaller than the corresponding non-cooperators term ( . Cooperators in Simulation 1 were able to increase in frequency over one generation because they were rare and had lower variance in mean absolute fitness (f A ⟪ 2 � w A ⟫), despite the expected terms on the left side of Eq 10 being zero or negative. These effects may have helped cooperators in Simulation 2 as well, although they also gained benefits from the left side of Eq 10 (B 0P 1 > 0).
Throughout this discussion and in Simulations 1 and 2, we assumed the benefits of helping and the 'P' terms do not covary. We have only done this for simplicity of this discussion, but in reality these probability covariances could be very influential. In fact, these covariances may have the opposite sign to their expected product counterparts and are of interest for further study.
In general, this effect of reduced variance in absolute fitness is likely to be important if cooperators are grouped, and the cooperative act itself buffers individuals against environmental fluctuations. One example includes the production of compounds that add mechanical strength to a biofilm. Some biofilm-forming bacteria create coaggregation bridges that bring non-coaggregation species into closer proximity. These bridges are vital for neighboring communication and overall biofilm integrity (i.e. insurance hypothesis, stability-diversity hypothesis, social insulation, etc.) [31,48,64,72]. If we consider the formation of these bridges an cooperative act, then this facilitates the evolution of cooperation because it also happens to group cooperators. In such a case, groups with cooperators have lower variance in their absolute fitness distribution relative to non-cooperators (as we saw in Simulation 2). When cooperators increase in frequency it is with greater magnitude than when they decrease in frequency. The consequence is that the expected average change in the frequency of cooperators is positive. This is an example of a directional stochastic effect [57].
Reduction in cooperators' absolute fitness variance is related to 'cooperative bet-hedging' as discussed by Kennedy, Higginson, Radford and Sumner (2018). They define this strategy as one that facilitates the evolution of cooperation by reducing the variance in the fitness distributions of the cooperator's relatives, even if the average fitness of those relatives is reduced. In accordance, Eq 10 (Eq 9 expanded to the first central moment) shows that if cooperators reduce their variance in absolute fitness (⟪ 2 � w A ⟫) relative to non-cooperators (⟪ 2 � w S ⟫), all else held equal cooperation is likely to increase. They highlight the importance of stochasticity in fitness and detail evolutionary predictions that would otherwise be overlooked under a deterministic Hamilton's rule. Directional stochastic effects, while influenced by the probability variance in absolute fitness, can lead to the increase of cooperation even when cooperators have a lower mean fitness [57].
Another empirical example in which cooperation influences absolute fitness variance is found in superb starlings (Lamprotornis superbus). Rubenstein (2011) shows that as group size (direct relation to number of helpers) increases, group mean absolute fitness increases and group variance in absolute fitness decreases. This cooperative buffering or 'insurance' provided by grouping reduces the fitness variance of participating individuals and has a significant effect when there is environmental instability [73,74]. It is likely that P 1 will be positive, since helpers are usually related within groups. Eq 10 shows that as long as cooperators do not greatly outnumber non-cooperators (f A < f S ), the decrease in variance in absolute fitness due to grouping can drive the evolution of cooperation.
Cooperators do well (relative to non-cooperators) when the population is declining. This is the most direct interpretation of Eq 9. If the value of ( � w A À � w S ) is largest when � w is small, then the right side of Eq 9 will be negative. Note, for cooperation to evolve it need not be the case that the mean absolute fitness of cooperators is ever greater than that of non-cooperators as long as this probability covariance is negative (⟪ � w A À � w S ; � w⟫). This condition will hold if, for example, cooperators tend to group and the benefits of cooperation are most pronounced in harsh environmental conditions [7]. This has been described in the social weaver, Philetairus socius. In these birds, adverse breeding conditions, such as low rainfall or large colony size, increases the demand for helpers [42]. As poor conditions continue, the per capita growth rate continues to decline, � w. Nonetheless, cooperation still increases in frequency (f A > 0). This combination will eventually make the right side of Eq 10 less negative, thus reducing the potential for an increase in cooperation. Cooperation continues to evolve as long as cooperators have lower variance in absolute fitness (⟪ 2 � w A ⟫) than noncooperators and increasingly higher mean absolute fitness ( � w A À � w S ), all else held equal. Cooperation increases in species like the white-winged choughs (Corcorax melanorhamphos) due to a 'buffering effect' (similar to what is seen in the superb starlings, and non-additive absolute fitness benefits) [40]. Collaborative benefits come in the form of young care and group defense. Group sizes range from about 3 to 20 individuals, though the benefits from helping taper off around 15 individuals. In this case, ⟪ � w A À � w S ; � w⟫ will be negative when unpredictability in their environment (such as an increase in the variance of predator visits) increases (decreasing � w) as groups of cooperators increase, in turn increasing � w A relative to � w S . Cooperative behavior that is conditional on population dynamics (the magnitude of � w) is also seen in biofilms. Xavier, Kim and Foster (2011) found no difference between the average fitness of cooperators and non-cooperators; nonetheless, cooperators increase due to 'metabolic prudence'. Groups of Pseudomonas aeruginosa cooperators secrete biosurfactants and relocate by swarming. To avoid helping non-cooperators, cooperators only initiate biosurfactant synthesis when � w is decreasing [75]. In this case, 'metabolic prudence' increases the likelihood that cooperators interact with other cooperators (P 1 ) and decreases the variance in absolute fitness of cooperators (⟪ 2 � w A ⟫) by producing secretions when the population growth rate slows (⟪ � w A À � w S ; � w⟫ < 0). This mechanism increases the left side and decreases the right side of Eq 10, facilitating the evolution of cooperation.
Cooperators are rare and exhibit independent fitness variation. This is related to the concept of induced overdominance, which has long been recognized in the population genetics literature but is rarely invoked in discussions of cooperation [7,57,76,77]. In this case, an individual's relative fitness (O i ) is a distribution with the expected value, variance, and higher moments all contributing to evolution. (Eq 15 -see Methods) [7,37,57]. Consider, for purposes of illustration, a case in which the relative and absolute fitness variation of cooperators is completely independent of the fitness variation of non-cooperators so that ⟪ � w A ; � w S ⟫ ¼ 0, and the variances of each are equal ( In this case, the right side of Eq 10 will be negative whenever the frequency of cooperators is lower than the frequency of non-cooperators (f A < f S ). This hypothetical example is an extreme case; the fitness distributions of different strategies need only be partially independent, and the variances need not be the same. We expect that the fitness of cooperators will vary at least somewhat independently of that of noncooperators because a component of their fitness, the cost of helping, is not experienced by non-cooperators. This process, in which a strategy can increase simply because the fitness of individuals that express it are stochastically independent of the rest of the population, is a very general phenomenon that is not specific to cooperation. Its relevance here is that it is strongest when the variant strategy is rare (and disappears when all strategies are at equal frequencies). It thus provides a mechanism, emerging naturally from our model that can cause cooperation to initially increase to an intermediate frequency [1,21,26,30]. This is significant because most mechanisms for the evolution of cooperation require that there be enough cooperators in the population in which they can interact [8,10,24,25,28,45,78].
Though there are many routes to the evolution of cooperation, they all fall under the same general condition given in Eq 10. This model allows us to elaborate on established mechanisms, such as kin selection and reciprocal cooperation, while also revealing other processesinvolving non-additive and stochastic fitness effects-through which helping behavior can increase. We have focused our discussion on examples of cooperation but this model can extend to other types of cooperation. Fitness can be partitioned in different ways to address specific details of the biological system under study. With knowledge of a particular system, assumptions can be made about which benefit and cost functions are more realistic and if they should change over time as the dynamics change within the population or group.
In real populations, fitness is always stochastic (we will never know ahead of time exactly how many descendants an individual will leave), and the benefits of many kinds of helping behaviors are likely to be non-additive. We saw through mathematical modeling and individual-based simulations that the evolution of cooperation can be facilitated when cooperators have lower variance in individual absolute fitness, when cooperators form small groups or clusters (relative to the whole population) when the environment is unpredictable and when cooperators are rare and somewhat vary independently from non-cooperators among other potential evolutionary pathways. We thus expect that the mechanisms presented here are acting in many natural populations and may contribute substantially to the prevalence of the helping behavior.

Derivation of non-additive, stochastic model
A general model should deduce any situation, and that means we must allow the benefits to accrue additively or non-additively (e.g. accelerating, diminishing, sigmodal, any non-linear function). Therefore, the benefits of cooperation must have a tunable slope. Eq 5 generates a functional relationship between B and � r i that can change over time. If we combine Eqs 5 and 6, then the mean absolute fitness for cooperators and non-cooperators is respectively: For simplicity, we are assuming � w 0 is the same for cooperators and non-cooperators. This is not necessary true and should not be overlooked. Now move terms across the inequality and cancel the baseline absolute fitness. We get Eq 7 from Eq 11 by expanding the � r 2 i terms using the rule: a 2 ¼ ⟦a 2 ⟧ þ � a 2 . All terms in Eq 11 are constants, meaning we are assigning exact numbers to fitness components that cannot be precisely determined from information of the current state. We want to allow stochasticity, or uncertainty, and treat each term as a random variable. We must now use relative fitness (Oi), instead of absolute fitness (w i ), because both w i and � w are random variables [57,66]. We use � k w� to denote the kth central moment of individual absolute fitness, w i . This Taylor expansion allows us to approximate each individual's relative fitness function. The expected relative fitness of an individual can be written as: where ⟪w i ; k � w ⟫ ¼ E½ðw i Àŵ i Þð � w À� wÞ k � [57].
If we expand ⟪w i ; k � w ⟫ for k = 1, then ⟪w i ; � w⟫ ¼ ⟪w i ; 1 N P N j¼1 w j ⟫, where N is population size and w j is the absolute fitness of individuals other than individual i. We can pull out the constant and expand 1 N P N j¼1 w j further so that 1 We get a probability variance of individual i (�w i , w i �) and a probability covariance ((�w i , w j �) between individual i and all other individuals but excludes the covariance between individual i and itself [36,57]. We end up with two terms that describe when individuals are stochastically independent or dependent: In this simulation, individuals (haploid, asexual) are randomly assigned (equal chance) to one of the four interaction groups (constant group size of four) each generation. Although mathematically we are allowing for variation in group size, we want to make consistent comparisons across groups that differ in their numbers of cooperators. Setting the number and size of groups will make it more difficult for cooperation to increase in frequency because cooperators will always end up in a group with some non-cooperators. When group size is variable it is possible in some generations that the two cooperators will group alone (i.e. positive assortment-whether it be due to relatedness, trait recognition or just by chance). In this instance, the benefits of cooperation are directed only at cooperators, making it easier for cooperation to evolve. In general, as group size decreases, the variation between groups increases. This is something we want to highlight through the 'P' terms in Eq 10.
Random colonization of groups means thatP will be different for each group, each generation because the arrangement of cooperators will change. The benefits of cooperation are a function of the cooperative environment experienced. In this simulation, as the number of cooperators increase per group, the benefit increases non-additively. This does not have to be the case. The benefit function may take many forms, depending on the biological system, including variants of the functions shown in Fig 2. In a realistic case, the benefits will not continue to increase exponentially as the number of cooperative interactions increase. Instead, the function may taper off after some number of interactions, approaching a sigmodal shape. In this simulation, we are only looking over one generation and the number of parental cooperators is always two.
We can minimize the number of cooperators per group by allowing cooperators to benefit from their own diffusion of products. If a cooperator is in a group with all non-cooperators, the benefit it gains from its own diffusion is less than the cost paid for being a cooperator. The benefits of cooperation to each cooperator surpasses the individual cost only when both cooperators end up in the same group. This is similar to what some describe as 'synergistic effects' [17,18,22]; although, we are considering these effects at the individual level instead of the group level. This ties into the earlier discussion on non-additivity.
For simplicity, we only consider the expected cost,Ĉ, and assume that B and P do not covary. We assume that a cooperator's cost is not a function of how many interactions an individual has; instead, the cost to a cooperator is inherit; this phenotype confers a fitness disadvantage from the very beginning. Individuals start the simulation with a randomly chosen baseline absolute fitness drawn from a uniform distribution (Fig 5). With more information about a particular biological system, these distributions can be tuned to better represent what the actual distribution might be. After interactions within groups, the sum of each individual's baseline fitness and benefits gained by interacting with cooperators determines the mean and variance of their final fitness distribution (Poisson distribution). This is the number of offspring that a particular individual will produce over its lifetime (one generation). This additional random step (Poisson distribution) takes into account random variation other than interactions with neighboring individuals (e.g. environmental, reproduction).
Simulation 2: We set the population each generation (non-overlapping) by assuming group size constant (4 groups of 4 haploid, asexual individuals each) within the simulation and mathematically. With this assumption, we can scale individual's cooperation experienced for cooperators and non-cooperators, so that the maximum is 1: r A g ¼ n A g À 1 GÀ 1 and r S g ¼ n A g GÀ 1 , respectively (Table 4). To calculate the average cooperation experienced for cooperators and non-cooperators, plug these terms into the � r A and � r S equations written out above for Simulation 1.
In order to magnify the role of variance and expose its influence on the evolutionary outcome of these groups, we will also set an exact configuration of individuals described in the Results. Again,Ĉ is a fixed value and B and P do not covary. In this case, cooperators do not benefit from their own diffusion products. The interaction between individuals of a group can be thought of in two ways: 1) each individual has three independent one-on-one interactions over the generation (e.g. single play game strategies) or 2) each individual has one interaction with all three individuals at the same time (e.g. swarming). We are assuming the benefits gained from each of these scenarios is the same, though, in reality they could be different.
We assume that an individual's fitness is determined as in Simulation 1 -baseline absolute fitness is randomly chosen from a uniform distribution, interactions between individuals take place within groups and the total expected number of offspring is randomly selected from a Poisson fitness distribution.