Individualised aspiration dynamics: Calculation by proofs

Cooperation is key to the evolution of biological systems, ranging from bacterial communities to human societies. Evolutionary processes can dramatically alter the level of cooperation. These processes typically fall into two classes: comparison-based and self-evaluation-based. The fate of cooperation is extremely sensitive to the details of comparison-based processes. For self-evaluation processes, however, it is unclear whether this sensitivity remains. We concentrate on a class of self-evaluation processes based on aspiration, in which all individuals adjust their behaviour according to their own aspirations. We prove that, for regular networks in the weak selection limit, the evolutionary outcome with heterogeneous aspirations is the same as that with homogeneous aspirations. Simulation results further suggest that this also holds for general networks across various distributions of personalised aspirations. Our result clearly indicates that self-evaluation processes are robust, in contrast with comparison-based rules. In addition, it greatly simplifies the calculation of aspiration dynamics, which is otherwise computationally expensive.

In the following, we show the proofs of the three statements that are crucial for the theorem in the main text.
The average abundance is 1/2 for vanishing selection intensity

Statement: If the selection intensity vanishes, i.e., $\beta = 0$, the average abundance of strategy A is one half for all individualised aspirations and all aspiration-based decision-making functions.
Proof: Denote by $q_i$ the state of individual $i$, where $i$ ranges from $1$ to $N$. Let $q_i = 1$ if individual $i$ uses strategy A and $q_i = 0$ otherwise; thus $q_i$ is a random variable. The abundance of individuals using strategy A is $\sum_{i=1}^{N} q_i / N$, and the average abundance of strategy A is given by [1]. Individual $i$ changing its strategy from A to B refers to the case in which $q_i$ changes from $1$ to $0$. Based on the aspiration dynamics, this happens if individual $i$ is selected (with probability $1/N$) and then switches to the other strategy with probability $g_i(\beta(e_i - \pi_i))$. Herein, $e_i$ and $\pi_i$ are the aspiration and the current payoff of individual $i$, respectively. In particular, when $\beta = 0$, this probability is $g_i(0)$. Note that it is independent of both the payoff and the aspiration level. The transition matrix for the state of individual $i$ (states ordered as $1$, $0$) is
$$\begin{pmatrix} 1 - \frac{1}{N} g_i(0) & \frac{1}{N} g_i(0) \\ \frac{1}{N} g_i(0) & 1 - \frac{1}{N} g_i(0) \end{pmatrix}. \quad (1)$$
Since $g_i(0) > 0$, the Markov chain is irreducible and aperiodic. Thus there is a unique stationary distribution. It is given by the left eigenvector of Matrix (1) associated with eigenvalue $1$, i.e., $(\frac{1}{2}, \frac{1}{2})$. Then $E[q_i] = 1/2$ for all $i$ once the chain has evolved sufficiently long. Thus the average abundance of strategy A is $\sum_{i=1}^{N} E[q_i]/N = 1/2$. Furthermore, this is true for all aspiration levels and all decision-making functions.
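As a numerical sanity check (a sketch outside the proof; the values $N = 10$ and $g_i(0) = 0.3$ are arbitrary choices), power iteration on the two-state chain above recovers the stationary distribution $(\frac{1}{2}, \frac{1}{2})$:

```python
# Sketch: the neutral two-state chain of a single individual.
# Assumed illustrative values: N = 10 individuals, g_i(0) = 0.3.
N = 10
g0 = 0.3
p = g0 / N  # probability of being selected and then switching

# Transition matrix over the states (A, B); rows sum to 1.
P = [[1 - p, p],
     [p, 1 - p]]

# Power iteration from an arbitrary initial distribution.
dist = [0.9, 0.1]
for _ in range(10000):
    dist = [dist[0] * P[0][0] + dist[1] * P[1][0],
            dist[0] * P[0][1] + dist[1] * P[1][1]]

print(dist)  # converges to [0.5, 0.5]
```

Any switching probability in $(0, 1)$ gives the same limit, which is the point of the statement: the stationary distribution does not depend on $g_i(0)$.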
Remark: If the selection intensity vanishes, neither payoff nor aspiration plays a role when a strategy update occurs. This resembles neutral drift, resulting in equal abundance of strategies A and B.
The criterion is a linear inequality of payoffs and aspirations

Lemma 1: For decision-making functions $g_i$ with $0 < g_i < 1$ and positive derivative $g_i' > 0$ on the real line ($i = 1, 2, \ldots, N$), in the weak selection limit there exist parameters $\alpha_k, \omega_k$ ($k = 0, 1, \ldots, d-1$) and $\phi_i$ ($i = 1, 2, \ldots, N$), which depend neither on the payoff entries nor on the aspiration levels, such that if
$$\sum_{k=0}^{d-1} (\alpha_k a_k + \omega_k b_k) + \sum_{i=1}^{N} \phi_i e_i > 0,$$
then strategy A is more abundant than strategy B.
Proof: Firstly, based on the section "The average abundance is 1/2 for vanishing selection intensity", the average abundance of each strategy is exactly one half when the selection intensity is zero. This holds for any individualised aspirations and any decision-making functions, and thus in particular for decision-making functions $g_i$ with $0 < g_i < 1$ and positive derivative $g_i' > 0$. Therefore, the condition under which one strategy is more abundant than the other is equivalent to the condition that its abundance exceeds one half, the neutral value.
Secondly, let the $i$-th digit be $1$ if individual $i$ adopts strategy A and $0$ otherwise. A state of the aspiration dynamics on a network, viewed as a Markov chain, is then a binary code with $N$ digits, so the state space of the underlying Markov chain is of size $2^N$. The transition probability between two states is zero unless they differ in at most one digit. Without loss of generality, assume it is individual $i$ that is updating its strategy. The transition probability to a different state is
$$\frac{1}{N}\, g_i(\beta(e_i - \pi_i)).$$
Here $e_i$ is the aspiration and $\pi_i$ the current payoff of individual $i$. Since $g_i$ is differentiable at $\beta = 0$, the transition probability is differentiable at $\beta = 0$, and its first-order derivative with respect to $\beta$ at $\beta = 0$ is $\frac{1}{N} g_i'(0)(e_i - \pi_i)$. Furthermore, the payoff of individual $i$, i.e., $\pi_i$, is always a linear combination of all the payoff entries, no matter which strategy it uses. This is true for both accumulated and averaged payoffs. Thus the first-order derivative of any transition probability is a linear combination of the $a_k$ and $b_k$ ($k = 0, 1, \ldots, d-1$) and the $e_i$ ($i = 1, 2, \ldots, N$). In other words, the first-order derivative of any transition probability with respect to the selection intensity is of the form $\sum_k (\alpha_k a_k + \omega_k b_k) + \sum_i \phi_i e_i$, a linear combination of $a_k$, $b_k$ and $e_i$ with constant term $0$; that is, there is no term that contains neither payoff entries nor aspiration levels. Note that the coefficients $\alpha_k$, $\omega_k$ and $\phi_i$ are Taylor coefficients. They are independent of both payoff entries and aspiration levels; in fact, they depend only on the decision-making functions $g_i$, the population size and the underlying population structure.
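The first-order derivative above can be checked numerically. In this sketch we assume the Fermi function $g(x) = [1 + e^{-x}]^{-1}$ (so $g'(0) = 1/4$) and pick arbitrary values for $e_i$ and $\pi_i$:

```python
import math

# Numerical check: d/d(beta) of (1/N) g(beta*(e - pi)) at beta = 0
# equals (1/N) g'(0) (e - pi). Fermi function g assumed; values illustrative.
N, e, pi_i = 10, 0.7, 0.2

def trans(beta):
    return (1.0 / N) / (1.0 + math.exp(-beta * (e - pi_i)))

h = 1e-6
numeric = (trans(h) - trans(-h)) / (2 * h)   # central difference
analytic = (1.0 / N) * 0.25 * (e - pi_i)     # g'(0) = 1/4 for the Fermi function

print(numeric, analytic)  # both are approximately 0.0125
```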
Thirdly, the stationary distribution $\kappa_s$ ($s \in S$) is a rational function of the transition probabilities, with no constant term in either its denominator or its numerator; that is, there is no term in the numerator (or denominator) that does not contain transition probabilities. This result is similar to that in [2,3]. Here we provide an alternative proof. The stationary distribution $\kappa$ fulfills the linear equation $\kappa(P - I) = 0$, where $P$ is the transition probability matrix and $I$ is the identity matrix of order $2^N$. On the one hand, the adjugate matrix of $P - I$ satisfies $\mathrm{adj}(P - I)(P - I) = \det(P - I)\, I$. For a stochastic matrix $P$, every row sum is one, so $\det(P - I) = 0$. In particular, the first row of $\mathrm{adj}(P - I)$ fulfills $\mathrm{row}_1(\mathrm{adj}(P - I))(P - I) = 0$. On the other hand, the stationary distribution of an ergodic Markov chain is unique. Thus $\kappa$ is $\mathrm{row}_1(\mathrm{adj}(P - I))$ after normalisation, and $\kappa_s$ turns out to be a fraction whose numerator is $[\mathrm{row}_1(\mathrm{adj}(P - I))]_s$ and whose denominator is $\sum_{s \in S} [\mathrm{row}_1(\mathrm{adj}(P - I))]_s$.
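The claim that the stationary distribution is the normalised first row of $\mathrm{adj}(P - I)$ can be illustrated on a small ergodic chain (a sketch; the $3 \times 3$ matrix below is an arbitrary example, not a chain from the model):

```python
# Sketch: kappa is proportional to the first row of adj(P - I).
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]
n = 3
M = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

def minor_det(A, drop_row, drop_col):
    # Determinant of the 2x2 minor with one row and one column removed.
    rows = [r for r in range(n) if r != drop_row]
    cols = [c for c in range(n) if c != drop_col]
    return (A[rows[0]][cols[0]] * A[rows[1]][cols[1]]
            - A[rows[0]][cols[1]] * A[rows[1]][cols[0]])

# Row 1 of adj(M) consists of the cofactors of M's first column.
row1 = [(-1) ** j * minor_det(M, j, 0) for j in range(n)]
kappa = [v / sum(row1) for v in row1]

# Cross-check against long-run frequencies from power iteration.
dist = [1.0, 0.0, 0.0]
for _ in range(2000):
    dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

print(kappa, dist)  # the two distributions agree
```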
In other words, each entry of $P - I$ is a linear combination of the (updating functions) $g_i$. Note that the numerator of $\kappa_s$, i.e., $[\mathrm{row}_1(\mathrm{adj}(P - I))]_s$, is a determinant of a square submatrix of $P - I$, and hence a sum of products of entries of $P - I$ with no additional constant term. By the same argument, the denominator of $\kappa_s$ is also a sum of products of entries of $P - I$. In other words, the stationary distribution $\kappa_s$ is a rational function of all the transition probabilities. Expanding the numerator and denominator of $\kappa_s$ to first order in the selection intensity yields
$$\kappa_s = \frac{l_0 + l_1 \beta + O(\beta^2)}{w_0 + w_1 \beta + O(\beta^2)},$$
and the first-order derivative of the stationary distribution $\kappa_s$ at $\beta = 0$ is $(l_1 w_0 - w_1 l_0)/w_0^2$. Summing these derivatives over the states, weighted by the abundance of strategy A in each state, gives the first-order derivative of the average abundance of A, which is again a linear combination of payoff entries and aspiration levels with constant term zero. If this linear combination is positive, then strategy A is more abundant than strategy B. Furthermore, the coefficients depend neither on the payoff entries nor on the aspiration levels.
Remark: The proof consists of two parts. One establishes an equivalence between the original question and a new one, namely whether the selection intensity increases the abundance of a strategy. The other part uses perturbation analysis to arrive at the criterion. In particular, there are few constraints on the decision-making function $g_i$ beyond differentiability, which ensures that the average abundance is differentiable around $\beta = 0$, so that the Taylor expansion works.
The average abundance is 1/2 for neutral mutants for any selection intensity

If all the payoff entries are the same, every individual obtains the same payoff $\pi$ regardless of its strategy, so individual $i$ switches strategy with the same probability $g_i(\beta(e_i - \pi))$ in both states; the transition matrix for its state is analogous to Matrix (1), with $g_i(0)$ replaced by $g_i(\beta(e_i - \pi))$. Since $g_i(\beta(e_i - \pi)) > 0$, the Markov chain is irreducible and aperiodic. Thus there is a unique stationary distribution.
Remark: The proof is similar to that in the section "The average abundance is 1/2 for vanishing selection intensity". In fact, if all the payoff entries are the same, the setting resembles that of neutral mutants in population genetics. Thus the evolution of the strategy of any individual is effectively independent of the others. The critical assumption $g_i > 0$ ensures that every individual has a stationary distribution. By the symmetry of the transition probabilities of the Markov chain, the abundances of the two strategies are equal. Furthermore, this holds for any selection intensity.

Estimating coefficients
Non-negativity of the coefficients

We are going to show that i) $\sigma_k \ge 0$ and ii) $\sum_{k=0}^{d-1} \sigma_k > 0$. These indicate that all the coefficients are non-negative and at least one of them is positive.
For any $k \in \{0, 1, 2, \ldots, d-1\}$, let us choose the payoff table with $a_k = 1$ and all other entries zero. The theorem in the main text indicates that the first-order derivative of the abundance of strategy A is proportional to $\sigma_k$. Thus $\sigma_k$ is non-negative if and only if strategy A is greater than or equal to strategy B in abundance.
In the following, we make use of this equivalence to prove that $\sigma_k$ is non-negative. Let $x_i(t)$ denote the probability that individual $i$ uses strategy A at time $t$. Its dynamics are approximated by
$$\dot{x}_i = \frac{1}{N}\left[(1 - x_i)\, g_i\!\left(\beta(e_i - \pi_i^B(t))\right) - x_i\, g_i\!\left(\beta(e_i - \pi_i^A(t))\right)\right]$$
based on the mean-field method. Here $\pi_i^A(t)$ and $\pi_i^B(t)$ are the payoffs of individual $i$ if it uses strategy A and B, respectively. They are time dependent due to the strategy adjustments in, and the composition of, its neighbourhood. Yet we have $\pi_i^A(t) \ge 0$ and $\pi_i^B(t) = 0$ for our payoff table. Further, since $g_i$ is an increasing function, we have
$$\dot{x}_i \ge \frac{1}{N}\, g_i(\beta e_i)(1 - 2 x_i).$$
For the system $\dot{y} = \frac{1}{N} g_i(\beta e_i)(1 - 2y)$, $y^* = 1/2$ is a globally stable equilibrium, which indicates that $\lim_{t \to +\infty} y(t) = 1/2$. Based on the comparison principle for differential systems [4], we obtain $x_i(t) \ge y(t)$ for all $t > 0$; thus $\lim_{t \to +\infty} x_i(t) \ge 1/2$. The average abundance of strategy A is the average of these probabilities, and is therefore greater than or equal to one half. This implies that $\sigma_k \ge 0$.
In the following we show that the sum of all the coefficients is positive. Let us consider a payoff table with $a_k = 1$ and $b_k = 0$ for all $k \in \{0, 1, \ldots, d-1\}$. On the one hand, the first-order derivative of the average abundance of strategy A is proportional to $\sum_{k=0}^{d-1} \sigma_k$ by the theorem in the main text. On the other hand, for such a payoff table, the payoff of an individual using strategy A is $1$, independently of its neighbour configuration; similarly, the payoff of an individual using strategy B is always $0$.
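The convergence of the comparison system $\dot{y} = \frac{1}{N} g_i(\beta e_i)(1 - 2y)$ to $y^* = 1/2$ can be illustrated by explicit Euler integration (a sketch; the Fermi function and the parameter values are our own choices):

```python
import math

# Euler integration of dy/dt = (1/N) g(beta*e) (1 - 2y).
# Fermi g assumed; N, beta, e and the initial condition are illustrative.
N, beta, e = 10, 0.1, 0.3
g_val = 1.0 / (1.0 + math.exp(-beta * e))  # g(beta*e), a constant here

y, dt = 0.05, 0.1
for _ in range(100000):
    y += dt * (g_val / N) * (1.0 - 2.0 * y)

print(y)  # converges to the globally stable equilibrium y* = 0.5
```

The equilibrium is $1/2$ regardless of the (positive) prefactor $g(\beta e)/N$, which only sets the relaxation speed.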
Denote the state of individual $i$ as $1$ if it uses strategy A and $0$ if it uses strategy B. The transition matrix, which is homogeneous in time, is given by
$$\begin{pmatrix} 1 - \frac{1}{N} g_i(\beta(e_i - 1)) & \frac{1}{N} g_i(\beta(e_i - 1)) \\ \frac{1}{N} g_i(\beta e_i) & 1 - \frac{1}{N} g_i(\beta e_i) \end{pmatrix}.$$
Since $g_i(0) > 0$, we have $g_i(\beta e_i) > 0$ and $g_i(\beta(e_i - 1)) > 0$ for weak selection. This implies that the above Markov chain is irreducible and aperiodic. Thus there exists a stationary distribution, given by $\frac{1}{g_i(\beta e_i) + g_i(\beta(e_i - 1))}\,\big(g_i(\beta e_i),\, g_i(\beta(e_i - 1))\big)$ over the states $(1, 0)$. Since $g_i$ is increasing and the selection intensity $\beta > 0$, we have $g_i(\beta(e_i - 1)) < g_i(\beta e_i)$ for all selection intensities. This yields that state $1$ is strictly more probable than state $0$. The abundance of strategy A is simply the average of these probabilities, so strategy A is more abundant than strategy B. This is equivalent to the statement that the first-order derivative of the average abundance of strategy A is positive, yielding $\sum_{k=0}^{d-1} \sigma_k > 0$ by the theorem.
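For completeness, the stationary distribution above can be evaluated numerically (a sketch; the Fermi decision function and the values of $\beta$ and $e_i$ are arbitrary choices):

```python
import math

# One individual's chain when every A-payoff is 1 and every B-payoff is 0.
# Fermi decision function assumed; beta and e are illustrative values.
def g(x):
    return 1.0 / (1.0 + math.exp(-x))

beta, e = 0.1, 0.4

to_A = g(beta * e)          # switch B -> A: B's payoff is 0, so g(beta*(e - 0))
to_B = g(beta * (e - 1.0))  # switch A -> B: A's payoff is 1, so g(beta*(e - 1))

stationary_A = to_A / (to_A + to_B)  # probability of state 1 (strategy A)
stationary_B = to_B / (to_A + to_B)

print(stationary_A > stationary_B)  # True: state 1 is strictly more probable
```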

Calculation of coefficients
The method used here is similar to that in the SI of [5]. The basic idea is to employ pair approximation to obtain a deterministic equation, and then to apply perturbation theory to estimate the coefficients.
An individual, say $i$, is selected randomly from the entire population. This individual plays a $d$-player game with all of its neighbours and obtains payoff $\pi_i$. It switches to the other strategy with a probability based on the difference between its payoff and its aspiration, i.e., $[1 + \exp(-\beta(e_i - \pi_i))]^{-1}$. Here $e_i$ is the aspiration of the focal individual. In the following we assume $e_i = e$ for all $i = 1, 2, \ldots, N$, and we calculate the structure coefficients $\sigma_k$ ($k = 0, 1, \ldots, d-1$) in this case.
Let $p_A$ and $p_B$ denote the frequencies of A and B in the population. Let $p_{AA}$, $p_{AB}$, $p_{BA}$ and $p_{BB}$ denote the frequencies of AA, AB, BA and BB pairs. We use these six frequencies to approximate those of triplets and higher-order moments. Let $q_{X|Y}$ denote the conditional probability of finding an X-player given that the adjacent node is occupied by a Y-player ($X, Y \in \{A, B\}$).
Further, the following identities hold:
$$p_A + p_B = 1, \quad p_{AA} + p_{AB} = p_A, \quad p_{BA} + p_{BB} = p_B, \quad p_{AB} = p_{BA}, \quad q_{X|Y} = \frac{p_{XY}}{p_Y}.$$
They indicate that the system is determined by two independent variables, say $p_A$ and $p_{AA}$.
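These identities amount to simple bookkeeping: given the two independent variables, every other frequency follows. A sketch with illustrative values:

```python
# Pair-approximation bookkeeping. Given the two independent variables
# p_A and p_AA (illustrative values), all other frequencies follow.
p_A, p_AA = 0.6, 0.4

p_B = 1.0 - p_A
p_AB = p_A - p_AA   # from p_AA + p_AB = p_A
p_BA = p_AB         # symmetry of pairs
p_BB = p_B - p_BA   # from p_BA + p_BB = p_B

# Conditional neighbour frequencies q_{X|Y} = p_XY / p_Y.
q_A_given_A = p_AA / p_A
q_B_given_A = p_BA / p_A
q_A_given_B = p_AB / p_B
q_B_given_B = p_BB / p_B

print(p_AB, q_A_given_A)  # e.g. the AB-pair frequency and q_{A|A}
```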
Updating a B-player.
Concerning the change of $p_A$ via updating a player using strategy B: firstly, a strategy B individual is selected randomly (with probability $p_B$). The likelihood that it has $k_A$ strategy A neighbours and $k_B$ strategy B neighbours is given by $\binom{d-1}{k_A} q_{A|B}^{k_A} q_{B|B}^{k_B}$. Secondly, this strategy B individual switches to strategy A with probability $[1 + \exp(-\beta(e - b_{k_A}))]^{-1}$. Therefore, $p_A$ increases by $1/N$ with probability
$$\mathrm{Prob}\left(\Delta p_A = \frac{1}{N}\right) = p_B \sum_{k_A = 0}^{d-1} \binom{d-1}{k_A} q_{A|B}^{k_A} q_{B|B}^{k_B} \left[1 + e^{-\beta(e - b_{k_A})}\right]^{-1}.$$
Regarding the change of $p_{AA}$ via updating a B-player: if a strategy B individual with $k_A$ strategy A neighbours is selected and switches its strategy to A, the number of AA-pairs increases by $k_A$. Therefore, $p_{AA}$ increases by $k_A / L$, where $L = N(d-1)/2$ is the total number of links in the population.
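The switching probability of a randomly selected B-player, summed over its neighbour configurations, can be sketched as follows (Fermi function assumed; the function name and parameter values are our own):

```python
import math

# Sketch: probability that a selected B-player switches to A, summed over
# its d-1 co-player configurations. Fermi updating assumed; b[k] is the
# B-payoff with k co-players of strategy A (values illustrative).
def switch_prob_B(d, q_A_given_B, beta, e, b):
    total = 0.0
    for kA in range(d):
        kB = d - 1 - kA
        # Probability of having kA co-players of strategy A.
        config = (math.comb(d - 1, kA)
                  * q_A_given_B ** kA * (1 - q_A_given_B) ** kB)
        total += config / (1.0 + math.exp(-beta * (e - b[kA])))
    return total

p = switch_prob_B(d=3, q_A_given_B=0.5, beta=0.0, e=0.5, b=[0, 0, 0])
print(p)  # at beta = 0 every term uses g(0) = 1/2, so the total is 0.5
```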

Updating an A-player.
Concerning the change of $p_A$ via updating an A-player: firstly, a strategy A individual is selected randomly (with probability $p_A$). The likelihood that it has $k_A$ strategy A neighbours and $k_B$ strategy B neighbours is given by $\binom{d-1}{k_A} q_{A|A}^{k_A} q_{B|A}^{k_B}$. Secondly, this strategy A individual switches to strategy B with probability $[1 + \exp(-\beta(e - a_{k_A}))]^{-1}$. Therefore, $p_A$ decreases by $1/N$ with probability
$$\mathrm{Prob}\left(\Delta p_A = -\frac{1}{N}\right) = p_A \sum_{k_A = 0}^{d-1} \binom{d-1}{k_A} q_{A|A}^{k_A} q_{B|A}^{k_B} \left[1 + e^{-\beta(e - a_{k_A})}\right]^{-1}.$$
Regarding the change of $p_{AA}$ via updating an A-player: a strategy A individual with $k_A$ strategy A neighbours is selected (with probability $p_A \binom{d-1}{k_A} q_{A|A}^{k_A} q_{B|A}^{k_B}$), and then it switches to strategy B with probability $[1 + e^{-\beta(e - a_{k_A})}]^{-1}$. The number of AA-pairs then decreases by $k_A$.

Pair approximation.
Let us rescale the time step to $dt$, during which one strategy update happens. The expected changes of $p_A$ and $p_{AA}$ per update, derived above, then give the differential equations for $\dot{p}_A$ (Eq. (14)) and $\dot{p}_{AA}$ (Eq. (15)).

Approximating the coefficients.
Based on Eq. (14), $p_A$ is approximately the same as $p_B$ when the system is in the steady state under the weak selection limit; the difference is of first order in the selection intensity. Let the frequency of strategy A in the steady state be
$$p_A = \frac{1}{2} + Q\beta + O(\beta^2),$$
where $Q$ is the Taylor coefficient. Then $p_B = \frac{1}{2} - Q\beta + O(\beta^2)$ due to the normalisation condition $p_A + p_B = 1$. Based on Eq. (15), $p_{AA}$ is approximately equal to $p_{AB}$ in the stationary state under the weak selection limit, $\beta \ll 1$; the difference is of first order in the selection intensity. Taking into account Eqs. (9) and the identity $p_{AB} + p_{AA} + p_{BA} + p_{BB} = 1$ yields the pair frequencies
$$p_{XY} = \frac{1}{4} + O(\beta),$$
where $X, Y \in \{A, B\}$. Thus we have $q_{X|Y} = \frac{1}{2} + O(\beta)$ for all $X, Y \in \{A, B\}$.
Inserting Eqs. (16), (17) and (19) into Eq. (14) leads to an expansion of $\dot{p}_A$ to first order in the selection intensity. In the steady state we have $\dot{p}_A = 0$; solving this condition shows that strategy A is more abundant than strategy B whenever the corresponding linear combination of the payoff entries is positive. From it we explicitly obtain the structure coefficients. These coefficients are also valid for heterogeneous aspiration levels by our theorem.

Estimating the coefficients via simulation
Our theorem indicates that a simple linear inequality of the payoff entries suffices to identify which strategy is more abundant. In fact, the linear combination is proportional to the first-order derivative of the average abundance. In other words, the average abundance of strategy A, $p_A$, is approximately $\frac{1}{2} + Q^* L \beta$, where $L$ is the linear combination of payoff entries given by the theorem and $Q^*$ is a positive constant independent of the payoff entries.
In particular, consider the following payoff table:

    Number of co-players with strategy A:   0   1   2
    A:                                      x   y   z
    B:                                      0   0   0

The average abundance of strategy A, $p_A(x, y, z)$, on the ring under aspiration dynamics is approximately $\frac{1}{2} + Q^*(\sigma_0 x + \sigma_1 y + \sigma_2 z)\,\beta$.
In the weak selection limit, $\frac{1}{\beta}\left(p_A(x, y, z) - \frac{1}{2}\right)$ is approximated by $Q^*(\sigma_0 x + \sigma_1 y + \sigma_2 z)$. For each sample, we iterate the aspiration-based process for $10^6$ generations. The sample of the average abundance of strategy A is obtained by averaging over the last $10^5$ generations. Then we employ linear regression to estimate $\frac{1}{\beta}\left(p_A(x, y, z) - \frac{1}{2}\right)$. In other words, we assume that $\frac{1}{\beta}\left(p_A(x, y, z) - \frac{1}{2}\right)$ is approximated by $\tilde{\sigma}_0 x + \tilde{\sigma}_1 y + \tilde{\sigma}_2 z + \text{intercept}$. We set the confidence level to 95% for all the estimated coefficients. The estimated intervals are of the form $[\mathrm{EC} - \mathrm{ME}, \mathrm{EC} + \mathrm{ME}]$, where $\mathrm{EC}$ stands for the estimated coefficient and $2\,\mathrm{ME}$ is the length of the interval.
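The regression step can be sketched on synthetic data (a sketch only: the "true" coefficients below are made up for illustration, whereas in the actual procedure the responses come from simulation):

```python
import random

# Sketch: recovering coefficients of sigma_0*x + sigma_1*y + sigma_2*z
# (plus an intercept) by ordinary least squares on noisy synthetic data.
random.seed(1)
true_sigma = [0.2, 0.5, 0.3]  # hypothetical values, illustration only

rows, rhs = [], []
for _ in range(200):
    x, y, z = (random.uniform(-1, 1) for _ in range(3))
    rows.append([x, y, z, 1.0])  # design row with intercept column
    rhs.append(sum(s * v for s, v in zip(true_sigma, (x, y, z)))
               + random.gauss(0.0, 1e-3))

# Normal equations A^T A w = A^T b, solved by Gauss-Jordan elimination.
n = 4
ATA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
ATb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(ATA[r][col]))
    ATA[col], ATA[piv] = ATA[piv], ATA[col]
    ATb[col], ATb[piv] = ATb[piv], ATb[col]
    for r in range(n):
        if r != col:
            f = ATA[r][col] / ATA[col][col]
            ATA[r] = [a - f * c for a, c in zip(ATA[r], ATA[col])]
            ATb[r] -= f * ATb[col]
w = [ATb[i] / ATA[i][i] for i in range(n)]

print(w[:3])  # close to true_sigma; w[3] is the intercept, close to 0
```

In the actual estimation, the confidence intervals quoted in the text would come from the standard errors of these least-squares coefficients.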