Abstract
Repetition is a classic mechanism for the evolution of cooperation. The standard way to study repeated games is to assume that there is an exogenous probability with which every interaction is repeated. If it is sufficiently likely that interactions are repeated, then reciprocity and cooperation can evolve together in repeated prisoner’s dilemmas. Who individuals interact with, however, can also be under their control, at least to some degree. If we change the standard model so that it allows individuals to terminate the interaction with their current partner, and find someone else to play their prisoner’s dilemmas with, then this limits the effectiveness of disciplining each other within the partnership, as one can always leave to escape punishment. The option to leave can, however, also be used to get away from someone who is not cooperating, which also has a disciplining effect. We find that the net effect of introducing the option to leave on cooperation is positive; with the option to leave, the average amount of cooperation that evolves in simulations is substantially higher than without. One of the reasons for this increase in cooperation is that partner choice creates endogenous phenotypic assortment. Compared to the standard models for the co-evolution of reciprocity and cooperation, and to models of kin selection, our model thereby produces a better match with many forms of human cooperation in repeated settings. Individuals in our model end up interacting, not with random others that they cannot separate from once matched, or with others that they are genetically related to, but with partners that they choose to stay with, and that are as dependable not to defect as they are themselves.
Author summary
The two mechanisms studied most in the literature on the evolution of cooperation are population structure (or kin selection), and repetition, which can allow for reciprocity to evolve. In the literature on repeated games, it is typically assumed that the matching is random and exogenous. However, not all human interactions in which there is scope for cooperation take place between individuals that have no say in who they play with, or between individuals that are genetically related. In many interactions, individuals can decide to stay with their partner, or leave and find someone else to play their repeated games with. We show that if we include the option to leave in an otherwise classical setting of repeated interactions, partner choice can evolve and maintain higher levels of cooperation than reciprocity does in the standard setting, where individuals cannot leave their partner. This points to the power of partner choice.
Citation: Graser C, Fujiwara-Greve T, García J, van Veelen M (2025) Repeated games with partner choice. PLoS Comput Biol 21(2): e1012810. https://doi.org/10.1371/journal.pcbi.1012810
Editor: Christian Hilbe, Max Planck Institute for Evolutionary Biology: Max-Planck-Institut fur Evolutionsbiologie, GERMANY
Received: July 18, 2024; Accepted: January 20, 2025; Published: February 4, 2025
Copyright: © 2025 Graser et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: This paper does not use data. The code used in our simulations is publicly available on Github: https://github.com/cjgraser/Repeated-Games-and-Partner-Choice.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In the prisoner’s dilemma, repetition can stabilize cooperation. For cooperation to be stable, players need to condition their behaviour on the past actions of their interaction partner. If their partner does not cooperate, or does not cooperate enough, then reciprocal players respond with defecting, or with defecting more than they otherwise would. When faced with reciprocal partners, the self-interested thing to do can be to cooperate now in order to receive cooperation in the future. If prisoner’s dilemmas are repeated, this allows for reciprocity and cooperation to evolve together [1–19].
The standard setup in models for the co-evolution of reciprocity and cooperation assumes that randomly matched individuals are tied to their partner until the repeated game ends. Who plays with whom therefore is determined exogenously. In this paper, we allow for players to end their interaction with their current partner, and look for someone else to continue playing with. Real life interactions are heterogeneous in the degree to which humans are tied to their partners. Some types of interactions allow for easy ways to change partners, others impose higher thresholds for dissolving a partnership, but all interactions find themselves somewhere on the spectrum between the standard setting, where changing partners is not possible at all, and the setting of this paper, where partners can be left, and new partners can be found, after any round of the game.
We also assume that players are not informed about their new partner’s past choices. If players are informed about what their partner did in previous interactions with other players, then this could be used to enforce cooperation through norms [20], and it would allow for reputation building [21, 22], or indirect reciprocity [23, 24]. By considering a minimal setting in which no information is shared with new partners, we eliminate these possibilities. This way we isolate the role of partner choice in a minimalistic setting, without prior information (cf. [25–29]).
We analyze the evolutionary dynamics in a population playing the repeated prisoner’s dilemma, and we investigate whether the option to leave undermines or facilitates the evolution of cooperation. There are a number of papers with a setup in which players have the option to leave. There are theory papers with repeated prisoner’s dilemmas, or public goods games, in which there is the option to leave [30–50], and there are empirical papers with a somewhat similar setup [51–56]. There are also theory papers in which retaliating by defecting, or by cooperating less, is not an option, but leaving is [27, 57–61].
While most of these papers do not combine a full game-theoretical analysis with studying the evolutionary dynamics, this literature does contain findings that are relevant for the dynamics. The most important one is a result that states that with the option to leave, there are no equilibria in which all players start cooperating in the first round of every new interaction ([41], see also [31, 35, 36]). The rationale for this is straightforward. Any population in which all individuals do start cooperating right from the beginning can be invaded by a mutant that takes advantage of this, by defecting and leaving after the first period. Such a mutant would get the highest possible payoff in every round, while the resident could at most earn an average payoff equal to the payoff of mutual cooperation. Without the option to leave, there are equilibria with full cooperation, and the fact that fully cooperative equilibria do not exist if leaving is allowed for is a reflection of the downside of the option to leave, which would allow for cheaters to get away with defection and escape punishment whenever cooperation starts in the first round.
To prevent invasions by defect-and-run mutants, a simple solution could be to start every new partnership with a defection. Depending on the parameters of the game, however, one round of mutual defection may not be enough to avoid exploitation. For some combinations of the benefit-to-cost ratio and the continuation probability, starting to cooperate in round 2 may still leave the door open for a mutant that sits out one round of mutual defection, then defects on a resident that starts cooperating in round 2, and subsequently leaves in order to repeat this with its next partner. In Theorem 4 in S1 Text we specify the minimum length of this initial string of defections. This theorem is a simpler version of a result that implies that as soon as this threshold is met, equilibria that cooperate afterwards do exist [41]. The initial string of defections is sometimes also referred to as the trust-building phase [41–43, 47, 48]. This label may not perfectly match what we think of as trust-enhancing behaviour, but the idea is that what builds trust here is the staying, in anticipation of future mutual cooperation, and not the defecting. We will follow the existing literature in using this term.
The fact that the option to leave rules out fully cooperative equilibria, and may require multiple periods of trust-building, suggests that leaving might be bad for the evolution of cooperation. Below, we will see that this is not the case; average amounts of cooperation go up rather than down if the option to leave is added. There is a variety of reasons why the option to leave can also foster higher average cooperation rates. In Section Relative stability of cooperative equilibria with and without leaving, we will see that punishing by leaving creates endogenous assortment, and that this assortment can make equilibria that punish by leaving more stable than similar equilibria that punish with defection. In Section Getting away from AllD at a higher rate we will also see that transitions out of fully defecting equilibria happen more readily when leaving is allowed for.
Materials and methods
The model setup
In this paper, we restrict attention to prisoner’s dilemmas with equal gains from switching [4], where the cost of cooperating instead of defecting is c, irrespective of whether the opponent cooperates or defects, and the benefits to the other player are b, again irrespective of what the opponent plays herself.
In order to simplify the notation and analysis further, and without loss of generality, we normalize the costs to c = 1. This means that we can interpret the b in the payoff matrix as the benefit-to-cost ratio.
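To make the payoff structure concrete, the stage game with equal gains from switching can be sketched as a small Python function (the function name is ours, chosen for illustration): cooperating always costs the actor c = 1, and always delivers b to the partner, independently of what the partner plays.

```python
# Stage-game payoffs in a prisoner's dilemma with equal gains from
# switching: cooperating costs the actor c = 1 (normalized) and gives
# the partner b, regardless of what the partner plays.
def stage_payoffs(own_action, partner_action, b):
    """Return this player's stage-game payoff; actions are 'C' or 'D'."""
    cost = 1 if own_action == "C" else 0
    benefit = b if partner_action == "C" else 0
    return benefit - cost

# With b = 3: mutual cooperation pays b - 1 = 2, mutual defection 0,
# and a defector facing a cooperator collects the full benefit b = 3.
print(stage_payoffs("C", "C", 3))  # 2
print(stage_payoffs("C", "D", 3))  # -1
print(stage_payoffs("D", "C", 3))  # 3
print(stage_payoffs("D", "D", 3))  # 0
```

With c normalized to 1, the "equal gains from switching" property is visible directly: switching from C to D raises one's own payoff by exactly 1, whatever the partner does.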
Strategies are represented by finite state automata (FSAs). Fig 1 depicts an example of an FSA. The colours of the states represent the output when the FSA is in this state: red means defect; blue means cooperate; and black means that this FSA terminates the interaction. All FSAs start in the leftmost state when they begin interacting with a new partner, and the arrows indicate to which state the FSA moves in response to their partner’s action. After termination, the FSA does not have to transition to any state; it restarts the interaction with its new partner in the leftmost state. Representing strategies as FSAs allows agent-based simulations to explore a very rich space of strategies; any conceivable strategy can be approximated arbitrarily closely by an FSA, and all FSAs can be reached by a finite sequence of mutations [17, 62].
This finite state automaton (FSA) has three states; in state 1, the output is D (red); in state 2 the output is C (blue); and in state 3, the output is to leave (black). The arrows indicate to which state the strategy goes, after observing an action (C or D) by its opponent. The FSA starts every interaction in the leftmost state.
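A minimal Python sketch of this representation (class and variable names are ours, for illustration only) encodes the three-state automaton of Fig 1: state 0 defects, state 1 cooperates, state 2 leaves; any first-round action moves the automaton to the cooperative state, and a defection observed there makes it leave.

```python
# A minimal sketch of an FSA strategy. Each state has an output
# ('C', 'D', or 'L' for leave); transitions are indexed by the
# partner's last observed action.
class FSA:
    def __init__(self, outputs, transitions):
        self.outputs = outputs          # output per state
        self.transitions = transitions  # transitions[state][partner_action]
        self.state = 0                  # new interactions start in state 0

    def act(self):
        return self.outputs[self.state]

    def observe(self, partner_action):
        self.state = self.transitions[self.state][partner_action]

    def reset(self):                    # called after re-matching
        self.state = 0

# The automaton of Fig 1: defect once, then cooperate, and leave as
# soon as the partner defects after the first round.
c1 = FSA(outputs=["D", "C", "L"],
         transitions=[{"C": 1, "D": 1},   # trust-building round
                      {"C": 1, "D": 2},   # cooperate; leave if defected on
                      {}])                # terminal: interaction ends

alld = FSA(outputs=["D"], transitions=[{"C": 0, "D": 0}])

# Play the Fig 1 automaton against AllD: round 1 (D, D), round 2 (C, D),
# after which it leaves.
history = []
while True:
    act1, act2 = c1.act(), alld.act()
    if "L" in (act1, act2):
        break
    history.append((act1, act2))
    c1.observe(act2); alld.observe(act1)
print(history)  # [('D', 'D'), ('C', 'D')]
```

The `reset` method captures the rule stated above: after termination, the automaton simply restarts in the leftmost state with its next partner.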
In the model, we assume that individuals are matched to play the prisoner’s dilemma. After every round, each pair is broken up exogenously with probability 1 − δ, where δ ∈ (0, 1). Pairs can also be broken up because one of the players, or both, choose to end the interaction. All broken-up pairs go to the matching pool, in which they are re-matched before the subsequent stage game starts. Re-matching happens uniformly at random; all pairs of individuals from the matching pool are equally likely to be formed. The matching pool is not a random draw from the population as a whole. If it contained only individuals coming from pairs that are broken up exogenously, then the frequencies in the matching pool would match the frequencies in the population as a whole. However, the matching pool also contains individuals that chose to leave their partner, and individuals whose partner left them. Whether that happens is determined by the combination of strategies in the pair.
For the theoretical results, we assume an infinitely large population. This allows us to calculate payoffs for all strategies that are present in the population, assuming that the dynamics of separating and matching reach a steady state. This steady state, or short-run equilibrium, describes the shares of pairs consisting of different combinations of two strategies, and the round of the game they find themselves in. Here we will give a simple example to illustrate how that works.
If there is only one strategy present in the population, and if that strategy never leaves its partner, then all pairs consist of two individuals that both play this one strategy. In principle it is possible that all pairs are in their first round of play. As pairs are broken up randomly, however, over time, the population will converge to a state in which the ratio of pairs in their first round to those in their nth round is 1 to δ^(n−1). The intuition for this is that all new pairs start in round 1, while the probability for any pair making it to the nth round is δ^(n−1) (see Table 1).
We then calculate the payoffs by taking a weighted average over the payoffs in the different rounds, where the weight of the payoffs in the nth round is proportional to δ^(n−1). Without the option to leave, averaging all stage game payoffs over the population at a steady state is equivalent to the standard calculation of expected payoffs in repeated games (see S1 Text).
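The weighted average above can be sketched in a few lines of Python (the function name and the truncation horizon are our own choices): the average per-period payoff is the sum of stage payoffs with weights (1 − δ)δ^(n−1), which sum to one over all rounds n.

```python
# Weighted average of stage-game payoffs over pair ages, under the
# steady-state assumption in the text: the share of pairs in round n
# is proportional to delta**(n-1), so the normalized weight of round n
# is (1 - delta) * delta**(n-1).
def average_per_period_payoff(payoff_stream, delta, horizon=10_000):
    """payoff_stream(n) is the stage payoff in round n (1-indexed)."""
    return (1 - delta) * sum(
        delta ** (n - 1) * payoff_stream(n) for n in range(1, horizon + 1)
    )

# Example: a pair that defects in round 1 and cooperates from round 2
# on earns 0, then b - 1 per round; the weighted average works out to
# delta * (b - 1).
b, delta = 3.0, 0.8
avg = average_per_period_payoff(lambda n: 0.0 if n == 1 else b - 1, delta)
print(round(avg, 6))  # 1.6
```

The closed form δ(b − 1) follows from the geometric series: (1 − δ) Σ_{n≥2} δ^(n−1)(b − 1) = δ(b − 1).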
If the population includes strategies that may choose to leave, however, the calculation of such a steady state becomes more complicated [41, 47]. As mentioned above, if there are combinations of strategies in which one of the partners chooses to leave, then the shares of the different strategies in the matching pool and the shares of the different strategies in the population as a whole can diverge. We describe the way of calculating the short-run equilibrium and the expected payoffs that this short-run equilibrium implies in detail in S1 Text.
Long-run equilibrium and indirect invasions
The steady state or short-run equilibrium takes the composition of the population as given. In the long run, however, mutation and selection can change the composition of the population. A given composition of the population may or may not be a Nash equilibrium, and in order to determine if it is, we compare the payoffs of the strategies currently present in the population with each other, and with the payoffs that alternative strategies would have if they were to enter the population at an infinitesimally small frequency. We also use this separation of timescales when we apply other equilibrium concepts, like evolutionary stability, neutral stability, and robustness against indirect invasions, where we also assume that the population is in short-run equilibrium in order to calculate the expected payoffs of all strategies.
For repeated prisoner’s dilemmas without the option to leave, we know that there are many strategies that are neutrally stable (NSS) [6], and we know that there are no finite mixtures of strategies that are robust against indirect invasions (RAII) [11, 17, 63]. This means that in finite populations, every Nash equilibrium can be invaded indirectly; for every Nash equilibrium, there is a neutral mutant that, if it goes to fixation, opens the door for a second mutant, that then has a selective advantage. This theoretical result is matched by the fact that in simulations without the option to leave, all Nash equilibria are indeed left in due time. Moreover, for reasonably large population sizes, all of those transitions out of equilibria happen through indirect invasions [11, 17].
With the option to leave, this remains true; all finite mixtures that are Nash equilibria can be invaded indirectly (see Theorems 1 and 2 in S1 Text for a formal proof that, also with the option to leave, there are no pure Nash equilibria that are RAII; the extension to finite mixtures is also discussed in S1 Text). The reason is similar to the reason without the option to leave, and is explained most easily for pure equilibria. For pure equilibria with positive amounts of cooperation, this cooperation needs to be stabilized with the threat of punishment—which can be to defect (or to defect more than the strategy would otherwise), or to leave. When a population finds itself in such an equilibrium, this punishment is not executed. A mutant that has lost the capacity to punish therefore would be neutral. If random drift allows this neutral mutant to go to fixation, it opens the door for a second mutant that takes advantage of the loss of the capacity to punish (Fig 2).
AllC is a neutral mutant of Tit-for-Tat; both always cooperate with copies of themselves and each other. If AllC goes to fixation by neutral drift, AllD can invade.
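The payoff logic behind Fig 2 can be checked numerically. The sketch below (our own illustration; strategy functions map the partner's action history to a move) computes average per-period payoffs with the δ^(n−1)-weighting from the Materials and methods section, for a setting without leaving.

```python
# Average per-period payoff of strategy `me` against `other`, with
# round-n payoffs weighted by (1 - delta) * delta**(n-1). A strategy
# is a function from the partner's action history to 'C' or 'D'.
def avg_payoff(me, other, b, delta, horizon=5000):
    hist_me, hist_other, total = [], [], 0.0
    for n in range(1, horizon + 1):
        a1, a2 = me(hist_other), other(hist_me)
        payoff = (b if a2 == "C" else 0) - (1 if a1 == "C" else 0)
        total += (1 - delta) * delta ** (n - 1) * payoff
        hist_me.append(a1); hist_other.append(a2)
    return total

tft  = lambda h: "C" if not h else h[-1]   # Tit-for-Tat
allc = lambda h: "C"                       # AllC
alld = lambda h: "D"                       # AllD

b, delta = 3.0, 0.8
# Against Tit-for-Tat, defecting forever yields only (1 - delta) * b = 0.6,
# less than the b - 1 = 2.0 from mutual cooperation: AllD cannot invade.
print(avg_payoff(alld, tft, b, delta), avg_payoff(tft, tft, b, delta))
# Against the neutral mutant AllC, AllD earns b = 3.0 > 2.0, and invades.
print(avg_payoff(alld, allc, b, delta), avg_payoff(allc, allc, b, delta))
```

Against Tit-for-Tat and against AllC themselves, AllC earns exactly what Tit-for-Tat earns, which is what makes AllC a neutral mutant; the door only opens once AllC has drifted to fixation.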
An equilibrium with defection only can also be invaded indirectly. Without the option to leave, this would require a mutant that reciprocates cooperation if its partner initiates it. This is a neutral mutant, and it would open the door for a second mutant that reaps the rewards for initiating cooperation (see Fig 3). This stepping stone path requires a minimum δ for the second mutant to have a payoff advantage, and it remains available as a path out of full defection if leaving is allowed for.
Suspicious Tit-for-Tat is a neutral mutant of AllD; both always defect with copies of themselves and each other. If Suspicious Tit-for-Tat goes to fixation by neutral drift, C-Tit-for-Tat can invade.
With the option to leave, there is also an additional stepping stone path out of fully defecting strategies. A strategy that defects and leaves would be a neutral mutant of any fully defecting strategy. If this strategy takes over the population, it opens the door for a mutant that defects, stays, and cooperates forever after, if it finds its partner has stayed as well (see Fig 4). Importantly, this path out does not require a minimum δ to constitute an indirect invasion. For low δ, equilibria without any cooperation therefore are less stable with the option to leave than they are without the option to leave.
D-leave is a neutral mutant of AllD. If D-leave has gone to fixation, c1 can invade.
For sufficiently high b and δ, there are stepping stone paths out of any Nash equilibrium, both with and without the option to leave. We therefore expect populations to visit a variety of equilibria, and to transition between them through indirect invasions. Which strategies are and which are not Nash equilibria, however, differs between the two settings. Tit-for-Tat, for example, is an equilibrium without the option to leave, provided that b and δ are sufficiently high, while it stops being an equilibrium if leaving is allowed for. The reason why it cannot be an equilibrium with the option to leave is that it cooperates in the first round. The strategy c1, as depicted in Fig 1, on the other hand, is an equilibrium when leaving is possible, provided that b and δ are sufficiently high, but since it has a state in which it leaves, this obviously is not a feasible strategy if leaving is not allowed for.
Results
Simulations with and without the option to leave
The simulations do not have an infinitely large population. Moreover, because it is a simulation and not a theoretical model, we cannot simply assume that the population is always in short-run equilibrium. However, even a moderately large finite population tends to be relatively close to a short-run equilibrium almost all of the time. More importantly, the long-run dynamics we see in the simulations match what the theory predicts, as we observe sequences of indirect invasions.
The comparison we make here is straightforward. In one set of simulations, the output in any state of an FSA can only be to cooperate or to defect. In the other set of simulations, apart from the initial state, the output in all other states can also be to leave. In the first set, without the option to leave, the model then reverts to the standard model of the repeated prisoner’s dilemma, with a continuation probability that is equal to the probability δ with which pairs are not broken up exogenously (see S1 Text for technical details). We then ran the simulations for a range of b’s (which should be interpreted as the benefit-to-cost ratio, since we normalized the c’s to 1), and a range of δ’s, where 1 − δ is the exogenous breakup probability. Comparing the average amount of cooperation with and without the option to leave, we find that the option to leave elevates cooperation levels substantially. For all combinations of b and δ, cooperation is at least as high with the option to leave as it is without, and the difference is sizable; if we take the average amount of cooperation over the parameter space without leaving—that is: for δ from 0.01 to 0.95 in steps of 0.02, and for b from 1 to 6 in steps of 0.1—and compare it to the average amount with the option to leave, then the latter is 42% higher. We should obviously not attach deeper meaning to this exact number, because it is the result of a somewhat arbitrary choice to stop at b = 6. If we were to restrict the parameter space to benefit-to-cost ratios between 1 and 5, the difference would be larger than 42%, and if we restrict it to benefit-to-cost ratios between 1 and 7 the gap would be a bit smaller. The number does however justify summarizing the observation that, compared to panel A in Fig 5, in panel B the cooperation levels are lifted up to a substantial degree.
Simulation results with alternative mutation procedures suggest that this is not an artefact of the particulars of the mutation procedure (see S1 Text).
Panel A reflects average cooperation levels for a range of benefit-to-cost ratios b and continuation probabilities δ without the option to leave. Panel B does the same, but with the option to leave. The population size is N = 100. Selection or mutation steps happen at a rate of 0.05 per stage game per matched pair. At a selection or a mutation step, the pair is broken up, and the strategies are replaced by offspring from strategies in the current population, in case of selection, or by a mutant. The rate at which selection events or mutations happen implies that δ has an upper bound of 0.95. In expectation, one mutation happens per 250 selection events. The color scale in panels A and B runs from 0 to 0.785, which is the highest average cooperation level in panel B. Below the dotted line, no cooperative equilibria exist, both with and without the option to leave. Panel C displays the difference in average cooperation levels between A and B.
A change from a setting without to a setting with the option to leave implies an expansion of the set of strategies. FSAs that only have states in which they cooperate or defect are obviously allowed for, both when leaving is possible, and when it is not. FSAs that also have states in which the output is that it leaves, on the other hand, are only included in the set of strategies when leaving is allowed for. The expansion of the set of strategies means that in the equilibrium analysis, there are more mutants to consider. For some strategies that are equilibria without the option to leave, that means that they stop being equilibria with the option to leave. This includes strategies that start cooperating in the first round, as noted at the end of the Introduction. Other strategies continue to be Nash equilibria, but might nonetheless be left through indirect invasions at a higher or lower rate. On the other hand, extending the set of strategies not only means that there are more mutants to consider, but also more residents that can be equilibria.
In the following sections, we will try to identify reasons why there is more cooperation with the option to leave than there is without. All of the ingredients mentioned above will be part of the answer. In the next section we show that in the version of the game with the option to leave, there are indeed new equilibria that punish by leaving, and we will show that these equilibria are relatively stable. In Section Getting away from AllD at a higher rate we will point to the fact that fully defecting equilibria are invaded at higher rate if leaving is allowed for.
Relative stability of cooperative equilibria with and without leaving
In order to see how the option to leave can add equilibria that are more stable than similar equilibria without the option to leave, we turn to an example. In this example, we compare an equilibrium strategy that punishes with defection, and one that punishes with leaving. The strategy that punishes with defection is labelled g1, and it is best described as Grim Trigger preceded by a 1-period trust-building phase. The other strategy is c1, which also has a 1-period trust-building phase, but responds to defection after the first period by leaving (see Fig 6). If b and δ are sufficiently high, both are Nash equilibria—although there is an intermediate part of the parameter space where g1 is, and c1 is not (yet) an equilibrium. The reason for this is that after the second round, a mutant AllD in a population where all others play g1 gets a payoff of 0 until the pair is broken up exogenously, while a mutant AllD in a population where all others play c1 is left after the second round, and can extract benefits from their new partner. We will return to this below. When playing against copies of themselves, both of these strategies play one round of defection, and then cooperate until the pair is broken up exogenously. The only difference is that one punishes deviations with forever defection, and the other with leaving.
The payoffs of g1 against g1, and g1 against dC are the same as the payoffs of c1 against c1, and c1 against dC. The payoffs of dC against g1 and dC against c1 are also the same. The payoffs of AllD against g1 and AllD against c1 differ, but if dC goes to fixation before AllD arises, then both indirect invasions are equally likely to succeed.
Both strategies can be invaded indirectly in the same way. A strategy that is identical to g1, or to c1, respectively, but that loses the ability to punish would be a neutral mutant for both. Given that the only difference between g1 and c1 is the way they punish, such a neutral mutant ends up being the exact same strategy for both; this would be dC (see Fig 6). Strategy dC is neutral, both for g1 and for c1, and therefore it has a fixation probability of 1/N, where N is the population size.
If we can assume that the first mutant has either gone extinct, or gone to fixation, before the next mutant appears, then these indirect invasions into either g1 or c1 are equally likely to succeed, because also the second step in the indirect invasion is identical.
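The 1/N fixation probability of the neutral mutant dC can be verified with the standard gambler's-ruin formula for Moran birth-death processes (a sketch under our own parameterization; `gamma(j)` denotes the death/birth ratio of the mutant when it has j copies).

```python
# Fixation probability of a single mutant in a Moran birth-death
# process, via the standard formula
#   rho = 1 / (1 + sum_{k=1}^{N-1} prod_{j=1}^{k} gamma(j)),
# where gamma(j) is the ratio of the mutant's death rate to its birth
# rate when j copies of the mutant are present.
def fixation_probability(gamma, N):
    total, prod = 1.0, 1.0
    for k in range(1, N):
        prod *= gamma(k)
        total += prod
    return 1.0 / total

# A neutral mutant, like dC among g1 or c1 residents, has gamma(j) = 1
# for all j, which gives exactly 1/N.
N = 100
print(fixation_probability(lambda j: 1.0, N))  # 0.01
```

For any gamma uniformly above 1 (a disadvantaged mutant) the same formula gives a fixation probability below 1/N, which is why only neutral or advantageous mutants matter for the stepping stone paths discussed here.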
If the mutation rate is not low enough to justify this assumption, however, there is a difference. If AllD enters the population at a point in time at which both the resident (g1 or c1) and the first mutant (dC) are still present, this takes the population to the interior of the simplex (i.e., to a mix of all three strategies). In the interior of the simplex, the replicator dynamics are different, and since the replicator dynamics are also informative about the average dynamics in finite populations, the properties of the finite population dynamics will be different too. A key ingredient for this difference is that the presence of both c1 and AllD in the same population creates assortment, while with g1 and AllD, this is not the case.
The easier way to see this difference is to first focus on a population of just g1 and AllD, and compare it to a population of c1 and AllD. With g1 and AllD, all strategies just stay together, and no leaving happens in any combination. That means that we can use the standard replicator dynamics

ẋ = x (1 − x) [π_g1(x) − π_AllD(x)],

where x is the share of g1-players in the population,

π_g1(x) = x π(g1, g1) + (1 − x) π(g1, AllD),

and

π_AllD(x) = x π(AllD, g1) + (1 − x) π(AllD, AllD).
The four constants in the payoff matrix are the average per-period payoffs for the four possible pairs, which is (1 − δ) times the total discounted expected payoffs that we normally use in settings without leaving. This normalization only affects the speed of the replicator dynamics.
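Our reading of how g1 and AllD play against each other gives the following payoff streams (one round of mutual defection between two g1-players, one exploited cooperation between g1 and AllD, nothing thereafter); the sketch below computes the four constants from these streams, with b = 3 and δ = 0.8 as in Fig 7.

```python
# The four constants of the payoff matrix, computed as
# (1 - delta)-weighted sums of the round-by-round payoffs to the row
# player. The streams encode our reading of the strategies: g1 is one
# trust-building defection followed by Grim Trigger.
def avg(stream, delta, horizon=5000):
    return (1 - delta) * sum(delta ** n * p
                             for n, p in enumerate(stream(horizon)))

b, delta = 3.0, 0.8
streams = {
    ("g1", "g1"):     lambda h: [0.0] + [b - 1] * (h - 1),      # D, then C forever
    ("g1", "AllD"):   lambda h: [0.0, -1.0] + [0.0] * (h - 2),  # one wasted C
    ("AllD", "g1"):   lambda h: [0.0, b] + [0.0] * (h - 2),     # one free b
    ("AllD", "AllD"): lambda h: [0.0] * h,
}
constants = {pair: avg(s, delta) for pair, s in streams.items()}
for pair, v in constants.items():
    print(pair, round(v, 6))
# ('g1', 'g1')     -> delta * (b - 1)         = 1.6
# ('g1', 'AllD')   -> -delta * (1 - delta)    = -0.16
# ('AllD', 'g1')   -> delta * (1 - delta) * b = 0.48
# ('AllD', 'AllD') -> 0.0
```

Note that π(AllD, g1) = δ(1 − δ)b vanishes as δ → 1: the longer pairs last, the more effectively g1 holds an AllD partner down to zero.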
With c1 and AllD, on the other hand, the c1-players stick together, while they dissociate from the AllD players. That implies that the share of c1-players that is playing with other c1-players stops being linear in the total share of c1-players present in the population. The average payoffs therefore are also no longer linear in the shares of the two strategies, and we need to write the replicator dynamics in a more general form

ẋ = x (1 − x) [π̄_c1(x) − π̄_AllD(x)],

where x is the share of c1-players, π̄_c1(x) is the average per-period payoff of the c1-players, and π̄_AllD(x) is the average per-period payoff of the AllD players, both evaluated at the steady state of the matching dynamics.
S1 Text shows how the average per-period payoffs π̄_c1(x) and π̄_AllD(x) are calculated.
At very low frequencies of AllD, g1 actually does a better job at suppressing AllD payoffs, because g1 “binds” the mutant AllD players, and after allowing AllD to get a payoff of b once, g1 then holds AllD down to a payoff of 0 in all subsequent periods. Strategy c1, on the other hand, cuts AllD loose, which allows it to go on and exploit other c1-players. This implies that at the point of invasion (on the very left of panels C and D of Fig 7), AllD actually gets higher payoffs with c1 than it does with g1.
The red lines in panels A and B delineate the basins of attraction of AllD. Besides the basin of attraction of AllD being smaller in panel B, on trajectories in the interior that do not converge to AllD, more dC is weeded out along the way. Panels C and D provide details relevant to the replicator dynamics on the left edges of the simplex. Gray lines indicate assortment, or relatedness, calculated as the probability with which type i individuals are matched with other type i individuals minus the probability with which an individual of type j ≠ i is matched with type i. In other words, r = P(i|i) − P(i|j) for j ≠ i. Also in the interior of the simplex in panel B, but not in panel A, there will be assortment. Payoffs, assortment, and the replicator dynamics are all calculated under the assumption that the distribution of players over pair-types, and over rounds of play, is stationary [47]. The parameter values used are b = 3 and δ = 0.8.
At higher frequencies of AllD, however, the assortment that c1 creates by staying with other c1’s, but dissociating from AllD’s implies that AllD’s mostly find other AllD’s in the matching pool. This assortment suppresses the payoffs to those that play AllD, when the resident is c1. At low frequencies, AllD payoffs therefore are suppressed more when the resident is g1, while at higher frequencies, AllD payoffs are suppressed more when the resident is c1. For higher b and δ, the latter effect overpowers the former, making c1 more stable against AllD than g1 is against AllD.
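The endogenous assortment can be made visible with a small agent-based sketch (our own illustration, not the paper's simulation code): because c1 leaves AllD partners after the trust-building round while c1-c1 and AllD-AllD pairs persist, mixed pairs are short-lived, and broken-up AllD players mostly re-meet each other in the matching pool.

```python
import random

# Agent-based sketch of endogenous assortment between c1 and AllD,
# with delta = 0.8. A c1-AllD pair dissolves at the end of round 2
# (c1 leaves after being defected on past the first round); all pairs
# also break up exogenously with probability 1 - delta per round.
random.seed(1)
DELTA = 0.8
N_C1, N_ALLD = 600, 400

class Agent:
    def __init__(self, kind):
        self.kind, self.round = kind, 0

def make_pairs(pool):
    random.shuffle(pool)
    return [(pool[i], pool[i + 1]) for i in range(0, len(pool), 2)]

agents = ([Agent("c1") for _ in range(N_C1)] +
          [Agent("AllD") for _ in range(N_ALLD)])
pairs = make_pairs(agents)

for _ in range(300):
    pool, kept = [], []
    for x, y in pairs:
        x.round += 1; y.round += 1
        endogenous = ({x.kind, y.kind} == {"c1", "AllD"} and x.round >= 2)
        if endogenous or random.random() > DELTA:
            x.round = y.round = 0
            pool += [x, y]
        else:
            kept.append((x, y))
    pairs = kept + make_pairs(pool)

# Assortment r = P(c1 | c1) - P(c1 | AllD), as defined in the text.
def p_c1_partner(kind):
    hits = tot = 0
    for x, y in pairs:
        for me, other in ((x, y), (y, x)):
            if me.kind == kind:
                tot += 1
                hits += other.kind == "c1"
    return hits / tot

r = p_c1_partner("c1") - p_c1_partner("AllD")
print(round(r, 2))  # positive: c1 players are disproportionately paired together
```

With random matching and no leaving, r would hover around zero; here it is clearly positive, which is the suppression mechanism described above.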
Comparing mixes of g1 and AllD with mixes of c1 and AllD helps explain why there is endogenous assortment with c1, and not with g1. The more relevant effect on the stability of the two equilibria we consider here, however, is due to the differences in the interior of the simplex, where three strategies are present. When the strategies present are g1, dC, and AllD, then also in the interior of the simplex, there is no assortment. With c1, dC, and AllD, on the other hand, there is assortment. The assortment reduces the size of the basin of attraction of AllD (see Fig 7A and 7B). Moreover, along trajectories in the interior of the simplex that are outside of the basin of attraction of AllD, selection weeds out more dC in panel B than along their counterparts in panel A that start at the same points.
Both of these observations are relevant if we compare the likelihood of a successful indirect invasion into c1 with the likelihood of a successful indirect invasion into g1 in finite population dynamics. If first a mutant dC appears, random drift may take the population from the top vertex of the simplex to states on the line segment between g1, or c1, and dC (the right edge). If a second mutant AllD appears while the population is on this edge, it then moves to the interior. AllD subsequently is likely to take over the population if its appearance puts the population in the basin of attraction of AllD (that is: below the red lines in the respective simplices). As Fig 7 shows, starting at the top of the simplex, the basin of attraction is easier to reach in panel A than in panel B. Therefore, in a finite population, AllD is more likely to go to fixation if the original resident is g1 than if the original resident is c1. On top of that, if the second mutant arrives at a point above the red line in panel A—and therefore also above the red line in panel B—then, starting at the same point in both panels, more dC will be weeded out along the way as AllD goes extinct in panel B than in panel A. This leaves the population closer to the top vertex of the simplex, which makes it more likely that c1 goes to fixation before a new mutant appears than that g1 does. That, in turn, makes it less likely that neutral drift brings the population to a point where AllD can successfully invade when the next mutant arrives. Thus, in finite populations, overcoming the tides and currents against this indirect invasion is harder when c1 is the resident, and it takes, on average, more mutations, and therefore more time, to successfully leave c1.
We can also see the effect that moving away from the low-mutation limit has on the relative stability of c1 and g1 by calculating invariant distributions in a finite population for a strategy set that consists of those four strategies only. Increasing the mutation rate increases the time spent in the interior of the simplex, and it increases the incidence of new mutants arising before the previous one has gone to fixation or extinction. This comes with an increase in the time spent in the c1 equilibrium relative to the time spent in the g1 equilibrium, and it increases the average share of c1 (see Fig 8). Both of these effects are only reversed when the mutation rate approaches the point where mutation becomes the only ingredient of the dynamics, leaving no room for selection.
The plots show properties of the stationary distribution of a Moran process with parameters N = 40, δ = 0.8 and b = 3, and strategies c1, g1, dC, and AllD. Strategies mutate with equal probabilities into any of the other strategies. The horizontal axis indicates the ratio of mutation steps relative to selection steps, on a log10 scale; at a mutation rate of −1, for example, a mutation happens every 10 selection events, and at a mutation rate of −2, a mutation happens every 100 selection events. Panel A shows what share of the time spent at monomorphic population states is spent at each of the respective strategies. Panel B shows the average overall frequencies of individuals of the respective types.
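A mutation-selection process of this kind can be sketched in a few lines. The 4×4 payoff matrix below is a hypothetical placeholder, not the paper's payoffs for c1, g1, dC, and AllD (those follow from the repeated game with leaving and depend on δ and b); what the sketch is meant to illustrate is the structure of a Moran process with a tunable ratio of mutation to selection events.

```python
import random

STRATS = ["c1", "g1", "dC", "AllD"]
PAYOFF = [  # PAYOFF[i][j]: payoff to strategy i against strategy j (assumed numbers)
    [2.0, 2.0, 1.8, 0.2],
    [2.0, 2.0, 1.8, 0.1],
    [1.8, 1.8, 2.0, -0.5],
    [0.6, 0.3, 1.5, 0.0],
]

def avg_payoff(counts, i, N):
    """Average payoff of an i-player matched uniformly with the other N - 1 players."""
    total = sum(counts[j] * PAYOFF[i][j] for j in range(4)) - PAYOFF[i][i]
    return total / (N - 1)

def moran_step(counts, N, mu):
    """One event: with probability mu a mutation step, otherwise a selection step."""
    if random.random() < mu:   # mutation: a random individual switches strategy
        i = random.choices(range(4), weights=counts)[0]
        j = random.choice([k for k in range(4) if k != i])
    else:                      # selection: payoff-proportional birth, random death
        # shift payoffs by +1 so the birth weights are positive
        fits = [counts[k] * max(avg_payoff(counts, k, N) + 1.0, 0.0) for k in range(4)]
        j = random.choices(range(4), weights=fits)[0]
        i = random.choices(range(4), weights=counts)[0]
    counts[i] -= 1
    counts[j] += 1

def run(N=40, mu=0.01, steps=20000, seed=1):
    random.seed(seed)
    counts = [0, 0, 0, N]      # start at the AllD vertex
    time_at = [0.0] * 4        # accumulated frequencies (panel-B style averages)
    for _ in range(steps):
        moran_step(counts, N, mu)
        for k in range(4):
            time_at[k] += counts[k] / N
    return counts, [t / steps for t in time_at]
```

Sweeping `mu` and recording `time_at` would produce the kind of averages shown in panel B, for whatever payoff matrix is plugged in.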
When punishing with leaving is better than punishing with defecting
The four-strategy model above indicates how equilibria that punish by leaving can be more resistant to indirect invasions than equilibria that punish by defecting, once we move away from the low-mutation limit. Using the same set of strategies, but without dC, we can also see how, if leaving is an option, punishing by leaving can outperform punishing by defecting in direct competition between the two modes of punishment. If we focus on a population consisting of the strategies c1, g1, and AllD only, then what matters for whether c1 or g1 performs better is the likelihood with which, after breaking up with an AllD player, one is re-matched to another AllD player. If the probability of trading in one AllD partner for another is high, it is better to be g1 and sit the current match out. This results in getting the mutual defection payoff while the match lasts, but that is better than risking another wasted second-round cooperation on a new AllD player. If the probability of being matched with yet another AllD player is not too high, on the other hand, it is better to be c1 and leave, in the hope of finding a more cooperative partner. The threshold frequency at which the prospect of establishing mutual cooperation makes leaving worth the risk is favorable for c1; only at very high frequencies of AllD is it better to punish with defection (see Fig 9). Moreover, whenever punishing by defecting has an advantage over punishing by leaving, both are already losing to AllD. This can be seen from the fact that the blue line lies inside the basin of attraction of AllD, the boundary of which is indicated by the red line in Fig 9. Everywhere outside the basin of attraction of AllD, and therefore along all paths where cooperation ends up prevailing, c1 outperforms g1 (see also S1 Text for a proof that this holds for all values of b and δ, and for gn and cn for all n ≥ 1).
Everywhere except in the area below and to the left of the blue line, c1, which punishes by leaving, outperforms g1, which punishes by defecting. The small area where g1 outperforms c1 lies entirely within the basin of attraction of AllD, which is delineated by the red line. S1 Text contains a proof that this holds for all values of b and δ, and also for pairs of cooperative strategies with longer trust-building phases. This implies that, all else equal, if reciprocity evolves, those that punish by leaving always do better than those that punish by defecting. The parameter values used for this simplex are b = 3 and δ = 0.8.
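The trade-off between leaving (and risking a rematch with AllD) and sitting the match out can be illustrated with a small frequency-dependent fitness sketch. All numbers below are hypothetical placeholders rather than the paper's derived payoffs; the one qualitative assumption carried over from the text is that c1's payoff against AllD depends on how likely leaving is to lead to yet another AllD partner, while g1's does not.

```python
import numpy as np

def fitness(x):
    """x = (x_c1, x_g1, x_D); returns assumed expected payoffs of the three types."""
    xc, xg, xd = x
    coop = 2.0                             # payoff in an established cooperative pair (assumed)
    # c1 leaves AllD: a good outcome if the new partner is unlikely to be AllD
    vs_alld_c1 = (1.0 - xd) * 1.5 + xd * (-0.2)
    vs_alld_g1 = 0.5                       # g1 sits the match out at mutual defection
    f_c1 = (xc + xg) * coop + xd * vs_alld_c1
    f_g1 = (xc + xg) * coop + xd * vs_alld_g1
    f_d = (xc + xg) * 1.0 + xd * 0.0       # AllD briefly exploits cooperators (assumed)
    return np.array([f_c1, f_g1, f_d])

def replicator(x0, dt=0.01, steps=4000):
    """Euler-discretized replicator dynamics on the (c1, g1, AllD) simplex."""
    x = np.array(x0, float)
    for _ in range(steps):
        f = fitness(x)
        x = np.clip(x + dt * x * (f - x @ f), 0.0, None)
        x /= x.sum()
    return x
```

With these placeholder numbers, c1 outperforms g1 whenever the frequency of AllD is below roughly 0.59, mirroring the finding that only at very high frequencies of AllD is it better to punish with defection.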
Getting away from AllD at a higher rate
The dynamics, both with and without the option to leave, tend to go through similar phases. A population state in which there is no cooperation whatsoever is invaded indirectly, after which the population settles on a cooperative equilibrium. A subsequent indirect invasion then takes it back to a fully defecting equilibrium, such as AllD. Sometimes an indirect invasion takes the population from one equilibrium with a positive amount of cooperation to another one with a different amount of cooperation, but transitions that involve a complete loss of cooperation are frequent enough to ensure that the population returns to the set of fully defecting equilibria very regularly. Given that cooperation tends to break down completely before it is re-established, both with and without the option to leave, any change in the rate at which states like AllD are left is consequential.
As is illustrated in Fig 10, the average time it takes for mutation and selection to find a path out of equilibria that are equivalent to AllD is lower in the setting with the option to leave than it is without it. This also contributes to the fact that there is more cooperation in the game with the option to leave than there is without, even though the option to leave limits the effectiveness of punishment with defection.
For a variety of b/c-ratios, panel A shows how much time (measured as the number of consecutive selection steps) the population spends on average with a resident that only defects, before a more cooperative mutant successfully invades. The continuation probability is fixed at δ = 0.9. Panel B shows the distribution of escape times for a b/c-ratio of 3. The arrival rate of mutants is the same, with or without the option to leave. The sets of possible mutants and the distributions over those are different between the two settings due to the difference in feasible strategies.
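A minimal version of such an escape-time measurement can be sketched as follows: start at the all-AllD state, run mutation and selection events, and count the events until a cooperative strategy fixates. The two-strategy setup below, with a mutant that is neutral against AllD but earns b − c with its own kind, is an illustrative placeholder for the much richer strategy spaces used in the actual simulations.

```python
import random

def escape_time(N=40, mu=0.01, b=3.0, c=1.0, seed=0, max_steps=10**6):
    """Steps until a cooperative mutant fixates, starting from all-AllD (sketch)."""
    rng = random.Random(seed)

    def coop_payoff(n_coop):
        # assumed: cooperators earn b - c with each other and 0 against AllD
        return ((n_coop - 1) * (b - c)) / (N - 1)

    k = 0        # number of cooperators; the population starts at the all-AllD state
    steps = 0    # counts both event types; for small mu this is close to the
                 # number of selection steps reported in panel A
    while 0 <= k < N and steps < max_steps:
        steps += 1
        if rng.random() < mu:                    # mutation: flip a random individual
            k += 1 if rng.random() < (N - k) / N else -1
        else:                                    # Moran selection event
            f_c = (1.0 + coop_payoff(k)) if k > 0 else 0.0
            f_d = 1.0
            total = k * f_c + (N - k) * f_d
            birth_coop = rng.random() < k * f_c / total
            death_coop = rng.random() < k / N
            k += int(birth_coop) - int(death_coop)
    return steps, k == N
```

Averaging `escape_time` over many seeds, for each b/c-ratio and for each of the two strategy spaces, would produce the kind of comparison shown in panel A.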
Model choices
There are many ways in which our model is stylized. Below, we go over a few of the model choices we made, and discuss how restrictive they are. We also discuss how they compare to the choices made in other papers.
First of all, our model looks at the repeated prisoner’s dilemma, which means that, even when we add the option to leave, individuals retain the option to respond to defection with defection. That allows us to evaluate what happens when both responding in kind and responding by leaving are possible. There are, however, also papers that explore what leaving can do for cooperation without comparing it to the effectiveness of reciprocity. An elegant way to do that is to endow individuals with a level of cooperation, or generosity, and with a threshold for how much generosity they need to experience from their partner in order not to leave, which is also referred to as their choosiness. This is what happens in [57] and [27]. On all dimensions we discuss below, our paper makes the same choices as the latter study.
We wanted to integrate the option to leave in a classical repeated games setting. Some papers that do retain the classical repeated prisoner’s dilemma setting choose a natural, well-argued subset of strategies [38, 39], sometimes in an Axelrod tournament style [30, 33, 37, 64]. Others, including ours, allow for a general, unrestricted strategy set [31, 34–36, 41–43, 45–50].
In our model, all individuals play every stage game with exactly one partner. This aspect is shared with most work on repeated games, but especially in the domain of partner choice, one could imagine settings in which the number of interaction partners varies and is itself the result of choices made by the individual. Theory papers that allow for a varying number of partners include [58–61]; experimental papers that make the same choice include [51–53, 56]. The possibility of keeping more than one partner by cooperating, or of ending up with no partner at all by defecting, might enhance the effect of partner choice. In all of the papers mentioned above, individuals are restricted to playing the same stage game action (C or D) with all partners. One could also imagine realistic scenarios in which individuals differentiate between partners, depending on how those partners behave.
In the Introduction, we mentioned that in our model, when new partnerships are formed, the partners do not learn anything about the past interactions that their partners had with others. We chose to give individuals no information about interactions other than their own, in order not to open the door to reputation formation or indirect reciprocity. This is a choice at one end of the spectrum; in reality there might be some information flow. Theory papers with information flow include [33, 37, 64]; empirical papers with information flow include [51–53]. More information spillover may raise the benefits of cooperating, and also enhance the effect of partner choice.
Another stylized property is that we assume what one could call a well-mixed matching pool; within the matching pool, all pairs are equally likely to be formed. In reality, there might be an exogenous, or an evolving population structure that limits the available partners, or makes some pairs more likely to be formed than others [38–40, 61].
Updating is also global in our model. If individuals in a pair die, the only thing that determines the probability with which other individuals produce the offspring that replaces them is their payoffs, and not, for instance, their proximity in a network. A network structure may, however, also affect who gets to reproduce where at the update event [40, 58–60].
In our stylized model, the absence of population structure, both regarding pair formation, and regarding updating, means that there is no exogenous assortment; all assortment is endogenous and only due to the choice to leave uncooperative partners. In that sense our model with leaving isolates partner choice as an ingredient.
Finally, we assume that all strategies are executed without errors. The effect of errors is investigated in [3, 19, 62]. The last of these also includes a result that implies that vanishingly small error rates have a vanishingly small effect on which strategies evolve.
Discussion
The two mechanisms that have received the lion’s share of the attention in the literature on the evolution of cooperation are kin selection (sometimes also classified as population structure) and repetition. Population structure typically refers to any deviation from a well-mixed population, in which individuals are matched randomly. This includes interactions on networks [65–69], or within groups [70–75]. In those models, local dispersal causes neighbouring individuals, or individuals within the same group, to have an increased probability of being identical by descent, and when they do, the mechanism at work is kin selection [76–78].
Our model relates, first and foremost, to the second mechanism, in which repetition allows for reciprocity and cooperation to co-evolve [1–19]. Our version deviates from the standard setup, in that it allows for individuals to leave their current partner, and seek out someone else to play prisoner’s dilemmas with. This shortens the long arm of reciprocity, because it allows individuals to run from punishment by defection. The option to leave however turns out to increase rather than reduce the average amount of cooperation that evolves. By allowing individuals to get up and leave, the model also introduces the possibility of partner choice, and this can create endogenous assortment in mixed populations. Away from the low mutation limit, this can make equilibria in which defections are punished with leaving more stable than equilibria in which defections are punished with defections. Partner choice therefore seems to be at least as powerful a mechanism for the evolution of cooperation in repeated games as reciprocity is. Already in a minimal setting, in which players have no prior information about the partner they are matched with, and only have the interactions within the repeated game itself to base their decisions to stay or to go on, this mechanism works very well.
Why humans cooperate with their siblings, even when it is costly, and why we are altruistic towards our offspring, is well explained by kin selection. The research on the evolution of human cooperation therefore naturally centers on the question of why we also cooperate with non-kin. For this, we tend to turn to repeated games and the reciprocity that can evolve there, or to the interaction between repetition and population structure [11, 79]. Our model points to the power of a third mechanism: partner choice [25–29, 57]. The assortment that partner choice generates is different from the exogenous assortment that features in kin selection models. The assortment in our model is endogenous, and not based on identity by descent. Individuals in our model stay with their partner purely based on their partner’s experienced phenotype; they are not playing with others that are related to them, where relatedness would determine the probability with which they inherited their strategy from the same individual. This phenotypic assortment, where unrelated, similarly dependable cooperators end up playing with each other, may be a better match with the long-lasting cooperation we observe in humans, who tend to exert some influence over who they cooperate with, if they can, and who cooperate with genetically unrelated others.
Supporting information
S1 Text.
The Supporting Information gives more detail on the theoretical model and the way it is simulated. It also gives the theoretical results mentioned in the Main Text, with proofs. It is subdivided as follows: The Model (Calculating payoffs; Why these are the average payoffs, if we assume short-run equilibrium; With and without leaving in one setting; Frequencies in the matching pool and in the population as a whole; Histories and strategies; Finite state automata); Simulations (Different mutation procedures; Algorithm for the simulations); Theoretical results (No ESS; No strategy that is RAII; Pure strategies with a trust-building phase).
https://doi.org/10.1371/journal.pcbi.1012810.s001
(PDF)
References
- 1. Axelrod R, Hamilton WD. The evolution of cooperation. Science. 1981;211(4489):1390–1396. pmid:7466396
- 2. Boyd R, Lorberbaum JP. No pure strategy is evolutionarily stable in the repeated prisoner’s dilemma game. Nature. 1987;327(6117):58–59.
- 3. Fudenberg D, Maskin E. Evolution and cooperation in noisy repeated games. American Economic Review. 1990;80(2):274–279.
- 4. Nowak M, Sigmund K. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Applicandae Mathematica. 1990;20:247–265.
- 5. Lorberbaum J. No strategy is evolutionarily stable in the repeated prisoner’s dilemma. Journal of Theoretical Biology. 1994;168(2):117–130. pmid:8022193
- 6. Bendor J, Swistak P. Types of evolutionary stability and the problem of cooperation. Proceedings of the National Academy of Sciences. 1995;92(8):3596–3600. pmid:11607530
- 7. Binmore KG, Samuelson L. Evolutionary stability in repeated games played by finite automata. Journal of Economic Theory. 1992;57(2):278–305.
- 8. Cooper DJ. Supergames played by finite automata with finite costs of complexity in an evolutionary setting. Journal of Economic Theory. 1996;68(1):266–275.
- 9. Volij O. In defense of DEFECT. Games and Economic Behavior. 2002;39(2):309–321.
- 10. Imhof LA, Fudenberg D, Nowak MA. Evolutionary cycles of cooperation and defection. Proceedings of the National Academy of Sciences. 2005;102(31):10797–10800. pmid:16043717
- 11. van Veelen M, García J, Rand DG, Nowak MA. Direct reciprocity in structured populations. Proceedings of the National Academy of Sciences. 2012;109(25):9929–9934. pmid:22665767
- 12. Press WH, Dyson FJ. Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences. 2012;109(26):10409–10413. pmid:22615375
- 13. Adami C, Hintze A. Evolutionary instability of zero-determinant strategies demonstrates that winning is not everything. Nature communications. 2013;4(1):2193. pmid:23903782
- 14. Hilbe C, Nowak MA, Sigmund K. Evolution of extortion in iterated prisoner’s dilemma games. Proceedings of the National Academy of Sciences. 2013;110(17):6913–6918. pmid:23572576
- 15. Stewart AJ, Plotkin JB. From extortion to generosity, evolution in the iterated prisoner’s dilemma. Proceedings of the National Academy of Sciences. 2013;110(38):15348–15353. pmid:24003115
- 16. Stewart AJ, Plotkin JB. Collapse of cooperation in evolving games. Proceedings of the National Academy of Sciences. 2014;111(49):17558–17563. pmid:25422421
- 17. García J, van Veelen M. In and out of equilibrium I: Evolution of strategies in repeated games with discounting. Journal of Economic Theory. 2016;161:161–189.
- 18. van Veelen M, García J. In and out of equilibrium II: Evolution in repeated games with discounting and complexity costs. Games and Economic Behavior. 2019;115:113–130.
- 19. Dal Bó P, Pujals ER. The evolutionary robustness of forgiveness and cooperation. Working paper. 2020.
- 20. Okuno-Fujiwara M, Postlewaite A. Social norms and random matching games. Games and Economic Behavior. 1995;9(1):79–109.
- 21. dos Santos M, Rankin DJ, Wedekind C. The evolution of punishment through reputation. Proceedings of the Royal Society B: Biological Sciences. 2011;278(1704):371–377. pmid:20719773
- 22. dos Santos M, Rankin DJ, Wedekind C. Human cooperation based on punishment reputation. Evolution. 2013;67(8):2446–2450. pmid:23888865
- 23. Nowak MA, Sigmund K. Evolution of indirect reciprocity by image scoring. Nature. 1998;393(6685):573–577. pmid:9634232
- 24. Ohtsuki H, Iwasa Y. The leading eight: social norms that can maintain cooperation by indirect reciprocity. Journal of Theoretical Biology. 2006;239(4):435–444. pmid:16174521
- 25. Noë R, Hammerstein P. Biological markets: supply and demand determine the effect of partner choice in cooperation, mutualism and mating. Behavioral Ecology & Sociobiology. 1994;35:1–11.
- 26. Noë R, Hammerstein P. Biological markets. Trends in Ecology & Evolution. 1995;10(8):336–339. pmid:21237061
- 27. McNamara JM, Barta Z, Fromhage L, Houston AI. The coevolution of choosiness and cooperation. Nature. 2008;451(7175):189–192. pmid:18185587
- 28. Barrett L, Henzi SP. Monkeys, markets and minds: biological markets and primate sociality. In: Cooperation in primates and humans: Mechanisms and evolution. Springer; 2006. p. 209–232.
- 29. Barclay P. Strategies for cooperation in biological markets, especially for humans. Evolution and Human Behavior. 2013;34(3):164–175.
- 30. Schuessler R. Exit threats and cooperation under anonymity. Journal of Conflict Resolution. 1989;33(4):728–749.
- 31. Enquist M, Leimar O. The evolution of cooperation in mobile organisms. Animal Behaviour. 1993;45(4):747–757.
- 32. Peck JR. Friendship and the evolution of co-operation. Journal of Theoretical Biology. 1993;162(2):195–228. pmid:8412224
- 33. Yamagishi T, Hayashi N, Jin N. Prisoner’s dilemma networks: selection strategy versus action strategy. In: Social dilemmas and cooperation. Springer; 1994. p. 233–250.
- 34. Ghosh P, Ray D. Cooperation in community interaction without information flows. The Review of Economic Studies. 1996;63(3):491–519.
- 35. Kranton RE. The formation of cooperative relationships. The Journal of Law, Economics, and Organization. 1996;12(1):214–233.
- 36. Carmichael HL, MacLeod WB. Gift Giving and the Evolution of Cooperation. International Economic Review. 1997;38(3):485.
- 37. Hayashi N, Yamagishi T. Selective play: Choosing partners in an uncertain world. Personality and Social Psychology Review. 1998;2(4):276–289. pmid:15647134
- 38. Aktipis CA. Know when to walk away: contingent movement and the evolution of cooperation. Journal of Theoretical Biology. 2004;231(2):249–260. pmid:15380389
- 39. Aktipis CA. Is cooperation viable in mobile organisms? Simple Walk Away rule favors the evolution of cooperation in groups. Evolution and Human Behavior. 2011;32(4):263–276. pmid:21666771
- 40. Vainstein MH, Silva AT, Arenzon JJ. Does mobility decrease cooperation? Journal of Theoretical Biology. 2007;244(4):722–728. pmid:17055534
- 41. Fujiwara-Greve T, Okuno-Fujiwara M. Voluntarily Separable Repeated Prisoner’s Dilemma. Review of Economic Studies. 2009;76(3):993–1021.
- 42. Fujiwara-Greve T, Okuno-Fujiwara M. Diverse behavior patterns in a symmetric society with voluntary partnerships. SSRN 2343119. 2016.
- 43. Fujiwara-Greve T, Okuno-Fujiwara M, Suzuki N. Efficiency may improve when defectors exist. Economic Theory. 2015;60(3):423–460.
- 44. Immorlica N, Lucier B, Rogers B. Cooperation in anonymous dynamic social networks. In: Proceedings of the 11th ACM conference on Electronic commerce; 2010. p. 241–242.
- 45. Izquierdo LR, Izquierdo SS, Vega-Redondo F. The option to leave: Conditional dissociation in the evolution of cooperation. Journal of Theoretical Biology. 2010;267(1):76–84. pmid:20688083
- 46. Izquierdo LR, Izquierdo SS, Vega-Redondo F. Leave and let leave: A sufficient condition to explain the evolutionary emergence of cooperation. Journal of Economic Dynamics and Control. 2014;46:91–113.
- 47. Izquierdo SS, Izquierdo LR, van Veelen M. Repeated games with endogenous separation. Working paper. 2021.
- 48. Izquierdo SS, Izquierdo LR. Equilibria in repeated games with endogenous separation. Working paper. 2024.
- 49. Vesely F, Yang CL. On optimal and neutrally stable population equilibrium in voluntary partnership prisoner’s dilemma games. SSRN. 2010.
- 50. Vesely F, Yang CL. Breakup, secret handshake and neutral stability in repeated prisoner’s dilemma with option to leave: A note. SSRN. 2012.
- 51. Rand DG, Arbesman S, Christakis NA. Dynamic social networks promote cooperation in experiments with humans. Proceedings of the National Academy of Sciences. 2011;108(48):19193–19198. pmid:22084103
- 52. Wang J, Suri S, Watts DJ. Cooperation and assortativity with dynamic partner updating. Proceedings of the National Academy of Sciences. 2012;109(36):14363–14368. pmid:22904193
- 53. Antonioni A, Cacault MP, Lalive R, Tomassini M. Know thy neighbor: Costly information can hurt cooperation in dynamic networks. PLoS One. 2014;9(10):e110788. pmid:25356905
- 54. Bednarik P, Fehl K, Semmann D. Costs for switching partners reduce network dynamics but not cooperative behaviour. Proceedings of the Royal Society B: Biological Sciences. 2014;281(1792):20141661.
- 55. Barclay P, Raihani N. Partner choice versus punishment in human Prisoner’s Dilemmas. Evolution and Human Behavior. 2016;37(4):263–271.
- 56. Efferson C, Roca CP, Vogt S, Helbing D. Sustained cooperation by running away from bad behavior. Evolution and Human Behavior. 2016;37(1):1–9. pmid:26766895
- 57. Sherratt TN, Roberts G. The evolution of generosity and choosiness in cooperative exchanges. Journal of Theoretical Biology. 1998;193(1):167–177. pmid:9689952
- 58. Zimmermann MG, Eguíluz VM, San Miguel M. Coevolution of dynamical states and interactions in dynamic networks. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics. 2004;69(6):065102. pmid:15244650
- 59. Zimmermann MG, Eguíluz VM. Cooperation, social networks, and the emergence of leadership in a prisoner’s dilemma with adaptive local interactions. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics. 2005;72(5):056118. pmid:16383699
- 60. Eguíluz VM, Zimmermann MG, Cela-Conde CJ, Miguel MS. Cooperation and the emergence of role differentiation in the dynamics of social networks. American Journal of Sociology. 2005;110(4):977–1008.
- 61. Roca CP, Helbing D. Emergence of social cohesion in a model society of greedy, mobile individuals. Proceedings of the National Academy of Sciences. 2011;108(28):11370–11374.
- 62. Graser C, van Veelen M. Repeated prisoner’s dilemmas with errors: How much subgame perfection, how much forgiveness, and how much cooperation? Tinbergen Institute Discussion Paper. 2024.
- 63. van Veelen M. Robustness against indirect invasions. Games and Economic Behavior. 2012;74:382–393.
- 64. Hayashi N. From TIT-FOR-TAT to OUT-FOR-TAT. Sociological Theory and Methods. 1993;8(1):19–32.
- 65. Lieberman E, Hauert C, Nowak MA. Evolutionary dynamics on graphs. Nature. 2005;433(7023):312. pmid:15662424
- 66. Ohtsuki H, Hauert C, Lieberman E, Nowak MA. A simple rule for the evolution of cooperation on graphs and social networks. Nature. 2006;441(7092):502. pmid:16724065
- 67. Taylor PD, Day T, Wild G. Evolution of cooperation in a finite homogeneous graph. Nature. 2007;447(7143):469–472. pmid:17522682
- 68. Santos FC, Pacheco JM. Scale-free networks provide a unifying framework for the emergence of cooperation. Physical Review Letters. 2005;95(9):098104. pmid:16197256
- 69. Allen B, Lippner G, Chen YT, Fotouhi B, Momeni N, Yau ST, et al. Evolutionary dynamics on any population structure. Nature. 2017;544(7649):227–230. pmid:28355181
- 70. Boyd R, Richerson PJ. Culture and the evolutionary process. University of Chicago Press; 1988.
- 71. Wilson DS, Wilson EO. Rethinking the theoretical foundations of sociobiology. Quarterly Review of Biology. 2007;82(4):327–348. pmid:18217526
- 72. Traulsen A, Nowak MA. Evolution of cooperation by multilevel selection. Proceedings of the National Academy of Sciences. 2006;103(29):10952–10955. pmid:16829575
- 73. Simon B, Fletcher JA, Doebeli M. Towards a general theory of group selection. Evolution. 2013;67:1561–1572. pmid:23730751
- 74. Luo S. A unifying framework reveals key properties of multilevel selection. Journal of Theoretical Biology. 2014;341:41–52. pmid:24096098
- 75. Akdeniz A, van Veelen M. The cancellation effect at the group level. Evolution. 2020;74(7):1246–1254. pmid:32385860
- 76. Hamilton WD. The genetical evolution of social behaviour. I. Journal of Theoretical Biology. 1964;7(1):1–16. pmid:5875341
- 77. Hamilton WD. The genetical evolution of social behaviour. II. Journal of Theoretical Biology. 1964;7(1):17–52. pmid:5875340
- 78. Kay T, Keller L, Lehmann L. The evolution of altruism and the serial rediscovery of the role of relatedness. Proceedings of the National Academy of Sciences. 2020;117(46):28894–28898. pmid:33139540
- 79. Efferson C, Bernhard H, Fischbacher U, Fehr E. Super-additive cooperation. Nature. 2024;626(8001):1034–1041. pmid:38383778