Abstract
Real-world agents, humans as well as animals, observe each other during interactions and choose their own actions taking the partners’ ongoing behaviour into account. Yet, classical game theory assumes that players act either strictly sequentially or strictly simultaneously without knowing each other’s current choices. To account for action visibility and provide a more realistic model of interactions under time constraints, we introduce a new game-theoretic setting called transparent games, where each player has a certain probability of observing the partner’s choice before deciding on its own action. By means of evolutionary simulations, we demonstrate that even a small probability of seeing the partner’s choice before one’s own decision substantially changes the evolutionarily successful strategies. Action visibility enhances cooperation in an iterated coordination game, but reduces cooperation in a more competitive iterated Prisoner’s Dilemma. In both games, “Win–stay, lose–shift” and “Tit-for-tat” strategies are predominant for moderate transparency, while a “Leader-Follower” strategy emerges for high transparency. Our results have implications for studies of human and animal social behaviour, especially for the analysis of dyadic and group interactions.
Author summary
Humans and animals constantly make social decisions. Should an animal during group foraging or a human at the buffet try to obtain an attractive food item but risk a confrontation with a dominant conspecific, or is it better to opt for a less attractive but non-confrontational choice, especially when considering that the situation will repeat in the future? To model decision-making in such situations game theory is widely used. However, classic game theory assumes that agents act either at the same time, without knowing each other’s choices, or one after another. In contrast, humans and animals usually try to take the behaviour of their opponents and partners into account, to instantaneously adjust their own actions if possible. To provide a more realistic model of decision making in a social setting, we here introduce the concept of transparent games. It integrates the probability of observing the partner’s instantaneous actions into the game-theoretic framework of knowing previous choice outcomes. We find that such “transparency” has a direct influence on the emergence of cooperative behaviours in classic iterated games. The transparent games can contribute to a deeper understanding of social behaviour and decision-making of humans and animals.
Citation: Unakafov AM, Schultze T, Gail A, Moeller S, Kagan I, Eule S, et al. (2020) Emergence and suppression of cooperation by action visibility in transparent games. PLoS Comput Biol 16(1): e1007588. https://doi.org/10.1371/journal.pcbi.1007588
Editor: James O’Dwyer, University of Illinois at Urbana-Champaign, UNITED STATES
Received: March 28, 2019; Accepted: December 6, 2019; Published: January 9, 2020
Copyright: © 2020 Unakafov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: We acknowledge funding from the Ministry for Science and Education of Lower Saxony and the Volkswagen Foundation through the program “Niedersächsisches Vorab,” https://www.volkswagenstiftung.de/unsere-foerderung/unser-foerderangebot-im-ueberblick/vorab.html. We acknowledge additional support by the Leibniz Association through funding for the Leibniz ScienceCampus Primate Cognition and the Max Planck Society, https://www.leibniz-gemeinschaft.de/en/home/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
One of the most interesting questions in evolutionary biology, social sciences, and economics is the emergence and maintenance of cooperation [1–5]. A popular framework for studying cooperation (or the lack thereof) is game theory, which is frequently used to model interactions between “rational” decision-makers [6–9]. A model for repeated interactions is provided by iterated games with two commonly used settings [2]. In simultaneous games all players act at the same time and each player has to make a decision under uncertainty regarding the current choice of the partner(s). In sequential games players act one after another in a random or predefined order [10] and the player acting later in the sequence is guaranteed to see the choices of the preceding player(s). Maximal uncertainty only applies to the first player and—if there are more than two players—is reduced with every turn in the sequence.
Both classical settings simplify and restrict the decision context: either no player has any information about the choices of the partners (simultaneous game), or each time some players have more information than others (sequential game). This simplification prevents modelling of certain common behaviours, since humans and animals usually act neither strictly simultaneously nor sequentially, but observe the choices of each other and adjust their actions accordingly [1]. Indeed, the visibility of the partner’s actions plays a crucial role in social interactions, both in laboratory experiments [3, 11–16] and in natural environments [4, 17–20].
For example, in soccer the penalty kicker must decide where to place the ball and the goalkeeper must decide whether to jump to one of the sides or to stay in the centre. Both players resort to statistics about the other’s choices in the past, making this more than a simple one-shot game. Since the goalkeeper must make the choice while the opponent is preparing the shot, a simultaneous game provides a first rough model for such interactions [21, 22]. However, the simultaneous model ignores the fact that both players observe each other’s behaviour and try to predict the direction of the kick or of the goalkeeper’s jump from subtle preparatory cues [15], which often works better than at chance level [21–23]. Using instantaneous cues should not only affect one-shot decisions but also iterative statistics: Learning by observing a keeper over iterations that he has the tendency of jumping prematurely encourages strategies of delayed shots by the kicker, and vice versa. While the soccer example represents a zero-sum game, similar considerations apply to a wide range of real life interactions, see for instance Fig 1. Yet a framework for the treatment of such cases is missing in classical game theory.
Two monkeys are reaching for food in two locations that are at some distance so that each monkey can take only one portion. At one location are grapes (preferred food), at the other a carrot (non-preferred food). (A) Initially both monkeys move toward the grapes. (B) Monkey 1 observes Monkey 2’s actions and decides to go for the carrot to avoid a potential fight. (C) Next time Monkey 1 moves faster towards the grapes, so Monkey 2 swerves towards the carrot. Coordinated behaviour in such situations has the benefit of higher efficiency and avoids conflicts. This example shows that the transparent game is a versatile framework that can be used for describing decision making in social contexts.
To better predict and explain the outcomes of interactions between agents by taking the visibility factor into account, we introduce the concept of transparent games, where players can observe actions of each other. In contrast to the classic simultaneous and sequential games, a transparent game is a game-theoretic setting where the access to the information about current choices of other players is probabilistic. For example, for a two-player game in each round three cases are possible:
- Player 1 knows the choice of Player 2 before making its own choice.
- Player 2 knows the choice of Player 1 before making its own choice.
- Neither player knows the choice of the partner.
Only one of these three cases takes place in each round, but over a large number of rounds one can infer the probability that Player i sees the choice of the partner before making its own choice. These probabilities depend on the reaction times of the players. If they act nearly at the same time, neither is able to use the information about the partner’s action; but a player who waits before making the choice has a higher probability of seeing the choice of the partner. Yet explicit or implicit time constraints prevent players from waiting indefinitely for the partner’s choice. In the general case, transparent games impose an additional uncertainty on the players acting first: they cannot know in advance whether the other players will see their decision in a given round.
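The three mutually exclusive cases can be illustrated with a minimal Python sketch. The function names `sample_visibility` and `estimate_p_see` are hypothetical, and the sketch assumes that the two visibility probabilities are fixed numbers whose sum does not exceed 1; it does not model reaction times.

```python
import random


def sample_visibility(p_see_1, p_see_2, rng=random):
    """Sample which of the three cases occurs in one round.

    Returns 1 if Player 1 sees Player 2's choice, 2 if Player 2 sees
    Player 1's choice, and 0 if neither player sees the partner's choice.
    Assumption: the cases are mutually exclusive, so p_see_1 + p_see_2 <= 1.
    """
    if p_see_1 < 0 or p_see_2 < 0 or p_see_1 + p_see_2 > 1:
        raise ValueError("need p_see_1, p_see_2 >= 0 and p_see_1 + p_see_2 <= 1")
    u = rng.random()  # uniform on [0, 1)
    if u < p_see_1:
        return 1
    if u < p_see_1 + p_see_2:
        return 2
    return 0


def estimate_p_see(p_see_1, p_see_2, rounds=100_000, seed=0):
    """Recover both visibility probabilities from many simulated rounds."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(rounds):
        counts[sample_visibility(p_see_1, p_see_2, rng)] += 1
    return counts[1] / rounds, counts[2] / rounds
```

Over many rounds the empirical frequencies returned by `estimate_p_see` approach the underlying probabilities, which mirrors how these probabilities could be inferred from observed play.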
The framework of transparent games is generic and includes classic game-theoretical settings as special cases. Writing p^{i}_{see} for the probability that Player i sees the partner’s choice, simultaneous games correspond to p^{1}_{see} = p^{2}_{see} = 0, while sequential games result in p^{1}_{see} = 0, p^{2}_{see} = 1 for a fixed order of decisions in each round (Player 1 always moves first, Player 2 second) and in p^{1}_{see} = p^{2}_{see} = 0.5 for a random sequence of decisions. Here we ask if probabilistic access to the information on the partner’s choice in transparent games leads to the emergence of different behavioural strategies compared to the fully unidirectional access in sequential games or to the case of no access in simultaneous games.
To answer this question, we consider the effects of transparency on emergence of cooperation in two-player two-choice games. To draw a comparison with the results for classic simultaneous and sequential settings, we focus here on the typically studied memory-one strategies [9, 24] that take into account own and partner’s choices at the previous round of the game. Since cooperation has multiple facets [1, 4, 8], we investigate two types of games which are traditionally used for studying two different forms of cooperation [6, 8, 25, 26]: the iterated Prisoner’s dilemma (iPD) [6] and the iterated (Anti-)Coordination Game (i(A)CG). We chose the generic term (Anti-) Coordination Game because depending on the exact formulation of the payoff matrix, (A)CGs can encompass a wide range of games such as the Battle of the Sexes or Bach-or-Stravinsky game we focus on here [27, 28], but also the Hawk-Dove or Chicken game, and the Leader game [29]. The two games encourage two distinct types of cooperative behaviour [30, 31], since the competitive setting in iPD requires “trust” between partners for cooperation to emerge, i.e. a social concept with an inherent longer-term perspective. In the less competitive i(A)CG, instead, cooperation of players in form of simple coordination of their actions can be beneficial even in one-shot situations. Our hypothesis is that transparency should have differential effects on long-term optimal strategies in these two types of games. We show with the help of evolutionary simulations that this is indeed the case: transparency enhances cooperation in the generally cooperative i(A)CG, but reduces cooperation in the more competitive iPD.
Results
We investigated the success of different behavioural strategies in the iPD and i(A)CG games by using evolutionary simulations. These simulations allow evaluating long-term optimal strategies using principles of natural selection, where fitness of an individual is defined as the achieved payoff compared to the population average (see “Methods”). The payoff matrices, specifying each player’s payoff conditional upon own and other’s choice, are shown in Fig 2 for both games.
(A) In Prisoner’s Dilemma, players adopt roles of prisoners suspected of committing a crime and kept in isolated rooms. Due to lack of evidence, prosecutors offer each prisoner an option to minimize the punishment by making a confession. A prisoner can select one of the two actions (A_{1} or A_{2}): either betray the other by defecting (D), or cooperate (C) with the partner by remaining silent. The maximal charge is five years in prison, and the payoff matrix represents the number of years deducted from it (for instance, if both players cooperate (CC, upper left), each gets a two-year sentence, because three years of prison time have been deducted). The letters R, T, S and P denote payoff values and stand for Reward, Temptation, Saint and Punishment, respectively. (B) In the (Anti-)Coordination Game variant known as Bach-or-Stravinsky and as Hero [27–29], two people are choosing between Bach and Stravinsky music concerts. Player 1 prefers Bach, Player 2 prefers Stravinsky; hence, there is an inherent conflict about which concert to choose; yet, above all both prefer going to the concert together. Thus the aim of the players is to coordinate (either on Bach or on Stravinsky), which assures maximal joint reward for the players. Players can either insist (I) on their own preference or accommodate (A) the preference of the partner. In these terms, the outcome coordination (attending the same concert) is achieved by selecting complementary actions: either (I, A) or (A, I), which justifies the name: “anti-”coordination. For example, when both agents coordinate on Bach, Player 1 insists, while Player 2 accommodates (I, A). In the “Methods”, we also consider a more general class of (anti-)coordination games, encompassing Hawk-Dove (or Chicken) and Leader.
Our evolutionary simulations show that the probability of seeing the partner’s choice had a considerable effect on the likelihood of acting cooperatively. In both games this likelihood differs considerably for the probabilities below and above 0.35 (Fig 3, S2 Note). Further, the transparency levels at which the likelihood of cooperation was high, turned out to be largely complementary in both games.
We performed 80 runs of evolutionary simulations tracing 10^{9} generations of iPD and i(A)CG players. Agents with successful strategies reproduced themselves (had higher fraction in the next generation), while agents with unsuccessful strategies died out, see “Methods” for details. We considered a run as “cooperative” if the average payoff across the population was more than 0.9 times the pay-off of 3 units for cooperative behaviour in iPD [24], and more than 0.95 times the pay-off of 3.5 units for cooperative behaviour in the i(A)CG (i.e., 90% and 95% of the maximally achievable pay-off on average over both players). For i(A)CG we set a higher threshold due to the less competitive nature of this game. (A) In iPD cooperation was quickly established for low probability to see the partner’s choice p_{see}, but it took longer to develop for moderate p_{see} and it drastically decreased for high p_{see}. (B) In contrast, for i(A)CG frequent cooperation emerges only for high visibility. The small drop in cooperation at p_{see} = 0.4 is caused by a transition between two coordination strategies (see main text). (C) The forfeit payoff (maximal possible average payoff of the population minus actual average payoff obtained by the population) further illustrates the same tendencies: higher transparency reduces the effectiveness of cooperation in iPD but increases it in i(A)CG.
In the following, we analyse in more detail what is behind the effect of transparency on the cooperation frequency that is revealed in our simulations. First, we provide analytical results for non-iterated (one-shot) transparent versions of Prisoner’s Dilemma (PD) and (Anti-)Coordination Game ((A)CG). Second, after briefly explaining the basic principles adopted in our evolutionary simulations, we describe the strategies that emerge in these simulations for the iPD and i(A)CG games.
Transparent games without memory: Analytical results
In game theory, the Nash Equilibrium (NE) describes optimal behaviour for the players [7]. In dyadic games, NE is a pair of strategies, such that neither player can get a higher payoff by unilaterally changing its strategy. Both in PD and in (A)CG, players choose between two actions, A_{1} or A_{2} (see Fig 2): They cooperate or defect in PD and insist or accommodate in (A)CG according to their strategies. In a one-shot transparent game, a strategy is represented by a vector (s_{1}; s_{2}; s_{3}), where s_{1} is the probability of selecting A_{1} without seeing the partner’s choice, s_{2} the probability of selecting A_{1} while seeing the partner also selecting A_{1}, and s_{3} the probability of selecting A_{1} while seeing partner selecting A_{2}, respectively. The probabilities of selecting A_{2} are equal to 1 − s_{1}, 1 − s_{2} and 1 − s_{3}, correspondingly. For example, strategy (1; 1; 0) in the transparent PD means that the player cooperates unless seeing that the partner defects.
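As an illustration of the strategy vector (s_{1}; s_{2}; s_{3}), the following minimal Python sketch selects an action in a one-shot transparent game. The names `choose_action`, `A1`, `A2` and the `seen` argument are hypothetical conventions introduced here for illustration only.

```python
import random

A1, A2 = 1, 2  # the two available actions (e.g. cooperate / defect in PD)


def choose_action(strategy, seen=None, rng=random):
    """Pick A1 or A2 according to a one-shot strategy (s1, s2, s3).

    seen is None if the partner's choice is not visible, otherwise it is
    the partner's visible action (A1 or A2).
    """
    s1, s2, s3 = strategy
    if seen is None:
        p_a1 = s1  # partner's choice not seen
    elif seen == A1:
        p_a1 = s2  # partner seen selecting A1
    elif seen == A2:
        p_a1 = s3  # partner seen selecting A2
    else:
        raise ValueError("seen must be None, A1 or A2")
    return A1 if rng.random() < p_a1 else A2
```

For example, with the strategy (1; 1; 0) in the transparent PD, `choose_action((1, 1, 0), seen=A2)` returns `A2`: the player defects only when seeing the partner defect.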
For the one-shot transparent PD we show (Proposition 2 in “Methods”) that all NE consist of defecting strategies (0; x; 0) with 0 ≤ x ≤ (1 − p_{see})(P − S) / (p_{see}(R − S)), where P, S and R are the elements of the payoff matrix (Fig 2A) and p_{see} is the probability to see the choice of the partner. At a population level, this means that cooperation does not survive in the transparent one-shot PD, similar to the classic PD.
For the one-shot transparent (A)CG we show that the NE depend on p_{see} (Proposition 4). For p_{see} below a payoff-dependent threshold there are three NE: (a) Player 1 uses (0; 0; 1), Player 2 uses (1; 0; 1); (b) vice versa; (c) both players use strategy (x; 0; 1), where x is determined by p_{see} and the payoff values (see Proposition 4). Note that for the limiting case of p_{see} = 0 one gets the three NE known from the classic one-shot simultaneous (A)CG [29]. However, for p_{see} above this threshold the only NE is provided by (1; 0; 1). In particular, for the (A)CG defined by the payoff matrix in Fig 2B, there are three NE for p_{see} < 1/3 and one NE otherwise. This means that population dynamics is considerably different for the cases p_{see} < 1/3 and p_{see} > 1/3, and as we show below this is also true for the iterated (A)CG. We also show that transparent versions of two other classical games (“Hawk-Dove” and “Leader”) have a similar NE structure (Proposition 8).
In summary, introducing action transparency influences optimal behaviour already in simple one-shot games.
Transparent games with memory: Evolutionary simulations
Iterated versions of PD and (A)CG games (iPD and i(A)CG) differ from one-shot games in that the current choice may depend on the outcomes of previous rounds. We focus on strategies taking into account own and partner’s choices in one previous round of the game (“memory-one” strategies) for reasons of tractability. A strategy without memory in transparent games is described by a three-element vector. A memory-one strategy additionally conditions the current choice upon the outcome of the previous round of the game. Since there are four (2 × 2) possible outcomes, a memory-one strategy is represented by a vector s = (s_{k}), k = 1, …, 12, where k enumerates the twelve (4 × 3) different combinations of previous outcome and current information about the partner’s choice. The entries s_{k} of the strategy thus represent the conditional probabilities to select action A_{1}, specifically
- s_{1}, …, s_{4} are probabilities to select A_{1} without seeing the partner’s choice, given that in the previous round the joint choice of the player and the partner was A_{1}A_{1}, A_{1}A_{2}, A_{2}A_{1}, and A_{2}A_{2} respectively (the first action specifies the choice of the player, and the second—the choice of the partner);
- s_{5}, …, s_{8} are probabilities to select A_{1}, seeing the partner selecting A_{1} and given the outcome of the previous round (as before).
- s_{9}, …, s_{12} are probabilities to select A_{1}, seeing the partner selecting A_{2} and given the outcome of the previous round.
Probabilities to select A_{2} are given by (1 − s_{k}), respectively.
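The indexing of the twelve entries can be made concrete with a short Python sketch. The helper names `strategy_index` and `prob_a1` are hypothetical; the ordering of previous outcomes and visibility conditions follows the list above.

```python
A1, A2 = 1, 2

# Previous-round outcomes in the order used for s_1..s_4:
# (own choice, partner's choice).
PREVIOUS_OUTCOMES = [(A1, A1), (A1, A2), (A2, A1), (A2, A2)]

# Offset of each block of four entries: partner's current choice not seen,
# partner seen selecting A1, partner seen selecting A2.
BLOCK = {None: 0, A1: 1, A2: 2}


def strategy_index(prev_own, prev_partner, seen=None):
    """Return the 1-based index k of the strategy entry s_k to use."""
    outcome = PREVIOUS_OUTCOMES.index((prev_own, prev_partner))
    return 4 * BLOCK[seen] + outcome + 1


def prob_a1(strategy, prev_own, prev_partner, seen=None):
    """Probability of selecting A1 for a 12-entry memory-one strategy."""
    return strategy[strategy_index(prev_own, prev_partner, seen) - 1]
```

For instance, `strategy_index(A2, A1, seen=A1)` gives 7: the entry s_{7} applies when the player chose A_{2} and the partner chose A_{1} in the previous round, and the partner is currently seen selecting A_{1}.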
We used evolutionary simulations to investigate which strategies evolve in the transparent iPD and i(A)CG (see “Methods” and [9, 24] for more detail), since an analytical approach would require solving systems of 12 differential equations. We studied an infinite population of players to avoid stochastic effects associated with finite populations [32]. For any generation t the population consisted of n(t) types of players, each defined by a strategy s^{i} and relative frequency x_{i}(t) in the population with Σ_{i=1}^{n(t)} x_{i}(t) = 1. To account for possible errors in choices and to ensure numerical stability of the simulations (see “Methods”), we assumed that no pure strategy is possible, that is ε ≤ s^{i}_{k} ≤ 1 − ε for all k, with ε = 0.001 [9, 24]. Frequency x_{i}(t) in the population increased with t for strategies getting higher-than-average payoff when playing against the current population and decreased otherwise. This ensured “survival of the fittest” strategies. In both games, we assumed players to have equal mean reaction times (see “Methods” for the justification of this assumption). Then the probability p_{see} to see the choice of the partner was equal for all players, which in a dyadic game resulted in p_{see} ≤ 0.5. We performed evolutionary simulations for various transparencies with p_{see} = 0.0, 0.1, …, 0.5.
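The selection principle described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the update rule from the “Methods”: it assumes a discrete-time replicator update with strictly positive payoffs, and the names `clip_strategy`, `replicator_step` and the `payoff` matrix layout are hypothetical.

```python
EPSILON = 0.001  # bound that excludes pure strategies, as in the text


def clip_strategy(strategy, eps=EPSILON):
    """Force every entry into [eps, 1 - eps] so that no strategy is pure."""
    return [min(max(s, eps), 1 - eps) for s in strategy]


def replicator_step(freqs, payoff):
    """One generation of an assumed discrete-time replicator update.

    freqs[i] is the relative frequency x_i(t) of strategy type i.
    payoff[i][j] is the expected payoff of type i against type j
    (assumed strictly positive so that frequencies stay non-negative).
    """
    n = len(freqs)
    # Payoff of each type against the current population.
    fitness = [sum(payoff[i][j] * freqs[j] for j in range(n)) for i in range(n)]
    # Population-average payoff.
    mean_fitness = sum(f * x for f, x in zip(fitness, freqs))
    # Types with above-average payoff grow, the others shrink.
    return [x * f / mean_fitness for x, f in zip(freqs, fitness)]
```

With this form of the update the frequencies keep summing to 1, and a type grows exactly when its payoff against the current population exceeds the population average.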
In the two following sections we discuss the simulation results for both games in detail and describe the strategies that are successful for different transparency levels. Since the strategies in the evolutionary simulations were generated randomly (mimicking random mutations), convergence of the population onto the theoretical optimum may take many generations and observed successful strategies may deviate from the optimum. Therefore, when reporting the results below we employ a coarse-grained description of strategies using the following notation: symbol 0 for s_{k} ≤ 0.1, symbol 1 for s_{k} ≥ 0.9, symbol * is used as a wildcard character to denote an arbitrary probability.
To exemplify this notation, let us describe the strategies that are known from the canonical simultaneous iPD [9], affecting exclusively s_{1}, …, s_{4}, for the transparent version of this game, i.e. including s_{5}, …, s_{12}.
- The Generous tit-for-tat (GTFT) strategy is encoded by (1a1c;1***;****), where 0.1 < a, c < 0.9. Indeed, GTFT is characterized by two properties [9]: it cooperates with cooperators and forgives defectors. To satisfy the first property, the probability to cooperate after the partner cooperated in the previous round should be high, thus the corresponding entries of the strategy s_{1}, s_{3}, s_{5} are encoded by 1. To satisfy the second property, the probability to cooperate after the partner defected should be between 0 and 1. We allow a broad range of values for s_{2} and s_{4}, namely 0.1 < s_{2}, s_{4} < 0.9. We accept arbitrary values for s_{6}, …, s_{12} since for low values of p_{see} these entries have little influence on the strategy performance, meaning that their evolution towards optimal values may take especially long. For instance, the strategy entry s_{7} is used only when the player has defected in the previous round and is seeing that the partner is cooperating in the current round. But a GTFT player defects very rarely, hence s_{7} is almost never used and its value has little or no effect on the overall behaviour of a GTFT player.
- Similarly, Firm-but-fair (FbF) is encoded by (101c;1***;****), where 0.1 < c < 0.9.
- Tit-for-tat (TFT) is a “non-forgiving” version of GTFT, encoded by (1010;1***;****).
- Win–stay, lose–shift (WSLS) is encoded by (100c;1***;****) with c ≥ 2/3. Indeed, in the canonical simultaneous iPD WSLS repeats its own previous action if it resulted in relatively high rewards of R = 3 (cooperates after successful cooperation, thus s_{1} ≥ 0.9) or T = 5 (defects after successful defection, s_{3} ≤ 0.1), and switches to another action otherwise (s_{2} ≤ 0.1, s_{4} = c ≥ 2/3). Note that the condition for s_{4} is relaxed compared to s_{2} since payoff P = 1 corresponding to mutual defection is not so bad compared to S = 0 and may not require immediate switching. Additionally, we set s_{5} ≥ 0.9 to ensure that WSLS players cooperate with each other in the transparent iPD as they do in the simultaneous iPD.
We also consider a relaxed (cooperative) version of WSLS, which we term “generous WSLS” (GWSLS). It follows the WSLS principle only in a general sense and is encoded by (1abc;1***;****) with c ≥ 2/3, a, b < 2/3 and either a > 0.1 or b > 0.1.
- The Always Defect strategy (AllD) is encoded by (0000;**00;**00), meaning that the probability to cooperate when not seeing the partner’s choice or after defecting is below 0.1, and other behaviour is not specified.
Note that here we selected the coarse-grained descriptions of the strategies, covering only those strategy variants that actually persisted in the population for our simulations.
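The coarse-grained encodings listed above can be turned into a small classifier. The following Python sketch is an illustration only: the function name `classify_ipd_strategy` and the order in which the patterns are tested are assumptions, while the thresholds are those stated in the list.

```python
def is_zero(p):
    return p <= 0.1  # symbol 0


def is_one(p):
    return p >= 0.9  # symbol 1


def is_mid(p):
    return 0.1 < p < 0.9  # intermediate value (letters a, c)


def classify_ipd_strategy(strategy):
    """Map a 12-entry memory-one strategy to a coarse-grained iPD label."""
    s = list(strategy)
    if len(s) != 12:
        raise ValueError("a memory-one strategy has 12 entries")
    s1, s2, s3, s4, s5 = s[:5]
    # AllD: (0000;**00;**00)
    if all(is_zero(p) for p in s[0:4] + s[6:8] + s[10:12]):
        return "AllD"
    # All remaining patterns share the form (1***;1***;****).
    if not (is_one(s1) and is_one(s5)):
        return "other"
    if is_zero(s2) and is_one(s3) and is_zero(s4):
        return "TFT"    # (1010;1***;****)
    if is_zero(s2) and is_one(s3) and is_mid(s4):
        return "FbF"    # (101c;1***;****)
    if is_mid(s2) and is_one(s3) and is_mid(s4):
        return "GTFT"   # (1a1c;1***;****)
    if is_zero(s2) and is_zero(s3) and s4 >= 2 / 3:
        return "WSLS"   # (100c;1***;****), c >= 2/3
    if s4 >= 2 / 3 and s2 < 2 / 3 and s3 < 2 / 3 and (s2 > 0.1 or s3 > 0.1):
        return "GWSLS"  # (1abc;1***;****)
    return "other"
```

Entries marked with the wildcard * are simply not inspected, so strategies that differ only in those entries receive the same label.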
Transparency suppresses cooperation in Prisoner’s Dilemma
Results of our simulations for the transparent iPD are presented in Table 1. Most of the effective strategies are known from earlier studies on non-transparent games [9]. They rely on the outcome of the previous round, not on the immediate information about the other player’s choice. However, for high transparency (p_{see} → 0.5) a previously unknown strategy emerged, which exploits the knowledge about the other player’s immediate behaviour. We dub this strategy “Leader-Follower” (L-F) since when two L-F players meet for p_{see} = 0.5, the player acting first (the Leader) defects, while the second player (the Follower) sees this and makes a “self-sacrificing” decision to cooperate. Note that when the mean reaction times of the players coincide, they have equal probabilities of becoming the Leader, which balances the benefits of exploiting the sacrificial second move. We characterized as L-F all strategies with profile (*00c;****;*11d) with c < 1/3 and d < 2/3. Indeed, for p_{see} = 0.5 these entries are most important to describe the L-F strategy: after unilateral defection the Leader always defects (s_{2}, s_{3} ≤ 0.1) and the Follower always cooperates (s_{10}, s_{11} ≥ 0.9). Meanwhile, mutual defection most likely takes place when playing against a defector, thus both Leaders and Followers have low probability to cooperate after mutual defection (s_{4} = c < 1/3, s_{12} = d < 2/3). Behaviour after mutual cooperation is only relevant when a player with L-F is playing against a player with a different strategy, and the success of each L-F modification depends on the composition of the population. For instance, (100c;111*;100d) is optimal in a cooperative population.
The frequencies were computed over 10^{9} generations in 80 runs. The frequency of the most successful strategy for each p_{see} value is shown in bold.
In summary, as in the simultaneous iPD, WSLS was predominant in the transparent iPD for low and moderate p_{see}. This is reflected by the distinctive WSLS profiles in the final strategies of the population (Fig 4). Note that GTFT, another successful strategy in the simultaneous iPD, disappeared for p_{see} > 0. For p_{see} ≥ 0.4, the game resembled the sequential iPD and the results changed accordingly. Similar to the sequential iPD [10, 33, 34], the frequency of WSLS waned, the FbF strategy emerged, cooperation became less frequent and took longer to establish itself (Fig 3A). For p_{see} = 0.5 the population was taken over either by L-F, by WSLS-based strategies or (rarely) by FbF or TFT, which is reflected by the mixed profile in Fig 4. Note that the share of distinctly described strategies decreased with increasing p_{see}, which indicates that for high transparency most strategies appear in the population only transiently and rapidly replace each other, see S1 Fig. Under these circumstances, the relative frequency of L-F (17.8% of the population across all generations) is quite high. The fact that neither L-F nor the transient strategies are generally cooperative explains the drop of cooperation in iPD for high p_{see} (Fig 3).
Strategies are taken for the 10^{9}-th generation and averaged over 80 runs. This figure characterizes the final population as a whole and complements Table 1 representing specific strategies. (A) Strategy entries s_{1}, …, s_{4} are close to (1001) for p_{see} = 0.1, …, 0.3 demonstrating the dominance of WSLS. Deviations from this pattern for p_{see} = 0.0 and p_{see} = 0.4 indicate the presence of the GTFT (1a1b) and FbF (101b) strategies, respectively. For p_{see} ≥ 0.4 strategy entries s_{1}, …, s_{4} are quite low due to the extinction of cooperative strategies. (B) Entries s_{5}, …, s_{8} are irrelevant for p_{see} = 0.0 (resulting in random values around 0.5) and indicate the same WSLS-like pattern for p_{see} = 0.1, …, 0.3. Note that s_{6}, s_{7} > 0 indicate that in transparent settings WSLS-players tend to cooperate seeing that the partner is cooperating even when this is against the WSLS principle. The decrease of reciprocal cooperation for p_{see} ≥ 0.4 indicates the decline of WSLS and cooperative strategies in general. (C) Entries s_{9}, …, s_{12} are irrelevant for p_{see} = 0.0 (resulting in random values around 0.5) and are quite low for p_{see} = 0.1, …, 0.3 (s_{12} is irrelevant in a cooperative population). Increase of s_{9}, …, s_{11} for p_{see} ≥ 0.4 indicates that mutual cooperation in the population is replaced by unilateral defection.
To better explain the success of different strategies at different transparency levels, we analytically compared the strategies that emerged most frequently in simulations. Pairwise comparison of these strategies (Fig 5) helps to explain the superiority of WSLS for p_{see} < 0.5, the disappearance of GTFT for p_{see} > 0.0, and the abrupt increase of L-F frequency for p_{see} = 0.5.
For each pair of strategies the maps show if the first of the two strategies increases in frequency (up-arrow), or decreases (down-arrow) depending on visibility of the other player’s action and the already existing fraction of the respective strategy. The red lines mark the invasion thresholds, i.e. the minimal fraction of the first strategy necessary for taking over the population against the competitor second strategy. A solid-line invasion threshold shows the stable equilibrium fraction which allows coexistence of both strategies (see “Methods”). Dashed-line invasion thresholds indicate dividing lines above which only the first, below only the second strategy will survive. (A) WSLS has an advantage over GTFT: the former takes over the whole population even if its initial fraction is as low as 0.25. (B) GTFT coexists with a (prudent) version of the cooperative strategy AllC (1111; 1111; 0000), which is more successful for p_{see} ≥ 0.1. (C,D) L-F performs almost as well as GTFT and WSLS, (E) but can resist the AllD strategy (0000; 0000; 0000) only for high transparency. (F) Note that WSLS may lapse into its treacherous version. This strategy dominates WSLS for p_{see} > 0 but is generally weak and cannot invade when other strategies are present in the population. Notably, when treacherous WSLS takes a part of the population, it is quickly replaced by L-F, which partially explains L-F success for high p_{see}.
Although cooperation in the transparent iPD is rare for p_{see} ≥ 0.4, L-F is in a sense also a cooperative strategy for iPD: In a game between two L-F players with equal mean reaction times, both players alternate between unilateral defection and unilateral cooperation in a coordinated way, resulting in equal average payoffs of (S + T)/2. Such alternation is generally sub-optimal in iPD since R > (S + T)/2; for instance, in our simulations R = 3 > (S + T)/2 = 2.5. To check the influence of the payoff on the predominance of strategies, we varied the value of R while keeping T, S and P the same as in Fig 2, as was done in [24] for the simultaneous iPD. Fig 6 shows that for R > 3.2, evolution in the transparent iPD strongly favours cooperation for all transparency levels, but R ≤ 3.2 is sufficiently close to (S + T)/2 to make L-F a safe and efficient strategy. Indeed, as one can see from Fig 3C for R = 3, other strategies for high transparency perform much worse than L-F, resulting in an average population payoff around 2.
Data exemplified for p_{see} = 0.3 and for p_{see} = 0.5. Values of T, S and P are the same as in Fig 2, values of R are in range (S + T)/2 < R < T that defines the Prisoner’s Dilemma payoff. The frequencies were computed over 10^{9} generations in 40 runs. We describe as “other cooperative” all strategies having a pattern (1*1*;1***;****) or (1**1;1***;****) but different from WSLS, TFT and FbF. While for p_{see} = 0.3 the population for low R mainly consists of defectors, for p_{see} = 0.5 L-F provides an alternative to defection. For R ≥ 3.2 mutual cooperation becomes much more beneficial, which allows cooperative strategies to prevail for all transparency levels. Yet higher transparency reduces cooperation for all values of R. Note that the higher R is, the less specific the cooperative strategies are. Indeed, for high R cooperation is much more effective than other types of behaviour, which makes all cooperative strategies (including even unconditional cooperation) evolutionarily successful (we refer to [24] for a similar result in the case of sequential iPD).
Note that higher transparency reduces cooperation for all values of R, although the effect is most prominent for R ≤ 3.2. Indeed, while the share of non-cooperative strategies in the population is negligible for R ≥ 3.6 with p_{see} = 0 [24] and is below 5% with p_{see} = 0.3, it is above 5% for p_{see} = 0.5 for all R ≥ 3.6 (compare the top and bottom plots in Fig 6).
Cooperation emergence in the transparent (Anti-)Coordination Game
Our simulations revealed that four memory-one strategies are most effective in i(A)CG for various levels of transparency. In contrast to iPD, there exist only a few studies of strategies in non-transparent i(A)CG [31, 35]; therefore we describe the observed strategies in detail.
- Turn-taker aims to enter a fair coordination regime, where players alternate between IA (Player 1 insists and Player 2 accommodates) and AI (Player 1 accommodates and Player 2 insists) states. In the simultaneous i(A)CG, this strategy takes the form (q01q), where q = 5/8 guarantees maximal reward in a non-coordinated play against a partner with the same strategy for the payoff matrix in Fig 2B. Turn-taking was shown to be successful in the simultaneous i(A)CG for a finite population of agents with pure strategies (i.e., having 0 or 1 entries only, with no account for mistakes) and a memory spanning three previous rounds [31]. Here in our transparent i(A)CG, we classify as Turn-takers all strategies encoded by (*01*;*0**;**1*).
- Challenger takes the form (1101) in the simultaneous i(A)CG. When two players with this strategy meet, they initiate a “challenge”: both insist until one of the players makes a mistake (that is, accommodates). Then, the player making the mistake (loser) submits and continues accommodating, while the winner continues insisting. This period of unfair coordination beneficial for the winner ends when the next mistake of either player (the winner accommodating or the loser insisting) triggers a new “challenge”. Challenging strategies were theoretically predicted to be successful in simultaneous i(A)CG [35, 36]. In our transparent i(A)CG, the challenger strategy is encoded by (11b*;****;*1**) and has two variants: Challenger “obeys the rules” and does not initiate a challenge after losing (b ≤ 0.1), while Aggressive Challenger may switch to insisting even after losing (0.1 < b ≤ 1/3). In both variants we allow a broad range of values for s_{4} since this entry is used after both players accommodate, which is an extremely rare case in a game between challengers.
- The Leader-Follower (L-F) strategy s = (1111; 0000; 1111) relies on the visibility of the other’s action and was not considered previously. In the i(A)CG game between two players with this strategy, the faster player insists and the slower player accommodates. In a simultaneous setting, this strategy lapses into inefficient stubborn insisting since all players consider themselves leaders, but in transparent settings with high p_{see} this strategy provides an effective and fair cooperation if mean reaction times are equal. In particular, for p_{see} > 1/3 the L-F strategy is a Nash Equilibrium in a one-shot game (see Proposition 4 in “Methods”).
When the entire population adopts an L-F strategy, most strategy entries become irrelevant since in a game between two L-F players the faster player never accommodates and the outcome of the previous round is either IA or AI. Therefore, we classify all strategies encoded by (*11*;*00*;****) as L-F.
- Challenging Leader-Follower is a hybrid of the Challenger and L-F strategies encoded by (11b*;0c0*;*1**), where 1/3 < b ≤ 0.9, c ≤ 1/3. With such a strategy a player tends to insist without seeing the partner’s choice, and tends to accommodate when seeing that the partner insists; both these tendencies are stronger than for Aggressive Challengers, but not as strong as for Leader-Followers.
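The pattern notation used for these strategy classes can be read as a simple matching rule. The following minimal sketch is illustrative only: the helper names `matches` and `classify` are hypothetical, and the tolerances (an entry counts as "1" if it is at least 0.9 and as "0" if it is at most 0.1) are assumptions made for this example rather than values stated here; `*` matches any entry.

```python
# Illustrative sketch: match 12-entry strategies against patterns such as
# "*01*;*0**;**1*". The thresholds lo and hi are ASSUMED tolerances.

def matches(strategy, pattern, lo=0.1, hi=0.9):
    """Return True if every constrained entry of the strategy fits the pattern."""
    symbols = pattern.replace(";", "")
    if len(symbols) != len(strategy):
        raise ValueError("pattern and strategy must have the same number of entries")
    for value, symbol in zip(strategy, symbols):
        if symbol == "1" and value < hi:
            return False
        if symbol == "0" and value > lo:
            return False
    return True

# Patterns quoted in the text for the transparent i(A)CG.
PATTERNS = {
    "Turn-taker": "*01*;*0**;**1*",
    "Leader-Follower": "*11*;*00*;****",
}

def classify(strategy):
    """List the names of all patterns that the strategy fits."""
    return [name for name, pattern in PATTERNS.items() if matches(strategy, pattern)]

pure_lf = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
print(classify(pure_lf))
```

The parametrised classes (Challenger variants and Challenging Leader-Follower) would need explicit range checks on b and c in addition to such a pattern test.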
The results of our simulations for i(A)CG are presented in Table 2. The entries of the final population average strategy (Fig 7) show considerably different profiles for various values of p_{see}. Challengers, Turn-takers, and Leader-Followers succeeded for low, medium and high probabilities to see partner’s choice, respectively. Note that due to the emergence of Leader-Follower strategy, cooperation thrives for p_{see} = 0.5 and is established much faster than for lower transparency (Fig 3B).
Strategies are taken for the 10^{9}-th generation and averaged over 80 runs. This figure characterizes the final population as a whole and complements Table 2 representing specific strategies. (A): Strategy entries s_{1}, …, s_{4}. The decrease of the s_{2}/s_{3} ratio reflects the transition of the dominant strategy from challenging to turn-taking for p_{see} = 0.0, …, 0.4. For p_{see} = 0.5 the dominance of the Leader-Follower strategy is indicated by s_{2} = s_{3} = 1. (B) Entries s_{5}, …, s_{8} are irrelevant for p_{see} = 0. Values of s_{6}, s_{7} decrease as p_{see} increases, indicating an enhancement of cooperation in i(A)CG for higher transparency (s_{8} is almost irrelevant since mutual accommodation is a very rare event, and s_{5} is irrelevant for a population of L-F players, which is the case for p_{see} = 0.5). (C) Entries s_{9}, …, s_{12} are irrelevant for p_{see} = 0. The decrease of the s_{10}/s_{11} ratio for p_{see} = 0.1, …, 0.4 reflects the transition of the dominant strategy from challenging to turn-taking.
The frequencies were computed over 10^{9} generations in 80 runs. The frequency of the most successful strategy for each p_{see} value is shown in bold.
To provide additional insight into the results of the i(A)CG simulations, we studied analytically how various strategies perform against each other (Fig 8). As with the iPD, this analysis helps to understand why different strategies were successful at different transparency levels. A change of behaviour for p_{see} > 1/3 is in line with our theoretical results (Corollary 7) indicating that for these transparency levels L-F is a Nash Equilibrium. Population dynamics for i(A)CG with a payoff different from that presented in Fig 2B also depend on the Nash Equilibria of the one-shot game, described by Proposition 4 in “Methods”.
For each pair of strategies the maps show if the first of the two strategies increases in frequency (up-arrow), or decreases (down-arrow) depending on visibility of the other player’s action and the already existing fraction of the respective strategy. The red lines mark the invasion thresholds, i.e. the minimal fraction of the first strategy necessary for taking over the population against the competitor second strategy. Solid-line invasion thresholds show the stable equilibrium fraction which allows coexistence of both strategies (see “Methods”). Dashed-line invasion thresholds indicate dividing lines above which only the first, below only the second strategy will survive. In all strategies, 1 stands for 0.999 and 0 stands for 0.001; the entries s_{9} = … = s_{12} = 1 are the same for all strategies and are omitted. (A) Turn-taker (q01q; 0000) with q = 5/8 for p_{see} > 0 outperforms Aggressive Challenger, (B) but not Challenger. (C) Challenger can coexist with Aggressive Challenger for low transparency, but is dominated for p_{see} > 1/3. (D) Leader-Follower (1111; 0000) clearly outperforms Turn-taker for p_{see} > 0.4 and (E,F) other strategies for p_{see} > 1/3.
Discussion
In this paper, we introduced the concept of transparent games which integrates the visibility of the partner’s actions into a game-theoretic setting. As a model case for transparent games, we considered iterated dyadic games where players have probabilistic access to information about the partner’s choice in the current round. When reaction times for both players are equal on average, the probability p_{see} of accessing this information can vary from p_{see} = 0.0 corresponding to the canonical simultaneous games, to p_{see} = 0.5 corresponding to sequential games with random order of choices. Note that in studies on the classic sequential games [10, 33] players were bound to the same strategy regardless of whether they made their choice before or after the partner. In contrast, transparent games allow different sub-strategies (s_{1}, …, s_{4}), (s_{5}, …, s_{8}) and (s_{9}, …, s_{12}) for these situations.
We showed that even a small probability p_{see} of seeing the partner’s choice before one’s own decision changes the long-term optimal behaviour in the iterated Prisoner’s Dilemma (iPD) and (Anti-)Coordination Game (i(A)CG). When this probability is high, its effect is pronounced: transparency enhances cooperation in the generally cooperative i(A)CG, but reduces cooperation in the more competitive iPD. Different transparency levels also bring qualitatively different strategies to success. In particular, in both games for high transparency a new class of strategies, which we termed “Leader-Follower” strategies, evolves. Although frequently observed in humans and animals (see, for instance, [5, 13]), these strategies have up to now remained beyond the scope of game-theoretical studies, but naturally emerge in our transparent games framework. Note that here we focused on memory-one strategies for reasons of better tractability; results for strategies with longer memory can differ considerably [37].
Our approach is similar to the continuous-time approaches suggested in [38] and [39]. However, in these studies the game is played continuously, without any rounds at all, while here we suppose that the game consists of clearly specified rounds, although the time within each round is continuous. This assumption can be considered naturalistic, since many real world interactions and behaviours are episodic, have clear starting and end points, and hence are close to distinct rounds [4, 14, 40, 41]. Transparent games to some degree resemble random games (see e.g. [42, 43]) since in both settings the outcome of the game depends on a stochastic factor. However in random games randomness immediately affects the payoff, while in transparent games it determines the chance to learn the partner’s choice. While this chance influences the payoff of the players, the effect depends on their strategies, which is not the case in random games. Transparent games are also related to Bayesian games (see e.g. [44–47]), where players are uncertain about the rules of the game (payoff, information possessed by other players, etc.) but have subjective probability distributions over the possible alternatives (beliefs). While in Bayesian games players dynamically update their beliefs by learning [45, 48], in this manuscript we consider static agents, and the dynamics happens only on the population level. Yet, introducing learning mechanisms [49] into the framework of transparent games could be an interesting direction for future work. More generally, transparent games can be considered as a special case of games with imperfect or asymmetric information, which have been studied before, mainly in economics (see [50–54] for recent examples). In games with private monitoring, for example (see [55] for a review), each player gets imperfect information on the actions of other players at the end of every round. 
Different players may get different information, which is similar to our transparent games. Yet, our approach differs from those developed in economic game theory, since in the transparent games players have different information about the immediately relevant present actions, while in games with imperfect private monitoring players have different information about the past actions [50, 53–55]. This difference is intended and important, since reaction times and direct action visibility are relevant in natural interactive behaviours as studied in biology and neuroscience, while they might be of less importance in economics. However, the relation of transparent games to games with imperfect private monitoring helps to make an interesting observation: The information available to the players in the transparent games is inherently asymmetric in those rounds where the choice of one player is visible to the other (although here we consider players as getting the same amount of information on average, across many rounds). Thus, high transparency also means high asymmetry of the access to the information in each specific round. This asymmetry (and not the amount of information per se) may be the actual cause of the shift in the optimal behaviour observed for high transparency.
The value of probability p_{see} strongly affects the evolutionary success of strategies. In particular, in the transparent i(A)CG even moderate p_{see} helps to establish cooperative turn-taking, while high p_{see} brings about a new successful strategy, Leader-Follower (L-F). For the transparent iPD we have shown that for p_{see} > 0 the Generous tit-for-tat strategy is unsuccessful and Win–stay, lose–shift (WSLS) is an unquestionable evolutionary winner for 0 < p_{see} ≤ 0.4. However, WSLS is not evolutionary stable (see the caption of Fig 5); our results indicate that in general there are no evolutionary stable strategies in the transparent iPD, which was already known to be the case for the simultaneous iPD [9, 56]. Moreover, if the reward for mutual cooperation is R ≤ 3, for high transparencies (p_{see} ≥ 0.4) all strategies become quite unstable and cooperation is hard to establish (Fig 6). Finally, for p_{see} = 0.5, L-F becomes successful in iPD and is more frequent than WSLS for R ≤ 3.2. For such a payoff, mutual cooperation is not much more beneficial than the alternating unilateral defection resulting from the L-F strategy. L-F brings a payoff of only (S + T)/2 = 2.5, but is generally less susceptible to exploitation by defecting strategies. This explains the abrupt drop of cooperation in the transparent iPD with p_{see} ≥ 0.4 for R = 3.0 (S2 Note), while such a drop is less prominent for R > 3.2 (Fig 6). Note that R > 3 = (T + P)/2 promotes mutual cooperation among memory-one strategies since it results in a higher payoff than defecting against a cooperator (resulting in a payoff T) followed by mutual defection (payoff P), which is a natural response of any cautious strategy like TFT or WSLS. Therefore the case R > 3.2 is perhaps less interesting than the classic payoff matrix with R = 3.
Although resulting in a lower payoff than explicit cooperation, L-F can be also seen as a cooperative strategy for iPD. While the choice of Leaders (defection) is entirely selfish, Followers “self-denyingly” cooperate with them. Importantly, the L-F strategy is not beneficial for some of the players using it in any finite perspective, which distinguishes this strategy from most cooperative strategies. Let us explain this point by comparing L-F with WSLS. In a game between two WSLS-players, neither benefits from unilaterally switching to defection even in the short term for R ≥ (T + P)/2. While the defecting player gets T = 5 in the first round, its payoff in the next round is P = 1, which makes the average payoff over two rounds less than or equal to the reward for cooperation R. Thus for the iPD with standard payoff R = 3 = (T + P)/2 WSLS players do not benefit from defecting against their WSLS-partners already for the two-round horizon. (Note that for R < (T + P)/2 defection is effective against WSLS, which explains the low frequency of WSLS for R < 3 in Fig 6). Now, assume that one is playing the transparent iPD with p_{see} = 0.5 against a partner with a pure L-F strategy (0000; 1111; 1111) and has to choose between L-F and AllD strategies. In a single round using AllD is (strictly) better with probability p = 1/2 (probability of being a Follower). From the two-round perspective using AllD is beneficial with p = 1/4 (the probability of being a Follower in both rounds). For n = 6 rounds, AllD is still better than L-F with p = 7/64 (the probability of being a Follower in 5 or 6 rounds out of 6, which results in an average payoff equal to either 5/6 or 0). In general, for any finite number n of rounds, there is a risk to suffer from using the L-F strategy instead of AllD, and the probability of this is given by p = \frac{1}{2^{n}} \sum_{k=0}^{\lfloor nP/T \rfloor} \binom{n}{k}, where \lfloor nP/T \rfloor is the integer part of nP/T and \binom{n}{k} is a binomial coefficient.
That is, adhering to L-F is not beneficial for some of the L-F players in any finite horizon, which makes their behaviour in a sense altruistic. Our results for the transparent iPD demonstrate that such “altruistic-like” behaviour may evolve in a population even without immediate reciprocation. The inherently unequal payoff distribution among L-F players for a finite number of rounds opens interesting perspectives for research, but is outside the scope of this manuscript.
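The probabilities 1/2, 1/4 and 7/64 quoted for n = 1, 2 and 6 rounds can be checked by direct counting. The sketch below rests on assumptions made for this illustration: Leader and Follower roles are equally likely and independent across rounds (p_{see} = 0.5), an L-F player earns T as Leader and S as Follower, the comparison baseline is the mutual-defection payoff P per round, and ties are not counted as "worse". The function name `prob_lf_worse` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_lf_worse(n, T=5, S=0, P=1):
    """Probability that n rounds of L-F against a pure L-F partner pay
    strictly less than n rounds at the mutual-defection payoff P.

    k counts the rounds in which the focal player happens to be the Leader.
    """
    worse = sum(
        comb(n, k)
        for k in range(n + 1)
        if k * T + (n - k) * S < n * P  # total L-F payoff below the baseline
    )
    return Fraction(worse, 2 ** n)

for n in (1, 2, 6):
    print(n, prob_lf_worse(n))  # prints 1 1/2, then 2 1/4, then 6 7/64
```

The probability shrinks as n grows but stays positive for every finite n, because a run of n Follower rounds always has probability 2^{−n}.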
The lack of stability in the transparent iPD renders the analysis of the strategy dynamics for this game non-trivial. Therefore we do not provide here an exhaustive description of strategies in iPD and content ourselves with general observations and explanations. An in-depth analysis of strategy dynamics in the transparent iPD will be provided elsewhere as a separate, more technical paper [57].
Despite the clear differences between the two games, predominant strategies evolving in iPD and i(A)CG have some striking similarities. First of all, in both games, L-F appears to be the most successful strategy for high p_{see} (although for iPD with R ≤ 3 the share of Leader-Followers in the population across all generations is only about 20%, other strategies are even less successful as most of them appear just transiently and rapidly replace each other). This prevalence of the L-F strategy can be explained as follows: in a group where the behaviour of each agent is visible to the others and can be correctly interpreted, group actions hinge upon agents initiating these actions. In both games these initiators are selfish, but see S2 Note for an example of an “altruistic” action initiation. For low and moderate values of p_{see} the similarities of the two games are less obvious. However, the Challenger strategy in i(A)CG follows the same principle of “Win–stay, lose–shift” as the predominant strategy WSLS in iPD, but with modified definitions of “win” and “lose”. For Challenger winning is associated with any outcome better than the minimal payoff corresponding to the mutual accommodation. Indeed, a Challenger accommodates until mutual accommodation takes place and then switches to insisting. Such behaviour is described as “modest WSLS” in [35, 58] and is in line with the interpretation of the “Win–stay, lose–shift” principle observed in animals [59].
The third successful principle in the transparent iPD is “Tit-for-tat”, embodied in Generous tit-for-tat (GTFT), TFT and Firm-but-fair (FbF) strategies. This principle also works in both games since turn-taking in i(A)CG is nothing else but giving tit for tat. In particular, the TFT and FbF strategies, which occur frequently in iPD for p_{see} ≥ 0.4, are partially based on taking turns and are similar to the Turn-Taker strategy in i(A)CG. The same holds to a lesser extent for the GTFT strategy.
The success of specific strategies for different levels of p_{see} makes sense if we understand p_{see} as a species’ ability to signal intentions and to interpret these signals when trying to coordinate (or compete). The higher p_{see}, the better (more probable) is the explicit coordination. This could mean that a high ability to explicitly coordinate actions leads to coordination based on observing the leader’s behaviour. In contrast, moderate coordination ability results in some form of turn-taking, while low ability leads to simple strategies of WSLS-type. In fact, an agent utilizing the WSLS principle does not even need to comprehend the existence of the second player, since WSLS “embodies an almost reflex-like response to the pay-off” [24]. The ability to cooperate may also depend on the circumstances, for example, on the physical visibility of partner’s actions. In a relatively clear situation, following the leader can be the best strategy. Moderate uncertainty requires some (implicit) rules of reciprocity embodied in turn-taking. High uncertainty makes coordination difficult or even impossible, and may result in a seemingly irrational “challenging behaviour” as we have shown for the transparent i(A)CG. However, when players can succeed without coordination (which was the case in iPD), high uncertainty about the other players’ actions does not cause a problem.
While we focused here on the iterated Prisoner’s Dilemma and on the specific formulation of the iterated (Anti-)Coordination Game, the transparent games framework can be applied to other two-player, two-action games. In particular, we have shown that the structure of the Nash equilibria for the Hawk-Dove and Leader games is identical to that of the (Anti-)Coordination Game (see Methods, Proposition 8), which suggests that transparency has a strong effect on successful strategies in these games as well. As future work, it would also be interesting to extend the transparent game framework to N-agent interactions [60–62], to provide an account of naturalistic dynamics in groups.
By taking the visibility of the agents’ actions into account, transparent games may offer a compelling theoretical explanation for a range of biological, sociological and psychological phenomena. One potential application of transparent games is related to experimental research on social interactions, including the emerging field of social neuroscience that seeks to uncover the neural basis of social signalling and decision-making using neuroimaging and electrophysiology in humans and animals [63–66]. So far, most studies have focused on sequential [67, 68] or simultaneous games [69]. One of the main challenges in this field is extending these studies to direct real-time interactions that would entail a broad spectrum of dynamic competitive and cooperative behaviours. In line with this, several recent studies also considered direct social interactions in humans and non-human primates [12–14, 41, 70–74] during dyadic games where players can monitor actions and outcomes of each other. Transparent games allow modelling the players’ access to social cues, which is essential for the analysis of experimental data in the studies of this kind [8]. This might be especially useful when behaviour is explicitly compared between “simultaneous” and “transparent” game settings, as in [12, 14, 70, 74]. In particular, the enhanced cooperation in the transparent i(A)CG for high p_{see} provides a theoretical explanation for the empirical observations in [14], where humans playing an i(A)CG-type game demonstrated a higher level of cooperation and a fairer payoff distribution when they were able to observe the actions of the partner while making their own choice. In view of the argument that true cooperation should benefit from enhanced communication [8], the transparent i(A)CG can in certain cases be a more suitable model for studying cooperation than the iPD (see also [75, 76] for a discussion of studying cooperation by means of i(A)CG-type games).
In summary, transparent games provide a theoretically attractive link between classical concepts of simultaneous and sequential games, as well as a computational tool for modelling real-world interactions. This approach allows integrating work on sensorimotor decision-making under uncertainty with economic game theory. We thus expect that the transparent games framework can help to establish a deeper understanding of social behaviour in humans and animals.
Methods
Transparent games between two players
In this study, we focus on iterated two-player (dyadic) two-action games: in every round both players choose one of two possible actions and get a payoff depending on the mutual choice according to the payoff matrix (Fig 2). A new game setting, transparent game, is defined by a payoff matrix and probabilities p^{i}_{see} (i = 1, 2) of Player i to see the choice of the other player, 0 ≤ p^{i}_{see} ≤ 1. Note that p^{1}_{see} + p^{2}_{see} ≤ 1, and 1 − p^{1}_{see} − p^{2}_{see} is the probability that neither player knows the choice of the partner because they act sufficiently close in time that neither player can infer the other’s action prior to making their own choice. The probabilities p^{i}_{see} can be computed from the distributions of reaction times for the two players, as shown in S1 Note for reaction times modelled by exponentially modified Gaussian distribution [77, 78]. In this figure, reaction times for both players have the same mean, which results in a symmetric distribution of reaction time differences (SN1 Fig. 1B in S1 Note) and p^{1}_{see} = p^{2}_{see} = p_{see}. Here we focus only on this case since for both games considered in this study, unequal mean reaction times provide a strong advantage to one of the players (see below). However, in general p^{1}_{see} ≠ p^{2}_{see}.
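To make the link between reaction times and the probabilities of seeing concrete, the following Monte Carlo sketch estimates both probabilities from simulated reaction times. It is an illustration under stated assumptions and not the procedure of S1 Note: reaction times are drawn as a Gaussian plus an independent exponential component (an exponentially modified Gaussian), and a player is assumed to see the partner's choice whenever it acts at least `delta` later than the partner. The function names, the visibility rule, the threshold `delta` and all parameter values are hypothetical.

```python
import random

def ex_gaussian(rng, mu, sigma, tau):
    """Draw one reaction time: Gaussian(mu, sigma) plus Exponential with mean tau."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)

def estimate_p_see(params1, params2, delta, trials=100_000, seed=0):
    """Estimate (p_see of Player 1, p_see of Player 2) by simulation.

    ASSUMED rule: a player sees the partner's choice if the player acts
    more than `delta` (delta >= 0) after the partner.
    """
    rng = random.Random(seed)
    seen_by_1 = seen_by_2 = 0
    for _ in range(trials):
        rt1 = ex_gaussian(rng, *params1)
        rt2 = ex_gaussian(rng, *params2)
        if rt1 - rt2 > delta:      # Player 1 is slower and sees Player 2
            seen_by_1 += 1
        elif rt2 - rt1 > delta:    # Player 2 is slower and sees Player 1
            seen_by_2 += 1
    return seen_by_1 / trials, seen_by_2 / trials

# Hypothetical parameters (seconds): equal distributions for both players.
same = (0.40, 0.05, 0.10)
print(estimate_p_see(same, same, delta=0.05))
```

With identical distributions the two estimates are close to each other, and their sum stays below 1 because rounds with nearly simultaneous choices count for neither player.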
To illustrate how transparent, simultaneous and sequential games differ, let us consider three scenarios for a Prisoner’s Dilemma (PD):
- If prisoners write their statements and put them into envelopes, this case is described by simultaneous PD.
- If prisoners are questioned in the same room in a random or pre-defined order, one after another, this case is described by sequential PD.
- Finally, in a case of a face-to-face interrogation where prisoners are allowed to answer the questions of prosecutors in any order (or even to talk simultaneously) the transparent PD comes into play. Here prisoners are able to monitor each other and interpret inclinations of the partner in order to adjust their own choice accordingly.
While the transparent setting can be used both in zero-sum and non-zero-sum games, here we concentrate on the latter class where players can cooperate to increase their joint payoff. We consider the transparent versions of two classic games, the PD and the (Anti-)Coordination Game ((A)CG). We have selected PD and (A)CG as representatives of two distinct types of symmetric non-zero-sum games [30, 31]: maximal joint payoff is awarded when players select the same action (cooperate) in PD, but complementary actions in (A)CG (one insists on own preferred option, and the other accommodates this option, to achieve common goal). In games of (A)CG type, one of the two coordinated choices is more beneficial for Player 1 (Player 1 insists, Player 2 accommodates), and the other for Player 2 (Player 1 accommodates, Player 2 insists), thus to achieve fair cooperation players should alternate between these two states.
Another important difference between the two considered games is that in (A)CG a player benefits from acting before the partner, while in PD it is mostly preferable for a player to act after the partner. Indeed, in (A)CG the player acting first has good chances to get the maximal payoff of S = 4 by insisting: when the second player knows that the partner insists, it is better to accommodate and get a payoff of T = 3, than to insist and get R = 2. In PD, however, defection is less beneficial if it can be discovered by the opponent and acted upon (for details, see Subsection “One-shot transparent Prisoner’s Dilemma with unequal reaction times” below). Therefore, in PD most players prefer acting later: defectors to have a better chance of getting T = 5 for a successful defection, and cooperators to make sure that the partners are not defecting against them. The only exception to this rule is the Leader-Follower strategy, but as we show in S1 Note this special case does not change the overall situation for the simulations. Therefore, the optimal behaviour in PD is generally to wait as long as possible, while in (A)CG a player should act as quickly as possible. Consequently, when the time for making a choice is bounded from below and from above, evolution in these games favours marginal mean reaction times: maximal allowed reaction time in PD and minimal allowed reaction time in (A)CG. Player types with different behaviour are easily invaded. Therefore we assumed in all simulations that the reaction times have a constant and equal mean. We also assumed that reaction times for all players have an equal non-zero variance and that the difference of the reaction time distributions for two types of players is always symmetric (see S1 Note). This results in p_{see} being the same for all types, thus all players have equal chances to see the choices of each other.
Analysis of one-shot transparent games
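Consider a one-shot transparent game between Player 1 and Player 2 having strategies s^{1} = (s^{1}_{1}; s^{1}_{2}; s^{1}_{3}) and s^{2} = (s^{2}_{1}; s^{2}_{2}; s^{2}_{3}), and probabilities to see the choice of the partner p^{1}_{see} and p^{2}_{see}, respectively. An expected payoff for Player 1 is given by
E(s^{1}, s^{2}) = (1 − p^{1}_{see} − p^{2}_{see}) [s^{1}_{1}s^{2}_{1}R + s^{1}_{1}(1 − s^{2}_{1})S + (1 − s^{1}_{1})s^{2}_{1}T + (1 − s^{1}_{1})(1 − s^{2}_{1})P]
+ p^{2}_{see} [s^{1}_{1}s^{2}_{2}R + s^{1}_{1}(1 − s^{2}_{2})S + (1 − s^{1}_{1})s^{2}_{3}T + (1 − s^{1}_{1})(1 − s^{2}_{3})P]
+ p^{1}_{see} [s^{2}_{1}s^{1}_{2}R + s^{2}_{1}(1 − s^{1}_{2})T + (1 − s^{2}_{1})s^{1}_{3}S + (1 − s^{2}_{1})(1 − s^{1}_{3})P], (1)
where the first line describes the case when neither player sees the partner’s choice, the second line describes the case when Player 2 sees the choice of Player 1, and the third describes the case when Player 1 sees the choice of Player 2.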
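The three cases of the expected payoff can be written out as a short function. This is a sketch under assumptions, not the authors' code: each strategy is taken to be a triple (s_1; s_2; s_3) giving the probability of choosing the first action (cooperation in PD) when not seeing the partner, when seeing the partner choose the first action, and when seeing the partner choose the second action, respectively. The function names are hypothetical, and the default payoffs are the PD values used in the text (R = 3, T = 5, P = 1, with S = 0 inferred from (S + T)/2 = 2.5).

```python
def blind_payoff(a, b, R, S, T, P):
    """Payoff to a player choosing action 1 with probability a against a
    partner choosing it with probability b, neither seeing the other."""
    return a * b * R + a * (1 - b) * S + (1 - a) * b * T + (1 - a) * (1 - b) * P

def expected_payoff(s, r, p1, p2, R=3.0, S=0.0, T=5.0, P=1.0):
    """Expected one-shot payoff of Player 1 (strategy s) against Player 2
    (strategy r); p1 and p2 are the players' probabilities of seeing."""
    s1, s2, s3 = s
    r1, r2, r3 = r
    # Case 1: nobody sees the partner's choice.
    nobody = blind_payoff(s1, r1, R, S, T, P)
    # Case 2: Player 2 sees Player 1, who acted blind; Player 2 responds
    # with r2 after action 1 and with r3 after action 2.
    p2_sees = s1 * (r2 * R + (1 - r2) * S) + (1 - s1) * (r3 * T + (1 - r3) * P)
    # Case 3: Player 1 sees Player 2, who acted blind.
    p1_sees = r1 * (s2 * R + (1 - s2) * T) + (1 - r1) * (s3 * S + (1 - s3) * P)
    return (1 - p1 - p2) * nobody + p2 * p2_sees + p1 * p1_sees

p = 0.3
print(expected_payoff((0, 0, 0), (1, 1, 0), p, p))  # defector against (1; 1; 0)
print(expected_payoff((1, 1, 0), (1, 1, 0), p, p))  # (1; 1; 0) against itself
```

For p_{see} = 0.3 the first value is approximately 3.8, in agreement with the expression p_{see}P + (1 − p_{see})T given in the text for a defector facing (1; 1; 0), and the second is approximately R = 3.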
Let us provide two definitions that will be used throughout this section.
Definition 1. Strategies s^{1} and s^{2} are said to form a Nash Equilibrium if neither player would benefit from unilaterally switching to another strategy, that is E(s^{1}, s^{2}) ≥ E(r^{1}, s^{2}) and E(s^{2}, s^{1}) ≥ E(r^{2}, s^{1}) for any alternative strategies r^{1} and r^{2} of Players 1 and 2, respectively.
Definition 2. Let us denote E_{ij} = E(s^{i}, s^{j}). Strategy s^{1} is said to dominate strategy s^{2} if E_{11} ≥ E_{21} and E_{12} ≥ E_{22}. If both inequalities are strict, s^{1} strictly dominates s^{2}. Strategies s^{1} and s^{2} are said to be bistable when E_{11} > E_{21} and E_{12} < E_{22}. Strategies s^{1} and s^{2} co-exist when E_{11} < E_{21} and E_{12} > E_{22}.
Some intuition on these notions is provided below in subsection “Evolutionary dynamics of two strategies”. We refer to [9] for details.
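Definition 2 translates directly into a small decision rule on the four payoffs E_{11}, E_{21}, E_{12} and E_{22}. The sketch below is illustrative; the function name `relation` and the returned labels are chosen for this example only, and weak dominance (with ≥) is reported without distinguishing the strict case.

```python
def relation(e11, e21, e12, e22):
    """Classify the relation between strategies s1 and s2 following
    Definition 2, where e_ij is the payoff of strategy i against strategy j."""
    if e11 >= e21 and e12 >= e22:
        return "s1 dominates s2"
    if e21 >= e11 and e22 >= e12:
        return "s2 dominates s1"
    if e11 > e21 and e12 < e22:
        return "bistable"   # each strategy is the better reply to itself
    return "coexist"        # each strategy is the better reply to the other

# Classic simultaneous PD, s1 = cooperate, s2 = defect (R=3, T=5, S=0, P=1):
print(relation(3, 5, 0, 1))  # prints: s2 dominates s1
```

After the two dominance checks fail, only the bistable and coexistence sign patterns remain, so the final branch needs no explicit test.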
For the sake of simplicity, we assume for the rest of this section that , otherwise the game is equivalent to the classic sequential or simultaneous game. First we consider the one-shot transparent Prisoner’s dilemma (PD), and then—(Anti-)Coordination Game ((A)CG).
One-shot transparent Prisoner’s Dilemma with equal reaction times.
Here we assume that p^{1}_{see} = p^{2}_{see} = p_{see} to simplify the discussion. Similar to the classic one-shot PD, in the transparent PD all Nash Equilibria (NE) correspond to mutual defection. To show this we make an important observation: in the one-shot PD it is never profitable to cooperate when seeing the partner’s choice.
Lemma 1. In one-shot transparent PD with p_{see} > 0 any strategy (s_{1}; s_{2}; s_{3}) with s_{2}, s_{3} > 0 is dominated by strategies (s_{1}; 0; s_{3}) and (s_{1}; s_{2}; 0). The dominance of (s_{1}; 0; s_{3}) is strict when s_{1} > 0, the dominance of (s_{1}; s_{2}; 0) is strict when s_{1} < 1.
Proof. Consider the strategies s^{1} = (s_{1}; 0; s_{3}), s^{2} = (s_{1}; s_{2}; s_{3}). To show that s^{1} dominates s^{2}, it is sufficient to demonstrate that E_{11} − E_{21} ≥ 0 and E_{12} − E_{22} ≥ 0. The two following inequalities can be inferred from (1), from the fact that in PD R < T and from the assumptions that p_{see}, s_{1}, s_{2} > 0: E_{11} − E_{21} = p_{see}s_{1}s_{2}(T − R) ≥ 0 and E_{12} − E_{22} = p_{see}s_{1}s_{2}(T − R) ≥ 0. As one can easily see, both inequalities are strict for s_{1} > 0. The second part of the proof follows from S < P and (1), and is otherwise identical, therefore we omit it.
Now we can describe the NE strategies in transparent PD:
Proposition 2. In one-shot transparent PD all the Nash Equilibria consist of pairs of strategies (0; x; 0) with 0 ≤ x ≤ 1 and x ≤ (1 − p_{see})(P − S) / (p_{see}(R − S)). (2)
Proof. First we show that for any x, y satisfying (2), strategies (0; x; 0) and (0; y; 0) form a Nash Equilibrium. Assume that there exists a strategy (s_{1}; s_{2}; s_{3}), which provides a better payoff against (0; x; 0) than (0; y; 0). According to Lemma 1, the expected payoff of a strategy (s_{1}; 0; 0) is not less than the payoff of (s_{1}; s_{2}; s_{3}). Now it remains to find the value of s_{1} maximizing the expected payoff E of (s_{1}; 0; 0). From (1) we have: E = P + s_{1}[p_{see}x(R − S) − (1 − p_{see})(P − S)]. Thus the expected payoff is maximized by s_{1} = 0 if inequality (2) holds and by s_{1} = 1 otherwise. In the former case the strategy (s_{1}; 0; 0) results in the same payoff P as the strategy (0; y; 0), which proves that a pair of strategies (0; x; 0), (0; y; 0) is an NE. If (2) does not hold, strategy (0; x; 0) is not an NE, since switching to (1; 0; 0) results in a better payoff.
Let us show that there are no further NE. Indeed, according to Lemma 1 if an alternative NE exists, it can only consist of strategies (1; 0; z) or (u; 0; 0) with 0 ≤ z ≤ 1 and 0 < u < 1. In both cases switching to unconditional defection is preferable, which finishes the proof.
The one-shot transparent PD has two important differences from the classic game. First, the unconditional defection (0; 0; 0) dominates the cooperative strategy (1; 1; 0) only for p_{see} < (T − R)/(T − P). Indeed, when both players stick to (1; 1; 0), their payoff is equal to R, while a player who switches to (0; 0; 0) gets p_{see}P + (1 − p_{see})T, which exceeds R exactly when this inequality holds. However, (1; 1; 0) is dominated by the strategy (1; 0; 0), which cooperates when it does not see the choice of the partner and defects otherwise. This strategy, in turn, is dominated by (0; 0; 0).
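As a quick numerical illustration of this comparison, the following minimal Python sketch evaluates the gain from switching to (0; 0; 0) in a population of (1; 1; 0) players, using only the two payoffs stated above (R for staying, p_{see}P + (1 − p_{see})T for switching). The function names are ours and purely illustrative; R = 3, T = 5, P = 1 are the standard PD values used elsewhere in the text.

```python
def switching_gain(p_see, R, T, P):
    """Payoff of switching to (0; 0; 0) minus the payoff R of staying with (1; 1; 0)."""
    return p_see * P + (1 - p_see) * T - R


def defection_threshold(R, T, P):
    """Value of p_see at which the gain above vanishes: (T - R) / (T - P)."""
    return (T - R) / (T - P)


# Standard PD payoffs (illustrative choice).
R, T, P = 3, 5, 1
threshold = defection_threshold(R, T, P)
for p_see in (0.0, 0.2, 0.4):
    print(p_see, switching_gain(p_see, R, T, P))
```

Under these payoffs the gain stays positive for every p_{see} below the threshold, so switching to unconditional defection pays off throughout that range.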
Second, in transparent PD unconditional defection (0; 0; 0) is not evolutionarily stable, as players can switch to (0; x; 0) with x > 0 while retaining the same payoff. This, together with Proposition 3 below, makes possible a kind of evolutionary cycle: (1; 0; 0) → (0; 0; 0) ↔ (0; x; 0) → (1; 1; 0), (1; 0; 0) → (1; 0; 0).
Proposition 3. In transparent PD strategies (1; 0; 0) and (0; x; 0) have the following relations:
- if condition (2) and the following condition (3) are satisfied, then (0; x; 0) dominates (1; 0; 0);
- if neither (2) nor (3) is satisfied, then (1; 0; 0) dominates (0; x; 0);
- if (2) is satisfied but (3) is not, then the two strategies coexist;
- if (3) is satisfied but (2) is not, then the two strategies are bistable.
Proof. We prove only the first statement since the proof of the others is almost the same.
Let Player 1 use strategy (1; 0; 0) and Player 2—strategy (0; x; 0). To prove that (0; x; 0) dominates (1; 0; 0) we need to show that Player 2 has no incentive to switch to (1; 0; 0) and that Player 1, on the contrary, would get higher payoff if using (0; x; 0). The latter statement follows from Proposition 2. To show that the former also takes place we simply write down expected payoffs E_{11} and E_{21} of strategies (1; 0; 0) and (0; x; 0) when playing against (1; 0; 0): Now it can be easily seen that E_{11} ≤ E_{21} holds whenever inequality (3) is satisfied.
One-shot transparent Prisoner’s Dilemma with unequal reaction times.
Here we consider the case when players have unequal probabilities to see the partner’s choice. We focus on a simple example showing why waiting is generally beneficial in the transparent iPD. Assume that all players in the population act as quickly as they can, but cooperation takes on average longer than defection. Assume further that a player preparing to cooperate may see the partner defecting, and that it is then still possible for this player to change its decision and defect. Finally, let us consider only pure strategies, that is, s_{1}, s_{2}, s_{3} ∈ {0, 1}. The question now is which strategy would win in this case.
From Lemma 1, we know that it is sufficient to consider two strategies: “cooperators” s^{1} = (1; 0; 0) and “defectors” s^{2} = (0; 0; 0) since they dominate all other strategies. Note that the probability of cooperative players to see the choice of defectors is higher than the probability of defectors to see the choice of cooperators, resulting in . The probability of a player seeing the choice of another player with the same strategy is not higher than 0.5 (since these probabilities are equal for both players and their sum is not higher than 1), therefore it holds . Note that, as before, we assume .
Then the expected payoff matrix for these two strategies in the one-shot transparent PD is given by
Since for , three variants are possible:
- cooperative strategy s^{1} dominates for ;
- s^{1} and s^{2} are bistable for E_{11} > E_{21}, that is for (4)
- defecting strategy s^{2} dominates otherwise.
For the standard Prisoner’s Dilemma payoff matrix (Fig 2), inequality (4) turns into . Since , cooperative strategy s^{1} acting with a delay has a chance to win over defectors if it can see their actions with probability . This example demonstrates that cooperation can survive in one-shot Prisoner’s dilemma under certain (artificial) assumptions. More importantly, this example shows the importance of seeing partner’s choice in transparent Prisoner’s Dilemma in general, illustrating the incentive of players to wait for partner’s action.
One-shot transparent (Anti-)Coordination Game.
Recall [79] that in the classic one-shot (A)CG game there are three Nash Equilibria: two pure (Player 1 insists, Player 2 accommodates, or vice versa) and one mixed (each player insists with probability ). The Nash Equilibria for the transparent (A)CG game are specified by the following proposition.
Proposition 4. Consider one-shot transparent (A)CG between Players 1 and 2 with probabilities to see the choice of the partner and , respectively. Let , then this game has the following pure strategy NE.
- Player 1 uses strategy (0; 0; 1), Player 2 uses strategy (1; 0; 1)—for (5)
- Player 1 uses strategy (1; 0; 1), Player 2 uses strategy (0; 0; 1)—for (6)
- Both players use strategy (1; 0; 1)—when (6) is not satisfied.
Additionally, if inequality (5) is satisfied, there is also a mixed-strategy NE: Player i uses strategy with (7) In other words, when (5) holds, there are two pure-strategy and one mixed-strategy NE. Otherwise there is only one pure-strategy NE: Player 1 uses strategy (1; 0; 1), Player 2 uses strategy (0; 0; 1) when (6) holds, and both Players use (1; 0; 1) when (6) does not hold.
Remark 1. For the correct interpretation of Proposition 4 it is important that inequality (6) holds automatically whenever (5) holds, since The latter statement follows from (assumption of Proposition 4) and the fact that . Indeed, it holds , and, consequently,
To prove Proposition 4, we need two lemmas. First, similar to the Prisoner’s dilemma, for the transparent (A)CG we have:
Lemma 5. In one-shot transparent (A)CG any strategy (s_{1}; s_{2}; s_{3}) is dominated by strategies (s_{1}; 0; s_{3}) and (s_{1}; s_{2}; 1). The dominance of (s_{1}; 0; s_{3}) is strict when s_{1} > 0, the dominance of (s_{1}; s_{2}; 1) is strict when s_{1} < 1.
The proof is based on the fact that the inequalities R < T and P < S hold for the (A)CG. Otherwise the proof is identical to the proof of Lemma 1.
Lemma 6. In one-shot transparent (A)CG, when Player 1 uses strategy (1; 0; 1), the best response for Player 2 is to use strategy (0; 0; 1) for and to use (1; 0; 1) otherwise.
Proof. By Lemma 5 the best response for Player 2 is a strategy (s_{1}; 0; 1) with 0 ≤ s_{1} ≤ 1. When Player 2 uses this strategy against (1; 0; 1), the expected payoff of Player 2 is given by Thus the payoff of Player 2 depends linearly on the value of s_{1} and is maximized by s_{1} = 0 if (8) and by s_{1} = 1 otherwise. Inequality (8) is equivalent to (6), which completes the proof.
Using Lemmas 5 and 6, we can now compute NE for the one-shot transparent (A)CG:
Proof of Proposition 4. Pure-strategy NE are obtained immediately from Lemma 6. To compute the mixed-strategy NE, recall that Player 1 achieves it when the expected payoffs obtained by Player 2 for insisting and for accommodating are equal: By computing from this equation and applying the same argument for Player 2, we get the strategy entries given in (7).
Corollary 7. Consider one-shot transparent (A)CG with S = 4, T = 3, R = 2, P = 1, where both players have equal probabilities p_{see} to see the choice of the partner. In this game there are three NE for p_{see} < 1/3: (a) Player 1 uses strategy (1; 0; 1), Player 2 uses strategy (0; 0; 1); (b) vice versa; (c) both players use strategy (x; 0; 1), with . For p_{see} ≥ 1/3, (1; 0; 1) is the only NE.
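The threshold p_{see} = 1/3 in Corollary 7 can be checked numerically. The sketch below is a minimal illustration under our reading of the one-shot model, which is an assumption since Eq (1) is not reproduced here: with probability p_s the focal player sees the partner's choice and responds with s_{2} or s_{3}, with probability p_q the partner sees the focal player's choice, and otherwise both choose according to their first entry s_{1}; a player who is seen also chooses according to s_{1}. The function name `one_shot_payoff` is ours.

```python
def one_shot_payoff(s, q, p_s, p_q, R, S, T, P):
    """Expected payoff of s = (s1, s2, s3) against q (assumed one-shot model).

    Action 1 stands for A1 ("Insist"/"Cooperate"), action 0 for A2.
    """
    payoff = {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}
    total = 0.0
    for a in (1, 0):          # focal player's action
        for b in (1, 0):      # partner's action
            pa = s[0] if a else 1 - s[0]      # focal acts without seeing
            pb = q[0] if b else 1 - q[0]      # partner acts without seeing
            ra = s[1] if b else s[2]          # focal's P(A1) after seeing b
            ra = ra if a else 1 - ra
            rb = q[1] if a else q[2]          # partner's P(A1) after seeing a
            rb = rb if b else 1 - rb
            prob = (1 - p_s - p_q) * pa * pb + p_s * pb * ra + p_q * pa * rb
            total += prob * payoff[(a, b)]
    return total


ACG = dict(R=2, S=4, T=3, P=1)      # payoffs from Corollary 7
leader = (1, 0, 1)
for p in (0.2, 1 / 3, 0.4):
    follow = one_shot_payoff((0, 0, 1), leader, p, p, **ACG)
    insist = one_shot_payoff((1, 0, 1), leader, p, p, **ACG)
    print(p, follow, insist)
```

Under this model the payoff of (s_{1}; 0; 1) against (1; 0; 1) is linear in s_{1}, so comparing the two endpoints s_{1} = 0 and s_{1} = 1 is enough to identify the best response on either side of the threshold.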
One-shot transparent Hawk-Dove and Leader games.
While we consider in Proposition 4 only the (A)CG game for which S > T > R > P, this result is easily generalized to a wider class of (anti-)coordination games including several other important games, such as Hawk-Dove and Leader. Together with PD and (A)CG, Hawk-Dove and Leader form the set of two-player two-action games where players have a conflict of interests [29]. The Hawk-Dove game (also known as Chicken or Snowdrift) is also relevant for studying the evolution of cooperation, competition over a shared resource, and reciprocity [61, 80–82]. Note that in [83], (A)CG, Leader and Hawk-Dove are grouped into a single category (category III) that is referred to as “Hawk-Dove” games. This categorization is based on the equilibrium structure of the simultaneous game: two strict asymmetric NE and one symmetric mixed NE, which is an evolutionarily stable strategy. This structure is not by default the same for the transparent games, but the following result holds.
Proposition 8. Consider a general one-shot transparent game between Players 1 and 2 with probabilities to see the choice of the partner and satisfying . If the payoff matrix of the game satisfies inequalities S > T > R and S > P, then the Nash equilibria (NE) of this game are described by Proposition 4. Namely, when (5) holds, there are two pure-strategy and one mixed-strategy NE. Otherwise there is only one pure-strategy NE: Player 1 uses strategy (1; 0; 1), Player 2 uses strategy (0; 0; 1) when (6) holds, and both Players use (1; 0; 1) when (6) does not hold.
Proof. The proof coincides with the proof of Proposition 4. Indeed, Lemma 5 holds for any game with T > R and S > P. Lemma 6 holds whenever T > R and S > R. Finally, NE are described in Proposition 4 for the case S > T.
Two classical games satisfy conditions of Proposition 8. The first is the “Hawk-Dove” game, where S > P > T > R. To get the classic notation for this game, one needs to replace in Fig 2B “Insist” by “Hawk” and “Accommodate” by “Dove”.
The second relevant game is Leader, described by S > T > P > R. This game is similar to the (A)CG formulated as insisting on one’s own preference or accommodating the other. The difference is that here mutual insisting is detrimental for both players, so it is better to accommodate; however, the insisting player receives the maximal reward if the other player accommodates. An example of a Leader game payoff matrix can be obtained from that in Fig 2B by setting P = 2, R = 1, while leaving S = 4, T = 3.
The payoff matrices of these games are illustrated in the S2 Fig.
Analysis of iterated transparent games
For the analysis of iterated games we use the techniques described in [9, 24]. Since most results for simultaneous and sequential iPD were obtained for strategies taking into account outcomes of the last interaction (“memory-one strategies”), here we also focus on memory-one strategies. Note that considering multiple previous rounds results in very complex strategies. To overcome this, one can, for instance, use pure strategies (see [31]), but we reserve this possibility for future research.
A strategy without memory in transparent games is described by a three-element vector. A memory-one strategy additionally conditions the current choice upon the outcome of the previous round of the game. Since there are 4 = 2 × 2 possible outcomes, a memory-one strategy for a player type i is represented by a vector s^{i} = (s^{i}_{k}), k = 1, …, 12, where k enumerates the twelve (4 × 3) different combinations of previous outcome and the current probability of choice. The entries of the strategy thus represent the conditional probabilities to select action A_{1} (“Cooperate” in iPD and “Insist” in i(A)CG, see Fig 2), specifically
- s^{i}_{1}, …, s^{i}_{4} are probabilities to select A_{1} without seeing the partner’s choice, given that in the previous round the joint choice of the player and the partner was A_{1}A_{1}, A_{1}A_{2}, A_{2}A_{1}, and A_{2}A_{2} respectively (the first action specifies the choice of the player, and the second specifies the choice of the partner);
- s^{i}_{5}, …, s^{i}_{8} are probabilities to select A_{1}, seeing the partner selecting A_{1} and given the outcome of the previous round (as before);
- s^{i}_{9}, …, s^{i}_{12} are probabilities to select A_{1}, seeing the partner selecting A_{2} and given the outcome of the previous round.
Probabilities to select A_{2} are given by 1 − s^{i}_{k}, respectively.
Consider an infinite population of players evolving in generations. For any generation t = 1, 2, … the population consists of n(t) player types defined by their strategies and their frequencies x_{i}(t) in the population, . Besides, the probability of a player from type i to see the choice of a partner from type j is given by (in our case for all types i and j, but in this section we use the general notation).
Consider a player from type i playing an infinitely long iterated game against a player from type j. Since both players use memory-one strategies, this game can be formalized as a Markov chain with states being the mutual choices of the two players and a transition matrix M given by (9) where the matrices M_{0}, M_{1} and M_{2} describe the cases when neither player sees the choice of the partner, Player 1 sees the choice of the partner before making own choice, and Player 2 sees the choice of the partner, respectively. These matrices are given by The gain of type i when playing against type j is given by the expected payoff E_{ij}, defined by (10) where R, S, T, P are the entries of the payoff matrix (R = 3, S = 0, T = 5, P = 1 for standard iPD and R = 2, S = 4, T = 3, P = 1 for i(A)CG, see Fig 2), and y_{R}, y_{S}, y_{T}, y_{P} represent the probabilities of getting to the states associated with the corresponding payoffs by playing s^{i} against s^{j}. This vector is computed as a unique left-hand eigenvector of matrix M associated with eigenvalue one [9]:
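The construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of the original implementation: we assume that M is the mixture of M_{0}, M_{1}, M_{2} weighted by the probabilities that neither player, the focal player, or the partner sees the other's choice, that a player who is seen chooses according to its "not seeing" entries, and that strategy entries are ordered in three blocks of four (not seeing, partner chose A_{1}, partner chose A_{2}). All function names are ours. The stationary vector is obtained by solving yM = y with the entries of y summing to one.

```python
# States ordered A1A1, A1A2, A2A1, A2A2 from the focal player's view
# (first action: focal player, second: partner); 1 = A1, 0 = A2.
STATES = [(1, 1), (1, 0), (0, 1), (0, 0)]
SWAP = [0, 2, 1, 3]  # the same outcomes indexed from the partner's view


def transition_matrix(s, q, p_s, p_q):
    """4x4 row-stochastic matrix for 12-entry strategies s (focal) and q (partner)."""
    M = []
    for k in range(4):                    # previous outcome, focal view
        kq = SWAP[k]                      # previous outcome, partner view
        row = []
        for a, b in STATES:               # next outcome
            pa = s[k] if a else 1 - s[k]            # focal acts without seeing
            pb = q[kq] if b else 1 - q[kq]          # partner acts without seeing
            ra = s[(4 if b else 8) + k]             # focal's P(A1) after seeing b
            ra = ra if a else 1 - ra
            rb = q[(4 if a else 8) + kq]            # partner's P(A1) after seeing a
            rb = rb if b else 1 - rb
            row.append((1 - p_s - p_q) * pa * pb    # neither sees (M0)
                       + p_s * pb * ra              # focal sees (M1)
                       + p_q * pa * rb)             # partner sees (M2)
        M.append(row)
    return M


def stationary(M):
    """Solve y M = y, sum(y) = 1 by Gauss-Jordan elimination (unique if M > 0)."""
    n = len(M)
    A = [[M[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n                     # replace one equation by normalisation
    b = [0.0] * (n - 1) + [1.0]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(n)]


def expected_payoff(s, q, p_see, R, S, T, P):
    """E = y_R R + y_S S + y_T T + y_P P for s against q with equal p_see."""
    y = stationary(transition_matrix(s, q, p_see, p_see))
    return sum(w * v for w, v in zip(y, (R, S, T, P)))


EPS = 0.001
PD = (3, 0, 5, 1)                         # R, S, T, P for the standard iPD


def noisy(v, eps=EPS):
    """Write eps instead of 0 and 1 - eps instead of 1."""
    return [min(max(x, eps), 1 - eps) for x in v]


WSLS = noisy([1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0])
ALLD = noisy([0] * 12)
E11 = expected_payoff(WSLS, WSLS, 0.0, *PD)
E12 = expected_payoff(WSLS, ALLD, 0.0, *PD)
print(E11, E12)
```

A direct linear solve is used instead of repeatedly multiplying by M because some strategy pairs (for instance WSLS against AllD) produce chains that oscillate and converge slowly under power iteration.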
The evolutionary success of type i is encoded by its fitness f_{i}(t): if type i has higher fitness than the average fitness of the population , then x_{i}(t) increases with time, otherwise x_{i}(t) decreases and the type is dying out. This evolutionary process is formalized by the replicator dynamics equation, which in discrete time takes the form (11) The fitness f_{i}(t) is computed as the average payoff for a player of type i when playing against the current population: where E_{ij} is given by (10).
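To make the update rule concrete, the following Python sketch implements one common discrete-time form of the replicator dynamics, x_{i}(t + 1) = x_{i}(t) f_{i}(t) / f̄(t), with f_{i}(t) = Σ_{j} x_{j}(t) E_{ij} and f̄(t) = Σ_{i} x_{i}(t) f_{i}(t). That this is the exact form of Eq (11) is an assumption on our part. In the example payoff matrix, the first row uses the values E_{11} = 2.995 and E_{12} = 0.504 quoted in the text for WSLS against AllD at p_{see} = 0; the second row is our own illustrative calculation and is not quoted from the source.

```python
def fitness(x, E):
    """f_i = sum_j x_j * E[i][j]: average payoff of type i against the population."""
    n = len(x)
    return [sum(E[i][j] * x[j] for j in range(n)) for i in range(n)]


def replicator_step(x, E):
    """One generation of the (assumed) discrete-time replicator update."""
    f = fitness(x, E)
    mean_f = sum(xi * fi for xi, fi in zip(x, f))
    return [xi * fi / mean_f for xi, fi in zip(x, f)]


def simulate(x0, E, generations):
    """Iterate the replicator update and return the final frequencies."""
    x = list(x0)
    for _ in range(generations):
        x = replicator_step(x, E)
    return x


# Row/column 0: WSLS, row/column 1: AllD (second row is an illustrative assumption).
E = [[2.995, 0.504],
     [2.9985, 1.003]]
x_final = simulate([0.5, 0.5], E, 40)
print(x_final)
```

Because every entry of this matrix favours the second type, the frequency of the first type shrinks in every generation, regardless of the starting frequencies.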
Evolutionary dynamics of two strategies.
To provide an example of evolutionary dynamics and introduce some useful notation, we consider a population consisting of two types playing iPD with strategies: s^{1} = (1, 0, 0, 1; 1, 0, 0, 1; 0, 0, 0, 0), s^{2} = (0, 0, 0, 0; 0, 0, 0, 0; 0, 0, 0, 0) (recall that we write 0 instead of ε and 1 instead of 1 − ε for ε = 0.001; see Results, section Transparent games with memory: evolutionary simulations) and initial conditions x_{1}(1) = x_{2}(1) = 0.5. That is, the first type plays the “Win–stay, lose–shift” (WSLS) strategy, and the second type (almost) always defects (uses the AllD strategy). We set . Note that since and , it holds p_{see} ≤ 0.5. Given p_{see} we can compute a transition matrix of the game using (9) and then calculate the expected payoffs for all possible pairs of players ij using (10). For instance, for p_{see} = 0 and ε = 0.001 we have This means that a player of the WSLS-type on average gets a payoff E_{11} = 2.995 when playing against a partner of the same type, and only E_{12} = 0.504 when playing against an AllD-player. The fitness for each type is given by Since f_{2}(t) > f_{1}(t) for any 0 < x_{1}(t), x_{2}(t) < 1, the AllD-players take over the whole population after several generations. The dynamics of the type frequencies x_{i}(t) computed using (11) show that this is indeed the case (Fig 9A). Note that since E_{21} > E_{11} and E_{22} > E_{12}, AllD is guaranteed to win over WSLS for any initial frequency of WSLS-players x_{1}(1). In this case one says that AllD dominates WSLS and can invade it for any x_{1}(1).
(A) Initially, both types have the same frequency, but after 40 generations the fraction of WSLS-players x_{1}(t) converges to 0 for probabilities to see partner’s choice p_{see} = 0.0, 0.2 and to 1 for p_{see} = 0.4, 0.5. (B) This is due to the decrease of the invasion threshold h_{1} for WSLS: while h_{1} = 1 for p_{see} = 0 (AllD dominates WSLS and the fraction of WSLS-players unconditionally decreases), AllD and WSLS are bistable for p_{see} > 0 and WSLS wins whenever x_{1}(t)>h_{1}. Arrows indicate whether frequency x_{1}(t) of WSLS increases or decreases. Interestingly, h_{1} = 0.5 holds for p_{see} ≈ 1/3, which corresponds to the maximal uncertainty since the three cases (“Player 1 knows the choice of Player 2 before making its own choice”; “Player 2 knows the choice of Player 1 before making its own choice”; “Neither of players knows the choice of the partner”) have equal probabilities.
As we increase p_{see}, the population dynamics change. While for p_{see} = 0.2 AllD still takes over the population, for p_{see} = 0.4 WSLS wins (Fig 9A). This can be explained by computing the expected payoff for p_{see} = 0.4: Hence f_{1}(t) > f_{2}(t) for 0 ≤ x_{2}(t) ≤ 0.5 ≤ x_{1}(t) ≤ 1, which explains the observed dynamics. Note that here E_{11} > E_{21}, while E_{12} < E_{22}, that is, when playing against WSLS- and AllD-players alike, partners of the same type win more than partners of a different type. In this case one says that WSLS and AllD are bistable and there is an unstable equilibrium fraction of WSLS players given by (12) We call h_{i} an invasion threshold for type i, since this type takes over the whole population for x_{i}(t) > h_{i}, but dies out for x_{i}(t) < h_{i}. To illustrate this concept, we plot in Fig 9B the invasion threshold h_{1} as a function of p_{see} for the WSLS type playing against AllD.
The third possible case of two-types dynamics is coexistence, which takes place when E_{11} < E_{21}, E_{12} > E_{22}, that is when playing against a player of any type is less beneficial for a partner of the same type than for a partner of a different type. In this case the fraction of a type given by (12) corresponds to a stable equilibrium meaning that the frequency of the first type x_{1}(t) increases for x_{1}(t) < h_{1}, but decreases for x_{1}(t) > h_{1}.
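The three cases of two-type dynamics (dominance, bistability, coexistence) can be summarised in a short Python sketch. The equilibrium fraction is obtained here by equating the two fitness values, x E_{11} + (1 − x) E_{12} = x E_{21} + (1 − x) E_{22}, which gives h_{1} = (E_{22} − E_{12}) / (E_{11} − E_{21} + E_{22} − E_{12}); this is our own derivation from the definitions in the text, and we assume it coincides with Eq (12). The function names and the example payoff values are hypothetical and serve only as an illustration.

```python
def equilibrium_fraction(E):
    """Interior fraction of type 1 at which both types have equal fitness."""
    (e11, e12), (e21, e22) = E
    return (e22 - e12) / (e11 - e21 + e22 - e12)


def classify(E):
    """Classify the two-type dynamics from the 2x2 expected payoff matrix."""
    (e11, e12), (e21, e22) = E
    if e11 == e21 and e12 == e22:
        return "neutral"
    if e11 > e21 and e12 < e22:
        return "bistable"          # unstable interior equilibrium (invasion threshold)
    if e11 < e21 and e12 > e22:
        return "coexistence"       # stable interior equilibrium
    if e11 >= e21 and e12 >= e22:
        return "type 1 dominates"
    return "type 2 dominates"


# Hypothetical payoff matrices, chosen only to exercise each branch.
examples = {
    "A": [[3.0, 0.6], [2.6, 1.0]],
    "B": [[2.0, 1.5], [2.5, 1.0]],
    "C": [[3.0, 0.5], [3.5, 1.0]],
}
for name, E in examples.items():
    kind = classify(E)
    h = equilibrium_fraction(E) if kind in ("bistable", "coexistence") else None
    print(name, kind, h)
```

The same formula describes both interior equilibria; only the signs of E_{11} − E_{21} and E_{12} − E_{22} decide whether the equilibrium is unstable (bistability) or stable (coexistence).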
Evolutionary simulations for transparent games.
Theoretical analysis of the strategies in repeated transparent games is complicated due to the many dimensions of the strategy space, which motivates the use of evolutionary simulations. For this we adopt the methods described in [9, 24]. We do not use here a more modern adaptive dynamics approach [84, 85] since for a high-dimensional strategy space it would require analysis of a system with many equations, complicating the understanding and interpretation of the results.
Each run of simulations starts with five player types having equal initial frequencies: n(1) = 5, x_{1}(1) = … = x_{5}(1) = 0.2. Following [24], strategy entries with k = 1, …, 12 for each player type i are randomly drawn from the distribution with U-shaped probability density, favouring probability values around 0 and 1: (13) for y ∈ (0, 1). Additionally, we require , where ε = 0.001 accounts for the minimal possible error in the strategies [24]. The fact that players cannot have pure strategies and are prone to errors is also closely related to the “trembling hand” effect preventing players from using pure strategies [24, 86]. We performed evolutionary simulations for various transparencies with p_{see} = 0.0, 0.1, …, 0.5.
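The sampling step can be sketched as follows. The exact density (13) is not reproduced in this text, so the sketch assumes the arcsine density 1/(π√(y(1 − y))), that is, a Beta(1/2, 1/2) distribution, as one standard U-shaped choice; the clipping of each entry to the interval [ε, 1 − ε] is likewise our reading of the requirement on the strategy entries. The function name `draw_strategy` is hypothetical.

```python
import random

EPS = 0.001  # minimal error in the strategy entries


def draw_strategy(rng, n_entries=12, eps=EPS):
    """Draw strategy entries from an assumed U-shaped Beta(1/2, 1/2) density.

    Each entry is clipped to [eps, 1 - eps] so that no strategy is pure.
    """
    return [min(max(rng.betavariate(0.5, 0.5), eps), 1 - eps)
            for _ in range(n_entries)]


rng = random.Random(0)          # fixed seed for reproducibility
initial_types = [draw_strategy(rng) for _ in range(5)]   # n(1) = 5 types
initial_freqs = [0.2] * 5                                # equal frequencies
for s in initial_types:
    print([round(v, 3) for v in s])
```

Under the assumed density, only about one third of the drawn values fall between 0.25 and 0.75, compared with one half for a uniform distribution, which is what favours nearly deterministic entries.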
The frequencies of strategies x_{i}(t) change according to the replicator Eq (11). If x_{i}(t) < χ, the type is assumed to die out and is removed from the population (its share x_{i}(t) is distributed proportionally among the remaining types); we follow [9, 24] in taking χ = 0.001. Occasionally (every 100 generations on average to avoid strong synchronization), new types are introduced into the population. The strategies for the new types are drawn from (13) and the initial frequencies are set to x_{i}(t_{0}) = 1.1χ [24].
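The bookkeeping of extinction and introduction of types can be sketched in Python as follows. Several details are assumptions made for illustration: the proportional redistribution of an extinct type's share is implemented as a renormalisation of the surviving frequencies, "every 100 generations on average" is modelled as an independent chance of 1/100 per generation, and the frequencies of existing types are scaled by 1 − 1.1χ when a new type enters. All function names are hypothetical.

```python
import random

CHI = 0.001  # extinction threshold


def prune(freqs, chi=CHI):
    """Remove types below chi; renormalising spreads their share proportionally."""
    kept = {name: x for name, x in freqs.items() if x >= chi}
    total = sum(kept.values())
    return {name: x / total for name, x in kept.items()}


def introduce(freqs, name, chi=CHI):
    """Add a new type at frequency 1.1 * chi, scaling the others to keep the sum at 1."""
    x0 = 1.1 * chi
    out = {k: x * (1 - x0) for k, x in freqs.items()}
    out[name] = x0
    return out


def maybe_introduce(freqs, name, rng, rate=0.01):
    """Introduce a new type with probability `rate` in the current generation."""
    return introduce(freqs, name) if rng.random() < rate else freqs


rng = random.Random(0)
freqs = {"type_a": 0.6, "type_b": 0.3995, "type_c": 0.0005}
freqs = prune(freqs)                       # type_c falls below chi and is removed
for generation in range(1000):
    freqs = maybe_introduce(freqs, f"new_{generation}", rng)
print(len(freqs), sum(freqs.values()))
```

In a full simulation these two steps would be interleaved with the replicator update in every generation; the loop above omits that update and only exercises the introduction step.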
Supporting information
S1 Note. Transparent games and reaction times distributions.
https://doi.org/10.1371/journal.pcbi.1007588.s001
(PDF)
S2 Note. Transparent iterated Prisoner’s Dilemma with a restricted strategy space.
https://doi.org/10.1371/journal.pcbi.1007588.s002
(PDF)
S1 Fig. Distributions of total shares in the population over all generations for 80 most persistent player types over the 80 runs of evolutionary simulations.
(A) for iterated Prisoner’s Dilemma (iPD) and (B) for iterated (Anti-)Coordination Game (i(A)CG). The central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the ‘+’ symbol. The higher the total shares of the types, the more stable the dynamics in the population. While stability varies with transparency for both games, the drop of stability in iPD for p_{see} ≥ 0.4 is especially noticeable. Indeed, in highly transparent iPD any strategy is sufficiently “predictable”, which allows a best-response strategy to replace it in a population. Such best-response strategies can be generally weak and short-lived, see for example treacherous WSLS described in Fig 5 (main text). Note that stability increases considerably for p_{see} ≥ 0.4 in i(A)CG, which reflects the fact that the Leader-Follower strategy becomes evolutionarily stable for high transparency.
https://doi.org/10.1371/journal.pcbi.1007588.s003
(TIF)
S2 Fig. Generalized payoff matrix of (anti-)coordination games and its particular cases: (A)CG, Leader, and Hawk-Dove, expressed as ordinal payoffs.
https://doi.org/10.1371/journal.pcbi.1007588.s004
(TIF)
References
- 1. Dugatkin LA, Mesterton-Gibbons M, Houston AI. Beyond the Prisoner’s dilemma: Toward models to discriminate among mechanisms of cooperation in nature. Trends in ecology & evolution. 1992;7(6):202–205.
- 2. Axelrod R. On six advances in cooperation theory. Analyse & Kritik. 2000;22(1):130–151.
- 3. de Waal FBM, Brosnan SF. Simple and complex reciprocity in primates. In: Cooperation in primates and humans. Springer; 2006. p. 85–105.
- 4. Silk JB. The strategic dynamics of cooperation in primate groups. Advances in the Study of Behavior. 2007;37:1–41.
- 5. Tomasello M, Melis AP, Tennie C, Wyman E, Herrmann E. Two key steps in the evolution of human cooperation: The interdependence hypothesis. Current anthropology. 2012;53(6):673–692.
- 6. Axelrod R, Hamilton WD. The Evolution of Cooperation. Science. 1981;211(4489):1390–1396.
- 7. Rapoport A. Applications of game-theoretic concepts in biology. Bulletin of mathematical biology. 1985;47(2):161–192.
- 8. Noë R. Cooperation experiments: coordination through communication versus acting apart together. Animal Behaviour. 2006;71(1):1–18.
- 9. Nowak MA. Evolutionary dynamics. Harvard University Press; 2006.
- 10. Nowak MA, Sigmund K. The alternating prisoner’s dilemma. Journal of theoretical Biology. 1994;168(2):219–226.
- 11. de Waal FBM, Suchak M. Prosocial primates: selfish and unselfish motivations. Philosophical Transactions of the Royal Society B: Biological Sciences. 2010;365(1553):2711–2722.
- 12. Brosnan SF, Wilson BJ, Beran MJ. Old World monkeys are more similar to humans than New World monkeys when playing a coordination game. Proceedings of the Royal Society of London B: Biological Sciences. 2012;279(1733):1522–1530.
- 13. Duguid S, Wyman E, Bullinger AF, Herfurth-Majstorovic K, Tomasello M. Coordination strategies of chimpanzees and human children in a Stag Hunt game. Proceedings of the Royal Society of London B: Biological Sciences. 2014;281(1796):20141973.
- 14. Hawkins RXD, Goldstone RL. The formation of social conventions in real-time environments. PlOS ONE. 2016;11(3):e0151670. pmid:27002729
- 15. Vaziri-Pashkam M, Cormiea S, Nakayama K. Predicting actions from subtle preparatory movements. Cognition. 2017;168:65–75. pmid:28651096
- 16. Vesper C, Schmitz L, Safra L, Sebanz N, Knoblich G. The role of shared visual information for joint action coordination. Cognition. 2016;153:118–123. pmid:27183398
- 17. Sueur C, Petit O. Signals use by leaders in Macaca tonkeana and Macaca mulatta: group-mate recruitment and behaviour monitoring. Animal cognition. 2010;13(2):239–248. pmid:19597854
- 18. King AJ, Sueur C. Where next? Group coordination and collective decision making by primates. International Journal of Primatology. 2011;32(6):1245–1267.
- 19. Fichtel C, Zucchini W, Hilgartner R. Out of sight but not out of mind? Behavioral coordination in red-tailed sportive lemurs (Lepilemur ruficaudatus). International journal of primatology. 2011;32(6):1383–1396. pmid:22207772
- 20. Strandburg-Peshkin A, Papageorgiou D, Crofoot M, Farine DR. Inferring influence and leadership in moving animal groups. Philosophical Transactions B: Biological Sciences. 2017;373:2170006.
- 21. Chiappori PA, Levitt S, Groseclose T. Testing mixed-strategy equilibria when players are heterogeneous: The case of penalty kicks in soccer. American Economic Review. 2002;92(4):1138–1151.
- 22. Palacios-Huerta I. Professionals play minimax. The Review of Economic Studies. 2003;70(2):395–415.
- 23. Bar-Eli M, Azar OH, Ritov I, Keidar-Levin Y, Schein G. Action bias among elite soccer goalkeepers: The case of penalty kicks. Journal of economic psychology. 2007;28(5):606–621.
- 24. Nowak M, Sigmund K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature. 1993;364(6432):56–58. pmid:8316296
- 25. Noë R. A veto game played by baboons: a challenge to the use of the Prisoner’s Dilemma as a paradigm for reciprocity and cooperation. Animal Behaviour. 1990;39(1):78–90.
- 26. Colman AM. Game theory and its applications: In the social and biological sciences. 2nd ed. London: Routledge; 2005.
- 27. Kilgour DM, Fraser NM. A taxonomy of all ordinal 2 × 2 games. Theory and decision. 1988;24(2):99–117.
- 28. Osborne MJ, Rubinstein A. A course in game theory. MIT press; 1994.
- 29. Rapoport A. Exploiter, Leader, Hero, and Martyr: the four archetypes of the 2×2 game. Systems Research and Behavioral Science. 1967;12(2):81–84.
- 30. Helbing D, Schönhof M, Stark HU, Hołyst JA. How individuals learn to take turns: Emergence of alternating cooperation in a congestion game and the prisoner’s dilemma. Advances in Complex Systems. 2005;8(01):87–116.
- 31. Colman AM, Browning L. Evolution of cooperative turn-taking. Evolutionary Ecology Research. 2009;11(6):949–963.
- 32. Traulsen A, Hauert C. Stochastic evolutionary game dynamics. Reviews of nonlinear dynamics and complexity. 2009;2:25–61.
- 33. Frean MR. The prisoner’s dilemma without synchrony. Proceedings of the Royal Society of London B: Biological Sciences. 1994;257(1348):75–79.
- 34. Zagorsky BM, Reiter JG, Chatterjee K, Nowak MA. Forgiver triumphs in alternating Prisoner’s Dilemma. PlOS ONE. 2013;8(12):e80814. pmid:24349017
- 35. Posch M, Pichler A, Sigmund K. The efficiency of adapting aspiration levels. Proceedings of the Royal Society of London B: Biological Sciences. 1999;266(1427):1427–1435.
- 36. Friedman D. Evolutionary games in economics. Econometrica: Journal of the Econometric Society. 1991; p. 637–666.
- 37. Hilbe C, Martinez-Vaquero LA, Chatterjee K, Nowak MA. Memory-n strategies of direct reciprocity. Proceedings of the National Academy of Sciences. 2017;114(18):4715–4720.
- 38. Friedman D, Oprea R. A continuous dilemma. The American Economic Review. 2012;102(1):337–363.
- 39. van Doorn GS, Riebli T, Taborsky M. Coaction versus reciprocity in continuous-time models of cooperation. Journal of theoretical biology. 2014;356:1–10. pmid:24727186
- 40. Melis AP, Grocke P, Kalbitz J, Tomasello M. One for you, one for me: Humans’ unique turn-taking skills. Psychological science. 2016;27(7):987–996. pmid:27225221
- 41. Sánchez-Amaro A, Duguid S, Call J, Tomasello M. Chimpanzees, bonobos and children successfully coordinate in conflict situations. Proceedings of the Royal Society of London B: Biological Sciences. 2017;284(1856):20170259.
- 42. Bárány I, Vempala S, Vetta A. Nash equilibria in random games. Random Structures & Algorithms. 2007;31(4):391–405.
- 43. Arieli I, Babichenko Y. Random extensive form games. Journal of Economic Theory. 2016;166:517–535.
- 44. Harsanyi JC. Games with incomplete information played by “Bayesian” players, I–III Part I. The basic model. Management science. 1967;14(3):159–182.
- 45. Dekel E, Fudenberg D, Levine DK. Learning to play Bayesian games. Games and Economic Behavior. 2004;46(2):282–303.
- 46. Ely JC, Sandholm WH. Evolution in Bayesian games I: theory. Games and Economic Behavior. 2005;53(1):83–109.
- 47. Reny PJ. On the Existence of Monotone Pure-Strategy Equilibria in Bayesian Games. Econometrica. 2011;79(2):499–553.
- 48. Fudenberg D, Levine DK. Learning and equilibrium. Annual Review of Economics. 2009;1(1):385–420.
- 49. Feltovich N. Reinforcement-based vs. Belief-based Learning Models in Experimental Asymmetric-information Games. Econometrica. 2000;68(3):605–641.
- 50. Matsushima H. Repeated games with private monitoring: Two players. Econometrica. 2004;72(3):823–852.
- 51. Liu Q, Mailath GJ, Postlewaite A, Samuelson L. Stable matching with incomplete information. Econometrica. 2014;82(2):541–587.
- 52. Hoffmann S, Mihm B, Weimann J. To commit or not to commit? An experimental investigation of pre-commitments in bargaining situations with asymmetric information. Journal of Public Economics. 2015;121:95–105.
- 53. Heller Y. Instability of Equilibria with Private Monitoring. MPRA Munich Personal RePEc Archive Paper No 68643, posted 3 January 2016. Online at https://mpra.ub.uni-muenchen.de/68643/. 2015.
- 54. Heller Y, Mohlin E. Stable observable behavior. MPRA Munich Personal RePEc Archive Paper No 63013, posted 21 March 2015. Online at http://mpra.ub.uni-muenchen.de/63013/. 2015.
- 55. Kandori M. Introduction to repeated games with private monitoring. Journal of economic theory. 2002;102(1):1–15.
- 56. Boyd R, Lorberbaum JP. No pure strategy is evolutionarily stable in the repeated prisoner’s dilemma game. Nature. 1987;327(6117):58.
- 57. Unakafov AM, Schultze T, Kagan I, Moeller S, Gail A, Treue S, et al. Evolutionary successful strategies in a transparent iterated Prisoner’s Dilemma. In: Applications of Evolutionary Computation. Springer International Publishing; 2019. p. 204–219.
- 58. Posch M. Win–Stay, Lose–Shift Strategies for Repeated Games—Memory Length, Aspiration Levels and Noise. Journal of theoretical biology. 1999;198(2):183–195. pmid:10339393
- 59. Clements KC, Stephens DW. Testing models of non-kin cooperation: mutualism and the Prisoner’s Dilemma. Animal Behaviour. 1995;50(2):527–535.
- 60. Johnstone RA, Manica A. Evolution of personality differences in leadership. Proceedings of the National Academy of Sciences. 2011;108(20):8373–8378.
- 61. Płatkowski T. Evolutionary coalitional games. Dynamic Games and Applications. 2016;6(3):396–408.
- 62. Chen W, Gracia-Lázaro C, Li Z, Wang L, Moreno Y. Evolutionary dynamics of N-person Hawk-Dove games. Scientific reports. 2017;7(1):4800. pmid:28684866
- 63. Chang SWC. An Emerging Field of Primate Social Neurophysiology: Current Developments. eNeuro. 2017;4(5):ENEURO–0295–17.
- 64. Ruff CC, Fehr E. The neurobiology of rewards and values in social decision making. Nature Reviews Neuroscience. 2014;15(8):549. pmid:24986556
- 65. Platt ML, Seyfarth RM, Cheney DL. Adaptations for social cognition in the primate brain. Phil Trans R Soc B. 2016;371(1687):20150096. pmid:26729935
- 66. Tremblay S, Sharika KM, Platt ML. Social decision-making and the brain: A comparative perspective. Trends in cognitive sciences. 2017;21(4):265–276. pmid:28214131
- 67. Ballesta S, Duhamel JR. Rudimentary empathy in macaques’ social decision-making. Proceedings of the National Academy of Sciences. 2015;112(50):15516–15521.
- 68. Báez-Mendoza R, Schultz W. Performance error-related activity in monkey striatum during social interactions. Scientific Reports. 2016;6:37199. pmid:27849004
- 69. Haroush K, Williams ZM. Neuronal prediction of opponent’s behavior during cooperative social interchange in primates. Cell. 2015;160(6):1233–1245. pmid:25728667
- 70. Bullinger AF, Wyman E, Melis AP, Tomasello M. Coordination of chimpanzees (Pan troglodytes) in a stag hunt game. International Journal of Primatology. 2011;32(6):1296–1310.
- 71. Brosnan SF, Parrish A, Beran MJ, Flemming T, Heimbauer L, Talbot CF, et al. Responses to the Assurance game in monkeys, apes, and humans using equivalent procedures. Proceedings of the National Academy of Sciences. 2011;108(8):3442–3447.
- 72. Visco-Comandini F, Ferrari-Toniolo S, Satta E, Papazachariadis O, Gupta R, Nalbant LE, et al. Do non-human primates cooperate? Evidences of motor coordination during a joint action task in macaque monkeys. Cortex. 2015;70:115–127. pmid:25824631
- 73. Sánchez-Amaro A, Duguid S, Call J, Tomasello M. Chimpanzees coordinate in a snowdrift game. Animal Behaviour. 2016;116:61–74. https://doi.org/10.1016/j.anbehav.2016.03.030
- 74. Brosnan SF, Price SA, Leverett K, Prétôt L, Beran M, Wilson BJ. Human and monkey responses in a symmetric game of conflict with asymmetric equilibria. Journal of Economic Behavior & Organization. 2017;142:293–306.
- 75. King AJ, Johnson DDP, Van Vugt M. The origins and evolution of leadership. Current biology. 2009;19(19):R911–R916. pmid:19825357
- 76. Devaine M, Hollard G, Daunizeau J. Theory of mind: did evolution fool us? PloS One. 2014;9(2):e87619. pmid:24505296
- 77. Luce RD. Response times: Their role in inferring elementary mental organization. Oxford University Press; 1986.
- 78. Ratcliff R. Methods for dealing with reaction time outliers. Psychological bulletin. 1993;114(3):510. pmid:8272468
- 79. Rapoport A. Two-person game theory. NY: Dover Publications Inc; 1973.
- 80. Doebeli M, Hauert C, Killingback T. The evolutionary origin of cooperators and defectors. Science. 2004;306(5697):859–862. pmid:15514155
- 81. Kümmerli R, Colliard C, Fiechter N, Petitpierre B, Russier F, Keller L. Human cooperation in social dilemmas: comparing the Snowdrift game with the Prisoner’s Dilemma. Proceedings of the Royal Society B: Biological Sciences. 2007;274(1628):2965–2970. pmid:17895227
- 82. Płatkowski T. Cooperation in two-person evolutionary games with complex personality profiles. Journal of theoretical biology. 2010;266(4):522–528. pmid:20659479
- 83. Weibull JW. Evolutionary game theory. MIT press; 1997.
- 84. Nowak MA, Sigmund K. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Applicandae Mathematicae. 1990;20(3):247–265.
- 85. Hilbe C, Nowak MA, Traulsen A. Adaptive dynamics of extortion and compliance. PloS One. 2013;8(11):e77886. pmid:24223739
- 86. Selten R. Reexamination of the perfectness concept for equilibrium points in extensive games. International journal of game theory. 1975;4(1):25–55.