
Properties of winning Iterated Prisoner’s Dilemma strategies

  • Nikoleta E. Glynatsi ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    glynatsi@evolbio.mpg.de

    Affiliations Research Group Dynamics of Social Behavior, Max Planck Institute for Evolutionary Biology, Plön, Germany, School of Mathematics, Cardiff University, Cardiff, United Kingdom

  • Vincent Knight ,

    Contributed equally to this work with: Vincent Knight, Marc Harper

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    Affiliation School of Mathematics, Cardiff University, Cardiff, United Kingdom

  • Marc Harper

    Contributed equally to this work with: Vincent Knight, Marc Harper

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Google Inc., Mountain View, California, United States of America

Abstract

Researchers have explored the performance of Iterated Prisoner’s Dilemma strategies for decades, from the celebrated performance of Tit for Tat to the introduction of the zero-determinant strategies and the use of sophisticated learning structures such as neural networks. Many new strategies have been introduced and tested in a variety of tournaments and population dynamics. Typical results in the literature, however, rely on performance against a small number of somewhat arbitrarily selected strategies, casting doubt on the generalizability of conclusions. In this work, we analyze a large collection of 195 strategies in thousands of computer tournaments, present the top performing strategies across multiple tournament types, and distill their salient features. The results show that there is not yet a single strategy that performs well in diverse Iterated Prisoner’s Dilemma scenarios; nevertheless, there are several properties that heavily influence the best performing strategies. This refines the properties described by Axelrod in light of recent and more diverse opponent populations to: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment. More precisely, we find that strategies perform best when their probability of cooperation matches the total tournament population’s aggregate cooperation probabilities. The features of high performing strategies help shed some light on why strategies such as Tit For Tat performed historically well in tournaments and why zero-determinant strategies typically do not fare well in tournament settings.

Author summary

In 1980, political scientist Robert Axelrod ran one of the most famous computer tournaments of the Iterated Prisoner’s Dilemma (IPD). The winner? The now-famous strategy, Tit for Tat. Axelrod attributed its success to simple properties such as: do not be envious, avoid being the first to defect, and do not be overly clever. Yet the tournament design, using only a small, selected set of strategies, not including random noise, and having fixed game lengths, raises questions about the generalizability of these results. Many researchers have continued to make similar assumptions in their own IPD experiments, limiting the insights that can be applied to more complex, realistic settings.

In our study, we address these limitations by analyzing the performance of a large and diverse collection of IPD strategies across thousands of computer tournaments. We find that, while no single strategy consistently excels, successful strategies share key characteristics: they are nice, provocable and generous, a little envious, clever, and adapt to the environment. More precisely, strategies perform best when their probability of cooperation matches the total tournament population’s aggregate cooperation probabilities.

Introduction

The Iterated Prisoner’s Dilemma (IPD) is a repeated two-player game that models behavioral interactions, specifically interactions where self-interest clashes with collective interest. It encompasses a wide range of social and biological phenomena. In each turn of the game, both players simultaneously and independently decide between cooperation (C) and defection (D). This decision is made with the memory of all prior interactions. The payoffs for each player at each turn are determined by their own choice and the choice of the other player. To this end, the payoffs of the game are defined by

$$\begin{pmatrix} R & S \\ T & P \end{pmatrix} \qquad (1)$$

where the row player receives R for mutual cooperation (CC), S for cooperating against a defector (CD), T for defecting against a cooperator (DC), and P for mutual defection (DD), and where typically T > R > P > S and 2R > T + S. The most common values used in the literature [1] are R = 3, P = 1, T = 5, S = 0, and these are the values also used in this work.

Conceptualizing strategies and understanding the best way to play the game have been of interest to the scientific community since the formulation of the game [2–13]. This extends to both tournament settings and population dynamics. Computer tournaments became a common evaluation technique for newly designed strategies following Axelrod’s computer tournaments in the 1980s [2, 14]. The winner of both of Axelrod’s tournaments [2, 14] was the simple strategy Tit For Tat (TFT). TFT cooperates on the first turn and thereafter copies the previous action of its opponent, retaliating against defections with a defection and forgiving a defection if it is followed by cooperation. Axelrod concluded that the strategy’s robustness was due to four properties, which he adapted into four suggestions for success in an IPD tournament:

  1. Do not be envious by striving for a payoff larger than the opponent’s payoff.
  2. Be “nice” by not being the first to defect.
  3. Reciprocate both cooperation and defection; be provocable, retaliating against defection and forgiving a return to cooperation.
  4. Do not be too clever by scheming to exploit the opponent.

Forgiveness, in this context, is a strategy’s ability to cooperate after a DC outcome to achieve mutual cooperation again. In environments without noise, TFT would end up in DC only if it had received a defection and then retaliated. Subsequently, TFT would forgive an opponent that apologizes (in a DC round) by returning to cooperation, as mutual cooperation is deemed better than mutual defection.

Due to the strategy’s strong performance in both tournaments and a series of evolutionary experiments [1], TFT was often claimed to be a highly robust (and sometimes the most robust) strategy for the IPD. Several strategies have built upon TFT and its reciprocity-based approach. In [5], a strategy called Gradual was introduced, constructed to have the same qualities as TFT with one addition. Gradual has a memory of the previous rounds of play in the game, recording the number of defections by the opponent and punishing them with a growing number of defections. It then enters a calming state in which it cooperates for two rounds. A strategy with the same intuition as Gradual is Adaptive Tit for Tat [15]. Adaptive Tit for Tat maintains a continually updated estimate of the opponent’s behavior and uses this estimate to condition its future actions. Other research has addressed the limitations of TFT. For example, in [16–19], it was shown that TFT suffered in environments with noise. This was mainly due to the strategy being too provocable and its lack of generosity and contrition. Since TFT immediately punishes a defection, in a noisy environment it can get stuck in a repeated cycle of defections and cooperations. New strategies that are more robust in tournaments with noise were soon introduced, including Nice and Forgiving [16], Generous Tit For Tat [3], and Pavlov (aka Win Stay Lose Shift) [4], as well as later variants such as OmegaTFT [20].

Finally, others introduced strategies deviating completely from the originally suggested properties of success. For example, a set of “envious” IPD strategies called zero-determinant strategies (ZDs) was introduced in [6]. These strategies attempt to force a linear relationship between their own stationary payoff and that of their opponent, potentially ensuring that they receive a higher average payoff. While ZDs were introduced with a small tournament in which some were reportedly successful [21], this result has not generally held in subsequent work [22]. Furthermore, in [23], a series of “clever” strategies trained using reinforcement learning were introduced. These strategies were trained using lookup tables [24], hidden Markov models [23], and finite-state automata [25], on a set of 170 strategies.

One thing that has remained the same is that the introduction of a new strategy is often accompanied by a claim that the new strategy is the best performing strategy for the IPD, often without extensive testing against a broad spectrum of opponents or representative classes of opponents. The lack of testing against formally defined strategies and tournament winners is understandable given the effort required to implement the hundreds of published IPD strategies. Implementing prior strategies faithfully is often extremely difficult or impossible due to insufficient descriptions and lack of published implementations or code. Despite these challenges, the absence of thorough testing raises concerns about claims regarding the superiority or robustness of newly introduced strategies.

Beyond these difficulties, we believe that limited comprehensive analyses are rooted in field conventions. Tournaments or evolutionary dynamics often rely on a select list of hand-picked strategies chosen by modelers, typically based on specific properties they wish to examine. This practice may stem from misconceptions, such as the assumption that because TFT performs relatively well, it is sufficient to test only against TFT variants. Another misconception may stem from the Press & Dyson result [6], which shows that in a pairwise interaction a longer memory provides no advantage against a memory-one opponent, leading some to consider only memory-one strategies. However, this result holds strictly only for pairwise interactions.

It is not only the set of strategies or the tournament parameters such as noise that may impact results but also the design of the round-robin tournament itself. To address this, [26] separately examined the effects of changes in format, objective criteria, and payoff values on tournament outcomes. They demonstrated that TFT’s performance declined under certain conditions. To our knowledge, this is the only study that has reanalyzed the tournament structure and critically evaluated it. In this work, however, we employ an extensive list of strategies made possible by the Axelrod-Python package, an approach that would have been difficult to achieve previously. Unlike the authors of that study, we do not consider a new tournament design.

In this paper, we evaluate the performance of a significant number of IPD strategies across a diverse array of tournaments. Many of the strategies used in our analysis are drawn from well-known and named strategies in the IPD literature, including previous tournament winners. This contrasts with other work that is often constrained to specific classes such as memory-one strategies or those of a certain structural form like finite state machines or deterministic memory-two strategies. Furthermore, our tournaments encompass variations, including standard tournaments resembling Axelrod’s original ones, tournaments with noise, tournaments with probabilistic match length, and tournaments with both noise and probabilistic match length. This diversity in strategies and tournament types provides new insights and tests earlier claims in alternative settings against known powerful strategies. More specifically, we show that previous tournament winners fall short against large enough opponent pools; they no longer appear among the top-performing strategies. This is likely because they suffered from a lack of diversity in the strategies they were trained or tested against, making it hard for them to adapt to new opponents.

It is important to note that we do not assert the existence of a single best-performing strategy across all tournaments or tournament types. On the contrary, our work demonstrates that such a strategy does not exist (notwithstanding a few strategies with broadly high performance). The primary objective of this paper, presented in the latter parts of the paper, is to continue the discussion on the properties of successful strategies, a conversation started by Axelrod. The results of our analysis conclude that the properties of a successful strategy in the Iterated Prisoner’s Dilemma (IPD) are:

  1. Be a little bit envious
  2. Be “nice” in non-noisy environments or when game lengths are longer
  3. Reciprocate both cooperation and defection appropriately; Be provocable in tournaments with short matches, and generous in tournaments with noise
  4. It’s ok to be clever
  5. Adapt to the environment; Adjust to the mean population cooperation

We believe that the discussion on the properties of winning strategies holds significant importance. It aims to provide guidance to researchers designing new strategies and those training strategies. Specifically, just as diversity in training datasets [27], such as variations in image perspective or skin color, is recognized as critical for training accurate and generalizable machine learning models, we show that diversity in the population of opponent strategies is of paramount importance in the construction and evaluation of game theory strategies. Moreover, conducting a similar analysis can shed light on already trained strategies, aiding in understanding the key features they have autonomously developed during their training processes.

Model

The data collection of various types of tournaments and the use of different strategies are made possible due to an open-source library called Axelrod-Python [28] (version 3.0.0). Axelrod-Python enables the simulation of IPD tournaments and contains an extensive list of strategies. Most of these strategies are described in the literature, with a few exceptions contributed specifically to the package. In this paper, we use a total of 195 strategies, which can be found in the Supplementary Material (S1 Text). The package supports several tournament types, and this work considers standard, noisy, probabilistic ending, and noisy probabilistic ending tournaments.

Standard tournaments are similar to Axelrod’s well-known tournaments [2]. In these tournaments, there are N strategies, and each strategy plays an iterated game with n turns against all other strategies, not including self-interactions. Noisy tournaments also involve N strategies and n turns, but in each turn, there is a probability pn that a player’s action is flipped. Compared to these two tournaments, in probabilistic ending tournaments the number of turns is not fixed. Instead, a match between strategies ends with a given probability pe. Finally, noisy probabilistic ending tournaments incorporate both a noise probability pn and an ending probability pe. For smoother results, each tournament is repeated k times, and this repetition factor was allowed to vary to assess the impact of smoothing. The winner of each tournament is determined based on the average score achieved by a strategy from the entire set of repetitions, not by the number of wins.

To run a tournament, only a few lines of code are required (Fig 1). Specifically, one needs to define the list of strategies that the players use when participating in the tournament, the number of repetitions, the number of turns or the probability of each match ending, and the probability of noise. We demonstrate two examples in Fig 1: one with a fixed number of turns and one with a probabilistic ending. A tournament is an instance of the Tournament class. To execute the tournament, users need to run the play() method, which returns an instance of the ResultSet class. This instance includes many details of the tournament, such as the winner and the average score of the participants. Additionally, the instance contains a more detailed summary of the tournament, which we use in our analysis. We describe the results summary in detail below.

Fig 1. Example usage of the Axelrod-Python package for running tournaments.

The strategies that players use are saved in a list, which is then passed to the Tournament class. In this example, strategies we previously discussed, such as TFT, Generous Tit-for-Tat, Gradual, and a stochastic strategy evolved through reinforcement learning, are included. We create a standard tournament where noise is set to 0, as well as a noisy tournament with probabilistic ending. Once an instance of the tournament class is defined, executing the tournament is straightforward. The play() method generates all possible pairs from the list of strategies, including pairs where each strategy plays against itself, and then iterates through each match to play it. Each match is repeated according to the specified number of repetitions, with results aggregated to a tournament summary.

https://doi.org/10.1371/journal.pcbi.1012644.g001
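For concreteness, the sketch below reproduces the kind of code shown in Fig 1 using the public Axelrod-Python API. The particular strategies, parameter values, and result attributes are illustrative choices rather than the exact contents of the figure.

```python
import axelrod as axl

# Participants: any list of Axelrod-Python player instances.
players = [axl.TitForTat(), axl.GTFT(), axl.Gradual(), axl.EvolvedHMM5()]

# Standard tournament: fixed number of turns, no noise.
standard = axl.Tournament(players, turns=200, repetitions=20, noise=0)
standard_results = standard.play()

# Noisy tournament with probabilistic ending: each action is flipped with
# probability 0.05 and each match ends after any turn with probability 0.01.
noisy_prob_end = axl.Tournament(players, prob_end=0.01, noise=0.05, repetitions=20)
noisy_results = noisy_prob_end.play()

print(standard_results.ranked_names)    # strategies ordered by median score
summary = standard_results.summarise()  # per-strategy rank, score, cooperation rates, ...
```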

The process of collecting tournament results is outlined in Algorithm 1. For each trial, a random size N is selected, and a random list of N strategies from the 195 available. Subsequently, one standard, one noisy, one probabilistic ending, and one noisy probabilistic ending tournament are conducted for the selected list of strategies. The parameters for the tournaments, as well as the number of repetitions, are chosen once for each trial. We have run a total of 11400 trials of Algorithm 1. For each trial, we collect the results for four different tournaments, resulting in a total of 45600 (11400 × 4) tournament results. Each tournament outputs a result summary in the form of Table 1.

Algorithm 1: Tournament Data Collection Algorithm

for seed ∈ [0, 11420] do

N ← randomly select integer ∈ [3, 195];

 players ← randomly select N players;

k ← randomly select integer ∈ [10, 100];

n ← randomly select integer ∈ [1, 200];

pn ← randomly select float ∈ [0, 1];

pe ← randomly select float ∈ [0, 1];

 result standard ← Axelrod.tournament(players, n, k);

 result noisy ← Axelrod.tournament(players, n, pn, k);

 result probabilistic ending ← Axelrod.tournament(players, pe, k);

 result noisy probabilistic ending ← Axelrod.tournament(players, pn, pe, k);

return result standard, result noisy, result probabilistic ending, result noisy probabilistic ending;
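As a rough Python sketch of one trial of Algorithm 1 (assuming the Axelrod-Python API; the library's strategy filtering, exact seeding scheme, and result storage are simplified here):

```python
import random
import axelrod as axl

def run_trial(seed):
    """One trial of the data-collection procedure: one tournament of each type."""
    rng = random.Random(seed)
    available = [s() for s in axl.strategies]   # strategies shipped with the library
    N = rng.randint(3, 195)                     # tournament size
    players = rng.sample(available, N)          # random participant list
    k = rng.randint(10, 100)                    # repetitions
    n = rng.randint(1, 200)                     # turns per match
    p_noise = rng.random()                      # noise probability
    p_end = rng.random()                        # match-ending probability

    return {
        "standard": axl.Tournament(players, turns=n, repetitions=k).play(),
        "noisy": axl.Tournament(players, turns=n, noise=p_noise, repetitions=k).play(),
        "probabilistic ending": axl.Tournament(players, prob_end=p_end, repetitions=k).play(),
        "noisy probabilistic ending": axl.Tournament(
            players, prob_end=p_end, noise=p_noise, repetitions=k
        ).play(),
    }
```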

Table 1. Result summary example of a tournament.

A result summary consists of N rows, with each row containing information for each strategy that participated in the tournament. This information includes the strategy’s rank (R), median score, the cooperation rate (Cr), the number of match wins, and the probability that the strategy cooperated in the opening move. Additionally, it provides the probabilities of a strategy being in any of the four states (CC, CD, DC, DD) and the cooperation rate after each state.

https://doi.org/10.1371/journal.pcbi.1012644.t001

The summary contains statistics regarding each strategy that participated in the tournament, such as its rank, cooperation rate, time spent in each state when only a single past round is considered, and the probability of cooperating after each of the four possible outcomes of the previous round. In our analysis, we will use the measures for each strategy provided in the summary, as well as additional measures we calculated, namely the SSE error and the average, median, maximum, and minimum cooperation rates in each tournament. The SSE (introduced in [29]) shows how closely a strategy behaves like a zero-determinant strategy and, by extension, how extortionate it is. We also consider how each strategy’s cooperation rate Cr compares to those of the tournament as a whole, for example, by comparing Cr to Cmax.

During the data collection process, the probabilities of noise (pn) and tournament ending (pe) were allowed to take values between 0 and 1. However, commonly used values for these probabilities are pn ≤ 0.1 and pe ≤ 0.1, and restricting to this range makes the results more interpretable. For example, consider a strategy competing in an environment with pn > 0.1. With such a high level of noise, a substantial fraction of the actions the strategy takes are the opposite of what the strategy is designed to do. Therefore, we focus on the tournaments for which pn ≤ 0.1 and pe ≤ 0.1. Thus, the results presented here pertain to subsets of the noisy and probabilistic ending tournaments. Specifically, the results rely on 1150 tournaments with noise, 1134 tournaments with a probabilistic ending, and 117 tournaments with both noise and a probabilistic ending. We also provide an analysis considering the entire dataset, and these results are presented in the Supplementary Material (S1 Text). The general results of the analysis are not affected by the restriction of the noise and probabilistic ending probabilities.

Results

Top ranked strategies across tournaments

A strategy has participated in multiple tournaments of each type, and to evaluate its overall performance, we introduce a measure called the normalized rank. In each tournament, the strategies receive a rank (R), where 0 denotes that the strategy was the winner, and N − 1 indicates that the strategy came last in the tournament. The normalized rank, denoted as r, is calculated as r = R / (N − 1), that is, the rank a strategy achieved divided by the largest possible rank in that tournament, so that r lies between 0 and 1. The performance of the strategies is assessed based on the median of the normalized rank, denoted as r̄.
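As a small illustration of this definition (with hypothetical rank and tournament-size pairs):

```python
from statistics import median

def normalized_rank(rank, n_players):
    """r = R / (N - 1): 0 for the tournament winner, 1 for last place."""
    return rank / (n_players - 1)

# Hypothetical (rank, tournament size) pairs for one strategy across tournaments.
observations = [(0, 10), (3, 25), (1, 5)]
r_bar = median(normalized_rank(R, N) for R, N in observations)  # median normalized rank
```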

For example, let us consider the well-known strategies TFT and Gradual. Each strategy participated in several tournaments of each type. In Fig 2 we show the distribution of the normalized ranks of these strategies in each of the four tournament types. We can observe that TFT’s normalized rank appears to be approximately normally distributed. In comparison, Gradual’s performance has longer tails, indicating that there were tournaments where the strategy performed very well or very poorly. Overall, Gradual achieves a lower median rank, signifying that it performs better than TFT except in the case of noisy and probabilistic ending tournaments (lower rank is better).

Fig 2. Examples of normalized rank distributions for two strategies, TFT and Gradual.

We plot the distributions of r for the two strategies in the four tournament types. As a reminder, lower values of r correspond to better performances. The top left quadrant of each plot shows the distribution for standard tournaments (fixed number of turns and no noise). The top right quadrant shows the distribution for noisy tournaments (fixed number of turns and noise). The bottom left quadrant shows the distribution for probabilistic ending tournaments (no noise and probabilistic ending). Finally, the bottom right quadrant shows the distribution for noisy probabilistic ending tournaments (noise and probabilistic ending). In each quadrant, we also show the number of data points. Both strategies participated in a similar number of tournaments. Based on the median rank, which we use in this work to define overall performance, TFT performs best in probabilistic ending tournaments, whereas Gradual performs best in standard tournaments.

https://doi.org/10.1371/journal.pcbi.1012644.g002

The top 15 strategies for each tournament type, based on r̄, are presented in Table 2, while the r distributions for the top-ranked strategies can be found in Fig 3.

Fig 3. r distributions of the top 15 strategies in different environments.

A lower value of r corresponds to a more successful performance. A strategy’s r distribution skewed towards zero indicates that the strategy ranked highly in most tournaments it participated in. Most distributions are skewed towards zero.

https://doi.org/10.1371/journal.pcbi.1012644.g003

Table 2. Top performances for each tournament type based on r̄.

The results of each type are based on 11420 unique tournaments. The results for noisy tournaments with pn < 0.1 are based on 1151 tournaments, and for probabilistic ending tournaments with pe < 0.1 on 1139. The top ranks indicate that trained strategies perform well in a variety of environments, but so do simple deterministic strategies. For noisy tournaments, DBS is the top ranked strategy with r̄ = 0; thus DBS won every tournament it participated in. The same holds for Evolved FSM 16 Noise 05 in probabilistic ending tournaments.

https://doi.org/10.1371/journal.pcbi.1012644.t002

In standard tournaments, the dominant strategies were those trained using reinforcement learning techniques. Ten of the top 15 strategies were introduced in [23]. These strategies are based on finite state automata (FSM), hidden Markov models (HMM), artificial neural networks (ANN), lookup tables (LookerUp), and stochastic lookup tables (Gambler). They have been trained using reinforcement learning algorithms (evolutionary and particle swarm algorithms) to perform well against a subset of the strategies in Axelrod-Python in a standard tournament. Thus, their performance in this specific setting was anticipated, although still noteworthy given the random sampling of tournament participants. DoubleCrosser and BackStabber, both from the Axelrod-Python library, make use of the number of turns and are set to defect in the last two rounds. These strategies can be characterized as “cheaters” because their source code allows them to know the number of turns (unless the match has a probabilistic ending). They were expected not to perform as well in tournaments where the number of turns is not specified. Finally, Winner 12 [22] and DBS [30] are both from the literature. DBS is a strategy specifically designed for noisy environments; however, it ranks highly in standard tournaments as well. Similarly, the fourth-ranked player, Evolved FSM 16 Noise 05, was trained for noisy tournaments yet performs well in standard tournaments.

In the case of noisy tournaments, the top-performing strategies include strategies specifically designed for noisy tournaments. These are DBS, Evolved FSM 16 Noise 05, Evolved ANN 5 Noise 05, PSO Gambler 2 2 2 Noise 05, and Omega Tit For Tat [20]. Omega TFT, a strategy designed to break the deadlocking cycles of CD and DC that TFT can fall into in noisy environments, places 10th. The rest of the top ranks are occupied by strategies that performed well in standard tournaments and by deterministic strategies such as Spiteful Tit For Tat [31], Level Punisher [32], and Eugine Nier [33].

Furthermore, in tournaments with probabilistic endings, the highly ranked strategies leaned towards defecting strategies and trained finite state automata, as demonstrated by the works of Ashlock et al. [34, 35]. The most effective strategies in probabilistic ending tournaments are also a series of ensemble Meta strategies, trained strategies that performed well in standard tournaments, and Grudger [28] and Spiteful Tit for Tat [31]. The Meta strategies [28] utilize a team of strategies and aggregate the potential actions of the team members into a single action in various ways.

While no single strategy consistently outperforms all others in any of the distinct tournament types or across various tournament types, certain types of strategies consistently achieve top rankings. These include strategies that have undergone training, those that retaliate, and those that adapt their behavior based on preassigned rules to optimize outcomes. These findings challenge some of Axelrod’s suggestions, particularly the advice to “not be clever” and “not be envious”.

The effect of strategy features on performance

For each strategy, we have a variety of features as described in Table 3. These features capture measures related to a strategy’s behavior in the tournaments it competed in, as well as intrinsic properties, such as whether a strategy is deterministic or stochastic. The correlation coefficients between these features and the two performance measures, the median score and the median normalized rank, are given in Table 4. The correlation coefficients between all features have also been calculated, and a graphical representation can be found in the Supplementary Material (S1 Text).

Table 3. Included features for performance evaluation analysis.

Stochastic, makes use of length, and makes use of game are Axelrod-Python library classifiers that determine whether a strategy is stochastic or deterministic and whether it makes use of the number of turns or of the game’s payoffs. The SSE (introduced in [29]) shows how close a strategy is to behaving as a ZD strategy and, by extension, in an extortionate way. The method identifies the ZD strategy closest to a given strategy and calculates the algebraic distance between them as the sum of squared errors (SSE). An SSE value of 1 indicates no extortionate behavior at all, whereas a value of 0 indicates that a strategy is behaving as a ZD strategy. The memory usage of a strategy is the number of rounds of play used by the strategy when deciding on an action (as specified in the Axelrod-Python library) divided by the number of turns in each match. For example, Winner12 uses the previous two rounds of play, so in a match with 100 turns its memory usage would be 2/100. For strategies with an infinite memory size, for example Evolved FSM 16 Noise 05, memory usage is equal to 1. Note that for tournaments with a probabilistic ending the number of turns was not collected, so the memory usage feature is not used for probabilistic ending tournaments. The remaining features considered are the CC to C, CD to C, DC to C, and DD to C rates, the cooperation ratio of a strategy, and the minimum (Cmin), maximum (Cmax), mean (Cmean), and median (Cmedian) cooperation ratios of each tournament.

https://doi.org/10.1371/journal.pcbi.1012644.t003

Table 4. Correlations between the features of Table 3 and the normalised rank and the median score.

For each type of tournament, standard, noisy, probabilistic ending, and noisy probabilistic ending, we conduct a correlation analysis. For each tournament type, we check the correlation between each feature used in our analysis and the normalized rank and median score. Note that the correlation coefficients are calculated using Spearman’s rank correlation coefficient. A negative value indicates a negative correlation, and in the case of the normalized rank, a smaller rank translates to a better position in the tournament.

https://doi.org/10.1371/journal.pcbi.1012644.t004
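A minimal sketch of how correlations such as those in Table 4 could be computed with pandas, assuming a data frame with one row per strategy per tournament; the file and column names are hypothetical.

```python
import pandas as pd

df = pd.read_csv("standard_tournaments.csv")  # hypothetical export of the result summaries

features = ["CC_to_C_rate", "CD_to_C_rate", "DC_to_C_rate", "DD_to_C_rate",
            "SSE", "Cr", "Cr_over_Cmax", "Cr_over_Cmean", "Cr_over_Cmedian"]
outcomes = ["normalized_rank", "median_score"]

# Spearman rank correlation of every feature with both performance measures.
corr = df[features + outcomes].corr(method="spearman").loc[features, outcomes]
print(corr)
```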

In standard tournaments, the features CC to C, Cr, Cr/Cmax, and the cooperation ratio compared to Cmedian and Cmean have a moderately negative effect on the normalized rank (a smaller rank is better) and a moderate positive effect on the median score. The SSE error and the DD to C rate have the opposite effects. Thus, in standard tournaments, behaving cooperatively corresponds to a more successful performance. Even though being nice generally pays off, that does not hold against defective strategies: being more cooperative after a mutual defection, that is, not retaliating, is associated with less overall success in terms of normalized rank. Compared to standard tournaments, in both noisy and noisy probabilistic ending tournaments, the higher the rates of cooperation, the lower a strategy’s success and median score. A strategy would not want to cooperate more than both the mean and median cooperator in such settings. In probabilistic ending tournaments, the cooperation rate of the winners and its comparison to the cooperation rates of the tournament have no effect. The only features that have an effect are the CD to C rate, which captures the tendency of a strategy to forgive, and the SSE, which has a positive effect on the normalized rank.

A multivariate linear regression has been fitted to model the relationship between the features and the normalized rank. Based on the graphical representation of the correlation matrices given in the Supplementary Material (S1 Text), several features are highly correlated and have been removed before fitting the linear regression model. The features included are given in Table 5 alongside their corresponding p values in distinct tournaments and their regression coefficients. The CD to C rate has a positive, statistically significant effect on the normalized rank across all tournament types. This suggests that being generous tends to lower one’s performance. In the case of probabilistic ending tournaments, the coefficient of the CD to C rate is the highest, indicating that one should be more provocable in this setting. Similarly, the SSE error has a positive effect on the normalized rank, suggesting that being extortionate pays off, especially in noisy tournaments. The measures of cooperation, Cr and Cr/Cmax, also exhibit a significant effect. In noisy probabilistic ending tournaments, this effect is positive; however, the coefficient is very close to zero. In other tournament types, the effect is negative, indicating that one should aim to be less cooperative than the mean cooperator of the tournament. However, we cannot interpret the result as suggesting that a strategy should be as uncooperative as possible.

Table 5. Results of multivariate linear regressions with r as the dependent variable.

R squared is reported for each model. The R squared values of the fitted models indicate that they explain some of the variation in the normalized rank. Most of the features have a statistically significant effect on the normalized rank. A multivariate linear regression has also been fitted on the median score. The coefficients and p values of the features can be found in the Supplementary Material (S1 Text). Both approaches lead to similar conclusions.

https://doi.org/10.1371/journal.pcbi.1012644.t005
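A sketch of how such a regression could be fitted with statsmodels, reusing the hypothetical data frame from the correlation sketch above; the retained feature subset is illustrative.

```python
import statsmodels.api as sm

# Features retained after dropping highly correlated ones (illustrative subset).
X = sm.add_constant(df[["CD_to_C_rate", "DD_to_C_rate", "SSE", "Cr", "Cr_over_Cmax"]])
y = df["normalized_rank"]

model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, p values, and R squared, cf. Table 5
```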

The results presented here suggest that generosity/provocability and a strategy’s cooperation rate, particularly in comparison to the tournament averages, are significant features. The analysis suggests that strategies should be more generous in noisy tournaments and less generous in probabilistic ending tournaments. Moreover, strategies should aim not to cooperate more than the mean cooperator in their tournaments. We note that the analysis is limited, as we only consider a linear relationship between these features and the rank. To further investigate the effects of the features discussed in this section, we conduct a more detailed analysis in the next section, focusing on the performance of the tournament winners.

Features of top performing strategies

In Fig 4, we present the distributions of the cooperation ratio and Cr/Cmean for the winners of tournaments. A value of Cr/Cmean = 1 implies that the cooperation ratio of the winner was the same as the mean cooperation ratio of the tournament, and we observe that this occurs for most tournament types, apart from the noisy and the probabilistic ending tournaments. In the case of probabilistic ending tournaments, there are several winners that cooperated much less than that, confirming the results of the previous section that defecting strategies can be winners in probabilistic ending tournaments. The distribution of the cooperation rates showcases a high cooperation rate in standard tournaments and probabilistic ending tournaments. In tournaments with noise, we observe much less cooperative behavior, which could result from strategies being cautious of potential flipped actions by the co-player, or from strategies not suited to noise holding grudges against defections.

Fig 4. Distributions of Cr and Cr/Cmean for the winners of tournaments.

In this distribution, we consider the winners of the tournaments, specifically the strategies that ranked first in each tournament. For each type of tournament, we plot the cooperation rate of the winner in the tournament they won, as well as the ratio of the winner’s cooperation rate to that of the entire tournament. A value of Cr/Cmean = 1 implies that the cooperation ratio of the winner was the same as the mean cooperation ratio of the tournament.

https://doi.org/10.1371/journal.pcbi.1012644.g004
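The winners' relative cooperation shown in Fig 4 can be extracted with a few lines of pandas, assuming a combined data frame covering all tournament types; the column names are again hypothetical.

```python
# Winners are the strategies ranked first (rank 0) in each tournament.
winners = df[df["rank"] == 0].copy()
winners["Cr_over_Cmean"] = winners["Cr"] / winners["tournament_mean_cooperation"]

# Summary of the winners' relative cooperation, per tournament type.
print(winners.groupby("tournament_type")["Cr_over_Cmean"].describe())
```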

Analyzing the SSE distributions across different tournament types (Fig 5) suggests that successful strategies exhibit some extortionate behavior, though not consistently. ZDs are a set of strategies that are often envious, as they attempt to exploit their opponents. The winners of the tournaments considered in this work demonstrate envious behavior, but not to the extent observed in many ZDs. While the exact interactions between matches are not recorded here, the work of [23], which introduced the trained strategies appearing among the top-ranked strategies above, did record such interactions. In [23], it was shown that clever strategies managed to achieve mutual cooperation with stronger strategies while exploiting weaker ones. This could explain the clever winners in our analysis and the observed SSE distributions.

Fig 5. Distributions of SSE error for the winners of tournaments.

Here, we again consider the winners of the tournaments, separated by type of tournament, and plot their SSE error. As a reminder, the SSE error indicates how closely a strategy behaves like a Zero-Determinant (ZD) strategy, and subsequently, in an extortionate way. An SSE value of 1 indicates no extortionate behavior at all, whereas a value of 0 indicates that a strategy is behaving as a ZD.

https://doi.org/10.1371/journal.pcbi.1012644.g005

This might also be the reason why ZDs fail to appear in the top ranks—they attempt to exploit all opponents and cannot actively adapt back to mutual cooperation against stronger strategies, which requires a deeper memory. It’s worth noting that ZDs tend to perform poorly in population games for a similar reason: they aim to exploit other players using ZDs, failing to form a cooperative subpopulation [36]. This makes them effective invaders but poor at resisting invasion.

Finally, we examine the distributions of the cooperation rates after the outcomes CC, CD, DC, and DD, as shown in Fig 6. In the case of cooperating after mutual cooperation, the results align with expectations; the distributions skew towards 1, indicating that the winners of the tournaments are more likely to cooperate after mutual cooperation. Regarding the CD outcome and the likelihood of cooperating after such a result, capturing generosity, the distributions skew towards 1/2 rather than 1, suggesting that winning strategies forgive only some of the time rather than unconditionally. This aligns with the known result that Generous Tit For Tat generally outperforms TFT in most settings. In probabilistic ending tournaments, there is a peak at 0, suggesting that strategies should not be too generous in tournaments with short matches. Such a peak also appears in standard tournaments; however, it does not appear in tournaments with noise, where a strategy should be more generous.

Fig 6. Distributions of rates CC to C, CD to C, DC to C, and DD to C for the winners of tournaments.

The result summary from the tournaments records how often each strategy cooperated after each possible outcome of the previous round. Specifically, we analyze the probability with which a strategy chose C following the outcomes. Here, we plot the distributions of these probabilities for the winners of the tournaments. We separate them by type of tournament, from top to bottom: standard, noisy, probabilistic ending, and noisy probabilistic ending. From left to right, we plot the distributions of cooperation after the outcomes CC, CD, DC, and DD.

https://doi.org/10.1371/journal.pcbi.1012644.g006

Part of a strategy’s envious behavior can be captured by the DC to C rate. In noisy tournaments, winners are not too envious, but in tournaments without noise, we can see that winners behave in two ways: some are a bit envious, whereas others are very envious. For the DD to C rate, we observe that, as expected, the distributions are skewed towards 0. However, there are winners that attempt to recover from a DD outcome. The remaining distributions are, as expected, skewed towards 0.

Discussion

This manuscript explores the performance of 195 IPD strategies in thousands of computer tournaments. The collection of computer tournaments presented here is the largest and most diverse in the literature. The 195 strategies are drawn from the Axelrod-Python library and include strategies from the IPD literature. The computer tournaments encompass four different types. So, what is the best way to play the IPD? And is there a single dominant strategy for the IPD? There was not a single strategy within the collection of 195 strategies that managed to perform well in all the tournament variations it competed in. A strategy ranking highly in a specific environment did not guarantee its success over different tournament types, with a few exceptions: strategies that generalize better. It is already well known in the AI/ML literature that adding noise to training data leads to more robust models [37]. We see that clearly here, where the strategies trained for noise (or designed for noise) tend to be better generalists. There were instances where a few strategies trained under narrow conditions outperformed more generalist strategies in the settings they were trained for, although such strategies tend to overfit. Overall, the strategies trained with noise perform well in general, whilst the strategies trained specifically on no noise or on small subpopulations do not.

We also examined the best-performing strategies across various tournament types and analyzed their salient features. This demonstrated that there are properties associated with the success of strategies that contradict the properties originally suggested by Axelrod [1]. We showed that complex or clever strategies can be effective, whether trained against a corpus of possible opponents or purposely designed to mitigate the impact of noise, such as the DBS strategy. Moreover, we found that some strategies designed or trained for noisy environments were also highly ranked in noise-free tournaments, which reinforces the idea that a strategy’s complexity or cleverness is not necessarily a liability; rather, it can confer adaptability to a more diverse set of environments. We also showed that while the type of exploitation attempted by ZDs is not typically effective in standard tournaments, envious strategies capable of selectively exploiting their opponents can be highly successful. Based on the results of [23], this could be because they exploit weaker opponents while mutually cooperating with stronger opponents. Highly noisy tournaments and tournaments with short matches also favored envious strategies. These environments mitigated the value of being nice: uncertainty enables exploitation, reducing the ability to maintain or enforce mutual cooperation, while triggering grudging strategies to switch from typically cooperating to typically defecting.

The feature analysis of the best performing strategies demonstrated that a strategy should reciprocate, as suggested by Axelrod, but it should relax its readiness to do so and be more generous. For noisy environments this is in line with the results of [16–19]; however, we also showed that generosity pays off even in standard settings, and that the only setting in which a strategy would want to be highly provocable is when matches are short. Forgiveness as defined by Axelrod was not explored here, mainly because two-round states were not recorded during the data collection. This could be a topic of future work that examines the impact of considering more rounds of history. The feature analysis also concluded that adapting to the environment, and more specifically to the mean cooperator, is of significant importance. In most tournament types, the winner of the tournament was also the average cooperator. Even in tournaments with short matches, where defecting behavior could secure a win, a large number of winners were average cooperators.

This could potentially explain the early success of TFT. TFT naturally achieves a cooperation rate near Cmean by virtue of copying its opponent’s last move while also minimizing instances where it is exploited by an opponent (cooperating while the opponent defects), at least in non-noisy tournaments. It could also explain why Tit For N Tats does not fare well for N > 1—it fails to achieve the proper cooperation ratio by tolerating too many defections.

Our results may also help explain the historically unexpected effectiveness of memory-one strategies [38]. The success of these strategies contradicts the intuitive assumption that a longer memory and therefore more information would yield better strategic performance [39]. Given that among the important features associated with success are the relative cooperation rate to the population average and the four memory-one probabilities of cooperating conditional on the previous round of play, these features can be optimized by a memory-one strategy such as TFT. Usage of more history becomes valuable when there are exploitable opponent patterns. This is indicated by the importance of SSE as a feature, showing that the first-approximation provided by a memory-one strategy is no longer sufficient. These results highlight a central idea in evolutionary game theory in this context: the fitness landscape is a function of the population (where fitness in this case is tournament performance) [40]. While that may seem obvious now, it shows why historical tournament results on small or arbitrary populations of strategies have so often failed to produce generalizable results.

We note that many strategies, such as Win-Stay-Lose-Shift and Generous Tit For Tat, emerged due to their strong performance in evolutionary dynamics. Axelrod’s original work relied on computer tournaments, so we chose to remain consistent with this approach, as a comprehensive study like ours had not yet been undertaken. However, evolutionary settings would be an exciting direction for future study.

Overall, the five properties successful strategies need to have in an IPD competition, based on the analysis presented in this manuscript, are:

  1. Be “nice” in non-noisy environments or when game lengths are longer
  2. Be provocable in tournaments with short matches, and generous in tournaments with noise
  3. Be a little bit envious
  4. Be clever
  5. Adapt to the environment (including the population of strategies).

The results presented here were based only on a subset of the whole dataset we have collected. The analysis of the full dataset is discussed in the Supplementary Material (S1 Text), and the general results of our work remain the same. In the Supplementary Material (S1 Text), we also evaluate the importance of features using a random forest classifier and a clustering approach. The results of these analyses are also in line with the results presented here.

To the authors’ knowledge, the dataset described in this work contains the largest number of IPD tournaments. The raw dataset is available at [41] and the processed data at [42]. Further data mining could be applied to it and provide new insights into the field.

Supporting information

S1 Text. Supplementary material.

This document provides details of further analysis, a summary of all parameters and notations used in the manuscript, and a comprehensive list of all strategies considered in this work.

https://doi.org/10.1371/journal.pcbi.1012644.s001

(PDF)

Acknowledgments

A variety of open-source software has been used in this work. The authors would like to express their gratitude to the open-source software community, whose invaluable contributions significantly enhanced the development and execution of this research. Namely, the authors would like to thank the developers of the following software packages: the Axelrod-Python library for IPD simulations, the Matplotlib library for visualization, the NumPy library for data manipulation, and the scikit-learn library for data analysis.

References

  1. Axelrod R, Hamilton WD. The evolution of cooperation. Science. 1981;211(4489):1390–1396. pmid:7466396
  2. Axelrod R. Effective Choice in the Prisoner’s Dilemma. Journal of Conflict Resolution. 1980;24(1):3–25.
  3. Nowak MA, Sigmund K. Tit for tat in heterogeneous populations. Nature. 1992;355(6357):250.
  4. Nowak M, Sigmund K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature. 1993;364(6432):56. pmid:8316296
  5. Beaufils B, Delahaye JP, Mathieu P. Our meeting with gradual, a good strategy for the iterated prisoner’s dilemma. In: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems; 1997. p. 202–209.
  6. Press WH, Dyson FJ. Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences. 2012;109(26):10409–10413. pmid:22615375
  7. Tzafestas E. Toward adaptive cooperative behavior. From Animals to Animats: Proceedings of the 6th International Conference on the Simulation of Adaptive Behavior (SAB-2000). 2000;2:334–340.
  8. Hilbe C, Martinez-Vaquero LA, Chatterjee K, Nowak MA. Memory-n strategies of direct reciprocity. Proceedings of the National Academy of Sciences. 2017;114(18):4715–4720. pmid:28420786
  9. Glynatsi NE, Knight VA. Using a theory of mind to find best responses to memory-one strategies. Scientific Reports. 2020;10(1):17287. pmid:33057134
  10. Murase Y, Baek SK. Friendly-rivalry solution to the iterated n-person public-goods game. PLoS Computational Biology. 2021;17(1):e1008217. pmid:33476337
  11. Schmid L, Hilbe C, Chatterjee K, Nowak MA. Direct reciprocity between individuals that use different strategy spaces. PLoS Computational Biology. 2022;18(6):e1010149. pmid:35700167
  12. Li J, Zhao X, Li B, Rossetti CS, Hilbe C, Xia H. Evolution of cooperation through cumulative reciprocity. Nature Computational Science. 2022;2(10):677–686. pmid:38177263
  13. Chen X, Fu F. Outlearning extortioners: unbending strategies can foster reciprocal fairness and cooperation. PNAS Nexus. 2023;2(6):pgad176. pmid:37287707
  14. Axelrod R. More effective choice in the prisoner’s dilemma. Journal of Conflict Resolution. 1980;24(3):379–403.
  15. Tzafestas E. Toward adaptive cooperative behavior. In: Proceedings of the Simulation of Adaptive Behavior Conference; 2000. p. 334–340.
  16. Bendor J, Kramer RM, Stout S. When in Doubt… Cooperation in a Noisy Prisoner’s Dilemma. The Journal of Conflict Resolution. 1991;35(4):691–719.
  17. Donninger C. Is it Always Efficient to be Nice? A Computer Simulation of Axelrod’s Computer Tournament. Heidelberg: Physica-Verlag HD; 1986. Available from: https://doi.org/10.1007/978-3-642-95874-8_9.
  18. Molander P. The Optimal Level of Generosity in a Selfish, Uncertain Environment. The Journal of Conflict Resolution. 1985;29(4):611–618.
  19. Selten R, Hammerstein P. Gaps in Harley’s argument on evolutionarily stable learning rules and in the logic of “tit for tat”. Behavioral and Brain Sciences. 1984;7(1):115–116.
  20. Kendall G, Yao X, Chong SY. The iterated prisoners’ dilemma: 20 years on. vol. 4. World Scientific; 2007.
  21. Stewart AJ, Plotkin JB. Extortion and cooperation in the Prisoner’s Dilemma. Proceedings of the National Academy of Sciences. 2012;109(26):10134–10135. pmid:22711812
  22. Mathieu P, Delahaye JP. New Winning Strategies for the Iterated Prisoner’s Dilemma. Journal of Artificial Societies and Social Simulation. 2017;20(4):12.
  23. Harper M, Knight V, Jones M, Koutsovoulos G, Glynatsi NE, Campbell O. Reinforcement learning produces dominant strategies for the Iterated Prisoner’s Dilemma. PLoS ONE. 2017;12(12):e0188046. pmid:29228001
  24. Axelrod R. The Evolution of Strategies in the Iterated Prisoner’s Dilemma. Genetic Algorithms and Simulated Annealing. 1987; p. 32–41.
  25. Miller JH. The coevolution of automata in the repeated Prisoner’s Dilemma. Journal of Economic Behavior and Organization. 1996;29(1):87–112.
  26. Rapoport A, Seale DA, Colman AM. Is tit-for-tat the answer? On the conclusions drawn from Axelrod’s tournaments. PLoS ONE. 2015;10(7):e0134128. pmid:26225422
  27. Gong Z, Zhong P, Hu W. Diversity in machine learning. IEEE Access. 2019;7:64323–64350.
  28. The Axelrod project developers. Axelrod: 3.0.0; 2016. http://dx.doi.org/10.5281/zenodo.807699.
  29. Knight V, Harper M, Glynatsi NE, Gillard J. Recognising and evaluating the effectiveness of extortion in the Iterated Prisoner’s Dilemma. PLoS ONE. 2024;19(7):e0304641. pmid:39058703
  30. Au TC, Nau D. Accident or intention: that is the question (in the Noisy Iterated Prisoner’s Dilemma). In: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems. ACM; 2006. p. 561–568.
  31. Mathieu P, Delahaye J, Beaufils B. The Iterated Prisoner’s Dilemma repository; 2024. https://github.com/charlespwd/project-title.
  32. Eckhart A. CoopSim v0.9.9 beta 6; 2015. https://github.com/jecki/CoopSim/.
  33. prase. Prisoner’s dilemma tournament results; 2011. https://www.lesswrong.com/posts/hamma4XgeNrsvAJv5/prisoner-s-dilemma-tournament-results.
  34. Ashlock W, Ashlock D. Changes in prisoner’s dilemma strategies over evolutionary time with different population sizes. In: 2006 IEEE International Conference on Evolutionary Computation. IEEE; 2006. p. 297–304.
  35. Ashlock W, Tsang J, Ashlock D. The evolution of exploitation. In: 2014 IEEE Symposium on Foundations of Computational Intelligence (FOCI). IEEE; 2014. p. 135–142.
  36. Knight V, Harper M, Glynatsi NE, Campbell O. Evolution reinforces cooperation with the emergence of self-recognition mechanisms: An empirical study of strategies in the Moran process for the iterated prisoner’s dilemma. PLoS ONE. 2018;13(10):e0204981. pmid:30359381
  37. Bishop CM. Training with noise is equivalent to Tikhonov regularization. Neural Computation. 1995;7(1):108–116.
  38. Camerer CF. Behavioral game theory: Experiments in strategic interaction. Princeton University Press; 2011.
  39. Binmore K. Modeling rational players: Part I. Economics & Philosophy. 1987;3(2):179–214.
  40. Hofbauer J, Sigmund K. Evolutionary game dynamics. Bulletin of the American Mathematical Society. 2003;40(4):479–519.
  41. Glynatsi NE, Knight V, Harper M. A data set of 45686 Iterated Prisoner’s Dilemma tournaments’ results [RAW DATA]; 2023. Available from: https://doi.org/10.5281/zenodo.10246248.
  42. Glynatsi NE, Knight V, Harper M. A data set of 45686 Iterated Prisoner’s Dilemma tournaments’ results; 2023. Available from: https://doi.org/10.5281/zenodo.10246247.