Fig 1.
Direct reciprocity among players with different memory capacities.
a-c, Models of direct reciprocity assume that two or more players repeatedly engage in a social dilemma. In each round t, players can either cooperate (C) or defect (D). To make these decisions, players keep a record of what happened in previous rounds. Based on these records, they decide whether or not to cooperate in the current round. After each round, players update their private records. These records may be constrained by how much players remember. Here we distinguish three memory spaces. d, Memory-1 players remember both their own and their co-player’s previous action. e, Reactive players only remember their co-player’s previous action. f, Unconditional players keep no records at all. The example in the first row illustrates an interaction in which a memory-1 player with strategy WSLS [16] plays against a reactive player with strategy TFT [5].
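As a rough illustration of the interaction described in this caption, the sketch below lets a memory-1 player using WSLS meet a reactive player using TFT. The function names, the choice of opening moves, and the absence of errors are assumptions made for this example; they are not taken from the paper.

```python
C, D = "C", "D"

def wsls(own_prev, co_prev):
    """Win-stay lose-shift (memory-1): cooperate iff both players chose the same action last round."""
    return C if own_prev == co_prev else D

def tft(co_prev):
    """Tit-for-tat (reactive): repeat the co-player's previous action."""
    return co_prev

def play(rounds, first_moves=(C, C)):
    """Return a list of (memory-1 action, reactive action), one entry per round.

    first_moves is an assumption of this sketch: the opening actions are
    supplied by hand because neither rule is defined before round one.
    """
    a, b = first_moves
    history = []
    for t in range(rounds):
        if t > 0:
            prev_a, prev_b = history[-1]
            a = wsls(prev_a, prev_b)  # record: own and co-player's last action
            b = tft(prev_a)           # record: only the co-player's last action
        history.append((a, b))
    return history

# Mutual cooperation is self-sustaining; a single opening defection is not repaired.
print(play(4))
print(play(4, first_moves=(D, C)))
```

Without errors, the two players keep cooperating once they start with mutual cooperation, whereas an opening defection by the WSLS player sends the pair into a cycle of length three.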
Table 1.
Pairwise competitions of players with different memory spaces.
We consider the learning dynamics among players with different memory spaces. Players use either memory-1 strategies, reactive strategies, or unconditional strategies. Within their respective memory space, players adapt their strategies to their opponent using introspection dynamics [48, 49] with a selection strength of β = 100. For each combination of memory spaces, we compute the players’ average payoffs according to Eq (6). The winners of these pairwise comparisons are shown in bold. We find that reactive players outperform both memory-1 opponents and unconditional opponents for all considered cost values. Simulations are run for T = 10^9 time steps.
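The following is a minimal sketch of one introspection step between players with different memory spaces, under several assumptions that the caption does not state: the game is a donation game with benefit b = 1 and cost c, payoffs are time averages of the induced Markov chain (not necessarily the paper's Eq (6)), the candidate strategy is drawn uniformly from the player's own memory space, and switching follows a logistic (Fermi) rule with selection strength β. All function names are hypothetical.

```python
import math
import random

def embed(space, params):
    """Write any strategy as a memory-1 tuple (p_CC, p_CD, p_DC, p_DD).

    The index is (own previous action, co-player's previous action).
    """
    if space == "memory-1":
        return tuple(params)
    if space == "reactive":
        p_c, p_d = params          # react only to the co-player's action
        return (p_c, p_d, p_c, p_d)
    (p,) = params                  # unconditional: same probability everywhere
    return (p, p, p, p)

def payoffs(p, q, c, b=1.0, rounds=2000):
    """Time-averaged donation-game payoffs of memory-1 strategies p and q."""
    q_view = (q[0], q[2], q[1], q[3])  # player 2 sees CD and DC swapped
    v = [0.25] * 4                     # distribution over CC, CD, DC, DD
    avg = [0.0] * 4
    for _ in range(rounds):
        nxt = [0.0] * 4
        for s in range(4):
            x, y = p[s], q_view[s]
            nxt[0] += v[s] * x * y
            nxt[1] += v[s] * x * (1 - y)
            nxt[2] += v[s] * (1 - x) * y
            nxt[3] += v[s] * (1 - x) * (1 - y)
        v = nxt
        avg = [a + w / rounds for a, w in zip(avg, v)]
    coop1, coop2 = avg[0] + avg[1], avg[0] + avg[2]
    return b * coop2 - c * coop1, b * coop1 - c * coop2

def random_strategy(space, rng):
    """Draw a strategy uniformly at random from the given memory space."""
    n = {"memory-1": 4, "reactive": 2, "unconditional": 1}[space]
    return [rng.random() for _ in range(n)]

def introspection_step(spaces, strategies, c, beta, rng):
    """One step: a random player compares a random alternative to its current strategy."""
    i = rng.randrange(2)
    trial = list(strategies)
    trial[i] = random_strategy(spaces[i], rng)
    old = payoffs(embed(spaces[0], strategies[0]), embed(spaces[1], strategies[1]), c)[i]
    new = payoffs(embed(spaces[0], trial[0]), embed(spaces[1], trial[1]), c)[i]
    switch_prob = 1.0 / (1.0 + math.exp(-beta * (new - old)))
    return trial if rng.random() < switch_prob else list(strategies)
```

A full simulation would repeat `introspection_step` many times and average the payoffs along the trajectory; this sketch only shows a single update.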
Table 2.
Wins and scores in pairwise tournaments.
To interpret the results of Table 1, we (i) count how often a memory space wins a pairwise competition, and (ii) compute the memory space’s score by adding up its payoffs against the two other memory spaces. With respect to both measures, we find again that reactive players outperform players using the other two memory spaces. In general, however, we see that the rankings by wins and by total score can differ. For example, for c = 0.5 and c = 0.6, one of the two remaining memory spaces wins more often than the other, but ranks last in terms of total score. Parameters are the same as in Table 1.
Table 3.
Self scores and combined scores in pairwise tournaments.
In addition to the wins and the scores considered in Table 2, we consider two additional measures for a memory space’s success. A memory space’s self payoff is the payoff two players with the respective memory space obtain against each other. The combined score is the sum of the memory space’s score and its self payoff. Again, winners are marked in bold face. Across all cooperation costs, we find that memory-1 strategies yield the largest self payoff. They also achieve the largest combined score if c ≤ 0.3; otherwise, for c ≥ 0.4, reactive strategies succeed. Parameters are the same as in Table 1.
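The four measures defined in Tables 2 and 3 (wins, score, self payoff, combined score) can be computed from a table of pairwise payoffs, as sketched below. The payoff numbers are made-up placeholders chosen only to make the arithmetic easy to follow; they are not results from the paper, and the function name `rank` is hypothetical.

```python
SPACES = ("memory-1", "reactive", "unconditional")

# payoff[a][b]: average payoff of a player with space a against space b.
# Placeholder values for illustration only.
payoff = {
    "memory-1":      {"memory-1": 0.5,   "reactive": 0.125, "unconditional": 0.25},
    "reactive":      {"memory-1": 0.25,  "reactive": 0.25,  "unconditional": 0.25},
    "unconditional": {"memory-1": 0.125, "reactive": 0.125, "unconditional": 0.0},
}

def rank(payoff):
    """Return the dictionaries (wins, score, self_payoff, combined)."""
    wins = {s: 0 for s in SPACES}
    for i, a in enumerate(SPACES):
        for b in SPACES[i + 1:]:
            if payoff[a][b] > payoff[b][a]:
                wins[a] += 1      # a earns more than b in their pairing
            elif payoff[b][a] > payoff[a][b]:
                wins[b] += 1
    # Score: summed payoff against the two other memory spaces.
    score = {a: sum(payoff[a][b] for b in SPACES if b != a) for a in SPACES}
    # Self payoff: payoff of two players from the same memory space.
    self_payoff = {a: payoff[a][a] for a in SPACES}
    combined = {a: score[a] + self_payoff[a] for a in SPACES}
    return wins, score, self_payoff, combined

wins, score, self_payoff, combined = rank(payoff)
```

With these placeholder numbers the reactive space has the most wins and the highest score, while the memory-1 space has the highest self payoff and combined score, which shows how the four rankings can disagree.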
Table 4.
Tournament winners for different rankings and selection strengths.
We summarize our static results by showing the tournament’s winner for various costs and selection strengths, with respect to the four different rankings (i) number of wins, (ii) score, (iii) self payoff, and (iv) combined score. Memory-1 strategies typically obtain the highest self-payoff. However, they only succeed in the other rankings when cooperation is cheap (small c) and when selection is sufficiently strong (large β).
Fig 2.
Lower-memory strategies are more likely to discover strategies with extreme cooperation rates.
Here, we study the distribution of the players’ cooperation rates when two players with different memory capacities interact (for each of the three pairings of memory-1, reactive, and unconditional players). In panels a–c, we show that the mean of this distribution stabilizes after at most t = 10 time steps, for each combination of memory spaces. In d-f, we present this distribution at the very beginning of the process, when players choose their strategies uniformly at random from their respective memory space. In g-i, we show how the distribution of cooperation rates changes after 10 time steps of introspection dynamics (for c = 0.5 and β = 10), which is the time it takes for the difference in average cooperation rates to stabilize. In both cases, we observe that players with lower memory capacities are more likely to choose strategies with extreme cooperation rates (a cooperation rate close to zero or one). To create this graph, we randomly sampled 10^6 pairs of strategies from the respective memory spaces. For each pair, we then simulated t = 10 time steps of introspection dynamics. The curves show the result when we bin the players’ cooperation rates in steps of 0.02 and renormalize (such that the area under each curve is one). The above plots show marginal distributions for each memory space. In contrast, S5 and S6 Figs show joint distributions for each possible combination of memory spaces.
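The binning and renormalization step mentioned in this caption can be sketched as follows. The function name `density` and the sample are assumptions for illustration. The sample uses unconditional strategies, for which the cooperation rate equals the cooperation probability itself, and it is much smaller than the 10^6 pairs used for the figure.

```python
import random

def density(rates, width=0.02):
    """Bin cooperation rates in [0, 1] and rescale so the area under the curve is one."""
    n_bins = round(1 / width)
    counts = [0] * n_bins
    for r in rates:
        k = min(int(r / width), n_bins - 1)  # a rate of exactly 1 goes into the last bin
        counts[k] += 1
    total = len(rates)
    # Dividing by (total * width) turns counts into a density.
    return [cnt / (total * width) for cnt in counts]

rng = random.Random(1)
rates = [rng.random() for _ in range(10_000)]  # random unconditional strategies
curve = density(rates)
area = sum(curve) * 0.02
```

For strategies drawn uniformly at random from the unconditional space, the resulting curve is approximately flat, in contrast to the distribution after learning.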
Fig 3.
Evolutionary dynamics of memory spaces.
To explore the evolution of different memory capacities, we use replicator dynamics [50]. Members of a population can have one of three different memory spaces: memory-1, reactive, or unconditional. The fraction of population members with a given memory space changes in time, depending on whether players with this memory space obtain an expected payoff above average. a, When only two memory spaces compete, there are three possible dynamics: either one space is globally stable (dominance), each space is locally stable (bistability), or the two spaces form a stable mixture (coexistence). b, Here we illustrate our approach by considering an environment in which cooperation is comparatively costly and where selection is relatively weak. c-e, We first analyze the replicator dynamics for each pair of memory spaces. We find that in a pairwise competition, the reactive space dominates both of the other spaces, and one of the two remaining spaces dominates the other. f, In a next step, we study the replicator dynamics among all three memory spaces. For the given parameter values, we find that the reactive space is globally stable: independent of the initial composition of the population, all trajectories lead towards a monomorphic population of reactive players. Overall, we obtain a ‘memory dilemma’: the memory space that evolves is not the memory space that maximizes the population’s average payoff.
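A minimal sketch of replicator dynamics among three memory spaces is given below. It uses the standard replicator equation, in which the fraction x_i grows in proportion to x_i times the difference between its expected payoff and the population average, integrated with a simple Euler scheme. The payoff matrix `A` is a made-up example, not data from the paper; it is chosen so that the second space dominates while the first has the highest self payoff.

```python
# A[i][j]: payoff of space i against space j, in the order
# (memory-1, reactive, unconditional). Hypothetical values.
A = [
    [3.0, 0.0, 2.0],
    [4.0, 1.0, 3.0],
    [2.0, 0.5, 1.0],
]

def replicator(A, x, dt=0.01, steps=5000):
    """Euler integration of dx_i/dt = x_i * (f_i - f_avg), with f = A x."""
    n = len(x)
    x = list(x)
    for _ in range(steps):
        f = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        f_avg = sum(x[i] * f[i] for i in range(n))
        x = [x[i] + dt * x[i] * (f[i] - f_avg) for i in range(n)]
        total = sum(x)
        x = [xi / total for xi in x]  # guard against numerical drift
    return x

# Start with a population that consists mostly of memory-1 players.
final = replicator(A, [0.9, 0.05, 0.05])
```

In this example the population converges to the second space although a monomorphic population of the first space would earn more (3 versus 1), which mirrors the structure of a memory dilemma.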
Fig 4.
A bifurcation analysis of the evolutionary dynamics of memory spaces.
We explore the replicator dynamics of the three different memory spaces for four different selection strengths β. In each case, we first classify the dynamics as we vary the cooperation cost from c = 0 to c = 1 (left graphs). Here, triangles show representative depictions of the dynamics. Colors indicate the basins of attraction of each possible fixed point. If a given memory space S_i is globally stable in a given cost interval, the respective line segment is appropriately colored. Colored shades around these line segments indicate a memory dilemma. Second, we also illustrate the average cooperation rate in monomorphic populations for each of the possible memory spaces (right graphs). If the respective monomorphic population is stable according to replicator dynamics, we use a solid line; otherwise we use a dashed line. a-d, For weak selection, unconditional strategies are globally stable for all cost values. As we increase selection strength, memory-1 strategies become stable when cooperation costs are sufficiently small. For large cooperation costs, reactive strategies are globally stable.
Fig 5.
Evolutionary dynamics of memory spaces for different game structures.
We explore whether memory dilemmas are present in games beyond the repeated Prisoner’s Dilemma by repeating our previous simulations for various game matrices. a, We consider matrices parametrized by v and u. The parameter v varies in the interval [0, 2], and the parameter u varies in [−1, 1]. We can partition the resulting two-dimensional space of game matrices into four quadrants. The lower right quadrant contains games with a Prisoner’s Dilemma (PD) structure like the donation game, whereas the other quadrants contain the other fundamental social dilemmas Snowdrift (SG), Harmony (HG), and Stag Hunt (SH) games. For each of these games, we do the same kind of analysis as for the donation games studied earlier. b, First, we depict the Nash equilibria of the 3 × 3 payoff matrices when the memory-1, reactive, and unconditional spaces compete. For each strategy space, we find parameter regions where this space is an equilibrium. Additionally, we identify regions in which more than one space is stable. c, Here, we show the strategy space with the highest self-payoff. Memory-1 strategies tend to get the highest self-payoff in the PD and SH. In the other two game classes, strategy spaces of lower complexity can be more effective. d, We distinguish two kinds of memory dilemma. In the “classic” one, the strategy space with highest complexity gives the highest self-payoff, but is not a Nash equilibrium. In the “reverse” one, it is a less complex strategy space that yields the higher self-payoff without being an equilibrium. Both dilemmas also appear in their “weak” forms when bistabilities occur. Parameters: β = 100, simulations run for T = 10^9 time steps.
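The distinction between the “classic” and the “reverse” dilemma in panel d can be sketched as a simple check on a 3 × 3 payoff matrix. The sketch assumes that “equilibrium” means a symmetric pure Nash equilibrium (no other space earns more against space i than space i earns against itself) and that the first row is the most complex space. The “weak” forms are not covered, and the matrix is a made-up example rather than a result from the paper.

```python
def pure_nash(A):
    """Indices i such that no space j earns more against i than i earns against itself."""
    n = len(A)
    return [i for i in range(n) if all(A[i][i] >= A[j][i] for j in range(n))]

def dilemma(A, most_complex=0):
    """Classify the matrix as 'classic', 'reverse', or 'none' (weak forms ignored)."""
    best_self = max(range(len(A)), key=lambda i: A[i][i])  # highest self-payoff
    if best_self in pure_nash(A):
        return "none"
    return "classic" if best_self == most_complex else "reverse"

# A[i][j]: payoff of space i against space j, in the order
# (memory-1, reactive, unconditional). Hypothetical values.
A = [
    [3.0, 0.0, 2.0],
    [4.0, 1.0, 3.0],
    [2.0, 0.5, 1.0],
]
equilibria = pure_nash(A)
kind = dilemma(A)
```

Here the first space has the highest self-payoff but can be invaded by the second, so the example falls into the classic category.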