Incipient Cognition Solves the Spatial Reciprocity Conundrum of Cooperation

Background
From the simplest living organisms to human societies, cooperation among individuals emerges as a paradox that is difficult to explain and describe mathematically, although it is very often observed in reality. Evolutionary game theory offers an excellent toolbox to investigate this issue. Spatial structure was one of the first mechanisms shown to promote cooperation; on its own, however, it only opens a narrow window of viability.

Methodology/Principal Findings
Here we equip individuals with incipient cognitive abilities, and investigate the evolution of cooperation in a spatial world where retaliation, forgiveness, treason and mutualism may coexist, as individuals engage in Prisoner's Dilemma games. In the model, individuals are able to distinguish their partners and act towards them based on previous interactions. We show how the simplest level of cognition, alone, can lead to the emergence of cooperation.

Conclusions/Significance
Despite the incipient nature of the individuals' cognitive abilities, cooperation emerges for unprecedented values of the temptation to cheat, and is also robust to invasion by cheaters, errors in decision making and inaccuracy of imitation, features akin to many species, including humans.


Simulations using unconditional strategies (Figure 1)
The simulations for the solid black curve in Figure 1 were executed on a square lattice of size N = 200 × 200 with periodic boundary conditions. Players located on the nodes of the square lattice could follow the C (always cooperate) or D (always defect) strategies. We used random initial conditions where both strategies were present with the same frequency.
Players gained their payoff from Prisoner's Dilemma games with their four nearest neighbours and had the opportunity to imitate the strategy of one of their (randomly chosen) neighbours with the probability $W = 1/\{1 + \exp[(P_x - P_y)/K]\}$, where x stands for the focal player and y for the possibly imitated partner. $P_x$ and $P_y$ are the total payoffs of the two players, respectively, while K characterizes the noise in decision making. We let the system evolve for 20000 generations, then averaged the strategy concentrations over the population for an additional 50000 generations. We chose the parameter value K = 0.4 because this value is close to optimal for cooperation [1].
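The update procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for Figure 1: the imitation probability is assumed to take the widely used Fermi form, the payoffs are assumed to follow the weak Prisoner's Dilemma parametrization (mutual cooperation pays 1, a defector exploiting a cooperator gets b, all other payoffs are 0), and every function name is hypothetical.

```python
import math
import random

def imitation_probability(payoff_x, payoff_y, K=0.4):
    """Assumed Fermi rule: chance that focal player x adopts y's strategy."""
    return 1.0 / (1.0 + math.exp((payoff_x - payoff_y) / K))

def neighbours(i, j, L):
    """Four nearest neighbours on an L x L lattice with periodic boundaries."""
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]

def payoff(grid, i, j, b):
    """Total payoff of player (i, j); 1 = cooperator (C), 0 = defector (D).

    Assumed payoffs: C vs C -> 1, D vs C -> b, everything else -> 0.
    """
    total = 0.0
    for ni, nj in neighbours(i, j, len(grid)):
        if grid[ni][nj] == 1:
            total += 1.0 if grid[i][j] == 1 else b
    return total

def generation(grid, b, K, rng):
    """One generation = N elementary steps of random sequential imitation."""
    L = len(grid)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        ni, nj = rng.choice(neighbours(i, j, L))
        if grid[i][j] == grid[ni][nj]:
            continue  # nothing to imitate
        w = imitation_probability(payoff(grid, i, j, b), payoff(grid, ni, nj, b), K)
        if rng.random() < w:
            grid[i][j] = grid[ni][nj]
```

A run in the spirit of the text would initialise a 200 × 200 grid with C and D at equal frequency, call generation repeatedly for the transient, and then average the fraction of cooperators over the subsequent generations.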

Robustness against defector invasion
To investigate how robust cooperation is against invasion by defectors, we started the simulations from random initial strategy values p and q and, after a transient time (2000 generations), randomly replaced a given fraction µ of players by (0,0) strategists in every generation (that is, every N elementary steps). Figure S1 shows the average p and q values for different temptation parameters and defector invasion rates. High p values illustrate that cooperation persists even for the highest temptation and for considerably high invasion rates. The incipient cognitive abilities of players enable them to react promptly and isolate defectors by decreasing their fitness below that of the 'cooperator' neighbours. As a consequence, the invaders are quickly 'converted' back to cooperators. To give defectors a chance to invade the population, it is necessary to artificially increase the injection rate of pure defectors to values that effectively counteract the characteristic time scale of defector-to-cooperator conversion.
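The injection step can be illustrated with a short Python sketch. It assumes that each player's strategy is stored as a (p, q) tuple and that exactly round(µN) distinct players are replaced per generation; the text only states that a fraction µ is replaced at random, so both the data layout and the function name inject_defectors are hypothetical.

```python
import random

def inject_defectors(strategies, mu, rng):
    """Replace a randomly chosen fraction mu of players by (0, 0) strategists.

    strategies: list of (p, q) tuples, one per player (assumed layout).
    Returns the number of players replaced in this generation.
    """
    n_replace = round(mu * len(strategies))
    # Sample distinct player indices so nobody is replaced twice.
    for idx in rng.sample(range(len(strategies)), n_replace):
        strategies[idx] = (0.0, 0.0)
    return n_replace
```

In a full simulation this function would be called once per generation, after the transient, between two blocks of N elementary imitation steps.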

Strategy-parameter distribution for different temptations
The inset in Figure 1 provides information on the average stationary p and q values. However, their distribution is also worth investigating. In contrast to the distribution of p-values, which is sharply concentrated at one and remains so for a wide range of values of b, the q-values are significantly dispersed for all types of individuals: the high fraction of cooperative actions weakens the selection pressure on this particular trait. As for the average p and q values, the inset of Figure 1 shows that, with increasing b, the dominating strategies approach maximum discrimination (1,0): high temptation values leave no room for tolerance, and cooperators survive based on sharp discrimination. Moreover, as discrimination increases, the dispersion in q is also reduced: we obtain a 13% reduction in q-dispersion as b increases from 1.2 to 2.0. This situation is similar to that found in direct reciprocity contests, where sharp discrimination and prompt forgiving (TFT) was the winning strategy against a diversity of tournament strategists.
Changing the copying accuracy (σ) influences the distributions accordingly, as the dispersion of both p and q increases with increasing σ. There is, however, another side effect: because 0≤p,q≤1, strategy parameters falling outside this interval (and arising from copying with accuracy σ) must be discarded. As a result, the peak at p=1 can be shifted to slightly lower values if σ is increased, as the total probability of acquiring a lower p value while imitating a p=1 player becomes larger than the probability of perfect imitation.
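The boundary effect can be reproduced with a small Python sketch. It rests on two assumptions that the text does not confirm: that imperfect copying adds Gaussian noise of standard deviation σ to the imitated value, and that a discarded out-of-range value is simply redrawn. The function name noisy_copy is hypothetical.

```python
import random

def noisy_copy(value, sigma, rng):
    """Copy a strategy parameter with assumed Gaussian noise of width sigma.

    Candidates outside [0, 1] are discarded and redrawn (assumed handling).
    """
    if sigma == 0.0:
        return value  # perfect imitation
    while True:
        candidate = rng.gauss(value, sigma)
        if 0.0 <= candidate <= 1.0:
            return candidate

rng = random.Random(42)
copies = [noisy_copy(1.0, 0.05, rng) for _ in range(10000)]
mean_copy = sum(copies) / len(copies)
```

Because every accepted copy of a p=1 player lies at or below one, mean_copy falls below one, and it falls further as σ grows, which is the downward shift of the peak described above.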