
Conceived and designed the experiments: DSV IG ES MAPM. Performed the experiments: DSV. Analyzed the data: ES MAPM. Contributed reagents/materials/analysis tools: MAPM. Wrote the paper: ES MAPM.

The authors have declared that no competing interests exist.

Game theory, and the Prisoner's Dilemma (PD) game in particular, has been a powerful tool for studying the mechanisms of reciprocity; the PD captures the paradox of cooperative interactions that yield benefits but entail costs to the interacting individuals. However, in non-human animals, most tests of reciprocity in PD games have resulted in sustained defection strategies.

We use an iterated PD game to test rats (

Here we show that rats understand the payoff matrix of the PD game and the strategy of the opponent. Importantly, our findings reveal that rats possess the necessary cognitive capacities for reciprocity-based cooperation to emerge in the context of a prisoner's dilemma. Finally, the validation of the rat as a model to study reciprocity-based cooperation during the PD game opens new avenues of research in experimental neuroscience.

A central feature of the human species is its seemingly unprecedented capacity, in evolutionary terms, to establish cooperative interactions between unrelated individuals. However, many examples of similar behaviours in other animals have revealed that this capacity is not exclusive to our species.

Several mechanistic causes for the emergence of reciprocity-based cooperation during the interaction between two individuals have been put forward. One of these emphasizes the pro-social propensity of the interacting individuals: a cooperative act constitutes a truly altruistic behaviour arising from a reward value attributed to the perception of benefit to others.

Game theory has proven instrumental in the study of social behaviour, as it mathematically formalizes the outcomes associated with the decisions of two or more interacting individuals, framing in economic terms the conditions for reciprocity.

The success of reciprocating strategies in theoretical models of iPD games has been corroborated experimentally by numerous reports on the emergence of reciprocal cooperation between human subjects while playing iPD games

Recently, it has been shown that rats can display generalized reciprocity

Our iPD game was played in a double T-maze in which the choice between arms was arbitrarily defined as cooperation (C) or defection (D). The payoff matrix comprised both rewards and punishments: T and R trials led to the delivery of food pellets, and P and S trials led to the delivery of tail pinches (

Sated rats playing under a matrix T = 6, R = 4, P = −1, S = −3 against a Tit-for-Tat (TFT) opponent (…). … $P(C_0|X_{-1})$, where X = T, R, P or S. Black lines represent $P(C_0|X_{-1})$ for each target rat, the dark blue line represents the mean $P(C_0|X_{-1})$, and the light blue band shows the s.e.m. (

| | Stooge cooperates | Stooge defects |
|---|---|---|
| Target cooperates | R = 4 pellets | S = 3 tail pinches |
| Target defects | T = 6 pellets | P = 1 tail pinch |

In this game, the frequency of mutual cooperation (reward, R, trials) was significantly higher than that of mutual defection (punishment, P, trials). However, a sizable incidence of temptation (T) and sucker (S) trials was observed (Friedman's ANOVA testing for effect of outcome, Q(3) = 53.485, two-tailed …). … probability of cooperating after defection: $p(C_0|P_{-1}) = 0.84$ and $p(C_0|T_{-1}) = 0.73$; probability of cooperating after cooperation: $p(C_0|R_{-1}) = 0.42$ and $p(C_0|S_{-1}) = 0.49$,
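Conditional probabilities such as $p(C_0|P_{-1})$ can be estimated from a trial-by-trial record of both players' moves. The following is a minimal Python sketch, assuming each trial is stored as a (target move, stooge move) pair of "C"/"D" strings; the function names `outcome` and `conditional_cooperation` are hypothetical and this is not the authors' analysis code.

```python
from collections import Counter

# Outcome of one trial from the target rat's point of view (see the payoff matrix).
OUTCOME = {
    ("C", "C"): "R",  # mutual cooperation: reward
    ("C", "D"): "S",  # target cooperates, stooge defects: sucker
    ("D", "C"): "T",  # target defects, stooge cooperates: temptation
    ("D", "D"): "P",  # mutual defection: punishment
}


def outcome(target_move: str, stooge_move: str) -> str:
    """Return the outcome label (T, R, P or S) of a single trial."""
    return OUTCOME[(target_move, stooge_move)]


def conditional_cooperation(trials: list[tuple[str, str]]) -> dict[str, float]:
    """Estimate p(C_0 | X_-1): the fraction of trials on which the target
    cooperated, given outcome X on the previous trial."""
    after = Counter()       # number of trials preceded by outcome X
    coop_after = Counter()  # of those, trials on which the target cooperated
    for prev, curr in zip(trials, trials[1:]):
        x = outcome(*prev)
        after[x] += 1
        if curr[0] == "C":
            coop_after[x] += 1
    # Outcomes that never occurred are omitted rather than reported as 0/0.
    return {x: coop_after[x] / after[x] for x in "TRPS" if after[x] > 0}


# Example: four trials against a TFT stooge (its move copies the target's previous move).
session = [("C", "C"), ("D", "C"), ("C", "D"), ("C", "C")]
print(conditional_cooperation(session))
```

In practice the estimates would be computed per rat over the stable sessions and then averaged, as described in the Methods.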

If the emergence of cooperation between two interacting individuals is contingent upon at least one of them adopting a reciprocating behaviour, then cooperation rates should decrease when the stooge rat uses a non-reciprocating strategy. To test this hypothesis, we fixed the stooge rat in a pseudo-random strategy, so that the choices of the target rat would not influence the subsequent moves of the stooge. During any given session, the pseudo-random stooge rat cooperated on average on 50% of the trials. The sequence of defective and cooperative moves was randomized; however, no more than 3 consecutive defective or cooperative moves were allowed.
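The text specifies only the constraints on the stooge's sequence (about 50% cooperation, at most 3 identical moves in a row), not how the sequence was generated. One way to satisfy them is sketched below, as an assumption: each move is a fair coin flip, except that the move is forced to switch once it has been repeated 3 times. Because the rule treats C and D symmetrically, the expected cooperation rate stays at 50%. The function name `pseudo_random_stooge` is hypothetical.

```python
import random


def pseudo_random_stooge(n_trials: int = 20, max_run: int = 3, seed=None) -> list[str]:
    """Generate a C/D sequence in which each move is a fair coin flip, except
    that the move is forced to switch after max_run identical moves."""
    rng = random.Random(seed)
    moves: list[str] = []
    for _ in range(n_trials):
        if len(moves) >= max_run and len(set(moves[-max_run:])) == 1:
            # The last max_run moves are identical: force the opposite move.
            moves.append("D" if moves[-1] == "C" else "C")
        else:
            moves.append(rng.choice("CD"))
    return moves


# One 20-trial session schedule for the stooge.
print("".join(pseudo_random_stooge(20, seed=42)))
```

A fixed `seed` makes a session's schedule reproducible; omitting it draws a fresh sequence each time.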

When playing against a random opponent, as in a single-shot game, the best strategy is to defect. Indeed, we found that against a pseudo-random strategy, rats predominantly defected (average cooperation rate across sessions 5 to 10, 0.20±0.02, n = 5). Cooperation in this game was low irrespective of the outcome of the previous trial (
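To see why defection is the best response to a random opponent, take the signed payoff values given for the TFT game (T = 6, R = 4, P = −1, S = −3, counting each tail pinch as one negative unit on the pellet scale, which is a simplifying assumption) and an opponent that cooperates with probability q, independently of the target's history. For q = 0.5:

$$
\begin{aligned}
E[u \mid C] &= qR + (1-q)S = 0.5(4) + 0.5(-3) = 0.5,\\
E[u \mid D] &= qT + (1-q)P = 0.5(6) + 0.5(-1) = 2.5,\\
E[u \mid D] - E[u \mid C] &= q(T-R) + (1-q)(P-S) > 0 \quad \text{for every } q \in [0,1].
\end{aligned}
$$

Because T > R and P > S, the last line shows that defection pays more whatever the opponent's cooperation rate, so the conclusion does not depend on q = 0.5.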

Results from these experiments (TFT and PR games) show that rats behave consistently differently depending on the opponent's strategy. Nonetheless, in both cases the rats' behaviour is suboptimal, i.e. it does not yield the best possible outcome. To behave optimally when playing against TFT, rats should always cooperate.
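The claim that always cooperating is optimal against TFT can be checked by comparing the long-run average payoff per trial, $\bar{u}$, of three simple strategies. This sketch uses the signed values T = 6, R = 4, P = −1, S = −3 (tail pinches as negative units, an assumption) and assumes that TFT opens with cooperation:

$$
\begin{aligned}
\text{always C:}\quad & \bar{u} = R = 4\\
\text{alternate D, C:}\quad & \bar{u} = \tfrac{T+S}{2} = \tfrac{6 + (-3)}{2} = 1.5\\
\text{always D:}\quad & \bar{u} \to P = -1
\end{aligned}
$$

Always cooperating comes out best because R > P and 2R > T + S (here 8 > 3), the usual condition under which mutual cooperation beats alternating exploitation in an iterated PD.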

Although the observed behaviour is suboptimal and does not seem to conform to the matching rule, it is clearly sensitive to the opponent's strategy. This raises the question of whether the behaviour adopted by the rats is truly adjusted to the opponent's strategy. This question can only be addressed through game simulations in which the behaviour adopted by the rats (observed behaviour) is played out against different strategies, e.g. TFT and pseudo-random. To this end, we used the observed strategies of rats playing against TFT or PR (

Simulated games against TFT and PR were run using the empirically determined probabilities of cooperation after each outcome of the game (T, R, P and S) (see
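Against a pure TFT opponent, the stooge's move on each trial copies the target's previous move, so a strategy defined only by $p(C_0|X_{-1})$ turns the sequence of outcomes into a four-state Markov chain. The sketch below computes that chain's long-run outcome frequencies by power iteration, using the TFT-game probabilities listed in the Methods (0.73, 0.42, 0.84, 0.49). It is an analytical complement, not the authors' Excel simulation, and the function names are hypothetical.

```python
# Target's own move in each outcome: R and S follow cooperation, T and P follow defection.
TARGET_MOVE = {"R": "C", "S": "C", "T": "D", "P": "D"}
OUTCOME = {("C", "C"): "R", ("C", "D"): "S", ("D", "C"): "T", ("D", "D"): "P"}


def tft_transition_matrix(p_coop: dict[str, float]) -> dict[str, dict[str, float]]:
    """Transition probabilities between outcomes when the opponent plays pure TFT."""
    matrix = {}
    for prev in "TRPS":
        stooge_move = TARGET_MOVE[prev]  # TFT copies the target's previous move
        p = p_coop[prev]
        matrix[prev] = {x: 0.0 for x in "TRPS"}
        matrix[prev][OUTCOME[("C", stooge_move)]] += p      # target cooperates
        matrix[prev][OUTCOME[("D", stooge_move)]] += 1 - p  # target defects
    return matrix


def stationary_distribution(matrix, n_iter: int = 1000) -> dict[str, float]:
    """Long-run outcome frequencies, found by repeatedly applying the chain."""
    pi = {x: 0.25 for x in "TRPS"}
    for _ in range(n_iter):
        pi = {y: sum(pi[x] * matrix[x][y] for x in "TRPS") for y in "TRPS"}
    return pi


observed_tft = {"T": 0.73, "R": 0.42, "P": 0.84, "S": 0.49}
pi = stationary_distribution(tft_transition_matrix(observed_tft))
pellets_per_trial = 6 * pi["T"] + 4 * pi["R"]
pinches_per_trial = 1 * pi["P"] + 3 * pi["S"]
print({x: round(v, 3) for x, v in pi.items()}, pellets_per_trial, pinches_per_trial)
```

With these probabilities, the chain spends roughly 27% of trials in R, about 32% in each of T and S, and about 10% in P, so mutual cooperation is more frequent than mutual defection while T and S trials remain common. Pellets and pinches are kept separate because the sketch assumes no exchange rate between them.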

In an iPD game, the highest immediate payoff results from a temptation trial. However, the highest gain across all sessions is achieved when both players always cooperate (resulting in 4 food pellets on every trial). Thus, when playing against a TFT opponent, and in full knowledge of the opponent's current move, the target player should always cooperate. However, if the temptation to defect is high, subjects will eventually defect. As shown above, when playing against TFT, rats in our iPD game cooperated more often than they defected, but showed a high incidence of temptation trials, which were immediately followed by cooperation. One possibility is that rats adopted a mixed strategy (mutual cooperation and alternating reciprocity) because, even though the temptation to defect is significant, mutual cooperation entails a higher payoff than pure alternating reciprocity. If this is true, then decreasing the payoff of a reward trial (from 4 to 2 food pellets) while maintaining the level of temptation (6 pellets) should preserve the level of alternation between T and S trials while decreasing the level of mutual cooperation (R trials). As predicted, we found that rats showed T and S levels similar to those in the first experiment (where R = 4 and T = 6), whereas the incidence of mutual cooperation was significantly lower than that of mutual defection (Friedman's ANOVA testing for effect of outcome, Q(3) = 66.630, two-tailed
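The prediction can be made concrete by counting pellets and tail pinches per trial separately (no exchange rate between food and pinches is assumed), with T = 6 pellets and S = 3 tail pinches as in the first matrix:

$$
\begin{aligned}
\text{mutual cooperation (R on every trial):}\quad & R \text{ pellets},\; 0 \text{ pinches per trial}\\
\text{alternating reciprocity (T, S, T, S, ...):}\quad & \tfrac{6}{2} = 3 \text{ pellets},\; \tfrac{3}{2} = 1.5 \text{ pinches per trial}\\
R = 4:\quad & 4 > 3 \text{ pellets per trial}\\
R = 2:\quad & 2 < 3 \text{ pellets per trial}
\end{aligned}
$$

With R = 4, mutual cooperation yields more food than alternation and no pinches; with R = 2, alternation yields more food, at the cost of 1.5 pinches per trial, so mutual cooperation should lose its advantage while T/S alternation is unaffected.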


Several explanations for the lack of cooperation in previous studies have been put forward, such as the high impulsiveness of non-human animals


We show that, for rats playing an iPD game, the cooperation rate is modulated by the strategy of the opponent, by the relative size of the rewards resulting from cooperation and defection, and by the motivational state of the animals (Kruskal-Wallis testing for effect of game, K = 85.452,

Left panel shows the time course of the cooperation rate over the ten game sessions for each iPD game (mean±s.e.m.). Each line represents one of the games tested. Right panel shows the proportion of cooperation in each iPD game (mean±s.e.m.) once cooperation reached stability (see

In conclusion, our results reinforce the notion that rats are capable of complex computations

The Instituto Gulbenkian de Ciência follows the Portuguese Guidelines, which comply with the European Directive 86/609/EEC of the European Council.

The experiments were performed using male non-littermates of the outbred Sprague Dawley rat strain, from Charles River, Barcelona, Spain. All animals were housed in pairs under a 12 h light/dark cycle. Experiments were conducted during the light period. Before the experiment started, all rats were habituated for one week to the experimenter and to the novel food used for positive reinforcement in the iPD game. Each experiment used naive rats, and all rats within a game played against the same stooge. In each game, 5 to 6 target rats were used (see corresponding Figures for sample size). For the experiments using sated rats, subjects had free access to food and water, whereas for the experiments using food-deprived rats, animals had restricted access to food and were kept at 85% of their

The apparatus consists of a double T-maze made of plexiglass (

Diagram of the double T-maze used in the presented experiments. One T-maze is shown in grey with its respective start box and two choice compartments. The identical opposing T-maze is shown in dashed lines. Arrows show the direction of movement of the start box door and of the partition between compartments.

We used a Prisoner's Dilemma game matrix, in which T>R>P>S. Preliminary evidence showed that, as predicted by Stevens and Clements

We first verified that rats could discriminate between temptation and reward outcomes (6 vs. 4 food pellets, respectively) and between punishment and sucker outcomes (1 vs. 3 tail pinches, respectively). In these experiments, only one T-maze was used and simple preference tests were performed. Rats were placed in the start box and given a choice between the two compartments of the T-maze. For the positive reinforcement test, 6 pellets (T outcome) were delivered in one compartment and 4 pellets (R outcome) in the other (high and low rewards were delivered in a counterbalanced fashion in the left and right compartments). We found that over 5 days, with one session of 20 trials per day, preference for the 6-pellet compartment increased steadily, reaching 84±2% by the last day. A G-test showed that the choice of the high reward was significantly different from chance, $G_P = 61.50$, $P = 4.4\times10^{-15}$ (see …; … $G_P = 4.06$, $P = 0.04$). These results show that rats could discriminate between 6 and 4 food pellets, and between 1 and 3 tail pinches.

All iPD games were played over 10 consecutive days, with one daily session of 20 trials. The two compartments of each T-maze were arbitrarily defined as cooperation (C) or defection (D) compartments (counterbalanced across rats). Thus, in our iPD game, a cooperative or defective act was defined as entering the C or D compartment, respectively. For consecutive target rats, cooperation and defection were assigned to opposite compartments. This design ensures that odour cues from the previous target rat would be associated with the opposite response (cooperate or defect) in the following target rat. For each game, one of the rats, the stooge, was assigned to play a fixed strategy, either Tit-for-Tat or Pseudo-Random, and the other rat (the target) could freely choose between C and D. A new stooge was used for each game, but the same stooge played against all rats within a game. On each trial of the game, the stooge was placed in C or D (according to the

All statistical analyses, except for the G-tests (which were calculated manually in Microsoft Excel; see paragraph III), were performed using XSTAT, from Microsoft. Since the data analyzed did not follow normal distributions (as shown by Shapiro-Wilk normality tests), non-parametric statistical tests were used for analyses II, III and IV.

In order to analyse the strategy adopted by the rats in the different games, we pooled the data from the sessions in which cooperation rates were stable. To identify when the cooperation rate stabilized, for each game we plotted the cooperation rate of all rats across sessions (no animal was excluded from the analysis). Next, we fitted a series of linear models to the data: the first model included all sessions, and in each successive model the starting session advanced by one (model 1 included sessions 1 through 10, model 2 sessions 2 through 10, and so on). We found that, for all games, from session 5 onwards the slope of the linear fit was not different from zero. Thus, for all analyses of the rats' performance, the data were pooled from sessions 5 through 10.
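The sliding-window procedure can be sketched as follows. The text does not state which test was used to decide that a slope was "not different from zero", so this sketch assumes an ordinary least-squares fit with a t test on the slope and a hypothetical threshold `t_crit = 2.0` (roughly the two-tailed 5% critical value of Student's t for the degrees of freedom involved). The function names and the example data are hypothetical.

```python
import math


def slope_t(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope of y on x and its t statistic (H0: slope = 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0:  # perfect fit: report an infinite t unless the slope is exactly zero
        return slope, (0.0 if slope == 0 else math.inf)
    return slope, slope / se


def first_stable_session(points, last=10, min_sessions=3, t_crit=2.0):
    """Advance the window start from session 1 and return the first start whose
    fitted slope is not significantly different from zero (|t| < t_crit)."""
    for start in range(1, last - min_sessions + 2):
        window = [(s, r) for s, r in points if start <= s <= last]
        _, t = slope_t([s for s, _ in window], [r for _, r in window])
        if abs(t) < t_crit:
            return start
    return None


# Hypothetical data: 5 rats whose cooperation rate rises until session 5, then plateaus.
offsets = [-0.02, -0.01, 0.0, 0.01, 0.02]
points = [(s, 0.1 * min(s, 5) + o) for s in range(1, 11) for o in offsets]
print(first_stable_session(points))
```

Each point is one (session, cooperation rate) pair for one rat, so all rats contribute to every fit, mirroring the pooling described above.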

To compare the mean rates of the different outcomes, for each game we first performed a Friedman's ANOVA (with outcome as the single within-subject factor). When this was significant (α = 0.05), multiple post-hoc pairwise comparisons were performed using Nemenyi's procedure (two-tailed tests), with a Bonferroni-corrected α of 0.0083.
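For reference, the Friedman statistic ranks the four outcome rates within each subject and compares the rank sums across outcomes; the corrected α of 0.0083 corresponds to 0.05 divided by the $\binom{4}{2} = 6$ pairwise comparisons among four outcomes. The sketch below implements the textbook Friedman formula without the tie correction that statistical packages usually apply, and does not reproduce the Nemenyi procedure; the function names are hypothetical and the reported Q values should not be expected to follow from it without the original data.

```python
from math import comb


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def friedman_q(data: list[list[float]]) -> float:
    """Friedman statistic for n subjects (rows) x k conditions (columns),
    without the correction for ties."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison alpha for all k*(k-1)/2 pairwise comparisons among k groups."""
    return alpha / comb(k, 2)


print(bonferroni_alpha(0.05, 4))  # four outcomes (T, R, P, S) -> six comparisons
```

Q is compared with a chi-square distribution with k − 1 = 3 degrees of freedom, which is why it is written Q(3) in the Results.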

To test whether the probability of cooperation after each outcome differed from chance, i.e. 0.5, we performed G-tests. We calculated the parameter $G_P$ to test for deviations from the theoretical distribution
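A G-test of goodness of fit compares observed counts O with expected counts E via $G = 2\sum O \ln(O/E)$; with two categories (cooperate vs. defect, expected 50:50) it has one degree of freedom. The sketch below assumes that setting; how the authors pooled counts to obtain $G_P$ is not described here, so only the statistic and its p-value are shown. Function names are hypothetical.

```python
import math


def g_statistic(observed: list[float], expected: list[float]) -> float:
    """G = 2 * sum(O * ln(O / E)); cells with O = 0 contribute 0."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def p_value_df1(g: float) -> float:
    """Upper-tail chi-square probability with 1 degree of freedom:
    P(chi2_1 > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2))."""
    return math.erfc(math.sqrt(g / 2))


# Hypothetical counts: 30 cooperations and 10 defections, tested against 50:50.
g = g_statistic([30, 10], [20, 20])
print(g, p_value_df1(g))
```

As a consistency check, `p_value_df1(61.50)` is about 4.4 × 10⁻¹⁵ and `p_value_df1(4.06)` is about 0.044, in line with the P values reported for the discrimination tests.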

To compare the mean cooperation rates observed in the different games, a Kruskal-Wallis ANOVA was performed using game as the single between-subject factor. Multiple post-hoc pairwise comparisons were performed using Dunn's procedure (two-tailed tests), with a Bonferroni-corrected α of 0.0083.
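The Kruskal-Wallis statistic ranks all observations from all games together and asks whether the rank sums differ between games more than chance would allow. The sketch below implements the textbook formula without the tie correction, and does not include Dunn's post-hoc procedure; the function name is hypothetical.

```python
def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Kruskal-Wallis H (no tie correction): pool and rank all observations,
    then compare the groups' rank sums."""
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # tied values share the mean of their ranks
        for m in range(i, j + 1):
            rank_sums[pooled[m][1]] += mean_rank
        i = j + 1
    return 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)


# Hypothetical cooperation rates for three games with no overlap between games.
print(kruskal_wallis_h([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]))
```

H is referred to a chi-square distribution with (number of games − 1) degrees of freedom.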

In order to assess whether the behaviour adopted by rats when playing against TFT (observed behaviour) would yield a worse outcome if the same strategy were adopted against a random opponent, we simulated games in which the observed strategy was played against TFT (modelling the real game) or against a random opponent. To model the observed behaviour, we used the probability of cooperation after each of the game's outcomes (T, R, P, S), averaged across the five rats that played against TFT ($P(C_0|T_{-1}) = 0.73$, $P(C_0|R_{-1}) = 0.42$, $P(C_0|P_{-1}) = 0.84$, $P(C_0|S_{-1}) = 0.49$). This model (consisting of the above cooperation probabilities) corresponds to the simulated TFT-based player. Using this model, we simulated games in which the opponent played either pure TFT or pure Random. The simulation was run 5 times for each opponent. First, to validate our model, the average outcome of the simulated game against TFT was compared with the average outcome obtained by rats playing the real game (observed outcome). Next, the outcome of the simulated game against TFT was compared with that against Random.
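The simulations were run in Excel; the following Python sketch shows the same idea under stated assumptions. The "virtual rat" cooperates with probability $P(C_0|X_{-1})$ after outcome X; TFT is assumed to open with cooperation; the virtual rat's first move uses a hypothetical probability `p_first = 0.5`; and each run is assumed to last 120 trials (6 stable sessions × 20 trials), a hypothetical choice since the run length is not stated. Pellets and tail pinches are tallied separately.

```python
import random

OUTCOME = {("C", "C"): "R", ("C", "D"): "S", ("D", "C"): "T", ("D", "D"): "P"}
PELLETS = {"T": 6, "R": 4, "P": 0, "S": 0}
PINCHES = {"T": 0, "R": 0, "P": 1, "S": 3}


def simulate(p_coop, opponent, n_trials, rng, p_first=0.5):
    """Play a reactive virtual rat (cooperates with probability p_coop[X] after
    outcome X) against a 'TFT' or 'random' opponent.
    Returns (total pellets, total tail pinches)."""
    pellets = pinches = 0
    prev_outcome = None
    prev_target = "C"  # assumption: TFT opens with cooperation
    for _ in range(n_trials):
        p = p_first if prev_outcome is None else p_coop[prev_outcome]
        target = "C" if rng.random() < p else "D"
        if opponent == "TFT":
            stooge = prev_target  # copy the target's previous move
        else:
            stooge = rng.choice("CD")  # pure random, 50% cooperation
        prev_outcome = OUTCOME[(target, stooge)]
        pellets += PELLETS[prev_outcome]
        pinches += PINCHES[prev_outcome]
        prev_target = target
    return pellets, pinches


observed_tft = {"T": 0.73, "R": 0.42, "P": 0.84, "S": 0.49}
rng = random.Random(0)
runs_vs_tft = [simulate(observed_tft, "TFT", 120, rng) for _ in range(5)]
runs_vs_random = [simulate(observed_tft, "random", 120, rng) for _ in range(5)]
print(runs_vs_tft, runs_vs_random)
```

The PR-based player described below can be simulated the same way by passing its probabilities in place of `observed_tft`; note that this sketch's random opponent has no limit on run length, matching the pure random strategy of the virtual opponent rather than the pseudo-random stooge.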

The same procedure was used to model the observed behaviour of rats playing against a pseudo-random stooge rat, so that we could assess whether the behaviour adopted by rats when playing against PR (observed behaviour) would yield a worse outcome if the same strategy were adopted against a TFT opponent. For the simulated PR-based player we used the following probabilities: $p(C_0|T_{-1}) = 0.22$, $p(C_0|R_{-1}) = 0.24$, $p(C_0|P_{-1}) = 0.19$, $p(C_0|S_{-1}) = 0.07$ (average probabilities across the five rats that played against PR). Note that in the real game the stooge rat played a pseudo-random strategy, such that the same move (C or D) never occurred more than 4 times in succession, whereas the virtual opponent played a pure random strategy.

The simulations were run in Microsoft Excel. To compare the outcomes of the different simulated games, unpaired two-tailed t-tests were performed, with a Bonferroni-corrected α of 0.01.

Diagram showing the probability of transition between outcomes of individual rats. Arrows represent transitions: driven by cooperation in blue, and driven by defection in red (arrow thickness proportional to transition probability). In all panels: T, temptation; R, reward; P, punishment; S, Sucker; C, cooperation; D, defection.


Diagram showing the probability of transition between outcomes of individual rats. Subjects shape their strategy according to the iPD game conditions (each game differs from game 1 for the highlighted condition). Arrows represent transitions: driven by cooperation in blue, and driven by defection in red (arrow thickness proportional to transition probability). In all panels: T, temptation; R, reward; P, punishment; S, Sucker; C, cooperation; D, defection.


Comparison between observed behaviour and matching behaviour. The figure shows the observed probability of cooperation after each outcome (Reward, Sucker, Punishment and Temptation; black bars, mean±s.e.m.), together with the expected probability of cooperation if rats were matching for reward ($p(C_0|R_{-1})$ and $p(C_0|S_{-1})$) or punishment ($p(C_0|P_{-1})$ and $p(C_0|T_{-1})$) magnitudes (white bars). The observed behaviour approached matching only for the game in a), when rats were choosing between 6 and 4 food pellets, i.e., after a reward or sucker trial. Note that this analysis is not possible for the game in which rats played against a pseudo-random stooge, because all transitions between outcomes were possible; rats therefore had to choose not only between rewards or punishments of different magnitudes, but also between rewards and punishments (in this case the outcomes are not comparable, so matching does not apply).


The authors wish to thank Filipa Vala, Iris Vilares and Patrício Simões for contributions in the initial steps of this project, Luis Miguel Fareleira and Frederico Moncada for designing the T-mazes, and Zachary Mainen, Rui Costa and Sara Magalhães, the editor and an anonymous reviewer for thorough comments on the manuscript.