
## Abstract

Reciprocity toward a partner’s cooperation is a fundamental behavioral strategy underlying human cooperation, not only in interactions with familiar persons but also in interactions with strangers. However, strategies that take into account one’s own previous action as well as the partner’s, such as win-stay lose-shift or variants of reinforcement learning, have also been considered advantageous. This study empirically investigated how well different behavioral models explain the variation in cooperative behavior among people. To this end, we considered games involving either direct reciprocity (an iterated prisoner’s dilemma) or generalized reciprocity (a gift-giving game). Multilevel models incorporating inter-individual behavioral differences were fitted to the experimental data using Bayesian inference. In both games, a model that considers both one’s own and one’s partner’s previous actions fit the empirical data better than the other models. In the direct reciprocity game, mutual cooperation increased and mutual defection decreased subsequent cooperation, beyond what the partner’s previous action alone would predict. In the generalized reciprocity game, by contrast, the effects of mutual cooperation and mutual defection on subsequent cooperation were weaker.

**Citation:** Horita Y (2020) Greater effects of mutual cooperation and defection on subsequent cooperation in direct reciprocity games than generalized reciprocity games: Behavioral experiments and analysis using multilevel models. PLoS ONE 15(11): e0242607. https://doi.org/10.1371/journal.pone.0242607

**Editor:** Valerio Capraro, Middlesex University, UNITED KINGDOM

**Received:** August 26, 2020; **Accepted:** November 6, 2020; **Published:** November 19, 2020

**Copyright:** © 2020 Yutaka Horita. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** Data, code for analysis, and instructions for the experiments (translated into English) are available at the Open Science Framework: https://osf.io/5aqkh/.

**Funding:** YH was supported by JSPS KAKENHI (Grant No. JP18K13276).

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Humans cooperate with other people, even with strangers and non-relatives, and this cooperation has made large-scale societies possible. Cooperation is defined as behavior that sacrifices personal interests to provide benefits to others. In evolutionary game theory, the prisoner’s dilemma (PD) is the standard model for examining the evolution of cooperation. In the PD, each of two players decides either to cooperate (C) or to defect (D). If both players cooperate, each receives a reward *R*. If both defect, each receives a punishment *P*. If one cooperates and the other defects (CD), the cooperator receives the payoff *S* and the defector the payoff *T*. The payoffs satisfy the inequalities *T* > *R* > *P* > *S* (and 2*R* > *T* + *S*). In the absence of additional mechanisms, natural selection favors defection, because cooperation is costly to the individual, whereas defection yields an immediate benefit. However, evolutionary game theory has identified several conditions under which cooperative strategies can compete with non-cooperative ones [1–5]. In particular, reciprocity, a behavioral rule that depends on a counterpart’s previous action, is considered a key mechanism underlying the evolution of cooperation among humans and other animals.
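To make the dilemma concrete, the short Python sketch below checks the PD inequalities and shows that, in a one-shot game, defection is the best reply whatever the partner does. The payoff values *T* = 5, *R* = 3, *P* = 1, *S* = 0 are illustrative assumptions, not values used in this study.

```python
# Sketch: one-shot prisoner's dilemma with illustrative payoffs (not from the experiment).

def is_prisoners_dilemma(T, R, P, S):
    """Check the PD conditions T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S

def payoff(me, partner, T, R, P, S):
    """Payoff to 'me' for actions 'C' (cooperate) or 'D' (defect)."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(me, partner)]

def best_reply(partner, T, R, P, S):
    """Action that maximizes my payoff against a fixed partner action."""
    return max(("C", "D"), key=lambda a: payoff(a, partner, T, R, P, S))

T, R, P, S = 5, 3, 1, 0
print(is_prisoners_dilemma(T, R, P, S))                          # True
print(best_reply("C", T, R, P, S), best_reply("D", T, R, P, S))  # D D
# Yet mutual cooperation pays more than mutual defection: R > P.
print(payoff("C", "C", T, R, P, S) > payoff("D", "D", T, R, P, S))  # True
```

Defection is the best reply in both cases, yet both players would earn more under mutual cooperation than under mutual defection; this gap is what makes the game a dilemma.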

Repeated interaction with the same opponent is one known mechanism that facilitates cooperation; it is referred to as “direct reciprocity” or “reciprocal altruism” [6–8]. If the probability that the same individuals interact again is high, a reciprocal strategy can be payoff-maximizing. In the repeated simultaneous PD, tit-for-tat (TFT), which cooperates in the first round and thereafter copies the opponent’s previous action, is a well-known successful strategy [6, 7]. TFT establishes mutual cooperation with other cooperative strategies while avoiding exploitation by unconditional defectors.
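A minimal Python sketch of TFT in an error-free iterated PD illustrates both properties: two TFT players cooperate throughout, and an unconditional defector can exploit TFT only in the first round. The strategy and function names are illustrative.

```python
# Sketch: deterministic iterated PD without errors. Actions: 1 = C, 0 = D.

def tit_for_tat(own_history, partner_history):
    """Cooperate in the first round, then copy the partner's previous action."""
    return 1 if not partner_history else partner_history[-1]

def always_defect(own_history, partner_history):
    """Unconditional defector."""
    return 0

def play(strategy_x, strategy_y, rounds):
    """Play two strategies against each other; return both action sequences."""
    x, y = [], []
    for _ in range(rounds):
        ax = strategy_x(x, y)  # both players decide simultaneously
        ay = strategy_y(y, x)  # from the same history
        x.append(ax)
        y.append(ay)
    return x, y

print(play(tit_for_tat, tit_for_tat, 5))    # ([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])
print(play(tit_for_tat, always_defect, 5))  # ([1, 0, 0, 0, 0], [0, 0, 0, 0, 0])
```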

The win-stay lose-shift (WSLS) strategy, also referred to as the Pavlov strategy, is one of the most successful strategies in the simultaneous PD [9]. WSLS is a variant of reinforcement learning: the focal player repeats the previous action if it yielded a high payoff and switches to the other action if it yielded a low payoff. In the PD, WSLS takes into account both the opponent’s and one’s own previous action; it cooperates after mutual cooperation (CC) or mutual defection (DD) and defects after exploiting the partner (DC) or being exploited (CD). Theoretically, WSLS can outperform TFT in the iterated simultaneous PD when errors can occur [9]. For instance, a player may misperceive that his/her opponent has defected even though the opponent has in fact cooperated. In such a situation, TFT can easily fall into mutual defection. The advantage of WSLS over TFT stems from two properties. First, WSLS corrects errors: after DD, it switches back to cooperation, whereas TFT keeps defecting. Second, unlike TFT, WSLS can exploit unconditional cooperators.
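The error-correction argument can be illustrated with a small deterministic simulation. It rests on simplifying assumptions that are not part of the experiment: both players use the same rule and cooperate in round 0, and exactly one error occurs, in which player X misperceives Y’s previous action in round 2. With TFT, the single error starts a cycle of alternating retaliation; with WSLS, the pair passes through one round of mutual defection and then returns to mutual cooperation.

```python
# Sketch: one perception error in an otherwise deterministic iterated PD.
# Actions: 1 = C, 0 = D. A rule maps (own previous, partner previous as seen) -> action.

def tft_rule(own_prev, partner_prev_seen):
    return partner_prev_seen  # copy the partner

def wsls_rule(own_prev, partner_prev_seen):
    # Cooperate after CC or DD, defect after DC or CD.
    return 1 if own_prev == partner_prev_seen else 0

def simulate(rule, rounds, error_round):
    """Both players use `rule` and cooperate in round 0.
    In round `error_round` only, player X sees Y's previous action flipped."""
    x, y = [1], [1]
    for t in range(1, rounds):
        seen_y = 1 - y[-1] if t == error_round else y[-1]
        ax = rule(x[-1], seen_y)
        ay = rule(y[-1], x[-1])  # Y always perceives X correctly
        x.append(ax)
        y.append(ay)
    return x, y

print(simulate(tft_rule, 8, error_round=2))
# ([1, 1, 0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 1, 0, 1, 0])  alternating retaliation
print(simulate(wsls_rule, 8, error_round=2))
# ([1, 1, 0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 1, 1, 1, 1])  back to mutual cooperation
```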

Although direct reciprocity can explain how cooperation emerges during repeated interactions between two persons, reciprocal cooperation beyond repeated relationships is also widely observed in human society. Such behavior can be described by a simple rule: “If I receive help from someone, I will help another person.” This form of reciprocity is referred to as “generalized reciprocity” (or “upstream reciprocity”). Some empirical studies have shown that reciprocal cooperation can occur through generalized reciprocity [10–17], whereas others have suggested that cooperation based on generalized reciprocity is unstable and weak [17, 18]. Theoretical research has demonstrated that cooperation based on generalized reciprocity can be established only under strict conditions, such as a small population size [19, 20]. It also suggests that a cognitively simple strategy, such as a WSLS-like one, can perform better and sustain cooperation under generalized reciprocity [20, 21].

In addition to theoretical work, empirical studies have emphasized the important role of reciprocity toward others’ cooperation [22, 23]. However, recent empirical studies have also pointed to the role of other behavioral models in predicting human behavior in experiments. Models that focus on one’s own payoffs, such as reinforcement learning, can account for human behavioral patterns in social dilemma experiments in which many individuals decide whether or not to cooperate for their group [24–26]. Behavioral rules that consider one’s own previous action in addition to the counterpart’s have been observed in several social dilemmas [26–31]. In the iterated PD, a certain proportion of participants have been found to use WSLS-like strategies [32, 33]. In generalized reciprocity situations, some empirical studies have suggested that people behave reciprocally [12–16]. To the best of our knowledge, however, it remains unclear whether WSLS-like strategies play an important role in explaining actual human behavior in experimental settings of generalized reciprocity.

In this study, we investigate whether a comprehensive model can capture the variety of individuals’ cooperative behaviors, both in interactions with the same person and in interactions with strangers. We conducted two types of experimental games: a direct and a generalized reciprocity game. In both games, each participant repeatedly decided whether to donate money (cooperate) or not (defect), to the same person in the direct reciprocity game and to different persons in the generalized reciprocity game. We fitted several models that predict the probability of cooperation and compared their goodness of fit.

We compared the predictive accuracy of the models using the widely applicable information criterion (WAIC) [34]. Complex models with many parameters fit the observed data more closely than simple models, but they risk overfitting, which reduces predictive accuracy for new data. A model that uses both one’s own and one’s partner’s previous actions as predictor variables can describe both TFT-like and WSLS-like behavior. However, if simple reciprocity toward the partner’s previous action (e.g., TFT) is sufficient to explain the various patterns of human cooperation, parameters that depend on one’s own action are redundant. Following previous studies [26–31], we examined whether a strategy that depends on the combination of one’s own and one’s partner’s previous actions explains the experimental data better in both direct and generalized reciprocity situations.

As discussed above, various cooperative strategies have been proposed, such as TFT and WSLS (or reinforcement learning). However, as a series of previous experimental studies has shown [22, 35–37], the tendency to cooperate differs among individuals. Therefore, we fitted multilevel models that account for behavioral variation among individuals to the empirical data using Bayesian inference with Markov chain Monte Carlo (MCMC) simulations. A multilevel model assumes that parameter values vary among individuals and are drawn from a group-level population distribution. Bayesian inference with multilevel models estimates “posterior distributions” of the parameters (i.e., probability distributions of parameter values given the data, from which intervals can be derived) both for the group-level population and for each individual. Bayesian MCMC simulations make it possible to fit complex multilevel models with multiple varying effects to experimental data. Accounting for inter-individual differences in cooperative tendency, we investigate whether one’s own and one’s partner’s previous behaviors are important in explaining the experimental data.

We expected the behavioral patterns in the generalized reciprocity game to differ from those in the direct reciprocity game. In the direct reciprocity game, costly cooperation can be rewarded immediately by the opponent’s cooperation, and achieving mutual cooperation increases one’s future earnings. In the generalized reciprocity game, however, no such immediate return on cooperation can be expected. Therefore, the importance that players attach to achieving mutual cooperation may differ between the two games. Nevertheless, as described above, theoretical models that consider one’s own and others’ actions, such as WSLS-like strategies, have been proposed to explain cooperation under generalized reciprocity [20, 21]. To investigate whether behavioral patterns that depend on one’s own and one’s partner’s actions differ between the two types of dyadic interaction (i.e., interaction with the same individual or with strangers), we used a model that considers one’s own and one’s partner’s previous actions as a candidate model for both the direct and generalized reciprocity games.

## Materials and methods

### Experiments

In this study, 40 undergraduate students (21 women and 19 men; mean age = 20.00 years, SD = 1.66) participated in the direct reciprocity game, and a different group of 40 undergraduate students (17 women and 23 men; mean age = 19.95 years, SD = 1.48) participated in the generalized reciprocity game. Participants were recruited from a participant pool via e-mail and received monetary rewards for their participation.

The study was approved by the ethics committee of Teikyo University, Tokyo, Japan. All participants signed an informed consent form before the experiment.

#### Direct reciprocity game.

Four or six participants played an iterated PD game in each experimental session, and eight sessions were conducted in total. Each participant in a session was paired with one of the other participants. In each round, both participants in a pair received 20 yen (approximately 18 US cents) from the experimenter as an endowment and decided either to give the money to their partner (cooperate) or to keep it (defect). If a participant gave the money away, he/she lost it, and the partner received double the amount (40 yen). If the participant kept the endowment, the partner received nothing. Pairs remained fixed until the end of the game (Fig 1A). Before each decision, participants were reminded whether or not their partner had given money to them in the previous round (S1 Fig). Each participant made this decision 42 times in total.
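Under our reading of these rules (per-round earnings only, ignoring any participation fee), the donation game maps onto the PD payoffs *T*, *R*, *P*, and *S* as computed in the Python sketch below; the function name and constants are illustrative.

```python
# Per-round payoffs of the direct reciprocity (donation) game described above.
# Assumption: earnings per round = endowment kept + doubled donation received.

ENDOWMENT = 20   # yen per decision (from the text)
MULTIPLIER = 2   # a donation is doubled for the receiver (from the text)

def round_payoff(my_action, partner_action, endowment=ENDOWMENT, multiplier=MULTIPLIER):
    """Yen earned in one round; actions: 1 = give (C), 0 = keep (D)."""
    kept = 0 if my_action == 1 else endowment
    received = multiplier * endowment if partner_action == 1 else 0
    return kept + received

R = round_payoff(1, 1)  # mutual cooperation
T = round_payoff(0, 1)  # defect against a cooperator
S = round_payoff(1, 0)  # cooperate against a defector
P = round_payoff(0, 0)  # mutual defection

print(T, R, P, S)                        # 60 40 20 0
print(T > R > P > S and 2 * R > T + S)   # True: the game is a PD
```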

(A) Direct reciprocity game. Two players (players X and Y) are paired, and each player repeatedly decides whether or not to cooperate with his/her partner. After each decision, each player is informed whether or not the partner has cooperated. (B) Generalized reciprocity game. In a group of five players, each player sequentially decides whether or not to cooperate with his/her neighbor. Players V, W, X, Y, and Z submit their decisions to their downstream neighbors in this order. After each decision, each player is informed whether or not his/her upstream neighbor has cooperated.

#### Generalized reciprocity game.

Five participants played a gift-giving game in each experimental session, and eight sessions were conducted in total. In each decision, one participant acted as a donor and another as the recipient. The donor received 20 yen from the experimenter as an endowment and decided either to give the money to the recipient (cooperate) or to keep it (defect). If the donor gave the money away, he/she lost it, and the recipient received double the amount (40 yen). If the donor kept the endowment, the recipient received nothing. The donor’s decision was then relayed to the recipient.

In the generalized reciprocity game, all participants took part in a chain of decisions and submitted their decisions sequentially, as depicted schematically in Fig 1B. Players V, W, X, Y, and Z submitted their decisions in this order. Each participant submitted two decisions in a single chain of decisions. For instance, player V first made a decision as a donor, and player W was then informed of player V’s decision. Next, player W decided whether or not to give money to player X. In a similar manner, players Y and Z also made their decisions sequentially. After player Y made a decision toward player V, one rotation of the chain ended, and the five players made decisions again in the same order. Seven rotations were run for each chain, and three independent chains of decisions were run simultaneously (see S2 Fig). Therefore, each participant submitted 42 decisions in total (= 2 decisions × 7 rotations × 3 chains).

#### Procedure.

Upon arrival, participants were escorted to the laboratory, where tablet computers had been set up on desks, and each participant sat in front of a tablet computer. Partitions between participants prevented them from seeing each other’s faces or displays while they played the game or answered questionnaires.

After all participants had arrived in the laboratory, the experimenter explained the rules of either the direct or the generalized reciprocity game. The experimental procedure was presented using PowerPoint slides with audio narration. Participants also received written instruction sheets, which they kept on their desks during the experiment. After the instructions, participants answered questions to confirm their understanding of the rules. The game began after all participants had answered the questions correctly. The game was conducted on tablet computers connected via a Wi-Fi network, and the game programs were implemented in z-Tree [38]. Sample decision screens displayed to participants during the game are presented in S1 Fig.

To ensure anonymity, each participant’s decisions were recorded under a randomly assigned number. In addition, each participant was assigned a computer-generated random three-letter pseudonym. When a participant made a decision, the pseudonyms of the other participants were displayed on his/her screen (S1 Fig). In the direct reciprocity game, pseudonyms never changed, so the partner was displayed under the same pseudonym throughout the game. Conversely, in the generalized reciprocity game, each participant was assigned a new pseudonym after every decision; no pseudonym was reused during the game, so participants saw different pseudonyms in every decision round.

When a participant submitted a decision, the previous decisions of both the participant and his/her partner were displayed on the computer screen (see S1 Fig for an example). Participants were informed of their opponent’s previous decision in the direct reciprocity game and of their upstream neighbor’s decision in the generalized reciprocity game. Participants were also told that errors could occur; that is, the opposite of the partner’s actual decision was occasionally displayed. This procedure was used to create a situation in which WSLS theoretically performs better than TFT [9] and to obtain as many observations as possible of how participants reacted to their partner’s cooperation or defection. When an error occurred, a participant was told that the partner had not given money even though the partner had, or vice versa. The probability of an error was set to 25%; this value was not revealed to participants.
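The display error changes what participants see: even a partner who always cooperates appears to defect in about a quarter of rounds. The Python sketch below assumes that each displayed action is flipped independently with probability 0.25; the function names are illustrative and not taken from the z-Tree implementation.

```python
import random

ERROR_RATE = 0.25  # from the text; participants were not told this value

def displayed_action(true_action, rng, error_rate=ERROR_RATE):
    """Partner action shown on screen (1 = C, 0 = D); flipped with prob. error_rate."""
    return 1 - true_action if rng.random() < error_rate else true_action

def p_displayed_cooperation(true_coop_rate, error_rate=ERROR_RATE):
    """Probability that 'cooperated' is displayed, given the partner's true cooperation rate."""
    return true_coop_rate * (1 - error_rate) + (1 - true_coop_rate) * error_rate

rng = random.Random(1)
shown = [displayed_action(1, rng) for _ in range(10_000)]  # partner always cooperates
print(p_displayed_cooperation(1.0))   # 0.75
print(sum(shown) / len(shown))        # close to 0.75
```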

After completing the game, each participant was paid individually according to the earnings acquired during the game. On average, the participants who played the direct reciprocity game received 1,340 yen (approximately 12.2 US dollars), whereas those who played the generalized reciprocity game received 1,265 yen. Each experiment took approximately an hour.

### Model fitting

#### Multilevel models.

In this research, four models that predict the probability of cooperation were fitted to the experimental data. Here *p* is the probability of cooperation; *y* is a binary value that denotes the decision of a participant (0 = defection; 1 = cooperation). Each decision is assumed to obey a Bernoulli distribution with probability *p*:
$$y_{i,t} \sim \mathrm{Bernoulli}(p_{i,t}) \tag{1}$$

The models assume a multilevel structure of parameters, which means that parameter values vary across players. A parameter of player *i* that affects the response variable (i.e., the player’s decision) is denoted by *m*_{i}. The parameter *m*_{i} is assumed to be drawn from a normal distribution with mean *μ* and standard deviation *σ*:
$$\begin{aligned} m_i &= \mu + \sigma z_i, \\ z_i &\sim \mathrm{Normal}(0, 1) \end{aligned} \tag{2}$$
where *z*_{i} is a standardized score (z-score) of the parameter for each individual, which follows a normal distribution with mean 0 and standard deviation 1. The quantities *μ* and *σ* are hyperparameters that determine the parameters of each player (*m*_{i}), and *m*_{i} is referred to as a varying effect. Here, *μ* represents the group-level effect on the response variable. In the multilevel model, each individual’s parameter is pulled toward the group-level mean (i.e., the hyperparameter *μ*). This statistical phenomenon, called “shrinkage,” prevents individual parameter estimates from becoming outliers. The details of estimating these parameters are explained in S1 Text.
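The Python sketch below shows the non-centered form *m*_{i} = *μ* + *σz*_{i} described above and, separately, a simplified illustration of shrinkage. The shrinkage part is an assumption made for illustration only: it uses a normal-normal model with *μ* and *σ* treated as known, whereas the models in this study use a Bernoulli likelihood and estimate *μ* and *σ* jointly by MCMC (S1 Text). All numbers are hypothetical.

```python
import numpy as np

def varying_effects(mu, sigma, z):
    """Non-centered varying effects: m_i = mu + sigma * z_i, with z_i ~ Normal(0, 1)."""
    return mu + sigma * np.asarray(z, dtype=float)

def shrunken_mean(x, se, mu, sigma):
    """Posterior mean of m_i in a normal-normal model with KNOWN mu and sigma.

    x  : estimate of m_i from that player's data alone
    se : its standard error (a larger se means more shrinkage toward mu)
    """
    w = (1.0 / se**2) / (1.0 / se**2 + 1.0 / sigma**2)  # weight on the player's own data
    return w * x + (1.0 - w) * mu

rng = np.random.default_rng(0)
m = varying_effects(mu=0.5, sigma=1.0, z=rng.standard_normal(40))  # 40 hypothetical players
print(m.shape)                                         # (40,)
print(shrunken_mean(2.0, se=1.0, mu=0.0, sigma=1.0))   # 1.0: noisy estimate pulled halfway
print(shrunken_mean(2.0, se=0.1, mu=0.0, sigma=1.0))   # ~1.98: precise estimate barely moves
```

Sampling the standardized *z*_{i} rather than *m*_{i} directly is a common way to make Hamiltonian Monte Carlo sample multilevel models more efficiently.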

#### Partner’s action model.

The partner’s action (PA) model represents reciprocity toward the partner’s previous action. The model can be formulated as follows:
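$$\begin{aligned} p_{i,t} &= \frac{1}{1 + \exp(-q_{i,t})}, \\ q_{i,t} &= \begin{cases} \alpha_{1,i} + \alpha_{2,i} P_{i,t-1} & \text{if } P_{i,t-1} \text{ is displayed}, \\ v & \text{otherwise} \end{cases} \end{aligned} \tag{3}$$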
where *P*_{i,t−1} denotes the action of player *i*’s partner (i.e., the opponent in the direct reciprocity game or the upstream neighbor in the generalized one) in round *t*−1, as displayed to player *i* (1 = cooperation; 0 = defection). Because of the implemented errors, the displayed action occasionally differs from the partner’s actual action. Here, *α*_{1,i}, *α*_{2,i}, and *v* are parameters that can take any value from −∞ to ∞. The first line of Eq (3) is the inverse logit function, which maps a real-valued combination of the parameters to a probability between 0 and 1. The other models below likewise use the inverse logit function to estimate the probability of cooperation.

*α*_{1,i} and *α*_{2,i} are the intercept and slope coefficients for cooperation, respectively, and *v* denotes the cooperative tendency when the player is not informed of his/her partner’s decision (i.e., decisions in the first round of the direct reciprocity game, or decisions by participants at the start of a decision chain in the generalized reciprocity game). When *v* was modeled as a varying effect with its own hyperparameters, divergent transitions could not be eliminated for the generalized reciprocity game data, which can reduce the efficiency and reliability of posterior sampling [39]. Therefore, a single value of *v* was shared by all participants, with a normal prior (mean 0, standard deviation 10).
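For a single player, the PA model’s cooperation probabilities follow directly from the intercept, the slope, and the inverse logit, as in the Python sketch below. The parameter values are hypothetical, not estimates from this study; a positive *α*_{2,i} means the player cooperates more after the partner has (apparently) cooperated.

```python
import math

def inv_logit(x):
    """Inverse logit (logistic) function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def pa_probability(alpha1, alpha2, v, partner_prev=None):
    """PA model cooperation probability for one player.

    partner_prev: 1 if the displayed partner action was C, 0 if D,
                  None if no partner action was displayed (then v is used).
    """
    if partner_prev is None:
        return inv_logit(v)
    return inv_logit(alpha1 + alpha2 * partner_prev)

alpha1, alpha2, v = -1.0, 2.0, 0.5  # hypothetical values
print(round(pa_probability(alpha1, alpha2, v, 1), 3))  # p(C|C) = inv_logit(1.0)  -> 0.731
print(round(pa_probability(alpha1, alpha2, v, 0), 3))  # p(C|D) = inv_logit(-1.0) -> 0.269
print(round(pa_probability(alpha1, alpha2, v), 3))     # no information: inv_logit(0.5) -> 0.622
```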

#### Own and partner’s action model.

In the own and partner’s action (OPA) model, *p* depends on the combination of the focal player’s and his/her partner’s previous actions. The model can be described as follows:
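$$\begin{aligned} p_{i,t} &= \frac{1}{1 + \exp(-q_{i,t})}, \\ q_{i,t} &= \begin{cases} \beta_{1,i} O_{i,t-1} P_{i,t-1} + \beta_{2,i} (1 - O_{i,t-1}) P_{i,t-1} + \beta_{3,i} O_{i,t-1} (1 - P_{i,t-1}) + \beta_{4,i} (1 - O_{i,t-1})(1 - P_{i,t-1}) & \text{if previous actions are available}, \\ v & \text{otherwise} \end{cases} \end{aligned} \tag{4}$$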
where *O*_{i,t−1} represents player *i*’s own action in round *t*−1 (1 = cooperation; 0 = defection); *β*_{1,i}, *β*_{2,i}, *β*_{3,i}, and *β*_{4,i} are regression coefficients; and *v* is the cooperative tendency when no information about previous actions is available. As in the PA model, a single value of *v* was shared by all participants. All parameters (*β*_{1,i}, *β*_{2,i}, *β*_{3,i}, *β*_{4,i}, and *v*) can take any value from −∞ to ∞.
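Why a model needs both predictors to cover TFT and WSLS can be checked directly on the four combinations of previous actions. The Python sketch below does not depend on the exact parameterization of the OPA model; it treats the idealized strategies as probabilities of 0 and 1 (logistic models can only approach these limits) and tests whether each pattern could be expressed using the partner’s action alone (as in the PA model) or one’s own action alone (as in the OA model).

```python
# Idealized response patterns over the four previous-action combinations.
# Keys: (own previous action, partner's previous action), with 1 = C and 0 = D.
# Values: probability of cooperating in the next decision.
TFT = {(1, 1): 1.0, (0, 1): 1.0, (1, 0): 0.0, (0, 0): 0.0}   # copy the partner
WSLS = {(1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0, (0, 0): 1.0}  # C only after CC or DD

def depends_only_on_partner(pattern):
    """True if one's own action never matters (pattern expressible by a PA-type model)."""
    return all(pattern[(1, p)] == pattern[(0, p)] for p in (0, 1))

def depends_only_on_own(pattern):
    """True if the partner's action never matters (pattern expressible by an OA-type model)."""
    return all(pattern[(o, 1)] == pattern[(o, 0)] for o in (0, 1))

for name, pattern in [("TFT", TFT), ("WSLS", WSLS)]:
    print(name, depends_only_on_partner(pattern), depends_only_on_own(pattern))
# TFT True False
# WSLS False False
```

TFT fits a partner-only model, but WSLS fits neither single-predictor model, so only a model conditioned on both previous actions can represent both strategies.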

#### Own action model.

For comparison with the above two models, we fitted the own action (OA) model, which assumes that cooperation only depends on the previous action of the focal player. The model was formulated as follows:
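$$\begin{aligned} p_{i,t} &= \frac{1}{1 + \exp(-q_{i,t})}, \\ q_{i,t} &= \begin{cases} \gamma_{1,i} + \gamma_{2,i} O_{i,t-1} & \text{if } O_{i,t-1} \text{ exists}, \\ v & \text{otherwise (first decision)} \end{cases} \end{aligned} \tag{5}$$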
where *γ*_{1,i}, *γ*_{2,i}, and *v* are parameters that can take any value from −∞ to ∞; *γ*_{1,i} and *γ*_{2,i} are the intercept and slope coefficients for cooperation, respectively; and *v* denotes the cooperative tendency when the focal player makes his/her first decision.

#### Null model.

For another comparison with the PA and OPA models, we fitted the following null model, which includes only a varying intercept for each participant.

$$p_{i,t} = \frac{1}{1 + \exp(-\varepsilon_{i})} \tag{6}$$

In this model, each participant’s probability of cooperation is determined by a single parameter *ε*_{i}, which does not depend on any predictor variable; *ε*_{i} can take any value from −∞ to ∞.
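Each model contributes a Bernoulli log-likelihood for every decision. The Python sketch below computes it for one participant’s decision sequence under the null model and the OA model with fixed, hypothetical parameter values. It deliberately omits the multilevel structure and the MCMC sampling used in this study, and it assumes the decisions are listed in the order in which the participant made them.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def bernoulli_loglik(y, p):
    """log Pr(y | p) for a single binary decision y (1 = C, 0 = D)."""
    return math.log(p) if y == 1 else math.log(1.0 - p)

def loglik_null(decisions, epsilon):
    """Null model: the same probability inv_logit(epsilon) for every decision."""
    p = inv_logit(epsilon)
    return sum(bernoulli_loglik(y, p) for y in decisions)

def loglik_oa(decisions, gamma1, gamma2, v):
    """OA model: the first decision uses v; later ones use gamma1 + gamma2 * own previous action."""
    total = bernoulli_loglik(decisions[0], inv_logit(v))
    for prev, y in zip(decisions[:-1], decisions[1:]):
        total += bernoulli_loglik(y, inv_logit(gamma1 + gamma2 * prev))
    return total

decisions = [1, 1, 0, 1]  # hypothetical decision sequence
print(round(loglik_null(decisions, 0.0), 4))          # 4 * log(0.5) = -2.7726
print(round(loglik_oa(decisions, 0.0, 0.0, 0.0), 4))  # same: every probability is 0.5
```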

#### MCMC simulations.

For each model, the posterior distributions of the parameters were inferred through MCMC simulations with four independent Markov chains of 5,000 iterations each. The first 2,000 iterations of each chain were discarded as warm-up, leaving 12,000 MCMC samples in total (4 chains × 3,000 iterations). $\hat{R}$ values [40] were used to evaluate the convergence of the MCMC simulations, and we checked that all parameters in each model had converged (i.e., that their $\hat{R}$ values were close to 1.00). The MCMC simulations were implemented in Stan using the rstan package (version 2.19.3) in R 3.6.3 [41, 42]. Stan uses a Hamiltonian Monte Carlo method for inference.
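The Python sketch below reproduces the sample-count arithmetic and computes the classic (non-split) Gelman–Rubin $\hat{R}$ from a chains × iterations array of draws. This is an illustrative assumption: the version cited as [40] and the value reported by rstan may use split chains or rank normalization, so the numbers need not match Stan’s output exactly. The draws are simulated, not taken from this study.

```python
import numpy as np

CHAINS, ITERATIONS, WARMUP = 4, 5000, 2000
print(CHAINS * (ITERATIONS - WARMUP))  # 12000 post-warm-up draws

def gelman_rubin_rhat(draws):
    """Classic potential scale reduction factor for one parameter.

    draws: array of shape (chains, iterations), warm-up already removed.
    """
    draws = np.asarray(draws, dtype=float)
    m, n = draws.shape
    chain_means = draws.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = draws.var(axis=1, ddof=1).mean()   # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled estimate of the posterior variance
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(1)
mixed = rng.standard_normal((CHAINS, ITERATIONS - WARMUP))  # chains agree
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])      # one chain offset
print(round(gelman_rubin_rhat(mixed), 3))  # close to 1.00
print(round(gelman_rubin_rhat(stuck), 3))  # well above 1.00
```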

#### Model comparison.

We used WAIC to evaluate the goodness of fit of each model. WAIC estimates the out-of-sample deviance (i.e., predictive accuracy for new data) by adjusting the in-sample deviance for overfitting to the observed data. WAIC is computed as follows [34, 43, 44]:
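$$\begin{aligned} \mathrm{WAIC} &= -2\left(\mathrm{lppd} - p_{\mathrm{WAIC}}\right), \\ \mathrm{lppd} &= \sum_{n=1}^{N} \log\left(\frac{1}{S}\sum_{s=1}^{S} \Pr(y_n \mid \Theta_s)\right), \\ p_{\mathrm{WAIC}} &= \sum_{n=1}^{N} \operatorname{Var}_{s=1}^{S}\left[\log \Pr(y_n \mid \Theta_s)\right] \end{aligned} \tag{7}$$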
where *n*, *N*, *s*, and *S* denote an observation (data point), the total number of observations, an MCMC sample, and the total number of MCMC samples, respectively. Pr(*y*_{n}|*Θ*_{s}) is the likelihood, that is, the probability of observation *y*_{n} given the set of parameters *Θ*_{s} inferred in sample *s*. The lppd (log pointwise predictive density) measures predictive accuracy: the likelihood of each observation *n* is averaged over samples, and the logarithms of these averages are summed over observations. *p*_{WAIC} is a penalty term reflecting the variance of the predictions: the variance of the log likelihood across samples is computed for each observation *n* and then summed over observations. The model with the smallest WAIC is expected to predict the data best.
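Given an *S* × *N* matrix of pointwise log-likelihoods (one row per MCMC sample), the lppd, *p*_{WAIC}, and WAIC defined above can be computed as in the Python sketch below. The example data and posterior draws are hypothetical; in practice the matrix would come from the fitted Stan model.

```python
import numpy as np

def waic(log_lik):
    """Return (WAIC, lppd, p_WAIC) from an (S x N) matrix of log Pr(y_n | Theta_s)."""
    log_lik = np.asarray(log_lik, dtype=float)
    S = log_lik.shape[0]
    # lppd: log of the likelihood averaged over samples, summed over observations
    # (the log-sum-exp trick avoids underflow when likelihoods are tiny).
    max_ll = log_lik.max(axis=0)
    lppd = np.sum(max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(S))
    # p_WAIC: sample variance of the log likelihood for each observation, summed.
    p_waic = np.sum(log_lik.var(axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), lppd, p_waic

# Hypothetical example: posterior draws of one cooperation probability, six decisions.
rng = np.random.default_rng(2)
y = np.array([1, 1, 0, 1, 0, 1])
p_draws = rng.beta(5, 3, size=4000)
log_lik = np.where(y == 1, np.log(p_draws)[:, None], np.log1p(-p_draws)[:, None])
w, lppd, p_waic = waic(log_lik)
print(f"WAIC = {w:.2f}, lppd = {lppd:.2f}, p_WAIC = {p_waic:.2f}")
```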

## Results

### Behavioral results

Fig 2 shows the distributions of participants’ cooperation probabilities conditioned on the partner’s previous decision, denoted by *p*(C|C) and *p*(C|D). Fig 3 shows the distributions of cooperation probabilities conditioned on both one’s own and the partner’s previous decisions, denoted by *p*(C|CC), *p*(C|DC), *p*(C|CD), and *p*(C|DD). Figs 2 and 3 also present the fraction of cooperation averaged over all participants (open triangles) and the empirical cooperation probabilities calculated for each participant (open circles). S2 Text details the methods used to calculate these probabilities. As Fig 2 indicates, in the direct reciprocity game, the average fraction of cooperation after the partner has cooperated, *p*(C|C), is higher than chance level (50%), whereas in the generalized reciprocity game it is near chance level. Fig 3 shows that in both games, the average fraction of cooperation after both the focal player and his/her partner have cooperated, *p*(C|CC), is higher than in the other three cases.
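As a rough illustration of how such empirical probabilities can be obtained, the Python sketch below counts, for one participant, how often he/she cooperated after each combination of previous actions. It is a plain frequency count under the assumption that decisions are listed in the order made and that the displayed partner decision at step *t* is the one shown before decision *t* + 1; S2 Text describes the procedure actually used, which may differ, particularly for the decision chains of the generalized reciprocity game. The data are hypothetical.

```python
from collections import Counter

def label(action):
    return "C" if action == 1 else "D"

def conditional_cooperation(own, shown_partner):
    """Empirical p(C | own previous, partner previous) for one participant.

    own[t]          : participant's decision t (1 = C, 0 = D)
    shown_partner[t]: partner decision displayed to the participant after decision t
    Keys follow the figure labels: first letter = own action, second = partner's.
    """
    n, coop = Counter(), Counter()
    for t in range(1, len(own)):
        key = label(own[t - 1]) + label(shown_partner[t - 1])
        n[key] += 1
        coop[key] += own[t]
    return {key: coop[key] / n[key] for key in n}

def partner_conditional_cooperation(own, shown_partner):
    """Empirical p(C | partner previous), ignoring the participant's own action."""
    n, coop = Counter(), Counter()
    for t in range(1, len(own)):
        key = label(shown_partner[t - 1])
        n[key] += 1
        coop[key] += own[t]
    return {key: coop[key] / n[key] for key in n}

own = [1, 1, 0, 0, 1]            # hypothetical decisions
shown_partner = [1, 0, 0, 1, 1]  # hypothetical displayed partner decisions
print(conditional_cooperation(own, shown_partner))
print(partner_conditional_cooperation(own, shown_partner))
```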

(A) Direct reciprocity game. (B) Generalized reciprocity game. Boxplots show participants’ empirical cooperation probabilities *p*(C|C) and *p*(C|D). The points, boxes, thick lines in the boxes, and whiskers represent individual participants, the interquartile range (IQR), the median, and the range up to 1.5 × IQR, respectively. The open triangles represent the overall fraction of cooperation averaged over all participants (error bars: 95% confidence intervals, ±1.96 × standard error). The filled circles and bars to the right of each boxplot indicate the predicted distributions of the group-level *p*(C|C) and *p*(C|D) inferred from the partner’s action (PA) model; each filled circle and bar represent the median and the 95% compatibility interval of the predicted distribution, respectively. Labels on the horizontal axis indicate the partner’s decision in the previous round: C, the partner cooperated; D, the partner defected.

(A) Direct reciprocity game. (B) Generalized reciprocity game. Boxplots show participants’ empirical cooperation probabilities *p*(C|CC), *p*(C|DC), *p*(C|CD), and *p*(C|DD). The points, boxes, thick lines in the boxes, and whiskers represent individual participants, the IQR, the median, and the range up to 1.5 × IQR, respectively. The open triangles represent the overall fraction of cooperation averaged over participants (error bars: 95% confidence intervals, ±1.96 × standard error). The filled circles and bars to the right of each boxplot indicate the predicted distributions of the group-level cooperation probabilities inferred from the OPA model; each filled circle and bar represent the median and the 95% compatibility interval of the predicted distribution, respectively. Labels on the horizontal axis indicate the participant’s and his/her partner’s decisions in the previous round: CC, both players cooperated; DC, the participant defected while the partner cooperated; CD, the participant cooperated while the partner defected; DD, both players defected.

### Model comparison

Table 1 presents the WAIC values for each model; the smaller a model’s WAIC, the better its predictive performance. For each model, Table 1 also reports *p*_{WAIC}, the standard error (SE) of the WAIC, the difference in WAIC between the model and the best model (dWAIC), the standard error of the dWAIC (dSE), and the model weight. The weights convert the WAIC differences into relative support for each model among those considered, summing to 1 across models; the weight of model *k* (*w*_{k}) is calculated as follows [44]:
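$$w_k = \frac{\exp\left(-\tfrac{1}{2}\,\mathrm{dWAIC}_k\right)}{\sum_{j} \exp\left(-\tfrac{1}{2}\,\mathrm{dWAIC}_j\right)} \tag{8}$$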
where dWAIC_{k} represents the dWAIC of model *k*.
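The Python sketch below computes model weights from a set of WAIC values, assuming the standard form in which each model’s weight is proportional to exp(−dWAIC/2) and the weights are normalized to sum to 1. The WAIC values in the example are hypothetical and are not the values in Table 1.

```python
import numpy as np

def waic_weights(waic_values):
    """Model weights proportional to exp(-dWAIC / 2), normalized to sum to 1."""
    waic_values = np.asarray(waic_values, dtype=float)
    d = waic_values - waic_values.min()  # dWAIC: difference from the best model
    rel = np.exp(-0.5 * d)
    return rel / rel.sum()

# Hypothetical WAIC values for four models (NOT the values reported in Table 1)
weights = waic_weights([1000.0, 1002.0, 1010.0, 1050.0])
print(np.round(weights, 3))  # the best (smallest-WAIC) model receives the largest weight
```

A difference of 2 WAIC units already moves most of the weight to the better model, so weights close to 1 indicate clear, but not certain, predictive superiority within the set of candidates.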

Table 1 shows that the OPA model has the smallest WAIC and the highest weight among all the models considered, for both the direct and generalized reciprocity games. Thus, the OPA model best predicts the experimental data regardless of game type. In both games, the PA model ranked second and the OA model third, and the null model had a larger WAIC than the other three models. Notably, in both games, the OPA model predicts the data better than the PA model even though its additional parameters increase the risk of overfitting.

### Group-level cooperation probabilities inferred from the PA and OPA models

To check whether each model predicts the group-level cooperation probabilities well, we inferred the predicted distributions of these probabilities from the PA and OPA models. From the PA model, we inferred the group-level probability of cooperation after the partner’s cooperation and that after the partner’s defection (the group-level counterparts of *p*(C|C) and *p*(C|D)). Similarly, from the OPA model, we inferred the group-level cooperation probabilities conditioned on the combinations of one’s own and the partner’s previous actions (the group-level counterparts of *p*(C|CC), *p*(C|DC), *p*(C|CD), and *p*(C|DD); see S2 Text for details). These predicted distributions are also shown in Figs 2 and 3 (filled circles and bars).
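One simple way to turn posterior draws into such a predicted distribution is to transform each draw of a group-level mean on the logit scale with the inverse logit and then summarize the resulting draws by their median and central 95% interval, as in the Python sketch below. This is an assumption for illustration; S2 Text describes the procedure actually used, and the draws here are simulated.

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def summarize(draws):
    """Median and central 95% interval of posterior draws."""
    lo, med, hi = np.quantile(draws, [0.025, 0.5, 0.975])
    return med, (lo, hi)

# Hypothetical draws of a group-level mean on the logit scale (12,000 draws)
rng = np.random.default_rng(3)
mu_cc = rng.normal(loc=1.2, scale=0.3, size=12_000)
p_cc = inv_logit(mu_cc)  # transform each draw into a probability
med, (lo, hi) = summarize(p_cc)
print(f"median {med:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

Because the inverse logit is monotone, quantiles computed on the logit scale map directly onto quantiles of the probability, but the mean does not, which is one reason to transform draw by draw before summarizing.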

As Fig 2 indicates, in both games, the predicted distributions of the group-level *p*(C|C) and *p*(C|D) from the PA model overlap the empirical overall fractions of cooperation (open triangles). Thus, the PA model reproduces well the group-level cooperation probabilities conditioned on the partner’s previous action. Fig 3 likewise indicates that, in both games, the predicted distributions of the group-level *p*(C|CC), *p*(C|DC), *p*(C|CD), and *p*(C|DD) from the OPA model overlap the empirical fractions of cooperation conditioned on one’s own and the partner’s previous actions.

### Difference between group-level cooperation probabilities

To compare the cooperation probabilities with one another, Fig 4 presents the predicted distributions of the differences between the inferred group-level cooperation probabilities. Fig 4 also shows the probability that each difference is greater than 0 (namely, the shaded area of each distribution).

(A) Direct reciprocity game. (B) Generalized reciprocity game. The predicted distributions were estimated from 12,000 samples retrieved from Markov chain Monte Carlo (MCMC) simulations. The percentages shown in the upper left of each panel indicate the probability that the difference between the probabilities of cooperation is greater than 0 (i.e., the value of the percentage is equal to the shaded area of the distribution in each panel).
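The percentage in each panel is the share of posterior draws in which one probability exceeds the other. Here is a minimal sketch, assuming the two lists are paired draws taken from the same MCMC iterations. Pairing preserves the posterior correlation between the two quantities. The toy draws below are independent and hypothetical.

```python
import random

def prob_greater(draws_a, draws_b):
    """Share of paired posterior draws in which a exceeds b, i.e. P(a - b > 0)."""
    if len(draws_a) != len(draws_b):
        raise ValueError("draws must be paired (same length)")
    return sum(a > b for a, b in zip(draws_a, draws_b)) / len(draws_a)


# Hypothetical draws standing in for two cooperation probabilities
rng = random.Random(2)
a = [rng.gauss(0.8, 0.05) for _ in range(12000)]
b = [rng.gauss(0.6, 0.05) for _ in range(12000)]
print(f"P(a > b) = {prob_greater(a, b):.1%}")
```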

In the direct reciprocity game, is greater than the other three probabilities, while is lower than the other ones. Therefore, mutual cooperation enhanced the probability of cooperation in the subsequent decision more than in the other cases, whereas mutual defection suppressed it.

In contrast to the direct reciprocity game, a greater difference between and was not observed in the generalized reciprocity game. The patterns corresponding to the predicted distributions of in the generalized reciprocity game also differ from those in the direct one. These patterns suggest that mutual cooperation and mutual defection had weaker effects on enhancing or suppressing subsequent cooperation in the generalized reciprocity game than in the direct one. The cooperation probabilities after the partner has cooperated (namely, and ) are greater than those after the partner has defected (namely, and ), regardless of the participant’s own previous action.

### Individual differences in behavioral patterns among participants

We compared inter-individual variations in behavioral patterns between the two games. The OPA model inferred varying effects for each participant, which determined that participant’s cooperation probabilities (namely, *β*_{1,i}, *β*_{2,i}, *β*_{3,i}, and *β*_{4,i}). We classified each participant’s behavioral pattern according to the posterior distributions of that participant’s parameter values. A large number of patterns could be distinguished by considering all combinations of the four cooperation probabilities (namely, , , , and ) or the differences between them. To simplify the categorization, we investigated how many participants were classified into TFT-like or WSLS-like strategies according to the predicted distributions of and its differences from the other three probabilities.
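For reference, the deterministic textbook versions of the two strategies can be written as lookup tables keyed by the previous (own, partner) actions, with one's own action first. This labeling is our convention. WSLS repeats its action after a "win" (the partner cooperated, giving payoff R or T) and switches after a "loss" (the partner defected, giving P or S). TFT copies the partner's last move. The tables show that the two strategies disagree only after DC and DD.

```python
# Next action given the previous (own, partner) actions; own action written first.
TFT = {"CC": "C", "CD": "D", "DC": "C", "DD": "D"}   # copy the partner's last move
WSLS = {"CC": "C", "CD": "D", "DC": "D", "DD": "C"}  # stay after a win, shift after a loss

def tft(own_prev, partner_prev):
    return partner_prev

def wsls(own_prev, partner_prev):
    # Partner cooperated (payoff R or T): win, so repeat own previous action.
    if partner_prev == "C":
        return own_prev
    # Partner defected (payoff P or S): loss, so switch.
    return "D" if own_prev == "C" else "C"


differing = [cond for cond in TFT if TFT[cond] != WSLS[cond]]
print(differing)
```

Because most participants cooperated less after mutual defection, the DD cell does not separate the two strategies in practice, which leaves DC as the distinguishing condition.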

Table 2 summarizes the rules used to classify the participants’ behavioral types. First, we classified the participants either as those who tend to cooperate more after mutual cooperation (Types 1, 2, and 3) or as those who do not (Type 4), according to whether or not the 95% compatibility intervals for lie above 50%. Second, we classified those whose is greater into one of three types, according to the 95% compatibility intervals for the difference between and the other three probabilities: TFT-like (Type 1), WSLS-like (Type 2), or others (Type 3). Theoretically, WSLS should cooperate after mutual defection. However, most participants in fact cooperated less after mutual defection in our experiments, and several previous empirical studies have used *p*(C|DC) to distinguish TFT-like from WSLS-like strategies [32, 33]. Therefore, we distinguish a TFT-like from a WSLS-like strategy according to whether or not – is greater than 0. In S3 and S4 Figs, the predicted distributions of the cooperation probabilities and of the differences between them are shown for each participant.
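The two-step rule can be sketched in Python as follows. This is one hypothetical reading of Table 2, not the paper's exact rules. A condition counts as "greater" when the whole 95% interval of the draws lies above the threshold. We assume a participant with a high cooperation probability after CC is TFT-like or WSLS-like only if that probability exceeds the probabilities after CD and DD, with the DC comparison deciding between the two; everything else falls into Type 3. The keys "CC", "CD", "DC", "DD" (own action first) and all function names are illustrative.

```python
def interval(draws, level=0.95):
    """Central percentile interval of a list of posterior draws."""
    xs = sorted(draws)
    k = int((1 - level) / 2 * (len(xs) - 1))
    return xs[k], xs[-1 - k]

def above(draws, threshold=0.0, level=0.95):
    """True if the whole central interval lies above the threshold."""
    lo, _ = interval(draws, level)
    return lo > threshold

def classify(p):
    """Assign a behavioral type from one participant's paired posterior draws.

    p maps 'CC', 'CD', 'DC', 'DD' to equally long lists of draws of the
    cooperation probability after each (own, partner) pair.
    HYPOTHETICAL reading of Table 2, not the paper's exact rules.
    """
    if not above(p["CC"], 0.5):
        return 4  # no clear tendency to cooperate after mutual cooperation
    diff = {c: [a - b for a, b in zip(p["CC"], p[c])] for c in ("CD", "DC", "DD")}
    if above(diff["CD"]) and above(diff["DD"]):
        # TFT cooperates after DC, WSLS defects: the DC contrast separates them
        return 2 if above(diff["DC"]) else 1
    return 3
```

With real data, each list would hold one participant's 12,000 posterior draws for the corresponding condition.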

Table 3 shows the frequencies of each behavioral type in the two games. The distributions of behavioral types differed significantly between the direct and generalized reciprocity games (Fisher’s exact test: *p* < .01). The behavioral types for which is greater (namely, Types 1, 2, and 3) are observed more often in the direct reciprocity game than in the generalized one. In the generalized reciprocity game, most participants are classified into Type 4, and the participants’ behavioral patterns appear to vary more than in the direct one (S3 and S4 Figs). In both games, some participants are also classified as TFT-like. In the generalized reciprocity game, WSLS-like patterns (Type 2) and the other patterns for which is greater than the other three probabilities (Type 3) are hardly observed, in contrast to the direct reciprocity game.
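Table 3 compares two games across four types, so the test needed is the r × c generalization of Fisher's exact test. Below is a minimal pure-Python sketch for 2 × k tables. It enumerates every table with the observed row and column totals and sums the hypergeometric probabilities that do not exceed the observed table's probability, a common two-sided definition. The counts in the usage line are made up and are not the counts in Table 3. In practice, an established implementation such as R's `fisher.test` would be used.

```python
from math import comb

def fisher_exact_2xk(row1, row2):
    """Two-sided Fisher's exact test for a 2 x k table of counts."""
    cols = [a + b for a, b in zip(row1, row2)]
    n1, n = sum(row1), sum(cols)
    denom = comb(n, n1)

    def prob(first_row):
        # Multivariate hypergeometric probability of a table given its margins
        num = 1
        for a, c in zip(first_row, cols):
            num *= comb(c, a)
        return num / denom

    p_obs = prob(row1)
    total = 0.0

    def visit(j, remaining, current):
        # Enumerate every first row whose entries from column j onward sum to `remaining`
        nonlocal total
        if j == len(cols) - 1:
            if remaining <= cols[j]:
                p = prob(current + [remaining])
                if p <= p_obs * (1 + 1e-7):  # tolerance for floating-point ties
                    total += p
            return
        for a in range(min(cols[j], remaining) + 1):
            visit(j + 1, remaining - a, current + [a])

    visit(0, n1, [])
    return min(total, 1.0)


# Made-up counts of Types 1-4 in each game (not the data in Table 3)
print(f"p = {fisher_exact_2xk([9, 6, 4, 6], [5, 1, 0, 19]):.4f}")
```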

### Comparison between the multilevel model and another method

As a supplementary analysis comparing the multilevel model with another method for estimating inter-individual differences, we fitted a non-multilevel OPA model to the data: a “non-pooling” OPA model. The non-pooling OPA model inferred each parameter separately for each participant (*β*_{1,i}, *β*_{2,i}, *β*_{3,i}, and *β*_{4,i}) but did not assume that the participants’ parameters were drawn from a common normal distribution; that is, it estimated the individual-level parameters independently. For comparison with the multilevel and non-pooling OPA models, we also fitted a “pooling” OPA model, which assumed that each parameter value (*β*_{1}, *β*_{2}, *β*_{3}, and *β*_{4}) was constant across all participants; i.e., the model ignored inter-individual differences.

As shown in S3 Table, in both the direct and generalized reciprocity games, the WAIC value of the multilevel OPA model was smaller than those of the two non-multilevel OPA models; that is, the multilevel OPA model had greater predictive accuracy than both the non-pooling and the pooling OPA models. S5 Fig illustrates that the participants’ parameters inferred by the non-pooling OPA model deviated from the group-level parameters inferred by the multilevel and pooling OPA models (i.e., the model overfitted the data), and their estimation errors appeared to be large. In contrast, the parameters for each participant inferred by the multilevel OPA model lay near the group-level parameters inferred by the multilevel and pooling OPA models (i.e., shrinkage was observed).
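The contrast among the three models can be illustrated for a single condition with a deliberately simplified stand-in. Instead of the paper's Bayesian multilevel logistic model fitted by MCMC, it uses a conjugate Beta-binomial shortcut: a Beta prior centered on the pooled rate with a fixed, hypothetical strength of `prior_strength` pseudo-observations. The partial-pooling estimate is then a weighted average of a participant's own rate and the pooled rate. The weight on the pooled rate, prior_strength / (n_i + prior_strength), is largest for participants with few observations. The counts are made up.

```python
def pooling_estimates(coops, totals, prior_strength=4.0):
    """No-pooling, complete-pooling and partial-pooling cooperation rates.

    coops[i] / totals[i]: cooperations / opportunities of participant i in
    one condition (totals[i] > 0). Partial pooling uses a Beta prior centred
    on the pooled rate with a fixed, hypothetical prior_strength.
    """
    pooled = sum(coops) / sum(totals)
    no_pool = [k / n for k, n in zip(coops, totals)]
    partial = [(k + prior_strength * pooled) / (n + prior_strength)
               for k, n in zip(coops, totals)]
    return no_pool, pooled, partial


# Made-up counts: participant 0 met this condition only twice
no_pool, pooled, partial = pooling_estimates([2, 9, 1], [2, 20, 5])
for i, (a, b) in enumerate(zip(no_pool, partial)):
    print(f"participant {i}: no pooling {a:.2f} -> partial pooling {b:.2f} (pooled {pooled:.2f})")
```

In this sketch, the participant observed only twice moves two-thirds of the way toward the pooled rate, whereas the participant observed 20 times moves only one-sixth of the way.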

## Discussion

The purpose of the current study was to investigate a model that predicts human behavior in both the direct and generalized reciprocity situations. The results of a model comparison revealed that for both the direct and generalized reciprocity games, the model that takes into consideration both one’s own and the partner’s behaviors predicts the experimental data better than the model that uses only the partner’s behavior as reference. The distributions of participants’ behavioral types in each game also suggest that there are various individual strategies and that the OPA model predicts such variations in behavioral types well despite its overfitting risk.

However, the results of the analysis also suggest that people adopt different strategies depending on the type of interaction. In the direct reciprocity situation, the participants generally cooperated more after both they and their partners had cooperated, and defected more after both had defected. On the other hand, the average behavioral tendency in the generalized reciprocity game differed from that in the direct reciprocity game. In the generalized reciprocity game, the WAIC value of the OPA model was the smallest among all models, and various behavioral types that depend on both one’s own and the partner’s actions were observed. Nevertheless, the differences between probabilities and the small number of behavioral types with greater suggest that the effect of one’s own and one’s partner’s previous actions on subsequent cooperation is weak in the generalized reciprocity situation.

The group-level behavioral patterns observed in the direct reciprocity game differed from both strict TFT and strict WSLS. Even when the partner had cooperated in the previous round, our participants cooperated more often after mutual cooperation than after they themselves had defected. The classification of the participants’ behavioral types also suggests that behavioral types for which is greater (namely, Types 1, 2, and 3) were frequent in the direct reciprocity game. Although WSLS predicts that players repeat a previous action that yielded larger earnings, most participants did not tend to exploit cooperators and did not switch to cooperation after mutual defection. In social psychology, it has traditionally been argued that the motivation for cooperation in PD situations is grounded both in a player’s preference for cooperation and in the expectation that the partner will cooperate [45]. A series of experiments has shown a correlation between cooperation in social dilemmas and expectations about the opponent’s cooperation [36, 37]. These arguments suggest that people’s cooperation may be motivated by a preference for mutual cooperation rather than by a reaction to higher payoffs. The finding that mutual cooperation enhanced subsequent cooperation in the direct reciprocity game is consistent with these arguments.

However, participants in the generalized reciprocity game could not expect to achieve mutual cooperation, because they could not control their neighbor’s behavior. In contrast to the direct reciprocity game, costly cooperation in generalized reciprocity is less likely to be rewarded. Therefore, although the OPA model fit the data better than the other models, the effect of mutual cooperation on subsequent cooperation may have been weaker in the generalized reciprocity game. As in previous empirical studies [10, 11], the experimental setting of the current research, with a small population and short cycles of decision chains, may have induced relatively strong expectations that cooperation would be returned, even in the generalized reciprocity case. It has been suggested both theoretically [19, 20] and empirically [17] that cooperation based on generalized reciprocity may be relatively fragile under other conditions, such as a large group size. Behavioral models other than the OPA model may also be suitable for generalized reciprocity under experimental conditions in which the return of cooperation is less expected. It is thus necessary to investigate whether the behavioral patterns we observed in the generalized reciprocity case hold under conditions closer to real human societies, such as large groups.

An alternative method for estimating individual behavioral patterns is to fit the models to each participant separately by maximum likelihood. However, the results of such an analysis can be uncertain when the data are limited or when the number of cases is imbalanced across individuals. In behavioral experiments involving social interactions, an imbalance of decision cases among participants may occur even if the number of decision rounds is increased. For example, the number of times each participant receives help may differ among participants. In fact, our analysis that estimated the parameter values independently for each participant (i.e., fitting the non-pooling OPA model to the data) appeared to produce uncertain results and to overfit the data. In such cases, a multilevel model with Bayesian inference can provide reliable estimates despite imbalanced samples, by inferring interval estimates of the parameters and shrinking individual estimates toward the group-level means [44, 46]. As our comparison of the multilevel and non-multilevel models suggests, multilevel modeling can thus be a useful tool for modeling human behavior in social interactions.
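Why per-participant maximum likelihood becomes unstable with small or imbalanced cells can be seen in a single condition. This is a simplification of fitting an OPA-type model separately to each participant. The ML estimate of the log-odds of cooperation is log(k / (n − k)), and its Wald standard error is sqrt(1/k + 1/(n − k)). The estimate diverges whenever a participant always or never cooperated in that condition, which is common when the condition occurred only a few times. `logit_mle` is a hypothetical helper.

```python
import math

def logit_mle(k, n):
    """ML estimate of the log-odds of cooperation and its Wald standard error.

    k cooperations out of n occurrences of one condition. The estimate is
    +/-inf (with infinite SE) when the participant always (k == n) or never
    (k == 0) cooperated.
    """
    if n == 0:
        raise ValueError("condition never occurred for this participant")
    if k == 0:
        return -math.inf, math.inf
    if k == n:
        return math.inf, math.inf
    return math.log(k / (n - k)), math.sqrt(1.0 / k + 1.0 / (n - k))


# A participant who received help twice and cooperated both times
print(logit_mle(2, 2))
print(logit_mle(5, 10), logit_mle(50, 100))
```

A multilevel prior avoids the infinite estimate by pulling such a participant toward the group-level mean.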

However, this study leaves open whether individuals behave consistently across games, because each participant played only one of the two games. Several previous studies in which the same participants played various experimental games found positive correlations of cooperative behavior across games and argued for the existence of a domain-general pro-sociality [36, 47]. The current study focused on dyadic interactions as a basic form of human interaction, but the analysis should be extended to other situations, such as social dilemmas [24–30]. Further investigation is required to confirm whether individuals’ strategies are consistent across different domains.

Evolutionary game theory has provided a description of the evolution of human cooperation. Whether people actually adopt the strategies assumed in theoretical models needs to be tested, and various empirical studies using laboratory experiments have examined this issue [4, 5]. Combining laboratory experiments with analytical approaches to human behavioral data can yield fruitful implications for both theoretical and empirical investigations of human cooperation.

## Supporting information

### S1 Fig. Samples of the decision screens displayed to the participants.

https://doi.org/10.1371/journal.pone.0242607.s001

(PDF)

### S2 Fig. Example of simultaneously running three independent chains of decisions in the generalized reciprocity game.

https://doi.org/10.1371/journal.pone.0242607.s002

(PDF)

### S3 Fig. Individual parameter values inferred separately for each participant in the direct reciprocity game.

https://doi.org/10.1371/journal.pone.0242607.s003

(PDF)

### S4 Fig. Individual parameter values inferred separately for each participant in the generalized reciprocity game.

https://doi.org/10.1371/journal.pone.0242607.s004

(PDF)

### S5 Fig. Individual parameter values inferred by the multilevel, non-pooling, and pooling own and partner’s action (OPA) model.

https://doi.org/10.1371/journal.pone.0242607.s005

(PDF)

### S1 Table. Posterior distributions of the parameters for each model in the direct reciprocity game.

https://doi.org/10.1371/journal.pone.0242607.s006

(PDF)

### S2 Table. Posterior distributions of the parameters for each model in the generalized reciprocity game.

https://doi.org/10.1371/journal.pone.0242607.s007

(PDF)

### S3 Table. WAIC values for the multilevel, non-pooling, and pooling own and partner’s action (OPA) model.

https://doi.org/10.1371/journal.pone.0242607.s008

(PDF)

### S1 Text. Parameter inference for the multilevel models.

https://doi.org/10.1371/journal.pone.0242607.s009

(PDF)

### S2 Text. Method for calculating the cooperation probabilities.

https://doi.org/10.1371/journal.pone.0242607.s010

(PDF)

## Acknowledgments

I acknowledge Masanori Takezawa for his helpful comments on the manuscript and the colleagues at the Department of Psychology, Teikyo University, for their cooperation in recruiting participants. I would like to thank Enago (www.enago.jp) for the English language review.

## References

- 1. Nowak MA. Five rules for the evolution of cooperation. Science. 2006; 314(5805): 1560–1563. pmid:17158317
- 2. Nowak MA. Evolutionary Dynamics: Exploring the Equations of Life. Cambridge: Harvard University Press; 2006.
- 3. McElreath R, Boyd R. Mathematical Models of Social Evolution: A Guide for the Perplexed. Chicago and London: The University of Chicago Press; 2007.
- 4. Rand DG, Nowak MA. Human cooperation. Trends Cogn Sci. 2013; 17(8): 413–425. pmid:23856025
- 5. Perc M, Jordan JJ, Rand DG, Wang Z, Boccaletti S, Szolnoki A. Statistical physics of human cooperation. Phys Rep. 2017; 687(8): 1–51.
- 6. Axelrod R, Hamilton WD. The evolution of cooperation. Science. 1981; 211(4489): 1390–1396. pmid:7466396
- 7. Axelrod R. The Evolution of Cooperation. New York: Basic Books; 1984.
- 8. Trivers RL. The evolution of reciprocal altruism. Q Rev Biol. 1971; 46(1): 35–57.
- 9. Nowak MA, Sigmund K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature. 1993; 364: 56–58. pmid:8316296
- 10. Yamagishi T, Cook KS. Generalized exchange and social dilemmas. Soc Psychol Quart. 1993; 56(4): 235–248.
- 11. Greiner B, Levati MV. Indirect reciprocity in cyclical networks: an experimental study. J Econ Psychol. 2005; 26(5): 711–731.
- 12. Bartlett MY, DeSteno D. Gratitude and prosocial behavior: helping when it costs you. Psychol Sci. 2006; 17(4): 319–325. pmid:16623689
- 13. Stanca L. Measuring indirect reciprocity: Whose back do we scratch? J Econ Psychol. 2009; 30(2): 190–202.
- 14. DeSteno D, Bartlett MY, Baumann J, Williams LA, Dickens L. Gratitude as moral sentiment: Emotion-guided cooperation in economic exchange. Emotion. 2010; 10(2): 289–293. pmid:20364907
- 15. Fowler JH, Christakis NA. Cooperative behavior cascades in human social networks. Proc Natl Acad Sci USA. 2010; 107(12): 5334–5338. pmid:20212120
- 16. Gray K, Ward AF, Norton MI. Paying it forward: Generalized reciprocity and the limits of generosity. J Exp Psychol Gen. 2014; 143(1): 247–254. pmid:23244034
- 17. Horita Y, Takezawa M, Kinjo T, Nakawake Y, Masuda N. Transient nature of cooperation by pay-it-forward reciprocity. Sci Rep. 2016; 6: 19471. pmid:26786178
- 18. Capraro V, Marcelletti A. Do good actions inspire good actions in others? Sci Rep. 2014; 4: 7470. pmid:25502617
- 19. Boyd R, Richerson PJ. The evolution of indirect reciprocity. Soc Netw. 1989; 11(3): 213–236.
- 20. Pfeiffer T, Rutte C, Killingback T, Taborsky M, Bonhoeffer S. Evolution of cooperation by generalized reciprocity. Proc R Soc B. 2005; 272(1568): 1115–1120. pmid:16024372
- 21. Hamilton IM, Taborsky M. Contingent movement and cooperation evolve under generalized reciprocity. Proc R Soc B. 2005; 272(1578): 2259–2267. pmid:16191638
- 22. Fischbacher U, Gächter S, Fehr E. Are people conditionally cooperative? Evidence from a public goods experiment. Econ Lett. 2001; 71(3): 397–404.
- 23. Fehr E, Fischbacher U. Social norms and human cooperation. Trends Cogn Sci. 2004; 8(4): 185–190. pmid:15050515
- 24. Burton-Chellew MN, West SA. Prosocial preferences do not explain human cooperation in public-goods games. Proc Natl Acad Sci USA. 2013; 110(1): 216–221. pmid:23248298
- 25. Burton-Chellew MN, Nax HH, West SA. Payoff-based learning explains the decline in cooperation in public goods games. Proc R Soc B. 2015; 282(1801): 20142678. pmid:25589609
- 26. Horita Y, Takezawa M, Inukai K, Kita T, Masuda N. Reinforcement learning accounts for moody conditional cooperation behavior: Experimental results. Sci Rep. 2017; 7: 39275. pmid:28071646
- 27. Grujić J, Fosco C, Araujo L, Cuesta JA, Sánchez A. Social experiments in the mesoscale: Humans playing a spatial prisoner’s dilemma. PLOS ONE. 2010; 5(11): e13749. pmid:21103058
- 28. Gracia-Lázaro C, Ferrer A, Ruiz G, Tarancón A, Cuesta JA, Sánchez A, et al. Heterogeneous networks do not promote cooperation when humans play a Prisoner’s Dilemma. Proc Natl Acad Sci USA. 2012; 109(32): 12922–12926. pmid:22773811
- 29. Grujić J, Röhl T, Semmann D, Milinski M, Traulsen A. Consistent strategy updating in spatial and non-spatial behavioral experiments does not promote cooperation in social networks. PLOS ONE. 2012; 7(11): e47718. pmid:23185242
- 30. Grujić J, Gracia-Lázaro C, Milinski M, Semmann D, Traulsen A, Cuesta JA, et al. A comparative analysis of spatial Prisoner’s Dilemma experiments: Conditional cooperation and payoff irrelevance. Sci Rep. 2014; 4: 4615. pmid:24722557
- 31. Van den Berg P, Molleman L, Weissing FJ. Focus on the success of others leads to selfish behavior. Proc Natl Acad Sci USA. 2015; 112(9): 2912–2917. pmid:25730855
- 32. Wedekind C, Milinski M. Human cooperation in the simultaneous and the alternating prisoner's dilemma: Pavlov versus generous tit-for-tat. Proc Natl Acad Sci USA. 1996; 93(7): 2686–2689. pmid:11607644
- 33. Milinski M, Wedekind C. Working memory constrains human cooperation in the Prisoner’s Dilemma. Proc Natl Acad Sci USA. 1998; 95(23): 13755–13758. pmid:9811873
- 34. Watanabe S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res. 2010; 11: 3571–3594.
- 35. Kurzban R, Houser D. Experiments investigating cooperative types in humans: A complement to evolutionary theory and simulations. Proc Natl Acad Sci USA. 2005; 102(5): 1803–1807. pmid:15665099
- 36. Yamagishi T, Mifune N, Li Y, Shinada M, Hashimoto H, Horita Y, et al. Is behavioral pro-sociality game-specific? Pro-social preference and expectations of pro-sociality. Organ Behav Hum Decis Process. 2013; 120(2): 260–271.
- 37. Pletzer JL, Balliet D, Joireman J, Kuhlman DM, Voelpel SC, Van Lange PAM. Social value orientation, expectations, and cooperation in social dilemmas: A meta-analysis. Eur J Pers. 2018; 32(1): 62–83.
- 38. Fischbacher U. z-Tree: Zurich toolbox for ready-made economic experiments. Exp Econ. 2007; 10: 171–178.
- 39. Stan Development Team. Stan User’s Guide, Version 2.22. 2019. Available from: https://mc-stan.org/docs/2_22/stan-users-guide/index.html (accessed 9 April 2020).
- 40. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci. 1992; 7(4): 457–472.
- 41. Stan Development Team. RStan: the R Interface to Stan. R Package Version 2.16.2. 2017. Available from: http://mc-stan.org
- 42. R Core Team. R: a Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2018.
- 43. Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Stat Comput. 2014; 24: 997–1016.
- 44. McElreath R. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton, FL: CRC Press; 2016.
- 45. Pruitt DG, Kimmel MJ. Twenty years of experimental gaming: Critique, synthesis, and suggestions for the future. Annu Rev Psychol. 1977; 28: 363–392.
- 46. Katahira K. How hierarchical models improve point estimates of model parameters at the individual level. J Math Psychol. 2016; 73: 37–58.
- 47. Peysakhovich A, Nowak MA, Rand DG. Humans display a ‘cooperative phenotype’ that is domain general and temporally stable. Nat Commun. 2014; 5: 4939. pmid:25225950