Predicting Human Cooperation

The Prisoner’s Dilemma has been a subject of extensive research due to its importance in understanding the ever-present tension between individual self-interest and social benefit. A strictly dominant strategy in a Prisoner’s Dilemma (defection), when played by both players, is mutually harmful. Repetition of the Prisoner’s Dilemma can give rise to cooperation as an equilibrium, but so can defection, and this ambiguity is difficult to resolve. The numerous behavioral experiments investigating the Prisoner’s Dilemma highlight that players often cooperate, but that the level of cooperation varies significantly with the specifics of the experimental setting. We present the first computational model of human behavior in repeated Prisoner’s Dilemma games that unifies the diversity of experimental observations in a systematic and quantitatively reliable manner. Our model relies on data we integrated from many experiments, comprising 168,386 individual decisions. The model is composed of two parts: the first predicts the first-period action using solely the structural game parameters, while the second predicts dynamic actions using both game parameters and history of play. Our model is successful not merely at fitting the data, but also at predicting behavior at multiple scales in experimental designs not used for calibration, using only information about the game structure. We demonstrate the power of our approach through a simulation analysis revealing how best to promote human cooperation.


What are the details of the fEWA Model?
This section summarizes parts of [1,2]; all of the equations are taken directly from [1,2]. We reproduce this information here so that readers can easily access the details of the model.
Camerer and Ho [3] created the experience-weighted attraction (EWA) model as a hybrid of reinforcement and belief models, the two primary behavioral game theory models that preceded it.
For player $i$, there are $m$ strategies (two for the Prisoner's Dilemma) indexed by $j$.
• Each player has an initial attraction for each strategy, $A_i^j(0)$.
• $i$'s chosen strategy is $s_i(t)$.
• $i$'s opponent's chosen strategy is $s_{-i}(t)$.
• $i$'s payoffs are $\pi_i(s_i^j(t), s_{-i}(t))$.
• The indicator function yields $I(x, y) = 1$ if $x = y$, and $I(x, y) = 0$ if $x \neq y$.
Attractions, $A_i^j(t)$, are updated after each time period, $t$, for each strategy, $j$, and each player, $i$, with the following equations:

$$N(t) = \phi (1 - \kappa) N(t-1) + 1$$

$$A_i^j(t) = \frac{\phi N(t-1) A_i^j(t-1) + \left[\delta + (1 - \delta) I(s_i^j, s_i(t))\right] \pi_i(s_i^j, s_{-i}(t))}{N(t)}$$

Here $N(t)$ is an experience weight, $\phi$ discounts past attractions, $\delta$ is the weight placed on forgone payoffs, and $\kappa$ controls how quickly attractions grow.
Then, attractions are mapped into probabilities of choosing discrete actions during the next time period (Cooperate and Defect for the Prisoner's Dilemma) with a logistic stochastic response function,

$$P_i^j(t+1) = \frac{e^{\lambda A_i^j(t)}}{\sum_{k=1}^{m} e^{\lambda A_i^k(t)}},$$

which has a parameter, $\lambda$, that controls how responsive the agent is to differences in attractions. If $\lambda = 0$, the choice is made uniformly at random, independent of the attraction values.
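As a concrete illustration, the response function is a softmax over attractions scaled by $\lambda$. The sketch below is in Python (our implementation is in R, and the function name here is ours); it also shows the $\lambda = 0$ special case.

```python
import math

def choice_probabilities(attractions, lam):
    """Logistic (softmax) stochastic response function: maps a list of
    attractions to choice probabilities; lam is the sensitivity lambda."""
    # Subtract the max attraction before exponentiating for numerical stability.
    m = max(attractions)
    weights = [math.exp(lam * (a - m)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

# With lambda = 0, choices are uniform regardless of attractions.
print(choice_probabilities([2.0, 1.0], 0.0))  # [0.5, 0.5]
```

Larger $\lambda$ concentrates probability on the strategy with the higher attraction.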
To address the overfitting of EWA to data, Ho, Camerer and Chong [1] created a one-parameter version of EWA, functional (self-tuning) experience-weighted attraction learning (fEWA), designed to forecast behavior in new games where there is no data from that game with which to estimate parameters. This makes fEWA the state-of-the-art behavioral game-theoretic model for our purposes.
fEWA replaces the free parameters of EWA with functions of the data that can vary across players and time. Following Camerer and Ho [2], we set $\kappa$ to 0 because this captures almost all of the familiar cases of estimating EWA on a variety of data sets from different games and different populations. The other two parameters of EWA, $\phi$ and $\delta$, are then replaced by functions of player $i$'s experience up to period $t$: $\phi_i(t)$ and $\delta_{ij}(t)$.
$\phi_i(t)$ captures how quickly a player perceives the environment to be changing: the more change, the less weight is placed on past attractions. This is accomplished with a "surprise index" measuring how much the most recent observation of the opponent's play departs from the interaction history. The surprise index is a function of two vectors. A player has a cumulative history vector recording the frequency with which the other player has chosen each strategy $k$ up to the current period; its elements are

$$h_i^k(t) = \frac{\sum_{\tau=1}^{t} I(s_{-i}^k, s_{-i}(\tau))}{t}.$$

The player also creates an immediate history vector of 0's and 1's, with a 1 for the strategy played that period, $s_{-i}^k = s_{-i}(t)$, and a 0 for the other strategy; its elements are $r_i^k(t) = I(s_{-i}^k, s_{-i}(t))$. The surprise index is the squared distance between these two vectors,

$$S_i(t) = \sum_{k} \left( h_i^k(t) - r_i^k(t) \right)^2,$$

and the change-detector function is

$$\phi_i(t) = 1 - \tfrac{1}{2} S_i(t).$$

$\delta_{ij}(t)$ is the weight placed on forgone payoffs. Players reinforce chosen strategies, and unchosen strategies with payoffs at least as large as the payoff received in the current period, with a weight of one; unchosen strategies with payoffs smaller than the payoff received in the current period get a weight of zero:

$$\delta_{ij}(t) = \begin{cases} 1 & \text{if } \pi_i(s_i^j, s_{-i}(t)) \ge \pi_i(t) \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, with $\kappa = 0$, the attractions are updated by the following equations, where $\phi_i(t)$ and $\delta_{ij}(t)$ are defined as above:

$$N(t) = \phi_i(t) N(t-1) + 1$$

$$A_i^j(t) = \frac{\phi_i(t) N(t-1) A_i^j(t-1) + \delta_{ij}(t)\, \pi_i(s_i^j, s_{-i}(t))}{N(t)}$$
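The self-tuning pieces above can be sketched as follows (a Python illustration with hypothetical function names, not our R implementation); `history_counts` tallies how often the opponent has played each strategy so far, and `forgone_payoffs[j]` stands for $\pi_i(s_i^j, s_{-i}(t))$.

```python
def surprise_phi(history_counts, t, opp_action):
    """Change-detector phi_i(t) = 1 - 0.5 * S_i(t), where S_i(t) is the
    squared distance between cumulative frequencies and the immediate
    (one-hot) history vector for the opponent's action this period."""
    h = [c / t for c in history_counts]                    # h_i^k(t)
    r = [1.0 if k == opp_action else 0.0 for k in range(len(h))]  # r_i^k(t)
    S = sum((hk - rk) ** 2 for hk, rk in zip(h, r))
    return 1.0 - 0.5 * S

def fewa_update(attractions, N_prev, phi, forgone_payoffs, realized_payoff):
    """One self-tuning EWA attraction update with kappa = 0.
    delta_ij(t) = 1 when the forgone payoff is at least the realized
    payoff (which always includes the chosen strategy), else 0."""
    N = phi * N_prev + 1.0
    new_attr = []
    for j, A in enumerate(attractions):
        delta = 1.0 if forgone_payoffs[j] >= realized_payoff else 0.0
        new_attr.append((phi * N_prev * A + delta * forgone_payoffs[j]) / N)
    return new_attr, N

# Opponent cooperated in all 4 periods and cooperates again: no surprise.
print(surprise_phi([4, 0], 4, 0))  # 1.0
# Opponent defects for the first time: maximal surprise, phi drops to 0.
print(surprise_phi([4, 0], 4, 1))  # 0.0
```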
Like our logistic regression models, our computational simulation model uses probabilities of actions to predict average cooperation levels. We use the logistic stochastic response function, $P_i^j$, to map fEWA attractions into the probability of cooperation; this is the standard choice model used for fEWA and other behavioral game theories [2]. We estimate the $\lambda$ parameter of the response function by searching for the value that minimizes the negative log likelihood of the model on the training data, using the base R stats::optimize implementation of a combination of golden section search and successive parabolic interpolation.
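R's stats::optimize brackets the minimum of a one-dimensional function. The pure-Python sketch below implements just the golden-section half of that routine and applies it to a stand-in objective (the real objective is the negative log likelihood over all training decisions; the quadratic here is a hypothetical placeholder).

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section
    search, shrinking the bracket by the inverse golden ratio each step."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):      # minimum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                # minimum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2.0

# Hypothetical stand-in for the negative log likelihood as a function of lambda.
nll = lambda lam: (lam - 3.0) ** 2 + 1.0
print(round(golden_section_minimize(nll, 0.0, 20.0), 4))  # 3.0
```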
We derived initial attractions, $A_i^j(0)$, by simulating period-zero play as random choices, the same way we imputed lagged decisions for period-one play in the dynamic-only logistic regression model. The initial experience weight, $N(-1)$, is set to 1. At period zero, we randomly draw both players' actions from a Bernoulli distribution with equal probability over the two possible actions, and then update attractions based on those actions and the associated simulated payoffs, assuming equal initial attractions, $A_i^j(-1)$, for both strategies. Then, as play progresses in an interaction, attractions are updated according to the EWA formula. Because fEWA uses the raw payoff values to determine attractions to actions, we converted all payoff values in the data to dollars and then, for each game structure, normalized payoff values by subtracting the lowest possible payoff for that game structure, which is important for cross-game forecasting [1].
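The normalization step can be sketched as follows (Python illustration; the payoff matrix is hypothetical, with rows indexed by own action and columns by the opponent's action, in dollars).

```python
def normalize_payoffs(payoff_matrix):
    """Shift every payoff in a game structure down by the lowest possible
    payoff, so the minimum payoff becomes zero (payoffs already in dollars)."""
    low = min(min(row) for row in payoff_matrix)
    return [[p - low for p in row] for row in payoff_matrix]

# Hypothetical game structure whose lowest payoff is -5 dollars.
print(normalize_payoffs([[-1, -5], [3, 0]]))  # [[4, 0], [8, 5]]
```

Subtracting the minimum (rather than, say, standardizing) preserves payoff differences, which is what the attraction update responds to.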

Why a global function for all players, rather than heterogeneous agents?
Studies of individual behavior, such as laboratory games, often conduct multiple-type and even individual-level estimation, where human participants are clustered into segments and then the behavioral models' parameters are estimated within those segments. In the case of individual-level estimation, the segments are individual players, and so the models' parameters are estimated for each individual player; see [4] for an application of this approach to public goods games. In laboratory games, a human participant's "type" can be estimated using their observed actions. It can be more effective to incorporate observed actions as features in a model that is learned by directly predicting actions than to use unsupervised clustering as a preprocessing step. With our approach, agents can be heterogeneous by having different values of the features used in the behavioral model.
To simulate heterogeneity, our model adds a small amount of randomness to each agent's observation of the two actions taken in the previous period of the current interaction. Fig. S2 shows a million draws from the normal distribution, $N(\mu = 0, \sigma = 0.3)$, used to draw a value that is added to each agent's observed probability of cooperation in the previous period at each time period $t > 1$.

What are the details of the model comparisons?

What are the details of the sensitivity analysis?
Simulation models are used to investigate modeled systems by varying inputs to the model and measuring the effect on outputs. Input values are often varied one at a time while keeping the others at "default values," but this prevents the model analyst from discovering potential interaction effects and yields sensitivities that are conditional on the default values used [5,6]. To mitigate these issues, we base our sensitivity analysis on a global Latin Hypercube Sampling of the game structure variables [6][7][8]. We then analyze the relationship between the matrix of Latin Hypercube Sampled input parameters and the resulting vector of simulated outputs with two statistical methods. A partial correlation coefficient analysis estimates the linear effect of increasing or decreasing a given input on the output value, linearly discounting the effects of the other inputs. We estimate the partial rank correlation coefficient for each game design variable, which can capture non-linear (but monotonic) relationships [6,9,10].
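A minimal Python sketch of the sampling step follows, together with a Spearman rank correlation, shown here as a simpler stand-in for the PRCC analysis (PRCC additionally partials out the other inputs before correlating ranks). The variable bounds are hypothetical.

```python
import random

def latin_hypercube(n_samples, bounds, rng=random.Random(42)):
    """Latin Hypercube Sample: stratify each input's range into n_samples
    equal-probability bins, draw one point per bin, then shuffle the bins
    independently for each dimension."""
    cols = []
    for lo, hi in bounds:
        col = [lo + (hi - lo) * (i + rng.random()) / n_samples
               for i in range(n_samples)]
        rng.shuffle(col)
        cols.append(col)
    return list(zip(*cols))  # one tuple of input values per simulation run

def rank(xs):
    """1-based ranks; ties broken by original order (simple variant)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks,
    capturing monotonic (not just linear) input-output relationships."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Each column of the sample hits every stratum of its input's range exactly once, which is what makes the design space-filling with relatively few simulation runs.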
We also conducted a standardized rank regression coefficient analysis to confirm that our results are robust to the specific analysis technique [9,10]. Furthermore, a standardized rank regression coefficient analysis provides an $R^2$ value estimating the amount of variation explained [6]. The standardized rank regression coefficient analysis ranks the variables the same as the partial rank correlation coefficient analysis (Fig. S7), and has an $R^2 = 0.83$, indicating that most of the variation in the simulation model is explained by the analysis.