Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin

doi:10.1371/journal.pcbi.1005034

Fig 1.

Behavior of the aspiration learner in the repeated PD game.

(A) Payoff matrix. The payoff values for the row player are shown. (B) Concept of satisficing in the aspiration-based reinforcement learning model. The payoff values shown on the horizontal axis are those for the focal player. (C) Relationship between the aspiration level, A, and the approximate (un)conditional strategy, given the payoff matrix shown in (A).

Fig 2.

Probability of cooperation in the repeated PDG on the square lattice with 10 × 10 nodes.

(A) Mean time courses of the actual probability of cooperation. The lines represent the actual probability of cooperation averaged over the 10² players and 10³ simulations. We set β = 0.2 and A = 0.5. The shaded regions represent error bars of one standard deviation. (B) Probability of cooperation for various values of the sensitivity of the stimulus to the reward, β, and the aspiration level, A. The values shown are averages over the 10² players, the first t_max = 25 rounds, and 10³ simulations.
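The roles of β and A in this caption can be illustrated with a minimal sketch. The captions do not state the functional form of the stimulus, so the hyperbolic-tangent form below is an assumption (a common choice in Bush-Mosteller-type aspiration learning), and the function name `stimulus` and the payoff values in the usage note are hypothetical.

```python
import math


def stimulus(payoff, aspiration, beta):
    """Assumed satisficing stimulus, bounded in (-1, 1).

    Positive when the payoff exceeds the aspiration level A (the outcome is
    satisfactory), negative when it falls short. beta scales how strongly
    the stimulus responds to the gap between payoff and aspiration.
    The tanh form is an assumption, not taken from the captions.
    """
    return math.tanh(beta * (payoff - aspiration))
```

For example, with β = 0.2 and A = 0.5 as in panel (A), a hypothetical payoff of 3 gives a positive stimulus and a hypothetical payoff of −1 gives a negative one; increasing β pushes both closer to ±1.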

Fig 3.

CC and MCC in the repeated PDG on the square lattice.

The actual probability of cooperation is plotted against the fraction of cooperative neighbors in the previous round, f_C. The error bars represent the mean ± standard deviation calculated on the basis of all players, t_max = 25 rounds, and 10³ simulations. The circles represent the results not conditioned on a_t−1. The triangles and the squares represent the results conditioned on a_t−1 = C and a_t−1 = D, respectively. We set (A) β = 0.1 and A = 0.5, (B) β = 0.4 and A = 0.5, (C) β = 0.4 and A = 2.0, and (D) β = 0.4 and A = −1.0.

Fig 4.

Search of CC and MCC patterns in the repeated PDG on the square lattice.

(A) Schematic of the linear fit of the probability of cooperation against f_C, with slope α₁ and intercept α₂. (B) Slope α₁ of the linear fit when not conditioned on the focal player’s previous action, a_t−1. (C) α₁ when conditioned on a_t−1 = C. (D) α₁ when conditioned on a_t−1 = D. (E) Difference between the intercept, α₂, obtained from the linear fit conditioned on a_t−1 = C and that conditioned on a_t−1 = D. For each combination of the β and A values, a linear fit was obtained by the least-squares method on the basis of the 10² players, t_max = 25 rounds, and 10³ simulations, yielding 2.5 × 10⁶ samples in total.
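The fitting procedure described in this caption can be sketched as follows. This is an illustrative ordinary least-squares fit in plain Python, not the authors' code; the function names `linear_fit` and `conditional_fits` and the sample layout `(f_c, prev_action, cooperated)` are assumptions made for the example.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y ≈ alpha1 * x + alpha2.

    Returns (alpha1, alpha2), i.e., slope and intercept.
    Requires at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha1 = sxy / sxx
    alpha2 = mean_y - alpha1 * mean_x
    return alpha1, alpha2


def conditional_fits(samples):
    """Fit cooperation against f_C, unconditioned and conditioned on a_{t-1}.

    samples: list of (f_c, prev_action, cooperated) tuples (hypothetical
    layout), where f_c is the fraction of cooperative neighbors in the
    previous round, prev_action is "C" or "D", and cooperated is 1 or 0.
    Returns a dict with keys "all", "C", "D" mapping to (alpha1, alpha2).
    """
    fits = {}
    for key in ("all", "C", "D"):
        subset = [s for s in samples if key == "all" or s[1] == key]
        xs = [s[0] for s in subset]
        ys = [s[2] for s in subset]
        fits[key] = linear_fit(xs, ys)
    return fits
```

Under this layout, the quantity in panel (E) would correspond to `fits["C"][1] - fits["D"][1]`, and each fit in the caption would pool 10² × 25 × 10³ = 2.5 × 10⁶ samples.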

Fig 5.

CC and MCC patterns in the repeated PGG in a group of four players.

(A)–(C) Contribution by a player (i.e., a_t) conditioned on the average contribution by the other group members in the previous round (i.e., f_C). We set β = 0.4 and A = 0.9. (A) X = 0.3, (B) X = 0.4, and (C) X = 0.5. The circles represent the results not conditioned on a_t−1. The triangles and the squares represent the results conditioned on a_t−1 ≥ X and a_t−1 < X, respectively. (D) Slope α₁ of the linear fit, a_t ≈ α₁ f_C + α₂, when not conditioned on a_t−1. (E) α₁ when conditioned on a_t−1 ≥ X. (F) α₁ when conditioned on a_t−1 < X. (G) Difference between α₂ obtained from the linear fit conditioned on a_t−1 ≥ X and that conditioned on a_t−1 < X. The mean and standard deviation in (A)–(C) and the linear fit used in (D)–(G) were calculated on the basis of the four players, t_max = 25 rounds, and 2.5 × 10⁴ simulations, yielding 2.5 × 10⁶ samples in total.
