Fig 1.
Behavior of the aspiration learner in the repeated PD game.
(A) Payoff matrix. The payoff values for the row player are shown. (B) Concept of satisficing in the aspiration-based reinforcement learning model. The payoff values shown on the horizontal axis are those for the focal player. (C) Relationship between the aspiration level, A, and the approximate (un)conditional strategy, given the payoff matrix shown in (A).
Fig 2.
Probability of cooperation in the repeated PDG on the square lattice with 10 × 10 nodes.
(A) Mean time courses of the actual probability of cooperation. The lines represent the actual probability of cooperation averaged over the 10² players and 10³ simulations. We set β = 0.2 and A = 0.5. The shaded regions represent error bars of one standard deviation. (B) Probability of cooperation for various values of the sensitivity of the stimulus to the reward, β, and the aspiration level, A. The values shown are averages over the 10² players, the first t_max = 25 rounds, and 10³ simulations.
Fig 3.
CC and MCC in the repeated PDG on the square lattice.
The actual probability of cooperation is plotted against the fraction of cooperative neighbors in the previous round, f_C. The error bars represent the mean ± standard deviation calculated on the basis of all players, t_max = 25 rounds, and 10³ simulations. The circles represent the results not conditioned on a_{t−1}. The triangles and the squares represent the results conditioned on a_{t−1} = C and a_{t−1} = D, respectively. We set (A) β = 0.1 and A = 0.5, (B) β = 0.4 and A = 0.5, (C) β = 0.4 and A = 2.0, and (D) β = 0.4 and A = −1.0.
Fig 4.
Search of CC and MCC patterns in the repeated PDG on the square lattice.
(A) Schematic of the linear fit of the actual probability of cooperation against f_C, with slope α_1 and intercept α_2. (B) Slope α_1 of the linear fit when not conditioned on the focal player’s previous action, a_{t−1}. (C) α_1 when conditioned on a_{t−1} = C. (D) α_1 when conditioned on a_{t−1} = D. (E) Difference between the intercept, α_2, obtained from the linear fit conditioned on a_{t−1} = C and that conditioned on a_{t−1} = D. For each combination of the β and A values, a linear fit was obtained by the least-squares method on the basis of the 10² players, t_max = 25 rounds, and 10³ simulations, yielding 2.5 × 10⁶ samples in total.
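The conditioned linear fit described in this caption can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name `fit_line`, the use of `numpy.polyfit` as the least-squares routine (the caption says only "least-squares method"), the five f_C values (which presume four neighbors per player), and the noise-free synthetic data. None of it is taken from the original simulations.

```python
import numpy as np

def fit_line(f_c, p_coop, prev_action=None, condition=None):
    """Least-squares fit p_coop ~ alpha1 * f_c + alpha2.

    If `condition` is given ("C" or "D"), only samples whose previous
    action equals `condition` are used. Hypothetical helper, not the
    authors' code.
    """
    f_c = np.asarray(f_c, dtype=float)
    p_coop = np.asarray(p_coop, dtype=float)
    if condition is not None:
        mask = np.asarray(prev_action) == condition
        f_c, p_coop = f_c[mask], p_coop[mask]
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    alpha1, alpha2 = np.polyfit(f_c, p_coop, deg=1)
    return alpha1, alpha2

# Synthetic, noise-free samples (assumed values, for illustration only).
levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # assumes four neighbors
f_c = np.tile(levels, 2)
prev = np.array(["C"] * 5 + ["D"] * 5)
p = np.where(prev == "C", 0.5 * f_c + 0.3, 0.1 * f_c + 0.1)

slope_all, _ = fit_line(f_c, p)                        # as in panel (B)
slope_c, icpt_c = fit_line(f_c, p, prev, "C")          # as in panel (C)
slope_d, icpt_d = fit_line(f_c, p, prev, "D")          # as in panel (D)
intercept_diff = icpt_c - icpt_d                       # as in panel (E)
```

In this toy data set the slope depends on the previous action, which is the kind of pattern panels (C) and (D) are designed to reveal; with real simulation output the same function would be applied once per (β, A) combination.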
Fig 5.
CC and MCC patterns in the repeated PGG in a group of four players.
(A)–(C) Contribution by a player (i.e., a_t) conditioned on the average contribution by the other group members in the previous round (i.e., f_C). We set β = 0.4 and A = 0.9. (A) X = 0.3, (B) X = 0.4, and (C) X = 0.5. The circles represent the results not conditioned on a_{t−1}. The triangles and the squares represent the results conditioned on a_{t−1} ≥ X and a_{t−1} < X, respectively. (D) Slope α_1 of the linear fit, a_t ≈ α_1 f_C + α_2, when not conditioned on a_{t−1}. (E) α_1 when conditioned on a_{t−1} ≥ X. (F) α_1 when conditioned on a_{t−1} < X. (G) Difference between α_2 obtained from the linear fit conditioned on a_{t−1} ≥ X and that conditioned on a_{t−1} < X. The mean and standard deviation in (A)–(C) and the linear fit used in (D)–(G) were calculated on the basis of the four players, t_max = 25 rounds, and 2.5 × 10⁴ simulations, yielding 2.5 × 10⁶ samples in total.