Spike-based Decision Learning of Nash Equilibria in Two-Player Games
Figure 4. Covariance learning rules may converge to the mixed Nash equilibrium, but may also lead to deterministic non-Nash strategies; pRL fits the human data better than basic reinforcement models.
Time course of the probability to shirk (A,C) and to inspect (B,D) with inspection cost … for pCOV vs algorithm (A,B) and pCOV vs pCOV (C,D). In each panel the horizontal lines depict the Nash equilibrium, and shirk and inspection rates are shown for 10 simulation runs (the same color in (A,B) and in (C,D), respectively, corresponds to the same run). Only a small fraction of the runs converge to, or oscillate around, the Nash equilibrium, while the other runs end in a deterministic strategy pair. The initial synaptic weights were drawn from a Gaussian distribution with mean … and standard deviation …. The learning rate was set to …, but varying it did not change the proportion of runs converging to the pure strategy. (E) Average choice behavior of pRL vs pRL (green), RE1 vs RE1 (blue), RE3 vs RE3 (red) and human vs human (black) for … trials/block as a function of the inspection cost. The light red circles show the average choice behavior for RE3 vs RE3 with … trials/block. Individual runs converged to a pure strategy; hence the averages shown, taken over 200 runs, reflect the percentage of runs converging to a pure shirk strategy. (F) Reward as a function of the inspection cost for … trials/block. Coloring as in (E). The solid lines indicate the Nash equilibrium.
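To make the setting in panels (A–D) concrete, the sketch below sets up a standard inspection game and a generic rate-based covariance update. All names and values here are illustrative assumptions rather than the paper's: the payoff parameters (wage w_, effort cost g_, inspection cost h_, product v_), the single logistic preference weight per agent (a minimal stand-in for the full synaptic weight vector), the learning rate eta, and the running-average time constant tau. For this parameterization the mixed Nash equilibrium follows from mutual indifference: P(shirk) = h/w and P(inspect) = g/w.

```python
# Minimal sketch of the inspection game with a covariance-style learning rule.
# Hypothetical parameterization; the rate-based update below stands in for the
# paper's spike-based pCOV rule and is only meant as a qualitative illustration.
import numpy as np

w_, g_, h_, v_ = 1.0, 0.5, 0.3, 2.0   # wage, effort cost, inspection cost, product (assumed)

# Payoffs indexed as R[worker_action, employer_action];
# worker: 0 = work, 1 = shirk; employer: 0 = no inspection, 1 = inspect.
R_worker   = np.array([[w_ - g_, w_ - g_],   # working pays wage minus effort
                       [w_,      0.0    ]])  # shirking pays wage unless caught
R_employer = np.array([[v_ - w_, v_ - w_ - h_],
                       [-w_,     -h_         ]])

# Mixed Nash equilibrium: each player's mixture makes the opponent indifferent.
p_shirk_NE, p_inspect_NE = h_ / w_, g_ / w_

def run_pcov_pair(n_trials=50_000, eta=0.1, tau=0.01, seed=0):
    """One pCOV-vs-pCOV run: each agent holds one logistic preference weight
    and applies a covariance update dq = eta * (r - rbar) * (a - p), where
    rbar is an exponential running average of that agent's reward."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                            # preference weights (worker, employer)
    rbar = np.zeros(2)                         # running reward averages
    for _ in range(n_trials):
        p = 1.0 / (1.0 + np.exp(-q))           # current P(shirk), P(inspect)
        a = (rng.random(2) < p).astype(int)    # sampled actions
        r = np.array([R_worker[a[0], a[1]],    # resulting payoffs
                      R_employer[a[0], a[1]]])
        q += eta * (r - rbar) * (a - p)        # covariance learning step
        rbar += tau * (r - rbar)
    return 1.0 / (1.0 + np.exp(-q))            # final (P(shirk), P(inspect))

print(f"Nash: P(shirk)={p_shirk_NE:.2f}, P(inspect)={p_inspect_NE:.2f}")
for seed in range(10):                         # 10 runs, as in panels (C,D)
    ps, pi = run_pcov_pair(seed=seed)
    print(f"run {seed}: P(shirk)={ps:.2f}, P(inspect)={pi:.2f}")
```

Under these assumptions, individual runs may hover near the mixed equilibrium or saturate toward a pure strategy pair, mirroring the split between converging and non-converging runs described above; the exact proportions depend on the assumed parameters.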