Spike-based Decision Learning of Nash Equilibria in Two-Player Games

doi:10.1371/journal.pcbi.1002691

Figure 3

pRL, but not TD-learning, fits the data and follows a mixed Nash equilibrium.

(A) Choice behavior for pRL versus pRL (employee green, employer red) and human versus human (employee black, employer gray) [11]. The cost of inspection was stepped from to to , respectively, which also corresponds to the shirk rate in the Nash equilibrium (thick black lines). The inspection rate in the Nash equilibrium would always be . (B) Average choice behavior of pRL vs pRL (dark green circles), TD vs TD (light green circles), pRL as the employee vs a computer algorithm as the employer (blue squares), human vs human (black), human as the employee vs computer algorithm (orange), and monkey vs computer algorithm (cyan) for trials/block as a function of the inspection cost. The solid line indicates the Nash equilibrium. (C) Reward as a function of the inspection cost for trials/block. Coloring as in (B). The pRL simulations resemble the experimental data more closely than the TD simulations do. (D) Average choice behavior as in (B) but for trials/block. The inspect rates for pRL vs pRL (dark red circles), TD vs TD (light red circles), and pRL vs computer algorithm (purple squares) are also shown. The lines indicate the Nash equilibrium for the employee (diagonal) and the employer (horizontal). pRL behaves according to the Nash equilibrium, whereas TD does not. (E) Time course of the probability of shirking with inspection cost for pRL vs algorithm (blue line), pRL vs pRL (dark green line), and TD vs TD (light green line). For the latter two, the employer's probability of inspecting is also shown (dark red line for pRL, light red line for TD). pRL oscillates around the Nash equilibrium (drawn lines), whereas TD deviates completely from it. (F) Time course of the probability of shirking (green, solid) and of inspecting (red, solid) with inspection cost for pRL vs pRL as in (E), but shifted up for clarity and overlaid with the negative change in the shirk rate (green dashed) and the change in the inspect rate (red dashed) to show the counteractive behavior.
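The caption states that the Nash shirk rate tracks the inspection cost while the Nash inspection rate stays constant. A minimal Python sketch of why this can happen is given here, using the indifference conditions of a generic 2x2 game. The payoff matrices, the parameter names (`wage`, `effort`, `output`), their default values, and the example costs are assumptions chosen for illustration (a textbook-style inspection game); they are not taken from the figure or the paper.

```python
def mixed_nash_2x2(A, B):
    """Interior mixed Nash equilibrium of a 2x2 bimatrix game.

    A[i][j] and B[i][j] are the payoffs of the row and the column player
    when the row player picks action i and the column player picks action j.
    Returns (p, q) with p = P(row plays action 1), q = P(column plays action 0).
    """
    # p makes the column player indifferent between its two actions.
    p_den = B[1][0] - B[1][1] - B[0][0] + B[0][1]
    # q makes the row player indifferent between its two actions.
    q_den = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if p_den == 0 or q_den == 0:
        raise ValueError("no interior mixed equilibrium for these payoffs")
    p = (B[0][1] - B[0][0]) / p_den
    q = (A[1][1] - A[0][1]) / q_den
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("indifference solution is not a valid probability")
    return p, q


def inspection_game_nash(cost, wage=1.0, effort=0.5, output=2.0):
    """Nash (shirk rate, inspect rate) for an ASSUMED inspection-game payoff.

    Rows: employee works (0) or shirks (1).
    Columns: employer inspects (0) or does not inspect (1).
    All payoff values below are hypothetical, not the paper's.
    """
    # Employee: working always yields wage minus effort; shirking yields
    # the full wage unless caught by an inspection.
    A = [[wage - effort, wage - effort],
         [0.0, wage]]
    # Employer: gains output if the employee works, pays the wage unless a
    # shirker is caught, and pays the inspection cost when inspecting.
    B = [[output - wage - cost, output - wage],
         [-cost, -wage]]
    return mixed_nash_2x2(A, B)


if __name__ == "__main__":
    # Hypothetical inspection costs, chosen only to illustrate the trend.
    for c in (0.2, 0.5, 0.8):
        shirk, inspect = inspection_game_nash(c)
        print(f"cost={c:.1f}  shirk={shirk:.3f}  inspect={inspect:.3f}")
```

Under these assumed payoffs the shirk rate equals `cost / wage` and the inspect rate equals `effort / wage`, so only the employee's equilibrium strategy moves with the inspection cost. This matches the diagonal and horizontal Nash lines described for panel (D) in shape only; the actual values depend on the payoffs used in the study.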

doi: https://doi.org/10.1371/journal.pcbi.1002691.g003