Spike-based Decision Learning of Nash Equilibria in Two-Player Games

Figure 2

Playing blackjack with pRL converges toward pure Nash equilibrium.

(A) Average strategy () after (open circles) and (filled circles) games where the gambler (blue) is a neural net as well as the croupier (black). The dotted vertical lines left of and show the separation line of drawing/not drawing another card for the optimal Nash strategy pair. (B) Average strategy () after games for a neural net as gambler playing against a croupier that follows a given strategy (blue), (red) or (green). The colored dotted lines left of show the separation line of drawing/not drawing another card for the optimal strategy given that the croupier stops drawing at (from left to right). (C) Average reward () of the gambler for the scenario described in (B). The colored dotted lines show the maximal reachable average reward. (D) Average strategy () over the last out of a total of games for a neural net (red) or human (green) as gambler playing against a croupier that follows a given strategy . The initial weights of the network were chosen such that the strategy in the first trials (blue) mimics the strategy of humans instructed about the game rules (black).

doi: https://doi.org/10.1371/journal.pcbi.1002691.g002