Spike-based Decision Learning of Nash Equilibria in Two-Player Games
Figure 2
Playing blackjack with pRL converges toward a pure Nash equilibrium.
(A) Average strategy () after
(open circles) and
(filled circles) games in which both the gambler (blue) and the croupier (black) are neural nets. The dotted vertical lines left of
and
show the separation line of drawing/not drawing another card for the optimal Nash strategy pair. (B) Average strategy (
) after
games for a neural net as gambler playing against a croupier that follows a given strategy
(blue),
(red) or
(green). The colored dotted lines left of
show the separation line of drawing/not drawing another card for the optimal strategy given that the croupier stops drawing at
(from left to right). (C) Average reward (
) of the gambler for the scenario described in (B). The colored dotted lines show the maximal reachable average reward. (D) Average strategy (
) over the last
out of a total of
games for a neural net (red) or a human (green) as gambler playing against a croupier that follows a given strategy
. The initial weights of the network were chosen such that the strategy in the first
trials (blue) mimics the strategy of humans instructed about the game rules (black).