Win-stay-lose-learn promotes cooperation in the prisoner’s dilemma game with voluntary participation

Voluntary participation has been demonstrated to be a simple yet effective mechanism for promoting persistent cooperative behavior and has been extensively studied. It has also been verified that the aspiration-based win-stay-lose-learn strategy updating rule promotes the evolution of cooperation. Inspired by these findings, we combine the win-stay-lose-learn updating rule with voluntary participation: players maintain their strategies when they are satisfied; otherwise they attempt to imitate the strategy of one randomly chosen neighbor. We find that this mechanism maintains persistent cooperative behavior and, under certain conditions, even further promotes the evolution of cooperation.


Introduction
The emergence and stability of cooperation among unrelated selfish individuals has become a major challenge in biology, economics, and the behavioral sciences [1][2][3]. In order to study this issue, evolutionary game theory has become a useful tool [4]. In particular, a simple model, the prisoner's dilemma (PD) game, is considered a paradigm [5]. In its basic version, two players concurrently choose one of two strategies: cooperation (C) or defection (D). They receive the reward R if both cooperate and the punishment P if both defect. However, if one player defects while the other cooperates, the former gets the temptation T while the latter gets the sucker's payoff S. The payoffs are ordered as T > R > P > S, so that in the well-mixed case defection is the best strategy regardless of the opponent's strategy.
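The payoff structure of a single PD round can be written down directly. The sketch below (Python) uses illustrative values T = 1.5, R = 1, P = 0.1, S = 0, which respect the ordering T > R > P > S but are not the parameterization used later in this paper (where P = S = 0).

```python
# Pairwise prisoner's dilemma payoffs with the standard ordering T > R > P > S.
# The numeric values here are illustrative only.
def pd_payoffs(s1, s2, T=1.5, R=1.0, P=0.1, S=0.0):
    """Return (payoff to player 1, payoff to player 2) for strategies 'C'/'D'."""
    table = {
        ('C', 'C'): (R, R),  # mutual cooperation: both receive the reward
        ('D', 'D'): (P, P),  # mutual defection: both receive the punishment
        ('D', 'C'): (T, S),  # defector exploits cooperator
        ('C', 'D'): (S, T),
    }
    return table[(s1, s2)]
```

Because T > R, a unilateral switch to defection always pays against a cooperator, which is what makes defection dominant in the well-mixed case.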
In the pioneering work by Nowak and May [6], spatial games were introduced, where players are arranged on a spatially structured topology and interact only with their direct neighbors. The topology has thus become a determinant of the success of cooperative behavior (regular networks [7,8], small-world networks [9][10][11], scale-free networks [12][13][14], and multilayer networks [15][16][17]). Other approaches facilitating the evolution of cooperation involve the environment [18][19][20][21], payoffs [22][23][24][25], features of players [26][27][28][29], and so on (for a comprehensive overview, see Refs. [30,31]). Besides, voluntary participation has recently been demonstrated to be a simple yet effective mechanism to promote persistent cooperative behavior [32,33]. The introduction of the loner strategy, whereby an individual may be unwilling to participate in the PD game and would rather take a small but fixed payoff, can lead to a rock-scissors-paper cyclic dominance, which means that cooperation can survive even under a large temptation to defect [33]. In these works, individual players are assumed to update their strategies by learning from their neighbors in almost every round of the game. Meanwhile, to be more realistic, win-stay rules [34][35][36][37][38][39] have been introduced, which assume that individual players only change their strategies when they feel dissatisfied. These rules have been proved to be a robust mechanism that promotes the evolution of cooperation independently of the initial conditions. Naturally, an interesting question arises: how does cooperation fare when the win-stay updating rule is introduced into the PD with voluntary participation?
In this paper, we combine the win-stay-lose-learn strategy updating rule with voluntary participation. The so-called win-stay-lose-learn updating rule is designed as follows: players maintain their strategies when they are satisfied; otherwise they attempt to imitate the strategy of one randomly chosen neighbor. Our main aim is to study the impact of this mechanism on the spreading of cooperation. Through numerical simulations, we show that this scenario can maintain persistent cooperative behavior and, under certain conditions, even further promotes the evolution of cooperation.

Results
We start with the case where cooperators, defectors and loners are distributed uniformly at random, each thus initially occupying one third of the square lattice. As the main parameters, we consider the aspiration level A and the temptation to defect b. Fig 1(A) shows the fraction of cooperation ρc as a function of the temptation to defect b for different aspiration levels A. We find that the aspiration level has a significant influence on the evolution of cooperation. As shown in Fig 1, the fraction of cooperation decreases slowly with increasing A, while the temptation to defect has no effect on cooperation, e.g., for A = 0 and A = 0.15, ρc = 0.33 and ρc = 0.31, respectively, irrespective of the value of b. When 0.3 < A ≤ 0.5, the fraction of cooperation decreases as A increases. In addition, as b increases, transitions to different stationary states can be observed for certain values of A. For example, a break point occurs at b = 1.7 for A = 0.5, which will be explained below. When 0.5 < A < 0.75, the highest levels of cooperation occur if the value of b is larger than about 1.2. When A > 0.75, the fraction of cooperation is maintained at 0.18. Last, it is worth pointing out that when A is large (A = 2.0), individuals are always dissatisfied with their payoffs, and our model then reduces to the traditional version of the prisoner's dilemma game. Based on the above results, the present updating rule can effectively facilitate the evolution of cooperation (except for A = 2.0, where cooperation can survive only if b < 1.05). By contrast, under the joint action of voluntary participation and win-stay-lose-learn updating, cooperation can not only survive but also thrive for larger b.
In order to obtain a more complete picture of the joint effects of the aspiration level and the temptation to defect, we show the simulation results as a function of both A and b. As shown in Fig 2, the results are consistent with those presented in Fig 1(A): e.g., when A < 0.3, the fraction of cooperation decreases discontinuously with increasing A, irrespective of the value of b; when A = 0.3, the cooperation level recovers to 0.4, and then decreases discontinuously with increasing A until A = 0.5. At the same time, a transition can be observed at a fixed value of b for each value of A. It is interesting that the highest level of cooperation occurs within the interval 0.5 < A < 0.75 when b > 1.2. Moreover, it can be observed that as A increases, discontinuous transitions occur at A = 0.0, 0.075, 0.15, 0.225, 0.3, 0.5, 0.75. These transitions can be explained as follows. On a square lattice with nearest-neighbor interactions, the payoffs of a cooperator, a defector and a loner are given by n1R + n2S + n3σ, n4T + n5P + n6σ and 4σ, respectively, where nk ∈ {0,1,2,3,4} and k ∈ {1,2,3,4,5,6}. Given that T = b, R = 1, P = S = 0, and σ = 0.3, the above payoffs simplify to n1 + 0.3n3, n4b + 0.3n6 and 1.2, respectively. In our model, when an individual is dissatisfied, it will learn from a randomly chosen neighbor, which may change the fractions of cooperators and loners. A cooperator is dissatisfied when n1 + 0.3n3 < 4A, while for a defector the condition for dissatisfaction is n4b + 0.3n6 < 4A. A loner is dissatisfied when 4A > 1.2. The phase transition points can be obtained by setting n1 + 0.3n3 = 4A and n4b + 0.3n6 = 4A. Thus, the values of A at which phase transitions occur are given by A = (n1 + 0.3n3)/4 and A = (n4b + 0.3n6)/4. Since the configuration of players is relevant for the evolutionary success of cooperation in spatial games [40][41][42], it is also of interest to test the robustness of the proposed updating rule.
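As a check on this argument, one can enumerate the candidate transition values of A for a cooperator from n1 + 0.3n3 = 4A (Python sketch below, with σ = 0.3). The transitions reported above are a subset of these candidates, since not every neighborhood configuration actually arises in the stationary state.

```python
# Enumerate candidate phase-transition values of A for a cooperator on the
# square lattice, from n1 + sigma*n3 = 4A with R = 1, S = 0, sigma = 0.3.
def candidate_transitions(sigma=0.3):
    values = set()
    for n1 in range(5):              # cooperating neighbors, 0..4
        for n3 in range(5 - n1):     # loner neighbors, with n1 + n3 <= 4
            values.add(round((n1 + sigma * n3) / 4.0, 4))
    return sorted(values)
```

The reported transition points 0.0, 0.075, 0.15, 0.225, 0.3, 0.5 and 0.75 all appear in this enumeration; candidates such as 0.25 do not show up in the simulations, presumably because the corresponding neighborhood configurations are not marginal in the stationary state.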
We thus investigate how cooperation evolves under different (adverse) configurations. When 0 ≤ A < 0.075, only cooperators or defectors surrounded by defectors [see Fig 3(A)] are dissatisfied. The cooperator surely resorts to defection by imitation because of its dissatisfaction, while the defector has to retain its strategy. Hence, the fraction of cooperators will decrease while that of defectors increases. When 0.075 ≤ A < 0.15, a cooperator or defector next to a loner [see Fig 3(B) and Fig 3(C)] is dissatisfied. The cooperator will choose defection or turn into a loner, and the defector will probably become a loner. Thus the fraction of cooperators will decrease while that of loners increases.
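The dissatisfaction conditions for the configurations of Fig 3 can be verified numerically. The following sketch (Python) assumes R = 1, P = S = 0, σ = 0.3 and an aspiration of 4A, as in the analysis above.

```python
# Payoff and satisfaction check for a focal player with a given neighborhood
# composition, assuming R = 1, P = S = 0 and a fixed loner payoff sigma.
def payoff(strategy, n_coop, n_def, n_loner, b, sigma=0.3):
    """Accumulated payoff of a focal player over its four neighbors."""
    assert n_coop + n_def + n_loner == 4
    if strategy == 'L':                   # loners always collect sigma per link
        return 4 * sigma
    if strategy == 'C':                   # R=1 vs C, S=0 vs D, sigma vs L
        return n_coop * 1.0 + n_loner * sigma
    return n_coop * b + n_loner * sigma   # defector: T=b vs C, P=0 vs D

def dissatisfied(strategy, n_coop, n_def, n_loner, b, A, sigma=0.3):
    """True when the accumulated payoff falls below the aspiration 4A."""
    return payoff(strategy, n_coop, n_def, n_loner, b, sigma) < 4 * A
```

For instance, a cooperator surrounded by four defectors [Fig 3(A)] earns a payoff of 0 and is therefore dissatisfied for any A > 0, matching the argument above.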

Conclusion
To conclude, we have studied the impact of the win-stay-lose-learn strategy updating rule on the evolutionary prisoner's dilemma game with voluntary participation. The risk-averse loner strategy (L) is introduced, and strategy updates depend on the level of satisfaction of individuals. We find that the new rule is able to maintain persistent cooperative behavior and, under certain conditions, even further promotes the evolution of cooperation. The win-stay-lose-learn rule dominates the level of cooperation when individuals are not too greedy. It restrains the players' impulses to change their strategies at small aspiration levels (A ≤ 0.5), while it promotes the evolution of cooperation especially for intermediate values of the aspiration parameter (0.5 < A ≤ 0.75), where virtually complete cooperation dominance can be achieved even for values of the temptation to defect that significantly exceed 1. This is in sharp contrast to the results obtained with large aspiration levels (A > 0.75), where the cyclic rock-scissors-paper type of dominance is essentially fully recovered. Compared with previous works, our model considers both mechanisms, so that the new rule can promote the evolution of cooperation for intermediate values of the aspiration parameter and maintain persistent cooperative behavior at large aspiration levels. Our work is expected to provide a valuable method for resolving the prisoner's dilemma.

Methods
We consider voluntary participation and a win-stay-lose-learn strategy updating rule in the PD on a square lattice of size L² with periodic boundary conditions. The risk-averse player is defined as the loner (L). Loners and their co-players always obtain the fixed payoff 0 < σ < 1, which makes their payoffs larger than that of two defectors but smaller than that of a cooperative pair. The aspiration level of player i is defined as A_i = k_i A, where k_i is the degree of player i (k_i = 4 on the square lattice) and A is a free parameter, the average aspiration level, within the interval (0, b). Following common practice [6], we choose the PD payoffs as R = 1, P = S = 0, and T = b > 1, satisfying the condition T > R > P = S. We implement the evolutionary dynamics in the following way. As initial conditions, we assign to each individual, with equal probability, one of the three available strategies: cooperation (C), defection (D) or loner (L). Then, at each time step, each player i in the network obtains the payoff P_i by playing with all its neighbors. Next, all the players synchronously update their strategies by comparing their payoff P_i with their aspiration A_i. If P_i > A_i, player i keeps its strategy for the next step. On the contrary, if P_i < A_i, player i picks at random one of its neighbors, say j, and adopts j's strategy with the probability W = 1/{1 + exp[(P_i − P_j)/K]}, where K stands for the amplitude of noise [43][44][45][46]. Without loss of generality, we use K = 0.1 for the PD. To ensure that the system has reached a stationary state, we set the transient time to t = 100000 steps. The presented results were obtained for system size L = 100. Moreover, each data point was averaged over up to 20 independent runs for each set of parameter values in order to ensure suitable accuracy.
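A minimal Monte Carlo sketch of one synchronous update under this rule is given below (Python), assuming the Fermi imitation probability with σ = 0.3 and K = 0.1; the tie case P_i = A_i, left unspecified above, is treated here as "satisfied".

```python
import math
import random

def fermi_prob(p_i, p_j, K=0.1):
    """Probability that dissatisfied player i adopts neighbor j's strategy."""
    return 1.0 / (1.0 + math.exp((p_i - p_j) / K))

def payoff_of(lattice, x, y, b, sigma, L):
    """Accumulated payoff of player (x, y) over its four nearest neighbors."""
    s = lattice[x][y]
    if s == 'L':                      # loners always earn 4*sigma
        return 4 * sigma
    total = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        t = lattice[(x + dx) % L][(y + dy) % L]
        if s == 'C':                  # R=1 vs C, sigma vs L, S=0 vs D
            total += 1.0 if t == 'C' else (sigma if t == 'L' else 0.0)
        else:                         # defector: T=b vs C, sigma vs L, P=0 vs D
            total += b if t == 'C' else (sigma if t == 'L' else 0.0)
    return total

def step(lattice, b, A, sigma=0.3, K=0.1):
    """One synchronous win-stay-lose-learn update of the whole lattice."""
    L = len(lattice)
    payoffs = [[payoff_of(lattice, x, y, b, sigma, L) for y in range(L)]
               for x in range(L)]
    new = [row[:] for row in lattice]
    for x in range(L):
        for y in range(L):
            if payoffs[x][y] >= 4 * A:        # satisfied: win-stay
                continue
            dx, dy = random.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
            nx, ny = (x + dx) % L, (y + dy) % L
            if random.random() < fermi_prob(payoffs[x][y], payoffs[nx][ny], K):
                new[x][y] = lattice[nx][ny]   # lose-learn: imitate neighbor
    return new
```

Iterating `step` past the transient time and counting the 'C' sites then yields the fraction of cooperation ρc; the lattice size and number of steps here would of course be far smaller than the L = 100, t = 100000 used for the reported results.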

S1 File. This file (ZIP format) contains the raw data from the MC simulations used in the figures.
(ZIP)