Cooperation enhanced by the coevolution of teaching activity in evolutionary prisoner's dilemma games with voluntary participation

Voluntary participation, as an additional strategy in repeated games, has been shown both theoretically and empirically to be an efficient way to promote the evolution of cooperation. In addition, recent studies show that the coevolution of teaching activity can promote cooperation. Inspired by these findings, we investigate the effect of the coevolution of teaching activity on the evolution of cooperation in the prisoner's dilemma game with voluntary participation: when the focal player successfully enforces its strategy on the opponent, its teaching ability increases. Through numerical simulation, we show that voluntary participation can effectively raise the fraction of cooperators, which is also affected by the value of the increment. Furthermore, we investigate the influence of the increment value on the density of the different strategies and find that there exists an optimal increment value that has the strongest effect on the evolutionary dynamics. To explain this observation, we show that the optimal increment leads to the strongest heterogeneity in agents' teaching ability, which further promotes the evolution of cooperation.


Introduction
Understanding the emergence and maintenance of cooperation in the context of natural selection remains a formidable challenge for scientists from many different fields of the natural and social sciences [1]. Evolutionary game theory provides a common mathematical framework for investigating this puzzle within groups of selfish individuals. In particular, the prisoner's dilemma game (PDG), as a metaphor for social dilemmas, has drawn continuous attention as a means of exploring how cooperative behavior can be enhanced among selfish individuals [2][3][4][5].
In the original PDG, each player simultaneously decides whether to cooperate (C) or defect (D). Both players receive the reward R for mutual cooperation and the punishment P for mutual defection. If a defector encounters a cooperator, the former obtains the temptation to defect T, while the latter receives the sucker's payoff S. These parameters satisfy T > R > P > S and 2R > T + S. Defection is therefore the best choice regardless of the opponent's choice, which leads to the tragedy of the commons, where private interest is at odds with collective welfare [6]. To resolve this dilemma, a variety of scenarios have been proposed to offset this unfavorable outcome and enhance the evolution of cooperation [7][8][9][10][11][12][13][14][15]. Nowak attributed all these works to five mechanisms: kin selection, group selection, direct reciprocity, indirect reciprocity and network reciprocity [16]. In particular, network reciprocity, where players are constrained to play only with their direct neighbors, was pioneered by Nowak and May and has attracted a great deal of attention [17], including studies on different network topologies such as small-world networks [18], random regular graphs [19], BA scale-free networks [20] and multilayer networks [21,22]. Szolnoki and Perc (2008) investigated evolutionary games in which the teaching activity of players can evolve in time and found an optimal value of the increment that best supports the maintenance of cooperation [35]. They further considered, separately, coevolution affecting either only the cooperators or only the defectors, and showed that both options promote cooperation. Interestingly, they revealed that the coevolutionary promotion of players spreading defection is more beneficial for cooperation [36].
Voluntary participation, as a third strategy besides cooperation and defection, has turned out to be a simple yet effective way to promote the persistence of cooperative behavior [39,40]. Specifically, a risk-averse player who prefers a small but fixed payoff can refuse to participate in the PDG and adopt the loner strategy (L). The introduction of the loner strategy prevents the system from falling into the homogeneous defection state and instead leads to rock-scissors-paper dynamics with cyclic dominance. This raises an interesting question: how does cooperation fare when teaching activity is introduced into prisoner's dilemma games with voluntary participation? The rest of this paper is organized as follows: we first describe our modified model of the PDG with voluntary participation; subsequently, the main simulation results are shown in Section 3; lastly, we summarize our conclusions in Section 4.

Results
We first study how the fractions of the three strategies depend on Δw for different values of the temptation to defect b. Fig 1 shows the fraction of cooperators (Fig 1(A)), defectors (Fig 1(B)) and loners (Fig 1(C)) as a function of Δw for different values of b. All results are obtained for K = 0.1. When b is relatively large, such as b = 1.05, b = 1.2 or b = 1.6, the three strategies coexist in the network. Interestingly, the fractions of cooperators and defectors reach their maximum values, whereas the fraction of loners reaches its minimum, around a moderate value of Δw. For small b, the enhanced network reciprocity enables cooperators to survive and even dominate. As b continues to increase, the effect of the enhanced network reciprocity is weakened and cooperators need loners to protect them from being wiped out during the dynamical process. Thus the three strategies fall into cyclic dominance. Interestingly, the cooperation levels for Δw = 0 and Δw = 0.4 are nearly the same no matter what value of b applies. For Δw = 0, the teaching ability of players does not evolve, and thus the system is homogeneous. For large Δw, the teaching ability does evolve, but its evolution stops so quickly that essentially the whole population remains in a homogeneous state, with only very few influential players having w_x close to 1.
To further validate the above results, some typical evolution snapshots of clusters are shown in Fig 2. From top to bottom, Δw is equal to 0, 0.05 and 0.3, and the time steps from left to right are equal to 0, 100, 800 and 10000. Since cooperators dominate loners, loners dominate defectors and defectors dominate cooperators, the three strategies coexist in the stable state of the traditional game with voluntary participation. However, in the second panel (Δw = 0.05), loners go extinct and cooperators survive by forming compact clusters that resist the invasion of defectors, owing to the enhanced network reciprocity. When Δw is sufficiently large (lower panel), the system falls into cyclic dominance of the three strategies again, because the enhanced network reciprocity is weakened when the evolution of teaching ability stops too quickly. From Fig 1 and Fig 2, we can see that a moderate value of Δw best promotes cooperation. To explore the reason behind these results, Fig 3 shows the distribution of players' teaching ability in the stable state. The teaching ability is clearly a fixed value when Δw = 0. However, when Δw > 0, an interesting phenomenon appears: the teaching ability of players is no longer a fixed value but exhibits heterogeneity, which is introduced by our coevolution setup. This heterogeneity among players leads to an enhanced network reciprocity and further promotes the evolution of cooperation. In fact, the variances of the teaching ability for the different values of Δw in Fig 3 are equal to 0, 0.08 and 0.05, respectively. That is to say, the heterogeneity of players' teaching ability is largest when Δw = 0.05. Previous works have demonstrated the positive effect of heterogeneity on the evolution of cooperation, such as heterogeneous networks [41-43], social diversity [44,45] and heterogeneous aspirations [46], to name only a few examples.
It is thus easy to understand why a moderate value of Δw best promotes the evolution of cooperation in structured populations.
Finally, it is instructive to examine the evolution of cooperation under different levels of uncertainty K in strategy adoption. When K → ∞, all information is lost and players adopt a neighbor's strategy completely at random, while K → 0 corresponds to fully deterministic adoption of the neighbor's strategy. Fig 4 shows the phase transition lines on the full K − b parameter plane for different values of Δw. Fig 4(A) (Δw = 0) features a bell-shaped phase boundary separating the pure C and mixed C + D + L phases, implying that an optimal level of uncertainty (K ≈ 0.25) best promotes cooperation. When Δw is moderate (i.e. Δw = 0.05), the region of the pure C phase becomes wider and that of the mixed C + D + L phase becomes narrower; in addition, the optimal uncertainty disappears. Namely, the larger the value of K, the higher the level of cooperation. However, when Δw is sufficiently large (i.e. Δw = 0.3), the region of the pure C phase becomes narrower and that of the mixed C + D + L phase becomes wider again, and the optimal uncertainty reappears. Overall, we can see that the coevolution of teaching ability and strategy not only supports the presence of cooperation, but also provides the best environment for cooperation to survive.
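The two limits of K discussed above can be checked numerically with a short sketch. It assumes the usual Fermi form for the probability that player x enforces its strategy on player y, w_x / (1 + exp[(p_y − p_x)/K]); the function name `fermi` and the sample payoffs are illustrative choices, not taken from the paper.

```python
import math


def fermi(p_x, p_y, K, w_x=1.0):
    """Assumed adoption probability: w_x / (1 + exp((p_y - p_x) / K)).

    Written in a numerically stable way so that very small K does not
    overflow math.exp.
    """
    z = (p_y - p_x) / K
    if z >= 0:
        e = math.exp(-z)          # e is in (0, 1], never overflows
        return w_x * e / (1.0 + e)
    return w_x / (1.0 + math.exp(z))


# Illustrative payoffs: the first column has the better-off player as the
# teacher, the second column has the worse-off player as the teacher.
for K in (1e-3, 0.1, 0.25, 1.0, 1e3):
    print(K, fermi(1.0, 0.5, K), fermi(0.5, 1.0, K))
```

For very small K the better-performing player succeeds almost surely and the worse-performing one almost never, whereas for very large K both probabilities approach w_x / 2, so payoffs no longer influence the outcome.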

Conclusion
To conclude, we have studied the coevolution of teaching ability and strategy in the prisoner's dilemma game with voluntary participation. Through numerical simulation, we have found that a moderate value of Δw best promotes the evolution of cooperation. Compared with the traditional game with voluntary participation, the coevolution rule introduces a heterogeneous distribution of players' teaching ability into the system. This gives rise to an enhanced network reciprocity, which alters the evolutionary trend and promotes the evolution of cooperation. However, a relatively large Δw weakens the enhanced network reciprocity, causing the system to fall into cyclic dominance again. We hope our work can shed some light on resolving social dilemmas in the real world.

Models
In our work, each player can choose to be a cooperator (s_x = C), a defector (s_x = D), or a loner (s_x = L) in each round of the game. As the interaction network, we choose a square lattice of size L × L in which each player has four direct neighbors. We adopt the weak PDG [39], with R = 1, P = S = 0 and T = b. The payoffs of the row player against the column player in the prisoner's dilemma game with voluntary participation can then be represented as the matrix:

\[
\begin{array}{c|ccc}
  & C & D & L \\ \hline
C & 1 & 0 & \delta \\
D & b & 0 & \delta \\
L & \delta & \delta & \delta
\end{array}
\]

where b (1 < b ≤ 2) represents the temptation to defect and δ ∈ (0, 1) denotes the payoff of both the risk-averse loner and its opponent. For simplicity but without loss of generality, δ is fixed at 0.3 [47].
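As a minimal sketch of how these payoffs accumulate over a player's four neighbors, the following code encodes the weak PDG with loners (R = 1, P = S = 0, T = b, loner payoff δ). The names `payoff_matrix` and `total_payoff` and the integer strategy codes are illustrative assumptions, not part of the original model description.

```python
# Strategy codes (an arbitrary encoding chosen for this sketch).
C, D, L = 0, 1, 2


def payoff_matrix(b, delta=0.3):
    """Row player's payoff against the column player.

    Weak PDG values: R = 1, P = S = 0, T = b; any encounter involving
    a loner yields delta to both sides.
    """
    return (
        (1.0, 0.0, delta),      # C against C, D, L
        (b, 0.0, delta),        # D against C, D, L
        (delta, delta, delta),  # L against C, D, L
    )


def total_payoff(strategy, neighbour_strategies, b, delta=0.3):
    """Sum of the payoffs a player collects from all of its neighbours."""
    table = payoff_matrix(b, delta)
    return sum(table[strategy][other] for other in neighbour_strategies)


# A defector surrounded by two cooperators, one loner and one defector.
print(total_payoff(D, [C, C, L, D], b=1.2))
```

In this example the defector collects b from each cooperator, δ from the loner and nothing from the other defector.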
The game is iterated forward in accordance with the Monte Carlo (MC) simulation procedure, composed of the following elementary steps. First, a randomly selected player x evaluates its payoff p_x by interacting with its direct neighbors. Next, player x chooses one neighbor at random, say y, who obtains its payoff p_y in the same way. Finally, player x tries to enforce its strategy s_x on player y in accordance with the modified Fermi function:

\[
W(s_x \to s_y) = w_x \, \frac{1}{1 + \exp\left[(p_y - p_x)/K\right]}
\]

where K denotes the amplitude of noise, or its inverse the so-called intensity of selection [40], and w_x represents the strategy-transfer ability of player x (the teaching activity of player x). Following Szolnoki and Perc [35], the teaching activity w_x changes adaptively in time as follows. Initially, all players are given the minimal value w_x = 0.01 throughout this paper in order to avoid frozen states. Next, the prefactor w_x is increased by a constant positive value Δw each time the focal player x succeeds in enforcing its strategy on its opponent y. Lastly, the evolution of w_x is stopped as soon as one w_x reaches 1. It is worth mentioning how our work differs from the previous work [35]: we apply the coevolution of strategy and teaching ability introduced in that work to our PDG with voluntary participation and investigate the evolution of cooperation. The MC results reported here are obtained on populations comprising 400 × 400 individuals, and the stationary fractions of the strategies are determined within 5 × 10^3 steps after sufficiently long transients. Moreover, since the teaching activity may introduce additional disturbances, the final results have been averaged over up to 10 independent realizations for each set of parameter values in order to ensure suitable accuracy.
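The elementary steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the adoption probability w_x / (1 + exp[(p_y − p_x)/K]), periodic boundary conditions, uniformly random initial strategies, one MC step equal to n × n elementary steps, skipping pairs that already share a strategy, and capping w_x at 1. The function names and the small default lattice size are hypothetical.

```python
import math
import random

C, D, L = 0, 1, 2  # strategy codes chosen for this sketch


def neighbours(i, j, n):
    """Four nearest neighbours on an n x n lattice (periodic boundaries assumed)."""
    return [((i - 1) % n, j), ((i + 1) % n, j), (i, (j - 1) % n), (i, (j + 1) % n)]


def adoption_probability(p_x, p_y, w_x, K):
    """Assumed modified Fermi rule, evaluated without overflow."""
    z = (p_y - p_x) / K
    if z >= 0:
        e = math.exp(-z)
        return w_x * e / (1.0 + e)
    return w_x / (1.0 + math.exp(z))


def simulate(n=20, b=1.2, delta=0.3, K=0.1, dw=0.05, sweeps=50, seed=0):
    """Return ((f_C, f_D, f_L), w) after `sweeps` MC steps."""
    rng = random.Random(seed)
    table = ((1.0, 0.0, delta), (b, 0.0, delta), (delta, delta, delta))
    s = [[rng.choice((C, D, L)) for _ in range(n)] for _ in range(n)]
    w = [[0.01] * n for _ in range(n)]  # minimal initial teaching activity
    frozen = False                      # True once some w_x has reached 1

    def payoff(i, j):
        return sum(table[s[i][j]][s[a][c]] for a, c in neighbours(i, j, n))

    for _ in range(sweeps * n * n):
        i, j = rng.randrange(n), rng.randrange(n)   # focal player x
        k, l = rng.choice(neighbours(i, j, n))      # random neighbour y
        if s[i][j] == s[k][l]:
            continue  # assumption: nothing to enforce on a like-minded neighbour
        p = adoption_probability(payoff(i, j), payoff(k, l), w[i][j], K)
        if rng.random() < p:
            s[k][l] = s[i][j]                       # y adopts the strategy of x
            if not frozen:
                w[i][j] = min(1.0, w[i][j] + dw)    # reward the successful teacher
                frozen = w[i][j] >= 1.0             # stop all further evolution of w
    flat = [x for row in s for x in row]
    fractions = tuple(flat.count(t) / len(flat) for t in (C, D, L))
    return fractions, w


fractions, w = simulate()
print(fractions)
```

The defaults are deliberately small so that the sketch runs quickly; reproducing the setup described in the text would require n = 400, long transients and averaging over independent realizations with different seeds.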

S1 File. This file (zip format) contains the raw data used in figures with MC simulation.
(RAR)