Strategies for updating rules driven by reinforcement learning to solve social dilemmas

Yang Wang; Xingchen Yu; Shounan Lu

doi:10.1371/journal.pone.0341925

Abstract

This study incorporates historical performance into traditional imitation rules and proposes a moderated strategy update rule. In this framework, an individual’s temporal historical performance is calculated using the BM model. By adjusting the parameter δ, the influence of historical performance on strategy learning is determined, and the evolution of cooperation is subsequently observed. Results show that the proposed strategy update rule promotes cooperation more effectively than the traditional version, and systemic cooperation is further enhanced as δ increases. The reason why the proposed rule enhances cooperation is that it amplifies the evaluation of cooperative behavior while compressing the evaluation of defective behavior. Although establishing system objectives may hinder the diffusion of cooperative behavior, appropriate performance evaluation mechanisms can mitigate this adverse effect. Our results indicate that multidimensional evaluation can provide a theoretical basis for explaining cooperative behavior in complex environments.

Citation: Wang Y, Yu X, Lu S (2026) Strategies for updating rules driven by reinforcement learning to solve social dilemmas. PLoS One 21(3): e0341925. https://doi.org/10.1371/journal.pone.0341925

Editor: Xiaojie Chen, University of Electronic Science and Technology of China, CHINA

Received: June 28, 2025; Accepted: January 14, 2026; Published: March 10, 2026

Copyright: © 2026 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting information files.

Funding: 1.Fundamental Research Funds for the Central Universities (Grant No 3142024036) 2.Self-funded Project of Langfang Science and Technology Research and Development Program(Grant No 2024011066).

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Cooperation is fundamental to the stability of biological ecosystems and human social systems. Yet, reconciling cooperative behaviors with Darwinian principles of natural selection [1] presents a persistent theoretical challenge. This has prompted most researchers to explore the principles underlying cooperation. Nowak’s seminal work summarizes five key mechanisms facilitating cooperation: kin selection, direct reciprocity, indirect reciprocity, network reciprocity, and group selection [2]. Crucially, network reciprocity exploits underlying population structure, enabling cooperators to cluster and thereby sustain cooperation [3]. Subsequent research has extended this foundational framework, incorporating diverse mechanisms such as punishment and reward [4–7], environmental feedback [8,9], social diversity [10], teaching activities [11,12], reputation [13], among others [14].

While numerous studies posit strategy learning in cooperative evolution primarily through imitation rules [3]—where benefits serve as the key driver—the reality involves complex, multidimensional influences. Strategy learning behavior results from the interplay of factors like self-learning and social learning [15]. Scholars have explored the impact of diverse strategy updating rules on the emergence and maintenance of cooperation. For instance, Yan and Hui demonstrated that integrating reputation mechanisms significantly enhances cooperation [16]. Similar conclusions were reached by Zhang et al. and He et al. [17,18]. Lu and Wang, incorporating past-performance into learning rules, found that increasing the weight of historical outcomes progressively strengthens system-wide cooperation [19]. Other investigated rules include popularity-driven [20] and experience-driven updates [21]. Collectively, these findings indicate that multi-factor dependent learning rules generally foster cooperative evolution.

Despite this extensive exploration, the predominant focus remains on extrinsic social attributes, such as reputation. Consequently, a critical gap persists: the evolutionary patterns and outcomes of system cooperation under composite strategy learning rules driven primarily by intrinsic individual attributes remain unexplored.

In addition, most existing studies focus on the immediate benefits of individual behavior within a single interaction round, overlooking the accumulated experience from prior games. This approach fails to capture the natural phenomenon in which organisms adapt their social strategies based on environmental cues, including feedback from past experiences. Reinforcement learning (RL) rules, however, effectively incorporate the cumulative influence of such memory effects [22–24]. Consequently, researchers have increasingly explored RL in evolutionary cooperation studies. For instance, Jia et al. demonstrated that incorporating RL enhances system-wide cooperation [25]. However, the research conducted by Lu et al [19], focused on the impact of reinforcement learning based relationship strength adjustment on cooperation, neglecting the dual effects of internal and external factors in strategy learning. The studies of Jia et al. [25] and Geng et al. [26] also overlooked the role of individual intrinsic factors in strategy learning. In addition, although Zhang et al. combined reinforcement learning with consensus learning rules to study how different policy update mechanisms affect cooperative evolution, the model assumes that individuals are rational [27,28], which does not reflect the actual cooperative evolution situation. Notably, recent findings suggest RL not only accounts for conditional cooperation but also explains patterns of emotional reciprocity [29,30].

Accordingly, we conceptualize the system’s consistency goal [31–35] as an intrinsic driver of individual behavior. Achieving this goal serves as one criterion for evaluating behavioral performance: success in the preceding round raises the current performance score, while failure lowers it. Drawing on reinforcement learning principles, we accumulate behavioral information across successive rounds to assess historical performance. From a global perspective, we evaluate individual performance and use the resulting assessment as a measure of social evaluation. By taking the interactive payoffs among individuals as the basis for mutual assessment, the strategy learning process is systematically guided through the integration of both social and individual evaluations. On this basis, we examine the evolution of cooperation within the system. This update rule incorporates both real-time game payoffs and historical behavior to govern strategy revisions. Our results show that this modified update mechanism significantly promotes the emergence of prosocial behaviors in the system.

2. Model

In this work, the weak Prisoner’s Dilemma is used [3], and without loss of generality, set the game payoff T = b (b ＞ 1), R = 1, and P = S=0, and follows: T > R > P > S and 2R > T + S. The corresponding payoff matrix M in Eq. (1).

(1)

Then, we construct a two-dimensional spatial network with periodic boundary to depict the relationships between individuals in the system. Initially, each individual randomly adopts either cooperation (S_i = C) or defection (S_i = D) with equal probability p, as specified in Eq. (2), interacting with its four nearest neighbors, accumulating income P_i based on Eq. (3), where Ω_i is the set of individual i’s neighbors.

(2)

(3)

Subsequently, the BM model [22–24] is employed within a reinforcement learning framework to calculate and assess an individual’s historical performance. In performance evaluation, the system’s aspiration level for consistency serves as the evaluation benchmark. If an individual’s cumulative performance during the evaluation period reaches or exceeds this benchmark, their score increases; otherwise, it decreases. And this adjustment mechanism operates persistently. Within BM reinforcement learning, the performance evaluation process occurs in two distinct steps. First, performance is scored according to the degree of deviation between the cumulative revenue and the expected system consistency target, in Eq. (4), where β (β ≥ 0) is the stimulus sensitivity to the reinforcement signal of (P_i-A). Subsequently, evaluating based on individual strategies, in Eq. (5), where g_i represents the satisfaction of players with the difference between P_i and A, then, the global evaluation results E_i of individual historical behavior performance can be quantified, with the calculation formula specified as follows.

(4)

(5)

Where, parameter A represents the consistency goal or expected level of the system, and define A = k_iα, where k_i = 4 denotes player i’s degree, representing the number of their four nearest neighbors [36,37], and α signifies the system’s consistency aspiration or goal level.

Finally, to refine learning strategies, this study proposes a moderated update rule. This rule integrates individual historical performance assessment with imitation dynamics, combining game payoff and historical performance to guide strategy updates, refers to the previous strategy learning rule setting and linearly adds them together [38,39]. The parameter δ (δ ∈ [0, 1]) modulates the weighting of historical performance in learning. When δ = 0, the strategy learn rule returns to its traditional version [3]. When δ > 0, both game payoffs and historical performance jointly shape strategy adaptation dynamics. Specifically, during the strategy update or learning process, the focal individual i will randomly select a nearest neighbor j and decide whether to learn the strategy of neighbor j based on probability W that based on Eq. (6). Here, parameter K quantifies the stochastic noise level that enables irrational decisions. As K → 0, agent i deterministically adopts the strategy of adjacent agent j, whereas when K→∞, the strategy imitation occurs randomly. Following Ref. [40], we set K = 0.5.

(6)

To evaluate the effectiveness of the proposed cooperation-enhancing mechanism, Monte Carlo simulations (MCS, which stands for Monte Carlo step) were performed on a 200 × 200 lattice network. Initially setting p = 0.5 and E_i (0) =0.5, each individual updated their strategy once per full MCS cycle on average. The equilibrium cooperation frequency fc was measured at 1 × 10⁴ MCS, with data averaged over 3 × 10³ MCS to minimize fluctuations. Results reflect 20 independent trials.

3. Result

Fig 1 illustrates the evolution of cooperation level fc across defect temptation values b for varying δ. When δ = 0, strategy updates revert to conventional imitation rules, causing cooperators to rapidly disappear even at low b. For δ > 0, however, agents incorporate historical performance into strategy evaluation. This modification substantially elevates cooperation levels, with higher δ values further amplifying cooperative behavior. In particular, when δ is set to 4, the payoff dimensions become consistent with historical performance, promoting a greater degree of system cooperation. Consequently, the moderated update rule extends the critical threshold b for cooperation extinction beyond conventional imitation, thereby promoting both the emergence and sustainability of cooperation.

Download:

Fig 1. Illustrates fc versus b under different δ values.

Vertically, with increasing δ, the system evolves to a stable state with a greater degree of cooperation. Showing that the proposed mechanism promotes cooperation and broadens its disappearance threshold b. Parameters are set to β = 2, α = 0.5 based on Ref. [24] and L = 200.

https://doi.org/10.1371/journal.pone.0341925.g001

We further investigated the impact of the parameter δ on strategic evolution from the perspective of population dynamics. Fig 2 shows the temporal evolution of cooperation for b = 1.1 under different δ values. The process exhibits two distinct stages: an initial decline followed by a rise. After reaching a minimum, the cooperation level increases steadily until it stabilizes at an evolutionary equilibrium. This characteristic dip-and-rise pattern reflects the intense competition between cooperators and defectors, which is typical of network reciprocity [41–44]. The proposed moderated strategy-update rule significantly enhances cooperation. Notably, higher δ values drive the system toward a more cooperative equilibrium, accelerate the evolution of cooperation, and shorten the duration of the second phase. These results confirm that the modified rule effectively promotes cooperation.

Download:

Fig 2. Illustrates temporal evolution of cooperative for some δ values, showing that the proposed mechanism can promote cooperation.

With increasing δ, the system evolves to a stable state with a greater degree of cooperation. The results indicate that higher parameter δ can promote cooperation. Parameters are set to β = 2, α = 0.5 and L = 200.

https://doi.org/10.1371/journal.pone.0341925.g002

Moving forward, we analyze the evolutionary dynamics by examining the spatial distribution of individual strategies to observe the competition between cooperative and competitive behaviors. As shown in Fig 3, when cooperators are initially placed in distinct positions, they quickly disappear under traditional strategy update rules. However, when moderated strategy update rules are applied, cooperators gradually spread within clusters of defectors. They gain a competitive advantage over defectors, which strengthens as parameter δ increases. Ultimately, this advantage allows cooperators to dominate, leading to a higher level of system cooperation at equilibrium. Notably, in the stable state, cooperators survive within numerous small, compact clusters.

Download:

Fig 3. Shows snapshots of spatial distributions for blue cooperators and purple defectors at different MCS for various δ value.

Vertically, the diffusion of cooperative behavior accelerates with increasing δ. Simulations indicate that larger δ significantly promotes cooperation. Parameters are set to b = 1.1, α = 0.5, β = 2 and L = 200.

https://doi.org/10.1371/journal.pone.0341925.g003

Then, we further analyze the impact of sensitivity parameter β on cooperative evolution given parameter δ = 1.0 and α = 0.5. And the impact of the system consistency aspiration or goal value α on cooperative evolution under the given parameter β = 2 and δ = 1.0.

In Fig 4(a), at δ = 1.0 and α = 0.5, the system cooperation level increases with the sensitivity parameter β, demonstrating its impact on cooperative evolution. This may be because the increase in sensitivity parameters amplifies the gap between game benefits and consistency aspiration or goals. Under this evaluation criterion, the results of cooperative payoff evaluation are amplified while the results of defective payoff evaluation are compressed, which accelerates the propagation and diffusion of cooperative behavior in the system, which also explains the strengthening effect of the proposed mechanism on cooperative behavior. However, as the sensitivity parameter β increases further, it suppresses cooperation in systems characterized by strong temptation. Under high-intensity social dilemma (b is large), defection obtain greater payoff that can achieving system goals. This overestimation of defection’s benefits facilitates its spread, thereby reducing overall cooperation. Fig 4(b) reveals the nonlinear dynamic influence of β on the evolution of cooperation. Thus, we can know that a modestly sensitivity parameter β can promote cooperation.

Download:

Fig 4. Panel (a) plots the fc against the b for different β values.

Panel (b) demonstrates the dynamic impact of the β on cooperation evolution, revealing that while higher β values promote cooperation, the effect is only significant up to a certain threshold. The results indicate that a modestly sensitivity parameter β can promote cooperation. Parameters are set to α = 0.5, δ = 1.0 and L = 200.

https://doi.org/10.1371/journal.pone.0341925.g004

Fig 5(a) further examines the influence of the aspiration level α on the evolution of cooperation. The results show that lower system goals improve goal fulfillment among cooperators, thus promoting cooperation. However, an increase in the temptation b triggers an invasion of defectors, leading to a monotonic decline in system-wide cooperation. Despite this decrease, cooperation remains substantially higher than in traditional scenarios. Furthermore, cooperation diminishes as α rises. This is likely because higher α values make it more difficult for cooperative individuals to meet expectations. Under such evaluation criteria, the performance of individuals who fail to achieve α is accentuated, which hinders the propagation of cooperation. Therefore, as illustrated in Fig 5(b), setting lower system aspiration levels (or goals) can facilitate the spread of cooperation within this framework.

Download:

Fig 5. Panel (a) plots the fc against the b under different α values, Panel (b) shows the dynamic impact of α on the cooperative evolution, demonstrating that high consistency aspiration or goal suppress the propagation of cooperative behavior in the system.

The results indicate that lower system consistency desires or goals can promote cooperation. Parameters are set to β = 2, δ = 1.0 and L = 200.

https://doi.org/10.1371/journal.pone.0341925.g005

Further analysis in Fig 6 explores the dynamic influence of parameters β and δ on the evolution of cooperation. Horizontally, increasing the sensitivity parameter β alleviates the negative effect of aspiration level setting on cooperation within the system. Although this enhancement promotes collaborative behavior, its extent remains limited. Vertically, establishing appropriate aspiration levels is crucial for both initiating and maintaining cooperation. Thus, these parameters jointly shape the evolution of cooperation in the system.

Download:

Fig 6. Illustrates the dynamic evolution of fc with parameter β-α.

Horizontally, system cooperation increases as β increases, whereas vertically, it decreases as α increases. The results indicate that parameters β and δ jointly determine the cooperative evolution in the system. Parameters are set to δ = 1.0 and L = 200.

https://doi.org/10.1371/journal.pone.0341925.g006

To better illustrate the impact of parameter K on cooperative evolution, we calculated the evolutionary dynamics of the system for several values of K. As shown in Fig 7, the results indicate that the system maintains robustness under small noise fluctuations. However, when the noise amplitude becomes large, cooperative evolution is significantly disrupted. This finding is consistent with the conclusion reported by [40]. Moreover, analysis of the evolution of system cooperation across different network sizes (L = 50–300) shows consistent cooperation evolution, indicating that the system is robust.

Download:

Fig 7. Panel (a) plots the fc against the b under different K values.

Panel (b) plots the fc against the b under different L values. The result is consistent. The system exhibits robustness against minor noise interference and differences in network scale. Which indicating that the system is robust. Parameters are set to β = 2, α = 0.5, δ = 0.2 and L = 200 in Panel (a), and β = 2, α = 0.5, δ = 0.2 and K = 0.5 in Panel (b).

https://doi.org/10.1371/journal.pone.0341925.g007

4. Conclusion

Cooperation is recognised as foundation for sustaining socio-economic development. Building upon prior research and empirical observations of individual behavior within real social systems, this study introduces a novel strategy update rule. This rule incorporates a system-wide consensus objective and employs an individual’s historical attainment of this objective as a key performance metric, calculated using the Bush-Mosteller (BM) model within a reinforcement learning framework.

Results demonstrate that, compared to conventional update mechanisms, the proposed rule significantly amplifies cooperative behavior within the system. This enhancement stems from the rule’s systematic devaluation of defection payoffs while concurrently amplifying the perceived benefits of cooperation. This direct mechanism effectively suppresses the proliferation of defection strategies and accelerates the diffusion of cooperative ones. Furthermore, the inherent multidimensionality of the composite performance metric constrains opportunities for defectors by selectively filtering potential imitators. This aligns with empirical evidence indicating that social evaluations rarely hinge on a singular criterion. Critically, establishing an appropriate evaluative benchmark directly fosters cooperative outcomes within groups, and the methodology for behavioral assessment relative to this benchmark is paramount for optimizing group management. Collectively, our findings suggest that multidimensional evaluation creates more favorable conditions for the emergence and persistence of cooperation within complex environmental systems. This research offers theoretical insights into the mechanisms underpinning cooperative behavior in collective settings.

Supporting information

S1 File. This code is used to calculate Fig 1.

https://doi.org/10.1371/journal.pone.0341925.s001

(TXT)

S2 File. This code is used to calculate Fig 2.

https://doi.org/10.1371/journal.pone.0341925.s002

(TXT)

S3 File. This code is used to calculate Fig 3.

https://doi.org/10.1371/journal.pone.0341925.s003

(TXT)

S4 File. This code is used to calculate Fig 4(a).

https://doi.org/10.1371/journal.pone.0341925.s004

(TXT)

S5 File. This code is used to calculate Fig 4(b).

https://doi.org/10.1371/journal.pone.0341925.s005

(TXT)

S6 File. This code is used to calculate Fig 5(a).

https://doi.org/10.1371/journal.pone.0341925.s006

(TXT)

S7 File. This code is used to calculate Fig 5(b).

https://doi.org/10.1371/journal.pone.0341925.s007

(TXT)

S8 File. This code is used to calculate Fig 6.

https://doi.org/10.1371/journal.pone.0341925.s008

(TXT)

S9 File. This code is used to calculate Fig 7(a).

https://doi.org/10.1371/journal.pone.0341925.s009

(TXT)

S10 File. This code is used to calculate Fig 7(b).

https://doi.org/10.1371/journal.pone.0341925.s010

(TXT)

S11 File. Data of all figure.

https://doi.org/10.1371/journal.pone.0341925.s011

(ZIP)

References

1. Hauert C, Szabó G. Game theory and physics. Am J Phys. 2005;73(5):405–14.
- View Article
- Google Scholar
2. Nowak MA. Five rules for the evolution of cooperation. Science. 2006;314(5805):1560–3. pmid:17158317
- View Article
- PubMed/NCBI
- Google Scholar
3. Nowak MA, May RM. Evolutionary games and spatial chaos. Nature. 1992;359(6398):826–9.
- View Article
- Google Scholar
4. Cressman R, Wu J-J, Li C, Tao Y. Game experiments on cooperation through reward and punishment. Biol Theory. 2013;8(2):158–66.
- View Article
- Google Scholar
5. Sigmund K, Hauert C, Nowak MA. Reward and punishment. Proc Natl Acad Sci. 2001;98(19):10757–62.
- View Article
- Google Scholar
6. Liu L, Wang L, Niu W, Hua S. Dynamic sanctioning mechanism for cooperative multi-agent systems. Exp Syst Appl. 2026;296:128873.
- View Article
- Google Scholar
7. Hua S, Liu L. Coevolutionary dynamics of population and institutional rewards in public goods games. Exp Syst Appl. 2024;237:121579.
- View Article
- Google Scholar
8. Ding R, Wang X, Liu Y, Zhao J, Gu C. Evolutionary games with environmental feedbacks under an external incentive mechanism. Chaos Solitons Fractals. 2023;169:113318.
- View Article
- Google Scholar
9. Chen Y-D, Guan J-Y, Wu Z-X. Coevolutionary game dynamics with localized environmental resource feedback. Phys Rev E. 2025;111(2–1):024305. pmid:40103166
- View Article
- PubMed/NCBI
- Google Scholar
10. Perc M, Szolnoki A. Social diversity and promotion of cooperation in the spatial prisoner’s dilemma game. Phys Rev E Stat Nonlin Soft Matter Phys. 2008;77(1 Pt 1):011904. pmid:18351873
- View Article
- PubMed/NCBI
- Google Scholar
11. Szolnoki A, Perc M. Coevolution of teaching activity promotes cooperation. New J Phys. 2008;10(4):043036.
- View Article
- Google Scholar
12. Szolnoki A, Szabó G. Cooperation enhanced by inhomogeneous activity of teaching for evolutionary Prisoner’s Dilemma games. Europhys Lett. 2007;77(3):30004.
- View Article
- Google Scholar
13. Wang Z, Wang L, Yin Z-Y, Xia C-Y. Inferring reputation promotes the evolution of cooperation in spatial social dilemma games. PLoS One. 2012;7(7):e40218. pmid:22808120
- View Article
- PubMed/NCBI
- Google Scholar
14. Liu L, Chen X, Szolnoki A. Coevolutionary dynamics via adaptive feedback in collective-risk social dilemma game. Elife. 2023;12:e82954. pmid:37204305
- View Article
- PubMed/NCBI
- Google Scholar
15. Han X, Zhao X, Xia H. Hybrid learning promotes cooperation in the spatial prisoner’s dilemma game. Chaos Solitons Fractals. 2022;164:112684.
- View Article
- Google Scholar
16. Bi Y, Yang H. Based on reputation consistent strategy times promotes cooperation in spatial prisoner’s dilemma game. Appl Math Comput. 2023;444:127818.
- View Article
- Google Scholar
17. Zhang H, An T, Wang J, Wang L, An J, Zhao J, et al. Reputation-based adaptive strategy persistence can promote cooperation considering the actual influence of individual behavior. Phys Lett A. 2024;508:129495.
- View Article
- Google Scholar
18. He J, Wang J, Yu F, Zheng L. Reputation-based strategy persistence promotes cooperation in spatial social dilemma. Phys Lett A. 2020;384(27):126703.
- View Article
- Google Scholar
19. Lu S, Wang Y. Past-performance-driven strategy updating promote cooperation in the spatial prisoner’s dilemma game. Appl Math Comput. 2025;491:129220.
- View Article
- Google Scholar
20. Xu J, Deng Z, Gao B, Song Q, Tian Z, Wang Q, et al. Popularity-driven strategy updating rule promotes cooperation in the spatial prisoner’s dilemma game. Appl Math Comput. 2019;353:82–7.
- View Article
- Google Scholar
21. Lu S, Wang Y. Experience-driven learning and interactive rules under link weight adjustment promote cooperation in spatial prisoner’s dilemma game. Appl Math Comput. 2025;497:129381.
- View Article
- Google Scholar
22. Jia D, Guo H, Song Z, Shi L, Deng X, Perc M, et al. Local and global stimuli in reinforcement learning. New J Phys. 2021;23(8):083020.
- View Article
- Google Scholar
23. Masuda N, Nakamura M. Numerical analysis of a reinforcement learning model with the dynamic aspiration level in the iterated Prisoner’s dilemma. J Theor Biol. 2011;278(1):55–62. pmid:21397610
- View Article
- PubMed/NCBI
- Google Scholar
24. Ezaki T, Horita Y, Takezawa M, Masuda N. Reinforcement learning explains conditional cooperation and its moody cousin. PLoS Comput Biol. 2016;12(7):e1005034. pmid:27438888
- View Article
- PubMed/NCBI
- Google Scholar
25. Jia D, Li T, Zhao Y, Zhang X, Wang Z. Empty nodes affect conditional cooperation under reinforcement learning. Appl Math Comput. 2022;413:126658.
- View Article
- Google Scholar
26. Geng Y, Liu Y, Lu Y, Shen C, Shi L. Reinforcement learning explains various conditional cooperation. Appl Math Comput. 2022;427:127182.
- View Article
- Google Scholar
27. Zhang L, Li Y, Xie Y, Feng Y, Huang C. The combined effects of conformity and reinforcement learning on the evolution of cooperation in public goods games. Chaos Solitons Fractals. 2025;193:116071.
- View Article
- Google Scholar
28. Horita Y, Takezawa M, Inukai K, Kita T, Masuda N. Reinforcement learning accounts for moody conditional cooperation behavior: experimental results. Sci Rep. 2017;7:39275. pmid:28071646
- View Article
- PubMed/NCBI
- Google Scholar
29. Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, et al. Multiagent cooperation and competition with deep reinforcement learning. PLoS One. 2017;12(4):e0172395. pmid:28380078
- View Article
- PubMed/NCBI
- Google Scholar
30. Ding Z-W, Zheng G-Z, Cai C-R, Cai W-R, Chen L, Zhang J-Q, et al. Emergence of cooperation in two-agent repeated games with reinforcement learning. Chaos Solitons Fractals. 2023;175:114032.
- View Article
- Google Scholar
31. Bendor J, Mookherjee D, Ray D. Aspiration-based reinforcement learning in repeated interaction games: an overview. Int Game Theory Rev. 2001;03(02n03):159–74.
- View Article
- Google Scholar
32. Zhang L, Huang C, Li H, Dai Q, Yang J. Cooperation guided by imitation, aspiration and conformity-driven dynamics in evolutionary games. Phys A: Stat Mech Appl. 2021;561:125260.
- View Article
- Google Scholar
33. Perc M, Wang Z. Heterogeneous aspirations promote cooperation in the prisoner’s dilemma game. PLoS One. 2010;5(12):e15117. pmid:21151898
- View Article
- PubMed/NCBI
- Google Scholar
34. Li Z, Yang Z, Wu T, Wang L. Aspiration-based partner switching boosts cooperation in social dilemmas. PLoS One. 2014;9(6):e97866. pmid:24896269
- View Article
- PubMed/NCBI
- Google Scholar
35. Liu X, He M, Kang Y, Pan Q. Aspiration promotes cooperation in the prisoner’s dilemma game with the imitation rule. Phys Rev E. 2016;94(1–1):012124. pmid:27575094
- View Article
- PubMed/NCBI
- Google Scholar
36. You T, Shi L, Wang X, Mengibaev M, Zhang Y, Zhang P. The effects of aspiration under multiple strategy updating rules on cooperation in prisoner’s dilemma game. Appl Math Comput. 2021;394:125770.
- View Article
- Google Scholar
37. Chen Y-S, Yang H-X, Guo W-Z, Liu G-G. Promotion of cooperation based on swarm intelligence in spatial public goods games. Appl Math Comput. 2018;320:614–20.
- View Article
- Google Scholar
38. Lu S, Dai J, Zhu G, Guo L. Investigating the effectiveness of interaction-efficiency-driven strategy updating under progressive-interaction for the evolution of the prisoner’s dilemma game. Chaos Solitons Fractals. 2023;172:113493.
- View Article
- Google Scholar
39. Wang J, He J, Yu F. Heterogeneity of reputation increment driven by individual influence promotes cooperation in spatial social dilemma. Chaos Solitons Fractals. 2021;146:110887.
- View Article
- Google Scholar
40. Szabó G, Vukov J, Szolnoki A. Phase diagrams for an evolutionary prisoner’s dilemma game on two-dimensional lattices. Phys Rev E Stat Nonlin Soft Matter Phys. 2005;72(4 Pt 2):047107. pmid:16383580
- View Article
- PubMed/NCBI
- Google Scholar
41. Perc M, Szolnoki A, Szabó G. Restricted connections among distinguished players support cooperation. Phys Rev E Stat Nonlin Soft Matter Phys. 2008;78(6 Pt 2):066101. pmid:19256899
- View Article
- PubMed/NCBI
- Google Scholar
42. Wang Z, Szolnoki A, Perc M. Interdependent network reciprocity in evolutionary games. Sci Rep. 2013;3:1183. pmid:23378915
- View Article
- PubMed/NCBI
- Google Scholar
43. Szolnoki A, Perc M. Promoting cooperation in social dilemmas via simple coevolutionary rules. Eur Phys J B. 2008;67(3):337–44.
- View Article
- Google Scholar
44. Perc M, Szolnoki A. Social diversity and promotion of cooperation in the spatial prisoner's dilemma game[J]. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics, 2008;77(1):01190.

[ref1] 1. Hauert C, Szabó G. Game theory and physics. Am J Phys. 2005;73(5):405–14.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Nowak MA. Five rules for the evolution of cooperation. Science. 2006;314(5805):1560–3. pmid:17158317
View Article
PubMed/NCBI
Google Scholar

[5] View Article

[6] PubMed/NCBI

[7] Google Scholar

[ref3] 3. Nowak MA, May RM. Evolutionary games and spatial chaos. Nature. 1992;359(6398):826–9.
View Article
Google Scholar

[9] View Article

[10] Google Scholar

[ref4] 4. Cressman R, Wu J-J, Li C, Tao Y. Game experiments on cooperation through reward and punishment. Biol Theory. 2013;8(2):158–66.
View Article
Google Scholar

[12] View Article

[13] Google Scholar

[ref5] 5. Sigmund K, Hauert C, Nowak MA. Reward and punishment. Proc Natl Acad Sci. 2001;98(19):10757–62.
View Article
Google Scholar

[15] View Article

[16] Google Scholar

[ref6] 6. Liu L, Wang L, Niu W, Hua S. Dynamic sanctioning mechanism for cooperative multi-agent systems. Exp Syst Appl. 2026;296:128873.
View Article
Google Scholar

[18] View Article

[19] Google Scholar

[ref7] 7. Hua S, Liu L. Coevolutionary dynamics of population and institutional rewards in public goods games. Exp Syst Appl. 2024;237:121579.
View Article
Google Scholar

[21] View Article

[22] Google Scholar

[ref8] 8. Ding R, Wang X, Liu Y, Zhao J, Gu C. Evolutionary games with environmental feedbacks under an external incentive mechanism. Chaos Solitons Fractals. 2023;169:113318.
View Article
Google Scholar

[24] View Article

[25] Google Scholar

[ref9] 9. Chen Y-D, Guan J-Y, Wu Z-X. Coevolutionary game dynamics with localized environmental resource feedback. Phys Rev E. 2025;111(2–1):024305. pmid:40103166
View Article
PubMed/NCBI
Google Scholar

[27] View Article

[28] PubMed/NCBI

[29] Google Scholar

[ref10] 10. Perc M, Szolnoki A. Social diversity and promotion of cooperation in the spatial prisoner’s dilemma game. Phys Rev E Stat Nonlin Soft Matter Phys. 2008;77(1 Pt 1):011904. pmid:18351873
View Article
PubMed/NCBI
Google Scholar

[31] View Article

[32] PubMed/NCBI

[33] Google Scholar

[ref11] 11. Szolnoki A, Perc M. Coevolution of teaching activity promotes cooperation. New J Phys. 2008;10(4):043036.
View Article
Google Scholar

[35] View Article

[36] Google Scholar

[ref12] 12. Szolnoki A, Szabó G. Cooperation enhanced by inhomogeneous activity of teaching for evolutionary Prisoner’s Dilemma games. Europhys Lett. 2007;77(3):30004.
View Article
Google Scholar

[38] View Article

[39] Google Scholar

[ref13] 13. Wang Z, Wang L, Yin Z-Y, Xia C-Y. Inferring reputation promotes the evolution of cooperation in spatial social dilemma games. PLoS One. 2012;7(7):e40218. pmid:22808120
View Article
PubMed/NCBI
Google Scholar

[41] View Article

[42] PubMed/NCBI

[43] Google Scholar

[ref14] 14. Liu L, Chen X, Szolnoki A. Coevolutionary dynamics via adaptive feedback in collective-risk social dilemma game. Elife. 2023;12:e82954. pmid:37204305
View Article
PubMed/NCBI
Google Scholar

[45] View Article

[46] PubMed/NCBI

[47] Google Scholar

[ref15] 15. Han X, Zhao X, Xia H. Hybrid learning promotes cooperation in the spatial prisoner’s dilemma game. Chaos Solitons Fractals. 2022;164:112684.
View Article
Google Scholar

[49] View Article

[50] Google Scholar

[ref16] 16. Bi Y, Yang H. Based on reputation consistent strategy times promotes cooperation in spatial prisoner’s dilemma game. Appl Math Comput. 2023;444:127818.
View Article
Google Scholar

[52] View Article

[53] Google Scholar

[ref17] 17. Zhang H, An T, Wang J, Wang L, An J, Zhao J, et al. Reputation-based adaptive strategy persistence can promote cooperation considering the actual influence of individual behavior. Phys Lett A. 2024;508:129495.
View Article
Google Scholar

[55] View Article

[56] Google Scholar

[ref18] 18. He J, Wang J, Yu F, Zheng L. Reputation-based strategy persistence promotes cooperation in spatial social dilemma. Phys Lett A. 2020;384(27):126703.
View Article
Google Scholar

[58] View Article

[59] Google Scholar

[ref19] 19. Lu S, Wang Y. Past-performance-driven strategy updating promote cooperation in the spatial prisoner’s dilemma game. Appl Math Comput. 2025;491:129220.
View Article
Google Scholar

[61] View Article

[62] Google Scholar

[ref20] 20. Xu J, Deng Z, Gao B, Song Q, Tian Z, Wang Q, et al. Popularity-driven strategy updating rule promotes cooperation in the spatial prisoner’s dilemma game. Appl Math Comput. 2019;353:82–7.
View Article
Google Scholar

[64] View Article

[65] Google Scholar

[ref21] 21. Lu S, Wang Y. Experience-driven learning and interactive rules under link weight adjustment promote cooperation in spatial prisoner’s dilemma game. Appl Math Comput. 2025;497:129381.
View Article
Google Scholar

[67] View Article

[68] Google Scholar

[ref22] 22. Jia D, Guo H, Song Z, Shi L, Deng X, Perc M, et al. Local and global stimuli in reinforcement learning. New J Phys. 2021;23(8):083020.
View Article
Google Scholar

[70] View Article

[71] Google Scholar

[ref23] 23. Masuda N, Nakamura M. Numerical analysis of a reinforcement learning model with the dynamic aspiration level in the iterated Prisoner’s dilemma. J Theor Biol. 2011;278(1):55–62. pmid:21397610
View Article
PubMed/NCBI
Google Scholar

[73] View Article

[74] PubMed/NCBI

[75] Google Scholar

[ref24] 24. Ezaki T, Horita Y, Takezawa M, Masuda N. Reinforcement learning explains conditional cooperation and its moody cousin. PLoS Comput Biol. 2016;12(7):e1005034. pmid:27438888
View Article
PubMed/NCBI
Google Scholar

[77] View Article

[78] PubMed/NCBI

[79] Google Scholar

[ref25] 25. Jia D, Li T, Zhao Y, Zhang X, Wang Z. Empty nodes affect conditional cooperation under reinforcement learning. Appl Math Comput. 2022;413:126658.
View Article
Google Scholar

[81] View Article

[82] Google Scholar

[ref26] 26. Geng Y, Liu Y, Lu Y, Shen C, Shi L. Reinforcement learning explains various conditional cooperation. Appl Math Comput. 2022;427:127182.
View Article
Google Scholar

[84] View Article

[85] Google Scholar

[ref27] 27. Zhang L, Li Y, Xie Y, Feng Y, Huang C. The combined effects of conformity and reinforcement learning on the evolution of cooperation in public goods games. Chaos Solitons Fractals. 2025;193:116071.
View Article
Google Scholar

[87] View Article

[88] Google Scholar

[ref28] 28. Horita Y, Takezawa M, Inukai K, Kita T, Masuda N. Reinforcement learning accounts for moody conditional cooperation behavior: experimental results. Sci Rep. 2017;7:39275. pmid:28071646
View Article
PubMed/NCBI
Google Scholar

[90] View Article

[91] PubMed/NCBI

[92] Google Scholar

[ref29] 29. Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, et al. Multiagent cooperation and competition with deep reinforcement learning. PLoS One. 2017;12(4):e0172395. pmid:28380078
View Article
PubMed/NCBI
Google Scholar

[94] View Article

[95] PubMed/NCBI

[96] Google Scholar

[ref30] 30. Ding Z-W, Zheng G-Z, Cai C-R, Cai W-R, Chen L, Zhang J-Q, et al. Emergence of cooperation in two-agent repeated games with reinforcement learning. Chaos Solitons Fractals. 2023;175:114032.
View Article
Google Scholar

[98] View Article

[99] Google Scholar

[ref31] 31. Bendor J, Mookherjee D, Ray D. Aspiration-based reinforcement learning in repeated interaction games: an overview. Int Game Theory Rev. 2001;03(02n03):159–74.
View Article
Google Scholar

[101] View Article

[102] Google Scholar

[ref32] 32. Zhang L, Huang C, Li H, Dai Q, Yang J. Cooperation guided by imitation, aspiration and conformity-driven dynamics in evolutionary games. Phys A: Stat Mech Appl. 2021;561:125260.
View Article
Google Scholar

[104] View Article

[105] Google Scholar

[ref33] 33. Perc M, Wang Z. Heterogeneous aspirations promote cooperation in the prisoner’s dilemma game. PLoS One. 2010;5(12):e15117. pmid:21151898
View Article
PubMed/NCBI
Google Scholar

[107] View Article

[108] PubMed/NCBI

[109] Google Scholar

[ref34] 34. Li Z, Yang Z, Wu T, Wang L. Aspiration-based partner switching boosts cooperation in social dilemmas. PLoS One. 2014;9(6):e97866. pmid:24896269
View Article
PubMed/NCBI
Google Scholar

[111] View Article

[112] PubMed/NCBI

[113] Google Scholar

[ref35] 35. Liu X, He M, Kang Y, Pan Q. Aspiration promotes cooperation in the prisoner’s dilemma game with the imitation rule. Phys Rev E. 2016;94(1–1):012124. pmid:27575094
View Article
PubMed/NCBI
Google Scholar

[115] View Article

[116] PubMed/NCBI

[117] Google Scholar

[ref36] 36. You T, Shi L, Wang X, Mengibaev M, Zhang Y, Zhang P. The effects of aspiration under multiple strategy updating rules on cooperation in prisoner’s dilemma game. Appl Math Comput. 2021;394:125770.
View Article
Google Scholar

[119] View Article

[120] Google Scholar

[ref37] 37. Chen Y-S, Yang H-X, Guo W-Z, Liu G-G. Promotion of cooperation based on swarm intelligence in spatial public goods games. Appl Math Comput. 2018;320:614–20.
View Article
Google Scholar

[122] View Article

[123] Google Scholar

[ref38] 38. Lu S, Dai J, Zhu G, Guo L. Investigating the effectiveness of interaction-efficiency-driven strategy updating under progressive-interaction for the evolution of the prisoner’s dilemma game. Chaos Solitons Fractals. 2023;172:113493.
View Article
Google Scholar

[125] View Article

[126] Google Scholar

[ref39] 39. Wang J, He J, Yu F. Heterogeneity of reputation increment driven by individual influence promotes cooperation in spatial social dilemma. Chaos Solitons Fractals. 2021;146:110887.
View Article
Google Scholar

[128] View Article

[129] Google Scholar

[ref40] 40. Szabó G, Vukov J, Szolnoki A. Phase diagrams for an evolutionary prisoner’s dilemma game on two-dimensional lattices. Phys Rev E Stat Nonlin Soft Matter Phys. 2005;72(4 Pt 2):047107. pmid:16383580
View Article
PubMed/NCBI
Google Scholar

[131] View Article

[132] PubMed/NCBI

[133] Google Scholar

[ref41] 41. Perc M, Szolnoki A, Szabó G. Restricted connections among distinguished players support cooperation. Phys Rev E Stat Nonlin Soft Matter Phys. 2008;78(6 Pt 2):066101. pmid:19256899
View Article
PubMed/NCBI
Google Scholar

[135] View Article

[136] PubMed/NCBI

[137] Google Scholar

[ref42] 42. Wang Z, Szolnoki A, Perc M. Interdependent network reciprocity in evolutionary games. Sci Rep. 2013;3:1183. pmid:23378915
View Article
PubMed/NCBI
Google Scholar

[139] View Article

[140] PubMed/NCBI

[141] Google Scholar

[ref43] 43. Szolnoki A, Perc M. Promoting cooperation in social dilemmas via simple coevolutionary rules. Eur Phys J B. 2008;67(3):337–44.
View Article
Google Scholar

[143] View Article

[144] Google Scholar

[ref44] 44. Perc M, Szolnoki A. Social diversity and promotion of cooperation in the spatial prisoner's dilemma game[J]. Physical Review E-Statistical, Nonlinear, and Soft Matter Physics, 2008;77(1):01190.

Figures

Abstract

1. Introduction

2. Model

3. Result

4. Conclusion

Supporting information

S1 File. This code is used to calculate Fig 1.

S2 File. This code is used to calculate Fig 2.

S3 File. This code is used to calculate Fig 3.

S4 File. This code is used to calculate Fig 4(a).

S5 File. This code is used to calculate Fig 4(b).

S6 File. This code is used to calculate Fig 5(a).

S7 File. This code is used to calculate Fig 5(b).

S8 File. This code is used to calculate Fig 6.

S9 File. This code is used to calculate Fig 7(a).

S10 File. This code is used to calculate Fig 7(b).

S11 File. Data of all figure.

References